Advances in Intelligent Systems and Computing 1424
Satyabrata Roy · Deepak Sinwar · Thinagaran Perumal · Adam Slowik · João Manuel R. S. Tavares Editors
Innovations in Computational Intelligence and Computer Vision Proceedings of ICICV 2021
Advances in Intelligent Systems and Computing Volume 1424
Series Editor
Janusz Kacprzyk, Systems Research Institute, Polish Academy of Sciences, Warsaw, Poland
Advisory Editors
Nikhil R. Pal, Indian Statistical Institute, Kolkata, India
Rafael Bello Perez, Faculty of Mathematics, Physics and Computing, Universidad Central de Las Villas, Santa Clara, Cuba
Emilio S. Corchado, University of Salamanca, Salamanca, Spain
Hani Hagras, School of Computer Science and Electronic Engineering, University of Essex, Colchester, UK
László T. Kóczy, Department of Automation, Széchenyi István University, Gyor, Hungary
Vladik Kreinovich, Department of Computer Science, University of Texas at El Paso, El Paso, TX, USA
Chin-Teng Lin, Department of Electrical Engineering, National Chiao Tung University, Hsinchu, Taiwan
Jie Lu, Faculty of Engineering and Information Technology, University of Technology Sydney, Sydney, NSW, Australia
Patricia Melin, Graduate Program of Computer Science, Tijuana Institute of Technology, Tijuana, Mexico
Nadia Nedjah, Department of Electronics Engineering, University of Rio de Janeiro, Rio de Janeiro, Brazil
Ngoc Thanh Nguyen, Faculty of Computer Science and Management, Wrocław University of Technology, Wrocław, Poland
Jun Wang, Department of Mechanical and Automation Engineering, The Chinese University of Hong Kong, Shatin, Hong Kong
The series “Advances in Intelligent Systems and Computing” contains publications on theory, applications, and design methods of Intelligent Systems and Intelligent Computing. Virtually all disciplines such as engineering, natural sciences, computer and information science, ICT, economics, business, e-commerce, environment, healthcare, life science are covered. The list of topics spans all the areas of modern intelligent systems and computing such as: computational intelligence, soft computing including neural networks, fuzzy systems, evolutionary computing and the fusion of these paradigms, social intelligence, ambient intelligence, computational neuroscience, artificial life, virtual worlds and society, cognitive science and systems, Perception and Vision, DNA and immune based systems, self-organizing and adaptive systems, e-Learning and teaching, human-centered and human-centric computing, recommender systems, intelligent control, robotics and mechatronics including human-machine teaming, knowledge-based paradigms, learning paradigms, machine ethics, intelligent data analysis, knowledge management, intelligent agents, intelligent decision making and support, intelligent network security, trust management, interactive entertainment, Web intelligence and multimedia. The publications within “Advances in Intelligent Systems and Computing” are primarily proceedings of important conferences, symposia and congresses. They cover significant recent developments in the field, both of a foundational and applicable character. An important characteristic feature of the series is the short publication time and world-wide distribution. This permits a rapid and broad dissemination of research results. Indexed by DBLP, INSPEC, WTI Frankfurt eG, zbMATH, Japanese Science and Technology Agency (JST). All books published in the series are submitted for consideration in Web of Science. For proposals from Asia please contact Aninda Bose ([email protected]).
More information about this series at https://link.springer.com/bookseries/11156
Satyabrata Roy · Deepak Sinwar · Thinagaran Perumal · Adam Slowik · João Manuel R. S. Tavares Editors
Innovations in Computational Intelligence and Computer Vision Proceedings of ICICV 2021
Editors
Satyabrata Roy, Department of Computer Science and Engineering, Manipal University Jaipur, Jaipur, India
Deepak Sinwar, Department of Computer and Communication Engineering, Manipal University Jaipur, Jaipur, India
Thinagaran Perumal, Department of Computer Science and Information Technology, Universiti Putra Malaysia, Selangor, Malaysia
Adam Slowik, Department of Computer Engineering, Koszalin University of Technology, Koszalin, Poland
João Manuel R. S. Tavares, Departamento de Engenharia Mecânica, Faculdade de Engenharia, Universidade do Porto, Porto, Portugal
ISSN 2194-5357 ISSN 2194-5365 (electronic) Advances in Intelligent Systems and Computing ISBN 978-981-19-0474-5 ISBN 978-981-19-0475-2 (eBook) https://doi.org/10.1007/978-981-19-0475-2 © The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations. This Springer imprint is published by the registered company Springer Nature Singapore Pte Ltd. The registered company address is: 152 Beach Road, #21-01/04 Gateway East, Singapore 189721, Singapore
ICICV 2021 Committees
Chief Patron K. Ramnarayan, Chairperson, Manipal University Jaipur, India
Patron G. K. Prabhu, President, Manipal University Jaipur, India
Honorary Chair N. N. Sharma, Pro-President, Manipal University Jaipur, India Nitu Bhatnagar, Registrar, Manipal University Jaipur, India Jagannath Korody, Dean FoE, Manipal University Jaipur, India
General Chair Rajveer Singh Shekhawat, Manipal University Jaipur, India
Program Chair Vijaypal Singh Dhaka, Manipal University Jaipur, India
Program Co-Chair Sandeep Joshi, Manipal University Jaipur, India Pankaj Vyas, Manipal University Jaipur, India
Organizing Chair Deepak Sinwar, Manipal University Jaipur, India Satyabrata Roy, Manipal University Jaipur, India
Organizing Co-Chair Sunil Kumar, Manipal University Jaipur, India Amita Nandal, Manipal University Jaipur, India Vijander Singh, Manipal University Jaipur, India Arvind Dhaka, Manipal University Jaipur, India Linesh Raja, Manipal University Jaipur, India
Finance Chair Gulrej Ahmed, Manipal University Jaipur, India
Finance Co-Chair Ghanshyam Raghuwanshi, Manipal University Jaipur, India
Publication Chair Satyabrata Roy, Manipal University Jaipur, India Deepak Sinwar, Manipal University Jaipur, India
Publication Co-Chair Kusumlata Jain, Manipal University Jaipur, India Geeta Rani, Manipal University Jaipur, India
Registration Chair Kuntal Gaur, Manipal University Jaipur, India
Website Chair Krishna Kumar, Manipal University Jaipur, India
Web Designers Dheeraj Sain, Manipal University Jaipur, India Vikas Tatiwal, Manipal University Jaipur, India
Local Organizing Committee Dinesh Kumar Saini, Manipal University Jaipur, India Neha Chaudhary, Manipal University Jaipur, India Sourabh Singh Verma, Manipal University Jaipur, India Vaishali Yadav, Manipal University Jaipur, India Mahesh Jangid, Manipal University Jaipur, India Hemlata Goyal, Manipal University Jaipur, India Somya Goyal, Manipal University Jaipur, India Anita Shrotriya, Manipal University Jaipur, India Nitesh Pradhan, Manipal University Jaipur, India Ravinder Kumar, Manipal University Jaipur, India Pradeep Kumar, Manipal University Jaipur, India Suman Bhakar, Manipal University Jaipur, India Vivek Sharma, Manipal University Jaipur, India Neha V. Sharma, Manipal University Jaipur, India Pradeep Kumar Tiwari, Manipal University Jaipur, India Saket Acharya, Manipal University Jaipur, India
Vaibhav Bhatnagar, Manipal University Jaipur, India Kavita Jhajharia, Manipal University Jaipur, India Rahul Saxena, Manipal University Jaipur, India Anju Yadav, Manipal University Jaipur, India
Preface
This volume comprises research papers presented at the 2nd International Conference on Innovations in Computational Intelligence and Computer Vision (ICICV 2021), organized by the Department of Computer and Communication Engineering, School of Computing and Information Technology, Manipal University Jaipur, India, during August 5–6, 2021. The conference focused on addressing research challenges and innovations specifically in the fields of "Computational Intelligence" and "Computer Vision." Shri Sandesh Nayak (IAS, Commissioner College Education, Government of Rajasthan, India) inaugurated the event and enlightened the audience with his address. The conference received a total of 370 research paper submissions from different countries, including Australia, Azerbaijan, Bahrain, Bangladesh, Canada, Egypt, Indonesia, Iraq, Lebanon, Philippines, Poland, Tanzania, and India. After a careful blind review assessment, only 55 high-quality submissions (an acceptance rate of only 15%) related to the conference's theme were selected for oral presentation. The papers covered recent innovations in Computer Vision, Machine Learning, Advanced Computing, Image Processing, and Data Processing. We express our sincere thanks to Manipal University Jaipur, India, for providing wholehearted support in organizing this conference. We would like to extend our sincere appreciation for the outstanding work contributed over many months by the organizing committee of ICICV 2021. We also wish to express our appreciation to Prof. Ibrahim A. Hameed (Norwegian University of Science and Technology, Alesund, Norway), Prof. Rajesh Kumar (MNIT Jaipur, India), Prof. Rajeev Srivastava (IIT BHU, Varanasi, Uttar Pradesh, India), Mr. Aninda Bose (Executive Editor, Springer Nature), Prof. Bhabani Shankar Prasad Mishra (Kalinga Institute of Industrial Technology, Bhubaneswar, Odisha, India), and Prof. Nilanjan Dey (JIS University Kolkata, West Bengal, India) for their valuable keynote talks during the event. Our special thanks go to all session chairs, track managers, reviewers, student volunteers (Team Aperture, MUJ), and the IT-infra team for their outstanding support in organizing
this conference. Finally, our thanks go to all the participants who revitalized the event with their valuable research submissions and presentations.
Jaipur, India: Satyabrata Roy
Jaipur, India: Deepak Sinwar
Selangor, Malaysia: Thinagaran Perumal
Koszalin, Poland: Adam Slowik
Porto, Portugal: João Manuel R. S. Tavares
Contents
An Efficient Self-embedding Fragile Watermarking Scheme Based on Neighborhood Relationship . . . 1
Anupam Shukla, Ifsha Wadhwa, Aakanksha Gupta, Simran Jaglan, and Shivendra Shivani
GeoCloud4EduNet: Geospatial Cloud Computing Model for Visualization and Analysis of Educational Information Network . . . 9
Chandrima Roy, Ekansh Maheshwari, Manjusha Pandey, Siddharth Swarup Rautaray, and Rabindra K. Barik
Political Polarity Classification Using NLP . . . 19
Sagi Harshad Varma and Yanamandra Venkata Sree Harsha
Robust Segmentation of Nodules in Ultrasound-B Thyroid Images Through Deep Model-Based Features . . . 35
Siddhant Baldota, C. Malathy, Arjun Chaudhary, and M. Gayathri
Image Compression Using Histogram Equalization . . . 47
Raj Kumar Paul and Saravanan Chandran
Violence Detection in Video Footages Using I3D ConvNet . . . 63
Joel Selvaraj and J. Anuradha
Performance Analysis of Gradient Descent and Backpropagation Algorithms in Classifying Areas under Community Quarantine in the Philippines . . . 77
Carmella Denise M. Adeza, Crishmel Bemar G. Manarin, Jerry S. Polinar, Paolo Jyme A. Cabrera, and Arman Bernard G. Santos
Least Mean Square Algorithm for Identifying High-risk Areas Vulnerable to Man-Made Disasters: A Philippine Perspective . . . 87
Jaztine B. Abanes, Mary Joy B. Collarin, Kathleen Q. Manlangit, Kem Irish G. Mercado, Nickcel Jirro M. Natanauan, and Arman Bernard G. Santos
Finding Numbers of Occurrences and Duration of a Particular Face in Video Stream . . . 95
S. Prasanth Vaidya and Y. Ramanjaneyulu
Dynamic Tuning of Fuzzy Membership Function for an Application of Soil Nutrient Recommendation . . . 107
R. Aarthi and D. Sivakumar
Representative-Based Cluster Undersampling Technique for Imbalanced Credit Scoring Datasets . . . 119
Sudhansu Ranjan Lenka, Sukant Kishoro Bisoy, Rojalina Priyadarshini, and Biswaranjan Nayak
Automatic Road Network Extraction from High-Resolution Images using Fast Fuzzy C-Means . . . 131
Tarun Kumar and Rashmi Chaudhari
Analysis of Approaches for Irony Detection in Tweets for Online Products . . . 141
S. Uma Maheswari and S. S. Dhenakaran
An Efficient Gabor Scale Average (GSA) based PCA to LDA Feature Extraction of Face and Gait Cues for Multimodal Classifier . . . 153
N. Santhi, K. Annbuselvi, and S. Sivakumar
Stroke-Based Handwritten Gujarati Font Synthesis in Personal Handwriting Style via Shape Simulation Approach . . . 165
Preeti P. Bhatt, Jitendra V. Nasriwala, and Rakesh R. Savant
Kids View—A Parents Companion . . . 175
Sujata Khedkar, Advait Naik, Omkar Mane, Aditya Gurnani, and Krish Amesur
Computational Operations and Hardware Resource Estimation in a Convolutional Neural Network Architecture . . . 189
Jyoti Pandey, Abhijit R. Asati, and Meetha V. Shenoy
Load Balancing in Multiprocessor Systems Using Modified Real-Coded Genetic Algorithm . . . 201
Poonam Panwar, Chetna Kaushal, Anshu Singla, and Vikas Rattan
Robust Image Tampering Detection Technique Using K-Nearest Neighbors (KNN) Classifier . . . 211
Prabhu Bevinamarad and Prakash H. Unki
LIARx: A Partial Fact Fake News Data Set with Label Distribution Approach for Fake News Detection . . . 221
Sharanya Venkat, Richa, Gaurang Rao, and Bhaskarjyoti Das
A Block-Based Data Hiding Technique Using Convolutional Neural Network . . . 231
P. V. Sabeen Govind and M. V. Judy
Energy-Efficient Adaptive Sensing Technique for Smart Healthcare in Connected Healthcare Systems . . . 239
Duaa Abd Alhussein, Ali Kadhum Idrees, and Hassan Harb
Transfer Learning Approach for Analyzing Attentiveness of Students in an Online Classroom Environment with Emotion Detection . . . 253
K. V. Karan, Vedant Bahel, R. Ranjana, and T. Subha
Prediction of COVID-19 Cases and Attribution to Various Factors Across Different Geographical Regions . . . 263
Megha Agarwal, Amit Singhal, Monika Patial, and Brejesh Lall
Emotion Enhanced Domain Adaptation for Propaganda Detection in Indian Social Media . . . 273
Malavikka Rajmohan, Rohan Kamath, Akanksha P. Reddy, and Bhaskarjyoti Das
Target Identification and Detection on SAR Imagery Using Deep Convolution Network . . . 283
Anishi Gupta, D. Penchalaiah, and Abhijit Bhattacharyya
Lung Cancer Detection by Classifying CT Scan Images Using Grey Level Co-occurrence Matrix (GLCM) and K-Nearest Neighbours . . . 293
Aayush Kamdar, Vihaan Sharma, Sagar Sonawane, and Nikita Patil
Real-Time Translation of Indian Sign Language to Assist the Hearing and Speech Impaired . . . 303
S. Rajarajeswari, Naveen Mathews Renji, Pooja Kumari, Manasa Keshavamurthy, and K. Kruthika
EYE4U-Multifold Protection Monitor . . . 323
Vernika Sapra, Rohan Gupta, Parikshit Sharma, Rashika Grover, and Urvashi Sapra
A Comprehensive Source Code Plagiarism Detection Software . . . 343
Amay Dilip Jain, Ankur Gupta, Diksha Choudhary, Nayan, and Ashish Tiwari
Path Planning of Mobile Robot Using Adaptive Particle Swarm Optimization . . . 351
Himanshu, Arindam Singha, Akash Kumar, and Anjan Kumar Ray
Impact on Mental Health of Youth in Punjab State of India Amid COVID-19—A Survey-Based Analysis . . . 363
Ramnita Sharda, Nishant Juneja, Harleen Kaur, and Rakesh Kumar Sharma
SmartACL: Anterior Cruciate Ligament Tear Detection by Analyzing MRI Scans . . . 373
Joel K. Shaju, Neha Ann Joshy, Alisha R. Singh, and Rahul Jadhav
Building a Neural Network for Identification and Localization of Diseases from Images of Eye Sonography . . . 383
Shreyas Talole, Aditya Shinde, Atharva Bapat, and Sharmila Sengupta
A Quick Dynamic Attribute Subset Method for High Dimensional Data Using Correlation-Guided Cluster Analysis and Genetic Algorithm . . . 395
Nandipati Bhagya Lakshmi, Nagaraju Devarakonda, Zdzislaw Polkowski, and Anusha Papasani
Copy-Move Forgery Detection Using BEBLID Features and DCT . . . 409
Ganga S. Nair, C. Gitanjali Nambiar, Nayana Rajith, Krishna Nanda, and Jyothisha J. Nair
Engineering Design Optimization Using Memorized Differential Evolution . . . 419
Raghav Prasad Parouha and Pooja Verma
Image Forgery Detection Using CNN and Local Binary Pattern-Based Patch Descriptor . . . 429
Shuvro Pal and G. M. Atiqur Rahaman
Extracting and Developing Spectral Indices for Soil Using Sentinel-2A to Investigate the Association of Soil NPK . . . 441
V. Dhayalan and Karuppasamy Sudalaimuthu
Flood Mapping Using Sentinel-1 GRD SAR Images and Google Earth Engine: Case Study of Odisha State, India . . . 455
Somya Jain, Anita Gautam, Arpana Chaudhary, Chetna Soni, and Chilka Sharma
Remote Sensing Image Captioning via Multilevel Attention-Based Visual Question Answering . . . 465
Nirmala Murali and A. P. Shanthi
Auto Target Moving Object with Spy BOT . . . 477
Atharva Ambre, Ashwin Selvarangan, Rushabh Mehrotra, and Sumeet Thakur
Power System Restoration at Short Period of Time During Blackout by Plugin Hybrid Electric Vehicle Station Using Artificial Intelligence . . . 489
R. Hariharan
Robust Adversarial Training for Detection of Adversarial Samples . . . 501
Sandip Shinde, Jatan Loya, Shreya Lunkad, Harsh Pandey, Manas Nagaraj, and Khushali Daga
Performance Evaluation of Shallow and Deep Neural Networks for Dementia Detection . . . 513
Deepika Bansal, Kavita Khanna, Rita Chhikara, Rakesh Kumar Dua, and Rajeev Malhotra
Computer Vision Based Roadside Traffic Convex Mirror Validation for Driver Assistance System . . . 525
Suraj Dhalwar, Hansa Meghwani, and Sachin Salgar
Feature Extraction Using Autoencoders: A Case Study with Parkinson's Disease . . . 535
Maria Achary and Siby Abraham
Combination of Expression Data and Predictive Modelling for Polycystic Ovary Disease and Assessing Risk of Infertility Using Machine Learning Techniques . . . 547
Sakshi Vats, Abhishek Sengupta, Ankur Chaurasia, and Priyanka Narad
Dematerializing Vehicle Documents with IoT—Effective Solution Using Existing Infrastructure . . . 557
Namrata Thorve and Mansi Subhedar
Building NTH: Network Threat Hunter with Deep Learning . . . 567
Taniya Thawani, Sourav Kunal, and Parth Gandhi
Future Cases Prediction of COVID-19 Using Deep Learning Models . . . 579
VijayBhaskar Kanchipamu, Pappu Bhavani, and Javvadi Tejasri
An Intelligent Species Level Deep Learning-Based Framework in Automatic Classification of Microscopic Bacteria Images . . . 597
Priya Rani, Shallu Kotwal, and Jatinder Manhas
Modeling Daily Pan Evaporation Using Tree-Based Regression Methods . . . 605
Sherin Babu and Binu Thomas
Optimized Pose-Based Gait Analysis for Surveillance . . . 615
Apoorva Parashar, Anubha Parashar, and Vidyadhar Aski
Author Index . . . 623
About the Editors
Dr. Satyabrata Roy is an Assistant Professor at the Department of Computer Science and Engineering, School of Computing and Information Technology at Manipal University Jaipur, Rajasthan, India. He received his Ph.D. and M.Tech. (with honors) degrees in Computer Science and Engineering in 2020 and 2014 respectively, and his B.Tech. in Computer Science and Engineering in 2009. His research interests include Cryptography, Internet of Things, Cellular Automata, Computer Networks, Computational Intelligence, Machine Learning and Formal Languages. He is an enthusiastic and motivating technocrat with more than 10 years of research and academic experience. He has served as a resource person for many FDPs and seminars. He is a member of ACM and a senior member of IEEE. Dr. Deepak Sinwar is an Assistant Professor at the Department of Computer and Communication Engineering, School of Computing and Information Technology at Manipal University Jaipur, Jaipur, Rajasthan, India. He received his Ph.D. and M.Tech. degrees in Computer Science and Engineering in 2016 and 2010 respectively, and his B.Tech. (with honors) in Information Technology in 2008. His research interests include Computational Intelligence, Data Mining, Machine Learning, Reliability Theory, Computer Networks and Pattern Recognition. He is an enthusiastic and motivating technocrat with more than 11 years of research and academic experience. He is a life member of the Indian Society for Technical Education (India), and a member of ACM and IEEE. Thinagaran Perumal is currently a Senior Lecturer at the Department of Computer Science, Faculty of Computer Science and Information Technology, Universiti Putra Malaysia. He is also currently appointed as Head of Cyber-Physical Systems in the university and has been elected as Chair of the IEEE Consumer Electronics Society Malaysia Chapter. He is the recipient of the 2014 Early Career Award from the IEEE Consumer Electronics Society for his pioneering contribution to the field of consumer electronics. He completed his Ph.D. at Universiti Putra Malaysia, in smart technology and robotics. His research interests are towards interoperability aspects of smart homes and Internet of Things (IoT), wearable computing, and cyber-physical systems. He is
also heading the National Committee on Standardization for IoT (IEC/ISO TC/G/16) as Chairman since 2018. Some of the eminent works include proactive architecture for IoT systems; development of the cognitive IoT frameworks for smart homes and wearable devices for rehabilitation purposes. He is an active member of IEEE Consumer Electronics Society and its Future Directions Committee on Internet of Things. He has been invited to give several keynote lectures and plenary talk on Internet of Things in various institutions and organizations internationally. He has published several papers in IEEE Conferences and Journals and is serving as TPC member for several reputed IEEE conferences. He is an active reviewer for IEEE Internet of Things Journal, IEEE Communication Magazine, IEEE Sensors Journal, and IEEE Transaction for Automation Science and Engineering, to name a few. Adam Slowik (IEEE Member 2007; IEEE Senior Member 2012) received the B.Sc. and M.Sc. degrees in computer engineering and electronics in 2001 and the Ph.D. degree with distinction in 2007 from the Department of Electronics and Computer Science, Koszalin University of Technology, Koszalin, Poland. He received the Dr. habil. degree in computer science (Intelligent Systems) in 2013 from the Department of Mechanical Engineering and Computer Science, Czestochowa University of Technology, Czestochowa, Poland. Since October 2013, he has been an Associate Professor in the Department of Electronics and Computer Science, Koszalin University of Technology. His research interests include Soft Computing, Computational Intelligence, and, particularly, Bio-inspired Optimization algorithms and their engineering applications. He is a reviewer for many international scientific journals. He is an author or co-author of over 100 refereed articles in international journals, two books, and conference proceedings, including one invited talk. He is an editor of two books Swarm Intelligence Algorithms published in 2020 by Taylor & Francis Group (CRC Press). He was a Program Chair during International Conference on Advanced Intelligent Systems and Informatics (2020). Many times, he was a Guest Editor in Special Issues which were organized in such journals as IEEE Transactions on Industrial Informatics, IEEE Transactions on Industrial Electronics, IEEE Transactions on Fuzzy Systems. Dr. Slowik is an Associate Editor of the IEEE Transactions on Industrial Informatics. He is a member of the program committees of several important international conferences in the area of Artificial Intelligence and Evolutionary Computation. He was a recipient of one Best Paper Award (IEEE Conference on Human System Interaction—HSI 2008). Dr. Slowik is a Head of the Department of Computer Engineering at Koszalin University of Technology. João Manuel R. S. Tavares graduated in Mechanical Engineering at the Universidade do Porto, Portugal in 1992. He also earned his M.Sc. degree and Ph.D. degree in Electrical and Computer Engineering from the Universidade do Porto in 1995 and 2001, and attained his Habilitation in Mechanical Engineering in 2015. He is a senior researcher at the Instituto de Ciência e Inovação em Engenharia Mecânica e Engenharia Industrial (INEGI) and Full Professor at the Department of Mechanical Engineering (DEMec) of the Faculdade de Engenharia da Universidade do Porto (FEUP). João Tavares is co-editor of more than 75 books, co-author of more than 50
book chapters, 650 articles in international and national journals and conferences, and three international and three national patents. He has been a committee member of several international and national journals and conferences, is co-founder and co-editor of the book series Lecture Notes in Computational Vision and Biomechanics published by Springer, founder and Editor-in-Chief of the journal Computer Methods in Biomechanics and Biomedical Engineering: Imaging & Visualization published by Taylor & Francis, Editor-in-Chief of the journal Computer Methods in Biomechanics and Biomedical Engineering published by Taylor & Francis, and co-founder and co-chair of the international conference series: CompIMAGE, ECCOMAS VipIMAGE, ICCEBS and BioDental. Additionally, he has been (co-)supervisor of several M.Sc. and Ph.D. theses and supervisor of several post-doc projects, and has participated in many scientific projects both as researcher and as scientific coordinator. His main research areas include computational vision, medical imaging, biomedical engineering, biomechanics, scientific visualization, human-computer interaction and new product development.
An Efficient Self-embedding Fragile Watermarking Scheme Based on Neighborhood Relationship
Anupam Shukla, Ifsha Wadhwa, Aakanksha Gupta, Simran Jaglan, and Shivendra Shivani
Abstract As technology grows at a drastic rate, threats to confidential images have also increased. Various software tools have been developed with which anyone can easily manipulate images with malevolent intent or hide information inside them. The motive of this paper is to propose an approach through which we can detect whether an image has been tampered with. To detect tampering, we use a pixel-wise bit-slicing approach. This approach is efficient in that it can locate the precise tampered area; for this, we use the last three least significant bits (LSBs), derived from the first five most significant bits (MSBs). If there is any tampering, the XOR operator applied to the bits of the original and tampered images locates the area of distortion. We have shown the accuracy of this algorithm by testing it on various images under attack. Keywords Fragile watermarking · Self-embedding technique · Bit slicing approach · XOR operator · LSB-based approach
1 Introduction
Invasion of privacy has always been a threat, and with the introduction of the Internet, the risk of exposure of our daily life has also increased. Digital media is used by every generation, and people upload images which show details of their daily life. Knowing whether an image is genuine has become a very difficult task, as the availability of various image manipulation applications has made it very easy to tamper with images. To counter such tampering, a scheme has been developed: digital watermarking. This technique helps to identify the location where an image has been tampered. In it, we embed a watermark by editing the LSBs of an image, which are the
authentication bits, so if anyone tries to manipulate an image we can easily locate the tampered area. Digital watermarking is of various kinds: robust, semi-fragile and fragile watermarking.
(1) Robust watermarking: the watermark is not affected at all by tampering with the image. It is mainly used for copyright protection.
(2) Semi-fragile watermarking: can detect only some kinds of tampering; the modifications to an image carrying a semi-fragile watermark are mainly unintentional ones.
(3) Fragile watermarking: very sensitive to modification; the slightest manipulation of the image makes the watermark lose its integrity. Its main application is tamper detection.
Under fragile watermarking, schemes are categorized into block-based and pixel-based schemes. In block-based schemes, the image is divided into 4 × 4 or 8 × 8 blocks, a watermark is embedded in each block, and when the image is tampered the whole block is marked as tampered [1–3]. Dhole et al. proposed a scheme that uses a non-sequential block chain based on a secret key to detect the tampered area and then recover the host image; this scheme can efficiently detect the erroneous block but not the exact location of tampering. In pixel-based watermarking, the watermark is embedded in the bits of every pixel, so when the image is tampered we can locate the exact pixels [4–7]. Qin et al. proposed a scheme that derives the self-embedding bits from the mean value of each overlapping block and embeds the authentication bit into the center pixel of each block; this scheme can detect the tampered area even at a large tampering rate. In this paper, we use a pixel-based fragile watermarking scheme because it highlights the exact manipulated pixels, whereas a block-based scheme highlights the whole block; in this sense, the pixel-based scheme is more precise than the block-based one. The proposed scheme can identify any attack with high probability. In this scheme, we introduce valid-dependent-integral (VDI) bits. These bits help in localizing the tampered area as they are embedded as a watermark in the image. The introduced bits are inserted as the last 3 LSBs, which are derived from the first 5 MSBs: the valid bit is obtained by XOR operations on the 5 MSBs of the selected pixel, the dependent bit is obtained by XOR, multiplication and mod operations on the neighboring pixels, and the integral bit is based on the valid and dependent bits. The rest of this paper is organized as follows. Section 2 reviews the related literature, Sect. 3 presents the proposed scheme, Sect. 4 gives experimental results and analysis, and Sect. 5 concludes the paper.
2 Literature Review
Fang Cao et al. proposed a scheme [1] that treats the MSBs individually, with different extension ratios according to their importance for image visual quality. The individual data are segmented into a series of groups corresponding to the divided non-overlapping blocks and then embedded into the LSBs of the block together with its authentication bit. Dhole et al. [2] proposed a modified fragile watermarking technique for image recovery that uses a non-sequential block chain and a random block chain based on a secret key. Eswaraiah et al. worked on a scheme [8] for securing medical images through a block-based watermarking technique in which a region of interest (ROI) is selected and the watermark is embedded there. Huang et al. [9] proposed a divide-and-conquer fragile self-embedding watermarking with adaptive payload for digital images; a GBVS model is adopted to identify the ROI and ROB, and the algorithm protects the higher-priority ROI by two procedures, backup information collection and payload allocation. Kiatpapan et al. proposed a scheme [4] based on self-embedding dual watermarking that can detect the tampered area in the host image and also recover it. Qin et al. [5] used a hashing method to generate the authentication bit for the host image; NSCT coefficients are used to embed the restoration bit, which provides restoration ability, and the host image is divided into smooth and complex blocks, with complex blocks given higher priority. Rakhmawati et al. [3] proposed a self-embedding watermarking algorithm that can detect the tampered area in the host image and also restore it. Singh et al. [10] presented a discrete cosine transform (DCT) based self-recoverable fragile watermarking technique; it is effective because its two authentication bits feed two levels of a hierarchical tamper detection mechanism, giving a high probability of detecting the tampered area. Tong et al. proposed a fragile watermarking scheme [11] that uses a cross-chaotic map to confuse the blocks formed from the original image; a combination of MSBs and LSBs is inserted into the LSBs of the host image, improving tamper detection and recovery. Veni et al. worked on a scheme [7] that combines discrete wavelet transform and singular value decomposition (DWT-SVD) with the fruit fly algorithm; the host image is divided into sub-bands, the value of each sub-band is calculated, and the low bands are kept for further processing.
3 Proposed Scheme
The proposed scheme can be divided into three steps: the first step generates the value of the valid bit, the second derives the value of the dependent bit, and the third finds the value of the integral bit. The values of these bits are embedded into the image as a watermark, and when the image is tampered, we can easily locate the affected area (Fig. 1).
Fig. 1 Block representation of the proposed approach
For this, we take a grayscale image of size 'm × n', where the total number of pixels is N = m × n. The value of every pixel lies between 0 and 255, that is, P_i ∈ {0, …, 255}, where i = 1, 2, 3, …, N. Each bit of a pixel can be represented in the form b(P_i, 0), b(P_i, 1), …, b(P_i, 7), where b denotes a bit of pixel P_i. An individual bit is given by Eq. 1:
b(P_i, u) = ⌊P_i / 2^u⌋ mod 2   (1)
The bits of a particular pixel compose its value as given in Eq. 2:
P_i = Σ_{u=0}^{7} b(P_i, u) · 2^u   (2)
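For concreteness, the bit operations of Eqs. 1 and 2 can be written as a short Python sketch (our own illustration, not part of the original scheme):

```python
def bit(p: int, u: int) -> int:
    """b(P_i, u) from Eq. 1: the u-th bit of pixel value p (u = 0 is the LSB)."""
    return (p >> u) & 1  # same as floor(p / 2**u) mod 2

def pixel_from_bits(bits) -> int:
    """Eq. 2: reassemble a pixel value from its eight bits b(P_i, 0..7)."""
    return sum(b << u for u, b in enumerate(bits))

p = 173  # 10101101 in binary
assert pixel_from_bits([bit(p, u) for u in range(8)]) == p
```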
As we know, the MSBs contain the structural information of a pixel while the LSBs contain the detailed information. Hence, changes in the LSBs cause minimum distortion in the image. In the proposed approach, the first three LSBs are chosen for embedding. Changes in these bits may modify the intensity by 0 (zero) to 7 (seven) units; such changes cannot be recognized by the human visual system, and therefore deriving the last three LSBs from the first five MSBs for watermarking brings minimum change with maximum impact.
Step 1. The first LSB (valid bit) of the pixel is a sequence-bound bit. To obtain it, we perform XOR operations among the 5 MSBs of the pixel. If the current bit is at an end of the 5 MSBs, the two MSBs at the other end are XORed and the resultant bit is then XORed with the current bit; if the current bit is not at an end, both adjacent bits are XORed and the resultant bit is XORed with the current bit. Through this process we get 5 bits, on which we again perform XOR operations until one bit remains. The pattern is represented in Fig. 2; suppose the 5 bits are 01010.
Fig. 2 Representing the pattern for generation of first LSB
Step 2. The second LSB (dependent bit) binds the pixel to its adjacent pixels' positions. It is obtained by XORing the 5 MSBs of the pixels adjacent to the current pixel, multiplying each resultant bit with itself, and taking the sum mod 2 as the second LSB of the current pixel. For current pixel P_i this is expressed in Eq. 3:
b(P_{i−1}, 3) ⊕ b(P_{i+1}, 3), b(P_{i−1}, 4) ⊕ b(P_{i+1}, 4), …, b(P_{i−1}, 7) ⊕ b(P_{i+1}, 7) give R_0, …, R_4, and
[(R_0 · R_0) + (R_1 · R_1) + … + (R_4 · R_4)] mod 2   (3)
where R_0, R_1, …, R_4 are the obtained bits.
Fig. 3 Experimental results: a host images, b watermarked images, c tampered images, d detection results
Step 3. In this step, we place the integral bit at the last LSB. We use the property of XOR, i.e., a ⊕ b = c, where a and b are our first and second LSBs. After that, the combination of the resultant and a pseudo-random matrix of size 'm × n' generated using a secret key is placed at the last LSB.
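The three embedding steps can be summarized in code. The following Python sketch is our own reconstruction under stated assumptions, not the authors' implementation: we read P_{i−1} and P_{i+1} as the horizontal neighbors in raster order, place the valid, dependent and integral bits at bit positions 0, 1 and 2 respectively, and interpret the "combination" with the pseudo-random matrix in Step 3 as an XOR; prn denotes a key-seeded 0/1 matrix.

```python
import numpy as np

def valid_bit(msbs):
    """Step 1 (one reading): fold the five MSBs into a single bit.
    For an end bit, XOR the two bits at the opposite end first;
    otherwise XOR the two adjacent bits; then XOR with the current bit."""
    n = len(msbs)
    folded = []
    for j in range(n):
        if j == 0:
            neigh = msbs[n - 1] ^ msbs[n - 2]
        elif j == n - 1:
            neigh = msbs[0] ^ msbs[1]
        else:
            neigh = msbs[j - 1] ^ msbs[j + 1]
        folded.append(neigh ^ msbs[j])
    v = 0
    for b in folded:          # XOR the five resulting bits down to one
        v ^= b
    return v

def dependent_bit(left, right):
    """Step 2 / Eq. 3: XOR the five MSBs (bits 3-7) of the two neighbours,
    then take the sum of squares of the results mod 2."""
    r = [((left >> u) & 1) ^ ((right >> u) & 1) for u in range(3, 8)]
    return sum(b * b for b in r) % 2

def embed(img, prn):
    """Embed the valid/dependent/integral bits into the 3 LSBs of each pixel.
    img: uint8 grayscale array; prn: key-seeded 0/1 matrix of the same shape.
    The 5 MSBs are untouched by embedding, so neighbour MSBs stay consistent."""
    wm = img.copy()
    h, w = img.shape
    for y in range(h):
        for x in range(1, w - 1):                 # needs both neighbours
            p = int(img[y, x])
            msbs = [(p >> u) & 1 for u in range(3, 8)]
            v = valid_bit(msbs)
            d = dependent_bit(int(img[y, x - 1]), int(img[y, x + 1]))
            i_bit = (v ^ d) ^ int(prn[y, x])      # Step 3: integral bit
            wm[y, x] = (p & 0b11111000) | v | (d << 1) | (i_bit << 2)
    return wm
```

Because embedding only rewrites the three LSBs, the five MSBs that all three bits depend on are identical in the host and watermarked images, which is what makes the scheme self-embedding.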
4 Experimental Result and Analysis
Experiments have been performed on different sets of images; in this section, tamper detection results for a few images are demonstrated. Figure 3 shows the experimental results, where three medical images are taken as test data. Image sets (a) and (b) show the host and watermarked images, respectively. Here, one can see the high imperceptibility of the proposed scheme, as both sets of images are almost identical. Image set (c) shows the altered images: intentional attacks have been performed on each image that cannot be perceived by the human visual system. As these are medical images, even a minor change may cause a wrong diagnosis. Image set (d) shows the tamper detection results, where black pixels mark the non-tampered region and white pixels the tampered region. We are able to detect almost 90% of the tampered pixels at their exact locations, which proves the efficacy of the proposed scheme. Table 1 shows the quantitative analysis of the proposed approach for five test images. Column 1 gives the name of the watermarked image; column 2 the imperceptibility in terms of peak signal-to-noise ratio (PSNR) between the cover and watermarked images; column 3 the number of tampered pixels after an intentional or unintentional attack; and column 4 the accuracy of the proposed approach in terms of the number of detected pixels.
Table 1 Tamper detection accuracy and imperceptibility results
Images             PSNR (dB)   Tampered pixels   Detected pixels
Medical image 1    35.9        85                76
Medical image 2    34.2        164               152
Medical image 3    34.7        72                61
Lena               36.1        276               251
Baboon             36.8        187               174
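The detection side is not spelled out step by step in the text; a plausible procedure consistent with the self-embedding design is to recompute the three bits from the five MSBs of the received image and flag every pixel whose stored LSBs disagree. The sketch below is ours, reusing the valid_bit and dependent_bit helpers (and the same assumptions) from the embedding sketch above:

```python
import numpy as np

def detect(received, prn):
    """Return a 0/1 tamper map: 1 where the recomputed VDI bits disagree
    with the bits stored in the 3 LSBs of the received image (assumed reading)."""
    h, w = received.shape
    tamper = np.zeros((h, w), dtype=np.uint8)
    for y in range(h):
        for x in range(1, w - 1):
            p = int(received[y, x])
            msbs = [(p >> u) & 1 for u in range(3, 8)]
            v = valid_bit(msbs)                    # helper from the sketch above
            d = dependent_bit(int(received[y, x - 1]), int(received[y, x + 1]))
            i_bit = (v ^ d) ^ int(prn[y, x])
            if (p & 0b111) != (v | (d << 1) | (i_bit << 2)):
                tamper[y, x] = 1
    return tamper
```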
5 Conclusion
This paper proposes a self-embedding fragile watermarking scheme and reviews various block-based and pixel-based algorithms. We take the three LSBs of each pixel, and each LSB is embedded through a different technique; the contribution of these three LSBs under our algorithm makes tamper detection in the host image more efficient. Quantitative and graphical experimental results show the accuracy and efficiency of the proposed approach.
References
1. Fang, C., An, B., Wang, J., Ye, D., Wang, H.: Hierarchical recovery for tampered images based on watermark self-embedding. Displays (2017)
2. Dhole, V.S., Patil, N.N.: Self embedding fragile watermarking for image tampering detection and image recovery using self recovery blocks. In: 2015 International Conference on Computing Communication Control and Automation. IEEE (2015)
3. Singh, D., Singh, S.K.: Effective self-embedding watermarking scheme for image tampered detection and localization with recovery capability. CrossMark 1047-3203 (2016)
4. Kiatpapan, S., Kondo, T.: An image tamper detection and recovery method based on self-embedding dual watermarking. IEEE, 978-1-4799-7961-5 (2015)
5. Qin, C., Ji, P., Zhang, X., Dong, J., Wang, J.: Fragile image watermarking with pixel-wise recovery based on overlapping embedding strategy. Signal Process. S0165-1684(17), 30125-30131 (2017)
6. Rakhmawati, L., Wirawan, W., Suwadi, S.: A recent survey of self-embedding fragile watermarking scheme for image authentication with recovery capability. Springer Open, s13640-019-0462-3 (2019)
7. Zhang, X., Wang, S.: Statistical fragile watermarking capable of locating individual tampered pixels. IEEE Signal Process. Lett. 14(10) (2007)
8. Eswaraiah, R., Sreenivasa Reddy, E.: ROI-based fragile medical image watermarking technique for tamper detection and recovery using variance. IEEE (2014)
9. Huang, R., Liu, L., Liao, X., Sun, S.: A divide-and-conquer fragile self-embedding watermarking with adaptive payload. Springer
10. Tong, X., Liu, Y., Zhang, M., Chen, Y.: A novel chaos-based fragile watermarking for image tampering detection and self-recovery. CrossMark 301-308 (2013)
11. Veni, M., Meyyappan, T.: Digital image watermark embedding and extraction using oppositional fruit fly algorithm. Springer, s11042-019-7650-0 (2019)
GeoCloud4EduNet: Geospatial Cloud Computing Model for Visualization and Analysis of Educational Information Network
Chandrima Roy, Ekansh Maheshwari, Manjusha Pandey, Siddharth Swarup Rautaray, and Rabindra K. Barik
Abstract Information regarding a particular college is available online, but a centralized system that maps all the colleges and their faculties to their research works is needed for aspiring students who wish to join their preferred department. In the present research paper, a geospatial cloud computing model, GeoCloud4EduNet, is proposed for the educational information infrastructure to tackle the big data problem; it works in a distributed computing environment. As a case study, this paper focuses on the National Institutes of Technology (NITs), their various departments, and the specializations and research interests of their faculties. NITs are autonomous public institutes of higher education located in India. The key challenge is for students to connect with the right faculty in their domain, as this kind of issue has not been addressed in the past. Keywords Big data · Education · Research interest · Centralized model
1 Introduction The geospatial cloud is opening up to many companies, utilities and governments the field of geospatial modeling and analysis. In the cloud environment, geospatial
technologies have generated an entirely different architecture and scope for geospatial applications. For big data processing and computation, the geospatial cloud is flexible and accommodates usage spikes during critical events. This research uses a geospatial cloud computing model that can work on the collected geospatial database. In this case study, there are many different branches available in the NITs, and it is tough to select the best one; the choice also varies from one NIT to another. While selecting a branch in a particular NIT, candidates should consider the faculty strength as well as the domain interests of specific faculty members. In a time of ever-escalating data and information, centralization of information is needed: otherwise, a student has to search every college website manually, navigate to each faculty page and search for research interests, which is very time consuming and ineffective because the student cannot compare every NIT at a glance. This research developed a distributed and centralized system to find colleges and relevant faculty very efficiently [1]. First, a database was created manually that has all the information about the faculties in one place; after that, K-means clustering is performed on the data. This kind of large dataset cannot be processed using a simple query, which is why this research builds the model around tools capable of handling vast amounts of data.
2 Related Works
2.1 Cloud Computing
Cloud computing is an omnipresent computing model that has pioneered the delivery of computing facilities and services [2]. Universities in third world countries face social and economic obstacles that affect their ability to invest extensively in information technology to compete globally. Cloud computing vendors enable users to automatically increase or reduce their use of a service based on their needs, using a pay-per-use model. Massive growth has been observed in the scale of data [3], or big data, produced by cloud computing. Big data allows users to employ distributed computing to process distributed queries across various databases in a timely manner and return the resulting sets, and it uses cloud-based distributed storage systems instead of local storage attached to an electronic device [4]. In the geospatial cloud model, data semantics play an incredibly important role by providing geospatial data with semantic requirements and thus allowing information sharing and scalability.
2.2 Geospatial Big Data
Geospatial big data denotes spatial data sets that exceed the capacity of current computing systems. A large proportion of big data is essentially geospatial information, and at least 20% of such data grows every year [5]. Cloud Geographic Information Systems (GISs) have evolved as a platform for geospatial data modeling, storage and distribution [6]. GIS is used in business decisions, storing various data types, delivering data and maps at a standard user-specific level, recreating and verifying data, building and testing, and sending reports to authorities. Spatial Data Infrastructure (SDI) is a central mechanism for web-based sharing of geospatial big data [7]. The integration of cloud computing into SDI resulted in the formation of cloud-SDI as a tool for geospatial data transfer, processing and analysis. Geospatial Big Data (GBD) can commonly be viewed as data sets that contain locational information and are beyond the capability of commercially available infrastructure, software and information technology [8].
2.3 Educational Information Network
Geospatial web services are among the main software specifications for SDI design and deployment. SDI's development and implementation are based on service oriented architecture (SOA), which is used to exchange comprehensive educational information on the web [9]. SDI creates an atmosphere in which organizations engage with technology to facilitate the use, management and development of geographic information [10]. The service oriented architecture attempts to build a decentralized, adaptive, scalable and Internet-based reconfigurable service infrastructure that can meet SDI design knowledge and service needs [11].
3 Proposed Work
The main aim of the present work is the development of a web-based prototype, GeoCloud4EduNet, for sharing the educational institute data network in India. The goal of the designed model is to provide a medium for the distribution of geospatial information as queried by the user through a web interface. It is possible to collect geospatial information, map it to the recorded data and upload it to the cloud database server. Figure 1 shows the system architecture of the geospatial cloud model, which consists of a client side and a server side; the client sends a request to the database service to get the desired result. After the preparation of the dataset, raw data is transformed into an understandable format which can further be analyzed for different purposes.
Fig. 1 Conceptual overview of GeoCloud4EduNet framework
Along with this, the selection of evaluation metrics for further analysis takes place. Next, this work computes the faculty count for each NIT, and the distribution of faculty across the various departments is visualized by pie charts and bar graphs using clustering.
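Purely as an illustration of the client-server flow of Fig. 1 (the paper does not publish its implementation), a minimal request handler could look as follows; Flask, SQLite, the /faculty route and the column names are all our assumptions:

```python
from flask import Flask, jsonify, request
import sqlite3  # stand-in for the cloud database service in Fig. 1

app = Flask(__name__)

@app.route("/faculty")
def faculty():
    """The client side sends a query; the server side asks the database
    service and returns the matching faculty records."""
    area = request.args.get("research_area", "")
    con = sqlite3.connect("geocloud.db")
    rows = con.execute(
        "SELECT university, faculty, department, latitude, longitude "
        "FROM faculty WHERE research_areas LIKE ?", (f"%{area}%",)).fetchall()
    con.close()
    return jsonify(rows)

if __name__ == "__main__":
    app.run()
```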
4 Result and Discussion
The National Institutional Ranking Framework (NIRF) was approved by the Ministry of Human Resource Development (MHRD) to rank institutions, including each NIT, across the country [12]. While ranking a college, the following six factors are considered: placement percentage, placement packages, infrastructure, faculty, brand value and cut-off ranks. Among these, this research considers the faculty factor. Much research is going on in the various NITs in India, but as there is no centralized system, very little of it is known to everyone. This research work proposes a geospatial cloud computing model for the educational infrastructure in India; using this system, students can easily find different research areas and the faculties associated with them [3, 13].
4.1 Dataset Preparation
No existing dataset is available online that covers information about the various NITs, their departments, research areas, the corresponding faculties and their universities as well as locations. So, we prepared our own dataset that encompasses information about all the NITs in India [14]. The data has been collected from the official websites of the various NITs and verified by government-approved agencies. The dataset contains attributes such as: university name, faculty name, faculty designation, research areas, qualification, email ID for contact, department, and latitude and longitude for location purposes.
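As an illustration of how such a manually prepared table can enter the analysis pipeline, consider the following hedged sketch; the file name nit_faculty.csv and the exact column labels are ours, chosen only to mirror the attributes listed above:

```python
import pandas as pd

# Hypothetical file name and column labels mirroring the listed attributes.
df = pd.read_csv("nit_faculty.csv")   # one row per faculty member
df = df[["University", "Faculty", "Designation", "ResearchAreas",
         "Qualification", "Email", "Department", "Latitude", "Longitude"]]

# Faculty strength per NIT, the quantity later plotted in Fig. 3.
strength = df.groupby("University")["Faculty"].count().sort_values(ascending=False)
print(strength.head())
```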
4.2 Data Visualization
Since visualization is a significant aid in building SDI's geospatial database, the free and open source Quantum GIS software has been used to visualize the data and its results and to take advantage of working in a distributed environment, as represented in Fig. 2.
Fig. 2 Visualization of all NIT's India
4.3 Analysis of Test Case
In the present study, Python tools are used to perform K-means cluster analysis [15]. The input is the set of data points represented in the dataset, and we take k = 3 as input, i.e., three clusters will be produced. The algorithm starts by placing the centroids at random locations in the vector space. For each individual data point, it computes the distance to every cluster centroid, selects the cluster whose centroid is at the minimum distance, and assigns the data point to that centroid [3, 4]. It then iterates over the K clusters and, for each centroid, recomputes its position from the data points that fall into that cluster. Three colors are used for the clusters: green represents high, blue medium and red low.
In Fig. 3, a scatter plot shows the percentage of faculty present in the various NITs. The total faculty strength of NIT Rourkela is 304, the highest among all the NITs; NIT Raipur has 251 and NIT Warangal 243 faculty members, respectively. The work above estimates the percentage-wise faculty distribution across departments for two NITs, NIT Tiruchirappalli and NIT Warangal. A pie chart is used to depict the distribution of faculty by department (Fig. 4): in NIT Tiruchirappalli, the Mechanical Engineering department has the most faculty at 9.04% (20 out of 221), whereas in NIT Warangal, the Electronics and Communication Engineering department has the most faculty at 11.52% (28 of 243) of the total faculty of the respective NIT. A striking aspect of Fig. 5 is that the majority of the faculty of NIT Tiruchirappalli project their interest in core computer science subjects such as Internet of things, theory of computation and algorithms, whereas in NIT Warangal most of them are into machine learning.
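A minimal sketch of the clustering step described above is given below. The paper states that Python tools were used but does not name a library; we use scikit-learn here, and only the four faculty counts quoted in the text, so this is an illustration rather than the authors' exact experiment:

```python
import numpy as np
from sklearn.cluster import KMeans

# Faculty strength per NIT; the four counts below are those quoted in the
# text, and the full table would come from the prepared dataset.
counts = {"NIT Rourkela": 304, "NIT Raipur": 251,
          "NIT Warangal": 243, "NIT Tiruchirappalli": 221}
X = np.array(list(counts.values()), dtype=float).reshape(-1, 1)

# k = 3 as in the paper: clusters for high / medium / low faculty strength.
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)

# Map cluster ids to the colour semantics used in the figures,
# ordering the clusters by centroid value (low < medium < high).
order = np.argsort(km.cluster_centers_.ravel())
label = {order[0]: "low (red)", order[1]: "medium (blue)", order[2]: "high (green)"}
for name, cid in zip(counts, km.labels_):
    print(f"{name}: {label[cid]}")
```

Sorting the cluster centroids before labeling keeps the green/blue/red semantics stable, since K-means itself assigns cluster ids arbitrarily.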
5 Conclusion This paper proposes a geospatial cloud computing model that presents visualization and analysis of the faculty strength of the NITs in India: how many faculty members each department has, which NIT is strong in which department, and so on.
Fig. 3 Percentage of faculty in each NIT
Fig. 4 Percentage-wise faculty distribution in NIT Warangal and NIT Tiruchirappalli for each department
The aim of this paper is to show how the research potential and strength of different NITs can be measured and visualized. By summarizing the analysis of the different groups of faculty across the departments of the studied institutes, this research makes it easier for students to choose their preferred NIT and department. As future work, fog computing can be used for data analysis, and the system can be implemented in an IoT and geospatial edge computing environment.
Fig. 5 CSE research area-based distribution for NIT Tiruchirappalli and NIT Warangal
References
1. Maheshwari, E., Roy, C., Pandey, M., Rautray, S.S.: Prediction of factors associated with the dropout rates of primary to high school students in India using data mining tools. In: Frontiers in Intelligent Computing: Theory and Applications, pp. 242–251. Springer, Singapore (2020)
2. Sabi, H.M., Uzoka, F.M.E., Langmia, K., Njeh, F.N.: Conceptualizing a model for adoption of cloud computing in education. Int. J. Inf. Manage. 36(2), 183–191 (2016)
3. Roy, C., Barua, K., Agarwal, S., Pandey, M., Rautaray, S.S.: Horizontal scaling enhancement for optimized big data processing. In: Emerging Technologies in Data Mining and Information Security, pp. 639–649. Springer, Singapore (2019)
4. Hashem, I.A.T., Yaqoob, I., Anuar, N.B., Mokhtar, S., Gani, A., Khan, S.U.: The rise of "big data" on cloud computing: review and open research issues. Inf. Syst. 47, 98–115 (2015)
5. Lee, J.G., Kang, M.: Geospatial big data: challenges and opportunities. Big Data Research 2(2), 74–81 (2015)
6. Barik, R.K.: CloudGanga: cloud computing based SDI model for Ganga river basin management in India. In: Geospatial Intelligence: Concepts, Methodologies, Tools, and Applications, pp. 278–297. IGI Global (2019)
7. Barik, R.K., Dubey, H., Mankodiya, K., Sasane, S.A., Misra, C.: GeoFog4Health: a fog-based SDI framework for geospatial health big data analysis. J. Ambient. Intell. Humaniz. Comput. 10(2), 551–567 (2019)
8. McCoy, M.D.: Geospatial big data and archaeology: prospects and problems too great to ignore. J. Archaeol. Sci. 84, 74–94 (2017)
9. Barik, R.K., Lenka, R.K., Samaddar, A.B., Pattnaik, J., Prakash, B., Agarwal, V.: mGeoEduNet: mobile SDI model for education information infrastructure network. In: Advances in Electronics, Communication and Computing, pp. 291–300. Springer, Singapore (2018)
10. Barik, R.K., Samaddar, A.B.: Service oriented architecture based SDI model for education sector in India. In: Proceedings of the International Conference on Frontiers of Intelligent Computing: Theory and Applications (FICTA) 2013, pp. 555–562. Springer, Cham (2014)
11. Roy, C., Rautaray, S.S., Pandey, M.: Big data optimization techniques: a survey. Int. J. Inf. Eng. Electron. Bus. (IJIEEB) 10(4), 41–48 (2018)
12. Roy, C., Pandey, M., Swarup Rautaray, S.: A proposal for optimization of data node by horizontal scaling of name node using big data tools. In: 2018 3rd International Conference for Convergence in Technology (I2CT), pp. 1–6. IEEE (2018)
13. Lowe, R.A., Gonzalez-Brambila, C.: Faculty entrepreneurs and research productivity. J. Technol. Transf. 32(3), 173–194 (2007)
14. Austin, A.E., Sorcinelli, M.D., McDaniels, M.: Understanding new faculty background, aspirations, challenges, and growth. In: The Scholarship of Teaching and Learning in Higher Education: An Evidence-Based Perspective, pp. 39–89. Springer, Dordrecht (2007)
15. Van Raan, A.F.: Challenges in ranking of universities. In: Invited paper for the First International Conference on World Class Universities, Shanghai Jiao Tong University, Shanghai, pp. 133–143 (2005)
Political Polarity Classification Using NLP Sagi Harshad Varma and Yanamandra Venkata Sree Harsha
Abstract Twitter is a popular platform for sharing our perspective with the world and perceiving the expert opinions of popular figures on day-to-day affairs. Politicians also find an outlet here for their campaigns, thereby reaching a vast audience online. In this research, we focus on identifying and classifying the subtleties of such political tweets that aim to influence the crowd. However, the noise in the dataset arising from linguistic anomalies makes it challenging to apply direct classification methods. We begin by preprocessing the raw tweets to tackle grammatical and semantic issues. Further, natural language processing (NLP) tools such as Word2Vec, which help preserve semantic and syntactic relationships, are incorporated. Classification accuracy is affected by this technique because grammatical structures are distorted by Word2Vec; bigram counts of special tokens are added to the resulting set of features to solve this problem. A Receiver Operating Characteristic (ROC) curve is used to measure accuracy with different sets of features, once using a Naïve Bayes classifier and once using a random forest. Keywords NLP · CNN · Machine learning
1 Introduction and Problem Understanding Tweets are a common way for political candidates to express their opinions about current affairs. Since the arrival of Web 2.0, microblogging platforms have become political instruments and reveal the political attitudes of candidates all over the world (excluding countries with censored access to the Internet).
All authors have contributed equally S. H. Varma (B) School of Computer Science and Engineering, VIT University, Vellore, Tamil Nadu, India e-mail: [email protected] Y. V. S. Harsha Department of Computer Science, BITS Pilani, Hyderabad, India © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 S. Roy et al. (eds.), Innovations in Computational Intelligence and Computer Vision, Advances in Intelligent Systems and Computing 1424, https://doi.org/10.1007/978-981-19-0475-2_3
A group of Mexican anthropological researchers took on the task of building a taxonomy for classifying the political attitudes that can be identified in tweets. Their hypothesis is that those attitudes are correlated with future campaign proposals. The taxonomy developed was the following, and the tweets in this paper are classified into these same categories:

1. Proactive: Tweets under this category aim to generate information about the candidate's personal virtues, proposed program, and the party's current efforts.
2. Reactive: Seek to neutralize adversaries' derisions and any infamous depiction in the media.
3. Aggressive: Emphasize the negative traits of enemies or defame opponents.
4. Vote winner: Demagogic speech aimed at winning a political advantage.
Many candidates in the Mexican EdoMex governor election use social networks to react to ongoing events. Each reaction reflects topics associated with the candidate's campaign. In this research, our aim is to perform attribute extraction using Spanish and political corpora, which has never been done in Mexico. Examples of previous work on the political milieu can be found in [1–3]. However, tweets are very limited in the following ways [4]:

• Data sparsity.
• The changing nature of language due to trending topics.
• Political candidates usually make use of jargon and informal language.
• Lack of context from the text; the text field is limited to 140 characters.
• Short irregular forms may be used.
As a result, if we implement a machine learning classifier over the raw tweets headfirst, the results will be inaccurate. Our objective is to find a good instance representation that our machine learning models can use uniformly, that expresses the latent attributes present in the data, and that reduces the noise produced by redundant data. To achieve this objective, we use common preprocessing steps that help with semantic and grammatical decomposition and incorporate domain knowledge into our search space. Previous attribute extraction based on an ensemble of natural language techniques and classifiers improves the overall attitude classification precision on unseen tweets. These techniques are shown in Sect. 4; they include deep learning techniques, and each is tailored to a single aspect of the problem. Deep neural networks are essentially artificial neural networks that contain several layers. Convolutional neural networks (CNNs) are among the most widely used deep neural networks, and they have proved useful in many realms, from heuristic design to natural language processing and computer vision. To manipulate the dataset for this study, we apply a CNN-based approach for extracting the latent features mentioned earlier. One reliable technique is Word2Vec, which utilizes a neural network model to learn the correlations between the words present in the dataset. This method is capable of preserving both semantic and syntactic representations, along with categorizing a bag-of-words when data is sparse. Further, we
test the extracted attributes on a bag-of-words classifier to measure the degree of improvement. Our objective is to improve classification precision by adding a new set of features to each tweet tuple. First, we go through the related works in Sect. 2 and identify the possible scope for improvement in the existing work. Based on our study, we propose an algorithm that involves an ensemble of three classifiers. We use a pre-trained model1 to process the tweets into a Word2Vec representation and apply Naïve Bayes classification to categorize the tweets into four political attitudes. Bigram counts of special tokens are added to the resulting set of features to improve classifier accuracy. Results show that the area under the ROC curve (AUC) increases significantly: from 0.65 (on average over the four political attitudes) using only bag-of-words, to 0.72 using only bigram features, to 0.85 using Word2Vec features, and finally to 0.857 using all the features. Having explored the data, we saw a tendency toward the proactive class. Thus, two classifiers are tested: a Naïve Bayes classifier, which adapts very well to binary features, and a random forest, which can use the features more uniformly. Section 2 details the related works, followed by our proposed algorithm for improving classification precision by adding a new set of features to each tweet tuple. We then use data scraped from Twitter with a political inclination in our experimental setup and process tweets relevant to the Mexican EdoMex governor election campaigns. We use the ROC curve to measure accuracy with different sets of features, once using a Naïve Bayes classifier and once using a random forest; the random forest was used to exploit the features more uniformly to improve the results. We then measure feature importance in the accuracy of each classifier. In the results and discussion section, each of the feature sets is evaluated individually; we further use confusion matrices to measure the performance of the multiclass classifiers. The paper ends with a conclusion and future work.
2 Previous Works Several authors have published research papers on topics that run parallel to our research interests. Sentiment analysis is an important aspect of our study for gaining insights from tweets about political polarity. It involves the analysis of the underlying emotions of a given text using NLP and other techniques [5]. In text classification on Twitter data [6], sentiment analysis (SA) within the area of natural language processing (NLP) is defined as the computational treatment of opinions, feelings, and subjectivity in text. This article mentions that early history indicates 2001 as the milestone at which widespread awareness began to arise around sentiment analysis,
1
2017 Spanish billion-word corpus and embeddings. URL https://crscardellino.github.io/SBWCE/.
with belief systems as forerunners. One of the major factors was the emergence of datasets for machine learning on the World Wide Web. The work on sentiment analysis in Twitter [4] brings to the table two of the first approaches with which the research community tackled the problem of SA. Turney [7] proposes the use of linguistic analysis. This kind of approach can be thought of as supervised because it relies on prior domain knowledge, e.g., Chomsky grammatical structures. At the other end we have [8], which proposes the use of classical machine learning techniques. Contrary to the approach taken by Turney, here we rely more on achieving high accuracy using an ensemble of different techniques, commonly ignoring grammatical structures, as in the case of a simplification using the bag-of-words representation. The bag-of-words representation gets its name from a passage by linguist Zellig Harris: "language is not merely a bag of words but a tool with particular properties." Norvig suggests we think of the model as "putting the words of the training corpus" in a bag and then selecting one word at a time. The notion of order is then lost, but we end up with a binary vector that we can neatly use in our machine learning classifiers. As aforementioned, one major hurdle is that we are limited to a 140-character text context. Furthermore, tweets generally lack a representative and syntactically consistent structure. The authors of an ontology-based sentiment analysis of Twitter posts [9] propose a sentiment grade for each distinct notion in the post using an ontology, instead of evaluating the post as a whole. They use the Formal Concept Analysis (FCA) algorithm proposed by Ganter and Wille, which applies a user-driven, step-by-step methodology for creating domain models, i.e., it creates an ontology specific to the bulk of tweets to be classified. Classification of tweets was done by rank per topic. They used a tool called OntoGen, in which a semi-supervised approach was possible. Through the lens of our work, topics and ontologies could prove useful when considering political parties, allies, government institutions, and commercial and foreign institutions. However, these ontologies must be built mostly from human annotations, a cost we cannot afford in this study. The approach taken by Pang, Lee, and Vaithyanathan was a bit different. Their paper measures how word of mouth (WOM) affects movie sales, negatively or positively. There were four tweet categories very similar to the ones we are measuring: intention tweets, positive tweets, neutral tweets, and negative tweets. Intention tweets are very similar to our vote winner category because an intention to win votes can be achieved either by aggressive or proactive tweets. The authors decided to use two well-known classical machine learning classifiers: Naïve Bayes and support vector machines. This approach is similar to the one proposed by Kontopoulos et al. [9], in which we harness the efficiency of classical machine learning algorithms by using meaningful instance representations. In work deriving market intelligence from microblogs [10], many approaches for feature extraction are mentioned: extracting frequent terms while measuring compactness, association rule mining to find syntax rules, ontologies, hypernyms (more general terms), and meronyms. However, most of the methods mentioned in the
introduction use unigrams, n-grams, and part-of-speech (POS) tags [4]. The next section explains our approach.
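As an illustration of the binary bag-of-words representation just described, the following sketch uses scikit-learn; the two Spanish sample tweets are placeholders.

```python
from sklearn.feature_extraction.text import CountVectorizer

tweets = ["vamos a trabajar por el estado",   # placeholder preprocessed tweets
          "gracias por su apoyo en campaña"]

# binary=True records only word presence, discarding order and counts,
# which is exactly the simplification the bag-of-words model makes.
vectorizer = CountVectorizer(binary=True)
bow_matrix = vectorizer.fit_transform(tweets)  # sparse binary matrix (tweets x vocabulary)
print(vectorizer.get_feature_names_out())
```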
3 Proposed Algorithm We propose an ensemble of features that can be a better representation than the bag-of-words alone. Although Word2Vec preserves semantic and syntactic relationships, it does not preserve grammatical structures. This drawback can be compensated for by using bigram structures of special tokens, in which order is still maintained. The algorithm produces a vector representation of tweets that is then used by a Naïve Bayes classifier or a random forest classifier. The algorithm for vectorizing the tweets can be described as follows. Our objective is to improve classification precision by adding a new set of features to each tweet tuple. We feed the algorithm N tweets and a list of tokens for this purpose. Three extraction methods are involved: bag-of-words, Word2Vec, and bigram models. The feature extraction algorithm takes N tweets and a list of tokens as input and returns a set of features, computed as the union of the bag-of-words, Word2Vec, and bigram extraction methods followed by normalization. A neural network model pre-trained on a Spanish corpus, which maximizes the conditional probability of the context given a word, is fed to the Word2Vec feature extraction algorithm along with the tweets. The algorithm returns a set of features by calculating each sentence vector as the average of the vectors of its words. In the bag-of-words algorithm, the output is a binary vector of words: given the tweets and a word list as input, the algorithm finds the words in the input that match those already present in the bag-of-words vocabulary. The bigram count vector algorithm returns bigram counts after taking the list of special tokens to be used and the tweets as input. Table 1 lists the special tokens used in the bigram generator; these tokens are also counted individually (1-grams). In contrast to other approaches found on the Internet, the bigrams are counted instead of merely asserting their presence. The algorithms are simplified by relying on the pre-trained neural network models that come with the gensim Python library for Word2Vec representations; those models are pre-trained on the Spanish billion-word corpus and embeddings. The above process can be summarized with the following points:
1. A set of features represented as a Word2Vec vector representation of the tweet can leverage the power of an already trained Word2Vec model and gives a Naïve Bayes classifier a very low generalization error.
2. A more diverse set of features can increase accuracy [11]. Thus, a minimal representation of grammatical structure, i.e., a bigram count of special tokens, is added to the resulting set of features. This bigram count increases the classifiers' accuracy.
3. Normalized features achieve better results and can be selected more easily because they are scale invariant. Thus, the vectors corresponding to tweets of different lengths are weighted. However, tweet length is also added to the features as a representation of energetic grammatical structures.

Table 1 Special tokens to use in the bigram generator

Token         | Purported attitude
--------------|--------------------------
Ellipsis      | Reactive
Exclamation   | Aggressive, proactive
Hashtag       | Proactive, vote winner
Mention       | Proactive, reactive
Name          | Aggressive, vote winner
Neg_emoticon  | Aggressive
Pos_emoticon  | Proactive, vote winner
Question      | Proactive
Quoted        | Vote winner
Uppercase     | Aggressive
Url           | Proactive
Colon         | Vote winner, proactive
Semicolon     | Aggressive
Comma         | Vote winner
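The following is a hedged sketch of the feature-ensemble step summarized in the points above. The embeddings file name, the reduced special-token list, and the helper names are assumptions; gensim is used for the pre-trained Spanish Word2Vec vectors, as the paper indicates.

```python
import numpy as np
from collections import Counter
from gensim.models import KeyedVectors

w2v = KeyedVectors.load_word2vec_format("SBW-vectors-300-min5.txt")  # assumed path

SPECIAL = ["ellipsis", "exclamation", "hashtag", "mention", "question", "url"]

def w2v_features(words):
    # Sentence vector = average of the vectors of its in-vocabulary words.
    vecs = [w2v[w] for w in words if w in w2v]
    return np.mean(vecs, axis=0) if vecs else np.zeros(w2v.vector_size)

def bigram_counts(tokens):
    # Count ordered pairs of special tokens, plus the tokens themselves (1-grams).
    pairs = Counter(zip(tokens, tokens[1:]))
    unis = Counter(tokens)
    feats = [unis[t] for t in SPECIAL]
    feats += [pairs[(a, b)] for a in SPECIAL for b in SPECIAL]
    return np.array(feats, dtype=float)

def vectorize(words, tokens):
    v = np.concatenate([w2v_features(words), bigram_counts(tokens)])
    n = np.linalg.norm(v)
    return v / n if n else v          # normalization, as point 3 suggests
```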
4 Experimental Setup We scraped data from Twitter with a political inclination and processed tweets relevant to the Mexican EdoMex governor election campaigns. From over 51,453 samples, we found that only 7594 were relevant to our attitude classification, as the others did not fall into the aforementioned categories. Due to the skewness, we decided to make a stratified sample set consisting of 60% of the data for training and 40% for testing, i.e., 4500 and 3094 tweets, respectively (Fig. 1). The ground truth consists of a rank given for each of the four categories in each tweet. These ranks go from "0" to "9" and can be considered ordinal values. As far as this study is concerned, no correlation exists between the four target classes; thus, we treat each category as a separate classification problem (Fig. 2). Figure 3 shows the most used words per category of political attitude in the classified tweets.
Fig. 1 Data is heavily skewed toward proactive attitude
Some tweets present a homogeneous structure (having only one class dominate over the others), while other tweets are more ambiguous. Figure 1 shows the distribution of the four target classes in a binary way: "0" equals "not present" and "not 0" equals "present". Some of the most common words that can serve as a basis for the bag-of-words representation were identified as follows. We remove the stop words due to their low inverse document frequency. Because we added the word "no" to the exception list, it is one of the most common words. Figure 2 depicts the most frequently used words in the tweets. A ROC curve is used to measure accuracy with different sets of features, once using a Naïve Bayes classifier and once using a random forest. Naïve Bayes classifiers work well with prior binary knowledge of which words are present in a sentence, and even when we use bag-of-words it is highly probable that we will achieve uniform results. To complete our tests, we use a random forest to exploit the features in order to improve the results.
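A minimal sketch of this evaluation is shown below; the random matrices are placeholders standing in for the tweet feature vectors and one binary attitude label, and scikit-learn is assumed.

```python
import numpy as np
from sklearn.naive_bayes import BernoulliNB
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score

# Placeholder data with the paper's split sizes; shapes only, not real tweets.
rng = np.random.default_rng(0)
X_train, y_train = rng.integers(0, 2, (4500, 50)), rng.integers(0, 2, 4500)
X_test, y_test = rng.integers(0, 2, (3094, 50)), rng.integers(0, 2, 3094)

for name, clf in [("Naive Bayes", BernoulliNB()),
                  ("Random forest", RandomForestClassifier(n_estimators=100))]:
    clf.fit(X_train, y_train)
    scores = clf.predict_proba(X_test)[:, 1]          # probability of "present"
    print(name, "ROC AUC:", roc_auc_score(y_test, scores))
```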
Fig. 2 Bar chart showing the most frequently used words
It is not possible to measure the statistical significance of the improvement because we are using only one database; the Wilcoxon test, which deals with just two classifiers as in our case, relies on having several separate databases. Finally, we measure feature importance in the accuracy of each classifier.
5 Results and Discussion We run a series of files in the cleansing Python package to achieve our objectives. First, the tweets are loaded, centralizing the information contained in the Excel files of raw data into one HDF5 file and then a pickle file called final.pickle. We then tokenize the tweets by generating the column
Fig. 3 Most used words per category
tokenized_text, the text array holding the cleansed text for the BOW extraction. We also leverage testing files that help generate a plot of the most frequent words, which will probably be important for our classifier; this also helps us visualize the problem in terms of the distribution of the most frequent words over the four attitudes. We then use the files in the feature package. This includes ExtractBOW.py, for binary bag-of-words vector extraction; all extracted features are saved in Excel and pickle format for later use. SpecialTockens.py contains the function that we evaluate in order to compute the special-token n-gram model. ExtractBIG.py covers common-sense features, extracted first by counting special tokens; then bigrams are extracted and counted to preserve grammatical structures that are obviated by the W2V representation. Spanish Word2Vec vector
representations are extracted for each word, and then an average overall tweet vector is calculated in ExtractW2V.py. Finally, all the features are consolidated into one file. Having saved all the features, we utilize the testing file containing base functions for calculating the ROC AUC in a multiclass environment, as well as precision, recall, accuracy, and F1-scores, which can be appreciated indirectly in the confusion matrices. For the sake of conciseness, the only metric presented in the report is the ROC AUC. A cross-validation function is provided. Two graph types were generated: the general one presented in Sect. 6 and the confusion matrices, which will be explained in due course. Finally, we visualize the top 20 most relevant features for the W2V + BIG + BOW random forest. Having explored the data, we saw a tendency toward the proactive class; thus, we should expect low recall values for the aggressive, vote winner, and reactive classes. Two classifiers are tested: a Naïve Bayes classifier that adapts very well to binary features and a random forest that can use the features more uniformly. Both are natively multiclass. The random forest was trained with 100 trees. In this section, we evaluate each of the feature sets individually, taking into account the classes they are trying to predict (Fig. 4). The general ROC areas (one vs. rest) demonstrate that BIG + W2V using Naïve Bayes has slightly better accuracy than the rest of the feature sets. BOW features actually worsened the classifier accuracy, and so did using random forest. However, the literature recommends using confusion matrices for measuring the performance of multiclass classifiers, so we derive some conclusions by looking at them.
Fig. 4 Results obtained expressed as the macro average ROC area using a one versus rest classifier (Weka approach)
5.1 Bag-Of-Words
The first row of the confusion matrix heat map shows the BernoulliNB classifier and the second one the random forest classifier. It is no surprise that bag-of-words performs well on the proactive label. However, the actual data has a penchant for rank 3, so our classifier fails to pick up this static tendency. The vote winner category improves significantly with random forest, perhaps owing to the fact that random forest uses the features better.
5.2 Bigrams
Compared to BOW, bigrams improve the aggressive, proactive, and reactive categories using Naïve Bayes. However, using random forest the behavior is slightly worse.
5.3 Word2Vec
Word2Vec features substantially improve both Naïve Bayes and random forest. Proactive precision and recall levels also seem to improve.
5.4 Word2Vec + BOW
5.5 Word2Vec + Bigrams
5.6 BOW + Bigrams
5.7 Mixed Features (BOW + W2V + BIG)
All the features combined generate a nearly perfect proactive classification for random forest. However, that is as far as it gets, because the data is insufficient for learning the other classes. The vertical lines indicate that the actual class is always static, the source of a big generalization error that could be ameliorated if we had more data.
6 Conclusion The features with which we feed a machine learning algorithm are very important. We saw how merely adding bigram features improved accuracy and ROC AUC consistently. Future work may generate more levels of bigrams and then perform feature extraction by importance to improve the recall of the classifier. The Word2Vec vector representation can help integrate the syntactic and semantic structures of any language and find similarities between sentences. Those similarities can then be used to train a more complex classifier. The vertical lines in our results indicate that the actual class is static and that there is a big generalization error, so we could probably do better with more data.
References
1. Tumasjan, A., Sprenger, T., Sandner, P., Welpe, I.: Predicting elections with twitter: What 140 characters reveal about political sentiment. In: Proceedings of the Fourth International AAAI Conference on Weblogs and Social Media, pp. 178–185 (2010). http://www.aaai.org/ocs/index.php/ICWSM/ICWSM10/paper/viewFile/1441/1852
2. Bermingham, A., Smeaton, A.F.: On using twitter to monitor political sentiment and predict election results. Psychology 2–10 (2011)
3. Jungherr, A., Jürgens, P., Schoen, H.: Why the pirate party won the German election of 2009 or the trouble with predictions: a response to Tumasjan, A., Sprenger, T.O., Sander, P.G., & Welpe, I.M. "Predicting elections with twitter: What 140 characters reveal about political sentiment". Social Science Computer Review 30(2), 229–234 (2012). https://doi.org/10.1177/0894439311404119
4. Martínez-Cámara, E., Martín-Valdivia, M., Ureña-López, L., Montejo-Ráez, A.: Sentiment analysis in Twitter. Natural Language Eng. 20(1), 1–28 (2014). https://doi.org/10.1017/S1351324912000332
5. Harjule, P., Gurjar, A., Seth, H., Thakur, P.: Text classification on Twitter data, pp. 160–164 (2020). https://doi.org/10.1109/ICETCE48199.2020.9091774
6. Pang, B., Lee, L.: Opinion mining and sentiment analysis. Foundations Trends® Inf. Retrieval 2(1–2), 1–135 (2008). http://www.nowpublishers.com/article/Details/INR-011
7. Turney, P.D.: Thumbs up or thumbs down? Semantic orientation applied to unsupervised classification of reviews. In: Proceedings of the 40th Annual Meeting on Association for Computational Linguistics, pp. 417–424. Association for Computational Linguistics, USA (2002). https://doi.org/10.3115/1073083.1073153
8. Pang, B., Lee, L., Vaithyanathan, S.: Thumbs up? Sentiment classification using machine learning techniques (2002). http://www.aclweb.org/anthology/W02-1011
9. Kontopoulos, E., Berberidis, C., Dergiades, T., Bassiliades, N.: Ontology-based sentiment analysis of twitter posts. Expert Syst. Appl. 40(10), 4065–4074 (2013). https://doi.org/10.1016/j.eswa.2013.01.001
10. Li, Y.M., Li, T.Y.: Deriving market intelligence from microblogs. Decision Support Syst. 55(1), 206–217 (2013). https://doi.org/10.1016/j.dss.2013.01.023
11. Ramirez-Marquez, J.: Some features speak loud, but together they all speak louder: a study on the correlation between classification error and feature usage in decision-tree classification ensembles. Eng. Appl. Artif. Intell. 67, 270–282 (2018). http://www.sciencedirect.com/science/article/pii/S0952197617302488
Robust Segmentation of Nodules in Ultrasound-B Thyroid Images Through Deep Model-Based Features Siddhant Baldota , C. Malathy, Arjun Chaudhary, and M. Gayathri
Abstract Ultrasound imaging is used to perform diagnosis on organs of the human body. Segmentation of thyroid nodules has been a pressing problem in recent times, and the application of deep learning has served as a solution. This work uses ultrasound-B images of the thyroid obtained from an open-source database provided by the National University of Colombia. Images from 389 patients and their corresponding masks annotated by experts were collected and fed to a modified U-Net-style deep learning model with a 70:30 train–validation split. Robust segmented masks were obtained, with a pixel accuracy of 93.3% and an intersection over union of 0.56 on the validation set. The proposed model outperformed the state-of-the-art U-Net++ model, which yielded around 45% validation pixel accuracy and 0.22 intersection over union. Image similarities over 99.7% were obtained between the predicted masks and the labels of the validation set. The proposed model was deployed at http://thy-seg.herokuapp.com/ as a Web application. It is proposed that the model could potentially aid medical professionals in localizing thyroid nodules in ultrasound images. Keywords Deep learning · Segmentation · Thyroid nodules · Ultrasound
1 Introduction The thyroid is an important endocrine gland which regulates the metabolism of the human body. It is susceptible to various disorders, one of which is the development of a thyroid nodule. Thyroid nodules are abnormal growths on the thyroid gland. They press
S. Baldota (B) · C. Malathy · A. Chaudhary · M. Gayathri Department of Computer Science and Engineering, SRM Institute of Science and Technology, Kattankulathur 60203, India e-mail: [email protected]
C. Malathy e-mail: [email protected]
M. Gayathri e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 S. Roy et al. (eds.), Innovations in Computational Intelligence and Computer Vision, Advances in Intelligent Systems and Computing 1424, https://doi.org/10.1007/978-981-19-0475-2_4
against other internal structures in the neck, causing neck pain and goiter. Major problems can arise when these nodules become malignant. Various systems exist to classify thyroid nodules as benign or malignant; however, there is also a need to find where exactly a nodule is located. Segmentation of thyroid nodules in ultrasound images serves as an aid in this regard. Image segmentation is essentially the prediction of a class for each pixel of a 2D image, or each voxel of a 3D image. Traditionally, CNNs were implemented to categorize images and detect objects. However, for segmentation, the direct application of CNNs was unsuccessful, since the intended output is localization, which involves assigning a class label to each pixel. In the case of biomedical problem statements, there were attempts at training a model using a sliding window in order to predict the class of each pixel. This study proposes a deep learning model for robust segmentation of thyroid nodules in ultrasound-B images.
2 Related Works Vigueras-Guillén et al. [1] used region-of-interest-based patches as input features. However, this resulted in a high amount of superfluous computation and a tradeoff between patch size and accuracy: as the patch size increased, the background increased, but so did the number of max pooling layers, which reduced spatial information and resulted in lower localization accuracy, while reducing the patch size reduced the background context. To overcome this tradeoff, semantic segmentation using fully convolutional networks was first proposed by Long et al. [2]. Classification models like VGGNet [3] and AlexNet [4] were converted to fully convolutional networks by adapting their feature maps to serve the segmentation task. Skip architectures were implemented to fuse coarse-grained and fine-grained features in order to obtain an accurate segmented mask. This was the basis of the U-Net proposed by Ronneberger et al. [5]. The key proposition was to have a regular contracting network with sequential layers, with upsampling layers replacing the max pooling layers in the expanding path, resulting in a higher pixel density and, eventually, a higher resolution. The upsampled output was concatenated with the higher-resolution features from the contracting path. Along the expanding path, more convolutional layers were added, increasing the number of feature channels and allowing the network to transfer spatial information to the high-resolution layers. Ronneberger et al. [5] proposed the U-Net architecture for biomedical image segmentation. The U-Net was fed with an input image of shape 572 × 572 × 3. The input image passed through successive convolutional filters of 64, 64, 128, 128, 256, 256, 512, and 512. Between every two convolutional filters there was a 2 × 2 max pooling layer. This was followed by a bottleneck pipeline that has no max pooling layers but pushes the network to learn a compressed feature map. The upsampling limb of the U-Net consisted of upsampling layers acting as inverse convolutional layers whose feature maps are not learnt. The feature maps from the upsampling layers were fused at each step with feature maps from the
contracting part. An output feature map was obtained using a 1 × 1 convolution, where the number of filters corresponds to the number of classes into which each pixel is to be classified. The activation for the convolutional feature maps at each step was set to ReLU [6] (rectified linear unit). ReLU is a piecewise linear activation function that outputs its input as-is if the input is positive and outputs zero otherwise.
3 Data The original dataset for this work was obtained from the Thyroid Digital Image Database (TDID) at http://cimalab.intec.co/applications/thyroid, provided by Pulido et al. [7]. The dataset consists of B-mode ultrasound images with complete annotations and descriptions of diagnoses carried out by experts. In B-mode ultrasound, pixel brightness reflects the echo returned by the ultrasound waves. When an ultrasound wave collides with a thyroid nodule, differential echo patterns are generated, which assign low pixel values (darker regions) to the corresponding area of the ultrasound image.
3.1 Dataset Description The dataset folder consisted of 480 images and 389 XML files, one per patient. A single XML file holds the annotations of one or more corresponding images. All images and corresponding masks were resized to 224 × 224 [8]. The ultrasound-B images consist of three channels (RGB), whereas each mask is a binary grayscale image. The dataset was split into training and validation sets with a ratio of 70:30 (336 training images, 144 validation images). The split was made such that if the ultrasound thyroid nodule image records of a patient x belonged to the training set, then no record of patient x was considered for the validation set. An additional test set consisting of 94 annotated thyroid nodule image reports (images and XML files) from patients different from those in the training and validation data was collected from the TDID testing database.
3.2 Ground Truth Mask Generation In order to use the annotations for training and validation, they were converted into binary ground truth masks. Figure 1 shows a raw, unprocessed ultrasound-B thyroid nodule image from the dataset. The ultrasound images from the TDID dataset were subjected to median filtering to account for the speckle noise of ultrasound imaging.
Fig. 1 An ultrasound-B thyroid nodule image from the dataset
Fig. 2 The tree view of a sample XML file annotation from the dataset
Fig. 3 The binary ground truth mask for the raw image
Figure 2 shows a sample XML annotation file. Element-tree parsing of each XML file was carried out; the tenth child node of the XML tree is the part from which the annotation points are extracted. These points were connected to form an enclosed region of interest indicating the presence of the thyroid nodule. The coordinates were plotted on an image with all pixel values equal to 0 (a pure black image), and each pixel lying within the region enclosed by the points was assigned the value 255 (white pixel). Thus, a binary grayscale ground truth mask was developed that highlights the nodule region in the thyroid image. Figure 3 shows the ground truth mask for the raw image in Fig. 1. Ground truth mask generation was performed for the training, validation, and test sets, and the image–mask pairs formed the final processed dataset.
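A hedged sketch of this mask-generation step is given below. The rasterization with cv2.fillPoly follows the description above, while the XML file name and the exact layout inside the annotation node are assumptions about the TDID files.

```python
import xml.etree.ElementTree as ET
import numpy as np
import cv2

def mask_from_points(points, height, width):
    """Rasterize annotation points into a binary ground truth mask."""
    mask = np.zeros((height, width), dtype=np.uint8)   # pure black image
    polygon = np.array(points, dtype=np.int32)         # list of (x, y) pairs
    cv2.fillPoly(mask, [polygon], 255)                 # enclosed region -> white
    return cv2.resize(mask, (224, 224))                # match the model input size

# Parsing sketch: the tenth child of the XML root holds the annotation points,
# per the paper; "patient_001.xml" and the node layout are assumptions.
root = ET.parse("patient_001.xml").getroot()
annotation_node = root[9]
```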
4 Proposed Methodology 4.1 Model Architecture For our use case, we have an input image of dimension 224 × 224 × 3. The contracting part consists of successive blocks with 28, 28, 56, 56, 112, 112, 224, and 224 filters, each with kernel size 3 × 3. Each contracting block has the following set of layers: Convolutional 2D, Dropout, Convolutional 2D, and Max Pooling 2D. Across the contracting part, the dropout rate increases from 10 to 30%. Dropout is a regularization method that approximates training several neural networks with different architectures in parallel: some layer outputs are randomly overlooked or "dropped" during training, so every update is carried out with a distinct view of the configured layer. A dropout of 0.5 (50%) means the temporary disappearance of 50% of the layer's units and their connections; the selection is always random.
Fig. 4 Schematic diagram for proposed model architecture
The activation for the convolutional layers is the swish function. The swish function, proposed by Ramachandran et al. [9], is a piecewise activation function that tends to work more effectively on deeper models [10]. It is mathematically represented in (1):

swish(x) = x × sigmoid(x)    (1)

For the bottleneck, we use the following set of layers: a Convolutional 2D layer with 448 filters of kernel size 3 × 3 and swish activation, a Dropout layer with 30% dropout, followed by another Convolutional 2D layer with 448 filters of kernel size 3 × 3 and swish activation. For the expanding part, we use 2D transposed convolutional layers instead of upsampling layers. The rudimentary difference between these two layers is that, in the transposed layers, the reconstructed feature maps can be learnt; transposed convolutional layers leverage the power of weight updating as well as upsampling. The output feature map was obtained with one 1 × 1 convolutional filter. The model architecture can be seen in Fig. 4.
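A condensed sketch of this architecture is shown below, under stated assumptions: Keras is assumed as the framework, "same" padding is assumed so the skip connections align, and the decoder dropout rate is a guess, since the text only specifies the encoder schedule.

```python
from tensorflow.keras import layers, models

def conv_block(x, filters, rate):
    # Conv -> Dropout -> Conv, all with swish activation, per the text.
    x = layers.Conv2D(filters, 3, padding="same", activation="swish")(x)
    x = layers.Dropout(rate)(x)
    return layers.Conv2D(filters, 3, padding="same", activation="swish")(x)

inputs = layers.Input((224, 224, 3))
skips, x = [], inputs
for filters, rate in zip([28, 56, 112, 224], [0.1, 0.1, 0.2, 0.3]):
    c = conv_block(x, filters, rate)
    skips.append(c)                                  # saved for skip connections
    x = layers.MaxPooling2D(2)(c)                    # downsampling

x = conv_block(x, 448, 0.3)                          # bottleneck, no pooling

for filters, skip in zip([224, 112, 56, 28], reversed(skips)):
    x = layers.Conv2DTranspose(filters, 2, strides=2, padding="same")(x)
    x = layers.concatenate([x, skip])                # fuse contracting-path features
    x = conv_block(x, filters, 0.2)                  # decoder dropout is an assumption

outputs = layers.Conv2D(1, 1, activation="sigmoid")(x)  # per-pixel nodule probability
model = models.Model(inputs, outputs)
```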
4.2 Experimental Setup Since there are only two classes into which the pixels are to be classified, one for the thyroid nodule and the other for the background, the loss function for the model was set to binary cross entropy. Entropy is a measure of the uncertainty of a variable in a distribution. Cross entropy is the negative summation, over classes, of the product of the true class probability and the log of the predicted class probability; when a weighted average of the cross entropy is calculated over the samples for two classes, it is called binary cross entropy. The optimizer used for minimizing the loss was the Adam optimizer. Adam, proposed by Kingma et al. [11], is an adaptive learning rate optimizer that leverages the power of root mean square propagation and
momentum-based stochastic gradient descent. While training deep neural networks, gradients play an integral role: weights are penalized with the product of the gradient and the learning rate, which is a hyperparameter. Momentum allows the learning rate to be modified with respect to the gradient average, so the optimizer uses a moving average of the gradient rather than the gradient itself. We defined the metrics as pixel accuracy and intersection over union (IoU). Pixel accuracy is the number of correctly classified pixels divided by the total number of pixels. IoU, also known as the Jaccard index, is the area of intersection (or overlap) between the predicted segmentation mask and the ground truth mask divided by the area of their union. For our use case of binary segmentation, the IoU of the two classes is computed individually, summed, and then divided by two. Callbacks are defined prior to training; a callback is a functional object which dictates the behavior of the training process. The callbacks included logging of model checkpoints, in the form of the weights corresponding to the maximum mean IoU obtained on the validation dataset, and reduction of the learning rate on a plateau of the mean IoU.
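A minimal sketch of this training setup, continuing from the model sketch above, follows. tf.keras is assumed; BinaryIoU is used as a stand-in for the paper's mean IoU metric, and the monitored metric name, checkpoint file name, and patience value are assumptions.

```python
from tensorflow.keras import callbacks, metrics

model.compile(
    optimizer="adam",
    loss="binary_crossentropy",
    metrics=["accuracy",
             metrics.BinaryIoU(target_class_ids=[0, 1], threshold=0.5,
                               name="mean_iou")],     # stand-in for mean IoU
)

cbs = [
    # Checkpoint the weights giving the best validation mean IoU.
    callbacks.ModelCheckpoint("best_weights.h5", monitor="val_mean_iou",
                              mode="max", save_best_only=True,
                              save_weights_only=True),
    # Reduce the learning rate when the validation mean IoU plateaus.
    callbacks.ReduceLROnPlateau(monitor="val_mean_iou", mode="max",
                                factor=0.5, patience=10),
]

# x_train, y_train, x_val, y_val are assumed image/mask arrays.
model.fit(x_train, y_train, validation_data=(x_val, y_val),
          epochs=500, callbacks=cbs)
```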
5 Training and Results The proposed model was trained on an Nvidia K80 GPU for 500 passes through the entire training dataset, and the best weights were saved as defined in the callbacks. The training logs are plotted in Fig. 5, which shows the accuracy-versus-epoch graph on both the training and validation data (labeled as test in the plots). It can be noticed that the accuracies plateau after a point and eventually decrease. This is due to overtraining. However, overtraining does not affect the final model weights, as the best weights are encountered around epoch 220, where the maximum validation mean IoU is obtained; this can be noticed in Fig. 6. In order to compare the proposed model, a similar training pipeline was followed for a state-of-the-art U-Net++ model [12]. A test set consisting of 94 images belonging to patients different from the training set was taken into consideration; each image had a previously defined ground truth mask. The state-of-the-art U-Net++ model failed to generalize on our dataset and performed poorly, with a pixel accuracy of 45% and an IoU of 0.22. Our proposed model performed exceedingly well, with a pixel accuracy of 93.3% and an IoU of 0.56. Figure 7 shows a set of an original image, its ground truth mask, and the predicted segmented mask. Predicted masks were thresholded at a confidence level of 0.5 in order to remove the noise that is added to a ground truth mask to generalize it.
Fig. 5 Pixel accuracy versus epochs for the proposed model
Fig. 6 Intersection over union (IoU) versus epoch for the proposed model
5.1 Verification A function was defined to calculate the similarity score between the ground truth masks and the predicted masks. This similarity measure was based on the histograms of the ground truth mask and the predicted mask. A histogram of an image is essentially a graph depicting its tonal distribution.
Fig. 7 (left to right) Original ultrasound-B thyroid nodule image, ground truth mask, predicted mask
Fig. 8 The process of overlaying an ultrasound-B thyroid nodule image
The histograms of the two masks were compared, and their overlap gave the similarity percentage. Similarity scores of over 99.7% were obtained for 92 out of the 94 test cases.
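A plausible sketch of such a histogram-based similarity function is shown below; OpenCV's correlation comparison is assumed as the overlap measure, since the paper does not name one.

```python
import cv2

def histogram_similarity(mask_a, mask_b):
    """Percentage similarity between two 8-bit grayscale masks."""
    h1 = cv2.calcHist([mask_a], [0], None, [256], [0, 256])
    h2 = cv2.calcHist([mask_b], [0], None, [256], [0, 256])
    cv2.normalize(h1, h1)
    cv2.normalize(h2, h2)
    # Correlation of the two tonal distributions, scaled to a percentage.
    return cv2.compareHist(h1, h2, cv2.HISTCMP_CORREL) * 100
```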
5.2 Additional Features One important proposed addition is to overlay the predicted mask on the input image, as shown in Fig. 8. The original image is converted to grayscale. The mask is squeezed so that it becomes 224 × 224 (grayscale); there is no information loss from 224 × 224 × 1 to 224 × 224 because both images have one channel (gray). A background image is created by adding 100 to each pixel of the original image, thus increasing its
brightness. The background is set to white wherever the mask is non-black (i.e., wherever pixels have non-zero values). Then, the original image, scaled by a factor, is added to the background wherever the mask is non-black. The background is then resized to the dimensions of the original image. The benefits of overlaying after segmentation are that the nodule is highlighted within the rest of the image, and there is no need to compare a predicted binary mask with the input image.
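The following sketch gives one plausible reading of this overlay procedure, assuming image is the original grayscale ultrasound image and mask the predicted 224 × 224 binary mask, both as NumPy uint8 arrays; the blending factor is an assumption.

```python
import numpy as np
import cv2

def overlay_nodule(image, mask, factor=0.5):
    """Highlight the predicted nodule region on a grayscale ultrasound image."""
    gray = image.astype(np.float32)
    background = np.clip(gray + 100, 0, 255)                  # brightened copy
    m = cv2.resize(mask, (gray.shape[1], gray.shape[0])) > 0  # non-black mask pixels
    background[m] = 255                                       # white under the nodule
    # Blend the original intensities back in over the white region.
    background[m] = np.clip(factor * gray[m] + (1 - factor) * background[m], 0, 255)
    return background.astype(np.uint8)
```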
6 Conclusion The proposed deep learning model yielded a pixel accuracy of 93.3% and an IoU of 0.56 on the test set. Image similarities over 99.7% were obtained between the predicted masks and the ground truths of the validation and test sets. The proposed model was deployed in the form of a Web application at http://thy-seg.herokuapp.com/; the Flask app was deployed locally and ephemerally using flask-ngrok. It is proposed that the model could potentially aid medical professionals in localizing thyroid nodules in ultrasound images. However, there is scope for improvement in the IoU; it is estimated that the IoU would increase when the model is trained over a larger dataset, which would allow it to generalize better. Overlaying is proposed to highlight the prediction.
References
1. Vigueras-Guillén, J.P., Sari, B., Goes, S.F., Lemij, H.G., van Rooij, J., Vermeer, K.A., van Vliet, L.J.: Fully convolutional architecture vs sliding-window CNN for corneal endothelium cell segmentation. BMC Biomed. Eng. 1, 4 (2019). https://doi.org/10.1186/s42490-019-0003-2
2. Long, J., Shelhamer, E., Darrell, T.: Fully convolutional networks for semantic segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3431–3440 (2015)
3. Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556 (2014)
4. Krizhevsky, A., Sutskever, I., Hinton, G.E.: ImageNet classification with deep convolutional neural networks. Adv. Neural Inf. Process. Syst. 25, 1097–1105 (2012)
5. Ronneberger, O., Fischer, P., Brox, T.: U-Net: convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-Assisted Intervention, pp. 234–241. Springer, Cham (2015)
6. Agarap, A.F.: Deep learning using rectified linear units (ReLU). arXiv preprint arXiv:1803.08375 (2018)
7. Pulido, C., Espinoza, F., Pedraza, L.: Thyroid nodules ultrasound imaging database. TDID (Thyroid Digital Image Database). http://cimalab.intec.co/applications (2014–present). Unpublished
8. https://news.ycombinator.com/item?id=12509852
9. Ramachandran, P., Zoph, B., Le, Q.V.: Searching for activation functions. arXiv preprint arXiv:1710.05941 (2017)
10. Basirat, M., Roth, P.M.: The quest for the golden activation function. arXiv preprint arXiv:1808.00783 (2018)
11. Kingma, D.P., Ba, J.: Adam: a method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014)
12. Zhou, Z., Siddiquee, M.M.R., Tajbakhsh, N., Liang, J.: UNet++: a nested U-Net architecture for medical image segmentation. In: Deep Learning in Medical Image Analysis and Multimodal Learning for Clinical Decision Support, pp. 3–11. Springer, Cham (2018)
Image Compression Using Histogram Equalization Raj Kumar Paul and Saravanan Chandran
Abstract Cameras, scanners, and other digital devices create huge images, which contain excess and redundant information not perceived by the human eye. Compression eliminates this extra, redundant data without hampering the image's visual perception. Image compression is beneficial for transmitting large images and for storing huge numbers of images in databases. A histogram equalization-based image compression method is proposed, and the performance of different image compression methods is analyzed. The paper presents a compression model built by selecting different stages of compression. The first, standard model's stages are the Discrete Wavelet Transform, Quantization, and Huffman coding. The proposed new model's stages are Histogram Equalization, Discrete Wavelet Transform, Quantization, and Huffman coding. The objective of the proposed model is to improve redundancy removal. The two models were evaluated on four standard grayscale images using the Matlab platform. Image quality and compression efficiency were evaluated using Mean Square Error, Peak Signal to Noise Ratio, and Compression Ratio. The experimental outcomes indicate that our model achieved a higher Compression Ratio, 17.3819, than the standard image compression model and five other traditional image compression models. The obtained results show that our model provides the required levels of Peak Signal to Noise Ratio and Mean Square Error. Keywords Histogram equalization · Discrete wavelet transform · Compression ratio · Peak signal to noise ratio · Mean square error · Quantization
R. K. Paul (B) · S. Chandran National Institute of Technology, Durgapur, India e-mail: [email protected] S. Chandran e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 S. Roy et al. (eds.), Innovations in Computational Intelligence and Computer Vision, Advances in Intelligent Systems and Computing 1424, https://doi.org/10.1007/978-981-19-0475-2_5
1 Introduction Redundancy is a vital issue in compression techniques. Compression aims to reduce the number of bits required to reconstruct an image, minimizing storage size and transmission time. Images are created and shared every day using advanced technologies and applications for various human activities in society; types of images include social images, multimedia images, and e-commerce images [1]. A grayscale image of 256 × 256 pixels has 65,536 elements, and a color image of the same size has 65,536 × 3 elements (256 × 256 × 3 pixels). Downloading and uploading large image files is very time- and space-consuming, and limited bandwidth creates bottlenecks in networks. Image compression is a solution for transmission and storage problems over a specific bandwidth [2–6]. The fundamental property of an image is the correlation between neighboring pixels, called redundancy. Compression aims to develop a representation of the image in which neighboring pixels are minimally correlated. Compression techniques are based on the concepts of redundancy and irrelevant information present in the image: redundancy reduction removes redundant pixel values from the image, and irrelevancy reduction omits pixel values that are not visible to human sight. The number of bits required to represent the image is minimized by removing redundant pixel values [5]. The image compression research objective is to minimize the number of bits needed to represent an image by eliminating redundancies. The mathematical relation between data redundancy (DR) and compression ratio (CR) is defined by Eq. (1); the proposed research work realizes this relation as CR enhancement:

DR = 1 − 1/CR    (1)
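As a brief worked example of Eq. (1), consider an image compressed from n1 = 65,536 bytes to n2 = 16,384 bytes (illustrative sizes):

```python
n1, n2 = 65536, 16384
CR = n1 / n2        # compression ratio = 4.0
DR = 1 - 1 / CR     # data redundancy  = 0.75
```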
The relation has three conditions based on the original image size (n1) and the compressed image size (n2). Condition 1: if n2 = n1, then CR = 1 and DR = 0. Condition 2: if n2 ≪ n1, then CR → ∞ and DR → 1. Condition 3: if n2 ≫ n1, then CR → 0 and DR → −∞ [5]. The paper's contributions are as follows. The multimedia image compression technique compresses the image to minimize storage; the proposed compression technique also reduces the bandwidth requirements for image transmission. Two different compression strategies are considered: the standard compression method's stages are Discrete Wavelet Transform (DWT), Quantization (Q), and Huffman Coding (HC); the proposed compression method's stages are Histogram Equalization (HE), DWT, Q, and HC. Both the standard compression model and the proposed model achieve a higher CR value than five traditional image compression models. The remaining portions of this article are organized as follows: Sect. 2 describes the related research work; Sect. 3 presents the proposed model (HE, DWT, Q, HC) and describes the models; Sect. 4 presents the experimental results and analysis; Sect. 5 concludes, followed by the references.
2 Related Research Work In 2019, Kasban et al. introduced a new technique for radiographic image compression that keeps the details of the important area of the image. In this technique, the area of interest was separated from the image background using a threshold based on the variance of the image histogram. The vital area was compressed using HC with a minimum CR. The image background was compressed with a maximum CR using a pyramid technique with Vector Quantization (VQ) and the generalized Lloyd algorithm. The compressed image was generated by combining the HC output and the VQ output. The coding scheme achieved better performance than compressing the entire radiographic image without separating the image areas [7]. This technique has motivated researchers to develop coding schemes applicable to sensitive images. In 2021, Wahab et al. introduced a hybrid compression technique based on the RSA cryptography technique. It was a lossy and lossless compression steganography strategy, used to reduce the amount of transmitted data and speed up transmission when only a slow Internet connection or a small storage space is available. The authors compressed the plain text using the HC technique and the cover image using the DWT, which reduced the cover image's dimensions. LSB embedding was used to hide the encrypted data in the compressed cover image. The authors evaluated the method on several performance parameters, namely CR, BPP, Percentage Savings (PS), Compression Time (CT), MSE, SSIM, Compression Speed (CS), and PSNR. The system showed higher performance than other systems using the same method [8]. The technique has motivated researchers to develop coding schemes applicable to sensitive information. In 2019, Devadoss et al. developed a novel compression method for transmitting images using the Block Burrows–Wheeler (BBW) and move-to-front transforms with a hybrid fractal coding algorithm (HFCA) and HC. A small loss in an important medical image area can lead to a wrong assessment in medical treatment. The compression model solved this problem using region-based compression: lossless compression for the important area and lossy compression for the other areas. The compression model preserved diagnostic features: the important region was divided and encoded without loss of diagnostic features using the BBW image compression algorithm, while the other areas were encoded using the HFCA. The compressed areas were combined to reconstruct the overall compressed image. The outcomes showed that the method provided a higher PSNR value than the conventional methods [9]. This image compression model has motivated researchers to develop new coding schemes. In 2019, Lin et al. introduced a lossless compression technique. It handles continuous-intensity images with a lossy encoding technique to achieve better compression performance, while the decoded image has no data loss, as in standard lossless compression coding. The compression performance of the lossy sub-band coding techniques was enhanced at high bit rates. A run-length-based symbol grouping method was implemented for entropy
In the entropy coding, scalar quantization was combined with a thresholding technique for quantization. The results showed that the lossy compression algorithm performed like a lossless compression scheme while giving higher compression efficiency than the conventional lossless scheme [10]. This research work has motivated researchers to develop novel coding algorithms with good compression efficiency.
In 2019, Suresh Kumar S. et al. developed a Morlet wavelet transformation (MWT) technique to enhance compression performance for grayscale images (GI) and binary images (BI). First, the authors removed artifacts and noise from the GI and BI using generalized lapped orthogonal transforms and Wiener filters. A wavelet-quantization-based image compression algorithm was then performed using the MWT. Finally, the quantized wavelet transformation-based image decompression process was executed to obtain the reconstructed image. CR, PSNR, space complexity, and compression time were used to analyze the compression technique's performance. The results showed that the technique achieved a higher CR and reduced storage size [11]. This research work aimed to develop a coder applicable to GI and BI.
3 Proposed Model This section describes the different stages of compression. The compression stages are Histogram Equalization, Discrete Wavelet Transform, Quantization, and Huffman Coding. Also, this section describes the image compression models.
3.1 Histogram Equalization (HE)
A histogram is a visual representation of an image's intensity distribution. It summarizes numerical data, continuous or discrete, by showing the number of data points within a range of values. A histogram is a bar graph showing the frequency distribution of an image's intensities, from which the distribution and median of the data can be computed [12]. HE improves the contrast of an image by spreading out the most frequent intensity values, i.e., stretching out the image's intensity range [5, 13]. Let r be an input image, represented as an imr × imc matrix of intensities ranging from 0 to L−1, where L is the number of possible intensity values, 256 (2^8) for 8-bit images. Let h denote the histogram of r with one bin for each intensity. Equation (2), used to compute the histogram of the input image, is

h_k = n_k / n   (2)
where n_k indicates the number of pixels with intensity value k and n indicates the total number of pixels, for k = 0, 1, ..., L−1. Let s denote the histogram-equalized image. Equation (3), used to equalize the intensity values via the histogram, is

s_e = floor( (L − 1) · Σ_{k=0}^{e} h_k )   (3)
Here floor() rounds down to the nearest integer; the transformation maps each pixel intensity e of r through this function [5, 13–15]. We have executed the HE process on the Lena, Harry, Flower, and Nature images. The HE process for the flower image is shown in Fig. 1. The figure has four parts: the original flower, the histogram of the flower, the equalized flower, and the histogram of the equalized flower.
Fig. 1 Histogram equalization of the flower image
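As an illustration, the following is a minimal Python/NumPy sketch of Eqs. (2) and (3). The experiments themselves use MATLAB's imhist and histeq, so this listing is only a stand-in for the same computation and is not the authors' code.

import numpy as np

def equalize_histogram(image, L=256):
    # image: 2-D integer array with intensities in [0, L-1]
    n = image.size
    # Eq. (2): normalized histogram h_k = n_k / n
    h = np.bincount(image.ravel(), minlength=L) / n
    # Eq. (3): s_e = floor((L - 1) * sum_{k=0}^{e} h_k)
    mapping = np.floor((L - 1) * np.cumsum(h)).astype(np.uint8)
    return mapping[image]  # apply the transformation to every pixel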
3.2 Discrete Wavelet Transform (DWT)
The DWT has many applications in the digital image processing domain for the analysis of an image, both numerical and functional. The DWT is implemented using digital filters and downsamplers. The two-dimensional DWT is defined through scaled and translated basis functions. Equations (4) and (5) show the scaled and translated basis functions used to understand the DWT concept:

φ_{l,r,t}(u, v) = 2^{l/2} φ(2^l u − r, 2^l v − t)   (4)

ψ^k_{l,r,t}(u, v) = 2^{l/2} ψ^k(2^l u − r, 2^l v − t)   (5)

Here k denotes the directional wavelets H, V, and D. The DWT is applied to a function f(u, v) of size P × Q. Equations (6) and (7) show the two-dimensional DWT equations used to compute the transform in the proposed algorithms:

W^k_ψ(l, r, t) = (1/√(P·Q)) · Σ_{u=0}^{P−1} Σ_{v=0}^{Q−1} f(u, v) ψ^k_{l,r,t}(u, v)   (6)

W_φ(l, r, t) = (1/√(P·Q)) · Σ_{u=0}^{P−1} Σ_{v=0}^{Q−1} f(u, v) φ_{l,r,t}(u, v)   (7)
The DWT decomposes the image into four sub-bands, namely HL, LH, HH, and LL [5–7]. W_φ(l, r, t), W^H_ψ(l, r, t), W^V_ψ(l, r, t), and W^D_ψ(l, r, t) denote the LL, HL, LH, and HH sub-bands, respectively. The DWT's two-level decomposition process is shown in Fig. 2a.
Fig. 2 DWT: a two-level decomposition [16], b sub-bands of the Lena image
The DWT decomposition process applies a low-pass filter and a high-pass filter to the image, followed by downsampling of the rows and columns. One part is called the LL sub-band, or approximation. The other parts are called the detail sub-bands, namely the HL, LH, and HH sub-bands [8]. The human visual system is most sensitive to changes in the LL sub-band, so watermarks are embedded in the other three sub-bands [4–6, 16, 17]. The Lena grayscale image (256 × 256 pixels) occupies 65,536 bytes, each pixel being 8 bits; after the execution of the DWT process, the image occupies 22,616 bytes. The DWT decomposes the Lena image into four parts, or sub-bands; Fig. 2b shows the LL, HL, LH, and HH sub-bands of the Lena image. We have also executed the DWT technique on the Flower, Harry, and Nature images.
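For reference, a short Python sketch of the one-level decomposition using the PyWavelets library with the 'db1' (Haar) wavelet, the filter named in the algorithms below. The random array is only a stand-in for the Lena image, and the LL/HL/LH/HH naming follows the convention above.

import numpy as np
import pywt  # PyWavelets

image = np.random.randint(0, 256, (256, 256)).astype(float)  # stand-in for Lena
cA, (cH, cV, cD) = pywt.dwt2(image, 'db1')   # one-level 2-D DWT
# cA is the LL (approximation) sub-band; cH, cV, cD are the detail
# sub-bands (the HL/LH/HH bands of Fig. 2b), each 128 x 128.
cA2, (cH2, cV2, cD2) = pywt.dwt2(cA, 'db1')  # second level, as in Fig. 2a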
3.3 Quantization
The quantization process approximates the continuous values in the image with finite values. The quantizer input is the actual data, and the quantized output is a finite set of levels. The quantizer is a function whose set of output values is finite and discrete [11]. In scalar quantization, each input symbol is considered separately when producing the output. Vector quantization is based on the clustering principle: its input symbols are grouped together, and these groups are called vectors. Increasing the number of grouped data points increases the vector quantizer's optimality. A quantizer is defined by its input partition and its output levels. The uniform quantizer's input range is divided into intervals of equal size [5, 18]. Q_iv denotes the quantization interval, (f_min, f_max) the finite input range, and L the number of quantization levels. Equations (8), (9), and (10) are the uniform quantizer's equations, used in the proposed algorithms for the quantization procedure:

Q_iv = (f_max − f_min) / L   (8)

Q_i(f) = (f − f_min) / Q_iv   (9)

Q_i(f) denotes the quantized index.

Q(f) = Q_i(f) · Q_iv + Q_iv / 2 + f_min   (10)

Q(f) denotes the quantized value. The non-uniform quantizer's input is not divided into equal intervals [5, 7]. Equations (11) and (12) are the non-uniform quantizer's equations, used in the proposed algorithms for the quantization procedure. L denotes the quantization levels, b_l the partition values, B_l = (b_{l−1}, b_l) the partition regions, and g_l the reconstruction values.
Q_i(f) = l, if f ∈ B_l   (11)

Q_i(f) denotes the quantized index.

Q(f) = g_l, if f ∈ B_l   (12)
Q(f) denotes the quantized value. The quantization's objective is to reduce precision and thereby achieve a better CR; quantization is a lossy operation [5, 6, 11]. We have used the scalar quantization technique to quantize the transform values. In this paper, quantization decides the quality level of the image.
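A minimal Python sketch of the uniform quantizer of Eqs. (8)–(10), shown for illustration only; the floor that makes the index integral is an implementation choice added here, not part of Eq. (9) as stated.

import numpy as np

def uniform_quantize(f, L):
    f = np.asarray(f, dtype=float)
    f_min, f_max = f.min(), f.max()
    q_iv = (f_max - f_min) / L                         # Eq. (8): interval width
    idx = np.floor((f - f_min) / q_iv).clip(0, L - 1)  # Eq. (9), made integral
    q = idx * q_iv + q_iv / 2 + f_min                  # Eq. (10): mid-point value
    return idx.astype(int), q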
3.4 Huffman Encoding (HC)
Huffman coding is a technique for eliminating coding redundancy. It is a lossless compression and statistical coding method. HC assigns shorter codes to more frequently used symbols and longer codes to less frequently used symbols to reduce the compressed file size [5]. It is a variable-length coding system. The concept of HC is a chain of source reductions: the probabilities are arranged in decreasing order, and the two least probable symbols are merged into a compound symbol that replaces them in the next source reduction. This process is repeated until only two compound symbols remain [6, 12, 19, 20]. The steps of Huffman coding are as follows. We construct a Huffman tree (H_t) using the input symbols (S = s1, s2, s3, ..., sn) and a set of weights or frequencies (W = w1, w2, w3, ..., wn), one for each symbol. The H_t is a binary tree data structure with n leaf nodes (L_n) and n−1 internal nodes (I_n). We use a priority queue (P_q) for creating the H_t, in which low-frequency nodes have high priority, and a min-heap (M_h) data structure to implement the functionality of the P_q. In the beginning, all nodes are leaf nodes carrying a character with its weight or frequency. The internal nodes have a weight and links to two child nodes, the left child node (L_cn) and the right child node (R_cn). Finally, we assign a set of binary codes (C = c1, c2, c3, ..., cn) to the symbols by traversing the H_t: 0 denotes the L_cn and 1 denotes the R_cn [5–7].
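The following Python sketch mirrors these steps with a min-heap priority queue. It is an illustrative implementation for scalar symbols, not the MATLAB huffmanenco call used in the experiments.

import heapq
from collections import Counter

def huffman_codes(symbols):
    freq = Counter(symbols)                 # symbol weights/frequencies
    heap = [(w, i, s) for i, (s, w) in enumerate(freq.items())]
    heapq.heapify(heap)                     # min-heap: low frequency = high priority
    tie = len(heap)
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)   # merge the two least probable nodes
        w2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (w1 + w2, tie, (left, right)))
        tie += 1
    codes = {}
    def walk(node, code):
        if isinstance(node, tuple):         # internal node: recurse on children
            walk(node[0], code + '0')       # 0 denotes the left child (L_cn)
            walk(node[1], code + '1')       # 1 denotes the right child (R_cn)
        else:
            codes[node] = code or '0'       # leaf node carries a symbol
    walk(heap[0][2], '')
    return codes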
3.5 Models Description
This subsection presents the two image compression models, namely the standard image compression scheme and the proposed coding model. Figure 3 shows the flowcharts of (a) the standard model [2] and (b) the proposed coding model. The standard model starts with an image as the input to the DWT process. The quantization technique then quantizes the output of the DWT process; the quantizer decides the image's quality level. This is followed by HC to obtain the compressed image.
Fig. 3 The flowcharts: a standard model (DWT-Q-HC), b proposed model (HE-DWT-Q-HC)
The proposed model starts with an image as the input to the HE process, which equalizes the input image. The DWT process is then performed on the histogram-equalized image. After the DWT process, quantization decides the image's quality level, followed by HC to obtain the compressed image. Finally, the standard model's output and the proposed model's output are compared using visual quality measurement performance parameters: CR, MSE, and PSNR. Algorithm 1 describes the standard model (DWT-Q-HC); Algorithm 2 describes the proposed model (HE-DWT-Q-HC).
Algorithm 1: DWT-Q-HC
Input: Original Image
Output: Compressed Image
1: (I_u,v) ← Original Image
2: Discrete Wavelet Transform procedure (P_dwt): dwt ← P_dwt((I_u,v))
   P_dwt: X ← dwt2((I_u,v), 'db1')
3: Quantization procedure (P_Q): Q ← P_Q(P_dwt((I_u,v)))
   QV ← quanta value or quanta values
   P_Q: if (partition is not required) then
            Q ← uniform_quantization(X, QV)
        else
            Q ← non_uniform_quantization(X, QV)
4: Huffman coding procedure (P_HC): HC ← P_HC(P_Q(P_dwt((I_u,v))))
   P_HC: I'_u,v ← huffmanenco(Q, dict)
5: Compressed Image ← (I'_u,v)
6: Compute the CR, MSE, PSNR values

Algorithm 2: HE-DWT-Q-HC (Proposed)
Input: Original Image
Output: Compressed Image
1: (I_u,v) ← Original Image
2: Histogram Equalization procedure (P_HE): HE ← P_HE((I_u,v))
   P_HE: H ← imhist(I_u,v); HE ← histeq(I_u,v)
3: Discrete Wavelet Transform procedure (P_dwt): dwt ← P_dwt(P_HE((I_u,v)))
   P_dwt: X ← dwt2(HE, 'db1')
4: Quantization procedure (P_Q): Q ← P_Q(P_dwt(P_HE((I_u,v))))
   QV ← quanta value or quanta values
   P_Q: if (partition is not required) then
            Q ← uniform_quantization(X, QV)
        else
            Q ← non_uniform_quantization(X, QV)
5: Huffman coding procedure (P_HC): HC ← P_HC(P_Q(P_dwt(P_HE((I_u,v)))))
   P_HC: I'_u,v ← huffmanenco(Q, dict)
6: Compressed Image ← (I'_u,v)
7: Compute the CR, MSE, PSNR values
4 Experimental Result Analysis
This section describes the experimental results of the DWT-Q-HC and HE-DWT-Q-HC models. The experiments were performed on the Matlab platform. Figure 4 shows the standard images, namely Lena, Flower, Nature, and Harry, all of the same size, 256 × 256 [3, 17]. We refer to the Lena, Harry, Flower, and Nature images as image1, image2, image3, and image4, respectively. We have compressed all four images using the DWT-Q-HC and HE-DWT-Q-HC models.
4.1 Performance Parameters
In this paper, three performance parameters are used to measure the DWT-Q-HC and HE-DWT-Q-HC image compression algorithms' performance.
Compression Ratio (CR): It is the ratio of the original image size (n1) to the compressed image size (n2) [5]. Equation (13) shows the formula used to compute the CR value for the compression methods:

CR = n1 / n2   (13)

Fig. 4 The standard input images (Lena, Flower, Nature, Harry) [3, 17]
Mean Square Error (MSE): It is the error metric used to analyze the compressed image quality. It represents the error between the compressed image C(x, y) and the original image O(x, y), and it measures the distortion rate in the reconstructed images of the DWT-Q-HC and HE-DWT-Q-HC models [14, 15]. A and B denote the numbers of rows and columns in both images. Equation (14) shows the formula used to compute the MSE value for the compression methods:

MSE = (1 / (A · B)) · Σ_{x=1}^{A} Σ_{y=1}^{B} (O(x, y) − C(x, y))²   (14)
Peak Signal to Noise Ratio (PSNR): It is the error metric used to analyze the image compression quality of the DWT-Q-HC and HE-DWT-Q-HC models. It measures the amount of noise in the image [14, 15]. R is the highest possible pixel value in the image matrix. Equation (15) shows the formula used to compute the PSNR value for the compression methods:

PSNR = 10 · log10(R² / MSE) (dB)   (15)
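A small Python sketch of Eqs. (13)–(15), assuming 8-bit images so that sizes can be counted in bits; it only illustrates the metric computations.

import numpy as np

def compression_metrics(original, reconstructed, compressed_bits, R=255):
    n1 = original.size * 8                 # original size in bits (8-bit pixels)
    cr = n1 / compressed_bits              # Eq. (13): CR = n1 / n2
    diff = original.astype(float) - reconstructed.astype(float)
    mse = np.mean(diff ** 2)               # Eq. (14)
    psnr = 10 * np.log10(R ** 2 / mse)     # Eq. (15), in dB
    return cr, mse, psnr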
Tables 1 and 2 present the experimental results of the DWT-Q-HC and the HE-DWT-Q-HC (proposed) models, respectively. As seen in Tables 1 and 2, we have measured and compared the compression performance and image quality. The proposed model achieved an average CR value 2.8342 higher than that of the DWT-Q-HC model, while the PSNR and MSE values remained at the required level. We observed that the proposed model achieved the highest CR, 19.4498, for image4 and the lowest CR, 15.8480, for image1.

Table 1 Experiment results using DWT-Q-HC

SL. No. | Image  | MSE    | PSNR (dB) | CR
1       | Image1 | 2.6104 | 43.9638   | 13.8432
2       | Image2 | 4.1433 | 41.9574   | 12.3861
3       | Image3 | 5.7418 | 40.5404   | 14.2189
4       | Image4 | 3.8429 | 42.2843   | 17.7426
Average |        | 4.0846 | 42.1865   | 14.5477

Table 2 Experiment results using HE-DWT-Q-HC

SL. No. | Image  | MSE    | PSNR (dB) | CR
1       | Image1 | 0.1176 | 57.4253   | 15.8480
2       | Image2 | 2.2861 | 44.5399   | 16.7138
3       | Image3 | 1.9934 | 45.1349   | 17.5162
4       | Image4 | 2.0583 | 44.9957   | 19.4498
Average |        | 1.6138 | 48.0239   | 17.3819
We observed that the DWT-Q-HC model achieved the highest CR value, 17.7426, for image4 and the lowest CR value, 12.3861, for image2. The HE-DWT-Q-HC model achieved the highest PSNR value, 57.4253 dB, for image1 and the lowest PSNR value, 44.5399 dB, for image2. The DWT-Q-HC model achieved the highest PSNR value, 43.9638 dB, for image1 and the lowest PSNR value, 40.5404 dB, for image3.
Table 3 compares the average compression ratios of the different methods. It shows that the compression ratio of the proposed model is higher than the compression ratios of the DWT-Q-HC model and of Refs. [14, 21–24]. Figure 5 visualizes this comparison, giving a graphical representation of the average compression ratios so that the different compression models can be compared easily. We thus observed that the proposed image compression model provided a higher CR, 17.3819, than the DWT-Q-HC model and Refs. [14, 21–24]. In summary, the proposed image compression model (HE-DWT-Q-HC) is a useful image compression model.

Table 3 Comparative interpretation of MSE, PSNR, and CR

SL. No. | Methods   | MSE    | PSNR (dB) | CR
1       | Ref. [21] | 0.3875 | 42.2481   | 0.3333
2       | Ref. [14] | 0.6145 | 50.2792   | 1.6129
3       | Ref. [22] | –      | 25.7446   | 1.9800
4       | Ref. [23] | –      | 52.782    | 5.418
5       | Ref. [24] | –      | 18.52     | 12.14
6       | DWT-Q-HC  | 4.0846 | 42.1865   | 14.5477
7       | Proposed  | 1.6138 | 48.0239   | 17.3819

Fig. 5 Interpretation of experimental results
5 Conclusion
Image compression reduces transmission time and storage size. This article presents the HE-DWT-Q-HC model for image compression using histogram equalization. This image compression technique is useful for commercial images and achieves a better compression ratio. The HE-DWT-Q-HC model yielded an average CR value 2.8342 higher than that of the DWT-Q-HC model. The proposed model's vital stage is the wavelet technique. Based on the CR, MSE, and PSNR values, we observed that the proposed image compression model is useful for image compression with better compression efficiency. Although the proposed model achieved a better CR value than the DWT-Q-HC model and the other five traditional image compression models, some aspects remain to be developed in the future to enhance the performance of the compression method.
References
1. Bondi, L., Bestagini, P., Perez-Gonzalez, F., Tubaro, S.: Improving PRNU compression through preprocessing, quantization, and coding. IEEE Trans. Inf. Forensics Secur. 14(3), 608–620 (2019). https://doi.org/10.1109/TIFS.2018.2859587
2. Ibraheem, M.S., Ahmed, S.Z., Hachicha, K., Hochberg, S., Garda, P.: Medical images compression with clinical diagnostic quality using logarithmic DWT. In: 2016 IEEE-EMBS International Conference on Biomedical and Health Informatics, Las Vegas, NV, USA, pp. 402–405 (2016)
3. Kabir, M.A., Mondal, M.R.H.: Edge-based transformation and entropy coding for lossless image compression. In: International Conference on Electrical, Computer and Communication Engineering, Bangladesh, pp. 717–722 (2017). https://doi.org/10.1109/ECACE.2017.7912997
4. Bruylants, T., Munteanu, A., Schelkens, P.: Wavelet based volumetric medical image compression. Signal Process.: Image Commun. 31, 112–133 (2015)
5. Gonzalez, R.C., Woods, R.E.: Digital Image Processing, 2nd edn. Pearson Education, India (2004)
6. Sayood, K.: Introduction to Data Compression, 3rd edn. Morgan Kaufmann, an imprint of Elsevier, USA (2006)
7. Kasban, H., Hashima, S.: Adaptive radiographic image compression technique using hierarchical vector quantization and Huffman encoding. J. Amb. Intell. Hum. Comput. 10, 2855–2867 (2019). https://doi.org/10.1007/s12652-018-1016-8
8. Wahab, O.F.A., Khalaf, A.A., Hussein, A.I., Hamed, H.F.: Hiding data using efficient combination of RSA cryptography, and compression steganography techniques. IEEE Access 9, 31805–31815 (2021)
9. Devadoss, C.P., Sankaragomathi, B.: Near lossless medical image compression using block BWT–MTF and hybrid fractal compression techniques. Cluster Comput. 22, 12929–12937 (2019). https://doi.org/10.1007/s10586-018-1801-3
10. Lin, J.: A new perspective on improving the lossless compression efficiency for initially acquired images. IEEE Access 7, 144895–144906 (2019)
11. Suresh Kumar, S., Mangalam, H.: Quantization based wavelet transformation technique for digital image compression with removal of multiple artifacts and noises. Cluster Comput. 22, 11271–11284 (2019). https://doi.org/10.1007/s10586-017-1379-1
12. Meftah, M., Pacha, A.A., Hadj-Said, N.: DNA encryption algorithm based on Huffman coding. J. Discrete Math. Sci. Cryptogr. 1–14 (2020)
13. Gonzalez, R.C., Woods, R.E.: Digital Image Processing, 3rd edn. Pearson, India (2008)
14. Rahman, M.A., Rabbi, M.M.F., Rahman, M.M., Islam, M.M., Islam, M.R.: Histogram modification based lossy image compression scheme using Huffman coding. In: 2018 4th International Conference on Electrical Engineering and Information and Communication Technology (iCEEiCT), Dhaka, Bangladesh, pp. 279–284 (2018). https://doi.org/10.1109/CEEICT.2018.8628092
15. Rahman, M.A., Islam, S.M.S., Shin, J., Islam, M.R.: Histogram alternation based digital image compression using base-2 coding. In: 2018 Digital Image Computing: Techniques and Applications (DICTA), Canberra, Australia, pp. 1–8 (2018). https://doi.org/10.1109/DICTA.2018.8615830
16. Torkamani, R., Sadeghzadeh, R.A.: Wavelet-based Bayesian algorithm for distributed compressed sensing. J. Inf. Syst. Telecom. 2(7), 87–95 (2019)
17. Huang, J., Luo, Y., Zhon, R., Liu, Y., Bi, J., Qiu, S., Cen, M., Liao, Z.: A novel DWT and CTSM-based image watermarking method. In: IEEE 18th International Conference on Communication Technology (ICCT), Chongqing, China, pp. 1232–1236 (2018). https://doi.org/10.1109/ICCT.2018.8600033
18. Malathkar, N.V., Soni, S.K.: High compression efficiency image compression algorithm based on subsampling for capsule endoscopy. Multimedia Tools Appl. 1–13 (2021)
19. Peng, X., Jiang, J., Tan, L., Hou, J.: 2-D bi-level block coding for color image compression and transmission with bit-error awareness. IEEE Access 8, 110093–110102 (2020)
20. Li, G., Hou, Y., Zhu, J.: An efficient and fast VLIW compression scheme for stream processor. IEEE Access (2020)
21. Bhuvaneswary, N., Reddy, B.S., Reddy, E.H., Gopi, G.: Design of parallel pipelined architecture for wavelet based image compression using Daubechies. In: IEEE International Conference on Intelligent Techniques in Control, Optimization and Signal Processing (INCOS), Tamilnadu, India (2019). https://doi.org/10.1109/INCOS45849.2019.8951432
22. Gupta, N.K., Parsai, M.P.: Improvised method of five modulus method embedded JPEG image compression with algebraic operation. In: 2nd International Conference on Intelligent Computing, Instrumentation and Control Technologies (ICICICT), Kannur, Kerala, India, pp. 1338–1342 (2019). https://doi.org/10.1109/ICICICT46008.2019.8993154
23. Li, F., Hong, S., Wang, L.: A novel near lossless image compression method. In: IEEE International Symposium on Circuits and Systems (ISCAS), Japan (2019). https://doi.org/10.1109/ISCAS.2019.8702673
24. Cheng, H.H., Chen, C.A., Lee, L.J., Lin, T.L., Chiou, Y.S., Chen, S.L.: A low-complexity color image compression algorithm based on AMBTC. In: IEEE International Conference on Consumer Electronics—Taiwan (ICCE-TW) (2019). https://doi.org/10.1109/ICCE-TW46550.2019.8992037
Violence Detection in Video Footages Using I3D ConvNet
Joel Selvaraj and J. Anuradha
J. Selvaraj (B) · J. Anuradha Vellore Institute of Technology, Vellore, Tamil Nadu 632014, India e-mail: [email protected] J. Anuradha e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 S. Roy et al. (eds.), Innovations in Computational Intelligence and Computer Vision, Advances in Intelligent Systems and Computing 1424, https://doi.org/10.1007/978-981-19-0475-2_6
Abstract With the increasing use of drones for surveillance and the wide deployment of closed-circuit television (CCTV) cameras in public spaces, the need for monitoring these videos is greater than ever before. Drone and CCTV footage has helped in solving crimes that had already been committed but has not been able to prevent them from happening. Manually monitoring these huge live video feeds to prevent or identify a crime is a labor-intensive and expensive task. Thus, automatic monitoring of these videos for violent behavior is the optimal solution to this problem. In this paper, generating optical flow images using FlowNet 2.0 (Ilg et al., FlowNet 2.0: Evolution of Optical Flow Estimation with Deep Networks, pp. 1647–1655, 2017) and training the Inflated 3D (I3D) deep neural network (Carreira and Zisserman, Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset, pp. 4724–4733, 2017) on them drastically improves the state-of-the-art performance on the surveillance camera fight dataset.
Keywords Violence detection · I3D · ConvNet · Inception · Optical flow
1 Introduction
As more and more surveillance options are being implemented across countries, the need to monitor these videos for criminal activity is higher than ever. These surveillance options can help in identifying possible criminals during the event of a crime. However, it is too expensive to manually monitor the huge volume of video footage. Thus, video processing techniques that detect violent activities can potentially identify crime and possibly stop it in real time.
As most crimes that occur in public places involve violent behaviors, a model that can identify potential assault attempts is developed in this work. The problem can be viewed as a human activity recognition (HAR) problem. In general, there are two approaches to solving a human activity recognition problem. One approach is extracting various features, such as optical flow, the violent flow (ViF) descriptor [1], motion intensities, skin- and blood-colored regions, oriented gradients, and pose detection, and using traditional machine learning classifiers such as K-nearest neighbors (KNN), support vector machines (SVM), and AdaBoost to classify human activity. The other approach is using a deep neural network instead of traditional machine learning classifiers to identify the human activity. The deep neural network is trained with video-level or frame-level annotations over a large dataset to accurately classify the human activity. In this study, the inflated 3D (I3D) deep neural network [2] and optical flow images are used to enhance the detection of violence in videos from the surveillance camera fight dataset [3]. The work can be summarized as follows:
• First, optical flow images are generated for the videos in the surveillance camera fight dataset using FlowNet 2.0, a deep neural network introduced by Ilg et al. [4], instead of the traditional mathematical approach.
• Then, the I3D model is trained and evaluated on the generated optical flow images from the surveillance camera fight dataset. The model effectively learns temporal changes by stacking images over time to form the third dimension of the model input.
• In this study, the I3D model takes 20 optical flow images stacked as a single 3D input for each video and achieves a state-of-the-art accuracy of 80% on the surveillance camera fight dataset.
The remainder of this paper is organized as follows: In Sect. 2, related research is discussed. The architecture and algorithm of the approach are described in Sect. 3. In Sect. 4, the pre-processing, training, and test stages of the implementation are elaborated. The experimental results are discussed and compared in Sect. 5. Finally, the paper is summarized in Sect. 6.
2 Related Work
Spatio-temporal information is very crucial to effectively detect violence in video footage. There are two widely used approaches for building deep neural network models that can learn spatio-temporal data. The first approach is using a 3D convolutional neural network (CNN), which can take a sequence of adjacent images (providing temporal information) as a single input and perform 3D convolutions to improve performance over a traditional 2D CNN.
Li et al. [5] introduced a custom 3D CNN model which takes 16 adjacent frames combined to form a single input, so that the model can learn spatio-temporal properties effectively. The model achieved an accuracy of 98.3% on the Hockey Fight dataset [6]. Ullah et al. [7] use a 2D CNN to keep only frames that contain persons, then combine sequences of 16 such frames and feed them into a 3D CNN. Their approach achieves an accuracy of 98% on the Violence Crowd dataset [1]. The second approach is combining a long short-term memory (LSTM) layer, which is good at handling sequences of data (temporal information), with a traditional 2D CNN, which is good at learning spatial information. Samuel et al. [8] extracted histogram of oriented gradients (HOG) features from video footage and used them to train a bidirectional LSTM (BDLSTM/Bi-LSTM) model. The model is trained on the violent interaction dataset (VID) and tested against their own football violence dataset with an accuracy of 94.5%. Ullah et al. [9] approach the problem by first extracting features from the video footage using a ResNet-50 model pretrained on the ImageNet dataset. The extracted features are then passed through a BDLSTM network which detects anomalies in the video footage. This model achieved an accuracy of 85.53% on the UCF-Crime dataset and 89.05% on the UCFCrime2Local dataset. Aktı et al. [3] introduced a Fight-CNN + Bi-LSTM + attention model which takes 10 frames stacked as a single input. The model achieves an accuracy of 71% on the custom surveillance camera fight dataset which they generated. The Fight-CNN used is an Xception model pretrained on the Hockey Fight dataset. In this paper, the work done by Aktı et al. [3] is extended by generating optical flow images for the input frames and using an inflated 3D (I3D) model to drastically increase the performance on the surveillance camera fight dataset.
3 Architecture
Carreira et al. [2] introduced the inflated 3D ConvNet (I3D) model, which works by inflating the 2D pooling and filter kernels of very deep 2D ConvNets to form 3D ConvNets that are capable of effectively learning spatio-temporal features from video footage. The weights of an N × N 2D filter are repeated N times along the time dimension and divided by N to rescale them, forming an N × N × N 3D filter. The I3D model architecture builds upon 2D ImageNet architectures such as Inception [10], Xception [11], VGGNet [12], and ResNet [13] and can optionally transfer weights from those pretrained 2D ImageNet models. The I3D model differs from C3D-like 3D ConvNet models by going deep with Inception layers while having far fewer parameters to train. In this study, the I3D architecture is made up of Inception v1 modules, 3D filters, and max pooling layers, as shown in Fig. 1.
Fig. 1 Inflated 3D (I3D) model architecture
The I3D model starts with a convolutional layer of stride 2 and contains four max pooling layers with stride 2 and a 7 × 7 average pooling layer before the final classification layer. The Inception v1 modules are placed between the max pooling layers. The internal structure of the Inception v1 module is shown in Fig. 2. It consists of four 1 × 1 × 1 convolutions to down-sample the feature maps as the network goes deep: two of them are followed by a 3 × 3 × 3 convolution, one is preceded by a 3 × 3 × 3 max pool layer, and one is connected directly between the previous layer and the concatenation layer. The two 3 × 3 × 3 convolutions and two of the 1 × 1 × 1 convolutions are concatenated in the final layer of the Inception v1 module. The I3D model takes an input of dimension N × 224 × 224 × 3, where N is the number of frames selected per video over time. All the convolution layers in the I3D model use a rectified linear unit (ReLU) activation function. The final classification layer uses a sigmoid activation function, since a binary classification is performed to check whether violence is present in the video. In this study, the binary cross-entropy loss is calculated using Eq. 1:

Loss(Q) = −(1/N) · Σ_{i=1}^{N} [ y_i log(p(y_i)) + (1 − y_i) log(1 − p(y_i)) ]   (1)
where N is the number of inputs, y is the output label (1 for violence, 0 for no violence), and p(y) is the probability predicted by the I3D model.
Fig. 2 Inception v1 module
For error correction, stochastic gradient descent (SGD) is used as the optimizer, which updates the model's weights using Eq. 2:

w := w − α ∇Q_i(w)   (2)

where w is the model weight to be updated, α is the learning rate, and Q_i(w) is the value of the loss function up to the ith sample in the dataset.
4 Implementation The I3D model is implemented on the surveillance camera fight dataset to detect violence in video footages through the following 3 stages.
4.1 Data Pre-processing
The surveillance camera fight dataset was introduced by Aktı et al. [3]. The dataset was collected mostly from YouTube, and some non-fight videos were extracted from datasets like CamNet [14] and Synopsis [15, 16]. The dataset contains 150 videos in the fight category and 150 videos in the non-fight category.
Fig. 3 First input image (left) and second input image (center) for FlowNet2.0 and the generated optical flow image (right)
Each video is approximately 2 s in duration. The videos have different numbers of frames and various pixel sizes. In the data pre-processing stage, the surveillance camera fight dataset is split into three parts: a training set containing 240 videos, a validation set containing 30 videos, and a test set containing 30 videos. The videos are converted into one image per frame, and the images are resized to 400 × 400 pixels. To generate the optical flow for the images, the FlowNet 2.0 model [4] is used. FlowNet 2.0 takes two consecutive frames as input to generate a single optical flow output; thus, for a video with N frames, N−1 optical flow images are generated. A sample of a generated optical flow image and its respective input images from the surveillance camera fight dataset is shown in Fig. 3. The following algorithm shows the pseudocode of the overall pre-processing stage.
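(The original pseudocode figure does not reproduce in this text; the following Python sketch stands in for the described steps. run_flownet2 is a hypothetical wrapper around FlowNet 2.0 inference, and the file layout is an assumption.)

import os
import cv2

def preprocess_videos(video_dir, out_dir, size=(400, 400)):
    # Extract frames, resize to 400 x 400, and pair consecutive frames
    # for FlowNet 2.0 (N frames yield N-1 optical flow images).
    for name in os.listdir(video_dir):
        cap = cv2.VideoCapture(os.path.join(video_dir, name))
        frames = []
        ok, frame = cap.read()
        while ok:
            frames.append(cv2.resize(frame, size))
            ok, frame = cap.read()
        cap.release()
        for i in range(len(frames) - 1):
            flow_img = run_flownet2(frames[i], frames[i + 1])  # hypothetical call
            cv2.imwrite(os.path.join(out_dir, f'{name}_{i:04d}.png'), flow_img)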
4.2 Training the I3D Model
During the training stage, for each video in the training set, starting from a random start frame and iterating over 40 consecutive frames, 20 frames are selected alternately by skipping every other frame. If the video does not contain a minimum of 40 frames, the video's images are repeated from the beginning until 40 frames are available, so that 20 frames can be selected alternately from them. The selected optical flow images are resized to 224 × 224 pixels and rescaled by dividing them by 255. Thus, the I3D model takes an input of shape 20 × 224 × 224 × 3, where 3 represents the RGB channels of the optical flow image. The stochastic gradient descent (SGD) optimizer is initialized with a momentum of 0.9 and a learning rate (α) of 0.01. The learning rate is dynamically adjusted using the "Reduce Learning Rate on Plateau" feature of the Keras library to converge well on the local or global minima. After each epoch, the model is evaluated against the validation dataset and the validation loss is computed; if the validation loss has not improved in the past 10 epochs, the learning rate is reduced by a factor of 0.1. As a binary classifier is being developed to detect whether violence is present in a video, binary cross-entropy is used as the loss function. The I3D model is trained with a batch size of 16 videos per batch. The model is trained and evaluated using an Intel Core i7-8700 6-core, 12-thread CPU, an NVIDIA GTX 1070 Ti with 8 GB of graphics memory, and 24 GB of system RAM. Keras with the TensorFlow backend is used to develop and train the model. After the I3D model is trained, it is saved as a hierarchical data format (HDF) file for later evaluation. The following algorithm shows the pseudocode of the overall training stage.
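(In place of the missing pseudocode figure, a sketch of this training configuration in Keras; build_i3d is a placeholder for the network of Fig. 1, the epoch count is an assumption, and the train/validation arrays are presumed prepared as described.)

from tensorflow import keras

model = build_i3d(input_shape=(20, 224, 224, 3))          # hypothetical builder
opt = keras.optimizers.SGD(learning_rate=0.01, momentum=0.9)
model.compile(optimizer=opt, loss='binary_crossentropy', metrics=['accuracy'])
reduce_lr = keras.callbacks.ReduceLROnPlateau(monitor='val_loss',
                                              factor=0.1, patience=10)
model.fit(train_x, train_y,                               # optical-flow stacks
          validation_data=(val_x, val_y),
          batch_size=16, epochs=100, callbacks=[reduce_lr])
model.save('i3d_violence.h5')                             # HDF file for later use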
4.3 Testing the I3D Model
During testing, for each video in the test set, an equispaced sample of 20 frames is selected starting from the first frame. If the video does not contain a minimum of 20 frames, the video's images are repeated from the beginning until 20 frames are available. The selected optical flow images are resized to 224 × 224 pixels and rescaled by dividing them by 255. For each video in the test set, the I3D model takes an input of shape 20 × 224 × 224 × 3 and predicts the probability of whether violence is present. The predicted probability is used to calculate various metrics such as accuracy, loss, precision, recall, F1, and area under the curve (AUC) score. The following algorithm shows the pseudocode of the overall testing stage.
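(Again in place of the missing pseudocode figure, a minimal sketch of the equispaced sampling and the metric computation, using scikit-learn for the scores; probs and labels are assumed to come from the trained model and the test set.)

import numpy as np
from sklearn.metrics import accuracy_score, f1_score, roc_auc_score

def sample_equispaced(frames, n=20):
    while len(frames) < n:        # short videos repeat from the beginning
        frames = frames + frames
    idx = np.linspace(0, len(frames) - 1, n).astype(int)
    return [frames[i] for i in idx]

preds = (probs >= 0.5).astype(int)     # threshold the predicted probabilities
print(accuracy_score(labels, preds), f1_score(labels, preds),
      roc_auc_score(labels, probs))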
5 Results and Discussion
In this experiment, the I3D model's performance is measured by calculating the precision, recall, accuracy, F1-score, and AUC score on the test set. Precision measures the proportion of positive identifications that are correct. Recall measures the proportion of actual positives that are identified correctly. The harmonic mean of precision and recall gives the F1-score. The precision, recall, and F1-score achieved by the model for each class are given in Table 1. The receiver operating characteristic (ROC) curve, which shows the performance of the I3D model at all classification thresholds, is plotted in Fig. 4, and the AUC score is calculated to be 0.84.
Table 1 Precision, recall, and F1-score of the I3D model over the surveillance camera fight dataset

Class           | Precision | Recall | F1-Score
0 (no violence) | 0.86      | 0.75   | 0.80
1 (violence)    | 0.75      | 0.86   | 0.80
Fig. 4 ROC curve of I3D model over the surveillance camera fight dataset
The approach of using the I3D model and optical flow images has achieved a state-of-the-art accuracy of 80% on the surveillance camera fight dataset. A comparison of this model with the other models evaluated by Aktı et al. [3] is given in Table 2 and Fig. 5. It can be observed that this approach outperformed the Fight-CNN + Bi-LSTM + attention model [3] and the other LSTM-based models, and effectively detects violence in the surveillance camera fight dataset.

Table 2 Performance comparison of the I3D + optical flow model against other models on the surveillance camera fight dataset

Model                                          | Accuracy (%)
VGG16 + LSTM (10 frames) [3]                   | 62
VGG16 + Bi-LSTM (10 frames) [3]                | 45
Xception + LSTM (10 frames) [3]                | 60
Xception + Bi-LSTM (10 frames) [3]             | 63.3
Xception + Bi-LSTM + attention (10 frames) [3] | 69
Fight-CNN + Bi-LSTM (5 frames) [3]             | 70
Fight-CNN + Bi-LSTM + attention (5 frames) [3] | 72
Inflated 3D (I3D) + Optical Flow (20 frames)   | 80
Fig. 5 Graphical performance comparison of I3D + optical flow model against other models on the surveillance camera fight dataset
6 Conclusion
The drastic improvement in accuracy shows the effectiveness of using optical flow and the I3D model over LSTM-based approaches for learning spatio-temporal features from video footage. In the future, the work can be extended to use a two-stream architecture, which has not yet been explored. Further, to speed up the training process and achieve better accuracy, pretrained Inception weights can be used. This paper shows that the approach of using optical flow and I3D ConvNets provides superior results over contemporary LSTM-based models on violence datasets such as the surveillance camera fight dataset.
References
1. Hassner, T., Itcher, Y., Kliper-Gross, O.: Violent flows: real-time detection of violent crowd behavior. In: 2012 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops, pp. 1–6 (2012). https://doi.org/10.1109/CVPRW.2012.6239348
2. Carreira, J., Zisserman, A.: Quo Vadis, action recognition? A new model and the kinetics dataset. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 4724–4733 (2017). https://doi.org/10.1109/CVPR.2017.502
3. Aktı, S., Tataroğlu, G.A., Ekenel, H.K.: Vision-based fight detection from surveillance cameras. In: 2019 Ninth International Conference on Image Processing Theory, Tools and Applications (IPTA), pp. 1–6 (2019). https://doi.org/10.1109/IPTA.2019.8936070
4. Ilg, E., Mayer, N., Saikia, T., Keuper, M., Dosovitskiy, A., Brox, T.: FlowNet 2.0: evolution of optical flow estimation with deep networks. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1647–1655 (2017). https://doi.org/10.1109/CVPR.2017.179
5. Li, J., Jiang, X., Sun, T., Xu, K.: Efficient violence detection using 3D convolutional neural networks. In: 2019 16th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS), pp. 1–8 (2019). https://doi.org/10.1109/AVSS.2019.8909883
6. Bermejo Nievas, E., Deniz Suarez, O., Bueno García, G., Sukthankar, R.: Violence detection in video using computer vision techniques. In: Real, P., Diaz-Pernil, D., Molina-Abril, H., Berciano, A., Kropatsch, W. (eds.) Computer Analysis of Images and Patterns, pp. 332–339. Springer, Berlin, Heidelberg (2011)
7. Ullah, F.U.M., Ullah, A., Muhammad, K., Haq, I.U., Baik, S.W.: Violence detection using spatiotemporal features with 3D convolutional neural network. Sensors 19 (2019). https://doi.org/10.3390/s19112472
8. Samuel, R., Fenil, E., Manogaran, G., Vivekananda, G.N., Thanjaivadivel, T., Jeeva, S., Ahilan, A.: Real time violence detection framework for football stadium comprising of big data analysis and deep learning through bidirectional LSTM. Comput. Netw. 151, 191–200 (2019). https://doi.org/10.1016/j.comnet.2019.01.02
9. Ullah, W., Ullah, A., Haq, I.U., Muhammad, K., Sajjad, M., Baik, S.W.: CNN features with bi-directional LSTM for real-time anomaly detection in surveillance networks. Multimedia Tools Appl. (2020). https://doi.org/10.1007/s11042-020-09406-3
10. Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Going deeper with convolutions. In: 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1–9 (2015). https://doi.org/10.1109/CVPR.2015.7298594
11. Chollet, F.: Xception: deep learning with depthwise separable convolutions. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1800–1807 (2017). https://doi.org/10.1109/CVPR.2017.195
12. Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR), pp. 730–734 (2015). https://doi.org/10.1109/ACPR.2015.7486599
13. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016). https://doi.org/10.1109/CVPR.2016.90
14. Zhang, S., Staudt, E., Faltemier, T., Roy-Chowdhury, A.K.: A camera network tracking (CamNeT) dataset and performance baseline. In: 2015 IEEE Winter Conference on Applications of Computer Vision, pp. 365–372 (2015). https://doi.org/10.1109/WACV.2015.55
15. Wang, W., Chung, P., Huang, C., Huang, W.: Event based surveillance video synopsis using trajectory kinematics descriptors. In: 2017 Fifteenth IAPR International Conference on Machine Vision Applications (MVA), pp. 250–253 (2017). https://doi.org/10.23919/MVA.2017.7986848
16. Huang, C., Chung, P.J., Yang, D., Chen, H., Huang, G.: Maximum a posteriori probability estimation for online surveillance video synopsis. IEEE Trans. Circuits Syst. Video Technol. 24, 1417–1429 (2014). https://doi.org/10.1109/TCSVT.2014.2308603
Performance Analysis of Gradient Descent and Backpropagation Algorithms in Classifying Areas under Community Quarantine in the Philippines
Carmella Denise M. Adeza, Crishmel Bemar G. Manarin, Jerry S. Polinar, Paolo Jyme A. Cabrera, and Arman Bernard G. Santos
C. D. M. Adeza · C. B. G. Manarin · J. S. Polinar · P. J. A. Cabrera · A. B. G. Santos (B) Asiatech-Sta. Rosa, City of Sta. Rosa, Laguna, Philippines e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 S. Roy et al. (eds.), Innovations in Computational Intelligence and Computer Vision, Advances in Intelligent Systems and Computing 1424, https://doi.org/10.1007/978-981-19-0475-2_7
Abstract This paper focuses on classifying the areas under community quarantine in the Philippines based on an analysis with the gradient descent and backpropagation algorithms, supported by a multilayer perceptron (MLP) model. The backpropagation algorithm calculates the gradient of the loss function with respect to each weight by the chain rule, computing the gradient one layer at a time and iterating backward from the last layer to prevent redundant intermediate-term calculations in the chain rule. Gradient descent is an algorithm that follows the derivative of a differentiable function to find a local minimum. The purpose of this research is to lessen errors in decision-making and offer a possible solution. This research gives the reader the parametric training results obtained through gradient descent and backpropagation on all data sets to classify the areas under community quarantine within a span of time. This research will improve and keep developing as time goes by.
Keywords Backpropagation · Feedforward · Gradient descent · Multilayer perceptron · Neural networks · Quarantine
1 Introduction
A pandemic is a phenomenon that the world is experiencing right now, spreading throughout the world. A pandemic is alarming because the contagion spreads from person to person, resulting in catastrophic hindrances that prevent people from living the way they used to. As a result, countries and cities implemented community quarantine: no transportation, closed malls and establishments, no one allowed to go out, and, most importantly, a collapsing economy. Coping with these challenges and finding the right information for classifying the areas under
community quarantine with the help of artificial intelligence is relevant to this evolving society. There are many challenges at the present time, and information is especially needed by society. A number of algorithms are fit to support, with a mathematical and scientific explanation, the classification of data sets for setting out possible solutions and results to a specific problem. The researchers used a neural network algorithm with the help of the gradient descent and backpropagation algorithms to classify areas under community quarantine in the Philippines. Recognition of the areas is applied in the algorithm to analyze which locations are under community quarantine. In deep reinforcement learning (deep RL), optimization methods are used along with the policy gradient or temporal-difference error to train neural networks. If the gradient descent algorithm shows implicit momentum, the output is modified; the goal shifts as an agent improves over time through changes in the loss landscape and local optima [1]. The backpropagation algorithm is the proposed algorithm, enhanced in its convergence process to reach the optimal steady state and minimize the misadjustment of the error in learning the known patterns. The learning rate parameter, which depends on the eigenvalues of the input autocorrelation matrix, regulates this algorithm; it provides a low error for the weight updates [2]. The data sets may easily demonstrate and classify the areas under community quarantine in the Philippines using gradient descent and backpropagation. Areas are closed to individuals or the community to reduce the transmission of diseases. The areas are categorized by the basis on which the quarantine was implemented, such as the fatality rate and the predicted rate of quarantine. Classifying the areas under quarantine with the help of an algorithm is the intended learning outcome of the researchers. This research aims to evaluate and assess the areas under quarantine using a neural network algorithm. The parametric classifiers are applied in the parametric model to the data sets and are suitable for classifying the test data.
2 Related Literature and Studies
Cigizoglu and Kişi emphasized that the performance of flow prediction by artificial neural networks (ANNs) is generally considered dependent on the length of the data. In the ANN training stage, a statistical method, k-fold partitioning, was employed in their analysis. The procedure was considered useful in the case of the traditional feedforward backpropagation algorithm. The study's next concern was the comparison of prediction efficiency and convergence velocity between three different backpropagation algorithms, Levenberg-Marquardt, conjugate gradient, and gradient descent, noting shorter training times and more satisfactory performance [3].
Gill et al. state that neural networks are sufficient to predict these meteorological processes due to the non-linearity of climatic physics. The most significant algorithm for training a neural network is the backpropagation algorithm using the gradient descent
method. In their paper, to address some of these issues, an integrated backpropagation-based evolutionary algorithm technique for training artificial neural networks is proposed. In the proposed methodology, backpropagation is coupled with the genetic algorithm in such a manner that the algorithm's pitfalls are transformed into advantages [4].
Shapiro and Wardi studied the convergence of the stochastic gradient descent algorithm on the basis of a sample path to optimize expected-value output measures in discrete event systems. At successive iterations, the algorithm uses increasing precision, and it moves against the direction of a generalized gradient of the computed sample output function. The evidence is based on a version of the uniform law of large numbers that is shown to be highly consistent in several discrete event systems where infinitesimal perturbation analysis is understood [5].
According to Dr. R. Karthikeyan [6], artificial neural networks are used for exploration, study, and prediction purposes in a number of health fields. Supervised training with backpropagation is among the most common techniques for training a neural network. In both instances, the feedforward backpropagation neural network is used as a classifier to distinguish between infected and non-infected individuals. The neural network targets are labeled with 1's for infected and 0's for non-infected individuals. The backpropagation neural network model trained with data sets was discussed by Meenakshi [7].
Guo and Gelfand analyzed certain dynamic properties of gradient descent type learning algorithms for multilayer feedforward neural networks. These properties are related more to the net's multilayer structure than to the nodes' individual output units. The study is carried out in two stages using a simplified deterministic gradient algorithm. Using LaSalle's theory, a global analysis of a related ordinary differential equation (ODE) is completed; then, by linearizing along a nominal ODE trajectory, a local analysis of the gradient algorithm is achieved [8].
According to Al-Sammarraie et al. [9], the backpropagation network is one of the best-known neural networks. This network has been used in a range of application fields, one of which is identifying objects by knowing only a portion of the information about the object to be categorized. A detailed comparison between the backpropagation and maximum likelihood neural network classifiers for the classification of urban land use was provided by Paola and Schowengerdt [10]. Rong Zeng et al. [11] proposed a backpropagation neural network (BPNN) architecture assisted by an adaptive differential evolution algorithm to estimate energy consumption. The hypothesized hybrid model incorporates statistics on gross domestic product, demographics, imports, and exports.
3 Methods and Procedure
This section aims to classify the areas under community quarantine in the Philippines using a multilayer perceptron (MLP) with the help of the backpropagation and gradient descent algorithms. After conducting experimental tests for modeling this research, there are four (4) sets of data covering sixteen (16) provinces that are under community quarantine in the Philippines. Additionally, the four (4) main parameters used are enhanced community quarantine (ECQ), modified enhanced community quarantine (MECQ), general community quarantine (GCQ), and modified general community quarantine (MGCQ). To reach the minima of the loss function, the learning rule is described as W(t + 1) = W(t) − dJ(W)/dW(t). For the hidden layer, other experimental relations are given [12].
3.1 Multilayer Perceptron
A multilayer perceptron consists of three (3) types of layers: an input layer, hidden layers, and an output layer. The MLP calculates its outcome from the inputs and the weights. The MLP is shown in Fig. 1, with the province features as inputs, two hidden layers (fatality rate and predicted rate of quarantine), and the output layer.

Fig. 1 Multilayer perceptron model

Figure 1 shows the multilayer perceptron (MLP), which is used to classify the type of quarantine needed, using backpropagation and gradient descent. The multilayer perceptron is a class of feedforward artificial neural network.
A set of input data is mapped onto a series of output data. The MLP that the researchers used consists of 4 layers: an input layer, 2 hidden layers, and an output layer. The model used is suitable for backpropagation and gradient descent because it is helpful in solving problems with this kind of algorithm. The multilayer perceptron's input consists of the data from each affected region in the Philippines. The data stored inside the hidden layers are compared to the given input to produce the desired output.
Gradient Descent: The key to minimizing the loss function and estimating or predicting its original value.
Backpropagation: The process of calculating the derivatives.
The derived formula is

L = (A_r − B_r)²
L = (D · C_fr − D · P_r)²
L = ((W_j / W_k) · (x / z) − (W_j / W_k) · P_r)²
where
L = loss function
A_r = actual output
B_r = predicted output
D = population density
C_fr = case fatality rate
P_r = predicted rate
W_j = population
W_k = area
x = deaths
z = cases
Neural networks are trained using gradient descent; when designing and configuring the model, this requires selecting a loss function. The loss function has an essential role in that all aspects of the model must be condensed faithfully into a single number, in such a manner that changes in that number are a sign of a better model [13]. The loss function is a measurement of error that quantifies the prediction loss by comparing the predicted output to the actual output [14]. Table 1 shows the network parameters of gradient descent and backpropagation, which consist of two (2) hidden layers, with the hidden neurons composed of sixteen in the first (1st) layer and four in the second (2nd) layer. The formula used in the backpropagation training function is the loss function, and the weight/bias learning function is gradient descent. Backpropagation is the method of calculating the gradients over the data sets, and
Table 1 Network parameters of gradient descent and backpropagation used in our analysis

Parameters                        | Formula used for computing the feature
Number of hidden layers           | 2
Number of hidden neurons          | 1st layer: 16; 2nd layer: 4
Backpropagation training function | Loss of function
Weight/bias learning function     | Gradient descent
gradient descent then follows these gradients down the loss function when classifying the data sets [15].
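To make the training loop concrete, the following is a minimal NumPy sketch of backpropagation with plain gradient descent on the squared-error loss above, using the layer sizes of Table 1. The feature encoding of the provinces is not specified in the paper, so X and A below are random stand-ins, and the sigmoid activation is an assumption.

import numpy as np

rng = np.random.default_rng(0)
sizes = [16, 16, 4, 1]                       # input, hidden 1, hidden 2, output
W = [rng.normal(0, 0.1, (a, b)) for a, b in zip(sizes[:-1], sizes[1:])]
b = [np.zeros(s) for s in sizes[1:]]
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

X = rng.random((16, 16))                     # 16 provinces x 16 features (stand-in)
A = rng.random((16, 1))                      # actual outputs A_r = D * C_fr (stand-in)

lr = 0.01
for epoch in range(1000):
    acts = [X]                               # forward pass through all layers
    for Wl, bl in zip(W, b):
        acts.append(sigmoid(acts[-1] @ Wl + bl))
    B = acts[-1]                             # predicted output B_r
    delta = 2 * (B - A) * B * (1 - B)        # dL/dz at the output, L = (A_r - B_r)^2
    for l in range(len(W) - 1, -1, -1):      # backpropagate one layer at a time
        gW = acts[l].T @ delta / len(X)
        gb = delta.mean(axis=0)
        if l > 0:                            # propagate the error one layer back
            delta = (delta @ W[l].T) * acts[l] * (1 - acts[l])
        W[l] -= lr * gW                      # gradient descent: W(t+1) = W(t) - lr * dL/dW
        b[l] -= lr * gb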
4 Results and Discussion
The information on the accuracies acquired in relation to this research, from the data sets on the multilayer perceptron (MLP) with the help of gradient descent and backpropagation, is shown below in Table 2. This study exploited the density of the areas based on the provinces in the Philippines as the input-layer features, with the fatality rate and the predicted rate of quarantine as the hidden-layer functions, and the gradient descent and backpropagation derived formula applied for the last layer, the output layer.

Table 2 Performance analysis of the MLP networks

Province   | D       | C_fr   | MGCQ (0.01) | GCQ (0.02) | MECQ (0.03) | ECQ (0.04) | Actual output
Abra       | 0.1582  | 0.0060 | 0.0016      | 0.0032     | 0.0047      | 0.0063     | 0.0009
Benguet    | 0.4314  | 0.0010 | 0.0043      | 0.0086     | 0.0129      | 0.0173     | 0.0004
La Union   | 1.4291  | 0.0330 | 0.0143      | 0.0286     | 0.0429      | 0.0572     | 0.0472
Pangasinan | 1.4820  | 0.0210 | 0.0148      | 0.0296     | 0.0445      | 0.0593     | 0.0311
Batanes    | 0.2255  | 0.0170 | 0.0023      | 0.0045     | 0.0068      | 0.0090     | 0.0038
Cagayan    | 0.3552  | 0.0300 | 0.0036      | 0.0071     | 0.0107      | 0.0142     | 0.0107
Tarlac     | 3.4045  | 0.0250 | 0.0340      | 0.0681     | 0.1021      | 0.1362     | 0.0851
Pampanga   | 2.7537  | 0.0130 | 0.0275      | 0.0551     | 0.0826      | 0.1101     | 0.0358
Laguna     | 4.3239  | 0.0230 | 0.0432      | 0.0865     | 0.1297      | 0.1730     | 0.0994
Batangas   | 0.2889  | 0.0140 | 0.0029      | 0.0058     | 0.0087      | 0.0116     | 0.0040
Romblon    | 1.2190  | 0.0202 | 0.0122      | 0.0244     | 0.0366      | 0.0488     | 0.0246
Palawan    | 0.2060  | 0.0120 | 0.0021      | 0.0041     | 0.0062      | 0.0082     | 0.0025
Albay      | 2.6311  | 0.1090 | 0.0263      | 0.0526     | 0.0789      | 0.1052     | 0.2868
Masbate    | 0.5873  | 0.0320 | 0.0059      | 0.0117     | 0.0176      | 0.0235     | 0.0188
Navotas    | 63.2862 | 0.0390 | 0.6329      | 1.2657     | 1.8986      | 2.5314     | 2.4682
Malabon    | 50.5416 | 0.0370 | 0.5054      | 1.0108     | 1.5162      | 2.0217     | 1.8700
Table 3 Performance analysis of the loss function

Province     D        C_fr    Loss MGCQ   Loss GCQ    Loss MECQ   Loss ECQ
Abra         0.1582   0.0060  0.0000004   0.0000049   0.0000144   0.0000289
Benguet      0.4314   0.0010  0.0000151   0.0000672   0.0001565   0.0002831
La Union     1.4291   0.0330  0.0010803   0.0003451   0.0000184   0.0001001
Pangasinan   1.4820   0.0210  0.0002658   0.0000022   0.0001779   0.0007929
Batanes      0.2255   0.0170  0.0000025   0.0000005   0.0000086   0.0000269
Cagayan      0.3552   0.0300  0.0000505   0.0000126   0.0000000   0.0000126
Tarlac       3.4045   0.0250  0.0026079   0.0002898   0.0002898   0.0026079
Pampanga     2.7537   0.0130  0.0000682   0.0003716   0.0021914   0.0055278
Laguna       4.3239   0.0230  0.0031596   0.0001683   0.0009161   0.0054031
Batangas     0.2889   0.0140  0.0000013   0.0000030   0.0000214   0.0000564
Romblon      1.2190   0.0202  0.0001546   0.0000001   0.0001427   0.0005826
Palawan      0.2060   0.0120  0.0000002   0.0000027   0.0000138   0.0000333
Albay        2.6311   0.1090  0.0678478   0.0548335   0.0432036   0.0329582
Masbate      0.5873   0.0320  0.0001669   0.0000497   0.0000014   0.0000221
Navotas      63.2862  0.0390  3.3683305   1.4458589   0.3244171   0.0040051
Malabon      50.5416  0.0370  1.8621971   0.7382373   0.1251683   0.0229901
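As a hedged consistency check, the tabulated values appear to follow predicted = D · q and loss = (D · C_fr − D · q)², where q is the quarantine coefficient in the column heading (0.01 for MGCQ up to 0.04 for ECQ); this mapping is inferred from the numbers rather than stated explicitly by the authors. The Abra row can be reproduced as follows:

```python
# Reproducing the Abra / MGCQ entries of Tables 2 and 3 under the inferred
# mapping predicted = D*q and loss = (D*C_fr - predicted)**2.
d, c_fr, q = 0.1582, 0.0060, 0.01          # Abra, MGCQ column (q = 0.01)
predicted = d * q                           # 0.001582 -> Table 2 shows 0.0016
loss = (d * c_fr - predicted) ** 2          # 4.0e-07  -> Table 3 shows 0.0000004
print(round(predicted, 4), f"{loss:.7f}")
```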
Table 2 lists the results of running the model on the data sets, alongside the actual output computed from the population density and the case fatality rate. Table 3 lists the corresponding loss values obtained from the accuracies in Table 2. In Table 2, the four MLP configurations used in this research are as follows: the first MLP has sixteen nodes in the input layer, sixteen nodes in hidden layer 1, four nodes in hidden layer 2, and four nodes in the output layer; the three remaining MLP designs add hidden-layer nodes for GCQ, MGCQ, ECQ, and MECQ. In Table 3, the results of the four MLP functions are guided by the loss function applied at the last weights of the nodes; the remaining MLP designs add the nodes for loss ECQ, loss MECQ, loss GCQ, and loss MGCQ. Figure 2 shows the gradient descent visualization of the loss function for the province of Laguna: the magnitude of the loss, combined with the slope, is propagated back through the network, and the slope shows promising results with the desired accuracy.
Fig. 2 The gradient descent slope
5 Conclusions

Based on the computations and a review of previously published related studies, the researchers drew the following conclusions:
• This study showed that gradient descent and backpropagation are related through the adjustment of parameters by means of the loss function. The multilayer perceptron is applicable to classifying risk in terms of quarantine rate. Based on Table 3, the actual output identifies the risk and chance in terms of quarantine rate, and the MLP with the algorithm the researchers used showed excellent accuracy in analyzing the data sets.
• As part of the artificial neural network family, the multilayer perceptron is appropriate for prediction on the given data sets drawn from actual records. The researchers found the MLP accessible for identifying the predicted risk classification of quarantine rate; the predicted values produced from the known actual data were accurate.
• Gradient descent is significant for minimizing the loss function and predicting its value, and backpropagation is used to calculate the derivatives. The algorithm followed an effective pattern in analyzing the classification because it responded accurately to the actual data sets.

Acknowledgements The authors convey their gratitude and appreciation to the Asia Technological School of Science and Arts, particularly the Faculty of Engineering, and to the research advisor Arman Bernard G. Santos, for providing an abundance of support and patience to the Bachelor of Science in Computer Engineering students, as well as fondly remembered experiences and the knowledge necessary for learning computing concepts. The authors wish to thank Springer Nature for possible publication in the International Conference on Innovations in Computational Intelligence and Computer Vision (ICICV-2021). Above all, the researchers express their faith in God Almighty for strength and wisdom.
References

1. Henderson, P., Romoff, J., Pineau, P.: An empirical analysis of gradient descent optimization in policy gradient methods. arXiv:1810.02525 (2018)
2. Hameed, A.A., Karlik, B., Shukri Salman, M.: Backpropagation algorithm with variable adaptive momentum. Knowl.-Based Syst. 114, 79–87 (2016), ISSN 0950-7051
3. Kerem Cigizoglu, K., Kişi, O.: Flow prediction by three backpropagation techniques using k-fold partitioning of neural network training data. Hydrol. Res. 36(1), 49–64 (2009)
4. Gill, E.J., Singh, E.B., Singh, E.S.: Training backpropagation neural networks with genetic algorithm for weather forecasting. In: IEEE 8th International Symposium on Intelligent Systems and Informatics, Subotica, 2010, pp. 465–469. https://doi.org/10.1109/SISY.2010.5647319
5. Shapiro, A., Wardi, Y.: Convergence analysis of gradient descent stochastic algorithms. J. Optim. Theory Appl. 91(2), 439–454 (2019)
6. Dr, R.K.: Artificial neural network for image computation of decision-making process in machine learning. J. Crit. Rev. 7(10), 2054–2059 (2020). https://doi.org/10.31838/jcr.07.10.359
7. Meenakshi, V.: Medical diagnosis using back propagation algorithm in ANN. Int. J. Sci. Eng. Technol. Res. (IJSETR) 3(1) (2014)
8. Guo, H., Gelfand, S.: Analysis of gradient descent learning algorithms for multilayer feedforward neural networks. In: Proceedings of the 29th IEEE Conference on Decision and Control, pp. 1751–1756 (2005)
9. Al-Sammarraie, N.A., Al-Mayali, Y.M.H., Baker El-Ebiary, Y.A.: Classification and diagnosis using backpropagation artificial neural networks (ANN). In: Proceedings of the 2018 International Conference on Smart Computing and Electronic Enterprise (ICSCEE), Shah Alam, 2018, pp. 1–5 (2018). https://doi.org/10.1109/ICSCEE.2018.8538383
10. Paola, J.D., Schowengerdt, R.A.: A detailed comparison of backpropagation neural network and maximum-likelihood classifiers for urban land use classification. IEEE Trans. Geosci. Remote Sens. 33(4), 981–996 (2010)
11. Zeng, Y.R., Zeng, Y., Choi, B., Wang, L.: Multifactor-influenced energy consumption forecasting using enhanced backpropagation neural network. Energy 127, 381–396 (2017), ISSN 0360-5442
12. Yitong, R.: A step by step implementation of gradient descent and backpropagation (2019). Retrieved from https://towardsdatascience.com/a-step-by-step-implementation-of-gradient-descent-and-backpropagation-d58bda486110
13. Ebrahimi: How gradient descent and backpropagation work together (2020). Retrieved from https://datascience.stackexchange.com/questions/44703/how-does-gradient-descent-and-backpropagation-worktogether
14. Brownlee, J.: Deep learning performance (2019). Retrieved from https://machinelearningmastery.com/loss-and-loss-functions-for-training-deep-learning-neural-networks
15. Simplilearn: Backpropagation and gradient descent in neural networks (2019). Retrieved from https://www.slideshare.net/Simplilearn/backpropagation-and-gradient-descent-in-neural-networks-neural-network-tutorial-simplilearn
Least Mean Square Algorithm for Identifying High-risk Areas Vulnerable to Man-Made Disasters: A Philippine Perspective Jaztine B. Abanes, Mary Joy B. Collarin, Kathleen Q. Manlangit, Kem Irish G. Mercado, Nickcel Jirro M. Natanauan, and Arman Bernard G. Santos

Abstract This research centers on identifying the high-risk areas vulnerable to man-made disasters with the assistance of the least mean square (LMS) algorithm, an adaptive procedure well suited to learning from data. The resulting model allows us to identify the regions in the Philippines that are at high risk from man-made disasters. The disaster types included in this research are the frequent man-made disasters in the Philippines, specifically fire, crime, vehicular accidents, and virus spread. The estimation model has three components: an input, a hidden, and an output layer. It manages information queries by incorporating all the training inputs and capturing the dependence among them. This paper presents conclusions on recognizing the high-risk areas vulnerable to man-made disasters across all datasets, with 87% accuracy. Keywords Feed forward · Neural network · Least mean square · Algorithm · Man-made · Disasters · Philippines
1 Introduction When it comes to man-made disasters, there are identified high-risk areas in the Philippines. The least mean square (LMS) algorithm, one of the most useful algorithms for calculating a probable outcome from data, works on the difference between the desired and the actual signal; it is used here to filter signals and track the behavior of objects and events during a man-made disaster. Man-made disasters are calamities due to human intent such as J. B. Abanes · M. J. B. Collarin · K. Q. Manlangit · K. I. G. Mercado · N. J. M. Natanauan · A. B. G. Santos (B) Asiatech College, City of Sta. Rosa, Laguna, Philippines e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 S. Roy et al. (eds.), Innovations in Computational Intelligence and Computer Vision, Advances in Intelligent Systems and Computing 1424, https://doi.org/10.1007/978-981-19-0475-2_8
fires, transport accidents, industrial accidents, oil spills, and nuclear explosions/radiation, a few examples arising from human hazards. Various natural and man-made disasters batter the Philippines. In addition, individuals can inform others of danger ahead through AppLERT and Facebook, so that they can keep a safe distance from the threatened zone through crowdsourcing. We also calculate the algorithm's theoretical transient and steady-state mean square deviation [1, 2]. The least mean square (LMS) algorithm offers learning curves that are beneficial for machine learning theory and application, via the notion of algorithm convergence. Our setting differs from typical applications in that the particular problem is disasters caused by humans. With the use of a feed forward (FF) neural network, the datasets can easily be interpreted and analyzed for the three assessed regions in terms of the typical man-made disaster rate per region in the Philippines, giving results on which areas are vulnerable to a man-made disaster. Vulnerability refers to the inability to withstand the effects of a hostile environment. This research aims to solve this transparent problem with the help of the least mean square algorithm, and it is expected to provide knowledge and guidance for future researchers to make the approach more reliable. It also aims to raise citizens' awareness of particular kinds of man-made disasters such as crime, fire, vehicular accidents, and virus spread, so that the man-made disasters happening every day can be prevented or lessened.
2 Related Literature and Studies According to N. Pidgeon and M. O'Leary, their paper presents a systems view of the organizational preconditions to technological accidents and catastrophes, in particular the seminal "man-made disasters model" proposed by the late Professor Barry Turner. Events such as Chernobyl, the Challenger, and Bhopal have highlighted the need to look for the causes of many modern large-scale accidents; the interaction between technology and organizational failings must now be considered key [3]. According to Asor et al., their paper's main objective is to analyze road accident information to find a hidden pattern that can be used as a preventive measure to reduce the accidents that occur annually in Los Baños, Laguna, Philippines. Predictive algorithms such as decision tree, Naïve Bayes, and rule induction were used to identify variables influencing accidents in Los Baños, Laguna. The researchers found that where an accident happened does not have a significant relationship with the victim's fatality; on the other hand, they found that time and day play a crucial part in the loss or severity suffered by road accident victims, especially in car collisions [4]. In their paper, Bin Xu and Xu Gong used vibration-based damage detection methods to identify damage in reinforced concrete columns after different levels of quasi-static cyclic loading. Finally, they performed a dynamic test
on the damaged reinforced concrete (RC) columns and inspected the changes in the columns' dynamic characteristics at the various damage levels. They applied an eigensystem algorithm to identify the RC columns' modal parameters based on impulse responses recorded, using accelerometers, in the intact and damaged states; these identified modal parameters are then used to locate the columns' damage through a finite-element model updating procedure [5]. According to Babaeian et al., the number of car accidents due to driver drowsiness is very high. Motivated by this pressing need, they propose a novel method that can detect a driver's drowsiness at an early stage by computing heart rate variability using an advanced logistic-regression-based machine learning algorithm [6]. In other significant research by Saurav Gupta et al., nonlinear systems have achieved substantial significance in the signal processing fields of system identification and system control. Many physical systems are modeled by a nonlinear Wiener model: a static nonlinear function followed by a linear time-invariant (LTI) dynamic system, where the output of the nonlinear function is taken to be continuous and invertible. That work identifies the Wiener model parameters using the least mean square (LMS) algorithm and two of its variants, the leaky LMS and the modified leaky LMS, owing to their simple and effective adaptive nature; simulation results supporting the derived technique were obtained to analyze the algorithm's performance [7]. Jose Principe et al. stated that the combination of the celebrated kernel trick and the least mean square algorithm provides an interesting sample-by-sample update for an adaptive filter in reproducing kernel Hilbert spaces, named in their paper the kernel least mean square (KLMS). This result is that paper's principal contribution and improves the current understanding of the LMS algorithm from an artificial intelligence (AI) point of view [8]. In other significant research by Celesti et al., sudden traffic slowdown, especially on fast roads and highways characterized by poor visibility, is one of the major causes of motor vehicle accidents; in this situation, fast real-time processing of big traffic data is essential to anticipate accidents [9]. According to Gupta et al., their paper discusses a survey of road accident severity using data mining, where different approaches have been considered; the article consists of collections of methods in various scenarios to resolve road accidents [10]. Among the related literature and studies for this research, the common element was the fundamental objective: to identify the root cause of man-made disasters from the Philippine perspective. As for this research, crime, fire, vehicular accidents, and virus spread were the most prominent disasters found across all locales within the country. Each work differs in which algorithm and neural network are utilized; every algorithm has its own way of answering its research topic, with a neural network supporting the algorithm. For this research, the
least mean square (LMS) algorithm was used to track the linear regression of areas vulnerable to man-made disasters, while the feed forward (FF) neural network supports the process by which the prediction is settled.
3 Experiment and Methodologies

This chapter briefly presents the various strategies and methods used by the researchers in carrying out their examinations. It comprises the research design, the study region, and the subjects, and it covers the instruments used to collect and gather information and the statistical tools used in processing and analyzing the data (Fig. 1). To formulate the least mean square algorithm, use this formula:

ŵ(n + 1) = ŵ(n) + η x(n) e(n)    (1)

where ŵ(n) is the weight vector estimate, η is the step size, x(n) is the input vector, and e(n) is the error signal. To formulate the linear regression using the least squares method, use this formula:

Y_i = f(X_i, β) + e_i    (2)

where Y_i is the dependent variable, f is the function,
X_i is the independent variable, β is the unknown parameter, and e_i is the error term.

Fig. 1 The feed forward neural network; it shows the process by which the prediction is settled
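A minimal Python sketch of the LMS update in Eq. (1) is given below. The filter length, step size, and synthetic signals are illustrative assumptions; the paper does not publish its code or the exact layout of its disaster-rate training data.

```python
import numpy as np

# Minimal LMS sketch following Eq. (1): w(n+1) = w(n) + eta * x(n) * e(n).
# The "unknown system" here is a made-up 4-tap filter used only to show
# that the weights converge; it is not the authors' data.

def lms(x, d, eta=0.05, taps=4):
    """Adapt a taps-long weight vector so that w . x(n) tracks d(n)."""
    w = np.zeros(taps)
    for n in range(taps - 1, len(x)):
        x_n = x[n - taps + 1:n + 1][::-1]   # [x(n), x(n-1), ...]
        e = d[n] - w @ x_n                  # error signal e(n)
        w = w + eta * x_n * e               # LMS weight update, Eq. (1)
    return w

rng = np.random.default_rng(0)
x = rng.normal(size=500)
d = np.convolve(x, [0.4, 0.3, 0.2, 0.1], mode="full")[:500]
print(np.round(lms(x, d), 2))   # approaches [0.4, 0.3, 0.2, 0.1]
```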
4 Results and Discussion

This chapter presents the given datasets of the four selected man-made disasters and discusses the results. Table 1 shows the raw datasets of man-made disasters, namely crime, fire, vehicular accidents, and virus spread, as processed with the least mean square algorithm. Figure 2 displays the linear regression, by the least squares method, of the areas vulnerable to man-made disasters. Figure 3 shows the common man-made disaster rate per region in the Philippines, and Fig. 4 ranks the areas vulnerable to man-made disasters. The outcome for the four man-made disasters, crime, fire, vehicular accidents, and virus spread, was formulated using the least mean square algorithm, with the formula serving as the hidden layer that leads to the output layer.

Table 1 Raw datasets of man-made disasters

Region   Crime (%)   Fire (%)   Vehicular accidents (%)   Virus spread (%)   LMS (%)
NCR      12.32       61.41      72.44                     74.69              55.22
CAR      17.17       5.70       2.04                      0.82               6.43
R-1      6.71        0.97       0.18                      0.57               2.11
R-2      2.96        4.98       0.38                      1.32               2.41
R-3      5.69        0.49       2.24                      4.66               3.27
R-4A     4.04        3.76       0.62                      9.70               4.53
R-4B     2.36        0.00       0.00                      0.19               0.64
R-5      4.77        0.73       4.59                      0.19               2.57
R-6      2.81        1.46       0.69                      1.07               1.51
R-7      7.72        3.28       1.83                      1.95               3.70
R-8      2.76        2.06       1.72                      0.06               1.65
R-9      6.26        1.09       2.93                      0.13               2.60
R-10     7.31        6.31       4.72                      0.19               4.63
R-11     7.76        0.36       0.81                      3.78               3.18
R-12     4.59        3.52       2.96                      0.06               2.78
R-13     4.09        3.28       1.83                      0.31               2.38
ARMM     0.66        0.61       0.00                      0.31               0.40
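A hedged observation on Table 1: each LMS (%) entry coincides with the arithmetic mean of that region's four disaster rates, which suggests how the combined score was formed, although the paper does not state this mapping explicitly. For example, for NCR:

```python
# The tabulated LMS (%) for NCR is 55.22; the mean of its four rates is
# 55.215, matching up to rounding. This mapping is inferred, not stated.
ncr = [12.32, 61.41, 72.44, 74.69]
print(sum(ncr) / len(ncr))   # 55.215 -> tabulated as 55.22
```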
Fig. 2 Linear regression, using the least squares method, of the datasets gathered with the least mean square algorithm for areas vulnerable to man-made disasters
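A short sketch of the least squares fit behind Fig. 2 follows, under the assumption that the 17 regional LMS (%) values from Table 1 are regressed against their region index (the figure's x-axis runs from 0 to 18); NumPy's polyfit is used as a stand-in for Eq. (2) with a linear f.

```python
import numpy as np

# Least-squares line per Eq. (2) with f linear: the 17 regional LMS (%)
# values (Table 1) regressed on their index — an assumed x-variable.
y = np.array([55.22, 6.43, 2.11, 2.41, 3.27, 4.53, 0.64, 2.57, 1.51,
              3.70, 1.65, 2.60, 4.63, 3.18, 2.78, 2.38, 0.40])
x = np.arange(1, len(y) + 1)
slope, intercept = np.polyfit(x, y, 1)   # beta estimated by least squares
print(round(slope, 3), round(intercept, 3))
```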
Fig. 3 Graph shows the rate of the common man-made disaster per region in the Philippines using the raw datasets in Table 1. The colors above represent the separation of the four man-made disasters
5 Conclusions

The researchers drew their conclusions from the series of computations and a review of previously published research, and they present the following: this study showcased the feed forward neural network applied to the National Capital Region (NCR), Cordillera Administrative Region (CAR), and Autonomous Region in Muslim Mindanao (ARMM) datasets. Table 1 in the results and discussion section shows, through the raw datasets, the four man-made disasters formulated using the least mean square algorithm. Future work can further optimize the model to reach an accuracy of more than the 40–80% observed in this paper's findings.
Fig. 4 Graph organizing the areas vulnerable to man-made disasters in the Philippines from high-risk to low-risk, using the datasets computed with the least mean square algorithm. The red line shows the rate of risk
Predicted classification for the three regions can be accurately suggested with machine learning using a feed forward neural network classifier; the researchers used the feed forward neural network to classify the percentages of the three regions employed in the datasets. The feed forward neural network, as part of the artificial neural network family, is a biologically inspired classification algorithm. It is principally utilized for supervised learning when the information to be learned is neither sequential nor time-dependent: the feed forward pass computes the output vector from the input vector. Acknowledgements The researchers would like to thank the following people for helping us finalize this research. First, our adviser Mr. Arman Santos, who gave us several ideas and advice and was part of this research; his assistance was greatly appreciated. We also extend special thanks to our classmates, who helped us and responded to our questions regarding this research. Most especially, we thank Almighty God, who gave us the strength and courage to accomplish this research.
References 1. Fabito B.S., Balahadia F.F., Cabatlao J.D.N.: AppLERT: a mobile application for incident and disaster notification for Metro Manila. https://ieeexplore.ieee.org/abstract/document/7519420 2. Arablouei, R., Werner, S., Huang, Y.F., Dogancay, K.: Distributed Least Mean-Square Estimation With Partial Diffusion. https://ieeexplore.ieee.org/abstract/document/6671443 3. Pidgeon, N., Leary, M.O.: Man-Made Disasters: Why Technology and Organizations (Sometimes) Fail. https://www.sciencedirect.com/science/article/abs/pii/S0925753500000047
4. Asor, J.R., Catedrilla, G.M., Estrada, J.E.: A Study on the Road Accidents Using Data Investigation and Visualization in Los Baños, Laguna, Philippines. https://ieeexplore.ieee.org/abstract/document/8350662
5. Xu, B., Gong, X.: Damage Detection of Reinforced Concrete Columns Based on Vibration Tests. https://doi.org/10.1061/41096(366)214
6. Babaeian, M., Bhardwaj, N., Esquivel, B., Mozumdar, M.: Real-Time Driver Drowsiness Detection Using Logistic-Regression-Based Machine Learning Algorithm. https://ieeexplore.ieee.org/abstract/document/7790075
7. Gupta, S., Sahoo, A.K., Sahoo, U.K.: Parameter Estimation of Wiener Nonlinear Model Using Least Mean Square (LMS) Algorithm. https://ieeexplore.ieee.org/abstract/document/8228077/authors#authors
8. Liu, W., Pokharel, P.P., Principe, J.C.: The kernel least-mean-square algorithm. https://ieeexplore.ieee.org/abstract/document/4410463
9. Celesti, A., Galletta, A., Carnevale, L., Fazio, M., Ekuakille, A.L.: An IoT Cloud System for Traffic Monitoring and Vehicular Accidents Prevention Based on Mobile Sensor Data Processing. https://ieeexplore.ieee.org/abstract/document/8119786/authors#authors
10. Gupta, M., Solanki, V.K., Singh, V.K.: Analysis of Data Mining Technique for Traffic Accident Severity Problem: A Review. https://annals-csis.org/proceedings/rice2017/drp/pdf/121.pdf
Finding Numbers of Occurrences and Duration of a Particular Face in Video Stream S. Prasanth Vaidya and Y. Ramanjaneyulu
Abstract Face recognition automation has progressed crucially since the introduction of Eigenfaces algorithms; still, automated face recognition faces several issues in uncertain environments. In this paper, a novel system is proposed to detect and recognize a particular person or object in a video and output the person's occurrences and duration over the total video stream. The proposed system uses the max-margin object detection method to detect and recognize the human, and it starts counting the occurrences of a particular person when a match is found in the system. The total duration of that specific person and the frequency count are displayed as output. Keywords Object recognition · CNN · Face recognition · Video stream
1 Introduction Nowadays, despite technological advancements, identifying a person is still a challenging task. There are different types of biometrics used for identifying people; one such is face recognition (FR) [1]. Under certain environmental conditions, FR can perform well if light, pose, and other features can be controlled, even when multiple humans are present in the database [2]. Automatic face recognition (AFR) is a trending and demanding topic in present research, as it includes real-time applications from security to biometrics [3]. For humans, the face acts as a crucial identifier, since it provides data such as identity and expressions [4]. The goal of the proposed scheme is to count the number of occurrences of a particular human profile in the respective map. The AFR algorithm extracts the human profile for authentication. Other than Eigenfaces, S. Prasanth Vaidya (B) Aditya Engineering College, Surampalem, India e-mail: [email protected] Y. Ramanjaneyulu Department of CSE, National Institute of Technology, Rourkela, India © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 S. Roy et al. (eds.), Innovations in Computational Intelligence and Computer Vision, Advances in Intelligent Systems and Computing 1424, https://doi.org/10.1007/978-981-19-0475-2_9
Fig. 1 AFR system architecture
several methods have been proposed to detect and recognize profiles in captured images or videos [5]. AFR on real-time data faces many obstacles, such as environmental conditions and processing hardware [6]. A basic AFR architecture is provided in Fig. 1. In AFR, the initial step is preprocessing, which separates unwanted artifacts from the captured frame or boundary map. In a real-time video stream, the complex task is to detect the activity of various objects or profiles against both dynamic and static backgrounds. The extraction of facial features from dynamic backgrounds is an effortful task; by implementing algorithms such as LBP and DRLBP, facial features can be extracted. This is a basic process in various CV applications such as live video surveillance, object tracking, and human–computer interaction. Face recognition in a video can be done using Haar features, but it requires a big dataset; the facial values extracted using the Haar features are stored and used for the recognition [7]. With this unique data, the location of the profile can be identified. The profile of a human differs from one person to another by nature of culture, physique, and other changes. Pixel detection is one of the basic methods used for FR; this scheme has low computational complexity but is inadequate for managing even minimal changes in the features. In the proposed scheme, convolutional neural networks (CNN) are utilized [8]. The rest of the paper is organized as follows: the literature survey in Sect. 2, convolutional neural networks (CNN) in Sect. 3, the proposed scheme in Sect. 4, results and discussion in Sect. 5, and conclusion and future enhancement in Sect. 6.
2 Literature Survey A standard and comparative analysis of video-based FR is available for the COX database, which contains multiple subjects and various video sequences [9]; a video surveillance framework was simulated with the COX DB using three different videos for evaluation. Tathe et al. [10] proposed FR and tracking in videos: in the recognition stage, video segments are captured and irrelevant information is removed from the image using a histogram equalization algorithm, the Viola–Jones face detector is used to detect profiles in complex backgrounds, and profile features are extracted with the help of Gabor filters; in the tracking stage, Kalman features are utilized. This algorithm helps detect persons unauthorized for particular places. Andrew et al. [11] proposed recognizing faces in broadcast video; in this scheme, FR is combined with acoustic speaker identification to identify the speaker who is currently speaking.
Zhang et al. [12] proposed a cascade FR scheme based on body movement data from adjacent frames; the profiles are identified and verified in different modules by checking face skin, symmetry, and an eye template. Wu et al. [13] proposed video-based profile re-identification with adaptive multi-part feature learning, combining features of pedestrians' appearance with spatio-temporal features; this scheme was tested on the iLIDS-VID and PRID-2011 benchmark databases to assess its performance. The problem of person re-identification in video has also been investigated by matching sets of video sequences of different durations instead of image sets: a spatio-temporal attention method that functionally handles the difficulties of video-based person re-identification was proposed [14], using a multiple spatial attention scheme, instead of directly encoding the whole image, to localize discriminative regions in an image. The disadvantages of the existing system are that manpower is required, keen observation is needed, and mistakes may happen in recognizing a human. Problems faced by the user with the existing system include the detection and recognition of an individual in a video, a time-consuming process, the large amount of manpower required, results that are not absolute, and difficulty in identification. The proposed method can be utilized for continuous observation of a person, for tracking a company's commercial advertisement time, and for object tracking. The advantages of the method are that, with a minimal number of training images, it is able to detect a person's face with a CNN, and it not only detects the face but also counts the duration. The proposed system can detect and recognize a particular person in a video and output the occurrences and the duration of the person over the total video. It uses max-margin object detection to detect and recognize humans; if a match is found, the system starts counting a particular person's occurrences, starting a timer whenever the person appears in the video, and finally displays the total duration for which that particular person or object appeared and the count of occurrences in the video. For implementing the proposed scheme, the combination of Python with OpenCV is utilized [15, 16].
3 Convolutional Neural Networks (CNN) The CNN is a powerful artificial intelligence (AI) method that uses deep learning (DL) to perform generative and descriptive tasks, frequently in computer vision settings comprising image and video recognition, together with recommender systems and natural language processing (NLP). A neural network (NN) is a system (either software or hardware) patterned after the operation of neurons in the human brain. Traditional NNs (TNNs) are not well suited to image and signal processing; to avoid the problems faced by TNNs, the neurons in a CNN are organized like the frontal lobe, the part of the brain that controls important cognitive skills in humans [8]. The structure of the CNN is shown in Fig. 2. The CNN consists of different layers: a preprocessing layer for removing irrelevant information, a convolutional layer for feature extraction with the help of an activation function [17], a pooling layer to prevent overfitting [18], and a classification layer to provide the output class [19].
Fig. 2 Convolutional neural networks structure
Fig. 3 Proposed system design
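For illustration, a small Keras model with the layer roles just described is sketched below; the filter counts, input size, and two-class output are placeholders, not the authors' architecture.

```python
from tensorflow import keras
from tensorflow.keras import layers

# Illustrative CNN with the layer roles described above; the sizes are
# assumed placeholders, not the architecture used in this paper.
model = keras.Sequential([
    layers.Conv2D(32, 3, activation="relu",
                  input_shape=(64, 64, 3)),   # convolutional layer
    layers.MaxPooling2D(),                    # pooling layer
    layers.Conv2D(64, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(2, activation="softmax"),    # classification layer
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
```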
4 Proposed System In the proposed system, the sequence of images in a video that was already captured by a camera … or a live streaming video is given as input. For face detection, there is an image data source against which a particular face is recognized. The proposed system compares every frame in the sequence with the existing image data source, without human interaction; as a result, it reduces processing time, and there is no chance of mistakes in detecting a face, without any ambiguity. The proposed system design is shown in Fig. 3. The advantages of the proposed system are that it saves time, makes comparison easy, detects people from a given source, reduces manpower, and has a low rate of mistakes in detecting people.
4.1 Training Phase The training phase proceeds stepwise. First, all the sample images to be trained are imported from the system directory. The next step is to analyze each imported image and create a trained dataset to be used in the testing phase: the system recognizes and extracts the face encoding from each image and stores the returned face encodings in a NumPy array. These face encodings are obtained using the face_recognition package, which uses DLIB, a library with a pre-trained facial landmark detector used to map the facial structure of the face; DLIB uses a CNN for detecting the facial landmarks in the imported image. Through this process, the proposed system creates the trained dataset used in the testing phase.
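A minimal sketch of this training phase using the face_recognition package follows; the image file names and the assumption of one face per training image are illustrative, not taken from the authors' dataset.

```python
import face_recognition
import numpy as np

# Build the trained dataset of face encodings. File names are placeholders.
known_names = ["person_a", "person_b"]
known_encodings = []
for name in known_names:
    image = face_recognition.load_image_file(f"{name}.jpg")
    # face_encodings returns one 128-d vector per detected face;
    # we assume exactly one face per training image.
    known_encodings.append(face_recognition.face_encodings(image)[0])
known_encodings = np.array(known_encodings)  # stored for the testing phase
```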
4.2 Testing Phase In the testing phase, the first step is to input, from the system directory, the video on which the test is to be performed. After importing the video, the system divides the video into frames, so that each frame is an image; this is done using the built-in read function. After dividing the video into frames, the task is to detect and extract the face encodings in each image, which is done with the face_encodings function, and the resulting encodings are stored in a NumPy array. Once the data are stored, testing is performed by comparing the video frames' face encodings with the sample images' face encodings; when a match occurs, OpenCV functions are called to display the match in the video [20]. With functions of this kind, the system can draw any type of annotation in the video when a match is found.
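A sketch of the testing phase follows, combining OpenCV's frame-by-frame read with face_recognition comparisons; the video path and drawing style are illustrative, and known_names/known_encodings are the arrays built in the training sketch above.

```python
import cv2
import face_recognition

video = cv2.VideoCapture("input_video.mp4")   # placeholder path
while True:
    ok, frame = video.read()                  # built-in read(): one frame
    if not ok:
        break                                 # end of video
    rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
    locations = face_recognition.face_locations(rgb)
    encodings = face_recognition.face_encodings(rgb, locations)
    for encoding, (top, right, bottom, left) in zip(encodings, locations):
        matches = face_recognition.compare_faces(known_encodings, encoding)
        name = known_names[matches.index(True)] if True in matches else "UNKNOWN"
        cv2.rectangle(frame, (left, top), (right, bottom), (0, 255, 0), 2)
        cv2.putText(frame, name, (left, top - 8),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.6, (0, 255, 0), 1)
video.release()
```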
4.3 Count and Duration Phase In this phase, the proposed system implements an algorithm to get the count and duration of a particular person. The system runs a background process: whenever a match is found, a counter is incremented for the recognized person and a timer is switched on; finally, the system displays the total duration for which a particular person is present in the video, along with that person's occurrences in the video (Fig. 5). Through this procedural process, the proposed system operates and displays the result.
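The count-and-duration logic can be sketched as below, continuing the previous example; the frame-rate constant and the rule that a reappearance after absence counts as a new occurrence are assumptions consistent with the description above.

```python
# Per-person duration accumulates 1/fps seconds for each matched frame;
# an occurrence is counted when a person reappears after being absent.
fps = 25.0                                   # frame rate (assumed)
duration = {n: 0.0 for n in known_names}     # seconds on screen
occurrences = {n: 0 for n in known_names}
visible_last = {n: False for n in known_names}

def update(names_in_frame):
    for n in known_names:
        present = n in names_in_frame
        if present:
            duration[n] += 1.0 / fps
            if not visible_last[n]:
                occurrences[n] += 1          # reappeared: new occurrence
        visible_last[n] = present
```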
5 Results and Discussion A sample video with object identification and occurrence count is shown in Fig. 4.
5.1 Testing The proposed system was tested on video at 25 frames per second (FPS) and can process a video of any length and size. It can also recognize persons in live streaming by providing the number of persons/objects in a video. Five different test cases (TC) were analyzed and are given in Table 1. • Test Case 1: No input images are given for training, which produces null as the output. • Test Case 2: The proposed system checks the facial values of a person in the video for recognition; if the facial values do not match the currently detected facial values, it shows a block around the face displaying 'UNKNOWN', as shown in Fig. 6. • Test Case 3: The proposed system recognizes the particular person when the facial values in the video match the training data given as input to the system; the name of the person is then displayed, as shown in Fig. 7.
Fig. 4 Sample video with object identification and occurrence count
Fig. 5 Sample video-2 with object identification and occurrence count

Table 1 Test cases

TC ID   TC scenario                            Expected outcome                                                            Result
TC1     No input images                        Null                                                                        Unsuccessful
TC2     Face detection                         Successfully displayed                                                      Successful
TC3     Face recognition                       Successfully recognized                                                     Successful
TC4     Occurrences of the person in a video   Successfully printed the total number of a person's occurrences in a video  Successful
TC5     Duration of the person in a video      Successfully recognized                                                     Successful
• Test Case 4: The number of occurrences of a particular person is counted and displayed. • Test Case 5: The occurrences of a particular person are calculated successfully, and the duration of that person in the video is calculated and displayed.
Fig. 6 Test case-2
Fig. 7 Test case-3
5.2 Confusion Matrix (CM) A CM is a 2 × 2 table for a binary-outcome classifier. The outcomes are represented as the numbers of TP, FP, TN, and FN, where TP is true positives, FP false positives, TN true negatives, and FN false negatives. The confusion matrices for the sample videos are given in Table 2.
Table 2 Confusion matrix for sample video-1 and video-2

                       Video-1                            Video-2
Confusion matrix       Predicted True   Predicted False   Predicted True   Predicted False
Observed  Positive     570              30                349              50
          Negative     50               80                0                51

Table 3 Report metric table

S. No   Measures                            Video-1   Video-2
1       Sensitivity                         0.8747    0.9216
2       Specificity                         1.0000    1.0000
3       Precision                           1.0000    1.0000
4       Negative predictive value           0.5050    0.3333
5       False positive rate                 0.0000    0.0000
6       False discovery rate                0.0000    0.0000
7       False negative rate                 0.1253    0.0784
8       Accuracy                            0.8889    0.9245
9       F1 Score                            0.9332    0.9592
10      Matthews correlation coefficient    0.6646    0.9216
5.3 Report Metric Analysis The report metric values lie between 0 and 1.0. The values of the different metrics for sample video-I and video-II are shown in Table 3. The proposed system achieved above 90% sensitivity, with 100% specificity and precision, while the negative predictive value stayed at or below roughly 50% for the sample videos. For FPR and FDR, the system achieved 0%, and the FNR stayed below 13%; this shows that the system works smoothly and that the detection rate is high. The accuracy of the system is greater than 85%, which gives good performance. The F1 score should be high, and by this score the system achieved more than 93%. The Matthews correlation coefficient is the most important of the report metrics, since it is regarded as balanced; by this coefficient, the proposed system achieved 66% for Video-I and 92% for Video-II.
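For reference, the standard definitions of these metrics from a 2 × 2 confusion matrix are sketched below; this is a generic formulation, not the authors' code, and it is not guaranteed to reproduce Table 3 exactly.

```python
import math

# Standard report metrics from confusion-matrix counts TP, FP, TN, FN.
def report_metrics(tp, fp, tn, fn):
    sens = tp / (tp + fn)                      # sensitivity (recall)
    spec = tn / (tn + fp)                      # specificity
    prec = tp / (tp + fp)                      # precision
    npv = tn / (tn + fn)                       # negative predictive value
    acc = (tp + tn) / (tp + fp + tn + fn)      # accuracy
    f1 = 2 * prec * sens / (prec + sens)       # F1 score
    mcc = (tp * tn - fp * fn) / math.sqrt(     # Matthews corr. coefficient
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return sens, spec, prec, npv, acc, f1, mcc
```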
6 Conclusion and Future Enhancement Finding the number of occurrences and the duration of a particular object in video makes it possible to determine the duration and occurrences of a particular person in the video. The proposed system tracks each person in the video by using the trained dataset of images. The system improves face recognition under illumination variation and non-frontal views. The system is very simple in terms of calculation and improves accuracy in
recognition. It requires only one scan, without any complicated analysis. Facial-recognition-based biometric access will be the further implementation of this project.
References 1. Beumer, G., Tao, Q., Bazen, A.M., Veldhuis, R.N.: A landmark paper in face recognition. In: 7th International Conference on Automatic Face and Gesture Recognition (FGR06), IEEE, 6–pp (2006) 2. Turk, M.A., Pentland, A.P.: Face recognition using eigenfaces. In: Proceedings 1991 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. IEEE Computer Society, pp. 586–587 (1991) 3. Chaitanya, A.K., Kartheek, C., Nandan, D.: Study on real-time face recognition and tracking for criminal revealing. In: Soft Computing: Theories and Applications. Springer, pp 849–857 (2020) 4. Tang, X., Li, Z.: Video based face recognition using multiple classifiers. In: Sixth IEEE International Conference on Automatic Face and Gesture Recognition, 2004. Proceedings., IEEE, pp 345–349 (2004) 5. Sanivarapu, P.V.: Multi-face recognition using cnn for attendance system. In: Machine Learning for Predictive Analysis. Springer, pp 313–320 (2020) 6. Singh, A., Vaidya, S.P.: Automated parking management system for identifying vehicle number plate. Indonesian J. Electr. Eng. Comput. Sci. 13(1), 77–84 (2019) 7. Laganière, R.: OpenCV Computer Vision Application Programming Cookbook Second Edition. Packt Publishing Ltd (2014) 8. Lawrence, S., Giles, C.L., Tsoi, A.C., Back, A.D.: Face recognition: a convolutional neuralnetwork approach. IEEE Trans. Neural Netw. 8(1), 98–113 (1997) 9. Huang, Z., Shan, S., Wang, R., Zhang, H., Lao, S., Kuerban, A., Chen, X.: A benchmark and comparative study of video-based face recognition on cox face database. IEEE Trans. Image Proc. 24(12), 5967–5981 (2015) 10. Tathe, S.V., Narote, A.S., Narote, S.P.: Face recognition and tracking in videos. Adv. Sci. Technol. Eng. Syst. J. 2(3), 1238–1244 (2017) 11. Senior, A.W.: Recognizing faces in broadcast video. In: Proceedings International Workshop on Recognition, Analysis, and Tracking of Faces and Gestures in Real-Time Systems. In: Conjunction with ICCV’99 (Cat. No. PR00378), IEEE, pp. 105–110 (1999) 12. Zhang, P.: A video-based face detection and recognition system using cascade face verification modules. In: 37th IEEE Applied Imagery Pattern Recognition Workshop. IEEE, vol. 2008, pp. 1–8 (2008) 13. Wu, J., Jiang, J., Qi, M., Liu, H., Wang, M.: Video-based person re-identification with adaptive multi-part features learning. In: Pacific Rim Conference on Multimedia, Springer, pp. 115–125 (2018) 14. Li, S., Bak, S., Carr, P., Wang, X.: Diversity regularized spatiotemporal attention for videobased person re-identification. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 369–378 (2018) 15. Bird, S., Klein, E., Loper, E.: Natural language processing with Python: analyzing text with the natural language toolkit. O’Reilly Media, Inc. (2009) 16. Grace, M.S., Church, D.R., Kelly, C.T., Lynn, W.F., Cooper, T.M.: The python pit organ: imaging and immunocytochemical analysis of an extremely sensitive natural infrared detector. Biosens. Bioelectron. 14(1), 53–59 (1999) 17. Beel, J., Gipp, B., Langer, S., Genzmehr, M., Wilde, E., Nürnberger, A., Pitman, J.: Introducing mr. dlib, a machine-readable digital library. In: Proceedings of the 11th Annual International ACM/IEEE Joint Conference on Digital Libraries, pp. 463–464 (2011) 18. King, D.E.: Dlib-ml: a machine learning toolkit. J. Mach. Learn. Res. 10, 1755–1758 (2009)
19. Larsen, R.L.: The dlib test suite and metrics working group: harvesting the experience from the digital library initiative. D-Lib Working Group on Digital Library Metrics Website (2002) 20. Bradski, G., Kaehler, A.: Learning OpenCV: Computer Vision with the OpenCV Library. O’Reilly Media, Inc. (2008)
Dynamic Tuning of Fuzzy Membership Function for an Application of Soil Nutrient Recommendation R. Aarthi and D. Sivakumar
Abstract Successful soil nutrient management in the agriculture sector will enhance soil fertility and crop productivity. The laboratory soil testing procedure provides the nutrient ranges and recommends the fertilizers: after the chemical analysis, technicians recommend the nutrient ranges using the fertility table, whose ranges are decided by the soil characteristics and conditions. This paper proposes a fuzzy inference system-based soil nutrient recommendation system. Changing the fuzzy membership function (MF) ranges by hand for each soil condition is a tedious job in a fuzzy nutrient recommendation system; therefore, the particle swarm optimization (PSO) technique is used to update the MF ranges dynamically. The final simulation results demonstrate that fuzzy PSO tuning gives finer performance than the typical fuzzy system for soil nutrient recommendation. Keywords Fuzzy system · Membership function tuning · Soil nutrient · Particle swarm optimization
1 Introduction Soil nutrient measurement helps to improve crop productivity and reduce environmental impacts, because soil nutrient imbalance directly affects the environment as well as farm production. Crop growth depends on 16 essential nutrients, which are separated into primary, secondary, and micronutrient groups; growth depends mainly on primary nutrients such as nitrogen [N], phosphorous [P], and potassium [K]. Soil properties are heterogeneous in nature, so it is difficult to predict the exact nutrient and fertilizer range [1]. Traditional methods like soil testing provide an advanced procedure for defining the nutrient level and offer the required nutrient status and fertilizer type [2]. Both advanced laboratory procedures and in-field soil measurement require a nutrient recommendation procedure. These R. Aarthi (B) · D. Sivakumar Department of Electronic and Instrumentation Engineering, Annamalai University, Chidambaram, Tamil Nadu, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 S. Roy et al. (eds.), Innovations in Computational Intelligence and Computer Vision, Advances in Intelligent Systems and Computing 1424, https://doi.org/10.1007/978-981-19-0475-2_10
nutrient recommendation procedures are done manually in most laboratories. Based on fertility groups and ranges, soil technicians recommend the nutrient and fertility levels. Interpreting the result from soil tests is a challenging and time-consuming task for soil technicians and farmers, because the nutrient recommendation process requires some mathematical calculation: every time, laboratory technicians must consult the corresponding soil fertility table, whose ranges may vary with soil and topographical conditions. A technician may recommend nutrients without error using mathematical methods, but in-field application of the mathematical approach is complex for farmers. Utilizing soft computing is the best choice for soil nutrient recommendation: soft computing techniques are widely used across domains because they tolerate imprecise information and manage uncertainty, partial truth, and approximation [3]. Fuzzy logic is a natural choice for the heterogeneous nature of agriculture, and recent research shows that fuzzy decision-making tools are widely used in different agriculture sectors. One study utilizes a fuzzy system to interpret soil nutrient values (NPK) from the conventional soil test and predict the required nutrient [4], and Mamdani fuzzy-based expert systems are widely adopted in different agriculture sectors [5–9]. The fuzzy system is affected by subjective decisions because skilled knowledge varies between experts; therefore, efficient management of the knowledge-based system is needed, since the membership function (MF) and rule base directly influence system performance. The rules and MFs are constructed from human knowledge, and designing the MF is a crucial part of any fuzzy system. Recent research shows that intelligent computing techniques are used in knowledge-based systems to handle this uncertainty; the approaches include clustering, neural networks, PSO, genetic algorithms, self-organizing feature maps (SOFM), and Tabu search [10]. The main aim of this paper is to adjust the MF ranges dynamically, based on soil conditions, using the particle swarm optimization (PSO) technique: PSO optimizes the MF ranges of the Mamdani fuzzy inference system, and the MF ranges are adjusted using a laboratory dataset of soil primary nutrients (nitrogen, phosphorus, and potassium). This paper is organized as follows: Sect. 2 discusses related work on fuzzy MF optimization; Sect. 3 presents the proposed method for fuzzy membership optimization and detailed information on primary soil nutrients; Sect. 4 gives the results and discussion; and Sect. 5 concludes the fuzzy PSO optimization for soil primary nutrient recommendation.
2 Related Works Omizegba and Adebayo implemented an optimization technique in a fuzzy inference system to adjust the MF shapes and parameter values; they adopted the PSO algorithm to adjust the linguistic variables in a fuzzy system, and the performance of the optimized FIS was better than the normal FIS model [11]. Permana and Hashim applied the PSO optimization technique to the fuzzy-based truck backer-upper
problem to define the specific position; the fuzzy particle swarm intelligence technique produced better MF results than the usual FIS technique [12]. Raouf Ketata et al. proposed a numerical data technique to automatically adjust linguistic variables and reduce rules in a fuzzy system; the technique was applied to liver trauma and truck backer-upper control models [13]. Fuzzy rules have also been generated with the help of swarm intelligence optimization techniques [14]. Bagis presented an approach to determine the optimal MFs in a fuzzy system using Tabu search [15]. Elragal adopted the swarm intelligence technique to increase the precision of fuzzy inference systems: two different FISs, Mamdani and Takagi–Sugeno, were optimized, with the PSO tuning both the MFs and the rule base, and the results show that PSO-based classification produces better performance than the normal FIS model [16]. Arslan and Kaya adjusted the membership parameters and shapes using a genetic algorithm, implemented on a single input and output function [17]. Another study utilizes the PSO algorithm to tune the MFs to obtain precise parameters in the triangular MF; this fuzzy PSO technique is applied to energy consumption in IWSN [18]. Khooban and Abadi developed a closed-loop control system using a fuzzy inference model to monitor patients' glucose profiles and handle the uncertainty in the FIS model, applying a population-based heuristic optimization technique to regulate glucose levels in type 1 diabetic patients [19]. To obtain the best performance from the fuzzy system, Turanoğul et al. developed an auto-adjusting MF model using PSO and the artificial bee colony (ABC) algorithm; the results show that both methods are effective at finding the optimal values in MF determination [20]. Fuzzy-based PSO and ant colony optimization algorithms have been applied and tested on water level and temperature control [21, 22]. A soil nutrient management zone has also been developed for fertilization management: the authors adopt fuzzy clustering and the PSO technique to delineate the management zones and map the primary soil properties to improve soil fertility [23]. The papers surveyed above show that PSO is a good choice for MF optimization.
3 Proposed Method In this proposed study, a Mamdani fuzzy classifier was adopted for primary nutrient recommendation. The soil fertility rating helps to build the input and output MFs of the fuzzy logic; the MF values and linguistic variables, which together with the rule base determine the performance of the fuzzy inference system, are decided based on the soil fertility status table. Each state follows different ranges and groups based on soil conditions, and in Tamil Nadu the fertility table ranges vary slightly by region and soil condition. The boundaries of the MFs are decided based on the fertility table; Table 1 represents the Tamil Nadu soil primary nutrient fertility table. However, manually setting the MF variables takes time and is prone to error, especially since it requires human experts to change the MF values each time. Therefore, a novel method is proposed for automatic MF adjustment using PSO techniques for the soil primary nutrient recommendation system.
Table 1 Soil fertility rating

S.No   Soil primary nutrient      Available nutrient level
                                  Low      Medium    High
1      Nitrogen N (KMnO4)         0–80     80–150    Above 150
2      Phosphorous P (NaHCO3)     0–10     10–20     20 above
3      Potassium K (NH4OAc)       0–60     60–113    113 above
For each iteration, the particle values change until the optimal value is reached; based on the final optimal value, the MFs shrink or expand. The study region selected is Ariyalur district in Tamil Nadu state, India; the district lies at latitude 11.2399 N and longitude 79.2902 E. Agriculture is the primary resource of this district, with over 70% of the population depending on allied agriculture activities. The major soil texture of this district is loamy, with color varying from red to surface yellow; the primary nutrients nitrogen and phosphorus are present at low levels, while potassium is present adequately. The required dataset was collected from the Ariyalur soil testing laboratory: a total of 400 soil samples were collected from the Jayankondam block, and only the soil primary nutrient [N P K] parameters are used in this study.
3.1 Fuzzy System for Soil Nutrient Recommendation System The input fuzzy variables are nitrogen present (N-present), phosphorus present (P-present), and potassium present (K-present); similarly, the output variables are nitrogen required (N-required), phosphorus required (P-required), and potassium required (K-required). The corresponding input ranges are [0–200], [0–30], and [0–150], and the output ranges are [0–100], [0–60], and [0–100]; the membership ranges are decided based on the fertility table. Triangular and trapezoidal MF shapes are used, and the input and output linguistic variables are both named low, medium, and high. The fuzzy rules were developed based on information gathered from agriculture experts; in total, 18 rules were generated for primary nutrient recommendation. The inference mechanism used is Mamdani max–min, and the defuzzification procedure is based on the center of area.
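For illustration, a triangular MF of the kind used for these linguistic terms can be written directly; the peak placed at the midpoint of the 'medium' nitrogen range is an assumption, since the paper does not list the exact break-points.

```python
import numpy as np

# Triangular membership function: rises from a to peak b, falls to c,
# and is 0 outside [a, c].
def trimf(x, a, b, c):
    return np.maximum(np.minimum((x - a) / (b - a), (c - x) / (c - b)), 0.0)

x = np.linspace(0, 200, 201)            # nitrogen input range [0-200]
# 'Medium' nitrogen spans roughly 80-150 per Table 1; the peak at the
# midpoint (115) is an illustrative assumption.
n_medium = trimf(x, 80, 115, 150)
```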
3.2 Fuzzy PSO Technique and Model Formation The soil primary nutrient fuzzy system is represented with three inputs and three outputs. Overall, 56 parameters are used for particle initialization in the PSO algorithm: 28 parameters from the input side and 28 from the output side. Other details are given in Table 2.
Table 2 PSO parameters

Parameter           Value
C1                  0.12
C2                  1.2
W                   0.9
No. of particles    50
Dimension           28
No. of iterations   2
The optimization starts with an initial set of parameters: the fuzzy MF parameters are taken as the initial particles. The position and velocity of each particle are evaluated, and the best particle is recorded. When the condition is satisfied, the new optimum value is obtained and used to update the parameters of the input and output MFs; after each optimization step, the particles are checked against the initial parameters, and the new fuzzy membership variables are updated. This process is repeated until the optimal Gbest value is reached. A flow diagram of the fuzzy PSO is illustrated in Fig. 1.
Fig. 1 Flow diagram of fuzzy PSO
4 Results and Discussion For soil primary nutrient recommendation, a fuzzy inference system was developed and implemented on the MATLAB platform. A three-input, three-output fuzzy model was developed using the FIS tool. As mentioned earlier, soil primary nutrient ranges vary with soil fertility conditions, so the MF ranges in the fuzzy system must be changed each time, which makes the system complex. Therefore, fuzzy PSO-tuned optimized MFs are proposed. The input and output parameters of the fuzzy system are taken as the initial particles, and the PSO algorithm is run until the optimal value is reached. The optimized fuzzy PSO input and output MFs of the soil primary nutrient recommendation system are illustrated in Figs. 2 and 3. For easier comparison, intersection graphs of the traditional MFs and the PSO-optimized fuzzy MFs are shown in Figs. 4 and 5. As a result, the fuzzy PSO MFs changed slightly based on the collected dataset. Figure 4 shows the intersection view of the fuzzy PSO input MFs, and Fig. 5 shows the intersection view of the fuzzy PSO output MFs. After optimization, the fuzzy parameter ranges vary slightly: the MF ranges changed in some cases and remained unchanged in others. Based on the ranges in the primary nutrient dataset, the membership parameters are updated automatically through the PSO technique. Therefore, the fuzzy PSO technique dynamically updates its MF ranges compared with the conventional fuzzy system.
5 Conclusion The soil primary nutrient dataset helps to build the MF ranges in the fuzzy system. In the proposed work, the fuzzy system dynamically changes the ranges of its MFs: the PSO technique optimizes the fuzzy MFs and updates the new MF ranges automatically based on the soil nutrient dataset. The results showed that the soil primary nutrient recommendation system works well in a dynamic environment using the fuzzy PSO technique. Currently, the fuzzy PSO technique handles only the primary nutrients; in future work, it will be extended to recommend secondary nutrients and micronutrients.
Fig. 2 Optimized input membership function of soil primary nutrients
Fig. 3 Optimized output membership function of soil primary nutrients
Fig. 4 Intersection view of fuzzy PSO input membership function
Fig. 5 Intersection view of fuzzy PSO output membership function
Representative-Based Cluster Undersampling Technique for Imbalanced Credit Scoring Datasets Sudhansu Ranjan Lenka, Sukant Kishoro Bisoy, Rojalina Priyadarshini, and Biswaranjan Nayak
Abstract Credit scoring is an imbalanced binary classification problem, where the number of instances of bad customers is much smaller than that of good customers. Traditional classification algorithms may not perform effectively on these imbalanced datasets, especially when classifying the minority class instances. To overcome the imbalance problem, different undersampling and oversampling techniques have been proposed to reduce the majority class instances and oversample the minority class instances, respectively. In this paper, a clustering-based undersampling technique (CUTE) is proposed to tackle imbalanced credit scoring problems. CUTE implements a new strategy to compute the representativeness of each member of the majority class subset. The proposed model is compared with four traditional resampling techniques using two credit scoring datasets, and it consistently improves on different measures, such as accuracy, precision, recall, Fscore, and AUC. Keywords Credit scoring · Class imbalance · Clustering · Representative instance · Resampling
S. R. Lenka (B) · S. K. Bisoy · R. Priyadarshini Department of Computer Sc. and Engineering, C.V.Raman Global University, Bhubaneswar, India e-mail: [email protected] S. K. Bisoy e-mail: [email protected] R. Priyadarshini e-mail: [email protected] B. Nayak Department of Computer Sc. and Engineering, Trident Academy of Technology, Bhubaneswar, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 S. Roy et al. (eds.), Innovations in Computational Intelligence and Computer Vision, Advances in Intelligent Systems and Computing 1424, https://doi.org/10.1007/978-981-19-0475-2_11
1 Introduction Analyzing the credit risk of a customer is a challenging and essential task for financial organizations. It helps to identify the risks associated with credit customers, such as determining the creditworthiness of a customer, supervising the customer before and after loan approval, portfolio risk management, and performance evaluation of the customers. Enterprises such as banks, finance sectors, and internet-oriented finance companies that possess highly accurate and effective credit risk prediction can gain huge financial benefits in the global market. Generally, while dealing with loan applicants, credit scoring, which is a binary classification technique, is used to segregate applicants into two classes, good (or majority/negative) credit and bad (or minority/positive) credit, according to different characteristics such as gender, age, occupation, salary, etc. Financial industries face large losses if they fail to identify the bad applicants among their credit clients. In reality, a large proportion of credit applicants are good and very few are bad, which produces a high class imbalance ratio. The most important task in credit scoring problems is to correctly distinguish the bad applicants from the good ones. When a classification model is designed, imbalanced datasets pose challenges that do not arise in balanced datasets [1, 2]. The data-level method is one of the common resampling approaches, in which the class distribution is adjusted through certain preprocessing steps. It resamples the training set by employing either oversampling or undersampling techniques [3]. In oversampling methods, new minority class samples are generated, whereas in undersampling methods, majority class instances are randomly eliminated. Between these two resampling strategies, undersampling is considered more appropriate than oversampling [4]: oversampling may lead to overfitting, whereas undersampling may eliminate some useful majority class instances [5]. Since credit scoring datasets grow steadily, undersampling is the better approach to tackle such imbalanced datasets. Clustering-based undersampling techniques are used to reduce the number of majority class instances and give significantly better results than random undersampling (RUS) [6]. Traditionally, clustering-based approaches resample the class distribution by partitioning the majority class instances into a set of clusters and then selecting the required number of representative instances from each cluster to balance the dataset. The major challenges in these approaches are identifying the optimal number of clusters and selecting the informative instances from each cluster. In this study, a clustering-based undersampling technique (CUTE) is proposed to tackle imbalanced credit scoring datasets and build an effective tool for credit risk assessment. CUTE filters out the unrepresentative majority class instances in three steps. First, it partitions the training set into an optimal number of clusters. Second, it computes the representativeness of each majority class sample. Finally, it selects the required number of representative instances from each cluster, which are combined with all the minority instances to balance the dataset.
The remaining part of the paper is outlined as follows: Sect. 2 reviews the different data-level techniques for dealing with imbalanced datasets. Section 3 introduces the proposed CUTE model. Section 4 presents the experimental setup, covering the credit scoring datasets, the evaluation metrics, and the experimental settings. Result analysis is done in Sect. 5, and the conclusion is given in Sect. 6.
2 Literature Review An imbalanced dataset refers to a scenario in which many instances belong to one class while far fewer instances belong to another. Real-world credit scoring datasets have imbalanced class distributions, and many methodologies have been proposed to solve imbalance problems [7]. An imbalanced dataset degrades the classification performance of algorithms, especially when classifying the minority class instances. Three methods are most commonly used to tackle imbalanced datasets: data-level methods, algorithm-level methods, and cost-sensitive methods [8]. As researchers have observed [9], data-level methods can effectively handle imbalance problems by modifying the original data distribution. The three commonly used data-level methods are oversampling, undersampling, and hybrid methods. Oversampling methods resample the data distribution either by randomly duplicating minority class instances or by creating new ones. A major issue in oversampling is overfitting, since new instances may be generated by replicating minority class instances. The random undersampling method randomly eliminates instances from the majority class subset to balance the class distribution; its main demerit is that it may remove some informative instances. For this reason, different techniques have been proposed to select representative instances in undersampling methods. In a location-based undersampling method, the majority class instances located near the decision boundaries are considered more informative and are used as representative instances [10]. Similarly, the majority class instances can be reduced by applying a vector quantization technique, and the resulting balanced dataset is used to train a support vector machine [11]. In a cluster-based undersampling method [6], the training set is first partitioned into majority and minority class subsets. Then, the majority class subset is partitioned into a number of clusters equal to the number of minority class instances. Finally, the centroids of all the clusters are selected as the majority class subset, which is combined with all the minority class samples to build a balanced training set. In [12], an ensemble-based undersampling method has been proposed to deal with imbalanced classification problems. It splits the majority class instances into different subsets, each of a size equal to the number of minority class instances. As a result, multiple balanced
subsets are generated to train the classifiers. Finally, the outputs of the individual models are integrated to obtain the final class label.
3 Proposed Model In the proposed cluster-based undersampling technique (CUTE), the original dataset is divided into training (80%) and testing (20%) sets. The training set is balanced through the following steps: (a) identify the required number of clusters, (b) divide the training set T into M clusters, (c) select the representative majority class instances from each cluster, and (d) add these representative instances to all the minority class instances to produce a balanced training set. Figure 1 shows the block diagram of CUTE for dealing with imbalanced credit scoring datasets.
3.1 K-means Clustering K-means clustering is an unsupervised algorithm that segments the dataset into k clusters. In the first step, k samples are randomly initialized as cluster centers. In the next step, the distance between each sample and the centroid of each cluster is computed. Using the Euclidean distance, the distance between data point X_i and the centroid C_M of dimension d is defined as

dist(X_i, C_M) = \sqrt{\sum_{j=1}^{d} \left( X_i^j - C_M^j \right)^2}    (1)
Fig. 1 Block diagram of CUTE
Then, each sample is assigned to the cluster with the nearest center. The centroid of each cluster is recomputed using the samples assigned to it. Finally, the algorithm terminates when the centroid of each cluster no longer varies significantly. In the proposed work, we determine the optimal number of clusters using the average silhouette coefficient (ASC) method. The silhouette coefficient of instance X_i is determined by computing the average distance between X_i and all other instances within the same cluster, denoted a(X_i), and the average distance between X_i and all the samples in the nearest cluster, denoted b(X_i). The silhouette coefficient of sample X_i is expressed as

SCoef(X_i) = \frac{b(X_i) - a(X_i)}{\max\{a(X_i), b(X_i)\}}    (2)
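A minimal sketch of this cluster-number selection with scikit-learn (an assumed implementation, not the paper's code) is shown below; `silhouette_score` computes the average of Eq. (2) over all samples.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

def pick_clusters(X, k_range=range(2, 11)):
    best_k, best_score = None, -1.0
    for k in k_range:
        labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
        score = silhouette_score(X, labels)   # average silhouette coefficient
        if score > best_score:
            best_k, best_score = k, score
    return best_k

X = np.random.rand(200, 5)   # stands in for the training features
M = pick_clusters(X)         # optimal number of clusters for CUTE
```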
3.2 Compute the Representativeness of Majority Class Instances Computing the representativeness of each majority instance in every cluster is the most important task of this study; it helps to filter out the unrepresentative majority instances from each cluster. Let T be the training set consisting of instances that belong to the majority class subset T_maj and the minority class subset T_min. For each X_i ∈ T_maj, its K-nearest neighborhood is defined as NN(X_i) = N_i^{maj} ∪ N_i^{min}, where N_i^{maj} and N_i^{min} represent the majority and minority class subsets of the neighbors of X_i in the ith cluster. The representativeness of each majority class sample is computed by considering two factors: density and closeness. The density of each instance X_i ∈ T_maj is computed from the number of minority class instances N_i^{min} in the set NN(X_i), which is defined as

D(X_i) = \frac{|N_i^{min}|}{K}    (3)

D(X_i) determines the proportion of minority class samples in the set NN(X_i). The closeness of each X_i ∈ T_maj is defined as

C(X_i) = \frac{\sum_{X_j \in N_i^{min}} dist(X_i, X_j)}{\sum_{X_j \in N_i^{maj}} dist(X_i, X_j) + \sum_{X_j \in N_i^{min}} dist(X_i, X_j)}    (4)

C(X_i) defines the proportion of the sum of distances from X_i to all the instances of N_i^{min} to the sum of the distances from X_i to all the instances of NN(X_i). The representativeness of each X_i ∈ T_maj is computed by adding D(X_i) and C(X_i), i.e.,

R(X_i) = D(X_i) + C(X_i)    (5)
R(X_i) carries the information of X_i, and it can be normalized as

R_N(X_i) = \frac{R(X_i)}{\sum_{i=1}^{Size_i^{MA}} R(X_i)}    (6)

with

\sum_{i=1}^{Size_i^{MA}} R_N(X_i) = 1    (7)

where Size_i^{MA} represents the number of majority class instances in the cluster C_k. R_N(X_i) can be considered the probability of X_i being selected as a representative instance from the majority class subset of the ith cluster.
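The following sketch implements Eqs. (3)-(6) for one cluster using scikit-learn's nearest-neighbor search; the variable names and the choice of library are illustrative assumptions.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def representativeness(X_cluster, y_cluster, K=5, majority_label=0):
    nn = NearestNeighbors(n_neighbors=K + 1).fit(X_cluster)
    dist, idx = nn.kneighbors(X_cluster)     # first neighbour is the point itself
    dist, idx = dist[:, 1:], idx[:, 1:]
    maj_mask = y_cluster == majority_label
    R = np.zeros(len(X_cluster))
    for i in np.where(maj_mask)[0]:
        neigh_min = y_cluster[idx[i]] != majority_label
        D = neigh_min.sum() / K                                    # Eq. (3)
        denom = dist[i].sum()
        C = dist[i][neigh_min].sum() / denom if denom else 0.0     # Eq. (4)
        R[i] = D + C                                               # Eq. (5)
    total = R[maj_mask].sum()
    if total > 0:
        R[maj_mask] /= total                                       # Eq. (6): selection probabilities
    return R[maj_mask]
```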
3.3 Selecting the Expected Number of Representative Majority Class Instances The representative majority class instances are selected based on their probability (defined in Eq. 6). To balance the dataset, the number of majority class instances to select is determined by the following strategy. Let N_i^{MA} represent the required number of majority class instances retrieved from the ith cluster (1 ≤ i ≤ M), defined as

N_i^{MA} = |T_{min}| \times \frac{Size_i^{MA} / Size_i^{MI}}{\sum_{i=1}^{M} Size_i^{MA} / Size_i^{MI}}    (8)

where Size_i^{MA} and Size_i^{MI} represent the number of majority and minority class instances in the ith cluster, respectively. All the representative majority class instances selected from each cluster are combined to generate a reduced majority subset T_maj_reduced:

T_{maj\_reduced} = \bigcup_{i=1}^{M} N_i^{MA}    (9)

Finally, the selected majority instances are combined with all the minority class instances to form the balanced training set T_bal:

T_{bal} = T_{maj\_reduced} \cup T_{min}    (10)
Algorithm-1 presents the pseudocode of CUTE, whose input is the training set T and output is the balanced training subset T_bal.
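Since Algorithm-1 itself is not reproduced here, the following is a hypothetical sketch of the selection step of Eqs. (8)-(10): allocate a per-cluster quota of majority instances, draw them with the probabilities from Eq. (6), and merge the result with the minority class. The cluster tuple convention is an assumption.

```python
import numpy as np

def cute_balance(clusters, probs_per_cluster, n_minority, rng=np.random.default_rng(0)):
    # clusters: list of (majority_indices, n_majority, n_minority_in_cluster)
    ratios = np.array([c[1] / max(c[2], 1) for c in clusters])
    quota = np.round(n_minority * ratios / ratios.sum()).astype(int)   # Eq. (8)
    selected = []
    for (maj_idx, _, _), p, n_i in zip(clusters, probs_per_cluster, quota):
        n_i = min(n_i, len(maj_idx))
        picked = rng.choice(maj_idx, size=n_i, replace=False, p=p)     # draw by Eq. (6)
        selected.append(picked)
    # Eq. (9): reduced majority subset; Eq. (10) then merges it with T_min
    return np.concatenate(selected)
```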
4 Experimental Setup In the experimental section, three factors are considered to assess the performance and robustness of the proposed model: the credit scoring datasets, the evaluation metrics, and the parameter settings.
4.1 Credit Scoring Dataset Two credit scoring datasets are used to evaluate the performance of the model. In this study, the German and Taiwan credit datasets, obtained from the UCI machine learning repository, were used; their details are presented in Table 1.
Table 1 Details of the credit scoring datasets

Dataset | #instance | #feature | Good customer | Bad customer | Imbalanced ratio
German  | 1000      | 20       | 700           | 300          | 2.33
Taiwan  | 30,000    | 24       | 23,364        | 6636         | 3.52
Table 2 Performance metrics of classification models

Metrics   | Definition                                    | Description
ACC       | (TP + TN)/(TP + TN + FP + FN)                 | Ratio of correctly predicted instances to the total number of instances
Precision | TP/(TP + FP)                                  | Ratio of true positives to the total instances predicted as positive
Recall    | TP/(TP + FN)                                  | Ratio of true positives among those that are actually positive
Fscore    | 2 * Precision * Recall/(Precision + Recall)   | Harmonic mean of precision and recall
AUC       | -                                             | Area under the receiver operating characteristic curve
4.2 Performance Metrics In credit scoring classification models, effective evaluation metrics are needed to determine the predictive capability of the model. The following metrics are usually applied to evaluate credit scoring models: accuracy (ACC), precision, recall, Fscore, and area under the receiver operating characteristic curve (AUC); details are listed in Table 2.
4.3 Parameter Settings The number of clusters (M) and the number of nearest neighbors (K) are the two key parameters that influence the performance of the model. The optimal number of clusters is identified using the silhouette coefficient, and the best K value is determined from the performance of the algorithm. The efficacy of the proposed model in predicting the probability of default is analyzed using two classification algorithms, random forest (RF) and logistic regression (LR). Additionally, the performance of CUTE on the imbalanced credit scoring datasets is compared with other traditional oversampling and undersampling techniques: random undersampling (RUS), random oversampling (ROS), the synthetic minority oversampling technique (SMOTE), and cluster centroid.
5 Results Analysis The results of the comparative analysis of the proposed model with other traditional resampling methods are discussed in this section. In the experiments, the effectiveness of the proposed clustering-based undersampling technique was examined by applying the LR and RF classifiers. Tables 3 and 4 show the comparative analysis of CUTE with different oversampling and undersampling methods in terms of ACC, precision, recall, Fscore, and AUC on the German and Taiwan credit scoring datasets, respectively. The best performance on each metric is shown in bold font. In terms of accuracy, CUTE obtains 76% on the German credit scoring dataset and 82% on the Taiwan credit client's dataset using the RF classifier. Compared to RF, the accuracy of the proposed model is lower with LR but still better than that of the other resampling techniques.
Table 3 Performance comparison of different resampling techniques using German credit dataset
Algorithm | Method           | ACC  | Precision | Recall | Fscore | AUC
LR        | SMOTE            | 72.6 | 74.1      | 72.6   | 73.2   | 74
LR        | ROS              | 70.0 | 67.0      | 70.0   | 67.0   | 73
LR        | RUS              | 72.4 | 72.8      | 72.4   | 72.2   | 69
LR        | Cluster centroid | 62.0 | 70.0      | 62.0   | 64.0   | 63
LR        | CUTE             | 73.1 | 72.0      | 73.3   | 74.5   | 76
RF        | SMOTE            | 75.0 | 69.0      | 69.0   | 69.0   | 78
RF        | ROS              | 75.0 | 69.0      | 68.0   | 69.0   | 77
RF        | RUS              | 73.0 | 76.0      | 73.0   | 74.0   | 72
RF        | Cluster centroid | 65.0 | 76.0      | 65.0   | 67.0   | 70
RF        | CUTE             | 76.0 | 78.0      | 74.0   | 76.0   | 79
Table 4 Performance comparison of different resampling techniques using Taiwan credit client's dataset

Algorithm | Method           | ACC  | Precision | Recall | Fscore | AUC
LR        | SMOTE            | 69.0 | 77.0      | 69.0   | 71.0   | 68
LR        | ROS              | 69.0 | 76.0      | 69.2   | 71.3   | 67
LR        | RUS              | 70.0 | 77.0      | 70.0   | 72.1   | 68
LR        | Cluster centroid | 62.4 | 75.1      | 62.2   | 66.2   | 62
LR        | CUTE             | 69.3 | 78.5      | 68.3   | 73.2   | 71
RF        | SMOTE            | 80.2 | 79.2      | 80.6   | 79.6   | 68
RF        | ROS              | 81.0 | 80.2      | 81.2   | 80.2   | 68
RF        | RUS              | 75.1 | 79.0      | 75.4   | 76.0   | 71
RF        | Cluster centroid | 67.3 | 77.2      | 65.6   | 68.2   | 64
RF        | CUTE             | 82.1 | 80.0      | 83.1   | 81.2   | 72
It has been observed that RF not only gives better accuracy but also performs better on all the metrics than LR on both the German and Taiwan credit datasets. Similarly, the proposed model achieves much better results in precision, recall, Fscore, and AUC than the other resampling techniques. Table 3 shows that, using RF on the German credit dataset, CUTE obtains a precision of 78%, a recall of 74%, an Fscore of 76%, and an AUC of 0.79. Table 4 shows that the performance of the proposed model further improves when applying RF on the Taiwan credit client's dataset in terms of precision, recall, Fscore, and AUC.
6 Conclusion and Future Work In this study, CUTE, a clustering-based undersampling technique, is proposed to handle imbalanced classification problems. In the experimental analysis, two credit scoring datasets with different imbalance ratios were used. The proposed model is compared with four state-of-the-art oversampling and undersampling methods, namely RUS, ROS, SMOTE, and cluster centroid. Accuracy, precision, recall, Fscore, and AUC are used as performance metrics in the experiments, along with the LR and RF classifiers. From the experimental analysis, it has been observed that CUTE not only achieves better accuracy but also performs well on the other metrics. Therefore, we conclude that CUTE can effectively handle imbalanced credit scoring datasets. In future studies, the proposed model can be enhanced in two directions: first, implementing ensemble techniques to optimize the performance of the model; second, extending the model to deal with multi-class imbalance problems.
References
1. Lenka, S.R., Pant, M., Barik, R.K., Patra, S.S., Dubey, H.: Investigation into the efficacy of various machine learning techniques for mitigation in credit card fraud detection. Adv. Intell. Syst. Comput. 1176, 255–264 (2021). https://doi.org/10.1007/978-981-15-5788-0_24
2. Feng, S., Zhao, C., Fu, P.: A cluster-based hybrid sampling approach for imbalanced data classification. Rev. Sci. Instrum. 91(5), 055101 (2020). https://doi.org/10.1063/5.0008935
3. Yu, L., Zhou, R., Tang, L., Chen, R.: A DBN-based resampling SVM ensemble learning paradigm for credit classification with imbalanced data. Appl. Soft Comput. J. 69, 192–202 (2018). https://doi.org/10.1016/j.asoc.2018.04.049
4. Błaszczyński, J., Stefanowski, J.: Neighbourhood sampling in bagging for imbalanced data. Neurocomputing 150, 529–542 (2015). https://doi.org/10.1016/j.neucom.2014.07.064
5. Abdi, L., Hashemi, S.: To combat multi-class imbalanced problems by means of over-sampling techniques. IEEE Trans. Knowl. Data Eng. 28(1), 238–251 (2015). https://doi.org/10.1109/TKDE.2015.2458858
6. Lin, W.C., Tsai, C.F., Hu, Y.H., Jhang, J.S.: Clustering-based undersampling in class-imbalanced data. Inf. Sci. (Ny) 409–410, 17–26 (2017). https://doi.org/10.1016/j.ins.2017.05.008
7. Crone, S.F., Finlay, S.: Instance sampling in credit scoring: an empirical study of sample size and balancing. Int. J. Forecast. 28(1), 224–238 (2012). https://doi.org/10.1016/j.ijforecast.2011.07.006
8. Tsai, C.F., Lin, W.C., Hu, Y.H., Yao, G.T.: Under-sampling class imbalanced datasets by combining clustering analysis and instance selection. Inf. Sci. (Ny) 477, 47–54 (2019). https://doi.org/10.1016/j.ins.2018.10.029
9. Barua, S., Islam, M.M., Yao, X., Murase, K.: MWMOTE—majority weighted minority oversampling technique for imbalanced data set learning. IEEE Trans. Knowl. Data Eng. 26(2), 405–425 (2014). https://doi.org/10.1109/TKDE.2012.232
10. Anand, A., Pugalenthi, G., Fogel, G.B., Suganthan, P.N.: An approach for classification of highly imbalanced data using weighting and undersampling. Amino Acids 39(5), 1385–1391 (2010). https://doi.org/10.1007/s00726-010-0595-2
11. Li, Q., Yang, B., Li, Y., Deng, N., Jing, L.: Constructing support vector machine ensemble with segmentation for imbalanced datasets. Neural Comput. Appl. 22(S1), 249–256 (2013). https://doi.org/10.1007/s00521-012-1041-z
12. Sun, Z., Song, Q., Zhu, X., Sun, H., Xu, B., Zhou, Y.: A novel ensemble method for classifying imbalanced data. Pattern Recognit. 48(5), 1623–1637 (2015). https://doi.org/10.1016/j.patcog.2014.11.014
Automatic Road Network Extraction from High-Resolution Images using Fast Fuzzy C-Means Tarun Kumar and Rashmi Chaudhari
Abstract Road networks extracted from very high-resolution (VHR) satellite imagery can be used for updating geographic information system (GIS) databases in real time. In this work, an automatic road extraction method for VHR images is presented. The proposed method uses fast fuzzy c-means (FFCM) segmentation and shape feature analysis for road delineation from VHR imagery. In the first step, spectral noise is eliminated from the VHR images using median filtering. In the second step, the road network is separated from non-road components by the FFCM segmentation method. Finally, the accuracy of the segmented road network is further boosted by eradicating the remaining non-road parts. Experiments are performed on different types of VHR images, and the accuracy of the proposed method is assessed on multiple statistical parameters, demonstrating its high precision. Keywords Road extraction · Median filter · Fast fuzzy c-means · Shape features
1 Introduction Automatic road delineation from remote sensing (RS) imagery is an active study area in the field of RS. Road networks delineated from satellite imagery are used to construct and update road network databases and to support urbanization and navigation. In the last two decades, various road delineation methods have been proposed by researchers. Automatic road extraction from VHR imagery is a scientifically stimulating and vital task [1]. The road network extraction task is affected by uneven geometric distortions and radiometric differences caused by obstacles such as shadows, buildings, and vegetation.
T. Kumar (B) · R. Chaudhari Radha Govind Group of Institutions, Meerut, Uttar Pradesh, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 S. Roy et al. (eds.), Innovations in Computational Intelligence and Computer Vision, Advances in Intelligent Systems and Computing 1424, https://doi.org/10.1007/978-981-19-0475-2_12
Road network delineation methods are categorized into classification-based, knowledge-based, snake-based, and morphology-based methods [2]. Based on the degree of automation, road delineation methods can be categorized as automatic or semi-automatic [3]. Semi-automatic methods are laborious and costly due to operator intervention, while automatic extraction methods are complex and require additional knowledge. Apart from this, automation of the road extraction process is limited by various factors, such as the similarity between the spectral properties of road and non-road parts, and occluding factors. In past years, various attempts have been made by researchers to automate or semi-automate the road delineation task. Classification using a support vector machine (SVM) and mathematical morphology was used to develop a semi-automatic approach to demarcate road paths from RS images [4]. An integrated framework of spatial-spectral and shape features was used to detect road centerlines [5]. A convolutional neural network and Gabor filters were employed to delineate road networks from satellite images [6]. Level sets and mathematical morphology have been efficiently implemented to extract road networks from VHR images [7]. An approach based on the linear structure of the road network, mean-shift, and level set methods automatically extracts road networks from VHR images of the IKONOS satellite [8]. An automated framework combining an artificial neural network and k-means clustering was used to detect roads in VHR images of QuickBird and IKONOS [9]. An automatic road extraction method using fuzzy classification was proposed to extract roads from multispectral data based on the spectral, local, and global properties of the road network [10]. A vectorization method, consisting of segmentation of road components with decision making to detect roads, has been used for automated road extraction [11]. Given the complex nature of satellite imagery and road features, automatic road extraction is still challenging. In the present work, an automated approach to detect road paths from satellite imagery is proposed. The proposed method is based on FFCM and shape feature analysis. Experiments are performed on different VHR images from two well-known datasets, and the experimental results and statistical evaluation on different parameters establish the accuracy of the proposed approach. The remainder of the paper is organized as follows: Sect. 2 describes the method used, Sect. 3 presents the experimental results, analysis, and comparison with existing methods, and Sect. 4 discusses the conclusion and future aspects of the work.
2 Methodology In this work, an automatic method of road network delineation from VHR imagery is presented using a non-linear median filter, the FFCM [12] algorithm, and geometric features of the road network. The proposed method is divided into four steps. Firstly, a non-linear filter is applied in the preprocessing step to eliminate noise from the VHR imagery that obscures road information. Secondly, the FFCM algorithm is used to segment the road network from the VHR imagery. In the third step, the
Fig. 1 Flowchart of the proposed study
segmented image, which still contains some non-road components, is refined by eliminating them through shape feature analysis. Finally, the identified road network is evaluated on various quantitative parameters. The flowchart of the proposed method is shown in Fig. 1.
2.1 Image Preprocessing using the Median Filter During image acquisition, satellite imagery suffers from noise due to thermal effects, circuits, and sensors. Such noise distorts the road network information that lies on the edges. The median filter eliminates the outliers without losing the sharpness of road edges in VHR imagery, and the noise pdf can be articulated as follows:

G(x) = \frac{x^{a-1}}{(a-1)!\, a^{a}} e^{-x/a}    (1)

where a^2 and x are the variance and intensity values, respectively.
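A minimal sketch of this preprocessing step with SciPy is shown below; the 3x3 window size is an assumption, as the paper does not state the kernel size.

```python
import numpy as np
from scipy.ndimage import median_filter

image = np.random.randint(0, 256, (512, 512)).astype(np.uint8)  # stands in for a VHR tile
denoised = median_filter(image, size=3)   # removes outliers while preserving road edges
```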
2.2 Road Network Segmentation using FFCM The primary objective of image segmentation is to separate areas having similar spectral features. In VHR satellite imagery, the road network has different spectral features from adjacent regions such as buildings, vegetation, and water bodies, so these non-road components must be segmented from the road network. The present work uses FFCM to identify the road regions in VHR imagery due to its speed and its efficiency in handling noise without losing image information. Fuzzy c-means (FCM) [13] is a widely used image segmentation method; its fuzziness in pixel clustering generates better segmentation results than traditional segmentation methods. The FCM objective function for partitioning data points \{x_k\}_{k=1}^{N} \subset R^d into c clusters is given by Eq. 2:

J_m(U, V) = \sum_{i=1}^{c} \sum_{k=1}^{N} u_{ik}^{p} \|x_k - v_i\|^2    (2)

where V = \{v_i\}_{i=1}^{c} is the set of cluster prototypes, U = \{\mu_{ik}\} is the partition matrix, the parameter p \in (1, \infty) determines the degree of fuzziness, and \mu_{ik} is the fuzzy membership subject to the constraint given in Eq. 3:

\sum_{i=1}^{c} \mu_{ik} = 1, \quad \mu_{ik} \in [0, 1], \quad 1 \le k \le N, \; 1 \le i \le c    (3)
The FCM algorithm is well suited to noise-free images, but its performance decreases in the presence of noise and artifacts, and VHR satellite images contain considerable noise and artifacts. FCM-S1 [14], an improved version that increases robustness, modifies the objective function of Eq. 2 as presented in Eq. 4:

J_{fcm} = J_m + \alpha \sum_{i=1}^{c} \sum_{k=1}^{N} u_{ik}^{p} \|\bar{x}_k - v_i\|^2    (4)

where the parameter \alpha controls the effect of the neighborhood term and \bar{x}_k is the mean of the neighbors of x_k. The necessary conditions for u_{ik} and v_i to be a local minimum are as follows:

u_{ik} = \frac{\left( \|x_k - v_i\|^2 + \alpha \|\bar{x}_k - v_i\|^2 \right)^{-\frac{1}{m-1}}}{\sum_{j=1}^{c} \left( \|x_k - v_j\|^2 + \alpha \|\bar{x}_k - v_j\|^2 \right)^{-\frac{1}{m-1}}}    (5)

v_i = \frac{\sum_{k=1}^{n} u_{ik}^{p} (x_k + \alpha \bar{x}_k)}{(1 + \alpha) \sum_{k=1}^{n} u_{ik}^{p}}    (6)
However, the segmentation result of FCM-S1 is not satisfactory for images with salt-and-pepper noise, and if \alpha = 0 it reduces to the FCM algorithm. To overcome these limitations, FFCM was proposed [12]; it is robust to noise, preserves the details of the image, and is computationally faster, as its cost depends on q (the number of gray levels) rather than on N. The objective function of FFCM is as follows:

J_{FFCM} = \sum_{i=1}^{c} \sum_{l=1}^{q} \gamma_l \, \mu_{il}^{p} (\xi_l - v_i)^2    (7)

where \gamma_l is the number of pixels with gray value l (l = 1, 2, ..., q) and \xi_l is the gray value of the lth level of the locally smoothed image. For a pixel i with neighborhood N_i, the smoothed value \xi_i is calculated as follows:

\xi_i = \frac{\sum_{j \in N_i} S_{ij} x_j}{\sum_{j \in N_i} S_{ij}}, \quad \text{where } S_{ij} = \begin{cases} 1, & j \ne i \\ 0, & j = i \end{cases}    (8)

so that \xi_i is the mean gray value of the neighbors of x_i.
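The following is a minimal NumPy sketch of the FFCM idea in Eq. (7): the gray levels are clustered with histogram weights, so each iteration costs O(cq) rather than O(cN). It is a simplified illustration, not the authors' implementation.

```python
import numpy as np

def ffcm(image, c=2, p=2.0, n_iter=50):
    # histogram of gray levels: xi_l and gamma_l in Eq. (7)
    levels, counts = np.unique(image.ravel(), return_counts=True)
    levels = levels.astype(float)
    v = np.linspace(levels.min(), levels.max(), c)          # initial cluster centres
    for _ in range(n_iter):
        d = (levels[None, :] - v[:, None]) ** 2 + 1e-12     # (c, q) squared distances
        u = d ** (-1.0 / (p - 1.0))
        u /= u.sum(axis=0, keepdims=True)                   # fuzzy memberships
        w = counts * u ** p                                 # gamma_l * u_il^p
        v = (w * levels).sum(axis=1) / w.sum(axis=1)        # centre update
    # map each pixel's gray level back to its hard cluster label
    return v, u.argmax(axis=0)[np.searchsorted(levels, image.astype(float))]

img = np.random.randint(0, 256, (64, 64))   # stands in for the smoothed VHR image
centres, seg = ffcm(img)                    # seg labels pixels road / non-road
```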
2.3 Shape Feature Analysis The image segmented by the FFCM algorithm still contains some false road components, such as buildings and vehicles, which are segmented as road because their spectral features are similar to those of the road network. The road network, however, has distinctive geometric/shape features, such as an elongated structure, a characteristic length-to-width ratio, and area. The non-road components in the segmented result are identified through analysis of these shape features and eliminated. Furthermore, shadows of trees and vehicles create holes in the road network, which are repaired by morphological operations.
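A minimal sketch of this filtering step with scikit-image is shown below; the area and elongation thresholds are illustrative assumptions, as the paper does not report its exact values.

```python
import numpy as np
from skimage import measure, morphology

def filter_roads(binary_seg, min_area=500, min_elongation=3.0):
    labels = measure.label(binary_seg)
    keep = np.zeros_like(binary_seg, dtype=bool)
    for region in measure.regionprops(labels):
        elong = region.major_axis_length / max(region.minor_axis_length, 1e-6)
        if region.area >= min_area and elong >= min_elongation:  # elongated structures
            keep[labels == region.label] = True
    # morphological closing repairs small holes left by shadows and vehicles
    return morphology.binary_closing(keep, morphology.disk(3))
```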
3 Experimental Results and Analysis The experiments are performed on different QuickBird images with 1 m/pixel spatial resolution from VP Labs [15] and on aerial images [16] with 1.5 m/pixel resolution. The performance of road extraction is evaluated on the quantitative parameters [17] E_1 (completeness), E_2 (correctness), and E_3 (quality), which are calculated as follows:

E_1 = \frac{TP}{TP + FN} \times 100    (9)

E_2 = \frac{TP}{TP + FP} \times 100    (10)

E_3 = \frac{TP}{TP + FN + FP} \times 100    (11)
where TP, FN, and FP stand for the matched road data present in both the extracted and reference networks, the road data present in the reference but absent in the extracted network, and the road data present in the extracted network but absent in the reference, respectively. The experimental results on diverse VHR satellite images are shown in Fig. 2.
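A minimal sketch of Eqs. (9)-(11) computed pixel-wise on binary masks is shown below; buffered matching, as used in [17], would relax the exact pixel overlap assumed here.

```python
import numpy as np

def road_metrics(extracted, reference):
    tp = np.logical_and(extracted, reference).sum()
    fp = np.logical_and(extracted, ~reference).sum()
    fn = np.logical_and(~extracted, reference).sum()
    completeness = 100.0 * tp / (tp + fn)        # E1, Eq. (9)
    correctness = 100.0 * tp / (tp + fp)         # E2, Eq. (10)
    quality = 100.0 * tp / (tp + fn + fp)        # E3, Eq. (11)
    return completeness, correctness, quality
```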
Fig. 2 a1–d1 VHR aerial images [16] with the reference road network shown in red, a2–d2 median filtered images, a3–d3 segmented road networks using FFCM, a4–d4 extracted road networks
The images are of a sub-urban area containing other structures similar to roads, which makes the task challenging. The ground truth data are shown in red in Fig. 2a1–d1, the road networks segmented using FFCM are shown in Fig. 2a3–d3, and the final road networks extracted by the proposed approach are shown in Fig. 2a4–d4. In some places, the road is fragmented by environmental or man-made factors, such as vehicles and trees, which prevent the spectral properties of the road from being detected. Another set of VHR images, from the QuickBird satellite, was selected to validate the accuracy of the described method; the delineated road networks are depicted in Fig. 3.
Fig. 3 a1–d1 VHR images of QuickBird [15] with the reference road network shown in red, a2–d2 median filtered images, a3–d3 segmented road networks using FFCM, a4–d4 extracted road networks
Table 1 Performance analysis of the proposed method on quantitative parameters on VHR aerial images

Image      | E1    | E2    | E3
Figure 2a1 | 88.24 | 80.06 | 72.34
Figure 2b1 | 82.60 | 78.05 | 67.03
Figure 2c1 | 82.23 | 76.73 | 66.46
Figure 2d1 | 78.01 | 63.25 | 53.68
Table 2 Comparative study of road extraction methods

Methods       | Figure 2b1: E1 | E2    | E3    | Figure 2c1: E1 | E2    | E3
Huang         | 80.4           | 77.3  | 65.1  | 93.39          | 61.42 | 58.86
Miao          | 68.1           | 85.7  | 61.2  | 26.12          | 49.56 | 21.00
Proposed work | 82.60          | 78.05 | 67.03 | 82.23          | 76.73 | 66.46
The images are of a highly built-up area containing many building structures made of materials similar to road, which poses hurdles in road extraction. The road networks extracted from such complex images are shown in Fig. 3a4–d4. To assess the accuracy of the proposed work, the delineated road network is matched against the ground truth road network obtained manually from the VHR images. The performance of the proposed method on the VHR aerial images, in terms of the quantitative parameters E_1, E_2, and E_3, is shown in Table 1. To validate its accuracy, the proposed method is compared with the road extraction methods of Huang and Zhang [18] and Miao et al. [19]; the comparative study is shown in Table 2. The proposed method outperformed the compared methods on the unbiased parameter E_3, since E_1 and E_2 individually can be biased. For Fig. 2b1, the proposed method increased the completeness and quality of the extracted road network by 2.2% and 2.1%, respectively, compared to the other methods, while for Fig. 2c1, the correctness and quality increased by 15% and 8%.
4 Conclusion This work presents a simple yet effective automated road network delineation method for VHR imagery. The proposed method includes noise removal with a non-linear median filter and segmentation of road components using the FFCM algorithm. To eliminate false road components from the segmented image, shape feature analysis based on the geometric features of the road network is used to delineate the final road network. Different types of VHR satellite imagery are used to validate the efficacy of the presented work, and it effectively delineates road networks in complex imagery of developed urban and semi-urban areas, as shown in the experimental results. The performance of the proposed work decreases (the road network is broken) when the spectral reflectance of the road network varies due to various occluding
factors. The broken road network can be handled efficiently in future works, and the road network can also be delineated from low-resolution imagery.
References
1. Mena, J.B.: State of the art on automatic road extraction for GIS update: a novel classification. Pattern Recognit. Lett. 24(16), 3037–3058 (2003). https://doi.org/10.1016/S0167-8655(03)00164-8
2. Wang, W., Yang, N., Zhang, Y., Wang, F., Cao, T., Eklund, P.: A review of road extraction from remote sensing images. J. Traffic Transp. Eng. (English Ed.) 3(3), 271–282 (2016). https://doi.org/10.1016/j.jtte.2016.05.005
3. Lian, R., Wang, W., Mustafa, N., Huang, L.: Road extraction methods in high-resolution remote sensing images: a comprehensive review. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 13, 5489–5507 (2020). https://doi.org/10.1109/JSTARS.2020.3023549
4. Bakhtiari, H.R.R., Abdollahi, A., Rezaeian, H.: Semi automatic road extraction from digital images. Egypt. J. Remote Sens. Sp. Sci. 20(1), 117–123 (2017). https://doi.org/10.1016/j.ejrs.2017.03.001
5. Shi, W., Miao, Z., Debayle, J.: An integrated method for urban main-road centerline extraction from optical remotely sensed imagery. IEEE Trans. Geosci. Remote Sens. 52(6), 3359–3372 (2014). https://doi.org/10.1109/TGRS.2013.2272593
6. Liu, R., et al.: Multiscale road centerlines extraction from high-resolution aerial imagery. Neurocomputing 329, 384–396 (2019). https://doi.org/10.1016/j.neucom.2018.10.036
7. Yang, L., Wang, X., Zhang, C., Zhai, J.: Road extraction based on level set approach from very high-resolution images with volunteered geographic information. IEEE Access 8, 178587–178599 (2020). https://doi.org/10.1109/ACCESS.2020.3027573
8. Revathi, M., Sharmila, M.: Automatic road extraction using high resolution satellite images based on level set and mean shift methods. In: 2013 4th International Conference on Computing, Communications and Networking Technologies, ICCCNT 2013, pp. 1–7 (2013). https://doi.org/10.1109/ICCCNT.2013.6726766
9. Mokhtarzade, M., Zoej, M., Ebadi, H.: Automatic road extraction from high resolution satellite images using neural networks, texture analysis, fuzzy clustering and genetic algorithms. Int. Arch. Photogramm. Remote Sens. Spat. Inf. Sci. 37, 549–556 (2008)
10. Bacher, U., Mayer, H.: Automatic road extraction from multispectral high resolution satellite images. Geomatica XXXVI, 29–34 (2005)
11. Hormese, J., Saravanan, C.: Automated road extraction from high resolution satellite images. Procedia Technol. 24, 1460–1467 (2016). https://doi.org/10.1016/j.protcy.2016.05.180
12. Cai, W., Chen, S., Zhang, D.: Fast and robust fuzzy c-means clustering algorithms incorporating local information for image segmentation. Pattern Recognit. 40(3), 825–838 (2007). https://doi.org/10.1016/j.patcog.2006.07.011
13. Peizhuang, W.: Pattern recognition with fuzzy objective function algorithms (James C. Bezdek). SIAM Rev. 25(3), 442 (1983). https://doi.org/10.1137/1025116
14. Chen, S., Zhang, D.: Robust image segmentation using FCM with spatial constraints based on new kernel-induced distance measure. IEEE Trans. Syst. Man. Cybern. Part B 34(4), 1907–1916 (2004). https://doi.org/10.1109/TSMCB.2004.831165
15. Vplab: Vplab. http://www.cse.iitm.ac.in/~vplab/satellite.html. Accessed 15 July 2020
16. Turetken, E., Benmansour, F., Fua, P.: Automated reconstruction of tree structures using path classifiers and mixed integer programming. In: Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 566–573 (2012). https://doi.org/10.1109/CVPR.2012.6247722
17. Heipke, C., Mayer, H., Wiedemann, C., Jamet, O.: Evaluation of automatic road extraction. International Archives of Photogrammetry and Remote Sensing 32(3–4W2), 151–156 (1997)
18. Huang, X., Zhang, L.: Road centreline extraction from high-resolution imagery based on multiscale structural features and support vector machines. Int. J. Remote Sens. 30(8), 1977–1987 (2009). https://doi.org/10.1080/01431160802546837
19. Miao, Z., Shi, W., Zhang, H., Wang, X.: Road centerline extraction from high-resolution imagery based on shape features and multivariate adaptive regression splines. IEEE Geosci. Remote Sens. Lett. 10(3), 583–587 (2013). https://doi.org/10.1109/LGRS.2012.2214761
Analysis of Approaches for Irony Detection in Tweets for Online Products S. Uma Maheswari and S. S. Dhenakaran
Abstract In sentiment analysis, irony/sarcasm/satire detection is a topic of broad current research attention. Irony detection is a complex problem because the true meaning or emotion of a satirical review must be identified: sarcastic reviews may contain positive words while the emotion is negative, and vice versa. This research work carries out irony detection on Twitter tweets about Amazon product reviews. Lexicon-based features with N-gram and skip-gram methods are explored for irony detection. To recognize the best approach for irony detection and prediction, a total of 22,000 varied irony and non-irony tweets on Amazon products are collected and used with a new deep learning (DL) approach. The results are compared with various machine learning approaches, namely decision tree (DT), support vector machine, logistic regression (LR), and random forest (RF). It is seen that the proposed DL model produces average results when compared to the classical machine learning DT and RF models. Keywords Sarcasm · Irony · Twitter tweets · Amazon products · Sentiment analysis · Prediction · Lexical approach · N-gram approach · Skip-gram approach
S. U. Maheswari (B) · S. S. Dhenakaran, Department of Computer Science, Alagappa University, Karaikudi, India. e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022. S. Roy et al. (eds.), Innovations in Computational Intelligence and Computer Vision, Advances in Intelligent Systems and Computing 1424, https://doi.org/10.1007/978-981-19-0475-2_13
1 Introduction People live in an interconnected digital world. To market new products, organizations launch them on social networks and subsequently analyze public opinion through reviews or posts. However, analyzing reviews poses the challenge of identifying sarcastic opinions. Twitter is a widely used social network for expressing opinions and posting tweets. Sarcasm analysis is a kind of sentiment analysis: to express heightened or stressful thoughts, sarcasm or irony is used in reviews. When people search reviews to see comments on products, they encounter sarcastic reviews. Sarcastic reviews are a tough task for an
analyst to interpret, since customers express their feelings in sarcastic or satirical form: people with an implicitly negative intuition may literally convey a positive opinion, and vice versa. The aforementioned issue motivates irony detection for finding the precise meaning of product reviews. The aim of this work is to find an effective irony detection approach, comparing the proposed deep learning (DL) approach with various kinds of machine learning techniques. To find ironic reviews, lexicon-based features are applied in POS-tagging, the sentiment analysis process, and natural language processing (NLP). Typically, sarcasm detection features are classified into three groups: lexicon-based features, pragmatic features, and hyperbolic features. Lexicon-based features include sentiment-oriented words, hashtagged words, and N-gram words. Hyperbolic features include punctuation marks, intensifiers, modifiers, quotation marks, capitalized words, and slang words. Pragmatic features include emojis, smilies, and replies. Kreuz et al. [1] first introduced N-gram features, showing the importance and benefit of lexical features in recognizing irony. The authors of [2] illustrated the usefulness of syntactic features together with lexical features in irony detection. Riloff et al. [3] demonstrated the construction of lexical features accompanied by unigram, bigram, and trigram features. The rest of this work is organized as follows: Sect. 2 investigates related works on sarcasm and irony detection in sentiment analysis, Sect. 3 explains the process of the proposed system, Sect. 4 presents the experimental analysis and results, and finally, Sect. 5 states the conclusion with future directions.
2 Related Works Yunitasari et al. [4] performed automatic sarcasm detection on Indonesian tweets using unigram sentiment-related features, punctuation-related features, lexical words, syntactic features, and top words from a feature selection process, and suggested the random forest (RF) algorithm for identifying sarcastic reviews. Their work used TF-IDF for feature retrieval while applying a Naïve Bayes classifier for sentiment analysis, and achieved a 5.49% increase in accuracy. Potamias et al. [5] employed a DL technique, RCNN-RoBERTa, to identify satirical reviews. This mechanism used pre-trained word embeddings with a recurrent neural network (RNN) layer. The research community currently favors DL approaches, and the authors also stated that classical machine learning techniques such as K-nearest neighbor, decision tree (DT), and RF are not suitable for real-time applications. Several diverse mechanisms [6] have been utilized to recognize irony in reviews, such as rule-based approaches [7], pattern-based approaches [8], and behavioral modeling approaches [9]. To identify satirical reviews, researchers have opted for different machine learning techniques on datasets such as Twitter, Facebook, and IMDB [10–14]. Reganti et al. [15] recommended a sarcasm detection model based on four kinds of irony: exaggeration, incongruity, parody, and reversal. Bharti et al. [16] devised six kinds of satire detection, namely contrary reviews between negative and positive views
and vice versa, hashtag-based, behavior-based, contradiction-based, and temporal knowledge-based satire detection. Riloff et al. [3] proposed irony detection on a massive unbalanced political dataset [17]. Vijayalakshmi and Senthilrajan [18] built a hybrid approach for irony recognition by combining N-gram methods, knowledge extraction, lexical analysis, a contrast approach, and hyperbole- and emotion-based approaches. Some researchers have recommended sentiment analysis models based on the lexicon approach [19–21]. Gonçalves and Araújo [21] investigated and compared various lexicon-based approaches to find the most efficient method for sentiment analysis. González-Ibánez et al. [22] formulated lexical features from WordNet-Affect [23] and Linguistic Inquiry and Word Count (LIWC) [24] to detect satirical reviews. Barbieri et al. [25] developed seven kinds of lexical features for irony detection. Tsur et al. [26] illustrated a sarcasm detection system with outperforming results on tweets about Amazon products with the help of bigram-based features. From this study, it is understood that many researchers have utilized several types of features for sarcasm detection in reviews, often combined with lexicon-based features. The investigation suggests that classical machine learning techniques are not the best fit and that DL can produce the best results in satire detection. This proposed work therefore also uses lexical features with a DL approach and compares the result with machine learning approaches for irony identification in product reviews.
3 Proposed Methodology The proposed methodology has six phases: data sanitization, lexicon-based feature selection, DL classification model construction, classification and prediction, performance analysis, and comparison of results with baseline models.
3.1 Data Sanitization Python Tweepy and the Twitter API were used to collect real-time irony and non-irony tweets about Amazon products. The tweets are stored in the Hadoop Distributed File System (HDFS) and extracted for data sanitization. Typically, tweets contain user mentions, #sarcasm and #sarcastic tags, URL links, spelling mistakes, and special symbols and characters; these terms are removed from the captured tweets because they are not useful for the proposed social sentiment analysis. The remaining terms of the tweets are sanitized by Python and NLP preprocessing methods [27]. Tokenization splits the tweets into tokens or words. Stop word removal eliminates surplus words from the tweets. Stemming strips affixes to
form base terms. Part-of-Speech (POS) tagging is applied to identify terms such as nouns, pronouns, verbs, adverbs, and adjectives.
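A minimal sketch of this sanitization pipeline with NLTK is shown below; the regular expressions are illustrative, and the NLTK corpora (punkt, stopwords, tagger) are assumed to be downloaded.

```python
import re
import nltk
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer

def sanitize(tweet):
    # strip user mentions, hashtags such as #sarcasm, and URL links
    tweet = re.sub(r'@\w+|#\w+|https?://\S+', '', tweet)
    tokens = nltk.word_tokenize(tweet.lower())
    tokens = [t for t in tokens if t.isalpha()]                       # drop symbols and numbers
    tokens = [t for t in tokens if t not in stopwords.words('english')]
    stems = [PorterStemmer().stem(t) for t in tokens]                 # base terms
    return nltk.pos_tag(tokens), stems                                # POS tags + stems

tags, stems = sanitize("Wow, @amazon this #sarcasm product is just great... https://t.co/x")
```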
3.2 Lexicon-Based Feature Selection The sanitized data (words) are then passed to the feature selection process. In this work, lexicon-based features are used, namely opinion-oriented words, N-gram words, and skip-gram words. The N-gram words include unigram, bigram, trigram, and multi-gram words; the skip-gram words cover one-skip-gram, two-skip-gram, and three-skip-gram words. Generally, N-gram features use bigrams, and some works include trigrams, but multi-gram and skip-gram words are rarely considered. The proposed work considers multi-gram words as well as skip-gram words to retain the syntactic meaning of words with their adjacent words.
3.2.1 N-Gram Words An n-gram phrase may be a unigram, bigram, trigram, or multi-gram. If n is 1 the gram is called a unigram; likewise, n = 2 gives a bigram, n = 3 a trigram, and n = 4, 5, 6, and so on, a multi-gram. The unigram model extracts each single word from the tweets, the bigram model fetches two successive words, the trigram model retrieves three successive words, and the multi-gram model collects four or more consecutive words.
3.2.2 Skip-Gram Words
A skip-gram is like the N-gram model, but in a skip-gram alternate terms are selected. Hence, in one-skip-gram words, the first, third, fifth, etc. terms are selected as feature words. Similarly, a two-skip-gram is formed from the first, fourth, seventh, etc. terms (leaving out two words each time). Likewise, a three-skip-gram is formed from the first, fifth, ninth, etc. terms as feature words.
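The n-gram and skip-gram constructions described above can be sketched in a few lines; note that the skip-word selection below follows the description in this section (keep a term, then leave out k terms), which differs from the classical k-skip-n-gram definition:

```python
def ngrams(tokens, n):
    # Consecutive n-word windows: unigram (n=1), bigram (n=2), ..., multi-gram (n>=4)
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def skip_words(tokens, k):
    # Keep a term, then skip k terms: k=1 -> 1st, 3rd, 5th...; k=2 -> 1st, 4th, 7th...
    return tokens[::k + 1]

tokens = "the phone looks great but dies in an hour".split()
print(ngrams(tokens, 2))      # bigram features
print(skip_words(tokens, 2))  # ['the', 'great', 'in']
```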
3.2.3 Sentiment Orientation Words
Opinion-oriented words are extracted from each tweet, and the valence of each opinion word is estimated from the TextBlob, VADER, SentiWordNet and modified AFINN lexicons. When the valence is positive, the positive count is incremented; for negative valence, the negative count is incremented; otherwise, the neutral count is incremented. The counts of positive, negative and neutral words are utilized as sentiment orientation features for the classification of irony tweets.
3.2.4 Overall Valence
Here, the overall opinion valence of the tweet is calculated. The overall opinion can be positive, negative or neutral, and is estimated with the help of the aforementioned four lexicons. The overall opinion of the tweet is also utilized as one of the lexicon features for classifying irony tweets in this work.
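A hedged sketch of the sentiment orientation and overall-valence features, assuming the textblob and vaderSentiment packages (the SentiWordNet and modified AFINN lookups used in this work are omitted here):

```python
from textblob import TextBlob
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

vader = SentimentIntensityAnalyzer()

def valence_features(words):
    pos = neg = neu = 0
    for w in words:
        score = vader.polarity_scores(w)['compound']
        if score > 0:
            pos += 1          # positive valence
        elif score < 0:
            neg += 1          # negative valence
        else:
            neu += 1          # neutral
    overall = TextBlob(' '.join(words)).sentiment.polarity  # overall valence of tweet
    return [pos, neg, neu, overall]
```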
3.3 Deep Learning Classification Model Construction

Deep learning has gained much attention in the classification of sentiments in reviews. The impact of previous and ongoing research on sentiment analysis and irony detection has motivated us to develop a DL model and compare it with basic machine learning models, in order to choose an appropriate classifier for irony detection and classification. In the proposed DL approach, a Multi-Layer Perceptron model is implemented alongside classical machine learning principles. The proposed DL model uses 12 nodes in the input layer, two hidden layers with 9 nodes each using the ReLU activation function, and one output layer with a Sigmoid activation function. The Adam optimizer is used to train the neural network, and the Binary Cross Entropy loss function is used to evaluate the binary classification model. In this feed-forward neural network model, epochs are used for iterations and the batch size is used to split the entire dataset into small fixed-size batches. The performance of the implemented neural network model is assessed by performance metrics (Figs. 1, 2).
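The described network can be expressed directly in Keras; the following is a minimal sketch matching the stated architecture (12-node input, two 9-node ReLU hidden layers, a single sigmoid output, Adam optimizer and binary cross-entropy), with the epoch and batch-size values shown only as placeholders:

```python
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    layers.Input(shape=(12,)),               # 12 lexicon-based features per tweet
    layers.Dense(9, activation='relu'),      # hidden layer 1
    layers.Dense(9, activation='relu'),      # hidden layer 2
    layers.Dense(1, activation='sigmoid'),   # irony (1) vs. non-irony (0)
])
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
# model.fit(x_train, y_train, epochs=100, batch_size=32)  # placeholder values
```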
3.4 Classification and Prediction

A sentiment score is calculated for the extracted features. Positive words, negative words, neutral words, the overall polarity of the tweet, and the polarity of adjacent negation words are estimated. In this work, irony tweets are labeled 1 and non-irony tweets are labeled 0 for the irony classification and prediction process. The captured tweets are divided into training and testing data sets: 80% of the data is used for training and 20% for testing. Training and testing are implemented on the Multi-Layer Perceptron DL model. Classification and prediction are also performed with the existing basic machine learning models, namely support vector machine (SVM), DT, RF and logistic regression (LR). Finally, the classified tweets are stored in MongoDB.
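The 80/20 split and the baseline comparison can be sketched with scikit-learn; synthetic data stands in here for the real lexicon features, and default hyperparameters are assumed:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression

# Stand-in for the 12 lexicon-based features with irony/non-irony labels
X, y = make_classification(n_samples=10000, n_features=12, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

models = {'SVM': SVC(), 'DT': DecisionTreeClassifier(),
          'RF': RandomForestClassifier(), 'LR': LogisticRegression(max_iter=1000)}
for name, clf in models.items():
    clf.fit(X_train, y_train)
    print(name, 'accuracy:', clf.score(X_test, y_test))
```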
Fig. 1 Proposed system architecture
Fig. 2 Neural network layers in proposed deep learning model
3.5 Performance Analysis

The performance of the proposed DL model and the classical machine learning models is evaluated with the confusion matrix. F1-score, precision, recall and accuracy are derived from the confusion matrix [27].
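Continuing the sketch above, the confusion-matrix-derived metrics are available directly from scikit-learn:

```python
from sklearn.metrics import (confusion_matrix, accuracy_score,
                             precision_score, recall_score, f1_score)

y_pred = models['RF'].predict(X_test)
print(confusion_matrix(y_test, y_pred))
print('accuracy :', accuracy_score(y_test, y_pred))
print('precision:', precision_score(y_test, y_pred))
print('recall   :', recall_score(y_test, y_pred))
print('F1-score :', f1_score(y_test, y_pred))
```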
3.6 Comparison of Results with Basic Models

The performance of the proposed DL model, measured by accuracy, precision, recall and F1-score [27], is compared with the performance of existing traditional machine learning models.
4 Experimental Analysis

The deep learning model and the other existing machine learning models are tested on sets of 5000, 7000 and 10,000 reviews. The performance measures of the DL network model are depicted below. Table 1 presents the performance of sentiment analysis with 5000 irony and non-irony tweets. Here, the DT and LR models produced the highest accuracy, precision, recall and F1-score values; the proposed DL model achieved average values; and SVM produced the lowest. A graphical representation of these test results is illustrated in Fig. 3. Table 2 shows the performance of the irony identification model on a mixed set of 7000 irony and non-irony tweets. Here, the LR and DT models provided the highest accuracy, precision, recall and F1-score values; the proposed DL model provided moderate values; and SVM alone provided the lowest. A graphical representation of this test result is shown in Fig. 4. Table 3 shows the performance of the irony detection model with 10,000 tweets.
Table 1 Performance measure on 5000 tweets

Techniques   Accuracy %   Precision %   Recall %   F1-score %
DT           86           88            86         87
SVM          61           62            82         71
LR           85           87            88         87
RF           71           72            82         77
DL           72           75            80         77
Fig. 3 Performance of various methods

Table 2 Performance measure on 7000 tweets

Techniques   Accuracy %   Precision %   Recall %   F1-score %
DT           85           88            86         87
SVM          61           62            82         71
LR           86           89            87         88
RF           71           72            82         77
DL           72           71            83         77
Fig. 4 Graphical representation of various methods
Table 3 Performance measure on 10,000 tweets

Techniques   Accuracy %   Precision %   Recall %   F1-score %
DT           85           88            86         87
SVM          61           62            82         71
LR           86           88            87         87
RF           71           72            82         77
DL           72           70            88         78
Fig. 5 Graphical representation of methods with 10,000 tweets
Here also, the LR and DT models show the highest accuracy, precision, recall and F1-score values, while the proposed DL model performs at a moderate level and SVM alone shows the lowest values. A graphical representation of this test result is exhibited in Fig. 5.
5 Conclusion

A new DL method is proposed to perform sentiment analysis. The experiment is conducted to analyze customer comments on online products. The work considers irony words in tweets to better perform sentiment analysis. For experimentation, 5000, 7000 and 10,000 tweets about Amazon products are captured and sanitized for irony classification and prediction. For irony classification and prediction, a new feed-forward DL model is implemented. The results of the proposed neural network model are compared with the classical machine learning algorithms DT, LR, SVM and RF. It is concluded that the DL algorithm and
machine learning algorithms alone are not sufficient for irony detection in tweets. From the investigation, it is understood that a hybrid model may be necessary for irony detection in tweets, to better interpret customer comments on online products.
References

1. Kreuz, R.J., Roberts, R.M.: Two cues for verbal irony: hyperbole and the ironic tone of voice. Metaphor. Symb. 10(1), 21–31 (1995)
2. Kreuz, R., Caucci, G.: Lexical influences on the perception of sarcasm. In: Proceedings of the Workshop on Computational Approaches to Figurative Language, pp. 1–4 (2007)
3. Riloff, E., Qadir, A., Surve, P., De Silva, L., Gilbert, N., Huang, R.: Sarcasm as contrast between a positive sentiment and negative situation. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, pp. 704–714 (2013)
4. Yunitasari, Y., Musdholifah, A., Sari, A.K.: Sarcasm detection for sentiment analysis in Indonesian tweets. IJCCS 13(1), 53–62 (2019)
5. Potamias, R.A., Siolas, G., Stafylopatis, A.G.: A transformer-based approach to irony and sarcasm detection. Neural Comput. Appl. 32(23), 17309–17320 (2020)
6. Masroor, S., Husain, M.S.: Sarcasm analysis using social media: a literature review. In: Second International Conference on Advancement in Computer Engineering and Information Technology, ISSN (online): 2250–0758, ISSN (print): 2394–6962, ACIET (2018)
7. Hiai, S., Shimada, K.: A sarcasm extraction method based on patterns of evaluation expressions. In: 2016 5th IIAI International Congress on Advanced Applied Informatics (IIAI-AAI), vol. 11, pp. 31–36. IEEE (2016)
8. Bouazizi, M., Ohtsuki, T.O.: A pattern-based approach for sarcasm detection on twitter. IEEE Access 4, 5477–5488 (2016)
9. Rajadesingan, A., Zafarani, R., Liu, H.: Sarcasm detection on twitter: a behavioral modeling approach. In: Proceedings of the Eighth ACM International Conference on Web Search and Data Mining, pp. 97–106 (2015)
10. Bouazizi, M., Ohtsuki, T.: Opinion mining in twitter how to make use of sarcasm to enhance sentiment analysis. In: Proceedings of the 2015 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining 2015, pp. 1594–1597 (2015)
11. Chen, C., Zhang, J., Xie, Y., Xiang, Y., Zhou, W., Hassan, M.M., AlElaiwi, A., Alrubaian, M.: A performance evaluation of machine learning-based streaming spam tweets detection. IEEE Transactions on Computational Social Systems 2(3), 65–76 (2015)
12. Lunando, E., Purwarianti, A.: Indonesian social media sentiment analysis with sarcasm detection. In: 2013 International Conference on Advanced Computer Science and Information Systems (ICACSIS), pp. 195–198. IEEE (2013)
13. Khokhlova, M., Patti, V., Rosso, P.: Distinguishing between irony and sarcasm in social media texts: linguistic observations. In: 2016 International FRUCT Conference on Intelligence, Social Media and Web (ISMW FRUCT), pp. 1–6. IEEE (2016)
14. Tayal, D.K., Yadav, S., Gupta, K., Rajput, B., Kumari, K.: Polarity detection of sarcastic political tweets. In: 2014 International Conference on Computing for Sustainable Global Development (INDIACom), pp. 625–628. IEEE (2014)
15. Reganti, A.N., Maheshwari, T., Kumar, U., Das, A., Bajpai, R.: Modeling satire in English text for automatic detection. In: 2016 IEEE 16th International Conference on Data Mining Workshops (ICDMW), pp. 970–977. IEEE (2016)
16. Bharti, S.K., Babu, K.S., Jena, S.K.: Parsing-based sarcasm sentiment recognition in twitter data. In: 2015 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM), pp. 1373–1380. IEEE (2015)
17. Khodak, M., Saunshi, N., Vodrahalli, K.: A large self-annotated corpus for sarcasm. arXiv preprint arXiv:1704.05579 (2017)
18. Vijayalaksmi, N., Senthilrajan, A.: A hybrid approach for sarcasm detection of social media data. IJSRP 7(5) (2017)
19. Rao, Y., Lei, J., Wenyin, L., Li, Q., Chen, M.: Building emotional dictionary for sentiment analysis of online news. World Wide Web 17(4), 723–742 (2014)
20. Bae, Y., Lee, H.: Sentiment analysis of twitter audiences: measuring the positive or negative influence of popular twitterers. J. Am. Soc. Inform. Sci. Technol. 63(12), 2521–2535 (2012)
21. Gonçalves, P., Araújo, M.: Comparing and combining sentiment analysis methods. In: Proceedings of the First ACM Conference on Online Social Networks, p. 27. ACM (2013)
22. González-Ibánez, R., Muresan, S., Wacholder, N.: Identifying sarcasm in Twitter: a closer look. In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pp. 581–586 (2011)
23. Strapparava, C., Valitutti, A.: WordNet-Affect: an affective extension of WordNet. In: Proceedings of LREC, vol. 4, pp. 1083–1086 (2004)
24. Pennebaker, J.W., Francis, M.E., Booth, R.J.: Linguistic Inquiry and Word Count: LIWC, vol. 71, no. 1, pp. 1–11. Lawrence Erlbaum Associates, Mahwah (2001)
25. Barbieri, F., Saggion, H., Ronzano, F.: Modelling sarcasm in twitter, a novel approach. In: Proceedings of the 5th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis, pp. 50–58 (2014)
26. Tsur, O., Davidov, D., Rappoport, A.: ICWSM—a great catchy name: semi-supervised recognition of sarcastic sentences in online product reviews. In: Proceedings of the International AAAI Conference on Web and Social Media, vol. 4, no. 1 (2010)
27. Uma Maheswari, S., Dhenakaran, S.S.: Sentiment analysis on social media big data with multiple tweet words. IJITEE 8(10), 3429–3424 (2019)
An Efficient Gabor Scale Average (GSA) based PCA to LDA Feature Extraction of Face and Gait Cues for Multimodal Classifier N. Santhi, K. Annbuselvi, and S. Sivakumar
Abstract The aim of a multimodal biometric system is to enhance recognition performance by exploiting the complementary features of face and gait. Feature extraction is a vital process in recognition that can change the performance considerably. In this paper, we present a new approach, Gabor Scale Average (GSA) based Principal Component Analysis (PCA) to Linear Discriminant Analysis (LDA), to extract and reduce the features of face and gait cues. In our approach, a bank of Gabor filters is first adopted to extract features of the face and gait images across different scales and orientations. This generates high-dimensional, highly redundant Gabor features, which are reduced by averaging the different orientations of the Gabor features at the same scale into one average feature map, called the Gabor Scale Average (GSA), separately for face and gait. To offer good discrimination ability and to avoid the curse of dimensionality, we transform the GSA feature vectors of face and gait into the basis space of PCA to LDA. Next, these feature vectors are normalized and fused at the feature level for better classification. The proposed approach is tested on publicly available databases and shows promising results compared to other related approaches in terms of recognition rate.

Keywords Feature extraction · GSA · PCA · LDA · Multimodal
N. Santhi · K. Annbuselvi Department of Computer Science, V.V.Vanniaperumal College for Women, Virudhunagar, Tamilnadu, India e-mail: [email protected] K. Annbuselvi e-mail: [email protected] S. Sivakumar (B) Cardamom Planters’ Association College, Bodinayakanur, Tamilnadu, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 S. Roy et al. (eds.), Innovations in Computational Intelligence and Computer Vision, Advances in Intelligent Systems and Computing 1424, https://doi.org/10.1007/978-981-19-0475-2_14
1 Introduction

A biometric recognition system identifies an individual by determining the genuineness of a particular physiological and/or behavioral characteristic of that person. Physiological biometrics include the face, iris, retina, fingerprints, hand geometry, ear patterns and eye patterns. Behavioral biometrics include signature, voice, keystroke, gait, gesture and others. Most of the biometric systems in use today rely on only one biometric trait to establish identity. Some of the problems of unimodal biometric systems are (i) the lack of universality of certain biometric traits, (ii) the lack of data availability for some biometrics, and (iii) vulnerability to criminal activity. Challenge-response authentication can thus be facilitated with the help of a multimodal biometric system. A multimodal biometric system aims to overcome the limitations of the unimodal system by integrating the information obtained from multiple biometric features. To achieve optimum performance, the system must utilize as many cues as possible and merge them meaningfully.

The merging of biometrics can take place at three levels: (1) the feature level, where the features obtained from individual biometrics are integrated into a new feature set, which signifies the subject's individuality in a higher-dimensional and more discriminating hyperspace; (2) the score level, where individual biometric match scores are normalized and then combined using rules like sum, product, max, etc. to get a single likeness score; and (3) the decision level, where different classifiers make their determination as to the identity of the individual and then rules like majority voting, ranking, etc. are used to make the ultimate decision [1].

Within the framework of automated human recognition, it is common to combine face and gait, which can be obtained from a single camera. When the individual is far away from the camera, it is hard to get facial data with adequate resolution for recognition; however, when available, it provides a very strong recognition cue. A trait that can be identified and measured even when the individual is far from the camera is human gait, or the style of walking. Face and gait are complementary features for recognition: gait is less sensitive to the factors that influence face recognition, such as lighting and low resolution, whereas the face is robust to covariates that affect gait recognition, such as walking surface, clothing and carrying bags [2].

A key research issue in human recognition is increasing the recognition rate, which depends on preprocessing, feature extraction and classification. Among these, feature extraction changes the accuracy considerably. The Gabor wavelet has emerged as one of the powerful approaches for the extraction of image features, and there are numerous related works using Gabor wavelets. Yazdanpanah et al. [3] proposed a multimodal biometric system using face, ear and gait, using Gabor filters followed by PCA to extract and reduce the feature sets, integrating them at the matching score level. Later, Wang et al. [4] proposed a method using Gabor wavelets with different orientations and scales to extract gait features; then, a Two-Dimensional Principal Component Analysis (2DPCA) technique is applied
to decrease the dimension of the vectors for better classification. In a similar fashion, MageshKumar et al. [5] proposed a method where Gabor wavelets preprocess the face image to remove variability in pose and illumination; features are then extracted using PCA and LDA and finally classified using a distance measure. Further, Deng et al. [6] proposed a new local Gabor filter family with a subset of frequencies and orientations to extract face features, followed by dimensionality reduction using PCA and LDA for better classification. Moreover, Mattela and Gupta [7] offered a Gabor-mean-DWT method for extracting face features to automatically recognize facial expressions.

Potent feature extraction and discrimination methods have enhanced the accuracy of biometric recognition systems, but it is not yet sufficient for practical person-independent applications. Even though many features have been used to describe biometrics, the need for new algorithms with robust, distinct, comprehensive and accessible functionality remains urgent. In this paper, we offer a new method that extracts features from face and gait images and fuses them at the feature level for better recognition. First, a bank of Gabor wavelet filters is applied to extract the features of the face and gait images over 3 scales {1, 2, 3} and 6 orientations {0, 30, 60, 90, 120, 150}. This generates high-dimensional and highly redundant Gabor features, which are reduced by averaging the 6 different orientations of the Gabor features at the same scale into one average Gabor feature, called the Gabor Scale Average (GSA). Subsequently, the GSA vectors of face and gait are transformed into the space of PCA to LDA, normalized and integrated at the feature level. Finally, similarity-based distance measures are applied for classification.

The present paper is organized as follows: Sect. 2 gives an overview of Gabor wavelets, GSA feature vectors, the dimensionality reduction methods PCA and LDA, and a detailed description of the proposed GSA-based PCA to LDA feature extraction of face and gait cues for a multimodal classifier. Section 3 presents results and discussion, and finally, Sect. 4 provides the conclusion.
2 Materials and Method

The major parts of the proposed method, the Gabor wavelet filter, the Gabor Scale Average, PCA and LDA, are combined to form a new approach that efficiently extracts features of face and gait cues. These methods are described briefly below.
2.1 Gabor Wavelets

A Gabor wavelet filter is a vital tool for extracting local features in the spatial and frequency domains. It can be applied to images to extract features aligned to specific orientations. Gabor wavelet filters capture visual features such as spatial
localization, orientation selectivity, frequency selectivity, and quadrature phase relationship. The Gabor wavelet kernel in the spatial domain is a complex exponential modulated by a Gaussian function. The Gabor wavelets (kernels, filters) can be defined as follows [4–6]:

$$\varphi_{\mu,v}(z) = \frac{\|k_{v,\mu}\|^2}{\sigma^2}\,\exp\!\left(-\frac{\|k_{v,\mu}\|^2\,\|z\|^2}{2\sigma^2}\right)\left[\exp\!\left(i\,k_{v,\mu}\, z\right) - \exp\!\left(-\frac{\sigma^2}{2}\right)\right] \quad (1)$$
where v and μ denote the scale and orientation of the Gabor kernel, z denotes the pixel, i.e., (x, y), and ‖·‖ denotes the norm operator. The wave vector $k_{v,\mu}$ is defined as

$$k_{v,\mu} = k_v\, e^{i\phi_\mu} \quad (2)$$
where $k_v = k_{\max}/f^{v}$ and $\phi_\mu = \pi\mu/8$, with f the spacing factor between filters in the frequency domain. The Gabor transform of an image I (face or gait) is obtained by the convolution of the image I(z) with a bank of Gabor kernels $\varphi_{\mu,v}(z)$:

$$O_{\mu,v} = I(z) * \varphi_{\mu,v}(z) \quad (3)$$
In our work, Gabor filters at three scales (v = {0, 1, 2}) and six orientations (μ = {0, 1, 2, 3, 4, 5}) ranging between 0 and 5π/6 in Eq. (1) are applied on each preprocessed face and gait image with σ = 2π, $k_{\max} = \pi/2$ and $f = \sqrt{2}$.
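A numpy sketch of Eqs. (1)–(3) follows, assuming a square kernel window (the 31 × 31 size is an arbitrary choice) and the parameter values stated above; φ_μ is taken here as πμ/6 so that the six orientations span 0 to 5π/6:

```python
import numpy as np
from scipy.signal import fftconvolve

def gabor_kernel(v, mu, size=31, sigma=2 * np.pi, kmax=np.pi / 2, f=np.sqrt(2)):
    """Gabor kernel of Eq. (1) at scale v and orientation mu."""
    k = (kmax / f**v) * np.exp(1j * np.pi * mu / 6)   # wave vector, Eq. (2)
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    k2, z2 = np.abs(k)**2, x**2 + y**2
    return (k2 / sigma**2) * np.exp(-k2 * z2 / (2 * sigma**2)) * \
           (np.exp(1j * (k.real * x + k.imag * y)) - np.exp(-sigma**2 / 2))

def gabor_bank(image):
    """Magnitude responses O_{mu,v} of Eq. (3) for 3 scales x 6 orientations."""
    return [np.abs(fftconvolve(image, gabor_kernel(v, mu), mode='same'))
            for v in range(3) for mu in range(6)]
```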
2.2 Gabor Scale Average (GSA) Feature Vectors

A Gabor filter with three scales and six orientations applied over the face and gait images generates a Gabor bank of 3 × 6 = 18 Gabor coefficient matrices for face and gait, respectively. This representation is high-dimensional and highly redundant. The high-dimensional, redundant features of face and gait are reduced by the proposed GSA, which averages the 6 different orientations of the Gabor matrices at the same scale into one average matrix using Eqs. 4 and 5. The 18 Gabor matrices are thus reduced to 3 scale-average Gabor matrices; each such matrix is called a GSA.
$$\mathrm{GSA}^{f}_{v} = \operatorname{mean}\!\left(G^{f}_{v,0},\, G^{f}_{v,1},\, G^{f}_{v,2},\, G^{f}_{v,3},\, G^{f}_{v,4},\, G^{f}_{v,5}\right) \quad (4)$$

$$\mathrm{GSA}^{g}_{v} = \operatorname{mean}\!\left(G^{g}_{v,0},\, G^{g}_{v,1},\, G^{g}_{v,2},\, G^{g}_{v,3},\, G^{g}_{v,4},\, G^{g}_{v,5}\right) \quad (5)$$
where v = {0, 1, 2} are the three scales, μ = {0, 1, 2, 3, 4, 5} are the six orientations ranging between 0 and 5π/6, and $\mathrm{GSA}^{f}_{v}$ and $\mathrm{GSA}^{g}_{v}$ are the GSA of face and gait, respectively.
Fig. 1 GSA Feature extraction of face image
In our work, the width and height of the face and gait images are 160 × 160. Therefore, for each face and gait image the Gabor filter generates 160 × 160 × 3 × 6 features, but averaging the six different orientations of the Gabor matrices at the same scale using GSA leaves only 160 × 160 × 3 features. The resulting vectors are called the GSA face and gait feature vectors. The flow diagram of GSA feature extraction for a face image is shown in Fig. 1.
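The GSA reduction of Eqs. (4)–(5) is then a single mean over the orientation axis; a sketch assuming the scale-major ordering of the bank above:

```python
import numpy as np

def gsa(responses, n_scales=3, n_orient=6):
    # responses: 18 Gabor magnitude images of shape (160, 160), scale-major order
    stack = np.stack(responses).reshape(n_scales, n_orient, 160, 160)
    return stack.mean(axis=1)   # Eqs. (4)-(5): one average image per scale

# 18 x 160 x 160 = 460,800 Gabor features reduce to 3 x 160 x 160 = 76,800
```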
2.3 Principal Component Analysis (PCA)

The aim of PCA is to find the directions of maximum variance in a high-dimensional dataset and project it onto a lower-dimensional subspace while retaining most of the information. In other words, it is a process that uses an orthogonal transformation to convert a set of M possibly correlated variables into a set of K uncorrelated variables called eigenvectors (K < M) [8, 9].
2.4 Linear Discriminant Analysis (LDA)

The aim of LDA is to project the data onto a lower-dimensional space [10]. It is designed to transform the features into a lower-dimensional space that maximizes the ratio of the between-class variance to the within-class variance, thereby promising maximum class separability.
To attain this goal, three steps have to be carried out. The first step computes the class separability, i.e., the distance between the means of the different classes, called the between-class variance. The second step computes the distance between the mean of each class and its samples, called the within-class variance. The third step constructs the lower-dimensional space that maximizes the between-class variance and minimizes the within-class variance.
2.5 GSA Based PCA to LDA Feature Extraction of Face and Gait Cues for Multimodal Classifier

The framework of GSA-based PCA to LDA feature extraction of face and gait cues for a multimodal classifier is shown in Fig. 2. First, we load the face images from the ORL database and the gait images from the CASIA B gait database and construct a bank of 18 Gabor filters (3 scales × 6 orientations) to extract the Gabor features of the face and gait images separately. The Gabor wavelets eliminate variations caused by illumination and pose to a certain degree. The generated high-dimensional and redundant features of face and gait are reduced by averaging the 6 different orientations of the Gabor matrices at the same scale into one average matrix using GSA. Subsequently, the dimensions of the face and gait GSA feature vectors are reduced by a two-stage PCA to LDA dimensionality reduction technique: using PCA, we first construct lower-dimensional GSA_Eigen vectors of the face and gait images; we then employ LDA over the GSA_Eigen vectors of face and gait separately to produce GSA_EigentoFisher vectors [11].
Fig. 2 Framework of GSA based PCA to LDA feature extraction of face and gait cues for multimodal classifier
Next, we normalize and concatenate the corresponding GSA_EigentoFisher face and gait reduced feature vectors into a single feature vector. For recognition, the GSA feature vectors of the probe face and gait cues are extracted and projected onto the PCA to LDA reduced-dimensionality space [12]. Finally, the Euclidean distance is used to estimate similarity for classification. The process of the proposed method is given in Algorithm 1.

Algorithm 1: Procedure of GSA based PCA to LDA Feature Extraction of Face and Gait Cues for Multimodal Classifier

Training Phase

A. Construction of GSA features
(1) Set the Gabor filter with scales v = {0, 1, 2} and orientations μ = {0, 1, 2, 3, 4, 5} ranging between 0 and 5π/6.
(2) Apply the Gabor filter to obtain 18 Gabor magnitude pictures of face and gait, respectively, G^f(1…18) and G^g(1…18):
$$G^{f} = \left(G^{f}_{0,0},\, G^{f}_{0,1},\, G^{f}_{0,2},\, G^{f}_{0,3},\, G^{f}_{0,4},\, G^{f}_{0,5},\, \ldots,\, G^{f}_{2,4},\, G^{f}_{2,5}\right)$$

$$G^{g} = \left(G^{g}_{0,0},\, G^{g}_{0,1},\, G^{g}_{0,2},\, G^{g}_{0,3},\, G^{g}_{0,4},\, G^{g}_{0,5},\, \ldots,\, G^{g}_{2,4},\, G^{g}_{2,5}\right)$$

(3) Compute the GSA by averaging the 6 different orientations of the Gabor matrices G^f and G^g at the same scale into one average matrix using Eqs. (4) and (5).
(4) Construct one-dimensional GSA feature vectors for face (GSA_fv) and gait (GSA_gv).

B. Construction of lower-dimensional projected GSA_Eigen vectors using PCA [8]
(1) Calculate the mean image vector Ψ and subtract it from each one-dimensional column vector g_i, i.e., Ø_i = g_i − Ψ.
(2) Calculate the GSA_Eigen vectors and GSA_Eigen values from a covariance matrix with reduced dimensionality using C = A^T A, where A = [Ø_1, Ø_2, Ø_3, …, Ø_M].
(3) Select the K best GSA_Eigen vectors and convert them into the original dimensionality. The GSA_Eigen vectors of face are (GSAEig_fv) and of gait (GSAEig_gv).

C. Construction of lower-dimensional projected GSA_EigentoFisher vectors using LDA [10]
(1) Partition the K GSA_Eigen projected vectors into C classes.
(2) Find the mean vector of each class (μ_1, μ_2, μ_3, …, μ_C) to construct the within-class scatter matrix S_w:

$$\mu_j = \frac{1}{n_j}\sum_{i=1}^{n_j} x_i, \qquad S_w = \sum_{j=1}^{C}\sum_{i=1}^{n_j} \left(x_{ij} - \mu_j\right)\left(x_{ij} - \mu_j\right)^{T}$$
(3) Find the mean vector μ of all classes to construct the between-class variance matrix S_B:
μ=
C C 1 μi S B = n i (μi − μ)(μi − μ)T C i=1 i=1
(4) Find the transformation matrix $W = S_w^{-1} S_B$, select the best K GSA_Eigen vectors, and project along these GSA_EigentoFisher vectors for classification. The GSA_EigentoFisher vectors of face are (GSAEigFis_fv) and of gait (GSAEigFis_gv).

D. Feature-level fusion at the training phase
(1) Normalize and concatenate the GSA_EigentoFisher vectors of face (GSAEigFis_fv) and gait (GSAEigFis_gv):
GSAEigFis_fgv = [GSAEigFis_fv, GSAEigFis_gv]

Testing Phase

E. Construct the GSA feature vectors of the probe face and gait images, denoted gsa_fv and gsa_gv.
F. Construct GSA_Eigen vectors using PCA [8] for the face (gsa_fv) and gait (gsa_gv), denoted gsaeig_fv and gsaeig_gv.
G. Construct GSA_EigentoFisher vectors using LDA [10] for the GSA_Eigen vectors of face and gait.
H. Feature-level fusion at the testing phase: concatenate the GSA_EigentoFisher vectors of the probe face (gsaeigfis_fv) and gait (gsaeigfis_gv):
gsaeigfis_fgv = [gsaeigfis_fv, gsaeigfis_gv]

I. Find the match score values between the GSA_EigentoFisher vectors of the training images and the probe images using distance measures, denoted S_1, S_2, S_3, S_4, …, S_M.
J. Find the minimum score value of (S_1, S_2, S_3, S_4, …, S_M). If the minimum value is less than the threshold, the given probe is Genuine; otherwise it is an Imposter:
Genuine if Minimum ≤ Threshold, Imposter if Minimum > Threshold
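Algorithm 1 can be prototyped with scikit-learn's PCA and LDA; this is a simplified sketch rather than the exact implementation: the number of retained components is illustrative, and X_face, X_gait and y are assumed arrays of flattened GSA vectors (one row per image) and subject labels:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.preprocessing import normalize

def fit_projector(X, y, k=40):                      # k retained components: illustrative
    pca = PCA(n_components=k).fit(X)                # GSA_Eigen vectors
    lda = LinearDiscriminantAnalysis().fit(pca.transform(X), y)  # GSA_EigentoFisher
    return lambda Z: normalize(lda.transform(pca.transform(Z)))

proj_face, proj_gait = fit_projector(X_face, y), fit_projector(X_gait, y)
gallery = np.hstack([proj_face(X_face), proj_gait(X_gait)])     # feature-level fusion

def identify(face_probe, gait_probe, threshold):
    # Probe arrays are 2D with a single row, shape (1, n_features)
    q = np.hstack([proj_face(face_probe), proj_gait(gait_probe)])
    scores = np.linalg.norm(gallery - q, axis=1)    # Euclidean match scores S_1..S_M
    i = scores.argmin()
    return ('Genuine', y[i]) if scores[i] <= threshold else ('Imposter', None)
```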
3 Results and Discussion

The performance of the proposed method was tested with two well-known face and gait image databases, the ORL face database and the CASIA B gait database. From the ORL face database we have taken 50 images of 10 subjects, with five images per person. Similarly, from the CASIA B gait database, 50 average silhouette images over one gait cycle, extracted from 50 video sequences corresponding to 10 subjects with five average silhouette images per person, have been used. Sample images from the face and gait databases are shown in Fig. 3.
Fig. 3 Sample face images from ORL and gait images from CASIA
Each of these images is transformed into Gabor magnitude images with 3 scales and 6 orientations ranging between 0 and 5π/6. In our experiment, the width and height of the face and gait images are 160 × 160. Therefore, the Gabor filter generates 160 × 160 × 3 × 6 = 460,800 features for face and gait separately, which is highly redundant and high-dimensional. By averaging the 6 different orientations of the Gabor matrices at the same scale into one average matrix using the GSA Eqs. 4 and 5, each of face and gait generates only 160 × 160 × 3 = 76,800 features. Compared with the features generated by the plain Gabor filter, the proposed GSA method has the advantage of reducing the feature set size, thereby decreasing computation time and storage requirements. The performance of the proposed method with respect to recognition rate is compared between unimodal and multimodal systems with face and gait biometrics [4–6]. The recognition rate using the face biometric is given in Table 1 and graphically presented in Fig. 4. Similarly, results using the gait biometric are shown in Table 2 and pictorially in Fig. 5. Likewise, Table 3 shows the recognition rate of the multimodal system using face and gait biometrics, visually presented in Fig. 6. The methods are demonstrated with up to 5 samples per person. Figure 7 shows the comparison of recognition rates between the unimodal and multimodal systems. The results illustrate that as the number of samples increases, the recognition rate also increases. They also show that the proposed method produces enhanced results compared to other methods even with fewer samples.
Table 1 Comparison of recognition rate in % using face biometric

Method                       No. of samples
                             1    2    3    4    5
Gabor + PCA                  62   68   73   79   84
Gabor + LDA                  72   77   83   86   90
Gabor + PCA to LDA           83   85   88   91   94
Proposed GSA + PCA to LDA    87   90   92   94   97

The results of the proposed method are highlighted
Fig. 4 Comparison of recognition rate using face biometric

Table 2 Comparison of recognition rate in % using gait biometric

Method                       No. of samples
                             1    2    3    4    5
Gabor + PCA                  60   65   69   73   80
Gabor + LDA                  66   72   78   81   87
Gabor + PCA to LDA           69   74   82   84   90
Proposed GSA + PCA to LDA    79   82   87   90   92

The results of the proposed method are highlighted
Fig. 5 Comparison of Recognition rate using gait biometric
Table 3 Comparison of recognition rate in % using face + gait biometrics

Method                       No. of samples
                             1    2    3    4    5
Gabor + PCA                  74   76   79   81   84
Gabor + LDA                  78   83   86   87   91
Gabor + PCA to LDA           84   87   90   91   96
Proposed GSA + PCA to LDA    89   91   95   97   99

The results of the proposed method are highlighted
Fig. 6 Comparison of Recognition rate using face + gait biometrics
Fig. 7 Comparison of Recognition rate between unimodal and multimodal
4 Conclusion

This paper offers a new method, GSA-based PCA to LDA, to extract and reduce the features of face and gait cues. A bank of Gabor filters is first adopted to extract the features of the face and gait images over 3 scales {1, 2, 3} and 6 orientations {0, 30, 60, 90, 120, 150}. This generates a high-dimensional, highly redundant bank of 18 Gabor feature matrices, which is reduced by averaging the 6 different orientations of the Gabor features at the same scale into one average Gabor feature, called the GSA. To further reduce dimensionality, the GSA feature vectors are transformed into the space of PCA to LDA. Next, the extracted features are normalized and integrated at the feature level, and finally classified using the Euclidean distance measure. The results of the experiments show that the proposed method has higher accuracy (99%) with respect to recognition rate, and that GSA-based PCA to LDA turns out to be a powerful method compared to other combinations of PCA and LDA.
References

1. Annbuselvi, K., Santhi, N.: Intelligences of fusing face and gait in multimodal biometric system: a contemporary study. IJCST X(X), ISSN: 2347–8578
2. Geng, X., Wang, L., Li, M., Wu, Q., Smith-Miles, K.: Adaptive fusion of gait and face for human identification in video. In: IEEE Workshop on Applications of Computer Vision (2008)
3. Yazdanpanah, A.P., Faez, K., Amirfattahi, R.: Multimodal biometric system using face, ear and gait biometrics. In: 10th International Conference on Information Science, Signal Processing and their Applications, ISSPA (2010)
4. Wang, X., Wang, J., Yan, K.: Gait recognition based on Gabor wavelets and (2D)2PCA. Multimed. Tools Appl. https://doi.org/10.1007/s11042-017-4903-7
5. MageshKumar, C., Thiyagarajan, R., Natarajan, S.P., Arulselvi, S.: Gabor features and LDA based face recognition with ANN classifier. In: Proceedings of ICETECT. IEEE (2011)
6. Deng, H.-B., Jin, L.-W., Zhen, L.-X., Huang, J.-C.: A new facial expression recognition method based on local Gabor filter bank and PCA plus LDA. International Journal of Information Technology 11(11) (2005)
7. Mattela, G., Gupta, S.K.: Facial expression recognition using Gabor-mean-DWT feature extraction technique. In: 5th International Conference on Signal Processing and Integrated Networks (SPIN). IEEE (2018)
8. Annbuselvi, K., Santhi, N.: Role of feature extraction techniques: PCA and LDA for appearance based gait recognition. International Journal of Computer Sciences and Engineering 6, Special Issue-4, E-ISSN: 2347–2693 (2018)
9. Murali, M.: Principal component analysis based feature vector extraction. Indian Journal of Science and Technology 8(35) (2015)
10. Santhi, N., Annbuselvi, K.: Performance analysis of feature extraction. International Journal of Engineering Research in Computer Science and Engineering 5(3), ISSN (Online) 2394–2320 (2018)
11. Kumar, G., Bhatia, P.K.: A detailed review of feature extraction in image processing systems. In: 2014 Fourth International Conference on Advanced Computing and Communication Technologies
12. Tariq, U., Huang, T.S.: Features and fusion for expression recognition—a comparative analysis. In: 2012 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops, pp. 146–152. IEEE (2012)
Stroke-Based Handwritten Gujarati Font Synthesis in Personal Handwriting Style via Shape Simulation Approach Preeti P. Bhatt , Jitendra V. Nasriwala, and Rakesh R. Savant
Abstract In this era, people want a personal font in their own handwriting style for communication, so that a document or message can be displayed as if written by their own hand. This can be achieved through handwriting synthesis, which has two approaches: movement simulation, where neuromuscular hand movement is the key feature used to synthesize the writing, and shape simulation, in which the characters' glyphs act as input elements and the writing is synthesized from the glyphs. Our study aims to generate a handwritten Gujarati font in a personal handwriting style using stroke-based synthesis and style learning. We propose a complete font generation model with two main phases: a learning phase and a generation phase. In the learning phase, we create a ruleset for generating Gujarati characters that includes information such as size, position of strokes, width, endpoints, junction points, etc. In the generation phase, the user writes a small subset of Gujarati characters; the glyphs of the other characters are generated from the ruleset, and a font file is prepared. Our algorithm extracts strokes based on the structural features of a character and classifies them with the help of the extracted features. A character glyph is then generated from the extracted strokes by concatenating them properly according to the extracted ruleset. Finally, an OpenType font is generated from the glyphs, ready to use in any text editor that supports the Gujarati script.

Keywords Gujarati font · Personal font · Handwritten font · Glyph synthesis · Handwriting synthesis
P. P. Bhatt (B) · J. V. Nasriwala · R. R. Savant
Faculty of Computer Science, Babu Madhav Institute of Information Technology, Uka Tarsadia University, Bardoli, India
e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 S. Roy et al. (eds.), Innovations in Computational Intelligence and Computer Vision, Advances in Intelligent Systems and Computing 1424, https://doi.org/10.1007/978-981-19-0475-2_15

1 Introduction

Handwriting synthesis is artificial text generated by a computer that can simulate a user's writing. Handwritten text can give a personal touch to text, books, and notes written in one's own handwriting. It is not only effective for conveying a personal touch, but it
has several useful applications, such as editable handwritten files with real-time inline correction, document digitization, historical document repair, and font graphics [1]. Moreover, it can also be helpful to forensic examiners, the disabled, and researchers working on handwriting recognition systems [2]. Handwritten text can be synthesized by converting input text data into handwritten text, learning features of and preserving the style of the human writer. There are two approaches to model handwriting: movement simulation and shape simulation [3]. In movement simulation, the neuromuscular movement of the hand is used to model the writing. In the shape simulation approach, handwriting is synthesized by learning the features of the user's writing style and the glyphs and shapes of the characters.

Handwritten Gujarati characters have a more complex structure than English alphabets. Moreover, they have complex curvature and a more stylish look, and require critical analysis of the shape and characteristics of the script [4]. Gujarati characters have several features such as junction points, endpoints, vertical lines, horizontal lines, and loops. Gujarati characters are made up of multiple components known as 'strokes', which play a vital role in current research on the Gujarati script. A character is segmented into multiple strokes, which helps reduce the difficulty of character recognition. Besides, stroke extraction and concatenation can help to synthesize handwritten Gujarati characters by reusing strokes from small sets [5].

Our study aims to synthesize a handwritten Gujarati font in a personal handwriting style using a stroke-based shape simulation approach. The rest of the document is organized as follows: the next section describes related work; then the proposed methodology is presented, followed by the learning and generation phases with detailed descriptions. Finally, the conclusion and future work are given in the last section.
2 Related Work

Much work is found in the literature on handwritten character generation and font generation for non-Indian languages. A prominent way of generating fonts using stroke-based synthesis has been discussed for Chinese [6–8], whereas handwriting synthesis using style learning and machine learning approaches has been reported for Latin scripts (English and Spanish) [2, 9, 10]. In [11], a generative model is presented that matches user writing with publicly available font families and renders the user's writing; however, this approach fails to mimic the exact writing. The works in [6–8, 12] expand the use of the FontForge tool, which helps to create and modify fonts in many standard formats without human involvement, but these works support only Apple font files. The authors of [6–8] present an elegant way of generating handwritten Chinese characters using a stroke concatenation approach; this work specifically supports the Chinese
language. Long-term structures have been modeled in [13] using long short-term memory recurrent networks; that approach is demonstrated for English text and online handwriting. The survey concludes that fewer works have been reported for scripts such as Arabic [14] and Indian languages [15], and negligible work is reported for the Gujarati script [1]. Our study therefore aims to synthesize a handwritten Gujarati font by a concatenation approach, reusing the strokes of a few characters to generate the other characters. Another objective of this work is to generate a Gujarati handwritten stroke bank and character bank.
3 Proposed Work

We intend to learn the handwriting style from a small subset of characters written by a user and then automatically generate a handwritten font in the user's style. We propose a handwritten Gujarati font generation model with two main phases: a learning phase and a generation phase. In the learning phase, a standard Gujarati character dataset is taken, and a ruleset is prepared for each character to simulate it, using information such as size, position of strokes, width, endpoints, junction points, etc. In the generation phase, the user writes a small number of characters, and the glyphs of the other characters are simulated from the style and shape of the input characters' strokes and the ruleset defined in phase 1. Finally, an OpenType font is developed based on the Gujarati script and Unicode.
3.1 Learning Phase

In the learning phase, we label all necessary information, such as size, position, the distance between disconnected components, height, width, joining points, and the number of endpoints, from the input characters and the targeted character set. The strokes considered for synthesizing the targeted characters are shown in Fig. 1, with A and B labeling their endpoints.
Fig. 1 Strokes with class labels which are considered for the generation of targeted characters
Fig. 2 Eleven Gujarati characters synthesized from the strokes of two Gujarati characters. Yellow (level 0) are the handwritten input characters, green (level 1) are generated from strokes of yellow, and white (level 2) are synthesized from both
3.1.1 Analysis of Characters and Its Strokes
In this phase, a detailed analysis of each character and its strokes has been carried out based on the structure reported in [4]. We also find a small subset of Gujarati characters having all components required to synthesize the other eleven target characters. The concept is shown in Fig. 2: by writing two Gujarati characters, we can synthesize 11 other Gujarati characters. In this study, when all components are available, Gujarati characters can be easily synthesized.
3.1.2 Ruleset Preparation
The features of the individual character classes have been extracted from the standard dataset of handwritten Gujarati characters [16], and the ruleset that helps to generate the glyph of each respective character has been prepared. For generating the ruleset, we fixed some common features for all target characters and some variable features as per the requirements of the characters. As common features, we considered the position of strokes, the occurrence of strokes, the size of the strokes, and the joining point. As variable features, we considered the distance between two strokes and the appearance of strokes. A detailed description of the features is given in Table 1. In some characters like 'Ka', 'Ta' and 'Da', the joining points are simply the endpoints of strokes, i.e., A or B, as shown in Fig. 1. However, in cases like 'Ya', 'Va', 'Dha', 'Gha', etc., the specific point at which two strokes join (JP) is decided by extracting the junction point from each sample, normalizing the value as a percentage, and finally choosing the joining point as the mode of the normalized points, as sketched below. In the same way, the rest of the ruleset is derived by statistical analysis of the features of each targeted character. The complete procedure to generate characters is shown in Table 2.
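For instance, the joining-point rule can be derived as follows, where junction positions are normalized to a percentage of the stroke height and the mode is taken over the training samples (the array names and sample values are hypothetical):

```python
import numpy as np
from scipy import stats

def joining_point(junction_rows, stroke_heights):
    # Normalize each sample's junction position to a percentage of stroke height,
    # then take the mode as the character's joining point (JP).
    pct = np.round(100 * np.asarray(junction_rows) / np.asarray(stroke_heights))
    return stats.mode(pct, keepdims=False).mode

print(joining_point([54, 55, 57, 55], [100, 100, 100, 100]))   # -> 55.0
```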
Table 1 Features which are considered to prepare the ruleset, with descriptions

Features                       Feature type   Short form   Description
Position                       Common         POS          Position of stroke in specific characters
Occurrence                     Common         OCC          Number of times the stroke occurs in the character
Size                           Common         S            Size of the stroke: whether it is the same or needs reforming
Joining point                  Common         JP           The point where two strokes combine
Distance between two strokes   Variable       DS           Distance to keep between two strokes
Appearance                     Variable       AP           Appearance of the stroke: whether it is used the same or needs to be flipped or rotated
3.2 Generation Phase

The main objective of this phase is to generate the glyphs of the targeted handwritten Gujarati characters from the small set of samples written by the user and the ruleset derived in the learning phase. To generate the handwritten characters, the system needs to learn the handwriting style of the user by analyzing the few samples written by the user. The strokes of each character are then extracted by the methodology we proposed in [5]: strokes are extracted from the thinned binary image and separated based on endpoints and junction points. A total of 9 features (endpoints, 4 line elements, and 4 curve elements) across all directions are identified from each stroke with the help of 16 different templates, and a unique feature code is assigned to each feature based on its template. We use K-nearest neighbors (KNN) as the classifier to evaluate stroke classification of handwritten consonants, with Euclidean distance as the distance measure since our feature vector is numerical. After tuning with K = 3, 5, 7, 9, and 21, K = 5 was frozen based on the experimental results.
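A sketch of the stroke classifier described above, with the feature matrix and labels assumed to be already extracted:

```python
from sklearn.neighbors import KNeighborsClassifier

# X_strokes: one row of template-based structural feature codes per stroke
# y_strokes: stroke class labels
knn = KNeighborsClassifier(n_neighbors=5, metric='euclidean')  # K = 5, as frozen above
knn.fit(X_strokes, y_strokes)
stroke_classes = knn.predict(X_probe_strokes)
```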
3.2.1 Character Generation
Given a Gujarati character, when all of its strokes are available, we can place each stroke according to the joining point, size, position, and other information extracted in the learning phase, based on the concept shown in Fig. 3. In the current study, we have considered only those target characters that can be generated using the strokes available in the characters 'ka' and 'ga'. A total of thirteen characters have been generated from these two input characters; the generated characters are shown in Fig. 4. Algorithm 1 describes the procedure of glyph synthesis for a specific character.

ALGORITHM 1: GlyphSynthesis
Table 2 Ruleset defined based on the standard dataset to generate handwritten characters from five strokes

‘Ka’: Class 3 (POS—1, OCC—1, JP—B); Class 1 (POS—2, OCC—1, JP—A). Remark: S = same
‘Da’: Class 3 (POS—1, OCC—1, JP—B); Class 1 (POS—2, OCC—1, JP—A). Remark: S = same for all
‘Ta’: Class 3 (POS—1, OCC—1, JP—B); Class 1 (POS—2, OCC—1, JP—A). Remark: AP = each stroke flipped and then used; S = same for all
‘da’: Class 3 (POS—1,2, OCC—2, JP—B, A)
‘Ga’: Class 1g (POS—1, OCC—1, JP—NIL); Class 2 (POS—2, OCC—1, JP—NIL). Remark: DS = taken from user input; S = same for all
‘Ala’: Class 3 (POS—1, OCC—1, JP—B); Class 1 (POS—2, OCC—1, JP—A). Remark: AP = character generated and then rotated by 90°; S = same for all
‘Ha’: Class 3 (POS—1,3, OCC—2, JP—B, A); Class 1 (POS—2, OCC—1, JP—A). Remark: JP = 55% of the active pixels in the generated stroke ‘da’; S = half of the original
‘Za’: Class 1g (POS—1,2, OCC—2, JP—60%, A); second stroke (POS—3, OCC—1, JP—B). Remark: JP = 60% of the active pixels of class 1g; S = same for all
‘La’: Class 1g (POS—1, OCC—1, JP—Nil); Class 2 (POS—3, OCC—1, JP—50%). Remark: AP = class 1g strokes flipped and used; JP = 50% of the active pixels of class 2; S = same for all
‘Va’: Class 1g (POS—1, OCC—1, JP—B); Class 2 (POS—2, OCC—1, JP—46%); Class 4 (POS—3,4, OCC—2, JP—A1, B2). Remark: JP = 46% of the active pixels of class 2; DS = 15%; S = same for all; draw a line between the class 1 JP and the class 2 JP; class 4 JP decided based on user input
‘Ya’: Class 3 (POS—1, OCC—1, JP—B); Class 1 (POS—2, OCC—1, JP—A); Class 2 (POS—2, OCC—1, JP—46%). Remark: JP = 49% of the active pixels of class 2; DS = 15%; height of class 2 reduced by height/4; draw a line between the JP of the generated character ‘ta’ and the class 2 JP; S = same for all
‘Gha’: Class 3 (POS—1,2, OCC—2, JP—B, (A,B)); Class 2 (POS—3, OCC—1, JP—75%). Remark: JP = 75% of the active pixels of class 2; DS = 15%; draw a line between the JP of the generated character ‘da’ and the class 2 JP; S = same for all
‘Dha’: Class 3 (POS—1,2, OCC—2, JP—B, (A,B)); Class 2 (POS—3, OCC—1, JP—75%). Remark: JP = 54% of the active pixels of class 2; DS = 15%; height of class 2 = 71%; draw a line between the JP of the generated character ‘da’ and the class 2 JP; S = same for all
Fig. 3 Concept of character generation of Gujarati character from the strokes extracted from user writing
Input: Name of the character to be generated
Output: Image matrix of the generated character's glyph

N = find the strokes required to generate the given character
For each stroke in N:
    SD = get stroke details and the stroke image from the Stroke Bank
Fig. 4 Generated Gujarati characters from strokes extracted from the characters ‘ka’ and ‘ga’
    Reform the stroke to size 100 × 100
    Translate the reformed stroke based on the information given in SD
Merge all updated strokes
Post-process the generated character
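A hedged Python sketch of Algorithm 1; the ruleset and stroke-bank structures shown (dictionaries keyed by character and stroke class, with position entries) are hypothetical stand-ins for the actual data structures:

```python
import numpy as np
import cv2

def glyph_synthesis(char_name, ruleset, stroke_bank, canvas=(160, 160)):
    """Compose a character glyph from stored strokes, following Algorithm 1."""
    glyph = np.zeros(canvas, dtype=np.uint8)
    for sd in ruleset[char_name]:                    # stroke details (SD) per stroke
        img = cv2.resize(stroke_bank[sd['class']], (100, 100))  # reform to 100 x 100
        x, y = sd['position']                        # translate per POS/JP in ruleset
        h, w = img.shape
        glyph[y:y + h, x:x + w] = np.maximum(glyph[y:y + h, x:x + w], img)  # merge
    return cv2.medianBlur(glyph, 3)                  # simple post-processing
```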
4 Font Generation

Once a character is generated as an image, the image is converted into an outline representation known as a computer font. Here, we automatically convert the images into an OpenType font with the help of the FontForge tool [12]. FontForge is an open-source program that supports a scripting language allowing batch processing of many fonts at once. To generate the font dynamically, we define a font property file that describes the font's characteristics and finally, with the help of a script, convert the images into an OpenType font.
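A sketch of such a script using FontForge's Python bindings; the file names, font properties and the single Unicode mapping shown are illustrative:

```python
import fontforge   # run with the Python interpreter bundled with FontForge

font = fontforge.font()
font.fontname = "GujaratiPersonal"        # properties from the font property file
font.familyname = "Gujarati Personal"

glyph = font.createChar(0x0A95, "ka")     # U+0A95 GUJARATI LETTER KA
glyph.importOutlines("ka.svg")            # outline traced from the generated image
glyph.width = 1000                        # advance width (illustrative)

font.generate("GujaratiPersonal.otf")     # write the OpenType font file
```

In practice one such createChar/importOutlines step would be looped over every generated character image, with the Unicode code points taken from the Gujarati block.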
5 Conclusion

The proposed methodology extracts strokes from the user's handwriting. For each given character, the system collects all required strokes and generates the glyphs of other Gujarati
characters from these strokes and the ruleset defined in the learning phase. From a two-character input, we have generated a total of thirteen different characters, including the user input. Verification of the newly constructed characters against the corresponding real handwritten characters is in progress. Here, we have considered only four stroke classes and generated thirteen different handwritten characters. In the future, the work will be extended to generate and verify all handwritten Gujarati consonants, with the aim of generating a complete handwritten font for Gujarati.
References

1. Bhatt, P.P., Nasriwala, J.: Aspects of handwriting synthesis and its applications. Int. J. Adv. Innov. Res. 6(3), 36–43 (2019)
2. Elanwar, R.: The state of the art in handwriting synthesis. In: 2nd International Conference on New Paradigms in Electronics and Information Technology (PEIT’013), vol. 35, no. 3, pp. 23–24 (2013). https://doi.org/10.2312/eurovisstar.20141174
3. Elarian, Y., Abdel-Aal, R., Ahmad, I., Parvez, M.T., Zidouri, A.: Handwriting synthesis: classifications and techniques. Int. J. Doc. Anal. Recognit. 17(4), 455–469 (2014). https://doi.org/10.1007/s10032-014-0231-x
4. Thaker, H.R., Kumbharana, C.K.: Analysis of structural features and classification of Gujarati consonants for offline character recognition. Int. J. Sci. Res. Publ. 4(8), 1–5 (2014)
5. Bhatt, P.P., Nasriwala, J.V.: Stroke extraction of handwritten Gujarati consonant using structural features. In: International Conference on Emerging Technology Trends in Electronics Communication and Networking, pp. 244–253 (2020). https://doi.org/10.1007/978-981-15-7219-7_21
6. Lin, J.W., Hong, C.Y., Chang, R.I., Wang, Y.C., Lin, S.Y., Ho, J.M.: Complete font generation of Chinese characters in personal handwriting style. IEEE, Nanjing (2016). https://doi.org/10.1109/PCCC.2015.7410321
7. Chen, Z., Zhou, B.: Effective radical segmentation of offline handwritten Chinese characters towards constructing personal handwritten fonts, p. 107 (2012). https://doi.org/10.1145/2361354.2361379
8. Lian, Z., Zhao, B., Xiao, J.: Automatic generation of large-scale handwriting fonts via style learning (2016). https://doi.org/10.1145/3005358.3005371
9. Lin, Z., Wan, L.: Style-preserving English handwriting synthesis. Pattern Recognit. (2007). https://doi.org/10.1016/j.patcog.2006.11.024
10. Suveeranont, R., Igarashi, T.: Example-based automatic font generation. In: International Symposium on Smart Graphics, vol. 6133, pp. 127–138. Springer, Berlin (2010). https://doi.org/10.1007/978-3-642-13544-6_12
11. Balreira, D.G., Walter, M.: Handwriting synthesis from public fonts. IEEE, Niteroi (2017). https://doi.org/10.1109/SIBGRAPI.2017.39
12. Williams, G.: Font creation with FontForge. In: EuroTeX 2003 Proceedings (2003)
13. Graves, A.: Generating sequences with recurrent neural networks. arXiv preprint arXiv:1308.0850 (2013). http://arxiv.org/abs/1308.0850
14. Elarian, Y., Ahmad, I., Awaida, S., Al-Khatib, W.G., Zidouri, A.: An Arabic handwriting synthesis system. Pattern Recognit. 48(3), 849–861 (2015). https://doi.org/10.1016/j.patcog.2014.09.013
15. Jawahar, C.V., Balasubramanian, A.: Synthesis of online handwriting in Indian languages. In: Tenth International Workshop on Frontiers in Handwriting Recognition (2006)
16. Prasad, M.J.R.: Data set of handwritten Gujarati characters. Technol. Dev. Indian Lang. (2013)
Kids View—A Parents Companion Sujata Khedkar, Advait Naik, Omkar Mane, Aditya Gurnani, and Krish Amesur
Abstract Video surveillance cameras have been around for ages. Taking a step forward, we build a system that does the work of a surveillance camera and helps us understand a child's behavioral and emotional state. The research proposes a solution aimed at helping the working parents of children between the ages of 4 and 12. Due to constant work commitments, some parents are forced to leave their children at home alone or with a caretaker. The objective of the research is to detect and recognize the day-to-day activities a child performs using a human activity recognition model. Since emotions play relevant roles in social and daily life, after detecting the activity being performed, the aim is to detect the emotion expressed by the child with the help of emotional analysis using a facial expression recognition model. The model also analyzes the data recorded in the system and produces a graphical analysis of the emotions expressed by the child. The research also includes a model to keep a check on the behavior of the caretaker/guardian present at home, to prevent inappropriate behavior toward the child and to protect the child from becoming a victim of child abuse or of careless handling of harmful objects. The research also provides datasets for child activity recognition and child abuse detection that can motivate researchers interested in activity recognition and abuse detection for children. Random forest yields an accuracy of 91.27% for activity recognition, which is higher than the other experimented models. The proposed AbuseNet is superior to other ImageNet models, with 98.20% accuracy.

Keywords Video surveillance · Working parents · Activity recognition · Emotional analysis · Child abuse detection · SVM · Random forest · CNN
S. Khedkar · A. Naik (B) · O. Mane · A. Gurnani · K. Amesur Computer Engineering, Vivekanand Education Society’s Institute of Technology, Mumbai, Maharashtra, India e-mail: [email protected] S. Khedkar e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 S. Roy et al. (eds.), Innovations in Computational Intelligence and Computer Vision, Advances in Intelligent Systems and Computing 1424, https://doi.org/10.1007/978-981-19-0475-2_16
1 Introduction Every day, in a hundred different ways, children ask their parents: 'Do you hear me? Do you see me? Do I matter?'. Their behavior always mirrors our reactions. According to a survey, in 2019, 80.3% of employed mothers with children ages 6–17 worked full time, compared with 75.8% of mothers with children under age six. Employed fathers of younger and older children were almost equally likely to work full time, with 96.1% and 96.2%, respectively, working full time. With a family's growing needs, parents must either leave their child at home with a nanny or enroll them in day care. This limits the interaction between the child and the parents during the crucial childhood years, which consequently affects the growth and development of the child. Apart from this, the safety of the child alone at home is also a major cause of concern for the parent. Most parents leave their children with a caretaker, but still, there is a chance that the caretaker ill-treats and does not behave decently with the child in the absence of the child's parents. The proposed work solves all these problems. The research work is divided into three main sections. The first section focuses mainly on the children's activity recognition. The system is trained in such a way that it keeps a continuous watch on the child and records the activities of the child. To recognize all of the child's activities during the day, four classifiers are used, namely an SVM classifier [1] with two kernels, a random forest classifier, and a deep neural network classifier. The key points extracted from each frame are used to train the classifiers and recognize the activities. The random forest classifier [2] gave us the best accuracy. The four activities recognized by the proposed system are as follows: sleeping, eating, watching, and reading. The second section of the work deals with the emotional recognition of the child. The CNN [3] algorithm enables us to recognize all of the child's emotions. All the emotions the child goes through throughout the day are recorded, and using this, a report is created. The emotions the system recognizes are as follows: neutral, happy, scared, sad, disgust, surprise, and angry. The report contains charts that the parent can check at the end of the day and hence get to know his or her child better. As is known, children's ability to understand emotions is viewed as an important predictor of the development of social competence. This is why what the child goes through during the day is really important. Lastly, the proposed work also performs child abuse detection. For child abuse detection, a CNN image classifier takes an input image, processes it, and classifies it under the categories abuse and non-abuse. These features are of great help as they allow the parent to have a stress-free workday. The contribution is summarized in the following points: 1. Datasets are created for activity recognition and abuse detection in children. 2. A parallel integrated system is built for detection of the activities a child performs, the emotion expressed by the child, and inappropriate behavior toward the child. The remaining paper is organized as follows: In Sect. 2, related work on activity and emotion recognition and child abuse detection is discussed. In Sect. 3, an overview of
the created dataset is discussed in detail. Section 4 presents the methodology, with subsections on activity recognition, emotion recognition, and child abuse detection. In Sect. 5, the experimental evaluation along with the results and analysis is discussed, and the paper closes in Sect. 6 with a conclusion and future scope.
2 Related Work This section summarizes the work done by other researchers and includes methods that are closely related to the proposed work. Koutnik et al. [4] show that extremely small recurrent neural network (RNN) controllers can be evolved for a task that previously demanded networks with over a million weights. A deep max-pooling convolutional neural network (MPCNN) is used to convert the high-dimensional visual input that the controller would usually receive into a compact feature vector. Wang et al. [5] use spatial and temporal dictionaries of body parts to describe behaviors, clustering the extracted joints into five parts, which can capture the spatial structure of the human body and its movements. Cheng et al. [6] propose a novel shift graph convolutional network (Shift-GCN) to address both flaws. The Shift-GCN is made up of novel shift graph operations and lightweight pointwise convolutions instead of heavy regular graph convolutions, with the shift graph operations providing versatile receptive fields for both spatial and temporal graphs. Bevilacqua et al. [7] looked at a variety of activities and sensors, demonstrating how various network architectures can adapt motion signals to be fed into CNNs. The classification capacity of single, double, and triple sensor systems is investigated by comparing the performance of different groups of sensors. The experimental findings from a dataset of 16 lower-limb behaviors collected from a group of participants using five separate sensors are very positive. Alsheikh et al. [8] present an introduction to deep learning models for HAR. To feed real images to a convolutional neural network, they create a spectrogram image from an inertial signal. The spectrogram generation phase replaces the process of feature extraction, adding initial overhead to the network training. This method eliminates the need for reshaping the signals into a suitable format for a CNN. Wang et al. [9] obtained temporal structure by extracting local occupancy patterns around skeleton joints and then processing them with FTP. The emotion analysis and child abuse detection methodology have been proposed using CNNs. CNNs are best known for their excellent performance on images. CNNs are used in several applications, such as image classification [10], object detection, and image generation. Image identification may involve the detection of pedestrians on roads, moving cars, etc., while image classification assigns images to a specific class. To distinguish between classes, a CNN takes a picture as its input and attaches values, i.e., weights and biases, to it. Convolution is nothing more than a pointwise product of two functions to generate a third function [11]. The input picture is multiplied by the feature detector to build the feature map in the convolution procedure. A variety of layers can be used in a CNN. The extraction of
features is achieved via convolutional layers. Pooling is applied to the image after the convolution process. The purpose of pooling is to reduce the spatial size. A convolutional layer may optionally be followed by batch normalization [12] or dropout [13]. The approaches included here are image based and are closely related to the work. At the end of most CNNs used for feature extraction, there are several fully connected layers. Most of the parameters in a CNN are found in fully connected layers. Specifically, the last fully connected layers of VGG16 [14] constitute approximately 90% of all its parameters. Recent architectures, such as InceptionV3 [15], have used a global average pooling operation to reduce the number of parameters in their final layers. By averaging all elements in a feature map, global average pooling reduces each feature map to a scalar value. The network is thereby forced to extract global features from the input image. Modern CNN architectures like Xception [16] combine two of the most successful experimental hypotheses in CNNs: residual modules [17] and depth-wise separable convolutions [18]. Depth-wise separable convolutions minimize the number of parameters much further by splitting the feature extraction and feature mixing processes within a convolutional layer. Furthermore, for the FER-2013 dataset, the state-of-the-art model is built on a CNN trained with a squared hinge loss [17]. Using roughly 5 million parameters, this model achieved a 71% accuracy [19]. In this architecture, the last connected layers contain 98% of all parameters. Using an ensemble of CNNs, the second-best method presented in [19] obtained a 66% accuracy. In many cases, datasets and measurement efforts are adult based, without regard to how they fit within the larger landscape of child health measures. Furthermore, attempts to collect data on other areas of growth, education, or family and social contexts are not aligned with efforts to collect data on child and adolescent health. The proposed model is trained on our dataset collection for children. Also, there is no existing system that performs activity recognition with emotion classification for children with report assistance and child abuse detection in a single integrated module.
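The parameter savings from the depth-wise separable convolutions discussed above can be quantified with a few lines of arithmetic. The sketch below is illustrative only; the layer sizes (a 3 × 3 kernel, 64 input channels, 128 output channels) are assumptions, not values from any of the cited architectures.

```python
# Parameter count of a standard convolution versus a depth-wise
# separable convolution (depthwise + pointwise), ignoring biases.
def standard_conv_params(k, c_in, c_out):
    return k * k * c_in * c_out

def separable_conv_params(k, c_in, c_out):
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 convolution mixing channels
    return depthwise + pointwise

# Illustrative layer: 3 x 3 kernel, 64 input channels, 128 output channels.
k, c_in, c_out = 3, 64, 128
print(standard_conv_params(k, c_in, c_out))   # 73728
print(separable_conv_params(k, c_in, c_out))  # 8768, roughly an 8x reduction
```

This roughly 8× reduction for a single layer is why architectures such as Xception can approach the accuracy of regular CNNs with far fewer parameters.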
3 Dataset A dataset is prepared to train the activity recognition model for children, as shown in Table 1, to encourage research, and a dataset for abuse detection for children, as shown in Table 2, is also created. The datasets will be available to researchers after this paper has been published. To our knowledge, this is the first video dataset of its kind for activity recognition in which activities like eating, reading, sleeping, and watching are captured for children of the age group 4–12 years. It has 100 video clips of 720p resolution at 30 fps. The child abuse dataset is the first precisely designed dataset of its kind for abuse recognition for children. It has 20 abuse videos and 20 non-abuse videos of 720p resolution, each with a duration of 10 s. The videos in the datasets have a wide variety and specificity in each set, which rigorously tests the model's precision.
Table 1 Activity recognition dataset (sample video frames for the classes reading, eating, sleeping, and watching)

Table 2 Abuse detection dataset (sample video frames for the classes abuse and non-abuse)
4 Methodology This section is divided into three subsections. Each part explains the methodology used to develop the respective modules of the proposed system.
4.1 Activity Recognition To perform a physical action, the child uses his or her head, hands, arms, legs, and body to communicate and provide input. Accordingly, joint positions correspond to various actions. As a result, human activity can be described as a series of joint movements of the human body. For extracting human key points, 2D pose estimation is carried out. This is a very effective way of detecting the 2D pose of persons and constructing skeletons in pictures or videos. Videos or images are used as input, and 2D positions are created for children within the output frame, as given in Table 4. The architecture takes a sequential approach to recognize the position and association of the body components. Frame by frame, the model extracts 15 key points from the images and videos, including body, foot, hand, and head key points, as shown in Fig. 1. All key point x, y coordinates have been saved. The key points
Fig. 1 Framework of proposed approach for action recognition

Table 3 Performance evaluation of proposed work with different classifiers and dense neural network for activity recognition

Methods                  Precision   Recall    F1-score   G-Means   Accuracy
Random forest            0.90278     0.90201   0.89722    0.9301    0.91277
SVM (with poly kernel)   0.90432     0.90679   0.90095    0.93126   0.90432
Dense neural network     0.88257     0.89628   0.88771    0.93057   0.89351
SVM (with RBF kernel)    0.86898     0.53361   0.56241    0.68742   0.59722
in each frame have been collected and labeled in a CSV format. Missing key points are dealt with by checking the value of each key point frame by frame; the values of missing key points are replaced with −1. The coordinates of the extracted key points are retrieved. These key points are then viewed as features, which are fed into an SVM classifier, a random forest, and a dense neural network to distinguish the various activities. The models used for the multi-class classification of the four groups eating, watching, sleeping, and reading are SVM, random forest, and a dense neural network. Furthermore, to perform the activity classification, the SVM model is trained with two separate kernels: the RBF kernel and the poly kernel. As a result, models for successfully recognizing activities are created and trained on the extracted features of the activities. According to a comparison of the classification models used, the random forest is the most reliable model and effectively classifies the children's activities. The classification results of the models are represented in Table 3.
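A minimal sketch of this classification step is given below. It is not the authors' code: the CSV file name, the column layout (one label column plus the 30 key point coordinates), and the classifier hyperparameters are all assumptions.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# One row per frame: x/y coordinates of the 15 key points plus the label.
df = pd.read_csv("keypoints.csv")             # hypothetical file name
X = df.drop(columns=["activity"]).fillna(-1)  # missing key points -> -1
y = df["activity"]                            # eating/reading/sleeping/watching

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=42)
clf = RandomForestClassifier(n_estimators=100, random_state=42)
clf.fit(X_tr, y_tr)
print("accuracy:", accuracy_score(y_te, clf.predict(X_te)))
```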
Fig. 2 Architecture of proposed EmotionNet for emotion recognition
4.2 Emotion Recognition A model is developed and analyzed in terms of test accuracy and number of parameters. The model is intended to achieve the highest possible accuracy for its number of parameters. Reducing the number of parameters helps overcome major issues: small CNNs can run on hardware-constrained systems such as robot platforms without producing slow results. The architecture removes the fully connected layers and integrates combined depth-wise separable convolutions and residual modules. The ADAM optimizer is used to train the architecture. The concept is inspired by the Xception architecture and its residual modules and depth-wise separable convolutions. Residual modules alter the desired mapping between two layers so that the learned features become the difference between the original feature map and the desired features. Depth-wise separable convolutions are composed of two layers: depth-wise convolutions and pointwise convolutions. These layers are intended to separate spatial cross-correlations from channel cross-correlations. The proposed model is a fully convolutional neural network with two residual depth-wise separable convolution blocks, each accompanied by a batch normalization operation and a ReLU activation. Global average pooling and a softmax activation are used for producing a prediction in the last layer. It is an approximately 40,000-parameter architecture. The final architecture, called EmotionNet, is shown in Fig. 2. We trained and evaluated the architecture on the FER-2013 dataset, in which each image belongs to one of the groups angry, disgust, fear, happy, sad, and surprise. For the emotion classification process, it achieves 74% accuracy.
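A rough Keras sketch of a network in this spirit follows. It is not the exact EmotionNet: the filter counts, the number of residual blocks, and the input size (48 × 48 grayscale, as in FER-2013) are assumptions, and the sketch only illustrates how residual modules, depth-wise separable convolutions, batch normalization, ReLU, and a global-average-pooling/softmax head fit together.

```python
from tensorflow.keras import layers, Model, optimizers

def residual_sep_block(x, filters):
    # Residual module: strided 1x1 shortcut + two depth-wise separable convs.
    shortcut = layers.Conv2D(filters, 1, strides=2, padding="same")(x)
    shortcut = layers.BatchNormalization()(shortcut)
    y = layers.SeparableConv2D(filters, 3, padding="same")(x)
    y = layers.BatchNormalization()(y)
    y = layers.ReLU()(y)
    y = layers.SeparableConv2D(filters, 3, padding="same")(y)
    y = layers.BatchNormalization()(y)
    y = layers.MaxPooling2D(3, strides=2, padding="same")(y)
    return layers.Add()([y, shortcut])

inputs = layers.Input(shape=(48, 48, 1))       # FER-2013 images: 48x48 gray
x = layers.Conv2D(8, 3, padding="same")(inputs)
x = layers.BatchNormalization()(x)
x = layers.ReLU()(x)
x = residual_sep_block(x, 16)                  # two residual separable blocks
x = residual_sep_block(x, 32)
x = layers.Conv2D(7, 3, padding="same")(x)     # seven emotion classes
x = layers.GlobalAveragePooling2D()(x)         # reduces each map to a scalar
outputs = layers.Softmax()(x)

model = Model(inputs, outputs)
model.compile(optimizer=optimizers.Adam(),     # ADAM, as stated in the text
              loss="categorical_crossentropy", metrics=["accuracy"])
```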
Fig. 3 Architecture of proposed AbuseNet for abuse detection
4.3 Child Abuse Detection The proposed approach is explained in detail in this section. The lightweight CNN shown in Fig. 3 is suggested to detect whether or not a frame image contains improper behavior. The details of the proposed convolutional architecture are presented along with all of the experiments. We also work with other pre-trained models and compare their findings to those of AbuseNet. Tests are run on pre-trained models to check the accuracy of the AbuseNet model. The ImageNet dataset was used to train all of the pre-trained models used in the experiments. Hidden layers, nodes, activation functions, batch normalization, filters, and strides vary by model. Since the pre-trained models already have weights trained on the ImageNet dataset, all we have to do is add a few layers at the end and train those layers while keeping the pre-trained model's layers intact. The number of nodes in the final output layer is reduced from 1000 to 2, since these models were originally equipped to classify 1000 different classes, while our task has only two classes to classify. VGG16, Xception, ResNet50, and InceptionV3 were among the common pre-trained models we tested. Table 5 summarizes the best results obtained for all of the pre-trained models, as well as the proposed AbuseNet.
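The transfer-learning setup described here can be sketched as follows; VGG16 is used as the example base, and the input size, pooling head, and training settings beyond the Adam optimizer with learning rate 0.001 (mentioned in Sect. 5.2) are assumptions.

```python
from tensorflow.keras import layers, Model, optimizers, callbacks
from tensorflow.keras.applications import VGG16

# Load a model pre-trained on ImageNet, dropping its 1000-class head.
base = VGG16(weights="imagenet", include_top=False, input_shape=(224, 224, 3))
base.trainable = False                    # keep pre-trained layers intact

x = layers.GlobalAveragePooling2D()(base.output)
outputs = layers.Dense(2, activation="softmax")(x)   # abuse / non-abuse
model = Model(base.input, outputs)

model.compile(optimizer=optimizers.Adam(learning_rate=0.001),
              loss="categorical_crossentropy", metrics=["accuracy"])

early_stop = callbacks.EarlyStopping(monitor="val_loss", patience=5,
                                     restore_best_weights=True)
# model.fit(train_ds, validation_data=val_ds, epochs=50, callbacks=[early_stop])
# (train_ds / val_ds are hypothetical labeled datasets of video frames)
```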
Table 4 The figure shows the extracted human's key points on our recorded dataset and the skeleton for four activities (sample frames: reading, eating, sleeping, and watching)
5 Result and Analysis This section discusses the experimental descriptions and results of activity recognition using a dense neural network and classification models for distinguishing activities, as well as the emotion analysis model and the child abuse model using a lightweight CNN. An Nvidia GeForce GTX 1650 GPU was used to process the images and videos.
5.1 Activity Recognition The support vector machine (SVM) with two kernels and the random forest classifier are trained on the features for each activity class, and we have also developed a dense neural network model using the Keras framework to perform multi-class classification. Table 3 shows the results of the models used in terms of precision, recall, F1-score, G-Means, and classification accuracy. The model outputs are good, but it can be concluded that the models are often prone to overfitting. Furthermore, the ROC curves for the dense neural network, the random forest model, and the SVM with poly and RBF kernels are shown in Fig. 4 for each activity class. The ROC curve is used to assess the performance of the proposed method. The AUC values for the proposed models on different activity classes are shown, indicating that the random forest model performs well enough for activity classification.
5.2 Child Abuse Detection The proposed model is implemented in the Keras deep learning framework with a TensorFlow back end. For preprocessing and plotting purposes, the Python programming language is used along with libraries such as scikit-learn, Pandas, NumPy, and Matplotlib.
Fig. 4 ROC curves for different models in comparison with activity eating, reading, watching, and sleeping

Table 5 Results obtained with the pre-trained models are described in detail

Exp   Model name    Training accuracy (%)
1     VGG16         93.77
2     InceptionV3   95.87
3     Xception      97.98
4     AbuseNet      98.20
An Adam optimizer is chosen, and the learning rate is set to 0.001. The best accuracy is recorded during model training with early stopping. Several experiments are conducted to tune the hyperparameter values. The training results obtained are depicted in the confusion matrix shown in Fig. 5. The diagonal cells in this figure reflect the number of correct classifications made by the proposed AbuseNet model. For example, in the first column of Row 1, the classifier correctly predicts abuse in 2187 cases. Similarly, the model correctly predicted 1782 cases for the non-abuse class. Examining a specific row of the confusion matrix in Fig. 5 can be used to evaluate the accuracy of the algorithm for a specific class. The results linked to abuse, for example, are summarized in the first row. The second column of Row 1 indicates that the algorithm misclassified abuse as non-abuse in three cases. For the abuse class, the classification accuracy equals 99.86%, and 99.94% for non-abuse, on the training dataset.
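These per-class accuracies follow directly from the confusion matrix. The sketch below reproduces them; note that the single non-abuse misclassification is inferred from the reported 99.94% rather than stated explicitly in the paper.

```python
import numpy as np

# Rows: true class (abuse, non-abuse); columns: predicted class.
# 2187 correct abuse cases and 3 abuse-as-non-abuse errors are read from
# Fig. 5; the one non-abuse-as-abuse error is inferred from the 99.94%.
cm = np.array([[2187,    3],
               [   1, 1782]])

per_class_accuracy = cm.diagonal() / cm.sum(axis=1)
print(per_class_accuracy)   # [0.9986... 0.9994...] -> 99.86% and 99.94%
```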
Fig. 5 Confusion matrix of the AbuseNet obtained using the training dataset
Table 6 Emotion recognition result (sample faces classified as happy, scared, neutral, and angry)
5.3 Emotion Recognition For the result analysis of emotion recognition, a real-time guided backpropagation visualization technique is presented. The output of the CNN model is a live bar graph which depicts the percentage of each emotion shown on the face of the child. Guided backpropagation uncovers the dynamics of the weight changes and evaluates the learned features. The most notable emotion has the highest percentage in the graph, and the others have small percentages. All the percentages recorded during the day are stored in a CSV file. These records are summed and presented in the form of a pie chart, as shown in Fig. 6. The results of the proposed work for classifying emotions in unseen faces can be seen in Table 6.
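A minimal sketch of this report step follows, assuming a per-frame CSV log with one column per emotion (the file name and column names are hypothetical):

```python
import pandas as pd
import matplotlib.pyplot as plt

emotions = ["neutral", "happy", "scared", "sad", "disgust", "surprise", "angry"]
df = pd.read_csv("emotion_log.csv")   # hypothetical per-frame percentage log

totals = df[emotions].sum()           # add up the percentages recorded all day
plt.pie(totals, labels=emotions, autopct="%.1f%%")
plt.title("Child emotion analysis for the day")
plt.savefig("emotion_report.png")
```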
Fig. 6 Pie chart for child emotion analysis
6 Conclusion and Future Work Children's well-being is the result of complicated, diverse mechanisms driven by environmental factors such as the children's families and their social and physical environments, as well as their chromosomes, genetics, and behaviors, and these have a significant impact on their welfare. Nonetheless, national, state, and local data collection and assessment efforts often use adult-based methods to define and quantify wellness, which do not reflect the developmental nature of or the various impacts on children's health. Specifically, a vision system is developed that performs activity recognition with emotion classification, report assistance, and child abuse detection in a single integrated module. This approach allows us to see and explore the high-level functionality that the developed models have mastered. The proposed system describes the recognition of child activities and emotions of both sexes between the ages of 4 and 12 years and also detects child abuse. For the recognition of child activity and child abuse, we created our own datasets. A skeleton-based key point extraction method is used for activity recognition. The random forest classifier achieves a higher accuracy of 91.27% for activity recognition than the other models tested. The emotion recognition and child abuse detection are implemented using convolutional neural networks. With 98.20% accuracy, the proposed AbuseNet model outperforms other ImageNet models. Lastly, an analysis of the child's activities and emotions can be produced from the recorded activities and emotions, which proves helpful. Since children rapidly change and adapt their responses to interactions, the developmental process can be examined. In the near future, instead of using a surveillance camera, a robot can be created which has a camera and a motion sensor. This robot will sense the motion of the child and follow it all around the house, tracking the child's activities and emotions using the same models. This can help give more accurate results.
References
1. Zhang, Y.: Support vector machine classification algorithm and its application. In: International Conference on Information Computing and Applications, pp. 179–186. Springer, Berlin, Heidelberg (2012)
2. Oshiro, T.M., Perez, P.S., Baranauskas, J.A.: How many trees in a random forest? In: International Workshop on Machine Learning and Data Mining in Pattern Recognition, pp. 154–168. Springer, Berlin, Heidelberg (2012)
3. Thomas, S., Ganapathy, S., Saon, G., Soltau, H.: Analyzing convolutional neural networks for speech activity detection in mismatched acoustic conditions. In: 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 2519–2523. IEEE (2014)
4. Koutník, J., Schmidhuber, J., Gomez, F.: Evolving deep unsupervised convolutional networks for vision-based reinforcement learning. In: Proceedings of the 2014 Annual Conference on Genetic and Evolutionary Computation, pp. 541–548 (2014)
5. Wang, C., Wang, Y., Yuille, A.L.: An approach to pose-based action recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 915–922 (2013)
6. Cheng, K., Zhang, Y., He, X., Chen, W., Cheng, J., Lu, H.: Skeleton-based action recognition with shift graph convolutional network. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 183–192 (2020)
7. Bevilacqua, A., MacDonald, K., Rangarej, A., Widjaya, V., Caulfield, B., Kechadi, T.: Human activity recognition with convolutional neural networks. In: Joint European Conference on Machine Learning and Knowledge Discovery in Databases, pp. 541–552. Springer, Cham (2018)
8. Alsheikh, M.A., Selim, A., Niyato, D., Doyle, L., Lin, S., Tan, H.P.: Deep activity recognition models with triaxial accelerometers. arXiv preprint arXiv:1511.04664 (2015)
9. Wang, J., Liu, Z., Wu, Y., Yuan, J.: Mining actionlet ensemble for action recognition with depth cameras. In: 2012 IEEE Conference on Computer Vision and Pattern Recognition, pp. 1290–1297. IEEE (2012)
10. Long, J., Shelhamer, E., Darrell, T.: Fully convolutional networks for semantic segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3431–3440 (2015)
11. Krizhevsky, A., Sutskever, I., Hinton, G.E.: ImageNet classification with deep convolutional neural networks. Adv. Neural Inf. Proc. Syst. 25, 1097–1105 (2012)
12. Ioffe, S., Szegedy, C.: Batch normalization: accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456. PMLR (2015)
13. Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I., Salakhutdinov, R.: Dropout: a simple way to prevent neural networks from overfitting. J. Mach. Learn. Res. 15(1), 1929–1958 (2014)
14. Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556 (2014)
15. Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., Wojna, Z.: Rethinking the inception architecture for computer vision. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2818–2826 (2016)
16. Chollet, F.: Xception: deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017)
17. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778 (2016)
18. Glorot, X., Bordes, A., Bengio, Y.: Deep sparse rectifier neural networks. In: Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics, pp. 315–323. JMLR Workshop and Conference Proceedings (2011)
19. Goodfellow, I.J., Erhan, D., Carrier, P.L., Courville, A., Mirza, M., Hamner, B., Cukierski, W., Tang, Y., Thaler, D., Lee, D.H., Zhou, Y.: Challenges in representation learning: a report on three machine learning contests. In: International Conference on Neural Information Processing, pp. 117–124. Springer, Berlin, Heidelberg (2013)
20. Nehete, J.O., Agrawal, D.G.: Real time recognition and monitoring a child activity based on smart embedded sensor fusion and GSM technology
Computational Operations and Hardware Resource Estimation in a Convolutional Neural Network Architecture Jyoti Pandey, Abhijit R. Asati, and Meetha V. Shenoy
Abstract The convolutional neural network (CNN) models have proved to be very advantageous in computer vision and image processing applications. Recently, due to the increased accuracy of CNNs on an extensive variety of classification and recognition tasks, the demand for real-time hardware implementations has dramatically increased. They involve intensive processing operations and memory bandwidth to achieve the desired performance. The hardware resources and approximate performance estimation of a target system at a higher level of abstraction are very important for an optimized hardware implementation. In this paper, we initially developed an 'Optimized CNN model', and then we explored the approximate operations and hardware resource estimation for this CNN model along with a suitable hardware implementation process. We also compared the computed operations and hardware resource estimates of a few published CNN architectures, which shows that the optimization process greatly helps in reducing the hardware resources while providing similar accuracy. This research has mainly focused on the computational complexity of the convolutional and fully connected layers of our implemented CNN model. Keywords Convolutional neural network · Computational operations · Hardware resource estimation
1 Introduction In the past few years, the availability of huge amounts of reliable data in the form of text, video, audio, etc., and immense progress in digital electronics technologies have provided vast computing power. Thus, artificial intelligence (AI) has seen a huge rise, especially in the deep learning (DL) area [1, 2]. One major field of research in AI, computer vision, and pattern recognition is handwritten character recognition, which can be applied to identify digital data in postcodes, bank notes, accounting, J. Pandey (B) · A. R. Asati · M. V. Shenoy Dept. of Electrical and Electronics Engineering, Birla Institute of Technology and Science, Pilani (Pilani Campus), Pilani, Jhunjhunu, Rajasthan 333031, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 S. Roy et al. (eds.), Innovations in Computational Intelligence and Computer Vision, Advances in Intelligent Systems and Computing 1424, https://doi.org/10.1007/978-981-19-0475-2_17
etc. Given the rapid pace at which the field is moving, this identification technology is of great importance, so it must be explored without utilizing large hardware resources. • The work done in this paper presents the estimation of computational operations and hardware resources required to carry out the necessary computations, which is missing in the published literature. • The hardware allocation decides the performance and cost of the target system in terms of speed, number of clock cycles required for the computations, memory requirements, number of components, etc. • In this paper, we initially present the 'Optimized CNN model' for identifying the handwritten digits in the MNIST dataset following the proposed optimization process. • Later on, we investigate the hardware estimation for this model, exploring various factors like the number of operations per layer, resources required, etc. • The major work of this research is focused on the computational complexity of the convolutional and fully connected layers of our implemented 'Optimized CNN model' architecture and on finding the number of operations and the hardware resource requirements for the individual layers of the architecture. The remainder of this paper consists of four sections. Section 2 describes background and related work. Section 3 describes the experimental studies. Section 4 presents results and discussion, and Sect. 5 concludes the paper.
2 Background and Related Work Deep learning emerged with the invention of the LeNet-5 CNN architecture based on the LeNet-5 algorithm [3]. Before its invention, hand-engineered features were used for character recognition. Those manually created features were then learnt and classified by a machine learning model. LeNet-5 was implemented with the Modified National Institute of Standards and Technology (MNIST) handwritten digit dataset [4]. Research on improving the original LeNet architecture to achieve higher accuracies on the MNIST dataset has continued to grow, leading to many recent advancements. It has many applications like OCR, text interpretation, text manipulation, signature authentication, etc. [5, 6]. With the progress of recent years, CNNs have now attained sufficient capability that they can be applied to various fields like natural language processing, identifying different diseases through radiographic images, etc. [7, 8]. Krizhevsky et al. put forward a CNN architecture named AlexNet, which proved that a deep CNN can perform with outstanding results on a complex dataset [9]. Szegedy et al. proposed the Inception architecture with increased depth and width of the network, with the purpose of utilizing the computing resources efficiently [10]. Chen et al. proposed a framework for CNN-based handwritten character recognition. In this work, a few strategies and properties related to handwritten characters, such as proper sample generation, the training scheme, and the CNN network structure, were studied [11].
Agarap proposed a CNN architecture to investigate the effects of using conventional softmax function and the linear support vector machine (SVM) at the last layer of CNN architecture [12].
3 Experimental Studies In this paper, we started with designing an 'Optimized CNN model' for identifying the digits in the MNIST dataset with a good accuracy. Then we computed the number of operations in the various layers. Lastly, we estimated the hardware requirements and resource usage of the various layers for a suitable sequential and pipelined hardware implementation process of the layers of the 'Optimized CNN model'.
3.1 Design an Optimized CNN Model with a Good Accuracy We started by assuming a basic CNN architecture with default parameters and optimized various hyper-parameters such as the number of filters and filter sizes for the convolution and pooling layers, padding and stride, activation functions, optimization functions, learning rates, and the number of epochs to get the 'Optimized CNN model'. For this study, we used the MNIST dataset of 70,000 images (28 × 28 × 1) of digits from zero to nine. We used 60,000 images for training and 10,000 images for testing. We implemented the architecture on a laptop with an Intel Core i5-2520M processor @ 2.50 GHz and 10 GB RAM using MATLAB R2020a. The training cycle comprises 10 epochs with 468 iterations per epoch. The learning rate is kept constant at 0.001. This architecture has first an image input layer, then two sets of convolutional and max-pooling layers, then one fully connected layer, and finally a classification layer with 10 output classes implemented using the softmax function. The 'Optimized CNN model' architecture is shown in Fig. 1. For both convolution layers, we used nine filters of size 3 × 3.

Fig. 1 'Optimized CNN model' architecture (INPUT 28 × 28 × 1 → Convolution 1: 9 filters of 3 × 3, Batch & ReLU → 26 × 26 × 9 → Max Pooling 1: 3 × 3 filter → 24 × 24 × 9 → Convolution 2: 9 filters of 3 × 3, Batch & ReLU → 22 × 22 × 9 → Max Pooling 2: 3 × 3 filter → 20 × 20 × 9 → Fully Connected layer → Softmax Classification → OUTPUT 1 × 1 × 10)
Fig. 2 Training progress of ‘Optimized CNN model’ architecture
From the calculation point of view, the 3 × 3 filter size is best, because a filter of size 3 × 3 requires only nine (3 × 3) weights in total during each convolution step. Fewer weights mean fewer multiplications, making the filter of size 3 × 3 computationally efficient. Some published works have used filters of size 5 × 5, which require 25 (5 × 5) weights during each convolution step; this makes those CNN architectures computationally complex. Some works have also used even-sized filters like 2 × 2 or 4 × 4, but it is good to prefer odd-sized filters. The reason is that, on examining the final output pixel obtained by the convolution operation over the previous layer, all the contributing pixels of the previous layer are symmetrical around that particular output pixel. If this symmetry is not present, it creates an imbalance across the layers. This model gave an accuracy of 98.98%, as shown in Fig. 2. Now we need to estimate the computing requirements of the convolutional and fully connected layers.
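The authors implemented this model in MATLAB; purely as an illustration, an equivalent Keras sketch is given below. The layer sizes follow Fig. 1 and the learning rate follows Sect. 3.1, while the choice of the Adam optimizer, the loss, and the batch size of 128 (which yields the reported 468 iterations per epoch over 60,000 images when partial batches are discarded) are assumptions.

```python
from tensorflow.keras import layers, models, optimizers

model = models.Sequential([
    layers.Input(shape=(28, 28, 1)),              # MNIST digit image
    layers.Conv2D(9, 3),                          # -> 26 x 26 x 9
    layers.BatchNormalization(),
    layers.ReLU(),
    layers.MaxPooling2D(pool_size=3, strides=1),  # -> 24 x 24 x 9
    layers.Conv2D(9, 3),                          # -> 22 x 22 x 9
    layers.BatchNormalization(),
    layers.ReLU(),
    layers.MaxPooling2D(pool_size=3, strides=1),  # -> 20 x 20 x 9
    layers.Flatten(),
    layers.Dense(10, activation="softmax"),       # ten digit classes
])
model.compile(optimizer=optimizers.Adam(learning_rate=0.001),
              loss="sparse_categorical_crossentropy", metrics=["accuracy"])
# model.fit(x_train, y_train, epochs=10, batch_size=128)
```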
3.2 Computation of Number of Operations in CNN Layers First Convolutional Layer. Here, as the filter slides over the input image, every position of the 3 × 3 × 1 filter over the input image pixels requires nine multiplications and eight additions to give a final value. The steps of computation are shown in Table 1. Finally, after the first convolution, the image size is changed from 28 × 28 × 1 to 26 × 26 × 9. Second Convolutional Layer. Again, we have nine filters of size 3 × 3 × 9 (since there are nine input channels from the previous layer, each 3 × 3 filter has a depth of nine), and the stride is one. Similar to the previous convolution, every filter position again requires nine multiplications and eight additions. The complete steps of computation are shown in Table 2. Finally, the image size is changed from 24 × 24 × 9 to 22 × 22 × 9.
Table 1 Computations in first convolutional layer (multiplications; additions for weights; additions for bias)
1. 3 × 3 filter applied on the first 3 × 3 image portion: 3 × 3 multiplications; (3 × 3 − 1) additions for weights; 1 addition for bias.
2. 3 × 3 filter moves row-wise with stride one and is applied on the complete first row of image pixels: 26 × (3 × 3); 26 × (3 × 3 − 1); 26.
3. The filter then moves down with stride one and is again applied row-wise, so one 3 × 3 filter covers the whole image of size 28 × 28: 26 × 26 × (3 × 3); 26 × 26 × (3 × 3 − 1); 26 × 26.
4. One filter of size 3 × 3 with depth one applied on the whole image of size 28 × 28 × 1: 26 × 26 × (3 × 3) × 1; 26 × 26 × (3 × 3 − 1) × 1; 26 × 26.
5. All nine filters of size 3 × 3 with depth one convolving over the whole image of size 28 × 28 × 1: 26 × 26 × (3 × 3) × 1 × 9; 26 × 26 × (3 × 3 − 1) × 1 × 9; 26 × 26 × 9.
Fully connected layer. As we have used nine filters, these nine filters create nine different channels/feature maps. With these nine channels, we have an input image of size 20 × 20, so we get 9 × 20 × 20 weights for each digit in this layer. For the 10 digits, i.e., zero to nine, the total multiplications required = 9 × 400 × 10. After all these computations, we obtain a weight matrix of size [10 × 3600] for the fully connected layer. For each digit, we get 3600 weights, so 10 rows with 3600 values each correspond to the weights of the 10 digits from zero to nine. The computation of the number of operations in each layer of our proposed 'Optimized CNN model' architecture is given in Table 3. For the max-pooling layer, we have a filter size of 3 × 3, so we need to compare/sort the values in the filter window eight times to obtain the maximum value. After having a clear estimate of the calculations and resource usage of the various layers, we can estimate the hardware resources required for the layers of this architecture, like the number of multipliers, adders, accumulators, etc.
Table 2 Computations in second convolutional layer (multiplications; additions for weights; additions for bias)
1. 3 × 3 filter applied on the first 3 × 3 image portion: 3 × 3; (3 × 3 − 1); 1.
2. 3 × 3 filter moves row-wise with stride one and is applied on the complete first row of image pixels: 22 × (3 × 3); 22 × (3 × 3 − 1); 22.
3. The filter then moves down with stride one and is again applied row-wise, so one 3 × 3 filter with depth one covers the whole image: 22 × 22 × (3 × 3); 22 × 22 × (3 × 3 − 1); 22 × 22.
4. One filter of size 3 × 3 with a depth of nine convolving over the whole image of size 24 × 24: 22 × 22 × (3 × 3) × 9; 22 × 22 × (3 × 3 − 1) × 9; 22 × 22.
5. All nine filters of size 3 × 3 with depth nine convolving over the whole image of size 24 × 24 × 9: 22 × 22 × (3 × 3) × 9 × 9; 22 × 22 × (3 × 3 − 1) × 9 × 9; 22 × 22 × 9.
Table 3 Computation of number of operations in each layer of our proposed 'Optimized CNN model' architecture (stride is one throughout)
1. First conv. layer (nine 3 × 3 filters): multiplications 26 × 26 × (3 × 3) × 1 × 9; additions 26 × 26 × (3 × 3 − 1) × 1 × 9 for weights and 26 × 26 × 9 for bias; no sorting operations.
2. First max-pooling (3 × 3 filter): sorting operations (3 × 3 − 1) × 24 × 24 × 9.
3. Second conv. layer (nine 3 × 3 filters): multiplications 22 × 22 × (3 × 3) × 9 × 9; additions 22 × 22 × (3 × 3 − 1) × 9 × 9 for weights and 22 × 22 × 9 for bias; no sorting operations.
4. Second max-pooling (3 × 3 filter): sorting operations (3 × 3 − 1) × 20 × 20 × 9.
5. Fully connected layer: multiplications 20 × 20 × 9 × 10; additions 20 × 20 × 9 × 10 for weights and 10 for bias.
Total: 443,592 multiplications; 408,754 additions (weights + bias); 70,272 sorting operations.
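The totals in Table 3 can be re-derived with a few lines of arithmetic; the sketch below simply evaluates the closed-form counts from Tables 1–3 and reproduces the totals.

```python
# Re-derive the Table 3 totals (nine 3 x 3 filters per conv layer, stride one).
def conv_ops(out_size, k, depth, n_filters):
    mults  = out_size * out_size * k * k * depth * n_filters
    adds_w = out_size * out_size * (k * k - 1) * depth * n_filters
    adds_b = out_size * out_size * n_filters
    return mults, adds_w + adds_b

def pool_sorts(out_size, k, channels):
    return (k * k - 1) * out_size * out_size * channels

m1, a1 = conv_ops(26, 3, 1, 9)                        # first convolutional layer
m2, a2 = conv_ops(22, 3, 9, 9)                        # second convolutional layer
m_fc, a_fc = 20 * 20 * 9 * 10, 20 * 20 * 9 * 10 + 10  # fully connected layer
sorts = pool_sorts(24, 3, 9) + pool_sorts(20, 3, 9)   # the two max-pooling layers

print(m1 + m2 + m_fc)   # 443592 multiplications
print(a1 + a2 + a_fc)   # 408754 additions (weights + bias)
print(sorts)            # 70272 sorting operations
```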
Fig. 3 A sample image with 3 × 3 filter and line buffer implemented using shift registers
3.3 Suitable Hardware Implementation Process Convolutional layer implementation. In a convolution layer, a filter window is moved across the image both vertically and horizontally. Our 'Optimized CNN model' architecture uses a filter of size 3 × 3. The multiplications of the convolutional layer can be implemented in hardware using line buffers. Since our model has a 3 × 3 filter size, we can use a 3-line buffer arrangement (the first two rows of the buffer have the same size as the width of the image, while the third row has three registers). The line buffers are just like shift registers, which shift the pixel values in every clock cycle. For our 3 × 3 filter window, we need only nine values (R1–R9, as shown in Fig. 3) from the line buffer. The rightmost elements of the first row of the line buffer contain the elements R1, R2, and R3. Similarly, the rightmost elements of the second row of the line buffer contain the elements R4, R5, and R6. For the first placement of the 3 × 3 filter over the input image, we need nine multipliers. As shown in Fig. 3, R1–R9 give the respective values after multiplication of the image pixel and filter pixel values. The addition of these values from R1 to R9 gives the final pixel value in the output image. For one filter position, nine multipliers and eight adders can be used. For the next filter position after a stride of one, again nine multipliers and eight adders are required, and so on for further filter positions. Repeatedly using the same multipliers and adders in each clock cycle reduces the resource requirement and power consumption; this method is proposed in this paper. Thus, each of the nine filters requires similar multiplier and adder resources (i.e., nine multipliers and eight adders per filter). Max-pooling layer implementation. In the max-pooling layer, the filter window selects the maximum value and keeps it for the output image. This can be done by using 2-way sorters. For our proposed 'Optimized CNN model', we have used a filter window of size 3 × 3, so eight 2-way sorters (a 2-way sorter consists of one comparator and one MUX) are needed to choose the maximum value. Thus, each of the nine filters requires similar 2-way sorter resources (i.e., eight sorters per filter).
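The line-buffer mechanism can be illustrated in software. The sketch below is a behavioral model only (not HDL): it streams one pixel per simulated clock cycle through two image-width shift registers plus a 3 × 3 window of registers, and emits one output pixel per cycle once the window holds valid data.

```python
from collections import deque

def conv3x3_line_buffered(image, kernel):
    """Behavioral model of a streaming 3 x 3 convolution fed by line buffers."""
    h, w = len(image), len(image[0])
    buf1 = deque([0] * w)              # shift register: pixels one row back
    buf2 = deque([0] * w)              # shift register: pixels two rows back
    win = [[0] * 3 for _ in range(3)]  # 3 x 3 window registers (R1-R9)
    out = [[0] * (w - 2) for _ in range(h - 2)]

    for r in range(h):
        for c in range(w):
            p = image[r][c]            # one pixel streams in per clock cycle
            tap1 = buf1.popleft()      # same column, one row above
            tap0 = buf2.popleft()      # same column, two rows above
            buf1.append(p)
            buf2.append(tap1)
            for i, tap in enumerate((tap0, tap1, p)):  # shift window one column
                win[i] = win[i][1:] + [tap]
            if r >= 2 and c >= 2:
                # Nine multiplications and eight additions per output pixel;
                # in hardware the same multiplier/adder units are reused each cycle.
                out[r - 2][c - 2] = sum(win[i][j] * kernel[i][j]
                                        for i in range(3) for j in range(3))
    return out

# Tiny check: a 4 x 4 image of ones with an all-ones kernel gives a 2 x 2 of nines.
image = [[1] * 4 for _ in range(4)]
kernel = [[1] * 3 for _ in range(3)]
print(conv3x3_line_buffered(image, kernel))   # [[9, 9], [9, 9]]
```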
Fully connected layer implementation. For implementing this layer in hardware, we can use multipliers and accumulators. As there are nine channels, the implementation requires 9 × 400 × 10 multiplication operations and their additions. For each channel, the multiplication and addition operations can be performed using a single multiply and accumulate unit repeatedly. Each channel is connected with all the feature maps of size 20 × 20 for each of the 10 digits. Thus, for the 10 digits and nine channels, it would require 9 × 10 multiply and accumulate units. Now, by using the given implementation process of the various layers, we can estimate the hardware resource requirements of the various layers of our proposed 'Optimized CNN model'. As shown in Table 4, this process gives a clear estimate of resource usage, like the number of multipliers, adders, accumulators, and memory (memory to save filter weights and the buffer size required in convolution operations), required to deploy our CNN model. Here, each layer operation is performed in a sequential manner, while connections from one layer to another are pipelined.

Table 4 Computation of number of resources required in each layer of our 'Optimized CNN architecture' using the proposed implementation process
1. First conv. layer (nine 3 × 3 filters, stride one): multipliers (3 × 3) × 9; adders (3 × 3 − 1) × 9 for weights and 9 for bias; no sorters; memory (3 × 3) × 9 for weights and 28 + 28 + 3 for the buffer.
2. First max-pooling (3 × 3 filter, stride one): 2-way sorters (3 × 3 − 1) × 9.
3. Second conv. layer (nine 3 × 3 filters, stride one): multipliers (3 × 3) × 9 × 9; adders (3 × 3 − 1) × 9 × 9 for weights and 9 for bias; no sorters; memory (3 × 3) × 9 for weights and 24 + 24 + 3 for the buffer.
4. Second max-pooling (3 × 3 filter, stride one): 2-way sorters (3 × 3 − 1) × 9.
5. Fully connected layer: 9 × 10 MAC units (for weights) and 10 adders (for bias); memory 9 × 400 × 10 = 36,000 for weights.
Total: 900 multipliers; 838 adders; 144 2-way sorters; 36,162 memory locations for weights; 110 for buffers.
4 Results and Discussion Our 'Optimized CNN model' architecture achieves 98.98% accuracy on the MNIST dataset. It uses only two convolutional layers, each with nine filters of size 3 × 3, which makes it most suitable due to the simplicity of the network. We have compared the number of operations and number of resources for a few published CNN architectures (based on their filter sizes, number of filters, number of layers, etc., and applying a similar proposed hardware computation approach) [11, 12], as shown in Tables 5 and 6. The performance comparison graphs of the number of operations and the number of resources required are shown in Figs. 4 and 5. Firstly, our findings, calculations, and the various CNN comparisons in Table 5 prove that our model is the most optimized from the calculation point of view. Secondly, we model the hardware resource requirements of the different CNN models, as shown in Table 6. It is clear that our architecture will require the least number of adders, multipliers, and other resources during hardware implementation, and this will also speed up the system.

Table 5 Comparison of number of operations in various CNN architectures

CNN model                        Total multiplications   Total additions (weights and bias)   Sorting operations
CNN-based framework [11]         56,825,568              53,968,794                           490,496
CNN SVM [12]                     59,436,544              58,388,042                           162,912
CNN Softmax [12]                 59,436,544              58,388,042                           162,912
Optimized CNN model (proposed)   443,592                 408,754                              70,272
Table 6 Comparison of number of resources required in various CNN architectures

CNN model                        Total multipliers   Total adders   2-way sorters   Memory (weights)   Memory (buffer)
CNN-based framework [11]         258,560             249,722        1792            30,120,128         145
CNN SVM [12]                     122,656             121,706        288             31,726,944         124
CNN Softmax [12]                 122,656             121,706        288             31,726,944         124
Optimized CNN model (proposed)   900                 838            144             36,162             110
Fig. 4 Performance comparison graph of number of operations
Fig. 5 Performance comparison graph of number of resources required
5 Conclusion and Future Work In CNNs, the core computation comprises multiplication and addition operations. In this paper, we have estimated the computations required and the memory requirements of a CNN model, which gives a clear idea of how to efficiently implement the CNN on a hardware platform. The CNN architecture given in this paper is simple and unique with reference to the selection of hyper-parameters across the various layers of the network. Also, it is evident through this study that by making a few cutbacks in the hyper-parameters, like filter size and number of filters, we get a remarkable reduction in the computations and resource requirements of the CNN architecture. This decrease in architecture computations and CNN resource requirements will complement software and hardware implementations or implementations on a 'hardware-software co-design' platform. The major aim of this research work is to find out the resource requirements of our implemented CNN model. It will help in implementing the CNN architecture in hardware such as ASICs or FPGAs. Future work can focus on exploiting the trade-offs between hardware and software in a system, so that we can attain system-
level objectives. It will help in improving design quality, cost, and test time. Also, it will help reduce the complexities of embedded systems. Acknowledgements Our heartfelt appreciation to Yann LeCun, Corinna Cortes, and Christopher J.C. Burges for the MNIST dataset (Lecun et al. 1999). This research did not receive any specific grant from funding agencies in the public, commercial, or non-profit sectors.
References
1. Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, Cambridge (2016). https://www.deeplearningbook.org/
2. Nielsen, M.A.: Neural Networks and Deep Learning. Determination Press (2015). http://neuralnetworksanddeeplearning.com/
3. LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proc. IEEE 86, 2278–2324 (1998)
4. LeCun, Y., Cortes, C., Burges, C.: The MNIST database of handwritten digits (1999)
5. LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. Nature 521(7553), 436–444 (2015)
6. Hamid, N.A., Sjarif, N.N.: Handwritten recognition using SVM, KNN and neural networks (2017). arXiv:1702.00723
7. Sun, W., Tseng, T.L.B., Zhang, J., Qian, W.: Enhancing deep convolutional neural network scheme for breast cancer diagnosis with unlabeled data. Comput. Med. Imaging Graph. 57, 4–9 (2017)
8. Tajbakhsh, N., Shin, J.Y., Gurudu, S.R., Hurst, R.T., Kendall, C.B., Gotway, M.B., Liang, J.: Convolutional neural networks for medical image analysis: full training or fine tuning? IEEE Trans. Med. Imaging 35(5), 1299–1312 (2016)
9. Krizhevsky, A., Sutskever, I., Hinton, G.E.: ImageNet classification with deep convolutional neural networks. In: Advances in Neural Information Processing Systems, pp. 1097–1105 (2012)
10. Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Going deeper with convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1–9 (2015)
11. Chen, L., Wang, S., Fan, W., Sun, J., Naoi, S.: Beyond human recognition: a CNN-based framework for handwritten character recognition. In: 3rd IAPR Asian Conference on Pattern Recognition. Fujitsu Research and Development Center, Beijing, China (2015)
12. Agarap, A.F.: An architecture combining convolutional neural network (CNN) and support vector machine (SVM) for image classification (2019). arXiv:1712.03541v2 [cs.CV]
Load Balancing in Multiprocessor Systems Using Modified Real-Coded Genetic Algorithm Poonam Panwar, Chetna Kaushal, Anshu Singla, and Vikas Rattan
Abstract A load balancing algorithm in a homogeneous or heterogeneous multiprocessor system is used to distribute a set of jobs to a set of processors with the objective of efficient processing. There are various load balancing algorithms proposed in the literature, but none focus on minimizing the total execution time, i.e., the makespan, and balancing the load simultaneously. So, a real-coded genetic algorithm is proposed in the present study to minimize the makespan while balancing the load on each processor. To achieve the specified objective, a twofold fitness function is used here; the first fitness function is used to reduce the makespan, whereas the second one is used to balance the load on the individual processors. The proposed algorithm is tested on a total of 12 problems taken from the literature as well as on three additional benchmark problems. The analysis shows that the proposed algorithm is efficient compared to previously known algorithms. Keywords Makespan · Load balancing · Genetic algorithm · Multiprocessor system · Job scheduling
1 Introduction Developments in software as well as hardware engineering have amplified interest in the use of large multiprocessor systems for databases, realistic P. Panwar (B) · C. Kaushal · A. Singla · V. Rattan Chitkara University Institute of Engineering and Technology, Chitkara University, Rajpura, Punjab, India e-mail: [email protected] C. Kaushal e-mail: [email protected] A. Singla e-mail: [email protected] V. Rattan e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 S. Roy et al. (eds.), Innovations in Computational Intelligence and Computer Vision, Advances in Intelligent Systems and Computing 1424, https://doi.org/10.1007/978-981-19-0475-2_18
simulation, military applications, and wide-ranging industrial processes. Various elements like the operating system and the organization of the synchronized methods form essential portions of these environments. The principal concern in these systems is the development of efficient techniques for distributing the jobs of a job graph over numerous processors. The main difficulty is to schedule the jobs or processes among the different processors to attain certain required objective(s), for example, reducing the total job processing time by minimizing communication intervals and/or maximizing the utilization of resources [1–7]. Moreover, this distribution turns out to be a load balancing problem and must be considered a significant aspect during the design stage of parallel/distributed processors [8–12]. The load on the processing elements in a multiprocessor system is a critical issue. Severely loaded processors use additional energy, which produces greater heat dissipation and can cause additional faults. A good load balancing algorithm executes the jobs as quickly as achievable without overloading any individual processor, delivers suitable resource utilization, and reduces the job completion time [13–29]. Good load balancing avoids extreme load on any processor. Lihua [15] and Zhi-qiang and Sheng-hui [30] focused on the algorithms EDF-FF "Earliest Deadline First-First Fit" and EDF-BF "Earliest Deadline First-Best Fit", respectively. Both algorithms divide jobs using FF "First Fit" and BF "Best Fit", respectively, and make use of the earliest deadline first algorithm to execute jobs on the individual processors of a multiprocessor system after subdividing. The first fit algorithm starts with the first processor and searches for a processor that can execute the job without being overloaded. Best fit, on the other hand, looks for the processor with the highest present load that can accommodate the job without overloading. These algorithms are simple but do not consider load balancing between the processors. Zhang et al. [31] proposed an algorithm with load balancing called LBSA "Local Border Search Algorithm." The processing of simultaneous jobs in a multiprocessor environment comprises a job allotment phase and a job scheduling phase. In the job allotment phase, the jobs are first arranged in ascending order according to their loads. Next, allocation is completed in several stages. In each stage, the available processors are arranged in descending order of their current weights, and then every processor is allocated an individual job from the sorted job list. Later, in the scheduling phase, EDF "Earliest Deadline First" is applied to execute the jobs on a single processor. This algorithm delivers poor load balancing at substantial load. It is observed that both EDF-FF and EDF-BF do not offer load balancing, while LBSA performs poorly in excessive load situations [31]. Keeping these facts in view, our recent research undertakes a genetic algorithm with the following key contributions: • The proposed algorithm, while allocating jobs to processors to minimize the makespan, also tries to maintain load balance in multiprocessor systems. • It is a real-coded genetic algorithm that uses twofold fitness functions, and the analysis shows improvement over other algorithms.
The rest of this paper is structured as follows: Sect. 2 provides a description of the proposed modified genetic algorithm [32] with an additional load balancing feature. The implementation of the proposed technique is given in Sect. 3, and its results are analyzed in Sect. 4. Conclusions based on the present study are finally drawn in Sect. 5.
2 Proposed Algorithm for Load Balancing and Minimizing Makespan The foremost aim of job scheduling is to minimize the total processing time length (i.e., the makespan); it is found that numerous solutions can yield an identical makespan, yet the load balance among processors may not be satisfied in some of them [30, 33–39]. The goal of load balance adjustment is to achieve the lowest schedule length while at the same time fulfilling the load balance requirement. The proposed algorithm consists of five steps: chromosome representation, mapping of jobs to processors, selection of chromosomes, crossover, and mutation. The fitness function used here is a single evaluation function based on the makespan (f) of the schedule, which has to be maximized. However, in the proposed algorithm, the solutions have to satisfy dual fitness functions, which are used in sequential order. The primary fitness function is used for minimizing the overall processing time, and the second fitness function fulfills the load balance requirement among processors. The fitness function for makespan is the same as given in [32], and the second function is suggested here. It is represented as the ratio of the highest schedule length to the mean processing time over all the processors. If the processing time of processor P_j is represented by P_time(P_j), then the mean processing time over all processors is as follows:

Mean = (1/m) Σ_{j=1}^{m} P_time(P_j)    (1)
Here, m is the total count of available processors in system to process the jobs. So, the load balance is calculated as: load_balance =
makespan Mean
(2)
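For concreteness, a minimal Python sketch of Eqs. (1) and (2) is given below; the function name and list representation are illustrative and not part of the original implementation, which was written in C.

```python
def load_balance(proc_times):
    """Load balance of a schedule, given per-processor processing times."""
    makespan = max(proc_times)                 # schedule length
    mean = sum(proc_times) / len(proc_times)   # Eq. (1)
    return makespan / mean                     # Eq. (2)

# The two schedules discussed below both have makespan 23:
print(load_balance([23.0, 17.0, 12.0]))  # ~1.3269 (schedule of Fig. 2)
print(load_balance([11.0, 9.0, 23.0]))   # ~1.6046 (schedule of Fig. 3)
```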
For instance, consider the DAG given in Fig. 1, consisting of nine jobs to be scheduled on a homogeneous multiprocessor system with three processing elements. Assume that the two job scheduling results given in Figs. 2 and 3 are obtained using some algorithm. The makespan of both solutions is equal to 23. For the schedule of Fig. 2, Mean = (23.0 + 17.0 + 12.0)/3.0 ≈ 17.3333
Fig. 1 Sample DAG (directed acyclic graph) with nine tasks
Fig. 2 Job assignment of DAG given in Fig. 1 (makespan 23 and load balance 1.326)
Load_balance = 23.0/17.3333 ≈ 1.3269

For the schedule of Fig. 3, Mean = (11.0 + 9.0 + 23.0)/3.0 ≈ 14.3333 and Load_balance = 23.0/14.3333 ≈ 1.6046. So, according to the Load_balance function, the result in Fig. 2 is superior to the result in Fig. 3. Hence, using this function, we obtain not only the minimum makespan but also better load balance. Therefore, we now propose to use the two fitness functions given in (3) and (4) one after the other, where f_k is the makespan of the kth chromosome, kc is the total number of chromosomes, and where (3) has to be maximized and (4) has to be minimized:

$$\mathrm{evaluate}(V_k) = 1/f_k, \quad k = 1, 2, 3, \ldots, kc \qquad (3)$$

$$\mathrm{evaluate}(V_k) = f_k/\mathrm{Avg}, \quad k = 1, 2, 3, \ldots, kc \qquad (4)$$
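A sketch of how the dual evaluation of Eqs. (3) and (4) might be computed per chromosome is shown below. It assumes Avg denotes the mean processor load of that chromosome's own schedule, consistent with Eq. (1); this is an interpretation, since the paper does not define Avg explicitly.

```python
def dual_fitness(proc_times):
    """Twofold fitness for one chromosome, given its per-processor times."""
    f_k = max(proc_times)                     # makespan of this chromosome
    avg = sum(proc_times) / len(proc_times)   # assumed meaning of Avg
    return 1.0 / f_k, f_k / avg               # Eq. (3) maximized, Eq. (4) minimized

# The two schedules above tie on Eq. (3) but are separated by Eq. (4):
print(dual_fitness([23.0, 17.0, 12.0]))  # (~0.0435, ~1.3269)
print(dual_fitness([11.0, 9.0, 23.0]))   # (~0.0435, ~1.6046)
```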
Fig. 3 Job assignment of DAG given in Fig. 1 (makespan 23 and load balance 1.604)
3 Implementation of the Proposed Algorithm

To assess the performance of our proposed algorithm, we have applied it to twelve job graph examples taken from the literature and to three standard applications acquired from the STG ("Standard Task Graph") repository [40] (given in Table 1). The first program in the STG group consists of the job graph of a robot control program with ninety jobs; the other two are random job graphs with fifty and hundred nodes, respectively. The algorithm is implemented on an Intel processor (2.6 GHz) using the C language. The results obtained are compared in Figs. 4, 5, and 6.
Fig. 4 Comparison of load balance for homogeneous multiprocessor problems (load balance vs. problem number, with and without load balancing)
Table 1 Referred benchmark applications

Benchmark application | No. of jobs | No. of processors | Comment
robot.stg | 90 | 2 | Robot control program
rand0000 | 50 | 2 | Random graph
rand0001 | 100 | 2 | Sparse matrix solver
Fig. 5 Comparison of load balance for heterogeneous multiprocessor problems (load balance vs. problem number, with and without load balancing)

Fig. 6 Comparison of load balance for benchmark problems (load balance vs. problem number, with and without load balancing)
The population size studied is 20, and the maximum number of iterations used is 500. The outcomes achieved by the proposed algorithm are shown in Tables 2 and 3.
4 Analysis of the Results

From the results presented in Tables 2 and 3, we observe that in the case of homogeneous multiprocessor problems without communication cost, there is very minor and practically no effect of load balancing on the makespan (examples 3 and 7 and the benchmark problems). In the case of the homogeneous examples with communication cost, there is an improvement in load balancing in almost every case. In the case of the heterogeneous multiprocessor examples, there is an improvement in the load balance, but it comes at the cost of makespan. However, in example 3 of the heterogeneous type, which assigns eleven jobs to four processors, there is a considerable improvement in the load balance value (from 1.82 to 1.41) without any increase in makespan. This is primarily because, without load balancing, jobs are
Table 2 Results obtained before applying the load balance fitness function

Example no. | No. of tasks | No. of processors | Communication cost | Makespan | Load distribution | Avg load | Load balance

Homogeneous multiprocessor system
1 | 18 | 2 | Yes | 440 | 440, 400 | 420 | 1.05
2 | 9 | 2 | Yes | 23 | 13, 23 | 18 | 1.27
3 | 12 | 2 | No | 60 | 60, 58 | 59 | 1.01
4 | 9 | 3 | Yes | 16 | 10, 12, 16 | 12.66 | 1.26
5 | 9 | 3 | Yes | 160 | 100, 160, 140 | 133.30 | 1.20
6 | 18 | 3 | Yes | 390 | 390, 360, 350 | 366.7 | 1.06
7 | 40 | 4 | No | 208 | 208, 192, 206, 166 | 193 | 1.07

Heterogeneous multiprocessor system
1 | 10 | 3 | Yes | 23 | 15, 23, 12 | 16.66 | 1.38
2 | 10 | 3 | Yes | 73 | 39, 73, 36 | 49.33 | 1.47
3 | 11 | 4 | Yes | 26 | 26, 0, 14, 17 | 14.25 | 1.82
4 | 10 | 2 | Yes | 24 | 24, 20 | 22.00 | 1.09
5 | 11 | 2 | Yes | 56 | 56, 56 | 56 | 1.00

Benchmark examples (homogeneous multiprocessor system)
1 | 90 | 2 | No | 1313 | 1313, 1170 | 1241.5 | 1.05
2 | 50 | 2 | No | 131 | 131, 131 | 131 | 1.00
3 | 100 | 2 | No | 291 | 291, 291 | 291 | 1.00
assigned only to three processors. In general, it has been noticed that load balancing is not required for problems without communication costs but is needed in the case of heterogeneous multiprocessor systems with communication cost.
5 Conclusions and Future Scope

This paper presented a modified genetic algorithm for finding an optimal schedule for multiprocessor job scheduling problems, which tries to balance the load on the different processors by using a load balance function. The algorithm is capable of locating an optimal solution by finding the minimum makespan while balancing the load spread across the processors. Our proposed algorithm is tested on a group of fifteen problems in total. It is noticed that although load balancing can be attempted on both homogeneous and heterogeneous multiprocessor systems, it is more effective in the
Table 3 Results obtained after applying the load balance fitness function

Example no. | No. of tasks | No. of processors | Communication cost | Makespan | Load distribution | Avg load | Load balance

Homogeneous multiprocessor system
1 | 18 | 2 | Yes | 440 | 440, 400 | 420 | 1.05
2 | 9 | 2 | Yes | 23 | 16, 23 | 19.50 | 1.18
3 | 12 | 2 | No | 62 | 62, 62 | 62 | 1.00
4 | 9 | 3 | Yes | 16 | 16, 14, 10 | 13.33 | 1.20
5 | 9 | 3 | Yes | 160 | 140, 160, 100 | 133.33 | 1.20
6 | 18 | 3 | Yes | 390 | 390, 360, 360 | 370.00 | 1.05
7 | 40 | 4 | No | 217 | 216, 202, 217, 201 | 209 | 1.04

Heterogeneous multiprocessor system
1 | 10 | 3 | Yes | 28 | 19, 28, 19 | 22 | 1.27
2 | 10 | 3 | Yes | 90 | 72, 90, 66 | 76 | 1.18
3 | 11 | 4 | Yes | 26 | 26, 13, 16, 19 | 18.50 | 1.41
4 | 10 | 2 | Yes | 25 | 25, 22 | 23.50 | 1.06
5 | 11 | 2 | Yes | 56 | 56, 56 | 56 | 1.00

Benchmark examples (homogeneous multiprocessor system)
1 | 90 | 2 | No | 1313 | 1313, 1289 | 1301 | 1.01
2 | 50 | 2 | No | 131 | 131, 131 | 131 | 1.00
3 | 100 | 2 | No | 291 | 291, 291 | 291 | 1.00
case of homogeneous systems as compared to heterogeneous systems. With small modifications, the proposed algorithm can also be used in networking systems to make efficient utilization of servers.
References 1. Ahmad, I., Kwok, Y.K.: On parallelizing the multiprocessor scheduling problems. IEEE Trans. Parallel Distrib. Syst. 10(4), 414–431 (1999) 2. Akbari, M., Rashidi, H., Alizadeh, S.H.: An enhanced genetic algorithm with new operators for task scheduling in heterogeneous computing systems. Eng. Appl. Artif. Intell. 61, 35–46 (2017) 3. Arora, R.K., Rana, S.P.: Heuristic algorithms for process assignment in distributed computing systems. Inf. Process. Lett. 11(4), 199–203 (1980) 4. Bajaj, R., Agrawal, P.D.: Improving scheduling of tasks in a heterogeneous environment. IEEE Trans. Parallel Distrib. Syst. 15(2), 107–118 (2004) 5. Baskiyar, S., SaiRanga, P.C.: Scheduling directed a-cyclic task graphs on heterogeneous network of workstations to minimize schedule length. In: 2003 International Conference on Parallel Processing Workshops. Proceedings, pp. 97–103. IEEE (2003) 6. Bohler, M., Moore, F.W., Pan, Y.: Improved multiprocessor task scheduling using genetic algorithms. FLAIRS Conference (1999) 7. Chhabra, R., Verma, S., Krishna, C.R.: A survey on driver behavior detection techniques for intelligent transportation systems. In: 2017 7th International Conference on Cloud Computing, Data Science & Engineering-Confluence, pp. 36–41. IEEE (2017) 8. Boregowda, U., Chakravarthy, V.R.: A hybrid task scheduler for DAG applications on a cluster of processors. In: IEEE Fourth International Conference on Advances in Computing and Communications (ICACC) (2014) 9. Braun, T.D., Siegel, H.J., Beck, N., Boloni, L.L., Maheswaran, M., Reuther, A.I., Freund, R.F.: A comparison of eleven static heuristics for mapping a class of independent tasks onto heterogeneous distributed computing systems. J. Parallel Distrib. Comput. 61(6), 810–837 (2001) 10. Brest, J., Zumer, V.: A performance evaluation of list scheduling heuristics for task graphs without communication costs. In: IEEE International Workshops on Parallel Processing (2000) 11. Casavant, T.L., Kuhl, J.G.: A taxonomy of scheduling in general-purpose distributed computing systems. IEEE Trans. Softw. Eng. 14(2), 141–154 (1988) 12. Chitra, P., Venkatesh, P., Rajaram, R.: Comparison of evolutionary computation algorithms for solving bi-objective task scheduling problem on heterogeneous distributed computing systems. Sadhana 36(2), 167–180 (2011) 13. Kwok, Y.K., Ahmad, I.: Benchmarking and comparison of the task graph scheduling algorithms. J. Parallel Distrib. Comput. 59(3), 381–422 (1999) 14. Kwok, Y.K.: Parallel program execution on a heterogeneous PC cluster using task duplication. In: IEEE 9th Workshop on Heterogeneous Computing (2000) 15. Lihua, X.Y.: The First Fit Algorithm for Distributing Dependent Tasks in Multiprocessors System, 3rd edn. Journal of Qingdao University Engineering & Technology Edition (1996) 16. Liu, C.H., Li, C.F., Lai, K.C., Wu, C.C.: Dynamic critical path duplication task scheduling algorithm for distributed heterogeneous computing systems. In: Proceedings of the 12th IEEE International Conference on Parallel and Distributed Systems, pp. 365–374 (2006) 17. Lu, H., Carey, M.J.: Load-balanced task allocation in locally distributed computer systems. University of Wisconsin-Madison, Computer Sciences Department (1986) 18. Luo, J., Dong, F., Cao, J., Song, A.: A novel task scheduling algorithm based on dynamic critical path and effective duplication for pervasive computing environment. Wirel. Commun. Mob. Comput. 10(10), 1283–1302 (2010) 19. 
Mehrabi, A., Mehrabi, S., Mehrabi, A.D.: An adaptive genetic algorithm for multiprocessor task assignment problem with limited memory. In: Proceedings of World Congress on Engineering and Computer Science, vol. 2, pp. 1018–1023 (2009) 20. Montazeri, F., Salmani-Jelodar, M., Fakhraie, S.N. and Fakhraie, S.M.: Evolutionary multiprocessor task scheduling. In: International Symposium on Parallel Computing in Electrical Engineering (PARELEC’06). IEEE, pp. 68–76 (2006)
21. Omara, F.A., Arafa, M.M.: Genetic algorithms for task scheduling problem. J. Parallel Distrib. Comput. 70(1), 13–22 (2010) 22. Polychronopoulos, C.D., Kuck, D.J.: Guided self-scheduling: a practical scheduling scheme for parallel supercomputers. IEEE Trans. Comput. 100(12), 1425–1439 (1987) 23. Price, C.C., Krishnaprasad, S.: Software Allocation Models for Distributed Computing Systems. In: ICDCS, pp. 40–48 (1984) 24. Qinma, K., He, H.: A novel discrete particle swarm optimization algorithm for meta-task assignment in heterogeneous computing systems. Microprocess. Microsyst. 35(1), 10–17 (2011) 25. Rath, C.K., Biswal, P., Suar, S.S.: Dynamic task scheduling with load balancing using genetic algorithm. In: 2018 International Conference on Information Technology (ICIT), pp. 91–95. IEEE (2018) 26. Ritchie, G., Levine, J.: A hybrid ant algorithm for scheduling independent jobs in heterogeneous computing environments. In: Proceedings of 23rd Workshop of the UK Planning and Scheduling Special Interest Group (2004) 27. Saadat, A., Masehian, E.: Load balancing in cloud computing using genetic algorithm and fuzzy logic. In: 2019 International Conference on Computational Science and Computational Intelligence (CSCI), pp. 1435–1440. IEEE (2019) 28. Sagar, G., Sarje, A.K., Ahmed, K.U.: On module assignment in two-processor distributed systems: a modified algorithm. Inf. Process. Lett. 32(3), 151–153 (1989) 29. Sandhya, S., Cauvery, N.K.: Dynamic load balancing by employing genetic algorithm. Cloud Reliability Engineering: Technologies and Tools, p. 221 (2021) 30. Zhi-qiang, X., Sheng-hui, L.: A scheduling algorithm based on ACPM and BFSM. Appl. Sci. Technol. 30(30), 36–38 (2003) 31. Zhang, K., Qi, B., Jiang, Q., Tang, L.: Real-time periodic task scheduling considering loadbalance in multiprocessor environment. In: 2012 3rd IEEE International Conference on Network Infrastructure and Digital Content, pp. 247–250. IEEE (2012) 32. Panwar, P., Lal, A.K., Singh, J.: A Genetic algorithm based technique for efficient scheduling of tasks on multiprocessor system. In: Proceedings of the International Conference on Soft Computing for Problem Solving (SocProS 2011), 20–22 December 2011, pp. 911–919. Springer, New Delhi (2012) 33. Sulaiman, M., Halim, Z., Lebbah, M., Waqas, M., Tu, S.: An Evolutionary computing-based efficient hybrid task scheduling approach for heterogeneous computing environment. J. Grid Comput. 19(1), 1–31 (2021) 34. Topcuoglu, H., Hariri, S., Wu, M.Y.: Performance-effective and low-complexity task scheduling for heterogeneous computing. IEEE Trans. Parallel Distrib. Syst. 13(3), 260–274 (2002) 35. Yadav, P.K., Jumindera, S., Singh, M.P.: An efficient method for task scheduling in computer communication network. Int. J. Intell. Inform. Process. 3(1), 81–89 (2009) 36. Yang, J., Ma, X., Hou, C., Yao, Z.: A Static Multiprocessor scheduling algorithm for arbitrary directed task graphs in uncertain environments. In: International Conference on Algorithms and Architectures for Parallel Processing, pp. 18–29. Springer, Berlin (2008) 37. Yellapu, G., Penmetsa, S.K.: Modeling of a scheduling problem with expected availability of resources. OPSEARCH 1(11) (2015). https://doi.org/10.1007/s12597-015-0203-z 38. Yuming, X., Li, K., Khac, T.T., Qiu, M.: A multiple priority queueing genetic algorithm for task scheduling on heterogeneous computing systems. In: IEEE 9th International Conference on Embedded Software and Systems (HPCC-ICESS), pp. 639–646 (2012) 39. 
Zhu, D., Huang, H., Yang, S.X.: Dynamic task assignment and path planning of multi-AUV system based on an improved self-organizing map and velocity synthesis method in threedimensional underwater workspace. IEEE Trans. Cybern. 43(2), 504–514 (2013) 40. http://www.Kasahara.Elec.Waseda.ac.jp/schedule/
Robust Image Tampering Detection Technique Using K-Nearest Neighbors (KNN) Classifier Prabhu Bevinamarad and Prakash H. Unki
Abstract Today, image tampering has become very common due to easily available, inexpensive electronic gadgets and various online and offline sophisticated multimedia editing tools. Many people edit the contents of an original image to hide the truth and circulate it on social media without knowing its implications for the feelings of individuals and society. Also, people often produce tampered images as proof documents to deceive and take advantage of online facilities. Therefore, we have proposed a robust tampered image detection system to recognize whether a given input image is tampered or not. The proposed scheme uses orthogonal transforms, i.e., the discrete cosine transform (DCT) and singular value decomposition (SVD), to extract significant image features, and a k-nearest neighbors (KNN) classifier to classify forged from un-forged images accurately. To validate the robustness of the proposed system, we have considered image samples from two publicly available benchmark datasets, i.e., the copy-move forgery dataset (CMFD) [1] and the Columbia image splicing dataset [2]. Based on the results obtained during the training and testing phases, the proposed system achieves better detection results in terms of F1-score, i.e., 92.89 and 93.73 for the CMFD and Columbia image splicing datasets, respectively. Keywords Copy-move forgery · Audio forgery · KNN classifier
P. Bevinamarad (B) Department of Computer Science and Engineering, BLDEA’s V.P. Dr. P.G. Halakatti College of Engineering and Technology (Affiliated to Visvesvaraya Technological University, Belagavi-590018, Karnataka), Vijayapura, Karnataka 586103, India e-mail: [email protected] P. H. Unki Department of Information Science and Engineering, BLDEA’s V.P. Dr. P.G. Halakatti College of Engineering and Technology (Affiliated to Visvesvaraya Technological University, Belagavi-590018, Karnataka), Vijayapura, Karnataka 586103, India © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 S. Roy et al. (eds.), Innovations in Computational Intelligence and Computer Vision, Advances in Intelligent Systems and Computing 1424, https://doi.org/10.1007/978-981-19-0475-2_19
1 Introduction

In the present situation, multimedia data has become one of the prominent parts of various digital automated systems. The multimedia content may be an image, audio, or a combination of audio and video. Due to the availability of various sophisticated multimedia editing tools, individuals can edit multimedia data easily without much effort. For instance, consider a copy-move image tampering operation, where a particular region of an image is copied and pasted at a different location. In an image splicing tampering operation, by contrast, two distinct image parts are joined to create the tampered image. Similarly, audio and video can be tampered with, through audio copy-move, audio splicing forgery, copy-move forgery in video, frame interchange in video, and video splicing [3, 4]. Among the different kinds of multimedia content tampering, image tampering operations are very common and easy to perform and are widely used by many forgers. Therefore, in order to detect and prevent forgery practices, various image forgery detection techniques have evolved during the past decade; they are grouped into two types, block-based and keypoint-based techniques. In the former, an entire image is split into non-overlapping irregular or regular shape blocks. Each image block is processed to extract significant features. Then, an appropriate block-matching technique is adopted to detect the forgery [5, 6]. In the latter, the significant points of an image are extracted and matched to find the correlated pixels present in an image [7, 8]. The key contributions of this paper include the following.
• The DCT is employed to convert an image from the spatial to the spectral domain to remove redundant data and capture more significant features of an image.
• Each 8 × 8 overlapping block is subdivided into 2 × 2 non-overlapping blocks, and SVD is applied to extract features, which are averaged in order to reduce the feature size.
The remaining part of this article is arranged as follows: In Sect. 2, we discuss the existing image tampering detection schemes. Section 3 describes the proposed technique and illustrates a step-by-step procedure along with a flow diagram, while Sect. 4 covers the experimental setup and analysis of results, and at last Sect. 5 concludes the work carried out in this paper.
2 Existing Image Tampering Detection Schemes

In the last decade, many researchers have developed various techniques to find whether a given input image has undergone any tampering operation, along with tampered region detection. As per the literature review, the first block-based image tampering detection technique was developed by Fridrich et al. [9] in 2003. Later, many techniques were gradually developed with different feature extraction and matching algorithms to improve the detection accuracy and decrease the computational complexity.
Therefore, this section describes the most recent and relevant techniques that evolved during the last decade. Muhammad et al. [10] proposed a method based on the steerable pyramid transform (SPT) and a texture descriptor, and a similar method [11] using a combination of SPT and the local binary pattern (LBP) to extract the features of each SPT sub-band and train a support vector machine (SVM) to identify whether a given input image is tampered or authentic. Lee et al. [12] developed a method based on the histogram of orientated gradients to extract statistical features, with duplicated regions identified using similarity matching. Pun et al. [13] introduced adaptive over-segmentation to divide the image into irregular non-overlapping blocks and extract feature points in order to identify the forgery. Uliyan et al. [14] developed a methodology combining the Hessian and center-symmetric local binary pattern techniques to extract image features and detect duplicated regions present in an image. Alhussein [15] developed a method to capture local textures in the form of LBP; histograms of the patterns are prepared to form a feature vector and fed to an extreme learning machine (ELM) classifier to detect image tampering. Alahmadi et al. [16] proposed an LBP- and DCT-based methodology to extract discriminative localized image features; the SVM is used for training and testing to detect copy-move and image splicing forgeries present in an image. Lin et al. [17] developed a scheme that extracts combined features using the local intensity order pattern (LIOP) and the scale-invariant feature transform (SIFT); a transitive matching technique is then employed to detect duplicated regions efficiently. Liu et al. [18] developed a GPU-based convolutional kernel network (CKN) to detect copy-move forgeries present in an image. Huang and Ciou [19] developed a keypoint-based image forgery detection scheme employing the Helmert transformation and the simple linear iterative clustering (SLIC) algorithm. Priyanka et al. [20] used DCT and SVD techniques to extract reduced features fed to an SVM for training, and a K-means machine learning technique was employed to highlight the forgery region present in an image.
3 Proposed Image Tampering Detection Scheme

The proposed system employs DCT to extract significant features of an image and the SVD technique to obtain a reduced feature dimension. The main motivation to use DCT is that it enables a high degree of spectral compaction at the qualitative level. As a result, an image transformed using DCT concentrates its energy into fewer coefficients than the discrete Fourier and wavelet transforms. The extracted features are combined to form a feature descriptor matrix and, finally, fed to a KNN classifier to train on and predict image tampering. Figure 1 shows an operational model of the proposed image tampering detection scheme. The following subsections describe in detail the significance of each step.
Fig. 1 Flow diagram of proposed image tampering detection scheme
3.1 Preprocessing

Usually, the handling of RGB image data becomes very expensive in terms of storing and processing the data. Indeed, each RGB image pixel is represented in three different layers that correspond to the red, green, and blue color intensities. Therefore, to handle the image data efficiently and reduce the processing time, the colored images are converted into grayscale using the weighted method shown in Eq. (1),

$$Y = 0.2989\,\mathrm{Red} + 0.5870\,\mathrm{Green} + 0.1140\,\mathrm{Blue} \qquad (1)$$

where Red, Green, and Blue represent the color components of an RGB image.
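A one-line NumPy rendering of Eq. (1) is given below as an illustrative sketch; the paper's own implementation is in MATLAB.

```python
import numpy as np

def to_grayscale(rgb):
    """Weighted RGB-to-grayscale conversion of Eq. (1); rgb is H x W x 3."""
    return rgb[..., 0] * 0.2989 + rgb[..., 1] * 0.5870 + rgb[..., 2] * 0.1140
```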
3.2 Block Tiling

In this step, the converted grayscale image of size W × H is split into overlapping blocks of size Bw × Bh (8 × 8) by keeping stride = 1 and iterating the sliding-window process until it reaches the last row and column of the grayscale image. The total number of overlapping blocks (TOB) obtained using the block tiling process for a selected input image is defined in Eq. (2) as follows:

$$\mathrm{TOB} = (W - B_w + 1) \times (H - B_h + 1) \qquad (2)$$

The parameters W and H define the width and height of the grayscale image, and Bw and Bh indicate the width and height of each overlapping block (8 × 8). Therefore, each overlapping block obtained during the block tiling process contains eight consecutive rows and eight consecutive columns of pixel values from the grayscale image, represented by a block Bi of size 8 × 8, where i = 1, 2, . . ., TOB.
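The tiling step can be sketched as follows (illustrative Python, assuming a 2-D grayscale array; the original work uses MATLAB):

```python
def tile_blocks(gray, bw=8, bh=8):
    """All overlapping bw x bh blocks of a grayscale image, stride 1."""
    H, W = gray.shape
    blocks = [gray[r:r + bh, c:c + bw]
              for r in range(H - bh + 1)
              for c in range(W - bw + 1)]
    assert len(blocks) == (W - bw + 1) * (H - bh + 1)  # TOB of Eq. (2)
    return blocks
```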
3.3 Feature Acquisition and Dimensionality Reduction

In this step, each overlapping block Bi obtained from the block tiling process is transformed by DCT and quantized using the default quantization matrix to group intervals of data into a single value, in order to acquire significant image features. Equation (3) defines the process of quantization and rounding to the nearest integer:

$$QB_i[w,h] = \mathrm{round}\!\left(\frac{B_i[w,h]}{Q[w,h]}\right), \quad w, h \in \{0, 1, 2, \ldots, 7\} \qquad (3)$$
Next, each 8 × 8 DCT-transformed overlapping block is divided into 2 × 2 non-overlapping sub-blocks, each sub-block is decomposed using SVD, and the greatest diagonal (singular) value of each is selected and averaged to form a reduced feature vector, in order to train the KNN model effectively and minimize the classification time. For instance, an overlapping block of size 8 × 8 is divided into 16 non-overlapping sub-blocks of size 2 × 2. Applying SVD on each sub-block, selecting the greatest diagonal value, and averaging, we obtain a 1 × 1 reduced feature from the 1 × 64 values corresponding to each overlapping block. Equation (4) defines the singular value decomposition as follows:

$$M = U S V^{H} \qquad (4)$$

where U and V denote the unitary matrix parts and S denotes the diagonal matrix containing the singular values.
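The per-block feature of this section might be computed as sketched below. The quantization matrix Q here is a stand-in placeholder, an assumption on our part, since the paper only says "default quantization matrix" (presumably the standard JPEG luminance table):

```python
import numpy as np
from scipy.fftpack import dct

Q = np.full((8, 8), 16.0)  # placeholder for the default quantization matrix

def block_feature(block):
    """Reduce one 8 x 8 block to a single averaged singular-value feature."""
    d = dct(dct(block, axis=0, norm='ortho'), axis=1, norm='ortho')  # 2-D DCT
    qb = np.round(d / Q)                                             # Eq. (3)
    # sixteen 2 x 2 sub-blocks: keep the largest singular value of each
    svs = [np.linalg.svd(qb[r:r + 2, c:c + 2], compute_uv=False)[0]
           for r in range(0, 8, 2) for c in range(0, 8, 2)]
    return float(np.mean(svs))   # 1 x 64 values reduced to a single feature
```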
3.4 Formation of Feature Descriptor Matrix

In this step, the extracted features of each overlapping block are stored column-wise to represent the feature vector of a single image. The entire feature descriptor matrix for 'n' images is constructed by repeatedly applying the feature acquisition and dimensionality reduction step described in Sect. 3.3 and storing the image features row-wise. Each row of the feature descriptor matrix corresponds to an image, and each column represents the feature of one overlapping block of that particular image.
3.5 Image Tampering Prediction Using KNN Classifier

Classification is the strategy of analyzing given data and categorizing it according to established criteria. In the proposed work, we have employed the KNN classifier, which uses some reference data and predicts the category to which a sample data record belongs. The KNN algorithm computes the distance between the sample data and all other reference data, and the object is classified based on the plurality vote of its neighbors. For instance, if k = 5, the algorithm looks at the '5' closest records in the reference data, and the majority class in this group of 'k' data records becomes the class of the sample record. In KNN, the parameter 'k' indicates the number of nearest neighbors to consider for comparison. Usually, the value of 'k' is selected as an odd number such as 1, 3, or 5 to avoid ties and unnecessary delay in comparisons.
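A minimal scikit-learn sketch of this prediction step is shown below; the arrays are placeholders, and the authors' actual implementation is in MATLAB:

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

X_train = np.random.rand(64, 100)          # placeholder feature descriptor rows
y_train = np.random.randint(0, 2, 64)      # 0 = authentic, 1 = tampered
knn = KNeighborsClassifier(n_neighbors=5)  # odd k avoids ties
knn.fit(X_train, y_train)
x_query = np.random.rand(1, 100)           # features of one query image
print(knn.predict(x_query))                # predicted class of the query
```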
4 Experimental Setup and Result Analysis

The proposed tampering detection technique is implemented on a system having an Intel Core (TM) i3-5005U processor with 2.0 GHz speed and an internal memory of size 4.00 GB, using MATLAB® version 2015a with the Image Processing Toolbox™. The evaluation of the proposed scheme is done using the publicly available CMFD [1] and Columbia image splicing [2] datasets. The CMFD dataset contains 48 high-resolution images in PNG format. It consists of the original and corresponding forged images along with ground truth images. The Columbia image splicing dataset contains a total of 1845 authentic and forged images of a fixed size of 128 × 128 in BMP file format. Initially, we have taken a total of 91 images from the CMFD dataset [1] and created training and testing data. The training dataset includes a total of 64 images, of which 34 are original and 30 are tampered images. The testing dataset includes 27 completely distinct images, consisting of 14 original and 13 tampered images. Similarly, we have prepared tampered image splicing training and testing data using the Columbia image splicing dataset [2] as described above.
Fig. 2 Authentic and tampered image samples from CMFD benchmark dataset [1] (a original image, b tampered image)

Fig. 3 Authentic and tampered image samples from Columbia image splicing dataset [2] (a original image, b tampered image)
The image samples shown in Figs. 2 and 3 are original and corresponding tampered images taken from the CMFD and Columbia image splicing datasets, respectively. During the training phase, all the images of the respective database are selected, and the significant features of each image are extracted by applying both the quantized DCT and SVD techniques without using any segmentation. The extracted features are stored row-wise to form a feature descriptor matrix as described in Sect. 3.4. Finally, the entire feature descriptor matrix is fed to the KNN classifier to train the model, which is then employed to detect tampered images accurately. At the time of testing, an input query image is randomly selected from the respective testing dataset, and its features are extracted and fed to the KNN classifier to predict the result. The obtained
Table 1 Detection results comparison with existing techniques evaluated at image level using CMFD benchmark dataset [1]

Existing techniques | Precision (%) | Recall (%) | F1-score (%)
Li [22] | 78.33 | 96.92 | 87.04
Pun et al. [13] | 89.28 | 79.61 | 89.73
Priyanka et al. [20] | 90.03 | 97.12 | 93.00
Proposed system | 89.61 | 95.12 | 92.89
Table 2 Detection results comparison with existing techniques evaluated at image level using Columbia image splicing dataset [2]

Existing techniques | Precision (%) | Recall (%) | F1-score (%)
Singh and Tripathi [23] | 79 | 72 | 74
Lynch et al. [24] | 94 | 92 | 93
Priyanka et al. [20] | 91 | 95 | 93
Proposed system | 91.23 | 95.15 | 93.73
results are analyzed using benchmark parameters, namely precision, recall, and F1-score, described in [13, 21] and tabulated in Tables 1 and 2 separately. Figure 4 shows the performance comparison graph of the proposed technique and existing techniques evaluated using the CMFD and Columbia image splicing datasets. It is noticed from Table 2 that the proposed system yields better results for the Columbia image splicing dataset in terms of precision, recall, and F1-score compared to the existing systems. However, the results obtained for the CMFD tampering dataset in Table 1 are slightly lower than those of the existing techniques. Hence, a better feature extraction technique needs to be derived to represent an image.
5 Conclusion

The proposed scheme uses the block tiling process to divide the given input query image into 8 × 8 fixed-size overlapping blocks. The block features are extracted and reduced using the DCT and SVD techniques. Finally, the extracted features are fed into the KNN classifier to train it and predict the results. It is clear from the experimental results tabulated in Sect. 4 that the KNN classifier yields better results than existing systems in terms of the performance metrics and is also robust against various geometrical transformations and post-processing operations. Nevertheless, tampering can be made even more difficult to detect using various post-processing operations and blends thereof. Hence, in the future, we plan to
Fig. 4 Results comparison between proposed technique and existing techniques evaluated using a CMFD benchmark dataset [1] and b Columbia image splicing dataset [2] (F1-score in % per technique)
incorporate deep learning techniques such as convolutional neural networks (CNNs) and CapsuleNet modules to increase the power of classification and mark the forgery regions.
References 1. Image manipulation dataset, Department of Computer Science, Friedrich Alexander University. Available at https://www5.cs.fau.de/research/data/image-manipulation. Accessed on 16th Jan 2018 2. Columbia image splicing detection evaluation dataset, DVMM Laboratory of Columbia University. Available at https://www.ee.columbia.edu/ln/dvmm/AuthSplicedDataSet/photographers. html. Accessed on 19th Apr 2018 3. Bevinamarad, P.R., Shirdhonkar, M.S.: Audio Forgery detection techniques: present and past review. In: IEEE 4th International Conference on Trends in Electronics and Informatics (ICOEI2020), pp. 609–614 (2019) 4. Bevinamarad, P.R., Mulla, M.U.: Review of techniques for the detection of passive video Forgeries. Int. J. Sci. Res. Comput. Sci. Eng. Inf. Technol. 2, 199–203 (2017) 5. Tralic, D., Zupancic, I., Grgic, S., Grgic, M.: CoMoFoD—new database for copy-move forgery detection. In: Proceedings Elmar—International Symposium on Electronics in Marine, no. September, pp. 49–54 (2013) 6. Ryu, S.J., Kirchner, M., Lee, M.J., Lee, H.K.: Rotation invariant localization of duplicated image regions based on Zernike moments. IEEE Trans. Inf. Forensics Secur. 8(8), 1355–1370 (2013) 7. Amerini, I., Ballan, L., Caldelli, R., Del Bimbo, A., Serra, G.: A SIFT-based forensic method for copy-move attack detection and transformation recovery. IEEE Trans. Inf. Forensics Secur. Part 2 6(3), 1099–1110 (2011)
8. Pan, X., Lyu, S.: Region duplication detection using image feature matching. IEEE Trans. Inf. Forensics Secur. 5(4), 857–867 (2010) 9. Fridrich, J., Soukal, D., Lukáš, J.: Detection of copy-move forgery in digital images. In: Proceedings of Digital Forensic Research Workshop, Cleveland, Ohio, USA (2003) 10. Muhammad, G., Al-Hammadi, M.H., Hussain, M., Mirza, A.M., Bebis, G.: Copy move image forgery detection method using steerable pyramid transform and texture descriptor. In: IEEE EuroCon 2013, no. July, pp. 1586–1592 (2013) 11. Muhammad, G., Al-Hammadi, M.H., Hussain, M., Bebis, G.: Image forgery detection using steerable pyramid transform and local binary pattern. Mach. Vis. Appl. 25(4), 985–995 (2014) 12. Lee, J.C., Chang, C.P., Chen, W.K.: Detection of copy-move image forgery using histogram of orientated gradients. Inf. Sci. (NY) 321, 250–262 (2015) 13. Pun, C.M., Yuan, X.C., Bi, X.L.: Image forgery detection using adaptive over segmentation and feature point matching. IEEE Trans. Inf. Forensics Secur. 10(8), 1705–1716 (2015) 14. Uliyan, D.M., Jalab, H.A., Abdul Wahab, A.W.: Copy move image forgery detection using Hessian and center symmetric local binary pattern. In: ICOS 2015—2015 IEEE Conference on Open Systems, pp. 7–11 (2016) 15. Alhussein, M.: Image tampering detection based on local texture descriptor and extreme learning machine. In: Proceedings—2016 UKSim-AMSS 18th International Conference on Computer Modeling and Simulation, UKSim 2016, pp. 196–199 (2016) 16. Alahmadi, A., Hussain, M., Aboalsamh, H., Muhammad, G., Bebis, G., Mathkour, H.: Passive detection of image forgery using DCT and local binary pattern. Signal Image Video Process. 11(1), 81–88 (2017) 17. Lin, C., et al.: Copy-move forgery detection using combined features and transitive matching. Multimed. Tools Appl. 78(21), 30081–30096 (2019) 18. Liu, Y., Guan, Q., Zhao, X.: Copy-move forgery detection based on convolutional kernel network. Multimed. Tools Appl. 77(14), 18269–18293 (2018) 19. Huang, H.Y., Ciou, A.J.: Copy-move forgery detection for image forensics using the superpixel segmentation and the Helmert transformation. EURASIP J. Image Video Process. 1, 2019 (2019) 20. Priyanka, Singh, G., Singh, K.: An improved block based copy-move forgery detection technique. Multimed. Tools Appl. 79(19–20), 13011–13035 (2020) 21. Christlein, V., Riess, C., Jordan, J., Riess, C., Angelopoulou, E.: An evaluation of popular copy-move forgery detection approaches. IEEE Trans. Inf. Forensics Secur. 7(6), 1841–1854 (2012) 22. Li, Y.: Image copy-move forgery detection based on polar cosine transform and approximate nearest neighbor searching. Forensic Sci. Int. 224(1–3), 59–67 (2013) 23. Singh, V.K., Tripathi, R.C.: Fast and efficient region duplication detection in digital images using subblocking method. Int. J. Adv. Sci. Technol. 35, 93–102 (2011) 24. Lynch, G., Shih, F.Y., Liao, H.-Y.M.: An efficient expanding block algorithm for image copymove forgery detection. Inf. Sci. 239, 253–265 (2013)
LIARx: A Partial Fact Fake News Data Set with Label Distribution Approach for Fake News Detection Sharanya Venkat, Richa, Gaurang Rao, and Bhaskarjyoti Das
Abstract In a world where most information is received digitally through various online sources, the authenticity of the available information always casts an atmosphere of doubt over it. The rapid dissemination of fake news has led to swayed opinions, influenced decisions and misled beliefs. As humans learn how to combat fake news, it is increasingly turning out to be subtly distorted from the truth. Existing approaches address this continuum by defining multiple classes depending on the extent of fakeness. This paper proposes a novel approach of Label Distribution Learning to emphasise the degree of importance of each class in the underlying distribution rather than producing an affirmative classification. It showcases this approach by proposing a partial fact data set consisting of instances labelled with their truth values and leverages the non-binary nature of Label Distribution Learning. Keywords Label Distribution Learning · Partial fake news data set · Multi-label learning · Pre-trained embedding
1 Introduction Inferring a classification function from a set of training data with clearly defined labelled instances is known as single-label learning (SLL) [12] where one label is assigned to a particular instance. Recently, the problem of label ambiguity has become significant in machine learning. Label ambiguity is when several candidate labels can be assigned to a single instance. To tackle this problem, another variant called multi-label learning (MLL) [14] has emerged which assumes each instance can be associated with multiple class labels. MLL is different from multi-class learning as it does not have the mutual exclusivity constraint that multi-class learning has.
S. Venkat · Richa · G. Rao (B) · B. Das PES University, 100 ft Ring Road, Banashankari Stage III, Bengaluru, Karnataka 560085, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 S. Roy et al. (eds.), Innovations in Computational Intelligence and Computer Vision, Advances in Intelligent Systems and Computing 1424, https://doi.org/10.1007/978-981-19-0475-2_20
Essentially, both SLL and MLL consider the relation between instance and label to be binary [3, 7]. However, there are a variety of real-world tasks involving instances that are associated with labels to different degrees of importance. Hence, associating an instance with a soft label instead of a hard one seems to be a reasonable solution. Recent advances on this front have resulted in a novel learning paradigm called Label Distribution Learning (LDL) [5]. LDL tackles the problem of label ambiguity by proposing a more general approach based on the assumption that any instance can have multiple labels associated with it, even though the extent to which each label is associated with the instance may vary. Single-label learning and multi-label learning can be considered special cases of LDL, which is a rather general learning framework and answers how much of each label an instance can belong to, as opposed to which labels an instance belongs to. LDL is different from partial multi-label learning (PML) [17], which is focused on coping with the noisy label problem of multi-label data sets. After the 2016 US elections, which witnessed widespread effects of fake news, fake news detection became a popular research topic in the machine learning community, resulting in many data sets and linguistic models [10]. However, over the last few years, fake news has started taking more complex forms as propagators began combining fake news with real news to bypass existing fake news detectors. It is no longer possible to entirely classify an instance as fake or not [11]. Hence, increasingly, fake news is conceptualised as a continuum. Fact-checking websites classify articles into multiple classes based on the extent of fakeness (Snopes, 12 levels; Politifact, 6 levels). The various fake news data sets also follow a similar strategy of defining multiple classes of fakeness, i.e. LIAR (6), FEVER (3), Buzzfeednews (4), Buzzface (4), BS detector (10), etc. The recently published data set Fakenewsnet [13], however, has only two labels, though it offers additional information such as propagation context, multi-modal content and spatiotemporal information. This work employs an approach completely different from defining multiple classes, i.e. Label Distribution Learning. The key contributions of the paper are summarized as follows: 1. LDL is used to classify instances in a partial fact data set to detect what per cent of the instance is true or fake. Though only two labels are used, this approach can be easily scaled to many labels as described above. 2. To showcase this approach, this work proposes a partial fact data set called LIARx, improving on the LIAR data set by Wang [15] in the domain of US politics. The original LIAR data set has not been updated, thus consisting of outdated data. The LIARx data set proposed in this paper solves that problem and also extends the LIAR data set to contain more recent data. The data set is published according to the FAIR guiding principles [16].1 To display the research done, the paper is organised in the following manner: starting with
Sharanya Venkat, Richa, Gaurang Rao, & Bhaskarjyoti Das. (2021). LIARx: A Partial Fact Fake News Data set (Version v1.0) [Data set]. Zenodo. http://doi.org/10.5281/zenodo.4723768.
related work in Sect. 2, the paper reviews existing work on the research problem and inspiration for solutions. The paper then presents the data set created and used for the purpose of this research in Sect. 3. Further, the implementation details are discussed in Sect. 4, along with the evaluation of results obtained, in Sect. 5. This paper finally presents possible future work and the conclusion in Sect. 6.
2 Related Work

The type of fake news considered in this work is different from content that can be classified as polarised, satire, misreporting by professional news organisations, rumour, commentary, opinion, citizen journalism, etc. The definition [21] of fake news considered in this work is intentionally false news that can be verified as fake. Essentially, fake news has three aspects, i.e. news, intention and authenticity. A typical fake news life cycle consists of three stages, i.e. creation of an article about the news, publication of the same (on social media and on publishing websites) and propagation. The first stage is closely associated with the knowledge in the article and the credibility of the source. The second stage is associated with linguistic style, whereas the third stage is associated with the characteristics of the spreaders' network. Hence, fake news detection efforts have used all four of these signals and, recently, combinations of them. Molina et al. [9] provide a comprehensive description of the fake news indicators in four areas, i.e. content, structural information from publishing web pages, source (independence, pedigree, verifiable or not) and network. The content captures important aspects like factuality, evidence, journalistic style, and lexical and syntactic patterns. The Label Distribution Learning work proposed in this paper is inspired by the work of Geng [5], which proposed a novel approach to solve the problem of label ambiguity while dealing with data that involves a distribution of labels. Researchers have used LDL in different problems with multi-label cases, such as emotion recognition from facial expressions [22], crowd estimation where crowd images have multiple labels [19], and face age estimation [6] where multiple age labels could be provided to face images. Recently, Liao et al. [8] used the LDL framework in a foetal age prediction problem to address congenital anomalies. In other recent work, Xu et al. [18] use topological information in feature space to recover the label distribution information and finally use a multi-class predictive model. Chen et al. [2], while addressing label ambiguity in facial expression data sets, adopt a similar approach of first recovering the label distribution from the label space of a few well-defined tasks. Though partial fake news can be investigated using all indicators, i.e. knowledge, source, style and propagation, the work in this research paper is limited to the linguistic style aspect. The style-based fake news detection approach can make use of both non-latent features in traditional machine learning approaches and latent features using deep learning approaches. Recent work by Zhou et al. [20] showed that latent features generally outperform non-latent features in fake news detection.
Hence, the same approach has been adopted in this work. In the proposed LIARx data set, the distribution of labels of true and fake news is well defined; i.e. additional effort to recover label distribution information is not required. Every instance of the data set has three possibilities, i.e. two fake statements, two non-fake statements and one fake statement with one non-fake statement.
3 LIARx Data Set While LIAR [15] generously encompassed several features into their data set, LIARx proposed here is put forward with the idea of showcasing the feasibility of LDL to capture the non-binary nature of fakeness based solely on the information itself. This work identifies the source of derivation of the LIAR data set, and by using that as a starting point, the following methodology is devised to generate the LIARx data set.
3.1 Extending LIAR Data Set The LIARx data set extends from the LIAR data set, which contains data only until 2017 (when the data set was published), and adds newer data points, containing data until the end of January 2021. LIARx is published after updating the data from LIAR, thus containing more recent data.1
3.2 Preprocessing

The preprocessing step involves basic cleaning such as removing quotation marks, extra white space, and unwanted symbols. Lemmatization is performed to convert every term to its base form. The text is not converted to lowercase, as some entities are recognised by the capitalisation of the first letter. All stop words present in the data set are removed, as defined by the spaCy library for the English language. A crucial step of pronoun disambiguation is performed, which converts pronouns to their equivalent entities. For instance, in the sentence "Barack Obama says he's going to reduce long-term debt and deficit by $4 trillion", upon disambiguation of the pronoun he's to Barack Obama, the sentence converts to "Barack Obama says Barack Obama's going to reduce long-term debt and deficit by $4 trillion". A minimal sketch of this pipeline is given below.
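The sketch below covers cleaning, lemmatization, and stop-word removal with spaCy; the pronoun disambiguation step would require a separate coreference-resolution component that is not shown here.

```python
import spacy

nlp = spacy.load("en_core_web_sm")

def preprocess(text):
    # basic cleaning: strip quotation marks and collapse extra white space
    doc = nlp(" ".join(text.replace('"', "").split()))
    # lemmatize each token and drop spaCy's English stop words
    return " ".join(tok.lemma_ for tok in doc if not tok.is_stop)

print(preprocess('Barack Obama says "he is going to reduce long-term debt"'))
```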
Table 1 Comparing LIAR, extended LIAR and LIARx: number of data points

 | LIAR | Extended LIAR | LIARx (PF)
Test | 1283 | 1752 | 865
Train | 10,269 | 14,016 | 6918
Validate | 1284 | 1752 | 865
Total | 12,836 | 17,520 | 8648
3.4 Label Distribution The set of labels employed in this study are {True, False}. To get the label distribution, the proportions of true and false facts are estimated. In the partial fact data set LIARx, for MLL, completely true, partially true and completely false instances will have labels [1,0], [1,1], [0,1]. For LDL, the distribution for the above cases will be represented by [1,0], [0.5,0.5] and [0,1], respectively. The original test–train– validate ratios of statements in LIAR are maintained for the LIARx data set that consists of 8.6k instances (Table 1).
4 Implementation 4.1 Label Distribution Learning In Label Distribution Learning, each instance is linked with a real-valued vector where each element of the vector represents the extent of association of a label with the instance. If xi denotes the i th instance and y j denotes the j th label, the extent
226
S. Venkat et al.
of association of the label y j to the instance xi is denoted by the real-valued vector y Vi whose elements are denoted by Vxi j . Each element of this real-valued vector is a value from within the interval [0, 1] and represents the degree of description for a particular label. For any instance xi , the sum of all such elements of the LDL vector is 1. The primary task at hand is to learn the distribution of the instances in the data set in order to model the distribution for new instances. Kullback–Leibler (KL) divergence is used as the loss function to calculate the difference between the distribution of the instance and the underlying distribution of the data set. KL divergence is calculated as N P(xi ) . p(xi ) log D K L (PQ) = Q(xi ) i=1 where Q(x) is the approximation of the distribution of the input instance and P(x) is the true distribution of the data set we are interested in matching Q(x) to.
4.2 Pre-trained Embeddings To pass the statements into the LDL model, the statement is required to be encoded into vectors, which is performed using BERT [4] a textual encoder. BERT is very famous in a wide variety of NLP tasks and makes use of transformers, an attention mechanism that learns relations between words (or sub-words), based on surrounding context in given text. BERT is available with different architectures in the form of pre-trained models, using a masked language modelling (MLM) objective. BERTas-a-Service is used, allowing one to easily encode the textual data using BERT models. During the experimentation under this project, BERT base-cased was used which contains 12 layers of 768 hidden neurons each along with 12 attention heads, resulting in 109 million parameters (Fig. 1).
4.3 Architecture The input statements are encoded into vectors using BERT that are fed to the machine learning model. A neural network consisting of two layers(excluding the output layer) is used, and the encoded vectors of the training section of the data set are fed into this network. The neural network itself is made up of a layer of 64 neurons followed by a layer of 32 neurons. Techniques such as dropout and early stopping are employed during the training of this network to make the model robust. The results mentioned in Sect. 5 are based on this architecture.
LIARx: A Partial Fact Fake News Data Set …
227
Fig. 1 Proposed LDL architecture Table 2 Accuracy in percentage for various model architectures Model Model type M1 M2 M3 Model M4
Single-label learning(SLL):BERT Label Distribution Learning(LDL):BERT Multi-label learning(MLL):BERT Model type Multi-label learning(MLL): BERT+Transformer
Accuracy 50.8% 83.4% 30.1% LRAP 0.93
5 Evaluation Table 2 compares the performance of various models used to classify textual instances related to US politics. Models M1, M2 and M3 use accuracy while model M4 uses label ranking average precision (LRAP) score as a measure of evaluation. LRAP measures the average precision of the predictive model by examining the label ranking of each sample. Model M1 uses an SLL approach to classify the instances of the partial fact data set and is not adequate to accommodate composite statements consisting of partial facts and hence shows poor results. Model M2 employs LDL which is able to classify an instance with a degree of importance associated with all labels, hence outperforming the previous model and producing superior results. Model M3 uses a naive MLL approach, whereas model M4 improvises on M3 and uses a multi-label classification pre-trained transformer model from Simple Transformer library and achieves a LRAP score of 0.93. Models M1, M3 and M4 use binary cross-entropy, whereas M2 uses KL divergence as the loss function. Since M4 uses binary labels, the LRAP measure does not adequately capture the non-binary label distribution, resulting in model M2 as the baseline. It achieves an accuracy of 83.4%.
228
S. Venkat et al.
6 Conclusion and Future Work This paper proposes a partial fact data set that addresses the non-binary nature of fake news. In order to learn the degree of veracity of each instance, a label distribution approach is investigated to model this task, instead of approaches such as SLL and MLL that are binary in nature. Bidirectional encoder representations from transformers (BERT) embeddings are used to capture the context Of the text from left-to-right and right-to-left. The model seems to perform significantly well corroborating the significance of applying Label Distribution Learning in this use case. With the introduction of this data set, future research is encouraged to look into more refined models that can leverage this approach to generate stronger classifiers. This data set has its limitations, as this research takes the liberty of combining only two statements from a pool of similar statements, thus limiting the range of values that the distribution could take. Further work could be done in this area to ensure a more general distribution and not limit its range. The existing fake news data sets that address the fakeness continuum by creating multiple classes can also be re-examined using this approach.
References 1. Campello, R.J., Moulavi, D., Sander, J.: Density-based clustering based on hierarchical density estimates. In: Pacific-Asia Conference on Knowledge Discovery and Data Mining, pp. 160– 172. Springer, Berlin (2013) 2. Chen, S., Wang, J., Chen, Y., Shi, Z., Geng, X., Rui, Y.: Label distribution learning on auxiliary label space graphs for facial expression recognition. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 13984–13993 (2020) 3. Cheng, W., Dembczynski, K., Hüllermeier, E.: Graded multilabel classification: the ordinal case. In: ICML (2010) 4. Devlin, J., Chang, M.W., Lee, K., Toutanova, K.: Bert: Pre-training of Deep Bidirectional Transformers for Language Understanding. arXiv preprint arXiv:1810.04805 (2018) 5. Geng, X.: Label distribution learning. IEEE Trans. Knowl. Data Eng. 28(7), 1734–1748 (2016) 6. He, Z., Li, X., Zhang, Z., Wu, F., Geng, X., Zhang, Y., Yang, M.H., Zhuang, Y.: Data-dependent label distribution learning for age estimation. IEEE Trans. Image Process. 26(8), 3846–3858 (2017) 7. Hüllermeier, E., Fürnkranz, J., Cheng, W., Brinker, K.: Label ranking by learning pairwise preferences. Artif. Intelligence 172(16–17), 1897–1916 (2008) 8. Liao, L., Zhang, X., Zhao, F., Lou, J., Wang, L., Xu, X., Zhang, H., Li, G.: Multi-branch deformable convolutional neural network with label distribution learning for fetal brain age prediction. In: 2020 IEEE 17th International Symposium on Biomedical Imaging (ISBI), pp. 424–427. IEEE (2020) 9. Molina, M.D., Sundar, S.S., Le, T., Lee, D.: “Fake news” is not simply false information: a concept explication and taxonomy of online content. In: American Behavioral Scientist, p. 0002764219878224 (2019) 10. Pérez-Rosas, V., Mihalcea, R.: Experiments in open domain deception detection. In: Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, pp. 1120–1125 (2015) 11. Potthast, M., Kiesel, J., Reinartz, K., Bevendorff, J., Stein, B.: A Stylometric Inquiry into Hyperpartisan and Fake News. arXiv preprint arXiv:1702.05638 (2017)
LIARx: A Partial Fact Fake News Data Set …
229
12. Read, J., Pfahringer, B., Holmes, G., Frank, E.: Classifier chains for multi-label classification. Mach. Learn. 85(3), 333 (2011) 13. Shu, K., Mahudeswaran, D., Wang, S., Lee, D., Liu, H.: Fakenewsnet: a data repository with news content, social context, and spatiotemporal information for studying fake news on social media. Big Data 8(3), 171–188 (2020) 14. Tsoumakas, G., Katakis, I.: Multi-label classification: an overview. Int. J. Data Warehousing Mining (IJDWM) 3(3), 1–13 (2007) 15. Wang, W.Y.: "Liar, liar pants on fire": A New Benchmark Dataset for Fake News Detection. arXiv preprint arXiv:1705.00648 (2017) 16. Wilkinson, M.D., Dumontier, M., Aalbersberg, I.J., Appleton, G., Axton, M., Baak, A., Blomberg, N., Boiten, J.W., da Silva Santos, L.B., Bourne, P.E., et al.: The fair guiding principles for scientific data management and stewardship. Sci. Data 3(1), 1–9 (2016) 17. Xie, M.K., Huang, S.J.: Partial multi-label learning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 32 (2018) 18. Xu, N., Liu, Y.P., Geng, X.: Partial multi-label learning with label distribution. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 34, pp. 6510–6517 (2020) 19. Zhang, Z., Wang, M., Geng, X.: Crowd counting in public video surveillance by label distribution learning. Neurocomputing 166, 151–163 (2015) 20. Zhou, X., Jain, A., Phoha, V.V., Zafarani, R.: Fake news early detection: a theory-driven model. Digital Threats: Res. Pract. 1(2), 1–25 (2020) 21. Zhou, X., Zafarani, R.: A survey of fake news: fundamental theories, detection methods, and opportunities. ACM Comput. Surv. (CSUR) 53(5), 1–40 (2020) 22. Zhou, Y., Xue, H., Geng, X.: Emotion distribution recognition from facial expressions. In: Proceedings of the 23rd ACM International Conference on Multimedia, pp. 1247–1250 (2015)
A Block-Based Data Hiding Technique Using Convolutional Neural Network P. V. Sabeen Govind and M. V. Judy
Abstract In this paper, we propose a block-based data hiding method using a convolutional neural network. Convolutional neural networks are exceptional at learning features automatically, which is why this kind of deep learning network has recently seen widespread use across numerous research areas. Initially, the cover image is divided into non-overlapping blocks. A convolutional neural network classifies each block as either a smooth block or a texture block. Changes made to texture areas are normally less perceptible to the human eye than changes made to smooth areas, so our embedding algorithm selects only the texture blocks for the data embedding process. As a result, the visual distortion created by embedding is low, and a high-quality stego image is obtained. Experimental results show that the proposed method produces high-quality stego images, with an average 2 dB increase in peak signal-to-noise ratio compared with recent works in this domain. Keywords Cover image · Data hiding · Convolution · Embedding
1 Introduction Data hiding is a prominent technique for secret communication [1] in which a sender uses a cover image as a bearer to disguise secret information. The central challenge in this research is that the embedding algorithm inevitably creates some distortion in the cover image, so researchers in this domain are continually working toward better algorithms that reduce the overall distortion. Edge-based steganography [2] is an approach in which edge pixels are used for data embedding, since changes in edge areas are normally unnoticeable to the human visual system. Canny, Sobel, and Prewitt are the most common edge detection methods [3].
P. V. Sabeen Govind (B), Department of Computer Applications, Cochin University of Science and Technology, Kochi, Kerala, India, e-mail: [email protected] P. V. Sabeen Govind · M. V. Judy, Rajagiri College of Social Sciences (Autonomous), Kalamassery, Kerala, India © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 S. Roy et al. (eds.), Innovations in Computational Intelligence and Computer Vision, Advances in Intelligent Systems and Computing 1424, https://doi.org/10.1007/978-981-19-0475-2_21
In [4], a fuzzy technique was used for edge detection, extracting a larger number of edges. Dadgostar and Afsari [5] also used a fuzzy-based approach for edge detection, dividing the cover image into edge and non-edge regions and embedding a larger payload into the edge regions. The potential of a data hiding algorithm is normally evaluated by its payload capacity, the total number of bits that can be concealed using the given algorithm, and by the amount of distortion created by the embedding process. Generally, a higher payload leads to more distortion and is more vulnerable to attackers. In this digital era, security is a major concern when sending data. Encryption is a technique in which the original message is transformed into another format, and numerous noteworthy works have recently been published in this domain. In [6], a novel encryption scheme based on cellular automata was introduced, and the algorithm resists all kinds of attacks. Data hiding is an alternative to encryption: the sender conceals the secret data inside a cover medium, so the very existence of the data is not noticeable to any third party. Researchers continue to work in this area to enhance algorithms in terms of capacity and imperceptibility. The major contributions of this work are as follows: • A convolutional neural network (CNN) is used for cover image block classification. • Only texture blocks are used for the embedding process, so that the overall distortion is minimal. • The algorithm produces high-quality stego images. The paper is structured as follows. The proposed work is given in Sect. 2, detailed experimental results and discussion are given in Sect. 3, and finally, the concluding remarks are given in Sect. 4.
2 Proposed Method Figure 1 shows the block diagram of the proposed technique. In the block processing phase, the cover image is divided into non-overlapping blocks of size 32 × 32. A convolutional neural network (CNN) is designed to classify these blocks into two categories: soft blocks and texture blocks. Figure 2 shows a sample cover image with a smooth block (marked in red) and a texture block (marked in blue). A CNN is a type of deep learning network [7]. Grayscale test images from the USC-SIPI image database [8] were used, and 3000 texture blocks and 3000 soft blocks of size 32 × 32 were extracted from various test images. The ground-truth labels were initially assigned based on two parameters, block variance and entropy [9]: if both variance and entropy are low, the block is treated as a soft block; otherwise, it is a texture block. These image patches are used for training and testing the CNN. Figure 3 shows sample soft and texture blocks used for training the CNN. A CNN has convolutional layers, pooling layers, and fully connected layers. In our architecture, four convolutional layers with kernel size 3 × 3 and ReLU activation
Fig. 1 Block diagram of the proposed method: the cover image passes through block processing and the convolutional neural network (CNN), which routes each soft block to block copying and each texture block to data embedding of the secret data, producing the stego image
Fig. 2 Sample cover image with smooth and texture block marked
function [10] were used, along with three pooling layers with pool size 2 × 2 and, in the fully connected part, one flatten layer and two dense layers with ReLU activation. 2500 soft blocks and 2500 texture blocks were used for training the network, the remaining 500 blocks of each class were used for testing, and a classification accuracy of 94% was obtained. Only the texture blocks are used for data embedding, while each soft block is copied directly into the stego image. To evaluate the performance of the proposed technique, least significant bit (LSB) substitution [11] was used for data embedding. The algorithm compares the LSB of each pixel in a texture block with the secret bit that the sender wants to embed: if both are equal, the LSB is kept unchanged; otherwise, the LSB is flipped accordingly. After block copying and embedding, a high-quality stego image is obtained. The receiver uses the same CNN to identify the soft and texture blocks and then extracts the data from the LSBs of the texture blocks.
Fig. 3 Sample soft block and texture block
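The paper gives the layer recipe but no code; the following is a minimal TensorFlow/Keras sketch of the described block classifier, together with the variance/entropy rule used to produce the initial labels. The filter counts, dense-layer widths, thresholds, and optimizer are illustrative assumptions not stated in the paper.

import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, models

def label_block(block, var_thresh=100.0, ent_thresh=4.0):
    # Initial ground-truth labeling (Sect. 2): low variance AND low
    # entropy -> soft block (0), otherwise texture block (1).
    # Both threshold values are assumptions.
    hist, _ = np.histogram(block, bins=256, range=(0, 256))
    p = hist[hist > 0] / hist.sum()
    entropy = -np.sum(p * np.log2(p))
    return 0 if block.var() < var_thresh and entropy < ent_thresh else 1

def build_block_classifier():
    # Four 3x3 conv layers (ReLU), three 2x2 pooling layers, one flatten
    # layer, and two ReLU dense layers, as described above; a final
    # softmax separates soft blocks from texture blocks.
    model = models.Sequential([
        layers.Input(shape=(32, 32, 1)),
        layers.Conv2D(32, (3, 3), activation="relu", padding="same"),
        layers.MaxPooling2D((2, 2)),
        layers.Conv2D(64, (3, 3), activation="relu", padding="same"),
        layers.MaxPooling2D((2, 2)),
        layers.Conv2D(64, (3, 3), activation="relu", padding="same"),
        layers.MaxPooling2D((2, 2)),
        layers.Conv2D(128, (3, 3), activation="relu", padding="same"),
        layers.Flatten(),
        layers.Dense(128, activation="relu"),
        layers.Dense(64, activation="relu"),
        layers.Dense(2, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model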
3 Experimental Results Test images from the USC-SIPI image database [8] were used for conducting the experiments. We used the MATLAB R2021a environment, and the secret data were generated using a random function. Peak signal-to-noise ratio (PSNR) [12] and structural similarity index (SSIM) [13] are the commonly used metrics to evaluate the quality of a stego image. Table 1 shows the number of edge pixels detected using the Canny, Sobel, Prewitt, LOG, and fuzzy-based algorithms in two standard 512 × 512 test images, Lena and Baboon. Figure 4 shows the test images Lena and Baboon.
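The experiments themselves were run in MATLAB; purely as an illustration, the LSB substitution of Sect. 2 and the PSNR metric can be sketched in a few lines of Python (the row-major, one-bit-per-pixel embedding order is an assumption):

import numpy as np

def embed_lsb(texture_block, bits):
    # Replace the LSB of each pixel with one secret bit (row-major order).
    flat = texture_block.astype(np.uint8).flatten()
    n = min(flat.size, len(bits))
    flat[:n] = (flat[:n] & 0xFE) | np.asarray(bits[:n], dtype=np.uint8)
    return flat.reshape(texture_block.shape)

def psnr(cover, stego, peak=255.0):
    # Peak signal-to-noise ratio in dB between cover and stego images.
    mse = np.mean((cover.astype(np.float64) - stego.astype(np.float64)) ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(peak ** 2 / mse)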
Table 1 Edge detection methods

Edge detection algorithm   Total pixels   Number of edge pixels in Lena   Number of edge pixels in Baboon
Canny                      262,144        6256                            11,848
Sobel                      262,144        2035                            2833
Prewitt                    262,144        2017                            2764
LOG                        262,144        5635                            9512
Fuzzy based                262,144        12,807                          13,339
Fig. 4 Test image Lena and Baboon
Using our CNN, the numbers of texture blocks (block size 32 × 32) identified in Lena and Baboon were 82 and 124, respectively. This provides far more embeddable pixels than the edge pixels available for data embedding in Table 1: for example, the 124 texture blocks of Baboon contain 124 × 1024 = 126,976 pixels, compared with at most 13,339 edge pixels. Table 2 shows the performance comparison with two recent works, where 8 KB of secret information is embedded in the texture blocks of 256 × 256 test images and the PSNR value is tabulated. The higher PSNR values in Table 2 indicate that the proposed method gives high-quality stego images. Figure 5 shows two stego images, Tree and House. Figure 6 graphically depicts the comparison of the proposed method with [1, 14]; the proposed method yields a higher PSNR on all the test images. Table 2 Performance comparison at 8 KB
Test image   Joshi et al. [1]   Sabeen Govind et al. [14]   Proposed
House        49.35              55.02                       57.12
Tree         49.29              57.39                       58.17
Lena         49.37              54.43                       56.41
Baboon       49.38              56.17                       58.74
Moon         49.33              56.12                       58.25
Fig. 5 Stego image Tree and House
Fig. 6 PSNR comparison
4 Conclusions A block-based data embedding method using a CNN is proposed in this work. Only texture blocks are used for embedding the secret data, so the overall distortion is very low and the visual quality of the stego image is remarkably good. High-capacity algorithms can be used for data embedding in the texture blocks to improve the payload capacity. In the future, we would like to classify the cover image blocks into smooth, hard, and highly textured blocks. In addition, edge detection techniques can be applied within the hard and highly textured blocks, and those pixels can be used for embedding so that the visual quality is improved further.
References 1. Joshi, K., Gill, S., Yadav, R.: A new method of image steganography using 7th bit of a pixel as indicator by introducing the successive temporary pixel in the gray scale image. J. Comput. Netw. Commun. 1–10 (2018) 2. Islam, S., Modi, M.R., Gupta, P.: Edge-based image steganography. EURASIP J. Inf. Secur. 1–14 (2014) 3. Kumar, S., Singh, A., Kumar, M.: Information hiding with adaptive steganography based on novel fuzzy edge identification. Defence Technol. 162–169 (2019) 4. Vanmathi, C., Prabu, S.: Image steganography using fuzzy logic and chaotic for large payload and high imperceptibility. Int. J. Fuzzy Syst. 20, 460–473 (2018) 5. Dadgostar, H., Afsari, F.: Image steganography based on interval-valued intuitionistic fuzzy edge detection and modified LSB. J. Inf. Secur. Appl. 30, 94–104 (2016) 6. Roy, S., Shrivastava, M., Pandey, C.V., et al.: IEVCA: an efficient image encryption technique for IoT applications using 2-D Von-Neumann cellular automata. Multimed. Tools Appl. (2020). https://doi.org/10.1007/s11042-020-09880-9 7. Sarvamangala, D.R., Kulkarni, R.V.: Convolutional neural networks in medical image understanding: a survey. Evol. Intel. (2021). https://doi.org/10.1007/s12065-020-00540-3 8. USC-SIPI Image Database, University of Southern California. Available online at http://sipi.usc.edu/database 9. Atee, H.A., Ahmad, R., Noor, N.M., Rahma, N.M.S., Aljeroudi, Y.: Extreme learning machine based optimal embedding location finder for image steganography. PLOS ONE (2017). https://doi.org/10.1371/journal.pone.0170329 10. Lin, G., Shen, W.: Research on convolutional neural network based on improved ReLU piecewise activation function. Proc. Comput. Sci. 977–984 (2018) 11. Swain, G.: Very high capacity image steganography technique using quotient value differencing and LSB substitution. Arab J. Sci. Eng. 44, 2995–3004 (2019) 12. Sabeen Govind, P.V., Judy, M.V.: A secure framework for remote diagnosis in health care: a high capacity reversible data hiding technique for medical images. Comput. Electr. Eng. 89, 106933 (2021) 13. Wang, Z., Bovik, A.C., Sheikh, H.R., Simoncelli, E.P.: Image quality assessment: from error visibility to structural similarity. IEEE Trans. Image Process. 13(4), 600–612 (2004). https://doi.org/10.1109/TIP.2003.819861 14. Sabeen Govind, P.V., Varghese, B.M., Judy, M.V.: A high imperceptible data hiding technique using quorum function. Multimed. Tools Appl. (2021). https://doi.org/10.1007/s11042-021-10780-9
Energy-Efficient Adaptive Sensing Technique for Smart Healthcare in Connected Healthcare Systems Duaa Abd Alhussein, Ali Kadhum Idrees , and Hassan Harb
Abstract Nowadays, maintaining good health is one of the main concerns of government health monitoring systems. These systems are based on the data gathered by biosensors deployed on the bodies of patients. One of the main challenges for health-based sensing applications is the large and heterogeneous volume of data gathered by these biomedical sensors. Since biosensors have limited memory, energy, and computation resources and transmit a large amount of data periodically, it is important to reduce the transmitted data to save energy while preserving the accuracy of the data at the coordinator. In this paper, an energy-efficient adaptive sensing technique (EASeT) for smart health care in connected healthcare systems is proposed. EASeT is executed at each biosensor and works in rounds, where each round contains two periods. It consists of two steps: emergency detection and biosensor sensing adaptation. First, EASeT employs the National Early Warning Score (NEWS) to eliminate redundant sensed medical data before transmitting them to the coordinator. Second, an adaptive sensing rate algorithm is applied after every two periods to adjust the sensing rate according to the patient's situation over the two consecutive periods. Simulation results based on real sensed patient data show that the proposed EASeT technique can reduce the transmitted data and decrease the consumed energy while maintaining suitable data accuracy at the coordinator, in comparison with an existing approach. Keywords Internet of Things (IoT) · Wireless body sensor networks (WBSNs) · Connected health care · Emergency detection · Adaptive sensing rate
D. A. Alhussein · A. Kadhum Idrees (B) Department of Computer Science, University of Babylon, Babylon, Iraq e-mail: [email protected] H. Harb Computer Science Department, American University of Culture and Education (AUCE), Nabatiyeh/Tyre, Lebanon e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 S. Roy et al. (eds.), Innovations in Computational Intelligence and Computer Vision, Advances in Intelligent Systems and Computing 1424, https://doi.org/10.1007/978-981-19-0475-2_22
1 Introduction Recently, the world has faced several challenges, such as growing numbers of patients and aged persons and an increasing population, that affect public health and make the task of hospitals and medical staff very difficult. In the last few years, health care has received a lot of interest from governments, companies, and individuals, who have spent considerable money to provide distinct health services and applications [1]. The rapid development of IoT technologies, biosensor devices, and big data techniques has led to the growth and emergence of healthcare systems known as connected health care [2–5]. This technology provides easy and cheap solutions to monitor and track a patient wherever he or she is and at any time, and allows experts to access patient data remotely. The healthcare system is composed of a group of biosensors that monitor the patient's situation, sense vital signs such as respiration, oxygen saturation, heart rate, blood pressure, and temperature, and then send them to the coordinator for further analysis and processing [2, 6]. The connected healthcare application faces some important challenges, such as decreasing the energy consumed by the biosensor devices to guarantee monitoring of the patient for as long as possible, and quickly detecting a patient emergency and reporting it to the medical experts so that an appropriate decision can be made. To deal with these challenges, this paper proposes an energy-efficient adaptive sensing technique (EASeT) for smart health care in IoT networks. EASeT integrates two energy-saving approaches: data reduction with patient emergency detection, and adaptive sensing at the biosensor. The first phase aims to discover a patient emergency and eliminate repetitive medical data before sending the data to the coordinator. The second phase performs adaptive sensing based on the similarity between the scores of the last two periods. The remainder of this paper is organized as follows. Related work is reviewed in the next section. Section 3 describes the proposed EASeT technique. The simulation results and analysis are presented in Sect. 4. Section 5 presents the conclusions and future work.
2 Related Literature One of the effective solutions in hospitals is to use connected health care to store and process the sensed vital signs of patients so that appropriate life-saving decisions can be made. Related work has focused on compression methods to reduce the huge volume of data [7, 8], aggregation [9, 10], and prediction methods [11]. In [11], the authors propose a technique named priority-based compressed data aggregation (PCDA) to minimize the sensed medical data; they employ compressed sensing together with cryptography to compress the data while preserving the quality of the received data.
The work in [12–14] presented adaptive sampling with risk evaluation to decide how to monitor patients using WBSNs. The authors proposed a framework in which the biosensors gather medical data and the patient's risk is derived using fuzzy logic; finally, they presented an algorithm that makes decisions according to the level of patient risk. Shawqi and Idrees [15, 16] introduced a power-aware sampling method using several biosensors to assess the patient's risk and determine the best decision for notifying the medical experts: first, multisensor sampling based on a weighted scores model is introduced, and then a decision-making algorithm is applied at the coordinator. The works in [17–21] proposed adaptive sampling approaches for WSNs, employing similarity measures and data mining techniques to measure the similarity between the datasets of two periods and change the sampling rate accordingly. In [22], the authors combined two efficient methods, divide and conquer (D&C) and clustering: D&C is applied at the sensor nodes, and an enhanced K-means is then applied at the cluster node to remove redundant data and save energy before sending the data to the sink. The authors in [7, 8] proposed lossless compression methods for EEG data in IoT networks: in [7], fractal compression methods are combined with differential encoding, and in [8], Huffman encoding and clustering are combined to further reduce the compressed data before sending it over the IoT network.
3 Proposed Work The proposed EASeT technique for smart health in IoT networks is explained in more detail in this section. The major task of the medical staff is to monitor patients and make decisions that prevent critical situations which could lead to patient death.
3.1 Early Warning Score The early warning score (EWS) is a physiological scoring system used by medical staff within hospitals to identify patients at a critical level of risk so that they can be given medical attention and suitable care. The National Early Warning Score (NEWS) is based on six physiological variables: oxygen saturation, respiratory rate, systolic blood pressure, pulse rate, temperature, and level of consciousness or new confusion. NEWS is employed as a method to score the patient's vital signs in order to assess risk while saving data, and it serves as a communication tool that summarizes the patient's status so that a decision about the required care can be made [23]. The EWS can be applied easily to identify a patient's risk
Table 1 National Early Warning Score (NEWS)

Physiological parameter    3        2         1          0          1          2          3
Respiration rate           ≤8       —         9–11       12–20      —          21–24      ≥25
Oxygen saturation          ≤91      92–93     94–95      ≥96        —          —          —
Any supplemental oxygen    —        Yes       —          No         —          —          —
Temperature                ≤35.0    —         35.1–36.0  36.1–38.0  38.1–39.0  ≥39.1      —
Systolic BP                ≤90      91–100    101–110    111–219    —          —          ≥220
Heart rate                 ≤40      —         41–50      51–90      91–110     111–130    ≥131
Level of consciousness     —        —         —          A          —          —          V, P, or U
level and thereby determine which patients require suitable care in the hospital. The EWS is a tool for determining the patient's state based on several physiological parameters: respiratory rate, systolic blood pressure (BP), temperature, and pulse rate [24]. Table 1 shows the National Early Warning Score (NEWS) [25], which is employed by the proposed EASeT technique.
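To make the use of Table 1 concrete, the respiration-rate row can be scored as follows. This is a sketch covering only this one vital sign (the other rows follow the same pattern), and the function name is ours, not the paper's.

def news_respiration_score(rate):
    # NEWS sub-score for respiration rate (breaths/min), per Table 1.
    if rate <= 8:
        return 3
    if rate <= 11:
        return 1
    if rate <= 20:
        return 0
    if rate <= 24:
        return 2
    return 3  # rate >= 25

# The series used in the example of Sect. 3.2:
# [16, 10, 21, 25] -> scores [0, 1, 2, 3]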
3.2 Emergency Detection of the Patient In connected healthcare systems, the patient's vital signs (e.g., blood pressure, temperature) are gathered by various biomedical sensors deployed on the patient's body. Each biosensor senses its measures from the patient's body and then sends them to the coordinator periodically, so the coordinator receives a large amount of data per period. It is therefore important to eliminate similar data before forwarding the measures to the coordinator; this avoids complex analysis of huge data volumes and saves the biosensors' energy. A biosensor could send to the medical staff only the critical measures whose NEWS scores are greater than 0 and exclude the normal measures from transmission; this would reduce the measures transmitted to the coordinator but would prevent periodic monitoring of the patient's state. This problem is solved by exploiting the relationship between the measures gathered during a period before transmitting them to the coordinator. In this paper, we employ the patient emergency detection algorithm [14] at each biosensor, with some modification, to check the scores of the collected measures and transmit a measure only when its score changes. Algorithm 1 shows the patient emergency detection implemented at each biosensor.
Algorithm 1 Patient Emergency Detection
Require: M: the set of measures collected during one period
Ensure: TM: transmitted measures; S: scores of the transmitted measures
1: prev_s ← NEWS(M1)
2: TM ← TM ∪ {M1}
3: S ← S ∪ {prev_s}
4: TransmitToCoordinator(M1)
5: UpdateBiosensorEnergy()
6: for each measure Mi ∈ M do // i = 2, …, N
7:   curr_s ← NEWS(Mi)
8:   if curr_s ≠ prev_s then
9:     S ← S ∪ {curr_s}
10:    TM ← TM ∪ {Mi}
11:    TransmitToCoordinator(Mi)
12:    UpdateBiosensorEnergy()
13:    prev_s ← curr_s
14:  end if
15: end for
16: return TM, S

The following example illustrates the patient emergency detection algorithm. Suppose a respiratory rate biosensor captures a series of 10 measures during a period: (16, 16, 14, 10, 11, 11, 21, 22, 24, 25). According to the NEWS table, the scores of the captured measures are (0, 0, 0, 1, 1, 1, 2, 2, 2, 3). The biosensor therefore transmits the measures (16, 10, 21, 25): the first measure of the period, even though its score is 0, and every measure at which the score changes. In this way, the patient's critical records reach the medical staff.
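A minimal Python rendering of Algorithm 1 is given below. The news_score and transmit parameters stand in for the NEWS lookup (e.g., news_respiration_score above) and the radio call, and the energy bookkeeping is omitted for brevity; none of these names are from the paper.

def emergency_detection(measures, news_score, transmit):
    # Send the first measure of the period, then only those measures
    # whose NEWS score differs from the last transmitted score.
    tm, scores = [measures[0]], [news_score(measures[0])]
    transmit(measures[0])
    prev_s = scores[0]
    for m in measures[1:]:
        curr_s = news_score(m)
        if curr_s != prev_s:
            tm.append(m)
            scores.append(curr_s)
            transmit(m)
            prev_s = curr_s
    return tm, scores

# Reproducing the example above:
tm, s = emergency_detection([16, 16, 14, 10, 11, 11, 21, 22, 24, 25],
                            news_respiration_score, transmit=lambda m: None)
# tm == [16, 10, 21, 25], s == [0, 1, 2, 3]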
3.3 Adapting Sensing Frequency In medical applications, the data gathered by the biosensors are correlated with the patient's condition; the more stable the patient's condition, the more redundant data will be sent. The patient's health risk can be at one of the following levels: (1) low risk refers to the normal state of the patient, which requires little attention from the doctors; (2) medium risk refers to a state between normal and critical, which requires close attention to the patient's state by the medical staff; and (3) high risk refers to a dangerous state of the patient, which requires continuous monitoring. The data collected by the biosensors will be similar when the patient's condition stays at low or high risk, so a large amount of redundant data will be sent by these biosensors, leading to energy loss and burdening the medical staff. To eliminate this problem, the redundant data are removed by adapting the sensing rate of the biosensors according to the dynamic change of the monitored patient's state.
The proposed adaptive sensing rate algorithm in the EASeT technique works in rounds, where each round contains two periods. The algorithm therefore computes the percentage of similarity between the scores of the measures of the two periods. The similarity between the two periods' scores is calculated using the edit distance. The edit distance (ED), also named the Levenshtein distance, is a method for measuring the similarity between two strings [26]. Algorithm 2 shows the dynamic programming algorithm for calculating the edit distance between two sets of data.

Algorithm 2 Edit Distance
Require: set1, set2: two sets of measures during one round (2 periods)
Ensure: dis: the distance between set1 and set2
1: dp ← 0
2: for i ← 0 to Length(set1) do
3:   for j ← 0 to Length(set2) do
4:     if i = 0 then
5:       dp[i, j] ← j
6:     else if j = 0 then
7:       dp[i, j] ← i
8:     else if set1[i−1] = set2[j−1] then
9:       dp[i, j] ← dp[i−1, j−1]
10:    else
11:      dp[i, j] ← 1 + min(dp[i, j−1], dp[i−1, j], dp[i−1, j−1])
12:    end if
13:  end for
14: end for
15: return dp[Length(set1), Length(set2)]

The time requirement of the edit distance is O(Length(set1) × Length(set2)), and the storage requirement is also O(Length(set1) × Length(set2)); the latter can be reduced to O(min(Length(set1), Length(set2))) by noting that at any moment the algorithm needs only two columns (or two rows) in memory. Algorithm 3 shows the adaptive sensing rate procedure executed at every sensor node at the end of each round.

Algorithm 3 Adaptive Sensing Rate
Require: M1, M2: two sets of measures during one round (2 periods); Smin: minimum sensing rate; Smax: maximum sensing rate
Ensure: Srate: new sensing rate
1: DistDP ← EditDistance(M1, M2)
2: DistDP ← Length(M1) − DistDP
3: SimRate ← DistDP / Smax
4: Psampling ← (1 − SimRate) × 100
5: if Psampling < Smin then
6:   Srate ← (Rangerow × Smin) / 100
7: else
8:   Srate ← (Rangerow × Psampling) / 100
9: end if
10: return Srate
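A compact Python sketch of Algorithms 2 and 3 follows, implementing the two-row memory optimization noted above. Rangerow is kept as a parameter exactly as in the pseudocode; nothing here goes beyond what the pseudocode specifies.

def edit_distance(a, b):
    # Levenshtein distance keeping only two rows in memory,
    # i.e., O(min(len(a), len(b))) storage.
    if len(a) < len(b):
        a, b = b, a
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        curr = [i] + [0] * len(b)
        for j, y in enumerate(b, 1):
            if x == y:
                curr[j] = prev[j - 1]
            else:
                curr[j] = 1 + min(curr[j - 1], prev[j], prev[j - 1])
        prev = curr
    return prev[-1]

def adaptive_sensing_rate(m1, m2, s_min, s_max, range_row):
    # Algorithm 3: the more similar the two periods' score sequences,
    # the lower the sampling percentage, floored at Smin.
    dist_dp = len(m1) - edit_distance(m1, m2)
    sim_rate = dist_dp / s_max
    p_sampling = (1 - sim_rate) * 100
    return range_row * max(p_sampling, s_min) / 100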
4 Results and Discussions This section presents the results of the proposed EASeT technique using a custom simulator written in Python. The simulation uses real medical biosensor measures obtained from the Multiple Intelligent Monitoring in Intensive Care (MIMIC) dataset of PhysioNet [27]. Several performance measures are used to evaluate the proposed EASeT technique: adaptation of the sensing rate versus data reduction, energy consumption, and data integrity. EASeT is compared with an existing method introduced by Habib et al., named modified LED* [14]. In this simulation, EASeT uses a period of length 100 s and 70 periods (about 2 h). Smin and Smax are 10 and 50 measures per period, respectively. EASeT uses patient record 267n, and the experiments are performed only on the respiration biosensor, taking both high- and low-risk conditions into account.
4.1 Adaptation of Sensing Rate Versus Data Reduction In this experiment, the sensing rate adaptation of the biosensor and the reduction in the transmitted data are studied. Figure 1 presents the results of the proposed EASeT technique for two types of patients, normal (a) and critical (b), compared with modified LED* for the same types of patients, (c) and (d). In Fig. 1a, b, the blue and orange colors represent the number of sensed data and the number of transmitted data, respectively, after applying the EASeT technique. In Fig. 1c, d, the light gray and black colors represent the amount of sensed data and the number of transmitted data, respectively, for modified LED*; the dark gray color in Fig. 1c, d, which refers to the original LED method, is neglected. The results in Fig. 1a, b show that the amount of sensed data drops to a minimum due to the similarity between the score values of the sensed data of the two periods, in both the normal and the critical case. In addition, EASeT decreases the volume of sensed data using the emergency detection approach, which eliminates the redundant data during each period before sending the data to the coordinator, for both normal and critical patients. As shown in Fig. 1b, the transmitted sensed data are larger than in Fig. 1a because Fig. 1b refers to the high-risk case. Overall, Fig. 1 shows that EASeT ((a) and (b)) performs better than modified LED* ((c) and (d)) by reducing the amount of data transmitted to the coordinator and adapting the sensing rate of the biosensor to the minimum.
Fig. 1 Adaptation of sensing rate versus data reduction: a low-risk patient, b high-risk patient of EASeT, c low-risk patient, and d high-risk patient of the modified LED* [14]
4.2 Energy Consumption This section studies the energy consumed inside every biosensor according to the patient's situation (see Fig. 2). In this experiment, EASeT uses the same energy parameter values as modified LED*, where the initial energy of the biosensor
is 700 units and the energy spent by the biosensor on sensing and on transmitting one measure is equal to 0.1 and 1 units, respectively. The energy consumption of the biosensor is driven by the volume of the sampled (sensed) data and of the data transmitted to the coordinator. Since the proposed EASeT technique reduces the size of both the sampled and the transmitted data, it markedly outperforms modified LED* in terms of the consumed energy of the biosensor.
Fig. 2 Energy consumption
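Given the simplicity of the stated energy model, the accounting can be written directly. The sketch below uses the quoted parameters (700 initial units, 0.1 per sensed measure, 1 per transmitted measure) and is an illustration, not the authors' simulator.

SENSE_COST, TX_COST, INITIAL_ENERGY = 0.1, 1.0, 700.0

def remaining_energy(n_sensed, n_transmitted, initial=INITIAL_ENERGY):
    # Energy left in a biosensor after sensing and transmitting.
    return initial - (SENSE_COST * n_sensed + TX_COST * n_transmitted)

# For the 10-measure period of Sect. 3.2, transmitting only 4 measures
# costs 0.1*10 + 1*4 = 5 units, versus 11 units if all 10 were sent.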
4.3 Data Integrity This section studies the effect of the proposed EASeT technique on data integrity. Results are presented for two cases, a normal and a critical patient (see Fig. 3a, b), while the results of modified LED* for a normal patient are presented in Fig. 3c.
Fig. 3 Data integrity
This experiment is based on the data gathered in every period without (NS) and with (AS) the adaptive sensing rate executed on the biosensor, and is evaluated by comparing the distributions of NEWS scores. Eight of the 70 periods are selected to show the sensed data size and the score distributions with and without the adaptive sensing rate at the biosensor. The results in Fig. 3 show that the EASeT technique reduces the sensed data over the eight periods by up to 85%, compared with modified LED* (see Fig. 3c), which reduced the data by up to 64.5% for a normal patient. In the case of the normal patient, the scores are limited to 0, which allows a high reduction in the sensed data of the biosensor. Moreover, EASeT achieves this up-to-85% reduction while maintaining a suitable representation of all scores at the coordinator. Hence, EASeT can be regarded as ensuring an acceptable level of integrity of the gathered data while keeping all the scores, without loss, at the coordinator.
5 Conclusion and Future Work An energy-efficient adaptive sensing technique (EASeT) for smart health care in connected healthcare systems has been proposed. EASeT works in rounds of two periods and proceeds in two main steps: emergency detection and biosensor sensing adaptation. EASeT applies NEWS to discard unnecessary medical data before transferring them to the coordinator; the adaptive sensing rate algorithm is then executed at the end of the second period of each round to adapt the sensing rate of the biosensor according to the patient's condition. The simulation results, based on real sensed biosensor data, show that the EASeT technique outperforms modified LED* in terms of data reduction, consumed power, and data integrity. In the future, we plan to perform multi-biosensor adaptive sensing with continuous monitoring, using machine learning at the coordinator to decide the patients' situation based on the data arriving from the biosensors deployed on the patient's body.
References 1. Harb, H., Mansour, A., Nasser, A., Cruz, E.M., de la Torre Díez, I.: A sensor-based data analytics for patient monitoring in connected healthcare applications. IEEE Sensors J. 00(00) (2019) 2. Dey, N., Ashour, A.S., Bhatt, C.: Internet of things driven connected healthcare. In: Internet of Things and Big Data Technologies for Next Generation Healthcare, pp. 3–12. Springer (2017) 3. Singh, P.: Internet of things-based health monitoring system: opportunities and challenges. Int. J. Adv. Res. Comput. Sci. 9(1) (2018) 4. Roy, S., Shrivastava, M., Pandey, C.V., Nayak, S.K., Rawat, U.: IEVCA: an efficient image encryption technique for IoT applications using 2-D Von-Neumann cellular automata. Multimedia Tools Appl. 1, 1–39 (2020)
5. Roy, S., Rawat, U., Sareen, H.A., Nayak, S.K.: IECA: an efficient IoT friendly image encryption technique using programmable cellular automata. J. Ambient Intell. Human. Comput. 11(11), 5083–102 (2020) 6. Vitabile, S., Marks, M., Stojanovic, D., Pllana, S., Molina, J.M., Krzyszton, M., Sikora, A., Jarynowski, A., Hosseinpour, F., Jakobik, A.: Medical data processing and analysis for remote health and activities monitoring. In: Kołodziej, J., González-Vélez, H. (eds.) High-Performance Modelling and Simulation for Big Data Applications, vol. 11400, pp. 186–220. Springer, Basel, Switzerland (2019) 7. Kadhum Idrees, S., Kadhum Idrees, A.: New fog computing enabled lossless EEG data compression scheme in IoT networks. J. Ambient Intell. Human. Comput. 1–14 (2021) 8. Al-Nassrawy, Kahlaa K., Al-Shammary, D., Kadhum Idrees, A.: High performance fractal compression for EEG health network traffic. Proc. Comput. Sci. 167, 1240–1249 (2020) 9. Kadhum Idrees, A., Jaoude, C.A., Al-Qurabat, A.K.M.: Data reduction and cleaning approach for energy-saving in wireless sensors networks of IoT. In: 2020 16th international conference on wireless and mobile computing, networking and communications (WiMob), 50308. IEEE (2020) 10. Kadhum Idrees, A., Al-Qurabat, A.K.M.: Energy-efficient data transmission and aggregation protocol in periodic sensor networks based fog computing. J. Netw. Syst. Manage. 29(1), 1–24 (2021) 11. Soufiene, B.O., Bahattab, A.A., Trad, A., Youssef, H.: Lightweight and confidential data aggregation in healthcare wireless sensor networks. Trans. Emerg. Telecommun. Technol. 27(4), 576–588 (2016) 12. Habib, C., Makhoul, A., Darazi, R., Salim, C.: Self-adaptive data collection and fusion for health monitoring based on body sensor networks. IEEE Trans. Ind. Inf. 12(6), 2342–2352 (2016) 13. Habib, C., Makhoul, A., Darazi, R., Couturier, R.: Real-time sampling rate adaptation based on continuous risk level evaluation in wireless body sensor networks. In: 2017 IEEE 13th International Conference on Wireless and Mobile Computing, Networking and Communications (WiMob), pp. 1–8. IEEE (2017) 14. Habib, C., Makhoul, A., Darazi, R., Couturier, R.: Health risk assessment and decision-making for patient monitoring and decision-support using wireless body sensor networks. Inf. Fusion 47, 10–22 (2019) 15. Jaber, A.S., Kadhum Idrees, A.: Adaptive rate energy-saving data collecting technique for health monitoring in wireless body sensor networks. Int. J. Commun. Syst. 33(17), e4589 (2020) 16. Jaber, A.S., Kadhum Idrees, A.: Energy-saving multisensor data sampling and fusion with decision-making for monitoring health risk using WBSNs. Softw.: Pract. Experience 51(2), 271–293 (2021) 17. Kadhum Idrees, A., Harb, H., Jaber, A., Zahwe, O., Abou Taam, M.: Adaptive distributed energy-saving data gathering technique for wireless sensor networks. In: 2017 IEEE 13th International Conference on Wireless and Mobile Computing, Networking and Communications (WiMob), pp. 55–62. IEEE (2017) 18. Al-Qurabat, A.K.M., Ali Kadhum M., Kadhum Idrees, A.: Energy-efficient adaptive distributed data collection method for periodic sensor networks. Int. J. Internet Technol. Secured Trans. 8(3), 297–335, (2018) 19. Kadhum Idrees, A., Al-Qurabat, A.K.M.: Distributed adaptive data collection protocol for improving lifetime in periodic sensor networks. IAENG Int. J. Comput. Sci. 44(3) (2017) 20. 
Al-Qurabat, A.K.M., Kadhum Idrees, A.: Data gathering and aggregation with selective transmission technique to optimize the lifetime of internet of things networks. Int. J. Commun. Syst. 33(11), e4408, (2020) 21. Harb, H., Makhoul, A., Jaber, A., Tawil, R., Bazzi, O.: Adaptive data collection approach based on sets similarity function for saving energy in periodic sensor networks. Int. J. Inf. Technol. Manage. 15(4), 346–363 (2016)
22. Kadhum Idrees, A., Al-Qurabat, A.K.M., Jaoude, C.A., Al-Yaseen, W.L.: Integrated divide and conquer with enhanced k-means technique for energy-saving data aggregation in wireless sensor networks. In: 2019 15th International Wireless Communications & Mobile Computing Conference (IWCMC), pp. 973–978. IEEE (2019) 23. https://www.weahsn.net/wp-content/uploads/NEWS_toolkit_njd_19Apr2016.pdf 24. Schein, R.M.H., Hazday, N., Pena, M., Ruben, B.H., Sprung, C.L.: Clinical antecedents to in-hospital cardiopulmonary arrest. Chest 98, 1388 (1990) 25. National Early Warning Score (NEWS): Royal College of Physicians, London, U.K., http://www.rcplondon.ac.uk/resources/national-early-warning-score-news (2015) 26. Navarro, G.: A guided tour to approximate string matching. ACM Comput. Surv. (CSUR) 33(1), 31–88 (2001) 27. Goldberger, A.L., Amaral, L.A.N., Glass, L., et al.: Physiobank, physiotoolkit, and physionet: components of a new research resource for complex physiologic signals. Circulation 101(23), e215–e220 (2000) 28. Makhoul, A., Laiymani, D., Harb, H., Bahi, J.M.: An adaptive scheme for data collection and aggregation in periodic sensor networks. Int. J. Sensor Netw. 18(1–2), 62–74 (2015)
Transfer Learning Approach for Analyzing Attentiveness of Students in an Online Classroom Environment with Emotion Detection K. V. Karan , Vedant Bahel , R. Ranjana , and T. Subha
Abstract There is a crucial need for advancement in the online educational system due to the unexpected, forced migration of classroom activities to a fully remote format caused by the coronavirus pandemic. Moreover, online education is the future, and its infrastructure needs to be improved for an effective teaching–learning process. One of the major concerns with the current video call-based online classroom system is student engagement analysis. Teachers are often concerned about whether the students can perceive the teachings in this novel format. Such analysis happened involuntarily in the offline mode but is difficult in an online environment. This research presents an autonomous system for analyzing students' engagement in class by detecting the emotions they exhibit. This is done by capturing the video feed of the students and passing the detected faces to an emotion detection model. The emotion detection model in the proposed architecture was designed by fine-tuning the VGG16 pre-trained image classifier. Lastly, the average student engagement index is calculated. The authors obtained considerable performance, supporting the reliability of the proposed system in real time and giving this research future scope. Keywords Emotion detection · CNN · VGG16 · Education · Transfer learning · Engagement
K. V. Karan (B) · R. Ranjana Sri Sairam Engineering College, Chennai, India e-mail: [email protected] V. Bahel G H Raisoni College of Engineering, Nagpur, India T. Subha Department of Computational Intelligence, SRM Institute of Science and Technology, Chennai, India © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 S. Roy et al. (eds.), Innovations in Computational Intelligence and Computer Vision, Advances in Intelligent Systems and Computing 1424, https://doi.org/10.1007/978-981-19-0475-2_23
1 Introduction In recent times, with the world going online, there have been advances in social software technology in the field of education. In particular, with increasing digitalization, many researchers have worked on intelligent tools to improve the educational system. Educational data mining (EDM) is a popular field of research focusing on how data science can be applied to data from educational settings [1–4]. In [2], the authors discuss how educational data can be used for multiple applications such as student performance prediction, course recommendation, early dropout prediction, and more. This research focuses on a similar application: analyzing students' engagement in an online video conferencing-based classroom system from their video feed using computer vision. Interest in e-learning is on an upward trend, especially in the period of the pandemic, and seems likely to increase further in the future. Educational institutions lack the tracking systems necessary to monitor students' engagement during lectures and sessions, which leaves teachers helpless regarding their students' progress. Thus, the application discussed in this paper is needed more than ever before. In on-campus classroom learning, teachers receive continuous feedback on their teaching by witnessing students' reactions to what they are learning. Such feedback often helps teachers understand the state of the students with respect to specific concepts and allows them to take the necessary steps; for example, if the teacher senses that the class seems confused about certain concepts, the teacher can revisit them. In online classroom systems, that feedback is lacking. Raes et al. [5] discuss the difficulties teachers face in engaging students in a remote learning environment compared with a face-to-face one. In the study of Weitze [6], both students and teachers state that remote students learned less, were generally more passive, and often behaved as if they were watching TV rather than attending a lesson. These observations underline that facial expressions are vital identifiers of human feelings. Detecting facial emotion is quite an easy task for the human brain, but the same task becomes challenging for a computer algorithm. With recent advances in computer vision and machine learning, it is possible to detect facial emotions from images and videos synchronously. The proposed system detects the facial emotions of the students (confused, happy, neutral, sleepy) and displays the emotional index of the class as a whole. In this way, teachers can understand the state of the class, making them feel comfortable and ensuring that the knowledge shared reaches the students. This research uses computer vision via transfer learning. Transfer learning is roughly defined as the task of improving a machine learning model by leveraging knowledge from existing models in the same domain [7]. Such methods of fine-tuning pre-trained deep learning models are extremely beneficial when there is little data for the current task [8]. Transfer learning can be assessed by three measures. First is the initial performance achievable with the transferred knowledge before any further learning. Second is the difference between the time taken to finish the learning with transferred knowledge and the time taken for achieving
the same from scratch. Third is the final performance level achievable in the target task with transferred knowledge compared to the final level without transfer [7]. This paper highlights the following: • The critical examination of students' attention in an online classroom setting with emotion detection, using an emotion dataset, is the major contribution of this research. • More significantly, this research demonstrates that the model's performance was strong when evaluated on a grid view. Various pre-trained neural network models such as ResNet, MobileNet, VGG, and Inception have been found to outperform traditional methods on a variety of tasks [9]. This research uses the VGG16 pre-trained neural network model since the model is trained on 3.31 million images of person identities; thus, the nature of this model best suits the target task of emotion recognition [10]. The remainder of this paper discusses, in the literature review, existing projects that resemble our work; then the dataset used in this work and the way it was acquired; the methodology section, which explains the method we followed; and finally the results of the work with its performance, followed by the conclusion.
2 Literature Review Facial expression is the most common way of conveying mood or feelings, not only for human beings but also for many other living organisms. There have been many attempts to develop technologies that analyze facial expressions, as they have applications in fields such as lie detection, medicine, and robotics [11–13]. In a recent study on the facial recognition technology (FERET) dataset, Sajid et al. found that facial asymmetry is a marker for age estimation [14]. In the twentieth century, Ekman and Friesen [15] defined seven moods or expressions that humans develop irrespective of culture, tribe, and ethnicity: anger, sadness, fear, disgust, contempt, happiness, and surprise. In a recent study on facial emotion recognition (FERC) using convolutional neural networks, Mehendale [16] described a way of improving the accuracy of facial emotion detection using a single-level CNN and novel background removal techniques. There has been substantial work on determining user emotions. McDuff et al. [17] at MSR developed techniques to determine three aspects of human emotion: valence (positiveness or negativeness), arousal (degree of emotion), and engagement level. Their mining algorithms use data captured from hardware sensors such as microphones, Web cameras, and GPS, as well as interaction data such as Web URLs visited, documents opened, applications used, emails sent and received, and calendar appointments; the inferences were used mainly to allow users to reflect on their mood and recall events of the past week. Dewan et al. examined the engagement of a remote audience through computer vision: they measured just two scenarios, bored and engaged, and displayed the engagement of the audience using the OpenCV library [18].
With the increase in companies that offer video-based learning services such as tuition and exam preparation, video-based learning has become a trend due to its numerous benefits [19], most importantly delivering audio and video communication at the same time. Theories suggest that two subsystems are involved here [20, 21]: one processes visual objects and the other handles verbal processing. Both operate separately in the brain and can process only limited information [20], which in turn distracts students. Researchers have proposed methods using video learning analytics to better understand how students learn with videos. For example, Kim et al. [22] investigated learners' in-video dropout rates and interaction peaks by analyzing their video-watching patterns (pausing, playing, replaying, and quitting) in online lecture videos. To date, however, there is limited research on how students interact with video lectures, so it is difficult for instructors to determine the effectiveness of a learning design, especially at a fine-grained level. Transfer learning has recently been used widely wherever artificial intelligence and machine learning models are in use. Hazarika et al. proposed a way to recognize the emotional state in conversational text through transfer learning and natural language processing [23]. Transfer learning has also played a critical role in the success of modern computer vision systems. Kentsch et al. used computer vision and deep learning techniques, via transfer learning, to analyze drone-acquired forest images for invasive species, and report that the use of transfer learning improved accuracy by 2.7% [24]. Evidently, the literature contains multiple approaches using transfer learning for a wide variety of computer vision tasks, and researchers have also applied the approach specifically to emotion recognition. However, there is hardly any approach that uses emotion recognition in teaching–learning environments to improve the online learning system, which is the goal of this research.
3 Dataset The primary objective of this research was to create a model that can detect the facial expressions of the students in a class. A human being can exhibit a wide variety of facial expressions; however, the authors considered those facial expressions that are crucial where the learning activity of a student is concerned. This choice was also based on what teachers find most relevant in the classroom learning sphere. Psychological research shows that some positive emotions of students, such as concentration, happiness, and satisfaction, promote their learning interest, motivation, and cognitive activities, while negative emotions, such as boredom, sadness, and anxiety, can have a bad influence on students' commitment and patience [25]. This paper considers four classes of emotion, namely confused, sleepy, happy, and neutral. The facial dataset for each class was collected automatically by scraping
Table 1 Number of training and validation images in each class in the dataset

Class name            Happy   Confused   Sleepy   Neutral
# training images     126     228        168      161
# validation images   18      18         18       18
publicly available Google images. Web scraping is the process of using bots to extract content and data from a Web site. A variety of bots are used for Web scraping, for applications like recognizing unique HTML site structures, extracting and transforming content, storing scraped data, and extracting data from an application programming interface (API) [26]. Scraping publicly available Google images is one of the efficient ways to collect data to train the model. Python was used to scrape the images through the Firefox Web driver. The benefit of this approach is that (a) it retrieves thousands of images "from the wild" and (b) it can automatically label the images using the keywords in the query [27]. The Selenium and Beautiful Soup libraries were used for scraping purposes. The code used for scraping these Google images can be found in this GitHub repository.1 The class size for each class of emotion is given in Table 1. Though the dataset is smaller than the general requirement of deep learning models, the transfer learning approach considered in this research has been shown to give superior results even with a limited amount of data [28]. The data are made publicly available2 to widen the scope of this research and enable potential future work to improve the current system.
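The authors' scraper (linked above) drives Firefox via Selenium; a minimal sketch of the idea is shown below. The search query, the thumbnail selection rule, and the output file names are illustrative assumptions, not taken from the repository.

import urllib.request
from bs4 import BeautifulSoup
from selenium import webdriver

# Render the image-search results page in Firefox so the dynamically
# loaded thumbnails are present in the page source.
driver = webdriver.Firefox()
driver.get("https://www.google.com/search?tbm=isch&q=confused+face")
soup = BeautifulSoup(driver.page_source, "html.parser")
driver.quit()

# Collect thumbnail URLs and label the downloads with the query keyword.
urls = [img.get("src") for img in soup.find_all("img")
        if img.get("src", "").startswith("http")]
for i, url in enumerate(urls):
    urllib.request.urlretrieve(url, f"confused_{i}.jpg")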
4 Method and Implementation The first module of the proposed project pipeline captures a snapshot of an active online classroom screen in its grid format. That image is passed to the face detection model, which detects individual faces and extracts them as separate image files. The face detection model is based on OpenCV. Further along the pipeline, the individual detected facial images are passed to the emotion detection model. This research considers the VGG16 pre-trained model and fine-tunes it on our dataset. VGG16 is a convolutional neural network model proposed by Simonyan and Zisserman of the University of Oxford in the paper "Very Deep Convolutional Networks for Large-Scale Image Recognition" [29]. The model achieves 92% top-5 test accuracy on ImageNet, which consists of over 14 million images belonging to 1000 classes. It increases the depth of the architecture with exceedingly small (3 × 3) convolution filters, which shows that a significant improvement on the prior-art configurations can be achieved by pushing
https://github.com/karankv26/Google-image-webscraper. https://github.com/karankv26/Google-image-webscraper/tree/main/dataset.
258
K. V. Karan et al.
the depth to 16–19 weight layers. This model was trained for weeks and was using NVIDIA Titan Black GPUs. One of the significant ways to avoid overfitting is to use a larger dataset. However, over scraping of the dataset (that was done) produced garbage data with irrelevant images. Thus, image augmentation was considered to increase the size of the dataset by various transformations of the scraped relevant dataset. Some of the image augmentation techniques practiced in this research are rotation, width_shift, height_shift, shear, zoom, and horizontal flip. Finally, the emotion detector model finds individual emotions and based on that the class emotion index is calculated using Eq. (1). Emotion index(emotion = e) =
# of students exhibiting e , total # of students
(1)
where emotion = {happy, confused, sleepy, neutral}.
5 Result and Discussion The proposed pipeline starts with a face detection module, details of which has been discussed in the previous section. When a grid of faces (as expected from an online classroom environment) is passed to the face detection module, the result obtained is shown in Fig. 1. Further, these images are passed to emotion detector model. Figure 2 shows the training and validation accuracy received for the proposed fine-tuned model with ran on 10 epochs. The training accuracy appeared to roughly flatten (parallel to the xaxis) at an accuracy of 74% (approx) after the 4th epoch which marks the appropriate place to stop training to avoid overfitting. However, the validation accuracy continued to improve. Fig. 1 Working of face detection and individual cropping tool
Transfer Learning Approach for Analyzing Attentiveness …
259
Fig. 2 Training and validation performance on 10 epochs
Fig. 3 Training and validation performance on 15 epochs a trial 1, b trial 2
To investigate further, authors considered running the model for 15 epochs to see a further movement. Figure 3 shows the training and validation accuracy for the same model when run for 15 epochs. In this case, the performance received was not exceptionally smooth. However, both the training and validation accuracy were still found to be roughly improving. To analyze better, authors ran the model with the same configuration again (refer to figure). This time the performance found was relatively smoother, with flattening performance for both the training and validation curve at an averagely adjusted accuracy of 77.5%. Overall, the best accuracy obtained for the model was 80% and 78% in training and validation, respectively. The reliable average accuracy received was 77.5% for both. Finally, the detection results are used to find the final emotion index using the formula given in Sect. 5.
260
K. V. Karan et al.
6 Conclusion This paper proposes an architecture for analysis of student’s engagement in videobased online classroom systems. The architecture starts with the detection of the faces of the student from their incoming video feed. Later, the detected faces are cropped individually and passed to the emotion detection model. The emotion detection model is a fine-tuned transfer learning model based on VGG16 as the pre-trained model. The reliable validation accuracy received for this task was found to be 77.5%. Later, these individual detected emotions are used to find an emotion index based on the number of students exhibiting a certain emotion. There are some limitations to the work presented in this paper. First, the model’s accuracy should be improved further. If the performance is not high, the system could be prone to gaming by the students. In future, author wish to extend the scope of the research, focusing on improving the accuracy of this research. Secondly, the research was only focused on developing a model. This project was not tested in real time. Hence, authors would want to put this architecture to the test in real time by developing software that includes capabilities like making an online classroom meet and detecting student attention, which will be shown on the teacher’s dashboard. This aids instructors in gaining a thorough grasp of these situations.
Prediction of COVID-19 Cases and Attribution to Various Factors Across Different Geographical Regions Megha Agarwal , Amit Singhal , Monika Patial , and Brejesh Lall
Abstract COVID-19 is a pandemic spread across various parts of the world. This paper proposes the Susceptible-Hidden-Infected-Recovered-Dead (SHIRD) mathematical model for the prediction of COVID-19 cases. The completeness of the model is achieved by including unidentified patients as hidden cases. Data for eight Indian states and four countries, i.e., Canada, Italy, Japan, and France, is considered in this work. Using the SHIRD model, the number of cases is predicted for the near future. Further, factors such as population density, average temperature, absolute humidity, per capita gross domestic product (GDP), and testing per million are selected to understand the pattern of infections. The analysis shows that regions with similar populations do not share the same infection trend. Total confirmed cases exhibit significant correlation with population density, while the death rate shows high correlation with per capita GDP, the average temperature of the region, and the number of tests conducted per million of population. Keywords Attribution · COVID-19 · Hidden nodes · Reproduction rate
M. Agarwal: Jaypee Institute of Information Technology, Noida, India
A. Singhal (B): Netaji Subhas University of Technology, Delhi, India, e-mail: [email protected]
M. Patial: Independent Researcher, Dulwich Hill, NSW, Australia
B. Lall: Indian Institute of Technology Delhi, Delhi, India
1 Introduction At the end of the year 2019, an atypical pneumonia of unknown cause was detected in Wuhan, China. The disease was identified as novel coronavirus disease 2019 (COVID-19) by the World Health Organization (WHO). It is caused by the severe acute
respiratory syndrome coronavirus 2 (SARS-CoV-2) [1]. On January 30, 2020, it was declared a public health emergency of international concern due to its highly contagious nature. This epidemic outbreak has deeply affected the economic growth of the whole world. It is confirmed that the virus spreads among humans primarily through small droplets produced by coughing, sneezing, and talking during close contact [2]. The first case in India was reported in Kerala on January 30, 2020. Since then, it has escalated very rapidly in the country, with a total of 63,92,048 cases, 53,48,746 recoveries, and 99,807 deaths as of October 01, 2020 [3]. Starting March 22, 2020, the Indian Government took many preventive measures to restrict virus transmission at an early stage. Health advisories were issued on regular screening of international travelers, washing hands, wearing masks, and staying at home. Mathematical modeling of the spread of COVID-19 can be very useful in handling this difficult situation [4]. The authors of [5] consider a Hawkes model for the spread of Ebola in Africa, while a space-time conditional intensity model is investigated in [6] for the occurrence of invasive meningococcal disease. However, compartmental models, initiated by Kermack and McKendrick [7], continue to be the most popular tools in epidemiological modeling. Among these, the Susceptible-Infected-Removed (SIR) model is a widely used method for epidemic estimation [8, 9]. The SIRD model [10] is obtained from the SIR model by separating the removed into recovered and dead. Prediction of end dates for the COVID-19 pandemic using a Gaussian mixture model is provided in [11]. In this paper, we propose the Susceptible-Hidden-Infected-Recovered-Dead (SHIRD) model to include the concept of hidden nodes, i.e., infected people who have not yet been identified by the authorities. Data for several Indian states and some other countries with similar populations is analyzed to understand the major factors dictating the variation in the spread of COVID-19 across regions. The multimodal nature of the trend of infected cases is also captured, as opposed to the popular belief of a single peak. The rest of the paper is organized as follows: the SHIRD model and the various factors governing the spread of the disease are discussed in Sect. 2. Section 3 presents the simulation results, the discussions are provided in Sect. 4, and the conclusions are drawn in Sect. 5.
2 Methods 2.1 SHIRD Model The SIR model [8, 9] has been widely studied for modeling the spread of epidemics across populations. Susceptible (S) refers to the population that has so far stayed clear of the epidemic but can get infected in the future. As people come in contact with infected people, they have a chance of getting infected as well if suitable precautions are not followed; this rate of infection is measured by the parameter β. All active cases are counted in the infected (I) category. Lastly, removed denotes
the people who were infected in the past and have either recovered or succumbed to their illness. Removed comprises both recovered (R) and dead (D) [10]. The rates at which people move from the I category to the R and D categories are denoted by γ_r and γ_d, respectively. However, there is a category of the population that gets infected but is never identified into category I; in this work, we refer to this category as hidden (H). People can enter this category for the following reasons: (1) a section of the population may not have access to medical facilities owing to a lack of resources or knowledge; (2) a few people consciously hide their condition fearing social stigma and isolation; (3) some people develop initial symptoms but are able to fight off the virus because of stronger immunity. The consolidated SHIRD model including the effect of the H category is expressed as follows:

$$\begin{aligned}
\frac{dS(t)}{dt} &= -\beta_1 H(t) - \beta_2 I(t) \\
\frac{dH(t)}{dt} &= \beta_1 H(t) - \alpha H(t) - (\gamma_{r1} + \gamma_{d1})\, H(t) \\
\frac{dI(t)}{dt} &= \beta_2 I(t) + \alpha H(t) - (\gamma_{r2} + \gamma_{d2})\, I(t) \\
\frac{dR(t)}{dt} &= \gamma_{r1} H(t) + \gamma_{r2} I(t) \\
\frac{dD(t)}{dt} &= \gamma_{d1} H(t) + \gamma_{d2} I(t)
\end{aligned} \qquad (1)$$
where t denotes the number of days elapsed since the first case was reported and α measures the rate at which people move from the H to the I category. A flowchart for the SHIRD model is depicted in Fig. 1. Since H is similar to I in terms of recovery and death rates, without loss of generality we can assume that the values of β₂, γ_{d2}, and γ_{r2} are similar to β₁, γ_{d1}, and γ_{r1}, respectively. Hidden cases may begin to turn up as the movement of people increases and screening is performed at various checkpoints. The SHIRD model for India is shown in Fig. 2 (left), depicting the trend of susceptible, hidden, infected, recovered, and dead cases. The susceptible population is very large in comparison with the other categories; therefore, Fig. 2 (right) shows the same plot after adjusting the Y-axis to highlight the hidden, infected, and dead cases.
Fig. 1 Flowchart for SHIRD model
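A minimal sketch of how the SHIRD system in Eq. (1) can be integrated numerically is shown below; all rate parameters and initial conditions are illustrative assumptions, not the fitted values used by the authors.

```python
# Minimal sketch: numerical integration of the SHIRD equations (Eq. (1)).
# All rates and initial conditions below are illustrative assumptions.
import numpy as np
from scipy.integrate import solve_ivp

beta1, beta2 = 0.25, 0.25      # infection rates via H and I contacts
alpha = 0.10                   # rate of moving from H to I
gr1, gd1 = 0.07, 0.002         # recovery/death rates for hidden cases
gr2, gd2 = 0.07, 0.002         # recovery/death rates for identified cases

def shird(t, y):
    S, H, I, R, D = y
    dS = -beta1 * H - beta2 * I
    dH = beta1 * H - alpha * H - (gr1 + gd1) * H
    dI = beta2 * I + alpha * H - (gr2 + gd2) * I
    dR = gr1 * H + gr2 * I
    dD = gd1 * H + gd2 * I
    return [dS, dH, dI, dR, dD]

y0 = [1.3e9, 10, 1, 0, 0]      # assumed initial S, H, I, R, D
sol = solve_ivp(shird, (0, 365), y0, t_eval=np.arange(0, 365))
print(sol.y[2].max())          # peak of identified active cases
```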
Fig. 2 (left) SHIRD model for India, (right) adjusting the scale of Y -axis to focus on the hidden, infected, and dead cases
2.2 Factor Attribution We prepare a list of important factors that govern the spread of this pandemic, including population density, per capita gross domestic product (GDP), average temperature and humidity conditions, and the amount of testing done per million of the population. Recently, the spread of the disease has been related to the prevailing temperature and humidity conditions. On the other hand, per capita GDP is a measure of the 'developed' status of an area and affects the immunity of people, since people living in less-developed areas are exposed to harsher living conditions and do not have the comfort of staying in a protected environment. Also, higher economic activity leads to a larger movement of people, increasing the risk of coming in contact with an infected person. Further, a lower population density helps in the implementation of social distancing norms, essential in preventing the spread of this disease. Lastly, sufficient testing facilities are required to ensure that all cases get reported in a timely manner. The total confirmed cases are considered as cases per million of population, and the recovered cases and deaths are expressed as recovery rate and death rate, respectively, where the rates are computed as a percentage of the total cases. The reproduction rate R₀ represents the average number of people infected by each infected person. The average value of R₀ [12] is computed from the data using the formula

$$R_0 = r_i \cdot \tau, \qquad (2)$$
where the infection rate r_i denotes the new people infected on a daily basis as a ratio of the active cases, and τ is the average number of days that a person stays in the category of active cases. The statistics for the spread of COVID-19 are attributed to the considered factors by computing a correlation of the obtained values. The Pearson correlation coefficient ρ_{XY} between two sets of values X_i and Y_i is computed as

$$\rho_{XY} = \frac{\sum_{i=1}^{n} (X_i - \mu_X)(Y_i - \mu_Y)}{n\, \sigma_X \sigma_Y}, \qquad (3)$$

where the number of samples is n, i.e., 1 ≤ i ≤ n, and μ_X, μ_Y, σ_X, σ_Y refer to the mean and standard deviation of X and Y, respectively.
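A short sketch of Eqs. (2) and (3) follows; the input arrays are hypothetical examples, not the authors' full data.

```python
# Sketch of Eqs. (2) and (3): average reproduction rate and Pearson
# correlation. Inputs are hypothetical arrays, not the authors' data.
import numpy as np

def reproduction_rate(new_daily, active, tau):
    """R0 = r_i * tau, with r_i the daily infection rate (new / active)."""
    r_i = np.mean(np.asarray(new_daily) / np.asarray(active))
    return r_i * tau

def pearson(x, y):
    """Eq. (3): sum((x - mu_x)(y - mu_y)) / (n * sigma_x * sigma_y)."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    n = len(x)
    return np.sum((x - x.mean()) * (y - y.mean())) / (n * x.std() * y.std())

death_rate = [1.57, 2.66, 2.59, 1.94]      # example values from Table 1
gdp = [149.87, 264.05, 277.61, 432.32]
print(pearson(death_rate, gdp))
```

Note that np.std defaults to the population standard deviation, which matches the n in the denominator of Eq. (3).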
Table 1 List of factors for areas considered in this work

State (or Country) | Confirmed (per million) | Recovery rate (%) | Death rate (%) | R0 | Average temp (°C) | Absolute humidity | GDP per capita (thousand rupees) | Population (per km²) | Testing (per million)
India | 4404 | 82.58 | 1.57 | 2.69 | 35.0 | 13.73 | 149.87 | 445 | 52,027
Maharashtra | 10,820 | 76.94 | 2.66 | 3.16 | 29.33 | 21.17 | 264.05 | 370 | 53,749
Gujarat | 1941 | 84.98 | 2.59 | 2.07 | 33.0 | 17.05 | 277.61 | 313 | 62,300
Delhi | 13,526 | 86.97 | 1.94 | 2.49 | 35.0 | 13.73 | 432.32 | 11,367 | 147,610
Rajasthan | 1642 | 83.61 | 1.12 | 1.80 | 32.83 | 13.13 | 146.76 | 205 | 39,490
Tamil Nadu | 7596 | 90.34 | 1.61 | 2.19 | 30.33 | 21.32 | 276.22 | 560 | 93,806
Uttar Pradesh | 1701 | 83.65 | 1.44 | 2.16 | 33.33 | 15.72 | 79.60 | 833 | 42,782
Kerala | 4785 | 68.19 | 0.39 | 2.17 | 28.83 | 22.17 | 274.07 | 864 | 78,882
Uttarakhand | 4169 | 74.87 | 1.22 | 1.61 | 27.33 | 13.55 | 263.96 | 194 | 61,000
Japan | 646 | 91.33 | 1.89 | 2.08 | 22.0 | 11.97 | 2986.01 | 336 | 15,439
Italy | 5125 | 72.42 | 11.56 | 1.75 | 23.0 | 11.37 | 2530.60 | 206 | 183,439
Canada | 4060 | 85.62 | 6.05 | 1.83 | 14.0 | 9.3 | 3449.10 | 4 | 188,190
France | 8251 | 17.62 | 5.89 | 1.91 | 18.16 | 9.15 | 3086.72 | 118 | 161,639
Fig. 3 Plots of active cases for Indian states
3 Results We consider the data up to October 01, 2020, for eight Indian states [3]: Delhi, Uttar Pradesh, Maharashtra, Gujarat, Kerala, Uttarakhand, Rajasthan, and Tamil Nadu. In addition, we consider data from four other countries [13]: Japan, France, Italy, and Canada. It is interesting to note that Japan, France, and Canada have populations similar to Maharashtra, Gujarat, and Kerala, respectively. The proposed SHIRD model is applied to analyze the spread of COVID-19 in these areas; the plots of infected (active) cases for the Indian states are shown in Fig. 3, while Fig. 4 depicts the plots obtained for the other countries. Actual values are also indicated for validation of the simulated ones. We focus on the active cases, as they represent the true size of the infected population, thereby determining the medical facilities required at any given point of time.
Fig. 4 Plots of active cases for Japan, France, Italy, and Canada

Table 2 Correlation coefficients of various factors

 | Average temp (°C) | Absolute humidity | GDP per capita (thousand rupees) | Population (per km²) | Testing (per million)
Confirmed (per million) | 0.09 | 0.19 | −0.10 | 0.64 | 0.45
Recovery rate (%) | 0.42 | 0.28 | −0.36 | 0.18 | −0.39
Death rate (%) | −0.60 | −0.52 | 0.68 | −0.15 | 0.76
R0 | 0.34 | 0.41 | −0.28 | 0.24 | −0.19
It is observed that there are two clear peaks in some of the regions considered, highlighting a multimodal trend. For Delhi and Japan, the second peak is even larger than the first, while some other regions are struggling to contain the rising cases. The data regarding the number of cases and the various factors considered in this work is available at [3, 14, 15] for the Indian states, while the data for the other countries is taken from [15, 16]. A summary of this data, along with the estimated values of the reproduction rate R₀, is presented in Table 1. In order to understand the role of the various factors, the correlation of confirmed cases per million of population, recovery rate, death rate, and R₀ with these factors is measured in terms of Pearson correlation coefficients, as given in Table 2.
4 Discussion Lockdown was imposed in India on March 23, 2020, but some relaxations began from May 03, 2020. The slope of total infected cases started dropping because of the strict enforcement of the lockdown, but the relaxations have resulted in many new cases being reported owing to the increased movement of people. On the other hand, the recovery rate has also been increasing steadily. Among the Indian states, the number of cases has been highest in Maharashtra. It is evident from Table 1 that, despite having a population similar to Gujarat, Canada has seen a larger number of cases per million of population. Kerala and France have similar numbers for population and cases per million, but Kerala has shown strong control over the number of deaths, leading to a low death rate of 0.39%. In comparison with the other foreign countries, Japan has been able to handle the pandemic more effectively in spite of having a large population density. The following observations are made from Table 2 regarding the death rate: (1) it shows a high correlation of 0.76 with the testing per million of population; (2) an increase in average temperature and absolute humidity helps in reducing the death rate; (3) with a positive correlation of 0.68, per capita GDP also emerges as a prominent factor. The higher death rates in Canada, Italy, and France can be attributed to their higher per capita GDP and lower temperature and absolute humidity in comparison with the Indian states. Further, an increase in average temperature supports the recovery of infected people. Confirmed cases show a high correlation with population density, indicating more cases in densely populated areas. R₀ exhibits some correlation with absolute humidity and average temperature, but a weaker correlation with the other factors.
5 Conclusion The SHIRD model is more inclusive due to the hidden-nodes concept and is capable of providing more accurate predictions. The cases in the near future for eight Indian states and four other countries, i.e., Canada, Italy, France, and Japan, have been predicted using the proposed model, and the value of the reproduction rate R₀ has also been estimated. Another important analysis is performed to understand the spread of the pandemic in different regions with similar populations. Indian states, with a lower per capita GDP in comparison with the other countries considered in this work, exhibit lower death rates, indicating a higher immunity of people residing in less-developed areas. Temperature and absolute humidity correlate negatively with the death rate; the virus therefore becomes more deadly in cold and dry weather. This study has been performed for research purposes only, and the predictions would change in case of any changes in the conditions prevalent in a region; for example, the emergence of new virus strains could lead to a rapid increase of cases in any of the regions.
References
1. Huang, C., Wang, Y., Li, X., Ren, L., Zhao, J., Hu, Y., et al.: Clinical features of patients infected with 2019 novel coronavirus in Wuhan, China. Lancet 395, 497–506 (2020)
2. Coronavirus disease 2019 (COVID-19) situation report-73. World Health Organization, Apr 2020
3. COVID19-India API, https://api.covid19india.org/ (2020). Accessed 01 Oct. 2020
4. Wang, L., Li, J., Guo, S., Xie, N., Yao, L., Cao, Y., et al.: Real-time estimation and prediction of mortality caused by COVID-19 with patient information based algorithm. Sci. Total Environ. 727, 138394 (2020)
5. Park, J., Chaffee, A.W., Harrigan, R.J., Schoenberg, F.P.: A non-parametric Hawkes model of the spread of Ebola in West Africa. J. Appl. Stats. (2020)
6. Meyer, S., Elias, J., Hohle, M.: A space-time conditional intensity model for invasive meningococcal disease occurrence. Biometrics 68, 607–616 (2012)
7. Kermack, W.O., McKendrick, A.G.: A contribution to the mathematical theory of epidemics. Proc. R. Soc. A Math. Phys. Eng. Sci. 115, 700–721 (1927)
8. Ng, T.W., Turinici, G., Danchin, A.: A double epidemic model for the SARS propagation. BMC Infect. Dis. 3(19) (2003)
9. Zhong, L., Mu, L., Li, J., Wang, J., Yin, Z., Liu, D.: Early prediction of the 2019 novel coronavirus outbreak in mainland China based on a simple mathematical model. IEEE Access 8, 51761–51769 (2020)
10. Fanelli, D., Piazza, F.: Analysis and forecast of COVID-19 spreading in China, Italy and France. Chaos Sol. Frac. 134, 109761 (2020)
11. Singhal, A., Singh, P., Lall, B., Joshi, S.D.: Modeling and prediction of COVID-19 pandemic using Gaussian mixture model. Chaos Sol. Frac. 138, 110023 (2020)
12. Ndarou, F., Area, I., Nieto, J.J., Torres, D.F.: Mathematical modeling of COVID-19 transmission dynamics with a case study of Wuhan. Chaos Sol. Frac. 135, 109848 (2020)
13. Novel Coronavirus (COVID-19) Cases Data, https://data.humdata.org/dataset/novel-coronavirus-2019-ncov-cases (2020). Accessed 01 Oct. 2020
14. Ministry of Statistics and Programme Implementation, Government of India. MOSPI Net State Domestic Product, http://www.mospi.gov.in/data (2020). Accessed 01 Oct. 2020
15. World Weather Online, https://www.worldweatheronline.com/ (2020). Accessed 01 Oct. 2020
16. Worldometer-real time world statistics, https://www.worldometers.info/ (2020). Accessed 01 Oct. 2020
Emotion Enhanced Domain Adaptation for Propaganda Detection in Indian Social Media Malavikka Rajmohan , Rohan Kamath , Akanksha P. Reddy , and Bhaskarjyoti Das
Abstract With the rapid growth of social media platforms such as Twitter, propaganda has become rampant. However, the state-of-the-art models in the field of propaganda research are limited to data sets of formal linguistic construct, such as news articles, whereas social media conversations such as tweets are characterized by heavy usage of informal language. Hence, most of the text-based models developed on news articles do not perform well on sociolinguistic data sets. Emotions also play an important role in propaganda, as there is a direct correlation between emotions and various textual propaganda constructs. This paper therefore proposes a methodology that combines a structure-aware pivot-based language model for domain adaptation with emotional footprints in the text data towards effective propaganda detection on social media data sets. Keywords Domain adaptation · Cross-domain learning · Emotion · Propaganda detection · Twitter
Supported by PES University.
M. Rajmohan (B) · R. Kamath · A. P. Reddy · B. Das: PES University, 100 Feet Ring Road, Banashankari Stage III, Bengaluru, Karnataka 560085, India, e-mail: [email protected]
1 Introduction The term propaganda was first coined with respect to the propagation of the Catholic faith [11]. It is defined as an expression of opinion or specific actions by an individual or a group with the objective of influencing those of other groups or individuals. Propaganda has become computational with the rising popularity of social media such as Twitter. It always has an end goal, and in today's world propagandists try to achieve this end goal by computational means. Propaganda can be classified as having a verifiable source and being positive (white), as having false news from dubious sources (black), or as somewhere in between (grey), i.e. the source may be identifiable but the
information is inaccurate. This is different from fake news and rumour; in fact, disinformation is used as an instrument in a propaganda campaign. The seven main techniques of propaganda, i.e. transfer, bandwagon, name calling, plain folks, testimonial, glittering generalities and card stacking, were first identified in 1937. Subsequently, other techniques were identified, which were essentially sub-classes of these. Modern propaganda research started with document-level binary classification and was further refined by the identification of 18 propaganda techniques at the text-fragment level [6] to enable automated detection. Modern propaganda research has happened essentially in two areas, i.e. the content-based approach and the network propagation-based approach. The content-based approach migrated from text classification to propaganda technique identification and from classical feature-engineering-based machine learning models to deep learning-based models. Propagation analysis became necessary because style and content-based approaches are enough to flag content as possible propaganda, but detecting the intent behind the propaganda requires network-level information. While network-oriented research has moved on to social media such as Twitter, the content-based approach has mostly remained confined to articles. Firstly, there is a scarcity of annotated social media data in content-based propaganda research. Secondly, due to the considerable difference in feature distribution between news articles and tweets, although both are textual data, fitting a model trained on one domain to the other performs poorly. Emotion also plays an important role in propaganda; however, emotion is rarely used as a means to identify propaganda. The work in this paper fills in the above gaps in the content-based approach for propaganda on social media. It does so by utilizing a suitable domain adaptation methodology to leverage the existing work on formal textual content in the social media space, while making use of the emotional footprint of the content as a possible feature. This paper first reviews the previous works in related domains and presents their findings in Sect. 2. It goes on to describe the data set used for the experiment in Sect. 3. Section 4 espouses the methodology and architecture used. The results obtained during the research are elucidated in Sect. 5. The paper concludes in Sect. 6 by discussing the scope and future works.
2 Literature Survey 2.1 Content-Based Propaganda Detection Feature-based approaches have been used for text classification of articles using classical machine learning models [2, 22]. Oliinyk et al. [18] framed a binary classification task built using a regression model to classify articles as propagandistic or not. The limitation of this work was that it did not account for context; thus, the features extracted were not representative of propagandistic features. Yoosuf et al. [25] used a bidirectional encoder representations from transformers (BERT)-based
model while employing oversampling strategies to account for class imbalances. Subsequent efforts by Da San Martino et al. [6] built a benchmark corpus for identifying propagandistic spans of text along with identification of the propaganda technique used. The authors also built a system [7] to identify propagandistic news using the same methodology. Patil et al. [20], in their SemEval 2020 submission, used text tokenization and binary classification of the tokens for the span identification task; for the propaganda technique classification, ensemble learning based on linguistic features was done using a BERT-based model. There is very little propaganda research using content-based approaches on social media data sets. Nizzoli et al. [16] used a data set consisting of tweets with at least one ISIS-related keyword to detect propaganda. Recently, Tundis et al. [23] used supervised learning with a labelled data set to detect propaganda in mixed-language tweets. There has also been some work on detecting automated propaganda on social media [3, 5] using a content-based approach. The majority of content-based research is directed towards detecting propaganda rather than propagandists. Orlov et al. [19] detected propagandists based on the assumption that propaganda spreaders work in groups, and used unsupervised methods to cluster users based on content and then detect propagandists.
2.2 Emotion in Propaganda Propaganda essentially makes an emotive appeal. Krishnamurthy et al. [14] found clear correlation between different propaganda techniques and sentence-level emotional salience features for six propaganda techniques. In a very recent work, Ghanem et al. [9] used affective information flow to detect fake news with superior performance. Researchers [1, 12] have also used emotion to detect propaganda in videos posted on social media. Emotion has also been used in analysing extremist propaganda [15, 17]. However, in propaganda research, affective signals are rarely utilized.
2.3 Domain Adaptation Ramponi et al. [21] categorized domain adaptation as (1) model centric, (2) data centric and (3) hybrid. The model-centric approach is further divided into loss-centric, feature-centric and pre-trained models. In the feature-centric approach, various procedures are followed to represent the 'pivot' features, i.e. the features common to the source and target domains; these methods improved from merely representing the common feature set to capturing the context as well. Ziser et al. [26] coupled the pivot-based language model (PBLM) with long short-term memory (LSTM) and convolutional neural networks (CNN) to manipulate and understand the structure of the input text. Many autoencoder-based domain adaptation (DA) models
were also devised, which, however, do not account for the context. Using a loss-centric method, Ganin et al. [8] proposed that a decent cross-domain transfer representation is one for which no algorithm can learn to recognize the origin domain of the observational input. Jiang et al. [10] used another loss-centric method based on instance weighting and implemented several adaptation heuristics; this sample-based domain adaptation technique requires labelled target data, and since the work described in this paper did not have much labelled data, it was not very useful. The pre-training-oriented method is inclined towards a fine-tuned transformer model with a limited amount of labelled data. The authors discovered that auxiliary-task pre-training, using supervision from relevant labelled auxiliary tasks through multi-task learning (MTL) or intermediate task transfer, is a viable approach, but there are a number of unanswered questions; hence, this approach was also not used in this work. In the data-centric approach, various semi-supervised methods were explored, such as pseudo-labelling with multiple bootstrap models or with slower yet more precise handcrafted models as a guide. This was not pursued, as semi-supervised learning research itself is not yet fully explored. For the same reason, the hybrid approach that combines data-centric and model-centric methods, i.e. combinations of semi-supervised objectives with an adversarial loss, of pivot-based approaches with pseudo-labelling, etc., was not pursued. Kouw et al. [13] categorized the approaches as sample based, feature based, and inference based, and drew some theoretical inferences; an important one is that the mentioned methods are not mutually exclusive. Finally, the work described in this paper adopted the approach that combines a pivot-based language model with deep learning [26].
3 Data Set 3.1 Background The Citizenship Amendment Act (Bill) agitation, also known as the CAA Protest, began after the Government of India enacted the Citizenship Amendment Act (CAA) on December 12, 2019. These protests saw a lot of activity on Twitter involving propaganda, bias, fake news and hate speech, with liberal use of images and videos, making them a good case study for techniques in disinformation research. Thus, the two data sets obtained for domain adaptation, a labelled source data set and an unlabelled target data set, were both on the online CAA protests.
3.2 Source Data Set A large part of the source data set was the labelled data set [6] of news articles from SemEval 2020's open shared task. This data set had 451 news articles manually annotated as propagandistic or non-propagandistic based on the 18 propaganda techniques. Since the accuracy of the PBLM depends on the number of common (pivot) elements in the source and target data sets, 41 additional articles on the CAA protests were scraped and added to this corpus. Out of these, 13 had propagandistic sentences, as detected by a model with 76% accuracy that was trained on a data set published by the 'Data Science Society' and manually verified. This data set also followed the same convention as the SemEval data set.
3.3 Target Data Set A data set of 597 tweets was scraped on the chosen topic. Based on the background research, eight highly propagandistic and two relatively neutral hashtags were identified. For each hashtag, tweets with appropriate tweet IDs were collected using Twitter application programming interfaces (APIs). The tweets were then manually annotated with reference to the 18 well-documented propaganda techniques, and after this process, 462 propagandistic and 135 non-propagandistic tweets were collected.
4 Methodology 4.1 Data Preparation The model utilizes the information from its labelled data at the sentence level to classify the unlabelled data. The following procedures were followed for the source and target.
4.1.1 Source
Each sentence of the 464 labelled articles was checked for the presence of a propagandistic span and written into two separate parsed extensible markup language (XML) files, containing propagandistic and non-propagandistic sentences, respectively.
4.1.2 Target
Tweets consist of hashtags, URLs, usernames and emojis. The emojis were converted to text, and the contents of the hashtags were stored. The 597 manually annotated tweets were then separated into two files as mentioned above, and the remaining unlabelled tweets were written into another file. The final data set is published on Zenodo (Rajmohan, M., Kamath, R., Reddy, A.P., Das, B. (2021). Data set for cross-domain propaganda detection in Indian social media, Version v1.0, https://doi.org/10.5281/zenodo.4726846) in accordance with the findability, accessibility, interoperability and reuse (FAIR) guiding principles [24].
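A sketch of this cleaning step is given below. The regular expressions and the use of the third-party `emoji` package's demojize function are assumptions about one way to implement it, not the authors' exact code.

```python
# Sketch of the tweet cleaning described above: emojis to text, hashtag
# contents kept, URLs and usernames stripped. The regexes and the use of
# the third-party `emoji` package are illustrative assumptions.
import re
import emoji  # pip install emoji

def preprocess_tweet(text):
    hashtags = re.findall(r"#(\w+)", text)       # store hashtag contents
    text = emoji.demojize(text)                  # emojis -> ':text:' tokens
    text = re.sub(r"https?://\S+", "", text)     # drop URLs
    text = re.sub(r"@\w+", "", text)             # drop usernames
    text = re.sub(r"#(\w+)", r"\1", text)        # keep hashtag words
    return text.strip(), hashtags

clean, tags = preprocess_tweet("Protest at gate 2! #CAAProtest @user http://t.co/x")
```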
4.2 Model This section elaborates on three baseline models and compares their results. The baseline 1 is a pre-trained BERT-based model without domain adaptation, baseline 2 is with PBLM domain adaptation, and baseline 3 is the enhanced PBLM model that additionally utilizes the emotion footprint.
4.2.1 Baseline 1
The first baseline is a model identifying propagandistic spans in a sentence using a fine-tuned BERT-based uncased model trained on the SemEval data set, with a single multilabel token classification head. The token classifier is built by adding a linear classification head to BERT's last layer. The BERT model used has 12 transformer layers with 110 million trained parameters, with the following hyper-parameters: batch size 64, sequence length 210, early stopping on the F1-score of the validation set with a patience value of 7, weight decay of 0.01, and the Adam optimizer with a learning rate of 3e−5. The model was trained for 20 epochs and was tested on the unlabelled tweets.
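A minimal sketch of such a token-classification setup, assuming the HuggingFace transformers API, is shown below; the hyper-parameters follow the text, the training loop is reduced to a single illustrative step, and the input sentence is a made-up example.

```python
# Minimal sketch of baseline 1, assuming the HuggingFace `transformers`
# API: bert-base-uncased with a token-classification head (binary labels
# for propagandistic / non-propagandistic tokens).
import torch
from transformers import BertTokenizerFast, BertForTokenClassification

tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
model = BertForTokenClassification.from_pretrained("bert-base-uncased", num_labels=2)
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-5, weight_decay=0.01)

batch = tokenizer(["crush the evil regime now"], return_tensors="pt",
                  padding="max_length", truncation=True, max_length=210)
labels = torch.zeros_like(batch["input_ids"])   # dummy per-token labels

outputs = model(**batch, labels=labels)          # loss + per-token logits
outputs.loss.backward()
optimizer.step()
```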
4.2.2 Baseline 2
Pivot-based language model (PBLM) [26] is a domain adaptation through representation learning (DReL) technique in which the source and target learn a structure-aware shared representation. The model uses a sequential neural network (LSTM) and outputs, for every input word, a context-dependent vector. The first phase takes the labelled instances, generates unigram and bigram features for each tweet, and computes the MI score giving the relevance of a feature to a particular label as a (feature, score) tuple. The second phase trains the shared representation by modifying the standard LSTM training process: the next word is predicted only if it is a pivot (as generated in the previous phase). The third phase trains the classifier by loading the previously trained PBLM model without the softmax layer (containing the structure-aware representation) and sequentially stacking it with a CNN and with an LSTM; the accuracies of both were recorded against the manually annotated tweets. Tables 1, 2 and 3 show the model architectures of PBLM, PBLM-CNN and PBLM-LSTM, respectively.

Table 1 PBLM model architecture
Layer (type) | Output shape | Param #
embedding (Embedding) | (None, 500, 256) | 2,560,000
lstm (LSTM) | (None, 500, 256) | 525,312
time_distributed (TimeDistributed) | (None, 500, 502) | 129,014

Table 2 PBLM-CNN model architecture
Layer (type) | Output shape | Param #
conv1d (Conv1D) | (None, 498, 250) | 192,250
max_pooling1d (MaxPooling1D) | (None, 250) | 0
dense (Dense) | (None, 1) | 251

Table 3 PBLM-LSTM model architecture
Layer (type) | Output shape | Param #
sequential_2 (Sequential) | (None, 500, 256) | 3,085,312
sentLSTM (LSTM) | (None, 256) | 525,312
dense_1 (Dense) | (None, 1) | 257
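A sketch of the PBLM-LSTM stack implied by Tables 1 and 3 follows, assuming Keras. The vocabulary size (10,000), the pivot output dimension (501 pivots plus a "none" class, giving 502), and the sequence length (500) are inferred from the reported parameter counts and layer shapes; they are assumptions, not confirmed settings.

```python
# Sketch of the PBLM pipeline implied by Tables 1 and 3, assuming Keras.
# Vocabulary size (10,000), pivot outputs (502) and sequence length (500)
# are inferred from the parameter counts and are assumptions.
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Embedding, LSTM, TimeDistributed, Dense

# Phase 2: PBLM language model -- predict, per position, which pivot
# (or "no pivot") the next word is.
pblm = Sequential([
    Embedding(10000, 256, input_length=500),
    LSTM(256, return_sequences=True),
    TimeDistributed(Dense(502, activation="softmax")),  # 501 pivots + none
])

# Phase 3: drop the softmax head, reuse the structure-aware representation,
# and stack a sentence-level LSTM classifier on top (PBLM-LSTM).
encoder = Sequential(pblm.layers[:-1])
classifier = Sequential([
    encoder,
    LSTM(256),                        # sentLSTM in Table 3
    Dense(1, activation="sigmoid"),   # propagandistic vs. not
])
classifier.compile(optimizer="adam", loss="binary_crossentropy")
```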
4.2.3 Baseline 3
This baseline additionally uses the emotion footprint [4], following Ekman's model of six basic emotions: anger, disgust, fear, joy, sadness and surprise. For each tweet, the embedding generated in the PBLM training phase is multiplied with its corresponding emotion vector; the matrix multiplication avoids a sparse matrix and accentuates the emotion signals. The obtained matrix is fed into an LSTM layer and finally passed through a dense layer with a sigmoid activation for the text classification.
Table 4 Enhanced PBLM model architecture
Layer (type) | Output shape | Param #
sentLSTM (LSTM) | (None, 256) | 269,312
dense_2 (Dense) | (None, 1) | 257
Below is the algorithm of the proposed solution; Table 4 shows its corresponding model architecture.

Algorithm 1 Enhanced PBLM
Input: tweet. Output: 0/1.
1: n ← number of tweets
2: m ← dimension of the emotion vector
3: d ← PBLM embedding dimension
4: X ← [x1, x2, …, xn], X ∈ ℝ^(n×d)
5: F ← []
6: for i ← 1 to n do
7:   E ← [e1, e2, …, em], E ∈ ℝ^(1×m)
8:   I ← xiᵀ × E, I ∈ ℝ^(d×m)
9:   F ← F.concatenate(I)
10:  i ← i + 1
11: end for
12: out ← PBLM_LSTM(F)
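A numpy sketch of the feature-fusion step in Algorithm 1 is shown below; the shapes follow the algorithm, while the data and dimensions are dummy placeholders.

```python
# Numpy sketch of Algorithm 1: fuse each tweet's PBLM embedding with its
# Ekman emotion vector via an outer product, then feed the stacked
# features to the PBLM-LSTM classifier. Data here is random dummy input.
import numpy as np

n, d, m = 4, 256, 6                 # tweets, embedding dim, emotions
X = np.random.rand(n, d)            # PBLM embeddings, one per tweet
E = np.random.rand(n, m)            # emotion vectors (anger ... surprise)

F = np.stack([np.outer(X[i], E[i]) for i in range(n)])  # shape (n, d, m)
# F is then passed to an LSTM layer followed by a sigmoid dense layer,
# e.g. classifier.predict(F) with an LSTM reading d timesteps of m features.
```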
5 Results From the obtained values, it was observed that features learnt from structured text such as news articles do not transfer well to a data set with a completely different distribution of words, such as Twitter. Thus, baseline 1 did not perform well on the target data set, obtaining an accuracy of 46%. From baseline 2, it was seen that, with domain adaptation, the PBLM-LSTM classifier performs better than PBLM-CNN, obtaining an accuracy of 63% compared to 59% for the latter; the PBLM-LSTM was therefore used for the enhanced model. Finally, baseline 3, the emotion-enhanced model, gave an accuracy of 75%, which shows that identifying emotions in tweets and coupling them with structure-aware representations can improve the overall performance of a propaganda detection model. Table 5 shows the results obtained.
Table 5 Accuracies of the baseline models calculated against the labelled test set
Model | Accuracy (%)
Baseline 1 (fine-tuned BERT) | 46
Baseline 2a (PBLM-CNN) | 59
Baseline 2b (PBLM-LSTM) | 63
Baseline 3 (PBLM-LSTM + emotion vector) | 75
6 Conclusion and Future Work Given that most text-based propaganda detection research has been on news articles, this work applies text-based propaganda detection techniques to length-limited social media text by addressing the domain gap with a pivot-based language model approach. Propaganda always utilizes emotion, and hence the addition of emotion footprints demonstrated a sizable increase in model accuracy. The logical next steps of this work lie in two directions: firstly, a semi-supervised method of domain adaptation can be investigated, and secondly, the same work can be combined with a network-based approach.
References
1. Abd Kadir, S., Lokman, A.M., Tsuchiya, T.: Emotion and techniques of propaganda in YouTube videos. Indian J. Sci. Technol. 9, S1 (2016)
2. Barrón-Cedeno, A., Da San Martino, G., Jaradat, I., Nakov, P.: Proppy: a system to unmask propaganda in online news. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 33, pp. 9847–9848 (2019)
3. Caldarelli, G., De Nicola, R., Del Vigna, F., Petrocchi, M., Saracco, F.: The role of bot squads in the political propaganda on Twitter. Commun. Phys. 3(1), 1–15 (2020)
4. Colnerič, N., Demšar, J.: Emotion recognition on Twitter: comparative study and training a unison model. IEEE Trans. Affect. Comput. 11(3), 433–446 (2018)
5. Cresci, S.: A decade of social bot detection. Commun. ACM 63(10), 72–83 (2020)
6. Da San Martino, G., Barron-Cedeno, A., Nakov, P.: Findings of the NLP4IF-2019 shared task on fine-grained propaganda detection. In: Proceedings of the Second Workshop on Natural Language Processing for Internet Freedom: Censorship, Disinformation, and Propaganda, pp. 162–170 (2019)
7. Da San Martino, G., Shaar, S., Zhang, Y., Yu, S., Barrón-Cedeno, A., Nakov, P.: Prta: a system to support the analysis of propaganda techniques in the news. In: Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics: System Demonstrations, pp. 287–293 (2020)
8. Ganin, Y., Ustinova, E., Ajakan, H., Germain, P., Larochelle, H., Laviolette, F., Marchand, M., Lempitsky, V.: Domain-adversarial training of neural networks. J. Mach. Learn. Res. 17(1), 2096–2030 (2016)
9. Ghanem, B., Ponzetto, S.P., Rosso, P., Rangel, F.: FakeFlow: fake news detection by modeling the flow of affective information. arXiv preprint arXiv:2101.09810 (2021)
10. Jiang, J.: Domain adaptation in natural language processing. Technical report, University of Illinois at Urbana-Champaign (2008)
11. Jowett, G.S., O'Donnell, V.: Propaganda & Persuasion. Sage Publications (2018)
12. Kadir, S., Lokman, A., Tsuchiya, T., Shuhidan, S.: Analysing implicit emotion and unity in propaganda videos posted in social network. In: Journal of Physics: Conference Series, vol. 1529, p. 022018. IOP Publishing (2020)
13. Kouw, W.M., Loog, M.: A review of domain adaptation without target labels. IEEE Trans. Pattern Anal. Mach. Intell. (2019)
14. Krishnamurthy, G., Gupta, R.K., Yang, Y.: SocCogCom at SemEval-2020 task 11: characterizing and detecting propaganda using sentence-level emotional salience features. arXiv preprint arXiv:2008.13012 (2020)
15. Morris, T.: Extracting and networking emotions in extremist propaganda. In: 2012 European Intelligence and Security Informatics Conference, pp. 53–59. IEEE (2012)
16. Nizzoli, L., Avvenuti, M., Cresci, S., Tesconi, M.: Extremist propaganda tweet classification with deep learning in realistic scenarios. In: Proceedings of the 10th ACM Conference on Web Science, pp. 203–204 (2019)
17. Nouh, M., Nurse, J.R., Goldsmith, M.: Understanding the radical mind: identifying signals to detect extremist content on Twitter. In: 2019 IEEE International Conference on Intelligence and Security Informatics (ISI), pp. 98–103. IEEE (2019)
18. Oliinyk, V.A., Vysotska, V., Burov, Y., Mykich, K., Basto-Fernandes, V.: Propaganda detection in text data based on NLP and machine learning. In: CEUR Workshop Proceedings, vol. 2631, pp. 132–144 (2020)
19. Orlov, M., Litvak, M.: Using behavior and text analysis to detect propagandists and misinformers on Twitter. In: Annual International Symposium on Information Management and Big Data, pp. 67–74. Springer (2018)
20. Patil, R., Singh, S., Agarwal, S.: BPGC at SemEval-2020 task 11: propaganda detection in news articles with multi-granularity knowledge sharing and linguistic features based ensemble learning. arXiv preprint arXiv:2006.00593 (2020)
21. Ramponi, A., Plank, B.: Neural unsupervised domain adaptation in NLP—a survey. arXiv preprint arXiv:2006.00632 (2020)
22. Rashkin, H., Choi, E., Jang, J.Y., Volkova, S., Choi, Y.: Truth of varying shades: analyzing language in fake news and political fact-checking. In: Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pp. 2931–2937 (2017)
23. Tundis, A., Mukherjee, G., Mühlhäuser, M.: Mixed-code text analysis for the detection of online hidden propaganda. In: Proceedings of the 15th International Conference on Availability, Reliability and Security, pp. 1–7 (2020)
24. Wilkinson, M.D., Dumontier, M., Aalbersberg, I.J., Appleton, G., Axton, M., Baak, A., Blomberg, N., Boiten, J.W., da Silva Santos, L.B., Bourne, P.E., et al.: The FAIR guiding principles for scientific data management and stewardship. Sci. Data 3(1), 1–9 (2016)
25. Yoosuf, S., Yang, Y.: Fine-grained propaganda detection with fine-tuned BERT. In: Proceedings of the Second Workshop on Natural Language Processing for Internet Freedom: Censorship, Disinformation, and Propaganda, pp. 87–91 (2019)
26. Ziser, Y., Reichart, R.: Pivot based language modeling for improved neural domain adaptation. In: Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers), pp. 1241–1251 (2018)
Target Identification and Detection on SAR Imagery Using Deep Convolution Network Anishi Gupta, D. Penchalaiah, and Abhijit Bhattacharyya
Abstract This paper captures the implementation of a real-time ship target detection algorithm on SAR imagery. The resolution of SAR images is independent of range, and SAR can operate in all-day and all-weather conditions. The features in SAR images, together with noise and clutter, make the detection of objects genuinely challenging. An open-source SAR-ship dataset is used for the detection task. Traditional (two-stage) detection methods generate region proposals followed by extraction of features; the process is slow, and the accuracy is unsatisfactory. Hence, a one-stage detector model (the YOLOv3-16 model with 16 convolution layers) is used, which outperforms YOLOv2 by providing 89.35% mean average precision (mAP) at 0.5. The customized YOLOv3-16 model outperforms YOLOv3 in terms of inference time with nearly comparable accuracy: the prediction time is ~100 ms for YOLOv3-16 compared to ~172 ms for YOLOv3 on GPU. NVIDIA GPU (K40) hardware is used for faster training of the model, with a speedup of ~5 times over CPU (Xeon). Keywords Convolution network · Intersection over union · Precision · Recall · SAR
1 Introduction Radar has long been used for military and non-military purposes in a wide variety of applications such as imaging, guidance, remote sensing and global positioning [1]. Synthetic aperture radar (SAR) is a technique which uses signal processing to improve the resolution beyond the limitation of the physical antenna aperture [2]. Since SAR is an active sensor that provides its own source of illumination, it can operate day or night, illuminate with a variable look angle, and cover wide areas.
A. Gupta (B) · D. Penchalaiah · A. Bhattacharyya: DRDL, DRDO, Hyderabad, India, e-mail: [email protected]
Fig. 1 One-stage detector (deep convolution network)
SAR imagery has significant contributions to military applications, as SAR images are all-weather and day–night capable, and the resolution of a SAR image is independent of range. These characteristics make it widely used in numerous military areas; it has a vital role in the field of situation awareness, such as target identification and detection. Detecting a target in SAR imagery requires a real-time object detection algorithm. A single-stage end-to-end pipeline is used for target identification and detection for speed and good precision, as depicted in Fig. 1. The aim of this paper is to classify and detect targets using the open-source SAR-ship dataset on a customized model. Since object detection is a very important part of artificial intelligence, a convolution neural network (CNN) is chosen for object detection and classification. A convolution neural network (CNN or ConvNet) is a class of deep, feedforward artificial neural network that has successfully been applied to analyzing visual imagery [3]. It has a uniquely superior capability for object recognition owing to its special structure of sharing local weights. A CNN basically includes convolution layers, pooling layers and fully connected layers. Current algorithms for object detection and classification are based on CNNs. In traditional detection [5] methods, extraction of proposals and features is done first, and classification [4] follows; this process is slow and error-prone. Some region-based [6] object detection algorithms, i.e. two-stage detectors, perform classification with a CNN after extracting proposals using the selective search method, and proposal extraction takes a relatively long time. A one-stage detector [7] implements detection and classification in a single pipeline. Here, the single-stage network is customized by reducing convolution layers for faster inference on embedded systems and is referred to as customized YOLOv3-16. The one-stage detector model (customized YOLOv3) outperforms YOLOv2 in terms of precision, and outperforms YOLOv3 in terms of prediction/inference time on an NVIDIA GPU (K40). For better accuracy, we fine-tuned the customized model through proper selection of anchors according to the size of the targets, the size of the training data, data preprocessing,
transfer learning, and parameter optimization. GPU hardware is used to speed up the training over CPU hardware. The paper is organized as follows: Section 2 deals with the literature survey related to object classification and detection for a variety of images, including occluded targets in different scenes. Section 3 illustrates real-time target detection on SAR imagery. The paper gives the conclusion and future work in Sect. 4.
2 Literature Survey Many excellent CNN models have come into the picture with the evolution of CNNs, starting with the birth of LeNet in 1998. In 2012, the ImageNet Large-Scale Visual Recognition Challenge (ILSVRC) was won by AlexNet using convolution networks; the remaining difficulty of object detection in images lies in the changes of illumination, angle of view and target appearance. In 2014, the R-CNN [8] framework made an excellent breakthrough in object detection. Based on convolution neural networks, SPP-Net [9], fast R-CNN [10], faster R-CNN [11], R-FCN [12] and mask R-CNN [13] arose one by one. Fast R-CNN and faster R-CNN implement location (extracting proposals) and classification separately; they locate objects accurately, but proposal extraction takes a large amount of time. From R-CNN to R-FCN, the object detection process has become smoother, more streamlined, more accurate and faster, but R-FCN algorithms are region based: to apply them, image proposals must first be produced. In 2016, Ji Feng et al. of MSRA proposed a one-stage object detector to improve the speed of object detection; it is not a region-based model and provides end-to-end service, with detection and classification done by a single convolution neural network. Recently, pedestrian detection has also been a key area of research, including pedestrian tracking [14] and robot navigation [15]. FPN [16] generates a multi-scale pyramid of feature maps, which helps in the detection of tiny objects with high precision. FPN has bottom-up and top-down pathways; the top-down pathway improves on the bottom-up one by making lateral connections between feature maps and generating pyramids of semantically strong features at all levels, including the high-resolution level. Single-stage networks such as G-CNN [17], YOLO [7], SSD [18], YOLOv2 [19] and DSSD [20] are regression-based models that are faster than two-stage region-based detectors. One such variant of YOLO is used as a reference in this paper.
3 Real-Time Target Detection for SAR Imagery Target identification and detection on SAR images is performed using a real-time object detection framework, customized YOLOv3. Dataset acquisition, annotation, anchor selection and methodology are described in the following subsections.
Fig. 2 Images from SAR-ship dataset [21]
3.1 Dataset The dataset is open-source SAR-ship dataset from sentinel-1 sensor with 3 m resolution. Figure 2 illustrates few images which are used for training the model. The training dataset contains 10k images, and validation dataset contains 1k images. The size of an image is 256 × 256 pixels.
3.2 Dataset Annotation The target in an image is manually annotated with Yolo_mark [22], an open-source image annotation tool. Figure 3 shows the graphical user interface (GUI) of the tool creating a bounding box on an object.
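Yolo_mark writes one line per object in the standard YOLO label format; a small helper for converting such a line back to pixel coordinates is sketched below, assuming that format and the 256 × 256 image size of the dataset.

```python
# Yolo_mark stores one line per object in the standard YOLO label format:
# "<class> <x_center> <y_center> <width> <height>", all normalized to [0, 1].
# Helper converting such a line back to pixel box corners (assumptions:
# standard YOLO format; 256x256 images as in the SAR-ship dataset).
def yolo_label_to_box(line, img_w=256, img_h=256):
    cls, xc, yc, w, h = line.split()
    xc, yc = float(xc) * img_w, float(yc) * img_h
    w, h = float(w) * img_w, float(h) * img_h
    return int(cls), (xc - w / 2, yc - h / 2, xc + w / 2, yc + h / 2)

print(yolo_label_to_box("0 0.5 0.5 0.25 0.125"))  # ship centered in the image
```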
3.3 Anchors Selection Three anchors per grid are chosen as hyperparameter for training the model. The anchors are taken in ration of 1:1, 1:2 and 2:1.
Fig. 3 Detection of ships
3.4 One-Stage Detector Deep Convolution Network The prime objective of reducing the convolution layers in the network is fast processing/inference on embedded systems/hardware; the algorithm is suited for drones and aircraft for on-the-fly target detection. The customized model therefore has 16 convolution layers in the network. The k-means++ technique is used for the selection of the anchors from the annotated SAR dataset, and three anchors per cell are selected for the model. With filters = (classes + coordinates + 1) × anchors [7], one category in our dataset, four coordinates, and three anchors, the last layers contain 18 filters. The optimization loss function is as follows:

$$\begin{aligned}
L ={}& \gamma_{coord} \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{obj} \left[ (x_i - \bar{x}_i)^2 + (y_i - \bar{y}_i)^2 \right] \\
&+ \gamma_{coord} \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{obj} \left[ \left(\sqrt{w_i} - \sqrt{\bar{w}_i}\right)^2 + \left(\sqrt{h_i} - \sqrt{\bar{h}_i}\right)^2 \right] \\
&+ \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{obj} \left( C_i - \bar{C}_i \right)^2 + \gamma_{noobj} \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{noobj} \left( C_i - \bar{C}_i \right)^2 \\
&+ \sum_{i=0}^{S^2} \sum_{c \in classes} \left( p_i(c) - \bar{p}_i(c) \right)^2
\end{aligned}$$
In the above formula, $(x_i, y_i)$ and $(\hat{x}_i, \hat{y}_i)$ denote the ground-truth and predicted centre coordinates of a bounding box, respectively; $(w_i, h_i)$ and $(\hat{w}_i, \hat{h}_i)$ are the true and predicted width and height of a box. $C_i$ and $\hat{C}_i$ are the true and predicted objectness scores (binary values, 0 or 1, according to whether a box contains an object). $p_i(c)$ and $\hat{p}_i(c)$ are the true and predicted probabilities of class $c$ in a particular grid cell. The confidence score of a bounding box in a grid cell is given by $\Pr(\text{Class}_i) \cdot \text{IOU}^{truth}_{pred}$, where $\Pr(\text{Class}_i)$ is the probability of class $i$ in that grid cell and $\text{IOU}^{truth}_{pred}$ is the intersection-over-union score between the ground-truth and predicted bounding boxes; its value lies between 0 and 1. A small sketch of this computation is given below.
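A minimal sketch of the IOU term in the confidence score, assuming boxes are given as (x1, y1, x2, y2) corner coordinates; this is an illustration, not the authors' code:

def iou(box_a, box_b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

# Confidence for a predicted box: Pr(class) * IOU with the ground truth.
confidence = 0.9 * iou((10, 10, 50, 50), (12, 14, 48, 52))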
4 Experimental Results Ship target identification and detection is carried out on an Nvidia K40 GPU under an Ubuntu distribution. The GPU has 2880 CUDA cores and 12 GB of GDDR memory. The workstation has an Intel(R) Xeon(R) CPU E5-2660 v3 at 2.60 GHz with 20 physical cores.
4.1 Performance Analysis of the Deep Convolution Model The model performance is evaluated in terms of mean average precision (mAP). The model provides 94.98% mAP on the validation dataset when images of size 416 × 416 are given to it for prediction, as depicted in Table 1. We validated the model (customized YOLOv3, 16 layers) on the validation dataset at a variety of image sizes, namely 416 × 416 and 544 × 544; the resulting mAP, recall and average IOU for the different input sizes are presented in Table 2.

Table 1 Results with the one-stage detector

Dataset           | YOLOv2, mAP at 0.5, 416 × 416 (%) | YOLOv3, mAP at 0.5, 416 × 416 (%) | Customized YOLOv3-16 layers, mAP at 0.5, 416 × 416 (%)
Generated dataset | 78.29                             | 89.67                             | 89.35
Table 2 Mean average precision, recall and average IOU at different input image sizes

Image input size | mAP at 0.5 (%) | Recall at 0.5 | Avg. IOU (%) | Prediction time, single image (ms)
416 × 416        | 89.35          | 0.85          | 68.98        | 100
544 × 544        | 89.50          | 0.89          | 69.92        | 128
Table 3 Comparison of training time of the model on CPU and GPU

Model/Architecture       | CPU         | Single GPU
Deep convolution network | ~39.84 days | ~8.3 days
Table 4 Comparison of prediction time of the model on CPU and GPU

Model/Architecture | CPU (ms) | YOLOv3, single GPU (ms) | Customized YOLOv3-16 layers, single GPU (ms)
Customized model   | ~550     | ~172                    | ~100
Increasing the image size on the validation dataset enhances mAP, recall and average IOU, at the cost of prediction time. Note: mAP at 0.5 and mAP at 0.6 mean the mean average precision at IOU thresholds of 0.5 and 0.6, respectively.
4.2 Faster Training of the Model on GPU Architecture For faster training, we trained the model on both CPU and GPU (NVIDIA K40) on the SAR dataset, achieving a ~5× speedup on the GPU compared with the CPU, as depicted in Table 3. The prediction time is approximately 100 ms per image with the YOLOv3-16 layers model, compared with 172 ms with the standard YOLOv3 model, tested on a single GPU (K40), as reflected in Table 4.
4.3 Testing of the Model with Validation Images Figure 3 shows the detected targets on a few images of the validation dataset. The size of the input image given to the prediction model is 416 × 416 pixels.
5 Conclusion and Future Work In this paper, we implemented real-time target detection on SAR imagery. The strength of SAR imagery is that it remains usable day and night and in all weather conditions such as fog, rain and haze, so it is operational at all times; however, its noise, clutter and feature characteristics make target detection in an image a real challenge. The single-stage target detector detects targets in real time at 11 fps. The model provides 89.35% mAP at 0.5 on the validation dataset with image size 416 × 416 given to the customized model. The prediction time is ~100 ms per image using the YOLOv3-16 layers model, compared with 172 ms using the YOLOv3 model, tested on a GPU (K40). Training on a single GPU achieves a high speedup compared with the CPU (Xeon) architecture. Fine-tuning the model requires less training time than training without fine-tuning on the single GPU (K40) architecture, and also enhances the precision of the model by ~2% mAP. There is always room for improvement; prediction time may be further reduced on SAR images through better fine-tuning of the model.
References
1. Skolnik, M.I.: Radar Handbook. McGraw-Hill, New York (1970)
2. Curlander, J.C., McDonough, R.N.: Synthetic Aperture Radar: Systems and Signal Processing. Wiley, New York (1991)
3. LeCun, Y.: LeNet-5, convolutional neural networks. Retrieved 16 Nov 2013
4. Felzenszwalb, P.F., Girshick, R.B., McAllester, D., et al.: Object detection with discriminatively trained part-based models. IEEE Trans. Pattern Anal. Mach. Intell. 32(9), 1627 (2010)
5. Felzenszwalb, P., McAllester, D., Ramanan, D.: A discriminatively trained, multiscale, deformable part model 8, 1–8 (2008)
6. Shrivastava, A., Gupta, A., Girshick, R.: Training region-based object detectors with online hard example mining 761–769 (2016)
7. Redmon, J., Divvala, S., Girshick, R., et al.: You only look once: unified, real-time object detection 779–788 (CVPR, 2016) (2015)
8. Girshick, R., Donahue, J., Darrell, T., et al.: Rich feature hierarchies for accurate object detection and semantic segmentation. Comput. Sci. 580–587 (CVPR, 2014) (2013)
9. He, K., Zhang, X., Ren, S., Sun, J.: Spatial pyramid pooling in deep convolutional networks for visual recognition. IEEE Trans. Pattern Anal. Mach. Intell. 37(9), 1904–1916 (2015)
10. Girshick, R.: Fast R-CNN. Comput. Sci. (2015)
11. Ren, S., He, K., Girshick, R., et al.: Faster R-CNN: towards real-time object detection with region proposal networks. IEEE Trans. Pattern Anal. Mach. Intell. 39(6), 1137 (2017)
12. Li, Y., He, K., Sun, J., et al.: R-FCN: object detection via region-based fully convolutional networks. In: NIPS, 2016, pp. 379–387 (2016)
13. He, K., Gkioxari, G., Dollár, P., Girshick, R.B.: Mask R-CNN. In: ICCV (2017)
14. Jiang, Z., Huynh, D.Q.: Multiple pedestrian tracking from monocular videos in an interacting multiple model framework. IEEE Trans. Image Process. 27, 1361–1375 (2018)
15. Rinner, K.B., Cavallaro, A.: Cooperative robots to observe moving targets: review. IEEE Trans. Cybern. 48, 187–198 (2018)
16. Lin, T.-Y., Dollár, P., Girshick, R., He, K., Hariharan, B., Belongie, S.: Feature pyramid networks for object detection. arXiv:1612.03144v2 [cs.CV], 19 Apr 2017
17. Najibi, M., Rastegari, M., Davis, L.S.: G-CNN: an iterative grid based object detector. In: CVPR, 2016
18. Liu, W., Anguelov, D., Erhan, D., Szegedy, C., Reed, S., Fu, C.-Y., Berg, A.C.: SSD: single shot multibox detector. In: ECCV, 2016
19. Redmon, J., Farhadi, A.: YOLO9000: better, faster, stronger. arXiv:1612.08242, 2016
20. Fu, C.Y., Liu, W., Ranga, A., Tyagi, A., Berg, A.C.: DSSD: deconvolutional single shot detector. arXiv:1701.06659, 2017
21. github.com/CAESAR-Radi/SAR-Ship-Dataset
22. github.com/alexeyab/Yolo_mark
Lung Cancer Detection by Classifying CT Scan Images Using Grey Level Co-occurrence Matrix (GLCM) and K-Nearest Neighbours Aayush Kamdar, Vihaan Sharma, Sagar Sonawane, and Nikita Patil
Abstract Lung cancer is the unbridled growth of abnormal cells in the lungs; as these abnormal cells continue to grow, tumours form that obstruct the natural functioning of the lung. Early cancer diagnosis, combined with treatment and proper medical care, enhances survival and cure rates. This study of lung cancer detection is divided into four stages: a pre-processing stage, an image enhancement stage, a feature extraction stage and a cancer classification stage. The system focuses on detecting lung cancer with various image processing and machine learning techniques. It accepts input in the form of a computerized tomography (CT) scan image, a medical screening technique used to study and detect lung cancer. This study aims to classify lung cancer as benign or malignant using CT scan images. Testing the system on the given dataset has shown a classification accuracy of 92.37% for determining benign or malignant cancer. Keywords Lung cancer detection · CT scan images · Image processing · Machine learning · Feature extraction · Cancer classification
1 Introduction Cancer is frequently termed an incurable and excruciating disease. The National Cancer Registry Programme Report 2020 projects 13.9 lakh cancer cases in India by the end of 2020 [1], and this figure is expected to increase by 13% in five years. Lung cancer is known to spread fast, even to neighbouring organs. There are two types of lung cancer tumours: benign (non-cancerous) and malignant (cancerous).
A. Kamdar (B) · V. Sharma · S. Sonawane · N. Patil Department of Computer Engineering, Atharva College of Engineering, Mumbai, India e-mail: [email protected] N. Patil e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 S. Roy et al. (eds.), Innovations in Computational Intelligence and Computer Vision, Advances in Intelligent Systems and Computing 1424, https://doi.org/10.1007/978-981-19-0475-2_27
Lung cancer in India constitutes 6.9% of new cancer cases and accounts for 9.3% of all cancer deaths across both genders [2], making it a major cause of death. However, it is a misbelief that all forms of cancer are incurable and deadly: early treatment can greatly increase a patient's survival rate. Our goal is to design a system that, through image processing and machine learning technologies, can help medical practitioners by reducing the time needed to detect and diagnose the cancer. This system aims to improve existing medical technology for detecting whether a patient has a benign or malignant tumour. CT scans are the most popular and widely used imaging diagnostic techniques for the detection and diagnosis of lung cancer [3]. Lung CT scans are used as input to the system and, to study the images better, we convert them into grayscale. We then apply an image enhancement technique that reduces the noise in the input image. The enhanced images are taken as input for feature extraction to obtain statistical data about the image, which is then used for image classification. This classification determines whether the lung tumour is benign or malignant based on the statistical data of the training dataset. This paper is organized as follows: Section 2 summarizes the relevant work; Section 3 and its subsections define the problem and the proposed method; the analysis of the proposed method and its results is presented in Sect. 4; and Sect. 5 brings the study to a close.
2 Literature Survey A study conducted by Qurina Firdaus [4] uses GLCM and SVM methods to detect lung cancer in a six-stage process comprising input CT scan images, pre-processing, segmentation, feature extraction, classification and output indicating whether the case is normal, benign or malignant. A variety of methods are employed at each stage to obtain accurate detection results; the accuracy of this implementation in detecting lung cancer is 83.33%. Mohd Firdaus Abdullah's research [5] uses a similar process but performs image processing and classification with the kNN method; after analysing and implementing this approach, they found kNN to be the better method with higher accuracy. Sanjukta Rani Jena's research [6] focuses on three types of feature extraction: shape-based, texture-based and intensity-based features. Ayshath Thabsheera [7] conducted a review of image enhancement and classification techniques used for lung cancer detection. Many image enhancement techniques were reviewed, including the Gabor filter and the median filter, along with segmentation techniques such as thresholding; a summary of feature extraction was also provided, with an explanation of a few of the extracted features.
The GLCM feature extraction method is used in the study by Kusworo Adi [8], where data is extracted from the input using GLCM features such as contrast, correlation, energy and homogeneity. In another study, Avinash [9] uses the Gabor filter and watershed segmentation for image enhancement and segmentation; the goal of that study is to show that this method can be used to diagnose lung cancer quickly. The authors of the study by Khin Mya Mya Tun [10] use the median filter for image pre-processing, followed by Otsu's thresholding for image segmentation; GLCM is used to extract features, while artificial neural networks (ANN) are used to classify images.
3 Proposed Methodology 3.1 Problem Definition Cancer is a catastrophic disease that sets panic into people, as there is no clear way to deal with it. Firstly, the majority of people do not know the early symptoms of various cancers and, as a result, tend to ignore them until the cancer worsens. The prognosis for survival when cancer is detected in advanced stages is poor. Patient records are hard to maintain because of the many tests, diagnostics and past medical documents involved. In the current system, doctors spend more time making decisions because the treatment of cancer is not definitive, and existing standard diagnostic procedures are time-consuming and error-prone. In a world where artificial intelligence is becoming integrated with everything, medicine should be at its forefront; yet there are very few methods for predicting diagnoses of various diseases with the aid of AI and machine learning techniques.
3.2 Proposed System The system will take lung CT scan images as input and then go through four stages: pre-processing, image enhancement and segmentation, feature extraction, and image classification, as shown in Fig. 1.
Fig. 1 Proposed system
3.2.1 Input Image
A computerised tomography (CT) scan is a medical imaging technique that produces cross-sectional (slice) images of a specific body part; it combines a variety of X-ray images taken from around the focused body part. The system takes the CT scan input in the form of a DICOM image, which is converted into a JPEG image, as this is easier to study and analyse; a conversion sketch is given below.
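A minimal sketch of the DICOM-to-JPEG conversion described here, assuming the pydicom and OpenCV libraries (the paper does not name its tooling); file names are hypothetical:

import numpy as np
import pydicom
import cv2

# Read the CT slice and pull out the raw pixel array.
ds = pydicom.dcmread("ct_slice.dcm")          # hypothetical input path
pixels = ds.pixel_array.astype(np.float32)

# Rescale intensities to 8-bit so the slice can be saved as JPEG.
pixels -= pixels.min()
pixels = (255.0 * pixels / max(pixels.max(), 1.0)).astype(np.uint8)
cv2.imwrite("ct_slice.jpg", pixels)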
3.2.2 Pre-processing
The input image is now converted into a grayscale image to ease the further processing. We convert the image to grayscale to reduce the complexity of the image and to extract the intensity data; it is also faster to process a grayscale image than a coloured one.
3.2.3 Image Enhancement
The next phase of the solution is image enhancement, which is used to improve pixel intensity, reduce noise and simplify further processing. We use the Gabor filter, an adaptive filtering technique that uses bandpass filters for applications such as texture analysis and feature extraction. These filters apply a certain band of frequencies to an image; the Gabor parameters in (1), such as σ, λ, γ and ψ, define what is called a Gabor kernel. This kernel spreads the filter over two dimensions according to the parameter values passed to it, giving a filtered image from which textures can be read and features extracted.

$$ G(x, y; \lambda, \theta, \psi, \sigma, \gamma) = \exp\!\left(-\frac{x'^2 + \gamma^2 y'^2}{2\sigma^2}\right) \exp\!\left(i\left(2\pi \frac{x'}{\lambda} + \psi\right)\right) \qquad (1) $$

where $x'$ and $y'$ are the coordinates rotated by $\theta$. After this, to separate the target object from the background, we utilise image thresholding, an effective segmentation technique for images with considerable differences in intensity between the backdrop and the main object. This method is used to find cancerous areas and to convert the grayscale image into a binary (black and white) image. A sketch of this enhancement step is given below.
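A minimal sketch of this enhancement step with OpenCV, under assumed parameter values (the paper does not list the exact σ, λ, γ and ψ it uses); Otsu's thresholding is shown as one common choice for the binarization:

import cv2

gray = cv2.imread("ct_slice.jpg", cv2.IMREAD_GRAYSCALE)

# Build a Gabor kernel G(x, y; lambda, theta, psi, sigma, gamma) per Eq. (1).
kernel = cv2.getGaborKernel(ksize=(31, 31), sigma=4.0, theta=0.0,
                            lambd=10.0, gamma=0.5, psi=0.0)
filtered = cv2.filter2D(gray, -1, kernel)

# Threshold the filtered image to a binary mask separating the
# candidate cancerous regions from the background.
_, binary = cv2.threshold(filtered, 0, 255,
                          cv2.THRESH_BINARY + cv2.THRESH_OTSU)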
3.2.4 Feature Extraction
After enhancement, the images are passed into the feature extraction stage. Feature extraction extracts vital features such as energy, entropy and contrast, which help in the subsequent cancer classification stage. For feature extraction, we employ the grey-level co-occurrence matrix (GLCM) on the enhanced images. GLCM texture characteristics are extensively employed in image classification problems; GLCM is a simple yet effective method to extract features from a grey-level image using second-order statistics. A co-occurrence matrix counts, over the image, how often pairs of pixel values occur together at a given offset. Essentially, it is a texture filter function that provides a statistical interpretation of texture for a given image histogram. Here, the offset distance is one pixel and the angles are 0, π/4, π/2 and 3π/4. The properties used are energy (2), correlation (3), dissimilarity (4), homogeneity (5), ASM (6) and entropy (7):

Energy: $\sum_{i,j} P(i,j)^2$  (2)

Correlation: $\sum_{i,j} \frac{(i-\mu_i)(j-\mu_j)\,P(i,j)}{\sigma_i \sigma_j}$  (3)

Dissimilarity: $\sum_{i,j=0}^{N-1} P_{i,j}\,|i-j|$  (4)

Homogeneity: $\sum_{i,j=0}^{N-1} \frac{P_{i,j}}{1+(i-j)^2}$  (5)

ASM: $\sum_{i,j=0}^{N-1} P_{i,j}^2$  (6)

Entropy: $\sum_{i,j=0}^{N-1} -P_{i,j}\ln P_{i,j}$  (7)
The statistical data, i.e. the extracted features obtained for an image, are saved in a CSV file, and the data saved in the CSV file is used for analysis and image classification. A sketch of this extraction step is given below.
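A minimal sketch of the GLCM feature computation with scikit-image, assuming the modern graycomatrix/graycoprops API (older releases spell these greycomatrix/greycoprops); entropy is computed directly from the normalized matrix per Eq. (7):

import numpy as np
from skimage.feature import graycomatrix, graycoprops

def glcm_features(image):
    """image: 2-D uint8 array (the enhanced CT slice)."""
    angles = [0, np.pi / 4, np.pi / 2, 3 * np.pi / 4]
    glcm = graycomatrix(image, distances=[1], angles=angles,
                        levels=256, symmetric=True, normed=True)
    feats = {prop: float(graycoprops(glcm, prop).mean())
             for prop in ("energy", "correlation", "dissimilarity",
                          "homogeneity", "ASM")}
    ent = []
    for k in range(glcm.shape[3]):        # one entropy value per angle
        p = glcm[:, :, 0, k]
        p = p[p > 0]
        ent.append(float(-np.sum(p * np.log(p))))
    feats["entropy"] = float(np.mean(ent))
    return feats

The returned dictionary can then be written out as one CSV row per image, as the text describes.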
3.2.5 Image Classification
The final stage is image classification, where the lung cancer image is classified as benign or malignant. This is achieved using a kNN machine learning model: the CSV files generated in the previous stage are passed to the kNN model for classification. kNN, or k-nearest neighbours, is a supervised machine learning model whose basic idea is to find the similarity between the training cases and new test cases, and to assign new test cases to the category of their most similar training cases. First, the training dataset is passed to the model to train it; then, the test dataset is passed to the trained model to predict, or classify, the cancer. When the kNN model is applied to the test cases, the input is the statistical interpretation generated in the feature extraction stage, and these statistical values are compared with the training-case values stored in a CSV file. The output is either 0 or 1, where 0 denotes benign lung cancer and 1 denotes malignant lung cancer; a sketch of this stage is given below.
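A minimal sketch of the kNN stage with scikit-learn, assuming the features have been saved to CSV as described above; the file names and the choice of k are illustrative, as the paper does not state them:

import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Columns (per Table 1): label T first, then energy, correlation,
# dissimilarity, homogeneity, ASM, entropy; 0 = benign, 1 = malignant.
train = np.loadtxt("train_features.csv", delimiter=",", skiprows=1)
test = np.loadtxt("test_features.csv", delimiter=",", skiprows=1)
X_train, y_train = train[:, 1:], train[:, 0]
X_test, y_test = test[:, 1:], test[:, 0]

knn = KNeighborsClassifier(n_neighbors=5)   # k is a tunable choice
knn.fit(X_train, y_train)
print("accuracy:", knn.score(X_test, y_test))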
4 Experimental Results and Analysis CT scan images are initially converted to grayscale as a pre-processing stage in order to better study and interpret the images (Fig. 2). These grayscale images are then passed through the image enhancement stage, which uses the Gabor filter; it reduces image noise and improves image texture to help extract the features. The output of the filtered image is shown in Fig. 3.
Fig. 2 Grayscale image
Fig. 3 Gabor filtered image
After enhancing the image, we move to the image segmentation stage, where we use a thresholding technique to convert the image into a binary image, as shown in Fig. 4. Segmentation partitions the image into multiple image segments. These segments
Fig. 4 Thresholding
Fig. 5 Contour segmentation
Table 1 Features extracted for types of cases (T)—benign (0) and malignant (1)

T | Energy   | Correlation | Dissimilarity | Homogeneity | ASM     | Entropy
0 | 0.181989 | 0.737039    | 12.62417      | 0.319139    | 0.03312 | 4.90813
0 | 0.146102 | 0.7344      | 11.85163      | 0.286115    | 0.21346 | 4.920787
1 | 0.13742  | 0.875047    | 7.780956      | 0.361196    | 0.01888 | 5.009644
1 | 0.139809 | 0.885935    | 7.47172       | 0.36514     | 0.01955 | 5.013208
define the tumour region, making it meaningful and easier to analyse, as seen in Fig. 5, where certain image parts are selected and a contour is placed over them. The next step is feature extraction, where we compute the different features for all the images, namely energy, entropy, homogeneity, dissimilarity, correlation and ASM (angular second moment). The features are extracted at every angle and at every pixel offset set by the parameters. The values of these features are calculated using the GLCM method and saved in a CSV file; Table 1 shows the values of each feature calculated by the GLCM method for different CT images. For classification of the lung cancer, we first train the model on the above-extracted features from the training dataset. Then, on passing the test dataset to the trained kNN model, the system achieves an accuracy of 92.37%. The study concludes that the accuracy could be increased even further with an improved image segmentation process.
5 Conclusion Knowledge of the early symptoms and diagnosis of cancer is vital for the prevention and cure of cancer. Thus, to help in the treatment of cancer, we have decided to improve pre-existing methods and to make a system that can be
of great use in the treatment process. This study can help hospitals detect cancer through image processing, providing better results. The system takes a minimal amount of time to complete the detection of cancer, which may help doctors provide better treatment care. We have combined image enhancement, feature extraction and machine learning methods, resulting in an accuracy of 92.37%.
References
1. Healthworld—Economic Times. https://health.economictimes.indiatimes.com/news/diagnostics/india-to-have-13-9-lakh-cancer-cases-by-year-end-15-7-lakh-by-2025-icmr/78572244. Date of visit: 02/09/2020
2. Lung Cancer: Prevalent Trends and Emerging Concepts. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4405940/. Date of visit: 02/09/2020
3. National Foundation for Cancer Research. https://www.nfcr.org/cancer-types/lung-cancer/. Date of visit: 04/08/2020
4. Firdaus, Q., Sigit, R., Harsono, T., Anwar, A.: Lung cancer detection based on CT scan images with detection features using gray level co-occurrence matrix (GLCM) and support vector machine (SVM) methods. 2020 International Electronics Symposium (IES), Surabaya, Indonesia, 2020, pp. 643–648. https://doi.org/10.1109/IES50839.2020.9231663
5. Firdaus Abdullah, M., Noraini Sulaiman, S., Khusairi Osman, M., Karim, N.K.A., Lutfi Shuaib, I., Danial Irfan Alhamdu, M.: Classification of lung cancer stages from CT scan images using image processing and k-nearest neighbours. 2020 11th IEEE Control and System Graduate Research Colloquium (ICSGRC), Shah Alam, Malaysia, 2020, pp. 68–72. https://doi.org/10.1109/ICSGRC49013.2020.9232492
6. Jena, S.R., George, T., Ponraj, N.: Texture analysis based feature extraction and classification of lung cancer. In: 2019 IEEE International Conference on Electrical, Computer and Communication Technologies (ICECCT), Coimbatore, India, 2019, pp. 1–5. https://doi.org/10.1109/ICECCT.2019.8869369
7. Ayshath Thabsheera, A.P., Thasleema, T.M., Rajesh, R.: Lung cancer detection using CT scan images: a review on various image processing techniques. In: Nagabhushan, P., Guru, D., Shekar, B., Kumar, Y. (eds.) Data Analytics and Learning. Lecture Notes in Networks and Systems, vol. 43. Springer, Singapore (2019). https://doi.org/10.1007/978-981-13-2514-4_34
8. Adi, K., Widodo, C., Widodo, A., Gernowo, R., Pamungkas, A., Syifa, R.: Detection lung cancer using gray level co-occurrence matrix (GLCM) and back propagation neural network classification. J. Eng. Sci. Technol. Rev. 11, 8–12 (2018). https://doi.org/10.25103/jestr.112.02
9. Avinash, S., Manjunath, K., Kumar, S.S.: An improved image processing analysis for the detection of lung cancer using Gabor filters and watershed segmentation technique. In: 2016 International Conference on Inventive Computation Technologies (ICICT), Coimbatore, India, 2016, pp. 1–6. https://doi.org/10.1109/INVENTIVE.2016.7830084
10. Khin, M., Mya, T., Aung Soe, K.: Feature extraction and classification of lung cancer nodule using image processing techniques. Int. J. Eng. Res. Technol. (IJERT) 3(3) (2014)
Real-Time Translation of Indian Sign Language to Assist the Hearing and Speech Impaired S. Rajarajeswari, Naveen Mathews Renji, Pooja Kumari, Manasa Keshavamurthy, and K. Kruthika
Abstract The most predominant mode of communication among people with hearing and speech impairment is sign language. If other individuals are not equally skilled in sign language, there is a communication barrier between them, making the availability of an interpreter/translator immensely important for communication. Efficient recognition of gesture-based communication hence becomes essential to overcome this obstacle between people with hearing/speech impairment and people without such impediments. The ISL gesture motions are recorded using a mobile camera, and these pictures are then processed appropriately by our model. The user receives the output returned by the SoftMax layer as the most probable class after comparing confidence scores. The model created as part of this project achieved an accuracy of 91.84% under the adaptive thresholding filter, and an accuracy of 92.78% under the hybridized SIFT mode with adaptive thresholding filter. Keywords Image processing · Semantic segmentation · Machine learning (ML) · Convolutional neural networks (CNN) · Indian sign language (ISL) · Android studio · Java · Adaptive thresholding filter · Deep learning (DL) · Artificial neural network (ANN) · Scale-invariant feature transform (SIFT)
1 Introduction In reality, there are just around 250 certified sign language interpreters in India, interpreting for a deaf and mute population of between 1.7 million and 7.5 million. To reduce this reliance of people with hearing and speech handicap on interpreters, and to bridge the communication barrier, we propose an ISL (Indian Sign Language) interpretation framework which recognizes communication via hand signs and hand motions from pictures or
303
304
S. Rajarajeswari et al.
recordings, changes them over into their corresponding message, utilizing different machine/deep learning algorithms and yields it as speech or speech. Current pattern recognition and image processing capacities utilized in machine/deep learning in tandem with the utilization of convolutional neural networks on a custom-generated dataset of pictures have helped us in making a model that perceives the Indian Sign Language motions with high accuracy. Hand signs and facial expressions are the key features involved in sign language communication. Gesture-based communication is one of the most essential and widely used manners of correspondence for humans with hearing and speech handicap. However, not every person is well versed in the subtleties of gesture-based communication. Our objective is to close the correspondence gap between the two, thereby empowering the said parties to converse sans any problem. We attempt to make a model that can precisely perceive and decipher motions of the gesturebased communication to text/speech. As gesture-based languages differ territorially, it becomes a difficult task to communicate. Additionally, as the vast majority of people are not trained or even familiar with hand signs and gestures of sign languages, the need to have a form of translation increases. The framework ought to dynamically follow body developments, for example, finger positions, hand signals, arm developments concerning their body, head position, its development and outward appearances. The identified gesture is iteratively compared to the Indian Sign Language hand signs. The class which gets the most elevated certainty scores is utilized to decipher the motions and articulations to a method of communication comprehensible by people without hearing and speech impedance. The model will follow dynamic along with static hand signals including finger and hand position and arm development motions to interpret the letter, word or sentence in communication via gestures. The paper initially explores the various related works that have been performed which is followed by the main focus of the article that involves issues, problems, methodology, results and ends with conclusion. The methodology describes in detail the dataset gathering and processing, model building, training and its testing as well the algorithmic information and finally a brief on the android app.
2 Background A significant number of approaches have been proposed to solve the problem of gesture-based language translation. In any case, a dominant part of these makes use of MATLAB for image processing, followed by calculations through ML. Further, artificial neural networks have been capable of developing simulations to achieve the translation. Since sign language consists of various movements and gesture of hand; therefore, the accuracy of sign language depends on the accurate recognition of hand gesture. Vision-based approaches require a camera for capturing the hand gesture or body part gesture. This gesture in the form of video or images is then given to the computer for recognition purposes. This work is seen in [1] Vision Based Hand
Gesture Recognition Using Fourier Descriptor for Indian Sign Language—Archana Ghotkar, Pujashree Vidap and Santosh Ghotkar, Signal & Image Processing: An International Journal (SIPIJ), Vol. 7, No. 6, December 2016. Though this approach looks simple, it actually has many challenges, such as background problems, variation in lighting conditions and skin colour. Recognition time, computational complexity and robustness are some of the constraints posed by the system. In vision-based techniques, bare hands are used to extract the information required for recognition; the advantage of this method is that it is simple, natural and interacts directly with the user through the computer. Necessary features are extracted from the segmented hand region of the video frames and stored in the database for training. Glove-based methods, by contrast, are expensive due to the use of sensory gloves. In the coloured-marker approach, coloured marker gloves are worn on the hand, and the colour of the glove is used to track the movement and position of the hand; its limitation is that it is likewise not a natural method of human-computer interaction. A survey of recent hand gesture recognition systems, with the key issues and challenges of gesture systems, is presented in [2] Hand Gesture Recognition: A Literature Review—Rafiqul Zaman Khan and Noor Adnan Ibraheem, International Journal of Artificial Intelligence & Applications (IJAIA), Vol. 3, No. 4, July 2012. The methods used are parametric and non-parametric techniques: the Gaussian model and the Gaussian mixture model are parametric, while histogram-based techniques are non-parametric. Some research overcomes the background problem using data gloves and coloured markers, which provide exact information about the orientation and position of the palm and fingers. Others used infrared cameras and range information generated by a special time-of-flight camera; although these systems can detect different skin colours under a cluttered background, they are affected by changing temperatures, besides their high cost. Some preprocessing operations, such as subtraction, edge detection and normalization, are applied to enhance the segmented hand image. A good segmentation process leads to good feature extraction, and the latter plays an important role in a successful recognition process. Feature vectors of the segmented image can be extracted in different ways according to the particular application. Considering the limitations of the data glove/sensor-based approach, a vision-based hand gesture recognition system for Indian Sign Language was designed in [3] Hand Gesture Recognition for Sign Language Recognition: A Review—Swapnil Chandel and Akanksha Awasthi, IJSTE—International Journal of Science Technology & Engineering, Vol. 2, Issue 10, April 2016. The system considered manual alphabets and numbers for recognition, with a text and voice interface for the recognized gestures. The proposed algorithm handled both single- and two-handed static ISL alphabets with a single normal web camera. Recognition was done using nearest-neighbour classification with Euclidean distance. A dataset of five persons, with a total of 780 signs covering the A–Z alphabet, was considered, and two datasets were used to test the algorithm.
With the NMC approach, Dataset 1, which contained sample images with varying distance, different illumination and variable hand shapes of different persons, gave an average true positive rate of
76%, whereas Dataset 2, which contained similar shapes for signs of different persons at a fixed distance, achieved up to a 79.23% true positive rate. The recognition result can be improved by adding more robust features and an efficient classifier; the results show that clean separation and identical lighting conditions definitely improve the true positive rate. Here, the results for the ISL manual alphabet are presented and analysed for the Fourier descriptor with two classifiers. The system is suitable for both single-handed and two-handed gestures. To increase recognition efficiency, contour-based as well as region-based shape descriptors need to be considered. A hybrid feature descriptor, which consolidates the benefits of the SURF and Hu moment invariant techniques, is utilized as a combined feature set to accomplish a good recognition rate along with low time complexity in [4] Hand Gesture Spotting Using Sign Language through Computer Interfacing—Neha S Chourasia and Sampa Barman, Journal of Engineering Research and Applications, January 2014, ISSN: 2248-9622, Volume 4, Issue 1 (Version 2), Department of Electronics and Communications Engineering, RTMNU (Nagpur). Derived features from the available feature set are introduced to make the system resilient to viewpoint variations. K-nearest neighbour (KNN) and support vector machine (SVM) are used for hybrid classification of single signed letters. That paper presents a procedure which recognizes Indian Sign Language (ISL) and converts it into normal text. The technique comprises three phases, namely a training stage, a testing stage and a recognition stage. To create a new feature vector for sign classification, combinational parameters of Hu invariant moments and structural shape descriptors are created; experimental results have shown a success rate of 96%. A vision-based approach for converting dynamic gestures into text and then voice is proposed in [5] by Twinkal H. Panchal and Pradip R. Patel in their paper 'A Novel Approach of Sign Recognition for Indian Sign Language', published in IJIRSET in 2018. The proposed methodology uses video preprocessing, skin colour filtering, key frame extraction with histogram analysis, feature extraction and classification techniques. In preprocessing, skin colour filtering is used for noise removal and hand detection; in feature extraction, eigenvalues and eigenvectors are used to extract the image features; and in classification, Euclidean distance is used to recognize the sign. Since most work has been done on static sign recognition, this is a good approach, but more work needs to be done on words and sentences of ISL. A system to facilitate communication between hearing and deaf-mute people was proposed in [6] by Channaiah Chandana K, Nikhita K, Nikitha P, Bhavani N K and Sudeep J in their paper 'Hand Gestures Recognition System for Deaf, Dumb and Blind People', published in the International Journal of Innovative Research in Computer and Communication Engineering. The system uses MATLAB for hand gesture detection. After detection, an ID is generated, which is processed by a microcontroller and sent to the FM_MI6P unit, where it is matched with the commands stored on the SD card; the speaker then outputs the voice of the respective command. Only red and green combinations are recognized by the webcam, hence coloured tapes were used, which is a drawback of the system.
Image processing techniques such as blurring, masking and eroding, along with logic such as analysing the hue, saturation and value (HSV) of the human hand, were used in [7] by Rishad E K, Vyshakh C B and Shameer U Shahul in their paper 'Gesture Controlled Speaking Assistance for Dump and Deaf', published in the IOSR Journal of Computer Engineering, to process the real-time stream of video data captured by the Raspberry Pi's camera; the processed visual input was then mapped to its corresponding audio, which was further amplified using an amplifier. The efficiency of the system was tested to be 70%. Variation in lighting, correct positioning of the hand, HSV variation from person to person and the limited real-time processing capability of the Raspberry Pi were a few of the drawbacks. Captured images were preprocessed to form binarized images in the work by [8] Rupesh Prajapati, Vedant Pandey, Nupur Jamindar, Neeraj Yadav and Prof. Neelam Phadnis, 'Hand Gesture Recognition and Voice Conversion for Deaf and Dumb', published in the International Research Journal of Engineering and Technology (IRJET). The principal component analysis (PCA) algorithm is used to extract the best featured image from the database, which is created to train the system. For classification of test images, KNN and SVM algorithms are used; the accuracy of the system was 90%. A video database containing several videos for a large number of signs was created and utilized in [9], recognition of ISL in real time by Anup Nandy, Jay Shankar Prasad et al. A repository of training and testing samples was created with a fixed dark background, since background removal is a computationally complex task and affects the recognition result in real time. The Indian Sign Language videos were first split into image frames and then converted into grayscale images, to which a Gaussian filter and normalization are applied. For feature extraction, the hand skin colour information is used along with computation of the gradient at each point to extract the direction histogram; a three-tap derivative filter kernel is applied to compute angles, which are normalized and quantized into bins. For classification, K-nearest neighbour and Euclidean distance were used, and K-nearest neighbour was observed to give more accurate predictions. A database of hand gestures was generated and captured by a digital camera in [10], LabVIEW-based hand gesture recognition for deaf and mute people, by Vaishali S. Pande. For preprocessing, the noise removal technique of Canny edge detection is used; the image thus obtained is converted into an image histogram, from which the mean value of the image is calculated. Vaishali S. Pande implements gesture recognition using LabVIEW, a professional development environment for graphical programming. After preprocessing, the image subtractor is used to calculate the difference between the test sample and the images in the database. This difference is given to the in-range block, where the upper and lower limits are specified as 1 and 0, respectively. The whole VI for comparison and decision-making is enclosed in a for loop that runs four times: in the first iteration, the webcam captures the image and it is stored; in the second, Canny edge detection reads the image and converts it to a Canny edge image; in the third, the histogram reads the Canny edge image and calculates the mean values; and in the fourth, the comparison is done and the decision is displayed in a suitable manner.
Image processing for gesture recognition is the main idea behind [11] Gesture Controlled Speaking Assistance for Deaf and Mute by Rishad E.K, Vyshakh C B et al. For processing a gesture, a Raspberry Pi board is used, and HSV tracking is done using OpenCV in Python. The authors found the hue, saturation and value of the human hand to be in the ranges 0–30, 30–180 and 60–255, respectively. The human hand image is subjected to a sequence of image processing steps encompassing blurring, cropping, eroding and masking. Gesture recognition is done by drawing a line to the farthest point of the contour from its centre and selecting the points which are at a distance greater than 75% of that line and at least 40 pixels away from each other. Lines are then drawn from these points to the centre of the contour, and the biggest angle between the lines is calculated; by checking the number of lines and the biggest angle, each gesture is recognized and the corresponding audio is played. The authors saved the audio files on the Raspberry Pi, and audio mapping was done using Python. Variation in lighting was observed to degrade pixels, affecting correct determination of hand gestures, and HSV variation from person to person was also observed to affect the efficiency of gesture recognition. A simple rule classifier for predicting hand gestures was used in [12] Real-Time Hand Gesture Recognition Using Finger Segmentation by Zhi-hua Chen, Jung-Tae Kim et al. The images are captured against an identical background using a digital camera, and the colour of the skin is measured using the HSV model. The hand image is then obtained using the background subtraction method, and the result is transformed into a binary image. Palm points, wrist points and the palm mask are calculated to facilitate finger and palm segmentation. In the segmented image of the fingers, a labelling algorithm is applied to mark the finger regions. The hand gestures are then recognized using a rule classifier which predicts the gesture according to the number and content of the fingers detected.
3 Main Focus of the Article 3.1 Issues, Controversies, Problems One of the proposed arrangements handled ISL two-handed signs by capturing the images, with the assistance of MATLAB, as a progression of multiple pictures. These images are then run through recognition and translation algorithms, after which the output is processed by NLP algorithms to produce syntactically correct messages. The image processing technologies used here are highly sensitive to the background of the image and the lighting conditions. Another was a computer vision and artificial neural network (ANN) method which classifies images captured through the webcam as a video. The network is trained on images in the dataset, and the video, divided into frames, is then sent as input into the network. A Haar cascade classifier interprets the motions and their translations; finally, a speech synthesizer generates speech from the translated message (text).
Another method observed used a three-stage solution: training the model, testing its efficiency and accuracy, and the final translation. All the labelled images constitute the training dataset used to train the model; the training stage determines the most fitting biases, weights and other variables required to produce the most accurate model. The trained model is then given a dataset of unlabelled images, which constitutes the testing dataset and assesses the veracity of the model; the unlabelled images are not exactly the same as, yet are to some degree akin to, the images of the training dataset. Finally, the model is used to translate gesture-based communication in real time. Although hardware implementations of gesture recognition and translation produce greater veracity in specific circumstances, they stray from our aim of making communication simple and easily available, as hardware components can be costly and require regular maintenance. Our objective is to mitigate the communication barrier with the most easily available approach. Many ML problems are dealt with by making use of deep neural networks, and tackling the problem of recognition and translation of gesture-based communication this way can produce appreciable findings, which is exactly what our model investigates.
3.2 Methodology We start by defining the problem statement for the project, followed by gathering datasets, data analysis, preprocessing, preparation, training and evaluation; finally, prediction is done at the pixel level and performance is analysed. Figure 1 depicts the flow of data and operations followed in the project. Creating Training Dataset We made use of a Python program together with the Python cv2 library to capture the images for our training dataset. Fixed dimensions of 640 × 480 pixels were decided for the training samples. We then apply the adaptive threshold filter, or the SIFT filter hybridized with adaptive thresholding, to the images using the functions cv2.adaptiveThreshold and cv2.xfeatures2d.SIFT_create() available in OpenCV, and use cv2.GaussianBlur() to eliminate Gaussian noise from the training samples. Our training set comprised over 2600 images of ISL alphabets, all captured with a webcam using a Python program; each ISL alphabet class comprises 100 training pictures. After applying the adaptive thresholding and the hybridized adaptive × SIFT filter, the image samples are labelled and stored in the training set folder. To build our training dataset, we used a constant background with fixed lighting, since eliminating the background proved computationally arduous and could impact the accuracy of the results. Figure 2 shows some of the images from our training dataset. While building the training set, we used a white piece of clothing to facilitate efficient handling of hand gestures. A sketch of this capture pipeline is given below.
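A minimal sketch of the capture-and-filter pipeline described above, using the OpenCV calls the authors name; the window name, key binding, filter parameters and label folder are illustrative assumptions:

import cv2

cap = cv2.VideoCapture(0)
count = 0
while count < 100:                       # 100 samples per ISL class
    ok, frame = cap.read()
    if not ok:
        break
    frame = cv2.resize(frame, (640, 480))
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    gray = cv2.GaussianBlur(gray, (5, 5), 0)       # suppress Gaussian noise
    thresh = cv2.adaptiveThreshold(gray, 255,
                                   cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
                                   cv2.THRESH_BINARY, 11, 2)
    cv2.imshow("sample", thresh)
    if cv2.waitKey(1) & 0xFF == ord("c"):          # 'c' captures a sample
        cv2.imwrite(f"train/A/{count}.jpg", thresh)  # label folder 'A'
        count += 1
cap.release()
cv2.destroyAllWindows()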
Fig. 1 Flow diagram
Detection of Features The adaptive threshold filter separates the background from the foreground image by contrasting and selectively choosing the pixel intensities of every region. A threshold value is applied to each pixel: if the pixel value is lower than the threshold, it is set to 0; otherwise, it is set to the maximum value. We make use of the OpenCV function cv2.threshold to apply this filter: the first argument is the source image, a grayscale image; the second is the threshold value used to classify pixel values; and pixels that exceed the threshold are reassigned the value given as maxvalue, the third argument. Figures 3 and 4 are examples of images after the application of the adaptive threshold filter, which isolated the foreground object (the hand gesture) from the background. Scale-invariant feature transform (SIFT) is an algorithm that isolates key points within an image. These key points are scale and rotation invariant, meaning that the accuracy of the output is unaffected by the orientation of the images in the testing and translation stages. Our model makes use of a hybridized algorithm of SIFT with an adaptive threshold filter. We make use of the sift.detect() function from OpenCV to locate keypoints in our training samples; Figs. 5 and 6 show training samples after applying the SIFT filter on top of adaptive thresholding. The key points located by sift.detect() are represented by the green circles, and they help in the classification and translation of hand gestures. The SIFT filter is applied to the test image, and its key points are compared with those of the training samples to find similarity; a sketch of this step is given below.
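A minimal sketch of the hybridized step, adaptive thresholding followed by SIFT keypoint detection, using the cv2.xfeatures2d.SIFT_create() call the authors mention; note this requires an OpenCV contrib build, and newer builds expose the same detector as cv2.SIFT_create(). The file paths are illustrative:

import cv2

img = cv2.imread("train/A/0.jpg", cv2.IMREAD_GRAYSCALE)
thresh = cv2.adaptiveThreshold(img, 255,
                               cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
                               cv2.THRESH_BINARY, 11, 2)

sift = cv2.xfeatures2d.SIFT_create()   # cv2.SIFT_create() on newer builds
kp = sift.detect(thresh, None)
# Draw the located keypoints as circles over the thresholded image.
cv2.drawKeypoints(thresh, kp, thresh, (0, 255, 0))   # green in BGR
cv2.imwrite("keypoints.jpg", thresh)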
Fig. 2 Labeled image files depicting the ISL hand signs
Fig. 3 Adaptive thresholding filter applied on the letter 'X' in ISL hand signs
Fig. 4 Adaptive threshold filter when applied on the letter ‘I’ in ISL hand gesture
Fig. 5 SIFT mode filter applied on the letter ‘X’ in ISL hand signs
We calculate the Euclidean distance between the feature vectors of the test image and the training samples to find the nearest neighbour. Model Building Our model comprises CNN layers, flatten and max pooling layers, dense layers and dropout layers, reaching 24 layers in total. The model proposed by us is a sequential model, as each layer has only a single input and a single output. • Our model uses the following modules: – Image-related operations are performed by importing cv2. – The adaptive threshold filter is applied on the images using cv2.adaptiveThreshold().
Fig. 6 SIFT mode filter applied on the letter ‘W’ in ISL hand signs
– The noise and texture on the images are reduced using cv2.GaussianBlur(). – The SIFT algorithm is applied using cv2.xfeatures2d.SIFT_create(), and the located key points are drawn in green using cv2.drawKeypoints(img, kp, img, (0, 0, 255)). – To use a TensorFlow backend, we import TensorFlow and backend. – To create a simple model, we import Sequential from keras.models. – To perform the various layer operations, we import Dense, Conv2D, MaxPooling2D, Flatten and Dropout from keras.layers. We import plot_model from keras.utils to visualize the model. • One of the feature detection algorithms we used is SIFT. Before reaching the keypoint matching phase, it works mainly in five steps: – Scale-space extrema detection: each pixel is compared with its eight neighbours as well as the nine pixels in the next scale and the nine pixels in the previous scale. The local extrema identify the potential keypoints; essentially, this means the particular keypoint is best represented at that particular scale. – Keypoint localization: the scale space is expanded in a Taylor series to obtain a more precise location of the extrema, and an extremum is rejected if its intensity is lower than a threshold value, known as contrastThreshold in OpenCV. – Orientation assignment: an orientation is assigned to each keypoint in order to achieve invariance to image rotation. A neighbourhood is taken around the keypoint location depending on the scale, and the gradient magnitude and direction are calculated in that region. – Keypoint descriptor: 16 sub-blocks of 4 × 4 size are obtained by dividing the 16 × 16 neighbourhood around the keypoint, and an orientation histogram of eight bins is created for each sub-block.
Furthermore, several measures are taken to achieve robustness to illumination changes, rotation and the like. – Keypoint matching: keypoints are matched between two images by finding their nearest neighbours. In some cases, the second-closest match may be very near to the first, caused by noise or other reasons; if the ratio of the closest distance to the second-closest distance is more than 0.8, the match is rejected. This discards only about 5% of the correct matches while eliminating around 90% of the false matches. Adaptive threshold filtering is also used to segment an image: the intensity values of all pixels greater than a threshold are set to a foreground value, and the rest are set to the background value. The difference in pixel intensities between the two regions is utilized to isolate the background from the image of the hand. The model is built with its first layer as a Conv2D() layer, using the Python libraries Keras and TensorFlow. This is a 2D convolution layer; such a layer is a hidden layer used in image analysis for detecting patterns. The Conv2D layer performs an operation called cross-correlation, in which filters, acting as pattern detectors, are specified for each layer; these are basically small tensors of values. These filters can detect textures, colours, shapes, curves and edges as they become more sophisticated. The CNN derives these filters, or pattern detectors, automatically as it trains each layer at every epoch: the filter matrix initially takes random values, then improves and learns as it is trained. These filters convolve over the image matrix for pattern identification; wherever the values coincide during convolution, the dot product of the kernel and the image matrix is considered. Take a look at the filter kernel and image matrix below; the operation takes place as shown.
Image matrix:

4  3  2  1
8  6  7  5
17 18 12 11
16 15 14 13

Filter kernel:

1 0 1
1 0 1
1 0 1

The values that do not coincide with the kernel, as it is convolved along the image matrix, are not considered during the operation. ReLU is used as the activation function in the Conv2D layer. All negative values of the matrix are removed by the ReLU activation, and non-saturation of gradients is ensured, so that each epoch runs in roughly the same time with quick weight updates thanks to the increased gradient values. The Conv2D layers are followed by max pooling layers, which usefully reduce the dimensions of the gesture in the image by taking the greatest value in the pool at every stride over the matrix.
We set the pool size to a 2 × 2 matrix. Since the output dimension is reduced, the computational load is also reduced, making the model more efficient: greater areas are taken into consideration at once, which reduces the number of network parameters and helps minimize overfitting. This is because max pooling picks the most activated pixel in each pool stride.
0 3
1 1
The most activated pixel in the above pool is '3'. After each max pool layer, dropout layers were added; an initial increase in training duration was observed, which was resolved by increasing the number of epochs. We then obtained greater accuracy with a dropout ratio of 0.25 applied to the max pool layer's output. Then, after 23 layers of Conv2D, dropout and max pool layers, a dense layer with a SoftMax activation was added to the model. The correct classification of the input image is determined by the confidence scores computed by SoftMax, i.e. the resultant probabilities from the output of the final hidden layer. The output will look like:

[0.002, 0.118, 0.800]

Here, the confidence scores for each label are 0.002, 0.118 and 0.800; the highest confidence score in this example of the SoftMax output is 0.8, so the input image is classified under that particular label. Finally, the 'adadelta' optimizer is used to compile the 24-layer model. • Adadelta optimization addresses two drawbacks: – the continual decay of learning rates throughout training; – the need to manually select a global learning rate. The weighted average in each layer at each epoch is given to the Adadelta optimizer in order to calculate the new learning rates; this prevents the learning rate from converging slowly towards a global minimum before it starts increasing. Model Training The adaptive thresholding filter and the hybridized adaptive threshold × SIFT filter datasets are used to train the model. The images are converted to arrays and labelled to create the dataset; samples are then randomly selected from the dataset and shuffled to ensure accuracy and randomness of the data within the array. 75% of the data is utilized for training the model after splitting the data into train and test sets. A condensed sketch of the model definition and training call is given below.
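A condensed sketch of the kind of architecture described above; the full model has 24 layers, so this shorter stack only illustrates the pattern of Conv2D → MaxPooling2D → Dropout blocks ending in a SoftMax dense layer, compiled with Adadelta. The layer widths, input shape and file names are assumptions:

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Dropout, Flatten, Dense
from tensorflow.keras.callbacks import ModelCheckpoint

num_classes = 26                      # one class per ISL alphabet sign

model = Sequential([
    Conv2D(32, (3, 3), activation="relu", input_shape=(128, 128, 1)),
    MaxPooling2D(pool_size=(2, 2)),
    Dropout(0.25),
    Conv2D(64, (3, 3), activation="relu"),
    MaxPooling2D(pool_size=(2, 2)),
    Dropout(0.25),
    Flatten(),
    Dense(num_classes, activation="softmax"),
])
model.compile(optimizer="adadelta", loss="categorical_crossentropy",
              metrics=["accuracy"])

# Training as described: 50 epochs, batch size 10, 75/25 train-test split,
# with weights checkpointed to an HDF5 file (x/y arrays assumed prepared).
# model.fit(x_train, y_train, epochs=50, batch_size=10,
#           validation_data=(x_test, y_test),
#           callbacks=[ModelCheckpoint("weights.hdf5")])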
316
S. Rajarajeswari et al.
Fig. 7 CNN flow diagram
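The following minimal Keras sketch illustrates the layer stack and training setup described above. The filter counts, input shape and class count are assumptions for illustration, not the authors' exact configuration:

import numpy as np
from tensorflow.keras import layers, models, callbacks

num_classes = 3   # assumption: one unit per ISL sign label

# Layer stack as described: conv2D + ReLU, 2x2 max pooling, dropout 0.25,
# finished by a dense SoftMax classifier, compiled with Adadelta.
model = models.Sequential([
    layers.Conv2D(32, (3, 3), activation="relu", input_shape=(64, 64, 1)),
    layers.MaxPooling2D((2, 2)),
    layers.Dropout(0.25),
    layers.Conv2D(64, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Dropout(0.25),
    layers.Flatten(),
    layers.Dense(num_classes, activation="softmax"),
])
model.compile(optimizer="adadelta", loss="categorical_crossentropy",
              metrics=["accuracy"])

# Training as described: 50 epochs, batch size 10, weights written to an
# hdf5 file so they can be reused in the next epoch.
x_train = np.random.rand(100, 64, 64, 1)   # placeholder data
y_train = np.eye(num_classes)[np.random.randint(0, num_classes, 100)]
ckpt = callbacks.ModelCheckpoint("weights.hdf5", save_weights_only=True)
model.fit(x_train, y_train, epochs=50, batch_size=10, callbacks=[ckpt])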
The training is done for a particular number of epochs; we selected 50, with a batch size of 10. The weight of each neuron is overwritten into the hdf5 file for use by Adadelta in determining the learning rates for the layers in the next epoch. Model Testing The models are trained using the adaptive threshold filtered and the SIFT filtered datasets. Test data comprises 25% of the dataset. After training is completed, the model is tested to determine its performance; the metric used is accuracy. The test accuracy and loss for every epoch can be observed. Figure 7 can be used to understand the flow of the network implementation. • In the CNN model, the following were observed – In adaptive threshold mode, an accuracy of 85.09% and a loss of 52.70% were observed on the test data. – In scale-invariant feature transform (SIFT) mode, an accuracy of 83.20% and a loss of 53.94% were observed. Mobile App Integration The mobile application was developed in Java 8 using the Android Studio development environment and the Android NDK. The minimum API level required to run the application is 16. The deep learning model is stored as a protobuf file; the graph is then frozen by passing the appropriate input and output node names, after which the model is imported into the app. The NDK is needed because inference runs through TensorFlow's C++ core. The app then calls the feed function, which outputs the predicted result (Fig. 8).
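As a rough sketch of the freeze-and-export step implied here, the TF 1.x-era tooling might be used as below. The output node name and file name are assumptions, not the authors' actual values:

import tensorflow as tf
from tensorflow.python.framework import graph_util

tf.compat.v1.disable_eager_execution()   # TF 1.x-style graph mode required
# ...build or load the Keras model here, then grab its session and freeze it.
sess = tf.compat.v1.keras.backend.get_session()
frozen = graph_util.convert_variables_to_constants(
    sess, sess.graph.as_graph_def(), ["dense_1/Softmax"])  # node name assumed
tf.io.write_graph(frozen, ".", "frozen_model.pb", as_text=False)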
Fig. 8 Flow of data-low-level
Android App Operation The working Android app detects hand gestures and sign images within a region of interest (ROI) and translates the frames to ISL text in real time, regardless of the background against which the gesture is detected. The real-time recognition and translation of several ISL hand signs are depicted in Fig. 10, which shows screenshots from the mobile application (Fig. 9).
4 Results
4.1 Performance Analysis of the CNN Model Comparing the Two Modes
Adaptive Thresholding Filter See Table 1 and Fig. 11. On initial training, the model showed very little accuracy until the tenth epoch; with further training, the accuracy increased at a near-exponential rate and then flattened out after the 40th epoch, reaching a maximum of 92.23% training accuracy and 85.09% testing accuracy (Fig. 12). The model likewise had a very high loss until the tenth epoch; with further training, the loss decreased at a near-exponential rate and then flattened out after the 40th epoch, reaching 0.2081 training loss and 0.5270 testing loss.
Fig. 9 Workflow of the application
SIFT Mode Filter See Table 2 and Fig. 13. On initial training, the model showed very little accuracy until the tenth epoch; with further training, the accuracy increased at a near-exponential rate and then flattened out after the 40th epoch, reaching a maximum of 92.54% training accuracy and 83.20% testing accuracy (Fig. 14). The model likewise had a very high loss until the tenth epoch; with further training, the loss decreased at a near-exponential rate and then flattened out after the 40th epoch, reaching 0.2120 training loss and 0.5394 testing loss. Our model, using the convolutional neural network algorithm with adaptive thresholding and with a hybridized adaptive thresholding and SIFT mode filter, detects
Fig. 10 ISL hand signs real-time recognition by mobile application

Table 1 Results after training the model for 50 epochs successfully

            Losses   Accuracies
Training    0.2081   0.9223
Validation  0.6665   0.8647
Testing     0.5270   0.8509
Fig. 11 The model accuracy on adaptive threshold filter per epoch represented graphically
Fig. 12 The model loss on adaptive threshold filter per epoch represented graphically

Table 2 Results after training the model for 50 epochs successfully

            Losses   Accuracies
Training    0.2120   0.9254
Validation  0.6486   0.8471
Testing     0.5394   0.8320
Fig. 13 The model accuracy on SIFT mode filter per epoch represented graphically
Fig. 14 The model loss on SIFT mode filter per epoch represented graphically
ISL hand signs in real time with high accuracy. The model created using raw, unfiltered images performed with significantly lower efficiency and accuracy; moreover, the dataset of unfiltered images reached satisfactory result parameters only after 1000 epochs. Future research into feature extraction on unfiltered images may help increase the image processing capability and accuracy of sign language translation models.
5 Conclusion Since technological innovation in algorithm efficacy and image processing will continue apace, the possibilities for development in gesture-based language translation are vast. Further investigation into deep learning will surely prove even more fruitful than what has been seen so far. Under such continuous innovation, there will always be room for improvement and a need for more up-to-date sign language translation models. Furthermore, our image processing model can be used for several other applications such as gesture tracking, gesture/motion-controlled apps, and the recognition and translation of other regional sign languages as well as objects. The fundamental aim of this project is to help individuals deprived of hearing capabilities communicate smoothly with people without such impairments, by making use of computational innovations in machine learning and image processing. A primary obstacle in gesture- or sign-based language
communications is the fact that it varies regionally across the world making it difficult for inter-communication between differently hearing abled people from other regions to communicate with hearing-impaired individuals from another region. Another obstacle is the fact that most people do not have even the slightest clue of how sign language works or its syntaxes. Thus, making the need for a translator of high importance to help facilitate communication to interpret as well as understand what the other is trying to express. Our venture aims to reduce the dependency on human translators and ease communications made in the Indian Sign Language. Facial expression along with the hand signs plays a vital role in giving the messages in the sign language its substance and tone. The current scope of this project is limited in its ability as it is yet to understand and translate facial expressions as well. The main reason for that is the lack of ISL facial expression dataset and the lack of ISL expertise to create one.
References 1. Ghotkar, A., Ghotkar, S., Vidap, P.: Vision based hand gesture recognition using Fourier descriptor for Indian sign language. SIPIJ 7(6) (2016) 2. Rafiqul, Noor: Recognition of hand gestures. Int. J. Artif. Intell. Appl. (IJAIA) 3(4) (2012) 3. Akanksha, S.: Sign language gesture recognition: a review. IJSE—Int. J. Sci. Technol. Eng. 2(10) (2016) 4. Xie, C., Li, C., Zhang, B., Chen, C., Han, J.: Deep Fisher discriminant learning for mobile hand gesture recognition. Pattern Recognition. arXiv:1707.03692 [cs.CV] (2017) 5. Chen, Y., Zhao, L., Peng, X., Yuan, J.: Construct dynamic graphs for hand gesture recognition via spatial-temporal attention. arXiv:1907.08871 [cs.CV] (2019) 6. Kakde, M.U., Rawate, A.M.: Hand gesture recognition system for deaf and dumb people. IJESC 6(7) (2016) 7. Singha, J., Das, K.: Indian sign language recognition using eigen value weighted Euclidean distance based classification technique. Int. J. Adv. Comput. Sci. Appl. 4(2) (2013) 8. Köpüklü, O., Gunduz, A., Kose, N., Rigoll, G.: Real-time hand gesture detection and classification using convolutional neural networks. In: IEEE International Conference on Automatic Face & Gesture Recognition (FG 2019). arXiv:1901.10323 [cs.CV] (2019) 9. Tang, H., Liu, H., Xiao, W., Sebe, N.: Fast and robust dynamic hand gesture recognition via key frames extraction and feature fusion. arXiv:1901.04622 [cs.CV] (2019) 10. Abavisani, M., Joze, H., Patel, V.: Improving the performance of unimodal dynamic hand-gesture recognition with multimodal training. In: Conference on Computer Vision and Pattern Recognition. arXiv:1812.06145 [cs.CV] 11. Pande, V.: LabVIEW based hand gesture recognition for deaf and dumb people. Int. J. Eng. Sci. Invention (IJESI) 7(4), Version V (2018) 12. Liu, Y., De Nadai, M., Zen, G., Sebe, N., Lepri, B.: Gesture-to-gesture translation in the wild via category-independent conditional maps. In: 27th ACM International Conference on Multimedia. arXiv:1907.05916 [cs.CV] (2019)
EYE4U-Multifold Protection Monitor Vernika Sapra , Rohan Gupta, Parikshit Sharma, Rashika Grover, and Urvashi Sapra
Abstract The universal dissemination of Covid-19, caused by the SARS-CoV-2 virus, has pushed the world backward, setting it back from its substantial growth by approximately five years, and the situation worsens day by day. People suffering from this disease are trying to fight it by taking extensive precautions: cleansing their hands at regular intervals, wearing single- or double-layered masks, eating healthy, home-cooked food and working from home. Detecting whether a person is wearing a mask and maintaining proper sanitization is a critical matter of concern. In this paper, we propose a multifold protection monitor which collects this information and helps sustain the etiquette of a place. The system uses the latest technologies: wireless communication for sensor networks, used to check the temperature of the person and the room they are entering; machine learning and a few of its libraries like Keras, TensorFlow and OpenCV, used to capture images and check whether the person is wearing a mask; and database management, used to store and analyze the images captured in real time. With this evolving technology, we were able to capture images of every person walking into a place, check whether they are wearing a mask and have their hands sanitized properly, and check whether the room they enter has nominal room temperature. Keywords Pandemic · Wireless communication · Keras · TensorFlow · OpenCV V. Sapra (B) Manipal University, Jaipur, India e-mail: [email protected] R. Gupta Pt. LR. College of Technology, MDU, Rohtak, India P. Sharma Manav Rachna University, Faridabad, India R. Grover Genpact India Private Limited, Bengaluru, India U. Sapra D.A.V. Centenary College, Faridabad, India © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 S. Roy et al. (eds.), Innovations in Computational Intelligence and Computer Vision, Advances in Intelligent Systems and Computing 1424, https://doi.org/10.1007/978-981-19-0475-2_29
1 Introduction Covid-19 has impacted the world horrendously. It has created constant distress among people about their surroundings. The pandemic has marked a blemish on the worldwide economy, education, employment opportunities, careers and, most importantly, the livelihood of the population. Covid-19 has not only damaged people's physical health but has had a devastating effect on their mental condition as well. Those who are cautious and follow the guidelines or take the necessary precautions are saved, while those who are impetuous for even a short period are caught by the disease; due to low bodily endurance and weak immunity, such people are more susceptible to it. As this malady prohibits contact between people, including family and friends, it builds a fear of being abandoned by one's own family and near and dear ones. The situation has bred mistrust among people toward each other, because it is very doubtful whether the other person is fully taking the advised precautions and measures: wearing a surgical or other mask around the clock, washing hands repeatedly after visiting any crowded place, cleansing hands with sanitizer at regular intervals, maintaining social distancing and not going out unnecessarily. Most places started allotting their security guards or another reliable person the duty of sanitizing the place and the various customers visiting it, thereby reducing the chances of casualties. As of now, out of a worldwide population of 7.8 billion, approximately 180 million people have been infected by this virus, and as per the statistics, 3.90 million deaths have been reported. It has become difficult to identify who is a carrier of the virus or infected by it. People have faced many difficulties and suffered immensely, but advancements in technology have eased a lot of problems: various online payment portals were activated for immediate transactions, people digitized themselves, online shopping stores came into trend, and virtual learning and video conferencing became the new normal. Despite massive improvements in technology, some people are still careless and completely irresponsible in following the basic rules for taking care of themselves and their surroundings. This feckless behavior has cost many lives. To prevent this from happening any further, a multi-step protection system named "Eye4U-Multifold Protection Monitor" is introduced, which follows a "multifold protection mechanism." The Eye4U monitor helps the owner of a place (such as a restaurant, mall or grocery store) to keep an eye on the people who are not following the steps listed by the monitor and helps keep the place virus free by meticulously enforcing them. This is done with the help of "wireless communication" and "multiple sensor devices" along with their various modules. It involves basic modules of machine learning, OpenCV and sensor technology. The system is based on manifold steps of protection, and its working depends upon four basic parameters [1]. Initially, the room/place temperature would be monitored
consistently, and only if the temperature is within its threshold values will additional people be allowed in the room. As humidity increases with the number of people, the virus is activated more vigorously and its chances of traveling from one surface to another increase. Temperature monitoring is done round the clock while the place is open, and simultaneously, people outside the place have to follow the listed steps. Firstly, they stand on a marked (circle-like) spot beneath a camera placed overhead at the entrance gate of the building or shop; with the help of one of the well-known open-source libraries for machine learning and OpenCV, called "Haarcascade," it is easy to recognize which people are wearing a mask and which are not [2]. If they are wearing masks, they can move to the further steps; otherwise, they are prohibited from entering the place. Secondly, the person's temperature is checked by an "infrared sensor," which emits infrared rays and is placed alongside the overhead camera. Only if the temperature is within the threshold value can the person move to the third step; otherwise, the person is not allowed any further. Lastly, the amount of sanitizer present on their hands is checked by the "MQ-3 sensor," which measures the amount of gas/alcohol on the person's hands; if it is within the desired value, the person moves forward, or else the automatic sanitizer spray sprays sanitizer onto their hands. Now, if the person passes all three steps and the room temperature is adequate, only then can they enter through the gates, as the gates are connected to motors that rotate in the opening direction only if all the conditions are fulfilled. These are the various steps to be followed by every person to maintain their safety, because poor monitoring can lead to many casualties [3]. Regarding the novelty of the concept: various protection mechanisms such as sanitizer sprayer tunnels and face mask detection systems have been introduced, but in this system the three most important parameters that can protect the entire society are recorded together. The paper is organized as follows. Section 2 reviews the drawbacks of the various systems present in the current market and their harmful impacts on society. Section 3 describes the proposed system, which can easily be enacted by following the various threshold parameters mentioned in the same section. In Sect. 4, the entire system description along with its explanation is given, conveying how the system works at each step. Section 5 reviews the implementation of the entire system with a deep dive into the working of every sensor and how the execution is done. In Sect. 6, the result is illustrated, and it further sets a benchmark for the system. Section 7 provides the conclusion of the study, whereas Sect. 8 lists the future work which can be done in the near future. Section 9 covers the references.
2 Drawbacks in the Existing System In the market, currently, systems like sanitization tunnels, foggers, atomizers, disinfectant tunnel kits and face mask detection systems all work independently and have no inter-relation between them as of now. Therefore, the disadvantages are as follows:
1. Absence of the product in the market: The current systems do not combine all the parameters provided by the EYE4U monitor; they offer either face mask detection or just sanitizer sprays. These systems are not used at most places because of their pricing and efficiency.
2. Missing or wrong results: False results due to poor device connections or network issues make the whole system a failure.
3. Cost inefficiency: Devices like sanitization tunnels or doors, or even face mask detectors, become extremely costly for routine purposes and cannot be afforded by middle-sized industries or people of average means.
4. Harmful effects of excessive sanitizer: If an extreme amount of sanitizer is sprayed on people who also use their personal sanitizers, skin diseases, dryness or allergies can result, as is frequently reported nowadays because people keep sanitizing themselves unnecessarily. Here, however, if the system detects that sanitizer is already present on the hands, no more sanitizer is sprayed.
3 Proposed System In this proposed system, we provide multifold protection mechanisms for people and the places they visit. Four parameters are taken into consideration (as listed in Table 1). Firstly, the temperature of the place/room people are entering must lie within the threshold values, i.e., room temperature. Secondly, we check whether the person visiting the place/room is wearing a mask before entering; if they are wearing one, they may move into the place, and if not, they may not. Thirdly, the temperature of the person is monitored via infrared sensors, and the values recorded should fall within the desired range. Lastly, the amount of alcohol/sanitizer present on their hands is detected from the variations observed in the voltage values recorded by the MQ-3 gas sensor. If a person passes all these criteria and the observed values are within the approximate range, then the motors connected to the doors will rotate in a pre-managed
Table 1 Threshold ranges of the detection parameters

S.no  Parameter                        Equipment used to measure      Threshold range
1     The temperature of the room      Temperature sensor DHT-22      22–25 °C
2     Face mask is present or absent   OpenCV, Keras, TensorFlow      Should be present
3     The temperature of the person    Infrared sensor                98.0–98.4 °F
4     Sanitizer presence               Gas sensor MQ-3                Above 60%
direction, which opens the door so that the person can enter the place/room. If a person fails to fulfill any of the conditions mentioned above, they are prohibited from entering, and the motors do not rotate at all (Table 1). If the temperature inside a room rises above its usual range, the humidity increases and the chances of dissemination of the virus grow; hence, the temperature sensor (DHT-22) is connected to the system to monitor the temperature level of the room and thereby help decide whether to let people inside. Only if the room, after the temperature check, has scope to accommodate people without raising its humidity and temperature will people proceed to the further precaution steps; otherwise they must wait until the room temperature comes back within the desired scale. Once the temperature is checked and people can be accommodated, they must follow the further steps: face mask detection, temperature-level check and sanitizer-level check. For face mask detection, a camera is present overhead on the door and is used to capture real-time images of people entering the place [4]. The image of a person is captured and a bold line appears around their face; if it appears red, the person is not wearing a mask and the label reads "Mask Absent," meaning the person cannot move further; if the person wears a mask, the bold line across their face reads "Mask Present" and appears green, meaning the person is good to go to the further steps. Along with this, a percentage is displayed that tells how much of the face is covered and how much should be covered [5]. After face mask detection, infrared sensors present beside the camera check the person's temperature using infrared rays. If the temperature falls within the tentative threshold range, they can move further; otherwise, they are not allowed into the place and are recommended to either consult a doctor or isolate themselves. As soon as the temperature check is done, the last step is the sanitizer-level check. The level of alcohol present on the person's hands is checked using the MQ-series MQ-3 sensor, which detects the amount of gas (isopropyl alcohol/ethanol used in sanitizers) present on the hands. The values must lie within the threshold range, which is calculated from the increase in the voltage of the MQ-3 sensor. Sections 4 and 5 give a detailed explanation of the system. After passing all these steps successfully, the person can enter the place, thereby keeping themselves and others around them safe. A condensed sketch of this gate logic is given below.
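The following minimal Python sketch condenses the admission logic just described, using the threshold ranges from Table 1 (the actuator hook is hypothetical):

def spray_sanitizer():
    pass  # hypothetical hook for the automatic sanitizer spray

def admit(room_temp_c, mask_present, body_temp_f, sanitizer_pct):
    if not (22.0 <= room_temp_c <= 25.0):
        return False  # room over threshold: wait until it recovers
    if not mask_present:
        return False  # "Mask Absent": entry denied
    if not (98.0 <= body_temp_f <= 98.4):
        return False  # fever suspected: consult a doctor or show a report
    if sanitizer_pct < 60.0:
        spray_sanitizer()  # top up rather than deny entry
    return True  # all checks passed: motors rotate and the door opens

print(admit(23.5, True, 98.2, 40.0))  # True, after an automatic spray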
Fig. 1 System architecture flowchart
4 System Description The system design is presented with its architecture, technological requirements and step-by-step module explanations, along with flowcharts and readings that clearly explain the entire process.
4.1 System Architecture The flowchart below shows the system design and signifies how a person must follow this process. Since this is a multifold protection mechanism, a person has to follow multiple steps to protect their own health and that of their surroundings (Fig. 1).
4.2 System Explanation The flowchart below depicts the entire system: the algorithms utilized, the sensors used, the number of steps to be followed, and how to follow them all without any failed attempts. When a person initially comes to the place, the first thing to be done is to check the room temperature to see whether there is scope to let people enter the
room or not. The humidity inside the room conditions the temperature, thereby deciding whether more people may be let in. Humidity increases with a rise in temperature, and the temperature rises when the number of people inside a room is high; at that point the room temperature exceeds its normal range, i.e., 22–25 °C. As the temperature is directly proportional to humidity, a rise in temperature from its usual values increases humidity, creating a higher inclination toward the promulgation of the virus. As Covid-19 spreads through contact with an infected person or a carrier of the disease, it is important to maintain the required amount of distance, temperature and humidity. If the room can accommodate people, they proceed further; otherwise they must wait until the temperature gets back within the threshold range. After this check, the next step is to detect whether a person is wearing a mask or not. This is done with the help of a face mask detection system that follows the principles of OpenCV and one of the most famous open-source libraries, called "Haarcascade," which is mainly used for facial or image recognition. Here, a camera is mounted overhead on the door; it detects whether a person is wearing a mask by taking real-time images of the people and calculating a ratio termed the "aspect ratio," with which we check what percentage of the person's face is covered with a mask. If it is more than 50%, a bold square line appears around their face along with the message "Mask Present" in green, meaning the person is good to go to the next step of the multifold protection. If the calculated aspect ratio is less than 50%, the same bold line appears in red giving the message "Mask Absent," and the person cannot proceed until they either purchase a mask on the spot (if needed), available in the mask vending machine, or adjust their mask correctly. If a person passes this face mask detection step, they go further. The next step in the system is checking the temperature of the people who have passed face mask detection. This is done via an infrared sensor that uses infrared rays to calculate people's temperature. The usual temperature of a person ranges between 98.4 and 98.7 °F; if it exceeds this limit, the person is prohibited from going any further, and if the temperature is within the expected range, the person can proceed to the last step of the mechanism. The temperature recording is mainly done to track whether a person is a potential patient or is perhaps suffering from similar symptoms unknowingly (asymptomatic). If such a person is still willing to go inside, they are supposed to carry a medical report showing a negative Covid-19 result and attesting either that the fever is just a usual body reaction to an antigen or that the person's normal body temperature has lain in this range for at least the last year; without this report, the person is not allowed into the place. The very last step of the multifold protection mechanism is checking the amount of sanitizer [6] present on the hands of the person who has passed all the prior protection steps. The amount of sanitizer is checked with one of the MQ-series sensors, the MQ-3 sensor, which is used to detect the presence of gases. As a sanitizer contains 95% alcohol (isopropyl alcohol/ethanol) and it is a gas in nature
with properties of odor and volatility, it is detectable by gas sensors. If the alcohol rate present on the person's hands is less than 60% (as per the calculation from Formula 1), the sanitizing machine automatically sprays accordingly; if the alcohol present on the hands is more than 60%, no sanitizer is sprayed, and the person can go inside the room/place. As soon as this condition is met and all the prior conditions are also satisfied, the door opens. Motors are attached to the door and work according to this system: if a person passes all the parameters, the motors move in a pre-decided direction (e.g., if the motor rotates to the right, the door opens and then closes after 5 s by rotating the motor back to the left; if the motor initially rotates to the left for opening, it moves in the opposite direction for closing). The door only opens when the person has passed all the criteria. Also, if motors are absent from the doors, automatic sensor doors can be used, which a person can reach only after passing all the conditions; anyone who still wants to enter or exit without passing must provide a mandatory test report, without which they may not be able to enter the place (Fig. 2). The flowchart above specifies the basic design of the system along with images of the sensors used.
Fig. 2 Main elements used in Eye4u system
5 Implementation In this section, we discuss the algorithms used by the various sensors and the face mask detector, and how they connect to the system. There are threshold values for each sensor's successful working, and only if they are followed will the system work (Fig. 3).
5.1 Room Temperature Sensor The DHT-22 is a temperature sensor used to check the indoor temperature of the room or place where the EYE4U system is set up. The average room temperature is supposed to be between 22 and 25 °C; if it goes above that, no more people are permitted to enter the room until it returns to its threshold range. Meanwhile, people can either wait for their turn, follow the EYE4U process, or exit as per their need. The graph in Fig. 4 depicts the fluctuating values of room temperature on the Y-axis versus time in milliseconds on the X-axis, as recorded by the DHT-22 temperature sensor. The values range from 22/22.5 to 24 °C, which is normal room temperature. The usage of this sensor is that it records the room temperature; if it exceeds the threshold, no more people are allowed in the room. It can be placed anywhere in the room (Fig. 5).
Fig. 3 Algorithm used by room temperature sensor
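A minimal Python sketch of this room-temperature check, assuming the legacy Adafruit_DHT library and a DHT-22 wired to GPIO pin 4 (both assumptions, not details given in the paper):

import Adafruit_DHT  # legacy Adafruit DHT library

SENSOR, PIN = Adafruit_DHT.DHT22, 4   # pin number is an assumption
LOW_C, HIGH_C = 22.0, 25.0            # threshold range from Table 1

humidity, temperature = Adafruit_DHT.read_retry(SENSOR, PIN)
if temperature is None or not (LOW_C <= temperature <= HIGH_C):
    print("Room outside threshold: hold new entries until it recovers")
else:
    print(f"Room OK at {temperature:.1f} C ({humidity:.0f}% humidity)")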
Fig. 4 Values depicting the room temperature
Fig. 5 Algorithm for face mask detection
5.2 Face Mask Detection Face mask detection is done with the help of the well-known open-source facial recognition library Haarcascade, together with OpenCV and machine learning libraries. The camera captures the images, and the system uses Haarcascade to detect whether the person is wearing a mask. If they are, they can proceed to the next step; otherwise they cannot. When a mask is worn, a bold line appears around the face in green, and when it is not, the line appears in red, as shown in Figs. 6 and 7 [7]. The percentage of the face that is covered is also displayed. If the face is covered more than 50%, i.e., from above the nose up to the eyes, the green bar appears automatically; if not, the red bar shown in the screenshot appears immediately. The person will either be requested to wear the mask properly or will be turned away from the line if they do not follow the norms cautiously [8].
Fig. 6 With face mask
Fig. 7 Without mask
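A minimal sketch of the capture-and-annotate loop described in Sect. 5.2. The Haar cascade here only locates faces; the mask decision and percentage would come from the trained classifier, represented below by a hypothetical stand-in function:

import cv2

def classify_mask(face_img):
    # Hypothetical stand-in for the trained mask/no-mask classifier;
    # a real system would run the learned model here.
    return True, 72.0

face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
cap = cv2.VideoCapture(0)
ok, frame = cap.read()
gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
for (x, y, w, h) in face_cascade.detectMultiScale(gray, 1.3, 5):
    masked, pct = classify_mask(frame[y:y + h, x:x + w])
    color = (0, 255, 0) if masked else (0, 0, 255)   # green / red box
    label = "Mask Present" if masked else "Mask Absent"
    cv2.rectangle(frame, (x, y), (x + w, y + h), color, 2)
    cv2.putText(frame, f"{label} {pct:.0f}%", (x, y - 8),
                cv2.FONT_HERSHEY_SIMPLEX, 0.6, color, 2)
cap.release()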
5.3 Temperature Detection of People The temperature of people entering the place is checked via the infrared sensor, and the average recorded temperature should lie approximately between 98.4 and 98.7 °F. This is the normal human body temperature range; if a reading exceeds it, it simply indicates the person has a fever or some illness that the body is fighting, producing a heating effect. If the temperature is inclined toward the higher limit, the person is specifically advised to visit a doctor; if the person is still willing to enter the place, he or she is required to carry a negative Covid test report so that no casualties take place (Fig. 8).
Fig. 8 Algorithm for checking temperature of people
5.4 Sanitization Process This process follows once the person has passed all the previous steps. In this step, the alcohol (gas) sensor named MQ-3 is used. This sensor helps find the amount of sanitizer/alcohol present on the person's hands. The MQ-3 alcohol sensor detects approximately 25–500 ppm of alcohol (parts per million). The average range of sanitizer a person should register lies between 50 and 100 ppm over a time span of less than 7 s. The MQ-3 sensor works on the principle of oxidation: if the target gas is detected on the surface, the MQ-3 sensor heats up and provides higher analog values. The higher the voltage, the more ppm present and the higher the analog value. As the sensor provides analog values, the readings can be converted, and here they are represented as a percentage [9]. Now, if the ppm range is above the normal threshold value (50–100 ppm), the voltage increases and higher analog values are produced by the sensor. The analog value range of the MQ-3 sensor lies between 0 and 1024. After the appropriate calculation according to the algorithm below, the sanitizer level is obtained, and sanitizer is sprayed if needed. To find the value of the MQ-3 sensor, the quantities mentioned below are calculated: the ratio of the resistance (RS) observed at the sensor for the target gas (isopropyl alcohol) to the resistance (R0) observed at the sensor when 0.4 mg/L of alcohol is present in the air.
Pseudocode 1 This is the pseudocode used to calculate the RS/R0 ratio.
Step 1 Assign the value 3 k to the variable R2 (the load resistance of the MQ-3 sensor).
Step 2 Void Loop1() {
Step 2a Calculate the sensor value (integer type): the analog value obtained by reading pin A0 with the pre-defined function (analog_read);
Step 2b Calculate the sensor volt (float type) by dividing the sensor value by the analog value range of the MQ-3 sensor and multiplying by the DC voltage; (The DC voltage supplied to the sensor can vary; in this paper's testing it is taken as 5.0 V.)
Step 2c Calculate the Rs_gas value by multiplying the DC voltage by the resistance of the MQ-3 sensor, dividing the result by the sensor volt, and then subtracting the resistance of the MQ-3 sensor;
Step 2d Calculate the sensor resistance (R0) by dividing Rs_gas by the constant value (60) obtained after [10] calculating RS divided by R0;
Step 2e Return R0;
Step 2f Print the calculated R0 value;
}
Pseudocode 2 This is the pseudocode used to calculate the blood alcohol level (BAC value, mg/L).
Step 3 The value of the load resistance of the MQ-3 sensor (R2) is declared as 2 k.
Step 4 Void Loop2() {
Step 4a Repeat the calculations from Steps 2a, 2b and 2c;
Step 4b Assign the value 16,000 to the variable R0, the resistance of the sensor when only 0.4 mg/L of alcohol is present in clean air;
Step 4c Calculate the variable ratio by dividing Rs_gas by R0;
Step 4d Calculate a double value x equal to x0 multiplied by the ratio;
Step 4e Calculate the blood alcohol level by raising x to the power -1.43 (mg/L value) and assigning it to BAC;
Step 4f Return the value of BAC;
Step 4g Print the value of BAC;
Step 4h Convert the mg/L value to g/L by multiplying BAC by 0.0001;
Step 4i Print the g/L value;
}
Formula to calculate the blood alcohol level. RS/R0 and BAC are found as follows. The equation of a line on a log-log plot is

    F(x) = F0 * (x/x0)^(log(F1/F0) / log(x1/x0))                      (1)

where (x1, F1) and (x0, F0) are two points on the line of the plot we receive from the sensor. For the MQ-3, this power law takes the form

    RS/R0 = P * (mg/L)^Q                                              (2)

where P and Q are the values found from the calculation of Eq. (1). Solving Eq. (2) for mg/L (the "antilog" step applied to Eq. (2)):

    mg/L = (RS / (P * R0))^(1/Q)                                      (3)

To convert the mg/L value of BAC into g/dL:

    g/dL = 0.0001 x mg/L                                              (4)

As we receive the g/dL value, we obtain the sensor readings and then convert them into a percentage. (The formulas are all interrelated, and any value can be converted into another.) (Fig. 9) Figure 10 depicts the fluctuations observed in the graph of analog values of the MQ-3 sensor, in which the X-axis shows the range of the analog values from 0 to 1024 and the Y-axis shows the voltage supplied.
Fig. 9 Algorithm for sanitization process
Fig. 10 Graph to depict the analog values of the MQ-3 sensor
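To tie Pseudocodes 1-2 and Eqs. (1)-(4) together, here is a Python transcription under one self-consistent reading of Steps 4b-4e (anchoring the power-law fit at 0.4 mg/L, where RS/R0 = 1 by the definition of R0; the ADC input value is assumed):

VCC = 5.0          # DC supply voltage (V), as in Step 2b
R2 = 2000.0        # load resistance of the MQ-3 (ohms), as in Step 3
ADC_MAX = 1024.0   # analog value range of the MQ-3 sensor
R0 = 16000.0       # sensor resistance at 0.4 mg/L alcohol in clean air

def bac_from_adc(adc_value):
    # Steps 2a-2c: voltage-divider reading -> sensor resistance
    sensor_volt = adc_value / ADC_MAX * VCC
    rs_gas = (VCC * R2) / sensor_volt - R2
    ratio = rs_gas / R0                    # RS/R0 as in Eq. (2)
    mg_per_l = 0.4 * ratio ** -1.43        # power-law fit of the datasheet plot
    return mg_per_l, mg_per_l * 0.0001     # Eq. (4): mg/L and g/dL

print(bac_from_adc(400.0))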
5.5 Motor Rotation As soon as the person gets sanitized and passes all the steps listed before in the system, the motors receive a signal that all parameters are satisfied and the person is eligible to walk inside; the motors connected to the door then rotate in the pre-defined direction, which can be clockwise or anticlockwise (right or left), whichever is suitable [11]. In this case, we consider the right direction as the one in which the door opens for 5 s so that the person can walk inside, and the left (opposite) direction as the one that closes the door, as shown in the flowchart below. This is one of the most important steps, because the door opens for only a very short time; the person therefore needs to walk quickly so that the door does not close before they can easily enter the room [12] (Fig. 11).
Fig. 11 Algorithm for the door opening with the motor rotation
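A rough Python sketch of the door-motor step, assuming a Raspberry Pi with servo-style PWM control; the pin, frequency and duty cycles are assumptions, not the paper's wiring:

import time
import RPi.GPIO as GPIO  # assumes a Raspberry Pi drives the door motor

MOTOR_PIN = 18           # assumed wiring
GPIO.setmode(GPIO.BCM)
GPIO.setup(MOTOR_PIN, GPIO.OUT)
pwm = GPIO.PWM(MOTOR_PIN, 50)  # 50 Hz servo-style control (an assumption)
pwm.start(0)

def open_door():
    pwm.ChangeDutyCycle(7.5)  # rotate in the pre-defined "open" direction
    time.sleep(5)             # the door stays open for 5 s, as described
    pwm.ChangeDutyCycle(2.5)  # rotate back the opposite way to close
    time.sleep(1)
    pwm.ChangeDutyCycle(0)    # stop driving the motor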
6 Result This paper mainly reviews a multifold protection mechanism for people who are either compelled to leave home for duties at offices or shops, must rush out due to emergencies, or simply wish to go out sometimes to buy groceries, medicines, food, etc. The system mainly helps prevent the vigorous spread of the virus; it is therefore important to follow it, thereby protecting oneself and one's family [13]. Camera images of the people entering the room are captured consistently, at least while the shop, outlet or place is open and working. The steps taken in this system are mainly for the betterment and safety of society, because people are suffering wildly from this disease and dying without count; their slightest carelessness can cost them their lives or put their own and their family's lives in jeopardy. Though this system may not provide an absolute solution, since the rate of spread of the virus can vary widely (very low or high), it can be used to take strong preventive measures and at least be an eye for the owners of crowded places, where people either follow the measures and enter or are prohibited from entering. As of now, the results mentioned above are based on individual (personal prototype-based) testing and not on real deployment in the market; therefore, access to a real database is not possible. But, as mentioned in Sect. 8, there will be a future implementation of this system.
7 Conclusion This paper has presented an experimental and unprecedented approach for addressing the critical need for multifold protection mechanisms in this world, to at least gain control over the pandemic situation [14]. The data presented in this paper are based on the prototype made, and we firmly believe that utilizing the above-mentioned technology can make a huge difference in the world (though the technologies involved may vary in the final product). Initially, people thought this process was not beneficial, but the increase in the number of cases and people's carelessness has led the world to suffer; to improve this condition, utilizing this technology can be beneficial. Due to Covid-19, there is a shortage of resources in the world and specifically in our country, India; therefore, this system can be of tremendous use. As resources decrease, the cost of various equipment increases and the economy takes a hit, making it difficult to afford many workers to take care of a place and the people visiting it; it becomes cumbersome to manage the crowd in places like grocery stores, marts or even chemists, because people fear the stores will run out of supply and literally make it messy for the place owners to manage things.
The face mask detection step automatically keeps away from the stores the maximum number of people who do not wear masks, and helps avoid situations where people have to scold others to wear them. Since sanitizers are used in this system, the places where sanitizers are already kept, in the form of sanitizer tunnels, sprayers or bottles, can be re-used instead of buying new ones set up solely for the system. The amount of sanitizer used by people is also more than expected and can cause many skin diseases and allergies, and can sometimes lead to fatalities like cancer or poisoning [15]. Places that have automated doors, glass doors or even wooden doors are not required to buy another door or get a new one fitted; instead, the original doors can be re-used by attaching a relay and a motor to them for a successful setup of the monitor. The temperature check of the people is done by the IR sensor; although the measured temperature carries an approximation of about 0.2 units, it still gives a fair warning to people and those around them that they may be affected by Covid, or that their body is suffering from some infection that is deteriorating their immunity, if their temperature exceeds the threshold value [16]. The system can also prevent places from getting overly populated, thereby following the regulations made by the government on social distancing. Hence, this system, if fitted with proper instructions and used wisely, can be a huge benefit to society and its people.
8 Future Work The EYE4U-multifold protection monitor is majorly designed to prevent the spread of the Covid-19 disease [17]; its usage can therefore start with small-scale industries or restaurants for testing purposes, as it will receive tenfold the initial data and training will become much easier; later, if the results obtained are positive, we can immediately start helping the government toward nationwide implementation [18]. The system can be implemented in various public transport settings like airport services, railway services, buses or even metro stations, thereby providing better security. The system can also be implemented in parking lots or toll booths to check whether people are wearing masks; if not, the violation can be flagged immediately by scanning the number plate of the car using the Haarcascade algorithm itself, which detects the images. The incident can then be reported officially to any on-duty government subordinate, who will take care of the situation [19]. If private places like restaurants or malls grant access to the entire database of the cameras or to the application interface, then the people inside the malls can also be checked for wearing masks and sanitizing properly. If they do not follow the rules, security or any of the officials
can be informed, who can take the required actions against such people. This will help in predicting the risk at that place and how many people might have been affected. These are a few of the many ways in which this system can be implemented in the future, thereby enabling remarkable improvements in people's health and the maintenance of places in proper decorum, hence preventing the ultimate spread of coronavirus.
References 1. https://towardsdatascience.com/face-mask-detection-using-yolov5-3734ca0d60d8 2. https://data-flair.training/blogs/face-mask-detection-with-python/ 3. https://becominghuman.ai/face-detection-using-opencv-with-haar-cascade-classifiers-941dbb25177 4. https://www.cdc.gov/coronavirus/2019-ncov/community/organizations/business-employers/bars-restaurants 5. https://www.kaggle.com/omkargurav/face-mask-dataset 6. Bessonneau, V., Thomas, O.: Assessment of exposure to alcohol vapor from alcohol-based hand rubs. Int. J. Environ. Res. Public Health 9(3), 868–879 (2012) 7. Jordan, M.I., Mitchell, T.M.: Machine learning: trends, perspectives, and prospects. Science 349(6245), 255–260 (2015) 8. Cuimei, L., Zhiliang, Q., Nan, J., Jianhua, W.: Human face detection algorithm via Haar cascade classifier combined with three additional classifiers. In: 2017 13th IEEE International Conference on Electronic Measurement & Instruments (ICEMI), pp. 483–487 (2017). https://doi.org/10.1109/ICEMI.2017.8265863 9. https://www.teachmemicro.com/mq-3-alcohol-sensor/ 10. https://www.sparkfun.com/datasheets/Sensors/MQ-3.pdf 11. https://doi.org/10.1111/j.1530-0277.1991.tb00589.x 12. https://doi.org/10.1007/s10916-017-0770-z 13. Liao, S., Chen, C., Hu, D.: The role of knowledge sharing and LMX to enhance employee creativity in theme park work team. Int. J. Contemp. Hosp. Manage. 30(5), 2343–2359 (2018). https://doi.org/10.1108/IJCHM-09-2016-0522 14. Barboza, Y., Martinez, D., Ferrer, K., Salas, E.M.: Combined effects of lactic acid and nisin solution in reducing levels of microbiological contamination in red meat carcasses. J. Food Prot. 65, 1780 (2002) 15. Scheid, J.L., Lupien, S.P., Ford, G.S., West, S.L.: Commentary: physiological and psychological impact of face mask usage during the COVID-19 pandemic. Int. J. Environ. Res. Public Health 17(18), 6655 (2020). https://doi.org/10.3390/ijerph17186655 16. The role of community-wide wearing of face mask for control of coronavirus disease 2019 (COVID-19) epidemic due to SARS-CoV-2 17. Cheng, V.C.C., Wong, S.C., Chuang, V.W.M., So, S.Y.C., Chen, J.H.K., Sridhar, S., To, K.K.W., Chan, J.F.W., Hung, I.F.N., Ho, P.L., Yuen, K.Y.: J. Infect. 81(1), 107–114 (2020). https://doi.org/10.1016/j.jinf.2020.04.024 18. Cheng, V.C., Lau, S.K., Woo, P.C., Yuen, K.Y.: Severe acute respiratory syndrome coronavirus as an agent of emerging and reemerging infection. Clin. Microbiol. Rev. 20, 660–694 (2007) 19. Wu, J., Xu, F., Zhou, W., Feikin, D.R., Lin, C.Y., He, X., et al.: Risk factors for SARS among persons without known contact with SARS patients, Beijing, China. Emerg. Infect. Dis. 10, 210–216 (2004) 20. Allegranzi, B., Pittet, D.: Role of hand hygiene in healthcare-associated infection prevention. J. Hosp. Infect. 73(4), 305–315 (2009)
A Comprehensive Source Code Plagiarism Detection Software Amay Dilip Jain, Ankur Gupta, Diksha Choudhary, Nayan, and Ashish Tiwari
Abstract Source code plagiarism is a simple task to commit, yet exceptionally hard to identify without proper tool support. Various source code detection systems have been developed to help detect source code plagiarism. Those systems need to recognize a number of lexical and structural source code modifications. For instance, by some structural modifications (e.g., modification of control structures, modification of data structures or structural redesign of source code), the source code can be changed so that it almost looks genuine. Most of the existing source code similarity detection systems can be confused when these structural modifications have been applied to the original source code. To be viewed as effective, a source code similarity detection system should address these issues. The most widely used existing source code plagiarism detection system worldwide is JPlag. A few disadvantages of JPlag have been reported, most of which are a consequence of the strategy by which JPlag converts the source code into token strings: it represents all variable types using the same token, i.e., string and int are assigned the same token "Vardef," which results in false positives. To overcome this, in our approach we have assigned different tokens to variables of different types, i.e., string and char are assigned the same token, whereas string and int array have different tokens. We have undertaken this project to address these issues, and we designed and built a source code similarity detection framework for plagiarism identification. To show that the proposed framework has the desired effectiveness, we performed a well-known evaluation. The proposed framework showed promising outcomes when compared to the JPlag framework in detecting source code similarity when various lexical or structural changes are applied to the copied code. Keywords Machine learning · Efficient tokenization · Similarity detection · Plagiarism techniques and solutions
A. D. Jain (B) · A. Gupta · D. Choudhary · Nayan · A. Tiwari Department of Computer Science and Engineering, Visvesvaraya National Institute of Technology (VNIT), Nagpur, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 S. Roy et al. (eds.), Innovations in Computational Intelligence and Computer Vision, Advances in Intelligent Systems and Computing 1424, https://doi.org/10.1007/978-981-19-0475-2_30
1 Introduction Source code plagiarism can be characterized as attempting to pass off (parts of) source code written by someone else as one's own (i.e., without indicating which parts are copied from which author). Plagiarism occurs regularly in academic environments, where students deliberately or inadvertently include sources in their work without an appropriate reference. Manual detection of plagiarism in a collection of many students' submissions is infeasible and uneconomical. Plagiarism in college course assignments is an increasingly common problem. Several surveys have shown that a high percentage of students have engaged in some form of academic dishonesty, particularly copying others' assignments. Source code plagiarism is not so easy to detect. Generally, when people solve the same problem using the same programming language, there is a high chance that their solutions will be more or less similar. Several source code modification strategies exist that can be used to mask plagiarism; examples are renaming identifiers or joining several parts copied from different source code files. These changes increase the difficulty of recognizing plagiarism. Tracing source code copying is a tedious task for educators, since it requires comparing each pair of source code files containing hundreds or even thousands of lines of code (LOC). Next, we discuss our approach toward detecting plagiarism in source code.
2 Approach
2.1 Four-Phased Approach
The main goal is an efficient source code similarity detection system. To be highly robust to lexical and structural modifications, the system must be able to efficiently detect and point out similarity between source code files. To accomplish this, the system implements a detailed and efficient tokenization algorithm and supports various similarity detection algorithms that operate on the produced token set. The proposed system also supports algorithm extensibility: it is easy to extend with new algorithms for similarity measurement. As per the reported designs commonly found in other systems, our similarity detection approach comprises the following four phases:
• Preprocessing
• Tokenization
• Similarity measurement
• Final similarity calculation (Fig. 1)
Fig. 1 Approach overview comprising four stages: preprocessing, tokenization, similarity measurement and final similarity calculation
2.2 Preprocessing The first phase of source code similarity detection makes the detection process robust to the following simple source code transformations: addition, modification or deletion of comments; splitting or merging of variable declarations; changing the order of variables; as well as addition of some redundant statements. Accordingly, all comments are removed from the original source code. This process includes removing block comments, single-line comments, trailing comments and end-of-line comments. Combined variable declarations are split into a sequence of individual declarations. Fully qualified names are replaced by simple names, while package declarations and import statements are removed from the original source code. Also, in this phase, variables in statements are grouped by type. As can be seen, after the preprocessing phase, all comments have been removed from the source code, fully qualified class names have been replaced by their corresponding simple names, combined variable declarations have been split into individual declarations, and variables in statements are grouped by type.
2.3 Tokenization Tokenization is the process of converting the source code into tokens. This technique is very popular and is used by many source code plagiarism detection systems. The tokens are chosen so that they characterize the essence of a
Fig. 2 Preprocessing example
program, which is hard to change by a plagiarist. For instance, whitespace should never generate a token. This tokenization procedure is commonly used to neutralize various source code modifications. A basic tokenization algorithm substitutes all identifiers and values with tokens one by one. Our tokenization algorithm substitutes identifiers with appropriate tokens chosen based on the identifier type. For instance, all identifiers of Java numeric types, i.e., all identifiers of byte, short, int, long, float and double along with their corresponding wrapper classes (Byte, Short, Integer, Long, Float and Double, respectively), are substituted with the same token. Likewise, their values are substituted with a corresponding token. With this tokenization algorithm, two main benefits are obtained:
• False positives are reduced.
• Shortened use of data types is overcome (Fig. 2). A toy sketch of this type-aware substitution follows.
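This toy Python sketch illustrates the idea of type-aware token substitution; it is a simplification for illustration, not the authors' actual tokenizer:

import re

NUMERIC = {"byte", "short", "int", "long", "float", "double",
           "Byte", "Short", "Integer", "Long", "Float", "Double"}
CHARLIKE = {"char", "Character", "String"}

def tokenize(line):
    tokens = []
    for word in re.findall(r"[A-Za-z_]\w*|\d+|\S", line):
        if word in NUMERIC:
            tokens.append("NUM_TYPE")    # one shared token per type family
        elif word in CHARLIKE:
            tokens.append("CHAR_TYPE")
        else:
            tokens.append(word)
    return tokens

print(tokenize("int count = 0;"))     # ['NUM_TYPE', 'count', '=', '0', ';']
print(tokenize("String s = name;"))   # ['CHAR_TYPE', 's', '=', 'name', ';']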
Fig. 3 Various similarity formulas
Fig. 4 Overall similarity formula
2.4 Similarity Detection and Measurement When comparing two token strings Str1 and Str2, the aim of this algorithm is to find a set of identical substrings that satisfy the following requirements:
• Any token of Str1 may only be matched with exactly one token from Str2.
• Substrings are to be found independent of their position in the string.
• Because long substring matches are more reliable, they are preferred over short ones; short matches are more likely to be spurious.
Thus, changing the order of statements within code blocks and reordering of code blocks is not an effective attack if the reordered code segments are longer than the minimal match length. If the reordered code segments are shorter than the minimal match length, then the mentioned modifications can be an effective attack. The matches are marked so that they cannot be used for further matches in the first phase of a subsequent iteration. This guarantees that every token may be used in only one match. After all the matches are marked, the first phase is started again. The algorithm finishes its work when the match length equals the minimum match length threshold value. The various formulas used are shown in Fig. 3. A minimal sketch of this greedy string tiling procedure follows.
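The sketch below is a naive Python rendering of the two-phase greedy string tiling idea described above (find the longest unmarked matches, then mark them as tiles); it is for illustration, not the authors' optimized Running Karp-Rabin implementation:

def greedy_string_tiling(a, b, min_match=3):
    # Returns tiles (i, j, length); each token is used in at most one tile,
    # and longer matches are taken before shorter ones.
    marked_a, marked_b = set(), set()
    tiles = []
    max_match = min_match + 1
    while max_match > min_match:
        max_match = min_match
        matches = []
        for i in range(len(a)):
            for j in range(len(b)):
                k = 0
                while (i + k < len(a) and j + k < len(b)
                       and a[i + k] == b[j + k]
                       and (i + k) not in marked_a
                       and (j + k) not in marked_b):
                    k += 1
                if k > max_match:
                    matches, max_match = [(i, j, k)], k
                elif k == max_match:
                    matches.append((i, j, k))
        for i, j, k in matches:
            # skip candidates that now overlap an already-marked tile
            if all((i + t) not in marked_a and (j + t) not in marked_b
                   for t in range(k)):
                marked_a.update(range(i, i + k))
                marked_b.update(range(j, j + k))
                tiles.append((i, j, k))
    return tiles

print(greedy_string_tiling(list("abcdefg"), list("xxabcdyy")))  # [(0, 2, 4)]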
2.5 Final Similarity Detection
The fourth stage in this similarity measure is the final similarity calculation. This calculation depends on the similarity measure values obtained from the similarity detection algorithms and their weight factors. (An n-gram is a contiguous substring of length n.) The overall similarity measure between two source code documents a and b is determined using the formula shown in Fig. 4, where simi is the similarity measure value obtained from the similarity detection algorithm i, wi is the weight factor of the similarity detection algorithm i, w is the sum of weight factors of all used similarity detection algorithms, and n is the number of all used similarity detection algorithms.
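The formula itself appears only in Fig. 4 and is not reproduced in the text; from the variable definitions just given, it presumably takes the weighted-average form

$\mathrm{sim}(a, b) = \dfrac{\sum_{i=1}^{n} w_i \cdot \mathrm{sim}_i(a, b)}{w}, \qquad w = \sum_{i=1}^{n} w_i$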
Fig. 5 Training model
3 Weight Calculation
To calculate appropriate values of the weights in the above formula, we have trained our dataset using a feedforward neural network model with the backpropagation algorithm. Our model is a single-layer feedforward model with the following two layers:
• Input layer.
• Output layer.
In the input layer, there are two neurons along with their corresponding weight values. The percentage similarities calculated from the two algorithms, i.e., Winnowing and Running Karp-Rabin Greedy String Tiling, from the previous phases are given as input to these two neurons of the input layer. The output layer consists of a single neuron which outputs the overall percentage similarity value obtained using a specific set of values for each variable (Fig. 5).
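A minimal sketch of this training setup is given below, assuming a plain gradient-descent (backpropagation) update on squared error. The stand-in data, target weights and learning rate are illustrative and are not taken from the paper.

```python
import numpy as np

# Two input neurons (Winnowing and RKR-GST similarity percentages)
# feeding one output neuron; the two connection weights are learned.
rng = np.random.default_rng(0)
X = rng.uniform(0, 100, size=(200, 2))    # stand-in similarity pairs
y = 0.6 * X[:, 0] + 0.4 * X[:, 1]         # stand-in expected similarity

w = rng.normal(size=2)                    # the two weights being learned
lr = 1e-5
for epoch in range(5000):
    pred = X @ w                          # forward pass
    grad = 2 * X.T @ (pred - y) / len(y)  # backpropagated error gradient
    w -= lr * grad                        # weight update

print("learned weights:", w)              # approaches [0.6, 0.4]
```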
4 Results
For the training to find the appropriate values of the weights, 200 Java files have been used. They were taken from the JPlag homepage. The tool is widely used for plagiarism detection, and therefore, we decided to use the freely accessible test data provided on their homepage. These files only use the Java Standard Library, which is an important requirement of this implementation. The JPlag homepage only provided us with 100 files. The remaining files have been taken from various Internet sources, and we calculated their similarity using JPlag, which acted as the expected similarity percentage for training those files. The following table shows the results obtained
after training and testing the aforementioned Java files using the training model stated in the phases above.

Sr. No. | Values of n (for grouping tokens) | Accuracy (compared to JPlag) (%)
1       | 25                                | 89.21
2       | 20                                | 91.70
3       | 10                                | 92.34
4       | 7                                 | 96.01
5       | 6                                 | 94.78
From the table, it is evident that the value of n = 7 gave the best result of 96.01%, with the weight values W1 = 58 (Winnowing) and W2 = 36 (Running Karp-Rabin Greedy String Tiling algorithm).
5 Conclusion
The main motive of this project was to develop a robust and accurate source code plagiarism detection software. A study of various existing systems and algorithms was completed, and we arrived at the conclusion that the methods mentioned above are the best possible way to obtain maximum accuracy and speed. The final result, after taking the weighted averages of the chosen algorithms Winnowing and Running Karp-Rabin Greedy String Tiling and training them to get the desired weights, gave us an accuracy of 96% when the n-gram size was taken as n = 7. This shows that the lower the window size, the higher the accuracy, which makes sense since tokenized strings are much more comparable when the window size is smaller. A smaller n-gram size could not be taken, as comparisons were taking a much longer time, and this would defeat the purpose of creating an accurate and fast plagiarism detection algorithm.
References 1. Bandara, U., Wijayarathna, G.: A machine learning based tool for source code plagiarism detection. Int. J. Mach. Learn. Comput. 1(4) (2011) 2. Gondaliya, T.P., Joshi, H.D., Joshi, H.: Source code plagiarism detection 'SCPDet': a review. Int. J. Comput. Appl. 105(17), 0975–8887 (2014) 3. Petrik, J., Chuda, D., Steinmuller, B.: Source code plagiarism detection: the UNIX way. In: 2017 IEEE 15th International Symposium on Applied Machine Intelligence and Informatics (SAMI) 4. Cheers, H., Lin, Y., Smith, S.P.: A novel approach for detecting logic similarity in plagiarised source code. In: 2019 IEEE 10th International Conference on Software Engineering and Service Science (ICSESS) 5. Ali, A.M.E.T., Abdulla, H.M.D., Snasel, V.: Overview and comparison of plagiarism detection tools. In: DATESO, Citeseer, pp. 161–172 (2011)
6. Karnalim, O., Budi, S., Toba, H., Joy, H.: Source code plagiarism detection in academia with information retrieval: dataset and the observation. Inf. Educ. 18(2), 321–344 (2019) 7. Guo, S., Liu, J.B.: An approach to source code plagiarism detection based on abstract implementation structure diagram. In: MATEC Web of Conferences 232, 02038 EITCE 2018 (2018) 8. Yudhana, A., Mukaromah, I.A.: Implementation of Winnowing algorithm with dictionary English-Indonesia technique to detect plagiarism. (IJACSA) Int. J. Adv. Comput. Sci. Appl. 9(5) (2018) 9. Luo, L., Ming, J., Wu, D., Liu, P., Zhu, S.: Semantics-based obfuscation-resilient binary code similarity comparison with applications to software and algorithm plagiarism detection. IEEE Trans. Softw. Eng. 99, 1 (2017) 10. Park, J., Son, D., Kang, D., Choi, J., Jeon, G.: Software similarity analysis based on dynamic stack usage patterns. In: Proceedings of the 2015 Conference on Research in Adaptive and Convergent Systems, RACS, pages 285–290, New York. ACM (2015) 11. Yasaswi, J., Kailash, S., Chilupuri, A., Purini, S., Jawahar, C.V.: Unsupervised learning based approach for plagiarism detection in programming assignments. In: Proceedings of the 10th Innovations in Software Engineering Conference, ISEC ’17, pages 117–121, New York. ACM (2017)
Path Planning of Mobile Robot Using Adaptive Particle Swarm Optimization Himanshu, Arindam Singha, Akash Kumar, and Anjan Kumar Ray
Abstract This paper proposes an adaptive particle swarm optimization (APSO)-driven algorithm for robot path planning and obstacle avoidance in an unknown environment. The proposed algorithm consists of two different obstacle avoidance methodologies: considering one obstacle at a time, and considering all obstacles in the path between the robot position and the goal position. To avoid collision, the robot determines the tangential points on the safety circle. The simulation results are presented for different environmental situations. The proposed algorithm is applied on Webots to validate its effectiveness. It works efficiently, and the mobile robot has successfully avoided the obstacles while moving toward the goal position. A comparative study between the two obstacle avoidance methodologies is presented in terms of the minimum path length to reach the goal position. The proposed algorithm is also compared with existing algorithms, and it provides satisfactory results. Keywords Adaptive particle swarm optimization (APSO) · Path planning · Obstacle avoidance · Mobile robot · Navigation
1 Introduction
In recent times, autonomous mobile robots have been used in logistics, military operations, space missions, emergency situations like fire hazards, medical applications, radiation leaks, etc. The basic objective for the mobile robot is to plan an obstacle-free path from the initial to the goal position. The basic conception of particle swarm optimization (PSO) was proposed in article [1]. The adaptive inertia weight-based PSO algorithm was proposed and discussed in [2]. Deep reinforcement learning-based algorithms for mobile robot navigation in a 2D environment were proposed in article [3]. An integrated sliding mode controller and soft computing-based algorithm for wheeled mobile robot navigation in an unknown environment were developed in
Himanshu · A. Singha (B) · A. Kumar · A. K. Ray Electrical and Electronics Engineering Department, National Institute of Technology Sikkim, Ravangla, Sikkim, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 S. Roy et al. (eds.), Innovations in Computational Intelligence and Computer Vision, Advances in Intelligent Systems and Computing 1424, https://doi.org/10.1007/978-981-19-0475-2_31
[4]. A detailed survey of mobile robot navigation was given in [5]. A stereo visual-based robust and autonomous navigation of a mobile robot was discussed in article [6]; the proposed algorithm was verified with a number of experiments in a real environment. A radio-frequency identification technology-based localization of the mobile robot and navigation through Petri net dynamics was proposed in article [7]. A cooperative path planning algorithm for mobile robots using particle swarm optimization (PSO) was proposed in [8]. A novel membrane evolutionary artificial potential field approach for mobile robot path planning in both static and dynamic environments was discussed in [9]. An integrated artificial potential field method and ant colony optimization (ACO)-based algorithm was developed in [10] for the mobile robot. In [11], the authors proposed a combined PSO and artificial potential field-based algorithm for optimized collision-free path planning for the mobile robot. A cuckoo optimization search-based algorithm for mobile robot navigation in a dynamic environment was proposed in [12]. A robust path planning for mobile robots using smart particle swarm optimization was developed in [13]. A neural dynamics-based algorithm was also proposed in article [14]. The rest of the paper is organized as follows: the problem formulation and the basic background of PSO and APSO are described in Sects. 2 and 3, respectively, followed by the proposed algorithm in Sect. 4. The simulation results are described in Sect. 5. The concluding remarks about this work and future improvement scopes are discussed in Sect. 6.
2 Problem Formulation
The main objective of this work is to generate an obstacle-free path for the mobile robot. Figure 1 depicts the possible path between the initial and the goal position of the mobile robot in the presence of an obstacle. In this work, the obstacle is encapsulated within a safety circle having a radius of σ. A straight line is drawn through the robot position and the goal position to identify the presence of obstacles. If any
Fig. 1 (left) Structure of robot navigation; (right) obstacle elimination beyond the robot path
obstacle is present in the path, the straight line will intercept it. By following this approach, the straight line may also intercept obstacles beyond the goal position and prior to the initial robot position, as shown in Fig. 1. These obstacles can be eliminated by comparing the difference in the x-coordinate of the robot with the goal and the obstacles (and likewise for the y-coordinates), and by comparing the Euclidean distance between the robot position and the obstacles with that between the robot position and the goal position. If the distance between the robot and the obstacle position is greater than the distance between the robot position and the goal position, then the obstacle is eliminated. The distance between the robot and the tangent point (D1) and from the robot to the goal position (D2) are expressed as

$D_1 = \sqrt{(R_x - T_x)^2 + (R_y - T_y)^2}, \quad D_2 = \sqrt{(R_x - G_x)^2 + (R_y - G_y)^2}$   (1)
3 Adaptive Particle Swarm Optimization (APSO)
The PSO is a population-based search algorithm in which a swarm of particles is evaluated by a function known as the objective function to obtain faster convergence [1]. The objective function minimizes or maximizes the potential solution of a particle. Each particle stores its personal best and the global best values. The velocity of each particle is updated with respect to its personal and global behavior. The velocity and position of the particles can be updated using the following Eqs. (2) and (3) [1]:

$[p_v(i,j)]_a^{b+1} = [w\,p_v(i,j) + c_1 r_1(p_b(i,j) - p_x(i,j)) + c_2 r_2(g_b(i,j) - p_x(i,j))]_a^b$   (2)

$[p_x(i,j)]_a^{b+1} = [p_x(i,j)]_a^b + [p_v(i,j)]_a^{b+1}$   (3)
where a is the particle number in the total population and b is the iteration number. c1 and c2 are the cognitive and social parameters, respectively. r1, r2 are random numbers. pb(i, j) is the personal best value for that particular particle, and gb(i, j) is the global best value of the particles. In APSO, the particles automatically adapt the changes in velocity using the inertia weight. This increases the capability to search the solution space. The inertia weight is varied according to Eq. (4). The value of wmin is taken as 0.5 [2], and the random value is generated between 0 and 1 [2]:

$w = w_{\min} + \mathrm{rand}()/2$   (4)

$c_1 = c_{1\max} - (c_{1\max} - c_{1\min}) \cdot O_d$   (5)
$c_2 = c_{2\max} - (c_{2\max} - c_{2\min}) \cdot O_d$   (6)
where c1max, c2max, c1min, and c2min are the maximum and minimum values of c1 and c2, respectively. Od is a flag representing whether an obstacle is detected or not: if an obstacle is detected, Od = 1; otherwise, Od = 0.
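As a worked illustration, a minimal Python sketch of an APSO run following Eqs. (2)-(6) is given below. The parameter bounds follow the simulation section later in the paper (cmin = 1.1, cmax = 2, wmin = 0.5); the objective function, swarm size and goal position are illustrative stand-ins, not the paper's code.

```python
import numpy as np

rng = np.random.default_rng(1)
n_particles, dim = 30, 2
w_min, c1_min, c1_max, c2_min, c2_max = 0.5, 1.1, 2.0, 1.1, 2.0

pos = rng.uniform(0, 220, (n_particles, dim))
vel = np.zeros((n_particles, dim))
p_best = pos.copy()

def objective(p, goal=np.array([200.0, 200.0])):
    return np.linalg.norm(p - goal, axis=-1)     # Eq. (7) without obstacles

g_best = p_best[np.argmin(objective(p_best))]

O_d = 0                                          # no obstacle detected here
for step in range(100):
    w = w_min + rng.random() / 2                 # Eq. (4)
    c1 = c1_max - (c1_max - c1_min) * O_d        # Eq. (5)
    c2 = c2_max - (c2_max - c2_min) * O_d        # Eq. (6)
    r1, r2 = rng.random((2, n_particles, dim))
    vel = (w * vel + c1 * r1 * (p_best - pos)    # Eq. (2)
           + c2 * r2 * (g_best - pos))
    pos = pos + vel                              # Eq. (3)
    better = objective(pos) < objective(p_best)
    p_best[better] = pos[better]
    g_best = p_best[np.argmin(objective(p_best))]

print("best position found:", g_best)
```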
4 Proposed Methodology for Robot Path Planning Using APSO
4.1 Minimizing the Distance Between the Robot Initial and the Goal Position
The minimum distance between the goal position and the robot position can be expressed as Eq. (7):

$f_1(i) = \sqrt{(R_x - G_x)^2 + (R_y - G_y)^2}$   (7)
where Rx, Ry and Gx, Gy are the coordinates of the robot's current position and the goal position along the x-coordinate and y-coordinate, respectively, and i represents the iteration number. If there is no obstacle present in the path, the robot will try to follow the minimum-distance path between the robot's initial position and the goal position.
4.2 Path Planning and Obstacle Avoidance
Obstacle Detection: To avoid the obstacle using both methods, the robot determines the tangential points on the safety circle. To find the tangential points, a straight line is drawn between the robot position and the goal position; it is expressed as

$y = \dfrac{R_y - G_y}{R_x - G_x}\,x + G_y - \dfrac{R_y - G_y}{R_x - G_x}\,G_x$   (8)
The safety circle enclosing the obstacles can be expressed as

$x^2 + y^2 - 2O_x x - 2O_y y + O_x^2 + O_y^2 - \sigma^2 = 0$   (9)
where Ox and Oy are the center position of the obstacle. Now, replacing the value of y from Eq. (8) in Eq. (9), it becomes a quadratic equation. If the roots of the quadratic
equation are real and unequal, it signifies that the straight line intersects the safety circle (a computational sketch of this test is given at the end of this section). n is the total number of iterations, Ot represents the total number of obstacles, and np is the total number of particles.
Method I: To avoid collision with the obstacles, the robot determines the tangential points (Tx, Ty) on the safety circle. The equation of the tangents from the robot position to the safety circle can be represented as

$T_x(R_x - O_x) + T_y(R_y - O_y) - O_x R_x - O_y R_y + O_x^2 + O_y^2 - \sigma^2 = 0$   (10)

Equation (10) can be rewritten as

$T_y = -\dfrac{R_x - O_x}{R_y - O_y}\,T_x - \dfrac{O_x^2 + O_y^2 - \sigma^2 - O_y R_y - O_x R_x}{R_y - O_y}$   (11)
Now, considering (x ≡ Tx) and (y ≡ Ty) in Eq. (11) and substituting into Eq. (9), it becomes a quadratic equation. By solving the equations, the tangential points are determined. Of the two tangential points, the one with the minimum distance from the goal position is considered as the target position. The objective function f2 for avoiding collision with an obstacle can be expressed as

$f_2(i) = \sqrt{(R_x - T_x)^2 + (R_y - T_y)^2} \cdot O_d$   (12)
Once the distance between the robot position and the tangential point is less than or equal to σ, Od is set to 0, so that the robot can again search for new obstacles ahead. The overall objective function for method I can be represented as

$f(i) = f_1(i) + f_2(i)$   (13)
Method II: In this approach, all the obstacles between the robot's initial position and the goal position are considered. The modified objective function for obstacle avoidance can be represented as

$f_3(i) = \sum_{j} \sqrt{(R_x - T_x)^2 + (R_y - T_y)^2} \cdot O_d$   (14)

where j runs over the total number of obstacles detected. The interpretations of the other variables are the same as described before. So, the total objective function for method II is

$f(i) = f_1(i) + f_3(i)$   (15)
In each method, the global best position of the particles is taken as the position of the robot. This process continues until the robot reaches the goal position.
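To make the obstacle test of Eqs. (8) and (9) concrete, the following sketch substitutes the straight line through the robot and goal positions into the safety circle and checks the discriminant of the resulting quadratic. The coordinates used in the example are illustrative, not taken from the paper's environments.

```python
# Line-circle intersection test: a positive discriminant means two real,
# unequal roots, i.e., the straight line crosses the safety circle.
def line_intersects_safety_circle(R, G, O, sigma):
    (rx, ry), (gx, gy), (ox, oy) = R, G, O
    if rx == gx:                       # vertical line: compare x distance
        return abs(rx - ox) < sigma
    m = (ry - gy) / (rx - gx)          # slope of Eq. (8)
    c = gy - m * gx                    # intercept of Eq. (8)
    # (x - ox)^2 + (m*x + c - oy)^2 = sigma^2, expanded to a*x^2 + b*x + k = 0
    a = 1 + m * m
    b = -2 * ox + 2 * m * (c - oy)
    k = ox**2 + (c - oy)**2 - sigma**2
    return b * b - 4 * a * k > 0       # real, unequal roots => intersection

print(line_intersects_safety_circle((0, 0), (100, 100), (50, 52), 10))  # True
print(line_intersects_safety_circle((0, 0), (100, 100), (50, 90), 10))  # False
```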
5 Simulation Results
5.1 Implementation on Different Environmental Conditions
The workspace is a 2D environment of size 220 ∗ 220 (m). The value of σ is considered as 10 m. The values of c1min and c2min are taken as 1.1, whereas c1max and c2max are taken as 2. Figures 2 and 3 depict the results obtained by using method I and method II, respectively. The proposed algorithm is validated with varying environmental conditions as well as different initial and goal positions. The robot has successfully reached the goal position without colliding with the obstacles. A detailed comparative result is given in Table 1, in terms of the length of the path for all four conditions using method I and method II, respectively. It can be concluded from Table 1 that both methodologies are effective for the robot to generate a collision-free path. The pseudocode of the proposed algorithm is given below.
Fig. 2 Robot movement using method I in different environmental situations
Fig. 3 Robot movement using method II in different environmental situations
Table 1 Comparison of two different methods in terms of length of the path

Path length | Situation 1 | Situation 2 | Situation 3 | Situation 4
Method I    | 278.53      | 349.39      | 387.15      | 381.48
Method II   | 294.80      | 287.84      | 314.16      | 283
5.2 Representation on Webots Platform
To validate the proposed algorithm, it is implemented on the Webots simulator. An offset value is taken around the goal: if the robot reaches within that offset of the goal position, it is assumed that the robot has reached the goal position. Figures 4 and 5 depict the results of the robot path planning using method I and method II, respectively. The robot has taken a similar path in the simulated result for both of these cases. So, it can be stated that the proposed algorithm works efficiently and provides satisfactory results on Webots.
Fig. 4 Implementation of method I: (left) simulated environment; (right) Webots simulator
Fig. 5 Implementation of method II: (left) simulated environment; (right) Webots simulator
5.3 Comparative Study
To validate the effectiveness of the proposed algorithm, it is compared with the algorithms proposed in [13, 15]. In [13], the authors had given the path lengths using PSO and APSO. A detailed comparative result is tabulated in Table 2. It can be seen from Table 2 that the robot has taken shorter paths using method I and method II than using the PSO and APSO of [13]. This shows that the proposed algorithm performs better than [13] for this particular set of environmental conditions.

Table 2 Comparison with [13] in terms of length of the path

S. No | PSO    | APSO   | Method I | Method II
1     | 441.72 | 434.14 | 304.27   | 367.16
2     | 447.42 | 428.42 | 418.93   | 424.08
3     | 600.81 | 580.11 | 480.60   | 465.65
4     | 466.93 | 456.32 | 300.15   | 332.02
Table 3 Comparison with [15] for different parameters

Environment | PSO method         | Path length (pixels) | Execution time (sec) | Number of best generations
Map 1       | S-PSO              | 14.408               | 1.6054               | 64
Map 1       | B-PSO              | 14.3872              | 1.5719               | 73
Map 1       | TV-IWPSO           | 14.3792              | 1.5354               | 83
Map 1       | Proposed algorithm | 18.8645              | 1.3921               | 33
Map 2       | S-PSO              | 14.2179              | 1.6654               | 66
Map 2       | B-PSO              | 14.3994              | 1.6661               | 55
Map 2       | TV-IWPSO           | 14.1796              | 1.6606               | 88
Map 2       | Proposed algorithm | 15.2925              | 1.4485               | 32
While comparing with [15], the results are given for method II for path planning. The proposed algorithm is compared in terms of path length, execution time, and number of best generations. The comparative parameters are summarized in Table 3. Though the robot has longer path lengths with the proposed algorithm, the execution time and number of best generations are lower compared to [15].
6 Conclusion and Future Work
In this work, an APSO-based algorithm is proposed for robot path planning and obstacle avoidance. Two different methodologies are proposed for collision avoidance. The effectiveness of those methodologies is verified under different environmental conditions. The simulation results have verified the effectiveness of the proposed algorithm. Along with that, the proposed algorithm is implemented on the Webots simulator. The comparative study shows the effectiveness of the proposed algorithm over existing works in the literature. A possible scope of improvement of this work is multi-robot task assignment and path planning in the presence of dynamic obstacles. Acknowledgements This work is supported by Visvesvaraya Ph.D. Scheme, Digital India Corporation for the project entitled "Intelligent Networked Robotic Systems".
References 1. Kennedy, J., Eberhart, R.: Particle swarm optimization. In: Proceedings of the IEEE International Conference on Neural Networks, vol. 4, pp. 1942–1948 (1995) 2. Nickabadi, A., Ebadzadeh, M.M., Safabakhsh, R.: A novel particle swarm optimization algorithm with adaptive inertia weight. Appl. Soft Comput. 11(4), 3658–3670 (2011) 3. Xiang, J., Li, Q., Dong, X., Ren, Z.: Continuous control with deep reinforcement learning for mobile robot navigation. In: Chinese Automation Congress (CAC), pp. 1501–1506, Hangzhou, China (2019) 4. Suganya, K., Arulmozhi, V.: Sliding mode control with soft computing-based path planning wheeled mobile robot. In: 4th International Conference on Advanced Computing and Communication Systems (ICACCS), pp. 1–5, Coimbatore, India (2017) 5. Gao, X., Li, J., Fan, L., Zhou, Q., Yin, K., Wang, J., Song, C., Huang, L., Wang, Z.: Review of wheeled mobile robots' navigation problems and application prospects in agriculture. IEEE Access 49248–49268 (2018) 6. Chae, H.W., Choi, J.H., Song, J.B.: Robust and autonomous stereo visual-inertial navigation for non-holonomic mobile robots. IEEE Trans. Veh. Technol. 69(9), 9613–9623 (2020) 7. Da-Mota, F.A.X., Rocha, M.X., Rodrigues, J.J.P.C., De Albuquerque, V.H.C., De Alexandria, A.R.: Localization and navigation for autonomous mobile robots using petri nets in indoor environments. IEEE Access 6, 31665–31676 (2018) 8. Baygin, N., Baygin, M., Karakose, M.: PSO based path planning approach for multi service robots in dynamic environments. In: International Conference on Artificial Intelligence and Data Processing (IDAP), pp. 1–5. Malatya, Turkey (2018) 9. Orozco-Rosas, U., Montiel, O., Sepulveda, R.: Mobile robot path planning using membrane evolutionary artificial potential field. Appl. Soft Comput. 77, 236–251 (2019) 10. Chen, G., Xu, Y.: Mobile robot path planning using ant colony algorithm and improved potential field method. Comput. Intell. Neurosci. (2019) 11. Mandava, R.K., Bondada, S., Vundavilli, P.R.: An optimized path planning for the mobile robot using potential field method and PSO algorithm. Soft Comput. Probl. Solving 817, 139–150 (2018) 12. Hosseininejad, S., Dadkhah, C.: Mobile robot path planning in dynamic environment based on cuckoo optimization algorithm. Int. J. Adv. Robotic Syst. 16(2) (2019) 13. Dewang, H.S., Mohanty, P.K., Kundu, S.: A robust path planning for mobile robot using smart particle swarm optimization. Procedia Comput. Sci. 290–297 (2018) 14. Chen, Y., Liang, J., Wang, Y., Pan, Q., Tan, J., Mao, J.: Autonomous mobile robot path planning in unknown dynamic environments using neural dynamics. Soft Comput. 13979–13995 (2020) 15. Abaas, T.F., Shabeeb, A.H.: Autonomous mobile robot navigation based on PSO algorithm with inertia weight variants for optimal path planning. In: IOP Conference Series: Materials Science and Engineering, vol. 928(2), p. 022005 (2020)
Impact on Mental Health of Youth in Punjab State of India Amid COVID-19—A Survey-Based Analysis Ramnita Sharda, Nishant Juneja, Harleen Kaur, and Rakesh Kumar Sharma
Abstract The COVID-19 pandemic has triggered one of the biggest crises for human health, distressing more than 200 countries worldwide. The petrifying effects of this pandemic have had a significant impact on the mental health of the public. Due to the protracted pandemic, daily routine chores, social meetings and human interactions have been massively disturbed, resulting in increased stress levels amongst the youth in particular. In the present study, a survey-based analysis has been done to underline the main factors affecting the mental health of the youth of the Punjab state of India in this pandemic period, and an attempt has been made to suggest potential recommendations from the findings. The average perceived stress score (APSS) of the respondents has been calculated, and it has been found that the mean APSS comes out to be 3.4851, which is alarmingly high. The statistical analysis in the paper suggests that parameters like 'feeling more sad or depressed now as compared to the pre-pandemic period' strongly influence APSS. Also, it has been found that three parameters, viz., sleep disturbances, more anxiety and restlessness, and frequent mood swings, are strongly correlated with the APSS of the respondents. The statistical analysis has been performed using the software Statistical Package for the Social Sciences 25 (SPSS 25). Keywords Mental health · COVID-19 · Correlation coefficient · SPSS
R. Sharda Department of English, Dev Samaj College for Women, Ferozepur, Punjab, India N. Juneja (B) Department of Mathematics, Dev Samaj College for Women, Ferozepur, Punjab, India e-mail: [email protected] H. Kaur Department of Chemistry, Dev Samaj College for Women, Ferozepur, Punjab, India R. K. Sharma School of Humanities and Social Sciences, Thapar Institute of Engineering and Technology (Deemed University), Patiala, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 S. Roy et al. (eds.), Innovations in Computational Intelligence and Computer Vision, Advances in Intelligent Systems and Computing 1424, https://doi.org/10.1007/978-981-19-0475-2_32
1 Introduction
Youth is the dynamic force of the population in any country. Optimal mental health of youth is essential for the optimal functioning of any nation; moreover, the youth embody the future of any country. The pandemic has thrown the world's youth into crisis. The COVID-19 pandemic has stirred critical mental health issues all across the world. Many geographical scientists and researchers have elaborated in their research that the severity of coronavirus depends on geographical/climatological factors, mainly temperature, sun exposure, dew point, humidity, wind speed and rainfall [5, 6, 12]. To prevent the spread of the epidemic, the governments of many countries announced preventive steps like the total shutdown of educational institutes, travel restrictions, and the closure of gyms, theatres, museums, parks and places with large gatherings [10, 17]. Preliminary evidence indicated that only elderly people were getting affected [12, 16, 24, 26], but later on it was observed that social distancing measures, reduction in social interaction and self-isolation also cause a deepened detrimental impact on the psychology of the youth [1, 2]. Youth unemployment has risen since the 2008 financial crisis, which stifled investment in education, and now the pandemic has made career prospects much worse for them. In many places, young workers were the first ones to lose their jobs. About 80% of young people confess that their mental health has deteriorated. Mental health has always been a hushed-up topic and a taboo of sorts in Punjab. There is a dire need to ponder the issues that are adversely affecting the mental health of the youth of Punjab, as the state is already struggling with issues like drug abuse and the brain drain manifested in the movement of a major chunk of youth to western countries. The pandemic has changed the dynamics of social relations; restricted means of communication have forced the youth to isolate themselves from their peers, and a degree of alienation has crept into their lives. It is imperative to understand what the youth is facing at the moment. The paper is structured in the following manner. The brief literature showing the effect of the pandemic on youth mental health is discussed in Sect. 2. The materials and methods used are given in Sect. 3. Section 4 gives the main results from the survey, followed by Sect. 5, in which statistical analysis of the obtained results has been done. In the end, Sect. 6 briefly concludes the major findings of the paper.
2 Literature Survey
Several surveys have revealed that anxiety and elevated stress levels have been diagnosed much more amongst college and university students [9, 15, 18, 20]. In the light of rising concern about the current COVID-19 pandemic, a growing number of universities across the world have either postponed or cancelled all campus events such as workshops, conferences, research projects, sports and other activities. These
preventive steps are necessary to check the further spread of the pandemic, but at the same time, they lead to unfavourable effects on the mental health of youth. Students are feeling stressed because of major interruptions in teaching and assessment in the final part of their studies; their graduation is delayed due to the postponement of the final examinations. Further, the youth is starting work life in a massive global recession and facing serious challenges to future prospects caused by the COVID-19 crisis [16]. Wang et al. [21, 22] assessed the mental health status of youth using the depression, anxiety and stress scale (DASS-21) in China. They reported a study of 1210 respondents from 194 cities in China. Most respondents spent 20–24 h per day at home (84.7%) and were worried about their family members contracting COVID-19 (75.2%) [25]. Short-term and long-term effects on mental health, due to viral infection of brainstem nuclei, neurotoxin effects, myalgia, coryza and poor self-rated health status, were significantly associated with a greater psychological impact of the outbreak [11, 21, 23] that leads to higher levels of stress, anxiety and depression [3, 21, 22]. Lai et al. [7] reported that during the epidemic, in response to COVID-19, psychological assistance services including Internet, counselling, telephonic and hotline services were widely deployed by local and national mental health institutions. However, evidence-based evaluations reported that the mental health of youth is still a front-line target for all countries. Roy et al. [14] reported that India observed over 300 suicide cases in one month (May 2020) because of distress caused by the lockdown. Apart from this, 'non-coronavirus deaths', due to mental health disorders, starvation and accidents, have been publicized through social media [8, 19]. The anger, angst, disillusionment, disappointment, distress, feelings of ennui, world-weariness and loss of mental health of this 'lost generation' are a serious issue that needs to be addressed along with the implementation of preventive measures. However, there are many ways by which we can improve their future. The findings of this study identify factors that are significantly affecting the mental health of youth in the Punjab state of India. It also provides ample space to the participants to share their valuable feedback, which can be used to formulate psychological interventions to improve the mental health of vulnerable groups during the COVID-19 epidemic.
3 Materials and Method
A structured questionnaire was designed to assess the impact of various factors on the mental health of the students. The questionnaire was broadly divided into four sections. Section 1 was designed to collect information about the demographic profile of the participants. Section 2 contained six questions in all, asking about the personal information of the participants. Section 3 of the questionnaire was designed to study the social impact of COVID-19 on the life of the participants. The last section, Sect. 4, was designed to study the extent to which COVID-19 affects the psychological behaviour of the participants. To order the respondents' opinions and to measure the constructs through different statements, a Likert scale
has been used in Sects. 4 and 5. The questionnaire was circulated to 512 students across Punjab through social Websites in the form of a Google form link. The aim of the survey was clearly explained to the respondents. The probable time taken by the participants to fill the questionnaire was from 5 to 7 min. The responses to each question were cautiously considered. The data obtained from the survey have been statistically analyzed with the help of SPSS. The main factors that can significantly affect the mental health of youth have been traced out, and a multivariate regression analysis has been carried out along with ANOVA using the SPSS software.
4 Results
4.1 Demographic Characteristics
A total of 512 valid responses were received. From the responses to the questions related to the demographic profile of the students, it is observed that the majority of the responses were recorded from female participants (86.9%). However, there is a good mixture of age groups among the respondents, with 57.8% of the responses from youngsters aged between 15 and 20 years and 42.2% from the age group between 21 and 25 years (Table 1). Moreover, 44.4% of the respondents are students of undergraduate classes and 37.4% are studying in postgraduate classes. The questionnaire was filled by people from various districts of the Punjab state of India. The pie charts presenting the statistics of the demographic profiles of respondents are shown in Fig. 1.

Table 1 Demographic descriptions of N = 512 respondents
Category  | Type       | No. of respondents
Gender    | Female     | 445
Gender    | Male       | 67
Gender    | Others     | 00
Age group | 15–20      | 216
Age group | 21–25      | 296
Education | UG student | 227
Education | PG student | 191
Education | Any other  | 76
Education | N.A.       | 18
Fig. 1 The pie charts presenting the statistics of demographic profiles
4.2 Psychological Impact of COVID-19
COVID-19 has had a significant psychological impact on the participants. To assess the psychological impact of COVID-19, a Likert scale is used in this section with the options: strongly disagree, disagree, neither agree nor disagree, agree, strongly agree. For statistical analysis, the options were replaced by the digits from 1 to 5, with a higher number indicating a greater stress factor associated with the response of the participant. There are eight questions designed for this section, which are sufficient to read the mental stress level of the respondent. The average value of perceived stress scores (APSS) has been calculated for each individual by adding all the responses (numeric values) of the participant and then taking their average. The following criterion is used for assessing the mental health of each individual (Table 2). It has been observed that the COVID-19 pandemic has significantly affected the mental health of the public in this particular region of Punjab. More than 50% of the participants reported an increase in depression level, sleep disturbances, and more mood
Table 2 Criterion used for assessing the mental health of each individual

APSS             | Stress level
Less than 2.5    | Normal
2.6–3            | Moderate
3.1–3.5          | High
Greater than 3.5 | Very high
swings during this pandemic time. Some of them faced problems related to their appetite schedule as well. Around 57% of the participants reported that COVID-19 has significantly increased their overall symptoms of stress and anxiety. The basic descriptive statistics for the responses of the participants related to the psychological impact of COVID-19 are shown in Table 3. It can be seen that the mean APSS is 3.4851, which indicates the severity of the stress level the youth is facing during this pandemic time.

Table 3 Basic descriptive statistics for responses to psychological impact of COVID-19

Question | N | Mean | Std. deviation | Variance
Feeling more sad or depressed now as compared to pre-COVID times | 512 | 3.4961 | 1.09078 | 1.190
Facing more problems related to sleep, i.e. either sleeping too little or sleeping too much as compared to pre-COVID times | 512 | 3.3945 | 1.17234 | 1.374
Confidence level has decreased too much as compared to pre-COVID times | 512 | 3.1211 | 1.14034 | 1.300
More anxious, restless and much more worried as compared to pre-COVID times | 512 | 3.4512 | 1.06793 | 1.140
Experiencing poor appetite or problems of over eating as compared to pre-COVID times | 512 | 3.1660 | 1.11113 | 1.235
Experiencing more mood swings now as compared to pre-COVID times | 512 | 3.5898 | 1.02021 | 1.041
More tensed about your health as compared to pre-COVID times | 512 | 3.6699 | 1.01382 | 1.028
Postponed your planned activities much more as compared to pre-COVID times | 512 | 3.9922 | 0.89570 | 0.802
Average perceived stress score (APSS) | 512 | 3.4851 | 0.71394 | 0.510
5 Statistical Analysis
An eight-question scale was designed to assess the average perceived stress score of the participants. First, a reliability analysis was carried out for this designed scale. The value of Cronbach's alpha was found to be 0.824, which shows good internal consistency of the designed scale. Next, an attempt was made to identify the variables which influence the average perceived stress score (APSS) of the respondents using factor analysis in SPSS. It has been observed that the first factor (feeling more sad or depressed now) has a strong influence on the value of APSS. It has an eigenvalue of 3.602, and this single factor explained around 45% of the variance in the APSS of the respondents (Table 4). Next, correlation coefficients were computed in order to identify the variables that strongly influence the APSS. Here, APSS is taken as the dependent variable and all other components of the designed scale as independent variables. The criterion for determining the strength of correlation is as follows (Table 5); an illustrative recomputation sketch is given after Table 5. It can be seen from Table 6 that APSS is highly correlated with factors like sleep disturbances, more anxiety and restlessness, and frequent mood swings. These three components of the scale need to be addressed in order to have a stress-free life. Moreover, APSS is moderately correlated with feelings of being sad or depressed, decrease in confidence level, appetite problems and tension about health. However, postponing of planned activities is only weakly correlated with the APSS. The results of the Kaiser–Meyer–Olkin (KMO) and Bartlett's tests justified the significance and consistency of the results obtained (Table 7).

Table 4 Factors influencing the average perceived stress score (APSS)

Feeling more sad or depressed now as compared to pre-pandemic time (component-1):
Eigenvalue                       | 3.602
Percentage of variance explained | 45.02%
Extraction method: principal component analysis. Rotation method: varimax with Kaiser normalization.

Table 5 Thresholds for the correlation coefficient

Correlation coefficient | Strength of correlation
Greater than 0.7        | Highly correlated
Between 0.6–0.7         | Moderately correlated
Less than 0.6           | Weakly correlated
Table 6 Correlation of Q1 to Q8 of the designed scale with APSS

Item | Pearson's correlation coefficient with APSS | Correlation
Feeling more sad or depressed now | 0.663 | Moderately correlated
Facing more problems related to sleep, i.e. either sleeping too little or sleeping too much | 0.708 | Highly correlated
Decrease in confidence level | 0.656 | Moderately correlated
More anxious, restless and much more worried | 0.770 | Highly correlated
Poor appetite or problems of over eating | 0.671 | Moderately correlated
More mood swings | 0.725 | Highly correlated
More tensed about your health | 0.626 | Moderately correlated
Frequently postponed your planned activities | 0.520 | Weakly correlated
Table 7 KMO and Bartlett's test

Kaiser–Meyer–Olkin measure of sampling adequacy    | 0.880
Bartlett's test of sphericity: Approx. chi-square  | 1107.681
Bartlett's test of sphericity: df                  | 28
Bartlett's test of sphericity: Sig.                | 0.000
6 Conclusion
In the present paper, an eight-point scale was designed to assess the mental health of the youth of the Punjab state of India. A survey-based analysis was carried out, covering respondents from the most affected districts of Punjab like Jalandhar, Amritsar, Ludhiana, Patiala, Ferozepur, Mansa, etc. The average perceived stress score of the respondents was calculated with the help of the responses to each question of this scale. It has been found that the mean value of APSS comes out to be 3.4851, which is really alarming. The correlation coefficient of APSS with all eight parameters in the scale has been calculated. A reliability analysis along with factor analysis has been done with the help of the SPSS software. The main factors influencing the APSS have been chalked out. The statistical analysis from the present study suggests that parameters like 'feeling more sad or depressed now as compared to the pre-pandemic period' contribute a major part of the variance of APSS. Further analysis shows that three parameters, viz., sleep disturbances, more anxiety and restlessness, and frequent mood swings, are strongly correlated with the APSS of the respondent. These factors need to be addressed on a priority basis in order to have healthy youth in the society. The government should take some proactive measures to support the mental
health and well-being of youth. In spite of the lack of information regarding the effect of the pandemic on the social life and daily routine of the participants, this scale may be used in future studies for basic screening of the mental health status of respondents, to understand their problems and make efforts to stand up for them.
References 1. Basch, C.H., Sullivan, M., Davi, N.K.: The impact of the COVID-19 epidemic on mental health of undergraduate students in New Jersey, cross-sectional study. PLoS ONE 15(9), 1–16 (2020) 2. Han, W., Xu, L., Jing, A.N., Jing, Y., Qin, W., Zhang, J., Jing, X., Wang, Y.: Online-based survey on college students' anxiety during COVID-19 outbreak. Psychol. Res. Behav. Manag. 14, 385–392 (2021) 3. Huang, C.: Clinical features of patients infected with 2019 novel coronavirus in Wuhan, China. Lancet 6736(20), 30183–30185 (2020) 4. Kafka, A.C.: Shock, Fear, and Fatalism: as Coronavirus Prompts Colleges to Close, Students Grapple with Uncertainty. The Chronicle of Higher Education (2020) 5. Kumar, A., Sinwar, D., Saini, M.: Study of several key parameters responsible for COVID-19 outbreak using multiple regression analysis and multi-layer feed forward neural network. J. Interdisc. Math. 24(1), 53–75 (2020) 6. Kumar, R., Pandey, A., Geleta, R., Sinwar, D., Dhaka, V.S.: Study of social and geographical factors affecting the spread of COVID-19 in Ethiopia. J. Stat. Manage. Syst. 24(1), 99–113 (2021) 7. Lai, J., Ma, S., Wang, Y., Cai, Z.X., Hu, J., Wei, N., Wu, J., Du, H., Chen, T., Li, R., Tan, H., Kang, L., Yao, L., Huang, M., Wang, H., Wang, G., Liu, Z., Hu, S.: Factors associated with mental health outcomes among health care workers exposed to coronavirus disease 2019. JAMA Network Open 3(3), 1–12 (2020) 8. Lennan, M.Mc., Zurich, S.K.: A report by the World Economic Forum. Glob. Risk Initiative 16 (2021) 9. Liu, X., Ping, S., Gao, W.: Changes in undergraduate students' psychological well-being as they experience university life. Int. J. Environ. Res. Public Health 16, 2864 (2019) 10. Moghe, K., Kotecha, D., Patil, M.: COVID-19 and mental health: a study of its impact on students (2020). https://doi.org/10.1101/2020.08.05.20160499 11. Nishiura, H., Jung, S.M., Linton, N.M., Kinoshita, R., Yang, Y., Hayashi, K., Kobayashi, T., Yuan, B., Akhmetzhanov, A.R.: The extent of transmission of novel coronavirus in Wuhan, China. J. Clin. Med. 9(330), 1–5 (2020) 12. Pandey, A., Kumar, R., Siwar, D., Tadele, T., Raja, L.: Assessing the role of age, population density, temperature and humidity in the outbreak of COVID-19 pandemic in Ethiopia. Congress Intell. Syst. 725–734 (2021) 13. Patel, A., Jernigan, D.B.: Initial public health response and interim clinical guidance for the 2019 novel coronavirus outbreak—United States. Erratum 69(5), 140–146 (2020) 14. Roy, A., Singh, A.K., Mishra, S., Chinnadurai, A., Mitra, A., Bakshi, O.: Mental health implications of COVID-19 pandemic and its response in India. Int. J. Soc. Psychiatry 1–14 (2020) 15. Savitsky, B., Findling, Y., Ereli, A., Hendel, T.: Anxiety and coping strategies among nursing students during the COVID-19 pandemic. Nurse Educ. Pract. 46 (2020) 16. Sahu, P.: Closure of universities due to coronavirus disease 2019 (COVID-19): impact on education and mental health of students and academic staff. Cureus 12(4) (2020) 17. Salcedo, A., Cherelus, G.: Travel restrictions. Organ. Manage. J. 17(4), 171–172 (2020) 18. Sharma, P., Devkota, G.: Mental health screening questionnaire: a study on reliability and correlation with perceived stress score. J. Psychiatrists Assoc. Nepal 8(2), 1–6 (2019) 19. Sharma, R., Sharma, N.: Statistical analysis of COVID-19 (SARS-CoV-2) patients data of Karnataka, India. Res. Square
20. Stein, M.B.: Editorial: COVID and anxiety and depression in 2020. Depress Anxiety 37(4), 302 (2021) 21. Wang, C., Horby, P.W., Hayden, F.G., Gao, G.F.: A novel coronavirus outbreak of global health concern. Lancet 6736(20), 30185–30189 (2020) 22. Wang, C., Pan, R., Wan, X., Tan, Y., Xu, L., Ho, C.S., Ho, R.C.: Immediate psychological responses and associated factors during the initial stage of the 2019 coronavirus disease (COVID-19) epidemic among the general population in China. Int. J. Environ. Res. Public Health 17, 1729 (2020) 23. Xiang, Y.T., Yang, Y., Li, W., Zhang, L., Zhang, Q., Cheung, T., Ng, C.H.: Timely mental health care for the 2019 novel coronavirus outbreak is urgently needed. Lancet Psychiatry 7, 28–29 (2020) 24. Xiang, Y.T., Yang, Y., Li, W., Zhang, L., Zhang, Q., Cheung, T., Ng, C.H.: Timely mental health care for the 2019 novel coronavirus outbreak is urgently needed. Lancet Psychiatry 0366(20), 30046–30048 (2020) 25. Zhai, Y., Du, X.: Mental health care for international Chinese students affected by the COVID-19 outbreak. Lancet Psychiatry 7, e22 (2020) 26. Zou, P., Huo, D., Li, M.: The impact of COVID-19 pandemic on firms: a survey in Guangdong Province, China. Glob. Health Res. Policy 5(41), 1–10 (2020)
SmartACL: Anterior Cruciate Ligament Tear Detection by Analyzing MRI Scans Joel K. Shaju, Neha Ann Joshy, Alisha R. Singh, and Rahul Jadhav
Abstract Of the two cruciate ligaments in the human knee, one is the anterior cruciate ligament (ACL). This ligament connects the thighbone to the shinbone. An anterior cruciate ligament tear is one of the most common knee injuries among athletes, dancers, and other professionals whose work requires rigorous physical activity. The chances of this injury increase as participation in physical activity increases. People participating in basketball, football, hockey, or different forms of dance generally face more weight bearing on the hip, knee, and ankle, which can result in direct tearing of the muscular tissues of the anterior cruciate ligament. This injury is usually diagnosed by a magnetic resonance imaging (MRI) scan, which is then analyzed by highly experienced radiologists to produce reports. These reports can take up to weeks depending on various factors like the availability of radiologists or the limited number of experts in this field. Delay in such crucial reports can lead to degradation of the patient's physical health: if the injury is critical, it needs to be attended to as soon as possible, or it may lead to serious issues like permanent damage to the leg. To improve the current situation, this paper proposes a deep learning model to analyze these magnetic resonance imaging scans and predict the injury, to automate and speed up the complete process of detecting an anterior cruciate ligament tear. Keywords Anterior cruciate ligament (ACL) tear · Magnetic resonance imaging (MRI) · Deep learning
1 Introduction
There exist four ligaments in the knee that connect the femur (thighbone) to the tibia (shinbone). The two collateral ligaments are the medial collateral ligament (MCL) and the lateral collateral ligament (LCL), while the two cruciate ligaments are the anterior cruciate ligament (ACL) and the posterior cruciate ligament (PCL). When knee joint stability is considered, the anterior cruciate ligament (ACL) is mainly
J. K. Shaju (B) · N. A. Joshy · A. R. Singh · R. Jadhav Fr. C. Rodrigues Institute of Technology, Mumbai University, Navi Mumbai, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 S. Roy et al. (eds.), Innovations in Computational Intelligence and Computer Vision, Advances in Intelligent Systems and Computing 1424, https://doi.org/10.1007/978-981-19-0475-2_33
described as the passive restraint to anterior translation of the tibia with respect to the femur. The term cruciate means cross-shaped, because the ACL crosses the posterior cruciate ligament to form an "X". It is composed of wavy bundles of collagen fibers which assist in controlling excessive motion by limiting the mobility of the joint in both frontal and transverse planes due to its specific orientation. Injuries to the ACL are among the most common knee injuries, mainly sustained as a result of sports participation or any other physical activity when the foot is firmly planted on the ground and a sudden force hits the knee with a jerk while the leg is in a straight or slightly bent position. This type of injury is common in football, hockey, handball, wrestling, badminton, cricket, gymnastics, volleyball, and other activities such as dance with lots of stop-and-go movements. These injuries result in joint effusion and altered movement, along with weakening of muscle and reduction in the functional performance of the foot. Ligament injuries may range from mild (a small tear) to severe, which involves a complete tear of the ligament or separation of the ligament from part of the bone. There are various symptoms of an ACL injury, including hearing a pop in the knee at the time of injury, pain around the knee, sudden swelling of the knee within the first few hours of the injury (which is a sign of bleeding inside the knee joint), limited knee movement because of pain, and swelling and instability of the knee. After an acute injury, the person may not be able to walk because of the pain and swelling. When it comes to diagnosing this kind of injury, the doctor mainly checks the stability, movement, and tenderness in both the injured and the uninjured knee. Imaging tests such as X-rays and magnetic resonance imaging (MRI) are used to detect the damage in the ligament. A knee X-ray only detects dislocation of the joint (whether it is in proper alignment or not) and the causes of common symptoms such as pain, tenderness, or swelling of the knee, whereas MRI determines the damage caused to ligaments, tendons, muscles, and knee cartilage. Both imaging tests are useful for detecting a ligament tear, but MRI plays the major role in identifying the type of tear. Taking into consideration the seriousness of this injury, and since the MRI scan is the prime source of detecting it, an attempt has been made to detect an ACL tear and expedite the complete detection process by analyzing MRI scans. Convolutional neural networks (CNNs) are a widely used deep learning technique for image recognition. The aim of this work is to build a system using multiple CNN models and finally combine them using a Softmax regression model to get the desired result. The paper is organized as follows. In Sect. 2, related works have been studied through a literature survey of different research papers related to the work. Section 3 presents the proposed system of this work, starting with the problem statement followed by the proposed solution. Section 4 includes the design components of the work. In Sect. 5, a detailed explanation of the implementation along with results is discussed. Section 6 discusses the conclusion of this work.
2 Overview
2.1 Related Works
This section includes the papers that were read and referred to for implementation ideas and the algorithms that were studied for the completion of this work. A few of the papers are mentioned below. The paper [1] was studied to confirm that analyzing MRI scans is the right approach, since there were options other than MRI scans to detect ACL tears. Paper [2] helped in finalizing the dataset upon which to train the model, i.e., the MRNet dataset. The dataset consists of 1370 knee MRI exams performed at Stanford University Medical Center between January 1, 2001 and December 31, 2012. Study [3] helped in deciding to use 2D CNN models in place of 3D CNN models. This study had to be done since MRI scans are series of images and hence can be treated as 3D data. The study compared the accuracies of a 2D model pretrained on the ImageNet dataset and a 3D model custom-built by the researchers and found them similar, i.e., 92% accuracy for the 2D CNN and 89% for the 3D CNN; hence, the 2D model was selected. Studies [4, 5] helped conclude how many planes should be considered. The models used in these studies used only a single plane. Study [4] used the sagittal plane and had an AUC of 0.894. Study [5] used the coronal plane and had a maximum accuracy of 91.5%. These accuracies were lower compared to the other studies, and hence, it was decided that all the planes have to be considered for finding the optimal solution. The study [6] compared different existing CNN architectures like AlexNet, ResNet, and GoogLeNet, and it was found that the accuracies were on the lower side, i.e., 81.66%, 81.67%, and 78.06%, respectively. This led to the conclusion that a custom model had to be implemented. Study [7] led to the conclusion that overly dense models should not be used. The researchers used 3 CNNs, i.e., LeNet-5 and You Only Look Once (YOLO); for the 3rd CNN, they tried densely connected convolutional networks (DenseNet), AlexNet, and very deep convolutional networks for large-scale image recognition (VGG16). They decided to continue with DenseNet because of its superior performance. The drawback of this approach was that the training and prediction times were on the higher side. Therefore, it was decided that a relatively lighter model had to be implemented.
3 Proposed System
This work aims to detect and expedite the complete process of detecting an ACL tear by analyzing MRI scans. Deep learning will be used to classify an MRI scan as
an ACL tear, normal, or any other abnormality. The components involved in the entire system are a database to store the data received from the users, a backend model to classify the MRI scan, and a frontend Web application that serves as the user interface of the system. For the backend, there will be four models in total: three CNNs and a Softmax regression. Since there are three different planes in each MRI scan, three different CNN models, one per plane, will be built. The input is given to the system by the user through the Web interface. The input consists of images from three different planes, i.e., the coronal, sagittal, and axial planes. These separate images are fed to the CNN models, where the probability of the three classes (normal, abnormal, and ACL) is predicted by a Softmax layer. Each image of a plane has a different number of slices; therefore, each CNN model is iterated according to the number of slices. The probabilities from each iteration are then aggregated. The aggregated probabilities from each CNN model act as input to the Softmax regression model. The output of the Softmax regression model is either an ACL tear, normal, or any other abnormality. Finally, this model is integrated with a Web application so that users can upload their scans and get the results. The flowchart of the backend model is shown below.

Fig. 1 Backend flowchart
4 Design
4.1 Dataset
The MRNet dataset, consisting of 1370 knee MRI exams, has been used. The exams were performed at Stanford University Medical Center. The dataset contains 1104 abnormal cases, with 319 ACL tears and 508 meniscal tears. The data have been split into a training set, a validation set, and a hidden test set. The dataset is already divided into the three different planes; hence, no pre-processing is required [8]. Although the dataset did not require pre-processing, augmentation of the data was performed, as the number of abnormal cases was much higher than the normal and ACL tear cases. After
augmenting the data, the abnormal case data were divided and used only partially, to create a balanced dataset.
4.2 System Overview

The system works on the concept of image recognition, which is one of the major applications of deep learning. This section gives an overview of the backend, describing the four models required for prediction from the input image, i.e., the MRI scan.
4.2.1 CNN Models for the Axial, Coronal, and Sagittal Planes

A typical MRI for a patient consists of three scans, one for each plane. These three MRI scans act as input, represented on three different anatomical planes: coronal, sagittal, and axial. The function of these CNN models is to find the probability of the three classes (normal, abnormal, and ACL) and forward the output to the Softmax regression model.
4.2.2 Softmax Regression Model

Softmax regression is a generalization of logistic regression that handles multiple classes by normalizing an input vector into a vector of values that follows a probability distribution summing to 1. The output values lie in the range [0, 1], which allows the model to go beyond binary classification and accommodate as many classes as needed in a neural network model. Here, we use this model as the final step of our prediction. It takes as input the outputs of the three CNN models and finds the probability of the three classes by taking all three planes into account.
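For reference, the standard softmax function that this description amounts to maps a vector of scores z = (z1, . . ., zK) to class probabilities:

σ(z)j = e^(zj) / Σ(k=1..K) e^(zk),  j = 1, . . ., K

so each output lies in [0, 1] and all K outputs sum to 1.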
5 Implementation

The backend comprises 3 CNN models and 1 Softmax model. The architecture of these models is shown in Fig. 2. As individual slices were taken as input, we had 16 slices for each of the 1130 cases, which meant 18,080 slices of size (256, 256, 1). However, as the number of cases of the "abnormal" class was significantly higher than that of the other classes, our model was biased toward the "abnormal" class. Therefore, we decided to use class balancing, which ensured that the bias was removed. This was done by augmenting the "ACL" class data using rotation in the range (0, 25). The outputs from each of the 3 plane models were the probabilities of each case to occur
in the format (normal, abnormal, and ACL). The outputs of the three planes for all the cases were then concatenated and used as training data for the Softmax regression model, which then gave the final output as either 0 (normal), 1 (abnormal), or 2 (ACL). The Softmax regression had an accuracy of 97%. The implementation of the individual models is explained in detail in the following sections.

Fig. 2 System architecture
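As a minimal sketch of the class balancing by rotation mentioned above (Keras assumed; the rotation range mirrors the stated (0, 25) degrees, while the helper itself is illustrative):

# Hedged sketch: oversampling the minority "ACL" class by random rotation.
import numpy as np
from tensorflow.keras.preprocessing.image import ImageDataGenerator

augmenter = ImageDataGenerator(rotation_range=25)   # rotations within (0, 25) degrees

def augment_class(images, target_count):
    # Rotate randomly sampled images until the class reaches target_count.
    extra = []
    while len(images) + len(extra) < target_count:
        img = images[np.random.randint(len(images))]
        extra.append(augmenter.random_transform(img))
    return np.concatenate([images, np.asarray(extra)]) if extra else images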
5.1 CNN Model for the Axial, Coronal, and Sagittal Planes

A slice of size (256, 256, 1) was provided as the input layer. Convolutions of size 5 × 5 with 28 filters were applied, followed by batch normalization and a ReLU activation function, then average pooling of size 14 × 14 with a ReLU activation function. Next, convolutions of size 5 × 5 with 10 filters were applied, followed by batch normalization and ReLU activation, then average pooling of size 5 × 5 with ReLU activation. Finally, a convolution of size 5 × 5 with 1 filter was applied, followed by batch normalization and ReLU activation. A fully connected stage of size 256 × 64 × 3 gave the set of probabilities for a single slice. Since the 16 slices in the middle are our region of interest, we calculate the aggregate of these 16 sets of probabilities. The aggregation function we decided to use was the mean.
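A minimal Keras sketch of the per-plane CNN just described is given below; the layer sizes follow the text, while the padding mode and the 64-unit dense stage (read from the "256 × 64 × 3" description) are assumptions:

# Hedged sketch of the per-plane CNN (not the authors' exact code).
from tensorflow.keras import layers, models

def build_plane_cnn(input_shape=(256, 256, 1), num_classes=3):
    return models.Sequential([
        layers.Input(shape=input_shape),
        layers.Conv2D(28, (5, 5), padding="same"),
        layers.BatchNormalization(),
        layers.Activation("relu"),
        layers.AveragePooling2D(pool_size=(14, 14)),
        layers.Conv2D(10, (5, 5), padding="same"),
        layers.BatchNormalization(),
        layers.Activation("relu"),
        layers.AveragePooling2D(pool_size=(5, 5)),
        layers.Conv2D(1, (5, 5), padding="same"),
        layers.BatchNormalization(),
        layers.Activation("relu"),
        layers.Flatten(),
        layers.Dense(64, activation="relu"),               # assumed 64-unit stage
        layers.Dense(num_classes, activation="softmax"),   # (normal, abnormal, ACL)
    ])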
5.2 Softmax Regression Model

The sets of 3 probabilities obtained from the 3 CNN models were concatenated into a vector of size (1, 9) and used as input to train this model (see Table 1). The size of the table was 558 × 10. The model predicted 545 of the 558 cases correctly, thereby
Table 1 Softmax dataset

Index | Axial normal | Axial abnormal | Axial ACL | Coronal normal | Coronal abnormal | Coronal ACL | Sagittal normal | Sagittal abnormal | Sagittal ACL | Label
0 | 0.0097125 | 0.9936693 | 0.8783656 | 0.0342440 | 0.9667232 | 0.743085 | 0.0074898 | 0.9926226 | 0.8506985 | 1
1 | 0.0141913 | 0.9701744 | 0.8757323 | 0.0297998 | 0.9722293 | 0.8732092 | 0.0139200 | 0.9847384 | 0.8951145 | 1
2 | 0.0078303 | 0.9942367 | 0.8661973 | 0.0776427 | 0.9244673 | 0.5567818 | 0.0053919 | 0.994248 | 0.9513843 | 1
3 | 0.0097032 | 0.9916912 | 0.9379159 | 0.0062342 | 0.9934385 | 0.8969083 | 0.0136549 | 0.9861777 | 0.9503137 | 0
4 | 0.0058265 | 0.9935168 | 0.9259056 | 0.0183287 | 0.9826248 | 0.6075508 | 0.0131890 | 0.9864763 | 0.7707613 | 0
5 | 0.0069859 | 0.9874074 | 0.9334804 | 0.1281738 | 0.8735583 | 0.5961386 | 0.0101669 | 0.9897662 | 0.9596432 | 0
6 | 0.0056158 | 0.9903803 | 0.9006688 | 0.0781291 | 0.9237438 | 0.7401179 | 0.0167410 | 0.9842295 | 0.913814 | 2
7 | 0.0225296 | 0.9806672 | 0.9256332 | 0.0443644 | 0.9563352 | 0.7890317 | 0.0178554 | 0.9814295 | 0.9294456 | 2
8 | 0.0059156 | 0.9897488 | 0.9163734 | 0.1199736 | 0.8830496 | 0.7188952 | 0.0151616 | 0.9850378 | 0.9260935 | 2
giving an accuracy of 97.67%. The labels are interpreted as follows: 0 - normal, 1 - abnormal, and 2 - ACL. The output of this model is the final prediction, i.e., normal, abnormal, or ACL.
5.3 Results

A confusion matrix was used to check the accuracy of the final Softmax model. It was generated by testing the model with data other than the training data. This dataset consisted of 377 cases: 124 ACL cases, 188 abnormal cases, and 65 normal cases. The accuracy obtained according to the confusion matrix is 90.8%. The results of the Softmax model as seen in the confusion matrix are as follows:

1. 65 out of 65 cases of the normal class (100%) were detected properly.
2. 162 out of 188 cases of the abnormal class (86.20%) were detected properly.
3. 111 out of 124 cases of the ACL class (89.50%) were detected properly (Fig. 3).

Fig. 3 Confusion matrix
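These per-class counts can be reproduced with a standard confusion-matrix computation, for example (scikit-learn assumed; y_true and y_pred are placeholder label arrays for the 377 held-out cases):

from sklearn.metrics import accuracy_score, confusion_matrix

cm = confusion_matrix(y_true, y_pred, labels=[0, 1, 2])   # 0 normal, 1 abnormal, 2 ACL
per_class_recall = cm.diagonal() / cm.sum(axis=1)         # fraction detected per class
overall_accuracy = accuracy_score(y_true, y_pred)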
6 Conclusion

Computer-aided diagnosis plays an important role in today's world, with its ability to assist medical specialists by enhancing the accuracy, sensitivity, and specificity of automated detection methods. This paper provides an automated system to predict anterior cruciate ligament (ACL) injury from magnetic resonance imaging (MRI) of the human knee, an injury which happens mainly because of physical activities such as sports, fitness activities, and so on. The proposed method uses a deep learning model to analyze the MRI scans and predict the injury. This system is able to distinguish between normal, abnormal, and completely ruptured ACLs, which could be used
as an early warning system for both patients and doctors. Notifying them of an impending operative treatment would allow them to immediately plan ahead. The system is able to detect only human knee ACL injury at present. If the current system performs as per requirements, it could be integrated into a system used to predict injuries or diseases pertaining to all human body parts by analyzing MRI scans, which could then be used in hospitals or by doctors. Since this is an era of automation, an automated machine to predict injuries or diseases, providing results in minutes, would be greatly appreciated.
References

1. Kocabey, Y., Tetik, O., Isbell, W.M., Atay, O.A., Johnson, D.L.: The value of clinical examination versus magnetic resonance imaging in the diagnosis of meniscal tears and anterior cruciate ligament rupture. Arthroscopy: The Journal of Arthroscopic & Related Surgery, pp. 696–700 (2004)
2. Pranav, R., Bien, N., Ball, R.L., Irvin, J., Park, A., Eric, J., et al.: Deep-learning-assisted diagnosis for knee magnetic resonance imaging: development and retrospective validation of MRNet. PLoS Med. 15(11), e1002699 (2018)
3. Namiri, N.K., Flament, I., Astuto, B., Shah, R., Tibrewala, R., Caliva, F., Link, T.M., Pedoia, V., Majumdar, S.: Deep learning for hierarchical severity staging of anterior cruciate ligament injuries from MRI. Radiol.: Artif. Intell. 2, e190207 (2020)
4. Stajduhar, I., Mamula, M., Miletic, D., Ünal, G.: Semi-automated detection of anterior cruciate ligament injury from MRI. Comput. Methods Programs Biomed. 140, 151–164 (2017)
5. Chang, P.D., Wong, T.T., Rasiej, M.J.: Deep learning for detection of complete anterior cruciate ligament tear. J. Digital Imaging 32 (2019)
6. Irmakci, I., Anwar, S.M., Torigian, D.A., Bagci, U.: Deep learning for musculoskeletal image analysis. In: 2019 53rd Asilomar Conference on Signals, Systems, and Computers, pp. 1481–1485 (2019)
7. Liu, F., Guan, B., Zhou, Z., Samsonov, A., Rosas, H., Lian, K., Sharma, R., Kanarek, A., Kim, J., Guermazi, A., Kijowski, R.: Fully automated diagnosis of anterior cruciate ligament tears on knee MR images by using deep learning. Radiol.: Artif. Intell. 1 (2019)
8. Stanford ML Group [Online]. Available: https://stanfordmlgroup.github.io/competitions/mrnet/
Building a Neural Network for Identification and Localization of Diseases from Images of Eye Sonography Shreyas Talole, Aditya Shinde, Atharva Bapat, and Sharmila Sengupta
Abstract The aim is to create a system that uses an image classifier model to process sonographic images of a patient's eye in order to identify the disease and confirm its presence, to enable the doctor to compare the image analysis with the diagnostic report of the sonography, and to localize the disease, if present, within the sonographic image. Doctors make use of ultrasound machines to generate a sonographic image of the eye. From the image generated, doctors analyze whether a disease is present or the eye is healthy and then specify their inference in the diagnostic report. The dataset is organized according to disease by sorting the data of existing patients. This is accomplished by applying NLP to the diagnoses in existing patients' diagnostic reports to find the disease. Diseases can be identified with the help of an accurate model when a doctor is absent, or the model can act as a second opinion, which is of utmost importance for patients and even doctors. A Web app is built alongside to provide doctors with a user interface to access the data immediately. The collective dataset of sonographic images from existing patient records is further used to train a classifier model that takes a sonographic image of the eye and gives the result for the presence of disease from the trained classes. The interface helps to display the prediction of disease made by the classification model and also shows the localized area of the presence of the disease. Keywords Ophthalmology · VGG · Inception v3 · Xception · Image classification · Image processing · Ultrasound images
1 Introduction

Optical coherence tomography (OCT) is a method to obtain an extremely high-quality picture of a cross-section of the eye. OCT technology was transferred to industry and provided commercially for ophthalmic diagnostics in 1996 (Humphrey Systems, S. Talole (B) · A. Shinde · A. Bapat · S. Sengupta Department of Computer Engineering, Vivekanand Education Society's Institute of Technology, Chembur, Maharashtra, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 S. Roy et al. (eds.), Innovations in Computational Intelligence and Computer Vision, Advances in Intelligent Systems and Computing 1424, https://doi.org/10.1007/978-981-19-0475-2_34
Dublin, CA). Many organizations have carried out extensive medical research over the past several years. The OCT approach offers image resolutions of 1–15 µm, one to two orders of magnitude better than conventional ultrasound [1]. OCT is a painless and non-invasive imaging technique that takes a picture of the retina, located at the back of the eye. It uses rays of light to measure the retinal thickness and examine the parameters of the patient's eyes to determine the patient's disease. OCT was applied for imaging in the eye, where it has had its most extensive clinical impact in ophthalmology. It enables non-contact, non-invasive imaging of the anterior eye and imaging of morphologic features of the human retina, including the fovea and optic disc. NLP helps to free doctors from maintaining the manual and complex structure of clinical reports. A huge volume of unstructured data is entered into the EHR, so it is very difficult to sort the relevant information from it; NLP assists in extracting crucial information from the clinical documents [2]. Image processing plays a significant role in medical image analysis to visualize the actual problem of the eye. In the medical field, image processing technology has become an important part of the clinical routine. It uses various imaging modalities to decide how to process and analyze a significant volume of images so that high-quality information can be produced for disease diagnosis and treatment. It produces visible images of the inner structure of an eye for medical studies and also helps to visualize the interior structure of the eye. Making use of neural networks to process this high-quality information would help gain beneficial insights and in turn would display the power of AI to the world. The paper is organized as follows: after introducing the problem definition in this section, various research papers are reviewed in Sect. 2. There are not many recent developments in this domain, but research on different machine learning models for eye-related diseases is studied. The process of implementation and the architecture of the model are described in Sects. 3 and 4. Disease classes and a comparison of different ML models are specified in the analysis Sects. 5 and 6. Though this is ongoing research, the current outcomes are mentioned in Sect. 7, followed by the references that were useful for better understanding of similar research on eye diseases.
2 Literature Review

An automated classification framework is used on retinal fundus images [3], and several image processing techniques are applied, such as quality enhancement, segmentation, and augmentation of an image, followed by classification. The process highly depends on the quality and quantity of the images. In [2], ocular slit-lamp examination is used to examine the ocular surface, that is, the cornea and conjunctiva. The heatmap visualization feature, apart from classification of images, is successful in detecting abnormal corneal regions in the images for the diagnosis of diseases, but the
accuracy hugely depends on the datasets. Live images of the internal structure of the body are captured with the help of an ultrasound scan, a medical test that uses high frequency sound waves. In [4], the authors augmented medical sonographic image data, then used a fine-tuned Inception v3 model based on transfer learning to extract features automatically, and used different classifiers (Softmax, Logistic, SVM) to classify the images. Finally, this was compared with various models based on the original deep convolutional neural network (DCNN) model [4]. The experiment proved that the approach based on transfer learning was meaningful for pulmonary image classification. Another paper evaluates the feasibility of retinal imaging, which involves acquiring OCT scans of a patient's eyes [5]. The doctor reviews the images and checks for any abnormalities; if anything abnormal turns up on the ultrasound, the patient may need to undergo other diagnostic techniques. The main theme of this study is to understand the various machine learning prediction models that are being used for the correct prediction of diseases. Such prediction models help to analyze the nature of medical image data, which often helps the doctor to cross-check their already made decisions. The purpose of this study is to understand the automatic identification of diseases by using the various image processing algorithms that are capable of solving critical prediction problems in the medical field. Such algorithms can detect and identify age-related diseases of the eyes. This research helps to answer the difficult clinical questions which are the main key constraints in solving problems related to medical diagnostic images. Some machine learning models extract evidence-based useful information which needs to be considered to detect the disease of the patients [6].
3 Implementation Details

3.1 Methodology

To classify sonographic images according to disease, we train neural network models based on different architectures. The input images are taken from existing patient data [7]. Sonographic images of diseases like DME, optic disc drusen, CNVM, and the normal eye are used to train the models. We have built a custom model based on the VGG16 architecture but with one extra layer for added computational power. The other models, viz. Inception v3 and Xception [4, 8], having the same number of parameters, were imported directly from the Keras library. The last layer of all models makes use of the Softmax activation function. The entire dataset of 2.6 GB is then divided into test images and train images. All the models were trained with validation accuracy as the monitored parameter, and for each model, the weights giving the best validation accuracy were saved. The VGG16 model was trained with uninitialized parameters and with the Adamax optimizer, which gave better results than the RMSProp and Adam optimizers. The Inception v3
model was trained with ImageNet pretrained weights, while the weights of Xception were kept uninitialized. For both Inception v3 and Xception, training was done keeping the middle layers both trainable and non-trainable. This helped us study the dependence of the models on the middle layers. Since the dataset for two of the classes, viz. macular hole and central serous retinopathy [7], was just 104 images each, image augmentation was used to generate around 1000 more images of each class. Image augmentation helps in increasing the dataset by slightly modifying images, introducing lateral and longitudinal shifts, horizontal or vertical flips, rotation, and zoom. We made use of only those modifications that would not alter the image understanding. For localization, a square with a definite cross-sectional size is rolled over the pixels of the image, and at each position detection is done for the disease, with the image classifier, for that particular cross section. Keeping a threshold, all the squares with probability less than the threshold are discarded. From the remaining squares, the square that overlaps most of the other squares is kept, that is, the square which covers most of the other squares and can be considered a superset, with the help of non-maximum suppression.
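A sketch of the sliding-window localization just described follows; the window size, stride, and probability threshold are illustrative placeholders, and classifier stands for any function returning the disease probability of a patch:

import numpy as np

def sliding_window_detect(image, classifier, win=64, stride=16, threshold=0.8):
    # Score every window with the image classifier; keep confident boxes.
    boxes, scores = [], []
    h, w = image.shape[:2]
    for y in range(0, h - win + 1, stride):
        for x in range(0, w - win + 1, stride):
            p = classifier(image[y:y + win, x:x + win])
            if p >= threshold:
                boxes.append((x, y, x + win, y + win))
                scores.append(p)
    # Overlapping boxes are then merged with non-maximum suppression (Sect. 3.2).
    return np.array(boxes), np.array(scores)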
3.2 Algorithm

VGG16 is a convolutional neural network (CNN) architecture that focuses on convolution layers with 3 × 3 filters with a stride of one, always using the same padding, and max pooling layers with 2 × 2 filters with a stride of two. It has an arrangement of convolution and max pool layers followed by Softmax for output. Inception v3 [9] is a convolutional neural network used for assisting image analysis and object detection; with the help of its pretrained network, it classifies images into various object categories. This network has an image input size of 299 × 299, although we changed it to 150 × 150, along with some layers that are used for label smoothing and factorized 7 × 7 convolutions, with a Softmax layer to classify 6 classes. The Xception model is an extension of the Inception architecture that replaces the standard Inception modules with depthwise separable convolutions. Xception offers an architecture made of depthwise separable convolution blocks plus max pooling for feature extraction from the set of images [10]. Non-maximum suppression is used in the object detection pipeline for filtering generated proposals. It selects the box with the highest objectness score and then compares the overlap (intersection over union, IoU) of this box with the other boxes. It discards boxes that are below a given probability bound, and with the remaining boxes, it repeatedly picks the box with the highest probability and discards any remaining box whose IoU with a box output in a previous iteration exceeds the threshold.
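A compact sketch of the greedy, IoU-based non-maximum suppression step described above (the (x1, y1, x2, y2) box format is an assumption):

import numpy as np

def non_max_suppression(boxes, scores, iou_threshold=0.5, min_score=0.5):
    # Keep the highest-scoring box, drop boxes that overlap it too much, repeat.
    keep = scores >= min_score                 # discard low-probability windows
    boxes, scores = boxes[keep], scores[keep]
    order = np.argsort(scores)[::-1]
    kept = []
    while order.size > 0:
        i = order[0]
        kept.append(i)
        xx1 = np.maximum(boxes[i, 0], boxes[order[1:], 0])
        yy1 = np.maximum(boxes[i, 1], boxes[order[1:], 1])
        xx2 = np.minimum(boxes[i, 2], boxes[order[1:], 2])
        yy2 = np.minimum(boxes[i, 3], boxes[order[1:], 3])
        inter = np.clip(xx2 - xx1, 0, None) * np.clip(yy2 - yy1, 0, None)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        areas = (boxes[order[1:], 2] - boxes[order[1:], 0]) * \
                (boxes[order[1:], 3] - boxes[order[1:], 1])
        iou = inter / (area_i + areas - inter)
        order = order[1:][iou <= iou_threshold]
    return kept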
Fig. 1 Data flow diagram
4 System Architecture

See Figs. 1 and 2.

Fig. 2 Flowchart
5 Disease Analysis

Modifications were made by adding layers and batch normalization, for better accuracy, while using the image classification model of the VGG architecture. Batch normalization is a technique by which the training input is standardized. This technique not only stabilizes the learning process but also helps in significantly reducing the number of epochs required for training. Categorical cross-entropy and Adamax optimization were found to provide the best accuracy for the VGG model after trying different loss functions and optimization functions. The accuracy achieved with the VGG architecture for the classification of 6 disease classes is 92%. A Web app was developed with the help of the Python Django framework. This framework was selected based on compatibility with the code of the neural network models, viz. VGG, Inception v3, and Xception. The Django framework provides the flexibility of developing Web applications and also manipulating data in storage along with the database. Since scaling and security features are also available, it
stood out as the best choice for building a Web-based application with intensive backend workloads. The Web application was developed with the aim of providing flawless and tangible access to the classifier models and their analysis. With its help, the doctor will be able to access the data disease-wise, test-wise, or directly by the patient's name (Fig. 3).

Fig. 3 Disease classes
6 Result Analysis

From the training done by changing certain parameters, the obtained results are listed in the table. We can see that the middle layers play a crucial role in decreasing the validation loss of a model, which increases the confidence level of the model. We can also say that by initializing the model with ImageNet weights, the number of iterations required to train the model decreases by a few epochs. This is because
Table 1 Comparative analysis between different classification models

Model | Size (Mb) | Middle layers trainable | Validation loss | Training accuracy | Test accuracy | Test loss
VGG16 | 236 | True | 0.7 | 0.92 | 98.6 | 0.15
Inception | 228 | False | 0.2 | 0.72 | 90.3 | 0.32
Xception | 864 | False | 0.3 | 0.77 | 90 | 0.3
Inception | 312 | True | 0.09 | 0.93 | 98.6 | 0.06
Xception | 944 | True | 0.08 | 0.94 | 99.3 | 0.028
the Inception model took fewer epochs as compared to the Xception model. Even though Inception v3 and Xception have the same number of parameters, Xception has a performance gain due to more efficient use of model parameters (Table 1; Figs. 4, 5, 6, 7, 8, 9, 10, 11, and 12).
Fig. 4 Loss and accuracy graphs for VGG model (test result: 98.599, loss: 0.15)
Fig. 5 Loss and accuracy graphs for Inception v3 model, layers trainable false (test result: 90.31, loss: 0.32)
Fig. 6 Loss and accuracy graphs for Inception v3 model, layers trainable true (test result: 98.601, loss: 0.062)
Fig. 7 Loss and accuracy graphs for Xception model, layers trainable false (test result: 90.01, loss: 0.3)
Fig. 8 Loss and accuracy graphs for Xception model, layers trainable true (test result: 99.301, loss: 0.028)
Fig. 9 Login page
Fig. 10 User interface
Fig. 11 Prediction by model
Fig. 12 Localization of disease
7 Conclusion

Three image classifier models of different architectures were trained, viz. the VGG architecture, the Inception v3 model, and Xception. A comparative analysis was made between the performances of each of them, and the Xception model was found to have the highest accuracy. An algorithm was implemented over the image classifier to localize the position of the disease on the image. In order to enable easy viewing of, as well as access to, the patient data and the image classifier's analysis and localization, a Web-based GUI was developed using the Python Django framework. The system is expected to provide rapid predictions and give some valuable insights into varied disease patterns.
References

1. Fujimoto, J.G., Pitris, C., Boppart, S.A., Brezinski, M.E.: Optical coherence tomography: an emerging technology for biomedical imaging and optical biopsy. Neoplasia 2(1–2), 9–25 (2000). https://doi.org/10.1038/sj.neo.7900071
2. Gu, H., Guo, Y., Gu, L., et al.: Deep learning for identifying corneal diseases from ocular surface slit-lamp photographs. Sci. Rep. 10, 17851 (2020). https://doi.org/10.1038/s41598-020-75027-3
3. Sarki, R., Ahmed, K., Wang, H., et al.: Image preprocessing in classification and identification of diabetic eye diseases. Data Sci. Eng. (2021). https://doi.org/10.1007/s41019-021-00167-z
4. Wang, C., Chen, D., Hao, L., Liu, X., Zeng, Y., Chen, J., Zhang, G.: Pulmonary image classification based on Inception V3 learning model (2019). https://doi.org/10.1109/ACCESS.2019.2946000
5. Liu, X., Kale, A.U., Capewell, N., Talbot, N., Ahmed, S., Keane, P.A., Mollan, S., Belli, A., Blanch, R.J., Veenith, T., Denniston, A.K.: Optical coherence tomography (OCT) in unconscious and systemically unwell patients using a mobile OCT device. BMJ Open (2019). https://doi.org/10.1136/bmjopen-2019-030882
6. Du, X.L., Li, W.B., Hu, B.J.: Application of artificial intelligence in ophthalmology. Int. J. Ophthalmol. 1555–1561 (2018). https://doi.org/10.18240/ijo.2018.09.21
7. Kermany, D., Zhang, K., Goldbaum, M.: Labeled Optical Coherence Tomography (OCT) and Chest X-Ray Images for Classification. Mendeley Data V2 (2018). https://doi.org/10.17632/rscbjbr9sj.2; Gholami, P., Roy, P., Parthasarathy, M.K., Lakshminarayanan, V.: OCTID: Optical Coherence Tomography Image Database. arXiv preprint arXiv:1812.07056 (2018)
8. Chollet, F.: Xception: Deep Learning with Depthwise Separable Convolutions (2017). https://doi.org/10.1109/CVPR.2017.195
9. Bankar, J., Gavai, N.R.: Convolutional Neural Network based Inception v3 Model for Animal Classification, pp. 141–146. ISSN (Online) 2278-1021 (2018)
10. Fatima, S.A., Kumar, A., Raoof, S.S.: Real Time Emotion Detection of Humans Using Mini-Xception Algorithm (2021). ISSN 1042 012027
A Quick Dynamic Attribute Subset Method for High Dimensional Data Using Correlation-Guided Cluster Analysis and Genetic Algorithm Nandipati Bhagya Lakshmi, Nagaraju Devarakonda, Zdzislaw Polkowski, and Anusha Papasani Abstract In data mining and machine learning, dimensionality reduction is crucial. Here, "dimensionality reduction" means reducing the number of features. In the current scenario, machine learning models find it difficult to deal with high dimensional data and high computational costs. As the number of features increases, the model becomes more complex and overfitting becomes more likely. When a machine learning model is trained on a large number of features, it becomes overly dependent on the data it was trained on, resulting in poor results on real data and defeating the goal. To overcome the above issues, this research suggests a new three-stage hybrid feature selection (FS) algorithm based on correlation-guided cluster analysis and a genetic algorithm. In the first two stages, a filter FS approach and a feature-clustering-based method with low computational cost are built to make the third stage's search space smaller. The third stage identifies an optimal feature subset using an evolutionary algorithm with global search ability. The three phases are built, respectively, on an improved symmetric uncertainty based removal process, a fast correlation-based clustering method, and a genetic algorithm. Finally, the proposed algorithm is compared with several FS algorithms on a variety of freely accessible real-world datasets. According to the experimental findings, the suggested algorithm can obtain an effective feature subset at the lowest computational cost. Keywords Cluster · Feature selection (FS) · Hybrid search · Genetic algorithm
N. Bhagya Lakshmi (B) · N. Devarakonda · A. Papasani School of Computer Science & Engineering, VIT-AP University, Amaravati, AP, India e-mail: [email protected] Z. Polkowski Technical Sciences, Jan Wyzykowski University, Polkowice, Poland e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 S. Roy et al. (eds.), Innovations in Computational Intelligence and Computer Vision, Advances in Intelligent Systems and Computing 1424, https://doi.org/10.1007/978-981-19-0475-2_35
1 Introduction

Dimensionality reduction is applied in two ways: feature selection and feature extraction. Liu et al. suggest that attribute selection [1] is the task of choosing a subset of relevant features from a dataset while ignoring unnecessary features in order to construct a high-accuracy model. In other words, it is a way of selecting the best features from the input dataset. There are three types of FS algorithms currently available: filter, wrapper, and embedded methods.

• Wrapper methods calculate the utility of a subset of features by actually training a model on it, while filter methods measure the importance of features by their association with the dependent variable.
• Filter methods are much faster than wrapper methods because they do not require models to be trained. Wrapper approaches, by contrast, are computationally inefficient.
• In certain cases, filter methods will fail to discover the best subset of features, while wrapper methods will still find it.
• When a subset of features from the wrapper methods is used as opposed to a subset of features from the filter methods, the model is more vulnerable to overfitting (Fig. 1).

Fig. 1 a Filter method. b Wrapper method. c Embedded method

There are many search strategies to find the optimal feature subset, such as sequential forward selection, sequential backward elimination, and recursive feature elimination. The majority of them have flaws, including local convergence and/or excessive computing costs. Evolutionary computation (EC) has recently been applied to FS problems to increase the global search abilities of an FS algorithm. The genetic algorithm is an advanced algorithm for feature selection. A genetic algorithm operates on a population of individuals to produce better and better approximations [2]. Genetic algorithms have many capabilities to improve accuracy and perform better than conventional feature selection techniques. However, since these methods have to test the classification performance of numerous feature subsets repeatedly, they are also computationally expensive, limiting their use on high dimensional data. Existing hybrid methods can be split
into two groups: filter-wrapper hybrids and filter-clustering hybrids. Most of these approaches use a two-stage hybrid strategy. The first phase involves removing redundant features, and the second phase involves identifying the best feature subset from the reduced feature space. Current hybrid algorithms still face the following problems. When dealing with high dimensional data, the amount of information retained remains limited. It is difficult to define a threshold that indicates whether or not a feature is important, so some valid features are often discarded along with invalid ones. The computational cost of feature-clustering-based methods is still high. Considering the above, this article describes a three-phase feature selection technique for reducing dimensionality without losing information. First, irrelevant features are eliminated; in the second phase, clustering is applied to group the relevant features. From a limited number of feature clusters, the third phase quickly selects representative features from the dataset. These are the article's key contributions:

• Compared with existing hybrid attribute selection methods such as filter-wrapper and filter-cluster, the suggested three-phase feature selection approach finds better representative features from such feature clusters.
• Fast correlation-driven feature clustering (FCFC) compares feature similarities to known cluster centers. It not only reduces the computational expense of clustering features, but also eliminates the need to determine the number of clusters ahead of time.
• In the third phase, a genetic algorithm is used to compute the accuracy and compare it with other algorithms, providing a better solution for the model.

The organization of this paper is as follows: Sect. 1 is the introduction; Sect. 2 describes related work on feature selection methods; Sect. 3 describes the proposed method; Sect. 4 gives the experimental results; and finally, Sect. 5 presents the conclusion and future enhancements of the work.
2 Related Work

2.1 Issues with Feature Selection

Feature selection is a preprocessing stage in machine learning that selects a subset of the original features based on a collection of criteria. It is useful for eliminating or reducing the influence of unnecessary data, removing redundant data, reducing dimensionality [3], increasing learning precision, and enhancing the comprehensibility of results. Assuming that a dataset Data contains D features and L cases, and that its initial feature set is F, an FS problem in the context of classification can be interpreted as follows: selecting as few features as possible to maximize a specified output index H(·). The following is the expression:
Max H(X)  (1)
s.t. X = (x1, x2, . . ., xD)  (2)
xi ∈ {0, 1}, i = 1, 2, . . ., D  (3)
2.2 Genetic Algorithm

Genetic algorithms (GAs) are adaptive heuristic search algorithms and a class of evolutionary algorithms. They are built on the principles of natural selection and genetics [4]. They are intelligent random search procedures that use historical information to direct the search toward the regions of the solution space that produce better results. They are mostly used to find superior solutions to optimization and search problems. The entire algorithm can be summed up as follows:

1. Initialize a population p at random
2. Determine the fitness of the population
3. Repeat until convergence:
• Choose parents from the population
• Crossover to create a new population
• Mutate the new population
• Calculate the fitness of the new population
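A minimal sketch of these steps applied to feature selection, under stated assumptions (binary chromosomes over D features; cross-validated KNN accuracy as an illustrative fitness function; all parameter values are placeholders):

import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

def fitness(mask, X, y):
    # Fitness of a chromosome = accuracy of a classifier on the selected features.
    if mask.sum() == 0:
        return 0.0
    return cross_val_score(KNeighborsClassifier(),
                           X[:, mask.astype(bool)], y, cv=3).mean()

def genetic_feature_selection(X, y, pop_size=20, generations=30, p_mut=0.02):
    D = X.shape[1]
    pop = np.random.randint(0, 2, size=(pop_size, D))
    for _ in range(generations):
        scores = np.array([fitness(ind, X, y) for ind in pop])
        parents = pop[np.argsort(scores)[::-1][:pop_size // 2]]    # selection
        children = []
        while len(children) < pop_size:
            a, b = parents[np.random.randint(len(parents), size=2)]
            cut = np.random.randint(1, D)                          # crossover
            child = np.concatenate([a[:cut], b[cut:]])
            flip = np.random.rand(D) < p_mut                       # mutation
            child[flip] = 1 - child[flip]
            children.append(child)
        pop = np.array(children)
    scores = np.array([fitness(ind, X, y) for ind in pop])
    return pop[scores.argmax()], scores.max()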
Why is the genetic algorithm used in this paper rather than other algorithms? Feature selection is the main aspect of this paper, and we have to eliminate irrelevant features to find the best accuracy across all datasets. Previously, we used particle swarm optimization; now we implement a genetic algorithm to demonstrate better accuracy compared with previous algorithms [5, 6]. The principle behind utilizing genetic algorithms to solve optimization problems is that they start with a population of individuals, each of whom represents a possible solution to the problem at hand. Compared with other algorithms, the genetic algorithm finds good solutions for difficult problems, using different operators to generate new solutions for complex problems. The most common use is to select the most relevant characteristics (from the provided dataset) for constructing the classifier. A traditional algorithm is an unambiguous specification that describes how to solve a problem, whereas a genetic algorithm is an algorithm for solving both constrained and unconstrained optimization problems that is based on genetics and natural selection. Feature selection is a central task in machine learning and also reduces dimensionality. Genetic algorithms are used in many fields such as biometrics, face recognition,
etc. Here, we implement a three-phase method to find better accuracy, comparing many algorithms using different datasets.
2.3 Approaches of Feature Selection that Are Still in Use

2.3.1 Selection of Evolutionary Features

Many evolutionary algorithms have been applied to feature selection problems, including ant colony optimization and PSO algorithms. The method of selecting the most appropriate inputs for a model is known as attribute selection or input selection. These methods can be used to identify and exclude unneeded, irrelevant, and redundant features that do not add to, or even reduce, the predictive model's accuracy. The genetic algorithm is one of the sophisticated feature selection algorithms. It is a stochastic function optimization approach based on natural genetics and the dynamics of biological evolution. In nature, species' chromosomes tend to change over generations to improve their ability to adapt to their surroundings. The genetic algorithm is a heuristic optimization method based on the procedures of natural evolution. Genetic algorithms operate on a population of individuals to improve approximations over time. The genetic algorithm [7] uses a fitness score and an objective function to guide the search toward the best solution. It applies operators such as selection, crossover, and mutation to compute fitness values over the population.
2.3.2 Feature Selection Based on Clustering

Clustering technology [8] has been used to address FS problems. A two-phase clustering-based FS algorithm was suggested by Song et al.: the first phase clustered all features using a minimum spanning tree, and the second selected the most representative feature from each of the clusters to create an attribute subset. Liu et al. recommended a clustering method based on the minimum spanning tree, in which the importance and redundancy of features are assessed using an information-variance metric. Then, based on the correlation between features, a minimum spanning tree is formed, and the long edges are removed to form clusters. Then, from each cluster, the feature with the greatest importance is chosen. When working with high-dimensional data, the two algorithms above have a high computational cost because of the need to quantify the correlations of all features in advance in order to form a complete graph. Density peak clustering was proposed by Jie et al. [9], in which the information gap between features was used as a clustering indicator. Chatterjee et al. [10] recommended a filter method based on k-means. The k-means clustering algorithm is used to identify valid features in the training dataset by constructing clusters, and a correlation measure is used to find redundant features within clusters. The features were clustered using
k-means, and then the features in each cluster were sorted according to their value. Finally, a certain percentage of features from every cluster is chosen. The algorithm is capable of effectively clustering features. However, in order to determine the proper value of k, it must run k-means several times. As we know, k-means has a high computational cost on high dimensional data. Most importantly, in all the methods described above, each attribute cluster selects its own representative feature without considering the combined contribution of these features.
2.3.3 Hybrid Feature Selection

Until now, many hybrid feature selection techniques have been introduced, such as combinations of filter and wrapper methods and genetic algorithm based approaches to reduce dimensionality. These methods can exclude a large number of unnecessary features before executing wrapper FS methods, but when dealing with data that has a large number of relevant features, they still face the "curse of dimensionality." Here, an unsupervised clustering method called k-means is used [11]. Following the k-means clustering of all features, a proportion of features from each cluster is selected to form the final feature subset. This sort of hybrid method does reduce the feature search space to a degree. However, such methods ignore the dynamic coupling relationship between different features by merely selecting the top element from every cluster to form the final attribute subset.
3 Proposed Algorithm

3.1 Framework in General and Key Definitions

The primary challenges of evolutionary computation are dimensionality reduction and high computational cost. This article proposes a three-phase hybrid selection framework to address these challenges. The three steps of the framework are: (1) deleting unnecessary features, (2) clustering relevant features, and (3) selecting representative features. The first phase removes irrelevant features, and the second phase groups all related features according to their similarities. After that, the third phase selects representative features using a genetic algorithm. Because the number of feature clusters is usually much smaller than the number of original features, both the search space and the evaluation cost of the genetic algorithm are significantly reduced in the third step [12]. Moreover, since features from different clusters are largely independent of one another to some degree, selecting features from different clusters speeds up the search for the best feature subset. The first two phases still face some challenges: how irrelevance between a feature and the class labels can be tested, and how similar features can be placed in the same cluster quickly and accurately [13]. Before clustering, most current clustering methods require calculating the similarity between all features. To obtain the similarity between all features, they must perform a correlation analysis (D − 1)! times for data with D features, for instance. As the number of features grows, the computational complexity grows exponentially (Fig. 2).
401
having a few challenges related how inappropriate data may be tested between applicable and classification records and how we are able to divide the same functions into the equal cluster fast and appropriately [13]. Until clustering, most modern-day clustering methods allow calculating the same stages among all functions. To gain identical ranges among all features, they must do a correlation study (D – 1)! Times, using statistics with D capabilities for instance. If the scale of the capabilities turns into larger, the computing complexity grows exponentially (Fig. 2).
Fig. 2 Structure of the proposed algorithm
3.2 Key Definitions

In many feature selection problems, symmetric uncertainty (SU) is used as the measure of similarity between either two attributes or a feature and the target concept. SU underlies a number of existing measures, such as C-relevance, S-relevance, and F-correlation, which are the measures used in the proposed algorithm. Using clustering methods, we can group the data after performing a redundancy check and then select representative features to form the final subset. All features from the different datasets are compared, irrelevant data is removed based on weak correlation, and a threshold is then determined from the remaining features for the number of clusters. In the third phase, we select some random features from different clusters and apply the genetic algorithm, computing an objective function for high classification performance to generate the best fitness value. The symmetric uncertainty (SU) is given as

SU(X, Y) = 2 × Gain(X|Y) / (H(X) + H(Y))  (4)
where H(X) is the entropy of a discrete random variable X. Suppose p(x) denotes the prior probability of each value x of X; then H(X) is defined by

H(X) = − Σx p(x) log2 p(x)  (5)
Gain(X|Y) is the amount by which the entropy of X decreases given Y. It reflects the additional information about X provided by Y and is called the information gain, which is given by

Gain(X|Y) = H(X) − H(X|Y)  (6)
C-Relevance: The similarity between an attribute fi ∈ F and the class labels C, that is, the C-relevance between fi and C, is calculated by

SU(fi, C), i = 1, 2, . . ., D  (7)
S-Relevance: For an attribute fi ∈ F and the class labels C, if the value of SU(fi, C) is greater than a predetermined threshold ρ0, we say that fi is a strongly relevant feature with respect to C.

F-Correlation: The correlation between two attributes fi and fj (fi ≠ fj), called the F-correlation, is denoted by SU(fi, fj). In order to prevent removing relevant features by setting a high threshold ρ0, we set ρ0 as follows:

ρ0 = min(0.1 × SUmax, SU[D/log D]-th)  (8)

where SUmax is the largest C-relevance value and SU[D/log D]-th is the [D/log D]-th largest C-relevance value.
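These definitions translate directly into code; the following is a hedged sketch for discrete features (entropies in bits; the function names are ours, not from the paper):

import numpy as np
from collections import Counter

def entropy(x):                                   # Eq. (5)
    p = np.array(list(Counter(x).values()), dtype=float)
    p /= p.sum()
    return -(p * np.log2(p)).sum()

def information_gain(x, y):                       # Eq. (6)
    cond = sum(np.mean(y == v) * entropy(x[y == v]) for v in np.unique(y))
    return entropy(x) - cond

def symmetric_uncertainty(x, y):                  # Eq. (4)
    return 2.0 * information_gain(x, y) / (entropy(x) + entropy(y))

def threshold_rho0(su_values, D):                 # Eq. (8)
    k = max(1, int(D / np.log(D)))                # the [D / log D]-th largest SU
    kth = np.sort(su_values)[::-1][min(k, len(su_values)) - 1]
    return min(0.1 * su_values.max(), kth)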
Table 1 Detailed information about 3 datasets

Datasets | Number of features | Number of samples | Number of classes
Prostate | 339 | 102 | 2
lung | 12,600 | 203 | 2
scadi | 7128 | 72 | 2
4 Experimental Analysis

4.1 Datasets

In this article, datasets have been taken from different sources, such as Kaggle and the UCI machine learning repository. The table shows the general information of the datasets. The datasets used are Prostate, lung, and scadi (Table 1).
4.2 Performance of Classifiers for Calculating the Correlation

In this step, we take the 3 datasets and combine them to calculate correlations and remove irrelevant data. Based on the correlations across the three datasets, clusters are formed. In this process, correlation is computed on the training set only, using the Pearson correlation to remove irrelevant data. The correlation values in the training dataset are compared against a threshold: when the correlation between two columns is greater than the threshold, the redundant column is recorded in a set, so that duplicate entries do not remain among the columns (Fig. 3). A code sketch of this pruning follows.
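A minimal pandas sketch of this correlation-based pruning (the 0.9 threshold is a placeholder, not a value from the paper):

import pandas as pd

def drop_correlated(train_df: pd.DataFrame, threshold: float = 0.9):
    # Record each column whose |Pearson correlation| with an earlier
    # column exceeds the threshold, then drop it once (no duplicates).
    corr = train_df.corr(method="pearson").abs()
    cols, to_drop = corr.columns, set()
    for i in range(len(cols)):
        for j in range(i + 1, len(cols)):
            if corr.iloc[i, j] > threshold:
                to_drop.add(cols[j])
    return train_df.drop(columns=sorted(to_drop))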
4.3 Performance of Classifiers with Grouping of Clusters

In this step, after computing the correlations, we group the data into clusters by means of the k-means algorithm, as sketched below. The table shows detailed information about the grouping of the clusters (Fig. 4; Table 2).
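Grouping features rather than samples with k-means amounts to clustering the transposed data matrix; a brief sketch (scikit-learn assumed; the value of k is a placeholder):

from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

def cluster_features(X, k=3, seed=0):
    # X: (n_samples, n_features). Returns one cluster label per feature.
    feats = StandardScaler().fit_transform(X).T        # one row per feature
    return KMeans(n_clusters=k, random_state=seed, n_init=10).fit_predict(feats)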
Fig. 3 Calculating the performance of correlation using 3 datasets
Fig. 4 Graphical representation of forming of clusters
Table 2 Detailed information about grouping of data into clusters

Final cluster centroids:

Attribute | Full Data (59) | Cluster 0 (29) | Cluster 1 (20) | Cluster 2 (6) | Cluster 3 (4)
Name | Katharine | Katharine | John | Yul | Ray
Surname | hepburn | hepburn | smith | brynner | milland
Age | 42.6271 | 34.4483 | 54.15 | 42.5 | 44.5
Smokes | 15.0678 | 12.9655 | 17.7 | 16.3333 | 15.25
AreaQ | 5.2034 | 6.8276 | 3.45 | 3.1667 | 5.25
Alkhol | 3.2373 | 1.6207 | 5.4 | 4.1667 | 2.75
Result | 0.4746 | 0.069 | 1 | 0.6667 | 0.5
diagnosis result | M | M | M | M | M
Radius | 16.7458 | 17.3448 | 16.2 | 14.3333 | 18.75
Texture | 17.9831 | 19.2069 | 15.5 | 18.1667 | 21.25
Perimeter | 100.5763 | 99.4483 | 106.15 | 101.3333 | 79.75
Area | 745.4915 | 734.3448 | 820.15 | 741.5 | 459
Compactness | 0.133 | 0.1348 | 0.1216 | 0.1432 | 0.161
Symmetry | 0.1956 | 0.1986 | 0.1882 | 0.1957 | 0.21
Gender | 0.4068 | 0.4483 | 0.25 | 0.5 | 0.75
d 5100-0 | 0.0847 | 0 | 0.05 | 0 | 1
d 5100-1 | 0.1525 | 0.1034 | 0 | 1 | 0
4.4 Calculating the Accuracy of Each Individual Dataset Using the Genetic Algorithm with All Features

In the table below, we have the 3 datasets lung, scadi, and prostate. Using the genetic algorithm, we calculate the accuracy of each of the 3 datasets (Table 3).
4.5 Calculating the Accuracy of All Features of the 3 Datasets Using the Genetic Algorithm

In this step, we implemented two prediction models, as shown in the figure, using the training datasets, namely K-nearest neighbors (KNN) and support vector machine (SVM), along with one optimization algorithm, the genetic algorithm. In the table below, we compare the GA accuracy with classification algorithms such as KNN and SVM; the accuracies do not differ much, but the running times show clear variation without
Table 3 Calculating the accuracy of individual datasets

Dataset | Gen | Nevals | Avg | Min | Max | Accuracy
Lung | 0 | 10 | 0.854545 | 0.660606 | 0.95 | 96.6
Lung | 1 | 6 | 0.931364 | 0.915152 | 0.95 |
Lung | 2 | 4 | 0.935 | 0.933333 | 0.95 |
Scadi | 0 | 10 | 0.817143 | 0.742857 | 0.842857 | 78.5
Scadi | 1 | 3 | 0.842857 | 0.842857 | 0.842857 |
Scadi | 2 | 6 | 0.835714 | 0.828571 | 0.842857 |
Prostate | 0 | 10 | 0.084 | 0.05 | 0.12 | 10.99
Prostate | 1 | 8 | 0.091 | 0 | 0.12 |
Prostate | 2 | 4 | 0.11 | 0.08 | 0.12 |
Table 4 Ac and t values generated by KNN, SVM and GA

Algorithms | Dataset | Accuracy | Running time
KNN | Prostate | 92.24 | 5.678
KNN | Lung | 88.20 | 2588.515
KNN | Scadi | 100 | 24.274
SVM | Prostate | 95.74 | 4.847
SVM | Lung | 95.34 | 1567.898
SVM | Scadi | 100 | 52.157
GA | Combination | 96.6 | 2.042
GA | Selected Features | 71.18 | 0.01
attribute selection. With attribute selection, the accuracy is lower compared to the classification algorithms (Table 4).
4.6 Algorithms and Parameter Settings for Comparison

As comparative algorithms, we chose one filter algorithm, four evolutionary FS algorithms, one clustering-based FS algorithm, and three hybrid PSO FS algorithms. In addition, we compare the proposed genetic algorithm against the full feature set. The comparative FS algorithms, chosen with the intention of evaluating the performance of the genetic algorithm, are the following:

• ReliefF FS (ReliefF)
• Binary PSO FS (BPSO)
• Binary bare-bones PSO FS (BBPSO)
• Return-cost-based binary firefly FS (Rc-BBFA)
• Self-adaptive PSO for large-scale FS (SaPSO)
Table 5 Comparing the accuracy of all algorithms with GA

Datasets | Full set | ReliefF | BPSO | BBPSO | Rc-BBFA | SaPSO | GA
Prostate | 66.83 | 78.75 | 93.02 | 95.98 | 95.96 | 88.22 | 11
Lung | 68.81 | 48.28 | 53.28 | 60.55 | 55.36 | 78.93 | 96.06
Scadi | 80.29 | 83.44 | 87.29 | 88.26 | 88.43 | 83.45 | 97
The algorithms above are compared with the genetic algorithm in terms of accuracy, as shown in the table above. On the datasets other than prostate, i.e., lung and scadi, the GA achieves the best accuracy among all the compared algorithms (Table 5).
5 Conclusion

The aim of this paper was to find a solution to FS problems with high dimensionality and/or computational complexity, which is always a challenge. The "dimensionality curse" is a significant problem that is difficult to solve even with today's low-cost computers. This paper proposes a new hybrid feature selection algorithm: a three-phase hybrid architecture that successfully integrates the benefits of three different FS architectures. With the help of the first two stages, the genetic algorithm's search space in the third step is clearly reduced, dramatically improving the search efficiency. In the second step, FCFC compares the similarity of each attribute to the known cluster centers, which significantly decreases the cost of measuring feature similarity during clustering. Furthermore, the correlation-guided initialization and the difference-based adaptive disturbance make the proposed genetic algorithm more effective at resolving FS issues in the third stage. The algorithm was compared with several algorithms on publicly available real-world datasets, including ReliefF, BPSO, Binary BPSO, Rc-BBFA, and HPSO-SSM. Experimental results revealed that, with relatively short run times on most datasets, the proposed algorithm can achieve the highest classification accuracy, showing that it is highly competitive for solving high-dimensional FS problems. Despite the positive findings, the work remains limited. First, feature clustering plays a major role in the hybrid feature selection process; more advanced clustering strategies that do not require setting a threshold manually need to be designed, and dynamically changing the threshold depending on the algorithm's input could be a viable way forward. Second, the suggested algorithm also faces high computing costs for data with a large number of samples; there is a need to address potential reductions in measurement cost through emerging approaches, such as a model built from representative samples. The suggested architecture can also be used to deal with multi-objective or multi-label FS issues, and specific algorithms need to be explored further.
References 1. Liu, H., Motoda, H., Yu, L.: Selective sampling approach to active feature selection. Artif. Intell. 159(1–2), 49–72 (2004) 2. Das, A.K., Das, S., Ghosh, A.: Ensemble feature selection using bi-objective genetic algorithm. Knowl. Based Syst. 123, 116–127 (2017) 3. Xu, J., Tang, B., He, H., Man, H.: Semisupervised feature selection based on relevance and redundancy criteria. IEEE Trans. Neural Netw. Learn. Syst. 28(9), 1974–1984 (2017) 4. Ghareb, A.S., Bakar, A.A., Hamdan, A.R.: Hybrid feature selection based on enhanced genetic algorithm for text categorization. Expert Syst. Appl. 49, 31–47 (2016) 5. Song, X.-F., Zhang, Y., Gong, D.-W., Sun, X.-Y.: Feature selection using bare-bones particle swarm optimization with mutual information. Pattern Recognit. 112 Art. no. 107804 (2021) 6. Amoozegar, M., Minaei-Bidgoli, B.: Optimizing multi-objective PSO based feature selection method using a feature elitism mechanism. Expert Syst. Appl. 113, 499–514 (2018) 7. Song, Q., Ni, J., Wang, G.: A fast clustering-based feature sub-set selection algorithm for high-dimensional data. IEEE Trans. Knowl. Data Eng. 25(1), 1–14 (2013) 8. Batchanaboyina, M.R., Devarakonda, N.: An effective approach for selecting cluster centroids for the k-means algorithm using IABC approach. In: 2019 IEEE 18th International Conference on Cognitive … (2019) 9. Ghosh, C.M., Singh, P.K., Sarkar, R., Nasipuri, M.: Clustering-based feature selection framework for handwritten indic script classification. Expert Syst. 36(6) Art. no. e12459 (2019) 10. Liu, Q., Zhang, J., Xiao, J., Zhu, H., Zhao, Q.: A supervised feature selection algorithm through minimum spanning tree clustering. In: Proc. IEEE 26th Int. Conf. Tools Artif. Intell., pp. 264– 271 (Nov. 2014) 11. Batchanaboyina, M.R., Devarakonda, N.: Efficient outlier detection for high dimensional data using improved monarch butterfly optimization and mutual nearest neighbors algorithm: IMBO-MNN. Int. J. Intell. Eng. (2020) 12. Saidi, R., Bouaguel, W., Essoussi, N.: Hybrid Feature Selection Method Based on the Genetic Algorithm and Pearson Correlation Coefficient (December 2018) 13. Li, Z., Jing, Z., Fang, W., Xia, L., Bin, A.: A genetic algorithm based wrapper feature selection method for classification of hyper spectral data using support vector machine. Geogr. Res. (2008) (en.cnki.com.cn)
Copy-Move Forgery Detection Using BEBLID Features and DCT Ganga S. Nair, C. Gitanjali Nambiar, Nayana Rajith, Krishna Nanda, and Jyothisha J. Nair
Abstract Images can act as a source of information, but with the advancement of editing tools anyone can tamper with an image effortlessly. The authenticity of the data we receive is very important, and hence verifying it becomes an essential procedure. A commonly considered image manipulation is to conceal undesirable objects or replace a region with another set of pixels copied from another region of the same image. Existing systems use block-based or keypoint techniques or a combination of both. In block-based algorithms, images are divided into regular blocks, and matches between every pair of blocks of the image are found. In the keypoint-based method, the interest points in an image are obtained, and matches between those points are found. Although the keypoint-based method is computationally efficient, it shows less accuracy, whereas the block-based method is computationally expensive. We propose a method combining the keypoint-based and block-based methodologies, in which we split the image into non-overlapping blocks, find the similarity between these blocks using the keypoint-based feature extractor BEBLID, and then apply the block-based technique discrete cosine transformation (DCT) to suspected blocks to detect forgery in copy-move forged images. We have also compared the performance of the boosted efficient binary local image descriptor (BEBLID) with the scale-invariant feature transform (SIFT), another popular keypoint-based feature extractor. Keywords Copy-move forgery · Block based · DCT · Keypoint based · BEBLID · SIFT
G. S. Nair · C. Gitanjali Nambiar · N. Rajith · K. Nanda · J. J. Nair (B) Amrita Vishwa Vidyapeetham, Amritapuri 690525, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 S. Roy et al. (eds.), Innovations in Computational Intelligence and Computer Vision, Advances in Intelligent Systems and Computing 1424, https://doi.org/10.1007/978-981-19-0475-2_36
1 Introduction
Technology has become indispensable in day-to-day life, which has led most institutions to rely upon digital multimedia. Over the past few years, more and more people have been depending on digital media in their day-to-day life, so it is important to determine its authenticity. It is crucial for an image to be authentic, as a forged image can manipulate people's opinions or change the result of an investigation. With the wide range of photo-editing applications easily accessible, tampering has become an easy task, which in turn makes detection challenging. Copy-move is one of the most common image tampering techniques, in which a set of pixels from one area of the image is copy-pasted to some other area of the same image, either to highlight the block or to suppress some unwanted area in the image. A block-based algorithm such as DCT cannot detect copy-move when the copied area is geometrically transformed, i.e., rotated or scaled. A keypoint-based extractor like BEBLID is able to detect forgery even when the image is geometrically transformed. In this paper, we aim to bring out the best of both methods and also to compare the performance of two different keypoint-based methods for detecting copy-move forgery. We propose a novel algorithm combining the block-based methodology DCT with the boosted efficient binary local image descriptor (BEBLID), a recent keypoint-based feature extractor introduced in 2020 [1]. For the purpose of comparing the performance of keypoint extractors, we have also included another keypoint-based method, the scale-invariant feature transform (SIFT), presented by D. Lowe in 2004.
2 Related Works
Detection of forgery remains an active field of research. Existing studies on copy-move forgery detection usually use block-based feature extractors, keypoint-based feature extractors, forensic similarity graphs and similar tools, along with community detection algorithms, JPEG ghosts, DCT and CNNs to localize the forged areas (Table 1). The block-based methods usually divide the image into overlapping or non-overlapping blocks, after which features are extracted and matched. DCT is one of the most common block-based methods used in forgery detection [2], but it is inefficient on distorted images. Another approach uses keypoint feature extractors, where keypoints are extracted and matched; this approach shows less accuracy but is computationally efficient [3]. The block-based and keypoint-based techniques have also been combined to utilize the best of both methods [4, 5]. Yet another approach to exposing fake images uses forensic similarity graphs and localizes forged areas through community detection algorithms [6].
Table 1 Comparison of some major related works

Algorithm name                                 Computational complexity   Accuracy
Proposed algorithm                             Intermediate               High
Block-based algorithm [2]                      High                       Low
Keypoint-based algorithm [15]                  Low                        High
Block-based keypoint matching algorithm [4]    Intermediate               High
Another existing approach finds the JPEG ghosts in an image; however, its detection accuracy degrades with smaller quality differences in the forged areas or if any kind of preprocessing was applied to the images [7]. Another approach applies DCT and a CNN, where the computed DCT coefficients are used as input to the CNN; this approach has a very high complexity [8]. A further approach combines an existing block-based feature extraction technique with the FFT, but including the FFT transformation in the block-based methodology increases the complexity of the method [9]. In [10], instead of blocks, the interest points are extracted and modeled as connected triangles; this method cannot be used if interest points cannot be detected. In [11], a computer vision approach is explained that forms a feature set to identify a plant and its medicinal values. In [12], an in-depth analysis is done of the issues in an automated control-point selection process using feature extractors like SURF, Canny edge pixels, Min-Eigen and Harris Corner. In [17], an approach is implemented to detect color images by extracting unique features using the SURF extractor. In [13], a system is implemented that predicts the coordinates of malignant pulmonary nodules from CT scans using deep learning.
3 Proposed Methodology
The methodology proposed in this paper combines a newly introduced keypoint feature extractor, BEBLID, with the block-based methodology discrete cosine transformation (DCT). BEBLID is a fast keypoint descriptor, as it uses ORB for feature extraction, and ORB is considered the fastest feature extractor. DCT is also included in the algorithm to detect forgery in images which contain noise, as block-based algorithms show better results than keypoint-based algorithms in such cases (Fig. 1). At first, the collected images from the dataset are converted into blocks or patches. Keypoints are computed and described using BEBLID; keypoints are areas of interest in an image, such as corners. With the obtained feature vectors, each pair of blocks is compared and a similarity score is obtained. A threshold (Threshold1) is set, and if any block pair has a similarity score greater than this threshold, the image is classified as a forged image. But there are situations where we cannot be sure of the forgery, since a similarity score obtained by keypoint matching can also
be due to repetitive areas in an image. To avoid such false positives, we set a base threshold; for any block pair having a similarity score between the base threshold and Threshold1, DCT is applied and a score is obtained; this score is checked against the set threshold, and the image is classified accordingly. In addition, we have also included the widely used, efficient keypoint-based feature extractor SIFT in the algorithm to compare the performance of the keypoint extractors. The keypoint matching part of the algorithm was tried and tested with SIFT, and DCT was then also performed to avoid false positives in this case as well. For the block-based approach, the matched block pair images obtained after keypoint matching are split into 8 × 8 pixel blocks. For each block pair, DCT is applied and the coefficients are rearranged into a feature vector using zigzag scanning. These vectors are then lexicographically sorted, and the similarity between blocks is calculated using the Euclidean distance. Two thresholds are set: one for the similarity of the feature vectors and the other to make sure that overlapping blocks are not counted among the similar blocks. If the number of similar blocks is greater than a particular threshold, the image is classified as forged. The full procedure is summarized in Algorithm 1, and a code sketch follows below.

Algorithm 1: Proposed Algorithm
Result: Outputs whether the given image is forged or not.
for Images do
    Segment the image into non-overlapping blocks;
end
while Blockpairs do
    Calculate the SimilarityScore using the feature extraction algorithm (BEBLID);
    if SimilarityScore > Threshold1 then
        "Image is forged";
        break;
    end
    if SimilarityScore > BaseThreshold and SimilarityScore < Threshold1 then
        Store the matched block pair;
    end
end
for Matched block pairs do
    Calculate DCT;
    Find the similarity score;
    if SimilarityScore > Threshold then
        "Image is forged"
    else
        "Image is not forged"
    end
end
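To make the two-stage check concrete, the sketch below pairs ORB keypoint detection with BEBLID description (available in opencv-contrib-python as cv2.xfeatures2d.BEBLID_create) and scores a block pair by its ratio-test match count. The threshold values, the 0.75 ratio and the DCT helper shown here are illustrative assumptions, not the authors' tuned settings.

```python
# Sketch of the two-stage check for one block pair; thresholds are assumed values.
# Requires: pip install opencv-contrib-python numpy scipy
import cv2
import numpy as np
from scipy.fftpack import dct

BASE_THRESHOLD, THRESHOLD1 = 10, 40          # illustrative match-count thresholds

detector = cv2.ORB_create()                  # BEBLID describes ORB keypoints
descriptor = cv2.xfeatures2d.BEBLID_create(0.75)
matcher = cv2.BFMatcher(cv2.NORM_HAMMING)    # BEBLID is a binary descriptor

def similarity_score(block_a, block_b):
    """Count ratio-test matches between the BEBLID descriptors of two blocks."""
    _, des_a = descriptor.compute(block_a, detector.detect(block_a, None))
    _, des_b = descriptor.compute(block_b, detector.detect(block_b, None))
    if des_a is None or des_b is None or len(des_a) < 2 or len(des_b) < 2:
        return 0
    matches = matcher.knnMatch(des_a, des_b, k=2)
    return sum(1 for p in matches
               if len(p) == 2 and p[0].distance < 0.75 * p[1].distance)

def dct_zigzag_features(block):
    """8x8 DCT of every sub-block, each rearranged into a vector by zigzag scanning."""
    zigzag = sorted(((i, j) for i in range(8) for j in range(8)),
                    key=lambda p: (p[0] + p[1],
                                   -p[0] if (p[0] + p[1]) % 2 == 0 else p[0]))
    feats = []
    for y in range(0, block.shape[0] - 7, 8):
        for x in range(0, block.shape[1] - 7, 8):
            sub = block[y:y + 8, x:x + 8].astype(float)
            coeffs = dct(dct(sub.T, norm='ortho').T, norm='ortho')  # 2D DCT-II
            feats.append([coeffs[i, j] for i, j in zigzag])
    return np.array(feats)
```

A pair whose score falls between BASE_THRESHOLD and THRESHOLD1 would then have its dct_zigzag_features rows lexicographically sorted and compared by Euclidean distance, as described above.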
Fig. 1 Pictorial representation of the process the images undergo to get the keypoints using the feature extractors
4 Result
A total of 159 images taken from [10, 15–17] were used for analysis. Out of the 159 images, 109 were forged and 50 were original. Among the forged images, 45 were simple copy-move, 29 were copy-move with scaling and 35 were copy-move with rotation. Among the scaled images, four sets of images were scaled up or down by seven different scaling factors from 0.5 to 2. Among the rotated images, five sets of images rotated at angles of 15, 30, 60, 90, 180, 270 and 330 degrees were used (Table 2). BEBLID was able to detect 103 out of 109 forged images, whereas SIFT detected only 95. BEBLID and SIFT along with DCT showed good performance in the detection of simple copy-move forged images, and both methods were able to detect 41 out of the 45 such forged images. For detection of copy-move forgery with scaling and copy-move with rotation, BEBLID gave a better performance than SIFT; the experimental results are given in Table 3. Keypoint matching using both algorithms is depicted in Figs. 3, 4 and 5. Figure 2 shows the original versions of the three images we used as examples.
Table 2 Various image transformations and their results

Type of image       No. of images   Correct by SIFT   Correct by BEBLID
Rotated copy-move   35              30                35
Scaled copy-move    29              24                27
Simple copy-move    45              41                41
Original            50              45                44
Table 3 Comparison of SIFT and BEBLID

          Precision   Recall   F1-score
SIFT      0.949       0.870    0.908
BEBLID    0.947       0.945    0.945
Fig. 2 Original images before forgery (a–c)
In Fig. 3a, b, the results of BEBLID and SIFT are shown, respectively; the naming is the same for the other two images. Figure 3 represents a simple copy-move forged image with matching keypoints. Figure 4 represents a rotated image; as we can see, the matching lines are more intense in Fig. 4a, which was matched using BEBLID, indicating a large number of matched keypoints. Figure 5 is tampered by copy-pasting an area after scaling with a factor of 1.75. Even though SIFT and BEBLID both show a precision of around 94%, BEBLID is the better feature extractor when it comes to image tampering with scaling or rotation. The proposed method was also able to give better accuracy even with a smaller number of matched keypoints, since suspected block pairs are again checked by applying DCT. Both algorithms failed in cases where the forged area is small or the forged areas lie within a single block.
Fig. 3 Keypoint matching for image (a): (a) BEBLID, (b) SIFT
Fig. 4 Keypoint matching for image (b): (a) BEBLID, (b) SIFT
Fig. 5 Keypoint matching for image (c): (a) BEBLID, (b) SIFT
5 Conclusion
In this paper, we proposed an algorithm that combines the block-based methodology DCT and the keypoint-based feature extractor BEBLID to detect forgery in copy-move forged images. Combining both kinds of methods in the algorithm helps increase the detection accuracy. We have also compared the performance of the feature extraction methods BEBLID and SIFT on copy-move forged images. We tested images with simple copy-move, scaled copy-move and rotated copy-move, in which BEBLID showed a better performance than SIFT when the copy-moved area is scaled or rotated. The overall accuracy of both feature detectors shows that BEBLID has slightly higher performance than SIFT; the recall and F1-score values calculated also show the same, while the precision values were almost identical. Although the proposed algorithm increased the accuracy of detecting copy-move forged images, with both BEBLID and SIFT the detection was not accurate when the forged area was very small or when the forged areas lie within a block. We suggest further improvement on this drawback by decreasing the block size or considering overlapping blocks.
References
1. Suárez, I., Sfeir, G., Buenaposada, J.M., Baumela, L.: BEBLID: boosted efficient binary local image descriptor. Pattern Recognit. Lett. 133, 366–372 (2020)
2. Alkawaz, M.H., Sulong, G., Saba, T., et al.: Detection of copy-move image forgery based on discrete cosine transform. Neural Comput. Appl. 30, 183–192 (2018). https://doi.org/10.1007/s00521-016-2663-3
3. Pan, X., Lyu, S.: Region duplication detection using image feature matching. IEEE Trans. Inf. Forensics Secur. 5(4), 857–867 (2010). https://doi.org/10.1109/TIFS.2010.2078506
4. SyamNarayanan, S., Gopakumar, G.: Recursive block based keypoint matching for copy move image forgery detection. In: 2020 11th International Conference on Computing, Communication and Networking Technologies (ICCCNT), pp. 1–6 (2020). https://doi.org/10.1109/ICCCNT49239.2020.9225658
5. Kaur, A., Sharma, R.: Copy-move forgery detection using DCT and SIFT. Int. J. Comput. Appl. 70, 30–34 (2013)
6. Mayer, O., Stamm, M.C.: Exposing fake images with forensic similarity graphs. IEEE J. Sel. Top. Signal Process. 14, 1049–1064 (2020). https://doi.org/10.1109/JSTSP.2020.3001516
7. Farid, H.: Exposing digital forgeries from JPEG ghosts. IEEE Trans. Inf. Forensics Secur. 4, 154–160 (2009). https://doi.org/10.1109/TIFS.2008.2012215
8. Taya, K., Kuroki, N., Takeda, N., Hirose, T., Numa, M.: Detecting tampered regions in JPEG images via CNN. In: 2020 18th IEEE International New Circuits and Systems Conference (NEWCAS), pp. 202–205 (2020). https://doi.org/10.1109/NEWCAS49341.2020.9159761
9. Kanwal, N., Girdhar, A., Kaur, L., Bhullar, J.S.: Detection of digital image forgery using fast Fourier transform and local features. In: 2019 International Conference on Automation, Computational and Technology Management (ICACTM), pp. 262–267 (2019). https://doi.org/10.1109/ICACTM.2019.8776709
10. Ardizzone, E., Bruno, A., Mazzola, G.: Copy-move forgery detection by matching triangles of keypoints. IEEE Trans. Inf. Forensics Secur. 10(10), 2084–2094 (2015). https://doi.org/10.1109/TIFS.2015.2445742
11. Venkataraman, D., Mangayarkarasi, N.: Computer vision based feature extraction of leaves for identification of medicinal values of plants. In: IEEE International Conference on Computational Intelligence and Computing Research (ICCIC), pp. 1–5 (2016)
12. Menon, H.P.: Issues involved in automatic selection and intensity based matching of feature points for MLS registration of medical images. In: 2017 International Conference on Advances in Computing, Communications and Informatics (ICACCI), pp. 787–792 (2017)
13. Sreekumar, A., Nair, K.R., Sudheer, S., Ganesh Nayar, H., Nair, J.J.: Malignant lung nodule detection using deep learning. In: Proceedings of the 2020 IEEE International Conference on Communication and Signal Processing, ICCSP 2020, pp. 209–212 (2020). https://doi.org/10.1109/ICCSP48568.2020.9182258
14. Wen, B., Zhu, Y., Ramanathan, S., Ng, T., Shen, X., Winkler, S.: COVERAGE—a novel database for copy-move forgery detection. In: IEEE International Conference on Image Processing (ICIP), pp. 161–165 (2016). https://doi.org/10.1109/ICIP.2016.7532339
15. Amerini, I., Ballan, L., Caldelli, R., Bimbo, A., Serra, G.: A SIFT-based forensic method for copy-move attack detection and transformation recovery. IEEE Trans. Inf. Forensics Secur. 6, 1099–1110 (2011). https://doi.org/10.1109/TIFS.2011.2129512
16. Christlein, V., Riess, C., Jordan, J., Riess, C., Angelopoulou, E.: An evaluation of popular copy-move forgery detection approaches. IEEE Trans. Inf. Forensics Secur. 7, 1841–1854 (2012). https://doi.org/10.1109/TIFS.2012.2218597
17. Muthugnanambika, M., Padmavathi, S.: Feature detection for color images using SURF. In: 2017 4th International Conference on Advanced Computing and Communication Systems (ICACCS), pp. 1–4 (2017). https://doi.org/10.1109/ICACCS.2017.8014572
Engineering Design Optimization Using Memorized Differential Evolution
Raghav Prasad Parouha and Pooja Verma
Abstract Differential evolution (DE) and its diverse variants are strongly affected by ill-suited operators such as mutation and crossover. In particular, basic DE does not memorize the best results attained in earlier generations. To resolve this issue, a new memory-based DE (named mbDE) is offered in this paper to solve a famous engineering design problem, the speed reducer design. It contains new mutation and crossover operators (swarm mutation and swarm crossover) inspired by particle swarm optimization (PSO). The empirical results confirm the superiority of mbDE over many existing algorithms. Keywords Engineering design problem · Differential evolution · Particle swarm optimization · Usage of memory
1 Introduction These days, a huge amount of evolutionary algorithms (i.e., EAs) are developed to deal with engineering design optimization problems in literature [1]. Differential Evolution (abbreviated as DE) [2], is one of the popular method among successful EAs to solve difficult optimization problem. Owing to its easiness and lots of benefits like easy carrying out, rationally faster, robust and comprehensive search capability [3], DE has gained wide popularity over few decades. Also, it was used to solve various engineering problems for instance economic load dispatch [4], to design quantization table for JPEG baseline system [5], fuzzy clustering of image pixel [6], mechanical engineering design problem [7], etc. However, while solving multifaceted problems, DE gets stuck into local minima and causing with an untimely convergence [3]. Moreover, DE does not surety to extent at global optimal result in a limited stint interval. R. P. Parouha (B) · P. Verma Department of Mathematics, Indira Gandhi National Tribal University, Amarkantak, M.P 484886, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 S. Roy et al. (eds.), Innovations in Computational Intelligence and Computer Vision, Advances in Intelligent Systems and Computing 1424, https://doi.org/10.1007/978-981-19-0475-2_37
Hence, in order to improve the performance of elementary DE, a number of efforts have been made in the literature [8–12]. The selection of operators (mutation and crossover) is a great challenge in DE, and an incorrect choice of operators may lead to stagnation, premature convergence and so on [13]. Over time, to get rid of these problems, researchers have reformed DE through altered mutation schemes [12–17]. Likewise, researchers have mostly used two kinds of crossover (binomial and exponential) in DE [2]; nonetheless, it has been observed that there are no substantial differences between these crossovers [17]. Additionally, hybrids of DE produce quality outcomes compared with its various improved variants [18–22]. Mostly, DE is hybridized with particle swarm optimization (PSO) [23] due to their complementary properties; such hybrids have turned out to be milestones in solving engineering optimization problems. Until now, numerous variants and hybrids of DE have been advised in the literature to solve industrial optimization problems. But all of them can fall short of delivering reasonable outcomes, as a consequence of DE not having any tool or operators to remember the so-far best result [22]; besides this, it uses only global information about the search space. As a result, DE generally loses computing control and easily leads to premature convergence [21, 22]. Inspired by the memory mechanism of particle swarm optimization (PSO) [23], a new DE (mbDE) is proposed in this paper to solve engineering design optimization problems. It contains a new mutation (swarm mutation) and a new crossover (swarm crossover) formed in terms of PSO. The proposed mbDE employs a memory process which permits each particle to improve its own search competence by utilizing the knowledge of more effective particles; the memorized knowledge may lead the search to converge near optimal solutions. The paper is structured as follows: the second section constructs the speed reducer design (SRD) problem; the third section presents an outline of conventional DE and details of the proposed mbDE; computational results for the problem are given in the fourth section; finally, the fifth section concludes the paper along with future directions.
2 Speed Reducer Design (SRD) Problem
In structural optimization, the SRD problem is a classical complex optimization problem. It depicts the structure of a modest gear box used in a light airplane between the engine and the propeller to allow each to rotate at its most efficient speed. The basic aim of the SRD problem is to minimize the weight of this speed reducer subject to constraints such as surface stress, bending stress of the gear teeth, stresses in the shafts and transverse deflections of the shafts. The objective function of the SRD problem is presented mathematically in Eq. (1), and its schematic is shown in Fig. 1.

Fig. 1 Schematic of the SRD problem

Minimize
f(x) = 0.7854 x_1 x_2^2 (3.3333 x_3^2 + 14.9334 x_3 − 43.0934) − 1.508 x_1 (x_6^2 + x_7^2) + 7.4777 (x_6^3 + x_7^3) + 0.7854 (x_4 x_6^2 + x_5 x_7^2)    (1)

where face width (b) = x_1; teeth module (m) = x_2; number of pinion teeth (z) = x_3; length of shaft 1 between bearings (l_1) = x_4; length of shaft 2 between bearings (l_2) = x_5; diameter of shaft 1 (d_1) = x_6; diameter of shaft 2 (d_2) = x_7. The constraints g_1–g_11 of the SRD problem are listed in Eqs. (2)–(12), respectively.
27 −1≤0 x1 x22 x3
(2)
g2 (x) =
397 −1≤0 x1 x22 x3
(3)
g3 (x) =
1.93x42 −1≤0 x1 x64 x3
(4)
g4 (x) =
1.93x42 −1≤0 x1 x74 x3
(5)
1/2 (745(x4 /x2 x3 ))2 + 16.9 × 106 −1≤0 g5 (x) = 110x63 1/2 (745(x5 /x2 x3 ))2 + 157.9 × 106 −1≤0 g6 (x) = 85x73
(6)
(7)
g7 (x) =
x2 x3 −1≤0 40
(8)
g8 (x) =
5x2 −1≤0 x1
(9)
422
R. P. Parouha and P. Verma
x1 −1≤0 12x2
(10)
g10 (x) =
1.5x6 + 1.9 −1≤0 x4
(11)
g11 (x) =
1.5x7 + 1.9 −1≤0 x5
(12)
g9 (x) =
where 2.6 ≤ x1 ≤ 3.6; 0.7 ≤ x2 ≤ 0.8; 17 ≤ x3 ≤ 28; 7.3 ≤ x4 ≤ 8.3; 7.3 ≤ x5 ≤ 8.3, 2.9 ≤ x6 ≤ 3.9, 5.0 ≤ x7 ≤ 5.5.
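For reference, a minimal sketch of the objective and constraints exactly as printed above, together with the bracket-operator penalty (R = 1000) that Sect. 4 uses for constraint handling; it transcribes this paper's formulas, not any other SRD variant.

```python
# Penalized SRD objective, transcribing Eqs. (1)-(12) as printed above (R = 1000).
import math

def srd_objective(x):
    x1, x2, x3, x4, x5, x6, x7 = x
    return (0.7854 * x1 * x2**2 * (3.3333 * x3**2 + 14.9334 * x3 - 43.0934)
            - 1.508 * x1 * (x6**2 + x7**2)
            + 7.4777 * (x6**3 + x7**3)
            + 0.7854 * (x4 * x6**2 + x5 * x7**2))

def srd_constraints(x):
    """Constraint values g_1..g_11; each must satisfy g <= 0."""
    x1, x2, x3, x4, x5, x6, x7 = x
    return [
        27 / (x1 * x2**2 * x3) - 1,
        397 / (x1 * x2**2 * x3) - 1,
        1.93 * x4**2 / (x1 * x6**4 * x3) - 1,
        1.93 * x4**2 / (x1 * x7**4 * x3) - 1,
        math.sqrt((745 * x4 / (x2 * x3))**2 + 16.9e6) / (110 * x6**3) - 1,
        math.sqrt((745 * x5 / (x2 * x3))**2 + 157.9e6) / (85 * x7**3) - 1,
        x2 * x3 / 40 - 1,
        5 * x2 / x1 - 1,
        x1 / (12 * x2) - 1,
        (1.5 * x6 + 1.9) / x4 - 1,
        (1.5 * x7 + 1.9) / x5 - 1,
    ]

def penalized(x, R=1000.0):
    """Bracket-operator penalty: add R * max(0, g)^2 for each violated constraint."""
    return srd_objective(x) + R * sum(max(0.0, g)**2 for g in srd_constraints(x))
```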
3 Proposed Methodology
This section presents an overview of conventional DE and the details of the proposed algorithm mbDE.
3.1 DE Outline

Initialization: at the t-th generation, for any d-dimensional optimization problem, np parent (target) vectors x_i^t = (x_{i,1}^t, x_{i,2}^t, ..., x_{i,d}^t), i = 1, 2, ..., np, are randomly created within the given limits.

Mutation: a mutant vector v_i^t = (v_{i,1}^t, v_{i,2}^t, ..., v_{i,d}^t) is produced by Eq. (13):

v_{i,j}^t = x_{r_1,j}^t + M × (x_{r_2,j}^t − x_{r_3,j}^t)    (13)

where r_1 ≠ r_2 ≠ r_3 ≠ i and M ∈ [0, 1] is the mutation factor.

Crossover: a new trial vector u_i^t = (u_{i,1}^t, u_{i,2}^t, ..., u_{i,d}^t) is made by Eq. (14):

u_{i,j}^t = { v_{i,j}^t, if rnd(0, 1) ≤ Cr;  x_{i,j}^t, if rnd(0, 1) > Cr }    (14)

where Cr ∈ [0, 1] denotes the crossover constant.

Selection: it is given in Eq. (15):

x_{i,j}^{t+1} = { u_{i,j}^t, if f(u_i^t) ≤ f(x_i^t);  x_{i,j}^t, otherwise }    (15)
The above three operations are executed at each generation until the termination conditions are fulfilled.
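A compact sketch of this DE/rand/1/bin loop follows; the population size, M, Cr and generation count chosen here are illustrative, not the paper's settings.

```python
# Minimal DE/rand/1/bin loop following Eqs. (13)-(15); settings are illustrative.
import numpy as np

def de(f, bounds, np_=50, M=0.5, Cr=0.9, generations=200, seed=0):
    rng = np.random.default_rng(seed)
    lo, hi = np.array(bounds, dtype=float).T
    d = len(lo)
    pop = lo + rng.random((np_, d)) * (hi - lo)          # initialization
    fit = np.array([f(x) for x in pop])
    for _ in range(generations):
        for i in range(np_):
            r1, r2, r3 = rng.choice([k for k in range(np_) if k != i],
                                    3, replace=False)
            v = pop[r1] + M * (pop[r2] - pop[r3])        # mutation, Eq. (13)
            mask = rng.random(d) <= Cr                   # crossover, Eq. (14)
            u = np.clip(np.where(mask, v, pop[i]), lo, hi)
            fu = f(u)
            if fu <= fit[i]:                             # selection, Eq. (15)
                pop[i], fit[i] = u, fu
    best = fit.argmin()
    return pop[best], fit[best]
```

For instance, it could be called as de(penalized, bounds) with the penalized SRD function and the variable bounds from Sect. 2.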
3.2 Suggested Technique mbDE
From the literature survey, the following observations on DE and its variants are noted: (i) the selection of kernel operators (mutation and crossover) is a great challenge in DE, and an incorrect choice of operators may cause premature convergence, stagnation and so on; (ii) the choice of proper operators has a weighty effect on the solution quality of DE; and (iii) basic DE does not memorize the best results attained in earlier generations. These observations and drawbacks, as well as the benefits of DE, became the main inspiration to propose an effective algorithm. The following major key points are considered for the proposed algorithm: (i) constituting a novel approach to remember the best results attained in earlier generations; (ii) the necessity to improve the established two-difference-vector mutation scheme of DE; and (iii) in place of pure repetition of mutated and target vectors, a slight perturbation added to the crossover operator of DE. The concept of 'usage of memory' is borrowed from the mechanism of PSO. Since PSO is stimulated by the social and competitive behavior of clustered birds known as a swarm, the proposed operators are termed swarm mutation and swarm crossover, and the suggested DE is named mbDE due to its usage of the memory mechanism. A pictorial demonstration of the idea of 'usage of memory' is given in Fig. 2. A mutant vector v_i^t = (v_{i,1}^t, v_{i,2}^t, ..., v_{i,d}^t) in the current generation t is produced by swarm mutation using the target vector x_i^t = (x_{i,1}^t, x_{i,2}^t, ..., x_{i,d}^t) and the best position vector p_i^t = (p_{i,1}^t, p_{i,2}^t, ..., p_{i,d}^t), as given below in Eq. (16).
Fig. 2 Usage of memory in mbDE versus DE
v_{i,j}^t = x_{i,j}^t + rnd(0, 1) × [f(p_{i,best}^t)/f(x_{i,worst}^t)] × (p_{i,best}^t − x_{i,j}^t) + rnd(0, 1) × [f(g_{best}^t)/f(x_{i,worst}^t)] × (g_{best}^t − x_{i,j}^t)    (16)

where p_{i,best}^t is the individual best position of vector p_i^t, g_{best}^t is the global best position among the vectors p_i^t, f(x_{i,worst}^t) is the worst function value among the vectors x_i^t (if f(x_{i,worst}^t) = 0, it can be replaced by a big positive constant), and rnd(0, 1) is a random number lying in the interval (0, 1). As in PSO, the cognitive and social inheritance permits the particles to advance their search competences by utilizing the knowledge of more effective particles. A new trial vector u_i^t = (u_{i,1}^t, u_{i,2}^t, ..., u_{i,d}^t) is generated by swarm crossover as given below in Eq. (17).
u_{i,j}^t = { v_{i,j}^t + rnd(0, 1) × (g_{best}^t − p_{i,best}^t), if rnd(0, 1) ≤ Cr;  x_{i,j}^t + rnd(0, 1) × (g_{best}^t − p_{i,best}^t), if rnd(0, 1) > Cr }    (17)
This adoption may enhance the convergence rate and local search ability of DE through the usage of the memorized concept. In mbDE, both proposed operators (swarm mutation and swarm crossover) maintain sufficient diversity. Therefore, the selection operator is exchanged for elitism in the mbDE cycle, which works as follows and retains the better results: (i) merge the initial and final populations, then (ii) arrange them in ascending order according to their function values, then (iii) select the better half and discard the rest. The flowchart of mbDE is given in Fig. 3, and a sketch of one generation follows below.
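A sketch of one mbDE generation assembled from Eqs. (16) and (17) plus the elitism step; the substitute constant for f(x_worst) = 0 and the minimization-oriented bookkeeping are assumptions for illustration.

```python
# One mbDE generation: swarm mutation (16), swarm crossover (17), elitism.
import numpy as np

def mbde_generation(pop, fit, pbest, pbest_fit, gbest, gbest_fit, f,
                    Cr=0.9, rng=None):
    rng = rng or np.random.default_rng()
    np_, d = pop.shape
    f_worst = fit.max() if fit.max() != 0 else 1e6   # big constant if worst f is 0
    trial = np.empty_like(pop)
    for i in range(np_):
        r1, r2 = rng.random(d), rng.random(d)
        v = (pop[i]
             + r1 * (pbest_fit[i] / f_worst) * (pbest[i] - pop[i])  # cognitive term
             + r2 * (gbest_fit / f_worst) * (gbest - pop[i]))       # social term
        base = np.where(rng.random(d) <= Cr, v, pop[i])             # swarm crossover
        trial[i] = base + rng.random(d) * (gbest - pbest[i])
    trial_fit = np.array([f(x) for x in trial])
    merged = np.vstack([pop, trial])                 # elitism: merge, sort, keep half
    merged_fit = np.concatenate([fit, trial_fit])
    keep = np.argsort(merged_fit)[:np_]
    return merged[keep], merged_fit[keep]
```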
4 Computational Results
mbDE was implemented in C and executed on a Windows 10 PC (Intel i7, 2.30 GHz, 4.00 GB RAM). After numerous trials, fine-tuned values of Cr = 0.9 and R (penalty factor) = 1000 are recommended for use in mbDE; the penalty handles the constraints, as the constrained problem is converted into an unconstrained optimization problem by adding a bracket-operator penalty [24]. The results produced by mbDE on the speed reducer design problem are compared with DE [3], PSO [23] and PSO-DE [22]. The simulations are reported over 30 independent runs with np (population size) = 100 and 240,000 FEs (maximum number of function evaluations) as the stopping condition, matching the other compared algorithms for a fair comparison. The comparative experimental outcomes, i.e., the finest parameters and the best, mean, worst, standard deviation (std.) and FEs of the objective function values, are given in Table 1. The overall best values among the corresponding algorithms are noted in bold letters.
Fig. 3 Flowchart of the mbDE algorithm

Table 1 Simulation results

Indicators   Variables   PSO           DE            PSO-DE        mbDE
Best         b (x1)      3.5015        3.50411       3.5           3.50000
             m (x2)      0.7000        0.7           0.6           0.70000
             z (x3)      17.000        17            17            17
             l1 (x4)     7.6050        7.3           7.3           7.32760
             l2 (x5)     7.8181        7.33342       7.71          7.71532
             d1 (x6)     3.3520        3.37164       3.35          3.35028
             d2 (x7)     5.2875        5.28916       5.29          5.28665
             f(x)        2994.744241   2994.74423    2996.348165   2994.47107
Mean                     3001.758264   3001.75825    2996.348165   2994.62318
Worst                    3009.96474    3009.964736   2996.34816    2994.76112
Std                      1.52e+01      5.86e–02      1.0e–07       1.52e–10
FEs                      41,000        30,000        70,100        10,000
Fig. 4 Convergence graph for SOP-4 (optimum cost vs. iteration for mbDE, PSO-DE, DE and PSO)
From Table 1, mbDE displays competitive or better results with respect to the considered algorithms. Notably, mbDE produced a smaller standard deviation (approaching 0.00e+00) with a reduced number of FEs, which indicates its stability and robustness. The convergence speed (objective function value/optimum cost vs. iteration on the same random population) and the average execution time of mbDE and the others on the considered speed reducer design problem are shown separately in Figs. 4 and 5. It is clear from these figures that mbDE converges faster and uses less execution time than the others. Overall, mbDE produces either superior or similar results compared with its competitors.
5 Conclusion
A new DE based on a memory mechanism (mbDE) is proposed in this paper to solve a famous engineering design problem, the SRD. It memorizes the best results attained in earlier generations through the newly employed mutation (swarm mutation) and crossover (swarm crossover) operators, which use the pbest and gbest concepts of particle swarm optimization (PSO). The simulation (numerical, statistical and graphical) results show that mbDE delivers the best results in terms of solution quality and convergence rate and exceeds traditional DE and PSO as well as their hybrid PSO-DE. The usage of the memory mechanism makes the proposed mbDE more effective and robust.
Fig. 5 Average execution time (in seconds) of mbDE compared with PSO, DE and PSO-DE
The proposed mbDE can be useful for combinatorial optimization problems in future research.
References
1. Parouha, R.P., Das, K.N.: Parallel hybridization of differential evolution and particle swarm optimization for constrained optimization with its application. Int. J. Syst. Assur. Eng. Manage. 7, 143–162 (2016)
2. Storn, R., Price, K.: Differential evolution—a simple and efficient heuristic for global optimization over continuous spaces. J. Global Optim. 11(2), 341–359 (1997)
3. Das, S., Abraham, A., Chakraborty, U.K., Konar, A.: Differential evolution using a neighborhood-based mutation operator. IEEE Trans. Evol. Comput. 13(3), 526–553 (2009)
4. Wang, Y., Li, B., Weise, T.: Estimation of distribution and differential evolution cooperation for large scale economic load dispatch optimization of power systems. Inf. Sci. 180(12), 2405–2420 (2010)
5. Kumar, B.V., Karpagam, M.: Differential evolution versus genetic algorithm in optimising the quantisation table for JPEG baseline algorithm. Int. J. Adv. Intell. Paradigms 7(2), 111–135 (2015)
6. Das, S., Sil, S.: Kernel-induced fuzzy clustering of image pixels with an improved differential evolution algorithm. Inf. Sci. 180(2), 1237–1256 (2010)
7. Zhang, M., Luo, W., Wang, X.: Differential evolution with dynamic stochastic selection for constrained optimization. Inf. Sci. 178(15), 3043–3074 (2008)
8. Bilal, Pant, M., Zaheer, H., Garcia Hernandez, L., Abraham, A.: Differential evolution: a review of more than two decades of research. Eng. Appl. Artif. Intell. 90, 1–24 (2020)
9. Opara, K.R., Arabas, J.: Differential evolution: a survey of theoretical analyses. Swarm Evol. Comput. 44, 546–558 (2019)
10. Eltaeib, T., Mahmood, A.: Differential evolution: a survey and analysis. Appl. Sci. 8(10), 1–25 (2018)
11. Das, S., Suganthan, P.N.: Differential evolution: a survey of the state-of-the-art. IEEE Trans. Evol. Comput. 15(1), 4–31 (2011)
12. Neri, F., Tirronen, V.: Recent advances in differential evolution: a survey and experimental analysis. Artif. Intell. Rev. 33(1–2), 61–106 (2010)
13. Gong, W., Cai, Z.: Differential evolution with ranking based mutation operators. IEEE Trans. Cybern. 43(6), 2066–2081 (2013)
14. Das, K.N., Parouha, R.P.: Optimization with a novel hybrid algorithm and applications. Opsearch 53(3), 443–473 (2016)
15. Dhanalakshmy, D.M., Akhila, M.S., Vidhya, C.R., Jeyakumar, G.: Improving the search efficiency of differential evolution algorithm by population diversity analysis and adaptation of mutation step sizes. Int. J. Adv. Intell. Paradigms 15(2), 119–145 (2020)
16. Lenin, K., Ravindhranathreddy, B., Suryakalavathi, M.: Hybridisation of backtracking search optimisation algorithm with differential evolution algorithm for solving reactive power problem. Int. J. Adv. Intell. Paradigms 8(3), 355–364 (2016)
17. Das, K.N., Parouha, R.P., Deep, K.: Design and applications of a new DE-PSO-DE algorithm for unconstrained optimisation problems. Int. J. Swarm Intell. 3(1), 23–57 (2017)
18. Das, K.N., Parouha, R.P.: An ideal tri-population approach for unconstrained optimization and applications. Appl. Math. Comput. 256, 666–701 (2015)
19. Parouha, R.P., Das, K.N.: DPD: an intelligent parallel hybrid algorithm for economic load dispatch problems with various practical constraints. Expert Syst. Appl. 63, 295–309 (2016)
20. Wang, Y., Li, H.X., Huang, T., Li, L.: Differential evolution based on covariance matrix learning and bimodal distribution parameter setting. Appl. Soft Comput. 18, 232–247 (2014)
21. Das, S., Konar, A., Chakraborty, U.K.: Improving particle swarm optimization with differentially perturbed velocity. In: Proceedings of the Genetic and Evolutionary Computation Conference, pp. 177–184 (2005)
22. Liu, H., Cai, Z., Wang, Y.: Hybridizing particle swarm optimization with differential evolution for constrained numerical and engineering optimization. Appl. Soft Comput. 10, 629–664 (2010)
23. Kennedy, J., Eberhart, R.: Particle swarm optimization. In: Proceedings of IEEE International Conference on Neural Networks, vol. 4, pp. 1942–1948. IEEE, Perth (1995)
24. Deb, K.: Optimization for Engineering Design: Algorithms and Examples. Prentice-Hall of India, New Delhi (1995)
Image Forgery Detection Using CNN and Local Binary Pattern-Based Patch Descriptor
Shuvro Pal and G. M. Atiqur Rahaman
Abstract This paper proposes a novel method to detect multiple types of image forgery. The method uses the Local Binary Pattern (LBP) as a descriptive feature of image patches. A uniquely designed convolutional neural network (LBPNet) is proposed, in which four VGG-style blocks are used, followed by a support vector machine (SVM) classifier. It uses the 'Swish' activation function, the 'Adam' optimizer, and a combination of 'Binary Cross-Entropy' and 'Squared Hinge' as loss functions. The proposed method is trained and tested on 111,350 image patches generated from phase-I of the IEEE IFS-TC Image Forensics Challenge dataset. The results reveal that training such a network with computed LBP patches of real and forged images can produce 98.96% validation and 98.84% testing accuracy with an area under the curve (AUC) score of 0.988. The experimental results prove the efficacy of the proposed method with respect to state-of-the-art techniques. Keywords Image forgery · Convolutional neural network (CNN) · Local binary pattern (LBP) · LBPNet
1 Introduction
Manipulating digital images via intuitive software is very common now, as it can be done at very low cost and with little effort. Such outcomes often create serious situations, as facts may be distorted and public opinion may be biased by fake visual proofs, yielding negative social influence. Therefore, there is a potential necessity to build a robust and accurate detection method which can classify whether a picture is original or not. To address the problem, a number of active research
S. Pal (B) · G. M. Atiqur Rahaman Computer Science and Engineering Discipline, Khulna University, Khulna 9208, Bangladesh e-mail: [email protected] G. M. Atiqur Rahaman e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 S. Roy et al. (eds.), Innovations in Computational Intelligence and Computer Vision, Advances in Intelligent Systems and Computing 1424, https://doi.org/10.1007/978-981-19-0475-2_38
communities have paid attention to it in recent years. A real image can be forged by resampling, compression or scaling; Farid [1] proposed methods to detect such image forgery. Between the two basic forgery detection types, active forgery detection requires specific digital information to be embedded in the real image, whereas passive forgery detection does not require any extra information. Image manipulation frequently reveals severe flaws and reduces the credibility of digital photographs. Therefore, forgery detection is often a pressing need, especially if a photograph is used as evidence in a court of law, in news articles, as part of medical records, or in financial documents. Even beyond these domains, forged images in various social media often create social, political and religious vulnerability in particular groups of people, regions or countries. Detecting a forged image or recognizing the forged parts is a complex task for a number of reasons. First, few robust detection systems are available yet from existing research. Second, the non-availability of large-volume datasets is one of the biggest challenges in solving this particular problem: the number of open-source datasets is small, and none of them contains a sufficient amount of images covering different variations of image manipulation. Third, the practical application field of this problem is much more challenging than the theoretical domain. Today's social media platforms, which deal with billions of contents every day, face enormous challenges in differentiating credible multimedia content, because no state-of-the-art technique has yet been introduced to detect the authenticity of such a huge volume of images in real time with good accuracy. For these reasons, the field is very active now, and an increasing number of publications have appeared in recent years focusing on this problem domain. The rest of the paper is organized as follows: Sect. 2 describes related research on image forgery detection; Sect. 3 describes the proposed forgery detection methodology; Sect. 4 illustrates the experimental results; discussion and summary are given in Sect. 5.
2 Related Works
Mahale et al. [2] proposed a method using the Local Binary Pattern (LBP) for feature extraction to identify image inconsistency; for the COMOFOD database, they claimed 98.58% accuracy in detecting inconsistent images. Cozzolino et al. [3] used residual-based image descriptors and block-matching techniques to identify image splicing and copy-move forgeries; they won phase 1 of the first Image Forensics Challenge, held in 2013 by the IEEE Signal Processing Society, with a score of 0.9421 on the test set. Xu et al. [4] experimented with several statistical features and their combinations, introducing the Hamming distance of Local Binary Patterns (LBP) as a similarity measurement; they achieved 93.7% accuracy with the phase-I dataset of the first IEEE IFS-TC Image Forensics Challenge and an f-score of 0.26. Bayar et al. [5] trained a CNN on the same dataset: they randomly collected 2445 images and created 100,000
patches for training, of which only 16,667 were real. Accuracy was 98.70% on testing cases after 45 epochs (70,000 iterations), but the data imbalance between real and tampered patches is noteworthy even in their test dataset. Farooq et al. [6] used the local binary pattern combined with the spatial rich model (SRM); they generated a custom dataset from the IEEE IFS-TC Image Forensics Challenge, and their experiments suggest that computing LBP on noise residuals along with co-variance matrices produced their best model, with an accuracy of 98.4%. Huang et al. [7] proposed a CNN model that uses automatic feature learning to extract features from each convolutional layer; they used the CASIA v1.0 dataset and generated two other datasets from it, achieving 96.71% accuracy on CASIA v1.0 and 93.13% and 64.44% on the other two. Some researchers trained their network on a single dataset and later tested its performance on multiple datasets. Salloum et al. [8] constructed a multi-task fully convolutional network (MFCN) which uses two output branches for multi-task learning; they trained their network on the CASIA v2.0 dataset and tested its performance on multiple other datasets. Cozzolino et al. [9], in another experiment, proposed a Siamese network to detect image splicing and perform localization; pairs of image patches from the same or different cameras were used to train the network, and performance was tested on multiple datasets. Many modern researchers follow either a statistical or a deep-learning-based approach, or a combination of both, to model image forgery detection. Similar methodology motivated this work to come up with better ideas to solve this extremely challenging problem.
3 Materials and Methods
Textural patterns are among the prominent features in the forgery detection scenario, since pixel-level distortion is prevalent in the forged areas of an image. This is where a descriptive feature like the Local Binary Pattern (LBP) comes into play.
3.1 Local Binary Pattern (LBP)
Local Binary Patterns are texture descriptors originally proposed by Ojala et al. [10]. LBP calculates a local representation of texture, created by comparing each pixel with the pixels in its immediate vicinity. To construct an LBP descriptor, the image must first be converted to grayscale. For each pixel in the resulting grayscale image, we can choose a neighborhood of size r surrounding the center pixel; the LBP value of this center pixel is then calculated and stored in an output 2D array. The mathematical representation of this process is:
Fig. 1 LBP conversion of an image: left (RGB), right (LBP). Source: IEEE IFS-TC Image Forensics Challenge phase-I dataset
LBP_{P,R} = Σ_{i=0}^{P−1} f(G_i − G_c) 2^i    (1)

f(x) = { 1, if x ≥ 0;  0, otherwise }
In Eq. (1), P represents the number of sampling points on a circle of radius R (i.e., the neighborhood pixels), G_i is the gray value of the i-th neighbor and G_c that of the center pixel c. LBP creates low-dimensional representations of input photos, emphasizing local topographical features. This memory-efficient algorithm is a capable tool for categorizing even unlabeled photos by comparing their visual similarities and estimating the likelihood that the images are sampled from the same population. It can simplify high-dimensional images into low-dimensional representations which can be used as input vectors in more complex machine learning models. Additionally, there is hardly any algorithm even now that beats LBP in terms of storage and memory requirements. Since forged images contain less consistent texture patterns than real ones, we used LBP as a key feature to differentiate between them. The LBP conversion of an input RGB image is shown in Fig. 1, and a minimal sketch of the conversion follows below.
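Eq. (1) with a radius-R, P-neighbor configuration corresponds to the stock default-method LBP of scikit-image, so the conversion can be sketched as:

```python
# Compute the LBP image of Eq. (1) with P neighbors on a circle of radius R.
import cv2
from skimage.feature import local_binary_pattern

def to_lbp(image_bgr, P=8, R=1):
    gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)   # LBP needs grayscale input
    # 'default' implements the plain code of Eq. (1), one LBP value per pixel
    return local_binary_pattern(gray, P, R, method='default')
```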
3.2 Dataset
For this research, the phase-I dataset of the IEEE IFS-TC Image Forensics Challenge [11] was chosen. It has a total of 1050 real images, 450 forged images and their 450 masks. The mask images are black-and-white (not grayscale) images which describe the spliced area of the corresponding forged image. We discarded some grid images among the real portion which were filled with zero-valued pixels, keeping 1475 images in total.
3.3 Feature Extraction
Mask images mark the forged area of an image at the pixel level. Rao et al. [12] followed an approach that samples the forged image along the boundary of the spliced region, matching against the corresponding mask; this approach is useful for extracting forged patches from an image and was followed in this experiment too. When sampling the fake image at the spliced region's boundary, the mask was used to ensure that the forged and real parts each contributed roughly 50% of a patch; a sketch of this sampling follows below. We extracted 64 × 64 patches with stride 16 and 96 × 96 patches with stride 24 for extensive experiments. After sampling, in the first case we had 47,200 forged and real patches each, hence 94,500 patches in total; in the second case of 96 × 96 patches, we extracted 55,965 forged and real image patches each, so there were 111,350 patches in the dataset. We used 70% of these patches to train the proposed deep learning model; the remaining 30% were used for validation and testing.
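A sketch of this boundary sampling: slide a window over the mask and keep patches whose forged-pixel fraction is near one half. The patch size and stride follow the text, while the 40–60% acceptance band is an assumption.

```python
# Sample patches along the spliced boundary so forged and real pixels each
# contribute roughly half; patch size 96, stride 24 as in the text.
import numpy as np

def boundary_patches(image, mask, size=96, stride=24, lo=0.4, hi=0.6):
    patches = []
    forged = (mask > 127).astype(float)     # mask is black-and-white
    h, w = forged.shape
    for y in range(0, h - size + 1, stride):
        for x in range(0, w - size + 1, stride):
            frac = forged[y:y + size, x:x + size].mean()
            if lo <= frac <= hi:            # both regions contribute ~50%
                patches.append(image[y:y + size, x:x + size])
    return patches
```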
3.4 Designing a Custom CNN Architecture (LBPNet)
Initially, a base CNN architecture was trained with the extracted features. Later, a state-of-the-art (SOTA) building block, the VGG block [13], was utilized. A VGG block consists of a succession of convolutional layers followed by a max pooling layer for spatial down-sampling; generally, these blocks use small filters (3 × 3 or 5 × 5). We tried hundreds of combinations to find better accuracy compared with modern methods. After feature engineering, we had 64 × 64 and 96 × 96 LBP patches, so we customized the input layer so that the CNN model can take grayscale patches (64 × 64 × 1 or 96 × 96 × 1). Then, VGG-style blocks were used to build the desired network architecture. For the first block, we took two convolutional layers, each with 32 filters; instead of the traditional filter size of (3, 3) or (5, 5), we used a (7, 7) filter size for this block (Fig. 2). We have seen that a larger filter size helped the network to better understand the forged part. The stride size was 1 × 1 in our case. A max pooling layer of size 2 × 2 and a dropout layer sit next in the first block. The next three blocks are similar to the first, except that a (5, 5) filter size was used in each case and the number of filters was increased in each following block. The dropout percentage was increased in each block by 5–10%, topping out at 50% in the last block.
Fig. 2 Proposed VGG style block architecture
The fully connected layer was replaced by a support vector machine (SVM) here. SVM performs well in classification tasks where a clear margin of distinction exists between classes; the problem we are solving is a classification task, and a highly distinctive feature like LBP is being used. Hence, as part of the experiment, we tried both a fully connected layer (FCL) and an SVM separately on top of the CNN architecture. The experimental results suggested that adding the SVM performed better than the FCL, and the number of training epochs was significantly lower; hence, we replaced it. This completes our proposed network architecture, which we named 'LBPNet'.
Activation Function: We used 'Swish', proposed by Ramachandran et al. [14]. It acted smoother and gave better performance on both the training and validation datasets than the traditional Rectified Linear Unit (ReLU). The Swish function outputs the multiplication of the input x with the sigmoid of x:

S(x) = x ∗ sigmoid(x) = x ∗ (1 + e^{−x})^{−1}    (2)
Optimizing Function: We chose the 'Adam (Adaptive Moment Estimation)' optimizer, a popular stochastic gradient descent algorithm which requires less memory and is computationally efficient. It computes an exponential moving average of the gradient as well as of the squared gradient, with beta1 and beta2 (values close to 1.0 suggested by the authors) controlling the decay rates of these moving averages. We used beta1 = 0.9 and beta2 = 0.999. Initially, we used 1e−4 as the learning rate and fixed the decay rate as (Fig. 3):
Fig. 3 Proposed CNN architecture (LBPNet)
decay = initial learning rate / epochs    (3)
Loss Function: While using dense layers in our proposed LBPNet, we used the Binary Cross-Entropy loss, since we are solving a binary classification problem (whether a particular image patch is forged or not) with target values in the set {0, 1}. While using the SVM instead of fully connected layers, we used the 'squared hinge loss'. Rather than weighting all errors equally, this punishes larger errors more; technically, the squared hinge is the square of the output of the hinge's max() function. The hinge loss can be represented as

L(y) = max(0, 1 − t · y)    (4)
Here, the loss function is denoted L, the prediction y and the actual target t; the function computes the maximum of 0 and 1 − t · y. We used the squared version to get a better decision boundary. A condensed sketch of the resulting network follows below.
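Putting the pieces together, a condensed Keras sketch of LBPNet: the filter counts beyond block 1, the exact dropout schedule and the L2 weight on the SVM-style head are assumptions, and squared hinge expects labels coded as {-1, +1}.

```python
# Condensed LBPNet sketch: four VGG-style blocks, swish, SVM-style linear head.
import tensorflow as tf
from tensorflow.keras import layers, regularizers

def build_lbpnet(input_shape=(96, 96, 1)):
    inputs = tf.keras.Input(shape=input_shape)       # grayscale LBP patches
    x = inputs
    # (filters, kernel, dropout) per block; block 1 uses the larger 7x7 kernel
    for filters, kernel, drop in [(32, 7, 0.2), (64, 5, 0.3),
                                  (128, 5, 0.4), (256, 5, 0.5)]:
        x = layers.Conv2D(filters, kernel, padding='same', activation='swish')(x)
        x = layers.Conv2D(filters, kernel, padding='same', activation='swish')(x)
        x = layers.MaxPooling2D(2)(x)
        x = layers.Dropout(drop)(x)
    x = layers.Flatten()(x)
    # Linear, L2-regularized output behaves like an SVM under squared hinge loss
    outputs = layers.Dense(1, kernel_regularizer=regularizers.l2(1e-3))(x)
    model = tf.keras.Model(inputs, outputs)
    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4,
                                                     beta_1=0.9, beta_2=0.999),
                  loss='squared_hinge')
    return model
```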
3.5 Training Environment
We used Kaggle kernels and Colab as our training environment. Both resources were limited to 13 GB RAM, a 16 GB GPU and 30 GB storage capacity, with a session timeout of 8 h. We used the Keras and TensorFlow frameworks to build our LBPNet.
3.6 Model Training
Since we have two sets of image patches of different dimensions, we trained on both cases to see which dimension captures the forged area better. For the 64 × 64 patches, we changed the input layer dimension of LBPNet to (64 × 64 × 1). In both cases, we trained the model for 50 epochs. For the (64, 64) patches, the training and validation accuracies were 87.32% and 84.78%, whereas for the (96, 96) patches the scores were 98.75% and 97.96%, respectively. Hence, it is clear that the larger patch dimension works better than the smaller one, and we moved forward with the (96, 96) patch dataset at this stage.
4 Experimental Results
To understand the importance of the number of blocks in LBPNet, we made experiments increasing and decreasing them; Table 1 illustrates the findings. In every attempt, the network was trained for 50 epochs. The improving validation accuracy justifies the number of blocks of LBPNet; hence, at this stage, we kept the network containing 4 blocks. To compare the fully connected and SVM classifiers, we trained the network for 50 epochs at this stage of the experiment and traced the validation accuracy (Table 2). We found that the SVM performed better in combination with the squared hinge loss. We then continued training for 100 epochs, achieving 98.96% accuracy on the validation and 98.84% on the testing dataset; the highest accuracy and lowest loss occurred at epoch 76, with 609 iterations per epoch. On the testing dataset, our model also showed balanced performance: the precision, recall and F1-score were 0.98, 0.99 and 0.99, respectively, for 8240 real image patch samples, and 0.99, 0.98 and 0.99, respectively, for 8458 forged image patch samples. The receiver operating characteristic (ROC) curve also proves the performance of our classification model; to summarize the model performance, we calculated the area under the curve (AUC) score, which is 0.988 (Fig. 4). In Table 3 [NP = No. of patches, PS = Patch size, T-V-T = Train-Validation-Test ratio, NTD = No. of test data, (R, F)% = (Real, Forged) data in test dataset (%), Acc. (%) = Accuracy (%)], we compare our model's efficiency with Bayar et al. [5], who trained a CNN on patches extracted from the same dataset images. Additionally, to assess our model's performance further, we tested it on another popular dataset, DSO-1 [15], which contains 200 indoor and outdoor images, 100 real and 100 forged, including various spliced images of people. For this dataset, we obtained precision, recall and F1-score of 0.87 (Fig. 5), a ROC AUC score of 0.875 and an area under the precision-recall curve (AUPRC) score of 0.887. Table 4 compares this result with other state-of-the-art techniques, showing that the proposed method competes considerably well with some recent prominent methods.

Table 1 Training and validation accuracy (%) with respect to the number of blocks

No. of blocks         1       2       3       4       5
Training accuracy     72.47   79.95   91.75   98.75   99.65
Validation accuracy   69.65   77.01   89.33   97.96   97.07
Table 2 Validation accuracy (%) with respect to the number of epochs for SVM and dense layer

No. of epochs   10      20      30      40      50
Dense layer     78.47   84.04   86.15   88.75   91.69
SVM             82.82   87.32   91.33   93.32   95.57
Fig. 4 LBPNet model performance. Accuracy and loss in training and validation on (a) and (b); precision, recall matrix and ROC curve of the test dataset on (c–f)

Table 3 Performance comparison with Bayar et al. [5]

                   NP        PS           T-V-T      NTD      (R, F)%          Acc. (%)
Bayar et al. [5]   100,000   (256, 256)   70-0-30    32,000   (17.79, 82.21)   98.70
Proposed           111,350   (96, 96)     70-15-15   16,698   (49.34, 50.65)   98.84
Fig. 5 Precision and recall matrix on dataset DSO-1

Table 4 Performance comparison based on AUPRC score on the DSO-1 dataset

Salloum et al. [8]   Cozzolino et al. [9]   Proposed
0.820                0.769                  0.887
5 Discussion and Conclusion
The Local Binary Pattern is a lightweight texture descriptor for images that has been used for a couple of decades to identify local spatial patterns. We have presented a novel methodology to train a custom convolutional neural network with LBP patch descriptors. We used the phase-I dataset of the IEEE IFS-TC Image Forensics Challenge to train the proposed LBPNet, whose performance suggests that it is effectively trained to classify forged and real patches of the corresponding images. We also found that extracted features of 96 × 96 patches performed better than 64 × 64 patches with fewer epochs. Additionally, the proposed LBPNet uses an SVM instead of a fully connected layer and achieved an AUC score of 0.988. As future extensions, we will test the network's performance on more challenging datasets and experiment with whether larger patches of dimension 128 or 256 perform better with this network. The resulting model occupies only 50 MB, which opens a bright possibility of using it on mobile phones as a convenient tool to distinguish forged images from real ones.
References
1. Farid, H.: Image forgery detection. IEEE Signal Process. Mag. 26(2), 16–25 (2009)
2. Mahale, V.H., Ali, M.M.H., Yannawar, P.L., Gaikwad, A.T.: Image inconsistency detection using local binary pattern (LBP). Procedia Comput. Sci. 115, 501–508 (2017)
3. Cozzolino, D., Gragnaniello, D., Verdoliva, L.: Image forgery detection through residual-based local descriptors and block-matching. In: 2014 IEEE International Conference on Image Processing (ICIP), pp. 5297–5301. IEEE (2014)
4. Xu, G., Ye, J., Shi, Y.-Q.: New developments in image tampering detection. In: Digital-Forensics and Watermarking, pp. 3–17. Springer International Publishing, Cham (2015)
5. Bayar, B., Stamm, M.: Design principles of convolutional neural networks for multimedia forensics. In: IS&T International Symposium on Electronic Imaging, vol. 7, pp. 77–86 (2017)
6. Farooq, S., Yousaf, M.H., Hussain, F.: A generic passive image forgery detection scheme using local binary pattern with rich models. Comput. Electr. Eng. 62, 459–472 (2017)
7. Huang, N., He, J., Zhu, N.: A novel method for detecting image forgery based on convolutional neural network. In: 2018 17th IEEE International Conference on Trust, Security and Privacy in Computing and Communications/12th IEEE International Conference on Big Data Science and Engineering (TrustCom/BigDataSE), pp. 1702–1705. IEEE (2018)
8. Salloum, R., Ren, Y., Jay Kuo, C.-C.: Image splicing localization using a multi-task fully convolutional network (MFCN). arXiv [cs.CV] (2017). http://arxiv.org/abs/1709.02016
9. Cozzolino, D., Verdoliva, L.: Noiseprint: a CNN-based camera model fingerprint. arXiv [cs.CV] (2018). http://arxiv.org/abs/1808.08396
10. Ojala, T., Pietikäinen, M., Harwood, D.: A comparative study of texture measures with classification based on featured distributions. Pattern Recogn. 29(1), 51–59 (1996)
11. IEEE IFS-TC Image Forensics Challenge Dataset. Accessed March 03, 2020. http://ifc.recod.ic.unicamp.br/fc.website/index.py
12. Rao, Y., Ni, J.: A deep learning approach to detection of splicing and copy-move forgeries in images. In: 2016 IEEE International Workshop on Information Forensics and Security (WIFS), pp. 1–6. IEEE (2016)
13. Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. arXiv [cs.CV] (2014). http://arxiv.org/abs/1409.1556
14. Ramachandran, P., Zoph, B., Le, Q.V.: Searching for activation functions. arXiv [cs.NE] (2017). http://arxiv.org/abs/1710.05941
15. Carvalho, T., Faria, F.A., Pedrini, H., da Torres, R.S., Rocha, A.: Illuminant-based transformed spaces for image forensics. IEEE Trans. Inf. Forensics Secur. 11(4), 720–733 (2016)
Extracting and Developing Spectral Indices for Soil Using Sentinel-2A to Investigate the Association of Soil NPK
V. Dhayalan and Karuppasamy Sudalaimuthu
Abstract Spatial soil nutrient information is essential for agricultural risk assessment and decision making. Remote sensing data act as the chief constituent of soil nutrient information, which can be converted into a digital soil database that is cost-effective and less time-consuming than manual soil testing procedures. This study investigated soil nutrients by extracting soil spectra from multispectral satellite data (Sentinel-2A). Soil samples were taken and tested with a spectroradiometer in the laboratory as per standard procedure. Spectral indices extracted from the Sentinel data that reflect soil properties were correlated with the in situ laboratory spectral results. The degrees of accuracy were identified, and the justification for using remote sensing technology as an alternative to manual soil testing is presented in this study. Mathematical correlations were employed to derive the R2 value using OriginPro 8.5 software. The R2 values derived from the in situ laboratory spectra and from the Sentinel data were validated against an independent set of samples, which clearly reveals the pattern of soil properties in the project area (Pollachi, Tamil Nadu). The outcome of this study elucidates the extensive usage of satellite data in exploring soil health information at large scale. Considering the increased availability of freely available remote sensing data, soil information at local and regional scales can be predicted with relatively little financial and human effort.
Keywords Remote sensing · Soil nutrient · Multispectral satellite data · R2 value · Spectra
V. Dhayalan · K. Sudalaimuthu (B) Department of Civil Engineering, SRM Institute of Science and Technology, Kattankulathur, Tamilnadu, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 S. Roy et al. (eds.), Innovations in Computational Intelligence and Computer Vision, Advances in Intelligent Systems and Computing 1424, https://doi.org/10.1007/978-981-19-0475-2_39
1 Introduction
Agriculture is the rising solution to global challenges such as food insecurity, the availability of nutritious food and sustainable food productivity. The prevailing population growth leads to many socioeconomic issues, especially concerning food availability and productivity, and sustained, profitable agricultural productivity can resolve many of them [1]. The two important natural resources that play a major role in farm productivity are land and water. Water availability is unpredictable and relies on natural sources, whereas the soil fertility of agricultural land can be predicted easily and continuously. In a canal irrigation system, the water supplied to agricultural practices depends on the water available in the dam, which determines the cropping pattern in the local area. The cropping pattern, or multiple cropping, in turn depends on the soil fertility of the land. Hence, knowledge of soil fertility is needed by decision makers; it supports the farmers in choosing a cropping pattern that maximizes profit. Based on the fertility of the land and existing knowledge of the water availability, multiple cropping patterns can be chosen by the farmers. A study of the soil properties of agricultural land is therefore imperative, as these determine the key factors in raising crop productivity. Water supply, soil properties and proper crop rotation are the major factors for sustainable returns to the farmers, and soil property analysis is one of the most challenging of these. It is therefore essential to analyze soil parameters periodically for better results in agricultural practice.

The Parambikulam Aliyar Project (PAP) supports an agricultural area of 20,536 ha at Pollachi, which lies at 10.662°N, 77.0065°E, located 40 km south of Coimbatore. Key informants of the farmers' water use association and local farmers in the PAP irrigation project area identified that the soil parameters vary, which plays an important role in multiple cropping. Although manual soil testing gives effective information about the soil parameters N, P and K, it is essential to identify soil health periodically over a large extent to support the sustainable productivity of the agricultural land. Moreover, frequent analysis of the soil report after each cropping season helps farmers choose proper organic manure and suitable crops. Hence, an accurate digital database of the soil parameters of agricultural land in the Pollachi area is needed for the farmers to choose crops that increase agricultural profit.

Remote sensing is becoming an effective tool for monitoring local, regional and global environmental issues. Remotely sensed data are a proven source of information for detailed characterization of soil type, structure and condition [2]. Soil spectral indices derived from satellite image data have become one of the primary information sources for monitoring soil conditions and mapping soil health changes [3], and they are among the most common tools used in global soil health and change detection studies. Spectral indices have a high correlation with several soil properties, such as soil texture, total nitrogen, phosphorous and potassium.
In arid and semiarid regions, which are characterized by low vegetation cover, the effect of the soil background reflectance needs to be considered [4].
Remote sensing techniques sense the characteristics of the soil surface. One of remote sensing's most effective capabilities is its ability to monitor the variation of diurnal or ephemeral soil properties as they change over time in response to time of day, season, weather and climate [5]. When applied at regional scale, the variations in soil surface moisture, texture, mineral composition, organic carbon, etc., can be tracked from week to week along the progression of seasonal patterns of crop growth. The sensitivity of spectral information to the soil background has generated ample interest in developing further indices [6]. Different research groups have developed qualitative and quantitative evaluations of soil characteristics using spectral measurements, and spectral models and indices are being developed to improve vegetation sensitivity by accounting for atmosphere and soil effects. The present study investigates the association of soil NPK through correlations between the spectral indices extracted from in situ laboratory testing using the FieldSpec® HandHeld 2™ spectroradiometer [5] and the Sentinel-2A dataset [2].
2 Materials and Methods
2.1 Study Area Description
Anaimalai, which lies at 10.662°N, 77.0065°E, located 40 km south of Coimbatore, holds 20,536 ha of agricultural land supported by the Parambikulam Aliyar Project (PAP). Anaimalai soil is characterized by red loam, black soil and calcareous soil; the area receives an average annual rainfall of about 158 cm, and the temperature ranges from 23 to 26 °C. The study area hosts various major crops such as coconut, paddy, groundnut, sugarcane, cocoa, nutmeg, areca nut and vanilla. It was revealed by key informants of the farmers' water use association that the soil parameters vary, which determines the multiple cropping. Although there is an existing systematic irrigation system, farmers still follow a mono-cropping pattern due to the lack of periodic soil fertility information. Even though manual soil testing gives effective information about the soil parameters (N, P and K), it is necessary to identify the soil nutrients periodically at large scale in order to support sustainable productivity. Multispectral remote sensing fulfills the aforementioned needs and enables the study to be carried out at specific locations.
2.2 Field Soil Sampling and Laboratory Analysis
A total of twenty-seven soil samples were collected at a depth of 15 cm at various locations of the study area, as shown in Fig. 1, in the month of January 2021.
444
V. Dhayalan and K. Sudalaimuthu
Fig. 1 Study area with sampling locations
Fig. 2 Field soil sample collection and labeling
The soil samples were collected as per the standard guidelines of Tamil Nadu Agricultural University (TNAU), India. The sampling site descriptions, such as the crop cultivated, soil color, texture, type, latitude and longitude, were recorded for precise assessment. Sampling locations were recorded using a Differential Global Positioning System (DGPS) with sub-meter accuracy (Trimble Navigation Ltd., Sunnyvale, California, USA). Samples were sieved with a 2 mm sieve to remove foreign materials and then air-dried to remove the moisture content. Finally, the soil samples were packed in plastic bags and labeled with the aforementioned site descriptions, as shown in Fig. 2.
2.2.1 Descriptive Statistical Analysis of Soil Properties
Labeled samples were brought to the soil testing laboratory (TNAU soil testing laboratory, India), where available nitrogen, phosphorous and potassium were measured using the Kjeldahl nitrogen analysis method, the Bray-1 method and a flame photometer,
respectively [7]. The available nitrogen ranges from 128 to 265 kg/ha, phosphorous ranges from 15 to 37 kg/ha and available potassium ranges from 164 to 374 kg/ha. Descriptive statistical analysis was carried out using OriginPro 8.5 software, where all soil samples were subjected to standard deviation, mean, skewness, kurtosis and coefficient of variation (C.V) measurements, as elucidated in Table 1.
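For readers without OriginPro, the same descriptive statistics can be reproduced in Python; the sketch below assumes SciPy and uses placeholder values rather than the measured NPK data.

```python
# Sketch: Table 1-style descriptive statistics in Python (illustrative values).
import numpy as np
from scipy import stats

def describe(values):
    v = np.asarray(values, dtype=float)
    return {
        "mean": v.mean(),
        "sd": v.std(ddof=1),                 # sample standard deviation
        "skewness": stats.skew(v),
        "kurtosis": stats.kurtosis(v),       # excess kurtosis by default
        "cv": v.std(ddof=1) / v.mean(),      # coefficient of variation
        "min": v.min(), "median": np.median(v), "max": v.max(),
    }

nitrogen_kg_ha = [128, 186, 265]             # placeholder sample values, not the data
print(describe(nitrogen_kg_ha))
```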
2.2.2 In Situ Soil Spectra Analysis
Soil samples were subjected to spectral extraction using the ASD FieldSpec® HandHeld 2™ spectroradiometer [5], which measures over the range 325–1075 nm. The soil spectra were extracted in a closed laboratory to avoid signal-to-noise errors. The instrument was mounted on a tripod stand, and an artificial light source was provided by a tungsten quartz halogen lamp. Each soil sample was placed at a distance of 30 cm from the instrument. After initial calibration and optimization, a white-reference spectrum was taken, followed by the soil spectra for the samples. For further analysis of the extracted soil spectra, the data files were imported from the spectroradiometer to a computer, as referred to in Fig. 3. In ViewSpec Pro software, the spectra were processed to convert the radiance files to the desired reflectance files. The soil spectra were viewed in ENVI 4.7 software, as shown in Fig. 4, and the plot parameters were fixed from 400 to 1075 nm, since the pre-400 nm wavelengths were subject to noise errors. The spectra were saved as ASCII files and opened in OriginPro 8.5 for further analysis.
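A minimal sketch of this post-processing step in Python is shown below, assuming the spectra have been exported as two-column ASCII files (wavelength, reflectance); the file name and column layout are assumptions.

```python
# Sketch: loading one exported ASCII spectrum and trimming the noisy pre-400 nm
# region before plotting, as described in the text.
import numpy as np
import matplotlib.pyplot as plt

data = np.loadtxt("sample_01_reflectance.txt")     # hypothetical ViewSpec Pro export
wavelength, reflectance = data[:, 0], data[:, 1]

keep = (wavelength >= 400) & (wavelength <= 1075)  # drop noise-prone wavelengths
plt.plot(wavelength[keep], reflectance[keep])
plt.xlabel("Wavelength (nm)")
plt.ylabel("Reflectance")
plt.title("In situ soil spectrum (400-1075 nm)")
plt.show()
```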
2.3 Remote Sensing Data
Sentinel-2A data covering the study area were collected from the Copernicus Open Access Hub (previously known as the Sentinels Scientific Data Hub) website (https://sentinels.copernicus.eu/web/sentinel/home). The data were collected in the month of January 2021.
Fig. 3 Soil spectra extraction and data importing
Fig. 4 In situ extracted soil spectra
The metadata of the acquired product are furnished in Table 2. The soil spectral indices were extracted from this multispectral satellite image.
2.3.1 Image Preprocessing
The Sentinel Application Platform (SNAP) software developed by ESA enables specific atmospheric and radiometric corrections through the Sen2Cor tool [8, 9]. Sen2Cor corrects the data for the effects of the atmosphere in order to obtain the desired bottom-of-atmosphere (BOA) reflectance product. After image preprocessing, the data are subjected to spectral extraction using the latitude and longitude of the sample locations.
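As a sketch, Sen2Cor's command-line processor can also be driven from Python as below; the product folder name is a placeholder, and flag support may vary across Sen2Cor versions.

```python
# Sketch: invoking Sen2Cor to convert a Level-1C product to bottom-of-atmosphere
# (Level-2A) reflectance. The SAFE folder path is a placeholder.
import subprocess

safe_dir = "S2A_MSIL1C_20210109T050221_EXAMPLE.SAFE"   # hypothetical L1C product
subprocess.run(["L2A_Process", "--resolution", "10", safe_dir], check=True)
```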
2.3.2 Sentinel Data Spectral Extraction
The Sentinel data are subjected to spectral extraction at the selected locations where the soil samples were taken. Latitude and longitude values were fed into the pin manager tool, and the spectral reflectance values were extracted to the clipboard, as reflected in Fig. 5, and imported into OriginPro 8.5 software for further analysis.
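An equivalent extraction can be scripted outside SNAP; the sketch below samples a Sentinel-2 band raster at the recorded coordinates using rasterio, with a hypothetical band file name and the usual reprojection from WGS84 to the tile's UTM CRS.

```python
# Sketch: sampling band reflectance at sample-point coordinates with rasterio.
import rasterio
from pyproj import Transformer

points_wgs84 = [(77.0065, 10.662)]           # (lon, lat) of a sampling location

with rasterio.open("T43PFS_20210109_B08_10m.jp2") as src:  # hypothetical band file
    to_crs = Transformer.from_crs("EPSG:4326", src.crs, always_xy=True)
    coords = [to_crs.transform(lon, lat) for lon, lat in points_wgs84]
    for value in src.sample(coords):
        print(value[0] / 10000.0)            # L2A reflectance is scaled by 10000
```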
Fig. 5 Typical spectrum view of the selected sample location
2.4 Statistical Interpretation
The spectral reflectance values extracted from the in situ laboratory soil testing and from the Sentinel-2A data were imported into OriginPro 8.5 software. The imported data were plotted to develop the soil spectral reflectance curves [10]. The reflectance curves show a close correlation between the in situ measurements and the satellite data. To justify the correlations, the R2 value was obtained for both soil spectral curves, extracted from the in situ measurements and from the satellite data. Comparing the R2 values, the final outcomes show correlations between the in situ measurements and the Sentinel data that are very useful in measuring the soil-associated NPK.
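A minimal sketch of the R2 computation is given below, assuming a simple linear fit of reflectance against wavelength as implied by "the linear equation of the curve" (Sect. 3.3); the values are placeholders, not the measured spectra.

```python
# Sketch: R2 of a linear fit to one reflectance curve (placeholder values).
from scipy.stats import linregress

wavelengths = [443, 490, 560, 665, 705, 740, 783, 842, 865, 945]
reflectance = [0.05, 0.04, 0.06, 0.05, 0.08, 0.17, 0.21, 0.23, 0.23, 0.24]

fit = linregress(wavelengths, reflectance)
print("R2 =", fit.rvalue ** 2)   # compare against the Sentinel-2A curve's R2
```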
3 Results and Discussion
3.1 In Situ Laboratory Soil Spectra Curve
The spectral reflectance extracted from the in situ soil testing was statistically developed into spectral reflectance curves. The soil samples taken from the various locations of Anaimalai, Coimbatore, India, show a very strong peak response in the visible near-infrared (VNIR) region, and the curve remains at its peak in the range of 1000–1075 nm.
Table 1 Descriptive statistical analysis of soil samples

|   | S.No | Mean  | S.D  | Skewness | Kurtosis | C.V  | Min | Median | Max |
|---|------|-------|------|----------|----------|------|-----|--------|-----|
| N | 27   | 185   | 28.7 | 0.34407  | 1.14498  | 0.15 | 128 | 186    | 265 |
| P | 27   | 23.51 | 6.24 | 0.60373  | -0.68153 | 0.26 | 15  | 22     | 37  |
| K | 27   | 213.3 | 55.4 | 1.4665   | 2.46718  | 0.26 | 132 | 198    | 374 |

S.No = number of samples; S.D = standard deviation; C.V = coefficient of variation
From the extensive literature, nitrogen ranging from 100 to 250 kg/ha, phosphorous ranging from 20 to 40 kg/ha and potassium ranging from 200 to 400 kg/ha appear in the aforementioned wavelength regions [11]. The descriptive statistics in Table 1, where the soil was tested manually, correlate with the outcomes of the soil spectral curves from the in situ measurements shown in Fig. 6.
3.2 Sentinel-2A Soil Spectra Curve
The preprocessed, cloud-free Sentinel data were subjected to soil spectral extraction: the 27 sample locations were placed on the image, and the spectral reflectance values were extracted. The extracted spectral values were imported into OriginPro 8.5 software, and the spectral curves were developed, which show the same behavior as the spectral curves of the in situ measurements. The curves are plotted as shown in Fig. 7.
3.3 R2 Value
The linear equation of each curve gives the R2 value for both the spectral reflectance curves extracted from the in situ measurements and from the satellite data. Table 4 shows a close correlation between the in situ measurements and the satellite data, which enables the investigation of the NPK associated with the soil. Moreover, the descriptive analysis of the manual soil testing report supports the findings of the spectral indices (Table 3).
4 Conclusion
In spite of all the modern technology for monitoring soil health, the periodic, large-scale assessment of soil nutrient status remains an unexplored theme. A periodic (time series) soil nutrient database plays a vital role in attaining sustainable agriculture through suitable crop rotation at the appropriate time with respect to the soil nutrients and the ecosystem.
Fig. 6 Soil spectral curves extracted from in situ measurements: (a–i) nine plots covering the 27 samples
Table 2 Remote sensing data used

| Satellite data     | Sentinel-2A          |
|--------------------|----------------------|
| Acquisition date   | 09-01-2021           |
| Spatial resolution | VNIR 10 m; NIR 20 m  |
In order to meet the twenty-first-century challenges of food safety and security, there is a great need to measure the nutrient status of the farm lands inch by inch so as to follow multiple cropping protocols. Multiple cropping is the only solution to the food security problem, which experts forecast to be a major challenge in the future. Inch-by-inch farm nutrient status at large scale is feasible only through remote sensing techniques. Thus, this study fills the open research gaps with respect to time series nutrient measurements and the large-scale compilation of soil nutrient status. The present study reveals the importance of satellite technology as an alternative to manual soil testing in analyzing soil properties, especially NPK, and gives a clear picture of the NPK status of the Anaimalai area from remote sensing technology.
Table 3 Soil spectral reflectance values from Sentinel-2A data (reflectance at each band center wavelength, nm)

| S.Nos     | 443    | 490    | 560    | 665    | 705    | 740    | 783    | 842    | 865    | 945    | 1610   | 2190   |
|-----------|--------|--------|--------|--------|--------|--------|--------|--------|--------|--------|--------|--------|
| SAMPLE-01 | 0.0546 | 0.0415 | 0.0593 | 0.0458 | 0.0779 | 0.171  | 0.21   | 0.2258 | 0.2291 | 0.2425 | 0.1304 | 0.0637 |
| SAMPLE-02 | 0.0451 | 0.0383 | 0.0544 | 0.0429 | 0.0775 | 0.1745 | 0.222  | 0.2234 | 0.2392 | 0.2472 | 0.1334 | 0.0683 |
| SAMPLE-03 | 0.0477 | 0.0403 | 0.0561 | 0.0439 | 0.0819 | 0.1961 | 0.2442 | 0.2548 | 0.2689 | 0.2628 | 0.1293 | 0.0621 |
| SAMPLE-04 | 0.0747 | 0.0647 | 0.0935 | 0.1074 | 0.1267 | 0.2032 | 0.236  | 0.2486 | 0.2515 | 0.2483 | 0.2    | 0.1433 |
| SAMPLE-05 | 0.0762 | 0.0663 | 0.0893 | 0.106  | 0.1234 | 0.1961 | 0.2233 | 0.2346 | 0.2392 | 0.274  | 0.1968 | 0.1461 |
| SAMPLE-06 | 0.0762 | 0.0497 | 0.0743 | 0.0691 | 0.126  | 0.2163 | 0.2427 | 0.2362 | 0.2601 | 0.274  | 0.2135 | 0.152  |
| SAMPLE-07 | 0.0544 | 0.0717 | 0.0972 | 0.0924 | 0.1444 | 0.261  | 0.3034 | 0.2912 | 0.3258 | 0.3047 | 0.2146 | 0.1413 |
| SAMPLE-08 | 0.0478 | 0.0459 | 0.0711 | 0.0558 | 0.0954 | 0.217  | 0.2567 | 0.2736 | 0.2873 | 0.2594 | 0.173  | 0.089  |
| SAMPLE-09 | 0.0449 | 0.0414 | 0.0594 | 0.0419 | 0.0924 | 0.2876 | 0.3539 | 0.298  | 0.3786 | 0.2938 | 0.1579 | 0.0734 |
| SAMPLE-10 | 0.056  | 0.0442 | 0.0689 | 0.0434 | 0.0951 | 0.2866 | 0.3681 | 0.4476 | 0.3881 | 0.3033 | 0.1631 | 0.0757 |
| SAMPLE-11 | 0.0546 | 0.0415 | 0.0593 | 0.0458 | 0.0779 | 0.171  | 0.21   | 0.2258 | 0.2291 | 0.2425 | 0.1304 | 0.0637 |
| SAMPLE-12 | 0.0451 | 0.0383 | 0.0544 | 0.0429 | 0.0775 | 0.1745 | 0.222  | 0.2234 | 0.2392 | 0.2472 | 0.1334 | 0.0683 |
| SAMPLE-13 | 0.0477 | 0.0403 | 0.0561 | 0.0439 | 0.0819 | 0.1961 | 0.2442 | 0.2548 | 0.2689 | 0.2628 | 0.1293 | 0.0621 |
| SAMPLE-14 | 0.0747 | 0.0647 | 0.0935 | 0.1074 | 0.1267 | 0.2032 | 0.236  | 0.2486 | 0.2515 | 0.2483 | 0.2    | 0.1433 |
| SAMPLE-15 | 0.0762 | 0.0663 | 0.0893 | 0.106  | 0.1234 | 0.1961 | 0.2233 | 0.2346 | 0.2392 | 0.274  | 0.1968 | 0.1461 |
| SAMPLE-16 | 0.0762 | 0.0497 | 0.0743 | 0.0691 | 0.126  | 0.2163 | 0.2427 | 0.2362 | 0.2601 | 0.274  | 0.2135 | 0.152  |
| SAMPLE-17 | 0.0544 | 0.0717 | 0.0972 | 0.0924 | 0.1444 | 0.261  | 0.3034 | 0.2912 | 0.3258 | 0.3047 | 0.2146 | 0.1413 |
| SAMPLE-18 | 0.0478 | 0.0459 | 0.0711 | 0.0558 | 0.0954 | 0.217  | 0.2567 | 0.2736 | 0.2873 | 0.2594 | 0.173  | 0.089  |
| SAMPLE-19 | 0.0449 | 0.0414 | 0.0594 | 0.0419 | 0.0924 | 0.2876 | 0.3539 | 0.298  | 0.3786 | 0.2938 | 0.1579 | 0.0734 |
| SAMPLE-20 | 0.056  | 0.0442 | 0.0689 | 0.0434 | 0.0951 | 0.2866 | 0.3681 | 0.4476 | 0.3881 | 0.3033 | 0.1631 | 0.0757 |
| SAMPLE-21 | 0.0477 | 0.0383 | 0.0544 | 0.0429 | 0.0775 | 0.1745 | 0.222  | 0.2234 | 0.2392 | 0.2472 | 0.1334 | 0.0683 |
| SAMPLE-22 | 0.0747 | 0.0403 | 0.0561 | 0.0439 | 0.0819 | 0.1961 | 0.2442 | 0.2548 | 0.2689 | 0.2628 | 0.1293 | 0.0621 |
| SAMPLE-23 | 0.0762 | 0.0647 | 0.0935 | 0.1074 | 0.1267 | 0.2032 | 0.236  | 0.2486 | 0.2515 | 0.2483 | 0.2    | 0.1433 |
| SAMPLE-24 | 0.0762 | 0.0663 | 0.0893 | 0.106  | 0.1234 | 0.1961 | 0.2233 | 0.2346 | 0.2392 | 0.274  | 0.1968 | 0.1461 |
| SAMPLE-25 | 0.0544 | 0.0497 | 0.0743 | 0.0691 | 0.126  | 0.2163 | 0.2427 | 0.2362 | 0.2601 | 0.274  | 0.2135 | 0.152  |
| SAMPLE-26 | 0.0478 | 0.0717 | 0.0972 | 0.0924 | 0.1444 | 0.261  | 0.3034 | 0.2912 | 0.3258 | 0.3047 | 0.2146 | 0.1413 |
| SAMPLE-27 | 0.0449 | 0.0459 | 0.0711 | 0.0558 | 0.0954 | 0.217  | 0.2567 | 0.2736 | 0.2873 | 0.2594 | 0.173  | 0.089  |
Fig. 7 Soil spectral curves extracted from Sentinel-2A data: (a–i) nine plots covering the 27 samples
Table 4 R2 value comparison for spectral reflectance curves of the in situ measurements and the satellite data

| Soil sample curve | In situ measurement R2 value | Sentinel-2A R2 value |
|-------------------|------------------------------|----------------------|
| (a)               | 0.02                         | 0.02                 |
| (b)               | 0.62                         | 0.65                 |
| (c)               | 0.36                         | 0.31                 |
| (d)               | 0.62                         | 0.64                 |
| (e)               | 0.13                         | 0.16                 |
| (f)               | 0.025                        | 0.015                |
| (g)               | 0.30                         | 0.29                 |
| (h)               | 0.04                         | 0.05                 |
| (i)               | 0.37                         | 0.36                 |
References
1. Wong, C., Li, X.: Analysis of heavy metal contaminated soils. Pract. Period. Hazard. Toxic Radioact. Waste Manage. 7(1), 12 (2003). https://doi.org/10.1061/(ASCE)1090-025X(2003)7:1(12)
2. Earth Explorer: Retrieved from U.S. Geological Survey (2016). http://earthexplorer.usgs.gov/
3. Huete, A.R., et al.: Soil and atmosphere influences on the spectra of partial canopies. Remote Sens. Environ. 25(1), 89–105 (1988)
4. Jiang, Z., et al.: Analysis of NDVI and scaled difference vegetation index retrievals of vegetation fraction. Remote Sens. Environ. 101(3), 366–378 (2006)
5. Kuiawski, A.C.M.B., Safanelli, J.L., Bottega, E.L., Oliveira, A.M.D., Guerra, N.: Vegetation indexes and delineation of management zones for soybean. Pesquisa Agropecuária Tropical 47, 168–177 (2017)
6. Manchanda, M.L., Kudrat, M., Tiwari, A.K.: Soil survey and mapping using remote sensing. Int. Soc. Tropical Ecol. 43(1), 61–74 (2002). ISSN 0564-3295
7. Mohamed, A.E., Rahman, A., Natarajan, A., Srinivasamurthy, C.A., Hegde, R., Prakash, S.S.: Assessment of soil quality by using remote sensing and GIS techniques; a case study, Chamrajanagar District, Karnataka, India. Acta Sci. Agric. 2(1), 05–12 (2018)
8. Palanisamy, C.: GIS and Remote Sensing Techniques for Land Degradation Studies in Dharmapuri District of Tamil Nadu. Ph.D. thesis (Agriculture, Soil Science and Agricultural Chemistry), Tamil Nadu Agricultural University, Coimbatore, ID No. 91-814-017 (2000)
9. Punithavathi, J., Tamilenthi, S., Baskaran, R.: Interpretation of soil resources using remote sensing and GIS in Thanjavur district, Tamil Nadu, India. Adv. Appl. Sci. Res. 2(3), 525–535 (2011)
10. Setia, R., Verma, V., Sharma, P.: Soil informatics for evaluating and mapping soil productivity index in an intensively cultivated area of Punjab, India. J. Geog. Inf. Syst. 4, 71–76 (2011)
11. Malavath, R., Mani, S.: Detection of some soil properties using hyper-spectral remote sensing of semi arid region of Tamil Nadu. Asian J. Soil Sci. 12(1), 191–201 (2017). eISSN 0976-7231
Flood Mapping Using Sentinel-1 GRD SAR Images and Google Earth Engine: Case Study of Odisha State, India
Somya Jain, Anita Gautam, Arpana Chaudhary, Chetna Soni, and Chilka Sharma
Abstract India is a flood-prone country. Every year, many states of India witness flood conditions during the monsoon season. Floods cause loss of human lives and biodiversity and damage land and vegetation, which hinders the social and economic development of the country. Proper monitoring and assessment of flooded regions help in preparing mitigation strategies and in taking effective measures for recovery. Flood mapping enables a proper and accurate assessment of flood in any area. With the help of remote sensing, flood-prone areas can be monitored, and flood mapping can be done using different methodologies that allow accurate analysis. In this study, flood mapping is done using Sentinel-1 synthetic aperture radar (SAR) data. SAR provides continuous observation of the earth's surface without being affected by atmospheric conditions and is hence effective for flood mapping. This study demonstrates a flood mapping process using the VH and VV polarizations of SAR data for Odisha state in India through Google Earth Engine. In 2020, a few districts of Odisha, namely Baleshwar, Bhadrak, Kendrapara, and Jagatsinghapur, received heavy rain. The methodology employs preprocessed Sentinel-1 images from before and during the flood, together with change detection and thresholding techniques, to map the inundated area on the Google Earth Engine (GEE). The flood area and extent have been analyzed from the result. The results of this study reveal that SAR data along with GEE can be well utilized for flood risk management and disaster risk reduction.
Keywords Flood mapping · Remote sensing · Synthetic aperture radar (SAR) · Google Earth Engine (GEE)
S. Jain (B) · A. Gautam · A. Chaudhary · C. Soni · C. Sharma School of Earth Sciences, Banasthali Vidyapith, Banasthali 304022, India e-mail: [email protected] C. Sharma e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 S. Roy et al. (eds.), Innovations in Computational Intelligence and Computer Vision, Advances in Intelligent Systems and Computing 1424, https://doi.org/10.1007/978-981-19-0475-2_40
1 Introduction
Flood is one of the most frequent and destructive disasters, affecting many countries across the world every year (Rahman and Thakur [15]). Tripathi [22] defined flood as "high-water stages in which water overflows its natural or artificial banks onto normally dry land, such as a river inundating its floodplain." This natural disaster not only takes lives and damages infrastructure, but also has a devastating impact on the economy and environment of a country [23]. According to the disaster statistics of UNISDR between 1995 and 2015, floods caused an economic loss of at least $166 billion worldwide [10]. Flood is a natural disaster, but both natural and anthropogenic factors cause its occurrence [21]. Natural factors include excessively heavy rainfall, cyclones, tsunamis, storms, tidal surges, and melting of ice [2]. Anthropogenic factors include the construction of embankments, barrages, and dams and the unplanned construction of roads, bridges, and houses in the floodplain, which affect the natural flow of the river system and cause severe flooding [5].

India faces flood conditions every year during the monsoon season [4]. Excessive rainfall causes the overflow of river water, which leads to flooding conditions in many states [1]. Nearly 40 million hectares of India's land is vulnerable to flood as stated by the Rashtriya Barh Aayog (1980), a figure which increased to 49.815 mha according to a report by states on flood damage data between 1953 and 2010 [22]. The main flood-prone areas of India are the Ganga and Brahmaputra basins, causing floods in the northern and eastern parts of the country, while the Baitarani-Brahmani and Subarnarekha basins create flood conditions in Odisha [14].

Flood events cannot be controlled, but proper monitoring and assessment of flood events help in flood risk management [22]. Assessment of flood-prone areas helps in forming mitigation plans and in the effective recovery of flood-affected areas [25]. Near-real-time information on flood inundation extent is necessary for assessing damage and providing a quick and efficient response during a flood [24]. Mapping of the flood is the primary requirement for flood risk management. Flood mapping is an essential tool for deriving information on flood extent, frequency, and depth, which is used to minimize the damage caused by floods and in planning for the affected areas [19]. Flood maps facilitate the rapid monitoring of flood events and assist disaster relief organizations in carrying out relief operations [17]. Flood maps help in predicting future floods, zoning flood areas, flood warnings, preparedness plans, and response plans, which reduce the flood risk [1].

Flood mapping can be done using ground observation and aerial observation data, but these traditional methods consume much time, require skilled persons, and are expensive [15]. Continuous updating of the flood extent is required for accurate monitoring and for responding timely and quickly [8]. Satellite data are the most helpful tool for flood mapping as they provide an overall view and timely information of the flooded area [25]. The variety of satellite products has increased, with improved quality [3].
Two types of satellite imagery, i.e., optical and synthetic aperture radar (SAR), are used for flood monitoring [1]. Optical sensors like Landsat-8, the Moderate Resolution Imaging Spectroradiometer (MODIS), the Advanced Very High-Resolution Radiometer (AVHRR), and Sentinel-2 are used for flood mapping. These sensors have the disadvantage that they cannot penetrate clouds and depend on the sun's illumination [17]. On the contrary, SAR sensors collect data in all atmospheric conditions as well as during the night [24]. SAR data are important for near-real-time information on flood events and for the detection of water underneath vegetated areas [23]. One of the major advantages of SAR is its ability to differentiate land and water, as a water surface behaves as a specular reflector [21]. The use of SAR has increased with the freely available data from the Sentinel-1 mission, launched by the European Space Agency (ESA) under its Copernicus program in 2014 [11]. Sentinel-1 C-band SAR provides a great prospect for flood disaster monitoring and management [24].

SAR-based flood mapping requires a large amount of data processing [17]. Google Earth Engine (GEE), a cloud computing platform, is a helpful tool that can store and process big geospatial datasets [13]. A variety of remote sensing satellite data and image processing and analysis functions are available on GEE [24]. A Python API and a JavaScript API are used to access the data and the scientific algorithms on GEE [7]. Once an algorithm has been developed for a specific function, it can be reused on Earth Engine later [6].

In August 2020, the state of Odisha in India experienced heavy rainfall which caused flooding in its various districts. This study aims to generate a flood inundation map from SAR images for four coastal districts of the state, viz., Baleshwar, Bhadrak, Kendrapara, and Jagatsinghapur, processed in the Google Earth Engine. The study also has the objective of computing the flood area and deriving the flood extent in these districts. The study aims to provide a rapid method to derive near-real-time information on inundated areas to support disaster management agencies in carrying out rescue and relief operations rapidly and effectively.

Several studies have explored SAR data for flood mapping at different scales using the threshold method [3]. Sivasankar et al. [18] demonstrated automatic flood mapping for Northeast India using SAR images with VV backscatter along with the threshold method on Google Earth Engine. Zurqani et al. [26] prepared a flood map for coastal South Carolina using SAR pre-flooding and after-flooding images in Google Earth Engine. Singh and Pandey [16] explored SAR images using the threshold method in GEE to map the inundated areas of 12 districts in Punjab state. Lal et al. [9] studied the flood inundation in the region of the lower Indo-Gangetic-Brahmaputra plains using SAR VH-polarized images processed in Google Earth Engine. Moothedan et al. [12] presented automated flood inundation mapping from Sentinel-1 SAR images using the thresholding method in the GEE platform for the 2019 flood in Darbhanga, Bihar. Tiwari et al. [20] showed flood inundation in Kerala that occurred in August 2018 using Sentinel-1 SAR VV-polarized data and the threshold method in Google Earth Engine. Among the existing literature on flood mapping combining Sentinel-1 SAR data with the thresholding technique in the GEE platform, none has been performed over the coastal districts of Odisha state at this scale. Therefore, this study presents the use
of VV- and VH-polarized SAR data for mapping the flood-inundated area and deriving the flood extent in the Google Earth Engine platform over the four coastal districts of Odisha state in India. The methodology employs preprocessed Sentinel-1 images from before and during the flood, together with change detection and thresholding techniques, to map the inundated area on the Google Earth Engine (GEE).
2 Study Area
The study region lies in the state of Odisha, situated in the southeastern part of India. The Baleshwar (or Balasore), Bhadrak, Kendrapara, and Jagatsinghapur districts were selected for this study. The districts lie between 19°57′N and 22°0′N latitude and 86°0′E and 87°30′E longitude (Fig. 1). These districts constitute the northern and central coastal plains of Odisha. The selected districts are surrounded by West Bengal to the north; the Mayurbhanj, Keonjhar, Jajpur, and Cuttack districts to the west; Puri district to the south; and the Bay of Bengal to the east. The total geographical area covered by these districts is around 10,580 sq. km. The study area has a plain topography with an average elevation of 30 m above mean sea level (MSL). Odisha shares a 480 km coastal boundary with the Bay of Bengal, making the state vulnerable to natural disasters like cyclones, floods, and storm surges. The state has many rivers flowing through it, originating inside and outside its boundary. The major rivers flowing in the state are the Mahanadi, the Subarnarekha, the Budhabalanga, the Baitarani, the Brahmani, and the Rushikulya. These rivers with their tributaries form several deltas. The Balasore district in the North Coastal Plain lies in the deltas formed by the Subarnarekha and the Budhabalanga rivers. The other districts of the study area are part of the Central Coastal Plain and lie in the combined deltaic region of the Baitarani, Brahmani, and Mahanadi rivers and their tributaries.
Fig. 1 Study area map
The other rivers which drain the selected districts are the Salandi, Bhargavi, Devi, Kharsua, Birupa, and some other tributaries of the main rivers. During the monsoon season, heavy rainfall causes the overflow of water in these rivers, resulting in severe flooding conditions in the study area, impacting many lives and the environment.
3 Methodology
The whole processing was performed in the Code Editor platform of GEE (Fig. 2). Firstly, the region of interest (ROI) was selected: the shapefiles of the investigated area were imported into GEE to define the ROI. Then, the Sentinel-1 C-band SAR Ground Range Detected (GRD) collection was loaded from the GEE public data archive by applying different filters. First, the data were filtered by defining the instrument mode, polarization, orbit properties, resolution, and ROI.
Fig. 2 Methodology adopted for the study
The Sentinel-1 SAR GRD data collection for the defined ROI was loaded in Interferometric Wide (IW) instrument mode with VV and VH polarization and a descending orbit pass at 10 m resolution. Secondly, the resulting image collection was filtered for specific dates: the collection obtained from the first filtration was further filtered to obtain images from before and after the flood event in VV and VH polarization. The obtained before- and after-event images in both polarization modes were then mosaicked, and speckle filtering was applied to the resulting images; a smoothing filter with a 50 m radius was used to remove speckle. Next, the difference between before and after the event was calculated using the mosaicked and speckle-filtered VH-polarized images to extract the inundated area: the VH-polarized image from after the event was divided by the VH-polarized image from before the event. Further, a mask was generated by applying a threshold to the obtained image of the flooded area to enhance the flood-inundated area. A final flood map was obtained, which was analyzed to find the extent of the flood in the ROI.
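A condensed sketch of this workflow in the Earth Engine Python API is given below; the ROI asset path, the date windows bracketing the August 2020 event, and the threshold value are illustrative assumptions, not the exact values used in the study.

```python
# Sketch: Sentinel-1 GRD change-detection flood mapping in the GEE Python API.
import ee

ee.Initialize()

# Hypothetical ROI asset; in practice the district shapefiles would be
# uploaded to GEE and referenced here.
roi = ee.FeatureCollection("users/example/odisha_districts").geometry()

# Sentinel-1 GRD: IW mode, VH polarization, descending pass, 10 m resolution
s1 = (ee.ImageCollection("COPERNICUS/S1_GRD")
      .filter(ee.Filter.eq("instrumentMode", "IW"))
      .filter(ee.Filter.listContains("transmitterReceiverPolarisation", "VH"))
      .filter(ee.Filter.eq("orbitProperties_pass", "DESCENDING"))
      .filter(ee.Filter.eq("resolution_meters", 10))
      .filterBounds(roi))

# Assumed date windows before and during the August 2020 event
before = s1.filterDate("2020-07-01", "2020-07-31").mosaic().clip(roi)
after = s1.filterDate("2020-08-20", "2020-09-07").mosaic().clip(roi)

# Speckle suppression: 50 m circular smoothing, as in the text
before_vh = before.select("VH").focal_mean(50, "circle", "meters")
after_vh = after.select("VH").focal_mean(50, "circle", "meters")

# Change detection: after/before ratio, thresholded to mask flooded pixels
difference = after_vh.divide(before_vh)
flooded = difference.gt(1.25).selfMask()   # 1.25 is an illustrative threshold
```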
4 Results
Odisha experienced persistent rain in August 2020, which brought severe floods to many districts of the state. A flood inundation map has been prepared for the four majorly affected districts, and the flooded area has been analyzed. Out of the total study area of 10,578.57 km2, an area of 2938.601 km2 (i.e., 27.77% of the total study area) is mapped as flooded, displayed in blue. The non-flooded area is calculated as 7639.968 km2 and is represented in black (Fig. 3). A district-wise analysis of the flood-inundated area was carried out and is presented in Table 1. The majorly affected districts are Kendrapara, Bhadrak, and Jagatsinghapur; flood inundation was assessed in all parts of these districts, while in Baleshwar district the northeastern part was the most affected. Separate flood maps were prepared for each district, as shown in Fig. 4, wherein black represents the non-flooded area and blue represents the flooded area. The highest flood-inundated areas were found in Kendrapara district and Bhadrak district, at 1164.91 km2 (i.e., 39.64% of the total flooded area) and 908.027 km2 (i.e., 30.89%), respectively. The flood-inundated area in Jagatsinghapur district was 586.655 km2, which is 19.96% of the total flooded area, followed by Baleshwar with 279.005 km2 (i.e., 9.49%) flooded.
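Continuing the assumed Earth Engine sketch above, area figures like those in Table 1 can be derived by summing pixel areas under the flood mask:

```python
# Sketch: flooded area (km2) from the flood mask produced earlier.
area_img = flooded.multiply(ee.Image.pixelArea())   # m2 per flooded pixel
stats = area_img.reduceRegion(
    reducer=ee.Reducer.sum(), geometry=roi, scale=10, maxPixels=1e13)

flooded_km2 = ee.Number(stats.get("VH")).divide(1e6)
print("Flooded area (km2):", flooded_km2.getInfo())
```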
5 Conclusions
In the present study, the potential of Sentinel-1 C-band SAR for finding the near-real-time flood-inundated area and its assessment has been analyzed. It is hard and rare to obtain cloud-free optical data during the monsoon season.
Fig. 3 Flood inundation map of the study area
Table 1 Flooded and non-flooded area (in km2) in each district

| S. No | District       | Total area (km2) | Flooded area (km2) | Flooded area (%) | Non-flooded area (km2) |
|-------|----------------|------------------|--------------------|------------------|------------------------|
| 1     | Baleshwar      | 3887.03          | 279.005            | 9.49             | 3608.033               |
| 2     | Bhadrak        | 2491.76          | 908.027            | 30.89            | 1583.731               |
| 3     | Jagatsinghapur | 1733.70          | 586.655            | 19.96            | 1147.046               |
| 4     | Kendrapara     | 2466.07          | 1164.914           | 39.64            | 1301.158               |
| 5     | Total          | 10,578.57        | 2938.601           | 27.77            | 7639.968               |
SAR has the significant benefit of providing images in all atmospheric conditions, which makes it the most suitable data source for near-real-time flood mapping. After the commencement of the Sentinel-1 mission by the European Space Agency (ESA) under its Copernicus program in 2014, the use of SAR has increased in a variety of applications. SAR-based flood mapping requires a large amount of data processing and storage; Google Earth Engine (GEE), a cloud computing platform, is a helpful tool that can store and process big geospatial datasets. The prime objective of this study was to prepare a flood map and hence calculate the flood-inundated area of four coastal districts of Odisha state in India by exploring SAR. VV- and VH-polarized SAR images were processed in the GEE platform using the threshold method to identify the flooded area and extent. The flood-inundated area calculated for the four districts of Odisha state was 2938.601 km2 out of
Fig. 4 District-wise flood inundation map
the total area of 10,578.57 km2; i.e., 27.77% of the total area was affected by flood. The most affected districts were Kendrapara and Bhadrak, with inundated areas of 1164.91 km2 (i.e., 39.64%) and 908.027 km2 (i.e., 30.89%), respectively. The obtained results show that SAR is very useful for rapid flood inundation mapping and monitoring, and that Google Earth Engine has immense potential for preparing flood maps and assessing flooded areas. The results of this study reveal that the proposed methodology is helpful for providing near-real-time information on flood extent to disaster management agencies, which can help them to prioritize their relief activities. Sentinel-1 SAR data along with GEE can be well utilized for flood risk management and disaster risk reduction.
References
1. Aryal, D., Wang, L., Adhikari, T.R., Zhou, J., Li, X., Shrestha, M., Wang, Y., Chen, D.: A model-based flood hazard mapping on the southern slope of Himalaya. Water 12(2) (2020). https://doi.org/10.3390/w12020540
2. Braimah, M.M., Abdul-Rahaman, I., Sekyere, D.O., Momori, P.H., Abdul-Mohammed, A., Dordah, G.A.: Assessment of waste management systems in second cycle institutions of the Bolgatanga Municipality, Upper East, Ghana. Int. J. Pure Appl. Biosci. 2(1), 189–195 (2014)
3. Clement, M.A., Kilsby, C.G., Moore, P.: Multi-temporal synthetic aperture radar flood mapping using change detection. J. Flood Risk Manage. 11(2), 152–168 (2018). https://doi.org/10.1111/jfr3.12303
4. Pawar, A.D., Sarup, J., Mittal, S.K.: Application of GIS for flood mapping: a case study of Pune City. Int. J. Mod. Trends Eng. Res. 3(4), 474–478 (2016). https://www.researchgate.net/publication/303702550
5. Gaurav, K., Sinha, R., Panda, P.K.: The Indus flood of 2010 in Pakistan: a perspective analysis using remote sensing data. Nat. Hazards 59(3) (2011). https://doi.org/10.1007/s11069-011-9869-6
6. Gorelick, N., Hancher, M., Dixon, M., Ilyushchenko, S., Thau, D., Moore, R.: Google Earth Engine: planetary-scale geospatial analysis for everyone. Remote Sens. Environ. 202, 18–27 (2017). https://doi.org/10.1016/j.rse.2017.06.031
7. Gulácsi, A., Kovács, F.: Sentinel-1-imagery-based high-resolution water cover detection on wetlands, aided by Google Earth Engine. Remote Sens. 12, 1–20 (2020). https://doi.org/10.3390/rs12101614
8. Kuntla, S.K., Manjusree, P.: Development of an automated tool for delineation of flood footprints from SAR imagery for rapid disaster response: a case study. J. Indian Soc. Remote Sens. 48(6), 935–944 (2020). https://doi.org/10.1007/s12524-020-01125-4
9. Lal, P., Prakash, A., Kumar, A.: Google Earth Engine for concurrent flood monitoring in the lower basin of Indo-Gangetic-Brahmaputra plains. Nat. Hazards 104, 1947–1952 (2020). https://doi.org/10.1007/s11069-020-04233-z
10. Lin, L., Di, L., Tang, J., Yu, E., Zhang, C., Rahman, M.S., Shrestha, R., Kang, L.: Improvement and validation of NASA/MODIS NRT global flood mapping. Remote Sens. 11(205), 1–18 (2019). https://doi.org/10.3390/rs11020205
11. Markert, K.N., Markert, A.M., Mayer, T., Nauman, C., Haag, A., Poortinga, A., Bhandari, B., Thwal, N.S., Kunlamai, T., Chishtie, F., Kwant, M., Phongsapan, K., Clinton, N., Towashiraporn, P., Saah, D.: Comparing Sentinel-1 surface water mapping algorithms and radiometric terrain correction processing in Southeast Asia utilizing Google Earth Engine. Remote Sens. 12 (2020). https://doi.org/10.3390/rs12152469
12. Moothedan, A.J., Dhote, P.R., Thakur, P.K., Garg, V.: Automatic flood mapping using Sentinel-1 GRD SAR images and Google Earth Engine: a case study of Darbhanga, Bihar. In: Recent Advances in Geospatial Technology & Applications, IIRS Dehradun, India, pp. 1–4 (2020). https://www.researchgate.net/publication/343539830
13. Mutanga, O., Kumar, L.: Google Earth Engine applications. Remote Sens. 11 (2019). https://doi.org/10.3390/rs11050591
14. Panda, P.K.: Vulnerability of flood in India: a remote sensing and GIS approach for warning, mitigation and management. Asian J. Sci. Technol. 5(12), 843–846 (2014)
15. Rahman, M.R., Thakur, P.K.: Detecting, mapping and analysing of flood water propagation using synthetic aperture radar (SAR) satellite data and GIS: a case study from the Kendrapara District of Orissa State of India. Egypt. J. Remote Sens. Space Sci. 21, 537–541 (2018). https://doi.org/10.1016/j.ejrs.2017.10.002
16. Singh, G., Pandey, A.: Flood mapping using multi-temporal open access synthetic aperture radar data in Google Earth Engine. In: Roorkee Water Conclave 2020 (2020). http://repositorio.unan.edu.ni/2986/1/5624.pdf
17. Singha, M., Dong, J., Sarmah, S., You, N., Zhou, Y., Zhang, G., Doughty, R., Xiao, X.: Identifying floods and flood-affected paddy rice fields in Bangladesh based on Sentinel-1 imagery and Google Earth Engine. ISPRS J. Photogramm. Remote Sens. 166, 278–293 (2020). https://doi.org/10.1016/j.isprsjprs.2020.06.011
18. Sivasankar, T., Sarma, K.K., Raju, P.L.N.: Automatic flood mapping using Sentinel-1 GRD SAR images and Google Earth Engine: a case study from North East India (2019)
19. Tam, T.H., Ibrahim, A.L., Rahman, M.Z.A., Zulkifli, M.: Flood risk mapping using geospatial technologies and hydraulic model. In: 34th Asian Conference on Remote Sensing 2013 (2013)
20. Tiwari, V., Kumar, V., Matin, M.A., Thapa, A., Ellenburg, W.L., Gupta, N., Thapa, S.: Flood inundation mapping-Kerala 2018; harnessing the power of SAR, automatic threshold detection method and Google Earth Engine. PLoS ONE 15(8), 1–17 (2020). https://doi.org/10.1371/journal.pone.0237324
21. Tripathi, G., Pandey, A.C., Parida, B.R., Kumar, A.: Flood inundation mapping and impact assessment using multi-temporal optical and SAR satellite data: a case study of 2017 flood in Darbhanga District, Bihar, India. Water Resour. Manage. (2020). https://doi.org/10.1007/s11269-020-02534-3
22. Tripathi, P.: Flood disaster in India: an analysis of trend and preparedness. Interdisc. J. Contemp. Res. 2(4), 91–98 (2015). https://www.researchgate.net/profile/Prakash_Tripathi/publication/292980782_Flood_Disaster_in_India_An_Analysis_of_trend_and_Preparedness/links/56b36ac208ae156bc5fb25bd.pdf
23. Tsyganskaya, V., Martinis, S., Marzahn, P., Ludwig, R.: Detection of temporary flooded vegetation using Sentinel-1 time series data. Remote Sens. 10 (2018). https://doi.org/10.3390/rs10081286
24. Uddin, K., Matin, M.A., Meyer, F.J.: Operational flood mapping using multi-temporal Sentinel-1 SAR images: a case study from Bangladesh. Remote Sens. 11, 1–19 (2019). https://doi.org/10.3390/rs11131581
25. Vishnu, C.L., Sajinkumar, K.S., Oommen, T., Coffman, R.A., Thrivikramji, K.P., Rani, V.R., Keerthy, S.: Satellite-based assessment of the August 2018 flood in parts of Kerala, India. Geomat. Nat. Haz. Risk 10(1), 758–767 (2019). https://doi.org/10.1080/19475705.2018.1543212
26. Zurqani, H.A., Post, C.J., Mikhailova, E.A., Ozalas, K., Allen, J.S.: Geospatial analysis of flooding from hurricane Florence in coastal South Carolina using Google Earth Engine. In: Clemson University TigerPrints Graduate Research and Discovery Symposium (GRADS), pp. 4–5 (2019). https://tigerprints.clemson.edu/grads_symposium
Remote Sensing Image Captioning via Multilevel Attention-Based Visual Question Answering
Nirmala Murali and A. P. Shanthi
Abstract Image captioning refers to generating a description for an image. Remote sensing image (RSI) captioning is a cross-field between RSI processing and natural language processing. Most RSIs suffer from large inter-class similarity. The proposed system uses visual question answering (VQA) to overcome this problem and to aid the caption generation process. The VQA model helps the system identify important words that can be included in the captions. By doing this, the captions include all the objects in the image and the count of those objects, making the captions more informative. The process has two phases. The first phase performs VQA using three levels of attention. The model uses the knowledge obtained from the VQA to enhance the captions in the second phase. An overall BLEU score of 78.8% is obtained by the proposed model.
Keywords Convolutional neural network · Multi-class classification · Remote sensing · Self-attention · Visual question answering
1 Introduction
Image captioning is the task of explaining an image by including details about all the objects in that image. Remote sensing image captioning means that a description is generated for satellite images. RSI captioning has various applications such as disaster assessment, city planning, terrain scanning, military scouting and information retrieval systems. The captioning task in remote sensing is all the more important because there are thousands of such images, and captioning systems can easily generate captions for them.
N. Murali (B) · A. P. Shanthi Department of Computer Science, College of Engineering, Anna University, Guindy, Chennai, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 S. Roy et al. (eds.), Innovations in Computational Intelligence and Computer Vision, Advances in Intelligent Systems and Computing 1424, https://doi.org/10.1007/978-981-19-0475-2_41
Visual question answering refers to answering questions about the objects in an image. Merging captioning and VQA can bring out many unexplored features of the image, and the captions generated by the system will be very informative and usable in many retrieval systems. BLEU scores are used to determine how effective the system is. Similar work has already been done for natural images, but this paper attempts a different architecture for RSI, since RSI captioning is more challenging than captioning natural images. This paper is sectioned into three parts: Sect. 2 discusses similar works, Sect. 3 explains the step-by-step implementation, and the results are elaborated in Sect. 4.
2 Literature Review
Various techniques have been followed to caption remote sensing images. Template-based methods are effective when all the images are similar. In template-based methods [1–3], the images are passed into a CNN or any feature extraction model, and instead of generating the captions from scratch, a template is given to the model to get its blanks filled. The overhead is greatly reduced because this works like an extractive model, which extracts the words and fills the blanks in the templates. Retrieval-based methods [4] extract the features from an image and compare them with the features of images in a database. If the features match, the caption of the matching image is retrieved and assigned as the caption of this image. The model does not learn anything; instead, it only compares the features and retrieves the caption that matches the most. The last type of model is the generative model. Captions generated by these models are more informative and well constructed in terms of grammar. The most traditional method is to have a CNN extract features and use a simple RNN to generate captions [5]: the image is passed into the CNN to extract the features, and the features and the training captions are passed into the RNN to get the caption. The usage of a fully convolutional network (FCN) [6] also makes object detection more effective, because an FCN can produce a pixel-to-pixel output map rather than a single class label for an arbitrary input image. Denoising techniques [7] are also used in captioning of remote sensing images: the denoising step is applied in the feature extraction stage of the encoder, and this also helps with the multi-scale features that are quite common in RSI. Loss of information during captioning has been handled using an intensive positioning network [8], which uses region proposals to select the best regions and samples from the set of region proposals to position the object and extract the features. Images with objects of different scales have been handled by a multi-scale cropping mechanism [9], where an image is cropped at multiple scales and trained for captioning. Architectures like the transformer have made a great difference in the field of attention [10], using only attention and no other sequence-to-sequence models to generate captions. Overfitting, a common issue with RSI, has been handled using variational auto-encoders [11].
The usage of VQA for captioning natural images [12] has proven that VQA greatly improves the quality of the generated captions. This paper tries to leverage that concept by adding multiple levels of attention in question answering: if VQA improves the captions, then improving the VQA model can in turn greatly improve the captions. The major contributions of this paper are as follows: (1) using VQA for captioning remote sensing images, (2) introducing various levels of attention in the process of VQA, and (3) generating more informative captions which include the count of objects and other features of the objects.
3 Proposed Image Captioning Model
3.1 System Overview
The image captioning model flow can be divided into two steps. The first step is to perform visual question answering (VQA). The next step is to caption the image using the knowledge gained from the VQA model (see Fig. 1). The first step involves feature extraction from the images. The input images are preprocessed and passed to a convolutional neural network. The images are divided into many regions, and region of interest (ROI) pooling is performed. The regions are given as input to a two-dimensional gated recurrent unit (GRU) network to get the region vectors. Attention is performed on the region vectors and the question vector to get the attention map; this attention is termed image region attention. The answer labels are also given to an attention module along with the question vector to get the answer
Fig. 1 Architecture diagram
attention map. The training captions are given as input to the skip-thoughts model, which represents the captions as encoded vectors in which similar sentences have similar values. Hence, when projected onto an n-dimensional space, similar sentences have a smaller distance between them than sentences that differ from each other. This is similar to word embedding, except that the embedding is applied to sentences. Attention is performed on the encoded caption sentences; this is knowledge attention. Joint learning is done over the image region attention map, the answer attention map and the knowledge attention map. The output is the VQA-grounded feature vector. This vector is fed into the language LSTM to get a word at the current time step t. This word is fed back into the model to generate the next word. This process runs as a loop, and finally the caption is obtained as the output.
3.2 Feature Extraction
Feature extraction refers to preprocessing the input data as required by the model and extracting the features so that only the necessary features are fed into the model. Three types of features are needed for this model: features from the input images, the questions and the captions. A convolutional neural network (CNN) is used to extract the features of the images. The CNN does two operations: convolution of the image vector and then pooling of the vector to reduce its dimension. After the last max pooling layer, the vector is extracted, and region of interest (ROI) pooling is performed. ROI pooling means dividing the image vector into multiple regions of equal size and performing pooling on each region. There are two types of features that can be extracted from an image, namely spatial features and semantic features. Spatial features include the background information as well, whereas semantic features are those features that are identified by humans. Semantic features are obtained from the output of the classification or softmax layer, and spatial features are obtained after the max pooling layer [13]. The questions are preprocessed by removing the stop words and encoding the sentences. The questions are passed to a gated recurrent unit (GRU) to get the word sequence encoding; the output from the GRU is Vq. The input captions are encoded by performing word embedding and positional encoding, where positional encoding gives the position of every word in the caption sentence.
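A minimal PyTorch sketch of this stage is given below; the VGG-16 backbone, the quadrant region grid and the output size are illustrative assumptions rather than the paper's exact configuration.

```python
# Sketch: spatial features from a CNN backbone, followed by ROI pooling over
# equal-sized regions.
import torch
import torchvision
from torchvision.ops import roi_pool

backbone = torchvision.models.vgg16(weights="DEFAULT").features.eval()

image = torch.rand(1, 3, 224, 224)             # placeholder preprocessed image
with torch.no_grad():
    fmap = backbone(image)                     # spatial features, e.g. (1, 512, 7, 7)

# One ROI per quadrant of the feature map, as (batch_idx, x1, y1, x2, y2)
h, w = fmap.shape[-2:]
rois = torch.tensor([[0, 0, 0, w / 2, h / 2],
                     [0, w / 2, 0, w, h / 2],
                     [0, 0, h / 2, w / 2, h],
                     [0, w / 2, h / 2, w, h]], dtype=torch.float)
regions = roi_pool(fmap, rois, output_size=(3, 3))   # (4, 512, 3, 3) region features
print(regions.shape)
```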
3.3 Visual Question Answering
Visual question answering consists of three attention mechanisms [14], namely image attention, answer attention and knowledge attention. The image attention map is obtained by passing the image regions from ROI pooling to the attention module along with Vq. Attention is performed on Vq and the image regions; hence, every image region
is attended for every question vector. The answers are divided and categorized into multiple classes, because question answering is treated as a multi-label classification problem; this is similar to image classification, except that every image falls into multiple categories. These answer classes are passed into an RNN along with the corresponding questions, and the output of the RNN is passed into the attention module to get the answer attention map. The captions are passed into the skip-thoughts model to get the encoded vectors, which are passed into the attention model, where attention is performed on the caption vector along with Vq. The output from this attention module is the knowledge attention map. Thus, we get three attention maps [14]: one from attending Vq and the image regions, one from attending Vq and the answer classes, and one from attending Vq and the caption vector. This is done so that the question vector is attended with the image as well as the answers, which enhances the answers generated by the model. All three attention maps are fused together using pointwise multiplication. A perceptron with a few dense layers is designed, where the input to the network is the fused vector from the three attention modules, and the answer classes are projected onto the dense layers as classes. The output from this layer is the answer class corresponding to the question.
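A sketch of the fusion and classification stage in PyTorch is shown below; the feature dimension and number of answer classes are illustrative assumptions.

```python
# Sketch: pointwise fusion of three attention maps plus a small perceptron
# producing multi-label answer-class scores.
import torch
import torch.nn as nn

class FusionClassifier(nn.Module):
    def __init__(self, dim=512, n_answers=100):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(dim, 256), nn.ReLU(),
                                 nn.Linear(256, n_answers))

    def forward(self, img_att, ans_att, know_att):
        fused = img_att * ans_att * know_att   # pointwise multiplication of maps
        return self.mlp(fused)                 # answer-class scores

model = FusionClassifier()
maps = [torch.rand(1, 512) for _ in range(3)]  # placeholder attention maps
logits = model(*maps)                          # train with BCEWithLogitsLoss (multi-label)
print(logits.shape)
```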
3.3.1 Attention Mechanism
Attention is similar to human attention: when shown an image, a human tends to observe a few important things and ignore the rest. This process of selectively viewing something by filtering out the rest is called the attention mechanism [10]. Deep neural networks mimic this behaviour, and hence, making use of an attention mechanism can greatly enhance the output of the model, since the model is able to assign more weight to the important attributes and leave out the less important features. Three values are given as input to the attention model, namely the query, the key and the value. The query is like the question, the key is the keyword pointing out the place the model needs to focus on, and the value is the answer or the output to the query.
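A minimal PyTorch sketch of the standard scaled dot-product attention of [10], with the question acting as the query and image regions as keys and values; the tensor shapes are illustrative assumptions.

```python
import math
import torch

def attention(query, key, value):
    # Scores say where the model should focus; softmax turns them into weights.
    scores = query @ key.transpose(-2, -1) / math.sqrt(query.size(-1))
    weights = torch.softmax(scores, dim=-1)       # the "attention map"
    return weights @ value, weights

q = torch.rand(1, 1, 64)    # e.g. the encoded question (the query)
k = torch.rand(1, 49, 64)   # e.g. 49 image regions (the keys)
v = torch.rand(1, 49, 64)   # region features (the values)
attended, attn_map = attention(q, k, v)
```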
3.4 Caption Generation
Inputs to the caption generation module are the image features, the VQA grounded feature vector and the sample captions. The image features are obtained from the CNN as explained in Sect. 3.2. The sample captions are the input captions that are used for training. The VQA grounded feature vector is the vector extracted from the last layer of the VQA model. This feature vector holds an activation score for every word in the vocabulary corresponding to the image being fed into the system. For example, if there are 100 words in the vocabulary and an image is given as input to the model, the VQA grounded feature vector will be a matrix of dimension 100 × 1, where every word in the vocabulary has an activation score. This activation
score tells the model whether or not to include the word in the caption. A higher activation score means that the word is very relevant to the image, and hence it can be included in the caption. Similarly, a very low activation score means that the word is not related to the image at all, and the model should avoid using that word in the caption. This enhances the caption because the model is refrained from including objects that are not present in the image; it is also encouraged to include all objects that are. An LSTM is used to generate the sequence of words [12]. The first word is the semantic feature of the image. In the next iteration, the second word is generated based on the first word and the image features. The teacher forcing technique is used, where the model is forced to generate the expected word, irrespective of what it has predicted at the current time step t. The beam search technique is used to select the word with the highest probability at the current time step t.
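The following PyTorch sketch illustrates one teacher-forced training step in which the VQA grounded activation scores bias the word distribution; the vocabulary size, dimensions and the additive way of injecting the scores are assumptions for demonstration.

```python
import torch
import torch.nn as nn

vocab_size, embed_dim, hidden = 100, 64, 128     # assumed sizes
embed = nn.Embedding(vocab_size, embed_dim)
lstm = nn.LSTMCell(embed_dim, hidden)
to_vocab = nn.Linear(hidden, vocab_size)

def train_step(init_state, target_caption, vqa_scores):
    """One teacher-forced pass; vqa_scores biases the per-word distribution."""
    h, c = init_state
    loss = 0.0
    inp = target_caption[0]                      # <start> token id
    for t in range(1, len(target_caption)):
        h, c = lstm(embed(inp).unsqueeze(0), (h, c))
        logits = to_vocab(h) + vqa_scores        # add per-word activation scores
        loss = loss + nn.functional.cross_entropy(
            logits, target_caption[t].unsqueeze(0))
        inp = target_caption[t]                  # teacher forcing: feed ground truth
    return loss

h0 = (torch.zeros(1, hidden), torch.zeros(1, hidden))
caption = torch.tensor([1, 5, 9, 2])             # toy token ids
loss = train_step(h0, caption, vqa_scores=torch.zeros(1, vocab_size))
```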
4 Results
4.1 Data Set
The RSVQA data set [15] is used for visual question answering. This data set uses 15-cm resolution aerial images in RGB format, extracted from the high-resolution orthoimagery (HRO) data collection of the USGS. This collection covers most urban areas of the USA, along with a few areas of interest (e.g., national parks). For most areas covered by the data set, only one tile is available, with acquisition dates ranging from 2000 to 2016 and various sensors. RSICD [16] is used for image captioning. It has a total of 10,921 remote sensing images collected from various sources, namely Google Earth, Baidu Map, MapABC and Tianditu. The images are fixed to 224 × 224 pixels with various resolutions, and each image has five sentence descriptions.
4.2 Experimental Results
This section presents the results obtained from performing the above process. The output from the VQA module is shown in Fig. 2. The VQA was handled as a multi-label classification problem, and the model answers various questions regarding area type, area covered, presence of an object and count of an object in the image. Most of the answers match the ground truth. The system was tested with different types of questions. The model is able to differentiate different objects and identify the presence of every object in the image. The count of the objects has been divided into two classes: more than five and less than five (Table 1).
Fig. 2 Sample input image
Table 1 Sample question answers

Questions | Predicted answers
Are houses present in the image? | Yes
How many houses are present? | Less than five
What is the area covered by houses? | 500 m2
What type of area is it? | Residential area
How many roads are there? | Less than five
Is a water area present? | No
How much area do the commercial buildings cover? | 0 m2
The output from the captioning module is shown in Fig. 3. The attention map generated from the final attention module is shown in Fig. 4. For every word generated by the model, an attention map is also generated, showing what the model focused on while generating that particular word.
4.3 Discussion Figure 3a shows that the model was able to identify the presence of a building and many trees. In Fig. 3b, the model has correctly predicted the count of the buildings in the image. Figure 3c identifies that there are no buildings. Hence, the model is able
Fig. 3 Captions generated by the proposed model
Fig. 4 Attention map for a sample image (X-axis and Y-axis represent the width and height of the image, respectively)
to tell the presence and absence of all objects. However, none of the captions included the area covered by the objects. Figure 3d shows that the model is able to construct a complicated sentence without any grammatical error, and it also identifies the count of the object in the given image. In Fig. 3e, the caption includes all objects in the image. The model is able to include the count of the objects present in the input image, which proves the importance of the VQA model. If the VQA grounded feature vector is not used, the model produces very simple captions without any
Table 2 Evaluation metrics

Model | BLEU-1 | BLEU-2 | BLEU-3 | BLEU-4 | Meteor
CSMLF model | 0.5759 | 0.3859 | 0.2832 | 0.2217 | 0.2128
Multimodal | 0.6378 | 0.4756 | 0.4004 | 0.3006 | 0.2905
Attention | 0.7336 | 0.6129 | 0.5190 | 0.4402 | 0.3549
AttrAttention | 0.7571 | 0.6336 | 0.5385 | 0.4612 | 0.3513
VQA image captioning | 0.788 | 0.6812 | 0.698 | 0.5527 | 0.3987
information regarding the objects or their count. In Fig. 3f, however, the model identified only the 'road' but was not able to construct a sentence with that information. The VQA model was able to predict correct answers to the area-related questions, but the captions did not include the area-related information. This could be due to multiple reasons, such as the model not being able to perfectly locate the object or to exactly identify the area covered by it. This can be rectified by using one of the already available object detection models like Fast R-CNN, Mask R-CNN or YOLO.
4.4 Evaluation Metrics
The most commonly used metric is the BLEU score [17]. An overall BLEU score of 78.8% is achieved. The BLEU-1 score uses unigrams, while BLEU-2 uses both unigrams and bigrams; BLEU-3 and BLEU-4 are calculated similarly with 3- and 4-grams. The METEOR score [18] takes into account n-grams, synonyms and root words, and uses the F-measure to calculate the score. The model is able to achieve a METEOR score of 0.39. The CSMLF model [19], attribute attention model [20], multimodal method [21] and attention model [22] are used to compare the results achieved by the proposed model (Table 2).
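For instance, BLEU-n scores can be computed with NLTK as sketched below; the reference and candidate sentences are toy examples, not captions from the data set.

```python
from nltk.translate.bleu_score import sentence_bleu

reference = [['many', 'green', 'trees', 'surround', 'a', 'building']]
candidate = ['green', 'trees', 'are', 'around', 'a', 'building']

# BLEU-1 uses only unigrams; BLEU-4 averages 1- to 4-gram precisions.
bleu1 = sentence_bleu(reference, candidate, weights=(1, 0, 0, 0))
bleu4 = sentence_bleu(reference, candidate, weights=(0.25, 0.25, 0.25, 0.25))
print(bleu1, bleu4)
```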
5 Conclusion
The proposed model for image captioning using visual question answering, and the impact of using VQA, has been discussed in detail. The question answering process is first performed on the images, and the model learns important vocabulary-related features. The visual grounded features were extracted from the VQA model and used to aid the captioning process, and the generated captions were found to be more informative in terms of the count and presence of objects. The model can be further improved by using an object detection model such as Mask R-CNN to exactly locate the objects.
References
1. Gong, Y., Wang, L., Hodosh, M., Hockenmaier, J., Lazebnik, S.: Improving image-sentence embeddings using large weakly annotated photo collections. In: Computer Vision—ECCV 2014, pp. 529–545. Springer International (2014)
2. Sun, C., Gan, C., Nevatia, R.: Automatic concept discovery from parallel text and visual corpora. In: 2015 IEEE International Conference on Computer Vision (ICCV), pp. 2596–2604 (2015). https://doi.org/10.1109/Iccv.2015.298
3. Hodosh, M., Young, P., Hockenmaier, J.: Framing image description as a ranking task: data, models and evaluation metrics. J. Artif. Intell. Res. 47, 853–899 (2013). https://doi.org/10.1613/jair.3994
4. Basaeed, E., Bhaskar, H., Al-Mualla, M.: Supervised remote sensing image segmentation using boosted convolutional neural networks. Knowl. Based Syst. 99, 19–27 (2016). https://doi.org/10.1016/j.knosys.2016.01.028
5. Hoxha, G., Melgani, F., Slagenauffi, J.: A new CNN-RNN framework for remote sensing image captioning. M2GARSS (2020). https://doi.org/10.1109/M2GARSS47143.2020.9105191
6. Shi, Z., Zou, Z.: Can a machine generate humanlike language descriptions for a remote sensing image? IEEE Trans. Geosci. Remote Sens. 55(6) (2017). https://doi.org/10.1109/TGRS.2017.2677464
7. Huang, W., Wang, Q., Li, X.: Denoising-based multiscale feature fusion for remote sensing image captioning. IEEE Geosci. Remote Sens. Lett. 18(3), 436–440 (2020). https://doi.org/10.1109/LGRS.2020.2980933
8. Wang, S., Chen, J., Wang, G.: Intensive positioning network for remote sensing image captioning. In: 8th International Conference, IScIDE 2018, Lanzhou, China, 18–19 Aug 2018
9. Zhang, X., Wang, Q., Chen, S., Li, X.: Multi-scale cropping mechanism for remote sensing image captioning. IGARSS (2019). https://doi.org/10.1109/IGARSS.2019.8900503
10. Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: 31st Conference on Neural Information Processing Systems (NIPS 2017). https://doi.org/10.5555/3295222.3295349
11. Shen, X., Liu, B., Zhou, Y., Zhao, J., Liu, M.: Remote sensing image captioning via variational autoencoder and reinforcement learning. Knowl. Based Syst. 203 (2020). https://doi.org/10.1016/j.knosys.2020.105920
12. Yang, X., Xu, C.: Image captioning by asking questions. ACM Trans. Multimedia Comput. Commun. Appl. 15(2s), Article 55 (2019). https://doi.org/10.1145/3313873
13. Zhang, X., Wang, X., Tang, X., Zhou, H., Li, C.: Description generation for remote sensing images using attribute attention mechanism. Remote Sens. 11, 612 (2019). https://doi.org/10.3390/rs11060612
14. Yu, D., Fu, J., Tian, X., Mei, T.: Multi-source multi-level attention networks for visual question answering. ACM Trans. Multimedia Comput. Commun. Appl. 15(2s), Article 51 (2019). https://doi.org/10.1145/3316767
15. Lobry, S., Marcos, D., Murray, J., Tuia, D.: RSVQA: visual question answering for remote sensing data. IEEE Trans. Geosci. Remote Sens. 58(12) (2020). https://doi.org/10.1109/TGRS.2020.2988782
16. Lu, X., Wang, B., Zheng, X., Li, X.: Exploring models and data for remote sensing image caption generation. IEEE Trans. Geosci. Remote Sens. 56(4) (2018). https://doi.org/10.1109/TGRS.2017.2776321
17. Papineni, K., Roukos, S., Ward, T., Zhu, W.-J.: BLEU: a method for automatic evaluation of machine translation. In: Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics (2001). https://doi.org/10.3115/1073083.1073135
18. Banerjee, S., Lavie, A.: METEOR: an automatic metric for MT evaluation with improved correlation with human judgments. In: Proceedings of the ACL Workshop on Intrinsic and Extrinsic Evaluation Measures for Machine Translation and/or Summarization, pp. 65–72. Association for Computational Linguistics, Ann Arbor, Michigan (2005). aclweb.org/anthology/W05-0909
19. Wang, B., Lu, X., Zheng, X., Li, X.: Semantic descriptions of high-resolution remote sensing images. IEEE Geosci. Remote Sens. Lett. 16(8), 1274–1278 (2019). https://doi.org/10.1109/LGRS.2019.2893772
20. Qu, B., Li, X., Tao, D., Lu, X.: Deep semantic understanding of high resolution remote sensing image. In: 2016 International Conference on Computer, Information and Telecommunication Systems (CITS), pp. 1–5 (2016). https://doi.org/10.1109/CITS.2016.7546397
21. Wiseman, S., Rush, A.: Sequence-to-sequence learning as beam-search optimization. EMNLP (2016). https://doi.org/10.18653/v1/D16-1137
22. Mylonas, S.K., Stavrakoudis, D.G., Theocharis, J.B.: GeneSIS: a GA-based fuzzy segmentation algorithm for remote sensing images. Knowl.-Based Syst. 54, 86–102 (2013). https://doi.org/10.1016/j.knosys.2013.07.018
Auto Target Moving Object with Spy BOT Atharva Ambre, Ashwin Selvarangan, Rushabh Mehrotra, and Sumeet Thakur
Abstract In order to reduce the risk to the lives of soldiers at the borders, various high-tech gadgets such as military tanks and drones have been developed, but there has never been a true alternative to a human soldier. The proposed work focuses on remote and border enforcement in support of the Wi-Fi-based robots that are currently used in defense and military applications. The project's main aim is to provide an alternative smart BOT that follows human commands, which will in turn decrease casualties. Hence, the main domain of the project lies in security and surveillance. This multisensory BOT is used to detect an intruder in war areas. Using the Internet as the communication medium, the robotic vehicle will monitor and surveil autonomously, whereas control will be manual. The Wi-Fi-powered NodeMCU, the DC motors for the wheels and the movement of the gun, and the L298N motor driver are the parent components of the proposed work. MIT App Inventor is used to build the application that controls the BOT's movement. The live feed taken by the BOT is provided to the authorized user. Image processing is responsible for intruder detection and identification. For image processing, we use the OpenCV module, which allows us to record the intruder's movement, noting the entry and exit times along with the date. The gun mounted on the BOT is capable of aiming at an intruder after a command is given by the user. This BOT is built to be a pillar of security and surveillance for the defense team. Keywords Image processing · Face detection · Face recognition · Motion detection · Reciprocating motion · Internet of things · BOT
A. Ambre · A. Selvarangan · R. Mehrotra (B) · S. Thakur Department of Electronics and Telecommunications, Vivekanand Education Society Institute of Technology, Mumbai, India e-mail: [email protected] S. Thakur e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 S. Roy et al. (eds.), Innovations in Computational Intelligence and Computer Vision, Advances in Intelligent Systems and Computing 1424, https://doi.org/10.1007/978-981-19-0475-2_42
1 Introduction
Injuries and deaths are unavoidable in battle, and most casualties are acknowledged for their exemplary loyalty and contribution to the nation. Animals have been trained for various roles, ranging from force defense to military applications, in order to minimize casualties. Animals, however, have drawbacks, since they can only perform in certain environments. Surveillance is also critical in defense applications for keeping an eye on the situation, protecting civilians and taking appropriate action when necessary. This usually happens in critical situations like monitoring conflict zones, enemy territories or a hostage situation, which are critical to a country's defense. Human monitoring of sensitive areas is carried out by trained workers in close proximity, and the chance of losing personnel is high in these situations. A possible solution is bringing robots with high-class sensors and multitasking abilities into the military sector. In light of recent technological advances, defense bodies across the world, with large budgets and political pressure to reduce casualties, are eagerly researching the use of robots not only for spying but also for combat and other missions. Robotics technology is not a recent phenomenon in military applications; it has long been used by armed forces all over the world. In today's digital era, robots in the military can perform a variety of combat tasks, including rescue operations, explosive disarmament, fire support, reconnaissance, logistics support, lethal combat duties and more. These robots can also be used as a replacement for human soldiers, as they are able to handle a wider variety of military duties, such as picking off snipers and attacking enemy areas more effectively. During heavy artillery fire, military robots may provide support in reducing the number of casualties. This paper is targeted at the security and surveillance domain. A detailed description of some components used in the proposed work is given in [1, 2].
2 Block Diagram
The block diagram in Fig. 1 signifies the complete flow of the proposed project. It is divided into two parts: the user-controlled part and the controller, which controls the motion of the BOT as well as the gun. These two parts are connected to each other via the Internet of things (IoT). The left half is the user-controlled part, built using the application created with MIT App Inventor, where inputs are given to the controller. The right half shows how the inputs received from the user via the MIT application are given to the motor driver for the movement of the wheels as well as the gun, with the help of the Wi-Fi module known as the NodeMCU.
Fig. 1 General block diagram of the proposed project
3 Implementation of BOT The main aim is to control the BOT using the Wi-Fi module of the controller, i.e., NodeMCU [3]. Owing to the Wi-Fi module, the BOT will be controlled via an Android/IOS application. As this is built to be an alternative to soldiers, the BOT will have to be suitable for different terrains and different weather conditions. Also, it should be built to carry weight, as a weapon will be loaded on the body of the BOT.
3.1 Design and Requirements
To match all the above requirements, the prototype uses marine plywood as the main body of the BOT, DC gear motors with wheels to move the BOT according to the commands of the user, a motor driver (L298N) to drive the gear motors and a NodeMCU as the main controller. Marine plywood is made in such a way that it lasts longer in humid and wet conditions and also resists fungus (Fig. 2).
3.2 MIT App Inventor
To control the BOT, we need application software, for which we use MIT App Inventor. It was originally developed by Google and is now maintained by the Massachusetts Institute of Technology. This App Inventor is used to develop apps for Android and iOS environments. As MIT App Inventor is aimed at newcomers to computer programming, it uses the Scratch-style block programming language: newcomers can use code blocks to easily create their own applications [5]. The NodeMCU, when connected to Wi-Fi with the appropriate code, outputs an IP address, which is then fed as input to the MIT application. The IP address, when fed to the MIT application, establishes a wireless connection with
Fig. 2 Design of Basic BOT using Blender Software
the NodeMCU controller. Using this connection, the BOT can accept user commands and act accordingly. As security and privacy are primary concerns for our BOT, the network used to control it can easily be protected with strong passwords: only a user with the appropriate application installed and the proper password can control the BOT [4]. Figure 3 shows the display of our application. The title given to the application is BOT CONTROLLER, set using the text option in MIT App Inventor. We have used four buttons for left, right, forward and backward, respectively. Apart from that, one button at the center acts as a brake by forcefully stopping the BOT in its place if something goes out of control. The "Enter IP address" section is where we enter the IP address provided by the NodeMCU as output. Using this input, the application and the BOT sync to the same network, and the user can control the BOT from his/her location.
3.3 Construction of BOT
The marine plywood is clamped with four DC gear motors and four wheels. These motors and wheels are driven using a motor driver (L298N) [6]. The driver has two channels, one for motor A and another for motor B, and six control pins for guiding the direction and speed of the motors: four digital input pins (two per channel) that control direction, and two enable pins (one per channel) that are driven with PWM to control the rpm of the motors. These pins are connected to the NodeMCU's digital and PWM-capable GPIO pins, respectively. After setting up all the appropriate connections of the BOT, we powered the NodeMCU using a power bank and the motor driver using a 12 V power supply.
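The firmware for this wiring would normally be written in the Arduino environment; purely as an illustration in Python, the sketch below shows the same control loop in MicroPython (which NodeMCU boards can also run). The pin assignments, Wi-Fi credentials and the /F and /S command strings are assumptions.

```python
# MicroPython sketch for a NodeMCU/ESP8266 (assumed pins and protocol).
import network, socket
from machine import Pin, PWM

in1, in2 = Pin(5, Pin.OUT), Pin(4, Pin.OUT)    # L298N direction pins, motor A
ena = PWM(Pin(14), freq=1000)                  # L298N enable pin (speed via PWM)

wlan = network.WLAN(network.STA_IF)
wlan.active(True)
wlan.connect('SSID', 'password')               # placeholder credentials
while not wlan.isconnected():
    pass
print('IP address:', wlan.ifconfig()[0])       # fed into the MIT App Inventor app

def forward(speed=700):                        # duty range 0..1023 on ESP8266
    in1.on(); in2.off(); ena.duty(speed)

def stop():
    ena.duty(0)

# Tiny HTTP-style loop: the app sends /F (forward) or /S (stop).
srv = socket.socket()
srv.bind(('', 80)); srv.listen(1)
while True:
    conn, _ = srv.accept()
    req = conn.recv(512)
    if b'/F' in req: forward()
    elif b'/S' in req: stop()
    conn.send(b'HTTP/1.1 200 OK\r\n\r\nOK')
    conn.close()
```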
Fig. 3 Application developed using MIT App Inventor
In conclusion, the BOT was successfully controlled using the application created with MIT App Inventor. The BOT accepted commands via the Wi-Fi module of the controller, i.e., the NodeMCU, and gave the appropriate input to the motor driver, which drove the DC motors and the wheels attached to them easily. The BOT was also tested in different terrains, with satisfactory results.
4 Image Processing
As one of the major domains of the paper lies in security and surveillance, image processing plays a very important role. With its help, the BOT is able to detect any human face that appears in front of it. It will not only detect human faces but, with the help of deep learning, will also be able to recognize any person whose details have been fed to the BOT. As spying is also an important part of the project, with the help of image processing our BOT can monitor and capture any kind of motion in front of it, which helps identify whether an intruder is trying to enter a restricted or secure area. If any motion is detected and an intruder is captured, the incoming and outgoing times are noted by the BOT and made available in the form of a CSV or Excel file to the user in command.
Fig. 4 Face detection using Viola-Jones algorithm
4.1 Face Detection
In the early 2000s, Viola and Jones came up with an object detection framework that could be trained to detect a variety of object classes, but it mainly focused on the problem of detecting a human face. Its robust nature, real-time applicability and proper detection of human faces are the characteristics that made this algorithm very reliable [7]. Face detection plays a vital role here, as our BOT should be able to detect whether any object appearing in front of it is human or not. In the prototype implementation, we have used the Haar cascade frontal face classifier. A Haar cascade classifier is an XML file trained, using a machine learning approach, on a large number of positive and negative images to detect a given object; in the prototype, human faces are detected using the frontal face classifier. With its help, our BOT is able to identify objects and give a real-time output when a human face is detected [8]. Figure 4 shows how our BOT was able to capture and output, in real time, a human face that appeared in front of it, with very high precision.
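A minimal OpenCV sketch of this detection loop is shown below, using the pretrained frontal face cascade shipped with OpenCV; the camera index and detection parameters are assumptions.

```python
import cv2

# Load the pretrained frontal face Haar cascade shipped with OpenCV.
cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + 'haarcascade_frontalface_default.xml')

cap = cv2.VideoCapture(0)                      # BOT camera stream
while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    faces = cascade.detectMultiScale(gray, scaleFactor=1.3, minNeighbors=5)
    for (x, y, w, h) in faces:                 # draw a box around each face
        cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
    cv2.imshow('faces', frame)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break
cap.release()
cv2.destroyAllWindows()
```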
4.2 Face Recognition
With the help of a face recognition model, the system can match a human face from a digital image or video frame against a database of faces. In the prototype, we have used the fundamental concept of the EigenFace algorithm, wherein a face is recognized by taking the unique information about that face, encoding it and then comparing it with the encoded representations of the facial images previously stored in the system's database. The comparison in this technique is performed through eigenvector calculation and
Fig. 5 Face recognition using EigenFace algorithm
by representing those calculations in the form of a matrix. However, the EigenFace technique is restricted to images having frontal faces [9]. The use of face recognition in our model is thus to recognize those faces which have been detected by our BOT using the Viola-Jones algorithm. Figure 5 shows the implementation of face recognition in our model. Here, images of myself, i.e., Rushabh, and of Lionel Messi were fed into the system as digital images and encoded by the system. The system, with the help of the code, then encoded all the faces that appeared in front of the camera and compared them with the stored encodings. When a match was found, it recognized that face and gave a real-time output to the user controlling the BOT.
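A minimal sketch of this recognition flow using the EigenFace recognizer from opencv-contrib-python is given below; the file names, image size and labels are placeholders.

```python
import cv2
import numpy as np

# EigenFace recognizer; training images must be grayscale and all the
# same size (frontal faces, as the text notes).
recognizer = cv2.face.EigenFaceRecognizer_create()

faces = [cv2.resize(cv2.imread(p, cv2.IMREAD_GRAYSCALE), (200, 200))
         for p in ['rushabh.jpg', 'messi.jpg']]      # placeholder image files
labels = np.array([0, 1])                            # 0 = Rushabh, 1 = Messi
recognizer.train(faces, labels)

probe = cv2.resize(cv2.imread('unknown.jpg', cv2.IMREAD_GRAYSCALE), (200, 200))
label, distance = recognizer.predict(probe)          # lower distance = closer match
print('matched label:', label, 'distance:', distance)
```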
4.3 Motion Detection
For the purpose of spying, motion detection plays a very important role in the prototype. Motion detection is achieved by building a representation of the scene, known as the background model, and thereafter observing any kind of deviation from the background model in each incoming frame [10]. With the help of motion detection, the BOT is able to identify any object in motion. Since motion detection using image processing works entirely on background subtraction or frame differencing, the complete process operates on grayscale images. Thus, even in low-light conditions and without a high-end camera, any object in motion can be captured with the help of image subtraction.
In support of the code, the BOT not only identifies any object in motion but also notes the incoming and outgoing time, along with the date, of that moving object. Figure 6 shows how motion detection actually works in the background by converting real-time frames into grayscale and then finding deviations in the frame, whereas Fig. 7 shows the actual real-time frame, with a red rectangle marking the person in motion, for the user controlling the BOT.

Fig. 6 Grayscale image
Fig. 7 Motion detection in real time
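A minimal frame-differencing sketch in OpenCV, logging entry and exit timestamps to a CSV file as described above; the threshold and contour-area values are assumptions.

```python
import cv2
import csv
from datetime import datetime

cap = cv2.VideoCapture(0)
first_gray = None
in_motion, events, start = False, [], None

while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.GaussianBlur(cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY), (21, 21), 0)
    if first_gray is None:
        first_gray = gray                       # background model (first frame)
        continue
    # Frame differencing against the background, then threshold the deviation.
    delta = cv2.absdiff(first_gray, gray)
    thresh = cv2.threshold(delta, 30, 255, cv2.THRESH_BINARY)[1]
    contours, _ = cv2.findContours(thresh, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    moving = any(cv2.contourArea(c) > 5000 for c in contours)
    if moving and not in_motion:                # intruder enters the scene
        start, in_motion = datetime.now(), True
    elif not moving and in_motion:              # intruder leaves the scene
        events.append((start, datetime.now()))
        in_motion = False
    cv2.imshow('feed', frame)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

with open('intruder_log.csv', 'w', newline='') as f:
    csv.writer(f).writerows(events)             # incoming/outgoing timestamps
cap.release()
```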
Table 1 Incoming and outgoing date and time of the intruder in motion

Sr. No | Incoming date and time | Outgoing date and time
1 | 2021-05-04 12:26:47 | 2021-05-04 12:26:49
2 | 2021-05-04 12:26:50 | 2021-05-04 12:26:53
3 | 2021-05-04 12:26:54 | 2021-05-04 12:26:58
4 | 2021-05-04 12:27:00 | 2021-05-04 12:27:04
5 | 2021-05-04 12:27:05 | 2021-05-04 12:27:13
6 | 2021-05-04 12:27:14 | 2021-05-04 12:27:15
7 | 2021-05-04 12:27:16 | 2021-05-04 12:27:19
8 | 2021-05-04 12:27:20 | 2021-05-04 12:27:28
Table 1 denotes the incoming and outgoing date and time of the intruder (person/object) who appeared in front of the BOT and was in motion.
5 Installation of Dummy Weapon
The most critical takeaway from the following section is the key aspects to consider when deploying a gun on a BOT that is being used for high-level security purposes. The emphasis is on how the gun fires when, and only when, the user commands. Since the installed weapon will be far away from the user, it is important to use IoT to activate it. Aside from triggering, the mounted weapon should also rotate a certain number of degrees to allow the user to easily shoot down the target. All such relevant factors are discussed ahead.
5.1 Reciprocating Motion
A reciprocating motion is a linear motion that repeats itself up and down or back and forth. It is used in a wide range of applications, including reciprocating engines and pumps. A single reciprocation loop is made up of two opposing motions called strokes. Using a crank, circular motion can be converted to reciprocating motion, and vice versa. Figure 8 shows how the circular motion of the wheel results in the linear motion of the slider, thereby explaining the concept of reciprocating motion.
Fig. 8 Reciprocating motion [13]
5.2 Gun Triggering
Until now, the proposed work has shown how it can locate an intruder near the border; but with the rapid developments at the border, spying is no longer enough. We need something that can behave exactly like a human soldier and engage the intruder on the spot. The surveillance process is handled by image processing, and for security purposes we have mounted a gun on the BOT. Since shooting anyone is a serious commitment, it is imperative that the operator issues the command for it. The gun is moved up and down using the reciprocating motion principle explained earlier. The servo motor plays a key role in triggering the gun. A servo motor is a self-contained electrical device that rotates parts of a machine with high efficiency and great precision. We have introduced an extra function (button) in the MIT App Inventor application which, when pressed, rotates the shaft of the servo motor connected to the trigger of the gun. For better precision, we use a Tower Pro 9 g servo motor. The complete action involves triggering the gun and letting the trigger come back to its rest position [6].
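As an illustration, a MicroPython sketch of the trigger action is given below; the GPIO pin and the duty-cycle values for the rest and fire positions are assumptions that would need calibration for the actual servo.

```python
# MicroPython sketch (assumed wiring): a Tower Pro 9 g servo on GPIO12.
from machine import Pin, PWM
import time

servo = PWM(Pin(12), freq=50)      # hobby servos expect a 50 Hz signal

def trigger():
    servo.duty(110)                # rotate the shaft to pull the trigger (assumed duty)
    time.sleep(0.5)
    servo.duty(40)                 # return the trigger to its rest position (assumed duty)

# Called when the extra "fire" button in the MIT App Inventor app is pressed.
trigger()
```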
6 Comparison with Existing Systems See Table 2.
7 Results
Figure 9 below shows the complete BOT with the gun movement and triggering mechanism and the wireless camera. The BOT could perform image processing, i.e., face detection, face recognition and motion detection, using the installed wireless camera.
Table 2 Comparison with UGV and TRAP [11, 12]

Attributes | Auto target moving object with Spybot | Unmanned ground vehicle (UGV) | Tele present rapid aiming platform (TRAP)
Accuracy | High | Low | High
Face detection and recognition | Present | Not present | Not present
Motion detection | Present | Not present | Not present
Controls | Manual | Automatic | Manual
Max speed | 20 kmph | 8 kmph | Stationary
Weapon | Present | Not present | Present
Cost | Moderate | High | Low
Fig. 9 Side and front view of the complete bot
The BOT can also shoot down a target in its environment, via the installed servo motor, but only after a command from the individual through the MIT App Inventor application.
8 Conclusion and Future Scope
The future will be more secure if robots are involved in the defense sector. The main advantage is that the user is notified in advance about an intruder in the selected premises. When we gave a 12 V supply to the BOT, it was able to move in all possible directions as commanded, which signifies that the MIT application works efficiently. When tested on rough roads, the BOT's wheels were capable enough to run smoothly. Motion detection and face recognition using image processing resulted in 95% accuracy. As a potential future scope, tracking and navigating the military BOT would make the project more successful, so the idea of GPS is unavoidable: installing a GPS system will enable us to monitor the BOT's position and guide it more efficiently. Regardless of the project's scale, power consumption is an important consideration; as a result, mounting a solar panel on the military BOT would aid the BOT's energy consumption and utilization.
Acknowledgements Our research was carried out under the supervision of Mr. Chintan Jethva of Vivekanand Education Society's Institute of Technology. The authors would like to acknowledge his helpful guidance and expertise. The authors also want to thank Dr. Ramesh Kulkarni for providing appropriate suggestions at the proper time. Last but not least, the authors would like to acknowledge the contribution of Mr. R.N. Dalvi in constructing the BOT as required.
References
1. Military Robots—A Glimpse from Today and Tomorrow. In: 8th International Conference on Control, Automation, Robotics, and Vision, Kunming, China, 9 December 2004
2. Surveillance Robot for Defence Environment. IJRAR—Int. J. Res. Anal. Rev. 6(2) (2019)
3. IoT Practices in Military Applications. In: Proceedings of the Third International Conference on Trends in Electronics and Informatics, ICOEI (2019)
4. Spy Robot for Military Application Using Wi-Fi Module. HBRP Publication (2018)
5. Introduction to Programming Using Mobile Phones and MIT App Inventor. IEEE Conference (2020)
6. DC motors and servo-motors controlled by Raspberry Pi 2B. MATEC Web of Conferences (2017)
7. Role of Image Processing in Defence and Military Electronics. IEEE Colloquium on Role of Image Processing in Defence and Military Electronics
8. Face Detection System Based on Viola-Jones Algorithm. In: 6th International Engineering Conference on Sustainable Technology and Development (IEC-2020), Erbil, Iraq
9. Face detection and recognition using OpenCV. Int. Res. J. Eng. Technol. (IRJET) (2018)
10. Motion detection using image processing. Int. J. Sci. Res. (IJSR) (2015)
11. TRAP—Spiral Development of a Lightweight Remote Weapon System and Integration into a Mobile Sensor–Shooter Network—McConnell Presentation
12. Unmanned Aerial and Ground Vehicle (UAV-UGV) System Prototype for Civil Infrastructure Mission
13. Citation for Reciprocating Motion. https://www.firgelliauto.com/blogs/news/types-of-motion
Power System Restoration at Short Period of Time During Blackout by Plugin Hybrid Electric Vehicle Station Using Artificial Intelligence R. Hariharan
Abstract This paper presents a comparative analysis of two algorithms, the Plugin Hybrid Electric Vehicle (PHEV) station integrated PPSR algorithm and the plain PPSR algorithm, for restoring a power system from blackout in a short period of time using artificial intelligence. During restoration, the PHEV system is treated as a black start generation unit to improve generation capability and pick up the critical loads in the power system. Conventional methodologies take more time to restore the system. In this paper, the PHEV station integrated PPSR algorithm is implemented to restore the power system in a parallel manner on the IEEE 39 bus system, and the results are compared with the PPSR algorithm. Based on the obtained results, the PHEV integrated PPSR algorithm takes less time than the PPSR algorithm, and the system also promotes green energy in the environment. The PHEV integrated PPSR algorithm takes 205 min, and its improvement in efficiency over the PPSR algorithm is 16.32%. Keywords Novel fuzzy logic system · Blackout restoration · PPSR algorithm · Power system restoration · Plugin hybrid electric vehicle
1 Introduction
In recent years, blackout restoration has attained much attention from researchers, as power systems experience more failures and blackouts due to natural disasters, overloading, equipment failure and instability [1]. Power system restoration is a complex process, with steps comprising generation capability optimization, transmission path optimization and load optimization, and it must satisfy many constraints and conditions to recover the system. The generation capability optimization process starts with all the generation units in the power system. A challenging task in generation capability
R. Hariharan (B) Saveetha School of Engineering, Saveetha Institute of Medical and Technical Sciences, Chennai, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 S. Roy et al. (eds.), Innovations in Computational Intelligence and Computer Vision, Advances in Intelligent Systems and Computing 1424, https://doi.org/10.1007/978-981-19-0475-2_43
optimization is finding the black start (BS) unit, which can then start the non-black start (NBS) units: a black start unit can start without any external electrical source, whereas a non-black start unit needs an electrical source to start [2, 3]. Normally, hydro and diesel plants are considered black start units. After all the generation units in the power system have been started, the challenging task in transmission path optimization is to find the shortest transmission path to recover the system [4]. Load optimization is picking up the load demand without violating the constraint limits. This paper focuses on the generation capability optimization process, improving the generation capability in a short period of time by integrating a PHEV system into PPSR using a fuzzy logic system. The plugin hybrid electric vehicle charging station is considered a black start unit, and its power is fed to the non-black start units as cranking power [3, 5]. The fuzzy logic system controls the PHEV charging station's cranking power based on the requirement, at the constraint level [6].
2 Plugin Hybrid Electric Vehicle (PHEV) Accumulation
According to the International Energy Agency, the penetration of electric vehicles was 2.8% of the total vehicle fleet after the first quarter of 2020. Regional statistics vary: penetration in 2019 was 4.9% in China and 3.5% in Europe overall, although some individual European countries have very high penetration rates; Norway has the highest EV penetration in the world [7]. The growing number of PHEV stations adds flexible generation and distributed energy storage to distribution systems. The PHEV charging station acts as an aggregator that collects the parameters of the PHEV batteries, communicates with operators and decides the charging or discharging mode of each PHEV. Therefore, the aggregated PHEVs can be regarded as one generation unit or load in the distribution system. During blackout restoration, a PHEV can act as either a generation source or a load. A PHEV station can be used to supply the feeder to a critical load immediately after a blackout, and it can also serve as a black start unit that starts a non-black start unit by feeding cranking power to it [8]. The system consists of transmission and distribution restoration feeders with loads and PHEV stations connected to each other. PHEVs help compensate for the imbalance between generation and load demand in the power system [9]. A PHEV charging station operates in charging and discharging modes: during load pickup it acts in discharging mode in the distribution system, and during excess generation capacity it acts in charging mode. In this paper, a fuzzy logic controller is used to produce the optimized PHEV allotment with respect to the NBS unit cranking power and the critical load demand in the IEEE 39 bus system. Parallel power system restoration is applied to the IEEE 39 bus system with the integration of the PHEV system for a short restoration time period, followed by a comparative analysis of the conventional PPSR method and the PHEV integrated method with respect to system restoration time.
Fig. 1 Block diagram of the proposed system: transmission restoration (generation capability improvement, transmission line energization) and distribution restoration (critical load pickup sequence, other load pickup sequence)
3 Materials and Methods
The block diagram of the proposed method is shown in Fig. 1. A fuzzy logic controller is considered a well-suited optimization tool to produce a crisp output. Based on the requirements of the non-black start unit cranking power and the critical load pickup power, the fuzzy system allocates the PHEV output power to the respective critical load and NBS unit. The proposed system is simulated using the NI LabVIEW Fuzzy System Designer and G coding (virtual instrumentation) [10].
4 Problem Formulation

4.1 Objective Function 1: Pick up the Critical Load

The objective is to restore the loads within the restoration time period, activating each load based on its priority:

$$E_{Load} = \sum_{i \in \Omega_L} (T - t_{Lstart,i}) \cdot P_{l,i} \qquad (1)$$

where $T$ = pre-set maximum restoration time, $t_{Lstart,i}$ = pickup time of load $i$, $P_{l,i}$ = MW of load $i$, and $\Omega_L$ = set of all loads.

$$\mathrm{MAX} \sum_{i \in \Omega_L} W_{L,i} \cdot (T - t_{start,i}) \cdot P_{l,i} \qquad (2)$$

where $W_{L,i}$ = load priority (the most important load is picked up first) and

$$t_{start,i} = \sum_{t} \left( (1 - \mu_{l,it}) + 1 \right) \qquad (3)$$

$\mu_{l,it}$ = status of load $i$ at time $t$ ($\mu_{l,it} = 1$ for an online load; $\mu_{l,it} = 0$ for an offline load). Then (2) is equivalent to

$$\mathrm{MAX} \sum_{i \in \Omega_L} W_{L,i} \cdot P_{l,i} \cdot \sum_{t} (\mu_{l,it} - 1) \qquad (4)$$

4.2 Objective Function 2: Improvement of Generation Capability During Restoration Period

$$E_G = \sum_{i \in \Omega_G} \left( E_{Gcap,i} - E_{Gstart,i} \right) \qquad (5)$$

where $E_{Gcap,i}$ = MW generation capability of unit $i$ and $E_{Gstart,i}$ = start-up power requirement of unit $i$.

4.3 System Constraints

Real power balance:

$$\sum_{i \in \Omega_G} P_{G,it} \ge \sum_{j \in \Omega_L} \mu_{L,jt} P_{L,jt} + \sum_{k \in \Omega_B} P_{B,kt} \qquad (6)$$

where $P_{G,it}$ = MW output of generator $i$ at time $t$, $\Omega_G$ = set of all generators, $P_{B,kt}$ = MW input/output of battery $k$ at time $t$, and $\Omega_B$ = set of all PHEV batteries.

Reactive power balance:

$$\sum_{i \in \Omega_G} Q_{G,it} \ge \sum_{j \in \Omega_L} \mu_{L,jt} Q_{L,jt} \qquad (7)$$

where $Q_{G,it}$ = MVAR output of generator $i$ at time $t$ and $Q_{L,jt}$ = MVAR demand of load $j$ at time $t$.
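To make the optimization targets concrete, the following NumPy sketch evaluates objectives (1) and (4) and checks the real power balance (6) on toy data; all numeric values are illustrative assumptions, not system data from the paper.

```python
import numpy as np

# Illustrative data: 3 loads over a 5-step horizon (values are assumptions).
T = 5                                    # pre-set maximum restoration time (steps)
P_l = np.array([10.0, 25.0, 5.0])        # MW of each load
W_L = np.array([3.0, 1.0, 2.0])          # load priorities
t_start = np.array([1, 3, 2])            # pickup time of each load

# Objective (1): total restored load-energy.
E_load = np.sum((T - t_start) * P_l)

# mu[i, t] = 1 once load i is online at time t (follows from t_start).
mu = (np.arange(T)[None, :] >= t_start[:, None]).astype(float)

# Objective (4): the equivalent weighted form; maximizing it favors
# picking up high-priority, high-MW loads as early as possible.
obj4 = np.sum(W_L * P_l * np.sum(mu - 1.0, axis=1))

# Constraint (6) at one time step: generation must cover online load + PHEV charging.
P_G = np.array([30.0, 15.0])             # MW output of each generator
P_B = np.array([2.0])                    # MW drawn by the PHEV batteries
t = 3
feasible = P_G.sum() >= np.sum(mu[:, t] * P_l) + P_B.sum()
print(E_load, obj4, feasible)
```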
5 Fuzzy Logic Controller Development System
Fuzzy logic controllers are based on fuzzy sets. The fuzzy input sets are the critical load (PL) and the non-black start generating unit cranking power (PNBS Start); the fuzzy output set is the black start unit output power (PPHEV). The input variables each have four linguistic variables and the output variable has five, all defined by triangular membership functions. The input and output variables of the PHEV in the fuzzy logic window are shown in Fig. 2, and the fuzzy rules for the proposed system are shown in Table 1. If–then rules are used to develop the fuzzy rule-based system; a total of 16 rules are developed. The defuzzification method is centre of area, the degree of support is 1, the consequent implication is minimum and the antecedent connective is AND (minimum). The simulation result of the PHEV integrated system restoration fuzzy logic controller is shown in Fig. 3. The fuzzy test system shows the simulation result based on the input variables: the weight index value, the input–output relationship and the plot variables. The simulation result for the fuzzy logic output, weight and invoked rule for the PHEV system using the fuzzy logic controller is shown in Fig. 4.
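The controller itself was built in NI LabVIEW's Fuzzy System Designer; purely as an illustration, the sketch below reproduces a subset of the same design (triangular membership functions, AND-minimum antecedents, centroid defuzzification) with the scikit-fuzzy library. The normalized universe of discourse and the example inputs are assumptions.

```python
import numpy as np
import skfuzzy as fuzz
from skfuzzy import control as ctrl

u = np.linspace(0, 1, 101)                     # normalized power universe (assumption)
load = ctrl.Antecedent(u, 'critical_load')
crank = ctrl.Antecedent(u, 'nbs_crank_power')
phev = ctrl.Consequent(u, 'phev_output')       # centroid defuzzification by default

# Triangular membership functions, as in the paper.
load.automf(names=['zero', 'low', 'med', 'high'])
crank.automf(names=['zero', 'low', 'med', 'high'])
phev.automf(names=['zero', 'low', 'med', 'high', 'very_high'])

# A few of the 16 rules from Table 1 (the full table follows the same pattern).
rules = [
    ctrl.Rule(load['zero'] & crank['zero'], phev['zero']),
    ctrl.Rule(load['zero'] & crank['med'], phev['med']),
    ctrl.Rule(load['med'] & crank['low'], phev['high']),
    ctrl.Rule(load['high'] & crank['high'], phev['very_high']),
]

sim = ctrl.ControlSystemSimulation(ctrl.ControlSystem(rules))
sim.input['critical_load'] = 0.47
sim.input['nbs_crank_power'] = 0.43
sim.compute()
print(sim.output['phev_output'])               # crisp PHEV allotment
```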
Fig. 2 PHEV integrated PPSR algorithm—Fuzzy logic controller input and output variables

Table 1 Fuzzy rules—PHEV integrated PPSR algorithm

S. no | Critical load | NBS unit crank up power | PHEV unit
1 | Zero | Zero | Zero
2 | Zero | Low | Low
3 | Zero | Med | Med
4 | Zero | High | High
5 | Low | Zero | Low
6 | Low | Low | Med
7 | Low | Med | Med
8 | Low | High | High
9 | Med | Zero | Med
10 | Med | Low | High
11 | Med | Med | High
12 | Med | High | Very high
13 | High | Zero | High
14 | High | Low | Very high
15 | High | Med | Very high
16 | High | High | Very high
Fig. 3 PHEV integrated PPSR algorithm—Fuzzy logic controller test system
Fig. 4 PHEV integrated PPSR algorithm—Fuzzy logic controller test system
6 Parallel Power System Restoration (PPSR)—Conventional Method
PPSR is implemented on the IEEE 39 bus system, which is sectionalized into three islands, separated from each other. Each island starts restoring itself, and the islands are then synchronized with one another. In Island 1, the BS unit is G10, and the NBS units that receive cranking power from it are G1, G8 and G9. For the BS unit (G10) restoration process, actions P2 and P1 are taken to start up the BS unit. The BS unit provides cranking power to start G8, G1 and G9, which takes 90 min. After the generation capability improvement, all the loads L1–L8 are restored simultaneously. The total time to restore Island 1 is 205 min; the Island 1 restoration path details for PPSR are shown in Table 2. In Island 2, the BS unit is G7, and the NBS units that receive cranking power are G6 and G4. For the BS unit (G7) restoration process, actions P2 and P1 are taken to start up the BS unit. The BS unit provides cranking power to start G6 and G4, which takes 60 min. After the generation capability improvement, all the loads L9–L12 are restored simultaneously. The total time to restore Island 2 is 130 min; the Island 2 restoration path details are shown in Table 3. In Island 3, the BS unit is G5, and the NBS units that receive cranking power are G3 and G2. For the BS unit (G5) restoration process, actions P2 and P1 are taken to start up the BS unit. The BS unit provides cranking power to start G3 and G2, which takes 75 min. After the generation capability improvement, all the loads L13–L19 are restored simultaneously. The total time to restore Island 3 is 200 min; the Island 3 restoration path details are shown in Table 4. The PPSR simulation results for the IEEE 39 bus system are summarized in Fig. 5.

Table 2 PPSR algorithm in IEEE 39 bus system: Island 1
Restored elements | Path | Actions | Time taken (min)
G10 (BS) | B30 | P2 P1 | 20
G8 (NBS) | B30-B2-B25-B37 | P2 P1 | 30
G1 (NBS) | B30-B2-B1-B39 | P2 P1 | 25
G9 (NBS) | B30-B2-B25-B26-B28-B29-B38 | P2 P1 | 35
L1 | B9 | P2 P5 | 10
L2 | B3 | P2 P4 | 15
L3 | B18 | P4, P5 | 15
L4 | B25 | P2 P5 | 10
L5 | B26 | P2 P5 | 10
L6 | B27 | P2 P5 | 15
L7 | B29 | P2 P4 | 10
L8 | L5, L6 | – | 10
Total | | | 205
Table 3 PPSR algorithm in IEEE 39 bus system: Island 2

Restored elements | Path | Actions | Time taken (min)
G7 (BS) | B36 | P2 P1 | 20
G6 (NBS) | B36-B23-B22-B35 | P2 P1 | 30
G4 (NBS) | B36-B23-B19-B33 | P2 P1 | 25
L9 | B21 | P2 P5 | 15
L10 | B9 | P2 P5 | 10
L11 | B3 | P2 P4 | 15
L12 | B18 | P4, P5 | 15
Total | | | 130
Table 4 PPSR algorithm in IEEE 39 bus system: Island 3

Restored elements | Path | Actions | Time taken (min)
G5 (BS) | B34 | P2 P1 | 20
G3 (NBS) | B34-B20-B17-B15-B14-B13-B10-B32 | P2 P1 | 40
G2 (NBS) | B34-B20-B15-B14-B13-B12-B11-B6-B31 | P2 P1 | 35
L13 | B20 | P2 P5 | 10
L14 | B13 | P2 P5 | 10
L15 | B15 | P2 P4 | 15
L16 | B31 | P4, P5 | 15
L17 | B15-B4 | P2 P5 | 20
L18 | B7 | P2 P5 | 15
L19 | B8-B9 | P2 P5 | 20
Total | | | 200
Fig. 5 Comparative analysis, PPSR versus PHEV integrated PPSR (restoration time in minutes for Island 1, Island 2, Island 3 and the total)
7 Results and Discussion
Table 5 shows the optimized results of the fuzzy logic controller based on the input variables. The fuzzy logic controller produced the optimized PHEV output based on the requirements of the critical load and the NBS unit cranking power, with all constraints satisfied. The time consumption when using the PHEV for PSR in the IEEE 39 bus system is shown in Table 6. In the PHEV integrated PPSR algorithm, all NBS units are started by the PHEV system, and the critical loads of each island are also started by the PHEV system. In Island 1, L1–L4 are considered critical loads, and the total time taken to restore Island 1 is 165 min. In Island 2, L9 and L10 are considered critical loads, and the total time taken to restore Island 2 is 110 min. In Island 3, L13,

Table 5 Fuzzy simulation result—PHEV integrated PPSR algorithm
S. no | Input: Critical load (PL) | Input: NBS crank power (PNBS Start) | Output: BS unit (PPHEV) | Weight index | Invoked rule
1 | 0 | 0.427807 | 0.500 | 0.639037 | 3
2 | 0.47058 | 0.427807 | 0.7999 | 0.639037 | 11
3 | 0.877005 | 0.3475 | 1 | 0.26232 | 14
4 | 0.508021 | 0.08556 | 0.80009 | 0.42780 | 10
5 | 0.50802 | 0.641711 | 0.80411 | 0.29144 | 11
6 | 0.812834 | 0.64171 | 1 | 0.291 | 15
Table 6 Fuzzy simulation result—PHEV integrated PPSR algorithm

Island 1 process | Restoration time (min) | Island 2 process | Restoration time (min) | Island 3 process | Restoration time (min)
G10 (BS) | 20 | G7 (BS) | 20 | G5 (BS) | 20
G8 (NBS) | 20 | G6 (NBS) | 20 | G3 (NBS) | 20
G1 (NBS) | 20 | G4 (NBS) | 20 | G2 (NBS) | 20
G9 (NBS) | 20 | L9 (CL) | 10 | L13 (CL) | 10
L1 (CL) | 10 | L10 (CL) | 10 | L14 (CL) | 10
L2 (CL) | 10 | L11 | 15 | L15 | 15
L3 (CL) | 10 | L12 | 15 | L16 (CL) | 10
L4 (CL) | 10 | – | – | L17 (CL) | 10
L5 | 10 | – | – | L18 | 15
L6 | 15 | – | – | L19 | 20
L7 | 10 | – | – | – | –
L8 | 10 | – | – | – | –
Island 1 total | 165 | Island 2 total | 110 | Island 3 total | 150
Table 7 Comparative analysis—PHEV integrated PPSR algorithm versus PPSR

Island | PPSR (conventional method) | PPSR integrated with PHEV
Island 1 | 205 min | 165 min
Island 2 | 130 min | 110 min
Island 3 | 200 min | 150 min
Max (Island 1, Island 2, Island 3) | 205 min | 165 min
With synchronous subsystem | 245 min | 205 min
Total restoration time for IEEE 39 bus in hours | 4.0833 h | 3.41 h
L14, L16 and L17 are considered critical loads, and the total time taken to restore Island 3 is 150 min. The total restoration time is 205 min to restore the whole system. The comparative analysis of PPSR and PHEV integrated PPSR is shown in Fig. 5, and Table 7 compares the PPSR algorithm and the PHEV integrated PPSR algorithm. The PHEV integrated PPSR method takes less restoration time than the PPSR algorithm: the PPSR algorithm takes 245 min to restore the whole system, while the PHEV integrated PPSR algorithm takes 205 min. The improvement in efficiency of the proposed system is 16.32% over the PPSR method. A limitation of the proposed system concerns the uncertainty of the PHEV battery charging and discharging module. Future work is to design a self-healing power system for quick recovery from blackouts and faults, which would help make a sustainable power system possible.
8 Conclusion
Based on the obtained results, the PHEV integrated PPSR algorithm restores the system in less time than the PPSR algorithm: the PPSR algorithm takes 245 min to restore the whole system, whereas the PHEV integrated PPSR algorithm takes 205 min. The improvement in efficiency of the PHEV integrated PPSR algorithm over the PPSR algorithm is 16.32%.
References
1. Talib, A., Najihah, D., et al.: Power system restoration planning strategy based on optimal energizing time of sectionalizing islands. Energies 11(5), 1316 (2018)
2. Patsakis, G., et al.: Optimal black start allocation for power system restoration. IEEE Trans. Power Syst. 33(6), 6766–6776
3. Hariharan, R.: Design of controlling the charging station of PHEV system based on virtual instrumentation, pp. 43–46 (2012)
4. Rajesh, S., Hariharan, R., Yuvaraj, T.: Cuckoo search algorithm based critical load restoration with DGR using virtual instrumentation. Int. J. Innov. Technol. Explor. Eng. (2019)
5. Roggatz, C., Power, M., Singh, N.: Power system restoration: meeting the challenge to resiliency from distributed generation. IEEE Power Energ. Mag. 18(4), 31–40 (2020)
6. Qiu, F., Li, P.: An integrated approach for power system restoration planning. Proc. IEEE 105(7), 1234–1252 (2017)
7. The State of the Electric Vehicle Market (2020). 30 November 2020. https://www.newenergysolar.com.au/renewable-insights/renewable-energy/the-state-of-the-electric-vehicle-market
8. Hariharan, R., Usha Rani, P.: Blackout restoration process by PHEV charging station integrated system using virtual instrumentation. In: 2017 Fourth International Conference on Signal Processing, Communication and Networking (ICSCN). IEEE (2017)
9. Hariharan, R., Usha Rani, P., Muthu Kannan, P.: Sustain the critical load in blackout using virtual instrumentation. In: Intelligent and Efficient Electrical Systems, pp. 77–88. Springer, Singapore (2018)
10. Hariharan, R.: Design of controlling the smart meter to equalize the power and demand based on virtual instrumentation. In: 2013 International Conference on Power, Energy and Control (ICPEC). IEEE (2013)
Robust Adversarial Training for Detection of Adversarial Samples Sandip Shinde, Jatan Loya, Shreya Lunkad, Harsh Pandey, Manas Nagaraj, and Khushali Daga
Abstract The advent of machine learning in our lives provides much needed convenience and speed, but it also comes with some unwanted danger. Processes are increasingly governed by machine learning models, which are prone to adversarial samples that could lead to catastrophic events such as car crashes in autonomous cars. Adversarial attacks range from the one pixel attack to the fast gradient sign method attack, and potential defences against these attacks vary from denoisers to adversarial learning. A system for robust adversarial learning is proposed to protect against real-life dangers which could be averted if such systems are implemented and further advanced. Keywords Adversarial attack · Defence · Deep learning · Fast gradient sign method · Basic iterative method attack
1 Introduction
There has been significant advancement in deep neural networks (DNN) in recent years. The applications of DNNs range over a variety of areas such as face recognition, speech recognition, natural language processing, fraud detection, virtual assistants and many more. There has been a massive increase in the adoption of machine learning and deep neural networks in our lives, right from self-driving automated cars to smart homes to innovations in the banking sector. It has become an essential and deep-rooted part of our daily lives. However, with increased accessibility, the threat increases. DNNs are quite vulnerable to malicious attacks caused by adversarial samples. These adversarial samples are generated by perturbing poisoned data into the data set, which distorts the data set and forces the DNN models to misbehave. Adversaries can easily add malignant perturbations to a machine learning model that go unnoticed by humans. These perturbations cause the machine learning model to malfunction and give the wrong prediction with very high confidence.
S. Shinde (B) · J. Loya · S. Lunkad · H. Pandey · M. Nagaraj · K. Daga Vishwakarma Institute of Technology, Pune, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 S. Roy et al. (eds.), Innovations in Computational Intelligence and Computer Vision, Advances in Intelligent Systems and Computing 1424, https://doi.org/10.1007/978-981-19-0475-2_44
This might lead to unfavourable results at the output of the DNN model. DNN models are quite robust, but they cannot by themselves avoid attacks like adversarial samples. With the increasing influence of technology, it is essential to ensure the security and robustness of machine learning and deep learning neural networks. There are three broad categories of attacks based on the threat model: white-box, black-box and grey-box attacks. The main difference between these three is the adversaries' knowledge of the data sets, algorithms and model architecture. In white-box attacks, the attacking adversaries are presupposed to have complete knowledge of their target model. In grey-box attacks, their knowledge is restricted to the structure of the model; the attacking adversaries have no idea about the parameters of the model or the data in it. In black-box attacks, the adversaries can only apply brute force or query access to attack the model; they have no insight regarding the model whatsoever. Several other attack methods cause poisoning of data sets by inserting adversarial samples into them, like the DeepXplore [1] attack, DeepFool [2], the fast gradient sign method (FGSM) [3], the basic iterative method (BIM) [4], the Carlini and Wagner (C&W) [5] attack and many more. These attacks can also be categorized based on their influence on the classifier and their violation of security and specificity. Different attacks have different effects on the data sets; every data set has different properties and behaves differently under attack. Hence, the detection mechanisms also have a varied effect on different data sets. Researchers have compared different data sets like MNIST [6], CIFAR-10 [7], ImageNet [8] and CelebA [9] and the effect of attacks on them. Several defences have been formulated to protect machine learning models against attacks; approaches such as threat modelling, attack impact evaluation, information laundering and attack simulation have proved adequate to quite some extent. There are many mechanisms and models which can be used for the detection of adversarial samples. Researchers have proposed various methods to increase the robustness of detection models, like neural network invariant checking [10], steganalysis, feature squeezing [11], statistical defence approaches and convolutional filter statistics [12]. AI is the new electricity, and machine learning algorithms are getting smarter day by day with more data and computing power. A lot of machine learning models, including neural networks, are extremely vulnerable to adversarial samples, and if a model is being used at a large enterprise, this becomes a matter of concern. There are attacks in which changing only a single pixel of an input image causes the machine learning model to fail, which could be fatal. Hence, protecting against these possible attacks is of utmost importance. There is scope to make an all-in-one model developed by combining the best detectors for specific attacks, which would result in very good accuracy against many known attacks. In this paper, a method for robust adversarial learning as a defence mechanism is proposed, along with a review of numerous existing researches on the detection and avoidance of perturbations caused by adversarial samples in deep neural networks.
2 Background 2.1 Adversarial Examples Adversarial samples are not perceptible to human eyes but can fool neural networks very easily. This vulnerability is critical for machine learning models. Adversarial examples can be crafted using various methods that may or may not be model independent. Adversarial examples are also created to conduct a specific attack, whether a white-box or a black-box attack. Creating adversarial examples for a specific model or type of attack requires knowledge of the model and its workings, enabling a more targeted and successful attack.
2.2 Fast Gradient Sign Method (FGSM) Attack The FGSM attack is one of the most popular and prevalent adversarial attacks. This method uses the gradient of the loss with respect to the input image to create a new image that maximizes the loss; the resulting image becomes an adversarial image. The following equation expresses this concept:

advₓ = x + ε · sign(∇ₓ J(θ, x, y))    (1)

where advₓ represents the adversarial image, x is the (original) input image, y is the (original) input label, ε is the multiplier, θ denotes the parameters of the model, and J is the loss function. It can be observed that even small values of ε, such as 0.007, can be enough to cause objects to be misclassified with a very high degree of confidence. An interesting property here is that the gradients are taken with respect to the input image: the aim is to create an image that maximizes the loss. The way to do this is to find out how much each pixel in the image contributes to the loss and then apply the perturbation accordingly. This works very quickly because the chain rule makes it easy to determine how each input pixel contributes to the loss and to obtain the required gradients. The only purpose is to deceive an already trained model.
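To make Eq. (1) concrete, a minimal PyTorch sketch of the FGSM attack might look as follows (assuming a trained model and a suitable loss function; the function name and the [0, 1] pixel range are illustrative assumptions):

import torch

def fgsm_attack(model, loss_fn, x, y, epsilon):
    # Eq. (1): adv_x = x + epsilon * sign(grad_x J(theta, x, y))
    x_adv = x.clone().detach().requires_grad_(True)
    loss = loss_fn(model(x_adv), y)    # J(theta, x, y)
    loss.backward()                    # gradients taken w.r.t. the input image
    with torch.no_grad():
        x_adv = x_adv + epsilon * x_adv.grad.sign()
    return x_adv.clamp(0, 1).detach()  # keep pixels in the valid range

For example, fgsm_attack(model, torch.nn.CrossEntropyLoss(), images, labels, 0.007) perturbs a batch just enough to flip many predictions while the images remain visually unchanged.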
2.3 Basic Iterative Method (BIM) Attack The basic iterative method (BIM) is the iterative version of FGSM. Unlike FGSM, where the perturbation is applied in a single step, it is applied in multiple steps using a small step size. In particular, BIM repeats the one-step gradient update (i.e. FGSM) while keeping an upper bound on the size of the
adversarial perturbation. Here, the intermediate pixel values are clipped after each step so that the result stays in the neighbourhood of the original image. For large image data sets, it can compute targeted perturbations. This method can help increase reliability and safety assessments in practical deep neural network applications such as medical image-based diagnosis. In the fast method, higher values of ε destroy the image's content, because ε-scaled noise is added to each image, making it unrecognizable even to a human. Iterative methods, on the other hand, do not destroy the image even with higher ε: they exploit much finer perturbations while confusing the classifier at a higher rate. The basic iterative method produces better adversarial images when ε < 48; as ε increases beyond that, it cannot improve. Iterative methods use a more subtle kind of perturbation, which is more likely to be destroyed by photo transformation; the destruction rate of adversarial examples for iterative methods can reach 80–90%.
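A corresponding sketch of the basic iterative method follows; the step size alpha and the number of steps are assumptions (the text does not fix them), and the min/max clipping keeps each intermediate result in the ε-neighbourhood of the original image:

def bim_attack(model, loss_fn, x, y, epsilon, alpha=0.01, steps=10):
    x_adv = x.clone().detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = loss_fn(model(x_adv), y)
        loss.backward()
        with torch.no_grad():
            x_adv = x_adv + alpha * x_adv.grad.sign()                      # one FGSM step
            x_adv = torch.min(torch.max(x_adv, x - epsilon), x + epsilon)  # clip to the eps-ball of x
            x_adv = x_adv.clamp(0, 1)                                      # stay in the valid pixel range
        x_adv = x_adv.detach()
    return x_adv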
3 Literature Review In the research conducted by T. Pang et al. towards robust detection of adversarial examples [13], a novel method is presented where the model minimizes the reverse cross-entropy. This method was tested for performance under various threats: an oblivious attack, a white-box attack and a black-box attack. The authors used a kernel density detector to implement a threshold test strategy, substituting the common cross-entropy with reverse cross-entropy. A new method for detecting adversarial samples using feature squeezing is proposed by Xu et al. [11]. Feature squeezing reduces the search space available to an adversary by merging samples that correspond to different feature vectors in the original space into a single sample. They explored two types of feature squeezing: spatial smoothing and reducing the colour bit depth of each pixel. They discussed the effectiveness of different squeezers under different attacks and also considered how multiple squeezers might work in a joint framework. In adversarial examples detection in deep networks with convolutional filter statistics [12], Xin Li et al. designed a cascade classifier to detect adversarial samples efficiently; even when trained for a specific adversarial generating mechanism, the classifier can successfully detect adversarial images from a different technique. A simple average filter also helps to recover images. Instead of training a neural network to discover adversarial attacks, their approach is predicated on statistics of the outputs from convolutional layers. Through their research, they found that adversarial examples can be recovered by applying a small average filter to the image. Explaining and harnessing adversarial examples by Goodfellow et al. [3] made the interesting discovery that various machine learning mod-
els, including neural networks, were vulnerable to adversarial examples which are only slightly different from the correct examples. The generalization of adversarial examples across various models can be explained by the adversarial perturbation being aligned with the weight vectors. The researchers worked extensively to formalize the notion that the vulnerability of neural networks to adversarial examples is due to their linear nature. This notion is supported by the captivating fact that adversarial examples generalize across different architectures and training sets, and it resulted in the formulation of a fast and straightforward technique for generating adversarial examples. In ML-LOO, detecting adversarial examples with feature attribution by Yang et al. [14], the authors observed that there is a significant difference between the feature attributions of adversarial examples and those of original ones. A new framework is introduced to detect adversarial examples by thresholding a scale estimate of feature attribution scores, also including multilayer feature attributions. The technique is able to detect adversarial examples of mixed confidence levels and transfers between different attacking methods. It was observed that the feature attribution map of an adversarial example differs near the decision boundary from that of the corresponding original example. This method achieves far better performance in detecting adversarial samples generated by attack methods on MNIST, CIFAR-10 and CIFAR-100.
4 Proposed Method A system has been developed for adversarial learning: the system generates adversarial images with the attack that needs to be defended against and then trains the model on the adversarial images along with regular images. A LeNet architecture proposed by LeCun et al. [15] is implemented: a convolutional neural network (CNN) with a 5 × 5 kernel size and a MaxPool layer of 2 × 2. Dropout with a probability of 0.5, which activates only half of the neurons at training time, reduces overfitting and is used as a regularization technique. ReLU was used as the activation function; it is advantageous because it accelerates the convergence of stochastic gradient descent in comparison with the tanh or sigmoid function. This model was used to create adversarial samples using the FGSM attack for different values of ε, producing a larger and more varied set of attacked images (Fig. 1). If training is done only on adversarial images, the network fails to classify clean (non-adversarial) images. This effect was observed by researchers for the semantic attack [16]: when the model was trained only on adversarial images, i.e. images produced using the semantic attack, it failed miserably on clean images. The proposed system therefore trains a CNN on a 50–50 split of clean and adversarial images, each type having 50k images from the MNIST handwritten digits data set (Fig. 2).
Fig. 1 Model architecture to create FGSM & BIM samples
Fig. 2 Model architecture to detect FGSM & BIM samples
The above model was trained to detect adversarial samples, irrespective of epsilon values. It was trained on the MNIST handwritten digits data set, comprising 60,000 images of 28 × 28 dimension. The image is passed through a convolution layer of 5 × 5 kernel size. Padding of 2 px is applied on both sides so that H (height) and W (width) remain the same after the convolution. Batch normalization is used for its regularization effect. The output of this layer is 28 × 28 × 10, which is passed through a MaxPool layer with a kernel size of 2 × 2. After MaxPool, the dimension reduces to 14 × 14 × 10. Then dropout with probability 0.5 is used as regularization to prevent overfitting. Another convolution layer with kernel size 5 × 5 follows, which results in 14 × 14 × 20 channel images. A MaxPool layer of 2 × 2 kernel
size is applied, resulting in a 7 × 7 × 20 dimension. Again, dropout with 0.5 probability is used. This 3D matrix is flattened out to a vector of dimension 980 × 1 (7 × 7 × 20), which is passed through the linear layers, which output the prediction for the ten classes of MNIST. This prediction is passed through a softmax layer, so the output for each class becomes a class probability; the class with the maximum probability is taken as the classification prediction. ReLU is used as the activation function. For detecting whether a sample is adversarial or clean, the convolutional layers' outputs are likewise flattened into a vector and fed through linear layers.
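A PyTorch sketch of the network described above is given below; where the text is ambiguous (the exact ordering of batch normalization and ReLU, and the width of the hidden linear layer), the choices made here are assumptions:

import torch.nn as nn

class LeNetStyle(nn.Module):
    def __init__(self, num_outputs=10):  # 10 for digit classification; 1 logit for the adversarial/clean detector
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 10, kernel_size=5, padding=2),   # 1 x 28 x 28 -> 10 x 28 x 28 (padding keeps H, W)
            nn.BatchNorm2d(10),
            nn.ReLU(),
            nn.MaxPool2d(2),                              # -> 10 x 14 x 14
            nn.Dropout(0.5),
            nn.Conv2d(10, 20, kernel_size=5, padding=2),  # -> 20 x 14 x 14
            nn.ReLU(),
            nn.MaxPool2d(2),                              # -> 20 x 7 x 7
            nn.Dropout(0.5),
        )
        self.classifier = nn.Sequential(
            nn.Linear(20 * 7 * 7, 50),   # the flattened 980-dim vector; hidden width 50 is an assumption
            nn.ReLU(),
            nn.Linear(50, num_outputs),  # raw logits; softmax/sigmoid is applied by the loss or at inference
        )

    def forward(self, x):
        return self.classifier(self.features(x).flatten(1))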
5 Experiment and Results Adversarial samples were generated by the FGSM attack and the BIM attack for various values of ε ranging from 0.05 to 1. The system uses five values of ε, i.e. 0.05, 0.1, 0.2, 0.3 and 1, to keep the data set diverse in terms of low-level and high-level attacks. For each value of ε, 10,000 images were generated, so the adversarial samples are evenly distributed over the different values of ε. Hence, 50,000 FGSM-attacked images and 50,000 BIM-attacked images were generated. The BIM attack takes significantly longer than the FGSM attack since it is an iterative version of FGSM itself. Binary cross-entropy applied to sigmoid outputs was the loss used for the attacked model, which was trained for 20 epochs with a learning rate of 0.01 and a batch size of 128. The detection model also used the binary cross-entropy loss with sigmoid outputs; its batch size was set to 256 for faster training, and it was trained for 20 epochs with a learning rate of 0.001. The proposed system obtained an accuracy of 99.60% for detecting adversarial samples (Fig. 3). Clean images have ε = 0; the images above are labelled with the misclassification caused by the attack on each image. As can be observed, with an increase in the value of ε, visible distortion is added to the image. For ε = 0.3, there is significant distortion that might look like a compromised or corrupted image. Since the proposed system uses the MNIST handwritten digit data set, the images have a 28 × 28 pixel dimension, which is very small. Hence, when ε = 1, the distortion level is so high that the image stops looking like a sample from the data set and begins to look like gibberish. The system also checked whether the attack works correctly to misclassify the samples, by testing the adversarial samples against the model (Fig. 4). As can be observed from the graphs in Figs. 5 and 6, the accuracy of the LeNet model decreases with increasing ε. This evaluation was done using the test model of the LeNet architecture. Even for very small values of ε, the accuracy dips sharply (Table 1).
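Under the stated settings, the detector's training loop might be sketched as follows (the optimizer type and the data loader are assumptions; the paper specifies only the loss, epochs, learning rate and batch size):

import torch

detector = LeNetStyle(num_outputs=1)       # single logit: adversarial vs. clean (sketch above)
criterion = torch.nn.BCEWithLogitsLoss()   # sigmoid + binary cross-entropy in one numerically stable op
optimizer = torch.optim.SGD(detector.parameters(), lr=1e-3)  # optimizer choice is an assumption

for epoch in range(20):
    for x, y in loader:                    # loader: assumed DataLoader yielding 50-50 clean/adversarial batches of 256
        optimizer.zero_grad()
        loss = criterion(detector(x).squeeze(1), y.float())
        loss.backward()
        optimizer.step()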
Fig. 3 FGSM attack for range of epsilon
The trend of accuracy is not linear: the accuracy falls nonlinearly with increasing ε. It is lowest for the highest value, ε = 1, but as discussed earlier, at this value of ε the image can no longer be considered to be from the same distribution for data sets having a very small image size.
Fig. 4 BIM attack for range of epsilon
6 Conclusion and Future Scope In this paper, the FGSM and BIM attack methods were studied, which lead to some degree of misclassification even for very low values of epsilon. It was also learnt that higher values of epsilon are only feasible where the image size is large, so that the result is not pure noise. A robust adversarial learning scheme was implemented that takes fewer computational resources, owing to a smaller and more efficient network, as a
Fig. 5 Accuracy drop for FGSM attack
Fig. 6 Accuracy drop for BIM attack
Table 1 Epsilon vs. FGSM and BIM test accuracy

Epsilon   FGSM     BIM
0.05      0.7745   0.5769
0.1       0.3829   0.0648
0.2       0.1725   0.0307
0.3       0.1297   0.0306
1         0.1341   0.0306
defence mechanism against the FGSM & BIM attacks, with an accuracy of 99.60% over the MNIST handwritten digits data set. Adversarial training can also be done in conjunction with novel techniques that use denoisers and average filters to detect and recover original images. This will result in particularly good accuracy against many known attacks.
References 1. Pei, K., Cao, Y., Yang, J., Jana, S.: Deepxplore: automated whitebox testing of deep learning systems. In: proceedings of the 26th Symposium on Operating Systems Principles, pp. 1–18 (2017, October) 2. Moosavi-Dezfooli, S.M., Fawzi, A., Frossard, P.: Deepfool: a simple and accurate method to fool deep neural networks. In: Proceedings of the IEEE conference on Computer Vision and Pattern Recognition, pp. 2574–2582 (2016) 3. Goodfellow, I.J., Shlens, J., Szegedy, C.: Explaining and harnessing adversarial examples. In: 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7–9, 2015, Conference Track Proceedings (2015) 4. Kurakin, A., Goodfellow, I., Bengio, S.: Adversarial machine learning at scale (2016). CoRR, arXiv preprint arXiv:1611.01236 5. Carlini, N., Wagner, D.: Towards evaluating the robustness of neural networks. In: 2017 IEEE Symposium on Security and Privacy (SP), pp. 39–57. IEEE (2017, May) 6. LeCun, Y., Cortes, C.: MNIST handwritten digit database. Available at: http://yann.lecun.com/ exdb/mnist/. (2010) 7. Hinton, G.E., Srivastava, N., Krizhevsky, A., Sutskever, I., Salakhutdinov, R.R.: Improving neural networks by preventing co-adaptation of feature detectors (2012). CoRR, arXiv preprint arXiv:1207.0580 8. Deng, J., Dong, W., Socher, R., Li, L.J., Li, K., Fei-Fei, L.: Imagenet: a large-scale hierarchical image database. In 2009 IEEE Conference on Computer Vision and Pattern Recognition, pp. 248-255. IEEE (2009, June) 9. Liu, Z., Luo, P., Wang, X., Tang, X.: Deep learning face attributes in the wild. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 3730–3738 (2015) 10. Ma, S., Liu, Y.: Nic: detecting adversarial samples with neural network invariant checking. In: Proceedings of the 26th Network and Distributed System Security Symposium (NDSS 2019) (2019, February) 11. Xu, W., Evans, D., Qi, Y.: Feature squeezing: detecting adversarial examples in deep neural networks. In: 25th Annual Network and Distributed System Security Symposium, NDSS 2018, San Diego, California, USA, February 18-21, 2018. The Internet Society (2018)
12. Li, X., Li, F.: Adversarial examples detection in deep networks with convolutional filter statistics. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 5764–5772 (2017) 13. Pang, T., Du, C., Dong, Y., Zhu, J.: Towards robust detection of adversarial examples. In: Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, NeurIPS 2018, pp. 4584–4594, December 3–8, 2018, Montréal, Canada (2018) 14. Yang, P., Chen, J., Hsieh, C.J., Wang, J.L., Jordan, M.: Ml-loo: Detecting adversarial examples with feature attribution. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 34, issue No. 4, pp. 6639–6647 (2020, April) 15. LeCun, Y., Boser, B., Denker, J.S., Henderson, D., Howard, R.E., Hubbard, W., Jackel, L.D.: Backpropagation applied to handwritten zip code recognition. Neural Comput. 1(4), 541–551 (1989) 16. Joshi, A., Mukherjee, A., Sarkar, S., Hegde, C.: Semantic adversarial attacks: Parametric transformations that fool deep classifiers. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 4773–4783 (2019)
Performance Evaluation of Shallow and Deep Neural Networks for Dementia Detection Deepika Bansal, Kavita Khanna, Rita Chhikara, Rakesh Kumar Dua, and Rajeev Malhotra
Abstract Dementia is a neurocognitive disorder responsible for decreasing the overall quality of life of patients. The disease has emerged as a worldwide health challenge in the age group of 65 years or above. Deep learning has been effectively applied for predicting the presence of dementia using magnetic resonance imaging. In this work, the performance of two different frameworks is assessed for the detection of dementia and normal subjects using MRI images. The first framework uses first-order and second-order hand-crafted features as the input of shallow neural networks. The second framework instead uses pre-trained convolutional neural networks with automatic feature extraction. The results show that the second framework performs better than the first in terms of various performance measures. The best accuracy, obtained using the AlexNet pre-trained CNN, is 83%. Keywords Dementia · First-order features · Second-order features · Shallow neural networks · Convolutional neural network
1 Introduction Dementia is a progressive brain disease that involves chronic decline of mental ability due to damaged neurons of the brain. Dementia patients experience memory D. Bansal · R. Chhikara Department of Computer Science and Engineering, NCU, Gurugram, Haryana, India e-mail: [email protected] K. Khanna (B) Dwarka Campus, Delhi Skill and Entrepreneurship University, New Delhi, India e-mail: [email protected] R. K. Dua Department of Neurosurgery, Fortis Hospital, New Delhi, India R. Malhotra Department of Neurosurgery, Max Super Speciality Hospital, New Delhi, India © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 S. Roy et al. (eds.), Innovations in Computational Intelligence and Computer Vision, Advances in Intelligent Systems and Computing 1424, https://doi.org/10.1007/978-981-19-0475-2_45
loss, leading to disruption of their daily life. As noted by the World Health Organization (WHO), around fifty million people across the globe are experiencing dementia, while around thirty million new cases are anticipated to arise in the next three decades [1]. Early treatment of this neurodegenerative disease is a critical part of improving the condition of the affected patients and their families. Various cognitive, clinical, and genetic tests are available for diagnosing dementia. Magnetic Resonance Imaging (MRI) images are utilized for diagnosing the disease, as they hold a connection with the topology of the brain and make modifications in its morphological structure easily visible [2]. There is no cure for dementia; the main recourse is its early detection. Recently, researchers and neurologists have been contributing to the early diagnosis of dementia and have achieved encouraging results [3]. Machine learning (ML) techniques have shown great potential for providing aid in the diagnosis of dementia. The traditional artificial neural network (ANN)/machine learning techniques for detecting dementia comprise the following steps: (1) image acquisition, (2) preprocessing, (3) extraction of features, (4) classification, and (5) validation. The majority of the studies proposed in the literature for the detection of the disease are based on supervised learning approaches such as ANN, decision tree (DT), support vector machine (SVM), or Bayes classifiers [4–9]. The extraction of a good set of features is the crucial step for the model and the correct detection of the disease. ANNs are classified into two categories based on the number of hidden layers: shallow neural networks and deep neural networks [10]. Shallow networks have a single hidden layer, while deep networks have more than one. A few studies have assessed the differences between shallow and deeper neural networks [11, 12]. It is clear from the literature that, by introducing multiple layers, an ANN becomes capable of learning features at different levels of abstraction. This capability of DNNs eases generalization in comparison to shallow architectures, and so researchers subsequently explored deep learning (DL). DL methods have arisen as the greatest development in machine learning. DL automatically extracts the discriminative features from the images, which helps in avoiding errors during feature engineering. Different researchers have implemented DL using convolutional neural networks [13, 14], residual nets [15], dense networks [16], and many other networks [8, 17] in the field of dementia. Suk et al. [18] devised a model which concatenates the features of both MRI and PET images with a Deep Boltzmann Machine (DBM). Liu et al. [19] used a zero-masking strategy for the fusion of data and extracted the information from multiple modalities. Sarraf et al. [20] also performed the classification of dementia patients from normal ones using a CNN, LeNet-5, with 30 epochs. Classification with three CNN frameworks, namely GoogleNet, Resnet18, and Resnet152, trained for 100 epochs, was proposed for detecting dementia by Bidani et al. [21]; the OASIS dataset was used, leading to an accuracy of around 80%. Kam et al. [22] developed a deep learning framework using CNN brain functional networks for mild cognitive impairment diagnosis. Jiang et al. [23] classified early MCI versus
normal subjects via the VGG-16 network for transfer learning, along with Lasso for feature selection and SVM for classification. Three CNN models were evaluated by tweaking the
number of layers, using structural MRI images from the OASIS and MIRIAD datasets, for the diagnosis of AD [24]. A detailed survey and implementation for the detection of dementia have been carried out in works [25–30]. The main aim of the current study is to assess the convolutional neural network/deep learning approach against the conventional (i.e., shallow) machine learning methods for the early detection of dementia. Firstly, the first and second-order features are extracted from MRI and classification is performed using a few machine learning techniques and a neural network (NN). Secondly, eight notable pre-trained CNN models are assessed using MRI images for binary classification of dementia and normal patients. The remainder of the paper is organized as follows: the data source and the frameworks used are discussed in Sect. 2, the experimental results are explained in Sect. 3, and the conclusion of the study is given in Sect. 4.
2 Material and Method 2.1 Dataset Open Access Series of Imaging Studies (OASIS) is a freely accessible MRI dataset used in this study [31, 32]. A sample image of a dementia and a normal MR brain image is shown in Fig. 1. The OASIS dataset comprises 416 samples (images) of different subjects aged between 18 and 96 years; 316 samples are normal controls while only 100 samples are of dementia patients. To balance the data properly, this study considers 200 samples (100 dementia and 100 normal controls). The original dimensions of the images are 176 × 208 × 176. Fig. 1 Sample image of normal and dementia MR brain image
(a) Normal
(b) Dementia
Table 1 Clinical brain MRI dataset

Image class   # MRI slices
Normal        100 (subjects) × 50 slices = 5000 slices
Dementia      100 (subjects) × 50 slices = 5000 slices
Total         10,000 slices
2.2 Data Preprocessing The preprocessing of MRI images plays a vital role due to the high complexity of the images. The segmented images generated from the masked atlas image are used in this study. The Brain Extraction Tool is used for removing the facial features [33]. The 3D images are sliced into 2D images using the MRIcro software [34]. The middle 50 slices containing the relevant information are extracted from the total of 176 slices obtained, and the same procedure is applied to all the images. In total, 10,000 PNG images are utilized in this work for further processing. The categorization of the dataset is given in Table 1.
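The slicing step was done with the MRIcro tool; an equivalent scripted sketch, assuming the volumes are available in a NIfTI-like format (file names are illustrative), might look as follows:

import numpy as np
import nibabel as nib            # assumption: NIfTI access; the paper used MRIcro interactively
from PIL import Image

vol = nib.load("subject.nii").get_fdata()      # e.g. 176 x 208 x 176 voxels
mid = vol.shape[2] // 2
for k in range(mid - 25, mid + 25):            # the middle 50 of the 176 slices
    sl = vol[:, :, k]
    rng = np.ptp(sl) or 1.0                    # guard against empty slices
    png = (255 * (sl - sl.min()) / rng).astype(np.uint8)
    Image.fromarray(png).save(f"subject_slice_{k:03d}.png")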
2.3 Classification Framework The two frameworks designed for the classification of MRI images into dementia and normal subjects are described in this section. The work process of the first framework depends on the extraction of the first and second-order features, with classification performed using shallow neural networks, while the second framework uses pre-trained CNNs as deep neural networks for discriminating the two classes.
2.3.1 First Framework Using Shallow Neural Network and Machine Learning
The first framework extracts the first-order and second-order features from the MRI images. The first-order features are extracted using the histogram of an image [35, 36]. The histogram is easy to compute and provides many characteristics of an image. The features extracted in this study are Mean, Standard Deviation, Variance, Skewness, Kurtosis, and Entropy. The variance is the measure of the gray levels' deviation from the mean; skewness measures the degree of asymmetry, and kurtosis the sharpness of the histogram. These statistical properties of the image are defined in Eqs. (1)–(5):

Mean: μ = Σ_{i=0}^{G−1} i p(i)    (1)

Variance: σ² = Σ_{i=0}^{G−1} (i − μ)² p(i)    (2)

Skewness: μ₃ = σ⁻³ Σ_{i=0}^{G−1} (i − μ)³ p(i)    (3)

Kurtosis: μ₄ = σ⁻⁴ Σ_{i=0}^{G−1} (i − μ)⁴ p(i) − 3    (4)

Entropy: H = −Σ_{i=0}^{G−1} p(i) log₂ p(i)    (5)
where i = 0, 1, …, G − 1 are the discrete intensity levels in an image and p(i) is the approximate probability density of occurrence of intensity level i. The second-order features consider the spatial relationship of pixels using the gray-level co-occurrence matrix (GLCM), also known as the gray-level spatial dependence matrix. The GLCM counts the pairs of pixels with specific values occurring in a specified spatial relationship in an image. The features calculated are Contrast, Correlation, Homogeneity, and Energy, defined in Eqs. (6)–(9). The Contrast is a measure of local variations in the GLCM. Correlation is the measure of the joint probability of occurrence of the specified pixel pairs. The sum of squared elements in the GLCM is provided as Energy or Angular Second Moment:

Contrast: Σ_{I₁,I₂} |I₁ − I₂|² log P(I₁, I₂)    (6)

Correlation: Σ_{I₁,I₂} (I₁ − μ₁)(I₂ − μ₂) P(I₁, I₂) / (σ₁ σ₂)    (7)

Homogeneity: Σ_{I₁,I₂} P(I₁, I₂) / (1 + |I₁ − I₂|²)    (8)

Energy: Σ_{I₁,I₂} P(I₁, I₂)²    (9)
where I₁, I₂ are the gray levels of a pixel pair and P(I₁, I₂) is the relative frequency with which that gray-level configuration occurs. These first-order and second-order features are then used for the classification of dementia and normal MRI images for the early detection of the disease. The classification is performed using different traditional machine learning techniques and shallow neural networks, listed below; a code sketch of the feature extraction follows the list.
• Support Vector Machine (SVM) [37]
• K-Nearest Neighbor (KNN) [38]
• Naïve Bayes (NB) [39]
• Decision Tree (DT) [40]
• Linear Discriminant Analysis (LDA) [41, 42]
• Neural Network (NN) [48]
Table 2 Properties of pre-trained networks

Network      Depth   Image input size   Size (MB)   Parameters (Millions)
AlexNet      8       227-by-227         227         61
SqueezeNet   18      227-by-227         5.2         1.24
GoogleNet    22      224-by-224         27          7
VGG16        16      224-by-224         515         138
VGG19        19      224-by-224         535         144
Resnet18     18      224-by-224         44          11.7
Resnet50     50      224-by-224         96          25.6
Resnet101    101     224-by-224         167         44.6
2.3.2 Second Framework Using Convolutional Neural Network
A CNN can classify MRI images directly as input, with automatic feature extraction. The second approach uses eight pre-trained CNNs, namely AlexNet [43], Squeezenet [44], GoogleNet [45], VGG16 [46], VGG19 [46], Resnet18 [47], Resnet50 [47], and Resnet101 [47]. The properties of the pre-trained models used are presented in Table 2.
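A sketch of how one of the pre-trained networks can be adapted to the binary dementia/normal problem is shown below (PyTorch/torchvision is an assumed implementation choice; the learning rate and SGDM optimizer follow Table 4, while the momentum value is an assumption):

import torch
import torch.nn as nn
from torchvision import models

model = models.alexnet(pretrained=True)                               # ImageNet weights
model.classifier[6] = nn.Linear(model.classifier[6].in_features, 2)   # replace the final layer for 2 classes

optimizer = torch.optim.SGD(model.parameters(), lr=1e-4, momentum=0.9)  # SGDM, lr as in Table 4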
3 Experimental Results The classification results obtained are reported in this section. The results of the shallow neural networks are presented in the first subsection, whereas the pre-trained CNN results are discussed in the further subsections. All the experiments are performed using the existing networks, calculating the classification accuracy acquired with the presented experimental setup. The performance of the presented approach is validated by computing notable performance measures. The classification accuracy is the probability of correctly identified individuals, defined in Eq. (10). The other performance measures adopted in this work are depicted in Eqs. (11)–(17) using various relationships between TP, FP, TN, and FN, where TP = true positives, FP = false positives, TN = true negatives, and FN = false negatives.
Accuracy = (TP + TN) / (TP + TN + FP + FN)    (10)

Sensitivity = TP / (TP + FN)    (11)

Specificity = TN / (TN + FP)    (12)

False Negative Rate = FN / (FN + TP)    (13)

False Positive Rate = FP / (FP + TN)    (14)

F1-Score = TP / (TP + ½ (FP + FN))    (15)

Positive Likelihood Ratio = [TP / (TP + FN)] · [(FP + TN) / FP]    (16)

Negative Likelihood Ratio = [FN / (TP + FN)] · [(FP + TN) / TN]    (17)
3.1 Performance of the First Framework Regarding the shallow neural networks, the results are recorded in terms of accuracy. The comparison of the obtained accuracy values is presented in Table 3. SVM and NN perform almost similarly, with a difference of about one percent in accuracy. As expected, the other classifiers recorded lower accuracy. The design and optimization of the NN make its behavior predictable and reasonable. The overall capability of the classifiers is influenced by the process of extracting the relevant features. Table 3 Results obtained with shallow neural networks
Approach   Accuracy (%)
kNN        76.04
SVM        79.37
NB         77.72
DT         76.01
LDA        75.80
NN         80.34
Table 4 Properties of pre-trained networks for second framework

Parameters               Value
Epochs                   6
Iterations per epoch     700
Max. iterations          4200
Hardware resource        Single GPU
Learning rate schedule   Constant
Learning rate            0.0001
Optimizer                SGDM
Mini batch size          10
Verbose                  FALSE

Table 5 Classification accuracy using pre-trained networks

Pre-trained network   Accuracy (%)   Duration
Alexnet               83.00          3 m 1 s
Squeezenet            81.6           4 m 43 s
Googlenet             81.13          9 m 45 s
VGG16                 76.2           23 m 46 s
VGG19                 81.8           25 m 14 s
Resnet18              81.33          5 m 19 s
Resnet50              78.9           25 m 2 s
Resnet101             71.07          46 m 19 s
3.2 Performance of the Second Framework The results obtained using the second framework are reported in this section. A few preliminary tests were also performed and assessed to find the optimal classifier configuration. The properties of the pre-trained networks used are specified in Table 4. The classification accuracy obtained using the various pre-trained networks is recorded in Table 5. AlexNet outperforms all the other pre-trained networks, with an accuracy of 83% in a very short training time of 3 min and 1 s. Resnet101 performs the worst of all the networks, with an accuracy of 71.07%; the accuracy improvement of AlexNet relative to ResNet101 is around 12%. The accuracy obtained is also supported by the other performance measures, as shown in Table 6.
4 Conclusion In this work, two approaches are used for the classification of MRI images for the early detection of dementia. Supervised learning is the basis of both frameworks.
Table 6 Performance measures using pre-trained networks

Pre-trained network   Sensitivity   Specificity   False negative rate   False positive rate   F1-score   Positive likelihood ratio   Negative likelihood ratio
Alexnet               79.92         86.77         2.00                  1.32                  83.83      6.04                        2.31
Squeezenet            81.76         81.43         1.82                  1.85                  81.55      4.40                        2.23
Googlenet             81.68         80.60         1.83                  1.93                  80.96      4.21                        2.27
VGG16                 79.86         73.33         2.01                  2.66                  74.64      2.99                        2.74
VGG19                 81.13         82.49         1.88                  1.75                  81.99      4.63                        2.28
Resnet18              80.71         81.97         1.92                  1.80                  81.51      4.47                        2.35
Resnet50              80.08         77.80         1.99                  2.21                  78.47      3.60                        2.55
Resnet101             82.24         65.64         1.77                  3.43                  65         2.39                        2.70
In the first approach, the first-order and second-order features are extracted from the pre-processed MRI images, and then machine learning techniques and shallow neural networks are used as classifiers to separate dementia and normal subjects; the NN performed best among all the shallow classifiers. In the second approach, on the other hand, automatic feature extraction is used with the pre-trained CNNs. After the evaluation, it is observed that AlexNet achieved an accuracy of 83% in a very short duration. It is clear from the evaluated results that the second approach, based on a CNN as a feature extractor, is very useful and powerful for discriminating the two classes, whereas the hand-crafted feature extraction of the first approach can be a problematic methodology. The literature review also indicates that CNNs are a very promising approach in comparison with the traditional one. Future work will look at the reliability of such frameworks to propose a more robust system for supporting the clinical diagnosis of the disease. Acknowledgements The research was funded by the Department of Science and Technology DST, New Delhi, Reference number DST/CSRI/2017/215 (G).
References 1. World Health Organization: 10 Facts on dementia. Internet: https://www.who.int/news-room/ facts-in-pictures/detail/dementia, 1 April 2021 2. Puente-Castro, A., Fernandez-Blanco, E., Pazos, A., Munteanu, C.R.: Automatic assessment of Alzheimer’s disease diagnosis based on deep learning techniques. Comput. Biol. Med. 120, 103764 3. Tabaton, M., Odetti, P., Cammarata, S., Borghi, R.: Artificial neural networks identify the predictive values of risk factors on the conversion of amnestic mild cognitive impairment. J. Alzheimer’s Dis. 19(3), 1035–1040 (2010) 4. Bishop, C.M.: Pattern Recognition and Machine Learning. Springer (2006)
5. Chaplot, S., Patnaik, L.M., Jagannathan, N.R.: Classification of magnetic resonance brain images using wavelets as input to support vector machine and neural network. Biomed. Signal Process. Control 1(1), 86–92 (2006) 6. Stonnington, C.M., Chu, C., Klöppel, S., Jack, C.R., Jr., Ashburner, J., Frackowiak, R.S., Initiative, A.D.N.: Predicting clinical scores from magnetic resonance scans in Alzheimer's disease. Neuroimage 51(4), 1405–1413 (2010) 7. Li, S., Shi, F., Pu, F., Li, X., Jiang, T., Xie, S., Wang, Y.: Hippocampal shape analysis of Alzheimer disease based on machine learning methods. Am. J. Neuroradiol. 28(7), 1339–1345 (2007) 8. Vieira, S., Pinaya, W.H., Mechelli, A.: Using deep learning to investigate the neuroimaging correlates of psychiatric and neurological disorders: Methods and applications. Neurosci. Biobehav. Rev. 74, 58–75 (2017) 9. Jiang, J., Kang, L., Huang, J., Zhang, T.: Deep learning based mild cognitive impairment diagnosis using structure MR images. Neurosci. Lett. 730, 134971 (2020) 10. Nielsen, M.A.: Neural Networks and Deep Learning. Determination Press (2015) 11. Schmidhuber, J.: Deep learning in neural networks: An overview. Neural Netw. 61, 85–117 (2015) 12. Winkler, D.A., Le, T.C.: Performance of deep and shallow neural networks, the universal approximation theorem, activity cliffs, and qsar. Mol. Inf. 36(1–2) (2017) 13. Payan, A., Montana, G.: Predicting Alzheimer's disease: a neuroimaging study with 3D convolutional neural networks (2015). arXiv preprint arXiv:1502.02506 14. Martinez-Murcia, F.J., Górriz, J.M., Ramírez, J., Ortiz, A.: Convolutional neural networks for neuroimaging in parkinson's disease: Is preprocessing needed? Int. J. Neural Syst. 28(10), 1850035 (2018) 15. Litjens, G., Kooi, T., Bejnordi, B.E., Setio, A.A.A., Ciompi, F., Ghafoorian, M., … Sánchez, C.I.: A survey on deep learning in medical image analysis. Med. Image Anal. 42, 60–88 (2017) 16. Ortiz, A., Munilla, J., Gorriz, J.M., Ramirez, J.: Ensembles of deep learning architectures for the early diagnosis of the Alzheimer's disease. Int. J. Neural Syst. 26(07), 1650025 (2016) 17. Hjelm, R.D., Calhoun, V.D., Salakhutdinov, R., Allen, E.A., Adali, T., Plis, S.M.: Restricted Boltzmann machines for neuroimaging: An application in identifying intrinsic networks. Neuroimage 96, 245–260 (2014) 18. Suk, H.I., Lee, S.W., Shen, D., Initiative, A.D.N.: Hierarchical feature representation and multimodal fusion with deep learning for AD/MCI diagnosis. Neuroimage 101, 569–582 (2014) 19. Liu, S., Liu, S., Cai, W., Che, H., Pujol, S., Kikinis, R., ... Fulham, M.J.: Multimodal neuroimaging feature learning for multiclass diagnosis of Alzheimer's disease. IEEE Trans. Biomed. Eng. 62(4), 1132–1140 (2014) 20. Sarraf, S., Tofighi, G.: Classification of alzheimer's disease using fmri data and deep learning convolutional neural networks (2016). arXiv preprint arXiv:1603.08631 21. Bidani, A., Gouider, M.S., Travieso-González, C.M.: Dementia detection and classification from MRI images using deep neural networks and transfer learning. In: International Work-Conference on Artificial Neural Networks, pp. 925–933. Springer, Cham (2019) 22. Kam, T.E., Zhang, H., Jiao, Z., Shen, D.: Deep learning of static and dynamic brain functional networks for early MCI detection. IEEE Trans. Med. Imaging 39(2), 478–487 (2019) 23. Jiang, J., Kang, L., Huang, J., Zhang, T.: Deep learning based mild cognitive impairment diagnosis using structure MR images. Neurosci. Lett. 134971 (2020) 24. Yiğit, A., Işık, Z.: Applying deep learning models to structural MRI for stage prediction of Alzheimer's disease. Turkish J. Electr. Eng. Comput. Sci. 28(1), 196–210 (2020) 25. Bansal, D., Chhikara, R., Khanna, K., Gupta, P.: Comparative analysis of various machine learning algorithms for detecting dementia. Procedia Comput. Sci. 132, 1497–1502 (2018) 26. Bansal, D., Khanna, K., Chhikara, R., Dua, R.K., Malhotra, R.: A study on dementia using machine learning techniques. In: Communication and Computing Systems: Proceedings of the 2nd International Conference on Communication and Computing Systems (ICCCS 2018), December 1–2, 2018, Gurgaon, India, p. 414. CRC Press (2019)
27. Bansal, D., Khanna, K., Chhikara, R., Dua, R.K., Malhotra, R.:. Analysis of classification & feature selection techniques for detecting dementia. In: Proceedings of International Conference on Sustainable Computing in Science, Technology and Management (SUSCOM), Amity University Rajasthan, Jaipur, India (2019) 28. Bansal, D., Khanna, K., Chhikara, R., Dua, R.K., Malhotra, R.: Classification of magnetic resonance images using bag of features for detecting dementia. Procedia Comput. Sci. 167, 131–137 (2020) 29. Bansal, D., Khanna, K., Chhikara, R., Dua, R.K., Malhotra, R.: A systematic literature review of deep learning for detecting dementia. In: Proceedings of the Second International Conference on Information Management and Machine Intelligence, pp. 61–68. Springer, Singapore (2021) 30. Bansal, D., Khanna, K., Chhikara, R., Dua, R.K., Malhotra, R.: Analysis of Univariate and Multivariate Filters Towards the Early Detection of Dementia, Recent Advances in Computer Science and Communications 2020; 13 (2021). https://doi.org/10.2174/266625581399920093 0163857 31. Marcus, D.S., Wang, T.H., Parker, J., Csernansky, J.G., Morris, J.C., Buckner, R.L.: Open Access Series of Imaging Studies (OASIS): cross-sectional MRI data in young, middle-aged, nondemented, and demented older adults. J. Cogn. Neurosci. 19(9), 1498–1507 (2007) 32. OASIS-Brains: (2020). https://www.oasis-brains.org/ 33. fMRIDC: (2020). http://www.fmridc.org 34. MRIcro: (2020). https://www.mccauslandcenter.sc.edu/crnl/mricro 35. Pratt, W.: Digital Image Processing. Wiley (1991) 36. Levine, M.: Vision in Man and Machine. McGraw-Hill (1985) 37. Burges, C.J.: A tutorial on support vector machines for pattern recognition. Data Min. Knowl. Disc. 2(2), 121–167 (1998) 38. Beyer, K., Goldstein, J., Ramakrishnan, R., Shaft, U.: When is nearest neighbor meaningful? In: International Conference on Database Theory, pp. 217–235. Springer (1999) 39. Domingos, P., Pazzani, M.: On the optimality of the simple Bayesian classifier under zero-one loss. Mach. Learn. 29(2), 103–130 (1997) 40. Rokach, L., Maimon, O.: Data mining with decision trees: Theory and applications. World Scientific (2014) 41. Fisher, R.A.: The use of multiple measurements in taxonomic problems. Ann. Hum. Genet. 7(2), 179–188 (1936) 42. Li, B., Zheng, C.-H., Huang, D.-S.: Locally linear discriminant embedding: An efficient method for face recognition. Pattern Recogn. 41(12), 3813–3821 (2008) 43. Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Advances in Neural ˙Information Processing Systems, pp. 1097–1105 (2012) 44. Iandola, F.N., Han, S., Moskewicz, M.W., Ashraf, K., Dally, W.J., Keutzer, K.: SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and < 0.5 MB model size (2016). arXiv preprint arXiv:1602.07360 45. Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., Rabinovich, A.: Going deeper with convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1–9 (2015) 46. Simonyan, K., Zisserman, A.: Very Deep Convolutional Networks for Large-Scale ˙Image Recognition (2014). arXiv preprint arXiv:1409.1556 47. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778 (2016) 48. Shahid, N., Rappon, T., Berta, W.: Applications of artificial neural networks in health care organizational decision-making: A scoping review. 
PloS one, 14(2), e0212356 (2019)
Computer Vision Based Roadside Traffic Convex Mirror Validation for Driver Assistance System Suraj Dhalwar, Hansa Meghwani, and Sachin Salgar
Abstract On the roadside, there is a plethora of information that can aid drivers in driving safely. One such source is the traffic convex mirror, usually found at T-junctions, turns and parking areas. The Advanced Driver Assistance System (ADAS) will play a key role in detecting such mirrors and alerting the driver in advance. At present, there is no function available in driver assistance systems that can analyze the information available in traffic convex mirrors for autonomous and semi-autonomous vehicles; thus, it is currently not possible to extract information that is readily available to the driver. However, not every traffic convex mirror on the road is useful for the driver, so in this paper a computer vision-based solution is provided to identify a valid convex mirror, extracting and providing only the required information to the driver. The ADAS camera can detect the presence of a convex mirror on the roadside. This convex mirror provides information about incoming objects and obstacles in a blind spot that is not in the direct field of view. By analyzing the region inside the mirror for oncoming road participants, it is possible to determine whether the mirror is useful for driver assistance or not. Keywords Traffic convex mirror · Computer vision · Deep convolution neural network · Driver assist
1 Introduction Convex mirrors on the side of the road are often employed for traffic safety because they provide an erect, virtual, diminished image of objects present on the other side of the road, allowing drivers to see around blind corners. They typically expand the lateral visibility, otherwise hampered by space constraints, at road junctions on one or both sides of intersections [1]. These low-cost safety mirrors are effective only if the driver notices them and reacts accordingly. Most drivers fail to notice Present Address: S. Dhalwar · H. Meghwani · S. Salgar (B) Continental Automotive Components Pvt Ltd, Bangalore, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 S. Roy et al. (eds.), Innovations in Computational Intelligence and Computer Vision, Advances in Intelligent Systems and Computing 1424, https://doi.org/10.1007/978-981-19-0475-2_46
the presence of a convex mirror on the roadside, necessitating a detection system to alert the driver in advance. It will be an advantage if objects that are outside the range of the front camera are detected using these traffic mirrors, and this will be crucial information for semi-autonomous and driverless cars. Typically, these mirrors are framed by red-colored fiber holding plates, which makes them stand out even in a crowded environment. These convex mirrors are used in places where lateral visibility is required, such as blind corners, hilly regions, T-junctions and parking lots. Because not every traffic convex mirror on the roadside is relevant to the driver, the focus of this paper is on detecting and assessing traffic convex mirrors before assisting the driver or taking appropriate action.
• In this paper, the proposed system locates convex mirrors using the vehicle's front camera.
• The images with convex mirrors are further analyzed to extract useful information about the blind spot.
• This information is used to indicate whether the mirror is valid or invalid for Driver Assistance Systems.
• Additionally, a scenario analysis and pose understanding is done on valid convex mirrors.
Section 2 covers the prior art, Sect. 3 summarizes the methodology employed, and the remaining sections describe the individual algorithms. The experimental setup is described in detail in Sect. 4, and the results and discussions are presented in Sect. 5. Finally, Sect. 6 concludes and outlines future work.
2 Literature Survey Traffic sign detection and recognition play a vital role in vehicle safety applications, where most of the attention has been laid on rectangular and circular signs, which convey the speed limit and other traffic rules to assist the driver [2–5]. In a similar fashion, traffic convex mirror detection is also possible, which helps in seeing blind-spot areas [6]. One specific use of convex mirrors is to cover the blind spot with the help of rear-view (wing) mirrors in vehicles, because they reflect distant objects with a wider field of view. Traditional flat mirrors on the driver's side of a vehicle give drivers an accurate sense of the distance of cars behind them but have a very narrow field of view. As a result, there is a region of space behind the car, known as the blind spot, that drivers cannot see via either the side or rear-view mirror. Thus, convex mirrors enable the driver to view a much larger area than would be possible with a plane mirror, and the roadside convex mirror likewise helps to view larger areas in blind spots. Deep neural networks are used for traffic sign recognition [3, 7], and a deep learning-based approach has been used to detect traffic convex mirrors for automotive applications [8]. To improve the detection, a 3D object detector gives
better estimates, and the pose information from the 3D detector can be used to determine object movement, i.e. whether the object is coming toward or moving away from the mirror [9].
3 Methodology Using the vehicle's front camera, the system locates the convex mirror on the roadside and starts focusing on it. The camera captures the image of the convex mirror and analyzes it. The captured image gives vital information about objects on the road that are mostly in the blind spot area. Such mirrors can be detected using image processing, and to make the output more informative for the ADAS system, scenario understanding is needed. The primary focus of this paper is on validating the mirror based on certain checks; only if all checks are passed is the information provided to the driver; otherwise it is suppressed internally. As per the flow chart shown in Fig. 1, convex mirror detection is performed on an input image from the ADAS camera. Once the traffic mirror is detected, it is passed through certain checks to validate it. The first check is a circularity check, which decides whether the mirror detection is eligible for further processing or not. If the circularity check Fig. 1 Flow chart for valid traffic convex mirror assist
is passed, then a detailed mirror analysis is needed. During this analysis, the first step is to find whether any object is present in the traffic mirror or not. If there is an absence of any object and object movement in the traffic mirror, it is considered a void mirror. If an object is present in the mirror, a detailed analysis of the object's behavior and its classification is required. In this paper, to improve the efficiency of the system, one more step is included, which indicates whether the object is coming toward the mirror or moving away from it; this is helpful when making the validity decision. If an object is approaching the mirror from a blind spot, the system assists the driver in taking the necessary safety action. The pseudocode for this approach is shown below.
Algorithm 1: Validation of Convex Mirrors for Driver Assistance Systems.

for each frame in Video do:
    1. Perform convex mirror detection (e.g. using YOLO).
    2. Get the region of interest (ROI) from the detected bounding box.
    3. Detect and classify objects inside the mirror.
    4. Store all the classes detected inside the mirror in vlabels.
    if len(vlabels) == 0:                                                   /* Void mirror check */
        outputLabel = "Invalid Mirror"
        Do nothing
    else if vlabels contains 'car' or 'truck' or 'bike' or 'pedestrian':    /* Vehicles in mirror check */
        outputLabel = "Valid Mirror"
        Provide class information of the objects inside the mirror.
        3D box estimation and pose of the object.
        Scenario analysis.
    else:                                                                   /* Static objects check */
        outputLabel = "Invalid Mirror"
        Do nothing
    end if
end for
The step-by-step process of this validity check is explained in detail below.
3.1 Algorithms for Validity Check Circularity Check Once the traffic mirror is detected, it is checked for its circularity (see Eq. 1) as a preprocessing step. The traffic mirror looks more ellipse-shaped when the EGO vehicle is far away and then provides little or no information about the opposite traffic. When the EGO vehicle approaches the mirror, the circularity of the mirror increases, providing more insight into oncoming vehicles than elliptical views. The Canny
edge detector, hysteresis threshold and connected components are run on the detected traffic mirror image (with a small added offset). The biggest blob, which is expected to be the traffic mirror, is checked for circularity. If the circularity is greater than a set threshold, the detection is used for further processing:

Circularity = 4 · π · Area / (Perimeter)²    (1)
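An OpenCV sketch of this check is given below; the Canny thresholds and the circularity threshold are assumed values, not ones stated in the paper:

import cv2
import numpy as np

def is_circular(mirror_roi, threshold=0.8):
    gray = cv2.cvtColor(mirror_roi, cv2.COLOR_BGR2GRAY)
    edges = cv2.Canny(gray, 50, 150)    # Canny edge detector with hysteresis thresholds
    contours = cv2.findContours(edges, cv2.RETR_EXTERNAL,
                                cv2.CHAIN_APPROX_SIMPLE)[-2]  # works on OpenCV 3.x and 4.x
    if not contours:
        return False
    blob = max(contours, key=cv2.contourArea)   # biggest blob, expected to be the mirror
    area = cv2.contourArea(blob)
    perimeter = cv2.arcLength(blob, True)
    if perimeter == 0:
        return False
    return 4 * np.pi * area / perimeter ** 2 > threshold    # Eq. (1)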
Void Mirror Check This check verifies whether an object is present in the mirror or not. If no object and no object movement are observed, it is an invalid mirror. The check looks for objects like pedestrians, bicycles, motorbikes, cars, trucks, or any vulnerable traffic object present in the mirror for three consecutive frames. If no object is found in the mirror, it is not useful for the driver, and the proposed system avoids this unwanted assistance. Another possible approach is to verify flow-vector movement in the mirror; if a flow vector-based classifier is used, it is possible to improve the accuracy of the system with additional measurements. Object Classification and Pose Understanding Object classification is important because the behavior of every object is different; for example, the change in speed for a pedestrian and a motorbike is different. Here the classes pedestrian, bicycle, motorbike, car and truck are considered. After classification of an object, the primary interest is to find the direction of the object; if its direction is toward the mirror, a detailed analysis of the object is needed [10]. Scenario Analysis Figure 2 shows the different scenarios which the system should understand for a better decision. Case 1: Left turn. Here the ego car is moving toward the mirror and the movement of vehicle A is visible in the mirror. The focus is on vehicle A, which is coming toward the mirror from the opposite direction. Owing to the properties of the convex traffic mirror, the vehicle looks close in the mirror while being slightly farther away in reality, but
Fig. 2 Three different use case scenarios a Left turn scenario. b Right turn scenario. c T-Junction scenario
it helps with the early detection of objects. So, in this case, the vehicle is approaching the mirror, and if the same result is obtained for three or more consecutive frames, the mirror assist is valid. Case 2: Right turn. Here, vehicle B is moving away from the mirror, so it will be in the same lane as per the traffic rules. The traffic rule in consideration here is the left-hand driving rule. As a result, in the post-analysis, this is not information that the driver requires. If similar results are obtained for three or more consecutive frames, the mirror is classified as invalid and the driver is not notified. Case 3: T-junction. As per the traffic rules, the ego vehicle will take a left turn. In such a scenario, the information needed from the blind spot concerns the movement of any vehicle approaching the junction from the right side, which is visible in the mirror. Here, the mirror is also placed in such a way that the traffic in the right direction is visible. If a vehicle is approaching the mirror, it is a valid case; otherwise, it falls into the invalid category. For this case too, the traffic rules for left-hand driving are taken into consideration.
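The paper derives the direction of motion from 3D box estimation and pose [10]; as a simpler illustrative stand-in, an approaching object can be flagged when its 2D box inside the mirror grows over consecutive frames (a heuristic sketch, not the method used in the experiments):

def approaching(prev_box, cur_box):
    # boxes are (x, y, w, h); an approaching object appears larger in the mirror
    return cur_box[2] * cur_box[3] > prev_box[2] * prev_box[3]

# Valid-mirror decision over three consecutive frames, as in Sect. 3:
# valid = all(approaching(b0, b1) for b0, b1 in zip(boxes, boxes[1:]))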
4 Experimental Setup In the proposed work, a pre-trained YOLOv3 model is used for detecting objects inside the mirror to produce 2D boxes, and 3D boxes are then estimated from the 2D detection boxes whose confidence scores exceed a threshold [10]. YOLOv3 is pre-trained on ImageNet, and Darknet53 model weights are used for training [11]. The experimental setup consists of an NVIDIA GeForce GT 730 GPU with CUDA 10.1, CuDNN 7.4 and OpenCV 3.x libraries. For regressing the 3D parameters, a pre-trained VGG network without its FC layers and an additional 3D module is used [9]. There is no public dataset available for convex mirror detection; an ADAS-based front-view camera was used to collect this dataset, and the model was trained on it on top of pre-trained weights. 80% of the collected dataset is used for training and 20% for validation. The validation dataset consists of 2000 images with traffic convex mirrors, of which 816 were filtered out as invalid based on the validity checks. Out of the 1184 valid convex mirror images, 1120 gave the correct detailed information about the object visible in the mirror, and 748 showed a valid traffic convex mirror based on pose.
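For the detection-and-classification step inside the mirror ROI, a sketch using OpenCV's DNN module with Darknet weights might look as follows (the file names and the confidence threshold are illustrative):

import cv2

VEHICLE_CLASSES = {"car", "truck", "motorbike", "bicycle", "person"}  # classes of interest

net = cv2.dnn.readNetFromDarknet("yolov3.cfg", "yolov3.weights")      # hypothetical paths

def detect_in_roi(roi, class_names, conf_thresh=0.5):
    blob = cv2.dnn.blobFromImage(roi, 1 / 255.0, (416, 416), swapRB=True, crop=False)
    net.setInput(blob)
    outputs = net.forward(net.getUnconnectedOutLayersNames())
    labels = set()
    for out in outputs:                # each row: cx, cy, w, h, objectness, class scores...
        for det in out:
            scores = det[5:]
            cls = int(scores.argmax())
            if scores[cls] > conf_thresh:
                labels.add(class_names[cls])
    return labels

# A mirror is "valid" when any road user appears inside it:
# valid = bool(detect_in_roi(mirror_roi, coco_names) & VEHICLE_CLASSES)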
5 Results Analysis and Discussion The detection of convex mirrors [8] is an existing approach in the same area, but not all detected mirrors are useful to the driver, and such a system would thus trigger the warning unnecessarily. The proposed approach notifies the driver only if there is useful information inside the mirror. There is no existing work on the validation of
mirrors. The proposed convex mirror validation system analyses the mirror and classifies it as valid or invalid based on different filtering criteria. If it is a valid convex mirror, the system estimates the pose of the vehicle to categorize the situation into different scenarios and indicates whether the vehicle is coming toward or going away from the mirror. In Fig. 3, the convex mirror is classified as invalid because no object is seen in the mirror. Figures 4 and 5 show the output of detailed information about the object visible in the convex mirror: they depict a left and a right turn scenario, respectively, in which an object (a car) is present inside the mirror and approaching it, indicating that it is a valid mirror for assisting the driver. Figure 6 depicts a complex scenario, an apartment entrance gate that includes multiple objects such as motorbikes and pedestrians, as well as another complex scenario in which a vehicle makes a right turn; the proposed system returns a correct, valid result in both.
Fig. 3 Test image for valid and invalid mirror check a Empty mirror. b Traffic mirror detected as invalid
Fig. 4 Test image and result: a Ego vehicle intends to take a left turn. b 2D detection of object (car). c 3D box estimation and pose analysis
Fig. 5 Test image and result: a Ego vehicle intends to take a right turn. b 2D detection of object (car). c 3D box estimation and pose analysis
Fig. 6 Test image and results: a Apartment entrance gate scenario. b Result image. c Complex scenario: ego vehicle taking a right turn. d Result image
6 Conclusion and Future Work In a driver assistance system, it is always important to obtain information from the blind spot to ensure safety. The proposed method effectively extracts this information from roadside traffic mirrors. With the above-mentioned approach, it is possible to obtain blind-spot information through a detailed analysis of traffic convex mirrors and to take the necessary action. Using a detailed analysis of the mirror, it is possible to infer an object's intention and predict the target object's movement in the blind-spot area. Future work should focus on a more detailed understanding of the scenario, including pose estimation, which will provide valid information in complex scenarios at road intersections. One more branch of analysis could address optical flow vector-based pose/direction estimation to improve the accuracy of the system.
References
1. Moukhwas, D.: Road junction convex mirrors. Appl. Ergon. 18(2), 133–136 (1987)
2. Belaroussi, R., et al.: Road sign detection in images: A case study. In: 2010 20th International Conference on Pattern Recognition. IEEE (2010)
3. Artamonov, N.S., Yakimov, P.Y.: Towards real-time traffic sign recognition via YOLO on a mobile GPU. J. Phys. Conf. Ser. 1096(1) (IOP Publishing, 2018)
4. Peng, E., Chen, F., Song, X.: Traffic sign detection with convolutional neural networks. In: International Conference on Cognitive Systems and Signal Processing. Springer, Singapore (2016)
5. Chakraborty, P., et al.: Traffic congestion detection from camera images using deep convolution neural networks. Transp. Res. Record 2672(45), 222–231 (2018)
6. Prakash, B., Pimpalkar, P.U.: Apparatus and method for detecting road users in an environment of an ego vehicle. Patent no. DE102016215115A1 (2016)
7. Arcos-Garcia, A., Alvarez-Garcia, J.A., Soria-Morillo, L.M.: Evaluation of deep neural networks for traffic sign detection systems. Neurocomputing 316, 332–344 (2018)
8. Dhalwar, S., et al.: Image processing based traffic convex mirror detection. In: 2019 Fifth International Conference on Image Information Processing (ICIIP). IEEE (2019)
9. Zhu, M., Derpanis, K.G., Yang, Y., Brahmbhatt, S., Zhang, M., Phillips, C., Lecce, M., Daniilidis, K.: Single image 3D object detection and pose estimation for grasping. In: IEEE ICRA (2013)
10. Mousavian, A., et al.: 3D bounding box estimation using deep learning and geometry. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2017)
11. GitHub, https://github.com/AlexeyAB/darknet. Last accessed 5 April 2021
Feature Extraction Using Autoencoders: A Case Study with Parkinson’s Disease Maria Achary and Siby Abraham
Abstract Parkinson's disease is a common progressive neurodegenerative disorder. PD is also considered to progress slowly, so detecting its early stages is a major challenge in the medical domain. This paper offers an autoencoder-based feature extraction framework to extract features of different cohorts of PD from magnetic resonance imaging images. It uses a sparse autoencoder and a convolutional autoencoder for this purpose. The cohorts used in the work are Normal Control, GenCohortPD, GenRegPD, GenRegUnaff, Phantom, DenovoPD, Parkinson's disease, Swedd, Prodromal, and GenCohort. Experimental results show the effectiveness of autoencoders in the feature extraction of PD and the superiority of convolutional autoencoders. Keywords Parkinson's disease · Autoencoder · Magnetic resonance imaging (MRI)
1 Introduction In recent years, health informatics systems have been widely used to detect and monitor neurodegenerative disorders [1]. Intelligent systems based on AI are widely utilized to monitor Parkinson's disease (PD), which is a serious issue noticed in people around the age of 60 years [1]. PD is commonly said to be a non-curable progressive neurodegenerative disease with multiple motor and non-motor characteristics [2]. PD's significant symptoms are slow movements, inability to move easily, and other movement symptoms like asymmetry and poor posture [1]. In addition, voice disorders are among the most common symptoms in PD patients [3]. Current M. Achary (B) Department of Computer Science, University of Mumbai, Mumbai 400098, India e-mail: [email protected] S. Abraham Center of Excellence in Analytics and Data Science, NMIMS Deemed To Be University, Mumbai 400056, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 S. Roy et al. (eds.), Innovations in Computational Intelligence and Computer Vision, Advances in Intelligent Systems and Computing 1424, https://doi.org/10.1007/978-981-19-0475-2_47
studies state that PD has also been noticed to affect the younger generation. It is reported that nearly one million people in the US are living with Parkinson's disease, including people in the 40–45 age range. The number of people with PD worldwide is relatively high and is rapidly increasing in developing countries [3]. The Mayo Clinic in the US has reported the lifelong risk of Parkinson's disease as 2% for men and 1.3% for women [3]. It is reported that the number of PD patients will double by 2030 [3]. Despite all this, the cause of PD remains largely unknown. Moreover, PD is not curable, and its treatment consists mainly of medication and surgery [3]. Because an early, accurate diagnosis prolongs the life of PD patients, highly trustworthy intelligent systems are in great demand [3, 4]. Smart systems can reduce the load on clinicians and the dependency on experts [5–8]. Therefore, developing expert systems that can provide an accurate and reliable prediction of Parkinson's disease is recommended [3]. Neuroimaging biomarker techniques have proved promising for disease diagnosis [5]. These include biomarkers like single-photon emission computed tomography (SPECT), dopamine transporter scans (DaTscans), and magnetic resonance imaging (MRI) [6, 7]. With the advent of machine learning (ML) and deep learning (DL), several recent studies have been proposed to predict PD using brain MRI images [2, 8, 9]. Rana et al. proposed a supervised ML model, a support vector machine (SVM), using T1-weighted MRI images for PD classification [2]. Salvatore et al. adopted an ML model, principal component analysis (PCA), using T1-weighted MRI images [3]. Sivaranjini et al. adopted a supervised SVM model for the classification of PD and progressive supranuclear palsy (PSP) patients [4]. Recent studies in deep learning (DL) have been even more promising in this regard; DL plays an important role in image analysis and the diagnosis of neurological diseases. DL helps doctors analyze CT scans, ECGs, and genome data for heart disease, cancer, and brain tumors, and supports treatment at early stages. A deep learning model called the convolutional neural network (CNN) is used to analyze image datasets of different types; CNNs also help in drug discovery. Shin et al. addressed computer-aided detection problems using the CNN model [10]. There are multiple instances of CNNs and recurrent neural networks (RNNs) being used as mechanisms to achieve high prediction accuracy [4, 10]. One of the major difficulties in applying ML or DL in neuroimaging research is the high dimensionality and limited availability of the data. The best way to address this issue is the feature extraction method. Feature extraction is part of the dimensionality reduction process, wherein an initial set of raw images is divided and reduced to a more manageable group of features, leading to much faster processing. Large datasets consist of many variables (features) and require a lot of computing power to process; feature extraction selects and combines variables into a feature vector, which effectively reduces the amount of data. Essentially, feature extraction removes redundant parts and retains only the high-quality elements of the data needed to build an effective learning model.
Autoencoders have been considered a more promising feature extraction technique. An autoencoder replicates the input at the output in an unsupervised manner [11, 12]. The following are the different types of autoencoders:
1.1 Denoising Autoencoder: Some sample noise is added to the input image and the system learns to remove it. It takes a partially corrupted input image while training and recovers the original as the exact input image. In this process, the encoder extracts essential features from the image.
1.2 Convolutional Autoencoder: It learns to encode an input in the form of plain signals and then reconstructs the original image. Here, the encoding layers are convolution layers (down-sampling) and the decoder layers are deconvolution layers (up-sampling).
1.3 Variational Autoencoder: It tries to generate new images based on assumptions about the distribution of the latent variables. This results in an added loss component and a specific estimator for training, called the stochastic gradient variational Bayes estimator.
1.4 Deep Autoencoder: It uses two deep-belief networks, which are symmetric: one of 4–5 shallow layers, called the encoding layers, and another of 4–5 layers, called the decoding layers.
1.5 Undercomplete Autoencoder: In this model, the hidden layer has a smaller dimension than the input layer.
1.6 Sparse Autoencoder: In this model, training involves a sparsity penalty added to the loss function so that only a few hidden nodes are activated when a single sample is fed into the network.
1.7 Contractive Autoencoder: This encoder is made less sensitive to small variations in its training dataset by adding a regularizer. The final aim is to reduce the sensitivity of the learned representation to the training input.
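Since the proposed work uses a sparse autoencoder, a minimal Keras sketch of the sparsity idea in item 1.6 is shown below. The hidden-layer size and the L1 penalty weight are assumptions, as the paper does not specify the SAE architecture; the input dimension matches the 192 × 192 images used later in the paper.

```python
from tensorflow.keras import layers, models, regularizers

INPUT_DIM = 192 * 192  # flattened MRI image, as used later in this paper

# Sparse autoencoder: an L1 activity penalty on the hidden layer acts as
# the sparsity constraint so that only a few hidden units activate.
inputs = layers.Input(shape=(INPUT_DIM,))
encoded = layers.Dense(256, activation="relu",
                       activity_regularizer=regularizers.l1(1e-5))(inputs)
decoded = layers.Dense(INPUT_DIM, activation="sigmoid")(encoded)

sparse_ae = models.Model(inputs, decoded)
sparse_ae.compile(optimizer="adam", loss="mse")
# sparse_ae.fit(x_train, x_train, epochs=100, batch_size=128)  # x_train in [0, 1]
```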
The proposed work uses sparse and convolutional autoencoders. The methodology uses ten different cohorts of Parkinson's disease (PD) from MRI data. The extracted features can be used for granular-level analysis and early detection of Parkinson's disease. The remainder of the paper is organized as follows: Sect. 2 describes related work. Section 3 describes the materials and methods used in the work. Section 4 explains the details of the experimental requirements and results. Section 5 offers the conclusion of the work. Finally, the references used in the work are listed.
2 Related Work Several works have been discussed in the literature that deal with feature extraction and other classification-related activities for Parkinson's disease (PD) using ML
and DL techniques. Among the different ML techniques, supervised algorithms like support vector machines (SVMs), random forests (RFs), and K-nearest neighbors (KNN) are the most often used. These techniques gave accuracies ranging from 80.01 to 89.00% [1, 6, 9, 13]. Rouzbahani et al. obtained optimal features, as specified by a wrapper approach with an SVM classifier, to form a feature-performance curve; after obtaining the optimal features, SVM, KNN, and discrimination-function-based classifiers were trained [14]. For this purpose, three different word embedding models, namely Word2Vec, GloVe, and FastText, were utilized to enrich the meaning and context of PD. Rouzbahani et al. also suggested a probabilistic neural network (PNN) classifier for early diagnosis and achieved a success rate of 80.92% [15]. Sharma and Giri used three classifiers, MLP, KNN, and SVM, on a dataset of voice signals from 31 subjects, of whom 23 were patients with Parkinson's disease and 8 were healthy [6]; the results achieved a high accuracy of around 85.29% [6]. B. E. Sakar applied SVM, ELM, and KNN classifiers to construct a classification model separating Parkinson's disease (PD) from normal controls. The work used voice recordings of early- and late-stage PD patients with a high risk of a variety of voice impairments, using the Unified Parkinson's Disease Rating Scale (UPDRS) score as a disease rating index, and gave an accuracy of 96.40% [1, 13]. Rana et al. proposed machine learning techniques for Parkinson's classification using T1-weighted MRI images to classify normal controls and PD patients with SVM techniques [2]. Salvatore et al. used T1-weighted MRI images of 28 Parkinson's disease (PD) patients, 28 progressive supranuclear palsy (PSP) patients, and 28 healthy control patients. They used an unsupervised ML model, principal component analysis (PCA), as the feature extraction technique and a supervised ML model, the support vector machine, as the classification algorithm. The method performed individual diagnosis of PD versus healthy controls, PSP versus healthy controls, and PSP versus PD with accuracy, specificity, and sensitivity greater than 90.00% [16]. Hakan Gunduz applied two types of CNN-based frameworks for the classification of Parkinson's disease (PD) versus normal controls using a set of vocal (speech) features; their performance was validated with leave-one-out cross-validation (LOOCV) and obtained an accuracy of 82.01% [9]. Sivaranjini et al. applied the AlexNet CNN architecture to refine the detection of Parkinson's disease, classifying healthy controls and PD patients from MR image data with a deep learning neural network, and obtained an accuracy of 88.91% [4]. Ram Deepak Gottapu applied convolutional neural networks (CNNs) to analyze PD data to understand which type of segmentation should be used for various imaging biomarkers. He further used long short-term memory (LSTM) to track the slow progression of the disease using the white and gray matter of MRI images and obtained an accuracy of 90.00% [7]. Sumeet Shinde applied CNNs for early diagnosis of PD from neuromelanin-sensitive (NMS) MRI images, classifying PD against atypical parkinsonism and standard healthy controls, and obtained an accuracy of 85.70% [17]. Yanhao Xiong et al. applied a sparse autoencoder for deep feature extraction from vocal data to classify PD. They used six supervised machine learning
algorithms, out of which linear discriminant analysis (LDA) performed best for classification between normal controls and PD, obtaining an accuracy of 91.00% [3]. Veronica Munoz Ramírez applied an anomaly detection approach which showed that autoencoders (AEs) provide effective and efficient anomaly scoring to discriminate de novo PD patients from normal controls using quantitative MRI data [18]. She implemented it using a spatial variational autoencoder and a dense variational autoencoder, in which the voxel-level accuracy was over 95.00%, and the quantile abnormality threshold was generally higher in PD patients than in healthy controls [18]. Given the related work mentioned above, the proposed work differs in two main aspects: firstly, it applies to all possible cohorts of PD, not just a few like the other works; secondly, it offers better accuracy than the others.
3 Materials and Method 3.1 Dataset The dataset used in this proposed work is obtained from the Parkinson's Progression Markers Initiative (PPMI) database [19]. PPMI is one of the largest worldwide collaborations of research participants aimed at identifying and validating diagnostic imaging biomarkers to improve PD treatment through early diagnosis [5]. It follows a standardized protocol for the acquisition and analysis of the data, which promotes the overall comprehension of PD [5]. Since MRI images are considered to be the best biomarkers for PD diagnosis, they are used in the proposed work [5]. The MRI images are acquired using a T1-weighted, 2D (MPRAGE) imaging protocol; such brain MRI images consist of 24 sagittal (vertical view, front to back) cuts beginning at the right-hand side of the human brain and moving to the left side. The pulse sequence is inversion recovery (IR), an MRI protocol in which the nuclear magnetization is inverted at time TI before the regular imaging sequence [19]. The IR sequence is specified in terms of three parameters: inversion time (TI), repetition time (TR), and echo time (TE). Slice thickness = 1.0 mm, TE = 2.9 ms, TI = 900.0 ms, TR = 2300.0 ms [19]. The size of the images varies depending on the machines used to produce the MRI images. The proposed work has used the different cohorts of Parkinson's disease (PD) MRI images, including Normal Control, GenCohortPD, GenRegPD, GenRegUnaff, Phantom PD, SWEDD, Prodromal, and GenCohort Unaff, with 50 samples each.
3.2 Preprocessing Table 1 gives the details of the different cohorts of PD used in the proposed work. As the size of an MRI image depends on the type of MRI machine used, there is no standard image size. In preprocessing, we therefore normalize the size of the MRI images of the different cohorts of PD to maintain image standardization. The proposed work considers 500 MRI images from the PPMI dataset altogether. We first preprocess the original MRI images to 192 × 192 pixels to standardize the input MRI images throughout the experiment. After preprocessing, each MRI image is of size 192 * 192 = 36,864 features, which is given as input to the sparse autoencoder (SAE) and the convolutional autoencoder (CAE), as sketched below.
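A minimal preprocessing sketch matching the 192 × 192 resizing described above might look as follows; the grayscale conversion is an assumption, and the [0, 1] scaling matches the rescaling step described later in the methodology.

```python
import numpy as np
from PIL import Image

def preprocess(path):
    """Resize an MRI slice to 192 x 192 pixels, scale to [0, 1], and
    flatten to a 36,864-dimensional feature vector for the SAE;
    the CAE consumes the same image reshaped to (192, 192, 1)."""
    img = Image.open(path).convert("L")   # single-channel grayscale (assumed)
    img = img.resize((192, 192))
    arr = np.asarray(img, dtype=np.float32) / 255.0
    return arr.reshape(-1)                # 192 * 192 = 36,864 features
```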
3.3 Methodology A detailed flow structure of the proposed work is shown in Fig. 1. For the sparse autoencoder, the current cost is first computed based on the values of each weight, after which the network error is calculated over the training set. Every single training example is evaluated and its resulting neuron activations are stored; from these activation values, the cost and gradient are calculated. The average squared difference between the network output and the training output (the mean squared error) is computed and added to the regularization cost. Then, the sparsity penalty cost and its gradients are calculated. Finally, the accuracy is obtained by checking the reconstruction loss values. In the convolutional autoencoder, the input images are of size 192 * 192 * 1, i.e. 36,864-dimensional vectors; each encoded image matrix is converted into an array and rescaled to values between 0 and 1. A batch size of 128 was used, and the network was trained for 100 epochs.

3.3.1 Encoder. The architecture details of the encoder are as follows. First layer: 32 filters of 3 × 3 size, followed by a down-sampling (max-pooling) layer. Second layer: 64 filters of 3 × 3 size, followed by another down-sampling layer. Final layer: 128 filters of 3 × 3 size.

3.3.2 Decoder. The architecture details of the decoder are as follows. First layer: 128 filters of 3 × 3 size, followed by an up-sampling layer. Second layer: 64 filters of 3 × 3 size, followed by another up-sampling layer. Final layer: 1 filter of 3 × 3 size.
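The encoder/decoder specification above translates directly into Keras as sketched below; in this sketch the 'same' padding, the mean-squared-error loss, and the 2 × 2 pooling/up-sampling windows are assumptions not stated explicitly in the text, while the filter counts, kernel sizes, sigmoid output, Adam optimizer, batch size, and epoch count follow the paper.

```python
from tensorflow.keras import layers, models

inp = layers.Input(shape=(192, 192, 1))
# Encoder: 32 -> 64 -> 128 filters of 3 x 3, with max-pooling after the first two.
x = layers.Conv2D(32, (3, 3), activation="relu", padding="same")(inp)
x = layers.MaxPooling2D((2, 2), padding="same")(x)          # 192 -> 96
x = layers.Conv2D(64, (3, 3), activation="relu", padding="same")(x)
x = layers.MaxPooling2D((2, 2), padding="same")(x)          # 96 -> 48
encoded = layers.Conv2D(128, (3, 3), activation="relu", padding="same")(x)
# Decoder: 128 -> 64 filters with up-sampling, then a single-filter output layer.
x = layers.Conv2D(128, (3, 3), activation="relu", padding="same")(encoded)
x = layers.UpSampling2D((2, 2))(x)                           # 48 -> 96
x = layers.Conv2D(64, (3, 3), activation="relu", padding="same")(x)
x = layers.UpSampling2D((2, 2))(x)                           # 96 -> 192
decoded = layers.Conv2D(1, (3, 3), activation="sigmoid", padding="same")(x)

cae = models.Model(inp, decoded)
cae.compile(optimizer="adam", loss="mse")
# cae.fit(x_train, x_train, epochs=100, batch_size=128)
```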
Both models were optimized using the Adam optimizer, and the loss for every batch was computed pixel by pixel. Finally, each model's reconstruction loss after 100 epochs was checked. After that, model selection was done based on
Table 1 The groups of subjects used in the study [19]; the original table also shows sample original (sagittal) images for each cohort

S. no | Different cohorts of PD | Explanation
1 | Normal Control | The patients are completely normal
2 | GenCohortPD | The patients suffer from asymmetric resting tremor and a genetic mutation
3 | GenRegPD | The patients have family members suffering from PD-associated mutations
4 | GenRegUnaff | The patients are not affected; they have family members with PD-associated mutations
5 | Phantom PD | The patients suffer from hallucinations
6 | Parkinson's disease (PD) | The patients have Parkinson's disease with symptoms such as tremor, bradykinesia, rigidity, postural instability, and speech and writing changes
7 | Swedd | The patients have DaTscans that do not show evidence of a dopaminergic deficit; this distinguishes such cases from PD
8 | Prodromal | The patients do not have Parkinson's disease but have a diagnosis of hyposmia
9 | GenCohort Unaff | The patients have resting tremor, bradykinesia, and rigidity; they must have either resting tremor or bradykinesia
10 | De novo PD | The patients suffer from PD but are not yet taking prescribed PD-specific medications
Fig. 1 Proposed work (flowchart: Input MRI brain images → Data pre-processing → Sparse autoencoder / Convolutional autoencoder → Model selection → Best feature extractor)
each model's reconstruction loss, which finally indicates the best feature extractor. It was found that the convolutional autoencoder (CAE) performed better than the sparse autoencoder (SAE).
4 Experimental Setup and Results The proposed work was implemented using Keras 2.4.3, trained on an NVIDIA GeForce RTX 2080 Ti GPU (4 GB) with batches of 50 images. Keras is a simple and powerful open-source library that provides a Python interface for the TensorFlow library and is well suited to deep learning models. Keras is considered one of the best high-level neural network APIs and supports multiple backends for neural network computation. The biggest reason to use Keras is that it is user-friendly. Keras has the advantages of broad adoption, supports a wide range of production deployment options, and integrates with at least five backend engines (TensorFlow, CNTK, Theano, MXNet, and PlaidML), with strong GPU support and distributed training. Keras helps reduce cognitive load by offering consistent and simple APIs, minimizing the user actions required for common use cases, and providing clear and actionable feedback on user errors. For the sparse and convolutional autoencoder models, the nonlinear activation function used in each layer was the rectified linear unit (ReLU), except for the last layer, where a sigmoid was employed so that output pixels are normalized to [0, 1]. The loss functions were optimized using the Adam optimizer.
The convolutional autoencoder's encoder consisted of 2D convolutional layers and 2D max-pooling (down-sampling) layers, while the decoder consisted of 2D convolutional layers and 2D up-sampling layers.
4.1 Comparison of Performance of Sparse and Convolutional Autoencoders Figure 2 shows the accuracy achieved for the different cohorts of PD with the sparse and convolutional autoencoders. It indicates that the convolutional autoencoder performed better than the sparse autoencoder. On average, accuracies of 68% and above were found for the sparse model, whereas 89% and above were reached by the convolutional autoencoder, with a maximum reconstruction loss of 0.98. The convolutional autoencoder, unlike the sparse autoencoder, does not use any sparsity constraint while minimizing the mean squared error in the hidden layer during training; in the sparse autoencoder model, a sparsity constraint is introduced on the hidden layers through additional terms in the loss function during training, either by comparing the probability distribution of hidden unit activations with some low desired value or by manually zeroing all but the strongest hidden unit activations. Due to its convolutional nature, the convolutional autoencoder scales well to high-dimensional images, which the sparse autoencoder does not. The convolutional autoencoder can automatically remove noise or reconstruct missing parts, which the sparse autoencoder cannot. In the sparse autoencoder, the individual nodes of a trained model that activate are data-dependent, so different inputs result in different activations, which is not the case in the convolutional autoencoder. Hence, the feature extraction performance of the convolutional autoencoder is the best compared with the sparse autoencoder.

Fig. 2 Comparison of performance between sparse and convolutional autoencoders (bar chart, accuracy 0.00-100.00%, comparing the sparse and convolutional autoencoders across cohorts)
Fig. 3 Convolutional autoencoder accuracy at different epoch values [25, 50, 100] (y-axis: accuracy, 80.00-95.00%)
4.2 Variation in Accuracy for Convolutional Autoencoder Figure 3 shows the results of three different iterations using the convolutional autoencoder. It was observed that the convolutional autoencoder performs best at 100 epochs, achieving an accuracy of 91.34% with a reconstruction loss of 0.98.
4.3 Comparative Study of Parkinson's Disease Using MRI Images Table 2 compares related works and the proposed work in terms of the different cohorts of PD used. Six works other than ours are used for this purpose. All of those works use one or a couple of cohorts, whereas the proposed work uses cohorts like Normal Control, GenCohort PD, GenRegPD, GenRegUnaff, Phantom, Parkinson's disease, Swedd, Prodromal, and GenCohort. The proposed work is compared with the other works based on the ML and DL techniques used, the types of data used, and the cohorts dealt with, to study the effectiveness of each against the proposed work.
4.4 Comparison Concerning the Accuracy Obtained Figure 4 compares the accuracy provided by similar works and the proposed work, using the same seven works listed in Table 2. As the figure shows, the proposed method provided an accuracy of 91.34%, whereas the best of the rest gave an accuracy of 90.00%.
5 Conclusions The paper proposes an autoencoder-based approach for feature extraction of PD using MRI data. It uses sparse and convolutional autoencoders for this purpose.
Table 2 Comparison of the proposed work with similar works in terms of different cohorts of PD

Author | Methods employed | Data used | Different cohorts used in PD
Schwarz et al. [8] | CNN and LSTM (segmentation) | MRI | Normal Control and PD
Rana et al. [2] | SVM | MRI | Normal Control and PD
Shinde et al. [17] | CNN | MRI | Normal Control and PD
Sivaranjini and Sujatha [4] | CNN | Neuromelanin-sensitive magnetic resonance imaging (NMS-MRI) | Normal Control and PD
Salvatore et al. [16] | SVM | MRI | Parkinson's disease (PD), Normal Control, and progressive supranuclear palsy (PSP)
Xiong and Lu [3] | Sparse autoencoders | MRI | Swedd, PD
Proposed work | Sparse autoencoders and convolutional autoencoders | MRI | Normal Control, GenCohort Parkinson's disease (PD), GenRegPD, GenRegUnaff, Phantom, PD, Swedd, Prodromal, De novo PD, and GenCohort
Fig. 4 Comparison between the proposed work and work done till now on Parkinson's disease (bar chart of accuracies for Schwarz et al., Rana and Juneja, Sumeet Shinde, Sivaranjini, Salvatore, Yanhao Xiong and Yaohua Lu, and the proposed work; the plotted values are 0.857, 0.8667, 0.8734, 0.8856, 0.889, 0.90, and 0.9134, with the proposed work highest at 0.9134)
The proposed work has shown that the convolutional autoencoder is the more successful feature extractor for different cohorts of Parkinson's disease (PD), as it provided an accuracy of 91.34%, outperforming the sparse autoencoder. The findings of this work can be used in future analyses aimed at the early detection of PD.
References
1. Sakar, C.O., Serbes, G., Gunduz, A., Tunc, H.C., Nizam, H., Sakar, B.E., Tutuncu, M., Aydin, T., Isenkul, M.E., Apaydin, H.: A comparative analysis of speech signal processing algorithms for Parkinson's disease classification and the use of the tunable Q-factor wavelet transform. Appl. Soft Comput. J. 74, 255–263 (2019). https://doi.org/10.1016/j.asoc.2018.10.022
2. Rana, B., Juneja, A., Saxena, M., Gudwani, S., Senthil Kumaran, S., Agrawal, R.K., Behari, M.: Regions-of-interest based automated diagnosis of Parkinson's disease using T1-weighted MRI. Expert Syst. Appl. 42(9), 4506–4516 (2015). https://doi.org/10.1016/j.eswa.2015.01.062
3. Xiong, Y., Lu, Y.: Deep feature extraction from the vocal vectors using sparse autoencoders for Parkinson's classification. IEEE Access 8, 27821–27830 (2020). https://doi.org/10.1109/ACCESS.2020.2968177
4. Sivaranjini, S., Sujatha, C.M.: Deep learning-based diagnosis of Parkinson's disease using convolutional neural network. Multimedia Tools Appl. 79(21–22), 15467–15479 (2020). https://doi.org/10.1007/s11042-019-7469-8
5. Lei, H., Zhao, Y., Wen, Y., Luo, Q., Cai, Y., Liu, G., Lei, B.: Sparse feature learning for multi-class Parkinson's disease classification. Technol. Health Care 26(S1), S193–S203 (2018). https://doi.org/10.3233/THC-174548
6. Sharma, A., Giri, R.N.: Automatic recognition of Parkinson's disease via artificial neural network and support vector machine. Int. J. Innov. Technol. Explor. Eng. (IJITEE) 4(3), 35–41 (2014). http://www.ijitee.org/attachments/File/v4i3/C1768084314.pdf
7. Gottapu, R.D., Dagli, C.H.: Analysis of Parkinson's disease data. Procedia Comput. Sci. 140, 334–341 (2018). https://doi.org/10.1016/j.procs.2018.10.306
8. Schwarz, S.T., Afzal, M., Morgan, P.S., Bajaj, N., Gowland, P.A., Auer, D.P.: The "swallow tail" appearance of the healthy nigrosome - a new accurate test of Parkinson's disease: a case-control and retrospective cross-sectional MRI study at 3T. PLoS ONE 9(4) (2014). https://doi.org/10.1371/journal.pone.0093814
9. Gunduz, H.: Deep learning-based Parkinson's disease classification using vocal feature sets. IEEE Access 7 (2019). https://doi.org/10.1109/ACCESS.2019.2936564
10. Tsanas, A., Little, M., McSharry, P., Ramig, L.: Accurate telemonitoring of Parkinson's disease progression by non-invasive speech tests. Nature Proc. (2009). https://doi.org/10.1038/npre.2009.3920.1
11. Feature Embedding, https://cloud.google.com/solutions/machine-learning/overview-extracting-and-serving-feature-embeddings-for-machine-learning. Last accessed 22 January 2021
12. Autoencoders, https://www.mygreatlearning.com/blog/autoencoder/. Last accessed 8 May 2020
13. Sakar, B., Serbes, G., Sakar, C.: Analyzing the effectiveness of vocal features in early telediagnosis of Parkinson's disease. PLoS ONE 12(8), e0182428 (2017)
14. Tsanas, A., Little, M., McSharry, P., Ramig, L.: Accurate telemonitoring of Parkinson's disease progression by non-invasive speech tests. Nat. Proc. (2009). https://doi.org/10.1038/npre.2009.3920.1
15. Karimi Rouzbahani, H., Daliri, M.: Diagnosis of Parkinson's disease in human using voice signals. Basic Clin. Neurosci. 2(3), 12–20 (2011)
16. Salvatore, C., Cerasa, A., Castiglioni, I., Gallivanone, F., Augimeri, A., Lopez, M., Arabia, G., Morelli, M., Gilardi, M.C., Quattrone, A.: Machine learning on brain MRI data for differential diagnosis of Parkinson's disease and Progressive Supranuclear Palsy. J. Neurosci. Methods 222, 230–237 (2014). https://doi.org/10.1016/j.jneumeth.2013.11.016
17. Shinde, S., Prasad, S., Saboo, Y., Kaushick, R., Saini, J., Pal, P.K., Ingalhalikar, M.: Predictive markers for Parkinson's disease using deep neural nets on neuromelanin sensitive MRI. NeuroImage Clin. 22, 101748 (2019). https://doi.org/10.1016/j.nicl.2019.101748
18. Ramirez, V.M., Kmetzsch, V., Forbes, F., Dojat, M.: Deep learning models to study the early stages of Parkinson's disease. In: Proceedings of the International Symposium on Biomedical Imaging (ISBI), pp. 1534–1537 (2020). https://doi.org/10.1109/ISBI45749.2020.9098529
19. PPMI Dataset details, https://www.ppmi-info.org/study-design/study-cohorts/. Last accessed 21 January 2021
Combination of Expression Data and Predictive Modelling for Polycystic Ovary Disease and Assessing Risk of Infertility Using Machine Learning Techniques Sakshi Vats, Abhishek Sengupta, Ankur Chaurasia, and Priyanka Narad Abstract Polycystic ovary disease is a reproductive disorder which may put a woman at greater risk of infertility. Polycystic ovary disease can also lead to other diseases such as hyperandrogenism, type-2 diabetes, and cardiovascular disease. This research study combines expression analysis using Affymetrix data with classification of the genes that are most altered/differentially expressed in polycystic ovary disease, based on gene expression data (GDS4399) retrieved from the NCBI Gene Expression Omnibus database. The paper includes differential gene expression and gene ontology analysis. Out of 54,675 genes, 31,445 genes were differentially expressed. To attain a thorough knowledge of the mechanisms of the differentially expressed genes, and for information on biological process, cellular component, and molecular function, we performed gene ontology analysis for the top 20 differentially expressed genes using the DAVID functional annotation tool. The list of differentially expressed genes was further used for classification modelling to predict polycystic ovary disease using algorithms like penalized logistic regression, random forest, diagonal discriminant analysis, and probabilistic nearest neighbours. A gene selection technique, the student's t-test, was performed to improve prediction/classification performance. Comparatively, the penalized logistic regression model achieved an average probability of correct prediction of 0.768 (76.8%) with a mean AUC score of 1. Keywords Polycystic ovary disease · Artificial intelligence · Machine learning · Microarray data · Differential gene expression analysis · Prediction · Gene ontology · Gene classification
1 Introduction Polycystic ovary disease is one of the major factors that affects the fertility of a woman [1, 2]. PCOD can cause reproductive problems like metabolic and hormonal S. Vats · A. Sengupta · A. Chaurasia · P. Narad (B) Systems Biology and Data Analytics Research Lab, Amity Institute of Biotechnology, Amity University Uttar Pradesh, Noida, Uttar Pradesh, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 S. Roy et al. (eds.), Innovations in Computational Intelligence and Computer Vision, Advances in Intelligent Systems and Computing 1424, https://doi.org/10.1007/978-981-19-0475-2_48
imbalance [3, 4]. About 8–13% of reproductive-aged women are affected by PCOD [4]. It gives rise to irregular menstruation and elevated androgen hormone levels [1, 3]. It also causes skin problems (e.g. darkened skin) and hyperandrogenism [3]. Patients with PCOD are known to develop a large number of small cysts (about 8 mm) in their ovaries, where eggs are produced. This creates difficulty in the release of eggs, thus resulting in anovulation [1, 2, 4]. The most usual symptoms of PCOD are elevated luteinizing hormone (LH) and insulin levels and reduced follicle stimulating hormone (FSH) levels [3, 4]. PCOD even increases the risk of other comorbid conditions like diabetes and heart conditions [5]. It also affects lifestyle and can lead to frequent anxiety, mood swings, and eating problems [6]. Metabolism, clinical history, and molecular aspects of PCOD are considered for analysis of the disease [3]. Better diagnosis of PCOD could help in the proper management and handling of the disease and, in turn, be helpful in treating the infertility problem in women. In this research work, combined microarray data analysis and gene classification for polycystic ovary disease have been performed based on gene expression data (GDS4399) [7] related to PCOD collected from the Gene Expression Omnibus (GEO). This paper includes differential gene expression (DGE) and gene ontology (GO) analysis. Out of 54,675 genes, 31,445 genes are differentially expressed genes (DEGs). To acquire more detailed information about the differentially expressed genes (DEGs), such as molecular function (MF), biological process (BP), and cellular component (CC), we performed gene ontology analysis on the top 20 DEGs using the DAVID functional annotation tool. These genes are used for classification modelling to predict PCOD using various machine learning (ML) techniques such as penalized logistic regression, random forest, diagonal discriminant analysis, and probabilistic nearest neighbours. A gene selection technique, the student's t-test, was also used to improve prediction/classification performance. The paper is organized as follows. Section 2 provides literature about previous studies that have been performed for the prediction of polycystic ovary disease using artificial intelligence (AI)/machine learning (ML) techniques. Section 3 presents the proposed methodology of the research work. Section 4 shows and discusses the results of the research study. The conclusion of the work is given in Sect. 5.
2 Related Works In the past, numerous studies have been reported in the literature for the diagnosis of polycystic ovary disease using artificial intelligence (AI)/machine learning (ML) techniques. In one study, a PCOD diagnostic model was constructed by applying a random forest classifier and an artificial neural network to microarray data (achieving AUC = 0.72) and RNA-seq data (achieving AUC = 0.64) [8]. An automated image-based diagnostic system has also been built for the identification of follicles using an object detection algorithm, and the diagnostic system achieved a recognition rate of 85% [9]. Using ML algorithms, a PCOD predictive model was applied based
on PCOD clinical features, in which the random forest model performed best with 96% accuracy [10]. Another study utilized PCOD-based clinical data and performed a comparative analysis between a K-nearest neighbour and a logistic regression (LR) estimator, where LR was the best with an F1 score of 0.92 [11]. In another study, a predictive model was built for the prediction of polycystic ovary morphology (PCOM) based on pelvic ultrasound data using a gradient boosting classifier and a rule-based classifier [12]. A further automated system was proposed for the diagnosis of PCOD based on potential biomarkers, in which a Naïve Bayes model (with an accuracy of 93.93%) performed better than logistic regression (with an accuracy of 91.04%) [13].
3 Methodology The analysis of gene expression data can be a beneficial approach for finding significant genes that give a better picture for predictive modelling of a disease. The proposed work in this paper is shown in Fig. 1. The work was performed on the RStudio platform.
3.1 Data Collection The microarray data (GDS4399) [7] are retrieved from the Gene Expression Omnibus (GEO) and contain gene expression data of granulosa cells collected from patients having PCOD and undergoing an in-vitro fertilization (IVF) procedure. The dataset consists of 10 samples, of which 7 were PCOD patient samples and 3 were control samples.
3.2 Data Normalization The normalization of data was performed using Log2 transformation. Normalized data plot is shown in Fig. 2.
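The paper performs this step in R; purely as an equivalent illustration, a log2 transformation of a genes-by-samples expression matrix can be sketched in Python, where the +1 offset to avoid taking the log of zero is an assumption not stated in the paper.

```python
import numpy as np

def log2_normalize(expr):
    """Log2-transform a genes-x-samples expression matrix; the +1 offset
    (an assumption) avoids log of zero for unexpressed probes."""
    return np.log2(np.asarray(expr, dtype=float) + 1.0)

expr = [[120.0, 95.0, 300.0], [15.0, 22.0, 8.0]]  # toy 2-gene, 3-sample matrix
print(log2_normalize(expr))
```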
3.3 Differential Gene Expression (DGE) Analysis DGE analysis was done on the given data using the Limma R package. Based on a cutoff (median expression level), differentially expressed genes (DEGs) were obtained.
Fig. 1 Methodology of research study (flowchart: Data Collection (GDS4399) → Data Normalization → Differential Gene Expression → Gene Ontology → Gene Selection → Machine Learning Techniques → Model Performance Evaluation)
3.4 Gene Ontology (GO) To understand the mechanisms represented by these differentially expressed genes (DEGs), in terms of biological process (BP), molecular function (MF), and cellular component (CC), we performed gene ontology (GO) analysis for the top DEGs using the DAVID functional annotation tool. DAVID is an online application tool that provides integrated functional genomic annotation analysis programmes for gene ontology, gene classification, and pathways. After the gene ontology (GO) analysis, a gene selection approach, the student's t-test, was applied to the DEGs list, and the top 10 relevant genes were extracted for the prediction of polycystic ovary disease.
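The paper's gene selection uses a student's t-test over the DEGs; an equivalent illustrative sketch in Python (the paper itself works in R) might look as follows, with the toy data shapes chosen only for demonstration.

```python
import numpy as np
from scipy import stats

def select_top_genes(expr, labels, k=10):
    """Rank genes by a two-sample t-test between PCOD and control samples
    and keep the k smallest p-values (the paper keeps the top 10).
    expr: genes x samples matrix; labels: 1 = PCOD, 0 = control."""
    labels = np.asarray(labels)
    pcod, ctrl = expr[:, labels == 1], expr[:, labels == 0]
    _, pvals = stats.ttest_ind(pcod, ctrl, axis=1)
    return np.argsort(pvals)[:k]

rng = np.random.default_rng(0)
expr = rng.normal(size=(100, 10))   # toy matrix: 100 genes, 10 samples
labels = [1] * 7 + [0] * 3          # 7 PCOD, 3 controls, as in GDS4399
print(select_top_genes(expr, labels))
```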
Fig. 2 Normalized data plot (y-axis: log2 values; x-axis: gene expression data)
3.5 Machine Learning Techniques A number of widely used microarray-based machine learning (ML) techniques were applied for gene classification of PCOD, such as penalized logistic regression, random forest, diagonal discriminant analysis, and probabilistic nearest neighbours, using CMA (a Bioconductor R package). Microarray-based machine learning (ML) techniques ease predictive modelling based on gene expression data and mitigate the issues that arise with gene expression data, such as the small sample size and overfitting problems. The different classifiers are trained on learning samples obtained using the Monte Carlo cross-validation (MCCV) method.
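The CMA package handles MCCV and the classifiers in R; purely as an illustration of the same scheme, the following Python sketch runs a penalized (L2) logistic regression over repeated random splits. The number of splits and the test fraction are assumptions, not values from the paper.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import ShuffleSplit
from sklearn.metrics import roc_auc_score

# Monte Carlo cross-validation: repeated random train/test splits.
X = np.random.default_rng(1).normal(size=(10, 10))  # 10 samples x 10 selected genes
y = np.array([1] * 7 + [0] * 3)                     # 7 PCOD, 3 controls

mccv = ShuffleSplit(n_splits=25, test_size=0.3, random_state=1)
aucs = []
for train_idx, test_idx in mccv.split(X):
    clf = LogisticRegression(penalty="l2", C=1.0, max_iter=1000)  # penalized LR
    clf.fit(X[train_idx], y[train_idx])
    prob = clf.predict_proba(X[test_idx])[:, 1]
    if len(set(y[test_idx])) == 2:   # AUC needs both classes in the test split
        aucs.append(roc_auc_score(y[test_idx], prob))
print("mean AUC over MCCV splits:", np.mean(aucs))
```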
3.6 Model Performance Evaluation The performance of the different classifiers was evaluated based on metrics such as the average probability of correct prediction, AUC score, sensitivity score (true positives), specificity score (true negatives), misclassification score, and Brier score (a lower Brier score signifies a more precise prediction).
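These metrics can be computed as sketched below; this Python fragment is illustrative rather than the paper's CMA-based evaluation, and the 0.5 decision threshold is an assumption.

```python
import numpy as np
from sklearn.metrics import brier_score_loss, confusion_matrix, roc_auc_score

def evaluate(y_true, prob, threshold=0.5):
    """Compute the metrics listed above for one classifier.
    prob is the predicted probability of the positive (PCOD) class."""
    y_true, prob = np.asarray(y_true), np.asarray(prob)
    pred = (prob >= threshold).astype(int)
    tn, fp, fn, tp = confusion_matrix(y_true, pred, labels=[0, 1]).ravel()
    prob_true_class = np.where(y_true == 1, prob, 1 - prob)
    return {
        "avg_prob_correct": prob_true_class.mean(),
        "auc": roc_auc_score(y_true, prob),
        "sensitivity": tp / (tp + fn),           # true positive rate
        "specificity": tn / (tn + fp),           # true negative rate
        "misclassification": (fp + fn) / len(y_true),
        "brier": brier_score_loss(y_true, prob), # lower = more precise
    }

print(evaluate([1, 1, 1, 0, 0], [0.9, 0.8, 0.6, 0.2, 0.4]))
```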
4 Results and Discussion From differential gene expression (DGE) analysis, we identified that out of 54,674 genes, 31,445 upregulated genes were selected as differentially expressed genes (DEGs) with adjusted p-value