Algorithms for Intelligent Systems Series Editors: Jagdish Chand Bansal · Kusum Deep · Atulya K. Nagar
Mohammad Shorif Uddin Prashant Kumar Jamwal Jagdish Chand Bansal Editors
Proceedings of International Joint Conference on Advances in Computational Intelligence IJCACI 2021
Algorithms for Intelligent Systems Series Editors Jagdish Chand Bansal, Department of Mathematics, South Asian University, New Delhi, Delhi, India Kusum Deep, Department of Mathematics, Indian Institute of Technology Roorkee, Roorkee, Uttarakhand, India Atulya K. Nagar, School of Mathematics, Computer Science and Engineering, Liverpool Hope University, Liverpool, UK
This book series publishes research on the analysis and development of algorithms for intelligent systems with their applications to various real world problems. It covers research related to autonomous agents, multi-agent systems, behavioral modeling, reinforcement learning, game theory, mechanism design, machine learning, meta-heuristic search, optimization, planning and scheduling, artificial neural networks, evolutionary computation, swarm intelligence and other algorithms for intelligent systems. The book series includes recent advancements, modification and applications of the artificial neural networks, evolutionary computation, swarm intelligence, artificial immune systems, fuzzy system, autonomous and multi agent systems, machine learning and other intelligent systems related areas. The material will be beneficial for the graduate students, post-graduate students as well as the researchers who want a broader view of advances in algorithms for intelligent systems. The contents will also be useful to the researchers from other fields who have no knowledge of the power of intelligent systems, e.g. the researchers in the field of bioinformatics, biochemists, mechanical and chemical engineers, economists, musicians and medical practitioners. The series publishes monographs, edited volumes, advanced textbooks and selected proceedings. Indexed by zbMATH. All books published in the series are submitted for consideration in Web of Science.
More information about this series at https://link.springer.com/bookseries/16171
Editors Mohammad Shorif Uddin Department of Computer Science and Engineering Jahangirnagar University Dhaka, Bangladesh
Prashant Kumar Jamwal Nazarbayev University Nur-Sultan, Kazakhstan
Jagdish Chand Bansal Department of Mathematics South Asian University New Delhi, India
ISSN 2524-7565 ISSN 2524-7573 (electronic) Algorithms for Intelligent Systems ISBN 978-981-19-0331-1 ISBN 978-981-19-0332-8 (eBook) https://doi.org/10.1007/978-981-19-0332-8 © The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations. This Springer imprint is published by the registered company Springer Nature Singapore Pte Ltd. The registered company address is: 152 Beach Road, #21-01/04 Gateway East, Singapore 189721, Singapore
Preface
This book contains outstanding research papers presented at the International Joint Conference on Advances in Computational Intelligence (IJCACI 2021). IJCACI 2021 was jointly organized by South Asian University (SAU), India, and Jahangirnagar University (JU), Bangladesh, under the technical co-sponsorship of the Soft Computing Research Society, India. It was held on October 23–24, 2021, at South Asian University (SAU), India, in virtual mode due to the COVID-19 pandemic. The conference was conceived as a platform for disseminating and exchanging the ideas, concepts, and results of researchers from academia and industry, and for developing a comprehensive understanding of the challenges of advancing computational intelligence. This book will help in strengthening congenial networking between academia and industry. The conference focused on collective intelligence, soft computing, optimization, cloud computing, machine learning, intelligent software, robotics, data science, data security, big data analytics, and signal and natural language processing. This conference builds on four earlier events: (1) the International Workshop on Computational Intelligence (IWCI 2016), held on December 12–13, 2016, at JU, Dhaka, Bangladesh, in collaboration with SAU, India, under the technical co-sponsorship of the IEEE Bangladesh Section; (2) the International Joint Conference on Computational Intelligence (IJCCI 2018), held on December 14–15, 2018, at Daffodil International University (DIU), in collaboration with JU, Bangladesh, and SAU, India; (3) the International Joint Conference on Computational Intelligence (IJCCI 2019), held on October 25–26, 2019, at the University of Liberal Arts Bangladesh (ULAB), in collaboration with JU, Bangladesh, and SAU, India; and (4) the International Joint Conference on Advances in Computational Intelligence (IJCACI 2020), held on November 20–21, 2020, at DIU, in collaboration with JU, Bangladesh, and SAU, India. All accepted and presented papers of IWCI 2016 are in the IEEE Xplore Digital Library, and those of IJCCI 2018, IJCCI 2019, and IJCACI 2020 are in the Springer Nature book series Algorithms for Intelligent Systems (AIS).
We have tried our best to enrich the quality of IJCACI 2021 through a stringent and careful peer-review process. IJCACI 2021 received a significant number of technical contributions from distinguished participants at home and abroad. After a very stringent peer-review process, only 56 high-quality papers were accepted for presentation, and the final proceedings contain only 47 papers after careful selection. In fact, this book presents novel contributions in areas of computational intelligence, and it serves as reference material for advanced research.
Dhaka, Bangladesh: Mohammad Shorif Uddin
Nur-Sultan, Kazakhstan: Prashant Kumar Jamwal
New Delhi, India: Jagdish Chand Bansal
Contents
1 Performance Analysis of Secure Hybrid Approach for Sharing Data Securely in Vehicular Adhoc Network . . . 1 Atul B. Kathole and Dinesh N. Chaudhari
2 Particle Swarm Optimization and Computational Algorithm Based Weighted Fuzzy Time Series Forecasting Method . . . 9 Shivani Pant and Sanjay Kumar
3 Assessing Usability of Mobile Applications Developed for Autistic Users through Heuristic and Semiotic Evaluation . . . 25 Sayma Alam Suha, Muhammad Nazrul Islam, Shammi Akter, Milton Chandro Bhowmick, and Rathin Halder
4 Blockchain Implementations and Use Cases for Inhibiting COVID-19 Pandemic . . . 41 Amirul Azim and Muhammad Nazrul Islam
5 Ant Colony Optimization to Solve the Rescue Problem as a Vehicle Routing Problem with Hard Time Windows . . . 53 Mélanie Suppan, Thomas Hanne, and Rolf Dornberger
6 Applying Opinion Leaders to Investigate the Best-of-n Decision Problem in Decentralized Systems . . . 67 Jan Kruta, Urs Känel, Rolf Dornberger, and Thomas Hanne
7 Pathfinding in the Paparazzi Problem Comparing Different Distance Measures . . . 81 Kevin Schär, Philippe Schwank, Rolf Dornberger, and Thomas Hanne
8 Empirical Evaluation of Motion Cue for Passive-Blind Video Tamper Detection Using Optical Flow Technique . . . 97 Poonam Kumari and Mandeep Kaur
9 Quantifying Changes in Sundarbans Mangrove Forest Through GEE Cloud Computing Approach . . . 113 Chiranjit Singha and Kishore C. Swain
10 Metaheuristics and Hyper-heuristics Based on Evolutionary Algorithms for Software Integration Testing . . . . . . . . . . . . . . . . . . 131 Valdivino Alexandre de Santiago Júnior and Camila Pereira Sales 11 Towards a Static and Dynamic Features-Based Framework for Android Vulnerabilities Detection . . . . . . . . . . . . . . . . . . . . . . . 153 Jigna Rathod and Dharmendra Bhatti 12 A Comparative Study of Existing Knowledge Based Techniques for Word Sense Disambiguation . . . . . . . . . . . . . . . . . . . . . . . . . . . 167 Aarti Purohit and Kuldeep Kumar Yogi 13 An Insider Threat Detection Model Using One-Hot Encoding and Near-Miss Under-Sampling Techniques . . . . . . . . . . . . . . . . . . 183 Rakan A. Alsowail 14 A Review on Unbalanced Data Classification . . . . . . . . . . . . . . . . . 197 Arvind Kumar, Shivani Goel, Nishant Sinha, and Arpit Bhardwaj 15 Towards Developing a Mobile Application for Detecting Intoxicated People through Interactive UIs . . . . . . . . . . . . . . . . . . . 209 Ifath Ara, Tasneem Mubashshira, Fariha Fardina Amin, Nafiz Imtiaz Khan, and Muhammad Nazrul Islam 16 Power Control of a Grid Connected Hybrid Fuel Cell, Solar and Wind Energy Conversion Systems by Using Fuzzy MPPT Technique . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 223 Satyabrata Sahoo and K. Teja 17 Novel Harris Hawks Optimization and Deep Neural Network Approach for Intrusion Detection . . . . . . . . . . . . . . . . . . . . . . . . . . 239 Miodrag Zivkovic, Nebojsa Bacanin, Jelena Arandjelovic, Andjela Rakic, Ivana Strumberger, K. Venkatachalam, and P. Mani Joseph 18 Random Forest Classification and Regression Models for Literacy Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 251 Mayur Pandya and Jayaraman Valadi 19 Towards Robotic Knee Arthroscopy: Spatial and Spectral Learning Model for Surgical Scene Segmentation . . . . . . . . . . . . . . 269 Shahnewaz Ali and Ajay K. Pandey
20 Opposition-Based Arithmetic Optimization Algorithm with Varying Acceleration Coefficient for Function Optimization and Control of FES System . . . . . . . . . . . . . . . . . . . . 283 Davut Izci, Serdar Ekinci, Erdal Eker, and Laith Abualigah 21 Robot Path Planning Using b Hill Climbing Grey Wolf Optimizer . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 295 Saniya Bahuguna and Ashok Pal 22 Texture Feature Analysis for Inter-Frame Video Tampering Detection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 305 Shehnaz and Mandeep Kaur 23 Computer Vision-Based Algorithms on Zebra Crossing Navigation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 319 Sumaita Binte Shorif, Sadia Afrin, Anup Majumder, and Mohammad Shorif Uddin 24 AI Based Multi Label Data Classification of Social Media . . . . . . . 329 Shashi Pal Singh, Ritu Tiwari, Sanjeev Sharma, and Ajai Kumar 25 Feature Extraction Based Landmine Detection Using Fuzzy Logic . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 343 T. Kalaichelvi and S. Ravi 26 Prediction of Water Quality Index of Ground Water Using the Artificial Neural Network and Genetic Algorithm . . . . . . . . . . . 355 Mehtab Mehdi and Bharti Sharma 27 Improving Throttled Load Balancing Algorithm in Cloud Computing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 369 Worku Wondimu Mulat, Sudhir Kumar Mohapatra, Rabinarayana Sathpathy, and Sunil Kumar Dhal 28 IOT Based Smart Parking System Using NodeMCU and Arduino . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 379 Sai Venkata Dhanush Amirineni and Rohit Sai Kasukurthi 29 Study on Intelligent Tutoring System for Learner Assessment Modeling Based on Bayesian Network . . . . . . . . . . . . . . . . . . . . . . 389 Rohit B. Kaliwal and Santosh L. Deshpande 30 Finite Element Analysis of Prosthetic Hip Implant . . . . . . . . . . . . . 399 Priyanka Jadhav, Swar Kiran, T. Tharinipriya, and T. Jayasree 31 A Comprehensive Study on Multi Document Text Summarization for Bengali Language . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 415 Nadira Anjum Nipa and Naznin Sultana
32 Deep Learning-Based Lentil Leaf Disease Classification . . . . . . . . . 427 Kaniz Fatema, Md. Awlad Hossen Rony, Kazi Mumtahina Puspita, Md. Zahid Hasan, and Mohammad Shorif Uddin 33 Framework for Diabetes Prediction Using Machine Learning Techniques Through Swarm Intelligence . . . . . . . . . . . . . . . . . . . . 445 C. Kalpana and B. Booba 34 Statistical Post-processing Approaches for OCR Texts . . . . . . . . . . 457 Quoc-Dung Nguyen, Duc-Anh Le, Nguyet-Minh Phan, Nguyet-Thuan Phan, and Pavel Kromer 35 FPGA Implementation of Masked-AE$HA-2 for Digital Signature Application . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 469 M. M. Sravani, S. Ananiah Durai, M. Prathyusha Reddy, G. Sowjanya, and Nabihah Ahmad 36 A Framework for Improving the Accuracy with Different Sampling Techniques for Detection of Malicious Insider Threat in Cloud . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 485 G. Padmavathi, D. Shanmugapriya, and S. Asha 37 Customer Churn Analysis Using Machine Learning . . . . . . . . . . . . 495 Ritika Tyagi and K. Sindhu 38 A Comparative Study of Hyperparameter Optimization Techniques for Deep Learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . 509 Anjir Ahmed Chowdhury, Argho Das, Khadija Kubra Shahjalal Hoque, and Debajyoti Karmaker 39 Fault Location on Transmission Lines of Power Systems with Integrated Solar Photovoltaic Power Sources . . . . . . . . . . . . . 523 Thanh H. Truong, Duy C. Huynh, and Matthew W. Dunnigan 40 Emergency Vehicle Detection Using Deep Convolutional Neural Network . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 535 Samiul Haque, Shayla Sharmin, and Kaushik Deb 41 Emotion Recognition from Speech Using Deep Learning . . . . . . . . 549 MD. Muhyminul Haque and Kaushik Deb 42 Secure Predictive Analysis on Heart Diseases Using Partially Homomorphic Machine Learning Model . . . . . . . . . . . . . . . . . . . . 565 M. D. Boomija and S. V. Kasmir Raja 43 Artificial Intelligent Based Control of Improved Converter for Hybrid Renewable Energy Systems . . . . . . . . . . . . . . . . . . . . . . 583 L. Chitra and K. S. Kavitha Kumari
44 Quality Analysis of PATHAO Ride-Sharing Service in Bangladesh . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 597 Md. Biplob Hosen, Nusrat Jahan Farin, Mehrin Anannya, Khadija Islam, and Mohammad Shorif Uddin 45 An Image Steganography Technique Based on Fake DNA Sequence Construction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 613 Subhadip Mukherjee, Sunita Sarkar, and Somnath Mukhopadhyay 46 Random Forest Based Legal Prediction System . . . . . . . . . . . . . . . 623 Riya Sil 47 Problem Solution Strategy Assessment of a Hybrid Knowledge-Based System in Teaching and Learning Practice . . . . 635 Kamalendu Pal Author Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 647
About the Editors
Prof. Mohammad Shorif Uddin completed his Doctor of Engineering (Ph.D.) at Kyoto Institute of Technology in 2002, Japan, Master of Technology Education at Shiga University, Japan, in 1999, Bachelor of Electrical and Electronic Engineering at Bangladesh University of Engineering and Technology (BUET) in 1991 and Master of Business Administration (MBA) from Jahangirnagar University in 2013. He began his teaching career as Lecturer in 1991 at Chittagong University of Engineering and Technology (CUET). In 1992, he joined the Computer Science and Engineering Department of Jahangirnagar University, and at present, he is a professor of this department. Besides, he is Teacher-in-Charge of the ICT Cell of Jahangirnagar University. He served as Chairman of the Computer Science and Engineering Department of Jahangirnagar University from June 2014 to June 2017 and as Adviser of ULAB from September 2009 to October 2020. He undertook postdoctoral research at Bioinformatics Institute, Singapore, Toyota Technological Institute, Japan, and Kyoto Institute of Technology, Japan, Chiba University, Japan, Bonn University, Germany, Institute of Automation, Chinese Academy of Sciences, China. His research is motivated by applications in the fields of artificial intelligence, imaging informatics, and computer vision. He holds two patents for his scientific inventions and has published more than 170 research papers in international journals and conference proceedings. He had delivered a remarkable number of keynotes and invited talks and acted as General Chair or TPC Chair or Co-Chair of many international conferences. He received the Best Paper Award in the International Conference on Informatics, Electronics & Vision (ICIEV2013), Dhaka, Bangladesh, and the Best Presenter Award from the International Conference on Computer Vision and Graphics (ICCVG 2004), Warsaw, Poland. He was Coach of Janhangirnagar University ACM ICPC World Finals Teams in 2015 and 2017 and supervised a good number of doctoral and master theses. He is currently President of Bangladesh Computer Society (BCS), Fellow of IEB and BCS, a Senior Member of IEEE, and Associate Editor of IEEE Access.
Prof. Prashant Kumar Jamwal earned Ph.D. degree and a post-doctoral fellowship from the University of Auckland, New Zealand. Earlier, he had obtained M. Tech. from I.I.T., India, securing the first position in all the disciplines and B. Tech. from MNREC, Allahabad, India. Presently, he is working as Associate Professor at the School of Engineering and design sciences, Nazarbayev University (NU), Astana, Kazakhstan. He is actively pursuing research in artificial intelligence, multi-objective evolutionary optimization, mechatronics systems, biomedical robotics, and fuzzy mathematics. He is applying his research in the development of medical robots for rehabilitation and surgical applications besides the development of improved algorithms for cancer data analytics. He has more than 25 years of teaching and research experience and has published many research articles in reputed international journals/conferences. He has won many awards such as best paper awards in conferences, best digital solution award for his medical robots, Asian Universities Alliance (AUA) Scholars Award, etc., and recently United Nations acknowledged one of his robotics projects as one of the top twenty innovative projects in the world. He is working as Editor for the International Journal of bio-mechatronics and bio-robotics and as a reviewer to quite a few international journals and conferences of repute. He led many government-funded research projects and has so far received research grants worth more than $5M including a prestigious World Bank grant. Dr. Jagdish Chand Bansal is an Associate Professor at South Asian University New Delhi and Visiting Faculty at Maths and Computer Science, Liverpool Hope University UK. Dr. Bansal has obtained his Ph.D. in Mathematics from IIT Roorkee. Before joining SAU, New Delhi he has worked as an Assistant Professor at ABV- Indian Institute of Information Technology and Management Gwalior and BITS Pilani. His Primary area of interest is Swarm Intelligence and Nature Inspired Optimization Techniques. Recently, he proposed a fission-fusion social structure-based optimization algorithm, Spider Monkey Optimization (SMO), which is being applied to various problems from the engineering domain. He has published more than 70 research papers in various international journals/conferences. He is the editor in chief of the journal MethodsX published by Elsevier. He is the series editor of the book series Algorithms for Intelligent Systems (AIS) and Studies in Autonomic, Data-driven and Industrial Computing (SADIC) published by Springer. He is the editor in chief of International Journal of Swarm Intelligence (IJSI) published by Inderscience. He is also the Associate Editor of IEEE ACESSS published by IEEE and ARRAY published by Elsevier. He is the general secretary of Soft Computing Research Society (SCRS). He has also received Gold Medal at UG and PG levels.
Chapter 1
Performance Analysis of Secure Hybrid Approach for Sharing Data Securely in Vehicular Adhoc Network
Atul B. Kathole and Dinesh N. Chaudhari
1 Introduction
A WSN is a clustered wireless network. It does not have a pre-defined infrastructure, and nodes may communicate directly with one another [1, 2]. Due to the ad hoc nature of the network, it is very susceptible to DoS attacks at the network layer. Sybil attacks are common network-layer assaults on such ad hoc networks. Malicious nodes obstruct network data transmission by providing inaccurate routing information [2]. In the attack dubbed the black hole, malicious nodes broadcast false routing information to neighboring nodes, informing them of a shorter route to the target node. After obtaining this bogus information, the source transmits packets through these malicious nodes, which discard the packets, so the packets never reach the destination node. The gray hole is considered an extension of the black hole, since the malicious behavior of the nodes cannot be anticipated. A Sybil attack creates the temporary impression of new nodes or network entities and sends bogus data to other network nodes; on some occasions a node may act maliciously, while on others it may behave normally. Both of these attacks disrupt the route discovery process, lowering the throughput and packet delivery ratio [3].
A. B. Kathole (B) Department of Computer Engineering, Pimpri Chinchwad College of Engineering (PCCOE), Pune, India e-mail: [email protected] D. N. Chaudhari Department of Computer Science and Engineering, Jawaharlal Darda Institute of Engineering and Technology, Yavatmal, India © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 M. S. Uddin et al. (eds.), Proceedings of International Joint Conference on Advances in Computational Intelligence, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-19-0332-8_1
There are many different causes of a declining PDR in a MANET: high data rates, congestion or excessive load, high mobility, and so on. Nevertheless, there are instances when MANET protection mechanisms struggle to distinguish these alternative causes of a dropped-PDR scenario; for instance, conventional techniques result in erroneous identification of malicious nodes [1–3]. This article develops a crucial safety mechanism, dubbed the hybrid method, for securing interaction and avoiding attacks on the DSR protocol. This procedure determines whether any nodes in the network are malicious, since the network is composed of many nodes. The enhanced DSR protocol is used to eliminate these rogue nodes, so that all malicious nodes are removed. If a nearby node gets incorrect routing information from an intermediate node, that node should be regarded as malevolent. The intermediate node informs the other nodes, and any node that receives the information about the malicious nodes updates its routing database to designate the node as malicious. When an RREQ is delivered, a malicious node list is added, and other nodes update the receiving node's routing table. Thus, by detecting erroneous routing information or examining the routing table, nodes may identify rogue nodes and alert other nodes not to accept the malicious nodes' routing information. The network is formed of numerous nodes connected through links. A unique id identifies each node, and each packet is stamped with the identity of its source node; this critical information is maintained at each node in the network. MANET, or mobile ad hoc networking, is a novel technology based on a wireless multihop architecture that does not need a rigid infrastructure or prior configuration of the network nodes [4]. Conventional strategies often fail to comprehend the true cause of an unfavorable event. This leads to a high number of false positives for nodes that are not malicious and low detection rates for malicious nodes. Such vulnerabilities exist as a result of the assumption made by these confidence-based security methods that packet losses occur only due to malevolent behavior on the part of misbehaving nodes [5]. In reality, losses are due to a variety of factors, including mobility, congestion, and unreliable wireless links. Without a fine-grained examination of packet losses, conventional detection methods may produce erroneous confidence estimations, mainly when node mobility and data rates are high [6]. The rest of the paper is organized as follows: Sect. 2 describes the research method, Sect. 3 presents the malicious activity detection approach, Sect. 4 reports the simulation and results, and Sect. 5 concludes the study.
2 Research Method
In this paper we propose the Hybrid Bait Detection System (HBDS) as a tool for detecting malicious nodes. It addresses the problems encountered by earlier detection methods, which relied mainly on packet losses as the indicator of misbehavior. The proposed HBDS is a two-stage detection method that strengthens MANET defense while reducing malicious-node detection mistakes; many existing security mechanisms cannot identify rogue nodes with certainty [7]. The results relate node mobility to the packet delivery ratio, throughput, and end-to-end latency. In the presence of a Sybil attack, HBDS achieves a packet delivery ratio of 96.5%, a throughput of 38.37%, and an end-to-end latency of 0.34%, which is comparable to FGA. Using the suggested method, the overall performance of the network can be improved to achieve maximum throughput in the shortest amount of time [8].
3 Malicious Activity Detection
The primary goal of this study is to offer insight into the malicious node detection mechanism used in MANETs to enhance security and performance. The software is built on the HBDS framework, which is capable of defending against a variety of MANET assaults. The hybrid CBDS (HBDS) technique is needed to optimize end-to-end latency and PDR performance. Compared with existing methods, the proposed work significantly improves performance against a range of network attacks [9]. We utilize the suggested method to prevent a Sybil attack by a malicious node inside a particular MANET.
Algorithm: The value assigned to a node by a source node is computed and compared to the actual behavior of each node in order to determine the correctness of the underlying trust-based framework that has evaluated the credibility criteria. The pseudocode for our HBDS technique is presented after the algorithm [3].
The above algorithm considers different parameters, together with packet loss and PDR, to evaluate how a node is working. If a particular node drops more packets than the set threshold value and its PDR measure is also greater than 0.5, the node id is captured and stored in a separate table, since it may be a malicious node according to the above prediction [2]. The network environment is 500 m × 500 m, and the numbers of nodes used are shown in Table 1 (Parameters used during execution). In addition, the proposed scheme has been tested
against malicious situations where attackers have infected various legitimate nodes [3].
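A minimal Python sketch of this node-evaluation rule might look as follows; the drop threshold and the 0.5 PDR cut-off come from the description above, while the data structures and function names are illustrative assumptions rather than the authors' implementation.

```python
# Hedged sketch of the HBDS node-evaluation rule described above.
def evaluate_nodes(stats, drop_threshold):
    """Return ids of nodes flagged as possibly malicious.

    stats maps node_id -> {'dropped': packets dropped, 'pdr': observed ratio}.
    """
    suspect_table = []
    for node_id, s in stats.items():
        # A node is suspected when it drops more packets than the set
        # threshold and its PDR measure also exceeds 0.5.
        if s["dropped"] > drop_threshold and s["pdr"] > 0.5:
            suspect_table.append(node_id)
    return suspect_table

# Example usage with made-up per-node statistics.
observed = {
    "n1": {"dropped": 3, "pdr": 0.12},
    "n7": {"dropped": 42, "pdr": 0.81},  # heavy dropper, flagged
}
print(evaluate_nodes(observed, drop_threshold=20))  # ['n7']
```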
4 Simulation and Results
Below we analyze the performance of HBDS against the corresponding methods.
1) False Positive Rate: The false positive rate is the ratio of legitimate nodes classified as unsafe to the total number of legitimate nodes. We compared the proposed architecture with DSR, CBDS, and FGA in terms of false positives, covering all schemes with our optimization model and including malicious nodes in the network. The false alarm rate as node speed increases is shown in Fig. 1. As can be seen from the figure, the number of false positives in our HBDS system is significantly reduced compared with the other solutions, because the proposed method better investigates the common possible causes of packet loss events before judging the reliability of a node. In general, the false positive rate rises as node speed increases: if a node moves faster, the chance that the source node overhears incorrectly or that routing information becomes out of date increases significantly, so a legitimate source node is more easily declared malicious. Likewise, as the density of nodes in the network increases, the frequency of false alarms increases; with more nodes in the architecture, the number of source/destination pairs grows and more packets are lost due to network collisions. Compared with many other schemes that also treat dropped packets as malicious activity, the number of false positives in the HBDS scheme is lower, because the drop rate of each packet is examined before evaluating the behavior of the node.
Fig. 1 Effect of node moving speed and density on false positives. a False positives vs node moving speed
Fig. 2 Effect of node moving speed and density on detection rate. a Detection rate vs node moving speed
2) Detection Rate: Our HBDS framework offers a higher detection rate than the other schemes. Figure 2(a) illustrates the detection rate as a function of increasing node speed for the HBDS method and the other techniques, while Fig. 2(b) illustrates the detection rate as a function of increasing node density. The number of data connections within the network grows as the node density increases, and more packets are lost due to collisions; the alternative schemes treat these packet drops as malicious actions by genuine nodes. Consequently, as shown in the figure, the detection rate is higher in our HBDS scheme than in the other schemes [4].
3) Packet Loss Rate: The packet loss rate as a function of increasing node speed is shown in Fig. 3 for the FGA and HBDS schemes. As shown in the figure, our HBDS system has a lower packet loss rate than the FGA system, because in the HBDS system more reliable nodes are chosen for routing, resulting in fewer dropped packets and a higher packet delivery ratio [4].
Fig. 3 Effect of node moving speed on packet loss rate
5 Conclusion
Sybil attacks are widely regarded as the most destructive kind of attack on an ad hoc network. While there are many ways of defending ad hoc networks against such attacks, traditional preventive measures have significant limitations and disadvantages in this area, and many conventional methods are imprecise. DSR often fails to remove rogue nodes during the route discovery process, so under Sybil attacks not all data packets are delivered to the target. Additionally, when the number of malicious nodes grows in these attacks, the packet delivery ratio (PDR) and throughput may drop. As a result, a new technique called HBDS has been proposed for securing ad hoc networks. In conclusion, Sybil attacks can be prevented through our suggested methods, and the observed increase in throughput and packet delivery ratio is noteworthy.
References
1. Kathole AB, Chaudhari DN (2019) Pros & cons of machine learning and security methods. JGRS 21(4). ISSN:0374-8588. http://gujaratresearchsociety.in/index.php/
2. Kathole AB, Halgaonkar PS, Nikhade A (2019) Machine learning & its classification techniques. Int J Innov Technol Explor Eng 8(9S3):138–142. ISSN:2278-3075
3. Kathole AB, Chaudhari DN (2019) Fuel analysis and distance prediction using machine learning. Int J Future Revol Comput Sci Commun Eng 5(6)
4. Hasrouny H, Samhat AE, Bassil C, Laouiti A (2017) VANet security challenges and solutions: a survey. Veh Commun 7:7–20
5. Yaqoob I, Ahmad I, Ahmed E, Gani A, Imran M, Guizani N (2017) Overcoming the key challenges to establishing vehicular communication: is SDN the answer. IEEE Commun Mag 55(7):128–134
6. Ahmad I, Noor RM, Ali I, Imran M, Vasilakos A (2017) Characterizing the role of vehicular cloud computing in road traffic management. Int J Distrib Sensor Netw 13(5):1550147717708728
7. Khan MS, Midi D, Khan MI, Bertino E (2017) Fine-grained analysis of packet loss in manets. IEEE. ISSN:2169-3536
8. Ahmad I, Ashraf U, Ghafoor A (2016) A comparative QoS survey of mobile ad hoc network routing protocols. J Chin Inst Eng 39(5):585–592
9. Li L, Lee G (2005) DDoS attack detection and wavelets. Telecommun Syst 28(3–4):435–451
Chapter 2
Particle Swarm Optimization and Computational Algorithm Based Weighted Fuzzy Time Series Forecasting Method
Shivani Pant and Sanjay Kumar
1 Introduction
Traditional time series forecasting techniques have the disadvantage of being unable to cope with forecasting problems involving uncertainties that arise from non-probabilistic imprecision and vagueness in time series data. In 1993, Song and Chissom [1–3] created a model based on the indefinite knowledge and uncertainty inherent in time series data to overcome this challenge. They employed the notion of the fuzzy set [4] to articulate non-probabilistic uncertainties and called the resulting models fuzzy time series (FTS) forecasting models. Thereafter, various researchers [5–10] proposed a slew of models to enhance the accuracy of FTS forecasting. Abhishekh et al. [11] developed a weighted method for type-2 FTS forecasting. Gautam and Abhishekh [12] used a moving average approach to make the forecast. Computational approaches have also been developed in the past, as they have the advantages of being simple to use, capable of handling massive time series databases, and improving the model's forecast accuracy. Singh [13] forecasted the University of Alabama enrolments using a simple technique based on computation. Jain et al. [14] proposed a computational method to partition the universe of discourse (UOD). Bisht and Kumar [15] devised a computational approach for FTS forecasting using hesitant fuzzy sets. Gangwar and Kumar [16, 17] created a high-order computational method based on multiple partitions. Joshi and Kumar [18] developed an intuitionistic fuzzy set based computational method. Alam et al. [19] used a simple arithmetic rule to forecast time series data using intuitionistic fuzzy sets. In many FTS models, equal weights were assigned to fuzzy relations, which does not reflect the importance of individual fuzzy relationships. To address this issue, Yu [20] suggested weighted FTS. Later, Cheng [21], Rubio et al. [22], Kumar [23], Yang et al. [24], and Jiang et al. [25] applied this method to forecast time series, with satisfactory results. Gautam et al. [26] used a weighted method in an intuitionistic environment. Techniques such as swarm intelligence, clustering algorithms, machine learning approaches, and neural networks have also been found to be useful in enhancing the accuracy of FTS forecasting. Singh [27] developed a hybrid neutrosophic-PSO based model. Tinh [28] created a model with fuzzy C-means and PSO. Iqbal et al. [29] used clustering and a weighted average strategy. Pattanayak et al. [30] exploited Chemical Reaction Optimization (CRO) coupled with a pi-sigma neural network (PSNN) in FTS forecasting. Other recently created models in the forecasting field include those by Zeng et al. [31], Pattanayak et al. [32–34], Panigrahi and Behera [35], and Egrioglu et al. [36, 37]. Recent research also includes the model of Pant and Kumar [38], where PSO was used to optimize the length of the intervals used to partition the UOD in intuitionistic FTS forecasting.
This paper puts forward an enhanced computational process for high-order weighted FTS forecasting in which the weights of fuzzy logical relations (FLRs) are taken in increasing order. The PSO technique has been widely employed in the past to partition the universe of discourse to increase model accuracy, but here we use it to optimize the weights of the FLRs. The model has a benefit over other models since it does not build relational equations using sophisticated min-max operations, and it does not require defuzzification procedures, thereby saving time. The approach is used to forecast the University of Alabama enrolments.
S. Pant · S. Kumar (B) Department of Mathematics, Statistics and Computer Science, G. B. Pant University of Agriculture and Technology, Pantnagar, Uttarakhand, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 M. S. Uddin et al. (eds.), Proceedings of International Joint Conference on Advances in Computational Intelligence, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-19-0332-8_2
2 Preliminaries
In this section we discuss the basic definitions of fuzzy sets, fuzzy time series, and time-variant and time-invariant fuzzy time series. In addition, weighted fuzzy time series and the standard PSO technique are explained.
2.1 Fuzzy Set
Let X = {x_1, x_2, x_3, ..., x_n} be the UOD. Then a fuzzy set S on X can be defined as follows:
S = {(x_i, μ_S(x_i)) : x_i ∈ X, i = 1, 2, 3, ..., n}
or, equivalently,
S = μ_S(x_1)/x_1 + μ_S(x_2)/x_2 + μ_S(x_3)/x_3 + ... + μ_S(x_n)/x_n = Σ_{i=1}^{n} μ_S(x_i)/x_i
where μ_S : X → [0, 1] is the membership function of the fuzzy set S, which denotes the degree of certainty with which the elements of X belong to the set S.
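As a small illustration of this definition, a discrete fuzzy set can be represented as a mapping from the elements of X to membership grades in [0, 1]; the universe and grades below are made-up values chosen only for exposition.

```python
# Minimal sketch of a discrete fuzzy set S on a finite universe X.
X = ["x1", "x2", "x3", "x4"]

# S maps each element of X to its membership grade mu_S(x) in [0, 1].
S = {"x1": 0.0, "x2": 0.5, "x3": 1.0, "x4": 0.5}

def membership(fuzzy_set, x):
    """Return mu_S(x); elements outside the support have grade 0."""
    return fuzzy_set.get(x, 0.0)

print([membership(S, x) for x in X])  # [0.0, 0.5, 1.0, 0.5]
```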
2.2 Fuzzy Time Series
Let Y(t), (t = ..., 0, 1, 2, ...) be the UOD such that Y(t) ⊆ R. Assume f_i(t), (i = 1, 2, ...) to be fuzzy sets defined on Y(t). Then F(t), which is the collection of the f_i(t), is referred to as a fuzzy time series (FTS) on the universe Y(t). If F(t) results from F(t − 1) only, i.e., the relation F(t − 1) → F(t) holds, then it is known as a first-order FTS model, and the corresponding fuzzy relational equation is given by the following expression:
F(t) = F(t − 1) ◦ R(t, t − 1)
where "◦" is the max–min composition operator and R(t, t − 1) is the fuzzy relation between F(t) and F(t − 1). If F(t) is caused by F(t − 1), F(t − 2), ..., F(t − n), i.e., the relation F(t − 1), F(t − 2), ..., F(t − n) → F(t) holds, then it is known as an nth-order fuzzy time series model. Let R(t, t − 1) be the first-order relation of the FTS F(t). If R(t, t − 1) = R(t − 1, t − 2) at any time t, then F(t) is called a time-invariant FTS; otherwise it is known as a time-variant FTS, i.e., the relation R(t, t − 1) is time dependent and may differ from R(t − 1, t − 2).
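To make the notion of fuzzy logical relationships concrete, a first-order FLR simply pairs the fuzzified value at time t − 1 with the one at time t; the short sketch below uses hypothetical fuzzified labels, not data from the chapter.

```python
# Hypothetical sketch: building first-order fuzzy logical relationships
# (FLRs) F(t-1) -> F(t) from a fuzzified time series.
fuzzified = ["F1", "F1", "F1", "F2", "F3", "F3"]  # example labels

flrs = [(fuzzified[t - 1], fuzzified[t]) for t in range(1, len(fuzzified))]
print(flrs)
# [('F1', 'F1'), ('F1', 'F1'), ('F1', 'F2'), ('F2', 'F3'), ('F3', 'F3')]
```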
2.3 Weighted Fuzzy Time Series
Yu [20] proposed the concept of weighted fuzzy time series to address two main issues arising in FTS forecasting, namely recurrence and weighting. The assignment of identical weights to the fuzzy relations does not appropriately reflect the relevance of individual fuzzy relations, and ignoring the recurring fuzzy relations leads to loss of knowledge. Weighted FTS models resolve both of these issues, and in many studies they have been found to perform better than traditional fuzzy time series models. It is considered that the most recent FLR has a greater influence on the next forecast, so the most recent FLR is assigned a greater weight than the previous FLRs. The proposed model assigns weights in the following manner: if F_1 →(w_1) F_2, F_2 →(w_2) F_3, F_3 →(w_3) F_4, ..., F_n →(w_n) F_{n+1} are the FLRs utilized to forecast F_{n+2}, then the weights w_i are the optimized weights, each of which may take any value in the interval [i − 1, i]; i.e., the weights are linearly increasing but may take non-integer values for a FLR.
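A compact sketch of this weighting scheme is given below; the weight values are arbitrary examples chosen inside the stated intervals [i − 1, i], and the normalization shows that the most recent FLR receives the largest share.

```python
# Sketch of the linearly increasing FLR weights described above.
# Each weight w_i may take any value in [i-1, i]; the values here are
# arbitrary examples, and the chapter later optimizes them with PSO.
weights = [0.7, 1.4, 2.9, 3.5]            # w_1..w_4, with w_i in [i-1, i]
total = sum(weights)
normalized = [w / total for w in weights]
print(normalized)  # most recent FLR (w_4) receives the largest share
```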
2.4 Particle Swarm Optimization (PSO)
PSO, proposed by Kennedy and Eberhart [39], is a swarm intelligence technique which is capable of solving complex mathematical problems by finding an optimal solution. It models the collective behavior of bird flocking or fish schooling. The members or particles in the swarm work in a cooperative way when they go on the hunt for food. Each member changes its position by learning from its own experience and the experiences of its neighbours, i.e., the other members. The velocity of the particle is also updated subject to three factors: the inertia weight, the particle's best location, and the best position of the swarm, which are themselves updated throughout the process. The equations to update velocity and position are given as follows:
v_i^{t+1} = ω × v_i^t + c_1 × r × (p_i^b − x_i^t) + c_2 × r′ × (p^g − x_i^t)     (1)
x_i^{t+1} = x_i^t + v_i^{t+1}     (2)
Here, ω represents the inertia weight coefficient, which was introduced to balance the local and global search ability of the particles. Typically, it is considered linearly decreasing and is given by:
ω = ω_max − (ω_max − ω_min) × t_c / t_max     (3)
Here, ω_max and ω_min are the initial and final values of ω(t), t_c is the current iteration, and t_max is the maximum number of iterations. The symbols c_1 and c_2 are acceleration coefficients, which may be dynamic or may be set constant in standard PSO. r and r′ are random numbers lying in the range [0, 1]. v_i^t and x_i^t are the velocity and position of the ith particle in the tth iteration, respectively; v_i^t is bounded to [v_min, v_max], where v_min and v_max are constants defined by the user. p_i^b is the ith particle's personal best location, while p^g is the swarm's global best position.
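A minimal sketch of the update rules in Eqs. (1)–(3) is given below; the parameter values and the two-dimensional example particle are illustrative assumptions, while the chapter's own settings are stated later.

```python
# Minimal sketch of the PSO velocity/position update of Eqs. (1)-(3).
import random

def pso_step(x, v, p_best, g_best, t, t_max,
             c1=2.0, c2=2.0, w_max=1.4, w_min=0.4, v_bound=1.0):
    """One PSO update for a single particle (position x, velocity v)."""
    omega = w_max - (w_max - w_min) * t / t_max               # Eq. (3)
    new_v, new_x = [], []
    for d in range(len(x)):
        r, r_prime = random.random(), random.random()
        vel = (omega * v[d]
               + c1 * r * (p_best[d] - x[d])                   # cognitive term of Eq. (1)
               + c2 * r_prime * (g_best[d] - x[d]))            # social term of Eq. (1)
        vel = max(-v_bound, min(v_bound, vel))                 # keep velocity in bounds
        new_v.append(vel)
        new_x.append(x[d] + vel)                               # Eq. (2)
    return new_x, new_v

# Example usage with made-up two-dimensional particle data.
pos, vel = pso_step([0.5, 1.2], [0.1, -0.2], [0.4, 1.5], [0.6, 1.1], t=10, t_max=100)
print(pos, vel)
```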
3 Proposed PSO Based FTS Model The suggested model partitions the time series using Gangwar and Kumar’s [16] ratio formula, and then uses a computational technique to get the forecast, where the best weights are derived using PSO. For every partition, the difference parameters and weighted fuzzy relations are given by the following rule:
(i) To forecast the enrolment for the third year (1973), we use the fuzzy weighted relation F_1 →(w_1) F_2, and the difference parameter used is w_1 × |D_2 − D_1|.
(ii) For forecasting the enrolment of the fourth year (1974), the fuzzy weighted relations used are F_1 →(w_1) F_2 and F_2 →(w_2) F_3. In this case, w_1/(w_1 + w_2) × |D_2 − D_1| and w_2/(w_1 + w_2) × |D_3 − D_2| are used.
(iii) In the same way, for forecasting the enrolment of the fifth year (1975), we use F_1 →(w_1) F_2, F_2 →(w_2) F_3 and F_3 →(w_3) F_4. In this case, w_1/(w_1 + w_2 + w_3) × |D_2 − D_1|, w_2/(w_1 + w_2 + w_3) × |D_3 − D_2| and w_3/(w_1 + w_2 + w_3) × |D_4 − D_3| are used.
(iv) In general, for the (n + 2)th year we apply the same method, i.e., we have the weighted fuzzy relations F_1 →(w_1) F_2, F_2 →(w_2) F_3, F_3 →(w_3) F_4, ..., F_n →(w_n) F_{n+1}. In this case, w_1/(w_1 + w_2 + ... + w_n) × |D_2 − D_1|, w_2/(w_1 + w_2 + ... + w_n) × |D_3 − D_2|, w_3/(w_1 + w_2 + ... + w_n) × |D_4 − D_3|, ..., w_n/(w_1 + w_2 + ... + w_n) × |D_{n+1} − D_n| are used (a small numerical sketch of this weighting is given below).
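The sketch below illustrates the weighted difference parameters of rule (iv) with hypothetical data values and weights; each weight w_c is paired with the difference |D_{c+1} − D_c| and normalized by the sum of the weights.

```python
# Small numerical sketch of the weighted difference parameters of rule (iv).
data = [13055, 13563, 13867, 14696]   # D_1 .. D_4 (hypothetical values)
weights = [0.8, 1.6, 2.7]             # w_1 .. w_3, increasing

total_w = sum(weights)
weighted_diffs = [
    weights[c] / total_w * abs(data[c + 1] - data[c])
    for c in range(len(weights))
]
print(weighted_diffs)  # contribution of each FLR; most recent weighted highest
```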
The suggested approach is implemented in the following steps:
Step 1: Define the UOD, which is denoted by U, as U = [D_m − d, D_M + d′], where D_m and D_M are the lowest and highest values of the data, respectively, and d and d′ are two positive numbers chosen to set the buffers for the time series.
Step 2: Partition the UOD into intervals of equal length: T_1, T_2, ..., T_m.
Step 3: Construct the fuzzy sets F_i; the number of fuzzy sets matches the number of intervals formed in Step 2. Each fuzzy set uses the triangular membership function.
Step 4: Fuzzify the time series by allocating to each observation the fuzzy set in which it has the highest membership degree.
Step 5: Repartition the FTS using the ratio formula suggested by Gangwar and Kumar [16], i.e., p = (D_M + D_m) / (2(D_M − D_m)).
Step 6: The notation used is as follows: ∗F_j is the interval T_j in which the membership in fuzzy set F_j is the supremum (i.e., 1); L[∗F_j] is the lower bound of T_j; U[∗F_j] is the upper bound of T_j; M[∗F_i] is the middle value of the interval T_i; M[∗F_j] is the middle value of the interval T_j; A_i, A_{i−1}, A_{i−2}, A_{i−c} and A_{i−(c+1)} are the actual enrolments for the years n, (n − 1), (n − 2), (n − c) and (n − (c + 1)), respectively; and E_j is the crisp forecasted enrolment for the year (n + 1).
The suggested technique constructs rules for the FLR F_i → F_j, where F_i is the fuzzified enrolment of the nth year, also known as the current state, and F_j is the fuzzified enrolment of the (n + 1)th year, also known as the next state, using previous data spanning years 1 to n. In this method, for each partition the weights are assigned while generating the FLRs in the time series data for years 1 to n to forecast the enrolment of year (n + 1). The weighted differences of the previous n years are used as a fuzzy parameter for estimating next year's enrolment.
Computational Algorithm
The algorithm runs from m = 1 (first partition) to m = k (last partition).
For m = 1:
For i = 2 to n (the last entry of the time series data in each partition), with the FLR F_i → F_j for year i to (i + 1), we compute the difference parameter for year i as follows:
D_i = Σ_{c=1}^{i−1} ( w_{i−(c+1)} / Σ_{l=1}^{i−1} w_l ) × |A_{i−c} − A_{i−(c+1)}|, where w_0 = 0     (4)
Set U = 0 and V = 0.
For b = 2 to i, with an increment of 0.1, calculate F_ib and FF_ib using the formulas:
F_ib = M[∗F_i] + 2 × D_i / (b − 1)     (5)
FF_ib = M[∗F_i] − 2 × D_i / (b − 1)     (6)
If L[∗F_j] ≤ F_ib ≤ U[∗F_j] then
U = U + ((b + 1)/2) × F_ib     (7)
V = V + (b + 1)/2     (8)
and if L[∗F_j] ≤ FF_ib ≤ U[∗F_j] then
U = U + ((b + 1)/2) × FF_ib     (9)
V = V + (b + 1)/2     (10)
Based upon the above estimated values, the forecast is made using the formula below:
E_j = (U + M[∗F_j]) / (V + 1)     (11)
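A hedged sketch of this inner loop (Eqs. (5)–(11)) is given below; the interval bounds, the midpoint of the previous state, and the difference parameter D_i are assumed inputs, and the numbers in the example are illustrative only.

```python
# Sketch of the computational forecasting loop of Eqs. (5)-(11).
def forecast_next_state(m_prev, d_i, lower_j, upper_j, mid_j, i):
    """Forecast E_j for the next state from the previous state's interval."""
    u, v = 0.0, 0.0
    b = 2.0
    while b <= i + 1e-9:                       # b = 2 to i, step 0.1
        f_ib = m_prev + 2 * d_i / (b - 1)      # Eq. (5): right neighbourhood
        ff_ib = m_prev - 2 * d_i / (b - 1)     # Eq. (6): left neighbourhood
        if lower_j <= f_ib <= upper_j:
            u += (b + 1) / 2 * f_ib            # Eq. (7)
            v += (b + 1) / 2                   # Eq. (8)
        if lower_j <= ff_ib <= upper_j:
            u += (b + 1) / 2 * ff_ib           # Eq. (9)
            v += (b + 1) / 2                   # Eq. (10)
        b += 0.1
    return (u + mid_j) / (v + 1)               # Eq. (11)

# Example with made-up values: previous-state midpoint 13,500, D_i = 400,
# next-state interval [13000, 14000] with midpoint 13,500.
print(round(forecast_next_state(13500, 400, 13000, 14000, 13500, i=4)))
```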
We repeat the procedure for the remaining partitions in the same way. In our computational algorithm, to calculate the forecast of the next state we compute two variables, F_ib and FF_ib, using Eqs. (5) and (6) respectively; both depend on b, which itself depends on i. For every change in b, F_ib and FF_ib take values in the right and left neighbourhoods of the middle value of the previous state (M[∗F_i]), respectively, and we check whether these two values lie in the interval of the next state, i.e., in the range [L[∗F_j], U[∗F_j]]. If they do, the constants U and V are incremented using Eqs. (7)–(10); otherwise they are kept unaltered. Here, the variable F_ib accounts for the possibility that the next-state data value is greater than the previous-state value, and FF_ib accounts for the possibility that it is smaller. The small increment of b, which affects both F_ib and FF_ib, helps to ensure that the neighbourhoods of the previous state are well examined, since this sets the parameters U and V that are then used in predicting the next state.
Employing PSO in the Model
We set the parameters of our PSO as follows. The acceleration coefficients c_1 and c_2 are taken to be dynamic and are given by the formulas:
c_1 = (c_{1f} − c_{1i}) × t_c / t_max + c_{1i}     (12)
c_2 = (c_{2f} − c_{2i}) × t_c / t_max + c_{2i}     (13)
where c_{1i} and c_{1f} are the initial and final values of the coefficient c_1, and c_{2i} and c_{2f} are the initial and final values of the coefficient c_2, respectively. The dynamic parameters were designed to strike a balance between exploitative and exploratory search while also preserving the swarm's diversity. This technique helps the algorithm avoid premature or slow convergence and increases its resilience. We set c_{1i} = c_{2i} = 0.5 and c_{1f} = c_{2f} = 2.5. The range of the velocity is taken as [−1, 1], ω_max = 1.4 and ω_min = 0.4, the total number of particles is 50, and the maximum number of iterations is set to 100. The positions, or weights, must be such that each weight w_i lies in the range [i − 1, i]. Additionally, the position of each updated particle must not exceed the preset range, and if it does, it is set to the boundary value. The stopping criterion is either that the iterations are exhausted or that the optimal solution has been found, i.e., the solution begins to repeat.
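A small sketch of these two ingredients, the linearly changing acceleration coefficients of Eqs. (12)–(13) and the clamping of the weight positions to their intervals [i − 1, i], is shown below; the helper names are illustrative assumptions.

```python
# Sketch of the dynamic acceleration coefficients (Eqs. (12)-(13)) and the
# clamping of weight positions to their allowed intervals [i-1, i].
def acceleration(t_c, t_max, c_i=0.5, c_f=2.5):
    """Coefficient moves linearly from c_i to c_f over the run."""
    return (c_f - c_i) * t_c / t_max + c_i

def clamp_weights(position):
    """Keep each weight w_i inside [i-1, i], as required for the FLR weights."""
    return [min(max(w, i), i + 1) for i, w in enumerate(position)]

print(acceleration(50, 100))                 # 1.5 halfway through the run
print(clamp_weights([0.3, 2.4, 2.1, 5.0]))   # [0.3, 2.0, 2.1, 4.0]
```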
The algorithm of PSO is employed for each partition and is given as follows:
PSO algorithm for optimizing weights and FTS forecasting
1. Set the PSO parameters
2. Initialize the particles' random positions and velocities (in the defined range)
3. while (stopping criteria not met)
4.   for all particles i
5.     Compute the fitness function (RMSE based on the forecast made using the computational algorithm given above)
6.     Update p_i^b and its fitness value (based on comparing the previous personal best value with the new fitness function value)
7.     Update p^g (the best position among the personal best positions)
8.     Update the velocity and position of particle i (using Eqs. (1) and (2))
9.   end for
10. end while
The ultimate position achieved by employing PSO gives the ideal weights used in the FLRs for forecasting the time series using the computational method.
Step 7: Verify the model's forecasting accuracy using the following error measures:
Root Mean Square Error (RMSE) = sqrt( Σ_{i=1}^{n} (actual value − forecasted value)^2 / n ), where n indicates the number of forecasts that have been made.
Forecasting Error (%) = |actual value − forecasted value| / actual value × 100
Average Forecasting Error (AFE) = (sum of forecasting errors) / (number of errors)
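These three measures can be computed directly from the actual and forecasted values; the sketch below uses made-up pairs purely for illustration.

```python
# Sketch of the accuracy measures in Step 7 (RMSE, forecasting error, AFE).
def rmse(actual, forecast):
    n = len(actual)
    return (sum((a - f) ** 2 for a, f in zip(actual, forecast)) / n) ** 0.5

def forecasting_errors(actual, forecast):
    """Percentage error |actual - forecast| / actual * 100 for each point."""
    return [abs(a - f) / a * 100 for a, f in zip(actual, forecast)]

def afe(actual, forecast):
    errs = forecasting_errors(actual, forecast)
    return sum(errs) / len(errs)

actual = [13867, 14696, 15460]     # made-up actual values
forecast = [13500, 14500, 15500]   # made-up forecasts
print(round(rmse(actual, forecast), 2), round(afe(actual, forecast), 2))
```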
4 Experimental Study
The proposed FTS forecasting method is tested on the historical enrolments of the University of Alabama. Following are the stages for putting the method into action:
Step 1: Taking D_m = 13055, D_M = 19337, d = 55 and d′ = 663, the UOD is defined as U = [13000, 20000].
Step 2: We partition the UOD into seven equal intervals as follows:
T_1 = [13000, 14000], T_2 = [14000, 15000], T_3 = [15000, 16000], T_4 = [16000, 17000], T_5 = [17000, 18000], T_6 = [18000, 19000], T_7 = [19000, 20000]
Step 3: Seven fuzzy sets F_1, F_2, ..., F_7 are defined on U, and the membership of each interval in these fuzzy sets is given as follows:
F_1 = 1/T_1 + 0.5/T_2 + 0/T_3 + 0/T_4 + 0/T_5 + 0/T_6 + 0/T_7
F_2 = 0.5/T_1 + 1/T_2 + 0.5/T_3 + 0/T_4 + 0/T_5 + 0/T_6 + 0/T_7
F_3 = 0/T_1 + 0.5/T_2 + 1/T_3 + 0.5/T_4 + 0/T_5 + 0/T_6 + 0/T_7
F_4 = 0/T_1 + 0/T_2 + 0.5/T_3 + 1/T_4 + 0.5/T_5 + 0/T_6 + 0/T_7
F_5 = 0/T_1 + 0/T_2 + 0/T_3 + 0.5/T_4 + 1/T_5 + 0.5/T_6 + 0/T_7
F_6 = 0/T_1 + 0/T_2 + 0/T_3 + 0/T_4 + 0.5/T_5 + 1/T_6 + 0.5/T_7
F_7 = 0/T_1 + 0/T_2 + 0/T_3 + 0/T_4 + 0/T_5 + 0.5/T_6 + 1/T_7
Step 4: The time series is fuzzified using the fuzzy sets defined in Step 3, and the fuzzified enrolments are shown in Table 1. The data point 13,055 belongs to the interval T_1, and any element in T_1 has membership 1 in fuzzy set F_1, 0.5 in fuzzy set F_2, and 0 in the other fuzzy sets; hence, based on the maximum membership, fuzzy set F_1 is allocated to the element 13,055. Similarly, the data point 14,696 belongs to the interval T_2 and has its highest membership (i.e., 1) in fuzzy set F_2, so F_2 is assigned to it. The rest of the data points are fuzzified in the same manner.
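Because the seven intervals are of equal width, the maximum-membership rule of Step 4 reduces to picking the fuzzy set of the interval that contains the observation; the sketch below assumes exactly that and uses only the interval bounds stated in Step 2.

```python
# Sketch of Steps 2-4: partition the UOD into seven equal intervals and
# assign each observation the fuzzy set of the interval containing it.
LOW, HIGH, K = 13000, 20000, 7
width = (HIGH - LOW) / K                      # 1000

def fuzzify(value):
    idx = min(int((value - LOW) // width), K - 1)
    return f"F{idx + 1}"

enrolments = {1971: 13055, 1974: 14696, 1990: 19328}
print({year: fuzzify(v) for year, v in enrolments.items()})
# {1971: 'F1', 1974: 'F2', 1990: 'F7'}
```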
Table 1 Enrolments of the University of Alabama (actual and fuzzified)
Year | Actual | Fuzzified
1971 | 13,055 | F1
1972 | 13,563 | F1
1973 | 13,867 | F1
1974 | 14,696 | F2
1975 | 15,460 | F3
1976 | 15,311 | F3
1977 | 15,603 | F3
1978 | 15,861 | F3
1979 | 16,807 | F4
1980 | 16,919 | F4
1981 | 16,388 | F4
1982 | 15,433 | F3
1983 | 15,497 | F3
1984 | 15,145 | F3
1985 | 15,163 | F3
1986 | 15,984 | F3
1987 | 16,859 | F4
1988 | 18,150 | F6
1989 | 18,970 | F6
1990 | 19,328 | F7
1991 | 19,337 | F7
1992 | 18,876 | F6
Table 2 Predicted value from different models
Partition | Year | Actual | Gangwar and Kumar [17] | Gautam et al. [26] | Jain et al. [14] | Alam et al. [19] | Zeng et al. [31] | Pant and Kumar [38] | Panigrahi and Behera [35] | Pattanayak et al. [34] | Proposed
1 | 1971 | 13,055 | – | – | 14,011 | – | – | – | 13,049 | 13,055 | –
1 | 1972 | 13,563 | – | 13,423 | 14,091 | 13,500 | 13,563.13 | 13,682 | 14,049 | 13,637 | –
1 | 1973 | 13,867 | 13,500 | 13,668 | 14,402 | 13,500 | 13,866.65 | 13,682 | 14,349 | 14,120 | 13,500
1 | 1974 | 14,696 | 14,500 | 14,648 | 15,145 | 14,500 | 14,696.31 | 14,722 | 14,549 | 14,408 | 14,500
1 | 1975 | 15,460 | 15,500 | 15,383 | 15,918 | 15,375 | 15,450.02 | 15,427 | 15,049 | 15,195 | 15,500
1 | 1976 | 15,311 | 15,500 | 15,309 | 15,768 | 15,375 | 15,414.23 | 15,544 | 15,549 | 15,712 | 15,500
1 | 1977 | 15,603 | 15,500 | 15,546 | 16,052 | 15,458 | 15,515.36 | 15,544 | 15,449 | 15,635 | 15,500
1 | 1978 | 15,861 | 15,500 | 15,309 | 16,194 | 15,375 | 15,624.97 | 15,544 | 15,649 | 15,786 | 15,500
2 | 1979 | 16,807 | – | 16,748 | 16,827 | 16,500 | 16,816.33 | 16,665 | 15,749 | 15,918 | –
2 | 1980 | 16,919 | – | 17,178 | 16,932 | 16,833 | 17,474.65 | 15,994 | 16,349 | 16,406 | –
2 | 1981 | 16,388 | 16,500 | 17,178 | 16,374 | 16,833 | 16,784.26 | 17,230 | 16,449 | 16,466 | 16,500
2 | 1982 | 15,433 | 15,622 | 15,383 | 15,892 | 15,375 | 15,405.14 | 15,994 | 16,049 | 16,190 | 15,500
2 | 1983 | 15,497 | 15,500 | 15,309 | 15,952 | 15,375 | 15,419.12 | 15,994 | 15,549 | 15,698 | 15,500
2 | 1984 | 15,145 | 15,500 | 15,309 | 15,594 | 15,375 | 15,426.09 | 15,544 | 15,549 | 15,731 | 15,500
2 | 1985 | 15,163 | 15,500 | 15,546 | 15,612 | 15,458 | 15,566.30 | 15,544 | 15,349 | 15,550 | 15,500
3 | 1986 | 15,984 | – | 15,546 | 16,194 | 15,458 | 15,575.51 | 15,516 | 15,349 | 15,559 | –
3 | 1987 | 16,859 | – | 16,748 | 16,877 | 16,833 | 16,893.30 | 15,516 | 15,849 | 15,982 | –
3 | 1988 | 18,150 | 18,375 | 17,178 | 18,918 | 16,833 | 17,214.01 | 16,665 | 16,349 | 16,433 | 18,250
3 | 1989 | 18,970 | 18,500 | 18,962 | 18,918 | 18,750 | 18,969.86 | 17,230 | 17,149 | 17,366 | 18,500
3 | 1990 | 19,328 | 19,500 | 19,208 | 18,918 | 18,750 | 19,333.83 | 19,311 | 17,649 | 17,967 | 19,300
3 | 1991 | 19,337 | 19,500 | 19,208 | 18,918 | 18,750 | 19,106.35 | 19,311 | 17,849 | 18,230 | 19,500
3 | 1992 | 18,876 | 18,763 | 18,593 | 18,918 | 18,750 | 19,100.96 | 19,311 | 17,849 | 18,236 | 18,815
Table 3 Error comparison of different models
Model | RMSE | AFE
Gangwar and Kumar [17] | 247.44 | 1.31
Gautam et al. [26] | 347.89 | 1.46
Jain et al. [14] | 396.47 | 2.34
Alam et al. [19] | 417.37 | 1.79
Zeng et al. [31] | 303.66 | 1.15
Pant and Kumar [38] | 422.68 | 2.89
Panigrahi and Behera [35] | 1590.8 | 3.83
Pattanayak et al. [34] | 1342.3 | 3.46
Proposed | 233.2 | 1.15
Step 5: We split the time series into three partitions using ratio formula and each partition consist of enrolments for the years 1971 to 1978, 1979 to 1985 and 1986 to 1992, respectively. Step 6: The proposed model is utilized to anticipate the time series data, and the forecasted values are tabulated in Table 2 alongside the outcomes from other models. To anticipate the enrolment for the year 1973, we utilised actual data from the previous two years, 1971 and 1972, and then used Eq. (4) to determine the difference parameter Di , from which the variables Fib and F Fib were computed using Eqs. (5) and (6), respectively. After testing the condition on Fib and F Fib , we use Eqs. (7)– (10) to update the parameters U and V, and then use Eq. (11) to make the forecast. To predict for the year 1974, we took real data from the previous three years, 1971– 1973, and then repeated the method, yielding a forecasted value of 14,500. The rest of the data points followed the same trend. Step 7: We calculate the RMSE and AFE (Table 3) to compare the model’s forecasting accuracy with certain previous models. According to Table 3, the suggested model’s RMSE and AFE values were found to be 233.2 and 1.15, respectively, which are the lowest in comparison to other models. Although the suggested model and Zeng et al. [31] have the same AFE value, the new model outperforms Zeng et al. [31] model in terms of RMSE. As a result, we may infer that the current model has a higher predicting accuracy based on these two measures.
5 Conclusion
In this study, a computational algorithm to forecast high-order weighted FTS by optimizing the weights of FLRs using PSO is proposed. In prior studies, the weights were either taken as equal, meaning that past relations have an equal impact on the forecast of the next state, or taken in ascending order, implying that recent relations have a stronger impact than older relations. In this study, we have given recent relations higher weights and subsequently optimized them using PSO to increase the forecasting accuracy of the model. The model has been applied to the University of Alabama enrolment dataset, and RMSE and AFE have been used to test the model's accuracy. As illustrated in Table 3, the proposed model stands out because it has the lowest RMSE and AFE measures compared with the other models. The proposed approach could also be used to analyse large amounts of time series data.
Acknowledgements The first author gratefully acknowledges the support of the UGC (F. No. 16-9 (June 2018)/2019 (NET/CSIR)) of the Government of India for this research.
References 1. Song Q, Chissom BS (1993) Fuzzy time series and its models. Fuzzy Sets Syst 54(3):269–277 2. Song Q, Chissom BS (1993) Forecasting enrollments with fuzzy time series—Part I. Fuzzy Sets Syst 54(1):1–9 3. Song Q, Chissom BS (1994) Forecasting enrollments with fuzzy time series—Part II. Fuzzy Sets Syst 62(1):1–8 4. Zadeh LA (1965) Fuzzy sets. Inf Control 8:338–353 5. Chen SM (1996) Forecasting enrollments based on fuzzy time series. Fuzzy Sets Syst 81(3):311–319 6. Chen SM, Zou XY, Gunawan GC (2019) Fuzzy time series forecasting based on proportions of intervals and particle swarm optimization techniques. Inf Sci 500:127–139 7. Cheng SH, Chen SM, Jian WS (2016) Fuzzy time series forecasting based on fuzzy logical relationships and similarity measures. Inf Sci 327:272–287 8. Egrioglu E, Aladag CH, Yolcu U (2013) Fuzzy time series forecasting with a novel hybrid approach combining fuzzy c-means and neural networks. Expert Syst Appl 40(3):854–857 9. Huarng K, Yu TH-K (2006) Ratio-based lengths of intervals to improve fuzzy time series forecasting. IEEE Trans Syst Man Cybern B (Cybernetics) 36(2):328–340 10. Singh P, Dhiman G (2018) A hybrid fuzzy time series forecasting model based on granular computing and bio-inspired optimization approaches. J Comput Sci 27:370–385 11. Abhishekh, Gautam SS, Singh SR (2018) A refined weighted method for forecasting based on type 2 fuzzy time series. Int J Model Simulat 38(3):180–188 12. Gautam SS (2019) A novel moving average forecasting approach using fuzzy time series data set. J Contr Autom Electr Syst 30(4):532–544 13. Singh SR (2008) A computational method of forecasting based on fuzzy time series. Math Comput Simul 79(3):539–554 14. Jain S, Mathpal PC, Bisht D, Singh P (2018) A unique computational method for constructing intervals in fuzzy time series forecasting. Cybern Inf Technol 18(1):3–10 15. Bisht K, Kumar S (2019) Hesitant fuzzy set based computational method for financial time series forecasting. Granular Comput 4(4):655–669 16. Gangwar SS, Kumar S (2012) Partitions based computational method for high-order fuzzy time series forecasting. Expert Syst Appl 39(15):12158–12164 17. Gangwar SS, Kumar S (2015) Computational method for high-order weighted fuzzy time series forecasting based on multiple partitions. In: Chakraborty MK, Skowron A, Maiti M, Kar S (eds) Facets of Uncertainties and Applications: ICFUA, Kolkata, India, December 2013. Springer, New Delhi, pp 293–302. https://doi.org/10.1007/978-81-322-2301-6_22 18. Joshi BP, Kumar S (2012) A computational method of forecasting based on intuitionistic fuzzy sets and fuzzy time series. In: Proceedings of the international conference on soft computing for problem solving (SocProS 2011) 20–22 December 2011. Springer, New Delhi, pp 993–1000 19. Alam NMFHNB, Ramli N, Mohamad D (2021) Fuzzy time series forecasting model based on intuitionistic fuzzy sets and arithmetic rules. In: AIP conference proceedings, vol 2365, no 1. AIP Publishing LLC, p 050003 20. Yu HK (2005) Weighted fuzzy time series models for TAIEX forecasting. Physica A 349(3– 4):609–624 21. Cheng C-H, Chen T-L, Chiang C-H (2006) Trend-weighted fuzzy time-series model for TAIEX forecasting. In: King I, Wang J, Chan L-W, Wang DL (eds) Neural Information Processing. Springer, Heidelberg, pp 469–477. https://doi.org/10.1007/11893295_52 22. Rubio A, Bermúdez JD, Vercher E (2016) Forecasting portfolio returns using weighted fuzzy time series methods. Int J Approx Reason 75:1–12 23. 
Kumar S (2019) A modified weighted fuzzy time series model for forecasting based on twofactors logical relationship. Int J Fuzzy Syst 21(5):1403–1417 24. Yang R, He J, Xu M, Ni H, Jones P, Samatova N (2018) An intelligent and hybrid weighted fuzzy time series model based on empirical mode decomposition for financial markets forecasting. In: Perner P (ed) Advances in Data Mining. Applications and Theoretical Aspects. Springer, Cham, pp 104–118. https://doi.org/10.1007/978-3-319-95786-9_8
25. Jiang P, Dong Q, Li P, Lian L (2017) A novel high-order weighted fuzzy time series model and its application in nonlinear time series prediction. Appl Soft Comput 55:44–62 26. Gautam SS, Singh SR (2020) A modified weighted method of time series forecasting in intuitionistic fuzzy environment. Opsearch 57:1022–1041 27. Singh P (2020) A novel hybrid time series forecasting model based on neutrosophic-PSO approach. Int J Mach Learn Cybern 11(8):1643–1658 28. Tinh NV (2020) Enhanced forecasting accuracy of fuzzy time series model based on combined fuzzy C-mean clustering with particle swam optimization. Int J Comput Intell Appl 19(02):2050017 29. Iqbal S, Zhang C, Arif M, Hassan M, Ahmad S (2020) A new fuzzy time series forecasting method based on clustering and weighted average approach. J Intell Fuzzy Syst 38(5):6089– 6098 30. Pattanayak RM, Behera HS, Panigrahi S (2020) A multi-step-ahead fuzzy time series forecasting by using hybrid chemical reaction optimization with pi-sigma higher-order neural network. In: Das AK, Nayak J, Naik B, Pati SK, Pelusi D (eds) Computational Intelligence in Pattern Recognition: Proceedings of CIPR 2019. Springer, Singapore, pp 1029–1041. https:// doi.org/10.1007/978-981-13-9042-5_88 31. Zeng S, Chen SM, Teng MO (2019) Fuzzy forecasting based on linear combinations of independent variables, subtractive clustering algorithm and artificial bee colony algorithm. Inf Sci 484:350–366 32. Pattanayak RM, Behera HS, Panigrahi S (2020) A novel hybrid differential evolution-PSNN for fuzzy time series forecasting. In: Behera HS, Nayak J, Naik B, Pelusi D (eds) Computational Intelligence in Data Mining: Proceedings of the International Conference on ICCIDM 2018. Springer, Singapore, pp 675–687. https://doi.org/10.1007/978-981-13-8676-3_57 33. Pattanayak RM, Behera HS, Panigrahi S (2021) A novel probabilistic intuitionistic fuzzy set based model for high order fuzzy time series forecasting. Eng Appl Artif Intell 99:104136 34. Pattanayak RM, Panigrahi S, Behera HS (2020) High-order fuzzy time series forecasting by using membership values along with data and support vector machine. Arab J Sci Eng 45(12):10311–10325 35. Panigrahi S, Behera HS (2020) A study on leading machine learning techniques for high order fuzzy time series forecasting. Eng Appl Artif Intell 87:103245 36. Egrioglu E, Bas E, Yolcu U (2020) Intuitionistic fuzzy time series functions approach for time series forecasting. Granul Comput 37. Egrioglu E, Bas E, Yolcu U, Chen MY (2020) Picture fuzzy time series: defining, modeling and creating a new forecasting method. Eng Appl Artif Intell 88:103367 38. Pant M, Kumar S (2021) Particle swarm optimization and intuitionistic fuzzy set-based novel method for fuzzy time series forecasting. Granular Comput 1–19 39. Kennedy J, Eberhart R (1995) Particle swarm optimization. In: Proceedings of ICNN 1995international conference on neural networks, vol 4. IEEE, pp 1942–1948
Chapter 3
Assessing Usability of Mobile Applications Developed for Autistic Users through Heuristic and Semiotic Evaluation Sayma Alam Suha, Muhammad Nazrul Islam, Shammi Akter, Milton Chandro Bhowmick, and Rathin Halder
1 Introduction
Autism is defined to be a neuro-developmental disorder that can be identified through early dysfunction of social skills and communication along with rigid and repetitive sensory-motor behavioural habits and interests [11]. Researchers have described autism as a serious impairment in brain development that first occurs in children, and the symptoms tend to persist throughout their lifetimes into adulthood [2]. Linguistic development with verbal or nonverbal communication difficulties is seen as one of the main characteristics of autism, and autistic people usually respond effectively to visual approaches [7, 14, 18, 21]. Interactive and stimulating mobile apps are one of the most promising technologies to help autistic people develop their communication abilities [1]. Therefore, Augmented and Alternative Communication (AAC) therapies are recommended for such people [29]. AAC is defined to be one of the most effective ways of communication for autistic people or for people with communication impairments [5]. Although a vast variety of AAC devices and tools exist, mobile applications are remarkably regarded as the most appropriate ones due to their portability, affordability and user friendliness [32]. These kinds of mobile applications are designed to present characters, words, or symbols to interact with others [35] and generally use the Picture Exchange Communication System (PECS), which involves the exchange of pictures to communicate with other persons [12, 13]. These mobile applications are required to be usable for the focused end users. Thus it is necessary to evaluate the usability of such applications to make them easier to understand, interact with and use by autistic users. Though a number of studies have been conducted to evaluate the usability of mobile applications [17, 19, 20, 30], little attention has been paid to the usability and its
S. A. Suha · M. N. Islam (B) · S. Akter · M. C. Bhowmick · R. Halder Department of Computer Science and Engineering, Military Institute of Science and Technology, Mirpur Cantonment, Dhaka, Bangladesh e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 M. S. Uddin et al. (eds.), Proceedings of International Joint Conference on Advances in Computational Intelligence, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-19-0332-8_3
evaluation while developing the apps for autistic users. Hence, it is crucially significant to systematically evaluate such mobile apps to assess how these apps can satisfy the usability criteria for this specific group of end-users. Thus, the objectives of this research are to evaluate the usability of mobile applications developed for enhancing the communication skills of autistic people; to assess the applicability of the modified set of heuristics suggested for evaluating the usability of such mobile apps for autism; to assess the importance of semiotic evaluation for autism communication apps; and to provide a comparative view of the findings of the different usability evaluation methods. To attain these objectives, four mobile applications developed for autistic users to enhance their communication skills were selected, and then a heuristic evaluation and a semiotic evaluation were conducted. The rest of the paper is structured in the following way: Sect. 2 presents the related literature; Sect. 3 discusses the methodology followed in this research; Sect. 4 presents the evaluation of the applications and the findings; and finally, Sect. 5 includes the discussion and conclusion that highlight the main outcomes, limitations and future work.
2 Background and Related Works
The primary problems for children with autism are the malfunction of understanding and thinking or cognitive function, as well as a lack of self-regulation and control of communication via verbal or nonverbal motives, feelings and emotion [34]. Thus, children with autism require special education and guidance along with proper treatment. However, several software and mobile solutions have been developed for autistic children to learn or teach [8, 31]. In this study, the focused area of assessment is mobile applications using the PECS technique, which is a type of AAC technology for developing the communication skills of autistic users. For evaluating such applications, two techniques were chosen. The first one is heuristic evaluation. Jakob Nielsen developed a set of 10 general principles called heuristics to check the usability of any user interface [27]. However, in this study, a modified set of heuristics was used to meet the goal of constructive evaluation of interventions for children with ASD [23]. The second approach is semiotic evaluation, where the heuristics proposed in the Semiotic Interface Sign Design and Evaluation (SIDE) framework [15, 16] were used to evaluate the selected mobile apps. A number of studies have been conducted focusing on usability evaluation and software or apps developed for autistic users. According to Sofian et al. [33], despite the fact that the number of autism-related mobile applications is growing, most of them are underutilized and under-explored by autistic children due to a lack of suitable usability standards. Thus, by evaluating past studies, they suggested a theoretical usability guideline to assist practitioners in developing the UI of mobile apps for autistic children focusing on usability aspects. In another study, Barry et al. [4] evaluated
the usefulness of educational gaming software for autistic children in a systematic approach, where they emphasized that these learners' user preferences and learning styles require additional consideration in UI and interaction design, which can incorporate eye-tracking analysis for more effective results. Camargo et al. [10] evaluated a mobile application prototype built with 'Marvel' for autistic children based on the SIDE framework for improving the usability performance of the app, while Brandão et al. [9] utilized the Semiotic Inspection Method (SIM) to evaluate a gaming application designed to help children with special needs strengthen their cognitive skills. Al-Wakeel et al. [3] compared the usability of two Arabic mobile applications formulated for children with ASD, where eye tracking and Morae measurement tools were employed in the analysis, and assessment tools were employed to collect qualitative and quantitative information and to assess the users' contentment with the apps. Khan et al. [22] conducted a survey of ASD users to assess the usability of two communication apps for autistic persons, both of which are accessible on Android and iPhones, and then compared the results to address usability concerns. Bhuiyan et al. [6] employed interviews and observation analysis to evaluate the usability of a smartphone-based system called 'MumIES', which helps children with special needs to overcome their difficulties. So far, none of the research works has conducted both heuristic and semiotic evaluation for the usability evaluation of communication-related applications developed for autistic users. However, a few researchers have used such usability evaluation techniques for assessing other types of applications. For example, Kundu et al. [24] used heuristic and semiotic assessment techniques to evaluate the usability of two pregnancy monitoring apps developed in the context of Bangladesh, with the goal of providing design recommendations to improve the general usability and acceptability of such apps. Similarly, Muaz et al. [25] evaluated the usability of three truck-hiring apps using heuristic and semiotic assessment techniques in order to give design recommendations for improving their usability and customer experience. Designing an interactive and user-friendly mobile application where the target users are a special group of people (having autism spectrum disorder) is often very challenging, because the process is substantially different from developing a conventional mobile application. Again, the developers need to be very conscious about the user interface with specialized requirements for autistic users. Though many researchers have done usability evaluations from various perspectives, the evaluation of mobile apps developed for improving the communication skills of autistic people using both the heuristic and the semiotic approach is a unique one, which has been conducted in this study.
3 Research Methodology
An overview of the research methodology is presented in Fig. 1. Initially, four mobile applications developed for enhancing the communication skills of autistic people were selected. For the selection of applications, the 'Google Play Store'
Fig. 1 Overview of the research methodology
was searched with different types of keywords like 'augmented and alternate communication talk', 'autism communication', 'autism communication helper', etc. A number of related apps were found, and from the resultant list four were selected based on the following criteria: (i) use of the PECS features and AAC technology for autistic users to communicate with others; (ii) user rating in the Google Play Store; (iii) number of downloads since the app release; and (iv) comments provided as user reviews. The selected apps were (a) SymboTalk - AAC Talker (Rating: 4.1, Downloads: 50,000+), (b) Help Me Talk (Rating: 3.6, Downloads: 10,000+), (c) LetMeTalk: Free AAC Talker (Rating: 4.1, Downloads: 100,000+) and (d) Leeloo AAC - Autism Speech App (Rating: 4.2, Downloads: 50,000+). After the selection of apps, both heuristic and semiotic usability assessments were conducted for each of these apps by four evaluators having expertise in user interface design and development as well as human-computer interaction. In addition, all the evaluators were Computer Science graduates and are currently pursuing their postgraduate degrees. In the heuristic evaluation, the applications were first evaluated following the heuristics proposed by Khowaja et al. [23]. Khowaja et al. [23] proposed these 15 heuristics (see Table 1) for evaluating mobile applications developed for autistic users as an extended version of Nielsen's set of 10 heuristics [28]. During the usability evaluation, each usability problem was noted in an Excel sheet with the fields: 'Where is the problem', 'Problem Description', 'Evidence', 'Severity of the problem', 'Possible Solution', and 'Relationship with other problems'. Here, the severity ratings suggested by Nielsen [26] have been used, where 0 indicates not a usability problem at all; 1 indicates a cosmetic problem; 2 implies a minor usability problem; 3 denotes a major usability problem; and 4 conveys a catastrophic usability problem. Following the heuristic principles, the evaluators examined each of the applications individually, and their results were then cross-checked and integrated, while conflicts raised during the integration were resolved through discussion. Interface signs need to be intuitive in applications for autistic individuals because of their higher dependency on user interface signs; thus the second approach chosen for usability evaluation was semiotic evaluation. For conducting the semiotic evaluation, the set of heuristics proposed in the SIDE framework [15, 16] was chosen (see Table 2). For the semiotic evaluation, the evaluators were instructed to fill up the following sections for each key interface sign of the selected applications: 'Interface sign', 'Intended Meaning of the interface sign', 'Intuitiveness',
Table 1 Modified set of heuristics for evaluating applications developed for autistic users
H1 [Visibility] of system status
H2 [Match] between system and the real world
H3 [Consistency] and standards
H4 [Recognition] rather than recall
H5 Aesthetic and [minimalist] design: minimise distraction and keep design simple
H6 User [control] and freedom
H7 [Error] prevention
H8 [Flexibility] and efficiency of use
H9 Help users recognise, diagnose, and [recover] from errors
H10 Help and [documentation]
H11 [Personalisation] of screen items
H12 User interface [screens] of the system
H13 [Responsiveness] of the system
H14 [Track] user activities, monitor performance and repeat activity
H15 Use of [multi-modalities] for communication
'Evidence', and 'Suggest Possible Solution'. Here, 'intended meaning of the interface sign' refers to the actual meaning of that specific sign and the meaning that the developer intended to convey. 'Intuitiveness' refers to the evaluator's judgement on a scale of 1–9, where 1–3 signifies low intuitiveness; 4–6, moderate intuitiveness; and 7–9, high intuitiveness, based on how much the sign violates the semiotic evaluation structure. 'Evidence' points out why an interface sign is problematic by referring to the semiotic heuristic(s) it violated from the set of heuristics. Finally, 'suggest possible solution' refers to providing a possible solution to overcome the problems of an interface sign. After the individual assessments, the results were aggregated for better judgement, and the severity level analysis for the selected apps was then done according to the severity ratings suggested by Nielsen. For the severity rating it has been considered in this study that, if the intuitiveness score is 8–9, it is a cosmetic problem; a score of 5–7 is a minor usability problem; a score of 4–5 is a major usability problem; and 1–3 is a catastrophic usability problem. Finally, a comparison between the findings of the heuristic and semiotic evaluation was conducted.
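A minimal Python sketch of how the two rubrics described above could be encoded for the aggregation step is given below. The thresholds simply restate the mappings given in the text (the bands for minor and major overlap at a score of 5 in the text; the sketch treats 5 as minor), and the example problem records are hypothetical, not findings from the study.

    def heuristic_severity_label(rating):
        # Nielsen severity ratings (0-4) used in the heuristic evaluation
        return {0: "not a problem", 1: "cosmetic", 2: "minor",
                3: "major", 4: "catastrophic"}[rating]

    def semiotic_severity(intuitiveness):
        # Intuitiveness score (1-9) mapped to a severity category as described above
        if intuitiveness >= 8:
            return "cosmetic"
        if intuitiveness >= 5:
            return "minor"
        if intuitiveness == 4:
            return "major"
        return "catastrophic"

    # Hypothetical evaluator records for one app
    problems = [
        {"where": "Home page", "description": "mode not visible", "severity": 3},
        {"where": "Edit mode", "description": "icon colour too faint", "severity": 1},
    ]
    avg_severity = sum(p["severity"] for p in problems) / len(problems)
    print(heuristic_severity_label(3), avg_severity, semiotic_severity(6))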
Table 2 Set of semiotic heuristics (levels: Syntactic, Pragmatic, Social, Environmental, Semantic)
SH1. Clearly present the purpose of interactivity
SH2. Make effective use of color to design an interface sign
SH3. Make the representamen readable and clearly noticeable
SH4. Make a sign presentation clear and concise
SH5. Create the representamen context appropriately
SH6. Follow a consistent interface sign design strategy
SH7. Place the interface sign in the proper position in a UI
SH8. Make effective use of amplification features
SH9. Create good relations among the interface signs of a UI
SH10. Retain the logical coherence in interface sign design
SH11. Design interface signs to be culturally sensitive or reactive
SH12. Matches the reality, conventions, or real-world objects
SH13. Make effective use of organizational features
SH14. Map with metaphorical and attributing properties
SH15. Model the profiles of the focused end-users
SH16. Make effective use of ontological guidelines
SH17. Realize a match between a designer's encoded and a user's decoded meaning
4 Evaluation of the Applications
4.1 Heuristic Evaluation
Heuristic evaluation is a usability assessment procedure in which the evaluators determine an interface's compatibility against a set of rules. The evaluators here evaluated the four mobile applications considering the 15 rules of the modified heuristics, and all flaws were identified together with suggested possible solutions. For example, in Fig. 2, a few user interfaces (UIs) of the SymboTalk app are presented. In this app there are two main modes, the 'ME/Autism' mode for autistic users and the 'Edit' mode for the caregivers. In these UIs, no information about the system status is visible to understand whether a user is currently in 'ME' mode or 'Edit' mode. One of the evaluators identified it as a major usability problem with a severity rating of 3, since it violates heuristics H1 and H3 (see Table 1). To address this usability issue the evaluator suggested that the mode of the app should be clearly indicated on top of each page to make the system status understandable for users. As the outcomes of the heuristic evaluation, the evaluators detected a total of 12 usability problems for the 'SymboTalk' app, 16 problems for the 'HelpMe Talk' app, 13 problems for the 'LetMe Talk' app and 10 problems for the 'Leeloo-AAC' app. Each of the detected
Fig. 2 User interfaces of 'Home page', 'Edit Mode' and 'Autism Mode' pages from the SymboTalk - AAC Talker application
problems violated a number of heuristics. The findings (usability problems and the associated violations of heuristics) of each evaluator for each app were integrated, and the synthesized results are presented in Table 3. The table shows the number of problems against each heuristic for each application, the total number of heuristic violations for each app, and the total number of violations found against each heuristic. The findings imply that the four applications violated different heuristic principles; among them HelpMeTalk has the most heuristic violations (n=32) compared with the other apps, while Leeloo-AAC has the fewest heuristic violations (n=20). It is also visible that user control and freedom (H6) was violated most in the SymboTalk app; flexibility and efficiency of use (H8) was violated most in the HelpMeTalk app; visibility of system status (H1) was violated most in the LetMeTalk app; and finally personalisation of screen items (H11) was violated most in the Leeloo-AAC app. Again, in general among the 15 heuristics, the most violated heuristics for the selected apps are H1 (n=12) and H8 (n=12), followed by H2 (n=10) and H11 (n=10). On the other hand, the least violated heuristics are H7 (n=2), followed by H15 (n=3). Again, the problems of the applications were analyzed with respect to their severity (see Fig. 3). The analysis shows that the highest number of cosmetic problems (severity rating 1) arise in the HelpMeTalk and Leeloo-AAC apps; the highest number of minor usability problems (severity rating 2) arise in the LetMeTalk app; then the highest
Table 3 Number of problems to each heuristic
Modified heuristics | SymboTalk (12 problems) | HelpMe Talk (16 problems) | LetMeTalk (13 problems) | LeelooAAC (10 problems) | Total violations (per heuristic)
Heuristic1 | 3 | 4 | 4 | 1 | 12
Heuristic2 | 3 | 3 | 3 | 1 | 10
Heuristic3 | 1 | 1 | 0 | 2 | 4
Heuristic4 | 1 | 1 | 1 | 1 | 4
Heuristic5 | 1 | 4 | 3 | 1 | 9
Heuristic6 | 4 | 1 | 2 | 2 | 9
Heuristic7 | 1 | 0 | 1 | 0 | 2
Heuristic8 | 2 | 5 | 3 | 2 | 12
Heuristic9 | 2 | 1 | 3 | 1 | 7
Heuristic10 | 0 | 2 | 1 | 2 | 5
Heuristic11 | 3 | 2 | 2 | 3 | 10
Heuristic12 | 1 | 3 | 1 | 2 | 7
Heuristic13 | 1 | 2 | 1 | 1 | 5
Heuristic14 | 1 | 2 | 2 | 1 | 6
Heuristic15 | 1 | 1 | 1 | 0 | 3
Total violations (per app) | 25 | 32 | 28 | 20 |
Fig. 3 Severity levels of the identified problems for different apps
major usability problems (severity rating 3) arise in the SymboTalk app; and finally the highest number of catastrophic usability problems (severity rating 4) arise in the HelpMeTalk app. The average severity rating of SymboTalk is 2.6, HelpMeTalk is 2.7, LetMeTalk is 2.5 and Leeloo-AAC is 2.1 out of 4.00. The results thus indicate that, on average, the problems found in HelpMeTalk were more severe.
4.2 Semiotic Evaluation
Semiotic evaluation investigates the visual elements of the selected applications, including symbols, navigation links, buttons, icons, and other visual directions. Therefore, the key interface elements present in the four applications were selected for the semiotic evaluation. Examples of the interface signs selected for the
Fig. 4 Selected Interface Signs from each app for evaluation
Fig. 5 Home interface sign of SymboTalk app
evaluation are illustrated in Fig. 4. In this research, the interface elements of the four selected applications were evaluated using the semiotic heuristics (see Table 2) proposed in the SIDE framework [15, 16]. For all the selected applications, the evaluators identified 10 key interface signs to evaluate and conducted a semiotic evaluation for each of the signs. For example, the app icon of the SymboTalk app stands for the 'Homepage'. But in the 'ME/Autism' mode the icon does not provide any functionality (see Fig. 5), though on some other pages the app icon stands for navigating back to the homepage, which violates semiotic heuristics SH1, SH6, SH12 and SH16. So the evaluator marked this sign as less intuitive, as it works as a homepage link on some pages but does not provide any meaning on the rest of the pages. The evaluators suggested making the app icon consistently intuitive for all pages. As an alternative, they also recommended using a conventional interface sign for 'Homepage' instead of the sign used in this case, and that sign should be consistently meaningful for all pages. Again, the evaluators provided intuitiveness scores for each of the interface signs of the selected apps. For example, in Fig. 6 the user interfaces of the four applications are given, representing the sentence-making blocks of each app. The intuitiveness scores of the sentence-making block for SymboTalk, HelpMeTalk, LetMeTalk and Leeloo-AAC were 5, 4, 6 and 9, respectively. The sentence-making block of SymboTalk, HelpMeTalk and LetMeTalk violates semiotic heuristics from the syntactic, pragmatic and environmental levels and thus represents moderate intuitiveness. On the other hand, the Leeloo-AAC app has succeeded in expressing the intended meaning more accurately compared with the other three apps, and thus the evaluator judged it to be highly intuitive. The aggregated and average intuitiveness scores for each of the selected interface signs of each app are presented in Table 4. The results show the average intuitiveness for each of the apps, which indicates that Leeloo-AAC possesses the highest average intuitiveness (score 6.7), followed by SymboTalk (score 5.6) and LetMeTalk (score 5.3), and the least average intuitiveness was observed in the HelpMeTalk (score 4.7) app.
Fig. 6 ‘Sentence block’ signs of (a) SymboTalk, (b) LetMeTalk, (c) HelpMeTalk, (d) Leeloo
Fig. 7 Severity level of interface signs
The problems were further categorized according to their severity levels. A higher severity level indicates lower intuitiveness of the application and vice versa. The severity level analysis of the four applications is illustrated in Fig. 7. The analysis shows that the highest number of signs belonging to cosmetic problems arises in the Leeloo-AAC app; signs belonging to minor usability problems are observed mostly for three apps - SymboTalk, HelpMeTalk and LetMeTalk; the highest number of major usability problems arises for both SymboTalk and LetMeTalk; and finally the highest number of catastrophic problems arises for the HelpMeTalk app. The average severity rating of SymboTalk is 2.4, HelpMeTalk is 2.9, LetMeTalk is 2.7 and Leeloo-AAC is 1.9 out of 4.0. The results thus indicate that, on average, the problems found (intuitiveness of interface signs) in HelpMeTalk were more severe.
Table 4 Intuitiveness scoring for selected apps
Interface signs | SymboTalk | HelpMe Talk | LetMeTalk | LeelooAAC
Logo | 4 | 8 | 7 | 5
Home button | 3 | 2 | 1 | 8
Communication block | 8 | 7 | 7 | 9
Sentence making block | 5 | 4 | 6 | 9
Add customized symbol | 9 | 3 | 3 | 7
Language settings | 5 | 2 | 8 | 8
Save sign | 6 | 5 | 2 | 3
Back sign | 2 | 8 | 8 | 8
Share sign | 6 | 0 | 3 | 8
Play sign | 8 | 8 | 8 | 2
Avg intuitiveness (per app) | 5.6 | 4.7 | 5.3 | 6.7
4.3 Comparative Analysis
The findings of the heuristic evaluation showed that the Leeloo-AAC app has the fewest usability issues and the fewest heuristic violations with the lowest average severity rating, followed by the SymboTalk and LetMeTalk apps. On the other hand, the HelpMeTalk app has the most identified usability issues and the most heuristic violations with the highest severity rating. In the context of the semiotic evaluation, the Leeloo-AAC app shows relatively higher intuitiveness for most of the selected interface signs with the lowest average severity rating. SymboTalk and LetMeTalk balanced the average level of intuitiveness by maintaining moderately intuitive interface signs. HelpMeTalk showed relatively the worst balance of intuitiveness, where most interface signs were less intuitive, and thus possessed the highest average severity rating. In sum, the study showed that the 'Leeloo-AAC' app has the best usability compared with the other three apps, with 'HelpMeTalk' having the least usability from both the heuristic and the semiotic standpoint. The results also showed that most of the usability problems identified through the heuristic evaluation were not observed in the semiotic evaluation and, similarly, many of the problems identified by the semiotic evaluation were not revealed through the heuristic evaluation. Thus, integrating both approaches would be a more effective way of enhancing app usability.
5 Discussion and Conclusion
In this study, the usability of four communication skill development applications for autistic users was assessed based on heuristic and semiotic principles. For each of the apps, the heuristic evaluation helped to identify usability problems, whereas the semiotic evaluation provided the intuitiveness of the interface elements. The comparative examination between the apps highlighted the effectiveness, consistency, adaptability, usefulness and preferences of the apps. The study result showed that all applications have a noticeable number of usability problems, while the 'Leeloo' app showed the best and 'HelpMeTalk' the poorest performance from the usability perspective. The findings of this study will substantially assist practitioners in designing and developing apps for improving the communication skills of autistic end users. As these apps are designed for persons with disabilities, the findings of this investigation will help professionals better understand existing users' capabilities and usability difficulties. The findings can be proposed and utilized for future development and for maintaining these apps by fixing the usability issues. As a result, autistic users may face less difficulty and achieve better performance while using these applications. The study's limitation was that the assessment procedure did not collect information from any actual autistic end users; instead, the study's results were solely based on expert evaluation. In the future, conducting a detailed survey and interviews can assist in overcoming this limitation. In addition, the researchers hope to conduct additional studies like these for more autism-related apps in the future. Today, with the development of research and technologies, a lot of people with autism have a stronger potential than they had before; more individuals with autism can communicate and contribute to society, and eventually some of them are expected to be relatively free of adulthood autism symptoms with practice. Thus, building effective and usable communication skill development applications can help autistic end-users overcome communication deficits.
References 1. Ahmad WFW, Zulkharnain NAB (2020) Development of a mobile application using augmentative and alternative communication and video modelling for autistic children. Glob Bus Manag Res 12(4) 2. Akanksha M, Sahil K, Premjeet S, Bhawna K (2011) Autism spectrum disorders (ASD). Int J Res Ayurv Pharm 2(5):1541–1546 3. Al-Wakeel L, Al-Ghanim A, Al-Zeer S, Al-Nafjan K (2015) A usability evaluation of Arabic mobile applications designed for children with special needs-autism. Lect Notes Softw Eng 3(3):203 4. Barry M, Kehoe A, Pitt I (2008) Usability evaluation of educational game software for children with autism. In: EdMedia+ innovate learning. Association for the Advancement of Computing in Education (AACE), pp 1366–1370 5. Beukelman DR, Mirenda P et al (1998) Augmentative and alternative communication. Paul H, Brookes Baltimore
6. Bhuiyan M, Zaman A, Miraz MH (2017) Usability evaluation of a mobile application in extraordinary environment for extraordinary people. arXiv preprint arXiv:1708.04653 7. Bin Munir M, Alam FR, Ishrak S, Hussain S, Shalahuddin M, Islam MN (2021) A machine learning based sign language interpretation system for communication with deaf-mute people. In: Proceedings of the XXI international conference on human computer interaction, pp 1–9 8. Bölte S, Golan O, Goodwin MS, Zwaigenbaum L (2010) What can innovative technologies do for autism spectrum disorders? 9. Brandão A, Trevisan DG, Brandão L, Moreira B, Nascimento G, Vasconcelos CN, Clua E, Mourão P (2010) Semiotic inspection of a game for children with down syndrome. In: 2010 Brazilian symposium on games and digital entertainment. IEEE, pp 199–210 10. Camargo MC, Carvalho TC, Barros RM, Barros VT, Santana M (2019) Improving usability of a mobile application for children with autism spectrum disorder using heuristic evaluation. In: International conference on human-computer interaction. Springer, pp 49–63 11. Campisi L, Imran N, Nazeer A, Skokauskas N, Azeem MW (2018) Autism spectrum disorder. Br Med Bull 127(1) 12. Charlop-Christy MH, Carpenter M, Le L, LeBlanc LA, Kellet K (2002) Using the picture exchange communication system (PECS) with children with autism: assessment of PECS acquisition, speech, social-communicative behavior, and problem behavior. J Appl Behav Anal 35(3):213–231 13. Flippin M, Reszka S, Watson LR (2010) Effectiveness of the picture exchange communication system (PECS) on communication and speech for children with autism spectrum disorders: a meta-analysis. Am J Speech Lang Pathol 14. Hasan N, Islam MN (2019) Exploring the design considerations for developing an interactive tabletop learning tool for children with autism spectrum disorder. In: International conference on computer networks, big data and IoT. Springer, pp 834–844 15. Islam MN, Bouwman H (2015) An assessment of a semiotic framework for evaluating userintuitive web interface signs. Univ Access Inf Soc 14(4):563–582 16. Islam MN, Bouwman H (2016) Towards user-intuitive web interface sign design and evaluation: a semiotic framework. Int J Hum Comput Stud 86:121–137 17. Islam MN, Bouwman H, Islam AN (2020) Evaluating web and mobile user interfaces with semiotics: an empirical study. IEEE Access 8:84396–84414 18. Islam MN, Hasan AS, Anannya TT, Hossain T, Ema MBI, Rashid SU (2019) An efficient tool for learning bengali sign language for vocally impaired people. In: International conference on mobile web and intelligent information systems. Springer, pp 41–53 19. Islam MN, Karim MM, Inan TT, Islam AN (2020) Investigating usability of mobile health applications in Bangladesh. BMC Med Inf Dec Mak 20(1):19 20. Jerin JQ, Zaki T, Mahmood M, Rochee SK, Islam MN (2020) Exploring design issues in developing usable mobile application for dyscalculia people. In: 2020 Emerging technology in computing, communication and electronics (ETCCE). IEEE, pp 1–6 21. Jordan R (1993) The nature of the linguistic and communication difficulties of children with autism. In: Critical influences on child language acquisition and development. Springer, pp 229–249 22. Khan S, Tahir MN, Raza A (2013) Usability issues for smartphone users with special needsautism. In: 2013 International conference on open source systems and technologies. IEEE, pp 107–113 23. Khowaja K, Salim SS (2015) Correction: heuristics to evaluate interactive systems for children with autism spectrum disorder (ASD). 
Plos One 10(8) 24. Kundu S, Kabir A, Islam MN (2020) Evaluating usability of pregnancy tracker applications in Bangladesh: a heuristic and semiotic evaluation. In: 2020 IEEE 8th R10 humanitarian technology conference (R10-HTC). IEEE. pp 1–6 25. Muaz MH, Islam KA, Islam MN (2020) Assessing the usability of truck hiring mobile applications in Bangladesh using heuristic and semiotic evaluation. In: International conference on design and digital communication. Springer, pp 90–101
26. Nielsen J (1994) Enhancing the explanatory power of usability heuristics. In: Proceedings of the SIGCHI conference on human factors in computing systems. pp 152–158 27. Nielsen J (1994) Usability inspection methods. In: Conference companion on human factors in computing systems. pp 413–414 28. Nielsen J (2005) Ten usability heuristics 29. Nunes DR (2008) AAC interventions for autism: a research summary. Int J Spec Educ 23(2):17– 26 30. Rahman MM, Sarker A, Khan IB, Islam MN (2020) Assessing the usability of ridesharing mobile applications in Bangladesh: an empirical study. In: 2020 61st international scientific conference on information technology and management science of Riga Technical University (ITMS). IEEE, pp. 1–6 31. Sehaba K, Courboulay V, Estraillier P (2006) Observation and analysis of behaviour of autistic children using an interactive system. Technol Disab 18(4):181–188 32. Sennott S, Bowker A (2009) Autism, AAC, and proloquo2go. Perspect Augment Altern Commun 18(4):137–145 33. Sofian NM, Hashim AS, Ahmad WFW (2018) A review on usability guidelines for designing mobile apps user interface for children with autism. In: AIP conference proceedings, vol 2016. AIP Publishing LLC, p 020094 34. Trevarthen C (1998) Children with autism: diagnosis and interventions to meet their needs. Jessica Kingsley Publishers 35. Wheeler M, Wolf F, Kuber R (2013) Supporting augmented and alternative communication using a low-cost gestural device. In: Proceedings of the 15th international ACM SIGACCESS conference on computers and accessibility. pp 1–2
Chapter 4
Blockchain Implementations and Use Cases for Inhibiting COVID-19 Pandemic Amirul Azim and Muhammad Nazrul Islam
1 Introduction
The deadly spread of COVID-19 has obscured all other epidemics of coronavirus. The virus, which is highly contagious, has brought human life almost to a halt. Because of the rapid changes in the biology of SARS-CoV-2, the World Health Organization (WHO) appears more or less helpless to give an appropriate solution, and the WHO is currently committed to a single goal of maintaining universal health care standards [1]. As of 07 October 2021, COVID-19 has spread to 223 countries and territories, affecting more than 230 million people and claiming more than 4.8 million lives [2]. However, the development of a reliable and powerful vaccine is the only effective way to end the COVID-19 epidemic [3]. The WHO expressed that it did not expect a tested vaccine against SARS-CoV-2 in less than 18 months [4]. The reason behind the delay is that clinical trials and research are often a long process with rules and regulations that need to be observed [5]. A number of research initiatives have been taken to design, develop and deploy ICT or digital systems to fight the COVID-19 pandemic [6–9]. Again, to expedite the global research work to get a workable vaccine for COVID-19 as fast as possible, the WHO has taken a milestone initiative by facilitating collaboration to accelerate efforts, arranging vital communications across the research community and beyond [10]. However, transmission and sharing of research data over the internet need CIA (confidentiality, integrity, and availability), non-repudiation, and immutability. Research data is stored using various electronic
A. Azim (B) Department of Information and Communication Technology (ICT), Bangladesh University of Professionals (BUP), Mirpur Cantonment, Dhaka 1216, Bangladesh e-mail: [email protected] M. N. Islam Department of Computer Science and Engineering (CSE), Military Institute of Science and Technology (MIST), Mirpur Cantonment, Dhaka 1216, Bangladesh e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 M. S. Uddin et al. (eds.), Proceedings of International Joint Conference on Advances in Computational Intelligence, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-19-0332-8_4
media, data management systems, etc., which can be easily stolen, tampered with, or completely removed [11]. Again, third-party data storage is also unreliable because third-party credibility cannot be guaranteed. Unreliable and faulty misuse of data will lead to negative results. The Blockchain offers a distributed archive technology that performs as a collective database and keeps copies of verified data [12, 13]. In recent days, Blockchain technology has proved its potential to revolutionize the healthcare industry, particularly in the fields of data sharing for research and development, data management, data storage (cloud-based systems), electronic health records, clinical trials, pharmaceutical healthcare, healthcare IoT data security, artificial intelligence, etc. [12, 14]. A recent survey [15] proposed that Blockchain-based solutions for fighting the coronavirus would provide solutions for pandemic outbreak tracking, medical data sharing and medical supply chain tracking. As the novel Blockchain technology has enormous potential, there are several healthcare facilities that can be enhanced using Blockchain technology. Therefore, the objective of this research is to devise a set of possible features or use cases towards developing a Blockchain-based epidemic management system to store and analyze SARS-CoV-2 epidemic medical records, to track down infected individuals and to achieve the outcomes of clinical trials. This article is organized as follows: the succeeding section provides a glimpse of the systematic research methodology used to derive the Blockchain healthcare factors or use cases that relate to pandemic data management. Section 3 presents the suggested Blockchain-based healthcare use cases that would manage the COVID-19 pandemic data system. Section 4 concludes the article by discussing the results, including the future progress and improvement.
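As a conceptual illustration of the immutability and integrity properties relied on here, the Python sketch below chains pandemic data records by hashing each record together with the hash of the previous one, so any later tampering becomes detectable. It is only a sketch of the hash-chaining idea; the record fields are hypothetical and no particular Blockchain platform or consensus mechanism is implied.

    import hashlib
    import json

    def record_hash(record, prev_hash):
        # Hash the record content together with the previous block's hash
        payload = json.dumps(record, sort_keys=True) + prev_hash
        return hashlib.sha256(payload.encode()).hexdigest()

    def append_record(chain, record):
        prev_hash = chain[-1]["hash"] if chain else "0" * 64
        chain.append({"record": record, "prev_hash": prev_hash,
                      "hash": record_hash(record, prev_hash)})

    def verify_chain(chain):
        # Recompute every hash; a single altered record breaks all later links
        prev_hash = "0" * 64
        for block in chain:
            if block["prev_hash"] != prev_hash or block["hash"] != record_hash(block["record"], prev_hash):
                return False
            prev_hash = block["hash"]
        return True

    chain = []
    append_record(chain, {"patient": "anonymised-001", "test": "PCR", "result": "positive"})
    append_record(chain, {"patient": "anonymised-002", "test": "PCR", "result": "negative"})
    print(verify_chain(chain))  # True; altering any stored record makes this False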
2 Research Methodology
The research has adopted the systematic literature review (SLR) [11, 16] and systematic mapping [17] approaches to attain the research objectives. The use of the SLR approach facilitates an examination of the present trends in terms of technology, methods, and ideas that are active in developing Blockchain-based healthcare systems related to epidemic data sharing and management. The result of the systematic mapping is to classify and map out Blockchain features or use cases in relation to data sharing and management. These combined approaches help to understand the modern ideas for developing a Blockchain-based system for inhibiting the coronavirus. However, the focus of the review study was limited to the field of healthcare, considering the constructs of Blockchain, coronavirus, and Blockchain-based healthcare data management. An overview of the research methodology is presented in Fig. 1.
Fig. 1 Flow diagram of research methodology
2.1 Research Questions (RQ)
The review study aims to answer the following RQs:
RQ1: What kind of research has been done focusing on Blockchain against COVID-19?
RQ2: Which features or use cases have been considered to develop or enhance any of the solutions designed for Blockchain to fight against COVID-19?
2.2 Conduct Search - Data Sources
In this study, a systematic review of the literature was adopted to attain the purpose of the study. For selecting related articles, scholarly databases such as Google Scholar, IEEE Explorer, Springer Link, MDPI, the ACM Digital Library, and Science Direct were searched. Relevant articles were searched using a set of query thread(s). The search threads were created based on the research area and on the defined research questions; for example, "blockchain" OR "blockchain" AND "healthcare*" OR "system*" OR "use cases*" OR "factors*" OR "coronavirus*" OR "COVID-19*" OR "pandemics*". Digital library searches were conducted online in late 2020, and articles in English were used. The procedure used for searching [18] is summarized in Fig. 2. A summary of the results returned for each data search is presented in Table 1. Repeated articles were excluded by reviewing the title, abstract, and introduction; finally, 19 articles were selected for review. During the review of each article, a set of data items was extracted that included: research objective and Blockchain application domain to combat the COVID-19 epidemic or other coronavirus health issues.
Fig. 2 PRISMA diagram of search procedure
2.3 Selection of Studies - Exclude and Include
The selection process began with 4065 publications collected from the digital libraries. Depending on the conditions, a publication was included in the formal review or discarded. The selection process was divided into four steps:
Identification and Removal of Duplication: The search included studies from the year 2016 to 2020. The search results were structured based on the inclusion and exclusion process. To avoid reviewing duplicate articles, the articles that appeared more than once were removed.
Screening: By analyzing the title and abstract, the searched articles were screened and irrelevant records were excluded.
Table 1 Summary of search results
Databases | Number of articles | Articles suitable for detailed screening
Google Scholar | 5030 | 49
IEEE Explore | 1697 | 21
Springer | 219 | 23
MDPI | 10 | 9
ACM Digital Library | 61 | 10
Total | 7027 | 118
Eligibility: After subtracting the number of articles excluded in the screening procedure, the full-text articles assessed for eligibility were 118 publications (see Table 1). Full-text articles were excluded for the absence of Blockchain fighting against COVID-19 (n = 43), not being in the English language (n = 07), not being within the last 5 years (n = 11), and limited relevance (n = 08). Finally, 49 publications were selected (eligible) for review.
Included: The eligible 49 articles were examined in more detail in relation to the specific research questions. The remaining stored results had a positive impact on the healthcare sector involving Blockchain. Finally, 19 publications were included in the systematic review. The procedure was strict so that only pertinent and quality studies were selected.
2.4 Data Extraction
All (n = 19) the articles were read in detail to extract the relevant data items that are presented in Tables 2 and 3, while the derived use cases for sharing and management of COVID-19 pandemic data are illustrated in Table 4 and Fig. 3.
3 Synthesizing the Review Data The objectives of the reviewed studies were grouped into a total of four focused objectives. Similarly, eight use cases were found from the existing studies.
3.1 Focused Objectives The focused objectives of the reviewed studies are briefly discussed in this subsection.
Table 2 Study objectives of reviewed articles
Ref. | Study objectives
[14] | Healthcare data management capabilities with respect to secure and decentralize system
[19] | Provide trustless and transparent health data sharing system
[20] | COVID-19-blockchain-based data management system to track coronavirus and future epidemics
[21] | Health data record with its data security, validity and access control and secure medical supply-chain management
[22] | Hospital device tracking, patients' data sharing and clinical trials
[23] | Establishing trusted links for healthcare data sharing with interoperability and decentralized system
[24] | Provide secure medical and pharmaceutical supply-chain management system
[25] | Tracking system for the COVID-19, data collection and management
[26] | Blockchain based healthcare data sharing and storage management system
[27] | Blockchain based secure data sharing and research
[28] | Blockchain based EMR, remote patient monitoring health data research
[29] | Cross-organizational medical data sharing and access management
[30] | Blockchain based trusted medical data sharing for collaborative research, clinical trial and precision medicine
[31] | Blockchain based medical data sharing among medical big data custodians for research with secure access control
[32] | Blockchain technology for maintaining medical records and creating new blocks
[33] | Multi-organizational clinical trial to design and implement a blockchain based network system
[34] | Implementation of medical cooperation using blockchain technology to combat against COVID-19, share research results, protecting patients' privacy
[35] | Share authentic COVID-19 data and tracking of pandemic relevant information
[36] | Sharing COVID-19 diagnostic data of infectious patients and prevent the spreading of false information
Table 3 Objectives are grouped into four Focused Objectives (FO)
Focused Objectives (FO) | Ref. | Freq.
FO1: Data Sharing | [14, 19, 22–25, 27, 29–31, 33–36] | 14 (74%)
FO2: Data Storage | [20, 21, 24–27, 29, 32] | 8 (42%)
FO3: Exchange of clinical-trial | [21, 22, 27, 30, 31, 33] | 6 (32%)
FO4: Track of pandemic data | [20, 22, 28, 35] | 4 (21%)
Table 4 Explored use cases of Blockchain in healthcare system
Ser. | Features/Use Cases (UC) | Ref. | Freq.
1. | Secured Data Sharing | [14, 19–23, 25, 27, 31, 34–36] | 12
2. | Interoperability | [14, 21, 24, 25, 28, 31, 34] | 7
3. | Patients' Health Records | [14, 21–28, 31, 32] | 11
4. | Pandemic Data Storage | [14, 19, 21, 23, 28, 29, 34, 36] | 8
5. | Sharing SARS COV Clinical Trial | [19, 20, 22, 25, 26, 28, 30, 33] | 8
6. | Track of SARS COV Data | [20, 35] | 2
7. | Data Transparency | [14, 19, 20, 23, 34, 36] | 6
8. | Data Access Control | [14, 19, 21] | 3
Fig. 3 Derived use cases to develop Blockchain based system for COVID-19 pandemic data sharing and management
3.2 Use Cases
The eight use cases revealed in this study are briefly discussed here.
Secure Data Sharing: Secure sharing of pandemic data is of utmost importance, and failure of such sharing could have devastating consequences for research outcomes. Secure COVID-19 data sharing should satisfy the CIA triad of confidentiality, integrity, and availability.
Interoperability: Various health organizations and stakeholders have used different applications for storing COVID-19 pandemic data. COVID-19 data exchange architectures should provide interoperability so that different information systems, devices, and applications can access, interchange, assimilate and cooperatively use data in a coordinated manner within all applicable settings and with relevant organizations and stakeholders.
Patients' Health Record: Various healthcare organizations have used various systems and databases to keep Patients' Health Records (PHR). All PHR data split across multiple facilities needs to be integrated in an automated manner.
Pandemic Data Storage: Information is to be stored as a distributed ledger that maintains a single source of information accessible to various health organizations around the world.
Sharing SARS COV Clinical Trial: The data sharing system is to provide a commitment to collaboration and equal data access. This would help a researcher to avoid duplication of research work. It would also help to develop necessary vaccines and drugs in the minimum possible time. Data from the study could reveal important details about the symptoms and the development of the infection in different people.
Tracking SARS COV Data: The term SARS COV tracking refers to the collection and analysis of COVID-19 data related to one's health. This will track how quickly the virus spreads in different areas, identify vulnerable areas and who is most at risk, and help prevent cross-border spread through humans by implementing restrictions on movement. Such tracking would also help a researcher to understand why some people develop more severe symptoms while others have only minor symptoms.
Data Transparency: Data transparency is interrelated with interoperability. The COVID-19 pandemic data sharing system would provide data transparency, commitment to collaboration, openness, and equal data access. It would also provide sharing of information and knowledge at a level that a user within the system can easily understand.
Data Access Control: The system would provide user access control to ensure that only authorized persons or organizations have the authority to access the data storage.
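As a minimal illustration of the access-control idea described in the last use case, the Python sketch below gates reads of stored pandemic records behind a per-organization permission list. The organization names, record types and fields are hypothetical, and no specific Blockchain platform or smart-contract mechanism is implied.

    # Hypothetical permission list: which organizations may read which record types
    PERMISSIONS = {
        "who-research-unit": {"clinical_trial", "phr"},
        "border-authority": {"tracking"},
    }

    def can_access(org, record_type):
        # Only organizations explicitly granted the record type may read it
        return record_type in PERMISSIONS.get(org, set())

    def read_record(org, record):
        if not can_access(org, record["type"]):
            raise PermissionError(f"{org} is not authorized for {record['type']} records")
        return record["payload"]

    record = {"type": "clinical_trial", "payload": {"trial_id": "CT-001", "outcome": "pending"}}
    print(read_record("who-research-unit", record))   # permitted
    # read_record("border-authority", record) would raise PermissionError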
4 Mapping Between Focused Objectives and Use Cases

The focused objectives were mapped to the eight derived use cases as shown in Fig. 4. Mapping with Secure Data Sharing: A trusted COVID-19 data sharing system helps researchers obtain accurate and trusted outcomes. Sharing of trusted pandemic data would satisfy secure data sharing, interoperability, data transparency, and data access control. Mapping with Data Storage: Traditional pandemic data records are kept in isolated database storage systems and are sometimes paper-based. A Blockchain-based COVID-19 record system would provide better decision-making services and improve cooperation among healthcare organizations. It would address secure PHR, COVID-19 data storage, tracking of SARS CoV data, and data access control.
Fig. 4 Mapping the study objectives with the derived use cases
Mapping with Exchange of Clinical Trial: In an outbreak of a transmittable disease, it is important to study and analyze all available data to identify the root causes, prevent continued spillover, and obtain a better understanding of the disease's transmissibility over time and context. Such an objective maps to interoperability, sharing of SARS CoV clinical trials, and data transparency. This would help researchers avoid duplication of effort while improving the success of COVID-19 research outcomes. Mapping with Track of Pandemic Data: The history of medical information is often regarded as one of the many ways to improve medicine and medication. Therefore, tracking pandemic data requires patients' past medical records, such as COVID-19 data records or PHR. The system would also need to map to the tracking of SARS CoV data so that researchers can explore the information relevant to their research.
5 Conclusion

Blockchain technology ensures the accuracy and quality of data through its decentralized and immutable design, so it contributes significantly to high-quality and accurate epidemiological data transactions. Again, as the coronavirus pandemic is the main focus of the world's healthcare driving forces, the implementation of this novel technology would help to recover from the catastrophe. In this study, a total of 19 articles were meticulously reviewed and mapped between the focused objectives and use
cases. As outcomes, this review study has proposed eight use cases or factors that may contribute to developing a Blockchain-based system to combat the COVID-19 pandemic. The revealed factors or use cases are secure data sharing, interoperability, patients' health record, pandemic data storage, sharing SARS CoV clinical trials, tracking of SARS CoV data, data transparency, and data access control. The study also showed that existing major studies primarily focused on Data Sharing, Data Storage, Exchange of Clinical Trial, and Track of Pandemic Data. In the future, a Blockchain-based framework will be proposed that considers the revealed use cases for inhibiting the COVID-19 pandemic.
References 1. Reassessing sustainable governance models for the post-covid 19 world order. https:// moderndiplomacy.eu/2020/06/30/reassessing-sustainable-governance-models-for-the-postcovid-19-world-order/. Accessed 07 Oct 2021 2. Coronavirus disease (covid-19) pandemic. https://www.who.int/emergencies/diseases/novelcoronavirus-2019 . Accessed: 07 Oct 2021 3. Chamola V, Hassija V, Gupta V, Guizani M (2020) A comprehensive review of the COVID-19 pandemic and the role of IOT, drones, AI, blockchain, and 5G in managing its impact. IEEE Access 8:90225–90265 4. Grenfell R, Drew T (2020) Here’s why it’s taking so long to develop a vaccine for the new coronavirus. Science alert. Archived from the original on, 28, 2020 5. Hamid J, Stefan K, Arshad J, Gregory E, Al-Khateeb H (2019) Blockchain and clinical trial: securing patient data. Springer 6. Islam MN, Inan TT, Rafi S, Akter SS, Sarker IH, Najmul Islam AKM (2021) A systematic review on the use of AI and ML for fighting the COVID-19 pandemic. IEEE Trans Artif Intell 7. Islam MN, Najmul Islam AKM (2020) A systematic review of the digital interventions for fighting COVID-19: the Bangladesh perspective. IEEEE Access 8:114078–114087 8. Zaman A, Islam MN, Zaki T, Hossain MS (2020) ICT intervention in the containment of the pandemic spread of COVID-19: an exploratory study. arXiv preprint arXiv:2004.09888 9. Islam MN, Islam I, Munim KM, Najmul Islam AKM (2020) A review on the mobile applications developed for COVID-19: an exploratory analysis . IEEE Access 8:145601–145610 10. Update on who: solidarity trial - accelerating a safe and effective COVID-19 vaccine. https://www.who.int/emergencies/diseases/novel-coronavirus-2019/global-research-onnovel-coronavirus-2019-ncov/accelerating-a-safe-and-effective-covid-19-vaccine. Accessed 07 Oct 2021 11. Li H, Zhu L, Shen M, Gao F, Tao X, Liu S (2018) Blockchain-based data preservation system for medical data. J Med Syst 42(8):1–13 12. Sultana M, Hossain A, Laila F, Taher KA, Islam MN (2020) Towards developing a secure medical image sharing system based on zero trust principles and blockchain technology. BMC Med Inf Dec Making 20(1):1–10 13. Pereira SN, Tasnim N, Rizon RS, Islam MN (2021) Blockchain-based digital record-keeping in land administration system. In: Proceedings of international joint conference on advances in computational intelligence. Springer, pp 431–443 14. Khezr S, Moniruzzaman M, Yassine A, Benlamri R (2019) Blockchain technology in healthcare: a comprehensive review and directions for future research. Appl Sci 9(9):1736 15. Nguyen DC, Ding M, Pathirana PN, Seneviratne A (2021) Blockchain and AI-based solutions to combat coronavirus (COVID-19)-like epidemics: a survey. IEEE Access 9:95730–95753 16. Islam MN (2013) A systematic literature review of semiotics perception in user interfaces. J Syst Inf Technol
17. Petersen K, Feldt R, Mujtaba S, Mattsson M (2008) Systematic mapping studies in software engineering. In: 12th international conference on evaluation and assessment in software engineering (EASE) vol 12, pp 1–10 18. Moher D, Liberati A, Tetzlaff J, Altman DG, Prisma Group (2009) Preferred reporting items for systematic reviews and meta-analyses: the prisma statement. PLoS Med 6(7):e1000097 19. Kumar T, Ramani V, Ahmad I, Braeken A, Harjula E, Ylianttila M (2018) Blockchain utilization in healthcare: key requirements and challenges. In 2018 IEEE 20th international conference on e-health networking, applications and services (Healthcom). IEEE, pp 1–7 20. Azim A, Islam MN, Spranger PE (2020) Blockchain and novel coronavirus: towards preventing covid-19 and future pandemics. Iberoamerican J Med 2(3):215–218 21. Stagnaro C (2017) White paper: innovative blockchain uses in health care. Freed Associates 22. Bell L, Buchanan WJ, Cameron J, Lo O (2018) Applications of blockchain within healthcare. Blockchain in healthcare today 23. Zhang P, Schmidt DC , White J, Lenz G (2018) Blockchain technology use cases in healthcare. In: Advances Computers, vol 111. Elsevier, pp 1–41 24. Clauson KA, Breeden EA, Davidson C, Mackey TK (2018) Leveraging blockchain technology to enhance supply chain management in healthcare: an exploration of challenges and opportunities in the health supply chain. Blockchain in healthcare today 25. Marbouh D, Abbasi T, Maasmi F, Omar IA, Debe MS, Salah K, Jayaraman R, Ellahham S (2020) Blockchain for covid-19: review, opportunities, and a trusted tracking system. Arab J Sci Eng 1–17 26. Siyal AA, Junejo AZ, Zawish M, Ahmed K, Khalil A, Soursou G (2019) Applications of blockchain technology in medicine and healthcare: challenges and future perspectives. Cryptography 3(1):3 27. Yaqoob S, Khan MM, Talib R, Butt AD, Saleem S, Arif F, Nadeem A (2019) Use of blockchain in healthcare: A systematic. Int J Adv Comput Sci Appl 10:5 28. Agbo CC, Mahmoud QH, Eklund JM (2019) Blockchain technology in healthcare: a systematic review. In: Healthcare, vol 7. Multidisciplinary Digital Publishing Institute, p 56 29. Xiao Z, Li Z, Liu Y, Feng L, Zhang W, Lertwuthikarn T, Goh RSM (2018) Emrshare: a cross-organizational medical data sharing and management framework using permissioned blockchain. In 2018 IEEE 24th international conference on parallel and distributed systems (ICPADS). IEEE, pp 998–1003 30. Shae Z, Tsai JJP (2017) On the design of a blockchain platform for clinical trial and precision medicine. In 2017 IEEE 37th international conference on distributed computing systems (ICDCS). IEEE, pp 1972–1980 31. Xia QI, Sifah EB, Asamoah KO, Gao J, Du X, Guizani M (2017) Medshare: trust-less medical data sharing among cloud service providers via blockchain. IEEE Access 5:14757–14767 32. Eman-Yasser D, Yousef-Awwad D, Shyan-Ming Y (2019) Medchain: a design of blockchainbased system for medical records access and permissions management. IEEE Access 7:164595– 164613 33. Choudhury O, Sylla I, Fairoza N, Das A (2019) A blockchain framework for ensuring data quality in multi-organizational clinical trials. In 2019 IEEE international conference on healthcare informatics (ICHI). IEEE, pp 1–9 34. Resiere D, Resiere D, Kallel K (2020) Implementation of medical and scientific cooperation in the caribbean using blockchain technology in coronavirus (covid-19) pandemics. J Med Syst 44:1–2 35. Khatoon A (2020) Use of blockchain technology to curb novel coronavirus disease (covid-19) transmission. 
Available at SSRN 3584226 36. Chang MC, Park D (2020) How can blockchain help people in the event of pandemics such as the COVID-19? J Med Syst 44(5):1–2
Chapter 5
Ant Colony Optimization to Solve the Rescue Problem as a Vehicle Routing Problem with Hard Time Windows Mélanie Suppan , Thomas Hanne , and Rolf Dornberger
1 Introduction The classical Vehicle Routing Problem (VRP) aims to find the optimal route for a fleet of vehicles from a depot to a set of given destinations. This optimization problem has been extended in multiple ways by adding constraints like vehicle capacity and service time windows as well as dynamic integration of new customers. This allows its application to different real-world situations and, among them, to the routing of emergency vehicles in daily or disaster situations. Route optimization of emergency vehicles is important to ensure a timely and adequate response to various medical emergencies. In such scenarios, the challenge is to avoid a delayed arrival of rescue teams, which could lead to a worsening of medical conditions or even death. This rescue problem is usually modeled as a Capacitated Vehicle Routing Problem (CVRP) in which vehicles have a limited capacity that can never be exceeded. Depending on the chosen focus, different variables can be integrated in the model. In [1–5], the problem is set as a Multidepot Vehicle Routing Problem (MDVRP), where different hospitals can be regarded as depots. It is treated as a Vehicle Routing Problem with Time Windows (VRPTW) in [6]. In this case, penalties are given when the time windows are exceeded. A Dynamic Vehicle Routing
M. Suppan School of Life Sciences, University of Applied Sciences and Arts Northwestern Switzerland, Muttenz, Switzerland T. Hanne (B) Institute for Information Systems, University of Applied Sciences and Arts Northwestern Switzerland, Olten, Switzerland e-mail: [email protected] R. Dornberger Institute for Information Systems, University of Applied Sciences and Arts Northwestern Switzerland, Basel, Switzerland © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 M. S. Uddin et al. (eds.), Proceedings of International Joint Conference on Advances in Computational Intelligence, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-19-0332-8_5
Problem (DVRP) is described in [7], which includes adaptation of the route by integrating new demands. The objective is generally to find a set of routes at a minimal cost, usually corresponding to the shortest or fastest route as described in [1, 5] and [7]. In rescue scenarios, however, it is not uncommon to add another objective related to the evolution of the patient’s condition over time. In [2], this alternative objective is to minimize the number of patients whose medical condition will worsen or who will die. The same purpose is sought in [3], although the objective is inversely formulated. The minimization of human suffering is another example presented in [6]. Constraints specific to the rescue problem such as level of injury and available time before worsening of medical conditions are not always considered in the literature. When taken into consideration, they are used in different ways and for different purposes. They are nevertheless important to ensure optimal care and to determine an ideal fleet size to limit avoidable casualties in rescue scenarios. Determining the optimal solution for each type of VRP is NP-hard. In other words, the time required to compute the solution may exponentially increase with the size of the problem. For this reason, different computational intelligence methods have been used to optimize this problem. Ant Colony Optimization (ACO) is often used either alone [7] or in combination with a Local Search (LS) algorithm [1, 6]. In [4], a Large Neighborhood Search alone is described. The traditional Genetic Algorithm (GA) [5] or its extensions [2, 3], as well as an adaptation of the Particle Swarm Optimization (PSO) algorithm [2], represent other options. This paper focuses on the ACO algorithm to solve the rescue problem modeled as a single-objective VRPTW with hard time windows. The aim is to find the best route to rescue people in a daily or disaster situation while considering their level of injury for transport prioritization and route adaptation. Hard time windows allow to solve the problem as a single-objective one, ensuring that all patients are being taken care of in a reasonable amount of time, which has scarcely been treated in the literature. The paper is organized as follows. Section 2 describes our formulation of the rescue problem. Section 3 presents ACO as the proposed optimization method. The used test cases to model the problem are described and the results are given in Sect. 4. Finally, Sect. 5 concludes the paper and proposes possible improvements.
2 Problem Definition and Modeling

The rescue problem can be modeled by a graph G = (V, E), where the set of nodes V = {0, . . . , n} represents the hospital (0) and the patient locations (1, . . . , n), while the set of edges E represents all connections between the nodes. The goal is to find a set of routes at minimal cost, corresponding to the shortest or fastest route, for a fleet of ambulances K with limited capacity so that all nodes are visited only once and by only one ambulance. The rescue problem is treated here as a single-objective single-depot VRPTW. One common way of stating such a problem formally is sketched below.
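The following compact MILP-style formulation is a generic sketch of a single-depot VRPTW of this kind, not the exact model used in this chapter; the symbols introduced here are assumptions for illustration: $c_{ij}$ (travel cost of edge $(i,j)$), $x_{ijk}$ (1 if ambulance $k$ traverses edge $(i,j)$), $s_i$ (service start time at node $i$), $t_{ij}$ (travel time), $Q$ (ambulance capacity), $M$ (a large constant), and $[a_i, b_i]$ (the hard time window of patient $i$).

\begin{align*}
\min \; & \sum_{k \in K} \sum_{(i,j) \in E} c_{ij}\, x_{ijk} \\
\text{s.t.}\; & \sum_{k \in K} \sum_{j \in V} x_{ijk} = 1 \quad \forall i \in V \setminus \{0\} && \text{(each patient is visited exactly once)} \\
& \sum_{i \in V \setminus \{0\}} \sum_{j \in V} x_{ijk} \le Q \quad \forall k \in K && \text{(unit demand per patient, capacity } Q\text{)} \\
& \sum_{j \in V} x_{0jk} = \sum_{i \in V} x_{i0k} = 1 \quad \forall k \in K && \text{(each ambulance starts and ends at the hospital)} \\
& s_i + t_{ij} - M\,(1 - x_{ijk}) \le s_j \quad \forall (i,j) \in E,\; \forall k \in K && \text{(time propagation along a route)} \\
& a_i \le s_i \le b_i \quad \forall i \in V, \qquad x_{ijk} \in \{0, 1\}
\end{align*}

Flow-conservation constraints (an ambulance that enters a patient node must also leave it) and subtour handling are omitted here for brevity; the hard time windows $[a_i, b_i]$ correspond to the injury-level deadlines described below.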
2.1 Input Variables The set of ambulances K is fixed for all formulations of the problem. The set of patients, however, will be changed to increase the size of the problem (small-size: 20, medium-size: 50). Each location on the map corresponds to only one patient and the demand at each location is therefore equal to 1. Each patient is assigned an injury level, which is divided into three categories. Each injury level is assigned a corresponding maximum time window: 1: less urgent: deadline 20. Increasing the number of leader agents seems to cancel this effect, which is positive, but at the same time the accuracy of the solution seems to decrease. Discussion: The addition of opinion leaders (Fig. 12) does not cause a breakdown of the algorithm’s performance; however, an improvement is not visible either. Neither when it comes to the accuracy of the swarm’s decision nor to the efficiency of finding a consensus. Depending on the number of leader agents within a swarm, either the value of the relative deviation to the optimal solution or the number of iterations is affected rather negatively than positively. Evaluating the results leads us to the assumption that a swarm performs best when the “regular” agents are on their own and not under the influence of one or multiple opinion leaders.
Fig. 11 Leader noise exploration II (1 leader)
Fig. 12 Leader noise exploration III (5 leaders)
5 Conclusions and Outlook Although all proposed model modifications and the related experiments cannot achieve significantly better results in terms of reduced relative deviation or reduced number of iterations, another aspect is observed: With increasing dimensionality (number of agents or number of leaders), the bias is reduced, and the variance is enhanced. In modelling, dimensionality aspects play an important role. Low complexity and accuracy may allow precise estimates and stable, reproducible
models. As such models produce few new associations, model creators attempt to decrease the precision of estimates and the model stability by adding to the variance. The benefit of models with a high variance is a reduction of the bias as more elements are considered. This also makes it possible to discover new or previously unknown associations between the elements’ dimensions [13]. This additional new feature allowed us to conduct several experiments and test different scenarios. We can show that our model can converge with leaders to the optimum in most case, while adding too many rogue agents [2] makes a good decision impossible. In our model, we have used only two roles – leaders and “regular” agents, while Chen et al. [9] propose three different classes of roles (e.g., three types of wolves) in their approach. Adding additional roles to the proposed opinion leader role could prove to be the necessary component to achieve the desired improvements in the results. In this paper, we were able to change the dynamics of the model by implementing relative weights to specific opinions, resulting in a modified behavior of the discrete democratic swarm (opinion leader effect). Our general conclusion is that our adapted approach makes it possible to produce multiple comparative results that indicate the following aspects: Using opinion leaders to find the best out of n possible solutions has a beneficial impact in certain very specific situations (see S1 and S4). The general perception is, however, that the presence of opinion leaders rather complicates the process of achieving the best possible solution. Furthermore, the implementation of these new features in the existing best-of-n decision-making model allows to conduct further experiments by parameter tuning. During this research work, several opportunities for further investigation have been identified. As mentioned in [7, 12], most research focuses on a static environment assuming that both the search environment and the option qualities do not change over time. Such a setting is unlikely in the real world, e.g., due to perturbations [12]. To simulate a dynamic setting, [1, 2] introduce stubborn agents into their models. In [12], the function of spontaneous opinion switching to simulate a further element of stochasticity is also used. Setting up a model using these features provides an interesting approach for further investigations. [1, 2] apply the opinion manipulation mainly to test the robustness of an algorithm. Our focus is to investigate a potential impact on accuracy and efficiency. [9] proposes a related investigation within their work. They make use of more than two types of role assignments and provide improving results within the investigation of the democratic behavior of a swarm. Adding a strengthened hierarchical level opens another possibility for research. The proposed problem model of Hügli and Pereira [2] has proven to be adaptable for further features. Another interesting approach is to investigate the change in the majority threshold or the change in the voting mechanism (as discussed in Sect. 2.1). The presented graphs also indicate outliers. The detection and investigation of these outliers must also be explored to show the highest statistical relevance.
References 1. Pochon Y, Dornberger R, Zhong VJ, Korkut S (2018) Investigating the democracy behavior of swarm robots in the case of a best-of-n selection. In: 2018 IEEE symposium series on computational intelligence SSCI 2. Hügli A, Pereira M (2020) Hyperparameter analysis of the democratic behavior of swarm robots. University of Applied Sciences and Arts Northwestern Switzerland FHNW 3. Ebert JT, Gauci M, Nagpal R (2018) Multi-feature collective decision making in robot swarms. In: 2018 Proceedings of the 17th international conference on autonomous agents and multiagent systems (AAMAS 2018), Stockholm, Sweden, 10–15 July 2018 4. Petrenko VI, Tebueva FB, Ryabtsev SS, Gurchinsky MM, Struchkov IV (2020) Consensus achievement method for a robotic swarm about the most frequently feature of an environment. IOP Conf Ser Mater Sci Eng 919(4):042025. https://doi.org/10.1088/1757-899X/919/4/042025 5. Valentini G (2019) How robots in a large group make decisions as a whole? From biological inspiration to the design of distributed algorithms. In: 2019 School of earth and space exploration, school of life sciences, Arizona State University, Tempe. arXiv:1910.11262v2 (2019) 6. Maître G, Tuci E, Ferrante E (2020) Opinion dissemination in a swarm of simulated robots with stubborn agents: a comparative study. In: 2020 IEEE congress on evolutionary computation (CEC). IEEE, pp 1–6 7. Valentini G, Ferrante E, Dorigo M (2017) The best-of-n problem in robot swarms: formalization, state of the art, and novel perspectives. Front Robot AI 4:1–9 8. Ramsey M et al (2020) The prediction of swarming in honeybee colonies using vibrational spectra. Sci Rep 10(1):1–17 9. Chen X, Zhang Y, Li K, Huang B (2019) Path planning of mobile robot based on improved wolf swarm algorithms. In: 2019 IEEE 8th joint international information technology and artificial intelligence conference (ITAIC 2019) 10. Ebert JT, Gauci M, Mallmann-Trenn F, Nagpal R (2020) Bayes bots: collective Bayesian decision-making in decentralized robot swarms. In: 2020 IEEE international conference on robotics and automation (ICRA), Paris, France 11. Phung N, Kubo M, Sato H (2020) Bias and raising threshold algorithm using learning agents for the best proportion-searching problem. In: The 3rd international conference on intelligent autonomous systems, IEEE Xplore 12. Prasetyo J, De Masi G, Ferrante E (2019) Collective decision making in dynamic environments. Swarm Intell 13(3):217–243 13. Alyass A, Turcotte M, Meyre D (2015) From big data analysis to personalized medicine for all: challenges and opportunities. BMC Med Genomics 8(1):1–12
Chapter 7
Pathfinding in the Paparazzi Problem Comparing Different Distance Measures Kevin Schär, Philippe Schwank, Rolf Dornberger, and Thomas Hanne
1 Introduction Pathfinding algorithms have long been used in various fields, such as route planning for robots in warehouses or controlling non-player characters in computer games [1, 2]. Different algorithms are used for pathfinding in different scenarios. However, the widely used and investigated A* algorithm is often applied to find the shortest path related to the length of the path between two defined nodes in a static scenario. In reality, these scenarios often consist of different types of obstacles and terrain structures that influence the movement and the path found [3]. Thus, the shortest path does not always correspond to the optimal (e.g., fastest) path. Therefore, the optimization must take such criteria into account. The A* algorithm can use various functions to evaluate the path, such as the Euclidean, Manhattan or the Chebyshev distance measures [4]. This paper reviews A* pathfinding performance efficiency using various distance measures in scenarios of different sizes and complexity by including different terrain structures and obstacles. Further, it investigates the performance using these distance measures by including four or eight neighbouring nodes that affect the possible movement.
K. Schär · P. Schwank Institute for Medical Engineering and Medical Informatics, School of Life Sciences, FHNW, Muttenz, Switzerland R. Dornberger Institute for Information Systems, University of Applied Sciences and Arts Northwestern Switzerland, Basel, Switzerland T. Hanne (B) Institute for Information Systems, University of Applied Sciences and Arts Northwestern Switzerland, Olten, Switzerland e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 M. S. Uddin et al. (eds.), Proceedings of International Joint Conference on Advances in Computational Intelligence, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-19-0332-8_7
2 Problem Statement

The paper examines different distance measures within the A* algorithm on maps of the sizes 25 × 25, 50 × 50, 100 × 100 and 200 × 200 nodes (see Fig. 1 as an example). The allowed neighbouring nodes for movements are limited to four neighbours in the first experiment (only horizontal and vertical movements are allowed). The subsequent experiment allows eight neighbours (additional diagonal movements are possible). The scenario is a Paparazzi problem [5]. It consists of different terrain structures and is a weighted pathfinding problem. The fictional goal is to have a photo of a celebrity taken by a paparazzo in the shortest possible time. The paparazzo has to enter a celebrity's estate and sneak as efficiently as possible to the known location of the celebrity to take the photo. Security cameras installed on the property monitor specific areas. The paparazzo should avoid these areas, as a discovery would prolong the mission of the photoshoot. There are also various obstacles and terrain structures on the property, such as trees and bushes, pools and ponds, and stone gardens. Due to their nature, the trees and bushes cover the paparazzo in front of the cameras. They can be crossed at a very high cost. Crossing the pool and stone gardens provides a direct route. However, the paparazzo needs additional time for this, as no visual protection is offered, and it requires increased effort.

Fig. 1 Example of a map with an extension of 100 × 100, which depicts the variety of obstacles. The following colour scheme is used: green – grass, trees, and bushes; blue – water; beige – stone garden; grey – different types of roads; red – camera; yellow – start node; orange – end node (both located in the lower-left part of the matrix)
Table 1 Field types of the map and their costs

| Field type    | Colour in the matrix | Cost of the node   |
|---------------|----------------------|--------------------|
| Wall          | Purple               | Infinity           |
| Start node    | Yellow               | –                  |
| End node      | Orange               | –                  |
| Path found    | Pink                 | –                  |
| Main road     | Dark grey            | 5                  |
| Side road     | Light grey           | 7                  |
| Grass         | Light green          | 9                  |
| Trees/Bushes  | Dark green           | 100                |
| Shallow water | Light blue           | 25                 |
| Deep water    | Dark blue            | 35                 |
| Stone garden  | Beige                | 15                 |
| Building      | White                | 1                  |
| Camera        | Red                  | 10, 20, 30, 40, 50 |
The start node of the paparazzo is marked by a yellow field. The end node, i.e., the location of the celebrity, is marked by an orange field. The matrix is supplemented by the described obstacles that influence the movement of the paparazzo. Red fields mark security cameras. Blue and beige fields represent the pool and ponds or the stone gardens on the estate. Grey fields mark different types of roads. Green fields indicate grassy areas and the location of a tree or bush. The saturation of the colours indicates how much the movement of the paparazzo is affected. A high saturation represents a strong slowdown, whereas a low saturation represents a moderate slowdown. Exact details of the costs can be found in Table 1. In principle, the weights are based on the information in [5] and specify the cost to travel through a particular node or cell of the grid. In addition to [5], further obstacles and their costs have been introduced to enlarge the problem.
3 Background Information

3.1 A* Algorithm

The A* algorithm (see Fig. 2) is a heuristic search algorithm based on Dijkstra's algorithm and the Breadth-First Search algorithm. It was introduced in 1968 by Peter E. Hart, Nils J. Nilsson, and Bertram Raphael [6]. The A* algorithm is assumed to find the solution by visiting the fewest nodes. The basic idea of A* is to expand outwards from a defined start node S in a matrix, calculate the cost value of each neighbouring node and select the node with the minimum cost value as the next traversal node. The expansion process is repeated until the defined end node E is reached.
a_star(start_node, end_node):
    OPEN = empty list; CLOSED = empty list;
    add start_node to OPEN;
    while (OPEN is not empty):
        current_node = find lowest cost node in OPEN;
        if (current_node == end_node):
            return solution = true;
        remove current_node from OPEN;
        add current_node to CLOSED;
        for each neighbour_node of current_node:
            if (neighbour_node == obstacle OR neighbour_node is in CLOSED):
                continue;
            // distance calculates cost of two given nodes
            new_cost = current_node_cost + distance(neighbour_node, current_node);
            // cost currently stored for the neighbour node
            neighbour_node_cost = cost of neighbour_node;
            if (new_cost < neighbour_node_cost OR neighbour_node is not in OPEN):
                // distance calculates cost of two given nodes
                new_cost = new_cost + distance(neighbour_node, end_node);
                neighbour_node_cost = new_cost;
                // parent gets parent_node of given node
                parent(neighbour_node) = current_node;
                if (neighbour_node is not in OPEN):
                    add neighbour_node to OPEN;
Fig. 2 Pseudo code of the A* algorithm
The cost function of the A* algorithm is determined by (1):

f(n) = g(n) + h(n)    (1)
For each node n, the total cost f (n) is given as the sum of the costs g(n) and h(n). g(n) represents the actual path cost from the start node S to the current node n. h(n) is a heuristic factor that estimates the cost of moving from the current node n to the end node E. For the quality of the A* formulation, it is important to choose a proper heuristic function h(n) and suitable distance measures.
The A* algorithm implements two different sets (respectively lists), the closed set and the open set. The closed set consists of nodes that have already been visited and expanded. Successors have already been explored and included in the open list if this was the case. This list is empty at the beginning of the pathfinding process. The open set consists of nodes that have been visited but not expanded, meaning that successors have not been explored yet. This list contains pending tasks. The start node S is initially the only node within the open set [3, 6].
3.2 Extended A* Algorithm

In the problem presented, various obstacles and terrain structures slow down the speed of a paparazzo. Therefore, these weighted nodes have to be considered to evaluate the cost f(n) of each node n. The new variable e(n) extends the traditional A* algorithm by taking these additional costs into account. The cost e(n) is added to the cost g(n). If there are equal costs f(n) for possible neighbouring nodes, the costs h(n) are added when deciding on the choice of the next node. This rule is adopted by adding the cost e(n) to the cost h(n) for the current problem [5]. Thus, the cost of moving to a particular node in the map can be determined by (2):

f(n) = g(n) + e(n) + h(n)    (2)
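For readers who want to experiment with the extended cost function, the following self-contained Python sketch implements weighted A* on a small grid. It is not the implementation evaluated in this chapter (which, as described in Sect. 4, builds on cell objects and extends the code of [13]); the grid encoding, the step cost of 1 per move, the toy node weights, and the use of the Manhattan heuristic are illustrative assumptions.

import heapq

def a_star_weighted(weights, start, end):
    """A* on a grid where weights[r][c] is the terrain cost e(n) of entering a cell.
    g(n) accumulates step cost plus terrain cost; h(n) is the Manhattan distance."""
    rows, cols = len(weights), len(weights[0])
    h = lambda n: abs(n[0] - end[0]) + abs(n[1] - end[1])
    open_heap = [(h(start), 0, start)]          # entries are (f, g, node)
    g_cost = {start: 0}
    parent = {start: None}
    closed = set()
    while open_heap:
        _, g, node = heapq.heappop(open_heap)
        if node == end:                          # reconstruct the path found
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        if node in closed:
            continue
        closed.add(node)
        r, c = node
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):   # four neighbours
            nb = (r + dr, c + dc)
            if not (0 <= nb[0] < rows and 0 <= nb[1] < cols) or nb in closed:
                continue
            new_g = g + 1 + weights[nb[0]][nb[1]]            # g(n) + e(n)
            if new_g < g_cost.get(nb, float("inf")):
                g_cost[nb] = new_g
                parent[nb] = node
                heapq.heappush(open_heap, (new_g + h(nb), new_g, nb))
    return None

# Toy map: 0 = road, 9 = grass, 100 = trees/bushes (costs only illustrative).
grid = [[0, 0, 100, 0],
        [9, 0, 100, 0],
        [9, 0,   0, 0],
        [9, 9,   9, 0]]
print(a_star_weighted(grid, (0, 0), (3, 3)))

The sketch keeps the heuristic separate from the accumulated terrain cost, so swapping in a different distance measure only changes the lambda h.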
3.3 Distance Functions

The distance function in the A* algorithm is used to measure the distance between nodes, so the choice of heuristic function in map-based pathfinding determines the complexity of the algorithm [4, 7]. Based on previous research, various distance functions that are commonly used in (2) for A* pathfinding are introduced below.
1) None
The ‘None heuristic’ is based on (2) but simplifies it by ignoring the cost h(n). Therefore, it is an approximation of the Dijkstra pathfinding algorithm [7–9].
2) Minkowski Distance
The Minkowski distance calculates the distance between two map nodes. It is a generalisation of the Euclidean and Manhattan distance and adds a parameter p that allows different distance measures to be calculated. The Minkowski distance is calculated as follows [10]:

( Σ_{i=1}^{n} |X_i − Y_i|^p )^{1/p}    (3)
When p is set to 1, the calculation is the same as the Manhattan distance. When p is set to 2, it is the same as the Euclidean distance. Intermediate values provide a controlled balance between the two measures.
3) Manhattan Distance
The standard heuristic for a square map is the Manhattan distance (see Fig. 3). As a distance measurement function, the Manhattan distance is a simple summation of the absolute horizontal and vertical distance between two map nodes [9, 11, 12].
4) Euclidean Distance
The Euclidean distance between two nodes in the Euclidean space is the length of a line segment between two nodes as a distance measurement function (see Fig. 4). Thus, it can be calculated by using the Pythagorean theorem [9, 11, 12].
5) Chebyshev Distance
The Chebyshev distance (see Fig. 5) is a variant of the diagonal distance. It assumes that the cost needed to move diagonally is equal to the cost needed to move vertically or horizontally. The diagonal distance works under the assumption that movements with eight neighbouring nodes are possible, i.e., horizontal, vertical, and diagonal movements are allowed. The diagonal movement is possible if a vertical and horizontal movement could also be performed simultaneously. If only one horizontal or vertical movement is possible, the next movement will be towards the target node based on the remaining possibility [9, 11, 12].

Fig. 3 Pseudo code of the Manhattan distance
manhattan_distance(current_node, end_node):
    // abs returns the absolute value of a given number
    distance_x = abs(current_node.x - end_node.x);
    distance_y = abs(current_node.y - end_node.y);
    return (distance_x + distance_y);
Fig. 4 Pseudo code of the Euclidean distance
euclidean_distance(current_node, end_node):
    // abs returns the absolute value of a given number
    distance_x = abs(current_node.x - end_node.x);
    distance_y = abs(current_node.y - end_node.y);
    // sqrt returns the square root of a given number
    return sqrt(distance_x * distance_x + distance_y * distance_y);
Fig. 5 Pseudo code of the Chebyshev distance
chebyshev_distance(current_node, end_node):
    // abs returns the absolute value of a given number
    distance_x = abs(current_node.x - end_node.x);
    distance_y = abs(current_node.y - end_node.y);
    if (distance_y > distance_x):
        return distance_y;
    return distance_x;
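Unlike the Manhattan, Euclidean and Chebyshev measures, no pseudo code is given above for the Minkowski distance of (3). The short Python sketch below is added for completeness and is an illustration only; the function name and the coordinate-pair interface are assumptions, not part of the evaluated code.

def minkowski_distance(current_node, end_node, p=2.0):
    """Minkowski distance of (3): p = 1 gives Manhattan, p = 2 gives Euclidean."""
    distance_x = abs(current_node[0] - end_node[0])
    distance_y = abs(current_node[1] - end_node[1])
    return (distance_x ** p + distance_y ** p) ** (1.0 / p)

# A value of p between 1 and 2 balances the Manhattan and Euclidean measures.
print(minkowski_distance((0, 0), (3, 4), p=1))   # 7.0 (Manhattan)
print(minkowski_distance((0, 0), (3, 4), p=2))   # 5.0 (Euclidean)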
3.4 Admissibility of Heuristics In pathfinding algorithms, a heuristic function is admissible if it never overestimates the cost of reaching the end node. The costs it estimates to reach the end node must not be higher than the lowest possible costs from the current node of the path. Underestimation of real costs is admissible. However, accurate cost estimates should be sought [4, 6, 7]. Figure 6 shows the path possibilities to reach the orange end node from the yellow start node using the described heuristics. The blue path corresponds to the cost calculation of the Manhattan heuristic. The green path corresponds to the cost calculation of the Chebyshev and Euclidean heuristics. The real cost when using eight neighbouring nodes is 5, and when using four neighbouring nodes 10. The cost for the Manhattan heuristic is 10, for Euclidean 7.07 and for Chebyshev 5. According to the definition, the Manhattan and Euclidean heuristics are not admissible for eight neighbouring nodes. All heuristics are admissible for four neighbouring nodes, but Euclidean and Chebyshev underestimate the real costs.
Fig. 6 A visualization of the two possible cost calculations for the Manhattan and Euclidean distances. Blue illustrates the way corresponding to the Manhattan distance (sum of vertical and horizontal distance). Green illustrates the direct diagonal way that corresponds to the Euclidean distance. The Chebyshev distance (not shown) is only the maximum of the vertical and horizontal distance
4 Implementation and Testing

4.1 System Specifications

The project is run and evaluated on a personal computer with the following specifications:

• Model: Dell XPS 15 7590
• Operating System: Windows 10 Pro × 64
• Processor: Intel Core i7-9750H CPU @ 2.60 GHz
• Memory: 16 GB DDR4, 2 * 8 GB, 2666 MHz
• Graphics Card: Intel UHD Graphics 630 & NVIDIA GeForce GTX 1650
4.2 Code Structure

The A* pathfinding algorithm is written and implemented in Python 3.8. The evaluation is done by using Jupyter Notebooks. Each node in the map is initialised as a cell object. This object consists of various attributes such as the weight of the node, its cost f(n), the previous node visited and all its neighbouring nodes, depending on whether a diagonal movement is allowed or not. The A* algorithm takes the cell objects of the start and end node, and the chosen heuristic function as input. It returns the path found and various other parameters determined during runtime. The source code for the A* algorithm is taken from [13]. However, it is extended to the needs of the project, particularly with the additional costs of different cells of the Paparazzi problem. For each map, the particular function is run 20 times, both for four and eight neighbouring nodes. For the results, we report the average values from these runs. Only the run time varies between runs (i.e., path length, path cost and number of iterations are always identical), and we provide the standard deviation for the run time. The run time is only determined for the pathfinding itself and not for the map initialisation.
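A minimal sketch of the measurement loop described above is shown below for illustration; the callables initialise_map and a_star and their signatures are assumptions introduced here, not the actual API of the evaluated code.

import statistics
import time

def benchmark(initialise_map, a_star, heuristic, runs=20):
    """Run the pathfinding 'runs' times and report the mean and standard deviation
    of the run time; the map initialisation is deliberately excluded from timing."""
    grid, start, end = initialise_map()
    times_ms = []
    path = None
    for _ in range(runs):
        t0 = time.perf_counter()
        path = a_star(start, end, heuristic)
        times_ms.append((time.perf_counter() - t0) * 1000.0)
    return {
        "path_length": len(path) if path else None,
        "mean_ms": statistics.mean(times_ms),
        "std_ms": statistics.stdev(times_ms),
    }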
4.3 Parameters of the Evaluation The evaluation of the different distance functions uses the path length, path cost, number of iterations and run time. The total cost of the path found of the ‘None heuristic’ is not included. Due to the ‘None heuristic’ definition, h(n) is dropped in (2). This means that the ‘None heuristic’ always has the lowest costs. Therefore, there is no meaningful comparison with the other heuristics that consider all partial costs in calculating f(n). The number
of iterations the algorithm uses with each heuristic is also used for the analysis, as it is independent of computational characteristics such as software and hardware. Finally, the run time to find the path from the start node S position to the end node E is considered another evaluation criterion. In many real-world applications, the run time is considered more important than the path length [2].
5 Results

Tables 2 and 3 show the results of the different heuristics used for the different map sizes. More specifically, they show the length of the path found with the associated total cost and the number of iterations and time needed for finding this path. Table 2 shows the results of pathfinding when using four neighbouring nodes, and Table 3, in contrast, shows the results for eight neighbouring nodes. For each map size, the smallest value with respect to the four attributes path length, path cost, number of iterations and run time is marked in bold. Table 2 shows that the ‘None heuristic’ always finds a path at least as short as the other heuristics; in two cases it finds a shorter path. The Manhattan heuristic finds a path with the lowest number of iterations in three out of four cases. In two out of four cases, the Manhattan heuristic also needs the shortest time in this respect. In the largest map, the Euclidean heuristic needs about 10% fewer iterations and the shortest time to find the path. The Chebyshev heuristic always has the lowest path costs. In contrast, the Manhattan heuristic always has the highest path costs. This is not surprising as single Chebyshev distance measures are usually smaller than Manhattan distance measures. Table 3 shows that the ‘None heuristic’ finds a shorter path than the other heuristics in three out of four cases. In the two smaller maps, the Manhattan heuristic finds the path with the lowest number of iterations. In contrast, the Euclidean heuristic finds the path with the smallest number of iterations in the two larger maps. However, the shortest time does not correspond to the lowest number of iterations, except in the map 50 × 50. Similar to Table 2, the Chebyshev heuristic always has the lowest path costs and the Manhattan heuristic has the highest. Only in the largest map does the Euclidean heuristic have higher path costs. Figure 7 shows an example of a map with a size of 50 × 50 after pathfinding. The depicted path is determined when using eight neighbouring nodes and the Chebyshev heuristic for distance calculation. The algorithm strives for a path that runs diagonally from the start node at the top left to the end node at the bottom right if the environment allows it. Figure 8 shows an example of a map with an extension of 50 × 50 after performing the pathfinding. The depicted path is determined when using the Manhattan distance function and four neighbouring nodes. In contrast to Fig. 7, the path does not follow a diagonal pattern, but shows an L-shape.
Table 2 Results of the different heuristics when using four neighbouring nodes

| Map size | Heuristic | Path length | Path cost | Number of iterations | Run time [ms] |
|----------|-----------|-------------|-----------|----------------------|------------------|
| 25       | None      | 102         | -         | 626                  | 3.7 ± 0.6        |
| 25       | Chebyshev | 110         | 8457      | 642                  | 4.5 ± 0.5        |
| 25       | Euclidean | 110         | 8607      | 642                  | 4.6 ± 0.5        |
| 25       | Manhattan | 110         | 8992      | 595                  | 4.0 ± 0.4        |
| 50       | None      | 98          | -         | 1751                 | 40.1 ± 4.3       |
| 50       | Chebyshev | 98          | 8234      | 1532                 | 33.0 ± 2.2       |
| 50       | Euclidean | 98          | 8918      | 1546                 | 38.2 ± 4.6       |
| 50       | Manhattan | 98          | 9996      | 701                  | 10.0 ± 1.3       |
| 100      | None      | 258         | -         | 8214                 | 585.7 ± 36.7     |
| 100      | Chebyshev | 260         | 45,326    | 5927                 | 320.7 ± 20.6     |
| 100      | Euclidean | 260         | 46,368    | 5222                 | 265.6 ± 13.4     |
| 100      | Manhattan | 260         | 48,933    | 4703                 | 168.8 ± 7.5      |
| 200      | None      | 733         | -         | 53,809               | 5795.8 ± 150.4   |
| 200      | Chebyshev | 733         | 363,921   | 52,119               | 5696.6 ± 192.6   |
| 200      | Euclidean | 733         | 371,763   | 46,396               | 5206.5 ± 175.9   |
| 200      | Manhattan | 733         | 393,870   | 51,690               | 6835.0 ± 164.2   |
6 Discussion

Table 2 shows that the ‘None heuristic’ always provides the shortest path when using four neighbouring nodes, as it reduces A* to the exact Dijkstra algorithm. However, this is generally associated with a higher number of iterations and calculation time than the other heuristics. If the path length is the decisive criterion of an application, the results show that the ‘None heuristic’ or the Dijkstra algorithm should be chosen. Besides the path length, time is often the essential evaluation factor. Based on Table 2, the Manhattan heuristic generally requires the smallest number of iterations and run time in the smaller maps (25 × 25, 50 × 50 and 100 × 100). However, the Euclidean heuristic shows a reduction in the number of iterations by around 10% in the largest map, and the run time is reduced by around 24%.
Table 3 Results of the different heuristics when using eight neighbouring nodes

| Map size | Heuristic | Path length | Path cost | Number of iterations | Run time [ms]       |
|----------|-----------|-------------|-----------|----------------------|---------------------|
| 25       | None      | 76          | -         | 864                  | 6.4 ± 0.5           |
| 25       | Chebyshev | 81          | 5025      | 705                  | 5.8 ± 0.4           |
| 25       | Euclidean | 81          | 5144      | 674                  | 5.5 ± 0.5           |
| 25       | Manhattan | 79          | 5229      | 667                  | 5.8 ± 0.4           |
| 50       | None      | 62          | -         | 2274                 | 64.5 ± 7.0          |
| 50       | Chebyshev | 64          | 4391      | 1455                 | 33.5 ± 1.5          |
| 50       | Euclidean | 81          | 6714      | 894                  | 19.3 ± 0.5          |
| 50       | Manhattan | 81          | 7469      | 706                  | 12.8 ± 1.7          |
| 100      | None      | 203         | -         | 12,005               | 1525.3 ± 146.9      |
| 100      | Chebyshev | 205         | 30,171    | 5035                 | 268.7 ± 28.1        |
| 100      | Euclidean | 205         | 30,991    | 3686                 | 197.9 ± 21.9        |
| 100      | Manhattan | 206         | 33,318    | 4135                 | 195.1 ± 16.8        |
| 200      | None      | 636         | -         | 83,568               | 19,002.9 ± 2263.6   |
| 200      | Chebyshev | 638         | 284,461   | 79,415               | 16,271.8 ± 211.1    |
| 200      | Euclidean | 638         | 292,876   | 79,080               | 18,789.0 ± 1508.3   |
| 200      | Manhattan | 603         | 286,885   | 80,563               | 19,064.8 ± 397.0    |
Table 3 shows a similar situation when using eight neighbouring nodes. The ‘None heuristic’ always finds the shortest path in the three maps up to a size of 100 × 100. Again, this requires the highest number of iterations and a significantly increased run time. Focusing on the run time, the two smaller maps (25 × 25 and 50 × 50) show that the Manhattan distance requires the least number of iterations. The Euclidean heuristic shows better performance in the larger maps (100 × 100 and 200 × 200), although the difference is relatively marginal.
Fig. 7 Example of a map with an extension of 50 × 50 depicting the path found (coloured in pink) when using eight neighbouring nodes and Chebyshev distance function
Interestingly, the Chebyshev heuristic, as the gold standard [12], shows no advantages over the Euclidean and Manhattan heuristics in the 25 × 25 and 100 × 100 maps. In the 50 × 50 map, the shorter path is found at the cost of a significantly higher number of iterations and run time. However, the Chebyshev heuristic shows a significantly reduced run time with a comparably low number of iterations in the 200 × 200 map. Based on the results presented, no conclusive recommendation for choosing a heuristic for eight neighbouring nodes is possible. The results of the Chebyshev function when using four neighbouring nodes tend to be worse than those of the Euclidean and Manhattan functions. This result is consistent with the admissibility of heuristics. According to this theory, the Euclidean and Manhattan heuristics should also perform worse than the Chebyshev heuristic when using eight neighbouring nodes. However, this is not always observed. In many cases, the Euclidean and Manhattan heuristics perform better in terms of run time and number of iterations. In principle, the algorithm does redundant work with an inadmissible heuristic. Because the path costs do not match the estimated costs, the algorithm has a misconception about which paths are better than others. Therefore, it examines possible paths that should be ignored. As a result, suboptimal paths could be found. Figure 7 shows that the path chosen by the Chebyshev function does not necessarily correspond to the path that would be chosen as an ideal path in the real world.
Fig. 8 Example of a map with an extension of 50 × 50 depicting the path found (coloured in pink) using Manhattan distance function and four neighbouring nodes
On the one hand, the algorithm chooses a path through rough terrain such as the shore area of the pond. On the other hand, it ignores the loophole in the wall in the lower-left area and goes directly through the main gate with the cameras. Figure 8 shows a more realistic path, as it uses the loophole in the wall to enter the estate and bypasses the security cameras at the main entrance. In addition, the path found in Fig. 8 has an L-shaped form. In contrast, the path found in Fig. 7 shows a diagonal pattern. This is consistent with the visualization of the cost calculation for h(n) in Fig. 6. The evaluation shows that the current implementation of node weighting is not universally applicable for different map sizes. For smaller maps, the variable e(n) takes a dominant part in calculating f(n). However, the two variables g(n) and h(n) take the dominant part for larger maps. To solve this issue, either (2) or the values of the node weighting must be adjusted. A possible solution is to scale the node weights according to the map size. For example, the node weighting in the present study could be set for the map size 25 × 25 and scaled up accordingly for the other maps.
7 Conclusion In summary, the Manhattan function offers the best performance when using four neighbouring nodes in small maps (25 × 25, 50 × 50 and 100 × 100). The Euclidean measure offers the best performance at reasonable path costs in the largest map. However, the results do not provide a sufficient basis to recommend using a particular heuristic for eight neighbouring nodes conclusively. The Chebyshev heuristic always has the lowest path costs, but it also takes the longest time to find a path in principle. This fact applies to both types of movement. If eight neighbouring nodes are used, the Chebyshev heuristic shows no advantages over the Manhattan and Euclidean heuristics. It is in contradiction to the admissibility of heuristics. Therefore, this aspect would have to be investigated further. Furthermore, it has been shown that the choice of weights and (2) as specified in [5] are not optimal. However, this problem is negligible if only constant map sizes are used. If working with maps of widely varying sizes, a different approach is recommended. A scaling of the node weights could be a possible solution. The Paparazzi problem is not a realistic pathfinding problem. Nevertheless, it combines various real-world problems of pathfinding. This paper is mainly concerned with classical obstacle avoidance and various terrain structures in different sized and complex environments. Thus, despite the artificial setting, a reasonable application of pathfinding is created.
8 Outlook To make the paparazzi problem presented more realistic, it would be possible to make the target point or the celebrity movable. Another option would be to introduce security personnel to search the area as soon as the paparazzo walks through a camera node. However, these two extensions would convert pathfinding into a more dynamic and multi-agent-based problem as described in [1]. However, dynamic algorithms such as D* are expected to achieve better results in this problem than the traditional A* algorithm [14]. It would also be interesting to examine the node weights more in detail. Problems arise when working with maps of different sizes. The scaling of the node weights mentioned above is one possible solution. Finding an alternative formula to calculate the total costs f (n) would be another possibility. As visible in Table 3, the Chebyshev heuristic does not show the expected superiority over the two inadmissible heuristics when using eight neighbouring nodes [12]. Therefore, this fact would need to be investigated further. It is recommended to pursue this with additional maps of different sizes and layouts.
References 1. Li J, Tinka A, Kiesel S, Durham JW, Satish Kumar TK, Koenig S (2020) Lifelong multi-agent path finding in large-scale warehouses. In: Proceedings of the international joint conference on autonomous agents and multiagent-systems, AAMAS 2. Permana SH, Bintoro KY, Arifitama B, Syahputra A (2018) Comparative analysis of pathfinding algorithms A*, Dijkstra, and BFS on maze runner game. Int J Inf Syst Technol 1(2):1–8 3. Zhang H-M, Li M-L, Yang L (2018) Safe path planning of mobile robot based on improved A* algorithm in complex terrains. Algorithms 11(4):1–18 4. Monzonís DL (2019) Pathfinding algorithms in graphs and applications. B.S. Thesis, Faculty of Mathematics and Computer Science, University of Barcelona, Barcelona, Spain 5. Baldi S, Maric N, Dornberger R, Hanne T (2018) Pathfinding optimization when solving the paparazzi problem Comparing A* and Dijkstra’s algorithm. In: 2018 6th international symposium on computational and business intelligence (ISCBI), Basel, Switzerland, pp 16–22 6. Hart PE, Nilsson NJ, Raphael B (1968) A formal basis for the heuristic determination of minimum cost paths. IEEE Trans Syst Sci Cybern 4(2):100–107 7. Och FJ, Ueffing N, Ney H (2001) An efficient A* search algorithm for statistical machine translation. In: Proceedings of the ACL 2001 workshop on data-driven methods in machine translation, pp 1–8 8. Lin M, Yuan K, Shi C, Wang Y (2017) Path planning of mobile robot based on improved A* algorithm. In: 2017 29th Chinese control and decision conference (CCDC), China, pp 3570–3576 9. Patel A. A*’s Use of the Heuristic. http://theory.stanford.edu/~amitp/GameProgramming/Heu ristics.html. Accessed 12 June 2021 10. Huang H et al (2019) Dynamic path planning based on improved D* algorithms of Gaode map. In: 2019 IEEE 3rd information technology, networking, electronic and automation control conference (ITNEC), pp 1121–1124 11. Suryadibrata A, Young J, Luhulima R (2019) Review of various A* pathfinding implementations in game autonomous agent. Int J New Media Technol 6(1):43–49 12. Guo X, Luo X (2018) Global path search based on A* algorithm. In: 2018 international conference on transportation & logistics, information & communication, smart city (TLICSC 2018), pp 369–374 13. Pranjal S (2020) A_star-visualization. Github repository. https://github.com/AnonymousPS/ a_star-visualization. Accessed 06 May 2021 14. Al-Mutib K, AlSulaiman M, Emaduddin M, Ramdane H, Mattar E (2011) D* Lite based real-time multi-agent path planning in dynamic environments. In: 2011 third international conference on computational intelligence, modelling & simulation, pp 170–174
Chapter 8
Empirical Evaluation of Motion Cue for Passive-Blind Video Tamper Detection Using Optical Flow Technique Poonam Kumari and Mandeep Kaur
1 Introduction

Digital videos capture more incidents than ever before because of the proliferation of smartphones and security cameras. In addition, technological and economic advancements have led to grave issues concerning the reliability of multimedia content. The algorithms and techniques of multimedia forensics are dedicated to analyzing tampering of digital media content that can also be produced as digital evidence [1]. Video forensics, which is a subset of multimedia forensics, specifically focuses on the scientific examination and evaluation of videos in legal matters. It broadly deals with the acquisition of video evidence from multiple sources, camera identification, tamper detection and hidden data recovery [2]. The domain of video forensics is mainly categorized into active forensics and passive forensics [3]. Active approaches embed or append some validating inputs within the content, like watermarks or digital signatures. These approaches are intrusive in nature and mostly dependent on the hardware. Unlike active approaches, the passive forensic methods authenticate the veracity and integrity of videos without having any prior embedded information. These passive-blind approaches are found to be more effective than active forensics in practice [4]. The techniques and algorithms used by video forensic experts need continuous assessment to counter the challenges posed as malice by attackers or criminals. It may also be possible that the perpetrator has acted in an anti-forensic way. They can try to hide facts that could serve as criminal evidence, for example, by deleting or inserting some frames of a video or by using copy-move forgery or splicing forgery [5, 6] followed by sophisticated post-processing operations. Some videos can be altered by cropping out frames to hide some information, which is called up-scale crop forgery [7].

P. Kumari (B) · M. Kaur University Institute of Engineering and Technology Panjab University, Chandigarh, India e-mail: [email protected] M. Kaur e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 M. S. Uddin et al. (eds.), Proceedings of International Joint Conference on Advances in Computational Intelligence, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-19-0332-8_8
Thus, video forensic technology is drawing more attention to identify such unauthorized activities and has a very crucial role in multimedia authentication and security. Video forgeries are broadly classified as intra-frame forgery and inter-frame forgery [7, 8]. The intra-frame technique does not involve any temporal manipulation. Still, it performs some copy-paste or splicing on a set or sequence of frames of the video, and recompression of the video has to be carried out. Hence, such forgeries are also called spatial domain forgeries [9]. On the other hand, in inter-frame forgery, frames are related to one another and forgeries are implemented across frames, such as frame insertion, deletion, replication, or repositioning [10]. This kind of forgery has to be performed between the various inter-related frames over time and is hence also called temporal domain forgery [11]. Both of these forgery detection methods mostly use the I-frames that help to analyze the future frames and figure out what is changing in the next frames [12]. For example, if some sequence of frames is added, deleted or reshuffled in the video, then the information stored in the I-frames does not match the information generated after forgery. Some recent research introduced a third category of video forgery that can be carried out in both the spatial and temporal domains [13]. Forgeries like Deepfakes [14] and style transfer [15] come under this category. There are numerous techniques proposed in the literature to detect video forgeries. Some active video forgeries can be detected by identifying their source of origin [16], which uses the sensor pattern noise [17], interpolation methods [18], quantization methods [19], and motion vector cues [20]. There are also many cues that are exploited to identify passive video forgeries, like motion fields [21], thresholds [22], quantization errors [19], variation in the IBP frame sequence pattern [23], double compression [21], DCT [24, 25], SIFT features [26], HoG [27], brightness gradients [23], optical flow [11], Motion Compensated Edge Artifact (MCEA) [28], and Levenshtein distance [29]. An inter-frame copy-move forgery is depicted in Fig. 1, taken from the REWIND dataset [30], which is one of the most widely used datasets available to researchers. The figure depicts a snapshot of frames 1 to 200 taken from the original (04_original_enc10.avi) and forged (04_forged_enc10.avi) videos from the dataset. The dataset provides 40 original and 40 forged videos with four different quantization factors. Motion is the key difference between static images and videos. While working with videos, all the frames are related to one another. Optical flow [31] is the technique which provides the motion of objects from one frame to another. It is a potent cue and is alone capable of defining many actions. The current paper aims to analyze optical flow-based methods for video forgery detection, as optical flow deals with the estimation of the true motion field. Estimation of optical flow is one of the key problems in video analysis, and it is treated as a dense correspondence problem [32]. To the best of our knowledge, the Farneback method for dense optical flow [33] has not been exploited for forgery detection in digital videos. The present paper aims to empirically analyze the motion cue to detect forgery in digital videos by computing the optical flow of each pixel of every frame using the Farneback method [33].
Fig. 1 Copy-move forgery: a) original video frames, b) forged video frames (Source: REWIND dataset [30]; a) 04_original_enc10.avi, b) 04_forged_enc10.avi)
Farneback method [33]. This method can detect inter-frame forgery and also facilitates localization of the forged region. The proposed method uses the OpenCV library [34] and the standard REWIND dataset [30] to validate the results. The organization of the paper is as follows: Sect. 2 provides a review of optical flow techniques and related literature on optical flow-based video forgery detection. Section 3 elaborates the proposed scheme to detect forged videos. The empirical results and discussions are presented in Sect. 4. Finally, Sect. 5 provides the conclusion of the paper, along with some directions for future research.
2 A Review of Optical Flow Techniques Motion is a vibrant source of information about the world that supports operations such as segmentation, recovering surface structure from parallax, self-motion estimation, object recognition, and understanding behavior and scene dynamics. However, determining only the motion of sparse points is sometimes not enough for recognizing objects, their properties, and their actions. For example, if a dynamic point source illuminates parts of the human body, it is not easy to recognize the gender, complexion, and mood of that person. We need to measure the motion field to obtain the motion features for subsequent recognition. Sometimes the object itself does not move in the scene, but we may perceive motion of the object due to other objects or background movements. Suppose we have an object like a barber's pole rotating around its axis under some assumptions: the background is still, lighting conditions are consistent during video capture, and the velocity of the moving object is the same throughout the scene, as shown in Fig. 2. All points are moving, but we cannot distinguish one point from the other. So for us, the pole seems
Fig. 2 Motion field and optical flow of Barber's pole [36]
static. If points observed in the scene are moving relative to the camera, then the vector field of the 2D projections of the scene points' motion vectors onto the image is called the motion field. In contrast, the optical flow is a vector field of the apparent motion of pixels between frames. Husseini treats the optical flow as the estimation of the true motion field of the moving object [35]. The key objective of optical flow is to find the 2D displacement of the brightness patterns of each pixel in each frame of the video [37]. In the real-world scenario the motion has 3D component vectors u_ij, v_ij, and w_ij, which show the motion of the object in each direction. But when we record a scene using a camera, it is captured in 2D space. The optical flow then provides an approximation of the motion field, captured as the 2D projection of the 3D motion field. But the motion field is not equal to the optical flow, as the optical flow also depends upon the apparent shift of the lighting conditions between the two frames. To estimate the optical flow in 2D space, for each point (x_ij, y_ij) in the first frame of a video we need to find a corresponding point (x_ij + u_ij, y_ij + v_ij) in the second frame that corresponds to the same scene point as the point in the first frame. Let us assume that the intensity at time t of a pixel position (x, y) is I(x, y, t). If we shift that pixel by a minimal displacement (Δx, Δy) in time Δt, i.e., its new position is (x + Δx, y + Δy), then its intensity I(x + Δx, y + Δy, t + Δt) must be equal to the intensity of the pixel (x, y) at time t [37]:

I(x + Δx, y + Δy, t + Δt) = I(x, y, t)

Expansion by Taylor series and further simplification reduces the above equation to

u(x, y) · I_x + v(x, y) · I_y + I_t = 0   ∀(x, y) ∈ Ω

where u = lim_{Δt→0} Δx/Δt = dx/dt and v = lim_{Δt→0} Δy/Δt = dy/dt.
Both u(x, y) and v(x, y) are treated as the displacement components of the optical flow field, and Ω is the domain (size) of each frame. This equation is known as the brightness consistency constraint, and we have two unknown optical flow components, u and v [37]. Thus, the optical flow problem is to estimate the vector field of local displacements in a sequence of frames. This problem can be solved by selecting a pixel and finding the velocity vectors flowing through that pixel. There are primarily three methods that are widely used to estimate the optical flow vectors u and v: the Lucas-Kanade method [31], which detects the objects first and then estimates the flow vector values only for the detected objects; the Farneback method [33], which takes each pixel and its neighbourhood pixel values to estimate the flow vector values; and finally the Horn and Schunck method [38], which estimates the values of u and v by analyzing each pixel separately. Further, there are two main ways to visualize the result of optical flow estimation [39]. The first is to draw the motion vectors directly. Motion vectors should be drawn only for a sparse set of points, because if we draw the motion vectors for each pixel, the generated image will be unreadable. The second approach is to use a color-coding method in which we specify a color for each possible motion. Usually, the vector orientation is coded by the color hue, and the vector length is coded by the color saturation. Many researchers use both of these techniques to detect forgeries in videos. Limited research has been carried out wherein optical flow techniques are used to detect forgeries in videos. Chao et al. [40] used a window-based method and a precise detection model based on a binary search technique to identify frame insertion forgery. They also detected frame deletion forgery by implementing the Lucas-Kanade optical flow approach to find the minute differences between the original and forged video sequences. It gives high precision for frame insertion forgery but failed to achieve such high precision for frame deletion forgery. Another approach in [41] also detected frame insertion and deletion forgery using the Lucas-Kanade approach to compute the optical flow and then examined the variation sequence of the optical flow to find the discontinuity points; the work was carried out on customized videos. The authors in [42] detect various inter-frame forgeries. They also used Lucas-Kanade optical flow to extract flow vectors for the frames of the videos, and then the consistency after normalization and quantization is calculated to detect inter-frame forgeries in both the X and Y directions on a still-background dataset. They then used an SVM with this consistency feature for classification of original and forged videos. Bidokhti and Ghaemmaghami [11] also used the Lucas-Kanade approach, but they focused on partial copy/move attacks by dividing the frames into original vs. suspicious parts. The forgery is then detected by calculating the optical flow for a frame and computing a coefficient for each frame. If a secondary peak exists in the optical flow coefficient graph, they identify the video as a forged video. They also used a secondary peak detection algorithm. This algorithm is highly sensitive to the region of interest selected.
In the existing literature, there are two papers, [23] and [43], which used the Horn and Schunck optical flow method to detect video forgery. The authors in [23] use a residual-based block matching technique and optical flow. The objective is to detect frame-based tampering by analyzing the prediction-residual-based blocks and optical flow gradients in MPEG-2 and H.264 encoded videos for slow-motion, moderate-motion, and fast-motion videos. They did not test the work on a standard dataset, and the dataset used is no longer available. The authors in [43] detected frame insertion and deletion forgery by using the optical flow brightness gradient, and frame replication by a prediction-residual-based technique. Both features are used by identifying and localizing the irregularities in the graph. The intra-frame copy-move forgery detected by [44] shows harmonic motion in the optical flow, which gives an anomalous movement distribution for each frame. But the main drawback of this approach is that it can be applied only to a particular type of video and cannot be generalized, and they also tested this approach on a very small number of videos. Shuo et al. [45] proposed a method to detect deleted frames by using robust principal component analysis to extract moving objects, and then applied a pseudo flow orientation variation descriptor to approximate the flow orientation variations. It was tested on 324 real-world videos. Deepfake is also a kind of video forgery in which facial manipulation is the primary task [46]. The authors in [47] distinguished deepfake videos from original videos by calculating the optical flow and then applying a VGG16- and ResNet50-based Flow CNN with a sigmoid activation function. They worked with 960 videos, out of which 720 were used for training, 120 for validation, and 120 for testing. But the performance of this approach is not efficient; it can be enhanced by combining this approach with other frame-based methodologies to improve the performance of the existing system.
3 Proposed Methodology Unlike the existing approaches, the proposed method uses the Farneback optical flow technique for empirically evaluating the motion cue for detecting forgeries in digital videos. The proposed approach first divides the digital video into a sequence of frames. It then computes the optical flow between every two consecutive frames, which gives two values for each pixel position in the 3D flow image, which in turn shows the apparent shift in the position of pixel x_n between frame n and frame n + 1. Then a distance-vector array is calculated for each pixel value in the flow image, and the average distance-vector is taken into consideration for each flow image. These average distance vectors are plotted with respect to the number of frames to detect the forged video sequence and for the localization of the forged frames. A supervised machine learning approach based on a linear support vector classifier is then applied to automate the detection process in the proposed scheme. A detailed block diagram of the proposed method is depicted in Fig. 3, where the input video sequence is divided into N frames, numbered 0 to N−1.
Fig. 3 Methodology of the proposed work
Then the N−1 3D flow images are generated by calculating the optical flow between every two consecutive frames using the Farneback OF technique. The Farneback OF method was chosen for in-depth analysis of each pixel in the frame because this method calculates the dense optical flow, i.e. the optical flow for each pixel between two consecutive frames of a video. The 3D flow images contain two values at each pixel location because the optical flow gives the apparent shift of a pixel from one position to another in both the x and y directions between two consecutive frames. So the flow images contain the values of Δx and Δy for each pixel position. After that, we calculated the distance-vector z value for each pixel position of the 3D flow image. The average of all z values gives one value for each frame. Then we plotted these average z values against the number of frames, and the results were obtained from those graphs.
Proposed Algorithm
Step 1: Convert the video into N frames (each frame having a resolution of 320 * 240).
Step 2: Convert all the frames from RGB to grayscale.
Step 3: Calculate the dense optical flow using the Farneback method for each pixel between every two consecutive frames, which gives a 3D optical flow image that contains 320 * 240 values of Δx and Δy. Repeat this step for all N frames.
Step 4: For each 3D optical flow image,
Fig. 4 Flowchart of the proposed scheme
4.1 Calculate the distance-vector z = √(Δx² + Δy²) for each pixel of the flow image and store the values in a distance-vector array. The distance-vector array of flow image n contains 320 * 240 values of z. Calculate the average of the z values for each distance-vector image, i.e. avg(z).
Step 5: Plot the value of avg(z) with respect to the number of frames.
Step 6: Find the extra spikes present in the forged frame sequences as compared to the original video graphs.
To automate the process of forgery detection, a LinearSVC is used, which works on the 200 selected frames from each video. A cross-fold validation is applied to validate the process of forgery detection, and then the average model accuracy is computed.
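The per-frame motion feature described above can be computed with OpenCV, which the authors state they used. The following is a minimal sketch of Steps 1–5, assuming an input file name (video.avi) and the default Farneback parameters; it is illustrative rather than the authors' exact implementation.

```python
import cv2
import numpy as np

def average_flow_magnitude(video_path):
    """Return avg(z) for every pair of consecutive frames of a video.

    z is the per-pixel magnitude of the Farneback dense optical flow,
    i.e. sqrt(dx**2 + dy**2); avg(z) is its mean over the frame.
    """
    cap = cv2.VideoCapture(video_path)
    ok, frame = cap.read()
    prev_gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    avg_z = []
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        # Dense optical flow: one (dx, dy) pair per pixel (320 * 240 values each).
        flow = cv2.calcOpticalFlowFarneback(prev_gray, gray, None,
                                            0.5, 3, 15, 3, 5, 1.2, 0)
        z = np.sqrt(flow[..., 0] ** 2 + flow[..., 1] ** 2)
        avg_z.append(float(z.mean()))
        prev_gray = gray
    cap.release()
    return avg_z

# Step 5: plot avg(z) against the frame index and look for extra spikes.
# values = average_flow_magnitude("video.avi")
# import matplotlib.pyplot as plt; plt.plot(values); plt.show()
```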
4 Empirical Evaluation of Motion Cues for Tamper Detection 4.1 Materials and Methods The REWIND dataset of H.264 lossless compressed videos in .mp4 and .avi formats was used to validate the proposed scheme for video forgery detection. It contains 10 forged and 10 original videos having a still background and a constant resolution of 320 * 240, with varying quantization factors of 0, 10, 20 and 30. So a total of 80 videos (40 original and 40 forged) are available for research purposes. These 80 videos contain multiple types of forgeries like copy-move, splicing, frame insertion, frame deletion, frame reshuffling, and frame duplication.
4.2 Experimental Setup
1. Samples for the empirical analysis were taken from the REWIND dataset.
2. A frame separation algorithm was applied to each video, and all the frames were pre-processed by converting them to grayscale images.
3. The calcOpticalFlowFarneback() method for optical flow calculation was implemented using the OpenCV library in Python 3.0.
4. The value of avg(z) was calculated by the proposed algorithm, the graphs were plotted with the help of the pyplot library, and the 200 avg(z) values of the region of interest were extracted.
5. A dataset .csv file was created by taking the 200 avg(z) values for all 80 videos of the dataset.
6. A pre-processing step, i.e. standardization of the dataset, was done, and then LinearSVC from the sklearn library was applied for training, testing and validation (a minimal sketch follows this list).
7. The prediction accuracy of detection was calculated to analyse the learned model.
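A minimal sketch of steps 5–7, assuming the avg(z) features have already been written to a dataset.csv file with 200 feature columns and one class column; the column names and file name are illustrative rather than the authors' actual artifacts.

```python
import pandas as pd
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

# 80 rows (videos), 200 avg(z) columns, and a 'label' column (1 = original, 2 = forged).
data = pd.read_csv("dataset.csv")
X = data.drop(columns=["label"]).values
y = data["label"].values

# Standardization followed by a linear support vector classifier.
model = make_pipeline(StandardScaler(), LinearSVC(C=1.0, max_iter=10000))

# Five-fold cross-validation over the whole dataset.
cv_scores = cross_val_score(model, X, y, cv=5)
print("mean CV accuracy:", cv_scores.mean())

# A 62/18 train-test split as described in Sect. 4.3.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=18, stratify=y, random_state=0)
model.fit(X_tr, y_tr)
print("test accuracy:", model.score(X_te, y_te))
```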
4.3 Empirical Evaluation and Results The detection of video forgery is represented as a binary classification problem, wherein class 1 was used for the original and class 2 for the forged video. The distance-vector array of one flow image contains 320 * 240 values of z. Hence, the average of the z values for each distance-vector image, i.e. avg(z), is computed and analyzed. Plotting the avg(z) values with respect to the frames shows extra spikes in forged videos on manual inspection. This motivated the application of machine learning in order to automate the process of discriminating original and forged videos. For the current experiment, a window of 201 frames was selected from each video for analysis, given that each video had a minimum of 201 frames. The selection of frames was done based on manual inspection, to extract a region of interest in the video under analysis. This reduces the overall computational overhead and also maintains uniformity in the feature vector from each sample. Thus, from each sample video a total of 200 values of avg(z) is obtained, which signify the variation in the motion cue across the 200 flow images obtained from the frames. These values are stored in a .csv file which contains 80 rows for the videos, 200 columns for the avg(z) values, and 1 column for the class field to tag the video as original or forged. A total of 62 videos were used for training purposes and 18 for testing purposes. A Linear SVC model is used to classify the dataset into class 1 or class 2. A five-fold cross-validation is carried out to validate the results. The model gives a prediction accuracy of 97.00%. Manual inspection of the given samples resulted in 98% accuracy. The difference is due to the limited window of frames used for the classification in order to maintain equal feature length. Padding with dummy frames can be carried out to improve the prediction accuracy. The evaluation metrics used to validate the results include accuracy, precision, recall, and the ROC curve. In order to distinguish between the original and forged video frames, screenshots of two videos were taken for reference. Frames of the first original video are depicted in Fig. 5(a). It does not contain any moving car, but in Fig. 5(b) the frames between 90 and 100 show that a red colour car passed along the road in this interval. The original video contained the moving-car frames, but at some other frame interval. This clearly shows that these frames were copied from the same video and inserted at frame interval 90–100. The same kind of forgery is done in the second video, shown in
Fig. 5 Selected 200-frame sequences of videos from the REWIND dataset, h264_lossless: a) 02_original.mp4 original video frames, b) 02_forged.mp4 forged video frames, c) 07_original.mp4 original video frames, d) 07_forged.mp4 forged video frames
Fig. 5(c) and 5(d), where the former is the original video and does not have any moving red colour ball, whereas the latter contains a red colour ball moving right to left between frames 120–150. In this case also, frames were copied from the same video, but from some other frame interval, and inserted between frames 120–150. Both of these videos show a copy-move inter-frame forgery. Some of the graphs with localized forged regions are shown in Fig. 6. It was observed from the graphs that the extra spikes occurring in the forged videos approximately follow the same pattern as some other spike present in the original videos. From this, it was inferred that the identified forged frames must have been taken from the same video sequence to create an inter-frame copy-move forgery in the video to replicate the scene. For the illustration of the proposed scheme, an empirical analysis was carried out wherein graphs were plotted, using the pyplot library, between the avg(z) values (termed the distance for generalization purposes) and the number of frames of the video. In the plots of the resultant values, the difference between the forged video and the original video is clearly visible. Some extra spikes are present in the forged frame sequences as compared to the original video frames. Manual inspection results in
Fig. 6 avg(z) plots for Videos of Rewind dataset folder h264_lossless a) 02_original.mp4 b) 02_forged.mp4 c) 03_original.mp4 d) 03_forged.mp4 e) 07_original.mp4 f) 07_forged.mp4 g) 08_original.mp4 h) 08_forged.mp
accuracy of 97%, which is highly acceptable if a video needs to be categorized as forged or original under the assumptions taken during the experimentation done in this paper. In order to automate the process of discriminating forged and original video frames, a support vector classifier is applied. After applying Linear SVC, an accuracy of 97% was observed, with a precision of 91.66% and a recall of 100%. A more balanced performance metric that takes into account both precision and recall, depicted by the F1-score, is obtained as 95%. A similar approach by Al-sanjary et al. [44] claimed an accuracy of 96% for their proposed scheme. But that approach was not tested on all videos of the SULFA (now renamed REWIND) dataset; only three videos were taken into consideration from SULFA and six videos from the VTD dataset. Comparatively, our approach gives a better result when applied on all 80 videos of the REWIND dataset. To enhance the accuracy of the proposed technique, we can also use a fusion of cues, as carried out in passive-blind approaches for images at the feature level [48] or the measurement level [49]. For videos, characteristics that exploit inconsistencies in compression, texture, noise, group of pictures (GOP), etc. can be investigated. The ROC (Receiver Operating Characteristics) curve for the classification problem under discussion is presented in Fig. 7. The area between the ROC curve and the axis is the area under the curve (AUC); the larger the area covered, the better the machine learning model is at discriminating between the given classes. The proposed scheme was also tested with various SVM models by varying their parameters, but the best results are given by the linear SVC model. The first parameter used is the gamma variable, which was inspected by varying its values as 0.1, 1.0, 10, and 100. This gamma variable is used to define the non-linear hyperplanes. The second parameter taken into consideration is the C value, which is the penalty parameter of the error term. This C variable controls the trade-off between obtaining smooth decision boundaries and classifying the training points correctly. Values taken for C are 0.1, 1.0, 10, 100, and 1000. The third parameter is the degree of the polynomial, which can be used only with the poly kernel. This degree variable varies between 0 and 6. By varying all these parameters, only the highest accuracy values, with their corresponding recall and precision values, are compared with the Linear SVC model. It is clearly shown in the chart given in Fig. 8 that the linear SVC model gives the optimal performance.
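The parameter sweep described above can be reproduced with a grid search; the following sketch assumes the same feature matrix X and labels y as in the earlier classification sketch and the grid values named in the text (gamma, C, and polynomial degree). It is an illustrative setup rather than the authors' exact search.

```python
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

pipe = Pipeline([("scale", StandardScaler()), ("svc", SVC())])

# Grids mirroring the values mentioned in the text for each kernel.
param_grid = [
    {"svc__kernel": ["linear", "rbf"],
     "svc__C": [0.1, 1.0, 10, 100, 1000],
     "svc__gamma": [0.1, 1.0, 10, 100]},
    {"svc__kernel": ["poly"],
     "svc__C": [0.1, 1.0, 10, 100, 1000],
     "svc__gamma": [0.1, 1.0, 10, 100],
     "svc__degree": list(range(0, 7))},  # degree varied between 0 and 6 as in the text
]

search = GridSearchCV(pipe, param_grid, cv=5, scoring="accuracy")
# search.fit(X, y)
# print(search.best_params_, search.best_score_)
```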
4.4 Comparison with Other State-of-the-Art Approaches In the literature, mainly three optical flow approaches were described, and only two of them were used to detect inter-frame forgery. First, the Lucas-Kanade approach was used in [11], which gives an accuracy of 89.47%. Second, the Horn and Schunck approach was used in [23], which gives an accuracy of 83%, but it was not tested on any standard dataset. The authors in [44] also detected video forgery with 96% accuracy using optical flow but did not describe the approach used. The Farneback method has yet to
Fig. 7 The Receiver Operating Characteristics (ROC) for the proposed algorithm
Fig. 8 A comparison chart of accuracy, recall, precision and F1-score values for various SVC models (SVC with linear, rbf and poly kernels, and LinearSVC)
be exploited for forensic analysis of digital videos. Our approach uses the Farneback method, which gives better accuracy, i.e. 97%, and is also tested on a standard dataset.
5 Conclusion A passive-blind approach to detect forgeries in digital videos by exploiting motion cues is proposed. To the best of our knowledge, the application of the Farneback dense OF technique has been very limited in forensic analysis of digital videos. The statistical details from the optical flow images display extra spikes in the forged videos upon visual inspection. Supervised machine learning was applied to automate the process
of discriminating original and forged videos. Currently, the learning is applied on a region of interest selected for each video under analysis, based on statistical values obtained from the optical flow images. This reduces the overall computational overhead and also maintains uniformity in the feature vector from each sample. Videos encoded in H.264, in .mp4 and .avi formats, from the standard REWIND dataset are used for the experiment. Though manual inspection of the extra spikes generated by the optical flow values results in an accuracy of 98%, supervised learning using linear SVC resulted in a prediction accuracy of 97% with an F1-score of 95.64%. The reduction in accuracy is due to the selection of a window of frames for analysis. In future, we aim to automate the process of region-of-interest selection to reduce computational complexity and to carry out an in-depth investigation of the window of frames based on multiple cues to augment overall prediction accuracy.
References 1. Battiato S, Giudice O, Paratore A (2016) Multimedia forensics: discovering the history of multimedia contents. In: ACM international conference proceeding series, vol. 1164, no. June, pp 5–16 2. Khanna A, Singh AK, Swaroop A (eds) (2021) Recent Studies on Computational Intelligence: Doctoral Symposium on Computational Intelligence (DoSCI 2020). Springer Singapore, Singapore 3. Su L, Huang T, Yang J (2014) A video forgery detection algorithm based on compressive sensing. Multimed Tools Appl 74(17):6641–6656 4. Delp E, Memon N, Min Wu (2009) Digital forensics [from the guest editors]. IEEE Sign Proc Magaz 26(2):14–15. https://doi.org/10.1109/MSP.2008.931089 5. Moon SK, Raut RD (2014) Application of data hiding in audio-video using anti forensics technique for authentication and data security. In: IEEE international advanced computing conference, pp 1110–1115 6. Stamm MC, Lin WS, Liu KJR (2012) Temporal forensics and anti-forensics for motion compensated video. IEEE Trans Inf Forensics Secur 7(4):1315–1329 7. Kumar V, Singh A, Kansal V, Gaur M (2021) A comprehensive survey on passive video forgery detection techniques. Stud. Comput. Intell. 921:39–57 8. Singh RD, Aggarwal N (2018) Video content authentication techniques: a comprehensive survey. Multimed Syst 24(2):211–240 9. Pandey RC, Singh SK, Shukla KK (2015) Passive copy-move forgery detection in videos. In: 5th 2014 international conference on computer and communication technology (ICCCT), pp 301–306 10. Cheng Hui MAI (2003) Spatial temporal and histogram video registration for digital watermark detection. In: Proceedings 2003 international conference on image processing (Cat. No.03CH37429), vol. 2, no. 70, pp 735–738 11. Bidokhti A, Ghaemmaghami S (2015) Detection of regional copy/move forgery in MPEG videos using optical flow. In: Proceedings of the international symposium on artificial intelligence and signal processing, AISP 2015 12. Gilbert Yammine AK, Eugen W (2018) Blind gop structure analysis of mpeg-2 and h. 264/avc decoded video. In: 28th picture coding symposium PCS2010, December 8–10, 2010, Nagoya, Japan Blind, pp 258–261 13. Mathews MR, Sreedharan S (2015) Detection and localization of video copy-move forgery in temporal and spatial domain. Int J Innov Technol Explor Eng ISSN 2278–3075(1):68–71
14. Westerlund M (2019) The emergence of deepfake technology: a review. Technol Innov Manag Rev 9(11):39–52. https://doi.org/10.22215/timreview/1282 15. Ruder M, Dosovitskiy A, Brox T (2016) Artistic style transfer for videos. In: Ger. Conf. Pattern Recognition. Springer, Cham, vol. 9796 LNCS, pp 26–36. https://doi.org/10.1007/978-3-03068793-9 16. Dirik NM, Emir A, Husrev TS (2007) Source camera identification based on sensor dust characteristics. In: 2007 IEEE working signal processing application public security forensics. IEEE, no. 92251-NY-IJ 17. Luka J, Fridrich J, Goljan M (2006) Digital camera identification from sensor pattern noise. IEEE Trans. Inf. Foren. Secur. 1(2):205–214. https://doi.org/10.1109/TIFS.2006.873602 18. Niklaus S, Mai L, Liu F (2017) Video frame interpolation via adaptive convolution. In: Proceedings, 30th IEEE conference on computer vision and pattern recognition, vol. 2017-Janua, pp 2270–2279 19. Aghamaleki JA, Behrad A (2016) Inter-frame video forgery detection and localization using intrinsic effects of double compression on quantization errors of video coding. Signal Proc Image Commun 47:289–302. https://doi.org/10.1016/j.image.2016.07.001 20. Su Y, Xu J, Dong B, Zhang J, Liu Q (2010) A novel source mpeg-2 video identification algorithm. Int J Pattern Recognit Artif Intell 24(8):1311–1328 21. He P, Jiang X, Sun T, Wang S (2016) Double compression detection based on local motion vector field analysis in static-background videos. J Vis Commun Image Represent 35:55–66 22. Beatrice O, Akumba BO, Iorliam AA, Agber S, Okube EO, Kwaghtyo KD (2021) Authentication of video evidence for forensic investigation: a case of Nigeria. J Inf Secur 12(02):163–176. https://doi.org/10.4236/jis.2021.122008 23. Kingra S, Aggarwal N, Singh RD (2017) Inter-frame forgery detection in H.264 videos using motion and brightness gradients. Multimed Tools Appl 76(24):25767–25786 24. Su Y, Nie W, Zhang C (2011) A frame tampering detection algorithm for MPEG videos. In: 2011 6th IEEE joint international information technology and artificial intelligence conference ITAIC 2011, vol. 2, no. 2006, pp 461–464 25. Zhao H, Wang H, Malik H (2012) Steganalysis of youtube compressed video using high-order statistics in 3D DCT domain. In: Proceedings of the 2012 8th international conference on intelligent information hiding and multimedia signal processing IIH-MSP 2012, pp 191–194 26. Najva N, Bijoy KE (2016) SIFT and tensor based object detection and classification in videos using deep neural networks. Procedia Comput Sci 93(September):351–358 27. Bilinski P, Bremond F (2011) Evaluation of local descriptors for action recognition in videos. In: Lecture notes in computer science (including subseries lecture notes in artificial intelligence and lecture notes in bioinformatics), vol. 6962 LNCS, pp 61–70 28. Dong Q, Yang G, Zhu N (2012) A MCEA based passive forensics scheme for detecting framebased video tampering. Digit Investig 9(2):151–159 29. Ren H, Atwa W, Zhang H, Muhammad S, Emam M (2021) Frame duplication forgery detection and localization algorithm based on the improved levenshtein distance. Sci Program 2021:1–10. https://doi.org/10.1155/2021/5595850 30. REWIND dataset (2012). https://sites.google.com/site/rewindpolimi/downloads/datasets/ video-copy-move-forgeries-dataset. Accessed 05 Aug 2020 31. Lucas BD, Kanade T (1881) An iterative image registration technique with an application to stereo vision. In: Proceeding DARPA Image Understanding Workshop, April 1981, pp 121–130 32. 
Liu C, Yuen J, Torralba A (2015) Sift flow: dense correspondence across scenes and its applications. Dense Image Corresp Comput Vis 1(1):15–49 33. Farneback G (2003) Two-frame motion estimation based on polynomial expansion. Lect Notes Comput Sci 2749(1):363–370 34. Culjak I, Abram D, Pribanic T, Dzapo H, Cifrek M (2012) A brief introduction to OpenCV. In: MIPRO 2012 - 35th international convention on information and communication technology, electronics and microelectronics proceedings, pp 1725–1730 35. Husseini S (2017) A survey of optical flow techniques for object. Tampere University of Technology
36. Owens R (1997) Computer Vision IT412. http://homepages.inf.ed.ac.uk/rbf/CVonline/ LOCAL_COPIES/OWENS/LECT12/node4.html#SECTION00040000000000000000. Accessed 14 Jul 2020 37. Beauchemin SS, Barron JL (1995) The computation of optical flow. ACM Comput Surv 27(3):433–466 38. Horn BKP, Schunck BG (1981) Determining optical flow. Massachusetts Institute of Technology Artificial Intelligence Laboratory, North-Holland, p. A.I. Memo No. 572 39. Lin C (2018) Introduction to motion estimation with optical flow. https://nanonets.com/blog/ optical-flow/. Accessed 18 Aug 2020 40. Chao J, Jiang X, Sun T (2012) A novel video inter-frame forgery model detection scheme based on optical flow consistency. In: International workshop on digital watermarking. Springer, Berlin, Heidelberg, no. October, 2012 41. Wang W, Jiang X, Wang S, Wan M (2014) Identifying video forgery process using optical flow, no. February 2016 42. Wang Q, Li Z, Zhang Z, Ma Q (2014) Video inter-frame forgery identification based on optical flow consistency. Sens Transd 166(3):229–234 43. Singh RD, Aggarwal N (2017) Optical flow and prediction residual based hybrid forensic system for inter-frame tampering detection, vol. 26, no. 7 44. Al-sanjary OI, et al. (2018) Deleting object in video copy-move forgery detection based on optical flow concept. In: IEEE Conference on Systems, Process and Control, no. December, pp 33–38 45. Li S, Huo H (2021) Frame deletion detection based on optical flow orientation variation. IEEE Access 9:37196–37209 46. Tolosana R, Vera-Rodriguez R, Fierrez J, Morales A, Ortega-Garcia J (2020) DeepFakes and beyond: a survey of face manipulation and fake detection. arXiv Prepr. arXiv2001.00179, pp 1–23 47. Amerini I, Galteri L, Caldelli R, Del Bimbo A (2019) Deepfake video detection through optical flow based CNN. In: Proceedings - 2019 international conference on computer vision workshop. ICCVW 2019, no. Micc, pp 1205–1207 48. Kaur M, Gupta S (2016) A passive blind approach for image splicing detection based on DWT and LBP histograms. Int. Symp. Secur. Comput. Commun. Springer, Singapore, pp 318–327. https://doi.org/10.1007/978-981-10-2738-3_27 49. Kaur M, Gupta S (2017) A fusion framework based on fuzzy integrals for passive-blind image tamper detection. Clust Comput 22(S5):11363–11378
Chapter 9
Quantifying Changes in Sundarbans Mangrove Forest Through GEE Cloud Computing Approach Chiranjit Singha
and Kishore C. Swain
1 Introduction Mangrove forests are mostly found in coastal belts. They represent a local-to-global functional link between the terrestrial and oceanic carbon cycles [1]. They are a rich source of environmental goods and services, having very high economic and societal benefits for biodiversity management [2]. The Sundarbans support high levels of floral and faunal diversity [3], attracting a number of tourists every year. Cyclone disturbances can cause significant damage to forest vegetation. Simard et al. [4] stated that climate change disturbed the mangrove structure and carbon stocks from the national to the regional scale through cyclones and other natural hazards. Cyclone landfall frequency has been very high in recent times [5] in the coastal areas. Additionally, mangroves are also affected by anthropogenically driven disturbances, including deforestation, development of aquaculture resources and urbanization [6], and coastline transgression due to sea level rise [7]. Recent estimates of annual global mangrove loss rates ranged between 0.16 and 0.39%, which may be up to 8.08% in South and Southeast Asia [8]. The assessment showed that the mangrove cover in West Bengal (South 24 Parganas district, etc.) in 2019 was 2,112 sq km, which was 42.5% of the country's total mangrove-covered area. There has been a net decrease of 2.0 sq km in the mangrove-covered area since 2017 [9] in the Sundarbans. Many local activities, such as deforestation and developmental activity such as expansion of agriculture and aquaculture, and natural calamities in the form of frequent tropical cyclones, namely Sidr, Amphan and Bulbul, may have damaged the Sundarbans mangrove forest during the last two decades [10]. Islam et al. [11] also stressed factors such as anthropogenic interference and climate variation as the main drivers of mangrove cover changes in Bangladesh from 1976 to 2015.
C. Singha (B) · K. C. Swain, Department of Agricultural Engineering, Institute of Agriculture, Visva-Bharati, Sriniketan 731236, West Bengal, India; e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022. M. S. Uddin et al. (eds.), Proceedings of International Joint Conference on Advances in Computational Intelligence, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-19-0332-8_9
Spaceborne satellite remote sensing technology is most useful
for assessing vegetation change in general and mangrove dynamics and extent in the Sundarbans region [12, 13] in particular. Al-Amin Hoque et al. [14] assessed the impact of cyclone Sidr (2007) through object-based image analysis of SPOT 5 imagery. The RS-based Google Earth Engine (GEE) cloud was used to quickly quantify changes in the mangrove landscape in the Sundarbans and French Guiana regions between 1984 and 2018 [15]. The web-based GEE tool was very useful for rapid monitoring and mapping of the mangrove ecosystem, particularly for smaller areas of interest in Myanmar [16]. Mondal et al. [17] used optical Sentinel-2 data and the Google Earth Engine platform to predict mangrove coverage area through classification and regression trees (CART) along with random forest (RF) machine learning models.
1.1 Importance of SAR Data Remote sensing is a more reliable and efficient management approach for continuous mapping, monitoring, and change detection of mangrove species. SAR plays an important role in monitoring the biophysical parameters of mangrove forests since its microwave energy can penetrate the cloud cover commonly found in the tropics and acquire data throughout the year [18]. The SAR technique is very useful for quick assessment of damage and recovery of mangroves, including from tropical cyclones that hit the Sundarbans. HV backscatter from SAR images is highly correlated with the response of mangrove biomass of more than 120–150 Mg·ha−1. The L-band SAR data themselves can also be used to classify taller (> ~10 m) mangroves with prop root systems (e.g. Rhizophora and Ceriops species) from their comparatively lower L-band HH backscatter [19]. Mangrove AGB estimation through ALOS PALSAR imagery has become very popular [20] in recent times. Darmawan et al. [21] demonstrated that HH and HV polarimetry of ALOS PALSAR can estimate mangrove forest AGB and the impact of tidal height and topography in Indonesia. In recent years, there has been an increase in high-performance cloud computing platforms, such as GEE, which allow free access to the vast and fast-growing earth observation data for global as well as regional studies [22]. Landsat images were used to derive the normalized difference vegetation index (NDVI) in the GEE cloud platform for quick assessment of change detection for twenty-one pre- and post-cyclonic events during 1988–2016. These long-term studies identified the impacts of cyclones on Sundarbans mangrove species regeneration, composition structure, classification, and mapping accuracy using different RADAR data such as Radarsat-1, ERS-1/2 SAR, and ALOS PALSAR [23].
1.2 Scope and Objectives The Sundarban region has been frequently affected by severe cyclonic storms. Though mangroves protect the plains from cyclones, they get destroyed themselves in the process. The Indian Meteorological Department (IMD, 2019) reported 252 cyclonic storms originating from the Bay of Bengal during 1981–2018. The frequent cyclonic storms caused great devastation of mangrove species. Previous studies quantifying changes in mangrove forest mapping mostly used traditional optical and SAR RS approaches owing to the unavailability of fast processing systems. With the recent free availability of the Google Earth Engine cloud platform, which enables fast processing of SAR data on a near real-time basis, we have examined the suitability of radar-based RS techniques for Sundarbans mangrove forest change detection at different temporal scales. The platform is also used to analyze radar images to estimate above-ground biomass and the effect of cyclone Bulbul on mangrove forests in the Sundarbans.
2 Materials and Methods 2.1 Study Area The Sundarbans mangrove region is formed by the sedimented deltaic plain of the Ganges, Brahmaputra, and Meghna rivers, situated in India and Bangladesh in the vicinity of the Bay of Bengal (Fig. 1). The study area is located between 21°31′–22°30′ N latitude and 88°10′–89°51′ E longitude, covering an estimated area of 187.65 sq km. The elevation of the forest area varies between 0.9 and 2.1 m above mean sea level [24]. Sundarban covers an area of about 10,000 km², 40% of which is located
Fig. 1 Study area map of Sundarban
in India [25]. The dominating tree species of the mangrove forest in the study area are Sundari, Goran, Gewa, Passur, Keora, Kankra, and Baen [10]. The Sundarbans support exceptional biodiversity with a wide range of flora and fauna, including 453 species of wildlife fauna, 290 species of birds, 120 species of fishes, 53 reptiles, 49 species of mammals and 8 amphibian species. The flora includes 17 pteridophytes, 87 monocotyledons, and the rest are dicotyledons, including 35 legumes, 29 kinds of grass, 19 sedges, and 18 euphorbias. The average monthly temperature ranges from 12 to 35 °C, and the average rainfall is about 700 mm/year; 80% of the total precipitation in the region occurs during the monsoon season from June to October. Relative humidity varies between 70 and 80%.
2.2 Image Processing
2.2.1 Data Description
Different types of freely available open-source spatial data were used to quantify Sundarbans mangrove forest change detection in detail through the GEE cloud platform (Table 1). Firstly, the Global Mangrove Watch (GMW) 2010 vector data, built from Japanese L-band synthetic aperture radar (SAR), was acquired from https://data.unep-wcmc.org/datasets. Secondly, global Japanese Earth Resources Satellite (JERS-1), ALOS PALSAR-1, and ALOS PALSAR-2 mosaic tiles (L-band, 25 m resolution data products) were obtained from JAXA EORC. The ENVI-format images were opened in the Sentinel Toolbox and converted to GeoTiff format using the Import option. Global yearly mosaic images were available annually for 1996, 2007, 2010, and 2017 for the HH and HV polarization backscattering coefficients in 16-bit DN. The 2010 mosaic was designated as the reference year because it is the most complete in terms of spatial coverage and temporal consistency; a mosaic was therefore created from the four JERS-1 images uploaded to the GEE code editor. Thirdly, a speckle filter was applied to the PALSAR and JERS-1 images before converting them to dB using the following Eqs. 1 and 2 [4]:

Sigma naught σ⁰ (dB) = 10 · log₁₀(pixel value)² + Calibration Factor   (1)

where Calibration Factor = −83 for the PALSAR images and −84.66 for JERS-1 satellite images. Converting the images to dB:

σ⁰ = 10 · log₁₀(DN²) + CF   (2)

where CF is the calibration factor, CF = −83.0 for PALSAR and CF = −84.66 for JERS [26].
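A minimal sketch of this conversion using the Earth Engine Python API, assuming the yearly PALSAR mosaic asset listed in Table 1 and a simple focal-median speckle filter; the asset ID, filter choice, and AOI are illustrative rather than the authors' exact script.

```python
import ee
ee.Initialize()

aoi = ee.Geometry.Rectangle([88.0, 21.5, 89.9, 22.5])  # approximate Sundarbans extent (assumed)

def palsar_to_db(year, band, cf=-83.0):
    """Load a yearly ALOS PALSAR mosaic band (16-bit DN) and convert it to dB."""
    img = ee.Image(f'JAXA/ALOS/PALSAR/YEARLY/SAR/{year}').select(band).clip(aoi)
    # Simple speckle suppression before calibration.
    img = img.focal_median(radius=1.5, kernelType='circle', units='pixels')
    # sigma0 (dB) = 10 * log10(DN^2) + CF   (Eqs. 1-2)
    return img.pow(2).log10().multiply(10).add(cf).rename(f'{band}_{year}_dB')

hh_2017 = palsar_to_db(2017, 'HH')
hv_2017 = palsar_to_db(2017, 'HV')
```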
Table 1 Description of data sources used for processing mangrove forest changes

| SL. No. | Data type | Description | Source |
|---|---|---|---|
| 1 | The Global Mangrove Watch (GMW) yearly | Vector (polygon; .shp), WMS 2.0, 0.8 arc seconds, WGS 1984 | https://data.unep-wcmc.org/datasets |
| 2 | ALOS PALSAR Global Yearly Mosaics 1996, 2007, 2010, 2017 | JAXA/ALOS/PALSAR/YEARLY/SAR, L-Band, 25-m resolution | https://developers.google.com/earth-engine/datasets/catalog/JAXA_ALOS_PALSAR_YEARLY_SAR#citations |
| 3 | SRTM | USGS/SRTMGL1_003, 30 m DEM | https://developers.google.com/earth-engine/datasets/catalog/USGS_SRTMGL1_003 |
| 4 | Sentinel-1A SAR | COPERNICUS/S1_GRD, MultiSpectral Instrument, Level-1C, 10 m resolution | https://developers.google.com/earth-engine/datasets/catalog/COPERNICUS_S1_GRD |
| 5 | Sentinel-2B MSI | COPERNICUS/S2, 10-m resolution | https://developers.google.com/earth-engine/datasets/catalog/COPERNICUS_S2#description |
Calculate the ratio between images from two different dates within 1996–2017 and 2007–2017 to estimate the threshold value. Start with the standard deviation and adjust it according to the desired results by creating a variable called reducers that combines the mean and standard deviation [4]. Lucas et al. [27] showed that maps of gains and losses can be derived from mangrove AGB and canopy height using L-band backscatter and the Shuttle Radar Topography Mission (SRTM). Dual-polarization ALOS PALSAR Fine Beam Dual (FBD) data were also used to generate the Radar Forest Degradation Index (RFDI). Generally, RFDI is useful for biomass gain and loss estimation, forest change detection, and assessing recovery from disturbances caused by different anthropogenic or natural activities, using quad-pol or dual-pol backscatter [28]; Eq. 3:

RFDI = (γ⁰_HH − γ⁰_HV) / (γ⁰_HH + γ⁰_HV)   (3)

where γ⁰ denotes the geometrically and radiometrically corrected SAR backscattering coefficient for each individual polarization combination, HH represents horizontal transmit and horizontal receive, and HV denotes horizontal transmit and vertical receive.
The RFDI value ranges from 0 to 1, and the HH backscatter value is greater than the HV backscatter value, even for complex terrain structure. An RFDI value of >0.3 represents dense forests, >0.4 denotes degraded forests, and >0.6 represents deforested areas. Inundated or degraded forests are characterized by a large difference between the HV and HH values.
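A minimal sketch of the RFDI computation (Eq. 3) from the calibrated HH and HV layers of the earlier sketch; hh_2017 and hv_2017 are the assumed variable names from that sketch, and the dB layers are converted back to linear power before taking the ratio, since the index is defined on backscatter power.

```python
import ee
ee.Initialize()

def db_to_power(img):
    """Convert a backscatter image from dB back to linear power: p = 10^(dB/10)."""
    return ee.Image(10).pow(img.divide(10))

# hh_2017 and hv_2017 are the calibrated dB images from the previous sketch (assumed names).
hh_pow = db_to_power(hh_2017)
hv_pow = db_to_power(hv_2017)

# Eq. 3: RFDI = (HH - HV) / (HH + HV), computed per pixel.
rfdi = hh_pow.subtract(hv_pow).divide(hh_pow.add(hv_pow)).rename('RFDI')
```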
2.3 Loss and Gain Map The mangrove loss and gain maps were estimated by uploading the study-area shapefiles through the "Assets" tab of the GEE code editor. The global mangrove distribution vector file was then imported as an image. The PALSAR data and the SRTM 30-m DEM (USGS/SRTMGL1_003) were loaded, and thresholds based on the mean and standard deviation were applied to estimate the polygons showing loss and gain in the map for the selected epoch.
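A minimal sketch of loss detection by ratio-and-threshold, reusing the palsar_to_db helper and AOI assumed in the earlier sketch and the GMW 2010 mangrove extent as the mask; the threshold rule (mean plus one standard deviation of the HV ratio) follows the description above, and the asset path for the uploaded GMW shapefile is a placeholder.

```python
import ee
ee.Initialize()

aoi = ee.Geometry.Rectangle([88.0, 21.5, 89.9, 22.5])            # assumed study-area extent
gmw_2010 = ee.FeatureCollection('users/your_account/GMW_2010')   # placeholder path to the uploaded GMW layer

# HV mosaics for the short-term epoch (the 1996 JERS-1 mosaic would be a user-uploaded asset instead).
hv_2007 = palsar_to_db(2007, 'HV')   # helper from the earlier sketch (assumed)
hv_2017 = palsar_to_db(2017, 'HV')

# Epoch ratio (difference in dB), restricted to the 2010 mangrove extent.
ratio = hv_2007.subtract(hv_2017).rename('ratio').clipToCollection(gmw_2010)

# Mean and standard deviation of the ratio over the study area.
stats = ratio.reduceRegion(
    reducer=ee.Reducer.mean().combine(ee.Reducer.stdDev(), sharedInputs=True),
    geometry=aoi, scale=25, maxPixels=1e13)
mean = ee.Number(stats.get('ratio_mean'))
std = ee.Number(stats.get('ratio_stdDev'))

# Pixels whose ratio exceeds mean + 1*std are flagged as loss; the multiplier is tuned by inspection.
loss_mask = ratio.gt(mean.add(std))
```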
2.3.1 Mangrove Biomass SRTM Product
Radar interferometry is very useful for mapping mangrove canopy height. The SRTM product, used for elevation measurement, records the interaction between the radar microwaves and the canopy volume and is therefore influenced by forest canopy height and density [29]. We estimated mangrove above-ground biomass using canopy height measured from the SRTM DEM. The SRTM-based elevation represents the basal area-weighted height, or Lorey's height. Mangrove canopy height is directly correlated to SRTM elevation where the topography above mean sea level is insignificant. We used the SRTM 30 m resolution global digital elevation model (DEM) together with the derived global mangrove extent map for masking and categorizing mangrove and non-mangrove areas in the SRTM elevation data set [30]. SRTM elevation values extending from 0 to 55 m above MSL were used as a mask to eliminate some areas misleadingly recognized as mangrove areas. Then an allometric equation was directly applied to the DEM. There are several generic equations relating SRTM elevation to canopy height and above-ground biomass for mangroves (Eq. 4):

Basal area weighted height: Hba ≈ 1.08 × SRTM
Maximum canopy height: Hmax ≈ 0.93 × 1.7 × SRTM
Above-ground biomass: B ≈ 3.25 × Hba^1.53   (4)

Above-ground Biomass (AGB) = 3.25 × (1.08 × SRTM)^1.53
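A minimal sketch of applying Eq. 4 to the SRTM DEM in the Earth Engine Python API, masking elevations outside the 0–55 m range described above; the mangrove-extent asset path and variable names are illustrative.

```python
import ee
ee.Initialize()

srtm = ee.Image('USGS/SRTMGL1_003').select('elevation')
gmw_2010 = ee.FeatureCollection('users/your_account/GMW_2010')  # placeholder mangrove extent

# Keep only plausible mangrove elevations (0-55 m above MSL) inside the mangrove extent.
masked = srtm.updateMask(srtm.gte(0).And(srtm.lte(55))).clipToCollection(gmw_2010)

# Eq. 4: AGB = 3.25 * (1.08 * SRTM)^1.53 (Mg/ha), via basal-area weighted height Hba = 1.08 * SRTM.
hba = masked.multiply(1.08)
agb = hba.pow(1.53).multiply(3.25).rename('AGB')
```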
2.4 Sentinel-1 SAR Data and Pre-processing The Sentinel-1 satellite was launched and is operated by the European Space Agency (ESA), and the data are freely available at https://earthengine.google.com. The Sentinel-1
mission provides data from a dual-polarization C-band Synthetic Aperture Radar (SAR) instrument. In this study, we used the IW mode, which is provided in dual polarization with vertical transmit, vertical receive (VV) and vertical transmit, horizontal receive (VH), calibrated and ortho-corrected. The backscatter coefficient (σ°) in dB was obtained from processed Level-1 Ground Range Detected (GRD) data [31]. The spatial resolution of this imagery is 10 × 10 m. These SAR data were accessed through Google Earth Engine (GEE). Earth Engine pre-processes the Sentinel-1 data with the Sentinel-1 Toolbox to derive the backscatter coefficient in each pixel. To quantify changes before and after the Bulbul cyclone, S1 mosaic data acquired during 01–07 and 13–20 November 2019, respectively, were used.
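A minimal sketch of assembling the pre- and post-cyclone VH mosaics and their difference with the Earth Engine Python API, using the acquisition windows stated above; the AOI and the choice of a simple median mosaic are assumptions.

```python
import ee
ee.Initialize()

aoi = ee.Geometry.Rectangle([88.0, 21.5, 89.9, 22.5])  # assumed Sundarbans extent

def s1_vh(start, end):
    """Median VH backscatter (dB) mosaic of Sentinel-1 IW GRD scenes over the AOI."""
    col = (ee.ImageCollection('COPERNICUS/S1_GRD')
           .filterBounds(aoi)
           .filterDate(start, end)
           .filter(ee.Filter.eq('instrumentMode', 'IW'))
           .filter(ee.Filter.listContains('transmitterReceiverPolarisation', 'VH'))
           .select('VH'))
    return col.median().clip(aoi)

before = s1_vh('2019-11-01', '2019-11-08')   # pre-cyclone window (01-07 Nov 2019)
after = s1_vh('2019-11-13', '2019-11-21')    # post-cyclone window (13-20 Nov 2019)

difference = after.subtract(before).rename('VH_diff')   # change in backscatter after Bulbul
```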
2.5 Sentinel-2 Image Processing In this research work, optical Sentinel-2B MSI (Multi-Spectral Instrument) Level-1C data were used for Normalized Difference Vegetation Index (NDVI) analysis. All the S2 data used were obtained from the European Union/European Space Agency (ESA)/Copernicus programme through the GEE platform. The sensing orbit direction was descending, with an orbit number of 33. A cloud-masking function based on the built-in quality band [32] was used. S2 data were collected during 06–30 April 2017, which is the peak vegetative growth season of the mangroves in the study area.
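A minimal sketch of the quality-band cloud masking and NDVI computation in the Earth Engine Python API over the stated April 2017 window; the AOI and the median composite are assumptions.

```python
import ee
ee.Initialize()

aoi = ee.Geometry.Rectangle([88.0, 21.5, 89.9, 22.5])  # assumed Sundarbans extent

def mask_s2_clouds(img):
    """Mask clouds and cirrus using the QA60 quality band (bits 10 and 11)."""
    qa = img.select('QA60')
    mask = qa.bitwiseAnd(1 << 10).eq(0).And(qa.bitwiseAnd(1 << 11).eq(0))
    return img.updateMask(mask)

s2 = (ee.ImageCollection('COPERNICUS/S2')
      .filterBounds(aoi)
      .filterDate('2017-04-06', '2017-05-01')
      .map(mask_s2_clouds))

# NDVI = (NIR - Red) / (NIR + Red) = (B8 - B4) / (B8 + B4)
ndvi = s2.median().normalizedDifference(['B8', 'B4']).rename('NDVI').clip(aoi)
```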
3 Results The mangrove forest above-ground biomass (AGB) was estimated using L-band ALOS PALSAR in the GEE cloud computing platform at the regional level for West Bengal, India. The forest gain/loss change detection status was mapped for the period 1996–2017. The study demonstrated the potential use of L-band PALSAR backscatter information for the reliable estimation of spatial AGB in the tropical region (Fig. 2). Lucas et al. [19] generated AGB maps based on mangrove species structure and height through RADAR backscatter data. High HH backscatter varies between −0.79 and 5.04 dB during 1996–2017, whereas the HV backscatter ranges from −10.24 to −8.63 dB during 2007–2017 (Fig. 2). The north-eastern part of the region shows high HH and HV backscatter due to dense mangrove cover compared to the western part.
Fig. 2 Backscatter map a HH map 1996 b HH map 2007, c HV map 2007, d HH map 2017, e HV map 2017
3.1 Loss and Gain Map This study employed the GEE to classify and estimate the area of mangrove loss and gain during 1996–2017 (Fig. 3). The Global Mangrove Watch (GMW) 2010 vector map was used as the reference map in the study. The GMW may play supportive roles in
Fig. 3 Vegetation ratio and loss/gain map (a) ratio of HH backscatter map 1996/2017 (b) ratio of HV backscatter map 2007/2017 (c) loss area map 1996–2017, (d) loss area map 2007–2017, (e) gain area map 1996–2017, (f) gain area map 2007–2017
Fig. 4 Mangrove (a) NDVI and (b) RFDI map for 2017
forming national and international policies for long-term management of mangrove ecosystems, ensuring sound societal benefits [33]. ALOS PALSAR HH and HV backscatter ratio images were also estimated for 1996 to 2017 (Fig. 3a, b) [34]. The NDVI and RFDI maps of 2017 quantify the vegetation damage and the growth extent of mangrove in the study area (Fig. 4a, b). The NDVI and RFDI maps were generated from Sentinel optical and SAR images for the year of interest, with the area of interest (AOI) masked. The radar forest degradation index (RFDI) describes the mangrove type, orientation, structure, biomass, and forest landscape dynamics. In the northern part of the mangrove region, the higher RFDI values (around 0.013) indicated that forest degradation takes place due to various long-term climatic or anthropogenic interferences. Further analysis was made by relating NDVI and RFDI to different environmental conditions such as moisture and phenology distribution. The NDVI value in this region varies from −0.24 to 0.56, and the RFDI value differs between −2.72 and 0.01. The amount of rainfall therefore influences the spectral reflectance of the leaves and the corresponding NDVI outcomes in the wet periods of these years in the Sundarbans area. The long-term mangrove loss area was found to be 1.12 sq. km during 1996–2017, whereas the corresponding mean value is 10.46 sq. km. Similarly, the short-term mangrove loss and gain areas are 0.78 and 0.25 sq. km, respectively, during 2007–2017. The current multi-temporal ALOS PALSAR data investigation found that the study area gained 3.07 sq. km of mangrove area in the 1996–2017 epoch, and 0.25 sq. km of mangrove area was gained in the 2007–2017 epoch. The average no-change mangrove area is positive during 1996–2017 compared to 2007–2017. The mangrove loss and gain areas were mapped with the help of the thresholding approach.
3.2 AGB Maps The mangrove AGB map and the mangrove SRTM elevation map for the entire Sundarbans region are shown in Fig. 5. The mangrove AGB maps were generated using ALOS PALSAR L-band backscatter through GEE cloud computing. The densest mangrove elevation, as well as AGB, was found in the eastern region of the study area.
Fig. 5 Sundarban mangrove forest (a) Mangrove SRTM elevation map; (b) AGB map
Table 2 Mangrove forest loss and gain change area statistics

|  | 1996–2017 | 2007–2017 |
|---|---|---|
| Loss | | |
| Loss area (sq km) | 1.12 | 0.78 |
| No loss area (sq km) | 186.53 | 186.87 |
| Threshold value | 8.70 | 9.40 |
| Mean | 10.46 | 13.20 |
| SD | 1.63 | 2.27 |
| Gain | | |
| Gain area (sq km) | 3.07 | 0.25 |
| No gain area (sq km) | 184.58 | 187.40 |
| Threshold value | −4.50 | −6.40 |
| Mean | −10.14 | −11.20 |
| SD | 2.26 | 2.06 |
| No change | | |
| Mean | 0.97 | −0.09 |
| SD | 1.44 | 1.07 |
AGB ranges from 30.2 to 700 Mg·ha−1 and mangrove elevation varies from −16 to 31 m in the study area. In the Sundarbans region, the average biomass concentration is limited to 93.7 ± 33.0 Mg·ha−1 [35]. The distribution of mangrove forest biomass concentration identified from SRTM data in the GEE cloud shows a similar pattern (Fig. 5). This established the NDVI as a leading tool for identifying disturbances in mangrove vegetation cover (Table 2).
3.3 Bulbul Cyclone We assessed the before and after VH and VV backscatter values of S1 for 25 field locations in the mangrove forest region for the Bulbul cyclone event (Table 3). The highest difference in VV backscatter value was 1.290 dB, located in the Herobhanga area. Similarly, the highest difference in VH backscatter value, 1.118 dB, was found at the Dattar region. Low post-cyclone backscattering signatures of −12.43 dB (VH polarization) at the Jhila region and −6.97 dB (VV polarization) at the Herobhanga region were found, respectively. This is caused by the damage to the forest stand height. Before the Bulbul cyclone the VH backscatter varied between −40.79 and −9.91 dB,
Location
Bob I
Netidhopani
Sajnekhali
Dattar
Jhila
Burirdabri
Harinbhanga
Chaimari
Khatuajhuri
Jhingekhali
Chamta
Sundarban RF
Kalash
Haldibari
Gona
Mechua
Plot ID
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
21.628194
21.641043
21.639615
21.59964
21.808081
21.870899
22.155007
22.053642
21.605351
22.006528
22.086478
22.155007
22.082195
22.117887
21.916585
21.57537
Latitude
88.969156
88.847804
88.725024
88.570835
88.860653
88.949169
88.940603
88.996282
88.716458
88.959163
88.99771
88.940603
88.894917
88.81925
88.723596
88.867791
Longitude
0.938
0.919
1.281
1.113
0.963
0.939
1.194
0.989
0.946
0.972
1.096
1.194
0.946
0.783
1.142
0.981
Diffrence VV
Table 3 S1 backscatter value for bulbul cyclone in ground mangrove location
0.857
0.907
0.960
0.994
0.960
0.939
0.881
0.891
1.079
0.927
1.031
0.881
1.118
0.983
1.033
0.951
Diffrence VH
−8.040 −8.719 −7.717 −7.893 −9.538
−15.210 −13.622 −12.434 −13.461 −14.585
−8.949
−9.225
−12.839
−12.942
−7.839
−13.700
−7.945
−7.717
−12.434
−14.004
−8.592
−15.106
−8.415
−6.994
−13.922
−9.562
−8.119
−15.234
−14.405
−9.718
−14.461
−13.647
After VV
After VH
−15.102
−15.434
−14.216
−14.492
−15.187
−14.340
−14.118
−15.281
−14.097
−13.843
−13.288
−14.118
−13.507
−14.167
−14.740
−15.212
Before VH
(continued)
−9.545
−8.645
−7.467
−7.562
−9.908
−8.402
−6.463
−8.814
−8.497
−9.490
−7.151
−6.463
−9.087
−8.927
−7.112
−9.906
Before VV
Location
Baghmara
Goashaba
Dobanki
Arbesi
Pancha Mukhani
Chandkhali
Matla
Choto Hardi
Herobhanga
Plot ID
17
18
19
20
21
22
23
24
25
Table 3 (continued)
21.993679
21.720993
21.869471
21.83949
21.992252
22.135019
21.993679
21.739553
21.681245
Latitude
88.669344
88.717885
88.737873
88.990571
88.870647
89.027691
88.762143
88.803546
89.06935
Longitude
1.290
1.045
1.162
0.989
0.783
1.010
1.088
1.233
1.187
Diffrence VV
1.053
0.982
0.987
0.912
0.946
1.005
0.965
0.928
1.000
Diffrence VH
−9.293 −9.816 −6.978
−14.268 −12.891
−9.626
−13.543 −14.788
−8.874
−8.035
−14.630 −8.496
−8.118
−13.389 −14.026
−8.655
−13.603
−13.479
After VV
After VH
−12.247
−15.053
−14.458
−14.844
−14.243
−13.963
−15.155
−14.433
−13.597
Before VH
−5.407
−9.397
−7.997
−9.736
−10.849
−8.782
−7.382
−6.586
−7.289
Before VV
Fig. 6 Bulbul cyclone S1 backscatter map (a) before VH, (b) before VV (c) after VH (d) after VV, (e) difference VH (f) difference VV (g) ratio VH (h) ratio VV
where the VV backscatter ranges from −28.18 to −3.33 dB (Fig. 6). After the Bulbul cyclone the VH backscatter varies between −42.18 and −9.26 dB, whereas the VV backscatter ranges from −31.80 to −1.01 dB (Fig. 6). The ratio of the VH values of the S1 backscatter ranged between −14.88 and 9.48 dB, and the ratio of the VV values ranged between −14.50 and 13.69 dB (Fig. 6).
4 Discussion Mangrove forest acts as a protective wall against cyclones, tides, tsunamis and other threats arising from the sea. It helps in reducing soil erosion in the coastal zones with its net-like rooting system. The rich biodiversity of the Sundarbans mangrove provides fish and fauna production and livelihoods to the tribal communities, along with environmental protection. A number of tourists visit these areas to spend quality time with nature, through boat rides, night stays, etc., providing livelihoods to the youth and locals. The current research used multi-temporal RADAR imagery to quantify changes in the Sundarbans mangrove forest through the GEE cloud computing approach. Quantifying changes in mangrove forest mapping on a real-time basis is a most constructive
method that can assist decision-makers in proper inspection for ecological restoration. Radar remote sensing is an influential tool for monitoring mangrove extent and mapping wide-ranging structural qualities. It allows forest cover changes to be detected regardless of cloud cover. The GEE cloud computing technique is a very good alternative approach for the rapid investigation of mangrove preservation, given the frequent availability of low-cost SAR data. This research will support future decisions for other mangrove ecosystems in the world. The extent of the loss of the Sundarbans mangroves estimated through RADAR images in GEE cloud computing was around 1.12 sq. km during 1996–2017. It was also found that there was higher vegetation gain during 1996–2017 than during 2007–2017. The remote sensing guided management tool was found to be a very reliable and efficient technique for continuous mapping and monitoring. Considering change detection of the mangrove forests from a long-term viewpoint, such occasional yet disastrous cyclones may have important consequences for species composition and regeneration in the Sundarbans region [23]. The importance of mangrove is well documented. Since 2016, July 26 has been observed as World Mangrove Day to protect mangroves from external calamities. This initiative promotes protection, restoration and controlled use of mangrove in the coastal belt. Sustainable uses include ecotourism, fish farming, bee keeping, etc., to raise livelihoods and discourage deforestation by the local people. The outcomes of this study will support long-term preservation of mangrove forests considering the multifaceted threats, including natural disasters and different anthropogenic activities [36]. This will promote restoration activities which need to be carried out after natural calamities in the Sundarbans and other mangrove regions. The lack of field reference data as well as forest biomass information is the major limitation for accurate measurement of the damaged forest immediately after the cyclone, and we were unable to collect such data for the post-cyclone Bulbul event. Local bulletin reports and field photographs verified the forest damage and the affected area. It is challenging to quantify the forest area changes due to flood inundation and coastal deformation caused by tidal surge activities. The temporal extent of the GMW layers has methodological limitations for mangrove area estimation from 1984 to 1996. Current ALOS PALSAR RADAR data are costly for near real-time monitoring of the forest ecosystem. The current research is successful at the local scale, so this method can also be carried out on a global scale.
5 Conclusion Cyclone storms threaten the flora and fauna of the mangrove region. This study estimated the above-ground biomass of the Sundarbans mangrove forests with the ALOS PALSAR RADAR images for both the long-term and short-term approaches. The densest mangrove elevation, as well as AGB, was found in the eastern region of the study area. The average AGB ranges from 30.2 to 700 Mg·ha−1 and the mangrove elevation varies from −16 to 31 m in the study area.
The cyclone destruction of vegetation was measured essentially from the Sentinel-1 backscatter values before and after the cyclone. This study has quantified the extent of the pre- and post-cyclone Bulbul effects on the Sundarbans by applying RS techniques to rapidly and easily available S1 imagery in the Google Earth Engine platform. Trees with defoliated canopies or fragmented stems as a consequence of strong winds can contribute to the reduction in backscatter values after cyclones. After cyclone Bulbul, the VH backscatter varies between −42.18 and −9.26 dB, whereas the VV backscatter ranges from −31.80 to −1.01 dB. Further research should include a higher number of conditioning factors responsible for the impact on the mangrove forest, such as topography, climatic variables, hydrology, oceanic activity and anthropogenic interference, through advanced AI techniques, LiDAR, UAV, ICESat-2 and RS data cube technology for policy construction and decision-making analysis. Acknowledgements The authors hereby acknowledge the contribution of Visva-Bharati (A Central University), West Bengal, India for facilitating this research work.
References 1. Mcleod E et al (2011) A blueprint for blue carbon: toward an improved understanding of the role of vegetated coastal habitats in sequestering CO2 . Front Ecol Environ 9(10):552–560. https://doi.org/10.1890/110004 2. Alongi DM (2008) Mangrove forests: Resilience, protection from tsunamis, and responses to global climate change. Estuar Coast Shelf Sci 76:1–13 3. Vo QT, Kuenzer C, Vo QM, Moder F, Oppelt N (2012) Review of valuation methods for mangrove ecosystem services. Ecol Ind 23:431–446. https://doi.org/10.1016/J.ECOLIND. 2012.04.022 4. Simard M (2019) Radar remote sensing of mangrove forests. In: SAR handbook: comprehensive methodologies for forest monitoring and biomass estimation (eds.) Flores A, Herndon K, Thapa R, Cherrington E NASA 2019. https://doi.org/10.25966/33zm-x271 5. Hutchison J, Manica A, Swetnam R, Balmford A, Spalding M (2014) Predicting global patterns in mangrove forest biomass. Conserv Lett 7:233–240 6. Richards DR, Friess DA (2016) Rates and drivers of mangrove deforestation in Southeast Asia, 2000–2012. Proc Natl Acad Sci USA 113:344–349 7. Sasmito SD, Murdiyarso D, Friess D, Kurniato S (2016) Can mangroves keep pace with contemporary sea level rise? A global data review. Wetlands Ecol Manage 24:263–278 8. Hamilton SE, Casey D (2016) Creation of a high spatio-temporal resolution global database of continuous mangrove forest cover for the 21st century (CGMFC-21). Glob Ecol Biogeogr 25:729–738 9. FSI (2019) India State of Forest Report 2019, Forest Survey of India, (Ministry of Environment Forest and Climate Change), Dehradun, India. https://fsi.nic.in/isfr-volume-i?pgID=isfr-vol ume-i, Accessed 21 Nov 2020 10. FAO (2007) The World’s Mangrove 1980–2005 (2007), FAO Forestry Paper 153, Food and Agricultural Organization of the UN, Rome, Italy. http://www.fao.org/forestry/95632/en/. Accessed 21 Dec 2020 11. Islam MM, Borgqvist H, Kumar L (2019) Monitoring mangrove forest landcover changes in the coastline of Bangladesh from 1976 to 2015. Geocarto Int 34(13):1458–1476. https://doi. org/10.1080/10106049.2018.1489423
12. Rahman MR, Hossain MB (2015) Changes in land use pattern at Chakaria Sundarbans mangrove forest in Bangladesh. Bangladesh Res Publ J 11(1):13–20. ISSN:1998-2003 13. Sulaiman NA, Ruslan FA, Tarmizi NM, Hashim KA, Samad AM (2013) Mangrove forest changes analysis along Klang coastal using remote sensing technique. In: Proceedings of the IEEE 3rd International conference on system engineering and technology (ICSET); Aug 19–20; Malaysia (Shah Alam): IEEE, pp. 307–312 14. Al-Amin Hoque M, Phinn S, Roelfsema C, Childs I (2016) Assessing tropical cyclone impacts using object-based moderate spatial resolution image analysis: a case study in Bangladesh. Int J Remote Sens 37(22):5320–5343. https://doi.org/10.1080/01431161.2016.1239286 15. Bhargava R, Sarkar D, Friess DA (2021) A cloud computing-based approach to mapping mangrove erosion and progradation: case studies from the Sundarbans and French Guiana. Estuarine Coastal Shelf Sci 248:106798. https://doi.org/10.1016/j.ecss.2020.106798 16. Yancho JMM, Jones TG, Gandhi SR, Ferster C, Lin A, Glass L (2020) The google earth engine mangrove mapping methodology (GEEMMM). Remote Sens 12(22):3758. https://doi.org/10. 3390/RS12223758 17. Mondal P, Liu X, Fatoyinbo TE, Lagomasino D (2019) Evaluating combinations of sentinel-2 data and machine-learning algorithms for mangrove mapping in West Africa. Remote Sens 11:2928. https://doi.org/10.3390/rs11242928 18. Zhu Y, Liu K, Liu L, Wang S, Liu H (2015) Retrieval of mangrove aboveground biomass at the individual species level with worldview-2 images. Remote Sens 7(9):12192–12214 19. Lucas RM, Mitchell AL, Rosenqvist A, Proisy C, Melius A, Ticehurst C (2007) The potential of L-band SAR for quantifying mangrove characteristics and change: case studies from the tropics. Aquat Conserv Mar Freshwat Ecosyst 17:245–264. https://doi.org/10.1002/AQC.833 20. Pham TD, Yoshino K (2017) Aboveground biomass estimation of mangrove species using ALOS-2 PALSAR imagery in Hai Phong City Vietnam. J Appl Rem Sens 11(2):026010. https://doi.org/10.1117/1.jrs.11.026010 21. Darmawan S, Takeuchi W, Vetrita Y, Wikantika K, Sari DK (2015) Impact of topography and tidal height on ALOS PALSAR polarimetric measurements to estimate aboveground biomass of Mangrove forest in Indonesia. J Sens 2015:1–13. https://doi.org/10.1155/2015/641798 22. Swain KC, Singha C, Nayak L (2020) Flood susceptibility mapping through the GIS-ahp technique using the cloud. ISPRS Int J Geo-Inf 9(12):720. https://doi.org/10.3390/ijgi9120720 23. Mandal MSH, Hosaka T (2020) Assessing cyclone disturbances (1988–2016) in the Sundarbans mangrove forests using landsat and google earth engine. Nat Hazards 102:133–150. https:// doi.org/10.1007/s11069-020-03914-z 24. Rahman LM (2000) The Sundarbans: a unique wilderness of the world. In: USDA forest service proceedings RMRS-P-15(2), pp. 143–148 25. WCMC (UNEP World Conservation Monitoring Centre) (2005) Protected Areas Database. http://www.wcmc.org.uk/data/database/un_combo.html. Accessed 21 Dec 2020 26. JAXA (2014) ALOS-2/PALSAR-2 Level 1.1/1.5/2.1/3.1 CEOS SAR Product, Japan Aerospace Exploration Agency, Tokyo. https://www.eorc.jaxa.jp/ALOS/en/palsar_fnf/data/index.htm. Accessed 2 Oct 2020 27. Lucas R, Rebelo LM, Fatoyinbo L, Rosenqvist A, Itoh T, Shimada M, Hilarides L (2014) Contribution of L-band SAR to systematic global mangrove monitoring. Mar Freshw Res 65(7):589. https://doi.org/10.1071/mf13177 28. Saatchi S (2019) SAR Methods for mapping and monitoring forest biomass. 
In: SAR handbook: comprehensive methodologies for forest monitoring and biomass estimation (eds.) Flores A, Herndon K, Thapa R, Cherrington E. NASA. https://doi.org/10.25966/hbm1-ej07 29. Trettin CC, Stringer CE, Zarnoch SJ (2016) Composition, biomass and structure of mangroves within the Zambezi River Delta. Wetlands Ecol Manage 24(2):173–186 30. Giri C et al (2011) Status and distribution of mangrove forests of the world using earth observation satellite data. Glob Ecol Biogeogr 20:154–159 31. ESA (2019) User Guides - Sentinel-1 SAR - Level-1 - Sentinel Online. https://sentinel.esa.int/ web/sentinel/user-guides/sentinel-1-sar/producttypes-processing-levels/level-1. Accessed 23 Dec 2020
32. Radoux J, Chome G, Jacques DC, Matton N et al (2016) Sentinel-2’s potential for sub-pixel landscape feature detection. Remote Sens 8(6):488 33. Bunting P, Rosenqvist A, Lucas RM, Rebelo LM, Hilarides L, Thomas N, Hardy A, Itoh T, Shimada M, Finlayson CM (2018) The global mangrove watch—A new 2010 global baseline of mangrove extent. Remote Sens 10:1669 34. Zhen J, Liao J, Shen G (2018) Mapping mangrove forests of dongzhaigang nature reserve in china using landsat 8 and radarsat-2 polarimetric SAR data. Sensors 18(11):4012. https://doi. org/10.3390/s18114012 35. Ray R et al (2011) Carbon sequestration and annual increase of carbon stock in a mangrove forest. Atmos Environ 45:5016–5024 36. Singha C, Swain KC, Sahoo BB, Ghosh P, Swain SK (2019) Assessment of bio diversity conservation using geospatial models. J Pharma Phytochem 8(1):1177–1186
Metaheuristics and Hyper-heuristics Based on Evolutionary Algorithms for Software Integration Testing Valdivino Alexandre de Santiago Júnior and Camila Pereira Sales
1 Introduction Evolutionary algorithms (EAs), such as genetic algorithms, particle swarm optimisation and simulated annealing, are some examples of metaheuristics which have long been employed for solving non-trivial optimisation problems, given their ability to obtain the best/most suitable solutions in a relatively small amount of time, even when they face very large problem sizes [1]. Despite their successes, researchers state that metaheuristics still cannot be straightforwardly applied to new optimisation problems with no or minimal change, or even to new instances of the same problem [2]. The ability of an optimisation algorithm to solve well not only a specific problem but rather a series of distinct problems is a measure of its generalisation. The more general an algorithm is, the better. Hence, according to this perspective, metaheuristics overall have low generalisation capabilities, which motivates the development of hyper-heuristics, higher-level search techniques aiming to be more general [2, 3]. In hyper-heuristics, the search is performed in the space of heuristics (or heuristic components) instead of being performed directly in the decision variable space (space of solutions) [4]. Thus, in principle, hyper-heuristics would be more general than metaheuristics. Even if recent studies have shown better performance of hyper-heuristics compared to metaheuristics [4–7], we believe further experiments are still required in order to better answer the question of generalisation when comparing hyper-heuristics to metaheuristics. This is particularly true when we address discrete and real (not benchmark) non-trivial optimisation problems. V. A. de Santiago Júnior (B) · C. P. Sales Coordenação de Pesquisa Aplicada e Desenvolvimento Tecnológico (COPDT), Instituto Nacional de Pesquisas Espaciais (INPE), Avenida dos Astronautas, 1758, Jardim da Granja, São José dos Campos, SP 12227-010, Brazil e-mail: [email protected] C. P. Sales e-mail: [email protected]
In software engineering, the goal of the testing activity within the software development lifecycle is to detect the maximum number of defects in the software product. Optimisation and software testing are combined in an active subfield called search-based software testing (SBST), where testing a software system is formulated as an optimisation problem [8, 9]. SBST is based on the fact that test objectives can be considered as objective functions, and hence optimisation algorithms can be used to help in this regard. Moreover, integration is a testing level where the emphasis is placed on building the structure of the system. Articles for integration test case generation have already been published [10, 11], but we realised the absence of studies that rely on optimisation methods to generate integration test cases considering C++ source code. Hence, in a previous work, we presented a method, called Software Integration Testing via Metaheuristics and Hyper-heuristics (InMeHy), aimed at generating integration test cases based on C++ source code and metaheuristics [12]. The method creates a directed graph¹ which represents the integration of several files/classes of the application based only on the C++ code. We carried out an experimental evaluation with four metaheuristics where two are multi-objective EAs, Indicator-Based Evolutionary Algorithm (IBEA) [13] and Strength Pareto Evolutionary Algorithm-2 (SPEA2) [14], and two are more recent many-objective approaches, Nondominated Sorting Genetic Algorithm-III (NSGA-III) [15] and Metaheuristic Based on the R2 indicator-II (MOMBI-II) [16]. Three quality indicators were considered to perceive the performance of the algorithms: hypervolume [17], ε-indicator [18], and modified inverted generational distance (IGD+) [19]. In our previous work [12], the value of the decision variable of a solution is an integer that identifies a vertex of the directed graph. In this study, we modify and extend our method so that a decision variable identifies indeed a vertex sequence of a directed walk in the directed graph.² Therefore, we had to change our method and its implementation (tool) to deal with this new definition of the decision variable. This new version of our method is named InMeHy_STF, where STF stands for Solution as Test Suite with Fixed Size of Test Cases (see Sect. 3). Moreover, we also present a (rigorous) controlled experiment to assess the generalisation issue we have mentioned above. We then considered the four metaheuristics we evaluated earlier and added three recent selection hyper-heuristics to the comparison: Hyper-Heuristic based on Reinforcement LearnIng, Balanced Heuristic Selection and Group Decision AccEptance - Responsibility (HRISE_R) and Majority (HRISE_M) rules [4], and the Choice Function hyper-heuristic (HH-CF) [7]. We considered the same three quality indicators of the previous study, and the case studies are two non-trivial C++ geoinformatics applications [12]. Our hypothesis is that hyper-heuristics will perform better than metaheuristics given the previous remarks.
1 We will use the terms "directed graph" and "graph" interchangeably in this article.
2 A sequence differs from a set because repetition of elements is allowed and order matters. A finite or infinite sequence of edges directed in the same direction which joins a sequence of vertices is a directed walk. A directed trail is a directed walk in which all edges are different. Every directed trail is a directed walk but the opposite is not true.
Specifically, the research questions (RQs) we want to answer are: (a) RQ_1 - Which of the seven algorithms is the best regarding each quality indicator? (b) RQ_2 - Is there an algorithm that is clearly superior to all the others considering all quality indicators? (c) RQ_3 - Which of the hyper-heuristics presents the best performance? The main contributions of this article are: (a) We present an extension of our previous method and thus create a new one, InMeHy_STF, to generate integration test cases based only on C++ source code and optimisation algorithms, i.e. metaheuristics and hyper-heuristics; (b) We present a controlled experiment considering two types of evaluations, cross-domain and statistical analyses, and seven algorithms to assess the generalisation issue; (c) We have made available online the tool [20] that implements the InMeHy_STF method, so that researchers and industry professionals can use it. This article is organised as follows. Relevant related studies are shown in Sect. 2. Section 3 presents an overview of our method, emphasising the STF version. In Sect. 4, we describe our controlled experiment, and its results are given in Sect. 5. In Sect. 6, we conclude the article and highlight future directions.
2 Related Work Firstly, we should mention some preliminaries associated with this research. Metaheuristics can be divided into those that are inspired by nature and those that are not. Nature-inspired metaheuristics include swarm intelligence [21] and EAs, such as genetic algorithms [22]. Note that hybrid methods combining machine learning, swarm intelligence, and EAs are promising directions to follow [23]. In this study, we focused on metaheuristics and hyper-heuristics based on EAs addressing the problem of integration test case generation. Some relevant related studies are presented in this section, where we also point out the differences between our research and theirs. With respect to software integration testing, the authors in [24] presented a technique which uses Unified Modelling Language (UML) sequence and state machine diagrams in combination to derive a control-flow graph and then generate integration test cases. However, they do not make use of optimisation algorithms, as we do, to generate integration test cases. In [10], the authors presented a procedure for the automatic generation of integration test data based on genetic algorithms. They represented the behaviour and the interaction of software components via UML state machines enriched by messages sent between the components as well as possible effects of state transitions. They used as
objective functions the maximisation of the interaction coverage and the minimisation of the number of test cases (similar to our size of test suite function). However, they demand the existence of a behaviour diagram, while we only require the source code of the application to generate the integration test cases. Whole Test Suite (WTS) generation is a strategy where, instead of searching for a single test case for each individual coverage goal in sequence, the search problem is changed to a search for a test suite that covers all coverage goals at the same time [25]. The first difference between their work and ours is that they make use of a single objective function value that aggregates the values of all objective functions measured for the test cases contained in a test suite. Moreover, one cannot consider different types of coverage goals (and their respective objective functions) at the same time (e.g. branch and line coverage). Differently, our approach is a multi/many-objective optimisation one where all objective functions are optimised simultaneously. Furthermore, they target Java systems, while in our work we address C++ applications by transforming the source code into a graph, and the objective functions are measures over this graph. In [26], the Dynamic Many-Objective Sorting Algorithm (DynaMOSA) was presented specifically to address the test case generation problem in the context of coverage testing. It is a many-objective approach, and our strategy can also address many-objective problems. As with WTS, their research cannot consider different types of coverage goals at the same time. Like WTS, they address Java, while we target C++ applications. Some recent studies presented experimentation involving evolutionary-based metaheuristics and hyper-heuristics considering continuous benchmark and real-world problems [4, 6, 7]. Others have shown results of experiments with multi-armed bandit-based hyper-heuristics applied to the multi-objective permutation flow shop problem [5]. To the best of our knowledge, with the exception of our previous study [12], no other research has presented a robust evaluation in the context of software integration testing formulated as a discrete optimisation problem as we do in this article.
3 Overview of the InMeHy_STF Method In this section, we present an overview of the InMeHy_STF method. However, we first provide some definitions that were adopted in this study. Definition 1. Abstract test case: An abstract test case is one whose representation does not allow it to be effectively executed against the Software Under Test (SUT). Such abstract test cases serve as a guide for generating the truly executable test cases, i.e. those that can be executed against the SUT obtained by translating the abstract test cases. Definition 2. Decision variable as an abstract test case: A decision variable is one element of a solution. The value of the decision variable of a solution is an integer that
identifies a sequence of vertices related to a directed walk of the directed graph. The graph represents the integration of files of the SUT. Therefore, a decision variable means a sequence of vertices of a directed walk, which is also an abstract test case. Definition 3. Solution as a test suite with fixed size of test cases: A solution is formed by a sequence of decision variables. In our case, a solution of a population created by an optimisation algorithm is indeed a test suite, i.e. a sequence of abstract test cases (a sequence of sequences of vertices of directed walks). The number of abstract test cases contained in the test suite (solution) is fixed in the InMeHy_STF method. As we have already said, in our previous work [12], the value of the decision variable of a solution is an integer that identifies a vertex of the graph (test step). In that case, the number of test cases a test suite (solution) can have is variable, depending on the number of times the terminal vertex of the graph appears in the solution. However, we realised that several abstract test cases created via the previous version of our method were inconsistent. In that situation, the sequence of values of the decision variables within an (abstract) test case must be consistent with the sequence of vertices (edges) of the graph, otherwise an inconsistent test case will be generated. With Definition 2, we handle this problem and an abstract test case is already consistent because it is a sequence of vertices of a directed walk of the directed graph. Hence, the number of test cases a test suite can have is now fixed, which explains the STF nomenclature. Based on the definitions above, we see that we are dealing with discrete optimisation. From this point onward, unless otherwise noted, we will denote an abstract test case simply as a test case for simplicity. Our method consists of six modules which accomplish transformation operations starting with the C++ code, then generating a graph, and finally creating the test cases. As seen in Fig. 1, the six modules are: Collector, Reader, Extractor, Integrator, Constructor, and Generator. Note that the previous and current versions of our method present the same architecture. The difference between both methods lies in the behaviour of the Generator module, which ultimately creates the test cases. When generating test cases for object-oriented programming software, a unit is usually considered one class [25, 26]. Integration testing takes place to expose defects at the interfaces and in the interactions between integrated classes of the system [10]. However, the system may already be developed and in operation, and thus "complete" with all its classes; even so, integration testing can be important to detect unseen, difficult and critical interface defects. In such a situation, one may define one file as a unit, where the integration process starts with a main file defined by the user and goes on considering all other dependencies (methods, classes) to create a unique model. This is precisely our context, and hence the Collector module receives a C++ source code file (.cpp or .hpp) which is considered the main file to start the integration process. One input of our method is the integration level, a parameter that defines how many files will be integrated at a time in order to create a problem instance. The Reader module receives a file as input and outputs a syntax
Fig. 1 The architecture of the InMeHy_STF method. Source: adapted from [12].
tree. The Extractor module obtains the nodes ("vertices") and branches ("edges") of the syntax tree generated in the previous module. After receiving all lists of vertices and edges generated previously, the Integrator module integrates them, returning two lists: one containing all integrated vertices and another containing all integrated edges. Hence, the Constructor module creates a directed graph that represents the integration of files. Finally, the Generator module creates the test cases (sequences of vertices of directed walks) based on the integrated graph derived by the Constructor module. It does so by transforming the graph into an adjacency matrix. This matrix takes into account the execution effort (computational demand) of the instructions. In other words, some instructions demand more execution effort than others. This adjacency matrix has weights which define the execution effort of edges. Algorithm 1 presents the Generator module devised for the STF strategy. Note that the inputs are the optimisation algorithms (Alg), the integrated directed graph (G), and a parameter which limits the maximum number of directed trails to consider (lim). The goal of such a parameter is to deal with scalability issues which are likely to occur with non-trivial applications.

Algorithm 1. Generator - STF
input: Alg, G, lim
output: ∀Popi, I
1: v_init ← getInitialVertex(G)
2: v_fin ← getFinalVertex(G)
3: DW ← solveChinesePostmanProblem(G)
4: DW ← DW ∪ createDirectedTrails(G, v_init, v_fin, lim)
5: for each ai ∈ Alg do
6:   while runs ≤ max_runs do
7:     Popi ← createInitialPopulation()
8:     while iterations ≤ max_iterations do
9:       Popi ← runAlgorithm(ai, Popi, DW)
10:    end while
11:  end while
12: end for
13: TKPF ← createTrueKnownParetoFront(∀Popi)
14: I ← calculateQualityIndicators(TKPF, ∀Popi)
15: return ∀Popi, I
Firstly, we get the initial (v_init) and final (v_fin) vertices of the directed graph G. Directed walks (DW) are obtained in two steps. In the first step, we solve the Chinese postman problem and get a closed walk of minimum length that visits every edge of the graph at least once [27]. In the second step, we create directed trails (each directed trail is also a directed walk) and join them with the directed walks obtained by solving the Chinese postman problem.
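As an illustration of the createDirectedTrails step of Algorithm 1, the following Python sketch enumerates directed trails (walks without repeated edges) from the initial to the final vertex, bounded by the parameter lim; the toy graph and any names not appearing in Algorithm 1 are assumptions.

```python
# Illustrative sketch of createDirectedTrails(G, v_init, v_fin, lim):
# enumerate directed trails (walks with no repeated edges) from the initial
# vertex to the final vertex, up to `lim` trails.
import networkx as nx

def create_directed_trails(G, v_init, v_fin, lim):
    trails = []

    def dfs(vertex, path, used_edges):
        if len(trails) >= lim:                     # scalability cut-off
            return
        if vertex == v_fin:
            trails.append(list(path))
            return
        for nxt in G.successors(vertex):
            edge = (vertex, nxt)
            if edge not in used_edges:             # trail: each edge at most once
                used_edges.add(edge)
                path.append(nxt)
                dfs(nxt, path, used_edges)
                path.pop()
                used_edges.remove(edge)

    dfs(v_init, [v_init], set())
    return trails

# Toy integration graph (vertex names are only placeholders)
G = nx.DiGraph()
G.add_edges_from([('main', 'v1'), ('v1', 'v2'), ('v2', 'v1'),
                  ('v2', 'final'), ('v1', 'final')])

for trail in create_directed_trails(G, 'main', 'final', lim=10):
    print(trail)
```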
Fig. 2 Example of integration graph.
Let us use Fig. 2 to explain the need for these two phases. The figure presents an integration graph obtained from C++ files. We clearly see that the vertices are associated with instructions of the source code. In order to solve the Chinese postman problem, the graph must be strongly connected. In case it is not, as in the example in Fig. 2, we can just add an extra edge from the vertex final to the vertex main (the initial one). Each test case starts with the initial vertex and ends in the final one. Hence, a typical solution of the Chinese postman problem can contain the final vertex several times, as shown in the closed
walk (cw) below: cw = {final, main, User_newUser_22_1, Account_newAccount_2, bool_validUser_newUser_verifyAge_3, if_4, return_7, final, main, User_newUser_22_1, Account_newAccount_2, · · · , return_7, final}. Thus, we split the closed walk and generate directed walks and their sequences of vertices that start with main and end in final. In this example, the two test cases (tc1, tc2) created via the solution of the Chinese postman problem are as follows: tc1 = {main, User_newUser_22_1, Account_newAccount_2, bool_validUser_newUser_verifyAge_3, if_4, return_7, final}.
tc2 = {main, User_newUser_22_1, Account_newAccount_2, bool_validUser_newUser_verifyAge_3, if_4, newAccount_initialise_newUser_100_5, initialise, user_user_13, balance_b_14, if_15, newAccount_initialise_newUser_100_5, initialise, user_user_13, balance_b_14, if_15, · · · , newAccount_deposit_10_6, return_7, final}. Note that we can have repeated vertices and related edges in such test cases, as seen in tc2. The motivation to create test cases via solving the Chinese postman problem is to provide to the optimisation algorithms a set of test cases that, altogether, cover all the edges of the graph at least once. Hence, these are options that the decision variables of a solution can contemplate. But note that some test cases can become enormous (a huge number of vertices) if we consider large graphs. For example, a test case like tc2 considering a graph containing thousands of vertices and edges can be very large, making it infeasible to execute in practice. Moreover, we may usually have few test cases derived via the Chinese postman problem, where some are very large, resulting in fewer possible options for the decision variables. This is the reason to create additional test cases by obtaining directed trails (createDirectedTrails), in order to have smaller test cases and increase their number. Since these extra cases are directed trails, we do not have repeated edges but we can still have repeated vertices. As we have mentioned above, the parameter lim serves to limit the maximum number of directed trails to handle scalability issues. Hence, for each algorithm ai, a population (Popi) is generated according to the principles of the technique, for a maximum number of iterations (max_iterations). And each algorithm runs for a maximum number of runs (max_runs). The Generator returns all final populations (∀Popi) of all optimisation algorithms, and also their
quality indicators (I). However, in order to obtain such indicators, we need to create the so-called True Known Pareto Front (TKPF) where, for each problem instance, we join all final populations of all algorithms after the maximum number of runs, obtain the nondominated solutions, and remove the repeated ones. This is necessary because we deal with real-world and not benchmark problems. The objective functions to generate the test cases are: (a) Size of the test suite: this is a cost measure which is simply the sum of the number of vertices of all test cases (decision variables) of a test suite (solution). It is to be minimised since, in general, the fewer events required to be stimulated based on the test suite, the better. We denote this objective function by f1(x), where x is a solution. Moreover, we did not use this function in our previous work [12]; (b) Execution effort: this function aims to evaluate the execution effort associated with a solution (test suite). It takes into consideration the adjacency matrix that we have detailed above, and we just sum the weights of the edges related to such a test suite. It is to be minimised. This function is related to non-functional testing and we denote it by f2(x); (c) Edge coverage: this function shows the coverage of edges of the graph which represents a problem instance (set of integrated files). It is to be maximised since, in general, the more edges covered, the better. This is a functional testing objective denoted by f3(x). In formal terms, our multi-objective optimisation problem can be formulated as follows: minimise F(x) = (f1(x), f2(x), f3(x))^T subject to x ∈ Ω, where Ω is the decision variable space, F : Ω → R^3 consists of the three objective functions we have just described, and R^3 is the objective space. We now show how the values of the objective functions are calculated considering Fig. 2 again. Let us define that each solution has two decision variables and that the execution effort is 1 for each edge. Moreover, let us say that a test suite, i.e. a solution x1, consists of the test case tc1 presented earlier and the test case tc3 below: tc3 = {main, User_newUser_22_1, Account_newAccount_2, bool_validUser_newUser_verifyAge_3, if_4, newAccount_initialise_newUser_100_5, newAccount_deposit_10_6, return_7, final}. It is important to state that x1 = (1, 3), where each integer value represents one of the test cases, i.e. tc1 or tc3. Hence, f1(x1) is simply the sum of all vertices of all test cases, i.e. f1(x1) = 7 + 9 = 16. On the other hand, f2(x1) = 6 + 8 = 14, since all
edges have an effort equal to 1. The edge coverage must be maximised, but we turn this into a minimisation problem too, as shown below:

f3(x1) = 1 − |CE| / |E|     (1)
where |CE| is the number of covered edges due to all test cases of a test suite, and |E| is the total number of edges of the graph. A covered edge is counted only once if more than one test case traverses it. In Fig. 2, the graph has a total of 19 edges and the two test cases cover 9 of them. Hence, f3(x1) = 1 − 9/19 = 0.526. The optimisation algorithms can therefore be used to generate the test cases. The population, S, of nondominated solutions, sj, due to the execution of an algorithm ai is then a set of test suites (a set of sequences of abstract test cases) to be later translated into executable test cases. The InMeHy_STF method has been implemented in a tool available online [20], based on the jMetal framework [28], the JGraphT library [29], and the ANTLR [30] parser generator, where the latter creates the syntax trees.
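The following minimal Python sketch reproduces the computation of f1, f2 and f3 for the worked example above (two test cases, unit edge effort); it is an illustration of the objective functions, not the tool's implementation.

```python
# Sketch of the three objective functions for a solution (test suite) made of
# abstract test cases (sequences of vertices). Edge effort is fixed to 1 here,
# as in the worked example; in the tool it comes from the adjacency matrix.
def f1_size(test_suite):
    """Total number of vertices over all test cases (to be minimised)."""
    return sum(len(tc) for tc in test_suite)

def f2_effort(test_suite, effort):
    """Sum of edge weights traversed by all test cases (to be minimised)."""
    return sum(effort.get((tc[i], tc[i + 1]), 1)
               for tc in test_suite for i in range(len(tc) - 1))

def f3_edge_coverage(test_suite, total_edges):
    """Eq. (1): 1 - |CE|/|E|, covered edges counted once (minimised)."""
    covered = {(tc[i], tc[i + 1]) for tc in test_suite for i in range(len(tc) - 1)}
    return 1 - len(covered) / total_edges

tc1 = ['main', 'User_newUser_22_1', 'Account_newAccount_2',
       'bool_validUser_newUser_verifyAge_3', 'if_4', 'return_7', 'final']
tc3 = ['main', 'User_newUser_22_1', 'Account_newAccount_2',
       'bool_validUser_newUser_verifyAge_3', 'if_4',
       'newAccount_initialise_newUser_100_5', 'newAccount_deposit_10_6',
       'return_7', 'final']

x1 = [tc1, tc3]
print(f1_size(x1))                         # 16
print(f2_effort(x1, effort={}))            # 14 (all efforts default to 1)
print(round(f3_edge_coverage(x1, 19), 3))  # 0.526
```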
4 Controlled Experiment The design and characteristics of the controlled experiment we conducted are described in this section.
4.1 Objective, Research Questions and Variables The objective of this evaluation is to identify which of the seven optimisation algorithms is the best regarding test case generation at the integration testing level. Applications developed in C++ were considered as our SUTs, and the algorithms IBEA, SPEA2, NSGA-III, MOMBI-II, HRISE_R, HRISE_M, and HH-CF were the options to generate test cases. Three quality indicators were selected to evaluate the performance of the algorithms: hypervolume, ε-indicator, and IGD+. The motivation for using various quality indicators is that each one assesses the quality of the populations derived by the algorithms from a different perspective. As we have already mentioned in Sect. 1, this experiment should answer the following research questions (RQs): (a) RQ_1 - Which of the seven algorithms is the best regarding each quality indicator? (b) RQ_2 - Is there an algorithm that is clearly superior to all the others considering all quality indicators? (c) RQ_3 - Which of the hyper-heuristics presents the best performance?
The independent variables are the optimisation algorithms. The dependent variables are the values of the quality indicators: hypervolume, ε-indicator, and IGD+.
4.2 Problems and Problem Instances The case studies are two C++ geoinformatics software products. GeoDMA is a toolbox for integrating remote sensing imagery analysis methods with data mining techniques, aiming to extract information and support knowledge discovery over large geographic databases. TerraLib is a geographic information system software library to support the development of customised geographical applications. It is a non-trivial application with more than 1,400 classes. Table 1 shows the problems (Prob) and information about the largest problem instances for both products. The name of the problem refers to the name of a class of the software product, and since both products are already developed, our unit is one file, where we start the integration process with a main file defined by ourselves. For instance, the single GeoDMA problem (Main) generated 4 problem instances (#Prob Inst), where each instance is obtained by integrating 4 files at a time (integration level). Therefore, instance GEO1_4 (instance 1 with integration level 4) was created by integrating the first 4 files, GEO2_4 was created by adding 4 more files, after that GEO3_4, and finally GEO4_4 was derived. Also notice that when we create a new problem instance within a problem, the graph becomes larger. Hence, there are 229 vertices and 330 edges in GEO1_4 and 451 vertices and 769 edges in GEO4_4. In Table 1, we only show the identification (Id), number of vertices (#Vertices), and edges (#Edges) of the largest instance within a problem. Altogether, we have 12 problems, where 1 problem is due to GeoDMA and the remaining 11 are derived for TerraLib, and 39 problem instances, where 4 are for GeoDMA and 35 are due to TerraLib. Most of the problems and problem instances are generated for TerraLib. This is because TerraLib is much larger than GeoDMA. Notice that some problem instances did not change the resulting graph obtained in the previous problem instance, even if we added files. This is because the new files that were selected for integration either had no body or were not directly related to the main file or the secondary ones. Hence, we ruled out such useless instances. Altogether, for both products, we considered 89 files in this evaluation.
4.3 Algorithms and Parameters Regarding the selected algorithms, IBEA [13] is a general indicator-based multi-objective EA, and the main reasoning is to first define the optimisation goal in terms of a binary performance measure (indicator) and then to directly use this measure in the selection process.
Table 1 Characteristics of problems and of the largest problem instances.
Prob                  #Prob Inst   Id        Software   #Vertices   #Edges
Main                  4            GEO4_4    GeoDMA     451         769
CompoundCurve         3            TLCC6_4   TerraLib   125         158
CircularString        5            TLCS6_4   TerraLib   252         393
GeometryCollection    2            TLGC2_4   TerraLib   133         171
GeometryFactory       7            TLGF7_4   TerraLib   92          118
LineString            4            TLLS6_4   TerraLib   300         420
MultiCurve            3            TLMC3_4   TerraLib   36          53
MultiPoligon          4            TLMP5_4   TerraLib   45          59
OrdinalPeriod         1            TLOP2_4   TerraLib   31          35
Point                 3            TLPT3_4   TerraLib   110         148
PolyhedralSurface     2            TLPS2_4   TerraLib   117         149
TimePeriod            1            TLTP1_4   TerraLib   31          37
Total                 39

SPEA2 [14] is an improved version of SPEA and has,
as additional characteristics, a fine-grained fitness assignment strategy, a density estimation technique, and an enhanced archive truncation method. These classical multi-objective EAs were selected due to their popularity and because several studies usually compare hyper-heuristics to them [4, 6]. NSGA-III [15] is based on NSGA-II, one of the most popular EAs, and it is a reference-point-based many-objective EA that emphasises population members that are nondominated, yet close to a set of supplied reference points. MOMBI-II is a many-objective EA based on the R2 indicator as the individual selection mechanism [16]. As for the selection hyper-heuristics, HRISE_R and HRISE_M [4] embed a heuristic selection method based on a roulette wheel supported by reinforcement learning, followed by a balanced exploitation/exploration procedure. Moreover, they use a two-level move acceptance strategy: only improving plus a group-decision framework where several move acceptance methods are considered. HRISE_R relies on the responsibility rule while HRISE_M is based on the majority rule. A hyper-heuristic based on the Choice Function, HH-CF, was presented in [7], in which NSGA-II and SPEA2 are two out of three low-level heuristics (LLHs). In our case, a third LLH is IBEA. The heuristic selection method is based on a two-stage ranking scheme and four quality indicators: algorithm effort, ratio of nondominated individuals, hypervolume, and uniform distribution. All algorithms were run with these parameters: i) population size = 100; ii) number of decision variables = 10; iii) crossover probability = 0.9; iv) mutation probability = 0.0125; v) crossover operator = simulated binary crossover (SBX); vi) mutation operator = integer polynomial; vii) maximum number of runs = 20; ix) maximum number of iterations = 1,000; x) parameter lim = 10,000. Note that the selection hyper-heuristics, HRISE_R, HRISE_M, and HH-CF, use as LLHs complete EAs, in
this case IBEA, SPEA2, and NSGA-II. In other words, at each decision point, they select one of these LLHs to run, in accordance with some criterion, for a certain number of iterations. Therefore, in addition to the parameters above, they have some extra ones which are defined in the respective articles. It is important to mention that we did not perform tuning of the parameters of the hyper-heuristics but rather we used the values suggested in the articles.
4.4 Types of Evaluation First, the front-normalised values of the indicators were considered. From this point onward, we will denote the front-normalised hypervolume simply as hypervolume, h, and likewise ε and IGD+ denote the (front-normalised) ε-indicator and IGD+, respectively. The higher the h, the better the algorithm, but the lower ε and IGD+, the better the algorithm. We accomplished two types of evaluation. The cross-domain analysis is suitable to obtain evidence of the generalisation capability of the algorithms across all problem instances, rather than performing a case-by-case evaluation. We then defined a second level of normalisation of the indicators. For instance, the normalised (front-normalised), or simply normalised, hypervolume, h_N, is defined below [4]:
h_N = (h_max(∀a, p) − h̄(a_i, p)) / (h_max(∀a, p) − h_min(∀a, p))     (2)
where h_max(∀a, p) and h_min(∀a, p) are the maximum and minimum values, respectively, of the hypervolume, h, due to all algorithms a for a problem instance p, and h̄(a_i, p) is the average value of the hypervolume due to algorithm a_i for p. Note that, with the formulation of h_N, a maximisation problem (maximise hypervolume) is turned into a minimisation problem. Hence, the lower the value of h_N, the better the algorithm. As for ε and IGD+, the normalised value is calculated in a standard manner. For instance, the normalised ε-indicator, ε_N, is obtained as follows:
ε_N = (ε̄(a_i, p) − ε_min(∀a, p)) / (ε_max(∀a, p) − ε_min(∀a, p))     (3)
where ε_max(∀a, p) and ε_min(∀a, p) are the maximum and minimum values, respectively, of the ε-indicator due to all algorithms a for a problem instance p, and ε̄(a_i, p) is the average value of the ε-indicator due to algorithm a_i for p. The lower ε_N, and also IGD+_N (normalised IGD+), the better the approach. We then calculate the averages of the normalised quality indicators considering all problem instances of all problems: h_N^APR, ε_N^APR, and IGD+_N^APR, which refer to the average values of the normalised hypervolume, ε-indicator, and IGD+, respectively. Again, the algorithm which obtains the lowest of these values is considered the best overall.
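A small sketch of the second-level normalisation of Eqs. (2) and (3) for a single problem instance is given below; the per-algorithm average indicator values are hypothetical.

```python
# Second-level normalisation of quality indicators (Eqs. 2 and 3) across
# algorithms, for one problem instance. The numbers are hypothetical.
def normalise_hypervolume(avg_h, algo):
    """Eq. (2): higher h is better, so the best algorithm gets h_N = 0."""
    h_max, h_min = max(avg_h.values()), min(avg_h.values())
    return (h_max - avg_h[algo]) / (h_max - h_min)

def normalise_min_indicator(avg_i, algo):
    """Eq. (3): lower epsilon (or IGD+) is better; the best algorithm gets 0."""
    i_max, i_min = max(avg_i.values()), min(avg_i.values())
    return (avg_i[algo] - i_min) / (i_max - i_min)

avg_h = {'IBEA': 0.71, 'SPEA2': 0.64, 'HRISE_R': 0.78}   # hypothetical averages
print({a: round(normalise_hypervolume(avg_h, a), 3) for a in avg_h})
```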
We also performed a second evaluation, which is a statistical analysis, but now we decided to rely on the (front-normalised) indicators, namely h, ε, and IGD+. It is our belief that using the second degree of normalisation may mask the results of the statistical test. We applied a two-tailed permutation test (conditional inference procedure) [31] for multi-group comparison with a significance level equal to 0.05. This is a case-by-case analysis, i.e. we verified, for each problem instance, whether an algorithm a_i was significantly better (">") than an algorithm a_j, worse ("<"), or statistically equivalent ("∼") to it, where ">" means the leftmost algorithm of a pair was significantly better than the rightmost one.
(Table: pairwise statistical comparison of the seven algorithms, reporting, for each quality indicator (h, ε, IGD+), the number of problem instances in which the leftmost algorithm of each pair — IBEA × SPEA2, IBEA × NSGA-III, IBEA × MOMBI-II, IBEA × HRISE_M, IBEA × HRISE_R, IBEA × HH-CF, SPEA2 × NSGA-III, SPEA2 × MOMBI-II, SPEA2 × HRISE_M, SPEA2 × HRISE_R, SPEA2 × HH-CF, NSGA-III × MOMBI-II, NSGA-III × HRISE_M, NSGA-III × HRISE_R, NSGA-III × HH-CF, MOMBI-II × HRISE_M, MOMBI-II × HRISE_R, MOMBI-II × HH-CF, HRISE_M × HRISE_R, HRISE_M × HH-CF, HRISE_R × HH-CF — was significantly worse ("<"), equivalent ("∼"), or better (">").)
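For illustration, a generic two-sided permutation test on the per-run values of an indicator for two algorithms can be sketched as follows; this is a simplified stand-in, not the exact conditional inference procedure of [31], and the run values are hypothetical.

```python
# Generic two-sided permutation test on the mean difference of an indicator
# between two algorithms for one problem instance (illustration only).
import random

def permutation_test(a, b, n_resamples=10000, seed=0):
    rng = random.Random(seed)
    observed = abs(sum(a) / len(a) - sum(b) / len(b))
    pooled = list(a) + list(b)
    count = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)
        pa, pb = pooled[:len(a)], pooled[len(a):]
        diff = abs(sum(pa) / len(pa) - sum(pb) / len(pb))
        if diff >= observed:
            count += 1
    return count / n_resamples

# Hypothetical hypervolume values over 20 runs for two algorithms
hv_ibea = [0.70, 0.72, 0.69, 0.71, 0.73] * 4
hv_hrise_r = [0.75, 0.77, 0.74, 0.76, 0.78] * 4
p = permutation_test(hv_ibea, hv_hrise_r)
print('significant difference' if p < 0.05 else 'no significant difference', p)
```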
U1), then it is called a Boost converter. So the output voltage is expressed by
Uo = Ui / (1 − D)     (13)
By combining the above features of Buck and Boost converters, we have

Uo = D · Ui / (1 − D)     (14)
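A quick numerical reading of Eq. (14) is sketched below; the input voltage is only an assumed value for illustration.

```python
# Buck-boost output voltage versus duty cycle, Eq. (14): Uo = D * Ui / (1 - D).
# The input voltage value is only an assumption for illustration.
Ui = 63.6  # e.g. the PV open-circuit voltage from Table 2

for D in (0.2, 0.4, 0.6, 0.8):
    Uo = D * Ui / (1 - D)
    mode = 'buck (Uo < Ui)' if Uo < Ui else 'boost (Uo > Ui)'
    print(f'D = {D:.1f} -> Uo = {Uo:6.1f} V  [{mode}]')
```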
3.5 Three Phase Inverter From the DC-DC buck-boost converter, the output DC voltage is connected to the grid via a DC-AC converter, i.e. an inverter. Generally, a 3-phase inverter utilized for the DC-AC conversion on the grid side is termed the Grid Side Converter (GSC). A three-phase inverter is shown in Fig. 7. In Fig. 7, a control strategy known as Voltage Oriented Control (VOC) is utilized to vary the output voltage by varying the gate trigger pulse, i.e. the pulse width modulation (PWM) of the signal. This PWM signal controls the inverter semiconductor switches. The voltage vector controls are derived from the voltage and current PI controllers through Park's transformation theory [12]. The Proportional Integral (PI) controller is utilised to enhance system stability.
Fig. 7 Three phase bridge inverter
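The VOC strategy relies on Park's transformation to express the measured three-phase quantities in the rotating dq frame; a minimal sketch of the amplitude-invariant abc→dq0 transform is given below, with an assumed balanced test signal.

```python
# abc -> dq0 (Park) transformation used by voltage-oriented control.
# Amplitude-invariant form; the 590 V / 50 Hz test signal is an assumption.
import numpy as np

def abc_to_dq0(va, vb, vc, theta):
    """Park transform with rotation angle theta (rad)."""
    T = (2.0 / 3.0) * np.array([
        [np.cos(theta), np.cos(theta - 2*np.pi/3), np.cos(theta + 2*np.pi/3)],
        [-np.sin(theta), -np.sin(theta - 2*np.pi/3), -np.sin(theta + 2*np.pi/3)],
        [0.5, 0.5, 0.5]])
    return T @ np.array([va, vb, vc])

# Balanced three-phase set sampled at one instant
w, t, Vm = 2 * np.pi * 50, 0.004, 590.0
va = Vm * np.cos(w * t)
vb = Vm * np.cos(w * t - 2 * np.pi / 3)
vc = Vm * np.cos(w * t + 2 * np.pi / 3)
vd, vq, v0 = abc_to_dq0(va, vb, vc, theta=w * t)
print(vd, vq, v0)   # ~ (590, 0, 0) when theta is aligned with phase a
```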
3.6 Fuzzy MPPT Algorithm To extract the maximum power from the output, we implemented a strategy known as the MPPT method, which continuously tracks the maximum electric power so that the reliability of the system is enhanced. Various techniques are available, viz. the Fuzzy Logic Controller (FLC) [13, 14], Hill-Climbing Search (HCS) and Perturb and Observe (P&O). In this paper, FLC MPPT is implemented for controlling the respective buck-boost converters. The FLC mainly uses if-else rules and does not require any mathematical model. It therefore reduces the complexity of analysing a given solution and can also be applied to very complex problems. The figure below shows the flowchart of the fuzzy MPPT method (Fig. 8). Fig. 8 Flow chart of the fuzzy MPPT method
(Flowchart steps: START → Initialize P(k−1) = 0 → Measure V(k), I(k), then find P(k), ΔP and ΔI → Fuzzification (fuzzy set) → Inference (rule base) → Defuzzification → D (duty cycle).)
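A compact, self-contained sketch of one iteration of the fuzzy MPPT loop of Fig. 8 is given below (fuzzification of ΔP and ΔI, rule firing, and defuzzification into a duty-cycle update); the membership functions, rule table and step sizes are illustrative assumptions rather than the tuned controller used in the simulation.

```python
# Compact Sugeno-type fuzzy MPPT sketch following Fig. 8: fuzzify dP and dI,
# fire a small rule base, defuzzify to a duty-cycle correction dD.
# Membership functions, rule table and scaling are illustrative assumptions.
def tri(x, a, b, c):
    """Triangular membership function."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x < b else (c - x) / (c - b)

def fuzzify(x):
    """Degrees for Negative / Zero / Positive over a normalised input."""
    return {'N': tri(x, -2.0, -1.0, 0.0),
            'Z': tri(x, -0.5, 0.0, 0.5),
            'P': tri(x, 0.0, 1.0, 2.0)}

# Rule base: (dP label, dI label) -> crisp consequent for dD (Sugeno singletons)
RULES = {('P', 'P'): +0.01, ('P', 'N'): -0.01, ('P', 'Z'): +0.01,
         ('N', 'P'): -0.01, ('N', 'N'): +0.01, ('N', 'Z'): -0.01,
         ('Z', 'P'): 0.0,   ('Z', 'N'): 0.0,   ('Z', 'Z'): 0.0}

def fuzzy_mppt_step(dP, dI, D, d_min=0.05, d_max=0.95):
    mu_p, mu_i = fuzzify(dP), fuzzify(dI)
    num = den = 0.0
    for (lp, li), dd in RULES.items():
        w = min(mu_p[lp], mu_i[li])        # rule firing strength
        num += w * dd
        den += w
    dD = num / den if den > 0 else 0.0     # weighted-average defuzzification
    return min(max(D + dD, d_min), d_max)

# One iteration with hypothetical normalised measurements
D = fuzzy_mppt_step(dP=0.8, dI=0.3, D=0.50)
print(D)
```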
4 Simulation Results and Discussion This section presents details of the MATLAB/SIMULINK software simulation, which is carried out for the WECS, solar cell and fuel cell sources connected to the grid using the fuzzy MPPT controller. The parameters of these sources are shown in Tables 1, 2 and 3, respectively. Figure 9 shows the wind speed, which is 14 m/s up to 1 s and 9 m/s thereafter. Figure 10 shows the wind torque, and Figs. 11, 12 and 13 show the output voltage, current and power of the wind energy system, respectively. Similarly, Figs. 14, 15 and 16 show the output voltage, current and power of the PV array. From Fig. 14, the output of the PV array is 63.6 V.
Table 1 The parameters of wind turbine and PMSG [1]
Parameters of wind turbine        Magnitude
Rated power (MW)                  2.3
Blade diameter (m)                71
Rated wind speed (m/s)            14
Number of blades                  3
Turbine inertia Jr (kg·m2)        670
Parameters of PMSG                Magnitude
Rated voltage (U)                 690
Stator frequency (Hz)             12.15
No. of poles                      2
Rated current (I)                 6.8
Rated power (MW)                  2

Table 2 The parameters of solar PV array
Parameters of solar PV array      Magnitude
Rated power (W)                   600
Open circuit voltage (V)          63.6
Short circuit current (A)         12.5
Optimum voltage (V)               42.4
Optimum current (A)               6.75

Table 3 The parameters of fuel cell
Parameters of fuel cell           Magnitude
Voc                               35
No. of cells                      40
Rated current (I)                 4.8
Rated power (MW)                  85
Fig. 9 Speed of the wind
Fig. 10 Torque of wind
Fig. 11 Wind turbine output voltage
Fig. 12 Wind turbine output current
Fig. 13 Wind power output
Fig. 14 Output voltage of PV array
Fig. 15 Output current of PV array
Fig. 16 Output power of PV array
Figures 17, 18 and 19 show the output voltage, current and power of the fuel cell. The output of the fuel cell is 17.6 V. Figures 20 and 21 show the three-phase voltage and current for the hybrid scheme. Similarly, Figs. 22 and 23 show the grid voltage and current, respectively. The output voltage of the PV, wind and fuel cell hybrid system connected to the grid is 590 V. Figure 24 indicates that the proposed grid voltage total harmonic distortion (THD) is 3.30%. Fig. 17 Output voltage of fuel cell
Fig. 18 Output current of fuel cell
Fig. 19 Output power of fuel cell
Fig. 20 Three phase voltage for hybrid scheme
Fig. 21 Three phase current for hybrid scheme
Fig. 22 Grid voltage
Fig. 23 Grid current
Fig. 24 THD of grid voltage is 3.30% in the proposed scheme
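For reference, the THD of a sampled waveform can be estimated from its FFT as sketched below; the synthetic 50 Hz test signal and its harmonic content are assumptions for illustration only.

```python
# FFT-based total harmonic distortion (THD) estimate for a sampled waveform.
# The synthetic 50 Hz test signal and its harmonic content are assumptions.
import numpy as np

def thd(signal, fs, f0, n_harmonics=20):
    n = len(signal)
    spectrum = np.abs(np.fft.rfft(signal)) / n
    def mag(f):                       # magnitude of the bin closest to f
        return spectrum[int(round(f * n / fs))]
    fund = mag(f0)
    harm = np.sqrt(sum(mag(k * f0) ** 2 for k in range(2, n_harmonics + 1)))
    return harm / fund

fs, f0 = 10000, 50
t = np.arange(0, 0.2, 1 / fs)
v = (590 * np.sin(2 * np.pi * f0 * t)
     + 15 * np.sin(2 * np.pi * 5 * f0 * t)      # small 5th harmonic
     + 10 * np.sin(2 * np.pi * 7 * f0 * t))     # small 7th harmonic
print(f'THD = {100 * thd(v, fs, f0):.2f} %')
```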
5 Conclusion This paper mainly explains a hybrid grid integrating a PMSG for wind power generation, solar PV cells and fuel cells. These power sources are controlled based on the fuzzy MPPT technique. Because of the fuzzy MPPT, the proposed model gives less THD and a better output magnitude compared to the conventional model. The suggested scheme was simulated in the Simulink platform, and the resulting waveforms have been analysed and plotted.
References 1. Fathabadi H (2017) Novel standalone hybrid solar/wind/fuel cell power generation system for remote areas. Sol Energy 146:30–43 2. Mastromauro RA, Liserre M, Kerekes T, Dell’Aquila A (2009) A single-phase voltagecontrolled grid-connected photovoltaic system with power quality conditioner functionality. IEEE Trans Industr Electron 56(11):4436–4444 3. Abdelsalam AK, Massoud AM, Ahmed S, Enjeti PN (2011) High-performance adaptive perturb and observe MPPT technique for photovoltaic-based microgrids. IEEE Trans Power Electron 26(4):1010–1021 4. Schulz D, Jahn M, Pfeifer T (2008) Grid integration of photovoltaics and fuel cells. In: Strzelecki R, Benysek G (eds) Power electronics in smart electrical energy networks, power systems. Springer, London 5. Li X, Wang Q, Wen H, Xiao W (2019) Comprehensive studies on operational principles for maximum power point tracking in photovoltaic systems. IEEE Access. 7:121407–121420 6. Rajesh K, Kulkarni AD, Ananthapadmanabha T (2015) Modelling and simulation of solar PV and DFIG based wind hybrid system. Procedia Technol 21:667–675 7. Jain S, Agarwal V (2007) A single-stage grid connected inverter topology for solar PV systems with maximum power point tracking. IEEE Trans Power Electron 22(5):1928–1940 8. Lian KL, Jhang JH, Tian IS (2014) A maximum power point tracking method based on perturband-observe combined with particle swarm optimization. IEEE J Photovolt 4(2):626–633 9. Biswas I, Bajpai P (2014) Control of PV-FC-battery-SC hybrid system for standalone DC load. 2014 Eighteenth national power systems conference (NPSC), pp 1–6 10. Mezzai N, Rekioua D, Rekioua T, Mohammedi A (2014) Modeling of hybrid photovoltaic/wind/fuel cells power system. Int J Hydrogen Energy 39(27):15158–15168
11. Lahari NV, Shetty HVK (2015) Integration of grid connected PMSG wind energy and solar energy systems using different control strategies. In: International scientific conference on electric power engineering, pp 23–27 12. Sahoo S, Subudhi B, Panda G (2016) Pitch angle control for variable speed wind turbine using fuzzy logic. In: IEEE International Conference on Information Technology (ICIT), IIIT, Bhubaneswar, India. pp 28–32 13. Ali MN, Mahmoud K, Lehtonen M, Darwish MMF (2021) Promising MPPT methods combining metaheuristic, fuzzy-logic and ANN techniques for grid-connected photovoltaic. Sensors 21(4):1244 14. Ge X, Ahmed FW, Rezvani A, Aljojo N, Samad S, Foong LK (2020) Implementation of a novel hybrid BAT-Fuzzy controller based MPPT for grid-connected PV-battery system. Control Eng Pract 98:104–380
Novel Harris Hawks Optimization and Deep Neural Network Approach for Intrusion Detection Miodrag Zivkovic , Nebojsa Bacanin , Jelena Arandjelovic , Andjela Rakic , Ivana Strumberger , K. Venkatachalam , and P. Mani Joseph
M. Zivkovic · N. Bacanin (B) · J. Arandjelovic · A. Rakic · I. Strumberger Singidunum University, Danijelova 32, 11000 Belgrade, Serbia e-mail: [email protected] M. Zivkovic e-mail: [email protected] J. Arandjelovic e-mail: [email protected] A. Rakic e-mail: [email protected] I. Strumberger e-mail: [email protected] K. Venkatachalam Faculty of Science, Department of Applied Cybernetics, University of Hradec Králové, 50003 Hradec Králové, Czech Republic e-mail: [email protected] P. M. Joseph Department of Mathematics and Computer Science, Modern College of Business and Science, PO Box 100, PC 133 Muscat, Sultanate of Oman e-mail: [email protected]
1 Introduction To ensure safe and reliable information flow across diverse businesses, modern networked business settings necessitate a high level of security. After traditional security technologies fail, an intrusion detection system works as a versatile safeguard device for system security. Because cyberattacks are only going to get more sophisticated, defensive technology must keep up. In general, IDS employs two approaches to detect potential computer security breaches: the first is signature-based detection, which matches data activity with a signature or pattern maintained in a signature database. The second type, behavior-based or statistical anomaly-based detection, detects any irregularity and raises a
warning. It is referred to as an expert system since it learns what normal system behavior is. Furthermore, IDS systems are frequently categorized into five categories [14]: protocol-based IDS (PIDS), application protocol-based IDS (APIDS), host-based IDS (HIDS), network-based IDS (NIDS), and hybrid IDS (HIDS). IDS solutions are available in a variety of configurations and capabilities. The following are examples of common intrusion detection systems: 1. PIDS is a type of IDS that is often located on web servers and utilized to monitor and analyze the protocols that the given computer system is using. It explores the protocol's dynamic behavior and status and often includes a component incorporated in the server's front end that monitors communication between interconnected devices and the system they are safeguarding. 2. APIDS is an IDS that monitors a particular application protocol (one or more) used by the observed computer system. An APIDS will track the protocol's behavior and condition, and it typically consists of a component located among a set of servers that monitors the particular application protocol. An APIDS is typically installed between a web server and a database management system (DBMS), where it monitors the SQL protocol used by the applications while they communicate with the database. 3. A hybrid IDS (HIDS) is a type of intrusion detection system that is deployed on remote servers that are connected to the internet and to a company's internal network. This technology can detect packets from within the company as well as additional malicious traffic that a NIDS cannot. It can also detect dangerous threats emanating from the host, such as a host infected with malware that is attempting to propagate it throughout the organization's system. 4. A NIDS solution monitors incoming and outgoing network traffic and is implemented at crucial places throughout an organization's network. This IDS strategy monitors and detects malicious and suspicious traffic entering and departing from all network-connected devices. 5. A host-based IDS (HIDS) is installed on specific endpoints as a protector from both inside and outside threats. An IDS of this type is capable of monitoring the incoming/outgoing network traffic of the given computer, observing active processes, and accessing the system logs. A HIDS domain is bounded to the host computer, which limits the policy-enforcing context. To a considerable extent, the ability of ML techniques to execute correct classification is dependent on the data quality. IDS systems use datasets of high dimension, which have redundant and even unrelated features, as well as a large number of samples. As a result, in ML techniques, the data preprocessing phase is critical. On such high-dimensional data, nature-inspired metaheuristics like swarm intelligence methods can be employed to select the features that have the largest impact on the overall classification while rejecting unneeded features.
1.1 Research Goals and Contributions
In this research, we propose an improved version of the Harris hawks optimization (HHO) method, which belongs to the swarm intelligence family of approaches, to address the task of selecting the optimal set of features that will influence the classification results. The contributions of the research given in this manuscript can be summarized as follows:
– The improvement of the basic design of the HHO metaheuristic, which specifically targets the drawbacks of the original version.
– Hybridization of a deep neural network (DNN) with the devised improved HHO algorithm, which helps in feature selection and optimizes the time required for training by reducing the dimensionality.
1.2 Structure of the Paper
The remainder of this paper is organized as follows. Section 2 provides the essential theoretical background for DNNs and swarm intelligence metaheuristics, together with references to relevant literature. Section 3 describes the basic Harris hawks optimization algorithm and the proposed improved variant. Section 4 presents the empirical results, analysis, and discussion, and finally, Sect. 5 summarizes the paper and proposes future studies.
2 Theoretical Background and Literature Review
Machine learning is a method of teaching computers to learn independently and interpret data without being explicitly programmed. Because of their universal capacity to detect both original and variant threats, machine learning and deep learning approaches offer enormous prospects for improving current intrusion detection system models [22]. Artificial neural networks (ANNs) are computing systems that mimic how the human brain analyzes and processes data. Artificial intelligence (AI) is built on this foundation, and it solves problems that would be impossible or extremely hard to solve by humans or by statistical criteria. Because ANNs are self-learning, they can enhance their performance as additional data becomes available. Artificial neural networks have neuron nodes connected in a web-like manner, similar to the human brain's structure. Tens of billions of neurons make up the human brain. The cell body of each neuron is responsible for processing information and conveying it to and from the brain (inputs and outputs) [11, 13]. During the training phase, an artificial neural network (ANN) learns to recognize data patterns, which may be visual, vocal, or textual. During the supervised phase, the ANN performs the
comparison of its actual outputs to the desired outputs. The disparity between the two outputs is reduced by utilizing backpropagation: the network adjusts the weights of its connections backward, starting from the output layer, until the discrepancy between the actual and desired results reaches the minimum possible error. The basic principle of swarm intelligence, emulating collective behavior observed in nature, is popularly exemplified by the Michael Crichton swarm. Experts point to biological systems such as bird flocking, ant colony behavior, fish schools, and bacteria multiplication as examples of natural systems that swarm intelligence can imitate. This has led to a large number of nature-inspired metaheuristic algorithms in the past two decades. Others discuss stochastic processes that may be used to model swarm intelligence to better understand how concrete swarm intelligence IT applications would work [1, 16]. In recent years, swarm intelligence methods have been utilized to solve numerous practical NP-hard IT tasks, such as global numerical optimization [6], wireless sensor network issues such as localization and network lifetime [4, 18, 20], task scheduling in the cloud computing domain [2, 9], ANN and CNN optimization [3, 7, 15], COVID-19 cases prediction [19, 22], and MRI classification optimization in the medical domain [5, 8].
3 Proposed Method
This section first describes the basic version of the HHO metaheuristics. After that, the observed and known drawbacks of the basic HHO are highlighted. Finally, the improved version of the HHO is proposed, with modifications aimed at addressing the mentioned flaws of the basic algorithm and improving its performance.
3.1 Basic HHO Algorithm
The Harris hawks optimization (HHO) method is a recent metaheuristic algorithm based on cooperative hunting activity [12]. In comparison to other optimization approaches, HHO has shown encouraging results. The surprise pounce, a group hunting exercise and pursuit method used by Harris hawks, is the foundation of the HHO metaheuristics. The hawks' particular surprise pounce strategy, their exploration for prey, and the different capture strategies utilized by the party of hawks during the hunt drive the exploration and exploitation phases of the HHO metaheuristics. The HHO method's exploration phase is modeled after how hawks track and detect their prey. Hawks may sit and study the target area for hours, looking for potential prey. In the HHO implementation, each hawk represents a candidate solution, the best solution at each step is regarded as the target or near-optimum, and the hawks position themselves at a variety of random sites and wait for prey to appear using one of two strategies. HHO is thus assumed to possess an exploration mechanism. Given the characteristics of Harris'
hawks, they are able to follow and spot the prey with their keen vision; however, the prey may not always be visible. As a result, the hawks must hold back, watch, and surveil the desert location for many hours in order to discover prey. The Harris' hawks represent the candidate solutions in HHO, while the best one in every stage is regarded as the targeted prey or, roughly, the optimal candidate solution. During the exploitation, the Harris' hawks conduct a surprise pounce by attacking the targeted prey discovered in the preceding phase. The prey, on the other hand, frequently seeks to escape from the dangerous situation, so different pursuit styles emerge in real-life circumstances. In order to mimic the attacking stage, the HHO defines four potential methods based on the prey fleeing behavior and the Harris' hawks pursuit strategy [21]. Since the hawks in a hunting party position themselves depending on the positions of the other hawks (to be in close vicinity for the attack) and the prey, which can be described using Eq. (1), both tactics have an identical chance, governed by q:

X(t + 1) = \begin{cases} X_{rand}(t) - r_1 \, |X_{rand}(t) - 2 r_2 X(t)|, & q \geq 0.5 \\ (X_{best}(t) - X_m(t)) - r_3 (LB + r_4 (UB - LB)), & q < 0.5 \end{cases}   (1)
Here, LB and UB represent the lower and upper boundaries of the decision variables, respectively, X_{rand}(t) denotes a randomly selected solution from the population, and X_m(t) denotes the mean position of the current population of solutions. The current hawk position vector is represented by X(t), and the solution in the next round is denoted by X(t + 1). Finally, r_1, r_2, r_3, r_4 and q are randomly selected values inside the [0, 1] range that are updated in every round, while X_{best}(t) indicates the position of the prey. The mean position of the current solutions in the population, X_m(t), is obtained by the equation below:

X_m(t) = \frac{1}{N} \sum_{i=1}^{N} X_i(t)   (2)

in which X_i(t) denotes the position of the i-th member in iteration t, and N is the total number of solutions in the population. The HHO method may switch among multiple exploitation tactics while transitioning from the exploration phase to the exploitation phase, depending on the prey's escaping energy, which diminishes as the prey flees, as described by Eq. (3):

E = 2 E_0 \left(1 - \frac{t}{T}\right)   (3)
E defines the prey’s level of energy for fleeing, T denotes the maximum number of repetitions, and E 0 defines the prey’s level of energy at the start. Parameter E 0 switches between (−1, 1) at random. The HHO will use the unexpected pounce technique in the exploitation phase to pursue the prey that was discovered in the previous stage. Prior to the surprise pounce move, the parameter r represents the chance that prey would successfully
escape (r < 0.5) or not (r ≥ 0.5). In both situations, the hawk hunting group will encircle the victim and execute a hard or soft besiege. Switching between the soft and hard besiege procedures is controlled by the parameter E: in the case of |E| ≥ 0.5, soft besiege occurs, whereas in the case of |E| < 0.5, hard besiege occurs. When |E| ≥ 0.5 and r ≥ 0.5, the prey still has enough strength to flee by leaping at random, and the soft besiege takes place. The hawks then softly surround the target to tire it before attacking with the surprise pounce maneuver, which may be represented using Eqs. (4) and (5):

X(t + 1) = \Delta X(t) - E \, |J X_{best}(t) - X(t)|   (4)

\Delta X(t) = X_{best}(t) - X(t)   (5)
The difference between the prey's position vector and the current position in iteration t is denoted by \Delta X(t). The prey's random jump strength during the escape is given by J = 2(1 - r_5), where r_5 is a random number in the range (0, 1). However, when the prey is exhausted, i.e. when |E| < 0.5 and r ≥ 0.5, hard besiege ensues. The hawks of the party continue to circle the target until it is caught, and the current positions may be updated using Eq. (6):

X(t + 1) = X_{best}(t) - E \, |\Delta X(t)|   (6)
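For illustration only, the update rules in Eqs. (1)–(6) can be sketched in Python roughly as follows. This is a simplified sketch under the stated equations; the function names (escaping_energy, hho_step), the random generator argument rng, and the omission of the rapid-dive branches are our assumptions, not the original implementation.

import numpy as np

def escaping_energy(t, T, rng):
    """Eq. (3): prey escaping energy, decreasing over the iterations."""
    E0 = rng.uniform(-1, 1)                      # initial energy in (-1, 1)
    return 2 * E0 * (1 - t / T)

def hho_step(X, X_best, t, T, lb, ub, rng):
    """One illustrative HHO position update for every hawk in X (an N x D array)."""
    N, D = X.shape
    X_mean = X.mean(axis=0)                      # Eq. (2)
    X_new = X.copy()
    for i in range(N):
        E = escaping_energy(t, T, rng)
        if abs(E) >= 1:                          # exploration, Eq. (1)
            if rng.random() >= 0.5:
                X_rand = X[rng.integers(N)]
                X_new[i] = X_rand - rng.random() * np.abs(X_rand - 2 * rng.random() * X[i])
            else:
                X_new[i] = (X_best - X_mean) - rng.random() * (lb + rng.random() * (ub - lb))
        else:                                    # exploitation
            r = rng.random()
            J = 2 * (1 - rng.random())           # prey jump strength
            dX = X_best - X[i]                   # Eq. (5)
            if r >= 0.5 and abs(E) >= 0.5:       # soft besiege, Eq. (4)
                X_new[i] = dX - E * np.abs(J * X_best - X[i])
            elif r >= 0.5:                       # hard besiege, Eq. (6)
                X_new[i] = X_best - E * np.abs(dX)
            # the r < 0.5 (rapid-dive) variants are omitted here for brevity
        X_new[i] = np.clip(X_new[i], lb, ub)     # keep solutions inside [LB, UB]
    return X_new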
3.2 Drawbacks of the Basic HHO
The basic HHO has shown excellent performance in its original form and established itself as one of the best and most promising optimizers [12]. However, by performing tests with the standard Congress on Evolutionary Computation (CEC) benchmark function set, it can be seen that the original HHO could be enhanced in both the exploration and exploitation phases. In some cases, the basic HHO may remain in sub-optimal areas of the search space in the early phases of execution. As a result, the general quality of solutions will be poor, as most of them converge towards the sub-optimal early best solutions. Later, during exploration, HHO can discover more promising domains; however, this typically happens in the final rounds of execution, in other words, too late for fine-tuning, and the quality of the final solutions is not very good. This drawback of the HHO is a consequence of an unbalanced exploration-exploitation trade-off that is skewed towards exploitation in the early phases of execution, whereas exploitation should dominate only in the later phases of the algorithm's execution.
Algorithm 1. Pseudo-code of devised HHO-QRLRS
Set the size of population (N) and termination condition in terms of T
Generate initial population X_i, (i = 1, 2, 3, ..., N)
Calculate fitness and determine the best solution
Set counter of iterations t = 0
while t ≤ T do
    Determine fitness for all individuals
    Denote X_best as position of the current best individual
    for every solution X_i do
        Set initial energy E_0 and jump strength J (Eq. (3))
        Update E
        if |E| ≥ 1 then (exploration phase)
            Update the location vector
        end if
        if |E| < 1 then (exploitation phase)
            if r ≥ 0.5 and |E| ≥ 0.5 then (soft besiege)
                Update the location vector by soft besiege
            else if r ≥ 0.5 and |E| < 0.5 then (hard besiege)
                Update the location vector by hard besiege
            else if r < 0.5 and |E| ≥ 0.5 then (soft besiege with progressive rapid dives)
                Update the location vector by soft besiege with rapid dives
            else if r < 0.5 and |E| < 0.5 then (hard besiege with progressive rapid dives)
                Update the location vector by hard besiege with progressive rapid dives
            end if
        end if
    end for
    Generate X_best^qr
    Perform greedy selection between X_worst and X_best^qr
    Update iteration counter t = t + 1
end while
Return X_best
3.3 Proposed Improved HHO Algorithm - HHO-QRLRS
Based on previous research from this domain, one of the most efficient strategies to improve both exploration and exploitation is the quasi-reflection-based learning (QRL) procedure [17]. The QRL generates a solution on the opposite side of the search space from the current solution. If the current individual is, for example, in a sub-optimal domain, there is a good chance that its QRL peer will be near the optimal region. The quasi-reflexive-opposite individual X^{qr} of the solution X is generated in the following way:
X^{qr} = \text{rnd}\left(\frac{LB + UB}{2}, X\right),   (7)

where rnd((LB + UB)/2, X) generates a random number from a uniform distribution in the range between (LB + UB)/2 and X. This procedure is executed for each parameter of solution X in D dimensions. The proposed improved HHO adopts a relatively simple replacement strategy for the worst individual in the population, based on the QRL. This procedure is executed as follows: the quasi-reflexive current best solution (X_best^qr) is generated, and then a greedy selection between X_best^qr and the current worst solution X_worst is performed. The individual with the worse fitness is discarded from the population. This procedure is efficient in early as well as in later iterations. In earlier cycles, the QRL improves exploration, and in later phases of execution, under the assumption that the current best individual has converged to the proper part of the search space, it may improve exploitation. Motivated by the proposed improvements, the devised metaheuristic is named HHO with QRL replacement strategy (HHO-QRLRS). However, according to the NFL theorem, there is always a trade-off. The proposed HHO-QRLRS performs one more fitness function evaluation in each iteration, and thus its complexity can be described as O((N + 1) · (T + T · D + 1)). These facts were taken into consideration in the simulations to maintain a fair and objective comparison with other state-of-the-art techniques. The pseudo-code of the proposed HHO-QRLRS is given in Algorithm 1.
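As an illustration of the QRL-based replacement step described above, a minimal Python sketch is given below. The function names and the assumption of a minimization objective are ours, not part of the original HHO-QRLRS code.

import numpy as np

def quasi_reflexive(X, lb, ub, rng):
    """Eq. (7): component-wise uniform draw between (LB + UB)/2 and X."""
    mid = (lb + ub) / 2
    low, high = np.minimum(mid, X), np.maximum(mid, X)
    return rng.uniform(low, high)

def qrl_replacement(population, fitnesses, lb, ub, objective, rng):
    """Replace the worst individual if the quasi-reflexive best solution is better."""
    best = population[np.argmin(fitnesses)]       # minimization assumed
    worst_idx = np.argmax(fitnesses)
    candidate = quasi_reflexive(best, lb, ub, rng)
    cand_fit = objective(candidate)               # the one extra evaluation per iteration
    if cand_fit < fitnesses[worst_idx]:           # greedy selection
        population[worst_idx] = candidate
        fitnesses[worst_idx] = cand_fit
    return population, fitnesses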
4 Experimental Setup and Analysis
The structure of the proposed hybrid classifier, named HHO-QRLRS DNN, is similar to the approach stated in [14]. The dataset is first preprocessed by utilizing the min-max normalization method and afterwards encoded by applying the 1-N encoding algorithm. Dimensionality reduction is then performed on the dataset by utilizing the devised HHO-QRLRS swarm intelligence method. This step is necessary because over-fitting can happen during network training if the number of features is extensive. For additional information regarding the classifier structure, please refer to [14]. Two datasets were utilized in the experiments, namely the NSL-KDD and KDD Cup 99 datasets. The NSL-KDD dataset was created to address the issues raised in the literature about the KDD'99 malware and intrusion data. Even though it still has some undesirable characteristics, it has become a well-known benchmark dataset. Nonetheless, the lack of local IDS datasets and the complexity of gathering data make NSL-KDD a reliable choice for malware prevention and detection research. The set contains nearly five million records, making it suitable for machine learning while not being so large that researchers are forced to pick random parts of the set. As a result, the outcomes are easier to compare. To avoid bias in machine learning algorithms, the NSL-KDD dataset has been cleaned of redundant data, which is an improvement over the original KDD'99 dataset [10].
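For illustration, the preprocessing steps mentioned above (min-max normalization and 1-N, i.e. one-hot, encoding) could be sketched with pandas and scikit-learn as follows; the column names in the usage comment are assumptions, not a description of the actual experimental pipeline.

import pandas as pd
from sklearn.preprocessing import MinMaxScaler

def preprocess(df, categorical_cols):
    """One-hot (1-N) encode categorical columns, then min-max scale the numeric ones."""
    encoded = pd.get_dummies(df, columns=categorical_cols)          # 1-N encoding
    numeric_cols = encoded.select_dtypes(include="number").columns
    encoded[numeric_cols] = MinMaxScaler().fit_transform(encoded[numeric_cols])
    return encoded

# Example usage with hypothetical NSL-KDD-style categorical columns:
# data = preprocess(raw_df, categorical_cols=["protocol_type", "service", "flag"])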
The CICIDS2017 Intrusion Detection Evaluation Dataset was created in response to a lack of reliable and current cybersecurity datasets. Researchers' access to IDS datasets usually comes with its own set of issues, such as a lack of traffic inclusivity, attack variety, inadequate features, and other problems. The CICIDS2017 authors provide a dataset with accurate background traffic, which was created by abstracting the behavior of 25 users across a variety of protocols. The data was gathered over a five-day period, with four of those days exposed to a variety of attacks such as malware, DoS attacks, web attacks, and others. CICIDS2017 is one of the most recent datasets available to researchers, with over 80 network flow features.
4.1 Cross Validation
A few specific issues accompany machine learning and various aspects of artificial intelligence implementations. Overfitting is to blame for some of these problems: an overfitted model may perform admirably in the lab but will fail when applied to real-world data. To counteract this, the models are subjected to k-fold cross-validation, also known as rotation estimation. The training dataset is divided into k sets; the first k-1 are used for training and the remaining one for testing. The procedure is then repeated k times, each time with a different testing part. The folds are selected at random. In this study, 10-fold cross-validation was used.
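A minimal sketch of the k-fold procedure described above (with k = 10, as used in this study) is given below; the model and metric arguments are placeholders rather than the hybrid classifier itself, and X and y are assumed to be NumPy arrays.

import numpy as np
from sklearn.model_selection import KFold

def kfold_scores(model, X, y, metric, k=10, seed=42):
    """Train on k-1 folds, test on the held-out fold, repeat k times, return the mean score."""
    kf = KFold(n_splits=k, shuffle=True, random_state=seed)   # folds selected at random
    scores = []
    for train_idx, test_idx in kf.split(X):
        model.fit(X[train_idx], y[train_idx])
        pred = model.predict(X[test_idx])
        scores.append(metric(y[test_idx], pred))
    return np.mean(scores)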
4.2 Comparative Analysis and Discussion
The performance of the proposed DNN classifier for intrusion detection was evaluated on the NSL-KDD and KDD Cup 99 datasets and compared to similar approaches. The same experimental conditions were established for all algorithms. The suggested DNN classifier hybridized with the HHO metaheuristics operates in two distinct phases: reducing the dimensionality of the problem and performing the classification. The devised HHO metaheuristic is utilized in the first phase to reduce the number of dimensions and avoid overfitting. Afterwards, the DNN performs the classification of the intrusion detection datasets. Throughout the conducted experiments, we have used the same structure as in [14]. The DNN is fed at its inputs with the resulting dataset generated by the HHO. The DNN itself consists of 3 hidden layers, with ReLU as the activation function of choice, and two output neurons that use Softmax as the activation function. For more details about the structure and setup of the DNN, please refer to [14]. Comparative analysis for NSL-KDD and KDD Cup 99 is shown in Tables 1 and 2, respectively. The best results for each metric are marked in bold.
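The DNN structure described above translates roughly into the following Keras sketch; the hidden layer sizes are assumptions, and only the three hidden ReLU layers and the two-neuron softmax output follow the description borrowed from [14].

from tensorflow import keras
from tensorflow.keras import layers

def build_dnn(n_features, hidden_units=(64, 32, 16)):
    """DNN with 3 hidden ReLU layers and a 2-class softmax output (normal vs. attack)."""
    model = keras.Sequential([layers.Input(shape=(n_features,))])
    for units in hidden_units:                        # three hidden layers
        model.add(layers.Dense(units, activation="relu"))
    model.add(layers.Dense(2, activation="softmax"))  # two output neurons
    # categorical_crossentropy assumes one-hot encoded class labels
    model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
    return model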
Table 1 Performance validation of HHO-QRLRS-DNN, HHO-DNN and other hybrid approaches for NSL-KDD dataset

Metric      | SMO+DNN | PCA+DNN | DNN   | HHO+DNN | HHO-QRLRS+DNN
Accuracy    | 0.994   | 0.938   | 0.914 | 0.941   | 0.995
Precision   | 0.995   | 0.934   | 0.891 | 0.932   | 0.995
Recall      | 0.995   | 0.918   | 0.882 | 0.917   | 0.996
F-score     | 0.996   | 0.937   | 0.905 | 0.939   | 0.997
Sensitivity | 0.996   | 0.938   | 0.908 | 0.939   | 0.996
Specificity | 0.996   | 0.926   | 0.898 | 0.921   | 0.996
Table 2 Performance validation of HHO-QRLRS-DNN, HHO-DNN and other hybrid approaches for KDD Cup 99 dataset

Metric      | SMO+DNN | PCA+DNN | DNN   | HHO+DNN | HHO-QRLRS+DNN
Accuracy    | 0.928   | 0.898   | 0.909 | 0.912   | 0.930
Precision   | 0.927   | 0.884   | 0.896 | 0.908   | 0.928
Recall      | 0.928   | 0.898   | 0.909 | 0.910   | 0.929
F-score     | 0.927   | 0.882   | 0.894 | 0.905   | 0.929
Sensitivity | 0.928   | 0.898   | 0.909 | 0.907   | 0.928
Specificity | 0.930   | 0.885   | 0.882 | 0.901   | 0.931
The performance of the suggested hybrid HHO-QRLRS and DNN approach was evaluated in a comparative analysis with other competitive hybrid strategies. Together with the HHO-QRLRS hybridized DNN, we conducted experiments with the basic HHO hybridized DNN to show the differences between the improved and the basic HHO approach. The results for the other algorithms included in the comparative analysis, namely the PCA+DNN, SMO+DNN, and pure DNN (without dimensionality reduction) approaches, were derived from [14]. These algorithms were tested on the same datasets in [14], allowing a fair comparison. The comparative results and performances are presented in Table 1 for the NSL-KDD dataset and Table 2 for the KDD Cup 99 dataset. The results of the experimental simulations indicate that the suggested HHO-QRLRS enhanced DNN obtained superior performance over the basic HHO-DNN, PCA-DNN, and simple DNN. The proposed method achieved slightly better results than the SMO-DNN presented in [14]. The proposed HHO-QRLRS DNN method attained an accuracy of 0.995 on the NSL-KDD dataset and approximately 0.93 on the KDD Cup 99 dataset, while the SMO-DNN method achieved values of 0.994 and 0.928, respectively. The improvement in accuracy is more drastic when the HHO-QRLRS DNN is compared to the basic version of the HHO DNN, being approximately 5% on the NSL-KDD and almost 2% on the KDD Cup 99. The proposed method also
shows superior results in terms of the other metrics included in the research over the basic HHO-DNN, PCA-DNN, and DNN, with a slight improvement over the results achieved by SMO-DNN. The basic HHO-DNN achieved decent results, performing better than the DNN and PCA-DNN, but it was clearly outperformed by the SMO-DNN and the HHO-QRLRS driven DNN.
5 Conclusion
This manuscript proposes an improved HHO algorithm. We have devised and implemented the new algorithm, named HHO-QRLRS, to overcome the known drawbacks of the basic version of the HHO metaheuristics. The HHO-QRLRS method was then utilized in a hybrid DNN classifier for the intrusion detection problem. As the simulation results show, the proposed HHO-QRLRS approach significantly outperformed the basic HHO and other advanced approaches for the given problem. The proposed research has two significant contributions. First, the basic HHO metaheuristics was upgraded in a way that specifically addresses the known downsides of the algorithm. Second, the devised HHO-QRLRS method was used to hybridize the DNN that deals with intrusion detection, with promising results. Future work in this domain will proceed with experiments with the devised HHO-QRLRS and apply it to different problems, including cloud computing, sensor and ad hoc networks, and convolutional neural networks. The second part of the future work will also include improving other swarm intelligence algorithms and testing them on the intrusion detection problem.
References
1. Bacanin N, Bezdan T, Tuba E, Strumberger I, Tuba M (2020) Optimizing convolutional neural network hyperparameters by enhanced swarm intelligence metaheuristics. Algorithms 13(3):67
2. Bacanin N, Bezdan T, Tuba E, Strumberger I, Tuba M, Zivkovic M (2019) Task scheduling in cloud computing environment by grey wolf optimizer. In: 2019 27th telecommunications forum (TELFOR). IEEE, pp 1–4
3. Bacanin N, Bezdan T, Zivkovic M, Chhabra A (2022) Weight optimization in artificial neural network training by improved monarch butterfly algorithm. In: Mobile computing and sustainable informatics. Springer, pp 397–409
4. Bacanin N, Tuba E, Zivkovic M, Strumberger I, Tuba M (2019) Whale optimization algorithm with exploratory move for wireless sensor networks localization. In: International conference on hybrid intelligent systems. Springer, pp 328–338
5. Bezdan T, Milosevic S, Venkatachalam K, Zivkovic M, Bacanin N, Strumberger I (2021) Optimizing convolutional neural network by hybridized elephant herding optimization algorithm for magnetic resonance image classification of glioma brain tumor grade. In: 2021 Zooming innovation in consumer technologies conference (ZINC). IEEE, pp 171–176
6. Bezdan T, Petrovic A, Zivkovic M, Strumberger I, Devi VK, Bacanin N (2021) Current best opposition-based learning salp swarm algorithm for global numerical optimization. In: 2021 Zooming innovation in consumer technologies conference (ZINC). IEEE, pp 5–10
7. Bezdan T, Stoean C, Naamany AA, Bacanin N, Rashid TA, Zivkovic M, Venkatachalam K (2021) Hybrid fruit-fly optimization algorithm with k-means for text document clustering. Mathematics 9(16):1929
8. Bezdan T, Zivkovic M, Tuba E, Strumberger I, Bacanin N, Tuba M (2020) Glioma brain tumor grade classification from MRI using convolutional neural networks designed by modified FA. In: International conference on intelligent and fuzzy systems. Springer, pp 955–963
9. Bezdan T, Zivkovic M, Tuba E, Strumberger I, Bacanin N, Tuba M (2020) Multi-objective task scheduling in cloud computing environment by hybridized bat algorithm. In: International conference on intelligent and fuzzy systems. Springer, pp 718–725
10. Choras M, Marek P (2020) Intrusion detection approach based on optimised artificial neural network. Neurocomputing
11. Farzadnia E, Shirazi H, Nowroozi A (2021) A novel sophisticated hybrid method for intrusion detection using the artificial immune system. J Inf Secur Appl
12. Heidari AA, Mirjalili S, Faris H, Aljarah I, Mafarja M, Chen H (2019) Harris Hawks optimization: algorithm and applications. Future Gen Comput Syst 97:849–872
13. Hosseini S, Mohammed B (2020) New hybrid method for attack detection using combination of evolutionary algorithms, SVM and ANN. Comput Netw
14. Khare N, Devan P, Chodhary LC, Bhattacharya S, Singh G, Singh S, Yoon B (2020) SMO-DNN: spider monkey optimization and deep neural network hybrid classifier model for intrusion detection. Electronics, pp 16–18
15. Milosevic S, Bezdan T, Zivkovic M, Bacanin N, Strumberger I, Tuba M (2021) Feed-forward neural network training by hybrid bat algorithm. In: Modelling and development of intelligent systems: 7th international conference, MDIS 2020, Sibiu, Romania, 22–24 October 2020, Revised Selected Papers 7. Springer International Publishing, pp 52–66
16. Qureshi AUH, Larijani H, Mtetwa N, Javed A, Ahmad J et al (2019) RNN-ABC: a new swarm optimization based technique for anomaly detection. Computers 8(3):59
17. Rahnamayan S, Tizhoosh HR, Salama MMA (2007) Quasi-oppositional differential evolution. In: 2007 IEEE congress on evolutionary computation, pp 2229–2236. https://doi.org/10.1109/CEC.2007.4424748
18. Zivkovic M, Bacanin N, Tuba E, Strumberger I, Bezdan T, Tuba M (2020) Wireless sensor networks life time optimization based on the improved firefly algorithm. In: 2020 International wireless communications and mobile computing (IWCMC). IEEE, pp 1176–1181
19. Zivkovic M, Bacanin N, Venkatachalam K, Nayyar A, Djordjevic A, Strumberger I, Al-Turjman F (2021) Covid-19 cases prediction by using hybrid machine learning and beetle antennae search approach. Sustain Cities Soc 66:102669
20. Zivkovic M, Bacanin N, Zivkovic T, Strumberger I, Tuba E, Tuba M (2020) Enhanced grey wolf algorithm for energy efficient wireless sensor networks. In: 2020 Zooming innovation in consumer technologies conference (ZINC). IEEE, pp 87–92
21. Zivkovic M, Bezdan T, Strumberger I, Bacanin NKV (2020) Improved Harris Hawks optimization algorithm for workflow scheduling challenge in cloud-edge environment, p 15
22. Zivkovic M, Venkatachalam K, Bacanin N, Djordjevic A, Antonijevic M, Strumberger I, Rashid TA (2021) Hybrid genetic algorithm and machine learning method for covid-19 cases prediction. In: Proceedings of international conference on sustainable expert systems (ICSES 2020), vol 176. Springer Nature, p 169
Random Forest Classification and Regression Models for Literacy Data
Mayur Pandya and Jayaraman Valadi
1 Introduction
As of 2019, 26.62% of the total Indian population was in the age band 0–14 and 67% belonged to the age band 15–64. For India to sustain good economic growth as well as increase its human capital, education is one of the determining factors that will play an important role. The primary goal of the education system in India can be summarized as the process of teaching, learning, and training of human capital in schools and colleges. In this study, we analyzed data provided by the Ministry of Human Resources Development (MHRD), India [1] to develop models that estimate male, female, and overall literacy rates. Further, we employed appropriate thresholding to convert these data in order to build both single-label and multi-label classification models.
2 Information
Data and statistical analysis-based methods are on the rise. Their usage has become more and more prevalent in the everyday domains around us, for example, finance, health care, lifestyle, and education. The study of AI, as well as the use of AI in the field of education, is more prominent than ever before. The Indian education system has been a constant area of research. Improvement in infrastructure, teaching methods, quality, as well as the structure of the entire sector are just some of the many focal points that researchers stress upon. However, the changes have been slow-paced. But as the government of India brings in the new
National Education Policy, 2020 (NEP), some of the previous caveats of the older 1996 system are being addressed, such as the inclusion of skill-based learning at an early age. At the same time, the literacy rates of India need to improve to tackle declining employment rates and increase human capital. The application of machine learning-based methods, along with mainstream AI, is now finding its way into school classrooms. The quality of education imparted is thus slated for improvement. Machine learning models are used by researchers for enhancing student performance [2]. These models learn the behavioral patterns of each subject (student) and hence can identify weak and strong points. This in turn can be used by the same or different models to suggest ways to improve them [3].
3 Previous Work
Tanushree Chandra [4] studied the literacy landscape in India between 1987 and 2017. Her emphasis was mainly on the gender gap among children, youth, working-age adults, and the elderly. She also recommends launching adult literacy programs linked with skill development and vocational training, offering incentives such as employment and micro-credit, and including technology such as e-learning to bolster adult education, especially for females. Ritimoni Bordoloi [5] reports that open distance learning (ODL) can serve as an alternative for making education accessible at minimized cost. ODL can play a significant role in transforming and empowering the adult population of a country like India into productive human resources by providing need-based training and equipping them with need-based skills, which are necessary for maintaining a decent standard of living. Tarun Verma et al. [6] used the decision tree algorithm to predict literacy rates (national and international) using the 2011 census data, building on previous works. Saurabh Pal et al. [7] studied student performance analysis using decision trees to extract knowledge that describes students' performance in end-semester examinations. Aparna Samudra [8], using the data of the 2011 and 2007 censuses in India, simulated the trends in female literacy. Kavita Sheoron [9] used the 2015–2016 census data to build a structured dashboard that gives a reasonable picture of literacy in different regions of India.
4 Data Curation: Preprocessing and Feature Engineering
This paper used data shared by the Ministry of Human Resources Development (MHRD) in 2016 [1]. The raw data set consisted of 680 districts of different states of India (observations) and 819 features. Some of the features in this data are the number of government schools in the district, the percentage of urban population, the percentage of male literacy and female literacy, etc. Attributes such as growth rate, sex ratio, and private
and government school numbers in the district served as input variables, while the target variables were the male literacy rate, female literacy rate, and overall literacy rate. Using the meta-data provided, it was observed that multiple attribute entries could be summed to form a single attribute; for example, 'Primary only private schools', 'Primary with upper primary private schools', 'Upper primary only private schools', and 'Upper primary with secondary private schools' were all combined to form a single 'Total private schools in district' attribute. In this instance, four attributes were dropped and only the final summed-up attribute was kept. Following this method throughout the dataset, 620 features were replaced by their respective sum aggregates (20 such 'summed' attributes remain and 600 are removed). Many attributes that contained a 'No response' category were removed from the data because they did not help in our analysis. Examples containing missing values were removed for simplicity. As a result, 625 observations (district entries) and 222 features (attributes) remained [10]. This new data served as the raw data from this point on for our analysis; henceforth, in this paper we will refer to this data set as the curated data set. Attribute selection was carried out separately for each classification and regression task. For this, we employed random forest Gini importance ranking [11]. Classification models (both single and multi-label) based on literacy rates as target variables enabled us to determine the factors which help in classifying districts as high-literacy or low-literacy districts. Regression analysis was also carried out to predict the percentage of literate population in the district and to engineer some new features that can improve the prediction capabilities of both regression and classification models.
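As an illustration of the attribute-summing step described above, a possible pandas sketch is shown below; the column names are hypothetical stand-ins for the actual MHRD field names.

import pandas as pd

def sum_related_columns(df, groups):
    """Replace each group of related columns with a single summed attribute."""
    for new_col, old_cols in groups.items():
        df[new_col] = df[old_cols].sum(axis=1)
        df = df.drop(columns=old_cols)
    return df

# Hypothetical example: collapse four private-school counts into one attribute.
groups = {"total_private_schools": [
    "primary_only_private", "primary_with_upper_primary_private",
    "upper_primary_only_private", "upper_primary_with_secondary_private"]}
# curated = sum_related_columns(raw_df.dropna(), groups)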
5 Random Forest Classification and Regression Algorithms
5.1 Random Forests
In this work, we employed random forest classification and regression algorithms for the prediction of literacy. The random forest, originally developed by Breiman [12], is an improvement over classical bagging. A random forest consists of an ensemble of decision trees and employs two specific sources of randomness. In the first, the training set of each tree is constituted by a distinct bootstrap sample drawn with replacement. The second randomness concerns the attributes for node splitting: only a randomly selected fixed-size subset of attributes is considered in each node of each tree for the node splitting process. The first randomness enables a reduction in variance and an enhancement of generalization capabilities. The second randomness, along with an optimal choice of the subset size used for splitting, provides the best trade-off between information content and tree correlation. Also, with bootstrap sampling with replacement, every tree leaves out approximately one-third of the examples when building the training model. These left-out examples are known as out-of-bag (OOB) examples. By this, an additional performance measure, the OOB error
estimate, is available along with conventional cross-validation procedures for gauging performance. The random forest algorithm handles regression problems in quite a similar manner; there, the output is obtained by computing the mean of the predictions of the individual trees. The random forest has several desirable properties, which have made the algorithm very popular in different fields of science and engineering. The random forest has two feature selection methods embedded in the algorithm itself. The first one estimates the mean decrease in Gini across the nodes in a tree and across the entire forest. The second method randomly permutes one attribute at a time and estimates the mean decrease in accuracy.
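A brief scikit-learn sketch of the two estimators described above, with the OOB estimate enabled, is given below; the hyperparameter values shown are placeholders, not the tuned settings reported in Sect. 5.4.

from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor

# Classification: bootstrap sampling plus a random feature subset at each split.
clf = RandomForestClassifier(n_estimators=300, max_features="sqrt",
                             oob_score=True, random_state=0)
# Regression: the prediction is the mean over the individual trees.
reg = RandomForestRegressor(n_estimators=300, max_features="sqrt",
                            oob_score=True, random_state=0)

# After clf.fit(X, y), clf.oob_score_ gives the out-of-bag accuracy estimate,
# complementing conventional cross-validation.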
5.2 Feature Ranking and Feature Selection
We further trimmed the preprocessed data set by feature ranking and feature selection. For both classification and regression problems, we used the mean decrease in Gini importance embedded in the random forest algorithm itself. The mean decrease in Gini importance is a measure of an attribute's importance, predictive power, and correlation with the output. We ran the random forest with default settings, estimated the mean decrease in Gini for each of the features, and thereby ranked the attributes. Subsequently, we built models with all attributes and then removed the lowest-ranking attribute one at a time, built a model for each subset consisting of the remaining attributes, and estimated the performance measure. We chose the optimal subset having the highest value of the performance measure. We employed the same method for both classification and regression problems.
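The ranking-and-elimination procedure described above can be sketched as follows; this is a simplified illustration in which the model, scoring metric, and cross-validation settings are assumptions rather than the exact experimental configuration.

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

def backward_elimination(X, y, feature_names, metric="accuracy"):
    """Rank features by mean decrease in Gini, then drop the weakest one at a time."""
    rf = RandomForestClassifier(random_state=0).fit(X, y)
    order = np.argsort(rf.feature_importances_)[::-1]      # best to worst
    best_score, best_subset = -np.inf, order
    for k in range(len(order), 0, -1):                     # shrink the subset one feature at a time
        subset = order[:k]
        score = cross_val_score(RandomForestClassifier(random_state=0),
                                X[:, subset], y, scoring=metric, cv=5).mean()
        if score > best_score:
            best_score, best_subset = score, subset
    return [feature_names[i] for i in best_subset], best_score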
5.3 Models Developed and Simulations
Random forests for regression were used to constitute the model for the overall literacy rate regression analysis. The metric used to evaluate the model was the R2 metric. R2 explains the degree to which the input variables can explain the variation in the output: the higher the R2, the more of the variation is explained by the inputs. For these models we considered both (a) the entire data set and (b) the optimal subset of attributes. We employed R2 as the performance measure, tuned the random forest hyperparameters, and selected informative subsets which maximize R2. For an unbiased estimate of performance, we further employed a five-fold cross-validation procedure. In this procedure, the training data are first divided randomly into five different folds. Subsequently, five different models are built with four folds, with one fold reserved for testing the model. The mean of the five different test R2 values is computed as the five-fold R2. We performed tuning of the random forest hyperparameters, viz. the number of trees, the subset of attributes considered for node splitting, and the maximum depth of the trees, employing a standard grid search.
Table 1 Mean values of all 3 literacy rates

Label                 | Mean value
Female literacy rate  | 62
Male literacy rate    | 82
Overall literacy rate | 72
R^2 = 1 - \frac{RSS}{TSS}, \quad RSS = \sum_{i=1}^{n} (y_i - f(x_i))^2, \quad TSS = \sum_{i=1}^{n} (y_i - \bar{y})^2
where n is the total number of samples, y_i is the value of sample i, \bar{y} is the mean value of the samples, and f(x_i) is the predicted value of the sample. We further developed three binary classification models using appropriate thresholds to group the instances into: females literate or not literate, males literate or not literate, and overall literate or not literate. We also found that the median values were very close to the mean values. Table 1 shows the mean values of all 3 classes. Examples below the threshold were considered illiterate (binary value '0') and examples above the threshold were considered literate (binary value '1'). For binary classification, we used the Matthews correlation coefficient (MCC) as the performance measure, tuned the random forest hyperparameters, and selected subsets that maximize MCC. For an unbiased estimate of performance, we further employed a five-fold cross-validation procedure.
MCC = \frac{TP \cdot TN - FP \cdot FN}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}}
In addition to MCC, accuracy, precision, and recall were also calculated.
Accuracy = \frac{TP + TN}{TP + TN + FP + FN}

Precision = \frac{TP}{TP + FP}

Recall = \frac{TP}{TP + FN}
where TP is the number of true positive examples, TN is the number of true negative examples, FP is the number of false positive examples, and FN is the number of false negative examples. This literacy data set is amenable to multi-label classification [13]. Several classification data sets in different fields of science and engineering can be formulated as multi-label classification problems. In multi-label classification, a given example can have more than one label, and the literacy data can be treated as a two-label problem. Finally, we developed a multi-label classification model to predict, for a given instance, the presence or absence of one or more labels, viz. male literates and female literates. For instance,
the label vector [1,0] corresponds to an example in which only females are literate; the label vector [0,1] corresponds to an example in which only males are literate; the label vector [1,1] corresponds to examples in which both females and males are literate; and the label vector [0,0] corresponds to examples in which neither females nor males are literate. Multi-label problems can be handled by (1) problem transformation and (2) algorithm adaptation methods. Problem transformation methods modify the data to suit already available classification algorithms like support vector machines and random forests. In this case, we used the problem transformation method known as binary relevance, with random forest as our base classifier, to formulate our model. For hyperparameter tuning, we used the Hamming loss (HL) as an indicator of improvement:

HL = \frac{1}{|N| \cdot |L|} \sum_{i=1}^{|N|} \sum_{j=1}^{|L|} XOR(y_{ij}, z_{ij})
where y_{ij} is the target, z_{ij} is the prediction, and XOR is the 'exclusive or' operator that returns zero when the target and prediction are identical and one otherwise. In addition to the Hamming loss, the macro F1-score was also calculated. The macro F1-score is defined as the mean of the class-wise/label-wise F1-scores:

F1\text{-}score = \frac{TP}{TP + \frac{1}{2}(FP + FN)}, \quad Macro\ F1\text{-}score = \frac{1}{N} \sum_{i=1}^{N} F1\text{-}score_i
where i is the class/label index and N is the number of classes/labels.
Regression Analysis of Female Literacy Rates
Using the curated data set, we obtained an R2 value of 0.6144. We then applied Gini importance based feature selection on this data, employing the methodology described in Sect. 5.2. After further tuning the parameters, the model with the top features obtained a coefficient of determination of 0.7061. The top 10 features are tabulated rank-wise in Table 2.
Regression Analysis of Male Literacy Rates
Following the same procedure, the number of features was reduced from 222 (curated dataset) to 22. Without feature selection, the R2 value for the prediction of male literacy was obtained as 0.51862. On application of feature selection, we found that the top features carried the maximum predictive power, yielding a much improved R2 value of 0.6186. The top 10 features are tabulated rank-wise in Table 3.
Table 2 Feature importance (female literacy rate) by decrease in Gini impurity

Features                                             | Decrease in Gini impurity
Total teachers in age group of 55 to 56              | 18.9974
Percentage of rural population in district           | 15.915
Total enrollment in schools                          | 12.6813
Sex-ratio                                            | 7.016
Total number of schools in the district              | 5.3468
Percentage of schools in the rural areas of district | 4.469
Number of villages in the district                   | 4.2708
Total number of teachers in private schools          | 308596
Total enrollment in government rural schools         | 3.5177
Total teachers in age group of 59 to 60              | 2.9009
Table 3 Feature importance (male literacy rate) by decrease in Gini impurity

Features                                            | Decrease in Gini impurity
Total teachers in age group of 55 to 56             | 12.2113
Total enrollment in government rural schools        | 7.5766
Total enrollment in schools                         | 7.3108
Total number of teachers in private schools         | 5.7725
Total teachers in age group of 57 to 58             | 5.6375
Percentage of schools in the rural area of district | 5.4313
Schools having computers                            | 5.0569
Total teachers in age group of 59 to 60             | 4.9038
Percentage of urban population in district          | 4.2905
Percentage of rural population in district          | 4.0384
Regression Analysis of Overall Literacy Rates
Using the curated dataset without attribute selection, our model provided an R2 value of 0.5910. We further carried out feature elimination using the Gini importance based feature selection procedure. Finally, for the overall literacy rate regression we were left with 24 features. With the top 24 features and optimally tuned parameters, the model achieved a much higher R2 value of 0.7136. We illustrate the top 10 ranked features for overall literacy rates in Table 4.
Classification of Districts Based on Female Literacy Rates
For female literacy classification, we found that the mean literacy rate of the curated examples was approximately 62%. As mentioned in Table 1, we kept this as the threshold to divide the data into literacy and non-literacy classes. The tuned model with the entire curated data yielded a cross-validation MCC value of 0.5354, and with feature selection the model gave a cross-validation MCC value of 0.7501. Table 5 displays the top 10 features for female literacy.
Table 4 Feature importance (overall literacy rate) by decrease in Gini impurity

Features                                                | Decrease in Gini impurity
Total enrollment in schools                             | 14.1638
Total number of schools in the district                 | 11.2578
Percentage of rural population                          | 11.1104
Growth rate                                             | 8.1829
Sex ratio                                               | 6.6142
Total teachers in age group of 55 to 56                 | 5.6486
Percentage of urban population in district              | 5.5041
Total number of regular female teachers                 | 4.3764
Ratio of enrollment to population age group of 6 to 13  | 4.2581
Total teachers in age group of 57 to 58                 | 3.4852
Table 5 Feature importance (Female literacy) by decrease in Gini impurity

Features                                                  | Decrease in Gini impurity
Number of blocks in the district                          | 10.5263
Number of villages in the district                        | 9.4351
Population of children in the age group of 6 to 15        | 9.3672
Total aided government schools in the district            | 8.7684
Number of clusters in the district                        | 5.9027
Percentage of urban population in the district            | 5.4236
Total number of schools in the district                   | 5.2477
Growth-rate                                               | 4.0313
Total enrollment in the private schools of the district   | 3.7294
Percentage of scheduled caste population in the district  | 2.9734
Classification of Districts Based on Male Literacy Rates
For male literacy classification, we found that the mean literacy rate of the curated examples was around 82%; we kept this as the threshold to divide the data into literacy and non-literacy classes. The tuned model with the entire curated data yielded a cross-validation MCC value of 0.5233, and with feature selection gave a cross-validation MCC of 0.7336. Table 6 displays the top 10 features for male literacy.
Table 6 Feature importance (Male literacy) by decrease in Gini impurity

Features                                                  | Decrease in Gini impurity
Number of villages in the district                        | 7.3005
Number of blocks in the district                          | 7.2928
Total aided government schools in the district            | 7.0905
Percentage of scheduled caste population in the district  | 6.0932
Population of children in the age group of 6 to 15        | 5.8287
Number of clusters in the district                        | 5.4956
Percentage of urban population in the district            | 4.9948
Total number of schools in the district                   | 4.3116
Total enrollment in the private schools of the district   | 4.2606
Growth-rate                                               | 3.7726
Table 7 Feature importance (Overall literacy) by decrease in Gini impurity

Features                                                  | Decrease in Gini impurity
Number of blocks in the district                          | 9.5228
Number of villages in the district                        | 9.1134
Total number of aided government schools in the district  | 7.623
Population of children in the age group of 6 to 15        | 7.4674
Number of clusters in the district                        | 5.8205
Percentage of urban population in the district            | 5.6696
Percentage of scheduled caste population in the district  | 4.7153
Total number of schools in the district                   | 4.5914
Total enrollment in the private schools of the district   | 4.1864
Population of children in the age group of 0 to 6         | 3.5639
Classification of Districts Based on Overall Literacy Rates
Referring to Table 1, it was concluded that the mean overall literacy rate is around 72%; hence values greater than 72% were set to '1' (high literacy district) and values lower than 72% were set to '0' (low literacy district). The curated data was used initially, and later on, using feature importance, the number of features was reduced. The tuned model with the entire curated data yielded a cross-validation MCC value of 0.5237, and with feature selection gave a cross-validation MCC of 0.7654. Table 7 tabulates the top 10 features ranked according to decrease in Gini impurity.
Table 8 Feature importance for multi-label (Male and Female literacy) by decrease in Gini impurity

Features                                                                  | Decrease in Gini impurity
Percentage of urban population in the district                            | 9.2784
Growth-rate                                                               | 6.7870
Total enrollment in the government rural schools of the district          | 6.4956
Number of schools established in 2001 or prior to 2001                    | 6.2344
Number of primary level schools having parents to teacher ratio above 30  | 5.2755
Total number of single teacher schools in the district                    | 5.0826
Total number of class rooms available                                     | 4.4456
Number of professionally qualified female teachers                        | 4.1518
Total number of schools having computers                                  | 3.5731
Total number of teachers in age band of 59 to 60                          | 3.1543
Multi-label Classification Using Male and Female Literacy Rates
In the binary relevance algorithm, we converted our two-label problem into two binary single-label classification problems. For the first binary classifier, the examples in which males are literate are considered positive examples; all other examples, irrespective of the presence or absence of female literacy, are considered negative examples. For the second binary classifier, the examples in which females are literate are considered positive examples; all other examples, irrespective of the presence or absence of male literacy, are considered negative examples. So two different models are built, and any given test example is sent through both models to get a final decision (0,0 or 0,1 or 1,0 or 1,1) (Table 8).
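A compact sketch of the binary relevance scheme with a random forest base classifier, as described above, is given below. It uses scikit-learn's MultiOutputClassifier as a stand-in implementation (one independent classifier per label) and is not the exact experimental code.

from sklearn.ensemble import RandomForestClassifier
from sklearn.multioutput import MultiOutputClassifier
from sklearn.metrics import hamming_loss, f1_score

# Y has two columns, [female_literate, male_literate], each taking the value 0 or 1.
def binary_relevance_rf(X_train, Y_train, X_test, Y_test):
    model = MultiOutputClassifier(RandomForestClassifier(random_state=0))
    model.fit(X_train, Y_train)          # fits one random forest per label
    Y_pred = model.predict(X_test)       # e.g. [0,0], [0,1], [1,0] or [1,1]
    return (hamming_loss(Y_test, Y_pred),            # matches the HL formula above
            f1_score(Y_test, Y_pred, average="macro"))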
5.4 Hyper-parameter Tuning Using Grid Search
Hyper-parameter tuning was performed to find the optimum values of the depth of trees, the number of features considered per split, the minimum number of samples in a leaf, the minimum number of samples per split, and the number of estimators.
Parameter Tuning for Regression
The performance measure was chosen to be the R2 value in this case. Refer to Table 9 for the optimum parameter values for the female, male, and overall literacy rates, respectively.
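A sketch of the grid search over the hyperparameters listed above is shown below; the candidate values are illustrative and not the grids that were actually searched.

from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

param_grid = {
    "n_estimators": [180, 280, 300, 320],   # number of trees
    "max_features": [3, 4, 5, 7],           # mtry: features tried at each split
    "max_depth": [60, 80, 100, 120],
    "min_samples_leaf": [1, 2, 4],
    "min_samples_split": [2, 5],
}
search = GridSearchCV(RandomForestClassifier(random_state=0),
                      param_grid, scoring="matthews_corrcoef", cv=5)
# search.fit(X, y); search.best_params_ then holds the tuned values (cf. Tables 9-11).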
Table 9 Tuned parameter values for all regression models

Parameter       | Female literacy rate | Male literacy rate | Overall literacy rate
Number of trees | 300                  | 320                | 300
mtry            | 3                    | 5                  | 4
Max depth       | 100                  | 100                | 120

Table 10 Tuned parameter values for all classification models

Parameter       | Female literacy | Male literacy | Overall literacy
Number of trees | 280             | 300           | 300
mtry            | 4               | 4             | 5
Max depth       | 80              | 120           | 100

Table 11 Tuned parameter values for multi-label classification model

Parameter       | Male and Female literacy model
Number of trees | 180
mtry            | 7
Max depth       | 60
Parameter Tuning for Binary Classification
The performance measure was chosen to be MCC for the classification models. Table 10 shows the optimum values of the parameters observed for the 3 models.
Parameter Tuning for Multi-label Classification
For the multi-label dataset of male and female literacy rates, we used the Hamming loss as the metric. The Hamming loss calculates the loss generated in the bit string of class labels during the prediction; it does this by applying the exclusive or (XOR) operator between the actual and predicted labels and then averaging across the dataset. Table 11 lists the values of the parameters obtained while minimizing the Hamming loss.
6 Results and Discussions
6.1 Results
As mentioned previously, Table 12 shows the R2 values with the curated data-set and tuned parameters. It can be observed that the male literacy rate model without feature selection obtained the lowest R2 score of 0.51862. Using feature selection and parameter tuning, an increase of approximately 19% in the R2 value for the male literacy rate model was observed.
Table 12 Regression analysis on curated dataset without feature selection

Regression model | R squared
Female           | 0.6144
Male             | 0.51862
Overall          | 0.5910

Table 13 Regression analysis after feature selection

Regression model      | R squared | Percent increase in R-squared
Female (19 features)  | 0.7061    | 14.92516
Male (23 features)    | 0.6186    | 19.2780
Overall (24 features) | 0.7136    | 20.7445

Table 14 Classification analysis on curated dataset

Classification model     | Accuracy | MCC    | Precision | Recall
Female (Model 5.3.4(a))  | 0.8128   | 0.5345 | 0.7708    | 0.5522
Male (Model 5.3.5(a))    | 0.8082   | 0.5233 | 0.7518    | 0.5522
Overall (Model 5.3.6(a)) | 0.7117   | 0.5237 | 0.7800    | 0.7800
For female literacy rates using all the features, the R2 score of 0.6144 was the highest; however, with feature selection and parameter tuning, the percentage rise in the R2 value was the lowest among all 3 models. Table 13 shows the optimum values of R2 along with the percentage rise in value after feature selection and parameter tuning. It is clear that there is a significant rise in the R2 value for the overall literacy rate. A satisfactory percentage increase is seen in the case of female rates also. Table 14 shows the performance of the models using the curated data-set and tuned parameters. The 3 models achieve satisfactory accuracies, but the cross-validation MCC values here are very low. Table 15 shows the results of the models after feature selection and hyper-parameter tuning. The cross-validation MCC values achieved by these models show a significant rise in all 3 cases (overall, male, female). Cross-validation MCC for the female literacy class registered the highest value of 0.5345 with curated data; however, after the feature selection process, the cross-validation MCC value for the same binary classification problem was 0.7501. The highest rise in cross-validation MCC after parameter tuning and feature selection was recorded by the overall literacy class, with the value 0.7654. Precision and recall are also reported for the sake of convenience. Table 16 shows the model performance for multi-label classification. The first row describes the performance evaluated on the curated data-set, while the second row describes the results after the feature selection process, as mentioned previously.
Table 15 Classification analysis after feature selection

Classification model                  | Accuracy | MCC    | Precision | Recall
Female (Model 5.3.4(b), 23 features)  | 0.8700   | 0.7501 | 0.8088    | 0.8111
Male (Model 5.3.5(b), 21 features)    | 0.8145   | 0.7336 | 0.8181    | 0.8372
Overall (Model 5.3.6(b), 27 features) | 0.8102   | 0.7654 | 0.8530    | 0.8355

Table 16 Multi-label classification of male and female literacy

Classification case            | Accuracy | Hamming loss | F1 score (macro)
Male and female (222 features) | 0.7133   | 0.3747       | 0.6888
Male and female (34 features)  | 0.8940   | 0.0826       | 0.9222
A clear decrease in the Hamming loss value, from 0.3747 to 0.0826, justifies the model's performance in this case. The accuracy and F1 (macro) scores are also reported for convenience.
6.2 Discussion Based on Inferences
Using the model inferences from the previous sections and viewing the results, it is clear that the model performance for classification and regression was more favorable when fewer attributes were used than when all features from the curated dataset were used. Therefore, the accuracy depends both on the characteristics of the random forest algorithm and on the number of attributes. Based on the important features from all models of regression as well as classification, the features can be sub-categorized into 3 different categories, namely district features, infrastructure features, and human resources dedicated to education. Table 17 describes the 3 categories and all the corresponding features. Ranking features in this way provides us with better insights when it comes to improving the education quality in a particular district, because it can provide information such as the feature values in districts that have higher male and female literacy rates. While some districts of Himachal Pradesh such as Kullu, Mandi, and Solan have an average female literacy rate of 80%, other districts such as Hamirpur and Bilaspur have an average of 1%. This is just one of the many example
Table 17 Features categorization

District features:
- Percentage of urban population
- Percentage of rural population
- Sex-ratio
- Growth rate
- Total enrollment
- Ratio of enrollment to child population in age band of 0 to 15

Infrastructure features:
- Total number of classrooms available
- Total government schools in urban areas
- Total government schools in rural areas
- Total private schools in urban areas
- Total private schools in rural areas
- Schools that were established in 2001 or prior to that
- Total schools with computers

Human resources dedicated for education:
- Total regular male teachers
- Total regular female teachers
- Total number of female teachers
- Total number of male teachers
- Teachers in age band of 55–56 years
cases in which, using the above-mentioned features, one can try to understand the sharp inequality in female literacy rates. Of course, other features such as the area of the district in square kilometers might play a role in some cases, but in the majority of cases it is clear that this is not so. Building further upon these categorizations, we also managed to directly correlate the top-ranked features with the 3 literacy rates. We found that states that have female literacy rates higher than the mean value also tend to have a higher number of female teachers in both private and government schools. Both male and female literacy rates are high in districts having high infrastructure feature values, stating the importance of basic infrastructure being available to the common masses. Literacy rates are poor in districts that show lower values of both district features as well as human resources dedicated to education. This further indicates that the importance of education, as well as its advantages for the overall development of the district, needs to be taught and propagated using mass communication mediums. Ranking of features provided some interesting observations; at the same time, this method increased the model performance by selecting important features from the curated dataset. The total enrollment in the district, as mentioned previously, is a feature common to both regression and classification models. Total enrollment
Total enrollment displays a positive correlation with some other important features such as the total number of male teachers and the total number of female teachers. Another important factor for the male, female, and overall literacy rates is the engineered feature of the ratio of total enrolled children to the total child population in the age band of 6–15. This feature shows a positive correlation with all 3 literacy rates. The feature total number of teachers, both in private schools and in government schools, explains the availability of teachers in the district and also displays a higher correlation with total enrollment. Both the total number of teachers in private schools and the total number of teachers in government schools positively correlate with all 3 literacy rates.
Incidentally, schools that have attached pre-primary sections have a higher rate of maintaining their enrollment figures, as both of these figures share a positive correlation of 0.6877 with each other. Schools with attached pre-primary sections are also an important feature for the multi-label model, and this factor also shares a positive correlation with all 3 literacy rates. This can indicate that the teachers who work there have more experience, as they teach a wide range of subjects from pre-primary to primary and secondary levels. Schools that have been running for 15 years (established in 2001 or before) are also an important feature for the multi-label model. This feature has a strong positive correlation of 0.70 with the total enrollment factor, which indicates that the school runs the curriculum efficiently and has a good administration behind it. One of the important features found in all 3 classification models is the availability of computers as well as electricity. Both of these features have a positive correlation (0.2597) with the overall literacy of the district. The availability of such fundamental resources in schools is essential, as it helps in the all-round development of students. The sex ratio factor is an important common factor for both regression and classification models; it is defined here as the number of females in the district per every 100 males.
An initial claim can be made that the sets of top-ranking features in the individual curated datasets of female, male, and overall literacy rates are different. For instance, in the case of the female literacy rate, a factor such as total female teachers in private schools that are professionally qualified is an important feature, whereas for male literacy rates the same factor is absent as a top feature, even though it shows a positive correlation of 0.3155 with male literacy rates. There is a positive correlation (0.5077) between the percentage of urban population and the overall literacy rate, indicating that the higher the urban population percentage in the district, the higher the overall literacy rate. Equivalently, a negative correlation is observed between the percentage of rural population of a district and its overall, male, and female literacy rates. Percentage of rural population is an important factor for both male and female literacy rates. Coincidentally, one of the engineered features, viz. total enrollment in the government rural schools of the district, also has a negative correlation of -0.3152 with the overall literacy rate of the district.
Percentage of scheduled tribe population in the district is one of the important factors for male literacy rates in the district, however, it has a negative correlation with overall, male and female literacy rates of the district. The importance of education for children in the scheduled caste and scheduled tribes of the society needs to have more
penetration. Percentage enrollment of children from scheduled tribes and scheduled castes has a negative correlation with the literacy rates of the district. Some of the underlying factors, such as incentives for textbooks and incentives for uniforms, also have negative correlations with all 3 literacy rates. This in turn suggests that such structures, whilst created to encourage the education of children from backward classes, need to have better penetration and conversion rates. Another set of factors is the number of teachers in the private sector and the government sector, which is split into two parts: professionally qualified and not qualified. Engineering a new set of features by taking ratios of the corresponding professionally qualified teachers to not-qualified teachers helps in explaining the data and also shows a positive correlation with male, female, and overall literacy rates. One feature that stands out in this set and has the highest positive correlation with all 3 literacy rates is the total number of regular female teachers. The ratio of teachers who undergo professional training to regular teachers who do not undergo any professional training also maintains a positive correlation with all 3 literacy rates. While this might be a good indication, teachers on contract having qualifications below secondary, secondary, higher secondary, and graduate level have a stronger correlation (0.5677) compared to those being postgraduates (0.2877) or having a PhD (0.1247), indicating that the influx of teachers with higher qualifications needs to be greater than it is presently. The parents-to-teacher ratio, which is the ratio of the total number of children (one child per 2 parents) to the number of teachers available, has a negative correlation with all 3 literacy rates; a related feature, primary-level schools having a parents-to-teacher ratio above 30, is an important feature for the multi-label classification model. This is an indication that as the number of enrollments rises in the district, the total number of teachers available to impart education needs to increase. This claim can be further solidified as the factor 'number of single teacher schools', in which there is only one teacher per school, also shows a negative correlation of 0.3853 with overall literacy, 0.30877 with male literacy, and 0.36811 with female literacy rates.
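For illustration only, the kind of pairwise Pearson correlations quoted in this discussion can be reproduced with pandas on a curated district-level table; the file name and column names below are hypothetical placeholders, not the authors' actual dataset schema.

```python
# Hedged sketch: compute Pearson correlations between district features and
# literacy rates. Column names and the CSV file are assumptions for illustration.
import pandas as pd

df = pd.read_csv("curated_districts.csv")           # assumed curated dataset
cols = ["overall_literacy", "male_literacy", "female_literacy",
        "pct_urban_population", "total_enrollment", "single_teacher_schools"]
corr = df[cols].corr(method="pearson")               # pairwise correlation matrix
print(corr["overall_literacy"].sort_values())        # sign/strength per feature
```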
6.3 Conclusions In this work, we have employed a robust random forest algorithm for classification and regression analysis of literacy data. We employed appropriate thresholding to convert the male and female literacy rates into two labels. We employed the Gini ranking embedded in the random forest algorithm itself to select the top-ranking informative subsets for both classification and regression problems. The ranked subsets provided considerable enhancement in performance. The top-ranked features, which provide valuable domain information that can be used to improve literacy rates, were also listed. We further used the binary relevance algorithm for the multi-label literacy classification problem. This study can serve as an initial step to analyze the factors that can help the Indian education system to improve literacy rates.
Declarations Conflict of Interest: The authors declare that they have no conflict of interest. Code Availability: github.com/MerliN-47/Indian_education_data_2015-16.git.
References 1. Kaggle data page. https://www.kaggle.com/rajanand/education-in-india 2. Jain S, Bindal S, Goel R, Aggarwal G (2021) Indian literacy analysis using machine learning algorithms. Smart Sustain Intell Syst 191–204 3. Anozie N, Junker BW (2006) Predicting end-of-year accountability assessment scores from monthly student records in an online tutoring system. In: Educational data mining: papers from the AAAI workshop. AAAI Press, Menlo Park 4. Chandra T (2019) Literacy in India: the gender and age dimension. Observ Res Found 322 5. Bordoloi R (2018) Transforming and empowering higher education through open and distance learning in India. Asian Assoc Open Univ J 6. Verma T, Raj S, Khan MA, Modi P (2012) Literacy rate analysis. Int J Sci Eng Res 3(7) 7. Baradwaj BK, Pal S (2012) Mining educational data to analyze students’ performance. arXiv preprint arXiv:1201.3417 8. Samudra A (2014) Trends and factors affecting female literacy-an inter-district study of Maharashtra. Int J Gender Women’s Stud 2(2):283–296 9. Sheoron K. Literacy rate analysis dashboard 10. Thankachan JA, Srinivasan B (2020) A Machine learning approach to analyze and predict the factors in education system: case study of India. In: Proceedings of international conference on communication and computational technologies (ICCCT 2021) 11. Pedregosa F, Varoquaux G, Gramfort A, Michel V, Thirion B, Grisel O et al (2011) Scikit-learn: machine learning in python. J Mach Learn Res 12:2825–2830 12. Breiman L (2001) Random forests. Mach Learn 45(1):5–32 13. Zhang ML, Zhou ZH (2013) A review on multi-label learning algorithms. IEEE Trans Knowl Data Eng 26(8):1819–1837
Chapter 19
Towards Robotic Knee Arthroscopy: Spatial and Spectral Learning Model for Surgical Scene Segmentation Shahnewaz Ali and Ajay K. Pandey
1 Introduction Knee arthroscopy is a minimally invasive surgical (MIS) procedure used to treat knee joints in both a diagnostic and a therapeutic manner. A typical arthroscopic procedure requires insertion of a surgical tool and camera through small incisions, which results in minimal tissue scarring, less surgical trauma, less blood loss, and fast recovery of patients [1]. However, arthroscopy remains among the most complex MIS procedures and imposes several challenges on surgeons. Surgeons often multitask while performing surgery; such tasks include looking at a screen to inspect knee anatomy while moving imaging devices inside the confined knee cavity and manipulating the patient's leg using flexion and extension movements to get better visualization and tool navigation. The lack of complete visualization of different tissues inside the knee cavity presents a steep learning curve for new surgeons [2]. The knee joint has complex ergonomics: it is a very confined space with a convex-concave bone structure, which puts limits on the physical dimensions of imaging devices. Most often, imaging devices can only capture a small fraction of the joint structure with a 30- or 70-degree field-of-view (FoV) in two-dimensional space. Therefore, the lack of perception and contextual information, the indirect vision of the surgical space, and the lack of haptic feedback are the fundamental factors that make arthroscopy a challenging procedure. As a result, some unintentional tissue damage is very common in knee arthroscopy [3, 4]. Robot-assisted minimally invasive surgery (RMIS) is currently receiving high research interest and has high potential to mitigate the limitations and challenges of MIS. Precise navigation, resections, and extended dexterity are the major benefits of this platform, which has high potential to mitigate limitations associated with knee S. Ali (B) · A. K. Pandey School of Electrical Engineering and Robotics, Faculty of Engineering, Queensland University of Technology, Brisbane, QLD 4001, Australia e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 M. S. Uddin et al. (eds.), Proceedings of International Joint Conference on Advances in Computational Intelligence, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-19-0332-8_19
Fig. 1 Robotic knee arthroscopy. The left image shows the arthroscopic tools mounted on the Kinova robotic arm; the right image represents the workflow where the robotic arm holds the camera and light source, performs tool incisions, and guides surgical tools inside the knee cavity
arthroscopy, therefore, there is a trend to move MIS to RMIS [5, 6]. Though this platform technology has already been applied in several MIS procedures, such as laparoscopy, an autonomous robotic surgical platform for arthroscopy is still an open challenge. Figure 1 shows a typical RMIS setup developed for knee arthroscopy. Our group has made significant contributions in addressing key visualization and navigation issues. Conventional monocular cameras have been replaced by a prototype version of a miniaturized stereo camera with the aim of capturing surgical scenes with a relatively higher FoV (110 degrees) in a stereopsis manner to alleviate the lack of depth perception. Towards vision-based robotic navigation and localization systems, in this study we present a semantic segmentation model which addresses three mainstream problems, namely (i) situational awareness, (ii) safety, and (iii) localization in the form of a 3D-segmented surgical scene (maps). A summary of our contributions in this paper is as follows.
I. We propose a new architecture to segment knee arthroscopic scenes which takes the benefits of multi-scale feature propagation, attention modules, and embedded shape features of the key tissue structures.
II. We further integrate a network that extracts shape information of key tissue structures from arthroscopic image data.
III. Our proposed network takes spectral and spatial image data. The spectral image data is reconstructed from the RGB images.
IV. Though channel-wise features have been addressed previously in deep network architectures, we demonstrate that reconstructed spectral images still carry important features that ease the segmentation procedure compared to RGB space in fully convolutional networks.
2 Background Study Semantic segmentation can be defined as a pixel-level classification process. Conventional computer vision algorithms use pixel-level information such as features, textures, and pixel intensity information to segment video frames [7, 8]. The development of a fully convolutional network (FCN) achieved success in segmenting medical images and others [9–13], however, surgical or endoscopic scene segmentation still remains challenging especially when it comes to addressing multi-class segmentation in a domain where intra-class variability is large but inter-class variability is relatively less [15–17]. The situation becomes more complicated when domain-specific constraints are not relaxed such as lack of dataset. As a matter of fact, in the context of surgical scene segmentation the process of creating ground truth labels is always a tedious, expensive, and prone to error task. It requires a long time of involvement of skilled domain experts. However, in many situations these labels are categorized as a weakly label data due to the level of uncertainty. Moreover, attenuated, obscure, occlusions, and frames captured at close proximity intensify the level of difficulty in training an FCN model [1, 2, 17]. Additionally, in many biomedical domains surgical or endoscopic scenes may contain few features and texture-less smooth regions. It makes the segmentation process harder to solve in the convolutional stack. Furthermore, lack of dataset, lack of robust and discriminative features and textures, and poor imaging conditions can limit the FCN model capability to generalize segmentation process to segment video frames having versatile appearances. Although FCN models have been successfully applied in laparoscopic surgery, similar success for arthroscopy has been limited due to the challenges discussed above [2]. A set of images showing the nature of surgical scene visualization challenges encountered in knee arthroscopy is depicted in Fig. 2. Optical reflectance is a fundamental property of materials, which describes spectral distribution of photon energy after light-matter interactions. The characteristics of surface reflectance depend on the composition of the material. In the context of surgical scenes, the composition of material stands for its biological arrangement and variations within it. Therefore, surface reflectance offers another dimension of information for semantic segmentation to tackle feature and textures-less regions of arthroscopic video frames. Furthermore, it can be pivotal information to capture the versatile appearances of tissue structures if imaging conditions are maintained. In arthroscopy, the key tissue structures are the femur, tibia, Anterior Cruciate Ligaments (ACL), and meniscus. These tissue types show different spectral responses from others as is shown in Fig. 3. Among them, structure-type femur and tibia are
Fig. 2 Representation of the surgical scenes. The video frame in column (a) is contaminated by noise and different color temperatures due to illumination. The video frame in column (b) represents motion blur, noise, and different color temperatures. The video frame in column (c) represents degenerated tissue structure but excessive lighting conditions overexposed and washed out the frame’s features
Fig. 3 Surface reflectance of tissue type bone and ACL. It has been seen that the reflectances are not identical. (a)–(b) represent the spectral responses of bone-cartilage and tissue type ACL. As the figure depicts, they are substantially different. (c)–(d) represents the spectral intensity images in 12 bands
bones that are covered by the tissue type cartilage. Therefore, in the spectral domain (reflectance) the femur and tibia stay in the same tissue class- bone or cartilage. Poor imaging condition is one of the major artifacts which limits the vision tasks [1, 2, 17]. Several attenuation factors such as illumination, blur, noise, debris, shadow, and haze degrade image quality. Although noise, blur, and hazing effects can be corrected using image enhancement methods [29], they can have little effect in the spectral domain. Additionally, illumination strongly affects the surgical scene and often the surgical scene gets saturated. Saturated pixel does not contain any spectral information. Therefore, spectral responses strongly rely on imaging quality and this limitation has been observed in our previous work [1]. In this context, spatial information in RGB space has some advantages with respect to the spectral domain. Hence, in this work, we combined both modalities of image data.
3 Methods 3.1 Spectral Image Reconstruction Reconstruction of reflectance from red, green, and blue (RGB) filter responses has been well studied in computer graphics. The authors of [18, 19] proposed a method based on least-squares estimation from RGB pixels. Several advancements have been achieved so far, and it has been shown that data-driven approaches provide better results compared to numerical fitting [20, 21]. In this section, we briefly introduce the hypothesis behind the reconstruction method. If $P_i(x, y)$ is the i-th pixel response of a digital camera and the value of $P_i(x, y)$ is the color pixel value in RGB space, where R, G, and B stand for the three color filters (red, green, and blue) of the digital imaging system, then the response of each color filter is defined by the camera response function as follows [1]:

$$P_i^{R,G,B}(x, y) = \int t_{R,G,B}(\lambda)\, E(\lambda)\, S(\lambda)\, r(x, y; \lambda)\, d\lambda \qquad (1)$$
where $t_{R,G,B}(\lambda)$ is the permeability of the color filters, $P_i^{R,G,B}(x, y)$ is the color response of the i-th pixel in RGB space, $E(\lambda)$ is the spectrum of the illuminant, $S(\lambda)$ is the sensitivity of the camera, and $r(x, y; \lambda)$ is the reflectance spectrum in the spatial domain. The vector representation of Eq. (1) is

$$P = F r \qquad (2)$$
Here, P is the spatial response in RGB space, F represents the transformation matrix, and r is the reflectance. If the transformation matrix is estimated, then the surface reflectance can be obtained from the RGB pixel values. To achieve that, the Wiener method derives a linear relation through error minimization of the reconstructed reflectance using the least-squares method as follows:
$$e = \| r - r_{est} \| \qquad (3)$$
Otsu et al. used tristimulus colors (in CIE XYZ and sRGB space) in their research and confirmed that their proposed method can reconstruct spectra with three basis functions with minimum reconstruction error, which turns the above formulation into the following [21]:

$$s = \sum_{j=1}^{3} w_j b_j \qquad (4)$$
In this work we adopt the method of Otsu et al. to reconstruct spectral images [21]. The CIE XYZ color values and the corresponding spectra are collected with a spectrometer under the same white illuminant. From the RGB image data, the spectral responses over the 380 to 730 nm wavelength range are calculated.
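The following is a minimal NumPy sketch of the basis-function idea behind Eqs. (2)–(4), not the authors' implementation: a linear map from tristimulus values to the weights of three basis spectra is fitted by least squares and then used to reconstruct per-pixel reflectance. The file names and the use of an SVD-derived basis are assumptions for illustration.

```python
# Hedged sketch of basis-function spectral reconstruction from tristimulus values.
import numpy as np

# training data: N measured reflectance spectra (N x B bands) and their XYZ values (N x 3)
spectra = np.load("training_spectra.npy")     # assumed spectrometer measurements
xyz = np.load("training_xyz.npy")             # corresponding tristimulus values

mean_s = spectra.mean(axis=0)
_, _, Vt = np.linalg.svd(spectra - mean_s, full_matrices=False)
basis = Vt[:3]                                # three basis spectra b_j (3 x B)

weights = (spectra - mean_s) @ basis.T        # target weights w_j (N x 3)
M, *_ = np.linalg.lstsq(xyz, weights, rcond=None)   # least-squares map XYZ -> weights

def reconstruct(pixel_xyz):
    """Reconstruct a reflectance spectrum s = mean + sum_j w_j * b_j."""
    w = pixel_xyz @ M
    return mean_s + w @ basis
```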
3.2 Dataset The dataset used in this article is obtained from a stereo camera which was developed in our lab. The details of the camera model can be found in [16, 17]. The arthroscopic sequences were recorded at the Medical and Engineering Research Facility (MERF). The arthroscopic video sequences were recorded from four human cadaver samples; among them, three were male donors and one was a female donor. To follow the exact clinical workflow, the incisions for tools were made at the bottom left and right soft spots below the patella tendon. The dataset contains strong data inconsistency due to tissue degeneration. Two arthroscopic video sequences contain highly degenerated cartilage and meniscus tissue.
3.3 Model Our proposed architecture is based on U-Net [22]—a deep learning model to segment medical images which follows the encoder-decoder architecture. On our knee arthroscopic dataset, the modified skip connections between the encoder-decoder layers proposed in U-Net++ achieved higher accuracy in segmenting tissue structure compared to U-Net model [23]. The skip connections proposed in U-Net++ architecture allow multi-scale feature propagations between the encoder and decoder. Which indicates that the long-term dependency is not efficiently maintained. Moreover, the result has little improvement and the overall segmentation accuracy is still limited which is concluded in literature as limitations of the imaging quality. In this work, we proposed a multi-scale densely connected FCN network on top of U-Net with residual learning strategy and attention mechanism. Residual connection is established at each end of the convolution block in U-Net architecture to propagate
identity information which is lost during the downsample stage in each layer. The residual connection is as follows [24]:

$$X_{l+1} = \sigma\big(X_l + F(X_l, W_l)\big) \qquad (5)$$
where $F(\cdot)$ is known as a residual function, $X_l$ is the input feature map from the previous layer, and $X_{l+1}$ is the output feature map of the current layer. In FCN networks, stacked convolutions are used to capture long-range dependency between pixels and the semantic map; however, the stacked convolutional layer under a local neighborhood seems inefficient [26, 27]. In the context of the surgical scene, the lack of robust and discriminative features, as well as texture-less regions, can make the mapping process more ineffective. Fusion of multi-scale feature maps from the encoder to the decoder layers in the U-Net architecture can strengthen information propagation and therefore make pixel-to-semantic mapping effective, which has been proved to be an efficient way to address long- and short-term dependency in an FCN network [26–28]. In our architecture, we implemented densely connected layers where each decoder layer receives feature maps from all previously connected encoder blocks. An attention block is a way to provide a high score to important features while suppressing less important features. Our network uses both spatial and spectral information; therefore, in the proposed architecture we followed a spatial-channel-wise attention mechanism. The spatial attention mechanism provides a score to the spatial feature map in each channel band, which is obtained from the following equations [27]:

$$X^{C\times H\times W} = \mathrm{Conv}(X_i^{C\times H\times W})$$
$$X_{Avg}^{1\times H\times W},\ X_{Max}^{1\times H\times W} = \mathrm{Pool}(X^{C\times H\times W})$$
$$F_{score} = \mathrm{Mul}\Big(X_i^{C\times H\times W},\ \mathrm{Sigma}\big(\mathrm{Conv}(\mathrm{Cat}(X_{Avg}^{1\times H\times W}, X_{Max}^{1\times H\times W}))\big)\Big) \qquad (6)$$

Here, $X_i^{C\times H\times W}$ is the input feature map and Conv is the convolution operation. Pool performs two pooling operations, namely maximum pooling and average pooling, to capture salient features in a feature map. Both feature maps are then concatenated, expressed with Cat, and the final score of each feature map is calculated using the sigmoid activation function, expressed as Sigma. The attention scores are then multiplied to get the scored features in a feature map. Similarly, the channel attention map provides a score for channel-dependent features. Channel-specific features can be thought of as spectral band-specific features, as the input of the network has 36 spectral channels and 3 color filter channels. The channel attention map is constructed as [27]:

$$X^{C\times H\times W} = \mathrm{Conv}(X_i^{C\times H\times W})$$
$$X_{CAvg}^{C\times 1\times 1},\ X_{CMax}^{C\times 1\times 1} = \mathrm{Pool}(X^{C\times H\times W})$$
$$FD_{CAvg}^{C\times 1\times 1},\ FD_{CMax}^{C\times 1\times 1} = \mathrm{Dense}(X_{CAvg}^{C\times 1\times 1}, X_{CMax}^{C\times 1\times 1})$$
$$FC_{CH}^{C\times 1\times 1} = \mathrm{Sigma}\big(\mathrm{Add}(FD_{CAvg}^{C\times 1\times 1}, FD_{CMax}^{C\times 1\times 1})\big)$$
$$FC_{CH\_score}^{C\times 1\times 1} = \mathrm{Mul}(FC_{CH}^{C\times 1\times 1}, X_i^{C\times H\times W}) \qquad (7)$$
where $\mathrm{Mul}(\cdot)$ represents multiplication, $\mathrm{Add}(\cdot)$ represents addition, and Dense represents a fully connected block. Two global pooling operations—global average pooling and global maximum pooling—are performed in this layer to extract global spatial information. Two shared fully connected layers are used to attain a further secondary channel attention map. The score is obtained from the sigmoid function, and the weighted feature map is obtained after performing a multiplication operation between the feature map and the score map. Both channel and spatial attention maps are then added to obtain the final weighted feature map of each layer input.
The shape extractor is a minimal implementation of an FCN network used as an embedded network within our main segmentation network. The shape extractor extracts the shape of key tissue structures as a binary mask. The aim of this network is to propagate shape-aware feature learning, since the shape of a tissue structure carries important information about the tissue type; for instance, bone shapes are usually concave-convex rounded shapes. In our implementation, we separately trained our shape extractor with the binary cross-entropy loss function, and in the segmentation network training phase we used the pre-trained network weights. Each network is trained with the Adam optimizer at a learning rate of 0.001. The summation of the categorical cross-entropy (CE) and Dice coefficient loss functions is used as the total loss function to train the segmentation network (Fig. 4).
During training, we selected joint structures from the video sequences of two cadaver experiments (experiments 1 and 4), and the model was tested on the video sequences of the other two cadaver experiments (experiments 2 and 3). Moreover, the test dataset contains degenerated femur cartilage, meniscus, and ACL tissue. Hence, we used 100 video frames out of 1426 video frames, and the selection was based on the inconsistency of the dataset. In clinical practice, as well as during the cadaver experiments, surgeons lose their confidence and restart the arthroscopy procedure from the landmark position, which is the tissue type femur. This workflow creates highly imbalanced data. To tackle data imbalance, several data augmentation policies can be applied; in our context, we found that rotation and scale-shift augmentation policies improve segmentation accuracy. Our philosophy was to train the network with high-quality images and test it with the relatively poor dataset.
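As an illustration of the spatial- and channel-attention mechanism of Eqs. (6)–(7), a minimal PyTorch sketch is given below. The kernel size, reduction ratio, and pooling details are illustrative assumptions rather than the authors' exact configuration.

```python
# Hedged PyTorch sketch of a combined spatial + channel attention block.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SpatialChannelAttention(nn.Module):
    def __init__(self, channels, reduction=8):
        super().__init__()
        # spatial attention: concat avg/max maps over channels -> 1-channel score map
        self.spatial_conv = nn.Conv2d(2, 1, kernel_size=7, padding=3)
        # channel attention: shared dense (1x1 conv) block on pooled descriptors
        self.fc = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1),
        )

    def forward(self, x):
        # spatial score (Eq. 6): pool along the channel axis, conv, sigmoid, rescale
        s = torch.cat([x.mean(dim=1, keepdim=True),
                       x.max(dim=1, keepdim=True).values], dim=1)
        spatial = x * torch.sigmoid(self.spatial_conv(s))
        # channel score (Eq. 7): global avg/max pooling, shared dense block, add, sigmoid
        avg = self.fc(F.adaptive_avg_pool2d(x, 1))
        mx = self.fc(F.adaptive_max_pool2d(x, 1))
        channel = x * torch.sigmoid(avg + mx)
        # the two weighted feature maps are added to form the final output
        return spatial + channel
```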
Fig. 4 The deep learning model for arthroscopic scene segmentation. (a) represents a block diagram of the complete network. The network takes an image and reconstructs multi-spectral images. Shape extractor network also presented which extracts the shape of key tissue structures. In (b), the Segmentation network block is presented
4 Results Figure 6 represents the segmented maps obtained from our method. The analysis reveals that bone cartilage (tissue types femur and tibia) received the highest segmentation accuracy among all classes. The spectral response of cartilage has a larger difference from the others; hence it received the highest accuracy. Moreover, the spectral-based approach alone can only segment the femur and tibia into one class, because both bones are covered by cartilage; the spatial features and the shape further help to separate this segmented map into tibia and femur. The achieved segmentation accuracy for bone is >91%. In previously implemented segmentation models, the average accuracies (Dice coefficients) for femur, tibia, ACL, and meniscus are 0.78, 0.50, 0.41, 0.43 using U-Net and 0.79, 0.50, 0.51, 0.48 using U-Net++ [2], whereas our scores are 0.9145, 0.71, 0.389, 0.617 for femur, tibia, ACL, and meniscus, respectively (Fig. 5 and Table 1). The proposed model achieved relatively higher accuracy in segmenting the tissue types femur and tibia. Relatively lower accuracy was obtained for the tissue types ACL and meniscus because of tissue degeneration and variation in their structural representation and appearance. Moreover, the tissue types ACL and meniscus were highly affected by pixel saturation. Because saturated pixels exhibit less structural, spatial, and spectral information, the model achieved lower segmentation accuracy for them. Given the lack of visual and spectral information, segmenting an unseen anatomical structure is still an open challenge for the segmentation model. During the network modification, the model achieved considerably higher accuracy compared to variants of U-Net; for instance, with residual U-Net [30] with attention gates the validation accuracy was 0.7510, whereas our proposed model achieved 0.7724.
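For clarity, the Dice coefficient used for the scores above can be computed per class as in the following hedged sketch; the class indices are placeholders.

```python
# Hedged sketch: per-class Dice coefficient between a predicted and a
# ground-truth label map (integer class ids, same shape).
import numpy as np

def dice(pred, gt, cls):
    p, g = (pred == cls), (gt == cls)
    denom = p.sum() + g.sum()
    return 2.0 * np.logical_and(p, g).sum() / denom if denom else 1.0

# example: classes 1-4 could stand for femur, tibia, ACL and meniscus
# scores = {c: dice(pred_map, gt_map, c) for c in [1, 2, 3, 4]}
```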
Fig. 5 The figure represents the segmentation accuracy obtained from our model. The graph shows that the model achieved higher average accuracy when compared to the method proposed in [2]
Fig. 6 The figure represents the segmentation maps predicted from the trained model. In the third-row image, the prediction of the tissue type ACL was relatively poor because, in the test dataset, all the input images of the ACL are oversaturated, degenerated, and contain almost no pixel-level information. Moreover, saturated pixels do not convey spectral information. Hence, we obtained poor performance in ACL segmentation
Table 1 Segmentation result
Dataset | Femur | Tibia | ACL | Meniscus
Dataset2 | 0.943 | 0.66 | 0.417 | 0.641
Dataset3 | 0.886 | 0.76 | 0.362 | 0.594
Average | 0.91 | 0.71 | 0.39 | 0.62
5 Conclusion We have shown that combining the spectral signature of different tissues present in the knee cavity with the structural information available in RGB images can be a promising approach to achieve full-scale semantic segmentation of surgical scenes. The robustness of our approach provides a high level of accuracy in segmenting tissue surfaces that otherwise appear texture-less. A close look at the segmentation output of the proposed network reveals overall better accuracy than manually contoured ground truth segments. The fact that this has been achieved with the dataset that mostly contains video frames obtained from a low-resolution miniature camera further builds confidence in the richness of optical reflectance data for RMIS and endoscopic image segmentation. This in conjunction with the shape extracting feature of our network provides sharp contours between different tissues that are difficult to visualize with the naked eye.
Acknowledgements This work is supported by the Australia-India Strategic Research Fund (Grant AISRF53820), the Medical Engineering Research Facility at QUT, and QUT’s Centre for Robotics. We acknowledge Dr. Yu Takeda, Dr. Fumio Sasazawa for the ground truth segmentation data used in this study.
References 1. Ali S et al (2021) Arthroscopic Multi-Spectral Scene Segmentation Using Deep Learning. arXiv preprint arXiv:2103.02465 2. Jonmohamadi Y et al (2020) Automatic segmentation of multiple structures in knee arthroscopy using deep learning. IEEE Access 8:51853–51861. https://doi.org/10.1109/ACCESS.2020.298 0025 3. Jaiprakash A et al (2017) Orthopaedic surgeon attitudes towards current limitations and the potential for robotic and technological innovation in arthroscopic surgery. J Orthop Surg 25(1):230949901668499 4. Price AJ et al (2015) Evidence-based surgical training in orthopaedics: how many arthroscopies of the knee are needed to achieve consultant level performance? Bone Joint J 97(10):1309–1315 5. Prete FP et al (2018) Robotic versus laparoscopic minimally invasive surgery for rectal cancer: a systematic review and meta-analysis of randomized controlled trials. Ann Surg 267(6):1034– 1046 6. Xia L et al (2019) National trends and disparities of minimally invasive surgery for localized renal cancer, 2010 to 2015. Urol Oncol: Semin Orig Investig 37(3):182.e17–182.e27 7. Malik J et al (2001) Contour and texture analysis for image segmentation. Int J Comput Vis 43(1):7–27 8. Suga A et al (2008) Object recognition and segmentation using SIFT and graph cuts. In: 2008 19th international conference on pattern recognition. IEEE 9. Maqbool S et al (2020) m2caiSeg: Semantic Segmentation of Laparoscopic Images using Convolutional Neural Networks. arXiv preprint arXiv:2008.10134 10. Scheikl PM et al (2020) Deep learning for semantic segmentation of organs and tissues in laparoscopic surgery. Curr Dir Biomed Eng 6(1):1–11 11. Kletz S et al (2019) Identifying surgical instruments in laparoscopy using deep learning instance segmentation. In: 2019 international conference on content-based multimedia indexing (CBMI). IEEE 12. Badrinarayanan V, Kendall A, Cipolla R (2017) Segnet: a deep convolutional encoder-decoder architecture for image segmentation. IEEE Trans Pattern Anal Mach Intell 39(12):2481–2495 13. Diakogiannis FI et al (2020) ResUNet-a: a deep learning framework for semantic segmentation of remotely sensed data. ISPRS J Photogramm Remote Sens 162:94–114 14. Jha D et al (2019) Resunet++: an advanced architecture for medical image segmentation. In: 2019 IEEE international symposium on multimedia (ISM). IEEE 15. Sun J et al (2020) Saunet: shape attentive u-net for interpretable medical image segmentation. In: International conference on medical image computing and computer-assisted intervention. Springer, Cham 16. Ali S et al (2021) Supervised scene illumination control in stereo arthroscopes for robot assisted minimally invasive surgery. IEEE Sensors J 21(10):11577–11587 17. Liu F et al (2020) Self-supervised depth estimation to regularise semantic segmentation in knee arthroscopy. In: International conference on medical image computing and computer-assisted intervention. Springer, Cham 18. Stigell P, Miyata K, Hauta-Kasari M (2007) Wiener estimation method in estimating of spectral reflectance from RGB images. Pattern Recognit Image Anal 17(2):233–242
19. Chen S, Liu Q (2012) Modified Wiener estimation of diffuse reflectance spectra from RGB values by the synthesis of new colors for tissue measurements. J Biomed Opt 17(3):030501 20. Peng X et al (2017) Self-training-based spectral image reconstruction for art paintings with multispectral imaging. Appl Opt 56(30):8461 21. Otsu H, Yamamoto M, Hachisuka T (2018) Reproducing spectral reflectances from tristimulus colours: reproducing spectral reflectances from tristimulus colours. Comput Graph Forum 37(6):370–381 22. Ronneberger O, Fischer P, Brox T (2015) U-net: convolutional networks for biomedical image segmentation. In: International conference on medical image computing and computer-assisted intervention. Springer, Cham 23. Zhou Z et al (2018) Unet++: a nested u-net architecture for medical image segmentation. In: Deep learning in medical image analysis and multimodal learning for clinical decision support. Springer, Cham, pp 3–11 24. He K et al (2016) Identity mappings in deep residual networks. In: European conference on computer vision. Springer, Cham 25. Wang X et al (2018) Non-local neural networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition 26. Zhang J et al (2018) MDU-net: multi-scale densely connected u-net for biomedical image segmentation. arXiv preprint arXiv:1812.00352 27. Cheng J et al (2020) Fully convolutional attention network for biomedical image segmentation. Artif Intell Med 107:101899 28. Hu J, Shen L, Sun G (2018) Squeeze-and-excitation networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition 29. Ali S et al (2021) Surgery Scene Restoration for Robot Assisted Minimally Invasive Surgery. arXiv preprint arXiv:2109.02253 30. Zhang Z, Liu Q, Wang Y (2018) Road extraction by deep residual u-net. IEEE Geosci Remote Sens Lett 15(5):749–753
Chapter 20
Opposition-Based Arithmetic Optimization Algorithm with Varying Acceleration Coefficient for Function Optimization and Control of FES System Davut Izci , Serdar Ekinci , Erdal Eker , and Laith Abualigah
1 Introduction In recent decades, there has been a significant amount of research effort towards the development and application of optimization algorithms, as they provide significantly greater efficiency to applications in different research fields [1–3]. In that sense, metaheuristic algorithms have dominated the related research since they have been proved able to achieve far better solutions. As part of this effort, the arithmetic optimization algorithm (AOA) has been developed as one of the most recent and also competitive metaheuristic algorithms for optimization problems [4]. This algorithm wisely models the arithmetic operators (addition, subtraction, multiplication, and division) in order to solve optimization problems. Despite its demonstrated competitiveness, AOA may not perform well in terms of reaching efficient solutions for all available optimization problems [5]. The latter occurs due to its stochastic nature and results in unbalanced exploration and exploitation stages. In this work, an attempt was made to further enhance the capability of the original form of AOA by using an opposition-based learning (OBL) [6] mechanism. However, instead of the original form of OBL, a novel modified version of it (mOBL) was employed in this study in order to achieve a better enhancement of AOA.
D. Izci (B) Department of Electronics and Automation, Batman University, Batman, Turkey e-mail: [email protected] S. Ekinci Department of Computer Engineering, Batman University, Batman, Turkey E. Eker Vocational School of Social Sciences, Mus Alparslan University, Mus, Turkey L. Abualigah Faculty of Computer Sciences and Informatics, Amman Arab University, Amman, Jordan © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 M. S. Uddin et al. (eds.), Proceedings of International Joint Conference on Advances in Computational Intelligence, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-19-0332-8_20
The proposed mOBL based AOA (mAOA) algorithm’s performance was firstly assessed against classical benchmark functions by statistically comparing it with the original form of AOA. The obtained results from those benchmark functions demonstrated the better optimizing performance of the proposed mAOA algorithm. To further demonstrate the optimization ability of the proposed mAOA algorithm, a biomedical system known as functional electrical stimulation (FES) [7] was adopted as a real-world engineering challenge. The FES is a method that is used to restore the movement of the paralyzed limbs [8] and requires a controller for appropriate functioning. For the latter purpose, a proportional-integral-derivative (PID) controller was employed in this study due to ease of implementation. The performance of mAOA for optimizing the PID controlled FES system was compared with both the original AOA based and the classical Ziegler-Nichols based PID. The mAOA based PID controlled FES system has been shown to have better transient responses compared to similar systems tuned by the classical Ziegler-Nichols method and original AOA algorithm.
2 The Arithmetic Optimization Algorithm The arithmetic optimization algorithm is one of the latest population-based metaheuristic algorithms and uses the basic addition, subtraction, multiplication and division operators to determine the best solution from the candidate solutions [4]. A set of random solutions ($x_{i,j}$) is generated in the initialization. Then, a function called the Math Optimizer Accelerated ($MOA$) function is adopted for selection of the exploration or exploitation search phase. The explorative process is performed for $r_1 > MOA$, where $r_1$ stands for a random number. To perform exploration, the arithmetic operators of multiplication (M) or division (D) are used, which are mathematically modeled as follows:

$$x_{i,j}(t_c + 1) = \begin{cases} best(x_j) \times MOP \times \big((UB_j - LB_j) \times \mu + LB_j\big), & \text{for } r_2 > 0.5 \\ best(x_j) \div (MOP + \epsilon) \times \big((UB_j - LB_j) \times \mu + LB_j\big), & \text{for } r_2 < 0.5 \end{cases} \qquad (1)$$

where the i-th solution in the next iteration is denoted by $x_{i,j}(t_c + 1)$ and its j-th position in the current iteration is represented by $x_{i,j}(t_c)$. The term $best(x_j)$ represents the best solution's j-th position obtained so far. Besides, $\mu$ is a control parameter for search process adjustment, whereas $\epsilon$ stands for a small integer number. $UB_j$ and $LB_j$ respectively represent the upper and lower bounds of the j-th position. The $MOP$ function represents the Math Optimizer probability. A random number ($r_2$) is used to decide about the execution of M or D, which are used for exploration. As can be seen from the above equation, the M operator performs for $r_2 > 0.5$. Until the completion of the latter task, the D operator is neglected. The execution occurs vice versa for $r_2 < 0.5$.
For exploitation, addition (A) and subtraction (S) operators are used. This phase is performed for $r_1 < MOA$. The following equation models the exploitation phase. The operators A and S are executed based on the value of $r_3$, which is another random number. The A operator is executed for $r_3 > 0.5$ and the operator S is neglected until the task is finished, whereas for $r_3 < 0.5$ vice versa occurs.

$$x_{i,j}(t_c + 1) = \begin{cases} best(x_j) + MOP \times \big((UB_j - LB_j) \times \mu + LB_j\big), & \text{for } r_3 > 0.5 \\ best(x_j) - MOP \times \big((UB_j - LB_j) \times \mu + LB_j\big), & \text{for } r_3 < 0.5 \end{cases} \qquad (2)$$
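A minimal NumPy sketch of one AOA position update following Eqs. (1)–(2) is given below; the MOA and MOP schedules follow the original AOA formulation [4], and the parameter values are illustrative, not the authors' exact settings.

```python
# Hedged sketch of one AOA position update (exploration vs. exploitation).
import numpy as np

def aoa_update(pos, best, lb, ub, t, t_max, mu=0.499, alpha=5, eps=1e-12,
               moa_min=0.2, moa_max=1.0):
    moa = moa_min + t * (moa_max - moa_min) / t_max              # Math Optimizer Accelerated
    mop = 1.0 - (t ** (1.0 / alpha)) / (t_max ** (1.0 / alpha))  # Math Optimizer probability
    new = pos.copy()
    for j in range(pos.size):
        r1, r2, r3 = np.random.rand(3)
        scale = (ub[j] - lb[j]) * mu + lb[j]
        if r1 > moa:                     # exploration: multiplication / division (Eq. 1)
            new[j] = best[j] * mop * scale if r2 > 0.5 else best[j] / (mop + eps) * scale
        else:                            # exploitation: addition / subtraction (Eq. 2)
            new[j] = best[j] + mop * scale if r3 > 0.5 else best[j] - mop * scale
    return np.clip(new, lb, ub)
```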
3 Modified Arithmetic Optimization Algorithm In terms of enhancing the metaheuristic algorithms, opposition-based learning (OBL) [6] has so far been shown to be a capable strategy as it provides a good opportunity to avoid stagnation amongst candidate solutions [9]. To explain OBL in brief, let X be a real number within [lb, ub]. The opposite number $\bar{X}$ can then be calculated as:

$$\bar{X} = ub + lb - X \qquad (3)$$
For a D-dimensional search space, it can be defined as follows, where $X_i \in [lb_i, ub_i]$ and $i \in \{1, 2, \ldots, D\}$:

$$\bar{X}_i = ub_i + lb_i - X_i \qquad (4)$$
Different versions of the OBL mechanism, such as generalized OBL [10], quasi-OBL [11], modified OBL [12], selective OBL [13], orthogonal OBL [14] and neighborhood OBL [15], have so far been proposed to enhance metaheuristic algorithms. The modified opposite solutions are simultaneously calculated by the modified OBL proposed in this study using the following form:

$$\bar{X}_i = r_1 \cdot ub_i + r_2 \cdot lb_i - r_3 \cdot X_i \qquad (5)$$
where $r_1$, $r_2$ and $r_3$ are three different randomly generated numbers within [0, 1]. Once calculated, the best N solutions are chosen from the union set of the X and $\bar{X}$ solutions. Unlike the previously listed versions of OBL, the proposed mAOA algorithm wisely allows the operation of the original AOA and mOBL based on a probability coefficient ($P_{mOBL}$) which is calculated as follows:
Fig. 1 The proposed mAOA algorithm’s flowchart
$$P_{mOBL} = v_{max} - t_c \frac{v_{max} - v_{min}}{t_{max}} \qquad (6)$$
where tc stands for the current iteration whereas tmax is the maximum iteration number. In this study, vmax and vmin were set to 1 and 0.01, respectively. As can
be seen, the probability coefficient is updated in each iteration and linearly decreases with respect to iteration numbers. In each iteration, this coefficient is compared with a random number (rand) and mOBL strategy is activated for Pm O B L > rand, otherwise, only AOA operates. In this way, the Pm O B L coefficient provides the balance for explorative and exploitative stages. Figure 1 shows the operation of the proposed mAOA algorithm in detail.
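For illustration, a minimal sketch of the mOBL step of Eq. (5), gated by the probability coefficient of Eq. (6), is shown below; the fitness function and population shapes are placeholders, not the authors' implementation.

```python
# Hedged sketch of the modified OBL (mOBL) step used inside mAOA.
import numpy as np

def mobl_step(X, fitness, lb, ub, t, t_max, v_max=1.0, v_min=0.01):
    p_mobl = v_max - t * (v_max - v_min) / t_max              # Eq. (6), linearly decreasing
    if np.random.rand() >= p_mobl:
        return X                                              # plain AOA iteration only
    r1, r2, r3 = np.random.rand(3)
    X_bar = np.clip(r1 * ub + r2 * lb - r3 * X, lb, ub)       # Eq. (5), modified opposites
    union = np.vstack([X, X_bar])
    scores = np.apply_along_axis(fitness, 1, union)
    keep = np.argsort(scores)[: X.shape[0]]                   # best N from X union X_bar
    return union[keep]
```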
4 Performance Evaluation Against Benchmark Functions 4.1 Details of the Employed Benchmark Functions The mAOA algorithm’s performance was initially tested with the benchmark functions listed in Table 1. The details (name, equation, range and optimum) of those benchmark functions are also listed in the respective table. Besides, the dimension size (D) for all test functions was taken to be 30 and 100 in order to demonstrate the performance of the mAOA algorithm for lower and higher dimensional problems.
4.2 Experimental Results and Discussion The comparative statistical results of the benchmark functions achieved by both algorithms are listed in Table 2. Those numerical results were obtained by running the algorithms 30 times and adopting a population size of 50 along with a maximum iteration number of 1000. Besides, the control parameters of AOA were set to be 1 for Max, 0.2 for Min, 5 for α and 0.499 for μ. As can be observed from the table, the proposed mAOA algorithm provides far better results in terms of finding the best optimal solutions.

Table 1 Details of employed benchmark functions
Function name | Equation of test function | Range | Optimum
Rosenbrock | $F_1(x) = \sum_{i=1}^{D-1} \big[ 100(x_{i+1} - x_i^2)^2 + (x_i - 1)^2 \big]$ | [−30, 30] | 0
Step | $F_2(x) = \sum_{i=1}^{D} (x_i + 0.5)^2$ | [−100, 100] | 0
Schwefel | $F_3(x) = -\sum_{i=1}^{D} x_i \sin(\sqrt{|x_i|})$ | [−500, 500] | −418.9829 × D
Griewank | $F_4(x) = \frac{1}{4000}\sum_{i=1}^{D} x_i^2 - \prod_{i=1}^{D} \cos\!\big(\frac{x_i}{\sqrt{i}}\big) + 1$ | [−600, 600] | 0
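For reference, the four test functions of Table 1 can be written compactly as follows (a hedged sketch; x is a NumPy vector of dimension D).

```python
# Hedged sketch of the Table 1 benchmark functions.
import numpy as np

def rosenbrock(x):   # F1, optimum 0
    return np.sum(100 * (x[1:] - x[:-1] ** 2) ** 2 + (x[:-1] - 1) ** 2)

def step(x):         # F2, optimum 0
    return np.sum((x + 0.5) ** 2)

def schwefel(x):     # F3, optimum -418.9829 * D
    return -np.sum(x * np.sin(np.sqrt(np.abs(x))))

def griewank(x):     # F4, optimum 0
    i = np.arange(1, x.size + 1)
    return np.sum(x ** 2) / 4000 - np.prod(np.cos(x / np.sqrt(i))) + 1
```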
Table 2 Statistical result of AOA and proposed mAOA for different dimensions
Function | Statistical metric | AOA (D = 30) | mAOA (D = 30) | AOA (D = 100) | mAOA (D = 100)
F1(x) | Average | 2.8093E+01 | 1.2349E−02 | 9.8834E+01 | 3.8971E−02
F1(x) | STD | 4.0285E−01 | 1.5138E−02 | 1.2900E−01 | 4.4890E−02
F1(x) | Best | 2.7082E+01 | 1.7116E−05 | 9.8416E+01 | 2.9388E−04
F1(x) | Worst | 2.8844E+01 | 7.1277E−02 | 9.9005E+01 | 1.7936E−01
F2(x) | Average | 2.5070E+00 | 2.1288E−03 | 1.7131E+01 | 5.9759E−03
F2(x) | STD | 2.5341E−01 | 2.6168E−03 | 5.8297E−01 | 6.7210E−03
F2(x) | Best | 2.0323E+00 | 2.1140E−05 | 1.6029E+01 | 1.6566E−06
F2(x) | Worst | 3.0026E+00 | 1.0466E−02 | 1.7973E+01 | 3.0240E−02
F3(x) | Average | −6.1968E+03 | −1.2558E+04 | −1.1408E+04 | −4.1808E+04
F3(x) | STD | 4.3504E+02 | 1.3241E+01 | 8.1413E+02 | 1.0960E+02
F3(x) | Best | −6.9854E+03 | −1.2569E+04 | −1.3393E+04 | −4.1898E+04
F3(x) | Worst | −5.1236E+03 | −1.2529E+04 | −9.6804E+03 | −4.1490E+04
F4(x) | Average | 6.5438E−02 | 6.5705E−03 | 2.6480E+02 | 5.8661E−03
F4(x) | STD | 6.1244E−02 | 7.2064E−03 | 1.5098E+02 | 5.6034E−03
F4(x) | Best | 2.3830E−04 | 3.4448E−07 | 7.7621E+01 | 1.0453E−04
F4(x) | Worst | 2.6319E−01 | 2.9630E−02 | 6.5797E+02 | 1.8369E−02
5 Performance Evaluation on Biomedical System 5.1 Modeling of Functional Electrical Stimulation The block diagram given in Fig. 2(a) shows a functional electrical stimulation (FES) system and Fig. 2(b) provides the Hill model, which is used to describe the muscle's behavior as a mechanical system [16]. As can be seen from the diagram, the application of the electrical stimulation consists of a feedback-loop control. In the respective figure, $x_{ref}(t)$, $f_{mt}(t)$, $x(t)$ and $e(t)$ stand for the desired length of the muscle, the tension, the actual length of the muscle, and the error, respectively. Besides, $k_s$ is the spring constant (generating a force $f_s$), $c$ is the damping constant (developing a force $f_c$), and $f_f$ is the friction force. Using the model given in Fig. 2, the open-loop transfer function of the FES system can be obtained as follows [16]:

$$G(s) = \frac{X(s)}{F_{mt}(s)} = \frac{1}{ms^2 + (c + \mu m g)s + k_s} \qquad (7)$$
It is also worth noting that the strain gauge is used as sensor which introduces a time delay. Therefore, in this study, the first order Pade approximation was used to obtain the following form for the sensor.
Fig. 2 System control for FES (a) and Hill's muscle model (b)
$$H(s) = e^{-Ts} \cong \frac{2 - Ts}{2 + Ts} \qquad (8)$$
5.2 PID Controlled System and Application of mAOA The transfer function of a PID controller is given as follows [17]:

$$C(s) = K_p + \frac{K_i}{s} + K_d s \qquad (9)$$
% OS + (1 − δ)Ts 100
(10)
In the equation given above, δ is the balancing coefficient between percent overshoot (%O S) and settling time (Ts ). The best value for δ was found to be 0.95. Including the PID controller into FES system would yield the following closed-loop transfer function. T (s) =
C(s)G(s) X (s) = X r e f (s) 1 + C(s)G(s)H (s)
290
D. Izci et al.
Table 3 Adopted system parameters
Parameter
Value
Unit (SI)
m
100
g
ks
20
N /m
c
4
N − s/m
g
9.81
m/s 2
μ
0.1
−
T
0.7
s
Fig. 3 mAOA based PID controller tuning strategy
$$= \frac{(K_d s^2 + K_p s + K_i)(2 + Ts)}{(ms^2 + (c + \mu m g)s + k_s)(2 + Ts)s + (K_d s^2 + K_p s + K_i)(2 - Ts)} \qquad (11)$$
Table 3 lists the system related values adopted in the equation given above and Fig. 3 illustrates the implementation of the proposed mAOA algorithm to PID controlled FES system.
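As an illustrative sketch (not the authors' code), the closed-loop transfer function of Eq. (11) can be built from the Table 3 parameters and a candidate PID gain set with SciPy, and the transient metrics used in the objective of Eq. (10) can be read from the step response; the 2% settling band and the simulation horizon are assumptions made for illustration.

```python
# Hedged sketch: closed-loop FES system of Eq. (11) and the objective of Eq. (10).
import numpy as np
from scipy import signal

m, ks, c, g, mu, T = 100.0, 20.0, 4.0, 9.81, 0.1, 0.7   # values as listed in Table 3

def closed_loop(Kp, Ki, Kd):
    num = np.polymul([Kd, Kp, Ki], [T, 2])                       # (Kd s^2+Kp s+Ki)(2+Ts)
    den_plant = np.polymul([m, c + mu * m * g, ks], [T, 2])      # (ms^2+(c+mu*m*g)s+ks)(2+Ts)
    den = np.polyadd(np.polymul(den_plant, [1, 0]),              # ... multiplied by s
                     np.polymul([Kd, Kp, Ki], [-T, 2]))          # + (Kd s^2+Kp s+Ki)(2-Ts)
    return signal.TransferFunction(num, den)

def objective(Kp, Ki, Kd, delta=0.95):
    t, y = signal.step(closed_loop(Kp, Ki, Kd), T=np.linspace(0, 20, 4000))
    overshoot = max(0.0, (y.max() - 1.0) * 100.0)                # percent overshoot
    outside = np.where(np.abs(y - 1.0) > 0.02)[0]                # 2% band (assumed)
    ts = t[outside[-1]] if outside.size else 0.0                 # settling time
    return delta * overshoot / 100.0 + (1.0 - delta) * ts

print(objective(90.2035, 17.2994, 91.7929))                      # mAOA gains from Table 4
```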
5.3 Simulation Results and Discussions The detailed optimization process for minimization of Fobj objective function is provided in Fig. 3. For the optimization process, a population of 50 with 40 iterations
Fig. 4 Step response of the feedback system with PID controller tuned by various approaches
were used for AOA and mAOA algorithms. Then, both algorithms performed 30 runs. The controller parameters that were obtained by the best runs of the algorithms are listed in Table 4. The numerical values provided in Tables 3 and 4 can be substituted into Eq. (11) to obtain the closed-loop transfer functions given in the following equations for both the original AOA and the proposed mAOA algorithms.

$$T_{AOA}(s) = \frac{50.49s^3 + 205.1s^2 + 186s + 34.63}{70s^4 + 221s^3 + 301.6s^2 + 201.7s + 34.63} \qquad (12)$$

$$T_{mAOA}(s) = \frac{64.26s^3 + 246.7s^2 + 192.5s + 34.6}{70s^4 + 207.2s^3 + 338.6s^2 + 208.3s + 34.6} \qquad (13)$$
Besides, the controller parameters obtained by the classical tuning method (first method of Ziegler-Nichols) are also listed in Table 4 and the corresponding closed-loop transfer function is given as follows:

$$T_{ZN}(s) = \frac{46.68s^3 + 205.2s^2 + 232.8s + 78.92}{70s^4 + 224.8s^3 + 279.8s^2 + 217.6s + 78.92} \qquad (14)$$

Table 4 Obtained PID parameters tuned by Ziegler-Nichols, AOA and mAOA methods
Parameter | Ziegler–Nichols method | AOA method | mAOA method
Kp | 102.5940 | 86.9251 | 90.2035
Ki | 39.4592 | 17.3156 | 17.2994
Kd | 66.6861 | 72.1302 | 91.7929
Table 5 Transient response analysis results when a unit step input is applied
Transient response specification | Ziegler–Nichols tuning method | AOA tuning method | mAOA tuning method
Peak | 1.4548 | 1.1341 | 1.0941
Maximum overshoot (%) | 45.4824 | 13.4114 | 9.4075
Rise time (sec) | 0.9699 | 1.1416 | 0.9548
Settling time (sec) | 10.9948 | 4.3240 | 3.0880
Peak time (sec) | 2.6341 | 2.4686 | 1.9722
The comparative step responses of the feedback system with the PID controller tuned by the Ziegler-Nichols, AOA and mAOA methods are illustrated in Fig. 4. As can be seen, the proposed mAOA algorithm helps achieve a better response. Besides, the comparative numerical results of the methods listed above are also provided in Table 5. The latter numerical values further verify the better ability of the proposed mAOA algorithm for a PID controlled FES system, as a clearly faster response with less overshoot and a shorter settling time is achieved by the proposed method.
6 Conclusion This paper describes the development of a novel mAOA algorithm which integrates the mOBL scheme with the original AOA. A good balance between the explorative and exploitative stages has been achieved in this way. The comparative evaluation on four well-known benchmark functions has demonstrated the better capability of the proposed mAOA algorithm. Further assessment of the mAOA has been performed by designing a PID controlled FES system as a real-world biomedical optimization problem. A comparative transient response analysis has been performed against the original AOA and the classical Ziegler-Nichols based PID controlled FES systems. The latter analysis has shown the better capability of the proposed mAOA for such a real-world biomedical system. The proposed method has the advantage of providing good results for potential future works related to different biomedical systems as well.
References 1. Izci D (2021) An enhanced slime mould algorithm for function optimization. In: 2021 3rd international congress on human-computer interaction, optimization and robotic applications (HORA). IEEE, pp 1–5 2. Eker E, Kayri M, Ekinci S, Izci D (2021) A new fusion of ASO with SA algorithm and its applications to MLP training and DC motor speed control. Arab J Sci Eng 46:3889–3911.
https://doi.org/10.1007/s13369-020-05228-5 3. Izci D, Ekinci S, Hekimo˘glu B (2022) A novel modified Lévy flight distribution algorithm to tune proportional, integral, derivative and acceleration controller on buck converter system. Trans Inst Meas Control 44:393–409. https://doi.org/10.1177/01423312211036591 4. Abualigah L, Diabat A, Mirjalili S et al (2021) The arithmetic optimization algorithm. Comput Methods Appl Mech Eng 376:113609. https://doi.org/10.1016/j.cma.2020.113609 5. Izci D, Ekinci S, Kayri M, Eker E (2021) A novel improved arithmetic optimization algorithm for optimal design of PID controlled and bode’s ideal transfer function based automobile cruise control system. Evol Syst. https://doi.org/10.1007/s12530-021-09402-4 6. Tizhoosh HR (2005) Opposition-based learning: a new scheme for machine intelligence. In: International conference on computational intelligence for modelling, control and automation and international conference on intelligent agents, web technologies and internet commerce (CIMCA-IAWTIC’06). IEEE, pp 695–701 7. Lynch CL, Popovic MR (2008) Functional electrical stimulation. IEEE Control Syst 28:40–50. https://doi.org/10.1109/MCS.2007.914689 8. Nekoukar V (2020) Control of functional electrical stimulation systems using simultaneous pulse width, amplitude, and frequency modulations. Neuromodulation Technol Neural Interface. https://doi.org/10.1111/ner.13126 9. Izci D, Ekinci S, Eker E, Kayri M (2020) Improved manta ray foraging optimization using opposition-based learning for optimization problems. In: 2020 international congress on human-computer interaction, optimization and robotic applications (HORA). IEEE, pp 1–6 10. Wang H, Wu Z, Rahnamayan S et al (2011) Enhancing particle swarm optimization using generalized opposition-based learning. Inf Sci 181:4699–4714. https://doi.org/10.1016/j.ins. 2011.03.016 11. Mandal B, Roy PK (2013) Optimal reactive power dispatch using quasi-oppositional teaching learning based optimization. Int J Electr Power Energy Syst 53:123–134. https://doi.org/10. 1016/j.ijepes.2013.04.011 12. Izci D, Ekinci S, Zeynelgil HL, Hedley J (2022) Performance evaluation of a novel improved slime mould algorithm for direct current motor and automatic voltage regulator systems. Trans Inst Meas Control 44:435–456. https://doi.org/10.1177/01423312211037967 13. Dhargupta S, Ghosh M, Mirjalili S, Sarkar R (2020) Selective opposition based grey wolf optimization. Expert Syst Appl 151:113389. https://doi.org/10.1016/j.eswa.2020.113389 14. Wang W, Xu L, Chau K et al (2021) An orthogonal opposition-based-learning Yin–Yang-pair optimization algorithm for engineering optimization. Eng Comput. https://doi.org/10.1007/s00 366-020-01248-9 15. Zhao X, Feng S, Hao J et al (2021) Neighborhood opposition-based differential evolution with Gaussian perturbation. Soft Comput 25:27–46. https://doi.org/10.1007/s00500-020-05425-2 16. Fernández de Cañete J, Galindo C, Barbancho J, Luque A (2018) Automatic Control Systems in Biomedical Engineering. Springer, Cham 17. Izci D, Hekimo˘glu B, Ekinci S (2022) A new artificial ecosystem-based optimization integrated with Nelder-Mead method for PID controller design of buck converter. Alexandria Eng J 61:2030–2044. https://doi.org/10.1016/j.aej.2021.07.037
Chapter 21
Robot Path Planning Using β Hill Climbing Grey Wolf Optimizer Saniya Bahuguna and Ashok Pal
1 Introduction Robotics is a field that combines science, engineering, and technology to create devices known as robots. In other words, the robotics area generates programmable devices that aid humans or replicate human activities. Initially, robots were created to do routine tasks (such as building automobiles on a production line), and therefore their employment was restricted to the industrial sector. They have now been expanded to include tasks such as firefighting, tidying houses, and assisting with exceptionally difficult procedures. Robots are also now widely employed in a variety of domains, including entertainment, medicine, mining, rescue, education, defense, aerospace, agribusiness, and many more. Every robot has a distinct level of autonomy, spanning from completely unsupervised robots that perform tasks without any external influence to human-controlled robots that execute tasks under the direct supervision of an operator. As technology advances, so does the spectrum of robotics. Robotics has evolved to include the creation, building, and implementation of robots that investigate Earth's toughest conditions, robots that serve law enforcement, and even robots that assist in virtually every area of healthcare. The robot is integrated with several smart pieces of equipment that are essential to map the environment, locate its position, regulate motion, identify impediments, and avoid obstacles using navigational approaches while executing the task of navigation. In robot navigation, a mobile robot travels from the source station to the target destination without the assistance of humans, avoiding collisions with objects and determining the best path via iterations. Path planning is the most critical function of any navigational technology and is one of the most significant research problems in robotics. It is a basic building block for robotic systems that allows them to find the shortest or optimal path between two points. The most fundamental role of any navigational S. Bahuguna (B) · A. Pal Chandigarh University, Mohali, Punjab, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 M. S. Uddin et al. (eds.), Proceedings of International Joint Conference on Advances in Computational Intelligence, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-19-0332-8_21
approach is to design a safe course from the starting place to the goal position (by recognizing and avoiding obstacles). As a result, while operating in a simple or complicated environment, the right choice of navigational strategy is by far the most critical phase in the course planning of a robot, and effective path planning solves several difficulties in a variety of industries. Path planning or robot navigation can be done in static as well as dynamic environments. The position of an obstacle (if any) is fixed in a static environment and does not vary over time. In a dynamic environment, however, the position of the obstacle varies over time. There are two types of path planning: global path planning and local path planning. Global path planning occurs when the robot has prior knowledge of the surroundings; it is done in offline mode before the robot starts moving. If the robot has no prior knowledge of the environment, it must detect obstacles and decide how to travel towards the target station while avoiding collisions; this is known as local path planning and is done in online mode. Figure 1 depicts the fundamental processes involved in the robot's functioning. Global navigation and local navigation are two types of navigation strategies that differ in the amount of prior knowledge of the surroundings used for path planning. The global navigation strategy deals with a completely known environment, and path planning for such an environment is based on traditional techniques such as the Cell Decomposition (CD) approach, the Roadmap Approach (RA), and the Artificial Potential Field (APF). These are classic algorithms with limited intelligence. Local navigational techniques are referred to as reactive approaches since they are cleverer and more capable of independently controlling and executing a plan. As a result, the numerous methods used for mobile robot navigation may be divided into two categories: classical and reactive approaches. Since
Fig. 1 Flow diagram for mobile robot navigation
artificially intelligent approaches had not yet been developed, traditional methodologies were initially highly popular for handling robot navigational issues. When traditional methods are used to complete a task, either a solution is found or the non-existence of a solution is established. The primary shortcoming of this strategy is its high processing cost and inability to react to unpredictability in the environment; as a result, it is not suggested for real-time use. Reactive approaches, on the other hand, have become the most preferred tool for mobile robot navigation, surpassing traditional approaches. These include genetic algorithms, fuzzy logic, neural networks, particle swarm optimization, ant colony optimization, artificial bee colony, grey wolf optimizer, and other algorithms. They have a lot of potential for dealing with uncertainty in the environment. Reactive techniques are also employed in hybrid algorithms to increase the efficacy of classical approaches. As a result, reactive techniques are increasingly popular and frequently employed for mobile robot route planning.
2 Literature Review A robot is a system of different modules and entities which interact with each other. Path planning is a fundamental building block for robotic systems that enables them to identify the shortest or best path between two places. Mobile robots are currently widely utilized in a number of industries, including medical and surgical uses, agriculture, climate monitoring, military applications, personal assistance, ocean and space exploration, and various others where interest has increased over time and is being explored and discussed [1]. These are machines that have the capacity to maneuver through various environments [2]. As a result, path planning is a crucial part of mobile robot study. The primary focus of path planning is to devise a collision-free strategy from source to destination across an environment with obstacles [3]. Thus, while operating in a simple or complicated environment, the right selection of the navigational strategy is the most critical phase in the course planning of a robot. The job of path planning for mobile robots is usually regarded as one of the most difficult in the area of mobile robotics; it is classified as an NP-complete problem in its most basic variant [4, 5] and an NP-hard problem in the presence of many obstacles [6]. Algorithms that address path planning problems can be divided into two kinds: the global approach, also known as offline planning, and the local approach, also described as online planning. In the global approach, the environment the robot will traverse is thoroughly understood ahead of time, whereas in the local approach, no comprehensive environment information is known in advance [7]. Cell decomposition, the roadmap approach, and the artificial potential field approach are the traditional path planning strategies. The classical methodologies previously stated were shown to be successful and effective in solving the problem by giving plausible collision-free solutions. These techniques, however, have a number of limitations. When coping with challenging issues of large-scale and complex environments,
traditional methodologies take a long time to solve since they create computationally costly solutions [7, 8]. Another disadvantage of these traditional techniques is that they may become stuck in local optimal solutions rather than global optimal solutions, especially when working with vast environments with a wide range of possible answers [9]. Due to the severe shortcomings of the previously described traditional methodologies, probabilistic methods were introduced to handle the path planning problem. A few of the metaheuristic algorithms employed in the solution of this problem are Simulated Annealing (SA) [10, 11], Ant Colony Optimization (ACO) [12], Particle Swarm Optimization (PSO) [13], and their hybrids with other metaheuristics to improve the generated solutions. They have been acknowledged as the most common tool for mobile robot navigation over traditional methodologies, as they have a high capacity to deal with unpredictability in the environment and are less computationally costly than conventional techniques [14, 15]. A hybrid of beta hill climbing [16] and the grey wolf optimizer algorithm, known as the Beta Hill Climbing Grey Wolf Optimizer (β-HCGWO) [17], is suggested in this study to identify the best acceptable path from the starting point to the goal site without encountering any obstacles. To test the efficacy of the β-HCGWO algorithm, we utilized a zone with three circular obstacles of varying radius. Three coordinate points were employed between the start and goal positions while determining the optimal solution of this path planning problem. These coordinate points were modified by β-HCGWO at each iteration. If a solution point lay in an obstacle zone, then a violation term was added to the cost function. The performance of the proposed algorithm was compared with the Grey Wolf Optimizer (GWO), the Particle Swarm Optimization (PSO) algorithm, the Artificial Bee Colony (ABC) algorithm, and the Differential Evolution (DE) algorithm. The rest of the paper is sectioned into five parts. Section 3 overviews the proposed Beta Hill Climbing Grey Wolf Optimizer (β-HCGWO). The robot path planning problem is introduced in Sect. 4. Section 5 presents the experimental results, and lastly Sect. 6 explains the conclusions drawn.
3 β Hill Climbing Grey Wolf Optimizer GWO is a swarm intelligence algorithm presented by Mirjalili et al. in 2014 [18], inspired by the social structure and hunting behavior of Canis lupus. There are four levels, alpha (α), beta (β), delta (δ) and omega (ω), classified in decreasing order of dominance [19], with the alpha wolf being the group's controller and the omega wolves being the lowest in the pack hierarchy. The primary foraging procedure of grey wolves consists of three steps: (a) constantly tracking, outrunning, and edging closer and closer to the target; (b) chasing, encircling, and harassing the prey until it stops moving; and (c) attacking the prey.
While structuring GWO, the social chain of command of the wolves is modeled numerically as follows: the best solution out of a large number of solutions is referred to as alpha (α), the next best is referred to as beta (β), the third-best is referred to as delta (δ), and the rest of the solutions are referred to as omegas. GWO is strong at exploitation but not as good at averting premature convergence and local optima owing to diversity loss, and its search agents cannot probe the search space well [20]. As a result, a hybrid of GWO and the β-hill-climbing (BHC) algorithm is recommended to help balance the exploration and exploitation stages, the latter being particularly good at balancing these two phases. The β-hill-climbing algorithm is a novel and very competent explorative local search algorithm. The method starts from an arbitrary solution x = (x1, x2, x3, ..., xd) to a specified problem, where d is the dimension of the problem. Neighborhood navigation (the N-operator) and the β-operator are the two operators that BHC applies iteratively to produce new random solutions x' = (x'1, x'2, x'3, ..., x'd). The N-operator is in charge of exploitation, whereas the β-operator allows for exploration. As a result of these two operators, the BHC algorithm is not entrapped in local minima/maxima. In this study, β-hill climbing is utilized within the basic GWO only to update the locations of the first-, second- and third-best wolves and thus update their fitness. A random solution is first generated around alpha and its fitness value is calculated. It is
Fig. 2 Pseudo code of β-HCGWO
then compared with the fitness value of alpha: if the new solution is better than the original alpha's objective value, the alpha value is replaced by the new value; if not, the alpha value is updated with the beta hill-climbing move with probability e^(f/T), where f denotes the difference between the new and original alpha objective values and T is a decreasing temperature. The same process is carried out for the beta and delta values as well. A pseudo code of β-HCGWO is shown in Fig. 2.
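The interplay of the N-operator, the β-operator, and the temperature-based acceptance test described above can be summarized in the short Python sketch below. It is a minimal illustration written from this section's description, not the authors' reference implementation; the operator rates (bw, beta), the acceptance form exp(-Δf/T), and the temperature schedule are assumptions.

```python
import numpy as np

def beta_hill_climb(x, cost, lb, ub, bw=0.05, beta=0.1, T=1.0, rng=np.random):
    """One beta-hill-climbing refinement of a leader wolf (alpha/beta/delta).

    x      : current position of the leader (1-D numpy array)
    cost   : objective function to minimise
    lb, ub : lower/upper bound arrays of the search space
    bw     : neighbourhood bandwidth of the N-operator (assumed value)
    beta   : per-dimension probability of the exploratory beta-operator
    T      : current (decreasing) temperature for the acceptance test
    """
    cand = x.copy()
    # N-operator: small random step around the current solution (exploitation)
    cand += bw * (2.0 * rng.rand(x.size) - 1.0) * (ub - lb)
    # beta-operator: with probability beta, reset a dimension at random (exploration)
    mask = rng.rand(x.size) < beta
    cand[mask] = lb[mask] + rng.rand(mask.sum()) * (ub - lb)[mask]
    cand = np.clip(cand, lb, ub)

    f_old, f_new = cost(x), cost(cand)
    if f_new < f_old:
        return cand, f_new                       # improvement: always accept
    # otherwise accept occasionally, annealing-style (assumed exp(-Δf/T) form)
    if rng.rand() < np.exp(-(f_new - f_old) / max(T, 1e-12)):
        return cand, f_new
    return x, f_old
```

In the full β-HCGWO loop this refinement would be applied to the alpha, beta, and delta wolves after every standard GWO position update, with T reduced at each iteration.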
4 Robot Path Planning The aim of robot path planning is to generate a collision-free route from a start location to a destination. Effective methodologies for tackling issues of this nature have a variety of applications, including automated surveillance, computer animation, robotics, and drug design. As a result, it is not surprising that research effort in this sector has gradually increased over the previous two decades. The path planning problem for robots is an NP-hard optimization problem, and meta-heuristic algorithms are frequently used to tackle it. The key goal in solving this challenge is for the mobile robot to progress from the initial site to the target place in the lowest possible time while avoiding any impediments. The problem specification consists of the starting and ending positions, the size and shape of the obstructions, the number of obstacles, and the zone's boundaries [21]. The path planning problem's objective function is as follows:

F = min_{x,y} Q(1 + βV)

where β signifies the violation coefficient (set to 100), V the violation cost, and Q the total distance travelled between the start and target points. Pseudo code of the violation calculation:

Violation ← 0
for each obstacle
    calculate the distance between the obstacle's centre and the path
    a ← max(1 − (distance/radius_obs), 0)
    Violation ← Violation + mean(a)
end for
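The objective function and the violation rule translate directly into code. The sketch below is an illustrative reconstruction, not the authors' exact code; it assumes, as in the experiment of the next section, that a candidate path is a polyline through a few intermediate waypoints and that obstacles are circles given by centre and radius.

```python
import numpy as np

def path_cost(waypoints, start, goal, obstacles, beta=100.0, samples=100):
    """Cost of a candidate path: Q * (1 + beta * V).

    waypoints : (k, 2) array of intermediate points optimised by the algorithm
    obstacles : list of (cx, cy, radius) circles
    """
    pts = np.vstack([start, waypoints, goal])

    # densify the polyline so the violation check covers the whole path
    dense = []
    for a, b in zip(pts[:-1], pts[1:]):
        t = np.linspace(0.0, 1.0, samples)[:, None]
        dense.append(a + t * (b - a))
    path = np.vstack(dense)

    # Q: total travelled distance along the polyline
    Q = np.sum(np.linalg.norm(np.diff(pts, axis=0), axis=1))

    # V: mean penetration depth into each obstacle, summed over obstacles
    V = 0.0
    for cx, cy, r in obstacles:
        d = np.linalg.norm(path - np.array([cx, cy]), axis=1)
        a = np.maximum(1.0 - d / r, 0.0)   # a = max(1 - distance/radius, 0)
        V += a.mean()

    return Q * (1.0 + beta * V)
```

Any of the compared metaheuristics can then minimise path_cost over the waypoint coordinates within the zone's boundaries.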
5 Results of Experiment We utilized an example scenario from the www.yarpiz.com [22] website to show the operation of the β-HCGWO algorithm for path planning issues.
Fig. 3 Path planning problem used in this study
This example case is depicted in Fig. 3. In a 6 × 6 zone, there are three circle-shaped obstacles with varying radii. The yellow square represents the mobile robot's starting place, and the green star represents the target point. We solved this problem with the β-HCGWO algorithm, and its performance was compared to that of several well-known meta-heuristic methods, namely the Grey Wolf Optimizer (GWO), the Particle Swarm Optimization (PSO) algorithm, the Artificial Bee Colony (ABC) algorithm, and the Differential Evolution (DE) algorithm. The β-HCGWO and other meta-heuristic method codes were run on a PC with an Intel(R) Core(TM) i5-7200U CPU running at 2.60 GHz and 8.00 GB of RAM. The population size is 50, and the number of iterations is 1000. Figure 4 depicts the optimal path planning solution produced after a single run of β-HCGWO and the other meta-heuristic algorithms with which it was compared. The parameters of the meta-heuristic algorithms utilized to solve the robot path planning problem are outlined in Table 1. The cost value of the β-HCGWO algorithm is found to be 7.3995. During optimization, it was observed that the β-HCGWO algorithm finds the best path with the shortest distance between the start and target sites. Furthermore, at each iteration the paths found by the best current solution contain a very minimal violation. Figure 5 shows the convergence curve of β-HCGWO; its performance delivers a slightly better outcome, and it can be used as an alternative method for path planning.
Fig. 4 Best solutions obtained by the nature-inspired algorithms: a.1 DE, a.2 ABC, a.3 PSO, a.4 GWO, and a.5 β-HCGWO
Table 1 Parameters of nature inspired algorithms

Algorithm   Parameters
β-HCGWO     Number of Wolves: 50
GWO         Number of Wolves: 50
PSO         Inertia Weight: 1.0; Inertia Weight Damping Ratio: 0.99; Personal Learning Coefficient: 1.5; Global Learning Coefficient: 2.0
ABC         Number of Onlooker Bees: 50; Abandonment Limit Parameter: round(0.6*NumberOfVar*PopSize)
DE          Lower Bound of Scaling Factor: 0.5; Upper Bound of Scaling Factor: 1.0; Crossover Probability: 0.7; Strategy: rand2bin
Fig. 5 Convergence curves of β-HCGWO for robot path planning problem
6 Conclusion This paper investigated the robot path planning problem and presented the β-HCGWO method to solve it. Well-known meta-heuristic methods (GWO, PSO, DE and ABC) were used to evaluate the algorithm's performance in addressing the path planning problem. The comparison findings reveal that the proposed β-HCGWO algorithm slightly outperforms the other metaheuristics. It can be improved further in the future by increasing the algorithm's efficacy.
References

1. Lazinica A (2006) Mobile robots – towards new applications. I-Tech Education and Publishing
2. Grayson P (1999) Robotic motion planning. MIT Undergraduate J Math 1:57–67
3. Sharir M (1989) Algorithmic motion planning in robotics. Computer 22(3):9–20
4. Hussein A, Mostafa H, Badrel-din M, Sultan O, Khamis A (2012) Metaheuristic optimization approach to mobile robot path planning. In: 2012 international conference on engineering and technology (ICET), pp 1–6
5. Nearchou AC (1998) Path planning of a mobile robot using genetic heuristics. Robotica 16:575–588
6. Canny J, Reif J (1987) New lower bound techniques for robot motion planning problems. In: 28th annual symposium on foundations of computer science (SFCS 1987), Los Angeles. IEEE
7. Raja P, Pugazhenthi S (2012) Optimal path planning of mobile robots: a review. Int J Phys Sci 7:1314–1320
8. Sugihara K, Smith J (1997) Genetic algorithms for adaptive motion planning of an autonomous mobile robot. In: Proceedings 1997 IEEE international symposium on computational intelligence in robotics and automation, Monterey. IEEE, pp 138–143
9. Ismail AT, Sheta A, Al-Weshah M (2008) A mobile robot path planning using genetic algorithm in static environment. J Comput Sci 4(4):341–344
10. Blackowiak A, Rajan S (1995) Multi-path arrival estimates using simulated annealing: application to crosshole tomography experiment. IEEE J Oceanic Eng 20:157–165
11. Carriker W, Khosla PK, Krogh BH (1990) The use of simulated annealing to solve the mobile manipulator path planning problem. In: IEEE international conference on robotics and automation, pp 204–209
12. Roul S (2011) Application of ant colony optimization for finding navigational path of mobile robot. Master's thesis, National Institute of Technology, Rourkela
13. Wang L, Liu Y, Deng H, Xu Y (2006) Obstacle-avoidance path planning for soccer robots using particle swarm optimization. In: 2006 IEEE international conference on robotics and biomimetics, Kunming, China. IEEE
14. Elshamli A, Abdullah H, Areibi S (2004) Genetic algorithm for dynamic path planning. In: Canadian conference on electrical and computer engineering 2004 (IEEE Cat. No. 04CH37513), Niagara Falls, ON, Canada. IEEE
15. Garcia MP, Montiel O, Castillo O, Sepúlveda R, Melin P (2009) Path planning for autonomous mobile robot navigation with ant colony optimization and fuzzy cost function evaluation. Appl Soft Comput 9:1102–1110
16. Al-Betar MA (2016) β-Hill climbing: an exploratory local search. Neural Comput Appl 28(1):153–168
17. Bahuguna S, Pal A (2021) β-hill climbing grey wolf optimizer. In: Tiwari A, Ahuja K, Yadav A, Bansal JC, Deep K, Nagar AK (eds) Soft computing for problem solving. Advances in intelligent systems and computing, vol 1393. Springer, Singapore
18. Mirjalili S, Mirjalili SM, Lewis AD (2014) Grey wolf optimizer. Adv Eng Softw 69:46–61
19. Melin P, Castillo O, Kacprzyk J (2017) Nature-inspired design of hybrid intelligent systems. Springer, Cham
20. Arora S, Singh H, Sharma M, Sharma S, Anand P (2019) A new hybrid algorithm based on grey wolf optimization and crow search algorithm for unconstrained function optimization and feature selection. IEEE Access 7:26343–26361
21. Doğan L, Yüzgeç U (2018) Robot path planning using gray wolf optimizer. In: International conference on advanced technologies, computer engineering and science (ICATCES 2018), Safranbolu, Turkey, pp 69–74
22. Heris MK (2015) Optimal robot path planning using PSO in MATLAB. Yarpiz
Chapter 22
Texture Feature Analysis for Inter-Frame Video Tampering Detection Shehnaz and Mandeep Kaur
1 Introduction With the advent of advanced and sophisticated processing tools, fake digital videos are proliferating in society. This poses a potential threat to the authenticity and integrity of multimedia data. Using these tools, tampering [1] (unauthorized editing) is done on original videos to create tampered/forged (fake) videos. Technically, a video is just a sequence of frames (images) that are displayed at a high rate to generate motion of its objects. If a tampering operation modifies the sequence of frames, it is termed Inter-Frame video tampering; if it changes frames as images, it is known as Intra-Frame tampering. These tampering operations can result in different types of video attacks such as Frame Deletion [2], Frame Replication (Frame Cloning/Frame Copy Move) [3], Frame Shuffling Attack, Frame Interpolation [4], Frame Mirroring Attack [5], Upscale-Crop [6], Region Duplication [7], Video Splicing [8], Replayed Video [9], Video Face Spoofing [10], Video Re-capture [11], Video Copy [12], Video Phylogeny [13], Green Screening [7], Deep Fake [14], etc. Inter-Frame Tampering includes frame deletion, frame insertion, frame shuffling, frame interpolation, frame replication, frame mirroring, etc.; whereas Intra-Frame Tampering includes region duplication, video splicing and upscale-crop. Multiple tampering operations on a video can result in a complex forgery which may lead to adverse social, legal and political implications. Hence, it becomes essential to authenticate a video scientifically, especially when it is presented as evidence in a court of law. The domain of video forensics provides active and passive detection approaches to identify video tampering. Active approaches are intrusive in nature; they confirm tampering by recomputing and Shehnaz (B) · M. Kaur Department of Information Technology, University Institute of Engineering and Technology, Panjab University, Chandigarh, India e-mail: [email protected] M. Kaur e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 M. S. Uddin et al. (eds.), Proceedings of International Joint Conference on Advances in Computational Intelligence, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-19-0332-8_22
matching of pre-embedded watermarks or digital signatures. They have some limitations, such as requiring additional hardware to embed the watermark or signature. In contrast, passive approaches study the video content to find traces of tampering. They study and analyze the footprints left by editing operations to find forgery. These can be noise residues, changes in prediction error values, motion residues, high spatial and temporal correlation between frame intensity values, inconsistencies in optical flow, abnormalities in motion vectors, differences in the quality of frames, variation of prediction footprint (VPF), motion-compensated edge artifacts (MCEA), blocking artifacts, ghost artifacts, variation in texture, etc. Most inter-frame forgery detection approaches that exist in the literature study pixel-level, compression, or motion features of a video. If we delete, insert or rearrange a number of frames, it causes temporal disturbances in the video sequence. Such methods study the pixel-correlation between adjacent frames of a video to find inconsistencies in continuity or similarity using specific features to reveal inter-frame forgery; for example, the methods in [15, 16] analyze grey intensity values. The approach presented in [44] is based on 2D phase congruency and has good accuracy for insertion but not for deletion. The SIFT-based method in [17] can detect only frame duplication attacks in a given video. In [18], a SURF-based histogram similarity approach is applied that can detect frame insertion, deletion and duplication, but it requires a video shot detection scheme beforehand and cannot identify complex forgeries. The texture feature (LBP) based method [19] identifies frame insertion and deletion but cannot distinguish between these two forgeries. Similarly, the method in [20] studies noise inconsistencies to detect frame duplication or mirroring attacks. Motion-level inconsistencies are analyzed in optical flow-based methods [21–25], and the techniques in [26, 27] study MCEA (motion-compensated edge artifacts) to detect inter-frame video tampering. Methods [28–37] study compression artifacts such as DCT coefficients, VPF, prediction error sequences of the Discrete Fourier Transform, Markov features [38], first digit distribution [39], and DCT histograms [40] to reveal inter-frame video tampering. These methods compute pixel-correlation to measure the similarity of adjacent frames, due to which their computational cost is very high [19, 43, 44]. Since a video can be of any length because of different frame rates, processing each video of considerable length is difficult. For this reason, they are impractical and of limited applicability. This fact motivates us to propose a practical, accurate and efficient approach to detect inter-frame video forgeries. In this paper, a histogram-based approach is proposed to identify inter-frame video forgeries. It studies texture features (LBP) that are more reliable than motion and compression features. Motion features provide no footprints in a video where motion is either too high or absent (static scenes). Similarly, compression features do not serve the purpose in the case of uncompressed video. Owing to their limited scope, the proposed approach instead extracts the LBP texture feature of each frame and computes its histogram. It checks the histogram-similarity of adjacent frames using the histogram intersection comparison metric rather than pixel-similarity, which can provide erroneous data.
It then analyses inconsistencies in similarity pattern by taking differences of metric values of adjacent frames to exploit tampering artifacts. With empirical analysis, it is observed that this approach yields good detection
accuracy on various kinds of inter-frame forgeries. Experimental results based on the LBP feature and its variants are presented. It also includes comparison results with a pixel-correlation based approach. The paper is organized as follows: Sect. 2 presents the methodology of the proposed approach; Sect. 3 provides the details of the experimental dataset generated to represent inter-frame forgeries. Experimental results and discussion are presented in Sects. 4 and 5, respectively. Section 6 concludes the paper with a discussion on future scope.
2 Proposed Inter-Frame Video Tampering Detection Approach A novel histogram-based approach is proposed to identify inter-frame forgeries in a digital video. The block diagram of the proposed approach is presented in Fig. 1 and details of each step are discussed in the following subsections.
2.1 Video to Frame Conversion Video is just a sequence of images (called frames) that are displayed at a high frame rate to show motion in the video. The number of video frames depends upon the video duration and its frame rate. Processing the whole video at once is usually a computationally expensive operation. Therefore, the input video is first converted into video frames so that it can be further processed as images.
2.2 Video Frame Pre-processing Here RGB video frames are processed to obtain grayscale video frames. The gray value of an image defines the brightness level, which can accurately represent the video content. This is beneficial because an RGB color image takes three times more storage space than a gray-scale image, which results in increased computation time and space. It is therefore a usual preprocessing step followed in many applications. The RGB image is converted to a gray-scale image using the following relationship between the RGB model and the YUV model, where component Y represents the brightness. Y = 0.2989 × R + 0.5870 × G + 0.1140 × B
(1)
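As a brief illustrative sketch of these two pre-processing steps (not the authors' exact code), the OpenCV snippet below reads a video, extracts its frames, and converts each to grayscale; cvtColor applies essentially the same luminance weighting as Eq. (1). The file name is a placeholder.

```python
import cv2

def video_to_gray_frames(video_path):
    """Yield grayscale frames of a video one at a time (Steps 2.1 and 2.2)."""
    cap = cv2.VideoCapture(video_path)
    while True:
        ok, frame = cap.read()                     # frame is a BGR image (H x W x 3)
        if not ok:
            break
        # luminance conversion corresponding to Eq. (1)
        yield cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    cap.release()

# example usage with a placeholder file name
# frames = list(video_to_gray_frames("input_video.avi"))
```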
2.3 Feature Extraction Several features exist in the literature that can represent different types of useful information of given video frames. The proposed approach extracts LBP features for each gray-scaled video frame. The local binary pattern operator [41] is defined as a gray-scale invariant texture measure. Due to its discriminative power, good efficacy, and computational simplicity, this feature and its variants find a good place in several applications such as image representation and classification, image forensics, image retrieval, motion analysis, visual inspection, video analysis, and distinguishing computer graphics from photographic images. It has variants, namely the uniform LBP (U-LBP) and rotation invariant LBP (R-LBP), that are also exploited in the current paper. The U-LBP patterns contain at most two circular 0–1 and 1–0 transitions. For example, patterns 0011000, 11011100, 00001100, and 11111011 are uniform, whereas 011101000 and 11001110 are not uniform. They are mostly used because of their high discriminative capability and compact image representation that yields lower-dimensional feature vectors. They also improve the performance of machine learning-based classification. The R-LBP variant is mostly used for good texture analysis. It is generated by circularly rotating the bits of the original LBP pattern to obtain the pattern with the minimum value. For example, 111010001, 110100011, 101000111, 010001111 and 100011110 are generated from the same original LBP with one-bit circular rotations, and they are normalized to the minimum value pattern 000111101 (the rotation invariant pattern). The LBP feature is defined in Eqs. (2) and (3).
LBP_P = Σ_{p=0}^{P−1} s(g_p − g_c) · 2^p    (2)

where

s(x) = 1, if x ≥ 0; 0, if x < 0    (3)
Here, P denotes the number of sampling points in the neighborhood around the central pixel and is set to 8, g_c is the gray value of the central pixel, and g_p (p = 0, ..., P − 1) represents the gray value of the pth neighbor point of the central pixel. If g_p is greater than or equal to g_c, the corresponding binary code is 1, otherwise 0. The LBP of the central pixel is then generated by concatenating the P binary values and transforming the binary string to a decimal number. LBP computation can be explained through the following example: if a 3 × 3 window is used to compute the LBP value, the central pixel value '89' is coded as '138' in the LBP-coded frame using Eqs. (2) and (3), as explained in Fig. 2.
Fig. 1 Methodology of the proposed inter-frame video forgery detection approach (pipeline: video to frame conversion → video frame pre-processing → feature extraction → histogram computation → histogram similarity analysis → normalization and quantization → training and testing → decision: original or tampered video)
Fig. 2 Computation of LBP value for the central pixel
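A compact way to obtain the LBP codes of Eqs. (2)–(3) and their variants is scikit-image's local_binary_pattern, sketched below for P = 8 neighbours at radius 1; method='default' gives the original LBP, 'ror' the rotation-invariant variant and 'uniform' the uniform variant. The 256-bin histogram of the coded frame (Step 2.4) is computed with NumPy. This is an illustrative sketch, not the authors' exact implementation.

```python
import numpy as np
from skimage.feature import local_binary_pattern

def lbp_histogram(gray_frame, variant="default", P=8, R=1):
    """LBP-code a grayscale frame and return its 256-bin histogram."""
    codes = local_binary_pattern(gray_frame, P, R, method=variant)
    # 256 bins match the fixed-length descriptor of Step 2.4; the U-LBP and
    # R-LBP variants simply occupy fewer of these bins
    hist, _ = np.histogram(codes, bins=256, range=(0, 256))
    return hist.astype(np.float64)
```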
2.4 Histogram Computation Most of the existing inter-frame video forgery detection methods are pixel-based approaches [19, 43, 44]. They obtain feature-coded frames with a size equal to the size of the video frames and compute pixel-correlation between the adjacent frames. The number of pixels depends upon the resolution of a video frame; hence videos with high resolution have a large number of pixels. Pixel-by-pixel correlation computation increases execution time, which makes such approaches inefficient and impractical. Due to this, the proposed approach computes the histogram of the LBP-coded video frame as a texture descriptor. It has a length of 256 irrespective of the size and
resolution of feature-coded frames. This step improves time and space efficiency to a great extent.
2.5 Histogram Similarity Analysis In a given video shot, the similarity of the content between adjacent video frames is high, which can be due to static backgrounds and the same objects, etc., while far-apart frames have relatively low similarity, which can be due to changes in backgrounds and objects caused by tampering operations such as frame deletion, frame insertion, and frame duplication. In this paper, similarity analysis is carried out based on the histogram of texture descriptors. The similarity is calculated using the histogram intersection metric given in Eq. (4). More similar histograms result in a high value of the metric, and less similar histograms give a low value of the histogram intersection metric.

d_k(H(F_i), H(F_i+1)) = [ Σ_{j=0}^{l−1} min(H_j(F_i), H_j(F_i+1)) ] / [ Σ_{j=0}^{l−1} H_j(F_i+1) ]    (4)

H_j(F_i) is the jth bin of the histogram of the LBP-featured video frame F_i, with i = 1, 2, 3, ..., n − 1, where n is the total number of video frames, l is the vector length of the histogram and equals 256, and d_k is the calculated histogram intersection metric between the two histograms. This step provides n − 1 d_k values for a video of length n. If tampering is done at one location, then there will be only one low d_k value while the other values are high, due to which the point of tampering is suppressed. To highlight the traces of tampering, we take differences of adjacent d_k values using the following Eq. (5) and get a vector d′ of size (n − 1) × 1 with values d′_1, d′_2, d′_3, ..., d′_{n−1} that preserves the variability of the video content. It is observed experimentally that the variability of this sequence remains consistent in original videos, whereas it becomes inconsistent if videos have been tampered with. The d′ sequence of the original video therefore differs from those of its tampered versions having frame deletion, frame insertion, and frame duplication attacks.

d′_k = |d_k − d_{k+1}|, k > 2; 0, k = 1    (5)
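Equations (4) and (5) can be sketched in code as follows. The helper names are illustrative; d below holds the per-pair intersection scores and diff the difference sequence that exposes the tampering point.

```python
import numpy as np

def intersection(h1, h2):
    """Histogram intersection metric of Eq. (4) between two frame histograms."""
    return np.minimum(h1, h2).sum() / h2.sum()

def similarity_difference_sequence(histograms):
    """Apply Eqs. (4) and (5) to the histograms of consecutive frames."""
    d = np.array([intersection(histograms[i], histograms[i + 1])
                  for i in range(len(histograms) - 1)])
    # absolute differences of adjacent metric values highlight the single low
    # value caused by an inter-frame tampering operation
    diff = np.abs(d[:-1] - d[1:])
    return d, diff
```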
2.6 Normalization and Quantization The feature vector is normalized using the min–max scaler method to get values in the standard range 0 to 1. All the elements are then quantized into D quantization levels with quantization interval 1/D, mapping the n − 1 elements to discrete values from 1/D to 1 (1/D, 2/D, ..., 1). Then, the distribution of the obtained discrete values is counted into a D × 1 vector to obtain a fixed-length feature for videos of different lengths. This technique is applied to each original and forged video included in the dataset given in Table 1. A database of N vectors is created and trained using an SVM (Support Vector Machine) with an RBF (Radial Basis Function) kernel. Training is done with a dataset of 2740 videos (1370 original videos, 1370 forged videos), where 40% of the samples are used for testing with K-fold cross-validation.
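The final feature construction and classification stage can be sketched as below. The value of D is not fixed in the text, so D = 20 is only a placeholder, and scikit-learn's SVC with an RBF kernel stands in for the SVM training; X and y are hypothetical stacked features and labels.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

def quantized_feature(diff_sequence, D=20):
    """Min-max normalise the difference sequence and bin it into a D x 1 vector."""
    x = np.asarray(diff_sequence, dtype=np.float64)
    x = (x - x.min()) / (x.max() - x.min() + 1e-12)   # min-max scaling to [0, 1]
    levels = np.ceil(x * D).clip(1, D)                # quantisation interval 1/D
    counts = np.array([(levels == k).sum() for k in range(1, D + 1)])
    return counts / counts.sum()                      # fixed-length descriptor

# X: quantized features of all original and forged videos, y: labels (0/1)
# clf = SVC(kernel="rbf")
# scores = cross_val_score(clf, X, y, cv=10)          # 10-fold cross-validation
```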
3 Dataset To verify the effectiveness of the proposed approach, standard datasets are generally required. To the best of our knowledge, currently no benchmark dataset of inter-frame tampered videos exists. A few standard datasets are available, such as SULFA [45] and VTD [1], but they contain testing videos for intra-frame forgeries only. Therefore, six datasets of inter-frame tampered videos are designed from uncompressed original videos of the benchmark SULFA dataset. They are edited using the Python library MoviePy to generate 1370 tampered videos. Most authors validate their proposed techniques on customized datasets which contain inter-frame tampering of durations of 2, 3 or 4 s. Generally, the frame rate of a video is 25 fps; accordingly, they include videos having deletion or insertion of 50, 75, or 100 frames. It is very easy for a detection tool to identify such a substantial amount of tampering, which results in good accuracy. Therefore, to avoid this type of bias, we perform deletion/duplication/insertion of 1 or 2 s video shots, comprising 25 or 50 frames, at random locations. Other specifications of the original version are kept the same in the tampered videos. Details of the dataset are presented in Table 1 below.

Table 1 Details of fabricated inter-frame video forgery dataset

Dataset  Type of forgery         Number of videos  Frame rate  Resolution
1        25 frames deletion      233               25 fps      320 * 240
2        50 frames deletion      63                25 fps      320 * 240
3        25 frames insertion     220               25 fps      320 * 240
4        50 frames insertion     295               25 fps      320 * 240
5        25 frames duplication   281               25 fps      320 * 240
6        50 frames duplication   278               25 fps      320 * 240
4 Experimental Results The experiment is carried out on a Python-based platform to implement the above-mentioned approach, with system specifications of an Intel(R) Core(TM) i5-8250U CPU @ 1.60 GHz and 8 GB RAM on a 64-bit operating system. The outcome of the different implementation phases is discussed in the current section. The frame sequence of the original video is represented in Fig. 3, on which different attacks are performed, i.e., frame deletion, insertion and duplication, depicted in Figs. 4, 6 and 8 respectively. Based on the rotation invariant LBP feature, Fig. 5(a) highlights the statistical distribution of the original video sequences of frames 101–125 and 126–150. If frame sequence 101–125 is deleted, then the distribution of the original frame sequence 126–150 should take its place. Figure 5(b) shows that the highlighted distribution present in the forged video is similar to that present at location 126–150 in Fig. 5(a). Figure 6 shows that new frames are inserted at locations 101–125 in the original video to perform a frame insertion attack. Due to this, the inconsistent statistical distribution of the forged video shown in Fig. 7 is obtained, which is totally different from Fig. 5(a). In Fig. 8, a frame duplication attack is performed at locations 76–100 with duplicates of the original frames present at locations 26–50. The highlighted similar distribution shown in Fig. 9 is obtained by this method, which proves the existence of duplication of frames. This methodology is trained and tested with an SVM classifier with an RBF kernel. It trains on a database of 2740 videos containing an equal number of original and forged videos. K-fold cross-validation is done with K = 10, and 40% of the samples are taken for testing. Experimental results based on different parameters such as Precision, Recall, F1-score, and Accuracy are summarized below in Tables 2, 3 and 4, which display the performance of the proposed approach based on original LBP, R-LBP, and U-LBP respectively. Figure 10 shows the performance of LBP and its variants in detecting frame insertion, deletion and duplication. Method [16] is cited by most of the methods available in the literature and works in a similar manner, but it can identify only frame insertion and deletion attacks. The proposed approach is compared with the pixel-correlation based
Fig. 3 Original video sequence with frame number 0–180 taken from SULFA dataset
Fig. 4 Forged video is generated by deleting video frame sequence 101–125 from the original video shown in Fig. 3
Fig. 5 a Differences of Histogram intersection metric values based on rotation invariant LBP for the original video, b Differences of Histogram intersection metric values based on rotation invariant LBP for tampered video with Frame deletion operation
Fig. 6 Forged video with frame insertion attack performed at frame number 101–125 of original video sequence shown in Fig. 3
Fig. 7 Difference of Histogram intersection metric values based on rotation invariant LBP for tampered video with Frame Insertion attack shown in Fig. 6
method [16] and its results of overall accuracy shown in Table 5 are graphically presented in Fig. 11.
Fig. 8 Video frames 26–50 are duplicated at frame locations 76–100 in the original video sequence of Fig. 3 to perform a frame duplication attack
Table 2 Results of proposed approach based on original LBP

Tampering detection  Precision  Recall  F1-score  Accuracy
Deletion             0.98       0.98    0.98      98%
Insertion            0.93       0.89    0.90      91%
Duplication          0.97       0.97    0.97      97%
Overall accuracy     0.96       0.94    0.95      95%
Table 3 Results of proposed approach based on rotation invariant LBP

Tampering detection  Precision  Recall  F1-score  Accuracy
Deletion             0.99       0.99    0.99      99%
Insertion            0.98       0.98    0.98      98%
Duplication          1          1       1         100%
Overall accuracy     0.99       0.99    0.99      99%
Table 4 Results of proposed approach based on uniform LBP

Tampering detection  Precision  Recall  F1-score  Accuracy
Deletion             0.99       0.99    0.99      99%
Insertion            0.99       0.99    0.99      99%
Duplication          1          1       1         100%
Overall accuracy     0.99       0.99    0.99      99%
Fig. 9 Difference of Histogram Intersection metric values based on rotation invariant LBP for the tampered video with the frame duplication attack shown in Fig. 8
Fig. 10 Comparison results of proposed approach based on LBP, R-LBP, and U-LBP (accuracy for deletion, insertion, duplication, and overall)
Table 5 Comparison of proposed approach with method [16]

Method    Feature  Precision  Recall  F1-score  Accuracy
[16]      Gray     0.94       0.93    0.93      94%
[16]      LBP      0.95       0.93    0.94      94%
Proposed  LBP      0.96       0.94    0.95      95%
Proposed  U-LBP    0.99       0.99    0.99      99%
Proposed  R-LBP    0.99       0.99    0.99      99%
Fig. 11 Comparison results based on overall accuracy (gray value [16], LBP [16], and the proposed approach with LBP, U-LBP, and R-LBP)
5 Discussion The pixel-correlation based approach [16] with the HOG feature [42] performs best among HOG, LBP and gray values when used with a random forest classifier to detect frame deletion forgery. It is less suitable for detecting frame insertion because it yields lower accuracy (86%–87%) compared to the accuracy (91%–93%) that can be achieved with LBP and gray values. It is also observed that, irrespective of the classifier chosen, gray value and LBP features are good at detecting frame insertion tampering, but the approach produces different results for different kinds of forgery and cannot detect all types of inter-frame forgery at a time. With empirical analysis, it is observed that approach [16] has a high computational cost and relatively low classification accuracy. It was tested on frame insertion and deletion inter-frame forgeries but not on the duplication attack. The proposed methodology exhibits better classification accuracy on different kinds of inter-frame forgeries. It is tested with different variants of LBP using an SVM with RBF kernel. The R-LBP gives an overall accuracy of 99%, compared to the default LBP, which exhibits a classification accuracy of 95%. The proposed approach accurately detects frame deletion and duplication but detects frame insertion with slightly less accuracy. It is observed that the U-LBP feature along with the histogram intersection comparison metric contributed a lot to improving the scalability and applicability. Using U-LBP, this method is able to detect frame insertion, deletion and frame duplication attacks with overall good accuracy in the range 99%–100% in less time. It gives 100% accuracy in the detection of frame duplication attacks.
6 Conclusion An inter-frame video tampering detection approach is proposed. Unlike the pixel-correlation based approach, it follows a histogram-based approach to reduce computational cost. The proposed approach can detect various kinds of inter-frame forgeries like insertion, deletion and duplication with high classification accuracy in the range 98–100%. In addition, it works efficiently on videos of varying lengths and thus exhibits better reliability and scalability. Experiments are conducted on the variants of LBP, where it is observed that U-LBP outperforms the other variants, particularly on samples with the frame duplication attack. Results based on different variants of LBP and comparison results with method [16] are provided. Existing approaches are tested and validated on limited customized datasets due to the unavailability of a standard dataset. A customized dataset is therefore created with a total of 2740 videos containing 1370 tampered videos generated from the benchmark dataset. The proposed method can be improved to detect frame shuffling and to incorporate localization of the tampered region in the video under analysis.
References 1. Ismael Al-Sanjary O, Ahmed AA, Sulong G (2016) Development of a video tampering dataset for forensic investigation. Forensic Sci Int 266:565–572 2. Shanableh T (2013) Detection of frame deletion for digital video forensics. Digit Investig 10(4):350–360 3. Singh RD, Aggarwal N (2018) Video content authentication techniques: a comprehensive survey. Multimedia Syst 24(2):211–240 4. Yao Y, Yang G, Sun X, Li L (2016) Detecting video frame-rate up-conversion based on periodic properties of edge-intensity. J Inf Secur Appl 26:39–50 5. Bozkurt I, Bozkurt MH, Uluta¸s G (2017) A new video forgery detection approach based on forgery line. Turk J Electr Eng Comput Sci 25(6):4558–4574 6. Hyun DK, Ryu SJ, Lee HY, Lee HK (2013) Detection of upscale-crop and partial manipulation in surveillance video based on sensor pattern noise. Sensors 13(9):12605–12631 7. Wang W, Farid H (2009) Exposing digital forgeries in a video by detecting double quantization. In: Proceedings of the 11th ACM multimedia security workshop, pp 39–47 8. Singh RD, Aggarwal N (2017) Detection of upscale-crop and splicing for digital video authentication. Digit Investig 21:31–52 9. Li L, Xia Z, Hadid A, Jiang X, Zhang H, Feng X (2019) Replayed video attack detection based on motion blur analysis. IEEE Trans Inf Forensics Securi 14(9):2246–2261 10. Zhang Y, Dubey RK, Hua G, Thing VLL (2019) Face spoofing video detection using spatiotemporal statistical binary pattern. In: IEEE Region 10 annual international conference, proceedings/TENCON, October 2018, pp 309–314 11. Schaber P, Dong S, Guthier B, Kopf S, Effelsberg W (2015) Modeling temporal effects in the re-captured video. In: Proceedings of the 2015 ACM multimedia conference, pp 1279–1282 12. Esmaeili MM, Fatourechi M, Ward RK (2011) A robust and fast video copy detection system using content-based fingerprinting. IEEE Trans Inf Forensics Secur 6(1):213–226 13. Lameri S, Bondi L, Bestagini P, Tubaro S (2018, September) Near-duplicate video detection exploiting noise residual traces. In: Proceedings - international conference on image processing, ICIP, vol 2017, pp 1497–1501 14. Yang X, Li Y, Lyu S (2019, May) Exposing deep fakes using inconsistent head poses. In: ICASSP, IEEE international conference on acoustics, speech and signal processing proceedings, vol 2019, pp 8261–8265 15. Zheng L Sun T, Shi YQ (2015) Inter-frame video forgery detection based on block-wise brightness variance descriptor. In: Lecture notes in computer science (including subseries lecture notes in artificial intelligence and lecture notes in bioinformatics), vol 9023, pp 18–30 16. Wang Q, Li Z, Zhang Z, Ma Q (2014) Video inter-frame forgery identification based on consistency of correlation coefficients of gray values. J Comput Commun 2(04):51 17. Ulutas G, Ustubioglu B, Ulutas M, Nabiyev VV (2018) Frame duplication detection based on BoW model. Multimedia Syst. 24(5):549–567 18. Zhao DN, Wang RK, Lu ZM (2018) Inter-frame passive-blind forgery detection for video shot based on similarity analysis. Multimedia Tools Appl 77(19), 25389–25408 19. Zhang Z, Hou J, Ma Q, Li Z (2015) Efficient video frame insertion and deletion detection based on the inconsistency of correlations between local binary pattern coded frames. Secur Commun Netw 8:311–320 20. Ulutas G, Ustubioglu B, Ulutas M, Nabiyev V (2017) Frame duplication/mirroring detection method with binary features. IET Image Process 11(5), 333–342 21. 
Micheloni C, Canazza S, Foresti GL (2009) Audio-video biometric recognition for noncollaborative access granting. J Vis Lang Comput 20(6):353–367 22. Jia S, Xu Z, Wang H, Feng C, Wang T (2018) Coarse-to-fine copy-move forgery detection for video forensics. IEEE Access 6:25323–25335 23. Chao J, Jiang X, Sun T ( 2018) A novel video inter-frame forgery model detection, pp 267–281. Springer, Heidelberg
24. Kingra S, Aggarwal N, Singh RD (2017) Video inter-frame forgery detection approach for surveillance and mobile recorded videos. Int J Electr Comput Eng 7(2):831–841 25. Singh RD, Aggarwal N (2017) Optical flow and prediction residual based hybrid forensic system for inter-frame tampering detection. J Circuits Syst Comput 26(7) 26. Stamm MC et al (2009) Temporal forensics and anti-forensics for motion compensated video. IEEE Trans Inf Forensics Secur 228:84–96 27. Yao H, Ni R, Zhao Y (2019) An approach to detect video frame deletion under anti-forensics. J Real-Time Image Process 16(3):751–764 28. Wang W, Farid H (2007) Exposing digital forgeries in interlaced and deinterlaced video. In: MM and Sec’07 - proceedings of the multimedia and security workshop 2007, vol 2, no 3, pp 35–42 29. He P, Jiang X Sun T, Wang S (2016) Double compression detection based on local motion vector field analysis in static-background videos. J Vis Commun Image Represent 35: 55–66 30. Vázquez-Padín D, Fontani M, Bianchi T, Comesaña P, Piva A, Barni M (2012) Detection of video double encoding with GOP size estimation. In: WIFS 2012 - proceedings of the 2012 IEEE international workshop on information forensics and security, pp 151–156 31. Stamm MC, Lin WS, Liu KJR (2012) Temporal forensics and anti-forensics for motioncompensated video. IEEE Trans Inf Forensics Secur 7(4):1315–1329 32. Jiang X, Xu Q, Sun T, Li B, He P (2019) Detection of HEVC double compression with the same coding parameters based on analysis of intra coding quality degradation process. IEEE Trans Inf Forensics Secur 15:250–263 33. Bakas J, Naskar R, Bakshi S (2021) Detection and localization of inter-frame forgeries in videos based on macroblock variation and motion vector analysis. Comput Electr Eng 89:106929 34. Singh G, Singh K (2019) Video frame and region duplication forgery detection based on correlation coefficient and coefficient of variation. Multimedia Tools Appl 78(9), 11527–1156 35. Huang T, Zhang X, Huang W, Lin L, Su W (2018) A multi-channel approach through the fusion of audio for detecting video inter-frame forgery. Comput Secur 77:412–426 36. Abbasi Aghamaleki, J, Behrad A (2017) Malicious inter-frame video tampering detection in MPEG videos using time and spatial domain analysis of quantization effects. Multimedia Tools Appl 76(20):20691–20717 37. Abbasi Aghamaleki J, Behrad A (2016) Inter-frame video forgery detection and localization using intrinsic effects of double compression on quantization errors of video coding. Signal Process Image Commun 47:289–302 38. Jiang X, Wang W, Sun T, Shi YQ, Wang S (2013) Detection of double compression in MPEG-4 videos based on Markov statistics. IEEE Signal Process Lett 20(5):447–450 39. Raimi RA (1976) The first digit problem. Am Math Mon 83(7):521–538 40. Mohamed A, Khellfi F, Weng Y, Jiang J, Ipson S (2009) An efficient image retrieval through DCT histogram quantization. In: International conference on CyberWorlds, pp 237–240 41. Ojala T, Pietikäinen M, Mäenpää T (2000) Gray scale and rotation invariant texture classification with local binary patterns. In: BT - Computer vision - ECCV 2000, pp 404–420 42. Dalal N, Triggs B (2005) Histograms of oriented gradients for human detection. In: 2005 IEEE computer society conference on computer vision and pattern recognition (CVPR 2005), vol 1, pp 886–893 43. Bakas J, Naskar R, Dixit R (2019) Detection and localization of inter-frame video forgeries based on inconsistency in correlation distribution between Haralick coded frames. 
Multimedia Tools Appl 78(4):4905–4935 44. Li Q, Wang R, Xu D (2018) An inter-frame forgery detection algorithm for surveillance video. Information 9(12) 45. Qadir G, Yahaya S, Ho ATS (2012) Surrey university library for forensic analysis (SULFA) of video content. In: IET Conference Publications, vol 2012, no. 600 CP
Chapter 23
Computer Vision-Based Algorithms on Zebra Crossing Navigation Sumaita Binte Shorif , Sadia Afrin, Anup Majumder, and Mohammad Shorif Uddin
1 Introduction All over the world, approximately 39 million people are blind and about 285 million people suffer from visual impairment [1]. Restricted mobility, where mobility is defined as "the ability to travel safely, comfortably, gracefully, and independently through the environment" [2], is the leading impediment for blind and visually challenged people. These people usually use four types of navigational aids: conventional travel aids (white cane, electronic cane, and mobility robot), guides (human and dog), visual pathway prostheses (cortical, retinal, and optic nerve prosthesis), and computer vision-based systems. The most extensively used outdoor navigational aids for visually challenged people are the guide dog and the white cane. Unfortunately, the aforementioned navigational aids have many shortcomings: the span of recognition of unusual patterns or shapes with the help of a cane is meager, and guide dogs require ample training as well as a substantial number of lessons to be taught and are not the best option for people who are not physically robust, are not capable of maintaining a dog, or might have cynophobia. Numerous devices have been developed over the years, such as the Mowat sensor, the Sonic guide, the Laser cane, the ultrasonic cane, the RFID cane, and the Navbelt, to ameliorate the proficiency of the white cane. Descriptions of these devices are available in [3–5]. Pedestrian crossings or crosswalks are highly perilous for blind people to cross securely. Nonetheless, the aforementioned devices are not capable of assisting the blind in detecting the location and extent of a crosswalk or in interpreting the current state of traffic lights. At some crosswalks, traffic lights have beepers that enable a blind person to identify whether that part of the road is a crossing or S. B. Shorif · S. Afrin · A. Majumder (B) · M. S. Uddin Department of Computer Science and Engineering, Jahangirnagar University, Savar, Dhaka, Bangladesh e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 M. S. Uddin et al. (eds.), Proceedings of International Joint Conference on Advances in Computational Intelligence, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-19-0332-8_23
not. However, such traffic widgets are not conveniently available at all crosswalks, most likely because installation and maintenance of such widgets at every crossing are difficult to ensure, and sometimes these devices face technical difficulties that take too long to resolve. Although blind people do not have a visual sense, they have a sense of hearing. The easy availability of inexpensive and fast netbook computers/smartphones/tablets with multimedia computation features that facilitate audio–video conversion creates new scope for developing an intelligent navigation system for people who are blind. Some computer vision-based systems [6–10] have been developed for blind and visually challenged people to facilitate autonomous detection of the essential information needed to safely negotiate a road crossing. Besides, Meijer [11] developed a hardware-based image-to-auditory conversion to ameliorate the mobility of visually impaired people, and many researchers [12–20] reported computer vision-based pedestrian/crossing/outdoor object detection strategies to ensure the safety of ordinary pedestrians and protect them from accidents. However, these strategies are not focused on the navigation of zebra crossings by visually impaired pedestrians. On the other hand, Jason Dowling [21] tried stimulating the visual cortex through electrical signals to create artificial vision, and Hoang et al. [22] developed an obstacle warning system based on a mobile Kinect and an electrode matrix to help improve blind mobility. However, it is not user-friendly, and long-time use can result in a headache. Lin et al. [23] developed a smartphone-based recognition system for diverse outdoor objects for mobility enhancement of the blind. Recently, a general review was done by Santiago and Alvaro [24] on mobility enhancement systems for visually challenged and blind people. But these two papers [23, 24] did not focus on the zebra-crossing system, and almost no review has yet been performed on vision-based zebra-crossing systems. This paper attempts to fill the void by presenting a detailed review of computer vision-based zebra-crossing navigation systems. Human beings sense and recognize almost 80% of their surroundings through vision. So, people who have no vision face serious problems: their mobility is very limited, and it is dangerous for them to navigate in outdoor environments alone. Hence, they live a completely dependent life. Nowadays, intelligent technologies are available. A zebra crossing is very dangerous and life-threatening, so it is a great idea to develop a computer vision-based road crossing system to ameliorate the lifestyle of visually impaired people. Many researchers [6–10, 17–20] have been involved with navigational aids for the blind and developed some vision-based systems. This motivates us to review the existing works on vision-based road crossing systems to find their effectiveness and also to show some research directions. Two real-time snapshots of zebra crossings are presented in Fig. 1.
Fig. 1 Two real-time snapshots of zebra crossings
The main objectives of this survey work are as follows:
• Study the existing research works on vision-based zebra crossing navigation systems
• Comparatively analyze the performance of the existing systems along with their merits and demerits
• Show future research directions for developing a more novel, intelligent and effective zebra crossing system to facilitate the free movement of millions of visually challenged people.
The rest of the paper is organized as follows. Section 2 provides a description of different techniques. Section 3 discusses the research directions, and finally, Sect. 4 draws the conclusion.
2 Comparative Analysis
In addition to the conventional white cane or electronic cane, the following five major computer vision-based techniques have been developed for zebra crossing detection. However, elaborate experiments with real blind subjects have not yet been conducted on these methods.
2.1 Examining Groups of Concurrent Lines
As mentioned earlier, a zebra crossing is simply a periodic pattern of black and white stripes, which can be considered a group of consecutive edges. Zebra-crossing edges are parallel to each other in 3D space, but when they are projected onto the image, these edges intersect at a vanishing point. Therefore, Stephen Se [8] proposed a method that searches for concurrent lines when looking for a structure that originally consists of parallel lines, using the vanishing point constraint. However, careful experimentation of this method with blind as well as visually impaired people has not been conducted so far, and the method is computationally slow. The mechanism and the detection results are shown in Fig. 2, which is taken from the paper [8].
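The following is a minimal sketch, not the original implementation of [8], of how roughly concurrent stripe edges could be gathered with standard OpenCV primitives and a common vanishing point estimated; the Canny/Hough thresholds and the simple pairwise-intersection vote are illustrative assumptions.

```python
# Sketch: detect candidate stripe edges and estimate their common vanishing point.
# Thresholds and the median-based vote are illustrative only.
import cv2
import numpy as np

def find_vanishing_point(gray):
    edges = cv2.Canny(gray, 50, 150)
    segs = cv2.HoughLinesP(edges, 1, np.pi / 180, threshold=80,
                           minLineLength=40, maxLineGap=10)
    if segs is None:
        return None
    # Represent each segment as a homogeneous line l = p1 x p2
    lines = [np.cross([x1, y1, 1.0], [x2, y2, 1.0]) for x1, y1, x2, y2 in segs[:, 0, :]]
    # Intersect all pairs; concurrent (crossing-edge) lines vote near one point
    points = []
    for i in range(len(lines)):
        for j in range(i + 1, len(lines)):
            p = np.cross(lines[i], lines[j])
            if abs(p[2]) > 1e-6:
                points.append(p[:2] / p[2])
    if not points:
        return None
    return np.median(np.array(points), axis=0)   # robust vanishing-point estimate

gray = cv2.imread("crossing.jpg", cv2.IMREAD_GRAYSCALE)   # hypothetical input frame
print(find_vanishing_point(gray))
```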
2.2 Image to Speech Conversion Technique
Meijer's "vOICe" [11], comprising a head-mounted camera, a laptop or palmtop, and stereo headphones, is the only commercially available vision-based travel aid for visually impaired people; it uses a one-to-one mapping of image patterns to sound patterns. This mapping preserves the correspondence of the visual information. Although it can recognize objects such as vehicles and trees, zebra crossing detection remains one of its shortcomings.
2.3 Detection Using AdaBoost
Lausser et al. [9] developed zebra crossing detection using the Viola and Jones approach [25], with the AdaBoost algorithm [26] used to train a single stage of their cascaded architecture. AdaBoost is a meta-algorithm that combines an ensemble of weak classifiers into a final strong classifier. A problem is that a single stripe may be treated as a zebra crossing (false positive detection). Although this problem was partially solved by grouping the single sub-windows in postprocessing, it is still not adequate, as the false positive rate remains high.
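As a rough sketch of this idea (not the Haar-feature cascade of [9, 25]), the snippet below trains a boosted ensemble on simple histogram features of image sub-windows; the synthetic training data and feature choice are placeholders.

```python
# Sketch: AdaBoost over sub-window features; a grouping step in post-processing
# reduces single-stripe false positives. Features and data here are toy placeholders.
import numpy as np
from sklearn.ensemble import AdaBoostClassifier

rng = np.random.default_rng(0)

def window_features(window):
    # Toy feature: normalized 16-bin intensity histogram of the sub-window
    hist, _ = np.histogram(window, bins=16, range=(0, 256))
    return hist / max(hist.sum(), 1)

# Synthetic training set: "stripe" windows alternate dark/bright, "background" is noise
stripes = [np.tile(np.repeat([20, 230], 8), (16, 1)) + rng.normal(0, 5, (16, 16))
           for _ in range(50)]
backgrounds = [rng.integers(0, 256, (16, 16)) for _ in range(50)]
X = np.array([window_features(w) for w in stripes + backgrounds])
y = np.array([1] * 50 + [0] * 50)

clf = AdaBoostClassifier(n_estimators=50).fit(X, y)
print("training accuracy:", clf.score(X, y))
# Post-processing idea (as in [9]): accept a detection only when several
# neighbouring sub-windows are positive, suppressing isolated single-stripe hits.
```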
Fig. 2 a A usual zebra crossing; b Result of the recognition process; c partitioning and overlaying the detected edges; d detection of sidelines; e partitioning and overlaying the detected edges but having side lines removed [8]
2.4 Detection Using Mobile-Based System
A collaborative mobile-cloud strategy for context-aware outdoor navigation was proposed by Bhargava et al. [10], in which the computation is performed in the cloud. The technique captures images through a camera mounted on sunglasses and transfers them to the mobile device, which collaborates with the cloud for real-time image processing.
2.5 Detection Using Image-Based Bipolarity
A typical crosswalk or zebra crossing consists of broad stripes of white paint of a certain length and width over the usual dark road surface, so the crossing region or pattern can be treated as a bipolar region. In this process, candidate image portions of a crosswalk are first selected based on the bipolarity of their intensities. The selected portions are then rectified to account for the viewing angle. This rectification allows detailed labeling based on the count and transition of black-white and white-black alternations within a specific area. Hence, the presence of a zebra-pattern crosswalk can be detected from the strength of bipolarity in an image region that is matched with the crosswalk template. In an ideal bicolor image, the intensity distribution is concentrated at two points, i.e., the distribution of black and white pixel intensities is a sum of two delta-like probability functions. Let the intensity distribution of an image region be p0(x). If the region contains only bipolar pixels, i.e., only black and white, then p0(x) can be written as p0(x) = αp1(x) + (1 − α)p2(x), where 0 ≤ α ≤ 1, p1(x) is the intensity distribution of black pixels, and p2(x) is the intensity distribution of white pixels. The bipolarity (strength of the black and white pattern) can then be written as

γ = α(1 − α)(μ1 − μ2)²/σ0²

where μ1 and μ2 are the means of p1(x) and p2(x), and σ0² is the variance of p0(x). From this equation, it follows that 0 ≤ γ ≤ 1. If γ = 1, there exist α, p1, and p2 such that σ1 = σ2 = 0, which implies p1(x) = δ(x − μ1) and p2(x) = δ(x − μ2). So γ = 1 corresponds to perfect bipolarity, meaning the region contains only pure black and white pixels, while γ = 0 indicates a lack of bipolarity, meaning the region contains no black and white stripes. A typical distribution of the bipolarity of image blocks is presented in Fig. 3.
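A small sketch of this computation is given below; it estimates α, μ1, μ2, and σ0² by splitting the pixel intensities of a candidate region at a threshold (the region mean is used here as an illustrative choice, not necessarily the split used in [7]).

```python
# Sketch: bipolarity gamma = alpha*(1-alpha)*(mu1-mu2)^2 / sigma0^2 of an image region.
import numpy as np

def bipolarity(region, threshold=None):
    x = np.asarray(region, dtype=float).ravel()
    t = x.mean() if threshold is None else threshold   # illustrative split point
    dark, bright = x[x <= t], x[x > t]
    if dark.size == 0 or bright.size == 0 or x.var() == 0:
        return 0.0
    alpha = dark.size / x.size                          # fraction of "black" pixels
    mu1, mu2 = dark.mean(), bright.mean()
    return float(alpha * (1 - alpha) * (mu1 - mu2) ** 2 / x.var())

# A region of pure black and white stripes gives gamma close to 1,
# while a flat grey region gives gamma close to 0.
stripes = np.tile(np.repeat([0, 255], 8), (16, 1))
print(round(bipolarity(stripes), 3))   # ~1.0
```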
Fig. 3 a A zebra crossing image, b Each segmented region showing bipolarity, c regions chosen as candidates by using bipolarity, d original image (crossing area) found at the location of the candidate region, e crossing area that has been Binarized, f mean integration along the crossing direction of the binarized image, g bandwidth procured from the results of the integration of the crossing bands [7]
A summary of the comparative study on zebra crossing detection using different existing methods is presented in Table 1.
Table 1 Comparative study on zebra crossing detection using different existing methods

Reference | Algorithm with attributes and effectiveness | Data size | Accuracy
Stephen Se [8] | Examining groups of concurrent lines through the vanishing point constraint. The method is working slow and far from real-time | Experimented with a few images | Not mentioned
Meijer's "vOICe" [11] | Works based on converting the image to sound. It can recognize objects such as vehicles, doors, etc., but zebra-crossing detection is still one of its shortcomings | Not mentioned | Not mentioned
Lausser et al. [9] | Detection using Viola and Jones approach through AdaBoost. The false-positive rate is high | 75 zebra crossing images | Accuracy = 97.33%, Sensitivity = 1.30%, Specificity = 99.90%
Bhargava et al. [10] | Detection through a collaborative mobile-cloud strategy using image processing technique that can be used as real-time crossing guidance for pedestrians | Experimented with a few images | Accuracy is not shown. Average computation response time = 660 ms
Uddin and Shioyama [7] | As a zebra crossing contains alternate black and white stripes of bipolar patterns, the crossing is detected on the basis of the image bipolarity feature | Experimented with 100 zebra crossing images | Accuracy = 95% with no false positive
3 Research Directions and Challenges
All the methods mentioned in the above section used conventional image processing techniques. Recently, deep learning-based approaches using CNNs (convolutional neural networks) and GANs (generative adversarial networks) have shown tremendous success in diverse recognition tasks. Therefore, researchers may emphasize deep learning strategies for zebra crossing detection. Nowadays, smartphones have computation capabilities similar to computers and are equipped with high-resolution cameras. However, almost no convenient mobile-based zebra
crossing detection system is available in practice, and most vision-based zebra crossing detection techniques remain at the laboratory stage. Researchers therefore need to address these challenges by focusing on the development of a mobile-based system capable of working in real time. Researchers may also focus on detailed experimentation with real blind subjects to evaluate practicability.
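As one concrete starting point for the deep-learning direction mentioned above, the sketch below defines a small binary CNN (crossing versus no crossing) in PyTorch; the architecture, input size, and two-class setup are illustrative assumptions rather than a published zebra-crossing model.

```python
# Sketch: a small CNN for binary zebra-crossing classification (illustrative only).
import torch
import torch.nn as nn

class CrossingNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(1),
        )
        self.classifier = nn.Linear(64, 2)   # crossing / no crossing

    def forward(self, x):
        x = self.features(x).flatten(1)
        return self.classifier(x)

model = CrossingNet()
dummy = torch.randn(1, 3, 128, 128)        # one RGB frame resized to 128x128
print(model(dummy).shape)                  # torch.Size([1, 2])
```

A model of roughly this size could also be quantized and deployed on a smartphone, which fits the mobile-based, real-time direction argued for above.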
4 Conclusion
Detection of zebra crossings is very important for ensuring the protection, safety, and mobility of visually impaired people when crossing a road. In this survey, several vision-based methods are described and discussed. However, no practicable, convenient commercial method is yet available. As mobile technology is now highly mature, it is timely to concentrate on developing a mobile-based zebra crossing detection system. Therefore, this paper surveys the existing techniques and their performance and indicates future research directions toward an intelligent vision-based road crossing system that improves the mobility of visually challenged people.
References 1. Pascolini D, Mariotti SP (2012) Global estimates of visual impairment: 2010. Br J Ophthalmol 96:614–618 2. Shingledecker CA, Foulke E (1978) A human factor approach to the assessment of mobility of blind pedestrians. Hum Factor 20(3):273–286 3. Gori M, Cappagli G, Tonelli A, Baud-Bovy G, Finocchietti S (2016) Devices for visually impaired people: High technological devices with low user acceptance and no adaptability for children. Neurosci Biobehav Rev 69:79–88 4. Senjam SS (2019) Assistive technology for people with visual loss. Delhi J Ophthalmol 30(2):7– 12. https://doi.org/10.7869/djo.496 5. Elmannai W, Elleithy K (2017) Sensor-based assistive devices for visually-impaired people: current status, challenges, and future directions. Sensors 17(3):565. https://doi.org/10.3390/ s17030565 6. Shioyama T, Wu H, Nakamura N, Kitawaki S (2002) Measurement of the length of pedestrian crossings and detection of traffic lights from image data. Meas Sci Technol 13(9):1450–1457 7. Uddin MS, Shioyama T (2005) Detection of pedestrian crossing using bipolarity feature—an ımage-based technique. IEEE Trans Intell Transp Syst 6(4):439–445 8. Se S (2000) Zebra-crossing detection for the partially sighted. In: Proceedings of IEEE computer society conference on computer vision and pattern recognition (CVPR), Hilton Head, SC, June 2000, vol 2, pp 211–217 9. Lausser L, Schwenker F, Palm G (2008) Detecting zebra crossings utilizing AdaBoost. In: Proceedings of ESANN 2008, Bruges, Belgium, 23–25 April 2008 10. Bhargava B, Angin P, Duan L (2011) A mobile-cloud pedestrian crossing guide for the blind. In: Proceedings of AMP 2011 11. Meijer PBL (1992) An experimental system for auditory ımage representations. IEEE Trans Biomed Eng 39(2):112–121
12. Broggi A, Bertozzi M, Fascioli A, Sechi M (2000) Shape-based pedestrian detection. In: Proceedings of IEEE intelligent vehicles symposium, Dearborn, MI, October 2000, pp 215–220 13. Zhao L, Thorpe CE (2000) Stereo- and neural network-based pedestrian detection. IEEE Trans Intell Transp Syst 1(3):148–154 14. Curio C, Edelbrunner J, Kalinke T, Tzomakas C, Seelen WV (2000) Walking pedestrian recognition. IEEE Trans Intell Transp Syst 1(3):155–163 15. Franke U, Heinich S (2002) Fast obstacle detection for urban traffic situations. IEEE Trans Intell Transp Syst 3(3):173–181 16. Tsuji T, Hattori H, Watanabe M, Nagaoka N (2002) Development of night vision system. IEEE Trans Intell Transp Syst 3(3):203–209 17. Sumi A, Santha T (2017) An intellıgent predıctıon system for pedestrıan crossing detectıon. ARPN J Eng Appl Sci. 12:5370–5378 18. Berriel RF, Lopes AT, De Souza AF, Oliveira-Santos T (2017) Deep learning-based large-scale automatic satellite crosswalk classification. IEEE Geosci Remote Sens Lett 14(9):1513–1517 19. Liu X, Zhang Y, Li Q (2017) Automatıc pedestrıan crossıng detectıon and ımpaırment analysis based on mobıle mapping system. ISPRS Ann Photogramm Remote Sens Spatial Inf Sci IV2/W4:251–258 20. Wang C, Zhao C, Wang H (2015) Self-similarity-based zebra-crossing detection for intelligent vehicle. Open Autom Control Syst J 7:974–986 21. Dowling J, Maeder A, Boles W (2003) Intelligent image processing constraints for blind mobility facilitated through artificial vision. In: Proceedings of Australian and New Zealand Conference on Intelligent Information Systems, Sydney, Australia, pp 109–114 22. Hoang VN, Nguyen TH, Le TL, Tran TH, Vuong TP, Vuillerme N (2017) Obstacle detection and warning system for visually impaired people based on electrode matrix and mobile Kinect. Vietnam J Comput Sci 4:71–83 23. Lin B-S, Lee C-C, Chiang P-Y (2017) Simple smartphone-based guiding system for visually impaired people. Sensors 17(6):1371 24. Real A (2019) Navigation systems for the blind and visually impaired: past work, challenges, and open problems. Sensors 19(15):3404 25. Viola P, Jone M (2001) Robust real-time object detection. In: Proceedings of IEEE workshop on statistical and computational theories of vision, Vancouver, CA 26. Freund Y, Schapire RE (1997) A decision-theoretic generalization of on-line learning and an application to boosting. J Comput Syst Sci 55(1):119–139
Chapter 24
AI Based Multi Label Data Classification of Social Media Shashi Pal Singh, Ritu Tiwari, Sanjeev Sharma, and Ajai Kumar
1 Introduction
Multi-label classification and text classification have gained growing attention in recent years, not only because they are technically important but also because of their potentially fascinating applications. Multi-label classification deals with assigning several labels to each instance in a dataset: an instance may belong to more than one class at the same time. As an AI text analysis technique, it automatically tags the text to be categorized by topic, so the predicted groups are not assumed to be mutually exclusive. For example, a movie can belong to any of the genres thriller, romantic, comedy, crime, etc. Typically, a single label is not sufficient to capture all the information that needs to be identified, which requires a record to carry more than one label. Multi-labelling systems are not limited to text categorization but also extend to the categorization of images, audio, medical data, and bio-informatics. Text classification may be used to search a brand's social media and to classify responses by product or topic. It may also be helpful to assign topics to emails or customer service tickets in order to direct them to the right agency. Yet not every ticket
you receive is going to fall into only one category. A proper algorithm is strongly recommended in this area, as it would not only allow categorization but would also aid in sorting, filtering, and searching for relevant information. Multi-label classification differs from multi-class classification in that more than one classification tag can be attached to a single text. Text classification uses machine learning to predict and classify text with preset tags, for example, whether a movie is a comedy, romantic, thriller, or all of these. Let us look at how multi-label differs from multi-class classification. Multi-label classifiers can generate multiple tags/labels together, whereas multi-class classifiers generate only one label at a time. For instance, a fruit can be an orange or a lemon but not both, while any given text may or may not be about culture, politics, education, etc. at the same time. Figure 1 gives an overview of the different types of classification problems for an image to clarify what exactly the classification problem is. Current multi-label classification algorithms are built on three basic approaches: the Problem Transformation Method, the Problem Adaptation Method, and Ensemble Methods, which are further categorized into parts. Problem transformation is easy to understand [1]. Figure 2 illustrates an example of transforming multi-label labelling (text classification) into multiple binary classifications (yes or no). For each binary dataset, all traditional classification algorithms can be applied directly to create a classifier and predict its corresponding test instances. Some examples of classifiers using problem transformation are Binary Relevance (BR), Classifier Chain (CC), Label Powerset (LP), and Label Ranking (LR). The second approach used in multi-label classification is the Problem Adaptation Method. This method extends existing traditional classification approaches such as Decision Trees and Boosting, Support Vector Machine (SVM), Multi-label K-Nearest Neighbour (MLKNN), which is an adaptation of lazy learning (kNN), and Adaboost.MH, which is an adaptation of Adaboost for multi-label classification. The Ensemble Method is the third technique for multi-label classification. A few examples of ensemble methods are the Ensemble of Classifier Chains, which is an improvement of the Classifier Chain, Random k-Label Sets (RAkEL), which was developed to solve the productivity concerns of Label Powerset, and the Ensemble of Multi-label Classifiers (EML). The data classification problem finds broad application in different fields for tasks such as
(a) News selection and grouping,
(b) Document organization in websites, digital libraries, social feeds, etc.,
(c) Email classification, including spam filtering.
A variation of this issue where each document can belong to any number of groups (labels) is referred to as a problem of multi-label data classification. The extension of such a problem where a categorical hierarchy interrelates the labels is referred to as a hierarchical text classification problem. In this project, we
Fig. 1 Different types of classification problems
• study multi-label data classification functions on real-world data sets
• implement algorithms to leverage the hierarchy within the labels
• analyse the effect of the various algorithmic approaches and data set properties on classification performance.
Fig. 2 Multi-label problem transformation
2 Literature Review
The literature review was carried out using IEEE, Springer, and ResearchGate papers found with keywords such as "Natural Language Processing" and "Study on Multi-label Classification". A few papers mentioned in the references gave a good understanding of the basics of multi-label classification, such as NLP, data classification, text alignment, text mining, and text processing [3]. The study in [4] discussed a multi-labelling approach, with experiments performed on New York Times (NYT) articles. To perform the classification, the authors scraped the NYT website using its API and cleaned the data. They stored the cleaned data in an Excel sheet and used MEKA tools to identify and test various algorithms for multi-label classification. Finally, they decided to use a binary classifier with the SVM algorithm. They concluded that SVM performed well for text classification but could have done better with strategies such as a hierarchical representation of the data. In [1], the researcher mentioned that many modern applications, such as webpage collections, gene biological functions, or text categorization, require hierarchy in multi-label classification. For example, in the categorization of news articles, two top-level divisions, Business and Computers, can be split into numerous subcategories: Business/Jobs, Business/Investing, Computers/Internet, Computers/Hardware, and Computers/Software. In [5], it was observed that supervised learning involves large quantities of labelled data and human tagger interference in the development of training sets. This process can be slow, error-prone, and time-consuming when the data sets become very large or heavily noisy. In this case, a better alternative is semi-supervised learning, which needs only a few labels. First, pre-processing was done to reduce the noise of unstructured text, and WordNet was then used to disambiguate word sense. Second, on these data sets, multiple experiments related to multi-label classification and semi-supervised learning were performed and compared with each other.
After comparing semi-supervised classification with supervised classification, it was found that semi-supervised classification increases learning efficiency and provides good results. They also concluded that deep learning performs better than other algorithms in multi-labelling tasks. In [6], social media data (Twitter) were used to identify the occurrence of many forms of infrastructure disruption due to hurricanes, taking the sentiment of disruption into account. A geoparsing method was employed to extract locations from the Twitter data. The results show that logistic regression, kNN, SVM, and DNN (Deep Neural Network) perform better than BR, LP, and RAkEL. In [7], a naive Bayes classifier was used to classify news articles, and the performance of a binary classifier approach was assessed. A common weighting scheme used for the search and retrieval of information is TF-IDF, which shows the importance of a word in an article; however, the authors were not able to lower the error produced by the TF-IDF model. In [8], the performance of the Problem Transformation Method, the Problem Adaptation Method, and Ensemble Methods was evaluated for multi-label classification using a movie dataset. ML-kNN gave the best result on this dataset, and Adaboost.MH gave the worst performance. Natural Language Processing (NLP) [9] is a research and application area that explores how computers can be used to understand and manipulate natural language text or speech for useful purposes. NLP applications span a variety of fields of study, such as the processing and summarization of natural language text, machine translation, artificial intelligence, user interfaces, speech recognition, and multilingual and cross-language information retrieval. NLP technologies are becoming increasingly relevant in knowledge acquisition, information retrieval, and language translation for the development of user-friendly decision support systems for non-expert everyday users. Examples of NLP products for certain applications stated by the author in this paper are word processing and desktop publishing, WordPerfect (Novell), transducer techniques which enable the extraction of structure and function queries from a maintenance manual's text database, spelling checkers, and many more. Many of the recent studies on NLP cover syntactic phenomena, machine translation, semantic phenomena, and pragmatic phenomena. As the idea of NLP expands, researchers are trying to understand how various languages are interpreted and used by humans. Nowadays, grammar checkers, conceptual search, and event extraction are emerging applications for NLP, rather than glossary lookup and string matching. In [10], research on data classification using TensorFlow aimed to display the overall performance of the TensorFlow library, measured on the MNIST dataset. It was an experimental study that examined which activation function delivers fast and accurate results. The activation functions used for this classification were the Rectified Linear Unit (ReLU), Exponential Linear Unit (ELU), hyperbolic tangent (tanh), sigmoid, softsign, and softplus. The best and most accurate result, with an accuracy of 98.43%, was obtained with the ReLU activation function.
The goal of future research is to increase precision by implementing various neural network architectures. In [11], text alignment research aimed to predict the type of plagiarism between two documents and to use this knowledge to match documents better by applying different text alignments to different types of plagiarism. The authors showed that the plagiarism relationship can be predicted effectively using the distribution of intensities, frequencies, and locations of the common elements of the two documents. They built META TEXT ALIGNING, which improves the overall performance of text alignment by choosing either the most efficient or the best configurations based on the plagiarism type. In [12], the text mining approach aimed to demonstrate how difficult it is for suppliers to gain a deeper understanding of the consumer experience of their product in order to enhance service quality and support, and how text mining has helped both the consumer and the manufacturer to solve or mitigate the problem. Many advanced and real-time NLP algorithms can be applied to make this approach more efficient and robust. The text processing paper [13] deals with the increase of code-mixed data on social media, where language identification is becoming very difficult due to bilingual or multilingual content. Code mixing means combining at least two languages in the content. It often occurs when there is no clean way to separate the use of two languages or two cultures, and the two systems regularly overlap. Code mixing is the insertion of words, sentences, and morphemes of one language into an utterance of another language. The machine learning-based sentiment extraction accuracy for uni-grams, bi-grams, and skip-grams was 71.59, 75.14, and 76.33%, respectively.
3 Popular Algorithms for Multi-label Classification This section briefly discusses some of the common multi-label classification algorithms, i.e., Binary Relevance, Classifier Chain, Random k-Label Set, Support Vector Machine, and Multi-label K-Nearest Neighbour.
3.1 Binary Relevance (BR)
This operates by breaking down the multi-label learning task into a number of independent binary learning tasks (one per class label). Let us say we have a dataset where X holds the independent features and the Y's are the target variables [2] (Fig. 3). In BR, this matrix is then split into 4 different classification problems, as seen below [2] (Fig. 4).
Fig. 3 Binary relevance
Fig. 4 Binary relevance-matrix
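A minimal sketch of the binary-relevance decomposition illustrated in Figs. 3 and 4 is shown below: each label column becomes an independent binary problem solved by its own classifier. The toy features, labels, and the choice of logistic regression are assumptions for illustration only.

```python
# Sketch: Binary Relevance = one independent binary classifier per label column.
import numpy as np
from sklearn.linear_model import LogisticRegression

X = np.array([[1.0, 0.2], [0.3, 0.9], [0.5, 0.5], [0.8, 0.1]])   # toy features
Y = np.array([[1, 0, 1, 0],                                       # 4 instances x 4 labels
              [0, 1, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 0, 1]])

# Train one binary model per label (the "split into 4 problems" of Fig. 4)
models = [LogisticRegression().fit(X, Y[:, j]) for j in range(Y.shape[1])]

x_new = np.array([[0.6, 0.4]])
prediction = np.array([m.predict(x_new)[0] for m in models])      # one bit per label
print(prediction)
```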
3.2 Classifier Chain (CC)
This is quite close to binary relevance, with the only difference being that it forms chains in order to preserve label correlation. For example, in a dataset we have X as the input space and the Y's as the labels [2] (Fig. 5). In CC, this problem is divided into 4 separate single-label problems, as shown below [2] (Fig. 6).
Fig. 5 Classifier chain
Fig. 6 Classifier chain-matrix
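The chain can be sketched with scikit-learn's ClassifierChain, which feeds each earlier label prediction as an extra feature to the next binary problem; the toy data and fixed chain order below are illustrative assumptions.

```python
# Sketch: Classifier Chain - each binary model also sees the previously predicted labels.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.multioutput import ClassifierChain

X = np.array([[1.0, 0.2], [0.3, 0.9], [0.5, 0.5], [0.8, 0.1]])
Y = np.array([[1, 0, 1, 0],
              [0, 1, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 0, 1]])

chain = ClassifierChain(LogisticRegression(), order=[0, 1, 2, 3])
chain.fit(X, Y)
print(chain.predict(np.array([[0.6, 0.4]])))   # multi-label prediction using label correlation
```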
3.3 Random k-Label Set (RAkEL)
RAkEL was developed to solve the productivity concerns of Label Powerset, which is one of the problem transformation methods but is time consuming. It randomly breaks the set of labels into n subsets of small size k, called k-label sets. Disjoint and overlapping label sets are two distinct strategies for constructing the label sets.
3.4 ML-kNN Classification
ML-kNN is derived from the widely known K-nearest neighbour algorithm (kNN). It is a supervised classification algorithm and is also called a multi-label lazy learning approach. First, the k nearest neighbours in the training set are identified for each test instance. The method determines prior probabilities from the k nearest training instances and then seeks the maximum a posteriori probability for the test instance to determine its label set.
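Assuming the scikit-multilearn package is available, ML-kNN can be sketched as follows; the toy data and the value of k are illustrative only.

```python
# Sketch: ML-kNN from the scikit-multilearn package (algorithm adaptation method).
import numpy as np
from skmultilearn.adapt import MLkNN

X = np.array([[1.0, 0.2], [0.3, 0.9], [0.5, 0.5], [0.8, 0.1], [0.2, 0.2], [0.9, 0.8]])
Y = np.array([[1, 0, 1], [0, 1, 1], [1, 1, 0], [0, 0, 1], [1, 0, 0], [0, 1, 0]])

clf = MLkNN(k=3)
clf.fit(X, Y)                       # estimates prior and posterior label probabilities
print(clf.predict(X).toarray())     # sparse multi-label predictions, densified for display
```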
4 Performance Measures
In our analysis, we used the following evaluation methods.
Confusion Matrix: In the context of multi-label text classification, the classifier model often makes two kinds of errors. Type 1 Error: a positive prediction that is actually false (a False Positive). Type 2 Error: a negative prediction that is actually false (a False Negative) (Fig. 7).
Accuracy: Accuracy is the proportion of correct predictions over all classes, positive and negative. Let |T| be the total number of observations, i.e., the sum of TP, FP, FN, and TN; therefore,
Fig. 7 Confusion matrix
Accuracy = (TP + TN)/|T|
Precision: Precision is the fraction of positive-class predictions that actually belong to the positive class.
Precision = TP/(TP + FP)
Recall: Recall measures how many of the true positives have been identified.
Recall = TP/(TP + FN)
F-measure: It integrates precision and recall into one formula capturing both properties.
F-measure = (2 * Precision * Recall)/(Precision + Recall)
Hamming Loss: Hamming loss is the proportion of incorrectly predicted labels, i.e., the fraction of incorrect labels out of the total number of labels. The smaller the hamming loss, the better the performance; hamming loss = 0 means a perfect result.
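These measures can be computed for multi-label indicator matrices with scikit-learn, as in the brief sketch below; micro-averaging is one common choice and is an assumption here, and note that scikit-learn's multi-label accuracy is the exact-match (subset) accuracy.

```python
# Sketch: multi-label evaluation with scikit-learn (micro-averaged where applicable).
import numpy as np
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, hamming_loss

y_true = np.array([[1, 0, 1], [0, 1, 1], [1, 1, 0]])
y_pred = np.array([[1, 0, 0], [0, 1, 1], [1, 0, 0]])

print("Accuracy    ", accuracy_score(y_true, y_pred))             # exact-match ratio
print("Precision   ", precision_score(y_true, y_pred, average="micro"))
print("Recall      ", recall_score(y_true, y_pred, average="micro"))
print("F-measure   ", f1_score(y_true, y_pred, average="micro"))
print("Hamming loss", hamming_loss(y_true, y_pred))
```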
5 Research Methodology
5.1 Data Source
In this paper, we collect data from websites (HTML or XML), for example, New York Times articles or Twitter data. We scraped the website using its API to get the data (Fig. 8).
5.2 Data Pre-processing
In this step, the following pre-processing is applied:
1. Short text normalization: the process of transforming a text into a canonical form.
(a) Removing extra letters, e.g., "gooood" or "gud" is transformed into "good".
(b) An initial spell check for the most common flaws, e.g., "muinets" for "minutes".
(c) Replacing specific shorthand words, e.g., "onl9" for "online".
Fig. 8 System architecture of project
2. Hashtag decomposition: a hashtag always begins with '#', making it easy to recognise.
(a) If every token in a hashtag begins with an uppercase letter, we use a separation function for those terms, e.g., #MethodOfMeditation.
(b) We use another function if words or tokens are divided by distinct symbols, characters, or numerals, e.g., #10hits_lil.
(c) If every token starts with a lowercase letter, we use a third method which separates the hashtag from left to right with the fewest possible words, e.g., #windenergy can be written as (wind, energy) or (win, energy).
3. Remove stop words and extra whitespace: stop words are very common words like "we", "are", "then", "but", "and", etc. They can easily be dropped without losing the sentence's meaning, as they do not add much value. Extra whitespace, short URLs, emoticons, and special characters are also removed.
4. Tokenize: finally, we tokenize the sentence and apply lemmatization and stemming.
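A compact sketch of these pre-processing steps is shown below; the slang dictionary, stop-word list, and regular expressions are illustrative assumptions rather than the exact rules used in the project, and lemmatization/stemming are omitted for brevity.

```python
# Sketch: short-text normalization, hashtag decomposition, stop-word removal, tokenization.
import re

SLANG = {"gud": "good", "onl9": "online"}                  # assumed tiny replacement table
STOPWORDS = {"we", "are", "then", "but", "and", "the", "a", "is"}

def normalize(token):
    token = re.sub(r"(.)\1{2,}", r"\1\1", token.lower())   # "gooood" -> "good"
    return SLANG.get(token, token)

def split_hashtag(tag):
    tag = tag.lstrip("#")
    parts = re.findall(r"[A-Z][a-z]+|[a-z]+|\d+", tag)      # CamelCase / digit split
    return parts if parts else [tag]

def preprocess(tweet):
    tweet = re.sub(r"https?://\S+", " ", tweet)             # drop short URLs
    tokens = []
    for tok in tweet.split():
        tokens.extend(split_hashtag(tok) if tok.startswith("#") else [tok])
    cleaned = [normalize(t) for t in tokens]
    return [t for t in cleaned if t and t not in STOPWORDS and t.isalnum()]

print(preprocess("The service was gooood and onl9 #MethodOfMeditation https://t.co/x"))
```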
5.3 Data Processing
In this step, we determine the TF-IDF (Term Frequency–Inverse Document Frequency) of a word/term w to check how relevant the term is to a given document. It is calculated as follows:
TF-IDF(w) = TF(w) * IDF(w)
where
TF(w) = (the number of times term w appears in a document)/(the total number of terms in the document)
IDF(w) = log((the total number of documents)/(the number of documents with term w))
TF-IDF is calculated using both unigrams and bigrams of words.
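The TF-IDF weighting over unigrams and bigrams can be sketched with scikit-learn's TfidfVectorizer (note that its IDF uses a smoothed variant of the formula above); the sample documents are placeholders.

```python
# Sketch: TF-IDF features over unigrams and bigrams.
from sklearn.feature_extraction.text import TfidfVectorizer

docs = [
    "government announces new sports policy",
    "football team wins the championship",
    "new technology for mobile phones",
]

vectorizer = TfidfVectorizer(ngram_range=(1, 2), stop_words="english")
X = vectorizer.fit_transform(docs)              # documents x (unigram + bigram) matrix
print(X.shape, vectorizer.get_feature_names_out()[:5])
```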
5.4 Data Storage The extracted or cleaned data after preprocessing and processing is stored in the Excel sheet in CSV format. This will give us one file for all articles and their labels.
5.5 Data Labelling
Suppose we scrape the NYT website to get the latest news articles; we then label the data with different subdivisions such as Sports, Politics, Technology, Travel, Entertainment, Food, Health, Business, World and Arts, etc. To represent the multi-label data, we use a binary classifier with a suitable algorithm, in which 1 and 0 represent the presence or absence of a particular label, respectively. The extracted data stored in the CSV file is then labelled.
5.6 Classification Algorithm
After all the fundamental work is done, we used a binary classifier with the SVM algorithm to classify the data. For the implementation, the SVM algorithm was used inside a OneVsRest (also known as Binary Relevance) type of classifier.
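A minimal end-to-end sketch of this choice (TF-IDF features fed to a OneVsRest wrapper around a linear SVM) is shown below; the example articles, label names, and LinearSVC settings are assumptions for illustration, not the actual project data.

```python
# Sketch: OneVsRest (Binary Relevance style) linear SVM over TF-IDF features.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MultiLabelBinarizer
from sklearn.svm import LinearSVC

texts = [
    "government passes new education budget",
    "star striker scores twice in the final",
    "new phone launched with faster processor",
    "election campaign focuses on school reform",
]
labels = [["politics", "education"], ["sports"], ["technology"], ["politics", "education"]]

mlb = MultiLabelBinarizer()
Y = mlb.fit_transform(labels)                          # 0/1 label indicator matrix

model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)),
                      OneVsRestClassifier(LinearSVC()))
model.fit(texts, Y)
pred = model.predict(["budget debate on new schools"])
print(mlb.inverse_transform(pred))                     # predicted label set(s)
```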
6 Result
We ran a multi-label text classification problem in MEKA, which is a WEKA extension, in order to see which algorithm performs relatively best. Binary Relevance: Support Vector Machine predictive output:
• Accuracy = 0.79
• Precision = 0.75
• Recall = 0.74
• F1-measure = 0.70
• Hamming Loss = 0.11
7 Application of Multi-label Classification
Multi-label classification is used in various industries, for example:
1. Health Care: the graphic photos of the health-care system are unlabelled. Labelling medical diagnoses with specific class values is therefore essential, requiring multi-label classification.
2. Social Science: sentiment analysis is performed to analyse human actions in social problems and primarily focuses on mining tweet content, short message service (SMS) messages, clinical reports, etc.
3. Law: before administering the fines, damages, charges, and other punishments for the convicted or offenders, legal professionals need to identify applicable laws and regulations. Nonetheless, composite laws and regulations are obviously an obstacle for ordinary people. As a consequence, labelling legislative documents is important and workable for studying legal knowledge.
4. Business: researchers in industry use social media as a marketing point. Opinion mining from social media networks is used to boost the company on a wide scale.
5. Text Categorization: labelling the text according to the category it belongs to.
8 Conclusion
As multi-label classification is one of the most critical problems today, in this project we discussed multi-label classification algorithms applied to a dataset. The data went through steps such as pre-processing and processing and was stored in CSV format for further training with different classification algorithms, and the algorithms were compared using evaluation measures such as accuracy, precision, recall,
f-measure, hamming loss, etc. We also explored some applications of multi-label classification and examined how the algorithms work on a dataset. We will go on to analyse the efficiency of current algorithms for multi-label classification. At the same time, with the insights found in the experiments, we will design new algorithms for multi-label classification.
References 1. Kanj S (2013) Learning methods for multi-label classification. Machine Learning [stat.ML]. Université de technologie de Compiègne; Université Libanaise (Liban), pp 11–30 2. Vidhya A (2016) Solving Multi-Label Classification Probles (Case studies included). https:// www.analyticsvidhya.com/blog/2017/08/introduction-to-multi-label-classification/ 3. Towards Data Science (2018) Understanding Confusion Matrix. https://towardsdatascience. com/understanding-confusion-matrix-a9ad42dcfd62 4. Goyal R (2016) Natural Language Processing: Labelling New York Times Artices. https://doi. org/10.13140/RG.2.1.3484.3285 5. Billal B, Fonseca A, Sadat F, Lounis H (2017) Semi-supervised learning and social media text analysis towards multi-labeling categorization. In: 2017 IEEE international conference on big data (big data), Boston, MA, pp 1907–1916 6. Roy KC, Hasan S, Mozumder P (2020) A multilabel classification approach to identify hurricane-induced infrastructure disruptions using social media data. Comput-Aided Civ Infrastruct Eng 35(12):1387–1402. https://doi.org/10.1111/mice.12573 7. Nicolas, Chase Z (2013) Learning Multi-Label Topic Classification of News Articles 8. Tawiah CA, Sheng VS (2013) A study on multi-label classification. In: Perner P (eds) Advances in data mining. Applications and theoretical aspects. ICDM 2013. Lecture notes in computer science, vol 7987. Springer, Heidelberg 9. Joseph S, Sedimo K, Kaniwa F, Hlomani H, Letsholo K (2016) Natural language processing: a review. Nat Lang Process Rev 6:207–210 10. Ertam F, Aydın G (2017) Data classification with deep learning using Tensorflow. In: 2017 international conference on computer science and engineering (UBMK), Antalya, pp 755–758 11. Abnar S, Dehghani M, Shakery A (2015) Meta text aligner: text alignment based on predicted plagiarism relation, pp 193–199. https://doi.org/10.1007/978-3-319-24027-5_16 12. Rangu C, Chatterjee S, Valluru SR (2017) Text mining approach for product quality enhancement: (improving product quality through machine learning). In: 2017 IEEE 7th international advance computing conference (IACC), Hyderabad, pp 456–460 13. Padmaja S, Bandu S, Fatima SS (2020) Text processing of Telugu–English code-mixed languages. In: Satapathy S, Raju K, Shyamala K, Krishna D, Favorskaya M (eds) Advances in decision sciences, image processing, security and computer vision. ICETE 2019. Learning and analytics in intelligent systems, vol 3. Springer, Cham
Chapter 25
Feature Extraction Based Landmine Detection Using Fuzzy Logic T. Kalaichelvi and S. Ravi
1 Introduction
Landmine detection is a dangerous problem encountered in many countries worldwide, and the situation can become as harmful as a natural disaster for the development of land. There is an immediate necessity to detect landmines and remove them safely. Safe detection is needed, using image processing techniques with non-contact sensors such as metal detectors and radars. There are two types of landmine: anti-tank landmines and anti-personnel landmines. Many researchers have implemented different methods to detect and clear these buried landmines. The effects of climate and environment on landmine placement can be very complicated. Some efficient landmine detection systems can retrieve the size, shape, burial depth, and casing type of a landmine. Table 1 displays the strategy, performance, and limitations of each landmine detection technique based on the sensor type used [17]. Biological and some of the electromagnetic sensors have a greater effect on the environment. Landmines made of metal can be detected using a metal detector, which shows a high detection probability, but the false alarm rate is also high when metallic objects are present on the ground surface. GPR, NQR, and acoustic/seismic sensors give a low false alarm rate compared to all the other sensors. Nowadays, dual sensors are also used for landmine detection to achieve the best detection rate.
Table 1 Comparison of the landmine detection techniques based on sensors

Techniques | Sensor | Strategy | Performance | Limitations
Biological | Dog | Trained to detect explosives | More reliable than other animals | Good in different environments
Biological | Rodents | Trained using food | Work more time than dogs | Work under limited weather conditions
Biological | Bees | Sense of explosives like TNT | Provide higher accuracy than rodents | Work in the limited environment
Biological | Plants | Changes color when nitrous oxide is present | Find more areas in a short time | High false alarm rates
Biological | Bacteria | Spraying bacteria on the mine area | Used in different terrains and detect TNT | Highly sensitive to environmental conditions
Electromagnetic Induction | MD | Passing current to a metallic object | Detect metal objects | Identify metallic clutter as mines
Electromagnetic Induction | GPR | Send the radio wave to the ground surface | Detect plastic objects | Mines cannot be detected in dry soils
Electromagnetic Induction | MWR | Send microwaves to the ground | Detecting big and deep objects | Less effective under wet soil
Electromagnetic Induction | EIT | Uses electrical currents | Detecting all types of landmines | Cannot be used in dry soil and is noise sensitive
Optical Detection | IR | IR radiation | Works in all environments | Deeply buried objects cannot be detected
Optical Detection | Light | Capturing light waves | Large areas scanned in a short time | Used only on flat land
Optical Detection | Lidar | Works in infrared regions | Safe and detects metallic and nonmetallic objects | Does not work well in highly vegetated areas
Nuclear Detection | NQR | Radiofrequency technique | Achieves a low false alarm rate | Impossible to detect metallic substances
Nuclear Detection | Neutron | Emits neutrons into the ground | Reducing the background radiation | High power consumption, radiation hazards
Acoustic Detection | A/S | Acoustic or seismic waves | Low false alarm rates | Low detection speed
Acoustic Detection | Ultrasound | Emits sound waves | Penetrate well in wet soils | Sensitive to noise
Mechanical Detection | Clearing machine | Machine used to clear a minefield | Remote controlled and efficient | Less accuracy and safety
Mechanical Detection | Prodder | Find mechanical impedance of the material | Discriminate rock, wood, plastic, and metal | Manual demining process
2 Review on Feature Extraction Based Landmine Detection Using Fuzzy Logic and Wavelet Transform
An automatic landmine detection algorithm using ground-penetrating radar was described by P.D. Gader et al. (2000). It involved multiple algorithms running simultaneously and combined the outputs of the different module information sources into a final feature set. The process was performed within a framework that ended with an information fusion module. Results were reported for ground-truth data obtained continuously using the Geo-Centers vehicle-mounted landmine detection system [1]. A forward-looking infrared sensor was used for vehicle-based landmine detection. This framework first described the component requirements and the fundamentals of the forward-looking infrared sensor. It used a novel technique to characterize regions inside the infrared images in real time. Specific features were extracted from the infrared images for target classification. A fuzzy inference system evaluated the extracted information and generated a confidence value to identify targets as mines or clutter. Experimental results compared the accuracy with and without the proposed system [2]. Morphological shared-weight neural networks have been applied to surface discrimination, target identification, and handwritten character recognition problems. The system utilized the morphological hit-and-miss transform (HMT) as the primary operation in the feature extraction stage. Traditional morphological filters have some disadvantages: they are not robust, are more sensitive to noise, and do not tolerate slight changes in greyscale values and shape. However, Choquet integral-based morphological operations (CMOs) are less sensitive. Ali K. Hocaoglu et al. (2003) proposed an MSNN design that utilizes CMOs instead of HMTs, referred to as the Choquet morphological shared-weight neural network (CMSNN). It extracts salient information from the inputs using several Choquet hit-and-miss transforms and a neural network classifier. The CHMT produces a feature map used to view the output image. The results compared the existing and modified networks in experiments on the real-time problem of landmine detection [3]. Paul Gader et al. (2004) presented a
FROSAW algorithm for landmine detection using ground-penetrating radar. Constant false alarm rate (CFAR) detectors applied to depth-based adaptively whitened data were used to detect anomalies, and a CFAR confidence measure was recorded using order statistics. There were inconsistencies at locations with high confidence values, so a feature-based rule was proposed to dismiss alarms that do not show mine-like properties. The combined use of the constant false alarm rate and feature-based techniques was evaluated. The calculations and examinations were applied to data collected over many square meters at outdoor sites with an array of GPR sensors [4]. Identifying a covered object buried under the surface faces many limitations, including environmental and economic complications. Sawsan M. et al. (2004) handled mine detection with a more extensive set of preprocessing and texture segmentation steps for infrared sensor data. The Principal Component Analysis method was used to extract the dynamic data from an image, and texture parameters with Fuzzy C-means clustering were used to identify mine-like objects. Post-processing removed the clutter in an image using a morphological reconstruction filter to provide accurate results [5]. GPR sensor information was also used for landmine recognition, producing intensity values in a three-dimensional matrix of a volume under the ground surface. A feature extraction algorithm projected the data to identify mines from a few data representatives, and genuine mine detections were distinguished from false alarms using the k-nearest neighbor rule. The results compared the detection probability and false alarm rate between the CFAR method and the proposed algorithm [6]. Ismail I. Jouny et al. (2004) used the fractional Fourier transform to find dispersive scatterers in a GPR signal, the intention being to uniquely define an object; the fractional Fourier transform supplied these dispersive features to a nonparametric landmine recognition scheme. The Eigenmines algorithms rejected many false alarms detected by the CFAR method without disturbing the detection probability [7]. Buried anti-personnel (AP) landmine detection was treated in the more general setting of target identification by determining which features, isolated from impulse ground-penetrating radar data, could be used to classify landmines. The Wigner-Ville distribution (WVD) and the wavelet transform (WT) were used in the time-frequency domain to retrieve features from the radar data. The radar information was collected using the MINEHOUND dual-sensor system with several soil types and various types of landmines, and Wilk's lambda value was used as the criterion for optimal segmentation. The results show that the time-frequency features retrieved from the WVD contain more essential information than the features retrieved using the WT; consequently, this improves the detection of landmines, the classification of false alarms, and the discrimination of various landmines [8]. Fathi E. Abd El-Samie (2009) developed a cepstral approach that uses acoustic images to detect landmines in a manner similar to pattern recognition. Cepstral image features are extracted by transforming the image into 1-dimensional signals through lexicographic ordering. The discrete wavelet transform is applied to the input image to retrieve the frame; then, Mel frequency cepstral coefficients (MFCCs) and polynomial shaping coefficients are extracted from the 1-dimensional signals to train a neural network to identify landmine features. The results achieved 100% detection rates for landmines in the absence of degradations [9].
Human interaction was more needed during target
recognition and clutter discrimination. Therefore, Minh Dao-Johnson Tran et al. (2009) developed an automated decision-making system to recognize landmines using a metal detector. In this method, ROI detection processes the input signal to isolate the suspicious area, and the ROI data are passed to the feature extraction stage to identify the target object. The CWT-based feature extractor uses the morphological characteristics of the wavelet coefficient power spectrum of the data signal. Finally, the classification process identifies the target or clutter based on the features [10]. Umar S. Khan et al. (2010) followed the same procedure explained in [9], with acoustic images combined with GPR images [11]. H. Kasban et al. (2010) proposed a procedure for landmine detection using images retrieved by a Laser Doppler Vibrometer (LDV) based acoustic/seismic system. This procedure used morphological image processing and the discrete wavelet transform (DWT): it transformed the RGB image to a grayscale image, applied image closing with a structuring element, and computed the 2-D Haar DWT to obtain an approximation component for automatic detection [12]. The cepstral approach explored by E. A. El-shazly et al. (2011) used acoustic images to detect landmines. Two-dimensional images were transformed into 1-dimensional signals using a spiral scan. The approach followed the same procedure as [9, 11] to form a database used to train neural networks, but a spiral scan was used instead of lexicographic ordering after the 2-D discrete transform. Detection was performed by matching the database of cepstral features generated in the training phase against the data features in the testing phase [13]. Minh Dao-Johnson Tran et al. (2011) proposed a target discrimination methodology using a wavelet-based transform and morphological feature extraction. The feature extraction algorithm extracts morphological characteristics from the alarm and decomposes them using the CWT; it isolates a region of interest (ROI), extracts wavelet characteristics from that region, and constructs the feature vector from the extracted elements. Finally, the features are used to detect mines and identify the target through classification. The classification component uses a network known as fuzzy Adaptive Resonance Theory Mapping (ARTMAP) [14]. Amine B. Khalifa et al. (2014) proposed an Adaptive Neuro-Fuzzy Inference System (ANFIS) strategy using meaningful fuzzy rules for various regions of the information space. The proposed fusion method identified local contexts and discovered the associated optimal linear combination weights, finally producing a confidence value representing the target [15]. Hichem Frigui et al. (2015) proposed a Multiple Instance Adaptive Neuro-Fuzzy Inference System (MI-ANFIS). The MI-ANFIS architecture was used along with a Sugeno-type multiple-instance formulation. Parameter initialization used the FCM algorithm to cluster the positive instances into a defined number of clusters. The technique used a basic learning algorithm to learn optimal multiple-instance rules, with additional layers to identify a target. MI-ANFIS could overcome labeling uncertainty and outperform other commonly used fusion strategies [16].
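To illustrate the wavelet-based part of these pipelines, the sketch below applies a single-level 2-D Haar DWT to an acoustic or GPR image, keeps the approximation sub-band, and flattens it into a 1-D signal by lexicographic ordering, roughly in the spirit of [9, 12]; the synthetic image and the simple real-cepstrum step are illustrative assumptions, not the exact MFCC or polynomial features used in those papers.

```python
# Sketch: 2-D Haar DWT approximation -> 1-D signal -> simple real-cepstrum features.
import numpy as np
import pywt

def wavelet_cepstral_features(image, n_coeffs=13):
    cA, (cH, cV, cD) = pywt.dwt2(np.asarray(image, dtype=float), "haar")
    signal = cA.ravel()                               # lexicographic ordering to 1-D
    spectrum = np.abs(np.fft.rfft(signal)) + 1e-12
    cepstrum = np.fft.irfft(np.log(spectrum))         # real cepstrum of the 1-D signal
    return cepstrum[:n_coeffs]

# Synthetic 64x64 "scene" with a bright blob standing in for a buried-target response
img = np.zeros((64, 64))
img[24:40, 24:40] = 1.0
print(wavelet_cepstral_features(img).shape)           # (13,)
```

Feature vectors of this kind could then be fed to a neural network or fuzzy classifier, as the reviewed methods do.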
3 Discussion
The performance of different landmine detection algorithms was analyzed. Each method can vary significantly depending on the target depth, the material used, burial orientation, and other environmental factors. A variety of sensors were used to retrieve image data from the ground surface. Most of the algorithms used ground-penetrating radar as the best sensor to recognize landmines. Some algorithms provide convincing proof and results, while others provide contradicting results. However, all algorithms were developed to reduce the false alert rate and simultaneously improve the detection probability. Table 2 summarizes the algorithm, sensors used, advantages, and limitations of each technique used for landmine detection as mentioned by different authors.

Table 2 The different methods of feature extraction in landmine detection using fuzzy logic

Authors | Algorithm | Sensors | Advantages | Limitations
P.D. Gader et al. [1] | ATR based (Transition, Gradient, Line-based and CAN confidence) | GPR | False alarm rate reduced | The detection rate needs to increase
B.N. Nelson et al. [2] | Mamdani style fuzzy inference system | IR | Achieved low false alarm rate and detected AT mines | Higher threshold values resulted in a lower FAR, which reduced the actual detection
A.K. Hocaoglu et al. [3] | CMSNN | GPR, LIDAR | Faster computation in domain shape, better performance | Made attempts to obtain grayscale values of structuring elements
P. Gader et al. [4] | FROSAW, CFAR | GPR | Reported a small number of false alarms | Challenges faced when clutter objects are similar to mines; detects only AT mines
Sawsan M., Ayman et al. [5] | FCM, Mathematical Morphology | IR | Reduce redundancy based on the thermal properties of an image | Detects only AP mines
H. Frigui et al. [6] | K-NN | GPR | Supports data collected from multiple sites with different sensors of GPR | Tested only on anti-tank mines
H. Frigui et al. [7] | EHD, fuzzy K-NN | GPR | Quickly train and adapt data collected from other sites | Tested only on AT landmines
A.B. Khalifa et al. [15] | ANFIS | GPR | The fusion approach outperformed global fusion methods | Not efficient for all types of soil
H. Frigui et al. [16] | MI-ANFIS | GPR | Performance was better than ANFIS | Performs well on labeled data; tested only on AT mines
Most algorithms used GPR as the sensor to extract information from the ground surface. The feature extraction methods for landmine detection based on fuzzy logic and the wavelet transform are reviewed in this paper. Some of the methods detect or were tested on either anti-tank or anti-personnel mines. Neural networks and fuzzy logic are used to retrieve information from images. In this review, the Mamdani-style fuzzy inference system [2], FROSAW and CFAR [4], K-NN [6], EHD and fuzzy K-NN [7], and MI-ANFIS [16] techniques were used to detect anti-tank landmines. The FCM and mathematical morphology [5] and WVD and WT [8] techniques were used to detect anti-personnel landmines. The remaining techniques identify all types of mines. Table 3 summarizes the different methods of feature extraction in landmine detection using the wavelet transform. The cepstral approach [9, 13] combined with the wavelet transform gave a 100% detection rate in the absence of degradations.

Table 3 The different methods of feature extraction in landmine detection using wavelet transform

Authors | Algorithm | Sensors | Advantages | Limitations
O. Lopera et al. [8] | WVD, WT | GPR | Identified different shapes of mines | Tested only on AP landmines
F.E. Abd et al. [9] | Cepstral approach, DWT, DCT, MFCC, PSC | Acoustic | Robust when compared to all other feature extraction methods | Showed 100% detection rate only in the absence of degradations
M.D.J. Tran et al. [10] | ROI Detection, DWT, Classification | MD | Reduced the false alarms based on morphological properties | The system did not train with data containing clutter signals
U.S. Khan et al. [11] | Cepstral approach, DWT | Acoustic and GPR | Achieved the highest detection rate | The location of the images affects the recognition rates
H. Kasban et al. [12] | Morphological image processing, DWT | Acoustic/Seismic | Low false alarm rate; reduced the clutter objects and identified both AT and AP mines | Low probability of detection
E.A. El-shazly et al. [13] | Cepstral approach, Discrete Transform, Spiral Scan | Acoustic | Feature extraction from target images is the most robust | Showed 100% detection rate only in the absence of degradations
M.D.J. Tran et al. [14] | CWT, Fuzzy ARTMAP | MD | Achieved high accuracy and lower false alert percentage | Further improvement needed in the fusion method
4 Analysis
Figure 1 shows the Sugeno method, and Fig. 2 shows the Choquet method. A chart was prepared to compare both methods in terms of probability of detection and false alarm rate percentage, mapping the confidence values retrieved from the calibration lane. The results show high probability of detection with a high false alarm rate for the Sugeno method, and median probability of detection with a lower false alarm rate for the Choquet method. The main aim is to minimize the false alarm rate while maintaining accurate landmine detection. The probability of detection (PD) and false alarm rate (FAR) percentages were obtained from the calibration lane using the algorithms in [4].
Fig. 1 Probability detection and false alarm rate percentage of the Sugeno method
Fig. 2 Probability detection and false alarm rate percentage of the Choquet method
Fig. 3 Probability detection and false alarm rate percentage of CFAR and FROSAW
Figure 3 shows that the false alarm rate of the FROSAW method was lower than that of the constant false alarm rate detection method. FROSAW showed more variation in false alarm rates and ultimately increased the confirmed detection of landmines. However, this method collected data from different sites. Figure 4 shows the PD vs. FAR for EigenMines2 and EigenMines3 mentioned in [6]. The false alarm rate is reduced for both EigenMines2 and EigenMines3 in most lanes compared to the constant false alarm rate detector method. Moreover, EigenMines3 performs better than EigenMines2 because EigenMines3 encodes more information retrieved from the cross-track direction. This algorithm rejects many false alarms identified by the constant false alarm detector without disturbing the mine detection rate.
Fig. 4 Probability detection and false alarm rate percentage of CFAR, Eigenmines2, and Eigenmines3 (FAR (%) vs. PD (%))
5 Conclusion In this paper, a review is provided on feature extraction-based landmine detection using fuzzy logic and the wavelet transform. Landmine detection plays a significant role in saving the lives of soldiers, the public, and animals. Landmines are buried under the ground by terrorists and are detected using several sensors. The data retrieved from the sensors are processed using digital image processing techniques, and the performance of the algorithms is evaluated using the false alarm rate and detection probability metrics. The different feature extraction methods used for landmine detection based on fuzzy logic and the wavelet transform were discussed and evaluated with the CFAR, FROSAW, and Eigenmines algorithms, together with the sensors used and the merits and demerits of each technique. New feature extraction, image segmentation, and image classification algorithms will be proposed in the future. These algorithms will be tested to detect different types of mines under various factors such as depth and orientation, climatic changes, soil variety, and the different materials used in mines such as plastic, metal, or wood.
References 1. Gader PD, Nelson BN, Frigui H, Vaillette G, Keller JM (2000) Fuzzy logic detection of landmines with ground penetrating radar. Signal Proc 80(6):1069–1084. https://doi.org/10.1016/ S0165-1684(00)00020-7 2. Nelson BN (2000) Region of interest identification, feature extraction, and information fusion in a forward looking infrared sensor used in landmine detection. In: Proceedings IEEE workshop on computer vision beyond the visible spectrum: methods and applications (Cat. No.PR00640), 2000, pp. 94–103. https://doi.org/10.1109/CVBVS.2000.855254
3. Hocaoglu AK, Gader PD (2003) Domain learning using Choquet integral-based morphological shared weight neural networks. Image Vis Comput 21(7): 663–673. https://doi.org/10.1016/ S0262-8856(03)00062-3 4. Gader P, Lee WH, Wilson JN (2004) Detecting landmines with ground-penetrating radar using feature-based rules, order statistics, and adaptive whitening. IEEE Trans Geosci Remote Sens 42(11):2522–2534. https://doi.org/10.1109/TGRS.2004.837333 5. Sawsan M, Ayman ED, Ahmed B, Hanan AK (2003) Fuzzy C-means and mathematical morphology for mine detection in IR image. In: 2003 46th midwest symposium on circuits and systems, Vol. 2, pp. 670–673. https://doi.org/10.1109/MWSCAS.2003.1562375 6. Frigui H, Gader P, Satyanarayana K (2004) Landmine detection with ground penetrating radar using fuzzy K-nearest neighbors. IEEE Int Conf Fuzzy Syst 3:1745–1749. https://doi.org/10. 1109/FUZZY.2004.1375447 7. Frigui H, Gader P (2009) Detection and discrimination of land mines in ground-penetrating radar based on edge histogram descriptors and a possibilistic K-nearest neighbor classifier. IEEE Trans Fuzzy Syst 17(1):185–199. https://doi.org/10.1109/TFUZZ.2008.2005249 8. Lopera O, Milisavljevie N, Daniels D, Macq B (2007) Time-frequency domain signature analysis of GPR data for landmine identification. In: 2007 4th international workshop on, advanced ground penetrating radar, pp. 159–162https://doi.org/10.1109/AGPR.2007.386544 9. Abd El-Samie FE (2009) Detection of landmines from acoustic images based on cepstral coefficients. Sens Imag 10(3–4): 63–77.https://doi.org/10.1007/s11220-009-0047-9 10. Tran MDJ, Abeynayake C (2009) Evaluation of the continuous wavelet transform for feature extraction of metal detector signals in automated target detection. Stud Comput Intell 199:245– 253. https://doi.org/10.1007/978-3-642-00909-9_24 11. Khan US, Al-Nuaimy W, Abd El-Samie FE (2010) Detection of landmines and underground utilities from acoustic and GPR images with a cepstral approach. J Vis Commun Image Represent 21(7):731–740. https://doi.org/10.1016/j.jvcir.2010.05.007 12. Kasban H, Zahran O, El-Kordy M, Sayed MS, Elaraby FE, El-Samie A (2010) False alarm rate reduction in the interpretation of acoustic to seismic landmine data using mathematical morphology and the wavelet transform. Sens Imag Int J 11(3):113–130. https://doi.org/10. 1007/s11220-010-0056-8 13. El-shazly EA, Elaraby SM, Zahran O, El-Kordy M, Abd. El-Samie F.E (2010) Cepstral detection of buried landmines from acoustic images with a spiral scan. In: ICENCO’2010 - 2010 int. comput. eng. conf. expand. inf. soc. front., pp. 97–102. https://doi.org/10.1109/ICENCO. 2010.5720434. 14. Tran MDJ, Abeynayake C, Jain LC (2012) A target discrimination methodology utilizing wavelet-based and morphological feature extraction with metal detector array data. IEEE Trans Geosci Remote Sens 50(1):119–129. https://doi.org/10.1109/TGRS.2011.2159801 15. Khalifa AB, Frigui H (2014) Fusion of multiple landmine detection algorithms using an adaptive neuro fuzzy inference system. Int Geosci Remote Sens Symp, 3148–3151. https://doi.org/10. 1109/IGARSS.2014.6947145 16. Khalifa AB, Frigui H (2015) A multiple instance neuro-fuzzy inference system for fusion of multiple landmine detection algorithms. Int. Geosci. Remote Sens. Symp. 2015: 4312–4315. https://doi.org/10.1109/IGARSS.2015.7326780. 17. Kasban H, Zahran O, Elaraby SM, El-Kordy M, Abd El-Samie FE (2010) A comparative study of landmine detection techniques. Sens Imag Int J 11(3):89–112. 
https://doi.org/10.1007/s11220-010-0054-x
Chapter 26
Prediction of Water Quality Index of Ground Water Using the Artificial Neural Network and Genetic Algorithm Mehtab Mehdi and Bharti Sharma
1 Introduction Water is the most important resource for society. Pure water is not only required for our ecosystem but is also one of the essential elements for the industrial world. The significance of ground water has been recognized by many researchers, which is the reason for establishing water resources and water purifiers in cities, industrial areas, and agricultural activities. At present, the whole world is facing a crisis of pure water, and many people in India use groundwater for drinking. According to a survey of the CPCB (Central Pollution Control Board), nearly twenty-nine thousand million litres/day of waste is produced by big and small cities, of which about 45% comes from thirty-five metro cities alone [1]. Many factors put pressure on improving water resources. The water emergency refers to a worldwide condition in which people in many areas lack access to adequate water, clean water, or both. The WQI is a numerical value that represents the quality of water in cities, villages, and corporate areas; it is a value defining the combined effect of multiple water quality factors [2, 3]. Calculating the WQI is not an easy task, and there are many ways to do it; FWQI and NSFWQI are some well-known methods. Many countries have developed ideal values that spotlight water conditions and parameters, but India is a very wide country with a variety of weather conditions that vary from dry to moist. A method that gives good results at one site may not be applicable at another site [4]. Calculating the WQI with any of the above methods also brings some problems: the calculation takes a lot of time because of the complex process, and the calculated WQI is inconsistent.
M. Mehdi (B) · B. Sharma DIT University, Dehradun, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 M. S. Uddin et al. (eds.), Proceedings of International Joint Conference on Advances in Computational Intelligence, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-19-0332-8_26
355
356
M. Mehdi and B. Sharma
For solving the above issues, many researchers have taken help from technology to calculate the WQI. Prediction of the WQI in ground water has been attained using different data mining algorithms such as SVM, decision trees, and random forest [5]. Although SVM achieves good efficiency, its value is limited by having to test four kernel types to choose the best one, and it involves numerous factors for an optimal result to be produced. ANN is the most widely used data mining algorithm. ANN alone can limit the efficiency of the model, but a hybrid AI model can give better accuracy; in this hybrid model we used a genetic algorithm for obtaining the optimal result and improving the ANN accuracy [6]. In this research work, ten samples of groundwater are collected from the Amroha district, Uttar Pradesh, India. Ten water quality factors are measured for the calculation of the WQI, such as pH, total hardness (TH), total dissolved solids (TDS), chemical oxygen demand (COD), and biochemical oxygen demand (BOD) [7].
2 Study Area The Amroha city is located nearly 130 km from New Delhi, the national capital of India. It is situated in the northern region (Uttar Pradesh) near the Himalayas. The climate of Amroha, based on the Köppen climate classification, has three predominant seasons: winter, summer, and spring. The extreme temperature fluctuates from 0 to 48 °C. Lack of rain during the extremely variable monsoon season is the reason for drought in the north region of India, and Amroha city has also been affected by such disasters many times. Most of the population depends on groundwater for drinking and other uses. Its geographical area is about 2300 sq. km, situated between 28°24′01″ and 28°06′01″ north latitude and 78°03′01″ and 78°43′01″ east longitude, and it is covered by the Survey of India. The district headquarters is situated in the eastern section of the region. The Ganga River flows in the west and separates the Ghaziabad and Bulandshahar districts; the area lies in the Ganga basin [8].
3 Methodology The methodology is divided in the following subparts
3.1 Data Collection and Preparation Ten years (2006 to 2016) of data were collected at 10 different strategically chosen water quality observation sites within Amroha. The collected data include different features such as BOD, COD, DO, pH, total solids (TS), and electrical conductivity (EC) [9]. These features are used to calculate the WQI of the Amroha region. The process of collecting the samples and their analysis followed the
26 Prediction of Water Quality Index of Ground Water …
357
standard techniques used by the Central Ground Water Board. The 70% training set and 30% test set were separated using the tenfold cross-validation method, although there is no general rule for dividing the training and test sets for either spatial or temporal prediction. For computing the AMWQI, the NSFWQI method was used; a greater value of AMWQI indicates cleaner ground water. The AMWQI is calculated using the following formula:
$$\text{AMWQI} = \sum_{i=1}^{n} W_i \times SI_i$$
Here, AMWQI is the water quality index of the Amroha region; the WQI ranges are given in Table 1. $W_i$ is the weight of the ith variable (between 0 and 1), and $SI_i$ is the corresponding sub-index obtained from the rating curve (0–100). The techniques of computation are consistent throughout the AMFWQI [10]. Table 2 shows that in Amroha the WQI ranges from 15.6 to 91.8, so the water quality is wide-ranging, from bad to excellent. The water in the higher areas is of good quality; it appears to carry only negligible pollution originating from confined forests. Urbanization is one cause of the decrease in the quality of ground water [11]. For the updated prediction, when the data are normalized, the following formula is used to rescale a parameter value: $x_i' = (x_i - x_{\min})/(x_{\max} - x_{\min})$, where $x_i'$ is the rescaled value of a variable (i.e., pH, EC, etc.), $x_i$ is the measured value at the site, and $x_{\min}$ and $x_{\max}$ are its lower and upper values. For data collection, we used different sensors connected to private cloud storage; this data is stored in our local system and finally this
Table 1 WQI range with quality (columns: Index, Range, Quality; index: AMWQI)
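As a minimal illustration of the weighted-sum index and the min-max rescaling described above, the Python sketch below computes an AMWQI-style value; the weights and sub-index values shown are hypothetical and are not taken from the chapter.

```python
import numpy as np

def min_max_scale(x, x_min, x_max):
    """Rescale a raw parameter reading to [0, 1]."""
    return (x - x_min) / (x_max - x_min)

def amwqi(weights, sub_indices):
    """Weighted-sum water quality index: sum_i W_i * SI_i."""
    w = np.asarray(weights, dtype=float)
    si = np.asarray(sub_indices, dtype=float)
    return float(np.sum(w * si))

# Hypothetical weights and sub-index values for pH, TH, TDS, COD, BOD:
print(amwqi([0.25, 0.20, 0.20, 0.20, 0.15], [80, 65, 70, 55, 60]))
```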
g > 1 means lightening the details, g = 1 means no effect, and g < 1 means darkening the details. Subsequently, a higher value of gamma (typically g >= 2.00) causes the image to appear "washed out", and vice versa. Therefore, by altering different values and testing them on the image, a suitable gamma value g is picked, and the output O is scaled back to the actual range [0, 255] (see Eq. 2). In the following equation, O is the output image, r is the input image (rescaled to [0, 1] by the division by 255 inside the transform), and g is the gamma value.
$$O = \left(\frac{r}{255}\right)^{g} \times 255 \qquad (2)$$
In this work the value of gamma was 2.00. Here the flowchart of gamma correction enhancement technique is shown in Fig. 3.
Fig. 3 Flowchart of gamma correction
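The gamma correction step of Eq. (2) can be sketched in a few lines of NumPy. This is only an illustrative implementation using g = 2.00 as stated above; the variable names are assumptions.

```python
import numpy as np

def gamma_correction(img, g=2.0):
    """Power-law (gamma) transform of an 8-bit image, following Eq. (2)."""
    r = img.astype(np.float32) / 255.0        # rescale pixel values to [0, 1]
    o = np.power(r, g) * 255.0                # apply gamma and scale back to [0, 255]
    return np.clip(o, 0, 255).astype(np.uint8)

# Example usage (assuming a leaf image already loaded as a NumPy array):
# corrected = gamma_correction(leaf_image, g=2.0)
```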
Histogram Equalization. Histogram equalization is a highly utilized process among enhancement techniques [12]. It helps to improve the contrast of all types of images. Roshan Raj and his colleagues [13] suggested that by applying an adaptive histogram equalization technique, anyone can easily enhance the contrast of any image. Bogy et al. [14] proposed software that improves image contrast using the histogram equalization technique, applying histogram equations to enhance image quality. Likewise, in this study, the histogram technique has been applied to clarify the image information and make the working process more accessible. The process of histogram equalization is shown in Steps (1–3). STEP 1: Calculate the pixel-value histogram of the input image. The histogram assigns the value of each pixel f[x, y] to one of l uniformly spaced buckets h[i]:
$$h[i] = \sum_{m=1}^{a}\sum_{n=1}^{b}\begin{cases}1, & \text{if } f[m,n]=i\\ 0, & \text{otherwise}\end{cases} \qquad (3)$$
Fig. 4 Scale of GEM16 (0–254; RGB(1, 9, 17) at 0.000% to RGB(254, 254, 254) at 100.000%)
Fig. 5 Scale of GEM256 (0–254; RGB(1, 2, 2) at 0.000% to RGB(254, 252, 250) at 100.000%)
where $l = 2^{8}$ and the image dimension is a × b.
STEP 2: Calculate the cumulative distribution function:
$$CDF[j] = \sum_{i=1}^{j} h[i] \qquad (4)$$
STEP 3: Scale the input image using the cumulative distribution function to generate the output image:
$$g[m,n] = \frac{CDF[f[m,n]] - CDF_{\min}}{(a \times b) - CDF_{\min}} \times (l-1) \qquad (5)$$
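A compact NumPy sketch of Steps 1–3 (Eqs. 3–5) for an 8-bit grayscale image is given below; it is illustrative only and mirrors what standard library routines (e.g., OpenCV's equalizeHist) perform.

```python
import numpy as np

def histogram_equalization(gray):
    """Equalize an 8-bit grayscale image following Steps 1-3 (Eqs. 3-5)."""
    l = 256
    a, b = gray.shape
    hist = np.bincount(gray.ravel(), minlength=l)   # Eq. (3): per-intensity counts
    cdf = np.cumsum(hist)                           # Eq. (4): cumulative distribution
    cdf_min = cdf[cdf > 0].min()
    out = np.round((cdf[gray] - cdf_min) / (a * b - cdf_min) * (l - 1))  # Eq. (5)
    return out.astype(np.uint8)
```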
Gradient Energy Measure (GEM) Filter. To achieve the optimal result, we need to enhance the image quality. ImageJ can enhance image quality by removing colors, increasing contrast, and thresholding the signal in the images. It is an open-source image analysis tool that is flexible enough to adapt to different requirements [15]. This software has different filters, and GEM is one of the filters used to increase the quality of the images; GEM is a measure of the quality of a promising image [16]. As shown in Fig. 4, GEM16 first divides its scale into 16 blocks. When the scale value is 0, the RGB values are 1, 9, 17, respectively, whereas when the scale value is 100%, the RGB values are all equal to 254. As shown in Fig. 5, GEM256 first divides its scale into 256 blocks. When the scale value is 0, the RGB values are 1, 2, 2, respectively, whereas when the scale value is 100%, the RGB values are 254, 252, 250, respectively. Thus, the GEM scale helps to change the quality of the image according to its range. After applying all the data preprocessing and image enhancement methods to all the images, the output results are shown in Fig. 6.
Fig. 6 Resulted Image after applying all Image preprocessing and enhancement methods
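The chapter does not reproduce the exact GEM formula of Ewing and Barnden [16], so the sketch below only illustrates a generic gradient-energy-style sharpness measure (sum of squared finite differences); the function name and formulation are assumptions, not the published GEM definition.

```python
import numpy as np

def gradient_energy(gray):
    """Sum of squared horizontal and vertical differences of a grayscale image.
    Illustrative quality/sharpness measure only; not the exact GEM of [16]."""
    g = gray.astype(np.float64)
    dx = np.diff(g, axis=1)
    dy = np.diff(g, axis=0)
    return float((dx ** 2).sum() + (dy ** 2).sum())
```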
5 Deep Learning Methods The deep learning technique can extract the features by itself and provide high-quality results simultaneously; that’s why researchers utilize the deep learning models in their system. So, we have also applied ResNet50, VGG16 and Inceptionv3 in this study.
5.1 ResNet50 ResNet is a very deep network containing 50 levels [17]. It learns the residual (delta) needed to go from one level's prediction to the next [18]. In the ResNet model, backpropagation does not run into the vanishing gradient issue. A residual neural network has shortcut connections parallel to the regular convolutional layers that assist with understanding the global features. These shortcut connections permit the network to skip some layers during training and result in the best adjustment of the number of layers for faster training [19]. In this work, we assess the ResNet-50 method to detect lentil plant diseases in the harvest field. This model is a 50-level deep convolutional network using repetitive (skip) connections with transfer learning. It contains a 7 × 7 convolution layer with 64 kernels, a 3 × 3 max-pooling layer with stride 2, 16 residual building blocks, and a 7 × 7 average pooling layer with stride 7, besides a new fully connected (FC) layer before the flattened output level [20].
5.2 VGG16 VGG16 is a VGGNet model that uses 16 layers as its architecture [21]. A typical VGG16 contains 5 convolutional blocks before being connected to the multilayer perceptron (MLP) classifier [22]. The VGG model has convolutional layers with a series of 3 × 3 convolutions and 2 × 2 max-pooling layers, followed by two fully connected (FC) layers with the last layer as the flattened output [23]. In the VGG-16 network, the first two convolutional layers contain 64 feature kernel filters of size 3 × 3. The image with depth 3 passes through these layers, the dimensions change to 224 × 224 × 64, and the output is sent with a stride of 2 to the max-pooling layer. The third and fourth layers have 128 feature kernel filters of size 3 × 3 followed by a max-pooling layer with stride 2, where the output is compacted to 56 × 56 × 128. The fifth to seventh layers utilize 256 feature maps with stride 2, and the eighth to thirteenth have 512 kernel filters with stride 1. The fourteenth and fifteenth layers are fully connected hidden layers of 4096 units, followed by a flattened output level of 1000 units [24].
5.3 Inceptionv3 The Inceptionv3 model has higher performance in object identification compared with GoogleNet (Inception-v1). Specifically, this deep learning model contains three parts: the basic convolutional block, the enhanced Inception modules, and the classifier [25]. Inceptionv3 first accepts an image input of size 224 × 224 × 3. The first convolutional stage has 3 convolutional layers with 3 × 3 filters: the first layer has 32 filters with stride 2, the second convolutional layer also has 32 filters with stride 1, and the third convolutional layer has 64 filters with stride 1. When an image of dimension 224 × 224 × 3 passes through these 3 convolutional layers, the first convolutional layer gives an output of size 111 × 111 × 32, the second 109 × 109 × 32, and the third 109 × 109 × 64. Next, the max-pooling layer has a 3 × 3 filter with stride 2; the layers up to this max pool form the feature extraction portion of the model [26]. The output from the convolutional layers becomes the input of the max-pooling layer, and the size of the output becomes 54 × 54 × 64 after applying the 3 × 3 filter with stride 2. The next convolutional layer has 80 filters of size 1 × 1 with stride 1, and the following convolutional layer has 192 filters of size 3 × 3 with stride 1. After that, a max-pooling layer with a 3 × 3 filter and stride 2 is applied. Then comes Inception block A, which is repeated 3 times. Subsequently, reduction block A runs once, then Inception block B, which is repeated 4 times, reduction block B runs once, and then Inception block C, which is repeated 2 times.
Fig. 7 Architecture of Inceptionv3
Lastly, the average pooling layer comes, followed by two fully connected dense layers before the SoftMax output layer (see Fig. 7).
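A transfer-learning setup of the kind described in this section can be sketched with Keras as below. The classifier head (1024-unit dense layer, dropout rate) and the training hyperparameters are illustrative assumptions; the chapter does not specify them.

```python
import tensorflow as tf
from tensorflow.keras import layers, models
from tensorflow.keras.applications import InceptionV3

NUM_CLASSES = 4            # Anthracnose, Ascochyta blight, Mold, Rust
IMG_SIZE = (224, 224)

base = InceptionV3(weights="imagenet", include_top=False, input_shape=IMG_SIZE + (3,))
base.trainable = False     # keep the pretrained convolutional features fixed at first

model = models.Sequential([
    base,
    layers.GlobalAveragePooling2D(),
    layers.Dense(1024, activation="relu"),   # assumed head size
    layers.Dropout(0.5),
    layers.Dense(NUM_CLASSES, activation="softmax"),
])
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
# model.fit(train_ds, validation_data=val_ds, epochs=20)
```

Swapping InceptionV3 for ResNet50 or VGG16 only changes the imported base model; the head and training loop stay the same.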
6 Experiments The research details, such as the image dataset and performance metrics, are all discussed in this section of the chapter.
6.1 Dataset To develop the lentil leaf disease dataset, we collected images of different sizes with the help of a digital camera. The collected image sizes were 1024 × 764, 1920 × 1080, and 1280 × 720. After obtaining the initial dataset, we examined the classes before starting the image augmentation and preprocessing and found some grey and blurry images, which could decrease the accuracy of this experiment. Accordingly, about 20 images were manually removed from the dataset. Finally, we obtained a total of 3,750 images of lentil diseases through data augmentation and image preprocessing techniques [27]. The dataset is split in a 7:2:1 ratio for training, validation, and testing. The categorical information of the lentil disease dataset is shown in Table 1.
Table 1 The categorical information on the lentil disease dataset
Lentil disease | Original (a) | Expanded (b) | Training (c) | Validation (d) | Test (e)
Anthracnose | 116 | 1160 | 812 | 232 | 116
Ascochyta blight | 145 | 950 | 665 | 190 | 95
Mold | 102 | 1020 | 714 | 204 | 102
Rust | 62 | 620 | 434 | 124 | 62
Total | 425 | 3750 | 2625 | 750 | 375
a: Original images; b: Augmented images; c: Training set images; d: Validation set images; e: Test set images.
6.2 Performance Metrics Performance metrics are an essential element of the estimation frameworks in different fields [28]. All the formulas of the performance metrics that we have applied in this research are given below (Eq. 6):
$$\text{Accuracy} = \frac{TP+TN}{TP+FP+FN+TN},\quad \text{Specificity} = \frac{TN}{FP+TN},\quad \text{Precision} = \frac{TP}{FP+TP}$$
$$\text{Recall} = \frac{TP}{FN+TP},\quad \text{F-Measure} = 2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision}+\text{Recall}},\quad \text{Classification Error} = \frac{FN+FP}{TP+TN+FP+FN} \qquad (6)$$
In machine learning, different classification properties can be determined with different performance metrics, and especially in two-class classification problems, maximum performance metrics are sensitive for the conformation of the datasets [29]. The training set and validation set are utilized, respectively, for training the model and providing the training progress reports as training is completed or not. Finally, the different pre-trained model is applied to the test set to evaluate the model’s results. The effectiveness of the proposed system is estimated based on the required measures of True Positive (TP), True Negative (TN), False Positive (FP), and False Negative (FN) values [30].
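The measures in Eq. (6) can be computed directly from the confusion-matrix counts, as in this small sketch; the example counts in the comment are made up for illustration.

```python
def metrics_from_counts(tp, tn, fp, fn):
    """Evaluate the measures of Eq. (6) from raw confusion-matrix counts."""
    accuracy    = (tp + tn) / (tp + fp + fn + tn)
    specificity = tn / (fp + tn)
    precision   = tp / (fp + tp)
    recall      = tp / (fn + tp)
    f_measure   = 2 * precision * recall / (precision + recall)
    class_error = (fn + fp) / (tp + tn + fp + fn)
    return dict(accuracy=accuracy, specificity=specificity, precision=precision,
                recall=recall, f_measure=f_measure, classification_error=class_error)

# Example with made-up counts:
# print(metrics_from_counts(tp=98, tn=260, fp=2, fn=15))
```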
7 Result Analysis In this study, Tables 2 and 3 present the gamma correction and histogram equalization performance metrics, respectively. From Table 2, it can be observed that by applying the gamma correction enhancement technique, the misclassification rate for the VGG16 and Inception models is equal to 1.2%, whereas this value is 6.67% for the ResNet50 model, which is higher than the others. However, the rest of the values are
Table 2 Performance description of the gamma correction
Class | Specificity (%) | Precision (%) | Recall (%) | F-measure (%) | Classification error (%)
VGG16 | 99.16 | 97.86 | 97.31 | 97.56 | 1.2
InceptionV3 | 99.18 | 97.85 | 97.08 | 97.41 | 1.2
ResNet50 | 95.45 | 85.21 | 86.32 | 85.15 | 6.67
Table 3 Performance description of the histogram equalization
Class | Specificity (%) | Precision (%) | Recall (%) | F-measure (%) | Classification error (%)
VGG16 | 99.18 | 97.64 | 97.37 | 97.47 | 1.2
InceptionV3 | 99.36 | 98.18 | 98.13 | 98.16 | 0.93
ResNet50 | 98.86 | 96.16 | 96.09 | 96.12 | 1.73
impressive except for their misclassification values. On the other hand, in the histogram equalization performance metrics, the misclassification rate for InceptionV3 is only 0.93%, and its other results (specificity, precision, recall, and F-measure) are higher than those of VGG16 and ResNet50. Therefore, it can be concluded that using the histogram equalization technique, the InceptionV3 model performs very well in classifying lentil diseases (see Tables 2 and 3 for all values).
7.1 Comparison of Different Model Accuracy Comparison of model accuracy plays a vital role in understanding which model performs best on the system and provides the best solution from multiple models. Therefore, several researchers have compared the accuracy of their models in their systems to get optimal results [5, 31–34]. In this study, we also compared the accuracy of our models to predict the optimal results of our system. Model Accuracy Comparison applying Gamma Correction. Table 4 shows the recognition accuracy using gamma correction.
Table 4 The classification accuracy of deep learning models
Deep learning model | Input size | Recognition accuracy (%)
VGG16 | 224 × 224 | 97.60
Inceptionv3 | 224 × 224 | 97.60
ResNet50 | 224 × 224 | 85.87
Fig. 8 Model accuracy comparison graph (applying gamma correction)
Table 5 The classification accuracy of deep learning models
Deep learning model | Input size | Recognition accuracy (%)
VGG16 | 224 × 224 | 97.60
Inceptionv3 | 224 × 224 | 98.13
ResNet50 | 224 × 224 | 96.53
As displayed in Fig. 8, the X-axis presents the training repetitions, and the Y-axis presents the corresponding training accuracy. The accuracy history of each identifier network test is reported in Table 4. Figure 8 illustrates that the recognition accuracy results of the Inceptionv3 and VGG16 networks are satisfying and impressive. At the same time, to acquire optimal accuracy in this work, we also need to focus on ResNet50, as the accuracy rate of this model is 85.87%, which is comparatively lower than the others. Model Accuracy Comparison applying Histogram Equalization. Table 5 shows the recognition accuracy using histogram equalization. As displayed in Fig. 9, the X-axis presents the training repetitions, and the Y-axis presents the corresponding training accuracy. The accuracy history of each identifier network test is reported in Table 5. Figure 9 presents that the recognition accuracy results of the Inceptionv3 and VGG16 networks are satisfying and impressive. However, in this whole work, Inceptionv3 provides the best accuracy (98.13%) with the histogram equalization method.
Fig. 9 Model accuracy comparison graph (applying histogram equalization)
8 Conclusion In this article, we recommend an architecture based on deep learning, Inceptionv3 (applying histogram), for lentil leaf disease detection. The proposed technique can automatically extract the disease spots of lentils and classify four common lentil diseases with high accuracy (98.13%). We have collected a total of 425 original images physically from different lentil fields, and a total of 3,750 images of lentil diseases were created through the data augmentation process. In this study, we have applied the gamma correction and histogram equalization methods as image enhancement techniques to prove that the image preprocessing technique improves the models’ performance. For example, in the case of Inception v3, the application of gamma and histogram resulted in a 0.53% change in their accuracy. So, the result proposed that the Inceptionv3 method (applying histogram equalization) can identify four common lentil diseases expeditiously and accurately, and it affords a possible solution to detect the lentil disease for a real-time application.
References 1. Adoption and Impact of Improved Lentil Varieties in Bangladesh. https://cas.cgiar.org/spia/pub lications/adoption-and-impact-improved-lentil-varieties-bangladesh-1996-2015. Accessed 13 Oct 2021 2. Lentil Imports on the rise. https://www.thedailystar.net/business/news/lentil-imports-the-rise2036693. Accessed 13 Oct 2021 3. Shahin MA, Symons SJ (2003) Lentil type identification using machine vision. Can Biosyst Eng Le Genie des Biosyst. au Canada 45:5–11 (2003) 4. Singh K, Kumar S, Kaur P (2019) Automatic detection of rust disease of Lentil by machine learning system using microscopic images. Int J Electr Comput Eng 9:660 5. Xie X, Ma Y, Liu B, He J, Li S, Wang H (2020) A deep-learning-based real-time detector for grape leaf diseases using improved convolutional neural networks. Front Plant Sci 11:751
6. Kamal MM, Masazhar ANI, Rahman FA (2018) Classification of leaf disease from image processing technique. Indones J Electr Eng Comput Sci 10:191–200 7. Singh K, Kumar S, Kaur P (2019) Support vector machine classifier based detection of fungal rust disease in Pea Plant (Pisam sativam). Int J Inf Technol 11:485–492 8. Raut S, Fulsunge A, Student PG (2007) (Certified Organization) Website: www.Int.J.Innov. Res.Sci.Eng.Technol. (An ISO. 3297) 9. Mohanty SP, Hughes DP, Salathé M (2016) Using deep learning for image-based plant disease detection. Front Plant Sci 7:1419 10. Kaur J, Puri S, Kaur V (2016) Iris recognition using Hough’S transform gamma correction and histogram thresholding method. Int J Eng Sci Res Technol 5:84–92 11. Xu G, Su J, Pan H, Zhang Z, Gong H (2009) An image enhancement method based on gamma correction. In: ISCID 2009 - 2009 international symposium on computational intelligence and design, Changsha, China. IEEE, pp 60–63 12. Abdullah-Al-Wadud M, Hasanul Kabir M, Ali Akber Dewan M, Chae O (2007) A dynamic histogram equalization for image contrast enhancement. IEEE Trans Consumer Electron 53(2):593–600 13. Jajware RR, Agnihotri RB (2020) Image Enhancement of Historical Image Using Image Enhancement Technique. Lecture Notes in Networks and Systems. Springer, Singapore 14. Oktavianto B, Purboyo TW (2018) A study of histogram equalization techniques for image enhancement. Int J Appl Eng Res 13(2):1165–1170 15. Papadopulos F, Spinelli M, Valente S, Foroni L, Orrico C, Alviano F, Pasquinelli G (2007) Common tasks in microscopic and ultrastructural image analysis using ImageJ. Ultrastruct Pathol 31(6):401–407 16. Ewing GJ, Barnden LR (1998) The gradient energy measure (GEM): an objective measure of image quality. ANZ Nucl Med 29:40–50 17. Chu Y, Yue X, Lei Yu, Sergei M, Wang Z (2020) Automatic image captioning based on ResNet50 and LSTM with soft attention. Wirel Commun Mob Comput 2020:1–7 18. He K, Zhang X, Ren S, Sun J (2016) Deep residual learning for image recognition. In: Proceedings of the IEEE computer society conference on computer vision and pattern recognition, Las Vegas, NV, USA. IEEE Computer Society, pp 770–778 19. Theckedath D, Sedamkar RR (2020) Detecting affect states using VGG16, ResNet50 and SE-ResNet50 networks. SN Comput. Sci. 1:79 20. Thenmozhi K, Srinivasulu Reddy U (2019) Crop pest classification based on deep convolutional neural network and transfer learning. Comput Electron Agric 164:104906 21. Rezende E, Ruppert G, Carvalho T, Theophilo A, Ramos F, de Geus P (2018) Malicious software classification using VGG16 deep neural network’s Bottleneck features. In: Advances in intelligent systems and computing. Springer, Cham 22. Hridayami P, Putra IKGD, Wibawa KS (2019) Fish species recognition using VGG16 deep convolutional neural network. J Comput Sci Eng 13(3):124–130 23. Swasono DI, Tjandrasa H, Fathicah C (2019) Classification of tobacco leaf pests using VGG16 transfer learning. In: Proceedings 2019 international conference information and communication technology and systems ICTS 2019, Surabaya, Indonesia. IEEE, pp 176–181 24. Tammina S (2019) Transfer learning using VGG-16 with deep convolutional neural network for classifying images. Int J Sci Res 9(10):p9420 25. Lin C, Li L, Luo W, Wang KCP, Guo J (2019) Transfer learning based traffic sign recognition using inception-v3 model. Period Polytech Transp Eng 47(3):242–250 26. 
Liu Z, Yang C, Huang J, Liu S, Zhuo Y, Lu X (2021) Deep learning framework based on integration of S-Mask R-CNN and Inception-v3 for ultrasound image-aided diagnosis of prostate cancer. Futur Gener Comput Syst 114:358–367 27. Image Preprocessing Lentil disease dataset. https://drive.google.com/drive/folders/1gX2rFY2i qG44qcdEQjOl4yHumAuOCkbk. Accessed 13 Oct 2021 28. Botchkarev A (2018) Performance metrics (error measures) in machine learning regression, forecasting and prognostics: properties and typology 14:45–79
29. Rácz A, Bajusz D, Héberger K (2019) Multi-Level Comparison of machine learning classifiers and their performance metrics. Molecules 24(15):1–18 30. Zahid Hasan M, Zubair Hasan KM, Sattar A (2018) Burst header packet flood detection in optical burst switching network using deep learning model. Procedia Comput Sci 143:970–977 31. Olden JD, Joy MK, Death RG (2004) An accurate comparison of methods for quantifying variable importance in artificial neural networks using simulated data. Ecol Modell 178:389– 397 32. Zhao X, Yan X, Yu A, Van Hentenryck P (2020) Prediction and behavioral analysis of travel mode choice: a comparison of machine learning and logit models. Travel Behav Soc 20:22–35 33. Rubaiat SY, Rahman MM, Hasan MK (2019) Important feature selection accuracy comparisons of different machine learning models for early diabetes detection. In: 2018 international conference innovation in science engineering technology ICIET, Dhaka, Bangladesh. IEEE, pp. 1–6 34. Mukti IZ, Biswas D (2019) Transfer learning based plant diseases detection using ResNet50. In: 2019 4th international conference electrical information and communication technology. EICT 2019, Khulna, Bangladesh. IEEE, pp 1–6
Chapter 33
Framework for Diabetes Prediction Using Machine Learning Techniques Through Swarm Intelligence C. Kalpana and B. Booba
1 Introduction Artificial intelligence and data-driven algorithms help to manage chronic disease by adopting prediction methods and monitoring techniques [1, 2]. Off late the research Community has developed interest in the clinical support system [3] that helps to enhance the everyday life of patients. In this paper, our objective is to implement an algorithm for the accurate prediction that supports a decision support system related to diabetes. Diabetes is caused by elevation in blood sugar levels caused by either insulin action or secretion disorder. Sometimes insulin deficiency is caused by a combination of both. It is a chronic metabolic disease. There are some categories of diabetes Mellitus and gestation diabetes. Some of the symptoms of diabetes are dehydration, polydipsia, blurred vision, exhaustion, weight loss, and so on [4]. To maintain health and quality dip it is essential to have proper regulation and prevention of diabetes. The development of chronic complications of diabetes can be reduced by proper regulation of blood glucose levels [4, 5]. Based on the degree of disorder of insulin secretion, the treatment of diabetes varies. Different types of therapies are insulin medication, non-medication, and special medication. The major concern of the patient and research community about diabetes is the exorbitant treatment cost. As previously mentioned research community focuses on developing an application to support diabetes in a more effective way. The fundamental goal is to use models to make precise forecasts. Predicting glucose levels in the early stages can help diabetic patients maintain their glucose levels by taking proper medication, by regulating physical activities and by modifying the diet and life style. In recent years ML algorithm has become a supporting tool in the medical domain. ML automates the prediction of diabetes [6]. With the application of cutting
C. Kalpana (B) · B. Booba Department of CSE VISTAS, Vel’s University, Chennai, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 M. S. Uddin et al. (eds.), Proceedings of International Joint Conference on Advances in Computational Intelligence, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-19-0332-8_33
edge technology, it can help extract hidden patterns. ML uses various algorithms to study the parsed data and to predict the disease [7]. The remaining work is structured as follows: Sect. 2 reviews related literature, Sect. 3 illustrates the suggested methodology, and Sect. 4 describes the result simulation, followed by the conclusion in Sect. 5.
2 Literature Review Several researchers employed ML algorithms to foresee diabetes by exploring Pima Indian diabetes dataset (PIDD). The existing dataset has 9 parameters and 768 data of patients who were females and some work associated with these have been reviewed in the section [8, 9]. In [10] researchers predicted the types of diabetes, associated problems and the relevant medication to be prescribed to the patients. For prediction and to give appropriate treatment predictive analysis algorithm and Hadoop, map-reduce were deployed. A huge data set was collected from various laboratories, PHR, EHR, clinics processed in Hadoop. Based on the geographical locations the results were then distributed over various servers. Jiang Zheng and Aldo Dagnino [11] offered a brief assessment of Literature on big data analytics. The aim of the author was to implement, ML methods on the industrial power system to predict the faults and power load. In [12] Naive Bayes algorithm a health care prediction model was presented. The proposed methodology discovered hidden patterns related to various diseases from the disease database. This method permits the user to share healthrelated issues and then predict the appropriate disease by applying naive Bayes. The author streamlined the ML in [13] for better heart disease prediction. A new convolutions neural network model was proposed for the disease prediction algorithm. Realtime data was collected from the hospital for performance evaluation. The experiment was conducted with cerebral infraction a chronic disease. The experimental results showed the structures Naive Bayes performed better. The concept study proof was presented in [14] Simi et al. investigated the significance of timely finding of infertility in females [15] For the study author used 26 variables and classes of female infertility. Results were compared with other techniques and finally concluded that Random forest performed better with better accuracy of 85%. Lafta et al. proposed an intelligent recommender’s method to assist the practitioners and patients about the short-term risk assessment of heart failure [16]. Scholars proposed a heart disease prediction method, recommending the patient to understand the importance of taking the test and consulting a doctor. The proposed method used a real-time dataset. The evaluation performance analysis is done by using different quantification methods.
3 Methodology 3.1 Data, Feature Selection and Software Tools In this work, the PIMA Indian diabetes (PID) dataset was downloaded from the UCI repository; it originates from the National Institute of Diabetes and Digestive and Kidney Diseases (NIDDK) and contains details of female patients whose age is greater than 21. The dataset contains information on 9 attributes and 768 patients. Figure 1 exhibits the framework of the ML techniques. Table 1 displays a sample of the parameters of the dataset. The 9 parameters used for predicting diabetes are pregnancies, glucose, blood pressure, skin thickness, insulin, BMI, diabetes pedigree function, age, and the target. The outcome variable is considered the target variable, and the rest of the traits are considered independent variables. The target parameter is composed of the binary values 0 and 1, where 0 represents that the patient is not diabetic and 1 means diabetic.
Fig. 1 Framework of ML techniques
Table 1 Details of the dataset
S. no | Pregnancies | Glucose | Blood pressure | Skin thickness | Insulin | BMI | Diabetes pedigree function | Age | Outcome
0 | 6 | 148 | 72 | 35 | 0 | 33.6 | 0.627 | 50 | 1
1 | 1 | 85 | 66 | 29 | 0 | 26.6 | 0.351 | 31 | 0
2 | 8 | 183 | 64 | 0 | 0 | 23.3 | 0.672 | 32 | 1
3 | 1 | 89 | 66 | 23 | 94 | 28.1 | 0.167 | 21 | 0
4 | 0 | 137 | 40 | 35 | 168 | 43.1 | 2.288 | 33 | 1
Table 2 Statistical summary of the data
Attributes | Count | Mean | Std | Min | Max
Pregnancies | 768 | 3.845052 | 3.369578 | 0 | 17
Glucose | 768 | 121.6816 | 30.43602 | 44 | 199
Blood pressure | 768 | 72.25481 | 12.11593 | 24 | 122
Skin thickness | 768 | 26.60648 | 9.631241 | 7 | 99
Insulin | 768 | 118.6602 | 93.08036 | 14 | 846
BMI | 768 | 32.45081 | 6.875374 | 18.2 | 67.1
Diabetes pedigree function | 768 | 0.471876 | 0.331329 | 0.078 | 2.42
Age | 768 | 33.24089 | 11.76023 | 21 | 81
Outcome | 768 | 0.348958 | 0.476951 | 0 | 1
In our study, we have used ML algorithms to predict whether a patient is diabetic or non-diabetic. People of an obese disposition are likely to develop type 2 diabetes; Table 2 shows that the average BMI is 32 across the 768 records. We use Python, an open-source environment with ML and software tools, for the performance analysis of the diabetes dataset. Python provides tools for data preprocessing, classification, clustering, feature selection, and visualization. The different ML techniques are deployed in a Python Jupyter notebook, and the Python programming language was used for coding.
3.2 Data Preprocessing Data preprocessing helps to develop a better ML model and provide high accuracy. In preprocessing, different functions are used for filling missing values, outlier rejection, feature selection, and data normalization to increase the credibility of the data. In the dataset, 500 samples are tabulated as non-diabetic and 268 samples are categorized as diabetic.
3.2.1
Missing Value Identification
Using Python, we identified the missing values of the dataset represented in Table 2. Each absent value was replaced with the corresponding mean value. Using a Jupyter notebook, the dataset was filtered to trace the outliers and extreme values based on the interquartile range. There were 699 instances left after eradicating those outliers and extreme ranges from the dataset.
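A pandas sketch of the mean imputation and IQR-based outlier filtering described above is shown below; the column names follow the usual PIMA naming, and treating zeros as missing values is an assumption about how missing entries are encoded.

```python
import numpy as np
import pandas as pd

def preprocess(df):
    """Mean-impute missing readings and drop IQR outliers (illustrative only)."""
    cols = ["Glucose", "BloodPressure", "SkinThickness", "Insulin", "BMI"]
    df = df.copy()
    df[cols] = df[cols].replace(0, np.nan)          # zeros stand in for missing values
    df[cols] = df[cols].fillna(df[cols].mean())     # replace with the column mean
    q1, q3 = df[cols].quantile(0.25), df[cols].quantile(0.75)
    iqr = q3 - q1                                   # interquartile range per column
    keep = ~((df[cols] < q1 - 1.5 * iqr) | (df[cols] > q3 + 1.5 * iqr)).any(axis=1)
    return df[keep]
```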
3.2.2
Feature Selection by BPSO
PSO was introduced and has been successfully deployed on optimization problems over (nonlinear) continuous functions [15]. The application of PSO to discrete problems, on the other hand, necessitates an adaptation of the model equations. Eberhart and Kennedy (1997) introduced the discrete binary version of PSO for this situation. BPSO differs from classical PSO in two characteristics:
• The position of particle i is a binary vector:
$$X_i = (X_{i1}, X_{i2}, \ldots, X_{ie}), \quad X_{ij} \in \{0, 1\} \qquad (1)$$
• The velocity of a particle is defined in terms of the probability that a bit changes to 1. Based on this interpretation, the velocity must be mapped into (0, 1) by employing the following sigmoid function:
$$Sig(V_{ij}^{t}) = \frac{1}{1 + e^{-V_{ij}^{t}}} \qquad (2)$$
The new position is obtained by the calculation below:
$$X_{ij}^{t} = \begin{cases} 1, & \text{if } r_{ij} < Sig(V_{ij}^{t}) \\ 0, & \text{otherwise} \end{cases} \qquad (3)$$
where $r_{ij}$ is a uniform random value in the range [0, 1]. A maximum velocity Vmax is applied, often set to 4 so that $V_{ij}^{t} \in (-4, 4)$, to stop $Sig(V_{ij}^{t})$ from approaching 0 or 1. The attribute (feature) selection problem entails determining a Boolean assignment, with a fitness that quantifies the number of clauses that can be satisfied simultaneously.
BPSO pseudo-code
Initialize:
a. Designate the size of the swarm (Np) and randomly initialize the position (Xi), the velocity (Vi), and the best position (pi) of all particles.
b. Evaluate the particles to select the swarm's best position (GBEST).
c. Define the inertia and attraction coefficients (C0, C1, C2).
While the maximum number of iterations has not been reached and GBEST does not satisfy the m clauses, for each particle i = 1 to Np do:
a. Update the velocity.
b. Compute the sigmoid function (2).
c. Move to the new position using (3).
d. Evaluate the performance at the current position Xi^t and update:
• personal best: if F(Xi^t) > F(pi), then pi = Xi^t;
• swarm best: if F(Xi^t) > F(GBEST), then GBEST = Xi^t.
Table 3 SVM confusion matrix
Class | A | B
A Tested Negative | 500 | 0
B Tested Positive | 268 | 0
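The BPSO loop above can be sketched in Python as follows. The fitness used here (five-fold cross-validated Gaussian Naive Bayes accuracy on the selected feature subset) is an assumption for illustration, since the chapter does not spell out the fitness function in code; the parameter values are likewise illustrative.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB

def bpso_feature_selection(X, y, n_particles=20, n_iter=50,
                           w=0.7, c1=1.5, c2=1.5, vmax=4.0, seed=0):
    """Minimal binary PSO: each particle is a 0/1 mask over the feature columns of X."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    pos = rng.integers(0, 2, size=(n_particles, d))
    vel = rng.uniform(-1, 1, size=(n_particles, d))

    def fitness(mask):
        if mask.sum() == 0:
            return 0.0
        clf = GaussianNB()
        return cross_val_score(clf, X[:, mask.astype(bool)], y, cv=5).mean()

    pbest = pos.copy()
    pbest_fit = np.array([fitness(p) for p in pos])
    gbest = pbest[pbest_fit.argmax()].copy()
    gbest_fit = pbest_fit.max()

    for _ in range(n_iter):
        r1, r2 = rng.random((n_particles, d)), rng.random((n_particles, d))
        vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
        vel = np.clip(vel, -vmax, vmax)                 # keep the sigmoid away from 0/1
        prob = 1.0 / (1.0 + np.exp(-vel))               # Eq. (2)
        pos = (rng.random((n_particles, d)) < prob).astype(int)  # Eq. (3)
        fit = np.array([fitness(p) for p in pos])
        better = fit > pbest_fit
        pbest[better], pbest_fit[better] = pos[better], fit[better]
        if fit.max() > gbest_fit:
            gbest, gbest_fit = pos[fit.argmax()].copy(), fit.max()
    return gbest.astype(bool), gbest_fit
```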
3.3 Application of ML Techniques In this research study, detailed training is conducted on the PIDD by applying the different ML classification methods described below.
3.3.1
Support Vector Machine (SVM)
SVM is a supervised ML technique applied to classification problems. The goal of an SVM is to identify the maximum-margin hyperplane dividing the two classes in the training data. For the best generalization, the hyperplane must not fall close to the data points of either class; the hyperplane selected is the one farthest from the data points of each category. The points that fall closest to the classifier boundary are the support vectors. The accuracy of the experiment is evaluated using Python. SVM finds the optimal separating hyperplane by maximizing the distance between the two decision boundaries: the distance between the hyperplane defined by $w^{T}x + b = -1$ and the hyperplane defined by $w^{T}x + b = 1$ is $\frac{2}{\lVert w \rVert}$, so we want to maximize $\frac{2}{\lVert w \rVert}$, which is equivalent to minimizing $\lVert w \rVert$. The SVM should classify all $x_i$ correctly, i.e., $y_i(w^{T}x_i + b) \ge 1, \; \forall i \in \{1, \ldots, N\}$. The evaluation of the SVM algorithm for predicting diabetes by the confusion matrix is shown in Table 3.
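A minimal scikit-learn sketch of the SVM classifier on the PIMA data follows. The file name pima_diabetes.csv, the RBF kernel, and C = 1.0 are assumptions, since the chapter does not state the exact kernel or parameters.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score, confusion_matrix

df = pd.read_csv("pima_diabetes.csv")           # hypothetical file with the 9 PIDD columns
X, y = df.drop(columns="Outcome").values, df["Outcome"].values
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y)

svm = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))  # assumed settings
svm.fit(X_train, y_train)
pred = svm.predict(X_test)
print(confusion_matrix(y_test, pred))
print("accuracy:", accuracy_score(y_test, pred))
```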
3.3.2
Naive Bayes Classifier (NB)
It is an ML classification method that assumes all features are independent of each other. NB describes the situation in which a particular feature in a class does not disturb another feature's contribution. The algorithm is built on conditional probability. NB is observed to be a potent algorithm for classification purposes, and it functions well on data that have missing values and class imbalance issues.
33 Framework for Diabetes Prediction Using Machine Learning Techniques ... Table 4 Naive Bayes classification confusion matrix
Class | A | B
A Tested Negative | 422 | 78
B Tested Positive | 105 | 163
Naïve Bayes is a supervised ML classifier that applies Bayes' theorem, computing the posterior probability P(D|Y) from P(D), P(Y), and P(Y|D). Therefore P(D|Y) = (P(Y|D) P(D)) / P(Y), where P(D|Y) is the posterior probability of class D given predictor Y, P(Y|D) is the likelihood of the predictor given the class, P(D) is the prior probability of class D, and P(Y) is the prior probability of the predictor. The performance evaluation of the Naive Bayes classification using the confusion matrix is shown in Table 4.
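A matching Naive Bayes sketch, reusing the train/test split prepared in the SVM example above. GaussianNB is assumed here because the features are continuous; the chapter does not name the exact NB variant used.

```python
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score, confusion_matrix

nb = GaussianNB()
nb.fit(X_train, y_train)                  # same split as in the SVM sketch
nb_pred = nb.predict(X_test)
print(confusion_matrix(y_test, nb_pred))
print("accuracy:", accuracy_score(y_test, nb_pred))
```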
3.3.3
XGBoost Algorithm
XGBoost has recently been gaining importance. It is a supervised machine learning technique, also termed stochastic gradient boosting, gradient boosting, or multiple additive regression trees. It is an ensemble method that corrects the error generated by the existing model. Extreme Gradient Boosting was proposed by Chen and Guestrin [17] in 2016 and is considered an advanced estimator for classification and regression due to its ultra-high performance. XGBoost avoids overfitting by utilizing the regularized loss function:
$$L^{K}(F(x_i)) = \sum_{i=1}^{n} \ell\left(y_i, F_K(x_i)\right) + \sum_{k=1}^{K} \Omega(f_k) \qquad (4)$$
where $F_K(x_i)$ is the prediction at the Kth boosting round for sample i. The difference between the prediction and the actual label is measured using the loss function $\ell$. The regularization term $\Omega(f_k)$ is as follows:
$$\Omega(f) = \gamma T + \frac{1}{2}\lambda \lVert \omega \rVert^{2} \qquad (5)$$
Fig. 2 Confusion matrix of XGBoost
In the regularization term, the complexity of the leaves is controlled by γ, the number of leaves is denoted by T, λ represents the penalty parameter, and the output of a leaf node is ω. XGBoost differs from GBDT in that it expands the objective function as a second-order Taylor series, so Eq. (4) can be rewritten as:
$$L^{K} \cong \sum_{i=1}^{n}\left[g_i f_k(x_i) + \frac{1}{2} h_i f_k^{2}(x_i)\right] + \Omega(f_k) \qquad (6)$$
where $g_i$ and $h_i$ represent the first-order and second-order gradient statistics of the loss function, and $I_j$ designates the sample set of leaf j. The converted Eq. (6) is as follows:
$$L^{K} = \sum_{j=1}^{T}\left[\left(\sum_{i \in I_j} g_i\right)\omega_j + \frac{1}{2}\left(\sum_{i \in I_j} h_i + \lambda\right)\omega_j^{2}\right] + \gamma T \qquad (7)$$
In this way, the objective function is changed into the problem of finding the minimum of a quadratic function [18]. In addition, to tackle the over-fitting problem, XGBoost follows the idea of GBDT and uses a learning rate, maximum tree depth, number of boosting rounds, and subsampling. Figure 2 demonstrates the confusion matrix of XGBoost.
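An XGBoost sketch in the same setting is shown below. Every hyperparameter shown (number of trees, depth, learning rate, subsampling, λ, γ) is an illustrative assumption mapping onto the quantities discussed above, not the values used in the chapter; the X_train/X_test split comes from the SVM sketch earlier.

```python
from xgboost import XGBClassifier
from sklearn.metrics import accuracy_score, confusion_matrix

xgb = XGBClassifier(
    n_estimators=200,      # boosting rounds
    max_depth=4,           # maximum tree depth
    learning_rate=0.1,     # shrinkage
    subsample=0.8,         # row subsampling
    reg_lambda=1.0,        # L2 penalty lambda on leaf weights
    gamma=0.0,             # complexity penalty gamma per leaf
    eval_metric="logloss",
)
xgb.fit(X_train, y_train)
pred = xgb.predict(X_test)
print(confusion_matrix(y_test, pred))
print("accuracy:", accuracy_score(y_test, pred))
```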
4 Result and Discussion Three ML classifiers, SVM, Naïve Bayes, and XGBoost, were applied and evaluated in this study on the PIMA dataset. The dataset was split into two sets, 70% as the training set and 30% as the testing set. The important evaluation parameter used in this study was prediction accuracy; the overall success of an algorithm is its accuracy, which may be computed through the confusion matrix. The layout of the confusion matrix is illustrated below:
 | Predicted No (0) | Predicted Yes (1)
Actual No (0) | TN | FP
Actual Yes (1) | FN | TP
FP represents false positives, FN false negatives, TP true positives, and TN true negatives. Equations (8)–(11) are used to compute the performance evaluation of the ML models:
$$\text{Accuracy} = \frac{TP+TN}{P+N} \qquad (8)$$
$$\text{Recall} = \frac{TP}{TP+FN} \qquad (9)$$
$$\text{Precision} = \frac{TP}{TP+FP} \qquad (10)$$
$$\text{F-Measure} = \frac{2 \times (\text{Precision} \times \text{Recall})}{\text{Precision} + \text{Recall}} \qquad (11)$$
(11)
The Confusion Matrix of SVM, Naive Bayes classifiers for cross-validation, and trains split are denoted in Table. An above 70% accuracy can be observed in all classification methods. However, XGBoost is giving better accuracy. The naive Bayes algorithm can predict the occurrence of diabetes with better accuracy of 82% when compared with the different algorithms. Figure 3. Demonstrates the significant features of the dataset. It is detected that plasma Glucose has more significance among other features. BMI, age is considered as second and third vital features. It can be précised that these parameters play a crucial role to forecast whether a patient has diabetes or not. Figure (4) represents Auroc Plot of XGBoost. Table 5 Confusion Matrix predicted by ML algorithms Algorithm
Algorithm | TP | FN | FP | TN
SVM | 37 | 16 | 37 | 141
Naive Bayes | 52 | 33 | 28 | 118
XGBooster | 42 | 17 | 34 | 138
Fig. 3 Importance of feature selection
Fig. 4 Auroc Plot of XGBoost
5 Conclusion Accurate early detection of diabetes has always been a real-world challenge; in this research work, a system was modeled in a systematic manner to predict diabetes disease. Three ML algorithms were applied and evaluated on different measures. The computed results demonstrate the capability of the designed system, which was able to achieve an accuracy of 82% using the XGBoost model. The research work can be improved and extended for automating diabetes analysis by incorporating other ML techniques. In this paper, we have proposed a framework that uses PSO for the parameter
selection process. Feature selection is done to eliminate irrelevant and noisy features that create a negative effect on the classification, and we have detected the key features that cause diabetes. Application of the proposed methodology on the dataset achieved promising results, and the highest classification accuracy was obtained by employing a few significant features. A future study can be carried out by developing a model that deploys the firefly algorithm for feature selection and by developing a firefly-based parallel distributed process that permits all fireflies to be processed simultaneously, which would reduce the processing time.
References 1. Aiello EM, Lisanti G, Magni L, Musci M, Toffanin C (2020) Therapy-driven deep glucose forecasting, Eng Appl Artif Intell 87:103255 2. Jia P, Zhao P, Chen J, Zhang M (2019) Evaluation of clinical decision support systems for diabetes care: an overview of current evidence. J Eval Clin Pract 25(1):66–77 3. Hosni M, Abnane I, Idri A, de Gea JMC, Alemán JLF (2019) Reviewing ensemble classification methods in breast cancer. Comput Methods Program Biomed 4. Joslin EP, Kahn CR, Joslin’S Diabetes Mellitus (2005). Edited by Rronald Kahn C, et al Lippincott Williams & Wilkins 5. Georga EI et al (2013) Multivariate prediction of subcutaneous glucose concentration in type 1 diabetes patients based on support vector regression. IEEE J Biomed Health Inform 17(1):71–81 6. Kumar NMS, Eswari T, Sampath P, Lavanya S (2015) Predictive methodology for diabetic data analysis in big data. Procedia Comput Sci 50:203–208 7. Zheng J, Dagnino A (2014) An initial study of predictive machine learning analytics on large volumes of historical data for power system applications. In: 2014 IEEE international conference on big data (Big Data), pp 952–959 8. International Journal of Advanced Computer and Mathematical Sciences (2010). Bi Publication-Bio IT Journals 9. Chen M, Hao Y, Hwang K, Wang L, Wang L (2017) “Disease prediction by machine learning over big data from healthcare communities. IEEE Access 5:8869–8879 10. Taylor RA et al (2016) Prediction of in-hospital mortality in emergency department patients with sepsis: a local big datadriven, machine learning approach. Acad Emerg Med 23(3):269–278 11. Lafta R, Zhang J, Tao X, Li Y, Tseng VS (2015) An intelligent recommender system based on short-term risk prediction for heart disease patients. In: 2015 IEEE/WIC/ACM international conference on web intelligence and intelligent agent technology (WI-IAT), pp 102–105 12. Chaurasiya RK, Khan MI, Karanjgaokar D, Prasanna BK (2020) BPSO-based feature selection for precise class labeling of diabetic retinopathy images. In: Venkata Rao R, Taler J (eds) Advanced engineering optimization through intelligent techniques. advances in intelligent systems and computing, vol 949. Springer, Singapore (2020). https://doi.org/10.1007/978981-13-8196-6_24 13. Khanesar MA, Teshnehlab M, Shoorehdeli MA (2007) A novel binary particle swarm optimization. In: 2007 mediterranean conference on control & automation, pp 1- 6. IEEE 14. Sarwar MA, et al (2018) Prediction of diabetes using machine learning algorithms in healthcare. In: 2018 proceedings of the 24th international conference on automation & computing 15. Unnikrishnan R, Anjana RM, Mohan V (2016) Diabetes mellitus and its complications in India. Nat Rev Endocrinol 12(6):357 16. Babu GR et al (2018) Association of obesity with hypertension and type 2 diabetes mellitus in India: a meta-analysis of observational studies. World J Diab 9(1):40
17. Nemade DR, Gupta RK (2020) IEEE Xplore (2020) 18. Saiti K, Macas M, et al (2020) Ensemble methods in combination with compartment models for blood glucose level prediction in type 1 diabetes mellitus. Comput Methods Programs Biomed 196:105628
Chapter 34
Statistical Post-processing Approaches for OCR Texts Quoc-Dung Nguyen , Duc-Anh Le, Nguyet-Minh Phan, Nguyet-Thuan Phan, and Pavel Kromer
1 Introduction The low accuracy of Optical Character Recognition (OCR) systems causes negative impacts on the readability of OCR-generated texts as well as their readiness for practical use. Errors generated from an OCR process often come from limitations of text recognition methods together with poor quality, unusual fonts, or layouts of the source documents. The OCR-generated texts need to be post-processed to improve their quality. OCR post-processing can be applied as the last step of an OCR system or to the erroneous OCR texts. There are three possible ways to improve the quality of OCR texts: modifying input images, optimizing OCR system, and post-processing OCR texts. The third way is preferable as it does not depend on any specific OCR system. It aims to automatically detect and correct kinds of errors in the OCR texts. Automatic OCR post-processing Q.-D. Nguyen (B) Faculty of Engineering, Van Lang University, 69/68 Dang Thuy Tram Street, Ward 13, Binh Thanh District, Ho Chi Minh City, Vietnam e-mail: [email protected] D.-A. Le The Institute of Statistical Mathematics, Tokyo 101-8430, Japan N.-M. Phan Sai Gon University, 273 An Duong Vuong, Ward 3, District 5, Ho Chi Minh City, Vietnam e-mail: [email protected] N.-T. Phan University of Science, VNU-HCM, 227 Nguyen Van Cu, Ward 4, District 5, Ho Chi Minh City, Vietnam e-mail: [email protected] Q.-D. Nguyen · P. Kromer Technical University of Ostrava, 17. listopadu 15, 708 33 Ostrava-Poruba, Czech Republic e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 M. S. Uddin et al. (eds.), Proceedings of International Joint Conference on Advances in Computational Intelligence, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-19-0332-8_34
approaches can be categorized into two main types: corpus-based type and hybrid type. In the corpus-based approaches, n-gram dictionaries are created from large external corpora and resources and used for OCR error detection and correction [1, 2]. However, the performance of this approach type is limited due to the coverage of the constructed dictionaries. The reason is that such external resources only cover the specific genres of documents and are available for a period. The second type of approach is often a combination of different models and techniques among characterlevel and word-level n-gram language models [3–5], error channel models [3, 5], machine learning [3–6], optimization algorithm [6], statistical and neural machine translation [7–9]. Our recent statistical approach [10] employs the word-based n-gram language models and OCR error model based on the linguistic features including word similarity, word n-gram context frequency, and character edit probability learned from OCR training texts. Correction candidates are generated with character edits followed by the scoring and ranking process using a weighted fitness function of such features. In the other recent paper [11], we have applied the statistical n-gram language models and the candidate generation model that is relied on random character patterns under the control of the evolutionary Self-Organizing Migrating Algorithm (SOMA). In this paper, we present the important linguistic features which have been successfully employed in the above OCR post-processing approaches regarding their meaning and originality in language modeling. These linguistic features include word frequency (unigram frequency), word n-gram context frequency, word similarity, and character edit probability. While the word frequency and word n-gram context frequency are computed based on the n-gram statistics from an external text corpus, the character edit probability utilizes the statistics of correction character patterns extracted from training OCR texts. Our OCR post-processing models using these linguistic features are depicted as well. Our models are evaluated on the benchmark databases to see how well they perform in OCR error correction in comparison with the other approaches. The rest of the paper is organized as follows. In Sect. 2, the statistical linguistic features are presented and discussed in detail. Our statistical language models (SLM) are shown and evaluated on the benchmark databases in Sect. 3. Finally, Sect. 4 gives conclusions.
2 Statistical Linguistic Features
In this section, we describe the linguistic features, consisting of the corpus-based features employed in the language models and the OCR error features used in the error models. We discuss how these features are derived and their relation to language modeling in the literature.
2.1 Corpus-Based Features
N-gram dictionaries are often created from a text corpus in the reference language. They include a unigram dictionary for single words, a bigram dictionary for contexts of two consecutive words (called bigram contexts), a trigram dictionary for contexts of three consecutive words (called trigram contexts), etc. These dictionaries can be further extended using the Ground Truth (GT) texts in the training dataset. In OCR error detection, words that are not in the unigram dictionary are considered non-word errors, while the bigram and trigram dictionaries are used to detect errors that are real words but in the wrong context (called real-word errors). In general, an n-gram language model [12] is used to calculate the probability $P(w_n \mid w_1^{n-1})$ of the word $w_n$ knowing its n-1 preceding words. In the bigram model, this probability becomes the conditional probability $P(w_n \mid w_{n-1})$ of the word $w_n$ following the word $w_{n-1}$. Similarly, in the trigram model, this probability is the conditional probability $P(w_n \mid w_{n-2} w_{n-1})$ of the word $w_n$ following the two preceding words $w_{n-2} w_{n-1}$. To simplify the calculation, we consider this probability to be equivalent to the occurrence frequency of the n-gram context in the corresponding n-gram dictionary, such as the bigram or trigram dictionary. N-gram contexts have been applied in many OCR text post-processing solutions [1–6, 10, 11]. The n-gram language model is used to check the correctness of a correction candidate based on its n-gram contexts in the bigram and trigram dictionaries. The more frequently a correction candidate with the corresponding contexts appears in an n-gram dictionary, the better its chance of being selected to substitute for the error word. Assuming $w_{c_i}$ is one of the correction candidates of the error word $w_e$, the bigram context frequency of $w_{c_i}$ is defined as the sum (or product) of the frequencies of the left and right bigram contexts of $w_{c_i}$, normalized as follows:

$B(w_{c_i}) = \dfrac{f_2(w_{c_{i-1}} w_{c_i}) + f_2(w_{c_i} w_{c_{i+1}})}{\max_{w_{c_i} \in W} \left( f_2(w_{c_{i-1}} w_{c_i}) + f_2(w_{c_i} w_{c_{i+1}}) \right)} \quad (1a)$

$B(w_{c_i}) = \dfrac{f_2(w_{c_{i-1}} w_{c_i}) \cdot f_2(w_{c_i} w_{c_{i+1}})}{\max_{w_{c_i} \in W} \left( f_2(w_{c_{i-1}} w_{c_i}) \cdot f_2(w_{c_i} w_{c_{i+1}}) \right)} \quad (1b)$

where $w_{c_{i-1}}$ and $w_{c_{i+1}}$ are the preceding and following words of $w_{c_i}$ respectively, $W$ is the set of correction candidates of $w_e$, and $f_2(\cdot)$ is the bigram frequency. The choice of sum or product of the bigram frequencies depends on the model design. Similarly, the normalized trigram context frequency of $w_{c_i}$ is the sum (or product) of the frequencies of the trigram contexts of $w_{c_i}$ in reference to the maximum sum (or product) of the trigram context frequencies of a correction word in $W$. Let $f_3(\cdot)$ be the trigram frequency.
$T(w_{c_i}) = \dfrac{f_3(w_{c_{i-2}} w_{c_{i-1}} w_{c_i}) + f_3(w_{c_{i-1}} w_{c_i} w_{c_{i+1}}) + f_3(w_{c_i} w_{c_{i+1}} w_{c_{i+2}})}{\max_{w_{c_i} \in W} \left( f_3(w_{c_{i-2}} w_{c_{i-1}} w_{c_i}) + f_3(w_{c_{i-1}} w_{c_i} w_{c_{i+1}}) + f_3(w_{c_i} w_{c_{i+1}} w_{c_{i+2}}) \right)} \quad (2a)$

$T(w_{c_i}) = \dfrac{f_3(w_{c_{i-2}} w_{c_{i-1}} w_{c_i}) \cdot f_3(w_{c_{i-1}} w_{c_i} w_{c_{i+1}}) \cdot f_3(w_{c_i} w_{c_{i+1}} w_{c_{i+2}})}{\max_{w_{c_i} \in W} \left( f_3(w_{c_{i-2}} w_{c_{i-1}} w_{c_i}) \cdot f_3(w_{c_{i-1}} w_{c_i} w_{c_{i+1}}) \cdot f_3(w_{c_i} w_{c_{i+1}} w_{c_{i+2}}) \right)} \quad (2b)$

In particular, the normalized unigram frequency of $w_{c_i}$ does not depend on any context and is simply defined as below. Let $f_1(\cdot)$ be the unigram frequency.

$U(w_{c_i}) = \dfrac{f_1(w_{c_i})}{\max_{w_{c_i} \in W} f_1(w_{c_i})} \quad (3)$
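To make Eqs. (1a)–(3) concrete, the following Python sketch scores a set of correction candidates against unigram and bigram frequency dictionaries. The dictionaries, the candidate set, and the context words are illustrative assumptions, not data from the models in [10, 11]; the trigram score T follows the same pattern with three-word contexts.

```python
from typing import Dict, List, Tuple

def context_scores(candidates: List[str], left: str, right: str,
                   f1: Dict[str, int],
                   f2: Dict[Tuple[str, str], int]) -> Dict[str, Tuple[float, float]]:
    """Return (U, B) per candidate: Eq. (3) and the sum variant of Eq. (1a)."""
    raw_u = {c: f1.get(c, 0) for c in candidates}
    raw_b = {c: f2.get((left, c), 0) + f2.get((c, right), 0) for c in candidates}
    max_u = max(raw_u.values()) or 1          # guard against all-zero counts
    max_b = max(raw_b.values()) or 1
    return {c: (raw_u[c] / max_u, raw_b[c] / max_b) for c in candidates}

# Hypothetical counts for the OCR error "tbe" appearing in the context "of tbe year".
f1 = {"the": 9500, "tube": 40, "toe": 12}
f2 = {("of", "the"): 3200, ("the", "year"): 450, ("of", "tube"): 1}
print(context_scores(["the", "tube", "toe"], "of", "year", f1, f2))
```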
2.2 OCR Error Features
The OCR error model is used to generate correction candidates through character edit operations such as substitution, insertion, and deletion. Character edits can be learned from the training text dataset, where each text consists of an OCR text and a corresponding GT text aligned at the character level. According to the studies [13, 14] on the statistics of OCR errors, most OCR errors have edit distances of 1 and 2. Given a character edit $s: x \to y$, the error character pattern $x$ in the OCR error word is chosen to be 1 or 2 characters long, and the correction character pattern $y$ in the corresponding GT correct word also has 1 or 2 characters. Each error character pattern $x$ in the OCR text can correspond to one or more correction character patterns $y$ found in the GT text. A character edit table $S$ includes the character edits $s$ and their occurrence frequencies $f(s)$ in the training dataset. The OCR error model determines the probability that a GT correct word becomes an OCR error word during the OCR process, which is the conditional probability of the error word $w_e$ given the correct word $w_c$, $P(w_e \mid w_c)$. Equivalently, the conditional probability of the correct word $w_c$ given the error word $w_e$ is computed as the product of the frequencies of the character edits $s_i \in S_{w_e \to w_c}$ that transform the error word $w_e$ into the correct word $w_c$:

$P(w_c \mid w_e) = \prod_{s_i \in S_{w_e \to w_c}} f(s_i) \quad (4)$

where $S_{w_e \to w_c}$ is the set of character edits needed to transform $w_e$ into $w_c$.
Assuming there is more than one correction candidate $w_{c_i}$ for the error word $w_e$, the normalized character edit probability of transforming $w_e$ into $w_c$, denoted as $\hat{P}(w_c \mid w_e)$, is calculated as:

$\hat{P}(w_c \mid w_e) = \dfrac{P(w_c \mid w_e)}{\max_i P(w_{c_i} \mid w_e)} \quad (5)$
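A minimal sketch of Eqs. (4) and (5) is given below, assuming the character-level alignment has already produced the edit list for each candidate and that the character edit table stores relative frequencies; the edits and values shown are illustrative only.

```python
def edit_probability(edits, edit_freq):
    """Eq. (4): product of the frequencies of the character edits that turn
    the error word into the candidate; `edits` is a list of (error pattern,
    correction pattern) pairs produced by the character-level alignment."""
    score = 1.0
    for edit in edits:
        score *= edit_freq.get(edit, 0.0)
    return score

def normalized_edit_probability(candidate_edits, edit_freq):
    """Eq. (5): scale each candidate's raw score by the best score in the set."""
    raw = {c: edit_probability(e, edit_freq) for c, e in candidate_edits.items()}
    best = max(raw.values()) or 1.0
    return {c: v / best for c, v in raw.items()}

# Hypothetical edits for the OCR error "tbe": "the" needs b->h, "tube" needs b->ub.
edit_freq = {("b", "h"): 0.62, ("b", "ub"): 0.03}
print(normalized_edit_probability({"the": [("b", "h")], "tube": [("b", "ub")]}, edit_freq))
```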
In addition, the similarity between the correction word and the error word, suggested by [1] and applied in [4, 5, 10, 11], is another important feature used in this paper. The word similarity, denoted as $S(w_c, w_e)$, can be viewed as a comparison of the word pair $w_c$ and $w_e$ that finds the length of the longest sequence of similar characters between the two words, normalized by their total length. The closer the value $S(w_c, w_e)$ is to 1, the more similar the two words are. Conversely, $S(w_c, w_e)$ also shows the degree of difference between the two words. By aligning the two words at the character level, dissimilar pairs of character strings in the corresponding positions between the two words can be identified. Finally, the correction candidate $w_c$ of the error word $w_e$ is ranked based on the fitness function $F_{score}(w_c)$, which is a weighted sum of the above linguistic features $U(w_c)$, $B(w_c)$, $T(w_c)$, $\hat{P}(w_c \mid w_e)$ and $S(w_c, w_e)$. Since the weights sum up to 1, the fitness score of a correction candidate is in the range [0, 1] and can be regarded as the confidence, or probability, of substituting the error word with the correction word. The closer the score of a correction candidate is to 1, the more confidently it can be used to substitute for the error word. The correction candidate with the highest fitness score is selected as the final correction of $w_e$.
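The ranking step can then be sketched as a plain weighted sum; the weights and feature values below are placeholders, not the tuned values from [10, 11].

```python
def fitness_score(features: dict, weights: dict) -> float:
    """Weighted sum of the linguistic features; since the weights sum to 1,
    the score stays in [0, 1] and acts as a substitution confidence."""
    return sum(weights[name] * features[name] for name in weights)

weights = {"U": 0.10, "B": 0.25, "T": 0.25, "S": 0.20, "P": 0.20}   # assumed weights
candidates = {
    "the":  {"U": 1.00, "B": 1.00, "T": 0.90, "S": 0.86, "P": 1.00},
    "tube": {"U": 0.004, "B": 0.0003, "T": 0.00, "S": 0.75, "P": 0.05},
}
best = max(candidates, key=lambda c: fitness_score(candidates[c], weights))
print(best, fitness_score(candidates[best], weights))
```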
3 Datasets and Statistical Language Models In this section, we introduce two databases established as the benchmarks for the text recognition and correction competitions, one for Vietnamese texts [15] and one for English/French texts [16]. Then, our two OCR post-processing models using the statistical linguistic features above are presented and evaluated on these two databases.
3.1 Databases
VNOnDB Database. The VNOnDB database [17] was used in the Vietnamese online handwriting recognition competition (https://sites.google.com/view/icfhr2018-vohtr-vnondb/home) [15]. The database contains handwritten texts written in different styles by 200 Vietnamese writers.
Fig. 1 The four processing phases of our statistical hybrid model
Three datasets at paragraph,
line, and word levels are provided with the ink data and ground truth (GT). In our experiments, the OCR output texts are generated from an OCR system based on the attention-based encoder-decoder (AED) model [18] with a DenseNet encoder and an attention-based long short-term memory (LSTM) decoder. The online handwritten texts in the VNOnDB-Line dataset are converted to images which are input to the AED model to generate the OCR texts. The GT texts and corresponding OCR texts are aligned at the character level. We use the GT-OCR texts from the training and validation sets (5,662) for training purposes and the texts from the test set (1,634) for our model evaluation.
ICDAR Database. The ICDAR database was proposed for the OCR post-correction competition (https://sites.google.com/view/icdar2017-postcorrectionocr) of the ICDAR 2017 conference [16]. It comes from different digital sources of the British Library (BL), the French National Library (BnF), Europeana Newspapers, Wikisource, etc. The database contains Gold Standard (GS) texts and corresponding OCR texts aligned at the character level. The English monograph dataset with 666 training documents and 81 evaluation documents is employed for the experiments. The training documents are used to learn OCR error features by extracting correction character patterns, and the evaluation documents are used to evaluate our proposed approach.
3.2 Statistical Language Models
Proposed Model on the VNOnDB Database. In [10], we propose a hybrid model for detecting and correcting Vietnamese OCR errors based on the linguistic features and the OCR error characteristics. The proposed model includes four processing phases consisting of tokenization, non-syllable and real-syllable error detection, edit-based candidate generation, and error correction (see Fig. 1). The model is summarized as follows. We first construct the word n-gram dictionaries along with their occurrence frequencies using the VietTreeBank (VTB) corpus [19]. The OCR text is split into syllables by spaces in the tokenization phase. Punctuation at the end of syllables is also removed. Next, the OCR text is scanned to detect errors of the non-syllable and real-syllable kinds. In the candidate generation phase, the character edit table (CET)
is created by aligning the GT and OCR texts to find the dissimilar character patterns of one or two characters in length. In the CET table, each error character pattern from the OCR text is mapped to the correction character patterns learned from the aligned GT text, and their observed frequencies are also recorded. Then, candidates are generated by substituting error character patterns in the error syllables with correction character patterns obtained from the CET table. If a candidate or its contexts appear in any of the n-gram dictionaries, it is considered a correction candidate; otherwise, it is eliminated. Finally, the correction candidates are ranked using the weighted fitness function of the linguistic features and OCR error characteristics, including bigram and trigram context frequency, syllable similarity, and character edit probability. The highest-scored candidates are selected to substitute for the error syllables. Table 1 shows the text recognition and correction results of the proposed models on the VNOnDB-Line dataset. The evaluation metrics are the character error rate (CER) and word error rate (WER) [15]. They are inverse metrics, meaning that a lower error value indicates a better model. Most of the approaches in Table 1 include a post-processing step that follows the text recognition system. For example, the Google team makes use of n-gram language models at both the character level and the word level on their private corpus. For the IVTOV system, dictionary constraints are applied to the bidirectional LSTM (BLSTM) network's output sequence. The MyScript system is post-processed with a syllable-level trigram language model trained on several corpora including VTB. The MyScript system achieves the best performance on both the CER and WER metrics (1.02 and 2.02% respectively). Our proposed SLM model with 4.17% CER and 9.82% WER outperforms the other models of Google, IVTOV, and AED [18, 20], except for the MyScript system. Furthermore, our model improves the baseline AED model with the DenseNet encoder [18] by 0.5% CER and 3.5% WER.
Table 1 Post-processing performance on the VNOnDB-Line dataset [10]

| Model | Language model | Corpus | CER (%) | WER (%) |
| AED model with CNN-BLSTM encoder [20] | None | None | 7.17 | NA* |
| Google Task | Character and word n-gram | Other | 6.86 | 19.00 |
| IVTOV Task | Dictionary | VTB | 3.24 | 14.11 |
| AED model with DenseNet encoder [18] | None | None | 4.67 | 13.33 |
| BLSTM-based NMT model [21] | Syllable-based unigram and NMT | VTB | NA | 11.5 |
| MyScript Task 1 | Syllable-based trigram | VTB | 1.02 | 2.02 |
| MyScript Task 2 | Syllable-based trigram | VTB + others | 1.57 | 4.02 |
| Proposed SLM model | Syllable-based n-gram and character edits | VTB | 4.17 | 9.82 |
* Not available
It also has a better WER rate than the recent BLSTM-based neural machine translation (NMT) model [21] on the same VNOnDB-Line-based OCR text dataset. For candidate suggestion examples of the correct and incorrect cases, the readers can refer to our previous work [10].
Proposed Model on the ICDAR Database. In [11], we present a model of OCR post-processing based on random character patterns under the control of the adapted SOMA evolutionary algorithm. The proposed model consists of four main phases shown in Fig. 2. These phases are briefly described as follows. The English one-billion-word corpus [22] is utilized to construct the word n-gram dictionaries. In the first phase, the OCR text is separated by spaces into tokens (words). The original OCR tokens are preserved when the tokenization is done without any restriction on punctuation. Then, by verifying whether the tokens in the OCR text exist in the word unigram vocabulary, errors such as misspellings, run-on errors, and unexpected errors are detected. Candidate generation includes two parts: candidate search and feature scoring. A table of correction patterns is constructed, where the correction patterns are directly learned from the training dataset. They involve all types of character edit operations such as substitution, insertion, and deletion. The candidate search process is performed under the SOMA migration loops using random correction patterns to generate correction candidates. For detailed descriptions of the candidate search process and the adapted SOMA algorithm, the readers can refer to [11]. Next, in each migration loop, the candidates are selected according to their weighted fitness scores using the five linguistic features, including word frequency, bigram and trigram context frequency, word similarity, and substitution probability. Finally, the top-scored candidates at the last migration loop are used to replace the detected errors in the error correction phase. Table 2 depicts the OCR error detection and correction results of the different approaches on the English monograph texts. The error detection performance is based on the F1 metric [16], computed as the harmonic mean of precision and recall. The correction performance is determined by the improvement percentage [16] obtained when comparing the Levenshtein distance between the corrected text and the GS text with that between the OCR text and the GS text. In Table 2, we only consider the approaches that obtained improved performance on the OCR error correction task. They include statistical language and error models (Anavec, EFP, WFST-PostOCR [16]), evolutionary and optimization algorithms (SLM-SOMA [11], SLM-HC [24]), machine learning (Modified-prob.SLM [5]), statistical machine translation (MMDT [8]) and neural machine translation (CLAM [16], Char-SMT/NMT [9], and NMT with the bidirectional encoder representations from transformers (BERT) [23]).
Fig. 2 The four processing phases of our SLM-SOMA model
Table 2 OCR post-processing performance on the English monograph dataset [11]

| Model | Detection F1 (%) | Correction improvement (%) |
| Anavec | NA* | 5 |
| EFP | 69 | 13 |
| MMDT [8] | 66 | 20 |
| WFST-PostOCR | 73 | 28 |
| CLAM | 67 | 29 |
| Modified-prob.SLM [5] | NA | 30.2 |
| NMT-BERT [23] | 72 | 36 |
| Char-SMT/NMT [9] | 67 | 43 |
| SLM-HC [24] | 69 | 33.7** |
| Proposed SLM-SOMA | 69 | 33.7 |
* Not available. ** Rounded to one decimal place.
For the OCR error detection task, the best performer is WFST-PostOCR, which employs n-gram language models and probabilistic character error models to address and correct OCR errors. The Char-SMT/NMT approach, a character-based ensemble model, obtains the highest correction improvement. Our proposed SLM-SOMA model achieves competitive detection performance and shows comparable correction results to those of the best performers. Besides, the SLM-SOMA model has a similar correction performance to our other SLM model based on the Hill Climbing (HC) optimization algorithm (called SLM-HC) [24]. The two SLM models utilize the same five linguistic features in the fitness function to score and rank correction candidates. Nevertheless, while the SLM-SOMA model employs the evolutionary SOMA algorithm to direct the correction candidate exploration through migration loops using random character edits, the correction generation process in the SLM-HC model selects random character positions along the error word and applies random character edits following the adapted HC algorithm steps. For further detailed analyses of the randomness and complexity of these SLM models, the readers can refer to the related papers [11, 24]. In general, the SLM models prove to be a competitive approach to the OCR post-processing problem, and they could obtain promising and better performance in OCR error detection and correction when combined with other methods such as optimization algorithms [11, 24] and NMT [9, 23].
4 Conclusions
In this paper, we describe the statistical linguistic features, including word frequency, bigram and trigram context frequency, word similarity, and character edit probability, with regard to their meaning and relations to language modeling. The proposed SLM models based on these features perform well on the two benchmark databases. In future work, we would like to propose an OCR correction approach based on optimization algorithms for the VNOnDB database. In addition, it is worth applying state-of-the-art deep learning techniques for sequence data processing, such as BERT [25], to improve error detection and correction performance. The aim is to have a full comparison of the various types of OCR post-processing approaches on Vietnamese OCR texts.
References 1. Islam A, Inkpen D (2009) Real-word spelling correction using Google Web 1T n-gram data set. In: Proceedings of the 18th ACM conference on information and knowledge management. ACM, New York, pp 1689–1692 2. Bassil Y, Alwani M (2012) OCR post-processing error correction algorithm using Google’s Online spelling suggestion. J Emerg Trends Comput Inf Sci 3(1):90–99 3. Kissos I, Dershowitz N (2016) OCR error correction using character correction and featurebased word classification. In: 12th IAPR workshop on document analysis systems (DAS), pp 198–203 4. Mei J, Islam A, Moh’d A, Wu Y, Milios E (2018) Statistical learning for OCR error correction. Inf Process Manag 54(6):874–887 5. Nguyen TTH, Coustaty M, Doucet A, Jatowt A, Nguyen NV (2018) Adaptive edit-distance and regression approach for Post-OCR text correction. In: Dobreva M, Hinze A, Žumer M (eds) Maturity and Innovation in Digital Libraries. ICADL 2018. Lecture Notes in Computer Science, vol 11279. Springer, Cham, pp 278–289 https://doi.org/10.1007/978-3-030-042578_29 6. Khirbat G (2017) OCR post-processing text correction using simulated annealing (OPTeCA). In: Proceedings of the Australasian language technology association workshop 2017, Brisbane, Australia, pp 119–123 7. Afli H, Qiu Z, Way A, Sheridan P (2016) Using SMT for OCR error correction of historical texts. In: Proceedings of the tenth international conference on language resources and evaluation, Paris, France. European Language Resources Association (ELRA), pp 962–966 8. Schulz S, Kuhn J (2017) Multi-modular domain-tailored OCR post-correction. In: Proceedings of the 2017 conference on empirical methods in natural language processing association for computational linguistics, Copenhagen, Denmark, pp 2716–2726 9. Amrhein C, Clematide S (2018) Supervised OCR error detection and correction using statistical and neural machine translation methods. J Lang Technol Comput Linguist 33(1):49–76 10. Nguyen DQ, Le AD, Zelinka I (2019) OCR error correction for unconstrained Vietnamese handwritten text. In: Proceedings of the tenth international symposium on information and communication technology. Association for Computing Machinery, New York, pp 132–138 11. Nguyen DQ, Le AD, Phan MN, Zelinka I (2020) OCR error correction using correction patterns and self-organizing migrating algorithm. J Pattern Anal Appl 24(2):701–721 12. Jurafsky D, Martin J (2008) Speech and language processing: an introduction to natural language processing, computational linguistics, and speech recognition, 2nd edn. Prentice Hall (2008)
13. Nguyen QD, Le DA, Phan NM, Zelinka I (2020) An in-depth analysis of OCR errors for unconstrained Vietnamese handwriting. In: Dang TK, Küng J, Takizawa M, Chung TM (eds) Future data and security engineering. FDSE 2020. Lecture Notes in Computer Science, vol 12466. Springer, Cham. https://doi.org/10.1007/978-3-030-63924-2_26 14. Nguyen HTT, Jatowt A, Coustaty M, Nguyen VN, Doucet A (2019) Deep statistical analysis of OCR errors for effective post-OCR processing. In: 2019 ACM/IEEE joint conference on digital libraries, Champaign, IL, USA, pp 29–38 15. Nguyen HT, Nguyen CT, Nakagawa M (2018) ICFHR 2018 - competition on Vietnamese online handwritten text recognition using HANDS-VNOnDB (VOHTR 2018). In: 16th international conference on frontiers in handwriting recognition (ICFHR), pp 494–499 16. Chiron G, Doucet A, Coustaty M, Moreux J (2017) ICDAR 2017 competition on Post-OCR text correction. In: 14th IAPR international conference on document analysis and recognition, Kyoto, Japan, vol 01, pp 1423–1428 17. Nguyen HT, Nguyen CT, Pham BT, Nakagawa M (2018) A database of unconstrained Vietnamese online handwriting and recognition experiments by recurrent neural networks. Pattern Recogn 78:291–306 18. Le AD, Nguyen HT, Nakagawa M (2020) An end-to-end recognition system for unconstrained Vietnamese handwriting. SN Comput Sci 1(7):18 19. Nguyen TP, Vu LX, Nguyen HTM, Nguyen HV, Le PH (2009) Building a large syntactically annotated corpus of Vietnamese. In: Proceedings of the 3rd linguistic annotation workshop ACL-IJCNLP 2009. Association for computational linguistics, Stroudsburg, pp 182–185 20. Le AD, Nguyen HT, Nakagawa M (2018) Recognizing unconstrained Vietnamese handwriting by attention based encoder decoder model. In: 2018 international conference on advanced computing and applications (ACOMP), pp 83–87 21. Nguyen DQ, Le AD, Phan MN, Kromer P, Zelinka I (2021) OCR error correction for vietnamese handwritten text using neural machine translation. In: The 1st international conference on Van Lang heritage and technology. AIP conference proceedings, vol 2406, pp 020022 22. Chelba C, Mikolov T, Schuster M, Ge Q, Brants T, Koehn P, Robinson T (2014) One billion word benchmark for measuring progress in statistical language modeling. In: INTERSPEECH 2014, 15th annual conference of the international speech communication association, Singapore, 14–18 September, pp 2635–2639 23. Nguyen HTT, Jatowt A, Nguyen VN, Coustaty M, Doucet A (2020) Neural machine translation with BERT for post-OCR error detection and correction. In: Proceedings of the ACM/IEEE joint conference on digital libraries in 2020 (JCDL 2020). Association for Computing Machinery, New York, pp 333–336 24. Pham DT, Nguyen DQ, Le AD, Phan MN, Kromer P (2021) Candidate word generation for OCR errors using optimization algorithm. In: The 1st international conference on Van Lang heritage and technology. AIP conference proceedings, vol 2406, pp 020028 25. Devlin J, Chang MW, Lee K, Toutanova K (2019) BERT: pre-training of deep bidirectional transformers for language understanding. In: Proceedings of the 2019 conference of the North American chapter of the association for computational linguistics: human language technologies, vol 1, pp 4171–4186
Chapter 35
FPGA Implementation of Masked-AE$HA-2 for Digital Signature Application M. M. Sravani, S. Ananiah Durai, M. Prathyusha Reddy, G. Sowjanya, and Nabihah Ahmad
1 Introduction
Digital signature (DS) is preferred in wireless networks to enhance digital authorization [1]. One such authorization form is the electronic signature, employed in documented certificates such as the driving license, PAN card, and passport [2]. DS is also increasingly preferred at the end-user devices of any digital communication, popularly referred to as IoT devices, to defend against any malicious command run by a cryptanalyst [3]. Though the implementations of DS schemes have enormous advantages in file sharing, the chance of developing a replica is high, causing significant data manipulation. To avoid this replication and to circumvent the threats, security enhancement of digital signature algorithms is essential. Asymmetric and hash-algorithm-based DS have been widely used in the past to authenticate data in public (open network) and private (secret or closed network) networks. The hash function (SHA-2) first generates the hash digest value from the user data [4]. Later, the ECC-160 algorithm authenticates the hash digest value by applying the private key [5]. Finally, the certificate is attached to the digital signature and transferred over an open network, as shown in Fig. 1. Here, ECC-160 offers security strength equal to a 3024-bit key of the RSA algorithm, and it provides lightweight and fast processing features.
Fig. 1 Digital signature certification flow [9]
Additionally, the hash function provides special features for digital schemes such as authenticity, integrity, verification, and non-repudiation. MD-5, SHA-0, SHA-1, and SHA-2 are examples of such hashing schemes. Among them, MD5 and SHA-0 have the least security levels, which can be represented as 2^68 and 2^80, respectively [6]. SHA-1 was the primary choice in DS over the past few years; however, as it can produce the same digest value for two different inputs, exploring an alternative scheme is required [7]. Further, the low security strength and pre-image collision attacks reported for SHA-1 forced the move to the SHA-2 function. However, due to its similar core structure, SHA-2 is also prone to collision attacks [8]. A novel hybrid Masked AE$HA-2 crypto-style that overcomes the above-stated vulnerability is proposed in this work for enhancing the strength in terms of security and privacy. The proposed Masked AE$HA-2 has Masked AES encryption followed by the SHA-2 algorithm, providing the required security features for the DS application. The conventional AES encryption architecture is threatened by side-channel analysis; hence, a Masked AES algorithm is employed to shield against SCA attacks. The hidden key in this Masked AES makes key retrieval difficult; further, a change in the message expansion block improves the computational time in SHA-2, thereby making it feasible to adapt this scheme to high-speed networks. Therefore, it can be highlighted that the proposed Masked AE$HA-2 architecture not only secures the data from SCA but also enhances the speed of the architecture.
The problem statement and existing work are explained in Sect. 2, followed by a discussion of the proposed novel hybrid hash function in Sect. 3. In Sect. 4, a comparison of the computational time between the existing architecture and the proposed architecture is provided. Finally, the last section concludes the work with a brief note on the future scope.
2 The Existing Work
Cryptographic algorithms are widely employed to secure VLSI devices that form the core circuitry in fields including electronic signatures, electronic commerce, and digital certificates. Side-Channel Information (SCI) leaked from such advanced circuitry is the vulnerable component that reveals the characteristics of the algorithms. This section describes the prevalent side-channel attacks on crypto-algorithms and the process steps of the conventional AES and SHA-2 algorithms, with a note on the respective reported SCA.
2.1 Side-Channel Attacks
There are diverse attacks that can be performed on cryptographic devices to observe changes in device behavior, namely passive and active attacks. Further, a cryptanalyst may mount invasive, semi-invasive, and non-invasive attacks to detect variations on the device surface [8]. Among these, side-channel attacks fall under the category of non-invasive passive attacks. Figure 2 shows the various possible sources of side-channel information that can reveal the characteristics of a VLSI device [8]. Such information is effectively utilized by the cryptanalyst for a successful intrusion. Here, the cryptanalyst behaves like an outsider and notes all the behavioral changes on the VLSI device's surface. Later, appropriate techniques are applied to read the secret information of the device.
Fig. 2 Side-Channel information source [10]
The study of a security breach that might successfully reveal
confidential data is inevitable in any crypto-system. Knowledge of attack frameworks and strategies enables the cryptographic engineer to strengthen the algorithm, ultimately improving device/system security.
2.2 The AES Algorithm
The National Institute of Standards and Technology (NIST) proposed the Advanced Encryption Standard (AES) to eliminate the large key size problem of the Data Encryption Standard (DES). It provides a smaller key of 128 bits which has strength equal to that of the Triple-DES architecture. The AES algorithm has a block length of 128 bits and different key sizes. The key size determines the number of rounds; hence, if the chosen key sizes are, say, 128, 192, and 256 bits, the allowed rounds are 10, 12, and 14 respectively. Consider an AES-128 algorithmic flow in which a 128-bit input block is converted into a 4 × 4 byte matrix (16 × 8 = 128 bits). The encryption process of AES-128 starts from the preparation of the key, as shown in Fig. 3.
Fig. 3 Encryption process of AES Algorithm [11]
The key size for AES-128 is 128 bits, hence ten rounds of operation will be performed. Each round operation generates a round key from the one real key with the help of a key expansion block. For ten rounds, it will generate ten round keys from one real key. Upon completion of the round key generation, encryption begins; the functional operation has several sub-processes, as explained below [11]:
• Round Key: This process includes a simple XOR operation between the input data and a generated round key. The resultant value of this step is then handled by the subsequent process.
• Shift Rows: As per predefined offset values, each row in the state matrix has a different shifting process [11]. Based on the offset values, a left cyclic shift operation is performed on each row of the state matrix and the state matrix is updated.
• SubBytes (Byte substitution): The byte substitution is a purely non-linear and independent mathematical function. In this process, the old state matrix is substituted with a new state matrix as per a predefined look-up table (S-box). Finally, the entire state matrix is updated and then proceeds to further calculations.
• Mix Columns: After completing the byte substitution process, a mathematical operation is performed on the columns of the state matrix. This calculation includes a multiplication between the state matrix and a polynomial of degree 4 over G.F. (2^8) in a specified standard. The resultant column values are placed in the state matrix and a new state matrix is formed. However, during the last round of operation, 'MixColumns' is ignored for the specific ordering of the bits. If 128 bits is opted as the input key, the mix column operation is not performed in the 10th round of AES encryption.
After the ten rounds of operation, the final AES encrypted value is generated. A side-channel timing attack can easily breach this conventional AES algorithm by physically accessing its functional characteristics during the operation of the cryptologic device [10]. Similarly, during the bitstream generation process in an FPGA, a fault can be induced to reveal the SCI of the crypto AES device. The cryptanalyst has utilized such SCI to retrieve the secret key out of AES in FPGAs [12]. In [13] and [14], acoustic and E.M. analysis attacks were introduced on crypto core hardware systems to reveal the pattern of AES secret keys. Thus, a design enhancement that protects the secret key through a masking countermeasure against side-channel attacks is essential. This masking method combines a false key with the real key to confuse the attacker by hiding the real data. A detailed discussion of the masking process on the AES algorithm is provided in Sect. 3.1.
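As a small software illustration of the ShiftRows step described above (a sketch only, not the masked FPGA datapath), each row r of the 4 × 4 state is rotated left by r byte positions:

```python
def shift_rows(state):
    """Rotate row r of the 4x4 AES state left by r positions (r = 0..3)."""
    return [row[r:] + row[:r] for r, row in enumerate(state)]

state = [[0x00, 0x01, 0x02, 0x03],
         [0x10, 0x11, 0x12, 0x13],
         [0x20, 0x21, 0x22, 0x23],
         [0x30, 0x31, 0x32, 0x33]]
print(shift_rows(state))
# Row 0 is unchanged, row 1 becomes [0x11, 0x12, 0x13, 0x10], and so on.
```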
2.3 The SHA-2 Architecture
The secure hash algorithm (SHA) is an iterative, one-way hash function that produces a condensed representation called a message digest. SHA has the property of immediately producing different output message digests for distinct inputs. This feature is helpful in the generation and verification of digital signatures and message authentication codes, and in the generation of random numbers. Among the different SHA versions, DS preferably adopts the SHA-2 algorithm after the weak collision reported for SHA-1. As per user requirements, SHA-2 provides various digest sizes such as SHA-224, SHA-256, SHA-384, and SHA-512. This paper chooses the SHA-2(256) architecture, whose core and data width are compatible with the AES-128 algorithm [4].
Fig. 4 The block diagram of SHA-2(256)
Figure 4 represents the block diagram of SHA-2(256). It consists of two stages, viz. pre-processing and hash computation. The pre-processing stage includes initializing the predefined hash and round constant values in memory registers. The processing involves a series of operations including padding, compression, and round addition. The required input message length is prepared by the padder block. Further, the hash computation includes message expansion to prepare the 64 blocks of the input message, followed by the compression function operation. At last, a round addition step is executed to finish the preparation of the 256-bit output digest value. These pre-processing and hash computation operations are detailed below:
• Step 1: Initially, the 64 prime round constant values are stored in separate registers named K0 to K63. During the iterations of the compression function, a round constant value is regularly fetched according to the corresponding stack address.
• Step 2: Similarly, the predefined hash values are also stored in registers named h0 to h7, as shown in Table 1 [4].
Table 1 Initial hash values [4]

| S.no | Register | Hash values |
| 1 | h0 | 32'h6a09e667 |
| 2 | h1 | 32'hbb67ae85 |
| 3 | h2 | 32'h3c6ef372 |
| 4 | h3 | 32'ha54ff53a |
| 5 | h4 | 32'h510e527f |
| 6 | h5 | 32'h9b05688c |
| 7 | h6 | 32'h1f83d9ab |
| 8 | h7 | 32'h5be0cd19 |
Table 2 The computational time for the AES algorithm

| Function | Conventional | Proposed |
| Add Round Key | 52.6 μs | 3.809 ns |
| ShiftRows | 4.04 μs | 3.124 ns |
| SubBytes | 26.58 μs | 4.593 ns |
| MixColumns | 54.03 μs | 4.129 ns |
| SbTrans | – | 5.091 ns |
| Mixcol | – | 4.129 ns |
| One round | 137.25 μs | 24.875 ns |
Fig. 5 The message length of padder unit [4]
These predefined hash values are assigned as the initial hash computation to begin the compression function operation. Later, these stored predefined hash values are utilized during the modulo addition step to get the secured hash digest value.
• Step 3: In the padder preparation step, the input message is prepared according to the specified message length, as shown in Fig. 5. Here, the padder unit prepares the required input message as 512 bits by appending the necessary padding bits, for a maximum message size of 2^64. These 512 bits are later divided into individual blocks of 32 bits each [4].
• Step 4: After completing the message preparation, the hash computation begins with the message scheduler operation by expanding these 16 blocks of the message into 64 blocks. These blocks are regularly supplied as input to the compression function during the 64 rounds of operation. The remaining 48 blocks of the input message, named W16 to W63, are calculated as per (1):

w_t = (input message word t), for 0 ≤ t ≤ 15
w_t = w_{t−16} + σ0(w_{t−15}) + w_{t−7} + σ1(w_{t−2}), for 16 ≤ t ≤ 63   (1)

where σ0(x) = rotr7(x) ⊕ rotr18(x) ⊕ shr3(x) and σ1(x) = rotr17(x) ⊕ rotr19(x) ⊕ shr10(x).
• Step 5: As discussed in steps 1, 2, and 4, the input message and the round constants are given as input to the compression function, and the predefined hash values are then loaded into a, b, c, d, e, f, g, h for computing the first round operation as per Fig. 6. The compression includes primary subfunctions such as Ch(x, y, z), Maj(x, y, z), Σ0(x) and Σ1(x). The symbols ∧, ~ and ⊕ used in (2) represent the logical AND, NOT, and XOR operations respectively.
Fig. 6 The compression function of SHA-256 [4]
h = g, g = f, f = e, e = d + temp2, d = c, c = b, b = a, a = temp1 + temp2   (2)
where temp1 = Σ0(a) + Maj(a, b, c), temp2 = h + Σ1(e) + Ch(e, f, g) + k_t + w_t,
Σ0(a) = rotr2(a) ⊕ rotr13(a) ⊕ rotr22(a), Σ1(e) = rotr6(e) ⊕ rotr11(e) ⊕ rotr25(e),
Maj(a, b, c) = (a ∧ b) ⊕ (a ∧ c) ⊕ (b ∧ c), and Ch(e, f, g) = (e ∧ f) ⊕ (~e ∧ g).
After completing a round operation, the previously generated hash computation values are updated with the new values so that the remaining 63 rounds can continue. This process repeats until the 64th iteration, and then the final hash computation values a, b, c, d, e, f, g and h are stored. A small code sketch of this round update is given after Step 7.
• Step 6: In this step, the stored initial hash values are fetched from the registers named h0 to h7 and a modulo addition operation is performed with the final hash values 'a' to 'h' as represented below:
H0 = h0 + a, H1 = h1 + b, H2 = h2 + c, H3 = h3 + d, H4 = h4 + e, H5 = h5 + f, H6 = h6 + g, H7 = h7 + h   (3)
• Step 7: Finally, the generated 32-bit blocks of the hash value are combined into a single 256-bit hash output digest value:
SHA-2(256) = H0 ∥ H1 ∥ H2 ∥ H3 ∥ H4 ∥ H5 ∥ H6 ∥ H7   (4)
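The round update of Eq. (2) and the final additions of Eqs. (3) and (4) can be sketched in software as follows. This is a plain reference illustration of the arithmetic (32-bit modular additions and rotations) that keeps the paper's temp1/temp2 naming; it is not the pipelined hardware datapath.

```python
MASK = 0xFFFFFFFF

def rotr(x, n):
    """32-bit right rotation."""
    return ((x >> n) | (x << (32 - n))) & MASK

def sha256_round(state, kt, wt):
    """One compression round per Eq. (2); `state` is (a, b, c, d, e, f, g, h)."""
    a, b, c, d, e, f, g, h = state
    big_sigma0 = rotr(a, 2) ^ rotr(a, 13) ^ rotr(a, 22)
    big_sigma1 = rotr(e, 6) ^ rotr(e, 11) ^ rotr(e, 25)
    maj = (a & b) ^ (a & c) ^ (b & c)
    ch = (e & f) ^ (~e & g)
    temp1 = (big_sigma0 + maj) & MASK
    temp2 = (h + big_sigma1 + ch + kt + wt) & MASK
    return ((temp1 + temp2) & MASK, a, b, c, (d + temp2) & MASK, e, f, g)

def finalize(initial_hash, state):
    """Eq. (3): modulo-2^32 addition of h0..h7, then Eq. (4): concatenation."""
    return "".join(f"{(h0 + s) & MASK:08x}" for h0, s in zip(initial_hash, state))
```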
The existing SHA-2 architecture lags in terms of computational time during the expansion of the message from 16 to 64 blocks. A sliding window protocol is proposed in place of the message scheduler to enhance the timing performance of SHA-2. It helps to process the hash digest values faster while retaining the default security features. However, SHA-2 on its own may be prone to various types of attacks. The combination of conventional SHA-2 with the HMAC algorithm was therefore proposed to prepare a secured bitcoin, but this combination was successfully breached by a scan-based attack [15]. Hence, the algorithm combined with SHA-2 should be carefully chosen for its high-security features and should also be one-way. Therefore, combining an AES encryption algorithm with the SHA-2 hash function will enhance the security features with less computational time.
3 Proposed Architecture
The proposed work is a combination of the AES-128 and SHA-2(256) architectures and is named a hybrid hash function, as shown in Fig. 7. The process begins with the AES algorithm being applied to the input message to get the required 128-bit encrypted output [16]. The encrypted value is then passed on to SHA-2 to get the required hash digest value. The final digest value has been doubly encrypted and is hence more secure than individually encrypted data. As the conventional AES is threatened by side-channel attacks that permit the secret key to be revealed, the proposed combination replaces it with a masked AES algorithm to strengthen the security features. Further, the existing SHA-2 unnecessarily expands the input message from 16 to 64 blocks, which increases the area and computational time. A sliding window protocol is proposed to minimize the computational time of the message expansion block in SHA-2. A detailed discussion of the proposed masked AES and SHA-2 architectures is provided in the following subsections.
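The data flow of Fig. 7 can be mirrored in software with off-the-shelf primitives; the sketch below uses the standard (unmasked) AES-128 from PyCryptodome and SHA-256 from hashlib as stand-ins for the hardware blocks, so it reproduces the AES-then-hash chaining but not the masking or the reported timing.

```python
import hashlib
from Crypto.Cipher import AES   # PyCryptodome; stand-in for the masked AES core

def hybrid_digest(block16: bytes, key16: bytes) -> str:
    """Encrypt one 128-bit block with AES-128, then hash the ciphertext with
    SHA-256, mirroring the AES -> SHA-2(256) chain of the hybrid hash function."""
    assert len(block16) == 16 and len(key16) == 16
    ciphertext = AES.new(key16, AES.MODE_ECB).encrypt(block16)   # single block
    return hashlib.sha256(ciphertext).hexdigest()

print(hybrid_digest(b"sixteen byte msg", b"0123456789abcdef"))
```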
Fig. 7 The flow chart of Masked AE$HA-2 hash function
3.1 Proposed Masked AES Encryption Algorithms
Previously, the secret key was revealed to the cryptanalyst through a correlation process on the leakage information obtained through SCI and prediction. Therefore, a countermeasure is applied to resist such attacks by combining the encryption operation of the real key with a false key. This combination eliminates the side-channel information leakage and prevents a successful correlation process on the leaked data. A simple XOR operation between the real key and the mask data prepares the false key, as shown in Eq. (5):

K_false = K_real ⊕ K_mask   (5)

where K_mask is a value consisting of sixteen bytes and ⊕ is the XOR operator. It is noted that (5) is satisfied for each of the 16 bytes that form K_mask, i.e. K_false(p, q) = K_real(p, q) ⊕ K_mask(p, q), with p = 0 to 3 and q = 0 to 3. This K_false hides the real key operation by performing separate parallel fake operations, 'SbTrans' and 'mixCol', which are disguised as real operations to the attacker, as shown in Fig. 8. Here, the masking scheme hides the original data through the 'SbTrans' and 'MixCol' operations as shown in Eq. (6):

SBox(a_f(p, q)) ⊕ SBox(a_f(p, q) ⊕ K_mask(p, q)) ⊕ m_H   (6)

where a_f(p, q) is the plain text XOR-ed with the mask key (given in (5)), and m_H is used for masking the result of the operations as a measure against any SCA attack. The 'mixCol' and 'MixColumns' blocks implement the same operation, but the data differ.
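A byte-level sketch of Eqs. (5) and (6) is given below; the S-box contents and the output mask m_H are placeholders, and the real design performs the masked and unmasked paths as parallel hardware operations rather than sequential function calls.

```python
def mask_key(k_real: bytes, k_mask: bytes) -> bytes:
    """Eq. (5): K_false = K_real XOR K_mask, applied to each of the 16 key bytes."""
    return bytes(r ^ m for r, m in zip(k_real, k_mask))

def masked_sbox(af_byte: int, k_mask_byte: int, m_h: int, sbox) -> int:
    """Shape of Eq. (6) for one state byte: combine the real and the masked
    S-box lookups with the output mask m_H (placeholder values here)."""
    return sbox[af_byte] ^ sbox[af_byte ^ k_mask_byte] ^ m_h

demo_sbox = list(range(256))                     # placeholder, not the AES S-box
k_false = mask_key(bytes(16), bytes(range(16)))  # all-zero real key, demo mask
print(k_false.hex(), masked_sbox(0x53, 0x0F, 0xA5, demo_sbox))
```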
Fig. 8 Proposed AES encryption architecture
Subsequently, 'SbTrans' and 'SubBytes' also execute similar functions for different data sets to confuse the attacker. Finally, this proposed masked AES architecture hides the original data better than the conventional AES architecture. However, handling the different sets of keys in the Masked AES algorithm is a substantial task. The MUX resource resolves this issue by selecting the suitable keys for the respective rounds with the help of the 'Key incr' signal. This 'Key incr' behaves as the 'select' signal and points to the algorithm's next round operation until the 9th round. During the last (10th) round, the output of 'SubBytes' and the previous key from the 'Key expansion' process are XOR-ed to obtain the final ciphertext value. Even if a cryptanalyst analyses this ciphertext, they may not retrieve any original form of the data because of the one-way encryption, thus preventing the decryption process and circumventing actual data exposure. Therefore, the proposed Masked AES architecture has enhanced security due to the masking of the original data and hence prevents SCA attacks.
3.2 Proposed SHA-2 Architecture
As a subsequent process to AES, the encrypted data is again hashed by SHA-2. The proposed SHA-2 architecture minimizes the computational time by replacing the message expansion block with a sliding window protocol.
Fig. 9 Sliding window protocol [17]
This sliding window protocol calculates the values as per (1) and applies the shift operation so that the required block is ready well before the compression function begins its operation. This protocol saves computational time by minimizing the waiting cycles, and it also enhances the clock speed, enabling fast processing. Primarily, the input message of 512 bits is expressed as sixteen 32-bit words, represented as wt0, wt1, wt2, wt3, ..., wt15. These 16 blocks are placed in a sliding window form, and then one left shift operation is done. In the place of wt15, a newly prepared value is placed (as discussed previously), as shown in Fig. 9. Likewise, the remaining values are prepared and inserted, and a left shift operation is done on all the words until the 64th block of message words. This greatly improves the computational time while retaining the non-repudiation, integrity, and message authentication features. The Masked AES output value followed by the SHA-2 hash function thus forms a new hybrid hash function output named the Masked AE$HA-2 digest. This hybrid hash function builds double security and also reduces the computational time to prepare the double encryption for DS.
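A software sketch of the sliding-window message schedule is shown below: the 16-word window is shifted left each step and the newly prepared word (per Eq. (1)) is appended, so w16..w63 are ready as the compression function consumes them. This illustrates the data movement only, not the hardware shift registers or their clocking.

```python
from collections import deque

MASK = 0xFFFFFFFF

def rotr(x, n):
    return ((x >> n) | (x << (32 - n))) & MASK

def sliding_window_schedule(first16):
    """Expand 16 message words to 64 using a 16-word sliding window (Eq. (1))."""
    window = deque(first16, maxlen=16)      # holds w[t-16] .. w[t-1]
    schedule = list(first16)
    for _ in range(48):
        s0 = rotr(window[1], 7) ^ rotr(window[1], 18) ^ (window[1] >> 3)        # sigma0(w[t-15])
        s1 = rotr(window[14], 17) ^ rotr(window[14], 19) ^ (window[14] >> 10)   # sigma1(w[t-2])
        new_word = (window[0] + s0 + window[9] + s1) & MASK                     # + w[t-16] + w[t-7]
        schedule.append(new_word)
        window.append(new_word)             # maxlen=16 drops the oldest word
    return schedule

print(len(sliding_window_schedule([0] * 16)))   # 64
```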
4 Results and Discussion
The novel hybrid architecture has been implemented on a Virtex 7 (xc7vx485tffg1157-1) device using the Vivado 2018.2 tool. The performance of the proposed hybrid hash function is evaluated in terms of the computational time metric. The RTL schematic of the synthesized hybrid hash function is redrawn to show the overall process flow, as shown in Fig. 10.
Fig. 10 RTL schematic of Masked AE$HA-2
Primarily, the implemented Masked AES encryption algorithm is compared with the standard algorithm to verify its block-wise performance in terms of computational time, as shown in Table 2. Interestingly, the Masked AES algorithm performs well and achieves a low computational time of 25 ns to complete one round operation, compared to 137.25 μs for the conventional AES. Therefore, the overall computational time of the proposed Masked AES encryption algorithm is only 280 ns to get the final 128 encrypted bits. Though the fake operation runs in parallel to hide the original operation flow, the clock cycles of the masked AES remain unchanged during the encryption process. However, the addition of the extra 'SbTrans' and 'mixCol' resources will increase the architecture area. Further, the conventional SHA-2 lags in computational time performance, but the proposed sliding window protocol in SHA-2 increases the speed of computation for the 48 blocks and completes the hash computation process within 388 ns compared to the conventional design, as shown in Fig. 11. Finally, the combination of conventional AES and SHA-2 and the proposed AE$HA-2 have been implemented on the V7 device using the Xilinx Vivado tool to evaluate their computational time. Remarkably, the proposed Masked AE$HA-2 produces the final 256-bit hash digest output in only 680 ns, which is almost half the computational time of the conventional one, as shown in Fig. 11.
Fig. 11 Computational time of proposed algorithms
5 Conclusion and Future Scope
The proposed novel Masked AE$HA-2 hash function provides dual one-way encryption and maintains three-tier protection. This protection is achieved through the masking, AES encryption, and SHA-2 hashing schemes, which yield a low computational time of 680 ns for preparing the hybrid hash digest value. It also enhances the speed of communication in the digital signature application. However, the area might be marginally higher due to the masked pattern in AES and the sliding window in SHA-2. In future, the architecture will be redesigned to improve the area so that it better caters to the digital signature application. Further implementation on an advanced target device such as the Zedboard might improve the performance in terms of area, throughput, and efficiency.
References 1. Toubal A, Bengherbia B, Zmirli MO, Guessoum A (2020) FPGA implementation of a wireless sensor node with built-in security coprocessors for secured key exchange and data transfer. Measurement 153:107429 2. Zhu L, Zhu L (2012) Electronic signature based on digital signature and digital watermarking. In: 2012 5th international congress on image and signal processing, pp 1644–1647 3. Abdullah GM, Mehmood, Khan CBA (2018) Adoption of Lamport signature scheme to implement digital signatures in IoT. In 2018 international conference on computing, mathematics and engineering technologies (iCoMET), pp 1–4, 4. Suhaili SB, Watanabe T (2017) Design of high-throughput SHA-256 hash function based on FPGA. In 26th International conference on electrical engineering and informatics (ICEEI) 5. Lee YK, Saki Yama K, Batina L, Verbauwhede I (2008) Elliptic-curve-based security processor for RFID. IEEE Trans Comput Nov. 57(11):1514–1527 6. Barker E, Dang Q (2020) NIST special publication 800–57 part 1, revision 5. NIST, Technical report 7. Stevens M, Bursztein E, Karpman P, Albertini A, Markov Y (2017) The first collision for full SHA-1. In: Katz J, Shacham H (eds) Advances in cryptology – CRYPTO 2017. CRYPTO 2017. Lecture Notes in Computer Science, vol 10401. Springer, Cham, pp 570–596. https://doi.org/ 10.1007/978-3-319-63688-7_19
8. Sravani MM, Ananiah Durai S (2019) Side-channel attacks on cryptographic devices and their countermeasures—a review. In Tiwari S, Trivedi M, Mishra K, Misra A, Kumar K (eds) Smart innovations in communication and computational sciences. Advances in Intelligent Systems and Computing, vol 851. Springer, Singapore, pp 209–226. https://doi.org/10.1007/978-98113-2414-7_21 9. Ambadiyil S, Vibhath VB, Mahadevan Pillai VP (2016) On paper digital signature (OPDS). In: Thampi S, Bandyopadhyay S, Krishnan S, Li KC, Mosin S, Ma M (eds) Advances in signal processing and intelligent recognition systems. Advances in Intelligent Systems and Computing, vol 425, pp. Springer, Cham. https://doi.org/10.1007/978-3-319-28658-7_46 10. Sravani MM, Durai SA (2021) Attacks on cryptosystems implemented via VLSI: a review. J Inf Secur Appl 60:102861. 11. Soliman SM, Magdy B, Abd El Ghany MA (2016) Efficient implementation of the AES algorithm for security applications. In: IEEE 2016 international system-on-chip conference (SOCC), pp 978–981 (2016) 12. Swierczynski P, Becker GT, Moradi A, Paar C (2018) Bitstream Fault Injections (BiFI)–automated fault attacks against SRAM-based FPGAs. IEEE Trans Comput 67(3):348–360 13. Al Faruque MA, Chhetri SR, Canedo A, Wan J (2016) Acoustic side-channel attacks on additive manufacturing systems. In: ACM/IEEE 7th international conference on cyber-physical systems (ICCPS), Vienna, pp 1–10 14. Gu K, Wu L, Li X, Zhang XM (2011) Design and implementation of an electromagnetic analysis system for smart cards. In: Seventh international conference on computational intelligence and security, Hainan, pp 653–656 15. Oku D, Yanagisawa M, Togawa N (2017) A robust scan-based side-channel attack method against HMACSHA- 256 circuits. In: IEEE 7th international conference on consumer electronics - Berlin (ICCE-Berlin), Berlin, pp 79–84 (2017) 16. Hamzah H, Ahmad N, Ruslan SH (2020) The 128-bit AES design by using FPGA. J Phys: Conf Ser 1529:1–7 (2020) 17. Forouzan AB (2007) Data communications & networking (SIE). Tata McGraw-Hill Education
Chapter 36
A Framework for Improving the Accuracy with Different Sampling Techniques for Detection of Malicious Insider Threat in Cloud G. Padmavathi, D. Shanmugapriya, and S. Asha
1 Introduction
The cloud makes it possible to store information and access resources from anywhere, by anyone, at any time. Many threats can attack the cloud, and one of the crucial threats is the malicious insider threat. A malicious insider is an individual who threatens to access confidential data while pretending to be a legitimate user within the organization. A malicious insider may cause data leakage, leading to substantial financial and reputational loss. So, it is crucial to detect the malicious insider threat in an organization. Hence, a framework is proposed to detect malicious insider threats. The real-world malicious insider data has been gathered from the US Computer Emergency Response Team (CERT), which contains information regarding malicious and non-malicious activity [1]. However, non-malicious activity contains the majority class instances, while malicious activity contains the minority class instances. It is therefore difficult to detect real malicious insider threats. The reason is that the instances of one class have a much larger distribution than the instances of the other class. For example, in the CERT data, the instances of non-malicious activity have a higher proportion than those of malicious activity. The instances of non-malicious activity are considered the majority class, and the instances of malicious activity are regarded as the minority class. The classifier would consider the minority class as
noise or outliers during training, and misclassification occurs. In classification, the class imbalance problem arises due to the inaccurate classification of the minority class [2]. It suppresses the performance of supervised classification algorithms. The class imbalance problem raises two critical issues in supervised classification: (i) misassumption and misclassification due to the unequal proportion of class instances, and (ii) inaccurate prediction of the minority class, which suppresses the performance of the supervised classification algorithm. Therefore, many techniques have been used to improve the accuracy and minimize the inaccurate prediction of the minority class. This paper utilizes different oversampling and undersampling techniques to solve the class imbalance problem and enhance the accurate prediction of malicious insider threats, as sketched in the example below. The sampled data is trained using an SVM classifier to evaluate the performance of the different sampling techniques using accuracy, f-score, precision and recall. The rest of the paper is organized as follows. Section 2 tabulates the literature on different sampling techniques. Section 3 explains the methodology overview. Section 4 illustrates the results and discussion. Section 5 concludes the research and suggests possible scope for future enhancement.
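As a sketch of this pipeline, assuming a feature matrix X and binary labels y have already been extracted from the CERT logs, the imbalanced-learn samplers can be swapped in and out in front of an SVM; the split sizes and kernel are illustrative choices, not the settings tuned in this work.

```python
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import classification_report
from imblearn.over_sampling import SMOTE
from imblearn.under_sampling import RandomUnderSampler

def evaluate_sampler(X, y, sampler):
    """Balance only the training split with `sampler`, then score an SVM."""
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3,
                                              stratify=y, random_state=42)
    X_bal, y_bal = sampler.fit_resample(X_tr, y_tr)
    clf = SVC(kernel="rbf").fit(X_bal, y_bal)
    print(classification_report(y_te, clf.predict(X_te), digits=3))

# evaluate_sampler(X, y, SMOTE(random_state=42))               # oversampling
# evaluate_sampler(X, y, RandomUnderSampler(random_state=42))  # undersampling
```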
2 Background Study
The primary focus is to solve the class imbalance problem in the imbalanced CERT data comprising malicious insider threats. Table 1 summarizes prior work on various sampling techniques.

Table 1 Study of various sampling techniques

S. no | Author | Sampling techniques | Classification algorithm | Observations
1 | Gosain and Sardana (2017) | SMOTE, ADASYN, Borderline-SMOTE, Safe-Level-SMOTE | Naïve Bayes, SVM and Nearest Neighbour | Safe-Level-SMOTE performed better than the other oversampling techniques based on F-measure and G-mean [3]
2 | Dittman et al. (2014) | RUS, ROS, SMOTE | KNN, SVM | RUS classified better than the other techniques with SVM and KNN based on the AUC curve [4]
3 | Junsomboon and Phienthrakul (2017) | Neighbor Cleaning Rule (NCL), SMOTE | Naïve Bayes, Sequential Minimal Optimization (SMO) and KNN | Combined NCL and SMOTE provided a better result than SMOTE, NCL and ordinary data across various classifiers based on recall [5]
4 | Hasanin and Khoshgoftaar (2018) | RUS | Random Forest | Minority class ratios between 0.1% and 1% gave a better true positive rate than 10% and 100% class-balanced data [6]
5 | Haibo He et al. (2008) | ADASYN and SMOTE | Decision tree | The ADASYN algorithm provided better accuracy than SMOTE [7]
6 | Yap et al. (2014) | ROS, RUS, AdaBoost | Classification and Regression Tree (CART), C5 and Chi-Square Automatic Interaction Detection (CHAID) | RUS outperformed the other sampling techniques in the three decision tree algorithms based on accuracy, sensitivity, specificity and precision [8]
7 | Fujiwara et al. (2020) | ADASYN, SMOTE, AdaBoost, RUSBoost, hyperSURF, HUSBoost and the proposed HUSDOS-Boost sampling | Random Forest | HUSDOS-Boost outperformed RUSBoost and achieved a G-mean of 0.69 in detecting stomach cancer with fewer than 30 minority class instances [9]
8 | Bunkhumpornpat and Subpaiboonkit (2013) | Improved SMOTE, Borderline-SMOTE and Safe-Level-SMOTE | Naive Bayes, Decision tree, KNN and RIPPER | Improved SMOTE provided a better result than the other techniques on various classifiers and achieved 73% F-measure and 78% AUC [10]
9 | Abdi and Hashemi (2015) | Mahalanobis Distance-Based Over-Sampling Technique (MDO), SMOTE, Borderline-SMOTE and ADASYN | Decision Tree, KNN and RIPPER | MDO performed better than the other techniques with various classifiers in terms of MAUC and precision [11]
10 | Elhassan and Aljurf (2016) | Tomek's Link (T-Link), RUS, ROS and SMOTE | SVM, ANN, Random Forest (RF) and Logistic Regression (LR) | T-Link performed best among the various classifiers based on F-statistic, G-mean and AUC [12]
The table above shows that different sampling techniques have been applied to handle the class imbalance problem. Hence, the different sampling techniques are implemented and compared here to improve the correct detection of a malicious insider in an organization.
3 Methodology
Figure 1 illustrates the proposed methodology of minority class classification with different sampling techniques to detect a malicious insider threat.
Fig. 1 Overview of proposed methodology (Dataset → Data preprocessing: Data Integration, Data Transformation → Data Level Sampling → Classification)
3.1 Dataset
The benchmark dataset is collected from the cyber-security-based CERT Division [13]. It is a synthetic dataset based on malicious activity in a cloud environment. The gathered US-CERT dataset consists of log details covering emails, web connections, device connectivity status, and logons of malicious and non-malicious users. Dataset version r3.1 is considered as the primary dataset to analyze and detect the malicious insider threat. Some malicious insider threat scenarios [14] are defined below:
• Scenario 1: An individual in the organization worked after working hours, frequently used a removable drive, uploaded important information to wikileaks.org, and later resigned from the organization.
• Scenario 2: An individual in the organization visited job websites and solicited employment from a competitor of the business. The employee's data transfers using removable drives increased abnormally. The individual later resigned from the organization.
• Scenario 3: A dissatisfied or unauthorized system administrator installed malicious software to collect sensitive information and used removable drives to transfer data from a particular authorized system, gathering sensitive information to access the authenticated system. The scenario also includes an unusual volume of emails regarding sensitive information within the organization. The individual later resigned from the organization.
• Scenario 4: Over three months, an individual frequently logged into other users' computers, then searched for and forwarded files to a personal email address.
• Scenario 5: An individual uploaded documents to Dropbox for personal gain.
3.2 Data Pre-processing
The primary CERT data contains log details covering 516 days, in which 4000 users generate 135,117,169 log events [14]. The events include email-based, login-based and device-storage-based activities, HTTP operations, psychometric details, file information and daily log details. This paper considers scenario 1 and scenario 2 among the five scenarios mentioned above, so the primary data related to the selected scenarios is regarded as base data and the rest is neglected. The base data undergoes two pre-processing steps, data integration and data transformation, to make the data suitable for classification.
Data Integration. For detecting malicious insider threats, the records related to device status, login status and HTTP operations satisfy the selected scenarios. The selected records are integrated using simple feature concatenation, while the other records are neglected. Table 2 lists the features of the integrated data.
Table 2 Feature details of integrated data

Features | Description
InsiderThreat | Whether the event is malicious activity or not
Vector | The origin of the data
Date | Date of the particular event
User | User id that carries out the particular activity
Pc | Unique identification for each computer
Activity | Action of the particular user

Table 3 Transformed data

Features | Before transformation | After transformation
InsiderThreat | 1 | 1
Vector | Logon | 0
Date | 07-01-2010 02:23:00 | 1,280,707,200
User | CCH0959 | 4
Pc | PC-0588 | 128
Activity | http://linkedin.com/jobs/displayhome.html | 750
Data Transformation. The integrated data needs to be transformed into numerical values for further processing. The features "vector", "pc", "user" and "activity" from the integrated data are converted into numerical values, and the value of "date" is converted into an epoch timestamp. Table 3 shows the details of the transformed data.
Data Level Sampling. It is necessary to balance the instances of all classes for accurate classification. Three different types of techniques can be used to solve the class imbalance problem: data-level solutions, algorithmic-level solutions and ensemble-based learning solutions [15]. The data-level solution for the class imbalance problem is based on sampling methods [16]. This technique alters the pattern of the data distribution, restructuring the imbalanced class data to make it well balanced, and is accomplished by both undersampling and oversampling. The oversampling techniques implemented here are the Synthetic Minority Over-Sampling Technique (SMOTE), Adaptive Synthetic sampling (ADASYN) and Random Oversampling (ROS); the undersampling techniques implemented are Edited Nearest Neighbours (ENN), Near-Miss 1 (NM-1), Near-Miss 2 (NM-2), Random Undersampling (RUS) and Tomek links (T-L). In the pre-processed dataset, the feature "InsiderThreat" is the target variable, where the majority class instance "0" denotes non-malicious activity and the minority class instance "1" indicates malicious activity. It is difficult to classify the minority class because there are far fewer minority class instances than majority class instances. So, class imbalance arises during classification, where the data is distributed unequally across the classes; this results in misclassification and misinterpretation of data. To
handle the class imbalance problem, data-level solutions such as oversampling and undersampling techniques are recommended.
Oversampling Techniques. The primary focus of an oversampling technique is to replicate minority class instances until the dataset is balanced. Since the number of minority class instances increases abruptly, the learning time also increases. This paper considers three oversampling algorithms to resample the imbalanced data: ROS, SMOTE and ADASYN. ROS is one of the most common oversampling techniques: it randomly replicates minority class instances, which raises the problem of overfitting. To overcome overfitting [4, 8, 12], artificial synthetic methods are recommended. In SMOTE, shown in Eq. (1), a new synthetic instance is generated by interpolating between a minority class instance x_i and one of its k-nearest minority neighbours x_zi [3, 4, 7, 9, 11, 12]:

x_new = x_i + λ(x_zi − x_i)    (1)

where λ is a random number between 0 and 1; balanced data is created by interpolation between x_i and x_zi. Minority instances can be generated using (i) the regular approach, (ii) the borderline approach using KNN, or (iii) the SVM approach [10]. ADASYN modifies SMOTE by generating artificial minority class instances according to a weight for each instance: it generates a number of synthetic instances for each minority instance proportional to the number of adjacent majority class instances [3, 7–10], so it concentrates on outlier or hard-to-learn minority class instances.
Undersampling Techniques. The primary focus of an undersampling technique is to eliminate majority class instances until the dataset is balanced. The decrease in the number of majority class instances decreases the learning time [6]. This paper focuses on five undersampling algorithms to balance the class-imbalanced data: RUS, NM-1, NM-2, T-L and ENN. RUS is one of the most common undersampling techniques: it removes majority class instances at random until the number of majority class instances equals the number of minority class instances [6, 8, 10, 12]; hence it can cause the loss of important information from the majority class. The idea of Near-Miss is to retain only those majority class instances necessary to differentiate the classes. In NM-1, a majority class instance is selected if it has the minimum average distance to its N nearest minority class instances. In NM-2, a majority class instance is chosen if it has the minimum average distance to its N farthest minority class instances. The objective of T-L [12] is to clean the majority class by eliminating borderline and outlier instances. A Tomek link between instances x and y of distinct classes exists if there is no instance z such that

d(x, z) < d(x, y) or d(y, z) < d(x, y)    (2)

where d is the distance between two instances; that is, the link exists if the two instances of distinct classes are each other's nearest neighbours. In ENN [11], an instance is eliminated if it disagrees with the majority of its k-nearest neighbours.
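As an illustration of the data-level sampling step, the sketch below applies the oversampling and undersampling techniques discussed above to a pre-processed feature matrix. It is a minimal sketch, not the authors' implementation; it assumes the imbalanced-learn library and a pandas DataFrame `data` with the transformed features and the "InsiderThreat" target of Table 3.

```python
# Minimal sketch: data-level sampling with imbalanced-learn (an assumed library,
# not necessarily the one used by the authors).
import pandas as pd
from imblearn.over_sampling import RandomOverSampler, SMOTE, ADASYN
from imblearn.under_sampling import (RandomUnderSampler, NearMiss,
                                     TomekLinks, EditedNearestNeighbours)

def resample_all(data: pd.DataFrame):
    """Apply each sampler to the transformed CERT-style data and
    return a dict of balanced (X, y) pairs."""
    X = data.drop(columns=["InsiderThreat"])   # hypothetical column names
    y = data["InsiderThreat"]                  # 0 = non-malicious, 1 = malicious

    samplers = {
        "ROS": RandomOverSampler(random_state=42),
        "SMOTE": SMOTE(random_state=42),
        "ADASYN": ADASYN(random_state=42),
        "RUS": RandomUnderSampler(random_state=42),
        "NM-1": NearMiss(version=1),
        "NM-2": NearMiss(version=2),
        "T-L": TomekLinks(),
        "ENN": EditedNearestNeighbours(),
    }
    balanced = {}
    for name, sampler in samplers.items():
        X_res, y_res = sampler.fit_resample(X, y)
        balanced[name] = (X_res, y_res)
    return balanced
```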
3.3 Classification
A supervised classification technique, the SVM classifier [3, 4, 12], is trained on the balanced data to detect malicious insider threats in an organization.
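A sketch of this classification step is shown below. It assumes scikit-learn, the hypothetical `balanced` dictionary from the previous sketch, and a standard stratified train/test split; the authors' exact training protocol is not specified beyond the use of an SVM.

```python
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.metrics import classification_report

def train_svm(X, y):
    """Train an SVM on (re)sampled data and report standard metrics."""
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=42)
    scaler = StandardScaler()
    X_tr = scaler.fit_transform(X_tr)
    X_te = scaler.transform(X_te)

    clf = SVC(kernel="rbf")          # kernel choice is an assumption
    clf.fit(X_tr, y_tr)
    print(classification_report(y_te, clf.predict(X_te), digits=4))
    return clf
```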
4 Results and Discussion
The following metrics are used to evaluate the proposed methodology: accuracy, sensitivity (recall), precision and F-score, computed from the true positives (TP), true negatives (TN), false positives (FP) and false negatives (FN). Accuracy is the most frequently used evaluation metric and is defined as the proportion of accurately predicted instances to the total number of instances:

Accuracy = (TP + TN) / (TP + FP + TN + FN)    (3)

Sensitivity, or recall, is the proportion of positive instances that are correctly predicted; it is a measure of completeness:

Sensitivity (Recall) = TP / (TP + FN)    (4)

Precision is a measure of exactness in predicting the positive instances, penalizing negative instances inaccurately predicted as positive:

Precision = TP / (TP + FP)    (5)

The F-score is the weighted harmonic mean of precision and recall in binary classification, where σ weights the relative contribution of recall and precision:

F-score = (1 + σ²) · Precision · Recall / (σ² · Precision + Recall)    (6)
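For concreteness, the metrics in Eqs. (3)–(6) can be computed directly from the confusion-matrix counts. The small helper below is a sketch; setting σ = 1 reduces the F-score to the usual F1.

```python
def evaluate(tp: int, tn: int, fp: int, fn: int, sigma: float = 1.0):
    """Compute accuracy, recall, precision and the sigma-weighted F-score
    from confusion-matrix counts, following Eqs. (3)-(6)."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    recall = tp / (tp + fn)
    precision = tp / (tp + fp)
    f_score = (1 + sigma**2) * precision * recall / (sigma**2 * precision + recall)
    return {"accuracy": accuracy, "recall": recall,
            "precision": precision, "f_score": f_score}

# Example with a hypothetical confusion matrix
print(evaluate(tp=84, tn=85, fp=15, fn=16))
```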
The performance of the SVM classifier is compared using the above-mentioned evaluation metrics. Table 4 reports the performance of the SVM classifier after applying the different oversampling and undersampling techniques. From Table 4, it is observed that the performance of ADASYN, ROS and SMOTE remains the same and the recall for non-malicious events is low; hence, handling the imbalanced data using oversampling techniques is difficult. Among the undersampling techniques, the F-score of NM-1 is the lowest and its specificity is unsatisfactory, as it fails to detect malicious activity. ENN and T-L achieve equal, modest performance. NM-2 surpasses ROS and the other sampling techniques by a wide margin, and thus achieves the highest recall, precision, F-score and accuracy.
Table 4 Performance metrics of the eight sampling methods

Method | Accuracy | F-score | Precision | Recall
Oversampling techniques
ADASYN | 0.680375 | 0.80 ± 0.03 | 0.99 ± 0.02 | 0.67 ± 0.77
ROS | 0.680375 | 0.80 ± 0.03 | 0.99 ± 0.02 | 0.67 ± 0.77
SMOTE | 0.680375 | 0.80 ± 0.03 | 0.99 ± 0.02 | 0.67 ± 0.77
Undersampling techniques
ENN | 0.680375 | 0.80 ± 0.03 | 0.99 ± 0.02 | 0.67 ± 0.77
NM-1 | 0.319625 | 0.48 ± 0.00 | 0.97 ± 0.00 | 0.32 ± 0.22
NM-2 | 0.84325 | 0.91 ± 0.02 | 0.99 ± 0.01 | 0.84 ± 0.28
RUS | 0.716625 | 0.83 ± 0.04 | 0.99 ± 0.02 | 0.71 ± 0.74
T-L | 0.680375 | 0.80 ± 0.03 | 0.99 ± 0.02 | 0.67 ± 0.77
Table 5 Comparison of SVM classifier performance using imbalanced and balanced data

Data | Accuracy | F-score | Precision | Recall
Imbalanced data | 0.991625 | 0.99 ± 0.00 | 0.99 ± 0.00 | 1.00 ± 0.00
Balanced data | 0.84325 | 0.91 ± 0.02 | 0.99 ± 0.01 | 0.84 ± 0.28
Table 5 compares the performance of the SVM classifier using imbalanced and balanced data. From Table 5, it is observed that the classifier trained on imbalanced data fails to detect malicious activity despite its high accuracy. The recall of the SVM classifier using balanced data shows it predicts both classes more reliably than with imbalanced data; precision and F-score remain strong with balanced data, while accuracy stays satisfactory for the detection of malicious activity.
5 Conclusion and Future Enhancement
This research implements different oversampling and undersampling strategies to combat imbalanced class data within a classification prediction model. The CERT dataset, which includes malicious insider threats, is used, and SVM is applied for classification. The performance of the SVM classifier before and after sampling is compared using various performance metrics. The undersampling techniques outperformed the oversampling techniques in handling the imbalanced CERT dataset with the SVM classifier. NM-2 works better than the other sampling techniques based on F-score and recall: it eliminates majority class instances safely, resulting in improved performance over RUS, NM-1, ENN and T-L. In the near future, deep learning and other sampling techniques can be applied to the CERT data to further improve performance.
Acknowledgements This work is supported by Centre for Cyber Intelligence (CCI), DST-CURIEAI-Phase II Project, Avinashilingam Institute for Home Science and Higher Education for Women, Coimbatore, Tamilnadu, India - 641027.
References 1. Le D, Heywood Z (2020) Exploring anomalous behaviour detection and classification for insider threat identification. Int J Netw Manage 31(4):e2109 2. Devi D, Biswas SK, Purkayastha B (2020) A review on solution to class imbalance problem: undersampling approaches. In: 2020 international conference on computational performance evaluation (ComPE), pp 626–631 3. Gosain A, Sardana S (2017) Handling class imbalance problem using oversampling techniques: a review. In: 2017 international conference on advances in computing, communications and informatics (ICACCI), pp 79–85 4. Dittman DJ, Khoshgoftaar TM, Wald R, Napolitano A (2014) Comparison of data sampling approaches for imbalanced bioinformatics data. In: The twenty-seventh international FLAIRS conference, pp 268–271 5. Junsomboon N, Phienthrakul T (2017) Combining over-sampling and under-sampling techniques for imbalance dataset. In: Proceedings of the 9th international conference on machine learning and computing, pp 243–247 6. Hasanin T, Khoshgoftaar T (2018) The effects of random undersampling with simulated class imbalance for big data. In: 2018 IEEE international conference on information reuse and integration (IRI), pp 70–79 7. He H, Bai Y, Garcia EA, Li S (2008) ADASYN: adaptive synthetic sampling approach for imbalanced learning. In: 2008 IEEE international joint conference on neural networks (IEEE world congress on computational intelligence), pp 1322–1328 8. Yap BW, Abd Rani K, Abd Rahman HA, Fong S, Khairudin Z, Abdullah NN (2014) An application of oversampling, undersampling, bagging and boosting in handling imbalanced datasets. In: Proceedings of the first international conference on advanced data and information engineering (DaEng-2013). Springer, Singapore, pp 13–22 9. Fujiwara K et al (2020) Over- and under-sampling approach for extremely imbalanced and small minority data problem in health record analysis. Front Public Health 8:178. https://doi. org/10.3389/fpubh.2020.00178 10. Bunkhumpornpat C, Subpaiboonkit S (2013) Safe level graph for synthetic minority oversampling techniques. In: 2013 13th international symposium on communications and information technologies (ISCIT). IEEE, pp 570–575 11. Abdi L, Hashemi S (2015) To combat multi-class imbalanced problems by means of oversampling techniques. IEEE Trans Knowl Data Eng 28(1):238–251 12. Elhassan T, Aljurf M (2016) Classification of imbalance data using Tomek link (T-link) combined with random under-sampling (RUS) as a data reduction method. Global J Technol Optim S1:11 13. Glasser J, Lindauer B (2013) Bridging the gap: a pragmatic approach to generating insider threat data. In: 2013 IEEE security and privacy workshops, pp 98–104 14. Meng F, Lou F, Fu Y, Tian Z (2018) Deep learning based attribute classification insider threat detection for data security. In: 2018 IEEE third international conference on data science in cyberspace (DSC), pp 576–581 15. Pengfei J, Chunkai Z, Zhenyu H (2014) A new sampling approach for classification of imbalanced data sets with high density. In: 2014 international conference on big data and smart computing (BIGCOMP), pp 217–222 16. Guo H, Li Y, Shang J, Mingyun G, Yuanyue H, Bing G (2017) Learning from class-imbalanced data: review of methods and applications. Expert Syst Appl 73:220–239
Chapter 37
Customer Churn Analysis Using Machine Learning Ritika Tyagi and K. Sindhu
1 Introduction
'Churn', in simple terms, can be defined as the number of units leaving a specific region of use over a period of time, and it is an important factor of consideration in business. Customer churn rate (CCR) is the rate at which customers of a company stop using its services or stop doing business with the company, whether the company is B2B or B2C. It is most commonly calculated as the percentage of unsubscribed customers with respect to the total customers of a company within a specific time range. Keeping track of a company's rate of attrition is of utmost importance when evaluating its success. Since the cost of acquiring a new customer is often much higher than the cost of retaining one, customer churn analysis is being explored in depth, and companies are trying to find faster and more efficient ways to retain their customers. The lack of efficient churn models currently available, and the sense of ambiguity that remains in companies when they try to predict their churn rate, was the primary motivation for performing churn analysis. Ahmed et al. [4] discuss how churn is an especially tremendous problem in the telecommunications industry. According to research, the top four wireless companies in the United States have a monthly churn rate of approximately 1.9%–2%, indicating that companies not only need to invest heavily in customer churn models, but also ensure that they are up to date and state-of-the-art. Churn management is a crucial task for companies for many reasons, one of them being that it greatly affects the revenue generated by a company. In the
past, neural networks have been used to build churn analysis models, but they are not state-of-the-art. The proposed work aims to build a churn analysis model that can predict the target variable 'churn', hence predicting whether a particular customer will stop using the company's services. Companies can leverage this to their advantage using several machine learning techniques such as cross validation, hyperparameter tuning of the cross-validated models and even building an ensemble model. For classification purposes, six algorithms are chosen to perform churn classification. Unit and integration testing was also performed on the built model to verify its performance and to maintain a report for each individual working model as well as the complete model. For performance evaluation, accuracy, precision, F1-score, recall and confusion matrices are used to evaluate each model being built.
2 Literature Survey
Ahn et al. [1] discuss how, due to the sudden growth of the telecom industry, the demand for churn prediction has skyrocketed, as keeping the existing customer base is quite economical compared to acquiring new customers. Due to advancements in computing power and machine learning, companies are now capable of processing data efficiently and predicting the churn rate. In this paper, a new model for churn analysis is proposed which uses the data mining software WEKA. Decision tree and logistic regression algorithms are implemented and their accuracy is compared. For the implementation phase, they used a considerably sized dataset having 50 numeric variables and 200 rows; a larger dataset was also used, having 100 variables and 608 rows or entries. Hung et al. [2] discuss how the liberalization of Taiwan's telecom industry led to rapid growth and competition among local service providers. In order to grow their customer base, many of them started to look for various techniques to retain and secure new customers. This paper shows how data mining was prominent in finding the churn rate score; various data mining techniques were compared and used to assign a "propensity to churn" value to the customers during a particular period of time. Ullah et al. [3] discuss how the telecom industry can use its subscriber database to develop essential and effective churn rate prediction models. Business analysts and customer relationship managers in the telecom industry can use the churn reports to manage and retain their customers by grouping them into categories and providing them with suitable offers. Here, the first model implemented is Random Forest with 88% accuracy, followed by clustering techniques such as K-means clustering, which is used to segregate the customer base into similarly grouped clusters. Ahmed et al. [4] suggest that customer defection prediction is gaining popularity in the corporate world, particularly in the telecom industry. Many scholars have introduced various churn forecasting models that are heavily involved in data mining techniques and incorporate machine learning as well as metaheuristic techniques. This paper's goal is to look at some of the most significant churn prediction approaches that have been developed over time. The paper depicts hybrid approaches, such as hybrid
machine learning, hybrid meta-heuristics, or even both, which yield excellent churn forecast accuracy. SVM, ANN, SOM and other hybrid models deliver high accuracy scores while reducing processing complexity. Brandusoiu et al. [5] discuss an innovative data mining algorithm that forecasts customer attrition in the pre-paid mobile telecommunications industry using call log data with 3333 customers and 21 attributes apiece. To minimize the dimensionality of the data and resolve the risk of multicollinearity, they first use the principal component analysis (PCA) approach. On the obtained principal components and discrete variables, three machine learning algorithms are implemented (neural networks, support vector machines and Bayesian networks) in order to develop the prediction models. The gain measure and the ROC curve are used to finally evaluate the models. Tsai et al. [6] use two hybrid models for churn prediction that combine two distinct neural network techniques: back-propagation artificial neural networks (ANN) and self-organizing maps (SOM). The hybrid models are ANN coupled with ANN (ANN + ANN) and SOM combined with ANN (SOM + ANN). Three types of testing sets are examined to evaluate the performance of these models: the general testing set and two fuzzy testing sets built on screened-out data. Hadden et al. [7] discuss that it is easier to retain old customers than to acquire new ones, so companies are investing heavily in research and spending their resources to retain their consumers. Keeping this in mind, churn analysis should be made more accurate through ML/AI-built models instead of relying on the previously used manual methods, which are not only cumbersome but also inaccurate. The techniques presented were decision trees, regression analysis and other conventional methods. Vafeiadis et al. [8] present various machine learning models for churn analysis. The models discussed in the paper are first trained and then tested using cross validation. A wide range of machine learning algorithms were then compared by their accuracy and their rates of type 1 and type 2 errors. The results showed that boosted algorithms such as AdaBoost and gradient boosting are far better classifiers than the plain algorithms; therefore, they are better suited for churn analysis. Xia et al. [9] suggest that support vector machines can be used in churn analysis to improve prediction accuracy. The support vector machine was compared to other machine learning algorithms such as decision trees, Naive Bayes, logistic regression and various artificial neural networks; SVM outperformed the other algorithms in terms of accuracy, proving it to be well suited for churn prediction models.
3 Proposed Work
Figure 1 presents the workflow of the chain of events for the machine learning model.
Fig. 1 Proposed system architecture
The data store is populated with the dataset extracted from Kaggle (a dataset of a telecommunications company), which is then fed into feature generation and other exploratory data extraction techniques. Finally, the generated output is analyzed and performance tuning factors are put in place for correct evaluation of the efficiency and optimality of the built model. The prediction model segments the customer history dataset into different segments. The customer attributes and usage patterns are derived from the dataset, which consists of multiple features. The data is then randomly split into training and testing sets (20% of the data going into testing), after which the models are built; finally, testing is performed on all the models to check their churn prediction classification reports, which indicate how efficient and accurate they are. Each model is built incrementally, on top of the previous model, by performing cross validation and hyperparameter tuning on the models. An ensemble model is also built comprising four algorithms (Logistic Regression, AdaBoost Classifier, XGBoost Classifier and SVC), and hard voting is performed on the ensemble model to check its performance. Once the classification model is built, apart from being able to predict the churn value per customer, the churn rate of the company is also calculable, as shown in Eq. (1):

Customer Churn Rate (CCR) = (total no. of customers lost during the period) / (total no. of customers at the beginning of the period)    (1)
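A minimal sketch of the hard-voting ensemble described above, together with Eq. (1), is given below. It assumes scikit-learn and the xgboost package and uses the four base learners named in the text; hyperparameters are placeholders rather than the authors' actual settings.

```python
from sklearn.ensemble import VotingClassifier, AdaBoostClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from xgboost import XGBClassifier

def build_hard_voting_ensemble():
    """Hard-voting ensemble of the four base learners named in the text."""
    estimators = [
        ("lr", LogisticRegression(max_iter=1000)),
        ("ada", AdaBoostClassifier()),
        ("xgb", XGBClassifier(eval_metric="logloss")),
        ("svc", SVC()),           # hard voting uses predicted labels only
    ]
    return VotingClassifier(estimators=estimators, voting="hard")

def churn_rate(customers_lost: int, customers_at_start: int) -> float:
    """Eq. (1): customer churn rate over a period."""
    return customers_lost / customers_at_start
```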
Data Transformation. From the imported dataset, missing values were searched for initially, but in the first attempt none were found. To investigate this, all the labels were converted from their object notations to their numeric values, and 11 missing values were found; the rows containing these 11 missing values were deleted from the dataset entirely. The features with string values were also encoded to numeric values. This helps the model understand the nature of the dataset better and eventually leads to a more accurate model, as it reduces the risk of overlooking data curves.
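The sketch below illustrates this transformation step with pandas. The column names are taken from the public Kaggle telecom-churn dataset and are assumptions; the exact columns used by the authors are not listed here.

```python
import pandas as pd

def transform(df: pd.DataFrame) -> pd.DataFrame:
    """Convert object columns to numeric values and drop rows with missing data."""
    # 'TotalCharges' is read as an object; coercion exposes hidden missing values.
    df["TotalCharges"] = pd.to_numeric(df["TotalCharges"], errors="coerce")
    df = df.dropna().copy()                       # drops the few missing rows

    # Binary target: 1 if the customer churned, 0 otherwise.
    df["Churn"] = (df["Churn"] == "Yes").astype(int)

    # One-hot encode the remaining categorical features into 0/1 dummy columns.
    df = pd.get_dummies(df.drop(columns=["customerID"]))
    return df
```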
Feature Analysis. From the dataset, two different types of correlation graphs were plotted to identify the features heavily correlated with the predictor variable 'churn', and the features having little to no importance to the predictor variable were noted. Many features were also plotted independently, such as the gender ratio (to see whether gender has an impact on whether a customer leaves the company's services), churn rate with respect to dependents (whether the customer has any financial dependents), partners (to see whether being single or having a partner affects a customer's churn decision), senior citizen status (to better understand the company's target demographic), the customer's contract, monthly charges and total charges (to see whether the contract and its charges have a major effect on the final decision of a customer), and even the different company services provided to the customer, to gather insights about the nature of churn in the company. Plotting individual graphs for some of the important features produced better insights from the data analysis being performed.
Model Engine. For the model building part, six different algorithms were used to obtain the best accuracy possible: Logistic Regression, AdaBoost Classifier, XGBoost Classifier, Decision Tree Classifier, Random Forest and SVC. The performance of each model was evaluated using various metrics such as the confusion matrix, F1-score, precision, recall and accuracy. Checking accuracy alone is not enough to correctly evaluate a built model, because the cost of a false negative or a false positive also needs to be taken into consideration.
Model Output. From those six algorithms, the most well-suited and efficient algorithm is to be chosen. After analysing the confusion matrix, F1-score, precision, recall and accuracy (the performance metrics chosen for evaluation), XGBoost Classifier was chosen as the best-suited algorithm for the chosen dataset. An ensemble model was also built in the hope of a higher accuracy score; however, XGBoost Classifier outperformed even the ensemble learning model for the prediction of customer churn.
Performance Tuning. To further increase the model's accuracy, model tuning was performed using methods such as cross validation (a way of fragmenting the dataset to reduce risks such as overfitting) and hyperparameter tuning through RandomizedSearchCV on the cross-validated data. An ensemble model was also built using four of the six previously used algorithms. Finally, the best working model was selected based on its performance metric scores.
3.1 Data Analysis
The dataset chosen is of a telecommunication company and is extracted from Kaggle [10]. The classification model built should be efficient and have fairly good accuracy without overfitting the dataset. Extensive data analysis should also be performed on all the features, including checking whether the dataset contains any missing values. Finally, one model is to be chosen that best fits the churn prediction of the telecommunication company dataset, by performing not only performance evaluation but also performance tuning through cross validation or tuning of hyperparameters. The final model should efficiently predict the 'churn' rate of companies end to end. The dataset consists of 21 features such as gender, partner, dependents and tenure, the various company services subscribed to by the customer, such as phone service, internet service, online security and device protection, and many other telecom-related features, apart from the final predictor variable 'churn'. Next, missing values were searched for in the dataset to look for inconsistencies. Initially, no missing values were found; however, that seemed like an inaccurate result, because the dataset was directly extracted from Kaggle and represents real data of a telecommunication company, so missing values are likely. When the 'isNull()' function was used to check for missing values, it gave an output of 0 for all the features. Since at least a few missing values are very likely for such a dataset, the data types of all the features were checked to see whether a data type was hiding missing values. Total charges and some other features were seen to be of object data type rather than integer or float. The total charges feature was converted from the object data type to float64, after which missing values were checked for again; 11 missing values were found in total charges, and these rows were removed before building the model. Most of the features were not in numeric format, so they had to be converted into dummy variables in a numeric format to be representative and understandable to the model. The number of features increased, as each non-numeric feature was converted into 'n' features, where 'n' is the number of classes the feature has. Now the dataset is trainable, because all the data is in a '0'/'1' format. Further, the correlation of the different features was checked with respect to the 'churn' variable to see whether any particular features are heavily correlated (either positively or negatively) with it. Figure 2 shows the pairwise correlation of all the features in the dataset to facilitate a better understanding of which feature or set of features is heavily dependent on the predictor variable 'churn'. The numbers on the y-axis of Fig. 2 represent the degree of correlation: a positive number indicates that the features are positively correlated, that is, if one of the features in question increases, the other increases too; similarly, a negative value represents negative correlation, that is, as one feature's value increases, the other's decreases. It also gives an understanding of which features not to focus on, as a lower correlation implies that the particular feature is not a major contributor to the final decision of a customer.
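A small sketch of this correlation check is shown below; it assumes the one-hot-encoded DataFrame produced by the earlier transformation sketch and uses pandas/matplotlib, which may differ from the authors' exact plotting setup.

```python
import matplotlib.pyplot as plt

def plot_churn_correlation(df):
    """Bar plot of each feature's correlation with the 'Churn' target."""
    corr = df.corr()["Churn"].drop("Churn").sort_values()
    corr.plot(kind="barh", figsize=(8, 12),
              title="Correlation of features with churn")
    plt.tight_layout()
    plt.show()
```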
Fig. 2 Pairwise correlation of all the features in the dataset (Y-axis negative represents inversely related)
Fig. 3 No. of customers with respect to the tenure
As represented in Fig. 2, gender has very little to do with the 'churn' variable, showing minimal correlation in the graph, while some features such as a month-to-month contract, no online security services, no technical support and no online backup are positively correlated with churn. This indicates that more customers leave the company on a monthly contract than on a yearly one, and that customers give importance to services provided by a telecommunications company such as online security, technical support and other services. Monthly charges are also an
important factor when it comes to the churn rate. Bringing costs down in a manner that does not affect company revenue but reduces the churn rate is critical for company success. From a gender distribution graph, it was noticed that gender was not a critical attribute for churn and that the male-to-female ratio was almost balanced, so it is not indicative of any underlying churn characteristics related to gender. It was also noticed that 16.2% of the customers are senior citizens, which is a large share; some company programs should be altered to cater to the senior citizen demographic so as to reduce the churn rate in that customer segment, because senior citizens comprise a high percentage of the company's total customers. The partner and dependent status show that about 50% of the customers have a partner, while only 30% of the total customers have a dependent. Interestingly, among the customers who have a partner, only about half also have a dependent, while the other half do not. Additionally, as expected, among the customers who do not have a partner, a majority (90%) do not have any dependents. The variations in the percentage of customers with and without dependents and partners by gender were also examined; there is no difference in their distribution by gender, and there is likewise no difference in senior citizen status by gender. As analysed from a tenure versus number-of-customers graph, there is a sharp rise in the beginning when customers enrol, but almost 50% of them drop out after just one month of staying with the company. While the churn rate remains stagnant over the following months, there is an increase in the number of customers (i.e., a decrease in churn rate) after 70 months. This shows that customer contracts play a huge role in the churn rate of the company. Companies should try to enrol customers in longer contracts rather than monthly contracts; even though monthly contracts are more enticing for customers, it is easier for a customer to leave when on a monthly contract. Contracts should be decided analytically based on the demographic in question. As discussed earlier, although the maximum number of customers is attracted by a month-to-month contract, it also leads to the highest churn rate, which in the long run is not beneficial for the company. In Fig. 3, it is noticed that two-year contracts are the most favourable for the company. While monthly contracts show a sharp decrease in the number of customers after a month, one-year contracts fail to generate a rise in customers even after reaching the 70-month mark. It also shows that customers take their own time and experiences to develop a level of trust and relationship with the company. Phone service, online security and tech support are crucial to keep customer retention at pace, as analysed from customer usage of all the telecommunication services the company provides. Also, from our dataset, 26.6% of the customers churn, which is a very high churn rate for the telecommunication company; this can also result from some skewed values in the extracted dataset. The maximum churn rate is among monthly-contract customers. A two-year contract seems to be more beneficial for the company, with 97% of the customers on a two-year contract not leaving the company's services, which can be very economical for the company. Senior citizens have approximately twice the churn rate of non-senior citizens. Since
senior citizens comprise 16.2% of the customers, their demands need to be met to reduce the churn rate.
Fig. 4 Monthly charges by churn
In Fig. 4, when the monthly charges are low, the density of customers who do not churn is very high, whereas the density of customers who churn increases as monthly charges increase. This shows that customers have almost no tolerance for higher charges, irrespective of the services provided, and will switch to different companies with cheaper service charges.
4 Results
Once the dataset was imported, missing values were identified and rectified for better model performance. As mentioned earlier, the project was broken down into two parts: data analysis (to get better insights into the nature of the dataset) followed by a classification model that predicts the target variable 'churn' for any new customer. From our data analysis, it is clear that the customer churn rate for the company is high and needs to be reduced. Many interesting insights were gained in the data analysis phase, some of which are discussed below; perhaps the company can understand its data better with these insights and make changes accordingly to reduce its churn rate. Although the gender distribution seems to be of little importance to the churn rate, it is clear that senior citizens have a higher churn rate and their demands have not been properly catered to. Perhaps most services are developed keeping the newer generation in mind, but the senior citizen demographic cannot take a back seat, as it comprises 16.2% of the customer population. Also, services such as online security, phone service and technical support play a critical role in maintaining customer retention. Newer and more minor services such as device protection do not seem to be of as much importance as expected when it comes to the churn rate, and it is observed that customers rely on much more fundamental characteristics such as monthly and total charges when
deciding whether they want to churn or not. Service charges definitely need to be reduced in a manner that does not increase the fixed and variable costs of the company much, while also attracting a larger customer base. Customer contracts also tend to play a key role in churn, proving again and again that monthly contracts fail at customer retention even though their initial customer attraction is very high. As the cost of retaining a customer is much lower than the cost of acquiring one, it is of fundamental importance to retain the existing customer base by shifting them to a longer contract basis. Two-year contracts have proven to be more beneficial for the company, with 97% of customers on two-year contracts not churning. Focusing on model building, ultimately we have to predict the 'churn' variable, that is, predict a particular customer's churn value. We have used six widely used and appropriately suited algorithms for classification purposes: Logistic Regression, AdaBoost Classifier, XGBoost Classifier, Decision Tree Classifier, Random Forest and SVC were the algorithms used for the model being built. After building the models, their accuracies were: Logistic Regression 82.01%, AdaBoost Classifier 81.59%, XGBoost Classifier 82.51%, Decision Tree Classifier 74.69%, Random Forest 79.53% and SVC 82.01%. XGBoost Classifier had the highest accuracy, while Decision Tree gave the lowest accuracy for predicting customer churn. XGBoost Classifier is known to perform well for tabular and structured data and is often the prize-winning model in Kaggle open-source competitions; it works on the concept of gradient descent and combines many weak learners into a strong learner. Figure 5 shows the confusion matrices of all six algorithms, with each sub-square representing true positives, false positives, true negatives and false negatives.
Fig. 5 Confusion matrices of all the six algorithms
The confusion matrices of all six algorithms were analysed, showing that XGBoost Classifier gives the maximum true positives (969) and minimum false positives (83). The models were further developed by performing cross validation on all six models. K-fold cross validation was performed on all the models to check whether feeding data in k folds would increase model efficiency. After cross validation, it was found that AdaBoost (80.51%) gave the maximum accuracy among the cross-validated models, but it was still lower than the non-cross-validated XGBoost model. Hence, XGBoost Classifier was further modified by performing hyperparameter tuning. The hyperparameter-tuned XGBoost model gave approximately the same accuracy as the XGBoost Classifier from the first build, with the accuracy of the first build still marginally higher at the second decimal place. Finally, an ensemble model was built to check whether combining four algorithms and using their results to generate the final churn prediction would break the accuracy record of the XGBoost Classifier. Hard voting was performed on the built ensemble model to predict churn. However, the ensemble model also fell marginally short when compared to the XGBoost Classifier, leading to the conclusion that XGBoost Classifier is the chosen algorithm for the prediction of churn in the model built. For predicting the churn value of a new customer in the future, XGBoost Classifier should be used as the algorithm of choice while building a classification model, as it will lead to the most
accurate predictions. This will help companies trying to build an efficient, state-of-the-art customer churn prediction model. Compared with the literature survey, the final built model differs in the type and number of algorithms used as well as in how it was incrementally built. Unit testing was performed in such a manner that, before cross validation, the previously built models were not only tested but also had their performance evaluated. Performance evaluation was carried out not only on the final model but also at each step of the build; hence, every further step was in the direction of producing better performance metric results than the previous build, while each build was complete in itself. Finally, all the models were evaluated to see which algorithm and training method outperforms the others. This provided a feedback mechanism and also provided cover for any possible failures in future builds.
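The sketch below illustrates the incremental build described above: a baseline XGBoost model, k-fold cross validation, and a randomized hyperparameter search. It assumes scikit-learn and xgboost; the parameter ranges and fold count are placeholders rather than the authors' actual settings.

```python
from scipy.stats import randint, uniform
from sklearn.model_selection import cross_val_score, RandomizedSearchCV
from xgboost import XGBClassifier

def tune_xgboost(X_train, y_train):
    """Baseline XGBoost, k-fold CV score, then randomized hyperparameter search."""
    base = XGBClassifier(eval_metric="logloss")
    cv_acc = cross_val_score(base, X_train, y_train, cv=5, scoring="accuracy")
    print("5-fold CV accuracy:", cv_acc.mean())

    param_dist = {                      # illustrative search space
        "n_estimators": randint(100, 500),
        "max_depth": randint(3, 10),
        "learning_rate": uniform(0.01, 0.3),
        "subsample": uniform(0.6, 0.4),
    }
    search = RandomizedSearchCV(base, param_dist, n_iter=25, cv=5,
                                scoring="accuracy", random_state=42)
    search.fit(X_train, y_train)
    return search.best_estimator_, search.best_params_
```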
5 Conclusion
The machine learning model was built incrementally. The initial phase involved data analysis, which proved successful by giving insights about the nature of the customers, the appropriate contract for the lowest customer churn rate, the company services that are more crucial than others in reducing the customer churn rate, and much more. This was followed by the model building process, which involved training six different algorithms to accurately predict the 'churn' value per customer. Cross validation was then performed in an attempt to increase the accuracy of the algorithms being used, and hyperparameter tuning was performed on XGBoost Classifier as it continued to give the highest accuracy. An ensemble model was also trained, but it proved to be marginally less accurate (82.23%) than XGBoost Classifier (82.37%). XGBoost Classifier should be the chosen algorithm for further use in predicting the customer churn value of any new or existing customers of the company.
References 1. Ahn JH, Han SP, Lee YS (2006) Customer churn analysis: churn determinants and mediation effects of partial defection in the Korean mobile telecommunications service industry. Telecommun Policy 30(10–11):552–568 2. Hung SY, Yen DC, Wang HY (2006) Applying data mining to telecom churn management. Expert Syst Appl 31(3):515–524 3. Ullah I, Raza B, Malik AK, Imran M, Islam SU, Kim SW (2019) A churn prediction model using random forest: analysis of machine learning techniques for churn prediction and factor identification in telecom sector. IEEE Access 7:60134–60149 4. Ahmed A, Linen DM (2017) A review and analysis of churn prediction methods for customer retention in telecom industries. In: 2017 4th International Conference on Advanced Computing and Communication Systems (ICACCS), January. IEEE, pp 1–7
5. Brândușoiu I, Toderean G, Beleiu H (2016) Methods for churn prediction in the pre-paid mobile telecommunications industry. In: 2016 international conference on communications (COMM), June. IEEE, pp 97–100 6. Tsai CF, Lu YH (2009) Customer churn prediction by hybrid neural networks. Expert Syst Appl 36(10):12547–12553 7. Hadden J, Tiwari A, Roy R, Ruta D (2007) Computer assisted customer churn management: state-of-the-art and future trends. Comput Oper Res 34(10):2902–2917 8. Vafeiadis T, Diamantaras KI, Sarigiannidis G, Chatzisavvas KC (2015) A comparison of machine learning techniques for customer churn prediction. Simul Model Pract Theory 55:1–9 9. Xia GE, Jin WD (2008) Model of customer churn prediction on support vector machine. Syst Eng Theory Pract 28(1):71–77 10. https://www.kaggle.com/radmirzosimov/telecom-users-dataset
Chapter 38
A Comparative Study of Hyperparameter Optimization Techniques for Deep Learning Anjir Ahmed Chowdhury, Argho Das, Khadija Kubra Shahjalal Hoque, and Debajyoti Karmaker
1 Introduction
Deep Learning (DL) is a subfield of machine learning that deals with artificial neural networks, algorithms inspired by the structure and function of the brain. Deep learning is a critical component of self-driving automobiles, allowing them to detect a stop sign or discriminate between a pedestrian and a lamppost [1]. Deep learning can be used to analyze the performance of electrically driven extraterrestrial rovers [2] and lunar rovers [3], and for terrain recognition, classification and object detection from drones [4], deep-sea organism tracking [5], action recognition [6, 7] and parameter estimation for autonomous robots [8]. Deep learning models possess important parameters that cannot be estimated directly from the data [9, 10]. Since there is no analytical technique to derive an acceptable value for this type of model parameter, it is referred to as an optimization parameter. Hyperparameters are frequently used in processes to aid in the estimation of model parameters and are frequently specified by the user. In most circumstances, a heuristic method based on one's experience is used to tune these hyperparameters, such as choosing starting values for the hyperparameters or finding the optimal values for a certain problem through trial and error. A hyperparameter optimization (HPO) procedure is expected to give the best model architecture for a deep learning model. There are numerous compelling reasons to use hyperparameter approaches in deep learning models. Because many DL developers spend a significant amount of time tweaking hyperparameters, especially for large datasets or complicated DL algorithms with a high number of hyperparameters, it
decreases the amount of human labor necessary [11]. It also boosts the performance of deep learning models, since many deep learning hyperparameters have distinct optimums for different datasets or situations. Finally, it improves the reproducibility of models and studies: different DL algorithms can only be compared properly when the same level of hyperparameter optimization is used, so utilizing the same hyperparameter method on different DL algorithms also aids in determining the best DL model for a certain problem. In hyperparameter tuning, the optimization process is crucial, because optimization is the process of enhancing a model's performance while avoiding overfitting or excessive variance. Many open-source packages exist to help DL developers put theory into practice and decrease the barrier of entry for HPO challenges. The Keras Tuner API [12] was used for this study because of its ease of use; hyperparameter tweaking can be easily incorporated into training scripts using Keras Tuner. The Keras Tuner API provides common algorithms such as Random Search, Hyperband and Bayesian optimization for optimizing Deep Neural Networks (DNNs). The Genetic Algorithm and Particle Swarm Optimization were implemented from scratch, and grid search was implemented using the scikit-learn package [13]. The motivation of this study is to:
– help DL users, developers, data analysts and researchers use and optimize DL models using HPO algorithms and frameworks;
– aid in a better understanding of the existing issues in the HPO field, allowing future HPO and DL research to move forward;
– conduct a numerical analysis of all the well-known HPO algorithms;
– observe the performance consistency of the different HPO algorithms across different architectures and datasets.
2 Hyperparameter Optimization Techniques 2.1 Grad Student Descent Optimization A simple hyperparameter optimization approach is babysitting, often known as ‘Trial and Error’ or grad student descent (GSD) [14]. This approach is carried out entirely manually and is extensively utilized by students and academics. The procedure is straightforward: after creating a machine learning model, a student tests a variety of hyperparameter values based on prior experience, guesswork, or analysis of previously evaluated results; this process is repeated until the student runs out of time (often due to a deadline) or is satisfied with the results. As a result, this method needs a significant quantity of past knowledge and expertise in order to quickly find ideal hyperparameter values.
Algorithm 1. Bayesian Approach for HPO
Place prior over f;
Define an acquisition function that, given a posterior distribution, determines new sample locations;
Evaluate n random samples of f;
while i < budget do
    Determine posterior distribution conditioned on the current samples of f;
    Find new parameters x_i by maximizing the acquisition function;
    Evaluate f(x_i) and store output in o_t;
    Increment i;
end while
return o_t
2.2 Grid Search Grid search is an optimization approach that aims to find the best hyperparameter values. It is a brute-force or exhaustive search strategy that analyzes all hyperparameter combinations supplied to the grid of setups. Grid search calculates the Cartesian product of a finite collection of values supplied by the user. Hence, it is unable to further utilize the high-performing areas on its own. To find the global optimums, start with a broad search space and step size, then limit the search space and step size depending on the results of prior well-performing hyperparameter configurations.
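Since the text notes that grid search was implemented with scikit-learn, a minimal sketch of that usage is given below; the estimator and parameter grid are illustrative assumptions, not the configuration used in the study.

```python
from sklearn.model_selection import GridSearchCV
from sklearn.neural_network import MLPClassifier

def run_grid_search(X_train, y_train):
    """Exhaustively evaluate the Cartesian product of the listed hyperparameter values."""
    param_grid = {
        "hidden_layer_sizes": [(64,), (128,), (64, 64)],
        "learning_rate_init": [1e-2, 1e-3, 1e-4],
        "alpha": [1e-4, 1e-3],
    }
    search = GridSearchCV(MLPClassifier(max_iter=300), param_grid,
                          cv=3, scoring="accuracy")
    search.fit(X_train, y_train)
    return search.best_params_, search.best_score_
```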
2.3 Bayesian Optimization
When deciding which hyperparameter set to examine next, Bayesian optimization considers previous evaluations. By making intelligent parameter combinations, it is able to focus on those portions of the parameter space that it believes will yield the most promising validation scores. This method usually requires fewer iterations to arrive at the best set of hyperparameter values and, most significantly, it ignores those portions of the parameter space that it believes will not contribute anything. The Bayesian optimization algorithm is shown in pseudo-code in Algorithm 1. An acquisition function is defined in the pseudo-code: any function that reflects the location that should be examined next can be used as the acquisition function. Expected Improvement is a function that is frequently utilized:

EI(x) = E[max(f(x*) − f(x+), 0)]    (1)

where x* represents the proposal parameters and x+ represents the best parameters assessed so far. This expectation has a closed-form solution, given by

EI(x) = δΦ(Z) + σ(x*)φ(Z)    (2)
where δ = μ(x*) − f(x⁺) and

Z = δ / σ(x*)  if σ(x*) > 0,  and  Z = 0  otherwise.    (3)
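The following sketch evaluates Eqs. (2)–(3) for a batch of candidate points, assuming a Gaussian surrogate that supplies a posterior mean and standard deviation at each point; the numbers in the example call are made up.

import numpy as np
from scipy.stats import norm

def expected_improvement(mu, sigma, f_best):
    # Eqs. (1)-(3): EI = delta * Phi(Z) + sigma(x*) * phi(Z), with
    # delta = mu(x*) - f(x+) and Z = delta / sigma(x*) when sigma(x*) > 0.
    mu, sigma = np.asarray(mu, float), np.asarray(sigma, float)
    delta = mu - f_best
    with np.errstate(divide="ignore", invalid="ignore"):
        z = np.where(sigma > 0, delta / sigma, 0.0)
    ei = delta * norm.cdf(z) + sigma * norm.pdf(z)
    return np.where(sigma > 0, ei, 0.0)   # no posterior uncertainty -> no expected gain

# Three candidate points with posterior mean/std from a surrogate; best value so far is 0.80.
print(expected_improvement(mu=[0.78, 0.82, 0.85], sigma=[0.05, 0.02, 0.10], f_best=0.80))

The candidate with the largest EI is evaluated next, which is how the method balances exploiting promising regions against exploring uncertain ones.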
2.4 Random Search
Random search is a popular alternative to grid search [15]. It replaces the exhaustive enumeration of all possible combinations with a random selection process [16]. This can be applied directly to the discrete setting described above, but it also extends to continuous and mixed spaces. It can outperform grid search, especially when only a few hyperparameters affect the machine learning algorithm's ultimate performance. In random search (RS), the upper and lower boundaries of the hyperparameter values are established first. RS then picks values at random from the pre-defined boundaries and trains with them until the budget is depleted. RS can discover global optimums if the configuration space is large enough. If n is the total number of evaluations, the computational complexity of RS is O(n), where n is specified by the user before the optimization process begins [11]. The hyperparameter response function was defined by the authors of [17] as follows:

ψ^(valid)(λ) = mean_{x∈X^(valid)} L(x; A_λ(X^(train)))    (4)

ψ^(test)(λ) = mean_{x∈X^(test)} L(x; A_λ(X^(train)))    (5)
The estimated variance V about these means for the validation and test sets was defined by the authors of [11] as follows:

V^(valid)(λ) = ψ^(valid)(λ)(1 − ψ^(valid)(λ)) / (|X^(valid)| − 1)    (6)

V^(test)(λ) = ψ^(test)(λ)(1 − ψ^(test)(λ)) / (|X^(test)| − 1)    (7)

The test-set score of the best model among λ^(1), ..., λ^(S) is a random variable z, owing to the uncertainty arising from X^(valid) being a finite sample of G_x. This score z is modeled by a Gaussian mixture model with means μ_s = ψ^(test)(λ^(s)), variances σ_s² = V^(test)(λ^(s)), and weights w_s. In this way the authors of [11] summarized the performance z of the best model in an experiment of S trials.
Simulation is used to estimate the weights w_s: in each simulated experiment, the winning trial is determined by drawing hypothetical validation scores from Normal distributions whose means are the observed ψ^(valid)(λ^(s)) and whose variances are the squared standard errors V^(valid)(λ^(s)).
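A minimal random-search loop over a discrete configuration space might look like the sketch below; the search space and the toy objective are illustrative assumptions, and in practice the objective would train a network and return its validation accuracy.

import random

# Hypothetical search space in the spirit of Table 2; the bounds are examples only.
SPACE = {
    "units":         [16, 32, 64, 128, 256, 512],
    "dropout":       [0.0, 0.1, 0.2, 0.3, 0.4, 0.5],
    "learning_rate": [1e-1, 1e-2, 1e-3, 1e-4, 1e-5],
}

def sample_config(space):
    # Draw one configuration uniformly at random from the predefined bounds.
    return {name: random.choice(values) for name, values in space.items()}

def random_search(objective, space, budget=20, seed=0):
    # Evaluate `budget` random configurations and keep the best one: O(n) trials.
    random.seed(seed)
    best_cfg, best_score = None, float("-inf")
    for _ in range(budget):
        cfg = sample_config(space)
        score = objective(cfg)          # e.g. validation accuracy of a trained model
        if score > best_score:
            best_cfg, best_score = cfg, score
    return best_cfg, best_score

# Toy objective standing in for "train the network and report validation accuracy".
toy = lambda c: 1.0 - abs(c["learning_rate"] - 1e-3) - 0.1 * c["dropout"]
print(random_search(toy, SPACE, budget=20))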
Algorithm 2. Hyperband Approach for HPO
Input: b_max, b_min; s_max = log(b_max / b_min)
for s ∈ {s_max, s_max − 1, ..., 0} do
  n = DetermineBudget(s)
  γ = SampleConfiguration(n)
  SuccessiveHalving(γ)
end for
return the best configurations found so far;
2.5 Hyperband Lisha Li et al. first proposed the Hyperband Approach in a study published in 2018 [18]. The authors of this study define hyperparameter optimization as a non-stochastic infinite-armed bandit problem in which a specified resource, such as iterations, data samples, or features, is allocated to randomly sampled configurations. The machine learning model can be checkpointed during training when Hyperband is used. Rather than optimizing hyperparameters, this approach aims to determine the best training schedule. Algorithm 2 depicts the basic phases of Hyperband algorithms. The total amount of data points, the minimal number of instances necessary to train a reasonable model, and the available budgets establish the budget restrictions bmin and bmax . Following that, in steps 2–3 of Algorithm 2, the number of configurations n and the budget size allotted to each configuration are determined using bmin and bmax. The configurations are sampled using n and b, and then put through the successive halving model in steps 4–5. The successive halving algorithm discards the detected poorly-performing configurations and moves on to the next iteration with the well-performing configurations [18]. This procedure is continued until the best hyperparameter configuration is found.
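The successive-halving subroutine at the heart of steps 4–5 can be sketched as follows. This is a simplified illustration: full Hyperband additionally runs an outer loop over several bracket sizes s, and the budget schedule and the toy scoring function here are assumptions.

import math
import random

def successive_halving(configs, train_and_score, min_budget=1, eta=3):
    # Core subroutine of Hyperband (steps 4-5 of Algorithm 2).
    # configs         : list of sampled hyperparameter configurations
    # train_and_score : callable(config, budget) -> validation score
    # min_budget      : budget (e.g. epochs) given to each config in the first rung
    # eta             : only 1/eta of the configurations survive each rung
    budget = min_budget
    while len(configs) > 1:
        scored = [(train_and_score(c, budget), c) for c in configs]
        scored.sort(key=lambda t: t[0], reverse=True)                    # best first
        configs = [c for _, c in scored[: max(1, len(scored) // eta)]]   # discard the rest
        budget *= eta                                                    # survivors get more budget
    return configs[0]

# Toy example: 9 random learning rates; the "score" improves with budget and closeness to 1e-3.
random.seed(0)
cands = [{"lr": 10 ** random.uniform(-5, -1)} for _ in range(9)]
score = lambda c, b: -abs(math.log10(c["lr"]) + 3) + 0.01 * b
print(successive_halving(cands, score, min_budget=1, eta=3))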
2.6 Genetic Algorithm The Genetic Algorithm (GA) is a search-based optimization methodology based on genetics and natural selection principles. It’s routinely used to find optimal or nearoptimal solutions to tough problems that would take an eternity to solve otherwise [19]. It’s commonly used to address optimization problems, and it also has a lot of parallel features. This algorithm does not necessitate the use of derived data. However, this approach is not appropriate for many problems, particularly those that are basic and have derivative information. This algorithm may not converge to the best answer if it is not implemented correctly. The main steps of Genetic algorithms are shown below,
Algorithm 3. Genetic algorithm for HPO
Generate the initial population P;
k ← 0;
EvaluateFitness(P);
while the result is not acceptable do
  Parents ← SelectMates(P);
  Children ← ApplyCrossover(Parents);
  Children ← ApplyMutation(Children);
  EvaluateFitness(Children);
  P ← SelectBest(Children ∪ P);
  k ← k + 1;
end while
return the best solution in P;

Table 1 The time complexity comparison of common HPO algorithms (n is the number of hyperparameter values and k is the number of hyperparameters)
Algorithm            Time complexity
Grid search          O(n^k)
Random search        O(n)
Bayesian             O(n^3)
Hyperband            O(n log n)
Genetic algorithm    O(n^2)
Algorithm 3 begins by randomly creating the starting population P. The fitness function assesses each individual's fitness within P. The loop is repeated until a satisfactory result is obtained, and each iteration of this loop handles one generation of the population. The creation of a new generation begins with the selection of a subset of parents from the existing population, followed by crossover and the production of children. A small percentage of the children undergo mutation, and the fitness of each newly created individual is assessed. Finally, the fittest individuals from the old and new generations are chosen to form the next generation, and a new iteration begins. Table 1 compares the time complexities of the algorithms discussed above.
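A minimal sketch of such a generation loop is given below; the discrete search space and the toy fitness function (standing in for validation accuracy of a trained network) are made-up assumptions.

import random

SPACE = {"units": [64, 128, 256, 512], "dropout": [0.0, 0.2, 0.4], "lr": [1e-2, 1e-3, 1e-4]}

def random_individual():
    return {k: random.choice(v) for k, v in SPACE.items()}

def crossover(a, b):
    # Uniform crossover: each gene (hyperparameter) comes from either parent.
    return {k: random.choice([a[k], b[k]]) for k in SPACE}

def mutate(ind, rate=0.1):
    # With a small probability, resample a gene from its allowed values.
    return {k: (random.choice(SPACE[k]) if random.random() < rate else v) for k, v in ind.items()}

def genetic_search(fitness, pop_size=10, generations=5, seed=0):
    random.seed(seed)
    population = [random_individual() for _ in range(pop_size)]
    for _ in range(generations):
        parents = sorted(population, key=fitness, reverse=True)[: pop_size // 2]   # selection
        children = [mutate(crossover(*random.sample(parents, 2))) for _ in range(pop_size)]
        # Elitism: keep the fittest of old and new individuals, as in Algorithm 3.
        population = sorted(parents + children, key=fitness, reverse=True)[:pop_size]
    return max(population, key=fitness)

# Toy fitness standing in for the validation accuracy of a trained network.
toy_fitness = lambda c: 1.0 - abs(c["lr"] - 1e-3) * 100 - 0.05 * c["dropout"]
print(genetic_search(toy_fitness))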
2.7 Swarm Intelligence Swarm Intelligence (SI) refers to the shared behavior of decentralized, self-organized organisms, whether it is natural or artificial [20]. Swarms occur naturally in nature, and researchers have studied ant colonies, bird flocking, and mammal herding to learn how distinct biological organisms work together with their surroundings to achieve a common purpose. In the 1980s, the concept of swarm intelligence was originally conceived. Since then, it has piqued the interest of scientists in a wide range of disciplines, including artificial intelligence, engineering, computer science, economics,
and many more. Swarm Intelligence has given rise to a number of problem-solving techniques inspired by collective swarm behavior, which are now applied in a wide range of applications. Swarm Intelligence has been used to optimize convolutional neural networks [21–23], to optimize weight connections [24, 25], to support farming operations on inaccessible land using unmanned drones [26], and to optimize LSTM hyperparameters for a language modeling task [27]. Ant Colony Optimization (ACO) [28], Particle Swarm Optimization (PSO) [29], Bee Colony Optimization (BCO) [30], Artificial Fish Swarm Optimization (AFSO) [31], and Swallow Swarm Optimization (SSO) [32] are the most common swarm intelligence methodologies for HPO. Ant colony optimization (ACO) is a population-based metaheuristic that can be applied to complex optimization problems to obtain approximate solutions. By traveling over a graph, the artificial ants gradually create solutions. The pheromone model, which is a set of parameters associated with graph components whose values are modified at runtime by the ants, biases the solution construction process. Particle swarm optimization (PSO) is an artificial intelligence (AI) technique that can be used to find approximate solutions to numerical maximization and minimization problems that are exceedingly difficult or impossible to solve exactly. PSO is based on group behavior seen in nature, such as bird flocking and fish schooling. The Bee Colony Optimization (BCO) algorithm is an optimization technique that replicates honey bee foraging behavior and has been effectively applied to a variety of real-world issues. The number of employed bees or observer bees in the swarm is equal to the number of solutions. Among the swarm intelligence algorithms, the Artificial Fish Swarm Algorithm (AFSA) is one of the best-known optimization methods. The collective movement of fish and their different social activities inspired this algorithm. The fish always try to maintain their colonies and show sophisticated behaviors as a result of a series of instinctual activities. Food hunting, immigration, and avoiding threats are all social activities, and interactions between all fish in a group result in sophisticated social behavior. The swallow swarm algorithm is a continuous optimization algorithm. In this algorithm, there are three kinds of particles: explorer particles, aimless particles, and leader particles. These particles move in a straight line and are always in contact. Each particle in the colony has a responsibility that, when carried out, helps the colony improve its condition. There are several other Swarm Intelligence algorithms that were not covered in this study, because the intention of this study was to introduce and explore the most well-known and widely used SI-based approaches. Some noteworthy SI algorithms are the Firefly Algorithm (FA) [33], Bat Algorithm (BA) [34], Grey Wolf Optimizer (GWO) [35], Glowworm Swarm Optimization (GSO) [36], Whale Optimization Algorithm (WOA) [37], and Cuckoo Search Algorithm (CSA) [38]. There are still unexplored areas of Swarm Intelligence algorithms that can be a future research topic.
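For illustration, a bare-bones PSO loop for a continuous search space is sketched below; the inertia and acceleration coefficients and the toy objective are assumptions, not values used in this study.

import numpy as np

def pso(objective, bounds, n_particles=20, iters=50, w=0.7, c1=1.5, c2=1.5, seed=0):
    # Minimal particle swarm optimization: objective(np.ndarray) -> float is minimized;
    # bounds has shape (dim, 2) with lower/upper limits per dimension.
    rng = np.random.default_rng(seed)
    bounds = np.asarray(bounds, float)
    dim = len(bounds)
    pos = rng.uniform(bounds[:, 0], bounds[:, 1], size=(n_particles, dim))
    vel = np.zeros_like(pos)
    pbest, pbest_val = pos.copy(), np.array([objective(p) for p in pos])
    gbest = pbest[pbest_val.argmin()].copy()

    for _ in range(iters):
        r1, r2 = rng.random((n_particles, dim)), rng.random((n_particles, dim))
        vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
        pos = np.clip(pos + vel, bounds[:, 0], bounds[:, 1])
        vals = np.array([objective(p) for p in pos])
        improved = vals < pbest_val
        pbest[improved], pbest_val[improved] = pos[improved], vals[improved]
        gbest = pbest[pbest_val.argmin()].copy()
    return gbest, pbest_val.min()

# Example: tune (log10 learning rate, dropout) against a toy validation-loss surface.
loss = lambda x: (x[0] + 3.0) ** 2 + (x[1] - 0.2) ** 2
print(pso(loss, bounds=[(-5, -1), (0.0, 0.8)]))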
2.8 Limitations of HPO Techniques
The existing HPO approaches have a number of drawbacks, and many possible applications are out of reach as a result of these limitations. One limitation of grid search is that it suffers from the curse of dimensionality: the number of evaluations rises exponentially as the number of hyperparameters grows. There is also no guarantee that the search will yield the ideal solution, as it frequently aliases around the correct set. Grid search is therefore only useful for Deep Neural Networks when there are just a few hyperparameters to optimize. The disadvantage of random search is that it produces a lot of variance during computation, because the parameters are chosen at random and no intelligence is utilized to sample these combinations [39]. Random search is mainly useful in the early stages of tuning Deep Neural Networks. Genetic Algorithms have limitations too. Although a genetic algorithm requires less knowledge about the problem, it can be challenging to design an objective function and to get the representation and operators right, and it is both time-consuming and computationally expensive. Specifying a prior in Bayesian optimization is quite challenging, since an actual number must be supplied for each setting of the world-model parameters. Bayesian optimization is the default algorithm in many tools for Deep Neural Networks; however, some alternative variations of BO may be better suited to particular problems. Lastly, when it comes to swarm intelligence, it is difficult to predict collective behavior from the individual rules, and the functioning of a colony cannot be understood without knowing how an agent works; moreover, even minor changes in the simple rules cause differences in group behavior. Finally, all of the HPO algorithms above are designed to optimize models in offline training and are not intended for on-the-fly scenarios, so there is open scope to develop an HPO algorithm for problems that arise in online training.
3 Methodology

3.1 Datasets and Architectures
The CIFAR10 [40] dataset and the Intel Image Classification dataset [41] were used for this numerical study. The image size of these two datasets is the main reason for their selection: the images in CIFAR10 are 32 × 32 pixels, which makes it a small-image dataset, whereas the images in the Intel dataset are 150 × 150 pixels, making it a large-image dataset. Further descriptions of the datasets are given below.
CIFAR10: The 60,000 32 × 32 color images in the CIFAR-10 dataset are organized into ten classes, each with 6000 images. There are 50,000 images for training and 10,000 images for testing.
Intel Image Classification Dataset: This large image collection, created by Intel for an image classification contest, has roughly 25,000 images. Buildings, woodland, glacier, mountain, sea, and street are the categories split among the images.
Following that, the study was also thoroughly tested on a variety of architectures to illustrate the validity of the findings, and the results were assessed to determine whether they were consistent.
VGG16: In a paper published in 2014, Simonyan and Zisserman introduced the VGG network design [14]. The usage of merely 3 × 3 convolutional layers stacked on top of each other in increasing depth emphasizes the network's simplicity.
ResNet50: In their publication [18], Kaiming He et al. introduced the concept for the first time. ResNet50 is a ResNet variant with 48 convolution layers, 1 MaxPool layer, and 1 average pool layer.
3.2 Experimental Setup
In this research, VGG16 and ResNet50 were selected for observing the performance of the suggested HPO algorithms, so that the behavior of HPO algorithms on both sequential and residual architectures could be analyzed. The search space used for optimizing both architectures is given in Table 2. Grid search (GS), Genetic algorithm (GA), Bayesian optimization (BO), Random search (RS), Hyperband (HB), and Particle swarm optimization (PSO) were selected as the HPO algorithms under test. To keep the running time manageable, some hyperparameters were fixed: kernel size (3,3), pool size (2,2), stride size (2,2), optimizer: Adam, batch size: 64, and max epochs: 30. To prevent overfitting of the models, we use the early stopping callback function with monitor: 'val_loss' and patience: 3. The maximum number of trials was set to 20 for BO and RS, but for Hyperband we stopped the optimization process manually, because the maximum number of trials cannot be defined in the Keras Tuner. To evaluate the HPO algorithms, validation accuracy (VA) was used as the performance metric and optimization time (OT) as the efficiency metric. A hedged sketch of such a tuner setup is given below.
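The toy model builder in this sketch is far smaller than the VGG16/ResNet50 search spaces of Table 2, and the exact Keras Tuner arguments may differ slightly between library versions; it is an illustrative setup, not the study's code.

import tensorflow as tf
import keras_tuner as kt

def build_model(hp):
    # A deliberately small stand-in for the search spaces in Table 2.
    model = tf.keras.Sequential([
        tf.keras.layers.Conv2D(hp.Choice("filters", [16, 32, 64]), (3, 3),
                               activation=hp.Choice("activation", ["relu", "elu"]),
                               input_shape=(32, 32, 3)),
        tf.keras.layers.MaxPooling2D((2, 2)),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(hp.Choice("dense_units", [64, 128, 256]), activation="relu"),
        tf.keras.layers.Dropout(hp.Choice("dropout", [0.0, 0.2, 0.4])),
        tf.keras.layers.Dense(10, activation="softmax"),
    ])
    lr = hp.Choice("learning_rate", [1e-2, 1e-3, 1e-4])
    model.compile(optimizer=tf.keras.optimizers.Adam(lr),
                  loss="sparse_categorical_crossentropy", metrics=["accuracy"])
    return model

(x_train, y_train), _ = tf.keras.datasets.cifar10.load_data()
x_train = x_train / 255.0

tuner = kt.RandomSearch(build_model, objective="val_accuracy",
                        max_trials=20, project_name="hpo_demo")
stop_early = tf.keras.callbacks.EarlyStopping(monitor="val_loss", patience=3)
tuner.search(x_train, y_train, epochs=30, batch_size=64,
             validation_split=0.2, callbacks=[stop_early])
print(tuner.get_best_hyperparameters(1)[0].values)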
Table 2 Specifics of the configuration space for the hyperparameters

VGG16: number of units per layer (nunits) ∈ [16, 32, 64, 128, 256, 512]; number of dense units per layer ∈ [64, 128, 256, 512]; activation function ∈ [relu, elu, gelu, selu]; dropout of each layer ∈ [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]; weight decay ∈ [0.0, 1e−01, 1e−03, 1e−04]; learning rate ∈ [1e−01, 1e−02, 1e−03, 1e−04, 1e−05]

ResNet50: number of filters per layer ∈ [16, 32, 64, 128, 256, 512]; number of dense units per layer ∈ [64, 128, 256, 512]; dropout of each dense layer ∈ [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]; weight decay for each dense layer ∈ [0.0, 1e−01, 1e−03, 1e−04]; learning rate ∈ [1e−01, 1e−02, 1e−03, 1e−04, 1e−05]
3.3 Hardware and Software Setup
We use TensorFlow 2.4 for implementing all the architectures mentioned above, together with Python 3.7 and multiple open-source libraries and packages such as Jupyter Notebook, NumPy, Matplotlib, and pandas. Additionally, the Keras Tuner API and scikit-learn were used for implementing the HPO algorithms. All the test cases were trained on a system with an Intel Core i5 4th-generation processor (3.20 GHz), an Nvidia GTX 1060 6 GB graphics card, and 16 GB of RAM.
4 Results
The results of the different HPO algorithms were compared under different scenarios. The metrics (optimization time and validation accuracy) are presented in Table 3. Grid Search has the highest optimization time and the lowest validation accuracy for both VGG16 and ResNet50, as seen in the table. In terms of optimization time and validation accuracy, the Genetic Algorithm is similar to GS. Hyperband offers the fastest optimization time in all scenarios, and HB also has the highest validation accuracy in all scenarios, with the exception of VGG16 with the Intel dataset. Particle swarm optimization also has a fast optimization time and a high level of validation accuracy. From the results, it can be seen that the performance of all HPO algorithms is broadly similar. There is no visible pattern in determining which method performs best on different datasets and architectures, and determining which HPO algorithm is the most efficient is also challenging.
Table 3 Performance consistency of HPO algorithms in different DNNs and datasets

Dataset        Algorithm   VGG16 OT    VGG16 VA   ResNet50 OT   ResNet50 VA
CIFAR10        GS          07 h 37 m   74%        03 h 01 m     66%
               GA          05 h 25 m   75%        02 h 13 m     68%
               BO          03 h 41 m   78%        01 h 09 m     68%
               RS          03 h 27 m   78%        00 h 54 m     70%
               HB          03 h 41 m   81%        01 h 05 m     68%
               PSO         03 h 51 m   80%        01 h 11 m     67%
Intel dataset  GS          06 h 42 m   73%        05 h 41 m     59%
               GA          05 h 02 m   78%        03 h 55 m     63%
               BO          03 h 16 m   84%        02 h 19 m     60%
               RS          02 h 02 m   83%        02 h 04 m     60%
               HB          02 h 06 m   82%        02 h 17 m     71%
               PSO         02 h 33 m   81%        02 h 17 m     67%
5 Conclusion
The rising use of deep neural networks has prompted this research, since deep learning has become the primary method for dealing with data-related problems and has found its way into a variety of applications. To apply deep learning models to real-world problems, hyperparameters must be fine-tuned to match specific datasets. However, the data produced in real life is much larger, and manually adjusting hyperparameters is time-consuming and labor-intensive. To resolve this issue, researchers have developed automated ways to optimize the hyperparameters of DNNs. In this study, we performed a numerical analysis of different HPO algorithms and observed their performance consistency across different datasets and architectures. In future research, different toolkits will be explored, and further HPO algorithms such as multi-armed bandit methods, particle swarm optimization variants, and population-based training will be compared for better insight. It is anticipated that this research will be beneficial to DL users, developers, data analysts, and academics in their efforts to utilize and tune DL models using suitable HPO algorithms. It will also contribute to a better understanding of the problems that remain in the HPO area, allowing future research involving HPO and DL applications to progress.
References 1. Grigorescu S, Trasnea B, Cocias T, Macesanu G (2020) A survey of deep learning techniques for autonomous driving. J Field Robot 37(3):362–86. https://doi.org/10.1002/rob.21918 2. Avanzini G, de Angelis EL, Giulietti F (2021) Performance analysis and sizing guidelines of electrically-powered extraterrestrial rovers. Acta Astronautica 178:349–59. https://www. sciencedirect.com/science/article/pii/S0094576520305749 3. Yu X, Wang P, Zhang Z (2021) Learning-based end-to-end path planning for lunar rovers with safety constraints. Sensors 21(3). https://www.mdpi.com/1424-8220/21/3/796 4. Budiharto W, Gunawan AAS, Suroso JS, Chowanda A, Patrik A, Utama G (2018) Fast object detection for quadcopter drone using deep learning. In: 2018 3rd international conference on computer and communication systems (ICCCS), pp 192–195 5. Lu H, Uemura T, Wang D, Zhu J, Huang Z, Kim H (2020) Deep-sea organisms tracking using dehazing and deep learning. Mobile Netw Appl 25(6):2536 6. Shuvo AAC, Chowdhury SK, Hanif M, Nosheen SN, Zishan MSR (2021) Design and development of citizen surveillance and social-credit information system for Bangladesh. AIUB J Sci Eng (AJSE) 20(2):33–39 7. Chowdhury AA, Chowdhury SK, Hanif M, Nosheen SN, Zishan MSR (2020) YOLO-based enhancement of public safety on roads and transportation in Bangladesh. AIUB J Sci Eng (AJSE) 19(2):71–78 8. Nampoothiri MGH, Vinayakumar B, Sunny Y, Antony R (2021) Recent developments in terrain identification, classification, parameter estimation for the navigation of autonomous robots. SN Appl Sci 3(4):1–14. https://doi.org/10.1007/s42452-021-04453-3 9. Hasan KT, Rahman MM, Ahmmed MM, Chowdhury AA, Islam MK (2021) 4P model for dynamic prediction of COVID-19: a statistical and machine learning approach. Cogn Comput Special Issue:97–110 10. Chowdhury AA, Hasan KT, Hoque KKS (2021) Analysis and prediction of COVID-19 pandemic in Bangladesh by using ANFIS and LSTM network. Cogn Comput 13(3):761–770. https://doi.org/10.1007/s12559-021-09859-0 11. Yang L, Shami A (2020) On hyperparameter optimization of machine learning algorithms: theory and practice. Neurocomputing 415:295–316. https://www.sciencedirect.com/science/ article/pii/S0925231220311693 12. O’Malley T et al (2019) Keras Tuner. https://github.com/keras-team/keras-tuner 13. sklearn.model_selection.GridSearchCV. https://scikit-learn.org/stable/modules/generated/ sklearn.model_selection.GridSearchCV.html?highlight=gridsearch#sklearn.model_selection. GridSearchCV 14. Abreu S (2019) Automated architecture design for deep neural networks 15. Liashchynskyi P, Liashchynskyi P (2019) Grid search, random search, genetic algorithm: a big comparison for NAS. ArXiv:1912.06059 16. Bergstra J, Bengio Y (2012) Random search for hyper-parameter optimization. J Mach Learn Res 13:281–305 17. Bergstra J, Bengio Y (2012) Random search for hyper-parameter optimization. J Mach Learn Res 13(10):281–305. http://jmlr.org/papers/v13/bergstra12a.html 18. Li L, Jamieson K, DeSalvo G, Rostamizadeh A, Talwalkar A (2017) Hyperband: a novel bandit-based approach to hyperparameter optimization. J Mach Learn Res 18(1):6765–6816 19. Lambora A, Gupta K, Chopra K (2019) Genetic algorithm-a literature review. In: 2019 international conference on machine learning, big data, cloud and parallel computing (COMITCon), pp 380–384 20. Zhang Y, Agarwal P, Bhatnagar V, Balochian S, Yan J (2013) Swarm intelligence and its applications. Sci World J 2013:1–3 21. 
Byla E, Pang W (2020) DeepSwarm: optimising convolutional neural networks using swarm intelligence. In: Advances in intelligent systems and computing advances in computational intelligence systems, pp 119–130
22. Bacanin N, Bezdan T, Tuba E, Strumberger I, Tuba M (2020) Optimizing convolutional neural network hyperparameters by enhanced swarm intelligence metaheuristics. Algorithms 13(3). https://www.mdpi.com/1999-4893/13/3/67 23. Zhang X, Zhao K, Niu Y (2020) Improved Harris Hawks optimization based on adaptive cooperative foraging and dispersed foraging strategies. IEEE Access 8:160297–160314 24. Milosevic S, Bezdan T, Zivkovic M, Bacanin N, Strumberger I, Tuba M (2021) Feed-forward neural network training by hybrid bat algorithm. In: Modelling and development of intelligent systems communications in computer and information, pp 52–66 25. Bacanin N, Bezdan T, Zivkovic M, Chhabra A (2021) Weight optimization in artificial neural network training by improved monarch butterfly algorithm. In: Mobile computing and sustainable informatics lecture notes on data engineering and communications technologies, pp 397–409 26. Spanaki K, Karafili E, Sivarajah U, Despoudi S, Irani Z (2021) Artificial intelligence and food security: swarm intelligence of AgriTech drones for smart AgriFood operations. Prod Plann Control 0(0):1–19. https://doi.org/10.1080/09537287.2021.1882688 27. Aufa BZ, Suyanto S, Arifianto A (2020) Hyperparameter setting of LSTM-based language model using grey wolf optimizer. In: 2020 international conference on data science and its applications (ICoDSA), pp 1–5 28. Dorigo M, Birattari M, Stutzle T (2006) Ant colony optimization. IEEE Comput Intell Mag 1(4):28–39 29. Kennedy J, Eberhart R (1995) Particle swarm optimization. In: Proceedings of ICNN’95 international conference on neural networks, vol 4, pp 1942–1948 30. Teodorovi´c D (2009) Bee Colony Optimization (BCO). In: Innovations in swarm intelligence studies in computational intelligence, pp 39–60 31. Hassan EA, Hafez AI, Hassanien AE, Fahmy AA (2015) Community detection algorithm based on artificial fish swarm optimization. In: Advances in intelligent systems and computing intelligent systems 2014, pp 509–521 32. Neshat M, Sepidnam G, Sargolzaei M (2012) Swallow swarm optimization algorithm: a new method to optimization. Neural Comput Appl 23(2):429–454 33. Yang XS (2009) Firefly algorithms for multimodal optimization. In: Foundations and applications lecture notes in computer science, stochastic algorithms, pp 169–178 34. Yang X, Gandomi AH (2012) Bat algorithm: a novel approach for global engineering optimization. Eng Comput 29(5):464–483 35. Mirjalili S, Mirjalili SM, Lewis A (2014) Grey wolf optimizer. Adv Eng Softw 69:46–61. https://www.sciencedirect.com/science/article/pii/S0965997813001853 36. Krishnanand KN, Ghose D (2008) Glowworm swarm optimization for simultaneous capture of multiple local optima of multimodal functions. Swarm Intell 3(2):87–124 37. Mirjalili S, Lewis A (2016) The whale optimization algorithm. Adv Eng Softw 95:51–67. https://www.sciencedirect.com/science/article/pii/S0965997816300163 38. Chandrasekaran K, Simon SP (2012) Multi-objective scheduling problem: hybrid approach using fuzzy assisted cuckoo search algorithm. Swarm Evol Comput 5:1–16. https://www. sciencedirect.com/science/article/pii/S2210650212000107 39. Jamon M (1987) Effectiveness and limitation of random search in homing behaviour. Springer, Dordrecht. https://doi.org/10.1007/978-94-009-3531-0_23 40. CIFAR10 (2017). https://www.cs.toronto.edu/~kriz/cifar.html 41. Bansal P (2019) Intel image classification. https://www.kaggle.com/puneet6060/intel-imageclassification
Chapter 39
Fault Location on Transmission Lines of Power Systems with Integrated Solar Photovoltaic Power Sources Thanh H. Truong, Duy C. Huynh, and Matthew W. Dunnigan
1 Introduction The electrical energy transmission from the power sources to the consumers is implemented by the transmission lines. In this process, faults are inevitable on the transmission lines which can be caused by natural events such as tree branches falling on lines, wind, storms; or by equipment operating on the transmission lines; etc. When a short-circuit fault occurs, the short-circuit current is so large that it can damage electrical equipment as well as collapse the power system. This affects the utilization of electrical energy. For this reason, when a fault occurs, it must be quickly detected and isolated from the power system to avoid adverse effects on the whole power system. Simultaneously, the fault location must also be identified quickly to repair and recover the transmission lines to the new steady-state early. All of these expectations are to improve the reliability of power supply for consumers, as well as reduce the steady-state recovery time for the transmission line, which includes the time of locating the short-circuit fault and repairing the transmission line. Thus, the problem and result of fault location on the transmission lines are required and necessary. The fault location problem is currently getting more and more complicated and certainly more difficult to solve when the power sources and transmission lines are expanded in response to the increased demand for electrical energy. Recently, the solar photovoltaic (PV) power system integrated into the power system brings many benefits such as reducing the burden on the traditional power T. H. Truong · D. C. Huynh (B) Ho Chi Minh City University of Technology (HUTECH), Ho Chi Minh City, Vietnam e-mail: [email protected] M. W. Dunnigan Heriot-Watt University, Edinburgh, UK e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 M. S. Uddin et al. (eds.), Proceedings of International Joint Conference on Advances in Computational Intelligence, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-19-0332-8_39
source, reducing emissions, reducing noise pollution, and reducing the cost of power generation, etc. [1]. The previous researches show that the faults on the transmission lines can be located by techniques such as impedance-based techniques [2–4], travelling wave techniques [5–7], and optimization algorithm-based techniques [8–12]. The impedance-based technique uses voltage and current signals at the fundamental frequency. These signals can be obtained at one terminal or two terminals of the transmission line. The impedance-based technique is widely applied for fault location but it also faces challenges affecting the accuracy of the fault location results such as the influence of fault resistance and unsynchronized data. The disadvantages of the impedance-based techniques show more clearly in the power system integrated renewable energy sources such as solar energy and wind energy [13–15]. The traveling wave technique is based on the propagation of electrical pulses from the fault location to the two terminals along the transmission line. The fault location results are determined by the pulse response time. This technique is suitable for long and homogeneous transmission lines. The traveling wave techniques are sometimes affected by the parameters and the structure of the transmission grid in the propagation process of electrical pulses. The effects are shown in the power system with the integration of renewable energy sources [16]. The optimization algorithms-based techniques have been recently applied for fault location getting more and more attention with the use of a genetic algorithm (GA) [8, 9], a cuckoo search (CS) algorithm [10], an artificial bee colony (ABC) algorithm [11], a whale optimization algorithm (WOA) [12], etc. This paper proposes an advanced CS (ACS) algorithm to locate faults on the transmission line of the power system including solar photovoltaic (PV) power sources. The ACS algorithm is the integration of the CS algorithm and the chaos theory to improve the CS algorithm performance. The paper is organized as follows. The fault location on a transmission line of an integrated power system is described in Sect. 2. The ACS algorithm-based fault location is proposed in Sect. 3. The numerical result of the proposal is followed in Sect. 4. The effectiveness of the proposal is shown in Sect. 5.
2 Fault Location on a Transmission Line A transmission system consists of a power source, S A , a transmission line, AB, and a solar PV power source, S B , Fig. 1. In this integrated power system, the solar PV power source is used to increase the sustainability of the power system which is becoming more and more popular. The solar PV arrays and DC/AC converter are the key parts of the solar PV power source. Then the problems of controlling and operating this integrated power system may lead to many challenges. The transmission line is modeled by the distributed-parameter transmission line model, Fig. 2 [11].
Fig. 1 Faulted transmission line

Fig. 2 Faulted transmission line with a distributed-parameter model
In Fig. 1, a fault is assumed to occur at F. Then U_FA and U_FB are given by:

U_FA,i = [U_A,i cosh(γ_i d l) − Z_Ci I_A,i sinh(γ_i d l)] (cos δ + j sin δ)    (1)

U_FB,i = U_B,i cosh(γ_i (1 − d) l) − Z_Ci I_B,i sinh(γ_i (1 − d) l)    (2)

Z_Ci = √(z_i / y_i)    (3)

γ_i = √(z_i y_i)    (4)
where U_FA and U_FB: the voltages at the faulted location F viewed from A and B respectively (V); U_A and U_B: the voltages at A and B respectively (V); I_A and I_B: the currents at A and B respectively (A); z: the series impedance of the transmission line per unit length (Ω/km); y: the shunt admittance of the transmission line per unit length (Ω⁻¹/km); Z_C: the characteristic impedance (Ω); γ: the propagation constant;
Fig. 3 Solar PV module model
d: the distance from A to F (km); l: the length of the transmission line AB (km); δ: the synchronization angle that synchronizes the data between A and B; i: the index of the sequence components, i = 1 (positive), 2 (negative), and 0 (zero).
Following the trend of conserving the environment and natural resources, solar PV power systems and wind turbine power systems receive much attention and are widely used. This paper deals with solar PV power systems, which are based on solar PV modules, integrated into the power system. The modeling of a solar PV module is shown in Fig. 3. At B, U_B and I_B are related by:

I_B = N_p I_ph − N_p I_0 [ exp( q (U_B/N_s + R_s I_B/N_p) / (a k T) ) − 1 ] − (N_p/R_sh)(U_B/N_s + R_s I_B/N_p)    (5)
where I_ph: the source current of the solar PV module (A); I_0: the diodes' saturation current (μA); q: the charge on the electron, q = 1.602 × 10⁻¹⁹ (C); k: Boltzmann's constant, k = 1.38 × 10⁻²³ (m² kg s⁻² K⁻¹); T: the solar PV module temperature (K); a: the diodes' ideality coefficient; N_s and N_p: the numbers of solar PV cells in series and parallel respectively; R_sh and R_s: the shunt and series resistances respectively (Ω).
From (1) and (2), the fault location problem is transformed into an optimization problem with the following objective function:

f = |U_FA,i − U_FB,i|    (6)
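For illustration, the sketch below evaluates this objective for a candidate fault position using Eqs. (1)–(4); the line parameters and terminal phasors are made-up numbers, and the fault position d is treated here as a fraction of the line length, since the equations use the product d·l.

import numpy as np

def fault_objective(d, l, z, y, UA, IA, UB, IB, delta=0.0):
    # |U_FA - U_FB| from Eqs. (1)-(6) for a candidate fault position d (per unit of line length).
    # z, y           : per-km series impedance and shunt admittance (complex)
    # UA, IA, UB, IB : phasors measured at terminals A and B (complex)
    # delta          : synchronization angle between the two terminals (rad)
    Zc = np.sqrt(z / y)                    # Eq. (3): characteristic impedance
    gamma = np.sqrt(z * y)                 # Eq. (4): propagation constant
    U_FA = (UA * np.cosh(gamma * d * l) - Zc * IA * np.sinh(gamma * d * l)) * np.exp(1j * delta)
    U_FB = UB * np.cosh(gamma * (1 - d) * l) - Zc * IB * np.sinh(gamma * (1 - d) * l)
    return abs(U_FA - U_FB)                # Eq. (6): the quantity to be minimized

# Toy scan over candidate locations on a 100 km line with made-up measurements.
z, y = 0.03 + 0.3j, 4e-6j
cands = np.linspace(0.01, 0.99, 99)
vals = [fault_objective(d, 100, z, y, UA=220e3, IA=900 + 200j, UB=218e3, IB=-850 + 150j) for d in cands]
print("Estimated fault position (per unit of line length):", cands[int(np.argmin(vals))])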
Then the faulted location is determined by minimizing the objective function (6). The constraints of the minimization problem are given by: 010 km Daily Weekly Monthly Seldom North Dhaka South Dhaka
1079 456 476 407 324 207 121 379 436 359 235 126 703 425 407 169 370 230 766 930 605
Percentage 70.22% 29.78% 30.89% 26.59% 13.48% 21.16% 7.87% 24.72% 28.47% 23.41% 15.17% 8.24% 45.88% 27.71% 26.40% 10.86% 24.16% 14.98% 50% 60.59% 39.41%
3 Result Analysis and Discussion

3.1 Respondents Demographic
In this study, the respondents are Bangladeshi students who have enough experience with the PATHAO ride-sharing system. The survey was distributed widely among these students without any restrictions on age, gender, location, or anything else. Around 1535 people (1079 male and 456 female) responded to the survey. Table 3 shows the respondents' demographic information. The results demonstrate that the majority of respondents live in Dhaka, which is a metropolitan and urban area. In urban territories, mobility is important, and it drives the requirement for public transportation [7]. Hence, ride-sharing systems are popular in Dhaka, since they accommodate the requirements for travel in urban regions. Moreover, the respondents are predominantly male, because most male students do not own bikes, as bikes are not cheap for students, and they prefer bike rides over other modes. Along with this, males use public transport more frequently than females. These became
one of the factors causing male users to use the ride-sharing system more frequently than female users. On the basis of education level, second-year undergraduate (UG) students use the ride-sharing system more frequently than first-year students, because they have better technological knowledge. With respect to educational categories, a large share of the users are second-year students (28.47%), followed by third-year (23.41%), fourth-year (15.17%), and masters students (8.24%). Another statistical result demonstrates that many students use ride-sharing services, while the greater part of respondents (50%) use them only seldom. In most cases, users take ride-sharing trips of 1–5 km (around 45.88%) rather than trips of 6 km or more. For the purposes of this research, Dhaka city was separated into two major areas (north and south) according to the Bangladesh City Corporation division. From this, we found that 60.59% of PATHAO users come from North Dhaka, whereas 39.41% come from South Dhaka.
3.2 Result of Service Quality Assessment Data analysis is carried out by assigning weight to every criterion in Table 1 which is calculated using entropy analysis technique. The higher the weight, the better service quality analysed from the collected data from the users. The consequence of entropy analysis is displayed in Table 4. From Table 4, a summary of entropy analysis has been made in Table 5 showing the highest and the lowest ranked criteria in each dimension. The highest ranked criteria means PATHAO has best services in those criteria and vice versa for the lowest criteria. In Table 5, the most noteworthy weighted criterion of online transport service quality investigation is cost effectiveness. Khan et al. in [8] showed that the ride-sharing system is cost effective, time reducing and rating is higher than traditional transport services in Bangladesh. Another significant criterion of service quality dimension is compensation. Ride-sharing services compensate at different times which is more cost effective than traditional vehicles. In the traditional system, there is no official system to give compensation to the passengers whereas the online system (e.g.; PATHAO, Uber etc.) makes the compensation to the users. The lowest weighted criterion for service quality is vehicle quality. In ride-sharing applications, users are able to see the vehicle quality as well as picture mentioned in the application during the ordering time. But, in this study we found that most of the users gave the lowest priority in this criterion. In Information quality measurement, the most elevated weighted criterion is regular updates. The information that is provided in the website is very up-to-date. Up-to-date implies that data is continually refreshed. It is also essential to have trust on the transportation service providers. Most of the users in our survey believe that their information are not misused i.e. the given data is trusted. Trusted means the given data of the users are not used for any other purposes. The data are secured and no other third party gets the updated data for any type of use, which is very
Table 4 Results of entropy analysis

Rank  Criterion                          Weight
1     Cost effective                     0.032370
2     Regular updates                    0.032188
3     System availability                0.032180
4     Updated website                    0.032079
5     Does not insist offline service    0.031928
6     User friendly payment              0.031881
7     Information security               0.031789
8     Compensate                         0.031580
9     Customer service quality           0.031580
10    Trust                              0.031580
11    Punctuality                        0.031502
12    Skilled employee                   0.031392
13    Response to user complain          0.031356
14    Unrestricted access to all pages   0.031356
15    Apps smoothness                    0.031321
16    Privacy                            0.031287
17    Complain over phone call           0.031254
18    Security                           0.031221
19    Transparency                       0.031188
20    Perfectness of website             0.031156
21    Rider behavior                     0.031125
22    Vehicle quality                    0.031125
23    Load information quickly           0.031005
24    Maps shows correct route           0.031005
25    App functions properly             0.030947
26    App is easy to use                 0.030838
27    Ease of use                        0.030760
28    Information easy to understand     0.030615
29    Website is easy to use             0.030283
30    Information accuracy               0.030176
31    Saves time                         0.030047
32    User discount                      0.029878
Table 5 Summary of entropy result analysis

Dimension            Highest ranked criterion   Lowest ranked criterion
Service quality      Cost effective             Vehicle quality
Information quality  Regular updates            Information accuracy
System quality       System availability        Website is easy to use
important for users' information confidentiality. Users perceive that PATHAO now provides reasonably helpful content. It should be noticed, however, that in the overall comparison "information is not misused" (i.e., trust) ranks only tenth. Though most of the users trust PATHAO regarding the safety of the information they provide, there is a big concern with the accuracy of information that comes from PATHAO. As a result, information accuracy is the lowest ranked criterion for the information quality dimension. In system quality, system availability has the highest weight among the criteria; the respondents seem to be satisfied with the availability of the PATHAO service whenever they need it. Salameh et al. in [24] reported that the use of m-commerce services increases if the system is easy to use, fulfills customer needs, and provides support services to the users. This shows that ease of use and app smoothness play an important role in determining the quality of service. Using an internet application can seem hard for users when it does not work properly, and customers sometimes feel intimidated when making transactions on the internet because it appears more complex; app smoothness is therefore considered important to users. Users perceive that the PATHAO app works properly and is easy to use, and the application interface is updated continuously by the PATHAO technical team to fulfill users' expectations. Since most PATHAO users are more comfortable using mobile applications than websites, in this study we found that users gave more priority to the app's smoothness than to the website's ease of use. Meanwhile, vehicle quality, information accuracy, and "website is easy to use" are the three lowest ranked criteria in their respective dimensions, and in the overall ranking, information accuracy, user discount, and vehicle quality are the three bottom-ranked criteria. Since one of our objectives is to identify criteria that can help PATHAO improve its services by increasing users' satisfaction level, PATHAO can improve its vehicle quality and information accuracy to compete in the market with other ride-sharing services in Bangladesh.
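The paper does not spell out its exact entropy formulation; the sketch below shows the standard entropy weight method commonly used to derive criterion weights from a respondents-by-criteria rating matrix, with made-up ratings, and is offered only as an illustration of this kind of weighting.

import numpy as np

def entropy_weights(X):
    # Standard entropy weight method for an (alternatives x criteria) rating matrix.
    # Criteria whose ratings vary more across respondents carry more information
    # and therefore receive larger weights; the weights sum to 1.
    X = np.asarray(X, dtype=float)
    P = X / X.sum(axis=0, keepdims=True)               # normalise each criterion column
    n = X.shape[0]
    with np.errstate(divide="ignore", invalid="ignore"):
        logs = np.where(P > 0, np.log(P), 0.0)
    E = -(P * logs).sum(axis=0) / np.log(n)             # entropy of each criterion
    d = 1.0 - E                                          # degree of diversification
    return d / d.sum()

# Toy example: 5 respondents rating 3 criteria on a 1-5 Likert scale.
ratings = [[5, 4, 3], [4, 4, 2], [5, 3, 5], [3, 4, 1], [4, 4, 4]]
print(entropy_weights(ratings).round(4))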
3.3 User Satisfaction Analysis
Twelve separate machine learning algorithms are analysed to determine how accurately users' satisfaction can be predicted. The majority of the analysis is done in Weka 3.8.2. The confusion matrix, along with the computational time, provides detailed information about the classified instances.
Fig. 3 Accuracy comparison
Accuracy = (TruePositive + TrueNegative) / TotalInstances    (5)

True Positive Rate (TPR) = TP / (TP + FN)    (6)

False Positive Rate (FPR) = FP / (FP + TN)    (7)

Precision = TP / (TP + FP)    (8)

Recall = TP / (TP + FN)    (9)

F_score = 2 (Precision × Recall) / (Precision + Recall)    (10)
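For illustration, Eqs. (5)–(10) can be computed directly from the four cells of a binary confusion matrix, as in the sketch below; the study itself used Weka, and the counts in the example are hypothetical.

def classification_metrics(tp, fp, tn, fn):
    # Eqs. (5)-(10) computed from the four cells of a binary confusion matrix.
    accuracy  = (tp + tn) / (tp + fp + tn + fn)
    tpr       = tp / (tp + fn)                 # also the recall, Eq. (9)
    fpr       = fp / (fp + tn)
    precision = tp / (tp + fp)
    f_score   = 2 * precision * tpr / (precision + tpr)
    return {"accuracy": accuracy, "TPR": tpr, "FPR": fpr,
            "precision": precision, "recall": tpr, "F1": f_score}

# Hypothetical counts for one classifier on the satisfaction labels.
print(classification_metrics(tp=950, fp=180, tn=280, fn=125))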
As our main focus in this section is to predict the users’ satisfaction accurately, the accuracy of all of the classifiers is illustrated in Fig. 3.
Table 6 Confusion matrix

Classifier             TPR     FPR     Precision   Recall   F-measure
BayesNet               0.181   0.863   0.805       0.821    0.805
NaiveBayes             0.173   0.866   0.807       0.823    0.807
DecisionTable          0.649   0.792   0.822       0.785    0.822
J48                    0.607   0.776   0.807       0.783    0.807
RandomForest           0.602   0.802   0.828       0.799    0.828
RandomTree             0.520   0.765   0.762       0.763    0.762
Kstar                  0.461   0.811   0.822       0.816    0.822
AdaBoostM1             0.593   0.775   0.803       0.783    0.803
Vote                   0.809   0.809   0.809       0.894    0.809
Logistic               0.413   0.796   0.775       0.784    0.775
SVM                    0.472   0.799   0.807       0.803    0.807
MultilayerPerceptron   0.410   0.816   0.816       0.816    0.816
Fig. 4 Load time comparison
Decision Table and Random Forest classifiers have the highest accuracy (82.7715%), and Random Tree classifier has the lowest accuracy (76.2172%). Table 6 demonstrates true positive rate (TPR), false positive rate (FPR), Precision, Recall and F1_score of the twelve classifiers which are calculated using the Eq. 6, Eq. 7, Eq. 8, Eq. 9 and Eq. 10 respectively. Figure 4 is a comparative analysis of load time in milliseconds (ms) for different classifiers. For multilayer Perceptron, load time value is an outlier (59270 ms). We have skipped the value of this classifier in this figure for better understanding of the comparison.
4 Conclusion
The aim of this research is to analyze the service quality of ride-sharing systems in Bangladesh. PATHAO has been taken as the object of the case study. In this era of competitive business platforms, PATHAO, being one of the most popular ride-sharing systems, is competing with other ride-sharing systems to survive in the market. In this study, predictions have been made about users' satisfaction with the help of two different techniques, namely the entropy analysis technique and classification using machine learning algorithms. Entropy analysis helped to rank the satisfaction criteria, and classification using machine learning algorithms helped to predict users' satisfaction. Moreover, this research will help the PATHAO authority identify criteria for improving their service quality in order to obtain better user satisfaction and to last in the market. In future, this research can be extended to achieve a better accuracy level from the machine learning algorithms, or more criteria can be considered in the survey work to obtain a more specific picture of users' satisfaction.
References 1. Ahmed JU, Tinne WS, Ahmed A (2019) Pathao: an emerging motorcycle-ride service in Bangladesh. SAGE Publications: SAGE Business Cases Originals 2. Aïvodji UM, Gambs S, Huguet M-J, Killijian M-O (2016) Meeting points in ridesharing: a privacy-preserving approach. Transp Res Part C Emerging Technol 72:239–253 3. Alzahrani AI, Mahmud I, Ramayah T, Alfarraj O, Alalwan N (2019) Modelling digital library success using the Delone and McLean information system success model. J Librarianship Inf Sci 51(2):291–306 4. Atia T (2019) Partners and drivers attraction and retention strategies of Uber Bangladesh 5. Cachon GP, Daniels KM, Lobel R (2017) The role of surge pricing on a service platform with self-scheduling capacity. Manuf Serv Oper Manag 19(3):368–384 6. Choi C, Kim C, Sung N, Park Y (2007) Evaluating the quality of service in mobile business based on fuzzy set theory. In: Fourth international conference on fuzzy systems and knowledge discovery (FSKD 2007), vol 4, IEEE, pp 483–487 7. Choi S (2018) What promotes smartphone-based mobile commerce? Mobile-specific and selfservice characteristics. Internet Res 8. Collins C, Hasan S, Ukkusuri SV (2013) A novel transit rider satisfaction metric: rider sentiments measured from online social media data. J Public Transp 16(2):2 9. Handayani PW, Hidayanto AN, Sandhyaduhita PI, Ayuningtyas D, et al (2015) Strategic hospital services quality analysis in Indonesia. Expe Syst Appl 42(6):3067–3078 (2015) 10. Huang EY, Lin S-W, Fan Y-C (2015) MS-QUAL: mobile service quality measurement. Electron Commer Res Appl 14(2):126–142 11. Shi-Ming H, Chia-Ling L, Kao A-C (2006) A balanced scorecard framework. Ind Manag Data Syst Balancing Perform Measures Inf Secur Manag 12. Jamal J, Montemanni R, Huber D, Derboni M, Rizzoli AE (2017) A multi-modal and multiobjective journey planner for integrating carpooling and public transport. J Traffic Logistics Eng 5(2) 13. Khan M, Hossain Z, Hossain S, Hossain M, Anik SA, et al (2018) A smart navigation system for public bus service in Dhaka city
14. Ali Khan MA, Raki Billah M, Debnath C, Rahman S, Habib MT, Islam GZ (2019) A detailed investigation of the impact of online transportation on Bangladesh economy. Indonesian J Electric Eng Comput Sci 16(1):420–428 15. Kumar N, Jafarinaimi N, Morshed MB (2018) Uber in Bangladesh: the tangled web of mobility and justice. Proc ACM Hum Comput Interact 2(CSCW):1–21 16. Lai Y, Yang F, Zhang L, Lin Z (2018) Distributed public vehicle system based on fog nodes and vehicular sensing. IEEE Access 6:22011–22024 17. Lim H, Widdows R, Park J (2006) M-loyalty: winning strategies for mobile carriers. J Consum Market 18. Lu M-T, Hu S-K, Huang L-H, Tzeng G-H (2015) Evaluating the implementation of businessto-business m-commerce by SMEs based on a new hybrid MADM model. Manag Decis 19. Anas Abdelsatar Salameh and Shahizan Bin Hassan (2015) Measuring service quality in mcommerce context: a conceptual model. Int J Sci Res Publ 5(3):1–9 20. Sarkheyli A, Song WW (2019) Delone and McLean IS success model for evaluating knowledge sharing. In: Hacid H, Sheng QZ, Yoshida T, Sarkheyli A, Zhou R (eds) QUAT 2018, vol 11235. LNCS. Springer, Cham, pp 125–136. https://doi.org/10.1007/978-3-030-19143-6_9 21. Septiani R, Handayani PW, Azzahro F (2017) Factors that affecting behavioral intention in online transportation service: case study of GO-JEK. Procedia Comput Sci 124:504–512 22. Silalahi SLB, Handayani PW, Munajat Q (2017) Service quality analysis for online transportation services: case study of GO-JEK. Procedia Comput Sci 124:487–495 23. Stiakakis E, Georgiadis CK (2011) A model to identify the dimensions of mobile service quality. In: 2011 10th International conference on mobile business. IEEE, 195–204 24. Stiglic M, Agatz N, Savelsbergh M, Gradisar M (2015) The benefits of meeting points in ride-sharing systems. Transp Res Part B Methodol 82:36–53 25. van de Kar E, Muniafu S, Wang Y (2006) Mobile services used in unstable environments: design requirements based on three case studies. In: Proceedings of the 8th international conference on electronic commerce: the new e-commerce: innovations for conquering current barriers, obstacles and limitations to conducting successful business on the internet, pp 302–308 26. Zhu M, Liu X-Y, Wang X (2018) An online ride-sharing path-planning strategy for public vehicle systems. IEEE Trans Intell Transp Syst 20(2):616–627 27. Zuo Wenming, Zhu Wenfeng, Chen Shaojie, He Xinming (2019) Service quality management of online car-hailing based on PCN in the sharing economy. Electron Commer Res Appl 34:100827
Chapter 45
An Image Steganography Technique Based on Fake DNA Sequence Construction Subhadip Mukherjee, Sunita Sarkar, and Somnath Mukhopadhyay
1 Introduction
The sharing of data in this era of the internet has resulted in a significant increase in the transmission of data between senders and recipients. Though transferring data via the internet has made communication easier, it faces a significant hurdle in the form of safe data transmission [1, 2]. All of this can be traced back to the rise in hacking and interference incidents in recent years. Cryptography [3, 4] and steganography [5–7] are two major approaches that have emerged and are widely utilised across the world to solve dependable communication difficulties. They may, however, be blended to create more robust procedures, making it more difficult to breach the protection [8]. Though encryption provides methods for encrypting confidential information in order to render it secure, an encrypted message itself is suspect. Steganography, on the other hand, is a method of concealing private information within a media file without noticeably distorting the file, so that it arouses little suspicion. Different carrier types, including images, audio, video, and DNA sequences, can be used [9, 10]. Extensive distortion of the cover media may provide a hint to any steganalysis attacker that the media includes sensitive data, and the attackers may destroy or misuse the hidden data. Various media, including audio, video, and images, may be used to disguise data and create a more secure and resilient solution. Because images are the most frequently used multimedia on the internet, we employ images in the suggested approach, and DNA sequencing-based methods are outstanding for achieving a high-quality stego image. To produce a high-PSNR stego picture, a
S. Mukherjee (B) Department of Computer Science, Kharagpur College, Kharagpur 721305, India e-mail: [email protected] S. Sarkar · S. Mukhopadhyay Department of Computer Science and Engineering, Assam University, Silchar 788011, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 M. S. Uddin et al. (eds.), Proceedings of International Joint Conference on Advances in Computational Intelligence, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-19-0332-8_45
safe and resilient steganography approach using the DNA sequencing is proposed in this research work. The following is how the paper is organised: Sect. 2 discusses the works linked to the recommended technology, and Sect. 3 discusses the suggested method, which includes both the hiding and retrieving techniques. Section 4 delves into the results of the proposed methodology’s analysis. Finally, the conclusion is shown in Sect. 5.
2 Related Works
Several steganographic tactics have been developed over the years to hide private information in an image file, and among these approaches, LSB-based [11, 12] schemes are the most common, simple, and widely utilised. The DNA computing technology has also drawn many professors, academics, and researchers in the area of data concealment [13, 14]. The following are the three key factors that have attracted the attention of researchers: (1) visuality (quality of the stego image), (2) concealing ability (a reasonable amount of data-hiding capacity), and (3) robustness [15]. To hide a message in a DNA chain, the nucleotide arrangement is either changed or corrected. For simplicity, the nucleotides are converted from English letters to a stream of bits; this notion was conceived and implemented by Leier in the year 2000 [16]. Some researchers [17, 18] have studied nucleotide randomization of (A, G, T, C) based on mathematical models. Furthermore, too many alterations to the DNA nucleotides may make attackers suspicious of the codons, and if RNA translation is used to make proteins, the codons' usefulness may be compromised, so the extraction will fail. Chen proposed [19] an approach using two commonly used message-concealing technologies, lossless compression and difference expansion. In this technique, the confidential data are encoded within the nucleotides A, C, T, and G using a 2-bit arrangement. The equivalent stream of decimal digits obtained by arranging groups of k bits is then categorised into sets C1 and C2, where C1 is the expandable set and C2 is the changeable set. Based on these sets, a compressed location map is built to hide the secret data inside the cover LSBs of the corresponding pairs via difference expansion. However, the achieved message-concealing capacity of 0.13 bpn is too low by today's standards. The approach [20] proposed by Liu et al. is based on a piecewise linear chaotic map. Fu et al. described an image steganography approach for tamper repair using DNA sequences [21]. Malathi et al. [22] suggested a data hiding method that relates the embedding efficiency of the LSB strategy in the combined spatial and transform domain. In 2018, Vinodhini et al. [23] suggested a steganography approach where the F5 algorithm as well as matrix hiding are combined with the LSB technique; this combination affects both the transform and the spatial domain of the pictures. The cover image has a size of 256 × 256 pixels, and the message has a length of 3096 pixels. However, neither of these
techniques is capable of creating a high-quality stego picture. The approach proposed here is therefore designed to address the aforementioned issues.
3 The Proposed Scheme
To create a high-quality stego picture, the suggested scheme depends primarily upon two data-embedding methods: the LSB approach and DNA sequencing. The mechanism of the suggested steganography scheme is presented in Fig. 2. From the original image, the associated two-dimensional matrices for red, green, and blue are extracted. The deoxyribonucleic acid that underlies the production of proteins (see Fig. 1) is broken down, and the attributes of this breakdown are transformed and serialised in order to attain the goal of a high PSNR value. Because the cover-picture LSBs for RGB are represented in binary, a technique to convert these bits to nucleotides must be devised; the binary representation of C, T, G, and A is therefore used in the proposed technique (see Table 1). The secret message is concealed within a reference DNA sequence to generate the fake DNA. After that, the fake DNA is embedded within the cover image using the LSB substitution technique. The whole extraction mechanism for recovering the sensitive information is defined in Sect. 3.2, and it is worth noting that the retrieving strategy follows exactly the reverse of the hiding technique illustrated in Sect. 3.1. Furthermore, these techniques make it more difficult to detect the presence of the secret message disguised in the picture by the provided method.
Fig. 1 Basic structure of a DNA

Table 1 Codes with corresponding nucleotides

Short form   Nucleotide   Corresponding code
C            Cytosine     11
T            Thymine      10
G            Guanine      01
A            Adenine      00
Fig. 2 The entire scenario of the proposed image steganography methodology
3.1 Embedding Procedure

Algorithm 1: Fake DNA Generation
Input: Private Message (PM), Reference DNA (RD)
Output: Fake DNA (FD)
Step 1: Convert PM to its equivalent ASCII code.
Step 2: Convert the ASCII code of PM to its equivalent binary code, Pb.
Step 3: Convert RD to its equivalent binary code, Rb, using the encoding rule.
Step 4: Generate a random key value, Kv.
Step 5: Split Rb into segments of Kv bits.
Step 6: Insert the bits of the stream Pb, one by one, at the beginning of each segment of Rb. After the completion of the insertion process, combine all the segments.
Step 7: Generate FD by using the encoding rule.

Algorithm 2: Stego Image Generation
Input: Original Image (OI), Fake DNA (FD)
Output: Stego Image (SI)
Step 1: Generate the equivalent binary value of FD.
Step 2: Generate the RGB panels of OI.
Step 3: Hide 2 secret bits at a time within the two LSBs of each of the red, green, and blue panels.
Step 4: Repeat Step 3 until the entire bit stream is concealed.
Step 5: The stego image is generated.
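To make the hiding procedure concrete, the following is a minimal Python sketch of Algorithms 1 and 2, assuming the encoding rule of Table 1 (C = 11, T = 10, G = 01, A = 00), a reference DNA long enough to carry the whole message, and an RGB image given as nested lists of [R, G, B] values; all function and variable names are illustrative and not taken from the paper.

```python
# Sketch of Algorithm 1 (fake DNA generation) and Algorithm 2 (LSB embedding).
ENCODE = {'C': '11', 'T': '10', 'G': '01', 'A': '00'}
DECODE = {v: k for k, v in ENCODE.items()}

def fake_dna(private_message, reference_dna, key_value):
    p_bits = ''.join(format(ord(ch), '08b') for ch in private_message)   # Steps 1-2: message -> binary Pb
    r_bits = ''.join(ENCODE[n] for n in reference_dna)                   # Step 3: reference DNA -> binary Rb
    segments = [r_bits[i:i + key_value] for i in range(0, len(r_bits), key_value)]   # Step 5
    combined = ''.join(bit + seg for bit, seg in zip(p_bits, segments))  # Step 6: one message bit per segment
    combined += ''.join(segments[len(p_bits):])                          # segments left untouched, if any
    combined += '0' * (-len(combined) % 2)                               # pad to whole nucleotides
    return ''.join(DECODE[combined[i:i + 2]] for i in range(0, len(combined), 2))    # Step 7

def embed_lsb(pixels, fake_dna_sequence):
    # Algorithm 2: hide 2 secret bits at a time in the 2 LSBs of each red, green and blue value.
    bits = ''.join(ENCODE[n] for n in fake_dna_sequence)
    stego = [[list(px) for px in row] for row in pixels]
    pairs = (bits[i:i + 2] for i in range(0, len(bits), 2))
    for row in stego:
        for px in row:
            for channel in range(3):
                pair = next(pairs, None)
                if pair is None:
                    return stego               # entire bit stream concealed
                px[channel] = (px[channel] & 0b11111100) | int(pair, 2)
    return stego
```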
3.2 Extraction Procedure

Algorithm 3: Fake DNA Extraction
Input: Stego Image (SI)
Output: Original Image (OI), Fake DNA (FD)
Step 1: Generate the red, green, and blue channels of SI.
Step 2: Extract 2 secret bits at a time from the two LSBs of each of the red, green, and blue panels.
Step 3: Repeat Step 2 until all the secret bits are extracted.
Step 4: After extracting the hidden bits, OI is automatically reconstructed.
Step 5: From the extracted secret bit stream, construct FD.

Algorithm 4: Private Message Extraction
Input: Fake DNA (FD), Random key (Kv)
Output: Private Message (PM)
Step 1: Convert FD to its equivalent binary code, Pb, with the help of the encoding rule.
Step 2: Split Pb into segments of Kv bits.
Step 3: Extract the first bit from the beginning of each segment of Pb. After the completion of the extraction, combine all the extracted bits.
Step 4: The original PM is extracted.
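A matching sketch of the extraction side (Algorithms 3 and 4) is given below. It assumes the same encoding rule and interprets each embedded segment as one message bit followed by Kv reference-DNA bits, mirroring the embedding sketch above; the names and the msg_len argument are illustrative and not from the paper.

```python
# Sketch of Algorithm 3 (secret bit extraction) and Algorithm 4 (private message recovery).
ENCODE = {'C': '11', 'T': '10', 'G': '01', 'A': '00'}

def extract_bits(stego_pixels, n_bits):
    # Read 2 bits at a time from the 2 LSBs of each red, green and blue value.
    bits = []
    for row in stego_pixels:
        for px in row:
            for channel in range(3):
                bits.append(format(px[channel] & 0b11, '02b'))
                if 2 * len(bits) >= n_bits:
                    return ''.join(bits)[:n_bits]
    return ''.join(bits)[:n_bits]

def recover_message(fake_dna_sequence, key_value, msg_len):
    # Take the first bit of every (key_value + 1)-bit embedded segment and rebuild the ASCII message.
    bits = ''.join(ENCODE[n] for n in fake_dna_sequence)
    secret = ''.join(bits[i] for i in range(0, len(bits), key_value + 1))[:8 * msg_len]
    return ''.join(chr(int(secret[i:i + 8], 2)) for i in range(0, len(secret), 8))
```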
4 Experimental Analyses

The proposed approach is tested using four test photos from the USCID database [24] in MATLAB R2011a: Lena, Airplane, Baboon, and Tree. The pictures used in the studies are 256 × 256 in size and may be seen in Fig. 3. To measure the security and visual quality of the stego picture, the metrics NAE and PSNR are used. PSNR is the most often utilised image quality measure for stego-pictures.
Fig. 3 The test images
The value of the PSNR for the stego image V' and original image V of size C × D is expressed in Eq. (1).

$$PSNR = 20 \log_{10} \frac{255}{\sqrt{\dfrac{1}{CD} \sum_{c=1}^{C} \sum_{d=1}^{D} \left( V_{c,d} - V'_{c,d} \right)^{2}}} \qquad (1)$$
NAE finds absolute errors in a normalized manner and is a parameter for finding the strength of the suggested method (see Eq. (2)).

$$NAE = \frac{\sum_{c=1}^{C} \sum_{d=1}^{D} \left| V_{c,d} - V'_{c,d} \right|}{\sum_{c=1}^{C} \sum_{d=1}^{D} V_{c,d}} \qquad (2)$$
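The two quality metrics can be computed directly from the image arrays. The following is a minimal NumPy sketch of Eqs. (1) and (2); the array names are illustrative and not from the original paper.

```python
# Minimal sketch of Eqs. (1) and (2); 'original' and 'stego' are uint8 arrays of the same shape.
import numpy as np

def psnr(original, stego):
    diff = original.astype(np.float64) - stego.astype(np.float64)
    mse = np.mean(diff ** 2)                      # (1/CD) * sum of squared differences
    return 20 * np.log10(255.0 / np.sqrt(mse))    # Eq. (1)

def nae(original, stego):
    original = original.astype(np.float64)
    stego = stego.astype(np.float64)
    return np.abs(original - stego).sum() / original.sum()   # Eq. (2)
```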
The hiding capacity of our method = (number of concealed bits) / (number of image pixels) = 154140 / (3 × 256 × 256) = 154140 / 196608 = 0.784 bpp.
Table 2 shows the results of the recommended approach with regard to the metrics embedding capacity (EC), NAE, and PSNR. It is generally accepted that a scheme is considered good steganography if it achieves a PSNR higher than 30 dB. With an EC of 0.784 bpp, we were able to obtain an impressive PSNR value of 56.24 dB on average, which is 87.47% greater than that benchmark. The closer the NAE value is to zero, the smaller the error and the less likely the human eye will notice the distortion. The suggested method has an NAE value of 0.0018 on average. As a result, we can conclude that the suggested approach is capable of not only achieving improved PSNR but also providing increased visual security.
Table 2 Outcomes of EC, NAE and PSNR for the suggested method

Image      Embedding capacity (bpp)   NAE      PSNR (dB)
Tree       0.784                      0.0018   56.26
Airplane   0.784                      0.0017   56.25
Lena       0.784                      0.0019   56.22
Baboon     0.784                      0.0018   56.24
Table 3 Outcome comparisons of proposed method with other methods

Method                 Parameter   Value
Nag et al. [26]        PSNR        55.44
                       EC          0.700
Zhang et al. [25]      PSNR        52.92
                       EC          0.300
Muhammad et al. [27]   PSNR        53.89
                       EC          0.500
Proposed               PSNR        56.24
                       EC          0.784
Fig. 4 The PSNR comparison
We compared our proposed technique to other well-known, recent, and related picture steganography schemes to see whether it holds promise. The average EC and PSNR of Zhang [25], Nag [26], and Muhammad [27] are compared to the suggested approach in Table 3. According to this table, the suggested approach produced an average PSNR of 56.24 dB, which is 3.32, 0.80, and 2.35 dB greater than Zhang [25], Nag [26], and Muhammad [27] (see Fig. 4), respectively. The suggested approach's embedding capacity is 0.784 bpp, which is 0.484, 0.084, and 0.284 bpp greater than Zhang [25], Nag [26], and Muhammad [27] (see Fig. 5), respectively.
Fig. 5 The EC comparison
5 Conclusion

This paper proposes a novel picture steganography technology based on deoxyribonucleic acid computing. Unlike traditional picture steganography methods, this one is based on DNA's biological functioning. To secure our proposed scheme, we have developed an algorithm to generate a fake DNA which contains the secret message. After that, the fake DNA is hidden inside the original picture by using the LSB strategy. Our proposed method is compared to other related and recent image steganography methods to determine its viability. It is generally accepted that a scheme is considered good steganography if it achieves a PSNR higher than 30 dB. The experimental results show that we obtained an outstanding PSNR value of 56.24 dB on average, which is 87.47% greater than that standard. The closer the NAE value is to zero, the smaller the error and the less likely the human eye will notice the distortion. The suggested method has an NAE value of 0.0018 on average. As a result, it has been demonstrated that the suggested approach is capable of not only providing more capacity but also increased visual security.
References 1. Rani SS, Alzubi JA, Lakshmanaprabu S, Gupta D, Manikandan R (2019) Multimedia tools and applications, 1–20 2. Gochhayat SP et al (2019) Wireless Networks, pp 1–14 3. Easttom W (2021) Modern Cryptography. Springer, pp 385–390 4. Sadhukhan D, Ray S, Biswas G, Khan M, Dasgupta M (2021) J Supercomput 77(2):1114 5. Kaur S, Bansal S, Bansal RK (2021) Multimedia Tools Appl 80(5):7749 6. Mukherjee S, Jana B (2019) Int J Nat Comput Res (IJNCR) 8(4):13 7. Mukherjee S, Sarkar S, Mukhopadhyay S (2021) J Inf Secur Appl 62:102955 8. Abikoye OC, Ojo UA, Awotunde JB, Ogundokun RO (2020) Multimedia Tools Appl 79(31):23483 9. El-Khamy SE, Korany NO, Mohamed AG (2020) IEEE Access 8:148935 10. Al-Harbi OA, Alahmadi WE, Aljahdali AO (2020) SN Appl Sci 2(2):1 11. Gambhir G, Mandal JK (2021) Innovations in systems and software engineering, pp 1–10 12. Chatterjee A, Ghosal SK, Sarkar R (2020) Multimedia Tools Appl, 1–19
13. Nisperos ZA, Gerardo B, Hernandez A (2020) 2020 12th international conference on electronics, computers and artificial intelligence (ECAI). IEEE, pp 1–6 14. Jose A, Subramaniam K (2020) Materials today: proceedings 15. Marwan S, Shawish A, Nagaty K (2016) Biosystems 150:110 16. Leier A, Richter C, Banzhaf W, Rauhe H (2000) Biosystems 57(1):13 17. Chang CC, Lu TC, Chang YF, Lee R (2007) Int J Innov Comput Inf Control 3(5):1145 18. Huang YH, Chang CC, Wu CY (2014) Multimedia Tools Appl 70(3):1439 19. Chen T (2007) International workshop on frontiers in algorithmics. Springer, pp 84–95 20. Liu G, Liu H, Kadir A (2014) Med Biolog Eng Comput 52(9):741 21. Fu J, Zhang W, Yu N, Ma G, Tang Q (2014) 2014 7th international conference on biomedical engineering and informatics. IEEE, pp 868–872 22. Malathi P, Gireeshkumar T (2016) Procedia Comput Sci 93:878 23. Vinodhini R, Malathi P (2018) Computational vision and bio inspired computing. Springer, pp 819–829 24. USCID Image Database. http://sipi.usc.edu/database/ 25. Zhang S, Gao T (2015) Int J Multimedia Ubiquit Eng 10(4):337 26. Nag A, Choudhary S, Basu S, Dawn S (2016) IEIE Trans Smart Process Compu 5(4):250 27. Muhammad K, Ahmad J, Farman H, Jan Z (2016) arXiv preprint arXiv:1601.01386
Chapter 46
Random Forest Based Legal Prediction System Riya Sil
1 Introduction

The problem-solving approach of humans has led us to the current hi-tech era, where constant challenges and efforts are made to make machines smarter in comparison with human intelligence [1]. Instructions are induced into machines to make them act and take decisions like humans and also resolve complex problems [2]. The enormous amount of data transmitted through digital media should be analyzed properly for a better understanding of any situation. Artificial intelligence provides an efficient mechanism of machine learning based document analysis using various parameters to understand a problem and predict its probable output [3]. Human expectations have gradually taken new heights with time, to reduce workload and enhance accuracy, efficiency and speed of work for better productivity; thus, innovative technologies are being developed using artificial intelligence [4]. It can analyze any event or predict the outcome of a system [5]. For the past few decades, as artificial intelligence has been getting its roots into law through legal document generation [6], prediction, summarization, etc., researchers have been trying to assist legal professionals in the delivery of justice to the beneficiary [7]. In a developing country like India, a shortage of judges has led to an enormous number of pending cases. According to reports by the Law Ministry of India in the year 2019, there have been more than forty-three lac pending cases over twenty-five High Courts in India, out of which eight lac cases have been waiting over a decade now [8]. With the degradation of the socio-economic situation during the Coronavirus (COVID-19) pandemic, such cases have further piled up and worsened the situation. In such a crucial time, the use of machine learning algorithms to increase efficiency and decrease the workload of legal professionals can help in reducing the number of pending cases, thus strengthening the legal system [9–11].

R. Sil (B), Adamas University, Kolkata 700126, India. e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022. M. S. Uddin et al. (eds.), Proceedings of International Joint Conference on Advances in Computational Intelligence, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-19-0332-8_46
denied”, time is the biggest challenge faced by our judicial system today, stemming from variety of reasons from the lack of well-trained legal professionals, inefficient flow of information and mishandling of evidences all contribute to prolonged trials. Making data the focal point in a trial by introducing algorithms, would streamline the entire judicial process. Legal professionals would be empowered to consume and transcribe a broad range of cases by having an algorithm to assist them with greater accuracy via Random Forest Algorithm. For achieving the objective, authors have emphasized on argument-based legal judgement prediction system related to Dowry Prohibition Act that aims towards the prediction of the accused person being offender or not from manually created judicial dataset [12–15]. There is no pre-structured legal dataset in India for which new dataset needs to be created. Argument-based legal hardcopy documents are collected with their respective pronounced judgments related to Dowry Prohibition Act from trial courts of West Bengal [16–20]. Based on certain parameters the dataset is created from the collected documents and Random Forest based standard classifier to analyze the performance of proposed legal prediction system [21, 22]. Section – 2 provides a comparison of various research works related to artificial intelligence in legal field. Section – 3 provides a clear view about the fundamentals of Random Forest and detailed picture about the input dataset. Section – 4 illustrates the proposed model using random forest algorithm. It discusses about the performance analysis of the model. Section – 5 concludes the paper and discusses about the future scope of work.
2 Comparison of Various Research Works Related to Artificial Intelligence in Legal Field

In this section, the authors compare various existing works using artificial intelligence in the legal field. Riya Sil et al. [3] performed a pertinent study of artificial intelligence in the legal field that focuses on comparing various software based legal tools. Dipanjan Saha et al. [7] created a legal support system using machine learning and text analytics; a Bag of Words (BOW) representation has been used so that the most used words can be identified within the legal documents. In another paper, Riya Sil et al. [6] predicted the offender of legal cases using a support vector machine and also reported the performance and accuracy of the model. Similarly, Riya Sil et al. [24] proposed a legal prediction system with the help of several classifiers, namely the Naïve-Bayes classifier, K-Nearest Neighbour (KNN), and Decision Tree, to validate the performance and accuracy of the proposed model. Moreover, the averages are compared to find the best prediction algorithm over a legal dataset that has been generated from dowry death cases.
3 Fundamentals of Random Forest

Random Forest is a supervised classification algorithm used to analyze behavior and therefore make predictions from a given set of data [23]. It comprises an enormous number of distinct Decision Trees that function collaboratively, i.e., it is built on decision trees and is used for structuring predictions and analyzing behavior [24]. Each of the trees in a Random Forest provides a class prediction, and the class receiving the highest number of votes is declared the predicted result of the model [9, 25, 26]. The model construction, model prediction and Random Forest based text analytics are described below.

Model Construction: For the construction of a model, a training dataset with N cases is provided, from which a training sample set of size N is selected from the original training dataset with replacement. The sampled dataset consists of M input variables/features, i.e., in the tree-building process, an explicit count n of input variables is specified and kept constant (where n << M) … '(> 7 years or not)'. This parameter ensures the case falls under 'dowry death' if the incident is within 7 years of marriage. (iv) 'If the incident has taken place within seven years of marriage', (v) 'Postmortem Report (Usual/Unusual Death)', and various other significant parameters are included. These parameters are selected on the basis of feature importance, thus eliminating the unimportant ones. Feature selection is the method by which useful features are selected from a stream of extracted features. The model is trained on the basis of these selected features to predict the offender. The above-mentioned feature selection parameters have been chosen by legal professionals through various case studies to propose the legal prediction system for the Dowry Prohibition Act (Table 1).
Table 1 Manual dataset

Torture and assaulted by | Dowry (Yes/No) | Torture after marriage (Yes/No) | Incident within 7 years (Yes/No) | Post mortem report (unusual/usual) | If person is offender or not
Husband of the victim, brother-in-law, mother-in-law | 1 | 1 | 1 | 1 | 1
Husband of the victim, brother-in-law, mother-in-law | 1 | 1 | 1 | 1 | 1
Husband of the victim, brother-in-law, mother-in-law | 1 | 1 | 1 | 1 | 1
Husband of the victim, brother-in-law, mother-in-law | 1 | 1 | 0 | 1 | 1
Husband of the victim, brother-in-law, mother-in-law | 1 | 1 | 0 | 1 | 1
Husband of the victim, brother-in-law, mother-in-law | 1 | 1 | 0 | 1 | 1
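For illustration, one possible way to encode the Table 1 records as a feature matrix and label vector (e.g., with pandas) is sketched below; the shortened column names are illustrative and not from the original paper.

```python
# Table 1 records encoded as binary features; the last column is the class label.
import pandas as pd

records = [
    # dowry, torture_after_marriage, incident_within_7_years, unusual_post_mortem, offender
    [1, 1, 1, 1, 1],
    [1, 1, 1, 1, 1],
    [1, 1, 1, 1, 1],
    [1, 1, 0, 1, 1],
    [1, 1, 0, 1, 1],
    [1, 1, 0, 1, 1],
]
columns = ['dowry', 'torture_after_marriage', 'incident_within_7_years',
           'unusual_post_mortem', 'offender']
df = pd.DataFrame(records, columns=columns)
X = df.drop(columns='offender')   # input features
y = df['offender']                # class label to be predicted
```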
4 Proposed Model Using Random Forest

The dataset created for the model has been used for the automatic generation of judgement prediction. In this section, the authors discuss the working principle of the proposed model. In general, there are four types of machine learning models: (i) supervised learning, (ii) unsupervised learning, (iii) semi-supervised learning, and (iv) reinforcement learning. In this paper, the authors have used the random forest algorithm, which falls under supervised machine learning. In supervised machine learning, knowledge is gained from predefined data with both input and expected output [44].

Random Forest Algorithm: A Random Forest classifier is a collection of different Decision Tree classifiers. It is a collection of prediction trees in which every single tree depends on an independently sampled random vector with the same distribution as the other trees in the forest [45–48]. The final class of the test data is calculated by taking the majority vote of all the predictions of the decision trees being considered.
Fig. 1 Random forest algorithm
The random forest algorithm works simply by the following steps: (i) First, some random samples are selected from a given set of data. (ii) Then, for each sample, a decision tree is constructed and a prediction result is obtained from each of them. (iii) Next, a vote is performed for each predicted result. (iv) Finally, the most voted prediction result is selected as the final result.

Classifier Creation: In this paper, the Random Forest classifier module (refer to Fig. 1) from the Scikit-Learn package of Python has been used. For creating the classifier, the following statement is used:

rfClassifier = RandomForestClassifier()    (1)
Fitting Inputs into the Classifier: The fit() function of the RandomForestClassifier class has been used, which assists in feeding the data from the dataset into the created classifier. The statement for feeding the inputs into the created classifier is as follows:

rfClassifier.fit(XTrain, YTrain)    (2)
Prediction/Class Phase: After fitting the inputs to the classifier model, we get the classes that are predicted by our classifier from the testing input data. The task of prediction is done using the predict() function of the RandomForestClassifier class. The statement for predicting the output classes is shown below:

YPred = rfClassifier.predict(XTest)    (3)
Classification Report and Accuracy Generation: After getting the predicted class labels, a classification report and accuracy score are generated to find the performance of our classifier.
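Putting the above statements together, a minimal end-to-end sketch of the training and evaluation pipeline with scikit-learn might look as follows; the train/test split ratio and the variables X and y (the feature matrix and offender labels built from the manual dataset, e.g., as in the sketch after Table 1) are assumptions for illustration.

```python
# Minimal end-to-end sketch: classifier creation, fitting, prediction, and report generation.
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report, accuracy_score

# X, y: feature matrix and 'offender or not' labels prepared from the manual dataset.
XTrain, XTest, YTrain, YTest = train_test_split(X, y, test_size=0.25, random_state=42)

rfClassifier = RandomForestClassifier()      # statement (1): classifier creation
rfClassifier.fit(XTrain, YTrain)             # statement (2): fitting the training data
YPred = rfClassifier.predict(XTest)          # statement (3): predicting the test classes

print(classification_report(YTest, YPred))   # precision, recall, F1-score and support per class
print('Accuracy:', accuracy_score(YTest, YPred))
```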
Table 2 Classification report and accuracy score of the Random Forest classifier

Parameters         Precision   Recall   F1-score   Support
Class 0            1.00        0.50     0.67       4
Class 1            0.92        1.00     0.96       23
Accuracy                                0.93       27
Macro average      0.96        0.75     0.81       27
Weighted average   0.93        0.93     0.92       27
Advantages of the Random Forest classifier include: (i) It can handle huge amounts of data. (ii) The number of decision trees makes it more accurate and robust in nature. (iii) Random forest takes the average of the predictions, which cancels bias and results in no problem of overfitting. (iv) Among many other classification approaches, random forest provides the maximum precision. (v) It can be used both for regression problems as well as classification problems.
4.1 Performance Analysis Using Proposed Model

To get a clear view of model performance, a classification report and accuracy score have been produced that include the following metrics: (i) Precision: the quotient of the number of correctly identified members of a class by the total number of members predicted to belong to that class. (ii) Recall: the quotient of the number of correctly identified members of a class by the total number of actual members of that class. (iii) F-1 Score: this metric combines precision and recall into one metric. F1 depends on the values of these metrics, i.e., if both precision and recall are high, then the value of F-1 will automatically be high, and vice versa; if one of the metrics is high and the other is low, F-1 will be low. Through F-1, one can identify a good classifier that can recognize the members of a particular class. In the result, the average accuracy is 93%. In the given table, the performance report and accuracy score of the model are depicted (Table 2).
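As a quick consistency check against Table 2 (a computation added here for illustration, assuming the standard definitions of the F1-score and accuracy), the Class-0 F1-score and the overall accuracy follow directly from the tabulated precision, recall, and support values:

$$F_1(\text{Class 0}) = \frac{2 \times Precision \times Recall}{Precision + Recall} = \frac{2 \times 1.00 \times 0.50}{1.00 + 0.50} \approx 0.67$$

$$Accuracy = \frac{0.50 \times 4 + 1.00 \times 23}{4 + 23} = \frac{25}{27} \approx 0.93$$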
4.2 Comparison Between Classes

Measuring the performance of the Random Forest classifier model in classifying the data can also be done on the basis of the class labels present in the dataset. All the values to be predicted belong to two classes, namely (i) Class-0 and (ii) Class-1. Therefore, in this section, the authors have graphically plotted the comparison chart of the classifier for each of the two classes of data separately. The performance of the said classifier for Class-0 and Class-1 values is shown subsequently (refer to Fig. 2).
Fig. 2 a Performance analysis for Class-0 data b Performance analysis for Class-1 data
5 Conclusion

In this paper, the authors have used the Random Forest algorithm, a supervised machine learning algorithm, to create a legal judgement prediction system that extends help to legal professionals. As a developing nation, India is facing a shortage of well-trained, expert manpower and proper infrastructure in the legal field as well. This prevents citizens from getting their deserved justice. Prolonged legal cases may lead to various difficult consequences such as medical unfitness of the accused, hostility of witnesses, tampering of evidence, etc. Legal professionals will benefit from the proposed model. It helps to analyze and perform prediction based on important parameters related to 'dowry death' cases. A standard classifier (i.e., the Random Forest classifier) has been used to demonstrate the accuracy and performance of the model. An accuracy of 93% has been achieved in this approach. In future, the authors aim to achieve an accuracy of 100% by adding some additional parameters, so as to provide victims with proper justice and bring huge benefit to society.
References 1. Makridakis S (2017) The forthcoming artificial intelligence revolution: its impact on society and firms. Futures 90:46–60. https://doi.org/10.1016/j.futures.2017.03.006 2. McGovern A et al (2017) Using Artificial Intelligence to improve real-time decision-making for high-impact weather. Bull Am Meteor Soc 98(10):2073–2090. https://doi.org/10.1175/bamsd-16-0123.1 3. Sil R, Roy A, Bhushan B, Mazumdar AK (2019) Artificial intelligence and machine learning based legal application: the state-of-the-art and future research trends. In: 2019 international conference on computing, communication, and intelligent systems (ICCCIS). https://doi.org/ 10.1109/icccis48478.2019.8974479
4. Wildhaber I (2018) Artificial Intelligence and robotics, the workplace, and workplace-related law. In: Research handbook on the law of artificial intelligence, pp 577–608. https://doi.org/ 10.4337/9781786439055.00036 5. Agrawal A, Gans J, Goldfarb A (2018) Exploring the impact of artificial intelligence: prediction versus judgment. https://doi.org/10.3386/w24626 6. Sil R, Roy A (2020) A novel approach on argument based legal prediction model using machine learning. In: 2020 international conference on smart electronics and communication (ICOSEC). https://doi.org/10.1109/icosec49089.2020.9215310 7. Saha D, Sil R, Roy A (2020) A study on implementation of text analytics over legal domain. In: Evolution in computational intelligence, pp 561–571. https://doi.org/10.1007/978-981-155788-0_54 8. Kumar D, Priyanka NA (2020) Decision tree classifier: a detailed survey. Int J Inf Decis Sci 12(3):246. https://doi.org/10.1504/ijids.2020.10029122 9. Zhang F, Yang X (2020) Improving land cover classification in an urbanized coastal area by random forests: the role of Variable Selection. Remote Sens Environ 251:112105. https://doi. org/10.1016/j.rse.2020.112105 10. Bodanza G, Tohmé F, Auday M (2017) Collective argumentation: a survey of aggregation issues around argumentation frameworks. Argument Comput 8(1):1–34. https://doi.org/10.3233/aac160014 11. Sil R, Alpana Roy A, Dasmahapatra M, Dhali D (2021) An intelligent approach for automated argument based legal text recognition and summarization using machine learning. J Intell Fuzzy Syst, 1–10. https://doi.org/10.3233/jifs-189867 12. Gurbani V, Thakur S (2018) Study of alleged dowry death cases at a morgue in West Bengal. Indian J Forensic Med Toxicol 12(1):313. https://doi.org/10.5958/0973-9130.2018.00061.0 13. Burrell J (2015) How the Machine ‘thinks:’ understanding opacity in machine learning algorithms. SSRN Electron J. https://doi.org/10.2139/ssrn.2660674 14. Long S, Tu C, Liu Z, Sun M (2019) Automatic judgment prediction via legal reading comprehension. In: Lecture Notes in Computer Science, pp 558–572. https://doi.org/10.1007/978-3030-32381-3_45 15. Prihandoko P, Bertalya B, Setyowati L (2020) City health prediction model using random forest classification method. In: 2020 fifth international conference on informatics and computing (ICIC). https://doi.org/10.1109/icic50835.2020.9288542 16. Branting LK et al (2020) Scalable and explainable legal prediction. Artif Intell Law 29(2):213– 238. https://doi.org/10.1007/s10506-020-09273-1 17. Wang C, Jin X (2020) Study on the multi-task model for legal judgment prediction. In: 2020 IEEE international conference on artificial intelligence and computer applications (ICAICA). https://doi.org/10.1109/icaica50127.2020.9182565 18. Tonry M (2013) Legal and ethical issues in the prediction of recidivism. SSRN Electron J. https://doi.org/10.2139/ssrn.2329849 19. Yamakoshi T, Ogawa Y, Komamizu T, Toyama K (2020) Japanese legal term correction using random forest. Trans Japanese Soc Artif Intell 35(1). https://doi.org/10.1527/tjsai.h-j53 20. Karwa SS (2020) Dowry death and law- in India. Nat J Res Ayurved Sci 8(06). https://doi.org/ 10.52482/ayurlog.v8i06.690 21. Agarwal R (2018) Deciphering dowry deaths in India. Contemp Soc Sci 27(2):150–155. https:// doi.org/10.29070/27/57476 22. Ranganath LM (2019) Study of dowry deaths in northern Maharashtra region. Indian J Forensic Med Toxicol 13(4):195. https://doi.org/10.5958/0973-9130.2019.00287.1 23. 
Sarica A, Cerasa A, Quattrone A (2017) Random Forest algorithm for the classification of neuroimaging data in Alzheimer’s disease: a systematic review. Frontiers Aging Neurosci 9. https://doi.org/10.3389/fnagi.2017.00329 24. Sil R, Saha D, Roy A (2021) A study on argument-based analysis of legal model. In: Advances in intelligent systems and computing, pp 449–457. https://doi.org/10.1007/978-3-030-736033_42
25. Priyanka, Kumar D (2020) Decision tree classifier: a detailed survey. Int J Inf Decis Sci 12(3):246. https://doi.org/10.1504/ijids.2020.10029122 26. Xu H, Yang M, Liang L (2010) An improved random decision trees algorithm with application to land cover classification. In: 2010 18th international conference on geoinformatics. https:// doi.org/10.1109/geoinformatics.2010.5567531 27. Azar AT, Elshazly HI, Hassanien AE, Elkorany AM (2014) A random forest classifier for lymph diseases. Comput Methods Programs Biomed 113(2):465–473. https://doi.org/10.1016/ j.cmpb.2013.11.004 28. Chen W et al (2017) A comparative study of logistic model tree, random forest, and classification and regression tree models for spatial prediction of landslide susceptibility. CATENA 151:147–160. https://doi.org/10.1016/j.catena.2016.11.032 29. Ellis K, Kerr J, Godbole S, Lanckriet G, Wing D, Marshall S (2014) A random forest classifier for the prediction of energy expenditure and type of physical activity from wrist and hip accelerometers. Physiol Meas 35(11):2191–2203. https://doi.org/10.1088/0967-3334/35/11/ 2191 30. Shah K, Patel H, Sanghvi D, Shah M (2020) A comparative analysis of logistic regression, random forest and KNN models for the text classification. Augmented Hum Res 5(1). https:// doi.org/10.1007/s41133-020-00032-0 31. Alfaro E, Gámez M, García N (2018) Ensemble classifiers methods. Ensemble Classif Methods Appl R, 31–50. https://doi.org/10.1002/9781119421566.ch3 32. Guerreiro J, Rita P (2020) How to predict explicit recommendations in online reviews using text mining and sentiment analysis. J Hosp Tour Manag 43:269–272. https://doi.org/10.1016/ j.jhtm.2019.07.001 33. Tyralis H, Papacharalampous G, Langousis A (2021) Random forests in water resources. https:// doi.org/10.5194/egusphere-egu21-2105 34. Campos D, Silva R, Bernardino J (2019) Text mining in hotel reviews: impact of words restriction in text classification. In: Proceedings of the 11th international joint conference on knowledge discovery, knowledge engineering and knowledge management. https://doi.org/10.5220/ 0008346904420449 35. Arunadevi J, Ganeshamoorthi K (2019) Feature selection facilitated classification for breast cancer prediction. In: 2019 3rd international conference on computing methodologies and communication (ICCMC). https://doi.org/10.1109/iccmc.2019.8819752 36. Bishop C (2016) Domestic violence: the limitations of a legal response. Domest Violence, 59–79. https://doi.org/10.1057/978-1-137-52452-2_4 37. Nguyen QV et al (2017) Argument discovery via crowdsourcing. VLDB J 26(4):511–535. https://doi.org/10.1007/s00778-017-0462-9 38. Sadev SP (2021) Analyzing the challenges dowry prohibition laws through a review of the Supreme Court decisions in relation to misuse of Section 498A. SSRN Electron J. https://doi. org/10.2139/ssrn.3913497 39. Roesch E, Amin A, Gupta J, García-Moreno C (2020) Violence against women during covid-19 pandemic restrictions. BMJ m1712. https://doi.org/10.1136/bmj.m1712 40. Chaudhary D, Vasuja ER (2019) A review on various algorithms used in machine learning. Int J Sci Res Comput Sci Eng Inf Technol, 915–920. https://doi.org/10.32628/cseit1952248 41. Alloghani M, Al-Jumeily D, Mustafina J, Hussain A, Aljaaf AJ (2019) A systematic review on supervised and unsupervised machine learning algorithms for data science. Unsupervised Semi-Supervised Learn. 3–21. https://doi.org/10.1007/978-3-030-22475-2_1 42. Loog M (2018) Supervised classification: quite a brief overview. Mach Learn Tech Space Weather 113–145. 
https://doi.org/10.1016/b978-0-12-811788-0.00005-6 43. Muhammad I, Yan Z (2015). Supervised machine learning approaches: a survey. ICTACT J Soft Comput, 05(03):946–952. https://doi.org/10.21917/ijsc.2015.0133 44. Prakash AJ, Ari S (2019) AAMI standard cardiac arrhythmia detection with random forest using mixed features. In: 2019 IEEE 16th India council international conference (INDICON). https://doi.org/10.1109/indicon47234.2019.9030317
45. Kano Y et al (2019) COLIEE-2018: evaluation of the competition on legal information extraction and entailment. New Frontiers Artif Intell 177–192. https://doi.org/10.1007/978-3-03031605-1_14 46. Yohannes E, Ahmed S (2018) Prediction of student academic performance using neural network, linear regression and support vector regression: a case study. Int J Comput Appl 180(40):39–47. https://doi.org/10.5120/ijca2018917057 47. Ao Y, Li H, Zhu L, Ali S, Yang Z (2019) The linear random forest algorithm and its advantages in machine learning assisted logging regression modeling. J Petrol Sci Eng 174:776–789. https:// doi.org/10.1016/j.petrol.2018.11.067 48. Wang Q, Nguyen T-T, Huang JZ, Nguyen TT (2018) An efficient random forests algorithm for high dimensional data classification. Adv Data Anal Classif 12(4):953–972. https://doi.org/10. 1007/s11634-018-0318-1
Chapter 47
Problem Solution Strategy Assessment of a Hybrid Knowledge-Based System in Teaching and Learning Practice Kamalendu Pal
1 Introduction

In recent decades, educators have been dealing with a hugely increased number of students [26, 40, 57]. This expansion brought with it a non-homogenous student population. It has made educators reconsider their teaching style and change their teaching in a more befitting way to accommodate this societal tendency. Besides, educators can no longer assume that all students will achieve their educational competence by being taught in the same manner. Therefore, new teaching practices are essential to support the services for the non-homogenous student population. There is evidence that students learn better when engaging in authentic, motivating, and appropriate learning activities pertinent to their requirements and hopes. Exercise-based tutorials, from a pedagogical standpoint, can address these characteristics and present the subject matter. Besides, course design is also part of higher education teaching and learning practice. Four course-design approaches (i.e., systematic, intellectual, scenario-based, and workshop-based) were advocated by D'Andrea [22]. She also mentioned that an outcome-based learning system provides flexible teaching, learning, and assessment strategies for course design. It is also essential to realize the difference between deep and surface approaches [7] to learning and the necessary influences on outcomes in a course. Biggs [8] also mentioned that the following four components are essential: (i) motivational context, (ii) learning activity, (iii) interaction with others, and (iv) a well-structured knowledge base. Students need to be given options to learn and discover things for themselves where possible. Academics and practitioners have expressed their views regarding instruction-based teaching practice in the context of computerized teaching systems or intelligent tutoring systems (ITS). Personalized instruction is often considered the most effective type of teaching, specifically for real-world use-case solutions [16].

K. Pal (B), City, University of London, London EC1V 0HB, UK. e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022. M. S. Uddin et al. (eds.), Proceedings of International Joint Conference on Advances in Computational Intelligence, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-19-0332-8_47
The inspiring motivation for academic research in ITS has been to encapsulate the appropriate behaviours of human practitioners, hence trying to create artificial intelligence-based teaching and learning software systems [20, 23]. This paper describes a scenario-based teaching and learning practice in which the learning resources are used before, during and after synchronous events such as scheduled lecture sessions and tutorial sessions. The paper consists of five sections. It includes the background knowledge regarding teaching and learning practice and the software architectural details of GBMA. Besides, the paper also presents the information representation details of the software architecture. Then, the concepts of fuzzy set theory are explained, and the experimental evaluation results are described. Finally, the paper ends with concluding remarks.
2 Background of Teaching and Learning Practice

The Socratic style of teaching exposes contradictions in the students' logical reasoning process while solving problems, and this practice is widely used in law school teaching and learning [25]. In law school, educators often ask a series of questions once a student comes up with a case solution and explains the related legal principles to justify the decision. Educators often change the information related to the case, or the legislative guidance, to show that the case's resolution can alter abruptly even if a single piece of information changes. The main objective of this demonstration and analysis is for students to relate their understanding of the case and the reasoning processes by considering different options. This way, the student can be taught to think critically to justify the ultimate solution. Traditional teaching practices (e.g., lecture, tutorial, computer laboratory session) are restricted in their opportunity to create a transformative learning environment. Hence, other teaching practices need to be trialled and used. In this context, scenario-based learning (SBL) presents an effort to teach by solving a simulated problem [5, 44, 47]. SBL encourages learners to be more proactive in their learning process and provides the option to enhance real-world problem-solving ability. This paper describes the scenario-based teaching and learning feedback assessment method for a legal knowledge-based ITS. Academics and legal practitioners often deploy two reasoning methods when preparing to analyze a law case: reasoning by legislative statutes and reasoning by analogy [4, 32, 34]. There exist at least three unique software architecture development methods for knowledge-based systems in the legal domain: normative reasoning (or rule-based reasoning, RBR) [3, 35, 53]; case-based reasoning (CBR) [2, 12, 19, 51]; and a combined approach [14, 36–39] or deployment of a system based on other reasoning techniques [59]. Three fundamental techniques have been highlighted by practitioners for assessing deployed software systems [1, 10, 11, 48], involving the following assessment
levels: (i) technical assessment, (ii) empirical assessment, and (iii) subjective assessment. In addition, different academic research studies highlight the use of these approaches [27, 28, 32, 49, 58]. Many research projects have assessed the functionalities of software systems by examining the quality of the outcomes of these systems [4, 52]. As Adelman noted [1], technical assessment techniques focus on internal appropriateness and provide verification techniques. The questionnaire-based examination technique is well-liked, and it has been used in different applications [17, 29, 33]. Questionnaires are often used to evaluate a software system's performance [1]. Fuzzy set theory has been used in many industrial applications [18, 51]. This paper presents a fuzzy linguistic term-based assessment technique [6, 9].
3 Structure of GBMA

The software architecture of GBMA consists of an RBR module, a CBR module, and a component that determines the suitability of the reasoning method. The architecture of GBMA is shown in Fig. 1. The designed software system encapsulates three functionalities of business merger and acquisition processes (e.g., valuation, planning, and modification of initial
Fig. 1 Software system architecture for GBMA
Fig. 2 An S-type rule structure
planning) [13, 15, 24, 41]. In addition, the developed system provides a simple text-based user interface.
3.1 Knowledge Representation in GBMA

In the implemented software system, many characteristics of previously decided legal case reports are represented systematically as detailed information using object-oriented design principles. Statutory legislative guidance is also translated into rule form and saved in the software system's rule-based reasoning (RBR) component. The rules are classified into four categories (e.g., V-type for target business valuation, S-type for future planning purposes, M-type for modification of the initial plan, and C-type for software system operation management). An S-type rule for shareholder-protection related issues is described in Fig. 2. The diagrammatic architecture of the deployed software system is presented in Fig. 1.
3.2 Reasoning Based on Previously Decided Case Law

Previously decided law case reports help solve a new business acquisition and merger case based on its analogical similarity with past cases [38, 39]. This way, analogical reasoning uses a similarity measurement mechanism [21].
Similarly, a rule-based scoring mechanism is also used to compute different types of RBR-based guidance. The score of a predictive rule R_i can be defined as follows:

$$Score_{R_i} = \frac{Score_u}{Score_l} \qquad (1)$$
where Score_u = w1 Ne + w2 Ns + w3 Np, and Score_l is the total number of preconditions of the rule; Ne, Ns, and Np are the numbers of essential, significant, and peripheral preconditions that are true for the current case. The weighting factors w1, w2, and w3 correspond to the essential, significant, and peripheral categories, respectively. It has been found that the most convincing behaviour of GBMA occurs when w1 = 0.75, w2 = 0.62, and w3 = 0.25. In addition, based on the information retrieved from the developed software system, the legal expert also endorsed these values. This way, RBR can provide complete and partial advice based on the above scoring mechanism. A fuzzy set theory-based assessment method has been used to measure the performance of GBMA through a questionnaire-based survey. The following section describes the basic concepts of fuzzy subjective knowledge.
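A minimal sketch of this scoring mechanism, using the weighting factors reported above, is shown below; the function and argument names are illustrative and the example rule is hypothetical.

```python
# Rule score of Eq. (1): Score_u = w1*Ne + w2*Ns + w3*Np, Score_l = total number of preconditions.
W_ESSENTIAL, W_SIGNIFICANT, W_PERIPHERAL = 0.75, 0.62, 0.25

def rule_score(n_essential, n_significant, n_peripheral, total_preconditions):
    score_u = (W_ESSENTIAL * n_essential
               + W_SIGNIFICANT * n_significant
               + W_PERIPHERAL * n_peripheral)
    return score_u / total_preconditions

# Hypothetical rule with 6 preconditions, of which 3 essential, 1 significant and
# 1 peripheral precondition hold for the current case: score = 3.12 / 6 = 0.52 (partial advice).
print(rule_score(3, 1, 1, 6))
```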
4 Fuzzy Subjective Knowledge

Fuzziness arises when the boundary of a piece of information is not clearly described. More formally, a fuzzy set A in a universe of discourse U is characterized by a membership function

$$\mu_A(x) : U \rightarrow [0, 1]$$

which associates with each element x of U a number $\mu_A(x)$ in the interval [0, 1] that represents the grade of membership of x in the fuzzy set A. Fuzzy set theory, first outlined in [56–58], was developed to model the concept of fuzzy information and decision-making processes [6]. One of the most useful representations of a fuzzy set is its membership function.

Definition 1: Simply, the fuzzy membership function can be defined by analogy with the characteristic function of a crisp set,

$$f_A(x) = \begin{cases} 1 & \text{if } x \in A \\ 0 & \text{if } x \notin A \end{cases}$$

so that $\mu_A(x) = \text{Degree}(x \in A)$, while the complement of a fuzzy set satisfies $\mu_{\bar{A}}(u) = 1 - \mu_A(u)$.
Fig. 3 A simple framework of GBMA
The membership-deciding criteria (or function) of a fuzzy number can be classified in different ways, such as a triangular fuzzy number (TFN) or a trapezoidal fuzzy number.

Definition 2: A triangular fuzzy number, denoted by A = (a, b, c), has the membership function:
$$\mu_A(x) = \begin{cases} \dfrac{x-a}{b-a} & \text{if } a \le x \le b,\ a \ne b \\ \dfrac{x-c}{b-c} & \text{if } b \le x \le c,\ b \ne c \\ 0 & \text{otherwise} \end{cases}$$
This triangular fuzzy number A can be defined by a triplet (a, b, c) as shown in Fig. 3. Modelling human behaviour using fuzzy sets has given an appropriate method for subject-specific imprecise [56] decision making. Linguistic terms deal with a verbal expression as their values [30, 57].
5 Fuzzy Approaches for GBMA Assessment

The current research used three end-user groups for GBMA's performance assessment. The software system's decision quality is evaluated by the users. A set of fourteen performance criteria is used for system assessment. In this process, triangular fuzzy numbers are converted into corresponding crisp real numbers. The expected value (EV) based technique [21] is used for this purpose, and its definition is as follows:

$$EV(T) = \frac{a + 2b + c}{4} \qquad (2)$$
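A small sketch of Eq. (2) is given below, checked against the first Group One entry of Table 1 (criterion C01); the function name is illustrative.

```python
# Converting a triangular fuzzy number (a, b, c) into a crisp value using Eq. (2).
def expected_value(tfn):
    a, b, c = tfn
    return (a + 2 * b + c) / 4.0

print(round(expected_value((6.31, 8.81, 9.52)), 2))   # 8.36, as reported for C01 / Group One in Table 1
```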
The synthesis of end-user responses is presented in Table 1. The computational procedure consists of (i) delimitation of the TFNs of the linguistic variables, (ii) identification of appropriate weights of the linguistic variables, and (iii) conversion into real numbers. Individual users assessed the software system's assessment criteria, and the gathered data have been plotted in a two-dimensional graphical representation, as shown in Fig. 4. This practical exercise represents the application of cognitive theory to the study of users' understanding of a particular subject with the help of an automated software system. In recent decades, research in cognitive theory and social psychology has attracted much more interest in studying an individual's understanding using theoretical and empirical results. However, the mental theory-based end-user learning experience is a promising direction for future research.
Table 1 Synthesis of the user responses

Criteria | Importance | Group One (G1): Fuzzy | Real | Group Two (G2): Fuzzy | Real | Group Three (G3): Fuzzy | Real
C01 | 0.081 | (6.31, 8.81, 9.52) | 8.36 | (5.94, 8.44, 9.37) | 8.04 | (6.45, 8.95, 9.47) | 8.45
C02 | 0.085 | (6.19, 8.69, 9.52) | 8.27 | (7.19, 9.69, 10.0) | 9.14 | (6.45, 8.95, 9.73) | 8.52
C03 | 0.073 | (4.54, 5.90, 8.45) | 6.72 | (3.91, 6.09, 7.81) | 5.98 | (4.47, 6.71, 8.29) | 6.55
C04 | 0.066 | (6.43, 8.93, 9.40) | 8.42 | (4.06, 6.41, 7.97) | 6.21 | (6.32, 8.82, 9.34) | 8.32
C05 | 0.069 | (5.71, 8.21, 9.17) | 7.82 | (4.38, 6.72, 8.44) | 6.56 | (5.92, 8.42, 9.21) | 7.99
C06 | 0.065 | (3.93, 6.31, 8.10) | 6.16 | (4.06, 6.25, 7.97) | 6.13 | (4.08, 6.45, 8.16) | 6.28
C07 | 0.058 | (3.95, 5.12, 7.02) | 5.31 | (3.91, 6.25, 8.13) | 6.14 | (3.29, 5.25, 7.24) | 5.26
C08 | 0.059 | (4.05, 6.07, 7.62) | 5.95 | (4.22, 6.72, 8.59) | 6.56 | (3.82, 5.79, 7.37) | 5.69
C09 | 0.063 | (6.31, 8.81, 9.52) | 8.36 | (5.47, 7.97, 9.06) | 7.62 | (5.79, 8.16, 9.08) | 7.30
C10 | 0.084 | (7.50, 10.0, 10.0) | 9.37 | (7.03, 9.37, 9.37) | 8.79 | (7.50, 10.0, 10.0) | 9.37
C11 | 0.076 | (3.69, 6.07, 7.74) | 5.89 | (3.91, 6.09, 7.97) | 5.02 | (4.08, 6.45, 8.03) | 6.25
C12 | 0.064 | (5.24, 7.74, 9.17) | 7.72 | (4.06, 6.09, 7.66) | 5.97 | (5.26, 7.76, 9.08) | 7.46
C13 | 0.073 | (6.90, 9.40, 9.88) | 8.89 | (6.87, 9.38, 9.84) | 8.87 | (7.24, 9.74, 10.0) | 9.18
C14 | 0.084 | (7.26, 9.76, 10.0) | 9.19 | (6.72, 9.22, 9.84) | 8.75 | (7.24, 9.74, 10.0) | 9.18
Fig. 4 Groupwise assessment criteria assessment
Table 2 Experimental outcome

Group number   Expectancy value
Group01        7.6885
Group02        7.3095
Group03        7.7103
The expected outcome values are computed from the end-user responses using Eq. (2), and the experimental results for the three user groups are presented in Table 2. The difference between the three user groups' expected values is minimal, and their sequence is G3 > G1 > G2. The evaluation outcomes indicate that the end-users were satisfied with most of the developed system's assessment criteria. Therefore, the analysis here aims to improve our understanding of software-mediated learning and assessment practice. Modelling end-user behaviour using fuzzy sets effectively allows the assessment to formulate decision problems where the available information is subjective and imprecise. For example, Artificial Intelligence (AI) techniques [52] make it possible to use this soft-computing method to automate software system assessment practice.
6 Conclusion

This paper presents some reflective analyses of the teaching and learning experience of an undergraduate legal reasoning practice. The reasoning context covers the subject area of the business merger and acquisition legal procedure, based on a hybrid knowledge-based software system. The application software system uses its stored knowledge to bid for a business, formulate the plan, and modify the initial plan (when needed) for the business buy-out process. The
described knowledge-based system uses two reasoning methods, RBR and CBR, to help end-users (e.g., undergraduate students) in decision-making practice. The paper also presents an assessment mechanism of the GBMA software system. Fuzzy numbers and membership functions are an appropriate mechanism to handle the uncertainty of concepts relating to human beings’ subjective judgements.
References 1. Adelman L (1992) Evaluating decision support and expert systems. Wiley, New York 2. Ashley KD (1987) Modelling legal argument: reasoning with cases and hypotheticals. PhD thesis, Department of Computer and Information Science, University of Massachusetts, Amherst, USA 3. Allen LE, Saxon C (1987) Some problems in designing expert systems to aid legal reasoning. In: Proceedings of the first international conference on artificial intelligence and law’, pp 94–103. ACM Press, New York 4. Bench-Capon TJM, Coenen F (1991) Practical application of KBS to law: the crucial role of maintenance. In: Noordwijk C, Schmidt A, Winkels R (eds) Legal knowledge-based systems: aims for research and development, 5–17, Lelystad, Veranda, The Netherlands 5. Bard JF, Feo TA, Holland SD (1995) Reengineering and the development of a decision support system for printed wiring board assembly. IEEE Trans Eng Manage 42:91–98 6. Bellman RE, Zadeh LA (1997) Local and fuzzy logics. In: Epstein G (ed) Modern uses of multiple-valued logic, pp 103–165 7. Biggs JB (1987) Student approaches to learning and studying, australian council for educational research, Melbourne 8. Biggs JB (1999) Teaching for quality learning at university. Open University Press, Buckingham 9. Bound D, Feletti GI (eds) (1991) The challenge of problem-based learning. Kogan Page, London 10. Borenstein D (1998) Towards a practical method to validate decision support systems. Decis Support Syst 23:227–239 11. Boritz JE, Wensley AKP (1992) Evaluating expert systems with complex outputs – the case of audit planning. Auditing J Pract Theor 11:14–29 12. Bain WM (1986) Case-based reasoning: a computer model of subjective assessment. PhD thesis, Department of Computer Science, Yale University, New Haven, USA 13. Burton SJ (1985) An introduction to law and legal reasoning, little, brown and company 14. Branting LK (1991) Integrating rules and precedents for classification and explanation: automating legal analysis. PhD thesis, Department of Computer Science, University of Texas, Austin, USA 15. Brealey R, Myers S, Allen F (2019) Principles of corporate finance. McGraw-Hill Education (2019). 16. Bloom BS (1984) The 2 sigma problem: the search for methods of group instruction as effective as one-to-one tutoring. Educ Res 13:4–16 17. Chen JQ, Lee SM (2003) An exploratory cognitive DSS for strategic decision making. Decis Support Syst 36:147–160 18. Chang TH, Wang TC (2009) Using the fuzzy multi-criteria decision-making approach for measuring the possibility of successful knowledge management. Inf Sci 179:355–370 19. Cuthill BB (1993) Using a multi-layered approach to representing Tort law cases for CBR. In: Proceedings of AAAI case-based reasoning workshop, pp 41–47, Washington 20. Carbonell JR (1970) AI in CAl: an artificial intelligence approach to computer-aided instruction. IEEE Trans Man-Mach Syst 11:190–202
21. Dubois D, Prade H (1991) Fuzzy sets in approximate reasoning, Part I: inference with possibility distribution. Fuzzy Sets Syst 40:143–202 22. D’Andrea V (2001) Organizing teaching and learning: outcome-based planning, a handbook for teaching and learning in higher education, 41–57. Kogan Page, London 23. Felix U (2002) The web as a vehicle for constructivist approaches in language teaching. ReCALL 14(1):2–15 24. French, D.: Blackstone’s Statutes on Company Law (2020). 25. Gregory V, Graham DW (1971) The paradox of socrates, the philosophy of socrates: a collection of critical essays. Anchor Books 26. Halsey AH (1992) Opening wide the door of higher education, NCE briefing No 6. National Commission on Education, London 27. Honey P, Mumford A (1982) The manual of learning styles, peter honey, maidenhead, United Kingdom 28. Hernando ME, Gomez EJ, Corocy R, del Pozo F (2002) Evaluation of DIABNET, a decision support system for therapy planning in general diabetes. Comput Methods Programs Biomed 62:235–248 29. Hubona GS, Blanton JE (1996) Evaluating system design features. Int J Hum Comput Stud 44:93–118 30. Hsieh TY, Lu ST, Tzeng GH (2004) Fuzzy MCDM approach for planning and design tender’s selection in public office buildings. Int J Project Manag 22(7):573–584 31. Johnson P, Mead D (1991) Legislative knowledge base systems for public administration – some practical issues. In: Proceedings of the third international conference on AI and law, pp 108–117 32. Kobbacy KAH, Proudlove NC, Harper MA (1995) Towards an intelligence maintenance optimization system. J Oper Res Soc 46:831–853 33. Li SL (2000) The development of a hybrid intelligent system for developing a marketing strategy. Decis Support Syst 27:395–409 34. Levi EH (1984) An introduction to legal reasoning. The University of Chicago Press 35. Michaelson RH (1982) A knowledge-based system for individual income and transfer tax planning. PhD thesis, Department of Computer Science, University of Illinois, Illinois, USA 36. Pal K, Campbell JA (1997) An application of rule-based and case-based reasoning with a single legal knowledge-based system. DATA BASE Adv Inf Syst SIGMIS 28(4):48–63 37. Pal K, Campbell JA (1995) A hybrid system for decision-making about assets in English divorce cases. In: Advances in case-based reasoning: first UK CBR workshop. LNAI, vol 1020. Springer: pp 152–165 38. Pal K, Campbell JA (1996) A hybrid Legal decision-support system using both rule-based and case-based reasoning. Inf Commun Technol Law 5:227–245 39. Pal K, Campbell JA (1998) ASHSD-II: A computational model for litigation support. Expert Syst 15(3):169–181 40. Parry G (1995) England, Wales, and Northern Ireland, in Adult in higher education: international perspectives on access and participation. In: Davies P (ed) Jessica Kingsley Publishers, London: pp 102–133 41. Pike R, Neale B, Linsley P (2015) Corporate finance and investment: decisions and strategies, Pearson Education (2015) 42. Kogan M, Moses I, El-Khawas E (1994) Staffing higher education: meeting new challenges. Jessica Kingsley Publishers, London 43. Kolb DA (1984) Presidential learning: experience as the source of learning and development. Prentice-Hall, Englewood Cliffs 44. Kindley RW (2002) Scenario-based e-learning: a step beyond traditional e-learning. ASTD Mag 45. Mayson SW, French D, Ryan CL (2000) Company law. Oxford University Press 46. Martin E (1999) Changing academic work: developing the learning university, Society for Research into Higher Education. 
Open University Press, Buckingham
47. Miser HJ, Quade ES (1988) Handbook of system analysis – craft issues and procedural choices. Wiley, USA 48. O’Keefe RM, Preece AD (1996) The development, validation and implementation of knowledge-based systems. Eur J Oper Res 92:458–473 49. Ram S, Ram S (1996) Validation of expert systems for innovation management: issues, methodology, and empirical assessment. J Prod Innov Manag 13:53–68 50. Riggert SC, Boyle M, Petrosko JM, Ash D, Rude-Parkin C (2006) Student employment and higher education: empiricism and contradiction. Rev Educ Res 76(1):63–92 51. Rissland EL, Ashley KD (1989) HYPO: precedent-based legal reasoning. In: Vandenberghe G (ed) Advanced topics of law and information technology. Kluwer, Dordrecht: p 213 52. Samek W, Wiegard T, Muller KR (2017) Explaining artificial intelligence: understanding, visualizing and interpreting deep learning models. ITUJ 1:1–10 53. Sergot MJ, Kowalski R, Kriwaczek F, Hammon P, Cory HT (1986) The British nationality act as a logic program. Commun ACM 29:370–386 54. Shehzad K, Javed MY (2010) Multithreaded fuzzy logic-based web services mining framework. Eur J Sci Res 4:632–644 55. Sharda R, Barr SH, Mcdonnell MC (1988) Decision Support System effectiveness – a review and an empirical test. Manage Sci 34:139–159 56. Zadeh L (1965) Fuzzy sets. Inf Control 8:338–353 57. Zadeh L (1975) The concept of a linguistic variable and its application to approximate reasoning. Inf Sci 8:199–249 58. Zeleznikow J, Nolan JR (2001) Using soft computing to build real-world intelligent decision support systems in uncertain domains. Decis Support Syst 31:263–285 59. Zeleznikow J, Stranieri A (1995) The split-up system: integrating neural nets and rule-based reasoning in the legal domain. In: Proceedings of the fifth international conference on AI and law, University of Maryland, vol 185. ACM Press, New York 60. Zimmermann HJ (1996) Fuzzy set theory and its applications. Kluwer Academic Publishers, Boston
Author Index
A Abualigah, Laith, 283 Afrin, Sadia, 319 Ahmad, Nabihah, 469 Akter, Shammi, 25 Ali, Shahnewaz, 269 Alsowail, Rakan A., 183 Amin, Fariha Fardina, 209 Amirineni, Sai Venkata Dhanush, 379 Anannya, Mehrin, 597 Ara, Ifath, 209 Arandjelovic, Jelena, 239 Asha, S., 485 Azim, Amirul, 41
Deshpande, Santosh L., 389 Dhal, Sunil Kumar, 369 Dornberger, Rolf, 53, 67, 81 Dunnigan, Matthew W., 523 Durai, S. Ananiah, 469
B Bacanin, Nebojsa, 239 Bahuguna, Saniya, 295 Bhardwaj, Arpit, 197 Bhatti, Dharmendra, 153 Bhowmick, Milton Chandro, 25 Booba, B., 445 Boomija, M. D., 565
G Goel, Shivani, 197
C Chaudhari, Dinesh N., 1 Chitra, L., 583 Chowdhury, Anjir Ahmed, 509 D Das, Argho, 509 Deb, Kaushik, 535, 549
E Eker, Erdal, 283 Ekinci, Serdar, 283 F Farin, Nusrat Jahan, 597 Fatema, Kaniz, 427
H Halder, Rathin, 25 Hanne, Thomas, 53, 67, 81 Haque, MD. Muhyminul, 549 Haque, Samiul, 535 Hasan, Md. Zahid, 427 Hoque, Khadija Kubra Shahjalal, 509 Hosen, Md. Biplob, 597 Huynh, Duy C., 523 I Islam, Khadija, 597 Islam, Muhammad Nazrul, 25, 41, 209 Izci, Davut, 283
J Jadhav, Priyanka, 399 Jayasree, T., 399 Joseph, P. Mani, 239 K Kalaichelvi, T., 343 Kaliwal, Rohit B., 389 Kalpana, C., 445 Känel, Urs, 67 Karmaker, Debajyoti, 509 Kasmir Raja, S. V., 565 Kasukurthi, Rohit Sai, 379 Kathole, Atul B., 1 Kaur, Mandeep, 97, 305 Kavitha Kumari, K. S., 583 Khan, Nafiz Imtiaz, 209 Kiran, Swar, 399 Kromer, Pavel, 457 Kruta, Jan, 67 Kumar, Ajai, 329 Kumar, Arvind, 197 Kumar, Sanjay, 9 Kumari, Poonam, 97 L Le, Duc-Anh, 457 M Majumder, Anup, 319 Mehdi, Mehtab, 355 Mohapatra, Sudhir Kumar, 369 Mubashshira, Tasneem, 209 Mukherjee, Subhadip, 613 Mukhopadhyay, Somnath, 613 Mulat, Worku Wondimu, 369 N Nguyen, Quoc-Dung, 457 Nipa, Nadira Anjum, 415 P Padmavathi, G., 485 Pal, Ashok, 295 Pal, Kamalendu, 635 Pandey, Ajay K., 269 Pandya, Mayur, 251 Pant, Shivani, 9 Phan, Nguyet-Minh, 457 Phan, Nguyet-Thuan, 457 Prathyusha Reddy, M., 469 Purohit, Aarti, 167 Puspita, Kazi Mumtahina, 427
R Rakic, Andjela, 239 Rathod, Jigna, 153 Ravi, S., 343 Rony, Md. Awlad Hossen, 427 S Sahoo, Satyabrata, 223 Sales, Camila Pereira, 131 Santiago Júnior, Valdivino Alexandre de, 131 Sarkar, Sunita, 613 Sathpathy, Rabinarayana, 369 Schär, Kevin, 81 Schwank, Philippe, 81 Shanmugapriya, D., 485 Sharma, Bharti, 355 Sharma, Sanjeev, 329 Sharmin, Shayla, 535 Shehnaz, 305 Shorif, Sumaita Binte, 319 Sil, Riya, 623 Sindhu, K., 495 Singh, Shashi Pal, 329 Singha, Chiranjit, 113 Sinha, Nishant, 197 Sowjanya, G., 469 Sravani, M. M., 469 Strumberger, Ivana, 239 Suha, Sayma Alam, 25 Sultana, Naznin, 415 Suppan, Mélanie, 53 Swain, Kishore C., 113 T Teja, K., 223 Tharinipriya, T., 399 Tiwari, Ritu, 329 Truong, Thanh H., 523 Tyagi, Ritika, 495 U Uddin, Mohammad Shorif, 319, 427, 597 V Valadi, Jayaraman, 251 Venkatachalam, K., 239 Y Yogi, Kuldeep Kumar, 167 Z Zivkovic, Miodrag, 239