Studies in Computational Intelligence Volume 1029
Series Editor Janusz Kacprzyk, Polish Academy of Sciences, Warsaw, Poland
The series “Studies in Computational Intelligence” (SCI) publishes new developments and advances in the various areas of computational intelligence—quickly and with a high quality. The intent is to cover the theory, applications, and design methods of computational intelligence, as embedded in the fields of engineering, computer science, physics and life sciences, as well as the methodologies behind them. The series contains monographs, lecture notes and edited volumes in computational intelligence spanning the areas of neural networks, connectionist systems, genetic algorithms, evolutionary computation, artificial intelligence, cellular automata, self-organizing systems, soft computing, fuzzy systems, and hybrid intelligent systems. Of particular value to both the contributors and the readership are the short publication timeframe and the world-wide distribution, which enable both wide and rapid dissemination of research output. Indexed by SCOPUS, DBLP, WTI Frankfurt eG, zbMATH, SCImago. All books published in the series are submitted for consideration in Web of Science.
More information about this series at https://link.springer.com/bookseries/7092
Siddhartha Bhattacharyya · Gautam Das · Sourav De Editors
Intelligence Enabled Research DoSIER 2021
Editors

Siddhartha Bhattacharyya, Rajnagar Mahavidyalaya, Birbhum, India

Gautam Das, Cooch Behar Government Engineering College, Cooch Behar, West Bengal, India

Sourav De, Cooch Behar Government Engineering College, Cooch Behar, West Bengal, India
ISSN 1860-949X ISSN 1860-9503 (electronic) Studies in Computational Intelligence ISBN 978-981-19-0488-2 ISBN 978-981-19-0489-9 (eBook) https://doi.org/10.1007/978-981-19-0489-9 © The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations. This Springer imprint is published by the registered company Springer Nature Singapore Pte Ltd. The registered company address is: 152 Beach Road, #21-01/04 Gateway East, Singapore 189721, Singapore
Dr. Siddhartha Bhattacharyya would like to dedicate this book to his loving and caring wife Rashni. Sourav De would like to dedicate this book to his respected parents Satya Narayan De and Tapasi De, loving wife Debolina Ghosh, beloved son Aishik De, sister Soumi De, his father-in-law Late Barun Ghosh and mother-in-law Alpana Ghosh and brother-in-law Subhodip Ghosh.
Preface
Modern technological advances over the last few decades have been enabled by newer and novel computational intelligence (CI) techniques. CI-enabled techniques have almost become an integral part of modern civilization, encompassing a broad range of scientific, engineering, medical and financial applications, owing to their ability to offer failsafe and robust end solutions. As a part of its initiative for furthering technological advances and know-how, the Government of India has been laying stress on the inculcation of computational intelligence in almost all sectors that influence the Indian economy. The Doctoral Symposium on Intelligence Enabled Research is one of the first such attempts in line with the initiatives of the Government of India, and it has led several other similar initiatives to follow. The first edition of the Doctoral Symposium on Intelligence Enabled Research (DoSIER 2019) was organized by RCC Institute of Information Technology, Kolkata, India, in 2019, with the novel cause of providing doctoral students and early career researchers a unique platform to showcase their innovations to the research and academic fraternity. The second edition of the symposium, DoSIER 2020, was organized by Visva-Bharati University, Santiniketan, India, in 2020 in a virtual mode, keeping in mind the objective of creating intelligent solutions for the future. Due to the severe pandemic affecting the world socio-economic structure, the third edition, the 2021 Third Doctoral Symposium on Intelligence Enabled Research (DoSIER 2021), was organized by Cooch Behar Government Engineering College, Cooch Behar, India, in association with Acuminous Research Foundation (AcuminoR), Kolkata, India, during November 12–13, 2021, in a virtual mode using the Webex meet platform, while adhering to the quality constraints prescribed by the technical sponsors. More than 100 participants attended the proceedings of DoSIER 2021 held online. The symposium was technically sponsored by the IEEE Computational Intelligence Society, Kolkata Chapter. DoSIER 2021 featured four keynotes delivered by eminent researchers and academicians from across the globe, along with two technical tracks. The keynote speakers included (i) Prof. Debashis De, Maulana Abul Kalam Azad University of Technology, Kolkata, India, (ii) Mr. Aninda Bose, Senior Editor, Springer India, (iii) Prof. Ashish
Mani, Amity University Noida, India, and (iv) Dr. Shakeel Ahmed, King Faisal University, Saudi Arabia. DoSIER 2021 received a good number of submissions from doctoral students around the globe. After peer review, only 13 papers were accepted for presentation at the conference. The authors of the accepted papers presented their peer-reviewed articles under the two technical tracks of DoSIER 2021.

Siddhartha Bhattacharyya, Birbhum, India
Gautam Das, Cooch Behar, India
Sourav De, Cooch Behar, India
December 2021
Contents
Solving Graph Coloring Problem Using Ant Colony Optimization, Simulated Annealing and Quantum Annealing—A Comparative Study (Arnab Kole, Debashis De, and Anindya Jyoti Pal) 1

Computer-Assisted Diagnosis and Neuroimaging of Baby Infants (Vinodkumar R. Patil and Tushar H. Jaware) 17

Early Prediction of Ebola Virus Using Advanced Recurrent Neural Networks (Avinash Sharma, Asadi Srinivasulu, and Tarkeshwar Barua) 31

A Three-Step Fuzzy-Based BERT Model for Sentiment Analysis (Koyel Chakraborty, Siddhartha Bhattacharyya, and Rajib Bag) 41

Mayfly Algorithm-Based PID Controller for LFC of Multi-sources Single Area Power System (T. Muthukumar, K. Jagatheesan, and Sourav Samanta) 53

Group Key Management Techniques for Secure Load Balanced Routing Model (Praveen Bondada and Debabrata Samanta) 65

Search Techniques for Data Analytics with Focus on Ensemble Methods (Archana S. Sumant and Dipak V. Patil) 77

A Survey on Underwater Object Detection (Pratima Sarkar, Sourav De, and Sandeep Gurung) 91

Covacdiser: A Machine Learning-Based Web Application to Recommend the Prioritization of COVID-19 Vaccination (Deepraj Chowdhury, Soham Banerjee, Ajoy Dey, Debasish Biswas, and Siddhartha Bhattacharyya) 105

Research of High-Speed Procedures for Defuzzification Based on the Area Ratio Method (Maxim Bobyr, Sergey Emelyanov, Natalia Milostnaya, and Sergey Gorbachev) 119

A Single Qubit Quantum Perceptron for OR and XOR Logic (Rohit Chaurasiya, Divyayan Dey, Tanmoy Rakshit, and Siddhartha Bhattacharyya) 133

Societal Gene Acceptance Index-Based Crossover in GA for Travelling Salesman Problem (Ravi Saini, Ashish Mani, and M. S. Prasad) 147

The Method of Neuro-fuzzy Calibration of Geometrically Distorted Images of Digital X-Ray Tomographs (Sergey Gorbachev, Dmytro Shevchuk, Victor Kuzin, Siddhartha Bhattacharyya, and Wang Zhijian) 167
Editors and Contributors
About the Editors

Prof. Siddhartha Bhattacharyya [FRSA (UK), FIET (UK), FIEI (I), FIETE, LFOSI, SMIEEE, SMACM, SMIETI] did his Bachelors in Physics, Bachelors in Optics and Optoelectronics and Masters in Optics and Optoelectronics from the University of Calcutta, India, in 1995, 1998 and 2000, respectively. He completed his Ph.D. in Computer Science and Engineering from Jadavpur University, India, in 2008. He is the recipient of the University Gold Medal from the University of Calcutta for his Masters. He is the recipient of several coveted awards, including the Distinguished HoD Award and Distinguished Professor Award conferred by the Computer Society of India, Mumbai Chapter, India, in 2017, the Honorary Doctorate Award (D.Litt.) from The University of South America, and the South East Asian Regional Computing Confederation (SEARCC) International Digital Award ICT Educator of the Year in 2017. He was appointed as an ACM Distinguished Speaker for the tenure 2018–2020. He is currently serving as the Principal of Rajnagar Mahavidyalaya, Rajnagar, Birbhum. He served as a Professor in the Department of Computer Science and Engineering of Christ University, Bangalore, and as the Principal of RCC Institute of Information Technology, Kolkata, India, during 2017–2019. He is a co-author of 6 books and the co-editor of 86 books and has more than 300 research publications in international journals and conference proceedings to his credit. His research interests include hybrid intelligence, pattern recognition, multimedia data processing, social networks, and quantum computing.

Dr. Gautam Das is a Professor of the ECE department of Cooch Behar Government Engineering College, West Bengal. He completed B.Tech. and M.Tech. from the Institute of Radio Physics and Electronics, Calcutta University, and subsequently completed his Ph.D. from NBU. Dr. Das has more than 19 years of teaching and research experience. He has been the author and co-author of many journal and conference papers and participated
in/organized national and international conferences. His areas of interest include System-on-Chip Testing and Design of Smart City.

Dr. Sourav De [SMIEEE, MACM, MIEI, LISTE, MCSTA, MIAENG] is currently an Associate Professor of Computer Science and Engineering at Cooch Behar Government Engineering College, West Bengal. With over 15 years of academic experience, he has authored one book and edited 12 books, contributed to more than 54 research publications in internationally reputed journals, edited books, and international IEEE conference proceedings, and has five patents to his credit. His research interests include soft computing, pattern recognition, image processing, and data mining.
Contributors

Rajib Bag, Indas Mahavidyalaya, Indas, Bankura, India
Soham Banerjee, Department of Electronics and Communication Engineering, International Institute of Information Technology, Naya Raipur, Chhattisgarh, India
Tarkeshwar Barua, BlueCrest University, Monrovia, Liberia
Siddhartha Bhattacharyya, Rajnagar Mahavidyalaya, Rajnagar, Birbhum, West Bengal, India
Debasish Biswas, Department of Electronics and Communication Engineering, Budge Budge Institute of Technology, Kolkata, West Bengal, India
Maxim Bobyr, Southwest State University, Kursk, Russian Federation
Praveen Bondada, Dayananda Sagar Research Foundation, University of Mysore (UoM), Mysore, Karnataka, India
Koyel Chakraborty, Supreme Knowledge Foundation Group of Institutions, Mankundu, India
Rohit Chaurasiya, Department of Applied Physics, Delhi Technological University, Bawana, Delhi, India
Deepraj Chowdhury, Department of Electronics and Communication Engineering, International Institute of Information Technology, Naya Raipur, Chhattisgarh, India
Debashis De, Maulana Abul Kalam Azad University of Technology, Kolkata, West Bengal, India
Sourav De, Cooch Behar Government Engineering College, Cooch Behar, India
Ajoy Dey, Department of Electronics and TeleCommunication Engineering, Jadavpur University, Jadavpur, West Bengal, India
Divyayan Dey, Department of Electronics and Communication Engineering, University of Calcutta, Kolkata, West Bengal, India
Sergey Emelyanov, Southwest State University, Kursk, Russian Federation
Sergey Gorbachev, National Research Tomsk State University, Tomsk, Russian Federation
Sandeep Gurung, Sikkim Manipal Institute of Technology, SMU, Rangpo, India
K. Jagatheesan, Department of EEE, Paavai Engineering College, Namakkal, India
Tushar H. Jaware, R. C. Patel Institute of Technology, Shirpur, M.S., India
Arnab Kole, The Heritage Academy, Kolkata, West Bengal, India
Victor Kuzin, Russian Academy of Engineering, Moscow, Russian Federation
Ashish Mani, Amity Innovation and Design Centre, Amity University, Noida, Uttar Pradesh, India
Natalia Milostnaya, Southwest State University, Kursk, Russian Federation
T. Muthukumar, Department of EEE, Kongunadu College of Engineering and Technology, Trichy, India
Anindya Jyoti Pal, University of Burdwan, Burdwan, West Bengal, India
Dipak V. Patil, Department of Computer Engineering, GES's R H Sapat College of Engineering Management Studies and Research, P T A Kulkarni Vidyanagar, Nashik, Maharashtra, India
Vinodkumar R. Patil, R. C. Patel Institute of Technology, Shirpur, M.S., India
M. S. Prasad, Amity Institute of Space Science and Technology, Amity University, Noida, Uttar Pradesh, India
Tanmoy Rakshit, Department of Computer Science and Engineering, Seacom Engineering College, Howrah, West Bengal, India
Ravi Saini, Department of Computer Science and Engineering, ASET, Amity University, Noida, Uttar Pradesh, India
Debabrata Samanta, Dayananda Sagar Research Foundation, University of Mysore (UoM), Mysore, Karnataka, India; Department of Computer Science, CHRIST Deemed to be University, Bengaluru, India
Sourav Samanta, Department of CSE, University Institute of Technology, The University of Burdwan, Burdwan, West Bengal, India
Pratima Sarkar, Sikkim Manipal Institute of Technology, SMU, Rangpo, India
Avinash Sharma, Maharishi Markandeshwar Engineering College, Haryana, India
Dmytro Shevchuk, National Technical University of Ukraine "Kyiv Polytechnic Institute", Kyiv, Ukraine
Asadi Srinivasulu, BlueCrest University, Monrovia, Liberia
Archana S. Sumant, Department of Computer Engineering, MET's Institute of Engineering, Bhujbal Knowledge City, Adagaon, Nashik, Maharashtra, India
Wang Zhijian, Robotics Institute, Zhejiang University, Ningbo, China
Solving Graph Coloring Problem Using Ant Colony Optimization, Simulated Annealing and Quantum Annealing—A Comparative Study

Arnab Kole, Debashis De, and Anindya Jyoti Pal
Abstract The paramount significance of graph coloring drives various advancements in developing state-of-the-art algorithms to resolve both empirical and practical problems in computer science. However, no single algorithm is robust enough to produce optimal solutions for all widely recognized benchmark instances. Meta-heuristic algorithms (MHAs) and evolutionary algorithms (EAs) produce prudent, near-optimal solutions for most cases of the graph coloring problem (GCP). Moreover, these algorithms can offer optimal solutions heuristically, in contrast to conventional exhaustive approaches. This paper presents a comparative study of solving GCP to find the chromatic number based on three MHAs, namely ant colony optimization (ACO), simulated annealing (SA), and quantum annealing (QA), in a single framework, with favorable outcomes. The final results show that QA outperforms SA and ACO, but it takes more time than ACO and SA for large graph instances.

Keywords Kinetic energy · Meta-state · Pheromone intensity · Thermal gradient · Quantum fluctuations

A. Kole (B), The Heritage Academy, Kolkata, West Bengal 700107, India. e-mail: [email protected]
D. De, Maulana Abul Kalam Azad University of Technology, Kolkata, West Bengal 700064, India
A. J. Pal, University of Burdwan, Burdwan, West Bengal 713104, India

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022. S. Bhattacharyya et al. (eds.), Intelligence Enabled Research, Studies in Computational Intelligence 1029. https://doi.org/10.1007/978-981-19-0489-9_1
1 Introduction

The idea of graph coloring originated from coloring the countries of a map uniquely with minimal color indices, i.e., a generalized version of face-coloring or region-coloring of a planar graph. The regions of a planar graph are considered properly colored if no two contiguous areas sharing a common edge have the same color; this is referred to as map-coloring. Map-coloring leads to the notion of edge-coloring, which in turn is cardinal to the concept of vertex-coloring, wherein the two vertices of an edge have two
different colors. Formally, a proper coloring of a graph assigns distinct colors to adjacent vertices. A graph colored by such a mechanism is perceived as a properly colored graph requiring minimum colors. A graph G properly colored with k different colors is termed a k-chromatic graph, and the number k is called the chromatic number of G. The importance of GCP has two aspects. First, there are some areas where efficient coloring of an undirected graph with the minimum possible number of colors directly impacts how effectively a specific target problem can be solved. Such areas include a broad spectrum of real-life issues like examination scheduling [1], register allocation [2], microcode optimization [3], channel routing [4], printed circuit testing [5], allocation of electronic bandwidth [6], flexible manufacturing systems design [7], and timetable scheduling [8]. The other reason is that it has been shown that the decision problem version of GCP is NP-Hard [9], and its approximate version is NP-Complete [10], due to which a variety of evolutionary and meta-heuristic approaches have been implemented to date to produce optimal or suboptimal solutions in a reasonable time. We have applied ACO, SA, and QA to solve GCP for the following reasons. Firstly, no work has yet compared the performances of these three algorithms for solving GCP in a single framework. Secondly, ACO and PSO are nature-inspired evolutionary algorithms, and [11–13] show that ACO outperforms PSO for different hard combinatorial optimization problems; GCP is also an NP-Complete combinatorial optimization problem. Thirdly, SA is an age-old classical meta-heuristic approach, whereas QA (or QSA) is its newer quantum variant, and thus we have applied both to solve GCP. Finally, we made a comparative performance analysis of these two algorithms with one of the nature-inspired evolutionary algorithms.
2 Earlier Work

In the works of [14, 15], the authors employed adjacency matrix and link-based exact (LBE) algorithms to address GCP, whereas [16–18] exercised direct approaches, namely branch-and-bound (BB), branch-and-cut (BC), and backtracking algorithms, to solve GCP. The firefly algorithm (FA) proposed by Chen and Kanoh [19] has also been used for GCP. Memetic algorithms (MA) have been adopted to solve GCP by Moalic and Gondran [20] and Zhuang and Fan [21]. Zhou and Duval [22] and Lim and Wang [23] used hybrid algorithms (HA) for GCP. Reynolds Boyd swarms [24] and Cuckoo search optimization [25, 26] have also been used to solve GCP. With the advent of GAs and other evolutionary approaches, solutions of GCP and its applications have been achieved in optimal or nearly suboptimal time [27–29]. In path-breaking research, Costa et al. [30] and Hertz et al. [31] proposed ACO-based and Tabu search methods, respectively, to solve GCP. Dahl [32] applied an ANN approach to address GCP, later modified by Jagota [33]. Graph coloring and ACO-based summarization for social networks have been implemented by Mosa and Hamouda [34]. Lately, Henrique et al. [35] applied a deep learning (DL) algorithm to solve GCP.
Naderi et al. [36] used parallelism in GA to produce optimal solutions for complex problems in a lesser time. Filho et al. [37] developed a constructive genetic algorithm (CGA) with column generation, whereas Pal et al. [38] proposed a GA approach with a double point guided mutation operator, enhancing simple GA's performance level. A divide and conquer-based GA method has been proposed by Marappan and Sethumadhavan [39] to solve GCP. Han et al. [40] designed a bi-objective genetic algorithm to solve GCP. Lakshmi et al. [41] proposed a hybrid genetic algorithm and local search for graph coloring. An SA approach with some modifications has been proposed by Pal et al. [42] to find the chromatic number. A QA approach was first applied by Crispin et al. [43] for GCP, and later Titiloye et al. [44] fine-tuned the parameters of QA. GCP has also been solved by Khan et al. [45] using a quantum-inspired evolutionary algorithm, and recently by Xu et al. [46] with a Cuckoo quantum evolutionary algorithm and by Basmassi et al. [47] with a novel greedy GA.
3 Ant Colony Optimization

The ant algorithm [48, 49] was first proposed as a meta-heuristic approach to solving complex optimization problems such as the traveling salesman problem (TSP) and the quadratic assignment problem (QAP). ACO algorithms are biologically inspired by the behavior of ants, which determine the shortest path between the nest and a food source through pheromone trails. A local pheromone trail serves as a communication channel among the ants. Pheromone information is continuously modified during the process, which aids the search for the optimum path by slowly forgetting the previously found best way. This evaporation mechanism is critical in achieving a superior solution from an inferior one through inter-communication between ants. Finally, an excellent final solution is achieved by the global cooperation of all the ants through movement between neighboring states, based on problem-dependent local information, the pheromone trail, and each ant's private information. Figure 1 describes the general ACO framework:
Fig. 1 General ACO framework
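To make the pheromone mechanics above concrete, the sketch below shows the classic evaporation-plus-deposit update, τ ← (1 − ρ)τ + Σ Δτ, around which ACO algorithms are built. It is a generic illustration rather than the authors' implementation (the paper's code is in ANSI C); the evaporation rate `rho` and the deposit rule `q / (1 + cost)` are illustrative assumptions.

```python
def update_pheromone(tau, ant_solutions, rho=0.1, q=1.0):
    """Generic ACO pheromone update (a sketch, not the paper's code).

    tau: dict mapping a solution component (e.g., a vertex-color pair)
         to its pheromone level.
    ant_solutions: list of (components, cost) pairs, one per ant.
    """
    # Evaporation: every trail decays, so previously found (possibly
    # inferior) paths are slowly forgotten.
    for component in tau:
        tau[component] *= (1.0 - rho)
    # Deposit: each ant reinforces the components of its own solution;
    # cheaper solutions deposit more pheromone.
    for components, cost in ant_solutions:
        for component in components:
            tau[component] = tau.get(component, 0.0) + q / (1.0 + cost)
    return tau
```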
4 Simulated Annealing

Simulated annealing [50] is a classical meta-heuristic technique. The basic idea behind annealing is to reproduce a natural metaphor: heating an object to break its chemical bonds and then cooling it down using water or silicon dioxide, which helps in crystallization. The cooling has to be done appropriately, since slow cooling allows crystal generation in a lower-energy metastate rather than a higher-energy metastate. In this method, a large volume of random sampling is repeated on an N-dimensional hypercube to generate the sample states. During the process, whenever a new state is created, the energy changes. Acceptance of a new state depends on this energy change: a negative energy change, or a positive energy change with a certain probability, leads to accepting the new state. This process is repeated with a temperature decrease until the temperature becomes 0. Figure 2 presents the general SA framework:

Fig. 2 General SA framework
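The acceptance rule described above is the standard Metropolis criterion; a minimal sketch of the decision step is given below. This is a generic illustration, not the authors' ANSI C routine.

```python
import math
import random

def metropolis_accept(delta_e, temperature):
    """Accept or reject a candidate state given its energy change delta_e."""
    if delta_e < 0:
        return True                     # downhill moves are always accepted
    # Uphill moves are accepted with probability exp(-delta_e / T), which
    # lets the search escape local minima while the temperature is high.
    return random.random() < math.exp(-delta_e / temperature)
```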
5 Quantum Annealing

Quantum annealing (QA) [51] is a computational paradigm in which some modifications have been made to the existing conventional approach, simulated annealing. Here, in place of thermal fluctuations, a mechanism known as quantum fluctuation has been introduced to get faster convergence to the optimal state; state transitions occur because of this quantum fluctuation. Instead of a thermal gradient, a quantum field known as quantum fluctuation is used in QA. Energy changes occur in short time lapses, as per the Heisenberg uncertainty principle. The process is controlled by selecting the quantum field strength, which determines the radius of the neighboring states to be checked. In QA, the energy cost function and quantum
kinetic terms jointly represent the time objective function over the problem domain. Initially, this kinetic term is set very large, and it is reduced toward zero within a precise time interval. Then, the quantum state evolves until a final solution is obtained, based on the Schrödinger and Fokker–Planck wave equations, given in Eq. (1):

$$\hat{H}\,|\Psi(t)\rangle \;=\; i\hbar\,\frac{d}{dt}\,|\Psi(t)\rangle \;=\; \left[\frac{\hat{\bar{P}}^{2}}{2m} + V(\hat{\bar{r}}, t)\right]|\Psi(t)\rangle \qquad (1)$$

where $\hat{H}$ is the Hamiltonian operator, $\Psi$ is the state vector of the quantum system, $t$ is the time, $\hbar$ is the reduced Planck constant, $i$ is the imaginary unit, $m$ is the mass of the particle, $V$ is the potential representing the environment in which the particle exists, $\hat{\bar{r}}$ is a three-dimensional position vector, and $\hat{\bar{P}}$ is a three-dimensional momentum vector.
6 Proposed Algorithms

6.1 Representation of Solution

For all three algorithms, a solution is represented as a sequence of integers drawn from {1, 2, 3, ..., n}, which are the color values of the nodes of the graph; the corresponding positions are the vertex numbers of the graph. For example, a solution (coloration) for a graph having five vertices can be represented as 1 3 2 3 1. Here, vertex 1 and vertex 5 have the same color 1, vertex 2 and vertex 4 have the same color 3, whereas vertex 3 has the color 2.
6.2 Fitness Calculation

In a particular coloration, when two adjacent vertices have the same color, a conflict exists between them, and the corresponding vertices are said to be conflicted vertices. The following example illustrates the calculation of the conflict for a particular coloration: for the graph given in Fig. 3 and the coloration generated above, the number of conflicts is 1. The conflicted vertices are 2 and 4, as these two vertices are adjacent and have the same color assignment.
Fig. 3 A sample graph with five vertices
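The representation and the fitness computation can be sketched in a few lines of Python, as below. Since the edges of the Fig. 3 graph are not reproduced in this text, the edge list here is hypothetical, chosen only so that vertices 2 and 4 are adjacent, as in the running example.

```python
# Coloration from Sect. 6.1: position i holds the color of vertex i + 1.
coloring = [1, 3, 2, 3, 1]            # colors of vertices 1..5

# Hypothetical edge list for a five-vertex graph (the actual edges of
# Fig. 3 are not given here); it contains edge (2, 4) so that the single
# conflict of the running example appears.
edges = [(1, 2), (2, 3), (2, 4), (3, 4), (4, 5)]

def count_conflicts(coloring, edges):
    """Fitness of Sect. 6.2: edges whose endpoints share the same color."""
    return sum(1 for u, v in edges if coloring[u - 1] == coloring[v - 1])

print(count_conflicts(coloring, edges))   # -> 1 (vertices 2 and 4 conflict)
```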
6.3 ACO Algorithm for GCP

In the ACO algorithm, the initial solution is generated by a sequential approach. In this approach, we consider three sets, namely set1, set2, and set3, where all the graph vertices are placed in set3 initially. Then, the first vertex is moved to set1, and all the adjacent vertices of the first vertex are moved to set2. From the remaining vertices in set3, the first vertex is again moved to set1, and the adjacent vertices of this newly moved vertex are moved to set2 using set union. This process continues until set3 becomes empty. When set3 is empty, all the vertices of set1 are colored using a single color, and all the vertices in set2 are moved back to set3. The process ends when all the vertices of the graph are properly colored. In the central execution part of the algorithm, the ant movements are based on the same approach but with some restrictions and randomness. Here, the selection of vertices to be colored is random, not sequential. Each color assignment tries to reduce the number of colors by maintaining a list of used colors, so that no conflict arises in the coloration. The coloration determines the pheromone. When all ants complete their tours, the global pheromone is updated by taking the best coloration made by all the ants. When the execution is trapped in local minima, the solution is reconstructed by updating the global pheromone based on the worst coloration of the colony. The termination condition set for the algorithm is either that the maximum iteration is reached or that the desired chromatic number is obtained, whichever occurs earlier.

Algorithm 1: ACO Algorithm for GCP
Input: A graph with n vertices.
Output: A valid coloration without any conflict.

Create an initial valid coloration by the sequential approach;
Set chrnum = number of distinct colors used in the initial coloration;
while (termination condition not met) do
    Distribute m_ants to m (< n) vertices randomly;
    Construct local pheromones of m_ants based on the initial coloration;
    for ant = 1 to m_ants do
        the ant colors all the vertices with a valid coloration using the least colors;
        the ant updates its local pheromone table based on the new coloration;
    end
    bestcol = best coloration among the colorations made by all the ants;
    ncolor = number of distinct colors in bestcol;
    if (ncolor < chrnum) then
        chrnum = ncolor;
        update the global pheromone using bestcol;
    end
    if chrnum not improved in 10 successive iterations then
        reconstruct the solution;
    end
end
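The sequential initialization of Algorithm 1 can be rendered in Python as below. This is a reconstruction of the three-set procedure described above, not the authors' ANSI C code; the adjacency structure (a dict of neighbor sets) is an assumed representation.

```python
def sequential_coloring(vertices, adj):
    """Three-set sequential coloring sketched from Sect. 6.3.

    vertices: iterable of vertex ids.
    adj: dict mapping each vertex to the set of its neighbors.
    """
    color = {}
    current_color = 0
    set3 = list(vertices)                 # vertices still to be colored
    while set3:
        current_color += 1
        set1, set2 = [], set()            # independent set / deferred vertices
        while set3:
            v = set3.pop(0)               # first remaining vertex of set3
            if v in set2:                 # neighbors of set1 are deferred
                continue
            set1.append(v)                # v joins the independent set
            set2 |= adj[v]                # defer all of its neighbors
        for v in set1:                    # one color for the whole set
            color[v] = current_color
        set3 = [u for u in set2 if u not in color]   # set2 moves back to set3
    return color

# Example: a path 1-2-3-4-5 is colored with two colors.
# sequential_coloring([1, 2, 3, 4, 5],
#                     {1: {2}, 2: {1, 3}, 3: {2, 4}, 4: {3, 5}, 5: {4}})
```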
6.4 SA Algorithm for GCP

Here, the initial solution is constructed by assigning colors to the vertices randomly using k colors, where k is the best-known chromatic number obtained so far. The potential energy of a state (coloration) is its cost function, calculated as the number of conflicts in the coloration. The objective is to make the potential energy zero. A new solution is generated by picking a conflicting vertex randomly and assigning it a new color, different from the existing one, from the k available colors. If the solution is invalid, we try to make it valid by repeating the process (tunable multiplier * average neighborhood size) times before the temperature decreases. The average neighborhood size is calculated as |V| * k, where |V| is the number of vertices of the graph. The algorithm terminates when no conflict exists in a coloring configuration, or the maximum iterations are exhausted, or the algorithm has run for a specific time, whichever occurs first.
Algorithm 2: SA Algorithm for GCP
Input: A graph with n vertices and the best-known chromatic number, k.
Output: A valid coloration without any conflict.

Create an initial valid coloration x, using k colors;
H_pot(x) = C(x), where C(x) is the number of conflicts in x;
Set initial temperature T0 and MaxStep; T = T0;
while (termination condition not met) do
    while (iteration != TunableMultiplier * AverageNeighborhoodSize) do
        Randomly select a conflicting vertex v;
        Choose a new color randomly from [1, k] for v to create a new coloration x';
        H_pot(x') = C(x');
        deltaHPot = C(x') - C(x);
        if (deltaHPot < 0) then
            x = x';
        else
            with probability exp(-deltaHPot / T), set x = x';
        end
    end
    T = T - (T0 / MaxStep);
end
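A compact Python rendering of Algorithm 2 is sketched below, reusing `count_conflicts` from the earlier snippet. The parameter defaults follow the ranges reported in Sect. 7 but are otherwise assumptions; the original implementation is in ANSI C.

```python
import math
import random

def sa_coloring(n, edges, k, T0=0.15, max_step=1000, multiplier=4):
    """Sketch of Algorithm 2: SA search for a conflict-free k-coloring.

    n: number of vertices; edges: list of 1-based (u, v) pairs;
    k: number of colors allowed (the best-known chromatic number).
    """
    x = [random.randint(1, k) for _ in range(n)]      # random initial coloration
    T = T0
    for _ in range(max_step):
        for _ in range(multiplier * n * k):           # |V| * k neighborhood size
            conflicted = sorted({w for u, v in edges
                                 if x[u - 1] == x[v - 1] for w in (u, v)})
            if not conflicted:
                return x                              # valid coloration found
            v = random.choice(conflicted)             # pick a conflicting vertex
            x_new = list(x)
            x_new[v - 1] = random.choice(
                [c for c in range(1, k + 1) if c != x[v - 1]])
            delta = count_conflicts(x_new, edges) - count_conflicts(x, edges)
            # Metropolis step: always accept improvements, sometimes accept
            # worsening moves while the temperature is still high.
            if delta < 0 or random.random() < math.exp(-delta / max(T, 1e-12)):
                x = x_new
        T -= T0 / max_step                            # linear cooling schedule
    return None                                       # no valid coloration found
```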
6.5 QA Algorithm for GCP

The SA algorithm has a chance of getting trapped in local minima. Hence, to overcome this situation, a QA algorithm has been proposed in which the minimization of the cost function involves not only the potential energy but the sum of the potential energy and the kinetic
energy. Instead of taking a single coloring configuration, in QA a coloring configuration consists of a set of P replicas of the coloration, where each replica consists of spin variables {S_k} corresponding to a coloring configuration. Each spin variable can take two values, either 1 or −1, depending on the color assignments of the vertices. Thus, in QA, the objective function to be minimized is

$$H = H_p + H_k \qquad (2)$$

where $H$ is the quantum Hamiltonian and $H_p$ and $H_k$ are defined as follows:

$$H_p = \frac{1}{P}\sum_{k=1}^{P} H_{pot}\big(\{S_{(u,v),k}\}\big) \qquad (3)$$

$$H_k = -\mathit{coupling\_factor} \times \sum_{(u,v)}\left(\sum_{k=1}^{P-1} S_{(u,v),k}\, S_{(u,v),k+1} + S_{(u,v),1}\, S_{(u,v),P}\right) \qquad (4)$$

where

$$\mathit{coupling\_factor} = -\frac{T}{2}\,\ln \tanh\!\left(\frac{T_f}{PT}\right) \qquad (5)$$

$$S_{(u,v),k} = \begin{cases} -1, & \text{if } col(u) \neq col(v) \\ \phantom{-}1, & \text{if } col(u) = col(v) \end{cases} \qquad (6)$$
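To make Eqs. (4)–(6) concrete, the snippet below evaluates the spin variables, the coupling factor, and the kinetic term for a ring of P replicas. It is an illustrative reconstruction: the summation limits in Eq. (4) (k = 1 to P − 1, plus the term closing the ring) are taken from the standard path-integral QA formulation and are an assumption of this sketch.

```python
import math

def spin(coloring, u, v):
    """Eq. (6): 1 if endpoints u, v share a color, -1 otherwise (1-based)."""
    return 1 if coloring[u - 1] == coloring[v - 1] else -1

def coupling_factor(T, Tf, P):
    """Eq. (5): -(T / 2) * ln tanh(Tf / (P * T))."""
    return -(T / 2.0) * math.log(math.tanh(Tf / (P * T)))

def kinetic_energy(replicas, edges, T, Tf):
    """Eq. (4): coupling between neighboring replicas arranged on a ring."""
    P = len(replicas)
    total = 0.0
    for u, v in edges:
        s = [spin(r, u, v) for r in replicas]
        # Neighboring replicas k and k+1, plus the term that closes the ring.
        total += sum(s[i] * s[i + 1] for i in range(P - 1)) + s[0] * s[-1]
    return -coupling_factor(T, Tf, P) * total
```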
Algorithm 3: QA Algorithm for GCP

Input: A graph with n vertices, the best-known chromatic number k, and the number of replicas P.
Output: A valid coloration without any conflict.

Create an initial set of P coloring configurations {x_z} = x, randomly using k colors;
Set quantum temperature Tq = (P * T), tunneling field strength Tf0, and MaxStep; set Tf = Tf0;
while (termination condition not met) do
    Randomly shuffle the order of the replicas;
    Calculate the kinetic energy of x as HKin(x);
    for each y in {1, 2, 3, ..., P} do
        Select the replica z in position y;
        HPot(x_z) = C(x_z), where C(x_z) is the number of conflicts in x_z;
        while (iteration != TunableMultiplier * AverageNeighborhoodSize) do
            Randomly select a conflicting vertex v in x_z;
            Choose a new color randomly from [1, k] for v to create a new coloration x'_z and hence x';
            HPot(x'_z) = C(x'_z);
            deltaHPot = C(x'_z) - C(x_z);
            deltaPE = deltaHPot / P;
            Calculate the kinetic energy of x' as HKin(x');
            KinChange = HKin(x') - HKin(x);
            deltaKE = KinChange * CouplingFactor;
            deltaH = deltaPE - deltaKE;
            if (deltaHPot < 0 OR deltaH < 0) then
                x = x';
            else
                with probability exp(-deltaH / T), set x = x';
            end
        end
    end
    Tf = Tf - (Tf0 / MaxStep);
end

7 Results and Discussions

This section presents the experimental results obtained by the above three algorithms, along with the parameter settings and the processes used in the algorithms. After some trial and error, we set the number of ants (m_ants) to 10 for the ACO algorithm, and the maximum iteration is set to 100,000. For the SA algorithm, the initial temperature (T0) is chosen randomly in the range [0.09, 0.20], and MaxStep is set to 100,000. In contrast, for the QA algorithm, the initial quantum temperature (Tq) is chosen randomly in (0, 1], the tunneling field strength is selected randomly in [1.20*Tq, 3*Tq], and MaxStep is set to 10,000. The tunable multiplier is set to 4 for the SA and QA approaches, and the number of replicas (P) is set to 10 because, for larger values of P, more memory and computation are needed. All the algorithms have been implemented in ANSI C on the Linux platform and run on a 2.4 GHz machine with 8 GB RAM. All the algorithms have been tested on some of the DIMACS [52] benchmark graphs ten times, and the average of the ten execution times is reported for each algorithm on each instance, respectively. Since GCP is an NP-Complete problem, it is impossible to obtain an optimum solution by any single algorithm for all the graph
instances in polynomial time. Hence, the main objective of this work is to obtain optimal results in terms of the chromatic number for most of the benchmark instances of GCP in a reasonable time. Also, quantum algorithms are reversible in nature and give their best performance when executed on quantum computers, which work on the reversible principle. But due to the absence of quantum computers, QA has
been implemented and tested in a classical irreversible environment, and thus it takes a reasonably long time for some large graph instances. Table 1 shows that out of 45 instances, QA gives optimal results for 43 instances, whereas ACO and SA give optimal results for 38 and 41 instances, respectively. Also, there are two instances where all the algorithms fail to provide optimal results, but QA gives better results than ACO and SA. Table 2 presents the comparative performance of the above QA algorithm against two other existing quantum-based evolutionary algorithms, namely the quantum-inspired evolutionary algorithm (QIEA) [45] and the Cuckoo quantum evolutionary algorithm (CQEA) [46]. As efficiency measures of the three proposed algorithms, we calculated F-score values, given in Table 3 and Fig. 4, and RMSD values, shown in Table 4 and Fig. 5.

Fig. 4 F-score of ACO, SA, and QA for 45 graph instances

Fig. 5 RMSD of ACO, SA, and QA for 45 graph instances
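For reference, the two efficiency measures can be computed as sketched below. The counting convention behind the reported precision and recall values is not spelled out in the text, so the example only verifies the F-score arithmetic from the Table 3 numbers; the functions are standard definitions, not the authors' code.

```python
import math

def f_score(precision, recall):
    """Harmonic mean of precision and recall (the F1 measure)."""
    return 2 * precision * recall / (precision + recall)

def rmsd(obtained, best_known):
    """Root-mean-square deviation between obtained and best-known colors."""
    return math.sqrt(sum((o - b) ** 2 for o, b in zip(obtained, best_known))
                     / len(obtained))

# Verifying Table 3 for QA: precision 0.96, recall 1.0 -> F-score 0.98.
print(round(f_score(0.96, 1.0), 2))   # -> 0.98
```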
8 Conclusion and Future Scope

This paper elucidates three different heuristic and meta-heuristic approaches to solving GCP, an NP-Complete problem. The experimental results show that QA matches or outperforms both ACO and SA across all 45 benchmark instances, although it fails to reach the best-known results for two graph instances. In the future, to overcome the limitations imposed by the NP-Complete nature of GCP, we will mainly focus on designing one or more heuristic or meta-heuristic evolutionary approaches, either by using some existing EAs and MHAs or through a new and robust approach, to produce optimal outcomes for most of the benchmark instances in a reasonable timeframe.
Table 1 Comparative result of ACO, SA, and QA algorithms for GCP

| Instance | V | E | K | Best known | ACO | ACO time (s) | SA | SA time (s) | QA | QA time (s) |
|---|---|---|---|---|---|---|---|---|---|---|
| anna.col | 138 | 493 | 11 | 11 | 11 | 0.01 | 11 | 1.12 | 11 | 4.97 |
| david.col | 87 | 406 | 11 | 11 | 11 | 0.01 | 11 | 0.37 | 11 | 1.44 |
| homer.col | 561 | 1629 | 13 | 13 | 13 | 0.01 | 13 | 31.96 | 13 | 352.65 |
| huck.col | 74 | 301 | 11 | 11 | 11 | 0.01 | 11 | 0.02 | 11 | 0.19 |
| jean.col | 80 | 254 | 10 | 10 | 10 | 0.01 | 10 | 0.03 | 10 | 0.28 |
| games120.col | 120 | 638 | 9 | 9 | 9 | 0.01 | 9 | 0.14 | 9 | 2.33 |
| miles250.col | 128 | 387 | 8 | 8 | 8 | 0.01 | 8 | 0.89 | 8 | 3.57 |
| miles500.col | 128 | 1170 | 20 | 20 | 20 | 0.07 | 20 | 12.06 | 20 | 11.63 |
| miles750.col | 128 | 2113 | 31 | 31 | 31 | 10.41 | 31 | 149.31 | 31 | 122.65 |
| queen5_5.col | 25 | 160 | 5 | 5 | 5 | 0.01 | 5 | 0.08 | 5 | 0.17 |
| queen6_6.col | 36 | 290 | 7 | 7 | 7 | 0.54 | 7 | 1.21 | 7 | 5.63 |
| queen7_7.col | 49 | 476 | 7 | 7 | 7 | 50.69 | 7 | 3.73 | 7 | 7.07 |
| queen8_8.col | 64 | 728 | 9 | 9 | 10 | 102.09 | 9 | 14.43 | 9 | 27.32 |
| queen8_12.col | 96 | 1368 | 12 | 12 | 13 | 236.73 | 12 | 53.44 | 12 | 134.65 |
| queen9_9.col | 81 | 2112 | 10 | 10 | 11 | 164.51 | 10 | 46.99 | 10 | 241.32 |
| queen10_10.col | 100 | 2940 | ? | 11 | 13 | 256.78 | 11 | 203.09 | 11 | 132.38 |
| myciel3.col | 11 | 20 | 4 | 4 | 4 | 0.01 | 4 | 0.01 | 4 | 0.01 |
| myciel4.col | 23 | 71 | 5 | 5 | 5 | 0.01 | 5 | 0.01 | 5 | 0.01 |
| myciel5.col | 47 | 236 | 6 | 6 | 6 | 0.01 | 6 | 0.16 | 6 | 0.21 |
| myciel6.col | 95 | 755 | 7 | 7 | 7 | 0.01 | 7 | 4.65 | 7 | 1.8 |
| myciel7.col | 191 | 2360 | 8 | 8 | 8 | 0.01 | 8 | 65.28 | 8 | 19.7 |
| mug88_1.col | 88 | 146 | 4 | 4 | 4 | 0.01 | 4 | 0.01 | 4 | 0.04 |
| mug88_25.col | 88 | 146 | 4 | 4 | 4 | 0.01 | 4 | 0.01 | 4 | 0.03 |
| mug100_1.col | 100 | 166 | 4 | 4 | 4 | 0.01 | 4 | 0.01 | 4 | 0.04 |
| mug100_25.col | 100 | 166 | 4 | 4 | 4 | 0.01 | 4 | 0.01 | 4 | 0.05 |
| 1-Insertions_4.col | 67 | 232 | 4 | 5 | 5 | 0.01 | 5 | 0.09 | 5 | 0.4 |
| 1-Insertions_5.col | 202 | 1227 | ? | 6 | 6 | 0.01 | 6 | 22.47 | 6 | 15.43 |
| 1-Insertions_6.col | 607 | 6337 | ? | 7 | 7 | 0.01 | 7 | 62.59 | 7 | 591.31 |
| 2-Insertions_3.col | 37 | 72 | 4 | 4 | 4 | 0.01 | 4 | 0.01 | 4 | 0.01 |
| 2-Insertions_4.col | 149 | 541 | 4 | 5 | 5 | 0.01 | 5 | 1.59 | 5 | 3.44 |
| 2-Insertions_5.col | 597 | 3936 | ? | 6 | 6 | 0.01 | 6 | 44.68 | 6 | 241.32 |
| 3-Insertions_3.col | 56 | 110 | 4 | 4 | 4 | 0.01 | 4 | 0.01 | 4 | 0.01 |
| 3-Insertions_4.col | 281 | 1046 | ? | 5 | 5 | 0.01 | 5 | 20.67 | 5 | 27.34 |
| 3-Insertions_5.col | 1406 | 9695 | ? | 6 | 6 | 0.01 | 6 | 605.99 | 6 | 6699.32 |
| 4-Insertions_3.col | 79 | 156 | 3 | 4 | 4 | 0.01 | 4 | 0.01 | 4 | 0.01 |
| 4-Insertions_4.col | 475 | 1795 | ? | 5 | 5 | 0.01 | 5 | 14.54 | 5 | 137.38 |
| 1-FullIns_3.col | 30 | 100 | ? | 4 | 4 | 0.01 | 4 | 0.02 | 4 | 0.01 |
| 1-FullIns_4.col | 93 | 593 | ? | 5 | 5 | 0.01 | 5 | 3.64 | 5 | 1.86 |
| 2-FullIns_3.col | 52 | 201 | ? | 5 | 5 | 0.01 | 5 | 0.14 | 5 | 0.01 |
| DSJC125_1.col | 125 | 736 | ? | 5 | 6 | 223.92 | 5 | 22.32 | 5 | 68.53 |
| DSJC125_5.col | 125 | 3891 | ? | 17 | 21 | 758.36 | 19 | 237.36 | 18 | 813.39 |
| le450_5d.col | 450 | 9757 | 5 | 5 | 8 | 432.21 | 6 | 544.36 | 5 | 764.84 |
| le450_15d.col | 450 | 16750 | 15 | 15 | 20 | 589.65 | 17 | 694.81 | 15 | 1042.48 |
| mulsol.i.1.col | 197 | 3925 | 49 | 49 | 49 | 0.01 | 49 | 201.06 | 49 | 83.12 |
| flat300_20_0.col | 300 | 21375 | 20 | 20 | 38 | 795.22 | 27 | 938.74 | 24 | 1685.69 |
Table 2 Result comparison of QA with two other existing quantum-based evolutionary algorithms for GCP

| Instance | V | E | K | Best known | QIEA [45] | CQEA [46] | QA |
|---|---|---|---|---|---|---|---|
| anna.col | 138 | 493 | 11 | 11 | 11 | 11 | 11 |
| david.col | 87 | 406 | 11 | 11 | 11 | 11 | 11 |
| homer.col | 561 | 1629 | 13 | 13 | 13 | – | 13 |
| huck.col | 74 | 301 | 11 | 11 | 11 | 11 | 11 |
| jean.col | 80 | 254 | 10 | 10 | 10 | 10 | 10 |
| games120.col | 120 | 638 | 9 | 9 | 9 | 9 | 9 |
| miles250.col | 128 | 387 | 8 | 8 | 8 | 8 | 8 |
| miles500.col | 128 | 1170 | 20 | 20 | 10 | 20 | 20 |
| miles750.col | 128 | 2113 | 31 | 31 | 31 | – | 31 |
| queen5_5.col | 25 | 160 | 5 | 5 | 5 | 5 | 5 |
| queen6_6.col | 36 | 290 | 7 | 7 | 7 | 7 | 7 |
| queen7_7.col | 49 | 476 | 7 | 7 | 7 | – | 7 |
| queen8_8.col | 64 | 728 | 9 | 9 | 9 | – | 9 |
| queen8_12.col | 96 | 1368 | 12 | 12 | 12 | – | 12 |
| queen9_9.col | 81 | 2112 | 10 | 10 | – | – | 10 |
| queen10_10.col | 100 | 2940 | ? | 11 | 12 | – | 11 |
| myciel3.col | 11 | 20 | 4 | 4 | 4 | 4 | 4 |
| myciel4.col | 23 | 71 | 5 | 5 | 5 | 5 | 5 |

Table 3 Precision, recall, and F-score of ACO, SA, and QA

| Algorithm | Precision | Recall | F-score |
|---|---|---|---|
| ACO | 0.84 | 1 | 0.92 |
| SA | 0.91 | 1 | 0.95 |
| QA | 0.96 | 1 | 0.98 |

Table 4 RMSD of ACO, SA, and QA

| Algorithm | RMSD |
|---|---|
| ACO | 2.91 |
| SA | 1.14 |
| QA | 0.61 |
References

1. F.T. Leighton, A graph coloring algorithm for large scheduling problems. J. Res. Natl. Bur. Stand. 84(6), 489–506 (1979). https://doi.org/10.6028/jres.084.024
2. F.C. Chow, J.L. Hennessy, Register allocation by priority-based coloring, in Proceedings of the 1984 SIGPLAN Symposium on Compiler Construction (ACM, 1984), pp. 222–232. https://doi.org/10.1145/502949.502896
3. G.D. Micheli, Synthesis and Optimization of Digital Circuits (McGraw-Hill, 1994). https://doi.org/10.5860/choice.32-0950
4. S.S. Sarma, R. Mondal, A. Seth, Some sequential graph coloring algorithms for restricted channel routing. Int. J. Electron. 77(1), 81–93 (1985)
5. M.R. Garey, D.S. Johnson, H.C. So, An application of graph coloring to printed circuit testing. IEEE Trans. Circuits Syst. 23(10), 591–599 (1976). https://doi.org/10.1109/sfcs.1975.3
6. A. Gamst, Some lower bounds for a class of frequency assignment problems. IEEE Trans. Veh. Technol. 35(1), 8–14 (1986). https://doi.org/10.1109/t-vt.1986.24063
7. K.E. Stecke, Design, planning, scheduling and control problems of flexible manufacturing. Ann. Oper. Res. 3(1), 1–12 (1985)
8. R.A. Haraty, M. Assi, B. Halawi, Genetic algorithm analysis using the graph coloring method for solving the university timetable problem. Procedia Comput. Sci. 126, 899–906 (2018). https://doi.org/10.1016/j.procs.2018.08.024
9. M.R. Garey, D.S. Johnson, Computers and Intractability: A Guide to the Theory of NP-Completeness (W.H. Freeman and Company, 1979)
10. S. Baase, A.V. Gelder, Computer Algorithms: Introduction to Design and Analysis (Pearson, 1999)
11. C. Kinjal, T. Ankit, Travelling salesman problem: an empirical comparison between ACO, PSO, ABC, FA and GA, in Emerging Research in Computing, Information, Communications and Applications, vol. 906 (Springer, Singapore, 2019), pp. 397–405
12. G. Arushi, S. Smriti, Comparative analysis of ant colony and particle swarm optimization algorithms for distance optimization, in International Conference on Smart Sustainable Intelligent Computing and Applications, vol. 173 (2020), pp. 245–253
13. S. As Anna Maria, S.B. Maya, K. Gilang, Comparison study of metaheuristics: empirical application of delivery problems. Int. J. Eng. Bus. Manage. 9, 1–12 (2017)
14. A.N. Shukla, M.L. Garg, An approach to solve graph coloring problem using adjacency matrix. Biosci. Biotechnol. Res. Commun. 12(2), 472–477 (2019). https://doi.org/10.21786/bbrc/12.2/33
15. A.N. Shukla, V. Bharti, M.L. Garg, A linked list-based exact algorithm for graph coloring problem. Int. Inf. Eng. Technol. Assoc. 33(3), 189–195 (2019). https://doi.org/10.18280/ria.330304
16. A. Mehrotra, M.A. Trick, A column generation approach for graph coloring. INFORMS J. Comput. 8(4), 344–354 (1996). https://doi.org/10.1287/ijoc.8.4.344
17. I.M. Diaz, P. Zabala, A branch-and-cut algorithm for graph coloring. Discrete Appl. Math. 154(5), 826–847 (2006). https://doi.org/10.1016/j.dam.2005.05.022
18. R. Masson, On the analysis of backtrack procedures for the coloring of random graphs, in Lecture Notes in Physics, vol. 650 (Springer, Berlin, Heidelberg, 2004), pp. 235–254
19. K. Chen, H. Kanoh, A discrete firefly algorithm based on similarity for graph coloring problems, in 18th IEEE/ACIS International Conference on Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed Computing (SNPD) (2017). https://doi.org/10.1109/snpd.2017.8022702
20. L. Moalic, A. Gondran, Variations on memetic algorithms for graph coloring problems. J. Heurist. 24(1), 1–24 (2018). https://doi.org/10.1007/s10732-017-9354-9
21. Z. Zhuang, S. Fan, H. Xu, J. Zheng, A memetic algorithm using partial solutions for graph coloring problem, in IEEE Congress on Evolutionary Computation (CEC) (2016). https://doi.org/10.1109/cec.2016.7744194
22. Y. Zhou, B. Duval, J.K. Hao, Improving probability learning based local search for graph coloring. Appl. Soft Comput. 65, 542–553 (2018). https://doi.org/10.1016/j.asoc.2018.01.027
23. A. Lim, F. Wang, Meta-heuristics for robust graph coloring problem, in 16th IEEE International Conference on Tools with Artificial Intelligence (2004). https://doi.org/10.1109/ictai.2004.83
24. B. Cases, C. Hernandez, M. Graña, A. D'anjou, On the ability of swarms to compute the 3-coloring of graphs, in Proceedings of the 11th International Conference on the Simulation and Synthesis of Living Systems (MIT Press, Cambridge, 2008), pp. 102–109
25. Z. Yongquan, Z. Hongqing, L. Qifang, W. Jinzhao, An improved cuckoo search algorithm for solving planar graph coloring problem. Appl. Math. Inf. Sci. 7(2), 785 (2013). https://doi.org/10.12785/amis/070249
26. C. Aranha, K. Toda, H. Kanoh, Solving the graph coloring problem using cuckoo search, in International Conference on Swarm Intelligence (Springer, Berlin, 2017), pp. 552–560. https://doi.org/10.1007/978-3-319-61824-1_60
27. F.F. Ali, Z. Nakao, R.B. Tan, C.Y. Wei, An evolutionary approach for graph coloring. IEEE Int. Conf. Syst. Man Cybern. 5, 527–532 (1999). https://doi.org/10.1109/icsmc.1999.815607
28. K. Tagawa, K. Kanesige, K. Inoue, H. Haneda, Distance based hybrid genetic algorithm: an application for the graph coloring problem, in Proceedings of the 1999 Congress on Evolutionary Computation (CEC99), vol. 3 (IEEE, 1999), pp. 2325–2332. https://doi.org/10.1109/cec.1999.785564
29. A. Dey, A. Agarwal, P. Dixit, T. Pal, Genetic algorithm for robust total coloring of a fuzzy graph, in IEEE Congress on Evolutionary Computation (CEC) (IEEE, 2019), pp. 1806–1813. https://doi.org/10.1109/cec.2019.8790137
30. D. Costa, A. Hertz, O. Dubuis, Ants can color graphs. J. Oper. Res. Soc. 48(3), 295–305 (1997). https://doi.org/10.1057/palgrave.jors.2600357
31. A. Hertz, D. Werra, Using tabu search techniques for graph coloring. Computing 39(4), 345–351 (1987). https://doi.org/10.1007/bf02239976
32. E.D. Dahl, Neural networks algorithms for an NP-complete problem: map and graph coloring, in Proceedings of the First International Conference on Neural Networks, vol. 3 (1987), pp. 113–120
33. A. Jagota, An adaptive, multiple restarts neural network algorithm for graph coloring. Eur. J. Oper. Res. 93(2), 257–270 (1996). https://doi.org/10.1016/0377-2217(96)00043-4
34. M.A. Mosa, A. Hamouda, M. Marei, Graph coloring and ACO based summarization for social networks. Expert Syst. Appl. 74, 115–126 (2017). https://doi.org/10.1016/j.eswa.2017.01.010
35. L. Henrique, M. Prates, P. Avelar, L. Lamb, Graph colouring meets deep learning: effective graph neural network models for combinatorial problems, in 31st International Conference on Tools with Artificial Intelligence (ICTAI) (IEEE, 2019), pp. 879–885. https://doi.org/10.1109/ictai.2019.00125
36. S. Naderi, M. Jabbarian, V.S. Naeini, A novel presentation of graph coloring problems based on parallel genetic algorithm. Int. J. Soft Comput. Eng. (IJSCE) 3(3), 65–70 (2013)
37. G.R. Filho, L.A.N. Lorena, Constructive genetic algorithm and column generation: an application to graph coloring, in The Fifth Conference of the Association of Asian-Pacific Operations Research Societies (2000)
38. B. Ray, A.J. Pal, D. Bhattacharyya, T.H. Kim, An efficient GA with multipoint guided mutation for graph coloring problems. Int. J. Sig. Process. Image Process. Pattern Recognit. 3(2), 51–58 (2010)
39. R. Marappan, G. Sethumadhavan, Solution to graph coloring problem using divide and conquer based genetic method, in International Conference on Information Communication and Embedded Systems (ICES) (IEEE, 2016), pp. 1–5. https://doi.org/10.1109/icices.2016.7518911
40. H. Lixia, H. Zhanli, A novel bi-objective genetic algorithm for the graph coloring problem, in Second International Conference on Computer Modeling and Simulation, vol. 4 (IEEE, 2010), pp. 3–6. https://doi.org/10.1109/iccms.2010.157
41. K. Lakshmi, G. Srinivas, V.R. Bhuvana, A study on hybrid genetic algorithms in graph coloring problem. Res. J. Sci. Technol. 9(3), 392–394 (2017)
42. A.J. Pal, B. Ray, N. Zakaria, S.S. Sarma, Comparative performance of modified simulated annealing with simple simulated annealing for graph coloring problem, in International Conference on Computational Science (ICCS 2012)
43. O. Titiloye, A. Crispin, Quantum annealing of the graph coloring problem. Discret. Optim. 8(2), 376–384 (2011). https://doi.org/10.1016/j.disopt.2010.12.001
44. O. Titiloye, A. Crispin, Parameter tuning patterns for random graph coloring with quantum annealing. PLoS ONE 7(11) (2012). https://doi.org/10.1371/journal.pone.0050060
45. D.P. Prosun, H.A. Khan Mozammel, Quantum-inspired evolutionary algorithm to solve graph coloring problem. Int. J. Adv. Comput. Sci. Appl. 4(4), 66–70 (2014)
46. Y. Xu, C. Yu, A cuckoo quantum evolutionary algorithm for the graph coloring problem. arXiv preprint arXiv:2108.08691 (2021)
47. M.A. Basmassi, L. Benameur, A. Chentoufi, A novel greedy genetic algorithm to solve combinatorial optimization problem, in The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, vol. XLIV-4/W3-2020 (2020)
48. M. Dorigo, V. Maniezzo, A. Colorni, Ant system: optimization by a colony of cooperating agents. IEEE Trans. Syst. Man Cybern. Part B (Cybern.) 26(1), 29–41 (1996). https://doi.org/10.1109/3477.484436
49. M. Dorigo, G. Di Caro, L.M. Gambardella, Ant algorithms for discrete optimization. Artif. Life 5(2), 137–172 (1999). https://doi.org/10.1162/106454699568728
50. A. Chams, A. Hertz, D. Werra, Some experiments with simulated annealing for coloring graphs. Eur. J. Oper. Res. 32(2), 260–266 (1987)
51. T. Kadowaki, H. Nishimori, Quantum annealing in the transverse Ising model. Phys. Rev. E 58(5), 5355–5363 (1998). https://doi.org/10.1103/physreve.58.5355
52. DIMACS Benchmark Graphs, https://mat.gsia.cmu.edu/COLOR04. Last accessed 6 Nov 2021
Computer-Assisted Diagnosis and Neuroimaging of Baby Infants

Vinodkumar R. Patil and Tushar H. Jaware
Abstract To study the neuro-development of the infant brain, MRI segmentation plays an important role. Brain tissue characterization and volumetric analysis are utilized for predicting infant brain development. This paper focuses on neuroimaging of baby infants for accurate tissue segmentation and for quantitative and qualitative analysis of brain MR images of newborn infants. Quantitative analysis aids in determining the extent of brain injury or abnormalities. As a result, segmentation plays an important role in neuroimaging. The paper starts by discussing challenges, related work, existing approaches, motivation, and the problem definition. Later parts focus on the methodology being adopted, the contribution, and the conclusion.

Keywords Infants · Segmentation · Tissue · Brain · Neonates
1 Introduction

The brain is the nervous system's main processing mechanism. It works in cooperation with the cerebrum, cerebellum, and brainstem and is sheltered within the skull. Neuroimaging, or brain imaging, involves the usage of numerous methods to analyze the structure and function of the nervous system. Neuroimaging is rapidly becoming the preferred modality for the evaluation of brain disorders in human beings. The study of the human brain using neuroimaging has entered a new era. Segmentation of MR images is the best tool to analyze them. Image segmentation is a procedure to identify regions of interest in images. Brain tissue segmentation from MRI is of great importance for research and clinical study of numerous neurological pathologies. The exact segmentation of MR images into dissimilar tissue classes, gray matter, white matter, and cerebrospinal fluid, is a significant task.
V. R. Patil (B) · T. H. Jaware, R. C. Patel Institute of Technology, Shirpur, M.S., India

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022. S. Bhattacharyya et al. (eds.), Intelligence Enabled Research, Studies in Computational Intelligence 1029. https://doi.org/10.1007/978-981-19-0489-9_2
Analysis of the brain in the early period of an infant's life is an essential task. Neonatal MRI is mostly applied clinically to assess acquired lesions. Magnetic resonance imaging supports the detection, visualization, and characterization of neurological disorders. MRI can provide detailed information about the structure and different regions of the brain [1]. Segmentation of neonatal brain MR images provides the initial step toward quantitative analysis and volumetric studies. Quantitative analysis assists the estimation of brain damage or abnormality. Volumetric studies for neonates are helpful in brain diagnosis. Consequently, segmentation plays a vital role in neuroimaging [2]. Reasonable segmentation of infant brain MRI is far more challenging than adult brain MRI tissue segmentation. The specific challenges are:

• Low contrast-to-noise ratio (CNR) of white and gray matter in newborn MRI.
• Typically, infant brain MRI shows motion.
• In neonatal brain MRI, each tissue type has distinct levels of intensity, homogeneity, and variability.
• The intensity characteristics of various tissues are highly overlapping; therefore, the decision boundaries for intensity-based categorization are usually unclear and difficult.
2 Literature Review

We focus here on a literature review of infant brain MRI segmentation and classification approaches. Daliri et al. [3] used a probabilistic atlas for skull segmentation. They suggested a Bayesian classifier-based adaptive method for weighing local characteristics of the MR image against those of the atlas. Peporte et al. [4] proposed an atlas-free skull-stripping methodology for infants and developed a new automated skull-stripping process: image artifacts are eliminated, binary masks are created using morphological operators, and segmentation is achieved using multiple thresholds. Serag et al. [5] introduced a new approach for extracting the brain from multimodal newborn brain MR data. This approach is based on a sparsity-based atlas selection strategy, which needs only a small number of evenly distributed atlases in a lower-dimensional data space. Prastawa et al. [6] used a registered probabilistic brain atlas. To estimate the initial intensity distributions, they utilized robust graph clustering and parameter estimation. Scattering estimates and spatial priors are used for bias correction, later combined into an expectation-maximization scheme for tissue classification.
Tissue segmentation using the EM algorithm was extended by Kelly et al. [1]. They performed tissue segmentation using multiple age-specific atlases in conjunction with a Markov random field. Kuklisova et al. [6] constructed probabilistic atlases using affine alignment, which provides prior tissue probability maps for every selected phase of the infant brain; the atlas is formed from the segmentations of 142 neonatal subjects at dissimilar ages using kernel-based regression. Alijabar et al. [7] proposed a multi-class atlas for tissue segmentation. It can be created using images that are comparable to the target image, allowing for more exact segmentation than population-based atlases. Later, Shi et al. [8] developed a multi-region and multi-reference system for newborn brain segmentation. They reported lower standard deviations and greater tissue overlap rates. Jaware et al. [9] introduced an atlas-free segmentation method for neonates and premature infants based on a local similarity factor-based fusion adaptive neural tree method. They classified the brain tissues at the global level as well as at the tissue level; the decision tree provides accurate assignment of each neuron to its respective tissue class. In another approach [10], they presented a SOM-DCNN with sparse autoencoder-based atlas-free newborn brain image segmentation and classification. The hybrid combination of a deep convolutional neural network and self-organizing maps results in improved segmentation efficiency. For neonatal brain MR images, a new atlas-free newborn brain tissue classification method based on a multi-kernel support vector machine and the Levenberg–Marquardt classification algorithm was presented by Jaware et al. [11]. This multi-stage classification approach significantly overcomes the problem of misclassification and mislabeling of brain tissues.
3 Motivation
At present, neuro-radiologists utilize computers only to enhance the visualization of medical images; brain MR image tissue segmentation and clinical evaluation are done manually. For better diagnosis and clinical interpretation, as well as to handle large amounts of image data, an automated method is desirable. Automating the segmentation technique should result in faster and more consistent outcomes. This has motivated us to develop an automatic brain tissue segmentation and classification system capable of classifying the maximum number of brain tissue classes with higher accuracy. Hence, the proposed research work will focus on the development of a new computer-assisted automatic method for brain MRI analysis of baby infants.
Fig. 1 Automated infant brain soft tissue detection system (Preprocessing → Feature Extraction → Segmentation → Classification)
4 Problem Statement
Several atlas-based and atlas-free methods for infant brain MRI segmentation have been proposed and presented by researchers. The existing approaches did not achieve the expected level of segmentation accuracy, particularly for myelinated and non-myelinated white matter. The merits and demerits of the existing techniques are evident from the literature survey. Hence, to optimize the performance measures of automated systems, appropriate approaches must be developed.
5 Methodology
The overall flow of the proposed system consists of various steps, as illustrated in Fig. 1. Firstly, preprocessing is required to remove noise and artifacts and thereby enhance the image quality without losing detail. Feature extraction is an important step in MRI brain tissue segmentation because segmentation accuracy depends on the feature extraction method; the measurement vectors used in image segmentation are obtained by feature extraction and selection. A set of feature vectors is given as input to the classifier to segment the image into various classes. Segmentation separates pixels into regions and thus specifies the boundaries of the gray matter, white matter, and cerebrospinal fluid tissues, as well as their spatial distribution, all of which are examined to assist in the diagnosis.
6 Proposed Research Work
Computer-assisted diagnosis systems depend heavily on MR image recognition algorithms. The accurate detection of anatomical structure forms the backbone of many automated systems and is used to evaluate and visualize various regions of the brain. The main objective of the proposed research is to investigate, analyze, and simulate several methods to identify and evaluate brain structures in infants using MRI. This work will investigate neuroimaging of baby infants using novel methods, such as fusion methods, for performance improvement.
It is proposed to do the following:
1. To examine the possibility of modifying or recommending new techniques to enhance the quality of MR images.
2. To extract features from MR images and improve them for accurate detection of brain structures.
3. To see whether the performance of segmentation algorithms can be improved by altering or proposing new methods for brain MR images of infants.
4. To examine the feasibility of improving or proposing a new methodology for segmenting white matter into myelinated and non-myelinated classes.
5. To examine the possibility of altering or advising new techniques for classification of infant brain tissues.
6. To compare the performance measures of the proposed algorithms with those of existing algorithms.
7 Results
7.1 Preprocessing
MRI is one of the most broadly utilized neurological modalities. Image de-noising is an important pre-processing stage in computer-aided diagnosis. The primary purpose of image de-noising is to reconstruct the information in the image as exactly as possible from its noisy representation while preserving the image's vital details, including boundaries and textures. Conservative, BM3D (block matching), Crimmins, Gaussian, Laplacian addition and subtraction, low-pass, median, mean, and unsharp filtering methods have been applied to T1-w and T2-w infant brain MR images, and simulated results for infant images are shown in Table 1. The performance of the filtering methods is evaluated using various parameters, such as mean squared error (MSE), peak signal-to-noise ratio (PSNR), root mean squared error (RMSE), multi-scale structural similarity index (MSSSIM), blocking effect (PSNR), spectral angle mapper (REF_SAM), structural similarity index (SSIM), universal image quality index (UQI), and visual information fidelity (VIFP). Various statistical parameters of the filtering approaches are computed and compared for superiority evaluation. As shown in Table 2, the BM3D approach achieves the best results for all evaluation criteria, including MSSSIM, RMSE, SSIM, PSNR, VIFP, and UQI.
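Since the chapter does not publish its filtering code, the following is a minimal sketch, assuming scikit-image and SciPy and a synthetic slice in place of a real T1-w image, of how a few of the listed filters can be compared on the MSE/PSNR/SSIM metrics:

```python
# Hedged sketch: compares some of the de-noising filters discussed above on a
# single 2D slice and scores them with MSE/PSNR/SSIM. The arrays are synthetic
# stand-ins; BM3D, Crimmins, etc. would need their own implementations.
import numpy as np
from scipy.ndimage import median_filter, gaussian_filter, uniform_filter
from skimage.metrics import (mean_squared_error,
                             peak_signal_noise_ratio,
                             structural_similarity)

reference = np.random.rand(256, 256)                       # stand-in clean T1-w slice
noisy = reference + np.random.normal(0, 0.05, reference.shape)

filters = {
    "median":   lambda img: median_filter(img, size=3),
    "gaussian": lambda img: gaussian_filter(img, sigma=1.0),
    "mean":     lambda img: uniform_filter(img, size=3),
}

for name, f in filters.items():
    restored = f(noisy)
    print(name,
          "MSE=%.4f" % mean_squared_error(reference, restored),
          "PSNR=%.2f" % peak_signal_noise_ratio(reference, restored, data_range=1.0),
          "SSIM=%.3f" % structural_similarity(reference, restored, data_range=1.0))
```

In practice the reference and noisy slices would be loaded from the infant MRI dataset rather than generated randomly.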
Table 1 Simulated results of MRI image filtering: filtered T1-w outputs of the BM3D, low-pass, Crimmins, Laplacian addition, Laplacian subtraction, Gaussian, and conservative methods for Subject IDs 1-5

Table 2 Evaluation of different de-noising algorithms for T1-w newborn images for various metrics
Filter method | MSE | PSNR | RMSE | MSSSIM | RMSE | REF-SAM | SSIM | UQI
BM3D | 2643.72 | 21.73 | 0.33 | 0.84 | 35.91 | 0.29 | 0.79 | 0.89
Unsharp | 6668.82 | 11.20 | 0.68 | 0.55 | 76.55 | 0.62 | 0.49 | 0.55
Median | 7083.46 | 12.23 | 0.68 | 0.55 | 75.76 | 0.62 | 0.50 | 0.67
Mean | 6735.69 | 12.29 | 0.66 | 0.55 | 74.13 | 0.61 | 0.50 | 0.63
Laplacian Sub | 5782.92 | 14.36 | 0.59 | 0.62 | 65.59 | 0.54 | 0.57 | 0.69
Conservative | 5782.92 | 14.36 | 0.59 | 0.62 | 65.59 | 0.54 | 0.57 | 0.69
Crimmins | 6410.76 | 12.89 | 0.64 | 0.58 | 71.52 | 0.58 | 0.53 | 0.64
Gaussian | 6359.15 | 13.22 | 0.63 | 0.59 | 70.52 | 0.58 | 0.54 | 0.66
Laplacian Add | 6214.29 | 13.42 | 0.62 | 0.59 | 69.47 | 0.57 | 0.54 | 0.66
Low Pass | 6110.01 | 13.65 | 0.61 | 0.60 | 68.54 | 0.56 | 0.55 | 0.67
Ideal Range | 0 | INF | 0 | 1 + 0j | 0 | 0 | 1 | 1

7.2 Feature Extraction
In medical image analysis, features play an extremely vital role. Various image de-noising processes are applied to the input brain MRI image before
acquiring features. Later, feature extraction approaches are applied to obtain significant features for clinical image segmentation. By varying theta, sigma, lambda, and gamma, 32 Gabor filters have been constructed for feature extraction. These filter banks are illustrated in Table 3; besides these, Canny edge, Gaussian (σ = 3 and 7), Scharr, median (σ = 3 and 7), Sobel, and Roberts operators are applied for genuine feature recognition and edge preservation. Edges are of supreme significance in clinical medical research. The hybrid amalgamation of all the above approaches is the novelty of this research work, and optimal features are selected to improve segmentation accuracy.
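A minimal sketch of how such a 32-kernel Gabor bank can be generated with OpenCV is shown below; the kernel size and phase offset are assumptions, and the λ = 0 rows of Table 3 are degenerate for OpenCV's kernel formula, so a tiny positive wavelength is substituted:

```python
# Hedged sketch of the 32-filter Gabor bank of Table 3: theta and lambda take
# multiples of pi/4, sigma is 1 or 3, gamma is 0.05 or 0.5. ksize and psi are
# assumptions; lambda = 0 is replaced by 1e-3 because OpenCV divides by lambda.
import numpy as np
import cv2

kernels = []
for theta in (0, np.pi / 4):                          # 0 and 0.785398163
    for sigma in (1, 3):
        for lam in (1e-3, np.pi / 4, np.pi / 2, 3 * np.pi / 4):
            for gamma in (0.05, 0.5):
                kernels.append(cv2.getGaborKernel(
                    ksize=(9, 9), sigma=sigma, theta=theta,
                    lambd=lam, gamma=gamma, psi=0))
print(len(kernels))  # 32 kernels, matching Table 3

# Edge operators used alongside the Gabor bank (Sobel, Scharr, Canny):
img = (np.random.rand(128, 128) * 255).astype(np.uint8)  # stand-in slice
sobel = cv2.Sobel(img, cv2.CV_64F, 1, 0, ksize=3)
scharr = cv2.Scharr(img, cv2.CV_64F, 0, 1)
canny = cv2.Canny(img, 50, 150)
responses = [cv2.filter2D(img, cv2.CV_32F, k) for k in kernels]
```

Each response map contributes one column of the per-voxel feature vector passed on to the classifier described next.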
7.3 Segmentation
Random forest [12-15] has received a great deal of attention in the field of brain MRI computer vision. Random forest has proved to be accurate as well as stable across numerous newborn brain MRI soft tissue segmentation challenges, handling large volumes of high-dimensional multiclass data. We applied a random forest classifier to segment the WM, GM, and CSF soft tissues of infant brain MR images. The proposed infant brain MRI tissue segmentation scheme is shown in Fig. 2, and the simulated segmentation results are illustrated in Table 4. Comparing results with manual segmentation, the DSC and accuracy parameters have been used to evaluate each segmentation approach. Table 5 presents the resulting performance measurements, including DSC and accuracy.
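As a hedged sketch of this stage (synthetic voxels and an assumed 40-dimensional feature vector instead of the real Gabor/edge features), per-voxel random forest segmentation with Dice scoring can look like this with scikit-learn:

```python
# Minimal sketch of per-voxel random forest tissue segmentation (WM/GM/CSF)
# with Dice scoring. All data here is synthetic; in the real pipeline the
# feature matrix comes from the filter bank and the labels from manual masks.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
X = rng.random((5000, 40))            # 5000 voxels x 40 features (assumed size)
y = rng.integers(0, 3, 5000)          # 0 = CSF, 1 = GM, 2 = WM ground truth

clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X[:4000], y[:4000])           # train on 4/5 of the voxels
pred = clf.predict(X[4000:])

def dice(a, b, label):
    """Dice similarity coefficient for one tissue label."""
    a, b = (a == label), (b == label)
    return 2 * np.logical_and(a, b).sum() / (a.sum() + b.sum())

for label, name in enumerate(("CSF", "GM", "WM")):
    print(name, "DSC = %.3f" % dice(y[4000:], pred, label))
```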
Table 3 Gabor features generated by varying parameters
Gabor | Theta | Sigma | Lamda | Gamma
1 | 0 | 1 | 0 | 0.05
2 | 0 | 1 | 0 | 0.5
3 | 0 | 1 | 0.785398163 | 0.05
4 | 0 | 1 | 0.785398163 | 0.5
5 | 0 | 1 | 1.570796327 | 0.05
6 | 0 | 1 | 1.570796327 | 0.5
7 | 0 | 1 | 2.35619449 | 0.05
8 | 0 | 1 | 2.35619449 | 0.5
9 | 0 | 3 | 0 | 0.05
10 | 0 | 3 | 0 | 0.5
11 | 0 | 3 | 0.785398163 | 0.05
12 | 0 | 3 | 0.785398163 | 0.5
13 | 0 | 3 | 1.570796327 | 0.05
14 | 0 | 3 | 1.570796327 | 0.5
15 | 0 | 3 | 2.35619449 | 0.05
16 | 0 | 3 | 2.35619449 | 0.5
17 | 0.785398163 | 1 | 0 | 0.05
18 | 0.785398163 | 1 | 0 | 0.5
19 | 0.785398163 | 1 | 0.785398163 | 0.05
20 | 0.785398163 | 1 | 0.785398163 | 0.5
21 | 0.785398163 | 1 | 1.570796327 | 0.05
22 | 0.785398163 | 1 | 1.570796327 | 0.5
23 | 0.785398163 | 1 | 2.35619449 | 0.05
24 | 0.785398163 | 1 | 2.35619449 | 0.5
25 | 0.785398163 | 3 | 0 | 0.05
26 | 0.785398163 | 3 | 0 | 0.5
27 | 0.785398163 | 3 | 0.785398163 | 0.05
28 | 0.785398163 | 3 | 0.785398163 | 0.5
29 | 0.785398163 | 3 | 1.570796327 | 0.05
30 | 0.785398163 | 3 | 1.570796327 | 0.5
31 | 0.785398163 | 3 | 2.35619449 | 0.05
32 | 0.785398163 | 3 | 2.35619449 | 0.5
Fig. 2 Proposed method for infant brain MRI tissue segmentation scheme
The Dice coefficient is a measure of performance, and ideally it should be unity. For the proposed approach, the average DSC value is more than 0.9. The major contribution of our work lies in the construction of 32 Gabor filters and their combination with classical edge-detection filters for feature extraction; finally, a random forest classifier is utilized for MRI brain tissue segmentation.
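For reference, the Dice similarity coefficient used above is the standard set-overlap measure (a textbook definition, not specific to this chapter), where A is the automated segmentation and B the manual reference:

```latex
\mathrm{DSC}(A, B) = \frac{2\,|A \cap B|}{|A| + |B|}
```

DSC = 1 corresponds to perfect overlap, which is why average values above 0.9 indicate close agreement with the manual segmentation.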
8 Conclusion
As higher-quality images become available and newborn development receives progressively more attention, segmentation of infant brain MRI becomes ever more compelling. The aim of the proposed research is to investigate, analyze, and simulate different methods for identifying and analyzing brain structures in infants using MRI. This study aimed to examine neuroimaging of baby infants applying cutting-edge techniques, such as hybrid approaches, to improve the performance. The proposed approach segments the MRI brain images into white matter, gray matter, and CSF on the iSeg-2017 dataset.
Table 4 Simulated results for T1-weighted images: input T1-w image, segmented output, and mask for Images 1-5
Table 5 Measurements of performance
Image | Accuracy (%) | DSC
Sub 1 | 90.68 | 91.56
Sub 2 | 90.87 | 90.47
Sub 3 | 90.68 | 90.87
Sub 4 | 91.48 | 91.48
Sub 5 | 91.56 | 90.68
References
1. C.J. Kelly, E.J. Hughes, M.A. Rutherford, S.J. Counsell, Advances in neonatal MRI. Bio Med. J. 1–5 (2018). https://doi.org/10.1136/archdischild-2018-314778
2. C.N. Devi, A. Chandrasekharan, V.K. Sundararaman, Z.C. Alex, Neonatal brain MRI segmentation: a review. Comput. Biol. Med. 64, 163–178 (2015)
3. M. Daliri, H.A. Moghaddam, S. Ghadimi, M. Momeni, F. Harirchi, M. Giti, Skull segmentation in 3D neonatal MRI using hybrid Hopfield neural network, in 2010 Annual International Conference of the IEEE Engineering in Medicine and Biology (2010), pp. 4060–4063
4. M. Peporte, D.E.I. Ghita, E. Twomey, P.F. Whelan, A hybrid approach to brain extraction from premature infant MRI, in Scandinavian Conference on Image Analysis (Springer, Berlin, 2011), pp. 719–730
5. A. Serag, M. Blesa, E.J. Moore, R. Pataky, S.A. Sparrow, A.G. Wilkinson, G. Macnaught, S.I. Semple, J.P. Boardman, Accurate learning with few atlases (ALFA): an algorithm for MRI neonatal brain extraction and comparison with 11 publicly available methods. Sci. Rep. 6 (2016)
6. M. Kuklisova-Murgasova, P. Aljabar, L. Srinivasan, S.J. Counsell, V. Doria, A. Serag, I.S. Gousias, J.P. Boardman, M.A. Rutherford, A.D. Edwards, J.V. Hajnal, D. Rueckert, A dynamic 4D probabilistic atlas of the developing brain. NeuroImage 54, 2750–2763 (2011)
7. P. Aljabar, R.A. Heckemann, A. Hammers, J.V. Hajnal, D. Rueckert, Multi-atlas based segmentation of brain images: atlas selection and its effect on accuracy. NeuroImage 46, 726–738 (2009)
8. F. Shi, P.-T. Yap, Y. Fan, J.H. Gilmore, W. Lin, D. Shen, Construction of multi-region-multi-reference atlases for neonatal brain MRI segmentation. NeuroImage 51, 684–693 (2010)
9. T.H. Jaware et al., An accurate automated local similarity factor-based neural tree approach toward tissue segmentation of newborn brain MRI. Am. J. Perinatol. (2018). https://doi.org/10.1055/s-0038-1675375
10. T.H. Jaware et al., An atlas-free newborn brain image segmentation and classification scheme based on SOM-DCNN with sparse auto encoder. Comput. Methods Biomech. Biomed. Eng. Imaging Vis. (2019). https://doi.org/10.1080/21681163.2019.1573380
11. T.H. Jaware et al., Multi-kernel support vector machine and Levenberg–Marquardt classification approach for neonatal brain MR images, in IEEE International Conference on Power Electronics, Intelligent Control and Energy Systems (ICPEICES) (2016), pp. 1–4
12. B. Kang, T.Q. Nguyen, Random forest with learned representations for semantic segmentation. IEEE Trans. Image Process. 28(7), 3542–3555 (2019)
13. C. Liu, R. Zhao, M. Pang, A fully automatic segmentation algorithm for CT lung images based on random forest. Med. Phys. 47(2), 518–529 (2020)
14. L. Wang et al., Benchmark on automatic six-month-old infant brain segmentation algorithms: the iSeg-2017 challenge. IEEE Trans. Med. Imaging 38(9), 2219–2230 (2019)
15. M.Z. Alam, M.S. Rahman, M.S. Rahman, A random forest based predictor for medical data classification using feature ranking. Inform. Med. Unlocked 15 (2019)
Early Prediction of Ebola Virus Using Advanced Recurrent Neural Networks
Avinash Sharma, Asadi Srinivasulu, and Tarkeshwar Barua
Abstract First, this research introduces some machine learning strategies that help the neural network learn effectively, particularly on an imbalanced dataset in which few Ebola cases appear alongside a large number of other cases. Second, it applies a new advanced recurrent neural network (ARNN), which combines recurrence with other mathematical functions and activation functions to obtain better results. Third, the recurrent application of the current and existing filters to the input contributes to a group of activations (viz., a group of features) that indicates the strength of the retrieved feature in the input, for example in the dataset linking of the NN and ResNet networks. Fourth, the computational work shows that the resulting network achieves better accuracy in multi-attribute classification than three other recurrent networks. At the same time, the average precision of our extended prediction network for the Ebola classes is 89.86%, and the average performance for all other cases is 91.63%. Keywords Classification · Neural networks · Ebola virus · Image processing · DL · TL · Skin images · RNN · ARNN
1 Introduction
Ebola is a group of viruses that initially arose in warm-blooded animals as a disease. Ebola belongs to the Filoviridae family and was reported by the World Health Organization (WHO) in September 1976. The very first cases were detected in Guinea and South Sudan. Ebola was found to have one of the longest ribonucleic acid (RNA) genomes, approximately 26.5 k to 31.4 k bases long. Ebola [1, 2] may be the most widely recognized type of infection among the research population in the whole world.
A. Sharma (B) Maharishi Markandeshwar Engineering College, Haryana 133207, India e-mail: [email protected]
A. Srinivasulu · T. Barua BlueCrest University, Monrovia, Liberia
The latest Ebola data from 1973 indicate a third major cause of death from Ebola in Africa, with approximately 171,567 cases, 19.57% of them new disease events, and 25,935 fatalities, representing 9.18% of deaths from the most threatening disease. Although Ebola [1] is a widespread kind of threat, recognition in its early stages gives a good success rate, which degrades with progression of the condition. Consequently, specialists are compelled to screen patients and conclude that early detection is principal to extending a patient's survival. The majority of multi-class learning problems use an evaluation metric based on a loss matrix; consequently, algorithms for such problems are surrogate-minimizing algorithms, which are characterized by a surrogate [3, 4] loss. If the surrogate loss is convex, the resulting surrogate-minimizing algorithm can ultimately be framed as a convex optimization problem and solved efficiently. This study focuses on three directions. The first part attempts to describe calibrated surrogate losses that lead to a consistent surrogate-minimizing algorithm for a given loss matrix. It also discusses the necessary and sufficient conditions under which calibration occurs, in light of the mathematical properties of the surrogate and true losses. The second part focuses on examining a convex calibration dimension that characterizes the intrinsic difficulty of achieving consistency for a training problem. Finally, we investigate a generic framework for constructing convex calibrated surrogates. In the field of health data analysis, computer vision (CV)-assisted confirmation relies on end-to-end convolutional [5] designs (CAD), which are closely integrated with imaging-characteristic anatomy. Meanwhile, machine learning (ML) methods excel at assisting radiologists in definitive diagnosis, reducing the time barrier and the associated cost of such assessment. Dropout layer (DL) systems demonstrate promising results in a variety of CV tasks, like segmentation, classification and object detection. These methods include recurrent [6] layers that can extract features from low-level local ones up to high-level global ones. A fully connected layer of the neural network toward the last layer of the RNN [5] converts the convolved features into probabilities of specific attributes. For instance, a batch normalization layer normalizes the input of a layer to zero mean and unit variance, and dropout, one of the regularization techniques, randomly omits selected nodes; dropout is also expected to improve the performance of deep learning models. The concluding focus is on the standard difficulties of deep learning-based techniques as applied to different fields, for example health and clinical imaging and image dataset applications [26].
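As a generic Keras sketch of the batch-normalization and dropout layers just described (layer sizes are illustrative assumptions, not this paper's architecture):

```python
# Generic Keras sketch: batch normalization rescales layer inputs to zero mean
# and unit variance, and dropout randomly omits nodes during training. The
# layer sizes here are arbitrary illustrations.
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense, BatchNormalization, Dropout

model = Sequential([
    Dense(128, activation="relu", input_shape=(64,)),
    BatchNormalization(),   # zero-mean, unit-variance normalization per batch
    Dropout(0.5),           # randomly drops half the activations in training
    Dense(3, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
model.summary()
```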
2 Literature Review
According to past research in this area, a massive amount of work has been done by people working at hospitals, clinics and laboratories; in addition, many analysts and researchers have devoted significant efforts to the fight
against the Ebola epidemic [25]. Owing to the rapid spread of the disease, the implementation of AI has made a critical contribution to the automated health sector through the essentials of DL and ASR algorithms. These studies also focus on the significance of speech signal processing for early screening and diagnosis of the Ebola infection using the recurrent neural [5] network (RNN), particularly its long short-term memory (LSTM) architecture, for analyzing patient symptoms such as cough, breath and voice. Our results show lower accuracy on the voice samples of the dataset compared with the coughing and breathing sound samples. Our results are at a preliminary stage, and we hope to improve the accuracy on the voice samples by extending the dataset and targeting a larger group of healthy and infected individuals. Recently, the NAACCR association announced that all malignancy cases are described by the ICD for Oncology except for childhood and pre-adult cases, which were organized by the ICCC. The underlying causes of death were grouped by the ICD. The incidence rates presented in this report were adjusted for delays in reporting, which occur because of a lag in event capture or data reviews. Several researchers previously reported automated screening and diagnosis based on the investigation of chest CT images [1, 7]. AI has been adopted and implemented in the e-health domain to support early detection [5] of Ebola by analyzing sounds such as coughing, breathing and speech [1]. Respiratory sound is a signal of human health status that can be recognized and analyzed by implementing ML algorithms [4].
3 Present Systems and Their Implementation Details
At present, existing frameworks assess the onset of the disease stage in Ebola patients based on only a few NN techniques [8]. All of these techniques and algorithms are found to be limited in terms of performance accuracy and time complexity. To address these limitations, we propose the ARNN algorithms described in this work. This section specifically addresses the existing frameworks along with their drawbacks and then proposes the new framework as shown below:
3.1 Proposed System
Advanced RNNs are among the most popular deep learning models for handling multidimensional array data, for instance color images. A typical advanced RNN involves several recurrent [9] and pooling layers followed by a few fully connected layers to simultaneously learn a feature hierarchy and classify images. It uses error backpropagation, an efficient form of gradient descent, to update the weights through its layered architecture. We
demonstrate a multi-stage approach using separate advanced RNNs. The first advanced RNN detects nuclei in the presented tissue image, while the next advanced RNN accepts patches centered on the detected nuclear centers as input to predict the probability that the sample belongs to an instance of PCa recurrence. Before describing our advanced RNN models, we introduce the details of the data used to develop the proposed PCa recurrence model.
Advantages of the Proposed System:
1. High precision and low time complexity.
2. High performance with reduced computational cost.
3. Works even with a moderate amount of training data and is superior to the present-day framework.
4. ARNN [6] is an extension of NN and ResNet60V3 to increase the performance.
5. A training method for handling imbalanced image datasets.
6. Evaluation of our networks on 12,050 skin images.
7. Evaluation using ResNet50V2 and Xception on the image dataset.
4 System Design and Implementation
4.1 ARNN Algorithms
The ARNN [10] is a class of NN used for sequence labeling tasks such as part-of-speech tagging and named-entity recognition. To realize the above benefits, we made certain improvements to the RNN and obtained the ARNN through a sequence [11, 12] of steps:
Stage 1: Check the amount of information stored from past experience.
Stage 2: Check the amount of information being added in the current execution.
Stage 3: Check that the amount of output information is accurate.
Based on the Ebola dataset, i.e., a total of 960 images, our experimentation involved the following thirteen stages:
Stage 1: Import the required libraries.
Stage 2: Import the training dataset.
Stage 3: Apply feature scaling to transform the data.
Stage 4: Create a data structure using 60 time steps and 1 output.
Stage 5: Import the Keras library.
Stage 6: Initialize the ARNN.
Stage 7: Add the LSTM layers and some dropout regularization.
Stage 8: Add the output layer.
Stage 9: Compile the ARNN.
Stage 10: Fit the ARNN to the training dataset.
Stage 11: Load the Ebola trial image data for 2014.
Stage 12: Obtain the predicted Ebola values for 2020.
Stage 13: Visualize the predicted versus actual Ebola results.
Consequently, the approach is found to be more precise while consuming less execution time, enumerating the Ebola cases at their early stage.
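Stages 1-10 map naturally onto a Keras workflow; the sketch below uses a synthetic series in place of the Ebola case data, and all hyperparameters are assumptions rather than the authors' published settings:

```python
# Hedged sketch of stages 1-10 in Keras: feature scaling, 60-step windowing,
# an LSTM stack with dropout, compile and fit. The random series stands in
# for the real Ebola data; sizes and hyperparameters are assumed.
import numpy as np
from sklearn.preprocessing import MinMaxScaler
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dropout, Dense

series = np.random.rand(1000, 1)                     # stage 2: training data
scaled = MinMaxScaler().fit_transform(series)        # stage 3: feature scaling

X, y = [], []
for i in range(60, len(scaled)):                     # stage 4: 60 steps, 1 output
    X.append(scaled[i - 60:i, 0])
    y.append(scaled[i, 0])
X = np.array(X).reshape(-1, 60, 1)
y = np.array(y)

model = Sequential()                                 # stage 6: initialize ARNN
model.add(LSTM(50, return_sequences=True, input_shape=(60, 1)))
model.add(Dropout(0.2))                              # stage 7: LSTM + dropout
model.add(LSTM(50))
model.add(Dropout(0.2))
model.add(Dense(1))                                  # stage 8: output layer
model.compile(optimizer="adam", loss="mse")          # stage 9: compile
model.fit(X, y, epochs=2, batch_size=32, verbose=0)  # stage 10: fit
```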
5 Outcome
The CPU utilization for the Ebola dataset using the adapted recurrent neural network is shown in Fig. 1.
5.1 Evaluation Methods
The execution flow of the Ebola dataset using the adapted recurrent neural network is shown in Fig. 2. To assess the performance of the proposed ARNN technique, we adopted a few standard methodologies. First, RA, UA, UD and RD are defined individually from the confusion matrix. RA is the number of cases correctly predicted as required, while UA is the number of cases incorrectly predicted as required. RD is the number of cases correctly predicted as not required. Finally, UD is the number of cases incorrectly predicted as not required. We can then obtain four
Fig. 1 CPU utilization
Fig. 2 Execution flow of Ebola data set
measures, i.e., accuracy, precision, recall and F1-measure, calculated according to the following formulae (Fig. 3):

Accuracy = (RA + RD) / (RA + UA + RD + UD)
Precision = RA / (RA + UA)
Recall = RA / (RA + UD)
F-measure = (2 × Precision × Recall) / (Precision + Recall)

Fig. 3 Data loss versus accuracy
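The four measures are straightforward to verify numerically; a minimal plain-Python check with invented counts (the chapter reports no confusion-matrix values) follows:

```python
# Plain-Python check of the four formulae above using the paper's RA/UA/RD/UD
# notation (correct/incorrect "required" and "not required" predictions).
# The counts are invented for illustration.
RA, UA, RD, UD = 90, 8, 85, 10

accuracy = (RA + RD) / (RA + UA + RD + UD)
precision = RA / (RA + UA)
recall = RA / (RA + UD)
f_measure = 2 * precision * recall / (precision + recall)
print(f"accuracy={accuracy:.3f} precision={precision:.3f} "
      f"recall={recall:.3f} F1={f_measure:.3f}")
```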
Taking every dataset and all parameters into consideration when modeling the odds of disease, the accuracy of risk prediction is expected to depend on the diverse collection of features in the clinical center data; that is, a better feature representation of the disease corresponds to higher precision. The analysis achieves an accuracy of 91.4%. Our methodology consumes half the time compared with existing procedures, and this is further reduced using a graphics processing unit (CPU + GPU) and TPU. The overall execution time also depends on system performance, which in turn depends on the system software, system hardware and available space.
Contribution: As mentioned earlier, our experiment considers 965 images. Figure 4 shows the execution time over the Ebola virus dataset against the number of iterations for the adapted recurrent neural network. Figure 5 shows the data loss over the Ebola virus dataset against the number of iterations for the adapted recurrent neural network.
Fig. 4 Number of CPU processors versus input data set
Fig. 5 Data loss versus iteration
6 Conclusions
This research presents a preliminary result toward a groundbreaking and modern methodology for the early diagnosis of Ebola; it is not a definitive instrument of the proposed Ebola early detection framework. This work divides the defined training dataset into eight successive stages. Out of 865 images, we found 169 Ebola positive, 324 pneumonia positive, and 360 typical/negative. In terms of numbers, each class is found to be roughly equal to the others, reflecting our proposed ARNN's ability to self-learn Ebola class characteristics, especially in light of the features from the other two classes. In each stage, we noted a striking point: the image data from the normal and pneumonia classes are found to differ, reflecting our proposed ARNN's capability in distinguishing Ebola from the other classes. Among the 760 images of our training datasets, the remaining 34 images were taken for checking the performance of the network. Since this model is based on a large number of images, it is built to achieve better accuracy. Hence, we conclude that the performance of the model is 79.42%, with a sensitivity of 90.54% for the Ebola group, and the final performance of the model is 78.6%, which may vary within ±2 folds.
References
1. F. Khan, Prediction of Ebola virus disease (EVD) model using machine learning approach, Dec 2019
2. A. Barry, S. Ahuka-Mundeke, Y. Ali Ahmed, Y. Allarangar, J. Anoko, B.N. Archer et al., Outbreak of Ebola virus disease in the Democratic Republic of the Congo, April–May, 2018: an epidemiological study. The Lancet 392, 213–221 (2018)
3. R. Bartzatt, Prediction of novel anti-Ebola virus compounds utilizing artificial neural network (ANN). 7(13) (2018), www.wjpr.net. https://doi.org/10.20959/wjpr201813-12749
4. S. Asadi, N. Deepika, Big data integrated with cloud computing for prediction of health conditions. Sambodhi Int. J. 43(02(V)), 125–131 (2020)
5. A. Srinivasulu et al., Advanced lung cancer prediction based on blockchain material using extended CNN. Appl. Nanosci. 1–13 (2021). https://doi.org/10.1007/s13204-021-01897-2
6. A. Srinivasulu, S. Kolambakar, T. Barua, Efficient implementation of CoCalc cloud microservices for health data analytics, in International Conference on Research in Engineering, Technology and Science (ICRETS) (Turkey, 10–13 June 2021)
7. A. Srinivasulu, U. Neelakantan, T. Barua, Early prediction of lung cancer detection using extended convolutional neural network. Psychol. Educ. 58(1), 5614–5624 (2021)
8. S. Asadi, C.D.V. Subbarao, A survey of large-scale ontologies based on pre-clustering paradigm. i-manager's J. Cloud Comput. 2(3), 8–19 (2016)
9. A. Srinivasulu, R. Reddy, B.L. Bhargav, Survey of health monitoring system using integration of cloud and big data mining techniques, in Joint International Conference ICAIECES-2017 & ICPCIT-2017 (MITS, 27–29 Apr 2017)
10. S. Asadi, C.D.V. Subbarao, M. Sreedevi, Finding number of cluster services in educational cloud using EDBE (extended dark block extraction). Bookman Int. J. Softw. Eng. 1(1) (2012)
11. K. Thirumalesh, A. Srinivasulu, Clustering of summarizing multi-documents (large data) by using MapReduce framework. i-manager's J. Cloud Comput. 3(1), 15–26 (2016)
12. R. Sai, S. Asadi, Auditing the shared data in cloud through privacy preserving mechanisms. i-manager's J. Inf. Technol. 12(6), 22–29; i-manager's J. Cloud Comput. 2(2) (2015)
13. S. Asadi, B. Rajesh, Improving the performance of KNN classification algorithms by using Apache Spark. i-manager's J. Cloud Comput. 4(2) (2017). https://doi.org/10.26634/jcc.4.2.14382
14. S. Asadi, G.M. Chanakya, Health monitoring system using integration of cloud and data mining techniques. Helix 7(5), 2047–2052 (2017)
15. A. Srinivasulu, K. Manasa, Survey of health monitoring system using integration of cloud and big data mining techniques. Int. J. Appl. Eng. Res. 12(1) (2017). https://www.ripublication.com/ijaerspl2017/ijaerv12n1spl_08.pdf
16. S. Asadi, C.D.V. Subbarao, A. Bhudevi, Dynamic data storage publishing and forwarding in cloud using fusion security algorithms. Comput. Sci. Inf. Technol. 2(4), 203–210 (2014). https://doi.org/10.13189/CSIT.2014.020404
17. S. Tarigonda, A. Ganesh, S. Asadi, Providing data security in cloud computing using novel and mixed agent based approach. Int. J. Comput. Appl. 112(6), 15–19 (2015). https://doi.org/10.5120/19670-1180
18. S. Asadi, C.D.V. Subbarao, K. Lalitha, M. Sreedevi, Finding number of cluster services in educational cloud using EDBE, in International Conference on Computer Science and Information Technology (ICEECS-2012) (Goa, India, Sept 2012)
19. S. Asadi, C.D.V. Subbarao, G. Suresh, Educational cloud: services for students, in International Conference on Cloud Computing and eGovernance (ICCCEG-2012) (Bangkok, Thailand, 26–28 July 2012)
20. S. Asadi, G.M. Chanakya, Health monitoring system integrating cloud and data mining techniques. Helix 7(5), 2047–2052 (2017)
21. S. Asadi, B. Rajesh, Improving the performance of KNN classification algorithms by Apache Spark. i-manager's J. Cloud Comput. 4(2) (2017)
22. S. Asadi, A. Pushpa, Disease prediction in big data healthcare using adapted recurrent neural network techniques. Int. J. Adv. Appl. Sci. (IJAAS) 9(2), 85–92 (2020). https://doi.org/10.11591/ijaas.v9.i2
23. T. Barua, R. Doshi, K.K. Hiran, Mobile Applications Development with Python in Kivy Framework (De Gruyter STEM). ISBN: 9783110689389
24. T. Barua, Machine Learning with Python (De Gruyter STEM). ISBN: 9783110697162
A Three-Step Fuzzy-Based BERT Model for Sentiment Analysis
Koyel Chakraborty, Siddhartha Bhattacharyya, and Rajib Bag
Abstract The main aim of sentiment analysis is to understand the attitude of the audience towards a particular product. The attitudes are generally classified into positive, negative or neutral views. This paper proposes a new three-step fuzzy-based bidirectional encoder representations from transformers (BERT) sentiment analysis model to predict the sentiments of various types of datasets. The advantage of BERT is its bidirectional training over both context and words. In the first step, data pre-processing is done to clean the texts. The second step consists of passing the cleaned texts through the BERT module for classification. Finally, in the third step, the classified datasets are passed through a fuzzy logic module to help deal better with inconsistencies while finding sentiments. BERT has emerged as one of the most competent deep learning models in recent times. The main aim of the model is to combine the advantages of both BERT and fuzzy logic mechanisms to identify sentiments accurately. The proposed model has been implemented on various datasets and has shown exemplary accuracy of over 90% at the highest epoch considered in the experiment. The model has also proved to perform better in terms of accuracy, precision and non-parametric measures. Keywords Sentiment analysis · BERT · Fuzzy-based approach · Accuracy
1 Introduction
The world has been facing a humongous rise in data as it gradually shifts its base to digitization in its entirety. There has been a significant rise in the number of users, which has been driving up the usage of the Internet. This extensive usage has added to the mammoth quantity of data being generated every second.
K. Chakraborty (B) Supreme Knowledge Foundation Group of Institutions, Mankundu, India
S. Bhattacharyya Rajnagar Mahavidyalaya, Rajnagar, Birbhum, India
R. Bag Indas Mahavidyalaya, Indas, Bankura, India
Specifically, since the beginning of 2020, i.e. during the pandemic and the post-pandemic times that we are going through, it has become inevitable that the Internet is the only option available in any sphere of work and study. Hence, analysing the collected data and understanding the direction people are following is of utmost importance. Various businesses have dedicated personnel who study these trends so that the business can be strategized accordingly. Manufacturers of different types of products, industries responsible for producing various goods, the entertainment industry, the fashion industry and e-business concerns are all engaged in following the bandwagon of trends. Reviews of any object in real life today form an integral part of decision-making for human beings. The inclination of people to follow what is "trending" forces all industries to study trends and analyse them for enhanced performance. The entertainment industry gathers its insights from varied sources like reviews, blogs and social site posts. Of late, the movie industry seems to be one of the most sought-after fields and is highly dependent on reviews collected from different sources. With advances in technology, most movies are made available on all sorts of online platforms. Hence, analysts are compelled to amass their reviews to understand the sentiments of the public towards a movie.
Sentiment analysis (SA) deals with understanding the specific behaviour of the population towards a particular event or product [1]. SA can be broadly categorized into three groups: positive, negative and neutral. However, there are variations within these groups where the sentiments are further broken down into sub-divisions; for example, a positive sentiment can be broken down into somewhat positive, very positive, highly positive, etc. There have been multiple models and prediction systems that have efficiently dealt with identifying sentiments from reviews. Rule-based techniques, machine learning-based approaches and, more recently, deep learning-based mechanisms have been utilized to infer sentiments from reviews [2–4]. The three-step working methodology followed in this paper can be summarized as follows:
1. Firstly, a basic pre-processing of the datasets is done.
2. The BERT [5] language model is used to efficiently train deep bidirectional representations beforehand. The architectural specialization of freezing the inner layers and letting the outer layers become accustomed to the task while fine-tuning themselves accordingly is considered in this case.
3. The reviews output from the BERT [5] model are further passed through a fuzzy logic module based on the cognitive thinking process [6].
The proposed methodology has been implemented on three types of datasets, viz. IMDB movie reviews [7], YELP restaurant reviews [8] and a COVID-19 dataset (tweets) [9]. The paper is organized in the following manner. Section 2 presents the inspiration behind the work done in this paper. Section 3 summarizes the recent works in the field of SA based on fuzzy logic using BERT [5]. Section 4 describes the proposed methodology in detail. Section 5 discusses the results obtained from experiments on the datasets. Section 6 contains the statistical analysis of the results
obtained. The paper ends with Sect. 7 mentioning the limitations of the proposed method and the probable future scope in this direction.
2 Motivation
The main motivation of this work is to utilize the advantage of transformer-based attention-only models. This in turn yields the maximum result in the same amount of time allotted to traditional natural language processing approaches. The ease with which these models train on long sentences quickly, owing to their pre-training feature, removes the necessity of applying heavily engineered architectures for specific tasks. Hence, the quickness of the BERT [5] model is combined with the efficient decision-making feature of fuzzy logic in the proposed model. Two considerable takeaways from the paper are (1) the high efficiency of the BERT [5] model in handling an extensive range of language processing tasks and (2) the high performance of fuzzy logic in handling indecisiveness while analysing sentiments.
3 Background Study
Sentiment analysis and its related fields have been at the helm of research for long. Collecting reviews/opinions/tweets from social media sites and devising various means to study and analyse them effectively have been reported in [10]. Fuzzy logic has also been applied in the field of SA [11] and in various facets of mining emotions from social media messages. Recent works include authors combining the advantage of fuzzy logic in dealing with uncertainty and the automatic learning feature of deep learning architectures to predict sentiments [12]. Fuzzy regulation-based techniques have also been designed with the usage of membership degrees and show better results in predicting sentiments [13]. Hybrid approaches mixing machine learning features with fuzzy logic have also been explored in [14]. Other than movies, implementation of fuzzy logic has also been done for product reviews, where the characteristic-based categorization approach is extended to handle the consequences of transformers, concentrators and dilators [15]. BERT [5] has been proved to present superior results because of its bidirectional training method when compared to traditional embedding systems practiced in NLP [16]. As it tries to unearth the meaning of texts without any specific direction, it can easily learn the meaning of words based on the context. Many authors have also devised methods to help BERT [5] be domain conscious so as to easily identify the source and target domains separately [17]. Aspect-based sentiment categorization can also be done efficiently with BERT [5] by modifying some hyperparameters and using adversarial training [18]. Various fine-tuning methods have been applied to the originally created BERT [5] model, and the same has been implemented effectually in a plethora of languages other than English, like Japanese, Spanish, Italian, etc.
[19]. Research has also been carried out to improve the pooling layer construction method in BERT [5] to further allow identification of sentiments for individual tokens [20]. Work has also been done in recent months on various languages like Chinese by combining the BERT model with a convolutional neural network (CNN) to augment the original features [21]. An aspect-level SA model based on BERT [5] and an aspect feature location model (ALM-BERT) have been used to lessen the effect of noise words while detecting sentiments in [22]. A comparative study was also made between traditional SA models and the BERT model on IMDB datasets, where supervised and unsupervised models like long short-term memory (LSTM) were also implemented [23]. The authors in [24] mention a task-oriented layer incorporated with BERT through a bidirectional long short-term memory (Bi-LSTM) model to overcome the problem of understanding sentiments from short texts like tweets. While BERT [5] accurately predicts sentiments in texts, it also reduces the overhead of training models from scratch. Fuzzy logic is competent enough to handle the vagueness of sentiment scores, which range from 0 to 1. The intention of this paper is to bring out the best by combining these two self-sufficient fields for analysing optimal sentiments.
4 Proposed Methodology
The sentiment analysis methodology proposed in this paper comprises mainly three phases. The entire method is represented through a block diagram in Fig. 1. The different steps of the proposed methodology are discussed in the following subsections.
4.1 Collection of Datasets
Three types of datasets have been considered for implementation of the proposed model, as shown in Table 1. While Dataset_1 (IMDB movie review dataset) has traditionally been used in numerous sentiment analysis experiments, Dataset_2 (YELP) deals with restaurant reviews, and Dataset_3 (COVID-19 tweets) comprises tweets collected between December 2019 and May 2020. Both the YELP and COVID-19 datasets have their reviews labelled into positive, negative and neutral, but the reviews of the IMDB dataset can be classified into positive and negative only. To bring uniformity to the datasets, some irrelevant columns have been omitted to avoid computational complexities in the BERT [5] and fuzzy logic architectures. Reviews in the form of textual sentences and their ratings are mainly considered for the experiments.
Fig. 1 Three-step architecture of the proposed model (collection of datasets → pre-processing of the datasets → BERT module → fuzzy logic module → prediction of sentiments: positive / neutral / negative)
Table 1 Details of datasets being used
Name | Name given in this paper | Type of classification | Number of reviews/tweets | Training quantity | Testing quantity
IMDB movie reviews | Dataset_1 | Binary (Positive, Negative) | 50,000 | 25,000 | 25,000
YELP restaurant reviews | Dataset_2 | Multiclass (Positive, Negative, Neutral) | 140,000 | 130,000 | 10,000
COVID-19 dataset (tweets) | Dataset_3 | Multiclass (Positive, Negative, Neutral) | 226,668 | 204,000 | 22,668
4.2 Pre-processing of the Datasets
After collection of the datasets, basic pre-processing of the datasets is done. Pre-processing of data before extracting sentiments from it is one of the most crucial steps. It helps in enhancing the quality of the texts whose sentiment is to be unearthed. It also lessens the complexity, owing to the cleaning of the texts, before they are processed in the model. The raw datasets are stripped of stop words and unwanted
spaces and are stemmed to find the base words of the texts. Stop words are the most frequently used words like "a", "an", "the", "is", etc., which do not render meaningful insight to a sentence; their removal hence does not influence the meaning of the text. Apart from stop words, spaces marked by tabs, extra spaces in between the texts, punctuation marks, hashtags, hyperlinks and emoticons are also removed to clean the texts for processing. Then, stemming and lemmatization are done to bring out the stems of the words and to find the context in which they have been used. Finally, the words are identified as tokens. Generally, tokenization is done to break the texts into appropriate phrases, either as single components or by combining two or three phrases together: a single tokenized word is called a unigram, a collection of two words is known as a bi-gram, and three words form a tri-gram [25].
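A minimal sketch of this cleaning pipeline with NLTK follows; the sample tweet and the regular expression are invented for illustration:

```python
# Hedged sketch of the pre-processing step: lowercase, strip URLs, hashtags
# and punctuation, remove stop words, then stem. Requires a one-time download
# of the NLTK stopwords corpus.
import re
import nltk
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer

nltk.download("stopwords", quiet=True)

text = "The movie was AMAZING!!! #mustwatch http://t.co/abc :)"
text = re.sub(r"http\S+|#\w+|[^a-z\s]", " ", text.lower())  # links, hashtags, punctuation
tokens = text.split()                                        # simple whitespace tokenization
stops = set(stopwords.words("english"))
stemmer = PorterStemmer()
clean = [stemmer.stem(t) for t in tokens if t not in stops]
print(clean)  # ['movi', 'amaz']
```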
4.3 BERT Module
After tokenization, the respective datasets are sent for categorization in the BERT [5] module. BERT stands for "Bidirectional Encoder Representations from Transformers" and is based on an attention-oriented learning strategy. The transformer-based technology used in BERT [5] finds the related context of a particular word for a smarter representation of it while encoding. As BERT [5] has been pre-trained on huge datasets, a minimal amount of fine-tuning leads to massive improvement in the results. The contextual modelling of BERT [5] and its ability to understand word representations in a two-way manner account for its effectiveness in natural language processing tasks. The tokens received after pre-processing of the datasets are fed into the input layer of the BERT [5] framework. The BERT [5] encoder is then responsible for attaching a sentence embedding to each of the tokens it receives as input. Along with it, a position-wise embedding is also added to the tokens to identify the position of each word in the sentence. The entire sequence is then passed through the transformer model of the BERT [5] framework. The output layer of the BERT [5] framework comprises a SoftMax activation function [26] to help map the resultant words to the vectors.
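A hedged sketch of this classification step with the Hugging Face Transformers library is given below; 'bert-base-uncased' matches the twelve-layer BERT base variant described later in the chapter, but the inference details are illustrative, not the authors' exact setup:

```python
# Hedged sketch: BERT-base sequence classification with softmax outputs for
# three sentiment classes. The sample sentences are invented; a real run would
# fine-tune on the labelled reviews first.
import torch
from transformers import BertTokenizer, BertForSequenceClassification

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=3)       # positive / negative / neutral

batch = tokenizer(["the movie was wonderful", "worst meal ever"],
                  padding=True, truncation=True, return_tensors="pt")
with torch.no_grad():
    logits = model(**batch).logits
probs = torch.softmax(logits, dim=-1)         # per-class sentiment scores
print(probs)
```

The per-class scores produced here are what the fuzzy module of the next subsection consumes.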
4.4 Fuzzy Logic Module
The next step of the proposed model is to pass the classified output through fuzzy membership functions. Fuzzy logic is specially used for its ability to efficiently handle indecisiveness in sentiment analysis cases. In this step, the value given to the texts via the BERT [5] model is further fuzzified, keeping the fixed range of values between 0 and 1, by means of membership functions. Here a set of seven rules has been formulated based on the Mamdani fuzzy inference technique to find the final predicted sentiment of the datasets, as represented in Table 2. It has been assumed
Table 2 Fuzzy membership rules
Rule no | Positive score | Negative score | Sentiment
I | Low | Low | Neutral
II | High | Low | Positive
III | Medium | Medium | Neutral
IV | Low | Medium | Negative
V | High | Medium | Positive
VI | Low | High | Negative
VII | High | High | Neutral
that the effect of a sentiment with a medium score is nullified if the other score is positive or negative; hence, it has been discarded while considering the membership rules. The sentiment hence obtained after applying the defuzzification method is considered the final sentiment prediction. To prove the efficiency of the proposed model against the existing BERT [5] model, all the datasets have been run with the BERT [5] model first, after the necessary pre-processing. Several combinations of parameters like batch size, learning rate and number of epochs have been taken to show the variety of the results. There are two variations of the BERT [5] model based on the number of encoders stacked on one another, BERT [5] base and BERT [5] large. The BERT base model, having twelve encoder layers, 110 million parameters and 768 hidden units, has been used in this model. It is found that the BERT [5] base model with three epochs executed on Dataset_1 results in 86% accuracy. The detailed output is shown in Table 3. For clarification, the mentioned datasets have been implemented twice, once only on the BERT [5] model and then through the proposed model. For the fuzzy module, again, three variations have been made. The predicted results from the BERT [5] module are fuzzified using membership functions [27]. Seven basic Mamdani rules [28] have been formed to process each text. The rules are adopted from a previous paper by the authors [9]. After the inference values have been estimated, the defuzzification is carried out.
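A minimal sketch of the Mamdani module of Table 2 with scikit-fuzzy follows; the triangular low/medium/high memberships and the rule consequents are assumptions (the chapter also evaluates trapezoidal and Gaussian shapes):

```python
# Hedged sketch of the seven Mamdani rules of Table 2 with scikit-fuzzy.
# Membership shapes and breakpoints are assumed, not the paper's tuned values.
import numpy as np
import skfuzzy as fuzz
from skfuzzy import control as ctrl

score = np.arange(0, 1.01, 0.01)
pos = ctrl.Antecedent(score, "positive")
neg = ctrl.Antecedent(score, "negative")
sent = ctrl.Consequent(score, "sentiment")

for var in (pos, neg, sent):
    var["low"] = fuzz.trimf(var.universe, [0.0, 0.0, 0.5])
    var["medium"] = fuzz.trimf(var.universe, [0.25, 0.5, 0.75])
    var["high"] = fuzz.trimf(var.universe, [0.5, 1.0, 1.0])

rules = [
    ctrl.Rule(pos["low"] & neg["low"], sent["medium"]),        # I: neutral
    ctrl.Rule(pos["high"] & neg["low"], sent["high"]),         # II: positive
    ctrl.Rule(pos["medium"] & neg["medium"], sent["medium"]),  # III: neutral
    ctrl.Rule(pos["low"] & neg["medium"], sent["low"]),        # IV: negative
    ctrl.Rule(pos["high"] & neg["medium"], sent["high"]),      # V: positive
    ctrl.Rule(pos["low"] & neg["high"], sent["low"]),          # VI: negative
    ctrl.Rule(pos["high"] & neg["high"], sent["medium"]),      # VII: neutral
]

sim = ctrl.ControlSystemSimulation(ctrl.ControlSystem(rules))
sim.input["positive"], sim.input["negative"] = 0.8, 0.1  # example BERT scores
sim.compute()
print(sim.output["sentiment"])  # defuzzified value; high implies positive
```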
5 Results and Discussion The three-tier proposed model has been implemented over all the datasets. All the datasets have been applied to the BERT [5] module only at the beginning, and then, they have been introduced to the fuzzy module. The fuzzy module has been established by comparing three membership functions, viz. triangular membership function, trapezoidal membership function and Gaussian membership function [29]. Triangular membership function has been considered as it is one of the most primitive membership functions. Trapezoidal membership function turns out to be the
Table 3 Results of BERT [5] being used on Dataset_1
Batch size | Learning rate | Number of epochs | Accuracy (%)
8 | 1e−5 | 1 | 60
8 | 1e−5 | 2 | 61
8 | 1e−5 | 3 | 63
8 | 2e−5 | 1 | 62
8 | 2e−5 | 2 | 64
8 | 2e−5 | 3 | 63
8 | 3e−5 | 1 | 64
8 | 3e−5 | 2 | 64
8 | 3e−5 | 3 | 65
16 | 1e−5 | 1 | 77
16 | 1e−5 | 2 | 78
16 | 1e−5 | 3 | 77
16 | 2e−5 | 1 | 76
16 | 2e−5 | 2 | 80
16 | 2e−5 | 3 | 79
16 | 3e−5 | 1 | 78
16 | 3e−5 | 2 | 80
16 | 3e−5 | 3 | 79
32 | 1e−5 | 1 | 84
32 | 1e−5 | 2 | 85
32 | 1e−5 | 3 | 85
32 | 2e−5 | 1 | 82
32 | 2e−5 | 2 | 83
32 | 2e−5 | 3 | 84
32 | 3e−5 | 1 | 85
32 | 3e−5 | 2 | 85
32 | 3e−5 | 3 | 86
comprehensive combination of both the triangular and rectangular membership functions. The Gaussian membership function, on the other hand, makes use of the mean and standard deviation to represent fuzziness. The reason these three membership functions have been considered here is that together they encompass almost all types of popular membership functions. Out of these, the Gaussian membership function yields the best result, as shown in Table 4. Tables 5, 6 and 7 represent the different results applied on Dataset_1, Dataset_2 and Dataset_3, respectively. Based on all the experimental results, the proposed model shows better results than the standalone BERT [5] model. To avoid the menace of overfitting and underfitting, the number of epochs considered for implementation of BERT [5] is 3. Dataset_1 resulted in the greatest accuracy with our proposed model. While Dataset_1 yields an accuracy of
Table 4 Results of fuzzy membership functions applied on Dataset_1
Membership function | Accuracy (%) | F1-Score (%) | Precision (%) | Recall (%)
Triangular membership-based fuzzy inference rules | 90 | 90 | 91.98 | 91.01
Trapezoidal membership-based fuzzy inference rules | 92 | 92 | 92.35 | 93.67
Gaussian membership-based fuzzy inference rules | 94 | 95 | 93.60 | 93.58
Table 5 Proposed model implemented on Dataset_1
Model used | Number of epochs | Accuracy (%) | F1-score
BERT | 1 | 61 | 0.62
BERT | 2 | 78 | 0.76
BERT | 3 | 85 | 0.85
Proposed model | 1 | 85 | 0.85
Proposed model | 2 | 89.2 | 0.88
Proposed model | 3 | 94 | 0.95
Table 6 Proposed model implemented on Dataset_2
Model used | Number of epochs | Accuracy (%) | F1-score
BERT | 1 | 60 | 0.59
BERT | 2 | 68 | 0.67
BERT | 3 | 84 | 0.86
Proposed model | 1 | 65 | 0.65
Proposed model | 2 | 79 | 0.78
Proposed model | 3 | 91 | 0.90
Table 7 Proposed model implemented on Dataset_3
Model used | Number of epochs | Accuracy (%) | F1-score
BERT | 1 | 63 | 0.63
BERT | 2 | 75 | 0.81
BERT | 3 | 85 | 0.85
Proposed model | 1 | 70 | 0.68
Proposed model | 2 | 79 | 0.79
Proposed model | 3 | 89 | 0.88
Fig. 2 Metric-wise comparison of the datasets (IMDB, YELP, COVID) before and after fuzzification
94%, Dataset_2 gathers 91% in accuracy and Dataset_3 scores 89%. A comparison in the accuracy, F1-score, precision and recall of all the datasets before and after applying the fuzzification procedure is presented in Fig. 2.
6 Statistical Analysis of Results
Apart from the parametric experiments performed to check the efficiency of the proposed model, a non-parametric statistical test has also been performed to establish the significance of the proposed method. The independent t-test [30] is a statistical test that aims to find the difference between two distinct groups. In this test, the null hypothesis (1) states that the means of the two distinct groups are equal:

µ0 = µ1  (1)

where µ represents the mean. The aim in this case was to nullify this hypothesis and establish that the distinct groups have dissimilar means (2):

µ0 ≠ µ1  (2)
The threshold for the level of significance is set at 0.05. After executing the t-test, it is found that the significance value is 0.031, which is less than the threshold considered in this case. Hence, the groups considered for extracting sentiments have little or no similarity between them.
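A hedged sketch of such an independent t-test with SciPy follows; the two score arrays are invented stand-ins for the compared groups:

```python
# Hedged sketch of the independent two-sample t-test described above. The
# arrays are illustrative; the chapter reports only the resulting p = 0.031.
import numpy as np
from scipy import stats

group_a = np.array([0.61, 0.78, 0.85, 0.84, 0.85])   # e.g. BERT-only scores
group_b = np.array([0.85, 0.892, 0.94, 0.91, 0.89])  # e.g. proposed-model scores

t_stat, p_value = stats.ttest_ind(group_a, group_b)
alpha = 0.05
print(f"t={t_stat:.3f}, p={p_value:.3f}")
if p_value < alpha:
    print("reject H0: the group means differ")
```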
7 Conclusion
This paper presents a new fuzzy-based BERT [5] architecture that shows exemplary results when applied on various datasets. After the necessary pre-processing mechanisms, the datasets were processed through the BERT base model. Manual fine-tuning of the model is done by adding a fuzzy module, which leads to higher accuracy rates on almost all the datasets. In this paper, the BERT [5]-based model has been used on pre-processed data, and finally, the model was fine-tuned by applying a fuzzy-based approach to obtain high accuracy rates. While experimenting, it was found that the Gaussian membership functions yielded the best results in predicting the sentiments. The main objective of the paper is to incorporate the swiftness of BERT [5] and apply the accuracy of fuzzy logic to predict optimal sentiments. As a limitation of this work, the computational complexity incurred due to the expansion of the model parameters can be of concern to researchers. Future work could fine-tune the BERT [5] model further to produce better results. Other fuzzy membership functions could also be implemented as a probable extension of this work.
References
1. http://www.andrew.cmu.edu/user/angli2/li2019sentiment.pdf
2. C. Hutto, E. Gilbert, VADER: a parsimonious rule-based model for sentiment analysis of social media text, in Proceedings of the International AAAI Conference on Web and Social Media, vol. 8(1) (2014)
3. N. Cristianini, J. Shawe-Taylor, An Introduction to Support Vector Machines and Other Kernel-Based Learning Methods (Cambridge University Press, 2000)
4. T. Chen, R. Xu, Y. He, X. Wang, Improving sentiment analysis via sentence type classification using BiLSTM-CRF and CNN. Expert Syst. Appl. 72, 221–230 (2017)
5. J. Devlin, M. Chang, K. Lee, K. Toutanova, BERT: pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805 (2018)
6. K. Indhuja, R.P.C. Reghu, Fuzzy logic based sentiment analysis of product review documents, in 2014 First International Conference on Computational Systems and Communications (ICCSC) (IEEE, 2014), pp. 18–22
7. http://ai.stanford.edu/~amaas/data/sentiment/, retrieved Mar 2021
8. https://www.yelp.com/dataset, retrieved Mar 2021
9. K. Chakraborty, S. Bhatia, S. Bhattacharyya, J. Platos, R. Bag, A.E. Hassanien, Sentiment analysis of COVID-19 tweets by deep learning classifiers—a study to show how popularity is affecting accuracy in social media. Appl. Soft Comput. 97, 106754 (2020)
10. K. Chakraborty, S. Bhattacharyya, R. Bag, A survey of sentiment analysis from social media data. IEEE Trans. Comput. Soc. Syst. 7(2), 450–464 (2020). https://doi.org/10.1109/TCSS.2019.2956957
11. A. Kar, D.P. Mandal, Finding opinion strength using fuzzy logic on web reviews. Int. J. Eng. Ind. 2(1), 37–43 (2011)
12. P. Bedi, P. Khurana, Sentiment analysis using fuzzy-deep learning, in Proceedings of ICETIT 2019 (Springer, Cham, 2020), pp. 246–257
13. C. Jefferson, H. Liu, M. Cocea, Fuzzy approach for sentiment analysis, in 2017 IEEE International Conference on Fuzzy Systems (FUZZ-IEEE) (IEEE, 2017), pp. 1–6 14. M. Anagha, R.R. Kumar, K. Sreetha, P.C. Reghu Raj, Fuzzy logic based hybrid approach for sentiment analysis of Malayalam movie reviews, in 2015 IEEE International Conference on Signal Processing, Informatics, Communication and Energy Systems (SPICES) (IEEE, 2015), pp. 1–4 15. Sugiyarto, J. Eliyanto, N. Irsalinda, M. Fitrianawati, Fuzzy sentiment analysis using convolutional neural network, in AIP Conference Proceedings, vol. 2329(1) (AIP Publishing LLC, 2021), p. 050002 16. S. Alaparthi, M. Mishra, BERT: a sentiment analysis odyssey. J. Mark. Anal. 9(2), 118–126 (2021) 17. C. Du, H. Sun, J. Wang, Q. Qi, J. Liao, Adversarial and domain-aware BERT for cross-domain sentiment analysis, in Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics (2020), pp. 4019–4028 18. A. Karimi, L. Rossi, A. Prati, Adversarial training for aspect-based sentiment analysis with BERT, in 2020 25th International Conference on Pattern Recognition (ICPR) (IEEE, 2021), pp. 8797–8803 19. E. Bataa, J. Wu, An investigation of transfer learning-based sentiment analysis in Japanese. arXiv preprint arXiv:1905.09642 (2019) 20. J. Lehečka, J. Švec, P. Ircing, L. Šmídl, BERT-based sentiment analysis using distillation, in International Conference on Statistical Language and Speech Processing (Springer, Cham, 2020), pp. 58–70 21. R. Man, K. Lin, Sentiment analysis algorithm based on BERT and convolutional neural network, in 2021 IEEE Asia-Pacific Conference on Image Processing, Electronics and Computers (IPEC) (2021). https://doi.org/10.1109/ipec51340.2021.9421110 22. G. Pang, L. Keda, X. Zhu, J. He, Z. Mo, Z. Peng, B. Pu, Aspect-level sentiment analysis approach via BERT and aspect feature location model. Wirel. Commun. Mob. Comput. (2021) 23. S. Alaparthi, M. Mishra, BERT: a sentiment analysis odyssey. J. Mark. Anal. 9(2), 118–126, June (2021), Palgrave Macmillan 24. S. Agrawal, S. Dutta, B. Patra, Sentiment analysis of short informal text by tuning BERT-BiLSTM model, in IEEE EUROCON 2021–19th International Conference on Smart Technologies (IEEE, 2021), pp. 98–102 25. A. Mitra, Sentiment analysis using machine learning approaches (Lexicon based on movie review dataset). J. Ubiquit. Comput. Commun. Technol. (UCCT) 2(03), 145–152 (2020) 26. Z. Lu, C. Liangliang, Y. Zhang, C. Chung-Cheng, J. Fan, Speech sentiment analysis via pre-trained features from end-to-end ASR models, in ICASSP 2020–2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (IEEE, 2020), pp. 7149–7153 27. A. Sadollah, Introductory chapter: which membership function is appropriate in fuzzy system? in Fuzzy Logic Based in Optimization Methods and Control Systems and Its Applications. IntechOpen (2018) 28. I.A. Hameed, Using Gaussian membership functions for improving the reliability and robustness of students' evaluation systems. Expert Syst. Appl. 38(6), 7135–7142 (2011) 29. K. Wang, Computational intelligence in agile manufacturing engineering, in Agile Manufacturing: The 21st Century Competitive Strategy, Oxford, UK: Elsevier Science Ltd (2001), pp. 297–315 30. R. Sharma, D. Mondal, P. Bhattacharyya, A comparison among significance tests and other feature building methods for sentiment analysis: A first study, in Computational Linguistics and Intelligent Text Processing, ed. by A. Gelbukh (Springer International Publishing, Cham, 2018), pp. 3–19
Mayfly Algorithm-Based PID Controller for LFC of Multi-sources Single Area Power System T. Muthukumar, K. Jagatheesan, and Sourav Samanta
Abstract This proposed research work addresses the LFC of a multi-source single area power generating unit with a proportional–integral–derivative (PID) regulator. The multi-source power generating unit comprises thermal, hydro, and gas (THG) power generating plants. In this work, the mayfly algorithm (MA) technique is adopted to optimize the gain parameters of the PID regulator. The integral time absolute error (ITAE) objective function is considered at the time of gain tuning. The response of the proposed technique is compared with genetic algorithm (GA)- and particle swarm optimization (PSO)-tuned PID controllers under identical conditions to prove the superiority of the proposed controller. During the investigation, a 1% step load perturbation (SLP) is considered. Finally, the improvement clearly shows that the proposed MA-PID controller is superior to the GA- and PSO-tuned controllers in terms of quick settling time and minimal overshoot and undershoot under unexpected loading situations. Keywords Mayfly algorithm · Mating trajectory · Regulatory controller · Stabilized frequency · Particle swarm optimization
T. Muthukumar (B)
Department of EEE, Kongunadu College of Engineering and Technology, Trichy, India
K. Jagatheesan
Department of EEE, Paavai Engineering College, Namakkal, India
S. Samanta
Department of CSE, University Institute of Technology, The University of Burdwan, Burdwan, West Bengal, India
1 Introduction
In modern day-to-day life, everyone expects a comfortable life, and electrical energy plays a major role in living a comfortable and peaceful life. However, maintaining the quality of electrical power is difficult due to continuously varying load demand. In this regard, the output voltage along with the frequency of the generated power supply deviates from its predetermined value. Voltage and frequency play a major role in maintaining power system stability. When a rapid load demand
occurs, the system's power quality and stability are impacted. To overcome these issues, many researchers have utilized optimization techniques and adopted secondary controllers. The literature review for LFC is as follows: To suppress frequency oscillations during unexpected loading situations in an interconnected thermal power network, a hybrid many optimizing liaisons gravitational search algorithm-tuned fuzzy PID controller is employed as an auxiliary controller in [1]. The authors in [2] adopted a flower pollination algorithm (FPA)-based PID controller to overcome LFC issues in an interconnected power network with nonlinearity components such as GDB. The firefly algorithm (FA) is involved in the gain parameter tuning of a PID controller employed as an auxiliary regulator for an interconnected reheated thermal power plant (PP) in [3] to solve the AGC problem. For the modern complex power system (PS), an ICA-tuned fuzzy fractional-order integral derivative plus filter (CF-FOIDF) controller is designed and implemented in [4] to overcome the frequency oscillation problem in a two-area interconnected PS incorporating electric vehicles (EVs). An ant colony optimization (ACO)-tuned PID controller is implemented in [5] as an auxiliary controller for a single area nuclear PS, and its performance is compared with the classical method. An FPID controller is employed for the LFC of a multi-area PS under unpredicted load demand, and the impact of a superconducting magnetic energy storage (SMES) device is also analyzed in [6]. A microgrid PS is examined with a secondary PID controller optimized by the PSO technique; to prove the improvement, the results of the proposed optimization method are compared with the classical tuning method in [7]. A two-area PS incorporating thermal and hydro PPs has undergone LFC examination in [8] with the help of an auxiliary fuzzy logic controller; the controller is tuned by the modified Jaya optimization algorithm (MJOA), and the final result is compared with various optimization techniques and controllers. An adaptive neuro-fuzzy system controller is appointed as an auxiliary controller in [9] for a multi-area thermal PS to control and maintain the frequency oscillation of the system when sudden load demand occurs. A one-area PS with THG, nuclear, and energy storage units is examined in [10] with a PSO-PID controller to analyze the behavior of the suggested optimization technique. A grasshopper optimization algorithm-based PID controller is designed and implemented in [11] for a two-area interconnected AC microgrid PS to solve the LFC problem. A distributed model predictive control (DMPC) scheme is implemented for the LFC of a thermal PP with GRC in [12]. A quasi-oppositional gray wolf optimization algorithm (QOGWO)-based PID controller is implemented in [13] for solving the frequency oscillation problem of a multi-area thermal PS; to prove the effectiveness of the proposed technique, the result is compared with other popular optimization techniques. A PI controller based on the social foraging behavior of Escherichia coli bacteria is utilized to eradicate the frequency deviation in the system, and its performance is compared with GA to show that the proposed technique is better in [14]. The literature review on LFC/AGC of PSs clearly shows that controlling and maintaining the system frequency oscillation and stability is a difficult task.
To perform this task effectively, various secondary controllers have been implemented, and many computing algorithms have been utilized to optimize the gain parameters of the controller. Guided by the literature review, in this research work the PID regulator is proposed as a subordinate controller, and MA is employed to tune the gain parameters of the controller for the proposed PS with THG. The literature survey also clearly shows that system performance is affected during unexpected sudden loads, because of which the time-domain parameters of the system response deviate from their predetermined values. The novelty of this article is that MA is applied for the first time to optimize PID controller gain parameters.
1.1 Highlights of the Research Work • To develop a Simulink model of the proposed PS consisting of THG. • The proper selection of the secondary controller and objective function. • Utilization of the proposed MA-based optimization technique to optimize the controller gain parameters. • To prove the supremacy of the proposed optimization technique, its performance is compared with GA and PSO.
1.2 Structure of the Article In this research, the introduction and literature review are discussed in Sect. 1. Section 2 contains the system development and transfer functions. The controller design is explained in detail in Sect. 3. In Sect. 4, the objective function selection process and the need for tuning are discussed. The result and performance of the proposed system are analyzed in Sect. 5. The conclusion of the research work is given in Sect. 6.
2 System Investigated
The proposed PS consists of three different power generating units, namely THG PPs. For performance analysis of the projected power system, a mathematical model was developed with the help of the following transfer functions, given in Eqs. 1–10.
Thermal system:

Governor = \frac{1}{s T_{sg} + 1}   (1)

Reheater = \frac{K_r T_r s + 1}{T_r s + 1}   (2)

Turbine = \frac{1}{s T_t + 1}   (3)

Hydro-system:

Governor = \frac{1}{s T_{gh} + 1}   (4)

Drop compensation = \frac{T_{rs} s + 1}{T_{rh} s + 1}   (5)

Penstock turbine = \frac{-T_w s + 1}{0.5 T_w s + 1}   (6)

Gas turbine system:

Valve positioner = \frac{1}{B_g s + C_g}   (7)

Speed governor = \frac{X_g s + 1}{Y_g s + 1}   (8)

Combustion reaction = \frac{-T_{CR} s + 1}{T_F s + 1}   (9)

Compressor discharge = \frac{1}{s T_{CD} + 1}   (10)
With the help of the above-mentioned transfer functions, a system was developed for the investigation of frequency regulation of a single area multi-source PS, as given in Fig. 1 [15, 16]. The variable names used in the above equations (Eqs. 1–10) and the nominal values for the transfer functions are given in the Appendix.
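For readers without access to Simulink, the thermal path of Eqs. 1–3 can be sketched in Python with SciPy using the nominal values from the Appendix; this is an illustrative stand-in for the authors' MATLAB/Simulink model, not a reproduction of it.

```python
import numpy as np
import scipy.signal as sig

# Nominal time constants from the Appendix.
Tsg, Kr, Tr, Tt = 0.08, 0.3, 10.0, 0.3

# Cascade of Eqs. (1)-(3): governor * reheater * turbine.
num = np.polymul([1.0], np.polymul([Kr * Tr, 1.0], [1.0]))
den = np.polymul([Tsg, 1.0], np.polymul([Tr, 1.0], [Tt, 1.0]))
thermal_path = sig.TransferFunction(num, den)

# Open-loop step response of the thermal generation path.
t, y = sig.step(thermal_path, N=500)
print(f"steady-state output ~ {y[-1]:.3f}")
```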
3 Controller Design The PID controller is a closed-loop controller that uses feedback to achieve the desired result. It is widely used in industrial control and other applications. In practice, the control function is automatically corrected in a precise and responsive manner, and the PID principle is now widely employed in applications that require precise and efficient automatic control. The mathematical function of the PID regulator is given
Fig. 1 Simulink model of proposed THG PS
Fig. 2 Typical assembly of PID controller
in Eq. 11 [5]. The basic functional block of the PID controller is given in Fig. 2.

G(s) = K_P + \frac{K_I}{s} + K_D s   (11)

where G(s) is the transfer function of the PID controller, K_P the proportional controller gain, K_I the integral controller gain, and K_D the derivative controller gain.
3.1 Objective Functions
A cost function is the measure of efficiency used to mitigate errors in the response and maximize the performance of the system during an emergency situation. It accounts for power consumption, built-in error, and deviation from a reference signal value. The cost function is a functional equation that maps several points of a time series to a single scalar value; the ITAE cost function is one such function. Cost functions are used to find optimal controller gain values that achieve the desired power quality and a minimum performance index in the power system. The mathematical expression of the ITAE objective function is given in Eq. 12.

J_{ITAE} = \int t \cdot |ACE| \, dt   (12)
where J = Performance index, t = Simulation time, ACE = Area Control Error.
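As a hedged illustration, Eq. 12 can be evaluated numerically from a sampled error trace as follows; the ACE signal below is an invented decaying oscillation, not simulation output from the paper.

```python
import numpy as np

def itae(t, ace):
    """Numerically integrate Eq. (12): J = integral of t * |ACE| dt."""
    dt = t[1] - t[0]                      # uniform sampling assumed
    return float(np.sum(t * np.abs(ace)) * dt)

# Hypothetical area control error trace (a decaying oscillation).
t = np.linspace(0.0, 25.0, 2501)
ace = -0.01 * np.exp(-0.3 * t) * np.cos(2.0 * t)
print(f"J_ITAE = {itae(t, ace):.5f}")
```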
4 Mayfly Algorithm
MA is used in this research to tune the gain parameters of the PID controller for the single area multi-source PS. MA combines the major advantages of the PSO, GA, and FA techniques, and it can be seen as a modified version of the PSO technique, which was proposed by Kennedy and Eberhart in 1995. The social behavior of mayflies, specifically their mating process, has attracted many researchers to develop MA. A mayfly is a type of insect that lives near water and survives only for a brief period as an adult. Male flies generally have much larger eyes than females, and they have claspers beneath their tails that are used to hold the female while they fly and mate. In this work, a solution to the problem is represented by the position of each mayfly in a given search space. The proposed MA works as follows: male and female mayflies are initially generated randomly, i.e., each mayfly is located randomly in the problem space as a candidate solution represented by a d-dimensional vector (X = (X1, ..., Xd)), and its effectiveness is evaluated with a predefined cost function f(x). The velocity of each mayfly determines the change in its position and flying direction, and the flying direction represents the dynamic interaction of individual and social mayfly experiences. In particular, each mayfly adjusts its trajectory toward its best personal position so far (pbest), as well as the global best position attained by any
mayfly in the swarm so far (gbest) [17]. The functional flowchart of the MA is given in Fig. 3. Functional steps of MA are as follows [17]: Step 1: Initialize the male and female mayfly population. Step 2: Update the velocity and solution. Step 3: Order the mayflies. Fig. 3 Functional flowchart of MA
Step 4: Mate the mayflies and appraise the offspring. Step 5: Replace the poorest solutions with the finest new solutions. The optimized controller gain parameters for LFC of the proposed power system are reported in Table 1.

Table 1 Optimized PID controller gain values
Optimization technique | P controller gain | I controller gain | D controller gain | Fitness value
GA  | 0.8000 | 0.9796 | 0.2599 | 0.0743
PSO | 0.8120 | 0.9996 | 0.1138 | 0.0716
MA  | 0.6581 | 0.9999 | 0.0627 | 0.0697
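The sketch below is a deliberately simplified, illustrative Python rendition of these steps (based on the MA formulation in [17]); the fitness function is an invented surrogate, whereas in the paper each fitness evaluation would be an ITAE value returned by a Simulink run of the THG plant.

```python
import numpy as np

def fitness(x):
    # Surrogate cost standing in for an ITAE value from a plant simulation;
    # the target gains below are invented for illustration only.
    return float(np.sum((x - np.array([0.66, 1.0, 0.06])) ** 2))

rng = np.random.default_rng(0)
dim, n, iters = 3, 20, 100               # [Kp, Ki, Kd] searched in [0, 1]
a1, a2, beta, damp, walk = 1.0, 1.5, 2.0, 0.9, 0.05

males, females = rng.random((n, dim)), rng.random((n, dim))
vm, vf = np.zeros((n, dim)), np.zeros((n, dim))
pbest = males.copy()

for _ in range(iters):
    fm = np.array([fitness(x) for x in males])
    ff = np.array([fitness(x) for x in females])
    gbest = pbest[np.argmin([fitness(p) for p in pbest])]
    for i in range(n):
        # Step 2: velocity update -- males are drawn toward pbest/gbest.
        rp = np.sum((males[i] - pbest[i]) ** 2)
        rg = np.sum((males[i] - gbest) ** 2)
        vm[i] = (damp * vm[i]
                 + a1 * np.exp(-beta * rp) * (pbest[i] - males[i])
                 + a2 * np.exp(-beta * rg) * (gbest - males[i]))
        # Females fly toward better males, otherwise random-walk.
        rmf = np.sum((females[i] - males[i]) ** 2)
        if ff[i] > fm[i]:
            vf[i] = damp * vf[i] + a2 * np.exp(-beta * rmf) * (males[i] - females[i])
        else:
            vf[i] = damp * vf[i] + walk * (rng.random(dim) - 0.5)
    males = np.clip(males + vm, 0.0, 1.0)
    females = np.clip(females + vf, 0.0, 1.0)
    # Steps 3-5: rank, mate the best pair, replace the worst solutions.
    om, of = np.argsort(fm), np.argsort(ff)
    L = rng.random(dim)                   # crossover weights
    males[om[-1]] = L * males[om[0]] + (1 - L) * females[of[0]]
    females[of[-1]] = L * females[of[0]] + (1 - L) * males[om[0]]
    for i in range(n):
        if fitness(males[i]) < fitness(pbest[i]):
            pbest[i] = males[i]

print("best gains found [Kp, Ki, Kd]:", np.round(gbest, 4))
```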
5 System Performance Analysis
The proposed Simulink model of the single area THG PS is tested in the MATLAB working platform by applying a 1% load disturbance. While applying the 1% SLP, the system frequency deviates from its standard value; that is, when the system is under no load disturbance the system is stable, but when the disturbance is applied, the frequency oscillates and the stability of the system is disturbed. The frequency deviation of the proposed THG PS is plotted in Fig. 4 (del F in Hz versus time in s for the GA-PID, PSO-PID, and MA-PID controllers). The time-domain parameters of the frequency deviation in Fig. 4 are reported in Table 2.
Fig. 4 Frequency deviation comparison of THG
Table 2 Time domain-specific parameters of frequency deviation of single area THG PS
Optimization technique | Ts (s) | Os (Hz) | Us (Hz)
GA  | 60 | 1 × 10⁻⁴   | −6.2 × 10⁻³
PSO | 58 | 0.5 × 10⁻⁴ | −7.2 × 10⁻³
MA  | 46 | 0.2 × 10⁻⁴ | −10 × 10⁻³
Fig. 5 Bar chart comparison of settling time (settling time in seconds for the GA, PSO, and MA techniques)
The proposed system is investigated using the MA-PID controller, and the performance is analyzed both graphically and numerically in Fig. 4 and Table 2, respectively. From the analysis, the MA-based optimization technique performs better than GA and PSO for the proposed THG PS. To confirm the effectiveness of the proposed technique-based PID controller, a bar chart comparison of settling time is plotted in Fig. 5. The percentage improvement of the proposed optimization technique (MA) over the other techniques (GA and PSO) is clearly demonstrated: the improvement of PSO over GA is 3.33%, of MA over GA is 23.33%, and of MA over PSO is 20.67%. From this performance analysis, the proposed technique performs better than the others (Table 3).
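As a quick arithmetic check (not from the paper), the percentages above follow directly from the settling times in Table 2; a minimal sketch:

```python
ts = {"GA": 60.0, "PSO": 58.0, "MA": 46.0}   # settling times (s), Table 2

def improvement(base, new):
    """Percentage reduction in settling time relative to `base`."""
    return 100.0 * (ts[base] - ts[new]) / ts[base]

print(f"PSO over GA:  {improvement('GA', 'PSO'):.2f}%")   # 3.33%
print(f"MA  over GA:  {improvement('GA', 'MA'):.2f}%")    # 23.33%
print(f"MA  over PSO: {improvement('PSO', 'MA'):.2f}%")   # 20.69% (Table 3: 20.67%)
```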
Table 3 % Improvement of MA over GA and PSO
Optimization technique | Ts (s) | % Improvement (over GA) | % Improvement (over PSO)
GA  | 60 | –     | −3.33
PSO | 58 | 3.33  | –
MA  | 46 | 23.33 | 20.67

6 Conclusion
The frequency stabilization of a single area multi-source PS was developed and studied. In this proposed work, the analyzed single area power generating unit comprises THG PP units along with a PID controller. The proposed MA optimization technique is implemented to optimize the gain parameters of the PID regulator in the investigated system, and its supremacy is proved by comparing it with GA- and PSO-tuned controller responses. In this work, the ITAE objective function was utilized to optimize the gain values during the optimization process. The result comparison clearly shows that the proposed MA-PID controller delivers better controlled performance than the GA and PSO techniques during emergencies, with fast settling of frequency oscillations and minimal peak overshoot values.
Appendix [15, 16]
P_rt = 2000 MW; P_L = 1840 MW.
Speed regulator gain (R_1,2,3) = 2.4 Hz/p.u.
Turbine time constant (T_T) = 0.3 s.
Reheater gain constant (K_R) = 0.3.
Reheater time constant (T_R) = 10 s.
Governor time constant (T_SG) = 0.08 s.
Power system gain constant (K_PS) = 120 Hz/p.u. MW.
Power system time constant (T_PS) = 20 s.
Hydro-turbine time constant (T_W) = 1 s.
Compensator time constant (T_RS) = 5 s.
Drop compensator time constant (T_RH) = 28.75 s.
Gas turbine time constant (T_GH) = 0.2 s.
Gas speed governor time constants (X_C, Y_C, c_g, b_g) = 0.6, 1, 1, 0.05 s.
Combustion reactor time constants (T_F, T_CR, T_CD) = 0.23, 0.01, 0.2 s.
Thermal PP gain constant (K_T) = 0.543478.
Hydro PP gain constant (K_H) = 0.326084.
Gas PP gain constant (K_G) = 0.130438.
References 1. P. Mohanty, R.K. Sahu, S. Panda, A novel hybrid many optimizing liaisons gravitational search algorithm approach for AGC of power systems. Automatika 61(1), 158–178 (2020) 2. K. Jagatheesan, B. Anand, S. Samanta, Flower pollination algorithm tuned PID controller for multi-source interconnected multi-area power system, in Applications of Flower Pollination Algorithm and its Variants (2021), p. 221 3. K. Naidu, H. Mokhlis, A.H.A. Bakar, V. Terzija, H.A. Illias, Application of firefly algorithm with online wavelet filter in automatic generation control of an interconnected reheat thermal power system. Int. J. Electr. Power Energy Syst. 63, 401–413 (2014) 4. Y. Arya, Effect of electric vehicles on load frequency control in interconnected thermal and hydrothermal power systems utilising CF-FOIDF controller. IET Gener. Transm. Distrib. 14(14), 2666–2675 (2020) 5. B. Dhanasekaran, S. Siddhan, J. Kaliannan, Ant colony optimization technique tuned controller for frequency regulation of single area nuclear power generating system. Microprocess. Microsyst. 73, 102953 (2020) 6. M.R.I. Sheikh, S.M. Muyeen, R. Takahashi, T. Murata, J. Tamura, Application of self-tuning FPIC to AGC for load frequency control in multi-area power system, in 2009 IEEE Bucharest PowerTech (IEEE, 2009), pp. 1–7 7. D. Boopathi, S. Saravanan, K. Jagatheesan, B. Anand, Performance estimation of frequency regulation for a micro-grid power system using PSO-PID controller. Int. J. Appl. Evol. Comput. (IJAEC) 12(2), 36–49 (2021) 8. C. Pradhan, C.N. Bhende, Online load frequency control in wind integrated power systems using modified Jaya optimization. Eng. Appl. Artif. Intell. 77, 212–228 (2019) 9. A. Pappachen, A.P. Fathima, Load frequency control in deregulated power system integrated with SMES–TCPS combination using ANFIS controller. Int. J. Electr. Power Energy Syst. 82, 519–534 (2016) 10. V. Kumarakrishnan, G. Vijayakumar, D. Boopathi, K. Jagatheesan, S. Saravanan, B. Anand, Optimized PSO technique based PID controller for load frequency control of single area power system. Solid State Technol. 63(5), 7979–7990 (2020) 11. D.K. Lal, A.K. Barisal, M. Tripathy, Load frequency control of multi area interconnected microgrid power system using grasshopper optimization algorithm optimized fuzzy PID controller, in 2018 Recent Advances on Engineering, Technology and Computational Sciences (RAETCS) (IEEE, 2018), pp. 1–6 12. Y. Zhang, X. Liu, B. Qu, Distributed model predictive load frequency control of multi-area power system with DFIGs. IEEE/CAA J. Autom. Sin. 4(1), 125–135 (2017) 13. D. Guha, P.K. Roy, S. Banerjee, Load frequency control of large-scale power system using quasi-oppositional grey wolf optimization algorithm. Eng. Sci. Technol. Int. J. 19(4), 1693– 1713 (2016) 14. E.S. Ali, S.M. Abd-Elazim, Bacteria foraging optimization algorithm-based load frequency controller for interconnected power system. Int. J. Electr. Power Energy Syst. 33(3), 633–638 (2011) 15. S.K. Sinha, R.N. Patel, R. Prasad, Application of GA and PSO tuned fuzzy controller for AGC of three area thermal-thermal-hydro power system. Int. J. Comput. Theory Eng. 2(2), 238 (2010)
16. B. Mohanty, S. Panda, P.K. Hota, Controller parameters tuning of differential evolution algorithm and its application to load frequency control of multi-source power system. Int. J. Electr. Power Energy Syst. 54, 77–85 (2014) 17. K. Zervoudakis, S. Tsafarakis, A mayfly optimization algorithm. Comput. Ind. Eng. 145, 106559 (2020)
Group Key Management Techniques for Secure Load Balanced Routing Model Praveen Bondada and Debabrata Samanta
Abstract Wireless sensor networks (WSNs) play a vital part in providing real-time data access for IoT applications. However, their open deployment, energy constraints, and lack of centralized administration make WSNs highly vulnerable to various kinds of malicious attacks. In WSNs, recognizing malicious sensor devices and discarding their sensed data play a vital role in mission-critical applications. Standard cryptography and authentication schemes cannot be directly used in WSNs because of the resource-constrained nature of sensor devices. Therefore, an energy-efficient and low-latency procedure is needed to minimize the impact of malicious sensor devices. This research work presents a secure and load-balanced routing (SLBR) scheme for heterogeneous cluster-based WSNs. SLBR uses a superior trust-based security metric that overcomes the problem of sensors oscillating between good and bad states, and it additionally balances load among cluster heads (CHs). In this way, it helps attain better security, packet transmission, and energy efficiency performance. Experiments are conducted to compare the performance of the proposed SLBR model with the existing trust-based routing model ECSO. The results show that SLBR achieves better performance than ECSO in terms of energy efficiency (i.e., network lifetime considering first sensor device death and total sensor device death), communication overhead, throughput, packet processing latency, and malicious sensor device misclassification rate and identification. Keywords Wireless sensors · Group key · Cryptography · Ad hoc networks
P. Bondada (B) · D. Samanta
Dayananda Sagar Research Foundation, University of Mysore (UoM), Mysore, Karnataka, India
e-mail: [email protected]
D. Samanta
Department of Computer Science, CHRIST Deemed to be University, Bengaluru, India
1 Introduction
Networks of wireless sensors support various applications in crucial fields, such as the detection of chemical or biological weapons or the monitoring of enemy vehicles. The use of inaccurate or deliberately damaged data in these essential applications might have devastating effects [1]. Security services are necessary if such networks' important information is gathered and processed, to assure authenticity, confidentiality, freshness, and integrity. To provide these security services, entity authentication and key management that are resilient to external attack and to the failure of sensors are needed. Because of their inherent characteristics, WSNs can be deployed in practically any environment, including hostile ones; however, they are vulnerable to various threats that can cause unreliable transmission [2, 3]. Effective security can deter digital thieves from exploiting a poorly configured terminal to destroy data, or from stealing data through an unauthorized access point in order to defraud end-users. Hence, to provide better security, an effective security model must be designed. This research addresses a number of issues relating to authentication, integrity, confidentiality, and certification; moreover, because of the distributed nature of WSNs, conventional techniques fail badly here. An intrusion detection system (IDS) recognizes anomalous sensor device behavior or the specific features associated with a sensor device. Trust-based IDS models have gained popularity and have shown essential progress in achieving effective protection from internal attacks in peer-to-peer networks as well as WSNs [4]. IDSs have many uses in WSNs; secure routing is one of them, where the routing algorithm picks the most secure paths based on the trust evaluation of neighboring sensor devices. Earlier work presented a trust model for WSNs whose structure utilizes distributed reputation and identifies faulty sensor devices from the transactional data among neighboring sensor devices [5, 6]. A node determines the status of its neighbor sensor by constructing assessments that reflect the neighbor sensor's assistance in transmitting high-quality data, or the node's ability to transfer information reliably throughout the sensor network. Trust is not considered in the recommendations, so such schemes cannot recognize various insider attacks. Other work implemented an IDS using QoS trust and social trust, which helps in deriving the trust metric; however, the evaluation of the attributes depends on the occurrence of direct interactions among the nodes, and this can be readily deceived by dishonest sensor devices that stay within the limits of everyday communication [7]. With the advent of the Web, a massive amount of data is being created, posing various security risks and challenges. By and large, adopting an existing IDS will lead to a larger number of packets being dropped under heavy traffic load, and IDSs become considerably more intricate in such cases in the big data environment. Thus, trust evaluation using only standard packet-based status information becomes extremely difficult when the data is very large, as the effectiveness of the trust computation can
be significantly reduced [8]. There is a need for an effective trust computation method that offers meaningful trade-offs against insider attacks in the big data environment.
2 Literature Survey
This section gives a thorough overview of the many existing secure routing schemes for large sensor networks. Various trust-based approaches for addressing quality of service (QoS) and security needs of big data applications have recently been presented. A Bayesian-based trust computation model for dynamic routing in WSNs was presented to address the above-identified problems: malicious sensor devices are identified using trust thresholding limits, and reliability is computed using packet status information. The Exponential-based Trust and Reputation Evaluation Scheme (ETRES) was presented to offer security for large sensor networks; the entropy approach is used to assess direct trust (DT), and indirect trust (IDT) improves communication and provides a higher level of confidence. Eschenauer and Gligor [9] proposed a key management scheme based on probabilistic key sharing across random network nodes. There are three steps in key distribution: key pre-distribution, shared-key discovery, and path-key setup. Each node picks k keys randomly from a key pool during the key pre-distribution stage. Each sensor node learns the shared keys within its transmission range during the shared-key discovery stage. In the path-key setup phase, if two nodes have no shared keys, a shared key is established through two or more links. In this scheme, the key ring is revoked once a node is compromised, and these keys must also be deleted from the other nodes. Such a model can restrain DoS attacks with negligible computation cost; however, checking each sensor device to approve a signature introduces significant delay [10].
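The Eschenauer–Gligor steps described above can be illustrated with a small Python sketch; the pool size and ring size are hypothetical values, and key identifiers stand in for actual keys.

```python
import random

POOL_SIZE, RING_SIZE = 10_000, 100       # |pool| and k (hypothetical values)
pool = range(POOL_SIZE)                   # key identifiers; real keys omitted

def key_ring():
    """Key pre-distribution: a node draws k distinct keys from the pool."""
    return set(random.sample(pool, RING_SIZE))

node_a, node_b = key_ring(), key_ring()

# Shared-key discovery: neighbors exchange key IDs and intersect them.
shared = node_a & node_b
if shared:
    print("direct secure link via shared key id:", min(shared))
else:
    print("no common key: run path-key setup through intermediate nodes")
```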
In [11], an improved beta model for recognizing malicious sensor devices is presented. The neighboring device is picked based on trust information during the communication stage, and the state of the neighboring sensor device is updated periodically; the model further improves the energy efficiency of WSNs, but energy and memory constraints are not considered in the trust model. In [12], issues of trust change are settled: trust is computed using both direct and indirect trust, but the trust validity of the model is not considered [13]. In [14], several current approaches consider resources and security independently; they also prove weak in their choice of hop nodes, which can cause data loss or require packets to be re-transmitted, resulting in considerably larger resource use. A multipath routing scheme for WSNs using the exponential cat swarm optimization (ECSO) technique improves packet transmission performance: optimal paths for forwarding packets are found considering several QoS bounds, such as link availability, distance, latency, load density, energy, and trust, and the cluster head with the maximum QoS bounds is used for multipath data transmission. However, because packets are routed to the sensor device with the best QoS boundary, the ECSO model achieves better packet transmission performance with security provisioning but creates overhead among CHs. A scheme for heterogeneous WSNs (the LLEECMP model) was designed and improved earlier, and building on it, this research utilizes a trust-based security model for heterogeneous sensor networks to overcome the research challenges identified above [15, 16].
3 Proposed Methodology
Mobile ad hoc networks are infrastructure-less networks in which the wireless nodes perform the task of routers as well as communication endpoints. The highly insecure underlying medium imposes many security threats on ad hoc networks. The applicability of such networks ranges from military operations and emergency disaster relief to community networking and communication (Fig. 1).
The shared medium exposes these networks to more security threats than their wired counterparts, and both active and passive attacks harm ad hoc networks more than wired ones. Active attacks such as unauthorized access, rogue access points, man-in-the-middle attacks, session hijacking, or denial of service, as well as passive eavesdropping, are inherently easy in these networks, and they penetrate all protocol layers. At the lower layers, various methods such as spread-spectrum techniques, frequency hopping, and interleaving are used for security. Higher-layer cryptography is a genuinely different approach to counter these threats [17].
Cryptography is one of the most investigated and widely deployed techniques [14] for providing security services. Numerous schemes relying on cryptography have been
Fig. 1 Cryptography-based hard security services
designed and implemented on wireless ad hoc networks. Cryptographic keys act as proof of authenticity, and their possession distinguishes legitimate users from malicious ones. The broad security components are displayed in the figures; key management and trust management are the two broad areas. In this paper, we have surveyed key management schemes: Sect. 2 gives a short outline of a few key management schemes, Sect. 3 provides a comparison of these protocols, and Sect. 4 concludes our examination.
4 Key Management Schemes
Previous research has revealed that key management for the security of an ad hoc network is a difficult task. Different factors, such as computational complexity, low capacity, node dynamicity, and a hazard-prone environment, render the MANET more vulnerable to security threats, and the lack of infrastructure further complicates the situation. Despite this, various key management frameworks for MANET security have been presented in previous years (Fig. 2).
Key management schemes are broadly classified in this work as symmetric, asymmetric, and hybrid. Symmetric key management is based on a private-key framework, which establishes common private keys for symmetric cryptography. Asymmetric key management is based on a public-key infrastructure, which provides key pairs for asymmetric cryptography, namely public and private keys. Hybrid key management schemes employ both symmetric and asymmetric cryptographic keys in different design stages. The symmetric key schemes are further divided into two categories: pre-distributed, in which the secret key is shared before deployment by a trusted third party, and key agreement, in which the participants agree on a common private
Fig. 2 Soft security services based on trust
Fig. 3 Key management classification schemes
key after a protocol run [18]. Furthermore, asymmetric key management schemes are divided into three categories: certificate-based, in which a third party certifies the authenticity of the client's public key; ID-based, in which the client's ID is used as the public key; and certificate-less, in which no third party is required and the client certifies its own public key (Fig. 3).
SPINS, a security protocol for sensor networks, is designed for wireless sensor networks. Pre-distribution of pairwise secret keys between nodes and the base station is assumed to occur before setup. The base station supplies the common key, encrypted with distinct individual keys, for secure communication between nodes. SPINS uses SNEP and μTESLA for information security and authenticated broadcasting, respectively. The main disadvantage of this strategy is that it relies on a secure routing protocol for communication between nodes and base stations.
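The μTESLA one-way key chain used by SPINS can be sketched as follows; the seed and chain length are hypothetical, and this is a toy illustration of the hash-chain idea rather than the full protocol (which also handles delayed key disclosure and time synchronization).

```python
import hashlib

def h(x: bytes) -> bytes:
    return hashlib.sha256(x).digest()

# Build the chain K_n, K_{n-1} = h(K_n), ..., K_0 = h^n(K_n).
n = 8
chain = [b"hypothetical-seed"]            # K_n
for _ in range(n):
    chain.append(h(chain[-1]))
chain.reverse()                           # chain[0] = K_0, the commitment

def verify(k_i: bytes, i: int, k_0: bytes) -> bool:
    """A receiver holding K_0 checks a disclosed key by hashing it back."""
    for _ in range(i):
        k_i = h(k_i)
    return k_i == k_0

print(verify(chain[3], 3, chain[0]))      # True: key is on the chain
```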
The localized encryption and authentication protocol (LEAP) was designed with static WSNs in mind. Several pre-installed keys for various operations are created in this design. For communication between sensor nodes and base stations, pre-distributed individual keys are used. A unique pre-shared group key ensures the security of broadcast messages from base stations. To construct pairwise keys between the nodes, a pre-installed network-wide key is used; establishing keys with one-hop trusted neighbors is an additional benefit [19]. For route recovery and one-way key-chain authentication, this scheme uses μTESLA. The major disadvantage of this strategy is that it assumes that the sink node is rarely compromised.
Deterministic key pre-distribution is required for a key pre-distribution scheme for general and grid-group deployment of wireless sensor networks. The deployment zone is divided into square regions. The square regions form the clusters, which contain two types of nodes: ordinary and special nodes. Special nodes have far more computational power and energy than ordinary nodes. Before deployment, two types of key pre-distribution are established: a symmetric key pre-distribution scheme is used between the special and ordinary nodes for intra-cluster communication, and a separate key pre-distribution scheme is used between the special nodes of clusters. Sensor nodes in a cluster communicate directly, whereas nodes in different square regions communicate through special nodes [17].
The central authority (CA) in a classical public key infrastructure (PKI) is responsible for all certification-related activities such as certificate generation, certificate revocation, and so on. However, due to the lack of a stable infrastructure and other centralized services, deploying such a certificate-based PKI in MANETs is difficult, although to overcome the issue of a single point of failure, a CA distributed among a pre-selected number of nodes might be used. The following are some of the most widely used PKI-based schemes.
The mobile certificate authority (MOCA) is a certificate-based asymmetric key strategy. It adheres to the guidelines of a distributed PKI. The central authority (CA) distributes a share of its private key to a group of n server nodes, n ≤ M, where M is the total number of nodes in the network. Because the threshold value for CA private-key reconstruction is k (1 ≤ k ≤ n), the key must be recovered from k server nodes. A server acting as a combiner collects the partial signatures and hands over a fully signed certificate.
Secure and Efficient Key Management (SEKM) is a MOCA enhancement in which all server nodes possessing a partial share of the CA private key form a server group. Servers are specific nodes that form a multicast group to provide the threshold component of the private key for certificate generation to service-requesting nodes.
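The k-of-n threshold idea behind MOCA and SEKM is essentially Shamir secret sharing; a minimal, toy-sized Python sketch (small prime field, illustrative values only) is shown below.

```python
import random

P = 2_147_483_647                         # toy prime field for illustration

def make_shares(secret, k, n):
    """Split `secret` so that any k of the n shares reconstruct it."""
    coeffs = [secret] + [random.randrange(P) for _ in range(k - 1)]
    f = lambda x: sum(c * pow(x, i, P) for i, c in enumerate(coeffs)) % P
    return [(x, f(x)) for x in range(1, n + 1)]

def reconstruct(shares):
    """Lagrange interpolation at x = 0 over the prime field."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num = den = 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = num * (-xj) % P
                den = den * (xi - xj) % P
        secret = (secret + yi * num * pow(den, P - 2, P)) % P
    return secret

shares = make_shares(123456789, k=3, n=5)   # CA key split over 5 servers
print(reconstruct(shares[:3]))              # any 3 shares recover the key
```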
5 Results and Discussion
The simulation results of the proposed MLKM protocol are compared with several state-of-the-art methods, described individually below. Standard evaluation metrics are used to assess the performance of the state-of-the-art methods. The entire MLKM protocol is implemented in the NS2 simulator on a computer with Ubuntu OS, 4 GB RAM, and an Intel i3 processor. Table 1 shows the simulation parameters.
Table 1 Parameters of simulation
Total number of nodes: 160
Size of area: 900 * 900
Range of transmission: 200 m
Time for simulation: 60 s
Source of traffic: CBR
Size of packet: 500
Initial energy: 8.7 J
Transmission power: 0.598 W
Receiving power: 0.381 W

Table 2 Comparative results
Performance metric | Probabilistic key | Three-factor authentication | EDAK | Proposed MLKM
Communication overhead | 0.172 | 0.257 | 0.301 | 0.112
Detection accuracy | 0.905 | 0.865 | 0.870 | 0.928
Key memory storage | 3.284 | 4.320 | 3.682 | 2.412
Energy | 13.705 | 13.829 | 13.259 | 14.289
The MLKM protocol simulation stage is carried out by initializing the parameters shown in Table 1. Two distinct scenarios are used to evaluate the MLKM protocol: (1) with attack and (2) without attack. Simulation for the first scenario is accomplished by introducing a selective packet drop attack as well as a black hole attack; in the second scenario, each node is free of attacks, giving a variety of network results. The comparative methods are described as follows: Table 2 presents the comparative discussion of the methods' performance against the proposed MLKM protocol, and Fig. 4 projects the comparison results.
The comparative model performance against the proposed MLKM protocol is examined in Table 2. The discussion suggests that the existing probabilistic key authentication scheme has values of 0.172 kb, 0.905, 3.284 kb, and 13.705 J, respectively, for communication overhead, detection accuracy, key memory storage, and energy. The proposed MLKM protocol achieves the overall best performance with values of 0.112, 0.928, 2.412, and 14.289 for communication overhead, detection accuracy, key memory storage, and energy.
Fig. 4 Comparison results
In Fig. 4, we compare the proposed results with the existing models. In this case, we obtained 0.112, 0.928, 2.412, and 14.289 for communication overhead, detection accuracy, key memory storage, and energy, respectively. In Fig. 5, we show only the results obtained with the proposed model.
Fig. 5 Proposed results
6 Future Scope
WSNs hold promise for many applications, yet security is often a significant problem. Even though efforts have been made to examine cryptography, key management, secure routing, secure data aggregation, and intrusion detection in WSNs, there are still several issues to be addressed. First, the appropriate cryptographic techniques are determined by the processing capability of sensor nodes, demonstrating that there is no one-size-fits-all solution for all sensor networks; security solutions are quite application-specific. Second, energy, computation capacity, memory, and communication bandwidth are all constraints for sensors, and WSN security design should meet these requirements. Third, many current protocols assume that the sensor nodes and the base station are permanently
installed. Nonetheless, there may be times when the base station and possibly the sensors should be mobile, such as in battlefield situations. The mobility of sensor nodes has a huge impact on sensor network topology, which raises many questions about secure routing protocols. We identify the following directions as part of future investigation of security hazards in WSNs.
Exploit the availability of public-key procedures on sensor nodes: recent research into public-key cryptography suggests that public-key operations in sensor nodes may become commonplace. However, private-key operations in a sensor node are still prohibitively expensive. Improving the efficiency of private-key operations on sensor nodes is highly desirable, since public-key cryptography can greatly simplify the design of security in WSNs.
Secure routing protocols for mobile sensor networks: sensor node mobility impacts sensor network topology and, as a result, routing protocols. Mobility can occur at the base station, the sensor nodes, or both. Current protocols assume the sensor network is fixed, so new secure routing protocols need to be developed for mobile sensor networks.
Sensor network security research currently focuses on discrete events such as temperature and humidity; continuous stream events such as video and images are not examined. WSN video and image sensors are not widely available now, but they will be in the future. There are considerable differences in validation and encryption between discrete and continuous events, implying that there will be trade-offs between continuous stream security and current WSN norms.
Security and quality of service: WSN performance deteriorates as security services increase. Security research in
WSNs currently focuses on specific topics such as key management, secure routing, secure data aggregation, and intrusion detection. In WSNs, both QoS and security services should be assessed at the same time.
7 Conclusion
In this paper, we briefly describe several protocols based on the key schemes that they follow. Compared to asymmetric and hybrid key schemes, symmetric key schemes have the least computational complexity. In contrast to other schemes, asymmetric key schemes have a high level of intrusion resistance, and hybrid key schemes offer a high level of versatility. This work proposes the MLKM key management protocol for secure data transfer over the WSN. The proposed MLKM protocol establishes a secure communication link between nodes and transfers encoded data over that link, ensuring security over data communication by providing a multi-level protection link. The MLKM protocol, explicitly designed for clustered WSNs, includes three phases: pre-deployment, key generation, and key validation and verification. By giving WSN nodes an identity and a key, the pre-deployment phase provides the identification task. The next step uses homomorphic encryption for key generation. In the last stage, a mathematical model is developed utilizing factors such as hashing functions, homomorphic encryption, dynamic passwords, profile sequence, varying numbers, and EX-OR operations for secure data transfer. The entire work is evaluated against various state-of-the-art methods using metrics such as memory, key storage, size, communication overhead, and bandwidth usage.
References 1. T. Zhang, X. Xu, L. Zhou, X. Jiang, J. Loo, Cache space efficient caching scheme for contentcentric mobile ad hoc networks. IEEE Syst. J. 13(1), 530–541 (2019) 2. H. Xu, Y. Zhao, L. Zhang, J. Wang, A bio-inspired gateway selection scheme for hybrid mobile ad hoc networks. IEEE Access 7, 61997–62010 (2019) 3. D.-G. Zhang, P.-Z. Zhao, Y.C. Lu, T.Z. Chen, W. Hao, A new method of mobile ad hoc network routing based on greed forwarding improvement strategy. IEEE Access 7, 158514–158524 (2019) 4. A. Guha, D. Samanta, A. Banerjee, D. Agarwal, A deep learning model for information loss prevention from multi-page digital documents. IEEE Access 1 (2021) 5. A. Bhardwaj, H. El-Ocla, Multipath routing protocol using genetic algorithm in mobile ad hoc networks. IEEE Access 8, 177534–177548 (2020) 6. T. Dbouk, A. Mourad, H. Otrok, H. Tout, C. Talhi, A novel ad-hoc mobile edge cloud offering security services through intelligent resource-aware offloading. IEEE Trans. Netw. Serv. Manage. 16(4), 1665–1680 (2019)
7. V.V. Paranthaman, Y. Kirsal, G. Mapp, P. Shah, H.X. Nguyen, Exploiting resource contention in highly mobile environments and its application to vehicular ad-hoc networks. IEEE Trans. Veh. Technol. 68(4), 3805–3819 (2019) 8. Y. Zhang, Y. Shi, F. Shen, F. Yan, L. Shen, Price-based joint offloading and resource allocation for ad hoc mobile cloud. IEEE Access 7, 62769–62784 (2019) 9. A.K. Biswal, D. Singh, B.K. Pattanayak, D. Samanta, M.-H. Yang, IoT-based smart alert system for drowsy driver detection. Wireless Commun. Mob. Comput. 2021, 1–13 (2021) 10. V. Gomathy, N. Padhy, D. Samanta, M. Sivaram, V. Jain, I.S. Amiri, Malicious node detection using heterogeneous cluster based secure routing protocol (HCBS) in wireless adhoc sensor networks. J. Ambient Intell. and Hum. Comput. 11(11), 4995–5001 (2020) 11. A. Hammamouche, M. Omar, N. Djebari, A. Tari, Lightweight reputation-based approach against simple and cooperative black-hole attacks for MANET. J. Inf. Secur. Appl. (2018) 12. R.R. Althar, D. Samanta, The realist approach for evaluation of computational intelligence in software engineering. Innov. Syst. Softw. Eng. 17(1), 17–27 (2021) 13. P. Sivakumar, R. Nagaraju, D. Samanta, M. Sivaram, M. Nour Hindia, I.S. Amiri, A novel free space communication system using nonlinear InGaAsP microsystem resonators for enabling power-control toward smart cities. Wirel. Netw. 26(4), 2317–2328 (2020) 14. A. Khamparia, P.K. Singh, P. Rani, D. Samanta, A. Khanna, B. Bhushan, An internet of health things-driven deep learning framework for detection and classification of skin cancer using transfer learning, in Transactions on Emerging Telecommunications Technologies, 2020 15. A. Guha, D. Samanta, Hybrid approach to document anomaly detection: an application to facilitate RPA in title insurance. Int. J. Autom. Comput. 18(1), 55–72 (2021) 16. M.S. Mekala, R. Patan, S.K. Hafizul Islam, D. Samanta, G.A. Mallah, S.A. Chaudhry, DAWM: Cost-Aware Asset Claim Analysis Approach on Big Data Analytic Computation Model for Cloud Data Centre, 2021 17. V.S. Devi, N.P. Hegde, Multipath security aware routing protocol for MANET based on trust enhanced cluster mechanism for lossless multimedia data transfer. Wireless Pers. Commun.: Int. J. 100(3), 923–940 (2018) 18. M. Malathi, S. Jayashri, Modified bi-directional routing with best afford path (MBRBAP) for routing optimization in MANET. Wirel. Pers. Commun. 90(2), 861–873 (2016) 19. S. Subramaniyan, W. Johnson, K. Subramaniyan, A distributed framework for detecting selfish nodes in MANET using record- and trust-based detection (RTBD) technique. EURASIP J. Wirel. Commun. Netw. 2014(1), 205 (2014)
Search Techniques for Data Analytics with Focus on Ensemble Methods Archana S. Sumant and Dipak V. Patil
Abstract A search technique plays an important role in selecting significant features from a huge feature set. As the dimensionality of data grows, there is growing demand from industries that analyze data for analytics purposes. The main challenge is processing the data in a short time and with high predictive accuracy. Only a few informative features are present in the data, and these are important for increasing classification performance. For this purpose, search strategies are applied on datasets to select relevant features. An ensemble method makes the best use of existing search techniques to improve classification accuracy. We have conducted a comprehensive evaluation of search strategies for use with ensemble methods in this study. The SU-R-SFS and SU-R-Stepwise ensemble algorithms are developed and tested on seven high-dimensional cancer datasets. It has been discovered that ensemble stepwise search outperforms sequential forward search for high-dimensional data by an average of 3.08%. Keywords High-dimensional datasets · Feature subset selection · Greedy search · Heuristic search · Hybrid search · Ensemble search
A. S. Sumant (B)
Department of Computer Engineering, MET's Institute of Engineering, Bhujbal Knowledge City, Adagaon, Nashik, Maharashtra, India
e-mail: [email protected]
D. V. Patil
Department of Computer Engineering, GES's R H Sapat College of Engineering Management Studies and Research, P T A Kulkarni Vidyanagar, Nashik, Maharashtra, India
e-mail: [email protected]
1 Introduction
Working with high-dimensional data for analysis is difficult in today's data-driven world. Many applications, such as genomic analysis, sensor network analysis, medical analysis, and text analysis, work with a lot of data. When compared to the number of features, high-dimensional data has a much smaller sample size. If the number of features is f and the number of samples is n, then high-dimensional data
is f >> n. Among this vast number of features, very few are important for analysis tasks. In the first step, filtering methods are used to rank these features; however, filtering methods have the disadvantage that they do not consider feature dependency. In the second step, search methods are used to find an optimal feature subset by considering feature dependencies. In the analysis of high-dimensional data, feature subset selection (FSS) is a vital step. FSS is used to reduce data dimensionality and improve algorithm performance; it reduces the overall execution time of the classification algorithm and provides a cost-effective model for analysis tasks. To deal with high-dimensional data, one can either reduce the dimensionality by applying dimensionality reduction techniques or select a subset of features. There are various search algorithms for selecting a subset of features, but no algorithm has been declared the best for feature selection in general. Prior art [1] compares FS algorithms and concludes that no single solution outperforms the others for all datasets. As a result, it is critical to keep offering the community new feature selection options and search algorithms in order to enhance the performance of existing ones. The search strategy in FSS contributes to reducing the time complexity and also to increasing the accuracy. In this paper, various search strategies used in FSS are reviewed.
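Since the SU-R ensembles evaluated here rank features with symmetric uncertainty, SU(X, Y) = 2·I(X; Y)/(H(X) + H(Y)), a small self-contained sketch of that measure on discretized data is given below (the toy feature and label values are invented).

```python
import numpy as np
from collections import Counter

def entropy(values):
    p = np.array(list(Counter(values).values()), dtype=float)
    p /= p.sum()
    return float(-np.sum(p * np.log2(p)))

def symmetric_uncertainty(x, y):
    """SU(X, Y) = 2 * I(X; Y) / (H(X) + H(Y)), normalized to [0, 1]."""
    hx, hy = entropy(x), entropy(y)
    mi = hx + hy - entropy(list(zip(x, y)))   # I(X;Y) = H(X)+H(Y)-H(X,Y)
    return 2.0 * mi / (hx + hy) if hx + hy > 0 else 0.0

# Toy discretized feature and class label.
feature = [0, 0, 1, 1, 2, 2, 0, 1]
label   = [0, 0, 1, 1, 1, 1, 0, 1]
print(round(symmetric_uncertainty(feature, label), 3))
```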
2 Search Strategies for FSS
Search strategies can be subdivided into exhaustive, greedy, heuristic, hybrid, and ensemble approaches.
Exhaustive approach: It searches the entire search space for the optimal subset. For HDD, it becomes infeasible to apply this type of search because the feature space is too large.
Greedy search: The greedy algorithm is the most popular of all search methods. A candidate subset of features is iteratively evaluated and compared with other subsets to improve upon the previous subset. There are two approaches: forward and backward. The forward process starts with an initially empty set, and the subset is checked by adding features one by one until the expected accuracy is achieved. In the backward approach, elimination is done by considering the whole feature set at the start, and features are then removed from the subset. As FSS is an NP-hard combinatorial problem, the greedy approach works well and is popular in FSS. Greedy search is easy and quick in solution generation; the problem is that it is susceptible to false starts and is not optimal [2].
Heuristic search: This finds a subset that gradually leads to a better, near-optimal solution. These methods are divided into trajectory-based and population-based approaches. A trajectory algorithm normally employs one solution at a time, gradually improving the
current solution as iterations progress by constantly modifying the current solution. Simulated annealing (SA) [3], particle swarm optimization (PSO) [4], and genetic algorithms (GA) [5, 6] are a few examples. Harmony search [7, 8] is a newly developed meta-heuristic; it has very limited mathematical requirements and is insensitive to initial value settings. Harmony search is popular because of its simple structure and powerful results and is used in many applications [9–13]. In [14], harmony search is used to solve the FSS problem. Symmetric uncertainty, which is commonly used for HDD, can be combined with PSO, and a new hybrid fitness function is employed to evaluate candidate solutions in order to find a better solution [15]. The tabu search method replaces the existing solution with a novel solution obtained from a neighborhood; tabu search is used along with GA for microarray data in [16].
Hybrid search: There are two basic steps in hybrid techniques. A filter method is used in the first phase to limit the number of features and, as a result, the search space. The subsets are explored in the second stage, which is a wrapper method. A new hybrid fitness function evaluates candidate solutions to find a better solution in [17], and tabu search combined with GA is applied to microarray data in [18].
Ensemble search: It is divided into a combination approach and a selection approach. The base classifiers produce their classifications, which are then used in the combination procedure to create the final classification; in the selection approach, one of the classifiers is picked, and its outcome is the final classification. Ensemble feature selection [19], genetic ensemble feature selection (GEFS) [20], ensemble forward sequential selection (EFSS) [21], and ensemble backward sequential selection (EBSS) [21] are some of the approaches to ensemble search. The major goal of this review is to explore the literature for ensemble search methods, as ensemble methods are popular and can be applied at various stages of feature selection. Our aim is also to experiment with ensemble search techniques on high-dimensional datasets. This review will help researchers in selecting appropriate ensemble search methods. Subsequent sections explain each category in detail; a plain sequential forward search, the greedy building block of approaches such as EFSS, is sketched below.
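The following hedged Python sketch implements greedy sequential forward search with cross-validated accuracy as the evaluation criterion; the synthetic dataset, classifier choice, and stopping rule are illustrative assumptions, not the SU-R-SFS configuration used in the experiments.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for a high-dimensional (f >> n) dataset.
X, y = make_classification(n_samples=60, n_features=50, n_informative=8,
                           random_state=0)

def sfs(X, y, max_features=5):
    """Greedy sequential forward search on mean 3-fold CV accuracy."""
    selected, remaining, best_score = [], list(range(X.shape[1])), 0.0
    while remaining and len(selected) < max_features:
        scored = []
        for f in remaining:
            cols = selected + [f]
            acc = cross_val_score(LogisticRegression(max_iter=500),
                                  X[:, cols], y, cv=3).mean()
            scored.append((acc, f))
        acc, f = max(scored)
        if acc <= best_score:          # stop when no candidate improves
            break
        best_score = acc
        selected.append(f)
        remaining.remove(f)
    return selected, best_score

features, score = sfs(X, y)
print("selected feature indices:", features, "CV accuracy:", round(score, 3))
```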
2.1 Ensemble Search Strategy
Ensemble techniques can be applied at the feature ranking level, at the feature selection level, at the learning algorithm level, and also at the classification level. Diao et al. [22] proposed harmony-search-based classifier ensemble reduction (CER). The bagging strategy is employed as the primary ensemble construction method, with C4.5 as the base classification algorithm. The feature subset evaluators are correlation-based FS (CFS), probabilistic consistency-based FS (PCFS), and fuzzy-rough set theory-based FS (FRFS). The HSFS algorithm then collaborates with the evaluators to select subsets of quality features (classifiers). Other FS approaches, such as feature importance
ranking techniques and other heuristic search strategies, can easily be generalized to work with the suggested approach. In Ebrahimpour and Eftekhari [23], hesitant fuzzy sets are exploited for representing different ranking algorithms in order to maximize the relevancy between features and class labels. Different similarity measures are also taken into account to minimize between-feature redundancy. The well-known correlation-based feature selection (CFS) merit within sequential forward search has been combined with an ensemble of feature ranking algorithms and similarity measurements. The proposed MRMR-HFS is advantageous for high-dimensional datasets with small sample sizes. It can also be utilized in situations where a fast feature selection strategy is required, and it can be applied when the search space is exceptionally large and meta-heuristic algorithms are unable to explore it. With the exception of the breast cancer dataset, the proposed MRMR-HFS differs dramatically from all previous methodologies. Dowlatshahi et al. [24] proposed an epsilon-greedy swarm optimizer guided by an ensemble of filter-based rankers (EFR-ESO). The ESO's feature probabilities are weighted using the filter-based rankers' ensemble knowledge of feature relevance. The six feature ranking methods, aggregated by arithmetic mean to produce the final ranked feature list, are conditional infomax feature extraction (CIFE), joint mutual information (JMI), max-relevance min-redundancy (MRMR), interaction capping (ICAP), mutual information feature selection (MIFS), and double input symmetrical relevance (DISR). In the future, the proposed approach could be used for solving multi-objective feature subset selection. Zhang et al. [25] devised a classifier ensemble reduction approach based on a modified firefly algorithm (FA) model. It solves the original FA model's premature convergence problem by incorporating two new search strategies: accelerated attractiveness and dodging operations. PSO-based feature selection is used to generate two sets of 30 and 50 ideal feature subsets for the development of 30 and 50 base models. Future search procedures should include a micro-GA-based secondary swarm, chaos-based parameter modification, and mutation-based population diversification. Yang et al. [26] created numerous balanced datasets from the initial imbalanced dataset using sampling. An ensemble of base classifiers, each trained on a balanced dataset, is then used to evaluate the feature subsets. Two different types of search algorithms are used. The first is a hill-climbing algorithm that starts with an empty set and optimizes the fitness function by selecting one feature at a time. The second is a simple elitist genetic algorithm: the feature size is predetermined, and the approach uses genetic operations such as crossover and mutation to identify the best feature set for the given fitness function. The disadvantage is that, even with the same set of features, different classification algorithms perform differently. Guo et al. [27] proposed a dynamic rough subspace-based selective ensemble (DRSSE), in which base classifiers are selected based on mutual independence and the ensemble is formed with these selected base classifiers. Rough set ensembles are limited to supervised learning algorithms and could further be extended to semi-supervised data.
Das et al. [28] developed a feature selection technique based on a bi-objective ensemble parallel-processing genetic algorithm. To eliminate ambiguous data and select informative data, rough set theory and mutual information gain are utilized. The time complexity of the genetic algorithm is reduced through parallel processing. The proposed technique surpasses the majority of competing methods on all datasets and is only a close second to DSFFC in a few situations. Xue et al. [29] proposed the genetic ensemble of extreme learning machines (GE-ELMs). Sets of ELMs are trained, and a sorting approach is used to select ELMs with low error rates and small norm weights. In spite of better accuracy, its training time is longer than that of existing state-of-the-art methods. Ghorai et al. [30] proposed a nonparallel plane proximal classifier (NPPC) for gene expression cancer data. Mutual information is used to select a few genes from the data, and a genetic algorithm with a simultaneous feature selection scheme is used to train multiple NPPC models in different subspaces. A new average-proximity-based decision combiner is introduced to combine the decision outputs of the multiple NPPC models. Multiclass classification is not addressed; the method could be further extended using a nonlinear kernel trick. Hu et al. [31] proposed the FS-PP-EROS algorithm, a forward-search post-pruning ensemble of rough subspaces. Accuracy is used to prune the base classifiers. It has been observed that beyond a certain point, increasing the number of ensemble members decreases the accuracy. A voting strategy is used to combine the classifiers' decisions; other combining strategies could be tested, and the system is not tested on high-dimensional datasets. Wu et al. [32] introduced multi-population ensemble differential evolution (MPEDE) with three mutation strategies, where the best-performing mutation strategy is dynamically assigned to a reward subpopulation. The algorithm is tested on constrained optimization functions but not on real-world problems. Liu and Nishi [33] devised a multi-population ensemble of particle swarm optimizers (MPEPSO). The whole population is divided into four subpopulations: three run three different PSO strategies, namely unified PSO (UPSO), linearly decreasing inertia weight PSO (LDWPSO), and comprehensive learning PSO (CLPSO), while the fourth, reward subpopulation is assigned to the best-performing strategy. The system is tested on 30 benchmark functions. Other best-performing state-of-the-art algorithms, such as success-history-based DE with linear population size reduction (L-SHADE), jSO, and covariance matrix bimodal learning DE (CoBiDE), could be ensembled to test their performance. In Nguyen et al. [34], a base classifier is selected if its predicted confidence score is higher than a credibility threshold. Prediction confidence is calculated from training observations; the approach integrates both the dynamic and static aspects of ensemble selection. An entropy measure is used to calculate confidence on the training set, but this entropy is weak in cases where the combined values are the same for different classes. Mohebian et al. [35] proposed hybrid prediction of breast cancer recurrence (HPBCR) with a bagged decision tree ensemble approach, in which PSO is used to tune feature weights during classifier training. A dataset of 579 patients from one institute is used
for experimentation, so there may be bias in the prognosis due to the small sample size. The output is not fuzzy; the risk of recurrence and its analysis can be studied in future work. In Aličković and Subasi [36], GA is used to eliminate insignificant features, and multiple classifiers in a rotation forest are used to classify breast cancer data. The WBC (Diagnostic) dataset is tested; the system could be tested on other disease datasets. Verma and Rahman [37] proposed a cluster-oriented ensemble classifier (COEC). The data is divided into multiple clusters; an ensemble of base classifiers learns the cluster boundaries and calculates confidence values, and a fusion classifier converts these into class confidence values for making decisions. Both heterogeneous and homogeneous clustering performance is tested, and performance improvement is observed with an increased number of clusters. In the future, finding the optimum number of clusters and globally optimizing the parameters selected in the fusion classifier could be studied. Hasan et al. [38] introduced Ensemble_RH, with bagging and complex trees, to handle missing values in multivariate medical data. A cervical cancer dataset is used for the experiment. To reduce the number of features, the success rate of the features is calculated with a few classification algorithms; a total of 35 features have been chosen from a total of 858. In Jan and Verma [39], accuracy and diversity are treated as the important factors for ensemble classifiers. A misclassification diversity measure is devised, and an incremental multi-tier classifier selection approach selects the best classifiers based on accuracy and diversity measures. The system is tested on 55 benchmark datasets but not on high-dimensional datasets. In the future, the ensemble component size and the classifiers selected with the diversity and accuracy measures will be analyzed. Ramos-Jiménez et al. [40] devised a multi-layer control of induction by sample division method (ML-CIDIM) with a decision tree. In the first layer, a sample is checked by an ensemble of classifiers; if there are any discrepancies in classification, the sample is passed through individual base classifiers to correctly classify the instance. MLP has shown the best results with this method. In the future, the synergy between the proposed system and other AI-based classifiers can be tested. Zhang [41] used a rough set reduction-based algorithm with top-down pruning for dimensionality reduction, and the best-performing SVM ensembles are constructed; rapidly selecting the best-performing model in the ensemble is the future focus. Yu et al. [42] proposed hybrid adaptive ensemble learning (HAEL) to overcome the drawbacks of random subspace-based classifiers. In the future, SVM could be chosen as the base classifier, and a parallel framework could be used to reduce the computational cost. In Wu [43], a set of trained base classifiers carries supplementary information such as the reliability of labels, must-link constraints for pairwise data, or labeled features. A stacking-based ensemble is generated by using this supplementary information to modify the weights during training. The computational cost is higher than that of LPBoost, and only supplementary ordering information for pairwise classifiers is considered. In Martinez-Muñoz et al. [44], selecting the fraction of the ensemble to prune from the pool of classifiers is a computationally intensive task; GA- and semi-definite programming (SDP)-based ensemble pruning methods are devised. Pruning is less effective in complex ensembles.
Guo et al. [45] generated self-adaptive base classifier ensembles for different datasets. PSO is used to optimize the weights of the average voting scheme in final decision making. Though a multi-stage ensemble model costs more, the improved accuracy makes it economically affordable. Only three small datasets are used in the experiment; a heuristic approach could be used to select the base classifiers. In Amasyali and Ersoy [46], new features are created by applying difference operators to a set of randomly chosen features, and classifier ensembles are trained on this new extended space forest. Bagging and rotation forest achieve the highest learning accuracy, while random forest and random subspace produce more diverse base learners. Though extended space forest training takes longer because new features are added, it generates small trees, so testing time is reduced. The use of other types of base learners is future scope. Yu et al. [47] divide the dataset into different subspaces by an adaptive feature selection approach, which picks a random feature selection method from a set of approaches. A KNN graph is then formed, and a supervised classifier is applied. Adaptive weighting is used to weight each classifier in the process, and an auxiliary training set is generated to enhance the existing training set. The proposed approach achieves better performance than a single semi-supervised approach, but its time complexity is higher. Chen et al. [48] apply a tree-based feature selection algorithm to select the feature set. Bagging is used to select training subsamples, and a sample-feature-based transformation with PCA is presented to handle unselected samples. Classifier pruning is then applied to reduce redundant and invalid classifiers; the classifiers in an ensemble are pruned based on their similarity (using DBSCAN) and accuracy. The system could be further improved by applying effective dimension reduction techniques. In Yu and Ni [49], features are selected by applying the Pearson correlation coefficient to form clusters of similar features. The signal-to-noise ratio (SNR) is used to select relevant features to form the feature subspace. Multiple diverse feature subsets are then generated through random projection to balance accuracy and diversity. Training subsets are generated with asymmetric bagging, where the majority class samples are bootstrapped while the minority class samples are preserved. These subsets are given to multiple base classifiers, and voting schemes are used to aggregate decisions. The system is not tested on multi-class datasets. Serpen and Pathical [50] introduced an ensemble of random subsamples. The higher-dimensional feature space is divided into N subspaces by random projection, with d features in each set, without replacement, and all base learners are then trained. Eleven base classifiers with 15% sampling and C4.5 as the base classifier gave the best results. More high-dimensional datasets could be tested with the proposed system. Piao et al. [51] use symmetric uncertainty (SU) to select relevant features. Feature subsets are then generated from this reduced set by inserting features in order into N subsets, and the redundancy of each added feature is checked with a feature-to-feature SU measure at every stage. SVM is used as the base classifier in the ensemble. The system has been tested on only two datasets (Leukemia and Prostate) and could be tested on more high-dimensional datasets.
In the tEnsemble approach proposed by Espichan and Villanueva [52], template features are selected through model-based selection from high-dimensional datasets. An F-test is used to filter out the irrelevant features. The correlation between template features and non-template features is then calculated to generate a diversified feature set. A tournament selection strategy is used to generate feature subsets, which are given to different classifier models; voting is used to predict the final class label. Dynamic selection of the number of template features could be tested in the future, and dynamic ensemble selection from a set of base classifiers could be customized per test sample. Kuncheva et al. [53] tested their system for stability, accuracy, and ROC performance with sixteen feature ranking algorithms on a chronic obstructive pulmonary disease (COPD) dataset. The data is first binarized. It is observed that the T-test is the most stable and accurate measure for high-dimensional data; the system could be further tested on more datasets. Wu et al. [54] proposed HUBoost, based on clustering the majority class samples into k hubs; representative samples are then selected to balance the imbalance in the dataset. HUBoost could further be tested on gene expression datasets and modeled with other machine learning algorithms. In Lu et al. [55], ReliefF is used to decrease the number of features. Five sets are then generated; four are used for training while the fifth is kept for testing. Training is carried out with three decision group classifiers: KNN, NB, and C4.5. For each sample, a weight is calculated for the classifiers using a GA, and for testing, one classifier is chosen according to the weight calculated by the GA. In the future, more decision classifiers could be added to enhance system accuracy. In Kamali et al. [56], feature extraction and selection are done in the time and frequency domains with GA. Each base classifier is given an optimized feature subset. Multiple classifiers with multiple feature subsets are tested with best-voting and all-voting criteria, and a single-classifier system is also tested for comparison with the proposed system. Twelve base classifier variants are considered, each given either time-domain features, frequency-domain features, or FT features. The devised strategies can be applied to other pattern recognition tasks. In de Oliveira et al. [57], an ensemble of classifiers is selected from a pool of classifiers with a multi-objective GA by considering accuracy and diversity. Here the base classifiers are heterogeneous, and the effect of such classifier systems is studied; the system is not tested on high-dimensional datasets. Bolón-Canedo et al. [58] devised the distributed filter (DF) and distributed ranking filter (DRF). Vertical partitioning of features is done with random selection; a parallel fast filter algorithm then works on this vertically partitioned dataset, and the subsets are ranked in parallel. The final subset is generated from these ranked subsets. The filter methods used in the experimentation are INTERACT, information gain, ReliefF, correlation-based feature selection (CFS), and a consistency-based method. Centralized feature ranking is compared with this distributed approach to feature selection. The system could be used with any feature selection approach. We discovered certain research gaps in existing search algorithms as a result of this investigation. Without any domain information about feature importance,
the search algorithm searches in a high-dimensional space. Existing search methods have issues such as becoming stuck in local optima and being computationally intensive. The key problems identified in this study are optimization algorithm difficulties such as a sluggish convergence rate and the need to tune multiple parameters. Based on a specific search technique, subset assessment generates candidate feature subsets; each candidate subset is assessed using a specific evaluation criterion and compared to the previous best candidate subset in terms of this criterion. This typically increases the computation time.
3 Experimental Setup and Datasets
R 3.6.0 was used to build the system, while Python 3.7.2 was used to validate it with classifiers. The system configuration used here was an Intel i7 processor with 8 GB RAM. The feature selection is done in R, and the results are validated in Python. Table 1 gives the details of the datasets used in this experiment. All seven datasets are high-dimensional cancer datasets. Here the number of instances is n, the number of features is p, and the number of classes is denoted by Ck. Accuracy is used to measure system performance. To calculate the accuracy of a classifier, cross-validation is used: it divides the dataset into train and test partitions. Here, tenfold cross-validation is used.

%Accuracy = (Total no. of correctly classified instances / Total instances) × 100    (1)

Accuracy is the number of correctly classified instances divided by the total number of instances; Eq. (1) expresses it as a percentage. The following search techniques are experimented with on seven high-dimensional datasets. A sequential forward search (SFS) [60] begins with an empty set and adds the most promising feature one at a time. In stepwise selection search [61], forward and backward selections are combined: it begins with no predictors and then gradually includes the most essential predictors
Table 1 Dataset details used in experiment [59]
Dataset name    n     p       Ck
COLON           62    2000    2
Lung-cancer     203   12,600  5
Ovarian         254   15,154  2
CNS             60    7130    2
Leukemia        72    7129    2
Prostate        102   12,600  2
DLBCL           47    4026    2
86
A. S. Sumant and D. V. Patil
(like forward selection). After each new variable is included, any variables that no longer improve the model fit are removed (like backward selection). The rationale for using SFS and stepwise search for high-dimensional datasets is that they have been less researched. These algorithms are executed on 11 features selected by applying symmetric uncertainty ReliefF (SU-R) [62] to rank the features; the details of the method can be found in [62]. The new ensemble methods devised here are the SU-R-SFS and SU-R-Stepwise ensemble search methods. Random forest (RF), support vector machine (SVM), K-nearest neighbors (KNN), and multilayer perceptron (MLP) classifiers are used for evaluating system performance.
Table 2 SU-R-SFS and SU-R-Stepwise search results on high-dimensional datasets
Dataset name       Classifier   SU-R-SFS   SU-R-Stepwise
COLON              RF           87         94
                   SVM          92         93
                   KNN          90         94
                   MLP          87         93
Lung cancer        RF           50         88
                   SVM          98         99
                   KNN          90         99
                   MLP          98         99
OVARIAN            RF           99         99
                   SVM          99         99
                   KNN          99         99
                   MLP          99         99
CNS                RF           80         74
                   SVM          89.9       43
                   KNN          83         74
                   MLP          79.9       73
LEUKEMIA           RF           90         90
                   SVM          92         92
                   KNN          96         96
                   MLP          88         87
PROSTATE           RF           66         88
                   SVM          50         90
                   KNN          62         87
                   MLP          56         85
DLBCL              RF           99         90
                   SVM          99         75
                   KNN          88         89
                   MLP          64         69
Average accuracy                84.67      87.75
Table 2 shows the system results with SU-R-SFS and SU-R-Stepwise search. The overall best performance is observed for the ovarian dataset and the worst for the prostate dataset. The colon cancer dataset shows consistent performance across all classifiers for both methods. On the CNS dataset, the SVM classifier has the worst performance for the SU-R-Stepwise search strategy, while the ovarian and DLBCL datasets have the best performance. For the SU-R-SFS search strategy, the KNN classifier performs worst on the prostate dataset and best on the ovarian dataset; the MLP classifier likewise performs worst on the prostate dataset and best on the ovarian dataset; and the worst performance of the RF classifier is found on the lung cancer dataset, with the best on the ovarian dataset. For high-dimensional datasets, SU-R-Stepwise search has been found to outperform SU-R-SFS, with an overall improvement of 3.08%.
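For reference, the following is a minimal sketch of how the tenfold cross-validated accuracy of Eq. (1) can be computed for the four classifiers; the dataset files are placeholders, and the hyperparameters shown are illustrative defaults rather than the exact settings used in this experiment.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier

# X: (n, p) matrix restricted to the 11 selected features; y: class labels.
X = np.load("colon_selected_features.npy")   # placeholder file names
y = np.load("colon_labels.npy")

classifiers = {
    "RF": RandomForestClassifier(),
    "SVM": SVC(),
    "KNN": KNeighborsClassifier(),
    "MLP": MLPClassifier(max_iter=1000),
}
for name, clf in classifiers.items():
    # Eq. (1): mean fraction of correctly classified instances, as a percentage
    acc = cross_val_score(clf, X, y, cv=10, scoring="accuracy").mean() * 100
    print(f"{name}: {acc:.1f}%")
```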
4 Conclusion
The search techniques discussed in this paper play an important role in selecting predictive features from high-dimensional datasets. Greedy, hybrid, heuristic (trajectory-based and population-based), and ensemble search techniques are discussed. Ensemble methods improve system accuracy and time, which makes these techniques popular in research. The SU-R-SFS and SU-R-Stepwise search techniques experimented with here achieve average accuracies of 84.67% and 87.75%, respectively. A research direction we can suggest based on this study is the dynamic selection of the best classifier in the prediction stage to improve system performance, and more ensemble search techniques could be developed in the future. Time complexity is a big issue, yet parallel processing as a means of reducing it has received little attention; to make a system faster, one can look into parallel ensemble approaches.
References 1. J. Guyon, A. Elisseeff, An introduction to variable and feature selection. J. Mach. Learn. Res. 3, 1157–1182 (2003) 2. H. Zheng, Y. Zhang, Feature selection for high-dimensional data in astronomy. Adv. Space Res. 41 (2008). Published by Elsevier Ltd., pp. 1960–1964. https://doi.org/10.1016/j.asr.2007. 08.033 3. J. Debuse, V. Rayward-Smith, Feature subset selection within a simulated annealing data mining algorithm. J. Intell. Inf. Syst. 9(1), 57–81 (1997) 4. X. Wang, J. Yang, X. Teng, W. Xia, R. Jensen, Feature selection based on rough sets and particle swarm optimization. Pattern Recognit. Lett. 28(4), 459–471 (2007) 5. R. Leardi, R. Boggia, M. Terrile, Genetic algorithms as a strategy for feature selection. J. Chemometr. 6(5), 267–281 (1992)
6. J. Wroblewski, Ensembles of classifiers based on approximate reducts. Fund. Inform. 47(3–4), 351–360 (2001) 7. Z.W. Geem, State-of-the-art in the structure of harmony search algorithm, in Recent Advances in Harmony Search Algorithm. Studies in Computational Intelligence, ed. by Z.W. Geem, vol. 270 (2010). https://doi.org/10.1007/978-3-642-04317-8_1 8. K.S. Lee, Z.W. Geem, A new meta-heuristic algorithm for continuous engineering optimization: harmony search theory and practice. Comput. Meth. Appl. Mech. Eng. 194(36–38), 3902–3933 (2005) 9. M. Fesanghary, M. Mahdavi, M. MinaryJolandan, Y. Alizadeh, Hybridizing harmony search algorithm with sequential quadratic programming for engineering optimization problems. Comput. Meth. Appl. Mech. Eng. 197(33–40), 3080–3091 (2008) 10. R. SrinivasaRao, S.V.L. Narasimham, M. RamalingaRaju, A. SrinivasaRao, Optimal network reconfiguration of large-scale distribution system using harmony search algorithm. IEEE Trans. Power Syst. 26(3), 1080–1088 (2011) 11. M. Mahdavi, M.H. Chehreghani, H. Abolhassani, R. Forsati, Novel meta-heuristic algorithms for clustering web documents. Appl. Math. Comput. 201(1–2), 441–451 (2008) 12. M.H. Mashinchi, M.A. Orgun, M. Mashinchi, W. Pedrycz, A tabu-harmony search-based approach to fuzzy linear regression. IEEE Trans. Fuzzy Syst. 19(3), 432–448 (2011) 13. C.C. Ramos, A.N. Souza, G. Chiachia, A.X. Falcão, J.P. Papa, A novel algorithm for feature selection using harmony search and its application for non-technical losses detection. Comput. Electr. Eng. 37(6), 886–894 (2011) 14. R. Diao, Q. Shen, Feature selection with harmony search. IEEE Trans. Syst., Man, Cybern. B 42(6), 1509–1523 (2012) 15. B. Tran, M. Zhang, B. Xue, A PSO based hybrid feature selection algorithm for highdimensional classification, in Evolutionary Computation (CEC), 2016 IEEE Congress. https:// doi.org/10.1109/CEC.2016.7744271 16. E. Bonilla-Huerta, A. Hernandez-Montiel, R. Morales-Caporal, M. Arjona-Lopez, Hybrid framework using multiple-filters and an embedded approach for an efficient selection and classification of microarray data. IEEE/ACM Trans. Comput. Biol. Bioinform. 13(1), 12–26 (2016) 17. K.B. Nahato, K.H. Nehemiah, A. Kannan, Hybrid approach using fuzzy sets and extreme learning machine for classifying clinical datasets. Inform. Med. Unlocked 2, 1–11 (2016) 18. M. Seera, C.P. Lim, A hybrid intelligent system for medical data classification. Expert Syst. Appl. 41(5), 2239–2249 (2014) 19. P. Cunningham, J. Carney, Diversity versus quality in classification ensembles based on feature selection, in Proceedings of ECML 2000, 11th European Conference on Machine Learning, ed. by R.L. de Mántaras, E. Plaza, Barcelona, Spain, LNCS 1810 (Springer, 2000), pp. 109–116 20. D. Opitz, Feature selection for ensembles, in Proceedings of 16th National Conference on Artificial Intelligence, AAAI, 1999, pp. 379–384 21. A. Tsymbal, P. Cunningham, M. Pechinizkiy, S. Puuronen, Search strategies for ensemble feature selection in medical diagnostics, in Proceedings of 16th IEEE Symposium on ComputerBased Medical Systems CBMS’2003, ed. by M. Krol, S. Mitra, D.J. Lee. The Mount Sinai School of Medicine, New York 22. R. Diao, F. Chao, T. Peng, N. Snooke, Q. Shen, Feature selection inspired classifier ensemble reduction. IEEE Trans. Cybern. 44(8), 1259–1268 (2014) 23. M.K. Ebrahimpour, M. Eftekhari, Ensemble of feature selection methods: a hesitant fuzzy sets approach. Appl. Soft Comput. J. 50, 300–312 (2017). https://doi.org/10.1016/j.asoc.2016. 11.021 24. M.B. 
Dowlatshahi, V. Derhami, H. Nezamabadi-Pour, Ensemble of filter-based rankers to guide an epsilon-greedy swarm optimizer for high-dimensional feature subset selection. Information 8(4) (2017). https://doi.org/10.3390/info8040152 25. L. Zhang, W. Srisukkham, S.C. Neoh, C.P. Lim, D. Pandit, Classifier ensemble reduction using a modified firefly algorithm: an empirical evaluation. Expert Syst. Appl. 93, 395–422 (2018). https://doi.org/10.1016/j.eswa.2017.10.001
26. P. Yang, W. Liu, B.B. Zhou, S. Chawla, A.Y. Zomaya, Ensemble-based wrapper methods for feature selection and class imbalance learning, Lect. Notes Computer Sci. (including Subser. Lect. Notes Artif.Intell. Lect. Notes Bioinformatics), vol. 7818 LNAI, no. PART 1, 2013, pp. 544–555. https://doi.org/10.1007/978-3-642-37453-1_45 27. Y. Guo et al., A novel dynamic rough subspace based selective ensemble. Pattern Recognit. 48(5), 1638–1652 (2015) 28. A.K. Das, S. Das, A. Ghosh, Ensemble feature selection using bi-objective genetic algorithm. Knowl.-Based Syst. 123, 116–127 (2017). https://doi.org/10.1016/j.knosys.2017.02.013 29. X. Xue, M. Yao, Z. Wu, J. Yang, Genetic ensemble of extreme learning machine. Neurocomputing 129, 175–184 (2014) 30. S. Ghorai, A. Mukherjee, S. Sengupta, P.K. Dutta, Multicategory cancer classification from gene expression data by multiclass NPPC ensemble. Int. Conf. Syst. Med. Biol. ICSMB 2010 Proc. 8(3), 41–46 (2010) 31. Q. Hu, D. Yu, Z. Xie, X. Li, EROS: ensemble rough subspaces. Pattern Recognit. 40(12), 3728–3739 (2007) 32. G. Wu, R. Mallipeddi, P.N. Suganthan, R. Wang, H. Chen, Differential evolution with multipopulation based ensemble of mutation strategies. Inf. Sci. (NY) 329, 329–345 (2016) 33. Z. Liu, T. Nishi, Multipopulation ensemble particle swarm optimizer for engineering design problems. Math. Probl. Eng. 2020 (2020) 34. T.T. Nguyen, A.V. Luong, M.T. Dang, A.W. C. Liew, J. McCall, Ensemble selection based on classifier prediction confidence. Pattern Recognit. 100, 107104 (2020) 35. M.R. Mohebian, H.R. Marateb, M. Mansourian, M.A. Mañanas, F. Mokarian, A hybrid computer-aided-diagnosis system for prediction of breast cancer recurrence (HPBCR) using optimized ensemble learning. Comput. Struct. Biotechnol. J. 15, 75–85 (2017) 36. E. Aliˇckovi´c, A. Subasi, Breast cancer diagnosis using GA feature selection and rotation forest. Neural Comput. Appl. 28(4), 753–763 (2017) 37. B. Verma, A. Rahman, Cluster-oriented ensemble classifier: impact of multicluster characterization on ensemble classifier learning. IEEE Trans. Knowl. Data Eng. 24(4), 605–618 (2012) 38. M.R. Hasan, H. Gholamhosseini, N.I. Sarkar, A new ensemble classifier for multivariate medical data, in 2017 27th International Telecommunication Networks Application Conference, ITNAC 2017, vol. 2017, January (2017), pp. 1–6 39. M.Z. Jan, B. Verma, A novel diversity measure and classifier selection approach for generating ensemble classifiers. IEEE Access 7, 156360–156373 (2019) 40. G. Ramos-Jiménez, J. Del Campo-Ávila, R. Morales-Bueno, Hybridizing ensemble classifiers with individual classifiers, in ISDA 2009—9th International Conference Intelligence Systems Design Application, 2009, pp. 199–202 41. G. Zhang, A classifier research based on RS reducts and SVM ensemble, in Proceedings of 2010 International Symposium on Computational Intelligence and Design, ISC 2010, vol. 2, no. D, 2010, pp. 136–139 42. Z. Yu, L. Li, J. Liu, G. Han, Hybrid adaptive classifier ensemble. IEEE Trans. Cybern. 45(2), 177–190 (2015) 43. O. Wu, Classifier ensemble by exploring supplementary ordering information. IEEE Trans. Knowl. Data Eng. 30(11), 2065–2077 (2018) 44. G. Martinez-Muñoz, D. Hernández-Lobato, A. Suarez, An analysis of ensemble pruning techniques based on ordered aggregation. IEEE Trans. Pattern Anal. Mach. Intell. 31(2), 245–259 (2009) 45. S. Guo, H. He, X. Huang, A multi-stage self-adaptive classifier ensemble model with application in credit scoring. IEEE Access 7, 78549–78559 (2019) 46. M.F. Amasyali, O.K. 
Ersoy, Classifier ensembles with the extended space forest. IEEE Trans. Knowl. Data Eng. 26(3), 549–562 (2014) 47. Z. Yu et al., Adaptive semi-supervised classifier ensemble for high dimensional data classification. IEEE Trans. Cybern. 49(2), 366–379 (2017)
48. W. Chen, Y. Xu, Z. Yu, W. Cao, C.L.P. Chen, G. Han, Hybrid dimensionality reduction forest with pruning for high-dimensional data classification. IEEE Access 8, 40138–40150 (2020) 49. H. Yu, J. Ni, An improved ensemble learning method for classifying high-dimensional and imbalanced biomedicine data. IEEE/ACM Trans. Comput. Biol. Bioinform. 11(4), 657–666 (2014) 50. G. Serpen, S. Pathical, Classification in high-dimensional feature spaces: random subsample ensemble, in 8th International Conference on Machine Learning and Applications, ICMLA 2009, 2009, pp. 740–745 51. Y. Piao, H.W. Park, C.H. Jin, K.H. Ryu, Ensemble method for classification of high-dimensional data, in 2014 International Conference on Big Data and Smart Computing, BIGCOMP 2014, 2014, pp. 245–249 52. A. Espichan, E. Villanueva, A novel ensemble method for high-dimensional genomic data classification, in Proceedings of 2018 IEEE International Conference on Bioinformatics and Biomedical, BIBM 2018, 2019, pp. 2229–2236 53. L.I. Kuncheva, C.J. Smith, Y. Syed, C.O. Phillips, K.E. Lewis, Evaluation of feature ranking ensembles for high-dimensional biomedical data: a case study, in Proceedings of 12th IEEE International Conference on Data Mining Work, ICDMW 2012, 2012, pp. 49–56 54. Q. Wu, Y. Lin, T. Zhu, J. Wei, HUSBoost: a hubness-aware boosting for high-dimensional imbalanced data classification, in Proceedings of International Conference on Machine Learning and Data Engineering, iCMLDE 2019, 2019, pp. 36–41 55. H. Lu, H. Gao, M. Ye, K. Yan, X. Wang, A hybrid ensemble algorithm combining adaboost and genetic algorithm for cancer classification with gene expression data, in Proceedings of 9th International Conference on Information Technologies in Medicine and Education, ITME 2018, 2018, pp. 15–19 56. T. Kamali, R. Boostani, H. Parsaei, A multi-classifier approach to MUAP classification for diagnosis of neuromuscular disorders. IEEE Trans. Neural Syst. Rehabil. Eng. 22(1), 191–200 (2014). https://doi.org/10.1109/TNSRE.2013.2291322 57. D.F. de Oliveira, A.M.P. Canuto, M.C.P. de Souto, Use of multi-objective genetic algorithms to investigate the diversity/accuracy dilemma in heterogeneous ensembles, in 2009 International Joint Conference on Neural Networks, Atlanta, GA, USA, 2009, pp. 2339–2346. https://doi. org/10.1109/IJCNN.2009.5178758 58. V. Bolón-Canedo, N. Sánchez-Maroño, A. Alonso-Betanzos, Distributed feature selection: an application to microarray data classification. Appl. Soft Comput. J. 30, 136–150 (2015) 59. https://archive.ics.uci.edu/ml/index.php 60. J. Xie, C. Wang, Using support vector machines with a novel hybrid feature selection method for diagnosis of erythemato-squamous diseases. Expert Syst. Appl. 38(5), 5809–5815 (2011). ISSN 0957-4174. https://doi.org/10.1016/j.eswa.2010.10.050 61. Z. Zhang, Variable selection with stepwise and best subset approaches. Ann Transl Med. 4(7), 136 (2016). https://doi.org/10.21037/atm.2016.03.35 62. A.S. Sumant, D. Patil, Ensemble feature subset selection: integration of symmetric uncertainty and chi-square techniques with RReliefF. J. Inst. Eng. India Ser. B (2022). https://doi.org/10. 1007/s40031-021-00684-5
A Survey on Underwater Object Detection Pratima Sarkar , Sourav De , and Sandeep Gurung
Abstract Underwater object detection is a challenging area of research because of unclear images. It covers the detection of fish, plankton, submerged ships, pipelines, debris, etc. This work reviews different state-of-the-art studies that consider either object detection or object localization. We review the existing work in two categories: learning-based approaches and non-learning-based approaches. Learning-based approaches are capable of classifying objects, while the other approaches mostly localize objects or detect their shapes. Deep-learning-based techniques are mostly used on optical images, whereas on sonar images segmentation, clustering, and edge detection are mostly used. This chapter attempts to address the key issues in the different works and summarizes some of the literature with the authors' findings and our own. The paper gives an idea about the detection of small objects and also covers the literature that deals with confusion between shadows and objects.
Keywords Underwater object detection · Deep learning · Machine learning · Segmentation
1 Introduction
Object detection is an approach toward finding objects present in the real world based on existing knowledge. This is an easy process for human beings but very difficult for a machine [1]. Of all types of object detection, underwater object detection is the most challenging because the image quality is poor. Scattered light and color changes are two major issues that cause distorted images underwater. As changes in the light path vary with the density of water,
reflection and de-reflection happen multiple times underwater before the light reaches the camera. This may cause low contrast and color casting. A color cast is the effect of a tint on an image, and underwater images suffer from this problem: they are dominated by blue. Another reason for hazy underwater images is the presence of unwanted particles or plankton, which cause reflection inside the water. In this paper, we provide a review of different underwater object detection techniques based on deep learning, machine learning, and segmentation. Underwater image processing deals with two types of images, sonar images and optical images, as shown in Fig. 1. Sonar imaging uses sound propagation to detect underwater objects. Sonar image interpretation is more difficult than optical image interpretation, and the resolution of a sonar image is lower than that of an optical image because it is captured using ultrasonic signals. Rough surfaces are more prominent in sonar images because a rough object reflects sound in multiple directions. In most cases, angular smooth surfaces show strong reflection on one side and no reflection on the other side of the image; such objects can still be detected using sonar imaging. Sometimes two target objects in a sonar image are detected as a single object. Sonar imaging uses a transducer, so relative elevation is not recognized. All these problems occur because sonar images are captured in "slant" range. Optical images are formed by a lens or mirror system from refracted or reflected light waves; the main challenge with this kind of image is that not every object under the sea is clearly visible. Basic object detection is divided into two parts: localization of an object and classification of an object. On optical images, existing studies perform both localization and classification of objects based on different features, but on sonar images they mostly concentrate on localizing objects by segmenting the image. The general object detection technique is shown in the block diagram of Fig. 2.
Fig. 1 a Example of sonar image, b example of optical image
[Figure: block diagram — input image → image preprocessing → feature extraction → feature matching → "does an object appear in the image?" → yes: detect object; no: predict location]
Fig. 2 Generalized underwater object detection model
2 Learning-Based Object Detection Techniques
Learning means that the classifier should recognize the different objects present in an image. Learning can be classified into two types: learning by training and learning by validation. A learning-based underwater object detection model is shown in Fig. 3. In learning by training, feature maps are generated and regions of interest (ROIs) are detected; a classifier is then used to classify the objects. In addition, learning is also possible at validation time in an object detection model. Validation of data is usually performed beforehand [1], and template matching techniques are used for validation just after image preprocessing. The testing phase is used to check the accuracy of the work; it also tells us whether an object is present in an image or not. Deep learning and machine learning techniques give very good results on underwater optical images. Autonomous underwater vehicles (AUVs) depend heavily on object detection, and the main job of these vehicles is monitoring the underwater environment. Initially, during 1996–1998, the AUV devices developed focused on the texture of underwater images and performed template matching [2, 3]. These methods involve heavy computation and generate redundant features, and incorrect features can cause high error rates. To avoid incorrect feature selection, Foresti et al. [4] proposed a neural-network-based technique that is capable of real-time object detection and can be used in AUV devices. In [4], a neural network is used to detect pipeline edges, and a neural network is again used to classify images based on a calculated probability value. The most important contribution of this work is that the presence of sand and seaweed, different illumination levels, different water depths, and different pipeline radii do not affect its performance.
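As background for the template-matching step mentioned above, here is a minimal sketch of normalized cross-correlation template matching with OpenCV; the file names and the acceptance threshold are placeholders, not values taken from the early AUV works [2, 3].

```python
import cv2

# Slide the template over the frame and keep the best-matching location.
frame = cv2.imread("frame.png", cv2.IMREAD_GRAYSCALE)       # placeholder paths
template = cv2.imread("template.png", cv2.IMREAD_GRAYSCALE)

result = cv2.matchTemplate(frame, template, cv2.TM_CCOEFF_NORMED)
_, max_val, _, max_loc = cv2.minMaxLoc(result)
if max_val > 0.8:        # illustrative acceptance threshold
    h, w = template.shape
    print("object at", max_loc, "size", (w, h), "score", max_val)
```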
94
P. Sarkar et al.
[Figure: pipeline diagram — dataset collection and preprocessing feed training, validation, and testing data; feature maps are generated and ROIs detected; classifier training produces the object detection model, followed by region classification, post-processing, and the detection result]
Fig. 3 Generalized learning-based underwater object detection model
From 2008 onwards, K-nearest neighbor (KNN) and support vector machine (SVM) classifiers have been used to classify objects. These algorithms classify on the basis of features extracted from images. Some works use texture as the feature, or the scale-invariant feature transform (SIFT) or speeded-up robust features (SURF) extraction techniques; these techniques are invariant to rotation, scale, illumination, blur, and translation [5]. Shihavuddin et al. [6] used an SVM classifier with a radial basis function to classify underwater objects. The MLC dataset, containing data from 2008, 2009, and 2010, is used. Handcrafted features are used for classification, such as hue channel color histograms, the gray-level co-occurrence matrix (GLCM), opponent angle, and local binary patterns (LBP). Fulkerson et al. [7] proposed a method that classifies different objects based on superpixels, where superpixels are treated as the basic unit of segmentation. The classifier is designed on local features, i.e., histograms over superpixels; SIFT feature extraction is used to extract features, and an SVM classifier is used to identify the different classes of objects. Li et al. [8] used Fast R-CNN [9] to detect different fish species. Since the computational complexity of Fast R-CNN is very high, Li et al. [10] performed fish detection using Faster R-CNN [11], which is much faster than Fast R-CNN. Both Fast R-CNN and Faster R-CNN face problems in effectively detecting small objects and shadows. To address the problem of segregating shadows from original objects, Naveen Kumar et al. [12] proposed a mean-shift-clustering-based technique for highlighting objects and isolating shadows from the original objects. The work used feature fusion and an SVM classifier for object detection.
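As a rough illustration of this family of methods, the following is a minimal sketch of classifying image patches with LBP histogram features and an RBF-kernel SVM. It is a generic pipeline in the spirit of [6], not the exact feature set or settings used in that work; the patch list and labels are assumed inputs.

```python
import numpy as np
from skimage.feature import local_binary_pattern
from sklearn.svm import SVC

def lbp_histogram(gray_patch, points=8, radius=1):
    """Uniform LBP codes summarized as a normalized histogram feature vector."""
    codes = local_binary_pattern(gray_patch, points, radius, method="uniform")
    hist, _ = np.histogram(codes, bins=points + 2,
                           range=(0, points + 2), density=True)
    return hist

def train_classifier(patches, labels):
    """patches: list of 2-D grayscale arrays; labels: their class ids."""
    X = np.array([lbp_histogram(p) for p in patches])
    clf = SVC(kernel="rbf")   # RBF-kernel SVM, as used in [6]
    clf.fit(X, labels)
    return clf
```

In practice such handcrafted descriptors would be concatenated (color histograms, GLCM statistics, LBP) before training, which is the feature-fusion idea several of the surveyed works rely on.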
In 2014, Girshick et al. [13] proposed a simple network, R-CNN, which first proposes regions and then processes each region to detect objects. It is a scalable detection algorithm that improves mAP by more than 30% on VOC 2012. Li et al. [14] used Fast R-CNN, which works faster than R-CNN and gives better performance on low-resolution images. Object detection from sonar images is a challenging task; the problem was addressed by Lee et al. [15] in 2018, where R-CNN is used to detect objects in underwater images. The work proposed StyleBankNet to synthesize the noise present in sonar images, and a data augmentation technique based on the physical properties of different objects is used to increase the database size. Zhang et al. [16] proposed a deep-learning-based technique to detect underwater objects, with VGG16 as the backbone network. A multi-box feature pyramid is used to extract features, and contiguous features are integrated. Anchor box design and matching are done based on the different scales of the objects. Mahmood et al. proposed a CNN-based handcrafted approach to classify coral in the sea [17]; the work identifies patches at multiple scales to detect coral and uses texture and color to extract features. Mahmood et al. [18] aim to find the coral coverage area in the sea, which is very important for analyzing decreasing trends in coral coverage. In this work, VGGNet and ResNet are used to extract features for corals and non-corals, and the extracted features are classified using SVM or MLP; classification errors occur due to the dataset gap between object localization and classification. The single-shot detector (SSD) faces problems in detecting small objects. To resolve this, in 2020 Hu et al. [19] proposed a cross-level fusion network to improve feature extraction ability. The work combines the ResNet and SSD concepts to get better performance, performing 7.6% better than SSD, but it does not consider the problems associated with heterogeneous noise. Chen et al. [20] proposed the Sample-WeIghted hyPEr Network (SWIPENet), which tries to resolve the problems associated with heterogeneous noise. The network generates feature maps that are high-resolution and semantically rich; however, on noisy images it may detect some unwanted objects. In 2020, Hongbo Yang et al. proposed a technique for real-time object detection using YOLOv3. The work gives better performance than Faster R-CNN with respect to speed and mean average precision but lags in the detection of occluded objects. Liu et al. [21] proposed an approach to detect sea cucumber, sea urchin, and scallop under the sea; a generative adversarial network is used to avoid the class imbalance problem, and AquaNet is proposed for efficient detection of small objects. Figure 4 presents the different learning-based approaches used for underwater object detection starting from 2008. We have summarized all the deep-learning-based studies in Table 1.
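To make the region-proposal family of detectors concrete, here is a minimal sketch of running a pretrained Faster R-CNN from torchvision on a single frame. The COCO-pretrained weights and the confidence threshold are illustrative stand-ins; none of the underwater datasets surveyed here are involved.

```python
import torch
import torchvision
from torchvision.transforms.functional import to_tensor
from PIL import Image

# Pretrained Faster R-CNN with a ResNet-50 FPN backbone (COCO weights).
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(pretrained=True)
model.eval()

image = Image.open("frame.jpg").convert("RGB")   # placeholder underwater frame
with torch.no_grad():
    # The model takes a list of 3xHxW tensors and returns boxes/labels/scores.
    predictions = model([to_tensor(image)])[0]

for box, label, score in zip(predictions["boxes"],
                             predictions["labels"],
                             predictions["scores"]):
    if score > 0.5:                              # illustrative confidence cutoff
        print(label.item(), round(score.item(), 2), box.tolist())
```

Fine-tuning such a detector on underwater imagery is what distinguishes the works above; the forward pass itself is unchanged.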
Fig. 4 Different learning-based underwater object detection techniques
3 Other Approaches (Non-Learning-Based Approaches)
One of the main objectives of underwater object detection is automated underwater vehicle movement, so in this paper we consider both object detection and tracking. Learning-based approaches classify objects, but the other approaches mostly identify edges or segment the underwater objects. Different segmentation techniques, such as thresholding [22] and clustering [23], are used, and different filters are applied for noise removal. Some works propose template matching for identifying objects, some detect shapes to infer objects, and some use the texture of objects to distinguish them. In 1998, Lane et al. [22] proposed a work based on the motion estimation of underwater objects from sequences of sonar images. First the moving and static parts of the image are segmented using a thresholding technique, and then the path of the moving object is predicted. In 2001, Petillot et al. [24] proposed a path planning system for AUVs in which a real-time data stream is used to detect obstacles. It takes care of the static and dynamic characteristics of obstacles and generates a workspace model to plan a proper path. Filip Mandic et al. [25], in 2016, proposed an object tracking system that works on sonar images. A Gaussian blur filter is used to remove noise from the sonar image; Gaussian blur gives results similar to erosion and dilation performed on a binary image [23]. Localization of the object is done by contour detection and clustering. Galceran et al. [26] proposed real-time object detection on sonar images. As it is a real-time system, it concentrates on quick feature computation and on reducing the computational load by processing a small part of the image instead of the entire image. As it works on sonar images, it can detect different shapes rather than exact objects. In 2016, Zhu et al. [27] proposed a work to segment the underwater background from the object and detect the object. A saliency-based region merging technique is used for detection. Discriminative regional feature integration (DRFI, 2013) [28] is used by Yafei Zhu to generate a master saliency map; saliency map generation depends on the integration of regional contrast, a backgroundness descriptor, and regional properties.
Table 1 Summary of learning-based underwater object detection techniques

Brian Fulkerson et al.
Dataset: Graz-02 and PASCAL VOC 2007
Concept used: Classification of objects based on histograms over superpixels with SIFT feature extraction, using SVM
Performance evaluation parameter: Average pixel accuracy
Issue addressed: Object localization and classification of objects
Authors' findings: Histograms over superpixels can be used as an important feature for localization and classification
Our findings: Occluded images may affect the accuracy of the work

Sejin Lee et al.
Dataset: POOL, SEA2017 and SEA2018
Concept used: StyleBankNet to synthesize sonar images and RCNN for object detection
Performance evaluation parameter: Precision and recall
Issue addressed: Object detection from sonar images
Authors' findings: Data augmentation is based on the physical properties of different objects
Our findings: The entire process is restricted to the diver database

Chongwei Liu et al.
Dataset: UDD dataset created by the authors
Concept used: A modified generative adversarial network, Poisson GAN, is used to detect sea cucumber, sea urchin, and scallop
Performance evaluation parameter: Mean average precision (mAP)
Issue addressed: Underwater object grabbing
Authors' findings: Compared with different types of GAN, the network gives better results
Our findings: Improvement in performance is possible by using data augmentation techniques

Lu Zhang et al.
Dataset: NSFC-dataset
Concept used: Feature-aggregation-based underwater object detection
Performance evaluation parameter: Mean average precision (mAP)
Issue addressed: Detection of multiscale underwater objects
Authors' findings: The proposed network is much cheaper than SSD
Our findings: The way the threshold value is calculated when designing anchor boxes is not specified

Ammar Mahmood et al.
Dataset: Benthoz15
Concept used: The work concentrates on finding the coral population in a specific region; instead of using whole images for training, ResNet extracts 3D arrays of features that are used to train a CNN
Performance evaluation parameter: Accuracy, precision, and recall
Issue addressed: Deep image representations for coral image classification
Authors' findings: To improve performance with respect to speed, it is better to extract features first and then train a model
Our findings: Classification errors occur due to the dataset gap between object localization and classification

Xiu Li et al.
Dataset: ImageCLEF_Fish_TS
Concept used: Fast R-CNN is used to detect fish
Performance evaluation parameter: Mean average precision (mAP)
Issue addressed: Fish detection using Fast R-CNN
Authors' findings: The work compared against the deformable parts model, achieving 11% better accuracy and running 80 times faster than R-CNN
Our findings: Faces problems while detecting small objects

G. L. Foresti et al.
Dataset: –
Concept used: Detection of pipeline edges and classification of images are based on a neural network
Performance evaluation parameter: Probability value
Issue addressed: Pipeline detection
Authors' findings: Real-time pipeline detection can be achieved
Our findings: Mainly concentrates on pipeline detection

Long Chen et al.
Dataset: URPC2017 and URPC2018
Concept used: A neural-network-based Sample-WeIghted hyPEr Network (SWIPENet) is used for small object detection
Performance evaluation parameter: Mean average precision (mAP)
Issue addressed: Underwater object detection
Authors' findings: Works well for small object detection
Our findings: Incorrectly detects some objects if the image is very noisy, and the computational complexity is very high

Kai Hu et al.
Dataset: National Natural Science Foundation of China Underwater Robot Competition dataset
Concept used: Feature cross-level fusion is implemented using ResNet as a feature enhancement technique; SSD is used for object detection on top of the selected features
Performance evaluation parameter: Average precision
Issue addressed: Marine object detection
Authors' findings: The algorithm achieves 7.6% higher precision than classic SSD
Our findings: Does not consider heterogeneous noise

Hongbo Yang et al.
Dataset: Underwater Robot Picking Contest (URPC), VOC2007 format
Concept used: YOLOv3 is used for object detection; the technique performs simultaneous localization and classification of objects
Performance evaluation parameter: Mean average precision (mAP) and recall rate
Issue addressed: Underwater object recognition
Authors' findings: Takes less time than Faster R-CNN and achieves a 6.4% better mean average precision (mAP) value than Faster R-CNN
Our findings: The work lags in the detection of occluded objects and small objects
Fig. 5 Different non-learning-based underwater object detection techniques
Priyadarshni et al. [29] used a feature extraction technique to detect and track an object. The work extracts ten important points of an object to match a reference image with the target image. The algorithm faces problems when the detection window size changes. Object detection from sonar images is challenging because of the low quality of the images, so some authors concentrate on this area [26, 30]; such detection mostly concentrates on segmenting objects and detecting shapes. Priyadharsini et al. [30] proposed a work to detect the edges of underwater objects in sonar images. Preprocessing of the image is done with Wiener and median filtering, and edges are detected using the morphological gradient. Chen [31] detects salient objects by combining 2D and 3D visual features. The proposed work follows three important principles: (1) color sensitivity decreases with increased light sensitivity, (2) short-range objects are visually more salient than distant ones, and (3) contrast sensitivity improves. This work segments the salient objects from the background by using natural-light and artificial-light regions. Figure 5 presents the different non-learning-based approaches used for underwater object detection starting from 1988. We have summarized the non-deep-learning-based approaches in Table 2. A minimal sketch of a typical segmentation pipeline of this kind is shown below.
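The following sketch shows the blur-threshold-contour pipeline that several of these works build on (e.g., the Gaussian blur plus contour detection of [25]); the kernel size, Otsu thresholding, and area cutoff are illustrative choices, not the parameters of any specific paper.

```python
import cv2

def localize_sonar_objects(path, min_area=100):
    """Gaussian blur to suppress speckle, Otsu threshold to separate
    highlights from background, then contours as candidate objects."""
    gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    blurred = cv2.GaussianBlur(gray, (5, 5), 0)
    _, binary = cv2.threshold(blurred, 0, 255,
                              cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    contours, _ = cv2.findContours(binary, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    # Keep bounding boxes of contours large enough to be plausible objects.
    return [cv2.boundingRect(c) for c in contours
            if cv2.contourArea(c) >= min_area]

boxes = localize_sonar_objects("sonar_frame.png")   # placeholder image path
print(boxes)
```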
4 Conclusion
This paper presents a survey of various studies on underwater object detection. The approaches are classified into two categories: learning-based approaches and other (non-learning-based) approaches. Learning-based approaches are again categorized into machine learning and deep learning. Before 2007, most object detection approaches were based on image segmentation, clustering, or
Table 2 Summary of other (non-learning-based) approaches to underwater object detection

Enric Galceran et al. [33]. Dataset: Gemellina ASV. Concept used: echo estimation and an echo-scoring-based thresholding technique are used to detect objects. Performance evaluation parameter: detection error. Issue addressed: real-time object detection from sonar images. Authors' findings: real-time object detection with minimal computational complexity. Our findings: the work can be extended to mid-level water, and the number of false alarms can be reduced by adding a density filtering concept.

Yafei Zhu et al. [27]. Dataset: not stated. Concept used: a discriminative regional feature integration algorithm is used to segment objects. Performance evaluation parameter: percentage of pixels correctly detected. Issue addressed: automatic object detection and segmentation. Authors' findings: the work was compared with human-labeled ground truth and achieved 98% accuracy. Our findings: the work mostly fails in detecting the tails of fish.

Divya Priyadarshni et al. [29]. Dataset: not stated. Concept used: features are extracted from the target object and the reference image and then matched. Performance evaluation parameter: precision and accuracy. Issue addressed: underwater object detection and tracking. Authors' findings: prediction of objects is done by a Kalman filter and an assumed probability value. Our findings: the work fails in the detection and tracking of objects with different frame sizes.

R. Priyadharsini et al. [30]. Dataset: not stated. Concept used: edges are detected using the morphological gradient, and Moore's neighbor algorithm is used for boundary tracing. Performance evaluation parameter: qualitative analysis. Issue addressed: object detection using segmentation. Authors' findings: the gradient helps in the detection of false edges. Our findings: results are not shown for distant objects.

(continued)
Table 2 (continued)

Dario Lodi Rizzini et al. Dataset: Garda, Portofino and Soller. Concept used: ROI detection based on segmentation and contour shape validation; k-means clustering is used for segmentation. Performance evaluation parameters: precision, recall and accuracy. Issue addressed: vision-based underwater object detection. Authors' findings: the work was validated on different types of datasets and achieved 84% accuracy. Our findings: the work considers pipelines as objects and does not consider other objects.
4 Conclusion

This paper presents a survey of various studies on underwater object detection. The process of underwater object detection is classified into two categories: the learning-based approach and other (non-learning-based) approaches. The learning-based approach is further categorized into machine learning and deep learning. Before 2007, most object detection approaches were based on image segmentation, clustering or template matching. From 2007 onwards, most authors have used learning-based approaches, and from 2012 onwards researchers have used deep learning techniques. The main issues related to object detection are the detection of small objects and occlusion. During 2019-2021, the small-object detection issue was largely solved by some of the deep-learning-based approaches. The other, non-learning-based approaches are mostly used for the localization of objects. The work presents the details of the authors' findings, our findings and the techniques used to resolve the problem in the form of summaries. The paper should be useful to new researchers who want to work on underwater object detection.
References 1. K.U. Sharma, N.V. Thakur, A review and an approach for object detection in images. Int. J. Comput. Vis. Robot. 7(1/2), 196–237 (2017). https://doi.org/10.1504/IJCVR.2017.081234 2. G. Tascini, P. Zingaretti, G.P. Conte, Real-time inspection by submarine images. J. Electron. Imag. 5, 432–442 (1996) 3. A. Branca, E. Stella, A. Distante, Autonomous navigation of underwater vehicles, in Proceedings of Oceans ’98, Nice, France, pp. 61–65, 1998 4. G.L. Foresti, S. Gentili, A vision based system for object detection in underwater images. Int. J. Pattern Recognit. Artif. Intell. 14(2), 167–218 (2000) 5. P. Sykora, P. Kamencay, R. Hudec, Comparison of SIFT and SURF methods for use on hand gesture recognition based on depth map. AASRI Procedia 9, 19–24 (2014). ISSN 2212-6716. https://doi.org/10.1016/j.aasri.2014.09.005 6. A. Shihavuddin, N. Gracias, R. Garcia, J. Escartin, R.B. Pedersen, Automated classification and thematic mapping of bacterial mats in the North Sea, in Proceedings of MTS/IEEE OCEANSBergen, pp. 1–8, 2013 7. B. Fulkerson, A. Vedaldi, S. Soatto, Class segmentation and object localization with superpixel neighborhoods, in 2009 IEEE 12th International Conference on Computer Vision, pp. 670–677, 2009. https://doi.org/10.1109/ICCV.2009.5459175 8. X. Li, M. Shang, H. Qin, L. Chen, Fast accurate fish detection and recognition of underwater images with fast R-CNN, in OCEANS 2015-MTS/IEEE Washington. IEEE, 2015, pp. 1–5 9. R. Girshick, Fast R-CNN, in Proceedings of the IEEE International Conference on Computer Vision, 2015, pp. 1440–1448 10. X. Li, M. Shang, J. Hao, Z. Yang, Accelerating fish detection and recognition by sharing cnns with objectless learning, in OCEANS 2016-Shanghai. IEEE, 2016, pp. 1–5 11. S. Ren, K. He, R. Girshick, J. Sun, Faster R-CNN: towards real-time object detection with region proposal networks, in Advances in Neural Information Processing Systems, 2015, pp. 91–99 12. N. Kumar, U. Mitra, S.S. Narayanan, Robust object classification in underwater sidescan sonar images by using reliability-aware fusion of shadow features. IEEE J. Oceanic Eng. 40(3), 592–606 (2015). https://doi.org/10.1109/JOE.2014.2344971 13. R. Girshick, J. Donahue, T. Darrell, J. Malik, Rich feature hierarchies for accurate object detection and semantic segmentation, in CVPR, 2014 14. X. Li, M. Shang, H. Qin, L. Chen, Fast accurate fish detection and recognition of underwater images with Fast R-CNN. OCEANS 2015—MTS/IEEE Washington, 2015, pp. 1–5. https://doi. org/10.23919/OCEANS.2015.7404464 15. S. Lee, B. Park, A. Kim, Deep learning from shallow dives: sonar image generation and training for underwater object detection. Robotics, arXiv:1810.07990
16. L. Zhang, X. Yang, Z. Liu, L. Qi, H. Zhou, C. Chiu, Single shot feature aggregation network for underwater object detection, in 2018 24th International Conference on Pattern Recognition (ICPR) (2018), pp. 1906-1911. https://doi.org/10.1109/ICPR.2018.8545677 17. A. Mahmood et al., Coral classification with hybrid feature representations, in Proceedings of IEEE International Conference on Image Processing, 2016, pp. 519-552 18. A. Mahmood, M. Bennamoun, S. An, F.A. Sohel, F. Boussaid, R. Hovey, G.A. Kendrick, R.B. Fisher, Deep image representations for coral image classification. IEEE J. Oceanic Eng. 44(1), 121-131 (2019). https://doi.org/10.1109/JOE.2017.2786878 19. K. Hu, F. Lu, M. Lu, Z. Deng, Y. Liu, A marine object detection algorithm based on SSD and feature enhancement. Complexity 2020, Article ID 5476142, 14 pp. (2020). https://doi.org/10.1155/2020/5476142 20. L. Chen, Z. Liu, L. Tong, Z. Jiang, S. Wang, J. Dong, H. Zhou, Underwater object detection using invert multi-class Adaboost with deep learning. Comput. Vis. Pattern Recogn. arXiv:2005.11552v1 [cs.CV], 23 May 2020 21. C. Liu, Z. Wang, S. Wang, T. Tang, Y. Tao, C. Yang, H. Li, X. Liu, X. Fan, A new dataset, Poisson GAN and AquaNet for underwater object grabbing. Comput. Vis. Pattern Recogn. arXiv:2003.01446 22. D.M. Lane, M.J. Chantler, D. Dai, Robust tracking of multiple objects in sector-scan sonar image sequences using optical flow motion estimation. IEEE J. Oceanic Eng. 23(1) (1998) 23. K.J. DeMarco, M.E. West, A.M. Howard, Sonar-based detection and tracking of a diver for underwater human-robot interaction scenarios, in Proceedings of the IEEE International Conference on Systems, Man, and Cybernetics (SMC '13), IEEE, Manchester, UK, October 2013, pp. 2378-2383 24. Y. Petillot, I.T. Ruiz, D.M. Lane, Underwater vehicle obstacle avoidance and path planning using a multi-beam forward looking sonar. IEEE J. Oceanic Eng. 26(2), 240-251 (2001) 25. F. Mandic, I. Rendulic, N. Miskovic, Y. Nad, Underwater object tracking using sonar and USBL measurements. J. Sens. 2016, Article ID 8070286, 10 pp. (2016). https://doi.org/10.1155/2016/8070286 26. L. Weng, M. Li, Z. Gong, S. Ma, Underwater object detection and localization based on multi-beam sonar image processing, in 2012 IEEE International Conference on Robotics and Biomimetics (ROBIO), 2012, pp. 514-519. https://doi.org/10.1109/ROBIO.2012.6491018 27. Y. Zhu, L. Chang, J. Dai, H. Zheng, B. Zheng, Automatic object detection and segmentation from underwater images via saliency-based region merging, in OCEANS 2016, Shanghai, 2016, pp. 1-4. https://doi.org/10.1109/OCEANSAP.2016.7485598 28. H. Jiang, J. Wang, Z. Yuan, Y. Wu, N. Zheng, S. Li, Salient object detection: a discriminative regional feature integration approach, in IEEE Conference on Computer Vision and Pattern Recognition, 2013, pp. 2083-2090 29. D. Priyadarshni, M. Kumar, H. Kolekar, Underwater object detection and tracking, in Soft Computing: Theories and Applications (2018), pp. 837-846. https://doi.org/10.1007/978-981-15-0751-9_76 30. R. Priyadharsini, T. Sree Sharmila, Object detection in underwater acoustic images using edge based segmentation method. Procedia Comput. Sci. 165, 759-765 (2019). ISSN 1877-0509. https://doi.org/10.1016/j.procs.2020.01.015 31. Z. Chen, H. Gao, Z. Zhang, H. Zhou, X. Wang, Y. Tian, Underwater salient object detection by combining 2D and 3D visual features. Neurocomputing 391, 249-259 (2020). https://doi.org/10.1016/j.neucom.2018.10.089 32. H. Yang, P. Liu, Y.Z. Hu, J.N. Fu, Research on underwater object recognition based on YOLOv3. Microsyst. Technol. 27, 1837-1844 (2021). https://doi.org/10.1007/s00542-019-04694-8 33. E. Galceran, V. Djapic, M. Carreras, D.P. Williams, A real-time underwater object detection algorithm for multi-beam forward looking sonar. IFAC Proc. 45(5), 306-311 (2012). https://doi.org/10.3182/20120410-3-PT-4028.00051
Covacdiser: A Machine Learning-Based Web Application to Recommend the Prioritization of COVID-19 Vaccination Deepraj Chowdhury, Soham Banerjee, Ajoy Dey, Debasish Biswas, and Siddhartha Bhattacharyya Abstract COVID-19 has spread all over the world. Vaccination is essential to get rid of this deadly virus, but for a country like India, vaccinating the entire population is not possible within a short time, so vaccine prioritization must be done effectively. For instance, elderly persons, people with health issues and frontline workers should be given higher priority for vaccination than other groups. Age and job designation, however, are not the only attributes that affect a person's chance of getting infected. Covacdisor has been developed for this purpose using machine learning (ML): it predicts an infection risk factor, and this prediction helps in the proper prioritization of vaccines. The predicted risk factor depends on 24 parameters that directly affect a person's immunity, and a dataset has been proposed with these 24 parameters. Support vector machine (SVM), k-nearest neighbour (KNN), logistic regression (LR) and random forest (RF) have been used to train the model on the proposed dataset, with the highest accuracy of 0.85 obtained from RF. The random forest model runs in the backend of the Web application, which acts as a user interface and predicts the risk group of the user. With the proposed technique, the urgency of a user to get vaccinated can be predicted, which would help in achieving herd immunity faster by prioritizing the vaccination of the vulnerable population. D. Chowdhury (B) · S. Banerjee Department of Electronics and Communication Engineering, International Institute of Information Technology, Naya Raipur, Chhattisgarh, India e-mail: [email protected] S. Banerjee e-mail: [email protected] A. Dey Department of Electronics and TeleCommunication Engineering, Jadavpur University, Jadavpur, West Bengal, India D. Biswas Department of Electronics and Communication Engineering, Budge Budge Institute of Technology, Kolkata, West Bengal, India S. Bhattacharyya Rajnagar Mahavidyalaya, Birbhum, West Bengal, India © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 S. Bhattacharyya et al. (eds.), Intelligence Enabled Research, Studies in Computational Intelligence 1029, https://doi.org/10.1007/978-981-19-0489-9_9
Keywords COVID-19 · Machine learning · Web application · Risk factor
1 Introduction The COVID-19 virus infection, with its first case sighted in Wuhan, China, soon took over the world as a global pandemic. In a country as large as India, with a very high population density, controlling the pandemic becomes extremely hard. Lockdown protocols delay the spread, and they can only be perceived as a temporary solution. With subsequent waves of the COVID-19 pandemic hitting India, the situation worsens each time. The only feasible solution in such a scenario is mass vaccination. Even vaccination in a country like India is a hectic process, as the population of India is around 140 crore: to vaccinate the entire population in a single go, 280 crore vaccine doses are needed, and such large-scale production in such a short duration is almost impossible. So, there is a need for a mechanism that would help prioritize the people having a higher risk of getting infected and get them vaccinated first. In India, we have two vaccines [1]: COVISHIELD, developed by AstraZeneca-Oxford and manufactured in India by the Serum Institute of India, and COVAXIN, developed by Bharat Biotech. Recently, a third vaccine, Sputnik V, developed by a Russian company, arrived in India. With limited resources in hand, the main challenge of the government is to prioritize people according to their risk of developing a COVID-19 infection. Senior citizens, people with health issues like diabetes, and frontline workers (police, doctors, nurses, etc.) have to be given higher priority than others, as they are more susceptible to infection. An effective mechanism to prioritize vaccination has therefore become a fundamental need, and the proposed idea is a solution to that problem. In the developed prototype, two of the most recent technologies, machine learning and Web application development, have been used to tackle the problem [2].
2 Literature Survey Bubar et al. [3] conducted extensive research on COVID-19 vaccination prioritization. The authors used an age-stratified SEIR model to evaluate the impact of vaccine prioritization strategies that assigned vaccination priority to different age groups. It was found that prioritizing vaccination for adults (60+ years) reduced the mortality rate by a large margin. Kumar et al. [1] conducted detailed research on the entire vaccination process in India; they performed an extensive comparison of the available vaccines and gave a detailed analysis of the cold-chain network for the distribution of COVID-19 vaccines. Parra-Bracamonte et al. [4] studied the mortality data of COVID-19 patients in Mexico, examining the effects of existing comorbidities such as diabetes, hypertension and obesity on the mortality rate of COVID-19 patients. Wolff et al. [5]
have conducted an extensive study on the risk factor of patients infected with COVID-19 and how it varies with pre-existing as well as developed comorbidities. Shahcheraghi et al. [6] studied the SARS-CoV-2 virus and provided detailed insight into vaccine production and the types of vaccines available for COVID-19. Forni and Mantovani [7] provided a detailed study of SARS-CoV-2 vaccine production and the challenges likely to be faced in the effective distribution of the vaccines. Rodriguez et al. [8] performed extensive research on the variation of risk factors with the existing comorbidities of COVID-19 patients (Table 1).
Table 1 Related works

Bubar et al. [3]. Topic discussed: COVID-19 vaccine prioritization. Salient features: age-stratified SEIR model; prioritization according to age group.
Kumar et al. [1]. Topic discussed: vaccination in India. Salient features: comparison of available COVID-19 vaccines; cold-chain network for vaccine distribution.
Parra-Bracamonte et al. [4]. Topic discussed: mortality rate of COVID-19 patients. Salient feature: effect of existing comorbidities on COVID-19 patients.
Wolff et al. [5]. Topic discussed: risk factor of COVID-19 patients. Salient features: effect of existing comorbidities; chance of developing new comorbidities.
Shahcheraghi et al. [6]. Topic discussed: vaccine production. Salient features: details regarding the types of available vaccines; insight on vaccine production.
Forni and Mantovani [7]. Topic discussed: vaccine production and distribution. Salient features: detailed study of vaccine production; supply chain for vaccine distribution.
Rodriguez et al. [8]. Topic discussed: risk factor for COVID-19 patients. Salient feature: effect of comorbidities on the COVID-19 risk factor.
Covacdisor (this work). Topic discussed: COVID-19 vaccine prioritization. Salient features: risk factor calculation based on 24 attributes including age, comorbidities and BMI; individuals categorized into three risk groups (low, medium and high).
3 Motivation and Contribution In the research papers referred to above, the authors investigated the risk factors of COVID-19 infection based only on different health conditions, age or serostatus. In the proposed work, all determining factors are considered to predict the risk factor, and the vaccine is prioritized not only on the basis of age and health parameters but also on antibody status and job status. None of the existing works deals with all of these parameters of the vaccine receiver. If a person previously had a COVID-19 infection and his or her body has naturally started to produce antibodies, then that person is relatively safe from reinfection and can be given less priority than a person whose antibody status is negative. The contributions of the paper are: – A Web application is made which acts as a user interface. – A machine learning model in the backend of the Web app predicts the risk factor based on user input. – On the basis of the predicted risk factor, Covacdiser outputs the urgency of the user to get vaccinated. – This urgency of vaccination is suggested not only on the basis of age and health parameters but also on the basis of the antibody test report.
4 Proposed Methodology Covacdisor is based on a trained machine learning model deployed over a Web application. The user interacts with the Web application, and the user's data are collected at the frontend of the application. The data are fed into a machine learning (ML) model deployed at the backend of the app. The ML model takes different attributes (age, area of living, BMI and existing health conditions, to name a few) as input and predicts the risk factor for COVID-19 infection with respect to the input values. The predicted risk factor is then transferred to the frontend of the app and conveyed to the user in terms of priority groups: high, medium or low. The main idea is to use a modern machine learning approach that makes it possible to consider 24 attributes affecting a person's immunity against the SARS-CoV-2 virus to a considerable degree. This ensures a far more accurate and exhaustive vaccination drive and thus an effective measure towards herd immunity, even with a scarce supply of vaccines. Since Covacdisor proposes an alternative vaccine prioritization procedure, it justifies its name: Covacdisor can be read as 'Covid vaccine advisor', which is an apt three-word description of the proposed work. The block diagram of the model is shown in Fig. 1, and a conceptual diagram of the model is also provided (Figs. 2, 3; Table 2).
Fig. 1 Schematic of the proposed prototype
Fig. 2 Conceptual diagram of the proposed prototype
5 Machine Learning Model A person's risk of infection depends on various diseases and factors, and the effects of all these conditions considered together give the risk factor. There is no existing mathematical formula that can be used to calculate the risk factor, which in turn provides a person's probability of getting infected by the virus; a data-driven method is the only way to address this problem. The dataset is the most important part here, as the risk factor has to be evaluated experimentally on a one-to-one basis. For this purpose, help from professionals and doctors has been taken, and previously published research has been thoroughly studied.
Fig. 3 Flowchart of the proposed Web application
Table 2 Risk factor versus priority range
Risk factor greater than 0.7: high risk
Risk factor between 0.4 and 0.7: medium risk
Risk factor less than 0.4: low risk
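The mapping of Table 2 from a predicted risk factor to a priority group is a simple threshold rule; a minimal sketch (the function name is ours):

def priority_group(risk_factor: float) -> str:
    # Thresholds follow Table 2
    if risk_factor > 0.7:
        return "High risk"
    if risk_factor >= 0.4:
        return "Medium risk"
    return "Low risk"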
The task of evaluating risk factors for subjects not present in the dataset is automated using the machine learning (ML) model. This section discusses the various algorithms that were considered before finalizing the model. The machine learning algorithms considered for the proposed model are: 1. Logistic Regression: Logistic regression is a supervised machine learning model used extensively for classification problems. It makes use of a sigmoid function and generates a probability value for the dependent variable by establishing a relationship with one or more independent variables. Logistic regression is highly effective on data comprising a dichotomous or binary dependent variable. The model is used in statistical software where it is important to understand the relationship between the dependent variable and the independent variable(s) by probability estimation using a logistic regression equation. This kind of analysis helps in predicting the likelihood of an event happening or a choice being made. Since this Web application pursues the same goal, namely to evaluate the likelihood of falling into the high-, medium- or low-risk category, logistic regression is a sensible consideration as far as accurate classification is concerned [9]. 2. K-Nearest Neighbours (KNN): The k-nearest neighbours (KNN) algorithm is a simple supervised machine learning algorithm which can be used for regression as well as classification problems. KNN is relatively easy to implement but slows down with larger data sizes. KNN classifies new data points based on similarity measures (for example, a distance function) computed against the existing data; a data point is assigned to the class with the maximum number of nearest neighbours [10]. The efficiency of the algorithm depends on the number of neighbours used for comparison, and the quality and precision of the predictions depend on the distance measure. KNN competes with the most accurate models and is therefore suitable for applications that require high accuracy but do not need a human-readable model. The prioritization technique devised here needs accurate results even for a limited number of samples and attributes, so KNN is a suitable algorithm for model training. 3. Support Vector Machine (SVM): The support vector machine (SVM) is a supervised machine learning model generally used for solving classification and regression problems. It works with linear as well as non-linear problems and can be used for many practical problems. The idea behind
SVM is that the algorithm produces a line or a hyperplane which divides the data points into different classes. The advantages of support vector machines are evident in higher-dimensional spaces, but the method remains effective in cases where the number of dimensions is greater than the number of samples (as in our case) [11]. Since support vector machines deliver excellent accuracy and precision even with a small number of samples, which is immensely important for reaching the desired goal in this scenario, the SVM algorithm was found to produce highly desirable results. 4. Random Forest: Random forest (RF) is an ensemble classifier that makes use of several decision trees to obtain a better prediction. It uses bagging and feature randomness while constructing each decision tree, which enables the creation of an uncorrelated forest of trees. For classification problems, the output of a random forest model is the class selected by the most trees. Random forests perform very well with high-dimensional data, and because each tree works only on a subset of features, one can easily work with hundreds of features [12]. The random forest classifier also reduces the probability of overfitting compared with a single decision tree and performs better on a smaller number of samples, which is what prompted the use of the random forest algorithm in the present application.
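A minimal sketch of how the four classifiers can be trained and compared with scikit-learn follows; the file name and column names are our assumptions, since the proposed dataset is not reproduced here:

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

df = pd.read_csv("covacdisor_dataset.csv")   # hypothetical file name
X = df.drop(columns=["risk_group"])          # the 24 attributes
y = df["risk_group"]                         # low / medium / high
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=42)

models = {
    "Logistic regression": LogisticRegression(max_iter=1000),
    "KNN": KNeighborsClassifier(n_neighbors=5),
    "SVM": SVC(),
    "Random forest": RandomForestClassifier(n_estimators=100),
}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    print(name, accuracy_score(y_te, model.predict(X_te)))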
6 Dataset The proposed dataset has been inspired by the dataset for 'Age-stratified COVID-19 case fatality rates' [13] and CORD-19: the COVID-19 Open Research Dataset [14]. Instead of limiting the proposed work to an age-based classification only, other attributes have been added to make the prioritization (or classification) more accurate, which stands as the key highlight of the devised approach. The dataset has been generated by collecting data from 2512 persons. It consists of different parameters which directly or indirectly affect an individual's risk factor for COVID-19 infection. The proposed dataset comprises 24 attributes and a target variable, the risk factor. These 24 attributes are age, gender, frontline worker, antibody test result, cancer, chronic kidney disorder, COPD, Down syndrome, heart condition, immunocompromised state, BMI, pregnancy, sickle cell, smoking, type 2 diabetes, asthma, cerebrovascular disease, hypertension, neurological disorder, liver disease, pulmonary fibrosis, thalassemia, decompensated cirrhosis and type 1 diabetes [15]. In this dataset, all the attributes except BMI and age are Boolean, that is, they take a value of either 1 or 0, corresponding to 'yes' and 'no' or 'positive' and 'negative'. BMI has been classified into five divisions: overweight, normal, underweight, severe obesity and obesity. The numerical age of a person is taken as input. The value of the target variable is determined from the values of the 24 attributes. The risk factor is denoted as a decimal value (precise up to two
places) between 0 and 1. The risk factor denotes a person's chance of mortality or the deterioration of immunity against the virus once affected: a person whose risk factor is 0.9 has a higher chance of death due to COVID-19 infection than a person whose risk factor is 0.2. Vaccinating the person with the higher risk factor is therefore more sensible than vaccinating the low-risk person in the case of a limited vaccine supply. But since manually calculating the risk factor is tedious, to the point of seeming impossible, machine learning algorithms come into play. Several scientists and firms have carried out studies on the degree to which a certain comorbidity or disease controls the risk factor of an individual in the case of COVID-19 infection. After thorough research on the set of attributes that control the risk factor magnitude to the highest degree, the aforesaid 24 attributes were finalized. The end goal of this venture is to assemble all such studies to produce beneficial results for mankind in the field of vaccination prioritization. The starting point of the project was to devise a procedure to vaccinate very specific subjects and avoid spending vaccines on low-risk groups for the time being. This would not only ensure an accurate vaccination drive even with limited resources and protect future generations from the deadly influence, but also counter the spread of the disease in the current time frame, so that things can go back to normalcy faster and with more reliability and security against the pandemic forces.
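Since all attributes except age and BMI are Boolean, assembling one input row for the model reduces to 0/1 flags plus two extra encodings; a sketch with hypothetical attribute names and ordering:

BMI_CLASSES = ["underweight", "normal", "overweight", "obesity", "severe obesity"]
ATTRIBUTES = ["frontline_worker", "antibody_positive", "cancer", "smoking"]  # ... up to 22 Boolean flags

def encode_row(age, bmi_class, flags):
    # flags maps attribute name -> bool; the order must match the training data
    row = [age, BMI_CLASSES.index(bmi_class)]
    row += [int(flags.get(name, False)) for name in ATTRIBUTES]
    return row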
7 Web Application The Web application contains the following sections: 1. Introduction; 2. A Read More section highlighting further insights into the project; 3. The Take the Test button (task area). The various factors considered are: age (in number), gender, frontline employee, COVID antibody test result, cancer, chronic kidney disease, COPD, Down syndrome, cardiomyopathies, immunocompromised state from solid organ transplant/HIV, BMI group/nutritional status, pregnancy status, sickle cell disease, smoking, type 2 diabetes mellitus, asthma, cerebrovascular diseases, hypertension/high B.P., dementia, liver disease, pulmonary fibrosis, thalassemia, decompensated cirrhosis and type 1 diabetes mellitus. The ML model has been written in a separate .ipynb file, and the model is then dumped into a pickle file (.pkl). By importing the pickle file in the .py streamlit file, the data entered into the Web app by the user can be fed to the machine learning (ML) model, which automatically classifies the user into a risk group: low, medium or high. In simple words, it evaluates the urgency of vaccination in the user's case. After that, all that remains is the deployment of the Web application on one of the available cloud-based services. This
Fig. 4 Homescreen of Web application
would generate a link for the Web application, and the proposed Web application is then ready for use. The link can be shared anywhere and is open for use by everyone on the Internet who has the link (Fig. 4).
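The deployment path described above (train in a notebook, dump the model with pickle, load it in a streamlit script) can be sketched as follows; the widget labels and file names are illustrative assumptions:

import pickle
import streamlit as st

# Load the model dumped from the training notebook
with open("model.pkl", "rb") as f:
    model = pickle.load(f)

st.title("Covacdisor: COVID-19 vaccination priority")
age = st.number_input("Age", min_value=0, max_value=120, value=30)
frontline = st.checkbox("Frontline worker")
antibody = st.checkbox("Antibody test positive")
# ... the remaining attributes are collected the same way ...

if st.button("Take the Test"):
    features = [[age, int(frontline), int(antibody)]]  # extend to all 24 attributes
    st.write("Predicted priority group:", model.predict(features)[0])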
8 Results and Discussion There is no mathematical methodology that provides a proper scientific pathway to perform accurate prioritization; the problem therefore has to be addressed with already available tools and turned into an algorithm. The only way is to rely on pre-recorded datasets, for which the best choice is machine learning models such as KNN, SVM, logistic regression and random forest. The performance of the various models used for classification is reported in Table 3. Figure 5 shows a comparative study of the ML models used, and a comparative study of the confusion matrices of the different algorithms is provided in Fig. 6.
Table 3 Comparison of different performance metrics

Model name: Accuracy / Precision / Recall / f1-score / AUC area
Logistic regression: 0.81 / 0.81 / 0.81 / 0.81 / 0.9264
KNN: 0.74 / 0.73 / 0.74 / 0.73 / 0.8793
SVM: 0.82 / 0.83 / 0.82 / 0.82 / 0.9358
Random forest: 0.85 / 0.85 / 0.85 / 0.85 / 0.9462
Fig. 5 Performance comparison of ML models
9 Conclusion and Future Works With the impending threat of subsequent waves of the pandemic, lockdown protocols cannot be considered a permanent solution; the only way out is mass vaccination to achieve herd immunity. It is in this context that the proposed prototype has its utmost importance. Taking 24 factors into consideration adds depth to the prediction, helps in proper prioritization and allows better utilization of the available vaccine resources. The model can also be used as an advisor to the common masses, making them aware of their imminent risks and advising them accordingly, thereby reducing pressure on the authorities. The proposed prototype takes into account all possible factors which directly or indirectly affect the risk factor of a person contracting the disease. The work has been made more thorough by adding as many relevant attributes as possible to
Fig. 6 Confusion matrices of different ML models
keep the model consistent. In the future, the prototype can be made more efficient and effective by implementing deep learning algorithms or other machine learning algorithms to achieve better results. Android applications can be made for mobile devices. Acknowledgements We thank Dr. Priyanka Roy (MBBS) for authenticating the dataset and guiding us with different other medical information which helped us a lot in this research.
References 1. V. Kumar, S.R. Pandi-Perumal, I. Trakht, S. Thyagarajan, Strategy for COVID-19 vaccination in India: the country with the second highest population and number of cases. NPJ Vaccines 6, 60 (2021). https://doi.org/10.1038/s41541-021-00327-2
2. J. Velasco, W.-C. Tseng, C.-L. Chang, Factors affecting the cases and deaths of COVID-19 victims. Int. J. Environ. Res. Public Health 18, 674 (2021). https://doi.org/10.3390/ijerph18020674 3. K. Bubar, S. Kissler, M. Lipsitch, S. Cobey, Y. Grad, D. Larremore, Model-informed COVID-19 vaccine prioritization strategies by age and serostatus. medRxiv: the preprint server for health sciences (2020). https://doi.org/10.1101/2020.09.08.20190629 4. G.M. Parra-Bracamonte, N. Lopez-Villalobos, F.E. Parra-Bracamonte, Clinical characteristics and risk factors for mortality of patients with COVID-19 in a large data set from Mexico. Ann. Epidemiol. 52, 93–98.e2 (2020). https://doi.org/10.1016/j.annepidem.2020.08.005. Epub 2020 Aug 14. PMID: 32798701; PMCID: PMC7426229 5. D. Wolff, S. Nee, N.S. Hickey et al., Risk factors for Covid-19 severity and fatality: a structured literature review. Infection 49, 15–28 (2021) 6. S.H. Shahcheraghi, J. Ayatollahi, A.A. Aljabali, M.D. Shastri, S.D. Shukla, D.K. Chellappan, N.K. Jha, K. Anand, N.K. Katari, M. Mehta, S. Satija, H. Dureja, V. Mishra, A.G. Almutary, A.M. Alnuqaydan, N. Charbe, P. Prasher, G. Gupta, K. Dua, M. Lotfi et al., An overview of vaccine development for COVID-19. Therap. Delivery 12(3), 235–244 (2021). https://doi.org/10.4155/tde-2020-0129 7. G. Forni, A. Mantovani, COVID-19 Commission of Accademia Nazionale dei Lincei, Rome, COVID-19 vaccines: where we stand and challenges ahead. Cell Death Differ. 28(2), 626–639 (2021). https://doi.org/10.1038/s41418-020-00720-9. Epub 2021 Jan 21. PMID: 33479399; PMCID: PMC7818063 8. J.E. Rodriguez, O. Oviedo-Trespalacios, J. Cortes-Ramirez, A brief review of the risk factors for COVID-19 severity. Rev. Saúde Púb. 54 (2020). https://doi.org/10.11606/s1518-8787.2020054002481 9. J. Peng, K. Lee, G. Ingersoll, An introduction to logistic regression analysis and reporting. J. Educ. Res. 96, 3–14 (2002). https://doi.org/10.1080/00220670209598786 10. L. Wang, Research and implementation of machine learning classifier based on KNN. IOP Conf. Ser. Mater. Sci. Eng. 677, 052038 (2019). https://doi.org/10.1088/1757-899X/677/5/052038 11. T. Evgeniou, M. Pontil, Support Vector Machines: Theory and Applications, vol. 2049, pp. 249–257 (2001) 12. J. Ali, R. Khan, N. Ahmad, I. Maqsood, Random forests and decision trees. Int. J. Comput. Sci. Issues (IJCSI) 9 (2012) 13. J. von Kügelgen, L. Gresele, B. Schölkopf, Age-stratified Covid-19 case fatality rates (CFRs): different countries and longitudinal. IEEE Dataport (2020). https://doi.org/10.21227/9rqbh361 14. L. Lu Wang, K. Lo, Y. Chandrasekhar et al., CORD-19: The Covid-19 Open Research Dataset. Preprint. arXiv:2004.10706v2. Published 2020 Apr 22 15. O. Dadras, N. Hahrokhnia, S. Borran, S. Seyedalinaghi, Factors associated with COVID-19 morbidity and mortality: a narrative review. J. Iran. Med. Counc. (2020). https://doi.org/10.18502/jimc.v3i4.5188
Research of High-Speed Procedures for Defuzzification Based on the Area Ratio Method Maxim Bobyr , Sergey Emelyanov , Natalia Milostnaya , and Sergey Gorbachev
Abstract The article discusses ways to build high-speed defuzzifiers. Two procedures for modifying the defuzzification model based on the area ratio method are presented. A limitation of the proposed method with the two high-speed procedures is the use of only triangular or singleton membership functions. The main aim of the research is to test the hypothesis that the type of the transient process can be changed during learning of the fuzzy MISO-system, and to study how the weight coefficient influences the speed of its learning. The study also tested the hypothesis that the high-speed defuzzifier with procedures I and II possesses the additivity property. Experimental research has confirmed these hypotheses. The architecture of a fuzzy MISO-system based on the area ratio method with the two high-speed procedures is shown in the article. The article also presents, firstly, graphs that simulate the work of the center of gravity method, the area ratio method and the two high-speed procedures, and, secondly, graphs that simulate the learning process of the fuzzy MISO-system. Keywords Defuzzification · Method of area's ratio · MAR · Soft computing
1 Introduction Fuzzy reasoning models are an effective tool for creating artificial intelligence systems. Such systems are often used in tasks of recognizing data obtained by computed tomography [1], high-speed processing of parts [2] and control of mobile robots [3]. The main stages of fuzzy inference are fuzzification, inference and defuzzification [4–10]; the subject of this article is the third operation, defuzzification. One of the main drawbacks that can occur during defuzzification based on the center of gravity method is the narrowing of the interval of the resulting variable, which means it is impossible to obtain the minimum and maximum values from M. Bobyr (B) · S. Emelyanov · N. Milostnaya Southwest State University, Kursk, Russian Federation 305040 S. Gorbachev National Research Tomsk State University, Tomsk, Russian Federation 634050 © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 S. Bhattacharyya et al. (eds.), Intelligence Enabled Research, Studies in Computational Intelligence 1029, https://doi.org/10.1007/978-981-19-0489-9_10
the kernel of the output membership function at the output of a fuzzy system [11]. Most defuzzification models are based on the center of gravity and have systematic drawbacks that are inherent in it [12]. The defuzzification model based on the area ratio method is able to compensate for these drawbacks [13, 14].
2 Method of Area's Ratio and High-Speed Procedures

2.1 Method of Area's Ratio

The method of area ratio is built from the following mathematical operations.

Step 1. Determine the total area of the output variable form:

$$S_1 = \frac{n b_1}{2} \qquad (1)$$

where n is the number of terms of the output membership function and b_1 is the width of the base of the triangular membership function.

Step 2. Calculate the areas of the transformed membership functions, which depend on the heights of the premises of the fuzzy rules:

$$\sum_{i=1}^{k} S_{1i}, \quad S_{1i} = \begin{cases} 0, & \text{if } h = 0 \\ \frac{b_1}{2}, & \text{if } h = 1 \\ \frac{h}{2}(b_1 + b_3), & \text{if } h \in (0, 1) \end{cases} \qquad (2)$$

where b_3 is the upper base of the trapezoidal membership function after truncation.

Step 3. Determine the total area of the figure with the transformed membership functions:

$$S_2 = \sum_{i=1}^{k} S_{1i}. \qquad (3)$$

Step 4. Calculate the crisp value at the output of the fuzzy system:

$$y_{\text{defuz}} = \frac{S_2}{S_1}\,(y_{\text{fin}} - y_{\text{st}}) + y_{\text{st}}, \qquad (4)$$

where y_st and y_fin are the start and final values of the fuzzy membership function kernel.
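The four steps above condense into a few lines of code; a minimal Python sketch, with function and variable names of our choosing:

def mar_defuzz(h, b1, b3, y_st, y_fin):
    # h: activation heights of the n rules; b1: base of each triangular term
    # b3: upper bases of the truncated (trapezoidal) terms
    n = len(h)
    s1 = n * b1 / 2.0                          # Eq. (1)
    s2 = 0.0                                   # Eqs. (2)-(3)
    for hi, b3i in zip(h, b3):
        if hi == 1:
            s2 += b1 / 2.0
        elif 0 < hi < 1:
            s2 += hi * (b1 + b3i) / 2.0        # hi == 0 contributes nothing
    return (s2 / s1) * (y_fin - y_st) + y_st   # Eq. (4)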
2.2 Modification I of the Area Ratio Method

Through experimental research, it was found that Formula (2) is the most time-consuming operation. It is possible to reduce the complexity of Formula (2) by expressing the variable b_3 through the variable b_1; the article [5] presents the mathematical formulas for this transformation. Taking them into account, the total area of the truncated terms of the output variable is determined as

$$S_2 = \sum_{i=1}^{n} \frac{h_i b_1 (2 - h_i)}{2}. \qquad (5)$$

Then

$$\frac{S_2}{S_1} = \frac{\sum_{i=1}^{n} \frac{h_i b_1 (2 - h_i)}{2}}{\frac{n b_1}{2}} = \frac{\sum_{i=1}^{n} h_i (2 - h_i)}{n}. \qquad (6)$$

And

$$y_{\text{defuz I}} = \frac{\sum_{i=1}^{n} h_i (2 - h_i)}{n}\,(y_{\text{fin}} - y_{\text{st}}) + y_{\text{st}}. \qquad (7)$$

Thus, Eq. (7) does not depend on the value of the upper base of the trapezoidal membership function.
2.3 Modification II of the Area Ratio Method

Suppose the truncated terms of the output variable do not depend on the width of the upper base b_3 of the truncated term. Then Formula (2) takes the form:

$$S_i = \frac{h_i b_1}{2}. \qquad (8)$$

Substituting Eq. (8) into Eq. (2) gives a formula for the total area of the truncated terms of the output fuzzy variable:

$$S_2 = \sum_{i=1}^{n} \frac{h_i b_1}{2}. \qquad (9)$$

Substituting Eqs. (9) and (1) into Eq. (4) yields:

$$\frac{S_2}{S_1} = \frac{\sum_{i=1}^{n} \frac{h_i b_1}{2}}{\frac{n b_1}{2}} = \frac{\sum_{i=1}^{n} h_i}{n}. \qquad (10)$$
Thus, at the output of the defuzzifier we obtain

$$y_{\text{defuz II}} = \frac{\sum_{i=1}^{n} h_i}{n}\,(y_{\text{fin}} - y_{\text{st}}) + y_{\text{st}}. \qquad (11)$$
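Both high-speed procedures drop the dependence on b_3 and reduce to a single pass over the activation heights; a minimal sketch:

def mar_procedure_1(h, y_st, y_fin):
    # Eq. (7)
    return sum(hi * (2.0 - hi) for hi in h) / len(h) * (y_fin - y_st) + y_st

def mar_procedure_2(h, y_st, y_fin):
    # Eq. (11): uses h_i only, hence fewer arithmetic operations
    return sum(h) / len(h) * (y_fin - y_st) + y_st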
3 Fuzzy MISO-System with Method of Areas' Ratio Let a fuzzy MISO-system have three input variables (T = {t1, t2, t3}, S = {s1, s2, s3}, V = {v1, v2, v3}) and one output variable (M = {m1, m2, m3, m4, m5, m6, m7, m8, m9, m10, m11}). All variables have triangular membership functions; the labels of the triangular membership functions for the input and output variables are shown in Table 1. The block scheme for calculating the membership functions of the first input variable of the fuzzy MISO-system (T = {t1, t2, t3}) is shown in Fig. 1. The degrees of activation of the first membership function are calculated in the subsystem block (see Fig. 1); the block scheme for this operation is shown in Fig. 2. Table 1 Labels of input and output membership functions
Input T: t1 = {200, 220, 240}; t2 = {220, 240, 260}; t3 = {240, 260, 280}
Input S: s1 = {0.3, 0.45, 0.6}; s2 = {0.46, 0.6, 0.75}; s3 = {0.6, 0.75, 0.9}
Input V: v1 = {300, 350, 400}; v2 = {350, 400, 450}; v3 = {400, 450, 500}
Output M: m1 = {220, 230, 240}; m2 = {230, 240, 250}; m3 = {240, 250, 260}; m4 = {250, 260, 270}; m5 = {260, 270, 280}; m6 = {270, 280, 290}; m7 = {280, 290, 300}; m8 = {290, 300, 310}; m9 = {300, 310, 320}; m10 = {310, 320, 330}; m11 = {320, 330, 340}
Fig. 1 Block scheme for constructing membership functions of the first input variable T
Fig. 2 Block scheme for calculating the degrees of activation of the first membership function
Table 2 Fuzzy rules' base
The 27 fuzzy rules (FR 1–27) enumerate all combinations of the input terms: rules 1–9 combine t1 with s1, s2 and s3; rules 10–18 combine t2 with s1, s2 and s3; and rules 19–27 combine t3 with s1, s2 and s3. In the original table layout, the three rules of each s1 block are listed in columns v1–v3, those of each s2 block in columns v4–v6, and those of each s3 block in columns v7–v9.

Table 3 Aggregating fuzzy variables
1. M1 = min(t1, s1, v1)
2. M2 = max[min(t1, s1, v2), min(t1, s2, v1)]
3. M3 = max[min(t1, s1, v3), min(t1, s2, v2), min(t1, s3, v1)]
4. M4 = max[min(t1, s2, v3), min(t1, s3, v2), min(t2, s1, v1)]
5. M5 = max[min(t1, s3, v3), min(t2, s1, v2), min(t2, s2, v1)]
6. M6 = max[min(t2, s1, v3), min(t2, s2, v2), min(t2, s3, v1)]
7. M7 = max[min(t2, s2, v3), min(t2, s3, v2), min(t3, s1, v1)]
8. M8 = max[min(t2, s3, v3), min(t3, s1, v2), min(t3, s2, v1)]
9. M9 = max[min(t3, s1, v3), min(t3, s2, v2), min(t3, s3, v1)]
10. M10 = max[min(t3, s2, v3), min(t3, s3, v2)]
11. M11 = min(t3, s3, v3)
The fuzzy MISO-system has 27 fuzzy rules (FR) (see Table 2). The formulas for aggregating the fuzzy variables are presented in Table 3. Defuzzification in the fuzzy MISO-system is carried out using the area ratio method (see Eq. 4) and the two high-speed procedures (see Eqs. 7 and 11).
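The max-min aggregation of Table 3 translates directly into code; a sketch for the boundary and one interior output term, with t, s and v holding the three membership degrees of each input (0-indexed in Python):

def aggregate(t, s, v):
    m = [0.0] * 11
    m[0] = min(t[0], s[0], v[0])                               # M1
    m[1] = max(min(t[0], s[0], v[1]), min(t[0], s[1], v[0]))   # M2
    # ... M3-M10 follow the same max-of-min pattern of Table 3 ...
    m[10] = min(t[2], s[2], v[2])                              # M11
    return m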
4 Experimental Research

The kinematic model of the fuzzy MISO-system was developed in the Simulink environment included in MATLAB, and several experiments were carried out with it (Fig. 3).

Fig. 3 Block scheme of the fuzzy MISO-system

In the first experiment, defuzzification was carried out with the area ratio method using Eqs. (1)–(4), and a two-dimensional graph was built (see Fig. 4, blue line); with high-speed procedure I using Eqs. (5)–(7) (Fig. 5), and a graph was built (see Fig. 4, orange line); and with high-speed procedure II using Eqs. (8)–(11) (Fig. 6), and a graph was built (see Fig. 4, green line). This experiment makes it possible to determine the resulting output characteristics of the fuzzy MISO-system.

Fig. 4 Modeling of the fuzzy MISO-system

Fig. 5 Block scheme method of area ratio—procedure I

During the first experiment, it was found that the traditional defuzzification model based on the center of gravity method has zones of insensitivity, that is, it does not respond to changes in the input variables in the ranges from 0 to 5 and from 15 to 20 (see Fig. 4). It was also concluded that fuzzy models based on the center of gravity method are non-additive. Analysis of the data in Fig. 4 further showed that with the developed MAR method with procedures I and II, the resulting variable does change in the ranges from 0 to 5 and from 15 to 20. Therefore, a fuzzy system using the area ratio method with procedures I and II has the property of additivity.
Fig. 6 Block scheme method of area ratio—procedure II
Fig. 7 Block scheme of the learning block
It should be noted that 36 computational operations are used in the MAR with procedure I, while 14 computational operations are used in the MAR with procedure II. Therefore, the performance of MAR with procedure II is approximately 2.6 times better than that of MAR with procedure I. The possibility of learning the fuzzy system with the high-speed defuzzifiers based on procedures I and II was checked during the second experiment. Fuzzy MISO-system learning is carried out using the backpropagation-of-error method by the formula

$$w_{\text{output}} = w_{\text{output}-1} + \sigma\,(y_{\text{defuz}} - y_{\text{ref}}), \quad \text{until } |y_{\text{defuz}} - y_{\text{ref}}| \le T, \qquad (12)$$

where w_output is the weight coefficient, σ is the learning step (0.02 by default), y_defuz is the value obtained using Formula (7) or (11), y_ref is the reference value, and T is a threshold (0.01 by default). A weighting coefficient must be added to Formulas (7) and (11) for the fuzzy model to be able to learn. The block scheme of the learning block using Formula (12) is shown in Fig. 7. Formulas (7) and (11) then take the form:

$$y_{\text{defuz I}} = \frac{\sum_{i=1}^{n} h_i (2 - h_i)}{w \cdot n}\,(y_{\text{fin}} - y_{\text{st}}) + y_{\text{st}} \qquad (13)$$

$$y_{\text{defuz II}} = \frac{\sum_{i=1}^{n} h_i}{w \cdot n}\,(y_{\text{fin}} - y_{\text{st}}) + y_{\text{st}} \qquad (14)$$
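The learning rule of Eq. (12) applied to the weighted defuzzifier of Eq. (14) can be sketched as a simple loop; the default step and threshold follow the text, while the iteration cap is our own safeguard:

def learn_weight(h, y_st, y_fin, y_ref, sigma=0.02, T=0.01, w=1.0, max_iter=100000):
    n = len(h)
    y = sum(h) / (w * n) * (y_fin - y_st) + y_st       # Eq. (14)
    for _ in range(max_iter):
        if abs(y - y_ref) <= T:
            break
        w += sigma * (y - y_ref)                       # Eq. (12)
        y = sum(h) / (w * n) * (y_fin - y_st) + y_st
    return w, y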
Research of High-Speed Procedures for Defuzzification …
129
Fig. 8 Learning fuzzy system to 235 with w = 0.05
Fig. 9 Learning fuzzy system to 245 with w = 0.05
and 14) affects the learning time of the fuzzy model. For example, when the weight coefficient changes to 0.01 while learning to 255, it is seen that the learning process is transformed from oscillatory to monotonous. In this case, the learning rate of the fuzzy model decreases (Fig. 11).
Fig. 10 Learning fuzzy system to 255 with w = 0.02
Fig. 11 Learning fuzzy system to 255 with w = 0.01
5 Conclusion A fuzzy MISO-system consisting of three input variables and one output variable is presented in the article. The input and output membership functions are described by triangular and singleton membership functions. The knowledge base contains 27 fuzzy rules. The article proposes two high-speed defuzzification procedures that make it possible, firstly, to ensure the additivity property of a fuzzy model and, secondly, to carry out its training.
In the course of the experimental research, it was found that the MAR-defuzzifier with procedure II has about 2.5 times better performance, but loses to the MAR-defuzzifier with procedure I when learning a fuzzy model.
References 1. A.M. Anter, S. Bhattacharyya, Z. Zhang, Multi-stage fuzzy swarm intelligence for automatic hepatic lesion segmentation from CT scans. Appl. Soft Comput. 96, 106677 (2020) 2. M.V. Bobyr, S.A. Kulabukhov, Simulation of control of temperature mode in cutting area on the basis of fuzzy logic. J. Machinery Manuf. Reliab. 46(3), 288–295 (2017) 3. M. Bobyr, V. Titov, A. Belyaev, Fuzzy system of distribution of braking forces on the engines of a mobile robot. MATEC Web Conf. 79, 01052 (2016) 4. D.R. Keshwani, D.D. Jones, G.E. Meyer, R.M. Brand, Rule-based Mamdani-type fuzzy modeling of skin permeability. Appl. Soft Comput. 8, 285–294 (2008) 5. V.I. Syryamkin, S.V. Gorbachev, M.V. Shikhman, Adaptive neuro-fuzzy classifier for evaluating the technology effectiveness based on the modified Wang and Mendel fuzzy neural production MIMO-network. IOP Conf. Ser.: Mater. Sci. Eng. 516(1), 012037 (2019) 6. S. Gorbachev, N. Gorbacheva, S. Koynov, A synergistic effect in the measurement of neuro-fuzzy system. MATEC Web Conf. 79, 01065 (2016) 7. V.I. Syryamkin, S.V. Gorbachev, M.V. Shikhman, Adaptive fuzzy neural production network with MIMO-structure for the evaluation of technology efficiency. IOP Conf. Ser.: Mater. Sci. Eng. 516(1), 012010 (2019) 8. S. Gorbachev, V. Syryamkin, High-performance adaptive neuro-fuzzy classifier with a parametric tuning. MATEC Web Conf. 155, 01037 (2018) 9. Extractive text summarization using deep natural language fuzzy processing 10. G. Neelima, M.R.M. Veeramanickam, S. Gorbachev, S.A. Kale, Int. J. Innov. Technol. Explor. Eng. 8(6 Special Issue 4), 990–993 (2019) 11. A. Piegat, Fuzzy Modelling and Control (Physica-Verlag, Heidelberg, 2001). https://doi.org/10.1007/978-3-7908-1824-6 12. W.V. Leekwijck, E.E. Kerre, Defuzzification: criteria and classification. Fuzzy Sets Syst. 108, 159–178 (1999) 13. M.V. Bobyr, S.G. Emelyanov, A nonlinear method of learning neuro-fuzzy models for dynamic control systems. Appl. Soft Comput. 88, 106030 (2020) 14. M.V. Bobyr, A.S. Yakushev, A.A. Dorodnykh, Fuzzy devices for cooling the cutting tool of the CNC machine implemented on FPGA. Measurement (2020). https://doi.org/10.1016/j.measurement.2019.107378
A Single Qubit Quantum Perceptron for OR and XOR Logic Rohit Chaurasiya, Divyayan Dey, Tanmoy Rakshit, and Siddhartha Bhattacharyya
Abstract Quantum computing is gradually gaining attention due to the properties of parallelism and superposition, which can massively speed up data processing. This study deals with quantum logic gates, fundamental to a quantum computer, learned by a single-layer quantum perceptron neural network. We demonstrate the proposed model with an implementation on Xanadu's PennyLane quantum simulator. Along with the proposed model, a classical single-layered perceptron has been implemented so as to compare their performance. The proposed model has been trained with three different optimizers, namely the Adam optimizer, the momentum optimizer and gradient descent, in order to study each optimizer's ability to train the proposed single-qubit perceptron. Keywords Qubits · Quantum perceptron · Superposition · Entanglement · Rotational gate
1 Introduction In 1981, at a conference on physics and computation, Feynman posed the question of whether it is possible to simulate all of physics on a classical computer. The answer turns out to be: not all of it. To simulate quantum mechanical systems, classical computers need more memory than they possess. R. Chaurasiya (B) Department of Applied Physics, Delhi Technological University, Bawana, Delhi 110042, India e-mail: [email protected] D. Dey Department of Electronics and Communication Engineering, University of Calcutta, Kolkata, West Bengal 700009, India T. Rakshit Department of Computer Science and Engineering, Seacom Engineering College, Howrah, West Bengal 711302, India S. Bhattacharyya Rajnagar Mahavidyalaya, Birbhum, India © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 S. Bhattacharyya et al. (eds.), Intelligence Enabled Research, Studies in Computational Intelligence 1029, https://doi.org/10.1007/978-981-19-0489-9_11
In order to study a single quantum particle, at least two variables are needed. So, to study a system of 100 particles, a physicist would need to declare a minimum of 2^100 variables. This number is far larger than the memory of any current supercomputer, and it raises one of the fundamental problems in physics and computer science [1]. To solve it, it was proposed to put forth a computational procedure which could leverage the properties of quantum mechanical systems instead of the classical method of using transistors; such computers are now known as quantum computers. Over the past few decades, quantum computing has promised to solve a huge number of heavy computational tasks and has found applications in diversified fields, from machine learning to finance to the medical industry [2–4]. This century brings us to the dawn of the realization of quantum computers. The current quantum computers belong to the noisy intermediate-scale quantum (NISQ) era [5]. The merit of these quantum computers lies in the fact that the majority of them are easily accessible to anyone through cloud technologies, which makes it possible to study and demonstrate the most promising applications of quantum algorithms proposed in previous literature. Machine learning is one such field which has grabbed the attention of quantum computing. In this paper, we propose a single-qubit perceptron which has been trained to learn OR and XOR logic, along with a study of the performance of various optimizers, including the Adam, momentum and gradient descent optimizers, for model training and testing. The rest of the paper is divided into eight sections. What follows next is a brief description of the motivation behind the work, followed by the theory of a single-layered perceptron. Further, some basics of quantum computing are touched upon, and subsequently the implementation of the proposed model and the simulation results are discussed.
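The memory argument can be made concrete: the state of n quantum particles (qubits) needs 2^n complex amplitudes, so at n = 100 even the bookkeeping is hopeless on classical hardware. A two-line check, where 16 bytes per complex amplitude is our assumption:

amplitudes = 2 ** 100
print(amplitudes * 16 / 2 ** 80, "YiB")  # about 1.7e7 yobibytes of storage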
2 Motivation and Contributions Several decades after the proposal of quantum computers, we now have access to real hardware through cloud technologies, which motivated us to study and implement the proposed algorithms in this domain. Machine learning has turned out to be one of the most promising fields, with a plethora of practical applications. The power of machine learning would remain untapped without faster processors and larger memory, which has allowed quantum computing to enter the field of machine learning with its promise of exponential speed-up and huge data-holding capacity through the properties of superposition and entanglement. It has therefore become important to explore and study the arena of quantum machine learning exhaustively. A myriad of work has been done in the field of quantum machine learning on the development of quantum perceptrons [6], but as the technology of quantum computers progresses, it becomes vitally important to implement these models and study their results. In this work, the authors propose a single-qubit quantum
perceptron and its implementation for performing OR and XOR logic operations. The reason for choosing two logic gates is to demonstrate performance for linearly and non-linearly separable functions.
3 Single-Layer Perceptron Rosenblatt in 1957 [7] proposed the first model of the single-layered perceptron, a simple linear model inspired by the neurological perspective of the brain's neuron. The model is designed to take a set of n inputs X = (x1, x2, ..., xn) and produce an output y. The model learns a set of weights W = (w1, w2, ..., wn), which is used to compute a weighted sum of the inputs along with the addition of a bias term, f(x, w) = X^T W + b. The value procured is passed into the activation function, which decides whether the neuron will be activated or not: if the neuron is activated, the final output is 1; else, it is 0. A famous example of an activation function is the Heaviside function, shown in the following equation:

$$f(z = X^T W + b) = \begin{cases} 1, & z \ge 0 \\ 0, & z < 0 \end{cases} \qquad (1)$$

Figure 1 shows the basic structure of the single-layered perceptron. This simple model of the perceptron has evolved into diverse complex structures like the deep neural network [8]. These evolutions have opened the gates for vast applications of neural networks, extending from self-driving cars to drug discovery [9–12]. One of the simplest applications of a perceptron is to learn to behave as a logical function, i.e. AND, OR, XOR or NOT logic. The next section briefly describes how a single-layered perceptron can be trained to emulate the OR logic.
Fig. 1 Structure of a single-layered perceptron
Table 1 Truth table for OR logic gate

x1  x2  y
0   0   0
0   1   1
1   0   1
1   1   1
3.1 Single-Layered Perceptron Model for OR Logic

OR logic is a two-input Boolean function which outputs 0 or 1 depending upon the inputs given. The output is low (or logical 0) only when both inputs are low; otherwise, the output is high (or logical 1). Table 1 shows the truth table for the OR gate. The algorithm to train a perceptron to emulate the OR logic gate includes the following steps:

Algorithm 1 Algorithm for OR logic training on single-layered perceptron
1: P ← inputs labelled as 1
2: N ← inputs labelled as 0
3: Weight w is initialized randomly
4: while !convergence do
5:   Choose random x ∈ P ∪ N
6:   if x ∈ P and w.x < 0 then
7:     w = w + x
8:   else if x ∈ N and w.x ≥ 0 then
9:     w = w − x
10:  end if
11: end while
The algorithm converges when all inputs are correctly classified (the weights are updated only on misclassified inputs).
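A minimal Python sketch of this training loop is given below; the bias input, the data encoding and the convergence check are illustrative assumptions rather than the authors' exact implementation.

```python
import numpy as np

# Truth table of OR, with a bias input prepended to each sample
X = np.array([[1, 0, 0], [1, 0, 1], [1, 1, 0], [1, 1, 1]])  # bias, x1, x2
y = np.array([0, 1, 1, 1])

rng = np.random.default_rng(0)
w = rng.normal(size=3)  # random initial weights (bias weight included)

converged = False
while not converged:
    converged = True
    for xi, yi in zip(X, y):
        if yi == 1 and w @ xi < 0:      # x in P, misclassified
            w = w + xi
            converged = False
        elif yi == 0 and w @ xi >= 0:   # x in N, misclassified
            w = w - xi
            converged = False

print("learned weights:", w)
print("predictions:", (X @ w >= 0).astype(int))  # should match y
```

The loop terminates for OR because the function is linearly separable; for XOR it would never converge, which is precisely the limitation discussed next.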
For the case of the OR logic, the inputs x1 and x2 in the truth table (Table 1) act as the training data. The structure consists of two input nodes taking x1 and x2 as the input, and the algorithm mentioned above is then executed on the described structure of the single-layered perceptron. A more detailed explanation of the same can be found in [13]. Table 2 shows the truth table for the XOR logic. The XOR logic cannot be learned by using the single-layered perceptron. The authors of [14] give a detailed explanation of this problem with XOR and its solution using a multilayered perceptron. The next section dwells upon some of the fundamental terms of quantum mechanics that one must know to understand the single-qubit quantum perceptron proposed in this paper.
Table 2 Truth table for XOR logic gate

x1  x2  y
0   0   0
0   1   1
1   0   1
1   1   0
4 Quantum Computers

The famous physicist Richard Feynman [15] came up with the abstract idea of how properties of quantum mechanics like entanglement and superposition can be used to perform computation. Such a computing device would allow simulating quantum mechanical systems using the principles of quantum mechanics. In 1994, Shor [16] showed that a quantum computer can be used for the factorisation of numbers in less time than classical computers. This is considered a breakthrough because faster factorisation indicates that encryption methods can be broken relatively easily. Hence, quantum computation has attracted worldwide attention.
4.1 Qubits

In quantum computers, the lowest level of information is represented by a qubit [17]. Contrary to classical bits, which can only be in a single state at a time, qubits use a quantum mechanical property called superposition, which allows them to be in states 0 and 1 simultaneously. Hence, a register of n qubits can represent $2^n$ states in superposition, while n classical bits can hold only one of these states at a time. Mathematically, a qubit in a superposed state is described as

$$|\psi\rangle = \alpha|0\rangle + \beta|1\rangle = \begin{pmatrix} \alpha \\ \beta \end{pmatrix} \qquad (2)$$
Here, |0⟩ and |1⟩ represent the basis states, while α and β are complex amplitudes associated with the states. The squared magnitude of each amplitude represents the probability of obtaining the corresponding state upon measurement. Before the measurement, the qubit is in both states, |0⟩ and |1⟩, simultaneously. Also, the sum of the squared magnitudes of these amplitudes must always be 1, as given by

$$|\alpha|^2 + |\beta|^2 = 1 \qquad (3)$$
Qubits also have the property of entanglement, whereby the state of one qubit affects the state of the other. The most famous entangled state is the Bell state [18], shown as follows:

$$|\psi\rangle = \frac{\alpha|00\rangle + \beta|11\rangle}{\sqrt{2}} \qquad (4)$$
In the above expression, if one measures the first qubit to be in, say, state |0⟩, then the second qubit will always end up in |0⟩, and vice versa. This shows how the first qubit is entangled with the second qubit. Qubits with these properties can be realized by several techniques proposed throughout the research literature; currently, some of the well-known types include qubits using ion traps, quantum dots, superconducting qubits, photonic qubits, etc. These qubits are operated upon by quantum gates, which can manipulate their state in order to execute any quantum algorithm. This is discussed in the following section.
4.2 Quantum Gates

In a classical computer, operations are performed using logic gates in electronic circuits. Similarly, there exists a mathematical formulation for quantum gates. These quantum gates can act on qubits to change their states and perform quantum computations. These gates are unitary matrices. For example, the quantum NOT gate, when it acts on a qubit, flips the state of the qubit. It has the following matrix representation:

$$X = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} \qquad (5)$$

When applied on a qubit, it yields

$$X|\psi\rangle = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}\begin{pmatrix} \alpha \\ \beta \end{pmatrix} = \alpha|1\rangle + \beta|0\rangle \qquad (6)$$
It can be seen that the matrix operation on the qubit interchanges the amplitudes of |0⟩ and |1⟩, which implies flipping of the states similar to the classical NOT gate. Similarly, there are other gates, like the Pauli X, Y and Z matrices and the Hadamard gate. The Hadamard gate is the quantum gate that puts a qubit into an equal superposition of states [19]. Given below are some of the quantum gates represented in their matrix form:

$$R_x(\theta) = \begin{pmatrix} \cos\frac{\theta}{2} & -i\sin\frac{\theta}{2} \\ -i\sin\frac{\theta}{2} & \cos\frac{\theta}{2} \end{pmatrix} \qquad (7)$$

$$R_y(\theta) = \begin{pmatrix} \cos\frac{\theta}{2} & -\sin\frac{\theta}{2} \\ \sin\frac{\theta}{2} & \cos\frac{\theta}{2} \end{pmatrix} \qquad (8)$$

$$R_z(\lambda) = \begin{pmatrix} e^{-i\lambda/2} & 0 \\ 0 & e^{i\lambda/2} \end{pmatrix} \qquad (9)$$

$$H = \frac{1}{\sqrt{2}}\begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix} \qquad (10)$$
4.3 Quantum Measurement

Measurement [20] of the qubit plays a very crucial role in deciding the outcome of the quantum computation. The state of a qubit collapses into one of the basis states upon its measurement. For example, consider a qubit in the superposition state

$$|\psi\rangle = \frac{1}{\sqrt{2}}(|0\rangle + |1\rangle) \qquad (11)$$

Upon measurement, the wavefunction in Eq. (11) collapses to either |0⟩ or |1⟩. The probability of the wavefunction collapsing to either of these states depends on the squared amplitude of the corresponding state. In this case, each basis state has an equal probability of 1/2. Post measurement, the state of the qubit becomes

$$|\psi\rangle = |0\rangle \qquad (12)$$

or

$$|\psi\rangle = |1\rangle \qquad (13)$$
A more concise review on quantum measurement has been provided in [21].
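As a quick numerical illustration of Eqs. (5), (10) and (11), the short NumPy sketch below applies the X and H gates to the basis state |0⟩ and computes the measurement probabilities from the squared amplitudes; it is an illustrative aid, not part of the authors' implementation.

```python
import numpy as np

ket0 = np.array([1, 0], dtype=complex)          # |0>
X = np.array([[0, 1], [1, 0]], dtype=complex)   # quantum NOT gate, Eq. (5)
H = np.array([[1, 1], [1, -1]], dtype=complex) / np.sqrt(2)  # Hadamard, Eq. (10)

flipped = X @ ket0          # X|0> = |1>, cf. Eq. (6)
superposed = H @ ket0       # (|0> + |1>)/sqrt(2), cf. Eq. (11)

# Measurement probabilities are the squared magnitudes of the amplitudes
print(np.abs(flipped) ** 2)     # [0. 1.]   -> always measures |1>
print(np.abs(superposed) ** 2)  # [0.5 0.5] -> each outcome with probability 1/2
```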
5 Proposed Single Qubit Quantum Perceptron

Various models of quantum perceptrons have been proposed throughout the literature [22, 23]. This paper uses a variational form of quantum circuits to implement the single-qubit quantum perceptron. A variational quantum circuit is divided into three parts: initialization of the qubits, a parameterised unitary matrix U(x, Θ) and a measurement basis.

1. Initialization of Qubits—This is the first step in constructing a variational circuit. Normally, qubits are initialised to the state |0⟩, followed by various quantum gate operations.
2. Parameterised Unitary Matrix—The next stage is to define the parameterised quantum circuit U(x, Θ), where x is the input feature vector and Θ is the set of all trainable parameters. The data is given to the circuit through quantum embedding methods, which may take the form of basis embedding, amplitude embedding or rotational-gate embedding [24]. Rotational-gate embedding has been used in this paper. The quantum circuit consists of a parameterised circuit made up of a collection of rotational gates.
Fig. 2 Proposed structure of single-qubit quantum perceptron
3. Measurement Basis—The final stage is the measurement of the qubits. The measurement of a qubit is done using different bases, which may be the Pauli-Z, Pauli-X or Pauli-Y basis [25]. In this study, the Pauli-Z basis of measurement has been used.

The proposed single-qubit quantum perceptron has two initial Rx gates, which act similar to input nodes; further, a Hadamard gate H has been added to incorporate superposition. The circuit ends with Rz(θ3) and Ry(θ4) gates. These are the two ansatz layers chosen for the model. Since there are four rotational gates, the number of parameters of the circuit becomes four, Θ = [θ1, θ2, θ3, θ4], where θ1 and θ2 are the two input features to the model from the data, while θ3 and θ4 act as trainable parameters which are randomly initialised. The proposed single-qubit quantum perceptron is shown in Fig. 2. The final state vector of the qubit in this single-qubit perceptron is shown in Eq. (14):

$$|\psi\rangle = \frac{1}{\sqrt{2}} \begin{pmatrix} e^{-i\frac{\theta_1+\theta_2+\theta_3}{2}}\cos\frac{\theta_4}{2} + e^{i\frac{\theta_1+\theta_2+\theta_3}{2}}\sin\frac{\theta_4}{2} \\[4pt] e^{-i\frac{\theta_1+\theta_2+\theta_3}{2}}\cos\frac{\theta_4}{2} - e^{i\frac{\theta_1+\theta_2+\theta_3}{2}}\sin\frac{\theta_4}{2} \end{pmatrix} \qquad (14)$$
The final state is a superposition of |0⟩ and |1⟩. Furthermore, the expectation values associated with |0⟩ and |1⟩ are given as follows:

$$\langle Z \rangle_{|0\rangle} = \frac{1 + \sin\theta_4}{2} \qquad (15)$$

$$\langle Z \rangle_{|1\rangle} = \frac{1 - \sin\theta_4}{2} \qquad (16)$$
After measurement, we obtain the expectation value of either of the states. Furthermore, the classical optimizer optimizes this value in order to minimize the cost function. Since the circuit involves gate matrices and an equivalent single gate can be created as the matrix product of all the gates taken, the circuit is non-linear. This allows the model to even solve non-linear problems like the XOR logic.
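A PennyLane sketch of a circuit consistent with the description above is shown below; the gate ordering, the encoding of the two features into the Rx angles and the device choice are assumptions made for illustration rather than the authors' exact code.

```python
import pennylane as qml
from pennylane import numpy as np

dev = qml.device("default.qubit", wires=1)

@qml.qnode(dev)
def perceptron(weights, x):
    qml.RX(x[0], wires=0)        # input feature theta_1
    qml.RX(x[1], wires=0)        # input feature theta_2
    qml.Hadamard(wires=0)        # incorporate superposition
    qml.RZ(weights[0], wires=0)  # trainable theta_3
    qml.RY(weights[1], wires=0)  # trainable theta_4
    return qml.expval(qml.PauliZ(0))  # Pauli-Z measurement basis

weights = np.array([0.1, 0.2], requires_grad=True)
print(perceptron(weights, np.array([0.5, 0.7])))
```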
6 Experimental Results

This section describes how the proposed single-qubit quantum perceptron has been implemented and trained on Xanadu's quantum simulator, PennyLane [26].
6.1 Dataset

The data used in this study has been self-generated for both the XOR logic and the OR logic [27]. Instead of using a discrete dataset like the truth tables of these two gates, analytical models of the OR and XOR logic have been used. The analytical representations used to create the dataset are shown below:

$$y(a, b) = a + b - 2ab \qquad (17)$$

$$y(a, b) = a + b - ab \qquad (18)$$
Equation (17) has been used for XOR logic while Eq. (18) has been used for OR logic. The values of inputs a and b have been varied from 0 to 1 to generate the dataset. Since these two functions use real values instead of binary numbers, it allows one to generate multiple combinations of real numbers as inputs. While binary numbers give only four possible combinations for two input features, this method generates more data from the infinite pool of real numbers between 0 and 1 to train the proposed single-qubit perceptron. For the test set, Tables 3 and 4 have been used for the OR and XOR logic, respectively.
Table 3 Data table for OR gate

x1           x2           y
0.842750488  0            0.842750488
0.611782709  0.123000231  0.659533525
0.983628329  0.177110005  0.986527916
…            …            …

Table 4 Data table for XOR gate

x1           x2           y
0.727721921  0            0.727721921
0.513446823  0.968408788  0.48740278
0.951059186  0.653881237  0.361180908
…            …            …
6.2 Simulation Results

The mean squared error [28] has been used to calculate the cost, given by

$$\text{cost} = \frac{\sum |y_p - y|^2}{N} \qquad (19)$$
Here, yp is the predicted label, y is the output of the proposed single-qubit perceptron, and N is the length of the dataset. The model has been trained for 1000 iterations. The study has been conducted with three classical optimizers used to update the parameters of the single-qubit quantum perceptron. These optimizers are the gradient descent (learning rate = 0.01), Adam (learning rate = 0.01) and momentum (momentum = 0.9) optimizers [29]. Also, a classical single-layered perceptron has been trained for a comparative study on the OR logic, and a two-layered classical perceptron on the XOR logic. The cost versus iteration curves are shown in Figs. 3 and 4 for all three optimizers and the classical method when trained for the OR and XOR logic, respectively. Since the output of the quantum perceptron is an expectation value, which is a real number, a threshold value equal to 0.5 has been chosen to classify the result as a binary output yb, i.e. 0 or 1. Thus, if the proposed single-qubit quantum perceptron's output is below 0.5, it is considered low or 0; otherwise, the output is considered high or 1. This is shown in the following equation:

$$y_b = \begin{cases} 0 & \text{if } y_p < 0.5 \\ 1 & \text{if } y_p \ge 0.5 \end{cases} \qquad (20)$$
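A hedged sketch of this training setup, reusing the perceptron QNode and the dataset generator sketched earlier, is shown below; the mapping of the expectation value into [0, 1] is an assumed convention, since the paper does not list its exact code.

```python
import pennylane as qml
from pennylane import numpy as np

def prediction(weights, x):
    # Map the Pauli-Z expectation from [-1, 1] into [0, 1] (assumed convention)
    return (1 - perceptron(weights, x)) / 2

def cost(weights, X, y):
    loss = 0.0
    for x, target in zip(X, y):                 # mean squared error, Eq. (19)
        loss = loss + (prediction(weights, x) - target) ** 2
    return loss / len(X)

optimizers = {
    "gradient": qml.GradientDescentOptimizer(stepsize=0.01),
    "adam": qml.AdamOptimizer(stepsize=0.01),
    "momentum": qml.MomentumOptimizer(stepsize=0.01, momentum=0.9),
}

for name, opt in optimizers.items():
    weights = np.array([0.1, 0.2], requires_grad=True)
    for _ in range(1000):                       # 1000 iterations, as in the paper
        weights = opt.step(lambda w: cost(w, X_xor, y_xor), weights)
    print(name, cost(weights, X_xor, y_xor))
```

The binary decision of Eq. (20) is then simply `int(prediction(weights, x) >= 0.5)`.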
The model has been able to give the correct outputs for the XOR operation, i.e. (0, 1, 1, 0) for the input pairs 00, 01, 10, 11, respectively, for all the optimizers. The model for the OR operation is also found to yield correct responses, i.e. (0, 1, 1, 1) for the input pairs 00, 01, 10, 11, respectively.
Fig. 3 Cost versus iteration curve when single-qubit quantum perceptron has been trained on OR logic
Fig. 4 Cost versus iteration curve when single-qubit quantum perceptron has been trained on XOR logic
For the XOR gate, Fig. 4 shows that the proposed single-qubit quantum perceptron reaches the global minimum earlier than a classical network trained on the same dataset. This demonstrates that the single-qubit quantum perceptron can find the minima faster than the classical approach, since the classical methodology takes a larger number of iterations to converge. The same results have been obtained for the OR gate, as can be seen in Fig. 3.
7 Discussion

In this study, a single-qubit perceptron has been proposed and trained to perform OR and XOR logic. The study compares the single-qubit quantum perceptron with the classical single-layered perceptron. In contrast to the classical single-layer perceptron, the proposed single-qubit quantum perceptron for OR or XOR logic has been found to converge faster. Also, from Figs. 3 and 4, it is evident that the momentum optimizer has the best convergence, followed by Adam and then the gradient descent optimizer, when used for the proposed single-qubit quantum perceptron.
8 Conclusion

This study demonstrates how a quantum perceptron can leverage the power of quantum mechanics and speed up the training process, with the single-qubit perceptron increasing the rate of convergence. Future work in this direction may include training the quantum perceptron on real quantum hardware through cloud platforms. Integration of error mitigation techniques [30] can also be undertaken to reduce the error due to noise. There is also scope for improving the quantum perceptron model by increasing the number of qubits and the complexity of the circuit. The authors are currently engaged in these directions.
References

1. W. Schiehlen, Computational dynamics: theory and applications of multibody systems. Eur. J. Mech. A/Solids 25(4), 566–594 (2006)
2. K.L. Brown, W.J. Munro, V.M. Kendon, Using quantum computers for quantum simulation. Entropy 12(11), 2268–2307 (2010)
3. T.R. Bromley, J.M. Arrazola, S. Jahangiri, J. Izaac, N. Quesada, A.D. Gran, M. Schuld, J. Swinarton, Z. Zabaneh, N. Killoran, Applications of near-term photonic quantum computers: software and algorithms. Quant. Sci. Technol. 5(3), 034010 (2020)
4. R. Orus, S. Mugel, E. Lizaso, Quantum computing for finance: overview and prospects. Rev. Phys. 4, 100028 (2019)
5. J. Preskill, Quantum computing in the NISQ era and beyond. Quantum 2, 79 (2018)
6. N. Wiebe, A. Kapoor, K.M. Svore, Quantum perceptron models. arXiv preprint arXiv:1602.04799 (2016)
7. F. Rosenblatt, The perceptron: a probabilistic model for information storage and organization in the brain. Psychol. Rev. 65(6), 386 (1958)
8. P. Sharma, A. Singh, Era of deep neural networks: a review, in 2017 8th International Conference on Computing, Communication and Networking Technologies (ICCCNT) (IEEE, 2017), pp. 1–5
9. J. Vamathevan, D. Clark, P. Czodrowski, I. Dunham, E. Ferran, G. Lee, B. Li, A. Madabhushi, P. Shah, M. Spitzer et al., Applications of machine learning in drug discovery and development. Nat. Rev. Drug Discov. 18(6), 463–477 (2019)
10. T. Lane, C.E. Brodley, An application of machine learning to anomaly detection, in Proceedings of the 20th National Information Systems Security Conference, vol. 377 (Baltimore, 1997), pp. 366–380
11. K. Worden, G. Manson, The application of machine learning to structural health monitoring. Philos. Trans. R. Soc. A Math. Phys. Eng. Sci. 365(1851), 515–537 (2007)
12. L. Cui, S. Yang, F. Chen, Z. Ming, N. Lu, J. Qin, A survey on application of machine learning for internet of things. Int. J. Mach. Learn. Cybern. 9(8), 1399–1417 (2018)
13. I. Goodfellow, Y. Bengio, A. Courville, Deep Learning (MIT Press, 2016)
14. Z. Yanling, D. Bimin, W. Zhanrong, Analysis and study of perceptron to solve XOR problem, in The 2nd International Workshop on Autonomous Decentralized System (IEEE, 2002), pp. 168–173
15. R.P. Feynman, T. Hey, R.W. Allen, Feynman Lectures on Computation (CRC Press, 2018)
16. P.W. Shor, Polynomial-time algorithms for prime factorization and discrete logarithms on a quantum computer. SIAM Rev. 41(2), 303–332 (1999)
17. D. McMahon, Quantum Computing Explained (Wiley, 2007)
18. D. Sych, G. Leuchs, A complete basis of generalized bell states. New J. Phys. 11(1), 013006 (2009)
19. P.B. Sousa, R.V. Ramos, Universal quantum circuit for n-qubit quantum gate: a programmable quantum gate. arXiv preprint quant-ph/0602174 (2006)
20. H.M. Wiseman, G.J. Milburn, Quantum Measurement and Control (Cambridge University Press, 2009)
21. S. Kurgalin, S. Borzunov, Concise Guide to Quantum Computing: Algorithms, Exercises, and Implementations (Springer Nature, 2021)
22. E. Farhi, J. Goldstone, S. Gutmann, A quantum approximate optimization algorithm. arXiv preprint arXiv:1411.4028 (2014)
23. G.G. Guerreschi, M. Smelyanskiy, Practical optimization for hybrid quantum-classical algorithms. arXiv preprint arXiv:1701.01450 (2017)
24. S. Lloyd, M. Schuld, A. Ijaz, J. Izaac, N. Killoran, Quantum embeddings for machine learning. arXiv preprint arXiv:2001.03622 (2020)
25. T. Hey, Quantum computing: an introduction. Comput. Control Eng. J. 10(3), 105–112 (1999)
26. V. Bergholm, J. Izaac, M. Schuld, C. Gogolin, M.S. Alam, S. Ahmed, J.M. Arrazola, C. Blank, A. Delgado, S. Jahangiri et al., Pennylane: automatic differentiation of hybrid quantum-classical computations. arXiv preprint arXiv:1811.04968 (2018)
27. M. Morris, R. Mano, C.R. Kime, Logic and Computer Design Fundamentals, 2nd edn. updated (Pearson, 2001)
28. G.M. James, Variance and bias for general loss functions. Mach. Learn. 51(2), 115–135 (2003)
29. J. Pomerat, A. Segev, R. Datta, On neural network activation functions and optimizers in relation to polynomial regression, in 2019 IEEE International Conference on Big Data (Big Data) (IEEE, 2019), pp. 6183–6185
30. Y. Suzuki, S. Endo, K. Fujii, Y. Tokunaga, Quantum error mitigation for fault-tolerant quantum computing. arXiv preprint arXiv:2010.03887 (2020)
Societal Gene Acceptance Index-Based Crossover in GA for Travelling Salesman Problem Ravi Saini , Ashish Mani, and M. S. Prasad
Abstract Selection of genetic material based on sociocultural inheritance in nature is the central theme of dual inheritance theory (DIT), or gene-culture co-evolution theory. Recent studies attribute cases of lactase persistence, high-altitude adaptation in Tibetans, adaptation of heat-generating body fat in cold areas, etc., to such inheritance. In DIT, societal and Darwinian selection interact in a closed loop to select attributes. Genetic algorithms (GA) are based on Darwinian theory, and we propose to embrace the concept of DIT in GA by selecting attributes from parents based upon a Societal Gene Acceptance Index (SGAI). This paper introduces an SGAI crossover for the travelling salesman problem (TSP). The new crossover retains the attributes of the parent genetic structure, while being guided by the society for the selection of genes. Also, a framework is presented to compare the new crossover and existing crossovers in an environment amalgamating the established ideas of social disaster techniques (SDT) and a no duplicate policy to achieve novelty search (NS). Testing of the crossovers in this framework is done on well-known TSP problems from TSPLIB. Statistical evaluation of the results shows the dominance of the proposed crossover in the test cases.

Keywords Travelling salesman problem · Genetic algorithm · Crossover · Dual inheritance theory · Social disaster techniques · Novelty search · No duplicate policy
R. Saini (B) Department of Computer Science and Engineering, ASET, Amity University, Noida, Uttar Pradesh, India e-mail: [email protected] A. Mani Amity Innovation and Design Centre, Amity University, Noida, Uttar Pradesh, India e-mail: [email protected] M. S. Prasad Amity Institute of Space Science and Technology, Amity University, Noida, Uttar Pradesh, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 S. Bhattacharyya et al. (eds.), Intelligence Enabled Research, Studies in Computational Intelligence 1029, https://doi.org/10.1007/978-981-19-0489-9_12
1 Introduction

The travelling salesperson problem (TSP) is a simple optimization model that has fascinated mathematicians, philosophers and computer scientists for more than nine decades. The academic history of the TSP is beautifully elucidated in [1]. Simply stated, the problem is to find the shortest tour in which a salesman, starting from his home town, visits n cities and then returns to the home city. A generalised symmetric n-node TSP can be presented for an undirected graph G = (V, E) in the following mathematical model:
$$\min \sum_{i=1}^{n}\sum_{j=1}^{n} c_{i,j}\, x_{i,j} \qquad (1)$$

with the following constraints:

$$\sum_{i=1}^{n} x_{i,j} = 1 : j \in V,\ i \neq j \qquad (2)$$

$$\sum_{j=1}^{n} x_{i,j} = 1 : i \in V,\ j \neq i \qquad (3)$$

$$\sum_{i,j \in S} x_{i,j} \le |S| - 1 : S \subset V,\ 2 \le |S| \le n-2 \qquad (4)$$

$$x_{i,j} = 0 \text{ or } 1 : \forall (i, j) \qquad (5)$$

where

c_{i,j} = cost of visiting from city i to city j

$$x_{i,j} = \begin{cases} 1, & \text{if the edge exists between } i, j \\ 0, & \text{otherwise} \end{cases}$$

S = any non-empty proper subset of V
In the general literature, Eqs. 2 and 3 are referred to as the degree constraints, Eq. 4 as the subtour elimination constraints and Eq. 5 as the integrality constraints. The TSP is a unique combinatorial optimization problem, which is very easy to describe but very complex to solve. The problem having been proven NP-hard [2], it may not be feasible to find a polynomial-time optimization algorithm with respect
to n nodes. Further, with the proof of the no free lunch theorems for optimization [3], a plethora of techniques has been applied to solve the TSP [4], including exact methods, approximate algorithms [5] and heuristics [6]. Using dynamic programming on a specific category of TSP problems, the time and space complexity has been calculated as $O(2^n n^{O(1)})$ [7]. In these circumstances, bio-inspired algorithms have attracted a lot of attention for solving the TSP [8, 9]. The three bio-inspired approaches dominating the research to solve the TSP are the genetic algorithm (GA) [10], ant colony optimization (ACO) [11] and particle swarm optimization (PSO) [12]. This paper concentrates on GA, forging an environment for improving efficiency using existing paradigms and proposing an improved crossover for generating better tours for the TSP. Section 2 of the paper gives an overview of GA. Section 3 explains the proposed crossover technique. Section 4 elucidates the experimentation framework and parameters. The proposed crossover is tested against existing crossovers using a standard library of problems, and the results are discussed in Sect. 5. Finally, the paper's conclusion is given in Sect. 6.
2 Overview

2.1 Background

GA are based on Darwinian principles [13] and provide a robust mechanism for the search and optimization [14] that is inherent in NP-complete problems. These algorithms maintain candidate solutions as a population that evolves until a predefined criterion is satisfied. The candidate solutions are called chromosomes (or genomes). This population is guided using crossover and mutation to explore the search space and find the optimal solution based on a fitness function.
2.2 GA

For using GA to solve the TSP, each feasible tour (chromosome) is represented by a permutation π : [1 . . . n] → [1 . . . n], which is called an adjacency representation [15]. The tours are generated randomly to form the initial population. Chromosomes in the population are evaluated by the tour distance, and the tour with minimum distance is designated as the fittest chromosome:

$$\text{fitness} = \frac{1}{\sum_{i=1}^{n-1} d(c_{\pi(i)}, c_{\pi(i+1)}) + d(c_{\pi(n)}, c_{\pi(1)})} \qquad (6)$$

where

chromosome = cπ(1), cπ(2), cπ(3), . . . , cπ(n)
d(ci, cj) = distance from ci to cj

Subsequent populations are generated by the application of GA operators, viz. parent selection, crossover, mutation and survivor selection, as presented in Algorithm 1 [16]. This process of generating populations is repeated till a predefined termination condition is satisfied. With each generation, the solution is improved in order to find the best tour(s), thereby making GA a continual population-based improvement algorithm.

Algorithm 1 Basic GA
1: procedure BasicGA()
2:   population ← random candidate solutions
3:   populationFitness ← fitness(each candidate)
4:   while (terminationCondition is pending) do
5:     while (newPopulation < populationSize) do
6:       parents ← selectParents(population)
7:       child ← crossover(parents)
8:       child ← mutate(child)
9:       newPopulation ← newPopulation + child
10:    end while
11:    newPopFitness ← fitness(newPopulation)
12:    population ← selectSurvivors(newPopulation, population)
13:  end while
14: end procedure
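A brief Python sketch of the adjacency representation and the fitness of Eq. (6) is given below; the distance-matrix input and the toy values are assumptions made for illustration.

```python
import numpy as np

def tour_fitness(tour, dist):
    """Fitness of a tour (a permutation of 0..n-1) given a distance matrix,
    per Eq. (6): the reciprocal of the closed tour length."""
    length = sum(dist[tour[i], tour[i + 1]] for i in range(len(tour) - 1))
    length += dist[tour[-1], tour[0]]  # return to the starting city
    return 1.0 / length

# Toy 4-city symmetric distance matrix (illustrative values only)
dist = np.array([[0, 2, 9, 10],
                 [2, 0, 6, 4],
                 [9, 6, 0, 8],
                 [10, 4, 8, 0]])
print(tour_fitness([0, 1, 3, 2], dist))  # fitness of tour 0-1-3-2-0
```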
2.3 GA Operators

The GA applies the following operators to the population to find optimal solutions:

• Termination Condition. This is the predetermined condition which decides when the GA is to be stopped. Once the GA is terminated, the fittest chromosome is reported as the best solution. Some common criteria used to determine the termination condition are [16]:
– The maximum number of allowed CPU cycles has been utilised.
– The GA has reached the maximum allowed generations.
– No improvement in fitness above a threshold value has been observed for a given number of generations.
– Population diversity has reduced below a designated threshold value.
• Select Parents. This operator selects chromosomes to mate and generate new children. Most GA implementations mate two chromosomes at a time; however, multi-parent GA implementations have also been proposed. Common schemes for the selection of parents are:
– Fitness Proportional Selection (FPS). Here, the fitter chromosomes have a higher probability of being selected as parents.
– Random Selection. This is a fair selection, where every chromosome has an equal chance of being a parent.
– Ranking Selection. Here, the chance of a chromosome being selected as a parent is based on its ranking in the population.
• Crossover. The crossover operator generates a valid tour (child) which in general includes genes (edges) of the parents. For a survey on crossover operators, the reader may refer to [17]. Widely applied crossover operators for the TSP are:
– Order Crossover (OC) [18, 19]. Two crossover points are selected at random. In the offspring, for the positions that fall between the two cross points, alleles are copied from parent1 without any changes. For the remaining locations, non-duplicative alleles are copied from parent2 to the offspring, beginning at the position following the second cross point and proceeding circularly. The offspring in this case inherits the order and position of symbols in a sub-sequence of the first parent and also preserves the relative order of the remaining symbols from the second parent [20].
– Modified Order Crossover (MOX) [21]. Here, initially, OX is applied with two crossover points and a child is created. Thereafter, the child's alleles from the start to the first cross point and from the second cross point to the end of the offspring are flipped.
– Cycle Crossover (CX) [22]. Like OC, CX also has two crossover points, and within these cross points, alleles are copied from parent1 without any changes. For the rest of the positions, alleles are taken from parent2, and the position of each allele is inherited from either of the two parents.
– Partially Mapped Crossover (PMX) [14]. The offspring created by PMX preserves the order and position of symbols in a subsequence of parent1, while for parent2, the order and position of the remaining symbols are preserved to the maximum extent feasible.
• Mutation. Like crossover, this is also a diversity operator; it is applied to one chromosome and results in a new chromosome, which generally replaces the original. Many mutation operators are proposed in the literature, and in this study we apply simple inversion mutation (SIM) [10]. For the application of SIM, two points are randomly selected in a chromosome, and the positions of all alleles between these points are reversed. This mutation is one of the bases for the famous 2-opt heuristic [23].
• Select Survivor. After the new population has been generated, this operator selects the survivors. It is also called the replacement mechanism. This operator, along
with Select Parents, is responsible for maintaining pressure on the population to improve its fitness. Survivor selection can be based on fitness, age, diversity or any other suitable criterion.
2.4 GA Parameters

Perusal of the basic GA presented in Algorithm 1 shows the dependence of the output on design decisions. These decisions are primarily about parameter settings [15, 25]. While the number of parameters may vary based on the specific implementation, the most important parameters in GA are:

• Population Size (μ). This parameter defines the number of chromosomes in each population.
• Mutation Rate (pm). This is the probability of the mutation operator being applied to an individual chromosome. This is the safeguard that prevents the GA from converging to a local optimum.
3 Proposed Crossover

3.1 Motivation

The GA framework attempts to learn from nature and implement natural selection in computer algorithms to achieve results. Recent studies of natural biological systems indicate a major role of culture and society in evolution, and this is formalised as dual inheritance theory (DIT) [26]. The theory propounds that there exists a continual feedback loop between genetic selection and culture (society). Therefore, society has influence on genetic selection and vice versa. Profound evidence of DIT has been demonstrated in the following examples:

• Lactase Persistence. The inability to produce enzymes for the digestion of milk is normal behaviour in all mammals, including humans. However, based on cultural development, humans developed lactase persistence. Based on DIT, strong evidence has been provided through computer simulations of the co-evolution of lactase persistence and the practice of dairying in society [27].
• Heat-Generating Allele. The WARS2 and TBX15 alleles have a much higher selection rate in societies living in extremely cold areas. This genetic preference in societies living in cold areas promotes heat-generating body fat as a safeguard from extreme temperature [28].
• Tibetan Adaptation [29]. Tibetans as a society have a higher selection of several genes that sustain individuals with a relatively lower amount of oxygen. The EPAS1 allele
is one of the most significant examples; it relates to haemoglobin production and spread through 40% of high-altitude Tibetans about 3000 years ago.

We derive our motivation from the following key concepts:

• DIT. The SGAI operator proposed in this section is based on the implementation of DIT concepts along with Darwinian principles.
• Social Disaster Techniques (SDT). We undertake multiple iterations of GA to generate better solutions using SDT.
• No Duplicate Policy (NDP). Coupled with SDT, we shall introduce NDP in the next section towards the generation of multiple unique solutions. This experimentation framework ensures the novelty of the solution generated by each iteration of GA.
3.2 SGAI

Based on DIT, we propose the Societal Gene Acceptance Index (SGAI). This index is generated for each gene prior to crossover and provides the preference of the society for the selection of each allele (edge) during crossover. Towards the calculation of this index, the population is divided into three categories:

• Elites. These are the fittest chromosomes in the population.
• Dregs. These are the most unfit chromosomes in the population.
• Others. All chromosomes in the population which are neither elites nor dregs.

For the purpose of the above categorisation, we introduce a parameter elitePercent, which decides the percentage of the population designated as elite. The same percentage of worst chromosomes is designated as dregs. The SGAI for edge (i, j) increments each time the edge appears in the elites (Eq. 7) and decrements for each appearance of the edge in the dregs (Eq. 8). Finally, the index is normalised between [0, 1] by applying Eq. (9):

$$\text{edgeInElite}_{i,j} = \sum \text{edge}_{i,j} : \text{edge}_{i,j} \in \text{elite} \qquad (7)$$

$$\text{edgeInDreg}_{i,j} = \sum \text{edge}_{i,j} : \text{edge}_{i,j} \in \text{dreg} \qquad (8)$$

$$\text{SGAI}[i,j] = \left(1 + \frac{\text{edgeInElite}_{i,j} - \text{edgeInDreg}_{i,j}}{\text{number of elites}}\right) \div 2 \qquad (9)$$

For the symmetric TSP, SGAI_{i,j} = SGAI_{j,i}. The method for the calculation of SGAI, as discussed above, is presented in Algorithm 2.
Algorithm 2 SGAI generator
1: procedure SGAIGenerator(elites, dregs)
2:   n ← len(elites[0])                              ▷ number of nodes
3:   SGAI ← array(n × n) = 0                         ▷ initialise n × n array
4:   for edge_{i,j} ∈ {elites} do                    ▷ count edgeInElite
5:     SGAI[i, j] ← SGAI[i, j] + 1
6:   end for
7:   for edge_{i,j} ∈ {dregs} do                     ▷ count edgeInDreg
8:     SGAI[i, j] ← SGAI[i, j] − 1
9:   end for
10:  SGAI[i, j] ← (1 + SGAI[i, j] / len(elites)) ÷ 2 ▷ normalisation
11:  return SGAI                                     ▷ return array
12: end procedure
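A NumPy sketch of this generator is shown below; representing each chromosome as a Python list of cities and iterating over its consecutive edges are assumptions made for illustration.

```python
import numpy as np

def sgai_generator(elites, dregs, n):
    """Build the n x n SGAI array of Eq. (9) from elite and dreg tours."""
    sgai = np.zeros((n, n))
    for tour in elites:                        # count edges appearing in elites
        for i, j in zip(tour, tour[1:] + tour[:1]):
            sgai[i, j] += 1
            sgai[j, i] += 1                    # symmetric TSP
    for tour in dregs:                         # discount edges appearing in dregs
        for i, j in zip(tour, tour[1:] + tour[:1]):
            sgai[i, j] -= 1
            sgai[j, i] -= 1
    return (1 + sgai / len(elites)) / 2        # normalise towards [0, 1]
```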
3.3 SGAI Crossover

The proposed SGAI crossover operates on two parents as presented in Algorithm 3. The crossover takes the SGAI array and two parents as input. In lines 2 and 3 of Algorithm 3, the number of nodes is recorded in variable n, and the child is initialised with a random node; the while loop in line 4 ensures that the algorithm always produces a complete child (length = n).

Algorithm 3 SGAI crossover
1: procedure SGAICrossover(SGAI, parent)
2:   n ← len(parent[0])                         ▷ number of nodes
3:   child ← [rand[0, n]]
4:   while len(child) < n do
5:     nodes ← {N : child[first and last node]}
6:     edges ← {{E_{i,j} : i ∈ nodes} ∧ {E_{i,j} ∈ parent}}
7:     validEdges ← {edges \ {E_{i,j} ∈ child}}
8:     if validEdges ≠ {∅} then
9:       selectEdge ← max(SGAI_{validEdges})
10:    else
11:      edges ← {{E_{i,j} : i ∈ nodes}}
12:      validEdges ← {edges \ {E_{i,j} ∈ child}}
13:      selectEdge ← max(SGAI_{validEdges})
14:    end if
15:    child ← child + {selectEdge}
16:  end while
17:  return child                               ▷ return child of two parents
18: end procedure
With each iteration of the while loop, one edge is added to the child. An edge can be added to the child only at the beginning or the end. This ensures that once an edge is added to the child, it is not altered by subsequent iterations. The first and last nodes of the child are designated to the variable nodes in line 5 of the proposed algorithm. The two options for the selection of the edge to be added to the child are as follows:
Fig. 1 Feedback between genetic selection and society
• Firstly, edges present in either or both parents with the nodes designated in line 5 are considered eligible (line 6 of Algorithm 3). Out of these edges, only those are selected as valid edges which have at least one node that is not yet present in the child.
• In case no valid edge is found in the above-mentioned step, the dependency on the presence of the edge in the parents is relaxed. Thus, lines 11 and 12 of Algorithm 3 select all edges with a node designated in line 5 as valid edges, as long as they have at least one node that is not yet present in the child.

From the set of valid edges, an edge with the maximum SGAI is selected to be added to the child (lines 9 and 13 of Algorithm 3). In case multiple edges have the same value, one of them is selected randomly. Line 15 of the algorithm adds a node to the beginning or end of the child to ensure that the selected edge is created. Algorithm 3 at termination returns a child with alleles (edges) preferably selected from either parent, based on the acceptance of each allele by the society (SGAI). Study of Algorithms 2 and 3 shows the continual feedback loop between genetic selection and society (culture) as propounded by DIT. Explicitly, this feedback loop is shown with red flow-lines in Fig. 1.¹ A Python sketch of this crossover follows below.

¹ Figure 1 shows only the portion of the GA relevant to highlighting the feedback loop in the proposed crossover.
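A compact Python sketch of Algorithm 3, reusing the SGAI array from the previous sketch, is given below; the data structures and the deterministic tie-breaking in `max` are simplifying assumptions rather than the authors' exact implementation.

```python
import random

def edges_of(tour):
    """Set of undirected edges of a tour given as a list of cities."""
    return {frozenset(e) for e in zip(tour, tour[1:] + tour[:1])}

def sgai_crossover(sgai, parent1, parent2):
    n = len(parent1)
    child = [random.randrange(n)]               # start from a random node
    parent_edges = edges_of(parent1) | edges_of(parent2)
    while len(child) < n:
        ends = {child[0], child[-1]}            # edges may only extend the ends
        def candidates(pool):
            return [(i, j) for i in ends for j in range(n)
                    if j not in child
                    and (pool is None or frozenset((i, j)) in pool)]
        # Prefer edges present in a parent; relax if none is available
        valid = candidates(parent_edges) or candidates(None)
        i, j = max(valid, key=lambda e: sgai[e])  # highest SGAI wins
        child = [j] + child if i == child[0] else child + [j]
    return child
```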
4 Experimentation Framework and Parameters

4.1 Framework

For a fair comparison of the proposed crossover with the existing crossovers, a common framework is proposed in this section. Towards improving the general GA presented in
Algorithm 1, the following features from the existing literature are included in the framework:

• Social Disaster Techniques (SDT) [30]. SDT was proposed as a method to prevent premature convergence in GA. In this technique, the GA is restarted whenever a particular condition is met, through the application of a catastrophic operator. This mimics natural catastrophic events. In our implementation, we apply the SDT operator within terminationCondition, as presented in Algorithm 4. The terminationCondition operator takes the following three inputs:
– generationNo: the current generation number since the start of the GA.
– SDTNo: the number of times SDT has been applied to the GA.
– improveGen: the number of generations since the last improvement was observed after SDT (or since the start of the GA, if SDTNo = 0).

The terminationCondition operator decides either to stop the GA, apply SDT or continue the GA, based on a comparison of the above inputs with the following parameters:
– maxGenerations: the maximum number of generations allowed in the GA.
– maxSDT: the maximum number of SDT applications allowed in the GA.
– terminationThreshold: this threshold defines the maximum number of generations within which, if the fitness value of the best chromosome does not improve, either SDT is applied or the GA is terminated.

Upon application of SDT, the last population of the GA is saved as ancestors, and the GA continues with new randomly generated chromosomes. At the termination of the GA, the best tours from each SDT are available as ancestors. Ancestors are also used in the application of novelty search for each SDT, as will be discussed under "No Duplicate Policy".

• No Duplicate Policy (NDP). Duplicates in the population tend to reduce the diversity of the population and diminish the ability of the GA to explore. Here, we propose a technique equivalent to non-revisiting the same solution [31] in genetic algorithms, to avoid degeneration of the population due to many duplicated solutions. Elimination of duplicates in the TSP is complex, as the same tour may have multiple "reflections" and "rotations", as shown in the three tours below:

Tour 1 = 1 − 2 − 3 − 4 − 5
Tour 2 = 5 − 4 − 3 − 2 − 1
Tour 3 = 3 − 2 − 1 − 5 − 4
Therefore, to effectively eliminate duplicates, we align all chromosomes = cπ(1), cπ(2), cπ(3), . . . , cπ(n) such that:

cπ(1) = 1
cπ(2) < cπ(n)
This method ensures that all duplicate tours have the same alleles in every location; see the canonicalisation sketch after this paragraph. To further simplify the implementation, each aligned tour is converted to a string, and duplicate strings are eliminated using string matching.² NDP is applied in two stages in the selectSurvivors operator, as discussed below:

– NDP in Population. Prior to the selection of survivors, duplicates are eliminated from the combined population of parents and offspring. This ensures that only unique chromosomes get selected into the next population.
– NDP Between SDTs. Further, to ensure the novelty of the solution in each SDT, all copies of ancestors are eliminated from the current SDT prior to the selectSurvivor operator. This forces every SDT to search for solutions that are different from the solutions obtained by previous SDT runs.

² It is cheaper to eliminate duplicate strings in Java and Python than to match each node of a tour to identify duplicates.
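A minimal Python sketch of this alignment and string-based deduplication follows; tours are assumed to be lists of city labels starting from 1.

```python
def canonicalize(tour):
    """Rotate the tour to start at city 1 and fix its direction, so that all
    reflections/rotations of the same tour map to one representative."""
    k = tour.index(1)
    t = tour[k:] + tour[:k]          # rotate so the tour starts at city 1
    if t[1] > t[-1]:                 # enforce c_pi(2) < c_pi(n)
        t = [t[0]] + t[1:][::-1]     # reverse the direction of travel
    return t

def deduplicate(population):
    seen, unique = set(), []
    for tour in population:
        key = "-".join(map(str, canonicalize(tour)))  # string matching
        if key not in seen:
            seen.add(key)
            unique.append(tour)
    return unique

# Tours 1-3 from the example above all map to the same representative
print(canonicalize([3, 2, 1, 5, 4]))  # -> [1, 2, 3, 4, 5]
```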
Algorithm 4 Termination condition
1: procedure terminationCondition(generationNo, SDTNo, improveGen)
2:   generationNo ← generationNo + 1
3:   if generationNo ≥ maxGenerations then
4:     ancestors ← ancestors + last population
5:     StopGA
6:   else if improveGen ≥ terminationThreshold then
7:     SDTNo ← SDTNo + 1                        ▷ SDT operation
8:     ancestors ← ancestors + last population
9:     if SDTNo ≥ maxSDT then
10:      StopGA
11:    else
12:      population ← random population
13:      ContinueGA
14:    end if
15:  else
16:    ContinueGA
17:  end if
18: end procedure
4.2 Operators and Parameters

The set of operators and parameters, based on the GA and the framework discussed above, is listed below:

• μ = 200
• pm = 0.3
• elitePercent = 20%
• maxGAiteration = 5000
• maxSDT = 5
• terminationThreshold = 100
• mutation = SIM
• parentSelection = Random
• selectSurvivor = fittest of (parents + offspring) [with NDP]
4.3 Experimentation

Benchmark problems from TSPLIB [32] were used to evaluate the performance of the crossover operators in the GA framework explained above. During experimentation, all assumptions regarding the distance matrix as per the TSPLIB documentation were adhered to. The results presented in the next section are based on 25 runs of the GA on each problem for each crossover operator. The algorithms for experimentation were implemented in Python on a Linux OS.
5 Results

5.1 Readings

The values obtained from the experiment are presented in Table 1. The following values are shown in the table for each TSP problem with the application of the five crossovers:

• Average Tour (tour). This is the average tour length for each problem over the 25 runs. This value has been rounded to the nearest integer, only to fit the table into the page setting. The least value reported for each problem is shown in bold font.
• Standard Deviation (σ). This is the value of the standard deviation from the average tour over the 25 runs.
• Relative Error (re). This is the calculated relative error based on the known optimal value, using Eq. 10:

$$re = \frac{\text{Solution Value} - \text{Optimal Value}}{\text{Optimal Value}} \times 100 \qquad (10)$$
Table 1 Record of tours obtained for TSP problems: the average tour, standard deviation (σ) and relative error (re) over 25 runs for each of the OX, MOX, Cyclic, PMX and SGAI crossovers on the benchmark problems bayg29, att48, eil51, berlin52, st70, eil76, pr76, gr96, rat99, rd100, kroA100, kroC100, kroD100, kroE100 and eil101.
ᵃ Values of tour have been rounded to the nearest integer, to fit the table
Fig. 2 Plot of relative error and standard deviation for crossovers
5.2 Plots

For a graphical appreciation of the performance of each crossover, line plots of the relative error and standard deviation against the number of nodes are presented in Fig. 2.
5.3 Evaluation of Results

Study of the comparative values for each crossover in Table 1 and the plots of these values in Fig. 2 indicates that all crossovers were able to produce close to the optimal tour length for problem bayg29.tsp. This is attributed to the experimental framework, which incorporates SDT and NDP, allowing each crossover to perform efficiently. As the value of n increases, differences in the performance of the crossovers are highlighted. The best tours are always reported by either the OX or the SGAI crossover. The following tests were applied to further investigate the results with statistical rigour:

• Pairwise Victory-Draw-Defeat. A pairwise summary of victory-draw-defeat with respect to tour between each pair of crossovers is presented in Table 2. The dominance of SGAI in producing better results on the set of test-case problems is clearly demonstrated by the number of wins in the last column of Table 2. The analysis shows that SGAI produces better or equivalent results as compared to all other crossovers, except in two cases where OX generated a marginally better tour.
Table 2 Pairwise victory-draw-defeat w.r.t. tour

Crossover  OX  MOX     Cyclic  PMX     SGAI
OX         –   15–0–0  15–0–0  15–0–0  2–1–12
MOX        –   –       15–0–0  0–0–15  0–0–15
Cyclic     –   –       –       0–0–15  0–0–15
PMX        –   –       –       –       0–0–15
• Wilcoxon Signed Ranks Test [33]. This non-parametric test is employed for testing hypotheses in situations which involve a design with two samples, and it is used to detect significant differences between two sample means, indicating the behaviour of two algorithms. The test is applied here to compare the SGAI crossover against the other crossovers. Equal values (ties) for each pair of crossovers are first removed, and the sums of positive (R+) and negative (R−) ranks are calculated based on the winning value of tour. Thereafter, the test value (T) is designated as:

T ← min(R+, R−)

With the significance level (α) shown in Table 3, comparison of Tcrit with the test value (T) proves the superiority of SGAI vis-à-vis the other crossovers for the test cases.

Table 3 Pairwise Wilcoxon signed ranks test for SGAI

Crossovers compared  Ties (a)  R+ (b)  R− (c)  T = min(b, c)  Significance (α)  Tcrit for n = 15 − (a)
SGAI-OX              1         95      10      10             0.005             12
SGAI-MOX             0         120     0       0              0.001             8
SGAI-cyclic          0         120     0       0              0.001             8
SGAI-PMX             0         120     0       0              0.001             8

Table 4 Record of time to reach best tour

Problem    OX       MOX      Cyclic   PMX      SGAI
bayg29     86.12    345.04   682.36   303.24   137.56
att48      413.2    608      751.08   382.28   479.32
eil51      713.83   1093.5   1210.87  792.97   870.4
berlin52   706.39   1000.11  1026.07  1025.82  969.86
st70       906.07   989.39   1158.96  1179.89  1229.71
eil76      710.28   722.2    894.12   763.92   1482.4
pr76       840.12   755.24   1003.52  799.44   1768.96
gr96       949.08   850.48   1020.96  865.68   3131.68
rat99      1340.25  1071.75  1267.35  1382.30  5523.25
rd100      833.34   810.86   983.00   774.28   3900.21
kroA100    824.96   783      937.16   998.84   3826.4
kroC100    854.8    761.88   1082.92  909.08   3779.64
kroD100    877.96   876.76   1072.12  689.64   3387.64
kroE100    960.17   886.73   1035.03  883.77   4921.63
eil101     915.96   851.2    1030.84  827.64   2931.6
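For readers who wish to reproduce the Wilcoxon comparison, a hedged sketch using SciPy is given below; the per-problem tour arrays are toy placeholders, not the study's data.

```python
from scipy.stats import wilcoxon

# Paired average tours per problem for two crossovers (toy placeholder values)
sgai_tours = [98.2, 101.5, 97.0, 99.1, 96.8]
ox_tours = [100.0, 102.3, 99.5, 98.7, 101.2]

# Paired, two-sided Wilcoxon signed-rank test; ties (zero differences)
# are dropped by default, matching the procedure described above
stat, p_value = wilcoxon(sgai_tours, ox_tours)
print(stat, p_value)
```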
Fig. 3 Plot of average time for crossovers
5.4 Time Evaluation

The average time (time) to reach the best reported tour by each crossover on each individual problem during the 25 runs of the experiment is presented in Table 4. These values are plotted against the number of nodes (n) in Fig. 3. Initially, SGAI tends to reach the optimal solution faster than all other crossovers. However, beyond n > 70, the time taken by the SGAI crossover increases quickly. This increase in time is attributed to the requirement of referring to the SGAI index (an n² array) during every crossover of parents. More research is required to achieve a combined application of DIT in GA without the additional time overhead in larger problems.
5.5 SGAI Array Progression

During experimentation, values of the SGAI array were also observed. A plot of two half SGAI arrays is presented in Fig. 4, with the values of i and j on the x-axis and y-axis, respectively. On the z-axis of the figure, the value of the SGAI index is plotted for edge(i, j). The blue portions show the SGAI values when the array has been created from a population initialised randomly at the beginning of the GA, while the orange portion represents the SGAI values just prior to the application of the SDT operator. The following interesting observations can be inferred from Fig. 4:

• The trench in the middle of the plots represents the part of the array where i = j, for which no SGAI value exists.
• The values of SGAI indicate that the society has no strong preferences at the beginning of the GA.
• The society has very strong acceptances and dislikes for genes just prior to the application of SDT.
Fig. 4 Dual SGAI plots
• Observation of the intermediate values of SGAI brings forth that the values of the array tend to converge as the population converges. This validates the feedback mechanism highlighted in Fig. 1.
6 Conclusion

Successful applications of GA have been found in different fields. However, the dilemma between exploration and exploitation, along with the fear of getting stuck in local optima, poses a persistent threat. The proposal in this paper makes the following salient contributions:

• Juxtaposing DIT with GA provides a feedback mechanism for superior results with stronger convergence.
• Novelty in the solutions is generated by combining SDT and NDP.

This paper successfully introduces the use of the DIT principle with GA in a crossover operator, which has been shown to produce better results on the test cases. This paves the way for future work to find methods for combining DIT with GA while ensuring minimal time overheads.
References

1. D.L. Applegate, R.E. Bixby, V. Chvatál, W.J. Cook, History of TSP computation, in The Traveling Salesman Problem: A Computational Study (Princeton University Press, 2006), pp. 93–128. http://www.jstor.org/stable/j.ctt7s8xg.7
2. C.H. Papadimitriou, The Euclidean travelling salesman problem is np-complete. Theoret. Comput. Sci. 4(3), 237–244 (1977)
3. D.H. Wolpert, W.G. Macready, No free lunch theorems for optimization. IEEE Trans. Evol. Comput. 1(1), 67–82 (1997)
4. G. Gutin, A.P. Punnen, The Traveling Salesman Problem and Its Variations, vol. 12 (Springer Science & Business Media, 2006)
5. R. van Bevern, V.A. Slugina, A historical note on the 3/2-approximation algorithm for the metric traveling salesman problem. Hist. Math. (2020)
6. A.H. Halim, I. Ismail, Combinatorial optimization: comparison of heuristic algorithms in travelling salesman problem. Arch. Comput. Methods Eng. 26(2), 367–380 (2019)
7. D. Eppstein, The traveling salesman problem for cubic graphs. J. Graph Algorithms Appl. 11(1), 61–81 (2007)
8. E. Osaba, X.-S. Yang, J. Del Ser, Traveling salesman problem: a perspective review of recent research and new results with bio-inspired metaheuristics, in Nature-Inspired Computation and Swarm Intelligence (Elsevier, 2020), pp. 135–164
9. R. Purkayastha, T. Chakraborty, A. Saha, D. Mukhopadhyay, Study and analysis of various heuristic algorithms for solving travelling salesman problem—a survey, in Proceedings of the Global AI Congress 2019 (Springer, Singapore, 2020)
10. J.H. Holland, Adaptation in Natural and Artificial Systems: An Introductory Analysis with Applications to Biology, Control, and Artificial Intelligence (1975)
11. M. Dorigo, V. Maniezzo, A. Colorni, Ant system: optimization by a colony of cooperating agents. IEEE Trans. Syst. Man Cybern. Part B (Cybern.) 26(1), 29–41 (1996)
12. R. Eberhart, J. Kennedy, A new optimizer using particle swarm theory, in MHS'95. Proceedings of the Sixth International Symposium on Micro Machine and Human Science (IEEE, 1995), pp. 39–43
13. C. Darwin, On the Origin of Species by Means of Natural Selection (John Murray, London, 1959)
14. D.E. Goldenberg, Genetic Algorithms in Search, Optimization and Machine Learning (1989)
15. Y.-A. Zhang, M. Sakamoto, H. Furutani, Effects of population size and mutation rate on results of genetic algorithm, in 2008 Fourth International Conference on Natural Computation, vol. 1 (IEEE, 2008), pp. 70–75
16. A.E. Eiben, J.E. Smith, What is an evolutionary algorithm?, in Introduction to Evolutionary Computing (Springer, 2015), pp. 25–48
17. G. Pavai, T. Geetha, A survey on crossover operators. ACM Comput. Surv. (CSUR) 49(4), 1–43 (2016)
18. L. Davis, Applying adaptive algorithms to epistatic domains. IJCAI 85, 162–164 (1985)
19. L. Davis, Handbook of Genetic Algorithms (CumInCAD, 1991)
20. B. Fox, M. McMahon, Genetic operators for sequencing problems. Found. Genet. Algor. Elsevier 1, 284–300 (1991)
21. S.M. Abdel-Moetty, A.O. Heakil, Enhanced traveling salesman problem solving using genetic algorithm technique with modified sequential constructive crossover operator. Int. J. Comput. Sci. Netw. Secur. (IJCSNS) 12(6), 134 (2012)
22. I. Oliver, D. Smith, J.R. Holland, Study of permutation crossover operators on the traveling salesman problem, in Genetic Algorithms and Their Applications: Proceedings of the Second International Conference on Genetic Algorithms, Massachusetts Institute of Technology, Cambridge, MA, 28–31 July 1987 (L. Erlbaum Associates, Hillsdale, NJ, 1987)
23. P. Larranaga, C.M.H. Kuijpers, R.H. Murga, I. Inza, S. Dizdarevic, Genetic algorithms for the travelling salesman problem: a review of representations and operators. Artif. Intell. Rev. 13(2), 129–170 (1999)
24. W. Banzhaf, The "molecular" traveling salesman. Biol. Cybern. 64(1), 7–14 (1990)
25. R. Greenwell, J. Angus, M. Finck, Optimal mutation probability for genetic algorithms. Math. Comput. Model. 21(8), 1–11 (1995)
26. L.L. Cavalli-Sforza, M.W. Feldman, Cultural Transmission and Evolution: A Quantitative Approach (Princeton University Press, 1981)
27. Y. Itan, A. Powell, M.A. Beaumont, J. Burger, M.G. Thomas, The origins of lactase persistence in Europe. PLoS Comput. Biol. 5(8) (2009)
28. F. Racimo, D. Gokhman, M. Fumagalli, A. Ko, T. Hansen, I. Moltke, A. Albrechtsen, L. Carmel, E. Huerta-Sánchez, R. Nielsen, Archaic adaptive introgression in tbx15/wars2. Mol. Biol. Evol. 34(3), 509–524 (2017)
29. T.S. Simonson, Y. Yang, C.D. Huff, H. Yun, G. Qin, D.J. Witherspoon, Z. Bai, F.R. Lorenzo, J. Xing, L.B. Jorde et al., Genetic evidence for high-altitude adaptation in Tibet. Science 329(5987), 72–75 (2010)
30. K. Melikhov, V. Kureichick, A. Melikhov, V. Miagkikh, O. Savelev, A. Topchy, Some new features in genetic solution of the traveling salesman problem, in Adaptive Computing in Engineering Design and Control'96 (ACEDC'96), 2nd International Conference of the Integration of Genetic Algorithms and Neural Network Computing and Related Adaptive Techniques with Current Engineering Practice (Citeseer, 1996)
31. S. Yansen, G. Neng, T. Ye, Z. Xingyi, A non-revisiting genetic algorithm based on a novel binary space partition tree. Inf. Sci. 512, 661–674 (2020). https://doi.org/10.1016/j.ins.2019.10.016
32. G. Reinelt, TSPLIB—a traveling salesman problem library. ORSA J. Comput. 3(4), 376–384 (1991)
33. J. Derrac, D. Molina, F. Herrera, A practical tutorial on the use of nonparametric statistical tests as a methodology for comparing evolutionary and swarm intelligence algorithms. Swarm Evol. Comput. 1(1), 3–18 (2011)
The Method of Neuro-fuzzy Calibration of Geometrically Distorted Images of Digital X-Ray Tomographs Sergey Gorbachev , Dmytro Shevchuk , Victor Kuzin , Siddhartha Bhattacharyya , and Wang Zhijian
Abstract The classification of digital radiography systems for the detection of X-rays and the main quality parameters of radiographic images are reviewed. Calibration is the process of segmentation of a distorted image into a finite number of clusters, in which each pixel of the image is associated, with a certain degree of membership, with the crystal that was struck by a gamma ray. The physical foundations and interpretation of the neuron learning procedure in Kohonen fuzzy cellular neural networks have been investigated. On their basis, a method for the automatic calibration of digital X-ray radiography systems is described, which reduces to the neuro-fuzzy segmentation of a distorted image into a finite number of clusters; a neural-network-based technique is proposed for the automatic calibration of digital X-ray tomographs that does not require user intervention. To improve the calibration accuracy, Kohonen's adaptive fuzzy clustering networks are used. The architecture of Kohonen's three-layer fuzzy cellular neural network (CNN-SOM) is proposed.

Keywords Calibration · Neural network · AI networks · Flat-panel detector · Kohonen's fuzzy cellular neural network
S. Gorbachev National Research Tomsk State University, Tomsk, Russian Federation e-mail: [email protected] D. Shevchuk (B) National Technical University of Ukraine “Kyiv Polytechnic Institute”, Kyiv, Ukraine V. Kuzin Russian Academy of Engineering, Moscow, Russian Federation 125009 S. Bhattacharyya Rajnagar Mahavidyalaya, Birbhum, India W. Zhijian Robotics Institute, Zhejiang University, Ningbo, China © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 S. Bhattacharyya et al. (eds.), Intelligence Enabled Research, Studies in Computational Intelligence 1029, https://doi.org/10.1007/978-981-19-0489-9_13
1 Introduction

The modern literature describes methods for the automated calibration of digital X-ray tomographs, but they are usually designed for interactive correction of the results by the user and do not consider situations in which the same pixel may belong, with various levels of membership, to several types (crystals) at once. This chapter proposes a neural network-based technique for the automatic calibration of digital X-ray tomographs that does not require user intervention. To improve the calibration accuracy, Kohonen's adaptive fuzzy clustering networks are used [1–3], while a new concept of a neighbourhood and an additional specific training stage are introduced.

According to the presented developments, modern digital radiography systems based on the principle of X-ray radiation detection can be divided into four types [4].

1. Systems that digitize X-ray images obtained using various types of X-ray image intensifiers (EOP, REOP) are used in cardiography, in subtraction angiography, and in targeted radiography. All these systems use an analog-to-digital converter (whose main part is a charge-coupled device—a CCD matrix), which converts analog signals into digital ones. The latter are recorded in the form of a digital image matrix. When an image is displayed, the digital value of each pixel is transformed into a point of a certain brightness on the screen of a cathode-ray tube or into a certain shading density on a hard copy of the image. A CCD matrix is a plate on whose surface photosensitive elements are connected by an electric circuit to microcapacitors. The light from the X-ray image intensifier generates an electric current in the photosensitive elements, which charges the microcapacitors. The amount of this charge depends on the intensity of the light. The numerical values of the charge of each capacitor are recorded in the computer memory in the form of a digital matrix. Each individual light-sensitive element can be represented as a pixel. A matrix of (2.5 × 4.25) cm can contain up to 50 million pixels. The experience of operating devices of this type has not yet been reflected in the literature, but from the characteristics presented it can be assumed that the image quality and the operational and dosimetric parameters of these installations are at a good level.

2. Digital radiography using storage phosphor systems ranks second in frequency of use. The method was developed in the early 1980s, but only recently have the technological and economic aspects of this system been considered for wide clinical use. This method is based on fixing an X-ray image with a storage phosphor. A screen coated with such a phosphor looks like a conventional intensifying screen and functions in a similar way, storing information in the form of a latent image for later reading and display. A latent image on such a screen can persist for a long time (up to 6 h). During this period, the image can be read from the screen by the scanning system and reproduced on the cathode-ray tube. The latent image is read by an infrared laser, which sequentially, line by line, runs across the screen and stimulates the phosphor
(hence the name "stimulated phosphor system"). Under the action of the laser, the energy accumulated by the phosphor is released in the form of flashes of light; this phenomenon is called photostimulated luminescence. The glow, like that of conventional intensifying screens, is proportional to the number of X-ray photons absorbed by the storage phosphor. Flashes of visible light are converted by a photomultiplier tube into a series of electrical signals and then into digital signals; in this latter form, a binary matrix reflects the brightness of each pixel. The latent image remaining on the screen is erased by intense visible-light illumination, after which the screen can be reused many times. Such matrices allow a spatial resolution from 2.5 line pairs/mm (with a pixel size of 0.2 mm) to 5–6 line pairs/mm (with a pixel size of 0.1 mm), with 256 levels of greyscale brightness (8 bits/pixel). These spatial-resolution characteristics are in no way inferior to modern systems for conventional radiography.

3. Digital radiography based on various semiconductor detectors is recognized as the most promising direction. It is believed that direct detection of X-ray radiation using a semiconductor detector working in direct communication with a computer can significantly improve the quality of a digital image. Many researchers see the ideal option for digital conversion in the direct registration of X-rays with a full-scale solid-state conversion structure (matrix) capable of forming a digital image on an area of (400 × 400) mm with at least 4000 × 4000 pixels and brightness gradations up to 12 bits. A matrix is a two-dimensional surface divided into cells. Each cell is capable of registering X-ray quanta one by one: a quantum that falls on the matrix is assigned to a specific cell and is summed with the quanta previously accumulated by that cell (Fig. 1).

Fig. 1 Schematic representation of a two-layer solid matrix: 1—matrix; 2—scintillation layer; 3—semiconductor layer; 4—reader

4. The fourth type of digital X-ray system is based on gas detectors. Here, the digital signal is obtained by direct detection of X-ray quanta without an intermediate analog image. The X-ray flux is supplied to the investigated area in the form of a flat, fan-shaped horizontal beam, which is formed by a diaphragm with a slit of 0.5 to 2.0 mm.

The distribution of the X-ray radiation that has passed through the object is recorded by a detector—a multi-wire proportional chamber (MPC). The chamber is a system
filled with a mixture of gases (xenon and carbon dioxide), with positive and negative electric charges applied under high voltage to the anode and cathode of this system. The chamber picks up only signals above the sensitivity threshold of the amplifier–discriminator, so background radiation is not detected. The MPC contains 320 anode wires located 1.2 mm from each other, which form 320 channels for recording impulses. To increase the spatial resolution, each pair of adjacent amplifiers forms an additional coincidence channel through the selection circuit, which has an output to an axial counter. Pulses in the coincidence channel indicate that the ionization region lies between two adjacent anode wires (Fig. 2). The information accumulated by the counters is transferred to the computer during the line exposure. After the end of the frame shooting, an image matrix of (639 × 640) numbers (maximum image size 639 × 750) is accumulated in the computer memory, describing the distribution of radiation after passing through the patient's body. The normalized image appears on the PC screen 5 s after the end of scanning.

Fig. 2 Linear scanning system for digital chest radiography
2 Parameters of Radiographic Images' Quality

In accordance with the accepted terminology, a radiographic image is described by the following main parameters of image quality:

1. Basic spatial resolution SRb. Sometimes the concept of resolution is used instead, determined from the dependence of the contrast of neighbouring objects on the distance separating them. This relationship is called the modulation transfer function (MTF) or contrast transfer function. The reciprocal of the distance separating adjacent objects is called the spatial frequency and is measured, by analogy with film, in line pairs/mm.

2. Image blur, determined by the geometric blur divided by the magnification (projection blur) and by the detector blur described by SRb.
3. Signal-to-noise ratio SNR. Normalization of the measured signal-to-noise ratio to the basic spatial resolution is necessary because the measured value of the signal-to-noise ratio grows as the square root of the detecting pixels' area.

4. Contrast-to-noise ratio CNR. In general, CNR depends on the signal-to-noise ratio of the detector and on the effective absorption coefficient of the material.

5. Detection sensitivity (contrast sensitivity Cs = 1/CNR, where CNR is the contrast-to-noise ratio) of a small change w in the radiation thickness of the object (due to the presence of a defect in the structure), defined as the ratio of the contrast (variation of the signal intensity I) to the image noise level. The contrast-to-noise ratio for w can be computed from the signal-to-noise ratio (SNR) of the image, taking into account the absorption coefficient μ and the scattering coefficient k (equal to the ratio of the intensities of the scattered and primary radiation); an empirical numerical sketch follows this list.

6. Dynamic range (the range of radiation thicknesses of an object available for analysis on the same film). Since this value is the same for all film shots (restricted by the range of optical densities of 2–4.5 and by μeff), it is not considered among the film-radiography parameters taken into account in the standards. At the same time, the large dynamic range of the CDS can, in a number of practical applications, be of decisive importance when choosing a detector.
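To make these definitions concrete, the sketch below uses the common empirical region-of-interest estimates of SNR and CNR (an illustrative convention only, not the measurement procedure prescribed by the standards; the function and variable names are hypothetical):

```python
import numpy as np

def snr(region):
    """Empirical signal-to-noise ratio of a nominally uniform image region."""
    return region.mean() / region.std(ddof=1)

def cnr(defect, background):
    """Empirical contrast-to-noise ratio between a defect region and
    the surrounding background region."""
    noise = np.sqrt(0.5 * (defect.var(ddof=1) + background.var(ddof=1)))
    return abs(defect.mean() - background.mean()) / noise

def contrast_sensitivity(defect, background):
    """Contrast sensitivity as defined in the text: Cs = 1 / CNR."""
    return 1.0 / cnr(defect, background)
```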
3 Calibration of a Flat-Panel Detector

Calibration of a flat-panel detector (along with the correction of "bad" pixels) is one of the most important operations for ensuring that the signal-to-noise ratio and contrast sensitivity are significantly higher than those of other types of detectors (X-ray film and memory plates) [5]. Quite natural variations in the characteristics of the sensors in the panel, inhomogeneities in the distribution of the X-ray radiation, as well as the characteristics of the electronics, cause some differences in the signals from different pixels of the panel. Calibration makes it possible to correct the images for these differences.

Typically, calibration involves capturing images at full (brightfield), average (midfield) and zero (darkfield) dose loads. Darkfield imaging is used to obtain the baseline "dark" signal of the detector, which is determined by the photodiode currents, the TFT leakage currents and the differences between the various charge amplifiers used in the readout electronics. The brightfield and midfield images are used to calculate the gain, or response, of each pixel and the associated pickup amplifier.

One of the sources of noise in the detector is correlated linear noise, i.e. noise inherent in all pixels of a given line simultaneously. Modern software tools make it possible to correct this noise and thereby minimize the variations due to it. For this, a portion of the panel's sensors is masked from the scintillation screen and thus receives no light signal during the X-ray exposure. The signal from this part of the panel is correlated with the darkfield image to determine the correction that should be made to each line. The resulting calibration images are used to normalize the pixel response; a minimal numerical sketch of this flat-field normalization follows.
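The following sketch (illustrative only; the array names are hypothetical, and dead-pixel handling is reduced to a stub) applies the standard darkfield/brightfield flat-field correction to a raw frame:

```python
import numpy as np

def flat_field_correct(raw, dark, bright):
    """Normalize the per-pixel response using darkfield and brightfield frames.

    raw, dark, bright: 2-D arrays of the same shape (detector frames).
    Returns the gain-corrected image.
    """
    raw = raw.astype(np.float64)
    gain = bright.astype(np.float64) - dark   # per-pixel response to full dose
    gain[gain <= 0] = np.nan                  # flag dead / non-responding pixels
    corrected = (raw - dark) / gain           # remove offset, equalize gain
    corrected *= np.nanmean(gain)             # restore the overall intensity scale
    # A real system would interpolate dead pixels from their neighbours;
    # here they are simply zeroed.
    return np.nan_to_num(corrected, nan=0.0)
```

The midfield frame can additionally be used to check the linearity of each pixel's response between the dark and bright calibration points.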
The latest publications [6–8] cover the full spectrum of calibration methods. Some of them deal with arbitrary geometry [9] or with the direct determination of geometric alignment parameters [10]. There are also many cone-beam calibration methods (CBCM) with different modifications: for a misaligned scanner [11], an analytic method based on the identification of ellipse parameters [12], CBCM without dedicated phantoms [13] and with self-calibration of geometric and radiometric parameters [14]. Other methods are used for scans of unknown objects using complementary rays [15], for simultaneous misalignment correction for approximate circular cone-beam computed tomography [16] and for tomographic imaging systems with flat-panel detectors [5]. All these methods are designed mostly for interactive correction of the results by the user and do not consider situations in which the same pixel, with various levels of membership, can belong to several classes (crystals) at once. There is therefore a need to develop a fully automated calibration method.

Modern calibration methods use cellular neural networks (CNNs), which were introduced in [17, 18] and have been successfully applied in computational imaging models. In [8], it was proved that the Kohonen ANN can be trained with a cellular automaton (CA), which improves the speed and quality of self-learning; even so, the question of the separating properties of the network in the presence of intersecting clusters remained unsolved.
4 New Automated Calibration Method Based on a New Hybrid Architecture

The author has developed an automated calibration method based on a modern hybrid architecture (Fig. 3)—Kohonen's fuzzy cellular neural network (FCNN-SOM). The proposed FCNN-SOM architecture contains three layers:

1. A receptor (input) layer.
2. A layer of Kohonen neurons [19] with lateral connections, trained by the CA to determine the centroids of the image intensity peaks in the form of intersecting clusters.
3. An additional (output) fuzzy clustering layer, which calculates the membership levels of the current image vector for each cluster.
Let us consider how the FCNN-SOM works, stage by stage. Sets of input pixels x(k) = (x1(k), …, xn(k)) (here k = 1, …, N is the number of the image in the sample) from the receptor (zero) layer are sequentially fed to the neurons of the Kohonen layer Nj (j = 1, …, m), which has the topology of the crystal lattice of the X-ray detector matrix. The tunable synaptic weights wji(k) (j = 1, …, m; i = 1, …, n) determine the centroids of m overlapping clusters, corresponding to the m "best" intensity peaks in the image, where m is the number of crystals on the detector matrix.
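For illustration, the winner-take-all step implied by this feeding can be sketched as follows (standard SOM practice; the variable names merely transcribe the chapter's notation and are not the authors' code):

```python
import numpy as np

def find_winner(x, W):
    """Index of the Kohonen neuron whose weight vector is closest to x.

    x: input pixel vector of length n.
    W: weight matrix of shape (m, n), one row per neuron (crystal centroid).
    """
    distances = np.linalg.norm(W - x, axis=1)  # Euclidean distance to each centroid
    return int(np.argmin(distances))
```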
Fig. 3 Kohonen Fuzzy Cellular Neural Network Architecture
Stage 1. Preliminary processing of the data. The input data are processed over all coordinates so that the observations belong to the hypercube $[-1, 1]^n$. Centring can be done in two different ways:

1. relative to the average, calculated by the recurrent formula

$$m_i(k) = m_i(k-1) + \frac{1}{k}\bigl(x_i(k) - m_i(k-1)\bigr);$$

2. relative to the median, in order to impart robust properties to the centring procedure (protection from anomalous observations), according to the recurrent formula

$$me_i(k) = me_i(k-1) + \eta_m \,\mathrm{sign}\bigl(x_i(k) - me_i(k-1)\bigr), \quad i = 1, 2, \ldots, n,$$

where $\eta_m$ is the search step, selected in accordance with Dvoretzky's stochastic approximation conditions. A small sketch of both recurrences follows.
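Both recurrences translate directly into code; this minimal sketch (hypothetical names, scalar or vector inputs) performs one update step of each estimate:

```python
import numpy as np

def update_mean(m_prev, x, k):
    """Recurrent mean: m(k) = m(k-1) + (x(k) - m(k-1)) / k."""
    return m_prev + (x - m_prev) / k

def update_median(me_prev, x, eta_m):
    """Robust recurrent median estimate:
    me(k) = me(k-1) + eta_m * sign(x(k) - me(k-1))."""
    return me_prev + eta_m * np.sign(x - me_prev)
```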
Stage 2. Initialization of the network neurons, providing independent adaptive self-unfolding of the neurons with the formation of an ordered self-organizing Kohonen map. The method consists of the gradual retraction of inactive neurons by active neurons into the normalized hypercube of the space of training samples. For this, when the network is initialized, all neurons are initially placed at an arbitrary point on a hypersphere of sufficiently large radius R, whose centre coincides with the centre of the training sample space. The radius R of the hypersphere is chosen in such a way that the neurons on the hypersphere can never win and are "drawn" into the hypercube of training samples only through their connections with previously winning neurons, not through their own victories.

Stage 3 (modified). Training the Kohonen layer with a Moore cellular automaton. Since the task is to restore the structure of the grid of the matrix of crystals, this method determines the centres of the clusters and connects these centres according to the fixed topology of the Kohonen map. The neurons in the trained map should be located fairly uniformly in space, consistent with the geometric distortions existing in the output image and regardless of the differences in intensity levels associated with the peaks. Therefore, stage 3 can be called the stage of ordering the grid, during which the training procedure tries to build the grid of neurons on the input vectors in such a way that information reflecting the expected structure of the network is largely taken into account. The main idea is that a more specific concept of a neighbourhood is introduced, which takes into account the geometric nature of the neural network; accordingly, the weights of the winning neuron and of the neighbouring neurons within a certain distance of it are updated, but only for neurons on the same row (X-coordinate) or the same column (Y-coordinate) of the grid as the winner. This choice gives a higher weight to a grid topology that associates from the start with the centres of the intensity peaks.

Stage 3 of the calibration is implemented by training the Kohonen network with a cellular automaton (CA) with a von Neumann configuration, which assumes the presence of only horizontal and vertical connections between adjacent CA cells (SOM neurons). A CA cell is a neuron of the Kohonen layer, topologically connected in the von Neumann neighbourhood with 4 (at the border, with 2 or 3) neighbouring neurons of the two-dimensional self-organizing Kohonen map (Fig. 4). A cellular automaton is thus a collection of Kohonen-layer neurons that interact with their neighbours. In this case, the space of training examples is divided in the training iterations into multidimensional Voronoi neighbourhoods, whose centres are the neurons of the network, according to the nearest-neighbour principle. At the same time, the state vector of the k-th neuron of the Kohonen map can change if the neuron has become a winner, or if it is a neighbour of a neuron that has won. Then, for the k-th neuron with its von Neumann neighbourhood, the average total increment of the i-th component of its state vector over one epoch is

$$\Delta w_{ki} = \eta \, \frac{\sum_{m=k}^{k+4} \sum_{j=1}^{n_m} \bigl(x_{mi}^{(j)} - w_{ki}\bigr)}{\sum_{m=k}^{k+4} n_m}, \qquad (1)$$
Fig. 4 Topology of the Kohonen cell map with a von Neumann neighbourhood
where $\eta$ is the learning rate; $k$ is the number of the winning neuron; $m$ runs over the numbers of the neurons in the $k$-th neuron's neighbourhood; $L$ is the width of the Kohonen map; $n_m$ is the number of training samples assigned to the $m$-th neuron; $x_{mi}^{(j)}$ is the $i$-th coordinate of the $j$-th training sample of the $m$-th neuron; and $w_{ki}$ is the $i$-th coordinate of the $k$-th neuron.

Let us denote $S = \sum_{m=k}^{k+4} n_m$; then expression (1) can be rewritten as

$$\Delta w_{ki} = \eta \, \frac{1}{S} \sum_{m} n_m \bigl(\bar{x}_{mi} - w_{ki}\bigr), \qquad (2)$$

where $\bar{x}_{mi}$ is the $i$-th coordinate of the centre of gravity of the Voronoi neighbourhood of the $m$-th neuron. A sketch of this update appears below.
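A minimal sketch of update (2) for one winning neuron and its von Neumann neighbourhood (the data layout is a hypothetical choice; border neurons simply have fewer neighbours):

```python
import numpy as np

def update_neuron(W, k, neighbours, samples, counts, eta):
    """Apply the averaged increment of Eq. (2) to the winning neuron k.

    W          : (m, n) array of neuron weight vectors.
    k          : index of the winning neuron.
    neighbours : indices of k and its von Neumann neighbours.
    samples    : dict mapping neuron index -> (n_m, n) array of the training
                 samples falling in that neuron's Voronoi neighbourhood.
    counts     : dict mapping neuron index -> n_m.
    eta        : learning rate.
    """
    S = sum(counts[m] for m in neighbours)
    if S == 0:
        return W
    # counts[m] * (centre of gravity of neighbourhood m - current weights of k)
    increment = sum(
        counts[m] * (samples[m].mean(axis=0) - W[k])
        for m in neighbours if counts[m] > 0
    )
    W[k] = W[k] + eta * increment / S
    return W
```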
Stage 4. Fine-tuning, during which the neighbourhood of each neuron includes only itself: in practice, each input vector updates only the weights of the winning neuron, using a slowly decreasing learning rate, without changing the ordered grid structure achieved in the previous stage.

Stage 5 (optional). Training the network with the Bezdek fuzzy clustering algorithm [19–21]. In the absence of a clearly defined boundary between neighbouring clusters, an output fuzzy clustering layer is added to the Kohonen layer; its neurons Nj calculate the membership levels u(k) = (u1(k), …, um(k)) of the current image in the j-th crystal (j = 1, …, m).

Because the purpose is to restore the structure of the grid of the matrix of crystals, we must determine the centres of the clusters and connect these centres according to the fixed topology of the Kohonen map. As described in Stage 3, an intermediate grid-ordering stage is introduced into the learning process, between the traditional ordering stage and the fine-tuning stage [22, 23]. During this stage, the training procedure builds the network of neurons on the input vectors so that information reflecting the expected structure of the network is largely taken into account: the weights of the winning neuron and of the neurons within a certain distance of it are updated, but only for neurons on the same row (X-coordinate) or the same column (Y-coordinate) of the grid as the winner, which gives a higher weight to a grid topology associated from the start with the centres of the radiation intensity peaks.

Thus, in order to create an adaptive neural network system for calibrating an image recorded by a flat-panel X-ray detector, we essentially define the pixel clusters of the influence of each crystal on the output image: we find the centres of the peaks of the X-ray radiation intensity and connect them according to the topology of the trained Kohonen neural network. To improve the calibration accuracy, a neuro-fuzzy degree of membership in the corresponding crystal is calculated for each pixel.
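The membership levels computed by this output layer can be illustrated with the standard fuzzy c-means membership rule of Bezdek-type algorithms (a sketch assuming Euclidean distances and a fuzzifier β > 1; the authors' exact formulation may differ):

```python
import numpy as np

def membership_levels(x, centroids, beta=2.0, eps=1e-12):
    """Membership u_j of the pixel vector x in each of the m clusters,
    computed from its distances to the cluster centroids.

    centroids: (m, n) array of crystal centroids from the Kohonen layer.
    beta: fuzzifier (> 1); eps guards against division by zero.
    Returns an array u of length m with u.sum() == 1.
    """
    d = np.linalg.norm(centroids - x, axis=1) + eps
    inv = d ** (-2.0 / (beta - 1.0))
    return inv / inv.sum()
```

A pixel lying exactly on a crystal centre then receives a membership close to 1 for that crystal, while pixels near cluster boundaries are shared between neighbouring crystals, which is precisely the situation that the interactive methods mentioned in the introduction could not handle.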
5 Conclusion

The described method makes it possible to solve effectively complex information-processing problems in which the classes to be separated have arbitrary shapes and intersect with one another. It is characterized by high accuracy and efficiency in solving data clustering problems under conditions of uncertainty, including at image boundaries, by a low training time, and by robustness to noise, spatial distortion and large variability in pixel intensity. The method is fully automated and requires no interactive user intervention.
References

1. T. Kohonen, Self-Organizing Maps (BINOM. Knowledge Laboratory, Moscow, 2008), 655 p
2. V.I. Syryamkin, S.V. Gorbachev, S.B. Suntsov, Adaptive Neural Network Algorithms for Diagnostics of Materials, Equipment and Electronic Equipment (LAMBERT Academic Publishing, Saarbrucken, 2013), 269 p
3. J. Rantala, H. Koivisto, Optimized subtractive clustering for neuro-fuzzy models, in Proceedings of the 3rd WSES International Conference, Interlaken, Switzerland, 2002, 6 p
4. A.A. Mayorov, Digital technologies in radiation control. In the World of NDT, No. 1 (2007)
5. X. Li, D. Zhang, B. Liu, A generic geometric calibration method for tomographic imaging systems with flat-panel detectors—a detailed implementation guide. Med. Phys. 37, 3844 (2010)
6. D. Mery, T. Jaeger, D. Filbert, A review of methods for automated recognition of castings defects. Insight 44(7) (2002)
7. A.G. Vincent, V. Rebuffel, R. Guillemaud, L. Gerfault, P.Y. Coulon, Defect detection in industrial casting components using digital X-ray radiography. Insight 44(10), 623–627 (2002)
8. A. Schumm, R. Fernandez, J. Tabary, Inspection of complex geometries using radiographic simulation in CIVA, in Proceedings of International Conference on NDE in Nuclear Industry, 2009
9. N. Robert, K.N. Watt, X. Wang, J.G. Mainprize, The geometric calibration of cone-beam systems with arbitrary geometry. Phys. Med. Biol. 54(24), 7239–7261 (2009)
10. C. Mennessier, R. Clackdoyle, F. Noo, Direct determination of geometric alignment parameters for cone-beam scanners. Phys. Med. Biol. 54, 1633–1660 (2009)
11. Y. Sun, Y. Hou, F. Zhao, J. Hu, A calibration method for misaligned scanner geometry in cone-beam computed tomography. NDT E Int. 39(6), 499–513 (2006)
12. F. Noo, R. Clackdoyle, C. Mennessier, T.A. White, T.J. Roney, Analytic method based on identification of ellipse parameters for scanner calibration in cone-beam tomography. Phys. Med. Biol. 45(11), 3489–3508 (2000)
13. D. Panetta, N. Belcari, A. Del Guerra, S. Moehrs, An optimization-based method for geometrical calibration in cone-beam CT without dedicated phantoms. Phys. Med. Biol. 53(14), 3841–3861 (2008)
14. W. Wein, A. Ladikos, A. Baumgartner, Self-calibration of geometric and radiometric parameters for cone-beam computed tomography, in Proceedings of Fully3D 2011, 2011
15. K.M. Holt, Geometric calibration of third-generation computed tomography scanners from scans of unknown objects using complementary rays, in Proceedings of IEEE International Conference on Image Processing (ICIP 2007), 2007, vol. 4, pp. IV-129–IV-132
16. Y. Kyriakou, R. Lapp, L. Hillebrand, D. Ertel, W. Kalender, Simultaneous misalignment correction for approximate circular cone-beam computed tomography. Phys. Med. Biol. 53, 6267–6289 (2008)
17. V. Kaftandjian, O. Dupuis, D. Babot, Y. Min Zhu, Uncertainty modeling using Dempster–Shafer theory for improving detection of weld defects. Pattern Recognit. Lett. 24(1–3), 547–564 (2003)
18. A. Koenig, A. Glière, R. Sauze, P. Rizo, Radiograph simulation to enhance defect detection and characterization, in Proceedings of ECNDT, Copenhagen, Denmark, 24–29 May 1998
19. S. Gorbachev, N. Gorbacheva, S. Koynov, A synergistic effect in the measurement of neuro-fuzzy system. MATEC Web Conf. 79, 01065 (2016)
20. V.I. Syryamkin, S.V. Gorbachev, M.V. Shikhman, Adaptive neuro-fuzzy classifier for evaluating the technology effectiveness based on the modified Wang and Mendel fuzzy neural production MIMO-network. IOP Conf. Ser.: Mater. Sci. Eng. 516(1), 012037 (2019)
21. S. Gorbachev, V. Syryamkin, High-performance adaptive neurofuzzy classifier with a parametric tuning. MATEC Web Conf. 155, 01037 (2018)
22. V.I. Syryamkin, S.V. Gorbachev, M.V. Shikhman, Adaptive fuzzy neural production network with MIMO-structure for the evaluation of technology efficiency. IOP Conf. Ser.: Mater. Sci. Eng. 516(1), 012010 (2019) 23. S. Gorbachev, Intellectual multi-level system for neuro-fuzzy and cognitive analysis and forecast of scientific-technological and innovative development. MATEC Web Conf. 155, 01012 (2018)