Algorithms for Intelligent Systems Series Editors: Jagdish Chand Bansal · Kusum Deep · Atulya K. Nagar
Ritu Tiwari Apoorva Mishra Neha Yadav Mario Pavone Editors
Proceedings of International Conference on Computational Intelligence ICCI 2020
Algorithms for Intelligent Systems Series Editors Jagdish Chand Bansal, Department of Mathematics, South Asian University, New Delhi, Delhi, India Kusum Deep, Department of Mathematics, Indian Institute of Technology Roorkee, Roorkee, Uttarakhand, India Atulya K. Nagar, School of Mathematics, Computer Science and Engineering, Liverpool Hope University, Liverpool, UK
This book series publishes research on the analysis and development of algorithms for intelligent systems with their applications to various real world problems. It covers research related to autonomous agents, multi-agent systems, behavioral modeling, reinforcement learning, game theory, mechanism design, machine learning, meta-heuristic search, optimization, planning and scheduling, artificial neural networks, evolutionary computation, swarm intelligence and other algorithms for intelligent systems. The book series includes recent advancements, modification and applications of the artificial neural networks, evolutionary computation, swarm intelligence, artificial immune systems, fuzzy system, autonomous and multi agent systems, machine learning and other intelligent systems related areas. The material will be beneficial for the graduate students, post-graduate students as well as the researchers who want a broader view of advances in algorithms for intelligent systems. The contents will also be useful to the researchers from other fields who have no knowledge of the power of intelligent systems, e.g. the researchers in the field of bioinformatics, biochemists, mechanical and chemical engineers, economists, musicians and medical practitioners. The series publishes monographs, edited volumes, advanced textbooks and selected proceedings. All books published in the series are submitted for consideration in Web of Science.
More information about this series at http://www.springer.com/series/16171
Editors Ritu Tiwari Department of Computer Science and Engineering Indian Institute of Information Technology Pune Pune, Maharashtra, India
Apoorva Mishra Department of Computer Science and Engineering Indian Institute of Information Technology Pune Pune, Maharashtra, India
Neha Yadav Department of Mathematics and Scientific Computing National Institute of Technology Hamirpur Hamirpur, Himachal Pradesh, India
Mario Pavone Department of Mathematics and Computer Science University of Catania Catania, Italy
ISSN 2524-7565 ISSN 2524-7573 (electronic) Algorithms for Intelligent Systems ISBN 978-981-16-3801-5 ISBN 978-981-16-3802-2 (eBook) https://doi.org/10.1007/978-981-16-3802-2 © The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations. This Springer imprint is published by the registered company Springer Nature Singapore Pte Ltd. The registered company address is: 152 Beach Road, #21-01/04 Gateway East, Singapore 189721, Singapore
Preface
This conference proceedings volume of the “International Conference on Computational Intelligence (ICCI 2020)” consists of a collection of the manuscripts accepted after a rigorous peer review and presented at the conference held virtually on 12–13 December 2020. This was the first edition of the conference, organized by the Indian Institute of Information Technology, Pune (IIIT Pune), India, and the Soft Computing Research Society, Delhi, India. There were the following four technical tracks for paper presentation:
• Artificial Intelligence and Machine Learning
• Image Processing and Computer Vision
• IoT, Wireless Communication and Vehicular Technology
• Soft Computing
The motive of this conference was to bring together researchers working in computational intelligence and allied areas across the world and to disseminate knowledge on recent developments in artificial intelligence, nature-inspired and meta-heuristic algorithms (evolutionary and swarm-based algorithms), Internet of things, image processing, image segmentation, data clustering, sentiment analysis, big data, computer networks, signal processing, supply chain management, web and text mining, distributed systems, bioinformatics, embedded systems, expert systems, forecasting, pattern recognition, planning and scheduling, machine learning, deep learning, computer vision and many other areas. This aim was successfully achieved through the exchange of research evidence, personal scientific views and innovative ideas.
We would like to thank all the authors for contributing their work to this conference and proceedings.
The Editors
Dr. Ritu Tiwari, Pune, India
Dr. Apoorva Mishra, Pune, India
Dr. Neha Yadav, Hamirpur, India
Dr. Mario Pavone, Catania, Italy
Contents
1 Missing Data Imputation for Solar Radiation Using Generative Adversarial Networks
  Priyanshi Khare, Rajesh Wadhvani, and Sanyam Shukla
2 A Survey on Object Detection and Tracking in a Video Sequence
  T. Sugirtha and M. Sridevi
3 Story-Plot Recommender Using Bidirectional LSTM Network
  Manmohan Dogra, Jayashree Domala, Jenny Dcruz, and Safa Hamdare
4 A Novel Technique for Fake Signature Detection Using Two-Tiered Transfer Learning
  Yohan Varghese Kuriakose, Vardan Agarwal, Rahul Dixit, and Anuja Dixit
5 Personality Prediction Through Handwriting Analysis Using Convolutional Neural Networks
  Gyanendra Chaubey and Siddhartha Kumar Arjaria
6 A Review on Deep Learning Framework for Alzheimer’s Disease Detection from MRI
  Parinita Bora and Subarna Chatterjee
7 Machine Learning and IoT-Based Ultrasonic Humidification Control System for Longevity of Fruits and Vegetables
  A. K. Gautham, A. Abdulla Mujahid, G. Kanagaraj, and G. Kumaraguruparan
8 Classification of Melanoma Using Efficient Nets with Multiple Ensembles and Metadata
  Vardan Agarwal, Harshit Jhalani, Pranav Singh, and Rahul Dixit
9 Performance of Optimization Algorithms in Attention-Based Deep Learning Model for Fake News Detection System
  S. P. Ramya and R. Eswari
10 Exploring Human Emotions for Depression Detection from Twitter Data by Reducing Misclassification Rate
  D. R. Jyothi Prasanth, J. Dhalia Sweetlin, and Sreeram Sruthi
11 Artificial Neural Network Training Using Marine Predators Algorithm for Medical Data Classification
  Jayri Bagchi and Tapas Si
12 Code Generation from Images Using Neural Networks
  Chandana Nikam, Rahul Keshervani, Shravani Shah, and Jagannath Aghav
13 A Novel Method to Detect the Tumor Using Low-Contrast Image Segmentation Technique
  Priyanka Kaushik and Rajeev Ratan
14 The Detection of COVID-19 Using Radiography Images Via Convolutional Network-Based Approach
  Astha Singh, Shyam Singh Rajput, and K. V. Arya
15 Super-Resolution MRI Using Fractional Order Kernel Regression and Total Variation
  Vazim Ibrahim and Joseph Suresh Paul
16 Identification of the Diseased Portions of an Infected Aloe Vera Leaf by Using Image Processing and k-Means Clustering
  Sudip Chatterjee and Gour Sundar Mitra Thakur
17 Restoration of Deteriorated Line and Color Inpainting
  M. Sridevi, Shaik Naseem, Anjali Gupta, and M. B. Surya Chaitanya
18 Study to Find Optimal Solution for Multi-objects Detection by Background Image Subtraction with CNN in Real-Time Surveillance System
  Ravindra Sangle and Ashok Kumar Jetawat
19 Hole-Filling Method Using Nonlocal Non-convex Regularization for Consumer Depth Cameras
  Sukla Satapathy
20 Comparative Analysis of Image Fusion Techniques for Medical Image Enhancement
  Gaurav Makwana, Ram Narayan Yadav, and Lalita Gupta
21 Feature Selection and Deep Learning Technique for Intrusion Detection System in IoT
  Bhawana Sharma, Lokesh Sharma, and Chhagan Lal
22 Mobile Element Based Energy Efficient Data Aggregation Technique in Wireless Sensor Networks—Bridging Gap Between Virtuality and Reality
  Bhat Geetalaxmi Jairam
23 Implementing Development Status Scheme Based on Vehicle Ranging Technology
  Chun-Yuan Ning, Jun-Jie Shang, Shi-Jie Jiang, Duc-Tinh Pham, and Thi-Xuan-Huong Nguyen
24 Blockchain Based Electronic Health Record Management System for Data Integrity
  Neetu Sharma and Rajesh Rohilla
25 Strategy of Fuzzy Approaches for Data Alignment
  Shashi Pal Singh, Ajai Kumar, Lenali Singh, Apoorva Mishra, and Sanjeev Sharma
26 Microgrid System and Its Optimization Algorithms
  Chun-Yuan Ning, Jun-Jie Shang, Thi-Xuan-Huong Nguyen, and Duc-Tinh Pham
27 Reducing Errors During Stock Value Prediction Q-Learning-Based Generic Algorithm
  Rachna Yogesh Sable, Shivani Goel, and Pradeep Chatterjee
28 An Adaptive Stochastic Gradient-Free Approach for High-Dimensional Blackbox Optimization
  Anton Dereventsov, Clayton G. Webster, and Joseph Daws
29 Analysis of Cotton Yarn Count by Fuzzy Logic Model
  V. Visalakshi, T. Yogalakshmi, and Oscar Castillo
30 A Risk-Budgeted Portfolio Selection Strategy Using Invasive Weed Optimization
  Mohammad Shahid, Mohd Shamim Ansari, Mohd Shamim, and Zubair Ashraf
Author Index
About the Editors
Dr. Ritu Tiwari is a Professor in the Department of Computer Science and Engineering at the Indian Institute of Information Technology, Pune, India. Before joining IIIT Pune, she was an Associate Professor in the Department of Information and Communication Technology at ABV-IIITM Gwalior. She has more than 12 years of teaching and research experience. Her fields of research include Robotics, Artificial Intelligence, Soft Computing and Applications. She has published five books and more than 80 research papers in various national and international journals and conferences and is a reviewer for many international journals and conferences. She received the Young Scientist Award from the Chhattisgarh Council of Science and Technology in 2006. She also received a Gold Medal for her post-graduation from NIT, Raipur.

Dr. Apoorva Mishra is an Assistant Professor at the Indian Institute of Information Technology, Pune, India. He has served as the Guest Editor of the Springer Nature journal ‘SN Applied Sciences,’ Topical Collection on Engineering: Artificial Intelligence, and has handled 61 manuscripts to final disposition. He has filed two Indian patents and has published around 20 research papers in various peer-reviewed international and national journals and conferences. He serves as an active reviewer for many renowned SCI-indexed international journals. He received the ‘Gold Medal’ from the University for scoring the highest marks in M.Tech. In the past, he also gained industrial experience while working at ‘Tata Consultancy Services.’

Dr. Neha Yadav received her Ph.D. in Mathematics from Motilal Nehru National Institute of Technology (MNNIT), Allahabad, India, in 2013. She was a postdoctoral fellow at Korea University, Seoul, South Korea. She is a recipient of the Brain Korea (BK-21) Postdoctoral Fellowship awarded by the Government of the Republic of Korea. Prior to joining NIT Hamirpur, she taught courses and conducted research at BML Munjal University, Gurugram; Korea University, Seoul, South Korea; and The NorthCap University, Gurugram. Her major research areas include the numerical solution of boundary value problems, artificial neural networks, and optimization.
Mario Pavone is an Associate Professor in Computer Science at the Department of Mathematics and Computer Science of the University of Catania, Italy. His research is focused on the design and development of metaheuristics applied in several research areas, such as Combinatorial Optimization, Computational Biology, Network Sciences and Network Social Sciences. He has been a Visiting Professor at many international universities, such as the University of Angers in France, the University of Nottingham in the UK, the University of Plymouth in the UK, and many others. From 2014 to 2018 he was Chair of the IEEE Task Force on Artificial Immune Systems for the IEEE Computational Intelligence Society, and he is currently an active member of the following IEEE Task Forces: (i) Interdisciplinary Emergent Technologies (as Vice-Chair); (ii) Ethical and Social Implications of Computational Intelligence; and (iii) Artificial Immune Systems (as Vice-Chair from May 1, 2018). He is also a member of several editorial boards of international journals, as well as a member of many program committees of international conferences and workshops. He has extensive experience in organizing successful workshops, symposia, conferences and summer schools, and he is currently the Chief of the Scientific Directors of the Metaheuristics Summer School (MESS). In his scientific activities, he has also been a tutorial and invited speaker at several international conferences, and editor of many special issues in Artificial Life journal, Engineering Applications of Artificial Intelligence, Applied Soft Computing, BMC Immunology, Natural Computing, and Memetic Computing. In May 2018 he founded the Advanced New Technologies research laboratory (ANTs Lab), of which he is currently the Scientific Director.
Chapter 1
Missing Data Imputation for Solar Radiation Using Generative Adversarial Networks
Priyanshi Khare, Rajesh Wadhvani, and Sanyam Shukla
P. Khare (corresponding author) · R. Wadhvani · S. Shukla
Department of Computer Science and Engineering, Maulana Azad National Institute of Technology, Bhopal, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022. R. Tiwari et al. (eds.), Proceedings of International Conference on Computational Intelligence, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-16-3802-2_1

1 Introduction
With non-renewable energy resources depleting at an alarming rate, the world is moving towards sustainable and renewable energy sources. Among the various renewable sources, solar energy is well known and the most commonly used. Different technologies are emerging that make use of solar power and hence move towards a greener environment that is economically viable. There are various other renewable sources such as wind energy, hydro-energy, tidal energy, geothermal energy, biomass and many more. Among them, solar energy has significant potential, as it is easily available and has very few limitations on purchase and installation. Hydro-energy and geothermal energy are limited by geographic location, and biomass requires combustion, which results in high carbon emissions [20].

Solar insolation is simply the amount of solar radiation reaching the surface of the earth. It is time series data whose values are recorded at particular intervals. Solar insolation plays a major role in forecasting the solar power of a system [23]. When working with real-world data, however, one often encounters missing values in the dataset. Solar datasets are not immune to this problem and sometimes contain missing and incomplete data. Incomplete solar radiation data can have a huge impact on predicting the output of the system. This data has the potential to provide many valuable inputs, support the decision-making process and enhance the flexibility and reliability of the system.

Data imputation has long been an important area of research, as a huge amount of data is collected every day from various domains and this data is sometimes incomplete. The absence of an adequate amount of information hinders a solution from reaching its full potential. The reasons for missing values are varied; the data may have been lost, or it may never have been collected. Missing values generally surface due to broken collection devices, or because certain data is difficult or even dangerous to acquire. Another scenario is when the values are collected but are considered unusable. In the case of solar datasets, the reasons for missing data include no signal from sensors, unstable signals, signals that are lower or higher than physical limits, and climatic conditions [21]. Researchers either model the dataset directly or use some imputation method to remove the missing values. Most algorithms today, especially in deep learning, rely on the quality and quantity of data to enhance their performance, so it becomes important to obtain a complete dataset without any missing values.

Missing data can be classified into three categories: (1) missing at random (MAR), if the propensity for a value to be missing is systematically related to the observed data; (2) missing completely at random (MCAR), if there is no relationship between the missingness of the data and any values, observed or missing; (3) missing not at random (MNAR), if the fact that the data is missing is related to unobserved data, i.e., the data we do not have.

Data imputation has been an interesting area of research, and many imputation algorithms have been designed to deal with missing values. In the work presented here, data imputation is done using generative adversarial networks (GANs). The generative adversarial network, introduced by Ian Goodfellow, is a system of neural networks that can mimic any distribution of data [10]. GANs are able to generate data from noise samples by learning the latent properties of the data. There are two networks in a GAN, the generator and the discriminator. The generator aims at generating synthetic data, and the discriminator focuses on distinguishing real data from fake data. GANs have mostly been used for generating realistic-looking images [15, 24].
The work presented here contributes to the imputation of missing solar insolation values in a solar dataset by giving the dataset as input to a GAN and thus generating a complete dataset. The efficacy of the method is measured indirectly by using the new complete dataset as training data for different time series forecasting models. The data is also imputed using other imputation methods, and the resulting datasets are used to train the same forecasting models, which gives a comparison between the GAN and the other methods.
2 Related Work
Various algorithms are available for data imputation today. Knowledge of whether the gaps follow a certain distribution can serve as a basis for selecting the algorithm. Researchers approach missing data in their own way according to the needs of their solution. Broadly, there are three main approaches that one can follow [16]. The first is to delete the missing values and move forward with the remaining data. This approach is not very effective, as discarding the missing values may lead to the loss of important information. Second, one can use simple imputation methods, which replace the missing value with the mean or median of the data. Alternatively, the most common value or the last value observed can be used in place of the unknown value. The third approach is to use machine learning-based algorithms for the imputation. Matrix factorization [2], expectation maximization [9], k-nearest neighbour (kNN)-based imputation [13] and recurrent neural network-based imputation [5, 19] can be included in this approach [16]. Matrix factorization (MF) makes it possible to discover the latent features underlying the interactions between the data. kNN can be used for dealing with all kinds of data, be it continuous, discrete, ordinal or categorical. It relies on the idea that the value of a point can be approximated by the values of the points closest to it. Multivariate Imputation by Chained Equations (MICE) [4] is one of the state-of-the-art discriminative imputation methods.

In 2014, C. C. Turrado [21] imputed the missing values of solar radiation sensors by means of the chained equations method (MICE). In 2015, Steffen Moritz [18] provided a comparison of different univariate time series imputation methods. In 2018, Demirhan [6] estimated the missing values of solar irradiance in a real dataset using 36 imputation methods and provided a comparative study under 16 experimental conditions. In 2019, S. Arcadinho and P. Mateus [3] proposed an imputation method based on expectation maximization over dynamic Bayesian networks.

In 2014, Ian Goodfellow and colleagues introduced GANs, presenting a framework for estimating generative models through adversarial training. Their model was composed of two multilayer perceptrons, and the entire model could be trained using backpropagation, eliminating the need for Markov chains or unrolled approximate inference networks during training or sample generation [10]. Since their advent, most applications of GANs have been in the field of image generation, such as image-to-image translation [14] and text-to-image translation [22]. Apart from images, they have been applied in the domain of natural language processing [11, 12]. GANs have also been applied to solve the class imbalance problem by generating more data for the minority class [7]. In 2019, Mohammad Navid Fekri [8] used a recurrent GAN for generating realistic energy consumption data; to further improve the quality of the generated data and stabilize the training procedure, Wasserstein GANs (WGANs) and the Metropolis–Hastings GAN (MH-GAN) were used. In 2018, Jinsung Yoon [25] proposed a novel method for data imputation using GANs. This work was demonstrated on five real-world datasets (Breast, Spam, Letter, Credit and News) from the UCI Machine Learning Repository. In 2018, Luo [16] used a GAN for time series data imputation by implementing a GRUI cell, using a real-world medical dataset and an air quality dataset for experimentation. In 2019, Luo [17] presented an end-to-end generative model, E²GAN, for missing value imputation in multivariate time series. The experiments were demonstrated on two real-world datasets, the PhysioNet Challenge 2012 dataset and the KDD CUP 2018 dataset.
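The simple imputation baselines mentioned in this section (mean, median and last observed value) can be sketched in a few lines. This is only an illustrative sketch using the Python standard library; the function names and the use of `None` to mark a missing entry are assumptions made for the example, not part of the chapter.

```python
from statistics import mean, median


def impute_constant(series, fill):
    """Replace every missing entry (None) with a fixed fill value."""
    return [fill if v is None else v for v in series]


def impute_mean(series):
    """Fill gaps with the mean of the observed values."""
    observed = [v for v in series if v is not None]
    return impute_constant(series, mean(observed))


def impute_median(series):
    """Fill gaps with the median of the observed values."""
    observed = [v for v in series if v is not None]
    return impute_constant(series, median(observed))


def impute_last_observed(series):
    """Carry the last observed value forward; leading gaps stay None."""
    filled, last = [], None
    for v in series:
        if v is not None:
            last = v
        filled.append(last)
    return filled


# Made-up insolation readings with gaps, for illustration only.
insolation = [5.0, None, 7.0, None, None, 6.0]
mean_filled = impute_mean(insolation)
median_filled = impute_median(insolation)
locf_filled = impute_last_observed(insolation)
```

On a series with strong seasonality, all three baselines tend to flatten the signal inside long gaps, which is one motivation for the learned approaches discussed above.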
3 Model for Data Imputation
A GAN is a simple but powerful framework comprising two neural networks, a generator (G) and a discriminator (D). The work proposed here uses a GAN to solve the missing value problem. The generator is used to impute the missing data accurately, while the discriminator's goal is to distinguish between observed and imputed values. The model uses the concept of mask and hint matrices for data imputation [25].
3.1 Problem Formulation
As the dataset is incomplete, a mask matrix denoted by $M$ is used to identify which values are missing. Considering a $d$-dimensional space, let $X_{data}$ be the data vector of the solar time series dataset. The mask $M$ takes values in $\{0,1\}^d$, with $M_i = 1$ if the component $X_{data,i}$ exists and $M_i = 0$ otherwise. A random noise vector $Z$ is multiplied element-wise by $(1 - M)$, so that noise fills only the missing positions, and the result is combined with the observed values. The generator thus computes $G(\bar{X}, M)$, where $M$ is the mask matrix and $\bar{X}$ is defined as

$$\bar{X} = X_{data} \odot M + (1 - M) \odot Z \tag{1}$$

where $\odot$ denotes element-wise multiplication. The generator produces a value for every component, including those that are present in the original dataset, so the imputed matrix is obtained by keeping the observed values and using the generated values only at the missing positions. Let $X_{gen}$ be the output of the generator; then the imputed matrix $X_{imp}$ is defined as

$$X_{imp} = X_{data} \odot M + (1 - M) \odot X_{gen} \tag{2}$$
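Equations (1) and (2) share the same structure: observed entries are kept and only the missing positions are filled, first with noise and then with generator output. A minimal standard-library sketch follows; the helper names, the use of `None` for missing entries, the noise range and the generator output values are all assumptions chosen for illustration.

```python
import random


def build_mask(x_data):
    """M_i = 1 where a value is observed, 0 where it is missing (None)."""
    return [0 if v is None else 1 for v in x_data]


def combine(x_data, mask, other):
    """Element-wise X_data * M + (1 - M) * other.

    Written as a selection so that missing entries (None) are never
    multiplied; the result is the same as the formula in Eq. (1) and Eq. (2).
    """
    return [x if m == 1 else o for x, m, o in zip(x_data, mask, other)]


x_data = [0.42, None, 0.57, None]        # normalized readings with two gaps
mask = build_mask(x_data)                # [1, 0, 1, 0]

rng = random.Random(0)                   # fixed seed for reproducibility
noise = [rng.uniform(0.0, 0.01) for _ in x_data]
x_bar = combine(x_data, mask, noise)     # generator input, Eq. (1)

x_gen = [0.40, 0.51, 0.60, 0.48]         # made-up generator output
x_imp = combine(x_data, mask, x_gen)     # imputed vector, Eq. (2)
```

Note that the generated values at observed positions (0.40 and 0.60 here) are discarded by Eq. (2); only the generated values at the gaps survive in `x_imp`.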
In the standard GAN, the discriminator classifies the complete output of the generator as real or fake. The scenario here differs: the discriminator identifies which elements of the imputed matrix are real (observed) and which are fake (imputed), so in effect it predicts the mask vector, which is pre-determined by the dataset. Additional information is therefore provided to the discriminator in the form of a hint vector H, which is a random variable. We allow H to depend on M, and by defining H we control the amount of information about M that is contained in H.
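The text does not spell out how H is constructed from M. One possible construction, assumed here purely for illustration, reveals each mask entry to the discriminator with probability `hint_rate` and replaces the remaining entries with the uninformative value 0.5. The function name and the `hint_rate` parameter are hypothetical.

```python
import random


def sample_hint(mask, hint_rate, rng):
    """Build a hint vector H from the mask M (assumed construction).

    Each entry of M is revealed with probability hint_rate; hidden
    entries are set to 0.5, which carries no information about M.
    """
    hint = []
    for m in mask:
        revealed = rng.random() < hint_rate
        hint.append(float(m) if revealed else 0.5)
    return hint


rng = random.Random(42)
mask = [1, 0, 1, 1, 0]
full_hint = sample_hint(mask, 1.0, rng)     # everything revealed
no_hint = sample_hint(mask, 0.0, rng)       # nothing revealed
partial_hint = sample_hint(mask, 0.9, rng)  # most entries revealed
```

A larger `hint_rate` gives the discriminator more information about M, which matches the statement above that the definition of H controls how much of M it exposes.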
3.2 Architecture for Data Imputation
The generator G and the discriminator D are the two neural networks that make up a GAN. A GAN may be thought of as a framework in which two neural network models are pitted against each other, forcing each other to improve. The generator network competes against an adversary, the discriminator network, which sets up a game-theoretic scenario. The goal of the generator is to generate data that is indistinguishable from real data, while the discriminator aims to distinguish between real and generated data. The input to the generator is a random noise vector, and the discriminator takes as input real data as well as data generated by the generator. As both networks try to minimize their own loss, they undergo adversarial training. The cost function for a GAN [10] can be described by the following equation:

$$\min_G \max_D V(G, D) = \mathbb{E}_{x \sim p_{data}}[\log D(x)] + \mathbb{E}_{z \sim p_z}[\log(1 - D(G(z)))] \tag{3}$$
where $p_{data}$ represents the real data distribution, $x$ is a sample drawn from this distribution, $p_z$ is the prior distribution of the input noise, and $z$ is a random noise sample drawn from $p_z$.

Figure 1 shows the architecture of the GAN used here for data imputation. The generator takes as input the random noise vector combined with the observed data, together with a mask matrix. The mask matrix specifies whether a value is present or not and thus takes the value 0 or 1. The generator adopts the GRU, a state-of-the-art RNN cell, for constructing its network. The discriminator's input is the imputed data provided by the generator along with the hint vector. The discriminator uses the same GRU cell for its network design. GRU cells are used for both networks because of their strong performance in the time series domain. Both the generator and the discriminator consist of a three-layer fully connected neural network architecture whose hidden units use GRU cells.

A recurrent neural network (RNN) extends the feedforward neural network with internal memory. It is recurrent in nature, as the same process is repeated for every input and the output for the current input depends on the previous computations. An RNN has difficulty remembering long-term dependencies, as it suffers from the vanishing gradient and exploding gradient problems. To overcome this problem, long short-term memory (LSTM) was introduced, and the GRU came later as a simpler alternative to LSTM that is generally faster to train. Internally, the GRU uses two gates, an update gate and a reset gate, to transfer and maintain information. The update gate selects how much of the information from the previous state needs to be passed along to the future. The reset gate determines how much of the previous information needs to be forgotten.

We train D to maximize the probability of correctly predicting M, and we train G to minimize the probability of D predicting M.
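The role of the update and reset gates described above can be made concrete with one step of a scalar GRU cell. This is a sketch under stated assumptions: it follows the common convention in which the update gate weights the previous state directly, the state is a single number rather than a vector, and the parameter names in the dictionary `p` are hypothetical. It is not the network used in the chapter.

```python
import math


def sigmoid(a):
    """Logistic function, maps any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-a))


def gru_step(x, h_prev, p):
    """One step of a scalar GRU cell with hypothetical weights in p."""
    # Update gate: how much of the previous state is passed along.
    z = sigmoid(p["wz"] * x + p["uz"] * h_prev + p["bz"])
    # Reset gate: how much of the previous state feeds the candidate.
    r = sigmoid(p["wr"] * x + p["ur"] * h_prev + p["br"])
    # Candidate state computed from the input and the reset-scaled memory.
    h_cand = math.tanh(p["wh"] * x + p["uh"] * (r * h_prev) + p["bh"])
    # Blend of old state and candidate, controlled by the update gate.
    return z * h_prev + (1.0 - z) * h_cand


zero = {k: 0.0 for k in ("wz", "uz", "bz", "wr", "ur", "br", "wh", "uh", "bh")}
h_half = gru_step(x=0.3, h_prev=0.8, p=zero)   # z = 0.5, candidate = 0

keep = dict(zero, bz=50.0)                     # update gate saturated at 1
h_keep = gru_step(x=0.3, h_prev=0.8, p=keep)   # previous state is kept

# Running the cell over a short made-up sequence.
params = dict(zero, wz=0.5, wr=0.5, wh=1.0, uh=1.0)
h = 0.0
for x_t in [0.2, 0.4, 0.1]:
    h = gru_step(x_t, h, params)
```

With all parameters at zero, both gates sit at 0.5 and the state is simply halved; with a large update-gate bias the previous state passes through unchanged, which is the mechanism that lets the cell retain information across long gaps.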
The cost function of the GAN for imputation uses the cross-entropy loss, which can be described as follows [25]:

$$\min_G \max_D V(G, D) = \mathbb{E}_{X_{imp}, M, H}\left[ M^T \log D(X_{imp}, H) \right] + \mathbb{E}_{X_{imp}, M, H}\left[ (1 - M)^T \log\left(1 - D(X_{imp}, H)\right) \right] \tag{4}$$

where log is the element-wise logarithm, $X_{imp}$ denotes the imputed data from the generator, and $M$ and $H$ are the mask and hint matrices.

Fig. 1 Generative adversarial network for solar data imputation
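For a single data vector, the bracketed terms of Eq. (4) reduce to a sum over components, which can be evaluated directly. The sketch below is an assumption-laden illustration: the function name, the small constant `eps` added for numerical safety, and the example discriminator outputs are not from the chapter.

```python
import math


def imputation_objective(mask, d_out, eps=1e-8):
    """Single-sample estimate of Eq. (4).

    Computes sum_i [ M_i log D_i + (1 - M_i) log(1 - D_i) ], where D_i is
    the discriminator's estimated probability that component i is observed.
    The discriminator tries to make this value large; the cross-entropy
    loss is its negative.
    """
    total = 0.0
    for m, d in zip(mask, d_out):
        total += m * math.log(d + eps) + (1 - m) * math.log(1.0 - d + eps)
    return total


mask = [1, 0]
sharp = imputation_objective(mask, [0.99, 0.01])    # discriminator is right
unsure = imputation_objective(mask, [0.5, 0.5])     # discriminator guesses
fooled = imputation_objective(mask, [0.99, 0.99])   # imputed value looks real
```

The value drops when the discriminator mistakes an imputed component for an observed one (`fooled`), which is the outcome the generator is trained to produce.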
4 Proposed Methodology
The method proposed here fills in the missing values using a GAN. The process is supported by providing the mask matrix and the hint matrix to the generator and the discriminator, respectively. GANs are largely used in the domain of image generation, where it is easy to judge the quality of the generated data by simply looking at the images. Solar insolation values, however, form time series data, and in the time series field the quality of the imputed dataset cannot be judged just by looking at it. The imputation process begins with pre-processing of the incomplete insolation values, which involves normalization of the data. Min-Max normalization is used here to bring all the values to a similar scale. The values are then passed to the GAN for imputation. The GAN returns the complete dataset without any missing values. The quality of the imputed dataset returned by the generator is judged by training different time series forecasting models on it (Fig. 2).

Fig. 2 Illustration of research design for solar data imputation
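The Min-Max normalization step can be sketched as follows. The sketch assumes that missing entries are marked with `None` and are skipped when computing the minimum and maximum; the function names and sample values are illustrative only.

```python
def min_max_fit(values):
    """Return (min, max) of the observed values, ignoring None."""
    observed = [v for v in values if v is not None]
    return min(observed), max(observed)


def min_max_scale(values, lo, hi):
    """Map observed values into [0, 1]; missing entries stay None."""
    span = hi - lo
    if span == 0:
        raise ValueError("cannot scale a constant series")
    return [None if v is None else (v - lo) / span for v in values]


def min_max_invert(scaled, lo, hi):
    """Map scaled values back to the original units."""
    return [None if s is None else s * (hi - lo) + lo for s in scaled]


raw = [2.0, None, 6.0, 4.0]              # made-up insolation values
lo, hi = min_max_fit(raw)
scaled = min_max_scale(raw, lo, hi)
restored = min_max_invert(scaled, lo, hi)
```

Keeping `lo` and `hi` from the fit step matters, because the imputed values returned by the GAN are on the normalized scale and need to be mapped back before they are used for forecasting.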
Algorithm 1: Algorithm for training of GAN
Result: Return the updated weights
Set the learning rate of the generator as $\alpha_G$ and of the discriminator as $\alpha_D$, and the batch size as $m$;
Initialize the weights $\theta_G$ and $\theta_D$ of the generator and discriminator, respectively;
for each training epoch do
    Get the batch $(\bar{X}^{(i)}, M^{(i)})$, $i = 1, \dots, m$, for the generator from the random noise sample combined with observed data $\bar{X}$ and the mask matrix $M$;
    Get the batch $(X_{imp}^{(i)}, H^{(i)})$, $i = 1, \dots, m$, for the discriminator from the imputed matrix $X_{imp}$ and the hint matrix $H$;
    Update the discriminator as follows: $\theta_D \leftarrow \theta_D - \alpha_D \nabla_{\theta_D} L_D$;
    Update the generator as follows: $\theta_G \leftarrow \theta_G - \alpha_G \nabla_{\theta_G} L_G$;
end
Here $L_D$ and $L_G$ denote the losses of the discriminator and the generator, and $\nabla_{\theta}$ denotes the gradient with respect to the corresponding weights.
Algorithm 1 details the procedure used for training the GAN. The discriminator and the generator are trained to deceive one another. The discriminator tries to reduce its loss so that it can differentiate observed values from imputed values, while the generator tries to optimize itself to impute the missing data effectively. Initially, the learning rates of the generator and the discriminator are set to αG and αD, respectively, and the weights θG and θD are initialized. The learning rates and the optimizer are not set according to any algorithm or rule; they are selected by trial and error, keeping the values for which the performance of the network improved. They may differ from one dataset to another, as the performance of a network depends on various factors, the type of data being one of them. In each iteration, the loss is calculated, and the generator and the discriminator update their weights by gradient descent according to the feedback they receive. The number of epochs determines how long the GAN is trained. Training is performed by dividing the data into batches; i.e., learning occurs batch by batch.
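How the mask enters the data flow can be sketched as below. The sketch follows the convention of GAIN [25], which is an assumption here since the chapter does not spell out the combination rule: missing positions are filled with noise before the generator, and the final imputed series keeps observed values and takes generator outputs only at missing positions. The array `g_out` is a placeholder for a trained generator's output, and the function names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def generator_input(x, mask, noise_scale=0.01):
    """Observed values where mask == 1, small random noise where mask == 0."""
    z = rng.uniform(0.0, noise_scale, size=x.shape)
    return mask * x + (1.0 - mask) * z

def combine(x, mask, g_out):
    """Keep observed values; use generator output only for missing entries."""
    return mask * x + (1.0 - mask) * g_out

x = np.array([0.2, 0.0, 0.7, 0.0])          # normalized series, 0 at gaps
mask = np.array([1.0, 0.0, 1.0, 0.0])       # 1 = observed, 0 = missing
g_out = np.array([0.25, 0.40, 0.65, 0.55])  # placeholder generator output
g_in = generator_input(x, mask)
x_imp = combine(x, mask, g_out)
```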
5 Empirical Investigation and Result Analysis
This section describes the experiment conducted on solar datasets from three different regions and the results achieved by using the imputed datasets for training different time series forecasting models.
5.1 Experiment
The experiment uses solar datasets collected from the Power Data Access Viewer [1], which provides datasets from NASA research in support of renewable energy. Three datasets are used, namely DatasetA, DatasetB and DatasetC, from three different locations within India: DatasetA is from Bhopal, DatasetB is from Hyderabad, and DatasetC is from the Mumbai region. The latitude and longitude of the regions are given in Table 1. The datasets contain information about humidity, surface pressure, temperature, wind speed, insolation, thermal infrared radiative flux and precipitation, recorded at intervals of 24 h. Since this work deals with univariate data, the 'Clear Sky Insolation' feature is selected for the experiment. This column contains a large number of missing values, which hinders the research work; the value '-999' in this column denotes a missing value. The data shows high variation over time, so a good imputation method is needed. Table 1 lists the selected feature and its missing rate for each dataset.
Table 1 Description of datasets
Dataset    Latitude and longitude    Selected feature       Missing rate
DatasetA   23.2599° N, 77.4126° E    Clear sky insolation   14%
DatasetB   17.3850° N, 78.4867° E    Clear sky insolation   18.15%
DatasetC   19.0760° N, 72.8777° E    Clear sky insolation   16.02%
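The missing rate reported in Table 1 can be computed as the fraction of sentinel entries in the selected column. The snippet below is a small sketch with made-up values, not the actual dataset; the helper names `missing_rate` and `to_nan` are hypothetical.

```python
import numpy as np

SENTINEL = -999.0  # value that marks a missing reading in the column

def missing_rate(values, sentinel=SENTINEL):
    """Fraction of entries equal to the missing-value sentinel."""
    x = np.asarray(values, dtype=float)
    return float(np.mean(x == sentinel))

def to_nan(values, sentinel=SENTINEL):
    """Replace the sentinel by NaN so that it cannot be mistaken for data."""
    x = np.array(values, dtype=float)
    x[x == sentinel] = np.nan
    return x

# Illustrative values only
column = [5.1, -999, 6.3, 5.8, -999, 6.0, 5.5, 6.1]
rate = missing_rate(column)
clean = to_nan(column)
```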
The imputed data is returned by the generator after replacing the missing values. We evaluate the performance of our model by comparing it with other frequently used imputation methods: mean value imputation, the KNN (k-nearest neighbour) method and the matrix factorization (MF) method. The datasets imputed by the different methods are used for a prediction task, so the accuracy of the filling methods is compared indirectly, through the forecasting models trained on the imputed data. A GRU-based univariate time series forecasting model and an ARIMA model are used for prediction. To obtain a fair comparison, the same forecasting model is used for the datasets from all imputation methods. The capability of the forecasting model is described by three performance measures: mean square error (MSE), root mean square error (RMSE) and $R^2$ score.
The mean square error (MSE) is the average of the squared forecast errors. It tells how close the fitted model is to the data points. The smaller the MSE, the better the fit of the model.
$$\mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 \tag{5}$$
Here, $n$ is the total number of observations, $y_i$ is the actual value, and $\hat{y}_i$ is the predicted value. The root mean square error (RMSE) is the square root of the MSE; taking the square root expresses the error in the original units of the predicted quantity.
$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2} \tag{6}$$
The $R^2$ score, also known as the coefficient of determination, is a statistical measure of how well the model replicates the data, based on the proportion of the total variation of outcomes explained by the model. For a simple linear fit, it equals the square of the sample correlation coefficient between the observed and the predicted values. The formula for the $R^2$ score is as follows:
$$R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2} \tag{7}$$
where $y_i$ is the actual value, $\hat{y}_i$ is the predicted value and $\bar{y}$ is the mean of the actual values. Good values of these measures indirectly indicate good filling accuracy of the imputation algorithm.
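The three measures can be computed directly from their definitions. The following is a minimal sketch with illustrative numbers; `y_true` holds the actual values and `y_pred` the forecasts, and both names are chosen here for readability.

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean of the squared forecast errors."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.mean((y_true - y_pred) ** 2))

def rmse(y_true, y_pred):
    """Square root of the MSE, in the units of the data."""
    return float(np.sqrt(mse(y_true, y_pred)))

def r2_score(y_true, y_pred):
    """One minus residual sum of squares over total sum of squares."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)

# Illustrative values only
y_true = [0.2, 0.4, 0.6, 0.8]
y_pred = [0.25, 0.35, 0.65, 0.75]
scores = (mse(y_true, y_pred), rmse(y_true, y_pred), r2_score(y_true, y_pred))
```

Because the data are Min-Max normalized, MSE and RMSE here are on the normalized scale, which is worth keeping in mind when comparing them across datasets.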
5.2 Results and Discussion
This section presents the results of imputation using the GAN on the 'Clear Sky Insolation' column of the solar datasets. Using the imputed dataset for training time series forecasting models sheds light on the quality of the data imputation. The imputed data is tested on a GRU-based forecasting model and an ARIMA model for time series. The GAN imputation method is also compared with the commonly used mean imputation and KNN imputation methods. Tables 2 and 3 detail the accuracy results of the different imputation methods on DatasetA using the GRU forecasting model and the ARIMA model, respectively. The MSE, RMSE and $R^2$ values in both tables indicate that GAN imputation is comparable to KNN imputation and better than mean imputation; for example, with the GRU model, GAN imputation yields an $R^2$ score of 0.79912, against 0.78863 for KNN imputation and 0.71081 for mean imputation. Tables 4 and 5 give the accuracy results of the GRU model and the ARIMA model, respectively, on DatasetB. Similarly, Tables 6 and 7 give the accuracy of both forecasting models on DatasetC. Figure 3a plots the actual data against the data predicted by the GRU model trained on GAN-imputed data; the red curve indicates the actual data, while the yellow curve shows the predicted data. Similarly, Fig. 3b plots the actual and predicted data for the ARIMA model trained on GAN-imputed data; the blue curve indicates the actual data, while the red curve shows the predicted data. The prediction results of both forecasting models on DatasetB and DatasetC are shown in Figs. 4 and 5. In spite of the high variation in the solar data, the predicted data in the graphs closely follows the actual data, indicating a good fit by the models.
Table 2 Accuracy of GRU model on datasetA
Results           MSE       RMSE      $R^2$ score
GAN imputation    0.00673   0.08200   0.79912
Mean imputation   0.00810   0.09004   0.71081
KNN imputation    0.00675   0.08215   0.78863

Table 3 Accuracy of ARIMA model on datasetA
Results           MSE       RMSE      $R^2$ score
GAN imputation    0.007     0.08280   0.78685
Mean imputation   0.008     0.09149   0.70480
KNN imputation    0.007     0.08294   0.78627

Table 4 Accuracy of GRU model on datasetB
Results           MSE       RMSE      $R^2$ score
GAN imputation    0.00722   0.08497   0.69162
Mean imputation   0.01549   0.12448   0.53969
KNN imputation    0.00722   0.08498   0.69182

Table 5 Accuracy of ARIMA model on datasetB
Results           MSE       RMSE      $R^2$ score
GAN imputation    0.007     0.08588   0.68867
Mean imputation   0.01664   0.12898   0.52205
KNN imputation    0.00738   0.08593   0.67855

Table 6 Accuracy of GRU model on datasetC
Results           MSE       RMSE      $R^2$ score
GAN imputation    0.00692   0.08320   0.83036
Mean imputation   0.01026   0.10129   0.77739
KNN imputation    0.00696   0.08345   0.82987

Table 7 Accuracy of ARIMA model on datasetC
Results           MSE       RMSE      $R^2$ score
GAN imputation    0.007     0.08384   0.82885
Mean imputation   0.01075   0.10367   0.77099
KNN imputation    0.00704   0.08392   0.82870

Fig. 3 Forecasting results of GAN imputation on DatasetA: (a) GRU model, (b) ARIMA model
Fig. 4 Forecasting results of GAN imputation on DatasetB: (a) GRU model, (b) ARIMA model
Fig. 5 Forecasting results of GAN imputation on DatasetC: (a) GRU model, (b) ARIMA model
6 Conclusion and Future Works
In this work, a generative adversarial network has been used for data imputation of solar insolation values. With the help of the mask and hint matrices, the missing data can be imputed by the GAN. Using GRU cells in the generator and the discriminator also supports the process. Although there are various algorithms for imputation, in our experiments the GAN was comparable to KNN imputation and better than mean imputation, so it can be considered a viable alternative to the existing methods. A GAN is proficient in learning the latent distribution of the data, which makes it well suited to imputation. Using a GAN for replacing missing values can be useful in various domains, especially where there are hidden relationships in the data. Although the results are promising, there is still scope for improvement. This work focuses only on univariate solar data imputation; in future, it can be extended to multivariate data. The architecture of the GAN can also be modified further to improve the results.
Acknowledgements We thank the Madhya Pradesh Council of Science and Technology, Bhopal, India, for providing the funds to carry out this research work.
References
1. Power data access viewer. https://power.larc.nasa.gov/data-access-viewer/
2. E. Acar, D. Dunlavy, T. Kolda, M. Mørup, Scalable tensor factorizations with missing data, pp. 701–712 (2010). https://doi.org/10.1137/1.9781611972801.61
3. S. Arcadinho, P. Mateus, Time series imputation. CoRR abs/1903.09732 (2019). http://arxiv.org/abs/1903.09732
4. S. Buuren, C. Groothuis-Oudshoorn, Mice: multivariate imputation by chained equations in R. J. Stat. Softw. 45 (2011). https://doi.org/10.18637/jss.v045.i03
5. W. Cao, D. Wang, J. Li, H. Zhou, Y. Li, L. Li, Brits: bidirectional recurrent imputation for time series, in Proceedings of the 32nd International Conference on Neural Information Processing Systems, NIPS'18 (Curran Associates Inc., Red Hook, NY, USA, 2018), pp. 6776–6786
6. H. Demirhan, Z. Renwick, Missing value imputation for short to mid-term horizontal solar irradiance data. Appl. Energy 225(C), 998–1012 (2018). 10.1016/j.apenergy.2018.0. https://ideas.repec.org/a/eee/appene/v225y2018icp998-1012.html
7. G. Douzas, F. Bação, Effective data generation for imbalanced learning using conditional generative adversarial networks. Exp. Syst. Appl. 91 (2017). https://doi.org/10.1016/j.eswa.2017.09.030
8. M. Fekri, A.M. Ghosh, K. Grolinger, Generating energy data for machine learning with recurrent generative adversarial networks. Energies 13 (2019). https://doi.org/10.3390/en13010130
9. P.J. García-Laencina, J.L. Sancho-Gómez, A.R. Figueiras-Vidal, Pattern classification with missing data: a review. Neural Comput. Appl. 19(2), 263–282 (2010). https://doi.org/10.1007/s00521-009-0295-6
10. I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, Y. Bengio, Generative adversarial nets, in Advances in Neural Information Processing Systems, vol. 27, ed. by Z. Ghahramani, M. Welling, C. Cortes, N.D. Lawrence, K.Q. Weinberger (Curran Associates, Inc., 2014), pp. 2672–2680. http://papers.nips.cc/paper/5423-generative-adversarial-nets.pdf
11. J. Guo, S. Lu, H. Cai, W. Zhang, Y. Yu, J. Wang, Long text generation via adversarial training with leaked information. CoRR abs/1709.08624 (2017). http://arxiv.org/abs/1709.08624
12. M.A. Haidar, M. Rezagholizadeh, TextKD-GAN: text generation using knowledge distillation and generative adversarial networks, pp. 107–118 (2019). 10.1007/978-3-030-18305-99
13. A. Hudak, N. Crookston, J. Evans, D. Hall, M. Falkowski, Nearest neighbor imputation of species-level, plot-scale forest structure attributes from lidar data. Remote Sens. Environ. 112, 2232–2245 (2008). https://doi.org/10.1016/j.rse.2007.10.009
14. P. Isola, J. Zhu, T. Zhou, A.A. Efros, Image-to-image translation with conditional adversarial networks. CoRR abs/1611.07004 (2016). http://arxiv.org/abs/1611.07004
15. Y. Jin, J. Zhang, M. Li, Y. Tian, H. Zhu, Z. Fang, Towards the automatic anime characters creation with generative adversarial networks. CoRR abs/1708.05509 (2017). http://arxiv.org/abs/1708.05509
16. Y. Luo, X. Cai, Y. Zhang, J. Xu, Y. Xiaojie, Multivariate time series imputation with generative adversarial networks, in Advances in Neural Information Processing Systems, vol. 31, ed. by S. Bengio, H. Wallach, H. Larochelle, K. Grauman, N. Cesa-Bianchi, R. Garnett (Curran Associates, Inc., 2018), pp. 1596–1607. http://papers.nips.cc/paper/7432-multivariate-time-series-imputation-with-generative-adversarial-networks.pdf
17. Y. Luo, Y. Zhang, X. Cai, X. Yuan, Egan: end-to-end generative adversarial network for multivariate time series imputation, in Proceedings of the 28th International Joint Conference on Artificial Intelligence, IJCAI-19 (International Joint Conferences on Artificial Intelligence Organization, 2019), pp. 3094–3100. https://doi.org/10.24963/ijcai.2019/429
18. S. Moritz, A. Sardá, T. Bartz-Beielstein, M. Zaefferer, J. Stork, Comparison of different methods for univariate time series imputation in R. ArXiv abs/1510.03924 (2015)
19. Q. Suo, L. Yao, G. Xun, J. Sun, A. Zhang, Recurrent imputation for multivariate time series with missing values, in 2019 IEEE International Conference on Healthcare Informatics (ICHI), pp. 1–3 (2019)
20. A. Trappey, P. Chen, C. Trappey, L. Ma, A machine learning approach for solar power technology review and patent evolution analysis. Appl. Sci. 9, 1478 (2019). https://doi.org/10.3390/app9071478
21. C. Turrado, M. Meizoso-López, F. Sánchez-Lasheras, B. Rodriguez-Gomez, J. Calvo-Rolle, F. de Cos Juez, Missing data imputation of solar radiation data under different atmospheric conditions. Sensors 14, 20382–20399 (2014). https://doi.org/10.3390/s141120382
22. A. Viswanathan, B. Mehta, M.P. Bhavatarini, H.R. Mamatha, Text to image translation using generative adversarial networks, in 2018 International Conference on Advances in Computing, Communications and Informatics (ICACCI, 2018), pp. 1648–1654. https://doi.org/10.1109/ICACCI.2018.8554877
23. C. Voyant, G. Notton, S. Kalogirou, M.L. Nivet, C. Paoli, F. Motte, A. Fouilloy, Machine learning methods for solar radiation forecasting: a review. Renew. Energy 105 (2017). https://doi.org/10.1016/j.renene.2016.12.095
24. T. Wang, M. Liu, J. Zhu, A. Tao, J. Kautz, B. Catanzaro, High-resolution image synthesis and semantic manipulation with conditional gans. CoRR abs/1711.11585 (2017). http://arxiv.org/abs/1711.11585
25. J. Yoon, J. Jordon, M. van der Schaar, GAIN: missing data imputation using generative adversarial nets. CoRR abs/1806.02920 (2018). http://arxiv.org/abs/1806.02920
Chapter 2
A Survey on Object Detection and Tracking in a Video Sequence T. Sugirtha and M. Sridevi
1 Introduction
Object tracking is an important task in computer vision. It can be described as the problem of estimating the path of an object in the image plane as it moves through a scene. Tracking objects is complicated by the information lost when the 3D scene is projected onto a 2D image, the presence of noise, complex object movements and shapes, irregular behavior of objects, occlusions, changes in brightness and real-time processing demands. Applications of object tracking [1] include automated surveillance, automatic annotation, gesture recognition, traffic monitoring, vehicle navigation, etc. In general, there are two categories of object tracking, namely single object tracking and multiple object tracking. The basic steps in object tracking include capturing the video sequence, frame extraction, object detection and object classification. The paper is organized as follows: Sect. 2 briefly describes object detection and classification. Object tracking is explained in Sect. 3. Section 4 concludes the paper.
T. Sugirtha (B) · M. Sridevi, Department of Computer Science and Engineering, National Institute of Technology, Tiruchirappalli, Tamil Nadu 620015, India. M. Sridevi e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022. R. Tiwari et al. (eds.), Proceedings of International Conference on Computational Intelligence, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-16-3802-2_2
2 Object Detection and Classification
The initial step in object tracking is object detection, which identifies the object when it first appears in the video. Figure 1 illustrates the various approaches used for object detection, such as frame difference (FD), optical flow (OF), background subtraction (BS) and deep learning (DL).
Fig. 1 Different object detection methods
2.1 Frame Difference
FD is an object detection method suited to static backgrounds. Because objects move over time, their positions differ in successive frames, and the position of a moving object is estimated by computing the difference between two frames. The computational complexity is very low. However, the method does not adapt to dynamic environments, which results in lower accuracy.
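Frame differencing can be sketched in a few lines. The example below is an illustration on synthetic grayscale frames, not code from any cited work; the threshold value and the function name `frame_difference` are assumptions.

```python
import numpy as np

def frame_difference(prev_frame, curr_frame, threshold=25):
    """Boolean mask of pixels whose intensity changed by more than threshold."""
    # Cast to a signed type so the subtraction cannot wrap around for uint8
    diff = np.abs(curr_frame.astype(np.int16) - prev_frame.astype(np.int16))
    return diff > threshold

# Synthetic example: static dark background, a bright 2x2 object appears
prev_frame = np.zeros((6, 6), dtype=np.uint8)
curr_frame = prev_frame.copy()
curr_frame[2:4, 3:5] = 200

motion = frame_difference(prev_frame, curr_frame)
rows, cols = np.nonzero(motion)
bbox = (int(rows.min()), int(rows.max()), int(cols.min()), int(cols.max()))
```

With a moving camera or changing illumination, almost every pixel changes between frames, which is why this approach is limited to static backgrounds.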
2.2 Optical Flow
In OF, moving objects are extracted by identifying motion flow fields [2]. Moving objects can be detected accurately even under camera jitter. However, optical flow is not suitable for (i) real-time scenarios and (ii) videos affected by noise: it provides good accuracy only for good-quality images, and its computational cost is very high.
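As a point of reference, motion flow fields are commonly derived from the brightness-constancy assumption. The standard textbook formulation is sketched below and is not necessarily the one used in [2]; $I(x, y, t)$ denotes image intensity and $(u, v)$ the flow vector at a pixel, symbols introduced here for illustration.

$$I(x + u\,\Delta t,\; y + v\,\Delta t,\; t + \Delta t) = I(x, y, t) \quad\Rightarrow\quad I_x u + I_y v + I_t \approx 0$$

This single equation has two unknowns per pixel, so additional smoothness or neighbourhood constraints are needed to solve for the flow, which is one source of the high computational cost.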
2.3 Background Subtraction
BS generates a background mask from the given video sequence. The background consists of objects that are not of interest. The remaining data, i.e., the foreground, consists of the objects that need to be tracked. Most background subtraction algorithms proceed through four phases, namely background modeling, background model maintenance, feature extraction and segmentation. A method for detecting objects affected by Gaussian, Poisson, salt-and-pepper and speckle noise is developed in [3], where BS is done using low-rank minimization. An algorithm for BS using a deep convolutional neural network (CNN) was introduced in [4]. A semi-automatic method for BS using a multi-resolution convolutional neural network is proposed in [5]. The results on the Jordan sequence specified in [6] are given in Table 1 for the frame difference, background subtraction and optical flow object detection methods.
Table 1 Results of object detection [6] using (i) frame difference, (ii) background subtraction and (iii) optical flow
S. no   Object detection method    Results
1       Frame difference           [image]
2       Background subtraction     [image]
3       Optical flow               [image]
4       Deep learning (Yolo v3)    [image]
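The background modeling, maintenance and segmentation phases can be illustrated with a running-average background model. This is a deliberately simple stand-in, not the low-rank or CNN-based methods of [3–5]; the update rate `alpha`, the threshold and the function names are assumptions.

```python
import numpy as np

def update_background(background, frame, alpha=0.05):
    """Background maintenance: blend the new frame into the running average."""
    return (1.0 - alpha) * background + alpha * frame

def foreground_mask(background, frame, threshold=30.0):
    """Segmentation: pixels that deviate strongly from the background model."""
    return np.abs(frame - background) > threshold

# Synthetic sequence: constant background, then one frame with a bright pixel
frames = [np.full((4, 4), 50.0) for _ in range(5)]
test_frame = np.full((4, 4), 50.0)
test_frame[1, 2] = 220.0

background = frames[0].copy()          # background modeling from first frame
for frame in frames[1:]:
    background = update_background(background, frame)

mask = foreground_mask(background, test_frame)
```

A small `alpha` makes the model adapt slowly, which keeps briefly visible objects out of the background at the cost of reacting slowly to genuine scene changes.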
2.4 Deep Learning
Recent studies use deep learning for object detection. The authors in [7] classified DL-based object detection frameworks into two categories, namely (i) region proposal based and (ii) regression/classification based. In the region-based convolutional neural network (R-CNN), region proposals in the image are extracted using the region segmentation method of the selective search algorithm [8]. These proposals comprise all potential object candidates, and feature vectors are extracted by feeding them into a CNN. The R-CNN proposed in [9] improved the quality of candidate bounding boxes and extracted high-level features. R-CNN involves three stages: extraction of region proposals, computation of CNN features for each proposal, and classification of each region with linear SVMs. R-CNN warps or crops each region proposal so that the fully connected layers receive a fixed-size input. However, only a portion of the object may be present in the cropped region, and the warping operation produces unwanted geometric distortion; both lead to a drop in accuracy. R-CNN has the following drawbacks:
• It needs a multi-stage pipeline for training.
• It has high computational and space complexity.
• It detects objects slowly.
To overcome the limitations of R-CNN, a novel CNN architecture called spatial pyramid pooling network (SPP-net) is presented in [10]. SPP-net is a deep neural network (DNN) built upon spatial pyramid matching (SPM) [11]. Input images of various sizes are passed through the convolution layers, which generate the feature maps. The layer following the final convolution layer, termed the spatial pyramid pooling (SPP) layer, pools these feature maps into a representation of uniform dimensions. Even though SPP-net achieves higher accuracy than R-CNN, it consumes more storage space due to SVM training and fitting of the bounding box regressor. The method is not appropriate for real-time applications because of its high computation time.
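The way an SPP layer produces a fixed-length output from feature maps of different sizes can be sketched as follows. This is a simplified illustration using max pooling over 1×1, 2×2 and 4×4 grids; the pyramid levels, the bin-edge rule and the function name are assumptions, not the exact configuration of [10].

```python
import numpy as np

def spatial_pyramid_pool(feature_map, levels=(1, 2, 4)):
    """Max-pool a (C, H, W) feature map into a vector of length C * sum(n*n)."""
    _, h, w = feature_map.shape
    pooled = []
    for n in levels:
        # Bin edges of an n x n grid that covers the whole feature map
        row_edges = np.linspace(0, h, n + 1)
        col_edges = np.linspace(0, w, n + 1)
        for i in range(n):
            r0 = int(np.floor(row_edges[i]))
            r1 = int(np.ceil(row_edges[i + 1]))
            for j in range(n):
                c0 = int(np.floor(col_edges[j]))
                c1 = int(np.ceil(col_edges[j + 1]))
                pooled.append(feature_map[:, r0:r1, c0:c1].max(axis=(1, 2)))
    return np.concatenate(pooled)

rng = np.random.default_rng(0)
small = rng.random((3, 13, 17))   # feature map from a small input image
large = rng.random((3, 20, 9))    # feature map from a differently sized image
vec_small = spatial_pyramid_pool(small)
vec_large = spatial_pyramid_pool(large)
```

Both vectors have length 3 × (1 + 4 + 16) = 63 regardless of the spatial size of the feature map, which is what lets the fully connected layers accept inputs of arbitrary size without cropping or warping.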
Fast R-CNN is an improved variant of R-CNN. A fully convolutional network takes the input image and multiple regions of interest (RoIs) as input. A fixed-size feature map is obtained by pooling each RoI, and fully connected layers (FCs) map it to a feature vector. Fast R-CNN has the following advantages: (i) it detects objects faster than R-CNN and SPP-net, (ii) training is done in a single stage, and (iii) the number of computations is reduced. However, the selective search algorithm used for region proposals still requires considerable computation. Enhancements to Fast R-CNN resulted in an advanced version called Faster R-CNN, proposed in [12]. It overcomes the issue of high computational complexity by using a region proposal network (RPN). It is ten times faster than Fast R-CNN and also provides improved accuracy. Region-based fully convolutional networks (R-FCN) also use an RPN for generating candidate RoIs and use softmax for classification. With R-FCN, recognition and localization can be done simultaneously to detect objects. R-FCN and Faster R-CNN have the same accuracy, but R-FCN is faster than Faster R-CNN.
Regression-based models yield better results for real-time applications. YOLO is a CNN that can detect objects in real time and can be trained end to end. YOLO uses a CNN front end for feature extraction and two FCs for classification, and regression is done over grid regions. The accuracy of YOLO is lower than that of Faster R-CNN because YOLO does not use region proposals. The single-shot multi-box detector (SSD) combines two mechanisms: the regression of YOLO and the anchor approach of Faster R-CNN. SSD decreases the computational complexity by using the regression idea of YOLO, and thus supports real-time applications. SSD maintains detection accuracy by extracting features at distinct scales and aspect ratios. SSD gives 70% accuracy, but it does not detect small objects correctly. After an object is detected, it needs to be assigned to a category. Object classification is done under four criteria, namely (i) shape, (ii) motion, (iii) color, and (iv) texture. CornerNet [13] eliminates anchor box generation and detects the bounding box of an object as a pair of keypoints (top-left and bottom-right corners) by adding a new pooling layer called "corner pooling." In contrast, CenterNet [14] uses one more keypoint at the center and thus detects objects as "triplets." EfficientDet [15] proposed a bidirectional feature pyramid network that fuses features at multiple scales to perform fast and easy object detection. Figure 2 shows the results of 2D object detection by Yolo v2 [16] on the KITTI dataset [17] using ResNet-50 as backbone.
Fig. 2 2D object detection using Yolo v2 on KITTI dataset
3 Object Tracking
Object tracking segments the RoI from a video sequence and monitors its motion, position and occlusion. Objects are tracked by observing spatial and temporal changes in their position, size and shape. Figure 3 shows the various categories of object tracking. The methods proposed for object tracking are explained in Sects. 3.1–3.4.
Fig. 3 Object tracking methods
3.1 Point Tracking
Objects identified in successive frames are represented as points. These points are associated with respect to the object's previous state, such as position and motion [1]. This method needs an external mechanism to detect the objects in each frame. The Kalman filter (KF) is proposed in [18]. The KF estimates the state of a linear system and assumes that the state follows a Gaussian distribution. There are two stages in the KF, namely (i) prediction and (ii) correction. In the first stage, the state model predicts the new state of the variables. In the second stage, the observation model corrects the prediction so as to minimize the error covariance of the estimator. The KF is therefore an optimal state estimator under these assumptions. The filter performs better if the initial state, initial state error covariance, process noise covariance and measurement noise covariance are known a priori. Two variations of the KF based on the transient gain exist in the literature: constant gain KF [19–22] and adaptive gain KF [23–25]. The performance of the KF depends on the difference between the actual and the predicted state variables. In some critical cases, the KF fails to give good results, and adaptive versions were developed to increase the performance. The disparity between the estimated state and the actual state is referred to as the error. Based on the error, the KF can either converge or diverge. If the error is small, the extended KF (EKF) will converge; otherwise, the EKF permanently diverges. To prevent such divergence, the KF is modified with a feedback loop and named the fractional order gain Kalman filter (FOGKF) [26]. The fractional derivative is calculated from past gains and used as feedback. The KF gives unacceptable estimates of state variables that do not follow a Gaussian distribution. This limitation can be overcome by using the particle filter (PF). In the PF, the conditional state density at time t is represented by a set of samples with weights $\pi^{(n)}$. The PF involves three steps, namely selection, prediction, and correction. It can be used for tracking multiple objects. Variants of the PF are listed in [27, 28]. Multiple hypothesis tracking (MHT), presented in [29], is the most commonly used approach for resolving the data association problem in tracking multiple objects. A classic instance of a data association conflict is given in [30]. The traditional method for solving the data association problem is the global nearest neighbor (GNN) approach [31]. It selects the best hypothesis for updating the existing tracks and initiating new tracks. It works well for widely spaced targets, precise measurements and few false alarms in the track gates. GNN performance can be improved in two ways: (i) increasing the KF covariance matrix and (ii) using the joint probabilistic data association (JPDA) method. JPDA calculates a weighted aggregate of all observations present in a track's gate and thereby limits erratic association conditions [32, 33]. More than one track may be updated by an observation.
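The prediction and correction stages of the KF can be sketched for a point moving at roughly constant velocity along one axis. The state model, noise covariances and measurements below are illustrative assumptions, not values from [18].

```python
import numpy as np

def kf_predict(x, P, F, Q):
    """Prediction: propagate the state and its error covariance."""
    return F @ x, F @ P @ F.T + Q

def kf_correct(x, P, z, H, R):
    """Correction: update the prediction with the measurement z."""
    innovation = z - H @ x
    S = H @ P @ H.T + R                   # innovation covariance
    K = P @ H.T @ np.linalg.inv(S)        # Kalman gain
    x = x + K @ innovation
    P = (np.eye(len(x)) - K @ H) @ P
    return x, P

dt = 1.0
F = np.array([[1.0, dt], [0.0, 1.0]])     # state: [position, velocity]
H = np.array([[1.0, 0.0]])                # only the position is measured
Q = 1e-4 * np.eye(2)                      # process noise covariance (assumed)
R = np.array([[0.25]])                    # measurement noise covariance (assumed)

x = np.array([0.0, 0.0])                  # initial state guess
P = 10.0 * np.eye(2)                      # large initial uncertainty
for z in [1.1, 1.9, 3.2, 3.9, 5.1]:       # noisy positions, one per frame
    x, P = kf_predict(x, P, F, Q)
    x, P = kf_correct(x, P, np.array([z]), H, R)
```

After a few frames the estimated velocity approaches the underlying motion of about one unit per frame, and the position variance falls below the measurement noise.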
The enhancements made to the GNN method increased the computational complexity, which paved the way for MHT. Alternative data association hypotheses are created when observation-to-track conflicts appear, as depicted in Fig. 4. Unlike the JPDA method, which either selects the best hypothesis or merges the hypotheses, MHT propagates the hypotheses into the future in anticipation that subsequent data will resolve the ambiguity. The formation of multiple hypotheses in MHT is described in [30]. In [34], the association problem is modeled using a bipartite graph edge covering technique with targets and object detection information as inputs. The authors introduced a multiple-hypothesis approach that handles objects in any of the following conditions: (1) entering, (2) exiting, (3) merging, (4) splitting, and (5) being detected as split fragments because of the limitations of background subtraction. MHT is extended with a generalized recursion of track-oriented MHT for repeated measurement cases [35]. In the first stage, all repeated measurements are formulated by the same measurement equation, and the general multiple target tracking problem is addressed in the second stage. The two-stage MHT solution was found to give better performance. The developments of MHT are reviewed in [36], which also summarizes measurement-oriented MHT (MO-MHT), track-oriented MHT (TO-MHT), distributed processing and graph-based MHT.
Fig. 4 MHT: track gates around the predicted target positions PT1 and PT2, with observation positions OB1, OB2 and OB3
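The gating and global nearest neighbor association discussed above can be sketched for a scenario like that of Fig. 4. The coordinates are hypothetical, the gate is a simple Euclidean radius, and the exhaustive search over assignments is only practical for a handful of tracks; the sketch also assumes that every track has a gated observation.

```python
from itertools import permutations
import numpy as np

def gnn_associate(predicted, observations, gate=2.0):
    """Return {track index: observation index} with minimum total distance."""
    cost = np.linalg.norm(
        predicted[:, None, :] - observations[None, :, :], axis=2
    )
    n_tracks, n_obs = cost.shape
    best, best_cost = {}, np.inf
    for perm in permutations(range(n_obs), n_tracks):
        pairs = list(enumerate(perm))
        if any(cost[t, o] > gate for t, o in pairs):
            continue                      # observation falls outside the gate
        total = sum(cost[t, o] for t, o in pairs)
        if total < best_cost:
            best, best_cost = dict(pairs), total
    return best

# Hypothetical positions: PT1, PT2 and OB1, OB2, OB3
predicted = np.array([[0.0, 0.0], [3.0, 0.0]])
observations = np.array([[1.4, 0.0], [-0.5, 0.5], [3.2, 0.3]])
assignment = gnn_associate(predicted, observations)
```

Here OB1 lies inside both gates, but the globally cheapest assignment pairs PT1 with OB2 and PT2 with OB3. GNN commits to this single hypothesis, whereas MHT would keep the alternatives involving OB1 alive until later frames resolve the conflict.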
3.2 Kernel Tracking
Kernel tracking is commonly used in the target representation and localization process of object tracking. It is iterative in nature and attempts to maximize the Bhattacharyya coefficient. Template matching is a brute-force method that searches for the RoI in the video. It takes a frame from the video and compares the reference image with it. It is used in digital image processing to find small parts of an image that match a given model or template. The matching is done at all possible positions of the original image. Template matching can be used for single object tracking and can handle partial occlusions of the object. The mean-shift tracking algorithm was introduced in [37]. The target localization problem is formulated as the maximization of a similarity measure derived from the Bhattacharyya coefficient. The mean-shift algorithm is extended with a background-weighted histogram (BWH) in [38]: background features are derived and used to pick the salient elements of the target model, which reduces the interference of the background in target representation and localization. BWH is modified into the corrected background-weighted histogram (CBWH) in [39], which gives better results even when the target model contains considerable background information. Here, a background histogram and a transfer function are employed to decrease the effect of background colors on the target model. This greatly reduces the sensitivity of mean-shift tracking to target initialization. The optimal color-based mean-shift algorithm [40] extracts optimal colors by histogram agglomeration and clusters the 3D color histogram bins with their corresponding frequency ratios. It generates a confidence map based on the indices of the optimal colors. In [41], the authors presented a mean-shift-based method to track objects under occlusion and variations in target scale. An adaptive mean-shift method for automatic tracking of multiple objects is proposed in [42]. The method uses a mixture of Gaussians for foreground extraction, and the trackers are automatically refreshed to handle changes in the object's size and shape. A multi-scale mean-shift tracking algorithm is introduced in [43]. This method combines a multi-scale model with a background-weighted spatial histogram and is useful for targets with scale changes and complex appearance variations. The support vector machine (SVM) is a supervised learning algorithm often used for classification and regression. It separates the given data into two classes by a hyperplane, i.e., objects to be tracked and objects not to be tracked. It can be applied to tracking a single object and cannot handle partially occluded objects. The algorithm needs manual initialization, and its computational complexity is high. An object tracking method based on a multi-view learning framework using multiple SVMs is presented in [44]. Multiple views of three different features, namely gray scale value, histogram of oriented gradients (HOG) and local binary pattern (LBP), were used to train the SVM. A method for tracking multiple aircraft using a structured SVM for handling occlusions is proposed in [45]. In [46], the authors introduced the graph-regularized structured (GS) SVM, formulated by combining the strategies of manifold learning and structured learning. In [47], a state-based structured SVM tracking algorithm in combination with incremental principal component analysis (PCA) is recommended. It precisely determines and anticipates the object's states. The incremental PCA is used to update the virtual feature vector corresponding to the virtual state and the principal subspace of the object's feature vectors.
A framework for adaptive visual object tracking based on structured output prediction is presented in [48]. This method does not require an intermediate classification step. It uses a kernelized structured output SVM, learned online, to provide adaptive tracking. A budgeting mechanism prevents unbounded growth in the number of support vectors. Multiple object tracking can be performed using layering-based tracking, which is capable of tracking fully occluded objects. A hierarchical layered tracking structure is proposed in [49] to perform tracking layer by layer in a sequential manner. The target is estimated by two models: (i) a nonlinear motion model and (ii) an interaction model based on inter-target correlation. Real-world scenes are represented in [50] by merging depth-ordered and interacting graphical model layers. Mutual overlap of observation regions and their occlusions are handled by relocatable objects. A layered graph model [51] in the RGB image and depth domains is proposed for real-time robust multi-pedestrian tracking. Pedestrian detection responses in the RGB domain are represented as graph nodes, and 3D motion, appearance, and depth features are represented as graph edges. It uses a heuristic label-switching algorithm for optimal association and tracking of multiple pedestrians.
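The similarity measure that mean-shift tracking maximizes in this section can be illustrated with a small sketch. It computes the Bhattacharyya coefficient between a target histogram and two candidate histograms. The function names and the 4-bin histograms are hypothetical and chosen only for illustration; the cited trackers use kernel-weighted color histograms, which are not reproduced here.

```python
import math


def normalize(hist):
    """Scale a histogram of counts so that its bins sum to 1."""
    total = float(sum(hist))
    if total == 0:
        raise ValueError("histogram is empty")
    return [h / total for h in hist]


def bhattacharyya_coefficient(p, q):
    """Similarity of two normalized histograms: sum over bins of sqrt(p_u * q_u)."""
    if len(p) != len(q):
        raise ValueError("histograms must have the same number of bins")
    return sum(math.sqrt(a * b) for a, b in zip(p, q))


# Hypothetical 4-bin color histograms (raw counts), for illustration only.
target = normalize([10, 30, 40, 20])
candidate_near = normalize([12, 28, 38, 22])
candidate_far = normalize([70, 20, 5, 5])

score_near = bhattacharyya_coefficient(target, candidate_near)
score_far = bhattacharyya_coefficient(target, candidate_far)

# A tracker would move toward the candidate region with the higher coefficient.
print(score_near, score_far)
```

The coefficient lies between 0 and 1 and equals 1 only when the two normalized histograms are identical, which is why a tracker can treat it as a score to be maximized over candidate locations.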
3.3 Silhouette Tracking
In this method, the object tracker finds the region bounded by the object in every frame, using an object model created from earlier frames. This model can take any of the following forms: (i) color histogram, (ii) object edges, and (iii) object contour. Silhouette trackers fall into one of the following two classes:
1. Shape matching: finds the object shape in the current frame;
2. Contour tracking: evolves the initial contour to its new position in the current frame.
3.4 Tracking Based on Deep Learning
Numerous object tracking methods based on neural networks have been proposed; a few recent works are presented here. The tracking methods proposed in [52–54] are based on deep CNNs. Features are represented by deep hidden layers, which results in better tracking performance. However, because these deep CNN-based trackers use supervised learning, they run into difficulties when applied to partially labeled video sequences. To overcome this limitation, reinforcement learning is used in [55] to exploit partly labeled video sequences and train an action-driven deep tracker. The algorithm proposed in [54] uses a CNN-based tracking-by-detection procedure, where the CNNs are trained with the datasets in [56] and [57]. An adaptive weighted algorithm for integrating correlation response maps is presented in [58]. The authors decompose the tracking system into two phases: (i) a translation estimation phase and (ii) a scale estimation phase. They apply various compatible correlation filters with hierarchical CNN features in the translation estimation phase and a 1D correlation filter with histogram of oriented gradients (HOG) features in the scale estimation phase. In [59], a deep CNN is devised to count the number of vehicles on the road. Fast Fourier transform networks (FFTNet) for object tracking are deployed in [60]. FFTNet is a correlation filter (CF) that combines two essential elements of CF: autocorrelation and cross-correlation. It also integrates the correlation filter with convolutional neural networks for training the feature description and matching function. A convolutional regression framework for visual object tracking is proposed in [61] to overcome the drawbacks of discriminatively learned correlation filters (DCF). In this framework, the regression model is built over a one-channel-output convolution layer.
The GOTURN tracker [62] runs at 100 fps; it crops the search region from the current frame and the target object from the previous frame, and finds the target location within the search region. SORT [63] uses a Kalman filter and the Hungarian algorithm for frame-to-frame data association. SORT fails when objects are occluded. DeepSORT [64] overcomes this limitation by using both appearance and motion information for tracking. STAM-MOT [65] tracks multiple objects by applying a Markov decision process. FCNT [66] performs single-object tracking by taking properties of convolutional
layers from multiple levels. The advantages and drawbacks of various object detection and tracking methods are shown in Table 2.

Table 2 Comparative study of object detection and tracking methods

Type of detections | Methodology | Advantages | Disadvantages

Object detection methods
Frame difference | Absolute difference between frames | Easy to implement | Not suitable for real-time applications
Background subtraction | Median approximation | Adequate background modeling not needed | Requires buffer for recent pixel values
Background subtraction | Gaussian averaging | Suitable for real-time applications | Time complexity is high
Background subtraction | Mixture of Gaussian | Requires less memory | Less accuracy for noise-affected videos
Optical flow | Partial derivatives of spatial and temporal coordinates | Produces complete movement information | High computation complexity
Deep learning | Neural network | Improved accuracy | High computation complexity

Object tracking methods
Point tracking | Kalman filter | Tracks points in noisy images | State variables are distributed
Point tracking | Particle filter | Works for images with occlusion and complex background | Not suitable for real-time applications and high computation time
Point tracking | Multiple hypothesis tracking (MHT) | Multiple object tracking and can handle occlusions | Needs more memory and time
Kernel tracking | Simple template matching | Works for images with partial occlusion | RoI of each image requires equivalent model
Kernel tracking | Mean-shift method | Suitable for real-time applications with less computation cost | Iterations get into local maximum easily
Kernel tracking | Support vector machine | Works for images with partial occlusion | Initialization and training need to be done physically
Kernel tracking | Layering-based tracking | Handles full occlusion and tracks multiple objects | Parametric models of each pixel are required
Silhouette tracking | Contour matching | Object shapes are modeled implicitly | Consumes more time for state space estimation
Silhouette tracking | Shape matching | Less sensitive to variations in appearance | Performance is very less
Deep learning | Neural networks | Suitable for real-time applications | High computation complexity, less robustness
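The frame-to-frame data association step mentioned for SORT in Sect. 3.4 can be sketched as an assignment problem between existing tracks and new detections. The sketch below assumes an intersection-over-union (IoU) score between bounding boxes and, for brevity, replaces the Hungarian algorithm with an exhaustive search over permutations, which finds the same optimal assignment but is only practical for a handful of objects. The function names, the box format (x1, y1, x2, y2), and the `min_iou` threshold are hypothetical and not taken from the cited papers.

```python
from itertools import permutations


def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def associate(tracks, detections, min_iou=0.3):
    """Return (track_index, detection_index) pairs that maximize total IoU.

    Exhaustive search stands in for the Hungarian algorithm here; it assumes
    len(tracks) <= len(detections) and very small inputs.
    """
    best_pairs, best_score = [], -1.0
    for perm in permutations(range(len(detections)), len(tracks)):
        score = sum(iou(tracks[t], detections[d]) for t, d in enumerate(perm))
        if score > best_score:
            best_score, best_pairs = score, list(enumerate(perm))
    # Discard pairs whose overlap is too small to be the same object.
    return [(t, d) for t, d in best_pairs if iou(tracks[t], detections[d]) >= min_iou]


# Hypothetical boxes: two tracked objects and three detections in the next frame.
tracks = [(0, 0, 10, 10), (20, 20, 30, 30)]
detections = [(21, 21, 31, 31), (1, 1, 11, 11), (100, 100, 110, 110)]
matches = associate(tracks, detections)
print(matches)
```

In a full tracker, the track boxes would be the positions predicted by the Kalman filter, and unmatched detections would start new tracks.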
4 Conclusion
This paper surveys various object detection and tracking methods. As future work, an algorithm will be proposed to overcome the limitations of existing object detection and tracking methods, so that it can be applied to real-time applications with low computational complexity and greater robustness.
References
1. A. Yilmaz, O. Javed, M. Shah, Object tracking: a survey. ACM Comput. Surv. 38(4), 1–45 (2006)
2. https://en.wikipedia.org/wiki/Optical_flow
3. B. Shijila, A.J. Tom, S.N. George, Simultaneous denoising and moving object detection using low rank approximation. Elsev. Fut. Generat. Comput. Syst. 90, 198–210 (2019)
4. M. Babaee, D.T. Dinh, G. Rigolla, A deep convolutional neural network for video sequence background subtraction. Pattern Recogn. 76, 635–649 (2018)
5. Y. Wang, Z. Luo, P.-M. Jodoina, Interactive deep learning method for segmenting moving objects. Pattern Recogn. Lett. 96, 66–75 (2017)
6. R. Chen Richao, G. Yang Gaobo, N. Zhu Ningbo, Detection of object-based manipulation by the statistical features of object contour. Forensic Sci. Int. 236, 164–169 (2014)
7. Z. Zhong-Qiu, Z. Peng, X. Shou-tao, W. Xindong, Object detection with deep learning: a review. J. Latex Class Files 14 (2017)
8. J.R.R. Uijlings, K.E.A. Van De Sande, T. Gevers et al., Selective search for object recognition. Int. J. Comput. Vision 104, 154–171 (2013)
9. R. Girshick, J. Donahue, T. Darrell, J. Malik, Rich feature hierarchies for accurate object detection and semantic segmentation, in IEEE Conference on Computer Vision and Pattern Recognition, pp. 580–587 (2014)
10. K. He, X. Zhang, S. Ren, J. Sun, Spatial pyramid pooling in deep convolutional networks for visual recognition 1–14 (2015). arXiv:1406.4729v4 [cs.CV]
11. S. Lazebnik, C. Schmid, J. Ponce, Beyond bags of features: spatial pyramid matching for recognizing natural scene categories. CVPR (2006)
12. S. Ren, K. He, R. Girshick, J. Sun, Faster R-CNN: towards real-time object detection with region proposal networks. IEEE Trans. Pattern Anal. Mach. Intell. 39(6) (2017)
13. L. Hei, D. Jia, CornerNet: detecting objects as paired keypoints. Eur. Conf. Comput. Vis. (ECCV), 1–17 (2018)
14. D. Kaiwen, B. Song, X. Lingxi, Q. Honggang, H. Qingming, T. Qi, Centernet: Keypoint triplets for object detection. IEEE Int. Conf. Comput Vis. (ICCV), 6568–6577 (2019)
15. T. Mingxing, P. Ruoming, V.L. Quoc, EfficientDet: Scalable and efficient object detection, in The IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2020), pp. 10778–10787
16. R. Joseph, F. Ali, Yolo9000: Better, faster, stronger. The IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2017)
17. A. Geiger, P. Lenz, C. Stiller, R. Urtasun, Vision meets robotics: the kitti dataset. Int. J. Rob. Res. 32(11), 1231–1237 (2013)
18. R.E. Kalman, A new approach to linear filtering and prediction problems. ASME J. Basic Eng. 82, 35–45 (1960)
19. A. Kumar, M. Ananthasayanam, P.S. Rao, A constant gain Kalman filter approach for the prediction of re-entry of risk objects. Acta Astronaut 61(10), 831–839 (2007)
20. G. Cook, D.E. Dawson, Optimum constant-gain filters. IEEE Trans. Ind. Electron. Control Instrum. 21(3), 159–163 (1974)
21. K. Wilson, An optimal control approach to designing constant gain filters. IEEE Trans. Aerosp. Electron. Syst. 8(6), 836–842 (1972)
22. P.A. Yadav, N. Naik, M. Ananthasayanam, A constant gain Kalman filter approach to track maneuvering targets, in Proceedings of the IEEE International Conference on Control Applications (2013), pp. 562–567
23. N. Chernoguz, Adaptive-gain tracking filters based on minimization of the innovation variance, in International Conference on Acoustics, Speech, and Signal Processing, vol. 3 (2006)
24. H.Y. Chung, T.W. Kim, S.H. Chang, B. Lee, Adaptive Kalman gain approach to on-line instrument failure detection with improved GLR method and suboptimal control on loft pressurizer. IEEE Trans. Nucl. Sci. 33(4), 1103–1114 (1986)
25. J.W. Almagbile, W. Ding, Evaluating the performances of adaptive Kalman filter methods in GPS/INS integration. J. Global Positioning Syst. 9(1), 33–40 (2010)
26. H. Kaur, J.S. Sahambi, Vehicle tracking in video using fractional feedback Kalman filter. IEEE Trans. Comput. Imag. 2(4) (2016)
27. Z. H. E. Chen, Bayesian filtering: from Kalman filters to particle filters, and beyond sequential monte carlo estimation (2003)
28. N.G. Branko Ristic, S. Arulampalam, Beyond the Kalman Filter: Particle Filters for Tracking Applications (Artech House, 2003)
29. D. Reid, An algorithm for tracking multiple targets. IEEE Trans. Autom. Control 24(6), 843–854 (1979)
30. S. Blackman, R. Popoli, Design and Analysis of Modern Tracking Systems (Artech House, Norwood, MA, 1999)
31. S.S. Blackman, Multiple hypothesis tracking for multiple target tracking. IEEE A&E Syst. Magaz. 19(1), 5–18 (2004)
32. Y. Bar-Shalom, X.-R. Li, Multitarget-Multisensor Tracking: Principles and Techniques (YBS Publishing, Storrs, CT, 1995)
33. Y. Bar-Shalom, E. Tse, Tracking in a cluttered environment with probabilistic data association. Automatica 11, 451–460 (1975)
34. S.-W. Joo, R. Chellappa, A Multiple-hypothesis approach for multiobject Visual tracking. IEEE Transactions on Image Processing 16(11) (2007)
35. S.P. Coraluppi, C.A. Carthel, Multiple-hypothesis tracking for targets producing multiple measurements. IEEE Trans. Aerosp. Electron. Syst. 54(3) (2018)
36. C.-Y. Chong, S. Mori, D.B. Reid, 21st International Conference on Information Fusion (FUSION) (2018)
37. D. Comaniciu, V. Ramesh, P. Meer, Real-time tracking of non-rigid objects using mean shift, in Proceeding IEEE Conference on Computer Vision and Pattern Recognition, vol. 2 (Hilton Head, SC, 2000), pp. 142–149
38. D. Comaniciu, V. Ramesh, P. Meer, Kernel-based object tracking. IEEE Trans. Pattern Analys. Mach. Intell. 25(5), 564–577 (2003)
39. J. Ning, L. Zhang, D. Zhang, C. Wu, Robust mean shift tracking with corrected background-weighted histogram. IET Comput. Vision 6(1), 62–69 (2012)
40. X. An, J. Kim, Y. Han, Optimal colour-based mean shift algorithm for tracking objects. IET Comput. Vision 8(3), 235–244 (2014)
41. D.T. Stathaki, Mean shift tracking through scale and occlusion. IET Sig. Proc. 6(5), 534–540 (2012)
42. C. Beyan, A. Temizel, Adaptive mean-shift for automated multi object tracking. IET Comput. Vision 6(1), 1–12 (2012)
43. Yu. Wangsheng, X. Tian, Z. Hou, Y. Zha, Y. Yang, Multi-scale mean shift tracking. IET Comput. Vision 9(1), 110–123 (2015)
44. S. Zhang, Yu. Xin, Y. Sui, S. Zhao, Li. Zhang, Object tracking with multi-view support vector machines. IEEE Trans. Multimedia 17(3), 265–278 (2015)
45. Z. Xie, Z. Wei, C. Bai, Multi-aircrafts tracking using spatial–temporal constraints-based intraframe scale-invariant feature transform feature matching. IET Comput. Vis. 9(6), 831–840 (2015)
46. S. Zhang, Y. Sui, S. Zhao, Li. Zhang, Graph-regularized structured support vector machine for object tracking. IEEE Trans. Circ. Syst. Video Technol. 27(6), 1249–1262 (2017)
47. Y. Yin, Xu. De, X. Wang, M. Bai, Online state-based structured SVM combined with incremental PCA for robust visual tracking. IEEE Trans. Cybernet. 45(9), 1988–2000 (2015)
48. S. Hare, S. Golodetz, A. Saffari, V. Vineet, M.-M. Cheng, S.L. Hicks, H. Philip, Struck: structured output tracking with kernels. IEEE Trans. Pattern Anal. Mach. Intell. 38(10), 2096–2109 (2016)
49. X. Cao, X. Jiang, X. Li, P. Yan, Correlation-based tracking of multiple targets with hierarchical layered structure. IEEE Trans. Cybernet. 48(1), 90–102 (2018)
50. V. Ablavsky, S. Sclaroff, Layered graphical models for tracking partially occluded object. IEEE Trans. Pattern Anal. Mach. Intell. 33(9), 1758–1775 (2011)
51. S. Gao, Z. Han, Ce. Li, Q. Ye, J. Jiao, Real-time multi pedestrian tracking in traffic scenes via an RGB-D-based layered graph model. IEEE Trans. Intell. Transp. Syst. 16(5), 2814–2825 (2015)
52. N. Wang, S. Li, A. Gupta, D.-Y. Yeung, Transferring rich feature hierarchies for robust visual tracking. [Online] (2015). Available: https://arxiv.org/abs/1501.04587
53. S. Hong, T. You, S. Kwak, B. Han, Online tracking by learning discriminative saliency map with convolutional neural network. [Online] (2015). Available: https://arxiv.org/abs/1502.06796
54. H. Nam, B. Han, Learning multi-domain convolutional neural networks for visual tracking. [Online] (2015). Available: https://arxiv.org/abs/1510.07945
55. S. Yun, et al., Action-driven visual object tracking with deep reinforcement learning. IEEE Trans. Neural Netw. Learn. Syst. 29(6), 2239–2252 (2018)
56. Y. Wu, J. Lim, M.H. Yang, Object tracking benchmark. IEEE Trans. Pattern Anal. Mach. Intell. 37(9), 1834–1848 (2015)
57. M. Kristan, et al., The visual object tracking VOT2014 challenge results, in Processing European Conference on Computer Vision, Workshops (2014), pp. 191–217
58. H. Zhaohui, L. Guixi, Z. Haoyang, Correlation filter-based visual tracking via adaptive weighted CNN features fusion. IET Image Proc. 12, 1423–1431 (2018)
59. J. Chung, K. Sohn, Image-based learning to measure traffic density using a deep convolutional neural network. IEEE Trans. Intell. Transp. Syst. 19(5), 1670–1675 (2018)
60. Z. He, Z. Zhang, C. Jung, Fast fourier transform networks for object tracking based on correlation filter. IEEE Access 6, 6594–6601 (2018)
61. K. Chen, W. Tao, Convolutional regression for visual tracking. IEEE Trans. Image Process. 27(7), 3611–3620 (2018)
62. H. David, T. Sebastian, S. Silvio, Learning to Track at 100 FPS with Deep Regression Networks (2016). CoRR, abs/1604.01802
63. Z. Bewley, L. Ge, F.R. Ott, B. Upcroft, Simple online and realtime tracking, in IEEE International Conference on Image Processing (ICIP) (2016), pp. 3464–3468
64. N. Wojke, A. Bewley, D. Paulus, Simple online and realtime tracking with a deep association metric, in IEEE International Conference on Image Processing (ICIP) (2017), pp. 3645–3649
65. C. Qi, O. Wanli, L. Hongsheng, W. Xiaogang, L. Bin, Y. Nenghai, Online multi-object tracking using cnn-based single object tracker with spatial-temporal attention mechanism, in The IEEE International Conference on Computer Vision (ICCV) (2017), pp. 4836–4845
66. W. Lijun, O. Wanli, W. Xiaogang, L. Huchuan, Visual tracking with fully convolutional networks, in The IEEE International Conference on Computer Vision (ICCV) (2015), pp. 3119–3127
Chapter 3
Story-Plot Recommender Using Bidirectional LSTM Network
Manmohan Dogra, Jayashree Domala, Jenny Dcruz, and Safa Hamdare

M. Dogra (B) · J. Domala · J. Dcruz · S. Hamdare
St. Francis Institute of Technology, Borivali-west, Mumbai, India
e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022
R. Tiwari et al. (eds.), Proceedings of International Conference on Computational Intelligence, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-16-3802-2_3

1 Introduction
According to data collected from 123 countries by the United Nations Educational, Scientific, and Cultural Organization (UNESCO), approximately 2.2 million books are published per year [1]. Many factors can reduce this number, such as a lack of inventiveness or originality, or insipid and monotonous content. This can be avoided by the wave of inspiration that a plot generator can provide. A plot generator is a tool that generates elemental anecdotes or ideas for a plot. It could take the form of a computer program, a book composed of directions that flip independently of one another, a chart with multiple columns, or a set of adjoining reels that spin independently of one another. The user can select the elements of the story in the plot generator, or the generator may combine them randomly to produce a specific variation, in which case it is known as a random plot generator. However, such generators tend to produce conventional and insipid situations [2]. To avoid this, the paper presents a model that uses a long short-term memory (LSTM) network to build a story generator using the TensorFlow framework. To implement this, the data is pre-processed using natural language processing (NLP), which simply means bringing the text into a form that is predictable and analyzable for the task. Bidirectional LSTMs are then used to build the model. LSTM is an efficient, gradient-based method suited to classifying, processing, and making predictions based on a given seed [3]. It is a special kind of recurrent neural network (RNN) that handles long-term dependencies by means of its cell state and is used in domains such as machine translation and speech recognition. LSTMs can retain information for long periods [4], which is why they are the most suitable method for building this model. The output generated by the model is a chain of text without punctuation that might not be easy to understand. Therefore, it is crucial to use ‘segmentation’ to break the text into sentences and a ‘punctuator’ to make sense of the segmented text. To establish that a bidirectional LSTM is the best method for this use case, it is compared with other model-building algorithms, namely a simple LSTM network and a gated recurrent unit (GRU) network. The main aim of this paper is to provide a lucid system that can be used by anyone who needs inspiration to write a piece.
2 Related Work
Several researchers have worked on text generation using a wide variety of algorithms. The authors of [5] developed a generator based on the pointer-generator network, with a component that deals with out-of-vocabulary (OOV) and repeated words. They also introduce a mixed loss method that enables the generator to create story endings of high semantic relevance. A reward manager calculates the reward used to fine-tune the generator with policy-gradient reinforcement learning (PGRL). The results show that the model surpasses the sequence-to-sequence baseline model by 15.75% and 13.57% in terms of CIDEr and consistency score, respectively. In [6], the authors aim to build a system that generates stories automatically. It uses 31 move functions and adopts the existing Proppian story generator; the revised Proppian story generator contains five story move functions. Each statement generated in the story goes through a reasoning process that examines its semantics. Conceptual reasoning, implemented with first-order logic and an ontology, keeps the stories rational. Lines that lack suitable action verbs, nouns, and relational words are detected, and the changes needed to rectify the errors are proposed. Semantic reasoning thus helps to improve the story’s quality. In [7], the authors propose a support method for creating plots, which incorporates the idea of a story relating events according to the chronology of the narrative world. They define the format of the plot as well as the story. Experiments were performed to assess the validity of the plot and the story and the viability of the framework.
It was also discovered that the development of the plots of already published novels could be sufficient. Additionally, the system assisted authors in creating plots for distinct genres. The authors of [8] developed a short story generator, called an episode, that depends on autonomous character agents. The character agents have layered objectives, namely character goals, viewer goals, and plot goals, which are used to generate steady and consistent stories. The viewer goal represents the emotional states the author wants users to experience. The plot goal, also called the plot point, represents a key scene; to achieve it, the characters assume a role. The character goal is what the character wants to accomplish professionally, physiologically, or intellectually. They put forth a system that can create plot goals that attain both viewer and character goals. In [9], the authors approach story generation from the point of view of how human authors build stories from writing prompts. The framework selects a couple of random words as a prompt, which establishes the basic parameters for creating a story. Unlike a human writer, a computer cannot instinctively know the context of a chosen word. Therefore, the Internet, and specifically existing ‘Concept Knowledge’ systems, is used to find the context for the chosen words and to direct the story generation process. The research focuses on the story itself and is not concerned with tracing a path from beginning to end during the generation cycle. The principal focus is generating stories that have the character of a human creator, and the story’s randomness is one of the most salient goals of the proposed system. The aim is to generate stories that do not already exist, that are not adapted from other stories, that do not fit into a previous construction, and that are not initiated by a user. The authors of [10] collect a large dataset of 300,000 human-written stories paired with writing prompts from an online forum. The dataset enables hierarchical story generation, in which the model first generates a premise and then transforms it into a passage of text. They gain further improvements with a new type of model fusion that improves the relevance of the plot, together with a new gated multi-scale self-attention mechanism. Experiments show large improvements over strong baselines on both human and automated evaluations. The authors of [11] first established that attributes of human interaction can be conveyed by a multimedia multi-agent demonstration.
Secondly, they showed that, with the support of the participating agents, various plot stages in stories can be associated with diverse interpretations. Lastly, subjects formed interpretations of definite and cohesive stories, which were rated as plausible when the presentations were produced by the computer. A readily constructed set of interpretation tags and a single base story allows the generation of a large number of distinct, plausible, and coherent story-morphs. Here the external plot steps stay similar, yet the thematic material and the inner lives of the characters vary greatly for specific kinds of situations. While limited in scope, their results at least suggest that this approach will hold up under more thorough testing. In [12], the authors propose a data-driven approach for generating short stories for children that does not need extensive manual input. They built an end-to-end system that realizes the different components of the generation pipeline stochastically. The system follows a generate-and-rank approach in which the space of candidate stories is pruned by considering whether they are plausible, interesting, and coherent. Their approach has three key features. First, the story plot is created dynamically by consulting an automatically built knowledge base. Secondly, the generator realizes the different components of the generation pipeline stochastically, without extensive manual coding. Thirdly, multiple stories are generated and stored efficiently in a tree data structure; story creation amounts to traversing the tree and selecting the nodes with the highest score. They developed two scoring functions that rate stories by how coherent and interesting they are. Overall, the results show that the generate-and-rank approach advocated here is viable for producing short stories that exhibit narrative structure. In [13], the authors introduce an approach to story generation that uses a library of vignettes that are presumed in advance to be ‘good.’ They present a story planning algorithm inspired by case-based reasoning, which incorporates vignettes into the story being generated. The algorithm requires the vignettes to be in the appropriate domain. In [14], the authors present a fully implemented prototype for interactive storytelling. They describe the main mechanisms behind the variability of plots within a sitcom-like scenario, and they evaluate how the dynamic interactions among users and agents influence the creation of the story. They show that, although the actors’ behaviors are deterministic, the interaction between actors can contribute considerably to story variation. This degree of unpredictability conditions the generation of dramatic situations. The character-centered approach has the advantage of being modular and extensible to many actors. In [15], the authors propose a neural network-based process to automate story plot generation, generating news stories taken from a dataset of story summaries. The system is broken down into smaller parts, such as generating a sequence of events and then transforming them into sentences. In [16], the INES (interactive narrative emotional storyteller) system is developed. Its plot generation is based on the Afanasyev framework and focuses on template-based generation with the help of simulated characters. In [17], the authors propose a new conditional variational autoencoder.
It is based on the transformer and helps in plot generation. The autoencoder has attention layers in both the encoder and the decoder. It takes help from the clues and learns from the latent variable. In [18], the researchers build a story generation model using causal relations obtained from commonsense plot ordering. When different genres are taken into consideration, the variations in the commonsense rules affect the story. A succinct summary of these papers, along with the algorithms used and the results, is given in Table 1.
3 Implementation
The implementation process is divided into two modules, namely the training module and the testing module. The training module trains the model, and the testing module tests the efficiency of that model.
3 Story-Plot Recommender Using Bidirectional LSTM Network Table 1 Comparison of the related work Title Author(s)
35
Algorithm
Result
From plots to endings: Zhao Y., Liu L., Liu a reinforced pointer C., Yang R., Yu D. generator for story ending generation
Pointer-generatornetwork
An intelligent automatic story generation system by revising proppian’s system Plot-creation support with plot-construction model for writing novels Plot can be created by using the format of the plot does not support such a bottom -up thinking process Automatic short story generator based on autonomous agents Random word retrieval for automatic story generation
A. J., G.V. U.
31 move functions along with this the standing Proppian story generator
Surpasses the sequence-to- sequence baseline model by 15.75% and 13.57% in terms of CIDEr andconsistency score System scores 78%
Atsushi Ashida
Tomoko Kojiri
Support method
Yunju Shim and Minkoo Kim"
Self-governing character agents
R. S. Colon, P. K.Patr and K. M. Elleithy
Concept Knowledge systems
Hierarchical neural story generation
Fan Angela, Lewis Mike
Dauphin Yann
Table 1 (continued)

| Title | Author(s) | Algorithm | Result |
| --- | --- | --- | --- |
| | | | Consistent story character agents have no learning abilities |
| | | | Based on the randomly selected words the system needs to generate a setting and develop a plot |
| | | Hierarchical story + gated multi-scale attention model | Upgrades over baselines on both human and computerized assessments |
| Story-morphing in the affective reasoning paradigm | Clark Elliott, Jacek Brzezinski, Sanjay Sheth, Robert Salvatoriello | Multimedia multi-agent demonstration | While restricted in scope, methodology will hold up under more thorough testing |
| Toward vignette-based story generation for drama management systems | Mark O. Riedl | Story arranging algorithm propelled by case-based thinking | Story generation by using prescreened vignettes but no guarantee that a new story made will be good |
| Learning to tell tales: a data-driven approach to story generation | Neil McIntyre, Mirella Lapata | Information-driven methodology | Generation-and-ranking methodology pushed here is reasonable in delivering short stories |
| Character-driven story generation in interactive storytelling | F. Charles, S. J. Mead and M. Cavazza | Character-based model for intuitive narrating | High-level action recognition of interactions between characters' behaviors, although actors' behaviors are deterministic |
| Story realization: expanding plot events into sentences | Prithviraj Ammanabrolu, Ethan Tien, Wesley Cheung, Zhaochen Luo, William Ma, Lara J. Martin, Mark O. Riedl | Neural network-based approach | Plot generation allows for planning toward plot points. However, they are unreadable and abstract, needing to be translated into sound sentences |
| Evolving the INES story generation system: from single to multiple plot lines | Eugenio Concepción, Pablo Gervás, Gonzalo Méndez | System based on the Afanasyev framework | Proper fitting of the precondition and postcondition of the episodes but inconsistency affects the plot |
| T-CVAE: transformer-based conditioned variational autoencoder for story completion | Tianming Wang and Xiaojun Wan | Conditional variational autoencoder based on Transformer | Generates plots with better coherence and diversity but models tend to generate generic and dull plots |
| Automated storytelling via causal, commonsense plot ordering | Prithviraj Ammanabrolu, Wesley Cheung, William Broniec, Mark O. Riedl | Based on soft causal relations inferred from commonsense reasoning | Parts of the story are better as compared to the whole but thematically relevant soft causal relations need to be improved |
3 Story-Plot Recommender Using Bidirectional LSTM Network
3.1 Module 1: Training
3.1.1 Step 1: Preparing the Dataset
Training requires a dataset of story excerpts. The stories may be drawn from a particular genre, in which case the model is trained for that genre. The dataset used here contains fictional stories, stored in a single text document.
3.1.2 Step 2: Analyzing the Dataset
The dataset, a text document, is inspected, and any special symbols that do not contribute to the efficiency of the model are removed.
3.1.3 Step 3: Implementing Preprocessing Techniques
NLP preprocessing prepares the dataset for training. It proceeds in the following steps.
(a) Lowercasing: The data is first converted to lowercase and split line-wise into a Python list of sentences. Lowercasing is needed because variation in input capitalization would otherwise give different outputs. For example, 'Doctor' and 'doctor' are the same word, but the model would treat them differently.
(b) Tokenization: Tokenization generates the dictionary of word encodings and creates vectors out of the sentences. An instance of the tokenizer is created, and it takes in the data and encodes it using the fit_on_texts method. The tokenizer provides a word_index property, which returns a dictionary of key-value pairs where the key is the word and the value is the token for that word. The length of this dictionary gives the total number of words.
(c) Generating a list of tokens: The sentences are then turned into lists of values based on the tokens generated in the previous step. The training x's are called input sequences and are held in a Python list. For each line in the corpus, the token list is generated using the tokenizer's texts_to_sequences method. This converts a line of text like 'frozen grass crunched beneath the steps' into a list of the tokens representing the words, as shown in Fig. 1.
Fig. 1 The list of tokens generated for a sample sentence
(d) Padding: The next step iterates over the token lists generated in the previous step and creates several n-gram sequences from each. These lists must then be made the same length; otherwise, it may be hard to train a neural network with them. Padding is used for this, which requires the length of the longest sequence in the corpus. A naive method finds it by iterating over all of the sequences. Once the longest sequence length is known, all of the sequences are padded to that length. Pre-padding with zeros is used because it makes the label easier to extract. A line is thus represented by a set of padded input sequences, as shown in Fig. 2.
(e) Generating input values and labels: Once the sequences are formed, they are turned into x's and y's, that is, input values and their labels. For each padded sequence, all tokens except the last are taken as the x, and the last token is used as the y, the label. This is the reason pre-padding was done: the label is obtained simply by grabbing the last token. The generation of input and output variables is shown in Fig. 3.
(f) One-hot encoding: Finally, the labels are one-hot encoded, since this is a classification problem: given a sequence of words, the model classifies over the words of the corpus to predict what the next word is likely to be.
Fig. 2 The padded sequence generated for a sample input sequence
Fig. 3 The process of generation of input and output variables
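The preprocessing steps (a) to (f) can be sketched in plain Python as follows. This is a minimal illustration, not the authors' code. The helper names (build_word_index, ngram_sequences, pre_pad, one_hot) and the two-line corpus are hypothetical. Tokens are assigned here in order of first appearance, which may differ from the numbering a library tokenizer produces, and total_words reserves one extra slot for the padding value 0, which is an assumption.

```python
# Pure-Python sketch of steps (a)-(f); all names are illustrative assumptions.

def build_word_index(lines):
    """(b) Map each distinct word to an integer token, starting at 1 (0 is kept for padding)."""
    word_index = {}
    for line in lines:
        for word in line.split():
            if word not in word_index:
                word_index[word] = len(word_index) + 1
    return word_index

def ngram_sequences(lines, word_index):
    """(c)-(d) For every line, emit each prefix of two or more tokens."""
    sequences = []
    for line in lines:
        tokens = [word_index[w] for w in line.split()]
        for end in range(2, len(tokens) + 1):
            sequences.append(tokens[:end])
    return sequences

def pre_pad(sequences):
    """(d) Left-pad every sequence with zeros to the length of the longest one."""
    max_len = max(len(s) for s in sequences)
    return [[0] * (max_len - len(s)) + s for s in sequences]

def one_hot(label, num_classes):
    """(f) Encode an integer label as a one-hot vector."""
    vec = [0] * num_classes
    vec[label] = 1
    return vec

corpus = "Frozen grass crunched beneath the steps\nThe steps were frozen"
lines = corpus.lower().split("\n")            # (a) lowercase, split line-wise
word_index = build_word_index(lines)
total_words = len(word_index) + 1             # assumption: +1 for padding token 0
padded = pre_pad(ngram_sequences(lines, word_index))
xs = [seq[:-1] for seq in padded]             # (e) all tokens but the last
labels = [seq[-1] for seq in padded]          # (e) the last token is the label
ys = [one_hot(label, total_words) for label in labels]
```

Because the padding is on the left, the label always sits in the final position, which is why slicing with seq[-1] is sufficient.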
3.1.4 Step 4: Model Building
The model takes 24 input variables. Its first layer is an embedding layer with 300 neurons; the second is an LSTM layer with 400 neurons, followed by a dropout layer with a rate of 0.2. The third is an LSTM layer with 100 neurons, followed by two dense layers with the activation functions 'relu' and 'softmax,' respectively. The main advantage of ReLU over other activation functions is that it does not activate all the neurons simultaneously; hence, it is computationally more efficient than the 'sigmoid' and 'tanh' functions. The final dense layer, with 2000 classes, is followed by the compilation step. The model, with a reported total of 6000 trainable parameters, was compiled using the 'adam' optimizer and 'categorical crossentropy' as the loss function. The layers of the network architecture can be seen in Fig. 4. The first dimension in a Keras model is always the batch size, so a shape of (?, x) denotes the batch size dimension (None) and the input shape, respectively. Finally, the model is trained for 200 epochs, by which point it converges, and is then saved.
3.2 Module 2: Testing
3.2.1 Step 1: Take Input
The user gives two inputs. The first is the input seed, which is the start phrase of the desired plot and helps the system predict the following words. The second is the word limit, which specifies how many words should be generated.
Fig. 4 Flowchart of the training module
3.2.2 Step 2: Preprocessing
The seed input is tokenized using the tokenizer's texts_to_sequences method, and the resulting sequence is padded so that it matches the length of the sequences in the training set.
3.2.3 Step 3: Prediction
The processed input seed is passed to the model to obtain a prediction, which gives the token of the word most likely to come next in the sequence. A reverse lookup on the word index items then turns the token back into a word, which is appended to the input seed text. This step is repeated in a loop until the word limit is reached. The final output is the chained story.
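The prediction loop described above can be sketched as follows. This is a hedged illustration rather than the authors' implementation: generate, toy_predict_next and the six-word vocabulary are hypothetical, and the trained network is replaced by a stand-in function that simply returns the token after the last one, so that the example runs without any deep learning library.

```python
def generate(seed_text, word_limit, word_index, predict_next, max_len):
    """Append one predicted word per iteration until word_limit words are added."""
    index_word = {token: word for word, token in word_index.items()}  # reverse lookup
    text = seed_text.lower()
    for _ in range(word_limit):
        # Tokenize the current text, dropping words outside the vocabulary.
        tokens = [word_index[w] for w in text.split() if w in word_index]
        tokens = tokens[-max_len:]                       # keep the most recent tokens
        padded = [0] * (max_len - len(tokens)) + tokens  # pre-pad as in training
        next_token = predict_next(padded)
        word = index_word.get(next_token)
        if word is None:                                 # unknown token: stop early
            break
        text += " " + word
    return text

# Stand-in for the trained model (assumption): predicts the token that follows
# the last input token in vocabulary order, wrapping around at the end.
word_index = {"once": 1, "upon": 2, "a": 3, "time": 4, "there": 5, "lived": 6}

def toy_predict_next(padded):
    return padded[-1] % len(word_index) + 1

story = generate("Once upon", 4, word_index, toy_predict_next, max_len=5)
print(story)
```

With a real model, predict_next would wrap the model's prediction call and return the index of the highest-probability class.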
3.2.4 Step 4: Segmentation
The output generated by the prediction step lacks sentence structure and proper punctuation. Segmentation breaks the chained story into segments that form plausible sentences, which helps in adding punctuation in the next stage. For this purpose, a library called 'DeepSegment' is used.
3.2.5 Step 5: Punctuation
The sentences are then fed to the punctuator, which clarifies the meaning of the segmented text and yields a grammatically sound story. This function is performed by a library called 'fastPunct.' The flowchart of the testing module is shown in Fig. 5.
Fig. 5 Flow chart of the testing module
4 Results
After model training, the evaluation metrics are plotted. The curves obtained for the trained bidirectional LSTM model are compared with those of two other models trained on the same dataset, namely an LSTM model and a gated recurrent unit (GRU) model. The training accuracy and loss obtained for the three models are depicted in Fig. 6. The bidirectional model obtained a high accuracy and reached the lowest loss in the fewest epochs. It gave a training accuracy of 96.3%, and the training loss obtained was 0.27. However, the bidirectional LSTM model and the GRU model differed very little in performance. Therefore, other metrics were used, along with a comparison of the generated plots, to ascertain the efficiency of the model. The other metrics used for evaluation are:
1. Precision
2. Recall
3. F1-Score
These three metrics are evaluated using BERTScore, an automatic evaluation metric for text generation [19]. It works by computing the similarity
Fig. 6 Training accuracy and Training loss in LSTM, Bi-LSTM, and GRU model
Table 2 Generated result for same input by LSTM, Bi-LSTM, and GRU (for each input seed, the three outputs are listed in the order LSTM, Bidirectional LSTM, GRU)
Input seed: My love; word limit: 75
My love on by his farmer into the wall and care with gold out hearing me to boast the other shoe quickly. I darkness a quick sheriff! It she itself not to wondered many move not to nails. He what I’m now as alone way. The slipper stood almost made them it. He could a the day, not thinking swim, but the tortoise were carefully served as her long honesty. The water over of ou!
My love was dropped! The leaf for the plants, the Rose slowly began to wilt. The rose saw a sparrow dip its beak into the cactus for some water. The rose then felt ashamed for having made fun of the cactus all this time, but because it was in need of water, it went to ask the cactus if it could have some water the kind cactus agreed and they both got through summer as friends they
My love of talked pass by passed, approached and here to October for thirty. She ago, least had the could and it hand already. It would their service that her pale for Lord The occurred and the companion curiosity too high May. What the especially terror by back were fellow. She low tone over steamers remained Phileas from all in how spite had of longer and time. Some future at point and of Mountain’s live
Input seed: Once upon a time; word limit: 75
Once up on a time, the frog hop around and returns in lived the lion, but sees the hen’s daughter to was he built him the leaf to the cactus. He visit her lots with, did not play as be the mouse to his iron girl. Leg and his nearby warns vexed, he or nails with top of deeper, and we reward the keg pacing erratically and meticulously inspecting every facet of detail He hadn’t eaten for a moment
Once up on a time, there lived a lion in a forest one day after a heavy meal. It was sleeping under a tree after a while. There came a mouse and it started to play on the Lion. Suddenly the Lion got up with anger and looked for those who disturbed. It’s nice sleep. Then it saw a small mouse standing trembling with fear the Lion jumped on it and started to kill it. The mouse requested the Lion
Once upon a time of met want Land East beautiful was of coast round a realized. He hid the smiles from, said Hovel This distance them, said God to reach Elizabeth Time. Francis had presence, see Fogg Exhausted, even deeply at the rendered Your and Change Day, be Liverpool Eyes. The dead by ceased look words, those as to reflect of whist a swift manner. Am houses midnight this white! I must down that san a up me returned
Table 3 Precision comparison of LSTM, Bi-LSTM and GRU

| LSTM | Bi-LSTM | GRU |
| --- | --- | --- |
| 0.901 | 0.969 | 0.925 |
| 0.929 | 0.962 | 0.898 |
| 0.913 | 0.976 | 0.904 |

Table 4 Recall comparison of LSTM, Bi-LSTM and GRU

| LSTM | Bi-LSTM | GRU |
| --- | --- | --- |
| 0.894 | 0.972 | 0.93 |
| 0.939 | 0.954 | 0.907 |
| 0.904 | 0.918 | 0.901 |

Table 5 F1-score comparison of LSTM, Bi-LSTM and GRU

| LSTM | Bi-LSTM | GRU |
| --- | --- | --- |
| 0.897 | 0.970 | 0.927 |
| 0.934 | 0.970 | 0.903 |
| 0.915 | 0.970 | 0.904 |
score for each token in the predicted text and the reference text. Precision, recall, and F1-score are calculated for each sentence in the predicted plot, and these scores are then averaged to obtain the final mean precision, recall, and F1-score. Sample inputs and outputs for different seed inputs are shown in Table 2, and their mean scores for all three models are reported in Tables 3, 4 and 5. As seen from these tables, the stories generated by the bidirectional LSTM model made more sense and also showed higher precision, recall, and F1-score. Therefore, it can be stated that the bidirectional LSTM model is the most effective of the three at generating plots.
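The per-sentence scoring and averaging described above can be illustrated with a simplified greedy-matching sketch. It is not the BERTScore library and makes several assumptions: the token vectors are hand-made two-dimensional toys instead of contextual BERT embeddings, no importance weighting or baseline rescaling is applied, and the function names are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def greedy_match_scores(candidate_vecs, reference_vecs):
    """Precision, recall and F1 from best-match token similarities."""
    # Precision: each candidate token is matched to its most similar reference token.
    precision = sum(max(cosine(c, r) for r in reference_vecs)
                    for c in candidate_vecs) / len(candidate_vecs)
    # Recall: each reference token is matched to its most similar candidate token.
    recall = sum(max(cosine(r, c) for c in candidate_vecs)
                 for r in reference_vecs) / len(reference_vecs)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

def mean_scores(per_sentence):
    """Average (precision, recall, F1) triples over all sentences."""
    n = len(per_sentence)
    return tuple(sum(s[i] for s in per_sentence) / n for i in range(3))

# Two toy "sentences": (candidate token vectors, reference token vectors).
sentence_1 = ([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0]])
sentence_2 = ([[1.0, 0.0]], [[1.0, 0.0]])
scores = [greedy_match_scores(c, r) for c, r in (sentence_1, sentence_2)]
mean_p, mean_r, mean_f1 = mean_scores(scores)
```

In the first toy sentence one of the two candidate tokens has no similar reference token, so precision drops to 0.5 while recall stays at 1.0, which shows how the two scores capture different kinds of mismatch.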
5 Conclusion
The story-plot recommender model was built using natural language processing and deep learning methods. The bidirectional LSTM proved to remember the connections between words efficiently, which helps it predict better plots than traditional LSTM or GRU networks. With the training accuracy of 96.3% achieved, the model can produce meaningful results from the given seed words.
Future work could add more data to the corpus and train the model for more epochs. Furthermore, separate models can be built for different genres to provide genre-specific plot recommendations.
References
1. P. Kowalczyk, Which countries publish the most books? (infographic). Ebook Friendly (2020, August 28). https://ebookfriendly.com/countries-publish-most-books-infographic
2. Wikipedia contributors, Story generator, in Wikipedia, The Free Encyclopedia. Retrieved 09:56, September 22, 2020 (2020, January 18). https://en.wikipedia.org/w/index.php?title=Story_generators
3. Wikipedia contributors, Long short-term memory, in Wikipedia, The Free Encyclopedia. Retrieved 10:02, September 22, 2020 (2020, September 15). https://en.wikipedia.org/w/index.php?title=Long_short-term_memory
4. S. Hochreiter, J. Schmidhuber, Long short-term memory. Neural Comput. 9(8), 1735–1780 (1997). https://doi.org/10.1162/neco.1997.9.8.1735
5. Y. Zhao, L. Liu, C. Liu, R. Yang, D. Yu, From plots to endings: a reinforced pointer generator for story ending generation, in Natural Language Processing and Chinese Computing. NLPCC, ed. by M. Zhang, V. Ng, D. Zhao, S. Li, H. Zan. Lecture Notes in Computer Science, vol. 11108 (Springer, Cham, 2018)
6. A. J., G. V. U., An intelligent automatic story generation system by revising Proppian's system, in Advances in Computer Science and Information Technology, ed. by N. Meghanathan, B.K. Kaushik, D. Nagamalai, CCSIT 2011. Communications in Computer and Information Science, vol. 131 (Springer, Berlin, 2011). https://doi.org/10.1007/978-3-642-17857-3_59
7. A. Ashida, T. Kojiri, Plot-creation support with plot-construction model for writing novels. J. Inf. Telecommun. 3(1), 57–73 (2019). https://doi.org/10.1080/24751839.2018.1531232
8. Y. Shim, M. Kim, Automatic short story generator based on autonomous agents, in Proceedings of the 5th Pacific Rim International Workshop on Multi Agents: Intelligent Agents and Multi-Agent Systems (Springer, Berlin, 2002), pp. 151–162
9. R.S. Colon, P.K. Patra, K.M. Elleithy, Random word retrieval for automatic story generation (Bridgeport, CT, 2014), pp. 1–6. https://doi.org/10.1109/ASEEZone1.2014.6820655
10. A. Fan, M. Lewis, Y. Dauphin, Hierarchical Neural Story Generation, pp. 889–898 (2018)
11. Story-morphing in the affective reasoning paradigm: generating stories semi-automatically for use with "Emotionally Intelligent" Multimedia Agents (1998). https://doi.org/10.1145/280765.280799
12. N. McIntyre, M. Lapata, Learning to Tell Tales: A Data-driven Approach to Story Generation, pp. 217–225
13. M.O. Riedl, Toward Vignette-Based Story Generation for Drama Management Systems (2007)
14. F. Charles, S.J. Mead, M. Cavazza, Character-driven story generation in interactive storytelling (Berkeley, CA, USA, 2001), pp. 609–615. https://doi.org/10.1109/VSMM.2001.969719
15. P. Ammanabrolu, E. Tien, W. Cheung, Z. Luo, W. Ma, L.J. Martin, M.O. Riedl, Story Realization: Expanding Plot Events into Sentences (n.d.). arXiv:1909.03480
16. E. Concepción, P. Gervás, G. Méndez, Evolving the INES Story Generation System: From Single to Multiple Plot Lines (2019)
17. T. Wang, X. Wan, T-CVAE: transformer-based conditioned variational autoencoder for story completion (n.d.), in Welcome to IJCAI. https://www.ijcai.org/Proceedings/2019/727
18. P. Ammanabrolu, W. Cheung, W. Broniec, M.O. Riedl, Automated Storytelling via Causal, Commonsense Plot Ordering (2020, September 2). arXiv:2009.00829
19. T. Zhang, V. Kishore, F. Wu, K.Q. Weinberger, Y. Artzi, BERTScore: Evaluating Text Generation with BERT (2019)
Chapter 4
A Novel Technique for Fake Signature Detection Using Two-Tiered Transfer Learning
Yohan Varghese Kuriakose, Vardan Agarwal, Rahul Dixit, and Anuja Dixit
1 Introduction
The handwritten signature remains one of the most important ways to authenticate a person's identity for administrative, legal and other purposes. Signature forgery is the process of falsely replicating another person's signature, which can lead to financial fraud. Current methods of signature fraud detection compare the test signature with corresponding sample signatures that have already been verified. How these signatures were originally acquired determines the verification method, which can be offline or online. Online signature verification uses attributes such as pressure and inclination, acquired by an input device, to check whether a signature is genuine. These attributes help detect fake signatures much better than offline methods, in which only scanned images of the signature are available to check whether the test signature is genuine. However, the need for these devices makes the online method unsuitable for practical uses. Forged signatures are generally classified into three categories, namely random, unskilled and skilled. A random forgery is the easiest to detect and is produced when the forger does not know what the signature being forged looks like. An unskilled forgery happens when the forger has only a rough idea of how the signature looks. The toughest to detect is a skilled signature fraud, where the forger has a copy
First two authors are lead authors and have contributed equally.
Y. V. Kuriakose · V. Agarwal, Department of Computer Science and Engineering, Manipal University Jaipur, Jaipur, India
R. Dixit (B) · A. Dixit, Department of Computer Science and Engineering, Indian Institute of Technology Dhanbad, Dhanbad, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022. R. Tiwari et al. (eds.), Proceedings of International Conference on Computational Intelligence, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-16-3802-2_4
of the original signature and can practise before attempting a forgery. This paper deals with skilled signature fraud detection. We discuss a new method for offline signature verification that combines classification models and neural networks (transfer learning). The dataset contains genuine and skillfully forged signature images, which are first fed into VGG-16 [24], a convolutional neural network, for feature extraction up to the flattening layer. The resulting sequence of features is fed into different classifiers for classification as genuine or forged. For the end result, the mean of all classifiers is taken to classify an image as genuine or forged.
The rest of the paper is organized as follows. A brief review of the state-of-the-art techniques is given in Sect. 2. In Sect. 3, we discuss our proposed method in detail. The experimental results are presented in Sect. 4 and the conclusion in Sect. 5.
2 Literature Review
In recent years, many researchers have proposed techniques to counter offline signature forgery. Bhattacharya et al. [4] proposed a forgery detection technique using pixel matching. It is a computationally inexpensive solution in which every pixel of the test image is matched with the original one after pre-processing and rotation. The number of matched pixels is used to obtain a percentage. If the percentage exceeds a certain threshold, the signature is classified as real; otherwise, it is considered forged. Hanmandlu et al. [14] proposed a method using angle features extracted with the box approach, where the features correspond to a fuzzy set and are modelled on the Takagi–Sugeno (TS) model. Suryani et al. [25] performed image pre-processing and data normalization and applied an efficient fuzzy Kohonen clustering network algorithm. Their system consisted of five steps and showed much better results than the previous research. Ooi et al. [18] suggested a technique using the Radon transform, principal component analysis and probabilistic neural networks, and tested it on their own independent signature database and on a public database. Ghanim et al. [11] computed various features such as run-length distributions, slant distributions, entropy, histogram of gradient features (HoG) and geometric features and passed them to machine learning algorithms such as bagging tree, random forest and SVM. They obtained their best results using SVM with HoG features. Gideon et al. [12] fed the images directly into a convolutional neural network after converting each image to grayscale, removing noise and converting it to a binary image. Karouni et al. [15] applied pre-processing techniques to remove noise from an image and then extracted features such as area, centre of gravity, eccentricity, kurtosis and skewness, which were given as input to an artificial neural network to classify the signature as real or forged. Hafemann et al. [13] proposed a writer-independent method in which convolutional neural networks are trained to capture visual cues and learn features of skilled forgeries, and can then be used to distinguish between real and forged signatures. Das et al. [7] also proposed a writer-independent method, using ensemble learning. They used two convolutional neural networks as feature extractors, each followed by an individual XGBoost classifier. Their probabilities were combined to obtain the final prediction. Poddar et al. [21] use the Harris algorithm and the SURF algorithm to detect whether an image is forged, after using a CNN and the crest–trough method to detect a signature.
2.1 Our Contributions
A review of state-of-the-art fake signature detection shows that online methods give better results than offline ones because they use additional hardware for detection. This extra hardware is also the main drawback of online methods. We address this problem with a new and improved offline technique, which uses a CNN as a feature extractor together with a combination of four classifiers. Another problem we observed is signature variation. Health, age and emotional state all contribute to variation in a person's signature, which can increase false positives or false negatives. We address this problem by training the model on multiple signatures of each person, which helps prevent overfitting and reduces false positives and negatives. Due to the human factor involved in obtaining signature images, there can be changes in aspect ratio or a slight rotation in the scanned image. This may lead to incorrect values of features such as the angle of signing and the height-to-width ratio, which may in turn result in incorrect predictions. Our model does not depend heavily on these features, so such changes do not affect its accuracy.
3 Proposed Method
In this section, we present an algorithm for fake signature detection based on VGG-16 and an artificial neural network. The proposed method is divided into the following broad steps, which are described in detail below:
(a) Pre-processing;
(b) Feature extraction using VGG-16;
(c) Classification models for training and prediction of features;
(d) Second phase of prediction using an artificial neural network.
Figure 1 shows the operational flowchart of the proposed technique. The subsections below describe each step of the algorithm.
3.1 Pre-processing
In this step, the data is split into positive and negative sets, each of which is further divided into training and testing sets.
Fig. 1 Operational pipeline of proposed algorithm
3.2 CNN as Feature Extractor for Transfer Learning
A convolutional neural network (CNN) is a deep learning algorithm that captures the spatial dependencies of an image, which makes it well suited to image-based prediction and to feature extraction. Transfer learning is a deep learning method in which a model pre-trained on a large dataset such as ImageNet [8] is reused as the starting point for a model on another task. This technique reduces the computing power and time required to train neural networks and can also improve performance. Several pre-trained models exist, such as VGG-16 [24], ResNet [2], Inception V3 ResNet [26] and Xception [5].
The data from the pre-processing step is fed into a CNN architecture, VGG-16, which was introduced by Simonyan et al. [24]. This model uses multiple filters with a 3 × 3 kernel size in its convolution layers and was thereby an improvement over AlexNet [17]. The model consists of 16 weight layers. The input to the first layer has a fixed size of 256 × 256, and the convolution stride is fixed to one pixel. The model contains 5 max-pooling layers for spatial pooling, with size 2 × 2 and a stride of 2 pixels. VGG-16 also has three fully connected layers, but they are not used in this method because VGG-16 serves only as a feature extractor. The data enters the first layer and passes through several convolution and max-pooling layers. It is extracted from the 19th layer, that is, just before the dense (fully connected) layers. The extracted data is visualized with principal component analysis (PCA) [29], Isomap [28] and local linear embedding (LLE) [30] using Yellowbrick [3], as shown in Fig. 2.
Fig. 2 Visualizing data after passing through VGG-16 using (1) PCA, (2) LLE and (3) Isomap of three different signatures (a–c)
3.3 Classification Models
The data acquired from the previous step is fed into different classification models for training and prediction. Many models were tested, of which four were selected for their accuracy: random forest, K-nearest neighbours (KNN), extra trees and SVM.
KNN [27] is used for classification and regression problems. As the name suggests, the output for a prediction depends on its k nearest neighbours, with distance calculated using the Euclidean distance formula.
Random forest is an ensemble of decision trees. Each tree predicts the class of a test object, and the majority vote of the trees is taken as the final predicted class of that object.
Extra trees [10] are similar to random forest in that both use decision trees for learning. Instead of computing the locally optimal feature/split combination, as random forest does, extra trees select a random split value for each feature under consideration. This yields more diversified trees that are less correlated with each other.
SVM [6] is used for supervised learning and finds an optimal hyperplane in multidimensional space between the two classes. It was developed by Vapnik and can be used for both classification and regression.
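As a small illustration of the KNN rule described above (Euclidean distance followed by a majority vote among the k nearest neighbours), the following sketch uses plain Python. The function names and the two-dimensional toy points are hypothetical; in the chapter the inputs are the much higher-dimensional feature vectors produced by VGG-16, and a library implementation would normally be used.

```python
import math
from collections import Counter

def euclidean(a, b):
    """Euclidean distance between two equal-length points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_predict(train_x, train_y, query, k):
    """Return the majority label among the k training points closest to query."""
    ranked = sorted(zip(train_x, train_y), key=lambda pair: euclidean(pair[0], query))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

# Toy feature vectors (assumption): two well-separated clusters.
train_x = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (5.0, 5.0), (5.0, 6.0), (6.0, 5.0)]
train_y = ["genuine", "genuine", "genuine", "forged", "forged", "forged"]

pred_a = knn_predict(train_x, train_y, (0.5, 0.5), k=3)
pred_b = knn_predict(train_x, train_y, (5.2, 5.1), k=3)
```

The choice of k trades noise sensitivity (small k) against blurring of the class boundary (large k), which is why k is the parameter varied in the experiments.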
Fig. 3 Visualizing the results of different ML models using three signatures (a–c) with (1) KNN, (2) Extra Trees, (3) SVM and (4) random forest
For random forest, the data was trained and tested with different n-estimator values; for KNN, the varying parameter was k; for extra trees, as for random forest, the number of estimators was varied; and for SVM, different kernels were used. Their predictions on the data obtained from VGG-16 are plotted in Fig. 3 using Mlxtend [22]. As the input data had more than three dimensions, it was reduced to two dimensions with PCA in order to plot the graphs.
3.4 Artificial Neural Network
Artificial neural networks (ANNs) [9] are inspired by the human brain. An ANN generally consists of neurons and an activation function. It learns by gradient descent and backpropagation. Gradient descent updates the parameters to find the minimum of the loss function, and backpropagation computes the gradients that gradient descent uses to update the weights and improve the network's accuracy. The outputs from all four models are fed into an ANN for the final prediction. The ANN used is shallow because the data is simple. It contains four dense layers: the first has six units, the second twelve units, the third six units and the fourth a single unit for the final prediction. Each dense layer uses ReLU as its activation function, except the last, which uses the sigmoid function. The loss function is binary cross-entropy, and the optimizer is Adam. Binary cross-entropy is calculated as shown in Eq. 1.

\[
\text{loss} =
\begin{cases}
-\log(y_{\text{Pred}}), & y_{\text{Actual}} = 1 \\
-\log(1 - y_{\text{Pred}}), & y_{\text{Actual}} = 0
\end{cases}
\tag{1}
\]

where yPred is the predicted value, yActual is the actual value and loss is the calculated binary cross-entropy loss. Adam [16] differs from other stochastic gradient descent algorithms [23] in how it handles the learning rate: it maintains a learning rate for each network weight. Adam also combines the advantages of AdaGrad and RMSProp, in that it uses averages of both the first and second moments of the gradients.
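A forward pass through a network of the shape described above, together with the binary cross-entropy loss, can be sketched in plain Python. This is an illustration under stated assumptions, not the trained model: the input size of four (one value per first-tier classifier), the random weight initialization and all function names are hypothetical, and no training step is shown.

```python
import math
import random

def relu(x):
    return max(0.0, x)

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def dense(inputs, weights, biases, activation):
    """One fully connected layer: activation(W x + b)."""
    return [activation(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]

def init_layer(n_in, n_out, rng):
    """Small random weights and zero biases (illustrative initialization)."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out

def forward(x, layers):
    """ReLU on every layer except the last, which uses the sigmoid."""
    *hidden, last = layers
    for weights, biases in hidden:
        x = dense(x, weights, biases, relu)
    weights, biases = last
    return dense(x, weights, biases, sigmoid)[0]

def binary_cross_entropy(y_actual, y_pred, eps=1e-12):
    """-log(y_pred) if the label is 1, -log(1 - y_pred) if it is 0."""
    y_pred = min(max(y_pred, eps), 1.0 - eps)   # avoid log(0)
    return -math.log(y_pred) if y_actual == 1 else -math.log(1.0 - y_pred)

rng = random.Random(0)
sizes = [4, 6, 12, 6, 1]                        # input of 4 is an assumption
layers = [init_layer(a, b, rng) for a, b in zip(sizes, sizes[1:])]

y_pred = forward([1.0, 1.0, 0.0, 1.0], layers)  # votes of four classifiers
loss = binary_cross_entropy(1, y_pred)
```

The loss is small when the predicted probability is close to the true label and grows without bound as the prediction approaches the wrong extreme, which is what drives the weight updates during training.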
4 Experimental Results and Discussion
In this section, we present our experimental results on the performance of the proposed technique, as well as its comparison with the state of the art. The proposed algorithm has been implemented in Python 3.6 with the help of Scikit-learn [20] and TensorFlow [1]. The dataset used to test our model is the MCYT signature dataset [19], which contains the signatures of 100 different people. Each signature class has 50 signatures, 25 of them positive and 25 of them negative. We evaluate the model with the confusion matrix score (cmScore), precision score (precScore), recall score (reScore), F1-score (f1Score) and average precision score (AP) [20], defined as:

\[ \text{cmScore} = \frac{T_p + T_n}{T_p + T_n + F_p + F_n} \times 100\% \tag{2} \]

\[ \text{precScore} = \frac{T_p}{T_p + F_p} \times 100\% \tag{3} \]

\[ \text{reScore} = \frac{T_p}{T_p + F_n} \times 100\% \tag{4} \]

\[ \text{f1Score} = 2 \cdot \frac{\text{precScore} \cdot \text{reScore}}{\text{precScore} + \text{reScore}} \times 100\% \tag{5} \]

\[ AP = \sum_{n=1}^{N} \left(\text{reScore}_n - \text{reScore}_{n-1}\right) \text{precScore}_n \tag{6} \]
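Equations 2 to 6 can be checked numerically with a short sketch. The confusion-matrix counts and the recall/precision pairs below are hypothetical values chosen only to make the arithmetic easy to follow, and the function names are illustrative. Scores are returned as fractions; multiplying by 100 gives the percentages used in the equations.

```python
def classification_scores(tp, tn, fp, fn):
    """Return (cmScore, precScore, reScore, f1Score) as fractions in [0, 1]."""
    cm_score = (tp + tn) / (tp + tn + fp + fn)
    prec_score = tp / (tp + fp)
    re_score = tp / (tp + fn)
    f1_score = 2 * prec_score * re_score / (prec_score + re_score)
    return cm_score, prec_score, re_score, f1_score

def average_precision(recalls, precisions):
    """Sum of (recall_n - recall_{n-1}) * precision_n, with recall_0 taken as 0."""
    ap = 0.0
    previous_recall = 0.0
    for recall, precision in zip(recalls, precisions):
        ap += (recall - previous_recall) * precision
        previous_recall = recall
    return ap

# Hypothetical counts: 90 true positives, 85 true negatives,
# 15 false positives and 10 false negatives.
cm, prec, rec, f1 = classification_scores(tp=90, tn=85, fp=15, fn=10)

# Hypothetical recall/precision values at two thresholds.
ap = average_precision(recalls=[0.5, 1.0], precisions=[1.0, 0.5])
```

For these counts the accuracy-style score is 175/200 = 0.875, precision is 90/105, recall is 0.9, and the average precision of the two-threshold example is 0.5 × 1.0 + 0.5 × 0.5 = 0.75.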
Table 1 Performance analysis using Extra Trees algorithm

| No. of estimators | Confusion matrix | Precision score | Recall score | F1 score | Average precision score | Balanced accuracy score | Brier score loss | Fbeta score | Hamming loss | Zero one loss | Area under the receiver operating characteristic curve |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 200 | 0.943 | 0.9251 | 0.964 | 0.9441 | 0.9098 | 0.943 | 0.057 | 0.9441 | 0.057 | 0.057 | 0.9903 |
| 400 | 0.946 | 0.9305 | 0.964 | 0.9469 | 0.9150 | 0.946 | 0.054 | 0.9469 | 0.054 | 0.054 | 0.9905 |
| 600 | 0.948 | 0.9341 | 0.964 | 0.9488 | 0.9184 | 0.948 | 0.052 | 0.9488 | 0.052 | 0.052 | 0.9910 |
| 800 | 0.951 | 0.9328 | 0.972 | 0.9520 | 0.9207 | 0.951 | 0.049 | 0.9520 | 0.049 | 0.049 | 0.9912 |
| 1000 | 0.943 | 0.9219 | 0.968 | 0.9443 | 0.9084 | 0.943 | 0.057 | 0.9443 | 0.057 | 0.057 | 0.9913 |
| 1200 | 0.948 | 0.9274 | 0.972 | 0.9492 | 0.9155 | 0.948 | 0.052 | 0.9492 | 0.052 | 0.052 | 0.9911 |
| 1400 | 0.951 | 0.9344 | 0.970 | 0.9519 | 0.9214 | 0.951 | 0.049 | 0.9519 | 0.049 | 0.049 | 0.9909 |
| 1600 | 0.942 | 0.9201 | 0.968 | 0.9434 | 0.9067 | 0.942 | 0.058 | 0.9434 | 0.058 | 0.058 | 0.9907 |
| 1800 | 0.948 | 0.9291 | 0.97 | 0.9491 | 0.9162 | 0.948 | 0.052 | 0.9491 | 0.052 | 0.052 | 0.9909 |
| 2000 | 0.952 | 0.9362 | 0.97 | 0.9528 | 0.9232 | 0.952 | 0.048 | 0.9528 | 0.048 | 0.048 | 0.9918 |
| 2200 | 0.951 | 0.9344 | 0.97 | 0.9519 | 0.9214 | 0.951 | 0.049 | 0.9519 | 0.049 | 0.049 | 0.9916 |
| 2400 | 0.948 | 0.9291 | 0.97 | 0.9491 | 0.9162 | 0.948 | 0.052 | 0.9491 | 0.052 | 0.052 | 0.9912 |
| 2600 | 0.95 | 0.9310 | 0.972 | 0.9510 | 0.9189 | 0.95 | 0.05 | 0.9510 | 0.05 | 0.05 | 0.9909 |
| 2800 | 0.947 | 0.9289 | 0.968 | 0.9480 | 0.9152 | 0.947 | 0.053 | 0.9480 | 0.053 | 0.053 | 0.9908 |
| 3000 | 0.946 | 0.9255 | 0.97 | 0.9472 | 0.9128 | 0.946 | 0.054 | 0.9472 | 0.054 | 0.054 | 0.9911 |
Fig. 4 Extra tree classifier graphs
where Tp, Tn, Fp, Fn and n are true positive, true negative, false positive, false negative and nth threshold, respectively.
Table 1 shows the performance of the extra trees algorithm for the listed numbers of estimators. Extra trees work well with the data from the CNN, showing high accuracy throughout and very low losses (Hamming loss, zero one loss and Brier score loss). Figure 4 plots the predictions at each estimator value.
Table 2 shows the performance of KNN with all signature samples. For k equal to two, the accuracy and loss have good values, which means that the KNN algorithm works well for the dataset with two neighbours. Figure 5 shows the performance for all k values while using KNN.
Table 3 shows that random forest performs well with our data. This is due to the low correlation between the trees: as all trees are independent of each other, a mistake by one does not affect the others. Low correlation is achieved through bagging, in which each tree randomly samples from the dataset, giving different trees. In addition, a particular tree is not trained on all features but only on a randomly selected subset of them. As the dataset has a large number of features, this results in diverse trees with little correlation. The graphs in Fig. 6 show the performance of random forest with different estimator values.
Table 4 shows the overall result with different metrics. As is evident in Table 4, the proposed model performs reasonably well, with an accuracy of 97.3%.
Y. V. Kuriakose et al.
Table 2 Performance analysis of KNN technique
| K value | Confusion matrix | Precision score | Recall score | F1 score | Average precision score | Balanced accuracy score | Brier score loss | Fbeta score | Hamming loss | Zero one loss | Area under the receiver operating characteristic curve |
| 1 | 0.9110 | 0.8676 | 0.9700 | 0.9160 | 0.8566 | 0.9110 | 0.0890 | 0.9160 | 0.0890 | 0.0890 | 0.9110 |
| 2 | 0.9200 | 0.9200 | 0.9200 | 0.9200 | 0.8864 | 0.9200 | 0.0800 | 0.9200 | 0.0800 | 0.0800 | 0.9409 |
| 3 | 0.9010 | 0.8511 | 0.9720 | 0.9076 | 0.8413 | 0.9010 | 0.0990 | 0.9076 | 0.0990 | 0.0990 | 0.9611 |
| 4 | 0.9190 | 0.8916 | 0.9540 | 0.9217 | 0.8736 | 0.9190 | 0.0810 | 0.9217 | 0.0810 | 0.0810 | 0.9690 |
| 5 | 0.8920 | 0.8345 | 0.9780 | 0.9006 | 0.8271 | 0.8920 | 0.1080 | 0.9006 | 0.1080 | 0.1080 | 0.9689 |
| 6 | 0.9040 | 0.8620 | 0.9620 | 0.9093 | 0.8483 | 0.9040 | 0.0960 | 0.9093 | 0.0960 | 0.0960 | 0.9709 |
| 7 | 0.8570 | 0.7838 | 0.9860 | 0.8733 | 0.7798 | 0.8570 | 0.1430 | 0.8733 | 0.1430 | 0.1430 | 0.9696 |
| 8 | 0.8740 | 0.8138 | 0.9700 | 0.8850 | 0.8043 | 0.8740 | 0.1260 | 0.8850 | 0.1260 | 0.1260 | 0.9664 |
| 9 | 0.8290 | 0.7481 | 0.9920 | 0.8530 | 0.7461 | 0.8290 | 0.1710 | 0.8530 | 0.1710 | 0.1710 | 0.9681 |
| 10 | 0.8530 | 0.7833 | 0.9760 | 0.8691 | 0.7765 | 0.8530 | 0.1470 | 0.8691 | 0.1470 | 0.1470 | 0.9649 |
| 11 | 0.8070 | 0.7254 | 0.9880 | 0.8366 | 0.7227 | 0.8070 | 0.1930 | 0.8366 | 0.1930 | 0.1930 | 0.9626 |
| 12 | 0.8370 | 0.7645 | 0.9740 | 0.8566 | 0.7576 | 0.8370 | 0.1630 | 0.8566 | 0.1630 | 0.1630 | 0.9607 |
| 13 | 0.7970 | 0.7155 | 0.9860 | 0.8293 | 0.7125 | 0.7970 | 0.2030 | 0.8293 | 0.2030 | 0.2030 | 0.9604 |
| 14 | 0.8240 | 0.7500 | 0.9720 | 0.8467 | 0.7430 | 0.8240 | 0.1760 | 0.8467 | 0.1760 | 0.1760 | 0.9588 |
| 15 | 0.7820 | 0.7009 | 0.9840 | 0.8186 | 0.6976 | 0.7820 | 0.2180 | 0.8186 | 0.2180 | 0.2180 | 0.9565 |
Fig. 5 KNN classifier graphs
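To make the role of k in Table 2 concrete, the following is a minimal sketch of k-nearest-neighbour voting in plain Python. It is an illustration only, not the implementation used in the paper: the function name `knn_predict`, the two-dimensional toy points and the tie-breaking rule are all assumptions, and real feature vectors from the CNN would be far higher-dimensional.

```python
import math
from collections import Counter


def knn_predict(train_x, train_y, query, k):
    """Classify `query` by majority vote among its k nearest training points."""
    # Rank training points by Euclidean distance to the query.
    order = sorted(range(len(train_x)),
                   key=lambda i: math.dist(train_x[i], query))
    nearest = [train_y[i] for i in order[:k]]
    votes = Counter(nearest)
    top = max(votes.values())
    # Tie-break (relevant for even k such as k = 2): among the tied labels,
    # the one belonging to the closest neighbour wins. This rule is an
    # assumption of the sketch.
    for label in nearest:
        if votes[label] == top:
            return label


# Hypothetical toy data: two well-separated clusters.
points = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
labels = ["forged", "forged", "forged", "genuine", "genuine", "genuine"]
print(knn_predict(points, labels, (0.5, 0.5), k=3))
```

With an even k, ties between the two classes are possible, so the tie-breaking rule can influence results such as those reported for k = 2.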
Table 3 Performance analysis of random forest method
| No. of estimators | Confusion matrix | Precision score | Recall score | F1 score | Average precision score | Balanced accuracy score | Brier score loss | Fbeta score | Hamming loss | Zero one loss | Area under the receiver operating characteristic curve |
| 50 | 0.9360 | 0.9343 | 0.9380 | 0.9361 | 0.9073 | 0.9360 | 0.0640 | 0.9361 | 0.0640 | 0.0640 | 0.9850 |
| 100 | 0.9340 | 0.9357 | 0.9320 | 0.9339 | 0.9061 | 0.9340 | 0.0660 | 0.9339 | 0.0660 | 0.0660 | 0.9848 |
| 150 | 0.9330 | 0.9304 | 0.9360 | 0.9332 | 0.9029 | 0.9330 | 0.0670 | 0.9332 | 0.0670 | 0.0670 | 0.9852 |
| 200 | 0.9390 | 0.9399 | 0.9380 | 0.9389 | 0.9126 | 0.9390 | 0.0610 | 0.9389 | 0.0610 | 0.0610 | 0.9847 |
| 250 | 0.9320 | 0.9286 | 0.9360 | 0.9323 | 0.9011 | 0.9320 | 0.0680 | 0.9323 | 0.0680 | 0.0680 | 0.9847 |
| 300 | 0.9320 | 0.9337 | 0.9300 | 0.9319 | 0.9034 | 0.9320 | 0.0680 | 0.9319 | 0.0680 | 0.0680 | 0.9849 |
| 350 | 0.9340 | 0.9411 | 0.9260 | 0.9335 | 0.9084 | 0.9340 | 0.0660 | 0.9335 | 0.0660 | 0.0660 | 0.9854 |
| 400 | 0.9390 | 0.9434 | 0.9340 | 0.9387 | 0.9142 | 0.9390 | 0.0610 | 0.9387 | 0.0610 | 0.0610 | 0.9852 |
| 450 | 0.9320 | 0.9320 | 0.9320 | 0.9320 | 0.9026 | 0.9320 | 0.0680 | 0.9320 | 0.0680 | 0.0680 | 0.9837 |
| 500 | 0.9380 | 0.9363 | 0.9400 | 0.9381 | 0.9101 | 0.9380 | 0.0620 | 0.9381 | 0.0620 | 0.0620 | 0.9854 |
| 550 | 0.9330 | 0.9339 | 0.9320 | 0.9329 | 0.9044 | 0.9330 | 0.0670 | 0.9329 | 0.0670 | 0.0670 | 0.9834 |
| 600 | 0.9300 | 0.9370 | 0.9220 | 0.9294 | 0.9029 | 0.9300 | 0.0700 | 0.9294 | 0.0700 | 0.0700 | 0.9830 |
| 650 | 0.9320 | 0.9337 | 0.9300 | 0.9319 | 0.9034 | 0.9320 | 0.0680 | 0.9319 | 0.0680 | 0.0680 | 0.9861 |
| 700 | 0.9340 | 0.9272 | 0.9420 | 0.9345 | 0.9024 | 0.9340 | 0.0660 | 0.9345 | 0.0660 | 0.0660 | 0.9838 |
| 750 | 0.9340 | 0.9375 | 0.9300 | 0.9337 | 0.9069 | 0.9340 | 0.0660 | 0.9337 | 0.0660 | 0.0660 | 0.9838 |
Fig. 6 Random forest graphs
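The two sources of diversity credited above for the random forest results, bootstrap sampling of rows (bagging) and random feature subsets, can be sketched as follows. This is an illustrative plan generator in plain Python, not the authors' code: the function names are hypothetical, and the square-root rule for the subset size is a common heuristic assumed here rather than a setting reported in the paper. The sketch draws one feature subset per tree, following the description in the text; many implementations instead draw a fresh subset at every split.

```python
import random


def bootstrap_sample(n_rows, rng):
    """Draw n_rows row indices with replacement (bagging)."""
    return [rng.randrange(n_rows) for _ in range(n_rows)]


def random_feature_subset(n_features, n_selected, rng):
    """Pick n_selected distinct feature indices at random."""
    return sorted(rng.sample(range(n_features), n_selected))


def plan_forest(n_trees, n_rows, n_features, seed=0):
    """Return, per tree, the rows and features it would be trained on."""
    rng = random.Random(seed)
    # Assumed heuristic: use about sqrt(n_features) features per tree.
    n_selected = max(1, int(n_features ** 0.5))
    return [
        (bootstrap_sample(n_rows, rng),
         random_feature_subset(n_features, n_selected, rng))
        for _ in range(n_trees)
    ]


for rows, features in plan_forest(n_trees=3, n_rows=10, n_features=16):
    print(len(set(rows)), "distinct rows; features", features)
```

Because each tree sees a different resample of the rows and a different slice of the features, the errors of individual trees tend to be less correlated, which is the property the text relies on.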
Table 4 Performance analysis of proposed method
| Metric & Loss | Value |
| Confusion matrix | 0.9733 |
| Precision score | 0.9736 |
| Recall score | 0.9736 |
| F1 score | 0.9736 |
| Average precision score | 0.9613 |
| Loss | 0.0267 |
5 Conclusion
This paper introduces a new method to identify the authenticity of signatures using transfer learning. The images are fed into a VGG-16 network for feature extraction, and its output is passed to four different classification models: random forest, K-nearest neighbours, extra trees and support vector machine. The outputs of these models are fed into an ANN for final classification. The method was evaluated with several different metrics and achieved an overall accuracy of 97.3%.
Chapter 5
Personality Prediction Through Handwriting Analysis Using Convolutional Neural Networks
Gyanendra Chaubey and Siddhartha Kumar Arjaria
1 Introduction
A person's real personality, including behavior, emotional outlay, self-esteem, anger, imagination, fears, honesty, and many other traits, can be predicted using handwriting analysis. Graphology, or handwriting analysis, is described as the scientific study and analysis of handwriting. It is a way of identifying, evaluating, and interpreting the behavior of an individual. Performance is measured by examining multiple samples to identify the writer of a piece of handwriting. A professional handwriting examiner is called a graphologist. Handwriting is also an expressive medium that reveals the nature, psychology, and behavior of the writer, and it is unique to each individual. Each personality trait corresponds to a neurological pattern in the human brain, and each such pattern produces a distinctive neuromuscular movement that is the same for every person who has that trait [1]. Graphology has long been in use, but nowadays it is related to the physical personality and emotional activities of writers and their current domain dispute. In the present learning system, graphology is identified by some psychological analysis. Graphologists hold that the human mind forms the characters according to the attitude of the writer [2]. Handwriting is a complex activity and is considered an "overloaded" skill involving highly sequenced movements. As Weintraub (1997) concluded, several theoretical models suggest that handwriting involves retrieving the form, shape, size, and angle of letters, relating these to their sounds (phonemes), remembering all the specific parameters, and transferring them by motor execution to the paper [3].
G. Chaubey (B) · S. K. Arjaria Department of Information Technology, Rajkiya Engineering College, Banda 210201, India © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 R. Tiwari et al. (eds.), Proceedings of International Conference on Computational Intelligence, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-16-3802-2_5
In this paper, a convolutional neural network architecture is proposed to predict the personality of a person from features extracted from his or her handwriting. Handwriting is also called brain writing, since it reflects an individual's brain in many ways. The prime features of handwriting are the slant of words, baseline, pen pressure, breaks in writing, margins, and the design of letters [1, 4]. With time and practice, handwriting performance becomes an automatic motor skill. Several studies have shown that adults who must think about the size, shape, form, and position of letters tend to write slowly and in larger letters. Handwriting is also one of the oldest forms of correspondence in our civilization and has developed over time. Some writers learn to write by copying forms and shapes from a standardized copybook, which itself varies with geographical place, time, society, and cultural preferences [3]. Writer identification is used for different purposes, such as monetary activities, forensics, and security. Handwriting analysis is also used in the criminal justice system to identify the culprits behind handwritten documents. It plays a vital role in fields such as medical diagnosis, recruitment and staff selection, forensics, and psychology. Graphology reveals the physiological and psychological condition of a patient, so it serves as an essential tool in psychological and medical diagnosis [5].
2 Related Works
A survey of recent work shows that no standard method has been developed for predicting the behavior of a person on the basis of handwriting. Much research has been done in computer science to develop systems that can recognize the personality of a person from handwriting in an easy way. The following paragraphs describe recent work relating handwriting to psychological traits.
Fallah and Khotanlou [6] described personality prediction from the handwriting of an individual. They used the Minnesota Multiphasic Personality Inventory for training the model, a neural network (NN) to classify the properties that are not related to the writer, and a Hidden Markov Model (HMM) to classify the properties of the target writer. The accuracy obtained with this model was over 70%. Another system was proposed to predict personality traits from handwriting using several features, including the lower loop of the letter 'y', the letter 't', the slant of writing, and pen pressure. A rule-based classifier assigned the personality traits of the writer on the Myers-Briggs Type Indicator, with an accuracy of over 70% [7].
A combined model of Support Vector Machine (SVM), AdaBoost, and KNN classifiers was used with each of seven personality features to analyze handwriting traits. The final results ranged from 62.5% to 83.9% [8]. Siddiqi et al. used an Artificial Neural Network (ANN) and a Support Vector Machine (SVM) to distinguish male from female writers by their handwriting. The features analyzed are slant, curvature, legibility, and texture. On a database of male and female handwriting, the system predicted the gender of the writer with over 80% accuracy [9]. Using the random forest algorithm and kernel discriminant analysis, a model was proposed to analyze handwriting on the basis of geometric features. This system predicts gender, age, and nationality with accuracies of 75.05%, 55.76%, and 53.66%, respectively, for an individual writer. When the collective result over all writers was estimated, with each writing a different text, the accuracies were 73.59% for gender, 60.62% for age, and 47.98% for nationality [10].
Gil Luria and Sara Rosenblum created a model to discriminate between low and high mental workloads on the basis of handwriting features. Fifty-six participants were asked to write three arithmetic progressions on a digitizer. Features were extracted mainly in the temporal space and less in the pressure space. Three clusters of handwriting types were identified using data reduction, and the results showed that mental workload affects handwriting behavior [11]. Zaarour et al. used a Bayesian-network-based model to enhance the performance of students; the model takes different drawings and writings as input. It was able to predict the writing behavior of a child, which a psychologist can analyze in order to improve the child's education [12]. Similarly, Sudirman et al. worked on predicting the behavior of children from handwriting, since children are less influenced by cultural background and their cognition rate is also high. Hence, children are the best choice for handwriting analysis. They therefore developed an automatic system that can detect developmental disorders among children. The system achieved an accuracy of over 78% [13].
Another handwriting-based analysis is the detection of deceit in an individual's character. Luria et al. developed a system that analyzes handwriting in order to detect false statements given by a patient about his or her health. In the first step, the subjects wrote true and false statements about their medical conditions on paper linked to a digitizer. In the second step, the deceptive and truthful writing of each subject was compared, and the subjects were divided into three groups based on their handwriting profiles. Further analysis found that deceptive writing differs markedly from the other two types of writing: it takes more time and is broader. The other two types of writing differed in both spatial and temporal vectors [14].
Just as deceit can be detected, human diseases can also be predicted from handwriting. One proposed system detects diabetes from handwriting with an accuracy of over 80% [15]. Researchers have also used handwriting to predict micrographia, a condition in which the size of letters and the velocity and acceleration of writing decrease. This condition is associated with Parkinson's disease (PD). The approach was tested on patients diagnosed with PD and gave 80% accuracy on 75 tested subjects [16].
A wide range of methods can be employed to analyze the multitude of handwriting features. In offline handwriting analysis, the first step is normalization of the written sample to remove any possible noise. Morphological methods are typically used to remove background noise in the normalization phase [17]. Laplace filters, unsharp masking, or gradient masking are used for sharpening in the normalization phase [18]. Contrast enhancement is an important factor in obtaining high accuracy in handwriting analysis [19]. Contour smoothing of the written letters is also an essential part of the normalization phase [20]. After all processing steps, different types of thresholding can be applied to the handwriting sample, which is compressed and converted to a grayscale image [21]. After compression, the written text needs to be delimited through page segmentation, where methods for examining the background and foreground regions are employed. The most common is whitespace-rectangle segmentation [22]. The vertical projection profile method has been used to segment handwriting images into text lines and words [23].
Different classifiers have been used to classify the features of handwriting behavior, as presented in the analysis above. Lemos et al. [24] used a convolutional neural network on English handwriting in 2018 to predict the percentage of personality traits such as pessimistic, optimistic, balanced, and shy. Fatimah et al. [25] used a CNN to evaluate the accuracy of personality prediction for six different traits with a structure-based approach and a symbol-based approach. Nijil Raj et al. [26] extracted seven features and then applied a CNN combined with an MLP to predict personality traits from calligraphy.
With all the above work in mind, this research provides a convolutional neural network model for predicting the five major personality traits by analysis of handwriting. The system is proposed to assess the accuracy of a convolutional neural network (CNN) in analyzing handwriting to predict the five big personality traits. The next section presents our model and the architecture of the system.
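The projection-profile idea mentioned above can be illustrated with a short sketch: ink pixels are counted per column (or per row), and runs of empty positions mark the gaps between words (or text lines). This is a generic illustration in plain Python and not the method of [23], whose details are not given here; the function names, the binary toy image and the `min_gap` parameter are assumptions.

```python
def column_profile(binary_image):
    """Count ink pixels (value 1) in each column of a binary image."""
    return [sum(column) for column in zip(*binary_image)]


def row_profile(binary_image):
    """Count ink pixels (value 1) in each row of a binary image."""
    return [sum(row) for row in binary_image]


def split_on_gaps(profile, min_gap=1):
    """Return (start, end) spans with ink, where `end` is exclusive.

    A span is closed once at least `min_gap` consecutive empty positions
    follow it; shorter gaps are treated as part of the same span.
    """
    segments = []
    start = None
    gap = 0
    for i, value in enumerate(profile):
        if value > 0:
            if start is None:
                start = i
            gap = 0
        elif start is not None:
            gap += 1
            if gap >= min_gap:
                segments.append((start, i - gap + 1))
                start = None
                gap = 0
    if start is not None:
        segments.append((start, len(profile) - gap))
    return segments


# Hypothetical 3 x 8 binary image containing two "words".
image = [
    [0, 1, 1, 0, 0, 1, 0, 0],
    [0, 1, 0, 0, 0, 1, 1, 0],
    [0, 0, 1, 0, 0, 0, 1, 0],
]
print(split_on_gaps(column_profile(image), min_gap=2))
```

Applying `split_on_gaps` to `row_profile` in the same way would separate text lines instead of words.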
3 Research Technique and Design
The five big personality traits to be predicted here are:
(a) Agreeableness,
(b) Conscientiousness,
(c) Extraversion,
(d) Neuroticism,
(e) Openness.
These five big personality traits are treated as the target classes for the convolutional neural network. The image dataset for each class was created by collecting the handwriting of students of an engineering college and using their personality types as classes. Handwriting in both Hindi and English was collected. The handwriting samples contain a paragraph including all the letters of the English alphabet as well as all the letters of the Hindi alphabet. Samples of the handwriting are shown in Fig. 1.
Fig. 1 Samples of handwriting: (a) English handwriting, (b) Hindi handwriting
3.1 CNN Architecture for Handwriting Analysis
A CNN consists of a number of convolution layers and pooling layers. The input image is p × p × m, where p × p is the height and width of the image and m is the number of channels (for example, an RGB image has m = 3). A convolution layer has k filters of size q × q × n, where q < p and n ≤ m. The architecture of the convolutional neural network proposed to solve the problem is shown in Fig. 2, and the color coding of the architecture is given in Fig. 3.
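The dimensions introduced above fix the size of a convolution layer's output and its number of trainable weights. The sketch below works this out in plain Python under stated assumptions: stride 1 and no padding by default, one bias per filter, and hypothetical function names. The concrete sizes in the example are illustrative and are not taken from the paper.

```python
def conv_output_shape(p, q, k, stride=1, padding=0):
    """Output (height, width, channels) for a p x p input and k q x q filters."""
    side = (p + 2 * padding - q) // stride + 1
    return (side, side, k)


def conv_param_count(q, n, k, bias=True):
    """Trainable weights of k filters of size q x q x n, plus optional biases."""
    return k * (q * q * n + (1 if bias else 0))


# Illustrative sizes (assumed): 64 x 64 RGB input, 32 filters of 3 x 3 x 3.
print(conv_output_shape(p=64, q=3, k=32))   # stride 1, no padding
print(conv_param_count(q=3, n=3, k=32))
```

With stride 1 and no padding the spatial side shrinks from p to p − q + 1, while the number of output channels equals the number of filters k.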
Fig. 2 Architecture of convolutional neural network (CNN)
Fig. 3 Color coding for the architecture of CNN
Legend: Input Layer, Convolutional Layer, Pooling Layer, Flatten Layer, Dense Layer, Dense Layer 1 (Output Layer)
Input Layer: The input layer of the architecture takes its input from the dataset.
Convolution Layer: The convolution layer performs the convolution operation on the image to extract features from it. A weighted matrix is formed using the input image and the kernel. The convolution operation is given by Eq. (1)
$$W_i * X_i \qquad (1)$$
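As an illustration of Eq. (1), the sketch below slides a kernel W over a single-channel image X and sums the element-wise products at each position. It is a plain-Python sketch under assumptions, not the paper's implementation: stride 1, no padding, and no kernel flip (the cross-correlation form that deep learning libraries commonly call convolution). The function name and the toy values are hypothetical.

```python
def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` (stride 1, no padding) and sum products."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(kernel[i][j] * image[r + i][c + j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


# Toy 3 x 3 image and 2 x 2 kernel (assumed values).
x = [[1, 2, 3],
     [4, 5, 6],
     [7, 8, 9]]
w = [[1, 0],
     [0, -1]]
print(conv2d_valid(x, w))
```

Each output entry is one weighted sum of a q × q patch of the input, so a 3 × 3 image and a 2 × 2 kernel give a 2 × 2 feature map.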
Pooling Layer: To use the features obtained after convolution for classification, the pooling layer takes small rectangular blocks from the convolutional layer and subsamples each block to produce a single output.
Flatten Layer: Between the pooling layer and the fully connected layer lies the flatten layer, which transforms a two-dimensional feature matrix into a vector that can be fed into a fully connected neural network layer.
Dense Layer: The densely (fully) connected neural network layer is known as the dense layer. It is the most commonly used layer; it takes an input and returns an output.
A summary of the layers and parameters used for training the model is shown in Fig. 4.
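The pooling and flatten steps described above can be sketched in a few lines of plain Python. The text does not say which pooling function is used, so max pooling over non-overlapping 2 × 2 blocks is an assumption here, as are the function names and the toy feature map.

```python
def max_pool2d(feature_map, size=2):
    """Reduce each non-overlapping size x size block to its maximum value."""
    rows = len(feature_map) // size
    cols = len(feature_map[0]) // size
    return [
        [
            max(feature_map[r * size + i][c * size + j]
                for i in range(size) for j in range(size))
            for c in range(cols)
        ]
        for r in range(rows)
    ]


def flatten(feature_map):
    """Turn a two-dimensional feature matrix into a one-dimensional vector."""
    return [value for row in feature_map for value in row]


# Toy 4 x 4 feature map (assumed values).
fmap = [[1, 2, 3, 4],
        [5, 6, 7, 8],
        [9, 10, 11, 12],
        [13, 14, 15, 16]]
pooled = max_pool2d(fmap)
print(pooled, flatten(pooled))
```

The flattened vector is what a dense layer would receive as input.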
Fig. 4 Summary of the architecture of the convolutional neural network
4 Results and Analysis
The whole dataset is divided into Hindi and English handwriting. The dataset contains handwriting images from 108 writers, grouped into categories according to their personality. The general steps followed in all the experiments are:
1. Data augmentation
• Rescaling image
• Horizontal flip
• Shear range
• Zoom
2. Input augmented data
• Training data
• Testing data
3. Build the model
4. Compile the model
5. Fit the model
6. Evaluate the performance
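Two of the augmentation steps listed above, rescaling and horizontal flip, are simple enough to sketch in plain Python on a grayscale image stored as nested lists. This is an illustration only: the function names are hypothetical, the 1/255 scale factor is an assumption, and shear and zoom are omitted because they need interpolation that an image library would normally provide. The tooling actually used in the experiments is not stated in this section.

```python
def rescale(image, factor=1 / 255):
    """Multiply every pixel by `factor` (assumed: map 0..255 to 0..1)."""
    return [[pixel * factor for pixel in row] for row in image]


def horizontal_flip(image):
    """Mirror the image left to right."""
    return [list(reversed(row)) for row in image]


def augment(image):
    """Return the rescaled image together with its mirrored copy."""
    scaled = rescale(image)
    return [scaled, horizontal_flip(scaled)]


# Toy 2 x 3 grayscale image (assumed values).
sample = [[0, 51, 255],
          [255, 0, 0]]
for variant in augment(sample):
    print(variant)
```

Whether a horizontal flip is appropriate for handwriting is a design choice, since mirrored text does not occur naturally; the list above nevertheless includes it.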
4.1 Experiment 1
Experiment 1 analyzes English handwriting. The English handwriting of 110 writers, covering the five big personality traits, is split into training (Table 1) and test (Table 2) datasets as follows.
Table 1 Number of training data for Experiment 1
| Personality trait | No. of handwritings |
| Agreeableness | 15 |
| Conscientiousness | 13 |
| Extraversion | 4 |
| Neuroticism | 18 |
| Openness | 38 |
Table 2 Number of testing data for Experiment 1
| Personality trait | No. of handwritings |
| Agreeableness | 4 |
| Conscientiousness | 4 |
| Extraversion | 1 |
| Neuroticism | 4 |
| Openness | 9 |
With the above data, the average accuracy obtained is 0.43184 (about 43.18%) and the average loss is 1.44, as shown in Fig. 5.
Fig. 5 Training and validation accuracy and loss graph of English handwriting
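Because the class counts in Tables 1 and 2 are strongly imbalanced, it can help to read the reported accuracy and loss against two trivial baselines. The sketch below computes them in plain Python from the tabulated counts. It is an interpretive aid, not part of the original experiments: the function names are hypothetical, and comparing the reported loss with ln 5 assumes that the loss is a categorical cross-entropy, which the text does not state.

```python
import math

# Counts copied from Tables 1 and 2 (Experiment 1, English handwriting).
TRAIN_COUNTS = {"Agreeableness": 15, "Conscientiousness": 13,
                "Extraversion": 4, "Neuroticism": 18, "Openness": 38}
TEST_COUNTS = {"Agreeableness": 4, "Conscientiousness": 4,
               "Extraversion": 1, "Neuroticism": 4, "Openness": 9}


def majority_baseline_accuracy(train_counts, test_counts):
    """Test accuracy of always predicting the most frequent training class."""
    majority = max(train_counts, key=train_counts.get)
    return test_counts[majority] / sum(test_counts.values())


def uniform_cross_entropy(n_classes):
    """Cross-entropy of a uniform guess over n_classes, in nats."""
    return math.log(n_classes)


print(sum(TRAIN_COUNTS.values()) + sum(TEST_COUNTS.values()))
print(round(majority_baseline_accuracy(TRAIN_COUNTS, TEST_COUNTS), 4))
print(round(uniform_cross_entropy(len(TRAIN_COUNTS)), 4))
```

The counts sum to the 110 writers stated for this experiment. Always predicting Openness would be correct for 9 of the 22 test samples, about 0.409, and a uniform guess over five classes has a cross-entropy of ln 5, about 1.609, so the reported accuracy of 0.43184 and loss of 1.44 are only modestly better than these baselines.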
4.2 Experiment 2
Experiment 2 analyzes Hindi handwriting. The Hindi handwriting of 111 writers, covering the five big personality traits, is split into training (Table 3) and test (Table 4) datasets as follows.
Table 3 Number of training data for Experiment 2
| Personality trait | No. of handwritings |
| Agreeableness | 15 |
| Conscientiousness | 14 |
| Extraversion | 4 |
| Neuroticism | 18 |
| Openness | 38 |
Table 4 Number of testing data for Experiment 2
| Personality trait | No. of handwritings |
| Agreeableness | 4 |
| Conscientiousness | 4 |
| Extraversion | 1 |
| Neuroticism | 4 |
| Openness | 9 |
With the above data, the average accuracy obtained is 42.63% and the average loss is 1.44. The graph of training and validation accuracy is shown in Fig. 6.
Fig. 6 Training and validation accuracy and loss graph of Hindi handwriting
4.3 Experiment 3
Experiment 3 analyzes the mixture of all Hindi and English handwriting: 110 writers of English and 111 writers of Hindi. The mixed handwriting of 221 writers, covering the five big personality traits, is split into training (Table 5) and test (Table 6) datasets as follows.
Table 5 Number of training data for Experiment 3
| Personality trait | No. of handwritings |
| Agreeableness | 30 |
| Conscientiousness | 27 |
| Extraversion | 8 |
| Neuroticism | 36 |
| Openness | 76 |
Table 6 Number of testing data for Experiment 3
| Personality trait | No. of handwritings |
| Agreeableness | 8 |
| Conscientiousness | 8 |
| Extraversion | 2 |
| Neuroticism | 8 |
| Openness | 18 |
With the above data, the average accuracy obtained is 42.94% and the average loss is 1.42. The graph of training and validation accuracy is shown in Fig. 7.
Fig. 7 Training and validation accuracy and loss graph of English and Hindi (mixed) handwriting
5 Conclusion and Future Scope
If the dataset is labeled by a graphologist, the accuracy obtained is higher. A dataset could also be created by collecting handwriting after inducing a temporary emotion in the writers, for example by showing them a movie that evokes that emotion. The model can also be trained on a larger dataset to improve its accuracy.
References
1. H.N. Champa, K.R. Ananda Kumar, Automated human behaviour prediction through handwriting analysis, in 2010 First International Conference on Integrated Intelligent Computing (2010)
2. M. Gavrilescu, 3-layer Architecture for Determining the Personality Type from Handwriting Analysis by Combining Neural Networks and Support Vector Machines, Department of Telecommunication (2017)
3. G. Luria, S. Rosenblum, A Computerized Multidimensional Measurement of Mental Work Load via Handwriting Analysis, Department of Human Services and Department of Occupational Therapy (2012)
4. S. Mukherjee, I. De, Feature extraction from handwritten documents for personality analysis, in 2016 International Conference on Computer, Electrical and Communication Engineering (ICCECE) (2016)
5. M. Sachan, S.K. Singh, Personality detection using handwriting analysis, in Seventh International Conference on Advance and Computing, Electronics and Communication (ACEC) (2018)
6. B. Fallah, H. Khotanlou, Identify human personality parameters based on handwriting using neural networks, in Artificial Intelligence and Robotics (IRANOPEN) (April 2016)
7. H.N. Champa, K.R. Anandakumar, Automated human behavior prediction through handwriting analysis, in 2010 First International Conference on Integrated Intelligent Computing (ICIIC), pp. 160–165 (August 2010)
8. Z. Chen, T. Lin, Automatic personality identification using writing behaviors: an exploratory study. Behav. Inform. Technol. 36(8), 839–845 (2017)
9. I. Siddiqi, C. Djeddi, A. Raza, L. Souici-Meslati, Automatic analysis of handwriting for gender classification. Pattern Anal. Appl. 18(4), 887–899 (2015)
10. S. Maadeed, A. Hassaine, Automatic prediction of age, gender, and nationality in offline handwriting. EURASIP J. Image Video Process. 2014, 10 (2014)
11. G. Luria, S. Rosenblum, A computerized multidimensional measurement of mental workload via handwriting analysis. Behav. Res. Methods 44(2), 575–586 (2012)
12. I. Zaarour, L. Heutte, P. Leray, J. Labiche, B. Eter, D. Mellier, Clustering and Bayesian network approaches for discovering handwriting strategies of primary school children. Int. J. Pattern Recognit. Artif. Intell. 18(7), 1233–1251 (2004)
13. R. Sudirman, N. Tabatabaey-Mashadi, I. Ariffin, Aspects of a standardized automated system for screening children's handwriting, in First International Conference on Informatics and Computational Intelligence (ICI), pp. 48–54 (December 2011)
14. G. Luria, A. Kahana, S. Rosenblum, Detection of deception via handwriting behaviors using a computerized tool: toward an evaluation of malingering. Cogn. Comput. 6(4), 849–855 (2014)
15. S.B. Bhaskoro, S.H. Supangkat, An extraction of medical information based on human handwritings, in 2014 International Conference on Information Technology Systems and Innovation (ICITSI), pp. 253–258 (November 2014)
16. P. Drotar, J. Mekyska, Z. Smekal, I. Rektorova, Prediction potential of different handwriting tasks for diagnosis of Parkinson's, in 2013 E-Health and Bioengineering Conference, pp. 1–4 (November 2013)
17. W.L. Lee, K.-C. Fan, Document image preprocessing based on optimal Boolean filters. Signal Process. 80(1), 45–55 (2000)
18. J.G. Leu, Edge sharpening through ramp width reduction. Image Vis. Comput. 18(6), 501–514 (2000)
19. S.C.F. Lin et al., Intensity and edge based adaptive unsharp masking filter for color image enhancement. Optik Int. J. Light Electron Optics 127(1), 407–414 (2016)
20. R. Legault, C.Y. Suen, Optimal local weighted averaging methods in contour smoothing. IEEE Trans. Pattern Anal. Mach. Intell. 18, 690–706 (1997)
21. Y. Solihin, C.G. Leedham, Integral ratio: a new class of global thresholding techniques for handwriting images. IEEE Trans. Pattern Anal. Mach. Intell. 21, 761–768 (1999)
22. K. Chen, F. Yin, C.-L. Liu, Hybrid page segmentation with efficient whitespace rectangles extraction and grouping, in 2013 12th International Conference on Document Analysis and Recognition (ICDAR), pp. 958–962 (2013)
23. V. Papavassiliou, T. Stafylakis, V. Katsouro, G. Carayannis, Handwritten document image segmentation into text lines and words. Pattern Recogn. 43, 369–377 (2010)
24. N. Lemos et al., Personality prediction based on handwriting using machine learning, in 2018 International Conference on Computational Technique, Electronics and Mechanical Systems (CTEMS) (December 2018)
25. S.H. Fatimah et al., Personality features identification from handwriting using convolutional neural network, in 4th International Conference on Information Technology, Information Systems and Electrical Engineering (ICITISEE) (2019)
26. N. Nijil Raj et al., Forecasting personality based on calligraphy using CNN and MLP. Int. J. Comput. Sci. Eng. 8(7) (2020)
Chapter 6
A Review on Deep Learning Framework for Alzheimer’s Disease Detection from MRI
Parinita Bora and Subarna Chatterjee
1 Introduction
MRI captures the cerebral changes that Alzheimer’s disease (AD) produces in brain tissue. Deep learning (DL) [5, 6] approaches can provide high-accuracy predictions on neuro-images. However, machine vision (MV) realization and generalization have remained difficult because of the gap between the research community and clinical practice, the difficulty of validating machine learning (ML) models, and the complex manifestation and varying symptoms of the disease. For decades, the other bio-marker, cerebrospinal fluid (CSF) study, has provided good inference, yet it is limited by its invasive nature, which requires a specially trained clinician. MRI-based imaging of brain tissue is the most common, easy, non-radioactive, and non-invasive way of medical image analysis. Because MRI data are available, researchers have tried to establish a relationship between the brain structure seen in the MRI image and raw pathological data. Data selection and feature extraction can consume most of the time a data scientist spends building and testing an ML model. In such cases, DL techniques like the convolutional neural network (CNN) help, because the model derives its rules directly from the data in the input domain. CNNs can learn to detect the features in visual data [7]. This chapter describes an approach that uses a simple CNN-based model to extract the features and to provide inference on the basis of every portion of the brain.
The organization of the paper is as follows: Section 1 comprises the introduction. Section 2 presents a review of AD detection using CNN. Section 3 describes the methodology. Section 4 covers the results with discussion and challenges. Section 5 presents the concluding remarks.

P. Bora (B) MLIS, Department of CSE, FET, Ramaiah University of Applied Sciences, Bangalore, India
S. Chatterjee Department of CSE, FET, Ramaiah University of Applied Sciences, Bangalore, India
e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 R. Tiwari et al. (eds.), Proceedings of International Conference on Computational Intelligence, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-16-3802-2_6
2 Literature Review Using CNN
In many vision problems, CNNs have reached state-of-the-art performance while allowing models to be built, trained, and tested in a reasonably short time. Figure 1 shows that Google Scholar publications using CNNs in general have grown tremendously, whereas the number using CNNs for Alzheimer’s classification or prediction is comparatively negligible. The largest change in growth happened during 2015–2016. The factors that have accelerated this growth are:
1. the increasing amount of data,
2. the increasing availability of high computing power supported by hardware and software, and
3. new developments in mathematical techniques and their applicability in DL, e.g., batch normalization, regularization, parameter optimization, and structural reformation.
The available literature on detecting Alzheimer’s from MRI images using CNNs after the year 2014 is considered here.
2.1 The Basic CNN and Architectural Innovations in CNN
CNNs are brain-inspired and are based on how mammalian vision perceives the world. The root concept of such a model goes back to research carried out over the last six to seven decades. CNNs are widely used in ML because of their feature extraction and discrimination capability. Similar to human vision, a CNN can distinguish the features that are meaningful to it. A CNN can have several convolution and max-pooling layers. The neurons in one layer connect to only a portion of the next layer. At each layer, convolution is carried out with a filter (kernel) to produce a feature map. Between the convolution layers, a pooling layer is added to reduce the computation by shrinking the feature maps and hence the number of parameters in later layers. Max pooling picks the dominant value from each neighborhood, which reduces the data passed on while retaining the important information. Depending on the architecture, there can be connections that link one layer’s output to another layer that is not adjacent to it. At the end, a prediction stage is added, typically global average pooling or a fully connected (FC) layer followed by softmax. The number of layers required is decided by the level of features to be extracted: simple features are extracted in the early layers and complex features in the deeper ones. A common rule of thumb is to keep adding layers until the receptive field covers the whole image or the feature map shrinks to 1 × 1. In practice, a network can stop much earlier, once the features are extracted well enough. Higher-dimensional convolutions add computational cost, for which there are special techniques. Changing channel sizes allows the network to go deeper. In recent years, researchers have been fine-tuning CNN models, especially in vision, speech, and natural language processing. Table 1 gives the evolution of CNN architectures over the last few decades.

Fig. 1 Count of publication in scholar.google.com (approx. by string search)
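The convolution and max-pooling steps described in this section can be illustrated with a minimal NumPy sketch. It is not taken from the chapter: the function names, the 6 × 6 toy image, and the 2 × 2 kernel are assumptions chosen only to make the arithmetic easy to follow.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Valid (no padding) 2D cross-correlation, the operation CNN layers call convolution."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # Each output value is the weighted sum of one image neighborhood.
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def max_pool2d(feature_map, size=2):
    """Non-overlapping max pooling; rows/columns that do not fill a window are dropped."""
    h = feature_map.shape[0] // size * size
    w = feature_map.shape[1] // size * size
    blocks = feature_map[:h, :w].reshape(h // size, size, w // size, size)
    return blocks.max(axis=(1, 3))

# Toy input (assumption): pixel value = 6 * row + column.
image = np.arange(36, dtype=float).reshape(6, 6)
kernel = np.array([[1.0, 0.0], [0.0, -1.0]])   # responds to diagonal intensity change

feature_map = conv2d_valid(image, kernel)      # 6x6 input, 2x2 kernel -> 5x5 map
pooled = max_pool2d(feature_map)               # 5x5 map, 2x2 windows -> 2x2 output
```

Here the 6 × 6 input becomes a 5 × 5 feature map and then a 2 × 2 pooled map, which shows how quickly pooling reduces the amount of data that later layers have to process.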
2.2 Related Work Approaches
For training ML models to diagnose AD, researchers have widely used region of interest (ROI)-based feature extraction [8–11] and voxel-based feature extraction [12, 13]. ROI-based features are specific to structural or functional changes in the brain. A few examples of such known abnormalities are changes in hippocampal volume, gray matter volume, and cortical thickness. The voxel-based method does not depend on any structural change; it extracts features that represent measurements such as the tissue densities of gray matter, white matter, and cerebrospinal fluid. This traditional feature extraction has several disadvantages. ROI-based detection requires proficient clinical knowledge of AD, its pre-processing is known to be complex, and a specific assumed region is sometimes not sufficient to represent the brain. Voxel-based features remain computationally heavy and limited. CNNs were adopted to overcome these disadvantages of the traditional feature selection techniques. Several approaches have been applied to improve AD diagnosis using CNNs; a few techniques surveyed from the public domain are highlighted here.
2.2.1 Use of Multi-modality
Using multiple modalities is very useful because the manifestation of AD is complex, and performance can be enhanced with multi-modality [2, 4, 11, 14]. Multiple modalities can be used for data completion together with simple CNN feature extraction where one modality does not carry enough information; for example, the corresponding PET image information can be combined [15]. A large body of literature has used clinical data and multi-modality for better results.
Table 1 Evolution of CNN architectures
Columns: No.; Era and year; Researcher and application area

1. Experimental NN (1959, 1962)
• Hubel and Wiesel

2. ConvNet and LeNet era (late 1980s–1999)
• Multilayered CNN ConvNet (supervised) by LeCun et al. (rooted in the unsupervised Neocognitron)
• Application: ConvNet for vision
• ConvNet modified to LeNet-5 (good rotational invariance and good discrimination)
• Application: LeNet-5 for document recognition; commercial use at banks and ATMs for character recognition in ’93

3. Stagnant stage for CNN (SVM and statistical methods were popular) (late 1990s–early 2000s)
• In 2003, Simard et al. showed better results than SVM
• MNIST: handwritten digit benchmark dataset
• Application: modified for optical character recognition (OCR)
• Modified for other scripts
• Modified for video analytics and scene analysis
• Modified for marketing, to detect customers
• Modified for image segmentation
• Modified for machine vision

4. Revival of CNNs (2006–2011)
• In 2006, Geoffrey Hinton proposed a greedy approach for training
• Huang et al. used max pooling
• In 2006, GPUs evolved; in 2007, NVIDIA launched the CUDA programming platform
• In 2010, at Stanford, Fei-Fei Li established the ImageNet database with millions of labeled images

5. Rise of CNN (2012–2014)
• In 2012, AlexNet won the ILSVRC competition
• Comparison of AlexNet with the 2006 version showed that the earlier version had failed for lack of computing power and training data
• AlexNet introduced regularization, which improves training by reducing over-fitting
• In 2013, Zeiler and Fergus used visualization to verify features
• In 2014, GoogLeNet and VGG (named after the Visual Geometry Group, University of Oxford) came into the limelight; GoogLeNet won ILSVRC and VGG came second
• Batch normalization

6. Rapid architectural innovations (2015 onwards)
• Srivastava et al. (2015) used cross-channel connectivity with information gating, which resolves the vanishing gradient problem and improves representational capacity
• In 2015, the residual block (skip connection) concept was developed (similar to cross-channel connectivity)
• Skip connections opened up many more variants and possibilities (DenseNet is an example)
• Hu et al. (2017) developed SE (Squeeze-and-Excitation), which uses the feature map in learning
• Khan et al. (2018) introduced channel boosting (CB-CNN), a better way of learning discriminative features
• In 2018, Khan used CB-CNN for medical image processing
2.2.2 Use of Data Augmentation
A couple of studies highlight data augmentation for patch-based CNN methods, using random zooming in and out and cropping of the region, which increases performance efficiently. Simple scaling, an affine transformation, can improve the discriminative ability of patch-based label fusion methods. The basic purpose of data augmentation is to obtain additional valid data so that the model learns better. New valid labeled data generated from existing data can reduce over-fitting and enhance the accuracy of the classifier. However, in the specific case of Alzheimer’s, the applicability of image augmentation has to be considered very carefully, because AD manifestation is complex and real-world data cannot simply be estimated by hand.
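A random zoom-and-crop augmentation of the kind mentioned above could look roughly like the following sketch. It is an illustration only: the function name, the minimum scale of 0.8, and the nearest-neighbour resampling are assumptions, it covers zooming in only, and the surveyed papers may use interpolating resamplers instead.

```python
import numpy as np

def random_zoom_crop(volume, min_scale=0.8, rng=None):
    """Crop a random sub-volume and stretch it back to the original shape.

    Nearest-neighbour resampling keeps the sketch dependency-free.
    """
    rng = np.random.default_rng() if rng is None else rng
    scale = rng.uniform(min_scale, 1.0)
    index_grids = []
    for dim in volume.shape:
        crop = max(1, int(round(dim * scale)))      # side length of the crop window
        start = rng.integers(0, dim - crop + 1)     # random window position
        # Map every output index to a source index inside the crop window.
        idx = start + np.floor(np.arange(dim) * crop / dim).astype(int)
        index_grids.append(idx)
    return volume[np.ix_(*index_grids)]

rng = np.random.default_rng(0)
patch = rng.random((16, 16, 16))                    # stand-in for a 3D MRI patch
augmented = random_zoom_crop(patch, rng=rng)
```

The output keeps the shape of the input, so augmented patches can be fed to the same network as the originals. As the text cautions, whether such a transformation yields clinically valid AD samples has to be judged separately.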
2.2.3 Use of Multi-channel Learning and Addressing Class Imbalance
Several researchers have used clustered patches [3, 4], cascaded CNNs [16], and deep multi-task multi-channel learning [16] along with multi-modality. Using the U-Net architecture and a CNN to extract features from the hippocampus [17] gave promising results. Deconvolution networks such as U-Net, applied to pixel-wise prediction tasks, show improvement in resolution.
2.2.4 Ensemble Learning
Bagging, random forest, and boosting are examples of ensemble methods. The idea of an ensemble is to construct base classifiers for a specific problem. Each is trained so that the features can be discriminated and class votes are produced. The weights of these votes form the basis on which new data are classified. An ensemble is homogeneous if the same type of algorithm is applied to the training data and heterogeneous otherwise. Several researchers have proposed ensemble methods for AD and reported better results [18–22]. One approach extracts overlapping 3D patches from the MRI volume and trains light CNNs patch-wise; an ensemble of these CNNs is then trained to provide the final classification. This method is simple and efficient in terms of resources and time.
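The voting idea behind an ensemble can be sketched in a few lines. The example below is a generic (optionally weighted) majority vote, not the specific scheme of any cited paper; the function name, the toy predictions, and the weights are hypothetical.

```python
import numpy as np

def majority_vote(predictions, n_classes, weights=None):
    """Combine base-classifier labels by (optionally weighted) voting.

    predictions: (n_classifiers, n_samples) integer class labels.
    weights:     one weight per classifier; equal weights if omitted.
    """
    predictions = np.asarray(predictions)
    if weights is None:
        weights = np.ones(predictions.shape[0])
    weights = np.asarray(weights, dtype=float)
    # votes[c, s] = total weight of the classifiers that chose class c for sample s
    votes = np.stack([
        (weights[:, None] * (predictions == c)).sum(axis=0)
        for c in range(n_classes)
    ])
    return votes.argmax(axis=0)   # ties go to the lowest class index

# Three hypothetical base classifiers, four samples, three classes.
base_predictions = [
    [0, 1, 2, 1],
    [0, 1, 1, 1],
    [2, 1, 2, 0],
]
plain = majority_vote(base_predictions, n_classes=3)
weighted = majority_vote(base_predictions, n_classes=3, weights=[1, 1, 3])
```

With equal weights the first two classifiers outvote the third on the first sample; giving the third classifier a weight of 3 reverses that decision, which is the role the vote weights play in the description above.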
2.2.5 Transfer Learning
The initial layers of a CNN detect low-level features, and the later layers capture high-level features specific to an application. A common transfer learning practice is therefore to take a pre-trained ConvNet and replace its last FC layer with a new FC layer whose size matches the number of classes in the new application. Transfer learning is easily attainable when the difference between the two applications is small; more fine-tuning is required when the difference is larger. A known best practice is to start fine-tuning at the last layer and then incrementally include the next layer until the expected performance is reached. Deep CNNs such as GoogLeNet and CaffeNet, used with transfer learning and data augmentation, are reported to reach high classification accuracies [1, 23]. However, they are computationally heavy and have a large number of parameters because of the FC layers. CNN-based architectures using ResNet, pre-trained transfer learning, and AlexNet have also been reported for multi-class classification. Transfer learning is used in other brain imaging areas as well: in combination with an ensemble of CNNs (P-Net, U-Net, and ResNet), it showed good accuracy on other brain image data [22].
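The practice of keeping the early layers and training only a new FC layer can be shown with a framework-free sketch. Everything here is a stand-in: a fixed random projection plays the role of the pre-trained convolutional layers, the data are random, and the learning rate and epoch count are arbitrary assumptions. A real experiment would load actual pre-trained weights in a DL framework.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for pre-trained layers: a fixed projection followed by ReLU.
W_frozen = rng.normal(size=(20, 8)) / np.sqrt(20)

def frozen_features(x):
    return np.maximum(x @ W_frozen, 0.0)

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)   # subtract the row maximum for stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def train_new_head(x, y, n_classes, lr=0.1, epochs=200):
    """Train only the new FC layer; W_frozen is never updated."""
    feats = frozen_features(x)
    W_head = np.zeros((feats.shape[1], n_classes))   # sized to the new task
    b_head = np.zeros(n_classes)
    onehot = np.eye(n_classes)[y]
    for _ in range(epochs):
        probs = softmax(feats @ W_head + b_head)
        grad = (probs - onehot) / len(x)   # gradient of mean cross-entropy w.r.t. logits
        W_head -= lr * feats.T @ grad
        b_head -= lr * grad.sum(axis=0)
    return W_head, b_head

x = rng.normal(size=(60, 20))              # hypothetical inputs for the new task
y = rng.integers(0, 3, size=60)            # three classes, e.g. CN / MCI / AD
W_before = W_frozen.copy()
W_head, b_head = train_new_head(x, y, n_classes=3)
```

Fine-tuning, as described above, would go one step further and also allow gradient updates to the last frozen layers, added one at a time.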
2.2.6 Long Short-Term Memory (LSTM) and Recurrent Neural Network-Based Spatial Information Extraction from Feature Map
A framework using a 3D-CNN and fully stacked bidirectional (FSBi) LSTM reported better results than the 3D-CNN alone [24]. Spatial information extraction is used in deep chronnectome learning with LSTM [25], and a combination of convolutional and recurrent neural networks showed good classification [26].
2.2.7 Use of 3D SEnet (Squeeze-and-Excitation Networks)
The Squeeze-and-Excitation block is a self-attention function on the channels [27]. SEnet automatically learns a weight for each feature channel, so it can enhance the useful features while suppressing the less useful ones. According to online ILSVRC information, the ImageNet classification error is lowest for SENets (ImageNet classification error: SEnet < Ensemble < ResNet < GoogleNet < AlexNet < feature engineering). Because neuroimaging differs from other vision tasks, careful experimentation is required when applying any such technique. In an experiment combining CNNs and 3D SEnets with ensemble learning, the CNN with ensemble gave better classification results than the SEnet with ensemble [28].
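The channel re-weighting performed by a Squeeze-and-Excitation block can be sketched for a 3D feature map as follows. This is a simplified forward pass with random weights and no bias terms; the function name, channel count, and reduction ratio `r` are assumptions for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def se_block_3d(x, w1, w2):
    """Squeeze-and-Excitation on a feature map x of shape (C, D, H, W).

    w1: (C, C // r) bottleneck weights; w2: (C // r, C) expansion weights.
    """
    squeezed = x.mean(axis=(1, 2, 3))            # squeeze: global average per channel
    hidden = np.maximum(squeezed @ w1, 0.0)      # excitation: bottleneck FC + ReLU
    weights = sigmoid(hidden @ w2)               # one weight in (0, 1) per channel
    return x * weights[:, None, None, None], weights

rng = np.random.default_rng(0)
C, r = 8, 4                                      # channels and reduction ratio
x = rng.normal(size=(C, 4, 4, 4))
w1 = rng.normal(size=(C, C // r))
w2 = rng.normal(size=(C // r, C))
y, channel_weights = se_block_3d(x, w1, w2)
```

Channels whose learned weight is close to 1 pass through almost unchanged, while channels with a weight near 0 are suppressed, which is the enhancement and suppression behaviour described above.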
3 Methodology
3.1 Data Collection
A homogeneous section of ADNI image data was selected: T1-weighted MRI files in 3D .nii format (NIfTI, Neuroimaging Informatics Technology Initiative), field strength 2–3 T, thickness 1.2–1.3, acquired with ‘MPRAGE’ (Magnetization-Prepared Rapid Gradient Echo) imaging. The selection included 67 subjects in the sagittal plane, with gradient warping (gradwarp) and intensity correction applied. The dataset consists of a set of files in .nii format and one .csv file describing the data. The high-resolution image (240 × 256 × 160) is embedded in the file format. The known category information is used as the supervised classification label. The categories are cognitive normal (CN); AD (diagnosed as AD and remained AD in subsequent follow-ups); and MCI (mild cognitive impairment: subjects categorized as MCI, EMCI, or LMCI who did not convert back to CN in subsequent follow-ups). Figures 2, 3, and 4 show example slices of the MRI images.
Fig. 2 MRI of a CN subject (axial slice 128, sagittal slice 80, coronal slice 120)
Fig. 3 MRI of an MCI subject (axial slice 128, sagittal slice 80, coronal slice 120)
Fig. 4 MRI of an AD subject (axial slice 128, sagittal slice 80, coronal slice 120)
3.2 Tools and Infrastructure
The Alzheimer’s Disease Neuroimaging Initiative (ADNI) data archive is accessible on specific access request. The ADNI website provides a data archive tool along with guidelines for selectively preparing the image dataset and extracting the files in specific file formats. The openly available Jupyter Notebook and Python libraries, together with FSL [29] for pre-processing, are used for the experiments.
3.3 Algorithm Development
A high-level diagram of an ML model is shown in Fig. 5. In CNN-based model development, training and testing form an iterative decision-making process in which the parameters are tuned to arrive at an optimum model. For a deep network, restructuring of the units is also tried during the manual runs of the architecture search. There is no automated architecture search mechanism to predict an optimum model structure for a particular recognition problem. In this experiment, a 3D patch-level approach with multiple CNNs is tried, as shown in Fig. 6. Compared to a single CNN with a heavy and deep architecture, multiple patch-based CNNs with a simple and lighter architecture carry less computational overhead.

Fig. 5 A high-level diagram for a ML approach
Fig. 6 A method for 3D patch level (A variant of reference models [2, 4])
3.4 Pre-processing
For an ML model, the signal-to-noise ratio of the input should be as high as possible for optimum accuracy. The dataset chosen for the experiment already comes from a homogeneous group, as mentioned in Sect. 3.1. The pre-processing steps below are carried out in sequence, as shown in Fig. 6.

3.4.1 Skull Stripping
For CNN-based learners to detect the features of AD, the significant data are the part of the original MRI that represents brain tissue; the skull is noise in this case. (The MRI slices shown in Figs. 2, 3, and 4 include the skull, as in the original images.) Skull stripping is done to ignore the non-brain tissue. The author was unable to find a standardized skull-stripped set of the ADNI data; hence, the open-source framework FSL [29] is used. Figure 7 shows example skull-stripped images from MRI sagittal slices.

Fig. 7 Skull stripped images from MRI sagittal slices
3.4.2 Intensity Rescaling
MRI is non-quantitative, so the intensity range and distribution of the same tissue type differ between images [10]. Rescaling is required to remove this adverse effect. A CNN captures features more efficiently when its input is normalized. Each pixel in the skull-stripped image is rescaled to an intensity value between 0 and 255.
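A plain min-max rescaling to the range 0 to 255 is one way to carry out this step. The chapter does not state the exact formula used, so the linear mapping, the function name, and the toy values below are assumptions.

```python
import numpy as np

def rescale_intensity(volume, new_min=0.0, new_max=255.0):
    """Linearly map the intensities of a volume onto [new_min, new_max]."""
    volume = volume.astype(np.float64)
    lo, hi = volume.min(), volume.max()
    if hi == lo:                                   # constant image: avoid division by zero
        return np.full_like(volume, new_min)
    return (volume - lo) / (hi - lo) * (new_max - new_min) + new_min

# Toy volume with arbitrary scanner units (assumption).
volume = np.array([[[100.0, 300.0], [500.0, 900.0]]])
rescaled = rescale_intensity(volume)
```

For this toy input the values 100, 300, 500, and 900 map to 0, 63.75, 127.5, and 255. Because the mapping uses each image's own minimum and maximum, a single very bright voxel compresses all other values, so percentile clipping before rescaling may be worth considering.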
3.4.3 Affine Registration
In this approach, multiple CNNs of simple architecture extract features from brain sub-regions for optimal learning and performance, so each network has to look at the same physical region of brain tissue. Two things work against this: first, the position of the head can vary slightly while the MRI is captured, and second, human brains vary slightly in size. The FSL framework is used to register the images, giving them the same reference coordinate system and alignment.
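The two FSL-based steps (skull stripping in Sect. 3.4.1 and the affine registration described here) might be scripted along the following lines. The chapter names FSL but not the specific tools or settings, so the use of the `bet` and `flirt` command-line programs, the threshold of 0.5, the 12 degrees of freedom, and every file name are assumptions for illustration.

```python
import shutil
import subprocess

def bet_command(in_path, out_path, frac=0.5):
    """FSL brain extraction; `-f` sets the fractional intensity threshold."""
    return ["bet", in_path, out_path, "-f", str(frac)]

def flirt_command(in_path, ref_path, out_path, mat_path, dof=12):
    """FSL linear registration of in_path onto ref_path (12 DOF = full affine)."""
    return ["flirt", "-in", in_path, "-ref", ref_path,
            "-out", out_path, "-omat", mat_path, "-dof", str(dof)]

def run(cmd):
    """Run the command only if its executable is on PATH; return True if it ran."""
    if shutil.which(cmd[0]) is None:
        return False
    subprocess.run(cmd, check=True)
    return True

# Hypothetical file names, for illustration only.
strip_cmd = bet_command("subject01.nii", "subject01_brain.nii.gz")
register_cmd = flirt_command("subject01_brain.nii.gz", "reference_brain.nii.gz",
                             "subject01_registered.nii.gz", "subject01_affine.mat")
```

Calling `run(strip_cmd)` and then `run(register_cmd)` for every subject, with one shared reference image, would place all volumes in the same coordinate system so that a given patch position covers the same anatomy across subjects.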
3.4.4 Down Sampling
Reducing the size of the network input in all three dimensions reduces the training time. To achieve this, each image is scaled down to 40 × 48 × 48.
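The effect of this step on data volume can be checked with a small sketch. The resampling method is not specified in the chapter, so the nearest-neighbour scheme and the function name below are assumptions; the two shapes are the ones quoted in the text.

```python
import numpy as np

def resize_nearest(volume, target_shape):
    """Nearest-neighbour resampling of a 3D volume to target_shape."""
    grids = [
        # Sample at the centre of each target voxel, clamped to the source range.
        np.minimum((np.arange(t) + 0.5) * s / t, s - 1).astype(int)
        for s, t in zip(volume.shape, target_shape)
    ]
    return volume[np.ix_(*grids)]

volume = np.zeros((240, 256, 160), dtype=np.float32)   # original resolution
small = resize_nearest(volume, (40, 48, 48))           # size used for training
reduction = volume.size / small.size                   # voxel-count ratio
```

The down-sampled volume holds roughly 107 times fewer voxels than the original. Nearest-neighbour sampling simply drops voxels, so smoothing or interpolation before sampling may preserve fine structures better.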
3.5 Architecture of Convolution Neural Network Model
The convolutional structure of a unit network is shown in Fig. 8. Multiple similar networks are used, one for each of the overlapping patches extracted. The reference models extract multiple overlapping patches and additionally group them with k-means clustering; they train one CNN per cluster and then apply ensemble learning to provide the decision at the subject level. In this experiment, the approach is simpler and the dataset is smaller: image registration is used and no clustering is done. Changing the number of patches scanned changes the performance and training time slightly.
Fig. 8 Architecture used for a single CNN network for the approach
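The extraction of overlapping 3D patches, one per unit network, can be sketched as below. The patch size of 16 and the stride of 8 are assumptions, since the chapter does not report them; only the 40 × 48 × 48 input size is taken from the text.

```python
import numpy as np

def extract_patches(volume, patch_size, stride):
    """Return overlapping cubic patches and the corner coordinate of each."""
    patches, corners = [], []
    d, h, w = volume.shape
    for z in range(0, d - patch_size + 1, stride):
        for y in range(0, h - patch_size + 1, stride):
            for x in range(0, w - patch_size + 1, stride):
                patches.append(volume[z:z + patch_size,
                                      y:y + patch_size,
                                      x:x + patch_size])
                corners.append((z, y, x))
    return np.stack(patches), corners

volume = np.random.default_rng(0).random((40, 48, 48))   # stand-in for one subject
patches, corners = extract_patches(volume, patch_size=16, stride=8)
```

With these settings the volume yields 4 × 5 × 5 = 100 patches, each overlapping its neighbours by half. Because all subjects are registered to the same reference, the patch at a given corner covers the same brain region in every subject and can be routed to the same CNN.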
3.6 Prediction Using Boosted Forest
Changes due to AD manifestation can happen in the brain in various ways. Hence, a multiple-CNN method with overlapping patches is reasonable. The input to the random forest model is the inference output of the trained CNNs on the cross-validation data, concatenated as class probabilities. The target output is the cross-validation labels.
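The construction of the forest input described here can be sketched as a simple concatenation. The numbers of subjects, classes, and CNNs are hypothetical, and random softmax outputs stand in for real CNN predictions.

```python
import numpy as np

def stack_patch_probabilities(per_cnn_probs):
    """Concatenate per-CNN class probabilities into one feature row per subject.

    per_cnn_probs: list of (n_subjects, n_classes) arrays, one per patch CNN.
    Returns an array of shape (n_subjects, n_cnns * n_classes).
    """
    return np.concatenate(per_cnn_probs, axis=1)

rng = np.random.default_rng(0)
n_subjects, n_classes, n_cnns = 5, 3, 4          # hypothetical sizes

per_cnn_probs = []
for _ in range(n_cnns):
    logits = rng.normal(size=(n_subjects, n_classes))
    probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
    per_cnn_probs.append(probs)                  # stand-in for one CNN's output

features = stack_patch_probabilities(per_cnn_probs)
```

The resulting matrix, paired with the cross-validation labels, is what a tree ensemble would be fitted on, for example with scikit-learn's `RandomForestClassifier`; the chapter does not name the library it used.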
4 Results, Discussions, and Challenges
The true positive rate (TPR) versus false positive rate (FPR) is shown in Fig. 9, and the multi-label classification report is shown in Fig. 10. The patch-based multi-CNN approach followed by boosting and classification has the advantages of simplicity, reduced computation power, and reduced training time. The performance obtained is affected by the variant architecture used, the different generalization and pre-processing involved, and the comparatively much smaller training dataset.
The author met a few challenges in the study. The inherent complexity of MRI data and of how it is stored is a challenge. Image files contain varying numbers of slices carrying varying information. Figure 11 shows random consecutive slices of images in the sagittal plane after skull stripping, where some slices carry minimal information. Data reduction has to be applied minimally and carefully so that early manifestations of AD are not discarded. Because of differences in datasets and other reasons, the results of other researchers’ work are not easily reproducible. A different approach of converting image data to a standard format has also been stated [12]. The search for a DL architecture is time-consuming because of the large number of choices, and it involves trial and error. For a new researcher in this area, a standardized conceptual development of the architecture can be an option instead of diving into an empirical approach. However, validation is challenging because it requires input from actual clinicians. AD is also a complex disease. From a clinical point of view, using MRI requires expert interpretation in clinical settings. Structural MRI is known to differ with clinical presentation, stage of the disease, and alternate diagnosis or mixed pathology. A known view among medical practitioners is that there are several categories of AD, and that the main types of bio-markers researchers have worked on for decades are neuroimaging, CSF studies, genetic studies, and other blood-based bio-markers. Again, various types of Alzheimer’s can be associated with various bio-markers.

Fig. 9 TPR versus FPR for the model trained with an epoch (AD vs. NC)
Fig. 10 Classification report for multi-label classification (AD, CN, MCI)
Fig. 11 An example of few continuous slices in an MRI file (merged skull stripped images)
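The TPR and FPR values plotted in Fig. 9 follow from the standard confusion-matrix definitions, TPR = TP / (TP + FN) and FPR = FP / (FP + TN). The sketch below computes them at a few thresholds; the labels and scores are invented for illustration and are not results from the chapter.

```python
import numpy as np

def tpr_fpr(y_true, scores, threshold):
    """True and false positive rates for one decision threshold.

    Assumes y_true contains at least one positive and one negative label.
    """
    y_true = np.asarray(y_true).astype(bool)
    y_pred = np.asarray(scores) >= threshold
    tp = np.sum(y_pred & y_true)
    fn = np.sum(~y_pred & y_true)
    fp = np.sum(y_pred & ~y_true)
    tn = np.sum(~y_pred & ~y_true)
    return tp / (tp + fn), fp / (fp + tn)

# Invented example: 1 = AD, 0 = NC, with a classifier score per subject.
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.1]
roc_points = [tpr_fpr(y_true, scores, t) for t in (0.2, 0.5, 0.7)]
```

Sweeping the threshold over all score values and plotting TPR against FPR gives the ROC curve; lowering the threshold raises both rates together.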
5 Conclusion
In future, the author needs to experiment with other neuroimaging modalities and with longitudinal data using CNN models proven efficient in vision tasks, and to integrate the AD score from such a classifier with other bio-marker scales for accurate and timely signature generation. Another non-invasive and cheaper method, using EEG data to differentiate AD from other brain diseases, could be worth exploring. For this, the availability of both kinds of data for the same subject at the same place needs to be ensured. Deploying the model at a common place, so that only the model is deployed and the data remain at the user end to avoid ethical concerns, needs to be investigated. Because MRI is non-invasive, popular, and commonly used by medical practitioners, image data from multiple visits of the same patients have become available. So there is hope for better methods that make the MRI bio-marker a novel one. For ML to help build such a bio-marker from this widely relied-upon MRI image data, the first step is an efficient classifier that predicts (detects) on the basis of the findings in the image data. This is the same as a practitioner’s first step in an investigation. CNNs can be a backbone for this medical machine vision.
References
1. J. Wen, E. Thibeau, M. Diaz, J. Routier, A. Bottani, S. Didier, S. Durrleman, N. Burgo, O. Colliot, Convolutional Neural Networks for Classification of Alzheimer’s Disease: Overview and Reproducible Evaluation (2019)
2. D. Cheng, M. Liu, CNNs based multi-modality classification for AD diagnosis, in 10th International Congress on Image and Signal Processing, BioMedical Engineering and Informatics (CISP-BMEI), Shanghai, pp. 1–5 (2017). https://doi.org/10.1109/CISP-BMEI.2017.8302281
3. F. Li, M. Liu, Alzheimer’s disease diagnosis based on multiple cluster dense convolutional networks. Comput. Med. Imag. Graph. 70, 101–110 (2018). https://doi.org/10.1016/j.compmedimag.2018.09.009
4. M. Liu, D. Cheng, K. Wang, Y. Wang, The Alzheimer’s Disease Neuroimaging Initiative, Multi-modality cascaded convolutional neural networks for Alzheimer’s disease diagnosis. Springer (2018). Retrieved on 15 Jul 2018
5. G. Hinton, Deep learning: a technology with the potential to transform healthcare. JAMA 320(11), 1101–1102 (2018). https://doi.org/10.1001/jama.2018.11100
6. Y. LeCun, Y. Bengio, G. Hinton, Deep learning. Nature 521, 436–444 (2015). https://doi.org/10.1038/nature14539
7. L. Chen, G. Papandreou, I. Kokkinos, K. Murphy, A.L. Yuille, DeepLab: semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs. IEEE Trans. Pattern Anal. Mach. Intell. 40(4), 834–848 (2018). https://doi.org/10.1109/TPAMI.2017.2699184
8. B. Duraisamy, J.V. Shanmugam, J. Annamalai, Alzheimer disease detection from structural MR images using FCM based weighted probabilistic neural network. Brain Imag. Behav. 13, 87–110 (2019)
9. C. Feng et al., Deep learning framework for Alzheimer’s disease diagnosis via 3D-CNN and FSBi-LSTM. IEEE Access 7, 63605–63618 (2019). https://doi.org/10.1109/ACCESS.2019.2913847
10. W. Lin, T. Tong, Q. Gao, D. Guo, X. Du, Y. Yang, G. Guo, M. Xiao, M. Du, X. Qu, Convolutional neural networks-based MRI image analysis for the Alzheimer’s disease prediction from mild cognitive impairment. Front. Neurosci. 12, 777 (2018). https://doi.org/10.3389/fnins.2018.00777
11. S. Liu, S. Liu, W. Cai, H. Che, S. Pujol, R. Kikinis, M.J. Fulham, Multi-modal neuroimaging feature learning for multi-class diagnosis of Alzheimer’s disease (2015). Available on 5 Jan 2019
12. B. Lei, S. Chen, D. Ni, T. Wang, ADNI, Discriminative learning for Alzheimer’s disease diagnosis via canonical correlation analysis and multimodal fusion. Front. Aging Neurosci. (2016). https://doi.org/10.3389/fnagi.2016.00077
13. F. Citak-Er, D. Goularas, B. Ormeci, A novel convolutional neural network model based on voxel-based morphometry of imaging data in predicting the prognosis of patients with mild cognitive impairment. J. Neurol. Sci. 34(1), 52–69 (2017)
14. R. Li, W. Zhang, H.-I. Suk, W. Li, J. Li, D. Shen, S. Ji, Deep learning based imaging data completion for improved brain disease diagnosis. Med. Image Comput. Comput. Assist. Interv. 17(0 3), 305–312 (2014). Available from https://adni.loni.usc.edu
15. L. Yuan, Y. Wang, P.M. Thompson, V.A. Narayan, J. Ye, Multi-source feature learning for joint analysis of incomplete multiple heterogeneous neuroimaging data. Neuroimage 61(3), 622–632 (2012). https://doi.org/10.1016/j.neuroimage.2012.03.059
16. M. Liu, J. Zhang, E. Adeli, D. Shen, Joint classification and regression via deep multi-task multi-channel learning for Alzheimer’s disease diagnosis. IEEE Trans. Biomed. Eng. 66(5), 1195–1206 (2019)
17. L. Cao, L. Li, J. Zheng, X. Fan, F. Yin, H. Shen, J. Zhang, Multi-task neural networks for joint hippocampus segmentation and clinical score regression (2018)
18. C.B.F. Johnson, Predicting Alzheimer’s disease using MRI data and ensembled convolution neural network. Scholarly article available from Google Scholar (2018)
19. G. Lee, K. Nho, B. Kang, K.A. Sohn, D. Kim, Predicting Alzheimer’s disease progression using multi-modal deep learning approach. Sci. Rep. 9, 1952 (2019). https://doi.org/10.1038/s41598-018-37769-z
20. H. Suk, S.W. Lee, D. Shen, Deep ensemble learning of sparse regression models for brain disease diagnosis. Med. Imag. Anal. 37, 101–113 (2017). https://doi.org/10.1016/j.media.2017.01.008
21. J. Islam, Y. Zhang, Brain MRI analysis for Alzheimer’s disease diagnosis using an ensemble system of deep convolutional neural networks. Brain Inf. 5, 2 (2018). https://doi.org/10.1186/s40708-018-0080-3
22. S. Banerjee, H.S. Arora, S. Mitra, Ensemble of CNNs for segmentation of glioma sub-regions with survival prediction, in International MICCAI Brainlesion Workshop, pp. 37–49 (2019). Article online from Google Scholar
23. C. Wu, S. Guo, Y. Hong, B. Xiao, Y. Wu, Q. Zhang, Discrimination and conversion prediction of mild cognitive impairment using convolutional neural networks. Quant. Imag. Med. Surg. 8(10), 992–1003 (2018)
24. C. Feng et al., Deep learning framework for Alzheimer’s disease diagnosis via 3D-CNN and FSBi-LSTM. IEEE Access 7, 63605–63618 (2019). https://doi.org/10.1109/ACCESS.2019.2913847
25. W. Yan, H. Zhang, J. Sui, D. Shen, Deep chronnectome learning via bidirectional long short-term memory networks for MCI diagnosis. NCBI, PMC, 6 Jun 2019 (2019)
26. M. Liu, D. Cheng, W. Yan, Classification of Alzheimer’s disease by combination of convolution and recurrent neural networks using FDG-PET images. Front. Neuroinform. (19 June 2018)
27. J. Hu, L. Shen, G. Sun, Squeeze-and-excitation networks, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 7132–7141 (2018)
28. D. Pan, A. Zeng, L. Jia, Y. Huang, T. Frizze, X. Song, Early detection of Alzheimer’s disease using magnetic resonance imaging: a novel approach combining convolutional neural networks and ensemble learning. Front. Neurosci. 14 (2020). https://doi.org/10.3389/fnins.2020.00259
29. M. Jenkinson et al., FSL. NeuroImage (2011). https://doi.org/10.1016/j.neuroimage.2011.09.015
30. B. Cheng, M. Liu, D. Zhang et al., Robust multi-label transfer feature learning for early diagnosis of Alzheimer’s disease. Brain Imag. Behav. 13, 138–153 (2019)
Chapter 7
Machine Learning and IoT-Based Ultrasonic Humidification Control System for Longevity of Fruits and Vegetables
A. K. Gautham, A. Abdulla Mujahid, G. Kanagaraj, and G. Kumaraguruparan
1 Introduction
1.1 Impacts of Post-harvest Losses
Food is an essential part of life: it keeps us alive and provides the energy for our day-to-day activities. Farmers produce enough food to feed 1.5 times the total world population, yet global hunger keeps increasing. According to the FAO (Food and Agriculture Organization), 30–40% of the food produced is lost every year, and post-harvest losses play a major role in these losses. Fruits and vegetables are an integral part of our food chain. Harvested fruits and vegetables are wasted during transportation, storage, etc. before they reach the end-users. Because fruits and vegetables are generally perishable, their storage is complicated. Proper storage is required for the safe distribution of fruits and vegetables to customers.
A. K. Gautham (B) · A. A. Mujahid · G. Kanagaraj · G. Kumaraguruparan
Thiagarajar College of Engineering, Madurai, Tamil Nadu, India

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022. R. Tiwari et al. (eds.), Proceedings of International Conference on Computational Intelligence, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-16-3802-2_7
1.2 Factors Affecting Food Storage

Many factors influence the storage of fruits and vegetables. Upon analysis, we have narrowed these influences down to two major factors: temperature and humidity. By controlling both of these parameters, we can increase the shelf life of fruits and vegetables [1]. Fruits and vegetables with high moisture content should be stored at high humidity and low temperature, and vice versa. Humidity is also an important factor that determines indoor air quality and thermal comfort [2]. Temperature and humidity can be controlled autonomously by employing technologies such as IoT and machine learning [3].

Humidity can be controlled using humidification and dehumidification processes. However, humidification is much simpler and more cost-efficient to achieve than dehumidification [4]. Humidification is the process of increasing the humidity of the ambient atmosphere. Increasing the humidity of the surrounding atmosphere also reduces its temperature. This can be achieved by using humidifiers [5].
1.3 Selection of a Suitable Humidifier

Different types of humidifiers are available for our application. A brief comparison between them is given in Table 1. The comparison shows that ultrasonic humidifiers are best suited to our application, a portable storage facility for harvested fruits and vegetables, since they are highly economical, portable, and safe, and require negligible maintenance. There is, however, a trade-off when choosing ultrasonic humidifiers: the mineral deposits in the water are discharged as white/grey dust. This problem can be overcome by using a decalcification filter in the humidification module or by using reverse osmosis water, which has fewer mineral deposits.

Table 1 Comparison between types of humidifiers

| Humidifier type  | Affordability | Portability  | Maintenance | Safety   |
|------------------|---------------|--------------|-------------|----------|
| Central          | Expensive     | Not portable | Complicated | Safe     |
| Evaporative      | Inexpensive   | Portable     | Complicated | Safe     |
| Impeller         | Inexpensive   | Portable     | Complicated | Safe     |
| Steam vaporizers | Inexpensive   | Portable     | Easy        | Not safe |
| Ultrasonic       | Inexpensive   | Portable     | Easy        | Safe     |
1.4 Ultrasonic Humidifiers

The ultrasonic humidification process is based on the inverse piezoelectric effect. When high-frequency electrical signals that match the resonant frequency of a piezoelectric disc are supplied to it, strong mechanical oscillations are produced. When this happens under water, the mechanical vibrations produce very small water droplets that are catapulted into the air, which increases the ambient humidity [6].

This type of humidifier also has a limitation. The mineral deposits from the water settle as white/grey dust when these humidifiers are used [7], which can result in several health-related issues [8, 9]. This can be prevented by using reverse osmosis water or a decalcification filter.

Ultrasonic humidification has been shown to reduce post-harvest losses when implemented at large scale [10]. However, it requires large storage facilities (cold storage plants) and an expensive setup. It is almost impossible for individual farmers to gain access to a cold storage plant for storing their harvests. If farmers are unable to sell their harvests due to low market demand or other reasons, they are forced to sell them at a much lower price, or else their harvested fruits and vegetables are wasted due to the lack of a proper storage facility.
1.5 Machine Learning and IoT

Machine learning gives a system the ability to learn from data, reducing the need for explicit programming. An IoT system consists of several interrelated computing devices that can communicate data with each other over a network [11]. Using machine learning and IoT frameworks to control ultrasonic humidifiers results in a smart humidification control system for harvested fruits and vegetables. A machine learning model is used to detect various fruits/vegetables. An IoT framework is used to control the humidifier automatically [12] based on the real-time temperature and humidity data inside the storage facility, so that the harvested fruits and vegetables are always stored under their optimal storage conditions to ensure their longevity.
2 Experimental Setup

The prototype storage facility contains two main modules: the humidification module and the container module. The detailed design of the overall system is described using the schematic shown in Fig. 1. A machine learning model is used to detect various fruits/vegetables. An IoT framework is used to control the humidifier automatically [13] based on the real-time temperature and humidity data inside the storage facility, so that the harvested fruits and vegetables are always stored under their optimal storage conditions to ensure their longevity.

Fig. 1 Detailed design of the system
2.1 Hardware Setup

Humidification Module

A 12 V battery is used to power the fan. The 12 V supply is stepped down to 5 V using an LM7805 IC to power the NodeMCU and the relay and ensure proper operation. The ultrasonic humidifier has an adapter which converts 240 V AC to 24 V DC. The controller's digital pins D6 and D7 act as the control pins for the two-channel relay to which the fan and the humidifier are connected. The circuit connections are shown in Fig. 2.

Container Module

A 9 V battery is used to power the controller after being stepped down to 5 V by a voltage regulator IC (LM7805). The signal pin of the sensor is connected to the controller's digital pin D4 for fetching temperature and humidity data. The circuit connections are shown in Fig. 3.
Fig. 2 Circuit connections (humidifier module)
Fig. 3 Circuit connections (container module)
2.2 Software Setup

Machine Learning Model

We have implemented a model called "MobileNet", which was trained using the "ImageNet" database [13, 14]. We have also custom-trained the model through transfer learning [15, 16]. It is able to differentiate four different fruits and vegetables, and it can be extended to identify a wider variety of fruits and vegetables through proper training. The outputs of the machine learning model are shown in Fig. 4.

Fig. 4 Outputs of the Machine Learning model

Data Transfer and Storage

The output of the machine learning model is stored in a text file (fruitvegetablename.txt) on the server by a back-end program (write.php). The sensor's readings are also stored in the database, and the latest reading is stored in a text file (data.txt) by a back-end program (postdata.php). The controller in the humidification module fetches these data to control the humidifier. The data stored in the database are also used in the IoT dashboard.

Control and Monitoring

The controller in the container module fetches the data from the sensor and sends them to a back-end program (postdata.php) via a POST request. The other controller, in the humidification module, fetches the output of the ML model and the most recent temperature and humidity readings from the server. It compares them with the optimal values for the stored fruit/vegetable to make control decisions.

IoT Dashboard

The IoT dashboard provides visual feedback of the control data. The current temperature and humidity data are provided by a back-end program (senddata.php), which retrieves the latest data from the database and sends them to the dashboard through AJAX. The IoT dashboard is updated automatically every four seconds. The IoT dashboard for various fruits and vegetables is shown in Fig. 5.
Fig. 5 IoT dashboard

3 Implementation

A machine learning model detects the fruit/vegetable placed inside the container. The output of this model is sent to a back-end program for storage on the server. The controller inside the container module fetches current data from the temperature and humidity sensor and sends them to a back-end program, which stores them on the server. The controller in the humidification module fetches these data and compares them with the optimal temperature and humidity for the stored fruit/vegetable. This comparison is used to perform control operations in the humidification module. If the current temperature is lower than the optimal temperature or the current humidity is higher than the optimal humidity, the humidifier and fan are turned OFF; conversely, they are turned ON when the temperature is above the optimum or the humidity is below it. The overall implementation of the proposed prototype storage facility is shown in Fig. 6.
Fig. 6 Implementation of the proposed storage facility (Prototype)
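The control rule described in this section can be illustrated with a short sketch. It is written in Python purely for illustration (the prototype itself runs on NodeMCU controllers), and it rests on several assumptions that are not stated in the text: the function name `humidifier_should_run` is hypothetical, single setpoints are used rather than ranges, and the OFF condition is assumed to take precedence whenever the two conditions conflict.

```python
def humidifier_should_run(temp_c, rh_percent, optimal_temp_c, optimal_rh_percent):
    """Return True if the humidifier and fan should be switched ON.

    Hypothetical helper, not the authors' firmware. The OFF rule follows
    the text: switch OFF when the container is already colder than the
    optimum or already more humid than the optimum. Giving the OFF rule
    precedence in conflicting cases is an assumption.
    """
    too_cold = temp_c < optimal_temp_c
    too_humid = rh_percent > optimal_rh_percent
    if too_cold or too_humid:
        return False
    return True


# Setpoints for tomato taken from Table 2: 95% RH and 21 degrees C.
TOMATO_RH = 95
TOMATO_TEMP_C = 21

# A warm, dry reading such as the outside conditions in Sect. 4.3.
print(humidifier_should_run(36, 30, TOMATO_TEMP_C, TOMATO_RH))
```

In a real controller the boolean result would drive the two relay channels; some hysteresis around the setpoints would probably be needed to avoid rapid switching, which the text does not discuss.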
4 Results and Discussions

We have tested and validated the prototype by comparing the proposed smart storage system against the conventional storage methods adopted by farmers. We have chosen mint, mushroom, tomato, and apple for this test since they perish faster than most fruits and vegetables. The optimal storage humidity and temperature of these fruits and vegetables are shown in Table 2. The results obtained for the storage of these four fruits and vegetables are discussed in detail below.
4.1 Storage of Mint

For comparison, we placed one sprig of mint in the prototype's container unit and another sprig outside the container unit. The sprig stored outside the container, at 37 °C and 40% RH, lost its moisture and started to dry within 30 minutes, while the sprig stored inside the container unit remained fresh, as shown in Fig. 7. The results obtained from the prototype storage unit are also plotted for better visualization in Fig. 8.

Fig. 7 Comparison between the mint sprig placed inside the container (left) and the sprig placed outside the container (right)

Table 2 Optimal storage humidity and temperature of the fruits and vegetables to be tested

| Fruit/vegetable | Optimal relative humidity (% RH) | Optimal temperature (°C) |
|-----------------|----------------------------------|--------------------------|
| Mint            | 90–95                            | 0–8                      |
| Mushroom        | 95                               | 0–8                      |
| Tomato          | 95                               | 21                       |
| Apple           | 95                               | 4                        |
Fig. 8 Temperature (left) and humidity (right) graphs obtained from the prototype storage unit for the storage of a mint sprig
4.2 Storage of Mushroom

We cut a mushroom in half to induce a faster deterioration rate and obtain quicker and clearer results. One portion of the cut mushroom was stored inside the container unit of the prototype, and the other portion was stored outside the container unit at 37 °C and 40% RH. As expected, black spots started to appear on the portion placed outside the container, and it started to shrink within 50 minutes of storage, while the portion placed inside the container remained fresh, as shown in Fig. 9. The results obtained from the prototype storage unit are also plotted for better visualization in Fig. 10.

Fig. 9 Comparison between the mushroom placed inside the container (left) and the mushroom outside the container (right)
Fig. 10 Temperature (left) and humidity (right) graphs obtained from the prototype storage unit for the storage of a mushroom
Fig. 11 Comparison between the tomato placed inside the container (left) and the tomato outside the container (right)
4.3 Storage of Tomato

We cut a hybrid tomato in half to obtain clearer results in a shorter time span. One portion of the cut tomato was stored outside the module at 36 °C and 30% RH, and the other half was stored in the module, where temperature and humidity were controlled automatically. After 2 h of storage, the portion stored outside the module started to lose moisture from its pulp, and the seeds were exposed. The other portion remained fresh with negligible water loss, and the pulp stayed intact, as shown in Fig. 11. The results obtained from the prototype storage unit are also plotted for better visualization in Fig. 12.
Fig. 12 Temperature (left) and humidity (right) graphs obtained from the prototype storage unit for the storage of a tomato

4.4 Storage of Apple

We cut an apple and placed one portion at 33 °C and 40% RH. Another portion was placed inside the module with controlled humidity and temperature. After about 40 minutes of storage, the portion placed outside the module started browning at a noticeably higher rate than the portion placed inside the module, which remained fresh with significantly fewer brown spots, as shown in Fig. 13. The results obtained from the prototype storage unit are also plotted for better visualization in Fig. 14.

Fig. 13 Comparison between the apple placed inside the container (left) and the apple outside the container (right)
Fig. 14 Temperature (left) and humidity (right) graphs obtained from the prototype storage unit for the storage of an apple

5 Conclusion

Recent research suggests that the humidification process has the potential to reduce post-harvest losses in supply chain management, yet farmers are still affected by these losses. We have developed a prototype storage unit with a commercial humidifier module that works on the principle of ultrasonic humidification. The prototype storage unit can automatically control the temperature and humidity to achieve optimal storage conditions, and it also provides visual feedback on them. We have compared our prototype's storage performance against the storage methods adopted by farmers.

The prototype's performance can be extended further by using a vacuum-sealed storage unit to ensure better storage of harvested fruits and vegetables. Temperature and humidity sensors with a wider detection range and greater accuracy can be used. A phase change material (PCM) lining can be added to the storage container for additional passive and precise control of temperature.
References

1. D. Mohapatra, S. Mishra, S. Giri, A. Kar, Application of hurdles for extending the shelf life of fresh fruits, in Trends in Post Harvest Technology, vol. 1, no. 1, pp. 37–54 (2013). Available: https://www.researchgate.net/publication/259841724
2. M. Qin, P. Hou, Z. Wu, J. Wang, Precise humidity control materials for autonomous regulation of indoor moisture. Build. Environ. 169, 106581 (2020). https://doi.org/10.1016/j.buildenv.2019.106581
3. B. Hernández, A. Olejua, J. Olarte, Automatic humidification system to support the assessment of food drying processes, in IOP Conference Series: Materials Science and Engineering, vol. 138, p. 012019 (2016). https://doi.org/10.1088/1757-899x/138/1/012019
4. J. Perret, A. Al-Ismaili, S. Sablani, Development of a humidification–dehumidification system in a Quonset greenhouse for sustainable crop production in arid regions. Biosyst. Eng. 91(3), 349–359 (2005). https://doi.org/10.1016/j.biosystemseng.2005.04.009
5. Z. Feng, X. Zhou, S. Xu, J. Ding, S. Cao, Impacts of humidification process on indoor thermal comfort and air quality using portable ultrasonic humidifier. Build. Environ. 133, 62–72 (2018). https://doi.org/10.1016/j.buildenv.2018.02.011
6. G. Scott, Ultrasonic Mist Maker: DIY or Buy (Oct. 22, 2017). Accessed: Jan. 26, 2020. [Online Video]. Available: https://www.youtube.com/watch?v=aKhPj7uFD0Y&t=248s
7. A. Sain, J. Zook, B. Davy, L. Marr, A. Dietrich, Size and mineral composition of airborne particles generated by an ultrasonic humidifier. Indoor Air 28(1), 80–88 (2017). https://doi.org/10.1111/ina.12414
8. W. Yao, D. Gallagher, L. Marr, A. Dietrich, Emission of iron and aluminum oxide particles from ultrasonic humidifiers and potential for inhalation. Water Res. 164, 114899 (2019). https://doi.org/10.1016/j.watres.2019.114899
9. W. Yao, R. Dal Porto, D. Gallagher, A. Dietrich, Human exposure to particles at the air-water interface: influence of water quality on indoor air quality from use of ultrasonic humidifiers. Environ. Int. 143, 105902 (2020). https://doi.org/10.1016/j.envint.2020.105902
10. S. Fabbri, S. Olsen, M. Owsianiak, Improving environmental performance of post-harvest supply chains of fruits and vegetables in Europe: potential contribution from ultrasonic humidification. J. Cleaner Prod. 182, 16–26 (2018). https://doi.org/10.1016/j.jclepro.2018.01.157
11. M.U. Farooq, M. Waseem, S. Mazhar, A. Khairi, T. Kamal, A review on Internet of Things (IoT). Int. J. Comput. Appl. 113(1), 1–7 (2015). https://doi.org/10.5120/19787-1571
12. J.B. Susa, Automatic room humidifier and dehumidifier controller using Arduino Uno. Int. J. Adv. Trends Comput. Sci. Eng. 9(2), 2208–2212 (2020). https://doi.org/10.30534/ijatcse/2020/198922020
13. D. Shiffman, ml5.js: Image Classification with MobileNet (Aug. 1, 2018). Accessed: Feb. 6, 2020. [Online Video]. Available: https://www.youtube.com/watch?v=yNkAuWz5lnY&t=1207s
14. ml5: A friendly machine learning library for the web. Learn.ml5js.org (2020). [Online]. Available: https://learn.ml5js.org/
15. Teachable Machine. Teachablemachine.withgoogle.com (2020). [Online]. Available: https://teachablemachine.withgoogle.com/
16. K. Weiss, T. Khoshgoftaar, D. Wang, A survey of transfer learning. J. Big Data 3(1) (2016). https://doi.org/10.1186/s40537-016-0043-6
Chapter 8
Classification of Melanoma Using Efficient Nets with Multiple Ensembles and Metadata

Vardan Agarwal, Harshit Jhalani, Pranav Singh, and Rahul Dixit
1 Introduction

Skin cancer is the most common of all cancers. Melanoma [1] accounts for only about 1% of skin cancers but causes a large majority of skin cancer deaths, and the risk of melanoma increases as individuals age. The average age at diagnosis is 65, although the disease is not uncommon even among those younger than 30. A thorough understanding of incidence trends, risk factors, mortality trends, and other factors such as sex and the area infected can help the diagnosis [2]. At present, dermatologists assess all of a patient's moles to identify outlier lesions that are most likely to be melanoma. They can enhance their diagnostic accuracy with the help of detection algorithms that consider these images along with the patient-level metadata to determine which cases represent a harmful melanoma.

This paper describes our approach to the 2020 SIIM-ISIC Melanoma Classification Kaggle challenge, where skin lesions have to be classified as benign or malignant. The ISIC archive is the largest publicly available collection of data on skin lesions. In this challenge, patient-level data is provided along with the image data to make more accurate predictions that can ultimately help diagnose this deadly disease as early as possible. Samples from the dataset are shown in Fig. 1.

Fig. 1 Images from the dataset

We use the entire family of Efficient Net [3] models, B0 to B7, for the method. The data is augmented beforehand using standard augmentation techniques and random sprinkling. An attention mechanism [4] is used along with Squeeze and Excitation Nets (SENets) [5] using Swish activation functions [6], with the Efficient Nets as the backbone. This is combined with the patient-level metadata of each image, such as age, sex, and the body area where the lesion was found, to classify the lesion as benign or malignant. A K-fold [7] approach is utilized, and five models are subsequently trained for each Efficient Net. These folds are combined to give a single prediction by using a majority mean ensemble technique. Finally, the predictions of all eight models are combined using the absolute correlation ensemble to give a final prediction.

V. Agarwal · H. Jhalani · P. Singh
Manipal University Jaipur, Jaipur, Rajasthan 303007, India
R. Dixit (B)
Indian Institute of Information Technology Pune, Pune, Maharashtra 411048, India

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022. R. Tiwari et al. (eds.), Proceedings of International Conference on Computational Intelligence, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-16-3802-2_8
1.1 Key Contributions

We propose using a different image size for each Efficient Net model, namely the size recommended for that model by its creators. This allows different models to capture different features that might have been missed if only a single size were used. Augmentation techniques such as random sprinkling, together with Gaussian dropout [8], which adds noise, are used because the size of the mole varies and the images contain a lot of hair, which can confuse the classifier. This enables the model to generalize better and be more robust against over-fitting. The feature maps extracted from the Efficient Net models, which use the noisy-student weights [9] instead of the standard ImageNet weights, are passed to SENets and attention models to force the model to localize the infected area and make predictions from it. Also, as the dataset is highly imbalanced, we suggest using focal loss instead of the more common categorical cross-entropy loss that is generally used in existing works. Finally, we introduce two new ensemble techniques, the Majority Mean ensemble and the Absolute Correlation ensemble, which are used to combine the different sets of predictions.

The rest of the paper is organized as follows. A brief review of some state-of-the-art techniques is provided in Sect. 2. A background of the techniques used and our method in detail are presented in Sect. 3. The experimental results are presented in Sect. 4, and the conclusion wraps up the paper in Sect. 5.
2 Literature Review

Over the years, many researchers have tried to solve the problem of accurately classifying skin cancer lesions. Gulati et al. [10] used pre-trained deep learning classifiers such as VGG16 and AlexNet both for transfer learning and as feature extractors. For image preprocessing, they used DullRazor, a software tool that removes hairs from images. They found that transfer learning using VGG16 outperformed the other methods. Nahata et al. [11] also used transfer learning, tried a total of five different pre-trained architectures, and experimented with different pooling and dropout layers. With a categorical cross-entropy loss function, they found that Inception–ResNet gave the best results, with 91% accuracy, compared with InceptionV3, VGG-16, MobileNet, and ResNet-50. Amin et al. [12] used the luminance channel of the LAB colorspace instead of RGB as input to pre-trained networks such as AlexNet and VGG-16. PCA was applied to the feature maps from the model for feature selection, and the selected features were then used to classify the cancer as benign or malignant. They also segmented out the infected area using the biorthogonal 2D wavelet transform and the Otsu algorithm. Gessert et al. [13] used an ensemble of Efficient Nets with pretrained ImageNet weights combined with SENets and ResNeXt WSL, which were selected using a search strategy, with weighted cross-entropy as the loss function. Patient metadata was also utilized through a dense neural network branch. They also used multiple image resolutions as inputs, applied two different cropping strategies, and predicted an additional class using a data-driven approach.
3 Proposed Method

In this section, we first provide a background of the different techniques and algorithms used and then combine them to describe our method in detail.
3.1 Random Sprinkling

Most of the intense data augmentation methods, such as CutOut [14], RICAP [15], and CutMix [16], do not work well for skin lesions. The reason is that the infected area occurs at a random location in a skin lesion image, whereas the rest of the image is perfectly normal. If a large area is randomly clipped with CutMix or RICAP, or the affected area is blocked out with a big black block via CutOut, the neural network looks only at the benign section of the image, which does not result in an accurate model. Thus, instead of using CutOut, which could leave no usable information to learn from, the block is broken up into tiny blocks that are randomly sprinkled around the image, thereby masking only certain random sections rather than one entire contiguous section.
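To make the idea concrete, the following is a minimal pure-Python sketch of random sprinkling. It is not the authors' implementation: the function name `random_sprinkle`, the default block count and block size, and the fill value of zero are all assumptions chosen for illustration, and the image is represented as a plain list of rows for simplicity.

```python
import random


def random_sprinkle(image, n_blocks=8, block_size=2, fill=0, rng=None):
    """Return a copy of `image` with several small square blocks masked.

    `image` is a list of rows of pixel values. All parameter names and
    defaults are illustrative assumptions, not values from the paper.
    """
    rng = rng or random.Random()
    height, width = len(image), len(image[0])
    out = [list(row) for row in image]  # copy so the input is not modified
    for _ in range(n_blocks):
        # Pick the top-left corner so the block stays inside the image.
        top = rng.randrange(0, height - block_size + 1)
        left = rng.randrange(0, width - block_size + 1)
        for r in range(top, top + block_size):
            for c in range(left, left + block_size):
                out[r][c] = fill
    return out


# Example: mask a 16 x 16 dummy image with eight 2 x 2 blocks.
dummy = [[1] * 16 for _ in range(16)]
sprinkled = random_sprinkle(dummy, rng=random.Random(0))
```

Because the masked blocks are small and scattered, at most `n_blocks * block_size ** 2` pixels are hidden, so a lesion is unlikely to be covered completely, in contrast to a single large CutOut block.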
3.2 Efficient Net

Efficient Nets are based on the concept of compound scaling, in which the model depth, the model width, and the input image resolution are scaled simultaneously rather than one at a time, since scaling a single dimension stops improving the network's performance beyond a certain point. All the versions of Efficient Net, numbered B0 to B7, have been optimized for accuracy and floating-point operations. The architecture uses seven inverted residual blocks, each with a different setting.
3.3 Attention Model

The primary objective of attention models is to focus on a specific portion of the data, i.e., to pay more attention to the mole in our case. Global average pooling (GAP) [17] is too simplistic, since some regions are more relevant than others. We therefore build an attention mechanism that turns pixels on and off before the pooling and then rescales the results (using a Lambda layer) based on the number of pixels. The model can be seen as a sort of "global weighted average" pooling.
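The "global weighted average" pooling described above can be sketched as follows. This is an illustrative plain-Python version under stated assumptions, not the authors' Keras model: the function name `attention_pool` is hypothetical, the attention weights are taken as given rather than learned, and the rescaling divides by the sum of the attention weights.

```python
def attention_pool(features, attention):
    """Globally pool `features` using per-pixel attention weights.

    features:  H x W x C nested lists (one feature vector per pixel).
    attention: H x W nested lists of weights in [0, 1].
    Returns a list of C pooled values. Hypothetical helper for illustration.
    """
    n_channels = len(features[0][0])
    total_weight = sum(sum(row) for row in attention)
    if total_weight == 0:
        raise ValueError("attention mask switches off every pixel")
    pooled = [0.0] * n_channels
    for feat_row, att_row in zip(features, attention):
        for pixel, weight in zip(feat_row, att_row):
            for ch in range(n_channels):
                pooled[ch] += weight * pixel[ch]
    # Rescale by the total attention mass instead of the pixel count.
    return [value / total_weight for value in pooled]


# With all weights equal to 1 this reduces to ordinary GAP.
features = [[[1.0], [3.0]], [[5.0], [7.0]]]
uniform = [[1, 1], [1, 1]]
print(attention_pool(features, uniform))
```

In the network itself the attention weights would be produced by learned layers on top of the backbone feature maps; here they are passed in directly to keep the sketch self-contained.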
3.4 Majority Mean Ensemble

The most common ensemble technique is to take the mean of all the models. However, this means that outliers affect the final prediction, which can negatively impact the score, especially when sensitive metrics like AUC-ROC [18] are used. To counter this, we propose a majority mean ensemble technique that finds which class is voted for by the majority of the models and then takes the average of only those models that voted for that class, as shown in Eqs. (1) and (2).

\[
T[j] = \sum_{i=1}^{n}
\begin{cases}
0, & \text{if } \mathrm{model}_i[j] < 0.5 \\
1, & \text{otherwise}
\end{cases}
\tag{1}
\]

\[
\mathrm{pred}[j] =
\begin{cases}
\dfrac{1}{n - T[j]} \sum\limits_{i:\,\mathrm{model}_i[j] < 0.5} \mathrm{model}_i[j], & \text{if } T[j] < n/2 \\[2ex]
\dfrac{1}{T[j]} \sum\limits_{i:\,\mathrm{model}_i[j] \ge 0.5} \mathrm{model}_i[j], & \text{if } T[j] > n/2 \\[2ex]
\dfrac{1}{n} \sum\limits_{i=1}^{n} \mathrm{model}_i[j], & \text{if } T[j] = n/2
\end{cases}
\tag{2}
\]

where n denotes the number of models, j denotes the index of the test case, T[j] is the number of models that vote for the positive class on test case j, and pred is the prediction from the ensemble.
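The majority mean ensemble described in this section can be sketched in plain Python as follows. This is an illustrative reading of the technique rather than the authors' code: the function name `majority_mean_ensemble` is hypothetical, and a prediction of exactly 0.5 is assumed to count as a vote for the positive class.

```python
def majority_mean_ensemble(predictions, threshold=0.5):
    """Combine model outputs by averaging only the majority voters.

    predictions: list of n lists, each holding one probability per test case.
    Returns one ensembled probability per test case.
    """
    n_models = len(predictions)
    n_cases = len(predictions[0])
    ensembled = []
    for j in range(n_cases):
        scores = [model[j] for model in predictions]
        positive = [s for s in scores if s >= threshold]
        negative = [s for s in scores if s < threshold]
        if 2 * len(positive) > n_models:
            # Majority votes positive: average only the positive voters.
            ensembled.append(sum(positive) / len(positive))
        elif 2 * len(negative) > n_models:
            # Majority votes negative: average only the negative voters.
            ensembled.append(sum(negative) / len(negative))
        else:
            # Tie: fall back to the plain mean of all models.
            ensembled.append(sum(scores) / n_models)
    return ensembled


# Three hypothetical folds scoring three test cases.
folds = [
    [0.9, 0.1, 0.8],
    [0.8, 0.2, 0.2],
    [0.1, 0.3, 0.6],
]
print(majority_mean_ensemble(folds))
```

For the first test case, the outlier score of 0.1 is ignored and the result is the mean of 0.9 and 0.8, whereas a plain mean would have been pulled down to 0.6.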
3.5 Absolute Correlation Ensemble

We aim to increase the accuracy of the best-performing model using the other models. In this method, we first calculate the correlation between the best model and each of the remaining models. The model with the least correlation is selected, as it will make our final result more versatile. We then find the indices where the difference between the predictions of these two models is at least twice the absolute difference of their mean predictions, and we replace the best model's predictions at those indices with the average of all models, as shown in Eq. (5).

\[
\mathrm{avg1} = \frac{\sum_{j=1}^{m} \mathrm{model}_1[j]}{m}
\tag{3}
\]

\[
\mathrm{avg2} = \frac{\sum_{j=1}^{m} \mathrm{model}_2[j]}{m}
\tag{4}
\]

\[
\mathrm{pred}[j] =
\begin{cases}
\dfrac{\sum_{i=1}^{n} \mathrm{model}_i[j]}{n}, & \text{if } \left|\mathrm{model}_1[j] - \mathrm{model}_2[j]\right| \ge 2 \times \left|\mathrm{avg1} - \mathrm{avg2}\right| \\[2ex]
\mathrm{model}_1[j], & \text{otherwise}
\end{cases}
\tag{5}
\]

where model_1 and model_2 are the best-performing model and the model with the least correlation, respectively, j denotes the current index, m is the total number of test cases, n represents the number of models to ensemble, and pred is the prediction from the ensemble.
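A compact sketch of the absolute correlation ensemble is given below. It is only one possible reading of the description: the function names are hypothetical, the correlation is assumed to be the Pearson coefficient, "least correlation" is taken as the lowest raw coefficient, and the average over "all models" is assumed to include the best model.

```python
import math


def pearson(a, b):
    """Pearson correlation of two equal-length lists (assumes non-constant inputs)."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def absolute_correlation_ensemble(best, others):
    """Refine `best` with the remaining models, following Eqs. (3)-(5).

    best:   predictions of the best-performing model (model_1).
    others: list of prediction lists from the remaining models.
    """
    # model_2: the model least correlated with the best one.
    least = min(others, key=lambda model: pearson(best, model))
    m = len(best)
    avg1 = sum(best) / m
    avg2 = sum(least) / m
    limit = 2 * abs(avg1 - avg2)
    all_models = [best] + list(others)
    pred = []
    for j in range(m):
        if abs(best[j] - least[j]) >= limit:
            # Strong disagreement: use the mean of all models.
            pred.append(sum(model[j] for model in all_models) / len(all_models))
        else:
            pred.append(best[j])
    return pred


# Hypothetical predictions for four test cases.
best = [0.9, 0.2, 0.8, 0.1]
others = [[0.8, 0.3, 0.7, 0.2], [0.9, 0.2, 0.1, 0.1]]
print(absolute_correlation_ensemble(best, others))
```

In this toy example the second of the other models is the least correlated one, and it disagrees strongly with the best model only on the third test case, so only that prediction is replaced by the mean of all three models.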
3.6 Method in Detail

The proposed method can be divided into the following broad categories: preprocessing and augmentation, the model, and finally, combining the results using ensembles. They are further broken down into steps and elaborated below. A flowchart of our algorithm is shown in Fig. 2.
Fig. 2 Flowchart of the algorithm
1. For image augmentation, standard augmentations [19] such as rotation, shearing, zoom, flipping, and shifting are applied to the images, along with randomly altering their hue, saturation, brightness, and contrast. Random sprinkling is also applied to the images: small blocks are randomly added to block out areas of the image, forcing the classifier to learn from the whole image rather than focusing on just a small area.
2. These augmented images are then passed to our model, which starts with an Efficient Net backbone.
3. It is followed by blocks comprising Squeeze and Excitation networks. The SE blocks improve channel inter-dependencies at almost no computational cost. Traditionally, each channel is given equal weight by the network when creating the output feature maps; SE blocks change this by adding a content-aware mechanism that weights each channel adaptively. They use the Sigmoid function [20], shown in Eq. (6), as the activation function of the last convolutional layer, while the Swish function, shown in Eq. (7), is used for the earlier layers.

\[
\sigma(x) = \left(1 + e^{-x}\right)^{-1}
\tag{6}
\]

where σ(x) represents the Sigmoid function.

\[
f(x) = x\,\sigma(x)
\tag{7}
\]

where f(x) represents the Swish function and σ(x) is the Sigmoid function.
4. The model architecture discussed in steps 2 and 3 is followed by global average pooling (GAP) layers in an attention-model style, which gives special attention to certain parts of the feature maps.
5. A separate dense network branch that takes the patient-level metadata as input is added to the model and concatenated with the result of step 4, followed by some dense layers. As in step 3, the Swish function is used as the activation function for all the layers except the last layer, which uses the Sigmoid function.
6. Gaussian dropout is used in place of normal dropout layers throughout the model, as it also adds multiplicative noise to the layers with a standard deviation of \(\sqrt{\mathrm{rate}/(1 - \mathrm{rate})}\), where rate refers to the float value passed to the function.
7. The complete model architecture developed in steps 2–5 can be seen in Fig. 3.
8. For the loss function of our method, we use Sigmoid Focal Cross-Entropy [21, 22], as it is beneficial for classification with highly imbalanced classes. It does so by penalizing a wrong classification far more heavily than it rewards a correct one, which prevents the classifier from overfitting and predicting only the dominant class. Along with that, Adam [23] is used as our optimizer.
9. A K-fold approach is applied to the model developed in the above steps: it is trained five times with different validation sets. The test set predictions of all the folds of each Efficient Net are combined using the Majority Mean Ensemble.
10. The best result among the ensembled predictions of step 9 is identified, followed by the set having the least correlation with this set.
11. The Absolute Correlation Ensemble, as explained above, is applied to them to obtain our final predictions for the test set.

Fig. 3 Model architecture

In the next section, we present our experimental results and discussions of the proposed method.
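The scalar functions used in the steps above can be written out as a short sketch. The Sigmoid, Swish, and Gaussian dropout formulas follow the text; the focal loss is written in its usual binary form, and its `alpha` and `gamma` defaults are common choices that are assumed here, since the text does not state the values used. All function names are hypothetical.

```python
import math


def sigmoid(x):
    """Eq. (6): sigma(x) = 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + math.exp(-x))


def swish(x):
    """Eq. (7): f(x) = x * sigma(x)."""
    return x * sigmoid(x)


def gaussian_dropout_std(rate):
    """Standard deviation of the multiplicative noise for a given rate."""
    return math.sqrt(rate / (1.0 - rate))


def sigmoid_focal_loss(y_true, p, alpha=0.25, gamma=2.0):
    """Binary focal loss for one sample.

    y_true: 0 or 1; p: predicted probability of the positive class.
    alpha and gamma are assumed defaults, not values from the paper.
    """
    p_t = p if y_true == 1 else 1.0 - p
    alpha_t = alpha if y_true == 1 else 1.0 - alpha
    # (1 - p_t) ** gamma shrinks the loss of well-classified samples.
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


# A confident mistake on a malignant sample costs far more than a
# confident correct prediction.
print(sigmoid_focal_loss(1, 0.1), sigmoid_focal_loss(1, 0.9))
```

The modulating factor is what makes the loss suitable for imbalanced data: easy examples from the dominant benign class contribute very little, so the rare malignant cases dominate the gradient.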
4 Experimental Results
We used the data from the SIIM-ISIC challenge 2020 [24] on Kaggle for the models. The training set was divided into training and validation splits with a ratio of 80% to 20%, respectively. For the test set, the private section of the challenge was selected. The metric used for measuring the models' performance is AUC-ROC, which is generally preferred over metrics like accuracy when dealing with imbalanced datasets. Different learning rates were tried to determine which perform best, as shown in Fig. 4. Consequently, a learning rate of 0.0001 was selected for the first four models and 0.00016 for the last four for the rest of the method.

Fig. 4 Comparing different learning rates, effect of external data, loss function, and pre-trained weights

This dataset is highly imbalanced, with only 2% of the images having malignant melanoma. Hence, we used additional data from the ISIC challenge 2019 [25–27] to reduce this imbalance. This vastly improves the performance of the Efficient Net models, especially Efficient Net B-5, as can be seen in part (b) of Fig. 4, which compares the effect of using external data against only the competition data. Most existing works used categorical cross-entropy as their loss function, but we suggest using the focal cross-entropy loss function for a highly imbalanced dataset, as it gives better results, as shown in part (c) of Fig. 4. Finally, instead of the popular ImageNet weights for Efficient Nets, we used noisy-student weights. These come from a semi-supervised learning approach that extends self-training by using student models of equal or greater size and by adding noise to the student during learning. Their comparison with ImageNet weights can be seen in Fig. 4d.

Each Efficient Net was trained 5 times with a K-fold approach. A different input resolution was used for each Efficient Net, as reported in Table 1. The average test set results for each model, and the results after ensembling with mean and majority mean, are also shown in Table 1. All of these are then combined to obtain our final prediction using the Absolute Correlation Ensemble, which is compared against other combinations of ensembles in Table 2.

In Table 1, it can be observed that ensembling the K-fold results gives better results than the separate average of the results of all folds for every model. It can also be seen that our proposed Majority Mean Ensemble matches or outperforms the standard mean ensemble for six of the eight models. In Table 2, we try different combinations of ensembles, the first for ensembling each model's folds and the second for ensembling all eight models together. It is observed that the Majority Mean Ensemble, followed by the Absolute Correlation Ensemble, gives the best results.

Table 1 Input size for Efficient Net models
Efficient Net model   Input size   Average of results   Ensemble: mean   Ensemble: majority mean
B0                    224          0.9118               0.9219           0.9222
B1                    240          0.9055               0.9197           0.9193
B2                    260          0.9102               0.9209           0.9212
B3                    300          0.9109               0.9219           0.9215
B4                    380          0.9143               0.9247           0.9247
B5                    456          0.9289               0.9309           0.9310
B6                    528          0.9217               0.9249           0.9251
B7                    600          0.9226               0.9285           0.9289

Table 2 Results for different combinations of ensembles
Fold ensemble, model ensemble          Result
Mean, mean                             0.9334
Majority mean, mean                    0.9336
Majority mean, majority mean           0.9338
Mean, absolute correlation             0.9338
Majority mean, absolute correlation    0.9342
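To illustrate why AUC-ROC is preferred over accuracy on imbalanced data, the following sketch computes AUC as the probability that a randomly chosen positive is ranked above a randomly chosen negative. The function name and the toy labels are illustrative assumptions, not the evaluation code used for the challenge.

```python
import numpy as np

def auc_roc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    y_true = np.asarray(y_true)
    scores = np.asarray(scores, dtype=float)
    pos = scores[y_true == 1]
    neg = scores[y_true == 0]
    # Compare every positive with every negative; a tie counts as half a win.
    wins = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (wins + 0.5 * ties) / (len(pos) * len(neg))

y = [0, 0, 0, 0, 1]                      # one malignant case among five (toy data)
always_benign = [0.0, 0.0, 0.0, 0.0, 0.0]  # 80% accuracy at a 0.5 threshold
good_ranker = [0.1, 0.2, 0.3, 0.6, 0.7]

auc_constant = auc_roc(y, always_benign)
auc_ranked = auc_roc(y, good_ranker)
```

A constant "benign" prediction reaches high accuracy on such data but only chance-level AUC, whereas a model that ranks the malignant case highest is rewarded. The pairwise comparison uses memory proportional to the number of positive-negative pairs, so it is suited only to small examples.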
5 Conclusion
In this paper, we proposed a multiple ensemble model to detect melanoma, using a base Efficient Net model and a combination of SENet and attention models. Predictions on the state of the lesion, i.e., benign or malignant, are performed using both image and patient-level metadata. Augmentation techniques like random sprinkling were also utilized. The Majority Mean ensemble and the Absolute Correlation ensemble were introduced, which improved on the single predictions of the models and also outperformed simple ensemble techniques like the mean. For future work, we would like to develop the Absolute Correlation ensemble along the lines of the Hill Climbing [28] or Forward Selection approach: instead of adding all models together, models are added iteratively and kept only if they improve the score.
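The iterative selection idea mentioned as future work can be sketched as a greedy forward selection over a mean ensemble. This is an illustrative simplification rather than the procedure of [28]: models are added without replacement, and the scoring function, predictions, and labels below are hypothetical.

```python
import numpy as np

def forward_select(preds, y_true, score_fn):
    """Greedily add models to a mean ensemble while the validation score improves."""
    chosen, best = [], -np.inf
    improved = True
    while improved:
        improved = False
        for i in range(len(preds)):
            if i in chosen:
                continue
            # Score the ensemble that would result from adding model i.
            trial = np.mean([preds[j] for j in chosen + [i]], axis=0)
            score = score_fn(y_true, trial)
            if score > best:
                best, pick, improved = score, i, True
        if improved:
            chosen.append(pick)
    return chosen, best

# Hypothetical validation labels and predictions from three models.
y_val = np.array([0, 1, 0, 1])
model_preds = [
    np.array([0.4, 0.6, 0.4, 0.6]),
    np.array([0.9, 0.1, 0.9, 0.1]),
    np.array([0.2, 0.8, 0.2, 0.8]),
]
neg_mse = lambda y, p: -np.mean((y - p) ** 2)  # higher is better

selected, best_score = forward_select(model_preds, y_val, neg_mse)
```

Here the third model alone scores best and adding either of the others lowers the score, so the selection stops after one model. In practice the score would be AUC-ROC on a validation set.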
References
1. J.F. Thompson, R.A. Scolyer, R.F. Kefford, Cutaneous melanoma. The Lancet 365(9460), 687–701 (2005). https://doi.org/10.1016/S0140-6736(05)17951-3. http://www.sciencedirect.com/science/article/pii/S0140673605179513
2. A.J. Miller, M.C. Mihm, Melanoma. New England J. Med. 355(1), 51–65 (2006). https://doi.org/10.1056/NEJMra052166
3. M. Tan, Q.V. Le, Efficientnet: rethinking model scaling for convolutional neural networks. CoRR (2019). http://arxiv.org/abs/1905.11946
4. H. Choi, K. Cho, Y. Bengio, Fine-grained attention mechanism for neural machine translation. Neurocomputing 284, 171–176 (2018). https://doi.org/10.1016/j.neucom.2018.01.007. http://www.sciencedirect.com/science/article/pii/S0925231218300225
5. J. Hu, L. Shen, G. Sun, Squeeze-and-excitation networks, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2018)
6. P. Ramachandran, B. Zoph, Q.V. Le, Searching for activation functions (2017)
7. T.T. Wong, Performance evaluation of classification algorithms by k-fold and leave-one-out cross validation. Pattern Recogn. 48(9), 2839–2846 (2015). https://doi.org/10.1016/j.patcog.2015.03.009. http://www.sciencedirect.com/science/article/pii/S0031320315000989
8. J. Hron, A.G. de Matthews, Z. Ghahramani, Variational gaussian dropout is not bayesian (2017)
9. Q. Xie, M.T. Luong, E. Hovy, Q.V. Le, Self-training with noisy student improves imagenet classification (2020)
10. S. Gulati, R.K. Bhogal, Detection of Malignant Melanoma Using Deep Learning, in Advances in Computing and Data Sciences (Singapore, Springer Singapore, 2019), pp. 312–325
11. H. Nahata, S.P. Singh, Deep Learning Solutions for Skin Cancer Detection and Diagnosis (Springer International Publishing, Cham, 2020), pp. 159–182. https://doi.org/10.1007/978-3-030-40850-3_8
12. J. Amin, A. Sharif, N. Gul, M.A. Anjum, M.W. Nisar, F. Azam, S.A.C. Bukhari, Integrated design of deep features fusion for localization and classification of skin cancer. Pattern Recogn. Lett. 131, 63–70 (2020). https://doi.org/10.1016/j.patrec.2019.11.042. http://www.sciencedirect.com/science/article/pii/S0167865519303630
13. N. Gessert, M. Nielsen, M. Shaikh, R. Werner, A. Schlaefer, Skin lesion classification using ensembles of multi-resolution efficient nets with meta data (2019)
14. T. DeVries, G.W. Taylor, Improved Regularization of Convolutional Neural Networks with Cutout (2017)
15. R. Takahashi, T. Matsubara, K. Uehara, Ricap: random image cropping and patching data augmentation for deep cnns. (PMLR, 2018), pp. 786–798. http://proceedings.mlr.press/v95/takahashi18a.html
16. S. Yun, D. Han, S.J. Oh, S. Chun, J. Choe, Y. Yoo, Cutmix: regularization strategy to train strong classifiers with localizable features (2019)
17. T.Y. Hsiao, Y.C. Chang, H.H. Chou, C.T. Chiu, Filter-based deep-compression with global average pooling for convolutional networks. J. Syst. Architecture 95, 9–18 (2019). https://doi.org/10.1016/j.sysarc.2019.02.008. http://www.sciencedirect.com/science/article/pii/S1383762118302340
18. J. Fan, S. Upadhye, A. Worster, Understanding receiver operating characteristic (roc) curves. Canad. J. Emerg. Med. 8(1), 19–20 (2006). https://doi.org/10.1017/S1481803500013336
19. C. Shorten, T. Khoshgoftaar, A survey on image data augmentation for deep learning. J. Big Data 6, 1–48 (2019)
20. A.C. Marreiros, J. Daunizeau, S.J. Kiebel, K.J. Friston, Population dynamics: variance and the sigmoid activation function. NeuroImage 42(1), 147–157 (2008). https://doi.org/10.1016/j.neuroimage.2008.04.239. http://www.sciencedirect.com/science/article/pii/S1053811908005132
21. Y. Cui, M. Jia, T.Y. Lin, Y. Song, S. Belongie, Class-balanced loss based on effective number of samples, in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2019)
22. T.Y. Lin, P. Goyal, R. Girshick, K. He, P. Dollár, Focal loss for dense object detection (2018)
23. D.P. Kingma, J. Ba, Adam: a method for stochastic optimization (2017)
24. V. Rotemberg, N. Kurtansky, B. Betz-Stablein, L. Caffery, E. Chousakos, N. Codella, M. Combalia, S. Dusza, P. Guitera, D. Gutman, A. Halpern, H. Kittler, K. Kose, S. Langer, K. Liopyris, J. Malvehy, S. Musthaq, J. Nanda, O. Reiter, G. Shih, A. Stratigos, P. Tschandl, J. Weber, H.P. Soyer, A patient-centric dataset of images and metadata for identifying melanomas using clinical context (2020). https://doi.org/10.34970/2020-ds01
25. N. Codella, V. Rotemberg, P. Tschandl, M.E. Celebi, S. Dusza, D. Gutman, B. Helba, A. Kalloo, K. Liopyris, M. Marchetti, H. Kittler, A. Halpern, Skin Lesion Analysis Toward Melanoma Detection 2018: A Challenge Hosted by the International Skin Imaging Collaboration (ISIC) (2019)
26. N.C.F. Codella, D. Gutman, M.E. Celebi, B. Helba, M.A. Marchetti, S.W. Dusza, A. Kalloo, K. Liopyris, N. Mishra, H. Kittler, A. Halpern, Skin Lesion Analysis Toward Melanoma Detection: A Challenge at the 2017 International Symposium on Biomedical Imaging (ISBI), Hosted by the International Skin Imaging Collaboration (ISIC) (2018)
27. P. Tschandl, The HAM10000 dataset, a large collection of multi-source dermatoscopic images of common pigmented skin lesions (2018). https://doi.org/10.7910/DVN/DBW86T
28. R. Caruana, A. Munson, A. Niculescu-Mizil, Getting the most out of ensemble selection, in Sixth International Conference on Data Mining (ICDM'06), pp. 828–833 (2006)
Chapter 9
Performance of Optimization Algorithms in Attention-Based Deep Learning Model for Fake News Detection System
S. P. Ramya and R. Eswari
S. P. Ramya · R. Eswari (B), Department of Computer Applications, National Institute of Technology, Tiruchirappalli, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022. R. Tiwari et al. (eds.), Proceedings of International Conference on Computational Intelligence, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-16-3802-2_9

1 Introduction
Social media has become part of everyone's lives, and at this point one cannot even imagine the world without online social media. The entire globe has been in a state of lockdown and quarantine for over six months, and people have been spending more time on various online social media. These platforms permit people to share information quickly with a single click of a share button. Fake news content usually spreads through social media sites like Facebook, Twitter, and others. Fake news spreads faster and more cheaply on social media than on traditional media such as television and newspapers. About 62% of adults in the United States of America get their news through social media [1]. The spreading of fake news has also become a global problem. Misinformation can cause chaos and unwanted stress among the public, and it is one of the greatest threats to democracy and freedom. Fake news is the spreading of false information for political or financial gain. It cannot spread without three factors: tools and services, motivation, and social networks.

Identifying fake news in online social networks is a particularly tricky task for multiple reasons. First, it is difficult to collect fake news and label it manually. Second, fake news is written by human beings. In addition, the data representation available for fake news detection is limited [2]. Nowadays, fake news detection is done with the help of deep neural network methods. In this work, a fake news detection system using a CNN-based deep learning model is implemented to evaluate the performance of different optimization algorithms during training, validation, and testing. The progress of a training optimizer is easier to observe and more steady if the problem is simple and linear. Fake news detection has been chosen for this evaluation because, unlike tasks such as spam detection, it is very hard for a machine learning system to distinguish fake news from real news, which makes it a complex learning problem. Hence, it was decided to validate the performance of different optimizers on such a complex fake news detection system.

The rest of this paper is organized as follows. Related works are given in Sect. 2. In Sect. 3 and Sect. 4, details of the proposed process and the experimental setup are explained. In Sect. 5, the experimental results are analyzed. In Sect. 6, the observations and findings are presented. In Sect. 7, the conclusions are discussed.
2 Related Works
In this section, various existing deep learning models for detecting fake news in social media networks are discussed.

Pritika Bahad et al. [3] discussed a fake news detection model based on Bidirectional LSTM. Two publicly available datasets from Kaggle were used. They concluded that Bidirectional LSTM outperformed other deep learning methods, namely CNN, vanilla RNN, and unidirectional LSTM. The implementation used the AdaGrad, RMSProp, and Adam adaptive learning algorithms, and they concluded that the RMSProp optimizer performed best for large news articles. Only accuracy was compared; other metrics were ignored.

Shu et al. [1] discussed a hybrid CNN-LSTM model for fake news detection on text features. The authors used the LIAR and News articles datasets. The network was trained for 400 epochs with a batch size of 64, using Stochastic Gradient Descent (SGD) to optimize the loss function. The results showed that the accuracy of the CNN-LSTM exceeded that of the other models.

Bajaj et al. [4] focused on building a classifier to predict whether a news item is fake based only on its content, using Natural Language Processing. Two different datasets from the public domain were employed. The author employed several models, from Logistic Regression to BiLSTM, and applied Adam as the optimizer for the experimental work. Only the test data were analyzed; the validation and training data were not considered.

Yang et al. [5] designed TI-CNN, which combines text and image information with the corresponding explicit and latent features. The authors used datasets collected before the 2016 Presidential Election of the United States of America. The RMSProp optimization algorithm was applied, and it was inferred that TI-CNN outperformed other models.

Guo et al. [6] developed a deep transfer learning TL-CNN model to achieve accurate rumor detection on social media platforms. Two datasets, namely Yelp and Five Breaking News (FBN), were used. Stochastic Gradient Descent was applied to reduce the loss function, and the authors also evaluated the training time, testing time, and other performance metrics such as accuracy, F1-score, recall, and precision.

Fang et al. [7] created a content-only model named Self Multi-Head Attention-based Convolutional Neural Networks (SMHA-CNN). Datasets from Kaggle and word2vec word embeddings were collected and used. ReduceLROnPlateau was used to monitor the validation loss during optimization.

In summary, the authors of previous related work selected their optimization algorithms without systematic comparison and demonstrated the quality of their results for fake news detection. In this work, an extensive analysis has been done to validate the performance of different optimizers on such a complex fake news detection system.
3 Proposed Methodology
This section outlines the overall process involved in fake news detection.

3.1 Proposed Process Flow
Figure 1 outlines the entire fake news detection process.

3.1.1 Dataset Collection
LIAR, released by Wang in 2017 [2], is considered a benchmark dataset for fake news detection. It comprises three files: training, testing, and validation. LIAR contains a total of 12.8k claims, which were manually labeled by humans and obtained from fact-checking websites. The dataset is larger than previously used datasets. It includes 13 columns, such as statement ID, statement, label, subjects, and speaker. The statements are extracted and used for the classification process. After the dataset is collected, pre-processing is carried out to remove noise, outliers, and missing values.
3.1.2 Pre-processing
Data pre-processing is carried out to improve the quality of the collected data. Stepwise pre-processing is carried out prior to classification, as the dataset has some unique attributes that could be irrelevant to the classification process. They include the speaker, location, and party affiliation. In this proposed method, the pre-processing is carried out in two stages.
Fig. 1 The fake news detection process
Stage 1 starts from the raw data, which contains excessive noise in the form of stop words; these are filtered from the collected data. The LIAR data is used for training, validation, and testing. For binary classification, the classes 'false,' 'pants-fire' and 'barely-true' are labeled as '0' (zero), and the classes 'half-true,' 'mostly-true' and 'true' are labeled as '1' (one).

Stage 2 performs tokenization: the statements are tokenized using Keras pre-processing. Stop-word elimination, lowercase conversion, and stemming are carried out. The output of the pre-processing is a sequence of strings that can be given as input for feature extraction.
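The two stages can be sketched in plain Python as follows. The label grouping is taken from the text; the regular-expression tokenizer and the tiny stop-word list are illustrative stand-ins for the Keras pre-processing used in this work, and stemming is omitted.

```python
import re

# Label grouping as described in Stage 1.
FAKE_LABELS = {"false", "pants-fire", "barely-true"}
REAL_LABELS = {"half-true", "mostly-true", "true"}

# Tiny illustrative stop-word list (an assumption, not the list used by the authors).
STOP_WORDS = {"the", "a", "an", "is", "of", "to", "in", "and"}

def binarize_label(label):
    """Map a six-way LIAR label to 0 (fake) or 1 (real)."""
    label = label.strip().lower()
    if label in FAKE_LABELS:
        return 0
    if label in REAL_LABELS:
        return 1
    raise ValueError(f"unknown label: {label!r}")

def tokenize(statement):
    """Lowercase, split into word tokens, and drop stop words."""
    words = re.findall(r"[a-z0-9']+", statement.lower())
    return [w for w in words if w not in STOP_WORDS]

tokens = tokenize("The economy is growing.")
```

Raising an error on an unknown label, rather than silently mapping it to a class, is a design choice of this sketch that makes label typos visible early.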
3.1.3 GloVe Encoding
GloVe stands for global vectors for word representation. It is an unsupervised algorithm for obtaining word vectors from the co-occurrence of words in a corpus. The main aim of GloVe is to model the co-occurrence ratios between pairs of words. In this work, 100-dimensional pre-trained GloVe word vectors are used to form the embedding matrix that can be used in deep neural networks. The stored vectors for the GloVe embedding are loaded from the saved GloVe file glove.6B.100d.txt.
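A minimal sketch of how such an embedding matrix can be assembled is given below. It assumes the usual GloVe text layout (a word followed by its space-separated vector components on each line) and a tokenizer-style word index starting at 1, with row 0 reserved for padding; the function name and the three-dimensional toy vectors are hypothetical.

```python
import numpy as np

def build_embedding_matrix(glove_lines, word_index, dim=100):
    """Build a (vocabulary + 1, dim) matrix; words without a vector stay all-zero."""
    vectors = {}
    for line in glove_lines:
        parts = line.rstrip().split(" ")
        vectors[parts[0]] = np.asarray(parts[1:], dtype="float32")
    matrix = np.zeros((len(word_index) + 1, dim), dtype="float32")
    for word, idx in word_index.items():
        vec = vectors.get(word)
        if vec is not None and vec.shape[0] == dim:
            matrix[idx] = vec
    return matrix

# Toy stand-in for the lines of a GloVe file, using 3 dimensions instead of 100.
toy_lines = ["news 0.1 0.2 0.3", "fake -0.5 0.0 0.5"]
toy_index = {"fake": 1, "news": 2, "unseen": 3}
toy_matrix = build_embedding_matrix(toy_lines, toy_index, dim=3)
```

For the real file, the same function could be called with an open file handle for glove.6B.100d.txt as `glove_lines` and the tokenizer's word index, keeping the default `dim=100`.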
3.1.4 Implementation of CNN-Based Deep Learning Model
Figure 2 shows the architecture of the proposed CNN-based fake news detection network. As shown in the figure, the CNN is modeled using a one-dimensional convolution layer along with a dropout layer, a one-dimensional max-pooling layer, and a flatten layer. These layers are placed between the encoding (embedding) layer and the dense classification layer.
3.1.5 Training Optimization
Optimization is an essential component of deep learning. The optimization algorithm is used to train the neural network by minimizing the loss function. Some of the commonly available optimization algorithms are Gradient Descent, Momentum, Adagrad, RMSprop, and Adam.

(A) Stochastic Gradient Descent (SGD)
Stochastic Gradient Descent is a traditional technique for solving non-convex optimization problems. SGD is widely applied to text classification and natural language processing. In each iteration, it uses a few data samples instead of the entire dataset. It is computationally less expensive because the gradient can be computed from as little as a single training sample. For larger datasets, it can converge faster because it performs updates more frequently. SGD is efficient and easy to implement. It updates the weights by subtracting the gradient $g_t$, scaled by the learning rate $\alpha$, from the current weights. The update rule is:

$$w_{new} = w_{old} - \alpha\, g_t \tag{1}$$
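The update rule of Eq. (1) can be sketched in a few lines. The quadratic objective is an illustrative assumption, and its gradient is computed exactly here; in SGD proper the gradient would be estimated from a single sample or a mini-batch.

```python
def sgd_step(w, grad, lr=0.1):
    """One update of Eq. (1): w_new = w_old - lr * grad."""
    return w - lr * grad

# Minimize f(w) = (w - 3)^2, whose gradient is 2 * (w - 3).
w = 0.0
for _ in range(100):
    w = sgd_step(w, 2.0 * (w - 3.0))
```

With this learning rate the distance to the minimum shrinks by a factor of 0.8 per step, so `w` ends very close to 3.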
Fig. 2 The CNN model used for fake news detection
(B) Adaptive Gradient Algorithm (Adagrad)
Adagrad is a gradient-based optimization algorithm that adapts the learning rate for each feature. It makes small updates for frequently occurring features and larger updates for rare features, and it is suitable for large-scale neural networks. GloVe word embedding uses Adagrad, since uncommon words require larger updates. Adagrad uses a separate learning rate for each feature θ at each time step t. With $dw$ denoting the gradient with respect to the weight, the update is:

$$G_t = G_{t-1} + (dw)^2, \qquad w_{new} = w_{old} - \frac{\alpha}{\sqrt{G_t + \varepsilon}} \cdot dw \tag{2}$$
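A minimal sketch of the Adagrad update in Eq. (2) follows, placing ε inside the square root as reconstructed there. The learning rate, the quadratic objective, and the helper name are illustrative assumptions.

```python
import math

def adagrad_step(w, G, grad, lr=0.5, eps=1e-8):
    """Accumulate the squared gradient, then scale the step by 1 / sqrt(G + eps)."""
    G = G + grad ** 2
    w = w - lr / math.sqrt(G + eps) * grad
    return w, G

# Minimize f(w) = (w - 3)^2 and record the effective learning rate at each step.
w, G = 0.0, 0.0
effective_rates = []
for _ in range(50):
    grad = 2.0 * (w - 3.0)
    w, G = adagrad_step(w, G, grad)
    effective_rates.append(0.5 / math.sqrt(G + 1e-8))
```

Because the accumulator only grows, the effective learning rate never increases. This gives rarely updated parameters comparatively large steps, and it is the behaviour that Adadelta and RMSProp later relax.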
(C) Root Mean Square Propagation (RMSProp)
RMSProp was created as a stochastic gradient technique for mini-batch learning to train neural networks. It uses an adaptive learning rate instead of treating the learning rate as a fixed hyperparameter. It keeps a running average of the squared gradients for each weight and then divides the gradient by the square root of this average. With learning rate $\eta$ and averaging coefficient $\gamma$, it carries out the update rule below:

$$w_{new} = w_{old} - \frac{\eta}{\sqrt{(1 - \gamma)\, g_{t-1}^2 + \gamma\, g_t^2 + \varepsilon}} \cdot g_t \tag{3}$$
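The following sketch uses the commonly cited running-average form of RMSProp, in which the decay factor γ weights the history; this convention is an assumption and differs slightly from the weighting written in Eq. (3). The hyperparameter values are illustrative.

```python
import math

def rmsprop_step(w, avg_sq, grad, lr=0.01, gamma=0.9, eps=1e-8):
    """Keep a decaying average of squared gradients and divide the step by its root."""
    avg_sq = gamma * avg_sq + (1.0 - gamma) * grad ** 2
    w = w - lr / math.sqrt(avg_sq + eps) * grad
    return w, avg_sq

def last_step_size(grad, steps=200):
    """Size of the final update when the gradient is held constant."""
    w, avg_sq, step = 0.0, 0.0, 0.0
    for _ in range(steps):
        w_next, avg_sq = rmsprop_step(w, avg_sq, grad)
        step, w = abs(w_next - w), w_next
    return step

step_large_grad = last_step_size(1000.0)
step_small_grad = last_step_size(0.001)
```

With a constant gradient the step size settles near the learning rate whether the gradient is large or small, which illustrates the per-weight normalization described above.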
(D) Adam
The Adaptive Moment Estimation (Adam) optimizer is one of the most popular gradient descent algorithms for non-convex optimization. It is a combination of RMSProp and SGD with momentum, and it calculates an individual adaptive learning rate for each parameter. It combines the strengths of two algorithms, namely Adagrad and RMSProp, and it works well for problems with noisy and sparse gradients. To scale the learning rate, Adam uses exponentially decaying running averages of the gradient and of the squared gradient instead of the simple accumulation used in Adagrad. The Adam algorithm proceeds as follows:

$$\begin{aligned}
v_t &= \rho_1 v_{t-1} + (1 - \rho_1)\, dw \\
s_t &= \rho_2 s_{t-1} + (1 - \rho_2)\, dw^2 \\
\hat{v}_t &= \frac{v_t}{1 - \rho_1^t} \\
\hat{s}_t &= \frac{s_t}{1 - \rho_2^t} \\
w_{new} &= w_{old} - \frac{\alpha\, \hat{v}_t}{\sqrt{\hat{s}_t} + \varepsilon}
\end{aligned} \tag{4}$$
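Eq. (4) can be sketched step by step as below. The default values (learning rate 0.001, ρ1 = 0.9, ρ2 = 0.999, ε = 1e-8) are those suggested by Kingma and Ba and are an assumption about the settings used here.

```python
import math

def adam_step(w, v, s, grad, t, lr=0.001, rho1=0.9, rho2=0.999, eps=1e-8):
    """One Adam update following Eq. (4); t is the 1-based step counter."""
    v = rho1 * v + (1.0 - rho1) * grad        # running mean of the gradient
    s = rho2 * s + (1.0 - rho2) * grad ** 2   # running mean of the squared gradient
    v_hat = v / (1.0 - rho1 ** t)             # bias correction for the zero start
    s_hat = s / (1.0 - rho2 ** t)
    w = w - lr * v_hat / (math.sqrt(s_hat) + eps)
    return w, v, s

# First step from zero-initialised moments with a gradient of 5.
w1, v1, s1 = adam_step(0.0, 0.0, 0.0, grad=5.0, t=1)
```

Thanks to the bias correction, the very first step has a magnitude close to the learning rate regardless of the gradient's scale.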
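(E) Nadam
Nadam is a combination of Nesterov acceleration and Adam. It is used for gradients with high bias. The learning process is accelerated by summing the exponentially decayed moving averages of the current and previous gradients. With $m$ and $v$ denoting the moving averages of the gradient and of the squared gradient, and $lr$ the learning rate, the update is:

$$w_{new} = w_{old} - lr \cdot \frac{m}{\sqrt{v} + \varepsilon} \tag{5}$$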
(F) Adadelta
Adadelta is an extension of Adagrad. It counters Adagrad's continually shrinking learning rate by using a moving average that depends on the earlier average and the current gradient. In Adadelta, there is no need to set a default learning rate; instead, it uses the rate of change of the parameters themselves to adjust the learning rate.

$$w_{new} = w_{old} + \Delta w, \qquad \Delta w = -\frac{RMS[\Delta w]_{t-1}}{RMS[g]_t} \cdot g_t \tag{6}$$
(G) AdaMax
AdaMax is an adaptive Stochastic Gradient Descent method. It is a variant of the Adam optimizer that employs the infinity norm. In some situations, AdaMax performs better than Adam and traditional SGD. AdaMax performs well for noisy datasets. In this method, both the learning rate and the gradients are adapted. AdaMax performs the update:

$$w_{new} = w_{old} + Sw \tag{7}$$
4 Experimental Setup

4.1 Parameter Settings
Table 1 lists the parameters and metrics used. Table 2 shows the parameters of the different layers of the proposed CNN-based fake news detection network.

Table 1 The parameters and metrics used
Parameter                                            Value
The network model                                    CNN
The different optimization algorithms                SGD, Adagrad, RMSprop, Adam, Nadam, Adadelta, and Adamax
EMBEDDING_DIMENTION                                  100
VOCABULARY_SIZE                                      40,000
Other parameters                                     Keras defaults
The epochs of training                               10
The training batch size                              128
Metrics used for training, validation, and testing   MSE, accuracy, precision, recall, and F1-score

Table 2 The parameters of the proposed CNN network
Model: 'sequential_1'
Layer (type)                     Output shape       Param #
embedding_1 (Embedding)          (None, 300, 100)   924,900
dropout_1 (Dropout)              (None, 300, 100)   0
conv1d_1 (Conv1D)                (None, 298, 128)   38,528
max_pooling1d_1 (MaxPooling1D)   (None, 74, 128)    0
flatten_1 (Flatten)              (None, 9472)       0
dense_1 (Dense)                  (None, 1)          9,473
Total params: 972,901
Trainable params: 48,001
Non-trainable params: 924,900
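The shapes and parameter counts in Table 2 can be checked with simple arithmetic, sketched below. The kernel size of 3, the pool size of 4, and the 9,249 embedding rows are not stated in the text; they are inferred here from the reported output shapes and parameter counts.

```python
def conv1d(seq_len, in_channels, filters, kernel):
    """Output length ('valid' padding, stride 1) and parameter count of a Conv1D layer."""
    out_len = seq_len - kernel + 1
    params = kernel * in_channels * filters + filters  # weights + biases
    return out_len, params

def dense(inputs, units):
    """Parameter count of a fully connected layer (weights + biases)."""
    return inputs * units + units

seq_len, emb_dim, filters = 300, 100, 128
embedding_params = 9249 * emb_dim                 # inferred number of embedding rows
conv_len, conv_params = conv1d(seq_len, emb_dim, filters, kernel=3)
pool_len = conv_len // 4                          # inferred pool size of 4
flat_units = pool_len * filters
dense_params = dense(flat_units, 1)

total_params = embedding_params + conv_params + dense_params
trainable_params = conv_params + dense_params     # the embedding layer is frozen
```

The totals reproduce the 972,901 parameters of Table 2, with the 48,001 trainable parameters coming from the convolution and dense layers only.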
5 Results and Discussions
The performance of the seven optimization algorithms was validated on a CNN-based deep learning model. The CNN model was trained with the LIAR training dataset and validated with the validation dataset at each epoch of training. Table 3 presents the batch-wise average loss and accuracy measured during each epoch of training with the different optimization algorithms. The training performance graphs, measured in terms of accuracy and precision, show a progressive increase in performance in each epoch of training for almost all the optimization algorithms. The validation graphs, however, show that the validation performance does not really improve as the number of training epochs increases. In some cases, the validation performance decreases as training proceeds, and in others it is somewhat random, i.e., it fluctuates between high and low values. A careful check of the loss graphs reveals that the training loss also decreases as training proceeds. At the same time, the validation loss does not decrease, and in some cases it even increases with the number of training epochs. So, the training and validation graphs of Table 3 clearly show that no particular optimization technique yields a gradual improvement in validation performance. This indicates the complexity of the fake news detection problem. With respect to precision, however, the Adam optimizer achieved the highest score, which makes it a good optimizer for such a complex classification problem.

Table 3 The performance of training with respect to loss, accuracy, and F1-score with different optimization algorithms
[Plots not reproduced. Rows: Loss (MSE), Accuracy, F1-Score. Columns: Training performance; Validation performance during training.]
Table 4 displays the performance of the different optimization algorithms in terms of precision and recall. Table 5 displays the performance of the seven optimization algorithms on the test dataset.

Table 4 The performance in terms of precision and recall of different optimization algorithms
[Plots not reproduced. Rows: Precision, Recall. Columns: Training performance; Validation performance during training.]
Table 5 The performance of the optimizers
Optimizer   Accuracy   Precision   Recall   F1_Score
SGD         0.56       0.56        0.96     0.71
Adagrad     0.55       0.57        0.87     0.69
RMSprop     0.53       0.58        0.61     0.59
Adam        0.54       0.57        0.73     0.64
Nadam       0.52       0.58        0.57     0.57
Adadelta    0.56       0.56        0.99     0.72
Adamax      0.53       0.53        0.76     0.63
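The F1-scores in Table 5 follow from the reported precision and recall as their harmonic mean. The short sketch below recomputes two of them; small differences for other rows can arise because the table's inputs are already rounded to two decimals.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2.0 * precision * recall / (precision + recall)

f1_sgd = round(f1_score(0.56, 0.96), 2)
f1_adadelta = round(f1_score(0.56, 0.99), 2)
```

A recall close to 1 combined with a precision close to the accuracy, as for SGD and Adadelta, suggests a classifier that assigns almost every statement to the positive class, so a high F1-score alone should be read with care.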
Fig. 3 Comparison of performance measures
Figure 3 shows the comparison of the performance of all of the seven optimizers with respect to accuracy, precision, recall, and F1-score.
6 Observations and Findings
As far as the LIAR fake news benchmark dataset is concerned, almost all the optimizers struggle to find an optimal solution space during training. Because of the complexity of fake news, the training tends to over-fit to a subspace, which leads to poor performance on all the metrics during validation and testing.
• Even though the training was run for 10 epochs, the best network model was not always found at the 10th epoch. This signifies that the validation and testing performance does not increase with the number of training epochs.
• From the testing point of view, no particular optimization technique yields a gradual improvement. This indicates the complexity of the fake news detection problem.
• In most of the trials carried out, the CNN models suffered from over-fitting while training with the LIAR dataset. The results were the same with all the optimization algorithms.
• Even though the training of the CNN was progressive with respect to the training data, the models still got stuck at poor solutions in the huge problem space, as the validation and test results show.
• Even though the performance was not linear over the epochs, the Adam optimizer provided good overall performance in terms of accuracy and precision during batch-wise calculations. However, when accuracy and precision were calculated on the test data, RMSprop, Nadam, and Adadelta provided good results.
7 Conclusion
In this work, a CNN-based deep neural network model for fake news detection was implemented, and an extensive analysis of seven different optimization algorithms on the LIAR dataset was carried out. The training performance graphs in the previous section show the nature of training and validation: no particular optimization technique yields a gradual improvement, which indicates the complexity of the fake news detection problem. With respect to precision, however, the Adam optimizer achieved a higher score than all the other algorithms during training (batch-wise calculation), which makes it a good optimizer for this complex fake news detection problem from the perspective of training. Even though the training appears to improve over epochs, the performance of the final model was not good during testing. This shows that the validation/test performance did not improve with the number of training epochs.

As mentioned earlier, unlike other natural language processing and machine learning tasks, fake news detection is somewhat complex from the perspective of the deep learning-based attention mechanisms involved in their design. The results indicate how difficult it is to direct attention toward the fake aspects of the news. The reason for the poor performance of the network attention mechanisms is that the semantics of fake news closely resemble those of genuine news, so it is technically very hard for a deep neural network to 'attend to' only the fake aspects of a news article. Given the complexity of the LIAR dataset, the existing deep neural networks with different optimization algorithms are unable to 'attend to' the fake aspects of the news and hence give only marginal performance.

To make a deep neural network 'attend to' the fake aspects of the news more effectively, the news data must be presented with enhanced, distinguishable features of fakeness. To that end, future work may explore more advanced NLP-based techniques, pre-processing and feature selection techniques, and better encoding methods for improving training performance. Hybrid network models and more sophisticated learning optimization techniques may also be explored to improve performance. Our future work will address these issues.
References
1. K. Shu, A. Sliva, S. Wang, J. Tang, H. Liu, Fake news detection on social media: a data mining perspective. SIGKDD Explor. 19(1) (2017)
2. W.Y. Wang, Liar, Liar pants on fire: a new benchmark dataset for fake news detection (2017). arXiv:1705.00648v1 [cs.CL]
3. P. Bahad, P. Saxena, R. Kamal, Fake news detection using Bi-directional LSTM-recurrent neural network, in International Conference on Recent Trends in Advanced Computing (ICRTAC 2019) (2019)
4. S. Bajaj, The pope has a new baby! fake news detection using deep learning (2017)
5. Y. Yang, L. Zheng, J. Zhang, Q. Cui, X. Zhang, Z. Li, P.S. Yu, TI-CNN: Convolutional Neural Networks for Fake News Detection (2018). arXiv:1806.00749v1 [cs.CL]
6. M. Guo, Z. Xu, L. Liu, M. Guo, Y. Zhang, An adaptive deep transfer learning model for rumor detection without sufficient identified rumors. Math. Probl. Eng. (2020). https://doi.org/10.1155/2020/7562567
7. Y. Fang, J. Gao, C. Huang, R. Wu, Self Multi-Head Attention-based Convolutional Neural Networks for fake news detection (2019). https://doi.org/10.1371/journal.pone.0222713
8. A. Drif, Z. Ferhat Hamida, S. Giordano, The Ninth International Conference on Advances in Information Mining and Management (IMMM 2019) (2019)
9. A. Galassi, M. Lippi, P. Torroni, Attention in Natural Language Processing (2020). arXiv:1902.02181v2 [cs.CL]
10. J. Younus Khan, Md. T. I. Khondaker, A. Iqbal, S. Afroz, A Benchmark Study on Machine Learning Methods for Fake News Detection (2019). arXiv:1905.04749 [cs.CL]
11. O. Melamud, D. McClosky, S. Patwardhan, M. Bansal, The role of context types and dimensionality in learning word embeddings, in Proceedings of NAACL-HLT, 2016 (2016)
12. V. Pérez-Rosas et al., Automatic Detection of Fake News (2017). arXiv:1708.07104
13. B. Plank, A. Søgaard, Y. Goldberg, Part-of-speech tagging with bidirectional long short-term memory models and auxiliary loss, in Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics, 2016 (2016)
Chapter 10
Exploring Human Emotions for Depression Detection from Twitter Data by Reducing Misclassification Rate D. R. Jyothi Prasanth, J. Dhalia Sweetlin, and Sreeram Sruthi
1 Introduction

Social media use has grown rapidly over the last few years. With 3.4 billion people using social media daily, almost 42% of the world's population [1], it has become an indispensable part of everyday life. It allows people to express their emotions and opinions and gives others an insight into their daily lives. Given the huge volume of data generated, messages on social media can be analyzed for various purposes, including the early detection of mental health disorders.

Mental illness is a broad term that covers several conditions affecting a person's mental health [2]. The treatment pattern is determined by the severity of the disorder. Depression is a mental health disorder characterized by feelings of sadness and negativity [3]. People with depression may also experience other symptoms such as restlessness, anxiety, increased fatigue, changes in appetite, a feeling of worthlessness and even thoughts of suicide [3]. Almost 300 million people are said to suffer from anxiety and depression. Suicide linked to depression is the second leading cause of death among young adults aged 15–29 years [3]. Unfortunately, the depression detection rate is low in many countries [4]. People need to be made aware of the importance of mental health so that affected persons can be identified and treated early.

Depression can be detected from people's posts on social media by analyzing the sentiment they express. Sentiment analysis is a technique for extracting subjective information from a text and classifying it as positive, negative or neutral [5]. The main objectives of this research work are as follows:

1. To identify the sentiments expressed in users' tweets by using an ensemble lexicon method.
2. To develop a novel algorithm for classification of ambiguous tweets.
3. To develop a recommendation system to initiate positive actions.

D. R. Jyothi Prasanth · J. Dhalia Sweetlin (B) · S. Sruthi
Anna University (MIT Campus), Chennai, India
e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022
R. Tiwari et al. (eds.), Proceedings of International Conference on Computational Intelligence, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-16-3802-2_10
2 Related Works

Over the past few years, several studies have been devoted to the analysis of human sentiment using data from social media. Some of the related works are highlighted in this section. Many studies have used CNNs to build systems for emotion detection. Canales et al. [6] proposed a methodology for labelling emotions in text that uses an SVM to identify the sentiments conveyed; manual annotators were then employed to determine the dominant emotion expressed. The method is time consuming because the manual effort grows with the amount of data. Wang et al. [7] devised an algorithm called SentiDiff, based on the relationship between a tweet and its retweets. Combined with a conventional neural network, it produced an accuracy of 79%. It worked only on textual information and considered only retweets to which comments had been added. Akhtar et al. [8] concentrated on coarse- and fine-grained emotion and sentiment analysis and achieved an accuracy of 80.52% with a CNN in one shot. Larsen et al. [9] used real-time tweets to analyze global variations in emotional expression. Liu et al. [10] used a two-stage approach for cyberhate text classification: the first stage fuses fuzzy classifiers trained with different fuzzy norms to identify ambiguity, and the second stage uses KNN to further resolve ambiguity and enhance performance. The recommendation system developed by Rosa et al. [11, 12] detects users with potential psychological disturbances, specifically depression and stress, using deep convolutional neural networks. A recommendation system with 360 pre-defined messages was built to send calming messages to the user. Similarly, Ning et al. [13] proposed a friend recommendation system based on five personality traits (openness to experience, agreeableness, conscientiousness, extraversion and neuroticism) using a hybrid filtering method.
This manuscript uses a lexicon-based sentiment analysis method using live public tweets extracted from Twitter. It is different from other works in this domain since it uses a combination of three dictionaries and proposes a new algorithm for effective classification of ambiguous tweets. An emoticon dictionary containing the Unicode and polarity of the emoticons is also constructed.
3 Proposed Work

The system architecture is made up of several modules: the preprocessing module, the classification module and the recommendation engine. With reference to Fig. 1, the system takes real-time tweets from Twitter as input. The input is sent to the preprocessing module, where punctuation, hashtags and stop words are removed and the text is lemmatized. The cleaned data is then classified using three sentiment lexicons, namely the VADER sentiment lexicon [14], TextBlob [15] and SentiWordNet [16]. The ambiguous and neutral tweets are classified using the Neutral Negative Scoring (NNS) algorithm. The negatively classified tweets are then collected and analysed, and the emoticons present in them are extracted. An emoticon dictionary is constructed that contains each emoji with its corresponding Unicode value and a polarity assigned according to the emotion expressed. Using this dictionary, the emojis extracted from the tweets are classified and the negative ones are identified. The users who posted such tweets are traced, and two weeks of their tweet history is extracted. The extracted tweet history is pre-processed and labelled. The labelled data is used for training RNN, LSTM and BiLSTM networks. If a person is found to be suffering from depression, the recommendation system is used to suggest personalized actions such as therapy information, support group information or self-help books.

Fig. 1 Proposed system architecture
3.1 Preprocessing of Data

Recent tweets are extracted from Twitter using the Twitter public API [17]. The API allows public tweets to be extracted along with the username, location, date and time. Access requires a developer account, which is requested on the Twitter Developers page by stating the purpose of usage [17]. A tweet can contain up to 280 characters. It can also contain photos, videos, gifs and emoticons. For this work, only the textual part and the emoticons are considered to detect depression. Since raw datasets contain a lot of noisy and irrelevant data, the tweets are pre-processed: stop words, hashtags, emoticons, URL links and special characters are removed, and the text is tokenized and lemmatized [18]. Emoticons are stored separately for further analysis.
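The cleaning steps above can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' pipeline: the stop-word list is a tiny hypothetical one, the emoji ranges are a rough assumption, whole hashtags are dropped (another assumption), and lemmatization is omitted because it needs an external NLP library. Negation words such as "not" are deliberately kept, since the NNS classifier in Sect. 3.3 relies on them.

```python
import re

# Tiny illustrative stop-word list (assumption); negation words are kept on purpose.
STOP_WORDS = {"a", "an", "the", "is", "am", "are", "i", "to", "of", "and", "so", "this"}

URL_RE = re.compile(r"https?://\S+")
HASHTAG_RE = re.compile(r"#\w+")
# Rough emoji ranges (assumption): pictographs, emoticons, misc symbols, dingbats.
EMOJI_RE = re.compile("[\U0001F300-\U0001FAFF\u2600-\u27BF]")


def preprocess(tweet):
    """Return (tokens, emoticons) for one raw tweet."""
    emoticons = EMOJI_RE.findall(tweet)      # stored separately for later analysis
    text = EMOJI_RE.sub(" ", tweet)
    text = URL_RE.sub(" ", text)             # drop URL links
    text = HASHTAG_RE.sub(" ", text)         # drop hashtags
    text = re.sub(r"[^a-z\s]", " ", text.lower())  # drop punctuation and special characters
    tokens = [t for t in text.split() if t not in STOP_WORDS]
    return tokens, emoticons


tokens, emoticons = preprocess("Feeling so low today 😞 #mondayblues https://t.co/xyz")
print(tokens)      # ['feeling', 'low', 'today']
print(emoticons)   # ['😞']
```

Keeping the emoticons in a separate list mirrors the text: they are removed from the token stream but remain available for the emoticon dictionary step.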
3.2 Classification by Multiple Dictionaries

The preprocessed tweets need to be classified as positive, negative or neutral. For this classification, either rule-based or machine learning based methods can be used [19]. In the rule-based approach, classification is done by comparing the text against a set of predefined rules. In this manuscript, rule-based classification is performed using an ensemble of three lexicons: the preprocessed tweets are classified as positive, neutral or negative using the VADER [14], TextBlob [15] and SentiWordNet [16] dictionaries. The tweets classified as negative are stored separately for contextual analysis. The remaining tweets, which may be neutral or ambiguous, are classified using the proposed Neutral Negative Scoring (NNS) classifier.
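One way to read the ensemble step is as a unanimity vote over the three lexicon outputs. The sketch below assumes each lexicon yields a polarity score in [-1, 1], that a small threshold separates neutral from polar tweets, and that any disagreement marks the tweet as ambiguous. The function names, the threshold and the example scores are hypothetical; the real lexicon calls are not shown.

```python
def label_from_score(score, threshold=0.05):
    """Map a polarity score in [-1, 1] to a label; the threshold is an assumption."""
    if score >= threshold:
        return "positive"
    if score <= -threshold:
        return "negative"
    return "neutral"


def ensemble_label(scores):
    """Combine the per-lexicon scores of one tweet.

    A unanimous vote yields that label; any disagreement is reported as
    'ambiguous' so that the tweet can be handed to the NNS classifier.
    """
    labels = {label_from_score(s) for s in scores.values()}
    if len(labels) == 1:
        return labels.pop()
    return "ambiguous"


# Hypothetical scores standing in for VADER, TextBlob and SentiWordNet outputs.
print(ensemble_label({"vader": -0.6, "textblob": -0.4, "sentiwordnet": -0.2}))  # negative
print(ensemble_label({"vader": -0.5, "textblob": 0.0, "sentiwordnet": 0.3}))    # ambiguous
```

Under this reading, tweets labelled "neutral" or "ambiguous" would both be passed on to NNS, while unanimously positive tweets are discarded.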
3.3 NNS Classifier

The NNS algorithm takes a set of neutral and ambiguous tweets as input. In each tweet, the polarity of every word is calculated. If a word has a non-zero polarity score, the tweet is passed to the negatechecker function, which splits the tweet into bigrams and checks whether the word is preceded by a negation word. The outcome depends on the polarity of the word: a negative word that is not negated, or a positive word that is negated, makes the tweet negative; otherwise the scan continues, and the tweet remains neutral if no such word is found.

Example 1 I am neither depressed, nor sad.
Example 2 I am not really too happy with this team.

The first tweet is not classified uniformly by the three dictionaries. Since the words "depressed" and "sad" are classified as negative, the entire tweet is classified as negative or neutral by the dictionaries. In the second tweet, since the word "happy" is present, the tweet is classified as neutral or positive by these dictionaries. Hence, both tweets are considered ambiguous, and the NNS algorithm is applied to them. In the first tweet, the word "depressed" is preceded by a negation word, so the tweet is classified as neutral by the NNS algorithm. After applying the NNS algorithm to the second tweet, it is classified as negative.
Algorithm 1 NNS Algorithm
while not end of file do
    loop 1: for each line in neutral do
        var ⇐ line
        loop 2: for each word in var do
            compute polarityscore
            if polarityscore < 0 then
                if negatechecker() = false then
                    goto loop 2
                else if negatechecker() = true then
                    append var to negative
                end if
            end if
            if polarityscore > 0 then
                if negatechecker() = false then
                    append var to negative
                else if negatechecker() = true then
                    goto loop 2
                end if
            end if
        end for
    end for
end while
Algorithm 2 negatechecker(line, word)
Find bigrams containing the negation words
if negation word present then
    return false
else if negation word not present then
    return true
end if
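Algorithms 1 and 2 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' code: the word polarities are a toy lexicon, the negation list is hypothetical, and the negation check looks back over a window of three tokens rather than strict bigrams, because in Example 2 the word "not" is three tokens before "happy".

```python
import re

# Hypothetical negation list and toy word polarities (the chapter uses lexicon scores).
NEGATIONS = {"not", "no", "never", "neither", "nor", "cannot"}
POLARITY = {"depressed": -0.6, "sad": -0.5, "happy": 0.8}


def negate_checker(tokens, index, window=3):
    """Mirror Algorithm 2: False if a negation word precedes the word, else True."""
    preceding = tokens[max(0, index - window):index]
    return not any(tok in NEGATIONS for tok in preceding)


def nns_classify(tweet):
    """Return 'negative' or 'neutral' for one ambiguous tweet (Algorithm 1)."""
    tokens = re.findall(r"[a-z']+", tweet.lower())
    for i, word in enumerate(tokens):
        score = POLARITY.get(word, 0.0)
        if score == 0.0:
            continue                      # word carries no polarity
        not_negated = negate_checker(tokens, i)
        if score < 0 and not_negated:
            return "negative"             # plain negative word
        if score > 0 and not not_negated:
            return "negative"             # negated positive word
    return "neutral"


print(nns_classify("I am neither depressed, nor sad."))           # neutral
print(nns_classify("I am not really too happy with this team."))  # negative
```

With `window=1` the check reduces to the strict bigram test of Algorithm 2; in that setting Example 2 would stay neutral, which is why the wider window is used here.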
3.4 Classification Using Emoticons

To classify emoticons as positive, negative or neutral according to the emotions they represent, an emoticon dictionary is constructed. The dictionary consists of 80 popularly used emoticons. The polarity of each emoticon is assigned by referring to the emoji-emotion package in Python. Each entry holds the emoticon, its polarity and its Unicode value in two formats, i.e. Python/Java and C++. The polarity score ranges from −5 to +5.
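A dictionary of this kind might look like the sketch below. The four entries and their polarity values are placeholders chosen for illustration, not the chapter's 80-entry dictionary, and summing the polarities of all emoticons in a tweet is an assumed aggregation rule.

```python
# Placeholder entries: emoticon -> polarity in [-5, 5] (values are illustrative only).
EMOTICON_POLARITY = {
    "😀": 3,
    "😢": -2,
    "😞": -3,
    "😐": 0,
}


def codepoint(emoticon):
    """Unicode code point of a single-character emoticon, e.g. 'U+1F61E'."""
    return "U+{:04X}".format(ord(emoticon))


def emoticon_polarity(emoticons):
    """Sum the polarities of the known emoticons; unknown ones count as 0."""
    return sum(EMOTICON_POLARITY.get(e, 0) for e in emoticons)


def has_negative_emoticons(emoticons):
    """True if the combined emoticon polarity of a tweet is below zero."""
    return emoticon_polarity(emoticons) < 0


print(codepoint("😞"))                       # U+1F61E
print(has_negative_emoticons(["😢", "😐"]))  # True
```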
3.5 Extraction and Classification of User History

After potentially "depressed" users are identified, their tweet history is traced back up to two weeks. Only the history of public profiles is extracted, as permitted by the Twitter API [17]. The user ID, tweet and location are extracted, and the tweets are preprocessed and labelled according to whether each tweet is depressing or not. The labelled dataset is used to train an RNN, and the same dataset is given as input to LSTM and BiLSTM networks. Each network is trained, validated and tested on the labelled dataset.
3.6 Recommendation System

After depressed users are identified, a recommendation system is used to send positive messages to them. A set of positive messages is stored, such as "Try to look for something positive each day, even if you have to look a little harder" and "When life gets hard, just remember even a butterfly has to experience the dark days inside a cocoon to spread its wings and fly". Support groups and therapists can also be suggested to users based on their location.
4 Experiment and Results

Real-time tweets were collected from Twitter using the Twitter API. The original dataset comprises blocks, each containing the tweet, the username, the time and date of the tweet, the location if specified, profile details, retweets and followers count, along with a lot of noise. For this analysis, about 30,000 tweets were used. The tweets alone were extracted from the blocks and processed by removing emoticons, hashtags and URLs. The emoticons are stored as their corresponding Unicode values for further analysis. The username, tweet ID and raw tweet are stored for further use. Each tweet is lemmatized and given as input to three different dictionaries: VADER, SentiWordNet and TextBlob. Since the analysis aims to find depressed users, the positively classified tweets can be eliminated. Under the assumption that tweets classified as negative by all three methods are truly negative, the common negative tweets are removed and stored separately. The tweets classified as negative by the NNS algorithm are added to this set of negative tweets.

The constructed emoticon dictionary is used to assign polarity to the emoticons. The emoticons that were extracted as Unicode characters during preprocessing are looked up in the dictionary, and their polarity is evaluated. After the emoticons are evaluated, only the tweets with negatively classified emoticons are considered for further analysis. The negative tweets are mapped to their usernames, and each user's Twitter history is traced back two weeks using the Twitter API. Retweets are not considered when extracting the user history. Some users have private profiles, and some did not tweet in the given time frame, which reduces the number of tweets obtained. For this analysis, tweets from around 1200 users were extracted. The extracted tweets are labelled 'y' for depressed and 'n' for not depressed.
They are then split into training, testing and validation sets and given to RNN, LSTM and BiLSTM networks. It is observed that BiLSTM performs better than the other two, with an accuracy of 0.90. The user names and user IDs of the people identified as depressed are sent to the recommendation system.

The recommendation system contains a set of positive messages and details of self-help groups and support groups. A public Twitter account, "Depress_Fight", was created exclusively to send positive messages to people. To protect privacy, the messages can be received only by people who allow private messaging. A few people have acknowledged the messages sent by the recommendation system, thanking it for cheering them up.

Account Name- Depress_Fight
User Name- @Depress_fight

The performance metrics used to evaluate the NNS algorithm are given in Table 2. To compare the efficiency of the NNS algorithm with the three dictionaries used in this work, 500 negative tweets were given as input; the number correctly classified by each method is shown in Table 1. From that table it can be inferred that the proposed algorithm outperforms the three dictionaries. Table 3 shows the comparison of the RNN, LSTM and BiLSTM networks.

Table 1 Comparison of NNS and dictionaries
Method          Negative tweets correctly classified
NNS             500/500
VADER           311/500
TextBlob        248/500
SentiWordNet    288/500

Table 2 Performance metrics
Parameter       NNS
Accuracy        0.90
Recall          0.97
Precision       0.86
F-measure       0.91
Specificity     0.80

Table 3 Comparison of RNN, LSTM and BiLSTM
Neural network  Accuracy (%)
RNN             72
LSTM            76
BiLSTM          90
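The metrics reported in Table 2 follow from the entries of a binary confusion matrix. The sketch below shows the standard definitions; the counts in the usage example are hypothetical and are not the chapter's confusion matrix, which is not given.

```python
def classification_metrics(tp, fp, tn, fn):
    """Standard binary-classification metrics from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    recall = tp / (tp + fn)            # sensitivity, true positive rate
    precision = tp / (tp + fp)
    specificity = tn / (tn + fp)       # true negative rate
    f_measure = 2 * precision * recall / (precision + recall)
    return {
        "accuracy": accuracy,
        "recall": recall,
        "precision": precision,
        "f_measure": f_measure,
        "specificity": specificity,
    }


# Hypothetical counts, for illustration only.
metrics = classification_metrics(tp=40, fp=5, tn=45, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```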
5 Conclusion

A method for detecting depression among Twitter users by reducing the misclassification rate is proposed. An enhanced algorithm for efficient classification of negative tweets, called the Neutral Negative Scoring algorithm, is suggested. An emoticon dictionary is constructed, and the truly negative tweets are identified. The tweet history of the corresponding users is analyzed using three different networks: RNN, LSTM and BiLSTM. It is found that BiLSTM performs better than the other two networks. A recommendation system is built to recommend positive messages to the user. In the future, depressed users can be grouped by location, and support groups can be started in those areas. The proposed system can be expanded to detect other mental health disorders across various social media platforms. It can also be modified to identify a particular group, such as people who experience anxiety and depression due to withdrawal from drugs and alcohol. Identifying specific cases will help in providing customized help and support.
References
1. 5 Big Social Media Predictions for 2019, in Emarsys (2020). https://www.emarsys.com/resources/blog/top-5-social-media-predictions-2019/
2. Mental disorders, in World Health Organization. https://www.who.int/news-room/fact-sheets/detail/mental-disorders
3. Depression, in World Health Organization. https://www.who.int/news-room/fact-sheets/detail/depression
4. M. Cepoiu, J. Mccusker, M.G. Cole et al., Recognition of depression by non-psychiatric physicians-a systematic literature review and meta-analysis. J. Gen. Int. Med. 23, 25–36 (2007). https://doi.org/10.1007/s11606-007-0428-5
5. W. Medhat, A. Hassan, H. Korashy, Sentiment analysis algorithms and applications: a survey. Ain Shams Eng. J. 5, 1093–1113 (2014). https://doi.org/10.1016/j.asej.2014.04.011
6. L. Canales, W. Daelemans, E. Boldrini, P. Martinez-Barco, EmoLabel: semi-automatic methodology for emotion annotation of social media text. IEEE Trans. Affect. Comput. 1–1 (2019). https://doi.org/10.1109/taffc.2019.2927564
7. L. Wang, J. Niu, S. Yu, SentiDiff: combining textual information and sentiment diffusion patterns for Twitter sentiment analysis. IEEE Trans. Knowl. Data Eng. 32, 2026–2039 (2020). https://doi.org/10.1109/tkde.2019.2913641
8. S. Akhtar, D. Ghosal, A. Ekbal et al., All-in-one: emotion, sentiment and intensity prediction using a multi-task ensemble framework. IEEE Trans. Affect. Comput. 1–1 (2020). https://doi.org/10.1109/taffc.2019.2926724
9. M.E. Larsen, T.W. Boonstra, P.J. Batterham et al., We feel: mapping emotion on Twitter. IEEE J. Biomed. Health Inform. 19, 1246–1252 (2015). https://doi.org/10.1109/jbhi.2015.2403839
10. H. Liu, P. Burnap, W. Alorainy, M.L. Williams, A fuzzy approach to text classification with two-stage training for ambiguous instances. IEEE Trans. Comput. Soc. Syst. 6, 227–240 (2019). https://doi.org/10.1109/tcss.2019.2892037
11. R.L. Rosa, D.Z. Rodriguez, G. Bressan, Music recommendation system based on user's sentiments extracted from social networks. IEEE Trans. Consumer Electron. 61, 359–367 (2015). https://doi.org/10.1109/tce.2015.7298296
12. R.L. Rosa, G.M. Schwartz, W.V. Ruggiero, D.Z. Rodriguez, A knowledge-based recommendation system that includes sentiment analysis and deep learning. IEEE Trans. Ind. Inform. 15, 2124–2135 (2019). https://doi.org/10.1109/tii.2018.2867174
13. H. Ning, S. Dhelim, N. Aung, PersoNet: friend recommendation system based on big-five personality traits and hybrid filtering. IEEE Trans. Comput. Soc. Syst. 6, 394–402 (2019). https://doi.org/10.1109/tcss.2019.2903857
14. C.J. Hutto, E. Gilbert, VADER: a parsimonious rule-based model for sentiment analysis of social media text, in Proceedings of the 8th International Conference on Weblogs and Social Media, ICWSM 2014 (2015)
15. S. Loria, M. Honnibal, P. Keen et al., Simplified text processing, in TextBlob. https://textblob.readthedocs.io/en/dev/
16. S. Baccianella, A. Esuli, F. Sebastiani, SentiWordNet 3.0: an enhanced lexical resource for sentiment analysis and opinion mining, in International Conference on Language Resources and Evaluation (2010), pp. 2200–2204
17. Documentation Home | Docs | Twitter Developer, in Twitter. https://dev.twitter.com/rest/public
18. S. Ghosh, D. Gunning, Natural Language Processing Fundamentals: Build Intelligent Applications That Can Interpret the Human Language to Deliver Impactful Results (Packt Publishing, Birmingham, 2019)
19. B. Liu, L. Zhang, A survey of opinion mining and sentiment analysis. Mining Text Data 415–463 (2012)
Chapter 11
Artificial Neural Network Training Using Marine Predators Algorithm for Medical Data Classification Jayri Bagchi and Tapas Si
1 Introduction

An artificial neural network (ANN) [1] is a mathematical information-processing model inspired by the way the brain processes information. The processing units of an ANN are called nodes and model the neurons of the biological brain, while the connections between the layers mimic its synapses. The connections are called edges, and each edge has a weight associated with it. The output is generated by performing mathematical computations on the input and hidden layers of the ANN. The learning algorithm adjusts the weights by minimising the error between the output predicted by the ANN and the target output.

Learning algorithms are broadly of two types: mathematical (traditional) ones and stochastic ones. Classical algorithms such as backpropagation (BP) [1] and Levenberg-Marquardt (LM) [1] fall in the first category, whereas swarm intelligence algorithms and genetic algorithms belong to the second. ANNs are generally trained with backpropagation and other mathematical methods, but these methods have a serious limitation: they easily get trapped in local optima and depend heavily on the initialisation, which limits how well they predict outcomes for new input patterns. Gradient descent is also slow, because stable learning requires small learning rates. Given these limitations, stochastic approaches such as metaheuristic algorithms serve as an alternative for ANN training. Stochastic approaches rely heavily on randomness and hence can adapt better to new input patterns. Their advantage is the ability to avoid local optima, but they are much slower than traditional algorithms such as BP and LM.

The literature shows that various stochastic algorithms have been used in ANN training, specifically in the context of medical data classification. Metaheuristic algorithms have also been found to be more efficient when little information about the problem is available and the search space is large and complex, and these challenges are common in medical datasets. Various swarm intelligence algorithms such as the bio-geography-based optimizer [2], grey wolf optimizer [3], multi-verse optimizer [4], whale optimization algorithm [5], salp swarm algorithm [6] and sine-cosine algorithm [7] have been used for neural network training.

Kononenko [8] provides an overview of the application of intelligent systems in medical data analysis from the perspective of machine learning, and compares some state-of-the-art systems applied to medical diagnostics. Kalantari et al. [9] discussed various computational intelligence methods for the medical data classification task. They reported that SVM and AIRS gave the best results as single methods, while hybridization of SVM with other methods such as GA, AIS, AIRS and fuzzy networks gave the best results in terms of accuracy, sensitivity and specificity. Medical data often suffer from class imbalance. Mazurowski et al. [10] studied the consequences of class imbalance when training a neural network for classification. BP and PSO were used in the study, and the results showed that BP is more reliable than PSO when dealing with class imbalance. Si et al. [11] proposed a variant of the differential evolution (DE) algorithm as a neural network training algorithm, and the results revealed that the variant performed better than the classical DE algorithm. Shen et al. [12] applied the fruit-fly optimization algorithm with SVM (FOA-SVM) and showed that it performed better than PSO-SVM, GA-SVM, BFO-SVM and Grid-SVM. Askar et al. [13] applied recurrent neural networks (RNNs) to medical data analysis and classification and found that, since RNNs can remember their past behaviour and have several other advantages, they are a reliable approach for medical data analysis tasks. Huang et al. [14] applied a transformed FNN training approach to boost the accuracy of classification tasks. Zhu et al. [15] proposed class weights random forest, a novel method for medical data with class imbalance. Dutta et al. [16] applied the fireworks algorithm as the training algorithm in neural network training for medical data classification. Si et al. [17] used a variant of PSO, called partial opposition-based PSO (POPSO), as the optimization algorithm in neural network training for medical data classification.

To the best of the authors' knowledge, the recently developed metaheuristic marine predators algorithm (MPA) [18] has not been applied to neural network training for classification tasks. In this paper, MPA is used in ANN training for medical data classification. Ten medical datasets are used to validate its performance, and a comparative analysis is conducted with LM and PSO. The experimental results, analysed with multi-criteria decision analysis (MCDA) [19], verify that MPA outperforms both LM and PSO in medical data classification tasks.

J. Bagchi · T. Si (B)
Department of Computer Science and Engineering, Bankura Unnayani Institute of Engineering, Bankura, West Bengal 722146, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022
R. Tiwari et al. (eds.), Proceedings of International Conference on Computational Intelligence, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-16-3802-2_11
1.1 Marine Predators Algorithm

In this work, a multilayer perceptron (MLP) is trained using the marine predators algorithm (MPA). MPA is designed by observing the foraging behaviour of ocean predators and mimics the prey and predator behaviours in marine ecosystems. The main inspiration behind the algorithm is two types of movement, Lévy and Brownian. Lévy flight mainly consists of tiny steps with rare long jumps, so it contributes to local search, i.e. exploitation of the search space. Brownian movement, on the other hand, is associated with longer steps that trace larger distances, resulting in better exploration. Hence, neither Lévy nor Brownian movement alone gives optimum results when applied to an optimization problem, and MPA combines both for different phases of the encounter between predator and prey.

Brownian movement is a stochastic process whose step size is drawn from a Gaussian distribution with mean $\mu = 0$ and variance $\sigma^2 = 1$. The probability density function at point $x$ for Brownian movement is defined as follows:

$$f(x) = \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{x^{2}}{2}\right) \tag{1}$$

Random numbers following the Lévy distribution are generated as follows:

$$\mathrm{Levy}(\alpha) = 0.05 \times \frac{x}{|y|^{1/\alpha}} \tag{2}$$

where $x$ and $y$ are normally distributed variables:

$$x = \mathrm{Normal}(0, \sigma_x^{2}) \tag{3}$$

$$y = \mathrm{Normal}(0, \sigma_y^{2}) \tag{4}$$

where $\sigma_x$ and $\sigma_y$ are the standard deviations of $x$ and $y$, respectively. Here $\sigma_x$ is calculated as:

$$\sigma_x = \left[\frac{\Gamma(1+\alpha)\,\sin\left(\frac{\pi\alpha}{2}\right)}{\Gamma\left(\frac{1+\alpha}{2}\right)\,\alpha\,2^{\frac{\alpha-1}{2}}}\right]^{1/\alpha} \tag{5}$$

with $\sigma_y = 1$ and $\alpha = 1.5$, where $\Gamma$ is the gamma function.

Coming to the formulation of MPA, as in other metaheuristic approaches, the initial solution is randomly initialized with a uniform distribution over the search space $[X_{\min}, X_{\max}]$ as:

$$X_0 = X_{\min} + \mathrm{rand}\,(X_{\max} - X_{\min}) \tag{6}$$

where rand is a random number in the interval [0, 1].

Two important matrices are considered: Elite and Prey. Both have the same dimensions, $n \times d$, where $n$ is the number of search agents and $d$ is the dimension of the problem. The initial solution forms the Prey matrix, whereas the fittest of all the agents constitutes the Elite matrix. Note that both predator and prey are search agents, since both are in search of food. The MPA optimization process is divided into three parts:

1. When the prey moves faster than the predator, called the high velocity ratio,
2. When prey and predator move at almost the same speed, called the unit velocity ratio, and
3. When the predator moves faster than the prey, called the low velocity ratio.

This optimization process mimics what actually happens in nature. The phases of the optimization process are as follows.

Phase-I: This scenario concerns the initial iterations, where the prey moves faster than the predator and exploration matters. Here the best plan of action for the predator is to remain static, while the prey moves in Brownian motion, as Brownian flight guarantees better exploration.

while $\mathrm{Iter} < \frac{1}{3} \times \mathrm{Maxiter}$ do
    $\mathrm{stepsize}_i = R_B \times (\mathrm{Elite}_i - R_B \times \mathrm{Prey}_i), \quad i = 1, 2, \ldots, n$
    $\mathrm{Prey}_i = \mathrm{Prey}_i + P \cdot R \times \mathrm{stepsize}_i$
end

where $R_B$ is a vector of random numbers drawn from the normal distribution, representing the Brownian movement, $P = 0.5$, and $R$ is a random vector in [0, 1]. This phase occurs when the step size or velocity is high, in the first third of the iterations. Iter is the current iteration and Maxiter is the total number of iterations.

Phase-II: In this phase both exploration and exploitation take place, with the predator in charge of exploration and the prey responsible for exploitation. This phase mimics the natural phenomenon in which both predator and prey look for food and hence move at the same pace. If the prey progresses in Lévy motion, the best alternative for the predator is to move in Brownian motion. For the first half of the population:

while $\frac{1}{3} \times \mathrm{Maxiter} < \mathrm{Iter} < \frac{2}{3} \times \mathrm{Maxiter}$ do
    $\mathrm{stepsize}_i = R_L \times (\mathrm{Elite}_i - R_L \times \mathrm{Prey}_i), \quad i = 1, 2, \ldots, n/2$
    $\mathrm{Prey}_i = \mathrm{Prey}_i + P \cdot R \times \mathrm{stepsize}_i$
end

where $R_L$ is a vector of random numbers drawn from the Lévy distribution, representing the Lévy movement. The multiplication of $R_L$ with the prey vector replicates the movement of the prey in the Lévy manner, whereas adding the step size to the prey position simulates the prey movement. Since Lévy flight mostly consists of small steps, it helps in better exploitation.
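Equations (2) to (5) can be checked numerically with a short sketch that uses only the Python standard library. The function names are illustrative, and the guard against a zero denominator is an added safety measure, not part of the formulation.

```python
import math
import random


def levy_sigma_x(alpha=1.5):
    """Standard deviation sigma_x from Eq. (5)."""
    num = math.gamma(1 + alpha) * math.sin(math.pi * alpha / 2)
    den = math.gamma((1 + alpha) / 2) * alpha * 2 ** ((alpha - 1) / 2)
    return (num / den) ** (1 / alpha)


def levy_step(rng, alpha=1.5):
    """One Lévy-distributed random number following Eqs. (2)-(4)."""
    x = rng.gauss(0.0, levy_sigma_x(alpha))
    y = rng.gauss(0.0, 1.0) or 1e-12   # guard against an exact zero (added safety)
    return 0.05 * x / abs(y) ** (1 / alpha)


def brownian_step(rng):
    """One Brownian step: standard normal, as in Eq. (1)."""
    return rng.gauss(0.0, 1.0)


rng = random.Random(0)
print(round(levy_sigma_x(1.5), 4))   # about 0.6966 for alpha = 1.5
print(levy_step(rng), brownian_step(rng))
```

Because the denominator $|y|^{1/\alpha}$ can be close to zero, the Lévy generator occasionally produces very large values; these are the rare long jumps mentioned in the text.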
For the second half of the population:

    $\mathrm{stepsize}_i = R_B \times (R_B \times \mathrm{Elite}_i - \mathrm{Prey}_i), \quad i = n/2, \ldots, n$
    $\mathrm{Prey}_i = \mathrm{Elite}_i + P \cdot CF \times \mathrm{stepsize}_i$

where

$$CF = \left(1 - \frac{\mathrm{Iter}}{\mathrm{Maxiter}}\right)^{\left(2\,\frac{\mathrm{Iter}}{\mathrm{Maxiter}}\right)} \tag{7}$$

CF is an adaptive parameter that controls the step size of the predator's movement. Multiplying $R_B$ by Elite simulates the movement of the predator in Brownian motion, while the prey updates its position based on the predator's Brownian movement.

Phase-III: This phase corresponds to the low velocity ratio, in which the predator moves faster than the prey.

while $\mathrm{Iter} > \frac{2}{3} \times \mathrm{Maxiter}$ do
    $\mathrm{stepsize}_i = R_L \times (R_L \times \mathrm{Elite}_i - \mathrm{Prey}_i), \quad i = 1, 2, \ldots, n$
    $\mathrm{Prey}_i = \mathrm{Elite}_i + P \cdot CF \times \mathrm{stepsize}_i$
end

Multiplying $R_L$ by Elite simulates the movement of the predator in the Lévy strategy, and adding the step size to the Elite position updates the position of the prey.

Eddy formation and FADs' effect: fish aggregating devices (FADs) are tools used to attract ocean fish. They are an environmental factor, because sharks have been found to spend more than 80% of their time in the vicinity of FADs. Mathematically, eddy formation and the FADs' effect are treated as local optima, and marine creatures take long jumps during the simulation to avoid getting trapped in them. MPA avoids getting trapped in local optima by the Lévy strategy of long jumps. The FADs effect is mathematically modelled as:

$$\mathrm{Prey}_i = \begin{cases} \mathrm{Prey}_i + CF\,[X_{\min} + R \times (X_{\max} - X_{\min})] \times U & \text{if } r \le P_d \\ \mathrm{Prey}_i + [\mathrm{FADs}\,(1 - r) + r]\,(\mathrm{Prey}_{r1} - \mathrm{Prey}_{r2}) & \text{if } r > P_d \end{cases} \tag{8}$$

where $P_d = \mathrm{FADs} = 0.2$ is the probability of the FADs effect in the optimization process. The binary vector $U$ contains 1s and 0s; it is formed by generating a random vector in [0, 1] and setting each entry to 0 if it is less than 0.2 and to 1 otherwise. $r$ is a random number in [0, 1], and $r1$ and $r2$ are random indexes of the Prey matrix.
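The three phases, the CF schedule of Eq. (7) and the FADs effect of Eq. (8) can be put together in a compact sketch. It is written under several assumptions that go beyond the text above: a single Elite vector (the best agent so far) stands in for the Elite matrix, positions are clipped to the bounds, and a candidate position is kept only when it improves the agent's fitness, so the best fitness never gets worse. All function and parameter names are illustrative, and the sphere function is only a toy objective.

```python
import math
import random


def levy(rng, alpha=1.5):
    """Lévy-distributed step following Eqs. (2)-(5)."""
    sigma_x = (math.gamma(1 + alpha) * math.sin(math.pi * alpha / 2)
               / (math.gamma((1 + alpha) / 2) * alpha * 2 ** ((alpha - 1) / 2))) ** (1 / alpha)
    x = rng.gauss(0, sigma_x)
    y = rng.gauss(0, 1) or 1e-12
    return 0.05 * x / abs(y) ** (1 / alpha)


def mpa(objective, dim, lo, hi, n=30, max_iter=100, P=0.5, fads=0.2, seed=0):
    """Minimise `objective` over [lo, hi]^dim; returns (best, best_fit, history)."""
    rng = random.Random(seed)
    prey = [[lo + rng.random() * (hi - lo) for _ in range(dim)] for _ in range(n)]  # Eq. (6)
    fit = [objective(p) for p in prey]
    best = min(range(n), key=fit.__getitem__)
    elite, elite_fit = prey[best][:], fit[best]
    history = [elite_fit]

    def accept(i, cand):
        """Clip to the bounds and keep the candidate only if it improves (assumption)."""
        nonlocal elite, elite_fit
        cand = [min(max(v, lo), hi) for v in cand]
        f = objective(cand)
        if f < fit[i]:
            prey[i], fit[i] = cand, f
            if f < elite_fit:
                elite, elite_fit = cand[:], f

    for it in range(max_iter):
        cf = (1 - it / max_iter) ** (2 * it / max_iter)              # Eq. (7)
        for i in range(n):
            cand = []
            for j in range(dim):
                r, rb, rl = rng.random(), rng.gauss(0, 1), levy(rng)
                if it < max_iter / 3:                                 # Phase I
                    cand.append(prey[i][j] + P * r * rb * (elite[j] - rb * prey[i][j]))
                elif it < 2 * max_iter / 3 and i < n // 2:            # Phase II, first half
                    cand.append(prey[i][j] + P * r * rl * (elite[j] - rl * prey[i][j]))
                elif it < 2 * max_iter / 3:                           # Phase II, second half
                    cand.append(elite[j] + P * cf * rb * (rb * elite[j] - prey[i][j]))
                else:                                                 # Phase III
                    cand.append(elite[j] + P * cf * rl * (rl * elite[j] - prey[i][j]))
            accept(i, cand)
            r = rng.random()                                          # FADs effect, Eq. (8)
            if r <= fads:
                cand = [prey[i][j] + cf * (lo + rng.random() * (hi - lo)) * (rng.random() >= fads)
                        for j in range(dim)]
            else:
                r1, r2 = rng.randrange(n), rng.randrange(n)
                cand = [prey[i][j] + (fads * (1 - r) + r) * (prey[r1][j] - prey[r2][j])
                        for j in range(dim)]
            accept(i, cand)
        history.append(elite_fit)
    return elite, elite_fit, history


def sphere(v):
    """Toy objective: sum of squares, minimum 0 at the origin."""
    return sum(x * x for x in v)


best, best_fit, history = mpa(sphere, dim=3, lo=-5.0, hi=5.0, n=10, max_iter=30, seed=1)
print(best_fit, history[0])
```

The greedy acceptance step makes `history` non-increasing by construction, which is convenient for checking the sketch but is a design choice of this example rather than something stated in the section.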
2 ANN Training Using MPA

ANN is a mathematical model that mimics the functioning of the human brain. An ANN consists of a network of interconnected nodes that, upon receiving a set of inputs and the corresponding weights, produce an output as a result of “activations” mimicking the synapses of the human brain. The basic architecture of an ANN consists of three layers: the input layer, the hidden layer and the output layer. Each processing node, except those in the input layer, computes a weighted sum of the outputs of the preceding layer and passes it through an activation function; the result is passed as input to the next layer. The weighted sum is calculated as:

$$ net_j = \sum_i w_{ij}\, o_i + bias_j \tag{9} $$

and the output as:

$$ o_j = F(net_j) \tag{10} $$

where w_ij is the synaptic weight for the connection edge from node i to node j, o_i is the output of node i in the preceding layer, o_j is the output of node j, and F(.) is the activation function. Here the activation function is the sigmoid function, which is calculated as:

$$ F(net_j) = \frac{1}{1 + e^{-net_j}} \tag{11} $$
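As a small illustration of Eqs. (9) to (11), the output of one layer can be computed as follows. This is a minimal sketch; the function names and the convention that `weights[i][j]` holds the weight from node i of the preceding layer to node j are assumptions made for the example.

```python
import math


def sigmoid(net):
    """Sigmoid activation F(net) = 1 / (1 + exp(-net)), Eq. (11)."""
    return 1.0 / (1.0 + math.exp(-net))


def layer_output(inputs, weights, biases):
    """Outputs of one layer given the outputs of the preceding layer.

    weights[i][j] is the weight of the edge from preceding node i to node j
    (an indexing convention assumed for this sketch).
    """
    outputs = []
    for j, bias in enumerate(biases):
        # Weighted sum of the preceding layer's outputs plus the bias, Eq. (9).
        net = sum(weights[i][j] * o_i for i, o_i in enumerate(inputs)) + bias
        outputs.append(sigmoid(net))  # Eq. (10)
    return outputs
```

With all weights and biases set to zero, every node outputs 0.5, which is a quick sanity check for an implementation.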
In this work, MPA is used to search the weights of the neural network by minimizing the mean squared error (MSE). The number of input nodes equals the number of attributes, and the number of output nodes equals the number of classes. A three-layer MLP having n input nodes, (2n + 1) hidden nodes and m output nodes is used in this work. The MSE is formulated as:

$$ mse = \frac{1}{n \cdot m} \sum_{i=1}^{n} \sum_{j=1}^{m} (t_{ij} - o_{ij})^2 \tag{12} $$
where i is the training input pattern, j is the corresponding output node, o_ij is the output predicted by the ANN for training pattern i, t_ij is the corresponding target output, n is the number of training patterns or samples and m is the number of outputs. For computing the output, a binary 1-of-m encoding is used and a winner-takes-all policy is applied, where the correct output class carries 1 − ε, whereas the other classes carry ε. The dimension of the search space of a marine creature (both predator and prey) in MPA is the total number of weight coefficients including the biases. Each marine creature represents a distinct neural network trained on the training data. At the termination of the optimization process, the best neural network is used to predict the class of the test data.
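The fitness evaluation described above can be sketched as follows: a flat position vector is decoded into the weights of an n-(2n + 1)-m MLP, and the MSE of Eq. (12) against 1-of-m targets is returned. This is a hedged sketch rather than the authors' code. The layout of the flat vector (for each node, its incoming weights followed by its bias), the function names and the value `eps = 0.1` for the small target constant are assumptions.

```python
import math


def sigmoid(net):
    return 1.0 / (1.0 + math.exp(-net))


def weight_count(n_in, n_out):
    """Search-space dimension: all weights plus biases of an n-(2n+1)-m MLP."""
    n_hid = 2 * n_in + 1
    return (n_in + 1) * n_hid + (n_hid + 1) * n_out


def mlp_outputs(position, x, n_out):
    """Decode a flat position vector and run one forward pass on input x."""
    n_hid = 2 * len(x) + 1
    it = iter(position)
    hidden = []
    for _ in range(n_hid):
        # Assumed layout: incoming weights of the node first, then its bias.
        net = sum(next(it) * x_i for x_i in x) + next(it)
        hidden.append(sigmoid(net))
    outputs = []
    for _ in range(n_out):
        net = sum(next(it) * h for h in hidden) + next(it)
        outputs.append(sigmoid(net))
    return outputs


def one_of_m(label, n_out, eps=0.1):
    """1-of-m target: 1 - eps for the correct class, eps elsewhere (eps assumed)."""
    return [1.0 - eps if k == label else eps for k in range(n_out)]


def mse_fitness(position, samples, labels, n_out, eps=0.1):
    """Mean squared error over all patterns and output nodes, Eq. (12)."""
    total = 0.0
    for x, label in zip(samples, labels):
        target = one_of_m(label, n_out, eps)
        output = mlp_outputs(position, x, n_out)
        total += sum((t - o) ** 2 for t, o in zip(target, output))
    return total / (len(samples) * n_out)


def predict(position, x, n_out):
    """Winner-takes-all: index of the output node with the largest activation."""
    outputs = mlp_outputs(position, x, n_out)
    return max(range(n_out), key=outputs.__getitem__)
```

An optimizer such as MPA would call `mse_fitness` once per agent and iteration, with `position` of length `weight_count(n_in, n_out)`.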
3 Results and Discussion

In this work, MPA has been used to train a multi-layer perceptron (MLP): the weights of the neural network are searched using the MPA algorithm. The proposed method has been applied to the classification of 10 medical datasets, namely dermatology, heart, hepatitis, kidney, liver, lung, parkinsons, pima, SPECT and vertebral, collected from [20]. The method is validated using K-fold cross-validation with K = 10. The parameters of MPA are set as follows: number of search agents (N) = 30, FADs = 0.2, P = 0.5, maximum iterations = 100. The parameters of PSO are set as follows: swarm size (N) = 30, wmin = 0.4, wmax = 0.9, c1 = c2 = 1.44945, Vmax = (Xmax − Xmin), maximum iterations = 100. The parameters of LM are set as follows: maximum number of epochs = 2000, initial µ = 0.001, µ increase and decrease factors of 10 and 0.1, respectively, maximum µ = 1e+10.

The performance of the proposed method is measured by the average classification accuracy, sensitivity or true positive rate (TPR), specificity or true negative rate (TNR), precision, geometric mean (GM), F-measure and false positive rate (FPR) [21]. GM measures the trade-off between sensitivity and specificity; hence, the higher the GM value, the better the trade-off between the two. F-measure is the harmonic mean of precision and recall, and thus the higher the F-measure, the better the classification. For comparison, the experiment has been repeated with the LM and PSO algorithms as well. The results for all 10 datasets are given in Tables 1 and 2. Bold-faced results in these tables indicate the better values. For each dataset, ranks have been assigned to all the algorithms. The classification performance of the competing methods is analysed using a multi-criteria decision-making (MCDM) process [22].
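For a binary problem, the measures listed above can be computed from the four confusion counts as in the sketch below. The formulas follow the usual definitions (GM taken as the square root of sensitivity times specificity); the paper itself refers to [21] for the definitions, so the exact variants used by the authors are an assumption here, and the function name is hypothetical.

```python
import math


def classification_measures(tp, fn, tn, fp):
    """Common binary classification measures from confusion counts.

    No guard against empty classes is included; a zero denominator raises
    ZeroDivisionError in this sketch.
    """
    sensitivity = tp / (tp + fn)            # true positive rate (recall)
    specificity = tn / (tn + fp)            # true negative rate
    precision = tp / (tp + fp)
    return {
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "gm": math.sqrt(sensitivity * specificity),
        "f_measure": 2 * precision * sensitivity / (precision + sensitivity),
        "fpr": fp / (fp + tn),              # equals 1 - specificity
    }
```

In a 10-fold setting, these values would be computed per fold and then averaged, which is how a mean and a standard deviation per measure arise.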
In this work, the performance of the methods is analysed using the well-known TOPSIS (Technique for Order of Preference by Similarity to Ideal Solution) method, a multi-criteria decision analysis method for ranking alternatives according to multiple criteria. It is widely used and has applications in various fields. Here, the multiple criteria are the different performance measures, and the ranking is done according to this method. From the rank assigned to each algorithm for every dataset, it is observed that MPA achieves rank 1 for 6 out of 10 datasets, which shows that MPA outperforms the other methods, PSO and LM, in classification. For the datasets heart, liver, pima, SPECT and vertebral, MPA has higher classification accuracy than LM and PSO. Considering each dataset in turn, the results of MPA are better than those of LM and PSO on all the criteria for the heart dataset. For the lung dataset, PSO has better sensitivity than MPA and LM, whereas LM has better specificity than MPA and PSO; LM also has the lowest FPR among all the methods for the lung dataset. For the pima dataset, PSO has higher specificity than MPA and LM, and PSO has the lowest FPR as well. For the SPECT dataset, LM has higher specificity and a lower FPR than the other methods. For the vertebral dataset, MPA has the best results among all the methods on all the criteria. For the hepatitis dataset, MPA and LM have performed equally well, but according to the TOPSIS ranking, MPA holds rank 1. For the datasets dermatology, kidney, lung and parkinsons, LM has
Table 1 Mean and standard deviation of measures for dermatology, heart, hepatitis, kidney, liver and lung

Dataset      Measure      MPA              PSO              LM
Dermatology  Accuracy     98.63 ± 0.0144   88.57 ± 0.1101   100 ± 0
             Sensitivity  99.26 ± 0.0156   96.37 ± 0.0393   100 ± 0
             Specificity  97.49 ± 0.0420   69.49 ± 0.3856   100 ± 0
             Precision    98.79 ± 0.0196   89.89 ± 0.1161   100 ± 0
             GM           98.34 ± 0.0205   72.59 ± 0.3883   100 ± 0
             F-measure    99.00 ± 0.0106   92.51 ± 0.0638   100 ± 0
             FPR          2.51 ± 0.0420    30.51 ± 0.3856   0 ± 0
             Rank         2                3                1
Heart        Accuracy     81.86 ± 0.0929   75.89 ± 0.1008   73.97 ± 0.0445
             Sensitivity  75.98 ± 0.1159   70.10 ± 0.1565   75.63 ± 0.0905
             Specificity  87.24 ± 0.1033   81.59 ± 0.1259   72.09 ± 0.0637
             Precision    82.74 ± 0.1536   75.28 ± 0.1920   71.69 ± 0.0583
             GM           81.23 ± 0.0956   75.05 ± 0.1083   73.59 ± 0.0461
             F-measure    78.69 ± 0.1180   71.58 ± 0.1453   71.60 ± 0.0554
             FPR          12.76 ± 0.1033   18.41 ± 0.1259   24.38 ± 0.0637
             Rank         1                2                3
Hepatitis    Accuracy     100 ± 0          96.71 ± 0.0646   100 ± 0
             Sensitivity  100 ± 0          87.64 ± 0.0499   100 ± 0
             Specificity  100 ± 0          96.25 ± 0.1186   100 ± 0
             Precision    100 ± 0          97 ± 0.0949      100 ± 0
             GM           100 ± 0          96.69 ± 0.0669   100 ± 0
             F-measure    100 ± 0          96.98 ± 0.0577   100 ± 0
             FPR          0 ± 0            3.75 ± 0.1186    0 ± 0
             Rank         1                3                2
Kidney       Accuracy     96.75 ± 0.0265   87.75 ± 0.1017   96 ± 0.0357
             Sensitivity  97.11 ± 0.0410   83.01 ± 0.1471   99.33 ± 0.0573
             Specificity  95.86 ± 0.0603   96.32 ± 0.1165   94 ± 0.0211
             Precision    97.69 ± 0.0313   97.20 ± 0.0885   99.60 ± 0.0126
             GM           96.40 ± 0.0320   88.84 ± 0.1017   96.58 ± 0.0308
             F-measure    97.33 ± 0.0236   88.59 ± 0.1054   96.63 ± 0.0317
             FPR          4.14 ± 0.0603    3.68 ± 0.1165    0.66 ± 0.0211
             Rank         2                3                1
Liver        Accuracy     71.00 ± 0.0582   59.11 ± 0.0315   65.52 ± 0.0838
             Sensitivity  86.18 ± 0.0908   96.23 ± 0.0520   56.57 ± 0.1358
             Specificity  51.29 ± 0.1307   7.98 ± 0.0894    72.00 ± 0.1192
             Precision    71.19 ± 0.0809   59.11 ± 0.0447   69.58 ± 0.0626
             GM           65.83 ± 0.0783   20.40 ± 0.1898   63.09 ± 0.0828
             F-measure    77.37 ± 0.0425   73.03 ± 0.0294   70.35 ± 0.0839
             FPR          48.71 ± 0.1307   92.01 ± 0.0894   43.43 ± 0.1192
             Rank         1                3                2
Lung         Accuracy     82.50 ± 0.1942   81.66 ± 0.2540   74.17 ± 0.2648
             Sensitivity  92.50 ± 0.1687   93.33 ± 0.2108   10.00 ± 0.2540
             Specificity  20.00 ± 0.4216   10.00 ± 0.3162   81.66 ± 0.3162
             Precision    90.00 ± 0.1610   88.33 ± 0.1933   86.66 ± 0.1851
             GM           20.00 ± 0.4216   10.00 ± 0.3162   10.00 ± 0.3162
             F-measure    89.24 ± 0.1228   87.66 ± 0.1792   82.48 ± 0.1989
             FPR          30.00 ± 0.4830   30.00 ± 0.4830   40.00 ± 0.5164
             Rank         2                3                1
performed better than both MPA and PSO, hence attaining rank 1. On closer inspection, however, MPA has recorded higher accuracy, specificity and F-measure than LM for the kidney dataset. For the lung dataset, MPA also has higher sensitivity, precision, GM and F-measure than LM, as well as a lower FPR. Despite all this, MPA secures only rank 2 for these datasets according to the TOPSIS evaluation. One more important observation from Tables 1 and 2 is that most of the highest GM values are achieved by MPA, irrespective of its rank. This suggests that MPA is the most reliable of the methods for datasets that have a class imbalance problem.

The experiments are conducted on a laptop PC with an Intel i3-4005U 1.70 GHz CPU, 4 GB RAM, the Windows 7 operating system and MATLAB R2018a. The computational times for all the methods and datasets are given in Table 3. Bold-faced results in this table indicate the better values. The table shows that PSO is faster than MPA on all ten datasets and that LM is faster than MPA on six of them (all except liver, lung, pima and vertebral), so MPA is the slower method in most cases. The reason is that the optimization process of MPA consists of several different phases. It is important to note that an algorithm is efficient when it takes less time and effective when it gives better results. So LM and PSO are efficient algorithms, whereas MPA is effective. In the context of medical data classification, effectiveness is more important than efficiency [17]. Hence, MPA is more effective than PSO and LM.

There are certain challenges in MLP training, one of the most significant being that the training gets stuck in local optima. Since PSO tends to get stuck in local optima and has a slow convergence speed, and hence poor exploitation of the search space, it does not perform well in MLP training and gives the worst results among all the methods.
LM also has the limitations of low convergence speed and getting stuck in local optima. MPA, by contrast, performs better than the other methods because of its efficient exploration and exploitation capability, which keeps it from getting stuck in local optima. The average ranks of the algorithms are as follows: MPA = 1.4, LM = 1.9 and PSO = 2.7. MPA has achieved the lowest average rank (lower is better) among all the methods, which shows its better performance in MLP training for classification. The next rank is assigned to LM, whereas PSO has the highest
Table 2 Mean and standard deviation of measures for Parkinsons, Pima, SPECT and Vertebral

Dataset      Measure      MPA              PSO              LM
Parkinsons   Accuracy     87.16 ± 0.0739   78.47 ± 0.0939   93.34 ± 0.0482
             Sensitivity  96.05 ± 0.0701   95.46 ± 0.0688   94.62 ± 0.0527
             Specificity  58.89 ± 0.2419   27.09 ± 0.2129   90.00 ± 0.1054
             Precision    87.41 ± 0.0808   80.08 ± 0.0876   96.61 ± 0.0358
             GM           73.34 ± 0.1581   46.42 ± 0.2271   92.11 ± 0.0602
             F-measure    91.28 ± 0.0598   6.79 ± 0.0623    95.51 ± 0.0330
             FPR          41.10 ± 0.2419   72.90 ± 0.2129   10.00 ± 0.1054
             Rank         2                3                1
Pima         Accuracy     76.57 ± 0.0394   70.69 ± 0.0471   69.41 ± 0.0535
             Sensitivity  59.59 ± 0.0693   38.57 ± 0.1711   55.98 ± 0.1204
             Specificity  85.99 ± 0.0537   87.49 ± 0.0684   76.60 ± 0.0462
             Precision    69.47 ± 0.1095   62.03 ± 0.1236   56.04 ± 0.0778
             GM           71.43 ± 0.0432   55.86 ± 0.1451   65.14 ± 0.0736
             F-measure    63.61 ± 0.0599   45.57 ± 0.1618   55.74 ± 0.0906
             FPR          14.00 ± 0.0537   12.50 ± 0.0684   23.40 ± 0.0462
             Rank         1                2                3
SPECT        Accuracy     83.16 ± 0.0726   82.81 ± 0.0940   71.56 ± 0.0739
             Sensitivity  93.19 ± 0.0755   93.17 ± 0.0524   77.86 ± 0.0734
             Specificity  45.92 ± 0.2267   44.17 ± 0.2025   47.66 ± 0.1618
             Precision    86.99 ± 0.0571   86.10 ± 0.0909   85.14 ± 0.0489
             GM           63.50 ± 0.1691   2.24 ± 0.1625    60.25 ± 0.1090
             F-measure    89.72 ± 0.0447   89.29 ± 0.0637   81.19 ± 0.0524
             FPR          54.08 ± 0.2267   55.83 ± 0.2025   52.33 ± 0.1618
             Rank         1                2                3
Vertebral    Accuracy     84.84 ± 0.0730   76.45 ± 0.1226   78.71 ± 0.0573
             Sensitivity  75.44 ± 0.1937   53.05 ± 0.2809   65.00 ± 0.1354
             Specificity  89.51 ± 0.0726   88.30 ± 0.1146   85.24 ± 0.0690
             Precision    78.23 ± 0.1697   69.40 ± 0.2357   68.77 ± 0.1032
             GM           81.39 ± 0.1127   65.29 ± 0.2042   73.97 ± 0.0757
             F-measure    75.49 ± 0.1459   56.83 ± 0.2360   66.04 ± 0.0909
             FPR          10.49 ± 0.0726   11.69 ± 0.1146   14.76 ± 0.0690
             Rank         1                3                2
Table 3 Computational Time (in minutes)

Dataset      MPA           PSO           LM
Dermatology  0.352633111   0.235895261   0.075953261
Heart        0.187475712   0.111206734   0.046460269
Hepatitis    0.120615444   0.101073027   0.024571935
Kidney       0.302532212   0.175763422   0.052458717
Liver        0.177898128   0.106027919   0.377328322
Lung         0.226026435   0.148709325   0.377282836
Parkinsons   0.154721245   0.092755633   0.078887972
Pima         0.383404117   0.225505158   0.651820405
SPECT        0.200400436   0.125682233   0.058752772
Vertebral    0.16451297    0.099602586   0.397668625
rank. Hence, MPA performs better than PSO and LM in MLP training for classification tasks. The main reason is that MPA overcomes the limitations of both PSO and LM, namely getting stuck in local optima and slow convergence speed, which are undesirable in MLP training. MPA combines the effects of Lévy and Brownian movements for better exploitation and exploration of the search space, respectively, which results in better ANN training and hence better classification of the medical data.
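The TOPSIS ranking used in this section can be sketched as follows. The paper does not state the criterion weights or the normalisation, so equal weights and vector normalisation are assumptions of this sketch, as are the function names. The example applies it to the mean values reported for the heart dataset, with FPR treated as a cost criterion and all other measures as benefit criteria.

```python
import math


def topsis(matrix, benefit, weights=None):
    """Closeness score of each alternative (row); a higher score is better.

    matrix[a][c] is the value of alternative a on criterion c, and
    benefit[c] is True if larger values are better for criterion c.
    Assumes no criterion column is all zeros and the rows are not all equal.
    """
    n_crit = len(matrix[0])
    if weights is None:
        weights = [1.0 / n_crit] * n_crit  # assumption: equal weights
    # Vector-normalise every criterion column and apply its weight.
    norms = [math.sqrt(sum(row[c] ** 2 for row in matrix)) for c in range(n_crit)]
    v = [[weights[c] * row[c] / norms[c] for c in range(n_crit)] for row in matrix]
    columns = list(zip(*v))
    best = [max(col) if benefit[c] else min(col) for c, col in enumerate(columns)]
    worst = [min(col) if benefit[c] else max(col) for c, col in enumerate(columns)]
    scores = []
    for row in v:
        d_best = math.dist(row, best)    # distance to the ideal solution
        d_worst = math.dist(row, worst)  # distance to the anti-ideal solution
        scores.append(d_worst / (d_best + d_worst))
    return scores


def ranks(scores):
    """Rank 1 goes to the highest closeness score."""
    order = sorted(range(len(scores)), key=lambda k: -scores[k])
    result = [0] * len(scores)
    for position, k in enumerate(order, start=1):
        result[k] = position
    return result


# Heart dataset means: accuracy, sensitivity, specificity, precision, GM, F-measure, FPR.
heart = [
    [81.86, 75.98, 87.24, 82.74, 81.23, 78.69, 12.76],  # MPA
    [75.89, 70.10, 81.59, 75.28, 75.05, 71.58, 18.41],  # PSO
    [73.97, 75.63, 72.09, 71.69, 73.59, 71.60, 24.38],  # LM
]
benefit = [True, True, True, True, True, True, False]
heart_ranks = ranks(topsis(heart, benefit))  # order: MPA, PSO, LM
```

Under these assumptions the heart example reproduces the ordering reported in the tables (MPA first, then PSO, then LM); other datasets may depend more strongly on the weighting actually used by the authors.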
4 Conclusions

In this work, MPA has been used to search the weights of an ANN for medical data classification. The proposed method has been applied to the classification of 10 benchmark medical datasets. A comparative study has also been conducted with LM and PSO, and the results are analysed using an MCDM method. The results show that MPA performs better than both methods on most of the datasets owing to its better ability to explore and exploit the search space. MPA also attains the best (lowest) average rank in the MCDM process and hence the best performance among all the methods. Future research will be directed to the further development of MPA and its application to the training of ANNs with more complex architectures for medical data classification.
References

1. S. Haykin, Neural Networks and Learning Machines, 3rd ed. (PHI, 2011)
2. S. Mirjalili, S.M. Mirjalili, A. Lewis, Let a biogeography-based optimizer train your multi-layer perceptron. Inf. Sci. 269, 188–209 (2014)
3. S. Mirjalili, How effective is the Grey Wolf optimizer in training multi-layer perceptrons. Appl. Intell. 43, 150–161 (2015)
4. H. Faris, I. Aljarah, S. Mirjalili, Training feedforward neural networks using multi-verse optimizer for binary classification problems. Appl. Intell. 45, 322–323 (2016)
5. I. Aljarah, H. Faris, S. Mirjalili, Optimizing connection weights in neural networks using the whale optimization algorithm. Soft Comput. 22, 1–15 (2018)
6. D. Bairathi, D. Gopalani, Numerical optimization and feed-forward neural networks training using an improved optimization algorithm: multiple leader salp swarm algorithm. Evol. Intell. (2019). https://doi.org/10.1007/s12065-019-00269-8
7. S. Gupta, K. Deep, A novel hybrid sine cosine algorithm for global optimization and its application to train multilayer perceptrons. Appl. Intell. 50, 993–1026 (2020). https://doi.org/10.1007/s10489-019-01570-w
8. I. Kononenko, Machine learning for medical diagnosis. Artif. Intell. Med. 23, 89–109 (2001)
9. A. Kalantari, A. Kamsin, S. Shamshirband, A. Gani, H.A. Rokny, A.T. Chronopoulos, Computational intelligence approaches for classification of medical data: state-of-the-art, future challenges and research direction. Neurocomputing 276, 2–22 (2018)
10. M.A. Mazurowski, P.A. Habas, J.M. Zurada, J.Y. Lo, J.A. Baker, G.D. Tourassi, Training neural network classifiers for medical decision making: the effects of imbalanced datasets on classification performance. Neural Netw. 21, 427–436 (2008)
11. T. Si, S. Hazra, N.D. Jana, Artificial neural network training using differential evolutionary algorithm for classification, in Proceeding InConINDIA 2012 (Springer-Verlag, Berlin, Heidelberg, 2012), pp. 769–778
12. L. Shen, H. Chen, Z. Yu, W. Kang, B. Zhang, H. Li, B. Yang, D. Liu, Evolving support vector machines using fruit fly optimization for medical data classification. Knowl.-Based Syst. 96, 61–75 (2016)
13. H. Al-Askar, N. Radi, A. MacDermott, Recurrent neural networks in medical data analysis and classifications. Appl. Comput. Med. Health, Emerg. Top. Comput. Sci. Appl. Comput. 7, 147–165 (2016)
14. Y.-P. Huang, A. Singh, S.-I. Liu, S.-I. Wu, H.A. Quoc, A. Sereter, Developing transformed fuzzy neural networks to enhance medical data classification accuracy. Int. J. Fuzzy Syst. 20, 1925–1937 (2018)
15. M. Zhu, J. Xia, X. Jin, M. Yan, G. Cai, J. Yan, G. Ning, Class weights random forest algorithm for processing class imbalanced medical data. IEEE Access 6, 4641–4652 (2018)
16. R.K. Dutta, N.K. Karmakar, T. Si, Artificial neural network training using fireworks algorithm in medical data mining. Int. J. Comput. Appl. (0975–8887) 137(1), 1–5
17. T. Si, R.K. Dutta, Partial opposition-based particle swarm optimizer in artificial neural network training for medical data classification. Int. J. Inf. Technol. Decis. Mak. 18(5), 1717–1750 (2019)
18. A. Faramarzi, M. Heidarinejad, S. Mirjalili, A.H. Gandomi, Marine Predators Algorithm: a nature-inspired metaheuristic. Expert Syst. Appl. 152 (2020)
19. E. Triantaphyllou, Multi-criteria decision making methods: a comparative study, 44 (2000). https://doi.org/10.1007/978-1-4757-3157-6
20. Center for Machine Learning and Intelligent Systems (University of California, Irvine). http://archive.ics.uci.edu/ml/datasets.php
21. A. Tharwat, Classification assessment methods. Appl. Comput. Inf. (2018). https://doi.org/10.1016/j.aci.2018.08.003
22. G. Kou, Y. Lu, Y. Peng, Y. Shi, Evaluation of classification algorithms using MCDM and rank correlation. Int. J. Inf. Technol. Decis. Mak. 11(1), 197–225 (2012)
Chapter 12
Code Generation from Images Using Neural Networks

Chandana Nikam, Rahul Keshervani, Shravani Shah, and Jagannath Aghav
C. Nikam (B) · R. Keshervani · S. Shah · J. Aghav
College of Engineering, Wellesley Rd, Shivajinagar, Pune, Maharashtra 411005, India

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022
R. Tiwari et al. (eds.), Proceedings of International Conference on Computational Intelligence, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-16-3802-2_12

1 Introduction

Designing and developing the graphical user interface (GUI) is an important step in the development of an application. In the current development process, the design team sketches a tentative design on paper or a whiteboard. The blueprint of the user interface is made from this sketch, using tools like Photoshop, and is then sent to developers. The developers try their best to capture the intended look and feel of the application from these mock-up images by writing the code for the GUI. This process is very time-consuming and inefficient. Furthermore, an application is deployed in iterative steps, where designers consider the client interactions and patterns and use their feedback to modify the UI design accordingly. The developer has to spend extra effort in making the required changes iteratively. The purpose of this paper is to automate this process. To simplify the conversion of the input GUI image (design) to the respective React Native code, we have built a model using various techniques of computer vision and deep learning.

Our first contribution is the generation of the resultant code in React Native, due to which the application can be viewed across multiple platforms such as Android, Web and iOS. Using React Native gives us an edge over previous attempts like pix2code, sketch2code, REMAUI and ReDraw. Our second contribution is a novel approach to generate a synthetic dataset consisting of images of UI components like buttons, text, text-inputs, etc. Our approach generates the dataset by automatically taking screenshots of the browser. These screenshots include automatically generated template components with varying parameters like dimensions, font, color, etc. Our third and most significant contribution is the novel approach which uses computer vision as well as deep learning techniques to automate the conversion of mock-up images to code. The use of computer vision techniques has proved to be significantly effective in the preprocessing of mock-up images, while the combination of EAST text detection with a CNN gave more accurate classification of components. We hope that the contributions highlighted in this paper will promote future research.
2 Related Work

In recent years, there have been a few attempts to generate code from mock-up images, such as pix2code, ReDraw, REMAUI and sketch2code. pix2code was one of the first attempts to use a deep learning approach to generate code from GUI screenshots. It introduces an approach based on convolutional and recurrent neural networks to generate tokens from an input GUI screenshot. pix2code uses three different datasets to generate code for three different platforms, i.e., Android, iOS and web-based technologies, respectively. Our model is trained on a single dataset and generates code in React Native, which can be used on any of these three platforms.

Reverse Engineering Mobile Application User Interfaces (REMAUI) first recognizes the various components in the image using computer vision and OCR, and then combines the results to form a more accurate structure of the components. It also takes the hierarchy of the components into consideration. The drawback of REMAUI is that it is not cross-platform (iOS).

The ReDraw App performs GUI code generation for Android apps. It uses a CNN for the major classification of components. Its authors built their own dataset of components by taking screenshots of Android apps, whereas our dataset consists of screenshots of synthetically generated components. ReDraw cleans its dataset using a machine learning classifier, which could be an enhancement to our work.

These attempts motivated us to contribute to this cutting-edge field of code generation from images using neural networks. Learning from these benchmarks and enhancing them with our own ideas and experiments, we built this novel approach.
3 Synthetic Dataset Generation

When the user gives an input image to the program, the model must detect all the components present in the image. As we have used a supervised learning approach, the training dataset fed to the model (CNN) for classification consists of images of components and their corresponding labels. To generate this dataset, we have automated the process of writing code to generate individual components such as text-inputs, buttons, images and text. We have also automated the process of taking screenshots of the generated output components, to produce the images required for training the model. We also labeled the generated images in the dataset according to the name of the component. The process of dataset generation is shown in Fig. 1.

Fig. 1 Dataset Generation. The components are generated using React Native, then cropped and labeled to generate the dataset
4 Code Generation from Images: Our Proposed Methodology

The work pipeline of GUI designers consists of deciding on a design, sketching it, and giving the sketch to a developer to develop prototype code. This last step of converting the sketch to code must be carried out iteratively during the development of an application. To simplify the conversion of the input GUI image (design) to the respective React Native code, we have built a model using various techniques of computer vision and deep learning. In the preprocessing stage, computer vision techniques are used, which include Canny edge detection, the Sobel operator, and morphological operations such as erosion and dilation. The deep learning techniques include EAST text detection along with a convolutional neural network. The structure of the user interface may consist of basic components such as buttons, textboxes, etc. After compiling the generated React Native code, we get a prototype app screen layout for the given GUI design. React Native can be used on Android, web or iOS. The whole program flow is shown in Fig. 2.

A multi-step process is followed to convert the images provided by the user into functional code. The process is broadly divided into three steps:
• Preprocessing of image
• Classification of components
• React Native code generation
Fig. 2 Program workflow. The input image is given to the program, and its components are detected and then classified using a CNN. The corresponding React Native code of these components is generated and run to get the shown graphical user interface
The first step is to preprocess the input image (I). During preprocessing, each component is detected, and individual images of each component are created by the function boxextraction(). These images are then passed through the EAST text detector, EAST(), and a convolutional neural network, CNN(), for classification, and are labeled according to the final confidence array. Since the co-ordinates and dimensions of the components are now known, they are passed to the template code generator, which creates the React Native code for them; the resulting user interface can be opened on an Android device or in a web browser. Figure 3 shows the program flow.
Fig. 3 Complete System Architecture: From taking an image as an input to giving code as an output

$$ I = boxextraction(I) \tag{1} $$

$$ p = CNN(I) \tag{2} $$

$$ q = EAST(I) \tag{3} $$

$$ Final\ Confidence\ Array = p + q \tag{4} $$

$$ Template\ code = Template\ Code\ Generator(Final\ Confidence\ Array) \tag{5} $$
4.1 Preprocessing

The first step in preprocessing is the conversion of the input image to its binary form. The next step is the application of edge detection algorithms, followed by morphological operations like erosion and dilation. The sum of the results of two morphological operations, i.e., vertical and horizontal, gives the image from which contours can be detected. Using these contours, the dimensions and co-ordinates of the components can be identified, and the components are clipped from the original image using this information. Figure 4 shows the steps carried out during the preprocessing stage.

After the image has been converted to its binary form, edge detection techniques are used to detect the edges of the components, one of which is the Canny edge detection algorithm, a classical computer vision technique. From one of the reference papers [18] and from our experiments, the optimal values found for the parameters of the Canny edge detection algorithm are 2.0, 0.1 and 0.3 for the width of the Gaussian filter (known as sigma), the low threshold of the hysteresis and the high threshold of the hysteresis, respectively. Edges are significant in the identification of contours. To emphasize more edges, we have used the Sobel operator. It intensifies the gradient of the image using the values in the neighbouring 3 × 3 region.
Fig. 4 Stages of Preprocessing: Converting the input image to individual components
The output image of the Sobel operator is dilated and eroded twice. This helps to avoid the detection of fragmented regions, as it groups nearby elements together. Erosion is a morphological operation that shrinks the shapes contained in the image and suppresses small specks of noise, while dilation expands the shapes and helps to connect image components. The primary objective of this step is to merge nearby contours and thus generate larger, less fragmented regions. The image after erosion is used for contour detection, which gives individual images of each component. The detected contours are sorted, and the external boundaries of the components are obtained. The components are cropped from the original input image using their co-ordinates. The co-ordinates of the components are stored in a file for later use in template code generation.
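The effect of dilating and then eroding a binary image can be illustrated with a small pure-Python sketch. It is not the OpenCV-based pipeline used in the paper: the 3 × 3 structuring element, the single dilate/erode pass and the handling of image borders (out-of-image neighbours are ignored) are assumptions chosen to keep the example short.

```python
def _neighbours(img, y, x):
    """Values in the 3 x 3 window around (y, x), clipped at the image border."""
    h, w = len(img), len(img[0])
    return [img[j][i]
            for j in range(max(0, y - 1), min(h, y + 2))
            for i in range(max(0, x - 1), min(w, x + 2))]


def dilate(img):
    """A pixel becomes 1 if any pixel in its 3 x 3 window is 1."""
    return [[1 if any(_neighbours(img, y, x)) else 0
             for x in range(len(img[0]))] for y in range(len(img))]


def erode(img):
    """A pixel stays 1 only if every pixel in its 3 x 3 window is 1."""
    return [[1 if all(_neighbours(img, y, x)) else 0
             for x in range(len(img[0]))] for y in range(len(img))]


# Two edge fragments separated by a one-pixel gap in the middle row.
fragments = [
    [0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0],
    [0, 0, 1, 0, 1, 0, 0],
    [0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0],
]
merged = erode(dilate(fragments))  # the gap between the fragments is filled
```

After the dilation the two fragments touch, and the following erosion shrinks the merged blob back to roughly the original extent while keeping the gap closed, so a contour detector would see one region instead of two.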
4.2 Classification of Components

As pix2code achieves remarkably good accuracy using long short-term memory (LSTM), LSTM was our first choice for the main model. Later, we found that training the model on input GUI images along with their code, as in pix2code, is not efficient. We found it easier to train the model for classification of components by giving it only the individual GUI component images. A CNN was used for its reliability in classifying images with a powerful but efficient neural network. After optimizing the various layers, our CNN model classified the components accurately. Only some edge cases were missed; for example, text-input and button were confused by the CNN. For this, we used the EAST text detector. EAST text detection is a deep learning technique available in OpenCV by which we can easily detect text in images. We take the output of the CNN and add the output of EAST text detection to it, so that it rectifies the confusion between text-input and button, as a button contains text and a text-input does not. Hence, each component cropped from the image by the preprocessing is passed to the CNN + EAST text detector for classification. The CNN is trained on 2000 images of four components, which are text, text-input, button and image. Although we tried to include more components like checkbox, the scope of this study remains limited to the above-mentioned four components (Fig. 5). We used five layers of convolution and max-pooling, followed by a fully connected layer at the end of the CNN. The values for the CNN that showed good results are:
• Learning Rate: 3e-3
• Dropout Rate: 0.8
• Five convolution and max-pooling layers and one fully connected layer
• Number of nodes from first layer to last: 34, 64, 128, 64, 32 and fully connected layer with 1024 nodes
The CNN produces a confidence array for each component it receives. This array is then passed to the EAST text detector to rectify the confusion between button and text-input components. The EAST text detector recognizes individual words and draws boxes around each of them, as shown in Fig. 6. We used this detector to detect whether text is absent from a component. This gives practically good results in improving the CNN's classification of the text-input component. We add a large positive confidence value to the text-input item of the CNN's confidence array if no text is detected in the component, and a large negative value to the same item if text is detected. The classified components, with their co-ordinate data, are passed to the React Native code generator.

Fig. 5 CNN Tensorboard Visualization

Fig. 6 EAST text detection
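The adjustment of the CNN confidence array by the text-detection result can be sketched as below. The label order, the function names and the magnitude of the added value (`bonus`) are hypothetical; the paper only states that a large positive or negative value is added to the text-input entry.

```python
LABELS = ["text", "text-input", "button", "image"]  # assumed order of the array


def adjust_confidence(cnn_confidence, text_found, bonus=10.0):
    """Add +bonus to the text-input entry if no text was detected, else -bonus."""
    adjusted = list(cnn_confidence)
    index = LABELS.index("text-input")
    adjusted[index] += -bonus if text_found else bonus
    return adjusted


def classify(cnn_confidence, text_found, bonus=10.0):
    """Label with the highest adjusted confidence."""
    adjusted = adjust_confidence(cnn_confidence, text_found, bonus)
    best = max(range(len(adjusted)), key=adjusted.__getitem__)
    return LABELS[best]
```

For example, a component that the CNN scores slightly higher as text-input than as button is classified as a button once text is detected inside it, which is the confusion this step is meant to resolve.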
4.3 React Native Code Generation

The React Native code generator takes the co-ordinates of each component that were stored earlier (its x-co-ordinate from the left of the screen, its y-co-ordinate from the top, its width and its height) together with the component type, and generates the template code for the component. In doing so, the absolute pixel-based positioning of the components is translated to relative positioning in terms of percentages of the screen size. This is useful when the screen size differs, such as when the interface is viewed on a mobile device with a different screen size or on a monitor.
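The translation from absolute pixel positions to screen-size percentages can be sketched as follows. The function names and the exact shape of the emitted style string are illustrative assumptions and not the template actually used by the authors' generator.

```python
def to_relative(box, screen_width, screen_height):
    """Convert an (x, y, width, height) box in pixels to percentages of the screen."""
    x, y, w, h = box
    return {
        "left": 100 * x / screen_width,
        "top": 100 * y / screen_height,
        "width": 100 * w / screen_width,
        "height": 100 * h / screen_height,
    }


def style_snippet(box, screen_width, screen_height):
    """Render the relative box as a style object literal (hypothetical template)."""
    rel = to_relative(box, screen_width, screen_height)
    fields = ", ".join(f"{key}: '{value:.1f}%'" for key, value in rel.items())
    return "{ position: 'absolute', " + fields + " }"
```

A box at (36, 64) with size 180 × 32 on a 360 × 640 mock-up maps to 10% from the left, 10% from the top, 50% of the width and 5% of the height, so the same layout scales to screens of other sizes.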
Fig. 7 Accuracy (horizontal axis: iterations)

Table 1 Confusion Matrix

Input \ Output  Text  Textinput  Image  Button  Accuracy
Text            24    –          2      –       24/26 = 92%
Textinput       –     7          –      12      7/19 = 36%
Image           –     –          6      –       6/6 = 100%
Button          2     1          –      11      11/14 = 78%
5 Results

We trained our CNN on four components, with a training accuracy of around 90%. This trained CNN, combined with EAST detection, gave us the classified components. Judging the final result of converting a given GUI image layout to GUI code is subjective: it is unclear how two images should be compared, and the generated interface uses random filler text in place of the real text and filler templates for components like buttons. So, to evaluate the results, we manually checked whether the components classified by the CNN and EAST are right or not. Eleven test images with multiple components were used for this (Fig. 7; Table 1).
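The per-class accuracies in the confusion matrix can be recomputed from its counts as in the short sketch below. Treating the dashes in the table as zeros is an assumption, and the overall accuracy computed at the end is a derived figure that the paper itself does not report.

```python
LABELS = ["Text", "Textinput", "Image", "Button"]

# Rows are the true (input) classes, columns the predicted (output) classes.
CONFUSION = [
    [24, 0, 2, 0],
    [0, 7, 0, 12],
    [0, 0, 6, 0],
    [2, 1, 0, 11],
]


def per_class_accuracy(matrix):
    """Fraction of each true class that was classified correctly."""
    return [row[k] / sum(row) for k, row in enumerate(matrix)]


def overall_accuracy(matrix):
    """Correct classifications (the diagonal) over all test components."""
    correct = sum(matrix[k][k] for k in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


class_accuracy = dict(zip(LABELS, per_class_accuracy(CONFUSION)))
overall = overall_accuracy(CONFUSION)  # 48 correct out of 65 components
```

The text-input row makes the main weakness visible: 12 of the 19 text-input components are predicted as buttons.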
6 Highlights and Analysis
• Text classification is excellent. This is partly due to a good training dataset and partly due to preprocessing that gives the CNN test components similar to the training text components.
• Image classification by the CNN is excellent (100%). All test image components were processed and classified accurately.
• Confusion between button and text is evident, as both components have text in the middle.
• Confusion between text-input and button is also seen despite using EAST text detection. Without EAST text detection, the CNN's confusion was more visible in the results. We need to investigate this further to improve our model.
7 Conclusion
The importance of a good user interface cannot be stressed enough. A user's experience of an application depends mainly on the user interface, which needs to be carefully and creatively designed. Currently, designers put their ideas on paper or make a digital copy of the application's user interface and send it to the developer for coding. Based on client feedback, designers and developers then work through an iterative process of developing and upgrading the UI of the application. Wherever there is manual work, there is scope for automation, and this is why the field of code generation from images has emerged. pix2code, ReDraw, and REMAUI have been pioneers in this domain and a great source of motivation for our work in this field. In this paper, we have proposed and implemented a novel approach to automate the conversion of mock-up images to React Native code. We used computer vision techniques for preprocessing the images and deep learning techniques for classification. Although corner cases still need work, our results indicate that the proposed method is practically feasible and can be scaled up to include more UI components. Our CNN model was trained on a synthetic dataset consisting of four UI components, namely text, text-input, button, and image. For practical use, the dataset can easily be scaled up to more components. Using the scaled-up dataset, together with OCR to detect the exact text in the user's input image, will make this approach realistic. Our work helps to bridge the gap between developers and designers by automating the conversion of mock-up images to basic GUI code. This lets designers experiment with designs on their own, while developers focus more on functionality than on the appearance and alignment of the application.
References 1. T. Beltramelli, pix2code: Generating Code from a Graphical User Interface Screenshot (2018), pp. 1–6. https://doi.org/10.1145/3220134.3220135 2. A. Robinson, Sketch2code: Generating a Website from a Paper Mockup (2019) 3. K. P. Moran, C. Bernal-Cárdenas, M. Curcio, R. Bonett, D. Poshyvanyk, Machine LearningBased Prototyping of Graphical User Interfaces for Mobile Apps, IEEE 4. T. Nguyen, C. Csallner, Reverse Engineering Mobile Application User Interfaces with REMAUI (T) (2015), pp. 248–259. https://doi.org/10.1109/ASE.2015.32 5. Y. Liu, Q. Hu, K. Shu, Improving pix2code based Bi-directional LSTM, in 2018 IEEE International Conference on Automation, Electronics and Electrical Engineering (AUTEEE), Shenyang, China (2018), pp. 220–223. https://ieeexplore.ieee.org/document/8720784
6. R. Davis, T. Saponas, M. Shilman, J. Landay. SketchWizard: Wizard of Oz prototyping of Pen-based user interfaces, in UIST:12 Proceedings of the Annual ACM Symposium on User Interface Software and Technology, pp. 119–128. https://doi.org/10.1145/1294211.1294233 7. I.J. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, Generative Adversarial Networks (2014) 8. S. Hochreiter, J. Schmidhuber, Long short-term memory. Neural Comput. 9, 1735–1780, 1989 (1997). https://doi.org/10.1162/neco.1997.9.8.1735. M. Young, The Technical Writer’s Handbook (University Science, Mill Valley, CA) 9. K. O’Shea, R. Nash, An Introduction to Convolutional Neural Networks. ArXiv e-prints (2015) 10. R. Shetty, M. Rohrbach, L. Hendricks,M. Fritz,B. Schiele, Speaking the Same Language: Matching Machine to Human Captions by Adversarial Training (2017) 11. B. Dai, S. Fidler, R. Urtasun, D. Lin, Towards Diverse and Natural Image Descriptions via a Conditional GAN (2017), pp. 2989–2998. https://doi.org/10.1109/ICCV.2017.323 12. K. Xu, J. Ba, R. Kiros, K. Cho, A. Courville, R. Salakhutdinov, R. Zemel, Y. Bengio, Show Neural Image Caption Generation with Visual Attention, Attend and Tell (2015) 13. A. Karpathy, F.F. Li, Deep Visual-Semantic Alignments for Generating Image Descriptions (2015), pp 3128–3137 14. J. Donahue, L. Hendricks, S. Guadarrama, M. Rohrbach, S. Venugopalan, T. Darrell, K Saenko,Long-term Recurrent Convolutional Networks for Visual Recognition and Description (2015), pp. 2625–2634. https://doi.org/10.1109/CVPR.2015.7298878 15. M. Raman, H. Aggarwal, Study and comparison of various image edge detection techniques. Int. J. Image Process. 3 (2009) 16. S. Bhardwaj, A. Mittal, A survey on various edge detector techniques. Proc. Technol. 4, 220– 226 (2012). https://doi.org/10.1016/j.protcy.2012.05.033 17. J. Canny, A computational approach to edge detection. IEEE Trans. Pattern Anal. Mach. Intelligence, 679–698 (PAMI-8) (1986). https://doi.org/10.1109/TPAMI.1986.4767851 18. X. Zhao, W. 
Wang, L. Wang, Parameter optimal determination for canny edge detection. Imaging Sci. J. 59, 332–341 (2011). https://doi.org/10.1179/136821910X12867873897517 19. X. Zhou, C. Yao, H. Wen, et al., EAST: an efficient and accurate scene text detector, in Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). New York (2017), pp. 2642–2651 20. O. Vincent, O. Folorunso, A Descriptive Algorithm for Sobel Image Edge Detection (2009) 21. H. Cho, M. Sung, B. Jun, Canny text detector: fast and robust scene text localization algorithm, in 2016 IEEE Conference on Computer Vision and Pattern Recognition(CVPR), June 2016 (2009) pp. 3566–3573. https://doi.org/10.1109/CVPR.2016.388 22. S.K. Katiyar, P. V. Arun, Comparative analysis of common edge detection techniques in context of object extraction, in CoRR abs/1405.6132 (2014). arXiv: 1405.6132. http://arxiv.org/abs/ 1405.6132 23. J.A. Landay, B.A. Myers, Interactive sketching for the early stages of user interface design, in Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, CHI 95 24. T. da Silva, et al., User-centered design and agile methods: a systematic review, in 2011 Agile Conference, Aug. 2011, pp. 77–86. https://doi.org/10.1109/AGILE.2011.24 25. J.A. Landay, B.A. Myers, Sketching interfaces: toward more human interface design. Computer 34(3), 56–64 (2001). issn: 0018-9162. https://doi.org/10.1109/2.910894
Chapter 13
A Novel Method to Detect the Tumor Using Low-Contrast Image Segmentation Technique Priyanka Kaushik and Rajeev Ratan
1 Introduction
Segmentation is the process of dividing an image into a number of partitions so that it becomes easier to interpret. The low-contrast image segmentation technique is used to detect the tumor more precisely [1]. A tumor is a swelling or morbid enlargement that results from an overabundance of cell growth and division. The word can be used as a synonym for neoplasm, but it cannot be used as a synonym for cancer. A tumor can be classified as benign, pre-malignant, or malignant [2]. Magnetic resonance imaging (MRI) and computerized tomography (CT) scans are commonly taken to locate the tumor and to plan its treatment. Of these two modalities, MRI is preferred because of characteristics such as its multiplanar capabilities [3]. Segmentation is a mandatory step to accentuate the tumor region. The various segmentation techniques are as follows:
Thresholding segmentation: This method is useful for segmenting image pixels according to their intensity level. It is applied to images in which the objects are lighter than the background. There are two types of thresholding segmentation [4], namely global thresholding and local thresholding, which are described briefly below.
Global thresholding:
P. Kaushik (B) · R. Ratan Department of Electronics and Communication Engineering, MVN University Palwal, Haryana, India © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 R. Tiwari et al. (eds.), Proceedings of International Conference on Computational Intelligence, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-16-3802-2_13
It converts the gray values into binary values. Let $v$ denote the gray value and $t$ the threshold value. For multiple thresholding, the image is split into several segments according to the gray values. Suppose there are $n$ segments; then the global thresholding $g(v)$ may be defined as:

$$
g(v) =
\begin{cases}
0 & \text{if } v < t_1 \\
1 & \text{if } t_1 \le v < t_2 \\
2 & \text{if } t_2 \le v < t_3 \\
3 & \text{if } t_3 \le v < t_4 \\
\vdots & \vdots \\
n & \text{if } t_n \le v < t_{n+1}
\end{cases}
$$

The peaks of the image histogram can be used to find the threshold value. To determine the threshold between two peaks in the histogram $H$ of the image, let $p_1$ and $p_2$ be the gray values of the peaks; the threshold $t$ is then given by:

$$
t = \frac{p_1 + p_2}{2} = \arg\min_{v \in (p_1, p_2)} H(v)
$$
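Both rules above can be written in a few lines. The sketch below is illustrative only: the function names are hypothetical, the thresholds are assumed to be sorted in ascending order, and the valley search implements the arg-min form of the peak rule.

```python
from bisect import bisect_right


def multilevel_threshold(v, thresholds):
    """Return the segment label g(v) for gray value v.

    `thresholds` is the ascending list [t1, t2, ..., tn]. The label is the
    number of thresholds that are <= v, so v < t1 gives 0 and
    t1 <= v < t2 gives 1, as in the piecewise definition.
    """
    return bisect_right(thresholds, v)


def valley_threshold(histogram, p1, p2):
    """Gray value with the lowest histogram count strictly between two peaks."""
    lo, hi = sorted((p1, p2))
    return min(range(lo + 1, hi), key=lambda v: histogram[v])


labels = [multilevel_threshold(v, [50, 100, 150]) for v in (10, 50, 149, 150)]
print(labels)

toy_histogram = [0, 5, 9, 4, 1, 3, 8, 2]  # peaks at gray values 2 and 6
print(valley_threshold(toy_histogram, 2, 6))
```

Using a binary search keeps the labelling cost logarithmic in the number of thresholds, which matters little for a handful of levels but keeps the code short.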
Local thresholding: Global thresholding does not work when the background illumination of an image is not constant, so local thresholding is used to compensate for uneven illumination across the image [3]. With $v$ the gray value of the image and $t$ the threshold, local thresholding is defined by

$$
g(v) =
\begin{cases}
0 & \text{if } v < t \\
1 & \text{if } v \ge t
\end{cases}
$$

where $t = v_0 + t_0$.
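A minimal sketch of local thresholding follows. The text does not define the two terms of the threshold, so the code assumes that the first term is the mean gray value of a small window around the pixel and the second is a constant offset; both choices, and the function name, are assumptions.

```python
def local_threshold(image, radius=1, t0=0):
    """Binarize a 2-D list of gray values with a per-pixel threshold t = v0 + t0.

    v0 is taken here as the mean of the (2*radius+1)^2 window around the
    pixel, clipped at the image border (an assumption, not from the text).
    """
    rows, cols = len(image), len(image[0])
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            window = [
                image[rr][cc]
                for rr in range(max(0, r - radius), min(rows, r + radius + 1))
                for cc in range(max(0, c - radius), min(cols, c + radius + 1))
            ]
            t = sum(window) / len(window) + t0
            out[r][c] = 1 if image[r][c] >= t else 0
    return out


spot = [
    [10, 10, 10],
    [10, 50, 10],
    [10, 10, 10],
]
for row in local_threshold(spot):
    print(row)
```

Because the threshold follows the local mean, a slow change in background brightness shifts the threshold with it, which is the behaviour a single global threshold cannot provide.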
Edge-based segmentation technique: Edge segmentation draws a boundary between two regions that have different gray-level properties [5]. The complete image is thus partitioned into segments according to the variations in intensity of the pixels near an edge. The detected edges indicate the object boundaries. It is mainly of two types, discussed briefly below:
(a) Gray histogram thresholding: A histogram of the image is formed, i.e., a graph of frequency against gray level, and a technique is then applied whose output depends on the chosen threshold.
(b) Gradient thresholding:
This technique works on images that have sudden variations in intensity and low noise [6]. It depends on the variations among the intensity values of neighboring pixels of an image.
1. Watershed segmentation technique
This technique helps to distinguish between two objects that are merged in an image, and it is also known as a gradient-based segmentation technique. It separates catchment basins by ridge lines [7]. The steps of this technique are:
Step 1: Read the raw image and preprocess it.
Step 2: Convert the image from the RGB model to gray scale.
Step 3: Calculate the foreground markers.
Step 4: Calculate the background markers.
Step 5: Calculate the watershed transform of the segmentation function.
Step 6: Obtain the segmented resultant image.
Watershed segmentation has the advantages that it provides more accurate results and is easy to understand.
2. Low-contrast image segmentation
Contrast is the difference in luminance or color that makes an object distinguishable. Many efficient techniques are available for tumor detection, such as unsupervised segmentation and a feature-based fuzzy inference system [1] for segmentation of low-contrast images. Low-contrast segmentation is one of the best techniques to detect tumors [8]. Many researchers have contributed to this field, but there is still scope to improve the efficiency and reproducibility of the results. The following is a comparative analysis with existing low-contrast image segmentation techniques:
(a) An unsupervised (semi-automatic) technique detects benign and malignant tumors with the help of watershed segmentation on two-dimensional MRI datasets and can segment breast tumor tissues in 6–8 min. The proposed fully automatic low-contrast technique could be applied to 3D MRI datasets, and hence the computational time to diagnose the tumor region could be reduced.
(b) Reinhard Beichel suggested an algorithm based on low-contrast enhanced CT data, using graph cuts for liver segmentation, that helps in liver cancer treatment such as tumor resection and image-guided radiation therapy (IGRT) and is able to save up to 85% as compared to a standard semi-automatic live wire and 20 segmentation refinement approach. The proposed technique saves time in diagnosing tumor cells by using an advanced low-contrast dataset with a graph cuts image segmentation algorithm.
(c) Andac Hamamci proposed a method for segmentation of solid tumors with little user interaction for assessing the response to therapy. Because the algorithm is parallel, the computation time can be reduced, but with multiple metastases the user interaction time increases with the number of tumors [9]. With advanced low-contrast image segmentation algorithms, the user interaction could be reduced.
(d) Sarada Prasad Dakua proposed a methodology based on contrast enhancement in the wavelet domain for graph-based segmentation, which incorporates weighting functions named DroLOG and a contrast enhancement stage, used for a limited region of the contour. Further work could use low-contrast image segmentation to extend this method to other imaging modalities so that the region of the contour is accentuated.
2 Proposed Methodology
To diagnose tumorous cells using the low-contrast segmentation technique, the following steps are carried out.
2.1 Image Preprocessing
Preprocessing plays a vital role in enhancing the image. Blurred or degraded images can be restored with preprocessing techniques such as image restoration.
Step I: Image acquisition: The input image is acquired and supplied for training so that the desired output can be produced.
Step II: Image filtering: Filtering is an essential step for removing noise so that the model works smoothly [10]. Depending on the dimensions of the data, different filters are used: a median filter removes salt-and-pepper and impulse noise, and a bilateral filter is used to reduce blur in the image. Filtering is therefore an integral part of image processing. Segmentation of the image, described briefly below, is carried out next, followed by image restoration [11]. The effect of filtering is illustrated by a comparison of the original image and the preprocessed image. For 2D images the median filter is used, whereas for 3D images the bilateral filter is used (Fig. 1).
The steps of the method are summarized in a flowchart: data insertion, data reading, preprocessing, and segmentation, followed by feature extraction, tumor area calculation, and classification. Figure 2 shows the flowchart of the proposed methodology.
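The median filtering step mentioned above can be sketched in plain Python. This is an illustration of the general technique, not the chapter's code; the 3 × 3 window and the border handling (windows clipped at the image edge) are assumptions.

```python
from statistics import median_low


def median_filter(image, radius=1):
    """Replace every pixel by the median of its (2*radius+1)^2 neighborhood.

    Windows are clipped at the image border. median_low is used so the
    result is always one of the original gray values, even when a clipped
    window holds an even number of pixels.
    """
    rows, cols = len(image), len(image[0])
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            window = [
                image[rr][cc]
                for rr in range(max(0, r - radius), min(rows, r + radius + 1))
                for cc in range(max(0, c - radius), min(cols, c + radius + 1))
            ]
            out[r][c] = median_low(window)
    return out


# A flat patch with one "salt" pixel in the middle.
noisy = [
    [10, 10, 10],
    [10, 255, 10],
    [10, 10, 10],
]
for row in median_filter(noisy):
    print(row)
```

A single outlier never reaches the middle of the sorted window, which is why the median removes isolated salt-and-pepper pixels without averaging them into their neighbors.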
Fig. 1 Comparative view of images before and after preprocessing (panels: original image, pre-processed image)
Fig. 2 Block diagram to detect and diagnose the tumor
2.2 Image Segmentation
Segmentation of the image is another important step [12]. It breaks the image into smaller parts so that they are easily distinguishable, which supports the enhancement of the image.
2.3 Feature Extraction
Feature extraction is the final step; it brings out the informative parts of the images obtained with the segmentation technique described above. The intensity-based features [13] include the mean, defined as the sum of all observations divided by the number of observations, the standard deviation, and the skewness, which measures the asymmetry of the values about the mean. The texture-based features are:

I. Angular second moment: It is defined as the sum of squares of the gray-level entries of the image. It takes its peak value when the entries are not equal. If $p(i, j)$ is the input image, the angular second moment $f_1$ is

$$
f_1 = \sum_i \sum_j \{p(i, j)\}^2
$$

II. Contrast: It is a measure of the local variation in the gray levels of the co-occurrence matrix, and it takes low values when the intensity values of neighboring pixels are similar.

$$
f_2 = \sum_{n=0}^{N_g - 1} n^2 \left\{ \sum_{i=1}^{N_g} \sum_{j=1}^{N_g} p(i, j) \right\}
$$

Here, $N_g$ represents the number of gray levels. In the standard (Haralick) definition, the inner double sum is restricted to the pairs with $|i - j| = n$.

III. Correlation: The correlation describes how the gray levels depend on those of the neighboring pixels.

$$
f_3 = \frac{\sum_i \sum_j p(i, j) - \mu_x \mu_y}{\sigma_x \sigma_y}
$$

Here, $\mu_x$ and $\mu_y$ represent the means and $\sigma_x$ and $\sigma_y$ the standard deviations of the input image. In the standard (Haralick) definition, the summand is $(ij)\,p(i, j)$.

IV. Variance: It measures how widely the gray values are spread in the input image [9].

$$
f_4 = \sum_i \sum_j (i - \mu)^2 \, p(i, j)
$$

Here, $\mu$ represents the mean of the complete image.
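The four texture features can be computed from a normalized gray-level co-occurrence matrix as sketched below. The code follows the standard Haralick definitions, which is an assumption where the printed formulas are less specific: contrast weights each entry by the squared gray-level difference, correlation uses the product of the two indices in its summand, and the variance uses the row-marginal mean. The function name and the 0-based gray levels are illustrative.

```python
from math import sqrt


def texture_features(p):
    """Compute four Haralick-style features from a normalized square matrix p.

    p[i][j] is the relative frequency of gray level i occurring next to
    gray level j; all entries are assumed to sum to 1.
    """
    n = len(p)
    cells = [(i, j) for i in range(n) for j in range(n)]

    asm = sum(p[i][j] ** 2 for i, j in cells)
    # Equivalent to summing n^2 over all pairs with |i - j| = n.
    contrast = sum((i - j) ** 2 * p[i][j] for i, j in cells)

    px = [sum(p[i]) for i in range(n)]                       # row marginals
    py = [sum(p[i][j] for i in range(n)) for j in range(n)]  # column marginals
    mu_x = sum(i * px[i] for i in range(n))
    mu_y = sum(j * py[j] for j in range(n))
    var_x = sum((i - mu_x) ** 2 * px[i] for i in range(n))
    var_y = sum((j - mu_y) ** 2 * py[j] for j in range(n))

    covariance = sum(i * j * p[i][j] for i, j in cells) - mu_x * mu_y
    denom = sqrt(var_x * var_y)
    correlation = covariance / denom if denom > 0 else 0.0

    return {
        "asm": asm,
        "contrast": contrast,
        "correlation": correlation,
        "variance": var_x,
    }


uniform = [[0.25, 0.25], [0.25, 0.25]]   # every gray-level pair equally likely
diagonal = [[0.5, 0.0], [0.0, 0.5]]      # neighbors always share a gray level
print(texture_features(uniform))
print(texture_features(diagonal))
```

The two toy matrices show the expected behaviour: when neighbors always share a gray level, contrast drops to zero and correlation reaches its maximum, while a uniform matrix gives no correlation at all.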
3 Results and Discussion
First, the images are downloaded from Kaggle.com by creating a new application programming interface (API) and then stored in a directory. The images are then preprocessed to enhance them. About 8000 images are created; of these, about 7000 are used to train the FASTAI model, and the rest are used for testing. FASTAI is implemented using PyTorch with a CNN [14] model, the data are split for training and testing, and the results are obtained. A CNN is a convolutional neural network that takes the input, applies weights, and processes it so that the inputs can be differentiated from one another. The CNN model consists of three kinds of layers: the convolution layer, the pooling layer, and the fully connected layer [15]. In preprocessing, the data are first fed in and read, and normalization is then performed. Here, batch normalization [16] is used, which computes the mean and the variance of the layer input. It also protects against the internal covariate shift problem. After that, the CNN algorithm is applied, and a loss function is used to measure the prediction error of the neural network [18]. The model is then trained on the training set and tested, and finally the result is obtained. In a nutshell, the input samples are preprocessed and passed through the steps described above, the result is obtained, and the tumor is detected. The low-contrast image segmentation technique also indicates the category of the detected tumor, i.e., benign or malignant. Here, the probability threshold is 0.5 because it gives the maximum accuracy at this value. The edge detection method is implemented to detect the boundaries of the tumor, and the Sobel operator is used rather than Canny as it is easy to understand and implement [13].
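The Sobel operator mentioned above can be sketched without any imaging library. The kernels are the standard Sobel kernels; the function names, the zero-valued border, and the threshold used in the example are assumptions made for illustration.

```python
from math import hypot

# Standard 3x3 Sobel kernels for horizontal and vertical intensity change.
KX = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
KY = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]


def sobel_magnitude(image):
    """Gradient magnitude of a 2-D list of gray values (border left at 0)."""
    rows, cols = len(image), len(image[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            gx = sum(KX[a][b] * image[r - 1 + a][c - 1 + b]
                     for a in range(3) for b in range(3))
            gy = sum(KY[a][b] * image[r - 1 + a][c - 1 + b]
                     for a in range(3) for b in range(3))
            out[r][c] = hypot(gx, gy)
    return out


def edge_map(image, threshold):
    """Mark pixels whose gradient magnitude reaches the threshold."""
    return [[1 if m >= threshold else 0 for m in row]
            for row in sobel_magnitude(image)]


# A vertical step edge between the second and third columns.
step = [[0, 0, 10, 10] for _ in range(4)]
for row in edge_map(step, threshold=20):
    print(row)
```

Sobel needs only two small convolutions and one threshold, whereas Canny adds smoothing, non-maximum suppression, and hysteresis, which is consistent with the ease-of-implementation argument in the text.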
Here, the detected tumor is malignant, as the predicted probability is above 0.5.
Here, the tumor is benign, as it falls in the benign category with a probability that is also above 0.5. Output: The output is the probability of each of two classes, (1) benign and (2) malignant. If the predicted probability of a class exceeds 0.5, the image belongs to that class. The accuracy of the model is 0.875, i.e., 87.5%.
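The decision rule and the accuracy measure can be written out as a short sketch. The probabilities and labels below are made up purely for illustration and are not the chapter's test data; the function names are hypothetical.

```python
def predict_label(p_malignant, threshold=0.5):
    """Return the class whose predicted probability exceeds the threshold.

    With two classes, p_benign = 1 - p_malignant, so one number is enough.
    """
    return "malignant" if p_malignant > threshold else "benign"


def accuracy(predicted, actual):
    """Fraction of samples whose predicted label matches the true label."""
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


# Made-up model outputs and ground-truth labels, for illustration only.
probabilities = [0.91, 0.12, 0.67, 0.55]
truth = ["malignant", "benign", "malignant", "benign"]

predicted = [predict_label(p) for p in probabilities]
print(predicted)
print(accuracy(predicted, truth))
```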
4 Conclusion
The acquired images may contain noise that hinders the detection technique. The low-contrast technique is therefore used to obtain the desired results. This research emphasizes the applications of low-contrast image segmentation, especially in terms of accuracy and sensitivity. Tumors are generally detected at a late stage, so an effort is made here to detect the tumor at a very early stage by using low-contrast image segmentation.
References 1. X. Bai, M. Liu, T. Wang, Z. Chen, P. Wang, Y. Zhang, Feature based fuzzy inference system for segmentation of low-contrast infrared ship images. J. Elsevier Appl. Soft Comput. 46, 128–142, September 2016. 2. S. Kaushal, An efficient brain tumor detection system based on segmentation technique for MRI brain images. Int. J. Adv. Res. Comput. Sci. 8(7), 1131–1136 (2017) 3. R. Rana, P. Singh, Brain tumor detection through MRI images: a review of literature. IOSR J. Comput. Eng. (IOSR-JCE) 17(5), 07–18 (2015) 4. K. Zhang, J. Deng, W. Lu, Segmenting human knee cartilage automatically from multi-contrast MR images using support vector machines and discriminative random fields, in 18th IEEE International Conference on Image Processing, vol. 31, no. 10, December 2011, pp. 721–724 5. A. Hamamci, N. Kucuk, K. Karaman, K. Engin, G. Unal, Tumor-cut: segmentation of brain tumors on contrast enhanced MR images for radio-surgery applications. IEEE Trans. Med. Imaging 31(3), 790–804 (2012) 6. R. Beichel, A. Bornik, C. Bauer, E. Sorantin, Liver Segmentation in contrast enhanced CT data using graph cuts and interactive 3D segmentation refinement methods. Med. Phys. 39(3), 1361–1373 (2012) 7. N. Ahuja, A transform for multi scale image segmentation by integrated edge and region detection, IEEE Trans. Pattern Anal. Mach. Intelligence 18(12), 1163–1173 (1996) 8. S.P. Dakua, J.A. Nahed, Contrast enhancement in wavelet domain for graph-based segmentation in medical imaging, in Proceedings of 8th Indian Conference on Computer Vision, Graphics and Image Processing ICVGIP, vol. 76, December 2012, pp.16–19 9. Y. Zhang, M. Brady, S. Smith, Segmentation of brain MR images through a hidden Markov random field model and the expectation-maximization algorithm. IEEE Trans. Med. Imaging 20(1), 45–57 (2001)
10. A. Das, M. Bhattacharya, GA based neuro fuzzy techniques for breast cancer identification, in IEEE International Conference on Machine Vision and Image Processing, vol. 08 (2008), pp. 136–141 11. A.H. Foruzan, Y.W. Chen, Segmentation of liver in low-contrast images using K-means clustering and geodesic active contour algorithms. EICE Trans. Inf. Syst. E96–D(4), 798–807, April 2013 12. J. Deng, H.T. Tsui, A fast level set method for segmentation of low contrast noisy biomedical images. J. Elsevier science B.M. Pattern Recogn. Lett 23(1–3), 161–169 (2002) 13. M. Zhang, L. Zhang, H.D. Cheng, Segmentation of ultrasound breast images based on a neutrosophic method. Optical Eng. 49(11), 1–10 (2010) 14. M.D. Zeiler, R. Fergus, Visualizing and understanding convolution networks. Proc. Comput. Vision 8689, 818–833 (2014) 15. N.M. Ahmed, S.M. Yamany, N. Mohamed, A.A. Farag, T. Moriarty, A Modified fuzzy C-means algorithm for bias field estimation and segmentation of MRI data. IEEE Trans. Med Imaging 21(3), 193–199 (2002) 16. J. Malik, S. Belongie, T. Leung, J. Shi, Contour and texture analysis for image segmentation. Int. J. Comput. Vision 43(1), 7–27 (2001) 17. Z. Yingjie, G. Liling, New Approach to Low Contrast Image Segmentation. National Science Foundation of China under Grant No.50775174. 2008, pp. 2369–2372 18. S. Anbumozhi, P.S. Manoharan, Performance analysis of brain tumor detection based on image fusion. Int. J. Comput. Information Eng. 8(3), 524–530 (2014)
Chapter 14
The Detection of COVID-19 Using Radiography Images Via Convolutional Network-Based Approach Astha Singh, Shyam Singh Rajput, and K. V. Arya
1 Introduction
The pandemic of the highly infectious disease COVID-19 continues to have a devastating effect on the health and well-being of the global population. The disease was first detected in December 2019 in Wuhan, Hubei, China, and has resulted in an ongoing pandemic caused by the infection of people with the severe acute respiratory syndrome coronavirus 2, i.e. SARS-CoV-2. A critical step in the fight against COVID-19 is the effective screening of infected patients, so that those infected can receive immediate treatment and care and can be isolated to mitigate the spread of the virus. The main screening method used for detecting COVID-19 cases is polymerase chain reaction (PCR) testing, which can detect SARS-CoV-2 RNA in respiratory specimens collected through a variety of means, such as nasopharyngeal or oropharyngeal swabs [1]. While PCR testing is the gold standard as it is highly sensitive, it is a very time-consuming, laborious and complicated manual process, and it is in short supply. PCR testing also depends on specialized tools and equipment, which makes it expensive. An alternative screening method that has also been used for COVID-19 screening is radiography examination, where chest radiography imaging, e.g. X-ray or
A. Singh (B) Center of Advance Studies, AKTU, Lucknow, UP 226031, India S. S. Rajput Department of CSE, National Institute of Technology Patna, Patna, Bihar 800005, India e-mail: [email protected] K. V. Arya ABV—Indian Institute of Information Technology and Management Gwalior, Gwalior 474012, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 R. Tiwari et al. (eds.), Proceedings of International Conference on Computational Intelligence, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-16-3802-2_14
computed tomography (CT) imaging, is assessed by radiologists to look for visual indicators associated with SARS-CoV-2 viral infection. Biological testing for COVID-19 is costly and time-consuming and requires specific tools. However, radiography-based diagnosis requires expert radiologists to interpret the radiography images, since the visual indicators can be subtle [2]. There has therefore been a recent push for open access and open source AI solutions for radiography-driven COVID-19 case detection. The proposed model investigates the detection of COVID-19 using a tailored deep convolutional neural network design for chest radiography images that is open source and available to the general public [3].
1.1 Related Work
The coronavirus has drawn the attention of numerous researchers to further study of the symptoms of this viral infection. One such line of study is the detection of pneumonia from chest X-ray images. There are many datasets of chest X-rays for pneumonia [4]. Researchers have presented clinical diagnosis of treatable diseases by using image-based learning models to detect or classify distinctive features [5]. In that work, pneumonia is detected with 92.80% accuracy in testing conducted on chest X-ray images. In [6], the authors proposed another learning model with four convolutional layers and two dense layers that achieves 93.73% testing accuracy. Saraiva et al. [7] presented a classification of images of childhood pneumonia using a convolutional neural network. The authors proposed a robust learning model with 7 convolutional layers and 3 dense layers that achieves 95.30% testing accuracy. G. Liang and L. Zheng presented a transfer learning model for paediatric pneumonia diagnosis.
1.2 Main Contribution
As the above papers illustrate, there is still a need for a new COVID-19 detection scheme, as earlier work seems poorly suited to identifying features in noisy CT scan datasets. In this paper, we normalize the noisy dataset of X-ray images by applying filtering and morphological operations. We apply median filtering followed by a Gaussian smoothing scheme to smooth the projection vector, which increases the occurrence of features. The Laplacian of Gaussian operator is applied at zero threshold to perform morphological filtration. The filtration is applied to radiography images of COVID-19 patients to refine and balance the contrast of the images so as to work with a rich feature set. The database comprises 16,756 chest radiography images across 13,645 patient cases, created as a combination and modification of two open access data repositories containing chest radiography images. In this paper, we have suggested an automated identification
of COVID-19 by employing a deep convolutional neural network-based pre-trained allocation model. We have used a modified version of ResNet50 to achieve effective prediction for a small X-ray dataset. The novelty of this paper is summarized as follows:
(i) Normalization and pre-processing are applied to the diseased image to obtain the best features.
(ii) An accuracy of about 96.73% is maintained.
(iii) The proposed method is robust against multiple attacks.
(iv) Training complexity is reduced by combining the CNN with ResNet-50.
Organization of the remaining contents of this paper is as follows. Section 2 will discuss the proposed scheme. Section 3 will discuss the model architecture. Section 4 elaborates the experimental observations and results. Section 5 concludes this paper. At last, Section 6 describes the references.
2 Proposed Scheme
Here, we discuss the design methodology, the resulting network architecture, the process of creating the dataset and the implementation details of creating the model. A Gaussian-based image filtration method is applied for feature enhancement. The general equation of Gaussian filtration is given below:

$$
g(x, y) = \frac{1}{2\pi\sigma^2} \, e^{-\frac{x^2 + y^2}{2\sigma^2}} \quad (1)
$$

Here, $x$ and $y$ are the spatial coordinates, and $\sigma$ is the standard deviation of the Gaussian distribution. The variation of grey level values is compressed and controlled by applying a logarithmic transform as given below:

$$
L(x, y) = K \times \log_2[1 + g(x, y)] \quad (2)
$$

where $K$ is a constant that controls the variation of illumination of the X-ray images to enhance their feature vector. All the corrupted pixels of the X-ray images are scaled with a factor of 32 to obtain an enhanced image $I(x, y)$:

$$
I(x, y) = \frac{I_{\max}\,[L(x, y) - L_{\min}]}{L_{\max} - L_{\min}} \quad (3)
$$

Here, $I_{\max}$ is the maximum intensity, and $L_{\max}$ and $L_{\min}$ are the maximum and minimum values of $L(x, y)$. Figure 1 presents the basic deep learning model for feature detection and classification. The proposed methodology of the work is described in Fig. 2. As described in Fig. 2, the CT scanned images are initially passed through noise filters (such as smoothing filters, median filters, Gaussian filter, etc.) that
Fig. 1 Deep learning model for feature detection and classification (blocks: input layer, convolutional layer, pooling layer, fully connected layer, logistic layer, output layer)
Fig. 2 Flow chart of proposed method (steps: input CT image; CT image noise filter; select different combination and activation function (ReLU and SoftMax); fit created CNN with dataset; train the model with training dataset; obtain accuracy with different CNN structure by classifier; output)
removes variation of illumination effect and restore the corrupted pixel in order to enhanced feature vector. Then different combination and activation function (Relu and Softmax) are selected for extracting features from the images according to the requirement of the model. The extracted features are further feed into the CNN, which is equipped with ResNet50 so the model is trained according to the different results obtained.
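The pre-processing chain of Eqs. (1) to (3) can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the function names, the 5 × 5 kernel size, the edge-replicating padding and the default values of `sigma`, `K` and `i_max` are not taken from the paper, and the logarithm of Eq. (2) is assumed to act on the Gaussian-filtered image.

```python
import numpy as np

def gaussian_kernel(size=5, sigma=1.0):
    """Sample g(x, y) of Eq. (1) on an odd size x size grid, normalised to sum to 1."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    g = np.exp(-(x**2 + y**2) / (2.0 * sigma**2)) / (2.0 * np.pi * sigma**2)
    return g / g.sum()

def gaussian_filter(image, size=5, sigma=1.0):
    """Smooth the image with the sampled Gaussian (border pixels are replicated)."""
    kernel = gaussian_kernel(size, sigma)
    half = size // 2
    padded = np.pad(image.astype(float), half, mode="edge")
    out = np.zeros(image.shape, dtype=float)
    rows, cols = image.shape
    for dy in range(size):
        for dx in range(size):
            out += kernel[dy, dx] * padded[dy:dy + rows, dx:dx + cols]
    return out

def log_transform(smoothed, K=1.0):
    """Eq. (2): L = K * log2(1 + g), applied here to the filtered image (assumption)."""
    return K * np.log2(1.0 + smoothed)

def rescale(L, i_max=255.0):
    """Eq. (3): stretch L linearly onto [0, i_max]."""
    l_min, l_max = L.min(), L.max()
    if l_max == l_min:  # flat image: avoid division by zero
        return np.zeros_like(L)
    return i_max * (L - l_min) / (l_max - l_min)

def preprocess(image, size=5, sigma=1.0, K=1.0, i_max=255.0):
    """Gaussian smoothing, logarithmic compression, then intensity rescaling."""
    return rescale(log_transform(gaussian_filter(image, size, sigma), K), i_max)
```

Calling `preprocess(image)` on a non-negative 2-D array returns an array of the same shape whose values span the full range from 0 to `i_max`.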
In this study, a human–machine collaborative design strategy is used to make the model robust: human-driven principled prototyping is combined with machine-driven design exploration to produce a model architecture tailored to the detection of COVID-19 cases from chest radiography images, as represented in Fig. 2. The basic feature map at any layer m is denoted $U_m$:
$$U_m(x) = \mathrm{relu}\big(f \cdot U_{m-1}(x) + W(x) * U_{m-1}(x)\big) \tag{4}$$
Here, f is a connection weight and W is a convolution kernel. The first stage of the human–machine collaborative design strategy used to create the proposed model is a principled model design prototyping stage, in which an initial design prototype is constructed based on human-driven design principles and best practices. ResNet-50 contains skip blocks of convolutional layers. The number of output feature maps is equal to the number of filters in the layer. The input value of a hidden node is described as:
$$I = \sum_{n=1}^{1024} X_n Y_{mn} \tag{5}$$
$Y_{mn}$ stands for the value of the weight between an input neuron $X_n$ and a hidden neuron I. The $Y_{mn}$ comprise a total of 1024 × 256 weight values. The output value of a hidden neuron is expressed as:
$$Y = \frac{1}{1 + e^{I}} \tag{6}$$
The input value of an output neuron is described as:
$$Z = \sum_{n=1}^{256} Y_n W_n \tag{7}$$
Here, $W_n$ denotes the weight between a hidden neuron $Y_n$ and an output neuron Z. The $W_n$ form a matrix with a total of 256 × 15 weight values. An activation function is used to generate the output. The output value of the neuron is expressed as:
$$Z_o = \frac{1}{1 + e^{Z}} \tag{8}$$
The error rate can be described as:
$$E = \frac{1}{2}\sum_{n=1}^{15} (O_n - Z_n)^2 \tag{9}$$
Here, $O_n$ is the actual output and $Z_n$ is the calculated output. Backward propagation is then used to propagate the error values back to the neurons. The loss, or recognition error, is minimized through a weight-update process. The weights are updated according to the following expressions:
$$\Delta V = \mu \frac{\partial E}{\partial V} \tag{10}$$
$$\Delta W = \mu \frac{\partial E}{\partial W} \tag{11}$$
$$V = V + \Delta V \tag{12}$$
$$W = W + \Delta W \tag{13}$$
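The forward pass of Eqs. (5) to (9) and the weight update of Eqs. (10) to (13) can be illustrated with a small NumPy sketch of a 1024-256-15 network. Several points are assumptions rather than details from the paper: the conventional logistic function 1/(1 + e^(-x)) is used as the activation, the update moves against the gradient (ΔW = -μ ∂E/∂W) so that the error decreases, V denotes the input-to-hidden weights, and the names `forward`, `train_step` and the default `mu` are hypothetical.

```python
import numpy as np

def sigmoid(x):
    """Conventional logistic activation (assumed sign convention)."""
    return 1.0 / (1.0 + np.exp(-x))

def forward(x, V, W):
    """Input (1024) -> hidden (256) -> output (15), following Eqs. (5)-(8)."""
    hidden = sigmoid(x @ V)       # weighted sum of inputs, then activation
    output = sigmoid(hidden @ W)  # weighted sum of hidden values, then activation
    return hidden, output

def squared_error(target, output):
    """Eq. (9): half the sum of squared differences."""
    return 0.5 * np.sum((target - output) ** 2)

def train_step(x, target, V, W, mu=0.01):
    """One back-propagation step with learning rate mu (gradient descent assumed)."""
    hidden, output = forward(x, V, W)
    # Derivative of E with respect to the output-layer pre-activations.
    delta_out = (output - target) * output * (1.0 - output)
    # Error propagated back to the hidden-layer pre-activations.
    delta_hid = (delta_out @ W.T) * hidden * (1.0 - hidden)
    grad_W = np.outer(hidden, delta_out)  # dE/dW, shape (256, 15)
    grad_V = np.outer(x, delta_hid)       # dE/dV, shape (1024, 256)
    return V - mu * grad_V, W - mu * grad_W
```

Repeating `train_step` on a sample lowers `squared_error` for that sample as long as `mu` is small enough; in practice the updates would be averaged over mini-batches.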
The rationale for choosing these possible predictions is that they can help clinicians to better decide not only who should be prioritized for PCR testing for COVID-19 case confirmation, but also which treatment strategy to use depending on the cause of infection, since COVID-19 and non-COVID19 infections require different treatment plans. The second stage of the human–machine collaborative design strategy used to create the proposed model is a machine-driven design exploration stage, in which the initial design prototype, the data and the human-specified design requirements act as a guide for a design exploration strategy that learns and identifies the optimal macroarchitecture and microarchitecture designs with which to construct the final customized deep neural network architecture.
3 Model Architecture
The overall architecture of the proposed model is depicted in Fig. 1, which shows the layers of the CNN. The other modules of the architecture are described as follows:
Sample: In this paper, the Python programming language was used to train the proposed deep transfer learning models. All experiments were performed using ResNet50. The CNN models (ResNet50) were pre-trained with random initialization weights using the Adam optimizer. The batch size, learning rate and number of epochs were experimentally set to 2, 1e-5 and 30, respectively, for all experiments. The dataset was randomly split into two independent datasets for training and testing, respectively; the resulting accuracy was 96.73%.
Data Collection and Analysis: The dataset used to train and evaluate the proposed model is referred to as COVID19; it comprises 16,756 chest radiography images across 13,645 patient cases. To generate the COVID19 dataset, two different publicly available datasets were modified and combined: the COVID-19 image data collection and the RSNA pneumonia detection challenge dataset. These two were chosen because they are open source and available to the general public and researchers.
4 Experimental Results
To assess the effectiveness of the proposed model, we perform both quantitative and qualitative analysis to gain a better understanding of its detection performance and decision-making behaviour.
Quantitative Analysis: To examine the proposed model quantitatively, we computed the test accuracy, as well as the sensitivity and positive predictive value (PPV) for each infection type, on the aforementioned dataset. The test accuracy, along with the architectural complexity and computational complexity, is shown in Table 1. Table 1 illustrates that coronavirus detection using ResNet-50 is highly effective. It can be seen that the model strikes a balance between accuracy and computational complexity by achieving 96.73% test accuracy while requiring only thousands of MAC operations to perform case prediction. These observations also highlight the effectiveness of using a human–machine collaborative design strategy to create highly customized deep neural network architectures more quickly, tailored to the task, data and operational requirements. This is particularly important for scenarios such as disease detection, where new cases and new data are collected continuously, and the ability to rapidly generate new customized neural networks fitted to the ever-evolving knowledge base is highly desirable. The purpose is to classify the X-rays into normal lung (as shown in Fig. 4), pneumonia (as shown in Fig. 5) and COVID-19 (as shown in Fig. 3). From these images, we can see that lung opacities are present in both the COVID-19 and the pneumonia chest X-ray images. The opacities are vague, fuzzy clouds of white in the darkness of the lungs. As the differences between pneumonia and COVID-19 X-rays were very subtle, high-contrast images were created to make classification easier, as shown in Table 2.
To this end, we normalized the X-rays of each patient by subtracting the mean.

Table 1 Performance comparison of the proposed approach with existing techniques

Publication        Technique                       Accuracy (%)
Wang et al. [1]    Convolutional neural network    92.4
Li et al. [2]      ResNet-50                       90
Shi et al. [3]     Random forest method            87.9
Wong et al. [8]    COVID-NET                       95
Proposed paper     Modified ResNet-50              96.73
Fig. 3 Representation of chest X-ray of COVID 19 [9, 10]
Fig. 4 Representation of chest X-ray of normal [9, 10]
Fig. 5 Representation of chest X-ray of pneumonia [9, 10]
Table 2 Representation of chest X-ray of input, noise and filtered radiographic image [9, 10] (columns: input X-ray, noise X-ray, filtered X-ray)
It can be seen that the model achieves good sensitivity for COVID-19 cases, which is important since we want to limit the number of missed COVID-19 cases. Second, the model achieves high PPV for COVID-19 cases, which indicates few false positive COVID-19 detections. This high PPV is important given that too many false positives would increase the burden on the healthcare system because of the need for additional PCR testing and additional care. Third, the PPV for normal and non-COVID19 infection cases is noticeably higher than for COVID-19 infection. Fourth, sensitivity is also noticeably higher for normal and non-COVID19 infection cases than for COVID-19 infection cases. Consequently, based on these results, while the model performs well overall in detecting COVID-19 cases from chest radiography images, as shown in Table 1, there are several areas of improvement that could benefit from collecting additional data.
Qualitative Analysis: We further examine how the model makes predictions by using GSInquire, an explainability method that has been shown to provide good insights into how deep neural networks reach their decisions. It can be observed that the proposed model identifies localized areas within the lungs in the chest radiography images, as shown in Figs. 2, 3 and 4 and Table 2, as being critical factors in determining whether a radiography image is of a patient with a SARS-CoV-2 viral infection.
• Transparency: By understanding the critical factors used in COVID-19 case detection, the predictions made by the proposed model become more transparent and trustworthy for clinicians to use during their screening process, helping them make faster yet accurate assessments.
• New insight discovery: The critical factors used by the proposed model could help clinicians discover new insights into the key visual indicators associated with SARS-CoV-2 viral infection, which they can then use to improve screening accuracy.
• Performance validation: By understanding the critical factors used in COVID-19 case detection, one can validate that the proposed model is relying on reliable information to make its decisions.
The modified layer architecture is proposed in this paper because its layers are simple and its classification accuracy is higher than that of the original layers. In the modified layer architecture, the input layer is connected to the convolution layer, and the convolution layer is in turn connected to a fully connected layer. The ReLU and softmax activation functions are associated with the fully connected layer, followed by the classification layer. As illustrated in Fig. 6, the ROC curve and confusion matrix obtained with ResNet50 in simulation show that ResNet50 alone is more effective than a CNN alone for the detection of COVID-19. ResNet50 alone is capable of providing nearly 96.73% efficiency, which is not yet at the desired level. ResNet101 alone is capable of providing 80% efficiency, which is still insufficient to provide the desired results.
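The sensitivity, PPV and accuracy discussed above can all be read off a confusion matrix. The sketch below shows one way to compute them with NumPy; the function name and the convention that rows are true classes and columns are predicted classes are assumptions, and the counts in the usage example are invented purely for illustration, not results from the paper.

```python
import numpy as np

def sensitivity_and_ppv(confusion):
    """Per-class sensitivity and PPV plus overall accuracy.

    confusion[i, j] is the number of samples of true class i predicted as class j.
    """
    confusion = np.asarray(confusion, dtype=float)
    true_pos = np.diag(confusion)
    actual = confusion.sum(axis=1)     # samples per true class
    predicted = confusion.sum(axis=0)  # samples per predicted class
    sensitivity = np.divide(true_pos, actual,
                            out=np.zeros_like(true_pos), where=actual > 0)
    ppv = np.divide(true_pos, predicted,
                    out=np.zeros_like(true_pos), where=predicted > 0)
    accuracy = true_pos.sum() / confusion.sum()
    return sensitivity, ppv, accuracy

# Illustrative counts only (classes: normal, pneumonia, COVID-19).
example = [[90, 5, 5],
           [4, 88, 8],
           [2, 6, 92]]
sens, ppv, acc = sensitivity_and_ppv(example)
```

A low sensitivity for a class means many missed cases of that class, while a low PPV means many false alarms, which is why both are reported alongside accuracy.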
Fig. 6 Confusion matrix and ROC Curve of ResNet50
Fig. 7 Training accuracy performance of models
Figure 7 shows the accuracy during the training process. The dataset is divided in the ratio of 60:40 for training and testing. The classifier classifies the CT images as COVID positive or negative. The confusion matrix and ROC curve obtained in simulation using ResNet50 together with the CNN give approximately 96.73% accuracy in the detection of COVID-19. The proposed model also outperforms some recently developed architectures such as VGG-19, COVID-NET and MADE-DBM. The classification model based on the combination of a deep bidirectional network and the memetic adaptive differential evolution (MADE-DBM [11]) algorithm produces a testing accuracy of 96.1983%, which is still lower than that of the proposed algorithm. Figure 8 shows the real-time COVID-19 disease transmission scenario among carriers. In this scenario, a red dot represents an infected person, a blue dot a recovered person, a green dot an unaffected person and a black dot a non-recovered or dead person. The wall that separates the area into four parts is represented by a magenta line. As shown in Fig. 8a, the total simulation runs for 40 days; on day one the wall is closed, and only two or three persons are infected after 3 days. As shown in Fig. 8b, as the number of days increases, the wall is opened slowly and more people become infected; some persons recover, and some die. The final simulation result can be viewed in the graph, in which the x-axis is the number of days and the y-axis is the population. As the days increase, the number of infected persons (red line) also increases, reaching a maximum at around 25 days and then decreasing. Similarly, the green line, which represents unaffected persons, decreases. The blue and black lines, which represent recovered and dead persons, also increase as the days increase.

Fig. 8 Real-time simulation: a after 3 days effect, and b after 40 days effect
5 Conclusions and Future Work
This paper investigated how the model detects COVID-19 features with desirable accuracy using a rational procedure, which provides further insights into the critical factors. Early prediction of COVID-19 patients is vital to prevent the spread of the disease to other people, and a deep convolutional neural network structure together with ResNet50 was used for this purpose. In this paper, we proposed a deep transfer learning-based approach using chest X-ray images obtained from COVID-19 patients and normal subjects to predict COVID-19 patients automatically. Performance results show that the ResNet50 pre-trained model yielded the highest accuracy of 96.73% among the models considered. In the light of our findings, it is believed that the proposed model will help doctors to make decisions in clinical practice owing to its superior performance.
References
1. A. Al-Hazmi, Challenges presented by MERS corona virus, and SARS corona virus to global health. Saudi J. Biol. Sci. 23(4), 507–511 (2016)
2. D. Kumar, R. Malviya, P.K. Sharma, Corona virus: a review of COVID-19. Eurasian J. Med. Oncol. 4, 8–25 (2020)
3. Y. Yuliana, Corona virus diseases (Covid-19): Sebuah tinjauan literature. Wellness Healthy Mag. 2(1), 187–192 (2020)
4. P. Yang, P. Liu, D. Li, D. Zhao, Corona Virus Disease 2019, a growing threat to children? J. Infection (2020)
5. M. Hadjidemetriou, Z. Al-Ahmady, M. Mazza, R.F. Collins, K. Dawson, K. Kestrels, In vivo biomolecule corona around blood-circulating, clinically used and antibody-targeted lipid bilayer nanoscale vesicles. ACS Nano 9(8), 8142–8156 (2015)
6. O. Vilanova, J.J. Mittag, P.M. Kelly, S. Milani, K.A. Dawson, J.O. Rädler, G. Franzese, Understanding the kinetics of protein–nanoparticle corona formation. ACS Nano 10(12), 10842–10850 (2016)
7. A. Saraiva, N. Ferreira, L. Lopes de Sousa, N. Costa, J. Sousa, D. Santos, A. Valente, S. Soares, Classification of Images of Childhood Pneumonia using Convolutional Neural Networks. https://doi.org/10.5220/0007404301120119
8. L. Wang, A. Wong, COVID-Net: A Tailored Deep Convolutional Neural Network Design for Detection of COVID-19 Cases from Chest X-Ray Images. arXiv preprint arXiv:2003.09871 (2020)
9. J.P. Cohen, P. Morrison, L. Dao, K. Roth, T.Q. Duong, M. Ghassemi, COVID-19 Image Data Collection: Prospective Predictions Are the Future. arXiv:2006.11988 (2020). https://github.com/ieee8023/covid-chestxray-dataset
10. Dataset link. https://www.kaggle.com/paultimothymooney/chest-xray-pneumonia
11. Y. Pathak, P.K. Shukla, K.V. Arya, Deep bidirectional classification model for COVID19 disease infected patients, in IEEE/ACM Transactions on Computational Biology and Bioinformatics (2020)
Chapter 15
Super-Resolution MRI Using Fractional Order Kernel Regression and Total Variation
Vazim Ibrahim and Joseph Suresh Paul
V. Ibrahim: IIITM-K, Trivandrum, Kerala and CUSAT, Kochi, Kerala, India
J. S. Paul (corresponding author): IIITM-K, Trivandrum, Kerala, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022. R. Tiwari et al. (eds.), Proceedings of International Conference on Computational Intelligence, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-16-3802-2_15

1 Introduction
Spatial resolution is one of the key imaging parameters for MRI and is limited by several factors such as hardware configuration, acquisition duration, desired signal-to-noise ratio (SNR) and patient throughput. Retrospectively, the improvement of spatial resolution is usually achieved at the cost of reduced SNR or increased scanning time. One approach to speeding up the acquisition is to acquire low-resolution data with inherent blur and partial volume effects. This necessitates image post-processing methods that aim at improving the spatial resolution without compromising the desired SNR and that include an intrinsic deblurring mechanism. From an image processing perspective, the combination of deblurring and upsampling to improve resolution is a super-resolution reconstruction (SRR) problem [1]. SRR methods are mainly classified into interpolation-based, reconstruction-based and learning-based methods [2–4]. Interpolation-based methods aim at producing high-resolution (HR) images by upsampling low-resolution (LR) images using low-order polynomial interpolation, for example, bilinear and bicubic interpolation. Learning-based methods utilize residual information obtained from a training database to reconstruct the HR image [4]. One of the simplest learning methods represents an HR image patch as a sparse linear combination in an overcomplete dictionary trained on HR patches sampled from training LR images [5]. Learning methods based on Convolutional Neural Networks (CNN) have also been widely applied to SRR problems in MRI [6]. The disadvantages of learning-based methods mainly arise from redundancies in hidden layers and the accompanying computational complexity, which lead to memory mapping issues [7].
In reconstruction-based methods, a variational framework is used to optimize a cost that involves a data fidelity term and a regularization term to incorporate prior knowledge on the image. Each variational SRR approach differs in its formulation based on the type of regularizer(s) chosen to suit a specific application. One of the well-known regularizers is the Total Variation (TV) prior, first proposed by Rudin et al. [8] for image noise removal. Thereafter, many generalizations and further applications of the TV functional, such as image enhancement and image super-resolution, have been proposed. For a comprehensive overview of this topic, we refer interested readers to the introductory article of Chambolle et al. [9]. Despite its limitations of over-smoothing in homogeneous regions and staircase artifacts, the TV-based prior has been extensively used to penalize the convex optimization model of image variation by employing the image gradient with an l1-type penalty [10]. Since regularizers based on standard forms of the TV functional favor a piecewise constant solution, several modifications of the TV functional have been developed to overcome the main limitations and to yield a piecewise smooth approximation. One such approach uses kernel smoothing to reduce the jaggedness of edges by adjusting the order of polynomial regression [11]. More refined versions make use of a local Taylor series expansion of a fixed order wherein the TV functional is derived using weights generated from steering kernels, as in the Adaptive Kernel Total Variation (AKTV) [11–14]. Local structure description using the structure tensor enables adaptation to the local edge geometry, preserving sharpness while simultaneously reducing staircase effects. Beyond the regression framework, modified forms such as the Structure Tensor Total Variation (STV) [15] penalize the Schatten-p norm of a neighborhood matrix of spatial gradients.
Moreover, motivated by the work of Shi et al. [16], we include in our variational framework a prior based on low-rank matrix approximation (LRMA). Since direct rank minimization is difficult to solve, the problem is usually relaxed by minimizing the nuclear norm of the estimated matrix, referred to as the Nuclear Norm Minimization (NNM) approach. Candes et al. [17] proved that a low-rank matrix can be reconstructed from noisy data with high probability using the NNM prior. Cai et al. [18], on the other hand, solved nuclear norm minimization as a nuclear norm proximal (NNP) problem employing a soft threshold on the singular values of the estimated matrix. Despite these advantages, NNM has the limitation that all singular values are shrunk with the same threshold. This ignores the prior knowledge residing in the singular values of the estimated matrix. For example, the largest singular values of the redundant matrix to be estimated carry the major edge and texture information, so the matrix should be recovered by shrinking the largest singular values with a low threshold while shrinking the smaller ones with a larger threshold. To address this, weighted nuclear norm minimization (WNNM) was proposed by Gu et al. [19, 20] for the image denoising problem, with theoretical and numerical validation. Motivated by the latter work, Yang et al. [22] exploited non-local self-similarity in WNNM so that noise in the texture information could be suppressed with robust textural preservation, termed the Weighted Non-Local Nuclear Norm (WNLNN).
In this paper, we adopt a kernel regression approach with steering kernel estimation using fractional-order gradients and a Taylor series representation involving integer and fractional orders. TV regularization using fractional-order gradients is known to enhance textural details in smooth regions along with suppression of staircase artifacts. Thus, introducing the fractional-order gradient makes the adaptation of the steering kernel more effective. With this motivation, we model a kernel regression framework using a TV functional with a local fractional-order Taylor series signal representation, termed Fractional-order Adaptive Kernel Total Variation (FOAKTV). Noise suppression in texture regions is achieved by a Weighted Non-Local Nuclear Norm (WNLNN) based spectral low-rank prior. To obtain the desired super-resolution solution, the cost function with the aforementioned priors is minimized using the Alternating Direction Method of Multipliers (ADMM). The proposed method outperforms state-of-the-art methods such as low-rank total variation (LRTV), dictionary learning (DL) and adaptive kernel total variation (AKTV) in terms of the output signal-to-noise ratio (SNR) at different input noise levels.
2 Theory
2.1 Super-Resolution Reconstruction Framework
Super-resolution reconstruction (SRR) deals with recovering an HR image $f \in \mathbb{R}^{M_h \times N_h}$ from a given low-resolution (LR) image $g \in \mathbb{R}^{M_l \times N_l}$, where $M_h = r \times M_l$, $N_h = r \times N_l$ and r is the upsampling factor. The observed LR image can be mathematically expressed as
$$g = DBf + \eta, \tag{1}$$
where D denotes the down-sampling operator, B denotes the blurring operator and η denotes the noise. The HR image in (1) can be estimated by minimizing the cost function
$$\hat{f} = \min_{f} \frac{1}{2}\|g - DBf\|^{2}, \tag{2}$$
where $\hat{f}$ denotes the desired HR image. Due to the ill-posed nature of (2), a regularization term is introduced in the form of some prior information to the original cost function in (2). In the regularized form, the cost function becomes
$$\hat{f} = \min_{f} \frac{1}{2}\|g - DBf\|^{2} + \lambda R(f), \tag{3}$$
where λ denotes the regularization parameter and R(f) denotes the regularization term that imposes prior information on the solution. One of the most commonly used regularizers is the TV functional, which preserves sharp edges under a piecewise constant approximation. Since MR images favor piecewise smooth approximations, use of the TV prior can often yield artificial edges, known as staircase artifacts [8]. Although incorporating higher-order information to promote piecewise smooth assumptions is able to mitigate this problem [9], robust estimation of fine edge details and sharp corners still poses challenges [3]. Motivated by the work of Takeda et al. [13, 14] and Shi et al. [16], the proposed SRR variational model based on (3) can be written as
$$\hat{f} = \min_{f} \frac{1}{2}\|g - DBf\|^{2} + \lambda_{tv}\,\mathrm{FOAKTV}(f) + \lambda_{wnlnn}\|f\|_{wnlnn}. \tag{4}$$
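The observation model g = DBf + η of (1) and the data fidelity term of (2) can be sketched with NumPy as follows. The specific choices here are assumptions made for illustration: B is a small symmetric blur kernel applied with replicated borders, D averages non-overlapping r × r blocks, the noise is Gaussian, and all function names are hypothetical.

```python
import numpy as np

def blur(f, kernel):
    """B: filter f with an odd-sized, symmetric kernel (borders replicated)."""
    kh, kw = kernel.shape
    padded = np.pad(f, ((kh // 2, kh // 2), (kw // 2, kw // 2)), mode="edge")
    out = np.zeros_like(f, dtype=float)
    rows, cols = f.shape
    for dy in range(kh):
        for dx in range(kw):
            out += kernel[dy, dx] * padded[dy:dy + rows, dx:dx + cols]
    return out

def downsample(f, r):
    """D: average non-overlapping r x r blocks (image sides must be multiples of r)."""
    mh, nh = f.shape
    return f.reshape(mh // r, r, nh // r, r).mean(axis=(1, 3))

def upsample(g, r):
    """Replicate every LR pixel over an r x r block (a simple inverse of D)."""
    return np.kron(g, np.ones((r, r)))

def observe(f, kernel, r, noise_std=0.0, rng=None):
    """Simulate g = D B f + eta with zero-mean Gaussian noise of std noise_std."""
    rng = np.random.default_rng() if rng is None else rng
    g = downsample(blur(f, kernel), r)
    return g + noise_std * rng.standard_normal(g.shape)

def data_fidelity(f, g, kernel, r):
    """The term 0.5 * ||g - D B f||^2 shared by (2), (3) and (4)."""
    residual = g - downsample(blur(f, kernel), r)
    return 0.5 * np.sum(residual ** 2)
```

Because D discards information, many HR images f give the same `data_fidelity` value, which is the ill-posedness that the regularization terms in (3) and (4) are meant to resolve.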
2.2 Fractional Order Adaptive Kernel Total Variation
Adaptive Kernel Total Variation proposes a framework based on the development of locally adaptive filters with coefficients depending on the pixels in a local neighborhood of interest. These filter coefficients are computed with a particular measure of similarity and consistency between the neighboring pixels that uses the local geometric and radiometric structure of the neighborhood at a pixel location. Although AKTV preserves edges and textures while reducing staircase effects, over-smoothing in homogeneous regions remains a challenge. Total variation tends to over-smooth homogeneous regions, and the local steering kernel function cannot sufficiently suppress this effect, so fine textural details are smoothed out in the reconstructed image. One remedy is to increase the regression order, which uses higher-order gradients of the image at high computational complexity. Alternatively, fractional-order gradients are effective in mitigating these effects in image restoration problems while preserving fine geometric features [21]. In view of this, we propose to use fractional-order gradients along with adaptive kernel total variation as the regularization term to provide a better reconstruction with fine geometric structure preservation even in strong noise. The discrete fractional-order gradient for the FOAKTV regularization term is defined as
$$D^{(\alpha)}_{[x_i,\, x_i+w]} f(x_i, y_j) = \sum_{r=0}^{k} (-1)^{r} \frac{\Gamma(\alpha+1)}{\Gamma(r+1)\,\Gamma(\alpha-r+1)}\, f(x_i + r, y_j), \tag{5}$$
where $D^{(\alpha)}_{(\cdot)}$ is the α-th order fractional derivative. We define $(x_i, y_j)$ for $i = 1, 2, \ldots, M_h + w$, $j = 1, 2, \ldots, N_h + w$ as the spatial sampling positions in the image domain Ω. Let us denote equidistant nodes with step h, $t_k = hk$, $k = 0, 1, 2, \ldots, b$, $b = w - 1$, for the x-direction in the interval $[x_i, x_i + w]$, where $t_0 = x_i$ and $t_b = x_i + w$. Γ(·) is the gamma function. Similarly, we can derive the definition of $D^{(\alpha)}_{[y_j,\, y_j+w]} f(x_i, y_j)$. Having defined the discrete fractional-order gradients of an image, the proposed regularization functional, termed fractional order adaptive kernel total variation, is given by
$$\mathrm{FOAKTV}(F) = \sum_{v_x, v_y} \left\| W_f^{1/e}(v, 1, H)\left(f - S_x^{-v_x} S_y^{-v_y}\, I \cdot F\right) \right\|_e^e \tag{6}$$
where $I = [I, I_{v_x}, I_{v_y}, I^{(\alpha)}_{v_x}, I^{(\alpha)}_{v_y}]$ and $F = [f, f_{v_x}, f_{v_y}, f^{(\alpha)}_{v_x}, f^{(\alpha)}_{v_y}]$, with $I \in \mathbb{R}^{M_h N_h \times 1}$ representing the lexicographically ordered identity matrix and $I_{v_x} = \mathrm{diag}(v_x, v_x, \ldots, v_x)$ for a pixel shift $v = [v_x, v_y]$ from the $l_p$-th pixel, $p = [1, 2, 3, 4, \ldots, M_h N_h]$, in the neighborhood window w × w. $S_x^{-v_x}$ and $S_y^{-v_y}$ are the shift operators defined along the x and y directions, respectively. $W_f^{1/e}(v, 1, H)$ is the weight kernel defined using the kernel function $K_H(l_p - l)$ given by
$$K_{H_p}(l_p - l) = \frac{\sqrt{\det(C_p)}}{2\pi h^{2}} \exp\left(-\frac{\left\|(l_p - l)^{T} C_p^{1/2} (l_p - l)\right\|_2^2}{2h^{2}}\right) \tag{7}$$
where the covariance matrix is $C_p = \gamma_p U_{\theta_p} \Lambda_{\theta_p} U_{\theta_p}^{T}$, ∀p, with rotation matrix $U_{\theta_p}$, elongation matrix $\Lambda_{\theta_p}$ and scaling parameter $\gamma_p$. $H_p = h C_p^{1/2}$ is the steering matrix.
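The coefficients of the discrete fractional-order gradient in (5) can be generated without evaluating the gamma function directly, using the recurrence c_0 = 1, c_r = c_{r-1}(1 - (α + 1)/r), which is algebraically equal to (-1)^r Γ(α+1)/(Γ(r+1)Γ(α-r+1)). The sketch below applies these coefficients along one image axis. Treating the x-direction as the column axis and replicating the border for samples beyond the image are assumptions, as are the function names.

```python
import numpy as np

def gl_coefficients(alpha, k):
    """Coefficients (-1)^r * Gamma(alpha+1) / (Gamma(r+1) * Gamma(alpha-r+1)), r = 0..k."""
    coeffs = np.empty(k + 1)
    coeffs[0] = 1.0
    for r in range(1, k + 1):
        # Ratio of consecutive coefficients: (r - alpha - 1) / r.
        coeffs[r] = coeffs[r - 1] * (1.0 - (alpha + 1.0) / r)
    return coeffs

def fractional_gradient_x(f, alpha, k):
    """Evaluate the sum in (5) along axis 1: sum_r c_r * f(x + r, y)."""
    coeffs = gl_coefficients(alpha, k)
    cols = f.shape[1]
    # Samples to the right of the image are taken from the replicated border.
    padded = np.pad(f.astype(float), ((0, 0), (0, k)), mode="edge")
    out = np.zeros(f.shape, dtype=float)
    for r, c in enumerate(coeffs):
        out += c * padded[:, r:r + cols]
    return out
```

For α = 1 the coefficients reduce to (1, -1, 0, ...), so the operator becomes f(x) - f(x + 1), an ordinary first difference; for 0 < α < 1 all k + 1 samples contribute with slowly decaying weights, which is what gives the fractional gradient its longer memory.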
2.3 Spectral Low Rank Prior
Furthermore, exploiting spatially redundant information for texture preservation can significantly improve the reconstruction performance of an image. In that context, we adopt the spatial non-local similarity-based regularizer proposed in [22] to preserve textures while suppressing noise. Nuclear Norm Minimization (NNM) can suppress noise in the image by shrinking the singular values of the observed matrix, since the larger singular values of the observed matrix quantify the principal directions. However, in traditional NNM problems, all singular values are shrunk with the same threshold, which may not suppress noise efficiently in denoising problems and may lose structural information. In view of this, Weighted Nuclear Norm Minimization (WNNM) enhances the capabilities of the traditional nuclear norm with a rationally varying weight vector based on prior knowledge. The WNLNN for the image f is defined as
$$\|f\|_{wnlnn} = \sum_{k=1}^{M_h N_h} \|\Theta_k(f)\|_{\omega} \tag{8}$$
where $\|\cdot\|_{\omega}$ is the WNN defined in (8), and $\Theta_k(\cdot): \mathbb{R}^{w^{2} \times 1} \to \mathbb{R}^{w^{2} \times m}$ is the Casorati matrix construction operator with m similar patches for the k-th patch of dimension w × w in f. The solution of (8) is the weighted version of singular value thresholding (SVT), given by
$$\mathrm{WNLSVT}(\Theta_k(f)) = U_k\, \mathrm{soft}_{\omega_i}(\Delta_k)\, V_k^{T}, \tag{9}$$
$$\mathrm{soft}_{\omega_i} = \mathrm{diag}\left(\max\left(\sigma_i(\Theta_k(f)) - \frac{\omega_i}{2},\, 0\right)\right). \tag{10}$$
where $k = 1, 2, 3, \ldots, M_h N_h$, $i = 1, 2, \ldots, s$ and $\omega_i$ is estimated by
$$\omega_i = \frac{c\sqrt{m}}{\sigma_i(\Theta_k(f)) + \epsilon} \tag{11}$$
where c is a constant, m is the number of similar patches for the k-th patch of f, ε is a small constant to avoid division by zero, and $\sigma_i(\Theta_k(f))$ can be calculated as
$$\sigma_i(\Theta_k(f)) = \sqrt{\max\left(\sigma_i^{2}(\Theta_k(f)) - \kappa,\, 0\right)} \tag{12}$$
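The weighted singular value thresholding of (9) to (11) can be sketched for a single patch matrix with NumPy's SVD. This is a simplified illustration under stated assumptions: the weights of (11) are computed directly from the singular values of the input matrix (the re-estimation in (12) is skipped), the patch grouping performed by the Casorati operator is not shown, and the function name and default values of `c` and `eps` are hypothetical.

```python
import numpy as np

def weighted_svt(patch_matrix, c=1.0, eps=1e-8):
    """Shrink each singular value by half of its own weight, as in (9)-(10).

    patch_matrix has one vectorised patch per column (w^2 rows, m columns).
    Returns the low-rank estimate and the weights that were used.
    """
    U, sigma, Vt = np.linalg.svd(patch_matrix, full_matrices=False)
    m = patch_matrix.shape[1]
    # Eq. (11): large singular values receive small weights and vice versa.
    weights = c * np.sqrt(m) / (sigma + eps)
    # Eq. (10): soft-threshold every singular value with its own threshold.
    shrunk = np.maximum(sigma - weights / 2.0, 0.0)
    # Eq. (9): rebuild the matrix from the shrunken spectrum.
    return (U * shrunk) @ Vt, weights
```

Because the weights are inversely proportional to the singular values, dominant components that carry edges and texture are barely changed, while small, noise-dominated components are set to zero.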
3 Proposed SRR Optimization
In this section, we discuss the proposed variational model given in (4). As defined earlier, we propose an SRR method incorporating FOAKTV as the regularization term (for mathematical simplicity, we show the optimization steps based on Q = 1 for FOAKTV in (6)); therefore, (4) can be rewritten as
$$\hat{f} = \min_{f} \frac{1}{2}\|g - DBf\|^{2} + \lambda_{tv} \sum_{v_x, v_y} \left\| W_f^{1/e}(v, 1, H)\left(f - S_x^{-v_x} S_y^{-v_y}\, I \cdot F\right) \right\|_e^e + \lambda_{wnlnn}\|f\|_{wnlnn} \tag{13}$$
where $\lambda_{tv}$ is the regularization parameter for FOAKTV and $\lambda_{wnlnn}$ is the regularization parameter for WNLNN. To solve the variational problem, we use the Alternating Direction Method of Multipliers (ADMM), which has proven to be efficient for solving SRR problems [16]. To proceed, the optimization functional in (13) needs to be expressed in a dual-variable formulation as
$$\hat{f} = \min_{f, f_1} \frac{1}{2}\|g - DBf\|^{2} + \lambda_{tv} \sum_{v_x, v_y} \left\| W_f^{1/e}(v, 1, H)\left(f - S_x^{-v_x} S_y^{-v_y}\, I \cdot F\right) \right\|_e^e + \lambda_{wnlnn}\|f_1\|_{wnlnn} \quad \text{s.t. } f = f_1 \tag{14}$$
The first step towards ADMM optimization is to define the Lagrange function for (14), which is given by
$$\mathcal{L}(f, f_1, Z, \rho) = \frac{1}{2}\|g - DBf\|^{2} + \lambda_{tv} \sum_{v_x, v_y} \left\| W_f^{1/e}(v, 1, H)\left(f - S_x^{-v_x} S_y^{-v_y}\, I \cdot F\right) \right\|_e^e + \lambda_{wnlnn}\|f_1\|_{wnlnn} + \frac{\rho}{2}\left\| f - f_1 + \frac{Z}{\rho} \right\|^{2} \tag{15}$$
where ρ is the penalty parameter and Z is the Lagrange multiplier term. To ease the optimization, the problem in (15) is split into four subproblems, and the update step of the optimization variable for each subproblem is given below:
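3.1 Subproblem 1: Update $f^{k+1}$ By Minimizing
$$\min_{f} \frac{1}{2}\|g - DBf\|^{2} + \lambda_{tv} \sum_{v_x, v_y} \left\| W_f^{1/e}(v, 1, H)\left(f - S_x^{-v_x} S_y^{-v_y}\, I \cdot F\right) \right\|_e^e + \frac{\rho^{k}}{2}\left\| f - f_1^{k} + \frac{Z^{k}}{\rho^{k}} \right\|^{2} \tag{16}$$
This minimization functional is solved using a gradient descent method with a step size of δ. The iterative update step is given by
$$f^{k+1} = f^{k} + \delta\Big[ B^{T} D^{T}\big(g - DBf^{k}\big) + \rho^{k}\Big(f^{k} - f_1^{k} + \frac{Z^{k}}{\rho^{k}}\Big) + \lambda_{tv} \sum_{v_x, v_y} \big(f^{k} - S_x^{-v_x} S_y^{-v_y}\, I \cdot F\big)^{T} W_f(v, 1, H)\Big(\mathrm{sign}\big(f^{k} - S_x^{-v_x} S_y^{-v_y}\, I \cdot F\big) \circ \big|f^{k} - S_x^{-v_x} S_y^{-v_y}\, I \cdot F\big|^{e-1}\Big) \Big] \tag{17}$$
where ◦ is the element-by-element product operator for two vectors.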
3.2 Subproblem 2: Update $f_1^{k+1}$ By Minimizing
$$\min_{f_1} \frac{\rho^{k}}{2}\left\| f^{k+1} - f_1 + \frac{Z^{k}}{\rho^{k}} \right\|^{2} + \lambda_{wnlnn}\|f_1\|_{wnlnn} \tag{18}$$
This subproblem is solved iteratively with a gradient descent step for the fidelity term and the weighted non-local nuclear norm described in Sect. 2, defined by
$$f_t^{r+1} = f_t^{r} - \delta\rho^{k}\Big(f^{k+1} - f_t^{r} + \frac{Z^{k}}{\rho^{k}}\Big), \qquad f_t^{r+1} = \mathrm{WNLSVT}\big(f_t^{r+1}\big) \tag{19}$$
where δ is the step size for the gradient descent step, taken here as 0.1. WNLSVT(·) is the Weighted Non-Local Nuclear Norm solution as defined in (9), with threshold $\gamma = \lambda_{wnlnn}/\rho^{k}$. The steps in (19) are repeated iteratively to obtain the update of $f_1$ at the (k + 1)-th iteration as $f_1^{k+1} = f_t^{r+1}$.
3.3 Subproblem 3: Lagrange Multiplier and Penalty Parameter Update

$$Z^{k+1} = Z^k + \rho^k \left[ f^{k+1} - f_1^{k+1} \right], \qquad \rho^{k+1} = o\,\rho^k \tag{20}$$

where $Z^{k+1}$ is an increasing sequence and $o > 1$.
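The update in (20) can be sketched in a few lines. This is only a minimal illustration and not the authors' Matlab implementation: images are assumed to be flattened into plain Python lists, and the function name `update_multiplier_and_penalty` and the example growth factor are hypothetical choices made for the sketch.

```python
def update_multiplier_and_penalty(Z, f, f1, rho, o=1.1):
    """One pass of Eq. (20): Z <- Z + rho * (f - f1), then rho <- o * rho.

    Z, f and f1 are flattened images (lists of floats) of equal length.
    o > 1 is the growth factor; the default 1.1 is an illustrative
    assumption, not a value taken from the paper.
    """
    if o <= 1:
        raise ValueError("the growth factor o must be greater than 1")
    # Element-wise multiplier update using the current penalty rho.
    Z_next = [z + rho * (a - b) for z, a, b in zip(Z, f, f1)]
    # The penalty parameter grows geometrically.
    return Z_next, o * rho


Z_new, rho_new = update_multiplier_and_penalty(
    Z=[0.0, 0.0], f=[1.0, 2.0], f1=[0.5, 2.5], rho=2.0, o=1.5
)
print(Z_new, rho_new)
```

In a full ADMM loop this step would run once per outer iteration, after the updates of Subproblems 1 and 2.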
4 Result
4.1 MR Phantom Data Evaluation at Different Noise Levels
For the experiments, we use the T1 MR phantom from Brainweb and evaluate the recovery performance of the proposed and comparison methods at noise levels of 0.04, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, and 1% of the maximum intensity of the original image. The signal-to-noise ratio (SNR) in decibels (dB) is used to evaluate the reconstruction quality of the methods. All implementations are run in Matlab on a PC with an Intel Xeon E5-2609 2.4 GHz processor and 16 GB of RAM under the Windows 10 operating system. In the phantom study, we used the implementations of LRTV [16], SRR-Dictionary Learning [6], and SRR-AKTV [14] to compare their recovery performance with the proposed method. For all methods, we generated the initial low-resolution (LR) image as a truncated version of the original k-space with a down-sampling factor of 2. The initial high-resolution image is generated from this LR image by bilinear interpolation. The down-sampling and up-sampling operations in the SRR formulation are performed in the image domain by spatially averaging the pixel intensities and by the inverse operation, respectively. Blurring and de-blurring operations are performed as suggested in the SRR literature [16].
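As a rough illustration of the evaluation setup, the sketch below computes an SNR in dB and a factor-2 down-sampling by spatial averaging. The chapter does not spell out its SNR formula, so the common definition 10·log10(signal energy / error energy) is assumed here; the function names `snr_db` and `downsample_by_2` are hypothetical.

```python
import math


def snr_db(reference, reconstruction):
    """SNR in dB between two flattened images (assumed definition).

    Uses 10 * log10(sum(ref^2) / sum((ref - rec)^2)); the paper may use a
    different normalisation.
    """
    signal = sum(r * r for r in reference)
    error = sum((r - x) ** 2 for r, x in zip(reference, reconstruction))
    if error == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(signal / error)


def downsample_by_2(image):
    """Average non-overlapping 2x2 blocks of a 2D list with even dimensions."""
    return [
        [
            (image[r][c] + image[r][c + 1] + image[r + 1][c] + image[r + 1][c + 1]) / 4.0
            for c in range(0, len(image[0]), 2)
        ]
        for r in range(0, len(image), 2)
    ]


print(snr_db([3.0, 4.0], [3.0, 3.0]))
print(downsample_by_2([[1, 3], [5, 7]]))
```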
4.2 Reconstruction Accuracy
To illustrate the improved image quality obtained with the proposed method, we compare it with state-of-the-art methods such as SRR-LRTV and SRR-DL. Figure 1 shows the reconstructed and difference images of the T1 MR phantom recovered from a noisy image with a noise level of 0.2% of the maximum intensity (the inputs are shown in Fig. 2). The parameters used in the experiments include the step size, regularization parameter, and penalty parameter for LRTV optimization; the step size, dictionary size, and sparsity for SRR-DL; and the step size, regularization parameter, and window size for the AKTV-regularizer-based SRR. Although the noise reduction constants in the elongation and rotation parameter calculation for AKTV are used as suggested in [14], we selected the regularization parameter in the optimization heuristically to improve the reconstruction. In the experiments, we found that the algorithm is sensitive to the AKTV regularization parameter: a higher value over-smooths the image, while a lower value leaves staircase artifacts in the reconstructed image. We therefore chose a small regularization value of 0.022 for SRR-AKTV. The proposed method uses the same AKTV parameters and, in addition, requires choosing the parameter for the fractional-order gradient and the penalty parameter for ADMM optimization. For first-order fractional derivatives we chose a value of 1.8417, and for second-order derivatives a value of 2.45.

Figure 1 also reports the SNR of the proposed method and the comparison methods. At this noise level the difference in SNR between the methods is small, but our method preserves fine textural information, whereas LRTV smooths this information out even though it sharpens the edges. Dictionary learning produces compression artifacts at the edges, although it yields sharp edges with textural preservation. As the noise level increases, our method performs consistently, while the state-of-the-art methods degrade, with poor visual quality due to over-smoothing and artifacts. The difference maps in Fig. 1 show that the dictionary learning method and our method have lower intensities in the edge areas, whereas LRTV has high intensities there, which indicates its poor deblurring. AKTV gives a visually similar difference map but does not clearly separate details from noise, whereas our method preserves the fine details and suppresses noise better than AKTV. Figure 3 shows the SNR of all methods as the noise level changes from no noise to 1%. Although LRTV dominates at low noise levels, the proposed method outperforms all the other comparison methods as the noise level increases.

Fig. 1 a Reconstructed HR images obtained with the different SRR methods and the proposed FOAKTV method for one noise level; our method has good SNR performance with textural preservation. b Difference maps between the reconstructed HR images and the ground truth

Fig. 2 Gold-standard HR image used for the SNR calculation, together with the noisy HR image and the observed LR image from which the HR images in Fig. 1 are reconstructed

Fig. 3 Output SNR versus noise level for the phantom data evaluation. The proposed method gives a high SNR with respect to the dictionary learning method and LRTV

Acknowledgements This work was supported in part by the Scientific and Engineering Research Board (SERB) under Grant CRG/2019/002060 and the planning board of the Government of Kerala (GO(Rt)No.101/2017/ITD.GOK(02/05/2017)).
References
1. E. Carmi, S. Liu, N. Alon, A. Fiat, D. Fiat et al., Resolution enhancement in MRI. Magn. Reson. Imaging 24(2), 133–154 (2006)
2. A. Gholipour, J.A. Estroff, S.K. Warfield et al., Robust super-resolution volume reconstruction from slice acquisitions: application to fetal brain MRI. IEEE Trans. Med. Imaging 29(10), 1739–1758 (2010)
3. A.P. Mahmoudzadeh, N.H. Kashou et al., Interpolation-based super-resolution reconstruction: effects of slice thickness. J. Med. Imaging 1(3), 034007 (2014)
4. J.V. Manjón, P. Coupé, A. Buades, V. Fonov, D.L. Collins, M. Robles et al., Non-local MRI upsampling. Med. Image Anal. 14(6), 784–792 (2010)
5. Y. Li, B. Song, J. Guo, X. Du, M. Guizani et al., Super-resolution of brain MRI images using overcomplete dictionaries and nonlocal similarity. IEEE Access 7, 25897–25907 (2019)
6. J. Yang, J. Wright, T.S. Huang, Y. Ma et al., Image super-resolution via sparse representation. IEEE Trans. Image Process. 19(11), 2861–2873 (2010)
7. A.S. Chaudhari et al., Super-resolution musculoskeletal MRI using deep learning. Magn. Reson. Med. 80(5), 2139–2154 (2018)
8. L.I. Rudin, S. Osher, E. Fatemi et al., Nonlinear total variation based noise removal algorithms. Physica D Nonlin. Phenomena 60(1–4), 259–268 (1992)
9. A. Chambolle, V. Caselles, D. Cremers, M. Novaga, T. Pock et al., An introduction to total variation for image analysis. Theor. Found. Num. Methods Sparse Recov. 9(263–340), 227 (2010)
10. T. Chan, A. Marquina, P. Mulet et al., High-order total variation-based image restoration. SIAM J. Sci. Comput. 22(2), 503–516 (2000)
11. M. Aghagolzadeh, A. Segall et al., Kernel smoothing for jagged edge reduction, in 2013 IEEE International Conference on Acoustics, Speech and Signal Processing (IEEE, 2013), pp. 2474–2478
12. H. Liu, Z. Wei et al., An edge-adaptive structure tensor kernel regression for image interpolation, in 2010 2nd International Conference on Future Computer and Communication, vol. 2 (IEEE, 2010), pp. V2-681–V2-685
13. H. Takeda, S. Farsiu, P. Milanfar et al., Kernel regression for image processing and reconstruction. IEEE Trans. Image Process. 16(2), 349–366 (2007)
14. H. Takeda, S. Farsiu, P. Milanfar et al., Deblurring using regularized locally adaptive kernel regression. IEEE Trans. Image Process. 17(4), 550–563 (2008)
15. S. Lefkimmiatis, A. Roussos, P. Maragos, M. Unser et al., Structure tensor total variation. SIAM J. Imaging Sci. 8(2), 1090–1122 (2015)
16. F. Shi, J. Cheng, L. Wang, P.-T. Yap, D. Shen et al., LRTV: MR image super-resolution with low-rank and total variation regularizations. IEEE Trans. Med. Imaging 34(12), 2459–2466 (2015)
17. E.J. Candès, X. Li, Y. Ma, J. Wright et al., Robust principal component analysis? J. ACM (JACM) 58(3), 11 (2011)
18. J.-F. Cai, E.J. Candès, Z. Shen et al., A singular value thresholding algorithm for matrix completion. SIAM J. Optim. 20(4), 1956–1982 (2010)
19. S. Gu, L. Zhang, W. Zuo, X. Feng et al., Weighted nuclear norm minimization with application to image denoising, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2862–2869 (2014)
20. S. Gu, Q. Xie, D. Meng, W. Zuo, X. Feng, L. Zhang et al., Weighted nuclear norm minimization and its applications to low level vision. Int. J. Comput. Vision 121(2), 183–208 (2017)
21. Z. Ren, C. He, Q. Zhang et al., Fractional order total variation regularization for image super-resolution. Signal Process. 93(9), 2408–2421 (Elsevier, 2013)
22. K. Yang, W. Xia, P. Bao, J. Zhou, Y. Zhang et al., Nonlocal weighted nuclear norm minimization based sparse-sampling CT image reconstruction, in IEEE 16th International Symposium on Biomedical Imaging (ISBI 2019) (IEEE, 2019), pp. 1700–1703
Chapter 16
Identification of the Diseased Portions of an Infected Aloe Vera Leaf by Using Image Processing and k-Means Clustering
Sudip Chatterjee and Gour Sundar Mitra Thakur
S. Chatterjee (B) Department of Computer Science and Engineering, Amity University, Kolkata, West Bengal, India
G. S. M. Thakur Department of Information Technology, Dr. B. C. Roy Engineering College, Durgapur, West Bengal, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 R. Tiwari et al. (eds.), Proceedings of International Conference on Computational Intelligence, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-16-3802-2_16

1 Introduction
Since time immemorial, human civilization has used plants not only as food crops but also as remedies or medicines for different diseases [2]. Medicines are produced from innumerable plants. Cinchona, Belladonna, Ephedra, Opium Poppy, Rauwolfia serpentina, Aloe Vera, and many other important plants are used to produce medicines for different human diseases [26]. Aloe Vera is one of the most extensively cultivated medicinal plants around the world, ranging from tropical to temperate regions. Various herbal drinks and drugs are formulated from Aloe Vera to maintain good health. In the cosmetic industry, Aloe Vera is used in the production of soap, shampoo, toothpaste, hair wash and body creams [8]. Aloe Vera gel has also been reported to be very effective for the treatment of wounds and sores, skin disease, colds and coughs, constipation, piles, asthma, ulcer, diabetes and various fungal infections [8–10, 22]. Like other plants, Aloe Vera is also affected by various diseases. Table 1 lists diseases commonly found in Aloe Vera plants, together with their severity [4, 6, 7, 12, 13, 15, 17, 18].

Table 1 Diseases found in Aloe Vera plants

| Name of the disease | Symptoms | Severity |
|---|---|---|
| Alternaria leaf spot | Small, circular or oval-shaped, dark brown necrotic sunken spots on the leaves | Each spot may cover an area of 28 cm diameter. The leaf surface becomes rotten and dried within 4–7 days |
| Aloe rust | This fungus creates black or brown spots on the leaves | Spreads with 5 cm diameter maximum. Damages the leaf |
| Sooty mold | A fungal infection that plays a role in outbreaks of aphids or mealybugs. Aphids and mealybugs are pests that absorb moisture from the juicy leaves of Aloe Vera and leave behind a clear, sticky substance called honeydew | It covers entire leaves and growing tips |
| Basal stem rot | This fungus results in rotting of the stems. Rotten Aloe Vera tissues turn black or reddish brown | Plant dies more quickly |

Observation by experts with the naked eye has been the traditional and primary approach to identifying plant diseases for years [27]. The experts determine the disease by observing its symptoms on the leaves and stems of the Aloe Vera plants. The main disadvantage of this process is that it requires continuous monitoring by experts, which is expensive on large farms. In addition, experts usually charge high consultation fees. Laboratory tests are also performed to identify the diseases. In [6], slides of the fungal part are prepared and observed under a compound microscope, and manuals are used for identification. In [25], direct isolation and dilution or pour plate methods were first applied to isolate the fungi, which were then identified by studying their morphological and cultural characteristics with the help of manuals, monographs and various research papers. Laboratory tests are also expensive and time-consuming.

To reduce the time and cost of disease identification, automated identification methods came into demand, and automation is now applied in practice to detect a disease and predict its type [3, 5, 16]. In [28], the authors collected external reflectance spectra of sample apples and, for each sample apple, scanned the reflectance spectrum of the chilling-injured area. A cloud-based system then uses a support vector machine (SVM) and an artificial neural network (ANN) to classify and predict the data computed from the spectrum samples. In another article [21], the researchers used deep learning, employing a deep neural network that maps the input layer to the output layer over a series of stacked layers of nodes. This network can improve the mapping during the training process. The authors took 54,306 images with 38 classes for training (50% of total) as well as testing (50% of total). The system proved efficient at working with a high volume of data and classes, which makes it more useful and faster. In [1], the authors propose a novel way of combining image processing and machine learning concepts to detect and classify diseases on plant leaves; an ANN is then designed to predict the disease present on a leaf. This leads to a fast, automatic, easy and less expensive mechanism. The authors in [23] used a drone to collect images of plant parts and then used an SVM with a kernel to classify the images. In [27], the authors implemented image processing mechanisms such as Otsu's method and the Sobel operator to grade leaf disease from leaf images. The authors in [5] used SVMs to identify visual manifestations of cotton diseases. Disease detection in plants should offer accuracy, speed and economy [11].

In this article, a model is proposed to detect the diseased portion of an Aloe Vera leaf by applying image processing and the k-means clustering algorithm. In the first phase of the proposed model, k-means clustering is applied to identify the infected leaves, and in the second phase, image feature extraction is applied to identify the diseased portion of the Aloe Vera leaves. The model is expected to enhance the accuracy and speed of disease detection at reduced cost when applied at large scale. The rest of the paper is organized as follows: Sect. 2 briefly explains the k-means clustering algorithm. Section 3 elaborates the design of the proposed model. Section 4 presents the result analysis, and Sect. 5 concludes the discussion.
2 k-Means Clustering Algorithm
Learning is the fundamental basis of machine learning-based systems. A system acquires specific knowledge using one of four learning methodologies: (a) supervised learning, (b) unsupervised learning, (c) semi-supervised learning, and (d) reinforcement learning [20]. There are different algorithms for each learning mechanism. Clustering is based on unsupervised learning. Unlike a supervised learning system, an unsupervised one does not learn from a prepared training data set. It uses unlabeled data: no label or target value is given for the input or output data [14, 20]. Clustering groups similar items together and can be seen as automatic classification. Anything can be clustered, but the more similar the items within a cluster are, the better the cluster is. k-means is a strong clustering approach and a method of vector quantization. It aims to partition a given data set into k clusters, where k is fixed at the beginning. Each data point belongs to the cluster with the nearest mean, termed the cluster center or cluster centroid. The method is called k-means because it generates k unique clusters. The centroids have to be placed carefully, because different initial locations lead to different results [14, 20, 24]. The best option is therefore to place the centroids far away from each other. After this, each point of the data set is associated with the nearest centroid. When all the points have been associated, the first phase is over and an early clustering is done. Now, k new centroids are re-calculated. Once the k new centroids are calculated, a new binding is created between the same data set points and the nearest new centroid. Through this loop, the k centroids change their locations step by step until the change stops; that is, the centroids do not move any further. The k-means objective function (Eq. 1) is one of the most popular clustering objectives.
$$G = \sum_{i=1}^{k} \sum_{x \in C_i} d(x, \mu_i)^2 \tag{1}$$

In k-means, the data are partitioned into disjoint sets $C_1, \ldots, C_k$, where $C_i$ denotes the $i$-th set, $\mu_i$ is its centroid, $x$ is an object in the data set, and $d$ is the Euclidean distance function [19, 24]. The algorithm proceeds as follows:
1. Place k points in the space represented by the objects (here, pixels) to be clustered. These points represent the initial cluster centroids.
2. Assign each object to the cluster with the closest centroid.
3. When all objects have been assigned, re-calculate the positions of the k centroids.
4. Repeat Steps 2 and 3 until the centroids stop moving.
This produces a separation of the objects into groups from which the metric to be minimized can be calculated.
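The four steps above can be sketched in plain Python. This is a minimal illustration rather than the implementation used in the chapter: the farthest-point initialisation in `init_centroids` is an assumption that follows the advice to keep the initial centroids far apart, and all function names are hypothetical.

```python
def squared_distance(p, q):
    """Squared Euclidean distance d(p, q)^2 between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def init_centroids(points, k):
    """Step 1 (assumed strategy): start from the first point, then repeatedly
    add the point farthest from the centroids chosen so far."""
    centroids = [points[0]]
    while len(centroids) < k:
        farthest = max(
            points, key=lambda p: min(squared_distance(p, c) for c in centroids)
        )
        centroids.append(farthest)
    return centroids


def kmeans(points, k, max_iter=100):
    """Return (centroids, labels, objective G of Eq. 1)."""
    centroids = init_centroids(points, k)
    labels = []
    for _ in range(max_iter):
        # Step 2: assign each object to the cluster with the closest centroid.
        labels = [
            min(range(k), key=lambda i: squared_distance(p, centroids[i]))
            for p in points
        ]
        # Step 3: re-calculate each centroid as the mean of its members.
        new_centroids = []
        for i in range(k):
            members = [p for p, lab in zip(points, labels) if lab == i]
            if members:
                new_centroids.append(
                    tuple(sum(col) / len(members) for col in zip(*members))
                )
            else:
                new_centroids.append(centroids[i])  # keep an empty cluster's centroid
        # Step 4: stop when the centroids no longer move.
        if new_centroids == centroids:
            break
        centroids = new_centroids
    objective = sum(squared_distance(p, centroids[lab]) for p, lab in zip(points, labels))
    return centroids, labels, objective


pixels = [(0, 0), (0, 1), (10, 10), (10, 11)]
print(kmeans(pixels, k=2))
```

For leaf images, each point would be one pixel's color vector, and the returned labels would be reshaped back to the image grid.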
3 Design of the Proposed Model
A computer vision-oriented approach consists of several common mechanisms [1]. It begins with image collection using a digital camera. Next, the image is converted to a device-independent format, and image processing techniques are applied to convert the input image into a two-dimensional list of pixels from which useful image features are retrieved. The main issues are manipulating pixels, separating unwanted pixels (which leads to pixel classification) and narrowing the search to the affected area. The step-by-step approach of the proposed model is shown in Fig. 1. In the first step, the RGB images of the leaves are acquired. Examples of such image samples are shown in Fig. 2. In both images (a) and (c), dark spots indicate the presence of disease. In the second step, a color transformation structure (LAB format) is selected for the RGB leaf image as a device-independent color space transformation. Here L is lightness (intensity), and this channel is independent of color; A is the color component ranging from green to magenta, and B is the color component ranging from blue to yellow. The A and B channels encode the colors. In the next step, a linear list of pixels is generated from the row, column and channel data of the original array representing the image. In a device-independent color space, the values used to specify a color produce the same color irrespective of the device used to create it. In a device-dependent color space, the presentation of a color depends on the device used to generate it. RGB is a device-dependent color space. The image is further processed in step 3, and a two-dimensional list representing the pixels is generated to retrieve useful image features. In the fourth step, the k-means function for clustering pixels is initiated. k-means clustering of the dominant pixels is performed, with the number of clusters varied from 3 to 5 in this study. In the fifth step, green pixels are detected and converted to black. In this step, a simple technique, masking, is used to clearly distinguish the green, white and non-green pixels of an infected leaf. To mask the target pixels, the clustered version of the original image is taken. Diseased and non-diseased pixels are colored black and white, respectively. In the last step, an inference process identifies the presence of any disease, and a histogram is displayed to show the computed percentage.

Fig. 1 Design diagram of the proposed leaf disease detection model
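The color transformation and pixel-list steps can be illustrated as follows. The chapter does not state which conversion it uses, so this sketch assumes the standard sRGB to CIE L*a*b* formulas with a D65 white point; the function names `srgb_to_lab` and `image_to_lab_pixel_list` are hypothetical.

```python
def srgb_to_lab(r, g, b):
    """Convert one 8-bit sRGB pixel to CIE L*a*b* (D65 white point, assumed)."""

    def linearize(c):
        # Undo the sRGB gamma curve.
        c = c / 255.0
        return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4

    rl, gl, bl = linearize(r), linearize(g), linearize(b)
    # Linear RGB to CIE XYZ.
    x = 0.4124564 * rl + 0.3575761 * gl + 0.1804375 * bl
    y = 0.2126729 * rl + 0.7151522 * gl + 0.0721750 * bl
    z = 0.0193339 * rl + 0.1191920 * gl + 0.9503041 * bl

    def f(t):
        delta = 6.0 / 29.0
        return t ** (1.0 / 3.0) if t > delta ** 3 else t / (3 * delta ** 2) + 4.0 / 29.0

    # Normalise by the D65 reference white before applying f.
    fx, fy, fz = f(x / 0.95047), f(y / 1.0), f(z / 1.08883)
    lightness = 116.0 * fy - 16.0  # L: lightness
    a = 500.0 * (fx - fy)          # A: green (negative) to magenta (positive)
    b_star = 200.0 * (fy - fz)     # B: blue (negative) to yellow (positive)
    return lightness, a, b_star


def image_to_lab_pixel_list(image):
    """Flatten a rows x cols image of (r, g, b) tuples into a linear LAB list."""
    return [srgb_to_lab(*pixel) for row in image for pixel in row]


leaf = [[(40, 160, 60), (250, 250, 250)], [(60, 40, 20), (90, 60, 40)]]
for lab in image_to_lab_pixel_list(leaf):
    print(tuple(round(v, 2) for v in lab))
```

The resulting list has one entry per pixel in row-major order, which is the form a clustering routine would consume.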
4 Result Analysis
The system takes an image file as input and creates image files as output in its different phases. Finally, it determines whether the input image shows any disease and computes the percentage of affected area. Stepwise results are discussed below.
Fig. 2 Sample images of fungus-infected Aloe Vera leaf: (a) diseased leaf; (b) enlarged photo of the diseased portion of (a); (c) diseased leaf with many disease spots; (d) Aloe rust; (e) sooty mold in Aloe Vera
4.1 Image as Input
Two sets (categories) of leaf images of two Aloe Vera diseases were taken to test the system. Each set consists of 5 images. Two samples of the diseases are shown in Fig. 2. A 2D list of pixels is created from the input image.
4.2 Clustering of Image Pixels
The first output computed is the clustered version of the input image. Comparison with the original images shows that much better results were generated when the number of clusters was 3 or 4. An example of applying the k-means algorithm to images of Aloe Vera plants, with its result, is shown in Fig. 3.
Fig. 3 Clustered images of the leaves in Fig. 2: (a) clustered image of Fig. 2b; (b) clustered image of Fig. 2c
Fig. 4 Clustering of the image in Fig. 2b: (a) number of clusters is 2; (b) number of clusters is 3; (c) number of clusters is 4; (d) number of clusters is 5

Fig. 5 Clustering of the image in Fig. 2c: (a) number of clusters is 2; (b) number of clusters is 3; (c) number of clusters is 4; (d) number of clusters is 5
Figure 4 gives the clustered output when the k-means function is applied to Fig. 2a and b with 2, 3, 4, and 5 clusters. The figures show that the clustered images successfully segregate the diseased and non-diseased portions. The clustered images with 3 or 4 clusters represent the leaf condition most accurately. In practice, they also give better results when searching for disease and estimating the amount of area affected (Fig. 5).
Fig. 6 Masked images of Fig. 3a and b: (a) masked image of Fig. 3a; (b) masked image of Fig. 3b
4.3 Masking
Usually, pixels that are neither green nor white, as well as very dark pixels, correspond to disease on the leaves. Here, green and white pixels are masked to white, so the diseased area of the leaf becomes black (Fig. 6). First, the clustered image is converted into a gray image. The darkest (infected) pixels and the lighter gray (uninfected) pixels are then separated using a binary threshold. The darkest pixels are counted as diseased pixels. The leaf in Fig. 3b is not only badly infected but has also started to dry up, which makes the image a complex one. This is reflected in the masked image of Fig. 6b.
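A masking step of this kind might look like the sketch below. Several details are assumptions made only for illustration, since the chapter does not give them: the luminance weights used for the gray conversion, the threshold value of 80, and the rule that a pixel counts as green when its green channel dominates. The names `to_gray` and `mask_image` are hypothetical.

```python
def to_gray(pixel):
    """Gray value of an (r, g, b) pixel using common luminance weights (assumed)."""
    r, g, b = pixel
    return 0.299 * r + 0.587 * g + 0.114 * b


def mask_image(image, threshold=80):
    """Binary mask: 0 (black) for suspected diseased pixels, 255 (white) otherwise.

    A pixel is kept white if it is green-dominant or brighter than the
    threshold; dark non-green pixels become black. The threshold of 80 is an
    illustrative value, not one reported in the paper.
    """
    mask = []
    for row in image:
        mask_row = []
        for r, g, b in row:
            is_green = g > r and g > b
            is_dark = to_gray((r, g, b)) < threshold
            mask_row.append(0 if (is_dark and not is_green) else 255)
        mask.append(mask_row)
    return mask


clustered = [
    [(30, 120, 40), (250, 250, 250)],  # healthy green, white background
    [(40, 30, 20), (90, 60, 40)],      # two dark brown (suspected diseased) pixels
]
print(mask_image(clustered))
```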
4.4 Calculation of Percentage of the Diseased Portion
The percentage of the diseased portion of the leaves is calculated for Fig. 2b–e. The calculations are done on the masked images obtained after clustering. First, the total number of pixels ($\text{Pixel}_{total}$) is obtained from the masked image. Then the number of diseased pixels, i.e., the black pixels ($\text{Pixel}_{black}$), is obtained from the same masked image. Finally, the percentage is calculated with Eq. 2.

$$\text{Disease}(\%\text{age}) = \frac{\text{Pixel}_{black}}{\text{Pixel}_{total}} \times 100 \tag{2}$$
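Eq. 2 translates directly into code. The sketch below is illustrative only; the function names are hypothetical, and the pixel counts in the usage example are the ones reported for Fig. 2b–e in Table 2.

```python
def percentage_from_counts(black_pixels, total_pixels):
    """Eq. 2: percentage of diseased (black) pixels."""
    if total_pixels <= 0:
        raise ValueError("total_pixels must be positive")
    return 100.0 * black_pixels / total_pixels


def diseased_percentage(mask):
    """Apply Eq. 2 to a binary mask (2D list) where 0 marks a diseased pixel."""
    total = sum(len(row) for row in mask)
    black = sum(1 for row in mask for value in row if value == 0)
    return percentage_from_counts(black, total)


# (diseased pixels, total pixels) as listed in Table 2.
table2_counts = {
    "Fig. 2b": (4160, 28810),
    "Fig. 2c": (8486, 33800),
    "Fig. 2d": (2551, 16275),
    "Fig. 2e": (16265, 55944),
}
for name, (black, total) in table2_counts.items():
    print(name, round(percentage_from_counts(black, total), 2))
```

Rounded to two decimals, these counts give 14.44, 25.11, 15.67 and 29.07, matching the percentages listed in Table 2.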
Manual calculation of the diseased area of a leaf may be accomplished by placing the image of the leaf on graph paper. The smaller the diseased area, the easier the calculation. In some cases, however, a leaf contains so many diseased areas, or the diseased areas are so small, that calculating the total diseased area using graph paper becomes infeasible. Table 2 shows the calculated diseased area percentage of each leaf for the five diseases. The automated system is capable of calculating all five types of diseased sample leaves, but manual calculation could not be handled for the leaves in Fig. 2c and e. The histogram obtained from the automated system also clearly displays the amounts of diseased and non-diseased portions.
Table 2 Calculation of diseased area percentage and comparison with manual calculation

| Input image sample | Histogram | Percentage calculation of the diseased portion | Manual observation (using graph paper) |
|---|---|---|---|
| Fig. 2b | (histogram image) | Total number of pixels found: 28,810. Total number of pixels in diseased area: 4160. Percentage of diseased area: 14.44 | Sample area: 1582 mm². Diseased area: 96.51 mm². Percentage of diseased area: 6.1 |
| Fig. 2c | (histogram image) | Total number of pixels found: 33,800. Total number of pixels in diseased area: 8486. Percentage of diseased area: 25.11 | Sample area: 12,993 mm². Diseased area: many spots found, so calculation of the diseased area becomes infeasible |
| Fig. 2d | (histogram image) | Total number of pixels found: 16,275. Total number of pixels in diseased area: 2551. Percentage of diseased area: 15.67 | Sample area: 354.30 mm². Diseased area: 26.19 mm². Percentage of diseased area: 7.39 |
| Fig. 2e | (histogram image) | Total number of pixels found: 55,944. Total number of pixels in diseased area: 16,265. Percentage of diseased area: 29.07 | Sample area: 1683.14 mm². Diseased area: many spots found, so calculation of the diseased area becomes infeasible |
5 Conclusion
In the context of smart or precision farming, detecting the presence of diseases is important, but predicting the disease and grading the plant on that basis (i.e., by quality) are also inherent requirements. For this purpose, different AI-based models have been used over the last few decades. Such applications not only automate the whole process but also provide highly precise results at relatively lower operating cost. In this article, a simple and highly cost-effective methodology is proposed in which image processing and k-means clustering are applied to identify the diseased portion of Aloe Vera leaves. The percentage of the diseased portion is also calculated automatically for possible gradation of the leaves. Although the model is applied here to disease detection in Aloe Vera leaves, it can also be applied to the leaves of other plants.
References
1. H. Al-Hiary, S. Bani-Ahmad, M. Reyalat, M. Braik, Z. Alrahamneh, Fast and accurate detection and classification of plant diseases. Int. J. Comp. Appl. 17(1), 31–38 (2011)
2. A.G. Atanasov, B. Waltenberger, E.M. Pferschy-Wenzig, T. Linder, C. Wawrosch, P. Uhrin, V. Temml, L. Wang, S. Schwaiger, E.H. Heiss et al., Discovery and resupply of pharmacologically active plant-derived natural products: a review. Biotechnol. Adv. 33(8), 1582–1614 (2015)
3. M.P. Babu, B.S. Rao et al., Leaves Recognition Using Back Propagation Neural Network-Advice for Pest and Disease Control on Crops (Expert Advisory System, IndiaKisan Net, 2007)
4. R. Bajwa, I. Mukhtar, S. Mushtaq, New report of alternaria alternata causing leaf spot of aloe vera in Pakistan. Canad. J. Plant Pathol. 32(4), 490–492 (2010)
5. A. Camargo, J. Smith, An image-processing based algorithm to automatically identify plant disease visual symptoms. Biosys. Eng. 102(1), 9–21 (2009)
6. S. Chavan, S. Korekar, A survey of some medicinal plants for fungal diseases from osmanabad district of maharashtra state. Recent Res. Sci. Technol. 3(5), (2011)
7. W. Da Silva, R. Singh, First report of alternaria alternata causing leaf spot on aloe vera in Louisiana. Plant Disease 96(9), 1379–1379 (2012)
8. T. Daodu, Aloe Vera, the Miracle Healing Plant (Health Field Corporation, Lagos, 2000), p. 36
9. R.H. Davis, N.P. Maro, Aloe vera and gibberellin. anti-inflammatory activity in diabetes. J. Am. Podiatric Med. Assoc. 79(1), 24–26 (1989)
10. A. Djeraba, P. Quere, In vivo macrophage activation in chickens with acemannan, a complex carbohydrate extracted from aloe vera. Int. J. Immunopharmacol. 22(5), 365–372 (2000)
11. F. Garcia-Ruiz, S. Sankaran, J.M. Maja, W.S. Lee, J. Rasmussen, R. Ehsani, Comparison of two aerial imaging platforms for identification of huanglongbing-infected citrus trees. Comput. Electron. Agric. 91, 106–115 (2013)
12. S.K. Ghosh, S. Banerjee, First report of alternaria brassicae leaf spot disease of aloe vera and its disease intensity in west bengal. Eur. J. Biotech. Biosci. 2(1), 37–43 (2014)
13. S.K. Ghosh, S. Banerjee, S. Pal, N. Chakraborty, Encountering epidemic effects of leaf spot disease (alternaria brassicae) on aloe vera by fungal biocontrol agents in agrifields-an ecofriendly approach. PloS one 13(3), (2018)
14. P. Harrington, Machine Learning in Action (Manning Publications Co., 2012)
15. R. Heim et al., Appearance of aloe rust in madagascar. Revue de Botanique Appl. 20(223), (1940)
16. C. Hillnhuetter, A.K. Mahlein, Early detection and localisation of sugar beet diseases: new approaches. Gesunde Pflanzen 60(4), 143–149 (2008)
17. S. Jaya, G. Saurabh, A. Bharti, I. Kori et al., Biological management of sootymold disease on butea monosperma (palash) at Jabalpur. Online Int. Interdisc. Res. J. 4(Special Issue March), 189–195 (2014)
18. A. Kamalakannan, C. Gopalakrishnan, R. Renuka, K. Kalpana, D.L. Lakshmi, V. Valluvaparidasan, First report of alternaria alternata causing leaf spot on aloe barbadensis in India. Australas. Plant Dis. Notes 3(1), 110–111 (2008)
19. J. MacQueen et al., Some methods for classification and analysis of multivariate observations, in Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, vol. 1 (CA, USA, Oakland, 1967), pp. 281–297
20. M. Mohammed, M.B. Khan, E.B.M. Bashier, Machine Learning: Algorithms and Applications (Crc Press, 2016)
21. S.P. Mohanty, D.P. Hughes, M. Salathé, Using deep learning for image-based plant disease detection. Front. Plant Sci. 7, 1419 (2016)
22. A. Olusegun, One Hundred Medicinal Uses of Aloe Vera (Good health Inc, Lagos, 2000), p. 76
23. T. Rumpf, A.K. Mahlein, U. Steiner, E.C. Oerke, H.W. Dehne, L. Plümer, Early detection and classification of plant diseases with support vector machines based on hyperspectral reflectance. Comput. Electron. Agric. 74(1), 91–99 (2010)
24. S. Shalev-Shwartz, S. Ben-David, Understanding Machine Learning: From Theory to Algorithms (Cambridge University Press, 2014)
25. J. Singh, S. Gupta, P. Mishra, I.P. Kori (????) Screening of bio-control agent for the eco-friendly management of fungal diseases of aloe vera
26. J. Sumner et al., The Natural History of Medicinal Plants (Timber Press, 2000)
27. S. Weizheng, W. Yachun, C. Zhanliang, W. Hongda, Grading method of leaf spot disease based on image processing, in 2008 International Conference on Computer Science and Software Engineering, vol. 6 (IEEE, 2008), pp. 491–494
28. Y. Yang, H. Cao, C. Han, D. Ge, W. Zhang et al., Visible-near infrared spectrum-based classification of apple chilling injury on cloud computing platform. Comput. Electron. Agric. 145, 27–34 (2018)
Chapter 17
Restoration of Deteriorated Line and Color Inpainting
M. Sridevi, Shaik Naseem, Anjali Gupta, and M. B. Surya Chaitanya
1 Introduction
Line drawings and color drawings are simple ways of representing an artist's views. They play an essential role in preliminary sketches and are widely used in animation, manga, design and cartoon drafts. Any drawing on paper ages over time due to humidity, ink leakage, ink spots and superimposition from other drawings. Hence, there is a need to digitize age-old drawings, but this usually requires manual tracing and is expensive to achieve at large scale. Our work aims to restore drawings with good-quality output, inpainting the missing gaps and filling the colors in patches of the image in a fully automated way, while avoiding the patches used by existing methods [1]. Image restoration is a complex function of a 2D matrix: no simple mathematical formula distinguishes a deteriorated image from a restored one. There is no benchmark or evaluation that quantitatively specifies the exact amount of restoration needed for a drawing. There will always be disagreement about which restoration is better, and since the appearance of the original image is unknown, qualitative comparison is the only way to tell which restoration algorithm performs better. Because the problem is so abstract, the dataset should be varied and diverse, covering a wide variety of deteriorations. Creating a close-to-real-world dataset for restoration is therefore a very time-consuming and difficult job. Hence, our model focuses on enhancing the line extraction method to improve the restoration and avoids the cost of manually creating a dataset for line extraction, as done in the existing model [2].
M. Sridevi · S. Naseem (B) · A. Gupta · M. B. S. Chaitanya Department of Computer Science and Engineering, National Institute of Technology, Tiruchirappalli, Tamil Nadu, India M. Sridevi e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 R. Tiwari et al. (eds.), Proceedings of International Conference on Computational Intelligence, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-16-3802-2_17
The proposed approach performs line extraction with smoothing, restoration of lines using a convolutional neural network, and color inpainting of line drawings on images such as cartoons and manga, without losing essential information in the sharp edges, corners and complex curves, which was a limitation of the existing methods [2, 3]. Thus, it will help to restore age-old drawings such as those of Da Vinci and to digitize comics and manga. The rest of the paper is organized as follows. Section 2 discusses related work. Section 3 explains the proposed method and the algorithms used at its different steps. Section 4 analyzes the different methods and presents the results of a user survey conducted against the related work. Section 5 concludes the paper, followed by the references.
2 Related Work
This section discusses existing work on line extraction and color inpainting methods. Paper [4] presents a performance comparison, analysis and applications of traditional line extraction methods. The widely used line extraction techniques are the Hough transform, IEPF, the RANSAC algorithm, the EM algorithm and line regression. Deep learning techniques have also recently been applied to extract lines. A fully convolutional neural network is used to restore deteriorated line drawings by detecting gaps and completing them with the correct thickness and curvature [5]. This gives better results than the traditional line extraction methods. However, a fully convolutional network has a higher time complexity and could not be trained on bigger datasets, and hence it could not inpaint complex curves. The time complexity of a neural network with m training examples, t epochs and l layers, each containing n neurons, is O(m·t·l·n²), which grows as the number of training examples and iterations increases. The time complexity of the Hough transform is O(S·C·(N + R)), where S is the number of line segments extracted by the algorithm, C and R are the numbers of rows and columns, respectively, of the HT accumulator array, and N is the number of points in the input scan, as explained in paper [4]. The probabilistic Hough transform reduces the complexity of the voting stage from O(M·N) to O(m·N), where m is the number of edge points chosen from the M edge points. We therefore chose the probabilistic Hough transform, which is widely used in computer vision, image analysis and digital image processing; it uses a voting procedure carried out in parameter space. The existing color inpainting models fall into two categories: CNN-based and traditional methods [6]. Traditional color inpainting models use the nearest neighboring pixels to inpaint the deteriorated pixels, whereas neural networks are trained on a dataset of deteriorated and inpainted color images.
When a trained CNN model is fed a deteriorated color line drawing, it outputs the inpainted color line drawing. Deep learning techniques have shown good results for filling large missing regions in the input color line drawing. CNNs output images with reasonable structure and texture [7], but they generally create blurry textures and distorted structures which are inconsistent with the surrounding areas. This is mainly because a CNN cannot extract pixel information from distant locations, and its efficiency depends solely on the dataset used for training. Traditional methods include patch-based approaches [1], which use variational algorithms and patch similarity and propagate information from the intact part of the image to the damaged part [8, 9]. These methods work well for static images but have limited efficiency for non-static images such as natural images. However, traditional methods involve dense computation and expensive operations, which prohibits practical applications.
3 Proposed Method
Our line drawing restoration uses three modules instead of a single network [2, 10, 11]: a line extraction module that extracts the lines and partially fills the gaps, a restoration module that uses a convolutional neural network to fill the gaps in the line drawing with the exact thickness, and a color inpainting module that fills patches of degraded color. The block diagram is shown in Fig. 1. The input is the grayscale image of the deteriorated line drawing, which is sent to the line extraction algorithm to extract the lines. The extracted lines are then sent to the trained convolutional neural network to restore the lines, and this output, along with the original image, is sent to the color inpainting module to inpaint the image as close as possible to the ground truth. The dataset used to train the restoration convolutional neural network is the augmented data of the Da Vinci [12] and Walt Disney [13] datasets. The existing dataset goes through image deterioration and data augmentation to increase its size. The image deterioration techniques used are adding background texture, fading images and introducing random holes. For data augmentation, we used scaling, translation, rotation, flipping and perspective transformation.
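The deterioration step can be pictured with a small sketch. The following is only an illustration under stated assumptions, not the authors' pipeline: images are plain lists of grayscale rows (0 = ink, 255 = paper), and the function names `fade`, `punch_holes`, `flip_horizontal` and `make_pair`, as well as all default values, are hypothetical.

```python
import random

def fade(img, strength=0.2, paper=255):
    """Blend every pixel toward the paper color to imitate faded ink."""
    return [[round(p + (paper - p) * strength) for p in row] for row in img]

def punch_holes(img, n_holes=3, size=2, paper=255, seed=0):
    """Overwrite a few random squares with the paper color (random holes)."""
    rng = random.Random(seed)
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]
    for _ in range(n_holes):
        r0, c0 = rng.randrange(h), rng.randrange(w)
        for r in range(r0, min(r0 + size, h)):
            for c in range(c0, min(c0 + size, w)):
                out[r][c] = paper
    return out

def flip_horizontal(img):
    """One of the augmentations named in the text (flipping)."""
    return [row[::-1] for row in img]

def make_pair(clean, seed=0):
    """Return (deteriorated input, clean target) for supervised training."""
    return punch_holes(fade(clean), seed=seed), clean

clean = [[0] * 8 for _ in range(8)]          # an all-ink toy image
damaged, target = make_pair(clean)
```

Geometric augmentations such as flipping would have to be applied identically to the input and the target so that the pair stays aligned.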
Fig. 1 Block diagram of the proposed model
3.1 Line Extraction
Line extraction identifies the collection of lines in the given input. The input fed to the line extraction algorithm is a collection of pre-processed sensor readings. Preprocessing applies data processing techniques to remove outliers that would decrease the quality of the line extraction output. The output is post-processed by merging some lines and discarding others. The existing traditional line extraction algorithms [4] have been modified so that they work well on deteriorated input images. The algorithm is fed the grayscale image, which passes through three stages: edge detection, line extraction and line smoothing. Sections 3.1.1–3.1.3 explain these stages.
3.1.1 Edge Detection
The Canny method is used to extract the edges. It is a multi-stage algorithm that detects edges in the given input image through an edge detection operator. It takes four input parameters: the input image, a high threshold, a low threshold and the aperture size. The best high threshold varies from image to image, so Otsu's method is used to determine it for each image.
Otsu's thresholding: This is a global thresholding technique that searches for the threshold using the gray-level histogram of the image. It finds the threshold that maximizes the weighted between-class variance of the segmented classes, since minimizing the within-class variance is computationally more expensive than maximizing the between-class variance.
Canny edge detection: Using the Otsu parameters generated for the given input image, it extracts the edges through the following steps:
1. It removes noise by applying a Gaussian filter.
2. It applies non-maximum suppression and removes spurious responses.
3. Potential edges are determined by applying the double threshold.
4. The weak edges and the edges that are not connected to strong edges are suppressed.
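The Otsu search described above can be sketched in a few lines. This is a minimal pure-Python illustration, assuming 8-bit gray levels passed as a flat list; the function names and the `low_ratio` heuristic for deriving a low Canny threshold are assumptions and do not come from the paper, which only states that the high threshold is taken from Otsu's method.

```python
def otsu_threshold(pixels, levels=256):
    """Return the gray level t that maximizes the between-class variance
    when pixels <= t form one class and pixels > t the other."""
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))
    best_t, best_var = 0, -1.0
    w_low, sum_low = 0, 0
    for t in range(levels):
        w_low += hist[t]
        if w_low == 0:
            continue                      # lower class still empty
        w_high = total - w_low
        if w_high == 0:
            break                         # upper class empty from here on
        sum_low += t * hist[t]
        mean_low = sum_low / w_low
        mean_high = (sum_all - sum_low) / w_high
        # proportional to the weighted between-class variance
        between = w_low * w_high * (mean_low - mean_high) ** 2
        if between > best_var:
            best_var, best_t = between, t
    return best_t

def canny_thresholds(pixels, low_ratio=0.5):
    """Hypothetical helper: high threshold from Otsu, low as a fixed fraction."""
    high = otsu_threshold(pixels)
    return low_ratio * high, high

low, high = canny_thresholds([40] * 60 + [180] * 40)
```

The two values would then be handed to whichever Canny implementation is used, together with the image and the aperture size.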
3.1.2 Probabilistic Hough Line Extraction
Lines are extracted by applying the probabilistic Hough transform to the extracted edges. The probabilistic Hough line extraction method uses random sampling of the edge points; it differs from the standard Hough line extraction method in the way image space is mapped to parameter space. The probabilistic transform reduces the complexity of the voting stage from O(M·N) to O(m·N), where m is the number of edge points chosen from the M edge points. The method takes the minimum line length and the maximum line gap as parameters, along with the edges fed from the Canny method. The minimum line length is the shortest allowed length of a line; lines shorter than this are rejected. The maximum line gap is the maximum number of pixels allowed between two lines for them to be treated as a single line segment. The minimum line length is set to one so as not to miss any small extracted line segments, as the input is a deteriorated image. The maximum line gap is also set to one; this extracts complex curves and sharp edges efficiently as multiple individual pixels and prevents them from being replaced by an approximated curve.
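The saving in the voting stage can be illustrated with a deliberately simplified sketch. It is not the progressive probabilistic Hough transform found in image libraries and it ignores the minimum line length and maximum line gap parameters; it only shows that voting with m sampled points instead of all M points can still recover the dominant line. The function name `hough_peak` and its parameters are hypothetical.

```python
import math
import random

def hough_peak(edge_points, n_theta=180, sample_size=None, seed=0):
    """Vote in (rho, theta) space and return the strongest cell as
    (rho in pixels, theta in degrees). With sample_size=m, only m of the
    M edge points vote, so the cost drops from M*n_theta to m*n_theta."""
    points = list(edge_points)
    if sample_size is not None and sample_size < len(points):
        points = random.Random(seed).sample(points, sample_size)
    votes = {}
    for x, y in points:
        for k in range(n_theta):
            theta = math.pi * k / n_theta
            rho = round(x * math.cos(theta) + y * math.sin(theta))
            votes[(rho, k)] = votes.get((rho, k), 0) + 1
    (rho, k), _ = max(votes.items(), key=lambda item: item[1])
    return rho, k * 180 // n_theta

# 50 edge points on the vertical line x = 5
vertical = [(5, y) for y in range(50)]
full = hough_peak(vertical)                      # all M = 50 points vote
sampled = hough_peak(vertical, sample_size=20)   # only m = 20 points vote
```

On this toy input both calls return the same cell, while the sampled call performs 20 × 180 instead of 50 × 180 votes.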
3.1.3 Line Smoothing
With these minimum line length and maximum line gap values, the extracted line drawing consists of multiple individual pixels that capture all complex curves and sharp edges. An example output of the probabilistic Hough line extraction, with extracted sharp edges and a pixelated appearance, is given in Fig. 2. Dilation adds pixels in the specified directions using the dilation structuring element, which fills the gaps. Erosion removes pixels using the erosion structuring element, which reduces the thickness of the lines. Figure 2 illustrates the extracted line image before and after smoothing. These sequential dilations and erosions restore some parts of the image before it is fed to the network. Figure 4 compares the output of the existing line extraction with that of our model. Our line extraction model extracts complex curves well, as depicted in Fig. 5. A new dilation structuring element, which inpaints the line gaps, is suggested based on empirical results obtained by considering many curves and the direction of neighboring pixels covering complex shapes. It considers 2 pixels diagonally, 1 pixel horizontally and 1 pixel vertically in both directions. Figure 3 illustrates the dilation structuring element. After applying dilation and erosion, a partially inpainted, smoothed line drawing is obtained. The overall algorithm of line extraction followed by smoothing is given in Algorithm 1.
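A small sketch may help to see how one dilation followed by two erosions closes a gap. The structuring element below is only one reading of the verbal description (reach 2 along the diagonals, reach 1 horizontally and vertically); the actual element is the one in Fig. 3, and the 3 × 3 "circle" used for erosion is assumed here to be a cross. Images are binary lists of rows with 1 for line pixels.

```python
def dilation_element():
    """Offsets (dr, dc): centre, 4-neighbours at reach 1, diagonals up to reach 2."""
    offsets = {(0, 0), (0, 1), (0, -1), (1, 0), (-1, 0)}
    for d in (1, 2):
        offsets |= {(d, d), (d, -d), (-d, d), (-d, -d)}
    return sorted(offsets)

CROSS_3X3 = [(0, 0), (0, 1), (0, -1), (1, 0), (-1, 0)]  # assumed 3x3 "circle"

def dilate(img, offsets):
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            if img[r][c]:
                for dr, dc in offsets:
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < h and 0 <= cc < w:
                        out[rr][cc] = 1
    return out

def erode(img, offsets):
    """A pixel survives only if every offset lands on a set pixel;
    positions outside the image count as background."""
    h, w = len(img), len(img[0])
    def is_set(r, c):
        return 0 <= r < h and 0 <= c < w and img[r][c] == 1
    return [[1 if all(is_set(r + dr, c + dc) for dr, dc in offsets) else 0
             for c in range(w)] for r in range(h)]

def smooth(img):
    """Dilation once, erosion twice, as listed in Algorithm 1."""
    return erode(erode(dilate(img, dilation_element()), CROSS_3X3), CROSS_3X3)

# a horizontal line in row 3 with a one-pixel gap at column 4
line = [[0] * 9 for _ in range(7)]
for col in (1, 2, 3, 5, 6, 7):
    line[3][col] = 1
smoothed = smooth(line)
```

In this toy case the gap pixel is set after dilation and remains set after both erosions.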
Fig. 2 Line extraction before and after smoothening
Fig. 3 Dilation structuring element
Fig. 4 Line extraction of existing model (left) and our model (right)
Fig. 5 Line extraction on Walt Disney image (left), Probabilistic Hough transform (middle), Proposed model after smoothing (right)
Algorithm 1 - Ingenious line extraction technique
Input: Grayscale image of the color line drawing
Output: Image with the lines extracted from the input, partially inpainted
Edge detection:
  set parameters = default values based on experimental results
  if Otsu parameters > threshold then:
    set parameters = Otsu parameters
  Extract edges using Canny for the derived parameters
Line extraction:
  set the minimum allowed length of the line to 1
  set the maximum allowed gap between lines to 1
  Extract the lines using the above parameters
  foreach line in lines do
    set X_coordinates = get_coordinates(line);
    set Y_coordinates = get_coordinates(line);
    Draw the line using the coordinates
    Append the line to the result image
Line smoothing:
  set the dilation structuring element to the Fig. 3 value
  set the erosion structuring element to a circle of 3x3 dimension
  Apply dilation once on the extracted line drawing
  Apply erosion twice on the extracted line drawing
3.2 Restoration Model
This is the second stage of the model, which inpaints the line drawings of the deteriorated sketches or paintings. The input to this model consists of the extracted lines obtained from the line extraction model along with the real input image. The expected output is the inpainted line drawing, in which the gaps present in the input image are filled. The model consists of a CNN that needs to be trained on the dataset.
3.2.1 Model Architecture
The proposed model uses a CNN for restoration. Different numbers of convolutional and upsampling layers were tried, out of which the combination of 15 convolutional layers and three upsampling layers gave the best performance and was therefore chosen as the architecture of the restoration model. The first 14 convolutional layers use ReLU as the activation, and the last one uses the hyperbolic tangent activation to keep the output values in the range [−1, 1]. The three upsampling layers are implemented by the deconvolution method and each upsamples the image by a factor of 2, so that the output has the same size as the input. The input to this network consists of two channels: one is the grayscale input image, and the other is the output of the line extraction model, which contains the deteriorated line drawing of the input. The network first downsamples the input image to 1/64 of its original size and then restores the original size by upsampling; each takes three steps. The first convolutional layer uses a kernel of size 5 × 5 while the other 14 convolutional layers use 3 × 3 kernels. The strides used in each convolutional layer are shown in Table 1. This model uses a nearest neighbor layer as an upsampling layer. The output size of each layer is also given in Table 1. The final output has one channel and the same size as the input image. The images fed to the network have a fixed size of 256 × 256.

Table 1 The CNN model architecture of proposed model
Layer type | Kernel size | Stride | Output size
Input | − | − | 2 × H × W
Convolution | 5 × 5 | 2 × 2 | 32 × H × W
Convolution | 5 × 5 | 2 × 2 | 64 × H/2 × W/2
Convolution | 3 × 3 | 1 × 1 | 128 × H/4 × W/4
Convolution | 3 × 3 | 2 × 2 | 256 × H/8 × W/8
Convolution | 3 × 3 | 1 × 1 | 512 × H/8 × W/8
Convolution | 3 × 3 | 1 × 1 | 512 × H/8 × W/8
Convolution | 3 × 3 | 1 × 1 | 256 × H/8 × W/8
Convolution | 3 × 3 | 1 × 1 | 128 × H/8 × W/8
Upsampling deconvolution | − | − | 128 × H/4 × W/4
Convolution | 3 × 3 | 1 × 1 | 128 × H/4 × W/4
Convolution | 3 × 3 | 1 × 1 | 64 × H/4 × W/4
Upsampling deconvolution | − | − | 64 × H/2 × W/2
Convolution | 3 × 3 | 1 × 1 | 64 × H/2 × W/2
Convolution | 3 × 3 | 1 × 1 | 32 × H/2 × W/2
Upsampling deconvolution | − | − | 32 × H × W
Convolution | 3 × 3 | 1 × 1 | 16 × H × W
Convolution | 3 × 3 | 1 × 1 | 8 × H × W
Convolution | 3 × 3 | 1 × 1 | 1 × H × W
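The statement that the network shrinks the image to 1/64 of its size and then restores it can be checked with a short walk-through. The sketch below tracks only the spatial side length, assumes padding that preserves size at stride 1, and reads the layer order and strides from the Stride column of Table 1; the names `LAYERS` and `spatial_sizes` are hypothetical, and the intermediate sizes are derived from the strides alone.

```python
# (kind, factor) per layer: stride for "conv", upsampling factor for "up"
LAYERS = (
    [("conv", 2), ("conv", 2), ("conv", 1), ("conv", 2)]
    + [("conv", 1)] * 4
    + [("up", 2), ("conv", 1), ("conv", 1)]
    + [("up", 2), ("conv", 1), ("conv", 1)]
    + [("up", 2), ("conv", 1), ("conv", 1), ("conv", 1)]
)

def spatial_sizes(side, layers=LAYERS):
    """Side length of the feature map after the input and after every layer."""
    sizes = [side]
    for kind, factor in layers:
        side = side * factor if kind == "up" else side // factor
        sizes.append(side)
    return sizes

sizes = spatial_sizes(256)
smallest = min(sizes)                      # side length at the bottleneck
area_ratio = (256 // smallest) ** 2        # how many times fewer pixels
```

For a 256 × 256 input the three stride-2 convolutions give a 32 × 32 bottleneck, that is 1/8 per side or 1/64 of the pixels, and the three upsampling layers bring the side back to 256.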
3.2.2 Training Process
The network is trained so that the MSE between the target image and the output image is as small as possible. The objective of the model is therefore to reduce the loss, denoted as:
$$\mathrm{Loss} = \arg\min_{\theta} \left| F(x; \theta) - y^{*} \right| \tag{1}$$
Here θ denotes the parameters of the model F, y* denotes the target image and x denotes the corresponding input image. ADADELTA, a variant of gradient descent, is used for training; it makes the total loss of the model converge in less time. Training also uses batch normalization. The model is trained for 43,000 epochs, a number chosen based on the validation error. The input comprises the grayscale input image and the line-extracted image from the line extraction model, and the layers are trained to fill the gaps present in the input image by reducing the MSE loss between the output of the model and the target image. The input and the obtained output of the restoration model are shown in Fig. 6.
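To make the training objective concrete, the sketch below computes an MSE loss and minimizes it with the standard ADADELTA update rule on a toy one-parameter model F(x; θ) = θ·x. The toy model, the data and the hyperparameter values (ρ = 0.95, ε = 1e-6, 500 steps) are illustrative assumptions and are unrelated to the settings used for the restoration network.

```python
import math

def mse(output, target):
    """Mean squared error between two equally long sequences."""
    return sum((o - t) ** 2 for o, t in zip(output, target)) / len(target)

def adadelta_minimize(grad, theta, steps=500, rho=0.95, eps=1e-6):
    """ADADELTA: scale each step by the ratio of running RMS values of
    past steps and past gradients, so no learning rate has to be set."""
    avg_sq_grad = 0.0
    avg_sq_step = 0.0
    for _ in range(steps):
        g = grad(theta)
        avg_sq_grad = rho * avg_sq_grad + (1 - rho) * g * g
        step = -math.sqrt(avg_sq_step + eps) / math.sqrt(avg_sq_grad + eps) * g
        avg_sq_step = rho * avg_sq_step + (1 - rho) * step * step
        theta += step
    return theta

# toy data: the target is exactly 2 * x
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]

def loss(theta):
    return mse([theta * x for x in xs], ys)

def loss_grad(theta):
    # derivative of the MSE with respect to theta
    return 2.0 * sum((theta * x - y) * x for x, y in zip(xs, ys)) / len(xs)

initial_loss = loss(0.0)
final_loss = loss(adadelta_minimize(loss_grad, 0.0))
```

The first steps are very small because the running average of past steps starts at zero, which is a known property of this update rule.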
3.3 Color Inpainting
With time, drawings fade: whites become warm and blacks become gray. The colors fade and turn into different shades, making the picture look like an antique rather than representing anything real. Deterioration of color line drawings can be caused by colors fading with time, pixelated noise, external damage and spillage of oil or water. The proposed method deals with colored line drawings from the Walt Disney and cartoon image datasets. The proposed algorithm traverses all the pixels, uses the neighboring pixel values to find defective pixels, and corrects each one using the average of the correct neighboring pixel values. The model also takes care not to use the border lines when inpainting a damaged pixel.
Fig. 6 The input and output of restoration model
The proposed method preprocesses the image before iterating through the pixels to inpaint it.
3.3.1 Preprocessing Image
The proposed algorithm finds the dimensions of the image and superimposes the actual colored drawing and the restored line drawing. It then traverses the entire image and identifies and marks the black and corner pixels. When finding or correcting defective pixels, only the neighboring pixels that are not borders (non-black pixels) are considered; this marking is used to decide which pixels to consider.
3.3.2 Colored Line Drawing Inpainting
We traverse the image and, for each pixel:
1. Find the valid neighboring pixels. We remove the black pixels and the pixels that differ only slightly from black: all neighboring pixels whose (R, G, B) values are less than (15, 15, 15) are ignored.
2. Detect whether the current pixel is defective by computing its average difference from the valid neighboring pixels and comparing it with our defined threshold.
3. If the pixel is defective, replace it with the most repeated pixel value. When finding the most repeated pixel, our model treats pixels with a difference of less than 5 as the same. For example, RGB (12, 1, 247) and (12, 1, 243) are considered the same. The threshold difference to consider as similar pixels is set to 25 based on empirical observations.
We repeat the above steps 10 times; hence, the method can cover a patch of 20 px. The algorithm is given step by step in Algorithm 2. Sample input and output images of the color inpainting model are given in Fig. 7.
Fig. 7 Input and output image of color inpainting model
Algorithm 2: Inpainting the color line drawing with automatic patch detection
Input1: Inpainted line drawing from restoration CNN
Input2: Deteriorated color line drawing
Output: Inpainted color line drawing
a. Image preprocessing:
  set image_to_process as superimposed image of Input1 and Input2
  foreach pixel:
    if R < 15 and G < 15 and B < 15 then:
      Mark this pixel as black pixel
    if is_pixel_corner() is true then:
      Mark it as corner pixel
b. Colored line drawing inpainting:
  set iterations = 0
  set channels = [R,G,B]
  while iterations < 10:
    foreach pixel:
      foreach channels of the pixel:
        set valid_neighbors = all non border pixels from the 8 neighbors
        if pixel defective:
          set pixel_value = updated value
    set iterations = iterations + 1
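The inpainting loop of Algorithm 2 can be sketched as follows. This is an illustrative reading rather than the authors' code: images are lists of rows of (R, G, B) tuples, whole pixels are compared through their largest channel difference instead of looping over channels, the value 25 is taken as the defect threshold, and corner marking is omitted. All function and constant names are hypothetical.

```python
BLACK_LIMIT = 15       # channels below this mark a border (black) pixel
SAME_TOLERANCE = 5     # pixels closer than this count as the same color
DEFECT_THRESHOLD = 25  # assumed: average difference above this is a defect

def is_black(pixel):
    return all(channel < BLACK_LIMIT for channel in pixel)

def color_diff(a, b):
    return max(abs(x - y) for x, y in zip(a, b))

def valid_neighbors(img, r, c):
    """The non-border pixels among the 8 neighbours of (r, c)."""
    h, w = len(img), len(img[0])
    found = []
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            rr, cc = r + dr, c + dc
            if (dr or dc) and 0 <= rr < h and 0 <= cc < w \
                    and not is_black(img[rr][cc]):
                found.append(img[rr][cc])
    return found

def most_repeated(pixels):
    """Group near-identical colors and return the largest group's color."""
    groups = []                       # [representative color, count]
    for p in pixels:
        for group in groups:
            if color_diff(p, group[0]) < SAME_TOLERANCE:
                group[1] += 1
                break
        else:
            groups.append([p, 1])
    return max(groups, key=lambda group: group[1])[0]

def inpaint(img, iterations=10):
    current = [list(row) for row in img]
    for _ in range(iterations):
        updated = [list(row) for row in current]
        for r, row in enumerate(current):
            for c, pixel in enumerate(row):
                neighbors = valid_neighbors(current, r, c)
                if is_black(pixel) or not neighbors:
                    continue          # border lines are never overwritten
                average = sum(color_diff(pixel, n) for n in neighbors) / len(neighbors)
                if average > DEFECT_THRESHOLD:
                    updated[r][c] = most_repeated(neighbors)
        current = updated
    return current

RED, GREEN, BLACK = (200, 30, 30), (30, 200, 30), (0, 0, 0)
picture = [[RED] * 5 for _ in range(5)]
picture[2][2] = GREEN                 # a single noisy pixel
picture[0][0] = BLACK                 # a border pixel that must survive
restored = inpaint(picture)
```

Each pass can only repair the outer ring of a noisy patch, which is why the number of iterations bounds the patch width that can be covered.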
3.4 Integration Module
This module integrates the three models: the line extraction model, the restoration model and the color inpainting model. It defines the flow in which the three models are combined to work effectively for image restoration.
3.4.1 Integration of Line Extraction Module and Restoration Module
The first module is the line extraction model, which takes the grayscale image of the input painting or sketch. Its output is the smoothed line-extracted image. This module post-processes the extracted line image, extracting complex curves and restoring the gaps to a certain extent, which makes it easier for the restoration model to fill the gaps. The second module is the restoration model. It takes the output of the line extraction model along with the grayscale input image as two input channels. The result is the inpainted line drawing without gaps. The input image is included in the input to the restoration model to indicate the gaps that need to be filled and to give a more accurate and smooth image as the output of the restoration model. This integration is illustrated in the left part of Fig. 8.
Fig. 8 Integration of all 3 modules
3.4.2 Integration of Restoration Module and Color Inpainting Module
The result of the CNN, along with the colored input image, is sent as input to the color inpainting model. The output of the color inpainting model is the restored image, with filled gaps and inpainted colors. The result of the CNN model is also superimposed on the image so that the filled gaps cover the boundaries as well; this improves the performance of the module and makes the restored image more accurate. This integration is shown in the right part of Fig. 8. The final output of the integration module is the restored image, which can be compared with the input image. The output image has better boundary lines, along with inpainted colors that were faded in the real image.
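The data flow between the three modules can be summarized in a few lines. The sketch assumes the modules are supplied as callables; `restore_drawing`, `to_gray` and the three parameter names are hypothetical, and the grayscale conversion uses the common luminance weights, which the paper does not specify.

```python
def to_gray(color_img):
    """Luminance of each (R, G, B) pixel with the usual 0.299/0.587/0.114 weights."""
    return [[round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row]
            for row in color_img]

def restore_drawing(color_img, extract_lines, restore_lines, inpaint_colors):
    """Chain the modules in the order described in Sect. 3.4."""
    gray = to_gray(color_img)
    lines = extract_lines(gray)                 # module 1: extraction + smoothing
    restored = restore_lines(gray, lines)       # module 2: CNN with two input channels
    return inpaint_colors(restored, color_img)  # module 3: color inpainting

# stand-in modules that only record the order in which they are called
calls = []
result = restore_drawing(
    [[(255, 255, 255)]],
    lambda gray: calls.append("extract") or gray,
    lambda gray, lines: calls.append("restore") or lines,
    lambda restored, color: calls.append("inpaint") or color,
)
```

Passing the modules as arguments keeps the flow independent of how each stage is implemented, so a stage can be swapped without touching the others.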
4 User Survey and Analysis
Since there will always be disagreement about which restoration is better, as the appearance of the original image is unknown, we compared our results through a user survey on images taken from the existing models [2, 3]. In our line extraction survey, 84.95% of the people preferred our output images over those of the existing model [2]. To compare the outputs of the restoration convolutional neural network, we used the Da Vinci dataset [12], which consists of 71 deteriorated old sketches by Leonardo da Vinci; we used 61 for training and 10 for testing. When the restoration outputs were compared, our model was preferred 79.64% of the time over the existing model [3]. We introduced manual noise into the cartoon dataset and then restored the colors using our color inpainting model. The color inpainting algorithm was rated 8.97 out of 10 in our user survey. Figure 5 is one of the images used in the survey; it compares the output of the probabilistic Hough transform with that of the proposed model.
5 Conclusions
The proposed model inpaints deteriorated color line drawings using line extraction, a restoration network and color inpainting. The CNN model performs downsampling and upsampling using convolution and deconvolution layers, which helps in the restoration of the deteriorated image. The color inpainting model takes the output of the restoration network along with the deteriorated color line drawing; it automatically detects the noise in the image if the noise patch is less than 20 pixels wide and restores it using the valid neighboring pixels without destroying the border. The model updates a defective pixel using the most frequent neighboring pixel value to make sure that defective neighboring pixels do not influence the result. The output of the color inpainting model is superimposed with the restored extracted line drawing, and this gives the final restored color line drawing. The proposed model does not require any post-processing. In the future, we will concentrate on using deep learning techniques in the color inpainting model so that it works for deteriorated images with noise of any width.
References
1. S. Darabi, E. Shechtman, C. Barnes, D.B. Goldman, P. Sen, Image melding: combining inconsistent images using patch-based synthesis. ACM Trans. Graph. (Proc. of SIGGRAPH) 31(4), 82:1–82:10 (2012)
2. K. Sasaki, S. Iizuka, E. Simo-Serra, H. Ishikawa, Joint gap detection and inpainting of line drawings, in IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Honolulu (2017), pp. 5768–5776
3. K. Sasaki, S. Iizuka, E. Simo-Serra, H. Ishikawa, Learning to restore deteriorated line drawing, in The Visual Computer (Proceedings of Computer Graphics International), vol. 34 (2018), pp. 1077–1085
4. V. Nguyen, A. Martinelli, N. Tomatis, R. Siegwart, A comparison of line extraction algorithms using 2D laser rangefinder for indoor mobile robotics. IEEE/RSJ Int. Conf. Intell. Rob. Syst. 65(1), 1929–1934 (2005)
5. S. Iizuka, E. Simo-Serra, H. Ishikawa, Globally and locally consistent image completion. ACM Trans. Graph. (Proc. SIGGRAPH 2017) 36(4), 107:1–107:14 (2017)
6. J. Yu, Z. Lin, J. Yang, X. Shen, X. Lu, T.S. Huang, Generative image inpainting with contextual attention. Comput. Vis. Pattern Recogn. (CVPR) (2018), pp. 5505–5514
7. D. Pathak, P. Krahenbuhl, J. Donahue, T. Darrell, A. Efros, Context encoders: feature learning by inpainting. Comput. Vis. Pattern Recogn. (CVPR) (2016), pp. 2536–2544
8. T.K. Shih, R.-C. Chang, L.-C. Lu, W.-C. Ko, C.-C. Wang, Adaptive digital image inpainting, in 18th International Conference on Advanced Information Networking and Applications, Fukuoka, Japan, vol. 1 (2004), pp. 71–76
9. M. Bertalmio, G. Sapiro, V. Caselles, C. Ballester, Image inpainting, in Proceedings of the 27th Annual Conference on Computer Graphics and Interactive Techniques (2000), pp. 417–424
10. E. Simo-Serra, S. Iizuka, K. Sasaki, H. Ishikawa, Learning to simplify: fully convolutional networks for rough sketch cleanup. ACM Trans. Graph. (Proc. of SIGGRAPH) 35(4), 121–132 (2016)
11. C. Li, X. Liu, T.T. Wong, Deep extraction of manga structural lines. ACM Trans. Graph. 36(4), 117:1–117:12 (2017)
12. F. Cole, I. Mosseri, D. Krishnan, A. Sarna, Cartoon set: an image dataset (2018). https://google.github.io/cartoonset/download.html
13. K. Sasaki, E. Simo-Serra, Da Vinci Dataset (2018). https://esslab.jp/~ess/en/data/davincidataset/
Chapter 18
Study to Find Optimal Solution for Multi-objects Detection by Background Image Subtraction with CNN in Real-Time Surveillance System
Ravindra Sangle and Ashok Kumar Jetawat
1 Introduction
In the real world, many important applications use video data to provide security in places such as theaters and shopping malls. In medical therapy, video analysis helps to improve the quality of life of patients. Video abstraction is also used to provide better security. Video analysis is further applied in traffic management, which usually analyzes traffic flow, and in video editing, to design futuristic video effects. In video surveillance, a great deal of research has been applied to real-time systems in order to detect objects. This research usually includes indispensable steps such as navigation, object detection and tracking, and finally object recognition and surveillance systems. Object detection is performed by segmenting the images into foreground and background objects. Object tracking establishes the correspondence between the objects in successive frames of a video sequence. Contour-based object tracking follows the boundary contour of a deforming and moving object through a sequence of images. First, the contour of the object is obtained in the first frame. Once a rough contour with the desired structure is available for the first image of the sequence, an automatic system outlines the contours in the subsequent images at video rate.
R. Sangle · A. K. Jetawat (B) PAHER, Udaipur, India R. Sangle e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 R. Tiwari et al. (eds.), Proceedings of International Conference on Computational Intelligence, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-16-3802-2_18
2 Literature Review Many researches have been developed for predicting the ranking based on the online reviews. Some of the studies are as follows. In this section, a survey of recent techniques is presented with its advantage and limitations shown in Table 1. Table 1 Literature survey Authors
Method
Advantage
Disadvantage
Thenmozhi and Kalpana [1]
Sequential outline separation, background subtraction, adaptive motion estimation
The developed model used sequential outline separation method that started background subtraction and foreground detection for motion and object discovery. The developed model uses a procedure for separating the territory of enthusiasm from developed background
The developed model operating at individual situation showed subtle changes in attitude, no height, and the movement of a multicolor, the process of matching important points of a complex image was more than the average condition
Elhoseny et al. [2]
Multi-object detection The developed model and region growing evaluated the object movements were based on the assessed position between successive frames at that point achieved accurate tracking
The developed model showed motion the MODT analysis in order to achieve a better detection rate
Bapu and Florinabel Real-time Image [3] processing method to implement the object classification and detection (RTIP-ODC) technique
The enhanced feature extraction procedure performed Preprocessing, object detection, classification, and validation improved the efficiency of the developed technique
In the developed method, several methods were used in the earlier research works, but still the detection and classification accuracy need to be improved
Nadia Kiaee et al. [4]
The developed model used Haar wavelet transform as the resulting wavelet sub-bands were strongly affected on the orientation elements in the GLCM computation
The developed model introduced an algorithm that was optimization for applications but the histogram color variation of each object was low
Gray-level co-occurrence Matrix (LLCM) in Haar wavelet transformed space, support vector machine (SVM)
(continued)
18 Study to Find Optimal Solution for Multi-objects …
223
Table 1 (continued) Authors
Method
Advantage
Disadvantage
Sengar and Mukhopadhyay [5]
Morphological operation, connected component analysis, flood-fill algorithm
The developed model employed a weighted mean and a weighted variance for background subtraction on the detail components of the wavelet-transformed frame, which reduced computational complexity
The developed model found segmentation of the moving objects with morphological operations challenging, which affected the qualitative analysis
Nguyen [6]
Artificial intelligence, background modeling, region of interest (ROI), convolutional neural networks (CNN)
The developed model focused on processing the valid background and the moving objects in the scene. It significantly reduced data transmission and storage and also improved accuracy
The developed model needs to be investigated with more data in order to perform transfer learning with the developed algorithms, which would empower real-time decisions for smart manufacturing
Ammar et al. [7]
Deep detector classifier (DeepDC)
The developed method used DeepSphere for object segmentation and was motivated by the ability of generative adversarial networks (GANs) to classify data well and to categorize extracted images
The developed DeepDC required generative models to process large quantities of unlabeled samples in order to achieve high efficiency with supervised methods while using only very small quantities of labeled samples
3 Motivation for Study
The existing methods require training images or instances, which increases memory use. The captured images can be tilted or augmented, and camera movement introduces additional noise into the data. It is challenging to track moving targets because of occlusions and interference in the object's vicinity, such as image backgrounds and surrounding noise. Without background approximation, more memory was consumed, although the methods ran faster than others because the quality of the camera was not considered.
R. Sangle and A. K. Jetawat
4 Problem Definition
The existing methods face the following challenges during background subtraction:
• The existing methods do not use input factors such as the object's size, appearance interval, and changes in background intensity, and they do not update the background set for speed.
• Object tracking can be a tedious procedure because of the amount of data contained in the video. Each tracking algorithm requires an 'object location system' either in each frame or whenever a new object appears in a frame. These factors reduce the efficiency of movement vector fields that utilize fuzzy sets to achieve better tracking productivity.
5 Proposed Methodology
In this paper, we implement a background subtraction method using a convolutional neural network (CNN). A CNN is a class of neural network that is mostly used in image and video recognition, recommender systems, image classification, and NLP. Like any neural network, it consists of an input layer, hidden layers, and an output layer. The hidden layers usually consist of a series of convolution layers, each of which convolves its input with a set of filters by means of dot products. Here, we use ReLU as the activation function, followed by additional pooling layers. The core building block of a CNN is the convolution layer [8]. Each hidden layer consists of filters that extend through the input volume. In terms of building blocks, we use convolution layers, spatial arrangement, parameter sharing, pooling layers, ReLU layers, fully connected layers, and loss layers. As shown in Fig. 1, we applied the CNN to the image. We downloaded the MobileNetSSD_deploy model and used it for training with predefined weights. After training, we applied the CNN to the given image and detected the object with 96.90% accuracy.
Fig. 1 Object Detection using CNN with ReLU function
Fig. 2 Block diagram of proposed methodology
As shown in Fig. 2, we first import the video and then apply the background subtraction method frame by frame with the KNN/MoG2 filter. Then, using the CNN, we track the object and calculate the accuracy.
• Sequence Frame
The sequence frame is created from the difference between the current frame and the previous frame and is then compared with a threshold value. The difference between the frame at time t and the frame at time t − 1 is determined as follows:
$$ D(x, y) = \left| I(x, y, t) - I(x, y, t - 1) \right| \tag{1} $$
D is compared with a threshold τ, and each pixel is categorized as follows:
$$ P(x, y) = \begin{cases} \text{foreground} & \text{if } D(x, y) > \tau \\ \text{background} & \text{otherwise} \end{cases} \tag{2} $$
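The thresholding rule in Eqs. (1) and (2) can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the function name `frame_difference_mask`, the toy 3 × 3 frames, and the threshold τ = 25 are assumptions chosen for the example.

```python
def frame_difference_mask(prev_frame, curr_frame, tau):
    """Label each pixel 1 (foreground) or 0 (background) following Eqs. (1)-(2)."""
    mask = []
    for prev_row, curr_row in zip(prev_frame, curr_frame):
        # D = |I(x, y, t) - I(x, y, t - 1)|, then compare D with the threshold tau
        mask.append([1 if abs(c - p) > tau else 0
                     for p, c in zip(prev_row, curr_row)])
    return mask


# Two hypothetical 3x3 grayscale frames: a bright object appears at the center,
# while the other pixels change only slightly (sensor noise).
prev_frame = [[10, 10, 10],
              [10, 10, 10],
              [10, 10, 10]]
curr_frame = [[10, 12, 10],
              [10, 200, 10],
              [10, 10, 9]]

mask = frame_difference_mask(prev_frame, curr_frame, tau=25)
for row in mask:
    print(row)
```

Only the center pixel exceeds the threshold, so small intensity fluctuations stay in the background class. The value of τ trades missed motion against noise and would have to be tuned per video.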
• Tracking Multiple Objects Using Improvised CNN
CNN models extract simple local visual features such as end-points, corners, and edges. These features are then passed to the succeeding layers, which identify more complex features. Generally, a CNN contains a set of layers, each with one or more planes for computation, in which every unit is connected to a local neighborhood in the previous layer [9]. These units act as local feature detectors whose activation functions are determined in the learning phase, resulting in the formation of feature maps. A feature map is obtained by scanning the input image with a single weighted unit that has a local receptive field and storing the results in the output; this process of feature generation is similar to convolving the data with a kernel. A feature map can therefore be regarded as a plane of units that share their weights. In the next phase of the CNN, sub-sampling is performed on the local convolutional feature maps, which reduces the spatial resolution of the data and the sensitivity to feature distortions.
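The feature-map and sub-sampling steps described above can be illustrated with a small pure-Python sketch. It is not the network used in this chapter: the 5 × 5 image, the 2 × 2 edge kernel, and the helper names `cross_correlate_valid`, `relu`, and `max_pool_2x2` are assumptions made for the example, and max pooling is only one possible sub-sampling choice.

```python
def cross_correlate_valid(image, kernel):
    """Slide one shared-weight kernel over the image (no padding) to build a feature map."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [[sum(image[i + u][j + v] * kernel[u][v]
                 for u in range(kh) for v in range(kw))
             for j in range(out_w)]
            for i in range(out_h)]


def relu(fmap):
    """Element-wise ReLU activation."""
    return [[max(0, v) for v in row] for row in fmap]


def max_pool_2x2(fmap):
    """2x2 max pooling with stride 2: halves the spatial resolution."""
    return [[max(fmap[i][j], fmap[i][j + 1], fmap[i + 1][j], fmap[i + 1][j + 1])
             for j in range(0, len(fmap[0]) - 1, 2)]
            for i in range(0, len(fmap) - 1, 2)]


# Hypothetical 5x5 image with a vertical edge between columns 1 and 2.
image = [[0, 0, 1, 1, 1] for _ in range(5)]
# Hypothetical 2x2 kernel that responds to dark-to-bright transitions.
kernel = [[-1, 1],
          [-1, 1]]

feature_map = relu(cross_correlate_valid(image, kernel))  # 4x4 feature map
pooled = max_pool_2x2(feature_map)                        # 2x2 after sub-sampling
print(feature_map)
print(pooled)
```

The same four kernel weights are reused at every position, which is the weight sharing mentioned in the text, and pooling keeps the edge response while discarding its exact position.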
6 Results and Discussions
In this section, we focus on the subjective quality assessment of the proposed method (Figs. 3, 4, and 5).
• Results of Object Detection
• Results of Object Tracking
The tracking process depends on the result of background subtraction. Figure 6a shows the tracking of persons with accuracy, and we performed the same tracking with the background subtraction method. By applying background subtraction methods, it
Fig. 3 (a) Result of subtracting the second frame from the reference (background) frame. (b) The 22nd frame of the test video
Fig. 4 (a) Result of subtracting the second frame from the reference (background) frame. (b) The 43rd frame of the test video
Fig. 5 (a) Result of subtracting the second frame from the reference (background) frame. (b) The 67th frame of the test video
Fig. 6 Tracking of persons with accuracy obtained using the background subtraction method
Table 2 Average metrics for MOG2, RMOG, and the proposed model
Methods     Recall    Precision   F-measure
MOG2        0.6604    0.5973      0.5566
RMOG        0.5940    0.6965      0.5736
Proposed    0.7265    0.6981      0.6543
is observed that the accuracy of object tracking is higher than for frames without background subtraction. We measured the accuracy of object tracking over multiple frames and observed that the accuracy without background subtraction is 93.71%, whereas with the background subtraction method it is 95.19%. Experimental evaluations were performed to analyze the performance of the proposed method. Table 2 shows the average values of the metrics over the data set for the different algorithms. The proposed method obtains the best score on all three metrics.
7 Conclusion
In this paper, we have discussed a novel method for tracking objects in video. The approach is based on detecting objects using a background subtraction method in combination with a CNN. Initially, we imported a frame from the datasets. Then, we applied the background subtraction method with different filtering algorithms: we implemented and tried the MoG2 and KNN algorithms on the test video and extracted the foreground image. After this, we applied the CNN to track the objects. After tracking the objects, we measured the accuracy with and without the background subtraction method and observed that the proposed system gives better results than the existing system. The accuracy improved by approximately 1.5 percentage points (from 93.71% to 95.19%). This approach differs from both the existing system and the classical approach. As future scope, the proposed algorithm can be merged with improved CNN techniques by implementing new activation or loss functions.
References
1. T. Thenmozhi, A.M. Kalpana, Adaptive motion estimation and sequential outline separation based moving object detection in video surveillance system. Microprocess. Microsyst. 103084 (2020)
2. M. Elhoseny, Multi-object detection and tracking (MODT) machine learning model for real-time video surveillance systems. Circ. Syst. Sig. Process. 39(2), 611–630 (2020)
3. J. Bapu, D.J. Florinabel, Real-time image processing method to implement object detection and classification for remote sensing images. Earth Sci. Inf. 1–13 (2020)
4. N. Kiaee, E. Hashemizadeh, N. Zarrinpanjeh, Using GLCM features in Haar wavelet transformed space for moving object classification. IET Intell. Transp. Syst. 13(7), 1148–1153 (2019)
5. S.S. Sengar, S. Mukhopadhyay, Moving object detection using statistical background subtraction in wavelet compressed domain. Multimedia Tools Appl. 79(9), 5919–5940 (2020)
6. M.T. Nguyen, L.H. Truong, T.T. Tran, C.F. Chien, Artificial intelligence based data processing algorithm for video surveillance to empower industry 3.5. Comput. Ind. Eng. 148, 106671 (2020)
7. S. Ammar, T. Bouwmans, N. Zaghden, M. Neji, Deep detector classifier (DeepDC) for moving objects segmentation and classification in video surveillance. IET Image Proc. 14(8), 1490–1501 (2020)
8. P.S. Mane, A.K. Jetawat, Web page recommendation using random forest with firefly algorithm in web mining. Int. J. Eng. Adv. Technol. (IJEAT) 9(3), 499–505 (2020)
9. L.N. Kolhe, V. Khairnar, A.K. Jetawat, Prediction-based parallel clustering algorithm for M-Commerce, in Information and Communication Technology for Competitive Strategies, ed. by S. Fong, S. Akashe, P. Mahalle, Lecture Notes in Networks and Systems, vol. 40 (Springer, Singapore, 2019), pp. 31–39
Chapter 19
Hole-Filling Method Using Nonlocal Non-convex Regularization for Consumer Depth Cameras
Sukla Satapathy
1 Introduction
Target localization is a demanding technology in intelligent robotic vision. The depth information of the environment captured by an RGB-D camera can be widely used in robotic vision, automatic navigation, and 3D reconstruction. However, because of the poor quality of the depth maps obtained by existing sensors, depth recovery is a significant research area. The work in [1] estimates the missing values based on a computed background of the observed scene. Traditional methods such as auto-regressive (AR) modeling [2], tangent plane approximation [3], and joint adaptive regularization and thresholding on manifolds (JARTM) [4] have been developed for the enhancement of depth maps. Some recently proposed deep learning methods have also shown their efficacy. In [5], a self-supervised network is utilized to exploit the photo-consistency of adjacent video frames for the inpainting of depth maps. To deal with the sparse input obtained from LiDAR, the work in [6] exploits sparsity-invariant operations along with a multi-scale architecture. In [7], surface normals and occlusion boundaries are predicted using a deep learning framework to fill the missing values in depth maps. In this work, we propose an optimization framework that exploits nonlocal self-similarities in the depth map while adapting to discontinuities. To obtain a better solution space, we use MAP-MRF based regularization, and the hole-filling problem is solved using the graduated non-convexity (GNC) algorithm [8]. Our extensive experiments demonstrate the effectiveness of the proposed approach on two public benchmarks: the Middlebury dataset [9] and Intel RealSense data from the SUN RGB-D dataset [10].
S. Satapathy (B) Department of Computer Science and Engineering, Indian Institute of Technology, Kharagpur, India © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 R. Tiwari et al. (eds.), Proceedings of International Conference on Computational Intelligence, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-16-3802-2_19
Fig. 1 Progressive filling of depth values at missing locations over the γ iterations for the Art depth map [9]: (a) input, γ = 300; (b) γ = 71.19; (c) γ = 30.03; (d) γ = 9.50; (e) γ = 1.69; (f) γ = 0.13. The filled depth map obtained by LRL0ψ [16] has PSNR = 26.82 dB; the proposed method yields an estimate with PSNR = 28.02 dB
2 Nonlocal Non-convex Approach for Hole-Filling in Depth Map
2.1 Degradation Model
We model the degraded depth maps captured using consumer cameras as
$$ d_{obs} = M[d + n] \tag{1} $$
where M is the binary mask corresponding to the missing values in the degraded observation, d denotes the filled depth map, and n is Gaussian noise.
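The degradation model in Eq. (1) can be simulated directly, which is also how synthetic test inputs are typically produced. The sketch below is an illustration under stated assumptions: the function name `degrade`, the noise level `sigma`, the fixed seed, and the toy 2 × 3 depth map are hypothetical and not taken from the chapter.

```python
import random


def degrade(depth, mask, sigma, seed=0):
    """Return d_obs = M[d + n]: add Gaussian noise n, then zero out the masked pixels."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    return [[m * (d + rng.gauss(0.0, sigma)) for d, m in zip(d_row, m_row)]
            for d_row, m_row in zip(depth, mask)]


# Hypothetical depth map and binary mask (1 = observed, 0 = missing).
depth = [[10.0, 10.0, 20.0],
         [10.0, 20.0, 20.0]]
mask = [[1, 0, 1],
        [1, 1, 0]]

d_obs = degrade(depth, mask, sigma=0.5)
for row in d_obs:
    print(row)
```

Pixels with a mask value of 0 carry no information about d, which is why the inverse problem needs the prior introduced in the following subsections.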
2.2 DAMRF: Discontinuity Adaptive Markov Random Field
In a convex MRF prior model with a first-order neighborhood, adjacent pixels show a high degree of interaction, which smooths out discontinuities in the estimated depth map. This motivates the choice of a non-convex prior model, which avoids over-smoothing, especially at sharp edges, by adaptively minimizing the interaction between the two sides of a discontinuity. Figure 2 shows both the convex and the non-convex MRF prior functions.
Fig. 2 Convex and non-convex MRF prior functions: (a) convex MRF, (b) non-convex MRF
Out of the different functions suggested in [11], we have chosen the adaptive interaction function
$$ f(\delta) = \gamma - \gamma e^{-\delta^{2}/\gamma}, \qquad \delta = d(x, y) - d(p, q). $$
The value of γ controls the shape of the function (a larger value makes the function convex). Using the discontinuity adaptive prior function and assuming a first-order neighborhood for the MRF, the clique potential is
$$ \sum_{c \in C} V_c(d) = \sum_{(x, y)} \sum_{(p, q) \in N(x, y)} f(\delta(x, y; p, q)) \tag{2} $$
or, elaborately,
$$ \sum_{c \in C} V_c(d) = \sum_{x} \sum_{y} \Big( 4\gamma - \gamma \exp\{-[d(x, y) - d(x, y - 1)]^{2}/\gamma\} - \gamma \exp\{-[d(x, y) - d(x, y + 1)]^{2}/\gamma\} - \gamma \exp\{-[d(x, y) - d(x - 1, y)]^{2}/\gamma\} - \gamma \exp\{-[d(x, y) - d(x + 1, y)]^{2}/\gamma\} \Big) \tag{3} $$
and the gradient of the regularizer at the kth iteration is
$$ G(\delta)(x, y) = \sum_{(p, q) \in N(x, y)} 2\,\delta(x, y; p, q)\, e^{-\delta^{2}(x, y; p, q)/\gamma} \tag{4} $$
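To make the role of γ concrete, the interaction function f(δ) = γ − γ exp(−δ²/γ) and its derivative 2δ exp(−δ²/γ), the summand of Eq. (4), can be evaluated numerically. This is a small sketch; the function names `damrf` and `damrf_grad` and the sample values of δ and γ are assumptions for illustration.

```python
import math


def damrf(delta, gamma):
    """Discontinuity adaptive interaction function f(delta)."""
    return gamma - gamma * math.exp(-delta ** 2 / gamma)


def damrf_grad(delta, gamma):
    """Derivative of f with respect to delta (the summand of Eq. 4)."""
    return 2.0 * delta * math.exp(-delta ** 2 / gamma)


# Large gamma: f behaves almost like the quadratic delta^2 over this range.
# Small gamma: f saturates at gamma, so a large jump (an edge) costs almost nothing extra.
for gamma in (300.0, 0.1):
    values = [round(damrf(delta, gamma), 4) for delta in (0.1, 1.0, 10.0)]
    print("gamma =", gamma, "f(0.1), f(1), f(10) =", values)

# Central finite-difference check of the derivative at an arbitrary point.
delta, gamma, step = 1.5, 2.0, 1e-6
numeric = (damrf(delta + step, gamma) - damrf(delta - step, gamma)) / (2 * step)
print(abs(numeric - damrf_grad(delta, gamma)) < 1e-6)
```

The saturation for small γ is what limits the interaction across a discontinuity: once the depth difference is large relative to √γ, the derivative, and therefore the smoothing force, decays toward zero.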
2.3 Non-local Means Filtering (NLM)
Due to redundancy in the depth map, nonlocal patches have a high degree of similarity [12] with the reference patch centered at the pixel s that has to be filled. Thus, we are no longer restricted to a small first-order neighborhood of the reference patch but search the entire depth map for similar patches. The pixel s can then be computed as the weighted average of the center pixel values of the corresponding nonlocal patches. For a depth map d = {d(s) | s ∈ Ω} (d : Ω → R), the computed depth value NL[d(s)] for pixel s is
$$ NL[d(s)] = \sum_{t \in \Omega} w(s, t)\, d(t) \tag{5} $$
where w(s, t) are the weights calculated from the similarity of the patch vectors d(N_s) and d(N_t), N_s is a square neighborhood of size 5 × 5 around pixel s, and d is the estimated depth map. The weights are defined as
$$ w(s, t) = \frac{1}{Z(s)} \exp\left( -\frac{G * \| d(N_s) - d(N_t) \|^{2}}{h^{2}} \right) \tag{6} $$
where h and G refer to the filtering parameter and the Gaussian kernel, respectively. The normalizing factor is
$$ Z(s) = \sum_{t \in \Omega} \exp\left( -\frac{G * \| d(N_s) - d(N_t) \|^{2}}{h^{2}} \right) \tag{7} $$
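The weighting scheme of Eqs. (5)–(7) can be sketched on a one-dimensional signal. The example simplifies the setting considerably and should be read as an illustration only: it uses 3-sample patches instead of 5 × 5 neighborhoods, replaces the Gaussian kernel G by a flat average, replicates the border samples, and the names `patch`, `nlm_weights`, and `nlm_value` are hypothetical.

```python
import math


def patch(signal, s, radius):
    """Patch around index s, with border samples replicated."""
    n = len(signal)
    return [signal[min(max(s + k, 0), n - 1)] for k in range(-radius, radius + 1)]


def nlm_weights(signal, s, radius, h):
    """Normalized weights w(s, t) for every t, following Eqs. (6)-(7)."""
    ref = patch(signal, s, radius)
    raw = []
    for t in range(len(signal)):
        cand = patch(signal, t, radius)
        # Flat kernel (assumption): mean squared difference between the two patches.
        dist2 = sum((a - b) ** 2 for a, b in zip(ref, cand)) / len(ref)
        raw.append(math.exp(-dist2 / h ** 2))
    z = sum(raw)  # normalizing factor Z(s)
    return [r / z for r in raw]


def nlm_value(signal, s, radius=1, h=1.0):
    """Weighted average NL[d(s)] of Eq. (5)."""
    weights = nlm_weights(signal, s, radius, h)
    return sum(w * d for w, d in zip(weights, signal))


# Hypothetical step signal: a depth edge between index 3 and index 4.
signal = [1.0, 1.0, 1.0, 1.0, 9.0, 9.0, 9.0, 9.0]
weights = nlm_weights(signal, 1, radius=1, h=1.0)
value = nlm_value(signal, 1)
print(value)
```

Patches from the far side of the edge receive negligible weight, so the estimate stays close to the depth of the side the pixel belongs to, which is the property that makes the nonlocal average attractive near discontinuities.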
2.4 Non-local Means Regularization
It has been shown in the literature [13, 14] that non-local means based regularizers achieve better performance than local-interaction based regularizers. Hence, we propose a nonlocal regularization framework to solve the problem of hole-filling in depth profiles. The nonlocal version of the clique potential function defined in (3) is
$$ NL[d(s)] = \sum_{s} \sum_{t} \Big( \gamma - \gamma \exp\{-w(s, t)[d(s) - d(t)]^{2}/\gamma\} \Big) \tag{8} $$
where w(s, t) are calculated using (6). The gradient term of (8) can be derived as
$$ 2\, w(s, t)\,(d(s) - d(t)) \exp\{-w(s, t)[d(s) - d(t)]^{2}/\gamma\} \tag{9} $$
2.5 Optimization Framework
Since the problem in (1) is inverse and ill-posed, prior information about the inpainted depth map d is required to regularize the solution. In a MAP-MRF framework, the estimate is
$$ \hat{d} = \arg\min_{d} \; \| d_{obs} - M d \|_{2}^{2} + \lambda \sum_{c \in C} V_c(d) \tag{10} $$
where d_obs represents the depth frame with the missing region, d is the recovered depth map, and M is the hole or occlusion mask. The cost function consists of two terms: the first is the data fitting term and the second is the nonlocal non-convex regularizer, in which λ is the regularization parameter. Because of the non-convexity, the graduated non-convexity (GNC) algorithm [8], which is able to handle the issue of local minima, is used in the optimization framework. Initially, the GNC algorithm approximates the non-convex solution space by a convex one; the solution space is then slowly varied from convex to non-convex. The underlying principle of GNC is to choose a high value γ^(init) for γ in (8) to make the function strictly convex and to minimize it by the gradient descent method. The unique minimum obtained is then used as the starting point for the successive γ iterations. Gradually, the function being minimized changes from convex to non-convex as γ shrinks. By tracking the minima obtained from γ^(init) to γ^(target), the global minimum can be approximated. The gradient of (10) at the kth iteration is given by
$$ g^{(k)} = M^{T}(M d - d_{obs}) + \lambda \tau^{(k)} \tag{11} $$
where τ^(k) at location p in the filled depth map is the gradient of the regularization term given in (9).
2.6 Oversegmentation
Filling holes of substantial size in depth maps is of significant importance, as most of the holes appear at object boundaries. In Fig. 1 we have used only the degraded depth map as input, since the holes are very small, although in total half of the depth values are missing. For the meticulous filling of large holes, however, the corresponding RGB information plays a vital role. To reduce conflicts of 'color-depth consistency' in the RGB-D image, oversegmentation, namely superpixel division [15], is used. It provides higher accuracy than segmentation of the RGB image alone. The weighted average of self-similar nonlocal patches is computed inside a superpixel search window rather than the traditional rectangular search window discussed in [12].
Algorithm 1: NLDAMRF algorithm for hole filling in depth map
Input: d_obs, M, d^0
Patch size = 5 × 5
α = gradient step size
λ = penalty parameter
γ^(init) = 300, γ^(target) = 0.1, κ = 0.75
k = 1
while γ > γ^(target) do
    while ‖d^k − d^(k−1)‖ > ε do
        calculate g^(k−1) using Eq. (11)
        d^k ← d^(k−1) − α g^(k−1)
        k = k + 1
    end while
    γ = max[γ^(target), κγ]
    update weights w using d^k
end while
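The structure of Algorithm 1 can be tried out on a one-dimensional toy problem. The sketch below is deliberately simplified and is not the author's implementation: it uses fixed unit weights on the two first-order neighbors instead of the nonlocal weights w(s, t), omits the weight update step, caps the inner loop at `max_inner` iterations, and all names and parameter values other than γ^(init) = 300, γ^(target) = 0.1, and κ = 0.75 are assumptions.

```python
import math


def reg_gradient(d, gamma):
    """Regularizer gradient (Eq. 4 in 1-D): sum of 2*delta*exp(-delta^2/gamma) over neighbors."""
    n = len(d)
    grad = [0.0] * n
    for i in range(n):
        for j in (i - 1, i + 1):
            if 0 <= j < n:
                delta = d[i] - d[j]
                grad[i] += 2.0 * delta * math.exp(-delta * delta / gamma)
    return grad


def gnc_fill(d_obs, mask, lam=0.5, alpha=0.1, gamma_init=300.0,
             gamma_target=0.1, kappa=0.75, eps=1e-6, max_inner=500):
    """Gradient descent inside a graduated non-convexity loop over gamma."""
    d = list(d_obs)  # initial estimate d^0: the observation itself
    gamma = gamma_init
    while gamma > gamma_target:
        for _ in range(max_inner):
            tau = reg_gradient(d, gamma)
            # Eq. (11) with a diagonal mask: g = M^T (M d - d_obs) + lambda * tau
            g = [m * (m * di - oi) + lam * ti
                 for m, di, oi, ti in zip(mask, d, d_obs, tau)]
            d_new = [di - alpha * gi for di, gi in zip(d, g)]
            change = max(abs(a - b) for a, b in zip(d_new, d))
            d = d_new
            if change <= eps:
                break
        gamma = max(gamma_target, kappa * gamma)  # shrink gamma toward the target
    return d


# Hypothetical 1-D depth profile with an edge and two missing samples (mask = 0).
truth = [1.0, 1.0, 1.0, 1.0, 5.0, 5.0, 5.0, 5.0]
mask = [1, 1, 0, 1, 1, 0, 1, 1]
d_obs = [m * t for m, t in zip(mask, truth)]

filled = gnc_fill(d_obs, mask)
print([round(v, 3) for v in filled])
```

With a large γ the first stages behave like quadratic smoothing and blur the edge; as γ shrinks, the interaction across the jump fades and the two missing samples settle on the depth of their own side. The step size α has to be small enough for gradient descent to remain stable for the chosen λ.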
3 Experiments
3.1 Synthetic Experiment
The difficulty in evaluating depth hole-filling is that no ground truth is available for the missing locations. Therefore, to evaluate our method quantitatively, randomly distributed missing regions are added, as in [16], to the Art depth map of the Middlebury dataset [9]. Figure 1 shows the progressive filling of depth values at the missing locations over the γ iterations. The PSNR of the inpainted depth map obtained by LRL0ψ [16] is 26.82 dB, whereas the proposed method yields an estimate with a PSNR of 28.02 dB with respect to the ground truth. To show the efficacy of the proposed discontinuity adaptive method on depth maps with holes along depth edges, a simulation has been carried out for Kinect-like degradation. Figure 3 shows both a qualitative and a quantitative comparison on the Doll depth map of the Middlebury dataset [9]. The proposed approach outperforms JARTM [4] in terms of both performance metrics: PSNR and mean absolute deviation (MAD).
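The two metrics used in this comparison can be computed as follows. This is a generic sketch: the chapter does not state the peak value or the exact MAD definition, so a peak of 255 and the mean absolute difference between the ground truth and the estimate are assumed here, and the function names `psnr` and `mad` are illustrative.

```python
import math


def psnr(reference, estimate, peak=255.0):
    """Peak signal-to-noise ratio in dB (peak value of 255 is an assumption)."""
    diffs = [(a - b) ** 2
             for ref_row, est_row in zip(reference, estimate)
             for a, b in zip(ref_row, est_row)]
    mse = sum(diffs) / len(diffs)
    return float("inf") if mse == 0 else 10.0 * math.log10(peak ** 2 / mse)


def mad(reference, estimate):
    """Mean absolute deviation between the reference and the estimate."""
    diffs = [abs(a - b)
             for ref_row, est_row in zip(reference, estimate)
             for a, b in zip(ref_row, est_row)]
    return sum(diffs) / len(diffs)


# Hypothetical 2x2 ground-truth depth map and an estimate that is off by 5 everywhere.
ground_truth = [[100.0, 120.0],
                [140.0, 160.0]]
estimate = [[105.0, 115.0],
            [145.0, 155.0]]

print(round(psnr(ground_truth, estimate), 2), mad(ground_truth, estimate))
```

A higher PSNR and a lower MAD both indicate an estimate closer to the ground truth, which is the direction of improvement reported in Fig. 3.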
3.2 Real-World Experiment
In order to compare our results with state-of-the-art techniques, we perform evaluations on depth maps captured with Intel RealSense from the SUN RGB-D dataset [10]. The first row of Fig. 4 shows two RGB images, and the corresponding degraded depth maps are provided in the second row. The filled-in outputs obtained using [7] are shown in the third row of Fig. 4. In the last row, we provide the results of the proposed algorithm with the non-convex regularizer. From the output for the first scene, it is observed that the holes
Fig. 3 Kinect-like degradation: performance comparison with [4] (PSNR/MAD) for hole-filling in depth maps. (a) RGB image; (b) degraded depth map; (c) JARTM [4], PSNR: 41.29 dB, MAD: 0.66; (d) proposed, PSNR: 42.01 dB, MAD: 0.42
near the leg of the chair and the stand of the table lamp are filled with more accurate values than with [7]. Similarly, for the second scene, the depth values near the arm of the chair show the efficient recovery achieved by the proposed discontinuity adaptive approach.
4 Conclusion
A computational approach is proposed for filling holes in degraded depth maps using information from the corresponding RGB image. A discontinuity adaptive MRF based method with iterative regularization is used along with nonlocal depth information to yield the best estimate. The efficiency of the proposed approach has been compared with state-of-the-art methods on two synthetic cases and one challenging real-world case.
Fig. 4 Completion of Intel RealSense depth data from the SUN RGB-D dataset [10] for two scenes, (a) and (b). First row: RGB images. Second row: degraded depth maps. Third row: output obtained by deep depth completion [7]. Fourth row: inpainted depth maps obtained using the proposed method
References
1. M. Stommel, M. Beetz, W. Xu, Inpainting of missing values in the Kinect sensor's depth maps based on background estimates. IEEE Sensors J. 14(4), 1107–1116 (2013)
2. J. Yang, X. Ye, K. Li, C. Hou, Y. Wang, Color-guided depth recovery from RGB-D data using an adaptive autoregressive model. IEEE Trans. Image Process. 23(8), 3443–3458 (2014)
3. K. Matsuo, Y. Aoki, Depth image enhancement using local tangent plane approximations, in Computer Vision and Pattern Recognition (2015), pp. 3574–3583
4. X. Liu, D. Zhai, R. Chen, X. Ji, D. Zhao, W. Gao, Depth restoration from RGB-D data via joint adaptive regularization and thresholding on manifolds. IEEE Trans. Image Process. 28(3), 1068–1079 (2018)
5. F. Ma, G.V. Cavalheiro, S. Karaman, Self-supervised sparse-to-dense: self-supervised depth completion from lidar and monocular camera, in International Conference on Robotics and Automation (IEEE, 2019), pp. 3288–3295
6. Z. Huang, J. Fan, S. Cheng, S. Yi, X. Wang, H. Li, HMS-Net: hierarchical multi-scale sparsity-invariant network for sparse depth completion. IEEE Trans. Image Process. 29, 3429–3441 (2019)
7. Y. Zhang, T. Funkhouser, Deep depth completion of a single RGB-D image, in Computer Vision and Pattern Recognition (2018), pp. 175–185
8. K. Ramnath, A.N. Rajagopalan, Discontinuity-adaptive shape from focus using a non-convex prior, in Joint Pattern Recognition Symposium (Springer, Berlin, 2009), pp. 181–190
9. D. Scharstein, C. Pal, Learning conditional random fields for stereo, in Computer Vision and Pattern Recognition (IEEE, 2007), pp. 1–8
10. S. Song, S.P. Lichtenberg, J. Xiao, SUN RGB-D: a RGB-D scene understanding benchmark suite, in Computer Vision and Pattern Recognition (2015), pp. 567–576
11. S.Z. Li, Markov Random Field Modeling in Image Analysis (Springer Science and Business Media, 2009)
12. A. Buades, B. Coll, J.-M. Morel, A non-local algorithm for image denoising, in Computer Vision and Pattern Recognition, vol. 2 (IEEE, 2005), pp. 60–65
13. D.H. Salvadeo, N.D. Mascarenhas, A.L. Levada, Nonlocal Markovian models for image denoising. J. Electronic Imaging 25(1) (2016)
14. S. Jonna, S. Satapathy, R.R. Sahay, Super-resolution image defencing using a nonlocal nonconvex prior. Appl. Optics 57(2), 322–333 (2018)
15. R. Achanta, A. Shaji, K. Smith, A. Lucchi, P. Fua, S. Susstrunk, SLIC superpixels. Tech. Rep. (2010)
16. H. Xue, S. Zhang, D. Cai, Depth image inpainting: improving low rank matrix completion with low gradient regularization. IEEE Trans. Image Process. 26(9), 4311–4320 (2017)
Chapter 20
Comparative Analysis of Image Fusion Techniques for Medical Image Enhancement
Gaurav Makwana, Ram Narayan Yadav, and Lalita Gupta
1 Introduction
Medical imaging plays an important role in medical diagnosis. MRI, CT, ultrasound, and X-ray imaging provide a view of the inside of the body. These diagnostic methods make it possible for the doctor to perform complex surgery with ease, without opening too much of the body area. A CT scan can identify diseased areas in the body's interior without causing discomfort or pain to the patient. MRI uses a strong magnetic field and radio waves to pick up signals from the magnetic particles present in the human body and converts these signals into images of the area concerned with the help of sophisticated computer programs. Image processing techniques can modify the acquired data and analyze the outputs of medical imaging systems so that the symptoms of patients can be identified with ease. Modern image processing uses different algorithms for image enhancement, sharpening, smoothing, edge detection, and other operations. Object recognition is a very important pre-processing step in medical image segmentation [1, 2]. Contrast enhancement provides clarity in the final processed image. Edge detection algorithms such as the Sobel, Prewitt, and Laplacian of Gaussian operators [3] respond to high-frequency content; since edges, like noise, are high-frequency phenomena, it is very difficult to distinguish edges from noise or trivial geometric features. Often, a better diagnosis of the patient's current condition requires the analysis of medical images from different imaging systems, but it is very tedious for the radiologist or doctor to merge this information. Data acquired from various imaging sources can be modified using image processing techniques so that the diagnosis becomes easier for the doctors. Image fusion can be a solution for this type of diagnosis; it is one of the most contemporary, exact, and valuable diagnostic procedures. In image fusion [4], the images from multiple sources are fused in such a way that the output image supports a better diagnosis: the resultant image is more accurate and contains more information than the individual images, so in medical imaging the image fusion technique is far better for decision making. Medical image fusion can give additional clinical information that cannot be diagnosed from the individual images [5, 6]. Image fusion techniques are of three types: pixel level, feature level, and decision level. The pixel-level method is the basic type of image fusion, in which the source image pixels are processed directly; it retains the maximum original image information in the fused image. The feature-level method is the intermediate-level fusion method; it processes the characteristics of the source images. The next image fusion level is the decision level. It optimizes the design by using the information obtained from pixel-level or feature-level fusion. An advantage of the decision level is that it reduces the redundancy and uncertainty of information. Bhateja et al. [7] reduce the redundancy and enhance the contrast of the fused image by cascading the SWT and the nonsubsampled contourlet transform (NSCT). Na et al. [8] proposed a guided filtering fusion algorithm for MRI and CT image fusion; guided filtering preserves edge information while smoothing the image. Singh et al. [9] utilized the ripplet transform and the NSST in cascade for CT and MRI image fusion. This method provides a better representation of the reference image by enhancing subjective visualization. Some researchers have suggested the contourlet transform [10], multiscale and multiresolution methods [11], and the nonsubsampled shearlet transform (NSST) [12] for MRI/CT image fusion. Tian et al. [13] proposed a multimodal image fusion algorithm known as improved PCNN (IPCNN) for better image enhancement. This model fuses high-frequency and low-frequency coefficients. Huang et al. [14] have discussed the HIS, nonsubsampled contourlet transform (NSCT), NSST, PCNN, Frei-Chen, and CNN methods for image fusion. They have also discussed MRI/PET, MRI/CT, and MRI/SPECT image fusion. In this paper, a comparative analysis of wavelet transform, image averaging, and PCA techniques for image fusion is presented. The wavelet transform is the most reliable method for image fusion and provides exceptional information at different resolutions. It has good spatial and spectral quality, but its directionality is limited for images with curved shapes [7, 15], whereas PCA reduces the dimensionality of a data set of correlated variables while retaining most of the variation present in the data set [16, 17].
G. Makwana (B) Department of Electrical and Electronics Engineering, Shri Vaishnav Institute of Technology and Science, Indore, India
R. N. Yadav · L. Gupta Department of Electronics & Communication Engineering, Maulana Azad National Institute of Technology, Bhopal, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 R. Tiwari et al. (eds.), Proceedings of International Conference on Computational Intelligence, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-16-3802-2_20
2 Image Transforms
2.1 Discrete Wavelet Transform
The DWT decomposes the image into four frequency components, as shown in Fig. 1. The low-low (LL) component gives the average information of the image, and the other components give directional information: the low-high (LH), high-low (HL), and high-high (HH) components give the horizontal, vertical, and diagonal coefficients, respectively. Here, l[m] and h[m] are the low-pass and high-pass decomposition filters. Figure 2 illustrates the block diagram of the DWT-based image fusion technique. The DWT uses a scaling function φ(x) and a wavelet function ξ(x), which are associated with the low-pass and high-pass filters, respectively. These filters decompose the image into multiple frequency bands. The 2-D scaling function φ(x, y) and the 2-D wavelet functions ξ^H(x, y), ξ^V(x, y), and ξ^D(x, y) are separable in nature [18]. A function φ(x, y) is said to be separable if it can be expressed as
$$ \varphi(x, y) = \varphi(x)\,\varphi(y) \tag{1} $$
φ(x, y) gives the approximation coefficients of the image, and the intensity variations in different directions are captured by ξ^H(x, y), ξ^V(x, y), and ξ^D(x, y).
Fig. 1 DWT Decomposition Filter bank structure
Fig. 2 DWT-based image fusion
The DWT can be represented as
$$ \omega_{\varphi}(j_0, m, n) = \frac{1}{\sqrt{MN}} \sum_{x=0}^{M-1} \sum_{y=0}^{N-1} f(x, y)\, \varphi_{j_0, m, n}(x, y) \tag{2} $$
$$ \omega_{\xi}^{i}(j, m, n) = \frac{1}{\sqrt{MN}} \sum_{x=0}^{M-1} \sum_{y=0}^{N-1} f(x, y)\, \xi^{i}_{j, m, n}(x, y) \tag{3} $$
where i = {H, V, D} refers to the decomposition direction of the wavelet and j_0 is an arbitrary starting scale. ω_φ(j_0, m, n) represents the approximation of the image f(x, y) at scale j_0, and ω_ξ^i(j, m, n) denotes the horizontal, vertical, and diagonal coefficients at scales j ≥ j_0. The DWT is very important for resolution enhancement as well as for preserving the high-frequency components.
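A one-level decomposition and a simple DWT-based fusion can be sketched with the Haar wavelet. The chapter does not state which wavelet or which fusion rule is used, so the choices below are assumptions for illustration: Haar filters, averaging of the LL bands, and a maximum-absolute-value rule for the detail bands. The function names are hypothetical, and the input images must have even height and width.

```python
def haar_dwt2(img):
    """One-level 2-D Haar DWT; returns the LL, LH, HL, HH sub-bands (half size each)."""
    ll, lh, hl, hh = [], [], [], []
    for i in range(0, len(img), 2):
        r_ll, r_lh, r_hl, r_hh = [], [], [], []
        for j in range(0, len(img[0]), 2):
            a, b = img[i][j], img[i][j + 1]
            c, d = img[i + 1][j], img[i + 1][j + 1]
            r_ll.append((a + b + c + d) / 2.0)  # local average (approximation)
            r_lh.append((a + b - c - d) / 2.0)  # difference between rows
            r_hl.append((a - b + c - d) / 2.0)  # difference between columns
            r_hh.append((a - b - c + d) / 2.0)  # diagonal difference
        ll.append(r_ll); lh.append(r_lh); hl.append(r_hl); hh.append(r_hh)
    return ll, lh, hl, hh


def haar_idwt2(ll, lh, hl, hh):
    """Inverse of haar_dwt2."""
    img = []
    for r in range(len(ll)):
        top, bottom = [], []
        for c in range(len(ll[0])):
            p, q, s, t = ll[r][c], lh[r][c], hl[r][c], hh[r][c]
            top += [(p + q + s + t) / 2.0, (p + q - s - t) / 2.0]
            bottom += [(p - q + s - t) / 2.0, (p - q - s + t) / 2.0]
        img += [top, bottom]
    return img


def fuse_dwt(img_a, img_b):
    """Fuse two images: average the LL bands, keep the larger-magnitude detail coefficients."""
    bands_a, bands_b = haar_dwt2(img_a), haar_dwt2(img_b)
    fused = [[[(x + y) / 2.0 for x, y in zip(ra, rb)]
              for ra, rb in zip(bands_a[0], bands_b[0])]]
    for band_a, band_b in zip(bands_a[1:], bands_b[1:]):
        fused.append([[x if abs(x) >= abs(y) else y for x, y in zip(ra, rb)]
                      for ra, rb in zip(band_a, band_b)])
    return haar_idwt2(*fused)


# Hypothetical 4x4 test images.
img = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
flat = [[8, 8, 8, 8] for _ in range(4)]

restored = haar_idwt2(*haar_dwt2(img))
fused = fuse_dwt(img, flat)
print(restored == img)
print(fused)
```

Because the flat image has no detail coefficients, the fused result keeps all the detail of the first image and only averages the local mean levels, which illustrates how the sub-bands are treated differently during fusion.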
2.2 Stationary Wavelet Transform Stationary Wavelet Transform (SWT) is translation-invariant process. It doesn’t have down-sampling process as shown in Fig. 3, whereas DWT requires downsampling in each subband, which may cause information loss as shown in Fig. 1 and also it is restricted to squared size image but SWT can be implemented for any size of the image. The 2D SWT applies the transform at each point of the image. It saves the detail coefficients and uses the low-frequency information at each level. Scaling function φ(x) and wavelet function ξ(x) are associated with low-pass and high-pass filters, respectively, [19] in SWT. ∞ √ x −k = 2φ h(n − 2k)φ(x − n) 2 n=−∞
(4)
20 Comparative Analysis of Image Fusion Techniques …
245
Fig. 3 SWT decomposition Filter bank structure
√
2ξ
x 2
∞ −k = g(n − 2k)φ(x − n)
(5)
n=−∞
where $h(n)$ and $g(n)$ are the impulse responses of the low-pass and high-pass filters. The SWT decomposition yields a set of approximation coefficients $c_{j,k}$ and detail coefficients $d_{j,k}$ at resolution $2^{-j}$, where $j$ is the level of decomposition. Equations (4) and (5) can be represented as follows:

$$c_{j+1,k} = \sum_{n=-\infty}^{\infty} h(n - 2k)\, c_{j,n} \tag{6}$$

$$d_{j+1,k} = \sum_{n=-\infty}^{\infty} g(n - 2k)\, c_{j,n} \tag{7}$$

It can also be represented as

$$\tilde{c}_{j+1,k} = \sum_{l=-\infty}^{\infty} h(l)\, \tilde{c}_{j,k+2^{j}l} \tag{8}$$

$$\tilde{d}_{j+1,k} = \sum_{l=-\infty}^{\infty} g(l)\, \tilde{c}_{j,k+2^{j}l} \tag{9}$$
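The undecimated recursion can be sketched in one dimension as follows: at level $j$ the low-pass filter $h$ and the high-pass filter $g$ are applied to the current approximation with a stride of $2^{j}$ between taps, and nothing is down-sampled. The Haar filter taps, the periodic boundary handling and the name `swt_step` are assumptions made for this example and are not taken from the paper.

```python
import math

# Haar filter taps (assumed for this sketch).
H = [1 / math.sqrt(2), 1 / math.sqrt(2)]    # low-pass h(l)
G = [1 / math.sqrt(2), -1 / math.sqrt(2)]   # high-pass g(l)


def swt_step(c, level):
    """One undecimated decomposition step on a 1D signal.

    c     : approximation coefficients at the current level (list of floats)
    level : current level j; filter taps are spaced 2**j samples apart
    Returns (approximation, detail), both with the same length as c.
    """
    n = len(c)
    step = 2 ** level
    approx = [sum(H[l] * c[(k + step * l) % n] for l in range(len(H)))
              for k in range(n)]
    detail = [sum(G[l] * c[(k + step * l) % n] for l in range(len(G)))
              for k in range(n)]
    return approx, detail
```

Because every output has the same length as the input, circularly shifting the input simply shifts the coefficients by the same amount, which is the translation invariance mentioned above.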
2.3 Principal Component Analysis
Principal component analysis (PCA) is a feature extraction technique that re-expresses the input variables as a new set of variables ordered by importance. It retains the most significant components and discards the least significant ones, so it reduces the dimensionality of a data set of correlated variables while retaining most of the variation present in the data set. PCA identifies patterns in the data and highlights their similarities and differences. The principal components, which are linear combinations of the original variables, are orthogonal to each other, so there is no redundancy in PCA [16]. Figure 4 shows the PCA operation to improve image resolution. PCA decomposes the input image into sub-images of different frequency components. There are two ways to find the principal components: the covariance matrix and singular value decomposition (SVD). Here the covariance matrix is used to find the principal components of both images, and the extracted features are then merged to reconstruct the fused image. Assume that the empirical mean of the n-dimensional vector $X$ is zero and that the orthonormal projection matrix $U$ has the property [20]

$$U^{-1} = U^{T} \tag{10}$$

$$Y = U^{T} X \tag{11}$$

$$\mathrm{Cov}(Y) = E[Y Y^{T}] \tag{12}$$

$$\mathrm{Cov}(Y) = E[(U^{T} X)(U^{T} X)^{T}] \tag{13}$$

$$\mathrm{Cov}(Y) = U^{T} E[X X^{T}]\, U \tag{14}$$

$$\mathrm{Cov}(Y) = U^{T}\, \mathrm{Cov}(X)\, U \tag{15}$$

Fig. 4 PCA-based image fusion (the CT image and the MRI image are combined by PCA into the fused image)

$$U\, \mathrm{Cov}(Y) = \mathrm{Cov}(X)\, U \tag{16}$$

where

$$\mathrm{Cov}(Y) = \begin{bmatrix} \lambda_1 & 0 & \cdots & 0 \\ 0 & \lambda_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda_n \end{bmatrix} \tag{17}$$

$$U = [U_1 \;\; U_2 \;\; U_3 \;\; \cdots \;\; U_n] \tag{18}$$

We can generalize the form as

$$\lambda_I U_I = \mathrm{Cov}(X)\, U_I \tag{19}$$

where $I = \{1, 2, 3, \ldots, n\}$ and $U_I$ is the eigenvector associated with the eigenvalue $\lambda_I$.
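One common way to turn the eigen-decomposition above into a pixel-level fusion rule is to treat the two images as two variables, take the eigenvector of their 2 × 2 covariance matrix that belongs to the largest eigenvalue, and normalize its components into two weights. The sketch below follows that idea; it is an assumption about how the merging step might be done, not a statement of the authors' exact procedure, and the names `pca_fusion_weights` and `fuse_pca` are hypothetical.

```python
import math


def pca_fusion_weights(img_a, img_b):
    """Return fusion weights (w_a, w_b) for two images given as flat lists.

    The weights are the components of the principal eigenvector of the
    2 x 2 covariance matrix, scaled so that they sum to 1. The sketch
    assumes the two images are not negatively correlated.
    """
    n = len(img_a)
    mean_a = sum(img_a) / n
    mean_b = sum(img_b) / n
    var_a = sum((a - mean_a) ** 2 for a in img_a) / (n - 1)
    var_b = sum((b - mean_b) ** 2 for b in img_b) / (n - 1)
    cov_ab = sum((a - mean_a) * (b - mean_b)
                 for a, b in zip(img_a, img_b)) / (n - 1)
    # Largest eigenvalue of [[var_a, cov_ab], [cov_ab, var_b]] in closed form.
    lam = (var_a + var_b) / 2 + math.sqrt(((var_a - var_b) / 2) ** 2 + cov_ab ** 2)
    if cov_ab != 0:
        vec = (cov_ab, lam - var_a)   # solves (Cov - lam * I) vec = 0
    else:
        vec = (1.0, 0.0) if var_a >= var_b else (0.0, 1.0)
    total = vec[0] + vec[1]
    return vec[0] / total, vec[1] / total


def fuse_pca(img_a, img_b):
    """Weighted sum of the two images using the PCA-derived weights."""
    w_a, w_b = pca_fusion_weights(img_a, img_b)
    return [w_a * a + w_b * b for a, b in zip(img_a, img_b)]
```

With positively correlated inputs, the image with the larger variance receives the larger weight, so the fused result in this sketch leans towards that image.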
2.4 Image Fusion by Averaging
Image fusion can be done by averaging the corresponding pixels of the input images:

$$I_F(x, y) = \frac{I_{CT}(x, y) + I_{MRI}(x, y)}{2} \tag{20}$$
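Eq. (20) translates directly into code. The following minimal sketch assumes both images are lists of rows with identical dimensions; the function name `fuse_average` is chosen for this example only.

```python
def fuse_average(img_ct, img_mri):
    """Pixel-wise average of two equally sized images (lists of rows)."""
    if len(img_ct) != len(img_mri) or any(
        len(r1) != len(r2) for r1, r2 in zip(img_ct, img_mri)
    ):
        raise ValueError("images must have the same dimensions")
    return [
        [(a + b) / 2 for a, b in zip(row_ct, row_mri)]
        for row_ct, row_mri in zip(img_ct, img_mri)
    ]
```

Averaging pulls every fused pixel towards the midpoint of the two inputs, which is consistent with the low contrast reported for this method in the results.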
During the fusion process, the fused image must retain all the important characteristics of the original images without any loss. The advantage of MRI is that it can visualize the different soft tissues of the body with better contrast, which is not possible with conventional X-ray, computed tomography (CT), ultrasound, etc. Although this makes it a better method for diagnosis, it generates low-contrast images because of the liquid present in the human body. The power of the MRI can be increased to get better contrast, but this may harm the body tissue and bones. CT is another method for clinical examination of the brain, and it is a better imaging technique than MRI for the diagnosis of hard tissue. Figure 5 shows that the CT scans have high contrast: the visibility of the fontanel is very poor in the MRI scans, while in the CT scans the discontinuities in the skull distinguish the fontanel from other brain tissues.
Fig. 5 Image Fusion using averaging
3 Result
In this paper, MRI and CT images of size 256 × 256 are used to analyse the fusion algorithms. The resolution of both images is the same. The simplest fusion method is image averaging, which averages the corresponding pixels of the source images, as shown in Fig. 5. Figure 6 shows the result of the SWT fusion method. Figures 7 and 8 show the results of image fusion based on DWT and PCA. The experimental results show that the fused image carries better information on the hard and soft tissue of the brain than the separate CT and MRI images. Performance metrics of the fusion techniques are given in Tables 1 and 2. The PCA method performs better than the other fusion algorithms. The pixel averaging, SWT and DWT results have low contrast, as shown in Figs. 5, 6 and 7, whereas the PCA-based fusion result in Fig. 8 has high information content and better contrast. Figure 10 shows the histograms of the fusion results; the PCA result spans a wide range of intensities, which indicates a high-contrast image. It has a high mean, variance and SNR, which characterize image quality. The results show that the PCA method is a more effective fusion technique (Fig. 9).
Fig. 6 Image fusion using SWT
Fig. 7 Image fusion using DWT
Fig. 8 Image fusion using PCA
Table 1 Fusion techniques and their quality measures-1
| Type of transform | Correlation between CT and fused image | Correlation between MRI and fused image | SNR between CT image and fused image | SNR between MRI image and fused image |
|---|---|---|---|---|
| DWT | 0.568089 | 0.846860 | 16.05 dB | 16.03 dB |
| SWT | 0.574868 | 0.845058 | 16.08 dB | 16.03 dB |
| Averaging | 0.569123 | 0.851580 | 16.07 dB | 16.07 dB |
| PCA | 0.091151 | 0.999336 | 10.56 dB | 35.36 dB |
Table 2 Fusion techniques and their quality measures-2
| Type of transform | Mean of fused image | Std of fused image | MAE of fused image with CT | MAE of fused image with MRI | Skewness | Variance |
|---|---|---|---|---|---|---|
| DWT | 32.900344 | 12.028596 | 27.9169 | 27.9576 | 1.8404 | 1.0740e+03 |
| SWT | 32.901367 | 12.070341 | 27.7454 | 28.0959 | 1.8512 | 1.0882e+03 |
| Averaging | 32.901360 | 11.965921 | 27.8570 | 27.8570 | 1.8193 | 1.0625e+03 |
| PCA | 52.999467 | 20.124869 | 52.6402 | 3.0737 | 1.8297 | 2.5654e+03 |
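Quality measures of the kind listed in Tables 1 and 2 can be computed along the following lines. The paper does not give its formulas, so the definitions here are assumptions: Pearson correlation, mean absolute error, and an SNR defined as the ratio of reference energy to error energy in decibels. Images are passed as flat lists of pixel values, and all function names are illustrative.

```python
import math


def correlation(x, y):
    """Pearson correlation coefficient between two flat pixel lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # correlation is undefined for a constant image
    return cov / math.sqrt(var_x * var_y)


def mean_absolute_error(x, y):
    """Average absolute pixel difference."""
    return sum(abs(a - b) for a, b in zip(x, y)) / len(x)


def snr_db(reference, fused):
    """Assumed SNR definition: 10 * log10(sum(ref^2) / sum((ref - fused)^2))."""
    signal = sum(r * r for r in reference)
    noise = sum((r - f) ** 2 for r, f in zip(reference, fused))
    if noise == 0:
        return math.inf
    return 10 * math.log10(signal / noise)
```

Each measure is evaluated twice per fusion method, once against the CT image and once against the MRI image, which gives the paired columns of the tables.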
Fig. 9 Image histogram, a CT image, b MRI image
Fig. 10 Histogram of output image of different fusion technique. a Averaging technique, b SWT technique, c DWT technique, d PCA technique
4 Conclusion
This paper presents a comparative study of the performance of different fusion methods. Fusion of brain MRI and CT images can be used to obtain a better-quality image to support diagnosis. MRI shows the detailed internal structure of the soft tissue of the body but generates low-contrast images, whereas CT scans generate high-contrast images of hard tissue, so we fuse the two images to get a better representation for diagnosis. SWT, DWT, averaging and PCA are used for image fusion. Comparing the results obtained from the fusion methods shows that the fused image is clearer and preserves the information of the CT and MRI images. It is also observed that PCA provides a better result than the other three fusion techniques: it has minimum redundancy, preserves the morphological detail of the original images and gives improved contrast for human perception. The results show that PCA has a high standard deviation, variance, correlation coefficient and SNR. The statistical and subjective analyses show that PCA performs better than the other fusion algorithms.
References
1. M.I. Rajab, M.S. Woolfson, S.P. Morgan, Application of region-based segmentation and neural network edge detection in lesions. Comput. Med. Imaging Graph 28, 61–68 (2004)
2. H. Tang, E.X. Wu, Q.Y. Ma, D. Gallagher, G.M. Perera, T. Zhuang, MRI brain image segmentation by multi-resolution edge detection and region selection. Comput. Med. Imaging Graph 24, 349–357 (2000)
3. A. Huertas, G. Medioni, Detection of intensity changes with sub pixel accuracy using Laplacian-Gaussian masks. IEEE Trans. Pattern Anal. Mach. Intelligence 8, 651–664 (1986)
4. Y. Zheng, X. Hou, T. Bian, Z. Qin, Effective image fusion rules of multi-scale image decomposition, in Proceedings of the 5th International Symposium on Image and Signal Processing and Analysis (Istanbul, Sept. 2007), pp. 362–366
5. C.Y. Wen, J.K. Chen, Multi-resolution image fusion technique and its application to forensic science. Forensic Sci. Int. 140, 217–232 (2004)
6. H. Xie, G. Li, H. Ning, C. Menard, C.N. Coleman, R.W. Miller, 3D voxel fusion of multimodality medical images in a clinical treatment planning system, in Proceedings of the 17th IEEE Symposium on Computer-Based Medical Systems (CBMS'04), IEEE Computer Society (Bethesda, MD, USA, June 2004), pp. 48–53
7. V. Bhateja, H. Patel, A. Krishn, A. Sahu, A. Lay-Ekuakille, Multimodal medical image sensor fusion framework using cascade of wavelet and contourlet transform domains. IEEE Sens. J. 15(12), 6783–6790 (2015)
8. Y. Na, L. Zhao, Y. Yang, M. Ren, Guided filter-based images fusion algorithm for CT and MRI medical images. IET Image Proc. 12(1), 138–148 (2018)
9. S. Singh, R.S. Anand, D. Gupta, CT and MR image information fusion scheme using a cascaded framework in ripplet and NSST domain. IET Image Proc. 12(5), 696–707 (2018)
10. L. Zhan, X. Ji, CT and MR images fusion method based on nonsubsampled contourlet transform, in International Conference on Intelligent Human-Machine Systems and Cybernetics (IHMSC) (Hangzhou, China, August 2016), pp. 257–260
11. J.M. Patel, M.C. Parikh, Medical image fusion based on multi-scaling (DRT) and multiresolution (DWT) technique, in International Conference on Communication and Signal Processing (ICCSP) (Melmaruvathur, India, April 2016), pp. 0654–0657
12. G. Easley, D. Labate, W.Q. Lim, Sparse directional image representations using the discrete shearlet transform. Appl. Comput. Harmon. Anal. 25, 25–46 (2008)
13. C. Huang, G. Tian, Y. Lan et al., A new pulse coupled neural network (PCNN) for brain medical image fusion empowered by shuffled frog leaping algorithm. Front. Neurosci. 13, 2019 (2019)
14. B. Huang, F. Yang, M. Yin, X. Mo, C. Zhong, A review of multimodal medical image fusion techniques. Comput. Math. Methods Med. 2020, 1–16 (2020)
15. V. Strela, P.N. Heller, The application of multi-wavelet filter banks to image processing. IEEE Trans. Image Process. 8(4), 548–563 (1997)
16. V.P.S. Naidu, Discrete cosine transform based image fusion techniques. J. Commun. Navigation Signal Process. 1, 35–45 (2012)
17. F. Palsson, J.R. Sveinsson, M.O. Ulfarsson, J.A. Benediktsson, Model-based fusion of multi and hyper-spectral images using PCA and wavelets. IEEE Trans. Geosci. Remote Sens. 53, 2652–2663 (2015)
18. J.H. Zhai, S.F. Zhang, L.J. Liu, Image recognition based on wavelet transform and artificial neural networks, in International Conference on Machine Learning and Cybernetics (Kunming, China, July 2008), pp. 789–793
19. J.C. Pesquet, H. Krim, H. Cartantan, Time invariant orthonormal wavelet representations. IEEE Trans. Signal Process. 44, 1964–1970 (1996)
20. V.P.S. Naidu, J.R. Raol, Pixel-level image fusion using wavelets and principal component analysis. Def. Sci. J. 58(3), 338–352 (2008)
Chapter 21
Feature Selection and Deep Learning Technique for Intrusion Detection System in IoT
Bhawana Sharma, Lokesh Sharma, and Chhagan Lal
1 Introduction
The Internet of Things (IoT) is an emerging technology that is now widely used in fields such as health care, transportation, automobiles, the military and smart cities. IoT is defined as an interconnected and distributed network of things, which are physical objects such as smartphones, smart ACs, temperature sensors and actuators with limited storage and computational power. A large number of heterogeneous devices are connected in IoT networks; they generate a huge amount of data and thus bring challenges in storage, computation, network security and privacy [1–4]. Machine learning (ML) and deep learning (DL) techniques can process huge amounts of data and are therefore considered the most suitable for computation in the field of IoT [5–7]. ML techniques are applied to features selected from the data, whereas DL can learn and extract the features from the data automatically. Network security is a major concern in the field of IoT. ML- and DL-based intrusion detection systems (IDSs) detect many types of network attacks in IoT networks. ML techniques used for IDS include k-nearest neighbours and support vector machines, and DL techniques such as CNNs and auto-encoders are used for feature extraction [8–10]. The advantage of DL techniques is that they can extract features automatically and can handle huge volumes of raw data [11]. Different ML and DL techniques have been applied to intrusion detection for IoT applications [12–14]. In this paper, we apply deep learning techniques to the KDD Cup data set and find that the accuracy is 99%. Specifically, we combine correlation-based feature selection with a deep learning technique to classify attacks in the data set.
B. Sharma (B) · L. Sharma · C. Lal Manipal University Jaipur, Jaipur, India © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 R. Tiwari et al. (eds.), Proceedings of International Conference on Computational Intelligence, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-16-3802-2_21
2 Intrusion Detection System in IoT
Intrusion detection and its types. An intrusion detection system (IDS) detects intrusions or attacks in a system or a network by monitoring the operations of the network or host and analysing them with a specific algorithm, raising an alarm when an intrusion occurs [15]. Users inside the network who misuse rights they are not authorized for are internal intruders, and users outside the target network who seek unauthorized access to it are external intruders. IDS approaches are classified as network-based IDS (NIDS), which identifies malicious activities within the network, and host-based IDS (HIDS), which identifies malicious activities within the host [16]. Based on the detection technique, IDSs are of the following types:
• Signature-based IDS: This technique stores signatures or patterns of attacks in a database and detects attacks by matching against them. It is simple but very expensive and requires large storage space. It detects only the attacks that are stored in the database, so the database must be updated regularly with the signatures of new attacks.
• Anomaly-based IDS: An intrusion is detected from the behaviour of the system. Any activity that differs from the predefined normal behaviour is labelled as an intrusion. New attacks can be detected as deviations from the normal behaviour, but this approach can produce many false positives [17, 18].
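The difference between the two detection styles can be shown with a toy sketch. Everything in it is hypothetical: the signature strings, the packets-per-second baseline and the z-score threshold of 3 are illustrative choices, not values from this chapter.

```python
import statistics

# Hypothetical attack patterns stored in the signature database.
KNOWN_SIGNATURES = {"../../etc/passwd", "' OR 1=1 --"}


def signature_alert(payload, signatures=KNOWN_SIGNATURES):
    """Signature-based check: alert only if a stored pattern occurs in the payload."""
    return any(sig in payload for sig in signatures)


def anomaly_alert(value, baseline, threshold=3.0):
    """Anomaly-based check: alert if the value deviates from normal behaviour.

    baseline  : measurements recorded during normal operation
    threshold : allowed distance from the mean, in standard deviations
    """
    mean = statistics.mean(baseline)
    std = statistics.stdev(baseline)
    if std == 0:
        return value != mean
    return abs(value - mean) / std > threshold
```

A payload that is not in the database passes the signature check however harmful it is, while the anomaly check flags any large deviation, including harmless ones, which is the source of its false positives.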
3 Related Work
Machine learning and deep learning techniques are growing rapidly, and researchers have recently applied deep learning algorithms to develop intelligent intrusion detection systems that detect malicious activities in the network. Thamilarasu and Chawla developed real-time anomaly detection using sequential deep neural network models for low-power, resource-constrained IoT devices [1]. The proposed model implemented a DNN using Keras, a Python deep learning API, was deployed for IoT networks on a Raspberry Pi and was tested using the Contiki operating system and the Cooja network simulator. Diro and Chilamkurti proposed another approach for attack detection known as distributed deep learning [5]. The authors found that the distributed model performs better than the centralized model and shallow models; their experimental results show that the accuracy of attack detection increased to 99%. In [19], DoS attacks are detected by a radial basis function-based scheme consisting of several hidden layers, in which the radial basis function is combined with a weight optimization technique. The proposed model was tested on the NSL-KDD and UNSW-NB15 data sets and reported an accuracy of 99.69%.
In [20], Shone et al. proposed a deep neural network for network intrusion detection on the NSL-KDD data set. The authors combined unsupervised feature learning using auto-encoders with detection using a random forest classifier. In [21], a deep learning model for network intrusion detection with five hidden layers of ten neurons each was trained on the UNSW-NB15 data set. The model was trained for ten epochs with tenfold cross-validation, and the features were ranked by the Gedeon method. The proposed scheme reported an accuracy of about 99%. In [22], an intelligent anomaly-based detection system, AD-IoT, is introduced, in which the random forest machine learning algorithm is applied to the UNSW-NB15 data set with 12 features selected for training the model. The model achieved the highest classification accuracy of 99.34% with the lowest false positive rate.
4 Proposed Framework
We propose a framework for an intrusion detection system in IoT networks using deep learning techniques, organized into four phases: feature selection, feature pre-processing, training and testing of the model, as shown in Fig. 1. The data set is collected from the network, and the most promising features are then selected from it. The KDD'99 data set is used as a standard data set for evaluating IDSs. The NSL-KDD data set contains 41 features and one label showing normal/abnormal attacks [23].
Fig. 1 Workflow of the model
• Data Pre-processing: The NSL-KDD data set contains 41 features (34 continuous, 3 nominal and 4 binary) and one label showing normal/abnormal attacks. We design a data pre-processing step that first converts the features into integer values. The symbolic features are protocol type, flag and service, which are converted using label encoding. For example, the protocol type has three values {TCP, UDP, ICMP}, which are converted into integer values.
• Feature Selection: We compute the correlation between each pair of variables; a large positive or negative value indicates that two variables are strongly correlated, which gives an overview of the variables that might affect the target variable. The correlation map identifies the highly correlated variables, which are then dropped. The features 'srv_count' and 'count' are strongly correlated with each other, so one of them has to be dropped during model building, since together they would introduce multicollinearity in the data.
• Feature pre-processing: The data remaining after encoding and dropping features are rescaled and then split into training and testing sets: 75% of the data is used for training and 25% for testing. Both sets contain labels marking each packet as malicious or normal. The accuracy of the trained model is verified against the labels in the testing data.
• Training and Testing: In the training phase, the processed data from the training set is fed into the deep neural network model. When new data arrives, it is input to the trained neural network model, which classifies each packet as malicious or benign. Training accuracy is calculated during the training phase. The trained neural network model is then tested on the testing data set, where it predicts the normal or malicious label. For multi-class classification, the trained model labels the data with the category to which it belongs. Testing accuracy is calculated during the testing phase.
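The label encoding and correlation-based feature selection steps described above can be sketched in plain Python as follows. The function names, the alphabetical ordering of the integer codes and the 0.9 correlation threshold are assumptions made for this example; the chapter does not state the threshold it used.

```python
import math


def label_encode(values):
    """Map each distinct symbol to an integer code (alphabetical order)."""
    mapping = {symbol: code for code, symbol in enumerate(sorted(set(values)))}
    return [mapping[v] for v in values], mapping


def pearson(x, y):
    """Pearson correlation between two equally long numeric columns."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no correlation information
    return cov / math.sqrt(var_x * var_y)


def drop_correlated(columns, threshold=0.9):
    """Keep a feature only if it is not strongly correlated with one already kept.

    columns : dict mapping feature name to its list of values
    Returns the list of feature names that survive.
    """
    kept = []
    for name, values in columns.items():
        if all(abs(pearson(values, columns[k])) < threshold for k in kept):
            kept.append(name)
    return kept
```

When two features are strongly correlated, this sketch keeps whichever appears first in the dictionary, so the column order decides which member of a pair such as 'count' and 'srv_count' is dropped.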
5 Evaluation and Analysis
In this section, we describe the data set used and its processing in terms of data pre-processing, feature selection and model training, followed by the results and analysis.
• Data Description
Of the different data sets available for the evaluation of IDSs, the NSL-KDD data set is a standard data set used by researchers. The KDD Cup data set contains 41 features (34 continuous, 3 nominal and 4 binary) and one label showing normal/abnormal attacks.
• Data pre-processing
We uploaded the KDD data set file with 41 features to Google's Colaboratory. We applied label encoding to convert the symbolic features into integer values. The symbolic features {'protocol_type', 'service', 'flag', 'label'} have {3, 66, 11, 23} categories, respectively.
• Feature Selection
After data pre-processing, we compute the correlation between the features of the data set to find which features are highly correlated. The correlation map identifies the highly correlated variables, which are then dropped. The features 'srv_count' and 'count' have a correlation of 0.94, which shows that they are strongly correlated with each other, so one of them must be dropped, since together they would introduce multicollinearity in the data.
• Feature pre-processing
After feature selection, we pre-process the features to be fed into the deep learning model. The data set with the extracted features is stored in Google's Colaboratory. We split the data set into a 75% training set and a 25% testing set, containing 370,515 training samples with 40 features and 123,504 testing samples, and the feature values are normalized.
• Experimental Set-up
We conducted our experiment on Google's Colaboratory using the TensorFlow library and Keras. The training set is used to train the deep neural network (DNN) model, which consists of dense hidden layers with 128 neurons and a last layer with 23 neurons, one per class. The hidden layers use the ReLU activation function and the last layer uses softmax. Weights are updated with the Adam optimizer and a sparse categorical cross-entropy loss function. The DNN model is tuned with various hyperparameters and with dropout and regularization techniques. We conducted the experiment with L2 regularization with a weight decay of 0.01 and a dropout rate of 0.001.
Different architectures of the deep neural network were implemented, varying the number of dense hidden layers and the number of neurons in each, while keeping the last layer with the softmax function and 23 neurons to classify multiple classes. However, the classification accuracy of the different deep neural network models is the same. The Adam optimizer was applied with learning rates of 0.01, 0.001 and 0.0001, and we found that the default learning rate of 0.001 gives the highest accuracy. The DNN model was trained with different numbers of epochs, and the number was then set to 20. The DNN model is evaluated on the test set.
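A Keras model matching this description might look like the sketch below. The values in `HYPERPARAMS` are taken from the text except for the number of hidden layers, which is not stated and is assumed to be two here; placing a dropout layer after every hidden layer is also an assumption. TensorFlow is imported inside the function so that the configuration can be inspected without it installed.

```python
# Hyperparameters reported in the text; "n_hidden_layers" is an assumption.
HYPERPARAMS = {
    "n_features": 40,        # features left after dropping one correlated column
    "n_classes": 23,         # categories of the 'label' column
    "units": 128,            # neurons per dense hidden layer
    "n_hidden_layers": 2,    # not given in the text; assumed for illustration
    "weight_decay": 0.01,    # L2 regularization factor
    "dropout_rate": 0.001,
    "learning_rate": 0.001,  # Adam default
    "epochs": 20,
}


def build_model(params=HYPERPARAMS):
    """Build and compile a dense softmax classifier (requires TensorFlow 2.x)."""
    from tensorflow import keras
    from tensorflow.keras import layers, regularizers

    model = keras.Sequential()
    model.add(keras.Input(shape=(params["n_features"],)))
    for _ in range(params["n_hidden_layers"]):
        model.add(layers.Dense(
            params["units"],
            activation="relu",
            kernel_regularizer=regularizers.l2(params["weight_decay"]),
        ))
        model.add(layers.Dropout(params["dropout_rate"]))
    model.add(layers.Dense(params["n_classes"], activation="softmax"))
    model.compile(
        optimizer=keras.optimizers.Adam(learning_rate=params["learning_rate"]),
        loss="sparse_categorical_crossentropy",  # integer class labels
        metrics=["accuracy"],
    )
    return model
```

Training would then call `model.fit(x_train, y_train, epochs=HYPERPARAMS["epochs"])` with integer-encoded labels, where `x_train` and `y_train` are placeholder names for the normalized training features and their labels.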
• Results and Analysis
Here, we discuss the results of the multi-class classification deep neural network model in terms of accuracy and the confusion matrix. To evaluate the model, the following metrics are calculated: accuracy, F1 score, precision and recall.
• True positives (TP) are packets correctly predicted as malicious.
• True negatives (TN) are packets correctly predicted as benign.
• False positives (FP) are benign packets predicted as malicious.
• False negatives (FN) are malicious packets predicted as benign.
Precision (P) measures the proportion of packets predicted as malicious that are actually malicious. Recall (R) measures the proportion of malicious packets that were identified correctly. The F-measure (F1 score) is the harmonic mean of precision and recall. Accuracy measures the fraction of all packets that were classified correctly as benign or malicious.
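The four metrics follow directly from the confusion counts defined above. The sketch below uses the standard definitions; the function name and the convention of returning 0.0 when a denominator is zero are choices made for this example.

```python
def classification_metrics(tp, tn, fp, fn):
    """Compute precision, recall, F1 score and accuracy from confusion counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    else:
        f1 = 0.0
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    return {
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "accuracy": accuracy,
    }
```

For a multi-class model, these quantities are typically computed per class by treating that class as "malicious" and all others as "benign", and then averaged.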
Our model, with a learning rate of 0.001, a dropout rate of 0.001 and a weight decay of 0.01, achieved a training and testing accuracy of 98.25%, as shown in Fig. 2. Among the normal samples, 920 were falsely classified as attacks (false positives). We also applied a KNN model to the data set and compared its accuracy and runtime with those of the proposed DNN model. The highest accuracy of the KNN model, with five nearest neighbours, is 99%, and its runtime is longer than that of the DNN model.
6 Conclusions and Future Work
Deep learning is an intelligent technique for intrusion detection systems in IoT networks. Different techniques such as CNNs and auto-encoders are used for feature extraction, pre-processing and classification. In this paper, we found that a deep neural network applied to the data set gives higher accuracy than the KNN model, with reduced runtime. As the number of epochs and dense layers increases, we can achieve higher accuracy with reduced loss. However, for multi-class classification, the number of false positives is a major concern. Our future work is to apply different deep learning techniques for feature extraction and to compare their accuracy, with the aim of reducing the number of false positives. New, unknown attacks need new algorithms to detect them. IoT networks generate a huge amount of data and need more computation and faster responses, so we need techniques that work fast on resource-constrained devices. Real-time data in IoT networks needs real-time detection.
Fig. 2 a Accuracy versus epochs with different learning rates, b loss versus epochs with different number of dense layers, c accuracy versus epochs with regularized model having weight decay 0.01 and dropout rate 0.001, d loss versus epochs with regularized model
References
1. G. Thamilarasu, S. Chawla, Towards deep-learning-driven intrusion detection for the Internet of Things. Sensors (Basel, Switzerland) 19(9) (2019)
2. D. Kwon, H. Kim, I. Kim, K.J. Kim, J. Kim, S.C. Suh, A survey of deep learning-based network anomaly detection. Clust. Comput. (2017). https://doi.org/10.1007/s10586-017-1117-8
3. E. Anthi, L. Williams, M. Słowi, G. Theodorakopoulos, P. Burnap, A Supervised Intrusion Detection System for Smart Home IoT Devices, 4662(c) (2019), pp. 1–13. https://doi.org/10.1109/JIOT.2019.2926365
4. J. Lloret, J. Tomas, A. Canovas, L. Parra, An integrated IoT architecture for smart metering. IEEE Commun. Mag. 54(12), 50–57 (2016)
5. A.A. Diro, N. Chilamkurti, Distributed attack detection scheme using deep learning approach for Internet of Things. Future Generation Comput. Syst. 82, 761–76 (2018)
6. A.A. Diro, N. Chilamkurti, Distributed attack detection scheme using deep learning approach for Internet of Things. Futur. Gener. Comput. Syst. 82, 761–776 (2018)
7. R. Kozik, M. Choraś, M. Ficco, F. Palmieri, A scalable distributed machine learning approach for attack detection in edge computing environments. J. Parallel Distrib. Comput. 119, 18–26 (2018)
8. M. Roopak, P. Gui, Y. Tian, P.J. Chambers, Deep learning models for cyber security in IoT networks, in 2019 IEEE 9th Annual Computing and Communication Workshop and Conference (CCWC) (2020), pp. 452–457. https://doi.org/10.1109/CCWC.2019.8666588
9. F. Hussain, R. Hussain, S.A. Hassan, E. Hossain, C.R. Mar, (n.d.). Machine Learning in IoT Security: Current Solutions and Future Challenges, pp. 1–23
10. N. Chaabouni, M. Mosbah, A. Zemmari, C. Sauvignac, P. Faruki, Network intrusion detection for IoT security based on learning techniques, in IEEE Communications Surveys & Tutorials, PP(0), 1 (2020). https://doi.org/10.1109/COMST.2019.2896380
11. D. Li, L. Deng, M. Lee, H. Wang, IoT data feature extraction and intrusion detection system for smart cities based on deep migration learning. Int. J. Information Manage. (March), 0–1 (2019)
12. M. Ishaque, Feature extraction using deep learning for intrusion detection system, in 2019 2nd International Conference on Computer Applications & Information Security (ICCAIS) (2019), pp. 1–5
13. S. Fenanir, F. Semchedine, A. Baadache, A machine learning-based lightweight intrusion detection system for the Internet of Things. Revue d'Intelligence Artificielle 33(3), 203–211 (2019)
14. T. Issa, C. Science, K. Tiemoman, C. Science, Intrusion Detection System based on the SDN Network, Bloom Filter and Machine Learning 10(9), 406–412 (2019)
15. E. Hodo, X. Bellekens, A. Hamilton, P. Dubouilh, E. Iorkyase, C. Tachtatzis, R. Atkinson, Threat analysis of IoT networks Using Artificial Neural Network Intrusion Detection System (2020), pp. 4–9
16. B. Sharma, L. Sharma, C. Lal, Anomaly detection techniques using deep learning in IoT: a survey, in 2019 International Conference on Computational Intelligence and Knowledge Economy (ICCIKE)
17. R.K. Malaiya, D. Kwon, S.C. Suh, H. Kim, I. Kim, J. Kim, S. Member, An empirical evaluation of deep learning for network anomaly detection. IEEE Access 7, 140806–140817 (2019). https://doi.org/10.1109/ACCESS.2019.2943249
18. N. Sven, Unsupervised anomaly based botnet detection in IoT networks, in 2018 17th IEEE International Conference on Machine Learning and Applications (ICMLA) (2018), pp. 1048–1053. https://doi.org/10.1109/ICMLA.2018.00171
19. N.G.B. Amma, S. Selvakumar, Deep radial intelligence with cumulative incarnation approach for detecting denial of service attacks. Neurocomputing 340, 294–308 (2019)
20. N. Shone, T.N. Ngc, V.D. Phai, Q. Shi, A deep learning approach to network intrusion detection. IEEE Trans. Emerging Topics Comput. Intelligence 2, 41–50 (2018)
21. M. Al-Zewairi, S. Almajali, A. Awajan, Experimental evaluation of a multi-layer feed-forward artificial neural network classifier for network intrusion detection system, in Proceedings of the 2017 International Conference on New Trends in Computing Sciences (ICTCS) (2017), pp. 167–172
22. I. Alrashdi, A. Alqazzaz, AD-IoT: Anomaly detection of IoT cyberattacks in smart city using machine learning, in 2019 IEEE 9th Annual Computing and Communication Workshop and Conference (CCWC) (2019), pp. 305–310
23. M. Ge, X. Fu, N. Syed, Z. Baig, G. Teo, A. Robles-kelly, Deep learning-based intrusion detection for IoT networks, in 2019 IEEE 24th Pacific Rim International Symposium on Dependable Computing (PRDC) (2019), pp. 256–25609. https://doi.org/10.1109/PRDC47002.2019.00056
Chapter 22
Mobile Element Based Energy Efficient Data Aggregation Technique in Wireless Sensor Networks—Bridging Gap Between Virtuality and Reality Bhat Geetalaxmi Jairam
1 Introduction
In a wireless sensor network (WSN), every sensor node is battery operated, and every transmission consumes energy [1]. In a clustered architecture, all sensor nodes send data to the cluster head, which in turn sends the data to the base station (sink). As the number of transmissions increases, the energy consumed by the cluster heads in transmitting data to the base station increases [2]. Furthermore, cluster heads may transmit redundant data to the base station, which consumes energy unnecessarily [3]. This can shorten the lifetime of the WSN [4] and is a major drawback of WSNs. The author attempts to overcome this drawback using the proposed data aggregation technique.
2 Data Aggregation Framework for Energy Efficiency
2.1 Overview
This section introduces a novel data aggregation protocol for WSNs. In the proposed approach, the author uses a mobile node that collects aggregated data from the cluster heads and sends it to the sink, which helps to reduce the energy consumption of the entire network [5]. The proposed work is divided into three phases and implemented using two techniques: one uses iSense modular wireless sensor hardware, and the other uses the QualNet GUI simulator.
• In Phase I, CHs forward data to the sink node without aggregation.
• In Phase II, CHs forward data to the sink node with aggregation.
• In Phase III, CHs forward data to the mobile node with aggregation. The mobile node collects data from CH1 and then traverses towards CH2, where the second level of aggregation is done. The mobile node sends this aggregated data to the sink node in order to save the energy of the sink node [6].
B. G. Jairam (B) Department of Information Science and Engineering, The National Institute of Engineering, Mysuru, Karnataka, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 R. Tiwari et al. (eds.), Proceedings of International Conference on Computational Intelligence, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-16-3802-2_22
2.2 Data Aggregation Framework Using iSense Motes
Here, the author has set up a test bed with a three-tier architecture using the iSense modular wireless sensor hardware and software system from coalesenses, which senses temperature. The modules and motes of the iSense hardware are described below [7]. Figure 1 shows the iSense core module, which is based on a Jennic JN5148 wireless microcontroller [8], a chip that combines the controller and the wireless communication transceiver in a single housing. The controller provides 32-bit RISC computation and runs at a software-scalable frequency between 4 and 32 MHz. It comprises 128 kB of memory that is shared by programme code and data. The advantage of this choice is that the memory consumption of programme code and data can be traded against each other.
Fig. 1 Core module (CM30X)
Figure 2 shows the iSense environmental sensor module, which combines a thermometer and a light sensor for environmental monitoring. Both sensors are accessed via the I2C serial interface. Figure 3 shows the iSense gateway module 2 (GM20), which provides connectivity with other systems such as personal computers using USB. It enables data exchange as well as serial programming of connected core modules. The USB connector can also be used to power other attached iSense modules, including charging the rechargeable battery module.
Fig. 2 Environmental sensor module (EM 10)
Fig. 3 Gateway module (GM20)
Fig. 4 iSense mote
Figure 4 shows the source node, which integrates EM 10 + GM 20 + CM30X + a 2× AA battery holder. The cluster head, sink node and mobile node (which is mounted on a remote-controlled car in the test bed) each integrate GM 20 + CM30X + a 2× AA battery holder.
2.2.1 Test Bed Configuration for Three Different Phases
This architecture has two clusters. Each cluster consists of two source nodes (temperature sensors) and one cluster head (CH) [9]. In the three-tier architecture, the first level consists of the source nodes, the second level consists of the cluster heads (CHs), and the third level is the sink node, which is interfaced to a laptop/PC. In addition, there is one mobile node (remote car), which acts as an intermediate node between the CHs and the sink node (used only in Phase III). According to the iSense data, the entire set up can work over a line of sight (LOS) of 100 m [10, 11]. This test bed was tested over a 10 m LOS in open space. All the sensor nodes are battery powered. Source nodes periodically read the temperature value, which is the data for this experiment. Every sensor node loses 16 mAh of its charge when receiving data and 14 mAh of its charge when transmitting data. When the charge in a CH reaches the threshold value (0 mAh), it neither receives data from the sensor nodes nor transmits data to the sink node. Similarly, when the charge in the sink node reaches the threshold value (0 mAh), it stops receiving data from the CHs. Table 1 shows the layers of the three-tier architecture.
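The charge accounting described above can be sketched in a few lines. This is a minimal illustration rather than the firmware flashed on the motes: the `Node` class and `relays_until_depleted` helper are hypothetical names, and the initial charge of 1200 mAh is taken from the "Set charge value to 1200" step in the algorithms that follow.

```python
# Per-operation costs and threshold as stated in the text.
RX_COST_MAH = 16          # charge lost when receiving data
TX_COST_MAH = 14          # charge lost when transmitting data
THRESHOLD_MAH = 0         # a node stops working at this charge
INITIAL_CHARGE_MAH = 1200  # assumed from the algorithms' "Set charge value to 1200"


class Node:
    """Hypothetical battery-powered node with simple charge bookkeeping."""

    def __init__(self, name, charge=INITIAL_CHARGE_MAH):
        self.name = name
        self.charge = charge

    def is_active(self):
        return self.charge > THRESHOLD_MAH

    def _spend(self, cost):
        # Refuse the operation once the threshold has been reached.
        if not self.is_active():
            return False
        self.charge = max(THRESHOLD_MAH, self.charge - cost)
        return True

    def receive(self):
        return self._spend(RX_COST_MAH)

    def transmit(self):
        return self._spend(TX_COST_MAH)


def relays_until_depleted(node):
    """Count receive-then-forward cycles a cluster head completes before it stops."""
    count = 0
    while node.receive() and node.transmit():
        count += 1
    return count


if __name__ == "__main__":
    print(relays_until_depleted(Node("CH1")))
```

Under these assumptions every relayed reading costs a CH 30 mAh, which is why reducing the number of forwarded messages directly extends node lifetime.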
Table 1 Layers of three-tier architecture
Layers | Node types | Numbers
Tier 1 | Sensor nodes | 4
Tier 2 | Cluster heads | 2
Tier 3 | Sink | 1
(a) Phase I
The test bed configuration is the same as described earlier. Each source node senses the temperature periodically and transmits it to its CH. The CHs in turn transmit this temperature to the sink node. Each sensor node is flashed with the appropriate algorithm listed below. Periodically, the charge (mAh) is measured at the CHs and the sink node. Figure 5 shows the three-tier architecture with two cluster heads and one sink, without a mobile node and without data aggregation at the cluster heads. Figure 6 shows the set up with two CHs without data aggregation and one sink node. The following algorithms run at the different layers in Phase I to sense and transmit data (temperature) from the sensor nodes to the CHs and from the CHs to the sink node.
Fig. 5 Three-tier architecture without mobile node and without data aggregation at CHs
Fig. 6 Test bed for Phase I
Algorithm 1: Algorithm for source node-1
Input: Measure the temperature t1.
Output: t1 (temperature sent from source node 1 to cluster head-1).
Call task to read the sensor
Send measured temperature t1 to cluster head-1
Task repeats every 10 s
Algorithm 2: Algorithm for source node-2
Input: Measure the temperature t2.
Output: t2 (temperature sent from source node 2 to cluster head-1).
Call task to read the sensor
Send measured temperature t2 to cluster head-1
Task repeats every 11 s
Algorithm 3: Algorithm for source node-3
Input: Measure the temperature t3.
Output: t3 (temperature sent from source node 3 to cluster head-2).
Call task to read the sensor
Send measured temperature t3 to cluster head-2
Task repeats every 13 s
Algorithm 4: Algorithm for source node-4
Input: Measure the temperature t4.
Output: t4 (temperature sent from source node 4 to cluster head-2).
Call task to read the sensor
Send measured temperature t4 to cluster head-2
Task repeats every 15 s
Algorithm 5: Algorithm for cluster head-1
Input: t (temperature from source 1 and source 2).
Output: 1. Measure charge in mAh at cluster head 1 for every time interval. 2. T (temperature obtained from source node).
Receive temperature t
Send the temperature value to sink node
Algorithm 6: Algorithm for cluster head-2
Input: t (temperature from source 3 and source 4).
Output: 1. Measure charge in mAh at cluster head 2 for every time interval. 2. T (temperature obtained from source node).
Receive temperature t
Send the temperature value to sink node
Algorithm 7: Algorithm for Sink Node
Input: t (temperature from cluster head 1 and cluster head 2).
Output: 1. Measure charge in mAh at sink node for every time interval. 2. T (final temperature at sink node).
Set charge value to 1200
Receive temperature t till charge becomes zero
Decrement the charge value by 14
Display the charge
(b) Phase II
The test bed configuration is the same as described earlier. This configuration is modified by incorporating data aggregation at the CHs, as shown in Fig. 7, which improves energy efficiency. The CHs in turn transmit this aggregated data to the sink node. Periodically, the charge (mAh) is measured at the CHs and the sink node. The algorithms for sensor nodes 1 to 4 and the sink node are the same as in Phase I. The following algorithm for data aggregation is incorporated to add intelligence at the CHs and improve energy efficiency.
Fig. 7 Three-tier architecture with two cluster heads and one sink, without mobile node and with data aggregation at cluster heads
Algorithm 8: Improving Energy Efficiency of the Sink Node by Data Aggregation Method at CH-1 and CH-2
Input: t1 (temperature from source 1) and t2 (temperature from source 2); t3 (temperature from source 3) and t4 (temperature from source 4).
Output: 1. Measure charge in mAh at CH1 and CH2 for every time interval. 2. T1 and T2 are temperatures obtained after averaging.
1. Initialize count = 0
2. Set charge value to 1200
3. Receive temperature t1, t2 and t3, t4 till charge becomes zero
4. Decrement the charge value by 14, i.e. charge = charge - 14
5. Display the charge
6. If (count < 1)
7. Receive temperature t1 (t3) and increment count by one
8. Else
9. Receive temperature t2 (t4) and decrement count by one
10. Compute temperature T1 = (t1 + t2)/2 and T2 = (t3 + t4)/2
11. Send temperature T1 and T2 to sink
12. End if
(c) Phase III
The test bed configuration is the same as described earlier, with the addition of a mobile agent (remote car) that acts as an intermediate node between the CHs and the sink node, as shown in Fig. 8. This configuration incorporates two-level data aggregation, the first level at the CHs and the second level at the mobile node, to further improve energy efficiency. The mobile node visits CH1 and collects its data, then moves on to CH2, and in turn transmits the aggregated data to the sink node. Periodically, the charge (mAh) is measured at the CHs and the sink node [12]. Here, the round-trip time (RTT) of the mobile agent is set to 90 s. When the mobile node visits a CH that has no data in its buffer, the mobile node goes into a sleep cycle to conserve its energy. Figure 9 shows the set up with the mobile node and data aggregation at the cluster heads and at the mobile node. The algorithms for sensor nodes 1 to 4 and the sink node are the same as in Phase I. The algorithm for the CHs is the same as in Phase II. The following algorithm is for the intermediate mobile node with a sleep/awake cycle, which further improves the energy efficiency of the entire network.
Fig. 8 Three-tier architecture with mobile node and data aggregation and redundancy removal at CHs and at mobile node
Fig. 9 Test bed for Phase III
Algorithm 9: Improving Energy Efficiency by Introducing Intermediate Mobile Node with sleep/awake cycle
Input: t1 (temperature from cluster head-1), t2 (temperature from cluster head-2).
Output: T (temperature obtained by averaging t1 and t2 at mobile node).
Mobile node in awake cycle.
1. Initialize count = 0
2. Set charge value to 1200
3. Receive temperature t1 and t2 till charge becomes zero
4. Decrement the charge value by 14
5. Display the charge
6. If (count < 1)
7. Receive temperature t1 when the mobile node is within range of cluster head 1, and increment count by one
8. Else
9. Receive temperature t2 when the mobile node is within range of cluster head 2, and decrement count by one
10. Compute temperature T = (t1 + t2)/2
11. Send temperature T to sink node
12. Set round-trip time (RTT) = 90 s
13. When the mobile node visits the CHs, if a CH does not have any data in its buffer, make the mobile node go into the sleep cycle; else continue
14. End if
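The effect of the two aggregation levels on the traffic reaching the sink can be illustrated with a short sketch. It is not the mote firmware: the function names are hypothetical, one reading per source per round is assumed, and treating an empty CH buffer as "send nothing this round" is one possible reading of the sleep step in Algorithm 9.

```python
from statistics import mean


def ch_aggregate(readings):
    """First-level aggregation at a cluster head: average of its sources (Algorithm 8)."""
    return mean(readings)


def mobile_aggregate(ch_values):
    """Second-level aggregation at the mobile node (Algorithm 9).

    Returns None when any CH buffer is empty, modelling the sleep cycle
    (an assumption about how the sleep step behaves).
    """
    if any(value is None for value in ch_values):
        return None
    return mean(ch_values)


def sink_messages(phase, cluster_readings):
    """Values delivered to the sink in one round for Phase 1, 2 or 3."""
    if phase == 1:
        # No aggregation: every source reading is forwarded.
        return [t for cluster in cluster_readings for t in cluster]
    first_level = [ch_aggregate(cluster) for cluster in cluster_readings]
    if phase == 2:
        return first_level
    if phase == 3:
        fused = mobile_aggregate(first_level)
        return [] if fused is None else [fused]
    raise ValueError("phase must be 1, 2 or 3")


if __name__ == "__main__":
    readings = [[24.0, 26.0], [28.0, 30.0]]  # illustrative temperatures only
    for phase in (1, 2, 3):
        print(phase, sink_messages(phase, readings))
```

With two clusters of two sources each, the sink handles four messages per round in Phase I, two in Phase II and one in Phase III, which is the mechanism behind the reduced charge consumption at the sink.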
2.3 Data Aggregation Framework Using QualNet
Here, the author has set up a test bed with a three-tier architecture using the QualNet simulator.
Setup requirements:
• Total number of nodes: 10
• Number of nodes/end devices: 6
• Number of cluster heads: 2
• Number of sink nodes: 1
• Number of mobile nodes: 1
Scenario description:
1. Library used: sensor networks
2. Terrain size: 500 × 500
3. Simulation time: 1000 s
4. Energy model: generic
5. Battery model: linear model
Test bed configuration for three different phases
(a) Phase I
Here, each source node senses data periodically and transmits it to its CH. The CHs in turn transmit this data to the sink.
Snapshot 1: Test Bed for Phase I using QualNet
Snapshot 1 shows the set up with two cluster heads and one sink, without data aggregation at the cluster heads.
(b) Phase II
This configuration is modified by incorporating data aggregation at the CHs, which improves energy efficiency [5]. The CHs in turn transmit this aggregated data to the sink node.
Snapshot 2: Test Bed for Phase II
Snapshot 2 shows the set up with two cluster heads and one sink, with data aggregation at the cluster heads and without a mobile node.
(c) Phase III
Here, a mobile node acts as an intermediate node between the CHs and the sink. This configuration incorporates two-level data aggregation, the first level at the CHs and the second level at the mobile node, to further improve energy efficiency. The mobile node visits CH1 and collects its data, then moves on to CH2, and in turn transmits the aggregated data to the sink node [6, 13].
Snapshot 3: Test Bed for Phase III
Snapshot 3 shows the set up with the mobile node and with data aggregation and redundancy removal at the CHs and at the mobile node.
3 Results and Discussion
3.1 Observation Using iSense Motes
Figure 10 shows how the charge in CH1 decreases over each time interval as it transmits and receives data. It compares the charge at CH1 for Phase I (without data aggregation), Phase II (with data aggregation) and Phase III (with data aggregation and mobile node). Figure 11 shows the same comparison for CH2. Figure 12 shows how the charge in the sink decreases over each time interval as it receives data, again comparing Phase I, Phase II and Phase III.
Fig. 10 Time interval versus charge at CH1
Fig. 11 Time interval versus charge at CH2
Fig. 12 Time interval versus charge at sink
3.2 Observation Using QualNet
During simulation, the charge at CH1, CH2 and the sink was measured in every phase, and the respective graphs were plotted. Figure 13 shows the charge consumed at the sink, Fig. 14 the charge consumed at cluster head 1, and Fig. 15 the charge consumed at cluster head 2. In each graph of charge versus time, the charge consumption decreases from Phase I to Phase III, with Phase III giving the lowest charge consumption.
Fig. 13 Charge versus time at sink
Fig. 14 Charge versus time at cluster head 1
Fig. 15 Charge versus time at cluster head 2
4 Comparative Analysis of All Three Phases
4.1 Using iSense Motes
Figure 16 shows a comparison of the average charge at the sink node in every phase using iSense motes.
Fig. 16 Average charge versus phases (I, II and III) using iSense motes
4.2 Using QualNet Simulator
Figure 17 shows a comparison of the average charge at the sink node in every phase using the QualNet simulator (Table 2).
Fig. 17 Average charge versus phases (I, II and III) using QualNet simulator
1. Comparing Phase I and Phase II, the charge, i.e. energy, consumed in Phase II is less than in Phase I.
2. Comparing Phase II and Phase III, the charge, i.e. energy, consumed in Phase III is less than in Phase II.
Based on the above observations using iSense motes, the Phase III approach (data aggregation with mobile node) consumes 6.5% less charge (mAh), i.e. energy, than the Phase II approach (data aggregation without mobile node). Similarly, based on the observations using QualNet, the Phase III approach (data aggregation with mobile node) consumes 22.5% less charge (mWhr), i.e. energy, than the Phase II approach (data aggregation without mobile node). This helps improve the lifetime of the WSN.
Table 2 Comparative analysis of average charge at sink node using iSense motes and QualNet
Phases | Data (temperature) transmission from source to destination | Average charge (mAh) at sink node using iSense motes | Average charge (mWhr) at sink node using QualNet simulator
Phase I | Sensor nodes to CHs; CHs to sink node | 1224 | 9069
Phase II (data aggregation at CHs) | Sensor nodes to CHs; CHs (aggregates data) to sink node | 1339 | 12,733
Phase III (data aggregation at CHs and at mobile node with sleep schedule) | Sensor nodes to CHs; CHs (the first-level data aggregation) to mobile node; mobile node (second-level data aggregation) to sink | 1430 | 16,433
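The percentages quoted in the comparison can be checked against the Table 2 values. The paper does not state which baseline was used, so the sketch below makes an assumption: it expresses the Phase III versus Phase II difference as a fraction of the Phase III value, which reproduces the QualNet figure and comes close to the iSense one. The function name `relative_gain` is hypothetical.

```python
# Average charge at the sink node, copied from Table 2.
AVG_CHARGE = {
    "iSense (mAh)": {"Phase I": 1224, "Phase II": 1339, "Phase III": 1430},
    "QualNet (mWhr)": {"Phase I": 9069, "Phase II": 12733, "Phase III": 16433},
}


def relative_gain(later, earlier):
    """Difference between two phases as a percentage of the later phase's value.

    Choosing the later phase as the baseline is an assumption made here.
    """
    return 100.0 * (later - earlier) / later


if __name__ == "__main__":
    for platform, charge in AVG_CHARGE.items():
        gain = relative_gain(charge["Phase III"], charge["Phase II"])
        print(f"{platform}: Phase III vs Phase II = {gain:.1f}%")
```

Under this reading the QualNet values give about 22.5% and the iSense values about 6.4%, the latter slightly below the 6.5% stated in the text.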
5 Conclusion
In this paper, the author has implemented data aggregation with a mobile node for energy efficiency in wireless sensor networks. Comparing the results of Phase I (without data aggregation) and Phase II (with data aggregation) shows that energy consumption is lower in Phase II because data aggregation is done at both cluster heads. Comparing the results of Phase II (with data aggregation) and Phase III (with mobile node and data aggregation at the cluster heads and the mobile node) shows that energy consumption is lower in Phase III because data aggregation is done at two levels, first at the two cluster heads and then at the mobile node, before the data (temperature) is sent to the sink. This ultimately improves the lifetime of the wireless sensor network.
6 Future Work
Future work can concentrate on using sensors with enhanced computational power, upgraded memory and different communication capabilities. Heterogeneous embedded systems could be used for prototyping and to address many commercial and social causes.
If the mobile node fails while traversing the network according to the visiting schedule, data will be lost. To avoid this, the research can be extended to handle mobile node failure.
Acknowledgements This work is sponsored and supported by a grant from VGST, Government of Karnataka (GRD-128). The author wishes to thank Dr. S. Ananth Raj, Consultant, VGST, and Prof. G.L. Shekar, Principal, NIE, for their encouragement in pursuing this research work.
References
1. I. Stojmenovic, M. Seddigh, J. Zunic, Dominating sets and neighbour elimination based broadcasting algorithms in wireless networks. IEEE Trans. Parallel Distrib. Syst. 13(1), 14–25 (2002)
2. M. Zhao, M. Ma, Y. Yang, Mobile data gathering with space division multiple access in wireless sensor networks, in Proceedings of the Twenty Seventh International Conference on Computer Communications, IEEE INFOCOM (2008), pp. 1958–1965
3. B.G. Jayram, D.V. Ashoka, Merits and demerits of existing energy efficient data gathering techniques for wireless sensor networks. Int. J. Comput. Appl. 66(9), 15–22 (2013)
4. J. Al-Karaki, A. Kamal, Routing techniques in wireless sensor networks: a survey. J. IEEE Wireless Commun. 11, 6–28 (2004)
5. E.M. Saad, M.H. Awadalla, M.A. Saleh, H. Keshk, R.R. Darwish, A data gathering algorithm for a mobile sink in large scale sensor networks, in Proceedings of the Tenth International Conference on Mathematical Methods and Computational Techniques in Electrical Engineering, Bulgaria (2008), pp. 288–294
6. B.I. Yanzhong, L. Sun, J. Ma, N. Li, I. Ali Khan, C. Chen, HUMS: an autonomous moving strategy for mobile sinks in data gathering sensor networks. EURASIP J. Wireless Commun. Netw. 1–15 (2007)
7. https://www.quarbz.com/Wireless%20Sensor%20Network/2.%20iSense%20Devices%20and%20Modules.pdf
8. https://datasheet.octopart.com/JN5148/001%2C531-NXP-Semiconductors-datasheet-12513945.pdf
9. W. Heinzelman, A. Chandrakasan, H. Balakrishnan, Energy efficient communication protocol for wireless microsensor networks. Proc. Thirty Third Int. Conf. Syst. Sci. 2, 1–10 (2000)
10. G. Pottie, W. Kaise, Wireless integrated network sensors. Commun. ACM 43(5), 51–58 (2004)
11. M. Xiang, Energy-efficient intra-cluster data gathering of wireless sensor networks. Int. J. Hybrid Information Technol. J. Netw. 5(3), 383–390 (2010)
12. C. Kavitha, K.V. Viswanatha, A pull based energy efficient data aggregation scheme for wireless sensor networks. Int. J. Comput. Appl. 28(11), 48–54 (2010)
13. B.G. Jairam, D.V. Ashoka, Multiple mobile elements based energy efficient data gathering technique in wireless sensor networks, in Digital Business, vol. 21, ed. by S. Patnaik, X.S. Yang, M. Tavana, F. Popentiu-Vlădicescu, F. Qiao. Lecture Notes on Data Engineering and Communications Technologies (Springer, Cham, 2019). https://doi.org/10.1007/978-3-319-93940-7_12
Chapter 23
Implementing Development Status Scheme Based on Vehicle Ranging Technology Chun-Yuan Ning, Jun-Jie Shang, Shi-Jie Jiang, Duc-Tinh Pham, and Thi-Xuan-Huong Nguyen
1 Introduction
Traffic incidents have become common with the rise in car ownership at home and abroad [1]. Self-driving technology has attracted extensive attention, which has helped to advance it. Distance measurement technology can provide the self-driving system with precise, real-time information for decision-making, making the vehicle safer and more comfortable [2]. Many distance measurement technologies have been introduced and applied to different vehicle systems [3]. The intelligent vehicle system is the primary trend in the development of modern transportation [4]. This paper introduces several kinds of ranging technology used in smart transportation: ultrasonic ranging, millimeter-wave radar ranging, laser ranging, and machine vision ranging. Firstly, the development status is explained. Secondly, the principle is elaborated. Next, their performance is analyzed and compared. Finally, we introduce the development trend of vehicle ranging technology.
C.-Y. Ning (B) · J.-J. Shang · S.-J. Jiang Fujian Provincial Key Laboratory of Big Data Mining and Applications, Fujian University of Technology, Fujian Province 350118, China e-mail: [email protected] J.-J. Shang e-mail: [email protected] D.-T. Pham Center of Information Technology, Hanoi University of Industry, Hanoi, Vietnam Graduate University of Science and Technology, Vietnam Academy of Science and Technology, Hanoi, Vietnam D.-T. Pham e-mail: [email protected] T.-X.-H. Nguyen Haiphong University of Management and Technology, Haiphong, Vietnam e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 R. Tiwari et al. (eds.), Proceedings of International Conference on Computational Intelligence, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-16-3802-2_23
2 Ultrasonic Ranging
Ultrasonic ranging technology was born in the twentieth century and has been widely studied and applied in robot obstacle avoidance, car reversing, and liquid-level measurement. The main causes of measurement error in ultrasonic ranging, together with its advantages and disadvantages, are analyzed in [5, 6]. Ultrasonic ranging uses a mechanical wave above 20 kHz generated by the transmitting end, which propagates along the direction of the probe. When the ultrasonic wave encounters an object, it is reflected back and received by the ultrasonic receiving sensor [7]. From the time t between the start of transmission and the end of reception and the propagation speed v of the ultrasonic wave in the current environment, the distance S to be measured is obtained as:
$$S = \frac{1}{2}\, v\, t \qquad (1)$$
Since the speed of sound depends on the temperature T, the accuracy of the distance measurement can be further improved by temperature compensation. The approximate formula for the ultrasonic propagation velocity v (in m/s, with T in °C) is:
$$v = 331.5 + 0.607\, T \qquad (2)$$
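Equations (1) and (2) combine into a small temperature-compensated distance calculation. The sketch below is illustrative only: the function names and the default temperature of 20 °C are assumptions, and the echo time is taken to be the full round-trip time t.

```python
def sound_speed(temp_c):
    """Approximate speed of sound in air in m/s for a temperature in deg C, Eq. (2)."""
    return 331.5 + 0.607 * temp_c


def ultrasonic_distance(echo_time_s, temp_c=20.0):
    """Distance in metres from the round-trip echo time in seconds, Eq. (1)."""
    return 0.5 * sound_speed(temp_c) * echo_time_s


if __name__ == "__main__":
    # Same echo time at two temperatures shows the size of the correction.
    for temp in (0.0, 30.0):
        print(temp, ultrasonic_distance(0.010, temp))
```

For a 10 ms echo the computed distance differs by roughly 9 cm between 0 °C and 30 °C, which is why the compensation matters at the accuracy levels relevant to reversing systems.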
3 Millimeter-Wave Radar Ranging
Research on millimeter-wave radar abroad started in the 1960s. The research mainly covers the recognition of Doppler and micro-Doppler features of moving targets, the recognition of features based on target time–frequency analysis, and the recognition of features based on target one-dimensional distance images [8, 9]. Millimeter-wave radar ranging uses Doppler radar technology to obtain the relative distance between the vehicle and the obstacle by processing the phase and frequency information of the transmitted signal and the received signal [10]. When the time difference between the signal sent by the sensor and the received signal is t and the speed of light is c, the distance R between the vehicle and the obstacle can be calculated according to the formula R = ct/2 (Fig. 1).
Fig. 1 Millimeter-wave radar work flowchart (blocks: transmit signal, aerial, receive signal, signal processor, algorithm chip, car control circuit, car alarm system, car brake system)
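The round-trip relation R = ct/2 used for radar ranging can be sketched as follows. The function name is hypothetical, and the vacuum speed of light is used as an approximation for propagation in air.

```python
SPEED_OF_LIGHT_M_S = 299_792_458.0  # vacuum value, used as an approximation in air


def round_trip_range(delay_s, speed_m_s=SPEED_OF_LIGHT_M_S):
    """Target range in metres from the delay between transmitted and received signal."""
    return speed_m_s * delay_s / 2.0


if __name__ == "__main__":
    # A 1 microsecond delay corresponds to a target roughly 150 m away.
    print(round_trip_range(1e-6))
```

The factor of one half accounts for the signal travelling to the obstacle and back; each nanosecond of timing error corresponds to about 15 cm of range error.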
4 Laser Ranging
In 1977, the USA took the lead in developing the world's first handheld small-scale laser range finder [11]. Owing to its high brightness, strong directionality and penetrating power, simple structure, and convenient installation and debugging, laser ranging gradually came to be widely used in various industrial and civil fields [12–14]. Laser ranging determines the distance by irradiating a laser beam onto a target and using the information carried by the laser as it travels back and forth. It is mainly divided into two categories: pulsed laser ranging and continuous-wave phase laser ranging. Pulsed laser ranging measures the time a light pulse takes to travel to the target and back. If the time required for light travelling in air at speed c to go back and forth between the two points A and B is t, then the distance D between A and B can be expressed as follows.
$$D = \frac{c\,t}{2} \qquad (3)$$
It can be seen from the above equation that measuring the distance between A and B amounts to measuring the light propagation time t. The principle of continuous-wave phase laser ranging is to determine the distance by irradiating the object to be measured with a continuously modulated laser beam and measuring the phase difference between the emitted light and the received light returned by the target [15]. The principle block diagram of a typical phase laser range finder is shown in Fig. 2.
Fig. 2 Phase laser ranging schematic
As shown in the above figure, when T is the period of the transmitted signal (so that its frequency is f = 1/T) and the phase difference between the transmitted and received signals is ϕ, the measured distance is:
$$D = \frac{c\,t}{2} = \frac{c}{2} \cdot \frac{\phi}{2\pi} \cdot \frac{1}{f} \qquad (4)$$
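Equation (4) can be evaluated directly once the phase difference and modulation frequency are known. The sketch below uses hypothetical function names; the unambiguous-range helper follows from setting the phase difference to one full cycle (2π) in the same formula and is added here for illustration.

```python
import math

SPEED_OF_LIGHT_M_S = 299_792_458.0  # vacuum value, used as an approximation in air


def phase_distance(phase_rad, mod_freq_hz):
    """Distance in metres from phase difference (rad) and modulation frequency (Hz), Eq. (4)."""
    return SPEED_OF_LIGHT_M_S * phase_rad / (4.0 * math.pi * mod_freq_hz)


def unambiguous_range(mod_freq_hz):
    """Largest distance measurable before the phase wraps past 2*pi."""
    return SPEED_OF_LIGHT_M_S / (2.0 * mod_freq_hz)


if __name__ == "__main__":
    f = 10e6  # 10 MHz modulation, an illustrative value
    print(phase_distance(math.pi / 2, f), unambiguous_range(f))
```

A higher modulation frequency gives finer distance resolution per radian of phase but shortens the range over which the phase reading is unique.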
5 Visual Ranging
Visual ranging uses computer image processing technology to simulate the human eye and perceive changes in the environment surrounding the vehicle. Visual ranging has been widely used in lane state estimation, traffic sign and signal recognition, license plate recognition, estimation of vehicle deviation from the lane center, etc. [16–20]. At present, visual ranging mainly includes monocular ranging and binocular ranging.
5.1 Monocular Vision Ranging
Monocular vision ranging uses a single camera based on the pinhole imaging model. According to the viewing angle measurement principle, the ratio between the object size and the object distance remains unchanged, so the distance of the object is inversely proportional to its image size [21]. When the actual size of the object and some camera parameters are known, the distance to the target can be obtained.
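Under the pinhole model, the inverse proportionality between distance and image size can be written as Z = f·H/h. The sketch below assumes the focal length is expressed in pixels and the real height of the target is known; the function and parameter names are hypothetical.

```python
def monocular_distance(focal_px, real_height_m, image_height_px):
    """Distance in metres to an object of known height under the pinhole model.

    focal_px: focal length in pixels (assumed known from calibration)
    real_height_m: actual height of the object in metres
    image_height_px: height of the object in the image in pixels
    """
    if image_height_px <= 0:
        raise ValueError("image height must be positive")
    return focal_px * real_height_m / image_height_px


if __name__ == "__main__":
    # Halving the image height doubles the estimated distance.
    print(monocular_distance(800.0, 1.5, 60.0))
    print(monocular_distance(800.0, 1.5, 30.0))
```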
5.2 Binocular Vision Ranging Technology
Binocular vision ranging refers to the simultaneous use of two cameras to complete the ranging task. It is based on a triangulation method that achieves the perception of three-dimensional information by mimicking the way humans use binocular parallax to sense distance [22]. As shown in Fig. 3, the two cameras image the same object from different positions, and the spatial three-dimensional information of the object is then recovered from the parallax image.
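For the common special case of two identical, parallel and rectified cameras, triangulation reduces to Z = f·B/d, where B is the baseline and d the disparity. That camera arrangement is an assumption made for this sketch, and the function and parameter names are hypothetical.

```python
def stereo_depth(focal_px, baseline_m, x_left_px, x_right_px):
    """Depth in metres of a point seen by two rectified, parallel cameras.

    focal_px: focal length in pixels (same for both cameras)
    baseline_m: distance between the camera centres in metres
    x_left_px, x_right_px: horizontal image coordinate of the point in each view
    """
    disparity = x_left_px - x_right_px
    if disparity <= 0:
        raise ValueError("disparity must be positive for a point in front of the cameras")
    return focal_px * baseline_m / disparity


if __name__ == "__main__":
    # Nearby points produce larger disparity than distant ones.
    print(stereo_depth(800.0, 0.12, 420.0, 410.0))
    print(stereo_depth(800.0, 0.12, 420.0, 418.0))
```

Because depth is inversely proportional to disparity, a fixed one-pixel matching error costs far more accuracy at long range than at short range.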
Fig. 3 Schematic diagram of binocular visual stereo imaging
6 Multi-sensor Fusion
6.1 Comparison of Traditional Ranging Techniques
Table 1 compares the technical indicators, advantages, disadvantages, and respective uses of the traditional vehicle ranging technologies.
6.2 Multi-sensor Fusion
The comparison of the various ranging techniques shows that each single sensor has its own limitations. Multi-sensor information fusion technology combines the complementary spatial and temporal multi-feature information of multiple sensors and synthesizes (integrates and fuses) it according to certain criteria to form a consistent description or interpretation of a certain feature of the environment, thereby overcoming the low reliability and small effective detection range of a single sensor. Therefore, in recent years, sensor information fusion technology has received more and more attention in the field of intelligent vehicle safety research. As shown in Fig. 4, the Audi self-driving car is equipped with up to 9 environmental detection sensors.
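The text does not specify the fusion criterion. As one simple illustration, the sketch below fuses independent distance readings by inverse-variance weighting; this technique is chosen here for illustration and is not taken from the paper, and the function name and the example variances are hypothetical.

```python
def fuse_ranges(measurements):
    """Fuse independent distance readings by inverse-variance weighting.

    measurements: list of (distance_m, variance_m2) pairs, one per sensor.
    Returns (fused_distance_m, fused_variance_m2).
    """
    if not measurements:
        raise ValueError("at least one measurement is required")
    weights = [1.0 / variance for _, variance in measurements]
    total = sum(weights)
    fused = sum(w * d for w, (d, _) in zip(weights, measurements)) / total
    return fused, 1.0 / total


if __name__ == "__main__":
    # Illustrative readings: a precise sensor and a noisier one.
    print(fuse_ranges([(10.0, 0.25), (12.0, 1.0)]))
```

The fused variance is never larger than that of the best individual sensor, which gives a concrete sense of how combining sensors can improve reliability.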
Table 1 Comparison of vehicle ranging technologies
Ranging type | Technical indicators | Advantage | Disadvantage | Use
Ultrasonic | Measuring range: 0.2–20 m, optimal distance: 4–5 m | Simple and fast data processing, low cost, easy to manufacture, highly adaptable | Large beam angle, poor directionality, low resolution, short acting distance, low precision | Close obstacle detection, car reversing collision avoidance system
Laser | Measuring range: 20–190 mm, resolution: 0.5–10 µm | Good directionality, small size, high precision, no electromagnetic interference, high measurement accuracy, long detection distance, relative distance measurement | Difficult to make, the optical system needs to be kept clean, and the measurement performance is vulnerable to environmental interference | Ranging and anti-collision during vehicle driving
Millimeter wave | Frequency band: 2.4 GHz, 24 GHz, 60 GHz, 122 GHz, 245 GHz | Long detection distance, reliable operation, penetrates rain, fog, and dust | Lower resolution, more expensive | Distance measurement and speed measurement during vehicle travel
Machine vision | CCD, CMOS camera | Large amount of information, improved accuracy of judgment, low cost, small size, no pollution to the environment | Not suitable for harsh environments such as heavy rain and heavy fog, and requires a large amount of calculation | Close obstacle detection, vehicle ranging, obstacle detection
7 Conclusion
This paper primarily addressed the types of distance measuring instruments for vehicles and their advantages. With the further advancement of science and technology, there will be more technical forms of distance measurement. Single distance measurement methods have been combined with other methods of distance measurement. Combining multiple instruments so that they complement each other further increases the measurement accuracy and reliability of the system. With data collection, processing, and fusion algorithms for each sensor, timely, accurate, reliable, and environmentally adaptive detection of the vehicle ahead can be performed, which is of great importance for preventing collision accidents and improving driving safety.
Fig. 4 Multi-sensor layout for context awareness (labels: PTZ zoom camera, GPS positioning system, photon, fixed camera, lidar, short-range millimeter wave radar, long-range millimeter wave radar, ultrasonic sensor)
References 1. D. Pinto, Traffic Incidents Processing System and Method for Sharing Real Time Traffic Information (2008) 2. T.K. Dao, T.S. Pan, J.S. Pan, A multi-objective optimal mobile robot path planning based on whale optimization algorithm. In: 2016 IEEE 13th Int. Conf. Signal Process, pp. 337–342 (2016). https://doi.org/10.1109/ICSP.2016.7877851 3. A.K. Shrivastava, A. Verma, S.P. Singh, Distance measurement of an object or obstacle by ultrasound sensors using P89C51RD2. Int. J. Comput. Theory Eng. 2, 1793–8201 (2010) 4. T.-T. Nguyen, J.-S. Pan, T.-K. Dao, A Novel Improved Bat Algorithm Based on Hybrid Parallel and Compact for Balancing an Energy Consumption Problem (2019). https://doi.org/10.3390/ info10060194 5. Bu. Zhao Haiming, W.J. Yingyong, Z. Zhijin, Study on a high precision ultrasonic distance measurement method. J. Hunan Univ. Sci. Technol. Natural Sci. Edn. 21(3), 35–38 (2006) 6. H. Wang, Research on High Resolution Ultrasonic Ranging Method. Shandong University of Science and Technology (2004) 7. K. Pan, Research on Reversing Anti-collision Alarm System Based on Ultrasonic Ranging. Nanjing University of Posts and Telecommunications (2018) 8. L. Yang, Y. Wang. D. Luo, Study on classification method of moving vehicle based on microdoppler feature. Fire Control Radar Technol. 43(03), 36–39+58 (2014) 9. K. Li, Q. Zhang, B. Liang, Y. Luo, Micro-Doppler modeling and feature extraction of truck targets. J. Appl. Sci. 32(02), 170–177 (2014) 10. R. Wu, D. Jin, S. ZhongY. Dai, Vehicle ranging system based on millimeter wave radar. Automobile Pract. Technol. (02), 33–35 (2019) 11. L.A.M. Rosales, I.A. Badillo, C.A.H. Gracidas, et al., On-road obstacle detection video system for traffic accident prevention. J. Intelligent Fuzzy Syst. 35(1): (2018) 12. Y. Yu, Laser rangefinder and its development trend. Information Command Control Syst. Simulation Technol. (08), 19–21 (2002) 13. M. Zheng, X. 
Liu, Development status and principle of laser ranging technology at home and abroad. Sci. Technology Innov. Rev. (01), 35 (2014) 14. L. WangA. Xu, W. Wang, Development of laser and method of laser ranging. J. Jiaozuo Univ. (04):55–56+71 (2007) 15. J.-S. Pan, Z. Meng, S.-C. Chu, Xu. Hua-Rong, Monkey King evolution: an enhanced ebb-tidefish algorithm for global optimization and its ap-plication in vehicle navigation under wireless sensor network environment. Tele-Commun. Syst. 65(3), 351–364 (2017) 16. G. Jiang, M. Yu, Vision-based lane state estimation. J. Circuits Syst. (03), 6–10 (2001)
17. W. Liu, Research on Target Discrimination Method of Vehicle Active Cruise System. Liaoning University of Technology (2017) 18. L. Huang, Y. Zhang, Traffic sign recognition using deep convolutional neural networks. Modern Electronic Technique 38(13), 101–106 (2015) 19. J. Niu, J. Yu, M. Li, Research and implementation of vehicle license plate recognition algorithm. Electronic Measurement Technol. 41(06), 45–49 (2018) 20. W. Liu, Research on Key Technologies in Lane Departure Warning System Under Monocular Vision. Nanjing University of Aeronautics and Astronautics (2014) 21. F. Guo, L. Guofu, Z. Ning, Analysis of three-dimensional target positioning accuracy of compound eyes. Infrared Laser Eng. 43(12), 4088–4093 (2014) 22. G. Fang, K. Wang, Q.-L. Wu, Development of multi-channel large field of view target locator. Optical Precision Eng. 21(01), 26–33 (2013)
Chapter 24
Blockchain Based Electronic Health Record Management System for Data Integrity Neetu Sharma and Rajesh Rohilla
1 Introduction
Health data is the most precious asset of the healthcare industry [1]. Safeguarding health records of diverse natures is a significant issue [2]. Distributed ledger technology has attracted business owners in many fields, including medicine [?]. Blockchain contributes digital data security, transparency and trustworthiness, and also provides secure health data sharing [3]. A design that is optimized and resistant to cybercrime can be achieved for the safe transfer of health data by using distributed ledger technology [4]. The complex healthcare record management system can be improved using blockchain [5]. Blockchain can play an important role in handling a variety of health data, such as genomic information [6]. Blockchain eliminates the requirement for a third party in scenarios that are likely to be affected by cybercrime [7]. When health records are stored in a distributed manner, it is almost impossible to access the data illegally [8]. During data exchange or sharing, blockchain ensures that the data cannot be tampered with [9].
2 Related Work
This section covers prior work related to EHR management systems. In [10], the author presented a blockchain-based medical information monitoring system for cloud environments. The secure sharing of genomic data of cancer patients was described in [11]. The challenge of latency in transferring a treatment case to another healthcare institute was showcased in [12]. A validation strategy for electronic health records using distributed technology was designed in [13]. The safe exchange of EHRs using blockchain was developed in [14]. A system for certificateless cloud-based healthcare data was explored in [15]. A participant-oriented structure for EHRs was designed in [16]. The usage of blockchain in healthcare, including electronic health records, was investigated in [17, 18]. A blockchain-based sharing system for fitness monitoring data among professionals was developed in [19]. A cloud-supported and protected electronic health record system was developed in [20]. In [21], a scalable electronic healthcare system for medical records using blockchain was designed. Options for exchanging healthcare data were evaluated in [22]. In [23], an effective and customized telemedicine system using blockchain was suggested. In [24], a cloud-assisted medical information sharing model using distributed ledger technology was designed. In [25], a privacy-preserving health data sharing mechanism using the potential of blockchain was explored. In [26], a prototype for access-control-based health record management was recommended. A design for an integral solution for medical record handling was introduced in [27]. In [28], it was found that blockchain-based systems are more complicated than traditional sharing systems. The literature survey shows that blockchain is appropriate for EHR management due to its considerable advantages, such as security, immutability, transparency and provenance of the stored data.

N. Sharma · R. Rohilla (B)
Delhi Technological University, Shahbad Daulatpur, Main Bawana Road, Delhi, India
e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022
R. Tiwari et al. (eds.), Proceedings of International Conference on Computational Intelligence, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-16-3802-2_24
3 Overview of Proposed Work
The most common challenge faced by a traditional healthcare system in managing EHRs is system failure when the records are handled by a single centralized unit. In the absence of past health records, fresh diagnoses and lab tests are required, which increases treatment cost and time. Such a system also lacks the interoperability needed to share EHRs securely, which may lead to improper treatment being delivered to patients. Sometimes patients even pay for services and medicines they never received because of the lack of EHR visibility. The potential of blockchain technology can be used to add integrity to the EHR management system. This technology can prevent attackers from blocking EHRs for forgery purposes, and it enables multiple users to work together to improve public health.
3.1 Proposed System Architecture of EHR Management
Proper management of EHRs is essential to improve public health. The proposed model overcomes the challenges of the traditional healthcare system by covering key aspects of the healthcare domain, such as privacy, security, transparency and proper record management, to enable smooth communication among all participants: doctors, patients, healthcare providers, pharmaceutical companies, insurance companies, etc. An endorsement policy such as patient consent is necessary to keep sensitive data confidential, so that only genuine healthcare providers, medical researchers or insurance companies have EHR access rights.
Fig. 1 Architecture of proposed EHR management system
Figure 1 illustrates the architecture of the proposed EHR management system. The developed system is a three-layer structure consisting of a user layer, an application layer and a blockchain layer. The user layer permits health organizations (hospitals) to send EHRs to patients over the blockchain network and also empowers EHR requesters (doctors, medical researchers, pharma companies and insurance companies) to view the transferred data on the blockchain ledger with the consent of the patient. The application layer handles the EHR uploading and retrieval process between the user layer and the blockchain layer. The purposes of the blockchain layer are as follows.
1. Broadcasting the uploaded EHR to all blockchain network nodes.
2. Creating the new EHR block.
3. Mining the new EHR block.
4. Validation of the new EHR block by all network nodes to append it to the blockchain ledger.
In the blockchain layer, the first block represents the genesis block and the remaining blocks represent the transferred EHRs (EHR1, EHR2, ..., EHRn).
In the proposed system, each patient's EHR is added and updated on the blockchain network by hospitals. The EHR can be accessed by requesters, with the consent of the patient, for purposes such as better treatment, new drug development and insurance claims. First, a health organization places a transaction request by transferring drug EHR data to the patient over the blockchain ledger by means of the application layer. When the data is transferred, a unique EHR ID is created and sent back to the sender by the blockchain layer. The transferred data is communicated to all nodes of the blockchain layer. A new block representing the uploaded EHR is mined to calculate its hash value using the SHA-256 algorithm. The block hash depends on the previous block hash, the current block information and the nonce. The mining procedure provides the proof of work used for new block verification. The hashes secure the information inside the blocks, since any change to a block changes its hash. After mining, the new block is approved by all nodes, and the block containing the EHR is finally added to the blockchain network. After transfer, the EHR can be retrieved using the block hash or the EHR ID by the recipient, i.e., the patient in the proposed system. All EHR requesters need the consent of the patient to access an EHR. Only if the patient accepts the EHR retrieval request, by acknowledging it with the EHR ID, can requesters access this confidential data.
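The mining step described above can be illustrated with a minimal Python sketch. It is not the authors' Node.js code: the function names, the sample record and the difficulty rule (a hash that starts with "000") are assumptions made only for illustration, since the chapter does not state how the proof-of-work target is chosen.

```python
import hashlib
import json
from typing import Tuple


def block_hash(previous_hash: str, ehr: dict, nonce: int) -> str:
    """SHA-256 over the previous block hash, the block data and the nonce."""
    payload = previous_hash + json.dumps(ehr, sort_keys=True) + str(nonce)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def mine_block(previous_hash: str, ehr: dict,
               difficulty_prefix: str = "000") -> Tuple[int, str]:
    """Try nonces 0, 1, 2, ... until the hash starts with the assumed prefix."""
    nonce = 0
    while True:
        digest = block_hash(previous_hash, ehr, nonce)
        if digest.startswith(difficulty_prefix):
            return nonce, digest
        nonce += 1


# Hypothetical record; the field names are illustrative only.
ehr = {"patient": "P-001", "diagnosis": "example", "sender": "Hospital A"}
nonce, digest = mine_block("0" * 64, ehr)
print(nonce, digest)
```

Because the previous block hash is part of the hashed payload, changing any earlier block would change every later hash, which is the property that links the blocks together.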
3.2 Implementation and Result
This design study suggests a blockchain-based approach for EHR management. Distributed ledger technology has the potential to maintain the integrity of EHRs in a secure and distributed manner. In the developed system, EHRs are sent to patients by healthcare organizations and can be retrieved by EHR requesters with the consent of the patient. In the proposed model, EHRs are represented in JSON format, Visual Studio Code is used as the development environment for the custom blockchain, and Node.js is used for the source code. The implemented design has four building modules: EHR uploading, EHR mining, EHR addition and EHR retrieval. Figure 2 illustrates the EHR modules of the developed system.
In the EHR uploading module, healthcare organizations transfer the EHRs (EHR1, EHR2, EHR3, ..., EHRn) of their patients over the blockchain network. An EHR includes patient name, age, weight, blood pressure, diet chart, diagnosis, treatment, medication, lab test and radiology results, vaccinations, allergies and sender name. A response to the request is sent to the EHR sender, and the uploaded EHR is broadcast as a pending transaction to all network nodes of the blockchain ledger. A unique EHR ID is generated for each uploaded EHR using a UUID (universally unique identifier) function.
In the EHR mining module, the uploaded EHR is obtained by the miner node for proof of work. The miner node, also called the validator node, creates a new block for the EHR using the SHA-256 (secure hash algorithm) algorithm. The new EHR block contains the current EHR, block index, timestamp, nonce, current block hash and hash of the previous block.
Once the current block is mined, the EHR addition module seeks approval from all blockchain nodes. If the EHR block is approved by all nodes, it is appended to the blockchain ledger. Figure 3 shows the execution result of the EHR addition module.
In the EHR retrieval module, EHR requesters can safely retrieve the EHR of any patient, with the patient's consent, using the EHR ID. Figure 4 shows the execution result of the EHR retrieval module.

Fig. 2 EHR modules of the developed system
Fig. 3 Execution result of the EHR addition module
Fig. 4 Execution result of the EHR retrieval module

Table 1 Comparative result analysis of proposed work with prior works
Qualitative metrics         Lee et al. [4]   Lo et al. [12]   Zhou et al. [19]   Proposed work
Blockchain system           √                √                √                  √
EHR system                  √                √                √                  √
Architecture availability   √                ×                ×                  √
Design testing              √                ×                ×                  √
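The uploading, addition and retrieval modules can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the authors' implementation: the class `EhrLedger` and its method names are hypothetical, mining is omitted (the nonce is fixed at 0), and approval by all nodes is reduced to a local consistency check.

```python
import hashlib
import json
import time
import uuid


def compute_hash(index, timestamp, ehr, nonce, previous_hash):
    """SHA-256 over the block fields listed in the text."""
    payload = json.dumps([index, timestamp, ehr, nonce, previous_hash], sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class EhrLedger:
    def __init__(self):
        # The first block is the genesis block.
        self.chain = [self._make_block(0, {"note": "genesis"}, "0" * 64)]

    def _make_block(self, index, ehr, previous_hash, nonce=0):
        timestamp = time.time()
        return {
            "index": index, "timestamp": timestamp, "ehr": ehr, "nonce": nonce,
            "previous_hash": previous_hash,
            "hash": compute_hash(index, timestamp, ehr, nonce, previous_hash),
        }

    def add_ehr(self, ehr):
        """Assign a unique EHR ID, wrap the record in a block and append it."""
        ehr_id = str(uuid.uuid4())
        record = dict(ehr, ehr_id=ehr_id)
        self.chain.append(
            self._make_block(len(self.chain), record, self.chain[-1]["hash"]))
        return ehr_id

    def is_valid(self):
        """Check that each block hash is consistent and links to its predecessor."""
        for prev, block in zip(self.chain, self.chain[1:]):
            expected = compute_hash(block["index"], block["timestamp"], block["ehr"],
                                    block["nonce"], block["previous_hash"])
            if block["hash"] != expected or block["previous_hash"] != prev["hash"]:
                return False
        return True

    def retrieve(self, ehr_id, patient_consent):
        """Return the record only when the patient has consented."""
        if not patient_consent:
            return None
        for block in self.chain[1:]:
            if block["ehr"].get("ehr_id") == ehr_id:
                return block["ehr"]
        return None


ledger = EhrLedger()
ehr_id = ledger.add_ehr({"patient": "P-001", "diagnosis": "example"})
print(ledger.is_valid(), ledger.retrieve(ehr_id, patient_consent=True))
```

Editing a stored record after the fact makes `is_valid()` return False, which is how the hash chain exposes tampering.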
3.3 Comparative Analysis of Result
Table 1 shows the comparative result analysis of the proposed scheme with prior works based on various qualitative metrics. The proposed system, as well as all the existing systems, is blockchain based and designed for managing electronic health records. We developed the architecture and also presented the design test of the developed blockchain system, which were unavailable in existing works.
4 Conclusion
Blockchain technology plays an important role in cybersecurity and can be used to provide integrity to the EHR management system. This technology is capable of preventing attackers from blocking EHR data. This work proposes a blockchain-based approach for EHR management that permits healthcare organizations to upload and send EHRs to their patients and also empowers EHR requesters to retrieve the confidential EHR with the consent of the patient. The developed system can be highly suitable for ensuring the availability of genuine data during future treatment, insurance claims, drug discovery, etc. The designed model of the proposed system consists of four modules: EHR uploading, EHR mining, EHR addition and EHR retrieval. The execution results indicate that the created blockchain-based model can maintain the data integrity of the EHR system.
References
1. A. Farouk, A. Alahmadi, S. Ghose, A. Mashatan, Blockchain platform for industrial healthcare: vision and future opportunities. Comput. Commun. (2020)
2. B. Houtan, A. Senhaji Hafid, D. Makrakis, A survey on blockchain-based self-sovereign patient identity in healthcare. IEEE Access 8, 90478–90494 (2020)
3. L.A. Linn, M.B. Koo, Blockchain for health data and its potential use in health IT and health care related research, in ONC/NIST Use of Blockchain for Healthcare and Research Workshop. Gaithersburg, Maryland, United States: ONC/NIST, pp. 1–10 (2016)
4. H.-A. Lee, H.-H. Kung, J. Ganesh Udayasankaran, B. Kijsanayotin, A.B. Marcelo, L.R. Chao, C.-Y. Hsu, An architecture and management platform for blockchain-based personal health record exchange: development and usability study. J. Med. Internet Res. 22(6), e16748 (2020)
5. S. Tanwar, K. Parekh, R. Evans, Blockchain-based electronic healthcare record system for healthcare 4.0 applications. J. Inf. Secur. Appl. 50, 102407 (2020)
6. K.J. McKernan, The chloroplast genome hidden in plain sight, open access publishing and anti-fragile distributed data sources. Mitochondrial DNA Part A 27(6), 4518–4519 (2016)
7. K. Shuaib, H. Saleous, K. Shuaib, N. Zaki, Blockchains for secure digitized medicine. J. Person. Med. 9(3), 35 (2019)
8. D. Ivan, Moving toward a blockchain-based method for the secure storage of patient records, in ONC/NIST Use of Blockchain for Healthcare and Research Workshop. Gaithersburg, Maryland, United States: ONC/NIST, pp. 1–11 (2016)
9. X. Yue, H. Wang, D. Jin, M. Li, W. Jiang, Healthcare data gateways: found healthcare intelligence on blockchain with novel privacy risk control. J. Med. Syst. 40(10), 218 (2016)
10. A. Mubarakali, Healthcare services monitoring in cloud using secure and robust healthcare-based blockchain (SRHB) approach. Mob. Netw. Appl. (2020)
11. B.S. Glicksberg, S. Burns, R. Currie, A. Griffin, Z.J. Wang, D. Haussler, T. Goldstein, E. Collisson, Blockchain-authenticated sharing of genomic and clinical outcomes data of patients with cancer: a prospective cohort study. J. Med. Internet Res. 22(3) (2020)
12. Y.-S. Lo, C.-Y. Yang, H.-F. Chien, S.-S. Chang, C.-Y. Lu, R.-J. Chen, Blockchain-enabled iWellChain framework integration with the national medical referral system: development and usability study. J. Med. Internet Res. 21(12) (2019)
13. F. Tang, S. Ma, Y. Xiang, C. Lin, An efficient authentication scheme for blockchain-based electronic health records. IEEE Access 7, 41678–41689 (2019)
14. D.C. Nguyen, P.N. Pathirana, M. Ding, A. Seneviratne, Blockchain for secure EHRs sharing of mobile cloud based e-health systems. IEEE Access 7, 66792–66806 (2019)
15. H. Shi, R. Guo, C. Jing, S. Feng, Efficient and unconditionally anonymous certificateless provable data possession scheme with trusted KGC for cloud-based EMRs. IEEE Access 7, 69410–69421 (2019)
16. J.H. Beinke, C. Fitte, F. Teuteberg, Towards a stakeholder-oriented blockchain-based architecture for electronic health records: design science research study. J. Med. Internet Res. 21(10) (2019)
17. H.M. Hussien, SMd Yasin, S.N.I. Udzir, A.A. Zaidan, B.B. Zaidan, A systematic review for enabling of develop a blockchain technology in healthcare application: taxonomy, substantially analysis, motivations, challenges, recommendations and future direction. J. Med. Syst. 43(10), 320 (2019)
18. A.H. Mayer, C.A. da Costa, R. da Rosa Righi, Electronic health records in a blockchain: a systematic review. Health Inf. J. 1460458219866350 (2019)
19. T. Zhou, X. Li, H. Zhao, Med-PPPHIS: blockchain-based personal healthcare information system for national physique monitoring and scientific exercise guiding. J. Med. Syst. 43(9), 305 (2019)
20. S. Cao, G. Zhang, P. Liu, X. Zhang, F. Neri, Cloud-assisted secure eHealth systems for tamper-proofing EHR via blockchain. Inf. Sci. 485, 427–440 (2019)
21. T. Motohashi, T. Hirano, K. Okumura, M. Kashiyama, D. Ichikawa, T. Ueno, Secure and scalable mHealth data management using blockchain combined with client hashchain: system design and validation. J. Med. Internet Res. 21(5) (2019)
22. A.A. Vazirani, O. O’Donoghue, D. Brindley, E. Meinert, Implementing blockchains for efficient health care: systematic review. J. Med. Internet Res. 21(2) (2019)
23. R. Guo, H. Shi, D. Zheng, C. Jing, C. Zhuang, Z. Wang, Flexible and efficient blockchain-based ABE scheme with multi-authority for medical on demand in telemedicine system. IEEE Access 7, 88012–88025 (2019)
24. X. Liu, Z. Wang, C. Jin, F. Li, G. Li, A blockchain-based medical data sharing and protection scheme. IEEE Access 7, 118943–118953 (2019)
25. Y. Wang, A. Zhang, P. Zhang, H. Wang, Cloud-assisted EHR sharing with security and privacy preservation via consortium blockchain. IEEE Access 7, 136704–136719 (2019)
26. E.-Y. Daraghmi, Y.-A. Daraghmi, S.-M. Yuan, MedChain: a design of blockchain-based system for medical records access and permissions management. IEEE Access 7, 164595–164613 (2019)
27. A. Shahnaz, U. Qamar, A. Khalid, Using blockchain for electronic health records. IEEE Access 7, 147782–147795 (2019)
28. P. Esmaeilzadeh, T. Mirzaei, The potential of blockchain technology for health information exchange: experimental study from patients’ perspectives. J. Med. Internet Res. 21(6) (2019)
Chapter 25
Strategy of Fuzzy Approaches for Data Alignment Shashi Pal Singh, Ajai Kumar, Lenali Singh, Apoorva Mishra, and Sanjeev Sharma
1 Introduction
Fuzzy string matching evolved from traditional database search. The only difference between the two is that fuzzy string matching returns results for a string on the basis of likely relevance, i.e. a threshold percentage, whereas traditional database search returns only exact matches. Fuzzy matching is a process that identifies strings and words in the given sentences that are not exactly the same; a threshold value is set, and sentences whose match score reaches it are recognized and displayed. This paper is about a fuzzy matching tool for fuzzy string matching with data alignment. Several fuzzy matching algorithms are used for this purpose [1, 2]. The algorithms which we have used are N-grams, Levenshtein distance, Jaro-Winkler, Cosine Similarity and Boyer Moore. The remainder of the paper describes the methodology of this fuzzy matching tool. We then present the outputs and results of the tests and benchmarks used for comparing the efficiency and accuracy of the string matching methods implemented.
According to a data warehousing market report [3], the global market size exceeded USD 13 billion in 2018 and is estimated to grow at more than 12% CAGR between 2019 and 2025, so correct and accurate heuristics for data alignment need to be developed (Fig. 1).
The next section reviews the literature and studies the various algorithms and their approaches. The methodology section describes how the different algorithms are implemented to obtain better accuracy, together with their mathematical formulas. The testing and evaluation section compares the algorithms, and the last section presents the conclusion and future work.

Fig. 1 Data market [3]

S. P. Singh (B) · A. Kumar · L. Singh
AAIG, Center for Development of Advanced Computing (C-DAC), Pune, India
e-mail: [email protected]
A. Kumar e-mail: [email protected]
L. Singh e-mail: [email protected]
A. Mishra · S. Sharma
Indian Institute of Information Technology (IIIT), Pune, India
e-mail: [email protected]
S. Sharma e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022
R. Tiwari et al. (eds.), Proceedings of International Conference on Computational Intelligence, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-16-3802-2_25
2 Literature Survey
An exact match occurs when the source segment matches a segment in the translation memory word for word. It is a 100% match. For example:
• Savia goes to the playground (source segment).
• Savia goes to the playground (translation memory segment).
2.1 Fuzzy Match
A fuzzy match is a match between 1 and 99%. It is a process that helps to identify duplicates or words that are not exactly the same. In the fuzzy match technique, users can set the threshold for the acceptable level of difference between records. For example:
• Savia goes to the playground. (source segment)
• Shikha goes to the playground. (translation memory segment)
The parameters of the tool should be broad enough to take into account minor nuances in a name (including data entry errors), but not so broad that a large number of "false positives" is returned. Fuzzy matching is used in spam filtering, spell checking, OCR (Optical Character Recognition), etc. Several factors affect the fuzzy match score, among them punctuation, word order, formatting and tags, stop words, partial substring matches and matches longer than the source segment. Here we compare the fuzzy matching algorithms most often used in the alignment of data:
– Jaro Winkler Algorithm
– Levenshtein distance Algorithm
– N-grams Algorithm
– Cosine Similarity Algorithm
– Boyer Moore Algorithm.
3 Methodology
Initially, the user inputs a string, which is passed to each of the algorithms. A percentage score is then calculated by every algorithm, and the score is checked against the threshold set by the developer. If the score crosses the threshold, the result is displayed. Several algorithms can be used for calculating the fuzzy match; the ones used here are discussed below (Fig. 2):
Fig. 2 The basic mechanism
3.1 Jaro Winkler Algorithm [4]
The Jaro similarity between two words or tokens is computed from the number of matching characters and the number of transpositions among them. The score is normalized so that 0 means no similarity and 1 means an exact match. The Jaro-Winkler metric is designed for, and best suited to, short strings. However, a disadvantage of the Jaro-Winkler algorithm is that it looks for transpositions starting from the first word of the user input sentence against the first word of the sentence from the database, and proceeds from there [5].

$$\text{Jaro similarity} = \begin{cases} 0 & \text{if } m = 0 \\ \dfrac{1}{3}\left(\dfrac{m}{|s1|} + \dfrac{m}{|s2|} + \dfrac{m - t}{m}\right) & \text{if } m \neq 0 \end{cases} \tag{1}$$

where
• m is the number of matching characters
• t is half the number of matching characters that appear in a different order in the two strings (the transpositions)
• |s1| and |s2| are the lengths of strings s1 and s2, respectively
The expression below defines the maximum distance up to which a matching character is searched for. Two characters match only if they are equal and not farther apart than

$$\left\lfloor \frac{\max(|s1|, |s2|)}{2} \right\rfloor - 1 \tag{2}$$

For example, the two strings are "time goes" and "time runs" (Table 1).
Table 1 Maximum distance calculation
Index   time goes   time runs
0       t           t
1       i           i
2       m           m
3       e           e
4       (space)     (space)
5       g           r
6       o           u
7       e           n
8       s           s
Applying the Jaro formula with m = 6 and t = 0 gives (1/3)(6/9 + 6/9 + 6/6) = 7/9 = 0.7777. Because the two strings share some starting characters, prefix weighting is also applied for better results:

$$d_w = d_j + l\,p\,(1 - d_j)$$

where d_j is the Jaro similarity, l is the length of the matching prefix (here 4), p = 0.1 is the prefix scaling factor, and d_w is the prefix-weighted score. According to the formula, 0.7777 + 0.1 × 4 × (1 − 0.7777) = 0.8666, which means the two strings are an 86.66% match.
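The worked example can be checked with a short Python sketch of Eq. (1), Eq. (2) and the prefix weighting. The function names are illustrative, and the cap of four prefix characters is the conventional choice for Jaro-Winkler rather than something stated in the text.

```python
def jaro(s1: str, s2: str) -> float:
    """Jaro similarity following Eq. (1), with the match window of Eq. (2)."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    m = 0
    for i, ch in enumerate(s1):
        lo, hi = max(0, i - window), min(len2, i + window + 1)
        for j in range(lo, hi):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                m += 1
                break
    if m == 0:
        return 0.0
    # Matched characters in their original order; t is half the mismatches.
    seq1 = [c for c, hit in zip(s1, matched1) if hit]
    seq2 = [c for c, hit in zip(s2, matched2) if hit]
    t = sum(a != b for a, b in zip(seq1, seq2)) / 2
    return (m / len1 + m / len2 + (m - t) / m) / 3


def jaro_winkler(s1: str, s2: str, p: float = 0.1, max_prefix: int = 4) -> float:
    """Prefix-weighted score d_w = d_j + l * p * (1 - d_j)."""
    dj = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1, s2):
        if a != b or prefix == max_prefix:
            break
        prefix += 1
    return dj + prefix * p * (1 - dj)


print(jaro("time goes", "time runs"))
print(jaro_winkler("time goes", "time runs"))
```

For "time goes" and "time runs" this yields 7/9 for the Jaro similarity and about 0.8667 after prefix weighting, in line with the figures above.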
3.2 Levenshtein Algorithm [6]
The Levenshtein algorithm, also known as the edit distance algorithm, counts the minimum number of single edits (insertions, deletions or substitutions) needed to change one string into the other [7] (Table 2). Mathematically, the Levenshtein distance between two strings a and b is given by lev_{a,b}(|a|, |b|), where

$$\operatorname{lev}_{a,b}(i,j) = \begin{cases} \max(i,j) & \text{if } \min(i,j) = 0, \\ \min \begin{cases} \operatorname{lev}_{a,b}(i-1,j) + 1 \\ \operatorname{lev}_{a,b}(i,j-1) + 1 \\ \operatorname{lev}_{a,b}(i-1,j-1) + 1_{(a_i \neq b_j)} \end{cases} & \text{otherwise.} \end{cases} \tag{3}$$
Table 2 Levenshtein algorithm (word-level distance matrix, filled row by row)

Step 1
Distance          India   is   a   diverse   country
            0     1       2    3   4         5
India       1     0
Is          2
Great       3
Country     4

Step 2
Distance          India   is   a   diverse   country
            0     1       2    3   4         5
India       1     0       1    2   3         4
Is          2     1       0
Great       3
Country     4

Step 3
Distance          India   is   a   diverse   country
            0     1       2    3   4         5
India       1     0       1    2   3         4
Is          2     1       0    1   2         3
Great       3     2       1    1   2         3
Country     4     3       2    2   2         2
In this recurrence, lev a,b (i − 1, j) + 1 corresponds to a deletion, lev a,b (i, j − 1) + 1 to an insertion, and lev a,b (i − 1, j − 1) + 1 (added only when the two items differ) to a substitution. For example, suppose the strings are "India is a diverse country" and "India is great country". Then "a" is deleted and "diverse" is replaced by "great", giving a distance of 2.
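A compact Python sketch of the recurrence is given here for illustration. It accepts any sequence, so the same function works on characters or, as in the example above, on lists of words; the function name is an assumption, and the row-by-row storage is just one common way to evaluate the recurrence.

```python
from typing import Sequence


def levenshtein(a: Sequence, b: Sequence) -> int:
    """Edit distance between two sequences (characters or words)."""
    previous = list(range(len(b) + 1))  # row for the empty prefix of a
    for i, item_a in enumerate(a, start=1):
        current = [i]
        for j, item_b in enumerate(b, start=1):
            cost = 0 if item_a == item_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


source = "India is a diverse country".split()
target = "India is great country".split()
print(levenshtein(source, target))
```

On the word lists of the example this returns 2, the bottom-right cell of the completed matrix in Table 2.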
3.3 N-grams Algorithm [8]
This algorithm works at character level or word level. A string is broken down into multiple overlapping substrings, each of length "n" (Fig. 3). If n is equal to 2, the result is called a bi-gram; if n is equal to 3, a tri-gram, and so on. For example, applying n-grams with n = 3 to the text "vwxyz" gives the following components: vwx, wxy, xyz.
According to the Markov model [9], "Only previous one word is enough to predict the probability of the next word". In this algorithm, if one word from the middle is missing, the corresponding grams are not formed and no match is shown. The algorithm checks a sentence based on the gram size set by the developer. N-grams are also used to predict the next item in a text. For example, taking the sentence "I am not special but just limited edition" gives the result shown in Table 3.

Fig. 3 N-gram approach for the sentence "Savia Singh is a good girl"
Unigram (N = 1): Savia, Singh, is, a, good, girl
Bigram (N = 2): Savia Singh, Singh is, is a, a good, good girl
Trigram (N = 3): Savia Singh is, Singh is a, is a good, a good girl

Table 3 N-grams algorithm
One gram: I, am, not, special, but, just, limited, edition
Bi-grams: I am, am not, not special, special but, but just, just limited, limited edition
Tri-grams: I am not, am not special, not special but, special but just, but just limited, just limited edition
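Generating the grams takes only a few lines. The sketch below is illustrative (the helper names are not from the paper) and covers both the character-level and the word-level case.

```python
def ngrams(items, n):
    """All contiguous runs of n items; empty if the input is shorter than n."""
    return [tuple(items[i:i + n]) for i in range(len(items) - n + 1)]


def char_ngrams(text, n):
    return ["".join(gram) for gram in ngrams(list(text), n)]


def word_ngrams(sentence, n):
    return [" ".join(gram) for gram in ngrams(sentence.split(), n)]


sentence = "I am not special but just limited edition"
print(char_ngrams("vwxyz", 3))
print(word_ngrams(sentence, 2))
print(word_ngrams(sentence, 3))
```

A sentence of k words yields k − n + 1 word n-grams, which is why the eight-word example has seven bi-grams and six tri-grams.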
3.4 Cosine Similarity [10]
Cosine matching determines the similarity between two strings. A vector is a way of representing distance as well as direction. The cosine similarity between two vectors, taking the origin as the reference, indicates how closely the two vectors point in the same direction. For example, to determine whether the sentiment of two news articles is positive or negative, cosine similarity can be used to check whether their sentiment leans the same way. Since the comparison is made on the basis of words, it will not match words like "night" and "nights"; it treats them as entirely different words. In this algorithm, the unique words of the user input sentence and of each sentence from the database are taken one by one, and the percentage match is calculated accordingly. The more words match, the higher the percentage match. The position of the words does not matter (Table 4). Let's take an example:
"Everything happens for a reason in life"
"Reason of everything in life to happen is hope"

Table 4 Cosine similarity
Word         Vector A   Vector B
Everything   1          1
Happens      1          1
For          1          0
A            1          0
Reason       1          1
In           1          1
Life         1          1
Of           0          1
That         0          1
Hope         0          1

Cosine similarity [11] is computed using the following formula:

$$\text{Cosine similarity} = \cos(\theta) = \frac{A \cdot B}{\|A\|\,\|B\|} = \frac{\sum_{i=1}^{n} A_i B_i}{\sqrt{\sum_{i=1}^{n} A_i^2}\,\sqrt{\sum_{i=1}^{n} B_i^2}} \tag{4}$$

Here A · B = 5, the sum of the squared components of A is 7 and that of B is 8. Substituting the values, we get 5/√56 ≈ 0.668 as the score.
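The computation in Eq. (4) can be reproduced with a small Python sketch that builds binary word vectors over the combined vocabulary. The function name is illustrative, and the two word lists are taken directly from the rows of Table 4 rather than from re-tokenizing the example sentences.

```python
import math


def cosine_similarity(tokens_a, tokens_b):
    """Cosine of the angle between binary bag-of-words vectors."""
    set_a, set_b = set(tokens_a), set(tokens_b)
    vocabulary = sorted(set_a | set_b)
    vec_a = [1 if word in set_a else 0 for word in vocabulary]
    vec_b = [1 if word in set_b else 0 for word in vocabulary]
    dot = sum(x * y for x, y in zip(vec_a, vec_b))
    norm_a = math.sqrt(sum(x * x for x in vec_a))
    norm_b = math.sqrt(sum(y * y for y in vec_b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Words marked 1 in the Vector A and Vector B columns of Table 4.
words_a = ["everything", "happens", "for", "a", "reason", "in", "life"]
words_b = ["everything", "happens", "reason", "in", "life", "of", "that", "hope"]
print(round(cosine_similarity(words_a, words_b), 3))
```

Because the vectors only record whether a word is present, reordering the words of either sentence leaves the score unchanged, as noted in the text.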
3.5 Boyer Moore Algorithm [12]
The Boyer Moore algorithm matches the pattern provided by the user starting from the last character of the pattern and moving backwards. If the pattern is found, the search ends; if not, the pattern is shifted forward. This search is good for a large text and a small pattern, but if the text increases, then it is not feasible [13]. The shift is given by

$$\text{shift} = \max(1,\ \text{length of pattern} - \text{actual character index} - 1) \tag{5}$$
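The right-to-left comparison and the shift rule of Eq. (5) can be sketched as follows. Note that this is the simplified Horspool variant, which uses only the bad-character shift; the full Boyer Moore algorithm also has a good-suffix rule that is not shown here. The function names are illustrative.

```python
def build_shift_table(pattern):
    """Shift for each pattern character (last one excluded), as in Eq. (5)."""
    m = len(pattern)
    # Later occurrences overwrite earlier ones, so the rightmost one wins.
    return {ch: max(1, m - i - 1) for i, ch in enumerate(pattern[:-1])}


def horspool_search(text, pattern):
    """Index of the first occurrence of pattern in text, or -1 if absent."""
    m, n = len(pattern), len(text)
    if m == 0:
        return 0
    shift = build_shift_table(pattern)
    pos = 0
    while pos <= n - m:
        j = m - 1
        # Compare from the last character of the pattern backwards.
        while j >= 0 and text[pos + j] == pattern[j]:
            j -= 1
        if j < 0:
            return pos
        # Shift by the table value of the text character under the pattern end;
        # characters not in the pattern allow a shift by the full pattern length.
        pos += shift.get(text[pos + m - 1], m)
    return -1


print(horspool_search("Today is my Birthday treat", "eat"))
```

For the pattern "eat" the table is {e: 2, a: 1}, and any other character moves the pattern by its full length of 3, which is what lets the search skip over most of the text.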
Example for analysis: the text is "Today is my Birthday treat" and the pattern to be searched is "eat". The pattern is aligned under the text and shifted forward step by step until it lines up with the end of "treat".
At last, the pattern matches the given text in this case. To get better results, some features using regular expressions are embedded in this module. If any of the following occur in a sentence, they are treated as the same with respect to their category: week days, months, dates, numbers and times. For example:
(1) "On every Monday, she plays football". This would give a 100% match even if it had been any other week day:
"On every Tuesday, she plays football".
"On every Wednesday, she plays football".
"On every Thursday, she plays football".
"On every Friday, she plays football".
"On every Saturday, she plays football".
"On every Sunday, she plays football".
All the above sentences give a 100% match.
(2) "My train is at 6::30::30 in evening". Any sentence with a different time would be considered the same, like: "My train is at 7::05::00 in evening".
Table 5 Comparison of algorithms
Algorithm              Output (fuzzy matching string)        Matching percentage (%)
N-Grams                I was assassinated on 01-June-1994    80.00
Levenshtein distance   I was assassinated on 01-June 1994    93.103
                       I assassinated on 14-June-1974        86.207
Jaro-Winkler           I assassinated on 01-June-1994        97.115
                       he saw an old yellow truck            72.556
                       I assassinated on 04-June-1974        93.249
Cosine Similarity      I was assassinated on 01-June-1994    83.333
Boyer Moore            I was assassinated on 01-June-1994    83.333
All features work in the same way. Depending on the nature of the algorithm, the positioning of these features matters.
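One way to realize such category features is to replace each week day, date or time with a placeholder before the sentences are compared. The sketch below is an assumption about how this could be done, not the authors' module: the placeholder tokens and the regular expressions are hypothetical, and months and plain numbers are left out for brevity.

```python
import re

# Hypothetical patterns; the time pattern accepts ":" or "::" as separator.
_WEEKDAY = re.compile(r"\b(?:Mon|Tues|Wednes|Thurs|Fri|Satur|Sun)day\b", re.IGNORECASE)
_DATE = re.compile(r"\b\d{1,2}-[A-Za-z]+-\d{4}\b")
_TIME = re.compile(r"\b\d{1,2}(?:::?\d{2}){1,2}\b")


def normalize(sentence: str) -> str:
    """Replace week days, dates and times with category placeholders."""
    sentence = _WEEKDAY.sub("<WEEKDAY>", sentence)
    sentence = _DATE.sub("<DATE>", sentence)
    sentence = _TIME.sub("<TIME>", sentence)
    return sentence


print(normalize("On every Monday, she plays football"))
print(normalize("My train is at 6::30::30 in evening"))
print(normalize("he was born on 24-July-1947"))
```

After normalization, two sentences that differ only in the week day or the time become identical strings, so any of the matching algorithms scores them as a 100% match.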
4 Testing and Evaluation of Algorithms
We have tested the fuzzy matching tool against various sets of sentences. The table shows the comparison of the outputs of the different fuzzy matching algorithms for a particular string. The percentage match (score) of two strings is calculated according to the formula of the algorithm used. If the percentage match is greater than or equal to 70% (the threshold which we have set) for some string in the search space, then that string is displayed as the output of fuzzy matching. We now compare all the above-mentioned algorithms through an example (Table 5; Fig. 4). For example, let the string to be tested be "he was born on 24-July-1947".
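The selection step described above, where every string in the search space is scored and only those at or above the threshold are displayed, can be sketched as follows. The scorer here is Python's standard `difflib.SequenceMatcher`, used purely as a stand-in; it is not one of the five algorithms evaluated in the paper, and the search space is a made-up example.

```python
from difflib import SequenceMatcher


def score(query: str, candidate: str) -> float:
    """Stand-in similarity score in percent (not one of the paper's algorithms)."""
    return 100.0 * SequenceMatcher(None, query.lower(), candidate.lower()).ratio()


def fuzzy_matches(query, search_space, threshold=70.0, scorer=score):
    """Return (score, string) pairs at or above the threshold, best first."""
    scored = [(scorer(query, candidate), candidate) for candidate in search_space]
    return sorted((pair for pair in scored if pair[0] >= threshold), reverse=True)


# Hypothetical search space for illustration.
search_space = [
    "he was born on 24-July-1947",
    "he was born on 24-July-1948",
    "completely unrelated sentence",
]
for percent, text in fuzzy_matches("he was born on 24-July-1947", search_space):
    print(f"{percent:.3f}  {text}")
```

Passing a different function as `scorer` lets the same filter be reused with each matching algorithm, with the first entry of the result being the string treated as the most appropriate match.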
Fig. 4 Comparison of algorithms (bar chart of matching percentage (%) for N-Grams, Levenshtein distance, Jaro-Winkler, Cosine Similarity and Boyer Moore)

5 Conclusion and Future Scope
The comparison of all the above-mentioned algorithms shows that our system works well with the Jaro Winkler algorithm, which gives the maximum number of matching results. Our experiments were performed using English as well as Hindi sentences. We always consider the sentence that gives the maximum matching percentage to be the most appropriate match. As future work, a 100% match could be given for sentences that differ only in name variations, e.g. "Ram" and "Sita"; different spellings of names and shortened names; insertion or deletion of punctuation and spaces; inadvertent, phonetic or deliberate misspellings; abbreviations, e.g. "Ltd" instead of "Limited"; and accented characters like "é", "ê", "ç", "ñ" and "à". This approach can also be incorporated into bilingual or multilingual machine translation. The system can be further improved by making the result editable in the case of an inexact match, so that the user can make the necessary changes. Domain-specific content should be added to the database to enrich it.
Chapter 26
Microgrid System and Its Optimization Algorithms
Chun-Yuan Ning, Jun-Jie Shang, Thi-Xuan-Huong Nguyen, and Duc-Tinh Pham
C.-Y. Ning (corresponding author) · J.-J. Shang: Fujian Provincial Key Laboratory of Big Data Mining and Applications, Fujian Province, Fujian University of Technology, Fuzhou 350118, China
T.-X.-H. Nguyen: Haiphong University of Management and Technology, Haiphong, Vietnam
D.-T. Pham: Center of Information Technology, Hanoi University of Industry, Hanoi, Vietnam; Graduate University of Science and Technology, Vietnam Academy of Science and Technology, Hanoi, Vietnam
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022. R. Tiwari et al. (eds.), Proceedings of International Conference on Computational Intelligence, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-16-3802-2_26

1 Introduction
The traditional mode of large-capacity, centralized, long-distance high-voltage transmission has disadvantages such as high operating costs, difficulty in operation, and large regulating capacity [1]. Distributed generation has many advantages, such as less pollution, high reliability, high energy utilization efficiency, and flexible installation location, and it effectively solves many potential problems of large centralized power grids [2]. However, when a distributed power supply is connected to the power grid, it significantly affects the grid. To solve this problem, the concept of the microgrid was proposed [3]. Microgrids combine generators, loads, energy storage devices, and control devices to form a single controllable unit that simultaneously supplies electricity and heat to users [4]. The micropower supply in the microgrid is connected on the user side and is characterized by low cost, low voltage, and low pollution. The microgrid can operate connected to the large grid, or run separately when the grid fails or when required. This paper reviews the development and application of microgrid technology. It first summarizes the microgrid, then focuses on the control strategy, energy storage, and protection mechanisms, and outlines research on fundamental techniques such as planning and design. Finally, it raises some issues that remain to be solved and discusses the future development of microgrid technology.
2 The Basic Structure of a Microgrid
A microgrid, or small power grid, is small compared with the traditional large power grid, but its function is not impaired. There is no unified definition of the microgrid concept; different countries have proposed different visions of the microgrid according to their actual conditions [4]. Generally, a microgrid consists of distributed power supplies, loads, energy storage devices, control devices, and other parts. It can operate in two modes, namely isolated (island) mode, in which it supplies users directly, and grid-connected mode, in which it is connected to a large power grid, and it can switch smoothly between them. The composition and functions of the microgrid are shown in Fig. 1. At the system level, a microgrid combines equipment and devices by means of modern power electronics technology. From the perspective of a large-scale power grid, the microgrid is a small, controllable power grid with high sensitivity that can quickly control other devices and equipment. From the perspective of users, the microgrid can meet many requirements, including reducing power loss, saving cost, and improving voltage stability [5].
Fig. 1 Structure and function of microgrid (block diagram; labels: User, Distributed Generation, Load, Microgrid, Energy storage device, Control device, Bulk power systems)
3 The Operation Mode of Microgrid
A microgrid can be regarded as either a small power system or a virtual power source or load in a distribution network. According to its operation mode, a microgrid runs in either grid-connected mode or isolated mode [6].
3.1 Grid-Connected Mode
In the grid-connected mode, the purpose of control is to make rational use of the resources and equipment in the microgrid and to meet the upper grid's needs for auxiliary services from the microgrid. This is achieved through the reasonable dispatch of distributed energy within the microgrid and the coordination between the microgrid and the external grid. When there is surplus power in the microgrid, the microgrid can act as a power source and sell the surplus to the external grid through the retail market on the distribution network side. In addition to the distributed power units, controllable resources on the demand side can also participate in market bidding [7, 8].
3.2 Isolated Mode
When the microgrid operates independently, the voltage and frequency deviations caused by fluctuations in renewable energy and load are usually compensated by the local control of distributed power within the microgrid. The main function of the microgrid energy management system is to maintain the real-time balance between power generation and demand. It does so by managing the charging and discharging of the energy storage system, scheduling adjustable distributed power supplies such as fuel cells and diesel generators, and controlling the load side. It also prevents overcharge and overdischarge of the battery and ensures the long-term, stable operation of the microgrid [9, 10]. The transitions among the various operating states of the microgrid are shown in Fig. 2.
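The energy management tasks listed for isolated mode can be made concrete with a rule-based dispatch step. This is only an illustrative sketch under simplifying assumptions (lossless battery, a single diesel unit, fixed state-of-charge limits, load shedding as the last resort); the function name, parameters, and priority order are hypothetical and are not taken from the cited works.

```python
def dispatch_step(load_kw, renewable_kw, soc_kwh,
                  soc_min_kwh=20.0, soc_max_kwh=90.0,
                  diesel_max_kw=50.0, dt_h=1.0):
    """One time step of an islanded dispatch rule (illustrative only).

    Battery power is positive when discharging and negative when charging.
    """
    net_kw = load_kw - renewable_kw  # > 0: deficit, < 0: surplus
    if net_kw >= 0:
        # Cover the deficit from the battery first, without going below soc_min.
        battery_kw = max(min(net_kw, (soc_kwh - soc_min_kwh) / dt_h), 0.0)
        # Then from the diesel generator, up to its rating.
        diesel_kw = min(net_kw - battery_kw, diesel_max_kw)
        # Whatever remains must be shed on the load side.
        shed_kw = net_kw - battery_kw - diesel_kw
        curtailed_kw = 0.0
    else:
        # Store the surplus, without exceeding soc_max; curtail the rest.
        charge_kw = min(-net_kw, max((soc_max_kwh - soc_kwh) / dt_h, 0.0))
        battery_kw = -charge_kw
        diesel_kw = 0.0
        shed_kw = 0.0
        curtailed_kw = -net_kw - charge_kw
    return {
        "battery_kw": battery_kw,
        "diesel_kw": diesel_kw,
        "shed_kw": shed_kw,
        "curtailed_kw": curtailed_kw,
        "soc_kwh": soc_kwh - battery_kw * dt_h,
    }
```

In every branch the power balance renewable + battery + diesel + shed − curtailed = load holds, and the state of charge stays within its limits, which mirrors the balance and overcharge/overdischarge requirements described above.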
Fig. 2 Operation state of microgrid (state diagram; states: operation state of grid connection of microgrid, disassembly transition state of microgrid, transition state of grid connection, isolated operation state of microgrid, microgrid outage state, restoration of transition state of microgrid; controls: emergency de-alignment control, grid control, optimized and coordinated control, restore/connect control, restore control)
4 Optimal Configuration of Microgrid
4.1 A Microgrid Model
In the modeling of microgrid planning and design, reasonable optimization variables, objective functions, and constraints should be selected from different perspectives, such as technology, economy, and environment, according to load demand and distributed energy, and based on the quasi-steady operation model of each device, to form a mathematical description of the planning and design problem [11–13]. Generally, it can be expressed in the following form:

$$
\begin{aligned}
&\min f_i(X), \quad i = 1, 2, 3, \ldots \\
&\text{s.t.} \quad
\begin{cases}
G(X) = 0 \\
H(X) \le 0 \\
X \in \Omega
\end{cases}
\end{aligned}
\tag{1}
$$

where $X$ represents the optimization vector; $f_i$ is the $i$-th objective function; $\Omega$ is the feasible solution space; and $G$ and $H$ represent the sets of functions formed by the equality constraints and inequality constraints, respectively.

Due to the differences in design objectives, distributed power supply types, and operation characteristics in the planning and design stage, the model details of different microgrids differ significantly. Generally speaking, the optimization variables of microgrid planning and design mainly include the type [14–16], capacity [15–17], and location [18–21] of the distributed power supplies, energy storage devices, and the equipment contained in the combined cooling, heating, and power system. The tilt angle of the photovoltaic array, the hub height of the wind turbine, the type of scheduling strategy, and the position of the contact switch in the microgrid [22–30] can also be used as decision variables.

The objectives of microgrid planning and design can be the minimization of total system cost [31–50], the maximization of net investment income [37, 43–48], the minimization of pollutant emissions [31, 32, 38–40, 44], the maximization of system power supply reliability [31, 34, 38, 44], the minimization of system network loss [37, 45, 46, 51], and the minimization of fuel consumption [33, 34, 36, 37, 39, 44, 46–48, 52]. In the economic analysis of the microgrid, the costs mainly include the initial investment cost of distributed power supplies, energy storage, and controllers [31, 33, 34, 38, 44], operation and maintenance cost, equipment replacement cost, fuel cost, pollutant emission penalty [31, 33, 34, 44], power outage penalty [44], power purchase cost [33, 34, 36, 37, 41, 44–48, 52, 53], and so on. Project revenue mainly comes from the sale of electricity [41, 44–48, 53].
Because the system operation optimization strategy influences the planning and design of a microgrid, the constraints considered in the operation strategy usually also need to be taken into account when formulating the planning constraints. In addition, constraints specific to the planning and design problem itself must be considered. The constraints mainly include: the power balance constraint (electricity, cooling, and heat) of the microgrid [31–37, 39–48, 51–53], power flow constraints [32, 34, 37, 43, 44, 47, 51], thermal stability constraints [42, 45, 47], voltage constraints [32–34, 37, 43, 45, 47, 51], tie line power constraints [31, 34–37, 39, 42, 44–46, 51, 52], equipment operation constraints [36, 37, 42, 44, 52], and other constraints, such as tie line power fluctuation constraints [31, 34–37, 39, 40, 42, 44–46, 51, 52].
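To make the general form of the model more tangible, the sketch below instantiates it as a toy sizing problem with one cost objective and one inequality constraint, and solves it by exhaustive enumeration over discrete candidates. All numbers (unit costs, energy yield per kW, daily demand) and all function names are hypothetical and chosen only for illustration; they are not taken from the reviewed literature.

```python
from itertools import product

# Hypothetical data: unit costs and daily energy figures are illustrative only.
PV_COST_PER_KW = 1000.0
BATTERY_COST_PER_KWH = 300.0
PV_ENERGY_PER_KW = 4.0       # kWh supplied per day per installed kW
DAILY_DEMAND_KWH = 120.0


def total_cost(x):
    """Objective f(X): investment cost of the candidate configuration."""
    pv_kw, battery_kwh = x
    return PV_COST_PER_KW * pv_kw + BATTERY_COST_PER_KWH * battery_kwh


def supply_shortfall(x):
    """Inequality constraint H(X) <= 0: demand minus available energy."""
    pv_kw, battery_kwh = x
    return DAILY_DEMAND_KWH - (PV_ENERGY_PER_KW * pv_kw + battery_kwh)


def enumerate_best(pv_options, battery_options):
    """Check every candidate in the discrete feasible space and keep the cheapest."""
    feasible = [x for x in product(pv_options, battery_options)
                if supply_shortfall(x) <= 0]
    return min(feasible, key=total_cost, default=None)


best = enumerate_best([10, 20, 30], [0, 25, 50])
print(best, total_cost(best))
```

Enumeration is only practical when the candidate set is small; the search space grows multiplicatively with each additional decision variable, which is one reason heuristic and hybrid algorithms are widely used for this problem.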
4.2 Solutions to Microgrid Model
In essence, microgrid planning and design is a multi-scenario, multi-objective, nonlinear, mixed-integer planning problem with uncertainty.
Table 1 Standard solving algorithms used in the planning and design of microgrids
Common solving algorithm: the enumeration method [31, 35, 37, 52]
Heuristic algorithm: particle swarm optimization [31, 36, 42, 46–50, 53] and its improved algorithm [31, 42, 44, 46, 47, 52, 53]; genetic algorithm [33, 41, 44, 50, 51, 54] and its improved algorithm [33, 41, 44, 51, 54]; CS [32], BA [34], AFSA [35]; simulated annealing algorithm
Hybrid algorithm: particle swarm optimization [31, 42, 46–50, 52] + quadratic programming [31, 42, 44, 46, 47, 52, 53]; analog programming + tabu search; differential evolutionary algorithm + fuzzy multi-objective algorithm
To solve microgrid planning and design problems, the enumeration method [31, 35, 37, 52], the mixed-integer programming method [32–34, 36, 44, 48, 51, 53], heuristic algorithms [31, 41–43, 45, 46, 49, 50, 54], and hybrid algorithms [35, 37, 40, 44, 47] have each been studied. The optimization algorithms commonly used in the literature are listed in Table 1.
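Since particle swarm optimization is among the algorithms cited most often here, a compact sketch of its basic global-best form may help. This is a generic textbook-style variant, not any of the improved versions from the cited works; the parameter values (inertia 0.7, acceleration coefficients 1.5), the fixed random seed, and the quadratic penalty used to fold an inequality constraint H(X) ≤ 0 into the objective are assumptions made for illustration.

```python
import random


def penalized(cost, inequality, weight=1e3):
    """Fold a constraint inequality(x) <= 0 into the objective as a quadratic penalty."""
    return lambda x: cost(x) + weight * max(0.0, inequality(x)) ** 2


def pso(objective, bounds, n_particles=30, n_iter=200,
        w=0.7, c1=1.5, c2=1.5, seed=0):
    """Basic global-best particle swarm optimization (minimization)."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]
    for _ in range(n_iter):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Inertia + pull toward personal best + pull toward global best.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                lo, hi = bounds[d]
                # Keep the particle inside the box X in Omega.
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val
```

A call such as `pso(penalized(cost, shortfall), bounds=[(0, 50), (0, 200)])` would search a continuous sizing problem, where `cost` and `shortfall` are user-supplied functions. Penalty methods return slightly infeasible points when the weight is finite, so results should be checked against the original constraints.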
5 Research Status and Prospects
There has been much research on the planning and design of microgrids, but many key technologies still need further systematic study.
1. Planning and design research of the microgrid itself. The existing research results are relatively simple, and a comprehensive, systematic, and scientific planning and design method urgently needs to be established.
2. Coordinated planning of distribution network and microgrid. The influence of microgrid access should also be considered in the planning and expansion of the distribution network itself.
3. Economic analysis and planning of energy storage systems. In the market environment, the long-term economic analysis of the different roles of the energy storage system, and of the influence of its service life on its economic performance, still lacks convincing demonstration.
4. Planning and design of microgrids that include cooling and heating supply systems. For microgrids with comprehensive energy network characteristics, there is still a lack of detailed analysis of the optimal ratio of cooling, heating, and electricity for different structures and of the coupling characteristics between different energy flows.
6 Conclusion
This paper reviews the latest research achievements of domestic and foreign scholars in microgrid planning and design methods from a theoretical perspective. Based on the main contents of microgrid planning and design, it describes the modeling methods and solving algorithms used in current research and discusses possible future research directions. As planning and design methods for microgrids continue to deepen and improve, the microgrid will deliver greater value in practical applications.
References
1. M. Meng, S. Chen, S. Zhao, Z. Li, Y. Lu, A review of new energy microgrid research. Mod. Electr. Power 34(01), 1–7 (2017)
2. Z. Lu, C. Wang, Y. Min, S. Zhou, J. Lu, Y. Wang, A review of microgrid research. Power Syst. Autom. 19, 100–107 (2007)
3. G. Wang, Research overview of intelligent micro grid. China Electr. Ind. (Tech. Ed.) 02, 34–38 (2012)
4. J. Yang, X. Jin, X. Yang, X. Wu, Power control technology of ac–dc hybrid micro grid. Power Grid Technol. 41(01), 29–39 (2017)
5. Q. Zhang, Summary of key issues of micro-grid technology and its application. Sci. Technol. Commun. 9(05), 92–93+95 (2017)
6. H. Wang, G. Li, H. Li, B. Wang, Conversion method of grid-connected micro-grid and isolated island operation mode. China Electr. Power 45(01), 59–63 (2012)
7. Y. Wu, X. Zheng, S. Wu, B. Yan, Research on coordinated economic dispatching strategy in the grid-connected mode of micro-grid. Shaanxi Electr. Power 44(08), 6–11+16 (2016) (in Chinese)
8. L. Li, Research on control method under grid-connected operation mode of microgrid. Sci. Technol. Innov. Appl. 14, 132–133 (2019)
9. L. Fang, Y. Niu, S. Wang, T. Jia, Capacity configuration of micro-grid energy storage system based on day-ahead scheduling and real-time control. Power Syst. Prot. Control 46(23), 102–110 (2016)
10. B. Xue, Research on Operation Control Strategy of Multi-Micro grid Parallel Based on Improved Droop Control (North China University of Technology, 2018)
11. H. Huang, J. Zhou, Y. Dong, G. Zhu, Research on optimal scheduling and energy consumption model of integrated energy microgrid management system. Power Demand Side Manage. 05, 13–18 (2020)
12. Z. Wan, Analysis of simplified model of small and medium-sized microgrids in simulink example. Sci. Technol. Vis. 26, 64–67 (2020)
13. Z.-Y. Tian, D. Li, T.-Y. Li, Z. Zhang, Design and study of the optimal scheduling model for micro-grid energy under the constraint of safety and reliability. J. Shipbuilding Power Technol. 40(08), 43–47 (2020)
14. H. Li, Research on the Influence of Distributed Power Supply on the Utilization Rate of Distribution Network Equipment and Access Capacity (Hunan University, 2017)
15. Y. Tang, Planning and Research of Cold, Heat and Power Supply/Integrated Energy System (Southeast University, 2016)
16. H. Yu, X. Wang, D. Zhao, Calculation of access capacity of distributed power supply for the purpose of transforming the adaptability of distribution network equipment. Power Grid Technol. 40(10), 3013–3018 (2016)
17. Influence of distributed energy grid connection in China on distribution network (part I). Electr. Appl. Ind. 07, 18–26 (2020)
18. T. Wang, C. He, X. Zhou, H. Shao, G. Geng, X. Tan, Research on loss reduction method of distribution network based on location and capacity determination of distributed power supply. Renewable Energy 38(09), 1246–1251 (2020) (in Chinese)
19. Y. Liu, J. Wang, Q. Jiao, W. Zhao, L. Wang, Fault location of distribution network including distributed power supply based on quantum behavior particle swarm optimization. Smart Power 48(08), 51–55
20. D. Zhou, J. Li, Y. Zhang, G. Lv, W. Chen, J. Yu, Location capacity optimization of distributed power supply in distribution network based on improved BAS algorithm. Renewable Energy 38(08), 1092–1097 (2020)
21. J. Zhu, Research on the influence of distributed power supply access on the voltage quality of distribution network. Electron. World 12, 70–71 (2020)
22. M. Zhang, Research on Optimal Design Method of Photovoltaic Power Station Under Complex Terrain (North China Electric Power University, Beijing, 2018)
23. M. Dou, Z. Hua, L. Yan, S. Xie, D. Zhao, Research on the characteristic and efficient control method of cross-height fan. Micromotor 50(03), 39–42+53 (2017) (in Chinese)
24. G. Xu, Research on Integrated Optimal Dispatching of Electric Vehicles and Multi-source Micro-Grid (Lanzhou University of Technology, 2018)
25. S. Abu-elzait, R. Parkin, The effect of dispatch strategy on maintaining the economic viability of PV-based microgrids, in IEEE 46th Photovoltaic Specialists Conference (PVSC), Chicago, IL, USA (2019), pp. 1203–1205
26. D. Peng, H. Qiu, H. Zhang, H. Li, Research of multi-objective optimal dispatching for microgrid based on improved genetic algorithm, in Proceedings of the 11th IEEE International Conference on Networking, Sensing and Control, Miami, FL (2014), pp. 69–73
27. S. Chen, X. Li, Experimental study and feature improvement of DC series arc faults with switching noise interference, in IEEE Holm Conference on Electrical Contacts, Albuquerque, NM (2018), pp. 135–142
28. C. Konstantopoulos, E. Koutroulis, Global maximum power point tracking of flexible photovoltaic modules. IEEE Trans. Power Electron. 29(6), 2817–2828 (2014)
29. N. Heidari, J. Gwamuri, T. Townsend, J.M. Pearce, Impact of snow and ground interference on photovoltaic electric system performance. IEEE J. Photovoltaics 5(6), 1680–1685 (2015)
30. L. Powers, J. Newmiller, T. Townsend, Measuring and modeling the effect of snow on photovoltaic system performance, in 35th IEEE Photovoltaic Specialists Conference, Honolulu, HI (2010)
31. Z. Liao, Research on Optimal Operation of Micro-grid Based on Improved Particle Swarm Optimization (Nanchang University, 2019)
32. P. Wang, Research on Multi-Objective Optimization Operation of Micro-grid Based on Cuckoo Algorithm (North China Electric Power University, 2017)
33. K. Li, Optimization Scheduling of Micro-grid Based on Improved Genetic Algorithm (Xi'an University of Technology, 2018)
34. L. Zhang, Research on Micro-grid Optimization based on Improved Bat Algorithm (Xi'an University of Technology, 2018)
35. R. Liu, J. Zhang, G. Zhang, T. Liu, Optimal operation of microgrid based on adaptive artificial fish swarm algorithm. Power Grid Clean Energy 33(04), 71–76 (2017)
36. J. Li, G. Yi, H. Hu, J. Huang, Research on the coordinated optimization operation of micro grid based on improved chicken swarm algorithm. High Voltage Electr. Appl. 55(07), 203–210 (2019)
37. J. Tang, Research on Optimal Scheduling of Micro-grid based on Dynamic Fuzzy Chaotic Particle Swarm Optimization (Guangdong University of Technology, 2018)
38. X. Xie, Y. Niu, Y. Gao, Y. Liu, Q. Jia, H. Qu, Current protection setting method for microgrid based on improved particle swarm optimization algorithm. J. Yanshan Univ. 41(01), 39–44 (2017)
39. W. Xia, Research on Operation Optimization Model of Micro-grid Based on Improved Leapfrog Algorithm (Xiangtan University, 2017)
40. H. Yang, J. Wang, N. Tai, Y. Ding, Robust optimization of microgrid distributed power supply based on Grey target decision and multi-target Cuckoo algorithm. Power Syst. Prot. Control 47(01), 20–27 (2019)
41. Marine reserves. Optimization scheduling of microgrid based on genetic algorithm. Ind. Control Comput. 32(02), 151–153 (2019)
42. N. Li, J. Gu, B. Liu, H. Lu, Research on optimal configuration and operation strategy of multi-energy complementary microgrid system based on improved particle swarm optimization. Electr. Appl. Energy Effi. Manage. Technol. 18, 23–31 (2017)
43. H. Nie, W. Yang, X. Ma, H. Wang, Optimization scheduling of off-grid micro-grid based on improved bird swarm algorithm. J. Yanshan Univ. 43(03), 228–237 (2019)
44. K. Deng, Research on the Optimal Configuration of Micro-grid Power Supply Based on Hybrid Intelligent Optimization Algorithm (Taiyuan University of Technology, 2017)
45. H. Hu, J. Li, J. Huang, Research on optimal operation of microgrid based on improved chicken swarm algorithm. High Voltage Electr. Appl. 53(02), 19–25 (2017)
46. Y. Yang, Research on Optimization Operation of Micro-grid Based on Particle Swarm Optimization (Shenyang Institute of Engineering, 2018)
47. L. Wang, Research on optimal scheduling of isolated island microgrid based on particle swarm optimization. Electrotechnics 04, 55–57 (2020)
48. Y. Wei, B. Zhou, Z. Peng, Improve the NSGA-II algorithm based on interval number in micro grid optimization scheduling application. Power Capacitors React. Power Compensation 38(01), 117–122+132 (2017)
49. A. Askarzadeh, A memory-based genetic algorithm for optimization of power generation in a microgrid. IEEE Trans. Sustain. Energy 9(3), 1081–1089 (2018)
50. K. Rahbar, C.C. Chai, R. Zhang, Energy cooperation optimization in microgrids with renewable energy integration. IEEE Trans. Smart Grid 9(2), 1482–1493 (2018)
51. Z. He, R. Cheng, H. Yang, C. Lu, Research on optimization strategy of micro grid based on multi-agent genetic algorithm. Ind. Control Comput. 30(02), 126–128 (2017) (in Chinese)
52. R. Liu, Research on Optimal Operation of Micro-grid Based on Adaptive Artificial Fish Swarm Algorithm (Xi'an University of Technology, 2017)
53. Q. Wang, J. Zeng, J. Liu, J. Chen, Z. Wang, Research on distributed multi-objective optimization algorithm for micro-grid source-reservoir-load interaction. Chin. J. Electr. Eng. 40(05), 1421–1432 (2020)
54. J. Liu, Z. Lv, Q. Wang, J. Chen, Analysis of optimal configuration of independent micro-grid based on hybrid integer genetic algorithm. Electr. Appl. Energy Eff. Manage. Technol. 05, 65–70 (2019)
Chapter 27
Reducing Errors During Stock Value Prediction Q-Learning-Based Generic Algorithm
Rachna Yogesh Sable, Shivani Goel, and Pradeep Chatterjee
R. Y. Sable (corresponding author): GH Raisoni Institute of Engineering and Technology, Pune, Maharashtra, India
S. Goel: Computer Science and Engineering, School of Engineering and Applied Sciences, Bennett University, Greater Noida, Uttar Pradesh, India
P. Chatterjee: Head Digital Transformation, Change Management and Customer Experience, GDC, Tata Motors, Pune, Maharashtra, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022. R. Tiwari et al. (eds.), Proceedings of International Conference on Computational Intelligence, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-16-3802-2_27

1 Introduction
Predicting stock values from temporal data is a multi-domain problem that requires effective implementation of data cleaning, feature extraction, pattern recognition, classification, and post-processing. These operations depend on one another, and for a good predictor each of them must be designed with the utmost accuracy. A typical stock value prediction system is shown in Fig. 1, wherein the input stock data is given to a technical indicator evaluation block. In this block, indicators like the simple moving average (SMA), exponential moving average (EMA), double EMA, triple EMA, moving average convergence divergence (MACD), and relative strength index (RSI) are evaluated. These parameters are given to a clustering algorithm, which divides the data into different groups, each consisting of similarly performing stock data. Each of these groups is given to a different neural network (or data predictor), where the future data value is evaluated. A moving average of these different values yields the final predicted value of the stock. The final value is fed back into the system, and an error value is evaluated. The predictor is re-trained in order to reduce this error value. Upon successive iterations, the predictor is trained well enough that the prediction error is well within the defined bounds for the stock value.

Fig. 1 A typical stock market prediction system

For instance, if a stock ‘X’ has a current value of INR 100 per stock, and the future predicted value is INR 102, while the actual value should be INR 103, then the error value is INR 1, which is around 1% of the stock value. If the system is able to produce an error that is less than ‘Y’% of the stock price, and that is acceptable to the analysts who are using the system, then the system is defined as a ‘trained’ system. The error is evaluated using the following formula:

$$E_{stock} = \frac{|V_{Actual} - V_{Predicted}|}{V_{Current}} \times 100 \tag{1}$$

where $E_{stock}$ is the percentage error in the predicted value, $V_{Actual}$ is the actual future stock value, $V_{Predicted}$ is the predicted stock value, and $V_{Current}$ is the current value of the stock. Once the system is trained, testing data is applied to it in order to predict the final value of the stock. To predict the inter-day value of a stock, the input stock data must be quantized to inter-day values, while for intraday prediction the input stock data must be quantized to intraday values. Based on this quantization, the next successive value of the stock is predicted. Researchers have worked on different algorithms for the prediction of stock values from historical temporal data. The next section reviews some recent algorithms that perform this task and evaluates their performance; it is followed by the proposed algorithm and its statistical analysis. This text concludes with some interesting observations about the proposed algorithm and methods that can be applied to further improve its performance.
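The percentage error and two of the technical indicators named in this introduction can be written down in a few lines. This is a hedged sketch: the function names are hypothetical, and the EMA uses the common smoothing factor 2 / (span + 1) seeded with the first value, which is one convention among several and is not specified in the text.

```python
def sma(values, window):
    """Simple moving average over a sliding window."""
    return [sum(values[i - window + 1:i + 1]) / window
            for i in range(window - 1, len(values))]


def ema(values, span):
    """Exponential moving average, seeded with the first value (assumed convention)."""
    alpha = 2.0 / (span + 1)
    out = [values[0]]
    for v in values[1:]:
        out.append(alpha * v + (1 - alpha) * out[-1])
    return out


def percent_error(actual, predicted, current):
    """Percentage error: |V_Actual - V_Predicted| / V_Current * 100."""
    return abs(actual - predicted) / current * 100.0


# The worked example from the text: current INR 100, predicted 102, actual 103.
print(percent_error(actual=103, predicted=102, current=100))
```

Note that the error is normalized by the current value rather than the actual future value, as the formula in the text specifies.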
2 Literature Review
To evaluate the different algorithms used for stock value prediction, this section summarizes them in terms of parameters relevant to each algorithm. The parameters selected are those common to all of the algorithms: accuracy of prediction, average delay during prediction, F-measure, and algorithm details. These measures are summarized in Table 1. From this evaluation, it is evident that machine learning algorithms like the genetic algorithm (GA), hybrid classification techniques like ICA-CCA-SVR, and regression methods like LSSVR are better for both long-term and short-term prediction of stocks. Moreover, these methods can be used for effective inter-day and intraday prediction of stocks. Average delay was measured as the ratio of the number of seconds needed to predict one value to the number of records in the dataset:

$$D_{avg} = \frac{D_{one}}{N} \tag{2}$$

where $D_{avg}$ is the average delay, $D_{one}$ is the delay in predicting one value, and $N$ is the number of records in the dataset. The F-measure is evaluated in the standard manner, using the following formula:

$$f = 2 \times \frac{Precision \times Recall}{Precision + Recall} \tag{3}$$

The values of precision and recall are evaluated using the standard true positive, true negative, false positive, and false negative sample counts. This comparison shows that the accuracy of the genetic algorithm can be further improved by hybridization with other methods like ARIMA and deep learning. In the next section, a novel model that combines GA with Q-learning is proposed, which improves the overall performance of the existing GA-based method in terms of accuracy and F-measure.
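The two evaluation measures defined above can be computed directly from their definitions. The sketch below is illustrative only; the function names and the convention of returning 0.0 when a denominator is zero are assumptions, and precision and recall use their standard definitions from true positive, false positive, and false negative counts.

```python
def average_delay(seconds_for_one_prediction, n_records):
    """Average delay: D_avg = D_one / N."""
    return seconds_for_one_prediction / n_records


def f_measure(tp, fp, fn):
    """F-measure: f = 2 * Precision * Recall / (Precision + Recall)."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0  # assumed convention when no positives are found
    return 2 * precision * recall / (precision + recall)
```

True negatives do not enter the F-measure, which is why the function takes only three counts.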
3 Proposed Q-Learning-Based GA for Stock Value Prediction (QGA)
The proposed Q-Learning-based genetic algorithm for stock value prediction is shown in the block diagram in Fig. 2. Here, the input stock data is given to a GA block for effective feature selection. The selected features
Table 1 Analysis of reviewed methods

| Algorithm under research | Method | Avg. delay | F-measure | Accuracy (%) |
| --- | --- | --- | --- | --- |
| Multi-source multiple instances [1] | Combination of technical indicators, news feeds, and tweets about the stock are combined, and a neural network is used for classification | M | M | 62 |
| LDA-POS [2] | Language processing with latent discriminative analysis (LDA) and parts of speech (POS) features for improved pattern analysis | L | L | 66 |
| NN with disparate data sources [3] | Neural network training and evaluation with data from Google news, current and previous stock values and wiki articles about the stock. All this data is evaluated with different testing set values | H | M | 85 |
| LSTM NN [4] | An extension to [3], where a large memory unit is used for training and evaluation of data from Google trends and technical indicators | H | M | 60 |
| ARIMA [5] | Value-based prediction using the autoregressive integrated moving average (ARIMA) model for reduced delay and good theoretical accuracy. Similar to [6] and adds the ARIMA model for prediction. The addition of the ARIMA model improves the accuracy, but also adds to the delay of prediction. This work in combination with [6] boosts the development of the underlying research | H | H | 95 |
| Heterogeneous information fusion [7] | A support vector machine is trained for fusing text and value-based features for improved accuracy of stock price prediction | M | M | 86 |
| SVR [8] | Regression analysis is done on stock values for inter-day and intraday. These values assist in trend prediction, and the trend is used for the evaluation of the next stock value | M | M | 80 |
| GA [9] | Genetic algorithm is used for direct stock prediction using technical indicators. This work is the motivation behind the underlying research and improves its performance using a more sophisticated Q-Learning-based approach | M | H | 94 |
| Mean Profit Rate (MPR) [10] | A new indicator named MPR is proposed in this work, which improves the short-term prediction accuracy. The MPR value is tested on limited datasets and thus must be explored for larger sets as well | M | M | 95 |
| MFNN-based LSTM [6] | This is an extension to [4], wherein a multi-filters neural network (MFNN) model is added for better training. But this adds to the overall complexity of prediction and thereby must be optimized for real-time operations | H | M | 67 |
Hybrid stock selection [11]
Here, multiple kinds of stock L indicators are combined together to formulate a single indicator. The single hybrid indicator can be a good theoretical evaluation indicator, but reduces the accuracy of prediction and thereby is not recommended for long-term prediction
L
55
DNN [12]
A true deep learning-based H prediction engine is defined here, which utilizes more than 13 different technical indicators. These indicators are combined together in order to form the final feature vector, which is given to the deep net for classification. Deep nets add to the computational complexity, but retains a very high f -measure value for long-term prediction
H
75
LSSVR [13]
A least squares support vector H regression (LSSVR) method is proposed, which utilizes the concepts of value regression and support vector machines in order to predict the future value of a stock. This is performed with the help of a large real-time dataset. The analysis of this algorithm is done on the basis of both accuracy of value and accuracy of trend prediction
M
90
(continued)
326
R. Y. Sable et al.
Table 1 (continued) Algorithm under research
Method
Avg. delay F-measure Accuracy (%)
TSGRUN [14]
Another deep learning-based H method is described in this text. Here, two-stream gated recurrent unit network (TSGRUN) which utilizes a combination of two-layered neural network along with recurrent neural networks to predict the future value of a stock is described. It reduces the speed of operation, but maintains a high level of accuracy of the system
H
79
Correlation model [15]
Here a correlation-based model is L used that evaluates cross-correlation between technical indicators and the final stock price. This correlation value is fine-tuned to an extent that for any new input stock value set the output stock value should be accurate enough. The algorithm is computationally lightweight, but has low accuracy
L
70
CEFLANN [16] This work uses a computational M efficient functional link artificial neural network (CELFANN) that introduces computationally effective activation and weight evaluation functions to the ANN. Due to which the overall computational complexity reduces, and the training delay is reduced. But this also reduces the accuracy of inter-day and intraday prediction, which can be improved using larger training and testing datasets
L
75
ANN [17]
M
70
This is a subset of the work done in H [18] and utilizes a backpropagation feedforward ANN to achieve good accuracy. But the long-term prediction performance is moderate and can be improved using better training feature values
(continued)
27 Reducing Errors During Stock Value Prediction Q-Learning …
327
Table 1 (continued) Algorithm under research
Method
Avg. delay F-measure Accuracy (%)
SVM + RF [19] A combination of support vector machines and random forest is proposed in this method. It processes numerical data using random forests, and textual data using support vector machines to predict the final value of the stock. Due to this combination, the stock trend and stock price are predicted with good accuracy
H
M
80
ICA-CCA-SVR A combination of independent [18] component analysis, canonical correlation analysis, and support vector regression is used for prediction of stock value. The combination showcases a good prediction performance and thus can be used for real-time deployment
M
H
90
Fig. 2 Flow diagram of the proposed prediction system
are given to a Q-learning-based predictor, that analyzes these patterns and provides the final stock value at the output. The process flow for the genetic algorithm for feature selection can be observed from the following steps, • Input: – – – –
– Stock value indicators ($N$ indicators)
– Number of iterations ($N_i$)
– Number of solutions ($N_s$)
– Mutation factor ($m$)
• Initially mark all solutions as 'to be changed,' and set all weights to 1
• Process:
– For each iteration in 1 to $N_i$
– For each solution in 1 to $N_s$
  If the solution is marked as 'to be changed,' then select $k$ random indicators from the set of indicators
  Apply the following formula to evaluate the stock value,

  $$V_{stock} = \sum_{i=1}^{k} I_i \cdot w_i \qquad (4)$$

  where $V_{stock}$ is the predicted value of the stock, $I_i$ is the value for the $i$th indicator, and $w_i$ is the Q-learning weight for the $i$th indicator, which is evaluated using Eq. 4.1,

  $$w_i = \frac{\operatorname{var}(I_i)}{\max \bigcup_{j=1}^{N} \operatorname{var}(I_j)} \qquad (4.1)$$

  where var is the variance of the feature, and $\bigcup$ indicates the union of variances for the features.
  Now evaluate the fitness using the following formula,

  $$F = |V_{stock} - V_{actual}| \qquad (5)$$

  Store this fitness value into an array
  Find the mean of all fitness values, and then evaluate the fitness threshold as follows,

  $$F_{th} = \frac{\sum F}{N_s} \cdot m \qquad (6)$$

  Discard all solutions where the fitness value is more than $F_{th}$, and mark them as 'to be changed'
– At the end of the final iteration, select the solution which has the least fitness value, which indicates the least error
• Output
– The selected solution has the least error, so select the indicators which are used to make this solution.

Based on the selected indicators, a Q-learning algorithm is evaluated. This algorithm assists the genetic algorithm by providing the best weights that can be used
for prediction. The following process is followed in order to evaluate the Q-learning algorithm:

• Input
– Previous weights '$w$' for each of the technical indicators
– Current price of the stock
– Number of iterations for each weight ($N_i$)
• Process
– For each weight '$i$,' do the following:
  Go to each iteration, and evaluate the stock value using Eq. 4
  Now, evaluate the error using the following equation,

  $$E = V_{stock} - V_{actual} \qquad (7)$$

  If $E > 0$, then reduce the weight $w_i$ by a penalty factor of '$m$,' else increase the weight $w_i$ by a reward factor '$m$' (provided by Bellman Eq. 8)

  $$m = \alpha \cdot w_i + (1 - \alpha) \cdot w_{i-1} \qquad (8)$$

  where $\alpha$ is the Q-learning factor having a value between 0 and 1. At the end of all the iterations for all the features, we get the final modified values of the weights, which are given back to the genetic algorithm.
• Output
– Updated weights as per the actual value of the stock

The output of Q-learning is again given to the genetic algorithm, and then Q-learning is evaluated again. This process is repeated until a constant value of error is obtained and the algorithm converges. Due to the simplistic calculations, the time complexity of the algorithm per iteration is very low. Moreover, due to the reduction in the error function, the final accuracy of prediction is very high. This results in a high value of f-measure. The following indicators were used in order to evaluate the final value of the stock:

• Simple moving average (SMA), exponential moving average (EMA), double EMA (DEMA), triple EMA (TEMA)
• Moving average convergence divergence (MACD)
• Relative strength indicator (RSI), maximum RSI, minimum RSI, stochastic RSI
• Commodity channel index (CCI)
• Williams %R (WillR)
• Rate of change (ROC), volume ROC (VROC)
• On balance volume (OBV)
• Average true range (ATR)
• Standard deviation (SD)
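The feature-selection quantities of Eqs. (4) to (6) can be sketched as follows. This is a hedged illustration, not the authors' implementation: the function names are hypothetical, Eq. (4.1) is read as "variance of an indicator divided by the largest variance among all indicators," and the population variance is an assumed choice.

```python
import statistics


def variance_weights(indicator_history):
    """Eq. (4.1), as read here: w_i = var(I_i) / max_j var(I_j).

    indicator_history holds one list of past values per indicator.
    """
    variances = [statistics.pvariance(series) for series in indicator_history]
    largest = max(variances)
    if largest == 0:
        # Assumption: fall back to the initial weights of 1 when nothing varies.
        return [1.0] * len(variances)
    return [v / largest for v in variances]


def predict(indicators, weights):
    """Eq. (4): V_stock = sum of I_i * w_i over the selected indicators."""
    return sum(value * w for value, w in zip(indicators, weights))


def fitness(v_stock, v_actual):
    """Eq. (5): absolute prediction error (lower is better)."""
    return abs(v_stock - v_actual)


def fitness_threshold(fitness_values, m):
    """Eq. (6): mean fitness over all solutions, scaled by the mutation factor m."""
    return sum(fitness_values) / len(fitness_values) * m


def mark_to_be_changed(fitness_values, m):
    """Flag every solution whose fitness exceeds the threshold F_th."""
    threshold = fitness_threshold(fitness_values, m)
    return [f > threshold for f in fitness_values]
```

With fitness values 1, 2, and 3 and m = 0.5, the threshold is 1, so the second and third solutions are marked as 'to be changed' and regenerated in the next iteration.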
The standard evaluation formulas for these indicators are readily available at https://www.tradingtechnologies.com/xtrader-help/x-study/technical-indicator-definitions/. This is a very good source of information for more technical indicators. It can be used by researchers in order to obtain a more accurate model for stock prediction. In the next section, the result evaluation and comparison for the given model are done, followed by the observations and future work for the proposed algorithm.
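As an illustration of how two of the listed indicators are commonly computed, the sketch below implements a simple and an exponential moving average. The function names are hypothetical, and the EMA conventions (smoothing factor 2/(window + 1), seeded with the first price) are assumptions; the referenced definitions may use a different seed.

```python
def sma(prices, window):
    """Simple moving average: mean of the last `window` prices at each step."""
    if window <= 0 or window > len(prices):
        raise ValueError("window must be between 1 and len(prices)")
    return [
        sum(prices[end - window:end]) / window
        for end in range(window, len(prices) + 1)
    ]


def ema(prices, window):
    """Exponential moving average with smoothing factor 2 / (window + 1).

    Assumption: the series is seeded with the first price.
    """
    if not prices:
        return []
    alpha = 2 / (window + 1)
    values = [float(prices[0])]
    for price in prices[1:]:
        values.append(alpha * price + (1 - alpha) * values[-1])
    return values
```

Each such series yields one indicator value $I_i$ per record, which is what Eq. 4 combines into the predicted stock value.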
4 Results and Comparisons

In order to evaluate the results of the proposed algorithm, the BSE30 dataset was used. The dataset consists of more than 70k records, taken over a period of 1 year for the top 30 Bombay Stock Exchange companies. The evaluation was done with respect to the number of records used for training, and the results are shown in Table 2. From the table, it is evident that the proposed QGA outperforms the simple GA, LSSVR, and ARIMA models in terms of f-measure and accuracy values.

Table 2 Result comparison for different algorithms

| Training set size (records) | Algorithm | Avg. delay | F-measure | Accuracy (%) |
|---|---|---|---|---|
| 10k | GA | M | M | 90 |
| 10k | ARIMA | M | M | 90 |
| 10k | LSSVR | H | H | 91 |
| 10k | ICA-CCA-SVR | H | H | 93 |
| 10k | Proposed | M | H | 96 |
| 20k | GA | M | M | 91 |
| 20k | ARIMA | M | M | 91 |
| 20k | LSSVR | H | H | 93 |
| 20k | ICA-CCA-SVR | H | H | 93 |
| 20k | Proposed | H | H | 97 |
| 50k | GA | M | M | 93 |
| 50k | ARIMA | M | M | 93 |
| 50k | LSSVR | H | M | 93 |
| 50k | ICA-CCA-SVR | H | H | 95 |
| 50k | Proposed | H | H | 97.5 |
| 70k | GA | M | M | 93 |
| 70k | ARIMA | M | M | 95 |
| 70k | LSSVR | H | M | 95 |
| 70k | ICA-CCA-SVR | H | H | 95 |
| 70k | Proposed | H | H | 98.2 |
Due to the high number of iterations needed during both Q-learning and genetic optimization, the overall time complexity of the algorithm increases. This is evident from the high average delay encountered during the evaluation of the algorithm. At the same time, the accuracy of prediction is very high: the prediction error is reduced to less than 2%, and to less than 10% when compared with other methods.
5 Conclusion and Future Scope

From the result evaluation, it is evident that the proposed model is highly effective in terms of accuracy of prediction and has a high f-measure when compared with the existing state-of-the-art methods. However, due to the high number of iterations needed for convergence, the overall delay of prediction increases as the size of the dataset increases. It can be observed that the overall prediction error reduces to less than 2%, while the comparative prediction error reduces by 10% when compared with other state-of-the-art algorithms. In the future, it is recommended that researchers use faster algorithms and also evaluate a greater number of technical indicators in order to further improve the accuracy of prediction. Moreover, it is also recommended that testing and validation be done on a larger dataset for more promising results.
References

1. X. Zhang, S. Qui, J. Huang, B. Fang, P. Yu, Stock market prediction via multi-source multiple instance learning. 6 (2018). https://doi.org/10.1109/ACCESS.2018.2869735
2. A. Derakhshan, H. Beigy, Sentiment analysis on stock social media for stock price movement prediction (2019)
3. B. Weng, M.A. Ahmed, F.M. Megahed, Stock market one-day ahead movement prediction using disparate data sources. 79, 153–163 (2017)
4. D. Faustryjak, L. Jackowska Strumiłło, M. Majchrowicz, Forward forecast of stock prices using LSTM neural networks with statistical analysis of published messages
5. S.M. Idrees, M. Afshar Alam, P. Agarwal, A prediction approach for stock market volatility based on time series data (2018). ISSN 2169-3536
6. W. Long, Z. Lu, L. Cui, Deep learning-based feature engineering for stock price movement prediction. 164, 163–173 (2019)
7. X. Zhang, Y. Zhang, S. Wang, Y. Yao, B. Fang, P.S. Yu, Improving stock market prediction via heterogeneous information fusion. https://doi.org/10.1016/j.knosys.2017.12.025
8. B.M. Henrique, V.A. Sobreiro, H. Kimura, Stock price prediction using support vector regression on daily and up to the minute prices. 4(3), 183–201 (2018). https://doi.org/10.1016/j.jfds.2018.04.003
9. T. Xia, Q. Sun, A. Zhou, S. Wang, S. Xiong, S. Gao, J. Li, Q. Yuan, Improving the performance of stock trend prediction by applying GA to feature selection (2018)
10. G. Liu, X. Wang, A new metric for individual stock trend prediction (2019)
11. F. Yang, Z. Chen, J. Li, L. Tang, A novel hybrid stock selection method with stock prediction. 80, 820–831 (2019)
12. O.B. Sezer, M. Ozbayoglu, E. Dogdu, A deep neural-network based stock trading system based on evolutionary optimized technical analysis parameters. Procedia Comput. Sci. 114, 473–480 (2017)
13. J.-S. Chou, T.-K. Nguyen, Forward forecast of stock price using sliding-window metaheuristic-optimized machine learning regression (2017)
14. L. Minh Dang, A. Sadeghi-Niaraki, H.D. Huynh, K. Min, H. Moon, Deep learning approach for short-term stock trends prediction based on two-stream gated recurrent unit network (2018). ISSN 2169-3536
15. A. Nayak, M.M. Manohara Pai, R.M. Pai, Prediction models for Indian stock market. Procedia Comput. Sci. 89, 441–449 (2016)
16. R. Dash, P.K. Dash, A hybrid stock trading framework integrating technical analysis with machine learning techniques (2016). ISSN 2405-9188
17. A.H. Moghaddam, M.H. Moghaddam, M. Esfandyari, Stock market index prediction using artificial neural network (2016). ISSN 2077-1886
18. J. Kalyani, H.N. Bharathi, R. Jyothi, Stock trend prediction using news sentiment analysis. 8(3) (2016)
19. Z. Guo, H. Wang, Q. Liu, J. Yang, A feature fusion based forecasting model for financial time series. 9(6), e101113 (2014)
Chapter 28
An Adaptive Stochastic Gradient-Free Approach for High-Dimensional Blackbox Optimization

Anton Dereventsov, Clayton G. Webster, and Joseph Daws
A. Dereventsov (✉) · J. Daws: Lirio AI Research, Lirio LLC, Knoxville, TN 37923, USA
C. G. Webster: Lirio AI Research and Behavioral Reinforcement and Learning Lab (BReLL), Lirio LLC, Knoxville, TN 37923, USA
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022. R. Tiwari et al. (eds.), Proceedings of International Conference on Computational Intelligence, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-16-3802-2_28

1 Introduction

In this work, we introduce a novel adaptive stochastic gradient-free (ASGF) approach for blackbox optimization. On several challenging benchmark problems, the ASGF method achieves improved performance when compared with existing gradient-free optimization (GFO) schemes (including those sometimes referred to as evolutionary strategies [1–5]), as well as iterative approaches that rely on the gradient information of the objective function. This new technique is designed to alleviate some of the more difficult challenges associated with successful deployment of machine learning models for complex tasks, namely: (i) high-dimensionality; (ii) nonconvexity and extraneous local extrema; and (iii) extreme sensitivity to hyperparameters.

Data sets, models, and network architectures have increased to gargantuan size and complexity, see, e.g., [6, 7]. In these settings, the efficiency of backpropagation and automatic differentiation is diminished. In addition, the use of iterative approaches that rely on the gradient information of the objective function for training can lead to very poor performance, due to extraneous local optima of the loss landscape [8]. Such methods also perform poorly on machine learning tasks with non-differentiable objective functions, e.g., reinforcement learning tasks. GFO methods, i.e., those that rely solely on function evaluations, are well suited to address these issues, and recently, significant advances have been made toward conquering these challenges [2, 4, 5, 9–11]. These methods are sometimes referred to as "evolutionary strategies (ES)," which are a class of algorithms inspired by natural evolution [12]. Recently, interest in ES methods has been reinvigorated, and they have become a popular approach in several machine learning problems such as training neural networks [8, 13, 14] and reinforcement learning [5, 15–17]. As such, these methods are particularly useful when solving optimization problems related to nonconvex and nonsmooth objective functions.

However, there are several ingredients necessary to execute such methods successfully on blackbox optimization problems, including: the computation of efficient search directions; the size of the smoothing radius; the number and type of samples used to evaluate the function; and parameters related to the iterative update and candidate solutions. More importantly, the lack of generalization cannot be overlooked: the fact that solutions to most blackbox optimization problems require significant hyperparameter tuning renders the results computed with such methods highly problem dependent. In contrast, the ASGF approach can be successfully applied to a wide class of optimization problems using a fixed set of hyperparameters. Although more sophisticated methods than brute-force searches have been explored for identifying good hyperparameters, e.g., [18], an approach that diminishes this process is highly desirable.

The major contribution of this paper is the design and application of ASGF to challenging nonconvex optimization problems. In particular, our approach is designed to be adaptive, gradient-free, massively parallelizable and scalable, and easily tuned with simple choices of hyperparameters. Moreover, ASGF accelerates convergence toward the global extrema by exploiting an innovative iterative procedure. At each step, we primarily follow the direction of an approximate gradient of a smoothed function, while still maintaining the domain exploration by randomly generated search directions. In addition, we explore the local geometry of the objective function and then adaptively adjust the parameter values at each iteration.
1.1 Related Works

GFO approaches include a large class of techniques to optimize an objective function based only on function values. For example, see [19, 20] for a general review on the topic. A particular class of GFO algorithms, referred to as evolutionary strategies [12], combines Gaussian smoothing [21, 22] and random search techniques [23, 24], which have been applied to a large class of learning tasks including, e.g., reinforcement learning [5, 15–17, 25], as well as training and optimizing neural networks [8, 13, 14]. Moreover, these strategies have been shown to be competitive when compared to iterative approaches that rely on the gradient information of the objective function [26–30] and have been further improved by employing either adaptation [1–5] or orthogonal directional derivatives [9, 11]. Our work herein combines the advantages of both approaches in order to overcome several of the grand challenges associated with solving high-dimensional nonconvex optimization problems.
2 Background

Our objective is to solve for the global extrema of a high-dimensional nonconvex objective function $f : \mathbb{R}^d \to \mathbb{R}$. Without loss of generality, we consider the unconstrained blackbox optimization problem, parameterized by a $d$-dimensional vector $x = (x_1, \dots, x_d) \in \mathbb{R}^d$, i.e.,

$$\min_{x \in \mathbb{R}^d} f(x). \qquad (1)$$

Throughout this effort, we assume that $f(x)$ is only available by virtue of function evaluations, and the gradient $\nabla f(x)$ is inaccessible; thus (1) is typically solved with a derivative- or gradient-free optimization (GFO) method [19, 20, 22, 31, 32].
2.1 Gaussian Smoothing for GFO Methods

Similarly to the so-called evolutionary strategies [3, 5, 9, 10], we introduce the notion of Gaussian smoothing [22, 33] of the objective function $f(x)$ in (1). Let $\sigma > 0$ be the smoothing parameter and denote by $f_\sigma(x)$ the Gaussian smoothing of $f$ with radius $\sigma$, i.e.,

$$f_\sigma(x) = \frac{1}{\pi^{d/2}} \int_{\mathbb{R}^d} f(x + \sigma\varepsilon)\, e^{-\|\varepsilon\|_2^2}\, d\varepsilon = \mathbb{E}_{\varepsilon \sim \mathcal{N}(0, I_d)}\big[ f(x + \sigma\varepsilon) \big].$$

We remark that $f_\sigma(x)$ preserves important features of the objective function including, e.g., convexity and the Lipschitz constant, and is always differentiable even when $f(x)$ is not. Furthermore, since $f - f_\sigma$ can be bounded by the Lipschitz constant, problem (1) can be replaced by the smoothed version, i.e., $\min_{x \in \mathbb{R}^d} f_\sigma(x)$ (see, e.g., [11] and references therein). The gradient of $f_\sigma(x)$ can be computed as

$$\nabla f_\sigma(x) = \frac{2}{\sigma \pi^{d/2}} \int_{\mathbb{R}^d} \varepsilon\, f(x + \sigma\varepsilon)\, e^{-\|\varepsilon\|_2^2}\, d\varepsilon = \frac{2}{\sigma}\, \mathbb{E}_{\varepsilon \sim \mathcal{N}(0, I_d)}\big[ \varepsilon\, f(x + \sigma\varepsilon) \big]. \qquad (2)$$

Then for $M \in \mathbb{N}_+$, $\{\varepsilon_m\}_{m=1}^{M} \overset{iid}{\sim} \mathcal{N}(0, I_d)$ and the learning rate $\lambda \in \mathbb{R}$, traditional GFO methods estimate (2) via Monte Carlo (MC) sampling and provide an iterative update to the state $x$, given by

$$\nabla f_\sigma(x) \approx \frac{2}{\sigma M} \sum_{m=1}^{M} \varepsilon_m f(x + \sigma\varepsilon_m) \quad \text{and} \quad x_{i+1} = x_i - \frac{2\lambda}{\sigma M} \sum_{m=1}^{M} \varepsilon_m f(x_i + \sigma\varepsilon_m), \qquad (3)$$

respectively [5]. The primary advantages of such GFO approaches are that they are easy to implement, embarrassingly parallelizable, and can be easily scaled to include a large number of workers. On the other hand, MC methods (see, e.g., [34])
suffer from slow convergence rates, proportional to $M^{-1/2}$, even though such rates are independent of dimension $d$. Minor improvements could be expected with the use of quasi-MC sampling [35] or even sparse grid approximations [36, 37]; however, the combination of high-dimensional domains and nonconvex objective functions makes all such GFO strategies only amenable to solving blackbox optimization problems in low to moderate dimensions.

Directional Gaussian smoothing via Monte Carlo sampling. A promising attempt to improve the efficiency and accuracy of the MC gradient estimate (3) is to consider decoupling the problem (2) along $d$ orthogonal directions [9]. The gradient can be estimated by virtue of, e.g., an antithetic orthogonal sampling, i.e.,

$$\nabla f_\sigma(x) \approx \frac{1}{\sigma M} \sum_{j=1}^{M} \varepsilon_j \big[ f(x + \sigma\varepsilon_j) - f(x - \sigma\varepsilon_j) \big],$$

where $\{\varepsilon_j\}_{j=1}^{M}$ are marginally distributed as $\mathcal{N}(0, I_d)$, and the joint distribution of $\{\varepsilon_j\}_{j=1}^{M}$ is defined as follows: if $M \le d$, then the vectors are conditioned to be orthogonal almost surely. If $M > d$, then each consecutive set of $d$ vectors is conditioned to be orthogonal almost surely, with distinct sets of $d$ vectors remaining independent. Using the orthogonal directions, as opposed to the MC directions (as in (3)), improves the overall performance when approximating (2); however, the MC approximation along each orthogonal direction hinders convergence, and the performance of such methods suffers as the dimension increases.

Directional Gaussian smoothing via Gauss-Hermite quadrature. An efficient approach for computing the decoupled integrals with spectral accuracy is to employ one-dimensional Gauss-Hermite quadrature. This can be accomplished by letting $\Xi := (\xi_1, \dots, \xi_d)$ be an orthonormal basis in $\mathbb{R}^d$ and by computing directional derivatives $\partial f_\sigma(x)/\partial \xi_j$ of $f_\sigma$ at point $x$ in the direction $\xi_j$, estimated as

$$\frac{\partial f_\sigma(x)}{\partial \xi_j} \approx \nabla f_\sigma(x \,|\, \xi_j) := \frac{2}{\sigma\sqrt{\pi}} \int_{\mathbb{R}} v\, f(x + \sigma v \xi_j)\, e^{-v^2}\, dv.$$

Then the directional derivatives $\nabla f_\sigma(x \,|\, \xi_j)$ can be computed via Gauss-Hermite quadrature with $m_j \ge 3$ quadrature points, i.e.,

$$\nabla f_\sigma(x \,|\, \xi_j) \approx \frac{2}{\sigma\sqrt{\pi}} \sum_{m=1}^{m_j} w_m\, p_m\, f(x + \sigma p_m \xi_j), \qquad (4)$$

where $p_m$ are the roots of the Hermite polynomial of degree $m_j$ and $w_m$ are the corresponding weights (see, e.g., [38]). Once the directional derivatives are computed, the estimate of the gradient of the smoothed function $f_\sigma$ at point $x$ can be computed as

$$\nabla f_\sigma(x) = \sum_{j=1}^{d} \nabla f_\sigma(x \,|\, \xi_j)\, \xi_j. \qquad (5)$$
This approach is considered in [11] for applications to nonconvex blackbox optimization and later in the context of RL tasks [17]. However, as described in Sect. 1, this technique requires significant hyperparameter tuning, which necessitates the development of our fully adaptive stochastic gradient-free method.
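The quadrature estimates (4) and (5) translate directly into NumPy, whose `numpy.polynomial.hermite.hermgauss` returns nodes and weights for the weight function $e^{-v^2}$ used above. The sketch below is an illustration under stated assumptions rather than the authors' code: the function names are hypothetical, the same number of quadrature points is used in every direction, and the basis is passed as a matrix whose rows are the directions $\xi_j$.

```python
import numpy as np


def directional_derivative(f, x, xi, sigma, num_points=5):
    """Eq. (4): Gauss-Hermite estimate of the derivative of f_sigma along xi."""
    # Nodes p_m and weights w_m for the weight function exp(-v^2).
    p, w = np.polynomial.hermite.hermgauss(num_points)
    values = np.array([f(x + sigma * pm * xi) for pm in p])
    return 2.0 / (sigma * np.sqrt(np.pi)) * np.sum(w * p * values)


def smoothed_gradient(f, x, basis, sigma, num_points=5):
    """Eq. (5): combine the directional derivatives along the rows of `basis`."""
    derivatives = np.array(
        [directional_derivative(f, x, xi, sigma, num_points) for xi in basis]
    )
    return derivatives @ basis
```

For a quadratic such as $f(x) = \|x\|_2^2$ the quadrature is exact, and the estimate equals $2x$ for any $\sigma > 0$, which makes a convenient sanity check.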
3 The Adaptive Stochastic Gradient-Free Method (ASGF)

Before going into the details, we roughly outline the general flow of the algorithm. At the beginning of an iteration $i$, the search directions $\Xi$ and smoothing parameter $\sigma$ are used to compute derivatives $\nabla f_\sigma(x_i \,|\, \xi_j)$ along the directions $\xi_j$ according to (4), and the gradient surrogate $\nabla f_\sigma(x_i)$ is estimated by (5). The learning rate $\lambda$ is then selected based on certain local properties of the objective function, and the candidate minimizer is updated by a step of gradient descent: $x_{i+1} = x_i - \lambda \nabla f_\sigma(x_i)$. Finally, we update the search directions $\Xi$ and smoothing parameter $\sigma$, and proceed to the next iteration. In the following sections, we describe in detail each part of the process.
3.1 The ASGF Algorithm

A central feature of the ASGF approach is the selection of the search directions $\Xi$. Although the directional smoothing described in Sect. 2.1 holds for any set of orthonormal vectors $\Xi$, the choice of the updates to $\Xi$ has a significant impact on the realization of the optimization process. For instance, taking steps mainly in the direction of the gradient $\nabla f(x)$ (assuming it exists) results in a form of (batch) gradient descent, while distributing updates across random directions is more in the style of stochastic gradient descent. In ASGF, the directions are chosen in a way that balances efficiency and exploration as follows: for a particular iteration $i$ of the algorithm, the first direction $\xi_1$ is set to be the current estimate of the gradient of $f_\sigma(x_i)$, while the other directions $\xi_2, \dots, \xi_d$ are chosen to complement $\xi_1$ to a random orthonormal basis in $\mathbb{R}^d$, i.e.,

$$\xi_1 = \frac{\nabla f_\sigma(x_i)}{\|\nabla f_\sigma(x_i)\|_2}, \quad \xi_2, \dots, \xi_d \in \mathbb{R}^d \text{ are such that } \Xi \text{ is an orthonormal basis.} \qquad (6)$$

Such an approach naturally combines the efficiency of exploiting the gradient direction $\xi_1$ (from now on referred to as the "main" direction) while retaining the exploration ability provided by the stochastic directions $\xi_2, \dots, \xi_d$ (called "auxiliary" directions), which are generated randomly on each iteration.
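One way to realize the construction in (6) is to place the normalized gradient in the first column of a random Gaussian matrix and orthonormalize it with a QR decomposition. This is only a sketch: the text does not say how the random complement is generated, so the QR-based construction and the function name are assumptions.

```python
import numpy as np


def search_directions(grad, rng):
    """Return a d x d matrix whose rows xi_1, ..., xi_d form an orthonormal basis.

    xi_1 points along `grad`; the remaining rows are a random orthonormal complement.
    If `grad` is zero, a fully random orthonormal basis is returned instead.
    """
    d = grad.shape[0]
    norm = np.linalg.norm(grad)
    candidates = rng.standard_normal((d, d))
    if norm > 0:
        candidates[:, 0] = grad / norm
    # QR orthonormalizes the columns; the first column keeps its direction up to sign.
    q, _ = np.linalg.qr(candidates)
    if norm > 0 and q[:, 0] @ grad < 0:
        q[:, 0] = -q[:, 0]
    return q.T
```

A call such as `search_directions(grad, np.random.default_rng(0))` then yields the "main" direction in the first row and the "auxiliary" directions in the remaining rows.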
By splitting search directions into a single "main" direction and a set of "auxiliary" directions, we can improve the computational efficiency of the approach by using a different number of quadrature points for each of the two classes of directions. For the "main" direction, i.e., the gradient direction $\xi_1$, we use an adaptive scheme for establishing a suitable number of quadrature points. Specifically, we estimate $\nabla f_\sigma(x \,|\, \xi_1)$ via (4) with an increasing number of quadrature points $m \in \{3, 5, 7, \dots\}$ until we obtain two estimates that differ by less than some threshold $\varepsilon_m$. Since the "auxiliary" directions $\xi_2, \dots, \xi_d$ mainly serve an exploration role, a fixed small number of quadrature points is used. In the numerical experiments presented in Sect. 4, we use $\varepsilon_m = .1$ and $m_2 = \dots = m_d = 5$ points.

Another key aspect of ASGF is the adaptive selection of the learning rate. Instead of using a fixed value for the learning rate or a predetermined schedule, the geometry of the target function is used to derive the step size. For each direction $\xi_j$, the values $\{f(x + \sigma p_m \xi_j)\}_{m=1}^{m_j}$, sampled in (4), are used to estimate the directional local Lipschitz constants $L_j$ as

$$L_j = \max_{1 \le m < m_j} \left| \frac{f(x + \sigma p_{m+1} \xi_j) - f(x + \sigma p_m \xi_j)}{\sigma (p_{m+1} - p_m)} \right|.$$

…

if $r > 0$ and $\sigma < \rho\, \sigma_0$ then
    assign $\Xi$ to be a random orthonormal basis and set $\sigma \leftarrow \sigma_0$
    set $A$, $B$ to their initial values and change number of resets $r \leftarrow r - 1$
else
    update search directions $\Xi$ by (6)
    if $\max_{1 \le j \le d} |\nabla f_\sigma(x \,|\, \xi_j) / L_j| < A$ then
        decrease smoothing $\sigma \leftarrow \sigma * \gamma_\sigma$ and lower threshold $A \leftarrow A * A_-$
    else if $\max_{1 \le j \le d} |\nabla f_\sigma(x \,|\, \xi_j) / L_j| > B$ then
        increase smoothing $\sigma \leftarrow \sigma / \gamma_\sigma$ and upper threshold $B \leftarrow B * B_+$
    else
        increase lower threshold $A \leftarrow A * A_+$ and decrease upper threshold $B \leftarrow B * B_-$
standard choice of parameter values and domains. In Sect. 4.1, we compare ASGF to other algorithms for nonconvex optimization in the low-dimensional setting, and in Sect. 4.2, we showcase the performance of ASGF in the high-dimensional setting. We note that, unlike many existing algorithms, ASGF does not require careful selection of hyperparameters in order to obtain state-of-the-art results since the parameters are adjusted throughout the realization of the algorithm. As such, we use the exact same set of hyperparameters for all of the stated examples. This showcases the adaptability and stability of our method with respect to the hyperparameter selection, which is an essential feature in the blackbox setting. We would also like to point out that with minor tweaking, ASGF can achieve better results for each of the presented examples; however, the main purpose of this section is to showcase the ability of ASGF to automatically determine suitable parameter values, and we therefore abstain from any kind of hyperparameter tuning. Namely, we use the following set of parameters for ASGF: $\gamma_\sigma = .9$, $m = 5$, $A = .1$, $B = .9$, $A_- = .95$, $A_+ = 1.02$, $B_- = .98$, $B_+ = 1.01$, $\gamma_L = .9$, $r = 2$, $\rho = .01$, $\varepsilon_m = .1$, $\varepsilon_x = 10^{-6}$. For the initial value of $\sigma$, we use the heuristic $\sigma_0 = \operatorname{diam}(\Omega)/10$, where $\Omega \subset \mathbb{R}^d$ is the spatial domain from which the initial state $x_0$ is sampled. Nevertheless, due to the adaptive design of ASGF, the values of the above hyperparameters could be changed without a significant difference in the resulting performance. The presented numerical experiments are performed in Python, and the source code reproducing the stated results is publicly available at https://github.com/joedaws/ASGF.
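The parameter-update step listed before this section can be sketched with the hyperparameter values above as defaults. This is a hedged reading, not the released implementation: the names are hypothetical, the first value in the hyperparameter list is taken to be the smoothing decay factor $\gamma_\sigma$, and the final branch is read as multiplying $A$ by $A_+$ and $B$ by $B_-$, which is what "increase lower threshold" and "decrease upper threshold" require given $A_+ = 1.02$ and $B_- = .98$.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class State:
    sigma: float      # current smoothing parameter
    A: float = 0.1    # lower threshold
    B: float = 0.9    # upper threshold
    resets: int = 2   # remaining parameter resets r


def update_parameters(state, ratio, sigma0, A0=0.1, B0=0.9, rho=0.01,
                      gamma_sigma=0.9, A_minus=0.95, A_plus=1.02,
                      B_minus=0.98, B_plus=1.01):
    """Return (new_state, was_reset).

    `ratio` stands for max_j |grad f_sigma(x | xi_j) / L_j|. When `was_reset`
    is True, the caller should also draw a fresh random orthonormal basis.
    """
    if state.resets > 0 and state.sigma < rho * sigma0:
        return State(sigma0, A0, B0, state.resets - 1), True
    if ratio < state.A:
        # Decrease smoothing and lower the lower threshold.
        return replace(state, sigma=state.sigma * gamma_sigma,
                       A=state.A * A_minus), False
    if ratio > state.B:
        # Increase smoothing and raise the upper threshold.
        return replace(state, sigma=state.sigma / gamma_sigma,
                       B=state.B * B_plus), False
    # Otherwise tighten the band: raise A and lower B.
    return replace(state, A=state.A * A_plus, B=state.B * B_minus), False
```

Keeping the state immutable makes each branch easy to check in isolation; in a full optimizer the returned flag would trigger the regeneration of the search directions.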
4.1 Low-Dimensional Optimization

Even though ASGF is designed with a high-dimensional setting in mind, in order to provide an extensive comparison with other methods, in this section, we consider a wide range of optimization benchmarks, presented in Table 1. Here we compare the following algorithms: ASGF (ours), Directional Gaussian Smoothing (DGS, see [11]), and Covariance Matrix Adaptation (CMA, see [10]). The presented experiments are performed over 100 independent simulations. All the algorithms start a simulation with the same initial guess $x_0$ sampled at random from the spatial domain $\Omega \subset \mathbb{R}^d$. A simulation is considered successful if an algorithm returns a minimizer that achieves a value within $10^{-4}$ of the global minimum. For ASGF, the hyperparameters are the same across all the examples and are stated in the preamble of Sect. 4. For DGS, we perform a hyperparameter search over the following grid: $\lambda \in \{.001, .003, .01, .03, .1\}$, $m \in \{5, 9, 13, 17, 21\}$, $\sigma \in \{1, 2, 3, 4, 5, 6, 7, 8, 9, 10\}$, $\gamma \in \{.001, .003, .01, .03, .1\}$ and report only the best obtained results for every example. For CMA, the value of $\sigma$ is set to be the same as for DGS in the same setting. Lastly, the comparison of ASGF to additional algorithms (such as BFGS, Nelder–Mead, and Powell) is provided in the Appendix; only the most competitive algorithms (namely, DGS and CMA) are presented in Table 1.
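Table 1 Convergence, average number of iterations and function evaluations over 100 simulations

| Benchmark | Convergence ASGF (%) | Convergence DGS (%) | Convergence CMA (%) | Iterations ASGF | Iterations DGS | Iterations CMA | Function evaluations ASGF | Function evaluations DGS | Function evaluations CMA |
|---|---|---|---|---|---|---|---|---|---|
| Branin | 100 | 64 | 100 | 335 | 2568 | 70 | 3820 | 23,113 | 418 |
| Cross-in-Tray | 99 | 76 | 79 | 335 | 3939 | 74 | 9031 | 98,476 | 446 |
| Dropwave | 100 | 99 | 1 | 524 | 1232 | 77 | 44,645 | 40,646 | 463 |
| Sphere 10d | 100 | 100 | 100 | 16 | 50 | 197 | 669 | 2066 | 1969 |
| Ackley 2d | 95 | 97 | 86 | 78 | 1202 | 77 | 3774 | 10,816 | 462 |
| Ackley 5d | 100 | 99 | 91 | 54 | 2147 | 151 | 2703 | 45,100 | 1208 |
| Ackley 10d | 100 | 94 | 97 | 56 | 3068 | 233 | 3582 | 125,751 | 2333 |
| Levy 2d | 100 | 100 | 97 | 352 | 1704 | 72 | 5043 | 56,231 | 430 |
| Levy 5d | 100 | 100 | 81 | 475 | 760 | 146 | 12,909 | 61,583 | 1172 |
| Levy 10d | 100 | 100 | 61 | 456 | 714 | 240 | 22,353 | 114,998 | 2404 |
| Rastrigin 2d | 96 | 100 | 14 | 86 | 2078 | 83 | 2785 | 85,179 | 496 |
| Rastrigin 5d | 100 | 100 | 0 | 2075 | 1788 | − | 159,564 | 180,539 | − |
| Rastrigin 10d | 100 | 100 | 0 | 2430 | 1562 | − | 232,258 | 314,035 | − |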
4.2 High-Dimensional Optimization
In this section, we demonstrate the performance of ASGF in the setting of high-dimensional blackbox optimization on 100-, 1000-, and 10,000-dimensional benchmarks (namely, the Ackley, Levy, Rastrigin, and Sphere functions). All the simulations converged successfully, regardless of the initial state x_0, and the particular optimization trajectories are displayed in Fig. 1. The average numbers of iterations and function evaluations are given in Table 2. The hyperparameters used for ASGF are the same as in Sect. 4.1 and are specified in the preamble of Sect. 4. We note that the irregularity of the optimization trajectories in Fig. 1 for the Rastrigin function is caused by the highly oscillatory nature of the loss landscape. The irregularity on the Levy benchmark is due to the “parameter reset” feature of ASGF; it can be observed that the parameter reset indeed helps the algorithm to escape local optima and converge to the global minimum. For the Ackley and Sphere functions, owing to the distinct geometry of the global minimum, the parameter reset is not triggered; hence, the optimization trajectories are smooth.
Fig. 1 Performance of ASGF on high-dimensional test functions
Table 2 Average number of iterations and function evaluations of ASGF

Benchmark         Iterations  Evaluations
Ackley 100d       66          27,343
Ackley 1000d      103         414,298
Ackley 10000d     89          3,548,775
Levy 100d         452         184,176
Levy 1000d        508         2,037,076
Levy 10000d       617         24,698,140
Rastrigin 100d    2995        1,290,215
Rastrigin 1000d   2901        11,625,963
Rastrigin 10000d  3206        114,845,172
Sphere 100d       48          19,381
Sphere 1000d      76          303,508
Sphere 10000d     112         4,480,337
4.3 Reinforcement Learning Tasks
The episodic reinforcement learning problem is formalized as a discrete control process over time steps in which an agent interacts with its environment E. At each time step t ∈ {1, ..., T}, the agent is presented with a state s_t ∈ S and correspondingly takes an action a_t ∈ A. As a result, the agent receives a reward r(s_t, a_t) ∈ R, and the state transitions to s_{t+1} ∈ S through the interaction with the environment. The agent achieves its goal by learning a policy π : S → A that maximizes the cumulative return objective function f(π). It is common to parameterize a policy by a neural network, i.e., π_x, where the parameters x = (x_1, ..., x_d) ∈ R^d represent the weights of the network. The architecture used in the examples below is a two-layer network with 8 nodes per layer and the Tanh activation function. A policy π_x is obtained by solving the nonconvex high-dimensional blackbox optimization problem

$$\max_{x \in \mathbb{R}^d} f(\pi_x), \quad \text{where} \quad f(\pi) := \mathbb{E}_{\pi}\left[\sum_{t=1}^{T} r(s_t, a_t)\right],$$

with the action a_t being sampled from the distribution provided by the policy π.
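To make the blackbox objective concrete, the sketch below maps a flat parameter vector x to a small Tanh policy and accumulates the reward over one episode. It rests on several assumptions that are not stated in the text: "two-layer" is read as two hidden layers of 8 nodes followed by a linear output layer, the policy is deterministic rather than sampled, and the environment is passed in as two hypothetical callables (`env_reset`, `env_step`) instead of a Gym or PyBullet object.

```python
import math

HIDDEN = 8  # nodes per hidden layer, as stated in the text


def num_params(n_state, n_action):
    """Dimension d of the flat parameter vector x (weights and biases)."""
    return ((n_state * HIDDEN + HIDDEN)
            + (HIDDEN * HIDDEN + HIDDEN)
            + (HIDDEN * n_action + n_action))


def _dense(x, offset, inputs, n_out, activate):
    """Apply one dense layer whose weights, then biases, start at x[offset]."""
    n_in = len(inputs)
    bias_start = offset + n_out * n_in
    out = []
    for j in range(n_out):
        w = x[offset + j * n_in: offset + (j + 1) * n_in]
        z = sum(wi * vi for wi, vi in zip(w, inputs)) + x[bias_start + j]
        out.append(math.tanh(z) if activate else z)
    return out, bias_start + n_out


def policy(x, state, n_action):
    """Deterministic policy pi_x: state -> action (a simplifying assumption)."""
    h, off = _dense(x, 0, state, HIDDEN, True)
    h, off = _dense(x, off, h, HIDDEN, True)
    action, _ = _dense(x, off, h, n_action, False)
    return action


def episode_return(x, env_reset, env_step, n_action, horizon):
    """One-episode estimate of f(pi_x): the sum of rewards over T steps."""
    state = env_reset()
    total = 0.0
    for _ in range(horizon):
        state, reward = env_step(state, policy(x, state, n_action))
        total += reward
    return total
```

A gradient-free optimizer only ever calls `episode_return` (typically averaged over several episodes), which is what makes the problem a blackbox one. Under the stated reading of the architecture, a task with a 3-dimensional state and a 1-dimensional action gives d = 113 parameters.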
Fig. 2 Comparison of scalable gradient-free algorithms (ASGF, DGS, and ES) on two reinforcement learning tasks: Pendulum-v0 (left) and InvertedPendulumBulletEnv-v0 (right)
The performance of ASGF, DGS [11], and ES [5] is compared in Fig. 2 on two conventional reinforcement learning tasks: Pendulum-v0 from the OpenAI Gym library [42] and InvertedPendulumBulletEnv-v0 from the PyBullet library [43]. In each of the plots, the vertical axis is the average reward over 10 training runs of each of the algorithms. The horizontal axis is the number of simulated episodes of the environment. Since ASGF uses an adaptive number of quadrature points, it needs fewer simulations to obtain a good average return.
5 Conclusions
In this work, we introduce an adaptive stochastic gradient-free method designed for solving high-dimensional nonconvex blackbox optimization problems. The combination of hyperparameter adaptivity, massive scalability, and relative ease of implementation makes ASGF a prominent method for many practical applications. The presented numerical examples empirically confirm that our method avoids many of the common pitfalls and overcomes challenges associated with high-dimensional nonconvex optimization. Despite the successful demonstration of ASGF on several benchmark optimization problems in Sect. 4, we acknowledge that a single demonstration in the reinforcement learning domain is not a convincing case for the efficacy of ASGF in reinforcement learning; we present it as a proof of concept rather than a claim of superiority. Our primary objective was to ensure that ASGF could successfully outperform existing GFO methods. However, since ASGF enables exploitation of the gradient direction while maintaining sufficient space exploration, we expect that it can also accelerate convergence on more complicated RL tasks. This is the direction we are working on now, and it will be the focus of our future efforts.
Appendix: Comparison of optimization algorithms in low-dimensional setting
In this addendum, we expand on the results of Sect. 4.1 and provide a more detailed comparison of different optimization methods, presented in Tables 3, 4, and 5. Specifically, we compare our adaptive stochastic gradient-free algorithm (ASGF), Directional Gaussian Smoothing (DGS, [11]), Covariance Matrix Adaptation (CMA, [10]), Powell’s conjugate direction method (Powell, [44]), the Nelder–Mead simplex direct search method (Nelder–Mead, [45]), and the Broyden–Fletcher–Goldfarb–Shanno quasi-Newton method (BFGS, [46]). The hyperparameter choice for the relevant algorithms is discussed in Sect. 4.1. Each algorithm is tested on 100 randomly sampled initial states, which are identical across all algorithms. A simulation counts as successful if the returned minimizer achieves a function value that is within 10^−4 of the global minimum.
Table 3 Success rate of algorithms in terms of convergence to global minimum

Benchmark      ASGF (%)  DGS (%)  CMA (%)  Powell (%)  Nelder–Mead (%)  BFGS (%)
Branin         100       64       100      100         100              100
Cross-in-Tray  99        76       79       14          10               10
Dropwave       100       99       1        2           0                3
Sphere 10d     100       100      100      100         100              100
Ackley 2d      95        97       86       27          1                0
Ackley 5d      100       99       91       2           0                0
Ackley 10d     100       94       97       0           0                0
Levy 2d        100       100      97       13          7                11
Levy 5d        100       100      81       1           0                1
Levy 10d       100       100      61       0           0                0
Rastrigin 2d   96        100      14       33          1                3
Rastrigin 5d   100       100      0        6           0                0
Rastrigin 10d  100       100      0        1           0                0
Table 4 Average number of iterations on successful simulations

Benchmark      ASGF   DGS    CMA    Powell  Nelder–Mead  BFGS
Branin         335    2568   70     4       60           8
Cross-in-Tray  335    3939   74     3       56           9
Dropwave       524    1232   77     3       −            5
Sphere 10d     16     50     197    2       1972         4
Ackley 2d      78     1202   77     6       79           −
Ackley 5d      54     2147   151    6       −            −
Ackley 10d     56     3068   233    −       −            −
Levy 2d        352    1704   72     4       64           12
Levy 5d        475    760    146    10      −            20
Levy 10d       456    714    240    −       −            −
Rastrigin 2d   86     2078   83     4       65           11
Rastrigin 5d   2075   1788   −      5       −            −
Rastrigin 10d  2430   1562   −      5       −            −
Table 5 Average number of function evaluations on successful simulations

Benchmark      ASGF      DGS       CMA    Powell  Nelder–Mead  BFGS
Branin         3820      23,113    418    126     116          38
Cross-in-Tray  9031      98,476    446    114     109          40
Dropwave       44,645    40,646    463    117     −            57
Sphere 10d     669       2066      1969   179     2778         78
Ackley 2d      3774      10,816    462    327     151          −
Ackley 5d      2703      45,100    1208   700     −            −
Ackley 10d     3582      125,751   2333   −       −            −
Levy 2d        5043      56,231    430    111     123          69
Levy 5d        12,909    61,583    1172   712     −            196
Levy 10d       22,353    114,998   2404   −       −            −
Rastrigin 2d   2785      85,179    496    150     124          230
Rastrigin 5d   159,564   180,539   −      447     −            −
Rastrigin 10d  232,258   314,035   −      1004    −            −
References
1. P. Hamalainen, A. Babadi, X. Ma, J. Lehtinen, PPO-CMA: proximal policy optimization with covariance matrix adaptation. CoRR, abs/1810.02541 (2018)
2. N. Hansen, The CMA evolution strategy: a tutorial. arXiv preprint arXiv:1604.00772 (2016)
3. N. Hansen, A. Ostermeier, Completely derandomized self-adaptation in evolution strategies. Evolut. Comput. 9(2), 159–195 (2001)
4. G. Liu, L. Zhao, F. Yang, J. Bian, T. Qin, N. Yu, T.-Y. Liu, Trust region evolution strategies. Proc. AAAI Conf. Artif. Intell. 33, 4352–4359 (2019)
5. T. Salimans, J. Ho, X. Chen, S. Sidor, I. Sutskever, Evolution strategies as a scalable alternative to reinforcement learning. arXiv preprint arXiv:1703.03864 (2017)
6. T.B. Brown, B. Mann, N. Ryder, M. Subbiah, J. Kaplan, P. Dhariwal, A. Neelakantan, P. Shyam, G. Sastry, A. Askell, S. Agarwal, A. Herbert-Voss, G. Krueger, T. Henighan, R. Child, A. Ramesh, D.M. Ziegler, J. Wu, C. Winter, C. Hesse, M. Chen, E. Sigler, M. Litwin, S. Gray, B. Chess, J. Clark, C. Berner, S. McCandlish, A. Radford, I. Sutskever, D. Amodei, Language models are few-shot learners (2020)
7. O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh, S. Ma, Z. Huang, A. Karpathy, A. Khosla, M. Bernstein, A.C. Berg, L. Fei-Fei, Imagenet large scale visual recognition challenge. Int. J. Comput. Vision (IJCV) 115(3), 211–252 (2015)
8. G. Morse, K.O. Stanley, Simple evolutionary optimization can rival stochastic gradient descent in neural networks, in Proceedings of the Genetic and Evolutionary Computation Conference (GECCO 2016), pp. 477–484 (2016)
9. K. Choromanski, M. Rowland, V. Sindhwani, R.E. Turner, A. Weller, Structured evolution with compact architectures for scalable policy optimization, in International Conference on Machine Learning, pp. 969–977 (2018)
10. N. Hansen, The CMA evolution strategy: a comparing review, in Towards a new evolutionary computation (Springer, 2006), pp. 75–102
11. J. Zhang, H. Tran, D. Lu, G. Zhang, A scalable evolution strategy with directional Gaussian smoothing for blackbox optimization. arXiv preprint arXiv:2002.03001 (2020)
12. D. Wierstra, T. Schaul, T. Glasmachers, Y. Sun, J. Peters, J. Schmidhuber, Natural evolution strategies. J. Mach. Learn. Res. 15, 949–980 (2014)
13. X. Cui, W. Zhang, Z. Tüske, M. Picheny, Evolutionary stochastic gradient descent for optimization of deep neural networks. NeurIPS (2018)
14. F.P. Such, V. Madhavan, E. Conti, J. Lehman, K.O. Stanley, J. Clune, Deep neuroevolution: genetic algorithms are a competitive alternative for training deep neural networks for reinforcement learning. arXiv preprint arXiv:1712.06567 (2017)
15. S. Khadka, S. Majumdar, T. Nassar, Z. Dwiel, E. Tumer, S. Miret, Y. Liu, K. Tumer, Collaborative evolutionary reinforcement learning, in Proceedings of the 36th International Conference on Machine Learning (2019)
16. O. Sigaud, F. Stulp, Robot skill learning: from reinforcement learning to evolution strategies. Paladyn J. Behav. Robot. 4(1), 49–61 (2013)
17. J. Zhang, H. Tran, G. Zhang, Accelerating reinforcement learning with a directional-Gaussian-smoothing evolution strategy. arXiv preprint arXiv:2002.09077 (2020)
18. J. Bergstra, Y. Bengio, Random search for hyper-parameter optimization. J. Mach. Learn. Res. 13, 281–305 (2012)
19. J. Larson, M. Menickelly, S.M. Wild, Derivative-free optimization methods. Acta Numerica 28, 287–404 (2019)
20. L.M. Rios, N.V. Sahinidis, Derivative-free optimization: a review of algorithms and comparison of software implementations. J. Glob. Optim. 56, 1247–1293 (2009)
21. A.D. Flaxman, A.T. Kalai, H.B. McMahan, Online convex optimization in the bandit setting: gradient descent without a gradient, in Proceedings of the 16th Annual ACM-SIAM Symposium on Discrete Algorithms, pp. 385–394 (2005)
22. Y. Nesterov, V. Spokoiny, Random gradient-free minimization of convex functions. Foundations of Computational Mathematics, pp. 1–40 (2015)
23. N. Maheswaranathan, L. Metz, G. Tucker, D. Choi, J. Sohl-Dickstein, Guided evolutionary strategies: augmenting random search with surrogate gradients, in Proceedings of the 36th International Conference on Machine Learning (2019)
24. I. Rechenberg, M. Eigen, Evolutionsstrategie: Optimierung Technischer Systeme nach Prinzipien der Biologischen Evolution (Frommann-Holzboog, Stuttgart, 1973)
25. N. Müller, T. Glasmachers, Challenges in high-dimensional reinforcement learning with evolution strategies, in International Conference on Parallel Problem Solving from Nature (Springer, 2018), pp. 411–423
26. A.S. Berahas, L. Cao, K. Choromanski, K. Scheinberg, A theoretical and empirical comparison of gradient approximations in derivative-free optimization. arXiv:1905.01332 (2019)
27. H. Mania, A. Guy, B. Recht, Simple random search provides a competitive approach to reinforcement learning. CoRR, abs/1803.07055 (2018)
28. J. Peters, S. Schaal, Reinforcement learning of motor skills with policy gradients. Neural Netw. 21, 682–697 (2008)
29. A. Pourchot, O. Sigaud, CEM-RL: combining evolutionary and gradient-based methods for policy search. ICLR (2019)
30. J. Schulman, F. Wolski, P. Dhariwal, A. Radford, O. Klimov, Proximal policy optimization algorithms. arXiv preprint arXiv:1707.06347 (2017)
31. M. Fazel, R. Ge, S.M. Kakade, M. Mesbahi, Global convergence of policy gradient methods for the linear quadratic regulator, in Proceedings of the 35th International Conference on Machine Learning, vol. 80, pp. 1467–1476 (2018)
32. A. Maggiar, A. Wachter, I.S. Dolinskaya, J. Staum, A derivative-free trust-region algorithm for the optimization of functions smoothed via Gaussian convolution using adaptive multiple importance sampling. SIAM J. Opt. 28(2), 1478–1507 (2018)
33. Y. Nesterov, Introductory Lectures on Convex Optimization (Springer, US, 2004)
34. G. Fishman, Monte Carlo: Concepts, Algorithms, and Applications. Springer Series in Operations Research (Springer, New York, 1996)
35. R.E. Caflisch, Monte Carlo and quasi-Monte Carlo methods. Acta Numerica (Cambridge Univ. Press, 1998), pp. 1–49
36. F. Nobile, R. Tempone, C.G. Webster, An anisotropic sparse grid stochastic collocation method for partial differential equations with random input data. SIAM J. Num. Anal. 46(5), 2411–2442 (2008)
37. F. Nobile, R. Tempone, C.G. Webster, A sparse grid stochastic collocation method for partial differential equations with random input data. SIAM J. Num. Anal. 46(5), 2309–2345 (2008)
38. M. Abramowitz, I. Stegun (eds.), Handbook of Mathematical Functions (Dover, New York, 1972)
39. Y. Nesterov, Lectures on Convex Optimization, vol. 137 (Springer, 2018)
40. A. Auger, N. Hansen, A restart CMA evolution strategy with increasing population size, in 2005 IEEE Congress on Evolutionary Computation, vol. 2, pp. 1769–1776 (IEEE, 2005)
41. D. Eriksson, M. Pearce, J. Gardner, R.D. Turner, M. Poloczek, Scalable global optimization via local Bayesian optimization, in Advances in Neural Information Processing Systems, pp. 5497–5508 (2019)
42. G. Brockman, V. Cheung, L. Pettersson, J. Schneider, J. Schulman, J. Tang, W. Zaremba, OpenAI Gym. arXiv preprint arXiv:1606.01540 (2016)
43. E. Coumans, Y. Bai, PyBullet, a Python module for physics simulation for games, robotics and machine learning. GitHub repository (2016)
44. M.J. Powell, An efficient method for finding the minimum of a function of several variables without calculating derivatives. Comput. J. 7(2), 155–162 (1964)
45. J.A. Nelder, R. Mead, A simplex method for function minimization. Comput. J. 7(4), 308–313 (1965)
46. J. Nocedal, S. Wright, Numerical Optimization (Springer, 2006)
Chapter 29
Analysis of Cotton Yarn Count by Fuzzy Logic Model V. Visalakshi, T. Yogalakshmi, and Oscar Castillo
V. Visalakshi, Department of Mathematics, SRM Institute of Science and Technology, Kattankulathur, Tamil Nadu, India
T. Yogalakshmi, School of Advanced Sciences, Vellore Institute of Technology, Vellore, Tamil Nadu, India
O. Castillo, Tijuana Institute of Technology, Tijuana, Mexico
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022. R. Tiwari et al. (eds.), Proceedings of International Conference on Computational Intelligence, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-16-3802-2_29

1 Introduction
Several kinds of models are used to predict the characteristics of yarn, namely mathematical, statistical regression and intelligent models. Bodgan [1, 2], Subramanian et al. [17], Zurek et al. [28] and Frydrych [7] developed mathematical models based on fundamental theory from the basic sciences. The assumptions and simplifications these models require, however, limit their prediction accuracy. Statistical regression models, which make yarn strength easier to predict, were developed by Hafez [9], Hunter [10], Mogahzy [12] and Smith and Waters [16]. Various properties of spun yarn have been analysed successfully by Cheng and Adams [4], Ramesh et al. [15], Zhu and Ethridge [25, 26] and Guha et al. [8] using ANN, and with a neuro-fuzzy method in [11]. Fuzzy logic [5, 6] has been applied in many fields, such as control systems, queuing theory, robotics and optimization [20, 21], where it offers a modern approach with precise results. A fuzzy inference system is about as easy to use as an ANN, and prediction from the significant parameters is straightforward. The terms low, medium and high are used to assess the fineness of fibre and the quality of yarn. Such terms are called “fuzzy” because they are imprecise, vague and uncertain. Long and fine fibre produces strong yarn with a good count. Thus, the cotton yarn count can be predicted more effectively from the significant parameters by the fuzzy logic model (fuzzy expert system). The higher the cotton count, the finer the yarn. In the USA, cotton counts between 1 and 20 are referred to as coarse counts. A regular single-knit T-shirt is typically between 20 and 40 count, and fine bed sheets are usually in the range from 40 to 80 count [29].
2 Soft Computing Performances
Soft computing techniques provide solutions to complicated computational problems that classical computing fails to solve. Soft computing comprises the following heuristic prediction models: artificial neural network (ANN), biological neural network (BNN), genetic algorithm and fuzzy logic. ANN, BNN and genetic algorithms leave gaps in optimization that a fuzzy inference system can fill effectively. Moreover, those models require a large amount of data, whereas a fuzzy inference system does not. Artificial intelligence (AI) is intelligence exhibited by machines. AI programs are developed to perform many specialized tasks, such as online trading, remote sensing and robot motion. Fuzzy logic provides solutions to complex problems whose inputs are at times fuzzy in nature. For yarn, the fuzzy logic interpretation involves distinct parameters: fibre maturity, length uniformity, fibre tenacity, mean length, micronaire, short fibre content and fibre friction. A conventional approach requires a comparatively meticulous model of the system to perform satisfactorily. A fuzzy logic system, by contrast, depends only on the knowledge of an expert in the particular system and not on a model of the system. In this paper, the cotton yarn count is predicted using fuzzy logic. Fuzzy logic modelling can be enhanced further by increasing the degrees of freedom of the fuzzy system. The primary focus of this study is to control the essential parameters, namely fibre tenacity, mean length and micronaire, in order to predict the yarn count using a fuzzy inference system. A fuzzy inference system has three components: fuzzification, fuzzy inference and defuzzification.
2.1 Fuzzification
Fuzzification applies membership functions to convert crisp input values into degrees of membership in the linguistic terms [22] of each parameter; that is, it transforms crisp inputs into linguistic variables. As an illustration, consider the relative speed (velocity), which is one of the inputs of an acceleration system. The fuzzy sets associated with the input relative velocity are large negative, zero and large positive. Through fuzzification, a crisp (singleton) value of the relative velocity is mapped to degrees of membership in the linguistic terms large negative, zero and large positive.
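The relative-velocity illustration can be sketched with triangular membership functions. Everything numeric here is hypothetical: the universe of discourse [−10, 10] and the break points of the three fuzzy sets are chosen only to make the example concrete, and the function names are not taken from any toolbox.

```python
def triangular(x, a, b, c):
    """Triangular membership with feet a and c and peak b (a <= b <= c).

    Setting a == b or b == c gives a left or right shoulder.
    """
    if x == b:
        return 1.0
    if x <= a or x >= c:
        return 0.0
    if x < b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


# Hypothetical fuzzy sets for relative velocity on the universe [-10, 10].
VELOCITY_SETS = {
    "large negative": (-10.0, -10.0, 0.0),
    "zero": (-10.0, 0.0, 10.0),
    "large positive": (0.0, 10.0, 10.0),
}


def fuzzify_velocity(v):
    """Map a crisp velocity to its degree of membership in each term."""
    return {term: triangular(v, *abc) for term, abc in VELOCITY_SETS.items()}


degrees = fuzzify_velocity(-2.5)
# degrees == {"large negative": 0.25, "zero": 0.75, "large positive": 0.0}
```

A single crisp reading therefore belongs partly to several linguistic terms at once, which is what the later rule evaluation operates on.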
2.2 Fuzzy Inference
Fuzzy inference is based on conditional IF-THEN rules involving fuzzy variables. Most such rules describe the dependence of one or more linguistic variables on another. Fuzzy implication correlates the semantic meaning of the antecedent (the partial truth of the input variables) with the semantics of the consequent (the output fuzzy set) and generates the solution. For example: IF the rainfall is Medium AND the fertility of the soil is Good THEN the yield is High.
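As a hedged illustration of how such a rule is evaluated, the sketch below uses the minimum operator for AND and clips the consequent set at the resulting firing strength (the Mamdani-style convention). The membership degrees 0.6 and 0.8, the ramp-shaped "High yield" set and the function names are all hypothetical.

```python
def firing_strength(memberships, antecedent):
    """AND of the antecedent clauses, taken as the minimum membership."""
    return min(memberships[var][term] for var, term in antecedent)


def clip(consequent_mf, strength):
    """Min implication: truncate the consequent set at the firing strength."""
    return lambda y: min(strength, consequent_mf(y))


# Hypothetical fuzzified inputs for the rainfall example.
memberships = {
    "rainfall": {"Medium": 0.6},
    "soil fertility": {"Good": 0.8},
}
rule_if = [("rainfall", "Medium"), ("soil fertility", "Good")]


def yield_high(y):
    """Hypothetical 'High yield' set: a ramp from 2 to 4 (arbitrary units)."""
    return max(0.0, min(1.0, (y - 2.0) / 2.0))


strength = firing_strength(memberships, rule_if)   # 0.6
yield_high_clipped = clip(yield_high, strength)
```

The clipped set `yield_high_clipped` never exceeds 0.6, so a weakly satisfied antecedent can only weakly support its consequent.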
2.3 Defuzzification
Defuzzification converts the fuzzy set produced by inference into a crisp output; that is, it represents a fuzzy set by a single crisp number. The crisp output is deduced from the output region of the membership function, once the fuzzy implication and aggregation steps have produced the fuzzy output. The quality of the decision hinges very much on the defuzzification process. Various defuzzification techniques have been established; the most commonly used is the centre of area (COA) method. A conceptual fuzzy model is shown in Fig. 1.
Fig. 1 Fuzzy inference model
3 Fuzzy Inference Engine
Traditional Boolean logic is a special case of fuzzy logic [22–24], which extends it to handle vagueness. Linguistic variables, fuzzy sets and fuzzy rules are the primitive components of fuzzy logic [3, 13, 27]. A fuzzy set assigns a degree of membership to every element of a non-empty set X. The membership function (MF) of a fuzzy set A is denoted by µ_A(x), with 0 ≤ µ_A(x) ≤ 1, so that a fuzzy set is given by A = {(x, µ_A(x)) : x ∈ X, 0 ≤ µ_A(x) ≤ 1}. Membership functions can take various shapes; the most commonly used are triangular and trapezoidal. Normalizing all inputs to the same range improves the performance and accuracy of the system. Fuzzy rules express the knowledge of an expert in natural language. For example, IF Fibre Tenacity is “Low” AND Mean Length is “High” AND Micronaire is “High” THEN Yarn Count is “Fine Count” is a fuzzy conditional proposition. Its antecedent is Fibre Tenacity is “Low” AND Mean Length is “High” AND Micronaire is “High”, and its consequent is Yarn Count is “Fine Count”. The premise may be a linguistic predicate whose truth value lies between 0 and 1. The input variables take the linguistic terms high, medium and low, which textile experts use during blowroom, carding, drawing, simplex and spinning. Thus, the input variables Fibre Tenacity, Mean Length and Micronaire are described by the linguistic terms “High”, “Medium” and “Low”, and the membership functions are applied according to the inference model. An IF-THEN rule is evaluated by computing the antecedent (preliminary) part and applying the result to the consequent part [18]. In fuzzy propositions, both the antecedent and the consequent hold to some degree of membership. The Mamdani inference model [19] predicts the membership function of the output fuzzy set. The Mamdani model is intuitive and well suited to human reasoning, and its rule base is more interpretable than that of other fuzzy inference systems (FIS). After inference, each output variable is assigned a fuzzy set, which is then defuzzified to obtain crisp data. Defuzzification can be done by the centre of area method: the area and centroid of each subarea are calculated, and the area-weighted sum of the centroids is divided by the sum of all subareas to find the defuzzified value of a discrete fuzzy set.
4 Proposed Model
Triangular and trapezoidal fuzzy regions are the simplest piecewise linear membership functions, and they work well in many applications, such as medical diagnosis, power generation and cotton yarn count prediction. In this model, the inputs are the essential parameters for predicting the output, and they are represented by triangular and trapezoidal membership functions.
Fig. 2 Proposed fuzzy model
As shown in Fig. 2, the essential parameters FT, ML and MN are used to predict the cotton yarn count (CYC) through the Fuzzy Logic Toolbox in MATLAB. For fuzzification, the linguistic terms Low (L), Medium (M) and High (H) are associated with the inputs, and Coarse Count (CC), Medium Count (MC) and Fine Count (FC) with the output. The bisector method draws a vertical line that partitions the output region into two subregions of equal area. The units are gram/tex for FT, mm for ML and Ne for CYC; MN is the micronaire value. Fuzzy rules are generated from the defined fuzzy sets. The enhancement of the degrees of freedom of a fuzzy system through fuzzy logic was adapted and tuned by Rajagopalan et al. [14]. Consider three fuzzy sets describing each of the input variables fibre tenacity (FT), mean length (ML) and micronaire (MN), labelled Low (L), Medium (M) and High (H). In general, two inputs with n and m fuzzy sets, respectively, give n × m fuzzy propositions. Here there are three inputs with three fuzzy sets each, so there are 3 × 3 × 3 = 27 possible propositions. A few propositions have no significance and are therefore not valid. The output variable, cotton yarn count, has three fuzzy sets: Coarse Count (CC), Medium Count (MC) and Fine Count (FC). The membership functions of the input variables and the output variable are shown in Figs. 3, 4, 5 and 6. The 25 valid fuzzy propositions out of the 27 possible ones are given in Table 1, where H = High, M = Medium, L = Low, CC = Coarse Count, MC = Medium Count and FC = Fine Count. In the fuzzy system, the input variables are fuzzified, and the corresponding rules are evaluated according to the fuzzified inputs. Deriving the consequent fuzzy set from the input fuzzy sets and the rules is called the implication method. For example, consider the input values fibre tenacity 23.0 g/tex, mean length 30 mm and micronaire 3.5. The fuzzified value of 23 g/tex is 0.25 High and 0.35 Medium. The fuzzified value of 30 mm mean length is 0.55 High. The fuzzified value of 3.5 micronaire is 0.3 Medium and 0.7 High. The fuzzy propositions triggered by these input values are as follows:
Table 1 Valid propositions (fuzzy associative memory)

Rule no.  Fibre tenacity  Mean length  Micronaire  Cotton yarn count
1         L               L            L           CC
2         L               L            M           CC
3         L               L            H           CC
4         L               M            L           CC
5         L               M            M           MC
6         L               M            H           MC
7         L               H            M           MC
8         L               H            H           FC
9         M               L            L           CC
10        M               L            H           CC
11        M               M            L           CC
12        M               M            M           MC
13        M               M            H           MC
14        M               H            L           CC
15        M               H            M           MC
16        M               H            H           FC
17        H               L            L           CC
18        H               L            M           CC
19        H               L            H           CC
20        H               M            L           CC
21        H               M            M           MC
22        H               M            H           MC
23        H               H            L           MC
24        H               H            M           MC
25        H               H            H           FC
1. If input FT is High and ML is High and MN is Medium, then Yarn count is Medium Count.
2. If input FT is Medium and ML is High and MN is Medium, then Yarn count is Medium Count.
3. If input FT is High and ML is High and MN is High, then Yarn count is Fine Count.
4. If input FT is Medium and ML is High and MN is High, then Yarn count is Fine Count.
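The rule base of Table 1 and the selection of triggered rules can be reproduced in a short sketch. The rows are transcribed from Table 1 and the fuzzified input degrees from the worked example; the data layout and the function name `triggered_rules` are assumptions made for illustration, and the minimum operator is used for AND.

```python
from itertools import product

# Rule base transcribed from Table 1: FT, ML, MN -> cotton yarn count.
_ROWS = """
L L L CC
L L M CC
L L H CC
L M L CC
L M M MC
L M H MC
L H M MC
L H H FC
M L L CC
M L H CC
M M L CC
M M M MC
M M H MC
M H L CC
M H M MC
M H H FC
H L L CC
H L M CC
H L H CC
H M L CC
H M M MC
H M H MC
H H L MC
H H M MC
H H H FC
"""
RULES = {tuple(r.split()[:3]): r.split()[3] for r in _ROWS.strip().splitlines()}


def triggered_rules(ft, ml, mn):
    """Return {(FT, ML, MN) terms: (output term, min firing strength)}."""
    fired = {}
    for (tf, f), (tl, l), (tn, n) in product(ft.items(), ml.items(), mn.items()):
        key = (tf, tl, tn)
        strength = min(f, l, n)
        if key in RULES and strength > 0.0:
            fired[key] = (RULES[key], strength)
    return fired


# Fuzzified inputs from the worked example (FT = 23 g/tex, ML = 30 mm, MN = 3.5).
fired = triggered_rules({"H": 0.25, "M": 0.35}, {"H": 0.55}, {"M": 0.3, "H": 0.7})
```

With these inputs exactly four rules fire, matching the four propositions listed above; the two term combinations absent from the 25-row table are (L, H, L) and (M, L, M).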
Fig. 3 Membership function for fibre tenacity(x1 )
Fig. 4 Membership function for mean length(x2 )
Fig. 5 Membership function for micronaire(x3 )
Fig. 6 Membership function for cotton yarn count(x4 )
The output fuzzy set extracted from each triggered rule is as follows:
1. If input FT is High and ML is High and MN is Medium, then Yarn count is Medium Count. The output truth value is min{0.25, 0.55, 0.3} = 0.25. The membership function for the output fuzzy set is 0.25 Medium Count.
2. If input FT is Medium and ML is High and MN is Medium, then Yarn count is Medium Count. The output truth value is min{0.35, 0.55, 0.3} = 0.3. The membership function for the output fuzzy set is 0.3 Medium Count.
3. If input FT is High and ML is High and MN is High, then Yarn count is Fine Count. The output truth value is min{0.25, 0.55, 0.7} = 0.25. The membership function for the output fuzzy set is 0.25 Fine Count.
4. If input FT is Medium and ML is High and MN is High, then Yarn count is Fine Count. The output truth value is min{0.35, 0.55, 0.7} = 0.35. The membership function for the output fuzzy set is 0.35 Fine Count.
Mamdani min implication is used for shaping the resultant fuzzy set. The valid propositions are executed in MATLAB to obtain the truncated fuzzy sets, which lead to the output fuzzy region. This computational process of fuzzy modelling is shown in Fig. 7. Defuzzification completes the process by extracting the output value CYC = 57.6 from the output region, which is shown as the shaded region of CYC in Fig. 7. The centre of area defuzzification method is applied, and its formula is

$$U^{*} = \frac{\sum_{j=1}^{J} u_j \, \mu_{OUT}(u_j)}{\sum_{j=1}^{J} \mu_{OUT}(u_j)} \qquad (1)$$

where u_j, j = 1, ..., J, are the sample points of the output universe and µ_OUT is the membership function of the aggregated output fuzzy set.
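The aggregation and centre of area steps can be sketched as follows. The firing strengths are the four values computed above; the output membership functions, however, are hypothetical triangles on a 0 to 80 Ne universe (the actual shapes are those of Fig. 6), so the crisp value produced by this sketch is not expected to reproduce CYC = 57.6.

```python
def tri(a, b, c):
    """Triangular membership function with feet a, c and peak b (a < b < c)."""
    def mf(u):
        if u <= a or u >= c:
            return 0.0
        return (u - a) / (b - a) if u <= b else (c - u) / (c - b)
    return mf


def aggregate_strengths(fired):
    """Max-combine the firing strengths of rules sharing an output term."""
    strengths = {}
    for term, s in fired:
        strengths[term] = max(strengths.get(term, 0.0), s)
    return strengths


def clip_and_aggregate(universe, output_sets, strengths):
    """Mamdani min implication (clip each set), then max aggregation."""
    return [
        max(min(strengths.get(t, 0.0), mf(u)) for t, mf in output_sets.items())
        for u in universe
    ]


def centre_of_area(universe, mu):
    """Discrete centre of area: sum(u_j * mu_j) / sum(mu_j)."""
    den = sum(mu)
    if den == 0.0:
        raise ValueError("aggregated output fuzzy set is empty")
    return sum(u * m for u, m in zip(universe, mu)) / den


# Firing strengths of the four triggered rules from the text.
strengths = aggregate_strengths([("MC", 0.25), ("MC", 0.3), ("FC", 0.25), ("FC", 0.35)])

# Hypothetical output sets; the shapes in Fig. 6 may differ.
OUTPUT_SETS = {"CC": tri(0, 10, 30), "MC": tri(20, 30, 45), "FC": tri(40, 60, 80)}
universe = list(range(0, 81))
mu_out = clip_and_aggregate(universe, OUTPUT_SETS, strengths)
crisp_cyc = centre_of_area(universe, mu_out)
```

The aggregated strengths are 0.3 for Medium Count and 0.35 for Fine Count, so the output region consists of those two truncated sets and the crisp value falls between their centres.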
The fuzzy inference engine evaluates the antecedent variables FT, ML and MN to generate the decision surface of CYC along with its fuzzy singleton, which are shown in Fig. 7. The decision surface in Fig. 8 shows that the combination of FT and MN strongly affects CYC.
Fig. 7 Computational output
Fig. 8 Decision surface (FT and MN)

5 R Analysis
In this section, the relation between the parameters FT, ML, MN and CYC is obtained. To capture a linear relationship, the multiple linear regression model has the form

$$Y = m_0 + m_1 X_1 + m_2 X_2 + m_3 X_3 \qquad (2)$$

where Y is the outcome variable, X_1, X_2, X_3 are the distinct predictor variables and m_0, m_1, m_2, m_3 are the regression coefficients. Table 2, obtained using R, shows that the predictor variables FT and MN are highly significant for the outcome variable CYC, whereas ML is only slightly significant. The coefficient of determination R^2 between CYC and the parameters FT, ML and MN is 0.92013, which indicates a strong correlation between them. Finally, the regression model equation can be written as

$$\mathrm{CYC} = 2.076448\,\mathrm{FT} + 0.184559\,\mathrm{ML} - 8.80129\,\mathrm{MN} + 26.77831 \qquad (3)$$

Table 2 Regression analysis
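The fitted regression model can be evaluated directly. The sketch below only plugs values into Eq. (3); the coefficient names and the function `predict_cyc` are hypothetical, and the inputs reuse the example values from Sect. 4 (FT = 23 g/tex, ML = 30 mm, MN = 3.5).

```python
# Coefficients of the fitted regression equation (3).
COEF = {"intercept": 26.77831, "FT": 2.076448, "ML": 0.184559, "MN": -8.80129}


def predict_cyc(ft, ml, mn):
    """Cotton yarn count (Ne) predicted by the linear regression model."""
    return COEF["intercept"] + COEF["FT"] * ft + COEF["ML"] * ml + COEF["MN"] * mn


cyc = predict_cyc(23.0, 30.0, 3.5)   # approximately 49.27 Ne
```

The signs of the coefficients show the direction of each effect: the predicted count rises with fibre tenacity and mean length and falls as micronaire increases.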
6 Conclusion
In this paper, the CYC is predicted effectively by a fuzzy inference model that controls the essential parameters FT, ML and MN. The regression analysis shows that FT and MN are strongly correlated with the CYC; these two parameters therefore have the greatest influence on the yarn count. The main advantages of the fuzzy inference model are that it is very simple and that it is based on expert knowledge. The proposed model helps the cotton industry to plan and design in advance in order to improve the quality of the cotton yarn count.
References
1. J.F. Bodgan, The characterization of spinning quality. Text. Res. J. 26, 720–730 (1956)
2. J.F. Bodgan, The prediction of cotton yarn strengths. Text. Res. J. 37, 536–537 (1967)
3. R.C. Berkan, S.L. Trubatch, Fuzzy Systems Design Principles (Standard Publishers Distributors, New Delhi, 2000), pp. 22–131
4. L. Cheng, D.L. Adams, Yarn strength prediction using neural networks. Text. Res. J. 65, 495–500 (1995)
5. D. Hidalgo, P. Melin, O. Castillo, Type-2 fuzzy inference system optimization based on the uncertainty of membership functions applied to benchmark problems, in MICAI’10: Proceedings of the 9th Mexican International Conference on Artificial Intelligence Conference on Advances in Soft Computing: Part II, pp. 454–464 (2010)
6. D. Jana, O. Castillo, S. Pramanik, M. Maiti, Application of interval type-2 fuzzy logic to polypropylene business policy in a petrochemical plant in India. J. Saudi Soc. Agric. Sci. 17(1), 24–42 (2018)
7. I. Frydrych, A new approach for predicting strength properties of yarn. Text. Res. J. 62, 340–348 (1992)
8. A. Guha, R. Chattopadhyay, Jayadeva, Predicting yarn tenacity: A comparison of mechanistic, statistical and neural network models. J. Text. Inst. 92, 139–145 (2001)
9. O.M.A. Hafez, Yarn strength prediction of American cottons. Text. Res. J. 48, 701–705 (1978)
10. L. Hunter, Prediction of cotton processing performance and yarn properties from HVI test results. Melliand Textilberichte 69, E123–124 (1988)
11. A. Majumdar, P.K. Majumdar, B. Sarkar, Application of an adaptive neuro fuzzy system for the prediction of cotton yarn strength from HVI fibre properties. J. Text. Inst. 96, 55–60 (2005)
12. Y.E. Mogahzy, Selecting cotton fibre properties for fitting reliable equations to HVI data. Text. Res. J. 58, 392–397 (1988)
13. A. Rajagopalan, G. Washington, G. Rizzani, Y. Guezennec, Development of fuzzy logic and neural network control and advanced emissions modeling for parallel hybrid vehicles, in Center for Automotive Research, Intelligent Structures and Systems Laboratory, Ohio State University, USA (2003)
14. W. Pedrycz, F. Gomide, An Introduction to Fuzzy Sets: Analysis and Design (The MIT Press, 1998)
15. M.C. Ramesh, R. Rajamanickam, S. Jayaraman, Prediction of yarn tensile properties using artificial neural network. J. Text. Inst. 86, 459–469 (1995)
16. B. Smith, B. Waters, Extending applicable ranges of regression equations for yarn strength forecasting. Text. Res. J. 55, 713–717 (1985)
17. T.A. Subramanian, K. Ganesh, S. Bandyopadhyay, A generalized equation for predicting the lea strength of ring spun cotton yarns. Text. Inst. 65, 307–313 (1974)
18. H. Surmann, A.P. Ungering, Fuzzy rule-based systems on general-purpose processors. IEEE Micro 15, 40–48 (1995)
19. T. Takagi, M. Sugeno, Fuzzy identification of systems and its applications to modeling and control. IEEE Trans. Syst. Man Cybern. 15, 116–132 (1985)
20. V. Visalakshi, V. Suvitha, Performance measure of fuzzy queue by using pentagonal fuzzy number. IOP Conf. Ser.: J. Phys. Conf. Ser. 1000, 012015 (2018)
21. T. Yogalakshmi, O. Castillo, On intuitionistic fuzzy C-ends. Trends Math. 1, 177–184 (2017)
22. L.A. Zadeh, Fuzzy sets. Inform. Control 8, 338–353 (1965)
23. L.A. Zadeh, Fuzzy logic, neural networks and soft computing. Commun. ACM 37, 77–84 (1994)
24. L.A. Zadeh, Fuzzy sets versus probability. Proc. IEEE 68, 421–422 (1980)
25. R. Zhu, D. Ethridge, The prediction of cotton yarn irregularity based on the AFIS measurement. Text. Res. Inst. 87, 509–512 (1996)
26. R. Zhu, D. Ethridge, Predicting hairiness for ring and rotor spun yarns and analysing the impact of fibre properties. Text. Res. J. 67, 694–698 (1997)
27. H.J. Zimmerman, Fuzzy Set Theory and Its Applications, 2nd edn (Allied Publishers Limited, New Delhi, 1996), pp. 109–169
28. W. Zurek, I. Frydrych, S. Zakrzewski, A method of predicting the strength and breaking strain of cotton yarn. Text. Res. J. 57, 439–444 (1987)
29. https://www.apparelsearch.com/education/measurements/textiles/yarn-thread/cotton_count.html
Chapter 30
A Risk-Budgeted Portfolio Selection Strategy Using Invasive Weed Optimization
Mohammad Shahid, Mohd Shamim Ansari, Mohd Shamim, and Zubair Ashraf

M. Shahid (✉) · M. S. Ansari · M. Shamim
Department of Commerce, Aligarh Muslim University, Aligarh, India
Z. Ashraf
Department of Computer Science, Aligarh Muslim University, Aligarh, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 R. Tiwari et al. (eds.), Proceedings of International Conference on Computational Intelligence, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-16-3802-2_30

1 Introduction
Portfolio optimization has long been an area of interest in the field of investment. Experts have developed portfolio models, theories and approaches to optimize a portfolio while taking into account the various issues and constraints involved in selecting a reasonable number of securities. Important constraints in portfolio optimization include quality constraints, transaction constraints and transaction lot constraints. Harry Markowitz treated portfolio diversification in a scientific way to minimize the risk of the portfolio, but with the passage of time, owing to high volatility and the very large number of available securities, these models no longer provide a practical solution for optimizing the portfolio [1–3]. With the advent of high-powered computational technology, new ways of solving complex problems have been developed, and these techniques have been used in finance and investment as well. Portfolio optimization has also drawn the interest of experts in mathematics, operational research and computer science, who address the complex problems in portfolio selection [4].

Swarm intelligence is one such technique; it has been developed recently and has gained considerable popularity in portfolio optimization. The fundamentals of portfolio optimization remain the same, i.e., maximization of return and minimization of risk. Swarm intelligence is drawn from the behavior of animals, birds and insects such as butterflies, honey bees, cuckoos, ants and cats: individual members of a swarm act in an uncoordinated way, yet the swarm gradually achieves its goals by learning naturally how to gather food, supply it to the home and minimize the distance between the home and the food source, so that the overall behavior becomes proper, efficient and optimum. In the same manner, computer science uses artificial intelligence and computer programming to manage such constraints efficiently when optimizing a portfolio [5–7]. These techniques were developed because conventional models cannot resolve the problem in the presence of many constraints.

In this paper, a novel risk-budgeted portfolio selection strategy using invasive weed optimization (IWO) is proposed to maximize the Sharpe ratio. Invasive weed optimization is a swarm-based approach inspired by the way invasive weeds colonize a field. This natural invasion process has been modeled mathematically to search for the optimum in the solution space. An experimental study evaluates the proposed strategy by comparing its performance with that of the Genetic Algorithm (GA) on the S&P BSE Sensex dataset of the Indian stock exchange (30 stocks). The study shows that the proposed strategy performs better than GA. The remainder of the paper is organized as follows: Sect. 2 discusses the related literature. The problem formulation is presented in Sect. 3. Section 4 discusses the solution approach using IWO. Experimental results are shown in Sect. 5. Finally, the conclusion and future directions are given in Sect. 6.
2 Related Work
Portfolio selection has been one of the most studied areas of financial optimization over the last two decades. Swarm intelligence techniques have been developed to solve the portfolio selection problem [8, 9]. Swarm intelligence (SI) algorithms include particle swarm optimization (PSO), artificial bee colony optimization (ABC), the firefly algorithm, etc. An improved artificial bee colony (IABC) method, which incorporates an efficient technique for non-dominated portfolio sets, has been applied in different areas and found to be more effective. The efficient frontiers it obtains focus on the trade-off between the risk and return of the portfolio. The study tested IABC on four stock market indices at different points in time and compared the results with four other techniques; IABC was found to be better in terms of convergence, diversity and effectiveness [10].

The nonlinear constrained portfolio optimization problem has been addressed with PSO, a metaheuristic developed in 1995. The authors identified the best particle in a population in which each particle has a fitness value. At every step, the best position of each particle and the best neighbor in the swarm were updated based on fitness. PSO was tested on six different portfolios built from stocks, with their prices, in the Shanghai Stock Exchange 50 Index. The technique has been tested on different types of risky portfolios and was found to be superior to earlier techniques and models [11–14].

The portfolio selection problem has also been studied using the cat swarm optimization meta-heuristic, which imitates two behavioral modes of cats, seeking and tracing: in the seeking mode a cat is at rest and observes its environment, collecting potentially useful information, and in the tracing mode the cat hunts by applying the collected information. The authors used the cat algorithm (CA) to obtain the efficient frontier [15, 16]. The firefly algorithm (FA) is a swarm intelligence meta-heuristic first proposed by Yang in 2008. It has been given a more intensive exploitation mechanism and studied on two kinds of problem, i.e., unconstrained and constrained portfolio problems, where it was found to have great potential for portfolio optimization [17, 18].

Shoaf et al. first applied GA to the portfolio selection problem in [19]. Afterward, Chang et al. [20] also used this evolutionary algorithm for Mean Variance Portfolio Optimization (MVPO). Invasive weed optimization with the mean-variance model has been proposed for portfolio optimization in [22, 23].
3 Problem Formulation
In this section, the portfolio selection problem is formulated. A portfolio $P$ consisting of $N$ assets is constructed, i.e., $P = \{A_1, A_2, \ldots, A_N\}$ with associated weights $\{W_1, W_2, \ldots, W_N\}$. The expected returns of the assets $A_i$ are $R_1, R_2, \ldots, R_N$. The total portfolio risk and the risk contribution of $A_i$ are estimated as follows:

$$\mathrm{Risk}_P = \sqrt{\sum_{i}^{N} \sum_{j}^{N} W_i * W_j * \mathrm{CoV}(i, j)} \tag{1}$$

$$\mathrm{Risk}_{A_i} = \sum_{j}^{N} W_j * \mathrm{CoV}(i, j) \tag{2}$$

where $W_i$ is the weight of $A_i$ and $\mathrm{CoV}(i, j)$ is the covariance between the returns of assets $i$ and $j$. Risk budgeting is used to distribute the risk share among the assets $A_i$ in $P$ and is governed by the marginal contribution to total risk (MCTR). $\mathrm{MCTR}_i$ measures the amount of risk contributed by $A_i$ to the total portfolio risk ($\mathrm{Risk}_P$) and is computed as

$$\mathrm{MCTR}_i = \frac{\mathrm{Risk}_{A_i}}{\mathrm{Risk}_P} \tag{3}$$

The Absolute Contribution to Total Risk ($\mathrm{ACTR}_i$) is then

$$\mathrm{ACTR}_i = W_i * \mathrm{MCTR}_i, \quad i = 1, 2, \ldots, N \tag{4}$$
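The quantities in Eqs. (1)–(4) can be sketched in a few lines of Python. This is a minimal illustration, not the authors' MATLAB code: it assumes that $\mathrm{Risk}_P$ is the standard deviation of the portfolio return (the square root of the quadratic form), and the function name `risk_contributions` and the three-asset covariance matrix are made up for the example.

```python
import math


def risk_contributions(weights, cov):
    """Return (risk_p, mctr, actr) for a weight vector and covariance matrix.

    Assumption: Risk_P is the portfolio standard deviation, so that the
    ACTR values add up to Risk_P.
    """
    n = len(weights)
    # Eq. (2): risk contribution of each asset, sum_j W_j * CoV(i, j)
    risk_assets = [sum(weights[j] * cov[i][j] for j in range(n)) for i in range(n)]
    # Eq. (1): total portfolio risk
    risk_p = math.sqrt(sum(weights[i] * risk_assets[i] for i in range(n)))
    # Eq. (3): marginal contribution to total risk
    mctr = [r / risk_p for r in risk_assets]
    # Eq. (4): absolute contribution to total risk
    actr = [w * m for w, m in zip(weights, mctr)]
    return risk_p, mctr, actr


# Toy data (hypothetical, not from the paper's dataset)
cov = [[0.0040, 0.0010, 0.0005],
       [0.0010, 0.0025, 0.0008],
       [0.0005, 0.0008, 0.0016]]
w = [0.5, 0.3, 0.2]

risk_p, mctr, actr = risk_contributions(w, cov)
print("Risk_P:", risk_p)
print("ACTR  :", actr)
print("sum(ACTR) equals Risk_P:", math.isclose(sum(actr), risk_p))
```

Under this assumption the ACTR values form an additive decomposition of the portfolio risk, which is what makes a per-asset risk budget meaningful.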
The investor wants to invest in the portfolio that maximizes the Sharpe ratio ($SR_P$). The risk-free return is taken to be zero here because we are dealing with equity-based assets. First, the Sharpe ratio of the portfolio is estimated, and the problem statement can then be modeled as

$$\max(SR_P) = \max \frac{\sum_{i=1}^{N} R_i * W_i}{\mathrm{Risk}_P} \tag{5}$$

subject to the constraints

(i) $\sum_{i=1}^{N} W_i = 1$
(ii) $W_i \ge 0$
(iii) $a_i \le W_i \le b_i$
(iv) $\mathrm{ACTR}_i \le r\%$ of $\mathrm{Risk}_P$, $i = 1, 2, \ldots, N$
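The objective of Eq. (5) and the four constraints can be evaluated for a candidate weight vector as sketched below. This is an illustrative sketch under stated assumptions: $\mathrm{Risk}_P$ is taken as the portfolio standard deviation, the risk-free return is zero as in the text, and the helper names (`portfolio_risk`, `sharpe_ratio`, `constraint_violations`) and the toy data are hypothetical.

```python
import math


def portfolio_risk(w, cov):
    """Portfolio standard deviation (assumed form of Risk_P)."""
    n = len(w)
    return math.sqrt(sum(w[i] * w[j] * cov[i][j] for i in range(n) for j in range(n)))


def sharpe_ratio(w, returns, cov):
    """Eq. (5) with a zero risk-free return."""
    return sum(r * x for r, x in zip(returns, w)) / portfolio_risk(w, cov)


def constraint_violations(w, cov, lower, upper, r):
    """Violation amount per constraint; every entry is 0.0 for a feasible w."""
    n = len(w)
    risk_p = portfolio_risk(w, cov)
    actr = [w[i] * sum(w[j] * cov[i][j] for j in range(n)) / risk_p for i in range(n)]
    out = [abs(sum(w) - 1.0)]                        # (i) fully invested
    out += [max(0.0, -x) for x in w]                 # (ii) no short selling
    out += [max(0.0, lo - x) + max(0.0, x - hi)      # (iii) weight bounds
            for x, lo, hi in zip(w, lower, upper)]
    out += [max(0.0, a - r * risk_p) for a in actr]  # (iv) risk budget, r as a fraction
    return out


# Toy data (hypothetical, not from the paper's dataset)
cov = [[0.0040, 0.0010, 0.0005],
       [0.0010, 0.0025, 0.0008],
       [0.0005, 0.0008, 0.0016]]
returns = [0.012, 0.010, 0.008]
w = [0.5, 0.3, 0.2]

sr = sharpe_ratio(w, returns, cov)
loose = constraint_violations(w, cov, [0.0] * 3, [1.0] * 3, r=0.7)
tight = constraint_violations(w, cov, [0.0] * 3, [1.0] * 3, r=0.6)
print("Sharpe ratio:", sr)
print("feasible with r = 70%:", max(loose) < 1e-12)
print("feasible with r = 60%:", max(tight) < 1e-12)
```

Returning violation amounts rather than a single boolean is a design choice made here so that the same function can feed a penalty term.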
Here, (i) represents the fully invested constraint, and (ii) restricts short selling. Further, (iii) is the boundary constraint, which imposes lower and upper bounds on the asset weights. Lastly, (iv) is the risk budgeting constraint, which distributes the net portfolio risk over the assets subject to the risk limit (r%). The constraints in the problem are linear, and the feasible region is convex. Therefore, a penalty function approach is used for constraint handling. Let the feasible search space of the decision variables be represented by $X$; the penalty function $J$ is then

$$J(x) = \begin{cases} F(x) + \sum_{i} g_i(x), & \text{if } x \notin X \\ F(x), & \text{if } x \in X \end{cases} \tag{6}$$

where $F(x)$ is the solution value and $g_i(x)$ is the value of the $i$-th constraint for the variable $x$.
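Eq. (6) can be read as a wrapper around the objective. The sketch below is a minimal illustration under stated assumptions: $F$ is treated as a cost to be minimized (for the Sharpe ratio one would pass its negative), each $g_i$ returns a non-negative violation amount that is zero when constraint $i$ holds, and the names `make_penalized` and `scale` are hypothetical and do not come from the paper.

```python
def make_penalized(objective, violations, scale=1.0):
    """Build J(x) in the spirit of Eq. (6).

    objective  : F(x), a cost to be minimized (assumption)
    violations : list of functions g_i(x) >= 0, zero when constraint i holds (assumption)
    scale      : hypothetical weight applied to the summed penalty terms
    """
    def J(x):
        g = [gi(x) for gi in violations]
        if any(v > 0 for v in g):            # x lies outside the feasible set X
            return objective(x) + scale * sum(g)
        return objective(x)                   # x is feasible: no penalty added
    return J


# Toy usage: minimize (x - 2)^2 subject to x <= 1
F = lambda x: (x - 2.0) ** 2
g1 = lambda x: max(0.0, x - 1.0)
J = make_penalized(F, [g1], scale=10.0)

print("feasible point  :", J(0.5), "vs F =", F(0.5))
print("infeasible point:", J(1.5), "vs F =", F(1.5))
```

The penalty weight trades off how strongly infeasible candidates are pushed back toward the feasible region; the paper does not state which weight, if any, was used.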
4 Proposed Strategy
In this section, the proposed strategy for risk-budgeted portfolio selection using invasive weed optimization is explained; it generates the optimal weights that maximize the Sharpe ratio of the portfolio. Mehrabian and Lucas developed this optimization meta-heuristic in 2006 by simulating the behavior of invasive weeds [21, 22]. Such weeds adapt to the environment and grow rapidly. The algorithmic steps are presented in Fig. 1.

Fig. 1 Algorithmic template

The reproduction of weeds stated in step 6 of the template is linear between $S_{\min}$ (the minimum number of allowed seeds) and $S_{\max}$ (the maximum number of allowed seeds), according to the cost function value of each weed. Weeds with a lower fitness value produce more seeds than weeds with a higher one. The number of seeds ($\mathrm{seed}_i^k$) reproduced by each weed in iteration $k$ is given as follows:

$$\mathrm{seed}_i^k = S_{\max} - (S_{\max} - S_{\min}) \frac{f_i^k - f_{\min}^k}{f_{\max}^k - f_{\min}^k} \tag{7}$$

where $k = 1, 2, \ldots, \mathrm{iter}_{\max}$, $f_i^k$ is the fitness value of weed $i$ in iteration $k$, and $f_{\min}^k$ and $f_{\max}^k$ are the minimum and maximum fitness values in the colony in iteration $k$. Further, weeds distribute their seeds to grow new weeds, as in step 7. The standard deviation for iteration $k$ ($\sigma^k$) is computed as

$$\sigma^k = \sigma_{\mathrm{final}} + (\sigma_{\mathrm{initial}} - \sigma_{\mathrm{final}}) \frac{(\mathrm{iter}_{\max} - k)^n}{(\mathrm{iter}_{\max} - 1)^n} \tag{8}$$

The standard deviation of the seed dispersal decreases from $\sigma_{\mathrm{initial}}$ to $\sigma_{\mathrm{final}}$ over the iterations, and $n$ is the nonlinear modulation index, considered here to be 3.
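Eqs. (7) and (8) can be sketched in Python together with a compact IWO loop on a toy cost function. This is a generic sketch of IWO as commonly described, not a transcription of the template in Fig. 1: the truncation of the seed count to an integer, the handling of a colony with identical fitness, the clipping of seeds to the bounds, and the parameters `pop_init` and `pop_max` are all assumptions made for the example. The default $n = 3$ follows this section, while Sect. 5 lists $n = 2$.

```python
import random


def seed_count(f, f_min, f_max, s_min=0, s_max=5):
    """Eq. (7): lower cost gives more seeds (result truncated to an integer)."""
    if f_max == f_min:              # assumption: equally fit weeds all get s_max seeds
        return s_max
    return int(s_max - (s_max - s_min) * (f - f_min) / (f_max - f_min))


def sigma_at(k, iter_max, sigma_initial=0.5, sigma_final=0.001, n=3):
    """Eq. (8): dispersal standard deviation at iteration k (needs iter_max > 1)."""
    return sigma_final + (sigma_initial - sigma_final) * (iter_max - k) ** n / (iter_max - 1) ** n


def iwo_minimize(cost, dim, iter_max=100, pop_init=10, pop_max=25,
                 lo=0.0, hi=1.0, seed=0):
    """Generic IWO loop; pop_init, pop_max and the bounds are hypothetical settings."""
    rng = random.Random(seed)
    colony = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(pop_init)]
    for k in range(1, iter_max + 1):
        costs = [cost(weed) for weed in colony]
        f_min, f_max = min(costs), max(costs)
        sigma = sigma_at(k, iter_max)
        offspring = []
        for weed, f in zip(colony, costs):
            for _ in range(seed_count(f, f_min, f_max)):
                # Seeds are scattered around the parent and clipped to the bounds
                offspring.append([min(hi, max(lo, x + rng.gauss(0.0, sigma))) for x in weed])
        # Competitive exclusion: keep only the pop_max best plants
        colony = sorted(colony + offspring, key=cost)[:pop_max]
    return colony[0]


# Toy usage: minimize the squared distance to the point (0.3, 0.3)
sphere = lambda x: sum((v - 0.3) ** 2 for v in x)
best = iwo_minimize(sphere, dim=2)
print("best point:", best, "cost:", sphere(best))
```

For the portfolio problem the cost passed to such a loop would be the penalized negative Sharpe ratio, with each weed holding one candidate weight vector.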
5 Experimental Results
In this section, an experimental study evaluates the performance of the suggested strategy by comparing its results with those of the Genetic Algorithm (GA) for the same objective and simulation environment. The dataset consists of monthly holding period returns of the S&P BSE Sensex stocks of the Indian stock exchange from April 1, 2010, to March 31, 2020. Experiments were conducted in MATLAB on an Intel(R) i7 processor with 16 GB RAM.
Fig. 2 Objective function versus number of iterations
The parameters of IWO are initialized as follows: number of points in the population, i.e., initial population size = 100, σinitial = 0.5, σfinal = 0.001, Smin = 0, Smax = 5, n = 2, and the maximum number of iterations is 100. For GA, the single-point crossover probability and the polynomial mutation probability are chosen as 0.6 and 0.4, respectively. The remaining GA parameters are the same as for IWO. As the output of the experiments, the Sharpe ratio and the corresponding return and risk of the proposed strategy are 0.3209, 0.01460 and 0.0455, respectively. The corresponding values for GA are 0.2936, 0.0131 and 0.0447. The proposed strategy therefore performs better than GA in terms of maximizing the Sharpe ratio. The convergence behavior of IWO and GA is shown in Fig. 2 up to 500 iterations; IWO performs better than GA from a very early stage.

The optimal portfolio obtained by the proposed strategy under the parameters of the formulated optimization problem is presented in Table 1, which lists, for each company, the weight, the marginal contribution to risk, the risk budgeting constraint and the status indicating whether the stock satisfies the risk budgeting constraint. The risk budget taken in this study is 12.5% of the total risk for every asset. All constraints considered in the problem formulation are satisfied by the optimal portfolio. The fully invested constraint, which requires the optimal weights to sum to 1, is satisfied in Table 1. The boundary constraints, which restrict the weight values to between 0 and 1, are also satisfied. The risk budgeting constraint is satisfied for all stocks, as shown by the status value 'True' in every row of the table. This implies that the risk contribution of each company is less than or equal to 12.5% of the total risk.
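The reported figures can be cross-checked with simple arithmetic. The sketch below takes the weights and MCTR values from Table 1 and the reported risk, return and Sharpe ratio of the proposed strategy, and checks the relations of Sect. 3 to the four decimals printed in the paper. The variable names are chosen for this example only, and the reading of $\mathrm{Risk}_P$ as a standard deviation is an assumption.

```python
# Weights W_i and MCTR_i for the 30 stocks, in the row order of Table 1
W = [0.1165, 0.0000, 0.0000, 0.0300, 0.0721, 0.0264, 0.0000, 0.0001, 0.2275, 0.0352,
     0.0000, 0.0000, 0.0000, 0.0000, 0.0762, 0.0000, 0.0000, 0.0000, 0.0000, 0.1823,
     0.0179, 0.0176, 0.0022, 0.0351, 0.0001, 0.0719, 0.0000, 0.0129, 0.0760, 0.0000]
MCTR = [0.0447, 0.0804, 0.0402, 0.0959, 0.0726, 0.0355, 0.0190, 0.0393, 0.0247, 0.0482,
        0.0722, 0.0301, 0.0750, 0.0419, 0.0443, 0.0651, 0.0587, 0.0652, 0.0369, 0.0310,
        0.1340, 0.0237, 0.0426, 0.1338, 0.0262, 0.0177, 0.0450, 0.0313, 0.0742, 0.0479]

RISK_P, RETURN_P, SHARPE_P = 0.0455, 0.0146, 0.3209   # reported for the proposed strategy
R_LIMIT = 0.125                                        # risk budget of 12.5%

actr = [w * m for w, m in zip(W, MCTR)]                # Eq. (4)
budget = R_LIMIT * RISK_P                              # per-asset risk limit

print("sum of weights        :", round(sum(W), 4))
print("sum of ACTR           :", round(sum(actr), 4))
print("return / risk         :", round(RETURN_P / RISK_P, 4))
print("risk budget per asset :", round(budget, 4))
print("all ACTR within budget:", all(a <= budget for a in actr))
```

With these inputs the weights add up to 1, the ACTR values add up to roughly the reported portfolio risk, and the ratio of return to risk reproduces the reported Sharpe ratio, all up to rounding. The values in the r% column of Table 1 are also consistent with $\mathrm{ACTR}_i$ minus the per-asset budget of about 0.0057, although the paper does not state this explicitly.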
Table 1 Description of optimal portfolio achieved by proposed strategy

| S. No | Company name | W_i | MCTR_i | ACTR_i | r% | Status |
|---|---|---|---|---|---|---|
| 1 | Asian Paints Ltd. | 0.1165 | 0.0447 | 0.0052 | -0.0005 | True |
| 2 | Axis Bank Ltd. | 0.0000 | 0.0804 | 0.0000 | -0.0057 | True |
| 3 | Bajaj Auto Ltd. | 0.0000 | 0.0402 | 0.0000 | -0.0057 | True |
| 4 | Bajaj Finance Ltd. | 0.0300 | 0.0959 | 0.0029 | -0.0028 | True |
| 5 | Bajaj Finserv Ltd. | 0.0721 | 0.0726 | 0.0052 | -0.0005 | True |
| 6 | Bharti Airtel Ltd. | 0.0264 | 0.0355 | 0.0009 | -0.0048 | True |
| 7 | H C L Technologies Ltd. | 0.0000 | 0.0190 | 0.0000 | -0.0057 | True |
| 8 | H D F C Bank Ltd. | 0.0001 | 0.0393 | 0.0000 | -0.0057 | True |
| 9 | Hindustan Unilever Ltd | 0.2275 | 0.0247 | 0.0056 | -0.0001 | True |
| 10 | Housing Finance Corpn. Ltd. | 0.0352 | 0.0482 | 0.0017 | -0.0040 | True |
| 11 | I C I C I Bank Ltd. | 0.0000 | 0.0722 | 0.0000 | -0.0057 | True |
| 12 | I T C Ltd. | 0.0000 | 0.0301 | 0.0000 | -0.0057 | True |
| 13 | Indusind Bank Ltd. | 0.0000 | 0.0750 | 0.0000 | -0.0057 | True |
| 14 | Infosys Ltd. | 0.0000 | 0.0419 | 0.0000 | -0.0057 | True |
| 15 | Kotak Mahindra Bank Ltd. | 0.0762 | 0.0443 | 0.0034 | -0.0023 | True |
| 16 | Larsen & Toubro Ltd. | 0.0000 | 0.0651 | 0.0000 | -0.0057 | True |
| 17 | Mahindra & Mahindra Ltd. | 0.0000 | 0.0587 | 0.0000 | -0.0057 | True |
| 18 | Maruti Suzuki India Ltd | 0.0000 | 0.0652 | 0.0000 | -0.0057 | True |
| 19 | N T P C Ltd. | 0.0000 | 0.0369 | 0.0000 | -0.0057 | True |
| 20 | Nestle India Ltd. | 0.1823 | 0.0310 | 0.0056 | 0.0000 | True |
| 21 | Oil & Natural Gas Corpn. Ltd. | 0.0179 | 0.1340 | 0.0024 | -0.0033 | True |
| 22 | Power Grid Corpn. Of India Ltd. | 0.0176 | 0.0237 | 0.0004 | -0.0053 | True |
| 23 | Reliance Industries Ltd | 0.0022 | 0.0426 | 0.0001 | -0.0056 | True |
| 24 | State Bank of India | 0.0351 | 0.1338 | 0.0047 | -0.0010 | True |
| 25 | Sun Pharmaceutical Inds. Ltd. | 0.0001 | 0.0262 | 0.0000 | -0.0057 | True |
| 26 | Tata Consultancy Services Ltd. | 0.0719 | 0.0177 | 0.0013 | -0.0044 | True |
| 27 | Tata Steel Ltd. | 0.0000 | 0.0450 | 0.0000 | -0.0057 | True |
| 28 | Tech Mahindra Ltd. | 0.0129 | 0.0313 | 0.0004 | -0.0053 | True |
| 29 | Titan Company Ltd. | 0.0760 | 0.0742 | 0.0056 | 0.0000 | True |
| 30 | Ultratech Cement Ltd. | 0.0000 | 0.0479 | 0.0000 | -0.0057 | True |

6 Conclusions and Future Scope
In this work, a novel risk-budgeted portfolio selection strategy using the invasive weed optimization (IWO) algorithm has been developed. The aim of the strategy is to maximize the Sharpe ratio under the fully invested, boundary and risk budgeting constraints. To place the proposed strategy in the literature, a study was carried out comparing its performance with that of the Genetic Algorithm on the S&P BSE Sensex dataset of the Indian stock exchange (30 stocks). The study shows that the proposed strategy performs better for the considered parameters on the above-mentioned dataset.

Acknowledgements This work is supported by the major research project funded by ICSSR with sanction No. F.No.-02/47/2019-20/MJ/RP.
References
1. H. Markowitz, Portfolio Selection: Efficient Diversification of Investments (Wiley, New York, 1959)
2. H. Markowitz, Portfolio selection. J. Finance 7(1), 77–91 (1952)
3. Y. Jin, R. Qu, J. Atkin, Constrained portfolio optimisation: the state-of-the-art Markowitz models, in Proceedings of the 5th International Conference on Operations Research and Enterprise Systems, vol. 1 (2016), pp. 388–395. https://doi.org/10.5220/0005758303880395. ICORES, ISBN 978-989-758-171-7
4. Y. Crama, M. Schyns, Simulated annealing for complex portfolio selection problems. Eur. J. Oper. Res. 150(3), 546–571 (2003). https://doi.org/10.1016/S0377-2217(02)00784-1
5. K.E. Parsopoulosi, M.N. Vrahatis, Recent approaches to global optimization problems through particle swarm optimization. Nat. Comput. 1, 235–306 (2002) (Kluwer Academic Publishers, Netherlands)
6. W. Ting, Y. Xia, The study of model for portfolio investment based on ant colony algorithm, in ETP International Conference on Future Computer and Communication (IEEE, 2009), 978-0-7695-3676-7/09. https://doi.org/10.1109/FCC.2009.71
7. B. Niu, B. Xue, L. Li, Y. Chai, Symbiotic multi-swarm PSO for portfolio optimization. Springer-Verlag, Berlin Heidelberg, LNAI 5755, 776–784 (2009)
8. D. Karaboga, B. Akay, A survey: algorithms simulating bee swarm intelligence. Artif. Intell. Rev. 31, 61–85 (2009). https://doi.org/10.1007/s10462-009-9127-4 (Springer Science+Business Media)
9. K. Mazumdar, D. Zhang, Y. Guo, Portfolio selection and unsystematic risk optimisation using swarm intelligence. J. Bank. Financ. Technol. (2020). https://doi.org/10.1007/s42786-019-00013-x
10. A. H. L., Y.-C. Liang, C.-C. Liu, Portfolio optimization using improved artificial bee colony approach, in 2013 IEEE Conference on Computational Intelligence for Financial Engineering & Economics (CIFEr) (2013). https://doi.org/10.1109/cifer.2013.6611698
11. H. Zhu, Y. Wang, K. Wang, Y. Chen, Particle swarm optimization (PSO) for the constrained portfolio optimization problem. Expert Syst. Appl. 38(8), 10161–10169 (2011). https://doi.org/10.1016/j.eswa.2011.02.075
12. Q. Ni, X. Yin, K. Tian, Y. Zhai, Particle swarm optimization with dynamic random population topology strategies for a generalized portfolio selection problem. Nat. Comput. 16(1), 31–44 (2016)
13. C. Liu, Y. Yin, Particle swarm optimized analysis of investment decision. Cogn. Syst. Res. 52, 685–690 (2018). https://doi.org/10.1016/j.cogsys.2018.07.032
14. H. Zhang, Optimization of risk control in financial markets based on particle swarm optimization algorithm. J. Comput. Appl. Math. 112530 (2019). https://doi.org/10.1016/j.cam.2019.112530
15. P.-W. Tsai, V. Istanda, Review on cat swarm optimization algorithms, in 2013 3rd International Conference on Consumer Electronics, Communications and Networks (2013). https://doi.org/10.1109/cecnet.2013.6703394
16. H.M. Kamil, E. Riffi, Portfolio selection using the cat swarm optimization. J. Theoret. Appl. Inf. Technol. 74(3) (2005). ISSN: 1992-8645
17. M. Tuba, N. Bacanin, Upgraded firefly algorithm for portfolio optimization problem, in 2014 UKSim-AMSS 16th International Conference on Computer Modelling and Simulation, 978-1-4799-4923-6/14 (2014). https://doi.org/10.1109/UKSim.2014.25
18. N. Bacanin, B. Pelevic, M. Tuba, Portfolio optimization problem by the firefly algorithm. Mathe Models Eng. Comput. Sci. pp. 63–68, ISBN: 978-1-61804-194-4 (2012)
19. J. Shoaf, J.A. Foster, Efficient set GA for stock portfolios, in Proceedings of the IEEE Conference on Evolutionary Computation, ICEC (IEEE, 1998), pp. 354–359
20. T.J. Chang, S.C. Yang, K.J. Chang, Portfolio optimization problems in different risk measures using genetic algorithm. Expert Syst. Appl. 36, 10529–10537 (2009)
21. A. Mehrabian, C. Lucas, A novel numerical optimization algorithm inspired from weed colonization. Eco. Inform. 1(4), 355–366 (2006)
22. A.R. Pouya, M. Solimanpur, M.J. Rezaee, Solving multi-objective portfolio optimization problem using invasive weed optimization. Swarm Evol. Comput. 28, 42–57 (2016)
23. P. Swain, A.K. Ojha, Portfolio optimization using particle swarm optimization and invasive weed optimization, in D. Dutta, B. Mahanty (eds.) Numerical Optimization in Engineering and Sciences. Advances in Intelligent Systems and Computing, vol. 979 (Springer, Singapore, 2020). https://doi.org/10.1007/978-981-15-3215-3_30
Author Index

A
Agarwal, Vardan, 45, 46, 101
Aghav, Jagannath, 149
Ansari, Mohd Shamim, 363
Arjaria, Siddhartha Kumar, 59
Arya, K. V., 171
Ashraf, Zubair, 363

B
Bagchi, Jayri, 137
Bora, Parinita, 71

C
Castillo, Oscar, 349
Chaitanya, M. B. Surya, 207
Chatterjee, Pradeep, 321
Chatterjee, Subarna, 71
Chatterjee, Sudip, 195
Chaubey, Gyanendra, 59

D
Daws, Joseph, 333
Dcruz, Jenny, 31, 32
Dereventsov, Anton, 333
Dhalia Sweetlin, J., 127
Dixit, Anuja, 45, 46
Dixit, Rahul, 45, 46, 101
Dogra, Manmohan, 31, 32
Domala, Jayashree, 31, 32

E
Eswari, R., 113

G
Gautham, A. K., 87
Goel, Shivani, 321
Gupta, Anjali, 207
Gupta, Lalita, 241

H
Hamdare, Safa, 31, 32

I
Ibrahim, Vazim, 183

J
Jairam, Bhat Geetalaxmi, 263
Jetawat, Ashok Kumar, 221
Jhalani, Harshit, 101
Jiang, Shi-Jie, 281
Jyothi Prasanth, D. R., 127

K
Kanagaraj, G., 87
Kaushik, Priyanka, 161
Keshervani, Rahul, 149
Khare, Priyanshi, 1
Kumaraguruparan, G., 87
Kumar, Ajai, 299
Kuriakose, Yohan Varghese, 45, 46

L
Lal, Chhagan, 253

M
Makwana, Gaurav, 241
Mishra, Apoorva, 299
Mujahid, A. Abdulla, 87

N
Naseem, Shaik, 207
Nguyen, Thi-Xuan-Huong, 281, 311
Nikam, Chandana, 149
Ning, Chun-Yuan, 281, 311

P
Paul, Joseph Suresh, 183
Pham, Duc-Tinh, 281, 311

R
Rajput, Shyam Singh, 171
Ramya, S. P., 113
Ratan, Rajeev, 161
Rohilla, Rajesh, 289

S
Sable, Rachna Yogesh, 321
Sangle, Ravindra, 221
Satapathy, Sukla, 231
Shahid, Mohammad, 363
Shah, Shravani, 149
Shamim, Mohd, 363
Shang, Jun-Jie, 281, 311
Sharma, Bhawana, 253
Sharma, Lokesh, 253
Sharma, Neetu, 289
Sharma, Sanjeev, 299
Shukla, Sanyam, 1
Singh, Astha, 171
Singh, Lenali, 299
Singh, Pranav, 101
Singh, Shashi Pal, 299
Sridevi, M., 15, 207
Si, Tapas, 137
Sruthi, Sreeram, 127
Sugirtha, T., 15

T
Thakur, Gour Sundar Mitra, 195

V
Visalakshi, V., 349

W
Wadhvani, Rajesh, 1
Webster, Clayton G., 333

Y
Yadav, Ram Narayan, 241
Yogalakshmi, T., 349

© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 R. Tiwari et al. (eds.), Proceedings of International Conference on Computational Intelligence, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-16-3802-2