English Pages [503]
Studies in Computational Intelligence 956
Vinit Kumar Gunjan Jacek M. Zurada Editors
Modern Approaches in Machine Learning and Cognitive Science: A Walkthrough Latest Trends in AI, Volume 2
Studies in Computational Intelligence Volume 956
Series Editor Janusz Kacprzyk, Polish Academy of Sciences, Warsaw, Poland
The series “Studies in Computational Intelligence” (SCI) publishes new developments and advances in the various areas of computational intelligence—quickly and with a high quality. The intent is to cover the theory, applications, and design methods of computational intelligence, as embedded in the fields of engineering, computer science, physics and life sciences, as well as the methodologies behind them. The series contains monographs, lecture notes and edited volumes in computational intelligence spanning the areas of neural networks, connectionist systems, genetic algorithms, evolutionary computation, artificial intelligence, cellular automata, self-organizing systems, soft computing, fuzzy systems, and hybrid intelligent systems. Of particular value to both the contributors and the readership are the short publication timeframe and the world-wide distribution, which enable both wide and rapid dissemination of research output. Indexed by SCOPUS, DBLP, WTI Frankfurt eG, zbMATH, SCImago. All books published in the series are submitted for consideration in Web of Science.
More information about this series at http://www.springer.com/series/7092
Editors Vinit Kumar Gunjan Department of Computer Science and Engineering CMR Institute of Technology Hyderabad, India
Jacek M. Zurada Department of Electrical and Computer Engineering University of Louisville Louisville, KY, USA
ISSN 1860-949X ISSN 1860-9503 (electronic) Studies in Computational Intelligence ISBN 978-3-030-68290-3 ISBN 978-3-030-68291-0 (eBook) https://doi.org/10.1007/978-3-030-68291-0 © The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2021 This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations. This Springer imprint is published by the registered company Springer Nature Switzerland AG The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland
Preface
Technology is advancing faster than ever. Machine learning and cognitive science approaches are the next essential wave of innovation, driven by advances in computing power and built on a firm mathematical foundation. Today, almost every company that wants to succeed is willing to integrate these methods into the fabric of its business, although until a few years ago they were out of reach for most companies. The purpose of this book is to bring together a comprehensive body of knowledge and recent research in machine learning and cognitive science. It offers a cohesive view of the field, both as an academic discipline and as a research enterprise, and lays the foundation for modern approaches in machine learning and cognitive science. The book is intended to serve as a tool for advancing studies in machine learning and cognitive science, and it seeks to meet the needs of researchers, research scholars and professionals in the sector. It is particularly suitable for research scholars working on the different techniques used in machine learning and its applications, cognitive science and computing technologies, and it provides a good guide for scholars intending to pursue research in these fields. It is also a good reference for industry professionals who want to learn about modern technologies. Because of the interdisciplinary nature of the content, the book makes few assumptions about the reader's background. Instead, it introduces fundamental concepts from statistics, artificial intelligence, information theory and other fields as the need arises, concentrating on those concepts that are most applicable to machine learning and cognitive science. The book is written to give researchers a detailed perspective and to help readers grasp the concepts of machine learning, cognitive science and related technologies easily, effectively and accurately.
The book consists of 40 chapters, arranged on the basis of their approaches and contributions to the book’s theme. The chapters of this textbook present key algorithms and theories that form the core of the technologies and applications concerned, consisting mainly of face recognition, evolutionary algorithms such as genetic algorithms, automotive applications, automation devices with artificial neural networks, business management systems and modern speech processing systems. It also covers topics which are used as a part of the learning modules in deep learning algorithms.
The book also covers recent advances in medical diagnostic systems, sensor networks and systems in the VLSI area.
Vinit Kumar Gunjan, Hyderabad, India
Jacek M. Zurada, Louisville, USA
Contents
Using CNN to Predict Regressive STD Drug Efficacy Score
Ambarish Moharil, Mansimran Singh Anand, Chirag Kedia, and Nikhil Sonavane

Emotion Recognition in Hindi Speech Using CNN-LSTM Model
B. Shashank, Bhavani Shankar, L. Chandresh, and R. Jayashree

Refinery Profit Planning via Evolutionary Many-Objective Optimization
Vadlamani Madhav, Shaik Tanveer-Ul Huq, and Vadlamani Ravi

A Deep Learning Technique for Image Inpainting with GANs
K. A. Suraj, Sumukh H. Swamy, Shrijan S. Shetty, and R. Jayashree

A Comparative Study on Distributed File Systems
Suman De and Megha Panjwani

An Organized Approach for Analysis of Diabetic Nephropathy Images Using Watershed and Contrast Adaptive Thresholding
Syed Musthak Ahmed, Fahimuddin Shaik, Vinit Kumar Gunjan, and Mohammed Yasin Ali

A Literature Survey on Identification of Asthma Using Different Classifier and Clustering Techniques
Syed Musthak Ahmed, Fahimuddin Shaik, Vinit Kumar Gunjan, and Mohammed Yasin Ali

Adaptation and Evolution of Decision Support Systems—A Typological Survey
Ravi Lourdusamy and Xavierlal J. Mattam

Real-Time Implementation of Brain Emotional Controller for Sensorless Induction Motor Drive with Adaptive System
Sridhar Savarapu and Yadaiah Narri

Student Performance Prediction—A Data Science Approach
Y. Sri Lalitha, Y. Gayathri, M. V. Aditya Nag, and Sk. Althaf Hussain Basha

HARfog: An Ensemble Deep Learning Model for Activity Recognition Leveraging IoT and Fog Architectures
R. Raja Subramanian and V. Vasudevan

Performance Evaluation and Identification of Optimal Classifier for Credit Card Fraudulent Detection
Arpit Bhushan Sharma and Brijesh Singh

Potential Use-Cases of Natural Language Processing for a Logistics Organization
Rachit Garg, Arvind W. Kiwelekar, Laxman D. Netak, and Swapnil S. Bhate

Partial Consensus and Incremental Learning Based Intrusion Detection System
Mohd Mohtashim Nawaz, Vineet Gupta, Jagrati Rawat, Kumar Prateek, and Soumyadev Maity

AI Enabled Context Sensitive Information Retrieval System
Binil Kuriachan, Gopikrishna Yadam, and Lakshmi Dinesh

Personalization of News for a Logistics Organisation by Finding Relevancy Using NLP
Rachit Garg, Arvind W Kiwelekar, Laxman D Netak, and Swapnil S Bhate

AI Model Compression for Edge Devices Using Optimization Techniques
Uday Kulkarni, S. M. Meena, Sunil V. Gurlahosur, Pratiksha Benagi, Atul Kashyap, Ayub Ansari, and Vinay Karnam

An Empirical Study and Analysis of Various Electroencephalography (EEG) Artefact Removal Methods
J. Vishwesh and P. Raviraj

Chatbot via Machine Learning and Deep Learning Hybrid
Basit Ali, Vadlamani Ravi, Chandra Bhushan, M. G. Santhosh, and O. Shiva Shankar

Prediction Intervals for Macroeconomic Variables Using LSTM Based LUBE Method
Vangala Sarveswararao and Vadlamani Ravi

Intelligent Character Recognition with Shared Features from Resnet
Gopireddy Vishnuvardhan, Vadlamani Ravi, and Maddula Santhosh

SVM and Naïve Bayes Models for Estimation of Key Process Variables in Nuclear Power Plant
S. Narasimhan and V. Rajendran

Face Recognition Using Transfer Learning on Facenet: Application to Banking Operations
Gopireddy Vishnuvardhan and Vadlamani Ravi

Deep Learning Chatbot with Feedback
Basit Ali and Vadlamani Ravi

A Non-monotonic Activation Function for Neural Networks Validated on Benchmark Tasks
Akash Mishra, Pravin Chandra, and Udayan Ghose

Analysis of Approaches for Automated Glaucoma Detection and Prediction System
Upasana Mishra and Jagdish Raikwal

Experiences in Machine Learning Models for Aircraft Fuel Flow and Drag Polar Predictions
Ramakrishnan Raman, Rajesh Chaubey, Surendra Goswami, and Radhakrishna Jella

Wildlife Video Captioning Based on ResNet and LSTM
Abid Kapadi, Chinmay Ram Kavimandan, Chinmay Sandeep Mandke, and Sangita Chaudhari

Enhancement of Degraded Images via Fuzzy Intensification Model
Shaik Fayaz Begum and P. Swathi

Design and Investigation of the Performance of Composite Spiral Antenna for Direction Finding Applications
K. Prasad and P. Kishore Kumar

Learning Based Approach for Subtle Maintenance in Large Institutions
Prakhar Lohumi, Sri Ram Khandelwal, Shryesh Khandelwal, and V. Simran

Temporal Localization of Topics Within Videos
Rajendran Rahul, R. Pradipkumar, and M. S. Geetha Devasena

Fast Training of Deep Networks with One-Class CNNs
Abdul Mueed Hafiz and Ghulam Mohiuddin Bhat

Recognition of Isolated English Words of E-Lecture Video Using Convolutional Neural Network
Uday Kulkarni, Chetan Rao, S. M. Meena, Sunil V. Gurlahosur, Pratiksha Benagi, and Sandeep Kulkarni

Indoor Object Location Finding Using UWB Technology
Jay A. Soni, Bharat R. Suthar, Jagdish M. Rathod, and Milendrakumar M. Solanki

Design and Tuning of Improved Current Predictive Control for PMSM
S. Sridhar, Md Junaid, and Narri Yadaiah

Parametric Analysis of Texture Classification Using Modified Weighted Probabilistic Neural Network (MWPNN)
M. Subba Rao and B. Eswara Reddy

Modeling and Simulation of Automatic Centralized Micro Grid Controller
M. Padma Lalitha, J. Jayakrishna, and P. Suresh Babu

The VLSI Realization of Sign-Magnitude Decimal Multiplication Efficiency
Reddipogula Chandra Babu and K. Sreenivasa Rao

Image Encryption Algorithms Using Machine Learning and Deep Learning Techniques—A Survey
T. Naga Lakshmi, S. Jyothi, and M. Rudra Kumar
Using CNN to Predict Regressive STD Drug Efficacy Score
Ambarish Moharil, Mansimran Singh Anand, Chirag Kedia, and Nikhil Sonavane

Abstract Drug safety and effectiveness have always been important concerns for pharmaceutical companies. Even after approval by drug regulatory authorities such as the FDA, certain drugs have variable effects on different people. Pharmaceutical companies and drug authorities regularly collect feedback from patients regarding the usage and side effects of drugs. This feedback can be used to determine the potency of a drug. This paper lays out an approach for determining drug efficacy that uses this feedback as one of the parameters. Drug feedback contains information about the side effects, the benefits and the overall experience of the user. Generally, when efficacy scores are predicted with machine learning algorithms, the feedback is converted into categorical values by analyzing its sentiment. This restricts one from extracting the semantic and syntactic information in patient feedback. Hence, this paper discusses an alternative approach that rates patient feedback on a regressive scale using Convolutional Neural Networks and then uses these linear review values to predict the efficacy of the drug. An NLP-based model is presented for determining the effectiveness of different drugs.

Keywords Natural Language Processing · Convolutional Neural Networks · STD Drugs · Deep Learning · Patient feedbacks

A. Moharil (B)
Department of Instrumentation and Control, Vishwakarma Institute of Technology, Pune, India
e-mail: [email protected]
M. S. Anand · C. Kedia
Department of Computer Science, Vellore Institute of Technology, Vellore, India
e-mail: [email protected]
C. Kedia
e-mail: [email protected]
N. Sonavane
Department of Electronics and Telecommunication, Vishwakarma Institute of Technology, Pune, India
e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2021
V. K. Gunjan and J. M. Zurada et al. (eds.), Modern Approaches in Machine Learning and Cognitive Science: A Walkthrough, Studies in Computational Intelligence 956, https://doi.org/10.1007/978-3-030-68291-0_1
1 Introduction
[1] The safety and effectiveness of pharmaceutical products and drugs rely solely on the results of clinical trials and the protocols in place. The studies and trials performed to observe the effect of a drug before its commercial release are run under highly standardized conditions, with a small number of subjects and a limited time span, and so might not reveal the full range of reactions to the drug. Hence some adverse drug reactions (ADRs) might pose a risk to the life of the drug user. To avoid such scenarios, pharmacovigilance (the surveillance of drug reactions once the drug is released in the market) plays a very important role. Furthermore, the Clinical Decision Support System (CDSS), which supports diagnosis and in turn suggests a remedial decision, plays a very important role in the healthcare system. CDSS can be classified into two categories, knowledge based CDSS and non-knowledge based CDSS.
(1) Knowledge based CDSS: This type consists of three parts: the knowledge base, the inference (or conclusion) engine, and a mechanism to convey the result. When the symptoms currently faced are entered into the system, it applies “IF–THEN” logic with the help of the knowledge base to output the treatment needed to overcome the symptoms.
(2) Non-knowledge based CDSS: These systems use artificial intelligence and machine learning: the computer runs on past patient data and past experiences, tries to uncover a pattern in the data, and uses that pattern to suggest the treatment needed.
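The three parts of a knowledge based CDSS can be illustrated with a minimal Python sketch. The rules, symptom names and suggested actions below are hypothetical placeholders chosen only to show the IF–THEN mechanism; they are not taken from any real clinical system.

```python
# Toy knowledge based CDSS: knowledge base + inference engine + result reporting.
# All rules are hypothetical placeholders, not medical guidance.

# Knowledge base: each rule is (set of required symptoms, suggested action).
KNOWLEDGE_BASE = [
    ({"fever", "cough"}, "suggest: respiratory infection work-up"),
    ({"rash", "itching"}, "suggest: allergy assessment"),
]


def infer(symptoms):
    """Inference engine: fire every rule whose IF-part is fully satisfied."""
    observed = set(symptoms)
    return [action for required, action in KNOWLEDGE_BASE if required <= observed]


def report(symptoms):
    """Communication mechanism: turn the fired rules into a readable result."""
    actions = infer(symptoms)
    return "; ".join(actions) if actions else "no rule matched"


print(report(["fever", "cough", "headache"]))
```

A non-knowledge based system would replace the hand-written `KNOWLEDGE_BASE` with a model fitted to past patient data.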
[2] In the twenty-first century, with the rise of Web 2.0 platforms, copious amounts of user data are generated. This volume of data cannot be understood manually; we need to train computers to understand the meaning of the data, its semantics, what it is trying to communicate and, most precisely, its structure [3–5]. Therefore, in the past decade a lot of research has gone into understanding the semantics of data, and a number of algorithms have been developed for sentiment identification (or sentiment classification) of user-generated data. Semantic analysis is considered one of the most important tasks in natural language processing. It interprets the meaning and structure of text through sentence segmentation, tokenization, stop-word removal, stemming and lemmatization, and then uses word2vec to map each word to a vector [6, 7]. A word can be converted into a vector with approaches such as the Continuous Bag of Words method or the Skip-Gram model. Once the words are converted to vectors, machine learning or artificial intelligence algorithms are trained on the vectors and then tested to interpret the meaning of the sample dataset.
[8] The most popular algorithms for sentiment analysis include Naïve Bayes and Support Vector Machines. They take text as input and output a positive, neutral or negative sentiment. In our paper, to overcome the difficulty of understanding the effectiveness of a drug once it is released in the market, we analyze the reviews users have given after consuming the drug. Using sentiment analysis, we map each user review to a review score that ranges from 1 to 10, and then use other attributes such as drug name, number of times prescribed and drug approver to find a drug effectiveness score for every drug.
The rest of the paper is organized as follows: Sect. 2, Use of Convolutional Neural Network to understand the semantics of drug reviews; Sect. 3, Advantages and Limitations; Sect. 4, Methodology: the exact steps used by the algorithm, from data collection to presentation of the results; Sect. 5, Architecture: a flow chart of the steps our algorithm takes; Sect. 6, Dataset: a brief overview of the dataset used; Sects. 7 and 8, Result and Conclusions.
2 Use of CNN for Sentiment Analysis
A very common approach to analyzing the sentiment of text is to use Naïve Bayes, Support Vector Machines, maximum entropy models, decision trees, etc. For our analysis, however, we employed a convolutional neural network. Neural networks of this kind are generally used for image classification or image detection. There, an image is converted to a two-dimensional matrix of pixel values ranging from 0 to 255 and passed under a kernel. Each kernel (or filter) has a specific purpose: if a kernel is used for edge detection, then the output image (matrix) obtained after convolving with the kernel will clearly highlight those edges. Similarly, kernels can be used to identify faces, corners, etc. in images. Using similar logic, a CNN can be applied to text by converting every word into matrix form: each word is represented as a one-dimensional matrix (a vector) on which a kernel can be run for further processing, just as it was on the matrix of pixels. The advantage of this approach is that it is sensitive to sentence structure: if the structure of a sentence changes, the sentence may end up conveying a different meaning, and such changes might not be picked up by the traditional algorithms. Hence, we decided to use a convolutional neural network for sentiment analysis.
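The kernel idea described above can be sketched in one dimension with plain Python. The pixel row and the difference kernel are illustrative assumptions, chosen so that the filter responds only where the intensity jumps.

```python
def conv1d(signal, kernel):
    """Valid (no padding) 1-D cross-correlation, the operation used in CNN layers."""
    k = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(k))
        for i in range(len(signal) - k + 1)
    ]


# One row of pixel intensities (0-255) with a sharp step in the middle.
row = [10, 10, 10, 200, 200, 200]
edge_kernel = [-1, 1]  # simple difference filter acting as an edge detector

response = conv1d(row, edge_kernel)
print(response)  # non-zero only at the position of the step
```

The same sliding-window computation applies when the input is a sequence of word vectors instead of pixels.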
3 Advantages and Limitations
The proposed architecture is helpful in predicting the base score of a particular drug. As patient reviews play a very important role in this task, extracting information from these reviews is vital for many pharmaceutical organizations. The methodology proposed in this paper not only helps in extracting semantic as well as syntactic information from these reviews but also rates them on a linear scale rather than a categorical one. This gives a mathematical metric for analyzing the reviews based on their base scores. By comparing the scores of two reviews, one can analyze and interpret the impact of one piece of feedback relative to another. Conventional models perform sentiment analysis of these reviews and rate them categorically. This results in loss of information and makes the reviews difficult to interpret. For example, if two reviews are both rated ‘1’ (positive) by a sentiment analyzer, it is not possible to answer questions like “How much more positive is one review than the other?” or “Is the impact of one review greater than that of the other, and in what sense does it differ?”. When the reviews are projected on a linear scale (1–10 in this paper), one can compare two reviews and state that the one with the higher score has more impact than the other. The semantic information exploited by the CNN and then projected on a linear scale based on the base score assigns a proper mathematical metric to patient feedback, which conventional methods fail to give. The main limitation of the architecture and methodology is that a curated dataset of reviews and drug base scores is needed to implement it. Also, using many layers in the fully connected part of the CNN might result in overfitting. We extend the use of CNNs to predicting drug efficacy and obtain a high accuracy of 96.26%, as reported in Sect. 7.
4 Methodology
[9] In this paper we discuss an extensive methodology for scoring the reviews received from patients regarding the respective drugs. Patient reviews are extremely important and need to be analyzed in order to predict the base score of a drug. These reviews are in text format and must be converted into numerical data that can be interpreted by the machine. This is done using Natural Language Processing. To convert the patient reviews into numerical form, we can either classify them into distinct bins and get a categorical or binary output, or rate them on a certain scale. To decide between the two, we look at the base score attribute present in the dataset. The base score of a drug varies on a scale from 1 to 10 and indicates the efficacy of the drug. Converting the feedback into binary values (1’s or 0’s) through sentiment analysis is one option, but the base score is too fine-grained for such binaries to carry enough information, and this would result in a higher error. The base score column has an average sensitivity of $10^{-4}$, and hence assigning reviews a positive or negative (1 or 0) value would neglect a lot of the information in them. For instance, a review with a base score of 6.26 would be classified as positive, i.e. 1, and a review with a base score of 8.965 would also be classified as 1. This labelling is too coarse and prevents us from leveraging the syntactic information present in the patient feedback. So, we found a way to make the model learn the semantic and syntactic structure of the review and then score the reviews with a regressor, using the base score as the target variable. This scoring is done on a scale of 1–10, the same as the base score. We extracted the features from the CNN and then, based on the target variable, projected a score on a linear scale of 1–10. The procedure is divided into the following steps.
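Before turning to the individual steps, the information loss caused by binary labelling can be made concrete with a few lines of Python. The two base scores are the ones quoted above; the threshold of 5.0 is an assumed cut-off used only for illustration.

```python
def to_binary_label(base_score, threshold=5.0):
    """Collapse a 1-10 score into positive (1) or negative (0).

    The threshold is an assumption for this sketch; the text does not state one.
    """
    return 1 if base_score > threshold else 0


scores = [6.26, 8.965]  # the two base scores quoted in the text
labels = [to_binary_label(s) for s in scores]

gap_on_linear_scale = round(scores[1] - scores[0], 3)
gap_after_binarization = labels[1] - labels[0]

print(labels, gap_on_linear_scale, gap_after_binarization)
```

Both reviews receive the same label, so the difference between them, which is still visible on the linear scale, disappears after binarization.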
4.1 Preprocessing
Stop-words Removal: The first step is to preprocess the input data. The data in the dataset was scraped by its creators from different websites and contained HTML tags, which had to be removed. Also, as a crucial step in NLP, we removed all punctuation marks, prepositions, articles, etc. through the curated stop-words list in the spaCy library.
Stemming: The second step in preprocessing the data was to perform stemming or lemmatization on the patient reviews. Words are often present in sentences in different tenses and forms, with several suffixes and prefixes; this is called inflection. The degree of inflection may vary across the dataset, and hence it becomes important to convert all words into their root form. For example, “Played” and “Playing” carry the same semantic sense, and the machine must understand that their meaning is the same. So, we convert them into their root form “Play” by removing the suffixes. This process is called stemming.
Lemmatization: Lemmatization has more to do with the morphology and semantic sense of the statement. It also aims to reduce a word to its root form, the “lemma”, but it takes the morphological sense of the word into account before doing so. For example, for the word “saw”, stemming might return just “s”, while lemmatization converts the word into its base root or morpheme, i.e. “see”. Hence, because of its semantic understanding, we used lemmatization rather than standard stemming operations, which consist of removing prefixes and suffixes. All lemmatization operations were done using spaCy.
Tokenization: Tokenization is the process of cutting a sentence into distinct pieces, or tokens. It is considered the base step for stemming and lemmatization. Tokenization is important for detecting word boundaries, i.e. the end of one word and the beginning of the next. It creates independent tokens, which are converted to vectors in the further steps of semantic computation.
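The preprocessing steps can be sketched with the Python standard library alone. This is a simplified stand-in, not the spaCy pipeline used in the paper: the stop-word set, the suffix rules and the lemma lookup table are small hypothetical examples.

```python
import re

# Tiny illustrative stop-word set; the paper used spaCy's curated list instead.
STOP_WORDS = {"the", "a", "an", "is", "was", "of", "to", "and", "in", "it", "i"}

# Hypothetical lookup table standing in for a real lemmatizer.
LEMMA_TABLE = {"saw": "see", "played": "play", "playing": "play"}


def strip_html(text):
    """Remove HTML tags left over from web scraping."""
    return re.sub(r"<[^>]+>", " ", text)


def tokenize(text):
    """Lower-case the text and split it into alphabetic tokens (drops punctuation)."""
    return re.findall(r"[a-z]+", text.lower())


def naive_stem(token):
    """Crude suffix stripping; real stemmers apply many more rules."""
    for suffix in ("ing", "ed", "s"):
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token


def lemmatize(token):
    """Dictionary lookup; falls back to the token itself when it is unknown."""
    return LEMMA_TABLE.get(token, token)


def preprocess(review):
    """HTML removal -> tokenization -> stop-word removal -> lemmatization."""
    tokens = tokenize(strip_html(review))
    return [lemmatize(t) for t in tokens if t not in STOP_WORDS]


print(preprocess("<p>I saw the doctor and played outside.</p>"))
```

Suffix stripping leaves an irregular form such as "saw" untouched, whereas the lookup-based lemmatizer maps it to "see", which is the reason given above for preferring lemmatization.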
4.2 Word Embeddings
To analyze the semantics of a statement we need a mathematical metric that represents the degree of similarity between words. Such representations can be obtained with the help of word embeddings. Word embedding is a technique in which the words in a text are represented as real-valued vectors in a predefined vector space. These vectors often have hundreds or even thousands of dimensions. We used the word2vec model for word embeddings. word2vec uses the concept of cosine similarity to map similar words closer to each other in the defined vector space:
$$\cos\theta = \frac{\mathbf{a}\cdot\mathbf{b}}{\lVert\mathbf{a}\rVert\,\lVert\mathbf{b}\rVert} = \frac{\sum_{i=1}^{n} a_i b_i}{\sqrt{\sum_{i=1}^{n} a_i^2}\,\sqrt{\sum_{i=1}^{n} b_i^2}} \tag{1}$$
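The cosine similarity of Eq. (1) can be computed directly from its definition. The sketch below uses only the standard library; the example vectors are arbitrary and stand in for word embeddings.

```python
import math


def cosine_similarity(a, b):
    """cos(theta) = (a . b) / (||a|| * ||b||) for two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


print(cosine_similarity([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))  # parallel vectors
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))            # orthogonal vectors
```

Values close to 1 indicate vectors pointing in nearly the same direction, values near 0 indicate unrelated directions, and values close to -1 indicate opposite directions.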
Word2vec takes text as input and returns word vectors as output. There are two methods for converting a text corpus into output vectors with word2vec: CBOW (Continuous Bag of Words) and the Skip-Gram model. Word2vec provides the flexibility to use either of them. A popular way to initialize word vectors is to use those obtained from an unsupervised neural model, which helps to increase the precision and accuracy of the model. Hence, we used the publicly available word2vec vectors pre-trained with the CBOW method on the Google News data, which consists of about 100 billion words; the vectors have a dimensionality of 300.
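The difference between CBOW and Skip-Gram lies in how training examples are formed from a context window. The following sketch only builds those examples from a token list (it does not train any vectors); the window size and the three-word sentence are assumptions for illustration.

```python
def context_windows(tokens, window=1):
    """Yield (context_words, target_word) for every position in the sentence."""
    for i, target in enumerate(tokens):
        left = tokens[max(0, i - window): i]
        right = tokens[i + 1: i + 1 + window]
        yield left + right, target


def cbow_examples(tokens, window=1):
    """CBOW: predict the target word from its whole context."""
    return [(tuple(ctx), tgt) for ctx, tgt in context_windows(tokens, window)]


def skipgram_examples(tokens, window=1):
    """Skip-Gram: predict each context word from the target word."""
    return [(tgt, c) for ctx, tgt in context_windows(tokens, window) for c in ctx]


tokens = ["drug", "reduced", "pain"]
print(cbow_examples(tokens))
print(skipgram_examples(tokens))
```

CBOW produces one example per position with the context as input, while Skip-Gram produces one example per (target, context word) pair.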
4.3 Convolutional Neural Network
The CNN (Convolutional Neural Network) consists of different layers: the convolutional layer, the pooling layer and the flattening layer, with a fully connected layer at the end. The convolutional layer is an end-to-end feature extractor used to extract features from the input. The effectiveness of CNNs in classifying images is well known, and they have been widely used in various image processing applications. Our CNN model is inspired by the Yoon Kim model. We used the CNN to understand the structure of the input sentence and to provide a score based on training on the different features extracted from the input. We feed the tokens embedded with the word2vec model into the CNN to extract different features from them. This is a novel approach and solved most of our problem. We took the words and embedded them according to the pre-trained vectors from the Google News model. The tokens, replaced by their respective vector values, were passed to the CNN as the input. For example, consider the sentence “The day is very good” as input. If the values corresponding to its words are 1, 2, 3, 4 and 5 respectively, we get a five-element input array [1, 2, 3, 4, 5]. To ensure that all input arrays are of the same size, we apply padding to the input. If MAX_INPUT_SEQUENCE_LENGTH = 7, then after padding the sentence vector looks like [0, 0, 1, 2, 3, 4, 5] (Fig. 1). As shown in Fig. 1, let $Z_i \in \mathbb{R}^k$ be the $k$-dimensional word vector for the $i$th word in the input sequence. Our input sentence of length $m$ (after padding the sequence) is then represented as
$$Z_{1:m} = Z_1 * Z_2 * \cdots * Z_m \tag{2}$$
Fig. 1 CNN Architecture for an example sentence. Source arXiv:1408.5882
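where $*$ joins the word vectors into the sentence representation (a concatenation in the model of arXiv:1408.5882). Following this operation, a convolutional filter $w \in \mathbb{R}^{hk}$ was applied to a window of $h$ words to get the new feature maps:
$$T_i = f(w \cdot Z_{i:i+h-1} + b) \tag{3}$$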
where $b$ is the bias and $f$ is a non-linear activation function. Once these feature maps were obtained from the convolution layer, a pooling operation was performed. We used max pooling, which keeps the maximum value of each feature map; the idea is to capture the most important feature in each map, which reduced the loss. Because only one value per feature map is kept, max pooling also solves the problem of variable sentence length by collecting the maximum features into a matrix of fixed dimension. The pooled matrix is then flattened into a one-dimensional vector and fed as input to the fully connected layer. We devised a novel approach to scoring the features in the fully connected layers. As described earlier, classifying the input sentence into binary or multiple classes would create more problems than it solves. Using the ReLU (Rectified Linear Unit) activation, we took the weighted sum of all the inputs from the corresponding layers and, instead of applying an output function, used these weighted sums from the neurons as independent variables and fed them to a regressor with the base score as the target. As a result, we ranked the reviews on a scale of 1–10 instead of classifying them as 1’s or 0’s. Using the extracted features as independent variables gave a flexible method for ranking these reviews. In the final stage we replaced the patient reviews in the dataset with the corresponding “numeric_review_score” obtained as the output of the CNN. Our dataset was now completely numerical, with the patient reviews accurately scored on a scale of 10. This whole dataset was then split into training and testing data (75% train and 25% test) and passed through a Random Forest regressor with the base score as the target variable to predict the base score of a drug.
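The padding, convolution and max pooling steps can be traced on the example sentence with a small pure-Python sketch. It is a toy forward pass under stated assumptions, not the trained model: the two-dimensional embedding, the single filter `w` and the bias are hypothetical values, ReLU is assumed as the activation $f$ of Eq. (3), and the regressor stage is omitted.

```python
MAX_INPUT_SEQUENCE_LENGTH = 7  # value used in the padding example of the text


def pad_sequence(ids, max_len=MAX_INPUT_SEQUENCE_LENGTH, pad_id=0):
    """Left-pad with pad_id (keeping the last max_len ids if the input is longer)."""
    ids = list(ids)[-max_len:]
    return [pad_id] * (max_len - len(ids)) + ids


def embed(ids):
    """Hypothetical 2-dimensional embedding: id -> [id, id / 2].

    A real model would look up 300-dimensional word2vec vectors here.
    """
    return [[float(i), i / 2.0] for i in ids]


def relu(x):
    return max(0.0, x)


def feature_map(Z, w, b, h):
    """T_i = relu(w . Z_{i:i+h-1} + b) for each window of h word vectors."""
    T = []
    for i in range(len(Z) - h + 1):
        window = [value for vec in Z[i:i + h] for value in vec]  # h*k values
        T.append(relu(sum(wj * zj for wj, zj in zip(w, window)) + b))
    return T


def max_over_time(T):
    """Max pooling: one value per feature map, whatever the sentence length."""
    return max(T)


padded = pad_sequence([1, 2, 3, 4, 5])  # "The day is very good"
Z = embed(padded)                       # m = 7 vectors of dimension k = 2
w = [1.0, 0.0, 1.0, 0.0]                # one filter over h = 2 words (h*k weights)
T = feature_map(Z, w, b=-4.0, h=2)
pooled = max_over_time(T)
print(padded, T, pooled)
```

With several filters, the pooled values would form the fixed-length feature vector that is passed on to the fully connected layer and, in the approach described above, to the regressor.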
A. Moharil et al.
Fig. 2 Overview of procedure
5 Architecture
See Fig. 2.
6 Dataset
The dataset used in this project was obtained from an open-source online platform. Its data were scraped from several healthcare platforms such as nih.gov. It consists of curated patient feedback for various drugs, together with the date on which each drug was approved by the medicine department of the University of Illinois, Chicago (UIC). Each drug is listed with its side effects, patient feedback, effectiveness rating, UIC approval date and an efficacy score labelled as the base score. The patient reviews are the only relevant text data, and they were analyzed in the later stages of the architecture. The patient reviews were fed to the Convolutional Neural Network as inputs, with the base score as the target variable, to obtain a regressive score for each review similar to the base score. The final dataset fed to the Decision Tree Regressor consisted of the numeric outputs of the Convolutional Neural Network, which replaced the patients' text reviews. The final dataset and the initial dataset had the same dimensions: each consisted of 32,165 rows and 8 feature attributes (Fig. 3).
Using CNN to Predict Regressive STD Drug Efficacy Score
Fig. 3 Dataset used in the paper
7 Result
As described in the methodology, an important part of the data has to be decoded from its textual format. In simple terms, the patient reviews need to be understood by the system and rated against a criterion before the regressor can use them. This is done using Natural Language Processing, with the base scores from the data serving as the criterion that gives weights to the reviews. In addition to Natural Language Processing, we use a Convolutional Neural Network (CNN) to further improve the accuracy of the text analysis. The patient reviews are converted into numeric scores and added back to the database (Figs. 4, 5 and 6). The final accuracy of our CNN model in ranking the reviews from 1 to 10 is 97.39%, with 28,948 samples in the training dataset and 3217 in the validation dataset. The accuracy of the final model in predicting the base score of a drug with the Random Forest Regressor (after the CNN had converted the natural-language feedback to a linear scale) was 96.26% (Fig. 7).
8 Conclusion
As suggested in the previous sections, the main focus is on the patient reviews, which give us the most important information about a drug. Hence it was very important that the machine analyzed them accurately. Generally this would be done using Natural Language Processing alone, but to improve accuracy a Convolutional Neural Network was additionally used to understand the reviews. The CNN extracts the features from these reviews. We feed our trained tokens from the word2vec model into the CNN to extract different features from them. We took the words and embedded them according to the pre-trained vectors from the Google News model. We used a model antithetical to the general approach of understanding textual data. This is a novel approach and it solved most of our problem. Once this problem was tackled, the
Fig. 4 Output of converting the textual reviews into numeric scores
Fig. 5 Final dataset
Fig. 6 Training progress per epoch. The model was trained for 3 epochs
Fig. 7 Layer-by-layer summary of our CNN model
final task fell to the Decision Tree regressor, which was used over the entire data to predict the base score for our testing data.
References
1. Gräßer, F., Malberg, H., Zaunseder, S., Kallumadi, S.: Aspect based sentiment analysis of drug reviews applying cross-domain and cross-data learning. In: DH'18: 2018 International Digital Health Conference (2018)
2. Chen, H.-H., Chowdhury, G. (eds.): ICADL 2012, LNCS 7634, pp. 189–198 (2012)
3. Dos Santos, C., Gatti, M.: Deep convolutional neural networks for sentiment analysis of short texts. In: Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers, pp. 69–78 (2014, August)
4. Wang, J., Yu, L.C., Lai, K.R., Zhang, X.: Dimensional sentiment analysis using a regional CNN-LSTM model. In: Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (vol. 2: Short Papers), pp. 225–230 (2016, August)
5. Stuart, K.D., Majewski, M.: Intelligent opinion mining and sentiment analysis using artificial neural networks. In: International Conference on Neural Information Processing. Springer, Cham (2015, November)
6. Ain, Q.T., Ali, M., Riaz, A., Noureen, A., Kamran, M., Hayat, B., Rehman, A.: Sentiment analysis using deep learning techniques: a review. Int. J. Adv. Comput. Sci. Appl. 8(6), 424 (2017)
7. Severyn, A., Moschitti, A.: Twitter sentiment analysis with deep convolutional neural networks. In: Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 959–962 (2015, August)
8. Vijayaraghavan, S., Basu, D.: Sentiment analysis in drug reviews using supervised machine learning algorithms. arXiv preprint arXiv:2003.11643 (2020)
9. Kim, Y.: Convolutional neural networks for sentence classification. arXiv preprint arXiv:1408.5882 (2014)
10. Doan, S., Bastarache, L., Klimkowski, S., Denny, J.C., Xu, H.: Integrating existing natural language processing tools for medication extraction from discharge summaries. J. Am. Med. Inform. Assoc. 17(5), 528–531 (2010)
Emotion Recognition in Hindi Speech Using CNN-LSTM Model
B. Shashank, Bhavani Shankar, L. Chandresh, and R. Jayashree
Abstract Emotion recognition in Hindi speech is a vital capability in the upcoming machine-driven world. In this project, a simulated Hindi emotional speech database has been borrowed from a subset of the IITKGP-SEHSC dataset. We classify emotions into 4 classes: happy, sad, fear and anger. We use pitch, noise, and frequency as the features to determine the emotion. In this paper, we discuss the advantages of using the CNN-LSTM model for recognizing emotion in Hindi speech.
Keywords Emotion recognition · Librosa · MFCC · RMSPROP · Hybrid CNN-LSTM
1 Introduction
In the age of artificial intelligence and chatbots, capturing the emotion in speech holds immense significance. Since emotion reveals a person's state of mind, the earlier it is captured, the better one can respond to it. If we capture the emotion correctly, a proper response lets us engage the person in a better way. This matters especially for chatbots, call centers and other settings where the person's face is not visible: there, capturing emotion during the call is the only route to a better customer experience.
B. Shashank · B. Shankar · L. Chandresh (B) · R. Jayashree
Department of Computer Science and Engineering, PES University, Bengaluru, India
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2021
V. K. Gunjan and J. M. Zurada et al. (eds.), Modern Approaches in Machine Learning and Cognitive Science: A Walkthrough, Studies in Computational Intelligence 956, https://doi.org/10.1007/978-3-030-68291-0_2
B. Shashank et al.
Emotion in speech has two parts: one conveyed through the words or linguistic content (what was said) and the other through the tone of speech. Emotion can be recognized in both ways, and in particular through various acoustic features of human speech. In general, emotion describes feelings related to position, object and other uncertainties, which is why it is hard to define all real-life emotions. So, we took the basic human emotions as described by Paul Ekman: Anger, Sad, Happy and Surprise. Much research on English has been carried out using machine learning and deep learning techniques, but for Hindi and other Indian regional languages such as Kannada, Telugu, Tamil, Malayalam, Bengali and Tulu, research on emotion classification in speech is limited by the availability of datasets in each local language. This paper focuses on emotion recognition over Hindi speech signals for four different emotions: Happy, Sad, Anger and Fear. The paper is organized as follows: Section 2 gives background information on emotion recognition systems and their applications. Section 3 describes the dataset. Section 4 explains the methods of feature extraction and optimization from speech signals. Section 5 presents the models explored. Finally, Sect. 6 gives the results and conclusions.
2 Background
2.1 Emotion Recognition in Speech
Emotion Recognition is the study of emotions and of the methods used to analyze them. Emotions are often recognized from facial expressions and speech signals. Different techniques exist for visualizing and analyzing emotions, such as signal processing, neural networks, computer vision and machine learning. Emotion Recognition is being studied, developed and used all over the globe, and it is gaining attention and popularity in the research field because it is needed to solve many problems. Native-language Speech Emotion Recognition (SER) is used in fields such as call centers, where detecting emotion helps identify customer satisfaction. Emotion Recognition serves as a performance parameter for conversational analysis, identifying unsatisfied customers, customer satisfaction and so on. SER is also used in in-car on-board systems, where information on the mental state of the driver can be provided to the system to initiate safety measures and prevent accidents. A native-language Speech Emotion Recognition system can also be used for analyzing politicians' emotions and the voters' response to them. In this paper we have taken 4 speeches in Hindi for testing the built system.
2.2 Neural Networks
The neural network is one of the techniques used in deep learning. Neural networks have a wide range of applications in areas such as image recognition, text processing and emotion recognition.
2.3 Applications of Neural Networks
Emotion Recognition is used in call centers for classifying calls according to emotions. It serves as a performance parameter for conversational analysis, identifying unsatisfied customers, customer satisfaction and so on. SER is also used in in-car on-board systems, where information on the mental state of the driver can be provided to the system to initiate safety measures and prevent accidents.
2.4 MFCC
Mel Frequency Cepstral Coefficient is a very vital feature used in automatic emotion detection. In audio or sound processing, the mel frequency cepstrum (MFC) is a representation of the short-term power spectrum of an audio or sound. It is based on a linear cosine transform of a log power spectrum on a nonlinear mel scale of frequency. Mel-frequency cepstral coefficients (MFCCs) are coefficients that are derived from a type of cepstral representation of the audio or sound clip. The difference between the cepstrum and the mel-frequency cepstrum is that in the MFC, the frequency bands are equally spaced on the mel scale, which approximates the human auditory system's response more closely than the linearly spaced frequency bands used in the normal cepstrum. This frequency warping can allow for better representation of sound. MFCCs are commonly derived as follows:
1. Take the Fourier transform of a signal (of the required audio clip).
2. Map the powers of the spectrum obtained above onto the mel scale, using triangular overlapping windows.
3. Take the logs of the powers at each of the mel frequencies.
4. Take the discrete cosine transform of the list of mel log powers, as if it were a signal.
5. The MFCCs are the amplitudes of the resulting spectrum.
Raw wave and its Spectrogram for a sample audio clip (Fig. 1). Mel frequency Spectrogram of the audio clip (Fig. 2). MFCC Feature of the audio clip (Fig. 3).
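The five steps above can be traced in a short, dependency-free sketch. It is only an illustration under assumptions: it processes a single frame, uses a naive DFT, and the filter count, coefficient count, sampling rate and test tone are hypothetical choices. The mel conversion uses the common formula mel = 2595 log10(1 + f/700). It is not a substitute for the librosa routine used later in the paper.

```python
import cmath
import math

def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def power_spectrum(frame):
    """Step 1: naive DFT of one frame, keeping the non-negative frequency bins."""
    n = len(frame)
    spec = []
    for k in range(n // 2 + 1):
        s = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        spec.append(abs(s) ** 2 / n)
    return spec

def mel_filterbank(n_filters, n_fft, sr):
    """Step 2: triangular overlapping windows equally spaced on the mel scale."""
    top = hz_to_mel(sr / 2.0)
    hz_pts = [mel_to_hz(i * top / (n_filters + 1)) for i in range(n_filters + 2)]
    bin_hz = [k * sr / n_fft for k in range(n_fft // 2 + 1)]
    bank = []
    for j in range(1, n_filters + 1):
        lo, mid, hi = hz_pts[j - 1], hz_pts[j], hz_pts[j + 1]
        row = []
        for f in bin_hz:
            if lo < f <= mid:
                row.append((f - lo) / (mid - lo))   # rising edge of the triangle
            elif mid < f < hi:
                row.append((hi - f) / (hi - mid))   # falling edge of the triangle
            else:
                row.append(0.0)
        bank.append(row)
    return bank

def mfcc(frame, sr, n_filters=10, n_coeffs=5):
    spec = power_spectrum(frame)
    bank = mel_filterbank(n_filters, len(frame), sr)
    # Step 3: log of each mel-band energy (small floor avoids log(0)).
    log_e = [math.log(sum(w * p for w, p in zip(row, spec)) + 1e-10) for row in bank]
    # Steps 4-5: DCT-II of the log energies; its amplitudes are the MFCCs.
    return [sum(e * math.cos(math.pi * c * (m + 0.5) / n_filters)
                for m, e in enumerate(log_e))
            for c in range(n_coeffs)]

# Hypothetical input: a 440 Hz tone sampled at 8 kHz, one 128-sample frame.
sr = 8000
frame = [math.sin(2 * math.pi * 440 * t / sr) for t in range(128)]
coeffs = mfcc(frame, sr)
print(len(coeffs))
```

In practice the signal is split into many short overlapping frames and windowed before step 1, so the result is a matrix with one column of coefficients per frame.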
Fig. 1 Raw waveform with Spectrogram
Fig. 2 Mel frequency Spectrogram
Fig. 3 MFCC representation
3 Dataset
The dataset used for this work is borrowed from a subset of the IITKGP-SEHSC dataset, a Hindi audio speech corpus. In this dataset, 15 sentences are spoken in 8 different emotions, in 10 sessions each, by 10 actors. In this paper, we use the data of 4 actors. The challenge in using this dataset is that it is a simulated dataset created using actors. The limited amount of data might also lead to lower accuracy. Within any language it is common to find different accents and slightly different ways of expressing an emotion, which might also affect the model's output on real-time input from real-world speech (Fig. 4).
4 Feature Extraction and Preprocessing
To extract features, a Python library called "Librosa" is used. It is a library for music and audio file analysis. As discussed earlier, MFCC is a very vital feature for detecting emotion. When a word is pronounced in a certain way, the shape of our vocal cords, tongue, teeth, etc. helps us make that sound with the required emotion. The feature that helps to determine this is the Mel Frequency Cepstral Coefficient (MFCC). MFCCs can be obtained using the librosa function "librosa.feature.mfcc()". As part of preprocessing, we tune the noise by adding some white noise and extract MFCC features. We also tune the pitch by shifting it and extract MFCC features. Combining all of these lets us train the model with many variants of each audio clip.
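The augmentation idea (one clean clip, one noisy variant, one pitch-shifted variant) can be sketched without any audio library. The function names, the noise factor and the two-semitone shift below are assumptions made for illustration. The naive resampling shown here also changes the clip length, which a proper pitch shifter (for example a phase-vocoder based routine) would avoid.

```python
import math
import random

def add_white_noise(signal, noise_factor, rng):
    """Add zero-mean Gaussian noise scaled relative to the peak amplitude."""
    peak = max(abs(s) for s in signal)
    return [s + noise_factor * peak * rng.gauss(0.0, 1.0) for s in signal]

def naive_pitch_shift(signal, semitones):
    """Resample by linear interpolation: raises the pitch but shortens the clip."""
    rate = 2.0 ** (semitones / 12.0)
    n_out = int(len(signal) / rate)
    out = []
    for i in range(n_out):
        pos = i * rate
        lo = int(pos)
        hi = min(lo + 1, len(signal) - 1)
        frac = pos - lo
        out.append((1.0 - frac) * signal[lo] + frac * signal[hi])
    return out

# Hypothetical clip: 100 samples of a 200 Hz tone at 8 kHz.
rng = random.Random(0)
clip = [math.sin(2 * math.pi * 200 * t / 8000) for t in range(100)]
noisy = add_white_noise(clip, noise_factor=0.05, rng=rng)
shifted = naive_pitch_shift(clip, semitones=2)

# Each variant would then go through MFCC extraction and be added to the training set.
variants = [clip, noisy, shifted]
print(len(variants), len(noisy), len(shifted))
```

Extracting features from every variant multiplies the number of training examples per recording, which is the purpose of the preprocessing described above.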
Fig. 4 Dataset structure
Raw wave before preprocessing (Fig. 5). Raw wave after Preprocessing (Fig. 6).
Fig. 5 Raw wave before preprocessing
Fig. 6 Raw wave after preprocessing
5 Proposed Methodology
5.1 CNN Model
Convolutional Neural Networks (CNNs) are a class of deep neural networks. CNNs use relatively little pre-processing compared to other classification algorithms. A CNN consists of an input layer and an output layer, as well as multiple hidden layers. The hidden layers, in turn, consist of a series of convolutional layers that convolve their input using a dot product.
5.2 ANN Model
Artificial Neural Networks are computing systems inspired by biological neural networks. Generally, ANNs learn to perform tasks by considering examples, without being programmed for task-specific rules. ANN is based on a collection of connected nodes (artificial neurons). Each connection can transmit a signal to other neurons. The node that receives a signal then processes it.
Fig. 7 CNN-LSTM model
5.3 CNN-LSTM Model
An LSTM has a forget gate to discard information and a cell state to store sequential information, so patterns in the sequence are recognized and retained. Because a CNN can produce different levels of abstraction for features, it can be used in the initial layers to extract high-level features and feed the right input to the later LSTM layers. This architecture has an edge over the other two models mentioned, as it combines the advantages of the CNN model and the LSTM model. This can be seen in the results section. The architecture used for this experiment is shown in Fig. 7. It consists of 3 CNN layers followed by 4 LSTM layers, with a dense layer at the end and SoftMax as the activation function for the output layer.
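To make the roles of the forget gate, the cell state and the final SoftMax concrete, here is a scalar toy version of one LSTM cell followed by a four-way dense layer. Everything numeric in it (gate weights, the stand-in CNN feature sequence, the dense weights) is hypothetical, and it is far smaller than the 3-CNN/4-LSTM stack of Fig. 7. It is a sketch of the mechanism only.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def lstm_step(x, h_prev, c_prev, p):
    """One step of a scalar LSTM cell; p maps gate name -> (w_x, w_h, bias)."""
    def gate(name, act):
        w_x, w_h, b = p[name]
        return act(w_x * x + w_h * h_prev + b)
    f = gate("forget", sigmoid)        # how much of the old cell state to keep
    i = gate("input", sigmoid)         # how much new information to write
    o = gate("output", sigmoid)        # how much of the cell state to expose
    g = gate("candidate", math.tanh)   # candidate value to write
    c = f * c_prev + i * g             # updated cell state (the memory)
    h = o * math.tanh(c)               # updated hidden state
    return h, c

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical weights and a stand-in for the feature sequence produced by the CNN layers.
params = {"forget": (0.5, 0.1, 0.0), "input": (0.8, -0.2, 0.0),
          "output": (0.3, 0.4, 0.0), "candidate": (1.0, 0.5, 0.0)}
cnn_features = [0.2, 0.9, -0.4, 0.7]

h, c = 0.0, 0.0
for x in cnn_features:
    h, c = lstm_step(x, h, c, params)

# Dense layer with one (weight, bias) pair per emotion, followed by SoftMax.
labels = ["happy", "sad", "fear", "anger"]
dense = [(1.2, 0.0), (-0.7, 0.1), (0.4, -0.2), (-1.0, 0.3)]
probs = softmax([w * h + b for w, b in dense])
print(dict(zip(labels, probs)))
```

When the forget gate is driven towards zero the previous cell state is wiped out, and when it is near one the state is carried forward unchanged. This is how the cell decides which parts of the sequence to remember.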
6 Results and Conclusion
In this work we have created an emotion recognition model using a hybrid CNN-LSTM model. When tested against the test data (i.e. the 20% of the dataset that was kept aside for testing), the model gives an accuracy in the range of 75–80%, with an F1 score of 75–80. In the confusion matrix, the diagonal values represent the correctly classified emotions; the remaining values are misclassifications (Table 1). When an audio file of more than 5 seconds is passed to our model, the number of features extracted increases, leading to a mismatch between the input shape and the shape the model expects. So, audio clips of longer duration are split into chunks that are passed to the model one by one, and the predicted emotion with the highest percentage is given as the output. Two such examples are discussed below. Mrs. Smriti Irani's parliament discussion on the topic "Politicizing Rape" was fed as input to the model, and the results are shown in Fig. 8.
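The chunk-and-aggregate procedure for long clips can be sketched as follows. The helper names and `toy_classifier` are hypothetical stand-ins (the real system would call the trained CNN-LSTM on MFCC features), and zero-padding the last chunk is an assumption, since the text does not say how a short final chunk is handled.

```python
from collections import Counter

def split_into_chunks(samples, sample_rate, chunk_seconds=5):
    """Cut a long clip into fixed-length chunks so each matches the model's input shape."""
    size = sample_rate * chunk_seconds
    chunks = [samples[i:i + size] for i in range(0, len(samples), size)]
    if chunks and len(chunks[-1]) < size:
        # Assumption: pad the final chunk with silence rather than dropping it.
        chunks[-1] = chunks[-1] + [0.0] * (size - len(chunks[-1]))
    return chunks

def summarize(predictions):
    """Return the majority emotion and the percentage share of every predicted emotion."""
    counts = Counter(predictions)
    total = len(predictions)
    shares = {label: 100.0 * n / total for label, n in counts.items()}
    top = max(shares, key=shares.get)
    return top, shares

def toy_classifier(chunk):
    # Hypothetical stand-in for the trained model: loud chunks count as "anger".
    energy = sum(abs(s) for s in chunk) / len(chunk)
    return "anger" if energy > 0.5 else "sad"

# Hypothetical 12-second clip at a toy sampling rate of 10 Hz.
clip = [0.9] * 100 + [0.1] * 20
chunks = split_into_chunks(clip, sample_rate=10)
top, shares = summarize([toy_classifier(ch) for ch in chunks])
print(top, shares)
```

The percentage shares computed this way correspond to the per-emotion percentages reported for each speech, with the largest share taken as the clip-level prediction.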
Table 1 Confusion matrix of test result
         Anger   Fear   Happy   Sad
Anger    302     28     23      7
Fear     17      275    40      28
Happy    30      15     294     21
Sad      7       24     49      280
Accuracy: 79.9%
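The reported accuracy can be checked directly from the confusion matrix. The sketch below assumes the flattened Table 1 reads row by row as Anger, Fear, Happy, Sad, which makes every row sum to 360 test clips. The per-class F1 values computed here are derived quantities, not figures reported by the authors. Because F1 equals 2·TP / (row sum + column sum), it does not depend on whether rows are the true or the predicted labels.

```python
labels = ["Anger", "Fear", "Happy", "Sad"]
# Assumed row-wise reading of Table 1.
cm = [
    [302, 28, 23, 7],
    [17, 275, 40, 28],
    [30, 15, 294, 21],
    [7, 24, 49, 280],
]

total = sum(sum(row) for row in cm)
correct = sum(cm[i][i] for i in range(len(cm)))
accuracy = correct / total

def f1_scores(matrix):
    """Per-class F1 = 2*TP / (row sum + column sum)."""
    n = len(matrix)
    scores = []
    for i in range(n):
        tp = matrix[i][i]
        row_sum = sum(matrix[i])
        col_sum = sum(matrix[r][i] for r in range(n))
        scores.append(2.0 * tp / (row_sum + col_sum))
    return scores

per_class_f1 = f1_scores(cm)
macro_f1 = sum(per_class_f1) / len(per_class_f1)

print(f"accuracy = {accuracy:.1%}")  # accuracy = 79.9%
for name, score in zip(labels, per_class_f1):
    print(name, round(score, 3))
print("macro F1 =", round(macro_f1, 3))
```

The diagonal sums to 1151 out of 1440 clips, which reproduces the 79.9% accuracy stated under the table.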
Fig. 8 Emotion analysis of Mrs. Smriti Irani’s speech
The model was tested against an audio clip of our Hon'ble Prime Minister Shri. Narendra Modi's "Mann Ki Baat". The prediction made by the model is shown in Fig. 9. For some of the inputs, the model's output has been summarized in Table 2. In the audio clip titled "Mann Ki Baat", Hon'ble Prime Minister Shri. Narendra Modi speaks in a monotonic way, disappointed about the plight of the farmers. Here the model has classified 48% of the speech as sad, which is the majority.
Fig. 9 Emotion analysis of Mann Ki Baat
Table 2 Summary of the emotion analysis of speech
Speech title                                                   Happy (%)   Anger (%)   Sad (%)   Fear (%)
Mann Ki Baat (Shri. Narendra Modi)                             34.00       6.88        48.17     10.93
Smriti Irani's speech on politicizing rape                     20.83       66.66       8.33      4.16
Customer care (JIO customer on removal of free data plan)      33.33       52.77       4.16      9.72
Pulwama attack (PM speech)                                     61.53       17.94       0.00      20.51
In the audio clip titled "Smriti Irani's speech on politicizing rape", Smriti Irani is angry throughout the clip because the opposition is politicizing the rape. The model has classified 66.66% of the total speech as angry. Similarly, the outputs for the audio clips "Customer care (JIO customer on removal of free data plan)" and "Pulwama attack (PM speech)" are given in Table 2.
Refinery Profit Planning via Evolutionary Many-Objective Optimization
Vadlamani Madhav, Shaik Tanveer-Ul Huq, and Vadlamani Ravi
Abstract Evolutionary multi-objective optimization (EMO) has found applications in all fields of science and engineering, and the chemical engineering discipline is no exception. Literature abounds on EMO, with a variety of algorithms proposed by a few dedicated researchers. The Nondominated Sorting Genetic Algorithm III (NSGA-III) is the latest addition to the family of EMO algorithms. NSGA-III claims to have solved multi- and many-objective optimization problems with up to 15 objective functions. On the other hand, during the last 2 decades, chemical engineering has witnessed many applications of multi-objective optimization algorithms such as NSGA-II. In a first-of-its-kind study, this paper exploits the power and versatility of NSGA-III to solve a four-objective optimization problem occurring in refinery profit planning. NSGA-III is eminently suitable for this class of problems. We applied NSGA-III to this problem and obtained the full set of Pareto solutions for the four-objective problem. We also observed that they are dominated solutions when compared to the FNLGP and others. The ratio HV/IGD was proposed to measure the quality of the solutions obtained in a run. It can be applied to solve other many-objective optimization problems in chemical engineering.
Keywords Evolutionary Computation · Refinery profit planning · Pareto solutions · Many-objective optimization · NSGA-III
S. T.-U. Huq · V. Ravi (B)
Institute for Development and Research in Banking Technology, Center of Excellence in Analytics, Castle Hills Road #1, Masab Tank, Hyderabad 500057, India
V. Madhav
Department of Electrical Engineering, Indian Institute of Technology Bombay, Powai, Mumbai 400076, India
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2021
V. K. Gunjan and J. M. Zurada et al. (eds.), Modern Approaches in Machine Learning and Cognitive Science: A Walkthrough, Studies in Computational Intelligence 956, https://doi.org/10.1007/978-3-030-68291-0_3
V. Madhav et al.
1 Introduction
Optimization, either linear or nonlinear, has made deep inroads into chemical engineering. To mention a few studies, Allen [1] and Seinfeld and McBride [9] were among the first to apply linear programming and nonlinear programming, respectively, to optimizing refinery operations. It is noteworthy that Seinfeld and McBride [9] employed a scalar optimization approach. Later, Ravi and Reddy [2] and Ravi et al. [3] proposed fuzzy linear fractional programming and fuzzy nonlinear goal programming, respectively, for the same problems and obtained superior results. Evolutionary computing (EC), in both single-objective and multi-objective environments, has found numerous applications in chemical engineering, primarily because many problems involve optimization of cost, profit, yield, design, or scaling up of processes from the lab set-up, and because EC methods yield a collection, or population, of solutions at convergence. Evolutionary Multi-objective Optimization (EMO) methods such as NSGA-II [4] gained popularity in all engineering disciplines. However, NSGA-II can only solve problems with up to 2 objective functions. Therefore, Deb and Jain [5] proposed NSGA-III for solving problems with three to 15 objective functions. Reddy et al. [6] and Hemalatha et al. [7] are among the researchers who embraced EC methods and reported excellent results. Rangaiah et al. [8] is the most recent seminal paper; it systematically reviews various bi- and multi-objective optimization problems that were solved by employing EC methods. To summarize, chemical engineers found the EC methods attractive because they offer a palette of solutions to choose from. To the best of our knowledge, NSGA-III has not yet been employed in the chemical engineering area.

In this paper, in a first-of-its-kind study, we solve the refinery profit optimization problem of Seinfeld and McBride [9] in a pure multi-objective environment by employing the NSGA-III algorithm, which allows simultaneous optimization of all the objective functions without resorting to scalar optimization. Seinfeld and McBride [9] solved it in a bi-objective environment (taking 2 of the 4 objectives at a time) using the scalarization method, which yielded only one Pareto solution. Later, it was solved by Ravi et al. [3] in a bi-objective environment (taking 2 objectives at a time) using the fuzzy nonlinear goal programming (FNLGP) approach. Then, Suman [10] applied multi-objective simulated annealing (taking 2 objectives at a time) and obtained another Pareto solution. Despite yielding a few of the Pareto optimal solutions, the approach of Ravi et al. [3] cannot guarantee all Pareto solutions. We adopted the ratio of HV to IGD to rank-order the solutions, which number the population size times the number of runs. This is akin to the empirical attainment function plot normally used in a bi-objective environment. The rest of the paper is organized as follows: Sect. 2 presents an overview of NSGA-III. Section 3 presents the problem formulation. Section 4 presents results and discussion, while Sect. 5 presents conclusions.
2 Overview of NSGA-III
Non-dominated sorting genetic algorithm III (NSGA-III) [5] is a multi- and many-objective optimization algorithm used to optimize three to 15 objective functions simultaneously. The algorithm yields well-diversified and converged solutions. To maintain diversity, it uses a reference-point-based framework to select a set of solutions from a substantial number of non-dominated solutions. While the basic structure of NSGA-III is similar to that of the NSGA-II algorithm [4], there are significant changes in its selection operator. Unlike NSGA-II, the maintenance of diversity among population solutions in NSGA-III is aided by supplying and adaptively updating a number of well-spread reference points. For more details, the reader is referred to Deb and Jain [5].
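Both NSGA-II and NSGA-III build on the notion of Pareto dominance. The following minimal sketch shows only that building block, a dominance test and a filter for the non-dominated set, assuming all objectives are minimized. It does not implement the reference-point selection that distinguishes NSGA-III, and the sample points are hypothetical.

```python
def dominates(a, b):
    """True if a is no worse than b in every objective and strictly better in at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def non_dominated(points):
    """Return the points that no other point dominates (the first Pareto front)."""
    return [p for p in points
            if not any(dominates(q, p) for q in points if q is not p)]

# Hypothetical bi-objective values (both minimized).
population = [(1, 4), (2, 2), (3, 3), (4, 1)]
front = non_dominated(population)
print(front)  # [(1, 4), (2, 2), (4, 1)]
```

Here (3, 3) is removed because (2, 2) is better in both objectives, while the remaining three points are mutually non-dominated trade-offs.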
2.1 Measures of Convergence and Diversity
To measure the extent of diversity and the state of convergence of the solutions found by multi- and many-objective optimization algorithms such as NSGA-III at the end of a run (in other words, after convergence), two widely used criteria are Inverted Generational Distance (IGD) [5, 11] and Hypervolume (HV) [12]. IGD is computed as follows:
\[ \mathrm{IGD}(A, Z_{eff}) = \frac{1}{|Z_{eff}|} \sum_{i=1}^{|Z_{eff}|} \min_{j=1,\dots,|A|} d(z_i, a_j) \]
where $d(z_i, a_j) = \lVert z_i - a_j \rVert_2$, $A$ is the set of solutions obtained by the algorithm, $Z_{eff}$ is the set of points on the Pareto optimal surface, $a_j$ is a solution in the set $A$, and $z_i$ is a point on the Pareto optimal surface; the minimum picks the solution $a_j$ nearest to $z_i$. The IGD measure indicates how close the obtained solutions are to the solutions on the true Pareto front, or Pareto optimal surface. In cases where the true Pareto front is unknown, we run the algorithm with a large population size and a large number of generations. The first Pareto front solutions obtained at the end of that execution are then taken as an approximation to the Pareto optimal solutions [13]. In our case we used a population size of 500 and 500 generations to approximate the Pareto optimal surface.
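The IGD definition translates almost line for line into code. The sketch below assumes Euclidean distance, as in the definition of d, and uses hypothetical two-objective points. `math.dist` requires Python 3.8 or later.

```python
import math

def igd(A, Z_eff):
    """Average, over reference points z in Z_eff, of the distance to the nearest solution in A."""
    return sum(min(math.dist(z, a) for a in A) for z in Z_eff) / len(Z_eff)

# Hypothetical reference front and obtained solutions.
Z_eff = [(0.0, 1.0), (1.0, 0.0)]
A = [(0.0, 1.0), (1.0, 1.0)]

print(igd(A, Z_eff))      # 0.5
print(igd(Z_eff, Z_eff))  # 0.0
```

The first reference point is matched exactly (distance 0), the second lies at distance 1 from its nearest solution, so the average is 0.5. A lower IGD means the obtained set is closer to the reference front.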
The hypervolume of a set X is the volume of the space enclosed between the non-dominated points in X and a reference point. Here the reference point is the "worst possible" point or solution in the objective space (any point that is dominated by all the points in the solution set X). For a maximization (minimization) problem with positive (negative) valued objectives, we take the origin as the reference point. If a set X has a higher hypervolume than a set Y, then we say that X is better than Y.
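For intuition, the hypervolume is easy to compute exactly in two dimensions. The sketch below assumes both objectives are maximized with the origin as reference point, matching the convention just described. The point sets are hypothetical, and the four-objective case in this paper needs a general-purpose HV routine rather than this 2-D sweep.

```python
def hypervolume_2d_max(points, ref=(0.0, 0.0)):
    """Area dominated by `points` and bounded by `ref`, with both objectives maximized."""
    # Keep points that beat the reference point, then sweep from largest x to smallest.
    pts = sorted((p for p in points if p[0] > ref[0] and p[1] > ref[1]),
                 key=lambda p: p[0], reverse=True)
    area = 0.0
    best_y = ref[1]
    for x, y in pts:
        if y > best_y:  # dominated points add no new area
            area += (x - ref[0]) * (y - best_y)
            best_y = y
    return area

X = [(3, 1), (2, 2), (1, 3)]
Y = [(2, 1), (1, 2)]
print(hypervolume_2d_max(X))  # 6.0
print(hypervolume_2d_max(Y))  # 3.0
```

Since X covers a larger area than Y with respect to the same reference point, X is the better set under this measure.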
3 Problem Formulation The refinery design problem presented in Seinfeld and McBride [9] is briefly described as follows. The refinery was designed with the goal of producing three grades of gasoline: premium, high-octane, and low-octane with smaller quantities of jet fuel, kerosene, and fuel oil. It consists of the following units: an atmospheric crude distillation tower (A), a vacuum distillation tower (B), a reformer (C), a hydro cracker for producing gasoline from blend of light gas oil and light catalytic cycle oil (D), a fluid catalytic cracker (E), a hydrocracker for upgrading vacuum tower bottom (F), and a hydrogen plant (G). The model is based on 200,000 barrels per stream day of 32 A.P.I. crude. It was assumed for convenience that each unit was capable of producing at design specification and costs for any input conditions. Hence, product specifications, gas recovery, special treating, blending and storage facilities are not included. The problem as presented in Seinfeld and McBride [9], originally has one linear and 3 nonlinear objective functions, 9 dependent design variables and 10 linear constraints. It is reduced to a problem with 2 independent design variables, 4 linear constraints and the objective functions remain unchanged. It has 12 parameters indicating different constants in the refinery operations. First objective function is the profit, the next three are sensitivity in profit with respect to changes in crucial parameters namely w11 , h2 and w12 . For complete description of the problem, the reader is referred to Seinfeld and McBride [9]. 
Refinery stream variables in the original problem are as follows:
x1: Light virgin gas oil (bbl/day);
x2: Heavy virgin gas oil (bbl/day);
x3: Vacuum tower bottom (bbl/day);
x4: Vacuum tower bottoms (bbl/day);
x5: Reformer premium gasoline product (bbl/day);
x6: Reformer hydrogen product (SCF/day);
x7: Hydrogen feed to hydrocracker (SCF/day);
x8: Hydrocracker high-octane gasoline product (bbl/day);
x9: Light catalytic cycle oil (bbl/day).
The four objectives are as follows:
1. Maximize Profit:
\[
\begin{aligned}
f_1(x) ={}& 350\,\bigl(5.36\,(36{,}000 + 0.09x_3) + 4.62\,(17{,}022 + 0.1x_3 + 0.12x_1 + 0.55\,(110{,}000 - x_1 + 0.6x_3 + 20{,}000))\bigr) \\
&+ 350\,\bigl(4.41\,(3.5\,(17{,}022 + 0.1x_3 + 0.12x_1)) + 20{,}000\bigr) + 69{,}800 + 4.2\,(20{,}000 - 0.85x_3) \\
&- 350\,\bigl(560{,}000 + 0.000114\,(89{,}000{,}000 + 900x_1 + 2010x_3) + 38{,}400 + 14\,(0.1x_3 + 40{,}000)^{0.6} \\
&\qquad - 350 \cdot 20.2\,(0.45x_1 + 0.67x_3 + 70{,}500)^{0.6} + 22.2\,(110{,}000 - x_1 + 20{,}000 + 0.6x_3)^{0.6} + 26.56\,x_3 \cdot 0.3\bigr) \\
&- 350\,\bigl(0.112\,(89{,}000{,}000 + 900x_1 + 2010x_3)^{0.6}\bigr) - 84{,}031 - 1840\,(0.1x_3 + 40{,}000)^{0.6} \\
&- 3380\,(0.45x_1 + 0.67x_3 + 70{,}500)^{0.6} - 1150\,(110{,}000 - x_1 + 20{,}000 + 0.6x_3)^{0.6} - 2160\,x_3^{0.6} \\
&- 0.00875\,(89{,}000{,}000 + 900x_1 + 2010x_3) - 100{,}000
\end{aligned}
\]
2. Minimize
\[ f_2(x) = \frac{\partial f_1}{\partial w_{10}} = 350 \cdot 4.41\,(0.12x_1 + 0.1x_3 + 17{,}022) \]
3. Minimize
\[ f_3(x) = \frac{\partial f_1}{\partial h_2} = 385 \cdot 4.62\,(110{,}000 - x_1 + 0.6x_3 + 20{,}000) \]
4. Minimize
\[ f_4(x) = \frac{\partial f_1}{\partial w_{11}} = 350\,(89{,}000{,}000 + 900x_1 + 2010x_3) \]
Four constraints and bounds on the 2 independent decision variables are as follows:
\[ -13{,}560 \le 0.43x_1 - 0.678x_3 \le 30{,}552.2 \]
\[ 49{,}400 \le 0.54x_1 - 1.596x_3 \le 20 \]
\[ 0.0 \le x_1 \le 90{,}000 \]
\[ 14{,}700 \le x_3 \le 20{,}000 \]
As discussed above, NSGA-III was adopted as the optimization algorithm to solve the 4-objective optimization problem. The first objective is the profit function, which is to be maximized, and the other three objectives are the partial derivatives of the profit function with respect to 3 of the 12 parameters mentioned above. These objectives, which are to be minimized, measure the sensitivity of the profit to the three parameters.
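As a quick sanity check on the formulation, the three sensitivity objectives can be evaluated directly for a candidate design. The sketch below is a minimal illustration and not the authors' code. The function name `sensitivities` is hypothetical; the bracket in f3 is read as 110,000 − x1 + 0.6·x3 + 20,000 and the constant in f4 as 89,000,000, which are the forms that appear inside the profit expression. The inputs are the x1 and x3 values of solution 2 in Table 1.

```python
def sensitivities(x1, x3):
    """Return (f2, f3, f4), the three profit sensitivities to be minimized."""
    f2 = 350 * 4.41 * (0.12 * x1 + 0.1 * x3 + 17_022)
    # Assumption: the bracket is read as 110,000 - x1 + 0.6*x3 + 20,000,
    # matching the same term inside the profit function.
    f3 = 385 * 4.62 * (110_000 - x1 + 0.6 * x3 + 20_000)
    # Assumption: the constant is read as 89,000,000, as in the profit function.
    f4 = 350 * (89_000_000 + 900 * x1 + 2010 * x3)
    return f2, f3, f4


# Solution 2 of Table 1: x1 = 58,323.98 and x3 = 14,979.47.
f2, f3, f4 = sensitivities(58_323.98, 14_979.47)
print(f"f2 = {f2:.3e}, f3 = {f3:.3e}, f4 = {f4:.3e}")
```

Under these assumptions the computed values agree, to the precision shown in Table 1, with the f2, f3 and f4 entries listed for that solution.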
V. Madhav et al.
4 Results and Discussion

The code for NSGA-III is adapted from pymoo (https://github.com/msu-coinlab/pymoo). After fine-tuning, the NSGA-III parameters were fixed as follows: crossover rate = 0.9, mutation rate = 0.1, population size = 30, maximum number of generations = 10. The Pareto front obtained for the run with random seed 35 is depicted in Fig. 1 as a parallel coordinates plot (PCP), a common way to visualize Pareto fronts with more than three objectives, in contrast to the 2-dimensional or 3-dimensional Pareto fronts drawn for 2- or 3-objective optimization problems, respectively. The PCP is an intuitive way of presenting the Pareto solutions. As is customary in EMO studies, NSGA-III was run 20 times with different random seed values in order to study the impact of the random seed on the final results. The best solutions, 12 in number, obtained for seed 35 are presented in Table 1; this seed was chosen because its HV/IGD value was the highest. It is noteworthy that the 30 solutions in the population converged to just 12 on the Pareto front (Table 1). The HV/IGD ratios are presented in Table 2. Strictly speaking, the results of the present study cannot be compared to those of Ravi et al. [3] and Seinfeld and McBride [9], as none of these studies worked with all
Fig. 1 Pareto front represented by a parallel coordinates plot
Table 1 Final Pareto-optimal solutions corresponding to seed 35

Solution   x1          x3          f1            f2            f3            f4
1          89,887.86   14,872.22   1.40 × 10^8   4.52 × 10^7   8.72 × 10^7   7.00 × 10^10
2          58,323.98   14,979.47   1.42 × 10^8   3.94 × 10^7   1.43 × 10^8   6.00 × 10^10
3          69,506.85   14,831.44   1.41 × 10^8   4.14 × 10^7   1.23 × 10^8   6.34 × 10^10
4          47,886.82   14,790.67   1.43 × 10^8   3.74 × 10^7   1.62 × 10^8   5.66 × 10^10
5          49,611.20   15,131.93   1.43 × 10^8   3.78 × 10^7   1.59 × 10^8   5.74 × 10^10
6          52,041.00   15,194.81   1.42 × 10^8   3.82 × 10^7   1.55 × 10^8   5.82 × 10^10
7          55,418.49   15,290.82   1.42 × 10^8   3.89 × 10^7   1.49 × 10^8   5.93 × 10^10
8          45,885.23   15,562.45   1.43 × 10^8   3.71 × 10^7   1.66 × 10^8   5.66 × 10^10
9          46,489.41   15,562.01   1.43 × 10^8   3.73 × 10^7   1.65 × 10^8   5.67 × 10^10
10         64,447.87   14,876.54   1.42 × 10^8   4.05 × 10^7   1.32 × 10^8   6.19 × 10^10
11         76,869.92   15,133.34   1.41 × 10^8   4.28 × 10^7   1.11 × 10^8   6.60 × 10^10
12         36,010.83   19,099.61   1.43 × 10^8   3.58 × 10^7   1.88 × 10^8   5.59 × 10^10
Table 2 HV/IGD values for 20 runs

Random seed   HV/IGD ratio
1             1.89 × 10^26
5             1.22 × 10^26
10            3.46 × 10^26
15            1.61 × 10^26
20            1.94 × 10^26
25            2.57 × 10^26
30            2.04 × 10^26
35            4.73 × 10^26
40            1.61 × 10^26
42            2.17 × 10^26
45            2.05 × 10^26
50            1.43 × 10^26
55            1.92 × 10^26
60            2.20 × 10^26
65            1.13 × 10^26
70            1.66 × 10^26
75            2.91 × 10^26
80            2.00 × 10^26
85            2.34 × 10^26
90            2.27 × 10^26
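The selection of the run to report can be illustrated with a few lines of Python. This is a minimal sketch that simply ranks the seeds by the HV/IGD ratios listed in Table 2; it does not compute HV or IGD, and the names `hv_igd` and `rank_seeds` are hypothetical.

```python
# HV/IGD ratios from Table 2, keyed by random seed (all values are x 10^26).
hv_igd = {
    1: 1.89, 5: 1.22, 10: 3.46, 15: 1.61, 20: 1.94,
    25: 2.57, 30: 2.04, 35: 4.73, 40: 1.61, 42: 2.17,
    45: 2.05, 50: 1.43, 55: 1.92, 60: 2.20, 65: 1.13,
    70: 1.66, 75: 2.91, 80: 2.00, 85: 2.34, 90: 2.27,
}


def rank_seeds(ratios):
    """Return the seeds ordered from the highest to the lowest HV/IGD ratio."""
    return sorted(ratios, key=ratios.get, reverse=True)


ranking = rank_seeds(hv_igd)
best_seed = ranking[0]
print(f"best seed: {best_seed} (HV/IGD = {hv_igd[best_seed]} x 10^26)")
```

A higher hypervolume and a lower inverted generational distance both raise the ratio, so the seed at the top of the ranking is the run whose Pareto front is reported.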
the four objectives simultaneously. However, for the sake of completeness, if one were to compare the values of these objective functions, the entire Pareto front presented in Table 1 of the present study yielded better solutions than those reported by Seinfeld and McBride [9] for all the objectives. Compared with Ravi et al. [3], these solutions fared better in all objectives except f3. Overall, the results improve on those of the earlier studies in every objective except f3 relative to Ravi et al. [3].
5 Conclusions

This paper solves a long-standing many-objective refinery profit optimization problem in a pure many-objective environment by employing the popular NSGA-III algorithm, which is applied for the first time in chemical engineering. We achieved significantly better non-dominated solutions in three out of four objectives compared to the state of the art. We employed the ratio of HV to IGD to rank-order all the Pareto-optimal solutions obtained after 20 runs of the algorithm, and the best result is presented. A parallel coordinates plot was drawn to show the Pareto front for this 4-objective optimization problem. In future, NSGA-III can be employed to efficiently solve other many-objective optimization problems in chemical engineering.
References

1. Allen, D.H.: Linear programming models for plant operations planning. Br. Chem. Eng. 16, 685–691 (1971)
2. Ravi, V., Reddy, P.J.: Fuzzy linear fractional goal programming applied to refinery operations planning. Fuzzy Sets Syst. 96, 173–182 (1998)
3. Ravi, V., Reddy, P.J., Dutta, D.: Application of fuzzy nonlinear goal programming to a refinery model. Comput. Chem. Eng. 22, 709–712 (1998)
4. Deb, K., Pratap, A., Agarwal, S., Meyarivan, T.: A fast and elitist multiobjective genetic algorithm: NSGA-II. IEEE Trans. Evol. Comput. 6, 182–197 (2002)
5. Deb, K., Jain, H.: An evolutionary many-objective optimization algorithm using reference-point-based nondominated sorting approach, Part I: solving problems with box constraints. IEEE Trans. Evol. Comput. 18, 577–601 (2014)
6. Reddy, P.S., Rani, K.Y., Patwardhan, S.C.: Multi-objective optimization of a reactive batch distillation process using reduced order model. Comput. Chem. Eng. 106, 40–56 (2017)
7. Hemalatha, K., Nagveni, P., Kumar, P.N., Rani, K.Y.: Multiobjective optimization and experimental validation for batch cooling crystallization of citric acid anhydrate. Comput. Chem. Eng. 112, 292–303 (2018)
8. Rangaiah, G.P., Sharma, S., Sreepathi, B.K.: Multi-objective optimization for the design and operation of energy efficient chemical process and power generation. Curr. Opin. Chem. Eng. 10, 49–62 (2015)
9. Seinfeld, J.H., McBride, W.L.: Optimization with multiple criteria: application to minimization of parameter sensitivities in a refinery model. Ind. Eng. Chem. Process Des. Dev. 9(1), 53–57 (1970)
10. Suman, B.: Study of self-stopping PDMOSA and performance measure in multiobjective optimization. Comput. Chem. Eng. 29, 1131–1147 (2005)
11. Van Veldhuizen, D.A., Lamont, G.B.: Multiobjective evolutionary algorithm research: a history and analysis. Air Force Institute of Technology, Wright-Patterson AFB, Ohio, TR-98-03 (1998)
12. Zitzler, E., Thiele, L.: Multiobjective evolutionary algorithms: a comparative case study and the strength Pareto approach. IEEE Trans. Evol. Comput. (1999)
13. Ishibuchi, H., Masuda, H., Tanigaki, Y., Nojima, Y.: Difficulties in specifying reference points to calculate the inverted generational distance for many-objective optimization problems. In: 2014 IEEE Symposium on Computational Intelligence in Multi-Criteria Decision-Making (MCDM) (2015)
A Deep Learning Technique for Image Inpainting with GANs

K. A. Suraj, Sumukh H. Swamy, Shrijan S. Shetty, and R. Jayashree
Abstract An image is worth a thousand words. Each image draws the viewer into another time and place and evokes a range of emotions. Images serve as a path through a world of fond memories. But what if these photographs are damaged or contain undesirable objects? To restore such photographs by eliminating damage such as scratches, haziness and overlaid text or illustrations, we can use a procedure called image inpainting. Image inpainting is the process of restoring the damaged and missing parts of a picture with the objective of presenting the picture as it was originally envisioned. The scope of our method ranges from removing undesirable objects from the picture to reconstructing its deteriorated and blurred parts. Further, it could be used to improve the quality of pictures (for example, those capturing criminal activities and their perpetrators). In this paper, we present a deep learning procedure to accomplish these objectives. A pix2pix Generative Adversarial Network is used with multiple encoders and decoders, which extract the essential features of the picture and then reconstruct it.

Keywords GAN · Image · Inpainting · Pixels
1 Introduction

In the era of the internet and social media, where people display and portray everything they do on social media platforms, the number of pictures being taken has increased greatly. Sometimes these pictures do not turn out the way the photographer envisaged. Image inpainting is a process that can be used to eliminate unwanted objects as well as obvious flaws in pictures; it can also be used when certain pixels are missing or corrupted. Traditional inpainting methods calculate the missing pixel values from their neighborhood, for example as an average of the surrounding pixel values [1]. This is done so as to maintain the structural similarity of the image. However, these methods are not good enough when the image is complex due to non-repetitive structures and varying depth perception.

Image inpainting methods can be broadly classified into two categories. The first category involves generating the image from scratch after learning all the features of the image. This method is known as blind image inpainting. The second category focuses on the damaged or missing region and generates the pixels only for that region. Our algorithm follows the first category, as more features can be learnt from the entire image. Blind image inpainting can be done with a number of deep learning techniques. Convolutional neural networks, for instance, can be used for image generation, where high-level object recognition and low-level pixel evaluation are grouped together in convolutional encoder-decoder networks. However, these networks do not have a reference to compare their generated output with. This results in longer training times and leads to distorted structures and blurry textures. The Generative Adversarial Network overcomes this deficiency by introducing a discriminator network which tells the generator network how good the generated image is.

We present a pix2pix Generative Adversarial Network [2] with novel encoder-decoder layers for image inpainting. A pix2pix model, as the name suggests, takes a pixel from the input image, manipulates it and converts it into an output pixel. It maps an input image to its output image and manipulates the pixel values of the input image as required by the end user. Our proposed model has multiple layers of encoders and decoders. This forces the system to learn meaningful mappings from the input domain to the output domain, which in turn improves the learning of the target distribution while avoiding mode collapse. Our contributions are summarized as follows:

1. Our proposed pix2pix Generative Adversarial Network comprises a global discriminator which computes the loss for the system as a whole.
2. We use an Adam optimizer, which provides faster convergence.

K. A. Suraj (B) · S. H. Swamy · S. S. Shetty · R. Jayashree
PES University, 100 Feet Ring Road, Banashankari Stage III, Dwaraka Nagar, Banashankari, Bengaluru, Karnataka 560085, India
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2021
V. K. Gunjan and J. M. Zurada et al. (eds.), Modern Approaches in Machine Learning and Cognitive Science: A Walkthrough, Studies in Computational Intelligence 956, https://doi.org/10.1007/978-3-030-68291-0_4
2 Literature Survey

Criminisi et al. [1] discuss filling in the missing pixel values using exemplar-based texture synthesis. Their work was the basis for most of the inpainting models developed thereafter. It fills the missing pixel values by introducing a confidence term, but ultimately falls short when dealing with curved structures. Jia et al. [3] introduced tensor voting as a replacement for the confidence term in deciding the order in which missing pixels are filled. To tackle the problem of inpainting curved structures, they put forward a texture-based segmentation approach which divides the image into regions before the reconstruction process. In machine learning, Fawzi et al. [4] were among the first to consider the use of neural networks to fill in missing pixels. The neural networks were pre-trained and their hallucinations were followed by a general smoothing process.
The total variation norm was commonly used for smoothing, and this greatly helped in restoring images. The difficulties with this approach were depth and background perception. Owing to this, Kim et al. [5] proposed the use of the object detection algorithm YOLO [6] to detect missing portions of the image and then filled these portions using classical inpainting methods such as the ones proposed in [1]. This fails in situations where there is a relationship between the different objects in the image. Cai et al. [7] propose the use of a multi-layer CNN to blindly form the entire image, but a problem arises when the damaged or missing portion is more than half the size of the image. Yu et al. [8] overcame these shortcomings of convolutional neural networks by using Generative Adversarial Networks. Their method combines contextual attention with GANs, which helps the network extract the features and characteristics of the images. However, it failed to provide satisfactory results on high-resolution images. To overcome this problem, Hsu et al. [9] used unsupervised GANs and VGG networks to fill the holes in high-resolution images and then CNNs to enlarge these images. This method produced promising results but required large computational power and time. Our research focuses on reducing these costs by optimizing the efficiency of our image inpainting model.
3 Methodology

3.1 Training

The dataset we used comprised pictures of resolution 256 × 256. An arbitrary mask was created on each picture at every possible location. The masked picture was then joined to the original picture to form an input versus target pair. The resulting pictures were of resolution 512 × 256. These pictures were then fed into the generator with a batch size of 150. The generator first produces a picture by reconstructing the masked input image, which it does by computing values for the missing pixels in the masked region. The result is then passed on to the discriminator, which calculates the loss as a function of the original picture and the generated picture. The weights of the networks are then recomputed, and this procedure repeats until Nash equilibrium is reached. The algorithm in Sect. 4 shows the execution of the generative adversarial network.
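The construction of one training pair can be sketched as follows. This is a minimal pure-Python illustration under stated assumptions, not the authors' preprocessing code: the mask is assumed to be a square of zero-valued pixels, its size (64) and the left/right ordering of the pair are assumptions, and the function name `make_training_pair` is hypothetical. Images are represented as nested lists to keep the example self-contained.

```python
import random


def make_training_pair(image, mask_size, rng):
    """Mask a random square of `image` and place the result beside the original."""
    height, width = len(image), len(image[0])
    top = rng.randrange(height - mask_size + 1)
    left = rng.randrange(width - mask_size + 1)
    masked = [row[:] for row in image]  # copy so the original stays untouched
    for r in range(top, top + mask_size):
        for c in range(left, left + mask_size):
            masked[r][c] = 0  # assumption: masked pixels are set to zero
    # Concatenate along the width: a 256 x 256 input yields a 512 x 256 pair.
    return [m_row + o_row for m_row, o_row in zip(masked, image)]


rng = random.Random(0)
image = [[255] * 256 for _ in range(256)]  # stand-in for a 256 x 256 picture
pair = make_training_pair(image, mask_size=64, rng=rng)
print(len(pair[0]), len(pair))  # width and height of the combined picture
```

Keeping the masked input and its target in a single picture means that each sample can be split down the middle at load time, so the two halves can never get out of step.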
3.2 Testing

The trained model is called upon to predict the content of the missing portion of a masked picture. Our model appeared to perform well after 20,000 epochs of training on a GPU-driven system. The generated image is shown alongside the initial masked image and the expected image in Figs. 1 and 5.
4 Algorithm

The generative adversarial nets were trained using minibatch stochastic gradient descent. The number of steps to apply to the discriminator, k, is a hyperparameter. We used k = 1, the least expensive option, in our experiments.

For k steps do
1. Sample a minibatch of m noise samples z^(1), …, z^(m) from the noise prior p_g(z).
2. Sample a minibatch of m examples x^(1), …, x^(m) from the data-generating distribution p_data(x).
3. Update the discriminator by ascending its stochastic gradient:

\[
\nabla_{\theta_d} \frac{1}{m} \sum_{i=1}^{m} \Bigl[ \log D\bigl(x^{(i)}\bigr) + \log\Bigl(1 - D\bigl(G\bigl(z^{(i)}\bigr)\bigr)\Bigr) \Bigr] \tag{1}
\]

End for
4. Sample a minibatch of m noise samples z^(1), …, z^(m) from the noise prior p_g(z).
5. Update the generator by descending its stochastic gradient:

\[
\nabla_{\theta_g} \frac{1}{m} \sum_{i=1}^{m} \log\Bigl(1 - D\bigl(G\bigl(z^{(i)}\bigr)\bigr)\Bigr) \tag{2}
\]

The gradient-based updates can use any standard gradient-based learning rule. We used momentum in our experiments for faster convergence. Figure 2 shows the basic architecture of our pix2pix GAN model.
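The two minibatch quantities in Eqs. (1) and (2) can be written out numerically. The sketch below is a minimal pure-Python illustration, not the training code: it takes the discriminator outputs as given numbers instead of computing them with networks, and it follows the standard GAN formulation in which the discriminator raises its objective while the generator lowers log(1 − D(G(z))). The function names and the sample values are hypothetical.

```python
import math


def discriminator_objective(d_real, d_fake):
    """Minibatch average of log D(x) + log(1 - D(G(z))); the discriminator ascends it."""
    m = len(d_real)
    return sum(math.log(dr) + math.log(1.0 - df)
               for dr, df in zip(d_real, d_fake)) / m


def generator_objective(d_fake):
    """Minibatch average of log(1 - D(G(z))); the generator descends it."""
    return sum(math.log(1.0 - df) for df in d_fake) / len(d_fake)


# Hypothetical discriminator outputs for m = 3 real and 3 generated samples.
d_real = [0.9, 0.8, 0.7]
d_fake = [0.2, 0.1, 0.3]
print(discriminator_objective(d_real, d_fake))
print(generator_objective(d_fake))
```

When the discriminator cannot tell real from generated samples and outputs 0.5 everywhere, the first quantity equals 2·log 0.5 and the second equals log 0.5, which is the balance point the training aims for.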
4.1 Flow of Control See Fig. 3.
5 Implementation

Generative adversarial networks are deep learning models that contain two networks set up in opposition to each other. The basic idea is to improve the generator's ability to recreate images while minimizing the discriminator's loss.

\[
\mathcal{L}_{cGAN} = \mathbb{E}\bigl[\log D(x; e)\bigr] + \mathbb{E}\bigl[\log\bigl(1 - D(G(z); e)\bigr)\bigr] \tag{3}
\]
Fig. 1 Dataset illustration
(a) Fruits Dataset
(b) Human faces Dataset
(c) Vehicle number plate Dataset
Fig. 2 Schematic representation of GAN
Fig. 3 Model flow of control
\[
\mathcal{L}_{cGAN} = \mathbb{E}\bigl[\log\bigl(1 - D(G(z); e)\bigr)\bigr] \tag{4}
\]
Equations (3) and (4) describe the loss calculation of the discriminator and the generator, respectively. The generator's loss rewards generated images that the discriminator accepts as real, while the discriminator's loss rewards recognizing real images and penalizes accepting generated ones. Training continues until Nash equilibrium is reached, which is the point where neither model can improve its own objective by changing its strategy alone. The loss function for the system as a whole is a minimax function, as given below:
Fig. 4 This is the structure of the encoder decoder used in the model
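\[
\mathcal{L}_{cGAN}(G, D) = \mathbb{E}_{x,y}\bigl[\log D(x, y)\bigr] + \mathbb{E}_{x,z}\bigl[\log\bigl(1 - D(x, G(x, z))\bigr)\bigr] \tag{5}
\]

\[
\mathcal{L}_{L1} = \mathbb{E}_{x,y,z}\bigl[\lVert y - G(x, z) \rVert_1\bigr] \tag{6}
\]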
A pix2pix GAN is a conditional type of GAN which translates an input picture into an output picture. Its generator has an encoder-decoder structure which accepts a given picture as input and translates it into an output picture. The framework uses an L1 loss function alongside the typical GAN loss. Our model has seven layers of encoders and decoders in the generator, which use LeakyReLU as the activation unit. The model is optimized using the Adam optimizer for faster convergence. Figure 4 gives an idea of the underlying encoder-decoder structure. The masked input picture, on being passed to the generator, is sent to a stack of encoders which extract important features from the picture by breaking it down to a 64 × 64 picture. The features of the picture are obtained over seven layers of encoders, as shown in Fig. 4. This low-level representation is then passed on to a multi-layer decoder structure, as shown in Fig. 4, which restores the picture to the dimensions originally presented to the encoder. The restored picture is then passed on to the discriminator, which computes the loss function, and training proceeds until Nash equilibrium is reached, as described by Eqs. (5) and (6).
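The L1 term of Eq. (6) is simple to compute for a single target and generated picture. The sketch below is a minimal pure-Python illustration and not the model code; it assumes that the absolute differences are averaged over pixels, a common normalization that the text does not state, and the function name `l1_loss` is hypothetical.

```python
def l1_loss(target, generated):
    """Mean absolute pixel difference between a target and a generated picture."""
    flat_t = [p for row in target for p in row]
    flat_g = [p for row in generated for p in row]
    # Assumption: the L1 norm is averaged over the number of pixels.
    return sum(abs(t - g) for t, g in zip(flat_t, flat_g)) / len(flat_t)


target = [[0.0, 0.5], [1.0, 0.25]]
generated = [[0.0, 0.25], [0.5, 0.25]]
print(l1_loss(target, generated))
```

Because the L1 term compares the output with the target pixel by pixel, it pulls the generator toward the ground truth, while the adversarial term of Eq. (5) pushes it toward outputs that look realistic.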
6 Results and Discussion

The binary cross-entropy loss function was used to train the model to generate images which are indistinguishable from the original. With the Adam optimizer, the algorithm converged faster and without any mode collapse. The images generated by our model were compared to the originals on the basis of structural similarity, contrast and general appearance. We used the peak signal-to-noise ratio (PSNR) as the comparison metric. The following images show the images generated by the model along with the source and target images.

Fig. 5 Masked as well as the model generated images (The first column of images shows input masked images. The second column shows the images generated by our model.)

From the results in Fig. 5 it is evident that the model can be used in a wide variety of situations and applications without going into the details of the algorithm parameters. The primary attributes of an image, such as texture, color, depth, shape and object edges, are well perceived by our model. Before the mask itself is generated, these attributes are taken into consideration. The resulting image is always visually coherent with the original image. This aspect of our model therefore helps in restoring corrupted, lost and blurry images which are poorly taken. It can be of immense help to police personnel in identifying fugitives who use various techniques to mask their faces, provided some of their facial features are visible. It is also important to the police when decoding license plate numbers which are covered in paint, ink or grease by criminals trying to evade identification, as shown in Fig. 5. Painters can also use this method to understand and interpret damaged and lost parts of paintings. Our model creates plausible hallucinations of the original image. Its versatility opens up many use cases and possibilities.

Qualitative Results

Our model generates images with uniform coherence. PSNR was the metric we used to compare the image generated by our model with the original image. The greater the value of this ratio, the better the quality of the reconstructed image. Our model generated PSNR values of 88% for the lychee and cauliflower images, 80% for human (celebrity) faces and 94 for the license plates dataset, respectively. These values are an indication that the model is generating the images as a hallucination of what it assumed the image would look like rather than regenerating the original image as a whole. From Fig. 6 it is clear that our model generates the best images visually as well as structurally. The eye as well as the eye socket is not even generated by the other two tools. The PSNR values obtained are 0.29 and 0.30 for Retouch and Snapseed, respectively. In numerical terms as well, our model generates results which are far superior to those of the tools currently available to the general public and for commercial use. We also conducted a survey among the students and faculty of our university. Our model was unanimously selected as the most visually coherent generator of images.

Fig. 6 Comparison of proposed model output image with those of Retouch and Google Snapseed (The first image shows results from our model. The second shows the images generated by Retouch, the third shows results from Google Snapseed)
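For reference, the peak signal-to-noise ratio is conventionally defined in decibels from the mean squared error between two images. The sketch below is a minimal pure-Python version of that standard definition for 8-bit pixel values. It is not the authors' evaluation code, and since the normalization behind the percentage figures reported above is not stated, it is not expected to reproduce them. The function name `psnr` and the sample images are hypothetical.

```python
import math


def psnr(original, reconstructed, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized images."""
    flat_o = [p for row in original for p in row]
    flat_r = [p for row in reconstructed for p in row]
    mse = sum((o - r) ** 2 for o, r in zip(flat_o, flat_r)) / len(flat_o)
    if mse == 0:
        return math.inf  # identical images have no noise
    return 10.0 * math.log10(max_value ** 2 / mse)


original = [[100, 120], [140, 160]]
reconstructed = [[101, 119], [141, 159]]  # every pixel is off by one
print(f"{psnr(original, reconstructed):.2f} dB")
```

A higher value means a smaller pixel-wise error, which is why the metric rewards faithful reconstruction of the masked region.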
7 Conclusion

In this paper, we present the use of a Generative Adversarial Network in the form of a pix2pix model. A set of masked images is sent to the generator, whereas the original image is given to the discriminator, which is able to predict the originality of the image produced by the generator. At the training stage, the model automatically learns how to map corrupted or damaged pixels to target pixels, which implicitly captures the prior as well as the background information of the images. Once the generator has learned this mapping, it can automatically repair a corrupt image without knowing in advance where the damaged pixels are (blind inpainting). The results obtained are superior to existing inpainting methods both quantitatively and visually. The model presented works well with any given image size.
References

1. Criminisi, A., Perez, P., Toyama, K.: Region filling and object removal by exemplar-based image inpainting. IEEE Trans. Image Process. 13(9), 1200–1212 (2004)
2. Isola, P., et al.: Image-to-image translation with conditional adversarial networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2017)
3. Jia, J., Tang, C.-K.: Image repairing: robust image synthesis by adaptive ND tensor voting. In: 2003 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Proceedings, vol. 1. IEEE (2003)
4. Fawzi, A., et al.: Image inpainting through neural networks hallucinations. In: 2016 IEEE 12th Image, Video, and Multidimensional Signal Processing Workshop (IVMSP). IEEE (2016)
5. Kim, C., et al.: Diminishing unwanted objects based on object detection using deep learning and image inpainting. In: 2018 International Workshop on Advanced Image Technology (IWAIT). IEEE (2018)
6. Redmon, J., et al.: You only look once: unified, real-time object detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2016)
7. Cai, N., et al.: Blind inpainting using the fully convolutional neural network. Vis. Comput. 33(2), 249–261 (2017)
8. Yu, J., et al.: Generative image inpainting with contextual attention. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2018)
9. Hsu, C., Chen, F., Wang, G.: High-resolution image inpainting through multiple deep networks. In: 2017 International Conference on Vision, Image and Signal Processing (ICVISP). IEEE (2017)
A Comparative Study on Distributed File Systems

Suman De and Megha Panjwani
Abstract Distributed file systems are mainstays of the way massive amounts of data are stored. The emergence of the Hadoop File System, the Google File System and the Network File System has changed the course of how data is managed on servers and has implications for cloud computing and big data management. Each file system offers its own advantages and challenges in terms of performance, fault tolerance, consistency, scalability and availability. This opens a debate on how these systems can be taken up for implementation. Each file system has features and metrics that differentiate it from the others. This paper presents a comparative study of these file systems and proposes criteria for selecting a specific file system. The study also explores the benefits and disadvantages associated with each of them.

Keywords Distributed file system · Hadoop file system · Google file system · Network file system · OpenAFS
1 Introduction

The need for distributed environments is increasing with the immense need to store, process and analyze data in fields such as aerodynamic research, weather forecasting, scientific applications and banking. A distributed file system (DFS) is a client/server-based application that facilitates access to and processing of data stored on multiple servers and responds to the client as it would for data stored on a local system. The client receives a copy of the file, which is cached on the local system. This type of file system organizes files from individual servers into a global directory, so that remote file access is location-independent and looks the same from the client as local access. Distributed file systems come with a mechanism to avoid conflicts and try to share the most current version of the data when requested by a client. The files are replicated over various sources by means of file or database replication to handle data failures and make data recovery possible. The requirements to be considered while designing such a DFS include [1]:

• Fault tolerance: Measures how fast the data can be recovered after a failure.
• Huge file size: A significant number of files can be gigabytes in size. Certain systems process files in chunks, which limits the amount of data processed by a single function from gigabytes to a few megabytes.
• Write-once-read-many pattern: Many distributed file systems provide optimized functionality for file write and read operations.
• Role of metadata: Distributed file systems allocate a specific node as the primary node, which manages the metadata of the stored files.

The areas to be considered when designing such systems are transparency, flexibility, reliability, performance, scalability and security, as well as architecture, processes, communication, naming, synchronization, caching and replication, and approaches to fault tolerance.

S. De (B) · M. Panjwani
SAP Labs, Bangalore, Karnataka, India
e-mail: [email protected]
M. Panjwani
e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2021
V. K. Gunjan and J. M. Zurada et al. (eds.), Modern Approaches in Machine Learning and Cognitive Science: A Walkthrough, Studies in Computational Intelligence 956, https://doi.org/10.1007/978-3-030-68291-0_5
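The chunk-wise processing mentioned in the requirements above can be sketched in a few lines. This is a minimal, generic illustration using an in-memory stream and is not tied to any particular distributed file system; the function name `iter_chunks` and the chunk size of 4096 bytes are assumptions chosen for the example.

```python
import io


def iter_chunks(stream, chunk_size):
    """Yield successive fixed-size chunks from a binary stream."""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:  # an empty read marks the end of the stream
            break
        yield chunk


# A 10,000-byte in-memory "file" processed 4096 bytes at a time.
data = io.BytesIO(b"x" * 10_000)
sizes = [len(chunk) for chunk in iter_chunks(data, chunk_size=4096)]
print(sizes)
```

Each call handles at most one chunk, so memory use stays bounded by the chunk size rather than by the size of the file, and only the last chunk may be shorter than the others.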
2 History of File Systems

Network File System (NFS) was one of the most popular distributed file systems. Developed by Sun Microsystems in 1985, it was an open and widely used distributed file system for a long time. It allows all or a portion of a file system to be mounted and accessed according to the privileges assigned. NFS exists in two main versions: NFSv3, which has been in use for a long period, and NFSv4, which was presented in 2003 and includes multiple upgrades over NFSv3. NFSv4.1 (RFC 5661) was ratified to improve scalability and performance. NFSv2 and NFSv3 supported the UDP protocol and facilitated stateless connections, whereas NFSv4 supports only TCP. Stored files are treated as sequences of bytes, and the system uses a tree-like model that supports hard and symbolic links (Fig. 1).

Andrew File System (AFS) started as part of a larger project called Andrew. It was developed at Carnegie Mellon University and was initially called "Vice". It was primarily designed for systems running operating systems such as BSD, UNIX and Mach, and uses a set of trusted servers to provide a homogeneous, geography-independent namespace to all clients. The development of Andrew File System currently continues through a project known as OpenAFS, which targets multiple platforms such as Linux, Apple Mac OS X, Sun Solaris and MS Windows NT [1].
Fig. 1 Network file system architecture
3 Literature Survey

Distributed file systems, as seen in the previous section, have been the subject of many advances, and here we look at some relevant recent work on the topic. In a 2017 paper titled "An Efficient Cache Management Scheme for Accessing Small Files in Distributed File Systems", Kyuongsoo Bok, Hyunkyo Oh, Jongtae Lim and Jaesoo Yoo proposed a distributed cache management scheme that uses cache metadata for efficient access to small files in the Hadoop Distributed File System (HDFS). The scheme applies cache metadata synchronization with an adjustable communication cycle to improve small-file access speed and to limit the network load on NameNodes in HDFS. It significantly reduces the frequency of DataNode disk accesses by keeping small files that are frequently used by clients in each DataNode cache and managing their cache metadata in the NameNode. Moreover, the scheme minimizes communication with the NameNode by maintaining block metadata and cache metadata in the client cache, and reduces unnecessary file accesses by applying a cache metadata update policy [2].

In a 2019 paper on metadata management titled "An Efficient Ring-Based Metadata Management Policy for Large-Scale Distributed File Systems", Yuanning Gao, Xiaochun Yang, Jiaxi Liu and Guihai Chen proposed a novel hashing scheme called AngleCut to partition the metadata namespace tree and serve large-scale distributed storage systems. AngleCut is an efficient and scalable metadata management scheme aimed at EB-scale file systems. It first uses a novel ring-based locality-preserving hashing (LPH) function to project the namespace tree onto a linear keyspace, i.e., multiple Chord-like rings. The authors then design a history-based allocation strategy to distribute the metadata evenly across the metadata servers (MDSs). A two-layer metadata cache mechanism is proposed to improve query efficiency and reduce network overhead. Finally, they design an efficient distributed processing protocol called 2PC-MQ to guarantee the consistency of distributed metadata transactions. AngleCut preserves metadata locality while maintaining a high degree of load balancing among the MDSs. The theoretical proof and extensive experiments on Amazon EC2 demonstrated the superiority of AngleCut over previous work [3].
4 Google File System
The Google File System (GFS) was unveiled in 2003 to handle the ever-expanding data processing requirements of Google and its applications. It uses a cluster-based model in which each cluster contains a large number of storage machines built from inexpensive components. A cluster consists of a single master node and many chunk servers, and it is accessed by multiple clients. Files are divided into fixed-size chunks, each identified by a unique 64-bit chunk handle. Every chunk is replicated on several chunk servers to ensure reliability. GFS was designed to meet Google's large cluster requirements while keeping existing application functionality intact. Files are organized in a tree-like hierarchy and identified by their path names. Metadata such as access control information, the namespace and related data is maintained by the master, which communicates with every chunk server and monitors its status through periodic heartbeat messages (Fig. 2).
The Google File System has characteristics such as:
• Fault tolerance
• Replication of important data
• Automatic data backup and recovery
• High throughput
• Reduced communication between the master and the other servers because of the chunk servers
• Identification mechanisms and authorization scenarios
• High availability and low downtime
Fig. 2 Google file system architecture
The Google File System applies a chunk replica placement policy that serves two purposes: increasing data reliability and availability, and improving the use of network bandwidth. The policy ensures that some replicas of a chunk remain available even if a complete rack is damaged or disconnected. The largest GFS cluster has more than 1,000 nodes with 300 terabytes of disk storage capacity.
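The chunking and rack-aware placement ideas described above can be illustrated with a short sketch. It is not Google's implementation: the 64 MB default chunk size, the three-replica default, the counter used to hand out handles and the names `split_into_chunks` and `place_replicas` are assumptions introduced only for this example, and a real placement policy would also weigh factors such as disk utilization.

```python
from itertools import count

CHUNK_SIZE = 64 * 1024 * 1024  # assumed default chunk size in bytes for this sketch
_next_handle = count(1)        # hypothetical stand-in for the master's handle allocator


def split_into_chunks(data, chunk_size=CHUNK_SIZE):
    """Split a byte string into fixed-size chunks, each with a unique 64-bit handle."""
    chunks = []
    for offset in range(0, len(data), chunk_size):
        handle = next(_next_handle) & 0xFFFFFFFFFFFFFFFF  # keep the handle within 64 bits
        chunks.append((handle, data[offset:offset + chunk_size]))
    return chunks


def place_replicas(servers_by_rack, replicas=3):
    """Choose chunk servers for one chunk, visiting racks round-robin.

    Spreading replicas over racks first means that losing one rack
    cannot remove every copy as long as at least two racks are used.
    """
    pools = {rack: list(servers) for rack, servers in sorted(servers_by_rack.items())}
    chosen = []
    while len(chosen) < replicas and any(pools.values()):
        for rack, pool in pools.items():
            if pool and len(chosen) < replicas:
                chosen.append((rack, pool.pop(0)))
    return chosen


racks = {"rack-a": ["cs1", "cs2"], "rack-b": ["cs3"], "rack-c": ["cs4"]}
for handle, payload in split_into_chunks(b"x" * 150, chunk_size=64):
    print(handle, len(payload), place_replicas(racks))
```

With three racks available, each chunk in this toy run receives one replica per rack; with only two racks, the third replica falls back to a second server in the first rack.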
5 Hadoop File System
The Hadoop Distributed File System (HDFS) is an open-source variant of the Google File System from Yahoo, designed to run on commodity hardware in order to support ad serving and Search Engine Optimization (SEO) requirements. The key difference between HDFS and other systems is that it is considerably more fault tolerant than its counterparts. It provides high-throughput access to data and is intended primarily for web applications that work with large data sets [4]. Facebook, eBay, LinkedIn and Twitter are among the web companies that use HDFS to support large data volumes and the requirements of data analytics projects. An HDFS cluster consists of a single NameNode, which acts as the master node that manages the file system namespace and serves as the entry point to the files provided by users. User data is stored in files; each file is split into one or more blocks, and these blocks are stored on a set of DataNodes. The DataNodes perform block creation, deletion and replication whenever the NameNode instructs them to. HDFS is mostly deployed in large-scale installations, so its support for low-cost commodity hardware is a particularly useful feature. These systems typically run on a GNU/Linux operating system. HDFS is written in Java, so any system that supports Java as a runtime can host the NameNode or DataNode software. Web applications with such data-intensive workloads can reach several petabytes and many nodes. HDFS must therefore be robust, as server failures are common at such scale.
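The relationship between the NameNode and the DataNodes, and the need to survive server failures, can be sketched as follows. This is a toy model under stated assumptions rather than the Hadoop implementation: the class name `NameNode` is reused only for readability, and the replication target of three, the least-loaded placement rule and the method names `add_block` and `handle_failure` are hypothetical choices made for the example.

```python
REPLICATION = 3  # assumed target number of replicas per block


class NameNode:
    """Toy master that tracks which DataNodes hold each block."""

    def __init__(self, datanodes):
        self.live = set(datanodes)
        self.block_map = {}  # block id -> set of DataNode names

    def _load(self, node):
        # Number of blocks on a node; the name breaks ties deterministically.
        return (sum(node in holders for holders in self.block_map.values()), node)

    def add_block(self, block_id):
        """Place a new block on the least-loaded live DataNodes."""
        targets = sorted(self.live, key=self._load)[:REPLICATION]
        self.block_map[block_id] = set(targets)
        return targets

    def handle_failure(self, dead):
        """Drop a failed DataNode and return (block, source, target) copy commands."""
        self.live.discard(dead)
        commands = []
        for block_id in sorted(self.block_map):
            holders = self.block_map[block_id]
            holders.discard(dead)
            while holders and len(holders) < REPLICATION:
                candidates = sorted(self.live - holders, key=self._load)
                if not candidates:
                    break  # not enough live nodes to restore the target
                source, target = min(holders), candidates[0]
                holders.add(target)
                commands.append((block_id, source, target))
        return commands


nn = NameNode(["dn1", "dn2", "dn3", "dn4"])
nn.add_block("blk_1")
nn.add_block("blk_2")
print(nn.handle_failure("dn1"))
```

In this sketch the master never moves data itself; it only issues copy commands that surviving DataNodes would carry out, which mirrors the division of work described above.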
HDFS has been used by The New York Times for large-scale image conversions, by Media6Degrees and Fox Audience Network for log processing, data mining and machine learning, by LiveBet for log storage and odds analysis, and by Joost for session analysis. It is the core of many open-source data warehouse alternatives, which are called data lakes (Fig. 3).
Fig. 3 Hadoop file system’s master slave architecture
6 Other File Systems and Services
File systems such as OpenAFS, OpenIO, GlusterFS, MapR FS and Alluxio have been widely used; we discuss a few open-source and proprietary file systems below.
6.1 OpenAFS
OpenAFS is a distributed file system that offers a client-server model with integrated file sharing and replicated read-only content distribution, location independence, scalability, security and migration capabilities. AFS is available for a range of heterogeneous systems such as UNIX, Linux, Mac OS X and MS Windows [5].
6.2 OpenIO
OpenIO provides an object storage solution for building hyper-scalable IT infrastructures for a wide range of applications. OpenIO has native object APIs along with SDKs for Python, C and Java, and it integrates an HTTP REST API that maintains strong compatibility with the Amazon S3 and OpenStack Swift APIs. It also offers a proprietary file system connector for accessing data stored in an OpenIO object store through file access methods. The connector is based on FUSE and presents a POSIX file system that can be shared and managed over local networks with the help of NFS, SMB and FTP [6].
6.3 Google Cloud Storage
Google Cloud Storage is a RESTful online file storage web service for storing and accessing data on Google Cloud Platform infrastructure. The service combines the performance and scalability of Google's cloud. It is an Infrastructure as a Service (IaaS) offering, comparable to the Amazon S3 online storage service [7].
6.4 MapR File System (MapR FS)
MapR FS is a clustered file system that supports a variety of interfaces, including conventional read/write access through the Network File System and a FUSE interface, for platforms such as Apache Hadoop and Spark [8]. The main characteristics of MapR FS are as follows:
• Replicated directories; no single node holds all of the metadata
• Distributed cluster metadata, with operations routed into replication chains
• Efficient use of B-trees even with extremely large directories
• Cluster partitioning without loss of consistency
• Concurrent updates without requiring global locking structures
• Rolling upgrades and online file system maintenance
7 Comparison
See Table 1.
Table 1 Comparison of various file systems on various parameters and design considerations

| File system | Performance | Scalability | Availability | Fault tolerance | Data flow | Reliability |
|---|---|---|---|---|---|---|
| NFS | Average one-way latencies of 0.027, 6.87, 13.9 ms | Scalable; allows parallel storage | Available in small and big file sizes of 100 MB, 5 GB | Can tolerate CPU failure and its state available in /var/lib/nfs | Transmission happens through TCP & UDP | Earlier versions not reliable, improvised in NFS v4 |
| HDFS | Average two-way latency of 175 s for a file size up to 50 GB | Addition or deletion of nodes on the go is possible | High availability in Hadoop 2.x to solve single failure | Creates replica of machines in different clusters | Special technique MapReduce is used for data transfer | Creates replica of data users on different machines |
| GFS | Has fixed chunks; each chunk is 64 KB block and each block has 32 bit checksum | Minimize master's involvement in file access to avoid hotspots | Partitions memory into tablets called as BigTable which allows high availability | Chunks stored in Linux system and replicated at multiple sites | Pipelining over TCP connections maintained for high-bandwidth data flow | Controls multiple replicas at different locations; ensuring reliability |
| OpenAFS | Parallel processing is not possible; average 1024 MB sized file processed per unit time | Scalable up to level of Peta Bytes; 1 GB per user: 1 PB for 1 million users | 4-bit releases of APS available from Secure Endpoints with stability issues | Replication doesn't happen but RO multiple servers are used | R/W or R/O data; mechanism to create 11 replicas of read-only data | Ensured by read-only file replication and client-side file caching |
8 Conclusion
Although many different file systems are in use around the world, we have tried to study and compare the most popular ones. Based on the comparative values above, we can conclude that HDFS has the most preferred attributes, with high performance, high availability and a strong file replication strategy for fault tolerance. GFS comes next with regard to scalability and its use of data chunks for pipelined transmission over TCP channels. As for NFS and OpenAFS, NFS is somewhat preferred because it is an older file system and is therefore considered more stable by users, but OpenAFS also provides some user-centric features, such as scalability up to the level of petabytes with 1 GB for each user. In addition, OpenAFS is an open-source version of the traditional AFS and is thus preferred by users today.
References
1. Sudha Rani, L., Sudhakar, K., Vinay Kumar, S.: Distributed file systems: a survey. Int. J. Comput. Sci. Inf. Technol. 5(3), 3716–3721 (2014)
2. Bok, K., Lim, J., Oh, H., Yoo, J.: An efficient cache management scheme for accessing small files in distributed file systems. In: 2017 IEEE International Conference on Big Data and Smart Computing (BigComp), Jeju, 2017, pp. 151–155. https://doi.org/10.1109/BIGCOMP.2017.7881731
3. Gao, Y., Gao, X., Yang, X., Liu, J., Chen, G.: An efficient ring-based metadata management policy for large-scale distributed file systems. IEEE Trans. Parallel Distrib. Syst. 30(9), 1962–1974 (2019). https://doi.org/10.1109/TPDS.2019.2901883
4. Nithya, M., Maheshwari, N.U.: Load rebalancing for Hadoop distributed file system using distributed hash table. In: 2017 International Conference on Intelligent Sustainable Systems (ICISS), Palladam, 2017, pp. 939–943. https://doi.org/10.1109/ISS1.2017.8389317
5. Available Website: https://www.openafs.org/. Last Accessed: 25th May 2020
6. Available Website: https://www.openio.io/. Last Accessed: 25th May 2020
7. Available Website: cloud.google.com/storage/. Last Accessed: 25th May 2020
8. Available Website: https://www.mapr.com/. Last Accessed: 25th May 2020
An Organized Approach for Analysis of Diabetic Nephropathy Images Using Watershed and Contrast Adaptive Thresholding
Syed Musthak Ahmed, Fahimuddin Shaik, Vinit Kumar Gunjan, and Mohammed Yasin Ali
Abstract Diabetic nephropathy is the leading cause of chronic kidney disease and a significant cause of cardiovascular mortality. Diabetic nephropathy has been divided into stages: microalbuminuria and macroalbuminuria. Pathologically, nephropathy in persons with diabetes is characterized by thickening of the glomerular and tubular basement membranes, with progressive mesangial expansion (diffuse or nodular) leading to a gradual reduction of the glomerular filtration surface. It raises the risk of death, primarily from cardiovascular causes, and in the absence of other renal disorders it is characterized by increased urinary albumin excretion (UAE). A new algorithm is proposed in this work to examine the fundamental problems present in acquired diabetic nephropathy images. By integrating a pre-processing stage (contrast enhancement, CLAHE) with a post-processing stage (cell detection segmentation), the work provides an integrated view of image processing techniques that is used to build the framework and is useful both for basic interpretation and as an educational resource for the layperson. The study also investigates the significance of less widely used measurement parameters in the clinical image analysis stage.
Keywords Diabetic nephropathy · Diabetic mellitus · Cardiovascular mortality
S. M. Ahmed, SREC, Warangal, Telangana, India
F. Shaik (corresponding author), Department of ECE, Annamacharya Institute of Technology & Sciences, Rajampet, India
V. K. Gunjan, Department of CSE, CMR Institute of Technology, Medchal, Hyderabad, India
M. Y. Ali, Department of ECE, Mewar University, Chittorgarh, Rajasthan, India
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2021. V. K. Gunjan and J. M. Zurada et al. (eds.), Modern Approaches in Machine Learning and Cognitive Science: A Walkthrough, Studies in Computational Intelligence 956, https://doi.org/10.1007/978-3-030-68291-0_6
S. M. Ahmed et al.
1 Introduction
Since human life is more precious than anything else, a great deal of effort is made today to identify diseases and their complications. Diabetes mellitus is a metabolic condition characterized by the pancreas's inability to control blood glucose levels. This disorder leads to blood glucose levels above the normal range [1]. According to the International Diabetes Federation (IDF), India is one of the six countries of the IDF SEA (South East Asia) region. Worldwide, 387 million people have diabetes, 75 million of them in the SEA region; the latter figure will grow to 123 million by 2035. In 2014 India had 66.8 million cases of diabetes. Diabetic nephropathy is the leading cause of chronic kidney disease and a major cause of cardiovascular mortality. Pathologically, nephropathy in people with diabetes is characterized by thickening of the glomerular and tubular basement membranes, with progressive mesangial expansion contributing to a steady reduction of the glomerular filtration surface. Diabetic nephropathy has been categorized into stages: microalbuminuria and macroalbuminuria. It is a pathological condition with the following characteristics: persistent albuminuria (>300 mg/d or >200 µg/min) confirmed on at least two occasions 3–6 months apart, a progressive decline in the glomerular filtration rate (GFR), and elevated blood pressure (Fig. 1). One image from [2] is used as an example to illustrate the difference between the normal and abnormal (diabetic) groups. The key differences observed are podocyte reduction and decreased mesangial cell enlargement, resulting in diabetic nephropathy. This is a high-risk situation for diabetic patients because it increases morbidity. The level of care has to be raised so that this kidney disorder can be controlled at every stage and complications can be reduced.
Fig. 1 Characteristic glomerular changes of diabetic nephropathy
The underlying aim of a large number of digital image processing applications is to extract important features from the image data, from which human viewers can obtain a description, interpretation or understanding of the scene, or to provide "better" input for further automated image processing techniques [3]. Biological vision, which is related to the field of computer vision, is one of the most important means by which humans perceive their environment, performing complex tasks such as watching, reading, identifying and classifying patterns with ease. Enhancement is a central task of digital image processing and analysis, which seeks to improve image quality in terms of human brightness perception [4].
2 Need and Importance of Research Problem
Diabetes mellitus (DM) is a metabolic condition caused by the inability of the pancreas to control the level of blood glucose. As a result, blood glucose levels fall outside the normal range. Deaths from diabetes-related causes occur around the world with great regularity. There is now an urgent need to break the link between diabetes and its complications [5]. The disorder damages the vital organs one by one, even without producing any symptoms. It can therefore be asymptomatic, which is a high-risk aspect of diabetes-related ailments. Analysis based on image recognition can be beneficial for earlier diagnosis, awareness and treatment. To build awareness among people, medical image analysis of diabetic patients with associated complications, such as diabetic nephropathy, is required [6]. This procedure can help ongoing clinical treatment to reduce the death rate among diabetic patients by identifying symptoms at an early stage [7]. Clinical diagnosis and analysis of patient images obtained by the various commonly available testing methods have drawbacks. Image segmentation techniques make it possible to further identify and extract the essential features required for accurate diagnosis [8]. These properties are very helpful in assessing the risk and severity of the complications related to DM [9]. The medical image processing is to be performed in a novel way through software modification based on image enhancement and segmentation methods [10]. The final outcome of this research is intended to help in identifying and predicting problems caused by DM [11].
3 Objectives and Scope
The objectives of the proposed research work are laid out here in a simple and clear manner; achieving these goals will lead to the advancement of society both materially and technically.
• To investigate current image enhancement (contrast enhancement methods) and segmentation (watershed and cell detection) algorithms for solving the problem, and to attempt to adapt them to the specific application.
• The survey of different algorithms is carried out by trial-and-error methods, but one needs to remember that all algorithms in image processing depend on the type of inputs provided and on requirements such as the toolboxes and simulation tools needed.
• To develop an organized methodology, as described in the title, by combining image processing techniques (enhancement and segmentation) according to the requirements, and by tabulating statistical comparisons of image characteristics.
• To explore and investigate the significance of measurement criteria that are not widely used in the field of medical imaging. These criteria must be correlated with the clinical findings in order to establish a scientific basis for evaluating a particular algorithm and for ruling out methods that are not relevant.
• To establish a methodical automated medical imaging system for detecting and predicting abnormalities in the disorder under diagnosis.
4 Plan of Work and Methodology
The methodology is laid out below as a step-by-step protocol for addressing the problem under consideration.
Image database collection related to diabetic nephropathy
The image files (CT images of patients with Myonecrosis) are gathered from online public archives and, in further work, from diabetes research institutes, together with all the clinical findings and with proper permission from the relevant authorities. The data collected must be accompanied by clinical assessment factors so that a correlation can be carried out to clearly establish that the outputs obtained are accurate. The visual data is made up of varied databases with different modalities and file characteristics.
Implementation of algorithms
The image enhancement and segmentation algorithms for which implementations are available will be considered first. The algorithms will be selected on the basis of their applicability and usefulness for the image data acquired for the investigation. Algorithms that are not available for deployment will be implemented as part of the effort.
Testing process
The algorithms will be tested on the collected visual clinical data. This process depends mainly on various image properties and on the parameters of the simulation environment chosen for experimentation. Some algorithms may need exhaustive testing to extract significant features.
Analysis
Each algorithm will be evaluated using the results of the tests. For every algorithm, a statistical analysis and a graphical comparison between a normal group (DM) and an abnormal group (DM with Myonecrosis) will be tabulated. The findings should be correlated with the clinical resources over a specified period of time, with the help of the medical profession. Relative strengths and drawbacks of each algorithm should also be considered.
Modified algorithm
One or more improvements to current image enhancement and segmentation approaches may become evident from the analysis. When this happens, a new algorithm that incorporates the changes can be developed.
Test modified algorithm
The modified algorithm will then be evaluated against the same criteria.
Description of the research work
Each block in Fig. 2 is briefly explained as follows.
• The image acquisition step reads a grayscale or color image from the file specified by its filename.
• The MATLAB function rgb2gray(RGB) converts the RGB color image to a grayscale image I. It does so by removing the hue and saturation information while retaining the luminance.
• Using a glomerular cell image, the enhancement example shows how to enhance an image to compensate for non-uniform illumination, and then how to use the enhanced image to identify the cell clearly. This makes it possible to learn about the characteristics of the cell and to compute statistics easily for all the different parts of the image.
• The function imadjust improves image contrast by mapping the input intensity values to new values such that, by default, 1% of the data is saturated at the low and high intensities of the input data.
• The function histeq performs histogram equalization. It improves image contrast by transforming the intensity values so that the histogram of the output image approximately matches a specified histogram (a uniform distribution by default).
• The function adapthisteq performs contrast-limited adaptive histogram equalization (CLAHE). Unlike histeq, it operates on small data regions (tiles) rather than on the whole image. The contrast of each tile is enhanced so that the histogram of each output region approximately matches the specified histogram (a uniform distribution by default). The contrast enhancement can be limited to avoid amplifying any noise that may be present in the image [12].
• Segmentation is a key stage in image processing: it is the point at which we switch from treating each pixel as a unit of observation to dealing with objects (or parts of objects) in the image, each consisting of several pixels. If segmentation is performed well, the other stages of image processing are simplified.
Fig. 2 Working block diagram of implemented method
• Watershed segmentation is used to separate touching objects in an image, a problem to which the watershed transform is often applied. The watershed transform finds "catchment basins" and "watershed ridge lines" in an image by treating it as a surface in which light pixels are high and dark pixels are low [13].
• Cell detection illustrates how edge detection and simple morphology are used to detect a cell. An object can be identified easily in an image when it has adequate contrast with the background. The cells in this case are cells affected by diabetic nephropathy [14].
• During this stage of analysis the visual parameters were examined, and the statistical parameters were determined using the Medical Image Processing, Analysis and Visualization (MIPAV) program.
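The pre-processing blocks described above (grayscale conversion, contrast stretching with saturation, and global histogram equalization) can be sketched in plain Python. The sketch is illustrative and is not the MATLAB toolbox code: the function names are hypothetical, images are assumed to be nested lists of 8-bit values, the luminance weights are the commonly quoted 0.2989, 0.5870 and 0.1140, and the tile-based CLAHE step is deliberately left out.

```python
def rgb_to_gray(rgb):
    """Keep luminance only, discarding hue and saturation."""
    return [[int(round(0.2989 * r + 0.5870 * g + 0.1140 * b)) for r, g, b in row]
            for row in rgb]


def stretch_contrast(gray, saturate=0.01):
    """Linearly map intensities to 0..255 after clipping a fraction at each end."""
    flat = sorted(v for row in gray for v in row)
    lo = flat[int(saturate * (len(flat) - 1))]
    hi = flat[int((1 - saturate) * (len(flat) - 1))]
    if hi == lo:
        return [row[:] for row in gray]  # flat image: nothing to stretch
    return [[int(round(255 * (min(max(v, lo), hi) - lo) / (hi - lo))) for v in row]
            for row in gray]


def equalize_histogram(gray, levels=256):
    """Global histogram equalization through the cumulative distribution."""
    hist = [0] * levels
    for row in gray:
        for v in row:
            hist[v] += 1
    cdf, running = [], 0
    for n in hist:
        running += n
        cdf.append(running)
    total = cdf[-1]
    cdf_min = next(c for c in cdf if c > 0)
    if total == cdf_min:
        return [row[:] for row in gray]  # only one intensity present
    lut = [max(0, int(round((c - cdf_min) * (levels - 1) / (total - cdf_min))))
           for c in cdf]
    return [[lut[v] for v in row] for row in gray]


gray = rgb_to_gray([[(255, 255, 255), (0, 0, 0)], [(255, 0, 0), (0, 0, 255)]])
print(gray)
print(stretch_contrast(gray))
print(equalize_histogram(gray))
```

Global equalization uses one mapping for the whole image, whereas CLAHE would compute a clipped mapping per tile and blend neighbouring tiles, which is why it amplifies noise less in flat regions.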
5 Algorithm
• Step 1: Read the original image from the database.
• Step 2: Convert the original colour image into a gray image.
• Step 3: Enhance the converted gray image using a contrast enhancement technique.
• Step 4: Subject the image enhanced in the first pass to CLAHE.
• Step 5: Find the edges of the grayscale image using the Laplacian of Gaussian method.
• Step 6: Create a linear structuring element using the strel() function.
• Step 7: Dilate the image.
• Step 8: Fill the holes in the image.
• Step 9: Clear the light structures connected to the image border.
• Step 10: Create a structuring element using the diamond function.
• Step 11: Erode the image.
• Step 12: Find the perimeter of the cell in the binary image.
• Step 13: Analyze the result using statistical values.
Algorithm description
The input image is an electron microscopy image taken from an open-source repository provided by Nikon. A glomerular cell image is used in this algorithm for experimentation and analysis. The image is usually an RGB color image. Using MATLAB, the RGB image is converted and then passed through several algorithms to obtain a better output. The RGB image is first converted to grayscale to avoid complex computations [2]. The next step is to apply contrast enhancement techniques to the gray image. Once these steps are complete, the key step, cell detection segmentation, is performed. Cell detection is an edge-based segmentation process that uses the Laplacian of Gaussian method. A structuring element is created in the next step; the image is then dilated, its holes are filled, and finally it is eroded to give the perimeter of the cell in the binary image.
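Steps 7 to 12 can be illustrated on a small binary edge mask. The sketch below is a plain-Python approximation under several assumptions: the edge mask is taken as given (the Laplacian of Gaussian step is not reproduced), a single diamond-shaped structuring element of radius 1 is used for both dilation and erosion, pixels outside the image are treated as background, border clearing is omitted, and all function names are hypothetical rather than MATLAB's.

```python
from collections import deque

CROSS = [(-1, 0), (0, -1), (0, 0), (0, 1), (1, 0)]  # diamond structuring element, radius 1


def dilate(mask, offsets=CROSS):
    """Set every pixel reachable from a foreground pixel via the structuring element."""
    h, w = len(mask), len(mask[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if mask[y][x]:
                for dy, dx in offsets:
                    if 0 <= y + dy < h and 0 <= x + dx < w:
                        out[y + dy][x + dx] = 1
    return out


def erode(mask, offsets=CROSS):
    """Keep a pixel only if the whole structuring element fits inside the foreground."""
    h, w = len(mask), len(mask[0])
    return [[int(all(0 <= y + dy < h and 0 <= x + dx < w and mask[y + dy][x + dx]
                     for dy, dx in offsets)) for x in range(w)] for y in range(h)]


def fill_holes(mask):
    """Flood the background from the border; anything not reached becomes foreground."""
    h, w = len(mask), len(mask[0])
    outside = [[False] * w for _ in range(h)]
    queue = deque((y, x) for y in range(h) for x in range(w)
                  if (y in (0, h - 1) or x in (0, w - 1)) and not mask[y][x])
    for y, x in queue:
        outside[y][x] = True
    while queue:
        y, x = queue.popleft()
        for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            ny, nx = y + dy, x + dx
            if 0 <= ny < h and 0 <= nx < w and not mask[ny][nx] and not outside[ny][nx]:
                outside[ny][nx] = True
                queue.append((ny, nx))
    return [[0 if outside[y][x] else 1 for x in range(w)] for y in range(h)]


def perimeter(mask):
    """Foreground pixels with at least one 4-neighbour that is background or off-image."""
    h, w = len(mask), len(mask[0])
    return [[int(bool(mask[y][x]) and any(
        not (0 <= y + dy < h and 0 <= x + dx < w and mask[y + dy][x + dx])
        for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1))))
        for x in range(w)] for y in range(h)]


edges = [[0] * 7 for _ in range(7)]
for y, x in [(2, 2), (2, 3), (2, 4), (3, 2), (3, 4), (4, 2), (4, 3), (4, 4)]:
    edges[y][x] = 1  # a closed outline around pixel (3, 3)
cell = erode(fill_holes(dilate(edges)))
print(sum(map(sum, cell)), sum(map(sum, perimeter(cell))))
```

Dilating before filling closes small gaps in the detected outline, and the final erosion shrinks the filled object back towards its original extent.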
Statistical Analysis using MIPAV
The MIPAV program supports quantitative analysis and visualization of medical images from various modalities such as PET, MRI, CT or microscopy. Using MIPAV's standard user interface and analysis tools, researchers at remote sites can conveniently share research data and analyses over the internet, thereby improving their ability to study, diagnose, monitor and treat medical disorders. Furthermore, MIPAV allows researchers to perform statistical analyses on masked and contoured ROIs (regions of interest).
Statistical analysis report
The effectiveness of the applied methodology was assessed thoroughly using quality metrics such as area, perimeter, median, standard deviation of intensity and coefficient of skewness.
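To make the listed metrics concrete, the following sketch computes area, median, standard deviation of intensity and a coefficient of skewness for the pixels inside a region of interest. It does not reproduce MIPAV's code: the function name `roi_statistics` is hypothetical, and the population standard deviation and the moment-based skewness coefficient are assumptions, since the exact estimators used by the tool are not stated here.

```python
import statistics


def roi_statistics(gray, mask):
    """Summarize the intensities of the pixels selected by a binary ROI mask."""
    values = [gray[y][x]
              for y in range(len(gray))
              for x in range(len(gray[0]))
              if mask[y][x]]
    n = len(values)
    mean = statistics.mean(values)
    std = statistics.pstdev(values)  # population standard deviation (assumed)
    # Moment coefficient of skewness: third central moment divided by std cubed.
    skew = 0.0 if std == 0 else sum((v - mean) ** 3 for v in values) / (n * std ** 3)
    return {"area": n, "median": statistics.median(values), "std": std, "skewness": skew}


gray = [[1, 1], [4, 9]]
mask = [[1, 1], [1, 0]]
print(roi_statistics(gray, mask))
```

A positive skewness, as in this toy region, indicates a few bright pixels pulling the intensity distribution to the right, while a value near zero indicates a roughly symmetric distribution.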
6 Results and Analysis
The images shown below are electron microscopic images of glomerular cells. Figure 3a, b are the images of a healthy cell and a diabetes-affected cell, respectively. The subsequent images show the healthy and the affected cell processed with the watershed method.
6.1 Watershed Method
Figure 3a, b are the original electron microscopic images of the healthy and the diseased condition, respectively. By visual inspection of the first image we can note that the basement membrane is not thickened and that the cells of the macula densa are not enlarged. In Fig. 3b, it is obvious from visual observation that the basement membrane is marginally thickened and that the macula densa cells (black) are also denser.
Fig. 3 a Original image showing the healthy condition of the basement membrane (Courtesy: Visuals Unlimited, Inc., US. Credits: Dr. John D. Cunningham), b original image showing mild expansion of the basement membrane (Courtesy: PathologyOutlines.com, Inc. Reviewer: Nikhil Sangle, M.D.)
Fig. 4 a Grey converted image of Fig. 3a, b grey converted image of Fig. 3b
Figure 4a, b are the grayscale images converted from the original color images of Fig. 3a, b respectively, and Fig. 5 shows their histograms. Figure 6 shows the contrast-enhanced output obtained from the grayscale images: Fig. 6a is the healthy cell condition and Fig. 6b is the affected cell condition. Figure 7 shows the histograms of Fig. 6a, b respectively. Figure 8 shows the outputs of gradient magnitude filtering. The remaining images are the outputs of the watershed transformation technique applied to the glomerular cells (Figs. 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19 and 20).
Fig. 5 a Histogram of Fig. 4a, b histogram of Fig. 4b
Fig. 6 a Contrast enhanced image, b contrast enhanced image
Fig. 7 a Histogram of Fig. 6a, b histogram of Fig. 6b
Fig. 8 a Gradient magnitude filtered image, b gradient magnitude filtered image (panel title: Gradient magnitude (gradmag))
Fig. 9 a Watershed transform of the gradient image (Fig. 8a), b watershed transform of the gradient image (Fig. 8b) (panel title: Watershed transform of gradient magnitude (Lrgb))
Fig. 10 a Morphological opening reconstructed image, b morphological opening reconstructed image (panel title: Opening-by-reconstruction (Iobr))
Fig. 11 a Morphological closed image, b morphological closed image (panel title: Opening-closing (Ioc))
Fig. 12 a Opening-closing by reconstruction, b opening-closing by reconstruction (panel title: Opening-closing by reconstruction (Iobrcbr))
Fig. 13 a Image showing regional maxima of opening-closing by reconstruction, b image showing regional maxima of opening-closing by reconstruction (panel title: Regional maxima of opening-closing by reconstruction (fgm))
Fig. 14 a Image showing regional maxima superimposed on original image, b image showing regional maxima superimposed on original image (panel title: Regional maxima superimposed on original image (I2))
Fig. 15 a Image showing modified regional maxima superimposed on original image, b image showing modified regional maxima superimposed on original image (panel title: Modified regional maxima superimposed on original image (fgm4))
Fig. 16 a Image of thresholded opening-closing by reconstruction, b image of thresholded opening-closing by reconstruction (panel title: Thresholded opening-closing by reconstruction (bw))
Fig. 17 a Image showing watershed ridge lines, b image showing watershed ridge lines (panel title: Watershed ridge lines (bgm))
Fig. 18 a Image with markers and object boundaries superimposed on original image, b image with markers and object boundaries superimposed on original image (panel title: Markers and object boundaries superimposed on original image (I4))
Fig. 19 a Colored watershed label matrix, b colored watershed label matrix (panel title: Colored watershed label matrix (Lrgb))
Fig. 20 a Output image showing RGB superimposed transparently on original image, b output image showing RGB superimposed transparently on original image
7 Conclusion
A concise overview of the existing procedure and the proposed procedure, together with a simple description of diabetic myonecrosis, is provided here. The results presented in this report are those of the existing method. The experimental findings of the new approaches are expected to be well suited to predicting anomalies in patients with diabetic myonecrosis. The programming environment chosen is the MATLAB technical computing language (R2010a or later), using the image acquisition, image processing, fixed-point and neural network toolboxes. Algorithms available on other platforms, for example the MIPAV medical imaging application, may be used if their results help to solve the problem.
References
1. Sharifi, A., Vosolipour, A., Aliyari, Sh, M., Teshnehlab, M.: Hierarchical Takagi-Sugeno type fuzzy system for diabetes mellitus forecasting. In: Proceedings of 7th International Conference on Machine Learning and Cybernetics, Kunming, vol. 3, pp. 1265–1270, 12–15 July 2008
2. Riphagen, I.J., Lambers, H.J., Heerspin, K., Rijk, O.B.G., Carlo, A.J.M.: Gaillard diseases of renal microcirculation: diabetic nephropathy. In: Living Reference Work Entry PanVascular Medicine, pp. 1–34. Springer Publications, July 2014
3. Peres, F.A., Oliveira, F.R., Neves, L.A., Godoy, M.F.: Automatic segmentation of digital images applied in cardiac medical images. In: IEEE-PACHE, Conference, Workshops, and Exhibits Cooperation, Lima, PERU, Mar 15–19, 2010
4. Intajag, S., Tipsuwanporn, V., Chatthai, R.C.: Retinal image enhancement in multi-mode histogram. In: 2009 World Congress on Computer Science and Information Engineering, vol. 4, pp. 745–749, Mar 2009
5. Kumar, A., Shaik, F.: Image Processing in Diabetic Related Causes. Springer-Verlag Singapore Publishers (Springer Briefs in Applied Sciences and Technology-Forensics and Medical Bioinformatics), May 2015. ISBN: 978-981-287-623-2
6. Shaik, F., Sharma, A.K., Ahmed, S.M., Gunjan, V.K., Naik, C.: An improved model for analysis of diabetic Retinopathy related imagery. Indian J. Sci. Technol. 9(44) (2016). ISSN: 0974-6846
7. Gonzalez, R.C., Woods, R.E.: Digital Image Processing. An Imprint of Pearson Education, 1st edn. Addison-Wesley
8. Sajad, A., Hayath, B.P.: Diabetic Cardiomyopathy: Mechanisms, Diagnosis, and Treatment. Department of Cardiology Northwick Hospital, UK (2004)
9. Omar, A., Sunni, A.A.L., Withers, S.: Diabetic Cardiomyopathy. The Manchester Heart Centre, UK (2009)
10. Ravindraiah, R., Shaik, F.: Detection of exudates in diabetic retinopathy images. In: National Conference on Future Challenges and Building Intelligent Techniques in Electrical and Electronics Engineering (NCEEE ‘10), pp. 363–368, Chennai, INDIA, July 2010
11. Shaik, F., Sharma, A.K., Ahmed, S.M.: Detection and analysis of diabetic myonecrosis using an improved hybrid image processing model. In: IEEE International Conference on Advances in Electrical, Electronics, Information, Communication and Bioinformatics-2016 (AEEICB2016 to be published in IEEE Explore) at Prathyusha Institute of Technology and Management, 27–28th February 2016. ISBN: 978-1-4673-9745-2
12. Askew, D.A., Crossland, L., Ware, R.S., Begg, S., Cranstoun, P., Mitchell, P., Jackson, C.L.: Diabetic retinopathy screening and monitoring of early stage disease in general practice: design and methods. Contemp. Clin. Trials 33(5), 969–975 (2012)
13. Shaik, F., Sharma, A.K., Ahmed, S.M.: Hybrid model for analysis of abnormalities in diabetic cardiomyopathy and diabetic retinopathy related images. Springer Plus Journal, Springer Publications, Apr 2016. ISSN: 2193-1801
14. Peto, T., Tadros, C.: Screening for diabetic retinopathy and diabetic macular edema in the United Kingdom. Curr. Diab. Rep. 12(4), 338–345 (2012)
A Literature Survey on Identification of Asthma Using Different Classifier and Clustering Techniques
Syed Musthak Ahmed, Fahimuddin Shaik, Vinit Kumar Gunjan, and Mohammed Yasin Ali
Abstract Asthma is a disorder that affects the lungs, the organs that allow us to breathe, and it is among the most common diseases worldwide, particularly in India. This work highlights the problems posed by lung diseases, such as the difficulty of classifying the disease in radiography. Various techniques for asthma detection are found in the literature, and several researchers have contributed findings on asthma prediction. Detecting asthma at an early stage is essential and is an active research area in medical image processing. To this end, we survey regression models, k-means clustering, hierarchical algorithms, classifiers and deep learning techniques in order to find the best classifier for lung disease detection. The surveyed papers mostly deal with the prevailing lung cancer detection techniques available in the literature. The survival of patients becomes more likely if the disease is detected and diagnosed in time. Support vector machine (SVM), k-nearest neighbor (KNN) and random forest classifiers help in the detection of lung masses. A number of techniques have been introduced to improve detection efficiency. Applications such as support vector machines, neural networks and image processing techniques, which are widely used for asthma detection, are explained in this work.
Keywords DWT · Wavelet packet transform · ANN · Regression models · K-nearest neighbor · Random forest · Decision tree · Support vector machine · K-means · Hierarchical and deep learning techniques
S. M. Ahmed, Department of ECE, SREC, Warangal, Telangana, India
F. Shaik (B), Department of ECE, Annamacharya Institute of Technology & Sciences, Rajampet, India
V. K. Gunjan, Department of CSE, CMR Institute of Technology, Medchal, Hyderabad, India
M. Y. Ali, Department of ECE, Mewar University, Chittorgarh, Rajasthan, India
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2021 V. K. Gunjan and J. M. Zurada et al. (eds.), Modern Approaches in Machine Learning and Cognitive Science: A Walkthrough, Studies in Computational Intelligence 956, https://doi.org/10.1007/978-3-030-68291-0_7
1 Introduction
Auscultation is one of the most important non-invasive and simple diagnostic tools for detecting disorders of the respiratory tract such as lung diseases. Despite its usefulness, a stethoscope gives only a limited and subjective impression of respiratory sounds. The drawbacks of listening through a stethoscope are its dependence on the human ear and its inability to provide an objective study of the detected respiratory sounds. The approach also lacks sufficient sensitivity and has no adequate classification scheme. Besides the fact that normal and abnormal lung sounds (LS) are mixed in the airways, which makes the classification of respiratory conditions difficult, the quasi-periodic heart sounds (HS) produced by heartbeat activity interfere with LS and thereby mask or hinder clinical interpretation of LS, particularly over the low-frequency components. The main frequency components of HS lie in the range 20–100 Hz, which is the band in which LS has major components. Because HS and LS overlap in frequency and are both non-stationary, the main problem in separating HS from LS is doing so without tampering with the characteristic features of the LS. Auscultation is the clinical term for listening to sounds that occur inside organs such as the lungs, and physicians usually perform it with a stethoscope. Auscultation provides direct information about lung function [1] and supports close patient-physician interaction [2]. Because of these benefits, auscultation is considered a valuable tool; nevertheless, it has significant limitations and drawbacks.
Several factors affect auscultation, including the response of the stethoscope and external noise [3]. A stethoscope can be unreliable in noisy environments such as an ambulance or a busy emergency room, and a correct diagnosis also requires considerable training and experience on the part of the clinician [4]. In the past 20 years, much research has been conducted on computer-based respiratory sound analysis. A large part of this research covers acquisition, filtering, feature extraction, spectral analysis and classification of respiratory sounds [5]. In the literature, frequency analysis approaches such as Fourier-based methods, parametric methods such as autoregressive (AR) models, and time-frequency analysis approaches such as wavelet transforms are mainly used to analyze respiratory sounds [6]. For classifying these sounds, machine learning algorithms such as artificial neural networks and k-nearest neighbor (k-NN) are commonly used. In this study, asthmatic and normal respiratory sound signals are recorded from people with asthma of varying severity and from healthy subjects. The signals are then separated into inhalation and exhalation sound signals. Each inhalation and exhalation recording contains more than one, and a varying number of, breath cycles. For this reason, and because the number of respiratory sound recordings is limited, the recorded sounds are divided into segments, so that each segment contains one breath cycle of a single type, either inhalation or exhalation. Each sound segment is evaluated and processed as a separate sample. High-pass filtering of lung-sound recordings to reduce heart sounds would also remove significant components of the lung sounds. Filtering techniques are classified as linear adaptive filters and filters using time-frequency based approaches, and a variety of filtering schemes exist within these two categories. In [2], a recursive least squares (RLS) based adaptive noise cancellation (ANC) filtering technique is proposed to separate or reduce HS from LS; a band-pass filtered version of the recorded HS was used as the reference signal. Time-frequency (TF) filtering techniques have likewise been proposed for HS reduction in LS. Methods for heart sound localization are presented alongside the studies of heart-sound cancellation. The same researchers confirm that the adaptive filter can be considerably more effective at reducing noise in time series data than linear filters, wavelet shrinkage and chaos-based noise reduction schemes. According to Choi et al. [7], the main approach to reducing HS effects is to use a high-pass filter with a cut-off frequency between 50 and 150 Hz. More sophisticated techniques for reducing HS in breath sound recordings have been described in the literature, including adaptive filtering, wavelet denoising, and a combination of HS localization and removal with LS prediction.
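The RLS-based adaptive noise cancellation idea mentioned above can be illustrated with a minimal sketch. It is not the implementation of the cited work: the function name `rls_noise_canceller`, the two-tap filter, the forgetting factor and the synthetic signals are all assumptions chosen for illustration. The primary input stands for a lung sound contaminated by heart sound, and the reference input stands for a signal correlated with the heart sound only.

```python
import math
import random


def rls_noise_canceller(primary, reference, taps=2, lam=1.0, delta=0.01):
    """Adaptive noise cancellation with recursive least squares (RLS).

    primary   : lung sound contaminated by heart sound (LS + HS)
    reference : signal correlated with the heart sound only
    Returns (cleaned signal, final filter weights).
    """
    w = [0.0] * taps
    # Inverse correlation matrix estimate, initialised to I / delta.
    P = [[(1.0 / delta if i == j else 0.0) for j in range(taps)] for i in range(taps)]
    cleaned = []
    for n in range(len(primary)):
        # Most recent `taps` reference samples (zero before the start).
        x = [reference[n - k] if n - k >= 0 else 0.0 for k in range(taps)]
        Px = [sum(P[i][j] * x[j] for j in range(taps)) for i in range(taps)]
        denom = lam + sum(x[i] * Px[i] for i in range(taps))
        gain = [v / denom for v in Px]
        # A priori error: primary minus the current heart-sound estimate.
        err = primary[n] - sum(w[i] * x[i] for i in range(taps))
        w = [w[i] + gain[i] * err for i in range(taps)]
        P = [[(P[i][j] - gain[i] * Px[j]) / lam for j in range(taps)] for i in range(taps)]
        cleaned.append(err)
    return cleaned, w


# Synthetic demonstration (not real recordings).
rng = random.Random(0)
n_samples = 2000
heart_ref = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
lung = [math.sin(2 * math.pi * 0.03 * n) for n in range(n_samples)]
mixed = [lung[n] + 0.8 * heart_ref[n] + (0.3 * heart_ref[n - 1] if n > 0 else 0.0)
         for n in range(n_samples)]
cleaned, weights = rls_noise_canceller(mixed, heart_ref)
```

In this toy setup the weights should approach the mixing coefficients 0.8 and 0.3, so the error signal `cleaned` approximates the lung-sound component. Real recordings would need a longer filter and a forgetting factor below 1 to track non-stationary heart sounds.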
Crackles, on the other hand, are short, explosive and discontinuous sounds, shorter than 100 ms, that usually occur during inspiration. They are characterized by a rapid initial pressure deflection followed by a short oscillation. These abnormal sounds are classified as fine crackles and coarse crackles according to their duration: fine crackles last less than 10 ms and coarse crackles last more than 10 ms. Crackles are a qualitative diagnostic tool and are generally produced either by explosive openings of lung regions deflated to residual volume, caused by sudden equalization of gas pressure during inspiration, or by a change in elastic stress resulting from the sudden opening of closed airways. However, the instrument used for auscultation, the stethoscope, often lacks a reliable response in the acquisition of lung sounds. Essentially, it is only a sound conduit between the body surface and the ears. Its frequency response is rarely measured, rated or compared, and instruments are usually chosen for their appearance, reputation and poorly supported claims of performance rather than for their technical characteristics. Typically, the frequency response of the stethoscope favors low frequencies, amplifying those below 112 Hz and attenuating higher frequencies [8]. The stethoscope is therefore inadequate for auscultating pulmonary sounds, which can have frequency components well above 112 Hz. Since heart sounds consist mainly of low frequencies, they cause considerable interference when the stethoscope is used to auscultate lung sounds and are a source of confusion in the auscultation of respiratory sounds. In clinical practice, several difficulties arise during examination, such as a difference in sensitivity between the ears, the clinician's skill in identifying lung sounds, and the presence of external and internal noise. These can cause errors in recognizing a sound as pathological or normal and so reduce the accuracy of the diagnosis.
2 Study on Preprocessing and Feature Selection in Asthma Disease
2.1 Pre-processing Strategy
The main objective of pre-processing is to improve image quality by reducing or removing the irrelevant parts of the images. Noise and other high-frequency components are removed by filters, which prepares the datasets for further processing. The main motivation behind a lung cancer detection system is to help radiologists and doctors reach an accurate decision quickly [9].
2.2 Feature Selection
Feature selection plays a key role in many pattern recognition problems such as image classification. The feature extraction techniques are based on ‘Fourier transform, linear predictive coding, wavelet transform and Mel-frequency cepstral coefficients (MFCC) together with classification methods based on vector quantization, Gaussian mixture models (GMM) and artificial neural networks, using receiver operating characteristic curves’. An optimal threshold to separate the wheezing class from the normal class is proposed, and a post-processing filter is added to improve classification accuracy. Results show that the approach based on MFCC coefficients combined with GMM gives good classification of respiratory sounds into normal and wheeze classes. McNemar's test showed significant differences among the various classifiers (p < 0.05) [10].
2.3 Feature Selection Methods
Feature selection [11–14] is a pre-processing technique used in machine learning to remove unnecessary attributes and so increase learning accuracy. Feature selection does not only imply cardinality reduction; it also concerns choosing attributes according to the presence or absence of interaction between the attributes and the classification algorithm. This means that the modeling tool selects or discards attributes depending on their usefulness for the analysis. Large amounts of data also pose a problem for the learning task, and feature selection becomes essential for high-dimensional data. In the presence of many irrelevant features, some of which add little value during learning, the learning models are likely to become ‘computationally complex, overfit, become less comprehensible and decrease learning accuracy’. Feature selection is one effective way to identify relevant features for dimensionality reduction. However, this advantage comes with the extra effort of finding an optimal subset that is a true representation of the original dataset. Feature selection methods are classified as filter methods, wrapper methods, embedded methods and hybrid methods, as shown in Fig. 1.
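As an illustration of the filter category named above, the following minimal sketch ranks features by the absolute Pearson correlation between each feature column and the class label, independently of any classifier. The scoring criterion, the function names `pearson` and `rank_features`, and the toy data are assumptions made for this example; the surveyed papers may use other filter criteria.

```python
import math


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either input is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def rank_features(rows, labels):
    """Filter method: score each feature column by |correlation with the label|."""
    n_features = len(rows[0])
    scores = []
    for j in range(n_features):
        column = [row[j] for row in rows]
        scores.append((abs(pearson(column, labels)), j))
    # Highest score first; ties are broken by column index.
    return [j for score, j in sorted(scores, key=lambda t: (-t[0], t[1]))]


# Toy data: column 0 tracks the label, column 1 is constant, column 2 is weakly related.
rows = [[0.1, 5.0, 1.0], [0.2, 5.0, 0.0], [0.9, 5.0, 1.0], [1.0, 5.0, 1.0]]
labels = [0, 0, 1, 1]
ranking = rank_features(rows, labels)
```

A wrapper method would instead retrain a classifier on candidate subsets and keep the subset with the best validation score, which is more expensive but accounts for interaction with the learning algorithm.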
3 Survey on Different Classifier of Asthma Using Neural Networks
Reference data (from a lung physician) are collected from 60 subjects using a 4-channel data acquisition system (DAS). The lung sounds of the 60 subjects, 30 healthy and 30 with asthma, are recorded at four different positions over the chest. A spectral sub-band based feature extraction scheme is applied with artificial neural network (ANN) and support vector machine (SVM) classifiers. The power spectral density of each lung sound cycle is estimated with Welch's method and divided into uniform sub-bands. A set of mathematical features is computed from each sub-band and applied to the SVM and ANN to classify normal and asthmatic subjects [8]. This system is represented in Fig. 2 (Table 1).
Fig. 1 Feature selection methods
Fig. 2 ANN using asthma
Table 1 Different types of classifiers
Classifier/Techniques | Name of classifiers | Additional functions | Uses | Accuracy (%)
CNN | Convolution neural networks | MFCCs statistics-based classification (KNN, SVM, GMM) | Image recognition, self-driving vehicles | 70
ARNN | Attractor recurrent neural network | ARNN with fuzzy functions (FFs-ARNN) | Solves the particular task, image classification | 85.2
DWT | Discrete wavelet transform | Wavelet packet transform (WPT) with artificial neural networks | Signal coding, data compression, noise reduction | 90
DFT | Discrete Fourier transform | DFT with artificial neural networks | Image processing, signal conversion, image classification | 74
ANN | Artificial neural networks | Spirometry (SPIR) and impulse oscillometry system (IOS) | Speech recognition, language generation, text classification, image classification | 98.85
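The sub-band feature extraction described in Sect. 3 can be sketched as follows. This is only an illustration under stated assumptions: the segment length, overlap, Hann window, number of sub-bands and the choice of mean power as the per-band feature are hypothetical, and the power values are computed only up to a constant scale factor. The function names `welch_psd` and `subband_features` are not taken from the cited work.

```python
import cmath
import math


def welch_psd(signal, seg_len=64, overlap=32):
    """Welch estimate: average Hann-windowed periodograms of overlapping segments.

    Returns power for bins 0..seg_len//2, up to a constant scale factor.
    """
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / (seg_len - 1)) for n in range(seg_len)]
    n_bins = seg_len // 2 + 1
    psd = [0.0] * n_bins
    count = 0
    step = seg_len - overlap
    for start in range(0, len(signal) - seg_len + 1, step):
        seg = [signal[start + n] * window[n] for n in range(seg_len)]
        for k in range(n_bins):
            # Direct DFT of the windowed segment at bin k.
            coeff = sum(seg[n] * cmath.exp(-2j * math.pi * k * n / seg_len)
                        for n in range(seg_len))
            psd[k] += abs(coeff) ** 2
        count += 1
    return [p / count for p in psd]


def subband_features(psd, n_bands=4):
    """Mean power in uniform sub-bands (the DC bin is skipped)."""
    bins = psd[1:]
    width = len(bins) // n_bands
    return [sum(bins[b * width:(b + 1) * width]) / width for b in range(n_bands)]


# Synthetic 300 Hz tone sampled at 1600 Hz (bin spacing 25 Hz, so the tone sits in bin 12).
fs = 1600
tone = [math.sin(2 * math.pi * 300 * n / fs) for n in range(512)]
features = subband_features(welch_psd(tone))
strongest_band = features.index(max(features))
```

The resulting feature vector (one value per sub-band) is what would be passed to a classifier such as an SVM or ANN. In practice a fast Fourier transform would replace the direct DFT used here for self-containment.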
4 Brief Review of Clustering Techniques
Various data mining algorithms are used for clustering. In this section, two clustering techniques are presented: k-means and hierarchical clustering [15]. Because these techniques do not use training data, they need no extra time for training on labeled data, and one of their advantages is that they take less time. A drawback is that they do not take spatial information into account; consequently, they are sensitive to noise and intensity inhomogeneities.
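A minimal sketch of k-means (Lloyd's algorithm) shows why no labeled training data are needed: the procedure alternates between assigning points to the nearest centroid and recomputing centroids. Taking the first k points as initial centroids and running a fixed number of iterations are simplifying assumptions for this example, and the data are synthetic.

```python
def kmeans(points, k, iterations=20):
    """Lloyd's k-means; the first k points are the initial centroids (an assumption)."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        for i, p in enumerate(points):
            dists = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
            labels[i] = dists.index(min(dists))
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Two well-separated synthetic groups.
points = [(0, 0), (10, 10), (0, 1), (10, 11), (1, 0), (11, 10)]
labels, centroids = kmeans(points, k=2)
```

Hierarchical clustering differs in that it builds a tree of nested clusters, for example by repeatedly merging the two closest clusters, instead of fixing k in advance.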
4.1 K-Nearest Neighbor Classifier
In the k-nearest neighbor (KNN) method, the nearest neighbors are determined with respect to the value of k, which specifies how many nearest neighbors are examined to decide the class of a sample data point [16]. Nearest neighbor techniques are divided into two categories: structure-based KNN and structure-less KNN. The structure-based technique deals with the basic structure of the data, in which fewer mechanisms are associated with the training samples [17]. In the structure-less technique, the data are divided into sample data points and training data. The distance from the sample point to every training point is computed, and the point with the smallest distance is taken as the nearest neighbor [18]. KNN is a simple algorithm that stores all available cases and classifies new cases according to a similarity measure. The KNN algorithm is also known as:
• Case-based reasoning
• K nearest neighbor
• Example-based reasoning
• Instance-based learning
• Memory-based reasoning
• Lazy learning [19].
KNN algorithms have been used since 1970 in applications such as statistical estimation and pattern recognition. KNN is a non-parametric classification method that is broadly divided into two types:
• Structure-less NN techniques
• Structure-based NN techniques.
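The structure-less variant described in Sect. 4.1 can be sketched as a brute-force search: compute the distance from the query to every stored training point, take the k closest, and return the majority label. Euclidean distance, k = 3 and the two-feature toy samples are assumptions for illustration only.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Structure-less KNN: brute-force distances to every stored training point."""
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    # Majority vote among the k nearest neighbors.
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


# Hypothetical two-feature samples labelled "normal" or "asthma".
train_points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (3.0, 3.2), (3.1, 2.9), (2.8, 3.0)]
train_labels = ["normal", "normal", "normal", "asthma", "asthma", "asthma"]
prediction = knn_predict(train_points, train_labels, (2.9, 3.1))
```

Because all training cases are stored and no model is fitted, the cost is paid at prediction time, which is why KNN is called lazy learning. Structure-based variants reduce that cost by organizing the training data, for example in a tree.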
4.2 Support Vector Machine (SVM)
In machine learning, support vector machines (SVMs) are ‘supervised learning models with associated learning algorithms that analyze data and recognize patterns’, used for classification and regression analysis. The basic SVM takes a set of input data and predicts, for each given input, which of two possible classes it belongs to, making it a non-probabilistic binary linear classifier. Given a set of training examples, each marked as belonging to one of two categories, an SVM training algorithm builds a model that assigns new examples to one category or the other. An SVM model represents the examples as points in space, mapped so that the examples of the separate categories are divided by a clear gap that is as wide as possible. New examples are then mapped into the same space and assigned to a category according to the side of the gap on which they fall [20]. SVM is considered part of neural networks, machine learning, data processing and pattern recognition. Figure 3 represents its main idea: an SVM algorithm can be seen as the task of separating two classes, positive and negative, in feature space. The main problem is to find a hyperplane that separates those classes effectively with maximal margin. The operation of an SVM is described in two steps: ‘nonlinear mapping of an input vector into a high-dimensional feature space that is hidden from both the input and output’ and ‘construction of an optimal hyperplane for separating the features found in step 1’.
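The maximal-margin idea can be illustrated with a minimal linear SVM trained by batch sub-gradient descent on the regularised hinge loss. This is a simplified stand-in: practical SVM libraries solve the problem with dedicated optimizers and support kernels for the nonlinear mapping step. The learning rate, regularisation constant, epoch count and toy data are assumptions.

```python
def train_linear_svm(points, labels, lam=0.01, lr=0.1, epochs=200):
    """Batch sub-gradient descent on the regularised hinge loss; labels are +1 or -1."""
    dim = len(points[0])
    w = [0.0] * dim
    b = 0.0
    n = len(points)
    for _ in range(epochs):
        grad_w = [lam * wi for wi in w]
        grad_b = 0.0
        for x, y in zip(points, labels):
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            if margin < 1:  # inside the margin or misclassified
                for i in range(dim):
                    grad_w[i] -= y * x[i] / n
                grad_b -= y / n
        w = [wi - lr * gi for wi, gi in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def svm_predict(w, b, x):
    """The class is given by the side of the hyperplane w.x + b = 0."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1


# Linearly separable toy data.
points = [(2, 2), (3, 3), (2, 3), (-2, -2), (-3, -3), (-2, -3)]
labels = [1, 1, 1, -1, -1, -1]
w, b = train_linear_svm(points, labels)
```

Only points with margin below 1 contribute to the update, which mirrors the fact that the separating hyperplane is determined by the examples closest to the gap (the support vectors).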
4.3 Decision Tree Classifier
Decision tree algorithms are among the most commonly used methods in classification [21]. A decision tree provides an easily understandable modeling technique and also simplifies the classification process. It is a transparent tool that lets users follow the tree structure and readily see how a decision is made. Decision trees derive from the basic divide-and-conquer algorithm. In these tree structures, ‘leaves represent classes and branches represent conjunctions’ of features that lead to those classes. At each internal node of the tree, the attribute that best divides the data into the various classes is chosen. To predict the class label of an input, a path from the root to a leaf is followed according to the value of the predicate at each internal node visited. The most frequently used decision tree algorithms are ID3 [22] and C4.5 [23]. Another, used for microarray data analysis, is the random forest [24], which consists of an ensemble of classification trees. In [25], good performance of random forests on noisy and multi-class microarray data is demonstrated. Decision trees are commonly used in operations research, specifically in decision analysis, to help identify the strategy most likely to reach a goal (Table 2).
Table 2 Comparison of the classifiers and their attributes
Classifiers | Advantages | Issues | Applications
DT (ID3 and C4.5) [21] | Easy to understand; non-parametric approach | Multi-valued attributes; noisy data classification | Image classification; decision making
K NN [17] | Simple implementation; robust; no training sets needed | Time requirement; KNN scaling over multimedia datasets | Data mining; text mining; image and text classification
SVM [20] | High accuracy; flexible selection of kernels for nonlinearity | Low sparse SVM classifier; multi-label classification | Text classification; image classification
RF [26] | Fast; scalable; robust to noise | Complexity; needs more computational resources; time-consuming prediction process | Discovering clusters of patients; classification of microarray data; object detection
Naïve Bayes | Ability to interpret the problem in terms of structural relationships among predictors; no free parameters to be set | Data scarcity; assumption of independent predictors | Document classification; medical diagnostic systems
CNN [27] | Feature learning; weight sharing | High computational cost; needs training data sets | Image recognition; image classification
RNN [28] | Can model time series; can use various network models | Activation functions are difficult to use; not suitable for long-duration operation | Speech recognition; image processing
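The attribute choice made at each internal node, described in Sect. 4.3, can be illustrated with the information gain criterion used by ID3 (C4.5 uses the related gain ratio). The sketch below computes the entropy reduction obtained by splitting on a categorical attribute; the attribute names and the four toy samples are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def information_gain(values, labels):
    """Entropy reduction from splitting `labels` by a categorical attribute."""
    total = len(labels)
    groups = {}
    for v, lab in zip(values, labels):
        groups.setdefault(v, []).append(lab)
    # Weighted entropy of the subsets produced by the split.
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


# Hypothetical attributes for four patients.
labels = ["asthma", "asthma", "normal", "normal"]
wheeze = ["yes", "yes", "no", "no"]      # separates the classes perfectly
gender = ["m", "f", "m", "f"]            # carries no class information
gain_wheeze = information_gain(wheeze, labels)
gain_gender = information_gain(gender, labels)
```

A tree builder would place the attribute with the larger gain (here `wheeze`) at the node and then repeat the same computation on each resulting subset, which is the divide-and-conquer step mentioned in the text.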
5 Conclusion
In this survey, we have presented a brief review of asthma detection using different neural network classifiers and its recent significance. We have also analyzed several ways of identifying asthma with different types of neural networks and techniques. An outline of the literature on techniques that can be used for classification and analysis of asthma, and on signal processing, is also discussed. Ongoing research is moving towards new methods that can be applied for more accurate asthma detection and classification. We have also examined, from various perspectives, the challenges encountered in asthma identification. In our study, we analyze various methods for asthma classification, such as the k-nearest neighbor (K-NN) classifier, Gaussian mixture model (GMM), discrete wavelet transform (DWT) and support vector machine (SVM). Comparing these methods, complex convolutional neural networks (CCNNs) may give better classification than the other methods.
References
1. Harper, P., Kraman, S.S., Pasterkamp, H., Wodicka, G.R.: An acoustic model of the respiratory tract. IEEE Trans. Biomed. Eng. 543–550 (2001)
2. Shareef, N., Wang, D.L., Yagel, R.: Segmentation of medical images using LEGION. IEEE Trans. Med. Imaging 18(1), 74–91 (1999)
3. Gudmundsson, M., El-Kwae, E.A., Kabuka, M.R.: Edge detection in medical images using a genetic algorithm. IEEE Trans. Med. Imaging 17(3), 469–474 (1998)
4. Sonka, M., Grunkin, M.: Image processing and analysis in drug discovery and clinical trials. IEEE Trans. Med. Imaging 21(10), 1209–1211 (2002)
5. Ye, X., Lin, X., Dehmeshki, J., Slabaugh, G., Beddoe, G.: Shape-based computer-aided detection of lung nodules in thoracic CT images. IEEE Trans. Biomed. Eng. 56(7), 1810–1820 (2009)
6. Shaik, F., Sharma, A.K., Ahmed, S.M., Gunjan, V.K., Naik, C.: An improved model for analysis of diabetic retinopathy related imagery. Indian J. Sci. Technol. 9(44) (2016). ISSN: 0974-6846
7. Choi, E., Schuetz, A., Stewart, W.F., Sun, J.: Using recurrent neural network models for early detection of heart failure onset. J. Am. Med. Inform. Assoc. 24(2), 361–370 (2016)
8. Shaik, F., Sharma, A.K., Ahmed, S.M.: Detection and analysis of diabetic myonecrosis using an improved hybrid image processing model. In: IEEE International Conference on Advances in Electrical, Electronics, Information, Communication and Bioinformatics-2016 (AEEICB2016 to be published in IEEE Explore) at Prathyusha Institute of Technology and Management, 27–28th Feb 2016. ISBN: 978-1-4673-9745-2
9. Tai, S.C., Kuo, T.M., Li, K.H.: An efficient super resolution algorithm using simple linear regression. In: 2013 Second International Conference on Robot, Vision and Signal Processing, pp. 287–290. IEEE (2013)
10. Suárez, A., Lutsko, J.F.: Globally optimal fuzzy decision trees for classification and regression. IEEE Trans. Pattern Anal. Mach. Intell. 21(12), 1297–1311 (1999)
11. Murtagh, F., Contreras, P.: Algorithms for hierarchical clustering: an overview. Wiley Interdiscip. Rev. Data Min. Knowl. Disc. 2(1), 86–97 (2012)
12. Chen, J., Wang, J., Cheng, S., Shi, Y.: Brain storm optimization with agglomerative hierarchical clustering analysis. In: International Conference on Swarm Intelligence, pp. 115–122. Springer, Cham (2016)
13. Kesavaraj, G., Sukumaran, S.: A study on classification techniques in data mining. In: 2013 Fourth International Conference on Computing, Communications and Networking Technologies (ICCCNT), pp. 1–7 (2013)
14. Singh, M., Sharma, S., Kaur, A.: Performance analysis of decision trees. Int. J. Comput. Appl. 71 (2013)
15. Khandare, S.T.: A survey paper on image segmentation with thresholding. Int. J. Comput. Sci. Mobile Comput. 3(1), 441–446 (2014)
16. Thanh Noi, P., Kappas, M.: Comparison of random forest, k-nearest neighbor, and support vector machine classifiers for land cover classification using Sentinel-2 imagery. Sensors 18(1), 18 (2018)
17. Wu, X., Kumar, V., Quinlan, J.R., Ghosh, J., Yang, Q., Motoda, H., et al.: Top 10 algorithms in data mining. Knowledge and Information Systems
18. Bhatia, N.: Survey of nearest neighbor techniques. arXiv:1007.0085 (2010)
19. Deekshatulu, B.L., Chandra, P.: Classification of heart disease using k-nearest neighbor and genetic algorithm. Proc. Technol. 10, 85–94 (2013)
20. Abbasi, S., Derakhshanfar, R., Abbasi, A., Sarbaz, Y.: Classification of normal and abnormal lung sounds using neural network and support vector machines. In: 2013 21st Iranian Conference on Electrical Engineering (ICEE), pp. 1–4. IEEE (2013)
21. Twa, M.D., Parthasarathy, S., Roberts, C., Mahmoud, A.M., Raasch, T.W., Bullimore, M.A.: Automated decision tree classification of corneal shape. Optometry Vis. Sci. Official Publ. Am. Acad. Optometry 82, 1038 (2005)
22. Hsu, C.H., Manogaran, G., Panchatcharam, P., Vivekanandan, S.: A new approach for prediction of lung carcinoma using back propagation neural network with decision tree classifiers. In: 2018 IEEE 8th International Symposium on Cloud and Service Computing (SC2), pp. 111–115. IEEE (2018)
23. Murty, N.R., Babu, M.P.: A critical study of classification algorithms for LungCancer disease detection and diagnosis. Int. J. Comput. Intell. Res. 13(5), 1041–1048 (2017)
24. Varadharajan, R., Priyan, M.K., Panchatcharam, P., Vivekanandan, S., Gunasekaran, M.: A new approach for prediction of lung carcinoma using back propogation neural network with decision tree classifiers. J. Ambient Intell. Humanized Comput. 1–12 (2018)
25. Parvin, H., MirnabiBaboli, M., Alinejad-Rokny, H.: Proposing a classifier ensemble framework based on classifier selection and decision tree. Eng. Appl. Artif. Intell. 37, 34–42 (2015)
26. Louppe, G.: Understanding random forests: from theory to practice. arXiv:1407.7502 (2014)
27. Christodoulidis, S., Anthimopoulos, M., Ebner, L., Christe, A., Mougiakakou, S.: Multisource transfer learning with convolutional neural networks for lung pattern analysis. IEEE J. Biomed. Health Inform. 21(1), 76–84 (2016)
28. Chen, L., Pan, X., Zhang, Y.H., Liu, M., Huang, T., Cai, Y.D.: Classification of widely and rarely expressed genes with recurrent neural network. Comput. Struct. Biotechnol. J. 17, 49–60 (2019)
Bibliography 29. Homs-Corbera, A., Fiz, J.A., Morera, J., Jané, R.: Time-frequency detection and analysis of wheezes during forced exhalation. IEEE Trans. Biomed. Eng. 182–186 (2004) 30. Yang, G.Z., Hansell, D.M.: CT image enhancement with wavelet analysis for the detection of small airways disease. IEEE Trans. Med. Imaging 16(6), 953–961 (1997) 31. Ukai, Y., Niki, N., Satoh, H., Watanabe, S., Ohmatsu, H., Eguchi, K., Moriyama, N.: A coronary calcification diagnosis system based on helical CT images. In: 1997 IEEE Nuclear Science Symposium Conference Record, vol. 2, pp. 1208–1212. IEEE (1997) 32. Hamalainen, A., Henriksson, J.: Convolutional decoding using recurrent neural networks. In: IJCNN’99. International Joint Conference on Neural Networks. Proceedings (Cat. No. 99CH36339), vol. 5, pp. 3323–3327. IEEE (1999)
80
S. M. Ahmed et al.
33. Berber, S.M., & Kecman, V.: Convolutional decoders based on artificial neural networks. In: 2004 IEEE International Joint Conference on Neural Networks (IEEE Cat. No. 04CH37541), vol. 2, pp. 1551–1556. IEEE (2004) 34. Kim, S.J., Kim, C.H., Jung, S.Y., Kim, Y.J.: Shape optimization of a hybrid magnetic torque converter using the multiple linear regression analysis. IEEE Trans. Magn. 52(3), 1–4 (2015) 35. Wang, P., Ge, R., Xiao, X., Zhou, M., Zhou, F.: hMuLab: a biomedical hybrid MUlti-LABel classifier based on multiple linear regression. IEEE/ACM Trans. Comput. Biol. Bioinform. (TCBB) 14(5), 1173–1180 (2017) 36. Au-Yeung, S.K., Siu, M.H.: Maximum likelihood linear regression adaptation for the polynomial segment models. IEEE Signal Process. Lett. 13(10), 644–647 (2006) 37. Meng, J., Gao, Y., Shi, Y.: Support vector regression model for measuring the permittivity of asphalt concrete. IEEE Microwave Wirel. Compon. Lett. 17(12), 819–821 (2007) 38. Zhang, Y., Du, Y., Ling, F., Fang, S., Li, X.: Example-based super-resolution land cover mapping using support vector regression. IEEE J. Sel. Topics Appl. Earth Observ. Remote Sens. 7(4), 1271–1283 (2014) 39. Setiono, R., Liu, H.: A connectionist approach to generating oblique decision trees. IEEE Trans. Syst. Man Cybern. Part B (Cybern.) 29(3), 440–444 (1999) 40. Ham, J., Chen, Y., Crawford, M.M., Ghosh, J.: Investigation of the random forest framework for classification of hyperspectral data. IEEE Trans. Geosci. Remote Sens. 43(3), 492–501 (2005) 41. Duwairi, R., Abu-Rahmeh, M.: A novel approach for initializing the spherical K-means clustering algorithm. Simul. Model. Pract. Theory 54, 49–63 (2015) 42. Kodinariya, T.M., Makwana, P.R.: Review on determining number of cluster in K-means clustering. Int. J. 1(6), 90–95 (2013) 43. Shovon, M.H.I., Haque, M.: Prediction of student academic performance by an application of k-means clustering algorithm. Int. J. Adv. Res. Comput. Sci. Softw. Eng. 2(7) (2012) 44. 
Zhang, Y., Wang, S., Ji, G., Dong, Z.: An MR brain images classifier system via particle swarm optimization and kernel support vector machine. Sci. World J. (2013) 45. Zhang, H., Hou, Y., Zhang, J., Qi, X., Wang, F.: A new method for nondestructive quality evaluation of the resistance spot welding based on the radar chart method and the decision tree classifier. Int. J. Adv. Manuf. Technol. 78(5–8), 841–851 (2015) 46. Chen, J., Li, K., Tang, Z., Bilal, K., Yu, S., Weng, C., Li, K.: A Parallel Random Forest Algorithm for Big Data in a Spark Cloud Computing Environment 47. Chen, M., Shi, X., Zhang, Y., Wu, D., Guizani, M.: Deep features learning for medical image analysis with convolutional autoencoder neural network. IEEE Trans. Big Data (2017) 48. Prasoon, A., Petersen, K., Igel, C., Lauze, F., Dam, E., Nielsen, M.: Deep feature learning for knee cartilage segmentation using a triplanar convolutional neural network. MICCAI LNCS 8150, 246–253 (2013) 49. Sridar, P., Kumar, A., Quinton, A., Nanan, R., Kim, J., Krishnakumar, R.: Decision fusion-based fetal ultrasound image plane classification using convolutional neural networks. Ultrasound Med. Biol. 45(5), 1259–1273 (2019) 50. Weninger, F., Erdogan, H., Watanabe, S., Vincent, E., Le Roux, J., Hershey, J.R., Schuller, B.: Speech enhancement with LSTM recurrent neural networks and its application to noiserobust ASR. In: International Conference on Latent Variable Analysis and Signal Separation, pp. 91–99. Springer, Cham (2015) 51. Lipton, Z.C., Kale, D.C., Elkan, C., Wetzel, R.: Learning to diagnose with LSTM recurrent neural networks. arXiv:1511.03677 (2015) 52. Shaik, F., Sharma, A.K., Ahmed, S.M.: Hybrid Model for Analysis of Abnormalities in Diabetic Cardiomyopathy and Diabetic Retinopathy related images. Springer Plus Journal, Springer Publications, Apr 2016. ISSN: 2193–1801 53. 
Hannan, M.A., Ali, J.A., Mohamed, A., Uddin, M.N.: A random forest regression based space vector PWM inverter controller for the induction motor drive. IEEE Trans. Industr. Electron. 64(4), 2689–2699 (2016)
Adaptation and Evolution of Decision Support Systems—A Typological Survey Ravi Lourdusamy and Xavierlal J. Mattam
Abstract There are many ways in which a literature survey can be done. One is to study the chronological order of development, and another is to study the developmental process itself. A third is to survey the evolution of a system through its different stages of growth and the adaptations made at each stage. This paper attempts to study the different stages of growth of Decision Support Systems, considering the different types based on their framework, application, and architecture. In each, a few models are evaluated to show the evolution of Decision Support Systems. Although many models of Decision Support Systems have evolved with technological advancements, a complete study of all the models is beyond the scope of the paper. Rather, the paper attempts to show that, since Decision Support Systems have evolved by adapting technological developments, decision support systems in the future will also evolve in line with technological developments.
Keywords Decision support systems · Evolution · Adaptation
1 Introduction Decision support has been a very important task in many fields. It has therefore been studied, and systems have been developed to support decisions, ever since machines began to help in decision-making processes. Studying the origin and history of a system helps to identify the progress of its development and to evaluate it in light of its contemporary significance. A chronological study shows how the system developed over the years, while a study of the history of the system as a developmental process highlights the progress made in that development. The focus in this study of the origin and history of Decision Support Systems (DSS) is to present how DSS evolved over the years through the adaptation of various technological developments in the different types of DSS, so that the future of DSS can be reliably predicted.
R. Lourdusamy · X. J. Mattam (B) Sacred Heart College, Tirupattur, Tamil Nadu 635601, India R. Lourdusamy e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2021 V. K. Gunjan and J. M. Zurada et al. (eds.), Modern Approaches in Machine Learning and Cognitive Science: A Walkthrough, Studies in Computational Intelligence 956, https://doi.org/10.1007/978-3-030-68291-0_8
1.1 Background ‘As We May Think’, a paper published by Vannevar Bush in 1945 [1], could well be the origin of the DSS. It discusses the idea of a machine that could replicate human thinking and the decision-making process. From a mere idea to a complex operational system, the DSS has come a long way. It has developed constantly and has become so sophisticated that it helps to make decisions that the normal human thinking process cannot arrive at. The origin and history of DSS have been researched in different ways. Some authors have published the chronological order of the development of DSS [2, 3], while others have studied it as a developmental process [4]. Still others have made literature surveys of DSS [5–7]. The growth of DSS seen as an evolutionary development has been presented in some papers [8–10]. The evolutionary approach presents development in phases, with gradual growth from a simple to a complex form over a period of time. The development of DSS is such an evolutionary process, and there are many other reasons to state that DSS has evolved [8]. The adaptation of technological advancements in DSS has led to their evolution.
1.2 Main Focus of the Article The article is a study of the evolution of DSS in its different types. Basically, it is a study of the types, categorized in three ways. The first section reviews the types based on the framework of the DSS and studies the evolution of five different types of DSS. The second section studies the evolution of DSS through its major applications; only three applications that attract a lot of interest in research and development are covered. The last section studies the evolution of DSS based on its architecture. Here again, only three components of the DSS are studied, and these three components belong to three phases of the evolution of DSS. Initially, the DSS was just an interactive computer-based system. The second phase began with the introduction of the World Wide Web, and the third, more recent phase began with advancements in artificial intelligence. Each of these components was added incrementally to the basic architecture of the DSS. The purpose of this paper is not to make an exhaustive study but rather to present a brief sketch of how the evolution of DSS can be studied from the point of view of its composition and application. It presents a few types of DSS under three basic classifications. There are many other types, and they could be classified in many other ways, but that is beyond the scope of this brief typological survey.
2 Evolution of DSS Framework The DSS framework is seen from the point of view of how the DSS works. Depending on its main component or aim, a DSS can be classified as communication-driven, data-driven, model-driven, knowledge-driven, or document-driven. In each of these types, newer technologies have been adapted and a gradual evolution has taken place over time.
2.1 Communications-Driven Communication-driven DSS enables a group of people to communicate with each other, share ideas, and arrive at a common decision. It stresses communication, collaboration, and shared decision-making, and it makes use of all forms of communication to arrive at decisions. It enables interaction between groups of people, facilitates the sharing of information, supports coordination and collaboration within the group, and helps in group decision tasks. The evolution of communication-driven DSS began with face-to-face communication and steadily adapted to changes in communication technology, becoming more sophisticated and precise [11]. The evolution of communication-driven DSS is therefore complementary to the evolution of communication technology and information technology. The Group Decision Support System (GDSS) is a communication-driven DSS that aims to remove communication barriers and improve the decision-making process. GDSS is a combination of communications, computers, and decision support technologies, and it removes communication barriers using the hardware and software communication technologies available within it. The sophistication of GDSS grows with the development of communication technologies [12]. A Decision Room is one result of such sophistication: the participants in the decision-making process are present at the same time and are brought together through advancements in communication technology rather than by being physically together in one space [13]. The development of Mobile Decision Support Systems (MDSS) belongs to the latter part of the evolution of GDSS, which is coupled with advancements in wireless technologies. A lot of research has been done on MDSS, and the findings indicate the evolution of different types of DSS, all of which are communication-driven.
All this research also indicates the various possibilities of MDSS adaptations in the future [14]. The driving force behind this rapid expansion and evolution of MDSS is its close affinity with Business Intelligence and the steady advancements in mobile technology. The faster data exchanges made possible by advancements in wireless technology have also contributed to this phase of DSS evolution, which is proceeding in the direction of MDSS [15]. Since communication is the core of any decision support system used by multiple decision-makers or in collective decision making, the design of the communication architecture has evolved along with the networking technologies used for information exchange. An excellent example of communication in GDSS is found in Cooperative Decision Support Systems [16]. Cooperative Decision Support Systems involve a cooperative decision setting. This is a collective decision-making process and as such is communication-dependent. Collective decision making probably began with the theory of elections found in eighteenth-century literature. Later it was extended to the ordinal and cardinal ranking problem in the statistical framework and to the game theory approach. The adaptation of all these theories in DSS, and their subsequent development, led to the evolution of DSS through the years [17].
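The ordinal ranking idea mentioned above can be illustrated with a small sketch. The chapter does not name a specific aggregation rule, so the Borda count, a well-known rule from the eighteenth-century election literature, is used here as an assumption; the function names and the sample ballots are hypothetical.

```python
from collections import defaultdict


def borda_count(ballots):
    """Aggregate ordinal rankings with the Borda rule.

    Each ballot lists options from most to least preferred. An option in
    position i of an n-item ballot earns n - 1 - i points.
    """
    scores = defaultdict(int)
    for ballot in ballots:
        n = len(ballot)
        for position, option in enumerate(ballot):
            scores[option] += n - 1 - position
    return dict(scores)


def group_choice(ballots):
    """Return the option with the highest Borda score (ties broken by name)."""
    scores = borda_count(ballots)
    return min(scores, key=lambda option: (-scores[option], option))


# Hypothetical rankings submitted by three decision-makers.
ballots = [
    ["A", "B", "C"],
    ["B", "A", "C"],
    ["B", "C", "A"],
]
scores = borda_count(ballots)
winner = group_choice(ballots)
```

In a cooperative decision setting, each participant would submit a ranking through the communication layer, and an aggregation step of this kind would produce the collective ordering.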
2.2 Data-Driven Data-driven DSS is an approach in which decisions are backed by data that is reliable and verifiable. It values the effective analysis and interpretation of raw data. The three main tasks of data-driven DSS are data gathering, data storage, and data analytics. Advancements in the theories and technologies related to these three tasks have in turn led to the evolution of data-driven DSS. SAGE (Semi-Automatic Ground Environment), which was operational for US air defense from 1963, is probably the beginning of data-driven DSS. While advancements in defense systems sped up the evolution of data-driven DSS on the one hand, the growth of business intelligence provided the necessary stimulus on the other. Business Intelligence is a data-driven DSS that is primarily used for data warehousing and data analytics [18]. There have been several generations in the evolution of data-driven DSS. It began with the application-driven approach, which gathered data from a single application or a few related applications. The next generation was data-centric and used large data warehouses to store vast amounts of data; the data warehouse supported different applications. In the third generation of data-driven DSS, real-time data is used to enhance current decisions [19]. The present generation of DSS consists of large-scale complex systems that use greatly evolved data mining and data analytics based on Big Data and artificial intelligence tools [20–23].
2.3 Model-Driven In model-driven DSS, a decision problem is converted into a mathematical or statistical model whose variables are used to solve the problem. A model-driven DSS can take the form of statistical software packages, spreadsheets, forecasting tools, financial modeling, or optimization software. The models in a model-driven DSS help in analyzing data, which in turn supports the decision-making process. The evolution of model-driven DSS is bound to the development of models, since improvements in the models are reflected in the DSS. The DSS first used in the 1970s are classified as model-driven DSS, and they continue to evolve with the evolution of models. Between 1970 and 2001, more than 1800 papers were written on DSS, most of which fall under the model-driven domain [5–7, 24]. Models are relevant at all levels of the decision-making process. Past models are used to derive newer models and are indicative of future trends. And since models are independent of the data, they remain stable and can be used across applications. A Model Management System (MMS) manages the different models in a DSS [25]. Its basic purpose is to shield the users of the DSS from the models and the other processing aspects of the DSS [26]. An MMS can be either a decision processing MMS or a model processing MMS [27], and there are varieties of MMS based on their functions. Developments in the management sciences have contributed in many ways to the development of MMS and in turn to the evolution of DSS [28, 29].
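As a minimal sketch of what converting a decision problem into a model with variables can look like, the following example treats a pricing decision as a small profit model and searches a list of candidate prices. The linear demand curve, the cost figures, and all names are illustrative assumptions and do not come from the chapter.

```python
def profit(price, unit_cost=4.0, fixed_cost=100.0, base_demand=200.0, slope=10.0):
    """Hypothetical profit model: linear demand, constant unit cost."""
    demand = max(base_demand - slope * price, 0.0)
    return (price - unit_cost) * demand - fixed_cost


def best_price(candidates):
    """Pick the candidate price that maximizes the modeled profit."""
    return max(candidates, key=profit)


# Candidate prices from 4.0 to 20.0 in steps of 0.5 (the decision variable).
candidates = [p / 2 for p in range(8, 41)]
best = best_price(candidates)
best_profit = profit(best)
```

The model function is kept separate from the parameter values it is evaluated with, which reflects the point made above that a model stays stable while the data fed into it changes.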
2.4 Knowledge-Driven Knowledge-driven DSS keeps a store of facts, rules, procedures, and the like, and uses these stored criteria to solve decision problems. The store is the knowledge warehouse, and data is drawn from it using data mining techniques. The evolution of knowledge-driven DSS is linked to the evolution of the knowledge warehouse, which has the structure to capture, cleanse, store, organize, leverage, and disseminate data in the form of knowledge [30]. The knowledge warehouse provides the DSS with an analyzed phase of the knowledge management process that is shared across the organization. Technologies such as data mining, web mining, and text mining help the knowledge acquisition process. The process of knowledge acquisition to create a knowledge warehouse is the knowledge discovery process [31]. The evolution of these processes (known as process knowledge [8]), together with the technologies related to them, is part of the evolution of knowledge-driven DSS.
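The idea of solving a decision problem from stored facts and rules can be sketched with a tiny forward-chaining loop. This is only an illustration under assumed rule and fact names; the chapter does not prescribe an inference method, and real knowledge warehouses are far richer than a set of strings.

```python
def forward_chain(facts, rules):
    """Apply (premises -> conclusion) rules until no new fact is derived."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            # Fire a rule only when all its premises are already known.
            if conclusion not in known and premises <= known:
                known.add(conclusion)
                changed = True
    return known


# Hypothetical inventory rules and facts.
rules = [
    (frozenset({"stock_low", "demand_high"}), "reorder"),
    (frozenset({"reorder", "supplier_available"}), "place_order"),
]
facts = {"stock_low", "demand_high", "supplier_available"}
derived = forward_chain(facts, rules)
```

Here the second rule can only fire after the first has added "reorder", which is why the loop repeats until nothing changes.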
2.5 Document-Driven Document-driven DSS works with unstructured data. It is a relatively new classification compared with the other DSS frameworks. The development of document-driven DSS can be largely attributed to the evolution of artificial intelligence technologies such as big data analytics, text mining, ontological learning, pattern recognition, and natural language processing. Document-driven DSS is designed to store, retrieve, extract, and process information from various electronic file formats. Information is gathered from a large variety of formal and informal sources such as hand-written letters, reports, photographs, videos, audio recordings, and memos. Only relevant data needs to be captured and stored for use by the DSS. Advances in the technology that captures and stores the relevant data have led to the evolution of document-driven DSS. Document Management Systems, which help integrate a variety of storage and processing technologies, are an essential part of document-driven DSS [32, 33].
3 Evolution of DSS Applications The DSS is used for many applications. Although it initially evolved out of the management information system used mainly for businesses, it gradually took on a more independent form that could be applied in many other fields. Not all the applications are well researched and studied, and many are just an adaptation of a generic DSS to a particular field. The main DSS applications, which are studied here, are in business, healthcare, and natural resource management.
3.1 Business Enterprise Management DSS in Business Enterprise Management has attracted the largest amount of research in comparison with other DSS. Between 1971 and 2001, there were about 481 papers on corporate functional management, while there were only 203 papers on DSS in other areas [5–7]. This implies that many studies have been conducted and that the evolution of DSS in Business Enterprise Management has been steady. DSS was at first described as a Management Information System (MIS), and it was meant to help business enterprises [34]. The DSS is an evolutionary part of the MIS. The evolution of MIS concepts includes everything from basic data processing to sophisticated expert and intelligent systems. While the MIS was concerned strictly with the organization of data and its conversion to information for management support, the DSS was more about processing the data and converting information to knowledge for management support. Since DSS was based on the MIS, the evolution of both is complementary [35]. The core of both the MIS and the DSS is the information system. The adaptation of developments in information systems gives rise to the steady evolution of the MIS and, together with it, the DSS. The evolution of both systems is also seen in systems planning, analysis, and design processes, as the various theories that back these processes are incorporated into the systems [36]. While technological advances have led to the simultaneous evolution of both the MIS and the DSS, there are also advances in DSS independent of the MIS, in which the decision processes are directly coupled with the development of technology or of decision-making processes [37]. Business Intelligence (BI) is an evolution of business-related DSS independent of the MIS. Although the phrase business intelligence was first used by Hans Peter Luhn in 1958, it came into common usage only in 1989. The two basic functions of BI are data warehousing and data visualization [38]. The evolution of BI can be seen in different phases, each leading to a more complex one. The first phase had data mining and managerial reporting. The next phase used On-Line Analytical Processing (OLAP) technologies to analyze data in data warehouses. The following phase introduced the Balanced Scorecard methodology, and it was followed by the use of Web analytics and Web mining for BI. The phases after that used Business Dashboard technology. Later, mobile and location-aware technologies came into use for BI [39].
3.2 Healthcare There are a variety of ways to define DSS in healthcare. The DSS in healthcare provides healthcare workers and patients with precise information, either generalized or individual-specific, by intelligently filtering it and presenting it for the betterment of the patient or the general population. The architectural evolution of DSS in healthcare can be broadly placed in four phases. The first phase began with the standalone DSS in 1959. In the next phase, beginning in 1967, integrated systems were designed. From 1989, standards-based models were used, and since 2005 service models have been in use [40]. Technological development in DSS for healthcare happened through the adaptation of newer technology in the DSS. For example, the earliest clinical DSS had synchronous alert processes. Later, smart alert routing was used, supplemented by alert prioritization systems for emergencies. To avoid alert fatigue, asynchronous alerts, or clinical event monitors, were then introduced. Still later, notification escalation techniques were used for critical alerts, together with expectation tracking. Alerts were also used for health maintenance reminder systems. A recent advancement is the ambient alert system, which is less intrusive and non-interruptive [41]. With the advent of AI tools such as machine learning and data analytics, the DSS in healthcare advanced, as did all other DSS. Supervised learning and predictive analytics are used to reduce readmission risks. Visual differential diagnostic tools also help in medical diagnostic decision support. A still more recent evolution in healthcare DSS accesses relevant medical literature to answer queries relating to the latest advancements in treatment [42].
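Alert prioritization and notification escalation, as described above, can be sketched with a priority queue. The priority levels, the 15-minute escalation step, and the class and function names are all assumptions made for illustration; they are not taken from any clinical system or from the cited work.

```python
import heapq
import itertools


class AlertQueue:
    """Serve alerts by urgency; a lower priority number means more urgent."""

    def __init__(self):
        self._heap = []
        self._counter = itertools.count()  # keeps FIFO order within a priority

    def push(self, priority, message):
        heapq.heappush(self._heap, (priority, next(self._counter), message))

    def pop(self):
        priority, _, message = heapq.heappop(self._heap)
        return priority, message


def escalate(priority, minutes_unacknowledged, step=15):
    """Raise urgency by one level per `step` minutes without acknowledgement."""
    return max(priority - minutes_unacknowledged // step, 0)


queue = AlertQueue()
queue.push(2, "health maintenance reminder")
queue.push(0, "critical lab value")
queue.push(1, "drug interaction warning")
first = queue.pop()
```

An asynchronous monitor could call `escalate` periodically and re-queue unacknowledged alerts at the new priority, while low-priority reminders simply wait their turn.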
3.3 Agriculture and Natural Resources Management Strategic planning processes are important in agriculture and natural resource management, and information and knowledge are vital for these processes. The DSS for agriculture and natural resource management has evolved rapidly with the use of Geographical Information Systems (GIS). Geographic referencing helps in obtaining the information necessary for decision making. GIS also allows spatial referencing, which in turn helps in visualizing the area. The spatial referencing done using GIS is built into the DSS for agriculture and natural resource management. Through GIS, the DSS can forecast disease and pest risk and the optimal time of field assessment [43–48]. The DSS for agriculture and natural resource management has evolved in keeping with the development of technologies and processes, and it has been used in different ways in this field. The development of the DSS is also linked to a better understanding of agriculture and natural resources [49, 50].
4 Evolution of DSS Architecture The architectural evolution of the DSS is the most visible change in the DSS since its very beginnings. The DSS has evolved from a very basic keyboard-based single interactive system to a global system that has voice recognition and natural language processing abilities. While the computer interface of the DSS evolved through the adoption of newer technologies in all the different phases of the evolution, the adoption of web techniques and the adoption of AI techniques are each seen as a distinct phase of the evolution in its own right.
4.1 Interactive Decision Support One of the main elements in any DSS is the interactive interface. The interactive interface allows the user of the DSS to query the DSS and obtain insights. One definition of a DSS includes interaction as a basic component of the DSS. The interactive function of the DSS is not limited to the user and the computer-based system but extends to the interaction between the variables of the models and the interaction between the components of the DSS [51].
The evolution of the DSS in its interactive function with the user has taken many forms. One such form is conflict resolution, which is described as an interactive decision problem. A graph model of conflict resolution makes use of theories on conflict analysis, drama theory, hyper-game analysis, and the theory of moves. The DSS is based on these principles to assist in the interactive decision. User interaction is an essential component of this DSS architecture [52]. Another example of an interactive architecture makes use of game theory for representing multiple users in a collaborative DSS [53]. The GDSS makes use of Multi-Criteria Decision Making (MCDM). MCDM is an interactive approach that brings out the group’s preferred choices to build the group’s decision. Many interactive decision analytic representations have been developed, and many DSS use a hybrid of these representations [54, 55]. Research and development in interactive analytic representations have also resulted in the evolution of DSS into an intelligent DSS with a visually interactive approach that reduces the complexity of the human–computer interface. AI technological advancements are also adopted in this evolution of DSS [56].
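One of the simplest MCDM representations is a weighted sum of criterion ratings. The sketch below assumes that the group has already agreed on the weights and ratings; the chapter does not specify which MCDM method a GDSS uses, and the criteria, alternatives, and numbers are hypothetical.

```python
def weighted_scores(alternatives, weights):
    """Score each alternative as the weighted average of its criterion ratings."""
    total_weight = sum(weights.values())
    return {
        name: sum(weights[c] * ratings[c] for c in weights) / total_weight
        for name, ratings in alternatives.items()
    }


def preferred(alternatives, weights):
    """Return the alternative with the highest weighted score."""
    scores = weighted_scores(alternatives, weights)
    return max(scores, key=scores.get)


# Hypothetical group-agreed weights and ratings on a 1-10 scale.
weights = {"cost": 5, "access": 3, "risk": 2}
alternatives = {
    "site_a": {"cost": 7, "access": 5, "risk": 8},
    "site_b": {"cost": 6, "access": 9, "risk": 6},
}
scores = weighted_scores(alternatives, weights)
choice = preferred(alternatives, weights)
```

In an interactive setting, participants would adjust the weights and watch the ranking change, which is where the interactive character of MCDM noted above comes in.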
4.2 Web-Based Decision Support Web-based DSS is a natural extension of DSS, as the World Wide Web contains many elements necessary for an effective DSS. The web has the storage capability, the presentation capability, and the ability to gather, share, process, and use information. Moreover, the web has no restriction of time and space. The web-based DSS is a phase of DSS evolution from a localized system to a globalized system [57]. The evolution also caused a shift, in the early 1990s, from LAN-based client–server architecture to the application of internet technologies for web-based DSS. The web-based DSS has increased the availability, scalability, flexibility, and performance of DSS [58]. The evolution of web-based DSS has, in turn, helped develop knowledge repositories and knowledge management systems. The MIS from which the DSS initially evolved is now developed using knowledge management systems that make use of the web-based DSS [59]. An advantage of the evolution of web-based DSS is the reusability of technologies. There is no overhauling of technologies in this phase of DSS evolution; rather, it is viewed as a growth of technologies that makes use of both the old and the new [60]. One new technology that has developed as part of the evolution of web-based DSS is web mining. Web mining is the process of discovering useful and relevant information from web data. It is a continually evolving technology involving web content, web structure, and web usage [61].
4.3 Artificial Intelligence (AI) Based Decision Support Adopting AI in DSS is a significant evolution of DSS in recent years. It is seen as a natural outcome of the growth and development of AI technologies. AI-integrated DSS is referred to as Intelligent DSS (IDSS), since it mimics human cognitive capabilities in decision making. IDSS makes use of AI tools such as data mining, data analytics, data fusion, and optimization techniques [62]. Successful IDSS work in complex domains. They intelligently interact with users, effectively communicate events and changes, check reasoning errors, extract information from huge datasets, and forecast problems [63]. IDSS includes ambient intelligence, the Internet of Things, recommender systems, advisory systems, expert systems, neural networks, machine learning, deep learning, and many other such tools. IDSS aims to guide users through decision-making tasks or to supply them with new capabilities for making decisions. IDSS are support systems that have some degree of human intelligence in their components. DSS has evolved in several ways with the integration of AI. The focus of DSS has shifted from problem-solving using structured data to anticipating problems using structured and unstructured data. IDSS includes biometric-based DSS, Ambient Intelligence and Internet of Things-based DSS, Genetic Algorithm-based DSS, Fuzzy Sets IDSS, Intelligent Agent-Assisted DSS, Adaptive DSS, Computer Vision-Based DSS, Sensory DSS, Robotic DSS, and many other such AI adaptations to DSS [64].
5 The Gap in Literature The available literature is clear about the different types of DSS, and their evolutionary stages can be inferred from the various studies. But the various types could not have evolved in isolation. The types are interrelated, as shown in Fig. 1. It would therefore be interesting to study the relationship between the types and their consequent evolution, for example, how developments in communication-driven DSS led to a complementary evolution in the data-driven or model-driven framework. Moreover, as discussed, since the evolution of DSS is a direct consequence of technological developments, existing computer-altered reality technologies such as Virtual Reality, Augmented Reality, Mixed Reality, and Extended Reality should have become part of DSS. Types of DSS using these technologies could be studied as the next stage of evolution.
Fig. 1 Block diagram representing the relationship among the types of DSS. Blocks: DSS Based on Framework (Model-Driven, Data-Driven, Communication-Driven, Knowledge-Driven, Document-Driven); DSS Applications (Business, Healthcare, Agriculture, Natural Resource, …); DSS Based on Architecture (Interactive, Web-based, AI-based)
6 Conclusion There are many ways to study the development of DSS, and one of them is to study the adaptation of processes and technological advancements into the system. The DSS gradually evolved out of such adaptations. A detailed study of the evolution of DSS could point to its future evolution. The present study is not an exhaustive typological survey. There are many other ways in which the DSS has evolved, and there are many other types of DSS that are not covered in this brief survey. This work is rather a sketch of how the evolution of DSS can be studied from a typological standpoint. Further studies can elaborate on the various evolutionary methods and types.
References 1. Bush, V.: As We May Think. The Atlantic, Atl (1945) 2. Power, D.: Decision support systems: from the past to the future. In: Americas Conference on Information Systems (AMCIS). pp. 2025–2031. AIS Electronic Library (AISeL), New York (2004) 3. Averweg, U.R.: Historical overview of decision support systems (DSS). In: Encyclopedia of Information Science and Technology, Second Edition. pp. 1753–1758. IGI Global (2009) 4. Turban, E., Aronson, J., Llang, T.: Decision Support Systems and Intelligent Systems. (2003) 5. Eom, H.B., Lee, S.M.: Decision support systems applications research: a bibliography (1971– 1988). Eur. J. Oper. Res. (1990). https://doi.org/10.1016/0377-2217(90)90008-Y 6. Eom, S., Lee, S., Somarajan, C., Kim, E.: Decision support systems applications. OR Insight 10, 18–32 (1997). https://doi.org/10.1057/ori.1997.9 7. Eom, S., Kim, E.: A survey of decision support system applications (1995–2001). J. Oper. Res. Soc. (2006). https://doi.org/10.1057/palgrave.jors.2602140 8. Burstein, F., W. Holsapple, C., O’Leary, D.E.: Decision support system evolution: predicting, facilitating, and managing knowledge evolution. In: Handbook on Decision Support Systems, 2 (2008) 9. Arnott, D.: Decision support systems evolution: framework, case study and research agenda. Eur. J. Inf. Syst. 13, 247–259 (2004). https://doi.org/10.1057/palgrave.ejis.3000509 10. Sauter, V.L., Schofer, J.L.: Evolutionary development of decision support systems: important issues for early phases of design. J. Manag. Inf. Syst. (1987). https://doi.org/10.1080/07421222. 1988.11517809 11. Burstein, F., Holsapple, W.C., Power, D.J.: Decision support systems: a historical overview. In: Handbook on Decision Support Systems, 1 (2008)
R. Lourdusamy and X. J. Mattam
12. DeSanctis, G., Gallupe, B.: Group decision support systems: a new frontier. ACM SIGMIS Database 16, 3–10 (1984). https://doi.org/10.1145/1040688.1040689 13. Gray, P.: Group decision support systems. Decis. Support Syst. 3, 233–242 (1987). https://doi. org/10.1016/0167-9236(87)90178-3 14. Gao, S.: Mobile decision support systems research: a literature analysis. J. Decis. Syst. (2013). https://doi.org/10.1080/12460125.2012.760268 15. Power, D.J.: Mobile decision support and business intelligence: an overview. J. Decis. Syst. (2013). https://doi.org/10.1080/12460125.2012.760267 16. Bui, T., Jarke, M.: Communications requirements for group decision support systems. In: Proceedings of the Hawaii International Conference on System Science. pp. 524–533 (1986) 17. Bui, T.X., Bui, T.X.: Co-oP: A Group Decision Support System for Cooperative Multiple Criteria Group Decision Making. Springer (1987) 18. Power, D.J.: Understanding data-driven decision support systems. Inf. Syst. Manag. (2008). https://doi.org/10.1080/10580530801941124 19. Watson, H.J., Wixom, B.H., Hoffer, J.A., Anderson-Lehman, R., Reynolds, A.M.: Real-Time business intelligence: Best practices at continental airlines. Inf. Syst. Manag. 23 (2006). https:// doi.org/10.1201/1078.10580530/45769.23.1.20061201/91768.2 20. Filip, F.G.: Decision support and control for large-scale complex systems. Annu. Rev. Control. (2008). https://doi.org/10.1016/j.arcontrol.2008.03.002 21. Power, D.J.: Using ‘Big Data’ for analytics and decision support. J. Decis. Syst. (2014). https:// doi.org/10.1080/12460125.2014.888848 22. Provost, F., Fawcett, T.: Data science and its relationship to big data and data-driven decision making. Big Data. (2013). https://doi.org/10.1089/big.2013.1508 23. Poleto, T., De Carvalho, V.D.H., Seixas Costa, A.P.C.: The roles of big data in the decisionsupport process: an empirical investigation. In: Lecturer Notes in Business Information Process (2015). 
https://doi.org/10.1007/978-3-319-18533-0_2 24. Power, D.J., Sharda, R.: Model-driven decision support systems: concepts and research directions. Decis. Support Syst. (2007). https://doi.org/10.1016/j.dss.2005.05.030 25. Konsynski, B.R.: Model management in decision support systems. In: Data Base Management: Theory and Applications, pp. 131–154. Springer Netherlands, Dordrecht (1983) 26. Blanning, R.W.: Model Management Systems. Decis. Support Syst. 9, 9–18 (1993). https:// doi.org/10.1016/0167-9236(93)90019-Y 27. Applegate, L.M., Konsynski, B.R., Nunamaker, J.F.: Model management systems: design for decision support. Decis. Support Syst. 2, 81–91 (1986). https://doi.org/10.1016/0167-923 6(86)90124-7 28. Bharadwaj, A., Choobineh, J., Lo, A., Shetty, B.: Model management systems: a survey. Ann. Oper. Res. 38, 17–67 (1992). https://doi.org/10.1007/BF02283650 29. Elam, J.J., Henderson, J.C., Miller, L.W.: Model Management Systems: An Approach to Decision Support in Complex Organizations. Philadelphfa. PA 191 04 (1980) 30. Hamad, M.M., Qader, B.A.: Knowledge-driven decision support system based on knowledge warehouse and data mining for market management. Int. J. Appl. Innov. Eng. Manag. 3, 139–147 (2014) 31. Hosseinioun, P., Shakeri, H., Ghorbanirostam, G.: Knowledge-Driven decision support system based on knowledge warehouse and data mining by improving apriori algorithm with fuzzy logic. Int. J. Comput. Inf. Eng. 10, 528–533 (2016). https://doi.org/10.5281/zenodo.1339201 32. Fedorowicz, J.: Task force on document-based decision support systems. In: Sprague, R.H.J., Watson, H.J. (eds.) Decision Support for Management, pp. 168–181. Prentice Hall, Upper Saddle River, New Jersey (1996) 33. Swanson, E.B., Culnan, M.J.: Document-based systems for management planning and control: a classification, survey, and assessment. MIS Q. 2 (1978). https://doi.org/10.2307/248903 34. McCosh, A.M., Morton, M.S.S.: The fundamental character of decision support systems. 
In: Management Decision Support Systems, pp. 3–25. Palgrave Macmillan UK, London (1978) 35. Zhang, X.: The evolution of management information systems: a literature review. J. Integr. Des. Process Sci. 17 (2013). https://doi.org/10.3233/jid-2013-0009
36. Dickson, G.W.: Management information systems: evolution and status. Adv. Comput. 20, 1–37 (1981) 37. Dickson, G.W.: Management information-decision systems: a new era ahead? Bus. Horiz. 11, 17–26 (1968). https://doi.org/10.1016/0007-6813(68)90004-9 38. Jakši´c, D., Pavkov, S., Pošˇci´c, P.: Business Intelligence Systems Yesterday, Today and Tomorrow—An Overview. Zb, Veleuˇcilišta u Rijeci (2016) 39. Panian, Z.: The evolution of business intelligence: from historical data mining to mobile and location-based intelligence. Recent Res. Bus Econ. (2013) 40. Wright, A., Sittig, D.F.: A four-phase model of the evolution of clinical decision support architectures (2008) 41. McCallie, D.P.: Clinical decision support: history and basic concepts. In: Healthcare Information Management Systems: Cases, Strategies, and Solutions, 4t edn. (2015) 42. Yang, J., Kang, U., Lee, Y.: Clinical decision support system in medical knowledge literature review. Inf. Technol. Manag. 17 (2016). https://doi.org/10.1007/s10799-015-0216-6 43. Kliskey, A.D.: The role and functionality of GIS as a planning tool in natural-resource management. Comput. Environ. Urban Syst. (1995). https://doi.org/10.1016/0198-9715(94)00029-8 44. Kumbhar, V., Singh, T.: A systematic study of application of spatial decision support system in agriculture. Int. J. Sci. Eng. (2013) 45. Reddy, M.N., Rao, N.H.: GIS Based Decision Support Systems in Agriculture. Hyderabad, Telangana (1995) 46. Sreekanth, P.D., Kumar, K.V., Soamand, S.K., Srinivasarao, C.H.: Spatial decision support systems for smart farming using geo-spatial technologies. In: Patil, P.L., Dasog, G.S., Biradar, D.P., Patil, V.C., Aladakatti, Y.R. (eds.) National Conference on Application of Geo Spcial Technologies and IT in Smart Farming, pp. 118–122. University of Agricultural Science Darwad, Dharwad, Karnataka (2018) 47. Chen, T.E., Chen, L.P., Gao, Y., Wang, Y.: Spatial decision support system for precision farming based on GIS web service. 
In: Proceedings—2009 International Forum on Information Technology and Applications, IFITA 2009 (2009) 48. Paolo, R., Benno, K., Thorsten, Z., Barbara, K., Beate, T., Jeanette, J.: Decision support systems in agriculture: administration of meteorological data, use of geographic information systems (GIS) and validation methods in crop protection warning service. In: Efficient Decision Support Systems—Practice and Challenges from Current to Future (2011) 49. Korschgen, C., Knutson, M.: Natural resource assessment and decision support tools for bird conservation planning. In: Ralph, C. John; Rich, Terrell D. (eds.) Bird Conservation Implementation and Integration in the Americas: Proceedings of the Third International Partners in Flight Conference. 2002 Mar 20–24; Asilomar, California, volume 2, Gen. Tech. Rep. PSWGTR-191. Albany, CA: US Dept. of Agriculture, Forest Service, Pacific Southwest Research Station, vol. 191, pp. 1213–1223 (2005) 50. Stock, M.W., Rauscher, H.M.: Artificial intelligence and decision support in natural resource management. New Zeal. J. Sci. 26 (1996) 51. Druzdzel, M.J., Flynn, R.R.: Decision support systems. In: McDonald, J.D., Levine-Clark, M. (eds.) Encyclopedia of Library and Information Sciences, p. 9. CRC Press (2017) 52. Fang, L., Hipel, K.W., Kilgour, D.M., Peng, X.: A decision support system for interactive decision making—Part I: Model formulation. IEEE Trans. Syst. Man Cybern. Part C Appl. Rev. (2003). https://doi.org/10.1109/TSMCC.2003.809361 53. Hernandez, G., Seepersad, C.C., Allen, J.K., Mistree, F.: A method for interactive decisionmaking in collaborative. Int. J. Agil. Manuf. Syst. 5, 47–65 (2002) 54. Salo, A.A.: Interactive decision aiding for group decision support. Eur. J. Oper. Res. (1995). https://doi.org/10.1016/0377-2217(94)00322-4 55. Teghem, J., Delhaye, C., Kunsch, P.L.: An interactive decision support system (IDSS) for multicriteria decision aid. Math. Comput. Model. 12 (1989). 
https://doi.org/10.1016/0895-717 7(89)90370-1 56. Angehrn, A.A., Lüthi, H.-J.: Intelligent decision support systems: a visual interactive approach. Interfaces (Providence) 20, 17–28 (1990). https://doi.org/10.1287/inte.20.6.17
57. Yao, J.T.: An introduction to web-based support systems. J. Intell. Syst. 17 (2008). https://doi. org/10.1515/JISYS.2008.17.1-3.267 58. Bayani, M.: Web-based decision support systems: a conceptual performance evaluation. In: 2013 IEEE 17th International Conference on Intelligent Engineering Systems (INES), pp. 21– 26. IEEE (2013) 59. Boreisha, Y., Myronovych, O.: Web-based decision support systems as knowledge repositories for knowledge management systems. Ubiquitous Comput. Commun. J. 3, 22 (2008) 60. Chen, H., Zhang, X., Chi, T.: An architecture for web-based DSS. In: 6th WSEAS International Conference on Software Engineering, Parallel and Distributed Systems (SEPADS’07), pp. 75–79. World Scientific and Engineering Academy and Society (WSEAS), Stevens Point, Wisconsin, USA (2007) 61. Turban, E., Sharda, R., Delen, D.: Data Warehousing. In: Turban, E., Sharda, R., Delen, D. (eds.) Decision Support and Business Intelligence Systems, pp. 326–373. Prentice Hall, Upper Saddle River, New Jersey (2011) 62. Phillips-Wren, G.: AI tools in decision making support systems: a review. Int. J. Artif. Intell. Tools. 21, 1240005 (2012). https://doi.org/10.1142/S0218213012400052 63. Guerlain, S., Brown, D.E., Mastrangelo, C.: Intelligent decision support systems. In: SMC 2000 Conference Proceedings. 2000 IEEE International Conference on Systems, Man and Cybernetics. “Cybernetics Evolving to Systems, Humans, Organizations, and their Complex Interactions” (Cat. No.00CH37166), pp. 1934–1938. IEEE 64. Kaklauskas, A.: Intelligent decision support systems. In: Biometric and Intelligent Decision Making Support, pp. 31–85. Springer International Publishing (2015)
Real-Time Implementation of Brain Emotional Controller for Sensorless Induction Motor Drive with Adaptive System
Sridhar Savarapu and Yadaiah Narri
Abstract This paper presents the implementation of space vector modulation (SVM) based direct torque control (DTC) of a sensorless induction motor (IM) drive using a brain emotional controller (BEC). The intelligent controller is modeled on the way the mammalian brain produces emotions by processing information in the amygdala and the orbitofrontal cortex. It is well structured as a quick-acting controller and is appropriate for self-learning control applications. A model reference adaptive system (MRAS) with a sensorless stator and rotor flux observer is used to estimate the stator and rotor fluxes, stator currents and rotor speed of the sensorless IM. The proposed strategy is implemented on an Opal-RT digital simulator to control the speed and to reduce the torque ripple and flux ripple of the SVM-DTC sensorless IM drive, and it is compared with PI control. Simulation and experimental results are presented.
Keywords Brain emotional controller · Mammalian brain · Amygdala · Orbitofrontal cortex · Model Reference Adaptive System · Direct Torque Control · Low speed
S. Savarapu (B)
Department of Electrical and Electronics Engineering, Jawaharlal Nehru Technological University College of Engineering Ananthapur, Ananthapuramu 515002, A.P., India
e-mail: [email protected]
Y. Narri
Electrical and Electronics Engineering Department, Jawaharlal Nehru Technological University Hyderabad, College of Engineering, Kukatpally, Hyderabad 500085, Telangana, India
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2021 V. K. Gunjan and J. M. Zurada et al. (eds.), Modern Approaches in Machine Learning and Cognitive Science: A Walkthrough, Studies in Computational Intelligence 956, https://doi.org/10.1007/978-3-030-68291-0_9

1 Introduction
Vector control and DTC of the IM are widely used in adjustable speed drive applications [1]. Vector control combines current and flux control, while DTC uses torque and flux controllers. Vector control is sensitive to the motor parameters and to the design of the control regulators; these drawbacks are avoided with the DTC technique. The basic DTC structure proposed by Takahashi [2] offers a quick response and robustness with respect to the rotor parameters, and has a much simpler control structure. The primary drawback of the Takahashi DTC technique is high torque and flux ripple. In this strategy, the required changes in torque and flux are imposed through hysteresis controllers that select the inverter switching states. Since none of the inverter voltage vectors can deliver the ideal changes in both the torque and the stator flux, the result is high torque and flux ripple in the DTC [3]. Various DTC control strategies have been explored. Space vector modulation (SVM) in DTC replaces the hysteresis mode of inverter switching with dwell-time based switching [4]. DTC-SVM operates at a constant switching frequency [5], and the switching signals for the inverter are determined by the SVM. SVM based DTC methods are recommended by several authors [6]; they maintain a constant switching frequency and also reduce the flux and torque ripple. In a sensorless control application the rotor speed and rotor position are estimated, which reduces the hardware complexity, cost and size of the drive, eliminates sensor cables, and gives higher noise immunity. Various speed estimation techniques are available, such as adaptive observers, Kalman filters and MRAS; MRAS is adopted here because it is a popular and widely accepted strategy. The observer does not estimate the rotor speed directly; it estimates the stator and rotor fluxes, which are used as the reference system in the MRAS for estimating the rotor angular speed in the adaptive system [7–10]. The estimated rotor speed is then fed back to control the IM drive in sensorless operation.
A bio-inspired artificial intelligence controller can be used to overcome the issues of parameter variation, the highly non-linear nature of the drive, and load torque disturbances, all of which affect the dynamic performance of the motor. The BEC is an artificial intelligent controller inspired by a medical model of the brain and its emotional processing [11, 12]. In this paper, the BEC is used as a speed controller which returns the reference torque from the speed error. It is also proposed to overcome limitations of existing controllers, such as complexity in design and the need for fast response in a disturbed environment [13, 14]. The BEC was introduced by Lucas in 2004 [15] and was applied to a PMSM drive by M. A. Rahman. The BEC design contains sensory input signals and emotional cues, or reward functions, that generate the emotional signals [16, 17]. The aims here are lower torque ripple, reduced harmonics in the stator phase windings, and a fast response of the speed signal. The main objective of this paper is a fast-acting controller that takes the uncertain non-linearity of the system into consideration. The experimental work is performed with the real-time digital simulator (Op-RTDS) in a hardware-in-loop arrangement, and the performance of the suggested controller is compared with that of PI based SVM-DTC strategies.
2 Architecture and Computational Model of Brain Emotional Controller
The biologically inspired BEC is modeled on the functional mechanism of the mammalian brain. Emotions are produced in the limbic system by correlating data from its various parts, which then triggers a specific reaction from the brain. The parts of the system are the sensory input, orbitofrontal cortex, thalamus and amygdala, and these elements are responsible for processing emotions (Fig. 1). These parts are represented by mathematical models. The intelligent emotional response is quick, and decision making is rapid compared with conventional controllers. Figure 1 shows the model of the brain emotional controller (BEC). It is a multiple-input single-output system with sensory inputs $S_i$, where $i = 1, \ldots, n$. For better understanding, one sensory input node (feedback signals) and one emotional cue (EC) signal node, which depends on the performance objectives, are considered here. Precise tuning of the initial coefficients of the BEC is not always necessary, but initial parameters outside the ideal range might cause unstable performance. For each sensory input $S_i$ there are two outputs in the brain emotional controller: the orbitofrontal cortex output, $OC_t$, and the amygdala output, $A_t$. The main learning process of the system takes place in the amygdala and orbitofrontal cortex components.

Fig. 1 Structure of brain emotional controller

The input to the controller is the sensory signal $S_i$, which is formed by a function $f$ that can be a combination of the error value ($e$), plant output ($z_p$) and controller output ($z_c$). It may be expressed as

$$S_i = f(e, z_p, z_c) \tag{1}$$

$$f = K_1 \cdot e + K_2 \cdot z_p + K_3 \cdot \int z_c \, dt \tag{2}$$
where $K_1$, $K_2$, $K_3$ are the gains of $S_i$. The sensory signal is forwarded to the sensory cortex and the amygdala; it is the output of the thalamus. $S_i$ is processed with the function $g$ to obtain the sensory cortex signal

$$SC_t = g(S_i) \tag{3}$$

The function $g$ is represented as

$$g(S_i) = e^{S_i} \tag{4}$$

A single $A_t$ node and a single $OC_t$ node generate the amygdala and orbitofrontal cortex outputs. For each sensory input there are two states in the BEC, one related to the $OC_t$ output and the other to the $A_t$ output, given by

$$A_t = V_t S_i \tag{5}$$

$$A_{th} = k_{th} V_{th} S_{th} \tag{6}$$

$$S_{th} = \max(0, S_i) \tag{7}$$

The $OC_t$ output may be written as

$$OC_t = W_t S_i \tag{8}$$
where $V_t$ and $W_t$ are the connection gains of $A_t$ and $OC_t$ in (5) and (8). The weights $V_t$, $V_{th}$ and $W_t$ are tuned automatically by online learning, (9)–(12), and the learning process needs their initial values.

$$\Delta V_t = \alpha \max(0, EC - A_{t-1}) \, SC_t \tag{9}$$

$$V_t(t) = \int_0^t \Delta V_t \, dt + V_t(0) \tag{10}$$

The $A_t$ and $OC_t$ learning process proceeds through the internal weight update

$$\Delta W_t = \beta \, (E_t - EC) \, SC_t, \quad \text{where } E_t = A_t(t-1) - OC_t(t-1) \tag{11}$$
$$W_t(t) = \int_0^t \Delta W_t \, dt + W_t(0) \tag{12}$$

In the above, the symbol $\Delta$ represents the variation of the weights, and $\alpha$ and $\beta$ are the learning rates of $A_t$ and $OC_t$. The emotional cue is

$$EC = h(e, z_p, z_c) \tag{13}$$

$$h = K_1 \cdot e + K_2 \cdot |e \cdot z_c| + K_3 \cdot z_p \tag{14}$$

where $K_1$, $K_2$ and $K_3$ are the gains of EC. The output of the emotional node $E_t$ is obtained from the amygdala node $A_t$, the thalamus node $A_{th}$ and the orbitofrontal cortex node $OC_t$ as follows

$$E_t = A_t + A_{th} - OC_t \tag{15}$$

The total output is

$$E = A - O \tag{16}$$

where $A$ and $O$ are the outputs of the amygdala and the orbitofrontal cortex at each time step, and $E$ is the resultant output of the controller. The reward signal EC, which may also vary with time, specifies a satisfactory level of the control process and takes part in tuning these weights. Unlike a traditional artificial neural network, the BEC does not need a separate iteration for learning or updating its parameters. Figure 2 is a flowchart illustrating the operation of the proposed BEC; it also indicates that the weights are corrected in each period to track changes in the system.
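The computational model of Eqs. (2)–(15) can be sketched as a single update step in code. The following Python sketch is illustrative only and is not the authors' implementation. It assumes one sensory input, forward-Euler integration of (10) and (12) with a hypothetical step `dt`, and a thalamus weight `v_th` that is held constant because no update rule for it is stated above. The class and function names are invented for this example.

```python
import math


class SensorySignal:
    """Sensory input S = K1*e + K2*z_p + K3*integral(z_c dt), as in Eq. (2)."""

    def __init__(self, k1, k2, k3):
        self.k1, self.k2, self.k3 = k1, k2, k3
        self.zc_integral = 0.0  # running rectangular-rule integral of z_c

    def update(self, e, z_p, z_c, dt):
        self.zc_integral += z_c * dt
        return self.k1 * e + self.k2 * z_p + self.k3 * self.zc_integral


def emotional_cue(e, z_p, z_c, k1, k2, k3):
    """Emotional cue EC = K1*e + K2*|e*z_c| + K3*z_p, as in Eq. (14)."""
    return k1 * e + k2 * abs(e * z_c) + k3 * z_p


class BrainEmotionalController:
    """Single-input BEC with forward-Euler integration of the weight updates."""

    def __init__(self, alpha, beta, k_th=1.0, v_th=0.0, v0=0.0, w0=0.0):
        self.alpha = alpha   # amygdala learning rate
        self.beta = beta     # orbitofrontal cortex learning rate
        self.k_th = k_th     # thalamus gain
        self.v_th = v_th     # thalamus weight (held constant: an assumption)
        self.v = v0          # amygdala weight, initial value V_t(0)
        self.w = w0          # orbitofrontal weight, initial value W_t(0)
        self.a_prev = 0.0    # A_t(t-1)
        self.oc_prev = 0.0   # OC_t(t-1)

    def step(self, s, ec, dt):
        sc = math.exp(s)                                             # Eqs. (3)-(4)
        dv = self.alpha * max(0.0, ec - self.a_prev) * sc            # Eq. (9)
        dw = self.beta * ((self.a_prev - self.oc_prev) - ec) * sc    # Eq. (11)
        self.v += dv * dt                                            # Eq. (10)
        self.w += dw * dt                                            # Eq. (12)
        a = self.v * s                                               # Eq. (5)
        a_th = self.k_th * self.v_th * max(0.0, s)                   # Eqs. (6)-(7)
        oc = self.w * s                                              # Eq. (8)
        self.a_prev, self.oc_prev = a, oc
        return a + a_th - oc                                         # Eq. (15)
```

In a speed loop, `s` and `ec` would be recomputed from the speed error at every sampling instant and the returned value used as the torque reference. Note that `math.exp(s)` grows quickly, so the gains of the sensory signal have to keep `s` in a moderate range.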
Fig. 2 Flowchart of the operation of the BEC

3 MRAS Speed Estimator for IM Drive
For sensorless control of an IM drive, MRAS is one of the most popular and preferred schemes, as it is easy to implement and gives good results. The rotor flux based MRAS uses the voltage model as the reference model and the current model as the adjustable model. The reference model is independent of the rotor speed, whereas the adjustable model depends on it. The difference between the outputs of the reference model and the adaptive model becomes zero when the speed is estimated exactly. The outputs, denoted $\psi_{rx}$ and $\psi_{ry}$, are estimates of the rotor flux space vector in the stationary reference frame. The adjustable model may be based on the following rotor model.

$$u_{sn} = R_s i_s + \frac{d\psi_s}{dt} \tag{17}$$

$$0 = R_r i_r + \frac{d\psi_r}{dt} - j\omega_r \psi_r \tag{18}$$
Fig. 3 MRAS speed estimator block diagram

The above equations describe the stator and rotor models in the stationary reference frame. The rotor flux calculated from Eqs. (17), (18) depends on the rotor speed and represents the adjustable model of Fig. 3. The rotor flux components in the stationary reference frame are given by

$$p \begin{bmatrix} \psi_{rx} \\ \psi_{ry} \end{bmatrix} = \frac{L_r}{L_m} \left( \begin{bmatrix} u_{snx} \\ u_{sny} \end{bmatrix} - \begin{bmatrix} R_s + \sigma L_s p & 0 \\ 0 & R_s + \sigma L_s p \end{bmatrix} \begin{bmatrix} i_{\alpha s} \\ i_{\beta s} \end{bmatrix} \right) \tag{19}$$

$$p \begin{bmatrix} \psi_{rx} \\ \psi_{ry} \end{bmatrix} = \begin{bmatrix} -\frac{1}{T_r} & -\omega_r \\ \omega_r & -\frac{1}{T_r} \end{bmatrix} \begin{bmatrix} \psi_{rx} \\ \psi_{ry} \end{bmatrix} + \frac{L_m}{T_r} \begin{bmatrix} i_{\alpha s} \\ i_{\beta s} \end{bmatrix} \tag{20}$$

where $\sigma = 1 - L_m^2 / (L_s L_r)$ and $p = d/dt$. $\omega_r$ is the rotor angular speed and $T_r$ is the rotor time constant. The difference between the two rotor flux space vectors is used as the speed error signal. The speed tuning signal drives the rotor speed estimation algorithm so that the error signal converges to zero.

$$\omega_{Esti} = K_p \varepsilon + K_i \int \varepsilon \, dt \tag{21}$$

The input of the PI controller is

$$\varepsilon = \hat{\psi}_{rx} \psi_{ry} - \hat{\psi}_{ry} \psi_{rx} \tag{22}$$
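The adaptation mechanism of Eqs. (20)–(22) can be illustrated with a short sketch. The Python code below is a minimal example under stated assumptions, not the authors' implementation: the adjustable model is integrated with a forward-Euler step of hypothetical length `dt`, the integral in (21) is accumulated with the rectangular rule, and the function and class names are invented for this example.

```python
def adjustable_model_step(psi_rx, psi_ry, omega_r, i_sx, i_sy, t_r, l_m, dt):
    """One forward-Euler step of the rotor current model, Eq. (20)."""
    d_psi_rx = -psi_rx / t_r - omega_r * psi_ry + (l_m / t_r) * i_sx
    d_psi_ry = omega_r * psi_rx - psi_ry / t_r + (l_m / t_r) * i_sy
    return psi_rx + d_psi_rx * dt, psi_ry + d_psi_ry * dt


class MrasSpeedAdapter:
    """PI adaptation law that turns the flux error into a speed estimate."""

    def __init__(self, kp, ki):
        self.kp = kp
        self.ki = ki
        self.integral = 0.0  # running integral of the error signal

    def update(self, psi_ref, psi_adj, dt):
        psi_rx, psi_ry = psi_ref          # reference (voltage) model output
        psi_rx_hat, psi_ry_hat = psi_adj  # adjustable (current) model output
        eps = psi_rx_hat * psi_ry - psi_ry_hat * psi_rx    # Eq. (22)
        self.integral += eps * dt
        return self.kp * eps + self.ki * self.integral     # Eq. (21)
```

At each sampling instant the estimated speed returned by `update` would be fed back into `adjustable_model_step` as `omega_r`, which closes the adaptation loop. When the two flux vectors are aligned the error is zero and the estimate is held by the integral term.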
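The PI controller parameters are $K_p$ and $K_i$.
The stator and rotor flux observer system (reference observer system)
The observer is mainly used to estimate the stator and rotor flux components and the stator current. The relationships are given by Eqs. (23)–(26).

$$\frac{d\hat{\psi}_{sx}}{d\tau} = \frac{1}{\tau_s} \left( -\hat{\psi}_{sx} + k_r \hat{\psi}_{rx} \right) + u_{sx} - k \left( i_{sx} - \hat{i}_{sx} \right) \tag{23}$$

$$\frac{d\hat{\psi}_{sy}}{d\tau} = \frac{1}{\tau_s} \left( -\hat{\psi}_{sy} + k_r \hat{\psi}_{ry} \right) + u_{sy} - k \left( i_{sy} - \hat{i}_{sy} \right) \tag{24}$$

$$\hat{\psi}_{rx} = \frac{1}{k_r} \left( \hat{\psi}_{sx} - \sigma L_s i_{sx} \right) \tag{25}$$

$$\hat{\psi}_{ry} = \frac{1}{k_r} \left( \hat{\psi}_{sy} - \sigma L_s i_{sy} \right) \tag{26}$$

The stator current components are given by (27), (28)

$$\hat{i}_{sx} = \frac{1}{\sigma L_s} \left( \hat{\psi}_{sx} - k_r \hat{\psi}_{rx} \right) \tag{27}$$

$$\hat{i}_{sy} = \frac{1}{\sigma L_s} \left( \hat{\psi}_{sy} - k_r \hat{\psi}_{ry} \right) \tag{28}$$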
For the estimation of the motor speed, the adaptive speed observer is used as a reference system in the MRAS speed estimator scheme, (29), (30).

$$\frac{d\hat{\psi}_{rx}}{dt} = -\frac{1}{T_r} \hat{\psi}_{rx} - \hat{\omega}_r \hat{\psi}_{ry} + \frac{L_m}{T_r} i_{sx} \tag{29}$$

$$\frac{d\hat{\psi}_{ry}}{dt} = -\frac{1}{T_r} \hat{\psi}_{ry} + \hat{\omega}_r \hat{\psi}_{rx} + \frac{L_m}{T_r} i_{sy} \tag{30}$$
4 SVM-DTC for Induction Motor
Mathematical modeling of IM
The mathematical model referred to the stationary reference frame is used in the DTC strategy [17]. The stator flux is estimated in real time at each sampling period $t_s$ from the voltage and current signals. The discrete forms of the d and q components of the stator flux and of the stator flux magnitude at each sampling period are given in (31), (32) and (33).

$$\psi_{ds}(k+1) = \psi_{ds}(k) - t_s R_s i_{ds}(k) + t_s V_{ds}(k) \tag{31}$$

$$\psi_{qs}(k+1) = \psi_{qs}(k) - t_s R_s i_{qs}(k) + t_s V_{qs}(k) \tag{32}$$

$$\psi_s(k+1) = \sqrt{\psi_{ds}^2(k+1) + \psi_{qs}^2(k+1)} \tag{33}$$
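The discrete flux update of Eqs. (31)–(33) maps directly onto a few lines of code. The following Python sketch is an illustration under stated assumptions, not the authors' implementation: the function name and argument order are invented for this example, and all quantities are assumed to be in consistent SI units.

```python
import math


def stator_flux_step(psi_ds, psi_qs, v_ds, v_qs, i_ds, i_qs, r_s, t_s):
    """Advance the d-q stator flux by one sampling period t_s.

    Returns the updated d and q components and the flux magnitude.
    """
    psi_ds_next = psi_ds - t_s * r_s * i_ds + t_s * v_ds   # Eq. (31)
    psi_qs_next = psi_qs - t_s * r_s * i_qs + t_s * v_qs   # Eq. (32)
    psi_s_next = math.hypot(psi_ds_next, psi_qs_next)      # Eq. (33)
    return psi_ds_next, psi_qs_next, psi_s_next
```

The function is called once per sampling period with the measured currents and the applied voltages, and the returned components are passed back in as the starting flux for the next period.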
The standstill flux equations (34), (35) are used until the motor starts rotating, and then the flux model (31), (32) and (33) is used [18].

$$\psi_{ds}(k+1) = \psi_{ds}(k) - \sigma L_s i_{ds}(k) + \frac{L_m}{L_r} \psi_{dr}(k) \tag{34}$$

$$\psi_{qs}(k+1) = \psi_{qs}(k) - \sigma L_s i_{qs}(k) + \frac{L_m}{L_r} \psi_{qr}(k) \tag{35}$$

where $\sigma = 1 - L_m^2 / (L_s L_r)$. The torque developed in the IM is expressed in (36) and (37)

$$T_e = \frac{3}{2} \, \frac{p}{2} \, \frac{L_m}{L_r L_s} \, \psi_s \times \psi_r \tag{36}$$

Using the d-q stator flux and currents, the torque of the motor is estimated as

$$T_e(k+1) = T_e(k) - \frac{3p}{4} \left( \psi_{ds}(k) \, i_{qs}(k) - \psi_{qs}(k) \, i_{ds}(k) \right) \tag{37}$$
SVM-DTC
DTC-SVM removes the variable switching frequency of operation by using a reference voltage vector. The actual values of the torque and flux are estimated. The controller provides the reference voltage vector, which is then synthesized by the SVM method to obtain the exact space vector at a fixed switching frequency. The eight switching states of the inverter are shown in Fig. 4. At any moment, two zero vectors and two active voltage vectors are used to synthesize the reference voltage $V_s^*$. The triangle-comparison approach is used.

Fig. 4 Inverter switching in space vector PWM: switching sequence in sector I

$$V_s^* T_s = d_i V_i + d_{i+1} V_{i+1} + d_{0,7} V_0 \tag{38}$$
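where $V_i$ and $V_{i+1}$ are the adjacent voltage vectors in the $i$th sector and $d_{0,7} = d_z$ is the dwell time of the zero switching voltage vectors.

$$d_1 = \frac{\sqrt{3} \, V_s^* \times T_s}{V_{dc}} \times \sin\left(\frac{\pi}{3} - \alpha\right) \tag{39}$$

$$d_2 = \frac{\sqrt{3} \, V_s^* \times T_s}{V_{dc}} \times \sin \alpha \tag{40}$$

$$d_0 = T_s - (d_1 + d_2) \tag{41}$$

$$d_z = T_s - d_1 - d_2 \tag{42}$$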
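The dwell-time calculation of Eqs. (39)–(41) can be sketched as follows. This Python example is illustrative and not the authors' implementation: the function name is invented, `v_ref` is assumed to be the magnitude of the reference voltage, `alpha` is assumed to be measured from the first active vector of the sector, and the two range checks are additions for this example.

```python
import math


def svm_dwell_times(v_ref, alpha, v_dc, t_s):
    """Dwell times (d1, d2, d0) for a reference vector inside one sector.

    alpha is the angle of the reference voltage within the sector and
    must lie in [0, pi/3]; t_s is the switching period.
    """
    if not 0.0 <= alpha <= math.pi / 3:
        raise ValueError("alpha must lie within the sector, 0 <= alpha <= pi/3")
    scale = math.sqrt(3.0) * v_ref * t_s / v_dc
    d1 = scale * math.sin(math.pi / 3 - alpha)   # Eq. (39)
    d2 = scale * math.sin(alpha)                 # Eq. (40)
    d0 = t_s - (d1 + d2)                         # Eq. (41)
    if d0 < 0.0:
        raise ValueError("reference vector too large for linear modulation")
    return d1, d2, d0
```

With the reference vector in the middle of the sector (`alpha = pi/6`) the two active vectors share the active time equally, and the remainder of the period goes to the zero vectors.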
$\alpha$ is the angle of the reference voltage $V_s^*$. The switching functions $S_a$, $S_b$ and $S_c$ within a sector of the SVM-DTC strategy are obtained from the averages of $d_1$, $d_2$ and $d_z$ and are given in (43)–(45)

$$S_a = d_1 + d_2 + \frac{d_0}{2} \tag{43}$$
Sb =
d0 d1 + d2 + 2 2
(44)
$$S_c = \frac{d_0}{2} \tag{45}$$
5 Simulation Results
The proposed scheme is implemented in simulation and in a real-time hardware-in-loop (HIL) environment, and hardware experiments are conducted to validate the simulation results. A fractional-horsepower IM is used for both the experimental and the simulation studies. Its parameters are given in Appendix 1. The controller gains of the DTC are tuned on a trial-and-error basis. The simulation results are obtained with a reference speed of 157 rad/s under no load. The BEC settles to the command speed smoothly and quickly without oscillations, as shown in Fig. 7a, b, whereas with the PI controller the speed reaches the command speed with transient oscillations, as shown in Fig. 6a, b. The motor speed reaches its steady state in 0.1 s with the BEC and in 0.6 s with the PI controller. The capability of the BEC based IM drive is tested by speed tracking. The speed of the IM tracks the command speed values 50-150-250-300 rad/s at the time intervals 0-0.2-0.5-0.8-1 s, as shown in Fig. 7c for the BEC and in Fig. 6c for the PI controller. When a sudden load is applied to the BEC and PI controller based SVM-DTC IM drives, the speed response changes suddenly and excessive stator current is drawn; as a result, a small dip is observed in the speed response, as shown in Figs. 6d and 7d. The load disturbance of 0.36 N-m is applied at a step time of 0.8 s. With the BEC, the speed recovers swiftly without any transients after the load disturbance. Figures 6e and 7e show that the BEC based SVM-DTC produces less torque ripple than the PI controller based SVM-DTC. The torque ripple of the motor is determined by

$$\%T = \frac{1}{N} \left[ \frac{T_{max1} - T_{min1}}{T_L} + \frac{T_{max2} - T_{min2}}{T_L} + \cdots + \frac{T_{maxN} - T_{minN}}{T_L} \right] \times 100 \tag{46}$$

The $T_{max}$ and $T_{min}$ points are taken close to the simulation time of 1 s, at which the torque ripple of the BEC based SVM-DTC is determined.
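Equation (46) is an average of peak-to-peak torque excursions normalized by the load torque. The Python sketch below shows one way to evaluate it; it is illustrative only, the function name is invented, and the sample values in the usage note are hypothetical rather than taken from the reported results.

```python
def torque_ripple_percent(t_max, t_min, t_load):
    """Percentage torque ripple of Eq. (46).

    t_max and t_min are equal-length sequences of the N local torque
    maxima and minima; t_load is the load torque T_L.
    """
    if len(t_max) != len(t_min) or len(t_max) == 0:
        raise ValueError("t_max and t_min must be non-empty and of equal length")
    n = len(t_max)
    total = sum((hi - lo) / t_load for hi, lo in zip(t_max, t_min))
    return 100.0 * total / n
```

For example, two hypothetical ripple cycles with maxima `[1.1, 1.2]`, minima `[0.9, 1.0]` and a load torque of `1.0` each span 0.2 of the load torque, so the function returns a ripple of about 20%.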
Thus, the BEC based SVM-DTC strategy gives a torque ripple of 7.89%, and the PI controller based SVM-DTC gives 14.8%. Figures 6f and 7f show the d-q axis stator flux locus, and the three-phase currents and three-phase voltages of the BEC and PI control based DTC-SVM are shown in Figs. 5, 6g and 7g, h.

Fig. 5 Block diagram of IM with BEC control [18]

$$\psi_{\mathrm{rip}-1q} = (V_{1,q} - V_a^*) \times T_1 \tag{47}$$

$$\psi_{\mathrm{rip}-2q} = (V_{2,q} - V_a^*) \times T_2 \tag{48}$$

$$\psi_{\mathrm{rip}-z} = -V_a^* \times T_2 \tag{49}$$

$$\psi_{\mathrm{rip}-1d} = V_{1,d} \times T_1 \tag{50}$$

$$\psi_{\mathrm{rip}-2d} = V_{2,d} \times T_2 \tag{51}$$
The flux ripple over the switching interval can be obtained by using (47)–(51). The flux ripple is shown in Figs. 6f and 7f; it is 6.5% for the BEC and 9.8% for the PI controller. Figure 7i illustrates the speed estimated by the MRAS speed observer: with a reference speed of 157 rad/s in the closed-loop system, the estimated speed is 157 rad/s. From the figures below it can be concluded that the BEC offers better performance in all the tests, with a rapid response, low steady-state error and low sensitivity to load disturbance. The torque and flux ripples are also reduced. The BEC thus overcomes the limitations observed with the PI controller.
Fig. 6 Tracking performance of PI controller based SVM DTC of sensorless IM drive with MRAS. The reference speed of 157 rad/s when no load is applied: a speed response, b zoomed speed response, c with different speed tracking. Applied step change in load of 0.36 N-m at 0.8 s: d speed response, e torque response, f d-q axes stator flux response, g three phase stator currents
6 Experimental Results The proposed BEC for sensorless control of IM drive by DTC-SVM Method with Adaptive System is conducted by using OPAL-RT hardware digital simulator (OpRTDS). The Mechanism by forming a closed loop access in between plant algorithms
(a)
(c)
(e)
(b)
(d)
(f)
Fig. 7 Tracking performance of BEC controller based SVM DTC of sensorless IM drive with MRAS. The reference speed of 157 rad/s when no load is applied: a speed response, b zoomed speed response, c with different speed tracking. Applied step change in load of 0.36 N-m at 0.8 s: d speed response, e torque response, f d-q axes stator flux response, g three phase stator currents, h phase voltages, i estimated speed with MRAS
Real-Time Implementation of Brain Emotional …
109
10
40 30
6 4
Phase voltage(V)
Phase currents(amp)
8
2 0 -2 -4 -6
10 0 -10 -20 -30
-8 -1 0
20
0
0.05
0.15
0.1
0.2
-40
Time(sec)
1.4
1.45
1.5
1.55
1.6
Time(sec)
(g)
(h)
Speed(rad/sec)
200
150
100
50
0
0
0.5
1
1.5
2
Time(sec)
(i)
Fig. 7 (continued)
and control algorithms in real time is called as rapid control prototyping (RCP) shown in Fig. 8 [19]. In real-time step of TS = 4 μs and switching frequency of fs = 19 kHz is chosen. The Induction motor Simulink model.mdl file divided into two subsystems. All computations which are necessary in the sub system are called
Fig. 8 RCP based experimental setup photograph for DTC induction motor drive
110
S. Savarapu and Y. Narri
subsystem master (SM). Plant reference signals and soft-line monitoring signals are kept in the other subsystem and it is called subsystem console (SC). In Figs. 9a, c and 10a, c. Illustrates the experimental results of speed, torque and flux response of BEC and PI based DTC-SVM of IM drive with reference speed of 157 rad/s. The load applied on the drive is a D.C generator connected with suitable number of lamps. Figures 9b, d and 10b, d illustrates the speed, torque and flux response of the BEC based SVM-DTC under loaded condition. By using outgoing channels of op-RTDS the speed and torque magnitude are measured in terms of voltage. It is observed that, the actual torque ripple corresponding to this 100mv which is calculated by using (46) is equal to 9.29% under loaded condition. By using PI controller based SVM DTC IM drive The torque ripple is 15.6% From the Fig. 10a, b. The voltage
Fig. 9 Experimental results of PI controller based SVM DTC sensorless IM drive a speed and torque response, b zoomed response of speed and torque, c d-q axes flux responses, d stator flux response
Fig. 10 Experimental results of BEC based SVM DTC sensorless IM drive a speed and torque response, b zoomed response of speed and torque, c d-q axes flux responses, d stator flux response
Fig. 11 Experimental results of sensorless IM drive by using PI controller a three phase voltages, b three phase currents. By using BEC controller, c three phase voltages, d three phase currents
Table 1 Comparative results
Controller          Results from                     Speed settling time   Torque ripple (%)   Flux ripple (%)   Speed drop on 60% load (%)
PI based SVM DTC    Simulation                       0.6                   14.8                9.8               40
PI based SVM DTC    Experimental (Opal-RT HIL RCP)   –                     15.6                12                47
BEC based SVM DTC   Simulation                       0.1                   7.89                6.5               5
BEC based SVM DTC   Experimental (Opal-RT HIL RCP)   –                     9.29                7.8               7
corresponding to the droop in speed is equal to 0.7 V on a 1:10 scale, so the actual speed droop is 0.7 × 10 = 7 rad/s. The experimental three-phase voltages and three-phase currents are shown in Fig. 11a–d. Table 1 compares the simulation and experimental results.
7 Conclusions
In this paper, the BEC has been effectively implemented in OP-RTDS hardware-in-the-loop (HIL) for an SVM DTC based IM drive under various test conditions, and the performance of the drive is examined. The inherent learning process of the BEC is presented for sensorless control of the SVM DTC induction motor. The BEC adapts well when tuning its parameters and shows a good self-learning mechanism under various test conditions. The BEC gives better results than the existing PI controller. In the various test conditions applied to the IM drive, the BEC gives dynamic efficiency when
it comes to a shorter speed settling time with reduced peak overshoot, reduced stator phase current harmonics and reduced electromagnetic torque ripple. This makes the BEC robust, effective and insensitive.
Appendix 1 Parameters of Induction Motor
Rating: Ps = 120 W, Vs = 36 V, Rs = 0.896 Ω, 3-phase AC, f = 120 Hz, Rr = 1.82 Ω, Is = 6 A, P = 4
Parameters: Lls = 1.94 mH, Llr = 2.45 mH, Lm = 69.3 mH.
References
1. Casadei, D., Profumo, F., Serra, G., Tani, A.: FOC and DTC: two viable schemes for induction motors torque control. IEEE Trans. Power Electron. 17(5), 779–787 (2002). https://doi.org/10.1109/TPEL.2002.802183
2. Takahashi, I., Noguchi, T.: A new quick-response and high-efficiency control strategy of an induction motor. IEEE Trans. Ind. Appl. 22(5), 820–827 (1986). https://doi.org/10.1109/TIA.1986.4504799
3. Buja, G.S., Kazmierkowski, M.P.: Direct torque control of PWM inverter-fed AC motors. IEEE Trans. Ind. Electron. 51, 744–757 (2004). https://doi.org/10.1109/TIE.2004.831717
4. Lascu, C., Boldea, I., Blaabjerg, F.: A modified direct torque control for induction motor sensorless drive. IEEE Trans. Ind. Appl. 36(1), 122–13 (2000). https://doi.org/10.1109/28.821806
5. Adamidis, A., Koustsogiannis, Z., Vagdatis, P.: Investigation of the performance of a variable-speed drive using direct torque control with space vector modulation. Taylor & Francis Trans. on Electric Power Components and Systems, pp. 1227–1243 (2011). https://doi.org/10.1080/15325008.2011.567214
6. Tripathi, A., Khambadkone, A.M., Panda, S.K.: Torque ripple analysis and dynamic performance of a space vector modulation based control method for AC-drives. IEEE Trans. Power Electron. 20, 485–492 (2005). https://doi.org/10.1109/TPEL.2004.842956
7. Kojabadi, H.M.: Simulation and experimental studies of model reference adaptive system for sensorless induction motor drive. Simul. Model. Pract. Theory 13, 451–464 (2005)
8. Schauder, C.: Adaptive speed identification for vector control of induction motors without rotational transducers. IEEE Trans. Ind. Appl. 28, 1054–1061 (1992)
9. Khan, M.R., Iqbal, I., Mukhtar, A.: MRAS-based sensorless control of a vector controlled five-phase induction motor drive. Electr. Power Syst. Res. 78(8), 1311–1321 (2008)
10. Khan, M.R., Iqbal, I.: MRAS-based sensorless control of series-connected five-phase two-motor drive system. Korean J. Electr. Eng. Technol. 3(2), 224–234 (2008)
11. Moren, J., Balkenius, C.: A computational model of emotional learning in the amygdala. In: Proceedings of 6th International Conference on Simulation Adaptive Behaviors, Cambridge, pp. 411–436 (2000)
12. Moren, J.: Emotion and Learning: A Computational Model of the Amygdala. Ph.D. dissertation, Lund University, Lund, Sweden (2002)
13. Beheshi, Z., MdHashim, S.Z.: A review on emotional learning and its utilization in control engineering. Int. J. Adv. Soft Comput. Appl. 2(2) (2010)
14. Daryabeigi, E., Arab Markade, G., Lucase, C.: Emotional Controller (BELBIC) for Electric Drives-A Review. IEEE Conference, pp. 2901–2907. https://doi.org/10.1109/IECON.2010.5674934
15. Lucas, C., Shahmirzadi, D., Sheikholeslami, N.: Introducing BELBIC: brain emotional learning based intelligent control. Int. J. Intell. Automat. Soft Comput. 10(1), 11–22 (2004). https://doi.org/10.1080/10798587.2004.10642862
16. Rahman, M.A., Milasi, R.M., Lucas, C., Arrabi, B.N., Radwan, T.S.: Implementation of emotional controller for interior permanent magnet synchronous motor drive. IEEE Trans. Ind. Appl. 44(5), 1466–1476 (2008). https://doi.org/10.1109/IAS.2006.256774
17. Qutubuddin, M.D., Yadaiah, N.: Modeling and implementation of brain emotional controller for permanent magnet synchronous motor drive. In: Engineering Applications of Artificial of Intelligence System, pp. 193–203 (2017). https://doi.org/10.1016/j.engappai.2017.02.007
18. Savarapu, S., Narri, Y.: High performance of brain emotional intelligent controller for DTC-SVM based sensorless induction motor drive. J. Supercomput. (2021). https://doi.org/10.1007/s11227-020-03556-9
19. Abourida, S., Belanger, J.: Real-Time platform for the control prototyping and simulation of power electronics and motor drives. In: Proceedings of 3rd International Conference on Modeling, Simulation and Applied Optimization, Sharjah, pp. 1–6 (2009)
Student Performance Prediction—A Data Science Approach Y. Sri Lalitha, Y. Gayathri, M. V. Aditya Nag, and Sk. Althaf Hussain Basha
Abstract Education is necessary to improve an individual's life with quality and brilliance. The education system has changed from conventional teaching/learning to activity-based learning. In spite of the necessary measures for effective learning, the education system is not able to reap the expected outcomes. Among various factors, the most vital factor that determines the reputation of an educational institution is student performance. Although a wide literature is available on the performance of students, it still lacks the tools or approaches needed to address the challenges faced in identifying low-performing students for pedagogical intervention and in ensuring on-time completion of their graduation. Predicting student performance well ahead of time is a challenging task, and very few studies are available. Identifying probable at-risk students at an early stage is helpful for corrective measures by the student, instructor and authorities. In this paper we present a novel machine learning approach and study the following: the influence of student background on performance, prediction of the first-year result, prediction of future-semester performance from the progressive performance to date, and prediction of performance based on categorization of related courses. The study uses real-world data from an engineering college; we experimented with different predictive models to show that the proposed model achieves improved accuracy.
Keywords Pre-requisite base performance prediction · Educational data · Prediction and analysis
Y. Sri Lalitha (B) · Y. Gayathri
Gokaraju Rangaraju Institute of Engineering and Technology, Hyderabad, India
M. V. Aditya Nag
Institute of Aeronautical Engineering, Dundigal, Hyderabad, India
Sk. Althaf Hussain Basha
A1 Global Institute of Engineering and Technology, Markapur, India
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2021 V. K. Gunjan and J. M. Zurada et al. (eds.), Modern Approaches in Machine Learning and Cognitive Science: A Walkthrough, Studies in Computational Intelligence 956, https://doi.org/10.1007/978-3-030-68291-0_10
1 Introduction
Student performance is considered the crux of any educational institute. The learning environment has changed from teaching/learning to active learning, and from face-to-face interaction to blended learning. In spite of a wide spectrum of pedagogical inventions and implementations, institutes are not able to achieve the expected outcomes. To improve the quality of students, institutions are striving to formulate hypotheses and find solutions to problems. With the significant growth in the demand for computational models, institutions are exploring data analytics to get better insights from data. The main objective of the institutes is to keep track of how students are performing in particular fields and to identify the areas where further training is required. Predictive analytics, a field of machine learning, when applied to educational data can help the authorities identify low-performing students well before a new semester begins. They can then prepare a detailed report with the predicted information for proper pedagogical intervention and corrective measures for high-risk students.
2 Literature Review
Deriving the required information from ever-increasing volumes of data is a challenging task. This has motivated researchers to look for techniques that give better insights into data. A number of researchers have attempted to study student data over the past decade, using methodologies such as regression analysis, classification, data mining approaches and correlation analysis. Student information systems are used to find the key aspects of student success and to provide valuable insights for students, instructors and administrators to improve student retention in higher education [1]. The authors predicted the grades for newly enrolled courses. A holistic study of past performance can help students choose their majors and courses of varying difficulty in a semester, and can also tell the instructor which students require assistance. Identifying low-performing students and providing timely assistance will improve student success rate and retention. The works in [2, 3] also discuss predicting future grades in newly enrolled courses and informing students and instructors with relevant insights. These works use traditional university scores as historical data and predict categorical grades ranging from A to F. Quiz and assignment scores on an on-line MOOC platform are considered for predicting scores and student retention [2]. Along with traditional university scores, student demographic information, course features and instructor statistics are considered to infer the student retention rate [3]. Another work on the retention rate in MCA, a postgraduate course in India, concluded that students' graduation grade and stream are important factors [4]. In another recent study, the authors formulate a custom mixed-membership multi-linear regression model (PMLR) to predict student grades in a traditional university setting [5]. A group of 50 students
enrolled for a course over 4 years was considered for a study with performance indicators such as “Assessments”, “Assignments”, “Mid Terms”, “Labs” and “End Exams”, and a decision tree model was applied to predict scores and high-risk students [6]. In other works, K-nearest neighbour is applied to predict the grade in a course, and linear regression with k-means is applied to predict grades [7, 8]. In another work, association rule mining is applied to find the performance indicators that influence student performance [9]. Another work applied Decision Tree, CART and CHAID algorithms to predict successful and unsuccessful students in New Zealand [10]. The study in [11] used students' regularity, internal marks and seminar performance to predict the test grade using ID3, CART and C4.5, and reported their classification accuracies. In another work, students' personal and demographic information and family income levels are considered for predicting performance by applying ID3, CART and C4.5. These works motivated us to create machine learning models for real GRIET data and to identify low-performing students at an early stage.
Problem Statement: Analyse the data to study the following:
(i) The influence of pre-engineering performance (board (SSC/CBSE), Intermediate) and background information (rural/urban or admission type) on engineering performance. Predict the first-year result and low-performing students, which may help reduce the number of failures through proper measures by instructors.
(ii) Categorization of students into Risk, Average and Advanced performance levels.
(iii) Grade prediction in a course and in a semester.
(iv) Categorization of courses based on pre-requisites and prediction of the course grades.
3 Machine Learning Models for Experimentation
Three classification algorithms, Support Vector Machine (SVM), Decision Tree (DT) and K-Nearest Neighbors (KNN) [12], were used, not for the purpose of ranking the algorithms but to see how each performs under our proposed assumptions. The SVM classifier is a kernel-based supervised learning model that classifies the data into two or more classes. SVM builds a model, maps the decision boundary for each class, and specifies the hyperplane that separates the different classes. Increasing the distance between the classes by increasing the hyperplane margin helps increase the classification accuracy. SVM can also be used to perform non-linear classification effectively. The KNN algorithm is an instance-based learning method that classifies data by closest similarity of feature vectors. A pairwise similarity measure is applied to identify the k most similar neighbours in the training set, and a data point is assigned the most common class among its k nearest neighbours. The Euclidean distance is used to measure the similarity between feature vectors. The DT classifier uses a series of decisions to determine the class label. The decision tree is based on a hierarchical decision scheme. The root node and internal nodes of the tree are the decision points, each selecting the feature on which the dataset is split. Leaf nodes represent the class labels, and the path from root to leaf specifies the classification rule. The decision tree is an efficient tool for solving classification and regression problems.
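The KNN procedure described above (Euclidean distance, then a majority vote among the k nearest neighbours) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used in the study: the function names, the two-feature toy records and the choice k = 3 are all hypothetical.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(train_x, train_y, query, k=3):
    """Return the most common label among the k training points nearest to query."""
    ranked = sorted(zip(train_x, train_y),
                    key=lambda pair: euclidean(pair[0], query))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Toy data (hypothetical): (SSC grade, Intermediate grade) -> first-year result.
train_x = [(9, 8), (8, 9), (7, 8), (3, 4), (4, 2), (2, 3)]
train_y = ["Pass", "Pass", "Pass", "Fail", "Fail", "Fail"]

print(knn_predict(train_x, train_y, (8, 8)))
print(knn_predict(train_x, train_y, (3, 3)))
```

In practice a library classifier would normally be preferred; the sketch only makes the distance and voting steps explicit.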
This paper presents the work as two methodologies: (i) analysis and prediction of first-year performance and (ii) prediction of semester-wise and course-wise low-performing students.
3.1 Methodology-I: Analysis and Prediction of First Year Performance
This section deals with data analysis and visualization, followed by first-year result prediction.
Dataset Collection: The dataset is collected from an engineering college with six streams of engineering, with features spanning from pre-engineering details to the courses. In this work the data from 2011 to 2018 is considered, which contains student marks/grades and semester performance. It contains details such as board (SSC, CBSE, ICSS, ICFS), urban/rural background, marks (SSC and Intermediate) and admission type (Counselling/Management), along with first-semester marks (Table 1).
Dataset Preparation: The dataset spans 8 years and contains discrepancies. Every 2 or 3 years the regulations change; a new regulation introduces additional courses or replaces obsolete ones. The evaluation system changed from marks to a grade scale (0–10), with 0 indicating a fail and 10 the top score. The data contained missing values and some special values in lieu of marks/grades. In the first year of engineering, students of all streams take the same courses; prediction on this data requires pre-engineering details along with first-semester marks. Dataset-1 therefore consists of 8080 records with 11 features after pre-processing.
Visualization and Analysis of Results: Tables 2 and 3 depict first-year data categorized by urban and rural background. Table 2 gives the stream-wise percentage of urban and rural students in the dataset, and Table 3 gives the percentage of low-performing (failing) urban and rural students. It is observed that students from rural places are prone to fail in the first year, most evidently in the Civil and IT streams. In contrast, failures among students from an urban background are less than five percent in the first year. The chart in Fig. 1 is a donut chart, which gives a complete view of the data distribution over different features of the dataset.
Table 1 Dataset
Stream    IT     CSE    ECE    Mech   Civil   EEE
Records   1003   2205   1951   1054   939     930

Table 2 Urban versus rural students percentage
Stream   Urban   Rural
IT       63.6    36.4
CSE      71.5    28.5
ECE      71      29
Mech     56.2    43.8
Civil    68.1    31.9
EEE      69.8    30.2

Table 3 Urban versus rural failure %
Stream   % Urban failures   % Rural failures
IT       3.3                18.9
CSE      1.1                13.2
ECE      0.7                9.9
Mech     4.6                13.4
Civil    3.6                28.7
EEE      2.3                17.4

Fig. 1 Insights of low performance civil stream

The chart depicts the data distribution of failed candidates in the Civil stream. Intermediate and SSC marks are categorised into two scales: grade above 6 and grade below 6. Our assumption is that students graded below 6 in Board and Intermediate are low-performing students in Engineering. Therefore the rural/urban factor has a high impact on results. We have also visualized failures by admission type; the admission types are counselling and donation. From Fig. 2 it is observed that, excluding ECE and CSE, above 15% of students with admission type donation are low performing. In ECE and CSE, at most 7% of students with admission type donation are low performing. The reason may be that students opted for donation seats due to good prospects for
Fig. 2 Insights of admission on failure performance (chart: Failures based on Admission Type; IT 21%, CSE 7%, ECE 5%, Mech 29%, Civil 23%, EEE 15%)
streams in terms of employability and on-demand streams. From this analysis, rural or urban background and admission type donation have a significant impact on first-year engineering performance. This rural/urban influence disappears from the 2nd year of engineering onwards. We built three prediction models using the SVM, Decision Tree and K-NN algorithms and predicted the results. The training datasets are drawn from Dataset-1. The experiments on six streams with the three machine learning models show accuracy ranging between 87 and 98%, with the lowest accuracy of 87% observed in the Civil stream for the KNN model. Among all models, SVM exhibited better results on the whole dataset, with accuracy ranging between 91 and 98%, the 91% again being for the Civil stream. Civil is a new stream introduced in 2008, whereas all other streams were introduced in 1997; the Civil stream is in a transition state towards attaining the benchmark accuracy of 98% (Fig. 3).
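The group-wise failure percentages used in this analysis (by stream and background in Table 3, by admission type in Fig. 2) can be computed with a simple grouping pass. The sketch below is illustrative only: the record field names and toy records are hypothetical, and it assumes each percentage is the share of failed students within a group, which the chapter does not state explicitly.

```python
from collections import defaultdict


def failure_rate_by_group(records, keys):
    """Percentage of failed students within each group defined by keys."""
    totals = defaultdict(int)
    fails = defaultdict(int)
    for rec in records:
        group = tuple(rec[k] for k in keys)
        totals[group] += 1
        fails[group] += 1 if rec["failed"] else 0
    return {g: 100.0 * fails[g] / totals[g] for g in totals}


# Hypothetical records; the field names are assumptions for illustration.
records = [
    {"stream": "Civil", "background": "Rural", "failed": True},
    {"stream": "Civil", "background": "Rural", "failed": False},
    {"stream": "Civil", "background": "Urban", "failed": False},
    {"stream": "IT", "background": "Urban", "failed": False},
]

rates = failure_rate_by_group(records, ["stream", "background"])
for group, rate in sorted(rates.items()):
    print(group, f"{rate:.1f}%")
```

The same function can group by admission type by passing a different key list, provided the records carry such a field.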
3.2 Methodology-II: Predicting Semester/Course Wise Low Performing Students
This section deals with identifying low performing students using the following approaches.
(a) Prediction based on progressive performance over the past semesters
(b) Prediction based on progressive performance by categorization of courses
Dataset-2: The datasets from 2011 to 2016 are considered for predicting test grades, and we restrict this study to the CSE and IT streams. Courses such as VEE, Gender Sensitisation and Environmental Science are excluded from our study: these are mandatory courses in which students may tend to make just enough effort to qualify, and they are hence considered less informative for our study. Data pre-processing is applied to bring the data into a consistent form. Normalization is applied to bring the values to
Fig. 3 Accuracy of models on first year data (chart title: Prediction Accuracy Comparison, First Year Result; x-axis: streams IT, CSE, ECE, Mech, Civil, EEE; series: SVM, Decision Tree, K-NN; y-axis: accuracy, 80–100%)
a common scale. Null or missing values are replaced by the mode for categorical variables and by the mean or median for numeric variables: the median is used for data with outliers or skew, otherwise the mean. For the study of prediction models, the following approaches are considered in dataset preparation.
Model (a), prediction based on progressive performance over the past semesters: predicting grades from past data alone has not led to a better solution. Including the performance of a student in previous tests, known as the progressive improvement of a student, has a significant influence on performance. This approach uses the Progressive Performance of All Courses (PPAC) to date; that is, if the grade of the 3rd year 2nd semester course titled “Design and Analysis of Algorithms” (DAA) is to be predicted, the model considers the grades of the various courses up to the 3rd year 1st semester. The dataset contains course attributes ranging from the 1st semester to the 5th semester.
Model (b) first categorizes courses into one of two categories (Analytical or Theoretical). It identifies the category and the pre-requisite courses of the given course, and then finds the courses from that category that share these pre-requisites. From these, it forms a list of courses in which the student has already been graded, and this list is used to predict the grade of the current course. This approach is known as Progressive Performance based on Related Courses (PPRC) [13, 14]. Figure 4 depicts the relation between dependent courses. If the grade of the 3rd year 2nd semester course DAA is to be predicted, the model first identifies the category of the course as “Analytical”. Then all the related courses with pre-requisites similar to those of DAA are considered, up to the 3rd year 1st semester, to predict the grade. Figure 4 is for DAA, the
Fig. 4 Course grouping (nodes shown: C Language, Problem Solving ability, Data Structures, DBMS, Computer Organization, Operating Systems, Advanced Unix Programming)
Table 4 Grade scaling
Range   Proficiency level
0–4     Risk
5–7     Average
7–10    Advanced
pre-requisites are C Language and Data Structures. While predicting the grade for DAA, the grades in C, Data Structures, Operating Systems and DBMS are the features considered for prediction.
Model (c) is Progressive Performance with Grade Scaling (PPGS). Since the objective of this work is to predict low-performing students, identifying the grades alone is not sufficient. We have categorized the grades, as presented in Table 4, into three levels (Risk, Average and Advanced) based on the grades secured in courses/semesters [14]. This work models all three approaches. The algorithm in Fig. 5 presents the progressive performance of students over the past semesters with grade scaling, in order to identify the proficiency level of a student and identify weak students course-wise.
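The dataset-preparation steps of this section (filling missing values, then mapping grades to the proficiency levels of Table 4) can be sketched as follows. This is a hedged illustration, not the authors' code: the function names are hypothetical, the skew test (mean and median more than a quarter of a standard deviation apart) is an assumption introduced here, and because the ranges in Table 4 overlap at 7, the sketch assumes grades below 5 are Risk, grades from 5 to 7 are Average, and grades above 7 are Advanced.

```python
from collections import Counter
from statistics import mean, median, pstdev


def fill_categorical(values):
    """Replace None with the most frequent observed value (the mode)."""
    observed = [v for v in values if v is not None]
    mode = Counter(observed).most_common(1)[0][0]
    return [mode if v is None else v for v in values]


def fill_numeric(values, skew_tol=0.25):
    """Replace None with the median if the column looks skewed, else the mean.

    Skew test (an assumption, not from the chapter): the mean and median
    differ by more than skew_tol standard deviations.
    """
    observed = [v for v in values if v is not None]
    mu, med, sd = mean(observed), median(observed), pstdev(observed)
    fill = med if sd > 0 and abs(mu - med) > skew_tol * sd else mu
    return [fill if v is None else v for v in values]


def proficiency_level(grade):
    """Map a 0-10 grade to a Table 4 level (boundary handling is assumed)."""
    if grade < 5:
        return "Risk"
    if grade <= 7:
        return "Average"
    return "Advanced"


print(fill_categorical(["Urban", None, "Rural", "Urban"]))
print(fill_numeric([1, 1, 1, 1, 10, None]))
print([proficiency_level(g) for g in (4, 6, 7, 9)])
```

Replacing raw grades by three levels turns grade prediction into a three-class problem, which is the grade-scaling idea behind the GS variants evaluated later.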
4 Results Analysis
The datasets prepared using the above approaches are modelled using the SVM, KNN and Decision Tree algorithms. Table 5 gives the accuracy of all four approaches with SVM modelling; it reports the prediction accuracy for different courses
Algorithm: Progressive Performance Based on Related Courses with Grade Scaling
Initially, courses are categorized as Analytical and Theoretical
Let A_List, T_List represent the Analytical and Theoretical categories
Let i be the course for grade prediction
Identify CC_i and PR_i    # CC_i is the course category of i (either A_List or T_List); PR_i is the set of pre-requisites of course i
for each PR_i of i
    identify the courses from CC_i with PR_i, append to grp_list    # grp_list is the collection of courses from CC_i with the same pre-requisites as i
let f_list be the completed courses of grp_list
for each j in f_list
    for each student_grade k in j
        categorize the student_grade as (Low, Average, Skilled)
Build models with grp_list
Predict grade scale
Evaluate the models
Fig. 5 Algorithm for GS-PRC
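The course-selection step of the algorithm in Fig. 5 can be sketched in Python as below. The catalogue, its categories and its pre-requisite lists are hypothetical values chosen so that the DAA example from the text is reproduced; they are not the college's actual curriculum. The sketch also assumes that a course is "related" if it is itself a pre-requisite of the target, or if it belongs to the same category and shares at least one pre-requisite with it.

```python
# Hypothetical catalogue: course -> (category, list of pre-requisites).
CATALOGUE = {
    "C Language": ("Analytical", []),
    "Data Structures": ("Analytical", ["C Language"]),
    "DBMS": ("Analytical", ["Data Structures"]),
    "Operating Systems": ("Analytical", ["C Language"]),
    "Software Engineering": ("Theoretical", []),
    "DAA": ("Analytical", ["C Language", "Data Structures"]),
}


def related_courses(target, completed, catalogue=CATALOGUE):
    """Completed courses that are pre-requisites of target, or that share
    its category and at least one of its pre-requisites."""
    category, prereqs = catalogue[target]
    related = set()
    for course, (cat, course_prereqs) in catalogue.items():
        if course == target or course not in completed:
            continue
        is_prereq = course in prereqs
        shares_prereq = cat == category and bool(set(course_prereqs) & set(prereqs))
        if is_prereq or shares_prereq:
            related.add(course)
    return sorted(related)


def feature_vector(student_grades, target, catalogue=CATALOGUE):
    """Grades of the related courses, in a fixed (sorted) course order."""
    courses = related_courses(target, set(student_grades), catalogue)
    return [student_grades[c] for c in courses]


grades = {"C Language": 8, "Data Structures": 7, "DBMS": 6,
          "Operating Systems": 5, "Software Engineering": 9}
print(related_courses("DAA", set(grades)))
print(feature_vector(grades, "DAA"))
```

Only the grades of the selected courses are passed to the classifier, so the feature set is smaller than when all completed courses are used.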
Table 5 Semester grade prediction course wise (accuracy, %)
Course   PPAC   PPRC   GS-PAC   GS-PRC
OOAD     57.4   66.5   80.2     87
ACD      61.5   66.2   71.7     80.5
ALP      58.8   65.2   77       86
CNS      53.5   64     73       82
DWDM     59.8   66.2   75       82.4
in a semester. From the table it can be seen that categorizing the related courses based on the pre-requisites of the course gives a significant improvement in accuracy. Figure 6 depicts the average prediction accuracy on different courses of a semester with the three machine learning models (SVM, KNN and DT). Among the three models, SVM shows slightly better results. Categorization of courses together with grade scaling gives a significant improvement in identifying low-performing students at an early stage. The method is efficient since the number of features is reduced by categorization.
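The improvement visible in Table 5 can be quantified by averaging each column over the five courses. The short sketch below copies the accuracies from Table 5; the function names are illustrative and not part of the study.

```python
# Course-wise SVM accuracies (%) copied from Table 5.
TABLE_5 = {
    "OOAD": {"PPAC": 57.4, "PPRC": 66.5, "GS-PAC": 80.2, "GS-PRC": 87.0},
    "ACD":  {"PPAC": 61.5, "PPRC": 66.2, "GS-PAC": 71.7, "GS-PRC": 80.5},
    "ALP":  {"PPAC": 58.8, "PPRC": 65.2, "GS-PAC": 77.0, "GS-PRC": 86.0},
    "CNS":  {"PPAC": 53.5, "PPRC": 64.0, "GS-PAC": 73.0, "GS-PRC": 82.0},
    "DWDM": {"PPAC": 59.8, "PPRC": 66.2, "GS-PAC": 75.0, "GS-PRC": 82.4},
}


def mean_accuracy(table):
    """Average accuracy of each approach over all courses in the table."""
    approaches = list(next(iter(table.values())).keys())
    return {a: sum(row[a] for row in table.values()) / len(table)
            for a in approaches}


def mean_gain(table, baseline, improved):
    """Difference in mean accuracy between two approaches (percentage points)."""
    means = mean_accuracy(table)
    return means[improved] - means[baseline]


for approach, value in mean_accuracy(TABLE_5).items():
    print(f"{approach}: {value:.2f}")
print(f"PPRC over PPAC: {mean_gain(TABLE_5, 'PPAC', 'PPRC'):.2f} points")
```

With the Table 5 figures, the column means are about 58.2 (PPAC), 65.6 (PPRC), 75.4 (GS-PAC) and 83.6 (GS-PRC), so selecting related courses adds roughly 7 to 8 points both with and without grade scaling.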
Fig. 6 Comparison of machine learning models (chart title: Grade Scale - Semester Result; x-axis: DT, KNN, SVM; series: GS-PAC, GS-PRC; y-axis: accuracy, 60–85%; plotted values: 70.54, 73.9, 75.38, 79.26, 81.66, 83.38)
5 Conclusions and Future Work
In this work we first studied the effect of pre-engineering performance along with first-semester performance to predict low-performing students in the first year of engineering, and found that admission type and rural/urban background have a significant impact on student performance in the first year. In subsequent years this influence disappears. This study was done for all streams of engineering. In another study we proposed to predict grades course-wise and semester-wise using two approaches called PPAC and PPRC. It is observed that accuracy improves with PPRC, where the related courses based on the pre-requisites of a course are used as features to predict the grades. Since the objective is to identify low-performing students, we used grade scaling to categorize the grades into low, average and skilled levels. The methods exhibited above 80% accuracy with the proposed PRC approach. In future work we would like to explore deep learning models for improved accuracy.
References
1. Tinto, V.: Research and practice of student retention: what next? J. College Stud. Retent. Res. Theor. Pract. 8(1), 1–19 (2006)
2. Carey, K. et al.: Choosing to improve: voices from colleges and universities with better graduation rates (2005)
3. Grayson, A., Miller, H., Clarke, D.D.: Identifying barriers to help-seeking: a qualitative analysis of students’ preparedness to seek help from tutors. Br. J. Guid. Counsell. 26(2), 237–253 (1998)
4. Yadav, S.K., Bharadwaj, B.K., Pal, S.: Mining educational data to predict student’s retention: a comparative study. Int. J. Comput. Sci. Inf. Secur. (IJCSIS) 10(2) (2012)
5. Elbadrawy, S.R.S., Karypis, G.: Personalized multi-regression models for predicting students’ performance in course activities. In: Proceedings of the Fifth International Conference on Learning Analytics and Knowledge, LAK’15 (2015)
6. Baradwaj, B.K., Pal, S.: Mining educational data to analyze students’ performance. (IJACSA) Int. J. Adv. Comput. Sci. Appl. 2(6) (2011)
7. Pardos, Z.A., Wang, Q.Y., Trivedi, S.: The Real World Significance of Performance Prediction. In: International Educational Data Mining Society (2012)
8. Zhang, L., Rangwala, H.: Early identification of at-risk students using iterative logistic regression. In: Penstein Rose, C., Mavrikis, M., Martinez-Maldonado, R., Porayska-Pomsta, K., Hoppe, H.U., McLaren, B., Luckin, R., du Boulay, B. (eds.) Artificial Intelligence in Education, vol. 10947, pp. 613–626. Association for Computing Machinery (2018)
9. Oladipupo, O.O., Oyelade, O.J.: Knowledge discovery from students’ result repository: association rule mining approach. IJCSS 4(2) (2016)
10. Kovacic, Z.J.: Early prediction of student success: mining student enrollment data. In: Proceedings of Informing Science and IT Education Conference (2010)
11. Yadev, S.K., Pal, S.: Data Mining: a prediction for performance improvement of engineering students using classification. World Comput. Sci. Inf. Technol. (WCSIT) 2(2), 51–56 (2012)
12. Sri Lalitha, Y., Govardhan, A.: Semantic framework for text clustering with neighbors. In: ICT and Critical Infrastructure: Proceedings of the 48th Annual Convention of CSI, Volume II, Advances in Intelligent Systems and Computing 249, pp. 261–271. Springer International Publishing, Switzerland (2013). ISBN: 978-3-319-03095-1
13. Xu, J., Xing, T., van der Schaar, M.: Personalized course sequence recommendations. IEEE Trans. Signal Process. 64(20), 5340–5352 (2016)
14. Yeung, C.-K., Yeung, D.-Y.: Addressing two problems in deep knowledge tracing via prediction-consistent regularization. In: Proceedings of the Fifth Annual ACM Conference on Learning at Scale, pp. 1–10. Association for Computing Machinery (2018). https://doi.org/10.1145/3231644.3231647
HARfog: An Ensemble Deep Learning Model for Activity Recognition Leveraging IoT and Fog Architectures R. Raja Subramanian and V. Vasudevan
Abstract Ambulatory monitoring has become predominant in today’s world, where monitoring and recognition of the activities and physiological parameters of human beings are required. Continuous monitoring of the physiological parameters of elderly people, infected children and patients in remote places paves the way for effective prediction and prevention of diseases. Activity recognition is essential for mentally challenged people, so that they can be cautioned during vulnerable activities. An effective and efficient Internet of Medical Things (IoMT) architecture leveraging fog computing and deep learning algorithms is required to handle such time-critical tasks. The gateway layer, composed of commodity devices, contains algorithms for processing and analyzing the data extracted from the human being via sensors. The response time for the task can be minimized by reducing the number of hits to the cloud layer, avoiding unnecessary transit time for readings from physiological sensors. In this paper, we develop an ensemble deep learning paradigm to recognize human activities through sensors. The paradigm is orchestrated with a fog-assisted cloud computing framework, FogBus. The accuracy of the model on the mHealth dataset is observed and the performance of the architecture is evaluated using the FogBus framework.
Keywords Activity recognition · Fog computing · Ambulatory healthcare · Deep learning
R. Raja Subramanian (B) · V. Vasudevan
Department of Computer Science and Engineering, Kalasalingam Academy of Research and Education, Virudhunagar, Tamil Nadu, India
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2021 V. K. Gunjan and J. M. Zurada et al. (eds.), Modern Approaches in Machine Learning and Cognitive Science: A Walkthrough, Studies in Computational Intelligence 956, https://doi.org/10.1007/978-3-030-68291-0_11
1 Introduction
Human Activity Recognition (HAR) plays a significant role in ambulatory health care, for continuous monitoring of patients suffering from mental illness, for physical fitness, and for assisting aged people in remote locations. HAR also finds use in human surveillance and gaming. Various learning algorithms strive to infer simple
and complex human activities including running, walking, sleeping or other specific motion deemed to be monitored by a remote physician. Monitoring of such activities probe the presence of illness or preserve the lifestyle of typical patient for mental and physical fitness. HAR serves as a essential framework to capture ambulatory activities, body motion and actions using the wide variety of data captured through the edge devices or sensors [1, 2]. Researches in HAR is classified based on the type of data taken for analysis from edge device. Various studies are carried out in the literature to recognize human activity from videos, with high-definition cameras fitted in patient’s location [3, 4]. With the development of Internet of Medical Things (IoMT) technologies, activity recognition framework is redesigned with edge devices fitted on to the patient body in appropriate locations including ankles, etc. These edge devices sense physiological parameters, other motions for effective ambulatory care and activity recognition. With the rapid development of smart phone usages, people are often convened with a considerable amount of processing and memory capacities through their smart phones. These mobile devices do not have the required resources to perform the analytics tasks for activity recognition. Hence a framework is required to draw the raw data from sensor devices at the edge, store and analyze the data and provide recognition results back to the applications including mobiles, laptops, among others. The process becomes critical for sensitive applications pertaining to medical analysis like HAR, where the response time should be as small as possible. Analytics requires high processing and memory power, as the sensed information need to be offloaded and stored in cloud. The stored data need to be analyzed using various learning algorithms. 
The effectiveness of the algorithm is a significant requirement, as HAR is a critical medical application in which wrong results can have adverse effects. Hence complex learning strategies employing multiple algorithms are usually involved. The time complexity of the algorithm is the next influential factor: a result obtained after the deadline, however accurate, is of little use to the user, especially in time-critical applications like HAR. To handle such velocity of data and aid rapid analysis, a cloud-only architecture is not sufficient. Hence a gateway-leveraged architecture encapsulating an edge layer with fog nodes is required. These fog nodes need to handle and process requests with minimum hits to the cloud. The insight layer in the IoT architecture, with its various learning algorithms that act on the data, needs to be activated. The key contributions of our work include:
i. We propose a cloud-fog interoperable framework for IoMT applications with special focus on HAR.
ii. A deep learning model to effectively perform the human activity recognition task.
iii. Performance evaluation of the proposed application with FogBus.
The rest of the paper is organized as follows: Sect. 2 describes the related work pertaining to HAR. Section 3 introduces the background technologies. Section 4 presents the proposed architecture for HAR. The empirical evaluation of the proposed architecture is given in Sect. 5, followed by the conclusion and scope for further research.
2 Related Work
Advancements in IoMT applications are typically reflected in the success of various medical scenarios, especially in elderly care. In IoMT-based health care systems for activity recognition, various studies have been carried out with appropriate sensors at the edge. The researchers in [5] calibrated a model to measure blood pressure and heart rate and send the readings to the hospital at a specified rate per day. This helps the physician to stay confident about the patient's condition from a remote location and to plan the necessary treatment changes accordingly. Video sensors are used in [4] to aid visually challenged people in their daily lives: a model extracts features from video frames and directs the user based on the analysis. Inferring human activity or the surrounding environment from video is typically complex in both time and space, and it is difficult to set up a confined environment with multiple cameras for such people. Various health care systems designed in Europe and Japan leverage sensors fitted onto the body of elderly or physically challenged people to recognize activity or extract physiological parameters. However, it is tedious for the patient to be fitted with multiple sensors. Hence lightweight, trouble-free sensors are placed on the human body at prominent locations. These sensors collect data continuously and send it to the analysis engine. The responses to the data should be accurate and rapid. To this end, researchers concentrate on the use of deep learning paradigms for sensor data analysis [6]. Unlike classical machine learning algorithms such as k-nearest neighbors and support vector machines, deep learning applies efficiently to a wide range of applications that process image, speech and text data [7–9].
With various categories of deep learning algorithms available, including generative, discriminative and hybrid, there is a need to choose an appropriate algorithm for the activity recognition problem [10, 11]. Generative algorithms represent sensor data dependencies in graphical notation using nodes and arcs. Generative models work on top of pre-trained models acting on unlabeled data. The statistical distributions derived from the unseen data can then be tuned with labelled datasets, and the resulting distributions are passed to classification algorithms [5]. Gaussian Mixture models, Autoencoders and Restricted Boltzmann Machines are common deep generative models. Discriminative models, on the other hand, work on labelled datasets and act as robust distinguishers for classification purposes. Convolutional Neural Networks, Recurrent Neural Networks and Deep Neural Networks are the main candidates in the discriminative category. A hybrid of generative and discriminative models feeds the effective feature outputs of generative models into efficient discriminative models [12]. Hybrid algorithms strive to provide better results than atomic models. The sensors are fitted onto the body of the patient at appropriate locations, depending on the nature of the activity or vital sign to be inferred. Ubiquitous technologies have produced smart watches, smart shirts and smart bands that are fitted with sensors and kept in close proximity to the human body. These edge devices embedded with inertial sensors, health
sensors or environmental sensors provide useful information for activity recognition. The inertial sensors typically used for HAR are the accelerometer, gyroscope and magnetometer, which provide data pertaining to motion. Recording vital signs of the human body requires health sensing components such as the Electrocardiogram (ECG), Electromyogram (EMG) and Electroencephalograph (EEG), among others. Temperature, pressure and humidity can be measured with the corresponding environmental sensors. In this research, we use inertial sensors to detect human motion activities including sitting, standing, walking, jogging and climbing up/down stairs, and the ECG health sensor to detect abnormalities created in the body by the motion. Subsequent analysis of the retrieved data can be performed using handcrafted features based on statistical measures or deep learning-based feature extraction techniques, among others. Handcrafted features may not provide the features essential for feature tracking and recognition; they yield a statistical model with deterministic outputs. Feature extraction using deep learning algorithms, on the other hand, provides features at various depths, corresponding to the hidden layers. We propose an architecture that uses deep learning models to extract features and passes the features to appropriate classification algorithms. Various deep learning algorithms, including CNN, RNN and pretrained models, are used for activity recognition. The extracted features can be passed through a dimensionality reduction algorithm and subsequently classified using Artificial Neural Networks, Support Vector Machines (SVM), clustering techniques or Random Forest algorithms, among others. Beyond the accuracy of these feature extraction and classification algorithms, the efficiency of the model is the main factor of focus in such ambient and ambulatory applications.
Hence an architecture is required that supports faster responses with such deep learning algorithms for feature extraction and classification. A fog-assisted learning architecture is needed to develop a monitoring and inference model with better response times.
3 Background Technologies
FogBus [13] is a framework for modeling fog-assisted cloud systems. FogBus comprises four QoS supports: security, resource allocation/management, cloud management and data storage. FogBus can be integrated with a smart watch or band carrying the inertial and health sensors pertaining to HAR. The data extracted at the edge is transferred to the fog worker nodes. The group of worker nodes is managed by a cluster head, which handles resource management and job allocation. Security and data integrity are ensured using blockchain technologies coupled with encryption algorithms. HTTP RESTful APIs are used for communication among the fog nodes. Storage and processing are centered on the Aneka platform, which provides APIs for seamless deployment of virtual resources as fog nodes onto the cloud. The cloud platform is employed through dynamic scheduling, which provides the means to use the local resources effectively. Thus, the commodity systems in the gateway layer, close to the edge, process the data and provide the
result, thereby decreasing the response time by avoiding the data transit time to the cloud. The virtual machines are usually obtained under an Infrastructure-as-a-Service model and act as resources to process the data. Among the programming models supported by Aneka [14], namely the Distributed Thread model, Parameter Sweep model, Bag of Tasks model and MapReduce model, we use the Bag of Tasks model for our system architecture. Fog nodes are leveraged via FogBus and cloud integration is sought through Aneka.
4 System Architecture for HAR
We propose a cloud-fog interoperable model, named HARfog, comprising the edge sensors that acquire data from the human body, the set of gateway nodes with deep learning models, and the central cloud. The architecture, as shown in Fig. 1, is composed of the following modules:
4.1 Sensors at the Edge
HARfog requires inertial sensors, including the gyroscope and the magnetometer, and the ECG sensor. The sensor devices are fitted onto the body of the patient at appropriate locations, or can be deployed through a smart watch or smart band. In this research, we have used external placement of sensors at the ankles and hands. The sensors collect information and transfer it to the gateway layer.
Fig. 1 HARfog architecture
4.2 Gateway Layer
The gateway layer includes mobile phones, tablets or laptops acting as fog nodes, which collect the sensor data and transfer it to the worker/cluster head nodes. The cluster head receives the job requests and input data from the gateway fog nodes and is responsible for efficiently scheduling each job to an appropriate worker. An arbitration module periodically monitors the load at each worker node and records the usage statistics for effective resource utilization. The cluster head is also augmented with encryption algorithms and secure channels based on blockchain technology. Job allocation to each worker is decided at run time. The worker nodes are realized with Raspberry Pi systems. These commodity systems perform the jobs allocated by the cluster head. The worker nodes host the deep learning algorithm for HAR, which is used to process and analyze the input data.
4.3 Cloud Layer
The fog-assisted model achieves better response times when all job requests hit at the gateway nodes. When the gateway nodes are overloaded, or when the commodity systems cannot handle or reach the required data, the job is transferred to the central cloud data center.
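The interplay between the cluster head and the cloud layer described above can be illustrated with a minimal sketch. It assumes a least-loaded allocation policy and an overload threshold of 0.8; neither is specified in the chapter, and the `Worker` class, the `dispatch` function and the node names are hypothetical and not part of the FogBus API.

```python
from dataclasses import dataclass


@dataclass
class Worker:
    name: str
    load: float  # fraction of capacity in use, 0.0 to 1.0 (hypothetical metric)


def dispatch(workers, overload_threshold=0.8):
    """Return the name of the node that should run the next job.

    Assumed policy: pick the least-loaded worker below the threshold,
    and fall back to the cloud when every worker is overloaded.
    """
    candidates = [w for w in workers if w.load < overload_threshold]
    if not candidates:
        return "cloud"
    return min(candidates, key=lambda w: w.load).name


# Hypothetical usage statistics recorded by the arbitration module.
workers = [Worker("pi-1", 0.35), Worker("pi-2", 0.10), Worker("pi-3", 0.95)]
print(dispatch(workers))
```

With these illustrative loads the job goes to the least-loaded worker; if every worker reports a load at or above the threshold, the sketch hands the job to the cloud instead.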
4.4 Deep Learning Module
The processing and analysis of the dataset are performed using the deep learning paradigm. The model is first trained on the sensor data in the mHealth dataset. The data is divided into training, testing and validation sets in the ratio 70:20:10, respectively. The trained model is made available on all the nodes. During testing, processing and analysis take place at each worker node according to the jobs allocated to it. We use an ensemble model and combine the results of the nodes through a maximum voting scheme. This technique improves the overall accuracy of the model. Empirical evaluation shows that the ensemble model provides better accuracy than the atomic models, but it is subject to higher response times and network overheads.
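The maximum voting scheme mentioned above can be sketched as follows. This is an illustrative reading rather than the authors' implementation: the function names and the sample predictions are hypothetical, and the tie-breaking rule (the tied label that appears first wins) is an assumption.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine per-node labels for one sample; ties go to the label seen first."""
    counts = Counter(predictions)
    best = max(counts.values())
    for label in predictions:
        if counts[label] == best:
            return label


def ensemble_predict(per_node_predictions):
    """per_node_predictions[i][j] is node i's predicted label for sample j."""
    return [majority_vote(list(votes)) for votes in zip(*per_node_predictions)]


# Hypothetical outputs of three worker nodes for three samples.
node_outputs = [
    ["walking", "sitting", "jogging"],
    ["walking", "standing", "jogging"],
    ["running", "sitting", "jogging"],
]
print(ensemble_predict(node_outputs))  # ['walking', 'sitting', 'jogging']
```

Each sample requires one prediction from every participating node before the vote can be taken, which is consistent with the higher response times and network overheads noted for the ensemble mode.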
5 Implementation and Evaluation
The deep learning algorithm is implemented with the scikit-learn package for Python. The network comprises an input layer with 23 features, an output layer with 12 outputs, and 3
Fig. 2 a Training accuracy, b testing accuracy, with number of edge nodes
hidden layers. Each hidden layer comprises 20 fully connected neurons. The output layer is a softmax layer with N outputs, where N corresponds to the number of activities to be classified. We use the Adam optimizer and the ReLU activation function. The HARfog model is evaluated using the FogBus framework. The model accuracy, latency and execution time are measured. The mHealth dataset [15, 16] consists of 23 attributes of sensor information classified into 12 activities: standing, sitting, lying down, walking, climbing stairs, waist bends forward, frontal arm elevation, knee bending, cycling, jogging, running, and jump front and back. Each node applies the knowledge base acquired during training to the data present at that node. The predictive analytics is performed in parallel by multiple nodes. The bagger combines the results of the nodes through the maximum voting scheme. The training and testing accuracies of the prediction model with respect to the number of edge nodes are depicted in Fig. 2a, b, respectively. Accuracy is calculated as the percentage of activities correctly classified out of the total number of activities. It is evident that the training time decreases as the number of nodes increases. Distributing the training knowledge across the edge nodes decreases the testing accuracy. The timing parameters, namely latency, execution time, arbitration time and jitter, are observed for different fog scenarios. Five scenarios are observed: Master only, 1 edge node, 2 edge nodes, ensemble mode and cloud. It is evident from Fig. 3a that latency is highest with cloud execution, due to the data transit time from edge to cloud. Data processed close to the edge, at the master node, has low latency. The latency increases linearly with the number of nodes and with ensemble models. The execution times for the different fog scenarios are depicted in Fig. 3b.
It is evident that data processing at the cloud layer is faster than execution on the commodity devices at the gateway. Hence there needs to be a trade-off between response time and processing time in the use of gateway nodes, with respect to data transit time. For a data hit at the gateway, the response time is lowered by avoiding the data transit time. For data misses, edge nodes or the cloud are sought, which increases the response time by the data transit time. Transit time is maximum for cloud execution, but data processing happens at a faster rate. The efficiency of the model increases when the data is processed close to the edge.
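The shape of the network described at the start of this section (23 inputs, three hidden layers of 20 units with ReLU, and a 12-way softmax output) can be made concrete with a forward pass in plain Python. This is a sketch only: the weights are random and untrained, the Adam training step is omitted, and the function names, the seed and the sample input are assumptions for illustration.

```python
import math
import random

LAYER_SIZES = [23, 20, 20, 20, 12]  # input, three hidden layers, output


def init_params(sizes, seed=0):
    """Create small random weights and zero biases for each layer (untrained)."""
    rng = random.Random(seed)
    params = []
    for n_in, n_out in zip(sizes, sizes[1:]):
        weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
        biases = [0.0] * n_out
        params.append((weights, biases))
    return params


def softmax(z):
    """Numerically stable softmax over a list of scores."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def forward(params, x):
    """ReLU on hidden layers, softmax on the output layer."""
    a = x
    for i, (weights, biases) in enumerate(params):
        z = [sum(w * v for w, v in zip(row, a)) + b for row, b in zip(weights, biases)]
        is_output = i == len(params) - 1
        a = softmax(z) if is_output else [max(0.0, v) for v in z]
    return a


params = init_params(LAYER_SIZES)
sample = [0.5] * 23  # hypothetical feature vector with 23 attributes
probs = forward(params, sample)
print(len(probs), round(sum(probs), 6))
```

In scikit-learn, a comparable configuration could presumably be expressed as `MLPClassifier(hidden_layer_sizes=(20, 20, 20), activation="relu", solver="adam")`; the chapter does not list the exact arguments used, so this mapping is an assumption.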
Fig. 3 a Latency, b execution, in different fog scenario
Figure 4a depicts the arbitration time for the different fog scenarios. The arbitration time is lower when jobs are assigned directly to the Master or the Cloud. In the other scenarios, load balancing must be performed among the nodes, which affects the arbitration time. Jitter is the variation in response time among consecutive job requests. As HAR is a time-critical application, jitter is a significant performance measure. From Fig. 4b it is evident that jitter is higher in the Master only scenario, as the single node is responsible for resource allocation, arbitration, security and prediction. Jitter in the two edge node and single edge node scenarios differs with respect to workloads. The Ensemble and Cloud scenarios have relatively higher jitter. Other works that proposed architectures for HAR [1, 2, 5, 17–19] rely primarily on learning paradigms confined to the data under study. To the best of our knowledge, evaluation of HAR models with respect to timing constraints is scarce. HARfog is able to perform analytics on the data for time-critical tasks, as it leverages a fog-assisted cloud computing framework. With the ensemble algorithm and support for multiple edge nodes, the accuracy of the model is higher than that of existing state-of-the-art HAR models.
Fig. 4 a Arbitration time, b jitter, in different fog scenario
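Jitter, described above as the variation in response time among consecutive job requests, can be computed in several ways. The sketch below assumes one common choice, the mean absolute difference between consecutive response times; the chapter does not state its exact formula, and the sample values are hypothetical.

```python
def jitter(response_times_ms):
    """Mean absolute difference between consecutive response times (assumed definition)."""
    diffs = [abs(b - a) for a, b in zip(response_times_ms, response_times_ms[1:])]
    if not diffs:
        return 0.0
    return sum(diffs) / len(diffs)


samples = [120.0, 135.0, 128.0, 150.0]  # hypothetical response times in milliseconds
print(jitter(samples))
```

Under this definition a node with perfectly steady response times has zero jitter, regardless of how large those response times are.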
6 Conclusion
Ambulatory healthcare is a broad area of research. In this paper, we focused on activity recognition as a use case for elderly patients, children and mentally challenged people. The deep learning paradigm is orchestrated with a fog-assisted cloud architecture leveraging FogBus. The proposed HARfog architecture integrates ensemble deep learning at the edge nodes and deploys it as a real-life application for activity recognition. Deep learning algorithms usually require high computation power to achieve better accuracy. Hence, we analyzed the timing parameters (latency, execution time, arbitration time and jitter) in addition to training and testing accuracy for different fog scenarios. HARfog turns out to be an efficient architecture for performing activity recognition. The HARfog model can also be applied to other use cases, including monitoring and analysis of exercise activities suggested by a physiotherapist for musculoskeletal problems, so that patients can easily be monitored periodically by physicians from a remote place. We are focusing on implementing the model for a couple of such exercises using HARfog. Further extension of the research includes analyzing the cost effectiveness of the model.
References
1. Cao, L., Wang, Y., Zhang, B., Jin, Q., Vasilakos, A.V.: GCHAR: an efficient group-based context-aware human activity recognition on smartphone. J. Parallel Distrib. Comput. (2017)
2. Ordonez, F.J., Roggen, D.: Deep convolutional and LSTM recurrent neural networks for multimodal wearable activity recognition. Sensors 16, 115 (2016)
3. Cichy, R.M., Khosla, A., Pantazis, D., Torralba, A., Oliva, A.: Deep neural networks predict hierarchical spatio-temporal cortical dynamics of human visual object recognition. arXiv:1601.02970 (2016)
4. Onofri, L., Soda, P., Pechenizkiy, M., Iannello, G.: A survey on using domain and contextual knowledge for human activity recognition in video streams. Expert Syst. Appl. 63, 97–111 (2016)
5. Mamoshina, P., Vieira, A., Putin, E., Zhavoronkov, A.: Applications of deep learning in biomedicine. Mol. Pharm. 13, 1445–1454 (2016)
6. Hinton, G.E., Osindero, S., Teh, Y.W.: A fast learning algorithm for deep belief nets. Neural Comput. 18, 1527–1554 (2006)
7. LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. Nature 521, 436–444 (2015)
8. Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural Comput. 9, 1735–1780 (1997)
9. Schmidhuber, J.: Deep learning in neural networks: an overview. Neural Netw. 61, 85–117 (2015)
10. Mesnil, G., Dauphin, Y., Glorot, X., Rifai, S., Bengio, Y., Goodfellow, I.J., et al.: Unsupervised and transfer learning challenge: a deep learning approach. ICML Unsupervised Transf. Learn. 27, 97–110 (2012)
11. Deng, L.: A tutorial survey of architectures, algorithms, and applications for deep learning. APSIPA Trans. Signal Inf. Process. 3, e2 (2014)
12. Sarkar, S., Reddy, K., Dorgan, A., Fidopiastis, C., Giering, M.: Wearable EEG-based activity recognition in PHM-related service environment via deep learning. Int. J. Prognostics Health Manage. 7, 10 (2016)
13. Tuli, S., Redowan, M., Shikhar, T., Rajkumar, B.: FogBus: a blockchain-based lightweight framework for edge and fog computing. J. Syst. Softw. 154, 22–36 (2019)
14. Vecchiola, C., Xingchen, C., Rajkumar, B.: Aneka: a software platform for .NET-based cloud computing. High Speed Large-Scale Sci. Comput. 18, 267–295 (2009)
15. Banos, O., Garcia, R., Holgado, J.A., Damas, M., Pomares, H., Rojas, I., Saez, A., Villalonga, C.: mHealthDroid: a novel framework for agile development of mobile health applications. In: Proceedings of the 6th International Work-Conference on Ambient Assisted Living and Active Ageing (IWAAL 2014), Belfast, Northern Ireland, 2–5 (2014)
16. Banos, O., Villalonga, C., Garcia, R., Saez, A., Damas, M., Holgado, J.A., Lee, S., Pomares, H., Rojas, I.: Design, implementation and validation of a novel open framework for agile development of mobile health applications. Biomed. Eng. Online 14(S2:S6), 1–20 (2015)
17. Henry, F., Ying, W., Mohammed, A., Uzoma, R.: Deep learning algorithms for human activity recognition using mobile and wearable sensor networks: state of the art and research challenges. Expert Syst. Appl. 105, 233–261 (2018)
18. Attal, F., Mohammed, S., Dedabrishvili, M., Chamroukhi, F., Oukhellou, L., Amirat, Y.: Physical human activity recognition using wearable sensors. Sensors 15, 31314–31338 (2015)
19. Anguita, D., Ghio, A., Oneto, L., Parra, X., Reyes-Ortiz, J.L.: Human activity recognition on smartphones using a multiclass hardware-friendly support vector machine. In: International Workshop on Ambient Assisted Living, pp. 216–223. Springer (2012)
Performance Evaluation and Identification of Optimal Classifier for Credit Card Fraudulent Detection
Arpit Bhushan Sharma and Brijesh Singh
Abstract With the increase in credit card usage, the number of transactions made by credit card has increased dramatically worldwide. With this drastic increase, the number of frauds has also increased enormously, and it is very difficult to distinguish a fraudulent transaction from a normal one. American Express issued credit cards to 53.7 million users but recorded Rs. 73,380 of fraud in a year on average. Credit card fraud causes serious losses to individuals and organizations. Credit card issuing companies offer fraud detection applications to users for their safety. This paper examines the different algorithms used for credit card fraud detection and seeks the optimal algorithm for classifying credit card fraud. It uses Logistic Regression, Linear Discriminant Analysis, K-Nearest Neighbors, Support Vector Machine, eXtreme Gradient Boosting and Random Forest, and computes the accuracy and AUC-ROC values for all the classifiers.
Keywords Credit card · Data mining · Fraud detection · Machine learning
A. Bhushan Sharma (B) · B. Singh, Department of Electrical and Electronics Engineering, KIET Group of Institutions, Delhi-NCR, Ghaziabad, India
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2021. V. K. Gunjan and J. M. Zurada et al. (eds.), Modern Approaches in Machine Learning and Cognitive Science: A Walkthrough, Studies in Computational Intelligence 956, https://doi.org/10.1007/978-3-030-68291-0_12

1 Introduction
Machine learning and artificial intelligence are now among the most common tools for detecting fraud, supporting diagnosis and finding optimal solutions to problems [1]. Current solutions apply algorithms and library-based mathematical computations to datasets that are very difficult for humans to work on directly [2]. Machine learning techniques mainly fall into two categories: supervised learning and unsupervised learning. Credit card fraud detection can be done either way, and the choice can only be made after proper study of the dataset. Supervised learning requires prior labelling of abnormalities, whereas unsupervised learning does not require any labelling [3–5]. The detection of fraud in credit cards is a vast
field of study within the broader study of financial fraud. It is attracting increasing attention from the scientific community because of the rising costs that such frauds generate for the system, which reach billions of dollars; credit card fraud causes a yearly loss of revenue equal to 1.4% of online payments [6–9]. The main categories of credit card fraud are as follows:
a. Physical credit cards are lost or stolen and used by another person.
b. The credit card number is stolen and used for shopping.
c. Credit card skimming, where the data from a card's magnetic stripe is electronically copied onto another card.
There are several forms of credit card fraud. This paper focuses on the basic classification algorithms used for credit card fraud detection and identifies the optimal classification algorithm for fraud detection [10–14]. It presents a study of different classification models for credit card fraud detection in terms of performance and accuracy. Section 2 reviews the literature on credit card fraud detection, whereas Sect. 3 presents a detailed study of the different algorithms used for accurate measurement. Section 5 contains a detailed study of the test results and their analysis [15–20].
2 Data Analysis Methodology and Classifier Selection
Credit card fraud detection has drawn a lot of attention across the globe. Different techniques have emerged from the study of credit card fraud data, with special emphasis on neural networks and data mining. The first attempt to detect credit card fraud was made by Ghosh and O'Reilly with a neural network. They built an Artificial Neural Network system that was trained on a large sample of labeled credit card transactions. Different classifiers have since been built to distinguish fraudulent from normal transactions. One of the most widely used was the meta-classifier, which is trained on the correlation of the predictions of the base classifiers. The meta-learning system permits financial organizations to share their models of fraudulent transactions by swapping classifier agents. Every credit card contains a different sequence on its magnetic stripe, which provides security to each individual. A sequence is defined as an ordered list of elements. Figure 1 shows that S′ is a subsequence of a sequence S, meaning that S′ can be derived from S by removing some elements from S without disturbing the relative positions of the other elements. Consider S1 and S2 to be two given sequences and S′ to be a subsequence of both S1 and S2; then S′ is called a common two-stage Credit Card Fraud Detection (CTS-CCFD). A sequence alignment technique is used for arranging two or more sequences to measure their similarity index. Figure 1 shows the two main types of sequence alignment, namely global alignment and
Fig. 1 Global alignment and local alignment of credit card magnetic stripe
local alignment. Here the sequence S1 is aligned with the sequence S2. Analyzing the various classification models adopted across the globe, researchers have found many problems in fraud discovery, and all mention a basic common problem: the accuracy of the models on real-life data. Real-life data is a major issue because of data sensitivity and privacy concerns. Many papers address the imbalanced dataset, that is, the skewed distribution of the dataset. The main reason for this is that there are far fewer frauds than non-frauds in a transaction dataset. A further issue is that a genuine transaction can look exactly like a fraudulent transaction made by any individual. Credit card transaction data contain categorical values, which some machine learning algorithms do not support. The most common issue affecting financial fraud discovery is feature selection. Lack of adaptability occurs when algorithms are exposed to new fraud patterns rather than to normal transactions. Different models have been implemented for credit card fraud detection. Logistic regression (LR) has been used to address the classification problem. Instances drawn from distributions are approximated by a Gaussian Mixture Model (GMM), and an evaluation of the impact on economic value has also been carried out. Another model, the Risk-Based Ensemble (RBE), can handle data containing such problems and still give good results. The credit card fraud detection system introduced in this paper is built to find the optimal algorithm for classifying fraud by processing large datasets. The proposed system was able to overcome all these challenges. The given dataset has all the features necessary to build a machine learning model. It contains 31 features in total, of which 28 are anonymized and labeled as V1, V2…V28. The remaining features are time, amount and type of transaction.
We have taken time and amount into consideration for fraud detection, as shown in Figs. 3 and 4, respectively. Figure 3 shows the distribution of the time feature; as time is one of the fundamental quantities, it is commonly used as an attribute for finding the correlation over the distribution. Figure 4 shows the distribution of the monetary value feature, used for finding the correlation over the distribution for the given two out of three instances. Figure 2 shows the classification steps in the overall functioning of the fraud detection. The figure illustrates the churning
of the dataset, from optimizing it and applying a classifier to it to making it suitable for predicting the target value. Python offers many libraries for data visualization, one of which is matplotlib. Matplotlib is used to produce a scatter plot of normal and fraudulent transactions, as shown in Fig. 5. The graph shows that, out of the total transactions, 99.827% were normal and 0.173% were fraudulent. Figures 6 and 7 show the same visualization of normal
Fig. 2 Classifier steps
Fig. 3 Distribution of time feature
Fig. 4 Distribution of monetary and value function
Fig. 5 Mat plot of normal transactions and fraudulent transactions
and fraudulent transactions in histogram form, respectively.
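The class proportions quoted above can be reproduced with a short calculation from the 284,807 entries and the 492 fraudulent transactions reported in this chapter. The variable names are illustrative only.

```python
total_transactions = 284807  # entries in the credit card dataset
fraud_transactions = 492     # count of fraudulent transactions (Table 4)

fraud_share = 100.0 * fraud_transactions / total_transactions
normal_share = 100.0 - fraud_share
print(f"{normal_share:.3f}% normal, {fraud_share:.3f}% fraudulent")
# 99.827% normal, 0.173% fraudulent
```

One consequence of this skew is that a classifier that labels every transaction as normal would already reach about 99.8% accuracy, which is presumably why measures such as AUC-ROC and the Matthews correlation coefficient are reported alongside accuracy.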
Fig. 6 Fraud versus normal transactions count
Fig. 7 Fraud versus normal transactions count by maximizing fraud domain values
3 Data Preparation and Performance Evaluation of Classification Algorithms
There are 31 features and 284,807 entries in total in the credit card dataset. The main features taken as attributes for training and testing are amount and time.
Fig. 8 The Heatmap and correlation of amount and time
The time and amount play a vital role in the consideration for training and testing over the distribution for an instance. The correlation gives a data distribution on heatmap as shown in Fig. 8. Correlation is used to describe the linear relationship between two continuous variables (e.g., time and amount). In general, correlation tends to be used when there is no identified response variable. It measures the strength and direction of the linear relationship between two or more variables. Here we have used time and amount foe calculations. It helps to find out whether the correlation is positive or negative. The data is divided into training and testing cases where the shape of training data and testing data is shown in Tables 1 and 2 respectively. Table 1 shows the matrix dimensions of training data and Table 2 shows the matrix shape of testing data. Whereas, the Table 3 shows the shape of ‘X’ variable of CSV dataset for training and validation and Table 4 shows the shape of ‘Y’ variable of CSV dataset for training and validation. The data is then after segregated into high positive and high negative correlation indices Table 1 Data type
Data type
X
Y
Training data matrix shape
256,028
31
Testing data matrix shape
28,779
31
144 Table 2 Shape of ‘X’ and ‘Y’
A. Bhushan Sharma and B. Singh Shape
Training
Validation
X
492, 30
124, 30
Y
492
124
Table 3 Different classification algorithms with Matthews correlation coefficient

| Classification algorithm | Normal transaction | Fraudulent transaction | MCC ratio |
|---|---|---|---|
| Logistic regression (LR) | 0.971878 | 0.022339 | 43.505886 |
| Linear discriminant analysis (LDA) | 0.965681 | 0.021112 | 45.740858 |
| K-nearest neighbor (K-NN) | 0.963492 | 0.026293 | 36.644430 |
| Classification and regression tree (CART) | 0.927414 | 0.025415 | 36.490812 |
| Support vector machine (SVM) | 0.973708 | 0.027843 | 34.971375 |
| Extreme gradient boost (XGB) | 0.967946 | 0.026500 | 36.52626 |
| Random forest (RF) | 0.973967 | 0.020308 | 47.959769 |
| Naïve Bayes | 0.977306 | 0.083130 | 11.756329 |
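For readers unfamiliar with the Matthews correlation coefficient named in Table 3, the following minimal sketch computes its standard definition from the four confusion-matrix counts. It is not claimed to reproduce how the values in Table 3 were obtained; the function name and the example counts are hypothetical.

```python
import math

def matthews_corrcoef(tp, tn, fp, fn):
    """Standard MCC from confusion-matrix counts; ranges from -1 to +1."""
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        # Convention: MCC is taken as 0 when any marginal total is zero.
        return 0.0
    return numerator / denominator

# Hypothetical counts (assumption): 90 frauds caught, 10 missed,
# 900 normal transactions passed, 10 falsely flagged.
print(matthews_corrcoef(tp=90, tn=900, fp=10, fn=10))
```

Unlike accuracy, the coefficient stays near zero for a classifier that ignores the minority class, which makes it informative on imbalanced data.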
Table 4 Amount details of fraudulent transaction

| Statistic | Value |
|---|---|
| Count | 492.000 |
| Mean | 122.211 |
| Standard deviation | 256.683 |
| Minimum | 0.000 |
| 25% | 1.000 |
| 50% | 9.250 |
| 75% | 105.890 |
| Maximum | 2125.870 |
After this segregation, the data is visualized again in a high-level representation using the t-distributed Stochastic Neighbor Embedding (t-SNE) algorithm for training the neural network. The network is trained for fraud detection and then, after finding the correlation between the normal and the fraudulent transactions, the different classification algorithms are trained so that their accuracy can be compared. The spot-check classification algorithms used for fraud detection are:
3.1 Logistic Regression
Previous researchers evaluated the performance of models based on Multiple Linear Regression Analysis (MLRA), in which multiple independent variables are related to a dependent variable. Logistic regression is a supervised learning method for classification. The dependent variable, or target value, cannot have more than two possible cases. The method is borrowed by machine learning from the field of statistics, and it is the go-to method for binary classification problems with two class values.
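As an illustration of the two-class prediction described above, the following minimal sketch passes a weighted sum of the features through the logistic (sigmoid) function and thresholds the result. The weights, bias, feature values and the 0.5 threshold are hypothetical and are not taken from the chapter's trained model.

```python
import math

def sigmoid(z):
    """Logistic function mapping any real number into (0, 1), numerically stable."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def predict_proba(weights, bias, features):
    """Probability of the positive class (here: fraud) for one record."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(z)

def predict(weights, bias, features, threshold=0.5):
    """Binary decision: 1 (fraud) if the probability reaches the threshold, else 0."""
    return 1 if predict_proba(weights, bias, features) >= threshold else 0

# Hypothetical parameters for two scaled features (assumption).
weights, bias = [1.5, -0.8], -0.2
print(predict_proba(weights, bias, [2.0, 0.5]))
print(predict(weights, bias, [2.0, 0.5]))
```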
3.2 Linear Discriminant Analysis (LDA)
LDA is built on statistical properties of the data that are calculated for each class. For a single input variable (x), these are the mean and the variance of the variable for each class; for multiple variables, the same properties are calculated over the multivariate Gaussian, namely the means and the covariance matrix.
3.3 K-Nearest Neighbors (K-NN)
K-NN, or K-Nearest Neighbors, is a simple and effective method for making predictions, and it is widely used on complex datasets. To classify a new record, K-NN searches the training dataset for the K most similar records (the neighbors). A summarized prediction is made from these K neighbors, and the similarity among records can be measured in many ways.
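The neighbor search and summarized prediction can be sketched as follows. This is a minimal illustration that assumes Euclidean distance as the similarity measure and majority voting as the summary; the function names and the toy points are hypothetical.

```python
import math
from collections import Counter

def euclidean(a, b):
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_predict(train_points, train_labels, query, k=3):
    """Label of `query` by majority vote among its k nearest training points."""
    if not 1 <= k <= len(train_points):
        raise ValueError("k must be between 1 and the number of training points")
    # Rank training records by distance to the query and keep the k closest.
    ranked = sorted(zip(train_points, train_labels),
                    key=lambda pair: euclidean(pair[0], query))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

# Hypothetical two-feature records (assumption, not from the dataset).
points = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
labels = ["normal", "normal", "normal", "fraud", "fraud", "fraud"]
print(knn_predict(points, labels, (0.5, 0.5)))
print(knn_predict(points, labels, (5.5, 5.5)))
```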
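3.4 Classification and Regression Trees (CART)
The classification and regression tree model is a binary tree model: it uses the binary tree algorithm and data structure to find the optimal split of the data, and each node, starting from the root, represents a single input variable. The prediction is given by an output variable stored in a leaf node of the binary tree. For a set S with c classes, where p_i is the proportion of class i in S, and for an attribute A, where S_v is the subset of S in which A takes the value v, the entropy and the information gain are

$$\mathrm{Entropy}(S) = \sum_{i=1}^{c} -p_i \log_2 p_i$$

$$\mathrm{Gain}(S, A) = \mathrm{Entropy}(S) - \sum_{v \in \mathrm{Values}(A)} \frac{|S_v|}{|S|}\,\mathrm{Entropy}(S_v)$$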
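The entropy and information gain expressions can be checked numerically with a small sketch. The function names and the toy labels are hypothetical; the code follows the two formulas directly and is not the chapter's implementation.

```python
import math
from collections import Counter

def entropy(labels):
    """Entropy(S): sum over classes of -p_i * log2(p_i)."""
    total = len(labels)
    return sum(-(count / total) * math.log2(count / total)
               for count in Counter(labels).values())

def information_gain(labels, attribute_values):
    """Gain(S, A) = Entropy(S) - sum_v |S_v|/|S| * Entropy(S_v)."""
    total = len(labels)
    subsets = {}
    # Group the class labels by the value the attribute takes on each record.
    for value, label in zip(attribute_values, labels):
        subsets.setdefault(value, []).append(label)
    remainder = sum(len(subset) / total * entropy(subset)
                    for subset in subsets.values())
    return entropy(labels) - remainder

# Hypothetical records (assumption): class label and one categorical attribute.
labels = ["fraud", "fraud", "normal", "normal"]
amount_band = ["high", "high", "low", "low"]   # separates the classes perfectly
weekday = ["mon", "tue", "mon", "tue"]         # carries no information

print(entropy(labels))
print(information_gain(labels, amount_band))
print(information_gain(labels, weekday))
```

A tree-building procedure of this kind would prefer the attribute with the larger gain as the next split.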
Fig. 9 XGBoost classifier
3.5 Support Vector Machine (SVM)
SVM, or Support Vector Machine, is one of the most used supervised machine learning methods for regression and classification. In SVM, we plot each data item as a point in n-dimensional space, where n is the number of features and the value of each feature is the value of one coordinate.
3.6 eXtreme Gradient Boosting (XGBoost)
XGBoost is an implementation of gradient-boosted decision trees designed for speed and performance. For prediction problems that involve unstructured data such as text and images, artificial neural networks tend to outperform all other frameworks. XGBoost has gained popularity in recent times for its high accuracy and strong performance. The use of the XGBoost algorithm in machine learning is shown in Fig. 9.
3.7 Random Forest (RF)
Random forest, or random decision forest, is an ensemble method for regression and classification tasks that operates by constructing a multitude of decision trees at training time and combining their outputs, by voting for classification and by averaging for regression. It has higher efficiency than the other classifiers and corrects for the tendency of decision trees to overfit the training set. Random forest can also be used to rank the importance of variables in a regression or classification problem in a natural way.
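One common way to combine the trees of a random forest for classification is bootstrap sampling followed by majority voting. The sketch below illustrates only these two steps; the trees are stand-in callables with hypothetical thresholds, not trained decision trees, and all names are assumptions made for the illustration.

```python
import random
from collections import Counter

def bootstrap_sample(rows, rng):
    """Sample len(rows) rows with replacement, as used to train each tree."""
    return [rng.choice(rows) for _ in rows]

def majority_vote(predictions):
    """Most frequent label among the individual tree predictions."""
    return Counter(predictions).most_common(1)[0][0]

def forest_predict(trees, features):
    """Classify one record by letting every tree vote."""
    return majority_vote([tree(features) for tree in trees])

# Stand-in "trees" (assumption): each is a one-rule classifier on the amount.
trees = [
    lambda f: "fraud" if f["amount"] > 1000 else "normal",
    lambda f: "fraud" if f["amount"] > 1500 else "normal",
    lambda f: "fraud" if f["amount"] > 2500 else "normal",
]

rng = random.Random(0)
print(len(bootstrap_sample(list(range(10)), rng)))
print(forest_predict(trees, {"amount": 2000}))
print(forest_predict(trees, {"amount": 1200}))
```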
Fig. 10 The box-and-whisker plot shows the features with high positive correlation
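The Naïve Bayes classifier is based on Bayes' theorem:

$$P(c \mid x) = \frac{P(x \mid c)\,P(c)}{P(x)}$$

Under the assumption that the predictors x_1, ..., x_n are conditionally independent given the class,

$$P(c \mid x) \propto P(x_1 \mid c) \times P(x_2 \mid c) \times P(x_3 \mid c) \times \cdots \times P(x_n \mid c) \times P(c)$$

Here, P(c|x) = posterior probability, P(x) = predictor prior probability, P(c) = class prior probability, and P(x|c) = likelihood (Figs. 10, 11 and 12).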
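The posterior computation can be traced with a short worked sketch. The priors and likelihoods below are hypothetical numbers chosen for illustration, not estimates from the credit card data, and the function name is an assumption.

```python
def naive_bayes_posteriors(priors, likelihoods):
    """Posterior P(c|x) for every class c.

    priors:      {class: P(c)}
    likelihoods: {class: [P(x_1|c), ..., P(x_n|c)]}
    """
    scores = {}
    for cls, prior in priors.items():
        score = prior
        for p in likelihoods[cls]:
            score *= p                # P(x_1|c) * ... * P(x_n|c) * P(c)
        scores[cls] = score
    evidence = sum(scores.values())   # plays the role of P(x)
    if evidence == 0:
        raise ValueError("all class scores are zero; posterior is undefined")
    return {cls: score / evidence for cls, score in scores.items()}

# Hypothetical inputs (assumption): fraud is rare but fits the observed features well.
priors = {"fraud": 0.01, "normal": 0.99}
likelihoods = {"fraud": [0.8, 0.7], "normal": [0.1, 0.2]}

posteriors = naive_bayes_posteriors(priors, likelihoods)
print(posteriors)
```

Dividing by the sum of the class scores normalizes the products so that the posteriors add up to one.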
4 Result and Discussion
The test results of the various algorithms are given in Table 3. The Matthews correlation coefficient (MCC) is used in machine learning to measure the quality, efficiency, and accuracy of a binary classification model. The accuracy is measured by using a confusion matrix, which indicates whether the implemented model is up to the mark and more efficient than the other models. The parameters of the fraudulent transactions, modelled by a Poisson distribution, are given in Table 4. This table describes the amount details of fraudulent transactions, with all the parameters calculated for the credit card fraud data. The values are calculated from a count of 492, with a mean of 122.211 and a standard deviation of 256.683, as shown in Fig. 13. This figure shows the Poisson distribution of fraudulent transactions with a mean of 122.211 and a standard deviation of 256.683, whereas Fig. 14 shows the Poisson distribution of normal transactions with a mean of 88.291 and a standard deviation of 250.105. Similarly, Fig. 15 compares the performance of the classification algorithms by their ROC-AUC curves and shows which one is more accurate. The Random forest classifier has a higher ROC-AUC than the other classifiers. The MCC is also maximum for the Random Forest classifier algorithm.
Fig. 11 Architecture
Fig. 12 The box-and-whisker plot shows the features with high negative correlation
The accuracy of the model, obtained from the confusion matrix of fraud and normal transactions, is shown in Fig. 17. The confusion matrix is a means of evaluating the models and helps to check the correctness of our model by calculating the true positives, true negatives, false positives and false negatives with the expressions given below. True Positive (TP) is the number of transactions that were fraudulent and were also classified as fraudulent by the system. True Negative (TN) is the number of transactions that were legitimate and were also classified as legitimate. False Positive (FP) is the number of transactions that were legitimate but were wrongly classified as fraudulent. False Negative (FN) is the number of transactions that were fraudulent but were wrongly classified as legitimate by the system.
Fig. 13 Poisson distribution graph for fraudulent transactions
Fig. 14 Poisson distribution graph for normal transactions
The various metrics for evaluation are:
i. Accuracy is the fraction of transactions that were correctly classified. It is one of the most powerful and commonly used evaluation metrics. Accuracy (ACC)/Detection rate = (TN + TP)/(TP + FP + FN + TN)
ii. Precision, also known as detection rate, is the fraction of the transactions classified as fraudulent that were actually fraudulent. Precision/Detection rate/Hit rate = TP/(TP + FP)
iii. Sensitivity measures the fraction of abnormal records (the records that have maximum chances of being fraudulent) correctly classified by the system. True positive rate/Sensitivity = TP/(TP + FN)
iv. Specificity measures the fraction of normal records (the records that have minimum chances of being fraudulent) correctly classified by the system. True negative rate/Specificity = TN/(TN + FP)
v. False alarm rate measures, out of all legitimate transactions, how many were wrongly classified as fraudulent. False alarm rate = FP/(FP + TN)
Fig. 15 Comparison of classification algorithms
Thus,

$$\begin{aligned}
\text{Accuracy} &= \frac{TP + TN}{TP + TN + FP + FN} \\
\text{Sensitivity} &= \frac{TP}{TP + FN} \\
\text{Specificity} &= \frac{TN}{TN + FP} \\
\text{Precision} &= \frac{TP}{TP + FP}
\end{aligned}$$
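The five metrics listed above follow directly from the four confusion-matrix counts. The sketch below computes them for hypothetical counts; the function name and the numbers are assumptions for illustration and are not results from the chapter.

```python
def classification_metrics(tp, tn, fp, fn):
    """Accuracy, precision, sensitivity, specificity and false alarm rate.

    tp/fn count fraudulent transactions (caught/missed);
    tn/fp count legitimate transactions (passed/wrongly flagged).
    """
    def ratio(numerator, denominator):
        # Return NaN instead of raising when a denominator is empty.
        return numerator / denominator if denominator else float("nan")

    return {
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "precision": ratio(tp, tp + fp),
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "false_alarm_rate": ratio(fp, fp + tn),
    }

# Hypothetical counts (assumption): 1,000 transactions, 90 of them fraudulent.
metrics = classification_metrics(tp=80, tn=900, fp=10, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Specificity and false alarm rate share the same denominator, so they always add up to one.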
Fig. 16 Tree plot used for random forest for credit card fraud detection
A positive correlation is a relationship between two variables in which, if one variable increases, the other also increases; it also holds when one variable decreases and the other decreases as well, as shown in Fig. 10. Figure 11 shows the architecture of the model for predicting the optimal algorithm: the dataset is first initialized and visualized, and a classifier is then applied to it and validated by testing. The performance of the system is analysed by measuring its accuracy. Figure 12 shows the negative correlation. A perfect negative correlation is represented by a value of −1. If a dataset has perfectly positively or negatively correlated attributes, there is a high chance that the performance of the model will be impacted by a problem called multicollinearity. It is used for classification problems, though with classification problems the models must be capable of producing posterior probabilities. The model outputs are combined with a uniformly weighted average. The squared error is augmented with a penalty term which considers the diversity of the ensemble. The accuracy of Random forest is the maximum in the system, and its data rate percentage is the highest, which shows it to be one of the best classification models for credit card fraud detection. The algorithm is preferred because it also has the lowest false alarm rate (Tables 5, 6 and 7). The performance matrix shown in Table 6 helps to compare different parameters of the classifiers in order to find the best classifier. The table shows parameters such as accuracy, sensitivity, specificity and precision. The classifier with the highest accuracy, highest precision, highest sensitivity and lowest specificity is the best classifier for
Table 5 The accuracy, data rate and false alarm rate of all classifiers

| Technique | Accuracy (%) | Data rate (%) | False alarm rate (%) |
|---|---|---|---|
| SVM | 94.65 | 85.45 | 5.2 |
| Random forest | 99.71 | 99.68 | 0.12 |
| KNN | 97.15 | 96.84 | 2.88 |
| Decision tree | 97.93 | 98.52 | 2.19 |
| Logistic regression | 94.7 | 77.82 | 2.9 |
Table 6 Performance matrices

| Metrics | Logistic regression | Support vector machine | Decision tree | Random forest |
|---|---|---|---|---|
| Accuracy | 0.977 | 0.975 | 0.955 | 0.986 |
| Sensitivity | 0.975 | 0.973 | 0.955 | 0.984 |
| Specificity | 0.923 | 0.912 | 0.978 | 0.905 |
| Precision | 0.996 | 0.996 | 0.995 | 0.997 |
Table 7 Confusion matrix format

| Actual/predicted | Normal transaction | Fraudulent |
|---|---|---|
| Normal transaction | True positive | False positive |
| Fraudulent | False negative | True negative |
the detection of frauds. The algorithm that satisfies all of these criteria is Random forest, with an accuracy of 0.986, a sensitivity of 0.984, a precision of 0.997 and a specificity of 0.905. The confusion matrix, shown in Fig. 17, is used to determine the accuracy of the classifier more precisely and helps to detect which algorithm is more suitable for the detection of fraudulent transactions. The format of the confusion matrix is shown in Table 7. The algorithm for implementing the model to detect fraud with a random forest classifier is given in Fig. 15, and Fig. 16 shows a tree plot of samples and results for credit card fraud detection. Table 8 describes the amount details of normal transactions, with all the parameters calculated for the credit card data. The values are calculated from a count of 284,315, with a mean of 88.291 and a standard deviation of 250.105. The flowchart in Fig. 18 shows the implementation idea of the algorithm for a future model.
Fig. 17 Confusion matrix for model accuracy

Table 8 Amount details of normal transaction

| Statistic | Value |
|---|---|
| Count | 284,315.000 |
| Mean | 88.291 |
| Standard deviation | 250.105 |
| Minimum | 0.000 |
| 25% | 5.650 |
| 50% | 22.000 |
| 75% | 77.050 |
| Maximum | 25,691.160 |
Fig. 18 The flowchart for credit card fraud detection using random forest (flowchart nodes: incoming transaction; customer transaction database; frequent item set mining; legal pattern database; fraud pattern database; random forest; random forest output; legitimate transaction, allow transaction; fraudulent transaction, alarm to bank)
5 Conclusion
Credit card fraud detection is a major and complex issue that requires a lot of study and research to obtain a substantial result, and machine learning plays a vital role in it. The Random forest model gives the highest accuracy for prediction and validation of results. The model was 99.827% accurate with 284,315 non-fraud transactions and 492 fraud transactions.
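The counts quoted in the conclusion can be put into context with a short arithmetic check. The sketch below uses only those two counts and computes the share of non-fraud transactions in the dataset; this share is the accuracy that a trivial rule labelling every transaction as non-fraud would obtain, which is one reason sensitivity, precision and MCC are reported alongside accuracy. The variable names are illustrative.

```python
non_fraud = 284315   # non-fraud transactions quoted in the conclusion
fraud = 492          # fraud transactions quoted in the conclusion

total = non_fraud + fraud
majority_share = non_fraud / total   # accuracy of an "always non-fraud" rule
fraud_share = fraud / total

print(f"total transactions: {total}")                        # 284807
print(f"share of non-fraud transactions: {majority_share:.3%}")  # 99.827%
print(f"share of fraud transactions: {fraud_share:.3%}")
```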
References
1. Suzuki, K.: The computer analysis with discriminant function on the gastric ulcer. Gastroenterologia Japonica 5(2), 149 (1970). https://doi.org/10.1007/BF02775263
2. Goy, G., Gezer, C., Gungor, V.C.: Makine Öğrenmesi Yöntemleri ile Kredi Kartı Sahteciliği Tespiti. In: 2019 4th International Conference on Computer Science and Engineering (UBMK), pp. 350–354. https://doi.org/10.1109/UBMK.2019.8906995
3. Duman, E., Sahin, Y.G.: Detecting credit card fraud by decision trees. Int. Multiconf. Eng. Comput. Sci. I (2011)
4. Kundu, A., Sural, S., Majumdar, A.K.: Two-Stage Credit Card Fraud Detection Using Sequence Alignment, pp. 260–275. Springer, Berlin (2006)
5. Zanin, M., Romance, M., Moral, S., Criado, R.: Credit card fraud detection through parenclitic network analysis. Complexity (2018). https://doi.org/10.1155/2018/5764370
6. Benson Edwin Raj, S., Annie Portia, A.: Analysis on credit card fraud detection methods. In: 2011 International Conference on Computer, Communication and Electrical Technology, ICCCET 2011, pp. 152–156 (2011). https://doi.org/10.1109/ICCCET.2011.5762457
7. Zareapoor, M., Shamsolmoali, P.: Application of credit card fraud detection: based on bagging ensemble classifier. Proc. Comput. Sci. 48(C), 679–685 (2015). https://doi.org/10.1016/j.procs.2015.04.201
8. Thennakoon, A., Bhagyani, C., Premadasa, S., Mihiranga, S., Kuruwitaarachchi, N.: Real-time credit card fraud detection using machine learning. In: 2019 9th International Conference on Cloud Computing, Data Science and Engineering (Confluence), pp. 488–493 (2020)
9. Pamina, J. et al.: An effective classifier for predicting churn in telecommunication 11(2010), 221–229 (2019)
10. Biau, G., Scornet, E.: A random forest guided tour. Test 25(2), 197–227 (2016). https://doi.org/10.1007/s11749-016-0481-7
11. Khalili, M., Chakraborty, S., Popescu, M.: Predicting disease risks from highly imbalanced data using random forest (2011)
12. Zhang, D., Qian, L., Mao, B., Huang, C., Huang, B., Si, Y.: A data-driven design for fault detection of wind turbines using random forests and XGboost. IEEE Access 6, 21020–21031 (2018). https://doi.org/10.1109/ACCESS.2018.2818678
13. Ji, C., Zou, X., Hu, Y., Liu, S., Lyu, L., Zheng, X.: XG-SF: an XGBoost classifier based on shapelet features for time series classification. Proc. Comput. Sci. 147, 24–28 (2019). https://doi.org/10.1016/j.procs.2019.01.179
14. Dhaliwal, S.S.: Effective Intrusion Detection System Using XGBoost (2018). https://doi.org/10.3390/info9070149
15. Thomas, L.T.M.P.E.: Machine learning–XGBoost analysis of language networks to classify patients with epilepsy. Brain Inform. 4(3), 159–169 (2017). https://doi.org/10.1007/s40708-017-0065-7
16. Dong, H.: Gaofen-3 PolSAR Image Classification via XGBoost and Polarimetric Spatial Information, pp. 1–20 (2018). https://doi.org/10.3390/s18020611
17. Google: “Colab Python,” Google [Online]. Available: www.colab.research.google.com
18. Rai, A.: Geek for Geeks [Online]. Available: https://www.geeksforgeeks.org/ml-credit-card-fraud-detection/. Accessed March 2020
19. Python: “Python,” [Online]. Available: www.python.org
20. Frei, L.: Towards data science. Medium, 16 Jan 2019 [Online]. Available: https://towardsdatascience.com/detecting-credit-card-fraud-using-machine-learning-a3d83423d3b8
Potential Use-Cases of Natural Language Processing for a Logistics Organization
Rachit Garg, Arvind W. Kiwelekar, Laxman D. Netak, and Swapnil S. Bhate
Abstract Industries such as healthcare and medicine, education, marketing, and e-commerce are using AI to gain a technical advantage. Logistics is one such industry, where AI has started to show its effect by making supply chain management (SCM) a more seamless process. Natural language processing (NLP) is a subfield of computer science and AI that covers interactions between computers and human language. The existing literature falls short in representing the recent developments and challenges of NLP for maintaining a competitive edge in the field of logistics. The literature survey also shows that many are curious about the scope for implementing NLP in logistics. This article aims to answer that question by exploring the use-cases, challenges, and approaches of NLP in logistics. The study is of interest to both researchers and practitioners, and it demonstrates a deeper understanding of logistics tasks gained by implementing NLP approaches.
Keywords Logistics · Artificial intelligence · Natural language processing (NLP) · NLP in logistics · Deep learning · Word embedding
Supported by ATA Freight Line India Pvt. Ltd.
R. Garg (B) · A. W. Kiwelekar · L. D. Netak
Dr. Babasaheb Ambedkar Technological University, Lonere, MS, India
S. S. Bhate
ATA Freight Line India Pvt. Ltd., Pune, MS, India
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2021. V. K. Gunjan and J. M. Zurada et al. (eds.), Modern Approaches in Machine Learning and Cognitive Science: A Walkthrough, Studies in Computational Intelligence 956, https://doi.org/10.1007/978-3-030-68291-0_13
R. Garg et al.
1 Introduction
Technology is growing at a considerable rate, and businesses appear to benefit from it. Every business domain around the globe, whether healthcare, education, marketing, or e-commerce, employs advanced technologies in its business model [1]. The dominance of technological advancement thus leaves no room for doubt. Given this advancement and its all-permeating role in multiple businesses, artificial intelligence in logistics is an inescapable subject. Technology is an essential part of any logistics system and plays an essential role in enhancing effectiveness and efficiency by improving enterprise competitiveness and performance [2]. Artificial intelligence (AI) is not a newly born field. John McCarthy, a computer scientist at Stanford, coordinated the Dartmouth College academic conference on the subject in 1956 [3]. AI is an area of study in computational science that automates the intelligent behavior of systems and smart software. AI tries to program a computer so that, with the necessary training, it can solve problems independently, as a human being can. It is a computer-controlled system designed to perform tasks such as visual processing, language regeneration, speech recognition, and decision making. AI is changing the way humans examine the world and has markedly changed business requirements. AI provides the technology to deal with the growing volume of information that human beings cannot handle. The survey shows that the present is the ideal time for the logistics business to adopt AI. There is also evidence that AI is now at its peak in terms of acceptance in the world [4].
The most essential way to make sure that a company's supply chain processes operate at the best feasible level is a well-oiled logistics or freight forwarding team. With digitization growing rapidly in the professional world, businesses use Artificial Intelligence (AI) to automate and optimize their supply chains, saving time and money [5]. AI holds a vital position in the logistics business [6]. As the world's logistical needs continue to become more complex, information-driven applications have already entered supply-chain rationalization globally, and AI is already changing logistics [7]. AI automates or improves several time-consuming operations, providing logistics specialists with valuable perspectives based on large and highly complex sets of data. Logistics leaders are looking into solutions that use AI to leverage massive data to modernize processes and enhance decision-making, from identifying emerging markets and monitoring exchange-rate fluctuations to managing risks and finding the best suppliers. AI indisputably plays a significant and central role in logistics operations [8]. The latest survey statistics for the logistics industry reflect a high demand for logistical services. To fulfil such soaring market demand, the core functions of this industry are becoming complicated and entirely data-driven. The steady rise of data-driven applications is already visible in the streamlining of logistical operations on a global scale. The supply chain in the logistics business is likewise responsible for the deployment of AI in logistics as a corresponding factor [9].
• Researchers at IBM estimate that just 10% of existing systems, data, and activities in the logistics industry contain elements of AI analysis and discovery (IBM).
• In 2018, the research and advisory company Gartner predicted that most commercial conversations with clients in the logistics industry would be handled by virtual agents by 2020 (Gartner).
• In 2022, the logistics industry is going to change its look, potentially with artificial intelligence (AI) and machine modeling (DHL Trend Research).
• According to the 2017 Vanson Bourne report for Teradata, the logistics industry can generate 42% of its revenue by investment in AI (Teradata).
In this paper, we discuss the benefits and challenges of applying Artificial Intelligence in the domain of logistics, as well as the possible applications of NLP techniques in logistics. The paper is organized into six sections. Section 2 discusses the motivation behind the use of NLP in logistics. The challenges and risks of implementing NLP in logistics are presented in Sect. 3. Section 4 discusses the major approaches for processing natural language in the domain of logistics. Section 5 discusses the possible use cases of NLP techniques in logistics, and Sect. 6 concludes the paper.
2 Motivation Behind NLP in Logistics
Natural Language Processing (NLP) is a subdomain of computational science and AI that focuses on the relationship between machines and human language [10]. NLP analyzes human language to draw insights: it refers to computer programs that intelligently interpret, perceive, and derive context from human language. NLP is employed to analyze text, allowing machines to understand how humans speak. It is an AI and machine learning capability with tremendous potential to efficiently decode large volumes of foreign-language data. There are several applications of NLP in the business world, among them sentiment analysis, classification, and speech-to-text translation and vice versa. The future of AI involves not only a distinct trend but also the convergence of many key technical developments. Gartner placed NLP at the peak of inflated expectations in the Hype Cycle for Emerging Technologies in 2019 [11].
• NLP helps the machine to grasp the emotions of online communications from unstructured text so as to recognize risk indicators in advance [4].
• In view of current language barriers, NLP technology could simplify auditing and compliance measures [12].
Traditionally, we communicate with systems using complex programming scripts and preset answers. NLP exceeds these limits and allows an individual to interact with machines using verbal communication. NLP is a domain that involves computers interpreting and handling the language that a person speaks. The most indispensable way to ensure that a company's supply chain works at the best attainable level is a smooth-running logistics or freight forwarding team. In the logistics domain, one must expect unexpected circumstances that affect the expected delivery date of a product. Automating information-based tasks is one of the most popular and effective use-cases of NLP. The benefits of NLP range from operational and production efficiency to more effective analysis of data for gaining an advantage and new insights. Technology vendors are developing solutions that bring AI into the domain of logistics. These are virtual personal assistants (VPAs) and cognitive procurement advisors (CPAs) that use natural language processing (NLP) to extend automation and efficiency in logistics organizations [13]. These VPAs and CPAs are similar to smartphone assistants such as Siri, Google Now, and Cortana. The difference, however, is that such a virtual assistant is capable of much more than simple tasks such as setting an alarm or retrieving information from the internet. This new technology can conduct more complicated operations, such as transactions based on historical, current, and predicted contexts. A review of the existing literature shows that there are several barriers, such as satisfaction and future expectations, and information systems between shippers and 3PLs [12]. The literature also falls short in representing the recent developments and challenges of NLP for maintaining a competitive edge in the field of logistics. The review covers published articles across peer-reviewed journals in logistics and NLP. It shows that different tasks in the logistics industry may also benefit from the NLP approach (Fig. 1).
Fig. 1 Scope of natural language processing in logistics
2.1 NLP in Breaking Language Barrier
Language barriers are a crucial concern when carrying out logistics activities (e.g., pickup instructions and guidance for truckers). Natural language processing has great potential for processing and translating massive amounts of foreign-language data efficiently. It can interpret information that remains untapped because of the language barrier in various logistics tasks [14].
2.2 NLP in Contract Management
The logistics industry involves extensive paperwork that is carried out manually, which makes contract management a time-consuming and laborious task. As operational costs are increasing exponentially, clients may require an automated contract management solution that strengthens visibility into essential contract requirements [4, 15].
2.3 NLP in Order Management
The user connects via voice command to a smart order management device. Once the requirement is understood and the parts are identified, an automated order for the appropriate parts can be placed using the system's natural language processing capability, ensuring hands-free operation [4].
2.4 NLP in Processing of Logistics Textual Data
NLP can process text data, such as the conversations of supply chain participants through a chatbot, and ensures that data is captured in a consistent and trustworthy way [14].
2.5 NLP for Sentiment Analysis and Customer Satisfaction
NLP can be used to gather useful data from various social media sources, process unstructured text, conduct sentiment analysis, and indicate possible risks [16]. NLP can also improve customer satisfaction through intelligent, sensible, automated, and responsive customer support that gives all associates easy-to-understand supply chain knowledge [14].
2.6 NLP in Operational Procurement
An NLP system can also improve operational procurement and thus streamline auditing and compliance actions between buyers and suppliers. An example is vendor matching, which links a freight and logistics company to one global supplier across various national territories [17].
2.7 NLP in Information Extraction
Information extraction is the method by which structured information is extracted from unstructured and semi-structured data [18]. An NLP approach provides information extraction and relation extraction as critical features in logistics.
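A very small rule-based sketch can illustrate what extracting structured fields from unstructured logistics text means in practice. The message, the field names and the two patterns below are hypothetical; a production system would typically rely on trained NLP models rather than regular expressions alone.

```python
import re

# Assumed patterns for illustration only.
# Container numbers in the common 4-letter + 7-digit layout, e.g. MSKU1234567.
CONTAINER_RE = re.compile(r"\b[A-Z]{4}\d{7}\b")
# Dates written as YYYY-MM-DD.
DATE_RE = re.compile(r"\b\d{4}-\d{2}-\d{2}\b")

def extract_shipment_fields(text):
    """Return the container numbers and dates found in a free-text message."""
    return {
        "containers": CONTAINER_RE.findall(text),
        "dates": DATE_RE.findall(text),
    }

# Hypothetical message (assumption).
message = "Container MSKU1234567 will be picked up on 2021-03-15 from Pune."
print(extract_shipment_fields(message))
```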
2.8 NLP in Improving Efficiency
Natural language processing can save transporters time by speeding up data entry and thus significantly improves the efficiency of supply chains [19]. The system can recognize users' activity patterns and predict what they need in order to automatically fill in shipping orders, freight charges, and other transactions, which saves the shipping parties precious time.
2.9 NLP in Transportation Management
NLP can be integrated with a transportation management system, emails, chat, and text to learn from these conversations and apply the insights to a logistics organization [20].
2.10 NLP in Automation
An NLP system recognizes the behaviors of specific users. It starts to forecast what they require by auto-populating shipping orders, bills of freight, and other transactions, which saves the supply chain valuable time.
3 Possible Challenges of NLP According to Blume Global [14] stakeholders training, interfaces for capturing and managing information, integration into existing business processes, and investment are mainly four challenges implementing and using NLP in logistics. The main challenge AI’s evolution is that humans do not support it because of the buzz created by the news, solution providers, and vendors [4]. It is a primary human mindset that we often disregard something that we don’t understand, and most people are entirely unaware of these technologies. That is why they stay far from the use of AI and find it difficult. Van der Linde [21] has very well depicted the comical depiction of 3 stages of AI. According to McKinsey [22] studies that 49% of all activities that humans do nowadays may be possible to automate by adapting current signified technology. According to Rao [23] partners and global AI leader at PricewaterhouseCoopers, even entirely accurate data could be problematically biased. In this paper, we find out some significant challenges that NLP can face in the domain of Logistics. • Text Preprocessing: Data coming from different sources have different characteristics, and that makes text preprocessing as one of the most challenging jobs in NLP. A clean dataset allows a model to learn important features and not overfit on the irrelevant text. • Ambiguity: Ambiguity is the most challenging aspect of the NLP solution. Natural Language is highly ambiguous. Ambiguity encompasses a broad spectrum of forms from speech ambiguity, preposition ambiguity, lexical and semantic ambiguity to more complex structures as metaphors. Handling ambiguity requires a sophisticated NLP technique [24]. • Polysemy: Words in natural language usually have a fair number of possible meanings. 
In processing texts, polysemous words might hamper the accuracy of the derived results, because completely different contexts are mixed within the instances, collected from the corpus, during which an ambiguous word occurred [25]. • Co-Reference Resolution: Co-reference resolution involves determining that two nouns referencing the same thing. Different types of noun phrases act separately; this leads us to use several approaches according to the type of noun. Few among many researchers consider the terms co-reference and anaphora mutually, although they are distinct terms [26–28]. • Information Extraction and mapping: Extraction of information is a practical approach to text understanding. This process aims to extract organized data from unstructured or semi-organized text [29]. Information Extraction exists as one of the most challenging steps in NLP due to the handling of wide-scale heterogeneously-produced data sources from several documents, and linguistic models [30]. • Combinatorial Explosion: Combinatorial explosions occur when the complexity increases exponentially, due to an increase in potential inputs combinations. Suppose a language has 10,000 vocabulary terms, and each sentence has 10 words,
R. Garg et al.
then the solution space can be as large as 10,000^10 (that is, 10^40), which is so vast that it is difficult to compute effectively at all [31].
• Fuzzy and Probabilistic Models: Fuzzy and probabilistic models have the potential to be excellent tools for modeling linguistic semantics. The challenging part is fuzzy natural logic, which has to provide a mathematical model of the ambiguity phenomenon, and, as already discussed, ambiguity is an unavoidable feature of natural language semantics [32].
• Pronoun Resolution: Pronoun resolution is the challenge of identifying references to earlier or later items in the discourse, such as noun phrases describing real-world objects, known as referents. Such items may be words, full sentences, or paragraphs [33]. One common example is the pleonastic 'it', an occurrence of 'it' that does not refer to any entity, as in a sentence describing a state (e.g., "It is raining"), which is easily misclassified.
NLP may also suffer from inflated expectations: many people do not fully understand how NLP works, so they hold high hopes for it, some of which cannot be met. Although AI has, to a greater or lesser extent, found a place in our lives in managing our tasks, many of us still believe that AI is not capable of undertaking all our tasks.
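The combinatorial-explosion estimate in this section can be checked with a few lines of Python. This is only an illustrative sketch: the function name `sentence_space_size` is hypothetical, and it assumes every position in a sentence can hold any vocabulary term independently.

```python
def sentence_space_size(vocabulary_size: int, sentence_length: int) -> int:
    """Number of distinct word sequences of the given length,
    assuming each position may hold any vocabulary term."""
    return vocabulary_size ** sentence_length


# The case discussed in the text: 10,000 terms, 10 words per sentence.
size = sentence_space_size(10_000, 10)
print(size == 10 ** 40)

# Growth of the space (shown as a power of ten) as sentences get longer.
for length in (1, 2, 5, 10):
    exponent = len(str(sentence_space_size(10_000, length))) - 1
    print(f"length {length}: 10^{exponent} candidate sentences")
```

Each extra word multiplies the space by the vocabulary size, which is why exhaustive enumeration is not an option even for short sentences.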
4 Logistics Tasks
4.1 Operational Procurement
Procurement is one of the most critical functions in the logistics business, because it provides the input that the organization transforms into output. Operational procurement refers to the procurement of goods and services that an organization requires to perform its day-to-day operations. Procurement activities include creating quality standards, issuing purchase orders, handling contracts, and dealing with complaints. Applying operational procurement best practices to the procurement function of a logistics organization provides cost savings in several ways. Larger contracts for goods or services can be negotiated with suppliers, which provides a financial benefit. Operational procurement ensures that the workflow runs more smoothly and that procurement processes do not have to be started from scratch over and over again. The history is available and shared within the organization for ease of use. In earlier times, Electronic Data Interchange (EDI) was the most common method for information exchange in direct procurement. However, the global use of EDI was limited by its high cost of deployment, heavy lock-in expenses, and its inability to process unstructured information [34]. According to Yi-Ming Tai, the technical capability of internet-based strategic procurement systems is a balancing factor for
Potential Use-Cases of Natural Language Processing …
direct procurement management [35]. Different procurement approaches may be built depending on the number of buyers. Procurement must exploit the expertise and skills of the supply base [36].
4.2 SCM Parties Communication
As more and more businesses try to reach global markets, a high degree of international coordination is required in logistics management. Logistics is more than moving materials and fulfilling orders: managers must also keep up with the information flow. Language diversity is a significant challenge in the global environment. As supply chains go global, many companies run into difficulties because of this diversity. Companies can lose their competitive edge when language barriers interrupt the communication flow. Supply chain logistics is concerned more with the assurance of shipments than with the way of shipping. Clients expect orders, and customers expect shipments. That assurance depends on carrying out a promise that passes through multiple languages on the way to its final destination [37]. Proper communication is the key to reducing conflicts in SCM objectives and simplifying personal relationships. The convergence of various industries, each with its own 'language', and of various cultures across companies and countries can hinder communication across supply chains. Communication is better when two international logistics companies have similar tasks, because their people work in the same environment and understand each other's needs. Communication has improved dramatically through automation and electronic means, which has also helped to build confidence and better connections between parties [38].
4.3 Logistics Documentation
Supply chain operations extend beyond national borders. Tasks in a logistics organization start from the customer's end and involve many parties in completing the entire logistics process: the purchaser, the supplier, third-party logistics providers, freight forwarders, carriers at several points, freight lines, airlines, various government officials, customs departments at different locations, and financial bodies such as banks. These supply chain bodies are actively involved, and the details and documentation needed for a smooth flow across the different distribution points should be clearly visible. Freight and logistics businesses sometimes handle more than 50 documents for each shipment they process, which can total thousands of documents in a single working week. The whole supply chain mechanism involves paperwork from different organizations, and the set of documents and terms of trade has been made uniform across countries in order to facilitate foreign trade.
Streamlining logistics documentation by automation provides a powerful, digital document management solution for freight and logistics companies. With automated documentation, organizations can put all of their documentation in a logical order and make it available at the fingertips of those who need to access it. It saves time, space, and transportation of massive paperwork, and ensures that logistics documentation is added to the systems and filed on time.
4.4 Customer Satisfaction and Service
Organizations across multiple industries recognize customer satisfaction (CS) as a crucial aspect of keeping a competitive edge. In the context of modern supply chain management, the impact of CS is even more critical. Achieving customer satisfaction results in a positive business relationship with another firm [39]. The efficiency of the supply chain may be a secret weapon for raising customer satisfaction and company profitability. Wherever you work in the logistics system, you serve various types of customers by providing support and handling customer complaints. High-quality information extraction can be used to determine and track customer satisfaction efficiently [40]. The collected data can be analyzed to understand what matters to customers and to ensure positive feedback on, and experience with, the company's services [41]. Voice assistants and chatbots can enhance the customer experience. With 24×7 chatbots, customers can receive support at any time and can quickly adapt their plans according to the information from the chatbot or virtual assistant. Dedicated human support is not needed for logistics tasks such as booking an order, tracking a shipment, or answering customer questions, because chatbots offer support around the clock.
4.5 Predictive Risk Management
One of the main tasks of a logistics organization is to identify and minimize potential risks to vendors, distributors, and other stakeholders in the supply chain by reviewing reports, business news, and social media posts. Logistics companies must comply with procurement and ethical standards by tracking information on possible violations by participants in the supply chain. The compliance and risk management area currently faces the problem of unstructured and semi-structured documents, which are difficult to convert into quantitative data. Because of this problem, it is challenging to assess managerial information from these documents expertly during risk management. Microsoft's senior risk manager proposed the general idea of using natural language processing (NLP) to analyze unstructured documents and extract vital information from them, and of using a dimensionality reduction algorithm for risk identification [42]. NLP software enables predictive risk management to track news content and interactions related to supply chain variables and to take effective remedial measures.
4.6 Contract Management and Drafting
Contract management is a branch of strategic management used by both buyers and sellers. Its objectives are to control the requirements of, and the relationship between, the client and the supplier in an organization. Contract management contributes to organizational profitability and success by controlling risk and cost. Aberdeen Research Group estimates that organizations with an explicitly defined contract management process save up to 80% more than other companies. The beauty of contract management is that it enables even a less efficient procurement to be recovered, whereas poor contract management can undo great procurement work. The implementation of AI in contract management is evolutionary. According to Guiraude Lame [43], NLP can extract legal concepts and the relations among these concepts (ontologies). The author provided a general framework for defining legal terms and the relational associations between them using NLP techniques. ML and NLP help to keep essential definitions and clauses uniform across all contracts. In an organization with several trade units, it can be challenging to maintain consistent definitions across all contracts. Contract managers could use ML and NLP approaches to search all contracts for a specific definition and to check whether that definition is correct for their contract. Contract management can apply NLP in the following ways.
1. Use NLP techniques to review contracts in depth and at speed
2. Build a high-level contract repository by creating a knowledge base of all contracts
3. Derive insights from contract clause data to identify risks
4. Tag terms for contract drafting and for notification of important dates and clauses
5. Feed the derived insights and intelligence into CLM processes and other commercially informed organizational processes
6. Flag noncompliance by either party to mitigate risk and ensure adherence.
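The idea of searching all contracts for a definition and checking that it is used consistently can be sketched with plain pattern matching. The example below is a minimal illustration, not a production contract-analysis tool: the contract texts, the names `extract_definitions` and `inconsistent_terms`, and the assumption that definitions follow the pattern `"Term" means ... .` are all hypothetical.

```python
import re
from collections import defaultdict

# Assumed drafting convention: "Term" means <definition>.
DEFINITION = re.compile(r'"([^"]+)"\s+means\s+([^.]+)\.', re.IGNORECASE)


def extract_definitions(text):
    """Map each defined term (lower-cased) to its normalized definition."""
    return {
        term.strip().lower(): " ".join(body.split()).lower()
        for term, body in DEFINITION.findall(text)
    }


def inconsistent_terms(contracts):
    """Return terms whose definition differs between contracts."""
    seen = defaultdict(dict)  # term -> {definition: [contract names]}
    for name, text in contracts.items():
        for term, definition in extract_definitions(text).items():
            seen[term].setdefault(definition, []).append(name)
    return {term: variants for term, variants in seen.items() if len(variants) > 1}


# Hypothetical contract snippets used only for illustration.
CONTRACTS = {
    "contract_a": '"Delivery Date" means the date on which the goods reach the buyer warehouse. '
                  '"Carrier" means the party transporting the goods.',
    "contract_b": '"Delivery Date" means the date on which the goods leave the seller warehouse. '
                  '"Carrier" means the party transporting the goods.',
}

for term, variants in inconsistent_terms(CONTRACTS).items():
    print(term)
    for definition, names in variants.items():
        print("  ", names, "->", definition)
```

A real system would likely replace the regular expression with a trained clause classifier, but the consistency check over the extracted definitions would look much the same.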
4.7 Supply Chain Planning
The planning of the supply chain is a crucial task in SCM. Supply planning determines how best to fulfill the demand plan requirements. The aim of this program is to align supply and demand to achieve the organization's financial and service goals. There is a need for smart tools to create concrete plans in today's business world. SCP modules include
• Integrated company planning of sales and operations
• Collaborative preparation including prediction and refilling
• Inventory handled by vendors/direct sales point
• Planning activities like promotion, and growth cycle
• Planning of demand and inventory
• Scheduling and planning of production/factory
• Distribution Requirement Planning (DRP)
• Strategic network design
• Supply planning (optimized, DRP and deployment).
4.8 Supplier Relationship Management (SRM)
As customers aim to stay financially sound and compete internationally, the risk of maintaining a competitive relationship with the majority of customers increases [44]. Many companies have shown interest in supplier relationship management (SRM) systems because of their effectiveness in many business situations. Customer relationship management is an important element in retaining a competitive edge and competitive differentiation [45]. SRM is an integral part of supply chain management (SCM). Managing supplier relationships consists of monitoring and organizing productive relationships with the third-party suppliers that provide the business with products and materials. SRM affects all domains of the supply chain and has a significant impact on performance. Organizations always face the challenge of managing the "people" when implementing an integrated system in the business. Machines remember everything and can retain information from a user's historical data, including previous discussions, decisions, preferences, and locations. The computer can understand the context of a situation and make communications more complete and useful by linking this to information from other sources, including diaries, contacts, and e-mails. Instructions given in natural language, rather than in code or a specific wording, provide a much faster and more accessible style of communication with the user [46]. AI can analyze customer-related data and provide insights that can be used for future decisions regarding particular customers. As a result, a company can make better decisions and improve its customer service by providing customized solutions [47].
4.9 Auditing and Compliance Check
Government agencies across the globe continually issue rules that affect supply chain operators. Organizations may face slower performance and possibly costly fines if they fail to integrate regulatory compliance adequately into international logistics operations. Diversity in the modes and methods of transport is a significant challenge in the global environment. Organizations must collaborate with professionals who understand logistics regulations and local standards and who can gather critical information not only from the organization but also from trade partners.
5 Some of the Major Approaches of NLP for Logistics Tasks
As an aid to the reader, we have tried to identify possible use cases of NLP techniques in logistics. Table 1 lists the possible application domains: the first column gives the logistics task, and the second column gives the NLP approach for performing that task. The following sub-sections describe each approach in detail.
5.1 Rule-Based NLP Approach
A rule-based methodology usually comprises a two-part analyzer/generator: first, a declarative component holding the linguistic information; second, a processing component reflecting an analysis/generation strategy [48, 49]. In this methodology, each token of the content is described by several features and compared against a rule. A rule contains a pattern and an action. When a pattern match occurs, the predefined action is triggered. Heuristic rules have also been developed and revised on an ad hoc and customized basis [50]. The rule-based approach is a shallow NLP technique and requires rules to be learned from training data [51].
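The pattern/action structure of a rule can be illustrated with a small sketch. The rules below are hand-written and hypothetical (they are not taken from any cited system): each rule pairs a regular-expression pattern with an action that labels the matching token, and the first matching rule wins.

```python
import re

# Each rule is a (pattern, action) pair; labels and patterns are illustrative.
RULES = [
    # Four letters followed by seven digits, a common container-number layout.
    (re.compile(r"^[A-Z]{4}\d{7}$"), lambda tok: ("CONTAINER_ID", tok)),
    # ISO-style calendar date.
    (re.compile(r"^\d{4}-\d{2}-\d{2}$"), lambda tok: ("DATE", tok)),
    # Weight written as a number immediately followed by "kg".
    (re.compile(r"^\d+(\.\d+)?kg$", re.IGNORECASE), lambda tok: ("WEIGHT_KG", float(tok[:-2]))),
]


def apply_rules(text):
    """Compare every token against the rules and fire the first matching action."""
    results = []
    for token in text.split():
        token = token.strip(",.;")
        for pattern, action in RULES:
            if pattern.match(token):
                results.append(action(token))
                break
    return results


print(apply_rules("Container MSKU1234567 weighing 1200kg arrived on 2021-03-15."))
```

Keeping the rules in a plain list makes it easy to revise them on an ad hoc basis, which is how heuristic rule sets are typically maintained.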
Table 1 Logistics tasks with corresponding NLP approach
Logistic task | Task description | NLP technique that can be used for the given task
Operational procurement | Streamline the auditing and compliance between logistics firms | Chatbots/procurement bots; Reinforcement learning
SCM communication | Decoding high volumes of foreign-language text and interpreting information left untapped because of existing language gaps among buyer-supplier entities | Machine translation; POS tagging and deep learning; Word sense disambiguation
Logistics documentation | Automating the analysis of the "millions of pages of documents" collected annually and sorting unstructured information | Rule-based approach
Logistics documentation | Getting/issuing documentation of invoices and payment/order queries | Text categorization/classification; Text similarity; Document summarization
Customer satisfaction and service | Sentiment analysis of customers | Sentiment analysis
Customer satisfaction and service | Managing the advertisement funnel of the logistics organization and market intelligence for the logistics firm | Chatbots
Customer satisfaction and service | Responding to customer queries, predicting customer requirements, and automating customer service to provide easy-to-understand supply chain information to some or all decision makers | Question answering; Reinforcement learning
Risk management | Finding news relevant to the logistics organization | Text similarity and topic modeling; Information retrieval and extraction
Contract management | Generation of standard documents and legal contracts in supply chain management | Text categorization/classification
Contract management | Language modeling for contracts | Ontology and language modeling; Keyword extraction; Word embedding
Supply chain planning | Planning a product from raw material to the consumer | Topic modeling and text classification; Digital assistant (VPA)
Supplier relationship management | Managing and scheduling positive interactions with third-party company vendors | Machine translation
Supplier relationship management | Clustering the information so that it can be analyzed and integrated into the logistics business | Information extraction; Text similarity
Auditing and compliance | Saving shippers valuable time by speeding up data entry and auto-populating form fields, shipping orders, bills of freight, and other transactions | Relationship and information extraction; Text similarity

5.2 Machine Translation
The role of machine translation is to translate one language into another automatically, preserving the meaning of the input text and delivering the text in the target language. Machine translation is one of the sub-domains of AI, and recent technical innovations have led to very significant improvements in translation quality. Translating data element names and descriptions with automated machine translation is a critical element in simplifying the interactions and transactions in a logistics system. In a machine translation process, the input language is treated as a series of symbols, which the program translates into a series of symbols in a separate output language. Given a sentence T, we look for the sentence S from which a translator would have produced T. The chance of error is minimized by choosing the sentence S that is most probable given T; thus, we want to select S to maximize P(S | T). Machine translation can help to counteract the language barrier in logistics communications.
• The beauty of machine translation is that an order created in Spanish can be read by a non-Spanish-speaking worker in his or her native language;
• When a container arrives at the port, a description of its contents that was filled in in Japan, in Japanese, can be understood by the Indian customs department;
• Transportation papers are easier to understand when people read them in their own language, because most workers involved in the transport flow are not fluent in English.
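Selecting the sentence S that maximizes P(S | T) can be made concrete with a toy example. By Bayes' rule, P(S | T) is proportional to P(S) · P(T | S), so a candidate can be scored by a language model P(S) and a translation model P(T | S). The sketch below assumes exactly that decomposition; all probabilities, sentences, and names (`LANGUAGE_MODEL`, `TRANSLATION_MODEL`, `best_candidate`) are invented for illustration and do not come from a trained system.

```python
# Hypothetical language model: P(S) for three candidate English sentences.
LANGUAGE_MODEL = {
    "the order has shipped": 0.6,
    "the order is sent away": 0.3,
    "order the has shipped": 0.1,
}

# Hypothetical translation model: P(T | S) for one observed Spanish sentence T.
OBSERVED = "el pedido ha sido enviado"
TRANSLATION_MODEL = {
    (OBSERVED, "the order has shipped"): 0.5,
    (OBSERVED, "the order is sent away"): 0.4,
    (OBSERVED, "order the has shipped"): 0.5,
}


def best_candidate(observed, language_model, translation_model):
    """Return the candidate S maximizing P(S) * P(T | S), plus all scores."""
    scores = {
        candidate: p_s * translation_model.get((observed, candidate), 0.0)
        for candidate, p_s in language_model.items()
    }
    return max(scores, key=scores.get), scores


best, scores = best_candidate(OBSERVED, LANGUAGE_MODEL, TRANSLATION_MODEL)
print(best)
print(scores)
```

Note how the language model penalizes the scrambled candidate even though the translation model rates it as highly as the fluent one.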
5.3 Text Similarity
Text similarity plays an important role in driving innovations in data processing. It is now applied extensively in the fields of data mining, artificial intelligence, information processing, and knowledge management [52]. The goal of text similarity is to determine the lexical and semantic closeness of two phrases. Consider, for example, the similarity of the phrase "what is the status of different logistics operation?" to "what operation in logistics has status?". If we think only of word-level similarity, the two sentences seem very close, since three of the four distinct content words overlap:
overlap_measure = |{status, different, logistics, operation} ∩ {operation, logistics, status}| = 3
This concept of similarity is referred to as lexical similarity. It does not generally take into account the true meaning of the words or of the entire sentence, even though the true meaning of the phrases may differ. Another concept of similarity, widely explored by the NLP research community, asks how similar two phrases are in meaning. If we look at the sentences "what is the status of different logistics operation?" and "what operation in logistics has status?", we realize that they have different meanings, although their words overlap significantly. To capture more of the semantics, we need to look at the meaning of the sentence instead of matching word for word. To do so, the text is first broken into groups of related terms, and similarity is then calculated at the phrase or paragraph level (the lexical-sequence level). Semantic similarity is generally used for NLP tasks such as paraphrase identification and automatic question answering.
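The lexical overlap between the two example sentences can be reproduced with a short sketch. The stop-word list is an assumption made for this example only, and the Jaccard coefficient is added as one common way to normalize the raw overlap; neither is prescribed by the text.

```python
import re

# Hypothetical stop-word list, just large enough for the two example sentences.
STOP_WORDS = {"what", "is", "the", "of", "in", "has", "have"}


def content_words(sentence):
    """Lower-case the sentence and keep the non-stop-word tokens as a set."""
    return {w for w in re.findall(r"[a-z]+", sentence.lower()) if w not in STOP_WORDS}


def overlap(a, b):
    """Number of content words shared by the two sentences."""
    return len(content_words(a) & content_words(b))


def jaccard(a, b):
    """Shared content words divided by all distinct content words."""
    union = content_words(a) | content_words(b)
    return len(content_words(a) & content_words(b)) / len(union) if union else 0.0


S1 = "what is the status of different logistics operation?"
S2 = "what operation in logistics has status?"
print(overlap(S1, S2), jaccard(S1, S2))
```

A high lexical score such as this one says nothing about whether the two questions mean the same thing, which is exactly the gap that semantic similarity measures try to close.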
5.4 Word Embedding
Word embedding is one of the most popular representations of document vocabulary. In this approach, text is converted into a numerical representation, and there may be different numerical representations of the same text. The approach maps words with the same meaning to similar representations; in other words, it captures the meaning of a word in a text, its contextual and lexical similarity, and its relation to other terms. Each word is mapped to one vector, and the values of the vector are learned in a way that resembles the training of a neural network. Word embeddings are classified into two categories.
5.4.1 Frequency-Based TF-IDF Word Embedding
TF-IDF is an information retrieval technique whose value is obtained by multiplying the term frequency (the raw count of a term) by the inverse document frequency. This product is named TF-IDF [53, 54]. The measure is intended to reflect how relevant a term is in each document. The intuition behind TF-IDF is that if a word occurs repeatedly in a document, then it should be significant and be given a high score.
But if several other documents contain that word, it might not be a distinctive word, which gives the word a lower score [55]. The main objective of this term-weighting approach is to enhance the effectiveness of retrieval [56]. Term Frequency (TF) measures how often the term t occurs in a document D. Inverse Document Frequency (IDF) measures how much information the term provides; it depends on counting the number of documents in the collection that contain the term.

$$\mathrm{TF}(t) = \frac{\text{Number of times term } t \text{ appears in a document}}{\text{Total number of terms in the document}}$$

$$\mathrm{IDF}(t) = \log_e \frac{\text{Total number of documents}}{\text{Number of documents containing term } t}$$
Take a text document of 1000 words in which the term 'logistics' occurs 30 times. The term frequency for 'logistics' is 30/1000 = 0.03. Assume a collection of 10 million documents in which the word 'logistics' appears in 1,000 documents. The IDF is the natural logarithm of 10,000,000/1,000 = 10,000, which is approximately 9.21. The TF-IDF measure is the product of these two values: 0.03 × 9.21 ≈ 0.276. (If a base-10 logarithm is used instead, the IDF is exactly 4 and the TF-IDF is 0.03 × 4 = 0.12.)
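The worked example can be checked numerically. The sketch below simply evaluates the TF and IDF definitions from this section with the standard library; the function names are illustrative, and the natural logarithm is used as in the IDF formula, with the base-10 variant shown for comparison.

```python
import math


def term_frequency(term_count, total_terms):
    """TF: occurrences of the term divided by the document length."""
    return term_count / total_terms


def inverse_document_frequency(total_docs, docs_with_term, base=math.e):
    """IDF: logarithm of (all documents / documents containing the term)."""
    return math.log(total_docs / docs_with_term, base)


tf = term_frequency(30, 1000)                            # 0.03
idf_natural = inverse_document_frequency(10_000_000, 1_000)
idf_base10 = math.log10(10_000_000 / 1_000)

print(round(tf, 2))
print(round(idf_natural, 2), round(tf * idf_natural, 3))
print(round(idf_base10, 2), round(tf * idf_base10, 2))
```

The choice of logarithm base only rescales every IDF value by a constant, so it does not change how terms rank against each other within one collection.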
5.4.2 Prediction-Based Word Embedding
Put simply, word embeddings are text converted into a numerical representation, and there may be different numerical representations of the same text. They are needed because most machine learning and deep learning algorithms cannot process strings of plain or raw text. A prediction-based model is a probabilistic model: it predicts the word that comes next in a sequence from the current and surrounding words. Earlier word embedding methods were deterministic and proved limited in how well they represent words, until Mikolov et al. [57] introduced word2vec to the NLP community. The algorithm can perform operations such as King − Man + Woman = Queen, which was regarded as a remarkable result. In general, we have encountered two types of prediction-based word embedding approaches.
Continuous Bag of Words (CBOW)
The CBOW model attempts to estimate the target word from the surrounding words. As an example, we can use "monkey" and "tree" as the surrounding words for "climbed" as the output word. In general, CBOW predicts the probability of a word given its context, and the model learns the embedding of a word by predicting it from that context. The architecture is known as a bag-of-words model because the ordering of the context words does not affect the prediction [20]. The expression below gives the training complexity:
$$Q = N \times D + D \times \log_2(V)$$
where N × D is the dimensionality of the projection layer P onto which the input layer, encoding N context words, is projected; D is the dimensionality of the word vectors; and V is the vocabulary size.
Skip Gram
Skip-gram is like CBOW, but instead of predicting the current word from its context, it predicts the surrounding words from the current word. The model thus reverses the roles of the target and context words: it takes a word from the vocabulary and predicts its context words. The training complexity of this model is proportional to
$$Q = C \times (D + D \times \log_2(V))$$
where C is the maximum distance of the words, that is, the size of the context window.
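The difference between the two architectures is easiest to see in the training examples they consume. The sketch below, which uses hypothetical helper names and a symmetric window, builds CBOW examples (context words mapped to the target) and skip-gram examples (target mapped to each context word) for the "monkey climbed tree" illustration, and evaluates the two complexity expressions for sample values of N, C, D, and V chosen only for this example.

```python
import math


def cbow_pairs(tokens, window):
    """(context words, target word) examples: the context predicts the target."""
    pairs = []
    for i, target in enumerate(tokens):
        context = tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]
        pairs.append((context, target))
    return pairs


def skipgram_pairs(tokens, window):
    """(target word, context word) examples: the target predicts each neighbor."""
    return [(target, ctx) for context, target in cbow_pairs(tokens, window) for ctx in context]


def cbow_complexity(n, d, v):
    """Q = N*D + D*log2(V)."""
    return n * d + d * math.log2(v)


def skipgram_complexity(c, d, v):
    """Q = C*(D + D*log2(V))."""
    return c * (d + d * math.log2(v))


TOKENS = ["monkey", "climbed", "tree"]
print(cbow_pairs(TOKENS, window=1))
print(skipgram_pairs(TOKENS, window=1))
print(cbow_complexity(4, 100, 1024), skipgram_complexity(5, 100, 1024))
```

Because skip-gram produces one example per context word, its cost grows with the window size C, which matches the extra factor in its complexity expression.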
5.5 POS Tagging
Part-of-speech (POS) tagging means classifying each word as a noun, pronoun, adjective, verb, adverb, conjunction, or one of the other categories and subcategories. Determining which part of speech a term in a text corresponds to depends on its definition and its context. Tags help to build parse trees, which are used in named-entity recognition (NER) and in extracting the relationships between words. Parts of speech are useful because they tell us a lot about a word and its neighboring words. Knowledge of the syntactic structure is an important feature for parsing and for POS tagging; for example, determiners are followed by nouns, and nouns are followed by adjectives or verbs [58]. Speech recognition is the task of understanding human speech, and it improves logistics communication. Speech recognition has several real-world applications, such as virtual assistants, digital assistants, and chatbots. A POS-based language model, trained on the audio data, is used to create the text output. POS tagging is a supervised learning approach that uses features such as the previous word, the next word, and the capitalization of the first letter. Various POS taggers and tag sets exist, such as Penn Treebank, TextBlob, the Pattern tagger, and spaCy; the most popular POS taggers use the Penn Treebank tag set. POS tagging can be built on various methods. The lexical-based method uses the tag that occurs most frequently with a word in a corpus. The rule-based approach uses predefined rules for tagging, and the probabilistic approach is based on the probability of a tag occurring; CRFs and HMMs are probabilistic approaches to assigning POS tags. Deep learning uses a neural network for POS tagging. Hinrich Schütze, in Distributional Part-of-Speech Tagging [59], defined a fully automatic algorithm that derives the information necessary for tagging parts of speech.
The notable attribute of this algorithm is its ability to handle the ambiguity associated with parts of speech, a persistent factor in natural language that was not taken into account in previous work on learning categories from corpora. A simple rule-based part-of-speech tagger was introduced by Brill [48], with accuracy comparable to that of stochastic taggers. The tagger first assigns each word its most likely tag, estimated by inspecting a large tagged corpus, without regard to context. The tagger is exceptionally portable. A Markov model-based tagger was demonstrated by Michele et al. [60], in which a word is tagged using the context on both sides, and the approach was evaluated with both unsupervised and supervised training. The authors also presented variations of HMM learning for the evaluation of a tag set and of the sequence of lexical probabilities.
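The two-stage idea behind a rule-based tagger (first assign each word its most likely tag, then correct mistakes with contextual rules) can be sketched in a few lines. This is a toy illustration under stated assumptions, not Brill's implementation: the tagged corpus is invented, and the single contextual patch is written by hand here rather than learned from data.

```python
from collections import Counter, defaultdict

# Hypothetical hand-tagged toy corpus (Penn Treebank style tags).
TAGGED_CORPUS = [
    [("the", "DT"), ("shipment", "NN"), ("arrived", "VBD")],
    [("we", "PRP"), ("ship", "VBP"), ("the", "DT"), ("order", "NN")],
    [("the", "DT"), ("ship", "NN"), ("arrived", "VBD")],
    [("the", "DT"), ("ship", "NN"), ("sailed", "VBD")],
]

# Contextual patch: change from_tag to to_tag when the previous tag is prev_tag.
PATCHES = [("NN", "VBP", "PRP")]


def train(corpus):
    """Learn the most frequent tag of every word, ignoring context."""
    counts = defaultdict(Counter)
    for sentence in corpus:
        for word, tag in sentence:
            counts[word.lower()][tag] += 1
    return {word: tags.most_common(1)[0][0] for word, tags in counts.items()}


def initial_tags(words, model, default_tag="NN"):
    """Stage 1: tag each word with its most likely tag (unknown words get NN)."""
    return [(w, model.get(w.lower(), default_tag)) for w in words]


def apply_patches(tagged, patches):
    """Stage 2: fix tags using the tag of the preceding word."""
    result = list(tagged)
    for from_tag, to_tag, prev_tag in patches:
        for i in range(1, len(result)):
            word, current = result[i]
            if current == from_tag and result[i - 1][1] == prev_tag:
                result[i] = (word, to_tag)
    return result


MODEL = train(TAGGED_CORPUS)
first_pass = initial_tags(["We", "ship", "the", "container"], MODEL)
print(first_pass)
print(apply_patches(first_pass, PATCHES))
```

In the first pass "ship" receives its most frequent tag (noun) regardless of context; the patch then retags it as a verb because it follows a pronoun.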
5.6 Keyword Extraction
Keywords are defined as a sequence of one or more words that gives a compact picture of a document's content. Ideally, keywords represent the essential content of a document in condensed form. Keywords have also been used to improve the usefulness of information retrieval systems [61]. Keyword extraction means detecting short phrases or groups of words that briefly describe the contents of a document [62]. We use the Rapid Automatic Keyword Extraction (RAKE) algorithm for extracting and classifying the keywords and phrases of a document. It is one of the best-known models for keyword extraction and was developed by Rose et al. in 2010 [61]. The method is used extensively for English and for languages with a structure similar to English. RAKE selects keywords based on each keyword's score, which is the sum of the degree-to-frequency ratios of its words [63]. The input to RAKE includes a list of stop words and a set of phrase delimiters. The algorithm uses the stop-word list and the sentence boundaries to segment the text into sequences of informative words known as candidate keywords. The algorithm is as follows.
1. Candidate keywords
– Split the text document into a list of words using word delimiters (such as punctuation and spaces) [63].
– Split the resulting list at the stop words to obtain sequences of contiguous words. Each such sequence is called a "candidate keyword" [63].
2. Keyword scores
– Once every candidate keyword has been identified and the word co-occurrence graph is complete, a score is determined for each candidate keyword, defined as the sum of the scores of its member words.
– For each candidate keyword, add up the word scores of its component terms to determine its score.
– Several metrics can be used to calculate a word score: the frequency of the word, freq(w); the degree of the word, deg(w); and the degree-to-frequency ratio, deg(w)/freq(w).
3. Extracted keywords
– Take the top one-third of the candidates, ranked by score, as the final list of extracted keywords.
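The three steps above can be sketched compactly using only the standard library. This is a simplified illustration rather than the reference RAKE implementation: the stop-word list is a hypothetical short one, the degree of a word counts the word itself, the deg(w)/freq(w) ratio is used as the word score, and the "top one-third" cut is applied to the ranked list of distinct candidates.

```python
import re
from collections import defaultdict

# Hypothetical, deliberately short stop-word list.
STOP_WORDS = {"the", "is", "for", "and", "of", "a", "an", "in", "to"}


def candidate_keywords(text, stop_words=STOP_WORDS):
    """Step 1: split at punctuation, then at stop words."""
    candidates = []
    for fragment in re.split(r"[.,;:!?()]+", text.lower()):
        current = []
        for word in re.findall(r"[a-z]+", fragment):
            if word in stop_words:
                if current:
                    candidates.append(tuple(current))
                current = []
            else:
                current.append(word)
        if current:
            candidates.append(tuple(current))
    return candidates


def word_scores(candidates):
    """Step 2a: score each word as deg(w) / freq(w)."""
    freq = defaultdict(int)
    degree = defaultdict(int)
    for candidate in candidates:
        for word in candidate:
            freq[word] += 1
            degree[word] += len(candidate)  # co-occurrences, including the word itself
    return {word: degree[word] / freq[word] for word in freq}


def rank_keywords(text, stop_words=STOP_WORDS):
    """Step 2b: a candidate's score is the sum of its word scores."""
    candidates = candidate_keywords(text, stop_words)
    scores = word_scores(candidates)
    totals = {" ".join(c): sum(scores[w] for w in c) for c in candidates}
    return sorted(totals.items(), key=lambda item: (-item[1], item[0]))


def extract_keywords(text, stop_words=STOP_WORDS):
    """Step 3: keep the top one-third of the ranked candidates (at least one)."""
    ranked = rank_keywords(text, stop_words)
    return ranked[: max(1, len(ranked) // 3)]


TEXT = "The freight forwarder is responsible for customs clearance and for the delivery of the freight."
print(rank_keywords(TEXT))
print(extract_keywords(TEXT))
```

Multi-word candidates tend to score higher because every member word gains degree from its neighbors, which is why RAKE favors phrases over isolated words.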
5.7 Deep Learning
Deep learning is an area of machine learning comprising algorithms inspired by the structure and function of the brain, known as artificial neural networks. Such models achieve a higher level of performance; they require more data but less linguistic expertise to train and operate in the field of linguistic communication [64]. Creating deep neural networks with Keras in Python is very easy; however, the model life cycle shown in Fig. 2 should be followed strictly. Several important applications in natural language processing use a deep learning approach. One of them is word embedding [65]. Collobert and Weston [66] defined a deep neural network architecture for NLP. It takes massive databases as input (e.g., 631 million words from Wikipedia) and outputs POS tags, chunks, named entities, semantic roles, semantically similar words, and a language model. Another vital application of DL in logistics communication is the automatic correction of grammatical errors, which provides better communication across logistics parties [67]. Google's continuing work in artificial intelligence and machine modeling is creating a significant leap forward for prospective NLP applications. The Google Brain project is one of the admirable projects in this field [68, 69]. Google's deep learning researchers have published many impressive papers; examples include the Inception module used in the GoogLeNet convolutional neural network.

Fig. 2 Neural network models life cycle

Most NLP tasks need semantic modeling across the complete phrase, which involves capturing the essence of a sentence in a fixed dimension. Recurrent neural networks (RNNs) can summarize whole sentences, which makes them useful for tasks such as machine translation, in which the entire sentence is represented as a fixed-size vector and then decoded back into a variable-length sequence [70]. TensorFlow is a Google project that supports RNNs and can be used to develop machine learning models. It provides a variety of toolkits that enable us to construct models at the desired level of abstraction. TensorFlow brings together machine learning and deep learning models and algorithms and makes them usable through a shared interface [70]. The TensorFlow library can be downloaded and installed on a system using the Python pip installer. TensorFlow comes with various bundled packages, for example for speech recognition and image recognition. Google Colab is another way of using TensorFlow. No installation is necessary; it can be used directly in the browser with Colaboratory [71], a free Jupyter notebook environment that requires no setup and runs entirely in the cloud. TensorFlow is pre-installed in this notebook environment, so the user only has to import TensorFlow and start using it [72] (Fig. 3).

Fig. 3 Neural networks architecture: a machine learning; b deep learning; c RNN; d forward neural network [73]
R. Garg et al.
5.8 Chatbots
Procurement documents form the contractual bond between the customer and the supplier of goods or services. As the technology advances, procurement professionals are showing interest in chatbots, because they present a valuable and distinctive opportunity to ensure better services and capabilities for customers and suppliers. Moreover, by using textual information, chatbots can help create a digital procurement experience for stakeholders, suppliers, and the procurement teams themselves, and thus improve the customer-supplier experience. A chatbot can provide diverse solutions to the various business challenges in the procurement process [20]. Chatbots powered by artificial intelligence and natural language processing can support the procurement process through a text interface. A procurement chatbot can assist companies 24/7 by keeping users engaged and providing information such as order progress, delivery status, inventory availability, stock cost, supplier status, and contract terms. One approach to building chatbots is the rule-based approach, in which the bot answers queries according to a set of rules on which it was trained.
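A rule-based procurement chatbot of the kind described above can be sketched in a few lines. The order identifiers, stock figures, and reply wording below are hypothetical, and the two regular-expression rules stand in for what would be a much larger rule set in a real system.

```python
import re

# Hypothetical back-office data the bot can look up.
ORDERS = {"PO-1001": "shipped", "PO-1002": "awaiting supplier confirmation"}
STOCK = {"pallets": 120, "containers": 0}


def reply(message):
    """Answer a user message by trying each hand-written rule in turn."""
    text = message.strip()

    # Rule 1: order status, triggered by an order id plus a status keyword.
    order = re.search(r"\b(PO-\d+)\b", text, re.IGNORECASE)
    if order and re.search(r"\b(status|progress|where)\b", text, re.IGNORECASE):
        order_id = order.group(1).upper()
        status = ORDERS.get(order_id)
        if status is None:
            return f"I could not find order {order_id}."
        return f"Order {order_id} is {status}."

    # Rule 2: inventory availability, triggered by a stock keyword plus an item name.
    if re.search(r"\b(stock|inventory|availab\w*)\b", text, re.IGNORECASE):
        for item, quantity in STOCK.items():
            if re.search(rf"\b{re.escape(item)}\b", text, re.IGNORECASE):
                return f"{quantity} {item} currently in stock."

    # Fallback when no rule matches.
    return "Sorry, I can only answer questions about order status and stock."


print(reply("What is the status of po-1001?"))
print(reply("Is there any stock of pallets?"))
```

The fallback reply illustrates the main limitation of the rule-based approach: any question outside the predefined rules goes unanswered.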
5.9 Reinforcement Learning
Reinforcement learning (RL) is the process of learning optimized behavior through trial-and-error experience in complex learning situations. This approach tries to mimic the basic mechanism humans use to learn the right behavior without requiring a strict model of the environment. Feuerriegel et al. [74] suggested a method to identify negation cues and their scope in business news in order to improve the accuracy of sentiment calculations. RL can work with humans and assist them. This is easy to picture by considering a robot or a virtual assistant that interacts with you and takes your decisions into account when taking steps toward a shared objective. Giannoccaro and Pontrandolfo provided a systematic approach for managing inventory decisions at all stages of the supply chain in a cohesive manner. The authors explained how to apply this learning approach under standard reward conditions to implement an optimized inventory policy [75]. Another RL approach, which determines a replenishment policy in a vendor-managed inventory system with consignment inventory, was outlined by Sui et al. in 2010 [73]. Qiu et al. outlined in 2007 a method in which an RL algorithm is used to obtain the policy decisions and costs for various commercial service modes in delivery systems [76].
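As an illustration of trial-and-error learning for an inventory decision, the sketch below applies tabular Q-learning to a toy single-warehouse problem. It is not the method of any of the cited papers; the stock limit, demand range, and all revenue and cost figures are hypothetical values chosen only to make the example run.

```python
import random

MAX_STOCK = 5
ACTIONS = [0, 1, 2, 3]  # units to order each period (hypothetical)


def step(stock, order, rng):
    """Simulate one period and return (next_stock, reward)."""
    stock = min(MAX_STOCK, stock + order)   # the warehouse has limited capacity
    demand = rng.randint(0, 3)              # random customer demand
    sold = min(stock, demand)
    lost = demand - sold                    # unmet demand
    next_stock = stock - sold
    # Hypothetical economics: revenue, holding cost, ordering cost, stock-out penalty.
    reward = 5.0 * sold - 1.0 * next_stock - 2.0 * order - 3.0 * lost
    return next_stock, reward


def train(episodes=2000, horizon=20, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    """Learn action values Q(stock, order) with epsilon-greedy exploration."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(MAX_STOCK + 1) for a in ACTIONS}
    for _ in range(episodes):
        stock = rng.randint(0, MAX_STOCK)
        for _ in range(horizon):
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)                         # explore
            else:
                action = max(ACTIONS, key=lambda a: q[(stock, a)])  # exploit
            next_stock, reward = step(stock, action, rng)
            best_next = max(q[(next_stock, a)] for a in ACTIONS)
            # Q-learning update toward reward plus discounted best future value.
            q[(stock, action)] += alpha * (reward + gamma * best_next - q[(stock, action)])
            stock = next_stock
    return q


def policy(q):
    """Extract the greedy ordering decision for every stock level."""
    return {s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(MAX_STOCK + 1)}


print(policy(train()))
```

The learner is never given the demand distribution; it only observes rewards, which is what is meant by learning without a strict model of the environment.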
5.10 Text Classification
Although machine learning has undergone significant development in recent years, the Naive Bayes (NB) classifier has proved to be simple as well as fast, accurate, and reliable. It has been used effectively for many purposes, but it is applied primarily to NLP problems. The NB classifier is a probabilistic algorithm. It applies Bayes’ theorem, under the assumption that every pair of features is conditionally independent given the class, to predict the tag of a text. Bayes’ theorem gives the probability of an outcome class given the document to be classified:

$$P(oc \mid cd) = \frac{P(cd \mid oc)\, P(oc)}{P(cd)}$$
Here, “oc” denotes the outcome class and “cd” is the document that needs to be classified. The NB classifier is mainly used for classifying text. Text classification is the task of assigning tags or categories to text according to its content. It is sometimes also called text categorization or text tagging, and it is one of the vital tasks in natural language processing. Its wide-ranging applications include sentiment analysis, text labeling, spam detection, topic categorization, tagging, genre identification, and intent detection.
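The classification rule can be illustrated with a small multinomial Naive Bayes sketch. The training sentences and class labels are hypothetical, add-one (Laplace) smoothing is an assumption not discussed in the text, and the denominator P(cd) is omitted because it is the same for every class and does not change which class scores highest.

```python
import math
from collections import Counter, defaultdict


def train_nb(labeled_docs):
    """Count documents per class and words per class from (text, class) pairs."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for text, oc in labeled_docs:
        tokens = text.lower().split()
        class_counts[oc] += 1
        word_counts[oc].update(tokens)
        vocabulary.update(tokens)
    return class_counts, word_counts, vocabulary


def classify(text, model):
    """Return the class oc that maximizes log P(oc) + sum of log P(word | oc)."""
    class_counts, word_counts, vocabulary = model
    total_docs = sum(class_counts.values())
    scores = {}
    for oc in class_counts:
        score = math.log(class_counts[oc] / total_docs)  # prior P(oc)
        total_words = sum(word_counts[oc].values())
        for token in text.lower().split():
            if token not in vocabulary:
                continue  # ignore words never seen in training
            # Likelihood P(token | oc) with add-one smoothing.
            score += math.log(
                (word_counts[oc][token] + 1) / (total_words + len(vocabulary))
            )
        scores[oc] = score
    return max(scores, key=scores.get)


# Hypothetical training data for two document classes.
training = [
    ("shipment delayed at port", "delay"),
    ("container delayed by customs", "delay"),
    ("invoice payment received", "billing"),
    ("payment due for invoice", "billing"),
]
model = train_nb(training)
print(classify("customs delayed the shipment", model))  # delay
print(classify("invoice payment", model))               # billing
```

Summing log-probabilities instead of multiplying probabilities avoids numerical underflow on longer documents; the independence assumption is what allows the per-word terms to be added separately.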
5.11 Relationship Extraction
Relation Extraction (RE) is the task of extracting semantic relationships, which usually hold between two or more entities, from text. It plays a crucial role in retrieving structured knowledge from raw text provided by unstructured sources. Many applications in knowledge extraction, language understanding, and knowledge discovery need to understand the semantic relationships between entities. For example, one may need to identify interactions between medicines to construct a medical repository, recognize the scene in an image, or retrieve relationships between people in order to construct a searchable knowledge repository. A relationship is described as a tuple $t = (e_1, e_2, \ldots, e_n)$, where the $e_i$ are the entities in a predetermined relationship R within document D [77]. Relations can be of different types. For example, “AI implemented in Logistics” states an “implemented in” relationship from AI to Logistics, which is denoted by the triple (AI, implemented in, Logistics). There are two methods for extracting relationships. One is open information extraction, which extracts the desired information
from raw text with little or no human intervention. The other is supervised information extraction, which accomplishes the same goal using some prior knowledge [78].
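The triple notation can be illustrated with a small pattern-based extractor. This is only a sketch of extraction with prior knowledge: the list of relation phrases is a hypothetical, hand-supplied resource, and the entity boundaries are simply taken to be everything to the left and right of the phrase within a sentence.

```python
import re

# Hypothetical prior knowledge: the relation phrases we are looking for.
RELATION_PHRASES = ["implemented in", "supplied by", "located in"]


def extract_triples(text):
    """Return (entity1, relation, entity2) triples found in the text."""
    triples = []
    for sentence in re.split(r"[.!?]\s*", text):
        for phrase in RELATION_PHRASES:
            left, found, right = sentence.partition(f" {phrase} ")
            if not found:
                continue
            # Drop a trailing auxiliary verb such as "is" from the first entity.
            left = re.sub(r"\s+(is|are|was|were)$", "", left.strip())
            triples.append((left, phrase, right.strip()))
    return triples


print(extract_triples("AI implemented in Logistics."))
# [('AI', 'implemented in', 'Logistics')]
```

A realistic system would first run named entity recognition to delimit the entities rather than taking whole sentence halves.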
5.12 Language Modelling
Language modeling is a crucial component of modern NLP. Language models (LMs) estimate the relative likelihood of different phrases in a text and give the probability of the words that follow. An LM computes the likelihood of a token, where a token may be a phrase or a sequence of words. Many language processing applications, such as machine translation, spelling correction, speech recognition, description, question answering, and sentiment analysis, require a language model. The language model learns to estimate the probability of a word sequence,

$$P(W) = P(w_1, w_2, w_3, \ldots, w_n)$$

and it can be used to find the probability of the next word in the sequence, for example

$$P(w_5 \mid w_1, w_2, w_3, w_4)$$

A language model is a method that measures these probabilities. The text needs to be normalized first. Normalizing the text means that the indexed text and the query terms should have the same form. For example, U.S.A., U.S, and u.s.a. are to be considered the same (dots removed and letters lowercased) and should give the same result when any of them is queried. Two main types of language models are available:
1. Statistical Language Models: These models use a typical statistical approach, such as N-grams, Markov models, and other linguistic rules, to learn the probability distribution of words.
2. Neural Language Models: These are newer approaches in the NLP domain and have surpassed statistical language models in effectiveness. They use different types of neural networks for language modeling.
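A statistical language model of the N-gram type can be sketched as a bigram model, which approximates the probability of each word by conditioning only on the previous word (a first-order Markov assumption). The three-sentence corpus is hypothetical, the sentence-boundary markers `<s>` and `</s>` are an implementation choice, and no smoothing is applied, so unseen word pairs receive probability zero.

```python
from collections import Counter, defaultdict


def normalize(text):
    """Lowercase and remove dots, so 'U.S.A.' and 'u.s.a.' share the form 'usa'."""
    return text.lower().replace(".", "")


def train_bigram(sentences):
    """Count how often each word follows each previous word."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        tokens = ["<s>"] + normalize(sentence).split() + ["</s>"]
        for prev, word in zip(tokens, tokens[1:]):
            counts[prev][word] += 1
    return counts


def next_word_probability(counts, prev, word):
    """Estimate P(word | prev) as a relative frequency."""
    total = sum(counts[prev].values())
    return counts[prev][word] / total if total else 0.0


def sentence_probability(counts, sentence):
    """Estimate P(W) as the product of the bigram probabilities."""
    tokens = ["<s>"] + normalize(sentence).split() + ["</s>"]
    probability = 1.0
    for prev, word in zip(tokens, tokens[1:]):
        probability *= next_word_probability(counts, prev, word)
    return probability


# Hypothetical training corpus.
corpus = ["The cargo arrived.", "The cargo shipped.", "The truck arrived."]
lm = train_bigram(corpus)
print(next_word_probability(lm, "the", "cargo"))    # 2 of 3 continuations of "the"
print(sentence_probability(lm, "The cargo arrived."))
```

Here "cargo" follows "the" in two of three sentences, so the model assigns it probability 2/3, and the full sentence "the cargo arrived" receives 1 × 2/3 × 1/2 × 1 = 1/3.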
5.13 Document Summarization
Document or text summarization is the task of creating a short description of a text document. It refers to extracting the most crucial information from a large text to generate a short and meaningful summary. Reported results show that integrating even simple summarization systems can substantially reduce employees’ turnaround time without considerably decreasing work
quality [79]. A text summarizer is useful both for humans and for computer programs and algorithms, since it allows a large amount of data to be processed in a shorter time. Document summarization mainly involves creating a heading or an abstract for a document. There are two main approaches to summarizing text in NLP:
1. Extraction-based summarization: This technique builds a summary by extracting important sentences or phrases from the source text and compiling them. For example:
Source text: John and Peter went to attend Global Logistics Summit 2019. While in the summit, John fell down and was admitted to hospital.
Extractive summary: John and Peter attend Global Logistics Summit. John admitted to hospital.
The extracted words are joined together to create the summary, so the summary is sometimes grammatically odd.
2. Abstraction-based summarization: This technique paraphrases and shortens the original document, just as humans do. It conveys the most useful information from the source text by producing new sentences and phrases. Therefore, the abstractive technique can produce more fluent summaries than the extractive one. An abstractive summary of the source text above is:
Abstractive summary: John hospitalized after attending a Global Logistics Summit with Peter.
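The extraction-based approach can be sketched with a simple frequency heuristic: sentences whose words occur often in the document are assumed to be the most important. Unlike the word-level example above, this sketch extracts whole sentences; the stop-word list and the scoring rule are assumptions made for illustration.

```python
import re
from collections import Counter

# Hypothetical minimal stop-word list; a real system would use a fuller one.
STOPWORDS = {"a", "an", "and", "the", "to", "in", "of", "was", "while", "is", "on", "for"}


def content_words(text):
    """Lowercase the text and keep alphanumeric tokens that are not stop words."""
    return [w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOPWORDS]


def summarize(text, max_sentences=1):
    """Return the highest-scoring sentences, kept in their original order."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    frequency = Counter(content_words(text))

    def score(sentence):
        # A sentence scores the summed document frequency of its content words.
        return sum(frequency[w] for w in content_words(sentence))

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    chosen = sorted(ranked[:max_sentences])
    return " ".join(sentences[i] for i in chosen)


source = ("John and Peter went to attend Global Logistics Summit 2019. "
          "While in the summit, John fell down and was admitted to hospital.")
print(summarize(source, max_sentences=1))
```

On this two-sentence source the first sentence scores higher because it contains more content words, which also shows a weakness of the heuristic: it favors long sentences and can miss the key event.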
5.14 Word Sense Disambiguation (WSD)
In all major languages, there are many words that have different meanings in different contexts. Consider the English word “bank”, which can mean “riverside”, “financial institution”, or “reservoir” [80, 81]. WSD selects the meaning of an ambiguous word from the set of possible meanings available for that word. When the word is used in a sentence, WSD automatically figures out its intended sense [82–84]. One of the critical issues in language translation is that a single word can have multiple meanings (senses), which can impair logistics communication; WSD can improve this communication. There are three approaches to word sense disambiguation:
1. Dictionary-based or Knowledge-based: As the name suggests, these methods rely mainly on dictionaries, thesauri, and knowledge bases for disambiguation. This approach is suitable for domain-specific word sense disambiguation [80]. The Lesk method (Lesk 1986) is an important dictionary-based method. WordNet is a lexical database published by George A. Miller in 1995. It is
the most widely used dictionary in this field for finding semantic relationships between words [84].
2. Supervised Approach: Supervised approaches use machine learning to find the exact meaning of a word from a manually created corpus of words tagged with their senses. A classifier is used to recognize the sense of a word based on its context of use. The accuracy of the classifier depends on the size and diversity of the data in the corpus: the larger the corpus, the better the results. However, supervised approaches are subject to a knowledge acquisition bottleneck, as they rely on large quantities of manually tagged training data that are laborious and costly to produce [82].
3. Unsupervised Approach: This approach works in the absence of resources and does not rely on any sense-tagged corpus, lexical dictionary, or other external source. It is basically a knowledge-free approach [84]. Unannotated corpora are the basis of the unsupervised approach, which is built on the clustering of words [80]. Unsupervised learning methods can overcome the knowledge acquisition bottleneck because they do not rely on manual effort [82].
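The dictionary-based approach can be illustrated with a simplified variant of the Lesk idea: choose the sense whose dictionary gloss shares the most words with the sentence. The two-sense inventory for “bank” below is a hypothetical stand-in written for this example and is not taken from WordNet; the stop-word list is likewise an assumption.

```python
import re

# Hypothetical mini sense inventory (not WordNet glosses).
SENSES = {
    "bank": {
        "financial institution": "an institution that accepts deposits and lends money",
        "riverside": "sloping land beside a river or other body of water",
    }
}
STOPWORDS = {"a", "an", "and", "at", "from", "he", "of", "or", "that", "the", "they"}


def content_words(text):
    """Return the set of lowercase words in the text, minus stop words."""
    return set(re.findall(r"[a-z]+", text.lower())) - STOPWORDS


def simplified_lesk(word, sentence):
    """Pick the sense whose gloss overlaps most with the sentence context."""
    context = content_words(sentence)
    best_sense, best_overlap = None, -1
    for sense, gloss in SENSES[word].items():
        overlap = len(context & content_words(gloss))
        if overlap > best_overlap:
            best_sense, best_overlap = sense, overlap
    return best_sense


print(simplified_lesk("bank", "He deposits money at the bank"))   # financial institution
print(simplified_lesk("bank", "They fished from the river bank")) # riverside
```

The first sentence shares “deposits” and “money” with the financial gloss, and the second shares “river” with the riverside gloss; when no words overlap, the sketch falls back to the first listed sense.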
5.15 Ontology
In AI and the web, an ontology is a formal naming and definition of the concepts in a domain and their interrelationships. An ontology allows us to detect inconsistencies and draw inferences from data [85]. An ontology, together with a set of individual instances of classes, constitutes a knowledge base. In reality, there is a fine line between where the ontology ends and the knowledge base begins. Ontologies play a vital role in information processing and are recognized as an essential component of information systems. An ontology is generally defined as an explicit specification of a conceptualization [86]. In contract management, an ontology defines an intermediate layer that enables the translation and integration of all the aspects involved. A semantic ontology helps in every phase of contract management, such as contract monitoring and execution. An ontology of concepts helps to explain the duties, responsibilities, and obligations of each contract partner [87]. Further semantic descriptions are required for a computer to understand and process contract information automatically. Ontology has become an accessible technology for semantic languages [85].
5.16 Sentiment Analysis
Sentiment analysis (SA) is the contextual mining of text to classify it as positive, neutral, or negative, allowing businesses to gain deeper knowledge of how customers see their product, brand, or service. Human emotions are complicated by nature. We use several forms of emotions in combination with tone to express a specific
feeling. Besides recognizing the opinion, SA determines the following characteristics of the expression:
• Polarity: whether the speaker expresses a positive or a negative sentiment,
• Subject: the element about which the speaker is expressing an opinion,
• Opinion holder: the entity that gives the opinion.
Supply chain performance is closely linked with the degree of trust, cooperation, and knowledge sharing among its participants [88]. Supply chains are socio-technical networks with major technological and social influences. The technical factors address the technical problems related to the system, while the social factors handle human behavior in the relationships between supply chain participants [89]. Sentiment analysis is of great interest in any supply chain because it helps develop and manage the buyer-seller relationship. It also acts as an asset for creating value in a supply chain.
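Polarity detection, the first of the characteristics listed above, can be sketched with a lexicon-based scorer. The two word lists are small hypothetical lexicons, and the rule of counting positive and negative words ignores negation and tone, which real systems must handle.

```python
import re

# Hypothetical sentiment lexicons for customer feedback in logistics.
POSITIVE = {"good", "great", "reliable", "fast", "excellent", "helpful"}
NEGATIVE = {"bad", "late", "damaged", "slow", "poor", "rude"}


def polarity(text):
    """Classify text as positive, negative, or neutral by counting lexicon hits."""
    tokens = re.findall(r"[a-z']+", text.lower())
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


print(polarity("Delivery was fast and the staff were helpful"))  # positive
print(polarity("The parcel arrived late and damaged"))           # negative
```

Identifying the subject and the opinion holder would require additional steps, such as named entity recognition, that this sketch does not attempt.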
5.17 Question Answering
Question answering (QA) is an application of NLP that creates a system which responds automatically to queries asked by humans in natural language. Such a system uses a customized repository or a set of documents to answer the queries [90]. The central pillar of logistics is the customer-focused framework. Logistics service providers are now better able to predict customer expectations and identify their business-related sensitivities with technology-driven processes and intelligent insights. To answer customers’ questions in real time, solution providers are becoming more personalized, intuitive, and interactive. QA has applications in a wide variety of functions, such as information retrieval and entity extraction [91]. A QA system involves the following elements:
1. Knowledgebase: The data source is structured in such a way that it allows answers to be retrieved using NLP.
2. Answer generation: Apply NLP techniques to extract answers from the retrieved piece of text.
3. Question processing: Apply NLP techniques to determine the topic and entities of the question and generate a query that can be run against the pre-stored structured knowledgebase.
4. Information retrieval: Retrieve and rank answer snippets based on the queries.
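The elements listed above can be wired together in a minimal retrieval-based sketch. The three knowledge-base snippets, the stop-word list, and the overlap-based ranking are assumptions for illustration; answer generation is reduced here to returning the best-ranked snippet.

```python
import re

# Knowledgebase: hypothetical pre-stored snippets.
KNOWLEDGE_BASE = [
    "Order PO-1001 was shipped on Monday from Mumbai.",
    "The warehouse in Pune holds 120 pallets.",
    "Customs clearance usually takes two days.",
]
STOPWORDS = {"the", "a", "is", "does", "how", "what", "when", "in", "of", "on",
             "many", "long", "was"}


def terms(text):
    """Lowercase tokens, allowing hyphens so ids such as 'po-1001' stay whole."""
    return set(re.findall(r"[a-z0-9-]+", text.lower()))


def process_question(question):
    """Question processing: reduce the question to its query terms."""
    return terms(question) - STOPWORDS


def retrieve(query_terms, documents):
    """Information retrieval: rank snippets by the number of shared terms."""
    scored = [(len(query_terms & terms(doc)), doc) for doc in documents]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for score, doc in scored if score > 0]


def answer_question(question, documents=KNOWLEDGE_BASE):
    """Answer generation: return the top-ranked snippet, if any."""
    ranked = retrieve(process_question(question), documents)
    return ranked[0] if ranked else "No answer found."


print(answer_question("How many pallets does the warehouse hold?"))
print(answer_question("How long does customs clearance take?"))
```

Exact term matching is the weakest part of this design: “hold” does not match “holds”, so a practical system would add stemming or embeddings before ranking.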
5.18 Topic Modelling
Data are being collected at a high rate every day. Because of this ever-growing data volume, it has become difficult to find the information we are looking for. We therefore need tools and techniques for organizing, searching, and understanding this data. Topic modeling is one such method in the area of natural language processing. It is an approach for identifying the conceptual topics in a set of documents. Topic modeling is a form of statistical modeling. It is an automatic process that identifies the topics present in a document collection and derives hidden patterns from that collection, and thus helps in better decision making. Topic modeling offers strategies for organizing, understanding, and summarizing large collections of textual data. It assists in:
• Identifying hidden patterns (topics) contained in collections of textual data
• Annotating documents according to topics
• Using these annotations to organize, search, and summarize the texts.
A topic is nothing but “a repeating pattern of co-occurring terms in a corpus.” For a “healthcare” topic, a good topic model yields words such as “health,” “doctor,” “patient,” and “hospital,” and for an “education” topic it yields “school,” “college,” “literacy,” and “learning.” Topic modeling is an ‘unsupervised’ machine learning technique; in other words, it does not require labeled training data. Topic modeling is also used for classifying documents and extracting information from unstructured data by selecting features (Table 2).
Table 2 Summary of major approaches for NLP techniques
Method | Description | Subsection reference
Rule-based approach | In this approach, a set of features denotes every token of content, and it is contrasted with the rules | [5.4]
Word embedding | Word embedding converts text into numbers. It captures the context of words in a text, contextual and lexical similarity, and the connection with other terms | [5.4]
Frequency-based (TF-IDF) | Term frequency (TF) measures how often the term ‘T’ appears in a document ‘D’. Document frequency is the appearance of the term ‘T’ across the whole document set. IDF is the inverse of the document frequency and measures how much information the word gives. The product of these two measures is the TF-IDF measure | [5.4]
Continuous bag of words | The CBOW model attempts to predict the target word based on the surrounding words | [5.4]
Skip-gram | Skip-gram is like CBOW, but instead of predicting the current word from its context, it predicts the surrounding words | [5.4]
POS tagging | The process of marking a term in a text as corresponding to a specific part of speech, based on its definition and its context | [5.5]
Document summarization | It mainly involves creating a heading or an abstract for a document | [5.13]
Text classification | Text classification is a probabilistic approach that uses the Naive Bayes classification rule for assigning tags and categories to text | [5.10]
Rapid automatic keyword extraction | The rapid automatic keyword extraction (RAKE) algorithm is used for extracting and classifying the keywords/phrases of a document. RAKE selects keywords based on each keyword’s score | [5.6]
Deep learning | Deep learning is an area of machine learning whose algorithms, known as artificial neural networks, are inspired by the structure and function of the brain | [5.7]
Reinforcement learning | The process of learning optimized behavior through trial-and-error experience in complex learning situations | [5.9]
Language modeling | Language models estimate the relative likelihood of different phrases in a text and give the probability of the words that follow | [5.12]
Word sense disambiguation (WSD) | WSD selects the meaning of an ambiguous word from the set of possible meanings available for that word | [5.14]
Ontology | An ontology is a specification of the meanings of the symbols in an information system | [5.15]
Topic modeling | An automatic process that identifies the topics present in a document collection and derives hidden patterns from that collection | [5.18]

6 Findings and Conclusions
We conclude that, with growing digitization, the present is the right time for any logistics business to adopt AI. Natural Language Processing (NLP) is a sub-domain of computer science and artificial intelligence that focuses on the interaction between machines and human language. It has extraordinary capabilities for information extraction and relation extraction. Many research papers present case studies of the positive adoption of AI, and even simplified NLP models can be used effectively to implement AI-driven logistics for various types of logistics activities. However, many logistics studies show that practitioners still neglect the adoption of AI, although technological innovations have taken hold in text data processing; there is no strong evidence of AI practices being adopted for many tasks in logistics companies. NLP can save time and resources in the logistics domain by automating various processes related to textual data. The logistics industry can use this technology as a backbone to gain new insights from data. Natural language is processed with various models and approaches, but deep learning and neural networks solve the majority of NLP problems. Tensor-based learning is also making consistent progress in numerical computation, and its use is expanding into several areas, including the processing of linguistic data. NLP helps logistics companies operate in new, more professional ways than ever. It enables logistics providers to enrich the customer experience through conversational engagement. The future of AI in logistics is full of potential. With growing digital transformation, NLP is becoming a fundamental part of everyday business and is speeding up the move towards a proactive, predictive, automated, and personalized future for logistics.

Acknowledgements The authors thank Akshay Ghodake for valuable discussions on Natural Language Processing and logistics in general. This research is supported by the ATA Freight Line Pvt. Ltd. Research Fellowship.

Conflicts of Interest Arvind W. Kiwelekar and Laxman D. Netak declare that they have no conflict of interest. Rachit Garg has received research grants from ATA Freight Line India Pvt. Ltd. Swapnil S. Bhate holds the position of Innovation Associate in the CATI department of ATA Freight Line India Pvt. Ltd.
Glossary
• AI (Artificial Intelligence): An area of study in computer science concerned with automating intelligent behavior in machines and software.
• ANN (Artificial Neural Network): An ANN consists of many interconnected nodes, like the human brain, and tries to mimic the behavior of its biological cells.
• CBOW (Continuous Bag-of-Words): This method attempts to predict the current target word based on the surrounding words.
• CPA (Cognitive Procurement Advisor): A way of using innovative technologies to help manage procurement. It uses self-learning technology to handle data and assist in the procurement or purchase of products and services.
• HMM (Hidden Markov Model): HMMs are a class of probabilistic graphical models that allow a sequence of unknown (hidden) variables to be predicted from a set of observed variables. A simple example is predicting the weather (hidden variable) from the type of clothing that someone wears (observed).
• NER (Named Entity Recognition): The purpose of NER is to recognize named entities in text and classify them into pre-defined categories, including names of persons, organizations, places, expressions of time, numbers, monetary values, and percentages.
• NLP (Natural Language Processing): A method of analyzing, understanding, and deriving meaning from human language in a smart and useful way.
• POS (Parts-of-Speech): POS tagging assigns each word to a group according to its syntactic function. The main parts of speech are noun, pronoun, adjective, determiner, verb, adverb, and preposition.
• RAKE (Rapid Automatic Keyword Extraction): RAKE is used for extracting and ranking keywords from a document.
• RNN (Recurrent Neural Network): A neural network in which the output from the previous step is passed as input to the current step.
• TF-IDF (Term Frequency-Inverse Document Frequency): A measure of the importance of a word. Term frequency measures how often the term ‘T’ appears in a document ‘D’. Document frequency is the appearance of the term ‘T’ across the whole document set. IDF is the inverse of the document frequency and measures how much information the word gives. The product of these two measures is the TF-IDF measure.
• VPA: A software program that understands natural language voice commands and completes tasks for the user; also known as an AI assistant.
References 1. Reynosa, R.: 7 Industries Using AI in 2020 (+14 Examples). https://learn.g2.com/industriesusing-ai/ (2019). Accessed Jan 2020 2. Bhandari, R.: Impact of technology on logistics and supply chain management. IOSR J. Bus. Manag. 13, 17 (2014) 3. Smith, C.: The History of Artificial Intelligence. Technical report (2006). Available at: https:// courses.cs.washington.edu/courses/csep590/06au/projects/history-ai.pdf 4. Gesing, D.M.B., Peterson, S.J.: Artificial Intelligence in Logistics. Technical report (2018). Available at: https://www.logistics.dhl/content/dam/dhl/global/core/documents/pdf/glo-coretrend-report-artificial-intelligence.pdf
5. Noronhan, J., Bubner, N., Bodenbenner, P.: Logistics Trend Radar. Technical report (2016). Available at: https://www.dpdhl.com/content/dam/dpdhl/en/trends-in-logistics/assets/ dhl-logistics-trend-radar-2016.pdf 6. Peters, C.: 5 Ways AI Will Transform the Logistics Industry. https://www.altexsoft.com/blog/ business/5-ways-ai-will-transform-the-logistics-industry/ (2018). Accessed July 2019 7. Netti, D.: Artificial Intelligence and Robotics in Logistics 2Scenarios in the FMCG market. Technical report. https://www.unwe.bg/uploads/Department/FormUploads/27cb5b_AI%20in %20Logistics.pdf 8. Adam Robinson, A.: The Top 5 Changes That Occur with AI in Logistics. https://cerasis.com/ ai-in-logistics/. Accessed Aug 2019 9. Taguchi, K., Kyozu, H., Otogawa, Y., Isobe. M.: AI Technology for Boosting Efficiency of Logistics and Optimizing Supply Chains. Technical Report 2. Available at: https://www.hitachi. com/rev/archive/2018/r2018_02/pdf/P095_100_R2b03.pdf 10. Collobert, R., Weston, J., Bottou, L., Karlen, M., Kavukcuoglu, K., Kuksa, P.: Natural language processing (almost) from scratch. J. Mach. Learn. Res. 12, 2493–2537 (2011) 11. Goasduff, L.: Top Trends on the Gartner Hype Cycle for Artificial Intelligence (2019). https://www.gartner.com/smarterwithgartner/top-trends-on-the-gartner-hype-cyclefor-artificial-intelligence-2019. Accessed Sept 2019 12. Shi, Y., Osewe, M., Li, Q., Lu, H., Liu, A.: Global challenges and research gaps for third-party logistics: literature review. Int. J. Logist. Econ. Glob. 8(1), 46–66 (2019) 13. Ewing, M.: The Future of Procurement in the Age of Digital Supply Networks. Technical report (2017). Available at: https://www2.deloitte.com/content/dam/Deloitte/us/Documents/ process-and-operations/us-cons-digital-procurement-v5.pdf 14. Goasduff, L.: Benefits of Natural Language Processing for the Supply Chain. https://www. blumeglobal.com/learning/natural-language-processing/. Accessed Sept 2019 15. 
Machine Learning, and Natural Language Processing in Contract Management. https:// procureability.com/machine-learning-and-natural-language-processing-in-contractmanagement/. Accessed Jan 2020 16. Spirina, K.: AI in Logistics: Data-Driven Shifts to Boost Your Business. https://indatalabs. com/blog/ai-in-logistics-and-transportation (2019). Accessed July 2019 17. AI in Procurement. https://sievo.com/resources/ai-in-procurement. Accessed Dec 2019 18. HernáNdez-PeñAloza, G., Belmonte-Hernández, A., Quintana, M., ÁLvarez, F.: A multi-sensor fusion scheme to increase life autonomy of elderly people with cognitive problems. IEEE Access 6, 12775–12789 (2018) 19. Cooper, A.: Machine Learning is Transforming Logistics. https://tdan.com/machine-learningis-transforming-logistics/23526 (2018). Accessed Jan 2019 20. Arya, K.: How AI technology is digitising supply chain processes. https://www.itproportal. com/features/how-ai-technology-is-digitising-supply-chain-processes/, 2019. Accessed June 2019 21. Van der Linde, N.: How AI Technology Is Digitising Supply Chain Processes. https://tutorials. one/artificial-intelligence/ (2016). Accessed Jan 2019 22. Manyika, J. et al.: Harnessing Automation for a Future That Works. Technical report (2017). Available at: https://www.mckinsey.com/featured-insights/digital-disruption/ harnessing-automation-for-a-future-that-works 23. Rao, A.S., Ghosh, S.: Artificial intelligence in India – hype or reality. https://www.pwc.in/ assets/pdfs/consulting/technology/data-and-analytics/artificial-intelligence-in-india-hypeor-reality/artificial-intelligence-in-india-hype-or-reality.pdf (2018). Accessed Jan 2019 24. Jusoh, S.: A study on NLP applications and ambiguity problems. J. Theor. Appl. Inf. Technol. 96, 1486–1499 (2018) 25. Tomuro, N., Lytinen, S.: Polysemy in lexical semantics-automatic discovery of polysemous senses and their regularities. In: NYU Symposium on Semantic Knowledge Discovery, Organization and Use (2008) 26. 
Mitkov, R., Evans, R., Orăsan, C., Dornescu, I., Rios, M.: Coreference resolution: to what extent does it help NLP applications? In: Sojka, P., Horák, A., Kopeček, I., Pala, K. (eds.) Text, Speech and Dialogue, pp. 16–27. Springer, Berlin, Heidelberg (2012)
27. Morton, T.S.: Coreference for NLP applications. In: Proceedings of the 38th Annual Meeting on Association for Computational Linguistics, ACL ’00, pp. 173–180. Association for Computational Linguistics, USA (2000) 28. Sukthanker, R., Poria, S., Cambria, E., Thirunavukarasu, R.: Anaphora and coreference resolution: a review. CoRR, abs/1805.11824 (2018) 29. Grishman, R.: Information extraction: capabilities and challenges. Notes prepared for the 2012 International Winter School in Language and Speech Technologies (2012) 30. Ji, H.: Challenges from information extraction to information fusion. In: Proceedings of the 23rd International Conference on Computational Linguistics: Posters, COLING ’10, pp. 507–515. Association for Computational Linguistics, USA (2010) 31. Priest, C.: The Curse of Dimensionality—Combinatorial Explosions. https://blog.datarobot. com/the-curse-of-dimensionality-combinatorial-explosions (2017). Accessed June 2018 32. Novak, V.: Fuzzy Logic in Natural Language Processing, pp. 1–6 (2017) 33. Sayed, I.Q.: Issues in Anaphora Resolution. Stanford (2003) 34. William, D., Presutti, Jr.: Supply management and e-procurement: creating value added in the supply chain. Ind. Mark. Manag. 32, 219–226 (2003) 35. Tai, Y.-M.: Competitive advantage impacts of direct procurement management capabilities and web-based direct procurement system. Int. J. Logist. Res. Appl. 16(3), 193–208 (2013) 36. Christiansen, P., Maltz, A.: Becoming an “interesting” customer: procurement strategies for buyers without leverage. Int. J. Logist. Res. Appl.: Lead. J. Supply Chain Manag. 5, 177–195 (2010) 37. Capellan, C.: Speaking the Language of Logistics. http://mylogisticsmagazine.com/logistics/ columnist/speaking-the-language-of-logistics/ 38. Joseph, K., O’Brien, T., Correa, H.: Tax strategies and organisational communication in MNC supply chains: case studies. Int. J. Logist. Res. Appl. 20(2), 105–128 (2017) 39. 
Patil, R.J.: Due date management to improve customer satisfaction and profitability. Int. J. Logist.: Res. Appl. 13(4), 273–289 (2010) 40. Sánchez-Rodríguez, C., Hemsworth, D., Martínez-Lorente, Á.R.: Quality management practices in purchasing and its effect on purchasing’s operational performance and internal customer satisfaction. Int. J. Logist. Res. Appl. 7(4), 325–344 (2004) 41. Ghoumrassi, A., Tigu, G.: The impact of the logistics management in customer satisfaction. In: Proceedings of the International Conference on Business Excellence, vol. 11, pp. 292–301. De Gruyter Open (2017) 42. Hao, M.: Using NLP-based Machine Learning to Automate Compliance and Risk Governance. https://nsfocusglobal.com/using-nlp-based-machine-learning-to-automate-compliance-andrisk-governance/ 43. Lame, G.: Using NLP Techniques to Identify Legal Ontology Components: Concepts and Relations 12, 169–184 (2003) 44. Piercy, N.: Strategic relationships between boundary-spanning functions: aligning customer relationship management with supplier relationship management. Ind. Mark. Manag. 38, 857– 864 (2009) 45. Park, J., Shin, K., Chang, T.-W., Park, J.: An integrative framework for supplier relationship management. Ind. Manag. Data Syst. 110, 495–515 (2010) 46. Artificial Intelligence and Its Impact on Procurement and Supply Chain. Technical report. Available at: https://www.gep.com/white-papers/artificial-intelligence-impact-on-procurementsupply-chain 47. Jacob, K.: 6 Ways AI Is Making Supply Chain More Seamless (Supply Chain aka Logistic Industry). https://www.manipalprolearn.com/blog/6-ways-ai-making-supply-chain-moreseamless-supply-chain-aka-logistic-industry 48. Brill, E.: A simple rule-based part of speech tagger. In: Proceedings of the Third Conference on Applied Natural Language Processing, pp. 152–155. Association for Computational Linguistics (1992) 49. Shaalan, K.: Rule-based approach in Arabic natural language processing. Int. J. Inf. Commun. Technol. (IJICT) 3(3), 11–19 (2010)
190
R. Garg et al.
50. Kang, N., Singh, B., Afzal, Z., van Mulligen, E.M., Kors, J.A.: Using rule-based natural language processing to improve disease normalization in biomedical text. J. Am. Med. Inform. Assoc. 20(5), 876–881 (2013) 51. Dwivedi, S.K., Singh, V.: Research and reviews in question answering system. Procedia Technol. 10, 417–424 (2013) 52. Jiao, Y.: A method of calculating comment text similarity based on tree structure. In: 2015 7th International Conference on Intelligent Human-Machine Systems and Cybernetics, vol. 1, pp. 220–223. IEEE (2015) 53. Ramos, J. et al.: Using TF-IDF to determine word relevance in document queries. In: Proceedings of the First Instructional Conference on Machine Learning, vol. 242, pp. 133–142. Piscataway, NJ (2003) 54. Robertson, S.: Understanding inverse document frequency: on theoretical arguments for IDF. J. Doc. (2004) 55. Salton, G., Buckley, C.: Term-weighting approaches in automatic text retrieval. Inf. Process. Manag. 24(5), 513–523 (1988) 56. Wu, H.C., Luk, R.W.P., Wong, K.F., Kwok, K.L.: Interpreting TF-IDF term weights as making relevance decisions. ACM Trans. Inf. Syst. (TOIS) 26(3), 1–37 (2008) 57. Mikolov, T., Chen, K., Corrado, G., Dean, J.: Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781 (2013) 58. Jurafsky, D., Martin, J.: Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition, vol. 2 (2008) 59. Schütze, H.: Distributional part-of-speech tagging. arXiv preprint cmp-lg/9503009 (1995) 60. Banko, M., Moore, R.C.: Part of speech tagging in context. In: Proceedings of the 20th International Conference on Computational Linguistics, p. 556. Association for Computational Linguistics (2004) 61. Rose, S., Engel, D., Cramer, N., Cowley, W.: Automatic keyword extraction from individual documents. Text Min.: Appl. Theory 1, 1–20 (2010) 62. 
Haque, M.: Automatic keyword extraction from Bengali text using improved rake approach. In: 2018 21st International Conference of Computer and Information Technology (ICCIT), pp. 1–6. IEEE (2018) 63. Thushara, M.G., Krishnapriya, M.S., Nair, S.S.: A model for auto-tagging of research papers based on keyphrase extraction methods. In: 2017 International Conference on Advances in Computing, Communications and Informatics (ICACCI), pp. 1695–1700. IEEE (2017) 64. Brownlee, J.: Deep Learning for Natural Language Processing: Develop Deep Learning Models for Your Natural Language Problems 65. Du, T., Shanker, V.: Deep learning for natural language processing. Eecis. Udel. Edu 1–7 (2009) 66. Collobert, R., Weston, J.: A unified architecture for natural language processing: deep neural networks with multitask learning. In: Proceedings of the 25th International Conference on Machine Learning, pp. 160–167 (2008) 67. Yang, H., Luo, L., Chueng, L.P., Ling, D., Chin, F.: Deep learning and its applications to natural language processing. In: Deep Learning: Fundamentals, Theory and Applications, pp. 89–109. Springer (2019) 68. Sharma, A.R., Kaushik, P.: Literature survey of statistical, deep and reinforcement learning in natural language processing. In: 2017 International Conference on Computing, Communication and Automation (ICCCA), pp. 350–354. IEEE (2017) 69. Young, T., Hazarika, D., Poria, S., Cambria, E.: Recent trends in deep learning based natural language processing. IEEE Comput. Intell. Mag. 13(3), 55–75 (2018) 70. Ancheta Wis. TensorFlow. https://en.wikipedia.org/wiki/TensorFlow#TensorFlow. Accessed Jan 2019 71. Jupyter noteback). https://colab.research.google.com/notebooks/welcome.ipynb. Accessed Jan 2019 72. Colab- Jupyter notebook). https://medium.com/tensorflow/colab-an-easy-way-to-learn-anduse-tensorflow-d74d1686e309. Accessed Jan 2019
Potential Use-Cases of Natural Language Processing …
191
73. Sui, Z., Gosavi, A., Lin, L.: A reinforcement learning approach for inventory replenishment in vendor-managed inventory systems with consignment inventory. Eng. Manag. J. 22(4), 44–53 (2010) 74. Pröllochs, N., Feuerriegel, S., Neumann, D.: Detecting negation scopes for financial news sentiment using reinforcement learning. In: 2016 49th Hawaii International Conference on System Sciences (HICSS), pp. 1164–1173. IEEE (2016) 75. Rabe, M., Dross, F.: A reinforcement learning approach for a decision support system for logistics networks. In: 2015 Winter Simulation Conference (WSC), pp. 2020–2032. IEEE (2015) 76. Qiu, M., Ding, H., Dong, J., Ren, C., Wang, W.: Impact of business service modes on distribution systems: a reinforcement learning approach. In: IEEE International Conference on Services Computing, pp. 294–299. IEEE (2007) 77. Bach, N., Badaskar, S.: A review of relation extraction. Lit. Rev. Lang. Stat. II(2), 1–15 (2007) 78. Kaushik, N., Chatterjee, N.: A practical approach for term and relationship extraction for automatic ontology creation from agricultural text. In: 2016 International Conference on Information Technology (ICIT), pp. 241–247. IEEE (2016) 79. Qaroush, A., Farha, I.A., Ghanem, W., Washaha, M., Maali, E.: An efficient single document Arabic text summarization using a combination of statistical and semantic features. J. King Saud Univ.-Comput. Inf. Sci. (2019) 80. Dubey, P.: Word Sense Disambiguation in Natural Language Processing 81. Edmonds, P., Agirre, E.: Word Sense Disambiguation—Algorithms and Applications (2008) 82. Kumar, S., Jat, S., Saxena, K., Talukdar, P.: Zero-shot word sense disambiguation using sense definition embeddings. In: Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pp. 5670–5681 (2019) 83. Orkphol, K., Yang, W.: Word sense disambiguation using cosine similarity collaborates with word2vec and wordnet. Future Internet 11(5), 114 (2019) 84. 
Pal, A.R., Saha, D.: Word sense disambiguation: a survey. CoRR, abs/1508.01346 (2015) 85. Yan, Y., Zhang, J., Yan, M.: Ontology modeling for contract: using owl to express semantic relations. In: 2006 10th IEEE International Enterprise Distributed Object Computing Conference (EDOC’06), pp. 409–412. IEEE (2006) 86. Estival, D., Nowak, C., Zschorn, A.: Towards ontology-based natural language processing. In: Proceedings of the Workshop on NLP and XML (NLPXML-2004): RDF/RDFS and OWL in Language Technology, pp. 59–66. Association for Computational Linguistics (2004) 87. Kabilan, V., Johannesson, P., Rugaimukamu, D.: An Ontological Approach to Unified Contract Management (2003) 88. Swain, A., Cao, Q.: Exploring the impact of social media on supply chain performance: a sentiment analysis. In: Memorias del 44th Annual Meeting, Decision Sciences Institute, Baltimore (2013) 89. Swain, A.K., Cao, R.Q.: Using sentiment analysis to improve supply chain intelligence. Inf. Syst. Front. 1–16 (2017) 90. Soares, M.A.C., Parreiras, F.S.: A literature review on question answering techniques, paradigms and systems. J. King Saud Univ.-Comput. Inf. Sci. (2018) 91. Stroh, E., Mathur, P.: Question Answering Using Deep Learning (2016)
Partial Consensus and Incremental Learning Based Intrusion Detection System Mohd Mohtashim Nawaz, Vineet Gupta, Jagrati Rawat, Kumar Prateek, and Soumyadev Maity
Abstract The rapid growth of the internet, coupled with easily available computer systems, has brought with it a serious threat of malicious, ill-intentioned and often criminal cyber activity. Intrusion is one such attempt to compromise the integrity and security of computer systems. As progress in technology has equipped systems with greater computational resources, researchers all over the world have been working to build efficient systems to thwart intrusion attempts. This paper provides insights on various techniques and technologies widely used in building intrusion detection systems. It proposes a solution for developing intrusion detection systems based on the concepts of consensus and incremental learning in order to mitigate the issues of batch learning and centralized systems. The proposed solution is trained and tested on the NSL-KDD data set and achieves high accuracy and detection rate.
Keywords Intrusion detection system · Incremental learning · Weighted random selection · Consensus · Dual buffer system
M. M. Nawaz · V. Gupta · J. Rawat · K. Prateek (B) · S. Maity
Indian Institute of Information Technology Allahabad, Prayagraj, Uttar Pradesh, India
e-mail: [email protected]
S. Maity e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2021. V. K. Gunjan and J. M. Zurada et al. (eds.), Modern Approaches in Machine Learning and Cognitive Science: A Walkthrough, Studies in Computational Intelligence 956, https://doi.org/10.1007/978-3-030-68291-0_14

1 Introduction
Intrusion is an attempt to compromise the security of a computer system. The intruder usually tries to violate security properties such as the integrity, confidentiality and availability of the system, in addition to user policies and system-defined security measures. Intrusion detection, as the name suggests, is the process of detecting an intrusion and attempting to mitigate it. Because networks are widespread and involve a large number of users, intrusion detection is challenging but critically important for achieving cyber security. The demand for intrusion detection systems (IDSs) has increased sharply, and IDSs that can scan large volumes of network data while remaining robust and accurate are in high demand in industry. Tools such as sniffers, firewalls and IDSs have become a necessity because of the increase in cyber-attacks.
Intrusion detection systems are classified as Host-based IDSs (HIDSs) and Network-based IDSs (NIDSs) based on where the components are placed. NIDSs monitor and analyze network traffic to detect intrusions and subsequently raise alarms. Distributed intrusion detection systems (DIDSs) are designed for networks where centralized systems do not work efficiently. A variety of methods exist to implement an IDS, but data mining and machine learning are widely used and increase the efficiency of detection systems significantly. Model-based systems provide simple yet effective solutions, but probabilistic methods are known for better accuracy. Some researchers have tried to combine model-based and probabilistic techniques to build efficient classifiers. Atay [1] uses probabilistic neural networks for classification. However, the use of incremental learning and consensus in intrusion detection systems is exceptionally rare, if it has been implemented at all.
Models trained using batch learning lose previously learned information on retraining. Incremental learning, on the other hand, is a form of machine learning in which the model learns from new data without losing what it has already learned. It is also useful where the training data becomes available incrementally, i.e. in chunks, and/or the memory is limited. The KDDCup'99 [2] data set is commonly used to evaluate IDSs. It considers four types of attacks, namely DoS, Probe, R2L and U2R attacks. NSL-KDD [3] is an improved, publicly available data set that resolves certain issues of the KDDCup'99 data set.
1.1 Organization of Paper
Section 2 discusses the related work. The paper proposes a solution based on the incremental learning algorithm Learn++ [4] with partial decentralization in order to detect intrusions; decisions are made on the basis of consensus among the nodes. The algorithm is discussed in detail in Sect. 3. The model of the proposed solution was trained and tested on the NSL-KDD data set, and its performance is discussed in Sect. 4. Section 5 concludes the paper with possible future work.
2 Related Work
Different architectures and algorithms for anomaly detection are a subject of rigorous research. Sen [5] proposed a cluster-based semi-centralized IDS for wireless networks. The model divides the network into clusters using the clustering algorithm of [6]. The method uses a mobile agent for processing, which raises the security concerns associated with mobile agents [7]. Ouiazzane et al. [8] proposed a model of a distributed network IDS that can detect intrusions in a large network using a multi-agent system and the Hadoop Distributed File System (HDFS). However, the use of big data techniques and file systems such as HDFS adds to the complexity. Toulouse et al. [9] proposed a fully distributed network IDS based on a consensus algorithm called the "average consensus algorithm", with a Naive Bayes classifier as the model for anomaly detection. However, the model suffers from the problems of consensus algorithms [10]. Subramanian et al. [11] proposed an intelligent IDS that uses support vector machines and neural networks. The model trains an individual neural network for each type of attack; a weighted average is assigned to each attack type and voting is used to detect the anomaly. Abdelaziz et al. [12] proposed an anomaly-based IDS for ad hoc networks that protects the routing operation in the network from wormhole attacks and rushing attacks. Their solution uses a cluster-based hierarchical architecture in which cluster heads are chosen on the basis of computational capability.
Losing et al. [13] reviewed incremental machine learning algorithms and studied the accuracy obtained by the most widely used ones. Polikar et al. [4] proposed Learn++, an incremental learning algorithm. It uses an ensemble of classifiers trained on data sets sampled from a database, with a weak classifier as the base learning algorithm, and combines the results of the individual classifiers by weighted majority voting. The interested reader can find more details in [4].
The related works are compared in Table 1, which lists the advantages and disadvantages of each.
Table 1 Comparison among different methods
| Proposed solution | Pros | Cons |
|---|---|---|
| Sen [5] | Processes intrusions at the local (cluster) level and the global (network) level separately, which reduces the communication overhead | When the network topology changes frequently and head nodes leave the network at a higher rate, frequent elections must be held, which increases repeated data transfer |
| Toulouse et al. [9] | Better accuracy because the decision is taken after consulting multiple nodes | Inefficient for a large number of NIDS modules, as 1000 iterations are executed for every decision-making node |
| Ouiazzane et al. [8] | High processing speed because of the Hadoop file system; scalable to large networks | Data must be replicated and stored at multiple sites to prevent data loss |
| Subramanian et al. [11] | Able to detect different types of attack efficiently, as a different specialized neural network is used for every type of attack | Computationally expensive and needs a large amount of data to train |
| Abdelaziz et al. [12] | Very high accuracy of detection without affecting network performance | Works on the assumption that the link between the cluster head and the cluster nodes is secure |

3 Proposed Solution
We propose a solution for an intrusion detection system using incremental learning and partial consensus.
3.1 Algorithm
Initially, a classifier model is trained and kept at each node. N is the total number of nodes in the network and i is the node at which the packet/request arrives. The algorithm works as follows:
Phase I: A packet/request arrives at node i, and i obtains a probability by applying its own classifier. If the resulting probability is greater than or equal to the confidence probability, i makes the decision and the results are displayed.
Phase II: If the resulting probability is less than the confidence probability, i relays the request to m nodes and waits for a certain time period for their results. Node i then makes the decision based on the weighted average of the probabilities received, including its own classifier's prediction, and displays the results.
Phase III: Weight readjustment and instance storing apply only if Phase II has been executed; otherwise only the selection of m nodes is needed. The weights of these m nodes are readjusted and broadcast to the network. Node i selects m nodes for the next request and the received instance is stored in the database.
Figure 1 shows the architecture as a flow diagram.
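Phases I and II can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: each classifier is assumed to output p, the probability that the request is an intrusion; the local confidence is taken to be max(p, 1 - p); and the names `decide`, `peer_probs`, `peer_weights`, `confidence` and `threshold` are hypothetical. A peer that does not answer within the waiting period is represented by `None` and is left out of the weighted average.

```python
from typing import Dict, Optional, Tuple


def decide(own_prob: float,
           peer_probs: Dict[str, Optional[float]],
           peer_weights: Dict[str, float],
           own_weight: float = 1.0,
           confidence: float = 0.9,
           threshold: float = 0.5) -> Tuple[bool, float, bool]:
    """Return (is_intrusion, agreed_probability, consensus_used)."""
    # Phase I: the local classifier is confident enough, so node i decides alone.
    # Assumption: confidence is measured as max(p, 1 - p).
    if max(own_prob, 1.0 - own_prob) >= confidence:
        return own_prob >= threshold, own_prob, False

    # Phase II: weighted average over the node's own prediction and every
    # peer prediction that arrived before the waiting period expired.
    total = own_weight * own_prob
    weight_sum = own_weight
    for node_id, prob in peer_probs.items():
        if prob is None:  # no answer in time: excluded from the decision
            continue
        total += peer_weights[node_id] * prob
        weight_sum += peer_weights[node_id]
    agreed = total / weight_sum
    return agreed >= threshold, agreed, True


# A confident local prediction never triggers consensus.
local_only = decide(0.97, {}, {})
# An uncertain prediction is combined with the peers that answered.
with_peers = decide(0.6,
                    {"a": 0.9, "b": None, "c": 0.2},
                    {"a": 2.0, "b": 1.0, "c": 1.0})
```

The agreed probability returned in the second case is the value that Phase III would compare against each peer's proposal when readjusting weights.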
3.2 Learning
Learn++ [4] is used with neural networks as the weak base classifier. Whenever a sufficient number of new instances has accumulated at a node, the model is retrained.
Fig. 1 Process architecture
3.3 Selection of Nodes
The m nodes, where m should be much less than N, are selected using weighted random selection, where the selection weights are based on the predictive weights and the distance of each node in the network. The nodes to be requested are selected beforehand to avoid the overhead of neighbor selection when a packet arrives. A dual buffer system is used to reduce overhead and improve efficiency.
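The selection step can be illustrated as below. The chapter does not say how predictive weight and distance are combined, nor how the dual buffer is organized, so both are assumptions here: the score `weight / (1 + distance)` and the `DualBuffer` class (one ready selection plus one pre-computed spare) are hypothetical choices made only for this sketch.

```python
import random
from typing import Callable, Dict, List


def select_nodes(weights: Dict[str, float],
                 distances: Dict[str, float],
                 m: int,
                 rng: random.Random) -> List[str]:
    """Pick m distinct nodes; higher weight and shorter distance are favoured."""
    # Assumed scoring rule (not from the chapter): weight / (1 + distance).
    # A tiny floor keeps zero-weight nodes selectable and avoids an all-zero total.
    scores = {n: max(weights[n], 1e-9) / (1.0 + distances[n]) for n in weights}
    chosen: List[str] = []
    for _ in range(min(m, len(scores))):
        ids = list(scores)
        pick = rng.choices(ids, weights=[scores[n] for n in ids], k=1)[0]
        chosen.append(pick)
        del scores[pick]  # sample without replacement
    return chosen


class DualBuffer:
    """Keeps one ready-to-use selection and one pre-computed spare."""

    def __init__(self, refill: Callable[[], List[str]]) -> None:
        self._refill = refill
        self._active = refill()
        self._standby = refill()

    def take(self) -> List[str]:
        # Hand out the active selection, promote the spare, then refill the spare.
        current = self._active
        self._active = self._standby
        self._standby = self._refill()
        return current


rng = random.Random(42)
weights = {f"node{k}": 1.0 for k in range(1, 10)}
distances = {f"node{k}": float(k) for k in range(1, 10)}
buffer = DualBuffer(lambda: select_nodes(weights, distances, m=4, rng=rng))
peers = buffer.take()  # selection is already available when a packet arrives
```

In a deployed node the refill would more plausibly run in the background after a request has been served, so that the selection cost never sits on the request path.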
3.4 Updating the Weights
The updating of weights is one of the crucial steps. Initially, all the nodes are given a constant weight, W_init. If a node recovers after a failure, it needs to sync the set of weights from its neighbors and take the average. If a node does not return any value, it is not included in weight updating and decision making. Otherwise, the weights are updated as follows, where ε is the error tolerance factor, α and β are the weight control parameters, v is the agreed value, c_i is the probability value proposed by node i, and BPF(W) is a band pass function that limits W to the range [0, W_max]:

\[
W_i \leftarrow
\begin{cases}
W_i + \alpha\, e^{-|v - c_i|/\varepsilon}, & \text{if } |v - c_i| \le \varepsilon,\\[4pt]
W_i - \beta\, \dfrac{|v - c_i| - e^{-|v - c_i|/\varepsilon}}{\varepsilon}, & \text{otherwise,}
\end{cases}
\qquad
W_i \leftarrow \mathrm{BPF}(W_i)
\]
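A direct transcription of the update rule might look as follows. This is a sketch under assumptions: the penalty branch is read as β · (|v − c_i| − e^(−|v − c_i|/ε)) / ε, the band pass function is implemented as a simple clamp to [0, W_max], and the defaults reuse the parameter values reported in the performance analysis (α = 1.0, β = 0.3, ε = 5) with W_max = W_init + ε = 6. The function name `update_weight` is hypothetical, and v and c_i are assumed to be expressed on a scale compatible with ε.

```python
import math


def update_weight(w_i: float, v: float, c_i: float,
                  alpha: float = 1.0, beta: float = 0.3,
                  eps: float = 5.0, w_max: float = 6.0) -> float:
    """Reward a node whose proposal c_i is close to the agreed value v, penalise it otherwise."""
    deviation = abs(v - c_i)
    decay = math.exp(-deviation / eps)
    if deviation <= eps:
        # Within tolerance: the reward shrinks as the deviation grows.
        w_i = w_i + alpha * decay
    else:
        # Outside tolerance: the penalty grows with the deviation.
        w_i = w_i - beta * (deviation - decay) / eps
    # Band pass function, assumed here to be a clamp to [0, w_max].
    return min(max(w_i, 0.0), w_max)


rewarded = update_weight(1.0, v=50.0, c_i=50.0)    # perfect agreement
penalised = update_weight(1.0, v=60.0, c_i=50.0)   # deviation of 10 exceeds eps
```

Clamping after every update keeps a single node from dominating the weighted average, and keeps a repeatedly wrong node at weight zero rather than at a negative value.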
3.5 Consensus
Once the weights are updated, they are broadcast to the network. Each weight update message contains an epoch number that identifies the current epoch. The message is signed by the sending node and includes only the weights that need to be updated, along with the IDs of the nodes to which they belong. When a node receives a weight change message, it checks the signature and relays the message further only if the signature is valid. If the message contains the node's own ID, the node checks whether it actually sent a prediction and whether the message epoch number is greater than its current epoch number; if so, it updates its weights. Otherwise, it rejects the message and does not propagate it further. Instead, it broadcasts an alert message to tell all nodes that something malicious has happened and that every node should revert to the previous set of weights. The alert message contains the ID of the weight change message and its epoch number. Other nodes update the weights if the signature is valid and the epoch number is greater than their current epoch number. Each node needs to keep a few previous sets of weights so that changes can be undone.
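The message handling described here can be sketched as a small state holder. Several simplifications are assumptions of this sketch rather than statements from the chapter: signature verification is reduced to a boolean supplied by the caller, a stale epoch triggers an alert at every node (the text only spells this out for a node named in the message), and the class and method names (`WeightStore`, `on_update`, `on_alert`) are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class WeightStore:
    node_id: str
    weights: Dict[str, float]
    epoch: int = 0
    history: List[Tuple[int, Dict[str, float]]] = field(default_factory=list)
    max_history: int = 3  # "a few" previous weight sets are retained

    def on_update(self, msg_epoch: int, new_weights: Dict[str, float],
                  signature_valid: bool, sent_prediction: bool = True) -> str:
        """Return 'relay', 'alert' or 'drop' for an incoming weight change message."""
        if not signature_valid:
            return "drop"  # invalid signature: neither applied nor relayed
        named = self.node_id in new_weights
        if msg_epoch <= self.epoch or (named and not sent_prediction):
            return "alert"  # reject and tell the network to revert
        # Remember the current state so the change can be undone later.
        self.history.append((self.epoch, dict(self.weights)))
        self.history = self.history[-self.max_history:]
        self.weights.update(new_weights)
        self.epoch = msg_epoch
        return "relay"

    def on_alert(self) -> bool:
        """Revert to the most recent stored weight set, if there is one."""
        if not self.history:
            return False
        self.epoch, self.weights = self.history.pop()
        return True


store = WeightStore("n1", {"n1": 1.0, "n2": 1.0})
first = store.on_update(1, {"n2": 1.5}, signature_valid=True)
replayed = store.on_update(1, {"n2": 9.0}, signature_valid=True)
```

Here `first` is accepted and relayed, while `replayed` carries an epoch that is not newer than the node's own and is therefore answered with an alert.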
3.6 Parameters
The proposed solution relies on certain parameters, and the choice of these parameters is very important. The values of the confidence probability, the wait time and m are important factors in determining the latency. W_init can be set to 1, while W_max can be set to any value that is not too large; for example, W_max can be set to W_init + ε. α and β should be chosen small, in the range of ε and ε/10 respectively. The values of these parameters determine the accuracy and latency and can only be found intuitively and experimentally, depending on factors such as the size and type of the network, the base classifier's accuracy, and the trade-off between latency and accuracy of prediction.
4 Performance Analysis
The results obtained on the NSL-KDD data set are summarized below. Table 2 lists the accuracy, detection rate and average latency obtained with the proposed model when tested in a dedicated wired local network of 10 nodes separated by less than a few miles. Figures 2 and 3 illustrate these observations. Values of parameters used: m = 4, α = 1.0, β = 0.3, ε = 5.
Table 2 Accuracy and latency
| S. No. | Attack type | Accuracy (%) | Detection rate (%) | Average latency (ms) |
|---|---|---|---|---|
| 1 | DoS | 98.11 | 98.44 | 18.61 |
| 2 | Probe | 97.48 | 98.19 | 19.84 |
| 3 | U2R and R2L | 95.91 | 97.76 | 23.39 |
Fig. 2 Accuracy and DR graph
Fig. 3 Latency graph
5 Conclusion
Concepts of distributed systems, when combined with data science techniques, result in robust and accurate systems that also address the inherent problems of centralized systems. Systems that can learn incrementally, handle large data, and offer high accuracy and robustness are welcomed by industry. Such systems can be particularly helpful in smaller networks. The proposed solution uses the concepts of consensus and incremental learning; it is efficient and also eliminates problems inherent to batch learning and centralized systems. The solution is simple and lightweight, yet effective and highly accurate. However, the latency may increase in large networks depending on network traffic, type and topology. Future work includes decreasing the prediction latency while keeping the accuracy constant, finding the optimal parameters for different network types and topologies, and improving the scalability of the solution.
References
1. Atay, İ.: Intrusion Detection with Probabilistic Neural Network (PNN)—Comparative Analysis (2018)
2. KDDCup'99 dataset. https://kdd.ics.uci.edu/databases/kddcup99/kddcup99.html
3. NSL-KDD dataset for network intrusion detection systems. https://www.unb.ca/cic/datasets/nsl.html, https://www.ncbi.nlm.nih.gov/
4. Polikar, R., Udpa, L., Udpa, S.S., Honavar, V.: Learn++: an incremental learning algorithm for supervised neural networks. IEEE Trans. Syst. Man Cybern. Part C (Appl. Rev.) 31(4), 497–508 (2001)
5. Sen, J.: An intrusion detection architecture for clustered wireless ad hoc networks. In: 2010 Second International Conference on Computational Intelligence, Communication Systems and Networks (CICSyN). IEEE (2010)
6. Lin, C.R., Gerla, M.: Adaptive clustering for mobile wireless networks. IEEE J. Sel. Areas Commun. 15(7), 1265–1275 (1997)
7. Farmer, W., Guttman, J., Swarup, V.: Security for mobile agents: authentication and state appraisal. In: Proceedings of the European Symposium on Research in Computer Security (ESORICS), LNCS, Sept 1996
8. Ouiazzane, S., Addou, M., Barramou, F.: A multi-agent model for network intrusion detection. In: 2019 1st International Conference on Smart Systems and Data Science (ICSSD), Rabat, Morocco, pp. 1–5 (2019)
9. Toulouse, M., Minh, B.Q., Curtis, P.: A consensus based network intrusion detection system. In: 2015 5th International Conference on IT Convergence and Security (ICITCS) (2015)
10. Lynch, N.A.: Distributed Algorithms. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA (1996)
11. Subramanian, K., Senthilkumar, S., Thiagrajan, B.: Intelligent Intrusion Detection System Using a Committee of Experts (2016)
12. Abdelaziz Amara Korba, Mehdi Nafaa, Yacine Ghamri-Doudane: 2016 7th International Conference on the Network of the Future (NOF)
13. Losing, V., Hammer, B., Wersing, H.: Choosing the Best Algorithm for an Incremental On-line Learning Task (2016)
AI Enabled Context Sensitive Information Retrieval System Binil Kuriachan, Gopikrishna Yadam, and Lakshmi Dinesh
Abstract Artificial Intelligence (AI) is gradually entering all areas of knowledge and many of the activities we carry out every day. Most repetitive, manual tasks that do not require great skill can be optimized fairly simply using AI. Machines are getting smarter every day through AI, delivering improved results quickly, improving the customer experience and learning on the fly. Search engines are a textbook application of Natural Language Processing (NLP), a field of AI dedicated to teaching computers to understand written human language in order to find the information users are looking for. A major limitation of most existing information retrieval systems and models is that information about the actual user and the search context is largely ignored, since the retrieval decision is made solely on the query and the document collection. In this paper we focus on building an in-house, AI-based context sensitive information retrieval system that combines Keyword Search, Latent Dirichlet Allocation (LDA) and Document to Vector (Doc2Vec) similarity models with Entity Extraction on the available in-house data. This helps bridge the gap in the computer's understanding of the user's search query and returns context sensitive results.
Keywords Context sensitive information retrieval system · Keyword search · LDA and Doc2Vec · DM · DBOW · Entity extraction
B. Kuriachan (B) · G. Yadam (B) · L. Dinesh
Boeing India Private Limited, Bangalore, India
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2021. V. K. Gunjan and J. M. Zurada et al. (eds.), Modern Approaches in Machine Learning and Cognitive Science: A Walkthrough, Studies in Computational Intelligence 956, https://doi.org/10.1007/978-3-030-68291-0_15

1 Introduction
In most existing information retrieval models, the retrieval problem is treated as involving a single query and a collection of documents. From one query alone, however, the retrieval system can have only a very limited idea of the user's information need. An optimal retrieval system should therefore exploit as much additional context information as possible to enhance retrieval accuracy, whenever it is available. Indeed, context-sensitive retrieval has been identified as a significant challenge in information retrieval research [1]. In general, the retrieval results for the user's initial query might not be satisfactory; often, the user will want to revise the query to improve the retrieval/ranking accuracy. For a complex or difficult information need, the user may have to modify the query and look at ranked documents over many iterations before the information need is fully satisfied. For this reason, context sensitive information retrieval has attracted much attention recently. We define a context sensitive information retrieval system broadly as one that exploits various AI-based unsupervised NLP techniques, such as LDA [2], word embedding [3] and semantic similarity with Doc2Vec [4] models, and integrates the results of these techniques to improve retrieval results. A major advantage of a context sensitive information retrieval system is that it can improve retrieval accuracy without requiring any user effort. For instance, if the current query is "bank", without any extra information it is impossible to know whether it refers to banks such as ICICI or SBI or to a river bank. As a result, the retrieved documents will likely include both kinds: some about banks such as ICICI and SBI, and some about river banks. However, any particular user is unlikely to be looking for both kinds of documents. Such ambiguity is resolved by inferring from the given search query which kind of bank the user is looking for.
This paper explores the development of a context sensitive information retrieval system and its building mechanism, to help users who want to build an intelligent AI-based search mechanism in their own domains to find the relevant information for given user queries. The main focus of this paper is to elaborate the concepts and techniques used in implementing the retrieval system. The dataset we used is in-house aerospace data that cannot be shared or discussed, so we illustrate the same concepts on open source Stackoverflow data. The rest of the paper is organized as follows. Section 1.1 describes the dataset used and its parameters. Section 2 describes the overall approach and presents the details of the Keyword Search implementation, the LDA [2] analysis, the Doc2Vec [4] similarity score calculation, the integration of the model results, the ranking of results based on the observed topic weights and the similarity scores between the given query and the information to retrieve, and entity extraction from the user query to fine-tune the results. Section 3 presents results and discussion, followed by conclusions and future work in Sect. 4.
1.1 Experiment Dataset
In this research we used Stackoverflow's open source dataset [5]. The dataset comprises queries posted by users about problems they faced with multiple technologies. Table 1 describes the parameters in the dataset. The entire dataset is available from Stackoverflow's Wiki Dump portal [5].
Table 1 Description of parameters
| Parameter | Description |
|---|---|
| Title | Title of the user query |
| Body | Description of the user problem |
| Tags | Name of the related technologies |
1.2 Related Work
In this research, we used various NLP concepts to build the context sensitive information retrieval system: data collection, data preprocessing, N-gram distribution analysis, vectorization, topic modelling analysis with LDA [2], document similarity calculation with the Doc2Vec model [4], and integration of LDA [2] with Doc2Vec [4] in a pipelined architecture. The pipeline establishes the semantic relationship between the query given by the user and the results produced for it, which helps data science developers establish, in real time, a contextual relationship between end-user interaction and the datasets in their own domains.
2 Technical Approach
In our study, we addressed four major tasks: (1) data collection and preprocessing; (2) filtering relevant documents based on keywords matched from the user query; (3) topic modelling analysis with the LDA [2] algorithm using vectorization techniques such as CountVectorizer [6]; and (4) document similarity calculation with the Doc2Vec [4] model, with the results fine-tuned by entity extraction. The steps involved in our pipeline are shown in Figs. 1 and 2.
Fig. 1 Context sensitive info retrieval system pipeline - offline
Fig. 2 Context sensitive info retrieval system pipeline - online
A. Defining the problem: the context sensitive information retrieval system using NLP techniques.
B. Data Collection: from Stackoverflow's open source data.
C. Data Preprocessing
Data preprocessing is an essential step in building a machine learning model, and the quality of the results depends on how well the data has been preprocessed. Various text preprocessing techniques, such as removal of special characters, punctuation, white space and stopwords [7], and stemming [8] of the words in each sentence, were implemented using the NLTK [9] library (Fig. 3).
Fig. 3 Sample output on source text to cleaned text
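The cleaning steps of the preprocessing stage can be approximated with the standard library alone. The sketch below is a simplified stand-in, not the pipeline used in the chapter: the stopword list is a small hand-written sample rather than the NLTK list, stemming is omitted, and the function name `clean_text` is hypothetical.

```python
import re

# Tiny illustrative stopword list; the chapter relies on NLTK's stopwords instead.
STOPWORDS = {"a", "an", "the", "in", "with", "me", "is", "of", "to", "and", "for", "on"}


def clean_text(text: str) -> str:
    """Lower-case, strip special characters and punctuation, drop stopwords."""
    text = text.lower()
    # Replace everything that is not a letter, digit or whitespace with a space.
    text = re.sub(r"[^a-z0-9\s]", " ", text)
    # split() with no arguments also collapses repeated white space.
    tokens = [tok for tok in text.split() if tok not in STOPWORDS]
    return " ".join(tokens)


cleaned = clean_text("Show me files with images in Python!")
```

For the sample query, `cleaned` is "show files images python"; with stemming added, "files" and "images" would additionally be reduced to their stems.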
D. Vectorization
In this step we used the CountVectorizer [6] technique from sklearn [10], analyzed the frequency distribution of unigrams, bigrams and trigrams in the vocabulary of the input dataset, and then transformed the sentences into vectors.
E. Keyword Search Results
In this step we find the relevant documents whose keywords match the user's query. This can be done with third-party tools such as MarkLogic [11], Lucene [12] and Elastic Search [13]. Since our focus is on building an in-house context sensitive information retrieval system, we created our own logic to retrieve documents with keywords matching the user's query, as explained below. Once the corpus is vectorized with CountVectorizer [6], a sparse matrix of dimensions N × M is generated, where N is the number of rows (documents) in our dataset and M is the size of the vocabulary of the corpus, as shown in Table 2.
Now, when the user enters a query such as “Show me files with images in python”, it is processed with the same preprocessing techniques and transformed into a matrix of dimension 1 * M, which is transposed to M * 1. We then multiply the sparse matrix of the dataset (N * M), shown in Table 2, by the transposed matrix of the user query (M * 1).
Table 2 Multiplication of sparse matrix and user query matrix
          problem   file   image   …   Mth word
Doc 1     1         0      0       …   0
Doc 2     0         2      1       …   0
Doc 3     0         0      2       …   0
…         …         …      …       …   …
Doc N     1         1      1       …   1
Multiply (*)
          problem   file   image   …   Mth word
Query     0         1      1       …   1
= (N*M) * (M*1) = N*1
Table 3 Resultant matrix after multiplication
Document number   Matched keyword count
2                 3
3                 2
…                 …
N                 1
Multiplying these two matrices yields a resultant matrix of dimension N * 1 that holds, for every document in the dataset, the number of occurrences of the words from the user query. We keep only the documents with a matched keyword count of at least 1. A higher keyword count means that more of the query's words were observed in the document, as shown in Table 3. For the user query “Show me files with images in python”, documents 2, 3 and N contain the matched keywords image and file, with document 2 having the highest matched keyword count, followed by documents 3 and N.
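The multiplication in step E can be reproduced on the named columns of Table 2. The sketch below is illustrative only: it keeps just the three vocabulary words that Table 2 spells out (problem, file, image) and the three fully listed documents, and the function name is hypothetical.

```python
# Columns follow Table 2; only the three named vocabulary words are kept.
vocab = ["problem", "file", "image"]
doc_term = {
    "Doc 1": [1, 0, 0],
    "Doc 2": [0, 2, 1],
    "Doc 3": [0, 0, 2],
}
query_vec = [0, 1, 1]  # the cleaned query contains "file" and "image"


def matched_keyword_counts(doc_term, query_vec):
    """Multiply each document row by the query column vector (N x M times M x 1)."""
    return {
        doc: sum(count * q for count, q in zip(row, query_vec))
        for doc, row in doc_term.items()
    }


scores = matched_keyword_counts(doc_term, query_vec)
# Keep documents with at least one matched keyword, highest count first.
relevant = sorted(
    ((doc, s) for doc, s in scores.items() if s >= 1),
    key=lambda item: item[1],
    reverse=True,
)
print(relevant)
```

For these rows the counts are 3 for Doc 2 and 2 for Doc 3, matching Table 3, while Doc 1 is filtered out because it shares no keyword with the query.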
F. Topic Modelling with LDA
In this step we used topic modelling with the LDA [2] algorithm to capture the semantic relationship between the user query and the documents to be returned as search results. Topic modelling is the task of identifying the topics that best describe a set of documents. These topics only emerge during the topic modelling process and are therefore called latent. The central idea behind LDA [2] is that each document can be described by a distribution over topics and each topic by a distribution over words; the goal of LDA [2] is to map all the documents to topics such that the words in each document are mostly captured by those latent topics.
Using Latent Dirichlet Allocation from gensim [3], we distributed the dataset into topics. The corpus and the dictionary of words, built from the documents with a relevant keyword count greater than 0, are the input to the model.
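The idea that each document is a distribution over topics and each topic a distribution over words is usually written as the following generative process. The symbols (alpha, beta, theta, phi, z, w) are standard LDA notation and are not taken from the chapter; this is the textbook formulation, given here only as a reading aid.

```latex
\begin{aligned}
\theta_d &\sim \mathrm{Dirichlet}(\alpha) && \text{topic distribution of document } d \\
\varphi_k &\sim \mathrm{Dirichlet}(\beta) && \text{word distribution of topic } k \\
z_{d,n} &\sim \mathrm{Categorical}(\theta_d) && \text{topic assigned to the } n\text{-th word of } d \\
w_{d,n} &\sim \mathrm{Categorical}(\varphi_{z_{d,n}}) && \text{the observed word}
\end{aligned}
```

Fitting the model amounts to inferring the per-document weights \(\theta_d\), which correspond to the topic weights discussed in the rest of this step.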
Fig. 4 Sample Coherence score plot with coherence values of each topic
The optimal number of topics is derived using the CoherenceModel from gensim: we define a range of candidate values and validate the model by calculating the coherence score for each number of topics iteratively, as shown in the coherence plot (Fig. 4). The optimal number of topics depends entirely on the input data; metrics such as coherence, perplexity and log-likelihood help in identifying it. Once the optimal number of topics is chosen, the topic weights can be generated by running the LDA [2] algorithm. Table 4 shows a snapshot of the topic weights generated for each document with the optimal number of topics set to 10. We then set a minimum weightage, which determines the maximum number of topics considered for a given document. From Table 4 we observe that, to consider a minimum of two topics per document, the minimum weightage on our dataset is 0.18. Finally, we replace the null values in the respective topics with this minimum weightage, which completes the topic weightage generation for each document in our dataset.
Table 4 Topic weights for each document (columns: Doc No., topics 1 to 10; rows: documents 1, 2, 3, 4, …). Cell values as extracted: 0.11, 0.47, 0.92, 0.06, 0.18, 0.11, 0.35, 0.24, 0.72, 0.09, 0.47, 0.16, 0.08.
F.1. LDA Topic Weightage Prediction
The next step in topic modelling with LDA is topic weightage prediction, which is done as follows.
Table 5 Predicted topic weights for the user query (columns: topics 1 to 10; single row: Query). Cell values as extracted: 0.31, 0.18, 0.23, 0.22, 0.05.
Once we capture the user query describing the documents the user wants to fetch from our dataset, we apply the same data preprocessing procedure, transform the query into a vector using the .transform operation of CountVectorizer, create the corpus and then predict the topic weightages using model[usercorpus]; the output is shown in Table 5. From Table 5 we observe that the user query mainly covers four topics: 1, 3, 5 and 9. In our initial analysis of topic weightage we chose a minimum topic weightage of 0.18; using the same value, we keep the topics whose weight is greater than or equal to it, which in our case means all four observed topics. We then select the corresponding topic numbers (1, 3, 5 and 9) from the topic weightage distribution of our keyword match results, multiply those topic weightages by the matched keyword count from step E, and sum the weighted values for each document. In step E we observed that documents 2, 3 and N contain relevant keywords, so we fetch the weights of topics 1, 3, 5 and 9 for those documents and multiply them by the corresponding matched keyword count, as shown in Table 6. The LDA score column shows that, among the relevant documents 2, 3 and N, document 2 is the most relevant for the user query, followed by document 3 and document N.
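The LDA score of Table 6 can be recomputed with a short sketch. It assumes that the 0.18 entries in Table 6 are null weights that were replaced by the minimum weightage, which is how the text describes the fill-in step; the dictionary layout and function name are hypothetical.

```python
MIN_WEIGHT = 0.18  # minimum topic weightage chosen in the text

# Weights of the query's topics per document, following Table 6.
# None marks a weight assumed to be null before the fill-in step.
doc_topics = {
    "Doc 2": {1: None, 3: 0.47, 5: 0.24, 9: None},
    "Doc 3": {1: None, 3: 0.92, 5: 0.08, 9: None},
    "Doc N": {1: None, 3: 0.5, 5: 0.2, 9: None},
}
keyword_counts = {"Doc 2": 3, "Doc 3": 2, "Doc N": 1}  # from step E
query_topics = [1, 3, 5, 9]  # topics whose query weight is >= MIN_WEIGHT


def lda_score(weights, keyword_count, topics):
    """Sum of (topic weight * matched keyword count) over the query's topics."""
    total = 0.0
    for topic in topics:
        weight = weights.get(topic)
        if weight is None:          # replace null weights with the minimum
            weight = MIN_WEIGHT
        total += weight * keyword_count
    return round(total, 2)


lda_scores = {
    doc: lda_score(weights, keyword_counts[doc], query_topics)
    for doc, weights in doc_topics.items()
}
print(lda_scores)
```

The rounded results agree with the LDA score column of Table 6 (3.21, 2.72 and 1.06).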
Table 6 LDA score calculation
          Topic 1    Topic 3    Topic 5    Topic 9    LDA score
Doc 2     0.18 * 3   0.47 * 3   0.24 * 3   0.18 * 3   3.21
Doc 3     0.18 * 2   0.92 * 2   0.08 * 2   0.18 * 2   2.72
…         …          …          …          …          …
Doc N     0.18 * 1   0.5 * 1    0.2 * 1    0.18 * 1   1.06
G. Doc2Vec Similarity Calculation
The next step in capturing the semantic relationship between the user query and the documents to be returned is the Doc2Vec [4] similarity calculation. Its objective is to create a numerical representation of a sentence, paragraph or document regardless of its length. Unlike word2vec, which computes a feature vector for every word in the corpus, Doc2Vec [4] computes a feature vector for every document in the corpus. The vectors generated by Doc2Vec [4] can be used for tasks such as finding the similarity between sentences, paragraphs or documents. Paragraph vectors offer two types of document embedding model: (1) Distributed Memory (PV-DM) and (2) Distributed Bag of Words (DBOW).
1. Distributed Memory (PV-DM)
The basic idea behind PV-DM is inspired by Word2Vec [3]. In the CBOW model of Word2Vec [3], the model learns to predict a center word based on its context. For instance, given the sentence “The dog sat on sofa”, the CBOW model would learn to predict the word “sat” given the context words the, dog, on and sofa. Similarly, in PV-DM the central idea is to randomly sample consecutive words from a paragraph and predict a center word from the sampled set, taking as input the context words and a paragraph id.
2. Distributed Bag of Words (DBOW)
This model is slightly different from the PV-DM model. The DBOW model “ignores the context words in the input, but force the model to predict words randomly sampled from the paragraph in the output”.
In our research we used the Doc2Vec [4] similarity model from gensim [3] with the DBOW approach, trained the model on tagged documents over a suitable range of epochs, and calculated the similarity score for the documents with a relevant keyword count greater than 0. With a similarity score for each document obtained from the keyword match, we multiplied the score by the corresponding keyword match count, as shown for our user query in Table 7.
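The weighting of the similarity score can be sketched without training a model. The example assumes that document and query vectors have already been inferred by Doc2Vec and that similarity is measured with cosine similarity (the usual choice, although the chapter does not name the measure); the function names and the toy vectors are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # convention chosen here for all-zero vectors
    return dot / (norm_u * norm_v)


def doc2vec_score(similarity, keyword_count):
    """Similarity score weighted by the matched keyword count from step E."""
    return round(similarity * keyword_count, 2)


# Toy two-dimensional vectors; real Doc2Vec vectors have many more dimensions.
print(cosine_similarity([1.0, 0.0], [1.0, 1.0]))

# Similarity values and keyword counts as listed in Table 7.
print(doc2vec_score(0.7, 3), doc2vec_score(0.3, 2), doc2vec_score(0.2, 1))
```

The second print reproduces the Doc2Vec score column of Table 7.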
Table 7 Doc2Vec similarity score calculation
          Similarity score   Doc2Vec score
Doc 2     0.7 * 3            2.1
Doc 3     0.3 * 2            0.6
…         …                  …
Doc N     0.2 * 1            0.2
H. Entity Extraction
The final step of the implementation is entity extraction [15], the process of identifying key elements in the text. It helps in understanding the conditions or specifications in the user query and brings the results more in line with it. In the query “Show me files with images in python”, python is the entity, which means the user is looking for image files specifically in Python. Our source data has a column called Tags that records the technology each document relates to. After finding the relevant documents with the techniques discussed above, we filtered the results on the Tags column with the key “python”, which made our output more context specific for the user query. Techniques such as Named Entity Recognition models [14] and domain-specific pattern identification are used to extract entities such as time dimensions or word-phrase entities from the text.
Table 8 Integrated score for each document
          LDA score   Doc2Vec score   Combined score
Doc 2     3.21        2.1             5.31
Doc 3     2.72        0.6             3.32
…         …           …               …
Doc N     1.06        0.2             1.26
3 Results
With the scores obtained from the LDA [4] analysis and the Doc2Vec similarity [4] analysis, we calculated the sum of the two scores for each relevant document returned by the keyword search logic; this sum is the contextual similarity score of the document with respect to the user query. We sorted the documents in decreasing order of score, so the order of the documents reflects their relevance to the user query (Table 8). For the user query “Show me files with images in python”, document 2 is the most relevant contextual result, followed by document 3 and document N. These results are then validated against the “python” tag: unmatched results are excluded and matched results are displayed.
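The final ranking and tag validation can be sketched as follows. The LDA and Doc2Vec scores are those of Table 8; the tag sets attached to each document are hypothetical, as is the function name.

```python
# Scores from Table 8; the tag sets are made up for illustration.
documents = [
    {"id": "Doc 2", "lda": 3.21, "doc2vec": 2.1, "tags": {"python", "image"}},
    {"id": "Doc 3", "lda": 2.72, "doc2vec": 0.6, "tags": {"java"}},
    {"id": "Doc N", "lda": 1.06, "doc2vec": 0.2, "tags": {"python"}},
]


def rank_documents(documents, required_tag=None):
    """Sort by combined score (descending) and optionally keep one tag only."""
    ranked = sorted(
        documents,
        key=lambda d: d["lda"] + d["doc2vec"],
        reverse=True,
    )
    if required_tag is not None:
        ranked = [d for d in ranked if required_tag in d["tags"]]
    return [(d["id"], round(d["lda"] + d["doc2vec"], 2)) for d in ranked]


print(rank_documents(documents))
print(rank_documents(documents, required_tag="python"))
```

Without the tag filter the order is Doc 2, Doc 3, Doc N, as in Table 8; with the hypothetical tags above, the “python” filter removes Doc 3 from the displayed results.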
4 Conclusion and Future Work
Natural Language Processing has emerged as a powerful area for solving complex problems without fatigue and in a consistent, unbiased way. Considering the staggering amount of unstructured data generated every day, from medical records to social media and across domains, automation will be critical to analyze text and speech data fully and efficiently. Human language is astoundingly complex and diverse, so it is important to translate human-readable language into a machine-understandable form, generate insights from the data, and bridge the gap between humans and machines by building reusable assets and scalable ecosystems, for example for the documentation process and its accuracy.
Table 9 Table of abbreviations
AI         Artificial intelligence
NLP        Natural language processing
LDA        Latent Dirichlet allocation
Doc2Vec    Document to vector
CBOW       Continuous bag of words
DBOW       Distributed bag of words
DM         Distributed memory
In our study, we used keyword search, topic modelling analysis, document similarity techniques and entity extraction to implement the Context Sensitive Information Retrieval system, using statistical algorithms such as Latent Dirichlet Allocation [4] and Doc2Vec [4] modelling together with techniques such as probability distributions, Distributed Bag of Words and Distributed Memory. Data science practitioners can use the outcomes of this work to build a context sensitive information retrieval system on their in-house data and to build in-house applications across domains. In future we plan to build user and item profiles from search and click data, add personalization, and improve the relevance of search results based on user preferences and interests (business unit, programs, etc.). A context sensitive information retrieval system can thus be built, enabling further progress in NLP within Artificial Intelligence.
Appendix See Table 9.
References
1. Shen, X., Tan, B., Zhai, C.: Context-Sensitive Information Retrieval Using Implicit Feedback
2. Kuriachan, B., Pervin, N.: ALDA: An Aggregated LDA for Polarity Enhanced Aspect Identification Technique in Mobile App Domain
3. Loper, E., Bird, S.: NLTK: The Natural Language Toolkit
4. Jelodar, H., Wang, Y., Yuan, C., Feng, X., Jiang, X., Li, Y., Zhao, L.: Latent Dirichlet Allocation (LDA) and Topic Modeling: Models, Applications, A Survey
5. https://stackoverflow.blog/tags/cc-wiki-dump/
6. Dai, A.M., Olah, C., Le, Q.V.: Document Embedding with Paragraph Vectors
7. Mu, C., Zhao, J., Yang, G., Zhang, J., Yan, Z.: Towards Practical Visual Search Engine Within Elasticsearch
8. https://gist.github.com/sebleier/554280
9. Shahmirzadi, O., Lugowski, A., Younge, K.: Text Similarity in Vector Space Models: A Comparative Study
10. Rehurek, R.: Gensim a Python framework for Topic Modelling, Word2Vec and Doc2Vec
11. Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M., Müller, A., Nothman, J., Louppe, G., Prettenhofer, P., Weiss, R., Dubourg, V., Vanderplas, J., Passos, A., Cournapeau, D., Brucher, M., Perrot, M., Duchesnay, E.: Scikit-Learn: Machine Learning in Python
12. Pande, B.P., Tamta, P., Dhami, H.S.: Generation, Implementation and Appraisal of an N-gram based Stemming Algorithm
13. Asaad, C., Baïna, K., Ghogho, M.: NOSQL Databases: Yearning for Disambiguation
14. Teofili, T., Lin, J.: Lucene for Approximate Nearest-Neighbors Search on Arbitrary Dense Vectors
15. Giorgi, J., Wang, X., Sahar, N., Shin, W.Y., Bader, G.D., Wang, B.: End-to-End Named Entity Recognition and Extraction using Pre-trained Language Models
Personalization of News for a Logistics Organisation by Finding Relevancy Using NLP
Rachit Garg, Arvind W Kiwelekar, Laxman D Netak, and Swapnil S Bhate
Abstract Artificial Intelligence-driven applications have already stepped in to streamline logistics on a global scale. News impact and relevancy help in taking the right decision at the right time in the logistics industry. This paper attempts to present the state of the art in finding relevancy in news headlines. We present research done on logistics data using natural language processing, and explain the algorithms and embedding strategies we tried for finding news relevancy. We used statistical and deep learning models to extract information from the corpora. The proposed methods are compared on relevancy score, and the results are acceptable.
Keywords Logistics · Artificial intelligence · Natural language processing (NLP) · Deep learning · News relevancy · Word embedding
1 Introduction
A hundred years ago it would have been unimaginable how much news content a person regularly accesses today. More and more users find and read the news that interests them through search engines, and almost every engine provides news search features. With the vast amount of online news available today, there is a growing need for efficient technology to meet users' information needs. With the rapid development of the Internet, searching and browsing online news has become one of the most important activities for users. At the same time, as more and more news becomes available online, it is increasingly difficult for users to find interesting and relevant articles [1]. To meet a given user's news needs, it is therefore necessary to study how to retrieve the most relevant news. Relevance is the paramount driver of news consumption. People find most important the stories that touch their personal lives: those that concern family members, the place where they work, their leisure activities and their local communities. Delivering news to a particular user based on their interests improves the business value of the information [2]. In the logistics industry one must expect the unexpected, as a number of circumstances may affect the expected delivery date of a product. News headlines related to such events can be checked and their impact on users assessed, so that the right decision can be taken at the right time. The logistics business involves various stakeholders, so it is necessary to keep up to date with maritime news so that business activities and operations can be adjusted and improved accordingly. This study uses an innovative methodology, described in the later sections, to find the relevancy of maritime news and to understand how news can be recommended to users in a crowded media environment. The NLP-based method used in this study offers advantages over conventional approaches to measuring news preferences. The paper is organized as follows: Sect. 2 discusses the motivation for using NLP; Sect. 3 discusses the problem statement and literature review; Sect. 4 gives an overview of the proposed system; Sect. 5 describes the materials and methods used; Sect. 6 presents the experimental results; and Sect. 7 concludes.
R. Garg (B) · A. W. Kiwelekar · L. D. Netak
Dr. Babasaheb Ambedkar Technological University, Lonere, MS, India
e-mail: [email protected]
A. W. Kiwelekar
e-mail: [email protected]
L. D. Netak
e-mail: [email protected]
S. S. Bhate
ATA Freight Line India Pvt. Ltd., Pune, MS, India
e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2021. V. K. Gunjan and J. M. Zurada et al. (eds.), Modern Approaches in Machine Learning and Cognitive Science: A Walkthrough, Studies in Computational Intelligence 956, https://doi.org/10.1007/978-3-030-68291-0_16
2 Motivation Behind NLP for News Relevancy
The motivation for using NLP is that humans make decisions using most of the available information, and it usually takes them several minutes to discover new information and make a decision. We could potentially exploit this difference to create a business strategy. NLP techniques can be used to extract different kinds of information from headlines, and can help a platform predict user expectations accurately in order to increase user satisfaction. Context analysis is another NLP technique that has provided promising results. It is a recent deep learning approach that relies on a large corpus of text to learn and predict the words around a target word [3]; from this, the context in which the word usually appears can be deduced. The result is a vector representing each word. Natural Language Processing (NLP) is a field of computer science aimed at enabling computers to perceive and generate human language directly, without first transforming it into computer algorithms. NLP is opening the door for the technology media industry, which deals with human language every hour and every second; the ability of computers to work with human language can completely change media processes all over the world. Computer intelligence will automate searching for necessary information, parsing relevant news, and analyzing and organizing the news according to relevancy.
3 Literature Review
Intellectualization has become a new trend for any industry, driven by intelligent and smart technologies including cloud computing, big data and the Internet of Things. For an intelligent logistics platform, Yang et al. developed an intelligent system that includes main applications such as e-commerce, self-service transceiver, big data analysis, location routing and distribution optimization [4]. This paper creates an intelligent system that covers the core application of Natural Language Processing (NLP) to meet the need for smart logistics services. NLP consists of various operations over natural language, such as understanding it, retrieving information from it, and embedding words into vector representations. Bouraoui et al. developed a model that provides a new viewpoint for determining the semantic similarity of words. Word embedding is an essential method for modelling semantic similarities among terms [5]. Their model is based on the word2vec model developed by Google, one of the most commonly used word embedding algorithms. Word2vec can use a skip-gram model, whose training goal is to find word representations that are useful for predicting the neighbouring words in a phrase or document, so that word embeddings can be defined. Big data refers to huge data sets used in many fields. Google's Word2vec is a neural network for text data; it can also be used to train on the data and fit it into a different number of dimensions, reducing the dimensionality of the original data. This is achieved by generating word vectors with a skip-gram model and then reducing the features by k-means clustering; for large-scale data, word2vec thus provides a way of clustering similar data [6]. Word2vec is a two-layer neural network that takes a text corpus as input and outputs a vector space model of that corpus.
The purpose of word2vec is to group the vectors of similar words together in the vector space. Similarity is measured by the cosine similarity between word vectors: a cosine similarity of 0 means no similarity (a 90° angle), whereas a value of 1 means the vectors point in the same direction (a 0° angle). Term Frequency-Inverse Document Frequency (TF-IDF) is an algorithm also used by Google as a ranking factor for content. TF-IDF is another, frequency-based method for word embedding that takes into account the occurrence of a word not just in a single document but across the entire corpus. It is an information retrieval technique that computes the frequency of a word (TF) and its inverse document frequency (IDF) and multiplies the two values to obtain the score. According to Turney and Pantel, TF-IDF is a special form of vector space model (VSM), and they give a good analysis of VSMs [7]. It is always recommended that text be transformed into a numeric or vector representation before any machine learning approach is applied to it; TF-IDF and word2vec are among the algorithms for doing so. Many typical text classification methods use the term frequency (TF) and the inverse document frequency (IDF) to describe the value of terms and to weight words in the classification of text documents [8]. Many works in the literature improve the TF-IDF model: to overcome the limitations of traditional TF-IDF, researchers have modified it to suit their problem areas, and a few have combined information gain with the traditional approach. In [9], the traditional TF-IDF method is improved by introducing a part-of-speech weight coefficient and a position weight (span weight) for the characteristic word. In this approach the part-of-speech (POS) weight of each characteristic word is calculated and then added to its position weight. In [10], the traditional TF-IDF algorithm is improved by a keyword extraction approach for Chinese medical words based on Word Factor TF-IDF (WF-TF-IDF). According to the authors, traditional TF-IDF cannot capture all features of words because of their relatively even distribution in the corpus, which decreases the precision of keyword extraction. Keyword extraction is divided into two parts. The first is data pre-processing, which includes denoising, regular expression processing, Chinese word segmentation, synonym exchange and stop word filtering. The second is the keyword extraction itself with WF-TF-IDF, which considers the word frequency in the title and description and the word distribution across categories. The word factor (WF) comprises three factors: description term frequency (DTF), title term frequency (TTF) and term distribution in categories (TDC).
In [11], TF-IDF is improved for calculating the weight of feature words represented by word vectors: the degree of dispersion within the class and the degree of association between the feature words and the class are added. The authors use both TF-IDF and word vectors. First, the word weight is calculated by the improved version of TF-IDF, and then the model is trained on the text to obtain the vector space model for each text. Another improvement of TF-IDF is described in [12], where it is found that existing TF-IDF improvements are unable to deal properly with imbalanced datasets. FDCD-TF-IDF is based on the word frequency distribution information and the classification distribution information of TF-IDF. It calculates the inter-class factor, the intra-class factor and the category factor, then computes the FDCD-TF-IDF value, and finally sorts the features in descending order of this value to select the characteristic words. In [13], a novel TF-IDF weighting approach for effective ranking is proposed. It employs two different within-document term frequency normalizations to capture two different aspects of term saliency. The first is Relative Intra-document TF (RITF), in which the importance of a term is measured by its frequency relative to the average TF of the document. The second is Length Regularized TF (LRTF), which normalizes the term frequency by the number of terms present in the document. In the reported experiments the results were compared with TF-IDF models and a probabilistic model, and the analysis found that RITF performs best for short queries, while LRTF performs better for long queries.
4 System Overview
The literature survey shows that different recommender systems use different personalization attributes such as user interest, group interest, location and ontology. The proposed news recommendation system uses Natural Language Processing (NLP) techniques to find the impact or relevance of news for the user. Our implementation required pre-processing of the data, for which we used Python data cleaning libraries and the NLTK package. The most widely used techniques for extracting the characteristic words of a text are the TF-IDF (Term Frequency-Inverse Document Frequency) algorithm and word vector embedding. The TF-IDF algorithm does not itself extract characteristic words; it is only a weight calculation method that can be used to evaluate the importance of the segmented words and so obtain words that may represent the characteristics of the text [14]. Characteristic words are extracted by sorting the words in descending order of their TF-IDF weight. TF-IDF is the product of TF and IDF:
$$\mathrm{tfidf}(t, d, D) = \mathrm{tf}(t, d) \cdot \mathrm{idf}(t, D)$$
where t denotes a term, d denotes a document and D denotes the collection of documents; tf and idf are given by
$$\mathrm{tf}(t, d) = \frac{\text{number of times term } t \text{ occurs in document } d}{\text{total number of terms in document } d}$$
$$\mathrm{idf}(t, D) = \log_e \frac{\text{total number of documents in } D}{\text{number of documents in } D \text{ containing term } t}$$
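The TF-IDF definitions above translate directly into code. The sketch below is a plain standard-library illustration of those formulas, not the implementation used in the paper; the three-document corpus is hypothetical, and returning 0.0 for a term that occurs in no document is a choice made here to avoid division by zero.

```python
import math


def tf(term, doc_tokens):
    """Occurrences of the term in the document / total terms in the document."""
    return doc_tokens.count(term) / len(doc_tokens)


def idf(term, corpus):
    """Natural log of (number of documents / documents containing the term)."""
    containing = sum(1 for doc in corpus if term in doc)
    if containing == 0:
        return 0.0  # assumption: unseen terms get zero weight
    return math.log(len(corpus) / containing)


def tfidf(term, doc_tokens, corpus):
    return tf(term, doc_tokens) * idf(term, corpus)


# Hypothetical tokenized headlines.
corpus = [
    ["port", "strike", "delay"],
    ["port", "vessel"],
    ["vessel", "delay", "delay", "port"],
]
print(tfidf("port", corpus[0], corpus))    # term present in every document
print(tfidf("strike", corpus[0], corpus))  # term present in one document only
```

A term that appears in every document ("port") gets weight 0 because log(1) = 0, while a term confined to one document ("strike") gets a positive weight, which is the behaviour that makes TF-IDF useful for picking characteristic words.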
Word vector embedding is the collective name for a set of language modelling and feature learning techniques in natural language processing (NLP) in which words or phrases from the vocabulary are mapped to vectors of real numbers [15, 16]. In general, a word embedding maps a word to a vector using a dictionary. The approach considered here is based on a neural network: a two-layer neural net that processes text by “vectorizing” words [17, 18]. Its input is a text corpus and its output is a set of vectors: feature vectors that represent the words in that corpus. The purpose and usefulness of converting words to vectors is to group the vectors of similar words together in vector space. A word-to-vector model can make highly accurate guesses about the meaning of a word based on its past appearances. It creates vectors that are distributed numerical representations of word features, such as the context of individual words, and it does so without human intervention. The output of the vectorizing neural network is a vocabulary in which each item has a vector attached to it, which can be fed into a deep learning network or simply queried to detect relationships between words. This is done in one of two ways: either by using the context to predict a target word (a method known as continuous bag of words, or CBOW), or by using a word to predict a target context, which is called skip-gram [19–23] (Fig. 1).
Fig. 1 The architecture of popular Word Embedding: CBOW and Skip-Gram
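The difference between the two training schemes is easiest to see in the training examples they generate. The sketch below only builds those examples and trains nothing; the function names, the window size and the sample headline tokens are illustrative assumptions.

```python
def cbow_pairs(tokens, window=2):
    """(context words, target word) pairs: the context predicts the target."""
    pairs = []
    for i, target in enumerate(tokens):
        context = tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]
        pairs.append((context, target))
    return pairs


def skipgram_pairs(tokens, window=2):
    """(target word, context word) pairs: the target predicts each context word."""
    return [
        (target, c)
        for context, target in cbow_pairs(tokens, window)
        for c in context
    ]


tokens = ["maersk", "ups", "bunker", "surcharge"]
print(cbow_pairs(tokens, window=1))
print(skipgram_pairs(tokens, window=1))
```

CBOW produces one example per position, with the whole context as input, whereas skip-gram produces one example per (word, neighbour) pair, with the single word as input.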
5 Materials and Methods
5.1 Data Collection
The dataset used for this experiment consists of the shipment data and user profile of a leading tractor manufacturing company, provided by ATA Freight Line India Pvt Ltd. Once the data is collected, we cannot go straight from raw text to fitting a machine learning or deep learning model; the text must be pre-processed first. The raw data is passed through a series of filters implemented in Python, which remove numbers, stop words and punctuation and convert the text to lower case. The cleaned data is then used for building a model.
5.2 Model Building
The data cleaned by the pre-processing steps is now ready to be used. The vocabulary is prepared from the cleaned text using a bag of words and n-grams. A bag-of-words (BoW) model is a way of extracting text-based features for use in modelling, for example with machine learning algorithms. The goal is to transform each free-text document into a vector that can be used as input or output for a machine learning model. Once the BoW is created, a vocabulary is built using n-grams. N-grams are a more comprehensive way of creating a vocabulary of grouped words: they widen the scope of the vocabulary and allow the bag of words to capture a little more meaning from the text. The words in the vocabulary are scored using term and document frequencies to create a word dictionary. An external weight can also be added to words that are of high importance to the customer, as an additional feature supporting the customer's area of interest. To find the relevancy of a news headline, the average score of the unigrams and bigrams in the input text is calculated using the built model, and the relevancy of the news (as a percentage) is derived from this score. The cleaned data is also used for building a second model based on word embedding, one of the most popular representations of document vocabulary, in which each word is represented as a vector. In this method the weight of each word is calculated using a neural network. Word embedding yields a vector space model in which the context of each word relative to other words is defined, and the index of a word can be looked up against the trained weights.
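One way to read the term-frequency relevancy score is sketched below. The paper does not specify the exact averaging, so this example makes several assumptions: the word dictionary holds weights between 0 and 1, n-grams missing from the dictionary score 0, and the relevancy percentage is the mean score over all unigrams and bigrams of the headline times 100. The dictionary contents and function names are hypothetical.

```python
# Hypothetical weighted dictionary built from the customer's shipment data;
# weights are assumed to lie in [0, 1].
WEIGHTS = {"deere": 0.9, "tractor": 0.8, "john deere": 1.0, "engine": 0.5}


def headline_ngrams(text):
    """Unigrams followed by bigrams of a lowercased, whitespace-split headline."""
    tokens = text.lower().split()
    bigrams = [" ".join(pair) for pair in zip(tokens, tokens[1:])]
    return tokens + bigrams


def relevancy_percent(headline, weights=WEIGHTS):
    """Mean dictionary score over all unigrams and bigrams, as a percentage."""
    grams = headline_ngrams(headline)
    if not grams:
        return 0.0
    total = sum(weights.get(g, 0.0) for g in grams)
    return round(100 * total / len(grams), 1)


print(relevancy_percent("John Deere engine"))
print(relevancy_percent("port strike"))
```

Averaging over matched n-grams only, instead of over all n-grams, would be an equally plausible reading and would yield higher percentages for long headlines.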
5.3 Front End
A GUI-based front end is designed with HTML, Bootstrap and the Django framework. Bootstrap is a widely used HTML, CSS and JavaScript framework for creating responsive, mobile-first websites. Django is a high-level Python web framework that promotes rapid development and clean, pragmatic design (Figs. 2 and 3).
6 Experiment and Results The work in this paper is implemented in Python. The experimental environment consists of an Intel Core i5-4590 CPU with 4 GB of memory running Windows 7 Professional. The hyperparameters for word embedding are size = 400, window = 5, min_count = 1, workers = number of CPUs, and training algorithm = CBOW. The dataset used for the experiment was provided by ATA Freight Line India Pvt. Ltd. The test data used for finding the relevancy are maritime news headlines from JOC.com and worldmaritimenews.com. A small part of the test dataset is shown in Table 1 for illustration. The accuracy indicator shows the relevancy of the news to the customer. Both models give satisfactory results, but the model built with word embedding gives more acceptable results, with better scores in identifying the relevancy of news to customers, because it uses multidimensional vectors that capture the relationships between words. The experimental results are shown in Table 2.
Fig. 2 The architecture of process flow in term frequency model
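The text does not state how the vector space model turns word vectors into a relevancy percentage. One common choice, shown here purely as an assumption, is the cosine similarity between the averaged word vectors of the headline and of a customer profile. Unlike a frequency-based score, a cosine similarity can be negative, which would be consistent with the negative Model 2 value in Table 2. The names `average_vector`, `cosine`, `relevancy_percent`, and the two-dimensional toy `embeddings` are hypothetical.

```python
import math


def average_vector(words, embeddings):
    """Mean of the embedding vectors of the known words (None if none are known)."""
    vectors = [embeddings[w] for w in words if w in embeddings]
    if not vectors:
        return None
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine(a, b):
    """Cosine similarity in [-1, 1]; 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def relevancy_percent(headline, profile_words, embeddings):
    """Cosine similarity between headline and customer profile, in percent."""
    h = average_vector(headline.lower().split(), embeddings)
    p = average_vector(profile_words, embeddings)
    if h is None or p is None:
        return 0.0
    return 100.0 * cosine(h, p)


# Toy 2-dimensional vectors; a trained model would use 400 dimensions.
embeddings = {
    "tractor": [1.0, 0.0],
    "engine": [0.9, 0.1],
    "vessel": [-1.0, 0.2],
}
print(relevancy_percent("engine", ["tractor"], embeddings))
print(relevancy_percent("vessel", ["tractor"], embeddings))
```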
7 Challenges Encountered Most of the methods used in this study are vocabulary dependent and rely on the user to update the corpus to take into account new words, phrases, and relevant individuals. This means that the machine learning models on which they rely need to be updated regularly. If the data source of our algorithm shifts, all our models need to be retrained from scratch, because online learning would not work with vectors of varying lengths.
Personalization of News for a Logistics Organisation …
Fig. 3 The architecture of process flow in vector space model

Table 1 Sample maritime news dataset from JOC.com and worldmaritimenews.com
News       Maritime news headline
News 1     Union strike at Nhava Sheva in upcoming week
News 2     Consumer goods giant Unilever has chosen deere as its logistics provider to expand its reach in China
News 3     MAIB: CMA CGM G. Washington lost 137 containers after 20° rolls
News 4     John Deere India Gets BS-V Certification for 3029 EWX Engine
News 5     Columbia welcomes its largest vessel to date
News 6     French container shipping major CMA CGM signed with its Asian counterparts COSCO Shipping, Evergreen and OOCL the launch of Ocean Alliance Day 4 Product
News 7     China overtakes Germany as Biggest Containership Nation
News 8     Maersk secures $5 billion in credit to help fund sustainability efforts
News 9     Maersk Ups Bunker Surcharge amid VLSFO price rise
News 10    British Ports seeking funds to fight coronavirus spread
Table 2 Experiment results: model 1 and model 2 news relevancy (in %)
News       Model 1 (TF) relevancy score    Model 2 (VSM) relevancy score
News 1     28                              63.26
News 2     36                              66.27
News 3     0.33                            1.94
News 4     25                              55.73
News 5     0.006                           −0.55
News 6     1.91                            8.09
News 7     4.73                            35.30
News 8     8.55                            20.32
News 9     13.76                           52.71
News 10    1.76                            7.20
8 Conclusion Artificial intelligence can be helpful to the logistics sector. First, this technology increases the efficiency and accuracy of logistics operations. Second, AI enables the automation of time-consuming tasks and the reduction of final costs. In this paper, two models are proposed for finding news relevancy. By introducing the concepts of NLP and DNLP, we present two schemes for the relevancy of news headlines. Experiments carried out on a dataset of shipments and user profiles show that the proposed model performs well against well-known state-of-the-art TF-IDF baselines, with acceptable results. The results are compared with the word embedding model, and it is found that the word embedding model gives much better results than the model based on term frequencies, because it uses multidimensional vectors that capture the relationships between words, whereas the traditional term frequency model maps each word to a single value and captures no meaning. The analysis of the experimental results shows that in the initial phases, when little data is available, the algorithm based on term frequencies is effective for finding the relevancy of news headlines to customers, but once a large amount of data has been captured, the word embedding approach performs better. In future work, this approach can be implemented with an improved word embedding model by changing the hyperparameters, and the results can be compared for better prediction of news relevancy. Such changes may enhance the results obtained, but there is still a flaw in the assumption that a model trained on past news can predict future trends. This limitation cannot be overcome until machine learning models are able to learn from the large number of seemingly unrelated events that people rely on when acquiring new information, in particular from news stories.
Acknowledgements The authors thank Akshay Ghodake for valuable discussions on Natural Language Processing and logistics in general. This research is supported by the ATA Freight Line Pvt. Ltd. Research Fellowship. Conflicts of Interest Arvind W Kiwelekary and Laxman D Netak declare that they have no conflict of interest. Rachit Garg has received research grants from ATA Freight Line India Pvt. Ltd. Swapnil S Bhate holds the position of Innovation Associate in the CATI department of ATA Freight Line India Pvt. Ltd.
AI Model Compression for Edge Devices Using Optimization Techniques Uday Kulkarni, S. M. Meena, Sunil V. Gurlahosur, Pratiksha Benagi, Atul Kashyap, Ayub Ansari, and Vinay Karnam
Abstract In recent years, Artificial Intelligence (AI) models have found applications in various fields. A wave of Machine Learning (ML) and AI is under way as edge devices become dominant. Edge devices are becoming more powerful and ubiquitous, so many ML tasks can run directly on them rather than on high-end systems. Computational cost, accuracy, and limited power are important considerations when deploying an AI model on edge devices. It is crucial to reduce the size of dense neural networks so that they can be deployed on edge devices without losing much accuracy. A growing body of research aims to reduce the computation cost while maintaining the accuracy of Convolutional Neural Network (CNN) models. Pure inference efficiency is also becoming one of the most important issues, and this is where pruning and quantization play a significant role. In this paper, we propose a methodology that compresses a large AI model and improves its inference time so that it can be deployed on edge devices. The accuracy of the proposed algorithm drops by only 0.44% after pruning and quantization on a seven-layer CNN architecture.
U. Kulkarni (B) · S. M. Meena · S. V. Gurlahosur · P. Benagi · A. Kashyap · A. Ansari · V. Karnam KLE Technological University, Hubballi, India
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2021 V. K. Gunjan and J. M. Zurada et al. (eds.), Modern Approaches in Machine Learning and Cognitive Science: A Walkthrough, Studies in Computational Intelligence 956, https://doi.org/10.1007/978-3-030-68291-0_17
Keywords Pruning · Sparsity · Deep learning
1 Introduction Deep learning (DL) is a subset of AI that emulates the way the human mind handles information and forms patterns for use in decision making. The popularity and importance of DL have grown quickly because it applies to a large range of problems. One of the challenges of DL is to reduce the size of a dense neural network without losing much accuracy, so that it can be deployed on edge devices [1, 2]. A CNN has an input layer, hidden layers, and an output layer, where each layer has neurons with associated weights. It uses complicated mathematical models to process data. Each layer in the network contributes to a feature hierarchy, performing a specific type of ordering and sorting of the data as it is processed. DL implemented on edge devices has many advantages, such as low communication bandwidth, lower cloud computing resource cost, low inference time, and better data security [3], whereas traditional computation models rely only on mobile sensing and cloud computing. Neural networks have become very common in applications ranging from speech recognition to computer vision. LeNet-5, a CNN model with fewer than 1 M parameters, was designed in 1998 to classify handwritten digits. DeepFace was successful in classifying human faces with 120 M parameters. Large neural networks are very powerful, but they consume considerable storage, computational resources, and memory bandwidth. These resource demands become prohibitive for embedded mobile applications. Large networks do not fit in on-chip storage and therefore require more costly DRAM access. Neural networks also take a lot of space on disk: the original AlexNet is over 200 MB in float format. A neural network contains millions of neurons, and almost all the space is taken up by the weights of the neurons [4]. Large files can usually be compressed with the zip format, but zip does not compress a neural network well because the weights are all floating-point numbers.
The weights of the neurons in each layer of a neural network are distributed within a certain range [5]. Eight-bit values also allow the use of SIMD operations, which perform many operations per clock cycle, and DSP chips are available to accelerate eight-bit calculations. Replacing floating-point operations with eight-bit integer operations reduces the computation cost of the model and makes it use less power, which is important for edge devices [6, 7]. It opens the way for edge devices that lack the processing power to run floating-point operations efficiently, and enables other applications in the IoT world. Pruning reduces the energy required to run a large network so that it can run in real-time mobile applications. Pruning also reduces the size of the model and therefore the storage and transmission cost of mobile applications incorporating CNNs. A methodology is presented to prune the network such that the original accuracy is preserved. After the initial training of the network, all connections whose weights are lower than a threshold are removed. A fully connected layer is thereby converted into a sparse layer, which makes it easy to identify the important connections and remove all unimportant ones [8, 9]. The sparse network is retrained so that the remaining connections can compensate for those removed during pruning and the network maintains its accuracy. Pruning is repeated iteratively as long as the accuracy is maintained; when the accuracy starts to drop, pruning is terminated and the model is saved for quantization [10, 11]. After quantization, the model has fewer weights than the original CNN model, and each weight is represented in int8.
2 Background Study To implement pruning and quantization in neural networks, we used a CNN (Convolutional Neural Network), the ResNet-50 architecture, the CIFAR-10 dataset, and deep learning approaches.
2.1 Convolutional Neural Network (CNN) The CNN is a type of neural network that is most often applied to image processing. It is used to identify objects in images and is also used for natural language processing [12, 13]. The idea behind a CNN is to filter the images before training the deep neural network. It has two parts: feature learning and classification of input images. Feature learning works in three steps. The first is convolution, which extracts features from the input image. The second applies a non-linearity, which allows the network to deal with nonlinear data. The third is the pooling operation, which downsamples the spatial dimensions of its input and lets the network deal with multiple scales of the image. The output of a convolution layer is not a single image but a volume of images representing the responses of all the different filters [14]. The pooling operation reduces the dimensionality of its input and can be applied after any convolution layer. The most common pooling technique is max pooling: a window slides over the feature map, the maximum value of each patch is taken, and this is repeated over the entire image. The first goal is to extract features from the image and feed the output features into fully connected (dense) layers [15]. A dense layer can output a probability distribution over the image categories or classes. Well-known CNN architectures include LeNet, AlexNet, VGGNet, GoogLeNet, ResNet, and ZFNet. In Fig. 1, the first part represents feature learning and the second part represents classification of the image using a convolutional neural network. The objective of the feature learning phase is to extract low- and high-level features from the input image by applying convolution, max pooling, and non-linearity operations. The fully connected layer in the second phase learns a non-linear combination of the high-level features and classifies the image using the softmax classification technique.
Fig. 1 A typical CNN
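The three feature-learning operations (convolution, non-linearity, max pooling) can be illustrated on a tiny single-channel image. This is a plain-Python sketch for intuition only; the function names and the hand-picked edge-detecting kernel are assumptions, and a real CNN learns its kernels and works on multi-channel tensors.

```python
def conv2d(image, kernel):
    """Valid 2-D cross-correlation of a 2-D list with a 2-D kernel."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [[sum(image[r + i][c + j] * kernel[i][j]
                 for i in range(kh) for j in range(kw))
             for c in range(out_w)]
            for r in range(out_h)]


def relu(feature_map):
    """Element-wise non-linearity: negative responses are set to zero."""
    return [[max(0, v) for v in row] for row in feature_map]


def max_pool(feature_map, size=2):
    """Non-overlapping max pooling; rows or columns that do not fill a
    complete window are dropped."""
    rows = range(0, len(feature_map) - size + 1, size)
    cols = range(0, len(feature_map[0]) - size + 1, size)
    return [[max(feature_map[r + i][c + j]
                 for i in range(size) for j in range(size))
             for c in cols]
            for r in rows]


# 5 x 5 image with a vertical edge between the second and third column.
image = [[0, 0, 1, 1, 1] for _ in range(5)]
# Hand-picked kernel that responds to a dark-to-bright step from left to right.
kernel = [[-1, 1],
          [-1, 1]]

pooled = max_pool(relu(conv2d(image, kernel)))
print(pooled)
```

The pooled map is a quarter of the size of the convolution output but still records where the edge was found, which is the downsampling effect described above.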
2.2 CIFAR-10 The CIFAR-10 dataset consists of 60,000 32 × 32 color images in 10 classes, with 6000 images per class. There are 50,000 training images and 10,000 test images. The dataset is divided into five training batches and one test batch, each with 10,000 images. Together, the training batches contain exactly 5000 images from each class. The classes in the dataset are airplane, automobile, bird, cat, deer, dog, frog, horse, ship, and truck.
2.3 Deep Learning Deep learning is a subset of machine learning in AI and is sometimes referred to as deep neural learning or deep neural networks [16]. It uses multiple layers of perceptrons to extract high-level features from raw input [17]. Various techniques can be used to create an efficient deep learning model, such as transfer learning, training from scratch, dropout, and pruning of the neural network. Figure 2 shows that a neural network has an input layer, a hidden layer, and an output layer, whereas a deep learning model has an input layer, as many hidden layers as the best model requires, and an output layer [18]. A large set of labeled data helps to train deep learning models by extracting high- and low-level features from the input data.
Fig. 2 Representation of deep learning architecture
2.4 ResNet-50 A ResNet (Residual Neural Network) stacks convolution, activation, and pooling layers like an ordinary CNN, with one addition: the original input of a block is added to the block's output. The ResNet architecture is displayed in Fig. 3. It is used for many computer vision applications. Deep learning needed a deeper network architecture that gives better accuracy than shallower networks, or at least the same. Simply adding layers to a deep neural network decreased the accuracy; to maintain it, ResNet adds the original input of each stack of convolution, activation, and pooling layers to the output of that stack.
Fig. 3 ResNet architecture
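The idea of adding the input back to the output can be sketched as y = relu(F(x) + x). For brevity, the sketch below uses two small dense layers as F in place of the convolution stack, which is a simplifying assumption; the function names are hypothetical. It shows that when the weights of F are zero, the block passes its (non-negative) input through unchanged, which is why extra residual blocks need not degrade a network.

```python
def relu(vec):
    """Element-wise rectified linear unit."""
    return [max(0.0, v) for v in vec]


def linear(vec, weights):
    """Matrix-vector product; weights is a list of rows."""
    return [sum(w * x for w, x in zip(row, vec)) for row in weights]


def residual_block(x, w1, w2):
    """y = relu(F(x) + x) with F(x) = W2 * relu(W1 * x).

    Dense layers stand in for the convolution stack of a real ResNet block.
    """
    fx = linear(relu(linear(x, w1)), w2)
    return relu([f + xi for f, xi in zip(fx, x)])


zeros = [[0.0, 0.0], [0.0, 0.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]

print(residual_block([1.0, 2.0], zeros, zeros))        # skip path only
print(residual_block([1.0, 2.0], identity, identity))  # F(x) = x is added to x
```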
Fig. 4 Pruning
2.5 Pruning To prune means to reduce the extent of something by removing superfluous or unwanted parts. Pruning of neural networks is inspired by synaptic pruning in the human brain, where neurons keep forming connections until adulthood, after which synapse elimination removes the unwanted connections between neurons [19], as shown in Fig. 4. It is a classical method to reduce the complexity of AI models. Pruning removes unimportant neurons from a deep neural network during the development of a deep learning model [20, 21]. It helps in developing light and efficient models for edge devices.
2.6 Quantization Deploying AI models on edge devices for real-time inference is important for many applications. Edge devices are limited in terms of computing resources, power, and memory bandwidth [22]. Deployment becomes possible when the models are optimized, and one optimization method is to reduce the number of bits used to represent numbers. A computer represents a real number with a limited number of bits, even though the set of real numbers is infinite. Usually, 32-bit floating point is the default for most applications, including many deep learning models. It is possible to develop CNN models with smaller data types such as 8-bit integers [23]. All calculations are then quantized, i.e., discretized to a fixed set of values that are represented using integers in place of floating-point numbers.
3 Related Work CNNs are very dense networks that are costly in terms of computation and memory. Because of their high computation cost and energy usage, these networks cannot be deployed on edge devices as they are, so the size of the neural network must be reduced while maintaining its accuracy. The early work by Yann LeCun et al. [24] introduces optimal brain damage, which proposes pruning a neural network by removing unimportant weights. The work by Song Han et al. [25] in the paper “Learning both weights and connections for efficient neural network” describes a method that prunes redundant connections in three steps: first, train the network to learn which connections are important; second, prune the unimportant connections; and last, retrain the network to recover what was lost due to pruning. Neural network pruning has been used to reduce network complexity and to reduce overfitting. A method to prune filters from a CNN with minimal effect on the accuracy is introduced by Hao Li et al. [26] in “Pruning Filters for Efficient Convnets”. By removing whole filters from the network together with their connecting feature maps, the computation cost is reduced to a large extent. Pruning the smallest filters works better than pruning random or the largest filters. The absolute kernel weights of each filter are summed and sorted, and the filters with the smallest sums are pruned together with their feature maps. When all the kernel weights of a filter are below the threshold value, magnitude-based weight pruning may also prune the whole filter. Quantization is another way to reduce the size and computation cost of a neural network; it reduces the energy used by the device because floating-point operations are reduced, which helps to deploy the neural network on edge devices.
The white paper by Raghuraman Krishnamoorthi [27] suggests that per-channel quantization of weights from float32 to int8 provides good accuracy and that activations can be quantized to int8 with no loss in accuracy. The loss in accuracy is due to weight quantization, and networks with more parameters, such as ResNet, are more robust to quantization than networks with fewer parameters, such as MobileNet. The compression of CNNs using filter pruning is introduced by Jian-Hao Luo et al. [21], who propose ThiNet, a simple but effective framework to accelerate and compress CNN models. They take a pretrained CNN model and compress it at a predefined compression rate in a four-step process: filter selection, pruning, and fine-tuning are the first three steps, and the fourth step iterates from step 1 to prune the next layer. The Asymptotic Soft Filter Pruning (ASFP) method is proposed by Yang He et al. [26] to improve inference for DNNs. In the first stage, the pruned filters are updated during retraining, and in the second stage, the network is pruned asymptotically.
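The filter-ranking step attributed to Li et al. above (sum the absolute kernel weights of each filter, sort, and drop the smallest) can be sketched as follows. The nested-list layout `filters[f][c][i][j]` and the function names are assumptions made for illustration; in a real network, the matching input channels of the next layer would have to be removed as well.

```python
def filter_l1_norms(filters):
    """Sum of absolute kernel weights per filter.

    filters[f][c][i][j] is the weight at row i, column j of input channel c
    of filter f.
    """
    return [sum(abs(w) for channel in f for row in channel for w in row)
            for f in filters]


def prune_smallest_filters(filters, n_prune):
    """Drop the n_prune filters with the smallest L1 norm.

    Returns the remaining filters and their original indices.
    """
    norms = filter_l1_norms(filters)
    order = sorted(range(len(filters)), key=lambda idx: norms[idx])
    dropped = set(order[:n_prune])
    kept = [idx for idx in range(len(filters)) if idx not in dropped]
    return [filters[idx] for idx in kept], kept


# Three single-channel 2 x 2 filters (toy values).
filters = [
    [[[0.5, -0.5], [0.5, -0.5]]],    # L1 norm 2.0
    [[[0.01, 0.0], [0.0, -0.02]]],   # L1 norm about 0.03, the weakest filter
    [[[1.0, 1.0], [-1.0, 1.0]]],     # L1 norm 4.0
]
remaining, kept = prune_smallest_filters(filters, n_prune=1)
print(kept)
```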
4 Proposed Work In this paper, we propose a method to reduce the size of a CNN model so that it can be deployed on edge devices, using pruning of the neural network and quantization from 32-bit floating point to 8-bit integer, as shown in Fig. 5.
4.1 Working of the Proposed System The input to the proposed system is a trained CNN model. The first step is to extract the weights of each node from the input model. The next step is to prune the network at different sparsity values and, after each pruning step, retrain the pruned network on the same dataset on which the input model was originally trained [28–30]; here, the CIFAR-10 dataset is used. Pruning is applied iteratively until there is a significant drop in accuracy, and the most accurate pruned model is saved. Post-training quantization is then applied to the saved model to convert the 32-bit floating-point values into 8-bit integer values [31]. Finally, the quantized model, which has far fewer parameters than the baseline model, is saved.
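The control flow of this pipeline (prune, retrain, check accuracy, stop at a significant drop) can be sketched as a small loop. This is an outline under assumptions, not the authors' code: `retrain` and `evaluate` are hypothetical callables supplied by the caller, `max_drop` is an assumed tolerance in percentage points, and `magnitude_prune` zeroes a fraction of the smallest weights of a flat list rather than using the threshold rule of Sect. 4.2.

```python
def magnitude_prune(weights, sparsity):
    """Zero out the given fraction of weights with the smallest magnitude."""
    k = int(len(weights) * sparsity)
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    dropped = set(order[:k])
    return [0.0 if i in dropped else w for i, w in enumerate(weights)]


def iterative_prune(weights, sparsities, retrain, evaluate, max_drop=1.0):
    """Prune with increasing sparsity and keep the last model whose accuracy
    stays within max_drop percentage points of the unpruned baseline."""
    baseline = evaluate(weights)
    best = weights
    for sparsity in sparsities:
        candidate = retrain(magnitude_prune(best, sparsity))
        if baseline - evaluate(candidate) > max_drop:
            break  # significant accuracy drop: stop and keep the previous model
        best = candidate
    return best


# Toy stand-ins: no real retraining, and an "accuracy" that falls by
# 20 points for every important weight (index 0, 1, or 5) that was removed.
toy_weights = [0.9, -0.8, 0.05, 0.02, -0.01, 0.7]


def toy_retrain(weights):
    return weights


def toy_evaluate(weights):
    return 90.0 - 20.0 * sum(1 for i in (0, 1, 5) if weights[i] == 0.0)


best_model = iterative_prune(toy_weights, [0.25, 0.5, 0.75],
                             toy_retrain, toy_evaluate)
print(best_model)
```

The saved model would then be passed to post-training quantization, as described in Sect. 4.3.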
4.2 Removal of Unimportant Weights in Each Layer of the Neural Network In the neural network, each node carries a weight with its own value. Nodes whose weight is smaller than the product of the average weight of the neural network and the sparsity value are considered unimportant. The selection of unimportant nodes in each layer therefore depends on the mean value of all the neuron weights and on the sparsity.
Fig. 5 Architecture design
$$\sigma_{mean} = \frac{1}{N} \sum_{n=0}^{N} \mathrm{mean}\, W^{[n]}(i, j) \qquad (1)$$
Equation (1) evaluates the total mean weight of the neural network by summing the mean of the weights in each layer. N represents the total number of layers, and $\mathrm{mean}\, W^{[n]}(i, j)$ represents the mean value of the weights in layer n.
$$X_i = \sigma_{mean} \times S[i] \qquad (2)$$
In Eq. (2), $X_i$ denotes the threshold value, which is used for pruning the weights smaller than the threshold. $S[i]$ represents the sparsity at the ith index, and $\sigma_{mean}$ is the mean value of the weights.
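Equations (1) and (2) can be turned into a short sketch. Two points are assumptions not stated in the text: the mean is taken over absolute weight values (a signed mean would be close to zero for typical weight distributions), and a weight is pruned when its magnitude is below the threshold. The average in `sigma_mean` runs over the layers that are present. Function names and the toy layers are hypothetical.

```python
def sigma_mean(layers):
    """Eq. (1): average over layers of the mean absolute weight per layer.

    Each layer is a 2-D list of weights. Using absolute values is an
    assumption; the text only says "mean value of weight".
    """
    layer_means = []
    for layer in layers:
        count = sum(len(row) for row in layer)
        layer_means.append(sum(abs(w) for row in layer for w in row) / count)
    return sum(layer_means) / len(layer_means)


def prune_by_threshold(layers, sparsity):
    """Eq. (2): X = sigma_mean * S; weights with magnitude below X become 0."""
    threshold = sigma_mean(layers) * sparsity
    pruned = [[[w if abs(w) >= threshold else 0.0 for w in row]
               for row in layer]
              for layer in layers]
    return pruned, threshold


# Two toy layers: per-layer means 0.25 and 0.5, so sigma_mean is 0.375.
layers = [
    [[0.4, -0.1], [0.2, -0.3]],
    [[0.05, 0.95]],
]
pruned_layers, threshold = prune_by_threshold(layers, sparsity=0.5)
print(threshold)
print(pruned_layers)
```

Raising the sparsity value S raises the threshold, so more weights fall below it and are removed.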
4.3 Conversion of Float32 to Int8 Optimizing the AI model from float32 to int8 is important when deploying on low-end embedded devices. 8-bit quantization maps values onto a fixed range between 0 and 255 (Fig. 6) [32]. The relation between the parameters used to find the quantized value is shown below.
$$N = \frac{n_{max} - n_{min}}{2^8 - 1} \times (Q - Z) \qquad (3)$$
$$N = (\text{Scaling factor}) \times (Q - Z) \qquad (4)$$
In Eqs. (3) and (4), N represents the real value that is represented in int8, $n_{max}$ and $n_{min}$ represent the maximum and minimum values of the range, Q represents the quantized 8-bit integer, and Z is the zero point, i.e., the integer to which the real value zero is mapped. The scaling factor $(n_{max} - n_{min})/(2^8 - 1)$ sets the step size of the number line.
Fig. 6 Number line
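The mapping in Eqs. (3) and (4) can be sketched as a pair of functions that convert a real value to an 8-bit integer and back. This is an illustrative sketch assuming round-to-nearest and clamping to the range 0 to 255; the function names and the example range are hypothetical.

```python
def quant_params(n_min, n_max, bits=8):
    """Scaling factor and zero point Z that map [n_min, n_max] onto
    the integers 0 .. 2**bits - 1."""
    q_max = 2 ** bits - 1
    scale = (n_max - n_min) / q_max
    zero_point = int(round(-n_min / scale))
    return scale, max(0, min(q_max, zero_point))


def quantize(n, scale, zero_point, bits=8):
    """Real value N to quantized integer Q (rounded and clamped)."""
    q = int(round(n / scale)) + zero_point
    return max(0, min(2 ** bits - 1, q))


def dequantize(q, scale, zero_point):
    """Eq. (4): N = scaling factor * (Q - Z)."""
    return scale * (q - zero_point)


# Example weight range (hypothetical): scale is about 0.01 and Z is 50.
scale, zero_point = quant_params(-0.5, 2.05)
for value in (-0.5, 0.0, 0.333, 2.05):
    q = quantize(value, scale, zero_point)
    print(value, q, dequantize(q, scale, zero_point))
```

The round trip is lossy: a value is recovered only to within half a quantization step, which is the source of the accuracy loss attributed to quantization.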
5 Results The implementation applies iterative pruning and post-training quantization to two networks, a simple CNN (a seven-layer architecture) and ResNet-50, on the CIFAR-10 dataset. These neural networks have a very large number of parameters and also have fully connected layers, so it is challenging to maintain the original model accuracy after size compression. Figure 7 shows that the loss in accuracy is minimal when pruning is applied iteratively with the sparsity increased up to a certain value. The objective of applying post-training quantization after pruning is to reduce the model size by a further factor of four, through the conversion of float32 into int8, which makes the model even smaller and computationally efficient for edge devices.
Fig. 7 Accuracy versus sparsity
Fig. 8 Distribution of weights in each convolutional layer
5.1 Accuracy Drop versus Sparsity Figure 7 shows the trade-off curve between the accuracy drop and the sparsity value, which determines the number of pruned connections between the nodes of the neural network. The accuracy of the unpruned neural network is constant and is shown as a red straight line. The blue dotted line represents the accuracy of the neural network after pruning without retraining on the input dataset; the accuracy drops because features learned during training are removed. Neural networks that are retrained after pruning show very little accuracy drop up to a certain value of sparsity, as shown by the black dotted line. Retraining is important to maintain accuracy because it helps the network relearn the features that were pruned.
5.2 Distribution of Weights in CNN Layers Figure 8 compares the number of parameters in each layer of the CNN before and after pruning, along with the threshold value. A few CNN layers, such as the fourth and fifth, have a very large number of parameters before pruning, which reduces drastically after pruning. In the seven-layer CNN architecture, the total number of weights was 0.95376 million before pruning and 0.19118 million after pruning, so 0.76258 million weights were pruned. There is around a 30% reduction in floating-point operations after pruning.
5.3 Comparison Between Baseline Model and Proposed Model The model size and accuracy of the baseline model and the proposed model are compared in Fig. 9. The baseline model has a size of 15.30 MB with an accuracy of 84.17%, and the proposed model has a size of 0.97 MB with an accuracy of 83.73%. The model size is thus reduced drastically by pruning and quantization, while the accuracy loss for the seven-layer CNN architecture is only 0.44%. The ResNet-50 model has a size of 98 MB with an accuracy of 93%. After pruning, its accuracy drops by 1.32%, the number of parameters is reduced by 20%, and the floating-point operations are reduced by 27%.
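The figures reported in Sects. 5.2 and 5.3 for the seven-layer CNN can be cross-checked with a few lines of arithmetic. The snippet only restates numbers from the text; the derived compression ratio and pruned fraction are computed here and are not quoted by the authors.

```python
# Figures reported for the seven-layer CNN.
baseline_mb, compressed_mb = 15.30, 0.97
baseline_acc, compressed_acc = 84.17, 83.73
weights_before_m, weights_after_m = 0.95376, 0.19118  # millions of weights

compression_ratio = baseline_mb / compressed_mb
accuracy_drop = baseline_acc - compressed_acc
pruned_weights_m = weights_before_m - weights_after_m
pruned_fraction = pruned_weights_m / weights_before_m

print(f"size reduction: {compression_ratio:.1f}x")
print(f"accuracy drop: {accuracy_drop:.2f} percentage points")
print(f"weights pruned: {pruned_weights_m:.5f} M ({100 * pruned_fraction:.1f}%)")
```

The computed accuracy drop and number of pruned weights agree with the values stated in the text (0.44 and 0.76258 million).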
6 Conclusion An approach is proposed to compress large neural network models so that they run inference efficiently on device at low computation and memory cost. Experimental results show the effectiveness of this method in achieving significant model size reductions and efficient inference while providing competitive performance. Possible future extensions include applying filter pruning to remove entire filters that are insignificant and applying custom quantization to achieve higher compression.
Fig. 9 Model size comparison
References 1. Li, E., Zhou, Z., Chen, X.: Edge intelligence: on-demand deep learning model co-inference with device-edge synergy. In: Proceedings of the 2018 Workshop on Mobile Edge Communications, 7 Aug 2018, pp. 31–36 (2018) 2. He, Y., Zhang, X., Sun, J.: Channel pruning for accelerating very deep neural networks. In: Proceedings of the IEEE International Conference on Computer Vision 2017, pp. 1389–1397 (2017) 3. Desai, S.D., Giraddi, S., Narayankar, P., Pudakalakatti, N.R., Sulegaon, S.: Back-propagation neural network versus logistic regression in heart disease classification. In: Advanced Computing and Communication Technologies, pp. 133–144. Springer, Singapore (2019) 4. Lin, M., Chen, Q., Yan, S.: Network in network. arXiv preprint arXiv:1312.4400. 2013 Dec 16 5. Gong, Y., Liu, L., Yang, M., Bourdev, L.: Compressing deep convolutional networks using vector quantization. arXiv preprint arXiv:1412.6115. 2014 Dec 18 6. Wang, X., Han, Y., Leung, V.C., Niyato, D., Yan, X., Chen, X.: Convergence of edge computing and deep learning: a comprehensive survey. IEEE Commun. Surv. Tutorials 22(2), 869–904 (2020) 7. Migacz, S.: NVIDIA 8-bit inference width TensorRT. In: InGPU Technology Conference (2017) 8. Narang, S., Elsen, E., Diamos, G., Sengupta, S.: Exploring sparsity in recurrent neural networks. arXiv preprint arXiv:1704.05119. 2017 Apr 17 9. Louizos, C., Welling, M., Kingma, D.P.: Learning sparse neural networks through L 0 regularization. arXiv preprint arXiv:1712.01312. 2017 Dec 4 10. Zhou, W., Veitch, V., Austern, M., Adams, R.P., Orbanz, P.: Compressibility and generalization in large-scale deep learning. arXiv preprint arXiv:1804.05862. 2018 May 2 11. Jaderberg, M., Vedaldi, A., Zisserman, A.: Speeding up convolutional neural networks with low rank expansions. arXiv preprint arXiv:1405.3866. 2014 May 15 12. Cohen, N., Shashua, A.: Inductive bias of deep convolutional networks through pooling geometry. arXiv preprint arXiv:1605.06743. 2016 May 22 13. 
Zeiler, M.D., Fergus, R.: Visualizing and understanding convolutional networks. In: European Conference on Computer Vision, 6 Sep 2014, pp. 818-833. Springer, Cham (2014) 14. He, K., Sun, J.: Convolutional neural networks at constrained time cost. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 5353–5360 (2015) 15. Molchanov, P., Tyree, S., Karras, T., Aila, T., Kautz, J.: Pruning convolutional neural networks for resource efficient inference. arXiv preprint arXiv:1611.06440. 2016 Nov 19 16. Hanni, A., Chickerur, S., Bidari, I.: Deep learning framework for scene based indoor location recognition. In: 2017 International Conference on Technological Advancements in Power and Energy (TAP Energy), 21 Dec 2017, pp. 1–8. IEEE (2017) 17. Wang, H., Raj, B.: On the origin of deep learning. arXiv preprint arXiv:1702.07800. 2017 Feb 24 18. Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556. 2014 Sep 4 19. Wang, Y., Zhang, X., Xie, L., Zhou, J., Su, H., Zhang, B., Hu, X.: Pruning from scratch. In: AAAI 2020, pp. 12273–12280 (2020)
240
U. Kulkarni et al.
20. Han, S., Mao, H., Dally, W.J.: Deep compression: compressing deep neural networks with pruning, trained quantization and huffman coding. arXiv preprint arXiv:1510.00149. 2015 Oct 1 21. Luo, J.H., Wu, J., Lin, W.: Thinet: a filter level pruning method for deep neural network compression. In: Proceedings of the IEEE International Conference on Computer Vision 2017, pp. 5058–5066 (2017) 22. Mishra, A., Marr, D.: Apprentice: using knowledge distillation techniques to improve lowprecision network accuracy. arXiv preprint arXiv:1711.05852. 2017 Nov 15 23. Jacob, B., Kligys, S., Chen, B., Zhu, M., Tang, M., Howard, A., Adam, H., Kalenichenko, D.: Quantization and training of neural networks for efficient integer-arithmetic-only inference. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2704– 2713 (2018) 24. LeCun, Y., Denker, J.S., Solla, S.A.: Optimal brain damage. In: Advances in Neural Information Processing Systems, pp. 598–605 (1990) 25. Han, S., Pool, J., Tran, J., Dally, W.: Learning both weights and connections for efficient neural network. In: Advances in Neural Information Processing Systems, pp. 1135-1143 (2015) 26. Li, H., Kadav, A., Durdanovic, I., Samet, H., Graf, H.P.: Pruning filters for efficient convnets. arXiv preprint arXiv:1608.08710. 2016 Aug 31 27. Krishnamoorthi, R.: Quantizing deep convolutional networks for efficient inference: a whitepaper. arXiv preprint arXiv:1806.08342. 2018 Jun 21 28. Frankle, J., Carbin, M.: The lottery ticket hypothesis: finding sparse, trainable neural networks. arXiv preprint arXiv:1803.03635. 2018 Mar 9 29. Warden, P.: How to quantize neural networks with tensorflow (2016) 30. Wen, W., Wu, C., Wang, Y., Chen, Y., Li, H.: Learning structured sparsity in deep neural networks. In: Advances in Neural Information Processing Systems, pp. 2074–2082 (2016) 31. Silberman, N., Guadarrama, S.: TensorFlow-slim image classification model library (2016) 32. 
Sahni, M.: Making neural nets work with low precision. EfficieNN. https://sahnimanas.github. io/post/quantization-in-tflite/. 2018 Dec 7
An Empirical Study and Analysis of Various Electroencephalography (EEG) Artefact Removal Methods J. Vishwesh and P. Raviraj
Abstract The brain's electrical activity over a period of time can be recorded as an electroencephalogram (EEG) using multiple electrodes placed on the scalp, which convert ionic potentials into electrical signals. EEG is widely used to diagnose epilepsy, sleep disorders and other diseases of the brain. This paper reviews the different procedures used to reduce noise and artefacts, such as electrooculography (EOG) artefacts, in EEG signals, and compares the methods. We first present basic information on the characteristics of the EEG signal, of the artefacts and of the EEG estimation model. Keywords EEG · EOG · Electrodes · Estimation model · Artefacts
1 Introduction
The activity of the brain can be recorded with the help of electroencephalogram (EEG) signals. EEG was first recorded from an animal brain in 1875 by Richard Caton, and from the human brain by Hans Berger in 1929 [1]. This neurophysiological measurement is acquired by a non-invasive technique that uses a set of electrodes placed along the scalp [2]. EEG plays a pivotal role in many areas of present-day research. It is used extensively in neuroscience, cognitive science, neurolinguistics and neurophysiological research. EEG is regularly used to diagnose epilepsy, which causes abnormalities in EEG readings [3]. It is also used to diagnose sleep disorders, depth of anaesthesia, coma, encephalopathies and brain death. An EEG may also be helpful in treating disorders such as brain tumour, brain damage from head injury, stroke and sleep disorders. According to the mental state of an individual, EEG rhythms are classified into bands: delta (deep sleep), theta (drowsiness or light sleep), alpha (relaxed and awake), beta (alert, with active mental activity or anxiety) and gamma (heightened brain activity) waves. To record an EEG signal, the standard 10–20 recording system with a total of 21 electrodes is used, as shown in Fig. 1.

J. Vishwesh (B) · P. Raviraj
Department of Computer Science and Engineering, GSSS Institute of Engineering and Technology for Women, VTU, Belagavi, India
e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2021 V. K. Gunjan and J. M. Zurada et al. (eds.), Modern Approaches in Machine Learning and Cognitive Science: A Walkthrough, Studies in Computational Intelligence 956, https://doi.org/10.1007/978-3-030-68291-0_18

Fig. 1 10–20 system of electrode placement

However, the EEG signals recorded from the frontal electrodes closest to the eyes (Fp1, Fp2) contain a considerable amount of noise (EOG) introduced by eye movements. These eye movements can be horizontal, vertical or blinks. Researchers have proposed several techniques to eliminate the EOG artefacts contained in EEG signals. EEG data are almost always contaminated by artefacts, which distort the measurements and affect the signal of interest. The ideal approach is to prevent artefacts from occurring while the EEG is recorded [4], using artefact prevention methods. Unfortunately, EEG signals are still contaminated by various physiological sources other than brain activity, which are usually not of interest. Because the sources of noise (ocular movements, eye blinks, cardiac activity or device errors) are diverse and have different characteristics, most researchers concentrate on removing one particular kind of artefact. Artefacts can be marked manually or detected automatically. Once artefacts are identified, they can be removed by rejecting the part of the signal that contains them. In this case there is a risk of losing useful information from the EEG signal. The cancellation of noise and artefacts is an important issue in EEG signal processing and is typically a prerequisite for subsequent signal analysis. An EEG segment can be discarded if it contains excessive interference [5], but it is preferable to filter out only the artefacts while retaining as much of the EEG signal as possible [6].
To remove artefacts from EEG signals, the first step is to detect them and the second is to remove them. There are different approaches to detecting artefacts in the EEG signal, for example the statistical thresholding method [7] and classification methods such as DETECT [8]. These methods only identify the artefacts in the EEG signal by means of an identification or classification technique. Artefact removal can be performed using Surface Laplacian (SL), Common Average Referencing (CAR), Independent Component Analysis (ICA), Constrained ICA (cICA), Common Spatial Patterns (CSP), Singular Spectrum Analysis (SSA), Adaptive Noise Canceller (ANC), Principal Component Analysis (PCA), Quaternion-valued Least Mean Square (QLMS), Singular Value Decomposition (SVD), Common Spatio-Spectral Patterns (CSSP), Linear Regression, Local Averaging Technique (LAT), Blind Source Separation (BSS), and filtering methods such as adaptive filtering and Bayes filtering. The most frequently used techniques are ICA, CAR, SL, PCA, CSP, ANC and adaptive filtering [9].
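The detect-then-reject idea described above can be illustrated with a minimal sketch. It is not the thresholding method of [7], whose details are not given here. It assumes a single-channel signal held as a plain list, fixed-length epochs, and a simple rule that flags an epoch when its peak absolute amplitude exceeds the mean plus k standard deviations of all epoch peaks. The function names, `epoch_len` and `k` are hypothetical choices made only for illustration.

```python
from statistics import mean, pstdev


def find_artefact_epochs(signal, epoch_len, k=2.0):
    """Return indices of epochs whose peak absolute amplitude is unusually large.

    An epoch is flagged when its peak exceeds mean + k * std of all epoch
    peaks. epoch_len and k are illustrative assumptions, not source values.
    """
    n_epochs = len(signal) // epoch_len
    peaks = [
        max(abs(v) for v in signal[i * epoch_len:(i + 1) * epoch_len])
        for i in range(n_epochs)
    ]
    threshold = mean(peaks) + k * pstdev(peaks)
    return [i for i, peak in enumerate(peaks) if peak > threshold]


def reject_epochs(signal, epoch_len, bad_epochs):
    """Drop the flagged epochs and return the remaining samples.

    Rejection discards the whole epoch, so any useful EEG inside it is lost.
    """
    bad = set(bad_epochs)
    n_epochs = len(signal) // epoch_len
    kept = []
    for i in range(n_epochs):
        if i not in bad:
            kept.extend(signal[i * epoch_len:(i + 1) * epoch_len])
    return kept
```

Rejecting whole epochs is simple but throws away the underlying EEG together with the artefact, which is why the removal methods listed above try to filter out only the artefact component.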
2 Characteristics of the EEG
The EEG signal is usually described in terms of (1) rhythmic activity and (2) transients. The rhythmic activity is further divided into bands by frequency. An EEG signal contains frequencies ranging from 0.01 Hz to around 100 Hz. Components near 0.01 Hz correspond to slow cortical potentials (SCPs) and are usually not recorded. Most of the energy of the EEG signal is concentrated in the lower range of the spectrum [10]. The EEG records the electrical activity of the brain in the form of five different oscillation bands, including delta (

$$P(C_i \mid X) > P(C_j \mid X) \quad \text{for } 1 \le j \le m,\ j \ne i \tag{2}$$
Thus, the classifier predicts the class $C_i$ for which $P(C_i \mid X)$ is maximum. By Bayes' theorem,

$$P(C_i \mid X) = \frac{P(X \mid C_i)\,P(C_i)}{P(X)} \tag{3}$$

2. Since $P(X)$ is constant for all classes, only $P(X \mid C_i)P(C_i)$ needs to be maximized. If the class prior probabilities $P(C_i)$ are not known, it is commonly assumed that all classes are equally likely, and we therefore maximize $P(X \mid C_i)$. Otherwise, we maximize $P(X \mid C_i)P(C_i)$.
3. With large data sets with many attributes, it would be computationally expensive to compute $P(X \mid C_i)$ directly. To reduce this computation, the attributes are assumed to be conditionally independent of one another given the class. In that case,

$$P(X \mid C_i) = \prod_{k=1}^{n} P(x_k \mid C_i) = P(x_1 \mid C_i) \times P(x_2 \mid C_i) \times \dots \times P(x_n \mid C_i) \tag{4}$$
SVM and Naïve Bayes Models …
The probabilities $P(x_1 \mid C_i), P(x_2 \mid C_i), \dots, P(x_n \mid C_i)$ are estimated from the training set. With these values, the model is evaluated on the test set for validation before deployment.
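The estimation and prediction steps of Eqs. (2) to (4) can be sketched for categorical attributes as follows. This is a minimal illustration in Python rather than the R implementation used in the study. The function names and the `laplace` argument are assumptions made for the sketch, and the attribute values in any example data are hypothetical.

```python
from collections import Counter, defaultdict


def train_naive_bayes(rows, labels, laplace=0.0):
    """Estimate P(C_i) and P(x_k | C_i) from categorical training tuples.

    rows   : list of tuples of attribute values
    labels : class label of each tuple
    laplace: additive smoothing constant (0 disables smoothing)
    """
    n = len(labels)
    class_counts = Counter(labels)
    priors = {c: class_counts[c] / n for c in class_counts}

    # value_counts[c][k][v] = number of class-c tuples whose k-th attribute is v
    value_counts = defaultdict(lambda: defaultdict(Counter))
    levels = defaultdict(set)  # distinct values seen for each attribute
    for row, c in zip(rows, labels):
        for k, v in enumerate(row):
            value_counts[c][k][v] += 1
            levels[k].add(v)

    def likelihood(c, k, v):
        """Relative frequency estimate of P(x_k = v | C = c)."""
        numerator = value_counts[c][k][v] + laplace
        denominator = class_counts[c] + laplace * len(levels[k])
        return numerator / denominator

    return priors, likelihood


def predict(priors, likelihood, row):
    """Return the class maximizing P(X | C_i) P(C_i), plus all class scores."""
    scores = {}
    for c, prior in priors.items():
        score = prior
        for k, v in enumerate(row):
            score *= likelihood(c, k, v)  # product over attributes, Eq. (4)
        scores[c] = score
    return max(scores, key=scores.get), scores
```

With `laplace=0.0`, an attribute value never seen with a class forces that class score to zero, which is the effect that Laplace smoothing is meant to avoid.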
3.2 Support Vector Machines
A support vector machine (SVM) identifies an optimal hyperplane that separates the data by the class of the predicted variable, using a set of data tuples called support vectors. As described in [8], the SVM uses a nonlinear mapping to transform the original data into a higher-dimensional space and computes a linear optimal separating hyperplane there. This hyperplane is called a "decision boundary" and separates the data according to the class of the predicted variable. The hyperplane is derived from the support vectors, which lie closest to the decision boundary. For linearly separable data, the SVM finds the maximum-margin hyperplane, which gives the largest separation between classes. For linearly inseparable data, SVMs can find nonlinear decision boundaries by applying kernel functions such as polynomial, Gaussian radial basis and sigmoid kernels. Given training vectors $x_i \in \mathbb{R}^n$, $i = 1, \dots, l$, in two classes, and a label vector $y \in \mathbb{R}^l$ such that $y_i \in \{1, -1\}$, C-SVC solves the primal optimization problem

$$\min_{\omega, b, \xi} \; \frac{1}{2}\omega^T \omega + C \sum_{i=1}^{l} \xi_i \tag{5}$$

subject to $y_i(\omega^T \phi(x_i) + b) \ge 1 - \xi_i$, $\xi_i \ge 0$, $i = 1, \dots, l$,

where $\phi(x_i)$ maps $x_i$ into a higher-dimensional space and $C > 0$ is the regularization parameter. Because $\omega$ can be very high-dimensional, the dual problem is usually solved instead:

$$\min_{\alpha} \; \frac{1}{2}\alpha^T Q \alpha - e^T \alpha \tag{6}$$

subject to $y^T \alpha = 0$, $0 \le \alpha_i \le C$, $i = 1, \dots, l$,

where $e = [1, \dots, 1]^T$ is the vector of all ones, $Q$ is an $l$ by $l$ positive semidefinite matrix with $Q_{ij} \equiv y_i y_j K(x_i, x_j)$, and $K(x_i, x_j) \equiv \phi(x_i)^T \phi(x_j)$ is the kernel function.
S. Narasimhan and V. Rajendran
After Eq. (6) is solved, the primal-dual relationship gives the optimal $\omega$ as

$$\omega = \sum_{i=1}^{l} y_i \alpha_i \phi(x_i) \tag{7}$$

and the decision function is

$$\operatorname{sgn}\left(\omega^T \phi(x) + b\right) = \operatorname{sgn}\left(\sum_{i=1}^{l} y_i \alpha_i K(x_i, x) + b\right) \tag{8}$$

(Korovkinas and Danėnas, 2017) [6]. Support vector machines are slow to train but are comparatively accurate because of their ability to model complex nonlinear decision boundaries. Since the complexity of the classifier depends on the number of support vectors rather than on the dimensionality of the data, the model is less prone to overfitting than other methods.
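To make the decision function of Eq. (8) concrete, the following sketch evaluates it for a model whose support vectors, multipliers and bias are already known. It does not solve the dual problem of Eq. (6). All function names are hypothetical, and any support vectors or multipliers passed in are assumed to come from a separate solver.

```python
import math


def linear_kernel(x, z):
    """K(x, z) = x . z"""
    return sum(a * b for a, b in zip(x, z))


def rbf_kernel(x, z, gamma):
    """Gaussian radial basis kernel K(x, z) = exp(-gamma * ||x - z||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))


def decision_value(x, support_vectors, labels, alphas, b, kernel):
    """Sum over support vectors of y_i * alpha_i * K(x_i, x), plus b."""
    total = sum(
        y * alpha * kernel(sv, x)
        for sv, y, alpha in zip(support_vectors, labels, alphas)
    )
    return total + b


def classify(x, support_vectors, labels, alphas, b, kernel):
    """Sign of the decision value; a value of exactly 0 is mapped to +1 here."""
    value = decision_value(x, support_vectors, labels, alphas, b, kernel)
    return 1 if value >= 0 else -1
```

As a hand-worked check, for the two support vectors (1, 1) with label +1 and (-1, -1) with label -1, the maximum-margin solution under a linear kernel has both multipliers equal to 0.25 and b equal to 0, so each support vector has a decision value of magnitude 1. Only the support vectors enter the sum, which is why the cost of prediction depends on their number rather than on the dimension of the feature space.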
4 The Process System
For this study, we took process data from the oil cooling circuit for the mechanical seal of a 2.7 MW centrifugal pump in a nuclear power plant. The pump circulates the coolant that carries thermal energy from the fission reaction in the core to the steam generators, where steam is produced. The oil cooling system circulates oil through the mechanical seal assemblies of the pump to provide cooling, lubrication and sealing for the pump shaft assembly. The Distributed digital Control System (DCS) scans all the process sensors, collects the data and logs it in the historian. From the server, we collected DCS data for the key process parameters listed in Table 1, covering one month of stable operation at a sampling interval of 5 s. The total number of data tuples taken for the study over this period is about two lakh (200,000).

Table 1 Process parameters and ranges of measurements

| Sl. No. | Parameter ID | Parameter description | Range of measurement |
|---|---|---|---|
| 1 | TTRol20_801 | Oil inlet temperature to bottom mechanical seal | 0 to 600 °C |
| 2 | TTRol20_808 | Return oil temperature after cooling by blowers | 0 to 600 °C |
| 3 | PTRol20_803 | Oil inlet pressure to bottom mechanical seal | 0–7 kg/cm² |
| 4 | FTRol20_801 | Oil outlet flow from mechanical seal | 0–50 m³/hr |
5 Model Building
The historian data from the DCS process computers for the selected variables were imported into the data analysis tool R, version 3.6.3. After eliminating missing values, we transformed the time series data into categorical attributes by slicing each measurement range into 25 bins. The data frame now consists of 20585 data tuples of four categorical variables with 25 classes each. We then partitioned the data frame into a training set and a test set in the ratio 70:30. Since the data pertain to steady-state operation of the process, the summary information of the data set indicates a class imbalance in each of the attributes, as shown in Tables 2 and 3.

Table 2 Training data set class summary

| Variable | Class | Quantity |
|---|---|---|
| TTRol20_801 (°C) | 37.6–39.2 | 121 |
| | 44.0–45.6 | 147 |
| | 45.6–47.2 | 297 |
| | 47.2–48.8 | 1389 |
| | 48.8–50.4 | 34069 |
| | 50.4–52.0 | 107922 |
| TTRol20_808 (°C) | 40.8–42.4 | 116 |
| | 42.4–44.0 | 246 |
| | 44.0–45.6 | 4629 |
| | 45.6–47.2 | 91690 |
| | 47.2–48.8 | 47010 |
| | 48.8–50.4 | 71 |
| PTRol20_803 (kg/cm²) | 3.7–4.0 | 88 |
| | 4.0–4.3 | 44524 |
| | 4.3–4.6 | 97926 |
| | 4.6–4.9 | 137 |
| | 4.9–5.2 | 48 |
| | 5.5–5.8 | 1297 |
| FTRol20_801 (m³/hr) | 28.6–30.4 | 45 |
| | 30.4–32.2 | 34535 |
| | 32.2–34.1 | 105775 |
| | 34.1–35.9 | 2346 |
| | 35.9–37.7 | 919 |
| | 37.7–39.5 | 435 |

Table 3 Test data set class summary

| Variable | Class | Quantity |
|---|---|---|
| TTRol20_801 (°C) | 37.6–39.2 | 70 |
| | 44.0–45.6 | 76 |
| | 45.6–47.2 | 142 |
| | 47.2–48.8 | 685 |
| | 48.8–50.4 | 14920 |
| | 50.4–52.0 | 45794 |
| TTRol20_808 (°C) | 40.8–42.4 | 52 |
| | 42.4–44.0 | 121 |
| | 44.0–45.6 | 2038 |
| | 45.6–47.2 | 20032 |
| | 47.2–48.8 | 39318 |
| | 48.8–50.4 | 71 |
| PTRol20_803 (kg/cm²) | 3.7–4.0 | 49 |
| | 4.0–4.3 | 18949 |
| | 4.3–4.6 | 42054 |
| | 4.6–4.9 | 87 |
| | 4.9–5.2 | 18 |
| | 5.5–5.8 | 578 |
| FTRol20_801 (m³/hr) | 28.6–30.4 | 29 |
| | 30.4–32.2 | 14717 |
| | 32.2–34.1 | 45367 |
| | 34.1–35.9 | 1028 |
| | 35.9–37.7 | 381 |
| | 37.7–39.5 | 230 |

The models were implemented in R version 3.6.3 with the package 'e1071' [9] using the training data set. All model parameters were set to their default values as follows:

Default Naive Bayes parameters:
• laplace (positive double controlling Laplace smoothing): 0 (disables Laplace smoothing)
• na.action (a function specifying the action to be taken if NA values are found): the default action is not to count NA values when computing the probability factors

Default SVM parameters:
• type: C-classification
• kernel: linear
• gamma: 1/(data dimension)
• cost of constraints violation (cost): 1
• tolerance of termination criterion (tolerance): 0.001
• epsilon in the insensitive-loss function (epsilon): 0.1

We used the predictions made by the models on the test data set to evaluate their performance for each of the variables.
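The data preparation described in this section, slicing a measurement range into 25 equal-width bins and partitioning the tuples 70:30, can be sketched as follows. This is an illustration in Python, not the R code used in the study. The bin limits are not stated in the text, so the `low` and `high` arguments are assumptions, as are the function names and the fixed random seed.

```python
import random


def to_bin(value, low, high, n_bins=25):
    """Map a reading to a class index 0..n_bins-1 over equal-width bins."""
    if not low <= value <= high:
        raise ValueError("value outside the assumed measurement range")
    width = (high - low) / n_bins
    # The upper limit itself is placed in the last bin.
    return min(int((value - low) / width), n_bins - 1)


def bin_label(index, low, high, n_bins=25):
    """Human-readable class label such as '37.6-39.2' for a bin index."""
    width = (high - low) / n_bins
    return f"{low + index * width:.1f}-{low + (index + 1) * width:.1f}"


def split_train_test(rows, train_fraction=0.7, seed=0):
    """Shuffle a copy of the tuples and split them into train and test sets."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]
```

A purely random split keeps roughly the same class proportions in both parts only on average, so with classes as imbalanced as those in Tables 2 and 3 a stratified split may be worth considering.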
6 Model Evaluation
The performance of the models was evaluated using the confusion matrix function in the R package 'caret' [10]. The evaluation is based on the values of total positives (P), true positives (TP), total negatives (N), true negatives (TN), false positives (FP) and false negatives (FN). The formulas are given in Table 4.

Table 4 Model metrics

| Sl. No. | Model metric | Description | Derivation |
|---|---|---|---|
| 1 | Global Accuracy/Recognition rate | Percentage of test data tuples that are correctly classified | (TP + TN) / (P + N) × 100 |
| 2 | No Information Rate/Prevalence | The largest class percentage in the data | P / (P + N) |
| 3 | Global Error Rate | Percentage of test data tuples that are wrongly classified | (FP + FN) / (P + N) × 100 |
| 4 | Sensitivity/Recall/True Positive Rate (TPR) | True positive (recognition) rate, measure of completeness | TP / P |
| 5 | Specificity/Selectivity/True Negative Rate (TNR) | True negative (recognition) rate, measure of completeness | TN / N |
| 6 | Precision/Positive Prediction Value (PPV) | When the prediction is positive, how often it is correct (exactness) | TP / (TP + FP) |
| 7 | Negative Predicted Value (NPV) | When the prediction is negative, how often it is correct | TN / (TN + FN) |
| 8 | Miss Rate/False Negative Rate (FNR) | Probability of missing a genuine class | 1 − Sensitivity |
| 9 | Fallout/False Positive Rate (FPR) | Probability of a false positive prediction | 1 − Specificity |
| 10 | Balanced Accuracy | Average of sensitivity and specificity | (Sensitivity + Specificity) / 2 |
| 11 | F1-Score | Harmonic mean of precision and recall | 2 × Precision × Recall / (Precision + Recall) |

The best results obtained for the most prevalent class are given in Tables 5 and 6. The summary metrics indicate that:
1. The global accuracy of all the models is uniform at around 75%, except for the oil inlet pressure PTRol20_803, for which the accuracy of both models is around 80%.
2. Both models exhibit a prevalence (no information rate) for the parameters TTRol20_801 and FTRol20_801 that is high compared with the accuracy, indicating that the model predictions are no better than always predicting the most prevalent class.

The other quality metrics for the most prevalent class of each parameter are tabulated in Table 6. These metrics indicate that:
1. The performance of both models is similar for the parameter TTRol20_808.
2. The naïve Bayesian classifier has higher sensitivity than SVM for the parameter FTRol20_801.
3. The naïve Bayesian classifier has poor selectivity for the parameter FTRol20_801. Similarly, it is less precise than SVM for this parameter.
4. Both models show a precision of one for TTRol20_801.
5. Both models have very low precision in predicting negative classes for TTRol20_801.
6. The SVM model shows zero type-1 error (false positive rate) for TTRol20_801 and a near-zero rate (0.04) for FTRol20_801, whereas the naïve Bayesian classifier has a near-zero rate for TTRol20_801.
7. The balanced accuracy for the prevalent class is higher for SVM than for naïve Bayes for most parameters. The balanced accuracy values lie in the range of about 0.74 to 0.87, except for FTRol20_801, where the naïve Bayesian classifier has the lowest value of 0.665.
8. The F1 scores of the models are consistent and similar, in the range 0.82 to 0.86 for all the parameters.
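The derivations in Table 4 can be checked with a short sketch that computes every metric from the four confusion matrix counts of one class treated as positive. It is written in Python for illustration and is not the 'caret' implementation. The function name and dictionary keys are hypothetical, and the rates are returned as fractions rather than percentages.

```python
def class_metrics(tp, fp, tn, fn):
    """Compute the Table 4 metrics for one class treated as 'positive'.

    tp, fp, tn, fn are confusion matrix counts. The function raises
    ZeroDivisionError if a denominator is zero, for example when no
    tuple is predicted positive.
    """
    p = tp + fn  # total positives
    n = tn + fp  # total negatives
    sensitivity = tp / p
    specificity = tn / n
    precision = tp / (tp + fp)
    return {
        "accuracy": (tp + tn) / (p + n),
        "prevalence": p / (p + n),
        "error_rate": (fp + fn) / (p + n),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "npv": tn / (tn + fn),
        "miss_rate": 1 - sensitivity,   # false negative rate
        "fallout": 1 - specificity,     # false positive rate
        "balanced_accuracy": (sensitivity + specificity) / 2,
        "f1": 2 * precision * sensitivity / (precision + sensitivity),
    }
```

Comparing `accuracy` with `prevalence` shows whether a model beats the trivial rule of always predicting the most prevalent class, which is the comparison made for TTRol20_801 and FTRol20_801 above.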
With the above evaluation metrics, we can conclude that both models perform similarly in this study, except for a few cases in TTRol20_801 and FTRol20_801, where the no information rate is also higher than the global accuracy. To compare the models on a comprehensive scale, the receiver operating characteristic (ROC) curve is plotted and the area under the curve (AUC) is calculated using the R package 'pROC' [11]. The ROC curve is a popular model comparison tool based on two major evaluation metrics, sensitivity and specificity. The area under the ROC curve indicates the model's capability for accurate prediction. The AUC values for the models in this study are given in Table 7. The table shows that the AUC of the SVM model is marginally higher than that of the naïve Bayesian model for every parameter except PTRol20_803. Based on these results, we conclude that both models perform nearly identically on this data set.

Table 5 Summary metrics values

| Sl. No. | Variable | Model | Accuracy | Confidence interval | No Information Rate | Global Error Rate |
|---|---|---|---|---|---|---|
| 1 | TTRol20_808 | Naïve Bayesian | 0.7658 | (0.7625, 0.7692) | 0.6785 | 0.2342 |
| | | SVM | 0.7642 | (0.7609, 0.7676) | 0.6814 | 0.2358 |
| 2 | TTRol20_801 | Naïve Bayesian | 0.7479 | (0.7445, 0.7514) | 0.9948 | 0.2521 |
| | | SVM | 0.7463 | (0.7428, 0.7497) | 0.9937 | 0.2537 |
| 3 | FTRol20_801 | Naïve Bayesian | 0.7344 | (0.7309, 0.7379) | 0.9703 | 0.2656 |
| | | SVM | 0.7423 | (0.7388, 0.7457) | 0.987 | 0.2576 |
| 4 | PTRol20_803 | Naïve Bayesian | 0.8082 | (0.8051, 0.8113) | 0.6592 | 0.1918 |
| | | SVM | 0.8077 | (0.8045, 0.8108) | 0.6599 | 0.1923 |
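The AUC has a simple rank interpretation that can be sketched without plotting the ROC curve: it equals the probability that a randomly chosen positive tuple receives a higher score than a randomly chosen negative one, with ties counted as one half. The sketch below is a pure-Python illustration of that identity and is not the 'pROC' implementation. Treating the most prevalent class as positive and all others as negative is an assumption about how a multi-class problem might be reduced to this form.

```python
def auc(scores, is_positive):
    """Area under the ROC curve from the pairwise ranking of scores.

    scores      : model score for the positive class, one per tuple
    is_positive : True where the tuple truly belongs to the positive class
    """
    pos = [s for s, y in zip(scores, is_positive) if y]
    neg = [s for s, y in zip(scores, is_positive) if not y]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5  # ties count as half a correct ranking
    return wins / (len(pos) * len(neg))
```

The double loop is quadratic in the number of tuples, so for data sets of the size used in this study a sort-based computation would be preferable in practice.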
Table 6 Class metrics values

| Model metric | Model | TTRol20_808 | TTRol20_801 | FTRol20_801 | PTRol20_803 |
|---|---|---|---|---|---|
| Sensitivity/Recall/True Positive Rate (TPR) | Naïve Bayesian | 0.7991 | 0.74817 | 0.8131 | 0.8713 |
| | SVM | 0.7963 | 0.74617 | 0.7437 | 0.8705 |
| Specificity/Selectivity/True Negative Rate (TNR) | Naïve Bayesian | 0.7065 | 0.99687 | 0.51680 | 0.6896 |
| | SVM | 0.7048 | 1 | 0.96 | 0.6869 |
| Positive Prediction Value (PPV)/Precision | Naïve Bayesian | 0.8518 | 0.99998 | 0.84950 | 0.8445 |
| | SVM | 0.8523 | 1 | 0.99929 | 0.8436 |
| Negative Prediction Value (NPV) | Naïve Bayesian | 0.6249 | 0.02014 | 0.45200 | 0.7347 |
| | SVM | 0.618 | 0.02418 | 0.04685 | 0.7322 |
| Miss Rate/False Negative Rate (FNR) | Naïve Bayesian | 0.2009 | 0.25183 | 0.18690 | 0.1287 |
| | SVM | 0.2037 | 0.25383 | 0.25630 | 0.1295 |
| Fall Out/False Positive Rate (FPR) | Naïve Bayesian | 0.2935 | 0.00313 | 0.48320 | 0.3104 |
| | SVM | 0.2952 | 0 | 0.04 | 0.3131 |
| Balanced Accuracy | Naïve Bayesian | 0.7413 | 0.87252 | 0.665 | 0.7804 |
| | SVM | 0.7506 | 0.87308 | 0.85185 | 0.7787 |
| F1 Score (harmonic mean of precision and recall) | Naïve Bayesian | 0.8246 | 0.85594 | 0.8309 | 0.85769 |
| | SVM | 0.82335 | 0.85464 | 0.85276 | 0.85684 |

Table 7 Area under the curve values

| Model | TTRol20_808 | TTRol20_801 | FTRol20_801 | PTRol20_803 |
|---|---|---|---|---|
| Naïve Bayesian | 0.8548 | 0.9352 | 0.918 | 0.9034 |
| SVM | 0.9008 | 0.9472 | 0.9253 | 0.8962 |

7 Conclusion
The main aim of this paper was to introduce classification models that are trained on past data recorded in the DCS and to test their capability to estimate real-time process data without treating the data as a time series. Further, an expert system could be developed around these models that takes the real-time data, analyses the key parameters and predicts their values, helping the operator to judge whether the process is running at an optimum condition with maximum efficiency. The data were collected at the most stable configuration of the process, and the data frames were built by assigning classes to ranges of values. We compared two supervised machine-learning algorithms, naïve Bayes and SVM classification, for estimating the key performance variables. We also evaluated the performance of the models on a test data set by calculating various metrics from their predictions. The results show that both models perform similarly for all the parameters, with global accuracy of about 75% (about 80% for PTRol20_803). The AUC values are in the range of 90–95%, except for the naïve Bayes model for the parameter TTRol20_808, which is at 85%. Hence, the usability of these models for prediction on real-time data under steady-state conditions cannot be neglected. Further study could examine performance with finer class levels (smaller bin sizes) and with ensemble techniques such as AdaBoost, Random Forest and bagged trees.
References
1. Flynn, D., Ritchie, J., Cregan, M.: Data mining techniques applied to power plant performance monitoring. In: IFAC Proceedings Volumes (IFAC-PapersOnline) (2005)
2. Li, J.Q., Niu, C.L., Liu, J.Z., Zhang, L.Y.: Research and application of data mining in power plant process control and optimization. In: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2006). https://doi.org/10.1007/11739685_16
3. Ogilvie, T., Swidenbank, E., Hogg, B.W.: Use of data mining techniques in the performance monitoring and optimisation of a thermal power plant. IEE Colloq. (1998). https://doi.org/10.1049/ic:19980647
4. Narasimhan, S., Rajendran: Application of data mining techniques for sensor drift analysis to optimize nuclear power plant performance. Int. J. Innov. Technol. Explor. Eng. (2019). https://doi.org/10.35940/ijitee.a9139.119119
5. Narasimhan, S.: Optimization of a process system in nuclear power plant- A data mining approach. Grenze Int. J. Eng. Technol. Spec. Issue, vol. Grenze ID:, pp. 1–11 (2020)
6. Korovkinas, K., Danėnas, P.: SVM and Naïve Bayes classification ensemble method for sentiment analysis. Balt. J. Mod. Comput. 5(4) (2017). https://doi.org/10.22364/bjmc.2017.5.4.06
7. Davidov, D., Tsur, O., Rappoport, A.: Enhanced sentiment learning using twitter hashtags and smileys. In: Coling 2010: 23rd International Conference on Computational Linguistics, Proceedings of the Conference (2010)
8. Agarwal, S.: Data mining: Data mining concepts and techniques (2014)
9. Hornik, K., Weingessel, A., Leisch, F., Meyer, D.: e1071: Misc Functions of the Department of Statistics, Probability Theory Group (Formerly: E1071), TU Wien, 1.7-3 (2019)
10. Kuhn, M.: Classification and Regression Training [R package caret version 6.0-86] (2020). Accessed 10 Jun 2020 [Online]. Available: https://cran.r-project.org/package=caret
11. Robin, X., et al.: pROC: An open-source package for R and S+ to analyze and compare ROC curves. BMC Bioinformatics 12, 77 (2011). https://doi.org/10.1186/1471-2105-12-77
Face Recognition Using Transfer Learning on Facenet: Application to Banking Operations Gopireddy Vishnuvardhan and Vadlamani Ravi
Abstract Of late, biometric systems have become ubiquitous for user authentication and security. With the advent of AI-aided computer vision, face recognition systems are the new-age biometrics gaining attention. They are more robust than fingerprint scanners and provide a contactless experience to users. In this paper, we discuss the applications of face recognition and face verification techniques in banking and finance services. We propose an efficient approach to training a face recognition model, which has potential applications in banking operations among other domains. We perform face detection using Histograms of Oriented Gradients (HOG). Our approach applies transfer learning to the state-of-the-art face recognition model Facenet to extract face embeddings, and uses a variant of Nearest Neighbors (NN) to label the face. It requires neither large datasets nor powerful GPU computation to train the model. We experimented on the Georgia Tech Face Database (GTFD) and achieved an accuracy of 96.67%, which is very close to human vision (97.53%) and a significant improvement over other approaches. Keywords Face recognition · Facenet · Face detection · Histograms of oriented gradients · Transfer learning · Nearest neighbors
G. Vishnuvardhan · V. Ravi (B) Center of Excellence in Analytics, Institute for Development and Research in Banking Technology, Castle Hills Road #1, Masab Tank, 500057 Hyderabad, India e-mail: [email protected] G. Vishnuvardhan e-mail: [email protected] G. Vishnuvardhan School of Computer and Information Sciences, University of Hyderabad, 500046 Hyderabad, India © The Author(s), under exclusive license to Springer Nature Switzerland AG 2021 V. K. Gunjan and J. M. Zurada et al. (eds.), Modern Approaches in Machine Learning and Cognitive Science: A Walkthrough, Studies in Computational Intelligence 956, https://doi.org/10.1007/978-3-030-68291-0_23
1 Introduction
With authentication and security being major concerns of any organization, biometrics are becoming an important and common part of our lives. Biometric technology, which is studied in detail in [1], is gaining more attention. Early biometric systems involved fingerprint scanners. With the recent advances in Artificial Intelligence (AI), people have started using Face Recognition (FR) systems as biometrics [2]. A robust FR system deployed on a Single Board Computer (SBC) can replace the existing biometric fingerprint scanners. One of the advantages of face recognition is that it gives a contactless biometric experience to the user. Some of the important characteristics of a good face recognition system include:
1. Face recognition should work on real-time video from a live camera without any additional actions from the user [3].
2. Since the performance of face recognition depends largely on face detection, a good face detection technique is desirable [4].
3. It should also perform liveness detection and be tolerant to spoofing and other attacks [5].
4. It should perform well even in poor lighting conditions and with tilted faces [6].
Face recognition is a computer vision problem with many potential applications across various domains. It is challenging, and it is different from face verification. Face recognition matches the given face of a person against the existing faces in a database. It is a one-to-many mapping, whereas face verification is a one-to-one mapping that validates a face against a claimed name.

We now briefly survey the literature on FR. In the 1990s, researchers identified many potential applications for FR and began to work on this challenging problem. The literature on FR is categorized into four types of learning [7]: (i) holistic learning, (ii) local handcrafted-feature learning, (iii) shallow learning and (iv) deep learning. The holistic approach derives low-dimensional representations of a face image. Turk [8] obtained lower dimensions with the eigenfaces approach. The eigenfaces are computed by applying Principal Component Analysis (PCA) [9] to the given faces. Belhumeur [10] achieved lower dimensions with the Fisher linear discriminant (FLD). He [11] suggested a Laplacian approach to obtain lower dimensions of a face. Maffiri [12] extracted PCA features based on the L1 and L2 norms and wavelets for FR. In local handcrafted-feature methods, local features are extracted from a given face and supervised learning is performed on the extracted features. Štruc [13] proposed Gabor filters to extract features from a given face. Ahonen [14] extracted Local Binary Pattern (LBP) histograms from a face. Kekre [15] extracted features with the Discrete Cosine Transform (DCT) and the Walsh transform.
Before the rise of deep neural networks, there were some shallow learning approaches to face recognition. Chan [16] proposed PCANet for image classification, along with two variations, RandNet and LDANet. However, these methods fail to learn highly non-linear face representations. Later, in 2012, Krizhevsky [17] proposed AlexNet, a deep CNN that reported better results by a huge margin. From then on, researchers in the FR community started to study deep face recognition. Sun [18] proposed face recognition with a hybrid of CNNs and Restricted Boltzmann Machines (RBMs). Taigman [19] introduced DeepFace for face recognition. Later, Schroff [20] proposed the state-of-the-art face recognition model Facenet, trained on 6M images, which achieved an accuracy of 97.35%, very close to human performance (97.53%).

Apart from FR as biometrics, there are numerous applications of face recognition across various domains. Here are some of them related to the banking and finance sector.
1. Facial payment systems are next-generation payment systems. Users need not carry physical currency, cards or a mobile phone. At a Point Of Sale (POS) machine, the user can simply smile to make a payment. With robust face recognition, this is the safest mode of payment, with fewer fraudulent transactions.
2. A face verification system embedded into Automated Teller Machines (ATMs) along with the Personal Identification Number (PIN) can provide more security to users. Banks can reduce fraudulent withdrawals from ATMs.
3. A face recognition system installed in banks can remove the need for customers to use passbooks. This can improve authentication in bank transactions. Banks can also keep a record of customer visits, which they can use for analytics and to improve their business and customer relations.
4. Although mobile banking applications include passwords, PINs and captchas to provide security, there are still instances of unauthenticated logins. A face verification system provides another level of security in mobile banking applications and can reduce unauthenticated logins.
The motivation for the present research is to explore the various applications of face recognition and face verification in the banking sector, to address challenging problems of face recognition such as training with limited computational resources and limited training data, and to introduce a better feature extraction technique for face recognition on GTFD. The contributions of this paper include a method to build a face recognition system without any high-performance GPU system or massive datasets. Our approach to face recognition does not require any additional training when a new face is added to the FR system. Our approach to feature extraction is better than the others compared. We also provide a use case for secure banking operations. The rest of the paper is organized as follows: Sect. 2 presents the background knowledge; Sect. 3 presents a detailed description of our proposed model; Sect. 4 presents the dataset description and evaluation metrics; Sect. 5 presents a discussion of the results; and Sect. 6 presents the conclusions and future directions.
G. Vishnuvardhan and V. Ravi
2 Background
2.1 Histogram of Oriented Gradients (HOG)
Dalal and Triggs [21] proposed the Histogram of Oriented Gradients (HOG) technique for human detection. The first step in HOG is to divide an image into grid cells of size 8 × 8. HOG features are calculated for each grid cell from the gradient magnitude and gradient angle at every pixel, both of which are computed from the horizontal and vertical gradients with respect to the adjacent pixel values. In this way the gradient vector (magnitude and angle) is computed for all 64 pixels in a grid cell. The next step is to transform these 128 values (64 magnitudes and 64 angles) into a 9-dimensional vector with histogram binning. This vector represents the magnitude and direction of the edges present in the patch. The vectors of all grid cells together form the HOG features, and a Support Vector Machine model classifies faces from these features.
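The histogram binning step for a single 8 × 8 grid cell can be sketched as follows. This is a minimal illustration, not the implementation used in the paper: the function name `cell_histogram`, the central-difference gradients with replicated borders, the unsigned angle range of 0 to 180 degrees, and the hard assignment of each pixel to one bin are all assumptions made for brevity (Dalal and Triggs split each vote between neighbouring bins).

```python
import math

def cell_histogram(cell, n_bins=9):
    """Return the n_bins orientation histogram of one grayscale grid cell.

    `cell` is a list of rows of pixel intensities. Gradients use central
    differences with replicated border pixels (an assumption of this sketch).
    """
    height, width = len(cell), len(cell[0])
    bin_width = 180.0 / n_bins  # unsigned angles in [0, 180)
    hist = [0.0] * n_bins
    for y in range(height):
        for x in range(width):
            gx = cell[y][min(x + 1, width - 1)] - cell[y][max(x - 1, 0)]
            gy = cell[min(y + 1, height - 1)][x] - cell[max(y - 1, 0)][x]
            magnitude = math.hypot(gx, gy)
            angle = math.degrees(math.atan2(gy, gx)) % 180.0
            # Hard assignment: the whole magnitude votes for a single bin.
            hist[int(angle // bin_width) % n_bins] += magnitude
    return hist

# Toy cell with a vertical edge: left half dark, right half bright.
cell = [[0] * 4 + [255] * 4 for _ in range(8)]
hist = cell_histogram(cell)
```

For this toy cell all gradient energy falls into the first bin (horizontal gradient direction), which is the behaviour one would expect for a vertical edge.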
2.2 Transfer Learning
Transfer learning [22] is a learning methodology for deep neural networks in which the knowledge gained while solving one problem is applied to other related or different problems. It is one of the most widely chosen techniques for training deep neural networks. The performance of neural networks generally improves as the depth of the network increases, but training deeper networks is a challenging task that requires larger datasets and powerful GPU computation. Transfer learning overcomes these challenges and makes training easier and more efficient.
2.3 Triplet Loss and Facenet
Facenet [20] is a popular face recognition neural network from Google AI. With an accuracy of over 97% on Labeled Faces in the Wild (LFW), it is a state-of-the-art face recognition algorithm. Facenet is trained with the triplet loss function. Each training batch consists of triplets: a baseline image, a positive image (the same person as the baseline), and a negative image (a different person). The triplet loss function minimizes the distance between the baseline and the positive image while maximizing the distance between the baseline and the negative image.
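The triplet loss described above can be written as max(d(baseline, positive) − d(baseline, negative) + margin, 0), where d is the squared Euclidean distance between embeddings. The sketch below is illustrative only: the two-dimensional toy embeddings and the margin of 0.2 are assumptions, and a real system would apply the loss to the network's 128-dimensional outputs.

```python
def squared_distance(u, v):
    """Squared Euclidean distance between two embedding vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

def triplet_loss(baseline, positive, negative, margin=0.2):
    """Zero once the negative is farther than the positive by at least `margin`."""
    return max(squared_distance(baseline, positive)
               - squared_distance(baseline, negative) + margin, 0.0)

# Toy 2-D embeddings (hypothetical values).
baseline = [0.0, 1.0]
positive = [0.0, 0.9]   # same person: close to the baseline
negative = [1.0, 0.0]   # different person: far from the baseline

easy = triplet_loss(baseline, positive, negative)   # constraint already satisfied
hard = triplet_loss(baseline, negative, positive)   # roles swapped: large loss
```

A triplet that already satisfies the margin contributes no loss, so training effort concentrates on triplets in which the negative is still too close to the baseline.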
3 Proposed Methodology
We performed face recognition in two steps: face detection followed by face recognition. Face detection gives the coordinates of all faces in a given image. The performance of any face recognition system depends mainly on the detection algorithm; hence, it is the most crucial step. We used the Histogram of Oriented Gradients (HOG) method for face detection. HOG object detection involves extracting HOG features from a region and using a support vector machine (SVM) to classify the object in the region. The scaling problem in face detection is solved by performing HOG detection on the image at different scales. Non-Maximum Suppression (NMS) removes the multiple bounding boxes that fall on a single object. The next step is face recognition. The cropped face is passed to a trained Facenet to extract the features. These features are 128-dimensional vectors, also called embeddings. Since the Facenet model used to extract the features was trained with the triplet loss function, the embeddings of the same person are closer to each other than the embeddings of different persons. In the training phase, we stored all the embeddings in a knowledge base along with the person ID. A good face recognition system should distinguish unknown persons from the persons in the database. To ensure this, we performed a tolerance test and then applied KNN with the Euclidean distance (ED). At inference, we performed a kind of Nearest Neighbors (NN) search with majority voting. For a given inference image, we obtained the face embedding from Facenet and compared it with all the available embeddings in the knowledge base, using the ED for the comparison. If the ED is less than the tolerance limit (TL), we consider the stored embedding a neighbor. In the end, we performed majority voting over the available neighbors. The inference procedure is depicted in Fig. 1.
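The tolerance test followed by majority voting can be sketched as below. This is a minimal illustration under stated assumptions: the function name `recognize`, the person IDs, the two-dimensional toy embeddings and the tolerance value of 0.3 are hypothetical, and the real system compares 128-dimensional Facenet embeddings.

```python
import math
from collections import Counter

def recognize(query, knowledge_base, tolerance):
    """Return the majority person ID among stored embeddings whose Euclidean
    distance to `query` is below `tolerance`, or None for an unknown person."""
    neighbors = [person_id for person_id, embedding in knowledge_base
                 if math.dist(query, embedding) < tolerance]
    if not neighbors:
        return None  # no stored embedding is close enough
    return Counter(neighbors).most_common(1)[0][0]

# Hypothetical knowledge base of (person ID, embedding) pairs.
knowledge_base = [
    ("alice", [0.00, 0.00]),
    ("alice", [0.10, 0.00]),
    ("bob", [1.00, 1.00]),
    ("bob", [0.20, 0.10]),
]

known = recognize([0.05, 0.00], knowledge_base, tolerance=0.3)
unknown = recognize([5.00, 5.00], knowledge_base, tolerance=0.3)
```

Because the knowledge base is only a list of stored embeddings, enrolling a new person amounts to appending that person's embeddings, which matches the claim that no additional training is needed when a face is added.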
4 Dataset Description
We used the Georgia Tech Face Database (GTFD) to compare the performance of our approach with other approaches. GTFD consists of 750 images of 50 persons, with 15 faces per person captured from different angles. The dataset has color images of size 640 × 480 with cluttered backgrounds. The persons are photographed at different tilt angles and under different lighting conditions. All the cropped images of one subject are depicted in Fig. 2. We split the dataset into train and test sets in an 80:20 ratio, so that 600 images are in the train set and 150 fall into the test set. We performed stratified random sampling so that three images per person fall into the test set.
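A stratified split of this kind can be sketched as follows. The helper name `stratified_split`, the fixed random seed and the use of integer labels in place of the actual GTFD image files are assumptions made for illustration.

```python
import random

def stratified_split(labels, test_per_class, seed=0):
    """Split sample indices so that every class contributes exactly
    `test_per_class` samples to the test set."""
    rng = random.Random(seed)
    by_class = {}
    for index, label in enumerate(labels):
        by_class.setdefault(label, []).append(index)
    train, test = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        test.extend(indices[:test_per_class])
        train.extend(indices[test_per_class:])
    return train, test

# 50 persons with 15 images each, as in GTFD.
labels = [person for person in range(50) for _ in range(15)]
train_idx, test_idx = stratified_split(labels, test_per_class=3)
```

With 15 images per person, holding out three per person gives the 600/150 split reported above while keeping every person represented in both sets.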
Fig. 1 (flowchart; labels recovered from the figure): Image, Face Detection (HOG), Cropped Face, Trained Facenet, 128-D Embedding, Knowledge Base, decision (If / No: Discard)
D 0, or 2 x=−1
a minimum occurs. The maximum value of the function is the limiting value 1 (as x → ∞), while the minimum value, attained at x = −1, is σ5(−1) = −1/8. Since σ5(x) → 0 as x → −∞, σ5(x) → 1 as x → ∞, and σ5(−1) is negative, the property is established. An examination of the behavior of σ5(x) reveals that it is monotonically decreasing in the region x ∈ (−∞, −1) and monotonically increasing in the region x ∈ (−1, ∞). We have also established the following:
Property 5 The function σ5(x) has the range [−1/8, 1).
Fig. 1 shows the variation of the activation functions and their derivatives. From Fig. 1(b) it is seen that the first four activation functions (σ1–σ4) have symmetric and non-negative derivatives, while the derivative of the activation function σ5 is neither symmetric about the y-axis nor non-negative (which is to be expected, as the function is decreasing in the region (−∞, −1)).
3 Tasks and Experiment Design
3.1 Benchmark Tasks
The following five benchmark tasks are taken from the UCI repository of machine learning databases [3]:
A. Mishra et al.
Fig. 1 (a) Variation of activation functions, and (b) of activation function derivative (x axis from −15 to 15)
1. Airfoil Self-Noise (ASN) Data Set: The data set has 1503 records with 5 input dimensions and 1 output dimension. 1202 records are randomly selected to create the training data set, while the remaining 301 records constitute the test data set. For details regarding the characteristics of the data set, see [4].
2. Combined Cycle Power Plant (CCP) Data Set: The data set has 9568 records with 4 input dimensions and 1 output dimension. 7654 records are randomly selected to create the training data set, while the remaining 1914 records constitute the test data set. For details regarding the characteristics of the data set, see [5].
3. Concrete Compressive Strength (CCS) Data Set: The data set has 1030 records with 8 input dimensions and 1 output dimension. 824 records are randomly selected to create the training data set, while the remaining 206 records constitute the test data set. For details regarding the characteristics of the data set, see [6].
4. Energy Efficiency (EE) Data Set: The data set has 768 records with 8 input dimensions and 2 output dimensions. 614 records are randomly selected to create the training data set, while the remaining 154 records constitute the test data set. For details regarding the characteristics of the data set, see [7].
5. Condition Based Maintenance of Naval Propulsion Plants (NPP) Data Set: The data set has 11934 records with 16 input dimensions and 2 output dimensions. 9547 records are randomly selected to create the training data set, while the remaining 2387 records constitute the test data set. For details regarding the characteristics of the data set, see [8].
The division of the data set of exemplars into training data set and test data set is in the ratio of 80:20.
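The reported training and test set sizes are consistent with a random 80:20 split in which the training share is rounded down. The sketch below reproduces the sizes from the record counts given above; the helper name `split_80_20`, the fixed seed and the round-down rule are assumptions, since the text does not state how fractional records were handled.

```python
import random

def split_80_20(n_records, seed=0):
    """Randomly split record indices 80:20, rounding the training share down."""
    indices = list(range(n_records))
    random.Random(seed).shuffle(indices)
    n_train = (4 * n_records) // 5  # integer arithmetic avoids float rounding
    return indices[:n_train], indices[n_train:]

# Record counts as listed for the five benchmark tasks.
records = {"ASN": 1503, "CCP": 9568, "CCS": 1030, "EE": 768, "NPP": 11934}
split_sizes = {task: tuple(len(part) for part in split_80_20(n))
               for task, n in records.items()}
```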
A Non-monotonic Activation Function …
4 Experiment Design
For the conduct of the experiments, the architecture of the feed-forward neural networks has to be decided first. Since the universal approximation results for these networks require at least one hidden layer, the number of hidden layers is kept at one. To decide the size of the hidden layer (the number of nodes in the hidden layer), exploratory experiments were conducted for each task in which the number of hidden layer nodes was varied from 2 to 100. These exploratory experiments used the log-sigmoid activation function at the hidden layer nodes. The smallest layer that gave a satisfactory training error in 100 epochs of training on the training data set of a task was chosen as the architecture to be used. The architectural summary of the networks is presented in Table 1. For each learning task, an ensemble of 50 sets of initial weights and thresholds is created. Since five activation functions are used at the hidden layer nodes for each task, and there are five tasks, a total of 50 × 5 × 5 = 1250 networks are trained. The initial weights and thresholds are drawn from uniformly distributed random numbers in the range (−1, 1). The experiments are conducted in Matlab 2018b on the Microsoft Windows 10 Professional operating system, on an Intel i5 CPU with 8 GB RAM. Thus, for each task, we associate an ensemble of 50 initialized networks with each activation function (the activation function being used at the hidden layer nodes). For each of these networks, the training algorithm used in the network training phase is the improved version of the resilient backpropagation algorithm [9] with weight backtracking (iRPROP+), as proposed in [10].
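The iRPROP+ update rule can be sketched as below. This is an illustrative stand-alone version, not the Matlab code used in the experiments: it minimizes a toy two-parameter quadratic rather than a network error, and the hyperparameter values (increase factor 1.2, decrease factor 0.5, initial step 0.1, step bounds 1e-6 and 50) are conventional choices assumed here rather than values taken from the text.

```python
def sign(x):
    return (x > 0) - (x < 0)

def irprop_plus(loss, grad, w, epochs=200, eta_plus=1.2, eta_minus=0.5,
                step_init=0.1, step_min=1e-6, step_max=50.0):
    """Minimize `loss` with per-weight adaptive steps and weight backtracking."""
    w = list(w)
    n = len(w)
    step = [step_init] * n
    prev_grad = [0.0] * n
    prev_dw = [0.0] * n
    prev_loss = float("inf")
    for _ in range(epochs):
        g = list(grad(w))
        e = loss(w)
        for i in range(n):
            product = prev_grad[i] * g[i]
            if product > 0:
                # Same gradient sign as before: grow the step and move.
                step[i] = min(step[i] * eta_plus, step_max)
                prev_dw[i] = -sign(g[i]) * step[i]
                w[i] += prev_dw[i]
            elif product < 0:
                # Sign change: shrink the step; undo the last move only
                # if the overall error has increased (the "+" variant).
                step[i] = max(step[i] * eta_minus, step_min)
                if e > prev_loss:
                    w[i] -= prev_dw[i]
                g[i] = 0.0  # forces the plain update branch next epoch
            else:
                prev_dw[i] = -sign(g[i]) * step[i]
                w[i] += prev_dw[i]
            prev_grad[i] = g[i]
        prev_loss = e
    return w

# Toy objective with its minimum at (3, -1).
def loss(w):
    return (w[0] - 3.0) ** 2 + 10.0 * (w[1] + 1.0) ** 2

def grad(w):
    return [2.0 * (w[0] - 3.0), 20.0 * (w[1] + 1.0)]

solution = irprop_plus(loss, grad, [0.0, 0.0])
```

Only the sign of each partial derivative is used, so the very different curvatures of the two coordinates in the toy objective do not require any tuning of a global learning rate.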
Table 1 Architectural summary of the feed-forward artificial neural networks
Task  Inputs  Hidden Nodes  Outputs  Training Set Size  Test Set Size
ASN   5       25            1        1202               301
CCP   4       45            1        7654               1914
CCS   8       8             1        824                206
EE    8       35            2        614                154
NPP   16      95            2        9547               2387

The iRPROP+ algorithm has been demonstrated to be better than other variants of the back-propagation algorithm such as quickprop, and than second-order methods such as BFGS, conjugate gradient methods (including scaled conjugate gradient) and the Levenberg-Marquardt method [10, 11]. The networks are trained for 2000 epochs. The generalization capability (the error measured on the test data set) of the trained feed-forward artificial neural networks defines the quality of the network achieved for a task. The lower the generalization error, the better the trained network. To characterize the generalization error, we report the mean of the mean squared errors (MMSE) for each ensemble identified by the activation function used, for each task. Similarly, we also report the standard deviation (STD) of these mean squared errors. Since the median is considered to be a more robust estimator of the central tendency of any data [12], we also report the median of the mean squared (generalization) errors (MeMSE). For comparison of the relative performance of the different activation functions, we also create a comparison matrix in which the rows and columns are labelled by the activation functions. Every element of the matrix shows in how many tasks the networks using the row label activation function achieved lower MMSE or MeMSE values than the networks using the column label activation function. The row sum (column sum) of the matrix elements gives the total number of cases in which the row label (column label) activation function was better (worse). If the comparison matrix is made on the basis of the values of the MMSE or MeMSE, we have an assessment of the relative efficacy and efficiency of the activation functions. If the comparison matrix is made on the basis of the one-sided Student's t-test (for MMSE) [13] or the Wilcoxon rank-sum test (for MeMSE) [13], then the entries of the comparison matrix represent in how many cases/tasks the row label activation function performed better than the column label activation function with a statistically significant difference in the MMSE or the MeMSE. The tests of hypotheses were carried out at a significance level of α = 0.05.
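The construction of the value-based comparison matrix can be sketched as follows, using the MMSE values reported in Table 2. The function name `comparison_matrix` and the dictionary layout are assumptions made for illustration; the numbers themselves are copied from Table 2.

```python
# MMSE (x 1e-3) per task for sigma1..sigma5, copied from Table 2.
mmse = {
    "ASN": [33.31888, 34.67144, 33.64792, 33.93929, 29.71525],
    "CCP": [11.83476, 11.76419, 11.34546, 11.48920, 11.14864],
    "CCS": [21.00206, 23.97788, 22.44438, 23.55736, 20.75705],
    "EE":  [2.34412, 2.63844, 2.54609, 2.34486, 1.93008],
    "NPP": [29.54724, 10.75668, 10.77826, 10.42107, 9.01156],
}

def comparison_matrix(errors, n_functions=5):
    """Entry [row][col] counts the tasks in which the row activation
    function achieved a lower error than the column activation function."""
    matrix = [[0] * n_functions for _ in range(n_functions)]
    for values in errors.values():
        for row in range(n_functions):
            for col in range(n_functions):
                if values[row] < values[col]:
                    matrix[row][col] += 1
    return matrix

matrix = comparison_matrix(mmse)
row_sums = [sum(row) for row in matrix]        # times each function was better
col_sums = [sum(col) for col in zip(*matrix)]  # times each function was worse
```

The resulting row and column sums agree with those listed in Table 3, which is a useful consistency check on the tabulated values.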
5 Results and Discussions
The summary of the generalization errors is shown in Table 2. The minimum MMSE/MeMSE achieved for each task is highlighted in this table (marked with *). Tables 3 and 4 show the comparison matrices on the basis of the MMSE values and the MeMSE values achieved, respectively. The best performing activation function is also highlighted in these tables. Table 5 presents the comparison matrix based on the comparison of the means of the (mean squared) errors using the one-sided Student's t-test, showing in how many cases/tasks the networks using the row label activation function achieve a statistically significantly lower mean error than the networks using the column label activation function, at a significance level of α = 0.05. Similarly, Table 6 presents the comparison matrix based on the comparison of the medians (of the mean squared errors) using the one-sided Wilcoxon rank-sum test, showing in how many cases/tasks the networks using the row label activation function achieve a statistically significantly lower median error than the networks using the column label activation function, at a significance level of α = 0.05. In both Tables 5 and 6, the matrix entry for the best performing activation function is highlighted.

Table 2 Generalization error summary (all values × 10^−3)
Task  Statistic  σ1        σ2        σ3        σ4        σ5
ASN   MMSE       33.31888  34.67144  33.64792  33.93929  29.71525*
ASN   STD        2.12713   2.01997   3.36342   2.09996   3.67445
ASN   MeMSE      33.22339  34.74828  33.21022  34.06882  29.49841*
CCP   MMSE       11.83476  11.76419  11.34546  11.48920  11.14864*
CCP   STD        0.18308   0.14489   0.13688   0.14314   0.10689
CCP   MeMSE      11.85351  11.78025  11.36367  11.47429  11.14969*
CCS   MMSE       21.00206  23.97788  22.44438  23.55736  20.75705*
CCS   STD        2.05358   2.72309   2.92537   3.09862   3.06756
CCS   MeMSE      20.73807  23.58249  22.11885  22.85954  20.04935*
EE    MMSE       2.34412   2.63844   2.54609   2.34486   1.93008*
EE    STD        0.26935   0.42007   1.24218   0.37366   0.31685
EE    MeMSE      2.36640   2.53293   2.31717   2.31167   1.89252*
NPP   MMSE       29.54724  10.75668  10.77826  10.42107  9.01156*
NPP   STD        4.89036   1.63243   1.59089   1.28749   1.66771
NPP   MeMSE      30.01103  10.98135  10.83320  10.30814  8.82063*

Table 3 Comparison matrix on the basis of value of MMSE
            σ1  σ2  σ3  σ4  σ5  Row Sum
σ1          00  03  03  03  00  09
σ2          02  00  01  00  00  03
σ3          02  04  00  03  00  09
σ4          02  05  02  00  00  09
σ5          05  05  05  05  00  20*
Column Sum  11  17  11  11  00

Table 4 Comparison matrix on the basis of value of MeMSE
            σ1  σ2  σ3  σ4  σ5  Row Sum
σ1          00  03  01  02  00  06
σ2          02  00  00  00  00  02
σ3          04  05  00  03  00  12
σ4          03  05  02  00  00  10
σ5          05  05  05  05  00  20*
Column Sum  14  18  08  10  00

Table 5 Comparison matrix on the basis of value of MMSE using one-sided Student's t-test at a significance level of α = 0.05
            σ1  σ2  σ3  σ4  σ5  Row Sum
σ1          00  03  01  01  00  05
σ2          02  00  00  00  00  02
σ3          02  03  00  02  00  07
σ4          02  03  00  00  00  05
σ5          04  05  05  05  00  19*
Column Sum  10  14  06  08  00

Table 6 Comparison matrix on the basis of value of MeMSE using one-sided Wilcoxon rank-sum test at a significance level of α = 0.05
            σ1  σ2  σ3  σ4  σ5  Row Sum
σ1          00  03  01  02  00  06
σ2          02  00  00  00  00  02
σ3          02  04  00  02  00  08
σ4          02  03  00  00  00  05
σ5          04  05  05  05  00  19*
Column Sum  10  15  06  09  00

From the data presented in Tables 2, 3, 4, 5 and 6, we may infer the following:
1. The networks using the non-monotone activation function (σ5) always achieve the lowest MMSE and MeMSE values (Table 2).
2. The above statement is further strengthened by Tables 3 and 4, which present the comparison matrices of results on the basis of MMSE and MeMSE values, respectively. From these values it is clearly seen that the networks using the activation function σ5 achieve the lowest MMSE values as well as the lowest MeMSE values. From Tables 3 and 4, we may also infer that the worst performance is that of the networks using the activation function σ2.
3. The above asserts that, on the basis of the comparison of MMSE and MeMSE, the performance of the networks using the activation function σ5 is the best. Tables 5 and 6 summarize the results on the statistical significance of this assertion.
4. From these tables we infer that the networks using the activation function σ5 achieve significantly lower MMSE and MeMSE values in all five tasks/cases when compared with the activation functions σ2, σ3 and σ4. The networks using the activation function σ5 are significantly better than the networks using the activation function σ1 in 4 out of the 5 tasks, and are not worse than σ1 in any task.
Thus, from these inferences we may conclude that the networks using the non-monotone function σ5 as the activation function at the hidden layer nodes outperform the networks using the other four activation functions, which are monotonically increasing functions. The usage of other non-monotone activation functions that are bounded, continuous and differentiable should therefore be explored.
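A one-sided rank-sum comparison of the kind summarized in Table 6 can be sketched as follows. This is an assumption-laden illustration rather than the procedure actually run in Matlab: it uses the large-sample normal approximation without a tie correction to the variance, the function name `rank_sum_less` is hypothetical, and the two samples are toy values because the per-network errors of the ensembles are not listed in the text.

```python
import math
from statistics import NormalDist

def rank_sum_less(x, y):
    """One-sided Wilcoxon rank-sum test of the alternative that values in x
    tend to be smaller than values in y. Returns an approximate p-value."""
    pooled = sorted((value, group)
                    for group, sample in enumerate((x, y))
                    for value in sample)
    rank_sum_x = 0.0
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        average_rank = (i + 1 + j) / 2.0  # mean of ranks i+1 .. j for ties
        rank_sum_x += average_rank * sum(1 for k in range(i, j)
                                         if pooled[k][1] == 0)
        i = j
    n1, n2 = len(x), len(y)
    mean = n1 * (n1 + n2 + 1) / 2.0
    std = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    return NormalDist().cdf((rank_sum_x - mean) / std)

# Toy error samples (hypothetical): every value in `low` is below `high`.
low = [float(v) for v in range(1, 11)]
high = [float(v) for v in range(11, 21)]
p_value = rank_sum_less(low, high)
significant = p_value < 0.05
```

An entry of a significance-based comparison matrix would then be incremented only for those tasks in which the corresponding p-value falls below α = 0.05.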
6 Conclusions
The most commonly used activation functions in the field of artificial neural networks belong to the class of sigmoidal activation functions, which are bounded, continuous, differentiable and monotonically increasing. The universal approximation results for feed-forward artificial neural networks allow the activation function to be any non-polynomial function. In this paper, the properties of a bounded, continuous, differentiable and non-monotone function are described. The efficacy and efficiency of using this non-monotone function as an activation function are demonstrated on five benchmark learning tasks. The non-monotone activation function is compared with four commonly used activation functions on the benchmark tasks. The results demonstrate that the networks using the non-monotone activation function (σ5) at the hidden layer nodes outperform the networks using the other four sigmoidal activation functions.
References
1. Pinkus, A.: Approximation theory of the MLP model in neural networks. Acta Numerica 8, 143–195 (1999)
2. Haykin, S.: Neural Networks: A Comprehensive Foundation. Prentice Hall Inc, New Jersey (1999)
3. Dua, D., Graff, C.: UCI Machine Learning Repository, School of Information and Computer Sciences, University of California, Irvine, http://archive.ics.uci.edu/ml, last accessed 20 Feb 2020
4. Lau, K.: A neural networks approach for aerofoil noise prediction. Master's thesis, Department of Aeronautics, Imperial College of Science, Technology and Medicine, London, United Kingdom (2006)
5. Tüfekci, P.: Prediction of full load electrical power output of a base load operated combined cycle power plant using machine learning methods. Int. J. Electr. Power Energy Syst. 60, 126–140 (2014)
6. Yeh, I.-C.: Modeling of strength of high performance concrete using artificial neural networks. Cem. Concr. Res. 28(12), 1797–1808 (1998)
7. Tsanas, A., Xifara, A.: Accurate quantitative estimation of energy performance of residential buildings using statistical machine learning tools. Energy Build. 49, 560–567 (2012)
8. Coraddu, A., Oneto, L., Ghio, A., Savio, S., Anguita, D., Figari, M.: Machine learning approaches for improving condition-based maintenance of naval propulsion plants. J. Eng. Marit. Environ. 230(1), 130–153 (2014)
9. Riedmiller, M.: Advanced supervised learning in multi-layer perceptrons: From backpropagation to adaptive learning algorithms. Comput. Stand. Interfaces 16(5), 265–278 (1994)
10. Igel, C., Hüsken, M.: Empirical evaluation of the improved RPROP learning algorithms. Neurocomputing 50, 105–123 (2003)
11. Sodhi, S.S., Chandra, P.: Bi-modal derivative activation function for sigmoidal feedforward networks. Neurocomputing 143, 182–196 (2014)
12. Huber, P.J.: Robust statistics. Wiley, New York (1981)
13. Hettmansperger, T., McKean, J.: Robust Nonparametric Statistical Methods, ser. Kendall's Library of Statistics: An Arnold Publication No. 5. Arnold (1998)
14. Mishra, A., Chandra, P., Ghose, U.: A new Activation function validated on function approximation tasks. Proc. Int. Conf. Comput., Inform. Netw. (Accepted) (2020)
Analysis of Approaches for Automated Glaucoma Detection and Prediction System
Upasana Mishra and Jagdish Raikwal
Abstract Glaucoma is a main cause of early and irreversible blindness. Approximately 4% of people above the age of 40 years suffer from glaucoma. It is an age-related disease that gradually damages the optic nerve and causes loss of vision. At early stages glaucoma is extremely difficult to detect, and at later stages normal vision cannot be restored once it is lost. Various automatic glaucoma detection systems are studied and analyzed in this survey paper. A systematic review of pre-processing, feature extraction and selection, machine learning approaches, and data from glaucoma-affected persons was conducted to find the best possible solution for efficiently detecting and predicting glaucoma. It is important to predict glaucoma automatically, but unfortunately not much work has been done in this area. Many existing machine learning techniques are capable of identifying about 85% of cases accurately at later stages, but by then the eyes cannot be treated to restore vision. Optical Coherence Tomography (OCT) can be utilized for predicting the development of glaucoma in an eye, and baseline peripapillary retinal nerve fiber layer (pRNFL) and macular OCT parameters can be utilized for assessing the risk of glaucoma in the near future.
Keywords Glaucoma detection · Glaucoma prediction · OCT · Feature extraction · Reinforcement learning · Machine learning · Classification
1 Introduction
In glaucoma the optic nerve is damaged, which makes the disease a serious cause of vision loss. A large number of people who have glaucoma may become totally blind for lack of proper treatment. Eye-care professionals can identify patients suffering from glaucoma by using a variety of clinical findings [1]. Glaucoma is a group of several diseases with similar characteristics. It depends on a number of parameters such as age, blood pressure, diabetes, myopia, intraocular pressure, and family history (whether any person in the family has suffered from glaucoma in the past). The invention of the ophthalmoscope helped to detect glaucomatous changes in the fundus of the eye. The later invention of the tonometer, which measures intraocular pressure, also helped in the diagnosis of glaucoma (Fig. 1). The first glaucoma operation on a human was performed in 1856 by Graefe [1]. However, evidence has shown that it is very difficult to detect glaucoma manually, as detection depends entirely on the expertise of the professional. In this paper, an attempt is made to study and analyze how to efficiently detect glaucoma, or determine its progression, in individuals as early as possible so that it can be treated and vision loss can be stopped. This can be achieved by implementing classification algorithms that accurately take into account the determinants that cause glaucoma in humans.
Fig. 1 Glaucoma Eye View
U. Mishra (B) · J. Raikwal, IET DAVV, Indore, Madhya Pradesh, India
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2021. V. K. Gunjan and J. M. Zurada et al. (eds.), Modern Approaches in Machine Learning and Cognitive Science: A Walkthrough, Studies in Computational Intelligence 956, https://doi.org/10.1007/978-3-030-68291-0_26
1.1 Glaucoma
Glaucoma is a severe disease that causes loss of sight [1]. It is a condition in which the pressure within the eyeball increases, slowly leading to vision loss. It is difficult to identify at the initial stage, and once it is detected the lost vision cannot be restored. Proper treatment can only prevent further blindness.
1.2 Glaucoma Detection System
Glaucoma is a complex condition of the eye that causes permanent blindness. Detection at an early stage is therefore important, because treatment at the initial stage can safeguard the remaining vision, although lost vision cannot be restored. No single test can diagnose it effectively. A regular eye check-up with an eye specialist includes screening for glaucoma and may indicate whether further examination is required.
An eye specialist conducts five tests for the assessment of glaucoma and then considers the patient's health history to diagnose it. The tests are as follows.
a. Tonometry: It determines the pressure within the patient's eye.
b. Optical Coherence Tomography: This scan is very important for the diagnosis of glaucoma. It is used to examine the retinal nerve fibre layer around the optic nerve, damage to which is an important sign of early glaucoma.
c. Ophthalmoscopy: The optic nerve is examined in this test. As glaucoma is a severe problem related to the optic nerve, this is a very important test. Eye drops are used to dilate the pupil of the patient's eye so that the optic nerve can be seen more clearly and signs of disease-related nerve cell loss can be found.
d. Perimetry: Glaucoma causes peripheral vision loss at the initial stage, so this test is done to detect vision loss. It is also called a visual field test. Each eye is tested separately with an automatic device that flashes lights in the periphery of the person's vision.
e. Gonioscopy: This test examines the drainage angle through which the intraocular fluid flows out. Fluid is constantly produced in the eye and flows out through the drainage angle. The test determines whether high eye pressure is caused by a blocked angle, which is known as angle closure glaucoma, or whether the angle is open but not working properly, which is known as open angle glaucoma.
1.2.1 Automated Glaucoma Detection Process
Manual detection of glaucoma is very difficult and sensitive, and requires a high level of expert knowledge and skill. Much research is under way on the development of automatic glaucoma detection techniques. Automated glaucoma detection has been achieved by utilizing and combining different machine learning approaches such as neural networks, decision trees, Support Vector Machines, Naive Bayes, k nearest neighbour and linear regression. The general process of automated glaucoma detection consists of the following steps:
• Retinal Images
• Preprocessing
• Feature Selection
• Feature Extraction
• Classification
In the disease detection process, images of the fundus of the eye are taken. Preprocessing is done to reduce irregularities in the images. Feature extraction then reduces the number of features or the amount of data needed to describe a large dataset effectively [2]. Only significant features are used in the classification algorithm. Classification means analysing the retinal images and, on the basis of that analysis, categorizing them into classes such as normal or glaucomatous.
a. Retinal Images: An image of the fundus of the eye, or retina, is captured using different ophthalmic imaging technologies. These images play a vital role in glaucoma detection. Fundus images are basic images that do not contain detailed information about the eye. Confocal Scanning Laser Tomography (CSLT) and OCT are used to analyze the fundus clearly and detect glaucoma. CSLT uses laser light: the object is scanned with a focused laser beam and the reflected beam is captured through a small confocal pinhole. OCT uses normal light, and the picture is generated from the reflected light, capturing internal details of the eye [2].
b. Pre-processing Technique: For error-free and accurate diagnosis of glaucoma, various pre-processing techniques are applied to the images. The appearance-based approach and the method of blood vessel removal pre-process the images by separating out the blood vessels [3]. Mean filter pre-processing is used for OCT pictures and involves colour handling, resizing of images, noise removal from the original image, etc. (Figs. 2 and 3).
c. Feature Extraction: This is a very necessary step, as the quality of the features determines the efficiency of the detection system. Different feature extraction methods have been used by researchers in earlier works for successful detection. In our survey we found that the moment method was utilized to find characteristics such as the median, mean and variance. Morphology and pixel intensity were also used to determine features such as luminous intensity, translation variance and cup size [4]. The macular algorithm and a few other methods were also used for this purpose.
d. Feature Selection: This is the process of selecting a subset of the required features for the purpose of building a model for the algorithm. Various techniques have been used to select features for the detection of glaucoma, such as wrapper methods, filter methods and PCA [5].
e. Classification: Different classification or machine learning methods are used to detect glaucoma automatically. Various machine learning methods, along with their performance, are shown in tabular form (Table 1).

Fig. 2 Ophthalmic Imaging Technology
Fig. 3 Different Pre-processing Technologies (pre-processing techniques: appearance based approach, method of blood vessel removal, mean filters)

Table 1 Methods and Their Performance for Detection & Prediction
Name of Method                                 Glaucoma Detection                                       Glaucoma Prediction
Attention Based CNN                            Yes with 96% accuracy                                    No
Deep Multi-task Learning Interpretable Model   Yes with 85% accuracy                                    No
Recurrent Neural Network                       Yes                                                      No
Cross Validation Algorithm                     Yes with 82% accuracy                                    No
Classification Approach Using Deep CNN         Yes with AUC 83%                                         No
Pattern Classification Method                  Yes with average accuracy                                No
Random forest                                  Yes with 98% accuracy                                    Yes
Transfer learning with CNN                     Early detection with 95% accuracy & advanced with 85%    No
Perimeter method of Fractal analysis           Yes with 92% accuracy                                    No
C5.0                                           Yes with 97% accuracy                                    Yes
KNN                                            Yes with 97% accuracy                                    Yes
SVM                                            Yes with 97% accuracy                                    Yes
OCT                                            No                                                       Yes
Clinical Disc Parameters                       No                                                       Yes
Fuzzy logic                                    No                                                       Yes
Linear Regression                              Yes                                                      No
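Steps (b) and (c) above mention mean filtering and moment-based features such as the mean, median and variance. The sketch below illustrates both on a tiny grayscale patch. It is not taken from any of the surveyed systems: the function names `mean_filter` and `moment_features`, the 3 × 3 window with truncated borders, and the toy pixel values are all assumptions.

```python
from statistics import mean, median, pvariance

def mean_filter(image, radius=1):
    """Replace each pixel by the average of its neighbourhood; windows are
    truncated at the image border."""
    height, width = len(image), len(image[0])
    smoothed = []
    for y in range(height):
        row = []
        for x in range(width):
            window = [image[j][i]
                      for j in range(max(0, y - radius), min(height, y + radius + 1))
                      for i in range(max(0, x - radius), min(width, x + radius + 1))]
            row.append(sum(window) / len(window))
        smoothed.append(row)
    return smoothed

def moment_features(image):
    """Simple statistical descriptors of the pixel intensities."""
    pixels = [pixel for row in image for pixel in row]
    return {"mean": mean(pixels), "median": median(pixels),
            "variance": pvariance(pixels)}

# Toy 3 x 3 patch with one noisy bright pixel in the centre.
image = [[10, 10, 10],
         [10, 100, 10],
         [10, 10, 10]]
smoothed = mean_filter(image)
features = moment_features(smoothed)
```

In a full pipeline, descriptors of this kind would be computed per image and passed, after feature selection, to one of the classifiers listed in Table 1.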
1.3 Glaucoma Prediction System
Many glaucoma detection techniques exist, but they detect the disease at a mature stage rather than at an early stage. If the symptoms of glaucoma are predicted or identified at the initial level, patients can be treated earlier and loss of vision can be prevented. Unfortunately, very little work has been carried out on automatic forecasting of glaucoma. So far, two methods, fuzzy logic and linear regression, have been used for the prediction task [6]. Recently, a new instrument known as the “Risk calculator” has been made available to eye specialists to predict the development of glaucoma in patients with high eye pressure [7]. The risk calculator was designed at the University of California and estimates the future risk of ocular
U. Mishra and J. Raikwal
hypertension progressing to disease in a patient. A recent study validated the key risk determinants that help in the prediction of glaucoma. Another method is morphological prediction of glaucoma by quantitative analysis of ocular shape and volume using 3-D T2-weighted MR images [8]. The morphological characteristics of eyeballs that could be key reasons for developing glaucoma have not been properly described. The authors investigated 3-D topographic features of glaucomatous eyeballs with and without myopia to assess the capacity of those features for predicting glaucoma.
2 Related Work
Many authors have proposed different approaches for efficient and accurate detection of glaucoma. This part of the paper briefly describes earlier work carried out by different authors. The existing work is classified by the type of machine learning method used.
2.1 Supervised Learning
These algorithms consist of a dependent (target) variable that is predicted from a given set of independent variables; a function is created that maps inputs to outputs. Some papers that used supervised learning are discussed below.
Rashmi Panda et al. [9] put forward an automated model for retinal nerve fiber layer defect (RNFLD) detection, since such defects are early evidence of glaucoma in fundus images. Early detection and prevention are the ways to stop loss of vision. The new method performs detection in fundus images using a patch-feature-driven recurrent neural network (RNN). A fundus image dataset is used for evaluating performance. The system achieves high RNFLD detection and accurate boundary localization.
Kavita Choudhary et al. [10] aimed at detecting glaucoma at early stages using a cross validation algorithm. The authors analysed the symptoms prevailing in persons and generalized them to reach conclusive evidence. It was found that measures such as blood pressure, age, sugar level and myopia, combined across different datasets, are related to the chances of a person suffering from glaucoma. The analysis used classification with the ‘cross validation’ and ‘split validation’ algorithms. The outcome reveals that patients who have high blood pressure, high sugar level, myopia and a family history of the disease can suffer from glaucoma. It is also observed that patients older than 50 have higher chances of glaucoma.
Seong Jae Kim et al. [11] attempted to design ML models with robust predictive power and interpretability to diagnose glaucoma on the basis of RNFL thickness and visual field. Different features were collected
after examination of RNFL thickness and visual field. The authors used four ML techniques, C5.0, random forest, SVM and K-nearest neighbour, to design the glaucoma prediction model. The learning models were constructed using a training dataset and their performance was evaluated using a validation dataset. The authors observed that the random forest model gives the best performance and the remaining models show similar accuracy.
Nesma Settouti et al. [12] focused on the deployment of computational intelligence methods to solve the automatic feature extraction task. A semi-automatic approach for glaucoma monitoring using retinal pictures is presented, in which a segmentation method for the cup and disc regions automatically calculates the cup/disc ratio for examination of the disease. A comparative study is carried out and very good outcomes are obtained.
L. K. Singh et al. [13] discussed the use of machine learning techniques to efficiently identify and prevent glaucoma. Eye specialists can benefit from computer-enabled detection: ML techniques can detect this eye disease accurately and can classify whether persons have glaucoma or not.
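The split-validation and cross-validation schemes mentioned in this subsection can be sketched as index generators. This is only an illustrative sketch, not the procedure of any of the surveyed papers: the function names, the 70% training fraction, the number of folds and the fixed random seed are all assumptions.

```python
import random


def split_validation(n_samples, train_fraction=0.7, seed=0):
    """Single hold-out split: returns (train_indices, validation_indices).

    Assumption: the 70/30 default is illustrative, not taken from the survey.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    cut = int(round(train_fraction * n_samples))
    return indices[:cut], indices[cut:]


def k_fold_cross_validation(n_samples, k=5, seed=0):
    """Yield (train_indices, validation_indices) once for each of the k folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Every k-th shuffled index goes to the same fold, so fold sizes differ by at most 1.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        validation = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, validation
```

With cross-validation every sample is used for validation exactly once, whereas split validation evaluates on a single held-out subset.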
2.2 Unsupervised Learning
In this type of method there is no expected target value to determine. It is widely used for segmenting or clustering customers into sections or classes. Shwetha C. Shetty et al. [14] discussed that glaucoma is an ocular disorder whose identification includes quantification of the shape and size of the optic cup. The pre-processed data is clustered using K-means, which is deployed for optic cup segmentation and then deployed again to find its dimension. Since fractal dimension is used to determine the dimension of irregular objects, the authors presented a new detection method based on the perimeter method. The outcome reveals that the new approach is accurate in detecting glaucoma.
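As a rough illustration of the K-means step described above, the sketch below clusters scalar pixel intensities into k groups. It is not the implementation of [14]: the function name, the use of plain gray-level intensities, the choice k = 2 and the evenly spread initial centroids are assumptions made for the example.

```python
def kmeans_1d(values, k=2, iterations=20):
    """Cluster scalar values (e.g. pixel intensities) into k groups.

    Returns (centroids, labels), where labels[i] is the cluster index of values[i].
    """
    lo, hi = min(values), max(values)
    # Assumption: initial centroids are spread evenly over the value range.
    centroids = [lo + (hi - lo) * (i + 0.5) / k for i in range(k)]
    labels = [0] * len(values)
    for _ in range(iterations):
        # Assignment step: each value joins its nearest centroid.
        labels = [min(range(k), key=lambda c: abs(v - centroids[c])) for v in values]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            if members:
                centroids[c] = sum(members) / len(members)
    return centroids, labels
```

In a segmentation setting, the pixels assigned to the brighter cluster could be treated as a candidate optic cup region, which would then be passed on to the shape analysis.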
2.3 Reinforcement Learning
This type of learning is used to train machines to make specific decisions. The machine learns from previous outcomes and tries to use that information to make the most accurate decisions. In our study we found that very little or no work has been done on glaucoma detection using reinforcement learning methods.
2.4 Deep Learning
Deep learning algorithms for glaucoma detection and prediction are getting a lot of attention because they achieve very high accuracy, to the point that they can outperform humans. Existing papers that used deep learning to construct their models are as follows.
Liu Li et al. [15] presented an attention-based convolutional neural network for detecting glaucoma, known as AG-CNN. Approaches proposed in the past for automatic detection based on fundus images are insufficient to remove high redundancy, which may reduce the reliability and accuracy of detection. To overcome this shortcoming, the proposed method establishes a large-scale dataset of fundus images labeled as positive or negative. The attention maps of some images are obtained through an experiment. The AG-CNN is constructed from an attention prediction subnet, a pathological area localization subnet and a glaucoma classification subnet. Experiments on the LAG database and other available datasets reveal that the proposed method gives detection performance superior to existing methods.
Jin Mo Ahn et al. [16] presented a new deep learning approach that diagnoses the disease from fundus photography. The authors discussed that both advanced and early glaucoma can be identified using ML techniques along with fundus images. A dataset of 1,542 images was divided into training, validation and test datasets. The proposed model, trained using a CNN, is more effective and accurate in the detection of early glaucoma.
Annan Li et al. [17] suggest that automatic detection of the disease is important for retinal image analysis. Compared with segmentation-based approaches, image-classification-based approaches perform better. However, challenges remain due to improper samples, the need for effective features and shape variations of the optic disc.
To overcome these challenges, the authors put forward a classification model for detecting glaucoma in which deep convolutional networks represent visual appearance, and holistic and local features are integrated to reduce the influence of misalignment.
Ali Serener et al. [18] discussed open-angle glaucoma, one of the basic kinds of the disease, in which a person slowly loses sight. Manual diagnosis by experts is possible but is either time-consuming or costly. The authors presented a model to detect both early and advanced glaucoma automatically. The ‘ResNet-50’ and ‘GoogLeNet’ deep CNN algorithms are trained and tested using transfer learning, and ‘GoogLeNet’ is found to be better than ‘ResNet-50’ for detecting the disease.
Ramin Daneshvar et al. [19] analyzed whether baseline OCT can forecast visual field (VF) progression in persons with suspected glaucoma, and compared its performance with semi-quantitative optic disc measures. It is observed that baseline pRNFL and macular OCT can be used to check the risk of future disease progression. People with abnormal OCT findings require closer care to prevent progression of functional damage.
Guangzhou An et al. [20] designed an approach to diagnose glaucoma in persons with open-angle disease based on 3-D OCT data and colour fundus images. Transfer learning of CNNs was deployed with different inputs: images of the optic disc, the RNFL thickness map, the disc RNFL deviation map and the macular GCC deviation map. After combining the outcomes of every CNN model, a random forest was deployed to categorize images of healthy and diseased eyes, using the feature vector representation of each input image obtained by removing the second fully connected layer. The study shows that the new combination method, which uses images and quantified images extracted from OCT as the basis of an automated ML model for glaucoma identification, obtained an AUC of 0.96.
Mamta Juneja et al. [21] found drawbacks in the examination of the pupil by eye specialists, which is very time-consuming; this problem can be rectified with automatic systems based on learning methods. The authors put forward an artificial-intelligence glaucoma expert system based on segmentation of the optic disc and cup. The approach segments the optic disc and cup from retinal images using a deep convolutional neural network architecture named G-Net. It has higher accuracy and can also be applied to other medical images.
Xiangyu Chen et al. [22] proposed a DL framework to diagnose glaucoma using a deep CNN. The structure has six layers: four convolutional layers and two fully connected layers. To improve performance, data augmentation and dropout are deployed. In the near future this work can be extended to the diagnosis of multiple eye diseases.
The summary of the literature review is shown in Table 2.
3 Biological Survey
Glaucoma is detected when damage is seen in the optic nerve head, which occurs because the intraocular pressure in the eye has increased. Agrawal et al. [23] proposed an approach based on quasi-bivariate variational mode decomposition (QB-VMD), applied to fundus images. In the experiment, 70 features are extracted from the available QB-VMD SBIs, and an SVM is finally used to classify the obtained features. Another mechanism is based on generative adversarial networks for image synthesis (Diaz-Pinto et al. [24]). Retinal images are first synthesized, and a semi-supervised method is then used for glaucoma detection from labelled and unlabelled images. t-SNE is used to analyze the features that are present in the synthesized images. Realistic retinal images are obtained, and classification is performed to detect glaucoma. Early detection of glaucoma is also possible from the signal in the anterior eye chamber that arises from the aqueous humor fluid. This detection uses a giant magnetoresistance (GMR) sensor, and the analysis is done by an algorithm, i.e. rational
Table 2 Literature Review
| Ref. No. | Name of Author | Objectives | Advantage | Disadvantage |
| --- | --- | --- | --- | --- |
| 9 | Rashmi Panda, A. Rao, D. Padhy and G. Panda | To detect RNFL defect in early glaucoma | High detection accuracy | Can't work properly with large datasets |
| 10 | Kavita Choudhary, Prateek Maheshwari and Sonia Wadhwa | To detect glaucoma at initial level using cross validation algorithm | It generalizes symptoms & helps patients and doctors. Accuracy of detection increases | Large and complex dataset can be used in future |
| 11 | S. J. Kim, K. J. Cho & S. Oh | To develop ML-models of high prediction power & interpretability based on RNFL thickness & VF | This model can be used for prediction of glaucoma against unknown eye test details | Accuracy can be increased by combining multiple algorithms |
| 12 | Nesma Settouti, Habib Daho and M. Chikh | To use semi automatic method for glaucoma monitoring | Use of computational intelligence to detect disease | In semi automatic system manual efforts are needed which reduces its scope |
| 13 | L. K. Singh, Hitendra Garg & Pooja | To deploy ML or deep learning techniques to identify automatic glaucoma types | Doctors can be benefitted by it | This technique requires skilled people to operate |
| 14 | Shwetha C. Shetty & Priyanka Gutte | To detect glaucoma using perimeter method | Proposed model can be used to detect diabetic retinopathy | Better clustering techniques can be used |
| 15 | Liu Li, Yang Li, Xiaofei Wang, Jiang, Zulin, Xiang & N. Wang | To build a CNN model for attention-based glaucoma detection | The predicted attention maps enhances performance | Large database is needed |
| 16 | J. M. Ahn, Kim, K. Sung Ahn, Cho, S. Kim | To construct a deep learning model to identify glaucoma using fundus photography | Our model is more efficient in detection of at initial level | These model can be tested using different datasets and accuracy can be improved |
| 17 | Annan Li, Cheng, D. W. Kee Wong & J. Liu | To combine & integrate holistic & local features for classification of disease | Solve the problem of influence of misalignment | Accuracy is good but can be improved |
| 18 | Ali Serener and Sertan Serte | To develop system for detection of early & advanced both glaucoma using fundus images automatically | Performance of early & advanced glaucoma detection is good | Sensitivity is less |
| 19 | R. Daneshvar, A. Yarmohammadi, Simo K. Law, J. Caproli & K. Mahdavi | To compare OCT and clinical disc parameters for glaucoma progression | Greater baseline VFI can be used to predict VF progression | Structural data is not used in these model |
| 20 | Guangzhou, K. Hashimoto, Tsuda, T. Kikawa, Yokota, Masahiro Akiba & T. Nakazawa | To build ML-model for glaucoma detection in persons with open-angle glaucoma based on 3-D OCT data & color fundus images | New method has capacity to act with high sensitivity to detect glaucoma | Only open angle glaucoma can be detected |
| 21 | Mamta Juneja, S. Singh, Shivank Bali, S. Gupta & P. Jindal | To diagnose glaucoma by deploying deep learning convolution network | It can be modified to use in other medical work also | It is a complex method |
| 22 | X. Chen, Y. Xu, T. Y. Wong & Jiang Liu | Use of deep CNN to find glaucoma accurately | It easily enabled different methods to improve results | Takes long time as it has different layers |
dilation wavelet transform. The rate of dissipation of fluid in the eye is used to differentiate a glaucomatous eye from a normal eye. Early-stage glaucoma is detected with 90% accuracy by the proposed approach (Ganesh E. et al. [25]). Clinical interpretability of ConvNets has been a difficult challenge in glaucoma detection. WangMin Liao et al. [26] present an approach that uses a ConvNet for glaucoma detection while highlighting the critical regions in the images for transparent analysis. The experiment was done on the ORIGA dataset and the method worked well. The intraocular pressure in the eyes can also be used for the detection of glaucoma. For this purpose a device was designed that uses intraocular pressure data: a power-free multifunctional contact lens. The contact lens works as a sensor and can also detect Interleukin 12p70, which is important for the detection of glaucoma (Chao Song et al. [27]). Glaucoma is one of the dangerous diseases that lead to vision loss. Xin Zhao et al. [28] used the optic disc (OD) for predicting the risk of glaucoma. The OD is optimized by the sliding window method and the morphological
processing. Next, a novel neural network and clinical measurements are used for glaucoma classification. Finally, the curve score is analyzed for glaucoma risk prediction (Xin Zhao et al. [28]).
4 Research Gaps and Findings
From the survey of earlier work in the field of automated glaucoma detection, we found the following drawbacks:
• It is very difficult to detect glaucoma manually.
• It is difficult to determine glaucoma in a person at the initial level.
• Accuracy of existing methods is very low.
• Very little work has been done on predicting glaucoma at an early stage.
• Reinforcement learning methods have not been explored for glaucoma detection and prediction.
5 Solutions to the Research Gaps
In view of the gaps seen in the literature, a system that is able to detect glaucoma at an earlier stage is developed. We suggest a CNN-based method that is capable of resolving these gaps. A CNN can distinguish glaucomatous from non-glaucomatous eyes because it can infer information that is present in a hierarchical manner. We have also implemented an artificial intelligence component that works by segmentation of the optic cup and disc. If needed, multiple neural networks can be used for segmenting the optic cup and disc (Fig. 4).
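The final step of the reference approach in Fig. 4, the calculation of the cup-to-disc ratio (CDR) from the predicted cup and disc, could look roughly as follows. This is a minimal sketch under stated assumptions: the predictions are taken to be binary masks given as lists of rows of 0/1 values, the vertical CDR (ratio of vertical extents) is used, and the function names are hypothetical; the chapter does not specify which CDR variant is meant.

```python
def vertical_extent(mask):
    """Span, in rows, from the first to the last row that contains a foreground pixel."""
    rows = [r for r, row in enumerate(mask) if any(row)]
    return (rows[-1] - rows[0] + 1) if rows else 0


def cup_to_disc_ratio(cup_mask, disc_mask):
    """Vertical cup-to-disc ratio from binary masks (assumed CDR definition)."""
    disc_height = vertical_extent(disc_mask)
    if disc_height == 0:
        raise ValueError("disc mask is empty")
    return vertical_extent(cup_mask) / disc_height
```

An area-based ratio would be an equally plausible choice; only the helper computing the extent would need to change.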
6 Conclusion
We have reviewed how glaucoma can be detected automatically using various machine learning and classification techniques, through a detailed survey of past work. KNN, K-means, C5.0, random forest, decision tree, Naive Bayes, fuzzy logic, linear regression, transfer learning using CNN, fractal analysis and other methods are used for automated glaucoma detection. In future, a hybrid model can be built by combining two or more methods for early detection and prediction of the progression of glaucoma. Comparing the existing works, we found that OCT and the use of morphological features, obtained by analysis of ocular shape and volume using 3-D T2-weighted MR images, are efficient for predicting glaucoma with more than 85% accuracy. Also, by combining machine learning techniques
Fig. 4 Approach that can be used as reference (flowchart blocks: Start; Retinal Fundus Image; Preprocessing; Input Image for Disc Prediction; Model-1 Disc Predictor; Predicted Disk; Construction of Input Image for Cup Prediction; Input Image for Cup Prediction; Model-2 Cup Predictor; Predicted Cup; Calculation of CDR; End)
for detection with the morphological method for prediction, a hybrid model can also be constructed.
References
1. Khalil, T., Khalid, S., Syed, A.M.: Review of Machine Learning Techniques for Glaucoma Detection and Prediction. Science and Information Conference 2014, London, UK, 438. https://doi.org/10.1109/sai.2014.6918224
2. Daniele, M.S., Barros, J.C.C., Moura, Cefas R. Freire, Alexandre C. Taleb, Ricardo A. M. Valentim, Philippi S. G. Morais: Machine learning applied to retinal image processing for glaucoma detection: Review and perspective. BioMed Eng OnLine, 2020. https://doi.org/10.1186/s12938-020-00767-2
3. Soorya, M., Ashish, I., Malay, K.D.: An automated and robust image processing algorithm for glaucoma diagnosis from fundus images using novel blood vessel tracking and bend point detection. Int. J. Med. Inform., 52–70 (2018). https://doi.org/10.1016/j.ijmedinf.2017.11.015
4. Lee, J. et al.: Machine learning classifiers-based prediction of normal-tension glaucoma progression in young myopic patients. Jpn. J. Ophthalmol. 64(1), 68–76 (2020). https://doi.org/10.1007/s10384-019-00706-2
5. Niwas, S.I., Lin, W., Bai, X., Kwoh, C.K., Sng, C.C., Aquino, M.C., Chew, P.T.: Reliable feature selection for automated angle closure glaucoma mechanism detection. J. Med. Syst. 39(3), 21 (2015). https://doi.org/10.1007/s10916-015-0199-1
6. Nacer Eddine, B., Nabhia, A., Seife Eddine, B.: Glaucoma diagnosis using cooperative convolutional neural networks. Int. J. Adv. Electron. Comput. Sci., ISSN: 2393–2835 5(1), Jan (2018)
7. Steve, L.M., Medeiros, F., Gordon, M.: Diagnostic tools for calculation of Glaucoma risk. Surv Ophthalmol, author manuscript; available in PMC Jun 9 2018. https://doi.org/10.1016/j.survophthal.2008.08.005
8. Tatewaki, M., Omodaka, T., Matsudaira, Y., Kunitoki, K., Nakazawa, T.Y.: Morphological prediction of glaucoma by quantitative analyses of ocular shape & volume using 3-dimensional T2-weighted MR images. Scientific Reports, 9(1) (2019). https://doi.org/10.1038/s41598-019-51611-0
9. Rashmi Panda, N.B. Puhan, Aparna Rao, Padhy, D., Panda, G.: Recurrent Neural Network Based Retinal Nerve Fibre Layer Defect Detection in Early Glaucoma. School of Electrical Sciences, IIT Bhubaneswar, India; Glaucoma Diagnostic Services, L. V. Prasad Eye Institute Bhubaneswar, India. https://doi.org/10.1109/isbi.2017.7950614
10. Choudhary, K., Maheshwari, P., Wadhwa, S.: Glaucoma Detection using Cross Validation Algorithm: A Comparitive Evaluation on Rapidminer. 978-1-4799-4562-7/14/$31.00 ©2014 IEEE. https://doi.org/10.1109/norbert.2014.6893924
11. Jae Kim, S., Jin Cho, K., Oh, S.: Development of Machine Learning Models for Diagnosis of Glaucoma. 23 May 2017. https://doi.org/10.1371/journal.pone.0177726
12. Settouti, N., Daho, H., Amine Bechar, M.E., Chikh, M.A.: Semi-Automated Method for the Glaucoma Monitoring. 15 October 2017. https://doi.org/10.1007/978-3-319-63754-9_11
13. Law Kumar Singh, H., Garg, P.: Automated Glaucoma Type Identification Using Machine Learning or Deep Learning Techniques. 12 December 2019. https://doi.org/10.1007/978-981-15-1100-4_12
14. Shetty, S.C., Gutte, P.: A Novel Approach for Glaucoma Detection Using Fractal Analysis. 978-1-5386-3624-4/18/$31.00 ©2018 IEEE. https://doi.org/10.1109/wispnet.2018.8538760
15. Li, L., Xu, M., Liu, H., Li, Y., Wang, X., Jiang, L., Wang, Z., Fan, X., Wang, N.: A large-scale database and a CNN model for attention-based Glaucoma detection. IEEE Trans. Med. Imaging, 1–11 (2019). https://doi.org/10.1109/tmi.2019.2927226
16. Ahn, J.M., Kim, S., Ahn, K.-S., Cho, S.-H., Bok Lee, K., Samuel Kim, U.: A Deep Learning Model for Detection of Both Advanced & Early Glaucoma using Fundus Photography. 27 November 2018. https://doi.org/10.1371/journal.pone.0207982
17. Li, A., Cheng, J., Kee Wong, D.W., Liu, J.: Integrating Holistic and Local Deep Features for Glaucoma Classification. 978-1-4577-0220-4/16/2016 IEEE. https://doi.org/10.1109/embc.2016.7590952
18. Serener, A., Serte, S.: Transfer Learning for Early and Advanced Glaucoma Detection with Convolutional Neural Networks. 978-1-7281-2420-9/19/ 2019 IEEE. https://doi.org/10.1109/tiptekno.2019.8894965
19. Daneshvar, R., Yarmohammadi, A., Alizadeh, R., Henry, S., Law, S.K., Caproli, J., Mahdavi, K.: Prediction of Glaucoma Progression with Structural Parameters: Comparison of Optical Coherence Tomography and Clinical Disc Parameters. Am. J. Ophthalmol., December 2019. https://doi.org/10.1016/j.ajo.2019.06.020
20. Guangzhou, A., Omodaka, K., Hashimoto, K., Tsuda, S., Shiga, Y., Takada, N., Kikawa, T., Yokota, H., Akiba, M., Nakazawa, T.: Glaucoma diagnosis with machine learning based on optical coherence tomography & color fundus images. J. Healthc. Eng. (2019). https://doi.org/10.1155/2019/4061313
21. Juneja, M., Singh, S., Agarwal, N., Bali, S., Gupta, S., Jindal, P.: Automated detection of Glaucoma using deep learning convolution network (G-net). Springer Science & Business Media, LLC, part of Springer Nature 2019. https://doi.org/10.1007/s11042-019-7460-4
22. Chen, X., Xu, Y., Kee Wong, D.W., Wong, T.Y., Liu, J.: Glaucoma Detection based on Deep Convolutional Neural Network. 978-1-4244-9270-1/15 ©2015 IEEE. https://doi.org/10.1109/embc.2015.7318462
23. Agrawal, D.K., Kirar, B.S., Pachori, R.B.: Automated glaucoma detection using quasi-bivariate variational mode decomposition from fundus images. IET Image Process. 13(13), 2401–2408 (2019). https://doi.org/10.1049/iet-ipr.2019.0036
24. Diaz-Pinto, A., Colomer, A., Naranjo, V., Morales, S., Xu, Y., Frangi, A.F.: Retinal image synthesis and semi-supervised learning for Glaucoma assessment. IEEE Trans. Med. Imaging 38(9), 2211–2218 (2019). https://doi.org/10.1109/TMI.2019.2903434
25. Ganesh, E., Shanker, N.R., Priya, M.: Non-invasive measurement of Glaucoma disease at earlier stage through GMR sensor ah biomagnetic signal from eye and radwt algorithm. IEEE Sens. J. 19(14), 5404–5412 (2019). https://doi.org/10.1109/JSEN.2019.2909526
26. Liao, W., Zou, B., Zhao, R., Chen, Y., He, Z., Zhou, M.: Clinical interpretable deep learning model for Glaucoma diagnosis. IEEE J. Biomed. Heal. Informatics 24(5), 1405–1412 (2020). https://doi.org/10.1109/JBHI.2019.2949075
27. Song, C., Ben-Shlomo, G., Que, L.: A multifunctional smart soft contact lens device enabled by nanopore thin film for glaucoma diagnostics and in situ drug delivery. J. Microelectromechanical Syst. 28(5), 810–816 (2019). https://doi.org/10.1109/JMEMS.2019.2927232
28. Zhao, X., et al.: Glaucoma screening pipeline based on clinical measurements and hidden features. IET Image Process. 13(12), 2213–2223 (2019). https://doi.org/10.1049/iet-ipr.2019.0137
Experiences in Machine Learning Models for Aircraft Fuel Flow and Drag Polar Predictions
Ramakrishnan Raman, Rajesh Chaubey, Surendra Goswami, and Radhakrishna Jella
Abstract Fuel consumption prediction for transport aircraft is critical for optimally managing fuel, saving energy costs, minimizing fuel emissions, optimizing aircraft trajectories and for efficient air traffic management. This paper shares experiences associated with developing machine learning based fuel flow and drag polar models for twin-engine business jet aircraft. A set of neural network topologies, along with different activation functions, is analyzed with respect to fuel flow prediction accuracy. Further, machine learning models are leveraged to arrive at a high-fidelity drag polar model. Potential applications of the developed models include fine-tuning traditional fuel prediction models and assessing the impact of factors such as aircraft aging and engine performance degradation on fuel consumption.
Keywords Aircraft fuel consumption · Machine learning · Drag polar
R. Raman (B) · R. Chaubey · S. Goswami · R. Jella, Honeywell Technology Solutions Lab, Bangalore, India
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2021. V. K. Gunjan and J. M. Zurada et al. (eds.), Modern Approaches in Machine Learning and Cognitive Science: A Walkthrough, Studies in Computational Intelligence 956, https://doi.org/10.1007/978-3-030-68291-0_27
1 Introduction
Fuel comprises a significant portion of the operational costs for airlines, and hence fuel consumption is a highly critical area of focus for many stakeholders in the air transport industry, including airlines, aircraft OEMs, and air traffic control authorities. Predicting the amount of fuel consumed for various scenarios would be required to optimize the amount of fuel on board, which in turn would help in optimizing payload weight and saving fuel costs. For airlines, a good fidelity fuel consumption
R. Raman et al.
model is critical [1–3]. A good fidelity fuel consumption model requires accurate fuel flow rate assessment and drag polar assessment. Fuel flow rate indicates the mass of fuel injected into the engine per unit time, and is impacted by many factors such as cruise altitude, aircraft speed, weather conditions and aircraft/engine configuration. Drag polar models, which involve aerodynamic coefficients such as the coefficient of drag, determine the performance of the aircraft under various scenarios. Of recent interest in the avionics engineering community is the ability to predict fuel consumption using advanced machine learning techniques [4–6]. However, integrated machine learning based fuel flow prediction in tandem with drag polar modelling is not adequately addressed in the existing literature. This paper discusses experiences in using neural networks for fuel flow rate and drag polar predictions for twin-engine business jets. The rest of the paper is organized as follows: Sect. 2 discusses data visualization and principal component analysis of the various aircraft state and environment parameters, Sect. 3 discusses machine learning models for fuel flow rate and drag polar predictions along with application scenarios, and Sect. 4 presents the conclusions.
2 Data Visualization and Principal Component Analysis
This section discusses data visualization and the use of Principal Component Analysis for initial analysis of the flight data. The raw flight data used in this paper was sourced from a logger application integrated with high-fidelity simulator flights, rather than actual flight data from airlines/OEMs. This was done specifically to avoid any restrictions on sharing the work with the community through this paper. The raw data included various aircraft state and environment parameters such as Mach speed of the aircraft, gross weight, altitude, current temperature, ISA deviation, fuel flow, cross wind and tail wind. The envelope of variation of the parameters was as follows: cruise altitude: 20,000 feet to 40,000 feet; gross weight: 54,000 lbs to 77,300 lbs; ISA deviation: −16 to +17; cruise Mach: ; tail wind: 0 to 150 kts. Various scenarios pertaining to different envelopes of parameters such as altitude, acceleration, deceleration and tail winds were considered. For instance, in the cruise phase, over 176 scenarios were considered. Figure 1a illustrates an example scenario for the cruise phase (30,000 feet altitude, acceleration followed by deceleration, with wind).
2.1 Principal Component Analysis
Many factors impact fuel consumption. Visualization of the different parameters provides deeper insight into the key factors. Figure 1b illustrates an example visualization of gross weight and Mach speed against fuel flow. For instance, a 3-dimensional plot of altitude and Mach speed versus fuel
Experiences in Machine Learning Models …
Fig. 1 (a) Example Scenario: Cruise Phase (b) Visualization of Flight Parameters
flow would indicate that flying at higher altitude with higher speed typically results in lower fuel flow. However, understanding the interplay of the various factors on fuel flow would require a higher-dimensional visualization, which is difficult to represent. Principal Component Analysis (PCA) [7] was used to obtain lower-dimensional views. PCA projects n-dimensional input data onto a reduced k-dimensional linear subspace such that the reconstruction error is minimized. The lower-dimensional view is a projection of the points in the multidimensional space as seen from its most informative viewpoint. PCA can be done by singular value decomposition of the data matrix after mean-centering and normalizing each attribute. Figure 2a shows the fuel flow rate plotted against the principal component comprising altitude, speed, gross weight and temperature. The distinct clusters that appear in the plots indicate that the selected features may demonstrate reasonable correlation with fuel flow.
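The projection onto a principal component can be sketched in a few lines. Note that this sketch swaps in power iteration on the covariance matrix instead of the singular value decomposition mentioned above, purely to stay within the Python standard library; it recovers only the first principal component. The function names, the iteration count and the initial vector are assumptions for illustration and are not taken from the paper's Octave implementation.

```python
import math


def standardize(rows):
    """Mean-center each column and scale it to unit standard deviation."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    # A constant column has zero deviation; fall back to 1.0 to avoid dividing by zero.
    stds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / n) or 1.0
            for j in range(d)]
    return [[(r[j] - means[j]) / stds[j] for j in range(d)] for r in rows]


def first_principal_component(rows, iterations=200):
    """Return (direction, scores) of the first principal component.

    Assumption: power iteration is used here in place of SVD.
    """
    z = standardize(rows)
    n, d = len(z), len(z[0])
    cov = [[sum(r[a] * r[b] for r in z) / n for b in range(d)] for a in range(d)]
    # Asymmetric start vector, to reduce the chance of starting orthogonal to the answer.
    v = [1.0 / (j + 1) for j in range(d)]
    for _ in range(iterations):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    # Scores: projection of every standardized sample onto the component direction.
    scores = [sum(r[j] * v[j] for j in range(d)) for r in z]
    return v, scores
```

Each row would hold one sample of, for example, altitude, speed, gross weight and temperature; the returned scores are the one-dimensional coordinates against which a quantity such as fuel flow can be plotted.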
Fig. 2 (a) Principal Component Analysis (b) Neural Network for Fuel Flow prediction
R. Raman et al.
3 Fuel Flow Predictions
3.1 Prediction Model
This section discusses the machine learning model for fuel flow rate prediction. The case study pertains to the climb, cruise and descent phases of a twin-engine business jet. Figure 2b illustrates the neural network used. The number of hidden layers and the number of units in each hidden layer define the topology of the network. The neural networks discussed in this paper use a feed-forward backpropagation approach, implemented in GNU Octave v4.2.1. The cost function used in the neural network model is that of regularized linear regression, shown in Eq. (1), where θ denotes the network model parameters, $h_\theta(x^{(i)})$ is the prediction for the i-th of m training examples, $y^{(i)}$ is the corresponding observed value, n is the number of non-bias parameters, and λ is the regularization parameter, which controls the degree of regularization (to prevent overfitting). The corresponding gradients are given in Eqs. (2) and (3).

$$J(\theta) = \frac{1}{2m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)^2 + \frac{\lambda}{2m}\sum_{j=1}^{n}\theta_j^2 \tag{1}$$

$$\frac{\partial J(\theta)}{\partial \theta_0} = \frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)x_j^{(i)} \quad \text{for } j = 0 \tag{2}$$

$$\frac{\partial J(\theta)}{\partial \theta_j} = \frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)x_j^{(i)} + \frac{\lambda}{m}\theta_j \quad \text{for } j \geq 1 \tag{3}$$
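As a rough illustration of Eqs. (1) to (3), the sketch below evaluates the regularized cost and its gradient for a plain linear hypothesis. It is not the paper's Octave network: the function names, the toy data and the convention that `x[0]` is a constant bias input are all assumptions made for the example.

```python
def predict(theta, x):
    """Linear hypothesis h_theta(x); x[0] is the bias input (1.0)."""
    return sum(t * xi for t, xi in zip(theta, x))

def cost_and_gradient(theta, X, y, lam):
    """Regularized squared-error cost and gradient, following Eqs. (1)-(3)."""
    m = len(X)
    errors = [predict(theta, x) - yi for x, yi in zip(X, y)]
    # Eq. (1): squared-error term plus an L2 penalty that skips theta[0].
    cost = sum(e * e for e in errors) / (2 * m)
    cost += lam / (2 * m) * sum(t * t for t in theta[1:])
    grad = []
    for j in range(len(theta)):
        g = sum(e * x[j] for e, x in zip(errors, X)) / m  # Eq. (2)
        if j >= 1:
            g += lam / m * theta[j]                        # extra term of Eq. (3)
        grad.append(g)
    return cost, grad

# Toy data generated by y = 1 + 2 * x, so theta = [1, 2] fits exactly.
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]]
y = [1.0, 3.0, 5.0]
cost, grad = cost_and_gradient([1.0, 2.0], X, y, lam=0.0)
cost_reg, grad_reg = cost_and_gradient([1.0, 2.0], X, y, lam=3.0)
```

With λ = 0 the exact fit has zero cost and zero gradient; with λ > 0 only the penalty on the non-bias parameter remains, which shows how the regularization term pulls weights towards zero.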
Three activation functions [8] were used for the experiments: Leaky Rectified Linear Unit (LRelu), Hyperbolic Tangent (HypT) and Scaled Exponential Linear Unit (Selu). The choice of the specific inputs to the network is based on an understanding of the physics governing the various phases of flight. The fuel flow rate (kg/second) values associated with the inputs (for the same time instant) are taken as the outputs. The data is randomized and split into training (75%), cross validation (15%) and testing (10%) sets. Typically, the training set is used to fit the model, while the validation set is used to estimate prediction error for model selection. The test set is then used to assess the generalization error of the final chosen model. The prediction accuracy of the neural network is affected by the number of hidden layers, the number of neuron nodes in the hidden layers, and the activation functions used. Different neural network topologies were used, and the resulting models were evaluated for their fuel flow rate predictive performance. The two metrics used for evaluating the model are the Mean Absolute Relative Prediction Error or Mean Error (ME), and the Normalized Root Mean Square Error (NRMSE), as given in Eq. (4), where n is the number of observations, $X_{obs}$ are the observed values, i.e. the actual fuel flow rate in the prediction set, and $X_{model}$ are the modelled values, i.e. the model mean prediction of the fuel flow rate in the prediction set. NRMSE is obtained by normalizing the Root Mean Square Error (RMSE) by the mean of the observed data.
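$$\mathrm{ME} = \frac{1}{n}\sum_{i=1}^{n}\frac{\left|X_{obs,i} - X_{model,i}\right|}{X_{obs,i}}; \quad \mathrm{RMSE} = \sqrt{\frac{\sum_{i=1}^{n}\left(X_{obs,i} - X_{model,i}\right)^2}{n}}; \quad \mathrm{NRMSE} = \frac{\mathrm{RMSE}}{\overline{X}_{obs}} \tag{4}$$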
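The metrics of Eq. (4) are straightforward to compute. The following sketch uses only the Python standard library; the function names and the two-point example are illustrative and not taken from the paper.

```python
import math

def mean_error(obs, model):
    """Mean absolute relative prediction error (ME)."""
    return sum(abs(o - p) / o for o, p in zip(obs, model)) / len(obs)

def rmse(obs, model):
    """Root mean square error."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, model)) / len(obs))

def nrmse(obs, model):
    """RMSE normalized by the mean of the observed values."""
    return rmse(obs, model) / (sum(obs) / len(obs))

obs = [2.0, 4.0]    # made-up observed fuel flow values
model = [1.0, 5.0]  # made-up predictions
me_value = mean_error(obs, model)     # (0.5 + 0.25) / 2
rmse_value = rmse(obs, model)         # sqrt((1 + 1) / 2)
nrmse_value = nrmse(obs, model)       # 1 / 3
```

Note that ME divides by each observed value, so it assumes the observed fuel flow is never zero.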
3.2 Model Evaluation
Separate models were trained for each of the climb, cruise and descent phases. Experiments were conducted with different numbers of hidden layers (1, 2, and 4), with the number of hidden units per layer varying from 5 to 25. Figure 3 shows the NRMSE and ME on the vertical axis for a Neural Network (NN) with 2 hidden layers, for the training (“-Trg-”), cross validation (“-Val-”) and test (“Test-”) sets and for the three activation functions, with the horizontal axis indicating the number of units per hidden layer. Models with low ME and NRMSE are preferred. During the experiment runs, the hyperbolic tangent (HypT) converged significantly faster than LRelu and Selu. However, LRelu and Selu exhibited better performance, i.e. lower ME (~2.5%) and NRMSE (~0.03). This study validates the feasibility of using appropriate neural network architectures, as characterized by the number of hidden layers, the number of units in the hidden layers, and the activation functions, for predicting the fuel flow rate. The predictions also compare favorably with similar studies that predict fuel flow rate using statistical models. For instance, one recent study adopted a Gaussian process regression approach and reported an ME of around 6% for the fuel flow rate in the cruise phase, with an NRMSE of around 0.62 [9].
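For reference, the three activation functions compared above can be written as follows. This is a sketch using their commonly cited definitions; the leaky slope of 0.01 is an assumed default (the paper does not state the value used), and the SELU constants are the standard published ones.

```python
import math

# Standard SELU constants from the self-normalizing networks formulation.
SELU_ALPHA = 1.6732632423543772
SELU_SCALE = 1.0507009873554805

def leaky_relu(x, slope=0.01):
    """Leaky ReLU: identity for positive inputs, small slope otherwise."""
    return x if x > 0 else slope * x

def hyperbolic_tangent(x):
    """Hyperbolic tangent, bounded in (-1, 1)."""
    return math.tanh(x)

def selu(x):
    """Scaled exponential linear unit."""
    return SELU_SCALE * (x if x > 0 else SELU_ALPHA * (math.exp(x) - 1.0))
```

The hyperbolic tangent saturates for large inputs, whereas LRelu and Selu stay linear on the positive side; the sketch makes that difference easy to probe numerically.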
Fig. 3 Neural Network (NN) Model Performance Evaluation –2 hidden layers
3.3 Drag Polar Computation
The relationship between the drag coefficient and the lift coefficient is one of the critical factors that determine the performance of the aircraft under various scenarios [10, 11]. Detailed aerodynamic data are collected through flight tests and are owned by the aircraft manufacturers/OEMs. In our proposed model, aerodynamic equations are used to estimate the drag and lift coefficients (CD and CL), and machine learning algorithms are used to assess the difference (delta) with respect to the performance demonstrated by the aircraft in flight data. Figure 4 illustrates the fundamental relationships between the parameters used to compute CD:
• CD0(M) is the fundamental coefficient, calculated from the basic geometry of the whole aircraft, surface roughness and flying Mach number
• k1(M) depends on wing geometry and is constant up to a certain Mach number (the critical Mach number), beyond which it is a function of Mach number
• k2(M) pertains to the scenario of high CL and a rapid increase of drag
• CD,wave is an additional drag component when the aircraft is in the transonic region.
Due to various factors, such as aircraft aging, the aerodynamic properties of the flight surfaces change, requiring changes to the coefficients in the equations and additional constants. We have used neural networks to predict the delta error in the coefficient of drag. Figure 5 illustrates this scenario, where the CD computed from the ideal aerodynamic equations is corrected with a delta correction estimate predicted by the neural network (MathWorks® MATLAB R2019a Statistics and Machine Learning Toolbox™ was used). For this drag polar NN model, the Levenberg–Marquardt algorithm (LMA or just LM) was used as the training algorithm, with a neural network comprising one hidden layer of 27 units. The MSE in this case was about 1.5e-7, with a regression value of about 0.999. This compares well with the CD estimated from flight tests (ME < 0.08%). The fuel flow rate model (previous subsection) combined with this drag polar model provides a high-fidelity machine learning based fuel consumption model for the aircraft.
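The idea of correcting an analytical CD with a learned delta can be sketched in a few lines. Everything specific here is an assumption made for illustration: the parabolic polar CD = CD0 + k1·CL² and its coefficient values are hypothetical stand-ins for the relations of Fig. 4, the "flight" data are synthetic, and an ordinary least-squares line replaces the Levenberg–Marquardt-trained neural network used in the paper.

```python
def baseline_cd(cl, cd0=0.02, k1=0.045):
    """Hypothetical parabolic polar: CD = CD0 + k1 * CL^2 (illustrative values)."""
    return cd0 + k1 * cl * cl

def fit_linear_delta(cls, cd_observed):
    """Least-squares fit of delta = a + b * CL to the residual (observed - baseline)."""
    residuals = [cd - baseline_cd(cl) for cl, cd in zip(cls, cd_observed)]
    n = len(cls)
    mean_x = sum(cls) / n
    mean_r = sum(residuals) / n
    sxx = sum((x - mean_x) ** 2 for x in cls)
    sxr = sum((x - mean_x) * (r - mean_r) for x, r in zip(cls, residuals))
    b = sxr / sxx
    a = mean_r - b * mean_x
    return a, b

def corrected_cd(cl, a, b):
    """Analytical CD plus the learned delta correction."""
    return baseline_cd(cl) + (a + b * cl)

# Synthetic data: baseline plus a small linear drift standing in for aging effects.
cls = [0.2, 0.3, 0.4, 0.5, 0.6]
cd_observed = [baseline_cd(cl) + 0.001 + 0.002 * cl for cl in cls]
a, b = fit_linear_delta(cls, cd_observed)
```

The structure mirrors Fig. 5: the physics-based estimate is kept as is, and only the residual with respect to observed data is learned, which keeps the learned component small.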
Fig. 4 CD Computations
Fig. 5 CD computations augmented with corrections predicted through Machine Learning
3.4 Applications of Fuel Consumption NN Models
Over many years of airline operation, the actual fuel consumption varies between different instances (tails) of the same aircraft configuration, due to aircraft aging and engine performance degradation. One means of addressing the differences is to estimate correction factors based on the average deviations in fuel consumption experienced across tails. This is not only a computationally expensive and time-consuming process, but the corrections are also typically positioned at the mean of the deviations. While the traditional fuel computations in onboard avionics systems are based on aircraft state parameters and on aircraft and engine performance characteristics, there are other extraneous factors (e.g. quality of fuel) that affect the actual fuel consumption but are difficult to incorporate in the model. Neural network models such as the one discussed in this paper can be used to give tail-specific fuel flow predictions. This is akin to a “personalized” (i.e. specific to each aircraft tail) NN model that learns from the actual experience of the specific aircraft. Such a model would incorporate the consolidated impact of the various tail-specific extraneous factors, blended with the impact of changes in aircraft/engine performance characteristics over time.
4 Conclusions This paper discusses aircraft fuel consumption prediction through machine learning based fuel flow and drag polar models. The approach involved analysis of various factors impacting fuel flow, evaluation of various neural network topologies for prediction performance, and augmenting theoretical aerodynamic equations to match actual aircraft performance characteristics. Principal Component Analysis was used
to understand the interplay of the various factors on fuel flow. The performance of different activation functions in neural networks for fuel flow prediction was analyzed; Leaky Rectified Linear Unit and Scaled Exponential Linear Unit exhibited better performance. For drag polar predictions, the Levenberg–Marquardt algorithm was used to augment the coefficients computed through aerodynamic equations, and good prediction results were demonstrated. This study motivates promising directions for future research and incorporation in avionics systems. It should be noted that the parameters learned in the NN models depend on the aircraft type and the aircraft engine configuration. Potential applications of this model include fine-tuning the traditional fuel prediction models (which were constructed from aerodynamics, engine models and early flight test data), assessing the impact of factors such as aircraft aging and engine performance degradation on fuel consumption, and quantifying the extraneous parameter space pertaining to aircraft tail-specific factors (extraneous in traditional fuel consumption models).
References
1. Horiguchi, Y., et al.: Predicting fuel consumption and flight delays for low-cost airlines. In: 29th AAAI Conference on Innovative Applications (IAAI-17), pp. 4686–4693 (2017)
2. Turgut, E.T., et al.: Fuel flow analysis for the cruise phase of commercial aircraft on domestic routes. Aerosp. Sci. Technol. 37, 1–9 (2014)
3. Stolzer, A.J.: Fuel consumption modeling of a transport category aircraft: A flight operations quality assurance analysis. J. Air Transp. 8(2) (2003)
4. Li, G.: Machine learning in fuel consumption prediction of aircraft. In: 9th IEEE International Conference on Cognitive Informatics (ICCI), Beijing, pp. 358–363 (2010)
5. Baumann, S.: Using machine learning for data-based assessing of the aircraft fuel economy. In: 2019 IEEE Aerospace Conference, pp. 1–13 (2019)
6. Baumann, S., Klingauf, U.: Modeling of aircraft fuel consumption using machine learning algorithms. CEAS Aeronaut. J. 11, 277–287 (2020)
7. Martínez, A.M., Kak, A.C.: PCA versus LDA. IEEE Trans. Pattern Anal. Mach. Intell. 23(2), 228–233 (2001)
8. Wikipedia: Activation Function, https://en.wikipedia.org/wiki/Activation_function, last accessed 15 Jun 2020
9. Chati, Y.S., Balakrishnan, H.: A Gaussian process regression approach to model aircraft engine fuel flow rate. In: 2017 ACM/IEEE 8th International Conference on Cyber-Physical Systems (ICCPS), pp. 131–140 (2017)
10. Anderson, J.D., Jr.: Fundamentals of Aerodynamics. McGraw-Hill (2010)
11. Nita, M., Scholz, D.: Estimating the Oswald factor from basic aircraft geometrical parameters. In: Dtsch Luft- und Raumfahrtkongress, Germany, pp. 1–19 (2012)
Wildlife Video Captioning Based on ResNet and LSTM Abid Kapadi, Chinmay Ram Kavimandan, Chinmay Sandeep Mandke, and Sangita Chaudhari
Abstract Wildlife videos often have elaborate dynamics, and techniques for generating captions for wildlife clips involve both natural language processing and computer vision. Current techniques for video captioning have shown encouraging results. However, these techniques derive captions from video frames only, ignoring audio information. In this paper we propose to create natural-language video captions with the help of both audio and visual information. We use deep neural networks that combine convolutional neural networks (CNN) and recurrent neural networks (RNN). Experimental results on a corpus of wildlife clips show that fusing audio information greatly improves the quality of video description.
A. Kapadi · C. R. Kavimandan · C. S. Mandke (B) · S. Chaudhari
Ramrao Adik Institute of Technology, Nerul, Navi Mumbai, Maharashtra, India
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2021. V. K. Gunjan and J. M. Zurada et al. (eds.), Modern Approaches in Machine Learning and Cognitive Science: A Walkthrough, Studies in Computational Intelligence 956, https://doi.org/10.1007/978-3-030-68291-0_28

1 Introduction
Describing videos in natural language is easy for humans but an incredibly complex task for computers. The goal of automatic video captioning is to narrate the sequence of events occurring in a video. Video captioning, the task of automatically creating a natural-language summary of a video, is a critical problem for both Natural Language Processing and Computer Vision. Audio features can also play an important role in video captioning along with visual features. Therefore a good understanding of the audio signature is necessary to generate accurate captions for the videos. Wildlife video captioning refers to the description of videos captured in the natural habitats of animals so as to provide insight into their behavior. This information may be required by students, or by people who want to be informed about the ecosystem of a particular area. When a viewer watches a raw video, they might not be able to discern the various entities (flora/fauna). With captions, the viewer can receive this information while watching the video. For this purpose, we need to generate relevant, diverse and accurate captions for wildlife videos, using appropriate terms. A layperson might not know the various flora and fauna shown in the video or the activities they are performing. Captioning helps the layperson understand these things, while also making things easier for experts. Because captions make the association among sound, meaning, and words explicit, they may help children learn to read. Captions also help people with normal hearing understand a second language, and they let users follow videos when sound is not available. Most methods in the multimodal fusion domain attempt to jointly learn temporal features from different modalities. However, none of these approaches has been fine-tuned for the wildlife sub-domain. Another problem is that no attempt has been made to use temporal transformations of the various modalities at differing granularities of analysis. A video's temporal structure is necessarily complex, as a video usually includes temporally sequential events (e.g. a video where a lion stands up and walks a little, then starts roaring). There are clear temporal dependencies among those actions. Hence we hypothesize that understanding and aligning the high-level (global) and low-level (local) temporal transformations for many modalities is useful.
2 Related Work
Captioning is becoming an important part of videos as the deaf and hard-of-hearing population grows. The world is also shifting towards digitization, so organizations are increasingly inclined to train or teach their employees and students through videos. There has been a significant amount of research in the field of video captioning. Researchers have used attention-based mechanisms to learn where to focus in a video while generating captions. Park et al. [1] proposed a framework for describing videos that relies on reinforcement and adversarial techniques applied during inference. Its advantages are that it increases vocabulary and improves person-word correctness; however, the mechanism might introduce false inferences. Pei et al. [2] proposed a framework that uses an attention-based recurrent network as the main caption decoder, assisted by a memory-based decoder. The advantage of this mechanism is that it can carry context through the video. Dong et al. [3] observed that the skewed distribution of words is a major cause of recognition errors and missing detail in video captioning. Their paper introduces an information loss strategy that targets the connection between video-specific visual content and the corresponding representative words. Their model, HAIL, outperforms state-of-the-art video captioning methods, with an improvement of 18.2% in CIDEr on the MSVD dataset. Zhao et al. [4] proposed a co-attention model based recurrent neural network (CAM-RNN), in which the co-attention model (CAM) encodes the image and text features and the RNN decodes them to produce the video caption. The advantage of this method is that the captions produced by CAM-RNN contain fewer errors and remain effective as complexity increases. Yan et al. [5] proposed a spatial-temporal attention mechanism within an encoder-decoder neural network for video captioning. The advantage is that the generated captions retain the qualities of human language. However, video diversity reduces efficiency, and misspellings degrade the results. Li et al. [6] proposed a multitask reinforcement learning method to train a video captioning model in an end-to-end manner. The advantage of this method is that it uses relevant features instead of generic features. Xu et al. [7] proposed the Joint Event Detection and Description Network (JEDDi-Net). This mechanism uses three-dimensional convolution to extract video appearance and motion features, which are passed sequentially to the temporal event proposal network and the captioning network. The advantage of this method is that it is capable of aligning and fusing both subjective and objective context of different modalities for video understanding and sentence generation. Long et al. [8] proposed a multi-faceted attention architecture that jointly uses multiple dissimilar forms of input. The advantage is that it can deal with semantic attributes and noise. Sun et al. [9] proposed a model that performs multiple tasks: video classification and captioning. The method combines classification, 3D convolution and LSTM. It gives better captions than end-to-end training on a video captioning dataset and also reduces the time required for training. The disadvantages are that there is no attention layer and that a large number of training videos is required. Wang et al. [10] proposed a hierarchically aligned cross-modal attentive network (HACA) to learn and align global and local contexts among the different modalities of the video. After studying the existing methods, we found that most existing systems use either a CNN or an RNN. CNNs work well with data that has spatial relationships, such as images. RNNs are used for sequential data such as text and speech, and are not well suited to image input. A hybrid RNN and CNN approach may be superior when the data is suitable for a CNN but also has temporal characteristics, such as time series, that can be identified and exploited by the RNN component.
3 Proposed Model
We aim to create a system that provides relevant captions for wildlife videos, using a combination of a ResNet model to process the frame images and a VGGish model to process the audio. We use both a CNN and an RNN. The CNN is used for image classification, since its parameters can be tuned readily as the number of layers increases and since a CNN can reduce the number of parameters without losing model quality. An RNN retains information from earlier inputs over time, which makes it valuable for video captioning, where captions depend on a sequence of events; the Long Short-Term Memory (LSTM) variant of the RNN provides this memory. ResNet is a neural architecture that reduces complexity and addresses the degradation problem while maintaining good performance. With reduced complexity, fewer parameters need to be trained, so less time is spent on training. VGGish is a pretrained feature extractor. We use the VGGish model because it produces a 128-dimensional embedding for every second of the audio clip, and these embeddings can be used in our own model. The methodology is as follows.
Extracting features from video: To use visual and audio cues, we use pretrained CNN models to extract deep visual features and deep audio features respectively. More specifically, we use the ResNet model for image classification and the VGGish model for audio classification, as shown in Fig. 1.
Fig. 1 Overview of captioning framework
3.1 Convolutional Neural Networks
Steps Performed in CNN
Following are the steps performed in CNN:
1. The image passes through a series of convolutional, nonlinear, pooling and fully connected layers, and then generates the output.
2. The first layer is the convolution layer. The image is represented as a matrix, and a filter (neuron) moves along the input image, multiplying its values with the original pixel values. The products are summed, giving a single number for each position.
3. The nonlinear layer is applied after every convolution operation. It has an activation function, which introduces the nonlinear property. Without this property, the network would not be able to model the class label.
4. The pooling layer follows the nonlinear layer. It works on the width and height of the image and performs a downsampling operation on them, so the image volume is reduced. This means that if some features (such as boundaries) have already been identified in the previous convolution operation, the image is compressed into a less detailed one.
5. After the series of convolutional, nonlinear and pooling layers, a fully connected layer is attached. This layer takes the output of the convolutional layers. Attaching a fully connected layer to the end of the network produces an N-dimensional vector, where N is the number of classes from which the model selects the desired class.
The convolution layer is an important part of the CNN. It varies according to the configuration and consists of a filter that is combined with a local region of the input image. The operation is given in Eq. (1), where x is the input image, w is the filter matrix, b is the bias parameter, and $y_{mn}$ is the output at position (m, n).

$$y_{mn} = \sum_{i}\sum_{j} w_{ij}\, x_{m+i,\,n+j} + b \tag{1}$$
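The convolution, nonlinearity and pooling steps listed above can be traced on a tiny example. The sketch below is plain Python with no deep learning library; the 5×5 image, the 2×2 filter and the choice of ReLU and 2×2 max pooling are assumptions made for illustration, not details taken from the ResNet model used in the paper.

```python
def conv2d(image, kernel, bias=0.0):
    """Slide the filter over the image; each output is a sum of products plus a bias."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [[sum(kernel[i][j] * image[m + i][n + j]
                 for i in range(kh) for j in range(kw)) + bias
             for n in range(out_w)]
            for m in range(out_h)]

def relu(fmap):
    """Nonlinear layer: keep positive responses, zero out the rest."""
    return [[max(0.0, v) for v in row] for row in fmap]

def max_pool2x2(fmap):
    """Pooling layer: downsample by taking the maximum of each 2x2 block."""
    return [[max(fmap[r][c], fmap[r][c + 1], fmap[r + 1][c], fmap[r + 1][c + 1])
             for c in range(0, len(fmap[0]) - 1, 2)]
            for r in range(0, len(fmap) - 1, 2)]

image = [[1, 2, 3, 0, 1],
         [0, 1, 2, 3, 1],
         [1, 0, 1, 2, 0],
         [2, 1, 0, 1, 1],
         [0, 2, 1, 0, 2]]
kernel = [[1, 0], [0, -1]]  # hypothetical 2x2 filter
response = conv2d(image, kernel)
features = max_pool2x2(relu(response))
```

The 5×5 input becomes a 4×4 response map after convolution and a 2×2 feature map after pooling, which is the volume reduction described in step 4.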
3.2 Recurrent Neural Networks
Steps Performed in RNN
Following are the steps performed in RNN:
1. Initialize the network layers and the initial hidden state with the same weights and activation function. The dimension of the hidden state depends on the dimension of the recurrent neural network.
2. After initialization, loop through the inputs, passing each word and the hidden state into the RNN. The RNN returns the output and an updated hidden state.
3. This repeats for all the steps until the whole input has been processed.
4. The output is passed to the feed-forward layer, which returns a prediction.
5. The error is calculated as the difference between the output generated by the RNN model and the expected output.
6. Finally, the error is backpropagated through all the previous steps to update the weights.
The weight update is given in Eqs. (2) and (3), where $V_t$ is the update at step t, β is the momentum coefficient, $\nabla_W$ is the gradient with respect to the weights, LO is the loss function, and α is the learning rate, which is reduced by a factor of 0.15 after every 10 epochs.

$$V_t = \beta V_{t-1} + \alpha \nabla_W LO(W, X, y) \tag{2}$$

$$W = W - V_t \tag{3}$$
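A minimal sketch of the update in Eqs. (2) and (3) with a stepwise learning-rate decay is shown below. Two points are assumptions: "reduced by a factor of 0.15" is read here as multiplying the rate by 0.15 every 10 epochs, and β = 0.9 is a common default rather than a value given in the paper.

```python
def learning_rate(alpha0, epoch, factor=0.15, every=10):
    """Step decay: multiply the rate by `factor` after each block of `every` epochs."""
    return alpha0 * (factor ** (epoch // every))

def momentum_step(w, v, grad, alpha, beta=0.9):
    """One momentum update: V_t = beta * V_{t-1} + alpha * grad, then W = W - V_t."""
    v_new = [beta * vi + alpha * gi for vi, gi in zip(v, grad)]  # Eq. (2)
    w_new = [wi - vi for wi, vi in zip(w, v_new)]                 # Eq. (3)
    return w_new, v_new

# One step on the toy loss L(w) = w^2, whose gradient is 2w.
w, v = [1.0], [0.0]
w, v = momentum_step(w, v, grad=[2.0 * w[0]], alpha=learning_rate(0.1, epoch=0))
```

Because the velocity accumulates past gradients, successive steps in a consistent direction grow larger, while the decaying learning rate damps the updates late in training.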
3.3 Hierarchical Attentive Encoder
The hierarchical attentive encoder consists of two LSTMs. The input to the low-level LSTM is a sequence of temporal features. The high-level LSTM operates at a lower temporal resolution and runs one step every s time steps. It thus learns the temporal transitions of the segmented feature chunks of size s. Furthermore, an attention mechanism is employed between these two LSTMs. It learns the context vector of the low-level LSTM’s outputs over the current feature chunk, which is then taken as the input to the high-level LSTM at step i. Since we use both visual and audio features, there are two hierarchical attentive encoders (v for visual features and a for audio features). Hence four sets of representations are learned in the encoding stage: high-level and low-level visual feature sequences, and high-level and low-level audio feature sequences. This is depicted in Fig. 1. The encoder is described by Eqs. (4) and (5), where $e_h$ denotes the high-level LSTM, whose output and hidden state at step i are $o_i^{e_h}$ and $h_i^{e_h}$; $e_l$ is the low-level encoder LSTM, whose output and hidden state at step i are $o_i^{e_l}$ and $h_i^{e_l}$; and $\alpha_{ik}$ are the attention weights over the low-level outputs in chunk i.

$$f_i^{e_h} = \sum_{k=s(i-1)+1}^{si} \alpha_{ik}\, o_k^{e_l} \tag{4}$$

$$o_i^{e_h},\, h_i^{e_h} = e_h\!\left(f_i^{e_h},\, h_{i-1}^{e_h}\right) \tag{5}$$
3.4 Globally and Locally Aligned Cross-Modal Attentive Decoder
In the decoding stage, the representations of different modalities at the same granularity are aligned separately with individual attentive decoders. That is, one decoder is employed to align the high-level features and learn a high-level (global) cross-modal embedding. Since the high-level features are the temporal transitions of larger chunks and focus on long-range contexts, we call the corresponding decoder the global decoder (gd). Similarly, the companion local decoder (ld) is used to align the low-level (local) features, which attend to fine-grained, local dynamics. At each time step t, the attentive decoders learn the relevant visual and audio contexts using their attention mechanisms. The attention-weighted contexts over the previous hidden states of the local and global decoders are given by Eqs. (6) and (7), where $h_n^{ld}$ and $h_n^{gd}$ are the decoder hidden states at step n and $\alpha_{tn}^{ld}$ and $\alpha_{tn}^{gd}$ are the corresponding attention weights.

$$c_t^{ld} = \sum_{n=1}^{t-1} \alpha_{tn}^{ld}\, h_n^{ld} \tag{6}$$

$$c_t^{gd} = \sum_{n=1}^{t-1} \alpha_{tn}^{gd}\, h_n^{gd} \tag{7}$$

Each decoder has a cross-modal attention mechanism that learns the attention over the different modalities. This module attends to the various modalities and produces its output according to Eq. (8), where $c_t^{vi}$, $c_t^{dc}$ and $c_t^{au}$ are the visual, decoder and audio contexts at time t respectively, and $W^{vi}$, $W^{au}$, $W^{dc}$ and $\beta_t^{vi}$, $\beta_t^{au}$, $\beta_t^{dc}$ are learnable.

$$c_t^{e} = \tanh\left(\beta_t^{vi} W^{vi} c_t^{vi} + \beta_t^{au} W^{au} c_t^{au} + \beta_t^{dc} W^{dc} c_t^{dc}\right) \tag{8}$$
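The cross-modal fusion of Eq. (8) amounts to a weighted sum of projected context vectors passed through tanh. The sketch below uses plain Python lists; the function names, the identity projection matrices and the fixed β values are placeholders for illustration, whereas in the model these quantities are learned.

```python
import math

def matvec(W, v):
    """Multiply matrix W (list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def fuse_contexts(contexts, weights, betas):
    """tanh of the beta-weighted sum of projected modality contexts, as in Eq. (8)."""
    dim = len(next(iter(weights.values())))
    total = [0.0] * dim
    for name, c in contexts.items():
        projected = matvec(weights[name], c)
        total = [t + betas[name] * p for t, p in zip(total, projected)]
    return [math.tanh(t) for t in total]

identity = [[1.0, 0.0], [0.0, 1.0]]
contexts = {"vi": [1.0, 0.0], "au": [0.0, 1.0], "dc": [0.0, 0.0]}  # visual, audio, decoder
weights = {"vi": identity, "au": identity, "dc": identity}
betas = {"vi": 0.5, "au": 0.5, "dc": 0.0}
fused = fuse_contexts(contexts, weights, betas)
```

Setting a modality's β to zero removes its contribution entirely, which is how such a mechanism can fall back on audio when the relevant animal is out of frame, or on vision when the clip is silent.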
4 Experimental Setup
4.1 Dataset Used
The datasets already available for video captioning include the Microsoft Video Description Dataset (MSVD) and MSR Video To Text (MSR-VTT) [11]. Being general-purpose video datasets, however, they include only a small number of wildlife videos. Moreover, the wildlife videos that are available include a fair amount of background noise and human interference. Thus, we cannot use these datasets for training our model. The other wildlife-specific datasets available also suffer from background noise, missing sound, or human interference. To address this, we have compiled a dataset of wildlife clips from various videos. While extracting the clips from the videos, care was taken to avoid including background noise. The clips are 6–10 s each. Each clip has been annotated manually with one or more captions using a large vocabulary. The manually generated captions are concise and accurate and describe the clips well. The clips are diverse with respect to the locations, the wildlife and the actions involved. Naturally occurring sounds have also been duly annotated. This dataset is well suited to our model because it does not contain videos with human interference or background noise.
4.2 Preprocessing
We have performed the evaluation on the dataset we compiled, which consists of wildlife clips with rich sound and minimal interference from human voice or music. Every video has 2–5 manually compiled, annotated baseline captions. To generate the visual features, the pretrained ResNet model is applied to frames of the video sampled at 3 frames per second. For the sound features, the raw WAV files are processed with the pretrained VGGish model.
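The frame-sampling step can be sketched as an index computation. The function name and the 30 fps source rate are assumptions for the example; the 3 frames-per-second sampling rate comes from the text above, and the cap of 50 frames mirrors the maximum frame count stated in the training details. Decoding the frames and running ResNet or VGGish is outside the scope of this sketch.

```python
def sample_frame_indices(num_frames, native_fps, target_fps=3.0, max_frames=50):
    """Indices of the frames to keep when resampling a clip to target_fps."""
    step = native_fps / target_fps
    indices = []
    position = 0.0
    while int(position) < num_frames and len(indices) < max_frames:
        indices.append(int(position))
        position += step
    return indices

# An 8-second clip recorded at an assumed 30 frames per second.
indices = sample_frame_indices(num_frames=240, native_fps=30.0)
long_clip = sample_frame_indices(num_frames=3000, native_fps=30.0)
```

For clips of 6–10 s, sampling at 3 frames per second yields roughly 18 to 30 frames, so the cap of 50 only matters for longer inputs.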
4.3 Evaluation Metrics
We used three automated evaluation metrics, BLEU, ROUGE-L, and CIDEr-D, computed with the standard MS-COCO server evaluation code. BLEU: one of the earliest measures of similarity between two sentences, in this case the expected caption and the generated caption [12]. ROUGE-L: another method of comparing two sentences, based on the longest common subsequence. It favors longer sentences [12]. CIDEr-D: a relatively newer technique for comparing two sentences, based on common n-grams [12].
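To make the longest-common-subsequence idea behind ROUGE-L concrete, here is a simplified single-reference sketch. It is not the MS-COCO evaluation code: tokenization is a plain whitespace split, and the weighting β = 1.2 is an assumed default for the F-measure.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    table = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, 1):
        for j, y in enumerate(b, 1):
            if x == y:
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])
    return table[len(a)][len(b)]

def rouge_l(reference, candidate, beta=1.2):
    """LCS-based F-measure between a reference and a candidate caption."""
    ref, cand = reference.lower().split(), candidate.lower().split()
    lcs = lcs_length(ref, cand)
    if lcs == 0:
        return 0.0
    recall = lcs / len(ref)
    precision = lcs / len(cand)
    return (1 + beta ** 2) * precision * recall / (recall + beta ** 2 * precision)

score = rouge_l("a lion is standing and roaring loudly", "a lion is roaring")
```

In the example the candidate is a subsequence of the reference, so precision is 1 while recall is 4/7; the score therefore rewards the candidate for correctness but penalizes the missing words.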
4.4 Training Details
All hyperparameters are tuned on the validation set. The maximum frame count is 50, and the maximum number of audio segments is 20. The low-level encoder is a bidirectional LSTM with hidden dimension 512 (128 for the audio hierarchical attentive encoder, HAE), and the high-level encoder is an LSTM with hidden dimension 256 (64 for the audio HAE), with a chunk size of 10 (4 for the audio HAE). The global decoder is an LSTM with hidden dimension 256, and the local decoder is an LSTM with hidden dimension 1024. The maximum number of decoding steps is 16. We use word embeddings of size 512. We also apply dropout for regularization, with a value of 0.2. The gradients are clamped to the range [−10, 10]. All parameters are initialized from a uniform distribution over the range [−0.08, 0.08]. The Adadelta optimizer is used with a batch size of 64. The learning rate is initially set to 1 and is subsequently decreased by a factor of 0.5 if the current CIDEr score does not surpass the previous best for 4 epochs. The maximum number of epochs is set to 50 to avoid overfitting, and the training data is shuffled at each epoch. Scheduled sampling is used for training the models. For testing the models, we use beam search with a beam size of 5.
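Two of the training rules above, gradient clamping to [−10, 10] and halving the learning rate after 4 epochs without a new best CIDEr score, can be sketched as follows. The class and function names are hypothetical, the per-epoch scores are made up, and resetting the patience counter after each reduction is one possible reading of the rule.

```python
def clamp_gradients(grads, limit=10.0):
    """Clamp every gradient component to the range [-limit, limit]."""
    return [max(-limit, min(limit, g)) for g in grads]

class PlateauSchedule:
    """Multiply the learning rate by `factor` after `patience` epochs without a new best score."""

    def __init__(self, lr=1.0, factor=0.5, patience=4):
        self.lr, self.factor, self.patience = lr, factor, patience
        self.best = float("-inf")
        self.stale_epochs = 0

    def step(self, score):
        if score > self.best:
            self.best = score
            self.stale_epochs = 0
        else:
            self.stale_epochs += 1
            if self.stale_epochs >= self.patience:
                self.lr *= self.factor
                self.stale_epochs = 0  # assumed: restart the count after a reduction
        return self.lr

schedule = PlateauSchedule()
cider_per_epoch = [0.50, 0.55, 0.54, 0.53, 0.55, 0.52, 0.56]  # made-up validation scores
rates = [schedule.step(score) for score in cider_per_epoch]
clamped = clamp_gradients([-25.0, 3.0, 12.5])
```

Tying the schedule to CIDEr rather than to the training loss means the learning rate reacts to the metric that is actually reported, at the cost of one validation pass per epoch.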
5 Results and Discussion
We have performed testing on annotated videos. We compared the generated captions with the annotations using BLEU, ROUGE-L and CIDEr-D, as shown in Table 1. We also tested multiple baselines to verify the effectiveness of the components of our video captioning model. To verify the benefit of the deep audio features in captioning, we compared our model with models that do not use audio, showing that better captions are generated when audio features are also used. Figure 3 highlights the importance of using the audio features to recognize and explain a video: while the giraffe is seen in the frame, the buzzing sound belongs to insects that are not in the frame. The model is very effective at recognizing the actions performed by the wildlife. It is also adept at recognizing the sounds in the clips and combining these two aspects to give relevant and accurate captions. As both the actions and the sounds are described in the generated captions, our model has an edge over existing models that ignore audio features. Figure 2 shows the variation of loss with the number of epochs for the training set. The loss fluctuates as the number of epochs increases and becomes progressively smaller over time. Figure 3 shows the captions generated by the algorithm on test videos.

Table 1 Comparison with existing results
Method           BLEU   CIDEr   ROUGE-L
S2VT             39.6   66.7    67.5
VT-RNN           41.9   53.2    69
CNN-RNN          42.4   54.3    69.4
SA-LSTM          45.3   74.9    64.2
Proposed model   46.2   76.5    68.2
Fig. 2 Graph for loss versus epoch for training set
(a) A giraffe is eating the leaves of a tree while insects are making buzz noise.
(b) A crocodile is lying on the ground.
(c) A tiger is sitting and roaring.
(d) A lion is standing and roaring loudly
Fig. 3 Actual algorithm generated description
6 Conclusions
Wildlife video captioning refers to the description of videos captured in the natural habitats of animals so as to provide insight into their behavior. This information may be useful to students or to people who want to learn about the ecosystem of a particular area. We implement a generic video captioning architecture that learns cross-modal attention both globally and locally. Attention to audio features enables the model to generate captions for wildlife that is heard but not in the frame. This enables the model to generate richer and more accurate captions than the models already in use. The video captioning problem is not yet solved, as the best performance so far is still far from human-level captioning. Here, we identify several possible future directions, according to discussions in the literature and progress in related fields.
Enhancement of Degraded Images via Fuzzy Intensification Model
Shaik Fayaz Begum and P. Swathi
Abstract Dusty weather conditions substantially reduce the overall quality of captured images, preventing useful image information from being detected. The proposed technique uses a simple membership function to map the pixels of a given channel to the range zero to one, intensification operators that are applied according to different thresholds, and a new tuning method designed specifically for this technique. Fuzzy theory provides a problem-solving approach that bridges the precision of classical mathematics and the inherent imprecision of the real world. Fuzzy logic is a form of many-valued logic; it deals with approximate reasoning rather than fixed and exact reasoning. This research evaluates the processing capability of the proposed method, and the results show that it is able to filter a variety of degraded images.
S. F. Begum (corresponding author) · P. Swathi
Annamacharya Institute of Technology and Sciences, Rajampet, Andhra Pradesh, India
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2021 V. K. Gunjan and J. M. Zurada et al. (eds.), Modern Approaches in Machine Learning and Cognitive Science: A Walkthrough, Studies in Computational Intelligence 956, https://doi.org/10.1007/978-3-030-68291-0_29
1 Introduction
Image enhancement is the process of improving image quality so that further details can be perceived in the image. An image can be degraded for many reasons, such as noise, lighting and poor photographic technique. This can cause poor contrast and overexposed or underexposed image regions [1]. In such conditions, captured images often show unwanted artifacts such as poor contrast, faded colors or color casts. Various approaches have therefore been suggested for handling these effects and recovering coherent results with appropriate colors. These approaches range from simple to complex because of differences in the processing principles used. In this study, a technique is implemented that uses tuned fuzzy intensification operators to rapidly process poor-quality images taken in dusty weather conditions. Intensive work was carried out to assess
the processing efficiency of the proposed technique, and the results demonstrated its efficiency in filtering numerous degraded images [2]. A captured image contains ambiguity because of poor and non-uniform illumination of the object and the non-linearity of image acquisition. This ambiguity appears in the image as imprecise boundaries and imprecise color values. The sharpness and intensity of video images taken in dusty conditions are also degraded. This paper proposes a new method of improving images. First, the degraded image is transformed into the fuzzy domain and a global Pal fuzzy enhancement is applied; then a local band-limited histogram equalization is applied to enhance the local detail in the spatial domain; finally, the POSHE (partially overlapped sub-block histogram equalization) algorithm is applied to enhance the details [3]. The results of the analysis show that this method not only increases the intensity of dusty images efficiently, but also improves the visibility of edge information and gives the image a good visual appearance. Images of outdoor scenes captured in inclement weather suffer from poor contrast. Under poor visibility, the atmosphere substantially scatters the light that reaches a camera. The resulting decay in contrast varies across the scene and is exponential in the depth of the scene points. For this reason, traditional space-invariant image processing methods are not adequate to remove weather effects from images. We introduce a physics-based model that describes the appearance of scenes under uniform poor weather conditions. Changes in scene-point intensities under different weather conditions provide constraints for identifying depth discontinuities in the scene as well as for estimating scene structure [4]. A simple algorithm for restoring scene contrast is then provided.
In contrast to past techniques, our weather removal technique does not require a priori scene structure, scene reflectance distributions, or detailed knowledge of the specific weather condition. All of the methods described in this article are effective under a wide variety of weather conditions such as haze, mist, fog and many other aerosol conditions. Our methods can also be applied to gray-scale, RGB colour, multispectral and even infrared images [5]. We also extend our techniques to restore the contrast of scenes with moving objects captured using a surveillance camera. Images taken in dust storm conditions also suffer from reduced visibility and undesirable colour cast effects. In these cases, conventional visibility restoration methods typically cannot restore images effectively because of poor estimation of the haze thickness and the persistence of colour cast problems. In this paper, we present a novel Laplacian-based visibility restoration approach to effectively solve inadequate haze thickness estimation and to mitigate colour cast problems. In this way, a high-quality image with clear visibility and vivid colour can be produced. Experimental results from qualitative and quantitative evaluations indicate that the proposed approach can significantly enhance images captured during bad weather conditions and delivers results superior to those of most other state-of-the-art approaches.
2 Literature Survey
This section offers a brief survey of the literature on the use of fuzzy intensification operators. Most of the work related to the proposed technique has been reviewed to understand the problem. Various spatial-domain methods exist, such as thresholding, filtering, gray-level transformation and histogram equalization. Although they are easy to perform and not complicated in practice, they do not produce satisfactory end results, are not very reliable and do not preserve perceptual quality [6]. Fuzzy logic is a scheme for representing knowledge; it processes human knowledge in the form of fuzzy if-then rules. Fuzzy image processing is very effective in enhancing image contrast, and fuzzy methods can deal effectively with the vagueness and ambiguity of an image [7]. The problem of enhancing the contrast of images enjoys much attention and spans a wide gamut of applications, ranging from improving the visual quality of photographs acquired under poor illumination to medical imaging [8]. It is, however, extremely difficult to establish the conditions under which a specific filter should be chosen, because these conditions can only be defined vaguely for some parts of an image. A filtering system should therefore be able to deal with unclear and ambiguous information [9]. Finally, it is important to develop an innovative technique that uses fuzzy intensification operators, as it can also be used in many areas of digital image processing and image recognition.
3 Methodology
Fuzzy image processing is the collection of all approaches that understand, represent and process images, their segments and their features as fuzzy sets. The representation and processing depend on the technique chosen and on the problem to be solved [10]. Work in this area has been growing in the last few years because of the rise in sandstorms and dusty weather. Because different researchers use different processing principles, the complexity of the implemented methods varies. A detailed representation of the whole processing system is therefore provided in Fig. 1. In detail, the proposed technique starts with zeta (ζ), a tuning parameter used to control the color fidelity of the processed image. The degraded image is then read and decomposed into its basic channels, namely red, green and blue (RGB). Two things are required to determine the intensification operators. First, the parameter tau (τ), which specifies the threshold limits of the operators, must be estimated. The use of τ allows the operators to process the image pixels. Second, a membership function is required because it sets the pixel values of a given channel to the standard range of zero to one. This function must be implemented in such a way that the intensification operators can work well.
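The two ingredients named above, a membership function and a thresholded intensification operator, can be illustrated with a small sketch. It assumes a min-max membership function and a textbook-style intensification operator generalised to a threshold τ (at τ = 0.5 it reduces to the classical form 2μ² and 1 − 2(1 − μ)²). The function names are hypothetical, the operator is not claimed to be the exact one used by the authors, and the ζ tuning step is left out because its formula is not given here.

```python
def membership(channel):
    """Map a flat list of pixel values of one channel to the range [0, 1]."""
    lo, hi = min(channel), max(channel)
    if hi == lo:                      # flat channel: nothing to stretch
        return [0.0 for _ in channel]
    return [(p - lo) / (hi - lo) for p in channel]


def intensify(mu, tau=0.5):
    """Push memberships below tau towards 0 and those above tau towards 1.

    Assumed operator, continuous at mu == tau; tau must lie strictly
    between 0 and 1.
    """
    if not 0.0 < tau < 1.0:
        raise ValueError("tau must be in (0, 1)")
    if mu <= tau:
        return mu * mu / tau
    return 1.0 - (1.0 - mu) ** 2 / (1.0 - tau)


def enhance_channel(channel, tau=0.5):
    """Fuzzify one colour channel and apply the intensification operator."""
    return [intensify(mu, tau) for mu in membership(channel)]


# Hypothetical 8-bit pixel values of one channel.
enhanced = enhance_channel([0, 64, 128, 255], tau=0.5)
```

For an RGB image the same function would be applied to the R, G and B channels separately before they are concatenated again.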
Fig. 1 Flowchart for the fuzzy intensification of degraded images
Once these operations have been applied, the output obtained from each channel is tuned using the implemented tuning procedure, which can be represented as:
4 Experimental Investigations
An image file (.jpg, .jpeg or .png) is read into MATLAB and processed according to the flowchart in Fig. 1 using the algorithm and equations of the proposed technique; in this work a dusty image is considered for processing. The input image, shown in Fig. 2, is taken from a database. Because this image is dusty, a tuning parameter is used to improve the intensity levels, and the image is then decomposed into three layers, namely R, G and B. A thresholding parameter is then applied to compute the membership function for the R, G and B layers. The resulting images are processed by the intensification operators using the tuning parameter. The R, G and B layers are concatenated to give the processed output image for tuning values of 0.3 and 0.4, as shown in Figs. 3 and 4. In the same way, the input image is subjected to the same process for the tuning values 0.5 and 0.6, as shown in Figs. 5 and 6 respectively.
Fig. 2 Input image
Further, all the images are analysed for statistical values such as perimeter, circularity, solidity and eccentricity, which are presented in Table 1. It is observed from Table 1 that the solidity, which measures the density of an object, is constant, whereas the perimeter increases up to zeta = 0.5 and then starts to decrease; from this, the optimal and maximum value of the tuning parameter is taken to be 0.6. The case is similar for circularity, which is a common shape factor of the object. Eccentricity, the ratio of the distance between the foci of the ellipse fitted to the object to its major axis length, increases as zeta is increased. Eccentricity must be nearer to 1.
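The shape descriptors discussed above have simple closed forms, sketched here with the commonly used definitions: circularity 4πA/P², solidity as area over convex-hull area, and eccentricity of the fitted ellipse. That the reported values were computed with exactly these definitions is an assumption, and the function names are hypothetical.

```python
import math


def circularity(area, perimeter):
    """4*pi*A / P**2: equals 1 for a perfect circle, smaller otherwise."""
    return 4.0 * math.pi * area / perimeter ** 2


def solidity(area, convex_area):
    """Fraction of the convex hull that the object fills."""
    return area / convex_area


def eccentricity(major_axis, minor_axis):
    """Focal distance divided by major axis length of the fitted ellipse.

    0 for a circle, approaching 1 as the ellipse degenerates to a line.
    """
    ratio = minor_axis / major_axis
    return math.sqrt(1.0 - ratio * ratio)


# Sanity checks on simple shapes (not data from the chapter).
unit_circle = circularity(math.pi, 2.0 * math.pi)   # circle of radius 1
unit_square = circularity(1.0, 4.0)                 # square of side 1
ellipse_10_by_6 = eccentricity(10.0, 6.0)
```

A unit square gives a circularity of π/4, about 0.785, which is of the same order as the circularity values reported in Table 1.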
Fig. 3 Output image for a tuning value of 0.3
5 Conclusion
The proposed technique produces results with refined colours and clear features. It performs well in providing appropriate colors and revealing fine details in the processed images. The technique can be extended to process other degraded images taken in hazy, foggy or misty weather conditions. The results verify that the output image is better than the original image. This work was carried out using MATLAB R2015b.
Fig. 4 Output image for a tuning value of 0.4
Fig. 5 Output image for a tuning value of 0.5
Fig. 6 Output image for a tuning value of 0.6
Table 1 Comparative results
S. No.  Parameter      Input    Output (0.3)  Output (0.4)  Output (0.5)  Output (0.6)
1       Perimeter      1632     1648          1649          1650          1620
2       Circularity    0.7706   0.7693        0.7676        0.7717        0.7658
3       Solidity       1        1             1             1             1
4       Eccentricity   0.6515   0.6621        0.6741        0.6802        0.6866
References
1. Yu, D., Ma, L.H., Lu, H.Q.: Normalized SI correction for hue-preserving color image enhancement. In: 6th International Conference on Machine Learning and Cybernetics, pp. 1498–1503 (2007)
2. Al-Ameen, Z.: Visibility enhancement for images captured in dusty weather via tuned trithreshold fuzzy intensification operators. Int. J. Intell. Syst. Appl. (IJISA) 8(8), 10–17 (2016). https://doi.org/10.5815/ijisa.2016.08.02
3. Hanmandlu, M., Tandon, S.N., Mir, A.H.: A new fuzzy logic based image enhancement. In: 34th Rocky Mountain Symposium on Bioengineering, Dayton, Ohio, USA, pp. 590–595 (1997)
4. Khan, M.F., Khan, E., Abbasi, Z.A.: Multi segment histogram equalization for brightness preserving contrast enhancement. Adv. Comput. Sci., Eng. & Appl., Springer, 193–202 (2010)
5. Verma, O.P., Kumar, P., Hanmandlu, M., Chhabra, S.: High dynamic range optimal fuzzy color image enhancement using artificial ant colony system. Appl. Soft Comput. 12(1), 394–404 (2012)
6. Hauli, Yang, H.S.: Fast and reliable image enhancement using fuzzy relaxation technique. IEEE Trans. Sys. Man. Cybern. SMC 19(5), 1276–1281 (1989)
7. Hanmandlu, M., Verma, O.P., Kumar, N.K., Kulkarni, M.: A novel optimal fuzzy system for color image enhancement using bacterial foraging. IEEE Trans. Inst. Meas 58, 2867–2879 (2009)
8. Raju, G., Nair, M.S.: A fast and efficient color image enhancement method based on fuzzy-logic and histogram. Int. Elsevier J. Electron. Commun. 68(3), 237–243 (2014)
9. Hasikin, K., Isa, N.A.M.: Enhancement of the low contrast image using fuzzy set theory. In: 2012 UKSim 14th International Conference on Computer Modelling and Simulation, Cambridge, pp. 371–376 (2012). https://doi.org/10.1109/uksim.2012.60
10. Kaur, T., Sidhu, R.K.: Performance evaluation of fuzzy and histogram based color image enhancement. Procedia Comput. Sci. J. 58, 470–477 (2015)
Design and Investigation of the Performance of Composite Spiral Antenna for Direction Finding Applications
K. Prasad and P. Kishore Kumar
Abstract Antennas are essential elements in the safety and security of a nation. In wartime it is necessary to identify enemy targets. To obtain the frequency and polarization of enemy signals, the defense antennas mounted on airborne vehicles must be able to operate over a wide bandwidth with circularly polarized radiation patterns. These circularly polarized (CP) antennas are suitable at the receiving end for detecting linearly polarized signals oriented in any direction. This paper examines the design and development of a composite spiral antenna covering a frequency range of 0.5–18 GHz with dimensions of 90 mm × 90 mm. The spiral antenna is well suited to microwave direction finding because it exhibits a wide frequency band. The square spiral antenna is a better choice than a round spiral because it has a reduced aperture. The spiral antenna has inherent features such as ultra-wide bandwidth, circular polarization and flush-mounting capability, which make it a good choice for direction finding applications. The composite spiral circuit is designed using a computer program written in MATLAB. HFSS, a high-performance full-wave EM field simulator, is used to analyze the performance of the spiral antenna in terms of its input reflection coefficient, radiation pattern, gain and axial ratio. After the antenna design is complete, its various parts are fabricated and assembled with precision. The assembled antenna is tested for various characteristics using a network analyzer and an anechoic chamber. The measured results of the tested spiral antenna are presented in this paper.
Keywords Antenna design · Spiral antenna · HFSS · MATLAB · Radiation characteristics
K. Prasad (corresponding author), Department of ECE, Annamacharya Institute of Technology and Sciences, Rajampet, Andhra Pradesh, India
P. Kishore Kumar, Department of ECE, Ravindra College of Engineering for Women, Kurnool, Andhra Pradesh, India
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2021 V. K. Gunjan and J. M. Zurada et al. (eds.), Modern Approaches in Machine Learning and Cognitive Science: A Walkthrough, Studies in Computational Intelligence 956, https://doi.org/10.1007/978-3-030-68291-0_30
1 Introduction
Antennas have played a major role in communication systems ever since Heinrich Hertz developed the first dipole antenna. In the present technical environment, a variety of applications require antennas with wide bandwidth, small size and wide coverage with high gain. These characteristics can be obtained with the concept of frequency independent antennas [1] proposed by Dr. V. H. Rumsey at the University of Illinois, Urbana-Champaign. In 1954 E. M. Turner introduced the concept of the spiral antenna, which belongs to the family of frequency independent antennas. A special type of antenna is needed to determine the direction of microwave signals, and discussion of such an antenna, the spiral antenna, first appeared in the literature between 1956 and 1961. In general, the electronic warfare systems (EWS) used in defense provide secure defense links in order to study incoming threats, and they also provide safety and protection. On the battlefield, they perform real-time interception analysis to detect enemy target ships. All these military operations are carried out with the help of the electromagnetic spectrum (EMS). The electromagnetic spectrum is used extensively for applications such as navigation, radar, satellites, intelligence, communications, sensing, secure communication, information storage and signal processing. There is therefore a huge demand for EM devices today. Because these electromagnetic devices are portable and affordable, they fit easily into military operations. The increasing use of the electromagnetic spectrum and related devices in military operations has created a great need for electromagnetic warfare. One such electromagnetic device for signal transmission and reception is the cavity-backed spiral antenna, which is widely used in warfare systems.
Spiral antennas are the best choice for military applications because of their inherent characteristics: operation over broadband frequencies, circular polarization, small physical size, low weight and flush-mounting ability. In particular, they are used in direction finding applications. Spiral antennas fall under the category of frequency independent antennas, which function over a wide range of frequencies. Another distinctive characteristic of frequency independent antennas is that antenna parameters such as radiation pattern, polarization and impedance do not change over a large bandwidth. They generally provide low gain, so an array of spiral antennas can be used to meet the gain required for a specific operation. Spiral antennas are small because they are fabricated with many windings. Circular polarization is a critical antenna parameter that is investigated in direction finding applications. The spiral antenna is widely used as a receiving element in direction finding applications. Spiral antennas retain reliable gain and input
impedance over wide bandwidths while operating with circular polarization, so they can be used comfortably in a wide range of applications. One such application is military surveillance. A broadband composite spiral antenna mainly consists of the feed, a balanced-to-unbalanced (balun) transformer and a backing cavity. Great care must be taken when designing the composite spiral antenna; in particular, the composite spiral radiator, the backing cavity and the balun transformer must be designed with good precision. A photo-etching process is used to fabricate the composite spiral antenna because it is treated as a planar structure. The composite spiral antenna is a balanced line, whereas the coaxial line is unbalanced. This necessitates the incorporation of a suitable balun. In addition, the input impedance of the spiral circuit is around 110 Ω, whereas the coaxial line impedance is 50 Ω. The balun is therefore also used as an impedance transformer, which transforms 50 Ω to 110 Ω and vice versa. At present, two general types of broadband spiral antenna are in use. The most widely used is the wire radiator, or printed spiral. It consists of two long arms of constant-width strip that are wound around each other to form a planar spiral. The constant width is maintained throughout the spiral structure. The second type is the complementary spiral, in which the widths of the conductors and the widths of the spaces between them are equal. These widths are a function of angle and increase with distance from the antenna terminals. Complementary spiral antennas are generally conical rather than planar. They are therefore suited only to specific types of application, because their shape, size and complexity have a strong influence.
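The need for the impedance-transforming balun mentioned above (a 50 Ω coaxial line feeding a spiral of roughly 110 Ω) can be made concrete with the standard transmission-line mismatch formulas. The sketch below assumes purely resistive impedances and uses hypothetical function names; it shows what would happen if the coaxial line were connected directly, without the transformer.

```python
import math


def reflection_coefficient(z_load, z_line):
    """Voltage reflection coefficient of a load on a line (real impedances)."""
    return (z_load - z_line) / (z_load + z_line)


def vswr(gamma):
    """Voltage standing wave ratio for a reflection coefficient."""
    g = abs(gamma)
    return (1.0 + g) / (1.0 - g)


def return_loss_db(gamma):
    """Return loss in dB (positive number; larger means better match)."""
    return -20.0 * math.log10(abs(gamma))


Z_COAX = 50.0     # ohms, coaxial line impedance quoted in the text
Z_SPIRAL = 110.0  # ohms, approximate spiral input impedance quoted in the text

gamma = reflection_coefficient(Z_SPIRAL, Z_COAX)   # 0.375
direct_vswr = vswr(gamma)                          # 2.2
direct_rl = return_loss_db(gamma)                  # about 8.5 dB
impedance_ratio = Z_SPIRAL / Z_COAX                # 2.2 : 1 transformation
```

A direct connection would give a VSWR of about 2.2, which is why the balun has to provide a 2.2:1 impedance transformation in addition to the balanced-to-unbalanced conversion.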
The shortcomings of wire and planar spiral antennas can be mitigated by using an alternative spiral antenna that uses a variety of radiating elements to improve the overall performance. Conventional circular and square spiral antennas are depicted in Fig. 1. The main objective of this research work is, first, to design and simulate a spiral antenna for military direction finding applications; second, to fabricate the designed
Fig. 1 Conventional Circular and Square Spirals
antenna; and last, to test the fabricated antenna and to investigate its qualitative and quantitative attributes. In this research paper, the spiral card and the balun are designed using a computer program written in MATLAB. This program generates the required coordinates for the spiral as well as for the balun circuit. The necessary positive/negative films are generated and the circuits are printed on the substrate using photolithographic techniques. The composite spiral antenna has been simulated using HFSS software. Finally, after the antenna design is complete [2], the various parts are fabricated and assembled with precision. The assembled antenna is tested for various characteristics using a network analyzer and an anechoic chamber.
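To give an idea of what generating spiral coordinates involves, the following sketch computes centre-line points for the round portion of a two-arm Archimedean spiral. It is written in Python rather than MATLAB and is not the authors' program: the function name, the start and end radii and the angular step are hypothetical, and only the round spiral is covered, not the square outer portion or the balun.

```python
import math


def archimedean_arm(r_start_mm, r_end_mm, strip_mm, gap_mm,
                    phase_rad=0.0, step_deg=5.0):
    """Return (x, y) points in mm along the centre line of one spiral arm."""
    # In a two-arm spiral the arms interleave, so each arm moves outwards by
    # 2 * (strip + gap) per full turn, i.e. (strip + gap) / pi per radian.
    growth_per_rad = (strip_mm + gap_mm) / math.pi
    step = math.radians(step_deg)
    points = []
    i = 0
    while True:
        phi = i * step
        r = r_start_mm + growth_per_rad * phi
        if r > r_end_mm:
            break
        points.append((r * math.cos(phi + phase_rad),
                       r * math.sin(phi + phase_rad)))
        i += 1
    return points


# Hypothetical dimensions: 0.5 mm strip and gap, radius from 1 mm to 10 mm.
arm_a = archimedean_arm(1.0, 10.0, 0.5, 0.5)
# The second arm is the first one rotated by 180 degrees.
arm_b = archimedean_arm(1.0, 10.0, 0.5, 0.5, phase_rad=math.pi)
```

Point lists of this kind could then be exported as the coordinates for the master drawing from which the films are produced.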
2 Concept of Frequency Independent Antennas
A frequency independent antenna is an antenna whose primary electrical characteristics vary insignificantly with frequency over an extremely wide range; the various types of such antennas constitute a group of broadband antennas for which the ratio of maximum to minimum operating frequency reaches 20:1 or more. The theoretical and technological foundations for frequency independent antennas were established between 1957 and 1965 by the American scientists V. H. Rumsey, J. D. Dyson and others. The most common frequency independent antennas are two-arm spiral and conical helical antennas, log-periodic antennas and sickle-shaped dipoles. There are also multi-arm spiral and helical antennas that have several independent inputs; a well-known type is in the form of a conical dipole with an ultra-wide range of input impedances. Frequency independent antennas are used for shortwave radio communications, telemetry and radio astronomy. During the 1970s, lightweight types of relatively simple design were developed for various frequency ranges: log-periodic wire antennas were developed for decametre waves, and spiral and helical antennas were created for centimetre and millimetre waves from strip conductors deposited on a fibreglass substrate by a photochemical process. Highly directional frequency independent antennas are being designed as horn antennas whose walls have transverse ribs, and as antenna arrays composed of log-periodic or conical helical radiators positioned along radii in a specific sector of a circle. Although at least half the energy (or gain) is lost, the spiral remains a useful device for many applications, and high-quality results over bandwidths exceeding five octaves (32:1) are quite common. Early investigators recognized that two distinct modes were possible for the two-arm spiral.
These are the normal mode, or sum mode, which excites the two arms out of phase, and a difference mode, which excites the two arms with equal amplitude but with in-phase currents. It was thought that the difference mode would also be frequency independent, with nearly constant impedance and a null on axis, but attempts to use it were unsuccessful. The problem of feed-line radiation caused by the in-phase currents was recognized and seemed unsolvable. A number of people attempted to use hybrid rings to feed a two-arm spiral. The two ring outputs used to
feed the spiral were brought into the center of the ring and attached to two coaxial lines placed side by side, which went up through the center of the cavity and were attached to the spiral terminals. Various means were tried to suppress the feed-line radiation. Surrounding the feed lines with absorber, which was needed in the cavity anyway for broadband operation, helped, but did not give results of the quality obtained in normal-mode operation. The problem of operating the spiral in two or more modes simultaneously was simply and elegantly solved in 1960, when Paul Shelton of Radiating Systems Incorporated suggested the use of a spiral with three or more arms. Shelton recognized that the number of useful modes on a spiral is one less than the number of arms; that is, a three-arm spiral has two useful modes and a four-arm spiral has three. Rumsey introduced the concept of frequency independent antennas, which rests on two concepts: the angle concept and the ratio (or scaling) concept.
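The bandwidth figures and the mode-count rule quoted in this section reduce to simple arithmetic, sketched below. The function names are hypothetical; the 0.5–18 GHz band is the operating range stated for the antenna in this chapter.

```python
import math


def bandwidth_ratio(f_min_hz, f_max_hz):
    """Ratio of the highest to the lowest operating frequency."""
    return f_max_hz / f_min_hz


def octaves(f_min_hz, f_max_hz):
    """Number of octaves (frequency doublings) spanned by the band."""
    return math.log2(f_max_hz / f_min_hz)


def useful_modes(num_arms):
    """Useful modes of a multi-arm spiral: one less than the number of arms."""
    return num_arms - 1


five_octaves = octaves(1.0, 32.0)               # a 32:1 band spans 5 octaves
design_ratio = bandwidth_ratio(0.5e9, 18.0e9)   # 36:1 for 0.5 to 18 GHz
design_octaves = octaves(0.5e9, 18.0e9)         # slightly more than 5 octaves
modes_four_arm = useful_modes(4)                # 3
```

The 0.5–18 GHz band therefore corresponds to a 36:1 ratio, a little over five octaves, which is well beyond the 20:1 figure quoted for frequency independent antennas.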
2.1 Basic Principles of Direction Finding Systems
Fundamentally, seven different methods are used to measure the angle of arrival (AOA), the direction from which an incoming signal reaches the receiving antenna. The first and simplest approach is to use a narrow-beam antenna to search the direction of interest; however, this is not a very popular approach for EW applications. In the second method, the AOA is measured through amplitude comparison, whereas in the third it is measured through phase comparison. In the fourth method, the AOA is measured through the Doppler frequency shift. The fifth method uses the time-of-arrival difference. The sixth method finds the AOA with a microwave lens. The last method uses multiple-beam arrays and beam-forming networks. Of the seven approaches, the second and third are the most common in EW applications. Both require multiple antennas and receivers, which must be matched in amplitude or phase. The AOA is thus the most costly parameter to measure. Although numerous techniques have been developed to satisfy a wide variety of requirements, the design of a radio direction finding (RDF) device may be divided into two categories according to the fundamental measurement principle: amplitude comparison and phase comparison (time delay).
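As an illustration of the phase-comparison category, the sketch below applies the textbook two-element interferometer relation, in which the phase difference between two antennas a distance d apart is Δφ = 2πd·sin(θ)/λ. It assumes a plane wave, ideal phase-matched receivers and a spacing of at most half a wavelength so that the angle is unambiguous. The function name and the example values are hypothetical and are not taken from the chapter.

```python
import math

SPEED_OF_LIGHT = 299_792_458.0  # m/s


def aoa_from_phase(delta_phi_rad, spacing_m, freq_hz):
    """Angle of arrival in degrees, measured from broadside.

    Uses delta_phi = 2 * pi * d * sin(theta) / wavelength for a
    two-antenna interferometer.
    """
    wavelength = SPEED_OF_LIGHT / freq_hz
    sin_theta = delta_phi_rad * wavelength / (2.0 * math.pi * spacing_m)
    if abs(sin_theta) > 1.0:
        raise ValueError("phase difference not physical for this spacing")
    return math.degrees(math.asin(sin_theta))


# Hypothetical example: 1 GHz signal, antennas half a wavelength apart,
# measured phase difference of 90 degrees.
freq = 1.0e9
half_wavelength = SPEED_OF_LIGHT / freq / 2.0
angle = aoa_from_phase(math.pi / 2.0, half_wavelength, freq)   # about 30 degrees
```

With half-wavelength spacing, a 90 degree phase difference corresponds to an arrival angle of 30 degrees from broadside, and a zero phase difference to a signal arriving at broadside.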
3 Composite Spiral Antennas
Nowadays there is a great need for efficient antenna systems that function well under adverse environmental conditions while providing the desired antenna performance. There is also a great need to integrate such antenna
structures into bodies of different shapes and onto outer surfaces. In particular, when designing such antenna systems, the size of the antenna must be reduced to meet the space limitations on the outer surfaces of vehicles. This was initially a difficult task for antenna engineers, but a great deal of literature, techniques and experimentation on miniaturized antenna systems has since made it easier. All these performance requirements must be satisfied without sacrificing electrical performance. Antennas designed in this way can solve many engineering problems and have numerous electromagnetic applications. For airborne platforms, size and weight are at a premium. A square spiral is a better choice than a round spiral because of its reduced aperture. The size of a conventional square spiral can be further reduced by using a modulation technique. In this research paper, a composite spiral antenna [3] is designed and developed for direction finding systems covering a frequency range of 0.5–18 GHz with a dimension of 90 mm, using a square configuration for 0.5–2 GHz and a round configuration for 2–18 GHz, with a strip width and strip gap of 0.5 mm. The composite spiral radiator implemented in this work consists of linear arms on the four sides of the spiral. The largest and smallest sides of the spiral are determined by the lower and upper bounds of the operating frequency band respectively. For a circular spiral, the maximum diameter is λ/π at the lowest operating frequency, whereas for the composite spiral the maximum side of the square is λ/4 at the lowest operating frequency. A side reduction of 13.62% is therefore achieved. Composite spirals are useful in direction finding systems on airborne platforms, where size is of prime importance. For further size reduction, various modulation techniques can be incorporated [4]. Figure 2 depicts an air force warfare system in which frequency independent antennas are implemented.
Fig. 2 Air force warfare system
Design and Investigation of the Performance …
The three fundamental components in the design of the composite spiral antenna are the spiral radiator, the backing cavity and the balun transformer [3]. The frequency of operation plays a critical role in determining the spiral card dimensions: the smallest and largest sides of the spiral card are chosen according to the highest and lowest operating frequencies, respectively. The spiral cavity has to match the spiral card. The side of the cavity is selected to be equal to the largest side of the spiral. The depth of the cavity varies with the spiral dimensions, and the maximum depth is taken to be less than a quarter wavelength at the lowest frequency [5]. MATLAB programs were developed for the design of the spiral radiator and the balun. The design of the spiral antenna thus involves the design of the spiral card, the balanced-to-unbalanced (balun) transformer and the cavity.
Once the design calculations have been performed, the antenna is designed and then fabricated. Fabrication is the process of realizing the designed antenna in practice. The design follows sequential steps, shown in the flow graph in Fig. 3. All dimensions of the antenna are treated as coordinates when designing the antenna in software. Aristo software is used to produce the master drawing. The coordinates used in the software tool must agree with the dimensions calculated initially. From the generated master drawing, an artwork layout is created on photoresist-coated Mylar film [6]. A positive film is developed from this artwork layout using photo-reduction techniques.
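The quarter-wavelength bound on the cavity depth mentioned above can be evaluated with a few lines of Python. This is only a sketch: it assumes free-space propagation (no dielectric loading of the cavity), and the function names are illustrative rather than taken from the authors' MATLAB programs.

```python
# Free-space wavelength and the quarter-wavelength bound on cavity depth.
# Assumption: free-space propagation (no dielectric loading of the cavity).

C_M_PER_S = 299_792_458.0  # speed of light in vacuum


def wavelength_mm(freq_hz: float) -> float:
    """Return the free-space wavelength in millimetres."""
    return C_M_PER_S / freq_hz * 1000.0


def max_cavity_depth_mm(lowest_freq_hz: float) -> float:
    """Upper bound on cavity depth: a quarter wavelength at the lowest frequency."""
    return wavelength_mm(lowest_freq_hz) / 4.0


if __name__ == "__main__":
    f_low, f_high = 0.5e9, 18e9  # band edges quoted in the text
    print(f"wavelength at 0.5 GHz: {wavelength_mm(f_low):.1f} mm")
    print(f"wavelength at 18 GHz:  {wavelength_mm(f_high):.2f} mm")
    print(f"cavity depth bound:    {max_cavity_depth_mm(f_low):.1f} mm")
```

The result is an upper bound only; the depth actually chosen by the authors is not stated in the text.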
4 Results and Discussion
Three measurements were carried out on the composite spiral antenna covering the 0.5–18 GHz frequency range: radiation pattern, gain and return loss.
4.1 Simulated Radiation Patterns
Simulated radiation patterns were obtained using the HFSS software. From these radiation patterns, the gain can be calculated at different frequencies and the beam width at different angles. Circular polarization is verified by calculating the axial ratio. The simulated and measured results are compared in Figs. 4 and 5.
ANTENNA INITIAL DESIGN → GENERATE MASTER DRAWING → CREATE ARTWORK LAYOUT → PERFORM PHOTO REDUCTION → IMPLEMENTATION OF POSITIVE DEVELOPMENT → LAMINATE CLEANING → RESIST APPLICATION → RESIST EXPOSURE → RESIST DEVELOPMENT → INSPECTION → ETCHING PROCESS → BONDING → FINISHING
Fig. 3 Flowchart for steps involved in microstrip antenna fabrication
4.2 Comparison of the Investigated Results
The designed antenna was tested using a network analyzer and an anechoic chamber. The return loss plot was taken on the network analyzer. The radiation characteristics, including the horizontal and vertical polarization characteristics of the radiation pattern, were investigated in the anechoic chamber. From the radiation patterns, the 3 dB beam width, gain and squint of the antenna were measured. The measured and simulated results are tabulated below.
[Polar plot of dB(GainTotal), Setup1 : Sweep1, at Freq = 18 GHz for Phi = 0° and Phi = 90°; angular axis −180° to 180°, radial axis −28 dB to −7 dB]
Fig. 4 Simulated radiation pattern of composite spiral antenna at 18 GHz
4.3 Gain of the Antenna
The gain of the prototype antenna is given in Table 1.
Beam Width of Composite Spiral Antenna
The beam width of the prototype composite spiral antenna is shown in Table 2.
Axial Ratio for Composite Spiral Antenna
The axial ratio of the prototype antenna is shown in Table 3.
Comparison of Specified and Realized Specifications
Finally, the specified and realized specifications of the prototype composite spiral antenna are compared in Table 4.
Fig. 5 Measured radiation pattern of composite spiral antenna at 18 GHz

Table 1 Gain of the composite spiral antenna
Frequency    Simulated gain    Measured gain in dBLi (relative to a linear isotropic antenna)
0.5 GHz      −7.5 dB           −7 dBLi
1 GHz        −4 dB             −3.5 dBLi
2 GHz        −3 dB             −3 dBLi
6 GHz        −3.5 dB           −3 dBLi
11 GHz       −4 dB             −2.5 dBLi
15 GHz       −3 dB             −3 dBLi
18 GHz       −3.5 dB           −3.5 dBLi

Table 2 Beam width of composite spiral antenna
Frequency    Simulated beam width    Measured beam width
0.5 GHz      120°                    110°
1 GHz        89°                     70°
2 GHz        80.18°                  62°
6 GHz        72.28°                  62.8°
11 GHz       81.23°                  62.6°
15 GHz       78.2°                   58°
18 GHz       54.99°                  52°

Table 3 Axial ratio of composite spiral antenna
Frequency    Simulated axial ratio    Measured axial ratio
0.5 GHz      0.9 dB                   1.8 dB
1 GHz        0.5 dB                   0.5 dB
2 GHz        0.1 dB                   1 dB
6 GHz        0.1 dB                   0.1 dB
11 GHz       0 dB                     0 dB
15 GHz       0 dB                     0 dB
18 GHz       0 dB                     0 dB

Table 4 Comparison of specified and realized specifications
Specification                  Specified           Realized
Frequency band of operation    0.5–18 GHz          0.5–18 GHz
VSWR                           3:1                 2.5:1
Polarization                   Circular            Circular
Beam width                     50°–120°            50°–120°
Gain                           −15 dB to 0 dB      −15 dBLi to 0 dBLi
Axial ratio                    2–3 dB              2 dB
Return loss                    Less than −10 dB    Less than −10 dB

5 Conclusions
Circular polarization is an important characteristic of EW antennas. A composite spiral antenna was designed and developed to cover the frequency range 0.5–18 GHz, with a round configuration for 2–18 GHz and a square configuration for 0.5–2 GHz, a dimension of 90 mm × 90 mm, and a spiral width and gap of 0.5 mm. The experimental results were compared with the simulated results, and good agreement was observed between them. This work can be extended to improve the axial ratio by using loading techniques such as absorber and resistive loading.
Further size reduction and performance improvement can be pursued with modulated square spiral antennas using gradual modulation and miniaturization techniques. Defense systems are expected to extend up to 100 GHz; ultra-broadband antennas covering those frequency ranges could therefore be designed with improved techniques.
References
1. Rumsey, V.H.: Frequency independent antennas. IRE Nat. Conv. Rec., Part I, pp. 114–118, July 1957
2. Balanis, C.A.: Antenna Theory: Analysis and Design, 2nd edn. Harper Row, New York (2002)
3. Kim, D., Kim, J., Kim, J., Park, W.-S., Hwang, W.: Design of a multilayer composite-antenna-structure by spiral type. In: PIERS Proceedings, Marrakesh, Morocco, 20–23 March 2011
4. Morgan, T.E.: Reduced size spiral antenna. In: Proceedings of the 9th European Microwave Conference, pp. 181–185, Sept. 1979
5. Fu, J., Chang, M., Li, P., Kong, W.: Design of composite spiral antenna loaded with a new type reflector in Ku-band. In: 2016 IEEE International Conference on Electronic Information and Communication Technology (ICEICT 2016), pp. 568–570
6. Song, Y., Zhang, P., Liu, Y., Duan, K., Huang, L., Li, M.: Broadband circular polarized composite spiral antenna using two-layer AMC. In: 2019 PhotonIcs & Electromagnetics Research Symposium - Fall (PIERS - Fall), Xiamen, China, pp. 1197–1203, 17–20 December 2019
Dr. Kasigari Prasad is an academician, graphologist, writer and motivational speaker, presently working as Associate Professor in the Department of Electronics and Communication Engineering at Annamacharya Institute of Technology & Sciences, Rajampet (Autonomous), one of the reputed technical institutions in Andhra Pradesh. He completed his Ph.D. in the domain of antennas and wave propagation in the electronics and communication stream. He holds postgraduate degrees in Political Science (M.A.) and in English, a PGD in Personality Development, and a PGD in Journalism and Mass Communication. Dr. Prasad received the International Young Investigator Award from IRNET, India, in 2012 for his research work. He was named “The Best Engineer” on the eve of Engineers Day 2015 by IEI (India). He was also honored with “The Jewel of Rajampet” award by the Regency Educational Institutions and with the Best Educator Award from IARDO, India. He has published a good number of technical papers in domains such as antennas, microwaves and optical fibers.
Mr. P. Kishore Kumar received his B.Tech. degree in Instrumentation and Control Engineering from SKTRMCE, Kondair, affiliated to J.N.T.U. HYD, A.P., in 2007, and his M.Tech. degree in Digital Systems and Computer Electronics from SJCET, Yemmiganur, JNTU Ananthapuram, A.P., in 2012. He is currently working as Assistant Professor at Ravindra College of Engineering for Women, Kurnool, A.P., India. He has published four technical papers in international journals and national conferences. His current research interests include antennas and microwaves.
Learning Based Approach for Subtle Maintenance in Large Institutions
Prakhar Lohumi, Sri Ram Khandelwal, Shryesh Khandelwal, and V. Simran
Abstract Maintenance of any infrastructure is a thankless job. Many institutions follow a problem-driven approach, starting to fix problems only as and when they arise. This leads to delays, largely because of the time taken to procure the material required to solve these problems. To tackle this delay, this paper presents a data-driven approach to maintenance that predicts the future number of complaints and maintains an inventory to handle them. Procurement analysis derives insights from procurement data to guide business decisions, including decisions about stocking the inventory. To stock inventory properly, a pattern must be found in the inventory requirements. For a large institution, this becomes a big issue if handled by an inexperienced person. In any large institution, inventory management and procurement analysis are needed to maintain the infrastructure of the organisation. This maintenance system is, in most cases, driven by complaints made by the people associated with the institution. If a person finds a fault in the infrastructure, they register a complaint, which is rectified using material from the inventory. There is thus a direct relation between complaints and procurement. Our approach tries to streamline inventory management and maintenance by predicting the number of complaints that can arise in the future; knowing this number makes it easier to procure inventory. Instead of following an on-the-fly, problem-driven approach of fixing problems as and when they arise, we propose a data-driven approach that analyses past complaints and stocks materials to solve future complaints before or as soon as they arrive.
P. Lohumi (corresponding author) · S. R. Khandelwal · S. Khandelwal · V. Simran
Motilal Nehru National Institute of Technology Allahabad, Prayagraj 211004, India
e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2021
V. K. Gunjan and J. M. Zurada et al. (eds.), Modern Approaches in Machine Learning and Cognitive Science: A Walkthrough, Studies in Computational Intelligence 956, https://doi.org/10.1007/978-3-030-68291-0_31

1 Introduction
In any large institution or organisation, maintaining and repairing the infrastructure is a tedious task. There are three Ps of effective maintenance: planning, preparedness and pro-activeness. We need proper planning for how complaints will be handled, our inventory should be prepared to handle the complaints, and we should be proactive in our approach to maintenance. This paper presents a solution that plans maintenance activities better by shortening the maintenance cycle, prepares the inventory to handle a problem before it has arisen, and proactively predicts the type and number of maintenance problems likely to be encountered in the future.
Maintenance requires manpower, expertise and time. A simple repair complaint can take a long time to resolve if its handling has not been planned properly. Such planning requires insight and expertise that few possess. In today's world of Artificial Intelligence, machines can generate these insights better than an inexperienced human being.
Natural Language Processing (NLP) [1] is focused on enabling computers to understand and process human languages. Humans read and write using a vocabulary built from alphabets; for computers, the vocabulary is the binary number system. To make computers understand human languages, “data” in English, Hindi or Spanish needs to be converted to “data” in numbers. This is what NLP is all about.
Complaints are generally repetitive in nature, which allows the algorithm to detect patterns in them. This helps us plan for the materials that might be needed in the future by looking at the trends in past complaints and the resulting predictions. This is achieved using NLP with the help of Artificial Neural Networks and Recurrent Neural Networks. Neural networks are “a beautiful biologically-inspired programming paradigm which enables a computer to learn from observational data” [2]. We employed neural networks to classify the complaints and predict their future numbers.
2 Case Study
We did a case study of the complaint management system of Motilal Nehru National Institute of Technology (MNNIT) Allahabad. The complaint management infrastructure followed in public and private institutions is very different. In public institutes such as MNNIT Allahabad, the complaint management system is based on a problem-driven approach: the system waits for a problem to arise and then sets out to solve it. When a problem arises, the complainant records it in a complaint register. The complaint registers are then passed on to the relevant authorities, who take action on the complaint. This causes delays because of the passive nature of the approach and the time needed to procure the materials required to solve the problem. With the introduction of faster government procurement tools such as the Government e-Marketplace (GeM), procurement of materials has become faster than before. Problems are still solved with delay, however, because of the passive nature of the approach.
To solve this problem, this paper suggests a data-driven approach to maintenance. Complaints are usually recurrent and seasonal in nature. Instead of waiting for problems to arise, we therefore use past data to predict future problems using deep learning. With this prediction, we stand a better chance of solving problems much faster. The prediction helps us procure material before the problem has even arisen, so that when it does arise we can take immediate action, or even tackle the issue before it creates a problem. For example, by assessing past information we may learn that a particular drain clogs in the rainy season and overflows onto the roads, causing waterlogging. When the rainy season approaches, our algorithm will predict that the same problem may arise. We will then be able to handle the issue in advance by getting the drain pumped or by making proper embankments on the road. This approach thus not only saves time but also removes inconvenience that might otherwise arise in the future.
3 Objective
The objective of the paper is to establish an approach to solve the problem defined in the case study above. Under the current approach, officers manually estimate how much material will be needed to solve future problems. The objective is to build an algorithm that takes as input the problems encountered in past years. Using Natural Language Processing, these textual complaints are converted into computer-readable numbers and classified into different buckets, as shown in Table 1. The buckets are defined by the raw materials needed to solve the different complaints. The algorithm then computes how many complaints of each type are likely to be encountered in the future. These predictions will help the authorities make better-informed decisions about procurement.
4 Related Work
The dataset used in the algorithm consists of real complaints collected from MNNIT Allahabad. Because the problem statement is unique and specific, the literature survey is limited. Numerous attempts have been made to classify texts into classes, using probabilistic algorithms, keyword search and deep learning with neural networks. “Recurrent Neural Network with Word Embedding for Complaint Classification” [3] approaches complaint classification using word embeddings and Recurrent Neural Networks. That paper also discusses how Support Vector Machines and TF-IDF [4] are used for classification.
A similar project is “Research Paper Categorization Using Machine Learning and NLP” [5] by Aaqib Saeed, whose objective was to assign research papers to the most suitable conferences. The article evaluates the performance of different classifiers after applying text pre-processing, feature selection and dimensionality reduction to a text classification problem: determining the most suitable conference for a research paper from its title.
5 Overview of Approach
Our approach works in three steps. First, we pre-process the complaint data using Natural Language Processing [1]. The data is cleaned and the important information is extracted, discarding redundant information such as stop words. Using TF-IDF [4], the text is converted into numerical term weights. Second, the complaints are classified into classes based on the type of material required to solve each problem. For example, the complaints that require a CFL as raw material are put in one class. The classes need to be decided by the user according to their use case. This classification is done using an Artificial Neural Network; a multilayer neural network can distort the input space to make the classes of data linearly separable [6]. Third, once the number of complaints in each class has been calculated, the future number of complaints in each class is predicted using a Recurrent Neural Network [7] with a suitable lookback window.
6 Working
6.1 Pre-processing of Data
• Removing Stop Words: All the data is converted to lower case to avoid spurious differences caused by case sensitivity. Raw data contains a lot of noise that needs to be removed, such as articles, prepositions and other function words. These may carry meaning for a human reader, but for the computer they are noise because they carry little information, so they are eliminated.
• TF-IDF Vectorization [4]: TF-IDF stands for Term Frequency-Inverse Document Frequency. It is a powerful way of transforming text into a meaningful numerical representation. The text is broken into bigrams and their TF-IDF values are calculated.
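The two pre-processing steps above can be sketched in plain Python as follows. This is an illustrative sketch, not the authors' implementation: the stop-word list is a small hypothetical one, and the weighting tf × log(N/df) is one common TF-IDF variant, assumed here because the chapter does not state which formula was used.

```python
import math
import re
from collections import Counter

# Hypothetical stop-word list for illustration; a real system would use a fuller one.
STOP_WORDS = {"a", "an", "the", "in", "of", "is", "on", "at", "to", "and"}


def preprocess(text: str) -> list[str]:
    """Lower-case the text, keep alphabetic tokens, and drop stop words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def bigrams(tokens: list[str]) -> list[tuple[str, str]]:
    """Return consecutive token pairs."""
    return list(zip(tokens, tokens[1:]))


def tfidf(docs: list[str]) -> list[dict[tuple[str, str], float]]:
    """Compute tf * log(N / df) for every bigram of every document."""
    grams = [bigrams(preprocess(d)) for d in docs]
    n_docs = len(docs)
    # Document frequency: number of documents containing each bigram.
    df = Counter(g for doc in grams for g in set(doc))
    weights = []
    for doc in grams:
        counts = Counter(doc)
        total = len(doc) or 1  # guard against documents with no bigrams
        weights.append(
            {g: (c / total) * math.log(n_docs / df[g]) for g, c in counts.items()}
        )
    return weights


if __name__ == "__main__":
    # Made-up complaints, used only to exercise the functions.
    complaints = [
        "The tube light is not working in room 12",
        "Tube light flickering in the corridor",
        "Water tank is leaking on the roof",
    ]
    for vector in tfidf(complaints):
        print(vector)
```

A bigram shared by several complaints, such as ("tube", "light") here, receives a lower weight than one that appears in a single complaint, which is the behaviour TF-IDF is meant to provide.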
6.2 Classification of Complaints Using Artificial Neural Network
• Structure of Network: The neural network uses two hidden layers in addition to the input and output layers. Figure 1 shows the structure of the network. The input layer is shown in green, the output layer in red and the hidden layers in blue; the orange rectangles represent the dropout regularisation layers.
• Dropout Regularisation: Large neural networks trained on small training sets tend to overfit the data. This can give an illusion of high accuracy when the model is tested against the training set. To avoid overfitting, we use dropout regularisation. Dropout is a regularisation method that approximates training a large number of neural networks with different architectures in parallel. The key idea is to randomly drop units (along with their connections) from the neural network during training. This prevents units from co-adapting too much. During training, dropout samples from an exponential number of different “thinned” networks [9].
• Activation Function: We use two activation functions in the neural network. The ReLU [10] activation function is employed in the hidden layers and the Softmax [11] activation function in the output layer, so that the network outputs the probabilities of the input belonging to the different classes. The class with the highest probability is selected as the output class.
• Output Classes: The classes are decided on the basis of the raw materials required to handle the complaints, which makes them dependent on the use case. The classes used in our case study of MNNIT Allahabad are shown in Table 1.
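The forward pass of such a network (two ReLU hidden layers, dropout during training, softmax output) can be sketched with the standard library alone. The layer sizes, the dropout rate of 0.5 and the random initialisation are assumptions made for illustration; the chapter does not report them, and training (backpropagation) is omitted.

```python
import math
import random


def relu(values):
    """Rectified linear unit applied element-wise."""
    return [max(0.0, v) for v in values]


def softmax(values):
    """Convert scores to probabilities; the max is subtracted for stability."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def dropout(values, rate, rng, training):
    """Inverted dropout: zero each unit with probability `rate`, rescale the rest."""
    if not training or rate == 0.0:
        return list(values)
    keep = 1.0 - rate
    return [v / keep if rng.random() < keep else 0.0 for v in values]


def dense(values, weights, bias):
    """Fully connected layer; `weights` holds one row per output unit."""
    return [sum(w * v for w, v in zip(row, values)) + b for row, b in zip(weights, bias)]


def init_layer(n_in, n_out, rng):
    """Random weights with He-style scaling (an assumption) and zero biases."""
    scale = math.sqrt(2.0 / n_in)
    weights = [[rng.gauss(0.0, scale) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def forward(x, layers, rng, rate=0.5, training=False):
    """Hidden layers use ReLU followed by dropout; the last layer uses softmax."""
    hidden = x
    for weights, bias in layers[:-1]:
        hidden = dropout(relu(dense(hidden, weights, bias)), rate, rng, training)
    weights, bias = layers[-1]
    return softmax(dense(hidden, weights, bias))


if __name__ == "__main__":
    rng = random.Random(0)
    sizes = [50, 32, 16, 19]  # hypothetical: 50 TF-IDF features, 19 classes
    layers = [init_layer(a, b, rng) for a, b in zip(sizes, sizes[1:])]
    features = [rng.random() for _ in range(sizes[0])]
    probabilities = forward(features, layers, rng)
    label = probabilities.index(max(probabilities)) + 1  # labels 1..19 as in Table 1
    print("predicted label:", label)
```

Dropout is switched off at prediction time (`training=False`), so the untrained network above is deterministic once its weights are fixed.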
6.3 Prediction of Future Number of Complaints Using Recurrent Neural Network
Long Short-Term Memory (LSTM) networks [8] are used to predict the future number of complaints. LSTMs can retain information about previous states in their internal memory through a feedback loop. This feedback loop gives the RNN a lookback in time: it allows the network to consider previous outputs when determining the current output. We can therefore look at the previous 12 months of data to predict the number of complaints that will arise in the upcoming month. By extension, the same method can be applied to predict the number of complaints in the following year from the complaints of previous years.
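The 12-month lookback described above amounts to turning each class's monthly count series into (window, next value) pairs before they are fed to the recurrent model. A minimal sketch of that windowing step is shown below; the function name and the sample counts are hypothetical, and the LSTM itself is not reproduced here.

```python
def make_windows(series, lookback=12):
    """Split a series into (inputs, targets) for one-step-ahead prediction.

    Each input is `lookback` consecutive values; its target is the next value.
    """
    if lookback < 1:
        raise ValueError("lookback must be at least 1")
    inputs, targets = [], []
    for start in range(len(series) - lookback):
        inputs.append(series[start:start + lookback])
        targets.append(series[start + lookback])
    return inputs, targets


if __name__ == "__main__":
    # Hypothetical monthly complaint counts for one class (15 months).
    counts = [12, 9, 7, 8, 10, 14, 21, 25, 18, 11, 9, 8, 13, 10, 6]
    windows, next_values = make_windows(counts, lookback=12)
    print(len(windows), "windows")
    print("last window:", windows[-1], "->", next_values[-1])
```

With only a little more than a year of data, a 12-month lookback yields very few training windows per class, which is worth keeping in mind when judging the forecasts.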
Fig. 1 Artificial neural network used for classification

Table 1 Classes used in our case study
Label    Complaint type
1        Permission
2        Glass work
3        Whitewash and painting
4        Gym and sports
5        Insect related
6        Construction
7        Water cooler
8        Fan regulator
9        Pipe leakage
10       Sanitary ware
11       Water tank
12       Switchboard and wiring
13       Light points (LED, CFL, bulbs)
14       Tubelight related
15       Fire extinguisher related
16       Geyser related
17       Cleaning related
18       Air conditioner complaints
19       Miscellaneous
7 Experiment and Results
The data used for our algorithm consists of textual complaints collected from Motilal Nehru National Institute of Technology. The data contains the complaints of several months. We use all the data except the last month to train our model and the final month to test it. For example, the total dataset contains complaints from October 2018 to December 2019. We partition it into two parts: Part A, containing the complaints from October 2018 to November 2019, and Part B, containing the complaints of December 2019. Part A contains around 3000 complaints. We manually labelled some of these complaints and used them to train our ANN model with an 80-20 train-test split. The remaining unlabelled complaints were then classified using the trained Artificial Neural Network. After classification, the future number of complaints in each class was predicted using the LSTM approach described in the sections above. The final output of the algorithm is the predicted number of complaints for December 2019, which is compared with the actual number of complaints in December 2019 from Part B.
With the deep learning approach using ANN and RNN, we achieved an accuracy of 98.02%, and the predicted future number of complaints showed a high degree of similarity with the actual data. The confusion matrix for the classification of complaints is shown in Fig. 2.
A similar setup was built using linear algorithms, namely Linear Regression and Logistic Regression, to compare their results with those of our approach. On the same data, Linear and Logistic Regression achieved an accuracy of 86% on the classification of complaints, but the predicted future number of complaints failed to show much similarity with the actual data.
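The evaluation protocol above (hold out the final month, count complaints per class, compare predictions with actual counts) can be sketched as follows. The record format, the function names and the use of mean absolute error to quantify "similarity" are assumptions made for illustration; the chapter reports classification accuracy but does not name a metric for the forecast.

```python
from collections import Counter, defaultdict


def split_last_month(records):
    """Hold out the most recent month (Part B); the rest is Part A.

    `records` holds ("YYYY-MM", label) pairs, one per complaint.
    """
    records = list(records)
    last = max(month for month, _ in records)  # "YYYY-MM" strings sort by date
    part_a = [r for r in records if r[0] != last]
    part_b = [r for r in records if r[0] == last]
    return part_a, part_b


def monthly_class_counts(records):
    """Return {label: {"YYYY-MM": number of complaints}}."""
    counts = defaultdict(Counter)
    for month, label in records:
        counts[label][month] += 1
    return {label: dict(per_month) for label, per_month in counts.items()}


def accuracy(true_labels, predicted_labels):
    """Fraction of complaints whose predicted class matches the true class."""
    pairs = list(zip(true_labels, predicted_labels))
    return sum(t == p for t, p in pairs) / len(pairs)


def mean_absolute_error(actual, predicted):
    """Average absolute gap between actual and predicted per-class counts."""
    labels = set(actual) | set(predicted)
    return sum(abs(actual.get(k, 0) - predicted.get(k, 0)) for k in labels) / len(labels)


if __name__ == "__main__":
    # Made-up records, used only to exercise the functions.
    records = [
        ("2019-10", "Pipe leakage"), ("2019-10", "Pipe leakage"),
        ("2019-11", "Fan regulator"), ("2019-12", "Pipe leakage"),
    ]
    part_a, part_b = split_last_month(records)
    print(monthly_class_counts(part_a))
    print(mean_absolute_error({"Pipe leakage": 1}, {"Pipe leakage": 2}))
```

Reporting a per-class error alongside classification accuracy would make the comparison between the neural and linear baselines easier to interpret.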
8 Conclusion
The paper presents our approach to predicting complaints in order to help with material procurement. The algorithm has been implemented and embedded in a web portal that will be used at Motilal Nehru National Institute of Technology. Complaints can be registered through the web portal and stored in its database. The web portal thus provides a one-stop solution for the maintenance needs of the institution. The high accuracy of our method shows that classification on the basis of raw materials, followed by prediction for each class, is highly effective. Although there is scope for a great deal of future work, this paper establishes a base and demonstrates the soundness of the method.
Fig. 2 Confusion matrix
Conflicts of Interest Arvind W Kiwelekary and Laxman D Netak declare that they have no conflict of interest. Rachit Garg has received research grants from ATA Freight Line India Pvt. Ltd. Swapnil S Bhate holds the position of Innovation Associate in the CATI department of ATA Freight Line India Pvt. Ltd.
References
1. Deng, L., Liu, Y.: Deep Learning in Natural Language Processing. Springer Publication, pp. 1–20 (2018)
2. Nielsen, M.: Neural Networks and Deep Learning (2015)
3. Assawinjaipetch, P., Shirai, K., Sornlertlamvanich, V., Marukata, S.: Recurrent Neural Network with Word Embedding for Complaint Classification (2016)
4. Berger, A., Lafferty, J.: Information retrieval as statistical translation. In: Proceedings of the 22nd ACM Conference on Research and Development in Information Retrieval (SIGIR '99), pp. 222–229 (1999)
5. Saeed, A.: Research Paper Categorization Using Machine Learning and NLP (2016)
6. LeCun, Y., Bottou, L., Orr, G.B., Muller, K.R.: Deep Learning (1998)
7. Sanchez, E.N., Rodriguez-Castellanos, D.I., Chen, G., et al.: Pinning control of complex network synchronization: A recurrent neural network approach. Int. J. Control Autom. Syst. 15, 1405–1414 (2017). https://doi.org/10.1007/s12555-016-0364-4
8. Hochreiter, S., Schmidhuber, J.: Long Short-Term Memory (1997)
9. Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I., Salakhutdinov, R.: Dropout: A Simple Way to Prevent Neural Networks from Overfitting (2014)
10. Zou, D., Cao, Y., Zhou, D., et al.: Gradient descent optimizes over-parameterized deep ReLU networks. Mach. Learn. 109, 467–492 (2020). https://doi.org/10.1007/s10994-019-05839-6
11. Duan, K., Keerthi, S.S., Chu, W., Shevade, S.K., Poo, A.N.: Multi-category Classification by Soft-Max Combination of Binary Classifiers
Temporal Localization of Topics Within Videos
Rajendran Rahul, R. Pradipkumar, and M. S. Geetha Devasena
Abstract Searching for and localizing special moments or important events in a video, such as a baby's first steps, a game-winning goal, lectures, medical diagnosis techniques or a protagonist's dialogues, can be very useful and productive, but manually searching for the desired clip in a plethora of videos is time-consuming. This is where temporal localization comes into play, with the help of robust deep learning techniques and intelligent video analytics. Studying patterns, recognizing key objects and spotting anomalies in videos can be performed swiftly using intelligent video analytics. By increasing the number of meta tags assigned to a video, the accuracy of searching within the video, as well as across a collection of videos, can potentially be improved. The system calculates the screen time of the prime characters in a video, auto-generates tags using subtitles and object classifiers, and localizes topics within the video with the help of the generated tags.
Keywords Open Source Computer Vision · Intelligent Video Analytics · Convolution Neural Net · Visual Geometry Group · Sub Rip Text files
R. Rahul (corresponding author) · R. Pradipkumar · M. S. Geetha Devasena
Sri Ramakrishna Engineering College, Coimbatore, India
e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2021
V. K. Gunjan and J. M. Zurada et al. (eds.), Modern Approaches in Machine Learning and Cognitive Science: A Walkthrough, Studies in Computational Intelligence 956, https://doi.org/10.1007/978-3-030-68291-0_32

1 Introduction
Memorable or important events in a video can be searched for and segmented even if the user has not given the video a title or added any metadata. This can be implemented using temporal concept localization within videos. By deploying deep learning algorithms, Intelligent Video Analytics (IVA) analyzes videos to detect, identify and examine anomalies or particular events of interest; IVA automatically transforms raw video into searchable structured data. In most searches, video retrieval and ranking are performed by matching query terms to existing metadata. However, videos can contain an array of topics that are not always characterized by the uploader, and existing metadata often misses localization to brief but important moments within the video. Improved search within a video, video summarization, highlight extraction and action moment detection can be performed with this system. It localizes video-level labels to the precise time in the video where the label actually appears, and does so at an unprecedented scale.
In a world full of video capture devices, where video data is constantly generated and stored, there is potential for mining physical real-world data that can be used to improve existing systems with actionable information. Owing to the exceptional progress of deep learning, the performance of video analysis can be improved significantly, giving rise to new research directions in video analysis. For example, pattern recognition is a field that relies extensively on video analytics with deep learning, otherwise referred to as deep video analytics. Security, live entertainment and healthcare also rely on video analytics.
Video tag annotations are used by major social media platforms such as YouTube and Facebook for video search. The majority of tags are assigned to videos by users, which is time-consuming and may result in annotations that are subjective and lack precision. There are several studies on the use of content-based extraction techniques to automate tag generation. However, these methods require a lot of computation power and resources and are challenging to apply across domains. In this paper, we investigate the most important features of a video that can be used for tag generation. The generated tags must be representative of the actual physical features of the video data, so that the system can view the world from the perspective of human beings. This is achieved by using the following features and tag generation methods:
(1) The VGG16 model proposed by K. Simonyan and A. Zisserman is used to calculate the screen time of people and objects in the frame, so as to gauge the significance of that particular object or person.
(2) The TensorFlow Object Detection API is a framework for creating deep learning networks that solve object detection problems. The framework includes pretrained models, collected in what it calls the Model Zoo, trained on the COCO dataset, the KITTI dataset and the Open Images Dataset.
(3) A subtitle file associates each subtitle with a time. This can be used to skip to sections of memorable dialogue. Subtitle files can pre-exist or can be generated dynamically. For example, YouTube employs dynamic subtitle generation when no subtitle file is provided by the uploader.
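The subtitle-based localization in item (3) can be illustrated with a small parser for SubRip (.srt) text, in which each cue consists of an index line, a "start --> end" time line and one or more text lines, separated from the next cue by a blank line. The sketch below is an assumption about how such a search could be done, not the authors' implementation; the function names and sample subtitles are hypothetical.

```python
import re
from datetime import timedelta

# Matches the time line of an SRT cue, e.g. "00:01:10,500 --> 00:01:13,250".
TIME_RE = re.compile(
    r"(\d+):(\d{2}):(\d{2}),(\d{3})\s*-->\s*(\d+):(\d{2}):(\d{2}),(\d{3})"
)


def _to_delta(hours, minutes, seconds, millis):
    return timedelta(
        hours=int(hours), minutes=int(minutes),
        seconds=int(seconds), milliseconds=int(millis),
    )


def parse_srt(text):
    """Return a list of (start, end, subtitle_text) tuples."""
    cues = []
    blocks = re.split(r"\n\s*\n", text.replace("\r\n", "\n").strip())
    for block in blocks:
        lines = block.split("\n")
        if len(lines) < 2:
            continue  # not a complete cue
        match = TIME_RE.search(lines[1])
        if match is None:
            continue  # malformed time line, skip the cue
        parts = match.groups()
        start, end = _to_delta(*parts[:4]), _to_delta(*parts[4:])
        cues.append((start, end, " ".join(lines[2:]).strip()))
    return cues


def find_keyword(cues, keyword):
    """Return the (start, end) intervals whose subtitle text mentions `keyword`."""
    key = keyword.lower()
    return [(start, end) for start, end, line in cues if key in line.lower()]


if __name__ == "__main__":
    sample = (
        "1\n00:00:01,000 --> 00:00:04,000\nWelcome to the match.\n\n"
        "2\n00:01:10,500 --> 00:01:13,250\nThis is the winning goal!\nWhat a finish.\n"
    )
    for start, end in find_keyword(parse_srt(sample), "goal"):
        print("jump to", start, "until", end)
```

The returned intervals can be passed to a video player or clip extractor to jump straight to the matching dialogue.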
2 Related Work
Sigurdsson et al. [1] implemented a fully-connected temporal CRF model for reasoning over various aspects of activities, including objects, actions, and intentions, where the potentials are predicted by a deep network. This temporal CRF model tracks a person's actions and movements, models them, and helps in reasoning. This approach automatically explores parts of the video which are most relevant to the actions being searched for. Sabokrou et al. [2] implemented a method for
Temporal Localization of Topics Within Videos
real-time anomaly detection and localization in crowded scenes. In this method, each video is defined as a set of non-overlapping cubic patches and is described using two local and global descriptors. Experimental results showed that their algorithm is comparable to a state-of-the-art procedure on the UCSD ped2 and UMN benchmarks while being more time-efficient. The results also confirmed that their system can reliably detect and localize anomalies as soon as they happen in a video. Alwassel et al. [3] implemented Action Search, a novel recurrent neural network approach that mimics the way humans spot actions. Inspired by the human ability to accurately spot and find actions in video, this approach automatically explores the parts of the video that are most relevant to the actions being searched for. Paul et al. [4] implemented W-TALC, a weakly-supervised temporal activity localization and classification framework that uses only video-level labels. This method was developed to overcome the need for frame-wise annotation. It makes use of weak labels obtained from tagged videos on the Internet and exploits the correlation between videos with similar tags to temporally localize the activities.
3 Proposed System
The IVA system focuses on localizing a particular part of a video by taking the frames of the video, pre-processing them, and assigning auto-generated meta tags with minimum run time and maximum accuracy. The proposed system consists of the following modules: character screen time calculation, video extraction using subtitles, video extraction using auto-generated tags, and search using the srt (SubRip Text) file (see Fig. 1).
3.1 Methodology
• Prime character screen time calculation using the VGG16 model
• Generating meta tags using subtitles
• Auto-generating object tags using a pre-trained object classifier
• Skipping frames in a video to reach the anticipated frame
4 Implementation and Results
Prime Character Screen Time Calculation. The prime character screen time calculation is performed by training the pre-trained VGG16 image classifier on images of the characters. In this paper, a cartoon video is used
R. Rahul et al.
Fig. 1 Workflow of the proposed IVA
and the screen time of a particular cartoon character is identified. This technique can be used in various applications such as medical imaging and sports. For example, in a medical imaging application, an anomaly such as a tumor can be assigned as the prime character; if a patient's body is scanned and the scanned images are passed into the screen time calculator, the anomaly, if present, can be identified. Figure 2 shows the first step, extraction, where frames of the video are extracted and stored. After the extraction process is over, all the images in storage are manually labeled depending on the number of characters in the image. In the example shown in Fig. 3, three classes are used, as there are two characters in the cartoon video: class 1 denotes character 1, class 2 denotes character 2, and class 0 denotes neither character 1 nor character 2. The labeled images are retrieved from the directory and loaded into an array. Further, the images are pre-processed to fit the model. The pretrained VGG16 CNN model (see Fig. 4) is trained with the extracted images. As the input images for VGG16 must have dimensions 224 × 224 × 3, the extracted images are resized. The extracted image data is split into training and test sets in a ratio of 70% for training and 30% for testing, because this yielded better accuracy than an 80%/20% split. To increase the model accuracy and reduce the loss, the model was trained for 100 epochs using the necessary libraries and supporting APIs.
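As a rough illustration of how screen time could be derived once the classifier has labeled each sampled frame, the sketch below counts the frames assigned to each class and converts the counts to seconds. The function name `screen_time_seconds`, the example predictions, and the default sampling rate of one frame per second are assumptions made for illustration; the chapter does not list its implementation.

```python
from collections import Counter


def screen_time_seconds(predicted_classes, frames_per_second=1.0):
    """Return {class_label: seconds on screen} from per-frame predictions.

    predicted_classes: one predicted label per sampled frame
        (0 = neither character, 1 = character 1, 2 = character 2).
    frames_per_second: sampling rate used when extracting frames (assumed).
    """
    if frames_per_second <= 0:
        raise ValueError("frames_per_second must be positive")
    counts = Counter(predicted_classes)
    seconds_per_frame = 1.0 / frames_per_second
    # Class 0 means "no prime character", so it is left out of the report.
    return {label: n * seconds_per_frame
            for label, n in sorted(counts.items()) if label != 0}


# Hypothetical predictions for ten frames sampled at one frame per second.
example = [0, 1, 1, 2, 1, 0, 2, 2, 2, 1]
print(screen_time_seconds(example))
```

With a denser sampling rate, each frame simply represents a shorter slice of the video, so the same counting logic applies.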
Fig. 2 Images extracted from the video
The model yields an accuracy of 87%; the calculated screen time is shown in Fig. 5. The screen time of the two characters in the video was calculated on validation data. We also tried another test video, which gave an accuracy of 77%. The accuracy of the model can be increased by increasing the size of the input images. Only one frame per second is extracted; this rate can be increased to improve the overall accuracy.
Generating Metadata from Subtitle File. Subtitles are widely used in videos, and they play a major role for users watching movies in other languages. SubRip Text files, widely known as srt files, hold the subtitles displayed in a video. An srt file mainly consists of two attributes: a timestamp and its respective subtitle text. The data from the srt file is extracted using regular expressions and stored in csv format. Figure 6 shows sample metadata, with timestamps, extracted from the srt file of the movie Harry Potter. Assigning metadata to specific timestamps makes localization of topics within videos possible.
Auto-Generating Object Tags Using Pre-Trained Object Classifier. Images are extracted from the video at a frequency of one frame per second; each image is preprocessed and passed into the TensorFlow pre-trained MobileNet-SSD object classifier, and all the objects detected in that particular frame are assigned as meta tags. Figure 7 shows object detection and identification in an image from a video. The objects in the frame, such as monitor, mouse, and cup, are assigned as meta tags for the frame. Faster R-CNN is another widely used object classifier which has higher accuracy but requires more computation time and power. We have therefore used MobileNet-SSD for faster detection with minimal computation power. The challenges
Fig. 3 Sample csv file
involved in object detection are variation in illumination across video frames and image clarity issues.
Skipping Frames in a Video to Reach the Anticipated Frame. The primary objective of this system is to localize a particular part of a video. The previous processes identify the screen time of prime characters and auto-generate meta tags for each frame in the video, which strongly support the localization process. The metadata from the subtitles and the metadata of objects are combined along with their associated frame numbers and stored in a single csv file. The data from the csv file are then loaded into two lists, one for metadata and another for timestamps. To perform localization, the map() function is used: if the user enters a dialogue from the video, it is compared with the meta tags in the list and the associated timestamp is output. The timestamp is then passed to Python code which starts the video from
Fig. 4 VGG 16 architecture
Fig. 5 Output of the trained model and calculated screen time
Fig. 6 Sample metadata from subtitle file
the particular timestamp. If the user is not satisfied with the output, the Esc key can be pressed and the next relevant part of the video is played. Once all the relevant parts of the video have been played, the system again asks the user to enter text. Figure 8 illustrates a sample cartoon video in which frames are skipped and playback starts from the anticipated frame.
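The subtitle-based lookup described above can be sketched as follows. This is a minimal illustration, not the authors' code: the regular expression, the helper names `parse_srt` and `find_timestamps`, and the sample subtitle text are all assumptions, and a plain list comprehension is used for matching in place of the map() call mentioned in the text.

```python
import re

# One SRT cue: index line, "start --> end" line, then one or more text lines.
CUE_PATTERN = re.compile(
    r"(\d+)\s*\n"
    r"(\d{2}):(\d{2}):(\d{2}),(\d{3})\s*-->\s*"
    r"\d{2}:\d{2}:\d{2},\d{3}[^\n]*\n"
    r"(.*?)(?=\n\s*\n|\Z)",
    re.DOTALL,
)


def parse_srt(srt_text):
    """Return a list of (start_seconds, subtitle_text) tuples."""
    cues = []
    for m in CUE_PATTERN.finditer(srt_text.replace("\r\n", "\n")):
        hours, minutes, seconds, millis = (int(m.group(i)) for i in range(2, 6))
        start = hours * 3600 + minutes * 60 + seconds + millis / 1000.0
        text = " ".join(line.strip() for line in m.group(6).splitlines())
        cues.append((start, text))
    return cues


def find_timestamps(cues, query):
    """Return the start times of all cues whose text contains the query."""
    needle = query.lower()
    return [start for start, text in cues if needle in text.lower()]


# Hypothetical subtitle content used only to exercise the functions.
SAMPLE = """1
00:00:01,000 --> 00:00:03,500
Hello there.

2
00:01:10,250 --> 00:01:12,000
Where is the
train station?

3
00:02:00,000 --> 00:02:02,000
Hello again.
"""

cues = parse_srt(SAMPLE)
print(find_timestamps(cues, "hello"))
```

Each returned start time can then be handed to a video player in turn, which mirrors the "next relevant part" behaviour described in the text. With OpenCV, for example, seeking with `cv2.VideoCapture.set(cv2.CAP_PROP_POS_MSEC, start * 1000)` is one possible way to begin playback at that point.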
Fig. 7 Object detection
Fig. 8 Skipping frames and starting the video from the anticipated frame
5 Conclusion
The developed IVA system helps to localize a particular part of a video using several techniques. Firstly, the screen time of prime characters in the video is calculated using the VGG16 model. This helps in segmenting the characters in the video. Secondly, meta tags are generated from the subtitles of the video, using the video's .srt file. To localize and traverse to a particular part of the video, the timestamp obtained from the srt file is used and the intervening frames are skipped. Next, objects in each frame are detected using a pre-trained object classifier and automatically assigned as meta tags for that frame.
Finally, all the meta tags generated by each technique are segmented and assigned to their respective frames. Then, the localization process is performed. Temporal localization finds application in various areas, such as the medical field, where doctors and surgeons can view specific video clips of previously recorded medical procedures. Reviews in sports events, such as sixes, fours, and out appeals in cricket or goals and fouls in football, can make use of video temporal localization to reach faster review decisions. A large number of meta tags are generated in this process, which makes the data difficult to handle and requires a large amount of memory to store the meta tags generated for each frame. Subtitles are not available for all videos, which is a major constraint in the localization process. Videos with multi-language conversations and languages other than English are difficult to process. Using temporal localization, users can view particular instances in videos according to their preferences instead of playing the entire video. The system has improved accuracy, and the screen time calculation of the characters is faster. The VGG16 model used for the calculation is more efficient than models such as VGG-19, which require more computation time.
6 Future Enhancement
The existing system can be enhanced by increasing the overall localization accuracy and reducing the time taken to assign meta tags to the frames. To increase accuracy, the meta tags taken from the srt file should be cleansed. For instance, conjunctions, punctuation, and pronouns in the file should be discarded. Some videos do not have an srt file; in such cases, an audio-to-text API can be used to generate subtitles. The pre-trained classifier is generic; to increase object detection accuracy, new object classes can be included and existing classes can be retrained with a new dataset, depending on the application.
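The tag cleansing suggested above might look like the following sketch. The stop list is a small, hand-picked assumption covering a few conjunctions and pronouns; a real system would likely use a fuller list, and the function name `clean_tags` is hypothetical.

```python
import string

# Small illustrative stop list (conjunctions and pronouns); not exhaustive.
STOP_WORDS = {
    "and", "or", "but", "so", "because", "nor", "yet",
    "i", "you", "he", "she", "it", "we", "they",
    "me", "him", "her", "us", "them",
    "my", "your", "his", "its", "our", "their",
}


def clean_tags(subtitle_text):
    """Lower-case the text, strip punctuation, and drop stop words."""
    table = str.maketrans("", "", string.punctuation)
    words = subtitle_text.lower().translate(table).split()
    return [w for w in words if w not in STOP_WORDS]


print(clean_tags("He found the cup, and she took it!"))
```

Filtering before the tags are written to the csv file would also reduce the storage burden noted in the conclusion, since fewer tokens are kept per timestamp.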
References
1. Sigurdsson, G.A., Divvala, S., Farhadi, A., Gupta, A.: Asynchronous temporal fields for action recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 585–594 (2017)
2. Sabokrou, M., Fathy, M., Hoseini, M., Klette, R.: Real-time anomaly detection and localization in crowded scenes. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, pp. 56–62 (2015)
3. Alwassel, H., Caba Heilbron, F., Ghanem, B.: Action search: spotting actions in videos and its application to temporal action localization. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 251–266 (2018)
4. Paul, S., Roy, S., Roy-Chowdhury, A.K.: W-TALC: weakly-supervised temporal activity localization and classification. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 563–579 (2018)
Fast Training of Deep Networks with One-Class CNNs
Abdul Mueed Hafiz and Ghulam Mohiuddin Bhat
Abstract One-class CNNs have shown promise in novelty detection. However, very little work has been done on extending them to multiclass classification. The proposed approach is an effort in this direction. It uses one-class CNNs, i.e., it trains one CNN per class, and an ensemble of such one-class CNNs is used for multiclass classification. The benefits of the approach are generally better recognition accuracy while taking roughly half to two-thirds of the training time of a conventional multiclass deep network. The proposed approach has been applied successfully to face recognition and object recognition tasks. For face recognition, a 1000-frame RGB video featuring several faces together has been used for benchmarking the proposed approach. Its database is available on request via e-mail. For object recognition, the Caltech-101 Image Database and the 17Flowers Dataset have been used. The experimental results support the claims made.
Keywords One-Class CNNs · Fast Training · Rapid Convergence · Caltech-101 · Deep Learning
A. M. Hafiz (B) · G. M. Bhat
Department of ECE, Institute of Technology, University of Kashmir, Srinagar 190006, J&K, India
e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2021
V. K. Gunjan and J. M. Zurada et al. (eds.), Modern Approaches in Machine Learning and Cognitive Science: A Walkthrough, Studies in Computational Intelligence 956, https://doi.org/10.1007/978-3-030-68291-0_33
1 Introduction
One-class Convolutional Neural Networks (CNNs) are being used for anomaly detection and bi-partitioned space search [1–5]. However, they are not usually applied to multiclass classification. Class-specific CNNs have been used in action detection and classification in videos. The authors of [6], after drawing inspiration from [7–9], use two class-specific networks, the first for RGB images and the second for optical flow images. However, that approach uses the two CNNs, together with dynamic programming, for object detection and not for classification. The approach which comes closest to the proposed approach is that of [6], which uses cue-based class-specific CNNs for visual tracking. However, they use very shallow untrained CNNs in a complex fashion with a small feature
vector for use in subsequent stages. Again, in [10] the authors use very shallow CNNs, and the feature extracted in the later stages is short, which may have limited the amount of contextual information available in the feature. They also initialize their network selectively. The authors also state that the training time is generally higher than that of contemporary techniques. Deep learning [11–19] frameworks are efficient at extracting discriminative features, with the amount of context increasing with the number of layers [20]. This has played a role in the use of deep neural networks for feature extraction [21] and the subsequent use of these features with various classifiers such as the Support Vector Machine (SVM) [22] and K Nearest Neighbor (KNN) [23–26]. Using a deep neural network (e.g. AlexNet [27]), the features from the fully connected layer(s) are extracted and fed to a statistical classifier such as SVM or KNN. In R-CNN [28], the authors use class-specific linear SVM classifiers on fixed-length features extracted with the CNN, replacing the softmax classifier learned by fine-tuning. However, the positive and negative examples defined for training the SVM classifiers are different from those used for fine-tuning the CNN. In [26], the authors use a KNN classifier on top of the last fully connected layer of a deep neural network; they do not use the features of the intermediate layers of the deep network. In [24], the authors combine the features of different layers of a deep neural network and, after dimensionality reduction, apply them to a KNN classifier. In this paper, a novel approach is proposed which consists of using several one-class CNNs (pre-trained AlexNets) with Nearest Neighbor (NN) classifiers.
The advantages of this approach are generally better classification accuracy than conventional multiclass deep networks, rapid convergence owing to reduced training times, and overall simplicity compared with other contemporary techniques, while exploiting transfer learning, as it benefits from the training experience of powerful CNNs like AlexNet [27]. Every one-class network is trained as per convention. For classification, the networks are used with their last three layers removed, whereupon the feature map of the last fully connected layer, or 'fc Layer', is fed to a Nearest Neighbor classifier. Here, as an experimental choice for demonstration of results, the proposed approach has been applied to video face recognition as well as object recognition. An RGB video (640 × 480 pixels) of 1000 frames has been used for benchmarking purposes. The benchmarking video is available on request by e-mailing the corresponding author. For object recognition experiments, the Caltech-101 Image Database and the 17Flowers Dataset have been used. In the experiments, the training time of the proposed technique was found to be roughly one-half to two-thirds of that taken by conventional approaches.
2 Proposed Approach
The deep network used is AlexNet [27], with transfer learning. The proposed approach uses C AlexNets, where C is the number of classes. Thus it can be said
that C one-class AlexNets have been used. Let each one-class AlexNet be denoted by transferNet_i, where i = 1…C. First, a separate AlexNet is trained for each class, on the images of that class. Next, a training dataset R comprising C subsets is generated. Each subset R_i ∈ R consists of the emissions of the fully-connected layer, or 'fc Layer', of the ith AlexNet, i.e. transferNet_i, when fed the training images for class i. For classification of a region-of-interest (ROI), each trained AlexNet, i.e. transferNet_i, gives its 'fc Layer' emission (feature map) when fed the ROI. Let the test set S consist of these C feature maps S_i, where i = 1…C. Next, the nearest neighbor of S_1 in R is found, and its distance and class are noted. This procedure is repeated for i = 2…C. This gives a C-row, 2-column array (C × 2 array) called score_db, whose 1st column entries give the distance of S_i to its nearest neighbor in R, and whose 2nd column entries give the class of that nearest neighbor. Let r_m be the row index of the minimum in the 1st column of score_db. Thus, r_m corresponds to the least distance among the nearest neighbors of all S_i in R. Finally, the 2nd column entry (class of nearest neighbor) of row r_m in score_db is assigned as the class of the ROI. The cosine distance metric was used for the nearest neighbor search. For an mx-by-n data matrix X, treated as mx (1-by-n) row vectors x_1, x_2, …, x_mx, and an my-by-n data matrix Y, treated as my (1-by-n) row vectors y_1, y_2, …, y_my, the cosine distance d_st between the vectors x_s and y_t is given by Eq. (1).
$$d_{st} = 1 - \frac{x_s y_t^{\top}}{\sqrt{\left(x_s x_s^{\top}\right)\left(y_t y_t^{\top}\right)}} \qquad (1)$$
Let N_tr be the number of training images per class. The subset R_i ∈ R consists of the N_tr × 4096 matrix of fc7 Layer emissions of transferNet_i fed on the N_tr training images of class i. Thus, the size of the training set R is K × 4096, where K = N_tr × C and C is the number of classes. The dimension 4096 follows from the built-in structure of the 'fc Layer' in AlexNet. It should be noted that in the classification stage, the trained AlexNets are used only up to the 'fc Layer'. Inside the AlexNet architecture, this 'fc Layer' is designated the 'fc7 Layer'. Also, each one-class AlexNet, i.e. transferNet_i, is trained end to end on the training images of its class. Figure 1 shows an overview of the proposed approach (for four one-class networks).
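The classification rule described in this section can be sketched in NumPy as follows. This is an illustrative reading of the procedure, not the authors' implementation: the function names, the toy three-dimensional vectors standing in for the 4096-dimensional fc7 features, and the integer class labels are all assumptions, and zero-length feature vectors are not handled.

```python
import numpy as np


def cosine_distances(x, Y):
    """Cosine distance of Eq. (1) between row vector x and every row of Y."""
    x = np.asarray(x, dtype=float)
    Y = np.asarray(Y, dtype=float)
    denom = np.sqrt(x @ x) * np.sqrt(np.sum(Y * Y, axis=1))
    return 1.0 - (Y @ x) / denom


def classify_roi(S, R, R_labels):
    """Assign a class to an ROI from the C feature vectors in S.

    S: (C, n) array; row i is the feature vector of network i for the ROI.
    R: (K, n) array of training feature vectors (all subsets R_i stacked).
    R_labels: length-K sequence giving the class of each row of R.
    Returns (predicted_class, score_db), where score_db is the C x 2 array
    of [nearest-neighbor distance, nearest-neighbor class].
    """
    score_db = np.zeros((len(S), 2))
    for i, s_i in enumerate(S):
        d = cosine_distances(s_i, R)
        nearest = int(np.argmin(d))
        score_db[i] = (d[nearest], R_labels[nearest])
    r_m = int(np.argmin(score_db[:, 0]))  # row with the least distance
    return int(score_db[r_m, 1]), score_db


# Toy data: 2 classes, 3-dimensional stand-ins for the fc7 features.
R = np.array([[1.0, 0.0, 0.0], [0.9, 0.1, 0.0],   # class 1
              [0.0, 1.0, 0.0], [0.0, 0.9, 0.1]])  # class 2
R_labels = [1, 1, 2, 2]
S = np.array([[0.2, 1.0, 0.1],    # output of network 1 for the ROI
              [0.0, 1.0, 0.05]])  # output of network 2 for the ROI
predicted, score_db = classify_roi(S, R, R_labels)
print(predicted)
```

Because only a minimum over C nearest-neighbor distances is taken, the decision step adds little cost beyond the C forward passes and the distance computations against the K stored feature vectors.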
3 Experimentation
The training of the one-class models was done on a machine with an Intel® Xeon® (2 core), 16 GB RAM, and 12 GB GPU. AlexNet [27] was used as the deep learning model. The input images (ROIs, training images, and testing images) used for the model were RGB images with dimensions 227 × 227 × 3. The deep networks were trained using the Stochastic Gradient Descent with Momentum Algorithm,
Fig. 1 Overview of the proposed approach (for four one-class networks). The ROI is passed through netTransfer1 to netTransfer4, and each network's fc Layer emission is compared by nearest neighbor (NN) search with the fc Layer emissions obtained by passing the Class 1 to Class 4 training subsets through the same networks; the class of the nearest neighbor is assigned to the ROI
with Initial Learn Rate = 0.01, L2 Regularization Factor = 0.0001, Momentum = 0.9, Validation Frequency = 50, and Validation Patience = 5. These parameters gave the best results. The trained deep networks converged successfully in the epochs mentioned below (further training did not lead to significant improvements in recognition accuracy).
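For readers unfamiliar with the optimizer, the sketch below shows one common form of the stochastic gradient descent with momentum update using the learning rate, L2 factor, and momentum quoted above. The exact update used by the authors' training framework is not given in the chapter, so the formulation, the function name `sgdm_step`, and the toy quadratic objective are assumptions.

```python
import numpy as np

LEARN_RATE = 0.01    # Initial Learn Rate from the text
L2_FACTOR = 0.0001   # L2 Regularization Factor from the text
MOMENTUM = 0.9       # Momentum from the text


def sgdm_step(weights, velocity, grad):
    """One SGD-with-momentum update with L2 regularization.

    The L2 penalty contributes L2_FACTOR * weights to the loss gradient.
    """
    total_grad = grad + L2_FACTOR * weights
    velocity = MOMENTUM * velocity - LEARN_RATE * total_grad
    return weights + velocity, velocity


# Toy problem (assumed for illustration): minimize 0.5 * ||w - target||^2.
target = np.array([1.0, -2.0])
w = np.zeros(2)
v = np.zeros(2)
for _ in range(2000):
    w, v = sgdm_step(w, v, grad=w - target)
print(np.round(w, 3))
```

The momentum term reuses a fraction of the previous step, which smooths the trajectory; the L2 term pulls the solution slightly towards zero, so the toy optimum sits at target / (1 + L2_FACTOR) rather than exactly at target.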
3.1 4Face Database
The database used for benchmarking the proposed algorithm is a 1000-frame, 640 × 480 pixel RGB video, which is available on request via e-mail. The video features four persons under varying illumination and with different face orientations due to pan and tilt of the head. Figure 2 shows a single frame from the video.
Fig. 2 A frame from the benchmarking video
Fig. 3 Four ROIs extracted from a frame of the video (Note the varying illumination, which makes the fourth ROI difficult to extract)
ROIs were extracted per frame, up to a maximum of four. Figure 3 shows the four ROIs extracted from a frame of the video. The distance metric used for the nearest neighbor search was the cosine distance, and the search was conducted for one nearest neighbor. This was found to give the best recognition accuracy. Since the maximum number of persons is four, four one-class AlexNets trained by transfer learning were used for the experimentation of the proposed approach. To benchmark the proposed approach, its performance is compared with that of a multiclass AlexNet trained on all four classes of persons using transfer learning, here referred to as netTransfer4, and also with the Deep Network-KNN hybrid approach of [24]. The performance of the approaches is shown in Table 1. All networks are trained with parameters that give sufficient training. As can be seen from Table 1, the recognition accuracy of the proposed approach is better than that of a conventional deep learning model (a multiclass AlexNet trained for four classes using transfer learning), and second only to that of the approach of [24]. However, the total training time taken (using transfer learning) by all four one-class deep networks is a little over half (about 57%) of that taken by the four-class AlexNet. This is an advantage of the proposed approach over the other approaches given
Table 1 Comparison of various approaches on the 4Face Database

Row                                Person 01  Person 02  Person 03  Person 04  Test accuracy (%)  Total deep network training time (s)
Total ROIs in test set             243        261        261        153        -                  -
ROIs correctly recognized by (a)   195        261        261        153        94.7               391.14
ROIs correctly recognized by (b)   216        261        261        153        97.1               222.01
ROIs correctly recognized by (c)   224        261        261        153        97.9               391.14

(a) Conventional deep learning approach (a single four-class AlexNet trained by transfer learning). Training set = 140 images × 4 classes = 560 images; test set = 60 images × 4 classes = 240 images; mini-batch size = 16; epochs = 1; iterations = 35.
(b) Proposed approach (four one-class networks with KNN). Training set per network = 80 images; test set per network = 40 images; mini-batch size = 40; epochs = 1; iterations = 2; K = 1.
(c) Approach of [24] (a single four-class AlexNet trained by transfer learning, with KNN). Training set = 140 images × 4 classes = 560 images; test set = 60 images × 4 classes = 240 images; mini-batch size = 16; epochs = 1; iterations = 35; K = 1.
in Table 1. As is evident from Table 1, the proposed approach gives recognition accuracy comparable to that of the other techniques, even though the one-class networks are trained on a much smaller number of images (training images per network = 80, testing images per network = 40). To fine-tune the performance of the proposed approach, various distance metrics were tried in the nearest neighbor search. The results are shown in Table 2. Using the cosine distance metric with K = 1, i.e. one nearest neighbor, gave the best results. The processing time for an ROI using the proposed approach is slightly more than that of the conventional approach using deep learning and KNN.

Table 2 Performance of proposed approach for various distance metrics (K = 1, total number of sample frames = 40, person to recognize = '02')

Metric                                                         Cosine  Correlation  Spearman
Number of frames in which Person 02 was correctly recognized  29*     22           24
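The test accuracies reported in Table 1 appear to follow from the per-person ROI counts. The short check below recomputes them under the assumption that accuracy is the pooled ratio of correctly recognized ROIs to total ROIs over all four persons; the function name `pooled_accuracy` is hypothetical.

```python
TOTAL_ROIS = [243, 261, 261, 153]  # Persons 01-04, from Table 1

CORRECT = {
    "conventional": [195, 261, 261, 153],
    "proposed": [216, 261, 261, 153],
    "approach of [24]": [224, 261, 261, 153],
}


def pooled_accuracy(correct, total):
    """Percentage of ROIs recognized correctly, pooled over all persons."""
    return 100.0 * sum(correct) / sum(total)


for name, counts in CORRECT.items():
    print(f"{name}: {pooled_accuracy(counts, TOTAL_ROIS):.2f}%")
```

Under this assumption the recomputed values agree with the tabulated 94.7, 97.1, and 97.9 to within rounding or truncation of the last digit, and they show that all of the differences between the approaches come from Person 01.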
Fig. 4 Some images from Caltech-101 Image Database for three categories (‘airplanes’, ‘cougar_body’, ‘butterfly’)
3.2 Caltech-101 Image Database
The Caltech-101 Image Database [29] was also used for evaluation of the proposed technique. Figure 4 shows some images from the database. Fifty categories were selected, and 40 randomly selected images were used from each category, so that every category contributed the same number of images. The results of the experiments are shown in Table 3.
3.3 17Flowers Database
The full 17Flowers Image Database [30] was also used for evaluation of the proposed technique. Figure 5 shows some images from the database. All images in the 17 available categories were used. The number of images in each category of the database is 80. For experimentation, the images of each class were randomly split into 70% (i.e. fifty-six images) for training and 30% (i.e. twenty-four images) for testing. The results of the experiments are shown in Table 4.
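A per-class random split of the kind described here can be sketched as below. The file names, the fixed random seed, and the function name `split_per_class` are assumptions for illustration; the chapter does not say how the random selection was implemented.

```python
import random


def split_per_class(images_by_class, train_fraction=0.7, seed=0):
    """Randomly split each class's images into train and test lists."""
    rng = random.Random(seed)
    train, test = {}, {}
    for label, images in images_by_class.items():
        shuffled = list(images)
        rng.shuffle(shuffled)
        n_train = round(train_fraction * len(shuffled))
        train[label] = shuffled[:n_train]
        test[label] = shuffled[n_train:]
    return train, test


# Hypothetical file names: 17 classes with 80 images each.
dataset = {c: [f"class{c:02d}_img{i:02d}.jpg" for i in range(80)]
           for c in range(1, 18)}
train, test = split_per_class(dataset)
print(len(train[1]), len(test[1]))
```

Splitting within each class, rather than over the pooled images, keeps the 56/24 counts identical for every category, which matters here because each one-class network sees only the training images of its own class.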
Table 3 Comparison of various approaches on Caltech-101 image dataset

Approach                   Test accuracy (%)  Total deep network training time (s)
Conventional approach (a)  85.2               597.52
Proposed approach (b)      86.6               407.25
Approach of [24] (c)       87.3               597.52

(a) Conventional deep learning approach (a single fifty-class AlexNet trained by transfer learning). Training set = 15 images × 50 classes = 750 images; test set = 25 images × 50 classes = 1250 images; mini-batch size = 15; epochs = 1; iterations = 50.
(b) Proposed approach (fifty one-class networks with KNN classifier). Training set per network = 15 images; test set per network = 25 images; mini-batch size = 15; epochs = 1; iterations = 1; K = 1.
(c) Approach of [24] (a single fifty-class AlexNet trained by transfer learning, with KNN). Training set = 15 images × 50 classes = 750 images; test set = 25 images × 50 classes = 1250 images; mini-batch size = 15; epochs = 1; iterations = 50; K = 1.
Note that the training time of the proposed approach is much lower than that of the other techniques.
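The iteration counts quoted in Tables 1 and 3 are consistent with one weight update per mini-batch. The small check below reproduces them, assuming iterations = (training images / mini-batch size) × epochs; the function name is hypothetical, and rounding up is an assumption that does not matter here because every division is exact.

```python
import math


def iterations(num_training_images, mini_batch_size, epochs=1):
    """Number of weight updates: mini-batches per epoch times epochs."""
    return math.ceil(num_training_images / mini_batch_size) * epochs


# (training images, mini-batch size) as listed in Tables 1 and 3.
print(iterations(560, 16))  # four-class AlexNet on 4Face: 35
print(iterations(80, 40))   # one one-class network on 4Face: 2
print(iterations(750, 15))  # fifty-class AlexNet on Caltech-101: 50
print(iterations(15, 15))   # one one-class network on Caltech-101: 1
```

Each one-class network therefore performs very few updates, although the proposed approach trains one such network per class.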
4 Conclusion and Future Work
In the context of one-class CNNs, very little work has been done on using them for multiclass classification. In this paper, a novel deep learning technique is presented for multiclass classification, which consists of using one-class deep networks combined with KNN classification. Each one-class deep network is trained on only one class, leading to much shorter training times, viz. roughly half to two-thirds of that of conventional networks, and generally better recognition accuracy. The deep neural network used is AlexNet. For experimentation, we have used face recognition and object recognition as applications of the proposed approach. A new face recognition video database has been used, which is available on request by email. Also,
Fig. 5 Some images from 17Flowers Dataset
for the object recognition task, the Caltech-101 Image Dataset and the 17Flowers Dataset have been used. The performance of the proposed approach is compared experimentally with that of other conventional approaches, and the advantages of the proposed approach are noted. Future work would involve making the proposed approach more efficient. Efforts will also be made to apply the proposed approach to other contemporary network architectures, and to other deep learning tasks such as instance segmentation [31].
Table 4 Comparison of various approaches on 17Flowers dataset

Approach                   Test accuracy (%)  Total deep network training time (s)
Conventional approach (a)  92.2               774.82
Proposed approach (b)      93.6               543.32
Approach of [24] (c)       94.9               774.82

(a) Conventional deep learning approach (a single seventeen-class AlexNet trained by transfer learning). Training set = 56 images × 17 classes = 952 images; test set = 24 images × 17 classes = 408 images; mini-batch size = 56; epochs = 1; iterations = 17.
(b) Proposed approach (seventeen one-class networks with KNN). Training set per network = 56 images; test set per network = 24 images; mini-batch size = 56; epochs = 1; iterations = 1; K = 1.
(c) Approach of [24] (a single seventeen-class AlexNet trained by transfer learning, with KNN). Training set = 56 images × 17 classes = 952 images; test set = 24 images × 17 classes = 408 images; mini-batch size = 56; epochs = 1; iterations = 17; K = 1.
Note the much lower training time of the proposed approach compared with that of the other techniques.
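To put the training-time comparison in one place, the sketch below computes the training time of the proposed approach as a fraction of the conventional multiclass network for the three datasets, using the times reported in Tables 1, 3, and 4. The dictionary layout and function name are illustrative choices.

```python
# (proposed, conventional) total training times in seconds,
# taken from Tables 1, 3, and 4.
TRAINING_TIMES = {
    "4Face": (222.01, 391.14),
    "Caltech-101": (407.25, 597.52),
    "17Flowers": (543.32, 774.82),
}


def time_ratio(proposed, conventional):
    """Training time of the proposed approach as a fraction of the baseline."""
    return proposed / conventional


for dataset_name, (proposed, conventional) in TRAINING_TIMES.items():
    ratio = time_ratio(proposed, conventional)
    print(f"{dataset_name}: {100 * ratio:.1f}% of baseline")
```

The ratios fall between roughly one-half and a little over two-thirds, which is the range the chapter summarizes as "almost half or two-thirds" of the conventional training time.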
References
1. Oza, P., Patel, V.M.: Active authentication using an autoencoder regularized CNN-based one-class classifier. In: 2019 14th IEEE International Conference on Automatic Face and Gesture Recognition (FG 2019), 14–18 May 2019, pp. 1–8. https://doi.org/10.1109/FG.2019.8756525
2. Perera, P., Patel, V.M.: Learning deep features for one-class classification. IEEE Trans. Image Process. 28(11), 5450–5463 (2019). https://doi.org/10.1109/TIP.2019.2917862
3. Sabokrou, M., Khalooei, M., Fathy, M., Adeli, E.: Adversarially learned one-class classifier for novelty detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3379–3388 (2018)
4. Ruff, L., Vandermeulen, R., Goernitz, N., Deecke, L., Siddiqui, S.A., Binder, A., Müller, E., Kloft, M.: Deep one-class classification. In: Proceedings of the 35th International Conference on Machine Learning, Proceedings of Machine Learning Research (2018)
5. Zhang, M., Wu, J., Lin, H., Yuan, P., Song, Y.: The application of one-class classifier based on CNN in image defect detection. Procedia Comput. Sci. 114, 341–348 (2017). https://doi.org/10.1016/j.procs.2017.09.040
6. Li, H., Li, Y., Porikli, F.: DeepTrack: learning discriminative feature representations by convolutional neural networks for visual tracking. In: BMVC (2014)
7. Gkioxari, G., Malik, J.: Finding action tubes. In: CVPR (2015)
8. van Gemert, J.C., Jain, M., Gati, E., Snoek, C.G.: APT: action localization proposals from dense trajectories. In: BMVC, p. 4 (2015)
9. Weinzaepfel, P., Harchaoui, Z., Schmid, C.: Learning to track for spatio-temporal action localization. In: CVPR (2015)
10. Lu, J., Wang, G., Deng, W., Moulin, P., Zhou, J.: Multi-manifold deep metric learning for image set classification. In: CVPR (2015)
11. Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press (2016)
12. Schmidhuber, J.: Deep learning in neural networks: an overview. Neural Netw. 61, 85–117 (2015)
13. LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. Nature 521(7553), 436 (2015)
14. Hafiz, A.M., Bhat, G.M.: A survey of deep learning techniques for medical diagnosis. In: Information and Communication Technology for Sustainable Development, pp. 161–170. Springer Singapore (2020)
15. Wang, Z., Chen, J., Hoi, S.C.H.: Deep learning for image super-resolution: a survey. IEEE Trans. Pattern Anal. Mach. Intell. 1–1 (2020). https://doi.org/10.1109/TPAMI.2020.2982166
16. Zhang, Z., Cui, P., Zhu, W.: Deep learning on graphs: a survey. IEEE Trans. Knowl. Data Eng. 1–1 (2020). https://doi.org/10.1109/TKDE.2020.2981333
17. Liu, L., Ouyang, W., Wang, X., Fieguth, P., Chen, J., Liu, X., Pietikäinen, M.: Deep learning for generic object detection: a survey. Int. J. Comput. Vis. 128(2), 261–318 (2020). https://doi.org/10.1007/s11263-019-01247-4
18. Dargan, S., Kumar, M., Ayyagari, M.R., Kumar, G.: A survey of deep learning and its applications: a new paradigm to machine learning. In: Archives of Computational Methods in Engineering. https://doi.org/10.1007/s11831-019-09344-w
19. Hafiz, A.M., Bhat, G.M.: Multiclass classification with an ensemble of binary classification deep networks. arXiv preprint arXiv:2007.01192 (2020)
20. Zeiler, M.D., Fergus, R.: Visualizing and understanding convolutional networks. In: European Conference on Computer Vision, pp. 818–833. Springer (2014)
21. Yosinski, J., Clune, J., Bengio, Y., Lipson, H.: How transferable are features in deep neural networks? In: Advances in Neural Information Processing Systems, pp. 3320–3328 (2014)
22. Tang, Y.: Deep learning using linear support vector machines. arXiv preprint arXiv:1306.0239 (2013)
23. Sitawarin, C., Wagner, D.: On the robustness of deep K-nearest neighbors. arXiv preprint arXiv:1903.08333 (2019)
24. Papernot, N., McDaniel, P.: Deep k-nearest neighbors: towards confident, interpretable and robust deep learning. arXiv preprint arXiv:1803.04765 (2018)
25. Le, L., Xie, Y., Raghavan, V.V.: Deep similarity-enhanced K nearest neighbors. In: 2018 IEEE International Conference on Big Data (Big Data), 10–13 December 2018, pp. 2643–2650. https://doi.org/10.1109/BigData.2018.8621894
26. Ren, W., Yu, Y., Zhang, J., Huang, K.: Learning convolutional nonlinear features for K nearest neighbor image classification. In: 2014 22nd International Conference on Pattern Recognition, 24–28 August 2014, pp. 4358–4363. https://doi.org/10.1109/ICPR.2014.746
27. Krizhevsky, A., Sutskever, I., Hinton, G.E.: ImageNet classification with deep convolutional neural networks. In: Advances in Neural Information Processing Systems, pp. 1097–1105 (2012)
28. Girshick, R.: Fast R-CNN. In: 2015 IEEE International Conference on Computer Vision (ICCV), 7–13 December 2015, pp. 1440–1448. https://doi.org/10.1109/ICCV.2015.169
29. Fei-Fei, L., Fergus, R., Perona, P.: Learning generative visual models from few training examples: an incremental Bayesian approach tested on 101 object categories. In: 2004 Conference on Computer Vision and Pattern Recognition Workshop, pp. 178–178. IEEE (2004)
30. Nilsback, M.-E., Zisserman, A.: A visual vocabulary for flower classification. In: CVPR (2006)
31. Hafiz, A.M., Bhat, G.M.: A survey on instance segmentation: state of the art. Int. J. Multimedia Inform. Retrieval (2020). https://doi.org/10.1007/s13735-020-00195-x
Recognition of Isolated English Words of E-Lecture Video Using Convolutional Neural Network Uday Kulkarni, Chetan Rao, S. M. Meena, Sunil V. Gurlahosur, Pratiksha Benagi, and Sandeep Kulkarni
Abstract Speech recognition has been gaining importance with the rapid growth of applications such as generating subtitles for e-lecture videos and transcribing recorded speech for people with physical disabilities. Speech recognition is a complex task because it must cope with strong accents, run-over words, varying rates of speech and background noise. Earlier speech recognition systems were speaker dependent, and constructing acoustic models with non-linear boundaries was challenging for them. This paper presents recognition of isolated English words in three steps: pre-processing, segmentation and extraction of Mel Frequency Cepstral Coefficients (MFCC) from the audio signal of a video. Training and classification of the audio signals are then done using a Convolutional Neural Network (CNN). Various types of input features, which play a vital role in the recognition process, are described, and we infer that representing the MFCC features with different analysis window lengths and window steps produces different results. The feature set with higher Winlen and Winstep yields the better result, i.e., 97%.

Keywords Convolutional neural network · Mel frequency cepstral coefficient · Fast Fourier transform (FFT) · Hidden Markov model (HMM) · Gaussian mixture model (GMM) · Artificial neural network (ANN)

U. Kulkarni (B) · C. Rao · S. M. Meena · S. V. Gurlahosur · P. Benagi · S. Kulkarni KLE Technological University, Hubballi, India e-mail: [email protected]
C. Rao e-mail: [email protected]
S. M. Meena e-mail: [email protected]
S. V. Gurlahosur e-mail: [email protected]
P. Benagi e-mail: [email protected]
S. Kulkarni e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2021 V. K. Gunjan and J. M. Zurada et al. (eds.), Modern Approaches in Machine Learning and Cognitive Science: A Walkthrough, Studies in Computational Intelligence 956, https://doi.org/10.1007/978-3-030-68291-0_34
1 Introduction
Recognition of isolated English words is the transcription of human speech into readable text. It involves building a system that maps acoustic signals to strings of words [1]. Human-computer interaction has become a vital part of life, as people depend on computers for tasks such as communication, control and command. People with disabilities can find it hard to use keyboards or LCD displays, so interaction with a computer through speech is of great significance to them [2]. This paper addresses one such application: building subtitles for engineering e-lecture videos.

Over the years, speech recognition systems have evolved through different algorithms from diverse disciplines. They are built on three broad approaches: the pattern recognition approach, the acoustic-phonetic approach and the artificial intelligence approach [3]. Despite many advances, speech recognition in real-time situations still lags behind human-level performance [4]. Recognition of English words is challenging because human speech signals differ widely from one another owing to prosodic features, background noise, different dialects and run-over words. Many approaches to audio classification had shortcomings because they required huge amounts of training data [5, 6]. Neural networks trained with gradient descent did not scale well, and increasing the number of layers, nodes and connections did not guarantee a solution. Many machine learning approaches require normalized data, as they do not cope well with features on different scales. Speech recognition systems such as HMM coupled with GMM are susceptible to small shifts in speech features. CNNs address the aforementioned shortcomings.

CNNs are designed with one or more convolutional layers, pooling layers, dense layers and finally a softmax layer, which predicts the output from the probabilities computed for each class [7]. A CNN takes advantage of its special structure (spatial connectivity, limited weight sharing and pooling), which makes it robust to small variations in speech features [5, 8]. The rest of this paper is organized as follows: Sect. 2 presents related work, Sect. 3 describes the proposed work, Sect. 4 briefly explains feature extraction (MFCC), the CNN and the implementation details, Sect. 5 discusses the results and Sect. 6 concludes the paper.
2 Related Work
Before the emergence of neural networks, researchers applied several classification algorithms to high-level features such as MFCC to identify the phoneme in each frame. Speech recognition systems such as HMM coupled with GMM, which are based on the expectation-maximization algorithm, performed well and are discussed in [9–11]. The HMM handled the varying temporal features of speech, and the GMM, with its discriminative power, predicted each HMM state. Their performance was further improved by fine-tuning with discriminative features, after which they were trained for maximum efficiency. GMMs had a major shortcoming: they were statistically inefficient at modelling non-linear boundaries [12]. HMMs perform better with ANNs, as ANNs are powerful discriminative models and HMMs are good at handling the temporal features of speech. Hybrid HMM/ANN systems have been performing better than HMM/GMM systems, which is ascribed to their use of “deep learning” [9, 13–16]. The increased performance is related both to the increase in the number of layers and to the increase in the level of abstraction; the use of context-dependent phonemes is described in [17, 18]. The use of spectral features (obtained by converting time-domain signals into the frequency domain using the FFT) as opposed to cepstral features (MFCC) is described in [7, 19]. The authors of [20–22] discuss CNN systems with filter bank energies as input. The use of the raw audio signal as input is proposed in [5]. CNNs have been applied to acoustic modelling before, notably in [6, 23], where they were applied over overlapping acoustic frames in order to learn acoustic features that are more stable over time for classes such as phone, speaker and gender.
3 Proposed Work
Speech recognition is mainly of four types: isolated word recognition, connected word recognition, continuous speech recognition and spontaneous speech recognition. We present a speaker-independent isolated word recognition system. The proposed approach involves four phases: segmentation, feature extraction, training and classification using a CNN. The inputs to the CNN are MFCC features and filter bank energies, which are later compared to determine which of them better represents the audio. The drawbacks evident in other speech recognition systems are overcome by using a CNN.

We built our own data set, which contains audio recordings of English words and a few engineering e-lecture videos. The recordings contain 107 English words spoken by three speakers, two male and one female. Each word has 15 samples, five from each of the three speakers. All of these were recorded in a single .wav audio file in a well-equipped radio station. The engineering e-lecture videos were collected from various machine learning tutorials by Andrew Ng, and popular keywords were selected from these videos for recognition.

The first phase extracts the audio from the engineering e-lecture videos using Audacity. The audio signal is then segmented by detecting silence in the audio file with pydub, a free Python library. Feature extraction is the most important step, as the entire classification performance depends on how well the features are represented for the classifier. MFCC and filter bank energies, which have been state-of-the-art feature extraction algorithms since the 1980s, are used. Lastly, these features are fed as input to the CNN for classification of the audio signals.
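The idea behind silence-based segmentation can be sketched without any audio library. The following is a minimal illustration only, not the pydub routine used in this work: the function name `segment_by_silence`, the amplitude threshold, the hop size and the minimum silence duration are all assumptions chosen for the example.

```python
def segment_by_silence(samples, rate, min_silence_ms=200, threshold=0.02, hop_ms=10):
    """Return (start, end) sample indices of the non-silent segments.

    A hop counts as silent when its peak amplitude stays below `threshold`
    (hypothetical value; samples are assumed to lie in [-1, 1]).
    """
    hop = max(1, int(rate * hop_ms / 1000))
    min_silent_hops = max(1, min_silence_ms // hop_ms)
    segments = []
    start, end, silent_run = None, 0, 0
    for i in range(0, len(samples), hop):
        chunk = samples[i:i + hop]
        if max(abs(s) for s in chunk) >= threshold:
            if start is None:
                start = i                      # a new word begins here
            end = min(i + hop, len(samples))   # end of the last loud hop
            silent_run = 0
        elif start is not None:
            silent_run += 1
            if silent_run >= min_silent_hops:  # pause long enough: close the word
                segments.append((start, end))
                start = None
    if start is not None:                      # signal ended inside a word
        segments.append((start, end))
    return segments
```

Each returned pair marks one candidate word, which can then be cut out of the recording and passed on to feature extraction.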
Fig. 1 Signal before pre-emphasis
4 Implementation
4.1 Feature Extraction
The feature extraction phase aims to produce a better representation of the speech waveform, one that minimises the loss of the information that discriminates the waveforms from one another. In this phase two features are extracted: MFCC and filter bank energies. The steps involved in calculating MFCC features are pre-emphasis, framing, windowing, Fast Fourier Transform (FFT), application of the Mel filter bank and Discrete Cosine Transform (DCT). Filter bank energies are the intermediate representation of MFCC features, obtained after the Mel filter banks are applied to the audio signal.
4.1.1 Pre-emphasis
This step amplifies the higher frequencies, which have lower magnitude than the lower frequencies. It also increases the signal-to-noise ratio (SNR). Figures 1 and 2 show an audio signal before and after pre-emphasis.
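Pre-emphasis is commonly implemented as a first-order high-pass filter, y[n] = x[n] − αx[n − 1]. The sketch below assumes this form; the coefficient α = 0.97 is a widely used default and is an assumption here, since the text does not state the value that was used.

```python
def pre_emphasis(signal, alpha=0.97):
    """Apply y[n] = x[n] - alpha * x[n-1]; the first sample is passed through."""
    if not signal:
        return []
    return [signal[0]] + [signal[n] - alpha * signal[n - 1]
                          for n in range(1, len(signal))]
```

A slowly varying (low-frequency) input is almost cancelled, whereas a rapidly alternating (high-frequency) input is nearly doubled in amplitude, which is the boost described above.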
Fig. 2 Signal after pre-emphasis
4.1.2 Framing
After pre-emphasis, the audio signal is divided logically into frames. The frequency content of an audio waveform changes over time, so the waveform is divided into short frames on the assumption that the frequencies are stationary over each short period. Typical audio frames are 25 ms long with a frame step of 10 ms, so consecutive frames overlap by 15 ms. For a 16 kHz signal, each frame contains 0.025 × 16,000 = 400 samples and the frame step equals 160 samples. The first 400-sample frame therefore starts at sample 0 and the next 400-sample frame starts at sample 160.
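The frame arithmetic can be checked with a short sketch. The helper name `frame_signal` is hypothetical, and the choice to use only full-length frames (no zero padding of a final partial frame) is an assumption of this example.

```python
def frame_signal(signal, rate=16000, winlen=0.025, winstep=0.010):
    """Split a signal into overlapping frames.

    Returns (frames, frame_len, frame_step), with lengths in samples.
    A signal shorter than one frame yields a single, shorter frame.
    """
    frame_len = int(round(winlen * rate))    # 0.025 s * 16000 Hz = 400 samples
    frame_step = int(round(winstep * rate))  # 0.010 s * 16000 Hz = 160 samples
    last_start = max(len(signal) - frame_len, 0)
    frames = [signal[s:s + frame_len]
              for s in range(0, last_start + 1, frame_step)]
    return frames, frame_len, frame_step
```

Under these assumptions, one second of 16 kHz audio gives 98 full frames, each starting 160 samples after the previous one.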
4.1.3 Windowing
A Hamming window is used in our approach; it has a bell shape at the centre and tapers towards the edges (Fig. 3). It is widely used to avoid spectral leakage, which is caused when the beginning and end of a sample do not match. When an audio signal is divided logically into frames, the ends of adjacent frames do not match, which may lead to spectral leakage. To avoid this, a windowing function such as the Hamming window is applied; it tapers each frame smoothly towards zero at its ends and thus avoids discontinuities in the signal. Equation 1 gives the Hamming window function (HWF), where N is the number of samples in a frame and 0 ≤ n ≤ N − 1.
Fig. 3 Hamming window function
$$w[n] = 0.54 - 0.46 \cos\left(\frac{2\pi n}{N-1}\right) \qquad (1)$$
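Equation 1 translates directly into code. The sketch below is illustrative; the helper names `hamming` and `apply_window` are hypothetical.

```python
import math

def hamming(N):
    """Hamming window of Eq. 1: w[n] = 0.54 - 0.46 * cos(2*pi*n / (N - 1))."""
    if N == 1:
        return [1.0]  # degenerate one-sample window
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (N - 1)) for n in range(N)]

def apply_window(frame):
    """Multiply a frame sample-by-sample with a Hamming window of equal length."""
    return [x * w for x, w in zip(frame, hamming(len(frame)))]
```

Note that the window does not reach exactly zero at its ends: w[0] = w[N − 1] = 0.08, which is small enough to suppress most of the discontinuity between frames.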
4.1.4 Fast Fourier Transform and Power Spectrum
The FFT is used to convert the signal from the time domain to the frequency domain. An N-point FFT is applied to each frame to calculate the frequency spectrum; this is also known as the Short-Time Fourier Transform. The power spectrum is then calculated for each frame. This step is motivated by the human cochlea: it provides information about the frequencies that are present in each frame. The power spectrum (periodogram) is calculated using Eq. 2, where x_i is the ith frame of signal x.
$$P = \frac{|\mathrm{FFT}(x_i)|^2}{N} \qquad (2)$$
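As a small worked example of Eq. 2, the periodogram of one frame can be computed with a direct discrete Fourier transform. This is a sketch for illustration: a real implementation would use an FFT routine for speed, and returning only the one-sided spectrum (bins 0 to N/2) is an assumption of this example.

```python
import cmath
import math

def power_spectrum(frame, nfft):
    """One-sided periodogram P[k] = |X[k]|^2 / nfft for k = 0 .. nfft // 2.

    The frame is truncated or zero-padded to nfft samples. A naive O(N^2)
    DFT is used here purely for clarity.
    """
    padded = list(frame[:nfft]) + [0.0] * max(nfft - len(frame), 0)
    spectrum = []
    for k in range(nfft // 2 + 1):
        X_k = sum(x * cmath.exp(-2j * math.pi * k * n / nfft)
                  for n, x in enumerate(padded))
        spectrum.append(abs(X_k) ** 2 / nfft)
    return spectrum
```

For a pure cosine that completes four cycles within a 32-sample frame, almost all of the power falls into bin 4, which is the frequency information the subsequent Mel filter bank operates on.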
4.1.5 Filter Banks
Triangular filters, typically 26–40 in number, are used to obtain the filter banks. The filters are spaced on the Mel scale and applied to the power spectrum to extract frequency bands. The Mel scale is inspired by the non-linear human perception of sound. The filters are closer to each other at lower frequencies (and hence more discriminative) than at higher frequencies.
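The non-uniform spacing of the filters can be illustrated by computing their centre frequencies. The conversion m = 2595 log10(1 + f/700) used below is one common definition of the Mel scale and is an assumption here, as are the 0–8000 Hz range and the helper names.

```python
import math

def hz_to_mel(f):
    """Common Mel-scale formula (assumed; the text does not give one)."""
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def filter_centres(n_filters=26, low_hz=0.0, high_hz=8000.0):
    """Centre frequencies (Hz) of filters equally spaced on the Mel scale."""
    lo, hi = hz_to_mel(low_hz), hz_to_mel(high_hz)
    step = (hi - lo) / (n_filters + 1)
    return [mel_to_hz(lo + step * i) for i in range(1, n_filters + 1)]
```

Because the centres are equally spaced in Mel, the gaps between them in Hz grow with frequency, so the filters sit closer together at the low end of the spectrum.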
4.1.6 Mel-Frequency Cepstral Coefficients
The filter bank coefficients computed in the previous step are highly correlated because the filters overlap. To de-correlate them, the Discrete Cosine Transform is applied to the 26–40 coefficients obtained from the filter banks. Only 12–13 coefficients are kept and the rest are discarded, as they represent only fast-changing filter bank features that contribute little to the recognition task. Figure 4 shows the MFCC features.
Fig. 4 MFCC features
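The de-correlation step can be sketched with an unnormalised DCT-II. Taking the logarithm of the filter bank energies before the DCT is the usual practice and is an assumption here, because the text does not mention it; the function names are hypothetical.

```python
import math

def dct2(values):
    """Unnormalised DCT-II: X[k] = sum_n x[n] * cos(pi * k * (2n + 1) / (2N))."""
    N = len(values)
    return [sum(v * math.cos(math.pi * k * (2 * n + 1) / (2 * N))
                for n, v in enumerate(values))
            for k in range(N)]

def mfcc_from_filterbank(energies, n_keep=13):
    """Keep the first n_keep DCT coefficients of the log filter bank energies."""
    log_energies = [math.log(max(e, 1e-10)) for e in energies]  # guard against log(0)
    return dct2(log_energies)[:n_keep]
```

With 26 filter bank energies per frame, this yields the 13 coefficients per frame that are used as CNN input later in the chapter.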
4.2 Convolutional Neural Network
In our approach we use 2D convolution, where the input MFCC speech features are represented in an image-like format. The following sections explain how the network is organized.
4.2.1 Convolution Layer
Convolution is a dot-product operation in which each neuron of a convolutional layer has a receptive field, or field of vision, covering a small region of the input [24]. The receptive fields tile over each other to cover the entire input space. The neurons act as local filters and exploit the spatial correlation present in audio signals. The first convolutional layer detects features that can be easily recognized and interpreted; as the number of convolutional layers increases, they detect features that are relatively small, abstract and hard to interpret. The last layer of a CNN combines all the features detected in the previous layers and makes a specific classification of the input data. Figure 5 shows how the various layers are organized in the CNN, and Fig. 6 shows how the input features are represented in two dimensions.
Fig. 5 Layers in CNN
Fig. 6 2D input MFCC features
The input to the convolutional layer is the MFCC features. There are 13 coefficients for each frame of an audio signal, and our approach uses 50 frames, giving a total of 650 coefficients. The input is reshaped to 26 × 26 by padding the input features with zeros. Every neuron of a convolution layer is connected to a small part of the input neurons. In Fig. 7, a neuron of a hidden layer is connected to a 5 × 5 region and convolves over 25 features. In a CNN, weights and biases are shared among the neurons of a hidden layer. For example, as shown in Fig. 7, each neuron has a bias and 5 × 5 weights connected to its local receptive field, and the same bias and weights are shared among all the hidden neurons of a convolution layer. All neurons therefore search for a particular feature at different locations of the input space, which produces a feature map. By using different sets of shared weights and biases, a number of feature maps are generated; the number of feature maps equals the number of filters of a CNN layer. Weights are initialised with the Glorot uniform scheme, in which they are drawn from a zero-mean distribution with a specific variance. The output of the (j, k)th hidden neuron is given by Eq. 3,
Fig. 7 Convolution operation
$$\sigma\left(b + \sum_{l=0}^{4} \sum_{m=0}^{4} w_{l,m}\, a_{j+l,k+m}\right) \qquad (3)$$
where σ is the neural activation function (ReLU), b is the shared bias, w_{l,m} is a 5 × 5 array of shared weights and a_{j,k} denotes the input activation at position (j, k). The rectifier is the activation function f(x) = max(0, x), and nodes that use it are known as ReLU nodes. ReLU is used rather than a linear activation function because it adds non-linearity to the network. Its other advantages over activation functions such as sigmoid or tanh are that it is less prone to vanishing gradients and that it gives a sparse representation, whereas sigmoid and tanh give dense representations.
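Equation 3 and the zero-padded input can be illustrated with plain Python lists. This is a sketch of a single feature map only, with stride 1 and no border padding; appending the padding zeros at the end of the coefficient vector is an assumption, because the text does not say where the zeros are placed, and the helper names are hypothetical.

```python
def relu(x):
    """Rectifier f(x) = max(0, x)."""
    return max(0.0, x)

def to_square(coeffs, side=26):
    """Zero-pad a flat coefficient list to side*side values and reshape it."""
    padded = list(coeffs) + [0.0] * (side * side - len(coeffs))
    return [padded[r * side:(r + 1) * side] for r in range(side)]

def conv_feature_map(a, w, b):
    """Eq. 3 for every valid (j, k): relu(b + sum_l sum_m w[l][m] * a[j+l][k+m])."""
    size = len(w)                    # 5 for a 5 x 5 kernel
    rows = len(a) - size + 1         # 26 - 5 + 1 = 22
    cols = len(a[0]) - size + 1
    return [[relu(b + sum(w[l][m] * a[j + l][k + m]
                          for l in range(size) for m in range(size)))
             for k in range(cols)]
            for j in range(rows)]
```

Applying one 5 × 5 kernel to the 26 × 26 input gives a 22 × 22 feature map; a layer with several filters repeats this with a different set of shared weights and bias for each map.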
4.2.2 Pooling Layer
A pooling layer is generally used after a convolution layer. Its main aim is to down-sample the input data, that is, to reduce its dimensionality [25]. The pooling layer takes the output of the convolution layer, which has n feature maps, and condenses each feature map using max pooling. In max pooling, a pooling unit outputs the maximum of a 2 × 2 region of the feature map, as shown in Fig. 8.
Fig. 8 Pooling operation
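Max pooling over non-overlapping 2 × 2 regions can be sketched as follows. Dropping a trailing odd row or column is an assumption of this example; it is consistent with the 7 × 7 to 3 × 3 reduction listed in Table 1.

```python
def max_pool_2x2(fmap):
    """Maximum over each non-overlapping 2 x 2 block of a 2D feature map."""
    return [[max(fmap[r][c], fmap[r][c + 1], fmap[r + 1][c], fmap[r + 1][c + 1])
             for c in range(0, len(fmap[0]) - 1, 2)]
            for r in range(0, len(fmap) - 1, 2)]
```

For example, a 4 × 4 map holding the values 1 to 16 row by row is reduced to the 2 × 2 map [[6, 8], [14, 16]].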
4.2.3 Dropout Layer and Dense Layer
A dense layer is a fully connected layer used after the convolution and pooling layers. In our network, a dense layer follows the second max-pooling layer. A dense layer connects to all the neurons of the preceding layer and produces the output for the layers that follow. A dropout layer is used to prevent inter-dependencies between nodes: nodes are ignored at random during training so that nodes do not learn functions that rely on input from other nodes. This makes the network more robust and prevents the convolutional neural network from overfitting [26].
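The random masking performed by a dropout layer during training can be sketched as below. The drop probability p = 0.5 and the rescaling of the surviving activations by 1/(1 − p) (so-called inverted dropout) are assumptions of this example; the text does not state the settings that were used.

```python
import random

def dropout(activations, p=0.5, rng=random):
    """Zero each activation with probability p and rescale the survivors.

    Intended for training only; at test time the activations are left unchanged.
    """
    keep = 1.0 - p
    return [a / keep if rng.random() < keep else 0.0 for a in activations]
```

Because a different random subset of nodes is silenced on every training pass, no node can rely on the presence of a particular other node.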
5 Results
The network is trained with 107 audio signals on an i7 processor with 32 GB RAM and a 2 GB NVIDIA GTX 950. It is built with the Lasagne framework for Python, which uses Theano as its base library. Table 1 gives the CNN configuration, with which the network was run for 1200 epochs, yielding a result of 97%. Figure 9 shows the confusion matrix, i.e., the correctly and wrongly classified audio signals. Table 2 shows that representing the MFCC features with different analysis window lengths and window steps produces different results. The feature set with higher Winlen and Winstep yields the better result, 97%, compared with 78% for the other.
Table 1 CNN layer information
No. | Name | Size
0 | input | 1 × 26 × 26
1 | conv2d1 | 23 × 22 × 22
2 | maxpool 1 | 23 × 11 × 11
3 | conv2d2 | 46 × 7 × 7
4 | maxpool 2 | 46 × 3 × 3
5 | dropout 1 | 46 × 3 × 3
6 | dense | 256
7 | dropout 2 | 256
8 | output | 107
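The sizes in Table 1 follow from simple shape arithmetic. The sketch below reproduces them under the assumptions that both convolution layers use 5 × 5 kernels with stride 1 and no border padding, and that both pooling layers use non-overlapping 2 × 2 regions; the 5 × 5 kernel is stated in the text only for the layer shown in Fig. 7, and the function name is hypothetical.

```python
def layer_shapes(side=26, filters=(23, 46), kernel=5, pool=2):
    """Return (name, (channels, height, width)) for the input, conv and pool layers."""
    shapes = [("input", (1, side, side))]
    for i, n_filters in enumerate(filters, start=1):
        side = side - kernel + 1          # valid convolution: 26 -> 22, 11 -> 7
        shapes.append((f"conv2d{i}", (n_filters, side, side)))
        side = side // pool               # 2 x 2 max pooling: 22 -> 11, 7 -> 3
        shapes.append((f"maxpool {i}", (n_filters, side, side)))
    return shapes
```

The last pooling output, 46 × 3 × 3, flattens to 414 values, which feed the 256-unit dense layer and finally the 107-way output layer.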
Fig. 9 Confusion matrix
Table 2 MFCC feature extraction using different parameters
S. No. | Winlen | Winstep | Result (%)
1. | Duration/20 | Duration/20 | 78
2. | Duration/30 | Duration/50 | 97
Table 3 MFCC features v/s filter bank energies
Feature | No. of coefficients | Result (%)
MFCC | 650 | 97
Filter bank energies | 1300 | 89
Recently there has been much discussion about whether MFCC features or filter bank energies should be used, since MFCC features tend to lose correlation between the frames of a signal owing to the application of the Discrete Cosine Transform. Table 3 compares the two using the same network and the same window length and window step. In our study MFCC features proved to be better than filter bank energies, even though the filter bank energies carried more information (features).
6 Conclusion and Future Work
In this paper we discussed how a CNN can be applied to the recognition of isolated English words. We have described how varying the parameters of the analysis window and using different input speech features affect the output. In future work we will focus on giving multiple inputs to the CNN, such as features with their second or third temporal derivatives, and on the use of multi-modal features.
References 1. Jurafsky, D.: Speech and Language Processing. Pearson Education India (2000) 2. Grewal, S.S., Kumar, D.: Isolated word recognition system for English language. Int. J. Inform. Technol. Knowl. Manag. 2(2), 447–450 (2010) 3. Benzeghiba, M., De Mori, R., Deroo, O., Dupont, S., Erbes, T., Jouvet, D., Fissore, L., Laface, P., Mertins, A., Ris, C., Rose, R.: Automatic speech recognition and speech variability: a review. Speech Commun. 49(10–11), 763–786 (2007) 4. Deng, Y., Acero, D.G.E.: Context-dependent trained machine neural networks for large vocabulary speech recognition. IEEE Trans. Audio Speech Lang. Process. 20(2), 33–42 (2012) 5. Palaz, D., Doss, M.M., Collobert, R.: Convolutional neural networks-based continuous speech recognition using raw speech signal. In: 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 19 Apr 2015, pp. 4295–4299. IEEE 6. Lee, H., Grosse, R., Ranganath, R., Ng, A.Y.: Convolutional deep belief networks for scalable unsupervised learning of hierarchical representations. In: Proceedings of the 26th Annual International Conference on Machine Learning, 14 Jun 2009, pp. 609–616 7. Abdel-Hamid, O., Mohamed, A.R., Jiang, H., Penn, G.: Applying convolutional neural networks concepts to hybrid NN-HMM model for speech recognition. In: 2012 IEEE International Conference on ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP) 25 Mar 2012, pp. 4277–4280. IEEE
8. Pujar, K., Chickerur, S., Patil, M.S.: Combining RGB and depth images for indoor scene classification using deep learning. In: 2017 IEEE International Conference on Computational Intelligence and Computing Research (ICCIC), 14 Dec 2017, pp. 1–8. IEEE 9. Pan, J., Liu, C., Wang, Z., Hu, Y., Jiang, H.: Investigation of deep neural networks (DNN) for large vocabulary continuous speech recognition: why DNN surpasses GMMs in acoustic modeling. In: 2012 8th International Symposium on Chinese Spoken Language Processing, 5 Dec 2012, pp. 301–305. IEEE 10. He, X., Deng, L., Chou, W.: Discriminative learning in sequential pattern recognition. IEEE Sig. Process. Mag. 25(5), 14–36 (2008) 11. Deng, L., Li, X.: Machine learning paradigms for speech recognition: an overview. IEEE Trans. Audio Speech Lang. Process. 21(5), 1060–1089 (2013) 12. Hinton, G., Deng, L., Yu, D., Dahl, G.E., Mohamed, A.R., Jaitly, N., Senior, A., Vanhoucke, V., Nguyen, P., Sainath, T.N., Kingsbury, B.: Deep neural networks for acoustic modeling in speech recognition: the shared views of four research groups. IEEE Sig. Process. Mag. 29(6), 82–97 (2012) 13. Dahl, G., Ranzato, M.A., Mohamed, A.R., Hinton, G.E.: Phone recognition with the meancovariance restricted Boltzmann machine. In: Advances in Neural Information Processing Systems, pp. 469–477 (2010) 14. Mohamed, A.R., Sainath, T.N., Dahl, G., Ramabhadran, B., Hinton, G.E., Picheny, M.A.: Deep belief networks using discriminative features for phone recognition. In: 2011 IEEE International Conference on Acoustics, Speech And Signal Processing (ICASSP), 22 May 2011, pp. 5060– 5063. IEEE 15. Yu, D., Deng, L., Dahl, G.: Roles of pre-training and fine-tuning in context-dependent DBNHMMs for real-world speech recognition. In: Proceeding of NIPS Workshop on Deep Learning and Unsupervised Feature Learning, 1 Dec 2010 16. Dahl, G.E., Yu, D., Deng, L., Acero, A.: Large vocabulary continuous speech recognition with context-dependent DBN-HMMs. 
In: 2011 IEEE International Conference on Acoustics, Speech And Signal Processing (ICASSP), 22 May 2011, pp. 4688–4691. IEEE 17. Seide, F., Li, G., Chen, X., Yu, D.: Feature engineering in context-dependent deep neural networks for conversational speech transcription. In: 2011 IEEE Workshop on Automatic Speech Recognition and Understanding, 11 Dec 2011, pp. 24–29. IEEE 18. Dahl, G.E., Yu, D., Deng, L., Acero, A.: Context-dependent pre-trained deep neural networks for large vocabulary speech recognition. IEEE Trans. Audio Speech Lang. Process. 20, 30–42 (2012) 19. Lee, H., Pham, P., Largman, Y., Ng, A.Y.: Unsupervised feature learning for audio classification using convolutional deep belief networks. In: Advances in Neural Information Processing Systems, pp. 1096–1104 (2009) 20. Sainath, T.N., Mohamed, A.R., Kingsbury, B., Ramabhadran, B.: Deep convolutional neural networks for LVCSR. In: 2013 IEEE International Conference on Acoustics, Speech and Signal Processing, 26 May 2013, pp. 8614–8618. IEEE 21. Bocchieri, E., Dimitriadis, D.: Investigating deep neural network based transforms of robust audio features for LVCSR. In: 2013 IEEE International Conference on Acoustics, Speech and Signal Processing, 26 May 2013, pp. 6709–6713. IEEE 22. Palaz, D., Collobert, R., Doss, M.M.: End-to-end phoneme sequence recognition using convolutional neural networks. arXiv preprint arXiv:1312.2137 (2013) 23. Hau, D., Chen, K.: Exploring hierarchical speech representations with a deep convolutional neural network. UKCI 2011 Accepted Papers, vol. 37 (2011) 24. LeCun, Y., Bengio, Y.: Convolutional networks for images, speech, and time series. Handb. Brain Theor. Neural Netw. 3361(10), 1995 (1995) 25. Desai, S.D., Giraddi, S., Narayankar, P., Pudakalakatti, N.R., Sulegaon, S.: Back-propagation neural network versus logistic regression in heart disease classification. In: Advanced Computing and Communication Technologies, pp. 133–144. Springer, Singapore (2019) 26. 
Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I., Salakhutdinov, R.: Dropout: a simple way to prevent neural networks from overfitting. J. Mach. Learn. Res. 15(1), 1929–1958 (2014)
Indoor Object Location Finding Using UWB Technology Jay A. Soni, Bharat R. Suthar, Jagdish M. Rathod, and Milendrakumar M. Solanki
Abstract UWB is an emerging technology that gives the location of an object with centimetre-level accuracy. It works with extremely short pulses in a carrier-free band. This paper proposes an implementation of an indoor positioning system using Ultra Wide Band (UWB). We create a network of anchors that is capable of tracking multiple UWB tags and gives the location of a tag along the x-axis, y-axis and z-axis.
Keywords Ultra wide band · Indoor positioning systems (IPSs) · Real-time localization · UWB transceivers · UWB tag · Graphical user interface (GUI)
J. A. Soni (B) · B. R. Suthar · J. M. Rathod · M. M. Solanki Birla Vishvakarma Mahavidhyalaya (an Autonomous Institution), Anand, Gujrat, India e-mail: [email protected]
B. R. Suthar e-mail: [email protected]
J. M. Rathod e-mail: [email protected]
M. M. Solanki e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2021 V. K. Gunjan and J. M. Zurada et al. (eds.), Modern Approaches in Machine Learning and Cognitive Science: A Walkthrough, Studies in Computational Intelligence 956, https://doi.org/10.1007/978-3-030-68291-0_35

1 Introduction
Positioning is the identification of the location of objects, people and other things. It is classified into two types, outdoor positioning and indoor positioning, depending on the environment: outdoor positioning is performed outside buildings, whereas indoor positioning is performed inside buildings such as malls, houses and hospitals. The Global Positioning System is preferable and efficient for outdoor spaces rather than indoor infrastructure, because satellite radio signals cannot penetrate walls. An indoor positioning system determines the actual real-time position of things in physical space repeatedly. Current tracking systems that use only technologies such as Bluetooth or WLAN are not very flexible: they need the location to be recalculated after small changes in the infrastructure, which also leads to lower accuracy and precision. Indoor positioning has five quality metrics: (1) system accuracy and precision, (2) system coverage, (3) latency in making position updates, (4) the impact of the building's infrastructure and (5) the effect of random errors caused by signal interference and reflection [1].

The objective of the project is to track a UWB tag using an IPS network based on UWB technology. The IPS network consists of a minimum of four anchors, which all communicate with each other. The data from the anchors, which detect the position of the tag, go to a computer and a mobile app via a server or gateway, which gives the location of the tag and the anchors. This makes it easy to track a device and avoids the hassle of searching for assets. In the IPS network we used the DWM1001 as the UWB transceiver in the anchors, and all anchors communicate with each other via the server.
2 UWB Positioning
UWB positioning is a new and emerging technology. UWB has a high data rate that can reach 100 megabits per second (Mbps), which makes it a suitable solution for near- and far-field transmission. It also has a large bandwidth and very short pulse waveforms, which mitigate the effect of multipath interference and facilitate estimating the time of arrival of burst transmissions between the transmitter and the corresponding receivers. This makes UWB a more suitable solution for indoor positioning than other technologies. Moreover, the duration of a single pulse determines the minimum differential path delay, whereas the period of the signal defines the maximum observable multipath delay [2].
2.1 Comparison with Other Positioning Technologies
Unlike other positioning technologies such as infrared and ultrasound sensors, UWB technology does not require a line of sight and is not affected by interference from other communication technologies. Table 1 shows the differences between UWB and the other technologies [3].
3 Theory
UWB is used to locate an indoor object with high precision; GPS cannot be used in its place because it does not give good accuracy indoors. In an IPS, different methods are used to achieve an accurate position [4].
Table 1 Comparison between technologies
Technology | UWB | Bluetooth | Wi-Fi | RFID | GPS
Accuracy | Centimeter | 1–5 m | 5–15 m | Centimeter to 1 m | 5–20 m
Reliability | 5 Stars (strong immunity to multipath and noise) | 1 Star (sensitive to multipath and noise) | 1 Star (sensitive to multipath and obstruction) | 5 Stars | 3 Stars (sensitive to obstruction)
Security | 5 Stars | 1 Star | 1 Star | 1 Star | N/A
Latency | 5 Stars (Time required 3 s to get XYZ) | 1 Star (Time required >3 s to get XYZ) | 2 Stars (Time required = 1 s to get XYZ) | 3 Stars (Time required = 100 ms to get XYZ)
Scalability density | 4 Stars (>10’s to thousands of tags) | 2 Stars (Hundreds of tags) | 2 Stars (Hundreds to thousands of tags) | 5 Stars (Unlimited) | 5 Stars (Unlimited)
3.1 Ranging
Ranging is a method that finds the distance between a tag and an anchor. The distance is calculated with the physical formula d = c · t, where c = 299,792,458 m/s is the speed of light (the vacuum value, which is a close approximation in air). The time t is unknown, but it can be found as a time of arrival (TOA) using different methods, and the distance then follows [4].
One-Way Ranging
One-way ranging relies on two nodes with synchronized clocks. Node N1 transmits at time T_t and node N2 receives at time T_r. The ToF is therefore
$$T_f = T_r - T_t \tag{1}$$
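As a minimal sketch of Eq. (1) combined with d = c · t, the following Python snippet converts a pair of timestamps into a distance. The function name and the `clock_offset` parameter are illustrative assumptions, not part of the original system; the offset is only there to show how a small synchronization error affects the result.

```python
C = 299_792_458.0  # speed of light in m/s (vacuum value, used as an approximation for air)


def one_way_distance(t_transmit: float, t_receive: float, clock_offset: float = 0.0) -> float:
    """Distance from one-way ranging: T_f = T_r - T_t, d = c * T_f.

    clock_offset (hypothetical parameter) models how far the receiver clock
    runs ahead of the transmitter clock, in seconds.
    """
    time_of_flight = (t_receive + clock_offset) - t_transmit
    return C * time_of_flight


# A 10 ns flight time corresponds to roughly 3 m.
ideal = one_way_distance(0.0, 10e-9)
# The same measurement with a 1 ns clock offset between the nodes.
skewed = one_way_distance(0.0, 10e-9, clock_offset=1e-9)
print(f"ideal: {ideal:.3f} m, with 1 ns offset: {skewed:.3f} m")
```

Under these assumptions, each nanosecond of clock offset shifts the estimate by about 30 cm, which is why one-way ranging needs tightly synchronized nodes.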
If the nodes are not synchronized, the ToF estimate may be incorrect [4, 5].
Two-Way Ranging
In two-way ranging, both the anchor and the tag transmit and receive. To measure the distance between the two devices, they need to exchange at least two messages [5, 6]. The tag starts by sending an initial message to the known addresses of all anchors in the IPS network at time Tsi (time of sending the initial message). Each anchor records the time of initial reception, Tri, and replies with a response message at time Tsr that includes the message ID, Tri, and Tsi. The tag then receives the response message and records the time Trr. From Tsi, Tri, Tsr, and Trr the ToF is estimated, and hence the distance to every anchor is deduced [6, 7]. From the diagram in Fig. 1, the distance and the ToF are estimated as follows:
$$\text{Distance} = \text{ToF} \times c \tag{2}$$
where c is the speed of light (c = 299,792,458 m/s) and
$$\text{ToF} = \frac{(T_{rr} - T_{si}) - (T_{sr} - T_{ri})}{2} \tag{3}$$
Fig. 1 Estimation of ToF between tag and anchor
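The two-way ranging formulas in Eqs. (2) and (3) can be sketched in Python as follows. The function names and the example timestamps are illustrative assumptions. The example deliberately gives the anchor clock an arbitrary offset from the tag clock to show that the offset cancels, because each difference in Eq. (3) is taken on a single device.

```python
C = 299_792_458.0  # speed of light in m/s


def two_way_tof(t_si: float, t_ri: float, t_sr: float, t_rr: float) -> float:
    """Time of flight from Eq. (3).

    t_si, t_rr are read on the tag clock; t_ri, t_sr on the anchor clock.
    """
    round_trip = t_rr - t_si    # measured by the tag
    reply_delay = t_sr - t_ri   # measured by the anchor
    return (round_trip - reply_delay) / 2.0


def two_way_distance(t_si: float, t_ri: float, t_sr: float, t_rr: float) -> float:
    """Distance from Eq. (2): ToF multiplied by the speed of light."""
    return two_way_tof(t_si, t_ri, t_sr, t_rr) * C


# Hypothetical exchange: true ToF of 20 ns, anchor reply delay of 200 us,
# anchor clock offset of 5 s relative to the tag clock.
true_tof, reply, offset = 20e-9, 200e-6, 5.0
t_si = 0.0
t_ri = t_si + true_tof + offset      # anchor clock
t_sr = t_ri + reply                  # anchor clock
t_rr = t_si + 2 * true_tof + reply   # tag clock

print(f"ToF = {two_way_tof(t_si, t_ri, t_sr, t_rr) * 1e9:.3f} ns")
print(f"distance = {two_way_distance(t_si, t_ri, t_sr, t_rr):.3f} m")
```

This sketch ignores clock drift during the reply delay, which the double-sided schemes in [5, 6] are designed to reduce.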
4 Implementation
4.1 System Overview
As stated in the introduction, the problem is to find the position of an object inside a building. To solve it, we used five DWM1001 development boards (UWB transceivers), a Raspberry Pi board, and one server. Of the five boards, four are used as anchors and one as the tag, and the Raspberry Pi is used as a gateway that transmits all data to the server. Using the server data, we visualized the real-time positions of all anchors and the tag as x, y, and z coordinates on a mobile app and a PC (see Fig. 2).
Fig. 2 System overview
4.2 Hardware
The system has four anchors, one tag, and one Raspberry Pi that acts as the gateway and is connected to the server. The server needs a screen, on which it displays the positions of the tag and the anchors that it receives [8].
4.2.1 Tag
The tag is a DWM1001 development board. It is mounted on top of the Raspberry Pi board to form the gateway. The gateway communicates with the server over TCP, and the Raspberry Pi communicates with the DWM1001 board over UART [8, 9]. The current position of the tag is sent to the server and then displayed on the screen. The location of the tag is calculated using a nonlinear least-squares method implemented in JavaScript.
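The nonlinear least-squares step mentioned above can be illustrated with a small Gauss-Newton solver. This is a sketch under stated assumptions rather than the authors' JavaScript implementation: it works in two dimensions for brevity, the function name `locate_tag` and the anchor layout are hypothetical, and the ranges are noise-free.

```python
import math


def locate_tag(anchors, ranges, guess=None, iterations=20):
    """Estimate a 2D tag position from anchor positions and measured ranges.

    Minimizes the sum of squared range residuals with Gauss-Newton steps.
    """
    if guess is None:
        # Start from the centroid of the anchors.
        guess = (sum(a[0] for a in anchors) / len(anchors),
                 sum(a[1] for a in anchors) / len(anchors))
    x, y = guess
    for _ in range(iterations):
        a11 = a12 = a22 = b1 = b2 = 0.0
        for (ax, ay), measured in zip(anchors, ranges):
            dx, dy = x - ax, y - ay
            dist = math.hypot(dx, dy) or 1e-9   # avoid division by zero
            residual = dist - measured
            jx, jy = dx / dist, dy / dist        # Jacobian row of the residual
            a11 += jx * jx
            a12 += jx * jy
            a22 += jy * jy
            b1 += jx * residual
            b2 += jy * residual
        det = a11 * a22 - a12 * a12
        if abs(det) < 1e-12:
            break                                # degenerate anchor geometry
        # Solve the 2x2 normal equations (J^T J) step = J^T r.
        step_x = (a22 * b1 - a12 * b2) / det
        step_y = (a11 * b2 - a12 * b1) / det
        x, y = x - step_x, y - step_y
        if math.hypot(step_x, step_y) < 1e-9:
            break
    return x, y


# Hypothetical layout: four anchors at the corners of a 4 m x 3 m room.
anchors = [(0.0, 0.0), (4.0, 0.0), (4.0, 3.0), (0.0, 3.0)]
true_position = (1.0, 2.0)
ranges = [math.hypot(true_position[0] - ax, true_position[1] - ay) for ax, ay in anchors]
estimate = locate_tag(anchors, ranges)
print(f"estimated position: ({estimate[0]:.3f}, {estimate[1]:.3f})")
```

With real measurements the ranges carry noise, so the estimate settles at the point that best fits all four ranges rather than at an exact intersection.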
4.2.2 Anchor
Each anchor is a DWM1001 development board with an omnidirectional antenna [9]. The anchor works as a responder in the communication initiated by the tag to measure the distance between them. It receives the signal, processes it, and sends it back to the gateway, node, or server, which executes the positioning method and presents the result in different ways, for example on the mobile screen.
4.3 Software
C, Python, and JavaScript programs are used to configure the anchors and the tag and to process their data. C is used to program the DWM boards, and the JavaScript program and Python script are used to calculate the positions of the anchors and the tag.
4.4 Server/GUI
The server receives all anchor and tag positions from the gateway for processing. Initialization is done from the Graphical User Interface (GUI), which shows the anchor and tag positions as in Fig. 3. When the GUI is initialized, it is covered with a grid in which each cell measures 10 cm × 10 cm.
Fig. 3 Real time position of tag and anchor on GUI
Table 2 Error in distance measure by IPSs
| Actual (cm) | Measure by IPSs (cm) | Difference (cm) |
| --- | --- | --- |
| 30 | 23 | 7 |
| 50 | 41 | 9 |
| 100 | 94 | 6 |
| 200 | 202 | 2 |
| 300 | 297 | 3 |
| 500 | 500 | 5 |
Fig. 4 Line graph of actual distance and measure by IPSs
5 Results and Discussion
Figure 3 shows the positions of the anchors and the tag on the GUI screen in terms of the x, y, and z axes. The GUI also tracks the tag in real time; the anchor positions are fixed and the tag is movable. The tag moves at a speed of 20 cm/s, and the GUI shows the tag location on screen 2 s and 98 ms later. The accuracy of the location depends on the speed of the tag: in our tests the error is smaller at higher speed and larger at lower speed. An accuracy test was carried out to compare the actual distance of the tag with the distance measured by the IPSs network. The results of the test are shown in Table 2, and Fig. 4 presents them graphically. On the basis of the accuracy test, the accuracy of the IPSs system is ±9 cm, depending on the speed of the tag.
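The ±9 cm figure can be checked against Table 2 with a few lines of Python. The snippet below is only a sketch that recomputes the difference column from the printed actual and measured values; the variable names are illustrative. It also compares the recomputed differences with the printed ones, which flags the 500 cm row, where the printed difference does not match the printed measurement.

```python
# Values copied from Table 2 (all in cm).
actual = [30, 50, 100, 200, 300, 500]
measured = [23, 41, 94, 202, 297, 500]
printed_difference = [7, 9, 6, 2, 3, 5]

# Absolute error recomputed from the first two columns.
errors = [abs(a - m) for a, m in zip(actual, measured)]
max_error = max(errors)
mean_error = sum(errors) / len(errors)

# Rows whose printed difference disagrees with the recomputed one.
inconsistent_rows = [a for a, e, p in zip(actual, errors, printed_difference) if e != p]

print(f"recomputed errors (cm): {errors}")
print(f"maximum error: {max_error} cm, mean error: {mean_error:.1f} cm")
print(f"rows to re-check (actual distance, cm): {inconsistent_rows}")
```

The maximum recomputed error of 9 cm agrees with the accuracy stated in the text.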
6 Conclusion
In this paper, we proposed a method to implement an IPSs network with high accuracy using UWB transceivers. The indoor positioning system is based on a time of flight (time of arrival) process, and IEEE 802.15.4 supports single-sided two-way ranging, leading to centimeter accuracy in location measurement. The anchors are capable of communicating with one another. The UWB tag can communicate with all anchors of the system and performs the analysis needed to compute the position. Once the anchors are initialized, the locations of the anchors and the tag are sent to the GUI at a high data rate, and the GUI displays the setup of the anchors in the system. It can also track the tag and show its real-time position on the GUI screen in terms of the x, y, and z axes. The IPSs have an accuracy of ±9 cm.
Acknowledgements The authors are thankful to the Principal and the HoD of the Electronics Engineering Department for their motivation. We are also thankful to SSIP for funding the project, and to the Government of India - World Bank TEQIP-III project for providing funding assistance for publication. We are thankful to the Professor I/C of ELARC-COE of the Electronics Engineering Department, BVM Engineering College, for research-facility support.
References
1. Alarifi, A., Al-Salman, A.M., Alsaleh, M., Alnafessah, A.: Ultra wideband indoor positioning technologies: analysis and recent advances. Sensors 16, 707 (2016). https://doi.org/10.3390/s16050707
2. Malajner, M., Planinšič, P., Gleich, D.: UWB Ranging Accuracy. IEEE (2015). https://doi.org/10.1109/IWSSIP.2015.7314177
3. Liu, H., Darabi, H., Banerjee, P., Liu, J.: Survey of wireless indoor positioning techniques and systems. IEEE Trans. Syst. Man Cybern. Part C Appl. Rev. (2007)
4. Dädeby, S., Hesselgrena, J.: System for indoor positioning using ultra-wideband technology (2017). (Online). Available at http://publications.lib.chalmers.se/records/fulltext/249898/249898.pdf
5. Baba, A., Atia, M.: Burst mode symmetric double sided two way ranging. IEEE (2011). https://doi.org/10.1109/WD.2011.6098183
6. Yi, J., Leung, V.: An asymmetric double-sided two-way ranging for crystal offset. IEEE (2007). https://doi.org/10.1109/ISSSE.2007.4294528
7. Tsai, M.-F., Pham, T.-N., Hu, B.-C., Hsu, F.-R.: Improvement in UWB indoor positioning by using multiple tags to filter positioning. J. Internet Technol. 20(3) (2019). https://doi.org/10.3966/160792642019052003003
8. Lagerkvist, A.: Construction of an indoor positioning system using UWB. (Online). Available at http://www.diva-portal.org/smash/get/diva2:1290355/FULLTEXT01.pdf
9. DWM1001 User Manual. Decawave Ltd (2016)
Design and Tuning of Improved Current Predictive Control for PMSM S. Sridhar, Md Junaid, and Narri Yadaiah
Abstract This paper presents a design and tuning technique for an improved current predictive control based on a hybrid controller for the Permanent Magnet Synchronous Motor (PMSM). To overcome the accuracy problem of the PMSM, an ANFIS Controller is introduced into the Predictive Control. This control strategy provides the intended dynamic performance with increased accuracy. The proposed method includes a Hybrid Controller that comprises an ANFIS Controller in addition to the Fuzzy algorithm, to obtain higher accuracy by eliminating the static error and simple disturbances. The main feature of this technique is that it corrects the errors caused by the traditional predictive controller and the PI controller, thereby enhancing the accuracy. The analysis is confirmed by comparing simulations using the PI controller and the Hybrid Controller.
Keywords Permanent magnet motor · Predictive control · Model parameter mismatch · Proportional integral compensation link · Fuzzy algorithm · ANFIS
1 Introduction
A Permanent Magnet Synchronous Motor (PMSM) is an AC motor in which the field excitation is provided by permanent magnets and which has a sinusoidal back-EMF waveform. With permanent magnets, the PMSM can produce torque even at zero speed. Current control of the PMSM is the main area of focus for achieving the desired output. Among the current control methods for PMSM drives, such as PI (Proportional Integral) control, hysteresis control, and predictive control, the predictive control technique is popular because of its strong performance. Although predictive control has advantages such as fast dynamic response, it lacks accuracy because of static error and oscillations in the system.
S. Sridhar · Md Junaid (B) Department of Electrical and Electronics Engineering, JNTUA, Ananthapuramu, A.P, India
N. Yadaiah Department of Electrical and Electronics Engineering, JNTUH, Hyderabad, Telangana, India
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2021 V. K. Gunjan and J. M. Zurada et al. (eds.), Modern Approaches in Machine Learning and Cognitive Science: A Walkthrough, Studies in Computational Intelligence 956, https://doi.org/10.1007/978-3-030-68291-0_36
In the PMSM system [1], among the current control techniques, the PI controller [2] has the benefits of a simple structure, good stability, and reliability. Hysteresis control responds quickly but has several drawbacks, such as large ripple and a variable switching frequency, so it often cannot meet the requirements of high-performance control. According to the way the voltage vector is applied, current predictive control [3] of a PMSM system can be classified into two types. The required voltage can be obtained from the current waveforms and the rotor position angle [4, 5], and the voltage vector is then applied to the motor accurately through Space Vector PWM [6] of the inverter. After a few control periods, the motor current follows the reference current under the voltage vector output by the controller. PWM deadbeat predictive current control [7] gives the current loop of the motor control system good dynamic and steady-state performance. However, some issues remain to be resolved, because predictive current control depends on the model of the motor system and therefore requires an accurate model. Precise values of some key parameters are difficult to obtain, and some parameters, such as the stator inductance and resistance, change with the operating condition of the motor. To eliminate this accuracy problem of the PMSM, a Hybrid Controller is introduced into the Predictive Control [2]. This control strategy provides the intended dynamic performance with increased accuracy. The PMSM receives particular attention because of its growing importance and its scope for use in hybrid vehicles [8]. The proposed method includes a Hybrid Controller that comprises an ANFIS Controller in addition to the Fuzzy algorithm, to obtain higher accuracy by eliminating the static error and simple disturbances.
The main feature of this technique is to adjust the errors caused by the obsolete traditional predictive controller and PI controller, thereby enhancing the accuracy.
2 Concept of Predictive Current Control
The traditional PWM predictive current control algorithm is described as follows, and the structure of the design is shown in Fig. 1. The PMSM voltage equations in the d axis and q axis are:
$$\begin{aligned} v_d &= R i_d + L_d \frac{di_d}{dt} - \omega_e L_q i_q \\ v_q &= R i_q + L_q \frac{di_q}{dt} + \omega_e L_d i_d + \omega_e \varphi_f \end{aligned} \tag{1}$$
Here R is the stator resistance, L_d is the d-axis stator inductance, and L_q is the q-axis stator inductance. For a surface-mounted PMSM the stator inductances satisfy L_d = L_q = L. ω_e is the electrical angular velocity and φ_f is the magnetic flux linkage. i_d and i_q are the d-axis and q-axis stator currents, and v_d and v_q are the d-axis and q-axis stator voltages.
Fig. 1 Structure of Improved Predictive Current Control based on Hybrid Controller
For the discrete-time system, the d-axis and q-axis stator voltages are obtained by approximating the derivatives in Eq. (1). The sampling period of the discrete-time system is T. If T is small enough, ω_e can be treated as a fixed value within a period. From Eq. (1) the voltages are:
$$\begin{aligned} v_d(k) &= R i_d(k) + \frac{L}{T}\left[i_d(k+1) - i_d(k)\right] - \omega_e L i_q(k) \\ v_q(k) &= R i_q(k) + \frac{L}{T}\left[i_q(k+1) - i_q(k)\right] + \omega_e L i_d(k) + \omega_e \varphi_f \end{aligned} \tag{2}$$
Here (k) denotes the kth control period. According to Eq. (2), the applied voltages determine the stator currents i_d, i_q that result from the kth control period. Similarly, substituting the reference currents for the currents at k + 1 gives the predictive voltages for the upcoming periods k + 1, k + 2, …. The prediction is exact if the motor parameters are exact.
$$\begin{aligned} v_d^{pre}(k) &= R i_d(k) + \frac{L_d}{T}\left[i_d^{ref}(k) - i_d(k)\right] - \omega_e L_q i_q(k) \\ v_q^{pre}(k) &= R i_q(k) + \frac{L_q}{T}\left[i_q^{ref}(k) - i_q(k)\right] + \omega_e L_d i_d(k) + \omega_e \varphi_f \end{aligned} \tag{3}$$
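To make the relationship between Eqs. (2) and (3) concrete, the following Python sketch computes the predictive voltages from Eq. (3) and then advances the discrete model of Eq. (2) by one step. It assumes a surface-mounted machine (L_d = L_q = L), ignores inverter voltage limits and the one-period computation delay, and uses hypothetical motor parameters that are not taken from the chapter. With an exact model, the predicted voltage drives the currents to their references in a single period.

```python
import math


def predictive_voltage(i_d, i_q, i_d_ref, i_q_ref, w_e, R, L, psi_f, T):
    """Predictive voltages from Eq. (3), with L_d = L_q = L."""
    v_d = R * i_d + (L / T) * (i_d_ref - i_d) - w_e * L * i_q
    v_q = R * i_q + (L / T) * (i_q_ref - i_q) + w_e * L * i_d + w_e * psi_f
    return v_d, v_q


def next_current(i_d, i_q, v_d, v_q, w_e, R, L, psi_f, T):
    """Eq. (2) solved for the currents at step k + 1."""
    i_d_next = (1 - R * T / L) * i_d + T * w_e * i_q + T * v_d / L
    i_q_next = (1 - R * T / L) * i_q - T * w_e * i_d + T * (v_q - w_e * psi_f) / L
    return i_d_next, i_q_next


# Hypothetical parameters, for illustration only.
R, L, PSI_F, T = 2.875, 8.5e-3, 0.175, 1e-4   # ohm, henry, weber, second
W_E = 2 * math.pi * 100.0                      # electrical speed in rad/s

i_d, i_q = 0.5, 3.0            # measured currents at step k
i_d_ref, i_q_ref = 0.0, 10.0   # reference currents

v_d, v_q = predictive_voltage(i_d, i_q, i_d_ref, i_q_ref, W_E, R, L, PSI_F, T)
i_d_next, i_q_next = next_current(i_d, i_q, v_d, v_q, W_E, R, L, PSI_F, T)
print(f"v_d = {v_d:.2f} V, v_q = {v_q:.2f} V")
print(f"i_d(k+1) = {i_d_next:.6f} A, i_q(k+1) = {i_q_next:.6f} A")
```

If the R, L, or flux values used by the controller differ from those of the motor, the one-step result no longer lands on the reference, which is the parameter-mismatch problem that the hybrid controller is meant to address.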
The predictive voltages v_d^pre(k), v_q^pre(k) are applied through Space Vector Pulse Width Modulation (SVPWM) so that the current follows the reference current in the upcoming control period [9]. Fig. 2 shows how the timing sequence is split across control periods. Because of hardware limitations, the duty cycle can only be updated at the start of the (k + 1)th control period by the digital controller, for example a programmable device such as a microprocessor.
Fig. 2 The sequence concept of estimated current sampling of (k + 1) period and its duty cycle
Because of this delay, the predictive voltages v_d^pre(k), v_q^pre(k) do not give accurate performance on their own. It is therefore necessary to estimate the current values i_d(k + 1), i_q(k + 1) in order to improve the accuracy. They are given by the compensation equations shown below:
$$\begin{aligned} i_d^{com}(k+1) &= \left(1 - \frac{RT}{L}\right) i_d(k) + T \omega_e i_q(k) + \frac{v_d^{pre}(k-1)\,T}{L} \\ i_q^{com}(k+1) &= \left(1 - \frac{RT}{L}\right) i_q(k) - T \omega_e i_d(k) + \frac{v_q^{pre}(k-1)\,T}{L} \end{aligned} \tag{4}$$
where v_d^pre(k − 1), v_q^pre(k − 1) are the predictive voltages determined in the previous control period, and i_d^com(k + 1), i_q^com(k + 1) are the compensated current values for the (k + 1)th period that account for the delay. These values are used in place of the measured currents to obtain the delay-compensated predictive voltages:
$$\begin{aligned} v_d^{pre}(k) &= R i_d^{com}(k+1) + \frac{L_d}{T}\left[i_d^{ref}(k) - i_d^{com}(k+1)\right] - \omega_e L_q i_q(k) \\ v_q^{pre}(k) &= R i_q^{com}(k+1) + \frac{L_q}{T}\left[i_q^{ref}(k) - i_q^{com}(k+1)\right] + \omega_e L_d i_d^{com}(k+1) + \omega_e \varphi_f \end{aligned} \tag{5}$$
From Eqs. (4) and (5), the estimated current is able to follow the reference currents i_d^ref(k), i_q^ref(k) by the end of the following control period. From the reference current, estimated current, and compensated current equations, it follows that PWM predictive current control performs well, because the compensated values tie the output of the system closely to the reference parameters taken from the motor, as listed in Table 3. In this way the accuracy of the predictive controller output is enhanced. Other parameters, such as the inductance and the flux of the motor, have a more significant influence.
3 ANFIS Controller
ANFIS (adaptive neuro-fuzzy inference system) is a combination of fuzzy systems and neural networks and has the advantages of both. ANFIS is a data-driven learning procedure that uses a fuzzy inference system model with membership functions, fuzzy logic operators, and if-then rules to obtain the required output. It provides smoothness through fuzzy control (FC) interpolation and adaptability through neural-network back-propagation (Figs. 3, 4 and 5).
Fig. 3 ANFIS controller
Fig. 4 ANN Structure with two inputs and one output for ANFIS Controller
Fig. 5 Replacing PI controller with ANFIS Controller
Table 1 Comparative results between existing Fuzzy v/s Hybrid Controller when iq ref = 10 A, nt = 1500 rpm
| Parameters | Existing method | Proposed method |
| --- | --- | --- |
| Iq | 1.5–2.3 s (2 control periods) | 1.7–2 s (2 control periods) |
| Id | 0.443 A | 0.357 A |
| Mc1(k) | 1.5–2.22 (2 control periods) | 1.5–2.0 (2 control periods) |
4 Design of Hybrid Controller
The Hybrid Controller is the ANFIS Controller combined with the Fuzzy algorithm. It is used to overcome the limitations of the Fuzzy algorithm alone: the ANFIS Controller reduces the accuracy problem of the system, so the steady-state error decreases and the system stability improves. These are the reasons for using a Hybrid Controller that relies mainly on the ANFIS Controller and the rule base of the Fuzzy Inference System. It is designed as shown in the block diagram of the Enhanced Predictive Current Control based on Hybrid Controller in Fig. 1. A novel flux observer and the delay algorithm are used in addition to the Hybrid Controller. In the diagram, the Hybrid Controller block is a black box that contains the ANFIS Controller and the Fuzzy Inference System. Its design is simple, and its inputs are the same as those of the Fuzzy Controller. The iq and id output waveforms of the Predictive Current Control settle in fewer control periods than the waveforms of the earlier techniques for the PMSM, such as PI control and the Fuzzy algorithm. The existing methods are compared with the proposed method through the waveforms shown below; see Table 1 for the comparative values. The membership function is shown in Fig. 6.
5 Simulation Results
The simulation is carried out in MATLAB/SIMULINK with the rated parameter values listed in Table 3. The scope results obtained for the Hybrid Controller of Fig. 1 are as follows. The current and Mc1 output waveforms with rated motor parameters under different loads are:
Fig. 6 Membership function
The current waveforms for the different methods under different operating conditions are shown in the figures. Table 1 shows the improvement in predictive current control compared with the existing method. The ANFIS Controller makes the iq, id, and Mc1(k) waveforms settle quickly, which eliminates the distortions and reduces the settling time. The table compares the values of iq, id, and Mc1(k) when iq ref = 10 A, nt = 1500 rpm. Over 2 control periods, the proposed method takes less time than the existing method, so on these values the proposed method performs better than the existing method. Figures 7(A–I), 8(A, B), 9(A, B), and 10(A–C) show the simulation results, and the comparisons show the improvement in steady-state performance and accuracy.
6 Conclusion
The MATLAB/SIMULINK results for the Hybrid Controller based PMSM show better current-loop control through a reduced settling time. The steady-state errors are significantly reduced.
Fig. 7 q and d axes current waveforms at 5 A, 400 rpm. Panels (A)–(I): Id, Iq and Mc1(k) for iq ref = 5 A and 10 A at speeds of 400 rpm and 800 rpm, and Mc1(k) for iq ref = 10 A at nt = 1500 rpm
The ANFIS Controller uses a rule base supplied with appropriate rules. The simulation results are compared with those obtained using the Fuzzy algorithm. Fast, accurate, and stable current responses are achieved with the Hybrid Controller, and the ANFIS Controller plays the major role in reducing the steady-state errors. The novel flux observer, delay compensation, torque calculation, weighted compensation, and proportional-integral link shown in Fig. 1 all contribute to the accuracy of the output. Fuzzification and defuzzification based on the knowledge base and the rule base play the key role in the improved current predictive control of the Hybrid Controller. The conventional PI control, although widely used because of its simple structure, lacks accuracy. The Hybrid Controller meets the requirements and addresses the drawbacks of the earlier techniques.
Fig. 8 Comparison of traditional and new methods by using different weight coefficient, iabc current at different current and speeds. Panels: (A) Mc1(k) at 5 A, nt = 800 rpm; (B) id, iq and iabc at 400 rpm, 3 A and at 1500 rpm, 10 A for (a) traditional PI control and (b) the proposed method
Fig. 9 Various differentiation in between traditional and proposed method. Panels: (A) iabc at 3 A, 400 rpm in the proposed method; (B) iabc with traditional PI control
Fig. 10 Current waveforms and its comparisons between both proposed and traditional method. Panels: (A) id waveform with twice the rated inductance parameter; (B) iq v/s iq ref waveform in the proposed method; (C) iabc waveforms in the proposed method, with id and iq compared with id ref and iq ref under the flux observer
References
1. Kim, K.-H., Baik, I.-C., Moon, G.-W., Youn, M.-J. (eds.): A Current Control for a Permanent Magnet Synchronous Motor with a Simple Disturbance Estimation Scheme (September 1999)
2. Morel, F., Lin-shi, X.F., Retif, J.M., et al.: A comparative study of predictive current control schemes for a permanent-magnet synchronous machine drive. IEEE Trans. Industr. Electron. 56(7), 2715–2728 (2009)
3. Wang, Z., Yu, A., Li, X., Zhang, G., Xia, C.: A novel current predictive control based on fuzzy algorithm for PMSM. IEEE J. (2018). https://doi.org/10.1109/JESTPE.2019.2902634
4. Wang, G., Yang, M., Niu, L., Gui, X., Xu, D.: Improved predictive current control with static current error elimination for permanent magnet synchronous machine. In: IECON 2014, 40th Annual Conference of the IEEE Industrial Electronics Society, pp. 661–667, Dallas, TX (2014)
5. Tong, S., Sui, S., Li, Y.: Observed-based adaptive fuzzy tracking control for switched nonlinear systems with dead-zone. IEEE Trans. Cybern. 45(12), 2816–2826 (2015)
6. Mohamed, Y.A.R.I., El-Sadany, E.F.: A control scheme for PWM voltage-source distributed-generation inverters for fast load-voltage regulation and effective mitigation of unbalanced voltage disturbances. IEEE Trans. Industr. Electron. 55(5), 2072–2084 (2008)
7. Wang, H., Yang, M., Niu, L., Xu, D.: Improved deadbeat predictive current control strategy for permanent magnet motor drives. In: Proceedings of 6th IEEE Conference on Industrial Electronics and Applications, pp. 1260–1264 (2011)
8. Al-Aawar, N., Arkadan, A.R.A.: Optimal control strategy for hybrid electric vehicle powertrain. IEEE J. Emerg. Sel. Top. Power Electron. 3(2), 362–370 (2015)
9. Zeng, Q., Chang, L.: An advanced SVPWM-based predictive current controller for three-phase inverters in distributed generation systems. IEEE Trans. Industr. Electron. 55(3), 1235–1246 (2008)
10. Lemmens, J., Vanassche, P., Driesen, J.: PMSM drive current and voltage limiting as a constraint optimal control problem. IEEE J. Emerg. Sel. Top. Power Electron. 3(2), 326–338 (2015)
11. Li, N., Ming, Y., Dianguo, X.: An adaptive robust predictive current control for PMSM with online inductance identification. In: Proceedings of International Review of Electrical Engineering (IREE), vol. 7, no. 2, pp. 3845–3856 (2012)
12. Serrano-Iribarnegaray, L., Martinez-Roman, J.: A unified approach to the very fast torque control methods for DC and AC machines. IEEE Trans. Industr. Electron. 54(4), 2047–2056 (2007)
13. Kakosimos, P., Abu-Rub, H.: Deadbeat predictive control for PMSM drives with 3L NPC inverter accounting for saturation effects. IEEE J. Emerg. Sel. Top. Power Electron. (2018). https://doi.org/10.1109/JESTPE.2018.2796123
14. Fang, Y., Xing, Y.: Design and analysis of three-phase reversible high-power-factor correction based on predictive current controller. IEEE Trans. Industr. Electron. 55(12), 4391–4397 (2008)
15. Wipasuramonton, P., Zhu, Z.Q., Howe, D.: Predictive current control with current-error correction for PM brushless AC drives. IEEE Trans. Ind. Appl. 42(4), 1071–1079 (2006)
16. Moreno, J.C., Espi Huerta, J.M., Gil, R.G., Gonzalez, S.A.: A robust predictive current control for three-phase grid-connected inverters. IEEE Trans. Ind. Electron. 56(6), 1993–2004 (2009)
17. Niu, L., Yang, M., Xu, D.: An adaptive robust predictive current control for PMSM with online inductance identification. Proc. Int. Rev. Electr. Eng. (IREE) 7(2), 3845–3856 (2012)
18. Cai, X.B., Zhang, Z.B., Wang, J.X., Kennel, R.: Optimal control solutions for PMSM drives: a comparison study with experimental assessments. IEEE J. Emerg. Sel. Top. Power Electron. 6(1), 352–362 (2017)
19. Butt, C.B., Hoque, M.A., Rahman, M.A.: Simplified fuzzy logic-based MTPA speed control of IPMSM drive. IEEE Trans. Ind. Appl. 40(6), 1529–1533 (2004)
20. Uddin, M.N., Rahman, M.A.: Fuzzy logic based speed control of an IPM synchronous motor drive. IEEE Trans. Ind. Appl. 3(3), 1259–1264 (1999)
21. Uddin, M.N., Rahman, M.A.: High-speed control of IPMSM drives using improved fuzzy logic algorithms. IEEE Trans. Ind. Electron. 54(1), 190–199 (2007)
22. Nasiri, A.: Full digital current control of permanent magnet synchronous motors for vehicular applications. IEEE Trans. Veh. Technol. 56(4), 1531–1537 (2007)
23. Li, N., Ming, Y., Dianguo, X.: An adaptive robust predictive current control for PMSM with online inductance identification. Int. Rev. Electr. Eng. 7(2)
24. Yin, Z., Han, X., Du, C., Liu, J., Zhong, Y.: Research on model predictive current control for induction machine based on immune-optimized disturbance observer. IEEE J. Emerg. Sel. Top. Power Electron. (2018, March). https://doi.org/10.1109/JESTPE.2018.2820050
25. Li, K.: PID tuning for optimal closed-loop performance with specified gain and phase margins. IEEE Trans. Control Syst. Technol. 21(3), 1024–1030 (2013)
26. Kim, H., Degner, M.W., Guerrero, J.M., et al.: Discrete-time current regulator design for AC machine drives. IEEE Trans. Ind. Appl. 46(4), 1425–1435 (2010)
Parametric Analysis of Texture Classification Using Modified Weighted Probabilistic Neural Network (MWPNN) M. Subba Rao and B. Eswara Reddy
Abstract Texture classification is one of the classification methods in pattern recognition. In this research work, a Modified Weighted Probabilistic Neural Network (MWPNN) is proposed for classifying textures. It outperforms the previous method by adding inherent capabilities with respect to the weighting characteristic. The weights are modified with the help of the Sensitivity Analysis (SA) method. The MWPNN includes the Self-Organizing Map (SOM) neural network and weighting factors extracted from a supervised labelling process. The proposed approach is tested on sample textures, and the results are compared with the Probabilistic Neural Network (PNN) and the Weighted Probabilistic Neural Network (WPNN), and with benchmark machine learning algorithms such as the Naïve Bayes Classifier and the Multi-Layer Perceptron. The efficiency of the method is evaluated in terms of Mean, Standard Deviation, Mean Square Error (MSE), and Peak Signal to Noise Ratio (PSNR). The entire simulation is carried out in MATLAB with the help of the Image Processing and ANN toolboxes, along with the required auxiliary functions and blocksets.
Keywords Texture Classification · Probabilistic Neural Network · Weighted Probabilistic Neural Network · Self-Organizing Map · Machine Learning Algorithms and Performance Evaluation
M. Subba Rao (B) Research Scholar, JNTUA, Ananthapuramu and Professor of CSE, Annamacharya Institute of Technology and Sciences (Autonomous), Rajampet, Kadapa, A.P., India e-mail: [email protected]
B. Eswara Reddy Software Development Center, JNTUA, Anantapuramu, A.P., India
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2021 V. K. Gunjan and J. M. Zurada et al. (eds.), Modern Approaches in Machine Learning and Cognitive Science: A Walkthrough, Studies in Computational Intelligence 956, https://doi.org/10.1007/978-3-030-68291-0_37
1 Introduction
One of the most critical issues in pattern recognition is the description of texture patterns [1, 2]. Pattern recognition, which draws on mathematics, computing, artificial intelligence, computer science, psychology, physiology, and other fields, grew immensely in
the 1960s as an area of research. An approach is generally characterized by the type of training algorithm used to generate an output value. Multiple techniques for texture segmentation, texture classification, and texture synthesis have been established. The study of texture has a long tradition, but its use on real image data has been limited to date [3]. These limitations were mainly due to the inherent limitations of the available methods, which rely on various random characteristics of imaging technologies. In machine learning, the assignment of an output value to a given input value is made by a specific algorithm. Supervised learning means that the desired output is learned from a collection of training data [4]. There are three main approaches to pattern recognition: statistical, structural, and neural. Statistical pattern recognition has been used in the design of commercial recognition systems [5, 6]. In this approach a pattern is described by a set of d features and is viewed as a point in a d-dimensional feature space. The classes are separated by decision boundaries that the statistical distribution of the patterns creates in the feature space. The decision boundaries have to be chosen according to the methodology used, which depends on issues such as how the distribution is modelled through random variables of a stochastic process. Many principles of statistical decision theory are used to set the decision boundaries between different groups of patterns [7]. Broadly, the work is carried out in two phases: training first and classification second. Each class is described by a statistical model, namely the class-conditional probability density function P(x | Ci), the probability of the feature vector x given the class Ci. Structural pattern recognition [8–10] relies on syntactic grammars to differentiate data from various classes depending on the interrelationships found within the data.
The data provided for classification must be chosen with regard to parameters such as the brightness and saturation conditions under which the image was acquired. In the structural approach, a structural definition or representation describes each pattern type. There are two main methods in structural pattern recognition: syntax analysis and structure matching. Syntax analysis is based on formal language theory, and structure matching uses a mathematical technique based on sub-patterns. Time series analysis is a useful tool that helps verify the genuineness of the entire database so that duplication is avoided, and image data also play a significant role here. Because neural networks can learn from such training data sets [11], they are also attractive for this study. Designing a neural network is both a mathematical and a theoretical activity, since it relies on the error back-propagation algorithm. Many considerations in the design of a network stem from the developer's own observations, but with certain issues in mind, researchers can tune the back-propagation algorithm to obtain better results [10–14]. Mathematical computations must be carried out with the utmost precision to avoid last-minute errors that would make the algorithms faulty, which is undesirable in texture image applications. The wide experience of researchers shows that the efficiency of the back-propagation algorithm can be improved by attending to a few details, such as how the neurons in the network operate.
Parametric Analysis of Texture Classification Using Modified …
2 Probabilistic Neural Network
The Probabilistic Neural Network (PNN) [15, 16] is a feed-forward neural network derived from the Bayesian network and a statistical algorithm known as kernel Fisher discriminant analysis. Burrascano et al. [17] showed that the PNN is more time-efficient than conventional back-propagation networks, and it is acknowledged as an alternative for real-time classification problems. Within a PNN, the operations are organized into a multi-layer feed-forward network with four layers: the input layer, the hidden (pattern) layer, the summation layer, and the output layer [18–23]. The structure of the Probabilistic Neural Network is shown in Fig. 1.
2.1 Input Layer
Each neuron in the input layer represents one predictor variable. For a categorical variable with N categories, N − 1 neurons are used. The layer standardizes the range of values by subtracting the median and dividing by the interquartile range. The input neurons then feed the values to each of the neurons in the hidden layer.
Fig. 1 Structure of Probabilistic Neural Network (layers shown: input layer, hidden layer, summation layer, output layer)
2.2 Hidden Layer/Pattern Layer
This layer contains one neuron for each case in the training data set. The neuron stores the values of the predictor variables for the case, along with the target value. When presented with an input vector, a hidden neuron computes the Euclidean distance of the test case from the neuron's centre point and then applies the RBF kernel function using the sigma values.
2.3 Summation Layer
In a PNN, the summation layer holds one neuron for each category of the target variable. The actual target category of each training case is stored with the corresponding hidden neuron, and the weighted value leaving a hidden neuron is fed only to the summation neuron that matches the hidden neuron's category. Viewed end to end, when an input vector is presented, the first layer computes the distance from the input vector to each training input vector. This produces a vector whose elements indicate how close the input is to each training input. The second layer sums these contributions for each class of inputs and produces its net output as a vector of probabilities. Finally, a competitive transfer function on the output of the second layer picks the maximum of these probabilities and produces a 1 (positive identification) for that class and a 0 (negative identification) for non-target classes.
2.4 Output Layer
The output layer compares the weighted votes accumulated for each target category in the preceding layer and uses the largest vote to predict the target category.
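The four layers described above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the chapter's MATLAB implementation: the function name `pnn_classify`, the toy two-feature data, the class labels and the value of `sigma` are all hypothetical, and the per-class averaging is one common choice for the vote.

```python
import math

def pnn_classify(train_x, train_y, x, sigma=0.5):
    """Classify x with a basic PNN: Gaussian kernels summed per class.

    Hypothetical helper for illustration; sigma is an assumed smoothing value.
    """
    scores = {}
    counts = {}
    for xi, yi in zip(train_x, train_y):
        # Hidden (pattern) layer: squared Euclidean distance, then RBF kernel.
        d2 = sum((a - b) ** 2 for a, b in zip(x, xi))
        k = math.exp(-d2 / (2.0 * sigma ** 2))
        # Summation layer: accumulate kernel outputs for the case's own class.
        scores[yi] = scores.get(yi, 0.0) + k
        counts[yi] = counts.get(yi, 0) + 1
    # Average per class so classes with more training cases are not favoured.
    votes = {c: scores[c] / counts[c] for c in scores}
    # Output layer: the class with the largest vote wins.
    return max(votes, key=votes.get), votes

# Toy training set with two made-up texture classes.
train_x = [(0.0, 0.0), (0.1, 0.2), (1.0, 1.0), (0.9, 1.1)]
train_y = ["smooth", "smooth", "rough", "rough"]
label, votes = pnn_classify(train_x, train_y, (0.2, 0.1))
```

Because there is no iterative weight update, "training" amounts to storing the cases, which is why the PNN is described as a one-pass method.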
3 Weighted Probabilistic Neural Network
The WPNN structure is shown in Fig. 2. As in the regular PNN, the pattern layer holds K pools of pattern nodes. In the regular PNN, each pattern node is connected to only one summation node, which is why the weight between the pattern and summation layers is either 1 or 0. In the WPNN, however, the pattern layer nodes are connected to every node in the summation layer, so each pattern node contributes to each summation layer node. Otherwise, the structure of the pattern-summation layers is similar to that of the traditional PNN, except that the weights applied between the pattern and summation layers are taken from the soft label matrix M_Label.
Fig. 2 Block Diagram of WPNN
In this work, we propose an approach to classify different textures, the modified Weighted Probabilistic Neural Network (WPNN) [24–26]. For each pixel, the modified WPNN can produce image contrast predictions for different tissues, from the modelling process through to the final stage [27]. A sensitivity analysis method can be used to modify the weights. This capability is not usually available in standard neural network algorithms. The WPNN uses the reference vectors produced by the SOM neural network, together with weighting factors extracted from a supervised labelling mechanism that labels the SOM reference vectors according to the training data set identified by the experts. Figure 3 shows the block diagram of the proposed WPNN classifier. In the first stage, the input vectors are not sent to the SOM neural network and the human experts at the same time. The SOM neural network divides the input data into a set of subclasses, the number of which exceeds the number of final target classes. A manual labelling approach [28] may be adopted if there are few SOM reference vectors, and satisfactory results have been reported. When there are many reference vectors, however, manual labelling is time-consuming and becomes impractical. In addition, manual labelling is subjective, which limits its reproducibility. This stage generates the reference vectors (also called the "SOM map"), which are the input to the next stage. The Bayesian supervised labelling mechanism labels the reference vectors according to the training sets specified by the experts and also provides the weighting factors for the reference vectors. These weighting factors correspond to the probability that a reference vector belongs to each of the final target classes. In stage 2, the weighting factors of the reference vectors are used to approximate the probability density function that the WPNN uses to perform a probability-based classification of the results.
Fig. 3 Two stage flow diagram of proposed WPNN (blocks shown: Stage 1: Input data, SOM Neural Networks, SOM map; Stage 2: Input data, Labelling, Training Sets, Kernel Function Parameters, Weighted Factors, WPNN Probabilistic Classification, Output)
3.1 Supervised Labelling Mechanism
The labelling method introduced in this paper provides supervised, soft labelling of the SOM reference vectors, and it rests on the Bayesian methodology, which is the foundation of the suggested technique. The SOM reference vectors are labelled using the manually chosen training data set. Assume the number of final target classes is y and the number of SOM reference vectors is x, with y < x. Then the posterior probability [29] given by Bayes' theorem is shown in Eq. (1):
$$P(y \mid x) = \frac{p(x \mid y)}{\sum_{k} p(x \mid k)} \tag{1}$$
where p(x|k) is the conditional pdf of class k at pixel x.
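As a small worked illustration of Eq. (1), read here as a normalisation of class-conditional densities with equal class priors (an assumption, since the priors are not stated), the posterior for each class is its likelihood divided by the sum over all classes. The function name `posterior` and the numeric likelihoods are hypothetical.

```python
def posterior(likelihoods):
    """Normalise class-conditional likelihoods p(x|k) into posteriors P(k|x).

    Assumes equal class priors, so the priors cancel out of Bayes' theorem.
    """
    total = sum(likelihoods.values())
    if total == 0.0:
        raise ValueError("all likelihoods are zero; posterior is undefined")
    return {k: v / total for k, v in likelihoods.items()}

# Made-up likelihoods of one reference vector under two target classes.
soft_label = posterior({"class_a": 0.2, "class_b": 0.6})
```

The resulting values sum to one, which is what allows them to be used as soft labels (weights) rather than hard 0/1 assignments.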
3.2 WPNN Algorithm
The normal PNN is a three-layer feed-forward network consisting of an input layer, a pattern layer, and a summation layer. It is a kernel-based classifier. The input layer has M nodes that receive the input vector, and the pattern layer holds K pools of pattern nodes. The jth pool of the pattern layer is composed of N_j pattern nodes. Every node in the pattern layer is directly linked to every node in the input layer. The summation layer contains K nodes, one for each pool in the pattern layer. The pattern nodes of the jth pool are linked only to the jth node of the summation layer. By default, the pattern layer uses a radial basis function with a Gaussian activation function. Linear basis and activation functions are used in the summation layer. When there are N_j nodes in the pattern layer representing class K_j, the standard pdf estimate for K_j is given in Eq. (2):
$$p(x \mid K_j) = \frac{1}{N_j (2\pi)^{d/2} \sigma^{d}} \sum_{i=1}^{N_j} \exp\left(-\frac{\lVert x - \mu_{j,i} \rVert^{2}}{2\sigma^{2}}\right) \tag{2}$$
where d is the input dimension, μ_{j,i} is the mean vector of the ith pattern node in pool j, and σ is the smoothing factor. Using this estimate, the PNN performs probability-based classification and outputs posterior probability values according to Bayes' rule. A well-known benefit of PNNs is that they are trained in a fast one-pass procedure, which saves time. The accuracy of the estimated probability density function depends primarily on the smoothing factor σ: a small value gives a very spiky estimate that does not generalize, whereas a larger σ smooths out the fine detail. An optimal σ can be found by cross-validation, but in larger pattern analysis problems a single σ value may not suit all patterns. Moreover, a single σ for multidimensional data implies equal variance for every input variable, which does not always hold.
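The density estimate of Eq. (2) and the cross-validated choice of the smoothing factor can be sketched as follows. This is a hedged illustration only: `class_pdf` and `loo_accuracy` are hypothetical helper names, the one-dimensional data are made up, and leave-one-out is just one possible cross-validation scheme (the text does not say which one was used).

```python
import math

def class_pdf(x, patterns, sigma):
    """Eq. (2)-style Gaussian kernel estimate of p(x | K_j) for one class pool."""
    d = len(x)
    norm = len(patterns) * (2.0 * math.pi) ** (d / 2.0) * sigma ** d
    total = 0.0
    for mu in patterns:
        dist2 = sum((a - b) ** 2 for a, b in zip(x, mu))
        total += math.exp(-dist2 / (2.0 * sigma ** 2))
    return total / norm

def loo_accuracy(data, sigma):
    """Leave-one-out accuracy for a single shared sigma.

    data is a list of (vector, label) pairs; each sample is classified
    using all the other samples as pattern nodes.
    """
    correct = 0
    for i, (x, y) in enumerate(data):
        rest = data[:i] + data[i + 1:]
        best_label, best_p = None, -1.0
        for label in sorted({lab for _, lab in rest}):
            pool = [v for v, lab in rest if lab == label]
            p = class_pdf(x, pool, sigma)
            if p > best_p:
                best_label, best_p = label, p
        correct += (best_label == y)
    return correct / len(data)

# Made-up, well-separated 1-D samples for two classes.
data = [((0.0,), "a"), ((0.2,), "a"), ((0.4,), "a"),
        ((2.0,), "b"), ((2.2,), "b"), ((2.4,), "b")]
candidates = [0.01, 0.1, 0.3, 1.0]
best_sigma = max(candidates, key=lambda s: loo_accuracy(data, s))
```

With a very small sigma the kernels become so narrow that held-out samples receive almost no density from their own class, which is the "spiky" behaviour described above.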
3.3 Sensitivity Analysis
The sensitivity analysis (SA) procedure [30] is commonly used to evaluate the effect of the different inputs on the output of a neural network. It can then be used to select the relevant input attributes. The central idea of SA is to show how strongly the selected features influence the neural network output after the training process.
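One simple way to picture the idea is a perturbation-based sensitivity measure: nudge each input in turn and record how much the trained model's output changes. The sketch below is an assumption-laden illustration and not the SA procedure of [30], whose details are not given here; `sensitivities`, the step `eps` and the toy linear model are all hypothetical.

```python
def sensitivities(model, samples, eps=1e-3):
    """Mean absolute central-difference sensitivity of model output per input.

    model maps a list of floats to a float; samples is a list of input vectors.
    """
    n = len(samples[0])
    totals = [0.0] * n
    for x in samples:
        for i in range(n):
            hi = list(x)
            lo = list(x)
            hi[i] += eps
            lo[i] -= eps
            totals[i] += abs(model(hi) - model(lo)) / (2.0 * eps)
    return [t / len(samples) for t in totals]

# Toy "trained model": depends strongly on feature 0 and not at all on feature 1.
toy_model = lambda v: 3.0 * v[0] + 0.0 * v[1]
scores = sensitivities(toy_model, [[0.1, 0.5], [0.7, 0.2], [0.4, 0.9]])
```

Inputs with near-zero scores are candidates for removal, and the scores themselves could serve as relative weights, which is the role sensitivity analysis plays in the weighting discussion above.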
3.4 Benchmark Classifiers Used in the Comparison
Classification results are obtained using the proposed Modified Weighted Probabilistic Neural Network (MWPNN). The outputs are compared with benchmark classifiers such as the Naïve Bayes classifier and the Multi-Layer Perceptron (MLP), along with PNN and WPNN.
3.5 Multi-Layer Perceptron (MLP)
The MLP is a feed-forward neural network that is trained using back-propagation [31]. The network is made up of an input layer, one or more hidden layers, and an output layer. The number of hidden layers, the total number of neurons in the hidden layers, and the transfer functions of the MLP must be selected.
3.6 Naïve Bayes Classifier (NB)
The fundamental principle of NB is the conditional independence of attributes. It selects the class with the maximum probability given the attributes and assumes that the attributes are independent of one another. The method works well when the dataset is free of noise and outliers. Its mathematical expression is given in Eq. (3):
$$P(X \mid Y) = \frac{P(Y \mid X)\,P(X)}{P(Y)} \tag{3}$$
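To make Eq. (3) concrete, the sketch below reads X as the class and Y as the observed attributes (an interpretation, since the symbols are not defined in the text) and applies the conditional independence assumption by multiplying per-attribute likelihoods. The function name, class names and all probabilities are made up for illustration.

```python
def naive_bayes_posterior(priors, likelihoods, observed):
    """Posterior P(class | attributes) under conditional independence.

    priors:      {class: P(class)}
    likelihoods: {class: [ {value: P(value | class)} for each attribute ]}
    observed:    tuple with one observed value per attribute
    """
    joint = {}
    for c, prior in priors.items():
        p = prior
        for attr_index, value in enumerate(observed):
            # Independence assumption: multiply the per-attribute likelihoods.
            p *= likelihoods[c][attr_index][value]
        joint[c] = p
    evidence = sum(joint.values())  # P(Y), the denominator of Eq. (3)
    return {c: joint[c] / evidence for c in joint}

priors = {"rough": 0.5, "smooth": 0.5}
likelihoods = {
    "rough":  [{"high": 0.8, "low": 0.2}, {"dark": 0.6, "bright": 0.4}],
    "smooth": [{"high": 0.3, "low": 0.7}, {"dark": 0.5, "bright": 0.5}],
}
post = naive_bayes_posterior(priors, likelihoods, ("high", "dark"))
predicted = max(post, key=post.get)
```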
4 Results and Discussions
Texture images such as Cloth, Lymphocytic, Roof, Stone, Paper, Canvas Art, Red Christmas, Speaker Grill, Magnolia and Brodatz are considered in this work. The simulation was carried out in MATLAB using various toolboxes and block sets, with auxiliary functions supporting the simulation and computation of the methodologies. Sample texture images used for the simulation are shown in Fig. 4 in random order. Every texture image is subdivided into non-overlapping sub-images of 64 × 64 pixels, giving 64 non-overlapping sub-image textures per image. Four texture images of each category are considered for the experiments: three of them, i.e. 192 sub-image textures per category, form the training database, and the remaining 64 sub-image textures of the same category form the test database. The present paper measures the texture classification rate, mean, standard deviation, PSNR and MSE using machine learning classifiers such as the Multi-Layer Perceptron and the Naïve Bayes classifier, together with PNN, WPNN and the proposed method.
Fig. 4 Sample texture images (panels: Cloth Texture, Stone, Lymphocytic, Paper, Roof Texture, Canvas art, Brodatz, Speaker grill)
Table 1 lists the parametric values for each texture, and Fig. 5 shows the corresponding graphical representation of the MSE, PSNR, standard deviation and mean values. A graphical comparison of each parameter is also shown in Figs. 6, 7, 8 and 9. Across the different texture images, the MSE shows a clear improvement, i.e. lower values, for MWPNN compared with WPNN, which is the desired outcome. As the error decreases, classification becomes easier. The PSNR values for MWPNN are markedly higher, which indicates that noise has been suppressed, leading to better perceived image quality. The classification performance on the textures using the benchmark classifiers NB and MLP and the PNN variants, namely WPNN and the proposed MWPNN, is shown in Table 2, with a graphical representation in Fig. 10. The results show that MWPNN is the best of these classification methods. As stated above, the original image is classified using PNN, WPNN and MWPNN. On visual inspection, the enhancement is clearly better with WPNN than with the other methods [32].
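The data preparation and the reported image statistics can be sketched as below. The sketch assumes 512 × 512 source images (which is what 64 tiles of 64 × 64 pixels implies, though the image size is not stated), uses the textbook PSNR definition for 8-bit images, and runs on a synthetic image; all function names are hypothetical and no claim is made that this reproduces the MATLAB computations or the values in Table 1.

```python
import math

def tile(image, size=64):
    """Split a 2-D list of pixels into non-overlapping size x size tiles."""
    rows, cols = len(image), len(image[0])
    tiles = []
    for r in range(0, rows - size + 1, size):
        for c in range(0, cols - size + 1, size):
            tiles.append([row[c:c + size] for row in image[r:r + size]])
    return tiles

def mean_std(img):
    """Mean and (population) standard deviation of all pixel values."""
    values = [p for row in img for p in row]
    m = sum(values) / len(values)
    var = sum((p - m) ** 2 for p in values) / len(values)
    return m, math.sqrt(var)

def mse(a, b):
    """Mean squared error between two equally sized images."""
    n = len(a) * len(a[0])
    return sum((p - q) ** 2 for ra, rb in zip(a, b) for p, q in zip(ra, rb)) / n

def psnr(mse_value, peak=255.0):
    """Peak signal-to-noise ratio in dB for a given MSE (8-bit peak assumed)."""
    if mse_value == 0:
        return math.inf
    return 10.0 * math.log10(peak ** 2 / mse_value)

# Synthetic stand-in for one 512 x 512 texture image (assumed size).
image = [[(r * c) % 256 for c in range(512)] for r in range(512)]
tiles = tile(image)

# Four images per category: three for training, one for testing.
category_tiles = [t for _ in range(4) for t in tile(image)]
train, test = category_tiles[:192], category_tiles[192:]
```

The split mirrors the counts in the text: 4 × 64 = 256 sub-images per category, of which 192 go to training and 64 to testing.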
5 Conclusion
In this paper, PNN, WPNN and the proposed method have been applied and simulated on a particular set of texture images. This first approach to classification has shown a good response in terms of both human vision and graphical representation. Judged visually, the MWPNN method shows better enhancement than the existing PNN procedure. From this first trial, we can say that MWPNN can be used in many pattern classification problems involving texture images that contain very fine and precise data. The
Table 1 Comparing the Results of PNN, WPNN and MWPNN
Type of texture | PNN MSE | PNN PSNR | PNN STD | PNN MEAN | WPNN MSE | WPNN PSNR | WPNN STD | WPNN MEAN | MWPNN MSE | MWPNN PSNR | MWPNN STD | MWPNN MEAN
Cloth | 13.949 | 36.719 | 27.1229 | 72.957 | 9.369 | 38.4474 | 21.82 | 74.637 | 1.5048 | 72.5314 | 33.8197 | 38.7958
Lymphocytic | 6.648 | 39.9379 | 38.1703 | 105.264 | 3.1388 | 43.1971 | 41.37 | 105.28 | 1.0392 | 75.2103 | 58.1443 | 78.1491
Roof | 2.7092 | 43.8364 | 18.4842 | 82.0611 | 1.8865 | 45.4082 | 19.29 | 81.87 | 1.5482 | 72.3076 | 24.7061 | 45.5453
Stone | 16.2147 | 36.0657 | 26.3314 | 115.767 | 1.84 | 45.5166 | 30.14 | 116.25 | 1.0514 | 75.133 | 42.927 | 93.3329
Paper | 5.378 | 40.8586 | 31.7455 | 98.9704 | 1.673 | 45.9299 | 33.65 | 98.734 | 1.3947 | 73.1168 | 41.3503 | 71.7197
Canvas art | 5.8028 | 40.528 | 37.896 | 67.178 | 2.4205 | 44.325 | 40.34 | 66.904 | 1.0221 | 74.093 | 39.893 | 38.4123
Red Christmas | 1.5758 | 46.1899 | 31.557 | 63.4587 | 1.4765 | 46.4723 | 32 | 68.177 | 1.4651 | 72.7399 | 41.465 | 29.0674
Speaker grill | 18.305 | 35.5392 | 27.891 | 66.648 | 2.3491 | 44.4558 | 31.36 | 67.4 | 1.4384 | 72.8813 | 35.335 | 32.3472
Magnolia | 2.9395 | 43.482 | 54.523 | 42.317 | 2.2414 | 44.659 | 57.38 | 37.245 | 0.6107 | 83.722 | 59.616 | 25.1264
Brodatz | 12.3667 | 37.282 | 41.102 | 84.0632 | 10.5085 | 37.949 | 43.16 | 83.827 | 1.2126 | 74.147 | 48.2067 | 56.826
Fig. 5 Graphical representation of MSE, PSNR, STD and MEAN Values for PNN, WPNN and MWPNN
Fig. 6 Graphical comparisons of MSE values
proposed methodology, applied to the Brodatz texture, achieved a classification rate of 99.1% on the texture images considered. Thus, in comparison, the proposed scheme yielded the best classification rate under unsupervised classification. The comparison is carried out with respect to well-defined parameters and the corresponding graphical representations.
Fig. 7 Graphical comparisons of STD values
Fig. 8 Graphical comparisons of PSNR values
Fig. 9 Graphical comparisons of MEAN values
Table 2 Classification performance of textures using benchmark classifiers NB, MLP and variants of PNN such as WPNN and proposed method MWPNN (classification rates, accuracy in %)
Name of texture | Naïve Bayes | MLP | PNN | WPNN | MWPNN
Cloth | 56.48 | 87.23 | 90.6 | 94.32 | 94.99
Lymphocytic | 57.38 | 88.9 | 89.72 | 92.34 | 93.56
Roof | 56.98 | 90.11 | 92.56 | 95.56 | 96.54
Stone | 60.55 | 86.89 | 91.23 | 92.34 | 94.99
Paper | 61.56 | 88.23 | 91.67 | 94.78 | 95.45
Canvas art | 54.67 | 79.12 | 88.45 | 90.06 | 92.55
Red Christmas | 54.3 | 77.78 | 84.23 | 89.23 | 90.22
Speaker grill | 59.45 | 82.16 | 84.99 | 90.15 | 92.42
Magnolia | 53.78 | 84.56 | 92.45 | 94.56 | 96.45
Brodatz | 62.67 | 91.12 | 96.32 | 98.72 | 99.1
Fig. 10 Graphical representation of accuracy of different textures using NB, MLP, PNN, WPNN and MWPNN
Acknowledgements This research is carried out under the support of JNTUA University, Anantapur and also from employer institution AITS (Autonomous), Rajampet where research facilities are provided.
References
1. Charalampidis, D., Kasparis, T.: Wavelet-based rotational invariant roughness features for texture classification and segmentation. IEEE Trans. Image Process. 11, 825–837 (2002)
2. Giuliano, V.E.: How we find patterns. Int. J. Sci. Technol. 42–49 (1967)
3. Sorwar, G., Abraham, A.: DCT based texture classification using a soft computing approach. Malaysian J. Comput. Sci. 17(1), 13–23 (2004)
4. https://www.diesel-ebooks.com/item/9780470845134/Webb-Andrew-R.-Statistical-PatternRecognition/1.html
5. Subba Rao, M., Eswar Reddy, B.: An overview of pattern recognition methods on texture classification. CiiT Int. J. Artif. Intell. Syst. Machine Learn. (2011). Print: ISSN 0974–9667. Online: ISSN 0974–9543
6. Subba Rao, M., Eswar Reddy, B.: Comparative analysis of pattern recognition methods: an overview. Indian J. Comput. Sci. Eng. (IJCSE) 2(3) (2011). ISSN 0976–516
7. Jain, A.K., Duin, R.P.W., Mao, J.: Statistical pattern recognition: a review. IEEE Trans. Pattern Anal. Mach. Intell. 22(1) (2000)
8. Fu, K.S.: Syntactic pattern recognition and applications. Prentice Hall, Englewood Cliffs, New Jersey (1982)
9. Gonzalez, R.C., Thomason, M.G.: Syntactic pattern recognition: an introduction. Addison Wesley, Reading, Massachusetts (1978)
10. Pavlidis, T.: Structural pattern recognition. Springer, Berlin
11. Kulkarni, A.: Artificial neural networks for image understanding. Van Nostrand Reinhold, New York (1994)
12. Economou, K., Lymberopoulos, D.: A new perspective in learning pattern generation for teaching neural networks. Neural Netw. 12(4–5), 767–775 (1999)
13. Mizutani, E., Demmel, J.W.: On structure-exploiting trust region regularized nonlinear least squares algorithms for neural network learning. Neural Netw. 16(5–6) (2003, June–July)
14. Kamarthi, S.V., Pittner, S.: Accelerating neural network training using weight extrapolation. Neural Netw. 9, 1285–1299 (1999)
15. Specht, D.F.: Probabilistic neural networks for classification, mapping, or associative memory. In: Proceedings of IEEE International Conference on Neural Networks, vol. 1, IEEE Press, New York, pp. 525–532 (1988)
16. Othman, M.F., Basri, M.A.M.: Probabilistic neural network for brain tumor classification. In: 2011 Second International Conference on IEEE Transaction on Intelligent system, Modelling and Simulation (ISMS) (2011)
17. Burrascano, P., Cardelli, E., Faba, A., Fiori, S., Massinelli, A.: Application of probabilistic neural networks to eddy current non destructive test problems. In: Proceedings of 7th International Conference on Engineering Applications of Neural Networks, Cagliari (2010)
18. El Emary, I.M.M., Ramakrishnan, S.: On the application of various probabilistic neural networks in solving different pattern classification problems. World Appl. Sci. J. (2008)
19. Ueda, N.: Optimal linear combination of neural networks for improving classification performance. IEEE Trans. Pattern Anal. Mach. Intell. 22(2) (2000)
20. Schmidt, W.A.C., Davis, J.P.: Pattern recognition properties of various feature spaces for higher order neural networks. IEEE Trans. Pattern Anal. Mach. Intell. 15(8) (1993)
21. Gazula, S., Kabuka, M.R.: Design of supervised classifiers using Boolean neural networks. IEEE Trans. Pattern Anal. Mach. Intell. 17(12) (1995)
22. Ney, H.: On the probabilistic interpretation of neural network classifiers and discriminative training criteria. IEEE Trans. Pattern Anal. Mach. Intell. 17(2) (1995)
23. Kusy, M., Kluska, J.: Assessment of prediction ability for reduced probabilistic neural network in data classification problem. Soft Comput. 21(1), 199–212 (2017)
24. Haralick, R., Bosley, R.: Texture feature for image classification. In: Third ERTS Symposium, vol. 1, issue no. 24–25, 7–12 (1973)
25. Subba Rao, M., Kiranmayee, M.: Texture classification using weighted probabilistic neural networks. In: International Conference on Computer Science and Information Technology held at Pune by IRNet on 14 July 2012
26. Kusy, M., Kowalski, P.A.: Modification of the probabilistic neural network with the use of sensitivity analysis procedure. In: 2016 Federated Conference on Computer Science and Information Systems (FedCSIS), IEEE, pp. 97–103 (2016)
27. Song, T., Jamshidi, M., Lee, R.R., Huang, M.: A modified probabilistic neural network for partial volume segmentation in brain MR image. IEEE Trans. Neural Netw. 18(5), 1424–1432 (2007)
28. Harring, S., Viergever, M.A.: A multiscale approach to image segmentation using Kohonen networks. Tech. Report RUU-CS-93–06, Utrecht University (1993)
29. Song, T., Jamshidi, M., Lee, R.R., Huang, M.: A novel weighted probabilistic neural network for MR image segmentation. In: 2005 IEEE International Conference on Systems, Man and Cybernetics, vol. 3, pp. 2501–2506
30. Kusy, M., Kowalski, P.A.: Weighted probabilistic neural networks. Inf. Sci. (2018, February 27)
31. Rumelhart, D.E., McClelland, J.L., PDP Research Group (eds.): Parallel distributed processing: explorations in the microstructure of cognition, vol. 1. Foundations, MIT Press, Cambridge, MA, USA (1986)
32. Subba Rao, M.: Probabilistic neural networks for brain tumor classification. In: National Conference on Technological Developments in Power Engineering held at Vignan University, Guntur on 17–18 Aug 2012
Modeling and Simulation of Automatic Centralized Micro Grid Controller M. Padma Lalitha, J. Jayakrishna, and P. Suresh Babu
Abstract This paper presents comprehensive modeling of a hybrid AC/low-voltage DC (LVDC) micro grid based on an automatic centralized micro-grid controller (ACMC), able to operate on-grid and off-grid under a fuzzy logic controller. The LVDC and AC networks have separate feeders with loads attached at different voltages. Using a bidirectional AC–DC–AC converter, large AC and LVDC networks can be interconnected to control the active power (P) and reactive power (Q) from the sources according to the load requirement of the LVDC network, and to control the voltage. The proposed ACMC was implemented on a test system equipped with a bidirectional converter and with LVDC and AC networks with radial distribution. This gives the system a plug-and-play feature. A doubly fed induction generator based wind energy converter and a solar PV array with maximum power point tracking were used as sources. The system was simulated in MATLAB Simulink. The results demonstrate that the ACMC effectively performs four-quadrant control of P and Q in the system under different conditions.
M. P. Lalitha (B) · J. Jayakrishna (B) · P. S. Babu
Department of Electrical and Electronics Engineering, AITS, 516126 Rajampet, Boyanapalli, India
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2021. V. K. Gunjan and J. M. Zurada et al. (eds.), Modern Approaches in Machine Learning and Cognitive Science: A Walkthrough, Studies in Computational Intelligence 956, https://doi.org/10.1007/978-3-030-68291-0_38
1 Introduction
Renewable energy systems offer a technology that is cleaner and able to meet the growing electricity demands of grid-connected and isolated communities. Micro grids have attracted considerable attention from the scientific community in recent years as a potential alternative to conventional energy systems. The paradigm shift towards sustainable power sources is driven by uncertainty resulting from the fast depletion of conventional fossil fuels. A large share of these renewable sources inherently produce DC power, which must be conditioned appropriately to serve standard AC loads. At the same time, the rapid development of electronic devices over the last decade has increased the number of appliances that run on DC power. This has renewed interest in developing DC power systems within the conventional system, and the need has therefore emerged for several stages of conversion between AC and DC. Extensive work has made DC-to-DC converters more efficient than AC–DC converters. Such appliances are fitted with adapters to convert the supplied power to DC. Appliances have also been designed with DC drives because of the simplicity of speed control. This involves extra AC–DC conversion stages, which reduce efficiency because of conversion losses. Small-scale domestic deployment has benefited from the low cost of renewables. AC micro grids convert the DC input to AC and then supply the AC loads. For DC loads, an additional conversion of the already converted power is needed, which leads to redundant power conversions. With AC loads present, these redundant conversions remain a problem. The idea of a hybrid AC–DC micro grid has therefore been developed to address the multiple-conversion issue. The proposed solution uses separate AC and DC buses coupled through a bidirectional converter. The design of a bidirectional converter with two different types of control scheme has also been discussed. For power management of a stand-alone micro grid with hybrid sources, a microcontroller-based solution has been suggested. A new analytical technique with a consistent model has been developed for the reliability evaluation of renewable sources in a PV-wind-storage system.
M. P. Lalitha et al.
LVDC distribution removes several of these conversion stages. DC power is more efficient than AC power for household appliances. Household devices mainly operate at voltage levels of 12, 24 or 48 V, i.e. at no more than 50 V. The LVDC concept can likewise be applied to office buildings, and its feasibility there has been examined. The performance of different equipment was measured for various cable sizes and voltage ranges in both DC and AC networks. The analysis covered voltage levels of 326, 230, 120 and 48 V. It found that 326 V is better suited to an office setting than 48 V, because the key requirement for an office setting is reliability. Losses in the office environment were shown to be higher at 48 V than at the other voltage levels. In contrast, because safety is the major concern in a home setting, 120 V can be a better voltage level. Furthermore, 48 V can be a good choice depending on the cable size requirement of the house, as most telecommunication equipment and home appliances are rated at that level. Based on the literature survey conducted, the following grey areas were identified in the current literature:
Modeling and Simulation of Automatic Centralized Micro Grid …
• A scheme for isolated DC micro grids addressed the problem of increased DC penetration, yet the presence of AC sources raises the issue of redundant conversions.
• A hybrid AC–DC micro grid was proposed to address the issue of redundant conversions. However, because of the rapid penetration of renewable sources and the growth of LVDC systems in the DC part of the micro grid, the question of redundant conversions re-emerges, along with the need for an entirely different system regulation.
• This paper models an AC/LVDC hybrid configuration consisting of an AC network linked to an LVDC network. The proposed micro grid design does not limit the number of AC or LVDC buses.
• The two networks are interconnected by a bidirectional power converter, which is responsible for active and reactive power transfer in all four quadrants.
• The LVDC network is composed of 326, 230, 120 and 48 V feeders that supply different customer segments.
• The AC network generation uses a doubly fed induction generator (DFIG), and the DC network generation uses a solar photovoltaic array with maximum power point tracking (MPPT).
In this paper, the off-grid operation of the system is simulated.
2 Structure Modeling
This section describes the structure of the proposed system and the modeling of the renewable sources present in the network. A doubly fed induction generator and a solar PV array are the renewable sources considered for the proposed test system, and their modeling aspects are discussed accordingly. A schematic representation of a standard hybrid AC–DC micro grid is shown in Fig. 1. The micro grid comprises independent DC and AC buses. A bidirectional converter, which is responsible for the power flow between the two buses, interconnects them. Doubly fed induction generators are connected to the AC bus, whereas sources such as fuel cells and PV arrays are connected to the DC bus.
Fig. 1 Standard AC–DC hybrid micro grid
Fig. 2 Bi-directional converter block diagram
The AC network is connected to the micro-grid AC bus via a breaker. Figure 2 displays the proposed model of the hybrid AC/LVDC micro grid. The AC part of the micro grid can accommodate an AC network of n buses. During micro grid operation, the number of buses on either network may increase or decrease. The DFIG is modeled with an AC–DC–AC converter on the rotor, which is responsible for the machine's reactive power control. The system is also fitted with a pitch control mechanism to control active power. The PV array is designed with an MPPT mechanism; the MPPT technique based on perturb and observe (P&O) was implemented in this paper. This paper proposes an ACMC, whose modeling is explained in depth in later sections. The ACMC is responsible for the main control of all generation in the system, and the controller manages changes to the network structure. A 250 kVA bidirectional converter that interconnects both networks has been modeled. The ACMC senses load voltages and currents as well as source voltages and currents and executes the computations for control and monitoring.
2.1 Modeling of the System's Sources
The DFIG-based wind generator and the photovoltaic system are modeled using the conventional approach. The photovoltaic system was modeled using the single-diode model of the photovoltaic cell. The electrical part of the doubly fed induction generator was represented by a fifth-order model. The mechanical part was represented by a single-mass lumped model. The output of the mechanical subsystem serves as the input to the electrical subsystem. The electrical subsystem was modeled in the d-q axis frame.
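Modeling in the d-q frame relies on transforming three-phase quantities into two rotating-frame components. The sketch below shows one common convention, the amplitude-invariant Park transform with the d axis aligned to phase a, together with the corresponding active and reactive power expressions. The convention, the function names and the test signals are assumptions made for illustration; the chapter does not state which form was used in its Simulink model.

```python
import math

TWO_PI_3 = 2.0 * math.pi / 3.0

def abc_to_dq(a, b, c, theta):
    """Amplitude-invariant Park transform (d axis aligned with phase a at theta = 0)."""
    d = (2.0 / 3.0) * (a * math.cos(theta)
                       + b * math.cos(theta - TWO_PI_3)
                       + c * math.cos(theta + TWO_PI_3))
    q = -(2.0 / 3.0) * (a * math.sin(theta)
                        + b * math.sin(theta - TWO_PI_3)
                        + c * math.sin(theta + TWO_PI_3))
    return d, q

def dq_powers(v_d, v_q, i_d, i_q):
    """Active and reactive power from d-q voltages and currents (this convention)."""
    p = 1.5 * (v_d * i_d + v_q * i_q)
    q = 1.5 * (v_q * i_d - v_d * i_q)
    return p, q

def balanced(theta, amplitude=1.0, lag=0.0):
    """Balanced three-phase cosine set, optionally lagging by `lag` radians."""
    return tuple(amplitude * math.cos(theta - lag - k * TWO_PI_3) for k in (0, 1, -1))

theta = 0.7  # arbitrary electrical angle in radians
v_d, v_q = abc_to_dq(*balanced(theta), theta)
i_d, i_q = abc_to_dq(*balanced(theta, amplitude=2.0, lag=math.pi / 6), theta)
p, q = dq_powers(v_d, v_q, i_d, i_q)
```

In this frame a balanced sinusoidal set becomes a pair of constants, and with the voltage on the d axis the active power depends only on i_d and the reactive power only on i_q, which is what makes independent P and Q control loops possible.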
Modeling and Simulation of Automatic Centralized Micro Grid …
3 Modeling of Converters
The system consists of three converter types. A boost converter is connected to the photovoltaic array to track the maximum power point (MPP): the terminal voltage is adjusted by continuously examining the operating point on the power versus module voltage curve. The DFIG rotor circuit uses an AC–DC–AC converter. A four-quadrant bidirectional converter sits between the DC and AC networks of the micro grid; it regulates the active and reactive power flows and manages the DC-link voltage of the micro grid. The following subsections briefly describe how these converters are modeled.
3.1 Boost Converter Modeling
The converter is modeled with an averaged state-space model. It is used to implement the P&O-based MPPT algorithm.
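As an illustration of what an averaged state-space model of a boost converter looks like, the sketch below integrates the textbook averaged equations for an ideal boost converter in continuous conduction, L di/dt = V_in − (1 − d) v and C dv/dt = (1 − d) i − v/R. The component values, the resistive load and the forward-Euler integration are assumptions for the example; the paper does not list its converter parameters.

```python
def simulate_boost(v_in=48.0, duty=0.5, L=1e-3, C=1e-3, R=10.0,
                   dt=1e-5, t_end=0.5):
    """Forward-Euler integration of the averaged boost converter model.

    States: inductor current i_l and capacitor (output) voltage v_c.
    All parameter values are illustrative assumptions.
    """
    i_l, v_c = 0.0, 0.0
    steps = int(round(t_end / dt))
    for _ in range(steps):
        di = (v_in - (1.0 - duty) * v_c) / L        # L di/dt = Vin - (1-d) v
        dv = ((1.0 - duty) * i_l - v_c / R) / C     # C dv/dt = (1-d) i - v/R
        i_l += di * dt
        v_c += dv * dt
    return i_l, v_c


i_final, v_final = simulate_boost()
# The ideal steady state is v = Vin / (1 - d) and i = v / (R (1 - d)).
```

In an MPPT setting the duty ratio `duty` would be the quantity perturbed by the P&O loop, since it sets the voltage seen by the PV array.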
3.2 DFIG Controller Modeling
Numerous control methods have been explored for the DFIG. The DFIG is equipped with pitch control and a rotor-side converter. The pitch angle is measured continuously and compared with its reference value, and the deviation is fed to a properly tuned proportional-integral (PI) controller. The rotor of the machine is coupled to the system through an AC–DC–AC converter. Because the system is modeled in the direct-quadrature (d-q) reference frame, controller modeling becomes simple: the controllers have a decoupled design, which allows independent control of the active and reactive powers.
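A discrete PI controller of the kind described for the pitch loop can be sketched as follows. The class name, the gains, the sample time and the optional output limits are hypothetical; the paper does not give its tuning.

```python
class PIController:
    """Discrete proportional-integral controller (illustrative sketch)."""

    def __init__(self, kp, ki, dt, out_min=None, out_max=None):
        self.kp = kp
        self.ki = ki
        self.dt = dt
        self.out_min = out_min
        self.out_max = out_max
        self.integral = 0.0

    def step(self, reference, measurement):
        """Return the control output for one sample of the deviation."""
        error = reference - measurement
        self.integral += error * self.dt          # rectangular integration
        out = self.kp * error + self.ki * self.integral
        if self.out_max is not None:
            out = min(out, self.out_max)
        if self.out_min is not None:
            out = max(out, self.out_min)
        return out


pitch_pi = PIController(kp=2.0, ki=1.0, dt=0.1)
first = pitch_pi.step(reference=1.0, measurement=0.0)
second = pitch_pi.step(reference=1.0, measurement=0.0)
```

The same structure would serve for the decoupled d-axis and q-axis current loops, each with its own gains.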
3.3 Modeling of Bi-directional Converter
The main four-quadrant converter connects the DC and AC networks of the hybrid micro grid. Its key functions are:
• Convert power between DC and AC to enable the exchange of power between the networks as required.
• Maintain a constant micro grid DC-link voltage.
The converter has been modeled in the direct-quadrature reference frame to allow the design of active and reactive power control loops. The converter block diagram is shown in Fig. 2. The two main control loops of the system are briefly described below.
3.3.1 Active and Reactive Power Loop Control
The active and reactive powers are computed in the direct-quadrature reference frame after a power-invariant transformation, since this allows the two axes to be decoupled and controlled individually. The decoupled equations show that the q-axis current controls the active power and the d-axis current controls the reactive power. The two loops are identical in structure.
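The decoupling can be made concrete with a short sketch. It assumes a power-invariant transformation, so that the complex power is S = v · conj(i) with v = v_d + j v_q and i = i_d + j i_q, and it assumes the sign convention Q = v_q i_d − v_d i_q; other texts use the opposite sign for Q. The function name and the numeric values are illustrative.

```python
def dq_power(v_d, v_q, i_d, i_q):
    """Active and reactive power from d-q quantities (power-invariant form).

    P = v_d*i_d + v_q*i_q, Q = v_q*i_d - v_d*i_q (sign convention assumed).
    """
    s = complex(v_d, v_q) * complex(i_d, i_q).conjugate()
    return s.real, s.imag


# With the voltage vector aligned to the q-axis (v_d = 0, an assumption that
# matches the statement above), P depends only on i_q and Q only on i_d.
p, q = dq_power(v_d=0.0, v_q=400.0, i_d=5.0, i_q=10.0)
```

Under that alignment P = v_q i_q and Q = v_q i_d, which is why two structurally identical current loops can regulate the two powers independently.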
3.3.2 DC-Link Voltage Loop
The outer voltage loop is configured to regulate the DC-link voltage to a reference value. The converter is operated with feedback to ensure that the nominal bus voltages are maintained under all conditions.
4 Modeling of ACMC
The aim of the ACMC is to provide auxiliary control, i.e. coordinated control and monitoring of all functions of the micro grid. The designed micro grid also includes local control units. The principal roles of the ACMC are as follows:
• To provide the active and reactive power reference values for the bidirectional converter.
• To monitor and control the power flow and scheduling needed in the micro grid through appropriate control of the converters.
• To control and monitor the LVDC link voltages.
• To track the plug and play of load and generator buses in every section of the micro grid network.
The ACMC processes the source and load currents and the bus voltages measured at each point in the network. The amount of power supplied by the generators, which may be on either the AC or the DC network, depends on the specifications. Once the schedule has been prepared, the AC and DC generation sources are controlled so that the power balance criterion is met. Once the generation meets the load demand, the generator dispatch is updated according to the economic analysis algorithm. In practice, the control is accomplished by changing the number of cells connected in series and in parallel in the photovoltaic array to change the generation (Fig. 3).
Fig. 3 Flow chart for power generation control
For the DFIG, the ACMC signals are transmitted to the active power control unit of the machine, which adjusts the active power reference as needed. When the load exceeds the micro grid capacity, load is shed. Any connection of a new generator bus or load bus is accompanied by an entry in the ACMC, which records the changed network structure in its database. The algorithms are described below.
4.1 Algorithm for Load Shedding
Step 1: Collect and store the critical load data on both the AC and DC networks.
Step 2: Under power imbalance, keep the critical loads connected.
Step 3: Assign the load priorities according to the criticality of the connected loads.
Step 4: Disconnect the load with the lowest priority.
Step 5: If the generated power equals the load demand, go to Step 6; otherwise go to Step 3.
Step 6: End.
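The load shedding steps can be sketched in Python as below. The `Load` record, the convention that a larger `priority` number means a less important load, and the stopping test "demand no greater than generation" (used here in place of strict equality, which discrete loads rarely achieve) are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Load:
    name: str
    power_kw: float
    priority: int          # larger number = shed earlier (assumed convention)
    critical: bool = False


def shed_loads(loads, generation_kw):
    """Disconnect the lowest-priority non-critical loads until demand fits generation."""
    connected = list(loads)
    shed = []
    while sum(l.power_kw for l in connected) > generation_kw:
        candidates = [l for l in connected if not l.critical]
        if not candidates:
            break  # only critical loads remain; they stay connected
        victim = max(candidates, key=lambda l: l.priority)
        connected.remove(victim)
        shed.append(victim)
    return connected, shed


loads = [
    Load("hospital", 200.0, priority=0, critical=True),
    Load("office", 150.0, priority=1),
    Load("lighting", 100.0, priority=2),
    Load("ev_charger", 80.0, priority=3),
]
connected, shed = shed_loads(loads, generation_kw=400.0)
```

In this example the 530 kW demand is reduced to 350 kW by shedding the two lowest-priority loads, while the critical load is never considered for disconnection.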
4.2 Algorithm for Plug and Play
Step 1: Measure and store the equivalent impedance (Thevenin resistance) of the AC and DC networks separately.
Step 2: Detect the addition or removal of any bus in the network from the change in equivalent impedance.
Step 3: Detect the addition or removal of a feeder from the voltage level.
Step 4: Update the network record with the modifications.
Step 5: Go to Step 1 and repeat the procedure.
The load priorities are identified and listed by the user before operation. The ACMC is designed to execute these algorithms, to operate the four-quadrant converter in the micro grid and to regulate the DC-link bus voltage in the network.
5 Case Study
A test system was developed to simulate and evaluate the proposed hybrid micro grid. The AC network has 7 buses and the DC network has 5 buses, both with radial distribution. The layout of the system considered for the case study is presented in Fig. 4. The bus and element parameters used to model the DC and AC networks are given in Tables 1 and 2. The total generation capacity is 1250 kW on the AC side and 500 kW on the DC side. The bidirectional converter is rated at 250 kVA. The voltage levels are chosen to meet customer needs. On the AC side, the single-phase supply provides 230 V for home loads, the three-phase supply provides 415 V, and the 11 kV supply serves the industrial loads. On the DC side, the voltage levels are planned for different utilities: the 120 V feeder is intended for commercial and office requirements, while 48 V is suited to domestic requirements.
6 Fuzzy Logic Controller
Fuzzy logic is a specific field of study within artificial intelligence. A fuzzy logic controller (FLC) is a method that allows nonlinear controllers to be constructed
Fig. 4 Standard structure used for case study
Table 1 DC network parameter
S. No.   Element description    Parameter values
1        Bus 1                  120
2        Bus 2                  48
3        Bus 3                  120
4        Bus 4                  326
5        Bus 5                  690
6        PV cluster 1           250 kW, 48 V
7        PV cluster 2           250 kW, 120 V
8        Converter 1            400 V/120 V, 500 W
9        Converter 2            400 V/48 V, 1 kW
10       Converter 3            400 V/120 V, 1 kW
11       Converter 4            400 V/326 V, 500 W
12       Converter 5            400 V/230 V, 500 W
13       DC-link bus voltage    400 V
from analytical information, as shown in the block diagram. The fuzzification block processes the input signals and assigns them fuzzy values. The rule set provides a linguistic description of the control variables and is based on knowledge of the process. The inference mechanism interprets the data, taking into account the rules and their membership functions. The purpose of the fuzzifier is to transform crisp input values into fuzzy values. The fuzzy knowledge base stores information on all fuzzy relationships between inputs and outputs. It includes the membership functions that define the input variables for the fuzzy rule base and the output variables for the controlled plant. The fuzzy rule base holds the knowledge of how the process operates. The inference engine works like
Table 2 AC network parameter
S. No.   Element description    Parameter values
1        Bus 1                  230 V
2        Bus 2                  11 kV
3        Bus 3                  690 V
4        Bus 4                  415 V
5        Bus 5                  690 V
6        Bus 6                  415 V
7        Bus 7                  230 V
8        DFIG 1                 1 MW, 690 V
9        DFIG 2                 250 kW, 690 V
10       Transformer 1          400 V/120 V, 1 kW
11       Transformer 2          400 V/326 V, 500 W
12       Transformer 3          11 kV/415 V, 1 MVA
13       Transformer 4          690 V/415 V, 10 kVA
14       Transformer 5          415 V/230 V, 10 kVA
15       AC-link bus voltage    415 V
the kernel of any FLC. The output value is updated from the rule base depending on the value of the error signal ε and the rate of change of the error. It emulates human decision making by performing approximate reasoning. The defuzzifier converts the fuzzy values from the fuzzy inference engine into crisp values (Fig. 5).
Fig. 5 Block diagram of fuzzy logic control
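The fuzzification, rule base, inference and defuzzification stages described above can be sketched in a few lines of Python. This is only an illustration: the three linguistic labels, the membership shapes on a normalized [−1, 1] range, the 3 × 3 rule table and the weighted-average defuzzifier are assumptions, since the paper does not list its membership functions or rules.

```python
def mu_negative(x):
    return min(1.0, max(0.0, -x))


def mu_zero(x):
    return max(0.0, 1.0 - abs(x))


def mu_positive(x):
    return min(1.0, max(0.0, x))


MEMBERSHIP = {"N": mu_negative, "Z": mu_zero, "P": mu_positive}
CENTERS = {"N": -1.0, "Z": 0.0, "P": 1.0}   # singleton output values

# (error label, change-of-error label) -> output label (assumed rule table)
RULES = {
    ("N", "N"): "N", ("N", "Z"): "N", ("N", "P"): "Z",
    ("Z", "N"): "N", ("Z", "Z"): "Z", ("Z", "P"): "P",
    ("P", "N"): "Z", ("P", "Z"): "P", ("P", "P"): "P",
}


def fuzzy_control(error, d_error):
    """Fuzzify both inputs, fire every rule with min(), defuzzify by weighted average."""
    num = 0.0
    den = 0.0
    for (e_label, de_label), out_label in RULES.items():
        strength = min(MEMBERSHIP[e_label](error), MEMBERSHIP[de_label](d_error))
        num += strength * CENTERS[out_label]
        den += strength
    return num / den if den > 0.0 else 0.0


u = fuzzy_control(error=0.5, d_error=0.0)
```

Inputs are assumed to be scaled to [−1, 1] before fuzzification, and the crisp output would be scaled back to the actuator range afterwards.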
7 Simulation Results
The study was simulated in MATLAB/Simulink. Active and reactive power transfer was simulated for a variety of cases according to the requirement of each grid. The ACMC controls were verified, and the results are given in Table 3. In cases 1, 2 and 3, the ACMC control action consists of adjusting the generation according to demand. In case 4 the surplus active power is transferred from the DC side to the AC side, and in case 5 the surplus active power is transferred from the AC side to the DC side. In case 4, a simultaneous transfer of 2 kVAR of reactive power from the DC side to the AC side was also achieved (Fig. 6).

Table 3 Results of the various cases performed on the test system
Case number   AC load (kW)   AC generation (kW)   Power transferred (kW)   DC load (kW)   DC generation (kW)
1             1000           1000                 0                        250            250
2             1250           1250                 0                        250            250
3             1250           1250                 0                        500            500
4             1500           1250                 250                      250            500
5             1000           1250                 250                      750            500

Figure 7a shows the module output power as the irradiation level varies at constant temperature. The irradiation is 0.2 kW/m2 at t = 0, increases to 0.4 kW/m2 at t = 3 s, and reaches a final value of 1 kW/m2 at t = 10 s. The MPPT operation of the solar panel can be seen in the rising slope of the graph. Figure 7b shows the active power of the DFIG, which is subjected to a wind speed disturbance at 3 s and remains stable across wind variations owing to the MPPT and the rotor converter action. A negative power value denotes generated power and a positive value denotes stored power. The output control signals are transmitted by the ACMC to the different sources of the DC and AC grids. Figure 8a shows the control signal sent to the 1 MW DFIG. The signals supplied to the PV clusters connected to the 120 V and 48 V buses are shown in Fig. 8b
Fig. 6 Simulation diagram
Fig. 7 Renewable source power generation in the test system: a photovoltaic output power versus irradiation, b DFIG output power versus time
Fig. 8 Generation of control signal from ACMC. a ACMC Control signal to 1 MW DFIG, b control signal supplied to 120 V bus associated Photovoltaic array, c control signal supplied to 48 V bus associated Photovoltaic array
and 8c. Figure 9a indicates that the converter, supported by the AC grid, transfers a power of about 200 kW, and the DC grid is able to increase its capacity
Fig. 9 ACMC operation of voltage control and Four quadrant a active power supplied on the DC side, b transfer of reactive power to DC link bus, c active power supplied from DC side d DC-link bus voltage profile
to provide 100 kW of power. Similarly, Fig. 9c shows a simulated load change of approximately 600 kW on the AC side. The AC generation was raised to 380 kW and the DC grid supplied 120 kW through the converter. The reactive power transfer was simulated in both directions, as shown in Fig. 9b. The DC grid had a reactive power demand of 2 kVAR in the simulation, and a change in demand at 2 s resulted in a transfer of 2 kVAR from the AC grid. Figure 9d shows that the converter held the DC-link bus voltage at a steady 400 V throughout its operation.
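The five cases of Table 3 can be cross-checked with a simple power balance. The sketch below is not the ACMC logic itself; the function name, the direction labels and the rule "transfer the smaller of one side's surplus and the other side's deficit" are assumptions used only to verify that the tabulated transfers are consistent with the loads and generations.

```python
# Values copied from Table 3 (all in kW).
TABLE_3 = {
    1: {"ac_load": 1000, "ac_gen": 1000, "dc_load": 250, "dc_gen": 250, "transferred": 0},
    2: {"ac_load": 1250, "ac_gen": 1250, "dc_load": 250, "dc_gen": 250, "transferred": 0},
    3: {"ac_load": 1250, "ac_gen": 1250, "dc_load": 500, "dc_gen": 500, "transferred": 0},
    4: {"ac_load": 1500, "ac_gen": 1250, "dc_load": 250, "dc_gen": 500, "transferred": 250},
    5: {"ac_load": 1000, "ac_gen": 1250, "dc_load": 750, "dc_gen": 500, "transferred": 250},
}

CONVERTER_RATING_KW = 250  # 250 kVA converter, treated here as 250 kW at unity power factor


def required_transfer(ac_load, ac_gen, dc_load, dc_gen):
    """Return (power in kW, direction) that balances both sides through the converter."""
    ac_surplus = ac_gen - ac_load
    dc_surplus = dc_gen - dc_load
    if ac_surplus < 0 < dc_surplus:
        return min(-ac_surplus, dc_surplus), "DC->AC"
    if dc_surplus < 0 < ac_surplus:
        return min(ac_surplus, -dc_surplus), "AC->DC"
    return 0, "none"


results = {
    case: required_transfer(row["ac_load"], row["ac_gen"], row["dc_load"], row["dc_gen"])
    for case, row in TABLE_3.items()
}
```

Cases 4 and 5 both require exactly the converter rating, which is consistent with the 250 kVA converter being the limiting element of the test system.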
8 Conclusion
This paper proposed the design and operation of a smart hybrid AC/LVDC micro grid. A fuzzy logic controller was developed, which offers the benefit of low switching. The results obtained support the proposed configuration as a means of achieving the desired operation and control. With flexible autonomous operation, the design has the following significant effects on the current power system. It contributes to the creation of regional micro-grid clusters with greater flexibility and reliability, since local micro grids are not affected by dependence on one another. The concept will also help in upgrading existing lines for bidirectional transmission of electricity, because each local energy source is used locally where possible. By building self-sufficient local micro grids in remote areas with little or no access to the traditional grid, the design would eliminate the need for connections through a large transmission network. The AC/LVDC system will improve the power market and reduce the need to consume all the electricity as it is generated, since surplus energy can be converted, stored in the batteries of the LVDC system and used as needed. The paper explained the simulation of the bidirectional converter in detail and described the analysis and control of the various sources. The key concept of the ACMC was implemented; operation in off-grid mode will be simulated to further demonstrate that the system works reliably.
The VLSI Realization of Sign-Magnitude Decimal Multiplication Efficiency
Reddipogula Chandra Babu and K. Sreenivasa Rao
Abstract Decimal X × Y multiplication is a complex operation in which intermediate partial products (IPPs) are typically selected from a set of precomputed radix-10 multiples of X. Some designs require only the multiples [0, 5] × X, obtained by recoding the digits of Y to a one-hot representation of signed digits in [−5, 5]. This reduces the selection logic at the cost of one additional IPP. Two's complement signed-digit (TCSD) encoding is often used to represent IPPs, which requires dynamic negation (through one XOR per bit of the X multiples) for the recoded digits of Y in [−5, −1]. Despite the generation of 17 IPPs for 16-digit operands, we are able to start the partial product reduction (PPR) with 16 IPPs, which improves VLSI regularity. We also save 75% of the negating XORs by using sign-magnitude signed-digit (SMSD) encoding. For the first-level PPR, we design an efficient adder with two SMSD input numbers whose sum is represented in TCSD encoding. Multi-level TCSD 2:1 reduction then yields two TCSD accumulated partial products, which together undergo a special early-initiated conversion scheme to produce the final binary-coded decimal product. A VLSI implementation of a 16-digit parallel decimal multiplier is synthesized, and the results show some improvement in efficiency over previous similar designs.
Keywords Radix-10 Multiple · Sign-Magnitude Digits (SMSDs) · VLSI Design
R. C. Babu (B) · K. S. Rao
Department of ECE, Annamacharya Institute of Technology and Sciences, Rajampet, Kadapa, A.P. 51626, India
e-mail: [email protected]
K. S. Rao
e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2021
V. K. Gunjan and J. M. Zurada et al. (eds.), Modern Approaches in Machine Learning and Cognitive Science: A Walkthrough, Studies in Computational Intelligence 956, https://doi.org/10.1007/978-3-030-68291-0_39
1 Introduction
Decimal computer arithmetic is used in applications that process decimal data, for example research, development, banking and online applications. The ever-growing demand for processing power cannot be met by conventional, slow, software-simulated decimal arithmetic units. Taking all
into account, hardware decimal arithmetic units have become a vital component of recent processors. Binary-coded decimal (BCD) encoding of decimal digits has traditionally governed decimal arithmetic, whether realized in hardware or in software. The development of decimal arithmetic hardware is still in progress, and hardware algorithms and architectures can be improved. For example, for the X × Y computation, early BCD multipliers use iterative multiplication algorithms in which partial products (for example, the product of one BCD digit of the multiplier Y and the multiplicand X) are added one at a time. Each multiple of X by a BCD digit in [0, 9] can be either precomputed or composed of several easy multiples (e.g. 7X = 4X + X + 2X). The latter approach considerably increases the depth of the partial product tree per BCD multiplier digit (as determined by the number of equally weighted BCD digits to be added), which usually outweighs the saving in partial product generation. Alternatively, fast and low-cost BCD digit multipliers can be used. Erle et al. reported three multipliers based on decimal digit-by-digit partial product generation, which achieve fewer cycles, less wiring and no need for registers to store the multiplicand multiples. As VLSI technology progresses quickly, semi (fully) parallel BCD multipliers are being designed, in which several (all) of the partial products are generated and accumulated in parallel.
2 Literature Review
The column acronyms MR, ME, DN, #OP, PPDS and PPDE stand for multiplier recoding, multiples encoding, dynamic negation of IPPs, number of operands to be added, partial product digit set and partial product digit encoding, respectively. This section gives a brief overview of a group of previous works. In the Svoboda encoding, a digit d in [0, 6] ([−6, −0]) is represented by the 5-bit number 3d (31 − 3d). Decimal multipliers with floating-point operands, sequential designs and designs on field-programmable gate arrays (FPGAs) are excluded from Table 1, as they are outside the scope of this study. No references to any other FPGA designs are made in Table 1. The following are brief descriptions of the designs under evaluation, for n = 16.
(1) Reference [2]: a sequential multiplier with fast carry-free X multiples, where the partial product digit set (PPDS) and partial product digit encoding (PPDE) are [0, 18] and double BCD, converted over time to [0, 10] and BCD carry-save (CS).
(2) Reference [12]: fast carry-free X multiples, where the PPDS and PPDE are [0, 15] and (8, 4, 2, 1), respectively. The reduction depth is 5.
(3) Reference [1]: a parallel multiplier with fast carry-free X multiples and 4:1 and 2:1 multiplexers, with PPDS and PPDE of [0, 10] and BCD CS, respectively.
Table 1 Latency comparison of PPG Stage ()
No.   References       Components (delay)                                                     Total   Ratio
1     [2]              2X_generation (6), Mux 4:1 (3), BCD full adder [29] (8)                17      1.42
2     [12]             8X_generation (4), Mux 3:1 (3)                                         7       0.58
3     [1]              4X_generation: g (8), Mux 3:1 (3)                                      11      0.92
4     [6] Radix-5      BCD to 4,2,2,1 conversion (1), 3X_generation (4), Mux 4:1 (3)          8       0.67
5     [6] Radix-10     3X_generation (21), Mux 5:1 (4), Dynamic negation (2)                  27      2.25
6     [7]              3X_generation (7), Mux 5:1 (4), Negation (2)                           13      1.08
7     [4]              4X_generation (6), Mux 5:1 (4), Negation (2)                           12      1.00
8     Proposed         4X_generation (8), Mux 5:1 (4)                                         12      1.00
(4) Reference [2]: there are two designs, one with a 3:2 reduction and one with 2:1 multiplexers for (±2X, ±X) and (10X, 5X), both using dedicated reduction cells. The reduction depth is 8. (a) Radix-10: [−5, 5] SMSD recoding, slow 3X generation, 5:1 multiplexing and dynamic negation. (b) The reduction depth is six, radix-10. Both the multiplier and the multiplicand are recoded to SMSD, and digit-by-digit multiplication yields partial products with PPDS [−6, 6], which are added with Svoboda adders. [3] uses BCD-to-[−5, 5] SMSD recoding of the multiplier.
(5) Reference [4]: a parallel multiplier with [−5, 5] SMSD recoding of the multiplier, carry-free generation of the X multiples with a redundant 3X, and 5:1 multiplexing with dynamic negation. The reduction is done in two stages. The first is 17:8, with three additional CS units and a 4-b adder. The next stage (8:2) uses two (4; 2) compressor levels and a 5-b adder. All stages use special reduction cells.
3 Current System
Fast radix-10 multiplication, in particular, can be achieved through parallel partial product generation (PPG) and partial product reduction (PPR). It is equally necessary to reduce the silicon cost while preserving the high speed of parallel execution. Let P = X × Y be a decimal multiplication in which the multiplicand X, the multiplier Y and the product P are normalized radix-10 numbers with digits in [0, 9]. Such digits are commonly represented in BCD encoding. The intermediate partial products (IPPs), however, may be represented with other decimal digit sets and encodings. The choice of IPP representation affects the PPG and is of particular importance for decimal multiplication from two angles: one is fast and low-cost generation of the IPPs, and the other is its effect on the representation of the IPPs, which influences the efficiency of the PPR. Straightforward PPG [3] by BCD digit-by-digit multiplication [5] is slow and expensive, and leads to n double-BCD IPPs for an n × n multiplication (i.e. 2n BCD numbers to be added). Alternatively, both the multiplier and the multiplicand can be recoded to sign-magnitude signed digits (SMSDs), which allows a more efficient 3-b by 3-b PPG. However, following a long-standing practice, most PPG schemes use precomputed multiples of the multiplicand X (X multiples). Precomputing the complete set {0, 1, …, 9} × X as normal BCD numbers is slow and expensive. The common remedy is to use a smaller, less costly set that is obtainable by fast carry-free manipulation (for example, {0, 1, 2, 4, 5} × X), with the drawback of doubling the number of BCD numbers to be added in the PPR, since double-BCD IPPs are generated, e.g. 3X = (2X, X), 7X = (5X, 2X) or 9X = (5X, 4X).
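The idea of composing every multiple from the easy set {0, 1, 2, 4, 5} × X can be checked with a small integer model. The pairs for 3X, 7X and 9X are those given above; the pairs chosen for 6X and 8X, and the function names, are assumptions for the example. The model works on Python integers rather than on BCD digits, so it illustrates the operand count, not the hardware.

```python
# digit -> pair of easy multiples whose sum gives digit * X
EASY_PAIRS = {
    0: (0, 0), 1: (1, 0), 2: (2, 0), 3: (2, 1), 4: (4, 0),
    5: (5, 0), 6: (4, 2), 7: (5, 2), 8: (4, 4), 9: (5, 4),
}


def double_ipps(x, y):
    """Return one pair of partial products per decimal digit of the multiplier y."""
    ipps = []
    position = 0
    while y > 0:
        digit = y % 10
        a, b = EASY_PAIRS[digit]
        weight = 10 ** position
        ipps.append((a * x * weight, b * x * weight))
        y //= 10
        position += 1
    return ipps


def multiply(x, y):
    """Accumulate all double partial products."""
    return sum(a + b for a, b in double_ipps(x, y))


pairs = double_ipps(1234, 5678)
product = multiply(1234, 5678)
```

For an n-digit multiplier the reduction tree therefore receives up to 2n operands, which is the drawback mentioned above.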
4 Proposed System
Decimal IPPs with Sign-Magnitude Representation: Decimal IPPs are usually represented as 4-b signed digits in [−α, α] (α from 5 to 7, for example α = 5 or α = 7), in either sign-magnitude or two's complement representation. The latter is suited to arithmetic operations other than negation, and negation is best done in the sign-magnitude form. Here we present a decimal multiplication scheme with the following characteristics (Fig. 1):
(1) [−5, 5] SMSD recoding of the multiplier;
(2) [0, 5] × X precomputed multiples;
(3) 4-b [−6, 6] SMSD encoding of the precomputed multiples;
(4) dynamic negation of the multiples with only one XOR per digit (instead of 4);
(5) n operands to be added instead of n + 1;
(6) an SMSD + SMSD-to-TCSD adder for all four combinations of the input signs;
(7) [−7, 7] TCSD representation of the accumulated partial products;
(8) early start of the redundant-to-BCD conversion;
(9) final BCD product obtained by augmenting the final PPR level with the final conversion.
Fig. 1 Block diagram of the proposed multiplier
4.1 Recoding of Multiplier's Digits
With plain BCD multiplier digits in [0, 9], the precomputed multiples [0, 9] × X are needed, that is, {3, 6, 7, 8, 9} × X in addition to {2, 4, 5} × X. On the other hand, recoding the BCD multiplier digits to redundant [−5, 5] SMSDs [6] requires only the multiples [0, 5] × X, with dynamic negation of the IPPs, of which only one is a hard multiple (namely 3X). This recoding, however, yields n + 1 recoded multiplier digits, which changes the number of IPPs by one. This is particularly undesirable for n = 16 (the smallest word size for IEEE 754-2008 decimal operands). The one-hot recoding logic is given in (1), where y_i = v3 v2 v1 v0 and y_{i−1} = w3 w2 w1 w0 are two consecutive BCD multiplier digits, ω indicates y_{i−1} ≥ 5, s_v is the sign of the recoded digit, and the primed signals v'_1 to v'_5 are its one-hot magnitude. The extra digit affects the depth of the 2:1 PPR by one level, which is preventable as described in Section III-C (Fig. 2). The X multiples are likewise represented in the [−6, 6] SMSD digit set. Each digit of a multiple can be obtained from the bits of two consecutive BCD digits of X with the corresponding logic expressions for the SMSD multiples 1X (u), 2X (d), 3X (t), 4X (q) and 5X (p). For example, 3X (t) is defined by (2) and the rest are given in the Appendix.
$$v'_3 = v_1(\omega \oplus v_0), \qquad v'_4 = \overline{\omega \vee v_0}\, v_2 \vee \omega v_0 (v_2 \oplus v_1), \qquad v'_5 = v_2 \bar{v}_1 (\omega \oplus v_0), \qquad s_v = v_3\, \overline{\omega v_0} \vee v_2 (v_1 \vee v_0) \qquad (1)$$
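The [−5, 5] recoding can be modelled arithmetically and cross-checked against one-hot logic of the form in (1). In the sketch below the arithmetic rule (subtract 10 from any digit of 5 or more and pass a carry of 1 to the next digit) is the standard way to obtain digits in [−5, 5]; the placement of the complemented terms in `one_hot_logic` is an assumption made where the extracted equation is ambiguous, and the exhaustive comparison over all 20 input combinations is what supports it. Function names are hypothetical.

```python
def recode_bcd_to_smsd(digits_lsd_first):
    """Recode BCD digits (least significant first) to signed digits in [-5, 5].

    Returns n + 1 digits for an n-digit input.
    """
    out = []
    carry = 0
    for d in digits_lsd_first:
        carry_out = 1 if d >= 5 else 0
        out.append(d + carry - 10 * carry_out)
        carry = carry_out
    out.append(carry)
    return out


def one_hot_logic(y, omega):
    """Signed digit from the gate-level expressions (complement placement assumed)."""
    v3, v2, v1, v0 = (y >> 3) & 1, (y >> 2) & 1, (y >> 1) & 1, y & 1
    n = lambda bit: 1 - bit  # logical NOT on 0/1 values
    m1 = n(v2 | v1) & (omega ^ v0)
    m2 = (omega & v0 & (n(v3 | v2 | v1) | (v2 & v1))) | (n(omega | v0) & (v3 | (n(v2) & v1)))
    m3 = v1 & (omega ^ v0)
    m4 = (n(omega | v0) & v2) | (omega & v0 & (v2 ^ v1))
    m5 = v2 & n(v1) & (omega ^ v0)
    sign = (v3 & n(omega & v0)) | (v2 & (v1 | v0))
    magnitude = 1 * m1 + 2 * m2 + 3 * m3 + 4 * m4 + 5 * m5
    return -magnitude if sign else magnitude


recoded = recode_bcd_to_smsd([9, 9, 9, 9])   # the number 9999, least significant digit first
```

The extra most significant digit produced by the recoding is the source of the (n + 1)-th IPP discussed in this section.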
The most significant digit of the generated multiples is at most four (5 × 9 = 45), and it remains at most four after the BCD-to-SMSD conversion. Note that these multiples are therefore represented with at most one additional digit (for example, a total of 68 bits). t2 = b2 b0 b3 b2 (a3 ∨ a2 ) ∨ a 3 b2 (a 2 ∨ a 1 a 0 ) ∨ a 2 a1 b3 ∨ b2 b1 b0 (a3 ∨ a2 a1 ∨ a2 a0 )
Fig. 2 Two consecutive digits of X,3 X (BCD), and 3 X ([− 6, 6]SMSD)
∨ a 3 b0 (b3 ∨ a 2 a 1 b2 b1 )a3 b2 b1 b0 t1 = ∨b2 b1 a2 b3 (a1 ∨ a0 ) ∨ b1 (a2 ∨ a1 ) ∨ a3 b1 a b b ∨ b2 b1 b0 ∨ a 2 a 1 b2 b0 ∨ b2 b1 b0 ∨ ∨ a 3 2 3 0 a 1 a 0 b3 b0 ∨ b2 b1 b0 ∨ b2 b0 (a2 (a1 ∨ a0 ) ∨ b1 (a2 ∨ a1 )) ∨ a3 b3 b0 b2 ∨ b1 ∨ a 2 a 1 b3 b0 t0 = bo (a2 (a1 ∨ ao ) ∨ a 3 a 2 a 1 ) ∨ b0 (a3 ∨ a 2 a1 ∨ a2 a 1 a o )
(2)
4.2 Partial Product Generation
Figure 3 depicts the normal organization of the IPPs, with n + 1 IPPs for n = 4. The tick marks denote the BCD digits of the multiplicand. The depth of the deepest column of the IPP matrix (i.e., the 10^n-weighted position) is (n + 1), where all digits lie in [−6, 6], except the top and bottom ones, which lie in the shorter ranges [−5, 4] and [−6, 3].
$$\omega = w_3 \vee w_2 (w_1 \vee w_0), \qquad v'_1 = \overline{v_2 \vee v_1}\,(\omega \oplus v_0), \qquad v'_2 = \omega v_0 \left(\overline{v_3 \vee v_2 \vee v_1} \vee v_2 v_1\right) \vee \overline{\omega \vee v_0}\,(v_3 \vee \bar{v}_2 v_1)$$
Here ω is the carry from the lower BCD digit w3 w2 w1 w0, and the primed signals are the recoded one-hot outputs. The depth of the IPP matrix is restricted to n (e.g. from 5 to 4 for n = 4, and from 17 to 16 for n = 16) in the interval between the end of PPG and the start of PPR. It works as follows. The sum of the two gray digits (see Fig. 3) is computed independently of, and in parallel with, the normal PPG. The top gray digit is associated with the 10^n-weighted recoded multiplier carry, and the other is the bottom gray digit. There is, however, no provision for a carry to propagate further. For Y_{n−1} > 4, let H denote the most significant digit of X_{n−1} × Y_0 (for example, the top gray digit in Fig. 3), where X_{n−1} and Y_0 represent the most significant BCD digit of the multiplicand and the least significant recoded multiplier digit, respectively. We derive H as one-hot signals using 8-input logic (see the rightmost box in Fig. 4).
Fig. 3 Normal organization of IPPs
Fig. 4 Required circuit for (17 → 16) depth reduction
Fig. 5 Overall view of 16 × 16-digit multiplier
This yields the sum digit S at the 10^n-weighted position (in place of the two gray digits in Fig. 3) and the carry bit c to be added to the 10^(n+1)-weighted digit next to the bottom gray digit, resulting in S′ (in Fig. 5 both S and S′ are identified by white triangles), as seen in the left-hand part of Fig. 5. The 10-weighted digit of the multiplicand (i.e., X_1) is recoded explicitly.
4.3 Partial Product Reduction (PPR)
The overall PPR for n = 16 is shown in Fig. 5, where a circle, a triangle, a square and a star represent BCD, [−6, 6] SMSD, [−7, 7] TCSD and binary signed digit (BSD), respectively. Choosing an SMSD representation for the first-level IPPs, while it benefits PPG, does not add difficulty to PPR, as all reduction levels use TCSD adders apart from the first one, which uses a special SMSD + SMSD-to-TCSD adder. This adder is no more complex than a regular TCSD adder.
1. SMSD Adder: Figure 6 shows a digit slice of the SMSD + SMSD-to-TCSD adder for four separate cases covering all possible combinations of the input signs (Fig. 6a–d), in dot notation. The two kinds of dots denote posibits and negabits. (A posibit is a normal bit whose arithmetic value equals its logical state; a negabit with logical state x has arithmetic value x − 1.) To add the magnitudes, the sign bits are taken into account: a negative sign turns the magnitude posibits into negabits and inverts their logical states. Subsequently, the bit collection U is decomposed at the same position and the bit collection
Fig. 6 Digit slice of SMSD + SMSD-to-TCSD adder for four sign combinations
V is reset. Nevertheless, as explained below, a single 4-b adder handles all four cases, which explains the name 4-in-1 adder. The pair (c in, c in) is the incoming signed carry C in from the lower position. In all four cases, the representations of Z, V and C in are described, resulting in two additional representations for S (see the following numerical example for details). Case 1: (The number of values indicated): Fig. 8 is a numerical illustration of the addition of P = s p 101 (P 5) and Q = sq 100 (Q 4) respectively. The reciprocal number of negabits is 1 − (0 − ) and the reciprocal is the statistical form 0 (− 1). The signed carry C in = 0 is represented by posibit c in = 0 and the inverted negabit c in = 1. The full adder in position 0 then receives two negabits and one posibit, generating a posibit sum of 1 and a negabit carry of 0 − such that only one non-zero input is arithmetic − (i.e. − 1) + 1 = − 1. This 4-in-1 adder is slightly more efficient than the [−7, 7] TCSD adder (i.e. less delay with no area overhead), which can be checked by comparing the pre-processing logic boxes of the 4-in-1 adder and the TCSD adder in (4) and (5).
2. TCSD Adder: TCSD adders are needed for the subsequent (log2 n − 2) reduction levels. The required architecture is similar to that of Fig. 7, except for the pre-processing boxes, whose logic expressions are given in (6). Figure 9 shows a digit slice of the adder. z 2 = ( p2 ⊕ q2 )( p3 (q3 ∨ p1 ) ∨ q3 p1 ) ∨ p2 q2 p 1 p3 q3 ∨ p 3 q 3 p2 q 2 ∨ p 2 q 2 p 1 z 3 = p 3 q 3 p 2 q 2 p1 ∨ p3 q 3 p2 q 2 p 1 ∨ ( p3 ⊕ q 3 ) p2 q 2 p1 ∨ p 2 q 2 p 1 .
(3)
The cycle −6 w4 + 10 w4 can be reversed. However, when w4 = 0, the decimal borrow is passed on to the next decimal position, which lengthens the borrow chain. To avoid slow borrow propagation, we use a parallel borrow generator based on the decimal borrow-propagate and borrow-generate signals µ = (W = 0) and (W < 0) = w4, respectively. The borrow signals are produced by a four-level Kogge–Stone (KS) [26] parallel prefix network with 15 input pairs (x, µ) and the borrow-in b8 from Part 1 (i.e. position 7).
Fig. 7 Digit slice of the 4-in-1 SMSD + SMSD → TCSD adder
Fig. 8 TCSD adder
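The parallel borrow generation can be illustrated in software. The sketch below is a generic Kogge–Stone prefix computation over generate/propagate bits, not the chapter's exact circuit; mapping the generate bit to (W < 0) and the propagate bit to (W = 0) is an assumption made for illustration, and the function names are hypothetical. For 15 input pairs it needs four prefix levels, in line with the four-level network mentioned above.

```python
from itertools import product


def ripple_borrows(gen, prop, b_in):
    """Reference: borrow out of each position, computed serially."""
    out, b = [], b_in
    for g, p in zip(gen, prop):
        b = g | (p & b)
        out.append(b)
    return out


def kogge_stone_borrows(gen, prop, b_in):
    """Same borrows via a Kogge-Stone prefix network.
    Returns (borrows, number_of_prefix_levels)."""
    n = len(gen)
    G, P = list(gen), list(prop)
    dist, levels = 1, 0
    while dist < n:
        new_g, new_p = G[:], P[:]
        for i in range(dist, n):
            # Combine the group ending at i with the group ending at i - dist.
            new_g[i] = G[i] | (P[i] & G[i - dist])
            new_p[i] = P[i] & P[i - dist]
        G, P = new_g, new_p
        dist *= 2
        levels += 1
    # G[i], P[i] now cover positions 0..i; fold in the external borrow-in.
    return [G[i] | (P[i] & b_in) for i in range(n)], levels


def matches_ripple(n):
    """Exhaustively compare both methods for all n-bit inputs."""
    for bits in product((0, 1), repeat=2 * n + 1):
        gen, prop, b_in = bits[:n], bits[n:2 * n], bits[-1]
        if kogge_stone_borrows(gen, prop, b_in)[0] != ripple_borrows(gen, prop, b_in):
            return False
    return True
```

The serial version has a delay proportional to the number of digits, whereas the prefix version needs only a logarithmic number of levels, which is the motivation for using it in the final conversion.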
The table below provides similarities between the current system and the different BCD propagation schemes.
Advantages of Proposed System
• Reduced time for partial product generation and reduction.
• The computation is generalized to negative BCD numbers.
• Low power consumption.
• A complete high-speed BCD multiplier.
Fig. 9 Final conversion in part 2. a Digit slice of TCSD + TCSD-to-BCD adder. b Architecture for final product generation in positions 8–22
5 Implementation of the Proposed Multiplier in FIR Filter
5.1 FIR Filter
A filter is a device or process that removes unwanted features or components from a signal. Filtering is a class of signal processing in which some components of the signal are wholly or partly suppressed. Analog and digital are the two primary types of filters, and filters can be further classified into several categories depending on the classification criteria. The two primary types of digital filters are finite impulse response (FIR) filters and infinite impulse response (IIR) filters. FIR filters are one of the major types of filters used in digital signal processing. They are called finite because they have no feedback: when an impulse (a single spike) is sent through the system, the output becomes zero as soon as the impulse has passed through the delay line. FIR filters are therefore non-recursive. Figure 10 shows the realization of a finite impulse response filter.
Fig. 10 Finite impulse response filter realization
The finite impulse response (FIR) digital filter is commonly used in a variety of digital signal processing applications such as speech synthesis, loudspeaker equalization, echo cancellation, active feedback and numerous communication applications, including software-defined radio (SDR). Some of these applications require large-order FIR filters to meet strict frequency specifications, and such filters must often also support high sampling rates for wireless communication. However, the number of multiplications and additions required per filter output increases linearly with the filter order. Since the FIR filter algorithm contains no redundant computation, it is difficult to implement a large-order FIR filter in real time with restricted resources. In many signal processing applications the filter coefficients remain constant and are known a priori. The output of an N-tap FIR filter is given by

out(n) = \sum_{i=0}^{N-1} x(n - i)\, h(i)    (4)
where {h(i): i = 0, …, N − 1} is the vector of filter coefficients. The FIR filter performs a convolution operation [8], which is primarily defined for signals of infinite length. Finite-length signals (e.g. images), however, behave differently, and the problem arises at the signal borders. The general solution suggested is to extend each line at the edges of the signal. N − 1 additional samples are needed at the signal borders, and they may be distributed unevenly between the left and the right border. Let α and µ denote the numbers of samples added on the left and on the right, respectively (α + µ = N − 1).

cout = p 3 q 3 , cout = p3 q 3 p2 q 2 p1 ∨ p 2 q 2 p1 ∨ p 3 q 3
v0 = p0 ⊕ q0 , v1 = q1 ⊕ ( p0 ∨ q0 ), v2 = q1 ( p0 ∨ q0 )
z 1 = ( p2 ∨ q2 ) p3 q 3 p 1 ∨ p 3 (q3 ⊕ p1 ) ∨ p 3 q 3 p 2 q 2 p 1 ∨ p3q3 p1 p2 q2
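The N-tap FIR output equation (4) translates directly into a short routine. The following is a minimal software sketch, assuming that samples before the start of the signal are zero (one possible border treatment, chosen here only for simplicity); `fir_filter` is an illustrative name and the code is not the hardware realization of Fig. 10.

```python
def fir_filter(x, h):
    """Direct-form FIR filter: out(n) = sum over i of x(n - i) * h(i).
    x: list of input samples, h: list of N filter coefficients.
    Samples with a negative index are treated as zero (an assumption)."""
    n_taps = len(h)
    out = []
    for n in range(len(x)):
        acc = 0
        for i in range(n_taps):
            if n - i >= 0:
                acc += x[n - i] * h[i]  # one multiplication per tap
        out.append(acc)
    return out
```

Feeding a unit impulse, for example `fir_filter([1, 0, 0, 0, 0], [3, 2, 1])`, returns the coefficients followed by zeros, which shows the finite impulse response. Each output sample costs N multiplications, so the multiplier is the natural component to optimize.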
Fig. 11 Simulation result of multiplier
6 Results
See Figs. 11, 12, 13, 14, 15, 16, 17 and 18.
7 Conclusion
A 16-digit radix-10 BCD parallel multiplier is proposed, in which 17 partial products are generated in the [−6, 6] SMSD representation. The improvements made in this paper over previous methods lead to a modest 1.5% decrease in area and a 10% decrease in power dissipation at a latency of approximately 4.8 ns compared to the fastest previous work [7]. For the latter, the minimum delay is 4.8 ns, while the proposed design allows the synthesizer to meet a 4.4-ns time constraint (i.e. 9% faster). In other words, the advantage is that the proposed design operates at a 9% higher frequency and dissipates up to 13% less power without any change in area.
Fig. 12 Block diagram of multiplier
Fig. 13 Internal diagram of RTL schematic
Fig. 14 Block diagram of FIR Filter
Fig. 15 The area of the multiplier, which contains the LUT
Fig. 16 Simulation result of FIR filter
Fig. 17 The internal diagram of RTL schematic of the FIR filter
Fig. 18 The power report of the FIR filter
References
1. Busaba, F.Y., Slegel, T., Carlough, S., Krygowski, C., Rell, J.G.: The design of the fixed point unit for the z990 microprocessor. In: Proceedings of ACM Great Lakes 14th Symposium on VLSI, pp. 364–367 (2004, April)
2. Kenney, R.D., Schulte, M.J.: High-speed multi operand decimal adders. IEEE Trans. Comput. 54(8), 953–963 (2005)
3. Larson, R.H.: High-speed multiply using four input carry save adder. IBM Tech. Discl. Bull. 16(7), 2053–2054 (1973)
4. Kenney, R.D., Schulte, M.J., Erle, M.A.: High-frequency decimal multiplier. In: Proceedings of IEEE International Conference on Computer Design: VLSI in Computers and Processors, pp. 26–29 (2004, October)
5. Lang, T., Nannarelli, A.: A radix-10 combinational multiplier. In: Proceedings of 40th Asilomar Conference on Signals, Systems, and Computers, pp. 313–317 (2006, October)
6. Erle, M.A., Schwarz, E.M., Schulte, M.J.: Decimal multiplication with efficient partial product generation. In: Proceedings of IEEE 17th Symposium on Computer Arithmetic, pp. 21–28 (2005, June)
7. Erle, M.A., Schulte, M.J.: Decimal multiplication via carry-save addition. In: Proceedings of IEEE International Conference on Application-Specific Systems, Architectures, and Processors, pp. 348–358 (2003, June)
8. IEEE standard for floating-point arithmetic. IEEE Standards Committee (2006, October)
Image Encryption Algorithms Using Machine Learning and Deep Learning Techniques—A Survey T. Naga Lakshmi, S. Jyothi, and M. Rudra Kumar
Abstract Data transmission is currently shifting from text to multimedia content. Images play a major role in multimedia data, and it is very important to protect image data while it is transmitted over a network. This can be achieved by image encryption. Many different techniques can be used to protect confidential image data from unauthorized access. This paper surveys existing works that apply machine learning and deep learning algorithms to image encryption, which helps in analyzing different algorithms for different images and image data sets.
Keywords Cryptography · Image · Encryption · Machine Learning · Deep Learning
1 Introduction
Image encryption is the process of converting images from plain to encrypted format. Machine learning is an application of artificial intelligence that enables a system to learn automatically and to improve from experience. Deep learning, or deep neural networks, is capable of learning from unstructured data. The main purpose of this paper is to review the different kinds of algorithms that are used to encrypt images using machine learning and deep learning. We discuss the following articles. Image Encryption Using Chaotic Based Artificial Neural Network by Chauhan and Prajapati [1] performs image encryption using a chaotic based neural network. A Novel Priority Based Document Image Encryption With Mixed Chaotic Systems Using Machine Learning Approach by Revanna and Keshavamurthy [2] proposes an algorithm that encrypts document images according to their priority. An Image Compression and Encryption Scheme Based On Deep Learning by Hu et al. [3] proposes an image compression and encryption algorithm. Learnable Image Encryption by Tanaka [4] proposes an algorithm that encrypts images without disturbing the learning mechanism. Encrypted Image Retrieval System: A Machine Learning Approach by Hazra et al. [5] proposes an algorithm to retrieve encrypted images using a machine learning approach. Machine Learning Classification over Encrypted Data by Bost et al. [6] proposes classification algorithms over encrypted data. Reversible Data Hiding Scheme During Encryption Using Machine Learning by Manikandan and Masilamani [7] proposes a data hiding scheme for encrypted images. Batch Image Encryption Using Generated Deep Features Based on Stacked Autoencoder Network by Hu et al. [8] proposes batch encryption using deep features based on a stacked autoencoder (SAE). Research on Iris Image Encryption Based on Deep Learning by Li et al. [23] proposes an encryption algorithm for iris images. Privacy-Preserving Distributed Deep Learning via Homomorphic Re-Encryption by Tang et al. [9] proposes a deep learning algorithm for a distributed environment. DLEDNet: A Deep Learning-based Image Encryption and Decryption Network for Internet of Medical Things by Ding et al. [29] proposes encryption and decryption for the internet of medical things.

T. N. Lakshmi · M. R. Kumar (B) Department of Computer Science and Engineering, AITS, Rajampet, India
S. Jyothi Department of Computer Science, SPMVV, Tirupati, India
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2021 V. K. Gunjan and J. M. Zurada et al. (eds.), Modern Approaches in Machine Learning and Cognitive Science: A Walkthrough, Studies in Computational Intelligence 956, https://doi.org/10.1007/978-3-030-68291-0_40
2 Machine Learning Algorithms
2.1 Image Encryption Using Chaotic Based Artificial Neural Network
Chauhan and Prajapati have presented a paper [1] in which they propose a new image encryption/decryption scheme using a chaotic neural network. The algorithm combines two approaches, a chaotic cryptosystem and an ANN based cryptosystem, into a chaotic based artificial neural network. Chaotic systems are deterministic: the same inputs always produce the same results, but after any change in the input the behavior of the system cannot be predicted. The pros and cons of chaotic systems and ANNs are discussed in the review paper [11]. The objective is to identify the use of ANNs in the field of chaotic cryptography. The technique first generates a chaotic sequence and forwards it to the ANN; the weights of the ANN are derived from and updated with this sequence, which helps generate the key of the encryption algorithm. The encryption algorithm used is the AES symmetric encryption algorithm. The chaotic neural network (CNN) is implemented in MATLAB. To compare the outputs for different input images, the performance parameters used are the peak signal to noise ratio (PSNR) and the mean square error (MSE).
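The two comparison parameters named above have standard definitions, which can be written down as a minimal sketch. The code assumes 8-bit grayscale images stored as lists of rows (hence the peak value 255); the function names are illustrative and are not taken from [1].

```python
import math


def mse(img_a, img_b):
    """Mean square error between two equally sized images (lists of rows)."""
    total, count = 0, 0
    for row_a, row_b in zip(img_a, img_b):
        for a, b in zip(row_a, row_b):
            total += (a - b) ** 2
            count += 1
    return total / count


def psnr(img_a, img_b, peak=255):
    """Peak signal to noise ratio in dB; infinite for identical images."""
    error = mse(img_a, img_b)
    if error == 0:
        return math.inf
    return 10 * math.log10(peak ** 2 / error)
```

A decrypted image that matches the original gives an MSE of 0 and an infinite PSNR, while a good cipher image should give a high MSE and a low PSNR with respect to the plain image.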
2.2 Novel Priority Based Document Image Encryption with Mixed Chaotic Systems Using Machine Learning Approach
Revanna and Keshavamurthy have presented a paper [2]. Document images contain different kinds of information, and this information needs to be encrypted with different levels of security. In this paper, the image is first classified based on feature extraction. The K-Nearest Neighbor (K-NN) image classification method is used to classify the given document against a trained set of features obtained beforehand from a document database. An Optical Character Recognition (OCR) technique [12] is used to identify the type of document and the presence and location of text/numerals in the document. A priority level is assigned based on the document type. Document images with different priorities are encrypted with different multi-dimensional chaotic maps: documents with the highest priority are encrypted with the highest level of security, whereas documents with lower priority levels are encrypted with lower security levels. The work covers different types of documents with different types of image features for a large trained database. The results show that documents without priority levels take more time to encrypt than document images with priority levels. The NIST (National Institute of Standards and Technology) statistical tests are also conducted to assess randomness. The proposed work ensures security against various statistical and differential attacks.
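The classification and priority assignment steps can be sketched as follows. This is only an illustration of K-NN voting followed by a lookup; the two-dimensional feature vectors, the class labels and the priority table are hypothetical and are not taken from [2].

```python
import math
from collections import Counter

# Hypothetical mapping from document type to priority level (higher = more secure).
PRIORITY = {"confidential": 3, "internal": 2, "public": 1}


def knn_classify(train, query, k=3):
    """train: list of (feature_vector, label) pairs; returns the majority
    label among the k training samples closest to query."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def priority_of(train, query, k=3):
    """Classify the document, then look up its priority level."""
    return PRIORITY[knn_classify(train, query, k)]
```

In a priority based scheme, the returned level would then select how strong a chaotic map is applied to the document image, so that only the sensitive documents pay the cost of the strongest encryption.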
2.3 Learnable Image Encryption
Tanaka has presented a paper [4] on training a network with encrypted images without privacy issues. The proposed method, known as the learnable image encryption algorithm, is a block-wise pixel shuffling algorithm. In this algorithm, an 8-bit RGB image is divided into blocks of size m × m. Each block is split into upper 4-bit and lower 4-bit images. After the split, the intensities of randomly chosen pixels are reversed. A random pixel shuffle is then applied, and the encrypted image is assembled. The network structure is adjusted to the proposed algorithm. To handle block-wise image encryption, a convolution with an m × m filter and an m × m stride is used for the first layer. After several network-in-network style layers [14] are stacked, the feature map is upsampled to the original resolution by sub-pixel convolution [13]. Experiments were conducted with the CIFAR dataset [15]. In the proposed algorithm the block size is set to four and the image size is 32 × 32. The accuracies of networks trained on images encrypted with different algorithms are compared, including plain images, the combined cat map and naive block-wise pixel shuffling; the accuracy obtained with the proposed algorithm is similar to that of plain images.
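The block-wise steps described above can be sketched for a single block. The code below is an illustration under stated assumptions, not the reference implementation of [4]: it works on one channel of one m × m block given as a flat list, derives the key from a seed with Python's `random` module, and assumes the same key is reused for every block; all names are hypothetical.

```python
import random


def make_key(m, seed):
    """Key for one m x m block: a permutation of its 2*m*m 4-bit values
    and one flag per value saying whether its intensity is reversed."""
    rng = random.Random(seed)
    size = 2 * m * m
    perm = list(range(size))
    rng.shuffle(perm)
    flip = [rng.random() < 0.5 for _ in range(size)]
    return perm, flip


def encrypt_block(block, key):
    """block: flat list of m*m 8-bit pixels -> list of 2*m*m 4-bit values."""
    perm, flip = key
    # Split every pixel into its upper and lower 4 bits.
    nibbles = [p >> 4 for p in block] + [p & 0x0F for p in block]
    # Reverse the intensity of the selected values, then shuffle positions.
    nibbles = [15 - v if f else v for v, f in zip(nibbles, flip)]
    return [nibbles[j] for j in perm]


def decrypt_block(cipher, key):
    """Inverse of encrypt_block."""
    perm, flip = key
    nibbles = [0] * len(cipher)
    for out_pos, j in enumerate(perm):
        nibbles[j] = cipher[out_pos]
    nibbles = [15 - v if f else v for v, f in zip(nibbles, flip)]
    half = len(nibbles) // 2
    return [(hi << 4) | lo for hi, lo in zip(nibbles[:half], nibbles[half:])]
```

Because every block is transformed in the same way, a first convolution layer with an m × m filter and an m × m stride sees each block as one unit and can, in principle, learn to compensate for the shuffle.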
2.4 Encrypted Image Retrieval System: A Machine Learning Approach
Tapan Kumar Hazra, Sreejit Roy Chowdhury and Ajoy Kumar Chakraborty have presented a paper [5]. In this paper they propose an encryption algorithm for the images in a database, which also helps to retrieve the images securely without privacy issues. They concentrate on the feature database construction algorithm and the image retrieval mechanism. In feature database construction, the key features are obtained from colour moments and HSV histograms. Encrypting images with the algorithm of [16] is advantageous because this encryption technique does not change the pixel values. The proposed retrieval algorithm uses SVM or KNN. First, an algorithm generates feature vectors based on space-frequency localization from an RGB image whose individual colour channels are stored in 2D arrays. Encryption is then performed using a pseudo-random permutation. Each channel is encrypted, and the same process is repeated for all the images in the database. The feature database is constructed for all images, and image retrieval is performed using SVM as well as KNN. Both algorithms are implemented and compared. Both retrieve similar images, and the PSNR values calculated for the two algorithms are the same. However, the SVM algorithm requires less time because it searches only the smaller number of images that belong to the same class, whereas KNN searches all the images in the database. The proposed algorithm is also applied to unsupervised learning algorithms.
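Why a permutation cipher keeps value-based features usable can be shown with a small sketch. It assumes a key-seeded shuffle from Python's `random` module as the pseudo-random permutation, which is a stand-in for the algorithm of [16], and uses the channel mean as the simplest colour moment; the function names are illustrative.

```python
import random


def _order(size, key):
    """Key-dependent pseudo-random ordering of pixel positions."""
    order = list(range(size))
    random.Random(key).shuffle(order)
    return order


def permute_channel(channel, key):
    """Encrypt one colour channel (list of rows) by moving pixels around."""
    width = len(channel[0])
    flat = [v for row in channel for v in row]
    order = _order(len(flat), key)
    shuffled = [flat[i] for i in order]
    return [shuffled[r * width:(r + 1) * width] for r in range(len(channel))]


def unpermute_channel(channel, key):
    """Decrypt by sending every pixel back to its original position."""
    width = len(channel[0])
    flat = [v for row in channel for v in row]
    order = _order(len(flat), key)
    restored = [0] * len(flat)
    for k, i in enumerate(order):
        restored[i] = flat[k]
    return [restored[r * width:(r + 1) * width] for r in range(len(channel))]


def channel_mean(channel):
    """First colour moment of a channel."""
    flat = [v for row in channel for v in row]
    return sum(flat) / len(flat)
```

Since only positions change, the histogram and the colour moments of the encrypted channel equal those of the plain channel, so a feature database built from such statistics can be queried without decrypting the images.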
2.5 Machine Learning Classification Over Encrypted Data
Raphael Bost, Raluca Ada Popa, Stephen Tu and Shafi Goldwasser have presented a paper [6]. Machine learning classification plays a major role in genomic, medical and financial predictions, spam detection and face recognition. Privacy is a major issue, as both the data and the classifier must remain confidential. For that reason they construct three classification protocols: hyperplane decision, Naïve Bayes and decision trees. Hyperplane decision classifiers solve the binary classification problem, in which the user input is assigned to class 1 if it satisfies a condition and to class 2 otherwise; they can be extended from two to k classes. The classifier is described in [17]. The Naïve Bayes classifier works with probabilities: for each class, the probability that the input belongs to that class is computed, and the maximum a posteriori decision rule selects the class with the maximum posterior probability. The Bayesian classifier is described in [18]. Decision trees classify the data by partitioning on one attribute at a time. A decision tree is a tree structure in which the internal nodes represent the classification rules and the leaf nodes represent the class labels.
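The hyperplane decision rule can be written in a few lines. The sketch below shows only the plaintext rule, i.e. choosing the class whose weight vector gives the largest inner product with the input; it does not implement the encrypted protocols of [6], and the example weights are made up for illustration.

```python
def hyperplane_classify(weights, x):
    """weights: one weight vector per class; x: feature vector.
    Returns the index of the class with the largest inner product."""
    scores = [sum(w_j * x_j for w_j, x_j in zip(w, x)) for w in weights]
    return max(range(len(scores)), key=scores.__getitem__)
```

With two weight vectors this reduces to the binary case, and with k vectors it gives the k-class extension mentioned above. A privacy-preserving protocol has to evaluate the same inner products and the same maximum without revealing x to the server or the weights to the client.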
2.6 Reversible Data Hiding Scheme During Encryption Using Machine Learning
V. M. Manikandan and V. Masilamani have presented a paper [7]. Reversible data hiding (RDH) is a recent research field of information security for secure digital image transmission. Nowadays, robotics is very useful in telemedicine applications, where the transmission of electronic patient records (EPR) and medical images is a common process. Images captured by robots should be authenticated, so the authors introduce an RDH scheme to authenticate the data. Different existing RDH schemes are available [19–21]. The proposed RDH scheme consists of a sender side and a receiver side. On the sender side, the images gathered by the robots are encrypted with an encryption key, and data is embedded using a data hiding key. A block-wise image encryption technique is used to obtain the encrypted image with the hidden EPR data bits. The resulting encrypted image with the secret message is sent to the receiver side, where the image recovery and data extraction process produces the recovered image and the secret message. The receiver side requires a trained Support Vector Machine, which performs the classification needed for data extraction and image recovery from the encrypted image. An experimental study on medical images from the OsiriX dataset shows that the proposed scheme performs better than the existing schemes in terms of embedding rate and bit error rate.
3 Deep Learning Algorithms
3.1 An Image Compression and Encryption Scheme Based on Deep Learning
Fei Hu, Changjiu Pu, Haowei Gao, Mengzi Tang and Li Li have presented a paper [3]. In this paper, they introduce an application of image compression using a Stacked Auto-Encoder (SAE) and encryption using a chaotic logistic map. The SAE is a kind of deep learning algorithm for unsupervised learning. It is a multilayer autoencoder consisting of several autoencoders. Each layer is trained with data and projects its input to the next layer, and the same process is continued for the remaining layers. These projection vectors are dense representations of the input data. The input image is fed to the SAE, which compresses it through its successive levels and produces the compressed image. The compressed image is then given to the chaotic logistic map, which encrypts it and generates the encrypted image. From experiments on different images, the authors conclude that the idea is effective and feasible. It can be used for image transmission and image protection on the internet.
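The encryption stage can be illustrated with the logistic map x(n+1) = r · x(n) · (1 − x(n)). The sketch below is a generic keystream cipher built on this map, not the scheme of [3]: the way map values are quantized to bytes, the burn-in length and the use of (x0, r) as the key are assumptions made for illustration.

```python
def logistic_keystream(length, x0, r, burn_in=100):
    """Generate `length` key bytes from the logistic map.
    x0 in (0, 1) and r close to 4 act as the secret key (assumption)."""
    x = x0
    for _ in range(burn_in):  # discard the transient part of the orbit
        x = r * x * (1 - x)
    stream = []
    for _ in range(length):
        x = r * x * (1 - x)
        stream.append(min(int(x * 256), 255))
    return bytes(stream)


def logistic_xor(data, x0, r):
    """Encrypt or decrypt bytes by XOR with the chaotic keystream."""
    stream = logistic_keystream(len(data), x0, r)
    return bytes(d ^ k for d, k in zip(data, stream))
```

Applying `logistic_xor` twice with the same key returns the original bytes, so in a compression-then-encryption pipeline the compressed representation produced by the SAE would simply be passed through this function before transmission.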
3.2 Batch Image Encryption Using Generated Deep Features Based on Stacked Auto Encoder Network
Fei Hu, Jingyuan Wang, Xiaofei Xu, Changjiu Pu and Tao Peng have presented a paper [8]. Chaos based algorithms have been widely used for image encryption. Existing chaos based encryption schemes are not secure enough for batch image encryption, because they rely on a single sequence, which is vulnerable. The authors therefore propose a batch image encryption scheme in which a stacked autoencoder network generates two chaotic matrices. The first matrix is used to shuffle the pixel positions in the plain image. The second is used to confuse the relationship between the permuted image and the encrypted image. Different chaos algorithms for image encryption are described in [22]. In the batch encryption scheme, the chaotic matrix generation network is set up by initializing its parameters. For batch image encryption, the image dataset is prepared and the shuffling matrix is generated from the chaotic sequence. The plain image is then shuffled, and the relationship between the permuted image and the encrypted image is confused. For performance and security analysis they use key space analysis, histogram analysis, correlation analysis, sensitivity analysis and differential attack analysis. The experimental results show that the batch image encryption is efficient and fast compared to other algorithms, but it has the limitation that all images must have the same size.
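Two of the analyses listed above are easy to state in code. The following is a minimal sketch of the correlation of horizontally adjacent pixels and of NPCR (number of pixel change rate), a commonly used differential-attack measure; NPCR is not named in the text and is included here as an assumption about how such an analysis is typically quantified.

```python
import math


def adjacent_correlation(img):
    """Pearson correlation between each pixel and its right-hand neighbour."""
    pairs = [(row[j], row[j + 1]) for row in img for j in range(len(row) - 1)]
    n = len(pairs)
    mean_x = sum(a for a, _ in pairs) / n
    mean_y = sum(b for _, b in pairs) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in pairs)
    var_x = sum((a - mean_x) ** 2 for a, _ in pairs)
    var_y = sum((b - mean_y) ** 2 for _, b in pairs)
    if var_x == 0 or var_y == 0:
        return 0.0  # constant image: correlation is undefined, report 0
    return cov / math.sqrt(var_x * var_y)


def npcr(cipher_a, cipher_b):
    """Percentage of positions at which two cipher images differ."""
    diff, total = 0, 0
    for row_a, row_b in zip(cipher_a, cipher_b):
        for a, b in zip(row_a, row_b):
            diff += a != b
            total += 1
    return 100.0 * diff / total
```

Natural images have adjacent-pixel correlations close to 1, whereas a well encrypted image should give values near 0; for two cipher images obtained from plain images that differ in a single pixel, an NPCR close to 100% indicates resistance to differential attacks.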
3.3 Research on Iris Image Encryption Based on Deep Learning
Xiulai Li, Yirui Jiang, Mingrui Chen and Fang Li have presented a paper [23]. Nowadays there is a demand for encryption technology based on biometrics. Iris technology has become important in information security because the characteristics of the iris do not change. The authors propose an iris feature encryption technology because the iris is very difficult to forge. In this algorithm the iris feature is extracted by a framework consisting of iris image acquisition, image processing and feature extraction; the input iris image is then compared with the database, and the recognition result states whether or not it matches. Deep learning is a technique used to construct a network with several hidden layers. Deep learning techniques based on artificial neural networks for data feature extraction are described in [24–28]. Each layer is trained using the training datasets, and the network structure helps to extract the features. The authors chose a CNN as the deep learning algorithm for training on the datasets. They introduce an algorithm for iris feature extraction based on deep learning, together with a Reed–Solomon error correcting code and iris image encryption and decryption schemes.
3.4 Privacy-Preserving Distributed Deep Learning via Homomorphic Re-Encryption
Fengyi Tang, Wei Wu, Jian Liu, Huimei Wang and Ming Xian have presented a paper [9]. Deep learning in a distributed environment requires data privacy. A privacy-preserving deep learning scheme via homomorphic encryption is discussed in [10]. In that scheme the private keys of all learning participants are the same, so each learning participant must connect to the server via a separate TLS/SSL secure channel to avoid leaking data from one participant to another. The proposed algorithm addresses these disadvantages. The authors propose a privacy-preserving distributed deep learning scheme that reduces the information leaked to the server, requires the learning participants to communicate over only a single secure channel, and achieves higher deep learning accuracy. To this end they introduce a key transform server and use homomorphic re-encryption in asynchronous stochastic gradient descent. The cost of the algorithm is tolerable. The scheme provides more security for deep learning in a distributed data environment.
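The role of additive homomorphism in gradient aggregation can be shown with a toy example. The sketch below uses the textbook Paillier cryptosystem as a stand-in; it is not the re-encryption scheme of [9], it omits the key transform server, and the tiny primes and integer-valued gradients are assumptions made purely for illustration (real deployments use keys of thousands of bits and a fixed-point encoding).

```python
import math
import random


def keygen(p, q):
    """Toy Paillier key pair from two primes p and q (insecure sizes)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)
    mu = pow(lam, -1, n)  # valid because the generator is g = n + 1
    return (n, n + 1), (lam, mu)


def encrypt(pub, m, r=None):
    """Encrypt integer m with 0 <= m < n; r is the random blinding factor."""
    n, g = pub
    while r is None or math.gcd(r, n) != 1:
        r = random.randrange(1, n)
    return (pow(g, m, n * n) * pow(r, n, n * n)) % (n * n)


def decrypt(pub, priv, c):
    """Recover the plaintext integer from ciphertext c."""
    n, _ = pub
    lam, mu = priv
    u = pow(c, lam, n * n)
    return ((u - 1) // n * mu) % n


def add_encrypted(pub, c1, c2):
    """Multiply ciphertexts to add the underlying plaintexts."""
    n, _ = pub
    return (c1 * c2) % (n * n)
```

A server that receives `encrypt(pub, 5)` and `encrypt(pub, 7)` from two participants can compute `add_encrypted` on them without learning either value, and only a holder of the private key obtains the sum 12 after decryption. This is the property that lets gradients be accumulated in encrypted form during asynchronous stochastic gradient descent.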
3.5 DLEDNet: A Deep Learning-Based Image Encryption and Decryption Network for Internet of Medical Things
Yi Ding, Guozheng Wu, Dajiang Chen and Ning Zhang have presented a paper [29]. Nowadays, internet of medical things technology [30–32] is developing rapidly; it connects different networks to facilitate the diagnosis and treatment of patients. Keeping patient data secure also plays a crucial role. To achieve this, the authors propose DLEDNet to encrypt and decrypt medical images. They use a Cycle-GAN network as the main learning network to transfer the medical image from the source domain to the target domain. The target domain refers to the hidden factors used to guide the model through the encryption process. They also propose a Region of Interest (ROI) mining network to extract objects directly from the encrypted image. Their experiments on an X-ray dataset show that a high level of security is achieved with good performance.
4 Conclusion
The main aim of this paper is to review different image encryption algorithms that use machine learning and deep learning. These algorithms can be applied in different situations. For each algorithm, the key idea and the reported performance are discussed.
References
1. Chauhan, M., Prajapati, R.: Image encryption using chaotic based artificial neural network. Int. J. Sci. Eng. Res. 5(6) (2014)
2. Revanna, C.R., Keshavamurthy, C.: A novel priority based document image encryption with mixed chaotic systems using machine learning approach. Facta Univ. Ser. Electron. Energetics 32(1), 147–177 (2019)
3. Hu, F., Pu, C., Gao, H., Tang, M., Li, L.: An image compression and encryption scheme based on deep learning. arXiv:1608.05001 (2016)
4. Tanaka, M.: Learnable image encryption. In: IEEE International Conference on Consumer Electronics-Taiwan (ICCE-TW) (2018)
5. Hazra, T.K., Chowdhury, S.R., Chakraborty, A.K.: Encrypted image retrieval system: a machine learning approach (2019). https://doi.org/10.1109/ACCESS.2019.2894673
6. Bost, R., Popa, R.A., Tu, S., Goldwasser, S.: Machine learning classification over encrypted data
7. Manikandan, V.M., Masilamani, V.: Reversible data hiding scheme during encryption using machine learning. Procedia Comput. Sci. 133, 348–356 (2018)
8. Hu, F., Wang, J., Xu, X., Pu, C., Peng, T.: Batch image encryption using generated deep features based on stacked autoencoder network. Math. Prob. Eng. 2017, 12 pp (Article ID 3675459)
9. Tang, F., Wu, W., Liu, J., Wang, H., Xian, M.: Privacy-preserving distributed deep learning via homomorphic re-encryption. Electronics 8(4), 411 (2019). https://doi.org/10.3390/electronics8040411
10. Aono, Y., Hayashi, T., Wang, L., Moriai, S.: Privacy-preserving deep learning via additively homomorphic encryption. IEEE Trans. Inf. Forensics Secur. 13, 1333–1345 (2018)
11. Chauhan, M., Prajapati, R.: Image encryption using chaotic cryptosystems and artificial neural network cryptosystems: a review. Int. J. Sci. Eng. Res. 5(5) (2014)
12. Schenker, A., Last, M., Bunke, H., Kandel, A.: Classification of web documents using a graph model. In: Proceedings of the 7th International Conference on Document Analysis and Recognition (ICDAR-2003), IEEE Computer Society (2003), ISBN: 0-7695-1960-1. https://doi.org/10.1109/ICDAR.2003.1227666
13. Shi, W., Caballero, J., Huszar, F., Totz, J., Aitken, A., Bishop, R., Rueckert, D., Wang, Z.: Real-time single image and video super-resolution using an efficient sub-pixel convolutional neural network. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1874–1883 (2016)
14. Lin, M., Chen, Q., Yan, S.: Network in network. In: International Conference on Learning Representations (ICLR) (2014). arXiv:1312.4400v3 [cs.NE]
15. Krizhevsky, A.: Learning multiple layers of features from tiny images. Tech. Report (2009)
16. Chowdhury, S.R., Hazra, T.K., Chakroborty, A.K.: Image encryption using pseudorandom permutation. Am. J. Adv. Comput. 1(1), 4–6 (2014). https://doi.org/10.15864/ajac.v1i1.2
17. Bishop, C.M., Nasrabadi, N.M.: Pattern recognition and machine learning. J. Electron. Imaging 1 (2006)
18. Tschiatschek, S., Reinprecht, P., Mücke, M., Pernkopf, F.: Bayesian network classifiers with reduced precision parameters. In: Machine Learning and Knowledge Discovery in Databases, pp. 74–89 (2012)
19. Al-Qershi, O.M., Khoo, B.E.: High capacity data hiding schemes for medical images based on difference expansion. J. Syst. Softw. 84, 105–112 (2011)
20. Celik, M.U., Sharma, G., Tekalp, A.M., Saber, E.: Reversible data hiding. In: International Conference on Image Processing, IEEE, pp. II–II (2002)
21. Celik, M.U., Sharma, G., Tekalp, A.M., Saber, E.: Lossless generalized-LSB data embedding. IEEE Trans. Image Process. 14, 253–266 (2005)
22. Fridrich, J.: Symmetric ciphers based on two-dimensional chaotic maps. Int. J. Bifurcation Chaos Appl. Sci. Eng. 8(6), 1259–1284 (1998)
23. Li, X., Jiang, Y., Chen, M., Li, F.: Research on iris image encryption based on deep learning. EURASIP J. Image Video Process. 2018, 126 (2018)
24. Rao, Y., Ni, J.: A deep learning approach to detection of splicing and copy-move forgeries in images. In: IEEE International Workshop on Information Forensics and Security, pp. 1–6. IEEE, London (2017)
25. Ye, J., Ni, J., Yi, Y.: Deep learning hierarchical representations for image steganalysis. IEEE Trans. Inf. Forensics Secur. 12(11), 2545–2557 (2017)
26. Aminanto, M.E., Kim, K.: Detecting impersonation attack in WiFi networks using deep learning approach. In: International Workshop on Information Security Applications, pp. 136–147. Springer, Cham (2016)
27. Le, T.P., Aono, Y., Hayashi, T., et al.: Privacy-preserving deep learning: revisited and enhanced. In: International Conference on Applications and Techniques in Information Security, pp. 100–110. Springer, Singapore (2017)
28. Chen, Z., Information, D.O.: Face deep learning technology in the design and implementation of the security in colleges and universities. J. Anyang Inst. Technol. 16(6), 70–75 (2017)
29. Ding, Y., Wu, G., Chen, D., Zhang, N.: DLEDNet: a deep learning-based image encryption and decryption network for internet of medical things. IEEE, arXiv:2004.05523v1 (2020)
30. Gatouillat, A., Badr, Y., Massot, B., Sejdić, E.: Internet of medical things: a review of recent contributions dealing with cyber-physical systems in medicine. IEEE Internet Things J. 5(5), 3810–3822 (2018)
31. Zhang, N., Yang, P., Ren, J., et al.: Synergy of big data and 5G wireless networks: opportunities, approaches, and challenges. IEEE Wirel. Commun. 25(1), 12–18 (2018)
32. Chen, D., Zhang, N., Qin, Z., Mao, X., Qin, Z., Shen, X., Li, X.Y.: S2M: a lightweight acoustic fingerprints-based wireless device authentication protocol. IEEE Internet Things J. 4(1), 88–100 (2017)