Machine Learning, Deep Learning and Computational Intelligence for Wireless Communication: Proceedings of MDCWC 2020 (Lecture Notes in Electrical Engineering, 749) 9811602883, 9789811602887

This book is a collection of best selected research papers presented at the Conference on Machine Learning, Deep Learnin


English Pages 662 [639] Year 2021


Table of contents :
Preface
Organization
Contents
About the Editor
Machine Learning, Deep Learning and Computational Intelligence Algorithms
Deep Learning to Predict the Number of Antennas in a Massive MIMO Setup Based on Channel Characteristics
1 Introduction
2 Contributions of the Paper
3 Mutual Orthogonality
3.1 Trend Analysis
3.2 Deep Learning Architecture to Predict Number of Antennas Required for Orthogonality
4 Perfect CSI
4.1 Signal-to-Interference-Noise-Ratio (SINR) Analysis
4.2 Deep Learning Architecture to Predict Number of Antennas Required for Convergence of SINR
5 Imperfect CSI: An SINR Analysis
6 Results
6.1 Mutual Orthogonality Simulation Data
6.2 Predicting Number of Base Station Antennas Required for Orthogonality
6.3 Perfect CSI-SINR Convergence Simulation Data
6.4 Perfect CSI-Predicting Number of Base Station Antennas Required for Convergence of SINR
6.5 Imperfect CSI—Analysing the Number of Antennas Required for Convergence of SINR
7 Conclusions
References
Optimal Design of Fractional Order PID Controller for AVR System Using Black Widow Optimization (BWO) Algorithm
1 Introduction
2 Overview of Automatic Voltage Regulator (AVR) System
3 Fractional Calculus and Fractional Order Controllers
3.1 Fractional Calculus
3.2 Fractional Order Controller
4 Black Widow Optimization
4.1 Initial Population
4.2 Procreate
4.3 Cannibalism
4.4 Mutation
5 Proposed BWO-FOPID Controller
6 Results and Discussions
6.1 Step Response
6.2 Robust Analysis
7 Conclusion
References
LSTM Network for Hotspot Prediction in Traffic Density of Cellular Network
1 Introduction
2 Dataset Collection and Representation
3 Hotspot Prediction Using LLR Method
3.1 Algorithm to Find Hotspot Using LLR
3.2 LSTM Architecture to Predict Future Hotspot Using LLR
4 Hotspot Prediction Using CDF Method
4.1 Algorithm to Find Hotspot Using Cumulative Distribution Function
4.2 LSTM Architecture to Predict Future Hotspot Using CDF
5 Results
6 Conclusions
References
Generative Adversarial Network and Reinforcement Learning to Estimate Channel Coefficients
1 Introduction
2 Contributions of the Paper
3 Signal Source Separation
3.1 Using Generative Adversarial Networks and Reinforcement Learning
3.2 Generative Adversarial Networks for Data Space Distribution Modelling
3.3 Reinforcement Learning-Based Sampling Technique for Signal Estimation
4 Results
4.1 Extraction of Distributions Using GAN
4.2 Estimating True Values of Channel Coefficients
5 Conclusions
References
Novel Method of Self-interference Cancelation in Full-Duplex Radios for 5G Wireless Technology Using Neural Networks
1 Introduction
2 Signal Modeling
3 Solutions for Self-Interference (SI) Cancelation
3.1 Outline of Hybrid SI Cancelation
3.2 Proposed Solution for Implementing Digital Cancelation Using Neural Networks
4 Results and Discussions
5 Conclusions
References
Dimensionality Reduction of KDD-99 Using Self-perpetuating Algorithm
1 Introduction
1.1 Feature Selection Methods
2 Related Work
3 Proposed Work
3.1 Basic Idea Behind Self-perpetuating Algorithm
3.2 The Proposed Algorithm
4 Experimental Setup
5 Conclusion and Future Work
Reference
Energy-Efficient Neighbor Discovery Using Bacterial Foraging Optimization (BFO) Algorithm for Directional Wireless Sensor Networks
1 Introduction
1.1 Problem Identification
2 Related Works
3 Energy-Efficient Neighbor Discovery Using BFOA
3.1 Overview
3.2 Fundamentals of Optimization Algorithm
3.3 Estimation of Metrics
3.4 Energy-Efficient Neighbor Discovery
4 Simulation Results
4.1 Simulation Setup
4.2 Simulation Results and Analysis
5 Conclusion
References
Auto-encoder—LSTM-Based Outlier Detection Method for WSNs
1 Introduction
2 Related Literature Work
3 Proposed Model
3.1 Auto-encoder Preliminaries
3.2 BLSTM-RNN Preliminaries
4 Experimental Results
5 Conclusion
References
An Improved Swarm Optimization Algorithm-Based Harmonics Estimation and Optimal Switching Angle Identification
1 Introduction
2 Harmonic Estimation and Switching Angles Identification
3 Improved Particle Swarm Optimization Algorithm
4 Simulation Results
5 Conclusions
References
A Study on Ensemble Methods for Classification
1 Introduction
2 Related Work
3 Ensemble Learning Approaches
3.1 Bagging
3.2 Boosting
3.3 Stacking
3.4 Random Forest
4 Application of Ensemble Techniques
5 Deep Learning and Ensemble Techniques
6 Experimentation
7 Conclusion
References
An Improved Particle Swarm Optimization-Based System Identification
1 Introduction
2 Problem Formulation
3 Improved Particle Swarm Optimization Algorithm
4 Simulation Results
5 Conclusions
References
Channel Coverage Identification Conditions for Massive MIMO Millimeter Wave at 28 and 39 GHz Using Fine K-Nearest Neighbor Machine Learning Algorithm
1 Introduction
2 Network Architecture
3 Simulation Methodology
4 Simulation Measurements
5 Pathloss
6 Power Delay Profile
7 Fine-KNN
8 Conclusion
References
Flip Flop Neural Networks: Modelling Memory for Efficient Forecasting
1 Introduction
2 Previous Work
2.1 Long Short-Term Memory (LSTM)
3 Model Architecture
4 Experiments
4.1 Household Power Consumption
4.2 Flight Passenger Prediction
4.3 Stock Price Prediction
4.4 Indoor Movement Classification
5 Conclusion
References
Wireless Communication Systems
Selection Relay-Based RF-VLC Underwater Communication System
1 Introduction
1.1 Paper Structure
2 Proposed System Model
2.1 Source-Relay (s-r) Hop
2.2 Relay-Destination (r-d) Hop
2.3 Underwater Attenuation Coefficient Model
2.4 Water Turbidity Channel Modeling
2.5 Pointing Error in Underwater VLC Link
3 BER Performance of the System
4 Numerical Results
5 Conclusion
References
Circular Polarized Octal Band CPW-Fed Antenna Using Theory of Characteristic Mode for Wireless Communication Applications
1 Introduction
2 Theory of Characteristics Modes Analysis (TCMs)
3 Antenna Design Procedure
4 Experimental Results and Discussion
5 Conclusion
References
Massive MIMO Pre-coders for Cognitive Radio Network Performance Improvement: A Technological Survey
1 Introduction
2 System Background
2.1 Cognitive Radio Network (CRN)
2.2 MIMO System
2.3 CRN MIMO
2.4 Massive MIMO Systems
3 Pre-coding in Massive MIMO
3.1 Linear Pre-coding Techniques
3.2 Nonlinear Pre-coding Techniques
3.3 Constant Envelope Pre-coding Techniques
3.4 Pre-coding in MIMO CRN
4 Conclusion
Reference
Design of MIMO Antenna Using Circular Split Ring Slot Defected Ground Structure for ISM Band Applications
1 Introduction
2 Antenna Design
3 Results and Discussions
4 Conclusion
References
Performance Comparison of Arduino IDE and Runlinc IDE for Promotion of IoT STEM AI in Education Process
1 Introduction
2 Significance of IoT and AI in Education: Opportunities and Challenges
3 System Model: Smart Home
4 Results and Discussion
5 Microcontrollers
5.1 Arduino UNO and Arduino IDE
5.2 STEMSEL and Runlinc IDE
6 Implementation
7 Conclusion and Future Works
References
Analysis of Small Loop Antenna Using Numerical EM Technique
1 Introduction
2 Loop Antennas
3 Finite Element Mesh
4 Methodology
5 Antennas-Gradient Methods
6 Conclusion and Discussion
References
A Monopole Octagonal Sierpinski Carpet Antenna with Defective Ground Structure for SWB Applications
1 Introduction
2 Antenna Design and Analysis
2.1 Methodology
2.2 Configuration
2.3 Operating Principle
3 Results and Discussion
4 Conclusion
References
DFT Spread C-DSLM for Low PAPR FBMC with OQAM Systems
1 Introduction
2 System Model
2.1 FBMC with OQAM System Model
2.2 Overlapping Structured FBMC with OQAM Signals
2.3 The PAPR Description in FBMC with OQAM System
3 Selective Mapping (SLM) Scheme
3.1 Conventional SLM Scheme
3.2 The SLM with Converse Vectors (C-SLM)
3.3 Conversion Vector-Based Dispersive SLM (C-DSLM) Schemes with FBMC with OQAM
4 Proposed DFT Spread C-DSLM for FBMC—OQAM System
4.1 DFT Spreading
4.2 The Conversion Vectors and Its Design
4.3 Proposed DFT Spread Converse Vectors with DSLM Method (C-DSLM)
4.4 Analysis of Computational Complexity
5 Performance Evaluation
5.1 Simulation Environment
5.2 Calculation of Computational Complexity
5.3 Transmission and Reception of DFT Spread C-DSLM Scheme
6 Conclusion
References
Secure, Efficient, Lightweight Authentication in Wireless Sensor Networks
1 Introduction
2 Authentication in WSN
3 Related Work
4 Proposed Authentication Mechanism
4.1 Simple Cluster Head (CH) Development
4.2 Proposed Protocol
5 AVISPA Tool Simulation Results, Security Analysis, and BAN Logic
5.1 BAN Logic Analysis: Logical Rules of BAN Logic
6 Conclusion
References
Performance Evaluation of Logic Gates Using Magnetic Tunnel Junction
1 Introduction
2 Magnetic Tunnel Junction
3 Proposed Approach
4 Simulation of Logic Circuits and Comparison
5 Conclusion
References
Medical IoT—Automatic Medical Dispensing Machine
1 Introduction
2 Challenges Faced During COVID-19 Situation
3 Need for Automatic Medical Dispensing Machine
4 Existing Solutions
4.1 Semi-automated Dispensing Machine Using Barcode
4.2 Drug Data Transfer System
4.3 Computerized Physician Order Entry (CPOE) for Neonatal Ward
5 Proposed Solution
5.1 Modules in Our Solution
6 Conclusion and Future Scope
References
Performance Analysis of Digital Modulation Formats in FSO
1 Introduction
2 Implementation of Modulation
3 Digital Schemes
4 Differential Phase-Shift Keying
4.1 DPSK System
4.2 DPSK Optical System
5 Offset Quadrature Phase-Shift Keying
5.1 OQPSK System
5.2 OQPSK Optical System
6 Results and Discussions
7 Conclusion
References
High-Level Synthesis of Cellular Automata–Belousov Zhabotinsky Reaction in FPGA
1 Introduction
1.1 Cellular Automata
1.2 High-Level Synthesis
2 Belousov Zhabotinsky Reaction
2.1 Simplified Reaction Mechanism
2.2 Reaction Surface
3 Programming the Automata in FPGA
3.1 Optimizations Specific to FPGA
4 Results
4.1 Simulation
4.2 Synthesis
5 Conclusions
Reference
IoT-Based Calling Bell
1 Introduction
2 Proposed Method
3 Arduino Specific Instructions
4 Screens
5 System Test
6 Conclusion
References
Mobile Data Applications
Development of an Ensemble Gradient Boosting Algorithm for Generating Alerts About Impending Soil Movements
1 Introduction
2 Background
3 Methodology
3.1 Data
3.2 Measures for Evaluating ML Algorithms
3.3 Different Algorithms Used for Classification
3.4 Model Calibration
4 Results
5 Discussion and Conclusion
References
Seam Carving Detection and Localization Using Two-Stage Deep Neural Networks
1 Introduction
2 Related Work
3 Seam Carving and Seam Insertion
4 Detection of Seam Carving
5 Experiments
5.1 Experimental Setup
5.2 Learning
5.3 Detection Heatmaps
5.4 Robustness to Percentage of Seams Removed
5.5 Robustness to JPEG Compression
5.6 Explainability on Object Removed Images
5.7 Extension to Seam Insertion Detection
6 Conclusion and Future Work
References
A Machine Learning-Based Approach to Password Authentication Using Keystroke Biometrics
1 Introduction
2 Keystroke Dynamics
2.1 Types of Authentication Systems
2.2 Timing Features
2.3 Evaluation Parameters
3 Models Used
3.1 Support Vector Machines (SVM)
3.2 Random Forest Algorithm (RF)
3.3 Artificial Neural Network (ANN)
4 Dataset
5 Results
6 Conclusion
References
Attention-Based SRGAN for Super Resolution of Satellite Images
1 Introduction
1.1 Deep Learning for Super Resolution
1.2 Generative Adversarial Network-Based Deep Learning for SR
1.3 Motivation and Contribution
2 Attention-Based SRGAN Model
2.1 Network Structure
2.2 Generator
2.3 Discriminator
2.4 Loss Function
3 Results and Discussion
4 Conclusion
References
Detection of Acute Lymphoblastic Leukemia Using Machine Learning Techniques
1 Introduction
2 Datasets
3 Proposed Method
4 Result and Discussion
5 Conclusion
References
Computer-Aided Classifier for Identification of Renal Cystic Abnormalities Using Bosniak Classification
1 Introduction
2 Methodology
2.1 Pre-processing
2.2 Kidney Segmentation
2.3 Feature Extraction
2.4 Classification
3 Results and Discussion
3.1 Database
3.2 Experiment Setup
4 Conclusions
References
Recognition of Obscure Objects Using Super Resolution-Based Generative Adversarial Networks
1 Introduction
1.1 Image Super Resolution
1.2 Deep Learning for Super Resolution
2 Proposed Methodology
2.1 Stages of Recognition
3 Results and Discussion
3.1 Database for Stage I RCNN
3.2 Stage II-GAN
3.3 Stage III-ALEXNET
3.4 RCNN-Based Object Detection
3.5 GAN-Based Super Resolution
4 Conclusion
References
Low-Power U-Net for Semantic Image Segmentation
1 Introduction
1.1 Convolutional Neural Networks
1.2 Quantized Neural Network
1.3 Need of Field Programmable Gate Array (FPGA) for inference of CNNs
2 Related Works
3 Methodology
3.1 Vitis™ AI Development Kit
3.2 Development Flow
3.3 Deep Learning Processing Unit
3.4 Network Architecture
3.5 Dataset
3.6 Training
4 Experiments and Results
4.1 Processing System—Programmable Logic System
4.2 Network Inference and Quantization
5 Discussion
6 Conclusion
References
Electrocardiogram Signal Classification for the Detection of Abnormalities Using Discrete Wavelet Transform and Artificial Neural Network Back Propagation Algorithm
1 Introduction
2 Proposed Method for Classifying ECG Signals
3 Collection of Database
4 Preprocessing of Electrocardiogram Signal
5 Feature Extraction of ECG Signal
6 Artificial Neural Network Classifier Using Back Propagation Algorithm
7 Simulated Results and Discussions of ANN Classifier Using Back Propagation Algorithm
8 Conclusion
References
Performance Analysis of Optimizers for Glaucoma Diagnosis from Fundus Images Using Transfer Learning
1 Introduction
2 Related Work
3 Methodology
3.1 Transfer Learning
3.2 Optimization Algorithms
4 Results and Discussion
5 Conclusion
References
Machine Learning based Early Prediction of Disease with Risk Factors Data of the Patient Using Support Vector Machines
1 Introduction
2 Literature Review
3 Proposed System
3.1 Overview of the Proposed System
3.2 Proposed System Implementation
4 Experiments and Results
4.1 Mobile Application for Health Monitoring
4.2 R-Studio Data Visualization
5 Discussion
6 Conclusion
References
Scene Classification of Remotely Sensed Images using Ensembled Machine Learning Models
1 Introduction
2 Related Works
3 Proposed Works
3.1 Speed-Up Robust Feature (SURF) Extraction
3.2 Ensemble Classifier Learning Systems
4 Performance Evaluation Metrics
4.1 Precision
4.2 Recall
4.3 Accuracy
4.4 F1-Score
5 Results and Discussions
5.1 Dataset Description
5.2 Experimental Analysis of Base Classifiers
6 Conclusion
References
Fuzziness and Vagueness in Natural Language Quantifiers: Searching and Systemizing Few Patterns in Predicate Logic
Abstract
1 Introduction
2 Related Works
3 Objectives of the Study
4 Quantifiers in Punjabi and Hindi
4.1 Punjabi Quantifiers
4.2 Hindi Quantifiers
5 Vague Nature for Punjabi Quantifiers: Some Investigations in a Predicate Logic
5.1 Structuring Fuzzy Quantifiers
5.2 Mapping Plan for Fuzzy Quantifiers
6 Discussions and Results
7 Conclusion and Future Endeavors
Acknowledgements
References
An Attempt on Twitter ‘likes’ Grading Strategy Using Pure Linguistic Feature Engineering: A Novel Approach
1 Introduction
2 Related Work
3 Dataset
3.1 Collection
3.2 Filtering
3.3 Preprocessing
4 Methodology
4.1 Linguistic Features
4.2 Problem Formulation
4.3 Machine Learning Models
5 Feature Engineering
6 Results
7 Conclusion
8 Limitations and Future Work
References
Groundwater Level Prediction and Correlative Study with Groundwater Contamination Under Conditional Scenarios: Insights from Multivariate Deep LSTM Neural Network Modeling
1 Introduction
2 Literature Review
3 Multivariate LSTM Modeling Under Circumstantial Scenarios of Groundwater Prediction
3.1 Simulation Results of Conditional Scenarios
4 Groundwater Contamination and Predictive Modeling
4.1 Data Preprocessing and Empirical Modeling
4.2 Model Simulation and Discussion
5 Correlated Study Between Groundwater Level and Groundwater Contamination
6 Concluding Remarks and Future Scope of Study
References
A Novel Deep Hybrid Spectral Network for Hyperspectral Image Classification
1 Introduction
2 Related Works
3 Methodology
4 Dataset
5 Results
6 Conclusion
References
Anomaly Prognostication of Retinal Fundus Images Using EALCLAHE Enhancement and Classifying with Support Vector Machine
1 Introduction
2 Related Literary Work
3 Proposed System
4 Description of the Schematic Diagram
4.1 Dataset
4.2 Stage I—Preprocessing
4.3 Stage II—Segmentation
4.4 Stage III—Feature Extraction
4.5 Stage IV—Classification
4.6 Results and Conclusion
5 Conclusion and Future Scope
References
Analysis of Pre-earthquake Signals Using ANN: Implication for Short-Term Earthquake Forecasting
1 Introduction
1.1 Involving Concepts
1.2 Introduction to Neural Networks
1.3 Elman Backpropagation Neural Network
2 Study Area
3 Methodology
3.1 Anomalous Outgoing Longwave Radiation
3.2 Elman Backpropagation Neural Network
4 Findings and Discussions
4.1 Training Function
4.2 Layers and Description of Nodes
4.3 Input Nodes
4.4 Variables Involved
4.5 Spatial Parameters
4.6 Time Variable
5 Conclusion
References
A Novel Method for Plant Leaf Disease Classification Using Deep Learning Techniques
1 Introduction
2 Literature Review
3 Materials and Methods
3.1 Preprocessing
3.2 Training
3.3 Testing
4 Results and Discussion
5 Conclusion
References

Lecture Notes in Electrical Engineering 749

E. S. Gopi   Editor

Machine Learning, Deep Learning and Computational Intelligence for Wireless Communication Proceedings of MDCWC 2020

Lecture Notes in Electrical Engineering Volume 749

Series Editors
Leopoldo Angrisani, Department of Electrical and Information Technologies Engineering, University of Napoli Federico II, Naples, Italy
Marco Arteaga, Departament de Control y Robótica, Universidad Nacional Autónoma de México, Coyoacán, Mexico
Bijaya Ketan Panigrahi, Electrical Engineering, Indian Institute of Technology Delhi, New Delhi, Delhi, India
Samarjit Chakraborty, Fakultät für Elektrotechnik und Informationstechnik, TU München, Munich, Germany
Jiming Chen, Zhejiang University, Hangzhou, Zhejiang, China
Shanben Chen, Materials Science and Engineering, Shanghai Jiao Tong University, Shanghai, China
Tan Kay Chen, Department of Electrical and Computer Engineering, National University of Singapore, Singapore, Singapore
Rüdiger Dillmann, Humanoids and Intelligent Systems Laboratory, Karlsruhe Institute for Technology, Karlsruhe, Germany
Haibin Duan, Beijing University of Aeronautics and Astronautics, Beijing, China
Gianluigi Ferrari, Università di Parma, Parma, Italy
Manuel Ferre, Centre for Automation and Robotics CAR (UPM-CSIC), Universidad Politécnica de Madrid, Madrid, Spain
Sandra Hirche, Department of Electrical Engineering and Information Science, Technische Universität München, Munich, Germany
Faryar Jabbari, Department of Mechanical and Aerospace Engineering, University of California, Irvine, CA, USA
Limin Jia, State Key Laboratory of Rail Traffic Control and Safety, Beijing Jiaotong University, Beijing, China
Janusz Kacprzyk, Systems Research Institute, Polish Academy of Sciences, Warsaw, Poland
Alaa Khamis, German University in Egypt El Tagamoa El Khames, New Cairo City, Egypt
Torsten Kroeger, Stanford University, Stanford, CA, USA
Qilian Liang, Department of Electrical Engineering, University of Texas at Arlington, Arlington, TX, USA
Ferran Martín, Departament d’Enginyeria Electrònica, Universitat Autònoma de Barcelona, Bellaterra, Barcelona, Spain
Tan Cher Ming, College of Engineering, Nanyang Technological University, Singapore, Singapore
Wolfgang Minker, Institute of Information Technology, University of Ulm, Ulm, Germany
Pradeep Misra, Department of Electrical Engineering, Wright State University, Dayton, OH, USA
Sebastian Möller, Quality and Usability Laboratory, TU Berlin, Berlin, Germany
Subhas Mukhopadhyay, School of Engineering & Advanced Technology, Massey University, Palmerston North, Manawatu-Wanganui, New Zealand
Cun-Zheng Ning, Electrical Engineering, Arizona State University, Tempe, AZ, USA
Toyoaki Nishida, Graduate School of Informatics, Kyoto University, Kyoto, Japan
Federica Pascucci, Dipartimento di Ingegneria, Università degli Studi “Roma Tre”, Rome, Italy
Yong Qin, State Key Laboratory of Rail Traffic Control and Safety, Beijing Jiaotong University, Beijing, China
Gan Woon Seng, School of Electrical & Electronic Engineering, Nanyang Technological University, Singapore, Singapore
Joachim Speidel, Institute of Telecommunications, Universität Stuttgart, Stuttgart, Germany
Germano Veiga, Campus da FEUP, INESC Porto, Porto, Portugal
Haitao Wu, Academy of Opto-electronics, Chinese Academy of Sciences, Beijing, China
Junjie James Zhang, Charlotte, NC, USA
Yong Li, Hunan University, Changsha, Hunan, China

The book series Lecture Notes in Electrical Engineering (LNEE) publishes the latest developments in Electrical Engineering - quickly, informally and in high quality. While original research reported in proceedings and monographs has traditionally formed the core of LNEE, we also encourage authors to submit books devoted to supporting student education and professional training in the various fields and application areas of electrical engineering. The series covers classical and emerging topics concerning:

  • Communication Engineering, Information Theory and Networks
  • Electronics Engineering and Microelectronics
  • Signal, Image and Speech Processing
  • Wireless and Mobile Communication
  • Circuits and Systems
  • Energy Systems, Power Electronics and Electrical Machines
  • Electro-optical Engineering
  • Instrumentation Engineering
  • Avionics Engineering
  • Control Systems
  • Internet-of-Things and Cybersecurity
  • Biomedical Devices, MEMS and NEMS

For general information about this book series, comments or suggestions, please contact [email protected]. To submit a proposal or request further information, please contact the Publishing Editor in your country:
China: Jasmine Dou, Editor ([email protected])
India, Japan, Rest of Asia: Swati Meherishi, Editorial Director ([email protected])
Southeast Asia, Australia, New Zealand: Ramesh Nath Premnath, Editor ([email protected])
USA, Canada: Michael Luby, Senior Editor ([email protected])
All other Countries: Leontina Di Cecco, Senior Editor ([email protected])

** This series is indexed by EI Compendex and Scopus databases. **

More information about this series at http://www.springer.com/series/7818


Editor E. S. Gopi Department of Electronics and Communication Engineering National Institute of Technology Tiruchirappalli Tiruchirappalli, Tamil Nadu, India

ISSN 1876-1100 ISSN 1876-1119 (electronic) Lecture Notes in Electrical Engineering ISBN 978-981-16-0288-7 ISBN 978-981-16-0289-4 (eBook) https://doi.org/10.1007/978-981-16-0289-4 © Springer Nature Singapore Pte Ltd. 2021 This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations. This Springer imprint is published by the registered company Springer Nature Singapore Pte Ltd. The registered company address is: 152 Beach Road, #21-01/04 Gateway East, Singapore 189721, Singapore

I dedicate this book to my late mother, Mrs. E. S. Meena.

Preface

Because huge volumes of data can now be collected from mobile and wireless networks, there are many opportunities to use machine learning, deep learning and computational intelligence to interpret the collected data and extract knowledge from it. The workshop aims to consolidate experimental results that integrate machine learning, deep learning and computational intelligence with wireless communication and related topics. This book consists of the reviewed papers grouped under the following topics: (a) machine learning, deep learning and computational intelligence algorithms, (b) wireless communication systems and (c) mobile data applications. I thank those directly and indirectly involved in successfully executing the online event MDCWC 2020, held from 22 October to 24 October 2020.

Tiruchirappalli, India
October 2020

E. S. Gopi Programme Chair MDCWC 2020


Organization

Machine Learning, Deep Learning and Computational Intelligence for Wireless Communication (MDCWC 2020) is the first international online workshop organized by the Pattern Recognition and Computational Intelligence Division, Department of Electronics and Communication Engineering, National Institute of Technology Tiruchirappalli. It was conducted entirely in virtual mode from 22 to 24 October 2020.

The keynote speakers were the following: (a) Prof. K. K. Biswas (Retired Professor), Indian Institute of Technology Delhi, on “deep learning”; (b) Prof. Emre Celebi, Professor and Chair of the Department of Computer Science, University of Central Arkansas, on “data clustering and K-means algorithm”; (c) Prof. Dush Nalin Jayakody, Professor, School of Computer Science and Robotics, National Tomsk Polytechnic University, Russia, on “age of information and energism on the future wireless networks”; (d) Dr. Jithin Jagannath, Director, Marconi-Rosenblatt AI/ML Innovation Lab, New York, USA, on “how will machine learning revolutionize wireless communication”.

The invited talks were the following: (a) Dr. Lakshmanan Nataraj, Senior Research Staff, Mayachitra Inc., Santa Barbara, CA, USA, on “detection of GAN-generated images and deepfakes”; (b) Dr. Lalit Kumar Singh, Scientist, NPCIL-BARC, Department of Atomic Energy, Government of India, on “reliability analysis of safety critical systems using machine learning”; (c) Dr. Shyam Lal, Faculty, National Institute of Technology Karnataka, on “deep learning for IoT applications”; (d) Dr. Gaurav Purohit, Scientist, CSIR-CEERI, Pilani, on “intelligent data analytics for prediction of air quality on pre- and post-COVID India dataset”; (e) Mr. Abhinav K. Nair, Senior Engineer (R&D), Radisys India Pvt. Ltd., on “6G: why should we talk about it now?”; (f) Mr. Mahammad Shaik, Qualcomm Engineer, Hyderabad, on “computational intelligence for efficient transmission policy for energy harvesting and spectral sensing in cognitive radio system”.

The paper review process was executed using EasyChair. All the selected papers were presented under three different tracks: (a) machine learning, deep learning and computational intelligence algorithms (MLDLCI), (b) wireless communication (WC) and (c) mobile data applications (MDA). The selected papers cover topics such as deep learning to predict the number of antennas in a massive MIMO setup, the black widow algorithm for the AVR system, LSTM for hotspot detection, GAN to estimate channel coefficients, self-interference cancellation in full-duplex systems, an RF-VLC underwater communication system, and mobile data applications such as SRGAN for super-resolution of satellite images, glaucoma diagnosis from fundus images and hyperspectral image classification.

Technical Programme Committee
Abhinav K. Nair, Radisys India Private Limited
Akhil Gupta, Lovely Professional University, Phagwara, Punjab
Anand Kulkarni, Symbiosis Institute of Technology, Pune
K. Aparna, National Institute of Technology Surathkal
K. K. Biswas, Retired Professor, Indian Institute of Technology Delhi
Deep Gupta, Visvesvaraya National Institute of Technology
Dushantha Nalin K. Jayakody, National Tomsk Polytechnic University (TPU), Russia
Emre Celebi, Professor and Chair of the Department of Computer Science, University of Central Arkansas, USA
Gaurav Purohit, Scientist, CSIR-CEERI, Pilani
Hariharan Muthusamy, NIT, Srinagar
Jithin Jagannath, Director, Marconi-Rosenblatt AI/ML Innovation Lab, New York, USA
Lakshmanan Nataraj, Senior Research Scientist, Mayachitra Deep Learning Solutions, Santa Barbara, CA, USA
Lakshmi Sutha, National Institute of Technology Puducherry
Lalit Singh, Indian Institute of Technology Bhubaneswar
Mandeep Singh, National Institute of Technology Surathkal
Mahammad Shaik, Qualcomm, Hyderabad
Murugan, National Institute of Technology Silchar, Assam
A. V. Narasimhadhan, National Institute of Technology Surathkal
Rajarshi Bhattacharya, National Institute of Technology Patna
Rangababu, National Institute of Technology Meghalaya, Shillong
Sanjoy Dharroy, National Institute of Technology Durgapur
Sankar Nair, Qualcomm, Chennai
B. Sathyabama, Thiagarajar College of Engineering, Madurai
Satyasai Nanda, Malaviya National Institute of Technology Jaipur
Shravan Kumar Bandari, National Institute of Technology Meghalaya, Shillong
Shilpi Gupta, Sardar Vallabhbhai National Institute of Technology, Surat, Gujarat
Shyam Lal, National Institute of Technology Karnataka
Shrishil Hiremath, National Institute of Technology Rourkela
Sudakar Chauhan, National Institute of Technology Kurukshetra
R. Swaminathan, Indian Institute of Technology Indore
Shweta Shah, Sardar Vallabhbhai National Institute of Technology, Surat, Gujarat
Smrithi Agarwal, Motilal Nehru National Institute of Technology Allahabad
Tajinder Singh Arora, National Institute of Technology, Uttarakhand
Umesh C. Pati, National Institute of Technology Rourkela
Vineetha Yogesh, Qualcomm, Bangalore
Wasim Arif, National Institute of Technology Silchar

Executive Committee Members
Patron: Professor Mini Shaji Thomas, Director, NITT
Co-patron: Dr. Muthuchidambaranathan, Head of the ECE Department, NITT
Coordinator and Programme Chair: Dr. E. S. Gopi, Associate Professor, Department of ECE, NITT
Co-coordinators: Dr. B. Rebekka, Assistant Professor, Department of ECE, NITT; Dr. G. Thavasi Raja, Assistant Professor, Department of ECE, NITT

Session Chairs
Anand Kulkarni
Ashish
Gaurav Purohit
Gopi E. S.
Lakshmanan Nataraj
Maheswaran
Narasimhadhan A. V.
Rebekka B.
Sathyabama
Satyasai Jagannath Nanda
Shravan Kumar Bandari
Shyam Lal
Smrithi Agarwal
Sudhakar
Sudharson
Thavasi Raja G.


Referees
Anand Kulkarni
Aparna P.
Ashish Patil
Gangadharan G. R.
Gaurav Purohit
Janet Barnabas
Koushik Guha
Lakshmanan Nataraj
Lakshmi Sutha G.
Lalit Singh
Mahammad Shaik
Maheswaran Palani
Mandeep Singh
Murugan R.
Rebekka Balakrishnan
Sanjay Dhar Roy
Sankar Nair
Sathya Bama B.
Satyasai Jagannath Nanda
Shilpi Gupta
Shravan Kumar Bandari
Shrishail Hiremath
Shweta Shah
Shyam Lal
Smriti Agarwal
Sudakar Chauhan
Sudha Vaiyamalai
Sudharsan Parthasarathy
Swaminathan Ramabadran
Thavasi Raja G.
Umesh C. Pati
Varun P. Gopi
Venkata Narasimhadhan Adapa
Vineetha Yogesh

Supporting Team Members Rajasekharreddy Poreddy, Research scholar G. Jaya Brindha, Research scholar Vinodha Kamaraj, Research scholar


Contents

Machine Learning, Deep Learning and Computational Intelligence Algorithms

Deep Learning to Predict the Number of Antennas in a Massive MIMO Setup Based on Channel Characteristics
  Sharan Chandra, E. S. Gopi, Hrishikesh Shekhar, and Pranav Mani

Optimal Design of Fractional Order PID Controller for AVR System Using Black Widow Optimization (BWO) Algorithm
  Vijaya Kumar Munagala and Ravi Kumar Jatoth

LSTM Network for Hotspot Prediction in Traffic Density of Cellular Network
  S. Swedha and E. S. Gopi

Generative Adversarial Network and Reinforcement Learning to Estimate Channel Coefficients
  Pranav Mani, E. S. Gopi, Hrishikesh Shekhar, and Sharan Chandra

Novel Method of Self-interference Cancelation in Full-Duplex Radios for 5G Wireless Technology Using Neural Networks
  L. Yashvanth, V. Dharanya, and E. S. Gopi

Dimensionality Reduction of KDD-99 Using Self-perpetuating Algorithm
  Swapnil Umbarkar and Kirti Sharma

Energy-Efficient Neighbor Discovery Using Bacterial Foraging Optimization (BFO) Algorithm for Directional Wireless Sensor Networks
  Sagar Mekala and K. Shahu Chatrapati

Auto-encoder—LSTM-Based Outlier Detection Method for WSNs
  Bhanu Chander and Kumaravelan Gopalakrishnan

An Improved Swarm Optimization Algorithm-Based Harmonics Estimation and Optimal Switching Angle Identification
  M. Alekhya, S. Ramyaka, N. Sambasiva Rao, and Ch. Durga Prasad

A Study on Ensemble Methods for Classification
  R. Harine Rajashree and M. Hariharan

An Improved Particle Swarm Optimization-Based System Identification
  Pasila Eswari, Y. Ramalakshmanna, and Ch. Durga Prasad

Channel Coverage Identification Conditions for Massive MIMO Millimeter Wave at 28 and 39 GHz Using Fine K-Nearest Neighbor Machine Learning Algorithm
  Vankayala Chethan Prakash, G. Nagarajan, and N. Priyavarthan

Flip Flop Neural Networks: Modelling Memory for Efficient Forecasting
  S. Sujith Kumar, C. Vigneswaran, and V. Srinivasa Chakravarthy

Wireless Communication Systems

Selection Relay-Based RF-VLC Underwater Communication System
  Mohammad Furqan Ali, Tharindu D. Ponnimbaduge Perera, Vladislav S. Sergeevich, Sheikh Arbid Irfan, Unzhakova Ekaterina Viktorovna, Weijia Zhang, Ândrei Camponogara, and Dushantha Nalin K. Jayakody

Circular Polarized Octal Band CPW-Fed Antenna Using Theory of Characteristic Mode for Wireless Communication Applications
  Reshmi Dhara

Massive MIMO Pre-coders for Cognitive Radio Network Performance Improvement: A Technological Survey
  Mayank Kothari and U. Ragavendran

Design of MIMO Antenna Using Circular Split Ring Slot Defected Ground Structure for ISM Band Applications
  F. B. Shiddanagouda, R. M. Vani, and P. V. Hunagund

Performance Comparison of Arduino IDE and Runlinc IDE for Promotion of IoT STEM AI in Education Process
  Sangay Chedup, Dushantha Nalin K. Jayakody, Bevek Subba, and Hassaan Hydher

Analysis of Small Loop Antenna Using Numerical EM Technique
  R. Seetharaman and Chaitanya Krishna Chevula

A Monopole Octagonal Sierpinski Carpet Antenna with Defective Ground Structure for SWB Applications
  E. Aravindraj, G. Nagarajan, and R. Senthil Kumaran

DFT Spread C-DSLM for Low PAPR FBMC with OQAM Systems
  K. Ayappasamy, G. Nagarajan, and P. Elavarasan

Secure, Efficient, Lightweight Authentication in Wireless Sensor Networks
  Bhanu Chander and Kumaravelan Gopalakrishnan

Performance Evaluation of Logic Gates Using Magnetic Tunnel Junction
  Jyoti Garg and Subodh Wairya

Medical IoT—Automatic Medical Dispensing Machine
  C. V. Nisha Angeline, S. Muthuramlingam, E. Rahul Ganesh, S. Siva Pratheep, and V. Nishanthan

Performance Analysis of Digital Modulation Formats in FSO
  Monica Gautam and Sourabh Sahu

High-Level Synthesis of Cellular Automata–Belousov Zhabotinsky Reaction in FPGA
  P. Purushothaman, S. Srihari, and S. Deivalakshmi

IoT-Based Calling Bell
  Sundara Babu Maddu, Gaddam Venu Gopal, Ch. Lasya Sarada, and B. Bhargavi

Mobile Data Applications

Development of an Ensemble Gradient Boosting Algorithm for Generating Alerts About Impending Soil Movements
  Ankush Pathania, Praveen Kumar, Priyanka, Aakash Maurya, K. V. Uday, and Varun Dutt

Seam Carving Detection and Localization Using Two-Stage Deep Neural Networks
  Lakshmanan Nataraj, Chandrakanth Gudavalli, Tajuddin Manhar Mohammed, Shivkumar Chandrasekaran, and B. S. Manjunath

A Machine Learning-Based Approach to Password Authentication Using Keystroke Biometrics
  Adesh Thakare, Shreyas Gondane, Nilesh Prasad, and Siddhant Chigale

Attention-Based SRGAN for Super Resolution of Satellite Images
  D. Synthiya Vinothini and B. Sathya Bama

Detection of Acute Lymphoblastic Leukemia Using Machine Learning Techniques
  Pradeep Kumar Das, Ayush Pradhan, and Sukadev Meher

Computer-Aided Classifier for Identification of Renal Cystic Abnormalities Using Bosniak Classification
  P. R. Mohammed Akhil and Menka Yadav

Recognition of Obscure Objects Using Super Resolution-Based Generative Adversarial Networks
  B. Sathyabama, A. Arunesh, D. SynthiyaVinothini, S. Anupriyadharsini, and S. Md. Mansoor Roomi

Low-Power U-Net for Semantic Image Segmentation
  Vennelakanti Venkata Bhargava Narendra, P. Rangababu, and Bunil Kumar Balabantaray

Electrocardiogram Signal Classification for the Detection of Abnormalities Using Discrete Wavelet Transform and Artificial Neural Network Back Propagation Algorithm
  M. Ramkumar, C. Ganesh Babu, and R. Sarath Kumar

Performance Analysis of Optimizers for Glaucoma Diagnosis from Fundus Images Using Transfer Learning
  Poonguzhali Elangovan and Malaya Kumar Nath

Machine Learning based Early Prediction of Disease with Risk Factors Data of the Patient Using Support Vector Machines
  Usharani Chelladurai and Seethalakshmi Pandian

Scene Classification of Remotely Sensed Images using Ensembled Machine Learning Models
  P. Deepan and L. R. Sudha

Fuzziness and Vagueness in Natural Language Quantifiers: Searching and Systemizing Few Patterns in Predicate Logic
  Harjit Singh

An Attempt on Twitter ‘likes’ Grading Strategy Using Pure Linguistic Feature Engineering: A Novel Approach
  Lovedeep Singh and Kanishk Gautam

Groundwater Level Prediction and Correlative Study with Groundwater Contamination Under Conditional Scenarios: Insights from Multivariate Deep LSTM Neural Network Modeling
  Ahan Chatterjee, Trisha Sinha, and Rumela Mukherjee

A Novel Deep Hybrid Spectral Network for Hyperspectral Image Classification
  K. Priyadharshini @ Manisha and B. Sathya Bama

Anomaly Prognostication of Retinal Fundus Images Using EALCLAHE Enhancement and Classifying with Support Vector Machine
  P. Raja Rajeswari Chandni

Analysis of Pre-earthquake Signals Using ANN: Implication for Short-Term Earthquake Forecasting
  Ramya Jeyaraman, M. Senthil Kumar, and N. Venkatanathan

A Novel Method for Plant Leaf Disease Classification Using Deep Learning Techniques
  R. Sangeetha and M. Mary Shanthi Rani

About the Editor

Dr. E. S. Gopi has authored eight books, of which seven have been published by Springer. He has also contributed eight book chapters to books published by Springer. He has several papers in international journals and conferences to his credit. He has 20 years of teaching and research experience. He is the coordinator for the pattern recognition and the computational intelligence laboratory. He is currently Associate Professor, Department of Electronics and Communication Engineering, National Institute of Technology, Trichy, India. His books are widely used all over the world. His book “Pattern recognition and Computational intelligence using Matlab” (Springer) was recognized as one of the best ebooks under the “pattern recognition” and “Matlab” categories by Book authority, the world’s leading site for book recommendations by thought leaders. His research interests include machine intelligence, pattern recognition, signal processing and computational intelligence. He is the series editor for the series “Signals and Communication Technology”, Springer publication. The India International Friendship Society (IFS) has awarded him the “Shiksha Rattan Puraskar Award” for his meritorious services in the field of education. The award was presented by Dr. Bhishma Narain Singh, former Governor, Assam and Tamil Nadu, India. He was also awarded the “Glory of India Gold Medal” by the International Institute of Success Awareness. This award was presented by Shri. Syed Sibtey Razi, former Governor of Jharkhand, India. He was also awarded “Best citizens of India 2013” by The International Publishing House and the Life Time Golden Achievement award 2021 by Bharat Rattan Publishing House.


Machine Learning, Deep Learning and Computational Intelligence Algorithms

Deep Learning to Predict the Number of Antennas in a Massive MIMO Setup Based on Channel Characteristics

Sharan Chandra, E. S. Gopi, Hrishikesh Shekhar, and Pranav Mani

Abstract Deep learning (DL) solutions learn patterns from data and exploit the knowledge gained in learning to generate optimum case-specific solutions that outperform pre-defined generalized heuristics. With an increase in computational capabilities and availability of data, such solutions are being adopted in a wide array of fields, including wireless communications. Massive MIMO is expected to be a major catalyst in enabling 5G wireless access technology. The fundamental requirement is to equip base stations with arrays of many antennas, which are used to serve many users simultaneously. Mutual orthogonality between user channels in multiple-input multiple-output (MIMO) systems is highly desired to facilitate effective detection of user signals sent during uplink. In this paper, we present potential deep learning applications in massive MIMO networks. In theory, an infinite number of antennas at the base station ensures mutual orthogonality between each user’s channel state information (CSI). We propose the use of artificial neural networks (ANNs) to predict the practical number of antennas required for mutual orthogonality given the variances of the user channels. We then present an analysis to obtain the practical number of antennas required for convergence of the signal-to-interference-noise ratio (SINR) to its limiting value, for the case of perfect CSI. Further, we train a deep learning model to predict the required number of antennas for the SINR to converge to its limiting value, given the variances of the channels. We then extend the study to show the convergence of SINR for the case of imperfect CSI.

Keywords Multiple-input multiple-output (MIMO) · Mutual orthogonality · Channel state information (CSI) · Signal-to-interference-noise-ratio (SINR) · Artificial neural networks (ANNs) · Deep learning (DL)

S. Chandra · E. S. Gopi · H. Shekhar · P. Mani
National Institute of Technology, Tiruchirappalli, India

© Springer Nature Singapore Pte Ltd. 2021
E. S. Gopi (ed.), Machine Learning, Deep Learning and Computational Intelligence for Wireless Communication, Lecture Notes in Electrical Engineering 749, https://doi.org/10.1007/978-981-16-0289-4_1


1 Introduction

With an increasing demand for faster and more reliable data transmission, traditional data transfer schemes may no longer be able to keep up. Several technologies have been developed to satisfy these resource-heavy requirements, and massive MIMO [1] is among the leading candidates. Massive MIMO has proved to be both effective and efficient in its use of energy and spectrum, promising performance gains of 10–100 times over existing MIMO systems [2]. The concept of MIMO can be boiled down to transmitting and receiving multiple signals over a single channel simultaneously. Although there is no prescribed criterion for what counts as a massive MIMO system, these systems generally use tens or even hundreds of antennas, as opposed to the three or four antennas of traditional MIMO systems. The key advantage of a massive MIMO system is that it can bring up to a 50-fold increase in capacity without a significant increase in spectrum requirement. Further, owing to its large data rates, massive MIMO is expected to play a major role in launching and sustaining 5G technology. A typical massive MIMO system consists of several hundred antennas at the base station and many users, interconnected to form a dense network (see Fig. 1). Equations (1) and (2) give the mathematical model governing the massive MIMO system [3], for M base station antennas and N users (the symbol N in (1) denotes the noise vector):

$$
Y = HX + N \tag{1}
$$

$$
\begin{bmatrix} y_1 \\ y_2 \\ y_3 \\ y_4 \\ \vdots \\ y_M \end{bmatrix}
=
\begin{bmatrix}
h_{11} & h_{12} & \cdots & h_{1N} \\
h_{21} & h_{22} & \cdots & h_{2N} \\
h_{31} & h_{32} & \cdots & h_{3N} \\
h_{41} & h_{42} & \cdots & h_{4N} \\
\vdots & \vdots & \ddots & \vdots \\
h_{M1} & h_{M2} & \cdots & h_{MN}
\end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \\ \vdots \\ x_N \end{bmatrix}
+
\begin{bmatrix} n_1 \\ n_2 \\ n_3 \\ n_4 \\ \vdots \\ n_M \end{bmatrix}
\tag{2}
$$

In massive MIMO, mutual orthogonality between each user’s channel state information (CSI) is fundamental to the task of message signal detection. This orthogonality between different channels relies upon an underlying assumption of an infinite number of base station antennas [3]. However, in practical cases, this is not feasible. Hence, in Sect. 3 of this paper, an in-depth study is conducted on the number of antennas at the base station required to guarantee a low margin of error in reconstructing the message signal sent by a particular user. Further, a deep learning (DL) [4] system is built to predict an optimal number of base station antennas for effective deciphering of the message at the base station, given the variances of the channels in consideration. Section 4 investigates how the signal-to-interference-noise-ratio (SINR) of a signal being transmitted in a MIMO system varies with the number of antennas. In theory, this value converges to a constant under the assumption of infinite base station antennas. A study is carried out in this section to find the practical number of base station antennas for which the SINR converges to the calculated constant value, to within a threshold, for perfect CSI. We then deploy a deep learning (DL) model whose objective is to predict the number of base station antennas that ensures convergence of the practical value of SINR to its limiting value, within a threshold of 0.01. In Sect. 5, we perform a similar analysis for imperfect CSI [3], where we demonstrate the practical number of base station antennas required for convergence of SINR to its limiting value under different threshold conditions. We find that a similar DL model can be applied to the case of imperfect CSI as well.

Fig. 1 Schematic of a generalized massive MIMO system where $Y_i$ represents the ith antenna in the base station and $X_i$ represents the ith user. The blue lines indicate the channel coefficients corresponding to user 1

2 Contributions of the Paper

We propose a deep learning (DL) model to predict the practical number of antennas to be installed at the base station to ensure mutual orthogonality between user channels. We use data generated through Monte Carlo simulation [5] to train our artificial neural network (ANN) [6]. The model learns a precise mapping between the variances of the user channel coefficients and the required number of antennas. The model is found to predict values quickly and accurately, which allows for potential deployment in a massive MIMO system. For perfect channel state information (CSI), we find that the practical number of antennas required to drive the SINR to its limiting value is well within the capabilities of a massive MIMO system. We use a deep learning model to predict this value. This model is trained using data generated through a Monte Carlo simulation. The more impactful observation, however, is that this is possible even for the imperfect CSI case, which is of more practical utility. A similar DL approach can, therefore, be extended to predict the number of antennas required to obtain the convergent value of SINR for imperfect CSI.

3 Mutual Orthogonality

Mutually orthogonal channels are highly desirable in MIMO systems. Consider a system of four antennas and two users. Pre-multiplying (1) and (2) by $h_1^H$, we get $h_1^H Y = h_1^H H X + h_1^H N$, where $h_1$ is the channel vector corresponding to user 1 (the first column of the matrix $H$). If the channel vectors are mutually orthogonal, the cross term vanishes: $h_1^H h_2 = h_{11}^* h_{12} + h_{21}^* h_{22} + h_{31}^* h_{32} + h_{41}^* h_{42} = 0$. Hence, if noise is ignored, the matched filter response can be used to detect the signal corresponding to $x_1$ (user 1).
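As a worked illustration of this step for the two-user case, the pre-multiplied signal can be expanded as below. This is a sketch that uses only the symbols of (1); the label $\hat{x}_1$ for the resulting estimate is introduced here for illustration and is not notation from the paper.

```latex
h_1^H Y = \|h_1\|^2 x_1 + \left(h_1^H h_2\right) x_2 + h_1^H N
\qquad\Longrightarrow\qquad
\hat{x}_1 = \frac{h_1^H Y}{\|h_1\|^2} = x_1
\quad \text{when } h_1^H h_2 = 0 \text{ and the noise term is ignored.}
```

The interference from user 2 enters only through the inner product $h_1^H h_2$, which is why the size of this inner product is the quantity tracked in the rest of the section.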

3.1 Trend Analysis

In this section, we analyse the trend in orthogonality between each user’s channel state information (CSI) with the number of antennas at the base station (see Fig. 2). We show, using the Weak Law of Large Numbers [7], that as the number of antennas at the base station increases, the channel vectors of individual users become orthogonal to each other. Since the entries of $h_1$ and $h_2$ are independent and follow a Gaussian distribution with zero mean,

$$
\lim_{M \to \infty} \frac{h_{11} h_{12}^* + h_{21} h_{22}^* + h_{31} h_{32}^* + \cdots + h_{M1} h_{M2}^*}{M} = E\left[h_1 h_2^*\right] = E\left[h_1\right] E\left[h_2^*\right] = 0 \tag{3}
$$


Procedure 1 Monte Carlo Simulation for Orthogonality Data
  Mmax = 10001
  Threshold = 0.01
  Trials = 100
  for Beta1 = 0.02, 0.04, ..., 2 do
    for Beta2 = 0.02, 0.04, ..., 4 do
      Mavg = 0
      for trial = 1, 2, ..., Trials do
        for m = 1, 2, ..., Mmax do
          H1 = matrix of dimensions m × 1 with values drawn from a complex Gaussian distribution with mean 0 and variance Beta1
          H2 = matrix of dimensions m × 1 with values drawn from a complex Gaussian distribution with mean 0 and variance Beta2
          Result = |dot product of H1 and H2|
          if Result < Threshold then
            Mavg = Mavg + m
            break
          end if
        end for
      end for
      Add [Beta1, Beta2] to training data input list
      Add (Mavg / Trials) to training data output list
    end for
  end for

In order to investigate the degree to which mutual orthogonality holds in practice, we adopted the Monte Carlo simulation technique. Monte Carlo algorithms [5] are a class of algorithms that use repeated sampling from random variables to model and predict the overall behaviour of a system. Each user’s channel vector can be modelled as an M-dimensional complex vector sampled from a zero-mean Gaussian distribution with fixed variance. We varied both the variances of the channels and the number of antennas at the base station and sampled values for our channel vectors. The procedure is given as Procedure 1. A single observation is of little use, as the value obtained may represent an unlikely outcome, leading to a very noisy output. To avoid this problem, for a fixed pair of variances, we repeat the experiment several times and average the results over all trials. Following this approach, we were able to plot the number of antennas required for orthogonality against the variances of each user’s channel vector. The resulting graph is shown in Sect. 6.1 (see Fig. 5).
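The inner loops of Procedure 1 can be sketched in Python as follows, for a single pair of variances. This is a minimal sketch and not the authors' implementation. Two points are assumptions made here: the inner product is divided by m, following the averaged sum in (3), and a "complex Gaussian with variance β" is taken to mean that the total variance β is split equally between the real and imaginary parts. The function names are hypothetical.

```python
import random

def complex_gaussian_vector(m, variance, rng):
    """Draw m i.i.d. zero-mean complex Gaussian samples with E|h|^2 = variance."""
    # Assumption: the variance is split equally over real and imaginary parts.
    sigma = (variance / 2.0) ** 0.5
    return [complex(rng.gauss(0.0, sigma), rng.gauss(0.0, sigma)) for _ in range(m)]

def antennas_for_orthogonality(beta1, beta2, threshold=0.01, trials=100,
                               m_max=10001, seed=0):
    """Average, over trials, of the first m at which |h1^H h2| / m < threshold."""
    rng = random.Random(seed)
    m_total = 0
    for _ in range(trials):
        for m in range(1, m_max + 1):
            h1 = complex_gaussian_vector(m, beta1, rng)
            h2 = complex_gaussian_vector(m, beta2, rng)
            # Assumption: normalise by m, as in the averaged sum of (3).
            result = abs(sum(a.conjugate() * b for a, b in zip(h1, h2))) / m
            if result < threshold:
                m_total += m
                break
        # As in Procedure 1, a trial that never crosses the threshold adds nothing.
    return m_total / trials

if __name__ == "__main__":
    for beta1, beta2 in [(0.1, 0.1), (0.5, 0.5), (1.0, 1.0)]:
        print(beta1, beta2, antennas_for_orthogonality(beta1, beta2, trials=20))
```

Sweeping `beta1` and `beta2` over a grid and storing each pair with its returned average would give a training set of the kind described in the procedure. The absolute numbers depend on the variance convention and normalisation assumed above.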


Fig. 2 We can see how the value, depicted in (3), approaches zero with increasing number of antennas. In these figures, β2 is fixed at 1 for each β1

3.2 Deep Learning Architecture to Predict Number of Antennas Required for Orthogonality

Using the dataset generated in Sect. 3.1, we train a deep learning (DL) model whose objective is to predict the number of antennas required to safely assume orthogonality between the channel vectors of individual users. We use an artificial neural network (ANN) that takes the variances of any two channel vectors as input and predicts the number of antennas required to ensure orthogonality between them. The network used for this study (see Fig. 3) consists of an input layer fed with the variances $\beta_1$ and $\beta_2$ and an output layer that gives the number of antennas, $M$, needed to ensure orthogonality between two channels with characteristics $\beta_1$ and $\beta_2$. The leaky version of the rectified linear unit (ReLU) activation function is applied to each layer, and the learning rates are adapted by the Adam optimization algorithm. Batch gradient descent [8] is used to feed randomized batches of input variances forward through the network. The mean-squared error (MSE) [9] is used to compute the gradients and update the weights of the network. The network successfully learns the relationship between the output (number of antennas, $M$) and the inputs (variances $\beta_1$ and $\beta_2$) (see Fig. 6).
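The ingredients named above (leaky ReLU on each layer, Adam updates, randomized mini-batches, MSE loss) can be sketched with the Python standard library alone. This is an illustrative toy, not the network of Fig. 3: the single hidden layer, its width, the slope `ALPHA`, the learning rate and the class name `TinyANN` are all assumptions, since the source does not give these details.

```python
import math
import random

ALPHA = 0.01  # leaky-ReLU slope for negative inputs (assumed value)

def leaky_relu(z):
    return z if z > 0 else ALPHA * z

def leaky_relu_grad(z):
    return 1.0 if z > 0 else ALPHA

class TinyANN:
    """2 inputs -> n_hidden leaky-ReLU units -> 1 leaky-ReLU output (assumed layout)."""

    def __init__(self, n_hidden=8, lr=0.01, seed=0):
        rng = random.Random(seed)
        self.h = n_hidden
        # Flat parameter list: W1 (h x 2), b1 (h), W2 (h), b2 (1).
        self.p = [rng.uniform(-0.5, 0.5) for _ in range(4 * n_hidden + 1)]
        self.m = [0.0] * len(self.p)  # Adam first-moment estimates
        self.v = [0.0] * len(self.p)  # Adam second-moment estimates
        self.t = 0
        self.lr = lr

    def _forward(self, x):
        h, p = self.h, self.p
        z1 = [p[2 * j] * x[0] + p[2 * j + 1] * x[1] + p[2 * h + j] for j in range(h)]
        a1 = [leaky_relu(z) for z in z1]
        z2 = sum(p[3 * h + j] * a1[j] for j in range(h)) + p[4 * h]
        return z1, a1, z2, leaky_relu(z2)

    def predict(self, x):
        return self._forward(x)[3]

    def _grad(self, x, y):
        """Gradient of the squared error (prediction - y)^2 for one sample."""
        h, p = self.h, self.p
        z1, a1, z2, out = self._forward(x)
        g = [0.0] * len(p)
        dz2 = 2.0 * (out - y) * leaky_relu_grad(z2)
        g[4 * h] = dz2
        for j in range(h):
            g[3 * h + j] = dz2 * a1[j]
            dz1 = dz2 * p[3 * h + j] * leaky_relu_grad(z1[j])
            g[2 * j] = dz1 * x[0]
            g[2 * j + 1] = dz1 * x[1]
            g[2 * h + j] = dz1
        return g

    def train_batch(self, batch, b1=0.9, b2=0.999, eps=1e-8):
        """One Adam step on the mean-squared error of a mini-batch of (x, y) pairs."""
        grads = [self._grad(x, y) for x, y in batch]
        self.t += 1
        for i in range(len(self.p)):
            gi = sum(g[i] for g in grads) / len(batch)
            self.m[i] = b1 * self.m[i] + (1 - b1) * gi
            self.v[i] = b2 * self.v[i] + (1 - b2) * gi * gi
            m_hat = self.m[i] / (1 - b1 ** self.t)
            v_hat = self.v[i] / (1 - b2 ** self.t)
            self.p[i] -= self.lr * m_hat / (math.sqrt(v_hat) + eps)

def mse(net, data):
    return sum((net.predict(x) - y) ** 2 for x, y in data) / len(data)

def fit(net, data, epochs=200, batch_size=16, seed=0):
    rng = random.Random(seed)
    data = list(data)
    for _ in range(epochs):
        rng.shuffle(data)  # randomised mini-batches
        for i in range(0, len(data), batch_size):
            net.train_batch(data[i:i + batch_size])
    return net

if __name__ == "__main__":
    rng = random.Random(1)
    pts = [(rng.uniform(0.0, 2.0), rng.uniform(0.0, 2.0)) for _ in range(64)]
    # Placeholder targets, NOT simulation output: a smooth stand-in to exercise the code.
    data = [((b1, b2), math.sqrt(b1 * b2)) for b1, b2 in pts]
    net = TinyANN()
    print("MSE before:", mse(net, data))
    fit(net, data)
    print("MSE after:", mse(net, data))
```

In practice the targets would be the averaged antenna counts produced by Procedure 1, probably rescaled to a range of order one before training, and a framework with automatic differentiation would replace the hand-written gradients.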


Fig. 3 Model of the artificial neural network (ANN), used to predict the number of antennas required

4 Perfect CSI

The process of detecting a user’s signal in massive MIMO relies upon multiplying the received signal with the corresponding user’s channel vector. For this, the channel vector of each user must be determined, and the estimate should not deviate considerably over time until the channel vectors are next recalculated. To extract the channel vector values, we generally send pilot data consisting of an identity matrix, which isolates each channel vector individually. The assumption made here is that no noise corrupts the channel vectors during estimation. The noise arising from corrupted channel vectors is accounted for in Sect. 5, on imperfect CSI.
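The role of the identity pilot can be written out using the model in (1). This is a sketch under stated assumptions: the names $X_p$, $Y_p$ and $N_p$ for the pilot block, the received pilot block and the noise during the pilot phase are introduced here for illustration, and any pilot power scaling is ignored.

```latex
X_p = I_N
\quad\Longrightarrow\quad
Y_p = H X_p + N_p = H + N_p ,
\qquad
Y_p = H \ \text{ when } N_p = 0 .
```

Under the noise-free assumption of this section, the k-th column of the received pilot block is exactly the channel vector $h_k$ of user k. A non-zero $N_p$ is what produces the estimation error treated in Sect. 5.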

4.1 Signal-to-Interference-Noise-Ratio (SINR) Analysis

For simplicity, let us once again consider a system of four antennas at the base station and two users (as in Sect. 3). From the RHS of $h_1^H Y = h_1^H H X + h_1^H N$, we have $h_{11}^* (h_{11} x_1 + h_{12} x_2 + n_1) + \cdots + h_{41}^* (h_{41} x_1 + h_{42} x_2 + n_4)$. When we consider detection of the signal corresponding to user 1, it is understood that $[h_{11}\ h_{21}\ h_{31}\ h_{41}]$ are known and $[h_{12}\ h_{22}\ h_{32}\ h_{42}]$ are complex Gaussian random variables with zero mean and variance $\beta_2$. Further, let us consider noise variance $E[N^2] = \sigma_N^2 = 1$ and power allocated to each user $P_u$. Hence, we have the theoretical limiting value of SINR [3]:

$$
\lim_{M \to \infty} \mathrm{SINR}
= \lim_{M \to \infty} \frac{\frac{E_u}{M} \|h_1\|^2}{\frac{E_u}{M} \sum_{i=2}^{N} \beta_i + \sigma_N^2}
= \lim_{M \to \infty} \frac{\frac{E_u}{M} \|h_1\|^2}{\sigma_N^2}
= \frac{E_u \beta_1}{\sigma_N^2} \tag{4}
$$

where the power allocated to each user, $P_u = E_u / M$, is expressed in terms of $E_u$, the energy allotted to each user, and $M$, the number of antennas at the base station. The last step uses $\|h_1\|^2 / M \to \beta_1$ as $M \to \infty$.


Fig. 4 Plot of calculated SINR and expected SINR value versus the number of antennas at the base station. For a, $\beta_1 = 0.1$ and $\sum_{i=2}^{N} \beta_i = 0.5$; for b, $\beta_1 = 1.5$ and $\sum_{i=2}^{N} \beta_i = 8$

The theoretical limiting value of SINR shown above holds under the assumption of an infinite number of base station antennas. However, it is unknown to what extent the above equations hold for practical numbers of base station antennas. Hence, a study was conducted to analyse how the SINR of the received signal varies with the number of base station antennas (see Fig. 4). Note that, for the purpose of this study, we have considered noise variance $\sigma_N^2 = 1$.

Procedure 2 Monte Carlo Simulation for Perfect CSI
  Mmax = 3000
  Trials = 100
  Threshold = 0.01
  Beta1max = 6
  (Σ_{i=2}^{N} βi)max = 20
  Euser = 1
  for Beta1 = 0.05, 0.1, ..., Beta1max do
    for Σ_{i=2}^{N} βi = Beta1 + 0.05, Beta1 + 0.1, ..., (Σ_{i=2}^{N} βi)max do
      Mavg = 0
      SINRexpected = Beta1 × Euser
      for trial = 1, 2, ..., Trials do
        for m = 1, 2, ..., Mmax do
          H1 = matrix of dimensions m × 1 with values drawn from a complex Gaussian distribution with mean 0 and variance Beta1
          Num = (H1mag² × Euser) / m
          Den = (Σ_{i=2}^{N} βi × Euser) / m + 1
          SINR = Num / Den
          if |SINR − SINRexpected| < Threshold then
            Mavg = Mavg + m
            break
          end if
        end for
      end for
      Add Beta1 and Σ_{i=2}^{N} βi to training data input list
      Add (Mavg / Trials) to training data output list
    end for
  end for


The results in Fig. 4 were obtained by fixing $E_u = 5$ and noise variance $\sigma_N^2 = 1$. From Fig. 4a, we can infer that with just 127 antennas at the base station, the SINR of the received signal converges to the expected value within a threshold of 0.01. Similarly, from Fig. 4b, we see that with 1441 and 3499 antennas at the base station, convergence occurs within thresholds of 0.3 and 0.1, respectively. However, this high number of antennas is obtained for a large cumulative value of user variances ($\beta_1 = 1.5$ and $\sum_{i=2}^{N} \beta_i = 8$). Having demonstrated the existence of a potential trend, we once again employ a Monte Carlo-based approach to simulate practical systems and determine the relationship between the variances of the channel vectors ($\beta_1$ and $\sum_{i=2}^{N} \beta_i$) and the number of antennas at the base station ($M$). We note that the SINR depends on the other channels only through the sum of their variances and not through the individual variances themselves. Hence, by varying the sum over a range of values, we mimic multiple-user scenarios in a practical massive MIMO system. Following the steps shown in Procedure 2, we calculate the SINR for each number of antennas and compare it with the limiting SINR, $E_u \beta_1$ (when we consider the signal corresponding to user 1).
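The inner loops of Procedure 2 can be sketched in Python for one choice of $\beta_1$ and $\sum_{i=2}^{N} \beta_i$. This is a minimal sketch, not the authors' code. It assumes unit noise variance, as stated in the text, and assumes that the variance of each complex channel entry is split equally between its real and imaginary parts. The function name is hypothetical.

```python
import random

def antennas_for_sinr_convergence(beta1, sum_beta_others, e_user=1.0,
                                  threshold=0.01, trials=100, m_max=3000, seed=0):
    """Average first m at which the finite-m SINR is within threshold of e_user * beta1."""
    rng = random.Random(seed)
    # Assumption: variance beta1 is split equally over real and imaginary parts.
    sigma = (beta1 / 2.0) ** 0.5
    sinr_expected = beta1 * e_user  # limiting value from (4) with unit noise variance
    m_total = 0
    for _ in range(trials):
        for m in range(1, m_max + 1):
            # Squared norm of a freshly drawn m-dimensional channel vector h1.
            h1_mag_sq = sum(rng.gauss(0.0, sigma) ** 2 + rng.gauss(0.0, sigma) ** 2
                            for _ in range(m))
            num = h1_mag_sq * e_user / m
            den = sum_beta_others * e_user / m + 1.0
            if abs(num / den - sinr_expected) < threshold:
                m_total += m
                break
        # As in Procedure 2, a trial that never converges adds nothing to the total.
    return m_total / trials

if __name__ == "__main__":
    print(antennas_for_sinr_convergence(0.1, 0.5, trials=50))
```

Because a fresh vector is drawn for every m, the cost of one trial grows roughly quadratically with the stopping point, so large variances or tight thresholds make the full sweep slow in pure Python.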

4.2 Deep Learning Architecture to Predict Number of Antennas Required for Convergence of SINR

Using the data generation method described in Sect. 4.1, we create a dataset that maps the variances of the channels to the number of antennas required for the SINR to converge to its limiting value. We use this dataset to train an artificial neural network (ANN) to learn the mapping, with a threshold of 0.01 for training. For the purpose of this study, we use a network that consists of an input layer fed with $\beta_1$ and $\sum_{i=2}^{N} \beta_i$, where $\beta_1$ is the variance of the channel whose SINR we are interested in predicting and $\sum_{i=2}^{N} \beta_i$ is the sum of the variances of all other channels. The output layer predicts the minimum number of antennas, $M$, that ensures the SINR of the channel with variance $\beta_1$ is within the threshold of its convergent value. The leaky version of the rectified linear unit (ReLU) activation function is applied to each layer, and the learning rates are adapted by the Adam optimization algorithm. Batch gradient descent is used to feed randomized batches of input variances forward through the network. The mean-squared error (MSE) [9] is used to compute the gradients and update the weights of the network.

5 Imperfect CSI: An SINR Analysis

Section 4 deals with the ideal case, where the observed channel information is not corrupted by noise. In reality, this is not the case. As in the previous case, the channel vector values are determined using the “pilot” data sent across the channel. However, the observed channel vector $\hat{h}$ has an added noise component, i.e. $\hat{h} = h + \text{error}$. As a result, the SINR for imperfect CSI is given by (5), and its theoretical limiting value by (6).

$$
\mathrm{SINR} = \frac{P_u \|h_1\|^4}{\|\hat{h}_1\|^2 + P_u \sum_{i=2}^{N} \|\hat{h}_1\|^2 \beta_i + \frac{P_u \|h_1\|^2}{P_p}} \tag{5}
$$

When we substitute $P_u = E_u / \sqrt{M}$ and $P_p = N P_u$, where $M \to \infty$, we have:

$$
\Rightarrow \mathrm{SINR} = N E_u^2 \beta_1^2 \tag{6}
$$

Procedure 3 Monte Carlo Simulation for Imperfect CSI
  Mmax = 20001
  Trials = 100
  Beta1 = 0.1
  Σ_{i=2}^{N} βi = 0.5
  Euser = 5
  N = 4
  for m = 1, 2, ..., Mmax do
    SINRavg = 0
    for trial = 1, 2, ..., Trials do
      H1 = matrix of dimensions m × 1 with values drawn from a complex Gaussian distribution with mean 0 and variance Beta1
      e = matrix of dimensions m × 1 with values drawn from Gaussian distribution with mean 0 and variance 1
      Ĥ1 = H1 + e / sqrt(N × Euser / √m)
      Num = (H1mag² × Euser) / √m
      Den = 1/N + (Σ_{i=2}^{N} βi × Ĥ1mag² × Euser) / (H1mag² × √m) + Ĥ1mag² / H1mag²
      SINR = Num / Den
      SINRavg += SINR
    end for
    SINRavg /= Trials
    Add SINRavg to list of SINRs
  end for

As shown in (6), the SINR converges to a constant value when the number of antennas in the base station tends to infinity. However, we proceed to study at what realistic value of base station antennas this expected SINR is obtained.


The steps used for this purpose are similar to those used for perfect CSI and are given as Procedure 3; the only difference is the formula used for calculating the expected SINR. Following the steps shown in Procedure 3, we calculate the average SINR for each number of antennas and compare it with the expected SINR as the number of antennas tends to infinity.
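One iteration of the outer loop of Procedure 3 (a fixed number of antennas m, averaged over trials) can be sketched as follows. This is a minimal sketch built on a reading of the extraction-damaged procedure, so the exact form of the denominator is an assumption. It also assumes complex-valued channel and error entries with their variance split equally between real and imaginary parts. The function names are hypothetical.

```python
import math
import random

def average_sinr_imperfect_csi(m, beta1=0.1, sum_beta_others=0.5, e_user=5.0,
                               n_users=4, trials=100, seed=0):
    """Trial-averaged SINR for m antennas with a noisy channel estimate."""
    rng = random.Random(seed)
    p_u = e_user / math.sqrt(m)                 # P_u = E_u / sqrt(M)
    sigma_h = math.sqrt(beta1 / 2.0)            # assumption: variance split over re/im
    sigma_e = math.sqrt(0.5 / (n_users * p_u))  # error variance 1 / P_p with P_p = N * P_u
    total = 0.0
    for _ in range(trials):
        h_mag_sq = 0.0
        h_hat_mag_sq = 0.0
        for _ in range(m):
            h = complex(rng.gauss(0.0, sigma_h), rng.gauss(0.0, sigma_h))
            e = complex(rng.gauss(0.0, sigma_e), rng.gauss(0.0, sigma_e))
            h_mag_sq += abs(h) ** 2
            h_hat_mag_sq += abs(h + e) ** 2     # noisy estimate h_hat = h + error
        ratio = h_hat_mag_sq / h_mag_sq
        num = h_mag_sq * p_u
        # Assumed denominator: (5) divided through by ||h1||^2, unit noise variance.
        den = 1.0 / n_users + sum_beta_others * p_u * ratio + ratio
        total += num / den
    return total / trials

def limiting_sinr(beta1=0.1, e_user=5.0, n_users=4):
    """Limiting value N * E_u^2 * beta1^2 from (6)."""
    return n_users * e_user ** 2 * beta1 ** 2

if __name__ == "__main__":
    for m in (100, 1000, 5000):
        print(m, average_sinr_imperfect_csi(m, trials=20), limiting_sinr())
```

Calling the first function for increasing m and comparing against `limiting_sinr()` mirrors the comparison described above. Under these assumptions the approach to the limit is slow, since the leading correction shrinks only with the square root of m.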

6 Results

6.1 Mutual Orthogonality Simulation Data

Figure 5 indicates that for variances of $\beta_1 = 0.5$ and $\beta_2 = 0.5$, the practical number of antennas required for orthogonality is 125. Also, the trend indicates that the required number of antennas to guarantee a margin of error of 0.1 % is only feasible up to $\beta_1 = 3.26$ and $\beta_2 = 3.81$. On the basis of Fig. 5, we can confirm a clear trend between the variances of the two user channels ($\beta_1$, $\beta_2$) and the number of antennas at the base station ($M$) required for the CSI dot product to fall below a threshold of 0.01. It can also be seen from Fig. 5b that the number of antennas, $M$, is equally sensitive to changes in $\beta_1$ and $\beta_2$. As the variances of the two user channels increase, the required number of antennas generally increases.

Fig. 5 Figure a and b are two-dimensional and three-dimensional representations of the plot between β1 , β 2 and the number of antennas, M . A well-defined, directly proportional relationship can be observed between the independent features, β1 and β 2 , and the dependent variable, number of antennas, M
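The orthogonality check described above can be illustrated with a small Monte Carlo sketch. It is not the paper's Procedure 1 (which is not reproduced in this section): the normalisation of the dot product by M, the trial count, the step size, and all function names are assumptions made for illustration, so the antenna counts it returns need not match Fig. 5.

```python
import math
import random

def complex_gaussian_vector(m, variance, rng):
    # Each entry is complex Gaussian with the given variance:
    # real and imaginary parts each carry half of it.
    s = math.sqrt(variance / 2.0)
    return [complex(rng.gauss(0.0, s), rng.gauss(0.0, s)) for _ in range(m)]

def mean_normalised_dot(m, beta1, beta2, trials, rng):
    # Average of |h1^H h2| / m over independent channel draws (assumed metric).
    total = 0.0
    for _ in range(trials):
        h1 = complex_gaussian_vector(m, beta1, rng)
        h2 = complex_gaussian_vector(m, beta2, rng)
        inner = sum(a.conjugate() * b for a, b in zip(h1, h2))
        total += abs(inner) / m
    return total / trials

def antennas_for_orthogonality(beta1, beta2, threshold,
                               trials=20, m_step=25, m_max=400, seed=0):
    # Smallest m on the grid whose averaged metric falls below the threshold,
    # or None if the grid is exhausted first.
    rng = random.Random(seed)
    for m in range(m_step, m_max + 1, m_step):
        if mean_normalised_dot(m, beta1, beta2, trials, rng) < threshold:
            return m
    return None

if __name__ == "__main__":
    print(antennas_for_orthogonality(0.5, 0.5, threshold=0.05))
```

Under this assumed metric the averaged value shrinks roughly as the inverse square root of M, which is why a tighter threshold quickly demands many more antennas.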


6.2 Predicting Number of Base Station Antennas Required for Orthogonality

The artificial neural network (ANN) approach was able to predict the required number of antennas within a margin of error of about 20 antennas. This number is not significant from the perspective of massive MIMO. Further, it is apparent that this error can be reduced, given more data for the network to train on. It is evident from Fig. 6 that the network is able to learn the relationship governing the variances and the number of required antennas. However, an important point to be noted is that the above study is confined to two users. Hence, in the practical scenario, the required number of antennas must be calculated for all pairs of users and the maximum value obtained from all pairs must be considered.
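The pairwise rule stated at the end of this subsection can be sketched as follows. The function `toy_predictor` is a hypothetical stand-in for the trained network (its formula is invented purely so the example runs); only the "maximum over all user pairs" logic comes from the text.

```python
from itertools import combinations

def required_antennas_all_users(variances, predict_pair):
    # Evaluate the two-user predictor on every pair of users and keep the
    # largest requirement, as the text prescribes for more than two users.
    return max(predict_pair(b1, b2) for b1, b2 in combinations(variances, 2))

def toy_predictor(beta1, beta2):
    # Hypothetical placeholder for the trained ANN; NOT the paper's model.
    return round(100 * (beta1 + beta2))

if __name__ == "__main__":
    print(required_antennas_all_users([0.5, 1.0, 2.0], toy_predictor))
```

For K users this evaluates the predictor K(K − 1)/2 times, which stays cheap as long as a single network inference is fast.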

Fig. 6 The testing data was generated by drawing random variance values between 0 and 4; Procedure 1 was then used to obtain the number of antennas required for convergence. For about 2000 test data points, sorted in increasing order of antennas required, the testing curve and actual values are plotted

6.3 Perfect CSI-SINR Convergence Simulation Data

In Sect. 4.1, pertaining to perfect CSI, we make use of Procedure 2 to generate data for the number of antennas required for SINR convergence to the expected value. Here, the parameters β1 and $\sum_{i=2}^{N} \beta_i$ are varied and the corresponding required number of antennas is recorded. The resulting values resemble a triangular plane with a well-defined inclination angle. Figure 7a highlights the dependency of the required number of antennas on β1. As can be observed, the trend is almost linearly increasing for fixed values of $\sum_{i=2}^{N} \beta_i$. Similarly, from Fig. 7b, on analysing the variation of the required number of antennas with $\sum_{i=2}^{N} \beta_i$, we once again observe a proportional trend. However, from Fig. 7c, we can see that the increase in the number of antennas is steeper for β1 than for $\sum_{i=2}^{N} \beta_i$. From Fig. 7d, we can infer that there is a general increase in the required number of antennas with increasing β1 and $\sum_{i=2}^{N} \beta_i$ values.

Further, on fixing representative values of β1 = 2 and $\sum_{i=2}^{N} \beta_i$ = 10, the required number of antennas is 115. In the extreme case of β1 = 6 and $\sum_{i=2}^{N} \beta_i$ = 20, the required number of antennas is 330. From this, we can infer that these observations are within the implementation capabilities of a practical massive MIMO system. Hence, the use of this data to train a deep learning (DL) model to predict the number of antennas is of high practical use.

Fig. 7 Three-dimensional plot between β1, $\sum_{i=2}^{N} \beta_i$ and the number of antennas, M. Different viewing angles of the plot are shown in a–d. A direct dependence can be observed between β1, $\sum_{i=2}^{N} \beta_i$ and the number of antennas, M

6.4 Perfect CSI-Predicting Number of Base Station Antennas Required for Convergence of SINR

Utilizing the deep learning (DL) model proposed in Sect. 4.2, the required number of antennas is estimated given fixed values of β1 and $\sum_{i=2}^{N} \beta_i$. The predicted number of antennas is plotted against the true required number of antennas. The estimate is, on average, within 19 antennas of the required number, as can be inferred from the calculated mean-squared error (MSE = 368.64688, whose square root is approximately 19.2). As can be seen in Fig. 8, the network is able to learn the trend in the input data effectively. The performance can be improved by using larger datasets for training.

Fig. 8 The testing data was generated by drawing random variance values between 0 and 6 for β1, and between β1 + 0.05 and 20 for $\sum_{i=2}^{N} \beta_i$. Procedure 2 was then used to obtain the number of antennas required for convergence of SINR. For about 4000 test data points, sorted in increasing order of antennas required, the testing curve and actual values are plotted

6.5 Imperfect CSI: Analysing the Number of Antennas Required for Convergence of SINR

Similar to perfect CSI, the SINR is calculated using a Monte Carlo approach and averaged over a number of trials. The resulting SINR is plotted alongside the expected SINR, as shown in Fig. 9.

These results were obtained by fixing Eu = 5, β1 = 0.1, $\sum_{i=2}^{N} \beta_i$ = 0.3 and noise variance $\sigma_N^2$ = 1. The study indicates that for the received SINR to converge to the expected value we require (i) 411 antennas for a threshold of 0.1, (ii) 4516 antennas for a threshold of 0.05, and (iii) 18,585 antennas for a threshold of 0.01. Hence, according to the accuracy necessary, these values can be utilized as a lower bound.

Fig. 9 Imperfect CSI-plot of calculated SINR and expected SINR value vs number of antennas at the base station
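Reading an antenna count off a simulated convergence curve for a given threshold can be sketched as below. The paper does not state whether the threshold is an absolute or relative gap, or whether the curve must stay inside the band once it enters; this sketch assumes an absolute gap that must hold for every larger antenna count, and the synthetic 1/m curve is invented for the usage example.

```python
def antennas_for_convergence(sinr_by_m, expected_sinr, threshold):
    # sinr_by_m: list of (m, averaged SINR) pairs sorted by increasing m.
    # Returns the smallest m from which |SINR - expected| stays within the
    # threshold for all larger m in the list, or None if the last point
    # is still outside the band.
    answer = None
    for m, sinr in reversed(sinr_by_m):
        if abs(sinr - expected_sinr) <= threshold:
            answer = m
        else:
            break
    return answer

if __name__ == "__main__":
    # Synthetic curve (assumption): approaches 2.0 with a 10/m gap.
    curve = [(m, 2.0 + 10.0 / m) for m in range(1, 1001)]
    print(antennas_for_convergence(curve, 2.0, 0.3))
```

Scanning from the largest m backwards avoids reporting an early, accidental crossing caused by Monte Carlo noise at small antenna counts.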

7 Conclusions

Finding the number of antennas to be installed in the base station for effective and accurate deciphering of received signals is often a challenging task. To address this, we make use of a deep learning model to obtain the number of antennas required, given the maximum practically possible pair of input variances. Once at least the predicted number of antennas is available at the base station, the channel state information (CSI) vectors of the users are approximately orthogonal. Further, the latency of the neural network is small enough to handle dynamic CSI, so the required number of antennas can be activated accordingly. The signal-to-interference-noise ratio (SINR), in the case of perfect CSI, converges to a constant value as the number of antennas M → ∞. This signifies that even though the power allocated to each user Pu → 0, the SINR does not scale down to 0. We find that for practical values of channel variances the SINR converges, and that this convergence occurs at around 200–300 antennas. This is followed up by using a deep learning model to predict the required number of antennas for the convergence of SINR to the expected values. Similarly, for imperfect CSI, the number of required antennas increases to around 450. These observations have potential applications in realizing the limiting value of SINR in massive MIMO systems. Further, these observations invite deep learning (DL) solutions to predict the number of antennas required in the base station to ensure convergence of SINR in the case of imperfect CSI.

References

1. Lu L, Li GY, Swindlehurst AL, Ashikhmin A, Zhang R (2014) An overview of massive MIMO: benefits and challenges. IEEE J Sel Top Sig Process 8(5):742–758
2. Van Chien T, Björnson E (2017) Massive MIMO communications. In: Xiang W, Zheng K, Shen X (eds) 5G mobile communications. Springer, Cham
3. Gopi ES (2015) Digital signal processing for wireless communication using Matlab, 1st edn. Springer Publishing Company
4. LeCun Y, Bengio Y, Hinton G (2015) Deep learning. Nature 521:436–444. https://doi.org/10.1038/nature14539
5. Rubinstein RY, Kroese DP (2016) Simulation and the Monte Carlo method, 3rd edn. Wiley Publishing
6. Hassoun MH (1995) Fundamentals of artificial neural networks, 1st edn. MIT Press, Cambridge
7. Weisstein EW, Weak law of large numbers. From MathWorld–A Wolfram Web Resource. https://mathworld.wolfram.com/WeakLawofLargeNumbers.html
8. Ruder S (2016) An overview of gradient descent optimization algorithms. arXiv:1609.04747
9. Schluchter MD (2014) Mean square error. In: Wiley StatsRef: statistics reference online. Wiley, New York

Optimal Design of Fractional Order PID Controller for AVR System Using Black Widow Optimization (BWO) Algorithm

Vijaya Kumar Munagala and Ravi Kumar Jatoth

Abstract A new technique to improve the fractional order proportional integral derivative (FOPID) controller parameters for the AVR system is proposed. The technique uses the meta-heuristic Black Widow Optimization (BWO) algorithm for FOPID controller tuning. AVR systems without a controller tend to generate fluctuations in the output terminal voltage, so a controller is needed to reduce these fluctuations and produce a stable voltage. The BWO-FOPID controller was used in the system to reduce the output fluctuations of the terminal voltage. Moreover, the two additional tuning parameters of the FOPID controller greatly improve the reliability of the system. Simulation results show that the algorithm outperforms other optimization-based tuning algorithms built on PSO, CS, GA, and C-YSGA. The system's rise time and settling time were improved significantly when compared with other controllers. Finally, a robustness analysis was carried out on the designed system to check the reliability of the controller.

Keywords AVR system · FOPID controller · BWO optimization

1 Introduction

Electrical power generation systems are responsible for the production of electricity using various natural resources. These systems incorporate generators that convert mechanical energy into electrical energy. During the conversion process, the systems tend to oscillate around the equilibrium state because of vibrations in the moving parts, load variations, and various external disturbances. To overcome this, the synchronous generators are often fed through exciters. The exciters control the input to the generators in such a way as to hold the output voltage at a stable level.

V. K. Munagala (B) · R. K. Jatoth, Department of ECE, National Institute of Technology, Warangal, India

© Springer Nature Singapore Pte Ltd. 2021. E. S. Gopi (ed.), Machine Learning, Deep Learning and Computational Intelligence for Wireless Communication, Lecture Notes in Electrical Engineering 749, https://doi.org/10.1007/978-981-16-0289-4_2


In this process, AVR systems are used in the control loop to maintain a stable signal level at the input of the exciter so that the generator maintains a constant output voltage at its terminals. Various control techniques have been proposed, such as optimum control, robust control, H∞/H2, and predictive control. All these control strategies use a proportional integral derivative (PID) controller as the basic control element. Modern industrial controllers still use the PID controller because of its simple structure, ease of understanding, and, importantly, robustness under different operating conditions.

In recent years, increasing effort has gone into improving the performance of PID controllers using evolving mathematical concepts. One such technique is the use of fractional order calculus to design PID controllers. This method adds two parameters, the order of integration (λ) and the order of differentiation (μ), to the PID controller; the resulting controllers are called fractional order PID (FOPID) controllers. For an optimal FOPID design, the proportional gain (Kp), integral gain (Ki), differential gain (Kd), λ, and μ should be carefully tuned. The extra parameters give FOPID controllers additional advantages [1].

FOPID controllers have been used in many applications, including speed control of a DC motor [2], control of a servo press system [3], control of water flow in an irrigation canal [4], a boost converter [5], control of a flight vehicle [6], and temperature control [7].

Various objective functions have been utilized in the literature to tune the parameters of FOPID/PID controllers. The basic cost function, the integral of absolute error (IAE), was used in [8] along with an improved artificial bee colony (ABC) algorithm for tuning FOPID controllers. FOPID controller design using a genetic algorithm was discussed in [9]. The FOPID controller tuning problem was solved using the particle swarm optimization (PSO) algorithm in [10]. Another cost function, the integrated squared error (ISE), was utilized in the tuning of FOPID controllers [11]. Minimization of ITSE was used to identify the best-tuned parameters of the FOPID controller [12]. Zwe-Lee Gaing proposed an objective function [13] for the optimum design of FOPID controller, and this function was used for PID controller tuning [14, 15]. In addition to the techniques mentioned, a variety of objective functions were created by combining ITAE, IAE, ISE, and ITSE with weighted combinations of settling time, rise time, steady-state error, and overshoot [12, 16, 17].

Chaotic map-based algorithms were also used in the literature [14, 18, 19]. The advantage of chaotic maps is that they improve the performance of the underlying algorithm, which in turn further optimizes the objective function. Along with these techniques, authors have used multi-objective optimization [20, 21], combining more than one objective function. Here, a set of Pareto solutions is generated, from which a suitable solution is identified. Unknown parameters of the FOPID controller were identified using salp swarm optimization [22] and the cuckoo search optimization algorithm [23]. A brief comparison of various optimization algorithms for AVR system controllers was given in [24]. The design of FOPID controllers for the AVR system using various optimization algorithms was discussed in [25–30]. A frequency-domain approach for optimal tuning of the FOPID controller was discussed in [31, 32].


The paper is organized into the following sections. Section 2 discusses the operation of the AVR system and the analysis of its response parameters. A brief overview of fractional differ-integrals and FOPID controllers is given in Sect. 3. A brief description of the Black Widow Optimization algorithm and its working methodology is given in Sect. 4. The tuning of FOPID controller parameters using the BWO algorithm [33] is described in Sect. 5. In Sect. 6, the performance of the BWO-FOPID controller is compared with other optimization-based FOPID controllers and a robustness analysis of the proposed controller is carried out.

2 Overview of Automatic Voltage Regulator (AVR) System

Synchronous generators are commonly used in power generation systems. Due to variations in the load or sudden changes in power usage, the generators produce oscillations at the output for a significant amount of time. These oscillations may lead to system instability and can cause catastrophes. To improve the terminal voltage stability, generators are controlled by excitation systems and AVR systems. The constituents of the AVR system are the amplifier, exciter, generator, and sensor [14]. The interconnections of the system blocks are shown in Fig. 1. Initially, the generator terminal voltage Vt(s) is given to the sensor circuit, which converts it into a proportional voltage signal Vs(s). This signal is then subtracted from the reference voltage Vref(s) to generate the error signal Ve(s). The error signal is strengthened by the amplifier, and the output of the amplifier is connected to the exciter. The exciter converts its input into a signal suitable to drive the generator. The corresponding mathematical representations of the blocks in the AVR system are given by the following equations. The transfer function of the amplifier is represented as

$$G_a(s) = \frac{K_a}{1 + s\tau_a} \qquad (1)$$

Fig. 1 Components of the AVR system (amplifier Ga, exciter Ge, and generator Gg in the forward path with the sensor Hs in the feedback path; signals Vref(s), Vs(s), Ve(s), and Vt(s))


where Ka is the amplifier gain, which takes values in the range [100, 400], and τa is the time constant of the amplifier, which lies in the range [0.02, 0.1]. The transfer function of the exciter is represented as

$$G_e(s) = \frac{K_e}{1 + s\tau_e} \qquad (2)$$

where Ke is the exciter gain, which takes values in the range [10, 400], and τe is the time constant of the exciter, which lies in the range [0.5, 1.0]. The transfer function of the generator is represented as

$$G_g(s) = \frac{K_g}{1 + s\tau_g} \qquad (3)$$

where Kg is the generator gain, which takes values in the range [0.7, 1.0], and τg is the time constant of the generator, which lies in the range [1.0, 2.0]. The transfer function of the sensor is represented as

$$G_s(s) = \frac{K_s}{1 + s\tau_s} \qquad (4)$$

where Ks is the sensor gain, which takes values in the range [1.0, 2.0], and τs is the time constant of the sensor, which lies in the range [0.001, 0.06]. To understand the behavior and dynamics of the system, its step response is plotted in Fig. 2 and its key performance parameters are identified. Table 1 shows the variation of the key parameters with a change in the value of Kg. Since the terminal voltage varies with load changes, different values of Kg in the range [0.7, 1.0] were considered for the step response. The gain margin and phase margin were calculated from the Bode plot shown in Fig. 3.

Fig. 2 AVR system unit step response (output magnitude versus time in s, 0 to 10 s, for the input and for Kg = 1, 0.9, 0.8, and 0.7)

Table 1 Identified key parameters for AVR system

Parameter             Kg = 0.7    Kg = 0.8    Kg = 0.9    Kg = 1
Rise time             32.021      29.7327     27.8725     26.331
Settling time         423.0789    472.1487    520.1147    624.964
Steady-state error    0.125       0.111       0.1         0.091
Peak overshoot        47.85       52.94       57.61       61.97
Gain margin           1.7365      1.3928      1.1254      0.9116
Phase margin          17.6058     9.7074      3.1952      −2.3247

Fig. 3 Bode plot of AVR system without controller (magnitude in dB and phase in deg versus frequency in rad/s for Kg = 1, 0.9, 0.8, and 0.7)
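The steady-state error row of Table 1 can be cross-checked from the DC gains of Eqs. (1)–(4). For a unit step, the closed loop with forward path Ga·Ge·Gg and feedback Gs settles at KaKeKg / (1 + KaKeKgKs). The sketch below assumes a forward gain product Ka·Ke = 10 and Ks = 1; these values are not stated in this section and are chosen only because they reproduce the tabulated errors.

```python
def steady_state_error(ka, ke, kg, ks):
    # Final value of the unit step response from the DC gains of the loop:
    # forward gain / (1 + forward gain * feedback gain).
    forward = ka * ke * kg
    final_value = forward / (1.0 + forward * ks)
    return 1.0 - final_value

if __name__ == "__main__":
    # Assumed values (not from the text): Ka * Ke = 10, Ks = 1.
    for kg in (0.7, 0.8, 0.9, 1.0):
        print(kg, round(steady_state_error(10.0, 1.0, kg, 1.0), 3))
```

With these assumed gains the error reduces to 1 / (1 + 10·Kg), which explains why it falls only slowly as Kg rises while the stability margins in Table 1 shrink.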

3 Fractional Calculus and Fractional Order Controllers

3.1 Fractional Calculus

Fractional calculus deals with the evaluation of real order integro-differential equations. Here, integration and differentiation are denoted by a common differ-integral operator (D-operator) ${}_aD_t^{\alpha}$, where a and t represent the limits of integration and α is the order of differentiation.

$${}_aD_t^{\alpha} = \begin{cases} \dfrac{d^{\alpha}}{dt^{\alpha}}, & \Re(\alpha) > 0, \\ 1, & \Re(\alpha) = 0, \\ \displaystyle\int_a^t (d\tau)^{-\alpha}, & \Re(\alpha) < 0 \end{cases} \qquad (5)$$

The D-operator has two well-known definitions, the Grunwald–Letnikov (GL) and the Riemann–Liouville (RL) definitions [34]. According to GL, the D-operation is defined as

$${}_aD_t^{\alpha} f(t) = \lim_{h \to 0} \frac{1}{h^{\alpha}} \sum_{k=0}^{\left[\frac{t-a}{h}\right]} (-1)^k \binom{\alpha}{k} f(t - kh) \qquad (6)$$

where h is the computation step size. The RL definition of the D-operator on a function f(t) is

$${}_aD_t^{\alpha} f(t) = \frac{1}{\Gamma(n - \alpha)} \frac{d^n}{dt^n} \int_a^t \frac{f(\tau)}{(t - \tau)^{\alpha - n + 1}} \, d\tau \qquad (7)$$

where α ∈ (n − 1, n) and Γ(·) represents the Gamma function. Fractional order dynamic systems are represented in linear terms of D-operators by the equation

$$a_n D^{\alpha_n} y(t) + a_{n-1} D^{\alpha_{n-1}} y(t) + \cdots + a_0 D^{\alpha_0} y(t) = b_m D^{\beta_m} u(t) + b_{m-1} D^{\beta_{m-1}} u(t) + \cdots + b_0 D^{\beta_0} u(t) \qquad (8)$$

where $D^{\xi} \equiv {}_0D_t^{\xi}$, n ∈ (0, 1, 2, …, k), m ∈ (0, 1, 2, …, k), and α_k and β_k (k ∈ n, n − 1, …, 0) are arbitrary real numbers. Eq. (8) can also be represented in a more standard form as

$${}_aD_t^{q} x(t) = A \cdot x(t) + B \cdot u(t) \qquad (9)$$

$$y(t) = C \cdot x(t) \qquad (10)$$

where $u \in \mathbb{R}^r$, $x \in \mathbb{R}^n$ and $y \in \mathbb{R}^p$ are the input signal, state, and output signal of the fractional order system, $A \in \mathbb{R}^{n \times n}$, $B \in \mathbb{R}^{n \times r}$, $C \in \mathbb{R}^{p \times n}$, and q represents the fractional commensurate order.
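The Grunwald–Letnikov sum in Eq. (6) lends itself to direct numerical evaluation with a finite step h. The sketch below is a minimal illustration, not a routine from the paper or from any toolbox: the function name and default step are assumptions, and the signed binomial weights are built with the recurrence w_0 = 1, w_k = w_{k−1}(1 − (α + 1)/k).

```python
import math

def gl_derivative(f, t, alpha, h=1e-3, a=0.0):
    # Finite-step version of Eq. (6): sum of (-1)^k * C(alpha, k) * f(t - k h),
    # divided by h^alpha, for k = 0 .. floor((t - a) / h).
    n = int((t - a) / h)
    weight = 1.0          # w_0 = (-1)^0 * C(alpha, 0)
    total = f(t)
    for k in range(1, n + 1):
        weight *= 1.0 - (alpha + 1.0) / k   # recurrence for (-1)^k * C(alpha, k)
        total += weight * f(t - k * h)
    return total / h ** alpha

if __name__ == "__main__":
    # Half-derivative of f(t) = t at t = 1; the closed form is 2 / sqrt(pi).
    print(gl_derivative(lambda x: x, 1.0, 0.5), 2.0 / math.sqrt(math.pi))
```

For α = 1 the weights collapse to 1, −1, 0, 0, …, so the sum reduces to the ordinary backward difference, which is a quick sanity check on the recurrence.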

3.2 Fractional Order Controller

Fractional order controllers have additional parameters (λ and μ), which give them flexibility in tuning. Different forms of the fractional controller are shown in Fig. 4. From the figure, it is observed that all the integer-order controllers are special cases of the FOPID controller. The generalized equation representing the fractional PID controller is given by

$$G_C(s) = \frac{U(s)}{E(s)} = K_p + \frac{K_i}{s^{\lambda}} + K_d s^{\mu} \qquad (11)$$

U(s) represents the output and E(s) represents the input of the controller. Generally, the input for any controller is the difference between the desired signal and the response signal. Correspondingly, in the time domain, the equation can be represented as

$$u(t) = K_p e(t) + K_i D_t^{-\lambda} e(t) + K_d D_t^{\mu} e(t) \qquad (12)$$

Therefore, from Eqs. (11) and (12), the real terms λ and μ make Gc(s) an infinite order filter because of the real differentiation and integration.

Fig. 4 Different forms of FOPID controller in the plane of integral order (λ) and derivative order (μ): P (λ = 0, μ = 0), PI (λ = 1, μ = 0), PD (λ = 0, μ = 1), PID (λ = 1, μ = 1), with the FOPID controller covering the region up to λ = 2 and μ = 2
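Equation (11) can be evaluated on the imaginary axis (s = jω) to see how the fractional orders shape the controller's frequency response. The sketch below is illustrative only; the function name is an assumption and it relies solely on Python's built-in complex power, which takes the principal branch.

```python
def fopid_response(omega, kp, ki, kd, lam, mu):
    # G_C(j*omega) = Kp + Ki / (j*omega)^lambda + Kd * (j*omega)^mu, Eq. (11).
    # omega must be non-zero because of the integral term.
    s = complex(0.0, omega)
    return kp + ki / s ** lam + kd * s ** mu

if __name__ == "__main__":
    # Integer orders recover the classical PID response.
    print(fopid_response(1.0, 1.0, 2.0, 3.0, 1, 1))
    # Fractional orders, here using the BWO-FOPID values reported in Table 3.
    print(fopid_response(10.0, 2.6597, 0.7462, 0.4263, 1.0106, 1.3442))
```

Setting λ = μ = 1 reproduces the PID special case in Fig. 4, while non-integer orders rotate the integral and derivative contributions by λ·90° and μ·90° instead of exactly 90°.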

4 Black Widow Optimization

The algorithm was developed by Vahideh et al. [33] based on the lifestyle of the black widow spider. Generally, the female spiders are more dominant than the male spiders and are mostly active at night. Whenever female spiders want to mate, they put pheromone on the web and males are attracted to it. After mating, the female spider consumes the male spider. The female spiders then lay eggs, as shown in Fig. 5, which mature in 8–11 days. The hatched black widow spiderlings exhibit sibling cannibalism because of competition and a lack of food sources. In some special cases, the spiderlings also consume the mother slowly. Because of this, only strong black widow spiders survive during their life cycle.


Fig. 5 Female black widow spider with eggs in her web [33]

4.1 Initial Population

In the algorithm, a variable in the solution space is called a widow and the solution is called a black widow spider. Therefore, for an n-dimensional problem, the widow matrix consists of n elements.

$$\text{Widow} = [x_1, x_2, x_3, \ldots, x_n] \qquad (13)$$

The fitness values of the population are represented as

$$\text{Fitness} = f(x_1, x_2, x_3, \ldots, x_n) \qquad (14)$$

The optimization algorithm starts by generating the required candidate widow matrix of size npop × n as the initial population. In the next stage, the next generation is produced from the initial population using procreation.
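The flow chart in Fig. 6 indicates that the initial population is generated with a logistic map. A minimal sketch of that idea follows, assuming the fully chaotic map x ← 4x(1 − x), an arbitrary seed of 0.7, and linear scaling into per-parameter bounds; none of these choices is stated in the text, and the example bounds are the ranges listed in Table 2.

```python
def logistic_population(npop, bounds, x0=0.7, r=4.0):
    # Build an npop x n widow matrix; each entry is a logistic-map iterate
    # in [0, 1] scaled linearly into the (low, high) bounds of its parameter.
    population = []
    x = x0
    for _ in range(npop):
        widow = []
        for low, high in bounds:
            x = r * x * (1.0 - x)
            widow.append(low + x * (high - low))
        population.append(widow)
    return population

if __name__ == "__main__":
    # Bounds for Kp, Ki, Kd, lambda, mu as listed in Table 2.
    table2_bounds = [(0.1, 3.0), (0.1, 1.0), (0.1, 0.5), (0.5, 1.5), (0.5, 1.5)]
    widows = logistic_population(50, table2_bounds)
    print(len(widows), widows[0])
```

A chaotic sequence is deterministic for a given seed, so runs are repeatable while the candidates still spread over the whole search box.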

4.2 Procreate

To produce the next generation of the population, the male and female spiders mate in their corresponding webs. To implement this, a matrix called the alpha matrix, with the same length as the widow matrix and with random numbers as its elements, is used. The corresponding equations are shown in (15) and (16), where x1 and x2 represent the parents and y1 and y2 represent the children.

$$y_1 = \alpha \times x_1 + (1 - \alpha) \times x_2 \qquad (15)$$

$$y_2 = \alpha \times x_2 + (1 - \alpha) \times x_1 \qquad (16)$$

The entire process is repeated n/2 times and, lastly, the parents and spiderlings are combined and sorted according to their fitness values. The number of parents participating in procreation is decided by the procreation rate (PR).
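Equations (15) and (16) amount to an element-wise blend of two parents with a random weight vector. The sketch below illustrates one mating of a single parent pair, assuming the alpha entries are drawn uniformly from [0, 1) and redrawn on each of the n/2 repetitions; the function name and these sampling details are assumptions.

```python
import random

def procreate(parent1, parent2, rng):
    # Repeat the blend n/2 times (n = number of variables); every repetition
    # draws a fresh alpha vector and yields two children, Eqs. (15) and (16).
    n = len(parent1)
    children = []
    for _ in range(n // 2):
        alpha = [rng.random() for _ in range(n)]
        y1 = [a * p + (1.0 - a) * q for a, p, q in zip(alpha, parent1, parent2)]
        y2 = [a * q + (1.0 - a) * p for a, p, q in zip(alpha, parent1, parent2)]
        children.extend([y1, y2])
    return children

if __name__ == "__main__":
    rng = random.Random(0)
    kids = procreate([2.0, 0.5, 0.3, 1.0, 1.2], [1.0, 0.9, 0.1, 1.4, 0.8], rng)
    print(len(kids))
```

Because each child is a convex combination of the parents, every child stays inside the box spanned by the two parents, and the two children of one draw always sum to the sum of the parents.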

4.3 Cannibalism

The black widow spiders exhibit three types of cannibalism. In sexual cannibalism, the female spider consumes the male spider after mating. In sibling cannibalism, stronger spiderlings eat weaker spiderlings. In the third type, spiderlings sometimes eat their mother. In the algorithm, this behavior is implemented as a selection of the population according to fitness values. The population is selected based on the cannibalism rating (CR).

4.4 Mutation

From the population, Mutepop spiders are selected and mutation is applied at two randomly selected positions of each spider. The process of mutation is shown in Fig. 7. The number of spiders to be mutated (Mutepop) is selected according to the mutation rate (PM). In the algorithm, the procreation rate was chosen as 0.6, the cannibalism rate as 0.44, and the mutation rate as 0.4. The complete flow of the black widow optimization algorithm is shown in Fig. 6.
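The cannibalism and mutation stages can be sketched as two small helpers. Two interpretations here are assumptions rather than statements from the text: cannibalism is taken to keep the fittest (1 − CR) fraction of the candidates under minimisation, and mutation is taken to swap the values at the two randomly selected positions.

```python
import random

def cannibalise(population, fitness, cannibalism_rate):
    # Sort by cost (lower is better) and keep the best (1 - CR) fraction,
    # always retaining at least one survivor.
    ranked = sorted(population, key=fitness)
    survivors = max(1, round(len(ranked) * (1.0 - cannibalism_rate)))
    return ranked[:survivors]

def mutate(widow, rng):
    # Exchange the entries at two distinct, randomly chosen positions
    # and return the result as a new list.
    i, j = rng.sample(range(len(widow)), 2)
    mutated = list(widow)
    mutated[i], mutated[j] = mutated[j], mutated[i]
    return mutated

if __name__ == "__main__":
    rng = random.Random(0)
    pop = [[float(v)] for v in (3, 1, 2, 0.5, 4, 5, 6, 7, 8, 9)]
    kept = cannibalise(pop, lambda w: sum(x * x for x in w), 0.44)
    print(len(kept), mutate([1, 2, 3, 4, 5], rng))
```

When the variables have different ranges, as with the controller gains and orders, a swapped value may fall outside the bounds of its new position, so a practical implementation would likely clip or resample after mutation.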

5 Proposed BWO-FOPID Controller

The BWO-FOPID controller provides two additional degrees of freedom, which allows a robust controller to be designed for a given application. The process of tuning the FOPID controller using the BWO algorithm is shown as a block diagram in Fig. 8. Vref(s) is the reference voltage that should be maintained by the AVR system at its terminals. Vt(s) is the actual terminal voltage produced by the system. Ve(s) is the error voltage, which indicates the difference between Vref(s) and Vt(s). For each iteration of the BWO algorithm, a population of Kp, Ki, Kd, λ, and μ values is generated and substituted in the objective function. The FOPID controller takes Ve(s) as the input signal and produces the corresponding control signal. For this signal, the AVR system terminal voltage and error are calculated. The process is repeated until the termination criteria are met using the method mentioned in Sect. 2. Finally, the best values of the parameters are identified and used to design the optimum FOPID controller. The designed controller is then inserted into the system. The controller output is given as input to the AVR system, which produces the corresponding terminal voltage. The terminal voltage is again compared with the reference voltage and the error signal is produced. The process is repeated until the error signal becomes zero. When the desired level is reached, the controller produces a constant U(s) to hold the terminal voltage at that level.

Fig. 6 Flow chart of black widow optimization (BWO) algorithm [33]. Blocks: start; generate population using logistic map; evaluate fitness of individuals; randomly select parents; procreate; cannibalism; mutation; update population; stop condition (no: repeat, yes: end)

Fig. 7 Mutation in BWO algorithm

Fig. 8 Block diagram of BWO-FOPID controller (the BWO optimization block tunes KP, KI, KD, λ, and μ of the controller KP + KI s^−λ + KD s^μ using the objective function J = ZLG; the controller takes Ve(s), formed from Vref(s) and the sensor output, and its output U(s) drives the AVR system of amplifier, exciter, and generator, which produces Vt(s))

During the FOPID controller design, the ZLG optimization function was used to tune the parameters. Although various standard optimization functions such as IAE, ISE, ITAE, and ITSE are available, the ZLG function is reported to produce better results. The ZLG optimization function [13] is given in Eq. (17).

$$\text{ZLG} = \left(1 - e^{-\beta}\right)\left(M_p + E_{ss}\right) + e^{-\beta}\left(T_s - T_r\right) \qquad (17)$$

The term Mp represents the maximum peak overshoot, Ess is the steady-state error, and Ts and Tr represent the settling time and rise time of the system, respectively. β is an adjustment parameter and is generally taken as 1 [13]. The identified values of the controller parameters are given in Table 3. The convergence curve of the BWO-FOPID algorithm during parameter identification is shown in Fig. 9. The range of parameters considered for the optimization process is given in Table 2.
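Equation (17) is straightforward to evaluate once the four step-response measures are available. The sketch below is a minimal illustration with an assumed function name; how the overshoot is scaled (percent or fraction) is not specified in this section, so the sample inputs are arbitrary.

```python
import math

def zlg_cost(overshoot, steady_state_error, settling_time, rise_time, beta=1.0):
    # Eq. (17): (1 - e^-beta) * (Mp + Ess) + e^-beta * (Ts - Tr).
    w = math.exp(-beta)
    return (1.0 - w) * (overshoot + steady_state_error) + w * (settling_time - rise_time)

if __name__ == "__main__":
    # Arbitrary illustrative measures, beta = 1 as in the text.
    print(zlg_cost(0.05, 0.001, 0.8, 0.2))
```

A larger β shifts the weight toward overshoot and steady-state error, while a smaller β emphasises the gap between settling time and rise time.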

6 Results and Discussions

All the simulations were performed using MATLAB/Simulink (with FOMCON toolbox) version 8.1a on a computer with an Intel i5 processor @ 3.00 GHz and 8 GB RAM. For the BWO optimization algorithm, the total population was chosen as 50 and 35 iterations were performed.

6.1 Step Response

The tuned values of the FOPID parameters are listed in Table 3. A comparison of FOPID controllers designed using different optimization algorithms is made using the step response displayed in Fig. 10. From the figure, it can be observed that the BWO-FOPID controller produces a lower overshoot than the others. To further investigate the controller performance, Ts, Tr, and Ess were calculated and compared with those of the other FOPID controllers. The BWO algorithm produces the best parameter values because of the cannibalism stage, in which weak solutions are automatically discarded and only strong solutions remain. The BWO-FOPID controller has the best settling time (0.1727 s) and overshoot (1.2774%) and produces a very small steady-state error. According to Table 4, its rise time is a little higher than those of the PSO-FOPID and CS-FOPID controllers. Moreover, the PSO-FOPID controller produced the highest overshoot of 22.58%, whereas the GA-FOPID controller produced a high rise time of 1.3008 s and settling time of 1.6967 s. Since overshoot causes more severe problems than rise time in voltage regulator systems, more importance should be given to optimizing overshoot. If the operating environment of the system is strictly constrained, then a tradeoff can be made between rise time and overshoot. The performance parameters of the different FOPID controllers are compared in Table 4.

Fig. 9 Convergence curve (cost value versus iterations for BWO; the marked final point is at iteration 35 with a cost of 4.307)

Table 2 Range of Kp, Ki, Kd, λ, and μ

Parameter    Lower value    Upper value
Kp           0.1            3
Ki           0.1            1
Kd           0.1            0.5
λ            0.5            1.5
μ            0.5            1.5

Table 3 Obtained values of Kp, Ki, Kd, λ, and μ

Algorithm-controller     Kp        Ki        Kd        λ         μ
BWO-FOPID (Proposed)     2.6597    0.7462    0.4263    1.0106    1.3442
C-YSGA FOPID [19]        1.7775    0.9463    0.3525    1.206     1.1273
PSO-FOPID [10]           1.5338    0.6523    0.9722    1.209     0.9702
CS-FOPID [15]            2.549     0.1759    0.3904    1.38      0.97
GA-FOPID [12]            0.9632    0.3599    0.2816    1.8307    0.5491

Fig. 10 Step response comparison of controllers (magnitude versus time in s, 0 to 10 s, for C-YSGA, PSO, CS, GA, and BWO)

Table 4 Performance measures for various controllers

Algorithm-controller     Rise time (s)    Settling time (s)    Overshoot (%)    Steady-state error
BWO-FOPID (Proposed)     0.1127           0.1727               1.2774           1.8972E−04
C-YSGA FOPID [19]        0.1347           0.2                  1.89             0.009
PSO-FOPID [10]           0.0614           1.3313               22.58            0.0175
CS-FOPID [15]            0.0963           0.9774               3.56             0.0321
GA-FOPID [12]            1.3008           1.6967               6.99             0.0677

6.2 Robust Analysis

To assess the reliability of the designed controller, a robustness analysis was performed by changing the time constants of the subsystems in the range of −20% to +20%. Step responses were plotted for variations in the values of τa, τe, τg, and τs. From Fig. 11a–d, it was observed that the BWO-FOPID controller performs well across this full range, that is, over a total spread of 40% in each parameter value.

Fig. 11 Variation of time constants of AVR system a τa, b τe, c τg, d τs (each panel plots terminal voltage versus time in s, 0 to 2 s, for changes of −20%, −10%, 10%, and 20%)
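The robustness sweep behind Fig. 11 varies one time constant at a time by −20%, −10%, 10%, and 20%. The sketch below only generates those perturbed parameter sets; the nominal time constants are hypothetical placeholders picked from inside the ranges quoted in Sect. 2, since the values used in the simulations are not stated here.

```python
def perturbed_time_constants(nominal, percentages=(-20, -10, 10, 20)):
    # For every time constant, produce one parameter set per percentage change
    # while all the other time constants stay at their nominal values.
    cases = []
    for name in nominal:
        for pct in percentages:
            case = dict(nominal)
            case[name] = nominal[name] * (1.0 + pct / 100.0)
            cases.append((name, pct, case))
    return cases

if __name__ == "__main__":
    # Hypothetical nominal values, NOT taken from the paper.
    nominal = {"tau_a": 0.1, "tau_e": 0.5, "tau_g": 1.0, "tau_s": 0.01}
    for name, pct, case in perturbed_time_constants(nominal):
        print(name, pct, case[name])
```

Each of the 16 parameter sets would then be fed to the closed-loop simulation to produce one curve in the corresponding panel.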

7 Conclusion

A new meta-heuristic optimization-based approach to FOPID controller design for regulating the terminal voltage of the AVR system was presented. The method uses the Black Widow Optimization algorithm to identify the optimum parameter values of the proposed controller. The results show that the BWO-tuned fractional controller produced better parameter values. This is due to the cannibalism exhibited by the population, in which only the stronger individuals survive. As a result, the controller improves the rise time and settling time of the overall system with acceptable overshoot. To check the controller performance, it was compared with the C-YSGA FOPID, PSO-FOPID, CS-FOPID, and GA-FOPID controllers. To study the behavior of the controller under parameter uncertainty, a robustness analysis was performed, and the results show that the proposed BWO-tuned FOPID controller maintained its performance under ±20% changes in the time constants.

Acknowledgements This work is funded by the Department of Science and Technology, DST-ICPS division, Govt. of India, under the grant number DST/ICPS/CPS-INDIVIDUAL/2018/433(G).



LSTM Network for Hotspot Prediction in Traffic Density of Cellular Network

S. Swedha and E. S. Gopi

Abstract This paper implements a long short-term memory (LSTM) network to predict hotspot parameters in the traffic density of cellular networks. The traffic density depends on numerous factors such as time, location and the number of mobile users connected, and it exhibits spatial and temporal relationships. However, only certain regions, known as hotspots, have higher data rates. A hotspot is defined as a circular region with a particular centre and radius where the traffic density is the highest compared to other regions at a given timestamp. Forecasting traffic density is very important, especially in urban areas. Prediction of hotspots using LSTM would result in better resource allocation, beamforming, handovers and so on. We propose two methods, namely the log likelihood ratio (LLR) method and the cumulative distribution function (CDF) method, to compute the hotspot parameters. On comparing the performances of the two methods, it can be concluded that the CDF method is more efficient and less computationally complex than the LLR method.

Keywords Hotspot · LSTM · LLR · CDF · Traffic density · Cellular networks

S. Swedha (B) · E. S. Gopi, National Institute of Technology, Tiruchirappalli, India. e-mail: [email protected]

© Springer Nature Singapore Pte Ltd. 2021. E. S. Gopi (ed.), Machine Learning, Deep Learning and Computational Intelligence for Wireless Communication, Lecture Notes in Electrical Engineering 749, https://doi.org/10.1007/978-981-16-0289-4_3

1 Introduction

Wireless cellular networks consist of several base stations in a given geographical area. The traffic density of a particular network in a specific area depends on numerous factors such as time, location, number of base stations and users connected. Traffic density of cellular networks can be computed as the number of users accessing, or packets/bytes transmitted by, every base station in a given area. The temporal autocorrelation and the spatial correlation among neighbouring base stations of cellular network data are nonzero (refer [1, 2]). However, at a given time, only certain locations in the given area have a high influx of traffic density. We term such locations 'hotspots'. Forecasting the traffic density of mobile users with high accuracy, especially in urban areas, is the need of the hour (refer [3]). The aim of the paper is to identify such hotspots using two different methods and predict the future hotspot in the given area.

Spatio-temporal neural network architectures based on deep neural networks have also been proposed by earlier researchers to address the same problem, and it has been concluded that long-term prediction is enhanced through such methods. Thus, the importance of deep learning algorithms in mobile and wireless networking, and even in hotspot prediction, has been further established (refer [4]). Hybrid deep learning models that use LSTM for temporal modelling and an auto-encoder-based deep model for spatial modelling have been studied previously (refer [1]). Our motivation to employ LSTM is further confirmed through references such as [1–3, 5–8]. For instance, sequence learning using LSTM networks, with a general end-to-end approach that predicts target variables of very long sequences under minimal assumptions about the data sequence, has been presented in [6], where it was concluded that the LSTM network performs well even for very long sequences. Reference [5] reviews an illustrative benchmark problem wherein conventional LSTM outperforms RNN. It also addresses a shortcoming of LSTM networks by proposing forget gates, which solve the problem of processing input streams that are continuous and do not have marked sequence ends. Thus, we employ LSTM for prediction and compare the performance of the two proposed methods.

The data is represented as images for better visualization and interpretation (refer [9]). Some areas in these images are denser than others, and these form the hotspot region at a given timestamp. In the first method, called the log likelihood ratio (LLR) method (refer [10]), we find the centre and radius of the hotspot using LLR.
We consider two hypotheses: the null hypothesis, $H_0$, assumes that the traffic density is uniformly distributed, and the other hypothesis, $H_1$, represents the actual distribution of traffic density. We find the LLR for the two hypotheses and maximize it to obtain the hotspot parameters. We train an LSTM network whose input is a sequence of raw data for 10 consecutive timestamps and whose target variable is the hotspot parameters of the 11th timestamp.

In the second method, called the cumulative distribution function (CDF) method, we compute the CDF starting from the centre of the hotspot found through the LLR method by increasing the radius from a minimum radius to the maximum radius that covers the entire image. We use the CDF to compute the expectation value in each contour, which is the area between two concentric circles, and plot it as an image. Using the CDF, we determine the radius of the hotspot as the least radius whose CDF is greater than a threshold value that we fix depending on the data. We train an LSTM network whose input is a sequence of CDFs for 10 consecutive timestamps and whose target variable is the hotspot parameters of the 11th timestamp, where the radius of the hotspot is the radius computed using the CDF.

The proposed methods differ from the existing methods in that they predict the hotspot parameters using LSTM. The second proposed method makes use of the CDF to reduce complexity and compute the hotspot parameters. Further, the data is visualized as images with the hotspot region plotted on the image in order to better understand the physical implication of the data.

In Sect. 2, we briefly discuss how the data representing traffic density can be visualized as images. We use a matrix to store the data and scale it to 255 in order to represent it as images. We use MATLAB to plot the images. Section 3 discusses the LLR algorithm and its implementation to find the hotspot parameters. Section 4 describes the CDF method's algorithm to compute the hotspot parameters and each contour's expectation value. In Sect. 5, the results of the two proposed methods are compared. The various applications and extensions of the work presented in this paper are discussed in Sect. 6.

2 Dataset Collection and Representation

The dataset contains the traffic density at a city-level scale for one week, collected at each hour (refer [11]). It consists of the base station number and the number of users, bytes and packets accessing that particular base station at a given hour. The corresponding latitude and longitude of each base station are given in a separate file. In order to represent the given data as an image for every timestamp, we fix the image size as 151 × 151 (see Fig. 1). The latitude and longitude are normalized between their minimum and maximum values and scaled to the range 0 to 150, with latitude serving as the x coordinate and longitude serving as the y coordinate of the image. A matrix $M$ of dimension 151 × 151 is formed, and the element corresponding to the location of each base station is assigned the number of users. It must be noted that the numbers of packets and bytes are not used to represent the data as an image. If more than one base station's location corresponds to the same pixel coordinates, then we add the data value to the value already stored at the corresponding element of $M$. The matrix is then scaled to 255 to represent it as an image. Finally, 100 times the logarithm of the matrix is represented as an image using MATLAB functions. This process is repeated for all the timestamps.
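The construction of $M$ described above can be sketched in Python with NumPy as follows. This is only an illustrative sketch, not the authors' MATLAB code: the function names `traffic_image` and `display_values` are hypothetical, rounding to the nearest pixel is an assumption, and `log1p` is assumed in place of a plain logarithm so that empty pixels do not produce log(0).

```python
import numpy as np

def traffic_image(lat, lon, users, size=151):
    """Accumulate per-base-station user counts on a size x size grid."""
    lat = np.asarray(lat, dtype=float)
    lon = np.asarray(lon, dtype=float)
    users = np.asarray(users, dtype=float)

    def to_pixel(v):
        # Normalise to [0, 1], then scale to integer indices 0 .. size-1.
        span = v.max() - v.min()
        if span == 0:
            return np.zeros(v.shape, dtype=int)
        return np.rint((v - v.min()) / span * (size - 1)).astype(int)

    x = to_pixel(lat)  # latitude  -> x coordinate
    y = to_pixel(lon)  # longitude -> y coordinate

    M = np.zeros((size, size))
    # np.add.at sums repeated indices, so stations sharing a pixel add up.
    np.add.at(M, (x, y), users)

    # Scale so that the busiest pixel maps to 255.
    if M.max() > 0:
        M *= 255.0 / M.max()
    return M

def display_values(M):
    # Assumption: log(1 + M) is used so that empty pixels do not give log(0).
    return 100.0 * np.log1p(M)
```

`np.add.at` is used instead of `M[x, y] += users` because the latter silently keeps only one contribution when two base stations fall on the same pixel.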

3 Hotspot Prediction Using LLR Method

In this proposed solution, we define and identify a hotspot using the log likelihood ratio (LLR) method, as discussed in Sect. 3.1. After identifying the hotspot parameters, which are the centre and radius of the hotspot, we train the LSTM network to predict the hotspot parameters of a future timestamp. For this, we reshape the entire matrix of the dataset after normalization and use it as input to the LSTM network. We give the raw data values of 10 consecutive timestamps as input to the LSTM network and use the 11th timestamp's hotspot parameters, calculated through the LLR method, as target variables.
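The pairing of 10 consecutive timestamps with the 11th timestamp's hotspot parameters can be sketched as a sliding window. The helper name `make_sequences` and the array layout (one row per timestamp) are assumptions made for illustration; the paper does not describe how its training pairs were assembled in code.

```python
import numpy as np

def make_sequences(features, targets, window=10):
    """Build (input sequence, target) pairs with a sliding window.

    features: array of shape (T, D), one feature vector per timestamp.
    targets:  array of shape (T, 3), hotspot parameters (r, x, y) per timestamp.
    Returns X of shape (T - window, window, D) and Y of shape (T - window, 3).
    """
    features = np.asarray(features)
    targets = np.asarray(targets)
    n = len(features) - window
    if n <= 0:
        raise ValueError("need more timestamps than the window length")
    # Window i covers timestamps i .. i+window-1 and predicts timestamp i+window.
    X = np.stack([features[i:i + window] for i in range(n)])
    Y = targets[window:window + n]
    return X, Y
```

For a week of hourly data (168 timestamps, if none are missing) this would give 158 input-target pairs, each input holding 10 vectors of size 22,801.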


Fig. 1 Representation of traffic density data as image for 115th timestamp

3.1 Algorithm to Find Hotspot Using LLR

A hotspot is defined as a region where the traffic density is high. It can be of any shape. Based on the previous work done in [10], it has been concluded that circular hotspots are better than ring hotspots. Thus, in this paper, we consider the hotspot to be circular. To identify the coordinates of the centre and the radius of the hotspot, we employ the 'log likelihood ratio' (LLR) method.

Let the total region be represented as $S$. At a given location on the image with pixel coordinates $(x, y)$, consider a circular region $R$ of radius $r$. At a given timestamp, let $J$ be the total number of users in the entire region, $K$ be the expected number of users within the circular region $R$ assuming uniform distribution, where

$$K = \frac{\text{area}(R) \times J}{\text{area}(S)},$$

and $L$ be the actual number of users within the circular region $R$. Let $z$ be a random variable vector with 22,801 (151 × 151) elements, where each element represents one pixel of the image and takes the value 0 or 1. We assume that the elements are independent of each other. A zero indicates that the pixel is outside the constructed circle, and a one indicates that the pixel is within the circle. In other words, the values 0 and 1 indicate whether or not the pixel lies inside the chosen hotspot region.

Consider the null hypothesis $H_0$ as uniform distribution of traffic density across the given area for the given circular region $R$. Under $H_0$, the probability that a pixel is within the hotspot is $\frac{K}{J}$ and the probability that a pixel is outside the hotspot is $\frac{J-K}{J}$. Since $L$ users are within the hotspot $R$, the probability that $z$ follows $H_0$ is

$$p(z \mid H_0) = \left(\frac{K}{J}\right)^{L} \left(\frac{J-K}{J}\right)^{J-L}.$$

Consider the hypothesis $H_1$ as the actual nonuniform distribution of traffic density across the given area. For the same circular region $R$ with $L$ users inside it, under $H_1$ the probability that a pixel is within the hotspot is $\frac{L}{J}$ and the probability that a pixel is outside the hotspot is $\frac{J-L}{J}$. Since $L$ users are within the hotspot $R$, the probability that $z$ follows $H_1$ is

$$p(z \mid H_1) = \left(\frac{L}{J}\right)^{L} \left(\frac{J-L}{J}\right)^{J-L}.$$

Thus, the log likelihood ratio is defined as

$$\text{LLR} = \log\left(\frac{p(z \mid H_1)}{p(z \mid H_0)}\right) = L \log\left(\frac{L}{K}\right) + (J - L) \log\left(\frac{J-L}{J-K}\right).$$

In order to find the hotspot, we need to maximize the LLR. We increase the radius $r$ from 4 to 8 units in steps of 0.1 unit and traverse every pixel location. The circle having the maximum LLR value is labelled as the hotspot (see Fig. 2).

Fig. 2 a Image of 115th timestamp, b Hotspot computed using LLR method for 115th timestamp
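The search can be sketched as a brute-force loop over centres and radii that evaluates the LLR expression above. The sketch below is a plain NumPy illustration under stated assumptions: the function names are hypothetical, area(R) is taken as the number of image pixels inside the circle (so circles near the border are clipped), and terms of the form 0·log 0 are treated as 0. It is not optimized; on a 151 × 151 image with 41 radii it evaluates close to a million circles.

```python
import numpy as np

def llr_for_circle(M, dist2, r):
    """LLR of the circle dist2 <= r^2; dist2 holds squared distances to the centre."""
    inside = dist2 <= r * r
    if inside.all():
        return 0.0                      # circle covers S, so H0 and H1 coincide
    J = M.sum()                         # total users in S
    L = M[inside].sum()                 # actual users in R
    K = J * inside.sum() / M.size       # expected users in R: area(R) x J / area(S)
    # Treat 0 * log(0) as 0 so that empty or all-inclusive circles stay defined.
    inner = L * np.log(L / K) if L > 0 else 0.0
    outer = (J - L) * np.log((J - L) / (J - K)) if L < J else 0.0
    return inner + outer

def find_hotspot(M, radii=None):
    """Return ((r, cx, cy), llr) for the circle with the largest LLR."""
    if radii is None:
        radii = np.arange(40, 81) / 10.0    # 4.0, 4.1, ..., 8.0
    xx, yy = np.meshgrid(np.arange(M.shape[0]), np.arange(M.shape[1]), indexing="ij")
    best_llr, best = -np.inf, None
    for cx in range(M.shape[0]):
        for cy in range(M.shape[1]):
            dist2 = (xx - cx) ** 2 + (yy - cy) ** 2
            for r in radii:
                value = llr_for_circle(M, dist2, r)
                if value > best_llr:
                    best_llr, best = value, (float(r), cx, cy)
    return best, best_llr
```

For a perfectly uniform image, $L = K$ for every circle that lies inside the image and the LLR is zero, which matches the intuition that no region stands out under $H_0$.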

3.2 LSTM Architecture to Predict Future Hotspot Using LLR

Matrix $M$ is reshaped into a 22,801-dimensional vector, which is then normalized. A sequence of such vectors from 10 consecutive timestamps is given as input to the LSTM network. The target variable consists of the 11th timestamp's normalized hotspot parameters $(r, x, y)$, where $r$ is the radius of the circle and $(x, y)$ is its centre. The LSTM architecture (see Fig. 3) consists of two layers. It uses the default functions, namely the hyperbolic tangent function and the sigmoid function, at the input gate, forget gate and output gate (refer [12]).

$x_t$ represents the input, which in our case is 10 consecutive vectors of size 22,801 each, and $y_t$ represents the output of the LSTM cell. $y_{t-1}$ represents the previous output from the cell. $ig_t$, $fg_t$, $og_t$ and $cs_t$ are the input gate, forget gate, output gate and cell state vectors, respectively, at the current instant $t$. $P$ and $Q$ are the weight matrices, and $b$ denotes the biases of the gates. The equations at the input gate, forget gate and output gate are given by

$$ig_t = \sigma(P_i y_{t-1} + Q_i x_t + b_i),$$

$$fg_t = \sigma(P_f y_{t-1} + Q_f x_t + b_f),$$

$$og_t = \sigma(P_o y_{t-1} + Q_o x_t + b_o).$$

The equation of the cell state vector is given by

$$cs_t = fg_t \mathbin{.\ast} cs_{t-1} + ig_t \mathbin{.\ast} \tanh(P_{cs} y_{t-1} + Q_{cs} x_t + b_{cs}),$$

and the equation of the output $y_t$ is given by

$$y_t = og_t \mathbin{.\ast} \tanh(cs_t),$$

where $\sigma$ is the sigmoid function and $.\ast$ represents the element-wise vector product (refer [12]). The ten consecutive timestamps and the resulting LSTM-LLR prediction of the hotspot parameters of the 11th timestamp can be seen in Fig. 4 for timestamp 115.

Fig. 3 LSTM cell diagram

Fig. 4 Traffic density values of 10 consecutive traffic density images (time 105 to time 114), as plotted in Fig. 1, are each reshaped into a vector of size 22,801. The raw data is given as input sequence to the LSTM. It predicts the hotspot parameters of the 11th timestamp (time 115). We have plotted the circle found through LLR computation (red) and the LSTM-LLR predicted circle (black). The zoomed portion of the predicted hotspot can be seen in the 12th subplot
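A single time step of the cell equations above can be written directly in NumPy. This is a minimal sketch for one layer: the dictionary layout of `P`, `Q` and `b` (keys `"i"`, `"f"`, `"o"`, `"cs"`) and the function name `lstm_step` are illustrative assumptions, and the hidden size is left to the caller because the paper does not state it.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def lstm_step(x_t, y_prev, cs_prev, P, Q, b):
    """One LSTM time step following the gate equations of Sect. 3.2.

    P[k]: (h, h) recurrent weights, Q[k]: (h, d) input weights, b[k]: (h,) biases,
    for k in {"i", "f", "o", "cs"}.
    """
    ig = sigmoid(P["i"] @ y_prev + Q["i"] @ x_t + b["i"])        # input gate
    fg = sigmoid(P["f"] @ y_prev + Q["f"] @ x_t + b["f"])        # forget gate
    og = sigmoid(P["o"] @ y_prev + Q["o"] @ x_t + b["o"])        # output gate
    cand = np.tanh(P["cs"] @ y_prev + Q["cs"] @ x_t + b["cs"])   # candidate state
    cs = fg * cs_prev + ig * cand      # element-wise products (.* in the text)
    y = og * np.tanh(cs)
    return y, cs
```

To process a sequence of 10 timestamps, the step would be called 10 times, feeding each returned `y` and `cs` back in as `y_prev` and `cs_prev`.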

4 Hotspot Prediction Using CDF Method

In the first proposed solution, we reshape the entire normalized matrix of the dataset and use it as input to the LSTM layer. This has higher complexity due to the large input size. Hence, our aim is to predict the hotspot parameters through a simpler approach. One such approach is to use the 'cumulative distribution function (CDF)', which provides a more compact representation of the dataset. In this method, we compute expectation values within contours, which are concentric circles around the centre of the hotspot calculated through the LLR method, and represent them as an image. We give the CDF values appended with the centre coordinates (for each timestamp) of 10 consecutive timestamps as input to the LSTM network and use the 11th timestamp's hotspot parameters, calculated through the CDF method, as target variables.

4.1 Algorithm to Find Hotspot Using Cumulative Distribution Function

4.1.1 Computation of CDF

We take the centre of the hotspot computed using the LLR method. We start with a radius of 4 units and increase it by 1 unit until all the pixels are covered. Since the image is square, the maximum radius is $\sqrt{2}$ times the side length of the square, which occurs when the hotspot is at a corner of the square. Thus, the radius is incremented from 4 to $\sqrt{2} \times 151 \approx 213$ units in steps of 1 unit. At each iteration, we add the values at all pixels within the circle and divide the result by the sum of the values at all pixels. We store these values in a 210-dimensional vector (one element for each radius from 4 to 213), which is the CDF. As one can expect, the last element of the CDF vector is always 1. We take as hotspot radius the least radius for which the CDF value is greater than 0.1. The value 0.1 was chosen after experimentation: at this value, the radius calculated through the LLR method and the radius calculated through the CDF method almost coincide (see Fig. 5 as an example). This value may differ depending on the dataset.
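The CDF computation and the threshold rule can be sketched as follows. The function names `radial_cdf` and `cdf_radius` are hypothetical, and the sketch assumes that `M` holds non-negative values with a non-zero sum.

```python
import numpy as np

def radial_cdf(M, cx, cy, r_min=4, r_max=213):
    """Fraction of the total traffic inside circles of growing radius around (cx, cy)."""
    xx, yy = np.meshgrid(np.arange(M.shape[0]), np.arange(M.shape[1]), indexing="ij")
    dist2 = (xx - cx) ** 2 + (yy - cy) ** 2
    radii = np.arange(r_min, r_max + 1)       # 4, 5, ..., 213 -> 210 values
    total = M.sum()
    cdf = np.array([M[dist2 <= r * r].sum() / total for r in radii])
    return radii, cdf

def cdf_radius(radii, cdf, threshold=0.1):
    """Least radius whose CDF value is strictly greater than the threshold."""
    above = np.flatnonzero(cdf > threshold)
    if above.size == 0:
        raise ValueError("CDF never exceeds the threshold")
    return int(radii[above[0]])
```

Because the largest radius covers the whole image, the last CDF entry equals 1, so with a threshold below 1 the search always returns a radius.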

4.1.2 Computation of Values of Contours

We start from the centre of the hotspot computed using the LLR method and increase the radius from 4 to 213 units in steps of 1 unit. A 151 × 151 matrix, Contour, stores the value of each contour. For $r > 4$, we count the number of pixels within the ring of inner radius $r - 1$ and outer radius $r$; for $r = 4$, we count the number of pixels within the circle of radius 4. For $r > 4$, the elements of Contour corresponding to pixels inside the ring are assigned the difference between the CDF value at $r$ and the CDF value at $r - 1$, multiplied by the number of pixels inside the ring. For $r = 4$, the elements corresponding to pixels inside the circle are assigned the CDF value at $r = 4$ multiplied by the number of pixels inside the circle. Once all the elements of Contour have been assigned, 100 times the logarithm of Contour is plotted along with the hotspots from both methods (see Fig. 5).
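A sketch of the Contour matrix, following the description above literally (CDF increment multiplied by the pixel count of the ring), is given below. The function name `contour_matrix` is an assumption, and the final 100 × logarithm step used for plotting is left out because rings with zero traffic would need special handling that the text does not specify.

```python
import numpy as np

def contour_matrix(M, cx, cy, r_min=4, r_max=213):
    """Assign each ring (or the innermost disc) its CDF increment times its pixel count."""
    xx, yy = np.meshgrid(np.arange(M.shape[0]), np.arange(M.shape[1]), indexing="ij")
    dist2 = (xx - cx) ** 2 + (yy - cy) ** 2
    total = M.sum()
    contour = np.zeros(M.shape, dtype=float)
    prev_cdf = 0.0
    prev_mask = np.zeros(M.shape, dtype=bool)
    for r in range(r_min, r_max + 1):
        mask = dist2 <= r * r
        cdf_r = M[mask].sum() / total
        ring = mask & ~prev_mask          # for r = r_min this is the whole disc
        n_pixels = ring.sum()
        if n_pixels:
            contour[ring] = (cdf_r - prev_cdf) * n_pixels
        prev_cdf, prev_mask = cdf_r, mask
    return contour
```

Starting with an empty previous mask and a previous CDF of 0 makes the first iteration reproduce the $r = 4$ rule (CDF value times the number of pixels in the disc) without a separate branch.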


Fig. 5 (i) and (ii) represent timestamps 115 and 116, respectively. a Contour images. The red and green circles represent the hotspot regions obtained through LLR and CDF methods, respectively. Note that in the 115th timestamp, the two circles have coincided. b Cumulative distribution function’s stem graphs

4.2 LSTM Architecture to Predict Future Hotspot Using CDF

The CDF vectors of all timestamps are already in normalized form. For each timestamp, the CDF vector is appended with the normalized x and y coordinates of the hotspot of that timestamp, and a sequence of such vectors from 10 consecutive timestamps is given as input to the LSTM network. The target variable consists of the 11th timestamp's normalized hotspot parameters $(r, x, y)$, where $r$ is the radius of the circle and $(x, y)$ is its centre. Thus, the input size becomes 210 + 2, which is 212. The LSTM architecture consists of two layers (see Fig. 6). It uses the default functions, namely the hyperbolic tangent function and the sigmoid function, at the input gate, forget gate and output gate (refer [12]). $x_t$ represents the input, which in our case is 10 consecutive vectors of size 212 each, and $y_t$ represents the output of the LSTM cell. $y_{t-1}$ represents the previous output from the LSTM cell. The equations of the LSTM cell are the same as those in Sect. 3.2. The ten consecutive timestamps and the resulting LSTM-CDF prediction of the hotspot parameters of the 11th timestamp can be seen in Fig. 7 for timestamp 115.

Fig. 6 LSTM cell diagram

Fig. 7 The CDF values of 10 consecutive contour images (time 105 to time 114), as plotted in Fig. 5, are each initialized as a vector of size 210. Each vector is appended with the normalized centre coordinates of the hotspot of the corresponding timestamp. The result is given as input sequence (each vector of size 212) to the LSTM network. It predicts the hotspot parameters of the 11th timestamp (time 115). We have plotted the circles found through LLR computation (red), CDF computation (green) and LSTM-CDF prediction (yellow). In this case, the circles have coincided. Hence, only the LSTM-CDF predicted circle (yellow) can be seen. The zoomed portion of the predicted hotspot can be seen in the 12th subplot
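Assembling the 212-dimensional input vectors can be sketched as below. The function names are hypothetical, and dividing the pixel coordinates by 150 is only one plausible reading of "normalized"; the paper does not state the normalization it used.

```python
import numpy as np

def cdf_input_vector(cdf, cx, cy, size=151):
    """Append the normalised centre (cx, cy) to a 210-element CDF vector."""
    cdf = np.asarray(cdf, dtype=float)
    # Assumption: pixel coordinates 0 .. size-1 are mapped to [0, 1].
    centre = np.array([cx, cy], dtype=float) / (size - 1)
    return np.concatenate([cdf, centre])

def cdf_input_sequence(cdfs, centres, size=151):
    """Stack the vectors of consecutive timestamps into one (T, 212) sequence."""
    return np.stack([cdf_input_vector(c, x, y, size)
                     for c, (x, y) in zip(cdfs, centres)])
```

Compared with the 22,801-element raw vectors of Sect. 3.2, each timestamp here contributes only 212 values, which is the source of the lower complexity claimed for the CDF method.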

5 Results

Different stages of implementation of the two methods can be seen in Fig. 8. The zoomed portions of Fig. 8 can be seen in Fig. 9, which shows how close the circles computed through the CDF method and predicted using LSTM-LLR and LSTM-CDF are to the circle computed through the LLR method. The hotspot values can be seen in Fig. 12. Columns 3 and 5 give the radius in kilometres and the centre coordinates (x, y) in latitude and longitude, respectively. A difference of 1 unit in radius on the image translates to a difference of 0.221 km (see columns 2, 3, 4 and 5 of row 1 in Fig. 12).

The performance of the LSTM networks of both methods is compared in Figs. 10 and 11. The average error in predicting the hotspot parameters can be seen in Fig. 11, whose first column states which method and prediction are compared. The LSTM-CDF prediction method (with radius computed using LLR value or CDF threshold) performs better than the LSTM-LLR prediction method in predicting the centre of the hotspot, as its average error between predicted and actual values of the x- and y-coordinates of the centre is lower than that of the LSTM-LLR prediction method (see rows 1 and 2 of Figs. 11 and 10). The average error between predicted and actual values of the radius, however, is higher for the LSTM-CDF prediction method than for the LSTM-LLR prediction method (see row 1, column 1 of Figs. 11 and 10). Nevertheless, the total average error in predicting the hotspot parameters using LSTM-CDF prediction, with either the CDF threshold or the LLR value, is lower than that of the LSTM-LLR method (see Fig. 11). This can be further verified from Figs. 4, 7 and 9. It can be noted that prediction of the centre of the hotspot region is important for efficient resource allocation, and LSTM-CDF prediction performs well even with the radius computed using the CDF threshold.

The LSTM architecture in the CDF method is of lower complexity than the LSTM architecture for the LLR method, as its input size is smaller (see Figs. 3 and 6). Depending on the application, the contour rings beyond a certain CDF value need not be considered, because the CDF value changes only marginally beyond a certain radius. These rings need not be represented in the image if the particular application does not require them (Fig. 12).

Fig. 8 (i)–(v) represent timestamps 113–117, respectively. a Representation of traffic density as images as described in Fig. 1. b Hotspot predicted by LLR method as described in Fig. 2. c Contour images obtained through CDF method along with hotspot regions predicted by LLR and CDF method as described in Fig. 5. d Hotspot regions predicted by LSTM-LLR (black circle) and LSTM-CDF (yellow circle) methods along with the hotspot regions computed through LLR (red (see Fig. 2)) and CDF (green (see Fig. 5)) methods

Fig. 9 Represents the zoomed portions of each timestamp (113–117) in Fig. 8d

Fig. 10 (i) loss function of LSTM network (ii) normalized error in prediction of radius for testing data (iii) normalized error in prediction of x coordinate of centre of hotspot for testing data (iv) normalized error in prediction of y coordinate of centre of hotspot for testing data for a LLR method b CDF method

Fig. 11 Represents the average error in hotspot parameters when different methods (LLR and CDF) are compared with LSTM-LLR and LSTM-CDF predictions

Fig. 12 Represents the hotspot parameter values found through LLR and CDF computation and their corresponding values in the real world for the given data set

6 Conclusions

The prediction of the hotspot and the computation of contours can be used to steer the antenna in the desired direction. This would improve the reception of signals, increase the signal-to-noise ratio and result in better allocation of bandwidth. Further, the beamforming capabilities are enhanced through the prediction of the hotspot, as the antenna can focus better on the denser regions. In fact, the contour images give a better description of traffic density, as they also provide an expected value for each contour. In this manner, a more efficient resource allocation takes place [13]. Thus, when a mobile user is handed over from one base station to another, the handover is smooth.

As can be seen from the images of the training data, there is only one hotspot in every timestamp. However, this might not be the case for a different geographical region. In such cases, the image can be divided into parts (say 4), and for each part, a local hotspot can be computed using the algorithms described in this paper. The overall hotspot of the timestamp would be the one that corresponds to the highest LLR value. While training the LSTM network, the target variables would be the parameters of all local hotspots.

References

1. Wang J, Tang J, Xu Z, Wang Y, Xue G, Zhang X, Yang D (2017) Spatiotemporal modeling and prediction in cellular networks: a big data enabled deep learning approach. In: IEEE INFOCOM 2017 - IEEE conference on computer communications
2. Feng J, Chen X, Gao R, Zeng M, Li Y (2018) DeepTP: an end-to-end neural network for mobile cellular traffic prediction. IEEE Netw 32(6):108–115
3. Zhang C, Patras P (2018) Long-term mobile traffic forecasting using deep spatio-temporal neural networks. In: Mobihoc '18: proceedings of the eighteenth ACM international symposium on mobile ad hoc networking and computing, pp 231–240
4. Zhang C, Patras P, Haddadi H (2019) Deep learning in mobile and wireless networking: a survey. IEEE Commun Surv Tutor 21(3):2224–2287
5. Gers FA, Schmidhuber J, Cummins F (1999) Learning to forget: continual prediction with LSTM. In: Proceedings of ICANN'99 international conference on artificial neural networks (Edinburgh, Scotland), vol 2. IEE, London, pp 850–855
6. Sutskever I, Vinyals O, Le QV (2014) Sequence to sequence learning with neural networks. Adv Neural Inf Process Syst
7. Huang C-W, Chiang C-T, Li Q (2017) A study of deep learning networks on mobile traffic forecasting. In: 2017 IEEE 28th annual international symposium on personal, indoor, and mobile radio communications (PIMRC)
8. Chen L, Yang D, Zhang D, Wang C, Li J, Nguyen T-M-T (2018) Deep mobile traffic forecast and complementary base station clustering for C-RAN optimization. J Netw Comput Appl 00:1–12
9. Zhang C, Zhang H, Yuan D, Zhang M (2018) Citywide cellular traffic prediction based on densely connected convolutional neural networks. IEEE Commun Lett 22(8):1656–1659
10. Nair SN, Gopi ES (2019) Deep learning techniques for crime hotspot detection. In: Optimization in machine learning and applications, algorithms for intelligent systems, pp 13–29
11. Chen X, Jin Y, Qiang S, Hu W, Jiang K (2015) Analyzing and modeling spatio-temporal dependence of cellular traffic at city scale. In: 2015 IEEE international conference on communications (ICC)
12. Lu Y (2016) Empirical evaluation of a new approach to simplifying long short-term memory (LSTM). arXiv:1612.03707 [cs.NE]
13. Alawe I, Ksentini A, Hadjadj-Aoul Y, Bertin P (2018) Improving traffic forecasting for 5G core network scalability: a machine learning approach. IEEE Netw 32(6):42–49

Generative Adversarial Network and Reinforcement Learning to Estimate Channel Coefficients

Pranav Mani, E. S. Gopi, Hrishikesh Shekhar, and Sharan Chandra

Abstract The emergence of massive multiple-input multiple-output (MIMO) systems throughout the world, due to the promise of enhanced data rates, has led to an increasing need to guarantee accuracy. There is little value in large data rates if the channel state information (CSI) is subject to frequent contamination. In the context of massive MIMO systems, error in decoding the signal is introduced mainly by two key factors: (i) intercell interference and (ii) intracell interference. The problem of extracting the information signal from the contaminated signal can be interpreted as a signal separation problem where all the signals involved are Gaussian. A two-step approach is proposed to achieve this. First, a generative adversarial network (GAN) is used to learn the distributions of three Gaussian sources (the desired signal, interference, and noise) from their mixture. The learnt distributions yield the means and variances of the three Gaussian signals. The variances predicted by the GAN, along with the sum, are then used to generate the three original signals. This second step is interpreted as a reinforcement learning (RL) problem. Such an interpretation provides for lifelong learning with decreasing error in the estimated signals. As a result, the desired signal is recovered from a corrupted signal in a two-step process.

Keywords Multiple-input multiple-output (MIMO) · Channel state information (CSI) · Artificial neural networks (ANNs) · Deep learning (DL) · Generative adversarial networks (GANs) · Reinforcement learning (RL)

All authors have given equal contribution.
P. Mani (B) · E. S. Gopi · H. Shekhar · S. Chandra, Department of Electronics and Communication Engineering, National Institute of Technology, Tiruchirappalli 620015, India
© Springer Nature Singapore Pte Ltd. 2021. E. S. Gopi (ed.), Machine Learning, Deep Learning and Computational Intelligence for Wireless Communication, Lecture Notes in Electrical Engineering 749, https://doi.org/10.1007/978-981-16-0289-4_4

1 Introduction

Massive MIMO systems are one of the primary emerging 5G wireless communication technologies [1] and continue to grow in popularity. These systems rely on orthogonality of a user’s channel vector with the received signal, hence allowing a computationally inexpensive operation to recover the message signal [2]. However, recovering the required user’s channel coefficient is challenging. In practice, “pilot” data is sent across the channel to calculate the required channel coefficient values, but the values observed are corrupted by both inter- and intracell interference, as well as noise. In recent times, deep learning (DL) algorithms [3] have played a pivotal role in mobile and wireless networking [4] and hence, the authors look to leverage the power of DL-based methodologies to attack this problem.

The disturbances produced by the inter- and intracell interference can be modeled as Gaussian distributions. Hence, the problem of extracting the channel coefficient can be reduced to the separation of three Gaussian variables. This paper deals with the separation of the observed channel coefficients at the base station into the intended signal, intracell interference, and intercell interference. The technique used in this paper to deduce the value of the channel coefficient is broken down into two key parts. The first part deals with estimating the individual distributions given the sum. The subsequent part deals with sampling values from the distributions modelled in the first step, to estimate values of each of the signals that comprise the sum.

Generative adversarial networks (GANs) [5] have been shown to be promising in wireless channel modeling [6]. In this paper, the authors look to tap into this potential and hence, in Sect. 3.2, a GAN approach is proposed to explore isolation of the constituent Gaussian distributions observed in the received signal. The use of GANs is based on their ability to learn underlying data space distributions without explicit parameterization of output density functions. The sum is fed into three separate GANs and each GAN is tasked with learning one of the three constituent distributions.
In this paper, the authors collectively refer to this approach of using an array of GANs as a “Bank of GANs” approach. Using the bank of GANs, the parameters of the constituent Gaussian distributions are derived. In this case, these parameters are the mean and variance.

Section 3.3 explores the use of a neural network with an MMSE loss function, which can be interpreted as a reinforcement learning (RL) [7] problem, to sample three signals from the learnt distributions conditioned on the value of the sum of the signals, which is available at the receiver’s side. In this context, the state of the agent is made up of the three variances deduced by the GANs along with the observed value of the channel coefficient, i.e., the Gaussian sum or the mixture signal. An action then consists of using this environment state information to estimate the values of the original uncontaminated channel coefficient, the intercell interference signal, and the intracell interference signal. The reward is set as the negative of the mean squared error (MSE). Maximization of the reward is then equivalent to minimization of the MSE loss function between the predicted signals and the actual signals that are present in the training data. It can also be noted that an interpretation as a single-step RL problem can be extended to enable lifelong learning. This would then allow the estimate of the signals to adapt to changes in the physical properties of the channel with time.

2 Contributions of the Paper

In this paper, the authors explore a two-step approach to obtain corrected channel coefficients from corrupted channel coefficients. The first step attempts to separate the underlying distributions that make up the corrupted channel coefficients, which consist of three separate Gaussian distributions, using a “Bank of GANs” approach. This is followed by extracting the true value of the channel coefficient by employing an RL agent. It is shown that the GAN-based approach is able to extract the source distributions. Further, it is seen that a lifelong learning [8] RL system is capable of picking up trends from the underlying data, conditioned on the variances they are drawn from and their sum. The authors also point out that lifelong learning allows adaptability to changes in the physical properties of the channel.

3 Signal Source Separation

3.1 Using Generative Adversarial Networks and Reinforcement Learning

In practical scenarios, there exist multiple cells or base stations. In such scenarios, an antenna in a base station receives a mixture of three signals: the intended signal (message), intracell interference (interference from other users in that channel), and intercell interference (between two cells). In order to extract the intended signal, the sources need to be separated from the mixture. In the massive MIMO scenario, all the sources can be modeled as Gaussian distributed. That is, if $X_1$, $X_2$, $X_3$ are three Gaussian distributed signals and $X = X_1 + X_2 + X_3$ is observed, estimates of the source signals $X_1$, $X_2$, $X_3$ need to be generated. A two-step approach is proposed to achieve this (refer to Fig. 1). First, a deep generative model is applied to learn the underlying distribution of the data, without parameterizing the output. A generative adversarial network (GAN) is used to achieve this, as shown in Sect. 3.2. Subsequently, a reinforcement learning-based approach is used to obtain estimates of the source signals from the learnt distributions, as shown in Sect. 3.3.
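The mixture model above can be sketched numerically. The following is a minimal illustration, not the authors' data pipeline: the variances are borrowed from example (a) of Fig. 2, while the sample count and the seed are arbitrary assumptions. It shows that the mixture on its own only reveals the sum of the three variances, which is why a separate step is needed to recover the individual distributions.

```python
import numpy as np

rng = np.random.default_rng(0)            # seed chosen arbitrarily
betas = np.array([1.0, 2.25, 4.0])        # variances of X_1, X_2, X_3 (example (a) of Fig. 2)
n = 100_000                               # number of samples (assumed)

# Three independent zero-mean Gaussian sources, one per row.
sources = rng.normal(0.0, np.sqrt(betas)[:, None], size=(3, n))

# The receiver only observes the sum X = X_1 + X_2 + X_3.
mixture = sources.sum(axis=0)

# For independent sources the mixture variance is the sum of the source variances.
print("empirical mixture variance:", mixture.var())
print("sum of source variances:   ", betas.sum())
```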

3.2 Generative Adversarial Networks for Data Space Distribution Modelling

Generative adversarial networks (GANs) consist of a generator and a discriminator. The generator tries to generate data that follows the underlying probability distribution of the training samples without explicitly parameterizing the output density.

Fig. 1 Schematic representation of the proposed system to estimate channel coefficients. Here, the output from each of the GANs is a random variable which follows the distribution $X_i \sim \mathcal{N}(0, \beta_i)$, where $\mathcal{N}(0, \beta_i)$ represents a Gaussian distribution with zero mean and variance $\beta_i$

The role of the discriminator is to distinguish real data from fake data. Neural networks are used to model both the discriminator and the generator. A neural network $G(z, \theta_1)$ models the generator and maps its input $z$ to the data space $x$ (the space in which the training samples of the desired distribution lie). The discriminator neural network $D(x, \theta_2)$ gives the probability that a vector $x$ from the data space is real. Therefore, the discriminator network weights need to be trained so as to maximize $D(x, \theta_2)$ when $x$ belongs to the real dataset, and $1 - D(x, \theta_2)$ when $x$ belongs to fake data (generated by the generator network), that is, $x = G(z, \theta_1)$. Thus, the discriminator and the generator can be interpreted as two agents playing a minimax game on the following objective function $V$ (while using binary cross entropy loss):

$$\min_{G} \max_{D} V(D, G) = \mathbb{E}_{x \sim p_{\text{data}}(x)}\left[\log D(x)\right] + \mathbb{E}_{z \sim p_z(z)}\left[\log\left(1 - D(G(z))\right)\right] \tag{1}$$

where $p_{\text{data}}$ is the distribution over the real data and $p_z$ is the distribution over the input to the generator.

The proposed algorithm uses a “Bank of GANs” approach consisting of three GAN networks. The output of the $i$th generator consists of values drawn from the distribution corresponding to the $i$th signal. Using the trained GANs, the mean and variance of each Gaussian source are obtained (refer to Fig. 2). In Sect. 3.3, these learnt distributions are used to sample estimates of the original source signals using a reinforcement learning approach. The procedure used to learn the distributions is presented as Algorithm 1.
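To make the objective in (1) concrete, the sketch below estimates $V(D, G)$ from batches of discriminator outputs. It is only an illustration: the helper name `value_function` and the example probabilities are assumptions, and no network is trained here. The first case uses the well-known equilibrium situation in which the discriminator outputs 0.5 everywhere, which gives $V = -2\log 2$.

```python
import numpy as np

def value_function(d_real, d_fake):
    """Monte Carlo estimate of V(D, G) in Eq. (1).

    d_real: discriminator outputs D(x) on real samples
    d_fake: discriminator outputs D(G(z)) on generated samples
    """
    d_real = np.asarray(d_real, dtype=float)
    d_fake = np.asarray(d_fake, dtype=float)
    return np.mean(np.log(d_real)) + np.mean(np.log(1.0 - d_fake))

# Discriminator that cannot tell real from fake (outputs 0.5 for everything).
v_confused = value_function(np.full(8, 0.5), np.full(8, 0.5))

# Discriminator that separates real and fake almost perfectly.
v_sharp = value_function(np.full(8, 0.99), np.full(8, 0.01))

print(v_confused, v_sharp)
```

The discriminator update pushes this value up (towards zero), while the generator update pushes it down, which is the minimax behaviour described in the text.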

Fig. 2 Collective output from the “Bank of GANs” model. The GANs were fed with signals from Gaussian distributions having variances (a) 1, 2.25, 4 and (b) 2, 10, 10. The predicted variances are (a) 0.88, 2.323, 4.115 and (b) 2.08, 10.147, 11.038

Algorithm 1 Bank of GAN(s)
1: Define neural network architecture for the generator and discriminator
2: Initialize three generator ($G_1$, $G_2$, $G_3$) and discriminator ($D_1$, $D_2$, $D_3$) networks for learning the distributions of the three signals
3: Generate data from three Gaussian distributions to represent signal, intercell and intracell interference
4: Add them to form the mixture signal
5: for each epoch do
6:   for each batch do
7:     for i = 1, 2, 3 do
8:       Sample a batch of $X_i$ source signals from real (training) data
9:       Sample a batch of mixture signals and forward through $G_i$ to obtain fake data
10:      Forward the real and fake data through $D_i$ to obtain predictions of probabilities of inputs being real
11:      Feed these probabilities to the negative binary cross entropy loss function
12:      Use this prediction to obtain the gradients with respect to the weights of the discriminator network
13:      Update the weights of the discriminator using the optimizer chosen
14:      Forward the fake data through the discriminator to obtain the probability that the generator output is classified as real by the discriminator
15:      Feed this probability to the negative binary cross entropy loss
16:      Use this prediction to obtain the gradient of the objective function with respect to the parameters of the generator network
17:      Update the weights of the generator
18:    end for
19:  end for
20: end for
21: Use the trained networks to forward on the mixture signal to produce outputs
22: Collect outputs and compute estimates of variance and mean

3.3 Reinforcement Learning-Based Sampling Technique for Signal Estimation

In the previous section, a generative adversarial network was used to learn the underlying distribution of the source signals given the mixture signal. In this section, a single-step RL method is proposed to sample the learnt distribution given the mixture signal, so as to extract the original source signals. This method can be interpreted as a reinforcement learning agent which represents its environment using the mixture signal and the variances of the three source signals (obtained using the GANs). From this state, an action simply consists of sampling three signals from the predicted Gaussians. This is done using a neural network whose outputs are the required estimates. During training, a batch of mixture signals of batch size $m$ is fed as input and the reward is defined to be the negative of the collective mean squared error. This allows the network to understand trends that are typical of the signal and noise data. Also, a less noisy training period is observed. The reward function was designed such that it acts as a measure of how close the sampled signals are to the original source signals. The next task is to perform gradient ascent on this reward. Equivalently, an attempt is made to minimize the negative of the reward, where

$$R = -\sum_{i=1}^{3} \frac{1}{m} \left\lVert x_i - \hat{x}_i \right\rVert^2,$$

$x_i$ is the actual $i$th batch of source signals, and $\hat{x}_i$ is the corresponding $i$th batch of source signals sampled by the RL agent. It can be observed from the results shown in Fig. 3 that the predicted signals learn the trend of the original source signals sampled from the data space. The algorithm used for training this agent is described in Algorithm 2.

Algorithm 2 Single-Step RL for Sampling
1: Initialize parameters of a neural network to mathematically model the sampling policy
2: Initialize parameters for learning such as learning rate, optimizer, batch size, number of epochs, etc.
3: Generate data drawn from three Gaussians of a range of variances
4: Add them to form the mixture signal
5: for each epoch do
6:   for each iteration through batches do
7:     Sample a batch of mixture signals and variances of inputs and forward through the network to obtain predictions of source signals
8:     Compute gradients, with respect to the parameters of the neural network, of the objective function $-\sum_{i=1}^{3} \frac{1}{m} \lVert x_i - \hat{x}_i \rVert^2$
9:     Update the weights of the neural network based on the optimizer used
10:  end for
11: end for
12: Use the trained network for obtaining samples during inference
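The reward used in Algorithm 2 can be sketched as follows. This is a hedged illustration rather than the authors' agent: the learned policy network is replaced by a simple closed-form stand-in that splits the mixture in proportion to the known variances (for independent zero-mean Gaussians this proportional split is the conditional mean of each source given the sum). The function name `reward`, the batch size, and the seed are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
betas = np.array([1.0, 2.25, 4.0])     # source variances (example (a) of Fig. 2)
m = 50_000                             # batch size (assumed, large to reduce sampling noise)

# True source signals (one row per source) and the observed mixture.
x = rng.normal(0.0, np.sqrt(betas)[:, None], size=(3, m))
mixture = x.sum(axis=0)

def reward(x_true, x_hat):
    """R = -sum_i (1/m) * ||x_i - x_hat_i||^2, with one source per row."""
    batch = x_true.shape[1]
    return -np.sum((x_true - x_hat) ** 2) / batch

# Stand-in "policy" (NOT the trained network): proportional split of the mixture.
x_hat = (betas / betas.sum())[:, None] * mixture[None, :]

R = reward(x, x_hat)
print("reward:", R)
print("per-source MSE:", np.mean((x - x_hat) ** 2, axis=1))
```

Because the three weights of the stand-in sum to one, its estimates always add up exactly to the mixture, while the individual sources are recovered only approximately. A trained network would replace `x_hat`, and gradient ascent on `R` would take the place of the closed-form rule.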

Fig. 3 Actual value of each signal is plotted in red while the predicted distribution is plotted in yellow. A and B depict two instances of test results, with variances for A being in the range of 1–5 for the desired signal and 5–15 for interference and noise. Variances for B are in the range of 2–6 for the desired signal and 7–15 for interference and noise. Mappings for the sum, $X_3$, $X_2$ and $X_1$ are depicted in (a)–(d), respectively

4 Results

In this section, the results and experiments corresponding to the study carried out in each section are presented.

4.1 Extraction of Distributions Using GAN

The approach explored in Sect. 3.2 is implemented using data drawn from Gaussian distributions with zero mean and variances (a) 1, 2.25, 4 and (b) 2, 10, 10. The resulting distributions mapped by the GANs are illustrated in Fig. 2. It can be seen from Fig. 2 that the distributions predicted by the “Bank of GANs” all exhibit Gaussian-like bell curves. The means and variances are calculated using the outputs from the generators. The variances obtained have a mean squared error of 0.0109 and 0.3684 in Fig. 2a, b, respectively. These variances are then fed into the RL network for sampling the values of the source signals.
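The reported errors can be reproduced directly from the true and predicted variances listed in the caption of Fig. 2. The short check below uses only those published numbers; the variable names are illustrative.

```python
import numpy as np

# True and GAN-predicted variances, as listed in the caption of Fig. 2.
true_a, pred_a = np.array([1.0, 2.25, 4.0]), np.array([0.88, 2.323, 4.115])
true_b, pred_b = np.array([2.0, 10.0, 10.0]), np.array([2.08, 10.147, 11.038])

mse_a = np.mean((pred_a - true_a) ** 2)   # about 0.01098
mse_b = np.mean((pred_b - true_b) ** 2)   # about 0.36848

print(f"MSE (a): {mse_a:.4f}, MSE (b): {mse_b:.4f}")
```

Both values agree with the quoted 0.0109 and 0.3684 up to rounding in the last digit. Most of the error in case (b) comes from the third variance (11.038 against 10).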

4.2 Estimating True Values of Channel Coefficients

In this section, the results of training a neural network are highlighted. The network uses a leaky version of the rectified linear activation function [9], with a negative slope of 0.2, for the hidden layers. For the output layer, a linear activation function is used. For training, the network applies an RMSProp [10] optimizer with an initial learning rate of 0.0001. In order to generate training data, the Monte Carlo [11] approach is employed to generate signals from Gaussian distributions. The results of training are shown for two different variance ranges of the intended and interference signals: intended from 1 to 5 with interference from 5 to 15, and intended from 2 to 6 with interference from 7 to 15. The results of testing on these models are shown in Fig. 3. The test data is generated from Gaussians whose variances themselves are drawn from a uniform random distribution. The predictions of the agent and the true values of the coefficients are plotted in Fig. 3 for each of the three components along with the sum of the three components.

It can be observed that even though the signals used to generate test data are sampled randomly, and constrained to lie within a limited range, the trained network is able to capture the trend in the signals. Although the RL network is able to capture the general trend in the signals, there still exists scope for improvement in mapping the exact values of the distribution. However, the actual mixture signal is mapped closely by the sum of the three predicted signals. The graphs in Fig. 3 indicate that the reinforcement learning agent is able to replicate the general trends of the input distribution with reasonable accuracy. The observed mean squared error loss on the test data in Fig. 3 (instance A) is 0.006919, 0.81324, 2.08179, and 1.87812 for the sum, $X_1$, $X_2$, and $X_3$, respectively, and the observed mean squared error loss on the test data in Fig. 3 (instance B) is 0.002142, 1.74944, 2.30441, and 2.17069 for the sum, $X_1$, $X_2$, and $X_3$, respectively. Finally, the two steps are combined: the mixture signal is fed to the trained GANs, and the mixture, along with the variances learnt by the GANs, is passed to the RL agent to obtain estimates of the original signals. The observed MSE is 0.86326, 1.91800, 1.91943 and 0.00425 for $X_1$, $X_2$, $X_3$ and the sum, respectively.
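The mapping from state to action described above can be sketched as a single forward pass. The activation choices (leaky ReLU with negative slope 0.2 in the hidden layer, linear output) and the variance ranges follow the text; everything else is an assumption made for illustration, including the single hidden layer, its width, the batch size, and the random untrained weights.

```python
import numpy as np

def leaky_relu(v, negative_slope=0.2):
    """Leaky ReLU with the negative slope quoted in the text."""
    return np.where(v >= 0, v, negative_slope * v)

rng = np.random.default_rng(0)
m, hidden = 16, 32                      # batch size and hidden width (both assumed)

# State per sample: [mixture value, beta_1, beta_2, beta_3].
state = np.column_stack([
    rng.normal(size=m),                 # placeholder mixture values
    rng.uniform(1, 5, size=m),          # variance of the intended signal
    rng.uniform(5, 15, size=m),         # variance of one interference term
    rng.uniform(5, 15, size=m),         # variance of the other interference term
])

# Untrained placeholder weights; training would adjust these with RMSProp.
W1, b1 = 0.1 * rng.standard_normal((4, hidden)), np.zeros(hidden)
W2, b2 = 0.1 * rng.standard_normal((hidden, 3)), np.zeros(3)

# Action: three estimated signals per sample, produced by a linear output layer.
estimates = leaky_relu(state @ W1 + b1) @ W2 + b2
print(estimates.shape)
```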

5 Conclusions

Massive MIMO systems offer highly improved communication performance. One of the fundamental impediments to the use of these systems is pilot contamination [12]. The performance of the system is limited by the accuracy and complexity of the techniques used to estimate the channel coefficients based on the pilot signals. Considering the ability of generative adversarial networks to learn data space distributions, their usage in signal source separation is explored. It is seen that this approach can have practical significance in separating the distributions corresponding to the various signals being received at the base station, given a mixture of the individual distributions. It is known that independent component analysis [13] depends on the increased Gaussianity of a mixture as opposed to the original signals. Therefore, its usage is ruled out here, where all signals are Gaussian. The authors have therefore presented the idea of using a two-step approach to estimate the channel coefficients of a massive MIMO system from the contaminated coefficients, with the second step involving sampling of values using a single-step RL approach. In doing so, the corrected channel coefficients could be obtained from the corrupted channel coefficients within a reasonable degree of accuracy. By using larger datasets, more accurate results can be achieved. It can be noted that the use of Wasserstein GAN [14], increased dataset sizes, and sophisticated network architectures are potential directions for improving the accuracy of this two-step concept.

An important application of the correction of channel coefficients in MIMO systems is enabling scaled-down power allocation at the sender side while still achieving nonzero SINR. The realization of such a system offers the potential to reduce power consumption while establishing finite channel capacity. In such systems, however, with large numbers of antennas, the number of channel coefficients increases and estimating them accurately is crucial. Further, with today’s usage levels of mobile and hand-held wireless devices, there exists a surplus of data which contain underlying patterns in a wide array of applications including power consumption, message characteristics, etc. It is therefore important to investigate data-driven methods such as the one proposed in this paper.

References

1. Wang CX, Haider F, Gao X, You XH, Yang Y, Yuan D, Hepsaydir E (2014) Cellular architecture and key technologies for 5G wireless communication networks. IEEE Commun Mag 52(2):122–130
2. Gopi ES (2016) Digital signal processing for wireless communication using Matlab. Springer International Publishing
3. LeCun Y, Bengio Y, Hinton G (2015) Deep learning. Nature 521(7553):436–444
4. Zhang C, Patras P, Haddadi H (2019) Deep learning in mobile and wireless networking: a survey. IEEE Commun Surv Tutor 21(3):2224–2287
5. Goodfellow I, Pouget-Abadie J, Mirza M, Xu B, Warde-Farley D, Ozair S, Courville A, Bengio Y (2014) Generative adversarial nets. In: Advances in neural information processing systems, pp 2672–2680
6. Yang Y, Li Y, Zhang W, Qin F, Zhu P, Wang CX (2019) Generative-adversarial-network-based wireless channel modeling: challenges and opportunities. IEEE Commun Mag 57(3):22–27
7. Sutton RS, Barto AG (2018) Reinforcement learning: an introduction. MIT Press
8. Thrun S (1998) Lifelong learning algorithms. In: Learning to learn. Springer, Boston, pp 181–209
9. Xu B, Wang N, Chen T, Li M (2015) Empirical evaluation of rectified activations in convolutional network. arXiv preprint arXiv:1505.00853
10. Ruder S (2016) An overview of gradient descent optimization algorithms. arXiv preprint arXiv:1609.04747
11. Rubinstein RY, Kroese DP (2016) Simulation and the Monte Carlo method, vol. 10. Wiley, New York
12. Elijah O, Leow CY, Rahman TA, Nunoo S, Iliya SZ (2015) A comprehensive survey of pilot contamination in massive MIMO-5G system. IEEE Commun Surv Tutor 18(2):905–923
13. Hyvärinen A, Oja E (2000) Independent component analysis: algorithms and applications. Neural Netw 13(4–5):411–430
14. Arjovsky M, Chintala S, Bottou L (2017) Wasserstein GAN. arXiv preprint arXiv:1701.07875

Novel Method of Self-interference Cancelation in Full-Duplex Radios for 5G Wireless Technology Using Neural Networks

L. Yashvanth, V. Dharanya, and E. S. Gopi

Abstract Full-duplex communication is a promising technique which guarantees an enhanced spectral efficiency in modern 5G wireless communications. In this technique, the same set of frequency channels is used for simultaneous uplink and downlink signal transmissions, and hence it is termed full-duplex (FD) communication or full-duplex radio. However, a major shortcoming of this technique is the presence of self-interference (SI), which arises because the transmitters and receivers are in close proximity, and several solutions have been proposed to mitigate it. In this paper, we give a new insight into the applicability of neural networks in solving (linear and nonlinear) SI problems using hybrid cancelations.

Keywords Full-duplex radios · Self-interference · Hybrid cancelation · Neural networks

All authors have given equal contribution.
L. Yashvanth (B) · V. Dharanya · E. S. Gopi, Department of Electronics and Communication Engineering, National Institute of Technology, Tiruchirappalli 620015, India
© Springer Nature Singapore Pte Ltd. 2021. E. S. Gopi (ed.), Machine Learning, Deep Learning and Computational Intelligence for Wireless Communication, Lecture Notes in Electrical Engineering 749, https://doi.org/10.1007/978-981-16-0289-4_5

1 Introduction

5G wireless technology defines many innovative wireless principles and algorithms to provide maximized benefits to users in terms of high data rate, reduced power consumption, high spectral efficiency, etc. [1]. “In-band Full-Duplex (FD)” communication is one such method, which seeks to achieve better spectral efficiency. It exploits the same set of frequency resource channels for simultaneous uplink and downlink signal transmissions [2], unlike the distinct forward and reverse channels used in conventional half-duplex communication (3G/4G). The scenario is pictured in Fig. 1 for communication between two nodes in FD mode, with all signals being transmitted and received in the carrier frequency band $f_1$.

Fig. 1 Full-duplex communication scenario

However, with this technique there is a very high possibility that the receiver receives signals from its own transmitter, which is formally called self-interference (SI) (red highlighted signals in Fig. 1) [3–5]. SI is a highly undesirable artifact of FD scenarios and has the potential to corrupt the received signal of interest (SOI) to a large extent. Many works have been reported to date to solve the problem of SI. One among them which proves to be effective is the hybrid cancelation of SI [2, 5]. This method uses successive SI cancelations in the analog domain as well as in the digital domain. While such robust hybrid cancelation of SI does exist, in this paper we give a new direction to tackle the problem by means of artificial intelligence (AI) using neural networks, which will be shown to perform well even for nonlinear SI cancelation.

2 Signal Modeling

In this section, an attempt is made to model the complete transceiver at the baseband level of transmission. The complete setup, along with the SI cancelation mechanism, is shown in Fig. 2. As shown in the figure, let $x(nT_s)$ be the actual digital baseband data to be transmitted. After passing it through a DAC and a power amplifier (typically a Class C power amplifier), let the final analog baseband signal that is ready to be modulated and transmitted through the antenna be denoted by $x_1(t)$. Note that, for the sake of simplicity, bandpass processing blocks such as the modulator and demodulator are not shown in this figure.

Fig. 2 Transceiver setup signal modeling

On the other hand, let the receiving antenna receive a signal from another host on the same carrier frequency that it uses for transmitting its data to that host. Thus, the picture corresponds to “In-band Full-Duplex” communication. As a result, the self-interference (SI) from the transmitter affects the receiver of the same transceiver. Hence, the overall received signal at the input of the receiver block at baseband level is assumed to be

$$y(t) + \alpha_1 x_1(t-\beta) + \alpha_2^3 x_1^3(t-\beta) + \alpha_3^5 x_1^5(t-\beta) + g(t)$$

where $y(t)$ is the actual signal of interest (SOI), and $\alpha_1 x_1(t-\beta)$ and $\alpha_2^3 x_1^3(t-\beta) + \alpha_3^5 x_1^5(t-\beta)$ represent the linear SI and the nonlinear SI (neglecting terms of order higher than the 5th harmonic), respectively, from the transmitter [2, 4]. It should be noted that RF circuit components in the transceiver, such as power amplifiers, can generate these higher-order harmonics of their inputs [6]. Hence, these terms manifest as nonlinear SI at the receiver. Furthermore, $\alpha_1$, $\alpha_2$, $\alpha_3$ and $\beta$ represent the possible scaling and delay factors incurred by the transmitted signal when it manifests as SI at the receiver. Additionally, let $g(t)$ represent the additive channel noise.

Subsequently, first-stage SI cancelation is performed in the analog domain (discussed in the upcoming section), which performs partial linear SI cancelation only. After the signal is passed through the LNA and ADC, digital cancelation is also performed, which makes the resultant output free from both linear and nonlinear SI. This method of employing both analog and digital cancelation of SI is commonly referred to as the hybrid cancelation of SI.
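A discrete-time sketch of this received-signal model is given below. It is illustrative only: the delay is taken as a whole number of samples, the signals are random stand-ins, and `a1`, `a3`, `a5` are hypothetical values standing in for the overall scaling of the first-, third- and fifth-order SI terms.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000
delay = 3                        # beta expressed in whole samples (assumed)
a1, a3, a5 = 0.5, 0.05, 0.005    # hypothetical scalings of the 1st/3rd/5th-order terms

x1 = rng.standard_normal(n)               # transmitted baseband signal (stand-in)
soi = rng.standard_normal(n)              # signal of interest y (stand-in)
noise = 0.01 * rng.standard_normal(n)     # additive channel noise g

# x1 delayed by `delay` samples, zero-padded at the start.
x1_delayed = np.concatenate([np.zeros(delay), x1[:-delay]])

linear_si = a1 * x1_delayed
nonlinear_si = a3 * x1_delayed ** 3 + a5 * x1_delayed ** 5

# Composite signal at the receiver input: SOI + linear SI + nonlinear SI + noise.
received = soi + linear_si + nonlinear_si + noise
print(received[:5])
```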

3 Solutions for Self-Interference (SI) Cancelation

3.1 Outline of Hybrid SI Cancelation

3.1.1 Passive Analog Cancelation

In the passive analog cancelation, an RF component subdues the SI. This can be realized with the help of a circulator, antenna separation, antenna cancelation, or an isolator. One of the main limitations of this technique is that it cannot suppress the SI reflected from the environment. More details can be found in [2, 5].

3.1.2 Active Analog Cancelation

The residual SI from passive analog cancelation is alleviated by the active analog cancelation. As mentioned earlier, this stage only attempts to suppress the linear SI in the composite signal, which would otherwise saturate the ADC block and distort the SOI. Accordingly, let the composite received signal, as discussed, be

$$y_3(t) = y(t) + \alpha_1 x_1(t-\beta) + \alpha_2^3 x_1^3(t-\beta) + \alpha_3^5 x_1^5(t-\beta) + g(t) \tag{1}$$

with the symbols having the same meanings as before. Active analog cancelation (referred to simply as analog cancelation from here on) attempts to remove the linear term $\alpha_1 x_1(t-\beta)$ (written as $\alpha x_1(t-\beta)$ below, dropping the subscript) before the signal is processed by the LNA. As the transmitted signal $x_1(t)$ is known to the receiver (because the same node transmits it), active analog cancelation generates an estimate of $\alpha x_1(t-\beta)$ and removes it from the received signal, leaving a partially SI-suppressed signal. Kim et al. [2] mention that this estimate can be formed as a linear combination of time-shifted versions of the transmitted signal $x_1(t)$. This logic can be worked out mathematically as follows. From the Nyquist sampling-reconstruction theorem,

$$\alpha \hat{x}_1(t) = \alpha \sum_{n=-\infty}^{\infty} x_1(nT_s)\, \mathrm{sinc}\!\left(\frac{t}{T_s} - n\right) \tag{2}$$

It can be shown that [7],

$$\alpha \hat{x}_1(t + \tau) = \alpha \sum_{n=-\infty}^{\infty} x_1(nT_s + \tau)\, \mathrm{sinc}\!\left(\frac{t}{T_s} - n\right) \tag{3}$$

Evaluating (3) at $t = -\beta$ gives

$$\alpha \hat{x}_1(\tau - \beta) = \alpha \sum_{n=-\infty}^{\infty} x_1(nT_s + \tau)\, \mathrm{sinc}\!\left(\frac{-\beta}{T_s} - n\right) \tag{4}$$

Replacing $\tau$ with $t$ and introducing the coefficients $c_n = \alpha\, \mathrm{sinc}\!\left(\frac{-\beta}{T_s} - n\right)$, the equation becomes

$$\alpha \hat{x}_1(t - \beta) = \sum_{n=-\infty}^{\infty} c_n\, x_1(t + nT_s) \tag{5}$$
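Equation (5) can be checked numerically with a truncated sum. The sketch below is an illustration under stated assumptions: the test signal is a squared sinc (chosen because it is band-limited below $1/(2T_s)$ and decays quickly, so that truncating the sum to a finite number of terms costs little), and the values of `Ts`, `beta`, `alpha` and the truncation length are arbitrary.

```python
import numpy as np

Ts = 1.0        # sampling period (assumed)
beta = 0.37     # delay of the SI path (assumed)
alpha = 0.8     # scaling of the SI path (assumed)

def x1(t):
    """Band-limited test signal: a squared sinc with bandwidth 0.4/Ts < 1/(2*Ts)."""
    return np.sinc(0.4 * t / Ts) ** 2

# Coefficients c_n = alpha * sinc(-beta/Ts - n), truncated to |n| <= 200.
n = np.arange(-200, 201)
c = alpha * np.sinc(-beta / Ts - n)     # np.sinc(u) = sin(pi*u) / (pi*u)

# Compare the linear combination of shifted copies with the exact delayed signal.
t = np.linspace(-2.0, 2.0, 41)
approx = np.array([np.sum(c * x1(ti + n * Ts)) for ti in t])
exact = alpha * x1(t - beta)

max_err = np.max(np.abs(approx - exact))
print("largest deviation:", max_err)
```

In a practical canceler only a handful of taps would be available, so the coefficients would be tuned rather than computed from a known delay; the check above only confirms that the linear-combination form in (5) is consistent.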

Hence, once an estimate of the linear SI is computed, it is subtracted from the composite received signal to obtain a partially suppressed SI signal, i.e.,

$$y_2(t) = y_3(t) - \alpha \hat{x}_1(t - \beta) \tag{6}$$

3.1.3 Digital Cancelation

As mentioned, the main challenge in SI cancelation is the linear distortion caused by multipath delay. Apart from this, the nonlinear distortion caused by the nonlinearity of the transmitter power amplifier (PA) at high transmit power, quantization noise, and phase noise [2, 4] also hinder efficient SI cancelation. For simplicity, quantization noise is ignored, because it is practically unavoidable in digital signal processing. The purpose of digital SI cancelation is to completely suppress the residual SI left by the analog cancelation techniques. In the digital cancelation technique, an attempt is made to model the SI channel filter, whose output (which is the SI) is then simply subtracted from the ADC output [4, 5, 8]. The linear interference component can be modeled as:

$$I_1(n) = \sum_{k=-m+1}^{m} h(k)\, y_1(n-k) \tag{7}$$

where $I_1(n)$ is the linear SI component and $h(k)$ constitutes the corresponding linear SI channel filter parameters. Similarly, let

$$I_2(n) = \sum_{t=3,5,7}^{2r+1} \sum_{k=-m+1}^{m} y_1(n-k)\, |y_1(n-k)|^{t-1}\, h_t(k) \tag{8}$$

where I2 (n) is the nonlinear interference component and h t (k) constitutes the coefficients of the tth order nonlinear SI channel filter model. Thus, the calculated linear and nonlinear components can be removed from the received signal y1 (n) as y1 (n) − I1 (n) − I2 (n) to estimate the signal of interest y(n) as yˆ (n). In order to obtain I1 (n) and I2 (n), the filter coefficients h(k) and h t (k) have to be estimated. For estimating the filter coefficients, let us define the associated cost functions, J1 and J2 formulated based on the least squares setup and seek them to be minimized. Thus, J1 and J2 , defined from [2] are as follows : J1 =

p−1  n=0

|I1 p (n) −

m 

y1 p (n − k)h(k)|2

(9)

k=−m+1

where p is assumed to be the number of pilot symbols. (Thus, it assumed that, in the pilot phase, receiving SOI is considered to be absent, which means y1 p (n) is


L. Yashvanth et al.

the actual raw transmitting signal (before it is launched by the transmitting antenna), i.e., $y_{1p}(n) = x(n)\ \forall n \in \{0, 1, \ldots, p-1\}$, and $I_{1p}(n)$ serves as the linear SI at the output of the analog cancelation block (digital version).) And, from [9],

$$J_2 = \sum_{n=0}^{p-1} \left| I_{2p}(n) - \sum_{t=3,5,7}^{2r+1} \; \sum_{k=-m+1}^{m} y_{1p}(n-k)\,|y_{1p}(n-k)|^{t-1}\, h_t(k) \right|^2 \tag{10}$$

with symbols having meanings analogous to those in (9). These equations are solved using the pseudo-inverse technique (obtained from the method of least squares). This phase of estimating the filter coefficients from the above equations can be termed SI channel estimation.
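As a rough illustration of the SI channel estimation step, the sketch below fits the linear taps $h(k)$ of (9) by least squares on pilot samples. It is only a sketch under stated assumptions: the helper names, the four-tap toy channel, and the random pilot sequence are hypothetical, samples outside the pilot window are treated as zero, and the normal equations are solved directly rather than through an explicit pseudo-inverse.

```python
import random

def solve_linear_system(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(M[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def sample(y, idx):
    """Return y[idx], or 0 outside the record (an assumption of this sketch)."""
    return y[idx] if 0 <= idx < len(y) else 0.0

def linear_si(y, h, m):
    """I1(n) = sum_{k=-m+1}^{m} h(k) y(n-k), following (7)."""
    ks = range(-m + 1, m + 1)
    return [sum(h[j] * sample(y, n - k) for j, k in enumerate(ks))
            for n in range(len(y))]

def estimate_linear_taps(y_pilot, i_pilot, m):
    """Minimize J1 of (9) through the normal equations (Y^T Y) h = Y^T I."""
    ks = list(range(-m + 1, m + 1))
    rows = [[sample(y_pilot, n - k) for k in ks] for n in range(len(y_pilot))]
    q = len(ks)
    gram = [[sum(r[a] * r[b] for r in rows) for b in range(q)] for a in range(q)]
    rhs = [sum(r[a] * i_pilot[n] for n, r in enumerate(rows)) for a in range(q)]
    return solve_linear_system(gram, rhs)

random.seed(0)
m = 2
true_h = [0.5, -0.3, 0.2, 0.1]                         # hypothetical taps for k = -1, 0, 1, 2
pilots = [random.gauss(0.0, 1.0) for _ in range(100)]  # p = 100 pilot symbols
observed_si = linear_si(pilots, true_h, m)             # noiseless toy SI
est_h = estimate_linear_taps(pilots, observed_si, m)
```

With noiseless toy data the estimated taps match the hypothetical channel to within floating-point error; with noise, the same routine would return the least squares fit instead.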

3.2 Proposed Solution for Implementing Digital Cancelation Using Neural Networks

The proposition is to model the digital cancelation block using a neural network. Here, a feed-forward neural network trained by back-propagation [10] is used to estimate the SI channel (comprising linear and nonlinear components jointly) by solving the minimization problems in (9) and (10). The model can be approximated with three hidden layers, as shown in Fig. 3. Let the number of nodes in the input layer be $N$, and let all three hidden layers contain the same number of nodes as the input layer. The output layer contains a single node for predicting the SI sample values.

Fig. 3 Architecture of the neural network used

Thus, the associated overall loss function based on the mean square error (MSE) per epoch is of the form:

$$\frac{1}{N_b} \sum_{n=0}^{N_b-1} |z(n) - \hat{z}(n)|^2 \tag{11}$$

where, assuming 19 "effective" weights ($\{h_k\}_{k=-9}^{9}$),

$$\hat{z}(n) = h_{-9}\,y_1(n-9) + h_{-8}\,y_1(n-8) + \cdots + h_0\,y_1(n) + \cdots + h_8\,y_1(n+8) + h_9\,y_1(n+9) \tag{12}$$

and $z(n)$ represents the desired SI samples. Here, $N_b$ represents the number of iterations; to be precise, $N_b = p - N + 2$. Recall that $y_1(n)$ is the same as $x(n)$ in the training phase. The novelty of this paper is to use the same set of neural network weights (in analogy with the coefficients of the filter modeling the SI channel) to cancel both linear and nonlinear SI from the composite signal. The conventional approach, in contrast, employs two distinct filters to cancel linear and nonlinear SI separately. The authors therefore claim that the computational complexity of the proposed approach is lower than that of the conventional solutions. The trained model is then used to find the linear and nonlinear interference components in the received signal during the testing phase. These interference components are removed from the received signal to obtain the signal of interest, $\hat{y}(n)$.
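The loss in (11) and the 19-weight combination in (12) can be sketched as follows. This is an illustrative sketch rather than the authors' code: the function names, the identity-like weight vector, and the toy signals are assumptions, and `num_iterations` simply encodes the relation $N_b = p - N + 2$ stated above.

```python
def predict_si(y1, n, weights):
    """z_hat(n) as in (12): sum over k = -9..9 of h_k * y1(n + k)."""
    half = len(weights) // 2
    return sum(w * y1[n + k] for w, k in zip(weights, range(-half, half + 1)))

def mse_loss(z, z_hat):
    """Loss of (11): mean of |z(n) - z_hat(n)|^2 over the N_b values."""
    return sum(abs(a - b) ** 2 for a, b in zip(z, z_hat)) / len(z)

def num_iterations(p, n_inputs):
    """N_b = p - N + 2, as stated in the text."""
    return p - n_inputs + 2

weights = [0.0] * 19
weights[9] = 1.0                            # hypothetical: only h_0 is non-zero
y1 = [float(i) for i in range(40)]          # toy input samples
centers = range(9, len(y1) - 9)             # indices where all 19 taps are available
z_hat = [predict_si(y1, n, weights) for n in centers]
z = [y1[n] + 0.5 for n in centers]          # toy "desired" SI with a constant 0.5 offset
loss = mse_loss(z, z_hat)                   # equals 0.5 ** 2 for this toy data
```

For p = 100 pilot symbols and N = 19, `num_iterations` gives 83, and for the 900 test symbols it gives 883, matching the sample-set counts quoted for the dataset.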

3.2.1 Implementation Details

Let the additional specifics of the neural network be initialized as follows:

• Activation function for each layer: rectified linear unit (ReLU).
• Total number of input nodes, N (hence, the number of "effective" weights) = 19.
• The weights are updated after each batch containing 19 samples per row is processed. The total number of training samples per iteration is also 19.
• The model is trained with 83 iterations per epoch.
• Number of epochs = 400.
• Weights are updated using the Adam optimization technique (instead of classical stochastic gradient descent). The weights are initialized to null values.
• For tuning the hyperparameters of the model, k-fold cross-validation (k = 5) is used instead of a separate validation dataset.

Table 1 summarizes the details of the dataset used for training and testing.
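To make the layer sizes above concrete, the following sketch builds a 19-19-19-19-1 feed-forward network and runs one forward pass. Several choices here are assumptions made only for illustration: bias terms are included, the output node is left linear so that negative SI samples can be represented, and small random weights replace the null initialization so that the toy output is not trivially zero. Training with Adam is not shown.

```python
import random

LAYER_SIZES = [19, 19, 19, 19, 1]   # input, three hidden layers, single output node
ITERATIONS_PER_EPOCH = 83
EPOCHS = 400

def relu(v):
    return [max(0.0, x) for x in v]

def init_layers(sizes, rng):
    """Create (weights, biases) per layer; biases are an assumption of this sketch."""
    layers = []
    for n_in, n_out in zip(sizes, sizes[1:]):
        w = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
        b = [0.0] * n_out
        layers.append((w, b))
    return layers

def forward(layers, x):
    """ReLU after each hidden layer; the output node is kept linear (assumption)."""
    for idx, (w, b) in enumerate(layers):
        x = [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(w, b)]
        if idx < len(layers) - 1:
            x = relu(x)
    return x[0]

def count_weights(sizes):
    """Number of connection weights, excluding biases."""
    return sum(a * b for a, b in zip(sizes, sizes[1:]))

rng = random.Random(0)
layers = init_layers(LAYER_SIZES, rng)
window = [rng.gauss(0.0, 1.0) for _ in range(19)]   # one toy 19-sample input row
si_estimate = forward(layers, window)
total_updates = ITERATIONS_PER_EPOCH * EPOCHS       # weight updates over the whole training
```

Under these settings the network has 3 × 19 × 19 + 19 = 1102 connection weights, and 83 iterations over 400 epochs correspond to 33,200 weight updates.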

Table 1 Dataset details

| Purpose | No. of sample sets | Description |
|---|---|---|
| Training | 83 | 100 symbols out of 1000 symbols are used as pilot symbols (p = 100), i.e., for training the model. A total of 83 sample sets is formed out of 100 symbols, with each set containing 19 symbols |
| Testing | 883 | The remaining 900 symbols are used for testing, which forms 883 sample sets |

With the set of specifications mentioned in Sect. 3.2.1, an MSE of 0.1 was obtained.

4 Results and Discussions

In this section, we describe the implementation details and the corresponding results, with interpretations and discussion, of performing SI cancelation using the proposed neural network. To illustrate the proposed solution, the following scenario is considered.

1. Transmitting signal, x(n): baseband signal of 10 s duration (sampling frequency, fs = 100 Hz) with spectral content between 15 and 25 Hz. The signal is created using FIR coefficients by the frequency sampling technique.
2. Receiving signal, y(n): baseband signal of 10 s duration (sampling frequency, fs = 100 Hz) with spectral content between 35 and 45 Hz. The signal is generated as a periodic random Gaussian signal.
3. Channel noise, g(n): additive white Gaussian noise with resulting SNR = 0 dB.

The time-domain plots of the transmitting and receiving signals are given in Figs. 4 and 5, respectively. The corresponding frequency-domain plots are depicted in Figs. 6 and 7. Further, in order to mimic the analog versions of the above signals, the signals are defined with a higher sampling frequency, say 10 times the original fs, i.e., 1000 Hz. Next, as per (1), a composite signal modeling the received signal embedded in SI is formed with β = 100 and with random values for αi ∀i ∈ {1, 2, 3}. This assumption is made because, in general, a wireless channel is time-varying in nature [11].

As per the sequence defined in Sect. 3.1, the foremost step is SI suppression via passive analog cancelation. As this method is accomplished by means of physical structures, only the two subsequent steps are accounted for in this paper. Accordingly, the next step is analog cancelation. In accordance with (5), an estimate of the linear SI is constructed as a linear combination of 40 shifted versions of the transmitting signal. This estimate is subtracted from the received composite signal to obtain y2(t). The resultant signal, which passes through an LNA and an ADC, is now ready to be processed by the trained neural network as described in Sect. 3.2.

Once the neural network is trained, it is employed in the testing phase, acting as a mere filter with the weights obtained in the training phase. The resultant signal is filtered, and the filter output is then subtracted from the signal that was partially SI free (the digital signal at the input of the neural network).
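The 0 dB SNR condition for the channel noise g(n) can be illustrated with a short sketch that scales white Gaussian noise to a target SNR. The function names and the 40 Hz toy tone standing in for the SOI are assumptions; the paper's actual SOI is a periodic random Gaussian signal occupying 35 to 45 Hz.

```python
import math
import random

def signal_power(x):
    """Average power of a real-valued sequence."""
    return sum(v * v for v in x) / len(x)

def awgn_for_snr(x, snr_db, rng):
    """Return Gaussian noise scaled so that P_signal / P_noise equals snr_db."""
    raw = [rng.gauss(0.0, 1.0) for _ in x]
    target_noise_power = signal_power(x) / (10.0 ** (snr_db / 10.0))
    scale = math.sqrt(target_noise_power / signal_power(raw))
    return [scale * v for v in raw]

def measured_snr_db(x, g):
    return 10.0 * math.log10(signal_power(x) / signal_power(g))

rng = random.Random(1)
fs = 100                                                              # Hz
soi = [math.sin(2 * math.pi * 40 * n / fs) for n in range(10 * fs)]   # toy 10 s tone
noise = awgn_for_snr(soi, 0.0, rng)                                   # SNR = 0 dB
noisy_soi = [s + g for s, g in zip(soi, noise)]
```

At 0 dB the noise power equals the signal power, which is a demanding condition for recovering the SOI.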

Fig. 4 Transmitting Signal (SI signal) – Time domain waveform (axes: Transmitting Signal Strength versus Sample Number; marked data point at X = 501, Y = 5.237)

Fig. 5 Receiving Signal (SOI) – Time domain waveform (axes: Receiving Signal Amplitude versus Sample Number)

Fig. 6 Transmitting Signal (SI signal) – Frequency domain – Magnitude and phase response (panels: Magnitude (dB) and Phase (degrees) versus Frequency (Hz))

Fig. 7 Receiving Signal (SOI) – Frequency domain – Magnitude and phase response (panels: Magnitude (dB) and Phase (degrees) versus Frequency (Hz))

Fig. 8 Superimposed Receiving Signal (SOI) and SI canceled signal – 0 to 100 samples (curves: Receiving Signal, Extracted Signal; axes: Signal Amplitude versus Sample Number)

Fig. 9 Superimposed Receiving Signal (SOI) and SI canceled signal – 100 to 200 samples (curves: Receiving Signal, Extracted Signal; axes: Signal Amplitude versus Sample Number)

The extracted signals are compared with the SOI, and the corresponding plots, averaged over 5 Monte Carlo simulations, are shown in Figs. 8, 9, 10, 11, 12, 13, 14, 15, 16, and 17.

Fig. 10 Superimposed Receiving Signal (SOI) and SI canceled signal – 200 to 300 samples

Fig. 11 Superimposed Receiving Signal (SOI) and SI canceled signal – 300 to 400 samples

Fig. 12 Superimposed Receiving Signal (SOI) and SI canceled signal – 400 to 500 samples

Fig. 13 Superimposed Receiving Signal (SOI) and SI canceled signal – 500 to 600 samples

Fig. 14 Superimposed Receiving Signal (SOI) and SI canceled signal – 600 to 700 samples

Fig. 15 Superimposed Receiving Signal (SOI) and SI canceled signal – 700 to 800 samples

Fig. 16 Superimposed Receiving Signal (SOI) and SI canceled signal – 800 to 900 samples

Fig. 17 Superimposed Receiving Signal (SOI) and SI canceled signal – 900 to 1000 samples

(Figs. 10 to 17 each show two curves, Receiving Signal and Extracted Signal, with axes Signal Amplitude versus Sample Number.)

Further, an attempt is made to visualize the efficiency of SI cancelation in the frequency domain. Figure 18 characterizes the spectral content of the net received signal (1), while Fig. 19 depicts the frequency-domain content of the final SI-free extracted signal. While Fig. 18 contains substantial content across the entire baseband from 15 to 45 Hz, Fig. 19 shows significant content only in the frequency range occupied by the SOI. Thus, it is evident that the proposed solution suppresses the nonlinear SI from the SOI well using just one filter (the neural network) with a single set of weights. To look at the SI cancelation in more detail, consider Fig. 4, which shows that the 1000-sample transmitting signal peaks at approximately half its duration (≈ 501st sample). By virtue of (1) and the choice of β = 100, it is therefore of interest to inspect the different signal values at the 601st sample, as illustrated in Fig. 20. As seen from the figure, SI is well suppressed in the extracted signal. Furthermore, the correlation between the extracted signal and the desired SOI is found to be high.
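The text does not state how the correlation between the extracted signal and the SOI is computed. One common choice is the Pearson correlation coefficient, sketched below on toy data; the function name and both toy sequences are assumptions for illustration and are not the paper's signals.

```python
import math

def pearson(a, b):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

soi = [math.sin(0.3 * n) for n in range(200)]          # toy SOI
extracted = [0.9 * s + 0.05 * math.cos(1.7 * n)        # toy estimate with a small residual
             for n, s in enumerate(soi)]
rho = pearson(soi, extracted)
```

A value of `rho` close to 1 indicates that the extracted signal follows the SOI closely, independent of any overall gain difference between the two.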

Fig. 18 Composite signal (received by receiver) – Frequency domain – Magnitude and phase response (panels: Magnitude (dB) and Phase (degrees) versus Frequency (Hz))

Fig. 19 SI canceled signal – Frequency domain – Magnitude and phase response (panels: Magnitude (dB) and Phase (degrees) versus Frequency (Hz))

Fig. 20 Illustration of SI Cancelation: Superimposed Receiving signal (SOI), SI canceled signal and SI signal (Transmitting signal) (curves: SI Signal, Receiving Signal, Extracted Signal; axes: Signal Amplitude versus Sample Number; marked data points at X = 601 with Y = 12.57, Y = 0.7048, and Y = -0.7835)


5 Conclusions

As described in Sect. 3.1, an effective conventional method of curbing the self-interference that arises in in-band full-duplex communication is the hybrid cancelation technique. It demands two separate optimum filters to suppress the linear and nonlinear SI components, respectively, in the digital domain. However, constructing them individually by solving (9) and (10) is computationally expensive. The proposed method, which uses a single neural network in place of these optimum filters, greatly reduces the computational expense without compromising the quality of the results, because the neural network-based technique suppresses both linear and nonlinear SI components jointly in a single step. This statement is justified by the results demonstrated in the preceding section. It should be noted that SI is well suppressed in both the time and frequency domains. However, only nonlinear terms up to the 5th harmonic, which have more potential to cause significant SI, are considered throughout this paper and hence are liable to be canceled. Other higher-order terms can safely be neglected.

References

1. Chávez-Santiago R, Szydełko M, Kliks A et al (2015) 5G: the convergence of wireless communications. Wirel Pers Commun 83:1617–1642. https://doi.org/10.1007/s11277-015-2467-2
2. Kim J, Sim MS, Chung M, Kim DK, Chae CB (2016) Full duplex radios in 5G: fundamentals, design and prototyping. In: Luo FL, Zhang C (eds) Signal processing for 5G. https://doi.org/10.1002/9781119116493.ch22
3. Zhou M, Liao Y, Song L (2017) Full-duplex wireless communications for 5G. In: Xiang W, Zheng K, Shen X (eds) 5G mobile communications. Springer, Cham. https://doi.org/10.1007/978-3-319-34208-5_11
4. Ahmed EA (2014) Self-interference cancellation in full-duplex wireless systems. UC Irvine. ProQuest ID: Ahmed_uci_0030D_12951. Merritt ID: ark:/13030/m5rz0s9d. Retrieved from https://escholarship.org/uc/item/7zh6f8fm
5. Nwankwo CD, Zhang L, Quddus A, Imran MA, Tafazolli R (2018) A survey of self-interference management techniques for single frequency full duplex systems. IEEE Access 6:30242–30268. https://doi.org/10.1109/ACCESS.2017.2774143
6. Farzaneh F (2018) Introduction to wireless communication circuits
7. Proakis JG, Manolakis DK (2006) Digital signal processing, 4th edn. Prentice-Hall Inc., USA
8. Kaiser T, Zarifeh N (2016) General principles and basic algorithms for full duplex transmission. In: Luo FL, Zhang C (eds) Signal processing for 5G. https://doi.org/10.1002/9781119116493.ch16
9. Haneda K, Valkama M, Riihonen T, Antonio Rodriguez E, Korpi D (2016) Design and implementation of full duplex transceivers. In: Luo FL, Zhang C (eds) Signal processing for 5G. https://doi.org/10.1002/9781119116493.ch17
10. Rumelhart DE, Hinton GE, Williams RJ (1986) Learning representations by back-propagating errors. Nature 323:533–536. https://doi.org/10.1038/323533a0
11. Rappaport T (2001) Wireless communications: principles and practice, 2nd edn. Prentice Hall PTR, USA

Dimensionality Reduction of KDD-99 Using Self-perpetuating Algorithm

Swapnil Umbarkar and Kirti Sharma

Abstract In this digitized world, a massive amount of data is available on the network, but it is not safe from the sophisticated techniques of attackers. These threats create the need for intrusion detection systems (IDSs). The KDD-99 dataset is used as the standard model for research work in IDSs. However, the KDD-99 dataset suffers from the curse of dimensionality, as the number of features and the total number of instances in the dataset are too large. In this paper, a self-perpetuating algorithm built on individually analyzed feature selection techniques is proposed. With the J48 algorithm, the proposed algorithm yields a reduced feature subset of 14 features with reduced time, accuracy increased by 0.369%, and the number of features decreased by 66.66%.

Keywords Feature selection · KDD-99 dataset · J48 algorithm · Classification · Dimensionality reduction

1 Introduction

Intrusion is one of the most damaging threats to any network, and attacks spread in various forms. Research on network intrusion is therefore a major concern. New attacks are observed every day on systems and networks, and everyone from small firms to large organizations is exposed to them. Attackers continually find new techniques to disrupt networks, defeating the security tools of even large firms. Hence, there is a need to develop systems that can counter every recent type of attack. Intrusion detection systems (IDSs) are built to detect such attacks. Based on the type of attack, IDSs are categorized into two types, namely host-based IDSs and network-based IDSs. Host-based IDSs scan and examine the files and OS processes of computer systems, while network-based IDSs do the same over network traffic. To develop such IDSs, the system has to be trained so that efficient outputs are obtained without any information loss. In this paper, the KDD-99 dataset is considered to analyze the results. KDD-99 broadly groups its records into five classes: (a) DoS, (b) U2R, (c) R2L, (d) Probe, and (e) normal. The pitfall of the KDD-99 dataset is its high dimensionality, approximately 42 × 400 K, which increases the time complexity of IDSs.

S. Umbarkar (B) · K. Sharma, Computer Science and Engineering Department, Parul Institute of Engineering and Technology, Parul University, Vadodara, India
© Springer Nature Singapore Pte Ltd. 2021. E. S. Gopi (ed.), Machine Learning, Deep Learning and Computational Intelligence for Wireless Communication, Lecture Notes in Electrical Engineering 749, https://doi.org/10.1007/978-981-16-0289-4_6

1.1 Feature Selection Methods

Before training any IDS, feature selection is performed to choose the features on which the system is trained to detect attacks. Improving the feature selection process and decreasing time complexity are the major concerns of this paper. Feature selection techniques generally fall into three classes: (a) filter methods, (b) wrapper methods, and (c) embedded methods. Our aim is to uncover the technique that gives the best features by analyzing all types of feature selection techniques. Filter methods select features by analyzing their interdependence with the dependent feature, wrapper methods assess the adequacy of features by actually training an algorithm on them, and embedded methods select features by analyzing each recursion of the algorithm. In this paper, the following feature selection techniques [1] are first analyzed individually:

1. CfsSubsetEval
2. ClassifierAttributeEval
3. ClassifierSubsetEval
4. GainRatioAttributeEval
5. InfoGainAttributeEval
6. OneRAttributeEval
7. SymmetricalUncertAttributeEval
8. WrapperSubsetEval

After analyzing these algorithms individually, the proposed algorithm is applied to find the best feature selection technique with minimum features and reduced time complexity.
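As a hedged illustration of how filter-type rankers score a feature, the sketch below computes the standard information gain and gain ratio of a discrete feature with respect to the class label. It is not Weka's implementation of InfoGainAttributeEval or GainRatioAttributeEval; the function names and the four-record toy data are assumptions.

```python
import math
from collections import Counter

def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    total = len(values)
    return -sum((c / total) * math.log2(c / total) for c in Counter(values).values())

def information_gain(feature, labels):
    """H(class) minus the expected class entropy after splitting on the feature."""
    total = len(labels)
    groups = {}
    for f, y in zip(feature, labels):
        groups.setdefault(f, []).append(y)
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional

def gain_ratio(feature, labels):
    """Information gain normalized by the entropy of the feature itself."""
    split_info = entropy(feature)
    return information_gain(feature, labels) / split_info if split_info > 0 else 0.0

labels = ["attack", "attack", "normal", "normal"]    # toy class labels
informative = ["tcp", "tcp", "udp", "udp"]           # toy feature that separates the classes
uninformative = ["a", "b", "a", "b"]                 # toy feature unrelated to the class
```

Ranking features by such a score and keeping the top-ranked ones is the general idea behind the attribute evaluators listed above, whereas subset evaluators such as WrapperSubsetEval score whole feature subsets.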

2 Related Work

This section reviews previous studies on feature reduction methods and the classification methods used to increase efficiency and reduce time complexity. In 2018, Umbarkar and Shukla [1] proposed heuristic-based feature reduction techniques for dimensionality reduction of the KDD-99 dataset. They considered only three feature selection techniques, viz. information gain, gain ratio, and correlation coefficient, although eight feature selection techniques are available, and achieved an accuracy of 92.60% using the C4.5 classification algorithm. In 2020, Kasongo and Sun [2] proposed wrapper-based feature extraction for a wireless intrusion detection system; their WFEU technique gives a reduced feature set of 22 attributes. They used SVM as the classification technique, giving an accuracy of only 77.16%. In 2015, Bjerkestrand et al. [3] analyzed various feature selection algorithms and proposed three feature selection algorithms with different attribute evaluators and search methods. They indicated that the performance of the classifier is not affected if the number of attributes is reduced. In 2020, Li et al. [4] proposed a method that considers only the weakly correlated features. They divided the original training dataset into four parts and applied a CNN, giving high accuracy and low complexity on the NSL-KDD dataset. In 2019, Sara et al. [5] used filter and wrapper methods for feature selection. They used feature grouping based on linear correlation coefficient with the cuttlefish algorithm (FGLCC-CFA), obtaining a reduced feature subset of 15 features, with FGLCC applied for high speed and CFA for the best accuracy. In 2019, Selvakumar and Muneeswaran [6] used filter- and wrapper-based methods with the firefly algorithm to reduce the large set of features in the KDD-99 dataset. Using C4.5 and a Bayesian network, they obtained a reduced feature set of only 10 features and reported improved accuracy. In 2020, Alazzam et al. [7] introduced an algorithm named pigeon-inspired optimizer, evaluated on three datasets: KDD-99, NSL-KDD, and UNSW-NB15. It showed the best efficiency in terms of TPR, FPR, accuracy, and F-score in comparison with other algorithms. They used only one type of feature selection, wrapper methods, even though other types could also be analyzed to compare the results. In 2019, Hakim et al. [8] analyzed the information gain, gain ratio, chi-squared, and relief selection methods with the J48, random forest, Naïve Bayes, and KNN classification algorithms. Their results show a significant enhancement in feature selection but only a small increase in accuracy. In 2011, Nziga [9] used two dimensionality reduction techniques, one linear (principal component analysis) and one nonlinear (multidimensional scaling), and found a lower-dimensional optimal set of features from the main dataset, KDD-99. To compare classification techniques as well, she used the J48 classifier and the Naïve Bayes approach and obtained 4 and 12 dimensions of the features, but the number of feature selection techniques remained limited to two. In 2015, Chabathula et al. [10] used the principal component analysis (PCA) feature selection technique, applied it to different classifiers such as random forest, KNN, J48, and SVM, and obtained the best result with the random forest tree algorithm. Here too, the authors were restricted to one feature selection technique. In 2010, Das and Nayak [11] proposed a generic approach using a divide-and-conquer technique that set aside the available feature selection algorithms. Their work gave the idea for our self-perpetuating algorithm, namely that a generic approach can lead to an algorithm too.

After reviewing the research papers related to our work, we highlighted above those that we considered as our reference. The previous studies focused on increasing the accuracy, but the set of features taken remained the same, with no reduction. Although some papers also presented work on feature reduction, we take a general approach by considering all the feature selection techniques and taking the best out of them using the derived self-perpetuating algorithm.

3 Proposed Work

3.1 Basic Idea Behind Self-perpetuating Algorithm

Intrusion detection systems (IDSs) are designed to detect attacks in the network by training the system with a predefined dataset and then applying the IDS to the system. The KDD-99 dataset is commonly taken as the standard model, and the same dataset is used here to propose the self-perpetuating algorithm. The issue arises in the training phase of the system. The KDD-99 dataset with 42 features has huge dimensionality, which makes the training phase time-consuming. The study shows that not all the features present in the dataset are of equal importance, so the set of attributes can be reduced while improving efficiency. Many researchers have already proposed algorithms to reduce the dimensionality of the dataset, but they studied a particular class of feature selection without analyzing all the classes of feature selection techniques.

3.2 The Proposed Algorithm

In this paper, a self-perpetuating algorithm is proposed by analyzing both filter and wrapper methods of feature selection. The feature selection methods analyzed are: (a) CfsSubsetEval, (b) ClassifierAttributeEval, (c) ClassifierSubsetEval, (d) Information Gain, (e) Gain Ratio, (f) OneRAttributeEval, (g) SymmetricalUncertAttributeEval, and (h) WrapperSubsetEval. The main principle of the self-perpetuating algorithm is to analyze the individual feature selection techniques and then combine the best feature selection with the rest, so that features are not lost and a more efficient algorithm can be derived. The benchmark against which the accuracy and time complexity of every method are compared is the accuracy (A) and time complexity (T) obtained with the full dataset. At the initial stage of the self-perpetuating algorithm, the dataset is passed to all the feature selection techniques successively.

Algorithm: Self-perpetuating algorithm
Input: KDD-99 dataset
Output: Optimized reduced feature set
A = Accuracy of full dataset
T = Time complexity of full dataset
Ai = Accuracy of feature set FSi
Ti = Time complexity of feature set FSi
FSi: CfsSubsetEval, ClassifierAttributeEval, ClassifierSubsetEval, Information Gain, Gain Ratio, OneRAttributeEval, SymmetricalUncertAttributeEval, WrapperSubsetEval

for each FSi do
    Apply C4.5 to each derived feature set obtained from FSi
    if (Ai > A) && (Ti > T)
        FSbest_individual = FSi
    end if
end for
for each (FSi - FSbest_individual) do
    FScombine_features = (FSbest_individual) Union FSi
    Apply C4.5 to each derived feature set obtained from FScombine_features
    if (Acombine_features > A) && (Tcombine_features > T)
        FSbest_combine = FScombine_features
    end if
end for
end

After applying the C4.5 classification algorithm to the output of each technique, its accuracy and time complexity are compared with the benchmark values A and T. If a technique improves on the benchmark, it is taken as the superior technique, giving the best accuracy in less time with a reduced feature set (FSbest_individual). The next phase of the self-perpetuating algorithm unites FSbest_individual with each of the remaining feature selection techniques and again compares the accuracy and time complexity with A and T by applying the C4.5 classification algorithm. After analyzing all the iterations of the second phase, FSbest_combine is derived.
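The two phases described above can be sketched in Python as follows. This is an illustrative sketch, not the authors' implementation: `evaluate` is a hypothetical stand-in for training and timing a C4.5/J48 classifier on a feature subset, the toy feature sets and scores are invented, and two interpretive assumptions are made. First, a candidate is accepted when its accuracy is higher and its time is lower than the full-dataset benchmark, following the "best accuracy in less time" wording rather than the printed `Ti > T` comparison. Second, among accepted candidates the most accurate one is kept.

```python
def self_perpetuating(feature_sets, evaluate, baseline):
    """feature_sets: dict mapping method name -> set of feature indices.
    evaluate: callable(frozenset) -> (accuracy, time).
    baseline: (A, T) measured on the full feature set."""
    base_acc, base_time = baseline

    def beats_baseline(acc, time):
        # Assumption: better means higher accuracy and lower time than the benchmark.
        return acc > base_acc and time < base_time

    # Phase 1: pick the best individual feature selection result.
    best_name, best_acc = None, None
    for name, feats in feature_sets.items():
        acc, time = evaluate(frozenset(feats))
        if beats_baseline(acc, time) and (best_acc is None or acc > best_acc):
            best_name, best_acc = name, acc
    if best_name is None:
        return None, None

    # Phase 2: unite the best individual set with each remaining set.
    best_combined, best_combined_acc = None, None
    for name, feats in feature_sets.items():
        if name == best_name:
            continue
        combined = frozenset(feature_sets[best_name]) | frozenset(feats)
        acc, time = evaluate(combined)
        if beats_baseline(acc, time) and (
                best_combined_acc is None or acc > best_combined_acc):
            best_combined, best_combined_acc = combined, acc
    return best_name, best_combined

def toy_evaluate(features):
    """Hypothetical scoring in place of a real classifier run."""
    useful = {3, 5, 12, 36}
    accuracy = 90.0 + len(features & useful) - 0.1 * len(features - useful)
    seconds = 0.5 * len(features)
    return accuracy, seconds

toy_sets = {"filter_a": {3, 12, 20}, "wrapper_b": {3, 5, 36}, "filter_c": {7, 8, 9}}
baseline = toy_evaluate(frozenset(range(1, 42)))
best_individual, best_combined = self_perpetuating(toy_sets, toy_evaluate, baseline)
```

In this toy run the second phase returns the union of the best individual subset with the subset that adds the most useful features, which mirrors how the reduced feature set is assembled in the experiments that follow.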

4 Experimental Setup

The KDD-99 dataset is used for the study. The experiments are run on a system with 8 GB of RAM and the Windows 10 operating system, and Weka 3.8.4 is used to analyze the techniques. The feature selection techniques listed in [1] are applied to the entire KDD-99 training dataset. After applying the different feature selection techniques, the ranking of the features is derived, as shown in Table 1. From the feature ranking of each feature selection algorithm, the most important features are selected for each category of feature selection technique, and the rest of the features are considered unpotential features, as they provide less information. The comparison of potential and unpotential attributes is shown in Fig. 1 (Table 2).

Table 1 Feature ranking

| Feature selection method | Feature ranking |
|---|---|
| CfsSubsetEval | 3,11,12,31,37 |
| ClassifierAttributeEval | 41,13,12,20,14,15,16,17,18,11,10,9,4,2,3,5,8,6,7,19,21,40,34,33,22,35,36,37,38,39,32,31,30,25,23,24,26,29,27,28,1 |
| ClassifierSubsetEval | 3,5,11,23,32,34,35,36 |
| Information_Gain | 5,23,3,6,24,12,36,32,2,37,33,35,34,31,30,29,38,39,25,4,26,1,40,41,27,28,10,22,16,19,13,17,11,8,14,18,9,15,7,20,21 |
| Gain_Ratio | 12,11,14,22,9,37,3,2,31,17,32,6,18,36,5,19,16,1,10,15,23,38,25,35,39,24,26,30,4,34,33,41,40,27,29,13,28,8,7,20,21 |
| OneRAttributeEval | 5,23,3,6,12,36,32,37,24,31,35,33,34,2,1,39,41,38,40,30,29,27,25,26,4,10,16,28,19,22,17,11,13,18,14,15,9,7,20,8,21 |
| SymmetricalUncertAttributeEval | 12,3,37,5,6,2,36,32,23,31,24,35,34,38,33,1,25,39,30,4,26,29,41,40,27,28,10,22,16,19,13,17,11,8,14,18,9,15,7,20,21 |
| WrapperSubsetEval | 3,5,30,35,36 |


Fig. 1 Attribute comparison

Table 2 Comparison of size and attributes of different reduced feature subset

| Feature selection methods | Volume (Mb) | No. of selected attributes | No. of unpotential attributes |
|---|---|---|---|
| KDD full dataset | 46.6 | 42 | 0 |
| Cfs subset Eval | 6.73 | 5 | 37 |
| Classifier attribute Eval | 10.8 | 10 | 32 |
| Classifier subset Eval | 12.9 | 8 | 34 |
| Info gain attribute Eval | 15 | 10 | 32 |
| Gain ratio attribute Eval | 11.4 | 10 | 32 |
| OneR attribute Eval | 15.1 | 10 | 32 |
| Symmetrical uncert attribute Eval | 14.4 | 10 | 32 |
| Wrapper subset Eval | 8.37 | 5 | 37 |

For each selected feature subset, the J48 classification algorithm is applied on the KDD-99 training dataset in Weka 3.8.4; the resulting training accuracy and training time are shown in Table 3. Figures 2 and 3 compare the different feature selection algorithms in terms of training accuracy and training time, respectively. From both figures, WrapperSubsetEval has higher accuracy and lower training time than the original KDD-99 dataset with 42 features. In the next stage, the feature subset selected by each feature selection method is considered, and the J48 classification algorithm is trained with the reduced feature set. In the next

Table 3 Comparison of feature selection methods

| Feature selection methods | Total attribute | Selected attribute | Classification algorithm | Accuracy (Training %) | Training time (s) |
|---|---|---|---|---|---|
| Cfs subset Eval | 42 | 5 | J48 | 97.140 | 0.800 |
| Classifier attribute Eval | 42 | 10 | J48 | 93.892 | 1.760 |
| Classifier subset Eval | 42 | 8 | J48 | 99.961 | 0.990 |
| Info gain attribute Eval | 42 | 10 | J48 | 99.930 | 2.300 |
| Gain ratio attribute Eval | 42 | 10 | J48 | 98.740 | 5.560 |
| OneR attribute Eval | 42 | 10 | J48 | 99.933 | 0.970 |
| Symmetrical uncert attribute Eval | 42 | 10 | J48 | 99.330 | 1.140 |
| Wrapper subset Eval | 42 | 5 | J48 | 99.952 | 0.830 |

Fig. 2 Comparison of training accuracy of different feature selection methods

phase, accuracy and time complexity are calculated for the testing dataset. Table 4 gives the accuracy and time complexity of the J48 algorithm for different reduced feature sets of feature selection methods. From Fig. 4, WrapperSubsetEval is having more accuracy, i.e., 92.218% than original KDD-99 dataset with 42 features. The accuracy got from reduced feature subset is 0.133% more than original accuracy of KDD-99 calculated on 42 features.


Fig. 3 Comparison of training time of different feature selection methods

Table 4 Comparison of testing accuracy and time complexity of different feature selection methods

| Feature selection method | Accuracy (%) | Time (s) |
|---|---|---|
| All features | 92.085 | 14.77 |
| CfsSubsetEval | 87.174 | 10.278 |
| ClassifierAttributeEval | 85.411 | 129.2 |
| ClassifierSubsetEval | 92.049 | 123.64 |
| Information_Gain | 91.829 | 5.234 |
| Gain_Ratio | 91.807 | 11.678 |
| OneRAttributeEval | 91.763 | 142.43 |
| SymmetricalUncertAttributeEval | 91.828 | 123.11 |
| WrapperSubsetEval | 92.218 | 138.15 |

In addition, the number of features is reduced from 42 to 5, a decrease of 88.095% in the overall feature volume (Fig. 1). In the next phase, the reduced feature set obtained from WrapperSubsetEval is combined with each of the other reduced feature subsets. The J48 classification algorithm is trained on the combined subsets, and the accuracy and time complexity calculated for the testing dataset are shown in Table 5. From Fig. 6, WrapperSubsetEval combined with Gain_Ratio achieves a testing accuracy of 92.454%, which is higher than that of the original KDD-99 dataset with 42 features. The accuracy obtained from this reduced feature subset is 0.369 percentage points higher than the accuracy calculated on all 42 features. The number of features is also reduced from 42 to 14, a decrease of 66.66% in the overall feature volume (Fig. 1).
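The reduction and accuracy figures quoted above can be checked with a few lines of arithmetic. The following Python sketch uses only the numbers stated in the text and in Table 4; the attribute indices in the last part are made up purely for illustration, since the actual selected attributes are not listed here.

```python
def percent_reduction(original: int, reduced: int) -> float:
    """Share of features removed, in percent."""
    return (original - reduced) / original * 100.0


TOTAL_FEATURES = 42          # full KDD-99 attribute count (from the text)
BASELINE_ACCURACY = 92.085   # testing accuracy on all features (Table 4)

# (label, features kept, testing accuracy in %), values quoted from the text
results = [
    ("WrapperSubsetEval", 5, 92.218),
    ("WrapperSubsetEval union Gain_Ratio", 14, 92.454),
]

for label, kept, accuracy in results:
    reduction = percent_reduction(TOTAL_FEATURES, kept)
    gain = accuracy - BASELINE_ACCURACY
    print(f"{label}: {reduction:.3f}% fewer features, "
          f"{gain:+.3f} percentage points")

# Hypothetical attribute indices (NOT the paper's selections), only to show
# that a union of 5 and 10 attributes yields 14 when exactly one is shared.
wrapper_subset = {3, 5, 6, 23, 36}
gain_ratio_subset = {5, 10, 11, 12, 14, 22, 29, 30, 37, 39}
combined = wrapper_subset | gain_ratio_subset
print(len(combined))
```

Note that 28/42 rounds to 66.667%, whereas the text truncates it to 66.66%.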


Fig. 4 Comparison of testing accuracy of different feature selection methods

Fig. 5 Comparison of testing time of different feature selection methods

Table 5 Testing accuracy and time complexity of feature selection techniques with WrapperSubsetEval

| Combined feature subsets | Accuracy (%) | Time (s) |
|---|---|---|
| ClassifierSubsetEval union WrapperSubsetEval | 92.064 | 1949.55 |
| ClassifierSubsetEval union Info_Gain | 91.93 | 243.68 |
| CfsSubsetEval union WrapperSubsetEval | 92.204 | 2687.01 |
| WrapperSubsetEval union Info_Gain | 92.014 | 159.35 |
| WrapperSubsetEval union Gain_Ratio | 92.454 | 161.29 |
| WrapperSubsetEval union OneRAttributeEval | 92.025 | 154.45 |
| WrapperSubsetEval union SymmetricalUncertAttributeEval | 91.99 | 144.32 |
| WrapperSubsetEval union ClassifierAttributeEval | 92.0792 | 228.9 |

Fig. 6 Comparison of testing accuracy of different feature selection methods

5 Conclusion and Future Work

In this paper, with the help of our proposed self-perpetuating algorithm, we reduced the feature set to a subset of 14 features. Testing accuracy and time complexity were calculated on the reduced feature subset, and the testing accuracy is better than that obtained on the original dataset with 42 features: it increased by 0.369 percentage points, while the number of features decreased by 66.66%. Thus, the proposed algorithm successfully reduced the dimensionality of the KDD-99 dataset. In the future, this work can be extended by applying variations of mathematical operations to obtain the reduced feature set. Further comparison of feature selection methods can be done by considering different classification algorithms such as decision tree, Naïve Bayes, and KNN.


Fig. 7 Comparison of testing time of different feature selection methods


Energy-Efficient Neighbor Discovery Using Bacterial Foraging Optimization (BFO) Algorithm for Directional Wireless Sensor Networks

Sagar Mekala and K. Shahu Chatrapati

Abstract In directional wireless sensor networks (WSNs), the existing neighbor discovery methods involve higher latency and energy consumption than the block design-based methods. Moreover, the duty cycle schedule of nodes has to be addressed to increase the network lifetime. In this paper, an energy-efficient collaborative neighbor discovery mechanism using the bacterial foraging optimization (BFO) algorithm is proposed. In this mechanism, each node with a directional antenna performs beamforming using BFOA with sector number and beam direction as the fitness function. Finally, appropriate active nodes with higher energy levels are selected from the neighbors during data transmission. The results show that the proposed model minimizes power consumption and delay and extends the network lifetime.

Keywords WSN · Energy · BFO · Algorithm · Neighbor

S. Mekala (B) Department of CSE, Mahatma Gandhi University, Nalgonda, Telangana, India
K. Shahu Chatrapati Department of CSE, JNTUH CEM, Peddapalli, Telangana, India
© Springer Nature Singapore Pte Ltd. 2021 E. S. Gopi (ed.), Machine Learning, Deep Learning and Computational Intelligence for Wireless Communication, Lecture Notes in Electrical Engineering 749, https://doi.org/10.1007/978-981-16-0289-4_7

1 Introduction

Typically, WSNs consist of a finite set of resource-constrained tiny devices, such as sensors and actuators, deployed in a field to monitor physical and environmental phenomena. These small devices are equipped with limited power, little storage, short-range radio transceivers, and limited processing. They therefore have not only sensing capability but also data processing and communication capabilities. In sensor networks, nodes are densely distributed in a field and cooperate on an allotted function such as environment monitoring (for example, temperature, air quality, noise, humidity, animal movements, water quality, or pollutants), industrial process control, battlefield surveillance, healthcare control, home intelligence, and security and surveillance intelligence [1]. Traditional WSNs contain many sensor devices that can coordinate to accomplish a common task within an environmental area. Each sensor consists of a small microcontroller for communications [2]. Effective route discovery models can be used to increase the network lifetime by consuming limited power during communication activities [3–5].

A fundamental operation in WSNs is discovering all the neighbors of a sensor node, called neighbor discovery, which is one of the primitive functionalities for many sensor networking applications. It is a challenging operation, since the number of neighbors of a node cannot be predicted accurately. Neighbor discovery with limited energy consumption is essential to network formation: it governs sensor network setup and normal operations (such as routing and topology control) and prolongs the lifetime of WSNs. In the recent literature on the design of energy-efficient neighbor discovery, the discovery process is classified into three categories: probabilistic, deterministic, and quorum-based. Regular neighbor discovery mechanisms focus on reducing power consumption by limiting the active periods of sensor nodes [1].

A neighbor discovery protocol (NDP) is a scheme for finding neighbor nodes. In symmetric neighbor discovery, every node in the WSN has a common duty cycle, whereas in asymmetric approaches the nodes use independent duty cycles. Nodes in WSNs operate on limited battery power in resource-constrained environments. Hence, NDPs need to handle both asymmetric and symmetric duty cycles efficiently, and the lack of support for neighbor discovery in asymmetric networks is a considerable limitation of the block design-based NDP [4, 6]. In WSNs, the sensor nodes operate in three modes and adopt a low duty cycle (i.e., a node's work period is shorter than its sleep period). Two primary metrics are used to evaluate the energy efficiency of nodes in the sensor network: discovery latency and duty cycle. However, in a low-duty-cycle sensor network, a node must remain in standby mode for a certain period before its neighbors wake up, which leads to considerable delay. In general, a small duty cycle causes a longer discovery latency and vice versa. Moreover, in some WSN applications the dynamic nature of the devices causes constant changes in network topology, so that the neighbors of a node change from time to time. Hence, finding neighbor nodes with limited power consumption and low discovery latency is a difficult problem [5, 7].

1.1 Problem Identification

As discussed in [2], the neighbor discovery methods U-Connect, Disco, and quorum involve high latency and energy consumption compared with the block design-based neighbor discovery methods. However, the block design-based approach involves high complexity and computation overhead because of the new block design. The collaborative neighbor discovery mechanism for directional WSNs [8] applies the beamforming technique for neighbor discovery. During beamforming, however, the appropriate sector number and beam direction have to be chosen for better results [8–11]. Moreover, this technique does not address the nodes' energy levels or duty cycle schedules to prolong the lifetime of the network. To solve these issues, an energy-efficient collaborative neighbor discovery mechanism using BFOA is proposed.

2 Related Works

Resource-constrained, self-organized, small devices in WSNs must discover their neighbor nodes efficiently in many time-critical applications. Many neighbor discovery techniques have emerged in sensor networks with the goals of efficient power utilization and data delivery while adjusting the duty cycle and latency [1–3]. In this section, we review the existing models that influenced the design of the model proposed in this paper.

To avoid unnecessary wakeup schedules in the neighbor discovery process and to address issues with block design-based duty cycle schedules in various IoT applications, Lee et al. [4] proposed a neighbor discovery scheme that merges different duty cycles and block designs with the help of the exclusive-OR NDP. The implemented model operates in both asymmetric and symmetric duty cycle environments to reduce power consumption.

To optimize the performance of neighbor discovery, Nur et al. [8] established the collaborative neighbor discovery (COND) mechanism for directional wireless sensor networks (DSNs). It achieves low-duty-cycle neighbor discovery by sampling messages in a distributed manner to learn about neighbor nodes. The simulation results for indirect discovery in COND show that it significantly reduces the discovery latency in the network.

Numerous neighbor discovery algorithms have been proposed in the literature, and most of them emphasize pairwise schedule discovery. Chen et al. [9] implemented a group-based neighbor discovery design for mobile sensor networks that works on top of existing pairwise discovery schedules. By carefully designing a schedule reference mechanism among all nodes in the network, it prolongs the network lifetime and reduces discovery latency.

Considering the node constraints, packet overhead, and traffic patterns in WSNs, Amiri et al. [12] suggested a bio-inspired model based on the foraging behavior of ants, i.e., ant colony optimization. The model, combined with fuzzy logic, governs route discovery from the source node to the sink node in multihop communication. The simulation results show that the fuzzy logic ant colony optimization routing algorithm (FACOR) significantly prolongs the lifetime of the network by carefully managing the power consumption of nodes, but it still needs to address node failure, mobility, and newly joining nodes in a high-density network.

To enhance data packet delivery over unreliable wireless links, Meghashree et al. [13] offered a model that adopts a reactive route discovery protocol in WSNs and introduces a biased back-off scheme in the route discovery stage to reduce discovery overhead.


An energy-efficient shortest path (EESP) model for WSNs was proposed by Lingam et al. [14]. EESP follows the hierarchical approaches of DSR and energy-efficient AODV, along with uniform load distribution among network nodes, and discovers an optimal path by considering the energy of intermediate nodes. The results show that it maximizes the average lifetime of nodes in WSNs where all nodes have the same capabilities. Under this algorithm, only a certain number of nodes need to keep their radios on at any given time; not all nodes in a region are active all the time. Hence, the active schedules of nodes in the network are minimized, and there is a higher possibility of discovering at least a few neighboring devices with the same duty cycle. As mentioned in the previous section, an asynchronous protocol is efficient and straightforward to implement for WSNs, and in this paper we combine COND with BFOA to prolong the lifetime of the network [4–6, 9].

3 Energy-Efficient Neighbor Discovery Using BFOA

3.1 Overview

In this paper, an energy-efficient collaborative neighbor discovery mechanism using BFOA is presented. In this algorithm, each node with a directional antenna performs beamforming using BFO to poll the neighbors along its sector. The sector number and beam direction are considered as the fitness function for BFOA. During the polling stage, each neighbor sends a REPLY to the HELLO message, containing its node ID, its remaining energy, and its duty cycle schedule. On receiving the REPLY messages, the polling node obtains the complete neighborhood information, including the neighbors' duty cycle schedules and energy levels. Hence, during data transmission, the optimum active nodes with sufficient battery energy are selected from the neighbors.

3.2 Fundamentals of Optimization Algorithm

In recent years, a new stochastic technique was proposed for optimization problems: the bacterial foraging algorithm (BFA), based on the natural behavior of Escherichia coli (E. coli) bacteria, which reside in the intestines. The BFA is an optimization algorithm based on computational intelligence. It has been widely applied to engineering problems, including directional antennas, power consumption mechanisms, controller design, and artificial neural networks, because of the social foraging behavior of E. coli bacteria. We were therefore inspired to use the basic properties of bacterial foraging optimization by applying three processes: chemotaxis, reproduction, and elimination-dispersal.

In general, an E. coli bacterium moves in two different ways, tumbling and swimming, to perform various operations, for example, finding food, nesting, brooding, protecting, and guarding [5, 11]. These two modes alternate randomly; this process, called chemotaxis, is used to find nutrients. An actual bacterium performs the tumbling movement during foraging by means of a set of flagella, which guide the bacterium during its movement. The bacterium swims faster when the flagella rotate in an anti-clockwise direction [11]. In this algorithm, bacteria undergoing chemotaxis tend to move toward good food and avoid noxious environments. When they receive sufficient nutrients, they increase their population and divide to form reproduction blocks. In the elimination-dispersal phase, all the bacteria in a specific block are destroyed, or a complete block is dispersed into a new environment [11].

3.3 Estimation of Metrics

a. Fitness Function

The following equation gives the fitness function for BFOA:

$$F = F_{switch} \times (|W| - 1) \times \text{(delay tuning parameter)} \tag{1}$$

where $F_{switch}$ is the switch delay and $W$ is the set of disks. The delay tuning parameter is scaled by ρ, a value selected such that the latency in every layer is reduced. When the discovery of neighbors is guaranteed in a given layer w ∈ W, a node needs to wait only for a small period, and vice versa; otherwise, unnecessary discovery overhead results. Consequently, the selected value of ρ dynamically controls the awake time slot of a node in a particular layer w ∈ W for neighbor discovery.

b. Node Residual Energy

The following equation gives the residual energy ($E_{res}$) of a node after one data packet transmission/reception in the sensor network:

$$E_{res} = E_i - (E_{tx} + E_{rx}) \tag{2}$$

where $E_i$ is the initial energy level, $E_{tx}$ is the energy consumed for transmission, and $E_{rx}$ is the energy consumed for reception.
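As a worked instance of Eq. (2), the following Python sketch computes the residual energy after one transmission and one reception. It uses the initial energy (12.0 J) and the transmit/receive power draws (0.660 W and 0.395 W) listed in the simulation settings; converting power to energy by multiplying with the 1 ms slot duration is an assumption made here for illustration, not something the paper states.

```python
def residual_energy(e_initial: float, e_tx: float, e_rx: float) -> float:
    """Eq. (2): E_res = E_i - (E_tx + E_rx), all values in joules."""
    return e_initial - (e_tx + e_rx)


# Values taken from the simulation settings table.
E_INITIAL_J = 12.0
P_TX_W = 0.660
P_RX_W = 0.395

# Assumption: each packet occupies one 1 ms slot, so energy = power * time.
SLOT_S = 1e-3

e_tx = P_TX_W * SLOT_S
e_rx = P_RX_W * SLOT_S
e_res = residual_energy(E_INITIAL_J, e_tx, e_rx)
print(f"{e_res:.6f} J")
```

Under these assumptions a single packet exchange costs about 1 mJ, so a node's energy budget is dominated by how long its radio stays awake rather than by any one packet.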


c. Node Duty Cycle Schedule

In this section, we describe the scheduling of the duty cycle, which determines the energy budget of nodes in the network. The duty cycle is the fraction of time a node spends in the work period. Nodes in the network can operate in three modes: work period, sleep period, and listen period. The following equations describe the duty cycle of a node in the network.

The number of data packets aggregated by the receiver node in time slot T is given by

$$A_1 = \int_{t_{ini}}^{t_{ini}+T} DPR(t)\,dt \tag{3}$$

The number of data packets broadcast by a node in time slot T is

$$A_2 = \int_{t_{ini}}^{t_{ini}+T_s+T_{tx}} DPT(t)\,dt \tag{4}$$

The number of packets aggregated by the receiver node excluding $T_{tx}$ is

$$A_3 = \int_{t_{ini}}^{T-T_{tx}} DPR(t)\,dt \tag{5}$$
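The packet counts in Eqs. (3) and (5) are integrals of a packet rate over a time window, so they can be evaluated numerically once a rate function is known. The sketch below is a minimal illustration using the trapezoidal rule; the constant rate `dpr` and the values of `t_ini`, `T`, and `T_tx` are hypothetical, chosen only so the result is easy to verify by hand.

```python
def integrate(rate, t_start: float, t_end: float, steps: int = 1000) -> float:
    """Composite trapezoidal rule for the integral of rate(t) over [t_start, t_end]."""
    h = (t_end - t_start) / steps
    total = 0.5 * (rate(t_start) + rate(t_end))
    for k in range(1, steps):
        total += rate(t_start + k * h)
    return total * h


def dpr(t: float) -> float:
    """Hypothetical reception rate DPR(t): a constant 5 packets per second."""
    return 5.0


# Hypothetical timing values (seconds), not taken from the paper.
t_ini, T, T_tx = 0.0, 10.0, 1.0

a1 = integrate(dpr, t_ini, t_ini + T)   # Eq. (3)
a3 = integrate(dpr, t_ini, T - T_tx)    # Eq. (5)
print(a1, a3)
```

The same helper applies to Eq. (4) by passing a transmission-rate function in place of `dpr` together with that equation's own limits.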

We consider DPT(t) and DPR(t) as a low advent cost. Then, the following equation can estimate duty cycle by considering the mean advent cost. T
0 signify the elementary chemotactic phase dimension, which used to describe the distance between the stages in the course of turns. Let  α be the unit span arbitrary track signifying all. (Using unit span, the track of drive after a fall can be assessed.) The following sequence steps are elaborate in design of optimization algorithms:


Arrange the chemotactic arguments S(i) and the X nodes in ascending order of $C^{i}_{ACTIVE}$. The nodes with high measured values of $C^{i}_{ACTIVE}$ die, and the available neighbor nodes with adequate energy values are each split into two.

Note: During the replication phase, the population is sorted so that the nodes with the least energy die, and the best nodes are split into two and placed in the same environment.

If r < R, move to the removal diffusion phase. In this case, the specified number of replication phases has not yet been reached.

Removal Diffusion

For every i from 1 to X, with probability $P_d$, remove the node and disperse one to an arbitrary position in the search area. If e < $P_d$, then move to step 1. Otherwise, the process terminates.

Note: Chemotactic events occur more frequently than replication events, which in turn occur more frequently than elimination-dispersal events.

Thus, based on the sector number and beam direction, neighbors are polled along the node's sector.
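The three nested phases described above (chemotaxis, replication, and removal diffusion) can be sketched in generic form. The following Python code is a simplified, textbook-style bacterial foraging loop that minimizes an arbitrary cost function; it is not the authors' EENDBFO implementation, and the function name, parameter names, default values, and the sphere test function are all assumptions made for illustration.

```python
import random


def bfo_minimize(cost, dim, bounds, n_bacteria=20, n_chem=30, n_swim=4,
                 n_repro=4, n_elim=2, p_elim=0.25, step=0.1, seed=0):
    """Simplified bacterial foraging optimization (n_bacteria must be even)."""
    rng = random.Random(seed)
    lo, hi = bounds

    def random_position():
        return [rng.uniform(lo, hi) for _ in range(dim)]

    def unit_direction():
        v = [rng.uniform(-1.0, 1.0) for _ in range(dim)]
        norm = sum(x * x for x in v) ** 0.5 or 1.0
        return [x / norm for x in v]

    def clamp(p):
        return [min(hi, max(lo, x)) for x in p]

    population = [random_position() for _ in range(n_bacteria)]
    best_pos = list(min(population, key=cost))
    best_cost = cost(best_pos)

    for _ in range(n_elim):
        for _ in range(n_repro):
            health = [0.0] * n_bacteria          # accumulated cost per bacterium
            for _ in range(n_chem):
                for i in range(n_bacteria):
                    current = cost(population[i])
                    direction = unit_direction()  # tumble: pick a random direction
                    for _ in range(n_swim):       # swim while the cost improves
                        candidate = clamp([x + step * d for x, d
                                           in zip(population[i], direction)])
                        c = cost(candidate)
                        if c >= current:
                            break
                        population[i], current = candidate, c
                    health[i] += current
                    if current < best_cost:
                        best_pos, best_cost = list(population[i]), current
            # Replication: the healthier half survives and each survivor splits.
            order = sorted(range(n_bacteria), key=lambda i: health[i])
            survivors = [population[i] for i in order[:n_bacteria // 2]]
            population = [list(p) for p in survivors] + [list(p) for p in survivors]
        # Removal diffusion: relocate each bacterium with probability p_elim.
        for i in range(n_bacteria):
            if rng.random() < p_elim:
                population[i] = random_position()
    return best_pos, best_cost


def sphere(p):
    """Toy cost function with its minimum at the origin."""
    return sum(x * x for x in p)


position, value = bfo_minimize(sphere, dim=2, bounds=(-5.0, 5.0))
print(position, value)
```

In the paper's setting the cost would be derived from the fitness function over sector number and beam direction, and "health" would correspond to the node energy measure used for sorting; here a plain numeric cost stands in for both.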

3.4 Energy-Efficient Neighbor Discovery

The steps of this algorithm are as follows. In the polling phase, every sensor node $S_i$ broadcasts a beacon (a short HELLO message) to its set of neighboring sensor devices:

$$S_i \xrightarrow{\text{beacon}} \text{Neighbor sensor node set}$$

The frame format of the beacon is shown in Table 1. The parameters carried in the beacon (HELLO message) include the remaining energy and the duty cycle.

Table 1 Beacon format of a node

| Node_ID | Frame_ID | Remaining energy | Duty cycle |
|---|---|---|---|

Each node sends a REPLY to the beacon (HELLO message):

$$S_i \xrightarrow{\text{REPLY}} \text{Neighboring nodes}$$

On receiving the REPLY message, the polling node obtains the complete details of its neighbors, along with their duty cycle schedules and energy levels. During data transmission, the active nodes with high residual energy are selected from the neighbors.
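The selection step at the end of the polling phase can be sketched as a filter over the collected replies. In the Python sketch below, the `Reply` record mirrors the beacon fields of Table 1, while the encoding of the duty cycle schedule as a set of awake slot indices, the energy threshold, and the function name are hypothetical choices made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Reply:
    """REPLY contents, mirroring the beacon fields of Table 1."""
    node_id: int
    frame_id: int
    remaining_energy: float   # joules
    awake_slots: frozenset    # hypothetical encoding of the duty cycle schedule


def select_active_neighbors(replies, slot, min_energy, count):
    """Pick up to `count` neighbors awake in `slot`, highest energy first."""
    eligible = [r for r in replies
                if slot in r.awake_slots and r.remaining_energy >= min_energy]
    eligible.sort(key=lambda r: r.remaining_energy, reverse=True)
    return eligible[:count]


# Made-up replies collected by a polling node.
replies = [
    Reply(1, 0, 11.5, frozenset({0, 2})),
    Reply(2, 0, 4.0, frozenset({2})),
    Reply(3, 0, 9.0, frozenset({1})),
    Reply(4, 0, 10.2, frozenset({2, 3})),
]

chosen = select_active_neighbors(replies, slot=2, min_energy=5.0, count=2)
print([r.node_id for r in chosen])
```

Node 2 is awake in the requested slot but falls below the energy threshold, and node 3 has enough energy but is asleep, so only nodes that satisfy both conditions are candidates for forwarding.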

4 Simulation Results

4.1 Simulation Setup

The proposed energy-efficient neighbor discovery using bacterial foraging optimization (EENDBFO) algorithm is simulated in NS2 and compared with the collaborative neighbor discovery (COND) mechanism [8]. The performance metrics are neighbor discovery delay, neighbor discovery ratio, packets received, and average node residual energy (Table 2).

4.2 Simulation Results and Analysis

Impact of varying the number of nodes

To analyze the effect of the number of nodes in a sensor network, we vary the node density in the area from 50 to 200. Figure 1 shows the discovery delay for EENDBFO and COND when the number of nodes is set to 50, 100, 150, and 200. As Fig. 1 shows, the discovery delay in EENDBFO falls from 7.5 to 3.2, and the discovery delay in COND falls from 10.0 to 5.1. The discovery delay in EENDBFO is 36% smaller than that of COND. Figure 2 shows the discovery ratio for EENDBFO and COND as the number of nodes is varied. As the number of nodes grows from 50 to 200, the discovery ratio of EENDBFO grows from 0.40 to 0.54, and that of COND grows from 0.20 to 0.45. The analysis shows that the discovery ratio of EENDBFO is 33% higher than that of COND.

Table 2 Simulation metrics

| Parameter | Value |
|---|---|
| Number of nodes deployed | 50, 100, 150 and 200 |
| Size of deployment area | 1300 × 1300 |
| Deployment type | Uniform random |
| MAC protocol | IEEE 802.11b |
| Traffic type | CBR |
| Data transmission rate | 50 kb |
| Propagation model | Free space model |
| Antenna type | Directional antenna |
| Modulation | BPSK |
| Number of directions | 4 |
| Transmission range | 200–400 m |
| Slot duration | 1 ms |
| Initial node energy | 12.0 J |
| Transmission energy consumption | 0.660 W |
| Reception energy consumption | 0.395 W |

Fig. 1 Neighbor discovery delay for varying the nodes

Fig. 2 Neighbor discovery ratio for varying the nodes


Fig. 3 Packets received for varying the nodes

Figure 3 shows the data packets received in EENDBFO and COND when the number of nodes is set to 50, 100, 150, and 200. The simulation results show that the packets received in EENDBFO increase from 1233 to 1688, and those received in COND increase from 809 to 1381. Hence, EENDBFO receives 24% more packets than COND. Figure 4 shows the average residual energy in EENDBFO and COND as the number of nodes is varied. In EENDBFO, the average remaining energy of a node falls from 11.8 to 11.4 J, whereas in COND it falls from 7.4 to 5.1 J. The average residual energy in EENDBFO is 44% higher than in COND.

Fig. 4 Average residual energy for varying the nodes

Impact of varying the communication range of nodes

To analyze the impact of the communication range of nodes in the deployed area, we vary the transmission and reception range of the nodes from 200 to 400 m. Figure 5 shows the discovery delay for EENDBFO and COND when the range is set to 200, 250, 300, 350, and 400 m. As Fig. 5 shows, the discovery delay in EENDBFO increases from 3.8 to 4.8 s, and the discovery delay in COND increases from 4.6 to 7.0 s. The discovery delay of EENDBFO is 26% smaller than that of COND.

Fig. 5 Neighbor discovery delay for varying the communication range

Figure 6 shows the node discovery ratio for EENDBFO and COND as the communication range is varied. As the range increases from 200 to 400 m, the node discovery ratio in EENDBFO falls from 0.70 to 0.59, whereas the discovery ratio in COND falls from 0.43 to 0.26. The analysis shows that the discovery ratio of EENDBFO is 50% higher than that of COND.

Fig. 6 Neighbor discovery ratio for varying the range

Figure 7 shows the data packets received in EENDBFO and COND when the range is set to 200, 250, 300, 350, and 400 m. The simulation results show that the packets received in EENDBFO increase from 1757 to 2149, and those received in COND increase from 1439 to 1750. Hence, EENDBFO receives 20% more packets than COND.

Fig. 7 Packets received for varying the range

Figure 8 shows the average residual energy in EENDBFO and COND as the communication range is varied. In EENDBFO, the average remaining energy of a node increases from 10.4 to 11.4 J, and in COND it increases from 5.5 to 6.9 J. The average residual energy in EENDBFO is 44% larger than in COND.

Fig. 8 Average residual energy for varying the range

5 Conclusion

In this paper, we have developed EENDBFOA for directional WSNs. In this algorithm, each node with a directional antenna performs beamforming using BFO to poll the neighbors along its sector. Based on the information gathered during the polling stage, the appropriate active nodes with higher energy levels are selected from the neighbors during data transmission. The simulation results show that the proposed EENDBFOA reduces discovery delay and energy consumption and increases the discovery ratio.


References

1. Manir SB (2015) Collective neighbor discovery in wireless sensor network. Int J Comput Appl (0975–8887) 131(11)
2. Choi S, Lee W, Song T, Youn J-H (2015) Block design-based asynchronous neighbor discovery protocol for wireless sensor networks. J Sens 2015. Article ID 951652, 12 p
3. Selva Reegan A, Baburaj E (2015) An effective model of the neighbor discovery and energy efficient routing method for wireless sensor networks. Indian J Sci Technol 8(23). https://doi.org/10.17485/ijst/2015/v8i23/79348
4. Lee W, Song T-S, Youn J-H (2017) Asymmetric neighbor discovery protocol for wireless sensor networks using block design. Int J Control Autom 10(1):387–396
5. Sun W, Yang Z, Wang K, Liu Y (2014) Hello: a generic flexible protocol for neighbor discovery. IEEE
6. Qiu Y, Li S, Xu X, Li Z (2016) Talk more listen less: energy-efficient neighbor discovery in wireless sensor networks. IEEE
7. Karthikeyan V, Vinod A, Jeyakumar P (2014) An energy-efficient neighbour node discovery method for wireless sensor networks. arXiv preprint arXiv:1402.3655
8. Nur FN, Sharmin S, Ahsan Habib M, Abdur Razzaque M, Shariful Islam M, Almogren A, Mehedi Hassan M, Alamri A (2017) Collaborative neighbor discovery in directional wireless sensor networks: algorithm and analysis. EURASIP J Wireless Commun Netw 2017:119
9. Chen L, Shu Y, Gu Y, Guo S, He T, Zhang F, Chen J (2015) Group-based neighbor discovery in low-duty-cycle mobile sensor networks. IEEE Trans Mobile Comput
10. Agarwal R, Banerjee A, Gauthier V, Becker M, Kiat Yeo C, Lee BS (2011) Self-organization of nodes using bio-inspired techniques for achieving small-world properties. IEEE
11. Das S, Biswas A, Dasgupta S, Abraham A (2009) Bacterial foraging optimization algorithm: theoretical foundations, analysis, and applications. Foundations of computational intelligence, vol 3. Springer, Berlin, pp 23–55
12. Amiri E, Keshavarz H, Alizadeh M, Zamani M, Khodadadi T (2014) Energy efficient routing in wireless sensor networks based on fuzzy ant colony optimization. Int J Distrib Sensor Netw 2014. Article ID 768936, 17 p
13. Meghashree M, Uma S (2015) Providing efficient route discovery using reactive routing in wireless sensor networks. Int J Res Comput Appl Robot 3(4):145–151
14. Sathees Lingam P, Parthasarathi S, Hariharan K (2017) Energy efficient shortest path routing protocol for wireless sensor networks. Int J Innov Res Adv Eng (IJIRAE) 4(06):2349–2163

Auto-encoder—LSTM-Based Outlier Detection Method for WSNs Bhanu Chander and Kumaravelan Gopalakrishnan

Abstract Wireless sensor networks (WSNs) have attracted tremendous interest in various real-life applications, in particular environmental applications. When sensors are deployed for long periods, it is difficult to check the features and quality of the raw sensed data. After deployment, sensor nodes may be exposed to harsh conditions, which can cause them to stop working or to send inaccurate data. If such faults are not detected, the quality of the sensor network can be greatly reduced. Outlier detection ensures the quality of the sensor data through safe and sound monitoring as well as consistent detection of interesting and important events. In this article, we propose a novel method called smooth auto-encoder to learn robust and discriminative feature representations, and the reconstruction error between the input and output of the smooth auto-encoder is used as an activation signal for outlier detection. Moreover, we employ an LSTM-bidirectional RNN with majority voting for collective outlier detection.

Keywords WSNs · Outlier · Smooth auto-encoder · LSTM-RNN

B. Chander · K. Gopalakrishnan
Department of Computer Science and Engineering, Pondicherry University, Pondicherry 609605, India
© Springer Nature Singapore Pte Ltd. 2021. E. S. Gopi (ed.), Machine Learning, Deep Learning and Computational Intelligence for Wireless Communication, Lecture Notes in Electrical Engineering 749, https://doi.org/10.1007/978-981-16-0289-4_8

1 Introduction Recent advances in computer networking, wireless technologies, and information and communication technologies have produced a new technology named wireless sensor networks (WSNs). In the past, nodes and networks were wired and delivered very limited results. With the development of wireless networks, it is now feasible to build large networks with high performance. As a result, WSNs are widely employed in numerous real-life applications in industrial, academic, civil, and military fields. Here, the major objective of the deployed nodes is to collect valuable raw sensed data from the real world and transmit it to expert systems, where it is analyzed for appropriate decision making [1–4]. However,


depending on the application, sensor nodes are deployed in harsh surroundings, and this, together with the constrained resources of sensors (bandwidth, energy, CPU performance, memory, etc.), leaves WSNs vulnerable to different types of misbehavior or outliers. An outlier or anomaly is a measurement that deviates significantly from the common patterns of the sensed data. The outlier was first defined by Grubbs in 1969 as “An Outlier observation or outlier, is one that deviates markedly from other members of the sample in which it occurs.” In 2015, Titouna defined it as “An observation that deviates a lot from other observations and can be generated by a different mechanism,” and Van Vuong in 2017 defined an outlier as “Data Items which doesn’t conform to an expected pattern or other items in the data set.” Outliers in sensor data typically correspond to node software or hardware malfunctions, reading errors, malicious attacks, and unusual events. Hence, it is vital to classify outliers in the sensor data efficiently and accurately to ensure data quality, safe and sound monitoring, and consistent recognition of interesting and important events. In WSNs, outliers are one of the sources that significantly distort the collected data, whereas high-quality decision making in expert systems requires high-quality data [3–12]. The main reason to detect outliers in sensed data is that outliers are sometimes more interesting than normal patterns, since they may contain essential hidden information. If we find such outliers, we can detect upcoming events before they occur. Outlier or anomaly detection is an essential issue for numerous research domains such as data mining, health care, drugs and medicines, and sensor networks, and it has been studied in different settings and applications.
Here, any outlier detection task aims to identify patterns that do not agree with the expected pattern; such unusual patterns are defined as outliers. Moreover, it is useful for discovering noise, fraud, defects, intrusions, errors, and so on [1–3]. The question of how to detect outliers or anomalies has become increasingly important in disease diagnosis, machine health monitoring, credit card fraud detection, environmental event detection, network intrusion detection, and other areas. Outlier detection approaches safeguard the quality of sensor data. Effective and efficient outlier detection methods designed for WSNs not only identify outliers in a distributed and online manner with high detection accuracy and a low false alarm rate, but also satisfy WSN resource limitations in terms of bandwidth, memory, communication, and computation. However, the context of sensor networks and the nature of sensor data make the design of a suitable outlier detection system difficult. The harsh environment of a WSN also affects outlier detection approaches. According to the literature, sensor nodes are constrained in computational power and memory, which means that the approaches employed or developed for outlier detection should have low computational complexity and occupy modest memory space [4–6, 13–15]. Additionally, labeled or preprocessed data are difficult to acquire in WSNs, so outlier or anomaly detection for WSNs should be able to operate on unlabeled data. From the above, we can conclude that the key challenge of outlier or anomaly detection in WSNs is to identify outliers with high


precision while consuming minimal resources of the sensor node or the network [7, 8, 14–16]. Research on outlier detection has produced numerous techniques for sensor networks. Machine learning (ML) models achieve high accuracy when they are given labeled datasets. In WSNs, which are deployed in real-time applications, it is difficult to obtain labeled data. Deep learning (DL) is a subfield of ML; with the help of multiple nonlinear transformations, DL allows networks to automatically learn representations from raw sensed data. Traditional ML methods generally need considerable domain expertise as well as time to select good features from the raw sensed data. DL simplifies the process of manual feature engineering, which overcomes this limitation of traditional ML models [3–6, 13, 14]. In 2006, Hinton drew researchers' attention to DL. In his work, Hinton proposed a technique to train a deep neural network (DNN): first, unsupervised greedy layer-wise pre-training is used to find a set of reasonably good parameters, and then the complete network is fine-tuned, which effectively avoids the vanishing gradient problem. Following advances in many further aspects, DL and representation learning have also been applied to outlier detection in WSNs. Compared with traditional ML, DL offers more capability and promising progress for WSNs. One advantage is high prediction accuracy: ML cannot analyze all of the complex parameters, such as channel variation and obstructions, whereas DL can efficiently abstract them layer by layer.
In addition, there is no need to preprocess the input data, because DL typically takes the feature parameters directly as collected from the network; this advantage of DL reduces the design complexity and increases the prediction accuracy [7, 8, 10–12, 14–16]. Instead of designing features manually, it is more helpful for a model to learn effective feature representations automatically from raw data through representation learning. For a variety of computer vision tasks, an ideal feature representation should be robust to small variations, smooth so as to preserve data structures, and discriminative for classification-related tasks. DL achieves a higher success rate in WSN applications since it tolerates incomplete or erroneous raw sensed input data, can easily handle a large amount of input information, and is able to make control decisions [1–4, 7, 8, 14–16]. In this manuscript, we present an auto-encoder variant, the smooth auto-encoder (SmAE), to learn robust and discriminative feature representations. It differs from standard AEs, which reconstruct every example from its encoding: we use the encoding of every instance to reconstruct its local neighbors. In this way, the learned representations are consistent among local neighbors and robust to small variations of the inputs.


2 Related Literature Work In [15], the authors designed a novel outlier detection model that learns spatio-temporal relationships among different sensors and uses the learned representation to recognize outliers. They used a SODESN-based distributed RNN architecture along with a learning method to train the SODESN. The authors evaluated the model with data collected in the real world, and the outcomes show excellent detection even with poor link quality. In [16], the authors proposed two outlier detection approaches, LADS and LADQA, specifically for WSNs. They employed QS-SVM and converted it to a sort problem to reduce the computational complexity. The experimental outcome confirms that the proposed approaches have lower computational cost with high outlier detection accuracy. The authors of [7] took a different approach and proposed a deep auto-encoder to detect outliers in the spectrum of sensor nodes by comparing the data against a fixed threshold value. The evaluation was carried out with various numbers of hidden layers, and the results show improved performance. In [8], the authors built a model for heterogeneous WSNs that detects outliers automatically with the help of cloud data analysis. The experimental evaluation of the proposed process was performed on both edge and cloud with real data obtained in an indoor building environment and then corrupted with a series of synthetic impairments. The results show that the proposed process can self-adapt to changes in the environment and correctly classify the outliers. The authors of [9] proposed Toeplitz support vector data description (TSVDD) for efficient outlier detection; they used a Toeplitz matrix for random feature mapping, which reduced both space and time complexity. Moreover, a new model selection strategy was employed to keep the model stable with lower-dimensional features.
The experimental results on the IBRL dataset reveal that TSVDD reaches higher precision and lower time complexity than existing methods. Reference [10] proposed a one-class collective outlier detection method based on an LSTM-RNN neural network. The model is trained with normal time series data; a prediction error over a certain number of the most recent time steps that is higher than the threshold value indicates a collective outlier. The model was evaluated on a time series version of the KDD-1999 dataset, and the experiments show that the proposed model can detect collective anomalies efficiently. The authors of [11] designed a model that predicts the next short-term frame from the preceding frames by employing an LSTM with a denoising auto-encoder. Here, the reconstruction error between the input and the output of the auto-encoder is used as an activation signal to detect novel events. In [12], the authors proposed a technique in which a deep auto-encoder is used as the central classifier, the model is trained with a cross-entropy loss function, and momentum factors are included in the back-propagation algorithm to resolve the issues of weight updating. Laboratory experiments on datasets have shown that the proposal extracts features with high accuracy. In [17], the authors employed an LSTM to detect outliers in time series data by predicting a number of time steps ahead. The prediction


error of a single point was then evaluated by fitting its prediction error vector to a multivariate Gaussian distribution, which was used to assess the likelihood of outlier behavior. The authors of [18] proposed a model that combines a predictive auto-encoder with LSTM for acoustic novelty detection. They computed the reconstruction error of the auto-encoder, and a data instance whose error is above the threshold is labeled as a novel event. The design of [18] is also used in [19], where LSTM-RNNs are employed to predict short-term frames.

3 Proposed Model 3.1 Auto-encoder Preliminaries Deep learning approaches learn multiple layers of nonlinear transformations from input to output representations. Such models rank higher in feature extraction than traditional models. As discussed in the sections above, DL approaches are able to capture more abstract features at higher layers; representative DL models such as stacked auto-encoders (SAE), convolutional neural networks (CNN), and deep belief networks (DBN) have reportedly achieved great success in object tracking, event recognition, image classification, computer vision, pattern recognition, etc. However, in comparison with these DL models, auto-encoders can directly learn the feature mapping function by minimizing the reconstruction error between the input and its reconstruction. According to LeCun (1987), some probabilistic approaches can be described in terms of latent variables whose posterior is interpreted as a representation. The auto-encoder framework falls into this category; it starts by explicitly defining a feature-extracting function in a specific parameterized closed form. This function is called the encoder and is denoted $f_\theta$; it allows efficient computation of a feature vector $h = f_\theta(x)$ from an input $x$. For every data instance $x^{(t)}$ from a data set $\{x^{(1)}, \ldots, x^{(T)}\}$, we define $h^{(t)} = f_\theta(x^{(t)})$. Here, $h^{(t)}$ is the representation or code computed from $x^{(t)}$. A second parameterized function, the decoder $g_\theta$, maps from feature space back to input space and produces the reconstruction $r = g_\theta(h)$. Auto-encoders are parameterized through their encoder and decoder, which can be trained using different training principles.
The parameters $\theta$ of the encoder and decoder are learned jointly on the task of reconstructing the original input as well as possible, i.e., attempting to obtain the lowest reconstruction error $L(x, r)$, a measure of the discrepancy between $x$ and its reconstruction $r$, averaged over a training set. To minimize the reconstruction error while capturing the structure of the data-generating distribution, it is essential that something in the training criterion or the parameterization prevents the AE from learning the identity function, which would yield zero reconstruction error. For this purpose, numerous regularization terms have been proposed for auto-encoders. Sparse auto-encoders penalize the hidden-unit activations with an L1 penalty or a Kullback–Leibler (KL) divergence; denoising auto-encoders (DAE)


are robust to small random perturbations. Contractive auto-encoders (CAE) reduce the number of effective degrees of freedom of the representation by adding an analytic contractive penalty. Both DAE and CAE are robust to small changes of the inputs around the training examples.
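The encoder, decoder, and reconstruction error introduced above can be sketched as follows. This is a minimal illustration, assuming a single hidden layer with sigmoid activations and untrained random weights; the function names, the mean-squared-error loss, and the 54-20 layer sizes are assumptions chosen for the example, not the configuration reported by the authors.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def encode(x, W, b):
    """h = f_theta(x): map inputs to the hidden code."""
    return sigmoid(x @ W + b)

def decode(h, W_out, b_out):
    """r = g_theta(h): map the code back to input space."""
    return sigmoid(h @ W_out + b_out)

def reconstruction_error(x, r):
    """L(x, r): mean squared deviation per sample (an assumed choice of L)."""
    return np.mean((x - r) ** 2, axis=1)

rng = np.random.default_rng(0)
n_inputs, n_hidden = 54, 20            # illustrative sizes (assumption)
W = rng.normal(0.0, 0.1, (n_inputs, n_hidden))
b = np.zeros(n_hidden)
W_out = rng.normal(0.0, 0.1, (n_hidden, n_inputs))
b_out = np.zeros(n_inputs)

x = rng.random((8, n_inputs))          # 8 synthetic samples scaled to [0, 1]
h = encode(x, W, b)                    # hidden representation
r = decode(h, W_out, b_out)            # reconstruction
err = reconstruction_error(x, r)       # one error value per sample
```

Training would adjust the weights to lower `err` on normal data; a regularizer such as a sparsity, denoising, or contractive term is what keeps the network from simply copying its input.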

3.1.1 Smooth Auto-encoder

In comparison with other auto-encoders, the smooth auto-encoder (SmAE) is quite different; it learns robust nonlinear feature representations. For every input, SmAE aims to reconstruct its target neighbors instead of reconstructing the input itself, as traditional auto-encoder variants do. The objective function of SmAE is defined as:

$$J_{\mathrm{SmAE}}(\theta) = \sum_{i=1}^{n} \sum_{j=1}^{k} w(x_j, x_i)\, L\big(x_j, g(f(x_i))\big) + \beta \sum_{j=1}^{d_h} \mathrm{KL}(\rho \,\|\, \rho_j)$$

Here, $w(\cdot,\cdot)$ is the weight function, defined through a smoothing kernel as $w(x_j, x_i) = \frac{1}{Z} K(d(x_j, x_i))$, where $Z$ is a normalization term that guarantees $\sum_{j=1}^{k} w(x_j, x_i) = 1$ for every $i$; $k$ is the number of target neighbors of $x_i$; and $d(\cdot,\cdot)$ is a distance function that measures the neighborhood relationship in the original space. The first term of the objective encourages neighboring input examples to have similar representations. As a result, the learned features are not only robust to local variations but also smooth with respect to the input examples on various datasets. The second term regularizes the model complexity through KL sparsity. Depending on the application, different kernels can be applied to map nonlinearly separable data instances to higher dimensions. Here, we apply the radial basis function (RBF) kernel since it has lower computational complexity, which helps to increase the network lifetime. Another reason for choosing the RBF kernel is that its parameters gamma ($\gamma$) and cost ($c$) play a key role. Likewise, various distance measures can be adopted based on metric learning; here, we employ the Mahalanobis distance because it uses group means and variances for every variable, which resolves correlation issues. Since the notion of a target neighbor varies across training data, we select the k nearest neighbors (kNN) based on the Mahalanobis distance. The kNN are taken as the k target neighbors, and the corresponding distances are used to compute the weight assignment. In this article, the designed model relies on the reconstruction error that SmAE incurs when reconstructing an input that was not seen during training. With the cross-entropy loss, the weighted reconstruction error of SmAE for sample $x_i$ can be simplified to

$$-\Big(\sum_{j=1}^{k} w(x_j, x_i)\, x_j\Big) \cdot \log\big(g(f(x_i))\big) - \Big(1 - \sum_{j=1}^{k} w(x_j, x_i)\, x_j\Big) \cdot \log\big(1 - g(f(x_i))\big).$$

The objective function of SmAE can then be written as:

$$J_{\mathrm{SmAE}}(\theta) = \sum_{i=1}^{n} L_{ce}\Big(\sum_{j=1}^{k} w(x_j, x_i)\, x_j,\; g(f(x_i))\Big) + \beta \sum_{j=1}^{d_h} \mathrm{KL}(\rho \,\|\, \rho_j)$$
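The neighbor weighting described above can be sketched in a few lines. This is a hedged illustration, not the authors' implementation: the function names, the choice to exclude a sample from its own neighbor set, the small covariance regularizer, and the values of `k` and `gamma` are all assumptions made for the example.

```python
import numpy as np

def mahalanobis_sq(X, x_i, VI):
    """Squared Mahalanobis distance from x_i to every row of X."""
    diff = X - x_i
    return np.einsum("nd,de,ne->n", diff, VI, diff)

def smooth_targets(X, k=3, gamma=1.0):
    """For every x_i, return the normalised weights over its k target
    neighbours and the weighted neighbour average used as target."""
    n, d = X.shape
    cov = np.cov(X, rowvar=False) + 1e-6 * np.eye(d)  # regularised (assumption)
    VI = np.linalg.inv(cov)
    weights = np.zeros((n, n))
    for i in range(n):
        d2 = mahalanobis_sq(X, X[i], VI)
        d2[i] = np.inf                       # exclude the sample itself (assumption)
        nn = np.argsort(d2)[:k]              # k nearest (target) neighbours
        # RBF smoothing kernel; subtracting the minimum only improves
        # numerical stability and cancels out after normalisation.
        kern = np.exp(-gamma * (d2[nn] - d2[nn].min()))
        weights[i, nn] = kern / kern.sum()   # division by Z: weights sum to 1
    return weights, weights @ X

def weighted_cross_entropy(target, recon, eps=1e-12):
    """Cross-entropy between the weighted target and the reconstruction."""
    recon = np.clip(recon, eps, 1.0 - eps)
    return -np.sum(target * np.log(recon)
                   + (1.0 - target) * np.log(1.0 - recon), axis=1)

rng = np.random.default_rng(0)
X = rng.random((20, 3))                      # synthetic data scaled to [0, 1]
W_nb, targets = smooth_targets(X, k=3, gamma=0.5)
loss = weighted_cross_entropy(targets, np.full_like(targets, 0.5))
```

Each row of `W_nb` holds the weights $w(x_j, x_i)$ for one sample, and `targets` holds the weighted sums $\sum_j w(x_j, x_i) x_j$ that the decoder output is compared against.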


3.2 BLSTM-RNN Preliminaries In the past few years, the long short-term memory recurrent neural network (LSTM-RNN) has been applied to model the relationship between current and preceding events, and it handles time series problems efficiently. In general, an LSTM-RNN trained only on normal data is also able to predict several time steps ahead of an input. Most of the methods in the related work estimate outliers at the individual level, not at the collective level; moreover, both normal and outlier data are used in their training phase. An LSTM network consists of an input layer, an LSTM hidden layer, and an output layer. An input node takes the input data, and the output can be any transfer function (sigmoid, tanh, etc.). The LSTM hidden layer is formed from a number of memory cells that are fully connected to the input and output nodes. Gradient descent and back-propagation are well-known techniques used to optimize its loss function and update its parameters. As discussed above, LSTM is able to incorporate behavior into a network by training it with normal data, so the network becomes representative of the variations in the data. In detail, a prediction is made from two characteristics: first, the value of a sample and, second, its position at a certain time. This means that two identical input values at different times may result in two different outputs. The reason is that the LSTM-RNN is stateful: it has a memory that changes in response to inputs. We therefore designed a new collective outlier detection technique based on LSTM with a bidirectional RNN.
Here, the LSTM-RNN is used to model the correlation between preceding and current time steps and to estimate an outlier score for every time step, which supports time series outlier detection. In addition, bidirectional RNNs are used to access context from both temporal directions. This is done by processing the input data in both directions with two separate hidden layers and then feeding them to the output layer. Combining bidirectional RNNs with LSTM memory blocks leads to the bidirectional LSTM network, in which context from both temporal directions is exploited. This helps in developing collective outlier detection based on sequences of single data points and their outlier scores. We train an LSTM-RNN on normal data to learn normal behavior. The trained model is then run on normal validation sets to estimate the model parameters. The resulting classifier is used to compute the outlier score of a particular data instance at every time step. The outlier score of a sequence of time steps is aggregated from the contribution of each individual step. With a fixed threshold, a sequence of single time steps is flagged as a collective outlier if its outlier score is higher than the threshold. For better accuracy, we ran a number of preliminary evaluations to find the best network with varying numbers and sizes of hidden layers. The best network layout for the RNNs has three hidden layers with 156-256-156 LSTM units, and the best layout for the BRNNs has six hidden layers, three for each direction


with 216 LSTM units each. Network weights are iteratively updated with standard gradient descent through back-propagation of the sum of squared errors (SSE). The gradient descent technique requires the network weights to be initialized with nonzero values; therefore, we initialize the weights from a random Gaussian distribution with mean 0 and standard deviation 0.1.

Threshold value In the designed model, both the input and output layers of the network contain 54 units. The trained auto-encoder reconstructs every sample, and novel events are detected by processing the reconstruction error with an adaptive threshold. For every time step, the Euclidean distance between each input value and the corresponding network output is calculated. The distances are summed and divided by the number of coefficients, so that the reconstruction error of every time step is represented by a single value. For optimal event detection, a threshold $\theta_{th}$ is applied to obtain a binary signal. The threshold is proportional to the median of the error signal $e_0$ of a sequence through a multiplicative coefficient $\beta$, restricted to the range from $\beta_{min} = 1$ to $\beta_{max} = 2$:

$$\theta_{th} = \beta \cdot \mathrm{median}(e_0) \qquad (1)$$
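Equation (1) and the per-step reconstruction error can be illustrated with a short sketch. The function names and the toy numbers are assumptions for illustration only; the per-coefficient distance is taken as the absolute difference, which is what the Euclidean distance reduces to for scalar values.

```python
import numpy as np

def step_error(x, r):
    """Per-time-step reconstruction error: distances between each input
    coefficient and its output, summed and divided by the number of
    coefficients. Rows are time steps, columns are coefficients."""
    return np.mean(np.abs(x - r), axis=1)

def adaptive_threshold(e0, beta=1.5):
    """theta_th = beta * median(e0), with beta restricted to [1, 2]."""
    if not 1.0 <= beta <= 2.0:
        raise ValueError("beta must lie between 1 and 2")
    return beta * np.median(e0)

def binary_signal(e0, beta=1.5):
    """True where the error of a time step exceeds the threshold."""
    return e0 > adaptive_threshold(e0, beta)

# Toy inputs and reconstructions: two time steps, two coefficients.
x = np.array([[0.0, 1.0], [1.0, 1.0]])
r = np.array([[0.0, 0.0], [1.0, 1.0]])
toy_err = step_error(x, r)                  # array([0.5, 0.0])

# Toy error signal of one sequence: the fourth step stands out.
e0 = np.array([0.10, 0.12, 0.11, 0.50, 0.09])
theta = adaptive_threshold(e0, beta=1.5)    # 1.5 * 0.11 = 0.165
flags = binary_signal(e0, beta=1.5)
```

Because the threshold scales with the median of the sequence, a sequence that is noisy overall raises its own threshold, which is what makes the rule adaptive.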

4 Experimental Results For the experiments, we consider a benchmark data set collected from a WSN deployed at the Intel Berkeley Research Laboratory (IBRL). The data were gathered with the TinyDB in-network query processing system, which is built on the TinyOS platform. The deployed WSN comprises 54 Mica2Dot sensor nodes placed in the IBRL for 30 days, nearly 720 h. The sensors collect five-dimensional data (voltage in volts, light in lux, temperature in degrees Celsius, humidity ranging from 0 to 100%, and network topology position) every 30 s. In the IBRL setup, node 0 is the starting node, and the remaining nodes transmit data to node 0 over several hops. The farthest nodes deliver the sensed data over at most 10 hops. Over 720 h, these 54 nodes collected almost 2.3 million readings. To evaluate the proposed model, we prepared a testing set because the original environmental data did not contain any labels indicating which data are normal and which are outliers. We chose three dimensions: humidity, temperature, and voltage. We used k-fold cross-validation to reduce the samples to half the size. Each of these dimensions has 5000 samples for training, 1000 samples for validation, and 2000 samples for testing. In our technique, we apply unsupervised target neighbors to define the weight function; furthermore, the network is fine-tuned with the RBF kernel. The hyper-parameters such as layer size, sparsity penalty, and kernel bandwidth were selected on the validation set. In Table 1, we report the model accuracy along


Table 1 Accuracy and error rate of SmAE compared with existing methods
Method       AE      DAE     CAE     SmAE    AE-2    DAE-2   CAE-2   SmAE-2
Accuracy     95.24   94.68   92.49   96.17   97.12   96.46   97.98   99.26
Error rate   1.98    1.58    1.46    1.18    1.64    1.15    1.10    0.82

with its error rates; moreover, we compare the proposed model with the existing AE, DAE, and CAE as well as AE-2, DAE-2, CAE-2, and SmAE-2 (here, the suffix 2 indicates the version built by stacking two hidden layers). The results show that SmAE has high accuracy as well as a low error rate. Two labels, normal and outlier, are prepared, and this data set holds nearly 5000 normal and 400 abnormal samples. We employed k-fold cross-validation to reduce the samples to half the size. After various tests, we settled on the best network model, trained with a momentum of 0.9, learning rates l from 1e−3 to 1e−7, and noise sigma values σ = {0.25, 0.5}. The best network topologies are 54–20–54, 54–54–54, and 54–128–54, so we kept the same network setup for every test to allow a fair comparison. Each network topology is trained and evaluated every 50 epochs. Table 2 gives the overall evaluation of the proposed method against other available up-to-date techniques; the proposed method shows the best results in terms of precision, recall, and F-measure, up to 96.89, 94.43, and 95.90, with an input noise standard deviation of 0.5 (see Table 3). We conducted numerous experiments with several network layouts for each network type; however, we report only the results of the best layouts. With input noise deviations of 0.1 and 0.25, both BLSTM-AE and LSTM-SmAE produce higher precision values of nearly 91.89, 93.46, 92.24, and 94.64. Overall, the results show that combining the smooth auto-encoder with the BLSTM is valuable, and a significant performance improvement over state-of-the-art techniques was achieved.

Table 2 Performance evaluation of the designed method with various network layouts and existing methods
Methods: LSTM-AE, BLSTM-AE, LSTM-DAE, BLSTM-DAE, LSTM-CAE, BLSTM-CAE, LSTM-SmAE, BLSTM-SmAE
Network layouts: 54–54–54, 54–128–54
Precision       Recall          F-measure
89.1, 90.24     86.90, 88.29    85.24, 86.32
92.23, 92.85    91.63, 91.45    93.48, 92.69
94.41, 94.90    92.81, 93.14    93.43, 94.17
96.77, 96.89    93.32, 94.43    95.73, 95.90

Table 3 Comparison of the precision of the designed method with existing models
[Bar chart; vertical axis: Precision, 84 to 98]

For collective outlier detection, we observe the prediction errors of successive data points. For this, we compute three quantities: the relative error, the prediction error threshold, and the collective range. The relative error (RE) is the error between the real value $x$ and its predicted value $\hat{x}$ from the BLSTM-RNN at each time step, $RE(x, \hat{x}) = |x - \hat{x}|$. The prediction error threshold (PET) decides whether a particular time stamp value is considered normal or a point of a possible collective outlier: if the RE is larger than the PET, the point is marked as a point of a collective outlier. Finally, the collective range identifies collective outliers based on the minimum number of outliers that appear in succession in a network flow.
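The collective outlier rule described above can be sketched as follows: points whose relative error exceeds the prediction error threshold are marked, and a run of marked points counts as a collective outlier once it reaches a minimum length. The function names, the parameter `min_run` (standing in for the collective range), and the toy values are assumptions for illustration; the predictions would come from the trained BLSTM-RNN, which is not reproduced here.

```python
import numpy as np

def relative_error(actual, predicted):
    """RE = |x - x_hat| at every time step."""
    return np.abs(np.asarray(actual, dtype=float)
                  - np.asarray(predicted, dtype=float))

def collective_outliers(actual, predicted, pet, min_run):
    """Return (start, end) index pairs, inclusive, of runs of at least
    min_run consecutive time steps whose RE exceeds pet."""
    flagged = relative_error(actual, predicted) > pet
    runs, start = [], None
    for t, is_out in enumerate(flagged):
        if is_out and start is None:
            start = t                       # a new run begins
        elif not is_out and start is not None:
            if t - start >= min_run:        # run long enough to be collective
                runs.append((start, t - 1))
            start = None
    if start is not None and len(flagged) - start >= min_run:
        runs.append((start, len(flagged) - 1))
    return runs

# Toy series: the model predicts 1.0 throughout; steps 2 to 4 deviate together.
actual = [1.0, 1.1, 3.0, 3.2, 3.1, 1.0, 2.9, 1.0]
predicted = [1.0] * len(actual)
found = collective_outliers(actual, predicted, pet=0.5, min_run=3)
single = collective_outliers(actual, predicted, pet=0.5, min_run=1)
```

With `min_run=3` only the run over steps 2 to 4 is reported; the isolated deviation at step 6 is treated as a point outlier and appears only when `min_run` is lowered to 1.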

5 Conclusion Outlier detection in WSNs is a challenging task, and researchers are continuously working on it to obtain better results. In this article, we designed a model for outlier detection that employs a smooth auto-encoder-based LSTM-bidirectional RNN. We were motivated to explore SmAE for its robust ability to learn target neighbor representations and LSTM-RNN for its handling of time series problems, and we try


to combine both techniques to detect collective outliers. The designed model is evaluated on the benchmark IBRL dataset. Experimental analysis shows that the proposed method has higher accuracy and recall than existing techniques.

References
1. Krizhevsky A, Sutskever I, Hinton GE (2012) Imagenet classification with deep convolutional neural networks. In: NIPS
2. Ji S, Xu W, Yang M, Yu K (2013) 3d convolutional neural networks for human action recognition. IEEE Trans Pattern Anal Mach Intell 35:221–231
3. Wang N, Yeung DY (2013) Learning a deep compact image representation for visual tracking. In: NIPS
4. Ngiam J, Coates A, Lahiri A, Prochnow B, Le QV, Ng AY (2011) On optimization methods for deep learning. In: ICML
5. Ranzato M, Boureau YL, LeCun Y (2007) Sparse feature learning for deep belief networks. In: NIPS
6. Xie J, Xu L, Chen E (2012) Image denoising and inpainting with deep neural networks. In: NIPS
7. Feng Q, Zhang Y, Li C, Dou Z, Wang J (2016) Anomaly detection of spectrum in wireless communication via deep auto-encoders. J Supercomput. https://doi.org/10.1007/s11227-017-2017-7
8. Cauteruccio F, Fortino G, Guerrieri A, Liotta A, Mocanu DC, Perra C, Terracina G, Vega MT (2019) Short-long term anomaly detection in wireless sensor networks based on machine learning and multi-parameterized edit distance. Inf Fus 52:13–30
9. Huan Z, Wei C, Li G-H (2018) Outlier detection in wireless sensor networks using model selection based support vector data description. Sensors 18
10. Thi NN, Cao VL, Le-Khac N-A (2016) One-class collective anomaly detection based on LSTM–RNNs. IEEE
11. Marchi E, Vesperini F, Weninger F, Squartini FES (2015) Non-linear prediction with LSTM recurrent neural networks for acoustic novelty detection. IEEE
12. Zhu J, Ming Y, Song Y, Wang S (2017) Mechanism of situation element acquisition based on deep auto-encoder network in wireless sensor networks. Int J Distrib Sensor Netw 13(3)
13. Vincent P, Larochelle H, Bengio Y, Manzagol PA (2008) Extracting and composing robust features with denoising autoencoders. In: ICML
14. Vincent P, Larochelle H, Lajoie I, Bengio Y, Manzagol PA (2010) Stacked denoising autoencoders: learning useful representation in a deep network with a local denoising criterion. J Mach Learn Res 11:3371–3408
15. Oliver O (2013) Distributed fault detection in sensor networks using a recurrent neural network. Neural Process Lett. https://doi.org/10.1007/s11063-013-9327-4
16. Cheng P, Zhu M (2015) Lightweight anomaly detection for wireless sensor networks. Int J Distrib Sensor Netw 2015. Article ID 653232
17. Malhotra P, Vig L, Shroff G, Agarwal P (2015) Long short term memory networks for anomaly detection in time series. In: Proceedings. Presses universitaires de Louvain, p 89
18. Marchi E, Vesperini F, Eyben F, Squartini S, Schuller B (2015) A novel approach for automatic acoustic novelty detection using a denoising autoencoder with bidirectional LSTM neural networks. In: 2015 IEEE international conference on acoustics, speech and signal processing (ICASSP). IEEE, pp 1996–2000
19. Marchi E, Vesperini F, Weninger F, Eyben F, Squartini S, Schuller B (2015) Non-linear prediction with LSTM recurrent neural networks for acoustic novelty detection. In: 2015 International joint conference on neural networks (IJCNN). IEEE, pp 1–7

An Improved Swarm Optimization Algorithm-Based Harmonics Estimation and Optimal Switching Angle Identification M. Alekhya, S. Ramyaka, N. Sambasiva Rao, and Ch. Durga Prasad

Abstract In this paper, harmonic parameters are estimated using an improved particle swarm optimization (IPSO) algorithm, and the concept is extended to identify the correct switching angles of inverters so as to minimize the total harmonic content. Initially, a power system voltage signal with multiple harmonic components is considered in the presence of noise, and parameters such as amplitude (A) and phase angle (ϕ) are estimated using conventional PSO and IPSO. Later, an objective function is framed for the voltage of a cascaded H-bridge inverter to identify the precise switching angles that reduce the overall harmonic content. Comparisons show the effectiveness of IPSO in identifying optimal solutions in both cases. Keywords PSO · Harmonics · Optimal switching · Inertia weight

M. Alekhya (B) · S. Ramyaka · N. Sambasiva Rao, Department of Electrical and Electronics Engineering, NRI Institute of Technology, Vijayawada, India
Ch. Durga Prasad, Department of Electrical and Electronics Engineering, SRKR Engineering College, Bhimavaram, India
© Springer Nature Singapore Pte Ltd. 2021. E. S. Gopi (ed.), Machine Learning, Deep Learning and Computational Intelligence for Wireless Communication, Lecture Notes in Electrical Engineering 749, https://doi.org/10.1007/978-981-16-0289-4_9

1 Introduction
Structural changes in the integrated power system, such as renewable energy resources, converters, and inverters, together with highly nonlinear loads, inject harmonics and lead to poor quality of electrical power [1]. These injected harmonics need to be estimated and mitigated with proper solutions, since they adversely affect the regular functions of relays and other devices. The estimation of harmonics in power system signals and the identification of optimal switching angles of inverters, which minimize the harmonics injected during DC to AC conversion, are achieved more effectively by intelligent optimization techniques than by conventional approaches. Some of the approaches available in the literature are as follows. Utilizing the structural properties of a voltage signal injected with harmonics and noise, a genetic algorithm (GA) was applied together with the least squares technique in [3] to estimate the nonlinear parameters. Since the convergence rate of GA is slow, PSO was applied in 2008 [4] to the same estimation problem, following a process similar to that in [3]. Compared to GA, PSO yielded a better fitness value and estimates closer to the actual values, which opened a line of research on the application of intelligent optimization algorithms. Later, an improved version of PSO was applied to this harmonic estimation problem to obtain more accurate results with a fast convergence rate [5]. However, this improved PSO is structurally more complex than PSO, and its computational time is also large. An artificial bee colony (ABC) algorithm hybridized with least squares was applied in [6], in line with the earlier articles, to obtain more accurate results in the presence of noise. The objective of the aforementioned stochastic, population-based search algorithms is to estimate the amplitude and phase of distorted signals with fast convergence and high accuracy. Other optimization algorithms applied in the same domain are available in [7–10]. The harmonics injected by power electronic devices are minimized by the optimal switching concept. Few works are available in the literature on identifying the optimal switching times of various inverters. Several optimization techniques have been applied to identify the switching patterns [11].
In this paper, a simple improved version of PSO is used for harmonics estimation of distorted voltage signals and, further, for identification of the optimal switching times that reduce the harmonic content. This improved PSO produces global optimal values and reduces the additional burden of selecting control variables. It also provides highly accurate and fast-converging results compared to PSO.

2 Harmonic Estimation and Switching Angles Identification
The voltage signal with fundamental frequency $f_0$, multiple harmonics ($\omega_h = h \cdot 2\pi f_0$) and noise $\mu(t)$ is expressed in the time domain as

$$v(t) = \sum_{h=1}^{N} A_h \sin(\omega_h t + \phi_h) + \mu(t) \tag{1}$$

In Eq. (1), $N$ is the total number of harmonics. The discrete form of Eq. (1) with sampling period $T_s$, used for computing the errors, is given by

$$v(k) = \sum_{h=1}^{N} A_h \sin(\omega_h k T_s + \phi_h) + \mu(k) \tag{2}$$



Let the estimated amplitude and phase parameters be $\hat{A}_h$ and $\hat{\phi}_h$, respectively. The distorted signal with estimated parameters is represented as

$$\hat{v}(k) = \sum_{h=1}^{N} \hat{A}_h \sin(\omega_h k T_s + \hat{\phi}_h) \tag{3}$$

Once the actual and estimated signals are available, an objective function is framed from the error between them; it attains its minimum value when the estimated signal closely matches the actual signal. Therefore, the first objective function, used for harmonic component estimation [3–6], is given by

$$J_1 = \min \sum_{k=1}^{K} \left( v(k) - \hat{v}(k) \right)^2 \tag{4}$$

where $\hat{v}(k)$ is the estimated signal of Eq. (3) and the sum runs over the $K$ available samples.
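The error-based objective of Eq. (4) can be illustrated with a minimal sketch. The function names, the 50 Hz fundamental, the 1 ms sampling period and the two-harmonic signal below are all illustrative assumptions and are not taken from the paper; noise is omitted so that a perfect estimate gives exactly zero error.

```python
import math

def harmonic_signal(amps, phases, f0, ts, n_samples):
    """Noise-free samples of sum_h A_h*sin(h*2*pi*f0*k*Ts + phi_h) for k = 1..n_samples."""
    return [
        sum(a * math.sin(h * 2 * math.pi * f0 * k * ts + p)
            for h, (a, p) in enumerate(zip(amps, phases), start=1))
        for k in range(1, n_samples + 1)
    ]

def j1(measured, amps_hat, phases_hat, f0, ts):
    """Sum of squared errors between measured and estimated samples, as in Eq. (4)."""
    estimated = harmonic_signal(amps_hat, phases_hat, f0, ts, len(measured))
    return sum((v - v_hat) ** 2 for v, v_hat in zip(measured, estimated))

# Hypothetical two-harmonic signal (phases in radians); every value is illustrative.
F0, TS, K = 50.0, 1e-3, 40
true_amps, true_phases = [1.0, 0.3], [0.5, 1.0]
measured = harmonic_signal(true_amps, true_phases, F0, TS, K)

j1_exact = j1(measured, true_amps, true_phases, F0, TS)  # perfect estimate
j1_off = j1(measured, [0.9, 0.3], true_phases, F0, TS)   # fundamental amplitude off by 0.1
```

An optimizer such as PSO would call a function like `j1` once per candidate parameter vector and keep the candidate with the smallest value.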

Later, these harmonics, generated by signal conversion activities and nonlinear loads, are minimized by identifying suitable filter parameters and/or switching angle patterns. In this case, a second objective function is framed for a cascaded H-bridge inverter to find the optimal switching angles so that the output signal contains only non-dominant harmonic content [11]:

$$J_2 = \min_{(\delta_1, \delta_2, \delta_3)} \left[ \left( 100 \cdot \frac{V_1^{*} - V_1}{V_1^{*}} \right)^{4} + \frac{1}{5} \left( 50 \cdot \frac{V_5}{V_1} \right)^{2} + \frac{1}{7} \left( 50 \cdot \frac{V_7}{V_1} \right)^{2} \right] \tag{5}$$

In Eq. (5), $V_1$, $V_5$ and $V_7$ are the harmonic components whose expressions are available in [R]. At the optimal switching angles $\delta_1$, $\delta_2$, $\delta_3$, the objective function attains its minimum.

3 Improved Particle Swarm Optimization Algorithm
Among the aforementioned intelligent optimization techniques, the PSO algorithm is simple and produces global optimal function values with a good convergence rate. However, the selection of control parameters plays a key role in the search process [12–15]. The conventional PSO operates with constant control parameters and suffers from premature convergence. Several variants were later proposed, but these are more complex than the parent PSO. Therefore, in this paper the velocity equation of the particles is readjusted with damped quantities, as shown in Eq. (6). The position vector is then updated automatically, as shown in Eq. (7).


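$$v_n^{i+1} = \omega \omega_d v_n^{i} + c_1 c_d r_1 \left( \mathrm{pbest}^{i} - p_n^{i} \right) + c_2 c_d r_2 \left( \mathrm{gbest}^{i} - p_n^{i} \right) \tag{6}$$

$$p_n^{i+1} = p_n^{i} + v_n^{i+1} \tag{7}$$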

All the terms in Eqs. (6) and (7) are the same as in PSO, while $\omega_d$ and $c_d$ are the damping values inserted for the respective control parameters. The improvements in the results for both estimation and mitigation with the proposed method are presented in the subsequent sections through comparison with the parent PSO.
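A single update of Eqs. (6) and (7) can be sketched as follows. This is only a literal transcription of the two equations: the function name `ipso_step`, the use of one personal best per particle, and the numerical values are assumptions, and the sketch deliberately does not guess how the damping values are scheduled over the iterations, since that is not specified here.

```python
import random

def ipso_step(positions, velocities, pbest, gbest,
              omega, c1, c2, omega_d, c_d, rng=random):
    """One velocity/position update per Eqs. (6) and (7) for every particle.

    positions, velocities, pbest: lists of per-particle coordinate lists.
    gbest: coordinate list of the best position found by the swarm.
    omega_d, c_d: damping values multiplying the inertia and acceleration terms.
    """
    new_positions, new_velocities = [], []
    for p_n, v_n, pb_n in zip(positions, velocities, pbest):
        v_next, p_next = [], []
        for p, v, pb, gb in zip(p_n, v_n, pb_n, gbest):
            r1, r2 = rng.random(), rng.random()
            v_new = (omega * omega_d * v
                     + c1 * c_d * r1 * (pb - p)
                     + c2 * c_d * r2 * (gb - p))      # Eq. (6)
            v_next.append(v_new)
            p_next.append(p + v_new)                  # Eq. (7)
        new_velocities.append(v_next)
        new_positions.append(p_next)
    return new_positions, new_velocities

# Hypothetical one-particle example: the particle already sits on both best positions,
# so only the damped inertia term omega * omega_d * v contributes to the new velocity.
positions = [[1.0, 2.0]]
velocities = [[0.5, -1.0]]
new_p, new_v = ipso_step(positions, velocities, pbest=positions, gbest=positions[0],
                         omega=0.9, c1=2.0, c2=2.0, omega_d=0.5, c_d=0.5,
                         rng=random.Random(0))
```

In a full optimizer this step would sit inside a loop that re-evaluates the objective, updates the personal and global bests, and applies whatever damping schedule is chosen.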

4 Simulation Results
Initially, a harmonics-injected voltage signal corrupted with noise and containing a decaying DC (zero-frequency) component is considered, and its harmonic components are estimated by both PSO and IPSO. This test signal, which consists of the fundamental and the 3rd, 5th, 7th and 11th harmonics, is generated in MATLAB. The mathematical expression of the signal, written in the form of Eq. (1), is given by

$$x(t) = 1.5 \sin(\omega t + 80^\circ) + 0.5 \sin(3\omega t + 60^\circ) + 0.2 \sin(5\omega t + 45^\circ) + 0.15 \sin(7\omega t + 36^\circ) + 0.1 \sin(11\omega t + 30^\circ) + 0.5 \exp(-5t)$$

In the estimation problem, additional noise is also included. First, the harmonic components along with the decaying DC component are estimated using conventional PSO with a constant inertia weight strategy. Several values of the inertia weight are considered for this purpose, since there is no specific procedure for selecting this control parameter. Later, IPSO is applied to the same problem starting from the worst inertia weight values and still obtains the global best values. All these results are reported in Table 1.

Table 1 Optimal drift parameter values for single line to ground faults

PSO case   Parameter   1st      3rd      5th        7th      11th     Zero
ω = 0.9    A           1.4084   0.4906   0.1569     0.0010   0.0612   0.7666
           ϕ           80.685   63.579   13.307     32.076   71.326   –
ω = 0.8    A           1.4994   0.5006   0.1986     0.1490   0.0512   0.5116
           ϕ           79.966   60.039   44.0.790   35.967   90.238   –
ω = 0.7    A           1.5002   0.5004   0.1998     0.1501   0.0999   0.5108
           ϕ           79.993   59.911   44.809     35.732   30.199   –
ω = 0.6    A           1.5000   0.4997   0.2000     0.1502   0.0999   0.5096
           ϕ           80.017   60.003   44.938     36.174   29.682   –
ω = 0.3    A           1.4980   0.4515   0.1832     0.1356   0.0764   0.5053
           ϕ           79.865   34.239   66.939     64.989   69.663   –
Proposed   A           1.4993   0.5008   0.2003     0.1501   0.1006   0.5105
           ϕ           79.995   60.025   45.326     35.918   29.844   –
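As a worked reading of Table 1, the short sketch below compares some of the estimated amplitudes with the actual amplitudes of the test signal x(t). The dictionary keys and the choice of the largest absolute deviation as the summary measure are illustrative assumptions; the numbers themselves are copied from the table and from the expression for x(t).

```python
# Actual amplitudes of the 1st, 3rd, 5th, 7th and 11th harmonics in x(t).
actual_amp = [1.5, 0.5, 0.2, 0.15, 0.1]

# Estimated amplitudes (row "A" of Table 1) for a few of the cases.
estimated_amp = {
    "PSO, w=0.9": [1.4084, 0.4906, 0.1569, 0.0010, 0.0612],
    "PSO, w=0.6": [1.5000, 0.4997, 0.2000, 0.1502, 0.0999],
    "PSO, w=0.3": [1.4980, 0.4515, 0.1832, 0.1356, 0.0764],
    "Proposed":   [1.4993, 0.5008, 0.2003, 0.1501, 0.1006],
}

def max_abs_error(estimate, actual):
    """Largest absolute amplitude deviation over the harmonics."""
    return max(abs(e - a) for e, a in zip(estimate, actual))

worst_error = {case: max_abs_error(est, actual_amp)
               for case, est in estimated_amp.items()}
```

With these figures the largest amplitude deviation is 0.149 for ω = 0.9 (7th harmonic), 0.0485 for ω = 0.3 and 0.0008 for the proposed IPSO, which is of the same order as the best hand-tuned PSO case (ω = 0.6).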

Fig. 1 Fitness function J1 values for different inertia weights (bar chart of the final fitness values for the PSO cases ω = 0.9, 0.8, 0.7, 0.6 and 0.3 and for the proposed IPSO)

Table 2 Optimal drift parameter values for single line to ground faults

Method           δ1       δ2       δ3
Reference [11]   33.498   54.759   67.103
Proposed         33.506   54.757   67.110

For all PSO runs at different inertia weights, the values of the fitness function at the end of the final iteration are plotted in Fig. 1. From Fig. 1, it is observed that the proposed dynamic control parameter concept reduces the burden of selecting control parameters when searching for global optimal solutions. The same strategy is applied to identify the optimal switching angles in order to minimize the total harmonic distortion (THD). For this purpose, Eq. (5) is considered, and the results at a modulation index (m) of 0.6 are reported in Table 2.
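The objective of Eq. (5) can be evaluated for the angles of Table 2 with a short sketch. The paper refers elsewhere for the expressions of V1, V5 and V7, so the ones used below are assumptions: the standard staircase-waveform expression V_n = (4 Vdc / nπ) Σ cos(n δ_i) for a cascaded H-bridge with equal DC sources, and a reference fundamental V1* = m · 3 · 4 Vdc / π. The function names and the third, arbitrary angle set are hypothetical.

```python
import math

def harmonic_amplitude(n, deltas_deg, vdc=1.0):
    """Assumed n-th harmonic amplitude of a staircase waveform with equal DC sources."""
    return (4.0 * vdc / (n * math.pi)) * sum(
        math.cos(n * math.radians(d)) for d in deltas_deg)

def j2(deltas_deg, m, vdc=1.0):
    """Eq. (5) evaluated for one candidate set of switching angles (in degrees)."""
    s = len(deltas_deg)
    v1_ref = m * s * 4.0 * vdc / math.pi   # assumed definition of the modulation index
    v1 = harmonic_amplitude(1, deltas_deg, vdc)
    v5 = harmonic_amplitude(5, deltas_deg, vdc)
    v7 = harmonic_amplitude(7, deltas_deg, vdc)
    return ((100.0 * (v1_ref - v1) / v1_ref) ** 4
            + (50.0 * v5 / v1) ** 2 / 5.0
            + (50.0 * v7 / v1) ** 2 / 7.0)

j2_reference = j2((33.498, 54.759, 67.103), m=0.6)   # Reference [11] row of Table 2
j2_proposed = j2((33.506, 54.757, 67.110), m=0.6)    # Proposed row of Table 2
j2_arbitrary = j2((10.0, 20.0, 30.0), m=0.6)         # hypothetical non-optimal angles
```

Under these assumptions both angle sets of Table 2 give a value of J2 close to zero, while the arbitrary set gives a very large value, which is the behaviour expected of a minimum of Eq. (5).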

5 Conclusions
In this paper, harmonic components are estimated and optimal switching patterns are identified using an improved PSO algorithm, and the results are compared with those of standard PSO. The comparisons reveal the importance of control variable selection in the original PSO; the simple mechanism adopted in the improved PSO removes the additional burden of this selection. Accurate results with fast convergence are achieved by the proposed technique without increasing the computational burden.

References
1. Harris FJ (1978) On the use of windows for harmonic analysis with the discrete Fourier transform. Proc IEEE 66(1):51–83
2. Ren Z, Wang B (2010) Estimation algorithms of harmonic parameters based on the FFT. In: 2010 Asia-Pacific power and energy engineering conference. IEEE, Mar 2010, pp 1–4
3. Bettayeb M, Qidwai U (2003) A hybrid least squares-GA-based algorithm for harmonic estimation. IEEE Trans Power Deliv 18(2):377–382
4. Lu Z, Ji TY, Tang WH, Wu QH (2008) Optimal harmonic estimation using a particle swarm optimizer. IEEE Trans Power Deliv 23(2):1166–1174
5. Yin YN, Lin WX, Li WL (2010) Estimation amplitude and phase of harmonic based on improved PSO. In: IEEE ICCA 2010. IEEE, June 2010, pp 826–831
6. Biswas S, Chatterjee A, Goswami SK (2013) An artificial bee colony-least square algorithm for solving harmonic estimation problems. Appl Soft Comput 13(5):2343–2355
7. Kabalci Y, Kockanat S, Kabalci E (2018) A modified ABC algorithm approach for power system harmonic estimation problems. Electric Power Syst Res 154:160–173
8. Singh SK, Kumari D, Sinha N, Goswami AK, Sinha N (2017) Gravity search algorithm hybridized recursive least square method for power system harmonic estimation. Eng Sci Technol Int J 20(3):874–884
9. Singh SK, Sinha N, Goswami AK, Sinha N (2016) Power system harmonic estimation using biogeography hybridized recursive least square algorithm. Int J Electr Power Energy Syst 83:219–228
10. Singh SK, Sinha N, Goswami AK, Sinha N (2016) Robust estimation of power system harmonics using a hybrid firefly based recursive least square algorithm. Int J Electr Power Energy Syst 80:287–296
11. Kundu S, Burman AD, Giri SK, Mukherjee S, Banerjee S (2017) Comparative study between different optimization techniques for finding precise switching angle for SHE-PWM of three-phase seven-level cascaded H-bridge inverter. IET Power Electron 11(3):600–609
12. Kennedy J, Eberhart R (1995) Particle swarm optimization. In: Proceedings of ICNN’95: international conference on neural networks, vol 4. IEEE, Nov 1995, pp 1942–1948
13. Nagaraju TV, Prasad CD (2020) Swarm-assisted multiple linear regression models for compression index (Cc) estimation of blended expansive clays. Arabian J Geosci 13(9)
14. Prasad CD, Biswal M, Nayak PK (2019) Wavelet operated single index based fault detection scheme for transmission line protection with swarm intelligent support. Energy Syst 1–20
15. Nagaraju TV, Prasad CD, Raju MJ (2020) Prediction of California bearing ratio using particle swarm optimization. In: Soft computing for problem solving. Springer, Singapore, pp 795–803

A Study on Ensemble Methods for Classification

R. Harine Rajashree and M. Hariharan

Abstract Classification is one of the most common tasks in machine learning; it aims to categorize the input into a set of known labels. Numerous techniques have evolved over time to improve classification performance. Ensemble learning is one such technique, which improves performance by combining a diverse set of learners that work together to provide better stability and accuracy. Ensemble learning is used in various fields, including medical data analysis, sentiment analysis and banking data analysis. This work surveys the techniques used in ensemble learning, covering stacking, boosting and bagging, together with improvements in the field and the challenges addressed by ensemble learning for classification. The motivation is to understand the role of ensemble methods in classification across various fields.

Keywords Machine learning · Ensemble learning · Boosting · Bagging

R. Harine Rajashree (B), Department of CSE, Thiagarajar College of Engineering, Madurai, Tamil Nadu, India
M. Hariharan, Post Graduate Programme in Management, Indian Institute of Management, Tiruchirappalli, Tamil Nadu, India
© Springer Nature Singapore Pte Ltd. 2021. E. S. Gopi (ed.), Machine Learning, Deep Learning and Computational Intelligence for Wireless Communication, Lecture Notes in Electrical Engineering 749, https://doi.org/10.1007/978-981-16-0289-4_10

1 Introduction
Machine learning is one of the ways to achieve artificial intelligence. It focuses on equipping a machine to learn by itself without being explicitly programmed, which in turn leads the way to intelligence. Machine learning is broadly classified into two types, namely supervised learning and unsupervised learning. Classification is a prominent machine learning task that maps a provided input to an output: it finds the class to which an input most likely belongs. It is a supervised learning task, since the data used to train the model that approximates the mapping function are labelled with the correct classes. Figure 1 depicts the types of machine learning along with their applications. In addition to classification, regression is also a supervised learning task, applied in fields such as risk assessment and stock value analysis. Classification is a predictive modelling task in which the class label of an input is predicted. The model is trained on a large amount of data that is already labelled. Classic examples include spam/non-spam mail classification and handwritten character classification. Binomial (binary) and multi-class are the two types of classification. Many popular algorithms are used to perform the classification task. A few well-known ones are
• K-nearest neighbours
• Naive Bayes
• Decision trees
• Random forest

Fig. 1 Types of machine learning

Although the performance of these algorithms is commendable, there is a consistent need to improve it. Ensemble learning is one familiar technique for improving accuracy, and it in turn has many approaches. The idea of ensemble models is to combine numerous weak learners so that they act as a single strong learner. This work analyses the various ensemble methods and their applications in different fields, and presents an experimental analysis of the effect of ensemble learning. The rest of the paper is organized as follows: Sect. 2 reviews the related work, Sect. 3 discusses the various ensemble techniques, Sect. 4 discusses the application of ensemble techniques, Sect. 5 discusses the use of ensemble techniques in deep learning, Sect. 6 provides an experiment, and Sect. 7 concludes the survey.


2 Related Work
Ensemble learning is an evergreen research area in which many studies have been proposed. The work by Sagi and Rokach [1] provides a detailed study covering the advent of ensemble learning and explaining the history of every ensemble technique. The key takeaway from that work is the idea of refining the algorithms to fit big data. It also suggests future work on combining deep learning with ensemble learning. Gomes et al. [2] presented a survey specifically on the use of ensemble learning in the context of data streams. The authors studied over 60 algorithms to provide a taxonomy of ensemble learners in that context. The study concludes with evidence that data stream ensemble learning models help in overcoming challenges such as concept drift and tend to work in real-time scenarios. Dietterich [3] provided a better understanding by addressing questions such as why ensembles work and what the basic methods for constructing ensembles are. The paper explains that algorithms which form a single hypothesis to perform the input-to-output mapping suffer from three main problems, namely
• Statistical problem
• Computational problem
• Representation problem
These problems cause high variance and bias, which explains why ensembles can reduce bias and variance. Ren et al. [4] discussed the theories that lead to ensemble learning, including bias-variance decomposition and diversity. The paper also categorizes the algorithms into different classes and discusses distinct methods such as fuzzy ensemble methods and deep learning-based methods. The work covers regression tasks as well as classification tasks. In this paper, the various approaches involved in classification tasks, the applications of ensemble learning and its challenges are discussed.

3 Ensemble Learning Approaches
Ensemble learning works by forming a set of hypotheses, whereas classic algorithms search for a single hypothesis. The ensemble then makes the hypotheses vote in some manner to predict the output class. There are numerous ways to categorize ensemble methods. A few articles divide them into sequential and parallel ensemble techniques. The author in [3] classifies the techniques based on how the hypotheses are designed: in one group each hypothesis is constructed independently so that the set is diverse and accurate, and in the other the hypotheses are constructed as an additive model. The authors in [4] use more specific categories, such as decomposition-based ensemble methods and fuzzy ensemble methods. The description in [1] is the simplest categorization:
• Dependent framework
• Independent framework


In a dependent framework, the result of each learner affects the next learner, so the learning of each learner is influenced by the previous one. In an independent framework, the learners are constructed independently of each other. Any ensemble technique falls under one of these two categories. The prime types of ensemble techniques are
• Bagging
• Boosting
• Stacking
Numerous algorithms work under these techniques to achieve the goals of an ensemble.

3.1 Bagging
One major key to ensemble models is diversity, which plays a crucial role in improving the performance of an ensemble. Bagging is one approach to introducing diversity. Bagging (bootstrap aggregating) works by training each inducer on a bootstrap subset of the instances. Each inducer is trained on a different subset, thereby generating different hypotheses. A voting technique is then used to determine the prediction for the test data. Bagging usually contains homogeneous learners and implements data diversity by working on samples of the data. In [5], Dietterich discusses in detail why ensembles perform better than individual learners. The author states that bagging is the most straightforward method of constructing an ensemble by manipulating the training examples, and that this kind of ensemble works better on unstable algorithms, that is, algorithms whose output changes considerably even when the manipulation of the training data is small. Tharwat et al. [6] propose a plant identification model that applies a bagging classifier to a fused feature vector for better accuracy. A decision tree is used as the base learner, and the results show that accuracy increases with the number of learners. The paper also finds that the accuracy is proportional to the amount of training data and the size of the ensemble. Wu et al. [7] propose an intelligent ensemble machine learning method based on bagging for thermal perception prediction. The authors compare the performance of the ensemble against SVM and ANN; the ensemble outperformed these classic algorithms in the prediction of thermal comfort and on many other measures. Several improvements to bagging have been suggested, one of which is the improved bagging algorithm, in which an entropy is assigned to each sample. Jiang et al. [8] used this algorithm for pattern recognition of ultra-high-frequency signals, and the model showed improved performance against many algorithms. Another interesting variant is wagging (weight aggregation), which works by assigning weights to the samples. However, the work by Bauer and Kohavi [9] showed no significant improvement for this variant, whereas bagging improved performance by decreasing the error: in many experiments with Naive Bayes and MC4, the error was significantly reduced.
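The bootstrap-and-vote procedure described above can be sketched in a few lines of standard-library Python. This is a minimal illustration, not the method of any cited work: the base learner is a one-feature threshold rule (a decision stump), and the function names, the toy data and the ensemble size of 25 are all assumptions.

```python
import random
from collections import Counter

def fit_stump(X, y):
    """Base learner: the single-feature threshold rule with the fewest training errors (labels 0/1)."""
    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            for left, right in ((0, 1), (1, 0)):
                errors = sum((left if row[j] <= t else right) != label
                             for row, label in zip(X, y))
                if best is None or errors < best[0]:
                    best = (errors, j, t, left, right)
    _, j, t, left, right = best
    return lambda row: left if row[j] <= t else right

def bagging_fit(X, y, n_learners, rng):
    """Train each learner on its own bootstrap sample (drawn with replacement)."""
    n = len(X)
    learners = []
    for _ in range(n_learners):
        idx = [rng.randrange(n) for _ in range(n)]
        learners.append(fit_stump([X[i] for i in idx], [y[i] for i in idx]))
    return learners

def bagging_predict(learners, row):
    """Majority vote over the learners' predictions."""
    return Counter(h(row) for h in learners).most_common(1)[0][0]

# Hypothetical one-feature data set: small values are class 0, large values class 1.
X = [[0.1], [0.2], [0.3], [0.4], [0.6], [0.7], [0.8], [0.9]]
y = [0, 0, 0, 0, 1, 1, 1, 1]
ensemble = bagging_fit(X, y, n_learners=25, rng=random.Random(0))
low_label = bagging_predict(ensemble, [0.05])
high_label = bagging_predict(ensemble, [0.95])
```

Because each stump sees a different bootstrap sample, the learned thresholds differ from learner to learner; the vote smooths out these differences, which is the variance-reduction effect attributed to bagging.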


In [10], Kotsiantis categorizes the variants of bagging algorithms into eight categories, which include
• Methods using alternative sampling techniques.
• Methods using different voting rules.
• Methods adding noise and inducing variants.

3.2 Boosting
Boosting is an iterative technique that adjusts the weights of the observations based on the output of the previous classifier. This ensures that instances which are not classified properly are picked more often than correctly predicted instances. This makes boosting a well-known technique under the dependent framework. Of the many boosting algorithms, the following are discussed.
• AdaBoost
• Gradient boosting

AdaBoost
AdaBoost was the pioneer among boosting techniques. It combines weak learners to form a strong learner. It assigns weights in such a way that the weights of wrongly classified instances are increased and those of correctly classified instances are decreased. Thus, the weights make successive learners concentrate more on the wrongly classified instances. In [11], Schapire discusses AdaBoost in detail, explaining various aspects of the algorithm and forming an expansive theory of it. Prabhakar and Rajaguru [12] proposed a model combining dimensionality reduction techniques and an AdaBoost classifier for the classification of epilepsy from EEG signals, achieving more than 90% accuracy. Haixiang et al. [13] used an AdaBoost-kNN ensemble learning model for the classification of multi-class imbalanced data. The model uses kNN as the base learner and incorporates AdaBoost. The results showed a 20% increase in accuracy over the classic kNN model. Many other works use AdaBoost along with feature selection techniques for increased accuracy.

Gradient Boosting
A gradient boosting machine (GBM) works as an additive sequential model. The major difference between AdaBoost and GBM is the way they handle the shortcomings of the previous learner: while AdaBoost uses weights, GBM uses gradients to compensate for the shortcomings in succeeding learners. One prominent advantage of GBM is that it allows the user to optimize a user-specified cost function instead of a loss function that may be unrealistic for the task. Many studies use an improved form of gradient boosting, extreme gradient boosting (XGB). Shi et al. [14] propose a weighted XGB for ECG heartbeat classification. The model classifies heartbeats into four categories, such as normal and ventricular, and the work concludes that the method is suitable for clinical application.
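The weight update that AdaBoost relies on can be made concrete with a small sketch. It follows the standard discrete AdaBoost formulation with labels in {-1, +1} and weighted decision stumps as weak learners; the function names and the six-point toy data set are illustrative assumptions and do not come from the cited works.

```python
import math

def stump_output(row, j, t, sign):
    """Threshold rule: `sign` when feature j is <= t, otherwise `-sign`."""
    return sign if row[j] <= t else -sign

def fit_weighted_stump(X, y, w):
    """Weak learner: the stump with the smallest weighted error under weights w."""
    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            for sign in (1, -1):
                err = sum(wi for wi, row, yi in zip(w, X, y)
                          if stump_output(row, j, t, sign) != yi)
                if best is None or err < best[0]:
                    best = (err, j, t, sign)
    return best

def adaboost_fit(X, y, rounds):
    n = len(X)
    w = [1.0 / n] * n                           # start with uniform weights
    model = []
    for _ in range(rounds):
        err, j, t, sign = fit_weighted_stump(X, y, w)
        err = min(max(err, 1e-12), 1 - 1e-12)   # guard against log(0)
        alpha = 0.5 * math.log((1 - err) / err)
        model.append((alpha, j, t, sign))
        # Misclassified instances gain weight, correctly classified ones lose weight.
        w = [wi * math.exp(-alpha * yi * stump_output(row, j, t, sign))
             for wi, row, yi in zip(w, X, y)]
        total = sum(w)
        w = [wi / total for wi in w]
    return model

def adaboost_predict(model, row):
    score = sum(alpha * stump_output(row, j, t, sign) for alpha, j, t, sign in model)
    return 1 if score >= 0 else -1

# Hypothetical data that no single stump can separate: the negative class sits in the middle.
X = [[1], [2], [3], [4], [5], [6]]
y = [1, 1, -1, -1, 1, 1]
one_round = [adaboost_predict(adaboost_fit(X, y, 1), row) for row in X]
three_rounds = [adaboost_predict(adaboost_fit(X, y, 3), row) for row in X]
```

On this toy data a single stump classifies four of the six points correctly, while the weighted combination of three stumps classifies all six, illustrating how re-weighting steers later learners towards the instances that earlier learners got wrong.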


3.3 Stacking
Stacking combines multiple learners by employing a meta-learner. The base-level learners are trained on the training data set, and the meta-learner is trained on the outputs of the base learners. The significance of stacking is that it can reap the benefits of well-performing models by training a meta-learner on them. The learners are heterogeneous and, unlike in boosting, only a single learner is used to learn from the base learners. Stacking can also be performed at multiple levels, but this can be expensive in terms of data and time. Ghalejoogh et al. [15] used a stacking-based ensemble approach to implement an automated system for melanoma classification. The authors also proposed a hierarchical structure-based stacking approach, which showed better results than the plain stacking approach.
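The two-level structure of stacking can be sketched as follows. This is a deliberately small illustration under several assumptions: for brevity both base learners are threshold rules on different features (in practice they would usually be heterogeneous models), the meta-learner is a simple lookup table that stores the majority label for each pattern of base predictions, and all names and data are hypothetical.

```python
from collections import Counter, defaultdict

def fit_stump_on(X, y, j):
    """Base learner: the best threshold rule on feature j (labels 0/1)."""
    best = None
    for t in sorted({row[j] for row in X}):
        for left, right in ((0, 1), (1, 0)):
            errors = sum((left if row[j] <= t else right) != label
                         for row, label in zip(X, y))
            if best is None or errors < best[0]:
                best = (errors, t, left, right)
    _, t, left, right = best
    return lambda row: left if row[j] <= t else right

def fit_meta(patterns, y):
    """Meta-learner: majority label for each observed pattern of base predictions."""
    table = defaultdict(Counter)
    for pattern, label in zip(patterns, y):
        table[pattern][label] += 1
    default = Counter(y).most_common(1)[0][0]
    lookup = {p: counts.most_common(1)[0][0] for p, counts in table.items()}
    return lambda pattern: lookup.get(pattern, default)

# Base-level training data: the label is 1 only when both features are large.
X_base = [[0.2, 0.2], [0.2, 0.8], [0.8, 0.2], [0.8, 0.8], [0.8, 0.8], [0.8, 0.8]]
y_base = [0, 0, 0, 1, 1, 1]
base_learners = [fit_stump_on(X_base, y_base, 0), fit_stump_on(X_base, y_base, 1)]

# Separate hold-out data on which the meta-learner is trained.
X_meta = [[0.1, 0.1], [0.1, 0.9], [0.9, 0.1], [0.9, 0.9]]
y_meta = [0, 0, 0, 1]
meta = fit_meta([tuple(h(row) for h in base_learners) for row in X_meta], y_meta)

def stacked_predict(row):
    """Feed the base predictions to the meta-learner."""
    return meta(tuple(h(row) for h in base_learners))

single_view = base_learners[0]([0.9, 0.15])   # first base learner on its own
stacked_view = stacked_predict([0.9, 0.15])   # meta-learner combining both
```

For the point (0.9, 0.15) the first base learner alone predicts class 1, while the meta-learner, seeing that the second base learner disagrees, predicts class 0. Training the meta-learner on hold-out data rather than on the base learners' own training set is a common precaution against passing on their overfitting.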

3.4 Random Forest
Random forest is a very popular ensemble method. It is a bagging method in which trees are fitted on bootstrap samples. Random forest adds randomness by selecting the best features from a random subset of features. Sampling over features gives the added advantage that the trees do not all have to look at the same features to make decisions. Lakshmanaprabu et al. [16] proposed a random forest classifier (RFC) approach for big data classification. The work shows how ensemble techniques work with big data. The model is applied to health data, which the RFC is used to classify. The results showed a maximum precision of 94% and an improvement over existing methods. Paul et al. [17] proposed an improved random forest algorithm that iteratively removes the features considered unimportant. The paper aimed to reduce the number of trees and features while still maintaining the accuracy. It showed that adding further trees or further reducing the features has no effect on accuracy.

4 Application of Ensemble Techniques
Ensembles are employed because of their ability to mitigate many problems that may occur when using machine learning. Several studies [1, 4] discuss their advantages and disadvantages. The significant benefits of using ensembles are discussed below.
• Class imbalance: When the majority of instances in the data belong to a single class, the data are said to be class-imbalanced. Machine learning algorithms may thereby develop an inclination towards that class. Ensemble methods can mitigate this issue by performing balanced sampling or by employing learners that cancel the inclinations of the previous learner. In [13], it is shown how an ensemble is used on imbalanced data.
• Bias-variance error: Ensemble methods tackle the bias or variance error that may occur in the base learners. For instance, bagging reduces the errors associated with random fluctuations in the training samples.
• Concept drift: Concept drift is the change in the underlying relationships due to a change in the labels over time. Ensembles are used as a remedy, since diversity in the ensemble usually reduces the error that may occur due to the drift.
Alongside these benefits, there are certain limitations in using ensembles. A few of them are
• Storage expense
• Time expense
• The difficulty of understanding the effect of parameters, such as the size of the ensemble and the selection of learners, on accuracy. In some cases smaller ensembles work better, whereas in some of the literature the increase in accuracy is stated to be proportional to the number of learners.
Banfield et al. [18] conducted an experimental analysis comparing ensemble methods on 34 data sets. Some significant outcomes of the analysis are given in Table 1. From the findings, it is visible that in some cases ensembles can also perform poorly, whereas in most cases, according to the literature, the accuracy is better. Although a few questions remain open, ensembles have been widely employed for improved performance.

Table 1 Findings from the literature stated above

Data set          Bagging   Boosting   Random forest
Letter data set   94.90     96.74      96.84
Led-24            73.57     71.43      74.93
Iris              94.67     94.67      94.67
Sonar             77.14     81.43      81.90

5 Deep Learning and Ensemble Techniques
The emergence of deep learning has resulted in enormous growth in various domains. Deep learning paves the way for improvements in artificial intelligence to reach parity with humans. Architectures such as densely connected neural networks and convolutional neural networks are very popular. Deep learning plays an important role in speech recognition, object detection and so on. Aggregating multiple deep learning models is a simple way to employ ensembles in deep learning. Another way is to employ the ensemble inside the network; dropout and residual blocks are examples of this in such models. They tend to create variations in the network, thereby improving accuracy. Numerous studies show how neural networks employ ensemble methods for classification purposes. Liu et al. [19] proposed an ensemble of convolutional neural networks with different architectures for vehicle-type classification. The results show that the mean precision is 2% higher than that of single models. Zheng et al. [20] proposed an ensemble deep learning approach to extract EEG features. It uses bagging along with an LSTM model and showed higher accuracy than techniques such as RNN. Such works show how ensembles are moving forward in the artificial intelligence era.
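The simplest form of aggregation mentioned above, combining the outputs of several trained networks, can be sketched as an average of their class-probability vectors. The three probability vectors below are hypothetical softmax outputs for a single input and the function name is an assumption; no particular network or framework is implied.

```python
def average_ensemble(prob_vectors):
    """Average the class-probability vectors produced by several models."""
    n_models = len(prob_vectors)
    n_classes = len(prob_vectors[0])
    return [sum(p[c] for p in prob_vectors) / n_models for c in range(n_classes)]

# Hypothetical softmax outputs of three networks for one input (three classes).
outputs = [
    [0.6, 0.3, 0.1],
    [0.2, 0.5, 0.3],
    [0.3, 0.4, 0.3],
]
mean_probs = average_ensemble(outputs)
predicted_class = max(range(len(mean_probs)), key=mean_probs.__getitem__)
```

Here the first network on its own would choose class 0, but the averaged probabilities favour class 1, which shows how aggregation can override a single confident but outvoted model.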

6 Experimentation
To understand the effect of ensemble methods, they were applied to an open banking data set that uses the attributes of a user to classify whether he/she would take a loan from the bank. The findings from the experiment are listed in the table below.

Method          Accuracy (%)
Stacking        88.5
AdaBoost        89.01
GBM             88.27
Bagging         88.72
Random forest   86.91

AdaBoost exhibits the highest accuracy of 89% with a decision tree as the base learner. Random forest shows 86.91% accuracy. The 3% increase in accuracy is still promising for the task. On the other hand, an experiment was also carried out on another data set, which classifies default credit payments. Random forest showed the highest accuracy of 98.76%, whereas the other techniques showed an accuracy of only around 75%. This shows the advantage of ensemble techniques while also illustrating the limitation regarding the effect of parameters on accuracy. The effect of feature selection can also be studied in the future to achieve much higher performance.

7 Conclusion
Classification tasks aim to predict the class label of unknown test data. Ensemble learning is a popular technique for improving classification performance. It combines numerous learners to form a single strong learner. Many techniques and algorithms exist in ensemble learning, of which bagging, boosting and stacking are the most popular. These algorithms are discussed along with their applications in various fields. In addition, the future of ensemble learning in the field of artificial intelligence and the advantages and disadvantages of applying ensemble learning are explained. As future work, analysis can be made of how ensemble learning can be used efficiently to fit big data, and in the direction of building models that are simpler and effective in terms of cost and time.

References
1. Sagi O, Rokach L (2018) Ensemble learning: a survey. Wiley Interdiscip Rev Data Min Knowl Discov 8(4):
2. Gomes HM, Barddal JP, Enembreck F, Bifet A (2017) A survey on ensemble learning for data stream classification. ACM Comput Surv (CSUR) 50(2):1–36
3. Dietterich TG (2002) Ensemble learning. The handbook of brain theory and neural networks 2:110–125
4. Ren Y, Zhang L, Suganthan PN (2016) Ensemble classification and regression-recent developments, applications and future directions. IEEE Comput Intell Mag 11(1):41–53
5. Dietterich TG (2000) Ensemble methods in machine learning. International workshop on multiple classifier systems. Springer, Berlin, Heidelberg, pp 1–15
6. Tharwat A, Gaber T, Awad YM, Dey N, Hassanien AE (2016) Plants identification using feature fusion technique and bagging classifier. The 1st international conference on advanced intelligent system and informatics (AISI2015), 28–30 Nov 2015, Beni Suef, Egypt. Springer, Cham, pp 461–471
7. Wu Z, Li N, Peng J, Cui H, Liu P, Li H, Li X (2018) Using an ensemble machine learning methodology-Bagging to predict occupants-thermal comfort in buildings. Energy Build 173:117–127
8. Jiang T, Li J, Zheng Y, Sun C (2011) Improved bagging algorithm for pattern recognition in UHF signals of partial discharges. Energies 4(7):1087–1101
9. Bauer E, Kohavi R (1999) An empirical comparison of voting classification algorithms: Bagging, boosting, and variants. Mach Learn 36(1–2):105–139
10. Kotsiantis SB (2014) Bagging and boosting variants for handling classifications problems: a survey. Knowl Eng Rev 29(1):78
11. Schapire RE (2013) Explaining adaboost. Empirical inference. Springer, Berlin, Heidelberg, pp 37–52
12. Prabhakar SK, Rajaguru H (2017) Adaboost classifier with dimensionality reduction techniques for epilepsy classification from EEG. International conference on biomedical and health informatics. Springer, Singapore, pp 185–189
13. Haixiang G, Yijing L, Yanan L, Xiao L, Jinling L (2016) BPSO-adaboost-KNN ensemble learning algorithm for multi-class imbalanced data classification. Eng Appl Artif Intell 49:176–193
14. Shi H, Wang H, Huang Y, Zhao L, Qin C, Liu C (2019) A hierarchical method based on weighted extreme gradient boosting in ECG heartbeat classification. Comput Methods Progr Biomed 171:1–10
15. Ghalejoogh GS, Kordy HM, Ebrahimi F (2020) A hierarchical structure based on Stacking approach for skin lesion classification. Expert Syst Appl 145:
16. Lakshmanaprabu SK, Shankar K, Ilayaraja M, Nasir AW, Vijayakumar V, Chilamkurti N (2019) Random forest for big data classification in the internet of things using optimal features. Int J Mach Learn Cybern 10(10):2609–2618
17. Paul A, Mukherjee DP, Das P, Gangopadhyay A, Chintha AR, Kundu S (2018) Improved random forest for classification. IEEE Trans Image Process 27(8):4012–4024
18. Banfield RE, Hall LO, Bowyer KW, Bhadoria D, Philip Kegelmeyer W, Eschrich S (2004) A comparison of ensemble creation techniques. International workshop on multiple classifier systems. Springer, Berlin, Heidelberg, pp 223–232
19. Liu W, Zhang M, Luo Z, Cai Y (2017) An ensemble deep learning method for vehicle type classification on visual traffic surveillance sensors. IEEE Access 5:24417–24425
20. Zheng X, Chen W, You Y, Jiang Y, Li M, Zhang T (2020) Ensemble deep learning for automated visual classification using EEG signals. Patt Recogn 102:

An Improved Particle Swarm Optimization-Based System Identification

Pasila Eswari, Y. Ramalakshmanna, and Ch. Durga Prasad

Abstract An improved particle swarm optimization (IPSO) is used to identify infinite impulse response (IIR) systems based on the error minimization concept. Since the parameter selection of conventional PSO influences the search process, dynamic control parameters are inserted in the mechanism to avoid premature solutions. This modification helps to find global optimal values even when the initial control parameters are poorly chosen. The method is tested on two standard IIR systems, a third-order and a fourth-order model, to show the improvements. Finally, comparative results show the effectiveness of the dynamic control parameters of PSO in finding parameter values close to those of the unknown systems.

Keywords IIR filter · Particle swarm optimization · Control parameters

P. Eswari (B) · Y. Ramalakshmanna
Department of ECE, SRKR Engineering College, Bhimavaram, Andhra Pradesh, India

Ch. Durga Prasad
Department of EEE, SRKR Engineering College, Bhimavaram, Andhra Pradesh, India

© Springer Nature Singapore Pte Ltd. 2021
E. S. Gopi (ed.), Machine Learning, Deep Learning and Computational Intelligence for Wireless Communication, Lecture Notes in Electrical Engineering 749, https://doi.org/10.1007/978-981-16-0289-4_11

1 Introduction

Digital filters eliminate specific bands of frequencies in digital processors. Such filters are broadly classified as linear and nonlinear. For better filtering, IIR filters are widely used instead of FIR filters in control, signal processing, and communication-related fields. The objective is to estimate the actual parameters of the unknown system from patterns of different inputs and outputs [1, 2]. Several gradient-based learning methods [3–5], intelligent approaches [6], and evolutionary and swarm optimization techniques [7–13] have been used to estimate the filter/system parameters. In the past, gradient approaches were used for filters to estimate frequency [3, 4]. The quaternion algebra concept was introduced later in [5] to reduce the complexity of IIR filter design, and a learning algorithm for its training was proposed. Since the prediction of the adaptive IIR algorithms is more difficult, intelligent approaches provide alternate solutions with less complexity and with high convergence [6]. Particle swarm optimization (PSO) was introduced in [7] for IIR filter coefficient identification, where it was adopted to reconstruct missing elements of N-dimensional data. Earlier, the parameters were estimated directly using PSO under ideal conditions in [8]. Evolutionary algorithms such as genetic and differential evolution algorithms [9, 10] were also tried for digital IIR filter coefficient identification. These intelligent approaches need control parameters whose selection influences convergence; hence, nonparametric algorithms were also applied to identify the parameters of IIR and FIR filters. Teaching–learning-based optimization (TLBO) was applied in [11] to identify filter parameters. Other mathematical algorithms are also available in the literature for estimation of filters under noise conditions [12, 13]. Recently, the cat swarm optimization algorithm (CSO) [1], the gravitational search algorithm (GSA) [2], and other recent algorithms [14] were applied to the estimation problem for better convergence. For accurate identification of filter coefficients with a fast convergence rate, an improved PSO is used in this paper, since the PSO algorithm is easy to implement and fast to execute. The influence of the control parameters on the final results is reduced by damping them, and values close to the exact solution are achieved. The efficacy is tested with a few well-defined models discussed in subsequent sections.

2 Problem Formulation

System identification is the task of identifying the exact model parameters of an unknown system from observations of its input and output patterns. The task is completed by substituting parameters into the model for a set of standard inputs so that the model output matches the actual system output. The schematic representation of system identification is given in Fig. 1, where the parameters are identified by an optimization algorithm.

Fig. 1 Approach diagram for system identification (blocks: actual IIR system, model IIR system, optimization algorithm)

The input (x)–output (y) relation can be described by Eq. (1):

$$\sum_{i=0}^{N} b_i\, y(n-i) = \sum_{k=0}^{M} a_k\, x(n-k) \tag{1}$$

In Eq. (1), N (≥ M) is the filter's order. The transfer function of the filter described in Eq. (1) is given by

$$H(z) = \frac{Y(z)}{X(z)} = \frac{\sum_{k=0}^{M} a_k z^{-k}}{\sum_{i=0}^{N} b_i z^{-i}} \tag{2}$$

Supposing $b_0 = 1$, the adaptive IIR filter transfer function is

$$H(z) = \frac{\sum_{k=0}^{M} a_k z^{-k}}{1 + \sum_{i=1}^{N} b_i z^{-i}} \tag{3}$$

In detail, Eq. (3) is rewritten as

$$H(z) = \frac{Y(z)}{X(z)} = \frac{a_0 + a_1 z^{-1} + a_2 z^{-2} + \cdots + a_M z^{-M}}{1 + b_1 z^{-1} + b_2 z^{-2} + \cdots + b_N z^{-N}} \tag{4}$$

The estimated filter model is given by

$$H_e(z) = \frac{\sum_{k=0}^{M} \hat{a}_k z^{-k}}{1 + \sum_{i=1}^{N} \hat{b}_i z^{-i}}$$

To identify the correct parameters of the actual system, an error is calculated from the outputs of the actual system and the model using

$$e(k) = y(k) - y_e(k) \tag{5}$$

When the estimated parameters are close to the actual ones, the error defined in Eq. (5) approaches zero over the entire time scale. Such parameters are identified using population search-based techniques, where the objective function is framed with the help of the error in Eq. (5) and is given by

$$J = \min \sum_{k=1}^{N} e(k)^2 \tag{6}$$

At the optimal solution, J approaches zero and the process has converged. For this purpose, an improved PSO with dynamic control parameters is used.
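The error and objective in Eqs. (5) and (6) can be sketched in a few lines. The example below is a minimal illustration, assuming the difference-equation form of Eq. (3) with $b_0 = 1$; the second-order plant, the input signal, and the function names `iir_output` and `objective` are hypothetical and are not taken from the paper.

```python
import random

def iir_output(a, b, x):
    """Difference equation implied by Eq. (3), with b0 = 1 left implicit:
    y(n) = sum_k a[k] * x(n-k) - sum_i b[i-1] * y(n-i)."""
    y = []
    for n in range(len(x)):
        acc = sum(a[k] * x[n - k] for k in range(len(a)) if n - k >= 0)
        acc -= sum(b[i - 1] * y[n - i] for i in range(1, len(b) + 1) if n - i >= 0)
        y.append(acc)
    return y

def objective(a_hat, b_hat, x, y_actual):
    """Sum of squared output errors, following Eqs. (5) and (6)."""
    y_model = iir_output(a_hat, b_hat, x)
    return sum((ya - ym) ** 2 for ya, ym in zip(y_actual, y_model))

# Hypothetical stable second-order plant, chosen only for illustration.
random.seed(0)
x = [random.gauss(0.0, 1.0) for _ in range(200)]
a_true, b_true = [0.5, 0.2], [-0.3, 0.1]
y = iir_output(a_true, b_true, x)

J_exact = objective(a_true, b_true, x, y)          # model equals the plant
J_wrong = objective([0.4, 0.2], [-0.3, 0.1], x, y)  # one coefficient is off
```

With the true coefficients the error is identically zero, so J vanishes; any mismatch in a coefficient makes J positive, which is the quantity the optimizer drives toward zero.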


3 Improved Particle Swarm Optimization Algorithm

PSO is a popular search-based intelligent algorithm inspired by the food-searching mechanism of birds [15]. The primary solutions of the optimization problem are randomly generated in the search space and are known as the initial solutions. Each solution is represented as a 'position'. A new position is obtained from the best position of the individual and of the group through a 'velocity' calculation. This velocity is calculated for individual birds using the current position, local best position, and global best position of the particles, along with other parameters known as control parameters. The selection of these control parameters influences the overall search process. Choosing suitable control parameters is a difficult task, and hence new algorithms were proposed in later stages. However, an improved PSO is used in this work because of the simple architecture and fast convergence of the PSO algorithm [16–18]. This improved version uses dynamic control parameters, which produce more reliable solutions irrespective of the initial selection of control parameter values. With this modification, the velocity and position equations of the particles are given by

$$v_n^{i+1} = \omega\,\omega_d\, v_n^{i} + c_1 c_d r_1 \left(pbest^{i} - p_n^{i}\right) + c_2 c_d r_2 \left(gbest^{i} - p_n^{i}\right) \tag{7}$$

$$p_n^{i+1} = p_n^{i} + v_n^{i+1} \tag{8}$$

All the terms in Eqs. (7) and (8) are the same as in standard PSO ($p_n^{i}$ represents position and $v_n^{i}$ represents velocity), and $\omega_d$ and $c_d$ are the damping values inserted for each control parameter. Since the acceleration coefficients are multiplied by random numbers, the inertia weight is the more influential parameter. Therefore, the dynamic change is considered only for the inertia weight in the rest of the paper. The improvements in the results with the proposed method are reported in Sect. 4.
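The update rules in Eqs. (7) and (8) can be illustrated with a short sketch. The paper does not state how the damping is scheduled, so the code assumes that the inertia weight is multiplied by $\omega_d$ once per iteration and that $c_d = 1$ (no damping of the acceleration coefficients). The function name `improved_pso`, the default values, and the search bounds are all hypothetical choices made for illustration.

```python
import random

def improved_pso(cost, dim, n_particles=20, iters=100, w=0.8, w_damp=0.99,
                 c1=2.0, c2=2.0, c_damp=1.0, lo=-1.0, hi=1.0, seed=0):
    """PSO with a damped inertia weight; a sketch of Eqs. (7) and (8)."""
    rng = random.Random(seed)
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_cost = [cost(p) for p in pos]
    g = min(range(n_particles), key=lambda n: pbest_cost[n])
    gbest, gbest_cost = pbest[g][:], pbest_cost[g]
    for _ in range(iters):
        for n in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Eq. (7): velocity update with the current control parameters
                vel[n][d] = (w * vel[n][d]
                             + c1 * r1 * (pbest[n][d] - pos[n][d])
                             + c2 * r2 * (gbest[d] - pos[n][d]))
                # Eq. (8): position update
                pos[n][d] += vel[n][d]
            c = cost(pos[n])
            if c < pbest_cost[n]:
                pbest[n], pbest_cost[n] = pos[n][:], c
                if c < gbest_cost:
                    gbest, gbest_cost = pos[n][:], c
        # Assumed damping schedule: shrink the control parameters each iteration.
        w *= w_damp
        c1 *= c_damp
        c2 *= c_damp
    return gbest, gbest_cost

# Toy cost function (sum of squares), used only to exercise the sketch.
sphere = lambda p: sum(v * v for v in p)
best, best_cost = improved_pso(sphere, dim=3)
```

For system identification, `cost` would be replaced by an objective such as Eq. (6) evaluated on the candidate filter coefficients. Starting from a large inertia weight and letting it decay is what makes the result less sensitive to the initial choice of ω.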

4 Simulation Results

To analyze the performance of the improved PSO for parameter estimation, two case studies are considered.

Test system 1: The transfer function of the fourth-order plant (fourth-order IIR filter) is given by

$$H(z) = \frac{a_0 + a_1 z^{-1} + a_2 z^{-2} + a_3 z^{-3}}{1 - b_1 z^{-1} - b_2 z^{-2} - b_3 z^{-3} - b_4 z^{-4}} \tag{9}$$

The actual coefficients of the unknown system in Eq. (9) are presented in the first row of Table 1. As stated in Sect. 3, the identification of the filter parameters is first checked using PSO with constant control parameters. At ω = 0.2, the minimum value of the objective function is achieved, and the estimated parameters are reported in the second row of Table 1. The optimal values achieved with PSO at other constant values of the inertia weight are also presented in Table 1. Among all cases, the best solution is achieved when ω = 0.2. However, with the proposed dynamic ω, similar estimated values are achieved even with the worst initialization of the control parameters. The final objective function values at the three inertia weights 0.2, 0.6, and 0.8 are −56.51, −52.61, and −31.58 dB, respectively, whereas the proposed method reaches −55.56 dB when the inertia weight is initialized at 0.8.

Table 1 Estimated parameters of unknown test system 1

| Case | a1 | a2 | a3 | b1 | b2 | b3 | b4 |
|---|---|---|---|---|---|---|---|
| Actual | −0.9000 | 0.8100 | −0.7290 | −0.0400 | −0.2775 | 0.2101 | −0.1400 |
| ω = 0.2 | −0.8973 | 0.8068 | −0.7285 | −0.0422 | −0.2773 | 0.2116 | −0.1392 |
| ω = 0.6 | −0.8961 | 0.8058 | −0.7284 | −0.0435 | −0.2776 | 0.2112 | −0.1395 |
| ω = 0.8 | −0.8709 | 0.7677 | −0.7189 | −0.0620 | −0.2680 | 0.2171 | −0.1269 |
| Proposed | −0.8972 | 0.8066 | −0.7283 | −0.0422 | −0.2777 | 0.2110 | −0.1391 |

Test system 2: The transfer function of the third-order plant is given by

$$H(z) = \frac{a_0 + a_1 z^{-1} + a_2 z^{-2}}{1 - b_1 z^{-1} - b_2 z^{-2} - b_3 z^{-3}} \tag{10}$$

The actual coefficients of the unknown system in Eq. (10) are presented in the first row of Table 2. The improved PSO is applied to identify the unknown plant parameters, and the final solution values reported in Table 2 are close to the actual values.

Table 2 Estimated parameters of unknown test system 2

| Case | a0 | a1 | a2 | b1 | b2 | b3 |
|---|---|---|---|---|---|---|
| Actual | −0.2 | −0.4 | 0.5 | −0.6 | −0.25 | 0.2 |
| Proposed | −0.1964 | −0.4095 | 0.5052 | −0.5928 | −0.2418 | 0.1960 |
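How close the estimates are can be quantified directly from the values in Table 2. The short sketch below computes the absolute deviation of each estimated coefficient; the dictionary layout and variable names are illustrative only.

```python
# Coefficients of test system 2, copied from Table 2.
actual = {"a0": -0.2, "a1": -0.4, "a2": 0.5, "b1": -0.6, "b2": -0.25, "b3": 0.2}
proposed = {"a0": -0.1964, "a1": -0.4095, "a2": 0.5052,
            "b1": -0.5928, "b2": -0.2418, "b3": 0.1960}

# Absolute deviation of each estimated coefficient from its actual value.
abs_err = {name: abs(proposed[name] - actual[name]) for name in actual}
worst = max(abs_err, key=abs_err.get)

for name in actual:
    print(f"{name}: |error| = {abs_err[name]:.4f}")
print("largest deviation:", worst)
```

The largest deviation is about 0.0095, on a1, so every estimated coefficient lies within 0.01 of its actual value.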

5 Conclusions

In this paper, an improved PSO is applied to the estimation of unknown plant parameters. The method avoids the additional burden of selecting the control parameters of conventional PSO and produces values close to the global optimum irrespective of the initialization. The results obtained for two higher-order models show the advantages of the improved PSO in the system identification process.


References

1. Panda G, Pradhan PM, Majhi B (2011) IIR system identification using cat swarm optimization. Expert Syst Appl 38(10):12671–12683
2. Rashedi E, Nezamabadi-Pour H, Saryazdi S (2011) Filter modeling using gravitational search algorithm. Eng Appl Artif Intell 24(1):117–122
3. Chicharo JF, Ng TS (1990) Gradient-based adaptive IIR notch filtering for frequency estimation. IEEE Trans Acoust Speech Sig Process 38(5):769–777
4. Netto SL, Diniz PS, Agathoklis P (1995) Adaptive IIR filtering algorithms for system identification: a general framework. IEEE Trans Educ 38(1):54–66
5. Took CC, Mandic DP (2010) Quaternion-valued stochastic gradient-based adaptive IIR filtering. IEEE Trans Sig Process 58(7):3895–3901
6. Cho C, Gupta KC (1999) EM-ANN modeling of overlapping open-ends in multilayer microstrip lines for design of bandpass filters. In: IEEE antennas and propagation society international symposium 1999 Digest. Held in conjunction with: USNC/URSI National Radio Science Meeting (Cat. No. 99CH37010), vol 4. IEEE, pp 2592–2595
7. Hartmann A, Lemos JM, Costa RS, Vinga S (2014) Identifying IIR filter coefficients using particle swarm optimization with application to reconstruction of missing cardiovascular signals. Eng Appl Artif Intell 34:193–198
8. Durmuş B, Gün A (2011) Parameter identification using particle swarm optimization. In: Proceedings, 6th international advanced technologies symposium (IATS 11), Elazığ, Turkey, pp 188–192
9. Ma Q, Cowan CF (1996) Genetic algorithms applied to the adaptation of IIR filters. Sig Process 48(2):155–163
10. Karaboga N (2005) Digital IIR filter design using differential evolution algorithm. EURASIP J Adv Sig Process 2005(8):856824
11. Singh R, Verma HK (2013) Teaching–learning-based optimization algorithm for parameter identification in the design of IIR filters. J Inst Eng (India): Ser B 94(4):285–294
12. DeBrunner VE, Beex AA (1990) An informational approach to the convergence of output error adaptive IIR filter structures. In: International conference on acoustics, speech, and signal processing. IEEE, pp 1261–1264
13. Wang Y, Ding F (2017) Iterative estimation for a non-linear IIR filter with moving average noise by means of the data filtering technique. IMA J Math Control Inf 34(3):745–764
14. Zhao R, Wang Y, Liu C, Hu P, Jelodar H, Yuan C, Li Y, Masood I, Rabbani M, Li H, Li B (2019) Selfish herd optimization algorithm based on chaotic strategy for adaptive IIR system identification problem. Soft Comput, 1–48
15. Kennedy J, Eberhart R (1995) Particle swarm optimization. In: Proceedings of ICNN’95 - international conference on neural networks, vol 4. IEEE, pp 1942–1948
16. Nagaraju TV, Prasad CD (2020) Swarm-assisted multiple linear regression models for compression index (Cc) estimation of blended expansive clays. Arab J Geosci 13(9)
17. Prasad CD, Biswal M, Nayak PK (2019) Wavelet operated single index based fault detection scheme for transmission line protection with swarm intelligent support. Energy Syst, 1–20
18. Nagaraju TV, Prasad CD, Raju MJ (2020) Prediction of California bearing ratio using particle swarm optimization. In: Soft computing for problem solving. Springer, Singapore, pp 795–803

Channel Coverage Identification Conditions for Massive MIMO Millimeter Wave at 28 and 39 GHz Using Fine K-Nearest Neighbor Machine Learning Algorithm

Vankayala Chethan Prakash, G. Nagarajan, and N. Priyavarthan

Abstract A massive MIMO millimeter wave (mm-wave) system integrates various technologies with hundreds of antennas that support many devices together. In mm-wave communication, the signal degrades due to atmospheric absorption, and the pencil beam that is formed is liable to attenuation by obstacles present in the propagation paths. Because such a system offers a huge bandwidth, a greater number of devices can be interconnected. However, to provide seamless connection of devices, the channel conditions need to be identified and analyzed. From the channel analysis, a channel characterization is obtained for classifying signal paths into Line of Sight (LoS) and Non-Line of Sight (NLoS). An energy detector is used to retain signals above 10 dB. These signals are analyzed for channel conditions such as pathloss and power delay profile. In this work, an independent identically distributed AWGN channel is considered, based on which a dataset is constructed. A machine learning algorithm, namely K-nearest neighbor (K-NN), is applied for efficient channel characterization into LoS and NLoS. An accuracy of 96.3 and 94.3% is obtained for pathloss, and an accuracy of 94.5 and 93.3% is obtained for power delay profile at 28 and 39 GHz, respectively.

Keywords LoS · Massive MIMO · mm-wave · NLoS · Pathloss and power delay profile

V. C. Prakash (B) · G. Nagarajan
Department of ECE, Pondicherry Engineering College, Puducherry, India

N. Priyavarthan
Department of CSE, Thiagarajar College of Engineering, Madurai, Tamil Nadu, India

© Springer Nature Singapore Pte Ltd. 2021
E. S. Gopi (ed.), Machine Learning, Deep Learning and Computational Intelligence for Wireless Communication, Lecture Notes in Electrical Engineering 749, https://doi.org/10.1007/978-981-16-0289-4_12

1 Introduction

The technical aspects of massive MIMO have brought about the integration of various networks, namely the fifth generation (5G). Massive MIMO with mm-wave has gained research importance for handling a large number of transceiving terminals efficiently. To support this greater number of devices, a huge amount of bandwidth is required. For such high bandwidth, the system must be operated at higher frequencies such as mm-wave frequencies. However, signals experience heavy distortion due to their shorter wavelength. When obstructed, these signals create multiple paths, which degrades signal quality. Wireless sensor networks, the Internet of Things, human-centric applications, and device-centric applications all demand such high bandwidth. In order to provide coverage to this number of devices, the channel conditions with respect to the propagating environment are analyzed. Massive MIMO operates in TDD or FDD mode. In TDD mode, either the uplink or the downlink channel is estimated and the other is obtained by Hermitian transpose, whereas in FDD mode the uplink and downlink channels are estimated separately. Obstacles present between the transmitter and the receiver degrade the signal further, depending on the operating frequency of the system. To enhance the signals intended for the users, device localization is important. Localization is widely known to be performed through the global positioning system (GPS). Many techniques in use are based on distance, signal arrival, and geometry. In some scenarios, these techniques are used in combination with GPS, which makes the localization procedure more reliable. Nevertheless, GPS and the above techniques fail to provide accurate localization due to factors present in the propagation environment. Because devices are widely spread, there is a need to localize the intended device and steer the signal toward it.
In 5G and beyond-5G networks, there is a need for massive coverage enhancement for various IoT applications such as agriculture and medicine, where IoT devices are deployed in a distributed manner in both urban and rural areas. For better quality of service, a distributed massive MIMO system is a better solution [1]. A beam alignment framework based on Bayesian decision is proposed in which the coordination between the base station and the user equipment is taken into account [2]. Distance-based localization and mapping with an extended Kalman filter is proposed that works irrespective of the propagation environment and the position of the base station [3]. Fingerprinting database positioning is done with the received signal strength based on Gaussian regression; it reduces the complexity of analyzing the position of an individual user terminal in comparison with range-based or angle-based techniques [4]. A two-step localization procedure is followed in which angle of arrival (AOA) and triangulation are proposed for localization; to reduce the complexity of localization, compressed sensing is utilized for the identification of LoS and NLoS conditions [5]. For a massive MIMO mm-wave system, a mixed analog–digital convertor is designed to enhance the overall system performance [6]. Localization is carried out with angle of departure (AOD) and received signal strength (RSS) for each individual user terminal at the base station for beamforming signals that use orthogonal frequency division multiplexing (OFDM) with reduced peak-to-average power ratio (PAPR) [7]. Situational awareness of the propagation environment in a massive MIMO mm-wave system is addressed with a model in which the statistical channel conditions and the position with respect to clock offset are considered [8]. For high


data rate communication in a massive MIMO mm-wave system, joint position and orientation estimation is performed for 5G networks [9]. An enhanced cooperative group localization is performed for coverage connectivity, where the use of RSS and AOA increases the localization accuracy for both massive MIMO and device-to-device networks [10]. A hybrid RSS-AOA technique with a uniform cylindrical arrangement of antennas in a massive MIMO mm-wave system is considered for localization; a channel compression method is proposed in which the dimensions of the received signal vector are reduced while maintaining the same accuracy [11]. A massive MIMO system with a uniform cylindrical array is considered in which the received signal vectors are transformed into beam space vectors, where a linear relation is maintained based on the direction of arrival, with a low-complexity search algorithm [12]. A sparsity-based error-correcting localization relies on the residual update of the generalized orthogonal matching pursuit algorithm, which reduces complexity in large-scale arrays [13]. Fingerprint localization is proposed for a single-site massive MIMO-OFDM system, where an angle delay channel matrix is obtained from the instantaneous channel condition and a clustering algorithm reduces the complexity [14]. An angle delay Doppler power spectrum is extracted from the channel state information, a fingerprint is built, and it is compared with pre-collected reference points; this fingerprinting database is used with a distance-based kernel method, which increases the localization accuracy [15]. In massive MIMO LTE user channel access, the location information is obtained through synchronization when the user connects to the network. To improve the allocation of radio resources to the user, the location information is used for beamforming designed for both LoS and multi-path environments [16]. Mobile cloud computation and massive MIMO are integrated in 5G networks.
A direction of arrival (DOA) technique is used for localization in real-time monitoring of patients [17]. Multiple hypothesis testing on the power delay profile is performed, where maximum likelihood and Bayesian estimation are proposed for distinguishing NLoS from LoS conditions [18]. A propagation environment with obstacles is examined for pathloss, power delay profile, RMS delay spread, and mean excess delay in a massive MIMO mm-wave system [19]. A hybrid RSS-TOA-based localization is proposed for massive MIMO at mm-wave frequencies with 32 and 64 antennas at the base station and 4 and 8 user terminals; an energy detector is utilized for a channel densification process that reduces channel complexity [20]. The channels are studied at 4.5, 28, and 38 GHz, and the pathloss is studied for vertical–vertical, vertical–horizontal, and vertical–omni polarizations in an indoor environment [21]. An IoT-based healthcare system in indoor and outdoor settings is considered, where a design methodology with an accelerometer and a magnetometer is proposed for patient localization; patient activities such as walking, standing, and sleeping are identified, monitored, and transmitted to the concerned staff [22]. A home remote monitoring system is designed for patients in IoT environments; protocol conversion from ISO to the IEEE 11073 protocol, an M2M protocol, and a scheduling algorithm are proposed for medical data transmission to staff at hospitals. Besides data transmission, secure storage and authorization of data are also taken


into account [23]. In a dense forest environment, future wireless sensor networks and IoT devices are deployed and examined for pathloss at 2.4 GHz. Two scenarios are considered for simulation and measurement, namely (a) a free space zone and (b) a diffraction zone, and the delay spread values are also presented [24]. In a 5G network, mm-wave frequencies with IoT devices are considered for higher bandwidth. Pathloss is analyzed at 38 GHz in an outdoor environment to characterize LoS and NLoS for vertical–vertical and vertical–horizontal antenna polarizations, along with parameters such as cell throughput, throughput at edges, spectral efficiency, and fairness index [25]. IoT in industrial applications needs a wide bandwidth with seamless connectivity, for which localization is a must. Normally, narrowband IoT is used for industrial and healthcare applications, and GPS fails to localize such low-power IoT devices. Based on distance, an analytical model is designed for geometric probabilistic analysis [26]. An indoor positioning system for an IoT environment is considered, and a Wi-Fi trilateration method is proposed for positioning users with respect to reference points [27]. In IoT, localization of devices has gained importance for quality of experience. A localization technique is designed as part of the Butler project, namely FP7, which is the most commendable technique among EU projects [28]. The feasibility of massive MIMO in industrial IoT is analyzed by placing massive antennas in a datacenter for seamless connectivity with a large number of devices [29]. Massive MIMO with IoT is analyzed for connectivity under two generic schemes, namely massive machine-type communication and ultra-reliable low-latency communication. For physical layer technologies between massive MIMO and IoT, a strong integration is needed in terms of protocol design [30].

2 Network Architecture

A distributed massive MIMO mm-wave system is considered for identification of the coverage area based on the received signal at the user equipment. Figure 1 shows the architecture of distributed massive MIMO. The radio towers are deployed with massive antenna arrays to provide connectivity to a large number of user equipment. Such a number of devices requires a huge bandwidth. For seamless connectivity, the propagating channel conditions are analyzed for LoS and NLoS.

Fig. 1 Distributed massive MIMO connectivity

3 Simulation Methodology

In 5G communication networks such as mobile networks, wireless sensor networks, cognitive radio networks, and device-to-device communication, the propagation environment needs to be studied. In the next generation of wireless communication, a greater number of devices are to be connected together. Most of the literature indicates that massive MIMO mm-wave cellular communication is operated at 28 and 39 GHz. To support such a huge number of devices, massive MIMO at mm-wave frequencies is considered as a use case. A distributed massive MIMO system with 128 transmitting antennas and 4 receiving antennas operating at 28 and 39 GHz is considered. As multiple copies of the signal arrive at the receiver, an energy detector is utilized that admits only signals above 10 dB. For these channels, measurements are made on the uplink. With the principle of channel reciprocity, the downlink channel measurements are obtained. Parameters such as pathloss and power delay profile are extracted for LoS and NLoS scenarios. A dataset of 1000 samples is constructed from simulations for both parameters. A fine K-NN algorithm is trained, and tenfold cross-validation is performed on the dataset: the full dataset is divided into ten parts, where nine parts are used for training and one part is used for testing.
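The tenfold split described above can be sketched as follows. This is a minimal illustration rather than the MATLAB workflow used in the paper; the helper name `kfold_indices` and the fixed shuffling seed are assumptions introduced here.

```python
import random

def kfold_indices(n_samples, k=10, seed=0):
    """Yield (train, test) index lists for k-fold cross-validation."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    # Deal the shuffled indices into k folds of (nearly) equal size.
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [j for f in range(k) if f != i for j in folds[f]]
        yield train, test

# 1000 samples, as in the dataset described above.
splits = list(kfold_indices(1000, k=10))
train0, test0 = splits[0]
print(len(splits), len(train0), len(test0))
```

Each of the ten rounds trains on 900 samples and tests on the remaining 100, and every sample is used for testing exactly once.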

4 Simulation Measurements

An independent and identically distributed (IID) AWGN channel is considered, where the uplink channel at the base station is analyzed for pathloss and power delay profile at 28 and 39 GHz. As the channel operates in TDD mode, the uplink and downlink channels can be estimated simultaneously with the Hermitian matrix. The channel at the uplink is estimated in the frequency domain [31]:

$$Y(f) = H(f) \cdot x(f) + d(f) \tag{1}$$

where d(f) is the IID noise vector at the receiver with zero mean and unit variance, x(f) is the signal transmitted from the RRH antennas to the user equipment, and H(f) represents the channel frequency response. With channel reciprocity, the downlink channel is obtained with the Hermitian transpose of the uplink channel, which is given as

$$Y(f) \propto H(f)^{H} H(f) \cdot x(f) + d(f) \tag{2}$$

With the inverse fast Fourier transform, the channel frequency response is converted into the channel impulse response:

$$h(t) = \mathrm{IFFT}(H(f)) \tag{3}$$

The simulation parameters are listed in Table 1.

Table 1 Simulation parameters

| Entities | Remarks |
|---|---|
| Simulation tool | MATLAB 2019a |
| Frequency | 28, 39 GHz |
| No. of transmitting antennas | 128 |
| No. of receiving antennas | 4 |
| Channel | Indoor, urban |
| Environment | AWGN |
| Operating mode | TDD |

5 Pathloss

Pathloss is defined as the ratio of transmitted power to received power. The received signal power varies with distance: as the distance increases, the received signal power reduces, thus degrading the signal. In comparison with LoS, NLoS suffers greater degradation due to the presence of obstacles. Figure 2 shows the pathloss in the LoS environment at 28 and 39 GHz. Figure 3 shows the pathloss in the NLoS environment at the same operating frequencies. From the figures, it is clear that as the operating frequency increases, the pathloss also increases.

Fig. 2 Pathloss at LoS environment

Fig. 3 Pathloss at NLoS environment

The received power at the antenna is given by

$$P_R = \frac{P_T A_r}{4\pi D^{\beta}} \tag{4}$$

where $P_T$ is the transmitting power, $D$ is the distance between the transmitter and receiver, $\beta$ is the pathloss exponent, and $A_r$ is the aperture area of the receiver. The aperture area of the receiver is given by

$$A_r = G_r \frac{\alpha}{4\pi} \tag{5}$$

where $G_r$ is the receiver gain and

$$\alpha = \frac{c}{f} \tag{6}$$

is the wavelength, with $c$ the speed of light and $f$ the operating frequency. The gains of both the transmitter and receiver antennas are taken to be one, since isotropic antennas are assumed. The pathloss incurred between the transmitter and receiver with respect to distance is given by

$$P_L = P_T - P_R \tag{7}$$

$$P_L = 20 \log\left(\frac{4\pi f}{c}\right) \tag{8}$$
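The frequency dependence of the pathloss can be checked numerically. The sketch below assumes the standard free-space form with the distance included, $20\log_{10}(4\pi D f / c)$, which corresponds to a pathloss exponent of 2 and unit antenna gains; the 10 m distance and the function name `free_space_pathloss_db` are illustrative choices, not values from the paper.

```python
import math

C = 299_792_458.0  # speed of light in m/s

def free_space_pathloss_db(distance_m, freq_hz):
    """Free-space pathloss in dB for isotropic antennas (assumed standard form)."""
    return 20.0 * math.log10(4.0 * math.pi * distance_m * freq_hz / C)

# Hypothetical 10 m link at the two operating frequencies.
pl_28 = free_space_pathloss_db(10.0, 28e9)
pl_39 = free_space_pathloss_db(10.0, 39e9)
print(f"28 GHz: {pl_28:.2f} dB, 39 GHz: {pl_39:.2f} dB, gap: {pl_39 - pl_28:.2f} dB")
```

Under this assumption the 39 GHz link loses about 2.9 dB more than the 28 GHz link at any distance, since the gap depends only on the frequency ratio. This agrees with the trend described for Figs. 2 and 3.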

6 Power Delay Profile

The power delay profile is given by the average received signal power with respect to its time delay. Figure 4 represents the power delay profile, where the time-varying channels are examined. From the figure, it is clearly visible that the channel operating at 39 GHz experiences more distortion than the channel at 28 GHz, and that the distortion is higher in NLoS than in LoS conditions. Negative received signal power values tend to correspond to the NLoS condition. The power delay profile is given by

$$\mathrm{PDP}(t) = |h(t)|^{2} \tag{9}$$
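The chain from channel frequency response to power delay profile, that is, the inverse transform followed by Eq. (9), can be sketched as below. A direct inverse DFT is written out so that the example needs only the standard library; the two-path channel, its tap gains, and the function names are hypothetical.

```python
import cmath

def idft(H):
    """Inverse discrete Fourier transform (direct O(N^2) form)."""
    N = len(H)
    return [sum(H[k] * cmath.exp(2j * cmath.pi * k * n / N) for k in range(N)) / N
            for n in range(N)]

def power_delay_profile(H):
    """PDP(t) = |h(t)|^2, with h(t) obtained from H(f) by the inverse transform."""
    return [abs(h) ** 2 for h in idft(H)]

# Hypothetical two-path channel: a direct path at delay 0 (gain 1.0)
# and a weaker reflected path at delay 3 samples (gain 0.5).
N = 16
taps = {0: 1.0, 3: 0.5}
H = [sum(g * cmath.exp(-2j * cmath.pi * k * d / N) for d, g in taps.items())
     for k in range(N)]

pdp = power_delay_profile(H)
```

The resulting profile has power 1.0 at delay 0 and 0.25 at delay 3, with all other delay bins at zero up to rounding. In practice a library FFT routine would replace the direct sum.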


Fig. 4 Power delay profile

7 Fine-KNN

The fine K-nearest neighbor (KNN) algorithm is a nonparametric algorithm in which the entire training dataset is retained and used at classification time. It predicts the class of a test point from the Euclidean distances between that point and the training points. The distances are arranged in ascending order, the training points with the smallest distances are selected, and the class that occurs most frequently among them is assigned to the test point. This is how the LoS and NLoS conditions are classified. The distance between data points $A = [a_1, a_2, \ldots, a_n]$ and $B = [b_1, b_2, \ldots, b_n]$ is

$$d(A, B) = \sqrt{(b_1 - a_1)^2 + (b_2 - a_2)^2 + \cdots + (b_n - a_n)^2} \qquad (10)$$
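The procedure described above (compute distances, sort, vote) can be sketched as follows. This is an illustrative implementation rather than the authors' code; the text does not state the number of neighbors, so k = 1 is an assumption, and the toy feature values are made up.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance of Eq. (10)."""
    return math.sqrt(sum((bi - ai) ** 2 for ai, bi in zip(a, b)))


def knn_predict(train_x, train_y, query, k=1):
    """Label `query` by majority vote among its k nearest training points."""
    order = sorted(range(len(train_x)),
                   key=lambda i: euclidean(train_x[i], query))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]


# Toy data: (distance in m, pathloss in dB); label 1 = LoS, 0 = NLoS.
train_x = [(1.0, 62.0), (2.0, 68.0), (1.0, 75.0), (2.0, 83.0)]
train_y = [1, 1, 0, 0]
label = knn_predict(train_x, train_y, (1.5, 66.0), k=1)
```

Because no model is fitted, all the cost is paid at query time, which is acceptable for a dataset of about 1000 samples.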

The scatterplot represents the relationship between the variables in the datasets. Figures 5 and 6 show the classification of datapoints into LoS and NLoS for 28 and 39 GHz, respectively. The red data points represent the LoS condition, and the blue data points represent the NLoS condition. The crossed data points mark samples that were misclassified between LoS and NLoS. Both figures exhibit a positive correlation, in which the y-axis variable increases approximately linearly with the x-axis variable. The confusion matrix compares the true class of each observation with the predicted class and therefore reflects the accuracy of the machine learning algorithm. Figures 7 and 8 give the true positive, false positive, false negative, and true negative counts for pathloss at 28 and 39 GHz, based on how the observations and predictions agree for classes 1 and 0.


Fig. 5 Scatterplot of pathloss at 28 GHz

Fig. 6 Scatterplot of pathloss at 39 GHz

From the dataset of 1000 samples at 28 GHz, the true positive value is 530, the false positive value is 21, the false negative value is 16, and the true negative value is 433. The true positive rate is 0.96 for both class 0 and class 1, and the false negative rate is 0.04 for both classes. The positive predictive value of classes 1 and 0 is 0.95 and 0.97, and the false discovery rate of classes 1 and 0 is 0.05 and 0.03. For 39 GHz, the true positive value is 521, the false positive value is 29, the false negative value is 22, and the true negative value is 428. The true positive rate for classes 1 and 0 is 0.95, and the false negative rate for both classes is 0.05. The positive predictive value of classes 1 and 0 is 0.94 and 0.96, and the false discovery rate of classes 1 and 0 is 0.06 and 0.04.

Fig. 7 Confusion matrix of pathloss at 28 GHz

Fig. 8 Confusion matrix of pathloss at 39 GHz

Figures 9 and 11 show the receiver operating characteristic (ROC) of pathloss at 28 and 39 GHz for positive class 1. The performance of the classifier is determined with the ROC curve and is marked with a red point on the curve. The area under the curve indicates the accuracy of the classifier: as the area under the curve increases, so does the accuracy. Figure 9 shows the curve of true positive rate (TPR) against false positive rate (FPR), where the TPR is 0.96 and the FPR is 0.04. In Fig. 11, the TPR is 0.95 and the FPR is 0.05. Figures 10 and 12 show the ROC curve of pathloss at 28 and 39 GHz for positive class 0. In Fig. 10, the TPR is 0.96 and the FPR is 0.04, and in Fig. 12, the TPR is 0.95 and the FPR is 0.05.

Fig. 9 ROC of pathloss at 28 GHz

Fig. 10 ROC of pathloss at 39 GHz

Fig. 11 ROC of pathloss at 28 GHz

A scatterplot represents the relationship between the variables and their correlation. Figures 13 and 14 show the scatterplots of the PDP dataset for classes 1 and 0, i.e., the LoS and NLoS conditions. The red data points depict the LoS condition, and the blue data points depict the NLoS condition. Misclassifications, such as LoS classified as NLoS and vice versa, are marked with red and blue crosses. However, the scatterplots show no correlation between the variables in the datasets. The confusion matrices in Figs. 15 and 16 describe the prediction accuracy of the machine learning algorithm by comparing the observations with the predictions. They give the true positive, false positive, false negative, and true negative values for the PDP datasets at 28 and 39 GHz. From the dataset of 1000 samples at 28 GHz, the true positive value is 518, the false positive value is 32, the false negative value is 23, and the true negative value is 427. The true positive rate for classes 1 and 0 is 0.95 and 0.94, and the false negative rate is 0.05 and 0.06. The false discovery rate for classes 1 and 0 is 0.07 and 0.04, and the positive predictive value for classes 1 and 0 is 0.93 and 0.96. For 39 GHz, the true positive value is 515, the false positive value is 35, the false negative value is 32, and the true negative value is 418. The true positive rate for classes 1 and 0 is 0.93 and 0.94, and the false negative rate is 0.07 and 0.06. The positive predictive value is 0.93 and 0.96 for classes 1 and 0, whereas the false discovery rate is 0.07 and 0.04, respectively.

Figures 17 and 18 show the ROC of PDP (positive class 1) for 28 and 39 GHz, respectively. The performance of the classifier is analyzed with the ROC curves and is marked with red points in the graphs. The area under the curve indicates the accuracy of the algorithm: the larger the area, the higher the accuracy. The curve plots the true positive rate against the false positive rate; the TPR is 0.95 and the FPR is 0.06 at 28 GHz, and the TPR is 0.94 and the FPR is 0.05 at 39 GHz. Figures 19 and 20 show the ROC of PDP for 28 and 39 GHz (positive class 0). Here too, the red point and the area under the curve represent the prediction performance, and a larger area indicates higher accuracy. The TPR is 0.94 and the FPR is 0.05 for 28 GHz, and the TPR is 0.93 and the FPR is 0.07 for 39 GHz.
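The rates quoted in this section all derive from the four confusion-matrix counts. The sketch below uses the textbook definitions with class 1 as the positive class; the helper name is hypothetical. The per-class rates in the text may follow a different row and column convention, so only the overall accuracy is expected to match exactly.

```python
def binary_metrics(tp, fp, fn, tn):
    """Standard rates derived from the four confusion-matrix counts."""
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total,
        "tpr": tp / (tp + fn),   # true positive rate (recall)
        "fnr": fn / (tp + fn),   # false negative rate = 1 - TPR
        "ppv": tp / (tp + fp),   # positive predictive value (precision)
        "fdr": fp / (tp + fp),   # false discovery rate = 1 - PPV
    }


# Counts quoted in the text for the 1000-sample datasets.
pathloss_28 = binary_metrics(tp=530, fp=21, fn=16, tn=433)
pdp_28 = binary_metrics(tp=518, fp=32, fn=23, tn=427)
pdp_39 = binary_metrics(tp=515, fp=35, fn=32, tn=418)
```

The three accuracies obtained this way are 0.963, 0.945 and 0.933.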

8 Conclusion

In an indoor massive MIMO system operating at mm-wave frequencies, the signal experiences greater degradation as the operating frequency increases and as obstacles are present in the propagation environment. The channel conditions are analyzed for LoS and NLoS environments with an energy detector, and a dataset of 1000 samples is built. A machine learning algorithm, namely K-NN, is used to classify the LoS and NLoS conditions. Tenfold cross-validation is performed, in which the accuracy of the classification is evaluated with the training and testing data. Accuracies of about 96.3 and 94.3% are obtained for pathloss, and accuracies of 94.5 and 93.3% are obtained for the power delay profile, at 28 and 39 GHz, respectively.

Fig. 12 ROC of pathloss at 39 GHz

Fig. 13 Scatterplot of PDP at 28 GHz

Fig. 14 Scatterplot of PDP at 39 GHz

Fig. 15 Confusion matrix of PDP at 28 GHz

Fig. 16 Confusion matrix of PDP at 39 GHz

Fig. 17 ROC of PDP at 28 GHz

Fig. 18 ROC of PDP at 39 GHz

Fig. 19 ROC of PDP at 28 GHz

Fig. 20 ROC of PDP at 39 GHz



Flip Flop Neural Networks: Modelling Memory for Efficient Forecasting

S. Sujith Kumar, C. Vigneswaran, and V. Srinivasa Chakravarthy

Abstract Flip flop circuits can memorize information with the help of their bistable dynamics. Inspired by the flip flop circuits used in digital electronics, in this work we define a flip flop neuron and construct a neural network endowed with memory. Flip flop neural networks (FFNNs) function like recurrent neural networks (RNNs) and are therefore capable of processing temporal information. To validate the competency of FFNNs on sequential processing, we solved benchmark time series prediction and classification problems from different domains. Three datasets are used for time series prediction: (1) household power consumption, (2) flight passenger prediction and (3) stock price prediction. As an instance of time series classification, we select the indoor movement classification problem. The FFNN performance is compared with that of RNNs consisting of long short-term memory (LSTM) units. In all the problems, the FFNNs show either superior or near-equal performance compared to LSTM. Flip flops could potentially also be used for harder sequential problems, such as action recognition and video understanding.

Keywords Flip flops · LSTM · Memory

S. Sujith Kumar · V. Srinivasa Chakravarthy (B)
Department of Biotechnology, Bhupat and Jyoti Mehta School of Biosciences, Indian Institute of Technology, Madras, Chennai 600036, India

C. Vigneswaran
School of Computing, SASTRA Deemed University, Thanjavur, India

© Springer Nature Singapore Pte Ltd. 2021. E. S. Gopi (ed.), Machine Learning, Deep Learning and Computational Intelligence for Wireless Communication, Lecture Notes in Electrical Engineering 749, https://doi.org/10.1007/978-981-16-0289-4_13

1 Introduction

Efficient prediction and forecasting of time series data involve capturing patterns in the history of the data. Feed-forward networks process data in a single instance and therefore cannot solve time series prediction problems unless the data history is explicitly presented to the network through tapped delay lines or other techniques for representing temporal features. Alternatively, a neural network with loops can recognize patterns in the data, integrate information over multiple time steps, and process temporal data by virtue of its memory property [1–3]. Flip flops are basic electronic circuits with a memory property. Depending on their input conditions, they can hold on to information through time or simply allow it to pass through [4]. In this paper, we show that neuron models emulating electronic flip flops can be used to construct neural networks with excellent temporal processing properties. We demonstrate that such networks achieve high levels of performance in the prediction and classification of time series. This paper describes an implementation of flip flop neural networks for solving benchmark sequential problems and presents a brief comparison of the results with LSTM-based models.

2 Previous Work

Holla and Chakravarthy [5] described a deep neural network with a hidden layer of flip flop neurons. The network compared favourably with LSTM and other RNN models on long-delay decision-making problems. In this paper, we use a variation of the flip flop neural network described in [5] and apply it to the prediction and classification of time series data. The flip flop model is compared with the popular RNN variant, long short-term memory (LSTM), and the observations of the comparative study are described in Sect. 4.

2.1 Long Short-Term Memory (LSTM)

Long short-term memory (LSTM) is a popular and dominant variant of RNNs and one of the most widely employed memory-based units [1, 2]. LSTMs are widely used for sequential and time series problems. They can retain information that is highly discriminative for the final decision and can forget or discard information that contributes little to the performance of the model. The LSTM operates through a gating mechanism, which was introduced predominantly to overcome the catastrophic forgetting and vanishing gradient problems associated with long-term memory. The task of the gates is to purge information that would only act as noise to the model and to keep information that is crucial. This remembering and forgetting is learned by training the gating parameters over the set of input features. The input gate of the LSTM decides which new patterns of data are preserved in the long-term memory. It filters the combination of the current input and the short-term memory and transmits the result to the downstream structures. The forget gate decides which patterns from the long-term memory are preserved and which are discarded, by multiplying the long-term memory with forget vectors obtained from the current input. The output gate produces the short-term memory that the next LSTM cell uses as memory from the previous time step.
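The gating described above can be made concrete with a single-unit cell. The sketch follows the standard LSTM update equations with scalar input and state; the parameter names and values are hypothetical and untrained, and this is not the LSTM configuration used in the experiments.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(x, h_prev, c_prev, p):
    """One step of a single-unit LSTM cell with scalar input and state."""
    i = sigmoid(p["wi"] * x + p["ui"] * h_prev + p["bi"])    # input gate
    f = sigmoid(p["wf"] * x + p["uf"] * h_prev + p["bf"])    # forget gate
    o = sigmoid(p["wo"] * x + p["uo"] * h_prev + p["bo"])    # output gate
    g = math.tanh(p["wg"] * x + p["ug"] * h_prev + p["bg"])  # candidate memory
    c = f * c_prev + i * g     # long-term memory (cell state)
    h = o * math.tanh(c)       # short-term memory (hidden state)
    return h, c


# Hypothetical, untrained parameters chosen only to make the example run.
params = {k: 0.5 for k in ("wi", "ui", "bi", "wf", "uf", "bf",
                           "wo", "uo", "bo", "wg", "ug", "bg")}
h, c = 0.0, 0.0
for x in [0.1, 0.4, -0.2]:
    h, c = lstm_step(x, h, c, params)
```

Driving the forget-gate bias strongly negative erases the cell state, and driving it strongly positive preserves it, which is the remember-or-discard behaviour the text describes.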

3 Model Architecture

Flip flops are electrical latch circuits that can store information from previous time steps. They have two stable states: one that stores the information of the previous time steps and one that clears the state. The current state of a flip flop depends on its inputs and on its previous state. SR, JK, D, and T flip flops are the types widely used in digital electronics. We work with the SR flip flop in our simulation experiments because it is the simplest implementation of the bistable latch. The JK flip flop is a more general version of the SR flip flop that avoids the undefined state when both inputs are high. The SR flip flop is a bistable latch circuit with two competing inputs, S and R, which SET and RESET the latch, respectively. The output of the circuit at the current time step ($Q_t$) has a value of 0 or 1 depending on the states of the S and R inputs. The feedback mechanism is what models memory in this circuit: SET (S), RESET (R), and the output of the previous time step ($Q_{t-1}$) are the inputs to the flip flop at a given time step. Table 1 shows the truth table of the simple bistable SR flip flop. From Table 1, we can see that changes to the inputs (S and R) are crucial in determining the state at the current time step ($Q_t$). The equivalent algebraic equation of the SR flip flop, in which + denotes logical OR and $\bar{R}$ the complement of R, is

$$Q_t = S + \bar{R}\, Q_{t-1} \qquad (1)$$
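Equation (1) and the truth table can be checked with a few lines of Boolean code. This is an illustrative sketch of the digital latch only, not of the neural layer; returning `None` for the undefined S = R = 1 case is a choice made here.

```python
def sr_flip_flop(s, r, q_prev):
    """Next output of an SR latch: Q_t = S OR (NOT R AND Q_{t-1})."""
    if s == 1 and r == 1:
        return None            # undefined input combination in Table 1
    return s | ((1 - r) & q_prev)


q = 0
trace = []
# Set once, hold twice, reset, then hold again.
for s, r in [(1, 0), (0, 0), (0, 0), (0, 1), (0, 0)]:
    q = sr_flip_flop(s, r, q)
    trace.append(q)
```

The trace holds the value 1 until the reset arrives and then holds 0, which is the memory behaviour the flip flop neuron borrows.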

Since the current output depends on the last state, the SR flip flop has a memory property. The complete architecture of a flip flop-based neural network is given in Fig. 1. The network depicted in Fig. 1 has five layers, of which the third is the flip flop layer that plays the role of memory. The input to the flip flop layer from the previous layer is divided in half to obtain the set and reset inputs of the flip flops.

Table 1 SR flip flop's truth table realization

Set (S)   Reset (R)   Feedback (Q_{t-1})   Output (Q_t)
1         0           X                    1
0         1           X                    0
0         0           Q_{t-1}              Q_{t-1}
1         1           X                    Undefined

Fig. 1 Flip flop neural network consisting of five layers, with a flip flop layer of five flip flops

The output at the previous time step, $Q_{t-1}$, is fed back as input to the flip flops to obtain $Q_t$, the output at the next step. The final step is the propagation of the output $Q_t$ through a linear layer and on to the last output layer. The forward propagation of the layers consisting of conventional neurons is given by

$$Z = W \cdot X + b \qquad (2)$$

where W is the weight matrix of the layer, initialized via Xavier initialization, X is the input data and b is the bias term. The output obtained after applying the activation function is

$$A = \tanh(Z) \qquad (3)$$

Let $N_i$ denote the index of a node in the layer preceding the flip flops. The set input ($X_S$) and the reset input ($X_R$) are obtained by

$$X_S = X[N_i \bmod 2 = 0] \quad \text{(even neurons)} \qquad (4)$$

$$X_R = X[N_i \bmod 2 = 1] \quad \text{(odd neurons)} \qquad (5)$$

where X[k] is the output of the kth neuron (zero indexed) of the previous layer and mod is the modulus function that returns the remainder. The weights projecting from the previous layer to the flip flops are modelled as one-to-one connections, so the previous layer has twice as many neurons as there are flip flops:

$$W_S = W[N_i \bmod 2 = 0] \qquad (6)$$

$$W_R = W[N_i \bmod 2 = 1] \qquad (7)$$

Thus, the weighted inputs S and R are given by

$$S = X_S \cdot W_S \qquad (8)$$

$$R = X_R \cdot W_R \qquad (9)$$
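The even and odd split of Eqs. (4) to (9) can be sketched with list slicing. Because the connections are described as one-to-one, the products in Eqs. (8) and (9) are read here as elementwise, which is an assumption; the function name and input values are hypothetical.

```python
def split_set_reset(x, w):
    """Split the previous layer's outputs and one-to-one weights into
    set (even index) and reset (odd index) parts, then weight them."""
    x_s, x_r = x[0::2], x[1::2]                  # Eqs. (4) and (5)
    w_s, w_r = w[0::2], w[1::2]                  # Eqs. (6) and (7)
    s = [xi * wi for xi, wi in zip(x_s, w_s)]    # Eq. (8), elementwise
    r = [xi * wi for xi, wi in zip(x_r, w_r)]    # Eq. (9), elementwise
    return s, r


# Hypothetical outputs of a 10-neuron layer feeding 5 flip flops.
x = [0.9, 0.1, 0.2, 0.8, 0.5, 0.5, 0.0, 1.0, 1.0, 0.0]
w = [1.0] * 10
s, r = split_set_reset(x, w)
```

With ten preceding neurons the split yields five set inputs and five reset inputs, one pair per flip flop.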

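The final state V(t + 1) of the flip flop layer at time step t + 1 is given by

$$V(t+1) = S + (1 - R)\,V(t) - S\,(1 - R)\,V(t) \qquad (10)$$

where V(t) is the previous state of the flip flop layer. The backpropagation through the flip flop layer is defined as

$$\frac{\partial E}{\partial w_S} = \frac{\partial E}{\partial O_{FF}} \cdot \frac{\partial O_{FF}}{\partial S} \cdot \frac{\partial S}{\partial w_S} \qquad (11)$$

$$\frac{\partial E}{\partial w_R} = \frac{\partial E}{\partial O_{FF}} \cdot \frac{\partial O_{FF}}{\partial R} \cdot \frac{\partial R}{\partial w_R} \qquad (12)$$

where $O_{FF} = V(t+1)$ is the output of the flip flop layer. The partial derivatives are given by

$$\frac{\partial O_{FF}}{\partial S} = 1 - (1 - R)\,V(t) \qquad (13)$$

$$\frac{\partial O_{FF}}{\partial R} = -V(t)\,(1 - S) \qquad (14)$$

$$\frac{\partial S}{\partial w_S} = X_S \qquad (15)$$

$$\frac{\partial R}{\partial w_R} = X_R \qquad (16)$$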
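The state update of Eq. (10) and its derivatives with respect to S and R can be verified numerically. The sketch differentiates Eq. (10) directly and compares the result with central finite differences; the function names and the test point are arbitrary choices made for illustration.

```python
def flip_flop_state(s, r, v):
    """Eq. (10): V(t+1) = S + (1 - R) V(t) - S (1 - R) V(t)."""
    return s + (1 - r) * v - s * (1 - r) * v


def analytic_grads(s, r, v):
    """Partial derivatives of Eq. (10) with respect to S and R."""
    d_s = 1 - (1 - r) * v
    d_r = -v * (1 - s)
    return d_s, d_r


def numeric_grads(s, r, v, eps=1e-6):
    """Central finite differences, used only to check the formulas."""
    d_s = (flip_flop_state(s + eps, r, v)
           - flip_flop_state(s - eps, r, v)) / (2 * eps)
    d_r = (flip_flop_state(s, r + eps, v)
           - flip_flop_state(s, r - eps, v)) / (2 * eps)
    return d_s, d_r


s, r, v = 0.3, 0.6, 0.8          # arbitrary test point
a_s, a_r = analytic_grads(s, r, v)
n_s, n_r = numeric_grads(s, r, v)
```

At binary inputs the update reproduces the truth table of the SR flip flop, so Eq. (10) can be read as a differentiable relaxation of Eq. (1).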

4 Experiments

The FFNN architecture described above is applied to three time series prediction problems and one time series classification problem. The prediction problems considered are:

1. Household power consumption.
2. Flight passenger prediction.
3. Stock price prediction.

The time series classification problem considered is indoor movement classification. Different architectures are used for the different problems, depending on the complexity of the data. The changes are made mainly to the input layer, to match the number of input features. Since the experiments involve prediction and binary classification, the final output layer has a single node. The same time series problems were also tackled using LSTM networks so that a clear comparison with a benchmark could be obtained. The predictions of the flip flop model, along with those obtained through LSTMs, are given below. The results were obtained empirically by narrowing down to the set of hyperparameters that yielded the best results for both the flip flop network and the LSTM network.

4.1 Household Power Consumption

For the household power consumption problem, the dataset was obtained from Kaggle [6] and contains measurements gathered between December 2006 and November 2010, a span of nearly four years. It comprises seven features: global active power, submetering 1, submetering 2, submetering 3, voltage, global intensity and global reactive power. The problem is framed so that the model must predict the global active power of future months after being trained on the historic data of these seven features. The flip flop model comprises three hidden layers, as in Fig. 1, with dimensions 10, 5 and 10. The input and output layers have 7 and 1 neurons, respectively. The LSTM model, on the other hand, comprises a hidden layer of size 30. Both models use a window size of 60 during training to represent the history of the data. The Adam optimizer is used to optimize the model parameters, and the mean squared error (MSE) is used as the training loss for both models. Figure 2 shows the predictions obtained with the FFNN and the LSTM network. Table 2 presents the MSE of the models on the test dataset. Both Fig. 2 and Table 2 show that the flip flop network predicts the power consumption pattern more accurately than the LSTM. The MSE is defined as

$$\text{MSE} = \frac{1}{N} \sum_{i=1}^{N} (X_i - Y_i)^2$$

where N is the total number of inputs, $X_i$ is the ground truth label for the ith input, and $Y_i$ is the output predicted by the network.
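As a worked instance of the MSE definition, the sketch below evaluates it on three made-up, normalized values. The numbers are illustrative and are not drawn from the paper's datasets.

```python
def mean_squared_error(truth, predicted):
    """MSE = (1/N) * sum((X_i - Y_i)^2)."""
    if len(truth) != len(predicted):
        raise ValueError("inputs must have the same length")
    return sum((x - y) ** 2 for x, y in zip(truth, predicted)) / len(truth)


# Made-up normalized values, not taken from the paper's datasets.
mse = mean_squared_error([0.2, 0.4, 0.6], [0.1, 0.4, 0.8])
```

Here the squared errors are 0.01, 0 and 0.04, so the MSE is 0.05 / 3, roughly 0.0167.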

Fig. 2 Predictions by the flip flop network and LSTM on the power consumption test dataset

Table 2 MSE on test data by trained flip flop network and LSTM

Model               Training epochs   Mean squared error
Flip flop network   100               0.00073 (7.3e−4)
LSTM                100               0.00133 (1.3e−3)
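The window of 60 past time steps used in Sect. 4.1 implies that the series is cut into overlapping input and target pairs. The paper does not describe how the windows are built, so the following is one common construction, assumed here, with a hypothetical helper name and a placeholder series.

```python
def make_windows(series, window=60):
    """Pair each run of `window` consecutive values with the value that follows."""
    inputs, targets = [], []
    for start in range(len(series) - window):
        inputs.append(series[start:start + window])
        targets.append(series[start + window])
    return inputs, targets


series = [float(t) for t in range(100)]   # placeholder univariate series
inputs, targets = make_windows(series, window=60)
```

A series of length 100 yields 40 training pairs with this scheme; for multivariate data each element of the series would be a feature vector instead of a single number.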

4.2 Flight Passenger Prediction

The international airline passenger dataset, obtained through Kaggle [7], contains the number of passengers who travelled internationally every month. The task is to predict this univariate time series, i.e., the number of passengers that will travel in the subsequent months. The architecture of the flip flop network is the same as that used for the power consumption dataset, except that the input layer has a single neuron; the LSTM consists of 20 hidden units. The loss function and optimizer are the same as in the previous experiment. From Fig. 3 and Table 3, it can be concluded that the flip flop network outperforms the LSTM and is more effective in capturing the relevant temporal information from the history to predict accurately.

Fig. 3 Predictions by the flip flop network and LSTM on the test data of the flight passenger dataset

Table 3 MSE on test data by trained flip flop network and LSTM

Model               Training epochs   Mean squared error
Flip flop network   100               0.0010367 (1.0e−3)
LSTM                100               0.0023718 (2.3e−3)

4.3 Stock Price Prediction

For the stock price prediction experiment, we use the Apple stock price dataset, which covers Apple's stock from January 2010 to February 2020 [8]. This is a multivariate time series prediction problem with four features: open, high, low and close. The task of the model is to predict the open prices of the stock on future days (the test set) after training on the past data. The FFNN used for this dataset has input and output layers of dimensions 4 and 1, respectively, and hidden layers with the same number of neurons as in the previous two experiments, whereas the LSTM has 30 hidden units. A window size of 60 days is used to capture the history. Figure 4 shows the predictions made by the flip flop network and the LSTM for the subsequent 1200 days of test data. Table 4 presents the MSE of both models on the test data. Although the FFNN predicts the correct pattern of the stock's opening price, its predictions here are less accurate and noisier than those of the LSTM.

Fig. 4 Predictions by the flip flop model and LSTM on the Apple stock test dataset

Table 4 MSE loss on test data by trained flip flop network and LSTM

Model               Training epochs   Mean squared error
Flip flop network   100               0.0048732 (4.8e−3)
LSTM                100               0.0033890 (3.3e−3)

4.4 Indoor Movement Classification

The 'Indoor user movement' dataset is a benchmark dataset for time series classification, retrieved from the UCI repository [9]. The dataset was collected by placing four wireless sensors in an environment with a moving subject; as the subject moved, the sensors recorded a series of signal strengths over time. Based on the recorded signal strengths, the movement is classified into two classes, −1 and +1, which represent no transition and a transition between rooms, respectively. The FFNN has 4 neurons in the input layer and 1 neuron in the output layer, with the same hidden layers as in the previous experiments. The LSTM has 30 hidden units, and binary cross-entropy (BCE) is used as the loss during training. A window size of 70 is used to look back at the data of previous time steps during the training and validation phases. Figure 5 shows the validation accuracy at every 10 epochs on the validation dataset, which is obtained by shuffling the original dataset and splitting it in a 70:30 ratio. At the end of training, the FFNN reached an accuracy of 91.01, whereas the LSTM reached a lower accuracy of 88.06.

Fig. 5 Performance of the flip flop network and LSTM in terms of validation accuracy
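Binary cross-entropy expects targets of 0 and 1, whereas the dataset labels are −1 and +1. The sketch below shows one way to bridge the two; the label mapping is an assumed preprocessing step that the paper does not describe, and the predicted probabilities are made up.

```python
import math


def to_binary_target(label):
    """Map the dataset's -1/+1 labels to 0/1 targets (assumed preprocessing)."""
    return (label + 1) // 2


def binary_cross_entropy(targets, probs, eps=1e-12):
    """Mean BCE between 0/1 targets and predicted probabilities."""
    total = 0.0
    for t, p in zip(targets, probs):
        p = min(max(p, eps), 1.0 - eps)     # clamp to avoid log(0)
        total += -(t * math.log(p) + (1 - t) * math.log(1.0 - p))
    return total / len(targets)


labels = [-1, 1, 1, -1]
targets = [to_binary_target(y) for y in labels]
loss = binary_cross_entropy(targets, [0.1, 0.9, 0.8, 0.3])
```

Confident correct predictions contribute little to the loss, while the less certain ones (0.8 and 0.3 here) dominate it.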


5 Conclusion

Flip flops modelled as neural networks prove to be an effective way of retaining previous patterns in memory and using them to predict future time steps. Experiments with FFNNs on time series prediction and classification show that flip flop models give performance comparable, if not superior, to that of LSTMs, which are the current state-of-the-art models for solving temporal problems. The application of flip flops can also be extended to more complex domains such as scene analysis and video analysis and understanding.

References

1. Sherstinsky A (2020) Fundamentals of recurrent neural network (RNN) and long short-term memory (LSTM) network. Phys D Nonlinear Phenomena 404:132306
2. Santhanam S (2020) Context based text-generation using LSTM networks. arXiv preprint arXiv:2005.00048
3. Wu W et al (2019) Using gated recurrent unit network to forecast short-term load considering impact of electricity price. Energy Procedia 158:3369–3374
4. Chakrabarty R et al (2018) A novel design of flip-flop circuits using quantum dot cellular automata (QCA). In: 2018 IEEE 8th annual computing and communication workshop and conference (CCWC). IEEE
5. Pawan Holla F, Chakravarthy S (2016) Decision making with long delays using networks of flip-flop neurons. In: 2016 International joint conference on neural networks (IJCNN), pp 2767–2773
6. UCI Machine Learning (2016) Household electric power consumption, Version 1, Aug 2016. Retrieved from www.kaggle.com/uciml/electric-power-consumption-data-set/metadata
7. Andreazzini D (2017) International airline passengers, Version 1, June 2017. Retrieved from www.kaggle.com/andreazzini/international-airline-passengers/metadata
8. Nandakumar R, Uttamraj KR, Vishal R, Lokeshwari YV (2018) Stock price prediction using long short-term memory. Int Res J Eng Technol (IRJET) 3362–338
9. Bacciu D, Barsocchi P, Chessa S et al (2014) An experimental characterization of reservoir computing in ambient assisted living applications. Neural Comput Appl 24:1451–1464. https://doi.org/10.1007/s00521-013-1364-4

Wireless Communication Systems

Selection Relay-Based RF-VLC Underwater Communication System

Mohammad Furqan Ali, Tharindu D. Ponnimbaduge Perera, Vladislav S. Sergeevich, Sheikh Arbid Irfan, Unzhakova Ekaterina Viktorovna, Weijia Zhang, Ândrei Camponogara, and Dushantha Nalin K. Jayakody

Abstract Visible light communication (VLC) has recently attracted renewed interest for communication in underwater environments. However, deploying underwater applications and collecting oceanographic data is more challenging than terrestrial communication, so a more sophisticated communication system needs to be deployed in the harsh aqueous medium. The collected data are then transmitted to an inland base station for further analysis. Real-time data streaming for military and scientific purposes calls for dual-hop hybrid cooperative communication. In this research, a dual-hop hybrid RF and underwater visible light communication (UVLC) relayed system is developed under strong turbulence channel conditions along with misalignment of the transceivers. The RF link is modeled by the Nakagami-m fading distribution, while the UVLC link is modeled by the Gamma-Gamma distribution for strong turbulence channel conditions. Furthermore, amplify-and-forward (AF) and decode-and-forward (DF) protocols are considered to assist information transmission to the underwater destination. The simulation results are used to analyze the RF-UVLC link combination and the bit error rate (BER) performance of both the AF and DF protocols in different water types, taking into account large- and small-scale factors in highly turbid water and the pointing errors of the propagated light beam. A Monte Carlo approach is used to obtain best-fitting curves for the simulation results.

Keywords Cooperative communication · Hybrid underwater visible light communication (HUVLC) · Underwater wireless communication (UWC) · Visible light communication (VLC)

M. Furqan Ali (B) · T. D. Ponnimbaduge Perera · V. S. Sergeevich · S. Arbid Irfan · U. Ekaterina Viktorovna · W. Zhang · D. N. K. Jayakody
School of Computer Science and Robotics, National Research Tomsk Polytechnic University, Tomsk, Russia

Â. Camponogara
Federal University of Juiz de Fora, Juiz de Fora, Brazil

D. N. K. Jayakody
School of Postgraduate Studies, Sri Lanka Technological Campus, Padukka, Sri Lanka

© Springer Nature Singapore Pte Ltd. 2021. E. S. Gopi (ed.), Machine Learning, Deep Learning and Computational Intelligence for Wireless Communication, Lecture Notes in Electrical Engineering 749, https://doi.org/10.1007/978-981-16-0289-4_14

1 Introduction

Underwater wireless communication (UWC) has become a promising technology for ocean observation. Advances in underwater wireless signaling have attracted growing interest in exploring unknown undersea regions and phenomena, and have increased human interest in the underwater environment. UWC technology also enables many potential applications. Numerous underwater applications are reported in the literature, including observing and monitoring marine life, water pollution control, early warning of tsunamis and earthquakes, natural resources and many other natural hazards [1]. Tsunamis and earthquakes are sudden, unexpected events that are practically impossible to control. However, the underlying disturbances in the water can be monitored and detected early by deploying detection techniques such as UWC [2]. Moreover, commercial and military applications require a sophisticated hybrid UWC methodology with exceptionally secure data transfer. At present, a large number of underwater wireless applications are based on acoustic signaling. However, acoustic waves are poorly suited to closing this communication gap: they propagate at a very low speed of approximately 1500 m/s, which leads to long signal delay, low data rates of a few kbps and low bandwidth [3]. Therefore, deploying high-speed, real-time monitoring applications over an acoustic link is challenging. In addition, ocean water has higher density, permittivity and electrical conductivity than the terrestrial channel, and these factors affect signal propagation [4].
On the other hand, electromagnetic waves propagate only over very short distances underwater and attenuate easily due to the intrinsic (physio-chemical) properties of water [5]. Thus, in a cooperative hybrid communication scenario, underwater visible light communication (UVLC) is the wireless candidate that can fulfill the desired communication requirements when combined with electromagnetic waves in the RF range. Indeed, compared with traditional acoustic communication, VLC offers higher bandwidth, lower delay and latency, higher data rates and better security, especially for real-time video streaming and underwater mapping for geographical data collection [6]. Furthermore, the VLC link is superior to existing traditional underwater wireless candidates because a VLC setup is easier to install and very cost effective for deploying various underwater applications over short distances [7]. VLC has attracted research interest as a communication technology and opens the door to future signal transmission over long distances. Hence, VLC has drawn the attention of many researchers worldwide, mainly due to its potential for next-generation communication systems (i.e., 5G and 6G). Additionally, VLC based on light emitting diodes (LEDs) plays a major role in today's broadband wireless communication. In terrestrial communication, VLC has attracted research interest because of the high data rates it offers to various deployable applications. It addresses the increasing demand for high data traffic and is an alternative communication medium for indoor applications, where it performs particularly well using LED lamps. LEDs also have many advantages: low electrical power consumption, small size, long lifetime, cost effectiveness and very low heat radiation [8]. Additionally, current VLC technology offers further merits such as the absence of electromagnetic interference and radiation and highly secure, high-rate data transmission with low delay. VLC can also be complemented with an FSO link to enable high data rates in multimedia services, improving the system quality of service (QoS) and performance [9]. RF communication is used as a wireless signal carrier over long terrestrial distances. Dual-hop hybrid communication links, in turn, are investigated to improve system quality over long ranges under different channel conditions and requirements.
A combined RF-underwater optical communication (UWOC) system has been proposed in the literature [10–12]. Although turbulence and pointing error phenomena have been widely addressed for such systems, few works consider them for UVLC. In [13], the authors investigated a vertical UVLC system model using the Gamma-Gamma probability distribution, assumed strong channel turbulence and derived a closed-form expression for the BER at the undersea-based destination. Similarly, in [14], the authors designed a UVLC system over the log-normal distribution and derived a closed-form expression for the asymptotic BER. Another study proposed a multi-input multi-output (MIMO)-based UVLC channel model and analyzed the diversity gain of the system in the presence of turbulence in the aqueous medium [15]. In this work, we investigate a combined hybrid link with two different communication hops for information transmission under different channel conditions, with a focus on the underwater VLC link. Underwater channels make the deployment of a communication setup and the choice of modulation technique highly complex. Calculating the BER with on-off keying (OOK) modulation keeps the system performance analysis simpler than with higher-order modulation techniques. Motivated by these works, we investigate the suitable relay for RF-VLC hybrid dual-hop communication under strong channel conditions along with misalignment of the transceivers. Additionally, we compare the BER performance of the RF-UVLC link, with the RF link subject to Nakagami-m fading and the VLC link impaired by strong turbulence, under amplify-and-forward (AF) and decode-and-forward (DF) relay protocols in different types of water. To the best of our knowledge, only a few studies in the literature have investigated VLC signaling in different water mediums. In this regard, the main contribution of this work is to propose a cooperative hybrid relay RF-UVLC communication system model for highly turbid water channel conditions with different relay protocols.

1.1 Paper Structure

The remainder of this study is organized as follows. In Sect. 2, we propose a model of a dual-hop hybrid communication system considering different channel impairments. The overall BER performance calculation based on the OOK modulation scheme is summarized in Sect. 3. Numerical results for the proposed system model are presented and discussed in Sect. 4. Finally, Sect. 5 states some concluding remarks.

2 Proposed System Model

The proposed system model is a hybrid RF-UVLC link in which a single-antenna source node s broadcasts a signal and communicates with an underwater-based destination node d through a relay node r equipped with two antennas. We first consider signal transmission through an AF relay (see Fig. 1) and evaluate the system performance. We then use a DF relay to assist the transmission of information and compare the two to identify the suitable relay protocol for the hybrid RF-UVLC system in different waters. The RF link is modeled by Nakagami-m fading, while the VLC link is modeled by Gamma-Gamma and exponential Gaussian distributed random variables. The relay has two directional antennas: one points toward the source and receives the signal via the RF link, and the other points toward the destination and transmits the information to the underwater-based destination through the VLC link. The AF relay receives the signal from s, amplifies it with a fixed amplification gain and forwards it to the undersea destination d, whereas the DF relay receives the signal from s, regenerates it and then forwards it to d. The whole system concept is depicted in Fig. 1. Note that s and d are each equipped with a single antenna, and the whole system works in half-duplex mode.

Fig. 1 Proposed system model of dual-hop hybrid cooperative RF-VLC underwater wireless communication, where the source communicates with the destination through a relay in different communication links along with different channel conditions

2.1 Source-Relay (s − r) Hop

Over the terrestrial RF link, the source s broadcasts signals with an average electrical signal power $E_{sr}$. The signal received at the relay, $y_{sr}$, can thus be written as

$$y_{sr} = \sqrt{E_{sr}}\, h_{sr}\, x + n_{sr}, \tag{1}$$

where $x$ denotes the transmitted information signal and $n_{sr}$ is additive white Gaussian noise (AWGN) with zero mean and variance $\sigma_{sr}^2$. Moreover, $h_{sr}$ is the channel coefficient between source and relay, which follows the Nakagami-m distribution. Throughout, the subscripts $w, z \in \{s, r, d\}$ identify the nodes of a link; for example, the distance between nodes $w$ and $z$ is denoted by $d_{wz}$. In addition, the system assumes the intensity modulation and direct-detection (IM/DD) technique with the OOK modulation format. Thus, the signal-to-noise ratio (SNR) at the relay is given by

$$\gamma_{sr} = \frac{E_{sr} |h_{sr}|^2}{\sigma_{sr}^2}. \tag{2}$$

The instantaneous SNR of the s − r link can be written as $\gamma_{sr} = \bar{\gamma}_{sr} |h_{sr}|^2$, where the average SNR $\bar{\gamma}_{sr}$ is

$$\bar{\gamma}_{sr} = \frac{E_{sr}}{\sigma_{sr}^2}. \tag{3}$$

As mentioned above, the RF link is modeled by Nakagami-m flat fading. Thus, $\gamma_{sr}$ is Gamma distributed, with probability density function (PDF) [16]

$$f_{sr}(\gamma) = \frac{(m/\bar{\gamma}_{sr})^{m}\, \gamma^{m-1}}{\Gamma(m)} \exp\!\left(-\frac{m}{\bar{\gamma}_{sr}}\,\gamma\right), \tag{4}$$

where $\Gamma(\cdot)$ represents the Gamma function, $m/\bar{\gamma}_{sr}$ is the ratio of the Nakagami-m fading factor to the average SNR, and $m \geq \frac{1}{2}$.
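As a minimal sketch of (2)-(4), the Python snippet below draws instantaneous s − r SNR samples under Nakagami-m fading and evaluates the Gamma PDF of (4). It assumes a unit-power channel, so that the mean of the samples equals the average SNR; the function names and the values m = 2 and an average SNR of 10 in the usage lines are illustrative choices, not values from the paper.

```python
import math
import numpy as np


def snr_pdf_sr(gamma, m, avg_snr):
    """Gamma PDF of the instantaneous s-r SNR, following (4)."""
    ratio = m / avg_snr  # Nakagami-m fading factor divided by the average SNR
    return ratio**m * gamma**(m - 1) * math.exp(-ratio * gamma) / math.gamma(m)


def sample_snr_sr(m, avg_snr, size, rng):
    """Draw SNR samples; |h_sr|^2 is Gamma(m, 1/m) under Nakagami-m fading."""
    # Assumption: unit average channel power, so E[gamma_sr] = avg_snr.
    return rng.gamma(shape=m, scale=avg_snr / m, size=size)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    samples = sample_snr_sr(m=2.0, avg_snr=10.0, size=100_000, rng=rng)
    print("sample mean SNR:", samples.mean())
    print("PDF at gamma = 10:", snr_pdf_sr(10.0, m=2.0, avg_snr=10.0))
```

For m = 1 the PDF reduces to the exponential density of Rayleigh fading, which is a convenient sanity check.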

2.2 Relay-Destination (r − d) Hop

To evaluate strong turbulence-induced fading, we adopt the Gamma-Gamma distribution proposed in [17] for the link between relay and destination (r − d). The VLC channel coefficient $h_{rd}$ combines the path loss $h_l$, the water turbidity $h_t$ and the pointing error $h_p$, so that $h_{rd} = h_l h_t h_p$. Note that $h_l$ is deterministic, while $h_t$ and $h_p$ are random variables following the Gamma-Gamma and exponential Gaussian distributions, respectively [18]. Furthermore, the RF signal received at the relay is converted into an optical signal using the sub-carrier intensity modulation (SIM) scheme and then transmitted to the undersea destination [19]. The VLC signal received at the undersea destination, $y_{rd}$, can be expressed as

$$y_{rd} = \sqrt{E_{rd}}\, \eta\, r_s\, h_{rd}\, \bar{x} + n_{rd}, \tag{5}$$

in which $E_{rd}$ is the average electrical signal power, $\eta$ is the electrical-to-optical conversion efficiency, $r_s$ is the photo-detector responsivity, $\bar{x}$ is the information from the source regenerated by the relay, and $n_{rd}$ is additive white Gaussian noise with zero mean and variance $\sigma_{rd}^2$. When the AF protocol is used in the dual-hop hybrid RF-UVLC system model, the signal received at the relay is amplified and then forwarded to the undersea destination. Consequently, the signal received at the undersea destination is given by

$$y_{rd} = \sqrt{E_{rd}}\, \eta\, r_s\, \rho\, y_{sr}\, h_{rd} + n_{rd}, \tag{6}$$

where $\rho = \sqrt{\dfrac{E_{rd}}{E_{sr} |h_{sr}|^2 + N_0}}$ denotes the amplification factor. Substituting $y_{sr}$ from (1), the received signal can be written as

$$y_{rd} = \sqrt{E_{rd}}\, \eta\, r_s\, \rho \left(\sqrt{E_{sr}}\, h_{sr}\, x + n_{sr}\right) h_{rd} + n_{rd} \tag{7}$$

$$\phantom{y_{rd}} = \underbrace{\sqrt{E_{sr} E_{rd}}\, \eta\, r_s\, \rho\, h_{sr} h_{rd}\, x}_{P_s} + \underbrace{\sqrt{E_{rd}}\, \eta\, r_s\, \rho\, h_{rd}\, n_{sr} + n_{rd}}_{P_n}, \tag{8}$$

in which $P_s$ and $P_n$ denote the received signal power and additive noise power, respectively. Thus, the SNR at the destination for the r − d link can be written as

$$\gamma_{rd} = \frac{E_{sr}\, d_{sr}^{-t}\, E_{rd}^{2}\, \eta^2 r_s^2\, |h_{sr}|^2 |h_{rd}|^2}{E_{rd}^{2}\, \eta^2 r_s^2\, |h_{rd}|^2 \sigma_{sr}^2 + E_{sr}\, d_{sr}^{-t}\, |h_{sr}|^2 \sigma_{rd}^2 + \sigma_{sr}^2 \sigma_{rd}^2}. \tag{9}$$

2.3 Underwater Attenuation Coefficient Model

The optical link degrades due to the physio-chemical properties of the water channel and colored dissolved organic matter (CDOM). Additionally, suspended small-scale and large-scale factors are also responsible for optical signal fading. The VLC signal is directly affected by absorption and scattering in the underwater environment. In the investigated system, the path loss of the r − d link is modeled using the extinction coefficient $c(\lambda)$, which is the sum of the absorption coefficient $a(\lambda)$ and the scattering coefficient $b(\lambda)$. The numerical values of $a(\lambda)$, $b(\lambda)$ and $c(\lambda)$ used for the simulations in different waters are listed in Table 1. The extinction coefficient, which varies with the type of water, is described as

$$c(\lambda) = a(\lambda) + b(\lambda). \tag{10}$$

Table 1 Expected experimental values of the absorption, scattering and extinction coefficients in different water mediums [20]

| Water type for UWC | a(λ) (10⁻³ m⁻¹) | b(λ) (10⁻³ m⁻¹) | c(λ) (10⁻³ m⁻¹) |
|---|---|---|---|
| Pure seawater | 53 | 3 | 56 |
| Clear ocean water | 69 | 80 | 150 |
| Coastal ocean water | 88 | 216 | 305 |


If the r and d nodes are separated by a vertical distance $d_t$ and the Beer-Lambert law is adopted, the path loss of the UVLC link is given by [21]

$$h_l = \exp\left(-c(\lambda)\, d_t\right). \tag{11}$$
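The path loss in (10) and (11) can be evaluated directly from the values in Table 1. The sketch below is illustrative only: it assumes the tabulated coefficients are expressed in units of 10⁻³ m⁻¹ and uses the 15 m relay-destination distance listed in Table 2 as an example; the dictionary and function names are hypothetical.

```python
import math

# (a, b): absorption and scattering coefficients from Table 1, converted to 1/m.
# Assumption: the table entries are given in units of 1e-3 per metre.
WATER_TYPES = {
    "pure seawater": (0.053, 0.003),
    "clear ocean water": (0.069, 0.080),
    "coastal ocean water": (0.088, 0.216),
}


def extinction_coefficient(a, b):
    """Extinction coefficient c = a + b, following (10)."""
    return a + b


def path_loss(c, d_t):
    """Beer-Lambert path loss h_l = exp(-c * d_t), following (11)."""
    return math.exp(-c * d_t)


if __name__ == "__main__":
    # Table 1 lists rounded c values, so a + b may differ in the last digit.
    for name, (a, b) in WATER_TYPES.items():
        c = extinction_coefficient(a, b)
        print(f"{name}: c = {c:.3f} 1/m, h_l = {path_loss(c, 15.0):.4f}")
```

Because the loss is exponential in both c and d_t, the more turbid coastal water attenuates the beam far more strongly than pure seawater over the same distance.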

The VLC link requires proper beam alignment. In addition, the receiver must lie within the field of view (FOV) for proper signal transmission. The modified channel attenuation coefficient is described in terms of path loss and geometrical losses. The geometrical losses depend on the physical constraints of the setup, i.e., the aperture diameter, the full-width transmitter beam divergence angle and a correction coefficient. If the signal is transmitted through a collimated light source, such as a laser diode, the geometrical losses are negligible and the signal depends only on the path loss. The geometrical losses must, however, be taken into account for diffused and semi-collimated sources, i.e., LEDs and diffused LDs [22]. Thus, the overall attenuation of the optical link in terms of path loss and geometrical losses is described as [23]

$$h_l \approx h_{pl} + h_{gl} \approx \left(\frac{D_r}{\theta_F}\right)^{2} d_t^{-2} \exp\!\left(-c^{\tau}\, d_t^{1-\tau} \left(\frac{D_r}{\theta_F}\right)^{\tau}\right), \tag{12}$$

where $D_r$, $\theta_F$ and $\tau$ denote the receiver aperture diameter, the full-width transmitter beam divergence angle and the correction coefficient, respectively, while $h_{pl}$ and $h_{gl}$ are the path loss and geometrical losses, respectively.

2.4 Water Turbidity Channel Modeling

Since the proposed system model focuses on the underwater VLC link, complex channel conditions for the RF link are beyond the scope of this research, and the s − r link is simply modeled by Nakagami-m fading. The VLC link, in contrast, is modeled under heavy turbulence channel conditions combined with pointing errors. According to [24], the VLC link under strong channel conditions follows the Gamma-Gamma probability distribution and can be expressed as

$$f_{h_t}(h_t) = \frac{2\,(\alpha_{rd}\beta_{rd})^{\frac{\alpha_{rd}+\beta_{rd}}{2}}}{\Gamma(\alpha_{rd})\,\Gamma(\beta_{rd})}\; h_t^{\frac{\alpha_{rd}+\beta_{rd}}{2}-1}\; K_{\alpha_{rd}-\beta_{rd}}\!\left(2\sqrt{\alpha_{rd}\beta_{rd}h_t}\right), \tag{13}$$

where $\Gamma(\cdot)$ is the Gamma function and $K_{\alpha_{rd}-\beta_{rd}}(\cdot)$ denotes the modified Bessel function of the second kind. The large-scale parameter $\alpha_{rd}$ and the small-scale parameter $\beta_{rd}$ are, respectively, given by Elamassie et al. [13]

$$\alpha_{rd} = \left[\exp\!\left(\frac{0.49\,\sigma_{h_t}^{2}}{\left(1 + 0.56\,(1-\Theta)\,\sigma_{h_t}^{12/5}\right)^{7/6}}\right) - 1\right]^{-1} \tag{14}$$

$$\beta_{rd} = \left[\exp\!\left(\frac{0.51\,\sigma_{h_t}^{2}}{\left(1 + 0.69\,\sigma_{h_t}^{12/5}\right)^{5/6}}\right) - 1\right]^{-1}. \tag{15}$$

In (14) and (15), $\sigma_{h_t}^{2}$ denotes the scintillation index for the plane wave model, known as the Rytov variance. The Rytov variance is defined as $1.23\, C_n^2\, k^{7/6} L^{11/6}$, where $k = \frac{2\pi}{\lambda}$ is the wave number, $C_n^2$ is the refractive-index structure parameter and $L$ is the corresponding link length. The variation of the $\alpha_{rd}$ and $\beta_{rd}$ parameters is shown in Fig. 2. The figure shows that both parameters decrease as the scintillation index increases and vice versa, although, unlike $\beta_{rd}$, the $\alpha_{rd}$ parameter grows exponentially at higher values of the scintillation index.
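To make (14) and (15) concrete, the following Python sketch computes the Rytov variance and the two Gamma-Gamma parameters. It is a hedged illustration: the function names are hypothetical, and because the text does not define the parameter Θ in (14), it is exposed as a plain argument whose default of 0 is only a placeholder assumption.

```python
import math


def rytov_variance(cn2, wavelength, link_length):
    """Rytov variance 1.23 * Cn^2 * k^(7/6) * L^(11/6) with k = 2*pi/lambda."""
    k = 2.0 * math.pi / wavelength
    return 1.23 * cn2 * k ** (7.0 / 6.0) * link_length ** (11.0 / 6.0)


def large_scale_alpha(sigma2, theta=0.0):
    """Large-scale parameter alpha_rd from (14); theta is an assumed placeholder."""
    s125 = sigma2 ** (6.0 / 5.0)  # sigma^(12/5) written as (sigma^2)^(6/5)
    expo = 0.49 * sigma2 / (1.0 + 0.56 * (1.0 - theta) * s125) ** (7.0 / 6.0)
    return 1.0 / (math.exp(expo) - 1.0)


def small_scale_beta(sigma2):
    """Small-scale parameter beta_rd from (15)."""
    s125 = sigma2 ** (6.0 / 5.0)
    expo = 0.51 * sigma2 / (1.0 + 0.69 * s125) ** (5.0 / 6.0)
    return 1.0 / (math.exp(expo) - 1.0)


if __name__ == "__main__":
    for sigma2 in (0.01, 0.1, 1.0, 10.0, 100.0):
        print(sigma2, large_scale_alpha(sigma2), small_scale_beta(sigma2))
```

Sweeping the Rytov variance over several decades, as in the usage lines, gives a quick way to compare the computed trends against the curves in Fig. 2.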

Fig. 2 Analysis of the large- and small-scale factors in underwater medium (curves: large-scale factor and small-scale factor; vertical axis: parameters, 0 to 45; horizontal axis: log intensity variance, logarithmic scale from 10^-2 to 10^2)

2.5 Pointing Error in Underwater VLC Link

Another source of signal fading in optical communication is the deviation of the transmitted optical beam from the receiver aperture, which is called pointing error. It occurs due to displacement and inclination of the relay buoy and/or the receiver caused by ocean currents and waves. The pointing error depends on the equivalent beam width, which is described as $w_{z_{eq}} = 2\sigma_d \zeta$, where the parameter $\zeta$ is the ratio between the equivalent beam width radius and the pointing error displacement standard deviation $\sigma_d$. The random radial displacement is $R_d = \sqrt{R_{d_x}^2 + R_{d_y}^2}$, where $R_{d_x}$ and $R_{d_y}$ denote the displacements along the horizontal and elevation axes, respectively. Moreover, the fraction of collected power is denoted by $A_p$. Thus, the pointing error can be expressed as [17]

$$h_p \approx A_p \exp\!\left(-\frac{2 R_d^2}{w_{z_{eq}}^2}\right). \tag{16}$$
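A short Python sketch of (16) is given below. It assumes that the horizontal and elevation displacements are independent zero-mean Gaussian variables with standard deviation σ_d, which is a common modeling choice but is not stated explicitly in the text; the function names and the numeric values in the usage lines are likewise illustrative.

```python
import numpy as np


def pointing_error_gain(r_d, a_p, w_zeq):
    """Pointing error coefficient h_p = A_p * exp(-2 R_d^2 / w_zeq^2), as in (16)."""
    return a_p * np.exp(-2.0 * np.square(r_d) / w_zeq**2)


def sample_pointing_error(a_p, sigma_d, zeta, size, rng):
    """Draw samples of h_p for a given displacement standard deviation."""
    w_zeq = 2.0 * sigma_d * zeta  # equivalent beam width
    # Assumption: independent zero-mean Gaussian displacements on both axes.
    r_dx = rng.normal(0.0, sigma_d, size)
    r_dy = rng.normal(0.0, sigma_d, size)
    r_d = np.hypot(r_dx, r_dy)  # radial displacement
    return pointing_error_gain(r_d, a_p, w_zeq)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    h_p = sample_pointing_error(a_p=1.0, sigma_d=0.1, zeta=2.5, size=10_000, rng=rng)
    print("mean pointing loss:", h_p.mean())
```

With perfect alignment (R_d = 0) the gain equals A_p, and it decays as the beam wanders away from the aperture centre.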

3 BER Performance of the System

In the proposed dual-hop communication system model, information is transmitted using a single-carrier OOK modulation technique, and the BER of the received signal is calculated based on (5) and (6). In the RF-UVLC hybrid communication link, the instantaneous SNR of the whole system at the destination, employing the AF protocol with a fixed-gain relay, can be calculated as

$$\gamma_d = \frac{\gamma_{sr}\, \gamma_{rd}}{\gamma_{rd} + C}, \tag{17}$$

where $C$ denotes the fixed-gain amplifying constant. The overall system BER for OOK modulation over an AWGN channel can be calculated as [25]

$$\mathrm{BER}_d = Q\!\left(\sqrt{\frac{\gamma_d}{2}}\right). \tag{18}$$
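The two expressions above translate into a few lines of Python. This is a sketch under stated assumptions: (18) is read as Q(√(γ_d/2)), the Q-function is computed from the complementary error function in the standard library, and the function names and example values are hypothetical.

```python
import math


def q_function(x):
    """Gaussian tail probability Q(x) = 0.5 * erfc(x / sqrt(2))."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def end_to_end_snr(gamma_sr, gamma_rd, c):
    """Fixed-gain AF end-to-end SNR, following (17)."""
    return gamma_sr * gamma_rd / (gamma_rd + c)


def ber_ook(gamma_d):
    """OOK bit error rate, reading (18) as Q(sqrt(gamma_d / 2))."""
    return q_function(math.sqrt(gamma_d / 2.0))


if __name__ == "__main__":
    for snr_db in (0, 10, 20):
        gamma = 10.0 ** (snr_db / 10.0)  # same linear SNR on both hops
        gamma_d = end_to_end_snr(gamma, gamma, c=1.0)  # c = 1 is an assumed value
        print(snr_db, "dB ->", ber_ook(gamma_d))
```

At zero SNR the BER is 0.5, as expected for a pure guess, and it falls monotonically as the end-to-end SNR grows.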

4 Numerical Results

This section covers the numerical analysis of the BER for the proposed dual-hop hybrid RF/UVLC system model in distinct water mediums. Unless otherwise stated, the physical parameters of the setup are the photo-detector aperture diameter $D_r$, the full-width transmitter beam divergence angle $\theta$, the distance between base station and relay $d_{sr}$ and the vertical depth of the destination below the sea surface $d_t$. We aim to calculate the BER performance at the destination, which is located vertically below the relay in the underwater environment, as depicted in Fig. 1. The parameters $\alpha_{rd}$ and $\beta_{rd}$ vary with the ocean water temperature and salinity; in our simulation, they are evaluated for a water temperature of 5 °C and a salinity of 20 practical salinity units (PSU). The numerical values used in the simulation are summarized in Table 2.

Table 2 Numerical values adopted in the simulation

| Symbol and description | Numeric value |
|---|---|
| Aperture diameter ($D_r$) | 5 cm |
| Divergence angle ($\theta$) | 6° |
| Distance between source and relay ($d_{sr}$) | 200 m |
| Distance between relay and destination ($d_t$) | 15 m |
| Laser diode photo-detector efficiency ($\eta$) | 0.5 |
| Photo-detector responsivity ($r_s$) | 0.28 |
| Large-scale factor ($\alpha_{rd}$) | 5.9645 |
| Small-scale factor ($\beta_{rd}$) | 4.3840 |
| Equivalent beam width | 2.5 |
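A Monte Carlo estimate of the average BER for the fixed-gain AF case might be sketched as follows. This is not the authors' simulation code: it assumes that the Gamma-Gamma turbulence coefficient can be drawn as the product of two independent unit-mean Gamma variates, that the r − d SNR scales with the square of that coefficient (IM/DD), and that path loss and pointing error are folded into the average r − d SNR. The function name, the Nakagami factor m = 2 and the gain constant of 1 are hypothetical; only the α and β defaults come from Table 2.

```python
import math
import numpy as np

_erfc = np.vectorize(math.erfc)  # element-wise complementary error function


def average_ber_af(avg_snr_sr, avg_snr_rd, m=2.0, alpha=5.9645, beta=4.3840,
                   gain_const=1.0, n_trials=100_000, seed=0):
    """Monte Carlo average BER of the fixed-gain AF link (linear-scale SNRs)."""
    rng = np.random.default_rng(seed)
    # First hop: Nakagami-m fading gives a Gamma-distributed SNR.
    gamma_sr = rng.gamma(shape=m, scale=avg_snr_sr / m, size=n_trials)
    # Second hop: Gamma-Gamma turbulence as a product of two Gamma variates.
    h_t = (rng.gamma(shape=alpha, scale=1.0 / alpha, size=n_trials)
           * rng.gamma(shape=beta, scale=1.0 / beta, size=n_trials))
    gamma_rd = avg_snr_rd * h_t**2  # assumed IM/DD scaling of the SNR
    # End-to-end SNR with a fixed-gain relay, then OOK BER per realization.
    gamma_d = gamma_sr * gamma_rd / (gamma_rd + gain_const)
    ber = 0.5 * _erfc(np.sqrt(gamma_d / 2.0) / math.sqrt(2.0))
    return float(ber.mean())


if __name__ == "__main__":
    for snr_db in (0, 10, 20, 30):
        snr = 10.0 ** (snr_db / 10.0)
        print(snr_db, "dB ->", average_ber_af(snr, snr))
```

Averaging the per-realization BER over many channel draws is what produces the smooth BER-versus-SNR curves typical of such simulations; a larger number of trials is needed to resolve BER values near 10⁻⁴.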

Fig. 3 Decode-and-forward (DF) relayed BER performance of hybrid RF-VLC communication link in different water mediums with and without pointing errors (vertical axis: BER, logarithmic scale from 10^0 to 10^-4; horizontal axis: SNR [dB], 0 to 60; curves: with pointing error in pure sea water, clear ocean water and coastal ocean water; without turbulence and pointing error; pure sea water, clear ocean water and coastal ocean water)

In Fig. 3, we simulate the BER performance of the DF relay used to forward information to the underwater-based destination. In the hybrid DF-relayed RF-UVLC link, the best BER performance is achieved in pure seawater. The BER performance without turbulence in pure seawater is also analyzed, and the BER performance under strong turbulence with and without pointing error is compared in Fig. 3 as well. With pointing error under highly turbid conditions, pure seawater still outperforms clear ocean water. Coastal ocean water clearly performs worse than pure seawater and clear ocean water, owing to its high turbidity and the randomness of water currents.

In Fig. 4, the simulation results show the BER performance of the AF-relayed hybrid RF-VLC link, comparing the strong-turbulence case with the case that also includes pointing errors. In Fig. 3, pure seawater and clear ocean water clearly perform better under high-SNR channel conditions. Moreover, in the proposed channel model, the AF relay outperforms the DF-relayed communication link under low-SNR channel conditions. For a target BER of 1 × 10^-4 at high SNR, the AF-relayed model outperforms the DF-relayed communication link regardless of the water type. In clear ocean water, the AF-relayed RF-VLC combination achieves comparably better BER performance at low SNR values.

Fig. 4 Amplify-and-forward (AF) relayed BER performance of RF-VLC communication link in different water mediums with and without pointing errors (vertical axis: BER, logarithmic scale from 10^0 to 10^-4; horizontal axis: SNR, dB, 0 to 50; curves: with pointing error in pure sea water, clear ocean water and coastal ocean water; RF-VLC without turbulence; RF-VLC in pure sea water, clear ocean water and coastal ocean water)

Fig. 5 Detailed comparison of both AF and DF relayed BER performance of RF-VLC hybrid communication link in different water mediums (vertical axis: BER, logarithmic scale from 10^0 to 10^-4; horizontal axis: SNR, dB, 0 to 60; curves: AF-relay and DF-relay in pure sea water, clear ocean water and coastal ocean water)

A more detailed comparison between the AF- and DF-relayed RF-VLC hybrid communication links in different waters is depicted in Fig. 5. Both relayed links clearly perform better in pure seawater under strong channel conditions at low SNR values, while coastal ocean water performs poorly for both. The comparison of BER performance in different waters for the dual-hop communication link with pointing errors is depicted in Fig. 6, where the AF and DF relays are analyzed under highly turbid channel conditions along with pointing error impairments. The AF relay clearly achieves better BER performance than the DF relay in all types of channel conditions at low SNR, while the DF relay reaches almost the same performance only at relatively higher SNR. Thus, the RF-UVLC link achieves good BER performance in different water mediums at high SNR and is a suitable candidate for combined hybrid wireless communication.

Fig. 6 Detailed comparison of both AF and DF relayed BER performance of RF-VLC hybrid communication link in different water mediums considering only pointing error (vertical axis: BER, logarithmic scale from 10^0 to 10^-4; horizontal axis: SNR, dB, 0 to 60; curves: AF-relay and DF-relay in pure sea water, clear ocean water and coastal ocean water)

5 Conclusion

A hybrid communication system is a practical way to achieve reliable data rates under different channel conditions, regardless of the water medium. Moreover, VLC is a promising key enabling technology for achieving high data rates in different waters. In this paper, we considered a dual-hop channel model consisting of an RF link and an underwater VLC link. A floating buoy acts as a relay and assists information transmission between the onshore base station and the undersea node through different protocols. We provided Monte Carlo simulation results to investigate the BER performance of the RF-UVLC link at the underwater-based destination under different relay protocols and in different water types. The simulation results clearly show that the AF relay achieves better BER performance than DF relay-based communication under lower SNR conditions.

Acknowledgements This work was funded by the framework of the Competitiveness Enhancement Program of the National Research Tomsk Polytechnic University grant No. VIU-ISHITR-180/2020.

References

1. Ali MF, Jayakody DNK, Chursin YA, Affes S, Dmitry S (2019) Recent advances and future directions on underwater wireless communications. Arch Comput Methods Eng, 1–34
2. Ali MF, Jayakody NK, Perera TDP, Krikdis I (2019) Underwater communications: recent advances. In: ETIC2019 international conference on emerging technologies of information and communications (ETIC), pp 1–6
3. Zeng Z, Fu S, Zhang H, Dong Y, Cheng J (2016) A survey of underwater optical wireless communications. IEEE Commun Surv Tutor 19(1):204–238
4. Dautta M, Hasan MI (2017) Underwater vehicle communication using electromagnetic fields in shallow seas. In: 2017 international conference on electrical, computer and communication engineering (ECCE). IEEE, pp 38–43
5. Kaushal H, Kaddoum G (2016) Underwater optical wireless communication. IEEE Access 4:1518–1547
6. Awan KM, Shah PA, Iqbal K, Gillani S, Ahmad W, Nam Y (2019) Underwater wireless sensor networks: a review of recent issues and challenges. Wirel Commun Mobile Comput
7. Majumdar AK (2014) Advanced free space optics (FSO): a systems approach, vol 186. Springer, Berlin
8. Singh S, Kakamanshadi G, Gupta S (2015) Visible light communication-an emerging wireless communication technology. In: 2015 2nd international conference on recent advances in engineering & computational sciences (RAECS). IEEE, pp 1–3
9. Gupta A, Sharma N, Garg P, Alouini M-S (2017) Cascaded FSO-VLC communication system. IEEE Wirel Commun Lett 6(6):810–813
10. Zhang J, Dai L, Zhang Y, Wang Z (2015) Unified performance analysis of mixed radio frequency/free-space optical dual-hop transmission systems. J Lightwave Technol 33(11):2286–2293
11. Ansari IS, Yilmaz F, Alouini M-S (2013) Impact of pointing errors on the performance of mixed RF/FSO dual-hop transmission systems. IEEE Wirel Commun Lett 2(3):351–354
12. Charles JR, Hoppe DJ, Sehic A (2011) Hybrid RF/optical communication terminal with spherical primary optics for optical reception. In: 2011 international conference on space optical systems and applications (ICSOS). IEEE, pp 171–179
13. Elamassie M, Sait SM, Uysal M (2018) Underwater visible light communications in cascaded gamma-gamma turbulence. In: IEEE globecom workshops (GC Wkshps). IEEE, pp 1–6
14. Elamassie M, Al-Nahhal M, Kizilirmak RC, Uysal M (2019) Transmit laser selection for underwater visible light communication systems. In: IEEE 30th annual international symposium on personal, indoor and mobile radio communications (PIMRC). IEEE, pp 1–6
15. Yilmaz A, Elamassie M, Uysal M (2019) Diversity gain analysis of underwater vertical MIMO VLC links in the presence of turbulence. In: 2019 IEEE international black sea conference on communications and networking (BlackSeaCom). IEEE, pp 1–6
16. Illi E, El Bouanani F, Da Costa DB, Ayoub F, Dias US (2018) Dual-hop mixed RF-UOW communication system: a PHY security analysis. IEEE Access 6:55345–55360
17. Elamassie M, Uysal M (2019) Vertical underwater VLC links over cascaded gamma-gamma turbulence channels with pointing errors. In: 2019 IEEE international black sea conference on communications and networking (BlackSeaCom). IEEE, pp 1–5
18. Farid AA, Hranilovic S (2007) Outage capacity optimization for free-space optical links with pointing errors. J Lightwave Technol 25(7):1702–1710
19. Song X, Cheng J (2012) Optical communication using subcarrier intensity modulation in strong atmospheric turbulence. J Lightwave Technol 30(22):3484–3493
20. Hanson F, Radic S (2008) High bandwidth underwater optical communication. Appl Opt 47(2):277–283
21. Mobley CD, Gentili B, Gordon HR, Jin Z, Kattawar GW, Morel A, Reinersman P, Stamnes K, Stavn RH (1993) Comparison of numerical models for computing underwater light fields. Appl Opt 32(36):7484–7504
22. Elamassie M, Uysal M (2018) Performance characterization of vertical underwater VLC links in the presence of turbulence. In: 11th international symposium on communication systems, networks & digital signal processing (CSNDSP). IEEE, pp 1–6
23. Elamassie M, Miramirkhani F, Uysal M (2018) Channel modeling and performance characterization of underwater visible light communications. In: 2018 IEEE international conference on communications workshops (ICC workshops). IEEE, pp 1–5
24. Sandalidis HG, Tsiftsis TA, Karagiannidis GK (2009) Optical wireless communications with heterodyne detection over turbulence channels with pointing errors. J Lightwave Technol 27(20):4440–4445
25. Grubor J, Randel S, Langer K-D, Walewski JW (2008) Broadband information broadcasting using LED-based interior lighting. J Lightwave Technol 26(24):3883–3892

Circular Polarized Octal Band CPW-Fed Antenna Using Theory of Characteristic Mode for Wireless Communication Applications Reshmi Dhara

Abstract An innovative design concept for a multipurpose printed antenna with a coplanar waveguide (CPW) feed that supports circular polarization (CP) is presented in this manuscript. The theory of characteristic modes (TCM) is used to investigate octal-band circular polarization. TCM shows that the whole radiator contributes to exciting electric and magnetic modes, which also yields broad impedance performance. To find the resonating CP frequencies and the radiating behavior, seven characteristic modes are excited using asymmetric CPW-fed techniques. The antenna consists of a radiator comprising a hexagonal ring, which is connected to an annular ring at the leftmost corner to generate wide circular polarization. The implemented antenna design generates a broad impedance bandwidth (IBW) spanning from 1.5 GHz to beyond 14 GHz. Additionally, the simulated 3 dB axial ratio bandwidths (ARBWs) for the octal bands are 310 MHz (3.13–3.34 GHz, f c (CP resonating frequency) = 3.2 GHz), 310 MHz (6.45–6.76 GHz, f c = 6.6 GHz), 40 MHz (8.08–8.12 GHz, f c = 8.1 GHz), 120 MHz (8.63–8.74 GHz, f c = 8.7 GHz), 180 MHz (9.49–9.67 GHz, f c = 9.5 GHz), 30 MHz (11.69–11.72 GHz, f c = 11.7 GHz), 40 MHz (12.19–12.23 GHz, f c = 12.2 GHz) and 140 MHz (12.57–12.71 GHz, f c = 12.6 GHz), respectively.

Keywords Octal band antenna · Circular polarized · CPW-fed · Theory of characteristic modes (TCMs)

R. Dhara (B) Department of Electronics and Communication Engineering, National Institute of Technology Sikkim, Ravangla 737139, India e-mail: [email protected]
© Springer Nature Singapore Pte Ltd. 2021 E. S. Gopi (ed.), Machine Learning, Deep Learning and Computational Intelligence for Wireless Communication, Lecture Notes in Electrical Engineering 749, https://doi.org/10.1007/978-981-16-0289-4_15

1 Introduction
Presently, printed monopole antennas are widely used because of their many attractive features, such as omnidirectional radiation patterns, wide impedance bandwidth, ease of fabrication, low cost, and light weight. In addition, monopole antennas are well matched with the integrated circuitry of wireless communication


devices because of their simple feed techniques. However, most monopole antennas are designed to support linearly polarized (LP) radiation. CP antennas are more beneficial for transmitting and receiving CP EM waves and are comparatively less dependent on their exact positioning. CP is usually produced by exciting two nearly degenerate orthogonal resonant modes of equal amplitude. So, if CP is generated by the monopole antenna, its performance may be significantly improved. CP antennas can provide polarization diversity by creating both left-hand circular polarization (LHCP) and right-hand circular polarization (RHCP). Circular polarization can be generated by a single feed with a slotted loop for L-band communication [1]. There, a very large antenna achieved an IBW and an ARBW of 11.1% each (140 MHz, f c = 1.262 GHz). A rectangular microstrip antenna of size 24 × 16 × 1.5875 mm³ with a slotted ground plane attained linearly polarized (LP) IBWs of 5.125–5.395 and 5.725–5.985 GHz [2]. A planar antenna with a large dimension of 30 × 30 × 1.6 mm³ achieved an LP IBW of 220 MHz, i.e., 8.9% [3]. Another large triple-strip antenna of size 50 × 50 × 1.6 mm³ gave dual ARBWs of 70 MHz at the lower band (1.57 GHz) and 60 MHz at the upper band (2.33 GHz) within an IBW spanning 1.43–3.29 GHz [4]. Another design concept for obtaining wide IBW and wide ARBW is discussed in Ref. [5]. The papers cited above produced only dual or triple CP bands with narrower IBW. In this paper, the achieved IBW is very wide, the antenna produces more CP bands (eight) than earlier reports, and the structure of the antenna is very simple. The proposed antenna is ultra-wideband, with superior impedance matching over a wider frequency range, and is able to excite multiple CP bands. A CPW feed is used in this antenna. The proposed antenna simultaneously offers reasonably higher gain, wider bandwidth, and multi-CP characteristics in comparison with the earlier cited antennas.
The prospect of antenna designs generating octal CP bands motivated us to focus our work on designing a compact antenna giving eight or more CP bands. However, TCM analysis is lacking in the previously reported literature on wideband/UWB antennas. Here, the proposed CP antenna also uses TCM analysis tools [6] to study the broad impedance band and the octal CP band response. In this paper, the proposed antenna is designed using Eqs. (i)–(viii), following some related existing designs [7–9]. The primary goal of this work was to design a multi-CP band antenna for small form factor devices. We aimed to design a compact, single-fed, circularly polarized planar monopole antenna for octal band CP applications. This would remove the need for multiple circularly polarized antennas. The implemented antenna is designed taking 1.5 GHz as the theoretical lower resonating frequency, so that it can cover all of the Wi-Fi, WLAN, and UWB bands. After optimization, the simulated impedance bandwidth of the proposed antenna was observed to start at 1.5 GHz with a smaller size than the theoretical one. This is an excellent result that fulfills the criteria for miniaturization. Our designed antenna gave octal band CP characteristics and, in addition, a broad IBW. A hexagonal ring connected with an annular ring at its leftmost corners gives wide CP bands (AR ≤ 3 dB) inside the IBW range. Compared with related studies, to our knowledge, this is one of the best results achieved. FR4-epoxy substrate is


used here, which introduces some extra complications beyond 12 GHz. This restricts the proposed antenna from being used for applications beyond the microwave frequency band. Simulation was done using ANSYS Electronics Desktop 2020R1. For the proposed antenna, the simulated IBW spans from 1.5 GHz to beyond 14 GHz. In addition, the simulated ARBWs for the octal bands are 310 MHz (3.13–3.34 GHz), 310 MHz (6.45–6.76 GHz), 40 MHz (8.08–8.12 GHz), 120 MHz (8.63–8.74 GHz), 180 MHz (9.49–9.67 GHz), 30 MHz (11.69–11.72 GHz), 40 MHz (12.19–12.23 GHz), and 140 MHz (12.57–12.71 GHz). The size of the antenna is 55 × 56 × 1.6 mm³, with a possible size reduction of 23.24%. The paper is organized as follows: Sect. 2: Theory of Characteristics Modes analysis; Sect. 3: Procedure of Antenna Design; Sect. 4: Experimental Result and Discussion; and Sect. 5: Conclusion.

2 Theory of Characteristics Modes Analysis (TCMs)
Here, the characteristic mode analysis (CMA) of the implemented antenna is demonstrated. Figure 1 shows the CMA for this octal band circularly polarized antenna. Figure 1a shows the implemented antenna configuration and the plot of eigenvalues versus frequency for the seven fundamental characteristic modes. Modes 2, 3, 4, 6, 7, 8, and 10 are dominant modes because their eigenvalues reach zero (λn = 0); no mode is inductive, which would require high positive eigenvalues (λn > 0); and modes 1, 5, and 9 are capacitive modes because their eigenvalues are negative (λn < 0). Figure 1b shows the implemented antenna configuration and the plot of characteristic angle versus frequency for the seven fundamental characteristic modes. Here, modes 2, 3, 4, 7, 6, 8, and 10 cross the 180° axis line at the resonant frequencies 12.36, 11.8, 10.65, 9.93, 8.27, 8.09, and 3.90 GHz, respectively, and are therefore dominant modes, whereas modes 1, 5, and 9 are non-resonant modes because they do not cross the 180° axis line. Similarly, Fig. 1c shows that the modal significance is large, around 1, at the resonant frequencies of the dominant modes 2, 3, 4, 7, 6, 8, and 10, and model significance ( β1 or |xk − xk+1 | > β2 else

(11)

The collected data have various upper and lower limits. We therefore form an aggregate index comprising eight factors, and normalize each factor first because the composite index combines different units of measurement. Each parameter is normalized by Eq. 12.

$$X_i = \frac{X_i - X_{\min}}{X_{\max} - X_{\min}} \tag{12}$$

where X_i is the normalized value, which lies in [0, 1], X_min is the minimum value observed, and X_max is the maximum value observed. We have computed the relationship between groundwater level and water contamination level, and for that an aggregate scoring method is needed. We have given equal weights to all the parameters taken for analysis. The eight indices may be represented in an eight-dimensional space, each with a minimum value of 0 and a maximum of 1. The aggregate score of water contamination uses the weighted Euclidean distance from the ideal point (1, 1, 1, 1, 1, 1, 1, 1). The calculation is given in Eq. 13.

Groundwater Level Prediction and Correlative Study …



$$A_{gw} = 1 - \sqrt{\frac{(1 - T)^2 + (1 - pH)^2 + (1 - C)^2 + (1 - B)^2 + (1 - N)^2 + (1 - Fc)^2 + (1 - Tc)^2 + (1 - F)^2}{8}} \tag{13}$$

where T = temperature, pH = pH of water, C = conductivity, B = B.O.D., N = nitrate level, Fc = faecal coliform, Tc = total coliform, and F = fluoride level. The data have been preprocessed, and we now build the empirical model used to feed the data into our neural network. Let the water quality be taken for a constant place and time, and let the parameter number be j; then we can write:

$$S_{i,n} = \{(Y_{i,1}, T_1), \ldots, (Y_{i,n}, T_n)\} \tag{14}$$

Missing values in the fed data are filled by linear imputation, computed using Eq. 15.

$$L(t) = Y_{i,u} + \frac{Y_{i,u} - Y_{i,v}}{T_{i,u} - T_{i,v}}\,(t - T_{i,u}) \tag{15}$$
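The linear imputation of Eq. 15 can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `linear_impute` and the sample years and values are hypothetical and are not taken from the study's data.

```python
def linear_impute(t, t_u, y_u, t_v, y_v):
    """Estimate a missing value at time t from two known samples (Eq. 15).

    (t_u, y_u) and (t_v, y_v) are the two observed points around the gap.
    """
    if t_u == t_v:
        raise ValueError("the two known samples must have different times")
    slope = (y_u - y_v) / (t_u - t_v)
    return y_u + slope * (t - t_u)


# Hypothetical example: a parameter observed in 2014 and 2016, missing in 2015.
print(linear_impute(2015, 2014, 0.2, 2016, 0.6))
```

At t = T_{i,u} the formula returns Y_{i,u}, and at t = T_{i,v} it returns Y_{i,v}, so the imputed value always lies on the straight line through the two observations.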

4.2 Model Simulation and Discussion
The model has been simulated on the normalized and processed dataset, and predictions are made for the coming years to gain insight into how the water contamination level varies across the years in the said states. As the data are normalized and aggregated using Formulas 12 and 13, respectively, 1 denotes the highest amount of water contamination whereas 0 denotes the lowest (Figs. 8, 9 and Table 2).
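The normalization and aggregation of Formulas 12 and 13 can be sketched as follows. This is a minimal sketch under stated assumptions: the function names are hypothetical, equal weights are used as described in the text, and a square root over the averaged squared distances is assumed because the score is described as a Euclidean distance from the ideal point.

```python
import math


def min_max_normalize(values):
    """Scale a list of observations of one parameter to [0, 1] (Formula 12)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("cannot normalize a constant series")
    return [(v - lo) / (hi - lo) for v in values]


def aggregate_score(normalized):
    """Aggregate eight normalized parameters into one score (Formula 13).

    Assumption: equal weights and a root-mean-square distance from the
    ideal point (1, 1, 1, 1, 1, 1, 1, 1).
    """
    if len(normalized) != 8:
        raise ValueError("expected eight normalized parameters")
    squared = sum((1.0 - x) ** 2 for x in normalized)
    return 1.0 - math.sqrt(squared / len(normalized))


# Hypothetical normalized readings for T, pH, C, B, N, Fc, Tc, F.
sample = [0.4, 0.6, 0.5, 0.7, 0.3, 0.8, 0.9, 0.2]
print(aggregate_score(sample))
```

With this form, a sample whose eight normalized parameters all equal 1 scores 1, and one whose parameters all equal 0 scores 0, which matches the interpretation of the two extremes given above.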

5 Correlated Study Between Groundwater Level and Groundwater Contamination
Through our modeling in Sect. 3, we built a multivariate LSTM neural network model and ran it under different conditional scenarios. The output curve shows that under the best-case scenario the groundwater level is much better


A. Chatterjee et al.

Fig. 8 Predictive curve for groundwater contamination level. Source Created by Author

Fig. 9 Loss curve for the model. Source Created by Author


Table 2 Results of model
Metrics name | Metrics value | Parameters
RMSE | 0.00004235 | 500 epochs, 32 batch size, 8 time step (multivariate)
Source Created by Author

Table 3 Pearson correlation results
Pearson correlation table | Groundwater contamination
Groundwater level | −0.6754
Source Created by Author

than the worst-case scenario, and the average case follows the usual trend. In Sect. 4, we predicted the water quality level from eight parameters of water contamination; the entire data set was processed, and an aggregated final score was generated using data normalization and the Euclidean distance. We know that a constant water pressure balance exists between groundwater and seawater, which prevents seawater from seeping in and making the water unusable. In this section, we present a correlated study between these two factors, viz. groundwater level and groundwater contamination. We analyze how strongly these factors are correlated across the years and how they influence each other. Along with this, we examine how much change in water contamination is caused by a change in groundwater level. First, we examined the correlation between the two variables using Pearson's correlation coefficient (Table 3). The result shows a high negative correlation between the variables. This supports our assumption of water pressure balancing: the water is getting more contaminated as the groundwater level gradually decreases. To solidify our claims, we use an OLS model with the water level as the independent variable and the contamination level as the dependent variable, from which we can determine how much the contamination level increases per 1-unit change of groundwater level (Table 4).

Table 4 OLS model results
Coefficient from OLS model | −0.0356
Source Created by Author

It is clear from the correlation result that these two variables are negatively related; thus, the constant loss of groundwater is disturbing the water balance, and water contamination is increasing. The OLS model suggests that for every 1-unit loss of groundwater there will be a 0.0356-unit increase in the water contamination level. Comparing the two graphs in Figs. 10 and 11, it can be concluded that during the time in which the groundwater level is decreasing, the contamination level is


Fig. 10 Future prediction curve for groundwater level. Source Created by Author

Fig. 11 Future prediction curve for aggregated groundwater contamination. Source Created by Author


evidently increasing. The correlation factor likewise shows that the two factors are negatively correlated. Hence, with increasing groundwater level, the contamination decreases over the given period of time. The predicted plots for both factors also indicate that, in the coming years, the contamination level will decrease with increasing groundwater level.
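The two statistics used in this section, Pearson's correlation coefficient and the slope of a one-variable OLS fit, can be sketched in plain Python. The function names and the small data series below are hypothetical and serve only to illustrate the computation; they are not the study's data and will not reproduce the values in Tables 3 and 4.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length series."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def ols_fit(x, y):
    """Slope and intercept of the least-squares line y = slope * x + intercept."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical series: groundwater level (independent) and contamination (dependent).
level = [1.0, 2.0, 3.0, 4.0]
contamination = [8.0, 6.0, 4.0, 2.0]
print(pearson_r(level, contamination))
print(ols_fit(level, contamination))
```

A negative slope is read as in the text: each 1-unit loss of groundwater level corresponds to an increase in contamination equal to the magnitude of the slope.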

6 Concluding Remarks and Future Scope of Study
The decreasing groundwater level is a serious problem across India. Along with it, the rising level of groundwater contamination threatens the existence of the human race. In this paper, we analyzed and predicted the upcoming trend of groundwater availability using LSTM modeling under different scenarios. We assumed scenarios that affect our water-level results, and the simulation results of the model were taken for analysis. Next, we predicted the trend of the groundwater contamination level; the data were heavily preprocessed, and an aggregate score was generated using the Euclidean distance method. The trend was analyzed, and a correlative study between groundwater level and groundwater contamination level was then presented. We observed a negative correlation between these variables, indicating that the falling groundwater level allows unsuitable seawater to seep in, contaminating the remaining water. Lastly, we showed predictions for the coming years using the interpolated moving average of the time series. The study could be extended by taking the water level for more states or across the nation. Nature-inspired optimizers such as the ABC algorithm or the LA algorithm could be used to check the trend. The time span could be extended by 10 more years for a better curve.


A Novel Deep Hybrid Spectral Network for Hyperspectral Image Classification
K. Priyadharshini @ Manisha and B. Sathya Bama

Abstract Image classification is the process of allocating land cover classes to picture elements. Hyperspectral image classification of land cover is difficult because of the unfavorable ratio between the number of training samples and the number of spectral bands. Deep learning networks are capable of learning from unstructured or unlabeled data without supervision. The convolutional neural network (CNN) is one of the most widely employed deep learning approaches for visual data processing. To classify the hyperspectral data, a hybrid network with 2D and 3D CNN layers is developed. Spectral and spatial information are used together for hyperspectral image analysis, which improves the experimental results considerably. The CNN classification technique is developed with the pixel as the basic analysis unit. Principal component analysis (PCA) is applied to lower the dimensionality of the hyperspectral data; it reduces the feature size and so increases computational efficiency. The first principal component is the most informative, as it has the highest variance compared to the other components. Three hyperspectral datasets are used for the analysis: Pavia university, Indian pines, and Salinas scene. The Indian pines data was acquired over northwest Indiana in June 1992 by the Airborne Visible Infrared Imaging Spectrometer (AVIRIS) and has a size of 145 × 145 pixels. The Pavia university data was acquired by the Reflective Optics System Imaging Spectrometer (ROSIS) over Pavia, Northern Italy, in 2001; it has 103 spectral bands and a size of 610 × 340 pixels. The Salinas scene was acquired by AVIRIS over the Salinas Valley, CA, USA, in 1998, with a spatial dimension of 512 × 217. The hybrid CNN is computationally efficient compared to the 3D CNN, and it provides enhanced performance with minimal training data.
Keywords Hybrid spectral network · Convolutional neural network · Principal component analysis · Support vector machine

K. Priyadharshini @ Manisha (B) · B. Sathya Bama Thiagarajar College of Engineering, Madurai, India © Springer Nature Singapore Pte Ltd. 2021 E. S. Gopi (ed.), Machine Learning, Deep Learning and Computational Intelligence for Wireless Communication, Lecture Notes in Electrical Engineering 749, https://doi.org/10.1007/978-981-16-0289-4_43


1 Introduction
Hyperspectral images (HSIs) comprise numerous narrow spectral bands that convey rich spectral and spatial detail simultaneously. Hyperspectral sensors receive energy in many small, overlapping spectral channels of the electromagnetic spectrum; that is, they capture information as a collection of images covering hundreds of small, contiguous spectral bands over a broad range of the spectrum, allowing accurate spectral signatures to be detected for different materials. Hyperspectral imaging produces a spatial map of spectral variation, which makes it a useful tool for many applications. The large number of bands allows objects to be identified by their response in the spectral domain. However, this large number of bands is also what makes the analysis complex. Hyperspectral imaging cameras collect radiance data from either airborne or spaceborne platforms. Before analysis techniques are applied, the radiance data must be converted to apparent surface reflectance. HSIs are obtained by scanning the same area in different spectral bands. Such spectral data contain some degree of redundancy, in that two consecutive bands may show nearly identical content. Nevertheless, recognizing these hyperspectral correlations is beneficial. A precise reflectance or radiance distribution can be recorded at each pixel. The resulting HSI can be used to locate objects, classify different components, and detect processes in various fields such as military, agricultural, and mineralogical applications.

2 Related Works
HSI technology has primarily been used in many demanding Earth observation and remote sensing applications such as vegetation monitoring, urbanization research, farm and field technology, and surveillance. The need for quick and reliable authentication and object recognition methods has intensified the interest in applying hyperspectral imaging to quality control in the agriculture, medicinal, and food industries. A physical material has its own characteristic reflectance or radiance signature with rich spectral detail. Hyperspectral remote sensors have a superior discriminating capability, particularly for materials that are visually similar. These distinctive features enable numerous uses in computer vision and remote sensing, e.g., military target identification, inventory control, and medical diagnosis. There is, however, a tradeoff between high spectral resolution and spatial accuracy. The benefits of hyperspectral imaging over conventional approaches include reduced sample processing, non-destructive measurement, fast acquisition times, and simultaneous visualization of the spatial distribution of various chemical compositions [1].


One of the main problems is how HSI features can be extracted effectively. Spectral-spatial features are now commonly used, and the performance of HSI classification has gradually improved by moving from spectral features alone to spectral and spatial features together [2]. Deep learning models have been developed to extract spectral-spatial features for HSI classification. The core idea of deep learning is to derive abstract features from the original input using stacked multilayer representations. With SAE-LR, the testing time is improved in comparison with KNN and SVM, but training takes a long time [3]. Methods such as WI-DL and QBC require more time for testing and training. Traditional image priors need to be integrated into the DHSIS method to improve its accuracy and performance [4]. The longest training time for the proposed CNN is observed for the Salinas scene dataset [5]. Band selection is vital for choosing the salient bands before fusion with the extracted hashing codes, in order to decrease training time and save storage space. The major drawback is the requirement of a large amount of properly labeled data for training the model.

3 Methodology
A deep hybrid spectral network is developed using 2D and 3D convolutional layers. The 3D CNN extracts both spatial and spectral information, but at increased computational complexity. The 2D CNN is not suitable for handling spectral information, as it captures only spatial information. The hyperspectral data obtained from the AVIRIS and ROSIS hyperspectral sensors are used for the analysis. The hyperspectral datacube is

$$I \in \mathbb{R}^{M \times N \times D} \tag{1}$$

where I is the original input, M is the width of the input, N is the height of the input, and D is the number of spectral bands. Principal component analysis (PCA) is applied for dimensionality reduction, which reduces the number of spectral bands from D to B. The PCA-reduced datacube is

$$X \in \mathbb{R}^{M \times N \times B} \tag{2}$$

where X is the modified input after PCA. Using only a 2D CNN or only a 3D CNN has shortcomings, such as a very complex model, mainly because hyperspectral data are volumetric. The 2D CNN alone cannot extract good discriminating feature maps from the spectral dimensions. Correspondingly, a 3D CNN is more computationally complex and, used alone, seems to perform worse for classes with similar textures over several spectral bands.

Fig. 1 Network structure for deep hybrid spectral network

Figure 1 represents the deep hybrid network for the classification of hyperspectral images. For the hybrid spectral network, the 3D CNN and 2D CNN layers are arranged so that they make full use of both spectral and spatial feature maps to achieve optimum accuracy. The dataset is divided into small overlapping 3D patches of spatial size S × S, and the calculations are tabulated in Table 2. The total number of 3D patches obtained from X is given by

$$(M - S + 1) \times (N - S + 1) \tag{3}$$

The 3D patch at location (α, β), denoted by P_{α,β}, covers the width from

$$\alpha - (S - 1)/2 \ \text{to} \ \alpha + (S - 1)/2 \tag{4}$$

and the height from

$$\beta - (S - 1)/2 \ \text{to} \ \beta + (S - 1)/2 \tag{5}$$
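The patch count of Eq. 3 and the patch extent of Eqs. 4 and 5 can be checked with a short Python sketch. The function names are hypothetical; the values of M, N, and S are the ones listed in Table 2.

```python
def patch_count(m, n, s):
    """Number of overlapping S x S patches in an M x N image (Eq. 3)."""
    return (m - s + 1) * (n - s + 1)


def patch_bounds(alpha, beta, s):
    """Inclusive width and height ranges of the patch centred at (alpha, beta)."""
    half = (s - 1) // 2
    return (alpha - half, alpha + half), (beta - half, beta + half)


# (M, N, S) for each dataset, as listed in Table 2.
datasets = {
    "Indian pines": (145, 145, 25),
    "Pavia university": (610, 340, 19),
    "Salinas scene": (512, 217, 19),
}

for name, (m, n, s) in datasets.items():
    print(name, patch_count(m, n, s))
```

For odd S, each range returned by `patch_bounds` spans exactly S pixels, so a patch is fully inside the image only when its centre is at least (S − 1)/2 pixels from every border.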

In the 2D CNN, the input data are convolved with 2D kernels. The convolution is computed as the sum of the dot product between the input data and the kernel. The kernel is strided over the input data to cover the full spatial dimension. The 3D convolution is achieved by convolving the 3D data with a 3D kernel. In the hybrid model for HSI data, the feature maps of the convolution layer are generated by applying the 3D kernel over multiple contiguous bands of the input layer.
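How the feature-map sizes shrink through the stacked convolutions can be traced with a small Python sketch. The kernel sizes and filter counts are those listed in Table 1. Everything else is an assumption made for illustration: unpadded convolutions with stride 1, a 25 × 25 input window, and PCA band counts of B = 30 and B = 15, chosen only because they reproduce the 576 and 96 input channels of the 2D layer and the 18,496 flattened features that appear in Table 1.

```python
def valid_conv_output(size, kernel, stride=1):
    """Output length of an unpadded convolution along one axis."""
    return (size - kernel) // stride + 1


def trace_hybrid_shapes(window, bands):
    """Return (2D-conv input channels, flattened feature count).

    Assumed layer stack: three 3D convolutions followed by one 3 x 3
    2D convolution with 64 filters, as listed in Table 1.
    """
    kernels_3d = [(3, 3, 7, 8), (3, 3, 5, 16), (3, 3, 3, 32)]  # (kh, kw, kd, filters)
    height = width = window
    depth, filters = bands, 1
    for kh, kw, kd, out_filters in kernels_3d:
        height = valid_conv_output(height, kh)
        width = valid_conv_output(width, kw)
        depth = valid_conv_output(depth, kd)
        filters = out_filters
    # The spectral depth and the 3D filters are merged into 2D channels.
    channels_2d = depth * filters
    height = valid_conv_output(height, 3)
    width = valid_conv_output(width, 3)
    return channels_2d, height * width * 64


print(trace_hybrid_shapes(25, 30))
print(trace_hybrid_shapes(25, 15))
```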


The 2D CNN is applied once before the flatten layer because it strongly discriminates the spatial information within the different spectral bands without significant loss of information from the spectral domain, which is necessary for HSI data. In the hybrid model, the total number of parameters depends on the number of classes in a dataset. Table 1 gives the calculation of the total trainable parameters of the proposed hybrid CNN. The network consists of four convolutional layers and three dense layers. Of the four convolutional layers, three are 3D convolutional layers and the remaining one is a 2D convolutional layer (Table 2). In the hybrid network, the number of trainable weight parameters for the Indian pines dataset is 5,122,176 and the number of patches is 14,641. The weights are initialized randomly and trained by backpropagation with the Adam optimizer.
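The parameter counts in Table 1 follow from the usual rule that a layer has one weight per kernel element per input channel per filter, plus one bias per filter. The Python sketch below reproduces the totals; the function names are hypothetical, and the layer sizes are copied from Table 1 rather than derived from a trained model.

```python
def conv3d_params(filters, kh, kw, kd, in_channels):
    """Weights plus biases of a 3D convolutional layer."""
    return filters * kh * kw * kd * in_channels + filters


def conv2d_params(kh, kw, in_channels, filters):
    """Weights plus biases of a 2D convolutional layer."""
    return kh * kw * in_channels * filters + filters


def dense_params(n_in, n_out):
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out


def hybrid_total(conv2d_in_channels, n_classes):
    """Total trainable parameters for the layer sizes listed in Table 1."""
    layers = [
        conv3d_params(8, 3, 3, 7, 1),
        conv3d_params(16, 3, 3, 5, 8),
        conv3d_params(32, 3, 3, 3, 16),
        conv2d_params(3, 3, conv2d_in_channels, 64),
        dense_params(18496, 256),
        dense_params(256, 128),
        dense_params(128, n_classes),
    ]
    return sum(layers)


print(hybrid_total(576, 16))  # Indian pines: 576 input channels, 16 classes
print(hybrid_total(96, 9))    # Pavia university: 96 input channels, 9 classes
print(hybrid_total(96, 16))   # Salinas scene: 96 input channels, 16 classes
```

Only the 2D convolution and the last dense layer differ between datasets, which is why the totals depend on the number of retained bands and the number of classes.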

4 Dataset
The experiments were conducted on three hyperspectral datasets: Pavia university, Indian pines, and Salinas scene. In 1992, the AVIRIS sensor acquired the Indian pines (IP) dataset over the Indian pines test site in northwest Indiana. IP has a spatial dimension of 145 × 145 pixels and 224 spectral bands with wavelengths ranging from 400 to 2500 nm; 24 bands were omitted. The IP dataset contains 16 mutually exclusive vegetation classes. However, only about 50% (10,249) of the total 21,025 pixels carry ground truth information from the 16 classes. The Pavia university dataset was acquired by the ROSIS sensor over Pavia, Northern Italy, in 2001. It consists of 610 × 340 pixels, and the spectral information is recorded at a spatial resolution of 1.3 m per pixel in 103 bands with wavelengths ranging from 430 to 860 nm. The ground truth distinguishes nine urban land-cover classes. In addition, approximately 20% of the total 207,400 pixels carry ground truth information. The Salinas scene dataset was collected in 1998 over the Salinas Valley, CA, USA, by the 224-band AVIRIS sensor; the images have a spatial dimension of 512 × 217, and the spectral information is encoded in 224 bands with wavelengths ranging from 360 to 2500 nm. For both Salinas scene and Indian pines, 20 spectral bands were discarded because of water absorption.

5 Results
The plots of accuracy and loss versus epoch indicate useful properties of the training of the model, such as the speed of convergence over epochs (slope). The classified images obtained from the hybrid CNN are shown in Fig. 2.

Table 1 Trainable parameters
Layer | Indian pines | Pavia university | Salinas scene
Conv_3d 1 | (8 × 3 × 3 × 7 × 1) + 8 = 512 | (8 × 3 × 3 × 7 × 1) + 8 = 512 | (8 × 3 × 3 × 7 × 1) + 8 = 512
Conv_3d 2 | (16 × 3 × 3 × 5 × 8) + 16 = 5776 | (16 × 3 × 3 × 5 × 8) + 16 = 5776 | (16 × 3 × 3 × 5 × 8) + 16 = 5776
Conv_3d 3 | (32 × 3 × 3 × 3 × 16) + 32 = 13,856 | (32 × 3 × 3 × 3 × 16) + 32 = 13,856 | (32 × 3 × 3 × 3 × 16) + 32 = 13,856
Conv_2d | (3 × 3 × 576 × 64) + 64 = 331,840 | (3 × 3 × 96 × 64) + 64 = 55,360 | (3 × 3 × 96 × 64) + 64 = 55,360
Dense 1 | (18,496 × 256) + 256 = 4,735,232 | (18,496 × 256) + 256 = 4,735,232 | (18,496 × 256) + 256 = 4,735,232
Dense 2 | (256 × 128) + 128 = 32,896 | (256 × 128) + 128 = 32,896 | (256 × 128) + 128 = 32,896
Dense 3 | (128 × 16) + 16 = 2064 | (128 × 9) + 9 = 1161 | (128 × 16) + 16 = 2064
Total trainable parameters | 5,122,176 | 4,844,793 | 4,845,696


Table 2 Calculation of number of patches
Dataset | M | N | S | No. of patches
Indian pines | 145 | 145 | 25 | 14,641
Pavia university | 610 | 340 | 19 | 190,624
Salinas scene | 512 | 217 | 19 | 98,306

Fig. 2 Indian pines, Pavia university and Salinas scene, respectively—classification

Table 3 Accuracy comparison
Network | Accuracy (%)
SVM | 88.18
2D CNN | 89.48
3D CNN | 90.40
Hybrid network | 99.79

The number of epochs considered for the proposed hybrid CNN is 100. The loss and accuracy are obtained for both the training and validation samples. Table 3 compares the accuracy of the proposed hybrid network with the 3D CNN, the 2D CNN, and the support vector machine. The hybrid spectral network provides an accuracy of 99.79% (Fig. 3).

6 Conclusion
Hyperspectral image classification is difficult because the ratio between the number of spectral bands and the number of training samples is unfavorable. Three benchmark hyperspectral datasets, Salinas scene, Pavia university, and Indian pines, are used for the classification. 3D or 2D convolution alone cannot provide features as highly discriminative as those of 3D and 2D hybrid

Fig. 3 Plot of Accuracy versus Epoch and loss versus Epoch for Indian pines, Pavia University and Salinas Scene

convolutions. The proposed model is more beneficial than the 2D CNN and 3D CNN. The used 25 × 25 spatial dimension is most suitable for the proposed method. The experimentation is carried on three hyperspectral datasets to analyze and compare the performance metrics. Classification of hyperspectral data using SVM provides an accuracy of 88.18%. The performance of the proposed model is able to outperform 2D CNN (89.48%) and 3D CNN (90.40%) by providing the accuracy of 99.79%. The hybrid CNN is computationally efficient compared to the 3D CNN and for minimum training data it provides enhanced performance.

References

1. Chang C-I (2003) Hyperspectral imaging: techniques for spectral detection and classification. Springer Science and Business Media, vol 1
2. Camps-Valls G, Tuia D, Bruzzone L, Benediktsson JA (2014) Advances in hyperspectral image classification: Earth monitoring with statistical learning methods. IEEE Signal Process Mag 31(1):45–54
3. Liu P, Zhang H, Eom KB (2017) Active deep learning for classification of hyperspectral images. IEEE J Select Top Appl Earth Observ Remote Sens 10(2)
4. Dian R, Li S, Guo A, Fang L (2018) Deep hyperspectral image sharpening. In: IEEE transactions on neural networks and learning systems, vol 29, no 11
5. Yu C, Zhao M, Song M, Wang Y, Li F, Han R, Chang C-I (2019) Hyperspectral image classification method based on CNN architecture embedding with hashing semantic feature. IEEE J Select Top Appl Earth Observ Remote Sens 12(6)

Anomaly Prognostication of Retinal Fundus Images Using EALCLAHE Enhancement and Classifying with Support Vector Machine

P. Raja Rajeswari Chandni

Abstract Ophthalmic diseases are generally not serious at first, but they can have severe consequences if left untreated. Genetic eye disorders have a significant effect across generations, and in addition, man-made disorders caused by unhealthy practices can induce serious conditions such as vision loss, retinal damage, and macular degeneration, for example in young adults due to smoking. Detecting diseases well before they become threatening makes it easier to avoid major damage. The proposed system focuses on providing a first-level investigation for detecting ophthalmic diseases, helping subjects identify anomalous behavior earlier and initiate remedial measures. The retinal fundus images undergo preprocessing and post-processing stages and are then used for training, testing, and classification of disorders such as vitelliform macular dystrophy (VMD), retinal artery and vein occlusion (RAVO), Purtscher’s retinopathy (PR), and diabetic macular edema (ME).

Keywords SVM · Edge-aware local contrast adaptive histogram equalization (EALCLAHE) · Color features · Image processing

P. Raja Rajeswari Chandni (B), Avinashilingam Institute for Home Science and Higher Education for Women, Coimbatore, India

© Springer Nature Singapore Pte Ltd. 2021. E. S. Gopi (ed.), Machine Learning, Deep Learning and Computational Intelligence for Wireless Communication, Lecture Notes in Electrical Engineering 749, https://doi.org/10.1007/978-981-16-0289-4_44

1 Introduction

Ophthalmic diseases pose no threat to human beings initially; however, changes over time can have a serious effect on the subject. A human visual system model (HVSM) is employed by computer vision experts in diagnosing diseases through digital image and video processing by CAD systems. These experts provide simplified models that are easy to understand and to work on for further processing, exploring, and identifying abnormalities. Manual assessments can often be inaccurate when correlating the actual symptoms. Thus, improved and quality-enhanced systems are needed to provide a correct diagnosis at the right time, which is where medical image processing (MIP) algorithms come in [1]. The MIP strategy for working with images includes upgrading a normal fundus image by improving quality, differentiating regions, identifying edges, correcting non-uniformity, measuring entropy, color balancing, etc., and it has a significant effect in processing medical images [2, 3].

The MIP strategy also helps the diagnosis of retinal abnormalities captured in color photographs, where the raw image is unsuitable for identifying all types of retinal disease. Effective preprocessing techniques are therefore needed to produce a contrast-enhanced, color-balanced image free of non-uniform illumination [4]. Many evaluators, such as NIQE and RIQA, are available to assess the differences between the processed and the original image. However, processing color images is not as easy as processing grayscale ones; thus, in this paper, an effective algorithm for reducing non-uniform illumination while preserving the edges is implemented to identify diseases and help society treat them before they worsen.

2 Related Literary Work

Sakthi Karthi Durai et al. [1] discuss the various diseases detected from retinal fundus images: age-related macular degeneration (AMD), cataract, hypertensive retinopathy, and diabetic retinopathy. Various classifiers and preprocessing methods were reviewed, of which adaptive histogram equalization played the major role in preprocessing and SVM gave the best output.

Kandpal and Jain [5] and Sarika et al. [6] deal with the various methods of enhancing the color texture features used in the preprocessing of retinal fundus images, and the CLAHE method with edges and dominant orientations proved to be the best of all. The technique suppresses the non-textured pixels and enhances the textured ones so that a better quality image is obtained, which is assessed with the help of BRISQUE.

Shailesh et al. [7], in a review paper, compare the techniques employed so far from the preprocessing to the classification stages. The initial preprocessing stage mostly used CLAHE together with color moments for an effective image retrieval process. CLAHE proved to be flexible in detecting the non-textured region. The segmentation techniques deployed were ROI, green channel extraction, quantization, thresholding, etc. A series of classifiers was incorporated, among which support vector machine (SVM), radial basis function neural network (RBFNN), artificial neuro fuzzy (ANF), artificial neural network (ANN), random forest (RF), and decision tree performed well.

Onaran et al. [8] discuss the findings of Purtscher’s retinopathy (PR), which is caused mainly by trauma and appears as a small series of cotton wool spots. Two image types, OCT and fundus, are used. On comparing the results of the two, fundus images provided a good starting point for detecting the accumulation of small bilateral cotton wool spots, mainly on the posterior poles of the retina.


Xiao et al. [9] discuss in depth the vector quantization technique that is effective in detecting macular edema (ME). As the color images are 24-bit, the clustering vectors divide the image into eight levels, with the seven threshold values fed to the 8 × 2 × 2 segmentation process. Thus, most of the information is extracted by this segmentation technique for the processing of the RGB image.

Anusha et al. [10] explain the feature extraction techniques used to obtain the color moments, mainly for image retrieval, namely the mean, standard deviation, and skewness of the color distribution. Texture features such as entropy and energy are also obtained to describe the features well. Thus, the image retrieval process is effectively fast using color moments.

3 Proposed System

The proposed system utilizes CAD systems to diagnose physical abnormality from medical data. Firstly, the retinal fundus images are preprocessed with the help of a histogram technique and are also quantized to obtain the color information at intermediate levels. Secondly, the images are segmented using thresholding and quantization, mainly for post-processing, to avoid missing any information that could be useful in detecting abnormality of the retinal surface. Thirdly, features are extracted to obtain the texture details, information regarding feature differentiation, energy, mean amplitude, median, and standard deviation, making it easier for the algorithm to reach a firm decision in detecting diseases. Finally, the outputs of all the above steps are combined and passed to the training step for classification using an SVM classifier, which is then tested for accuracy (Fig. 1).

4 Description of the Schematic Diagram

4.1 Dataset

Retinal fundus images were initially collected from online platforms such as MESSIDOR, DRIVE, STARE, KAGGLE, CHASE, ARIA, and ADCIS, which aim at providing datasets for research purposes. The process starts with the acquisition of images through a digital camera or with fundus images obtained from the dataset repository.


Fig. 1 Schematic overview of the proposed system

4.2 Stage I: Preprocessing

Retinal fundus images are prone to noise such as salt-and-pepper and Gaussian noise; thus, a firm preprocessing technique is required to strengthen the overall region of the image. The introduced EALCLAHE technique initially processes the RGB components separately, along with gamma correction and the G-matrix, to produce the RGB dash (R′G′B′) components. The clip limit is set with a Rayleigh distribution, the edge threshold limit is set so that strong edges above a minimum intensity amplitude are left intact, and the amount of enhancement is set for smoothing the local contrast. Further, the enhanced images are denoised with a denoising convolutional neural network (DnCNN), which has pretrained nets and offers better noise reduction than other techniques. The flowchart of the preprocessing step is shown in Fig. 2. For testing the image quality, the PSNR, SSIM, NIQE, and BRISQUE quality metrics are used (Fig. 3).

The testing parameters for the noise removal used in this proposed method are:

• Peak Signal-to-Noise Ratio (PSNR): The peak signal-to-noise ratio, or PSNR [11], is employed to quantify the difference between the original and the denoised image of size M × N. It is estimated using the SNR equation and is expressed in decibels (dB). The original and denoised images are represented as r(x, y) and t(x, y), respectively.


Fig. 2 Schematic flow of the preprocessing system

$$ \mathrm{PSNR} = 10 \log_{10}\left(\frac{255^{2}}{\mathrm{MSE}}\right) \tag{1} $$

where 255 is the highest intensity value in the grayscale image and MSE is the mean-squared error, given by


Fig. 3 Results of the preprocessing step

$$ \mathrm{MSE} = \frac{\sum_{M,N} \left[ r(x, y) - t(x, y) \right]^{2}}{M \cdot N} \tag{2} $$

• Structural Similarity Index Measure (SSIM): The SSIM index is calculated between two windows x and y of common size N × N in the following way

$$ \mathrm{SSIM}(x, y) = \frac{(2\mu_x \mu_y + C_1)(2\sigma_{xy} + C_2)}{(\mu_x^{2} + \mu_y^{2} + C_1)(\sigma_x^{2} + \sigma_y^{2} + C_2)} \tag{3} $$


Table 1 List of quality metrics for four different test images

Evaluation parameters | Sample image 1 | Sample image 2 | Sample image 3 | Sample image 4
PSNR | 21.2276 | 23.5478 | 22.7431 | 23.9875
SSIM | 0.8631 | 0.8976 | 0.9054 | 0.9127
NIQE | 3.0017 | 3.0006 | 2.9871 | 3.0076
BRISQUE | 25.0067 | 23.1549 | 24.5479 | 24.3103

where μx is the average value of x, μy is the average value of y, σx² is the variance of x, σy² is the variance of y, and σxy is the covariance of x and y (the cross term in the numerator). C1 = (k1 L)² and C2 = (k2 L)² are variables that stabilize the division when the denominator is weak, L is the dynamic range of the pixel values, and the default values are k1 = 0.01 and k2 = 0.03.

• Naturalness Image Quality Evaluator (NIQE): The NIQE no-reference image quality score is a nonnegative scalar value that measures the distance between the natural scene statistics calculated from image A and those of the input model.

Score = niqe(A);

• Blind/Referenceless Image Spatial Quality Evaluator (BRISQUE): BRISQUE predicts the score of the image using a support vector regression model trained on a set of images with corresponding differential mean opinion score values. The BRISQUE score usually varies from 0 to 100; the lower the value, the better the perceptual quality.

Score = brisque(A);

The quality metrics obtained for the proposed system are tabulated in Table 1 (see also Fig. 4).
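The full-reference metrics defined in this section can be sketched in a few lines. The code below is a minimal illustration on small grayscale images stored as nested lists; it computes the MSE and PSNR exactly as defined, and evaluates the SSIM expression once over a whole window using global statistics. Practical SSIM implementations average this quantity over many local windows, so the single-window version, the function names, and the sample pixel values are all assumptions made for illustration.

```python
import math


def mse(ref, test):
    """Mean-squared error between two equally sized grayscale images."""
    rows, cols = len(ref), len(ref[0])
    total = sum((ref[x][y] - test[x][y]) ** 2
                for x in range(rows) for y in range(cols))
    return total / (rows * cols)


def psnr(ref, test, peak=255.0):
    """Peak signal-to-noise ratio in dB; infinite for identical images."""
    err = mse(ref, test)
    if err == 0:
        return math.inf
    return 10.0 * math.log10(peak ** 2 / err)


def ssim_window(x, y, dynamic_range=255.0, k1=0.01, k2=0.03):
    """SSIM of two windows computed from their global statistics."""
    xs = [v for row in x for v in row]
    ys = [v for row in y for v in row]
    n = len(xs)
    mu_x, mu_y = sum(xs) / n, sum(ys) / n
    var_x = sum((v - mu_x) ** 2 for v in xs) / n
    var_y = sum((v - mu_y) ** 2 for v in ys) / n
    cov_xy = sum((a - mu_x) * (b - mu_y) for a, b in zip(xs, ys)) / n
    c1 = (k1 * dynamic_range) ** 2
    c2 = (k2 * dynamic_range) ** 2
    numerator = (2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)
    denominator = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return numerator / denominator


# Made-up 2 x 2 images, used only to exercise the functions.
original = [[100, 100], [100, 100]]
denoised = [[110, 90], [100, 100]]
print(mse(original, denoised), psnr(original, denoised),
      ssim_window(original, denoised))
```

The no-reference scores (NIQE and BRISQUE) rely on trained models and are not reproduced here.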

4.3 Stage II: Segmentation

Though the previous step improves the image data by suppressing undesirable distortions, segmentation is also necessary to fine-tune the properties of the image. The next step is the conversion of the RGB color space to HSV (hue, saturation, value) in order to obtain the luminance values of each color, which provides better information for the feature extraction process. The individual threshold values for each color are obtained based on range set segmentation and are listed in Table 2. Then, the images are quantized into eight levels in order to obtain the complete color information regarding the hue, saturation, and value of the RGB color space (Fig. 5).

Fig. 4 a, d, g Original images, b, e, h luminance enhanced images, c, f, i images with proposed EALCLAHE output

Table 2 Threshold values obtained and set for the quantization level

Images | Threshold value of H | Threshold value of S | Threshold value of V
Test image 1 | 0.5 | 0.5 | 0.3
Test image 2 | 0.45 | 0.5 | 0.4
Test image 3 | 0.5 | 0.45 | 0.04
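The two operations of this stage, conversion to HSV and quantization into eight levels, can be illustrated per pixel as follows. This is a sketch under stated assumptions: the text does not detail how the thresholds of Table 2 enter the quantization, so seven uniformly spaced thresholds are used here purely for illustration, and the function names are hypothetical.

```python
import colorsys
from bisect import bisect_left


def rgb_to_hsv_pixel(r, g, b):
    """Convert one 8-bit RGB pixel to HSV with all components in [0, 1]."""
    return colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)


# Seven thresholds split the range [0, 1] into eight levels.
# Uniform spacing is an assumption, not a value taken from the paper.
THRESHOLDS = [i / 8 for i in range(1, 8)]


def quantize(value, thresholds=THRESHOLDS):
    """Return the 0-based level index (0..7) of a value in [0, 1]."""
    return bisect_left(thresholds, value)


h, s, v = rgb_to_hsv_pixel(255, 0, 0)   # a pure red pixel
print((h, s, v), [quantize(c) for c in (h, s, v)])
```

Applying `quantize` to every pixel of a channel yields an eight-level label image in the spirit of the output shown in Fig. 5.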

4.4 Stage III: Feature Extraction

Feature extraction is the step that forms the basis for predicting the diseases through classification.


Fig. 5 Segmented image using imquantize()

The proposed system utilizes color texture analysis using the Gabor filter. Gabor filters are flexible and offer more degrees of freedom than Gaussian derivatives. During texture analysis, Gabor features are extracted to analyze a particular frequency of the image in a specified direction around the region of interest. The wavelet transform is also used to support the texture analysis, yielding a combined set of feature vectors. Further, image retrieval makes the diagnostic process more effective by differentiating images based on their color features (Table 3).
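The statistical texture features used in this stage (energy, contrast, entropy, and correlation) can be computed from a normalized co-occurrence matrix p(x, y). The sketch below uses the conventional definitions, that is, entropy with a leading minus sign and the natural logarithm, and correlation built on (x − μx)(y − μy); these choices, the function names, and the toy matrix are assumptions for illustration, and the Gabor and wavelet features are not covered.

```python
import math


def energy(p):
    """Sum of squared entries of the normalized co-occurrence matrix."""
    return sum(v ** 2 for row in p for v in row)


def contrast(p):
    """Sum of |x - y|^2 * p(x, y) over all index pairs."""
    return sum(abs(x - y) ** 2 * v
               for x, row in enumerate(p) for y, v in enumerate(row))


def entropy(p):
    """Shannon entropy of the matrix; zero entries contribute nothing."""
    return -sum(v * math.log(v) for row in p for v in row if v > 0)


def correlation(p):
    """Correlation between the row index and the column index."""
    n = len(p)
    px = [sum(p[x][y] for y in range(n)) for x in range(n)]   # row marginals
    py = [sum(p[x][y] for x in range(n)) for y in range(n)]   # column marginals
    mu_x = sum(x * px[x] for x in range(n))
    mu_y = sum(y * py[y] for y in range(n))
    sd_x = math.sqrt(sum((x - mu_x) ** 2 * px[x] for x in range(n)))
    sd_y = math.sqrt(sum((y - mu_y) ** 2 * py[y] for y in range(n)))
    cov = sum((x - mu_x) * (y - mu_y) * p[x][y]
              for x in range(n) for y in range(n))
    return cov / (sd_x * sd_y)


# Toy normalized matrix in which every pixel pair has equal gray levels.
p = [[0.5, 0.0],
     [0.0, 0.5]]
print(energy(p), contrast(p), entropy(p), correlation(p))
```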

Table 3 List of the feature extraction techniques used

Features | Equations
Energy | $\sum_{x=0}^{N-1} \sum_{y=0}^{N-1} p(x, y)^{2}$
Correlation | $\sum_{x=0}^{N-1} \sum_{y=0}^{N-1} \frac{(x - \mu_x)(y - \mu_y)\, p(x, y)}{\sigma_x \sigma_y}$
Entropy | $-\sum_{x=0}^{N-1} \sum_{y=0}^{N-1} p(x, y) \log\big(p(x, y)\big)$
Contrast | $\sum_{x=0}^{N-1} \sum_{y=0}^{N-1} |x - y|^{2}\, p(x, y)$
Mean | $\frac{1}{n} \sum_{i=0}^{L-1} \sum_{j=0}^{L-1} r_{ij}\, p(r_{ij})$
Standard deviation | $\sqrt{\frac{\sum_{i=1}^{N} (x_{ij} - \bar{x})^{2}}{N}}$

4.5 Stage IV: Classification

This step differentiates between similar features. Features such as color moments, mean amplitude, standard deviation, energy, entropy, and color textures are obtained from the previous step and are fed to the classifier. SVMs are efficient classifiers widely used in machine learning research. There are several functional approaches, including polynomial, radial basis, and neural network functions. The linear SVM classifier maps points into separate categories such that they are divided by as wide a gap as possible. Thus, a hyperplane is selected to classify the dataset provided, and the plane must satisfy the condition

$$ y_i \left[ (w \cdot x_i) + b \right] \ge 1 - \varepsilon_i, \qquad \varepsilon_i \ge 0 \tag{4} $$

where w is the weight vector, b is the bias, and εi is the slack variable.

The proposed system is classified effectively using the support vector machine (SVM). The system uses a 7:3 ratio for training and testing on the image database. Further, the classified images are inspected with the help of a confusion matrix in order to examine the classifier's performance. The accuracy of the system is represented graphically in Fig. 7.
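The soft-margin condition above can be checked numerically: for a fixed hyperplane (w, b), the smallest slack that satisfies the constraint for a sample is max(0, 1 − y(w · x + b)). The sketch below evaluates this for a hand-picked hyperplane; the weights, bias, and sample points are invented for illustration and are not taken from the trained classifier, and no training is performed.

```python
def decision_value(w, x, b):
    """Signed score w . x + b of a sample with respect to the hyperplane."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def slack(w, b, x, y):
    """Smallest slack satisfying y * (w . x + b) >= 1 - slack, slack >= 0."""
    return max(0.0, 1.0 - y * decision_value(w, x, b))


# Hypothetical hyperplane and labeled samples (label is +1 or -1).
w, b = [1.0, -1.0], 0.0
samples = [
    ([2.0, 0.0], +1),   # outside the margin on the correct side
    ([0.5, 0.0], +1),   # correct side but inside the margin
    ([1.0, 0.0], -1),   # wrong side of the hyperplane
]

slacks = [slack(w, b, x, y) for x, y in samples]
print(slacks)
```

A slack of zero means the sample respects the margin, a value between 0 and 1 means it lies inside the margin but is still correctly classified, and a value above 1 means it is misclassified.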

4.6 Results and Conclusion

The classified images are tested with the help of the confusion matrix, which also gives the number of misclassified images, and the per-class accuracies are plotted in the graph. The accuracy is estimated using the TP, TN, FP, and FN parameters. The overall accuracy of the system is found to be 94.3%. Using further comparison techniques could improve the quality of the system (Figs. 6, 7 and Table 4).

$$ \text{Accuracy}\,\% = \frac{TP + TN}{TP + FP + TN + FN} \times 100 \tag{5} $$
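As a worked illustration of the accuracy formula, the sketch below computes the percentage from the four confusion-matrix counts. The counts are made-up numbers chosen only to exercise the formula and are not results from the paper.

```python
def accuracy_percent(tp, tn, fp, fn):
    """Accuracy in percent from true/false positive and negative counts."""
    return 100.0 * (tp + tn) / (tp + fp + tn + fn)


# Hypothetical counts for one class treated as the positive class.
example = accuracy_percent(tp=45, tn=40, fp=5, fn=10)
print(f"{example:.1f}%")
```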


Fig. 6 Confusion matrix for the proposed system

Fig. 7 Graphical representation of the accuracies obtained for each class

Table 4 Accuracy values for individual class

Categories | Accuracy (%)
VMD | 91.3
RAVO | 92.4
PR | 85.1
ME | 92.4

5 Conclusion and Future Scope

The ability to detect various diseases at an early stage is of great value to society. This system can be useful in the healthcare domain, especially in routine checkups, so that diseases can be caught in their initial stages. It helps in recognizing a person's disease and minimizes the cost of diagnosis. The proposed system is based on digital image processing techniques; it combines retinal color, shape, and texture features into a feature vector for texture analysis and then predicts the disease using the supervised SVM learning algorithm. Our future enhancement is to implement this project with a hardware setup to overcome some limitations of image processing, and to further develop the model into a product or even an Android application that can be used to conduct tests without human supervision and without undue diagnostic cost.

References

1. Sakthi Karthi Durai B et al (2020) A research on retinal diseases prediction in image processing. Int J Innov Technol Explor Eng 9(3S):384–388. https://doi.org/10.35940/ijitee.c1082.0193s20
2. Vonghirandecha P, Karnjanadecha M, Intajag S (2019) Contrast and color balance enhancement for non-uniform illumination retinal images. Tehnički glasnik 13(4):291–296. https://doi.org/10.31803/tg-20191104185229
3. Rupail B (2019) Color image enhancement with different image segmentation techniques. Int J Comput Appl 178(8):36–40. https://doi.org/10.5120/ijca2019918790
4. Jiménez-García J, Romero-Oraá R, García M, López-Gálvez M, Hornero R (2019) Combination of global features for the automatic quality assessment of retinal images. Entropy 21(3):311. https://doi.org/10.3390/e21030311
5. Kandpal A, Jain N (2020) Retinal image enhancement using edge-based texture histogram equalization. In: 2020 7th international conference on signal processing and integrated networks (SPIN), Noida, India, pp 477–482. https://doi.org/10.1109/SPIN48934.2020.9071108
6. Sarika BP, Patil BP (2020) Automated macula proximity diagnosis for early finding of diabetic macular edema. In: Research on biomedical engineering. Springer, Berlin. https://doi.org/10.1007/s42600-020-00065-9
7. Shailesh K, Shashwat P, Basant K (2020) Automated detection of eye related diseases using digital image processing. In: Handbook of multimedia information security: techniques and applications, pp 513–544. https://doi.org/10.1007/978-3-030-15887-3_25
8. Onaran Z, Akbulut Y, Tursun S, Oğurel T, Gökçınar N, Alpcan A (2019) Purtscher-like retinopathy associated with synthetic cannabinoid (Bonzai) use. Turkish J Ophthalmol 49(2):114–116. https://doi.org/10.4274/tjo.galenos.2018.67670
9. Xiao W, He L, Mao Y, Yang H (2018) Multimodal imaging in Purtscher retinopathy. Retina 38:1. https://doi.org/10.1097/IAE.0000000000002218
10. Anusha V, Reddy V, Ramashri T (2014) Content based image retrieval using color moments and texture. Int J Eng Res Technol 3
11. Gonzalez R, Woods R Digital image processing

Analysis of Pre-earthquake Signals Using ANN: Implication for Short-Term Earthquake Forecasting

Ramya Jeyaraman, M. Senthil Kumar, and N. Venkatanathan

Abstract Earthquakes are complex physical phenomena. The heterogeneous nature of the earth’s interior is the reason for the unpredictable nature of earthquake occurrence. In recent years, scientists across the world have been trying to develop models using multiparameter earthquake precursors. In this paper, we discuss the association of abnormal irregularity in solid earth tides (SET) and anomalous transient changes in outgoing longwave radiation (OLR) with major earthquakes, and we utilize a neural network to forecast the occurrence of notable earthquakes. We consider the Simeulue, Indonesia region and earthquakes of magnitude >5.0 that took place during the period from 2004 to 2014. For the analysis, the following parameters, which appear before the earthquake, are selected as input parameters: the anomaly date of the solid earth tide, the weights assigned to the continual anomaly days, the OLR anomaly date, distance, day of OLR anomaly, latitude, longitude, and anomaly index. The date of occurrence of the earthquake, latitude, longitude, depth, and magnitude are selected as output parameters for the neural network. We have used the Elman backpropagation neural network (EBPNN) model for forecasting these output parameters. The analysis of the results given by the EBPNN has shown reasonable accuracy. Even though the results have to be tested in other regions, the results of the EBPNN show encouraging signs for developing an effective short-term earthquake-forecasting model.

Keywords Earthquake forecasting · Solid earth tides · Outgoing longwave radiation · Artificial neural network

R. Jeyaraman · M. Senthil Kumar · N. Venkatanathan (B), SASTRA Deemed to be University, Thanjavur, Tamil Nadu, India

© Springer Nature Singapore Pte Ltd. 2021. E. S. Gopi (ed.), Machine Learning, Deep Learning and Computational Intelligence for Wireless Communication, Lecture Notes in Electrical Engineering 749, https://doi.org/10.1007/978-981-16-0289-4_45

1 Introduction

Scientists have associated solid earth tides with the occurrence of earthquakes, as the displacement produced by earth tides affects the motion of the tectonic plates; the results suggest that big earthquakes are triggered by abnormal irregularity in solid earth tides (SET) and anomalous transient changes in outgoing longwave radiation (OLR). Ide [1] has confirmed that the majority of higher magnitude earthquakes are likely to happen when tidal stress is high, although this is limited to specific regions or circumstances. Similarly, transient thermal abnormalities occurring before destructive earthquakes were detected by Russian scientists during the late 1980s through the use of satellite technology. With a scientific understanding of atmospheric earthquake signals and the use of advanced remote sensing instruments, satellite thermal imaging data can serve as an effective tool for detecting OLR anomalies [2].

1.1 Involving Concepts

Machine learning is a subgroup of artificial intelligence (AI) that offers computational and statistical tools to explore data and to train models through complex algorithms. Deep learning, in turn, is a subgroup of machine learning techniques that attempts to imitate the anatomy of the human brain. It uses a multilayer neural network architecture and focuses on predictive analytics, in which complicated models and algorithms are constructed to generate predictions. In making reliable decisions, predictive analytical models provide an upward drive and expose “complex and secret perspectives” by learning from past patterns and historical data relationships.

1.2 Introduction to Neural Networks

Artificial neural networks (ANN) are used when tabular datasets need to be processed and when classification or regression prediction problems need to be solved. Moustra et al. [3] developed an artificial neural network for time series analysis of seismic electric signals, in which the input is the magnitude and the corresponding output is the next-day magnitude, and evaluated its performance. Recurrent neural networks (RNN) are a class of ANN used when the information is in the form of time series data and shows temporal dynamic behavior. Convolutional neural networks (CNN) are used when image data needs to be mapped to an output variable; they work well with data that has a spatial relationship. Asim et al. [4] made predictions using seismic features combined with a support vector regressor and hybrid neural network prediction system, whose performance can be measured for a particular region. Vardaan et al. [5] discussed forecasting earthquakes and trends using a series of past earthquakes. Long short-term memory (LSTM), one of the categories of RNN, is used for modeling the series of earthquakes. The model is trained to predict the future trend of earthquakes. It is compared with a feedforward neural network (FFNN), and LSTM was found to be better than FFNN.

1.3 Elman Backpropagation Neural Network

The Elman neural network is a type of dynamic RNN that follows a modified feedforward topology. Its architecture comprises an input layer, followed by a hidden layer, and finally an output layer. The main advantage of the Elman network is its context input nodes, which memorize the previous values of the hidden nodes. This makes the Elman NN applicable in the fields of dynamic system identification and prediction control. To create the Elman backpropagation neural network, the historical dataset of precursory parameters of the earthquakes that occurred in the Simeulue, Indonesia region from 2004 to 2018 is considered for training, and subsequent testing is done on the historical dataset of precursory parameters of the earthquakes that occurred in the same region during 2004–2014. We have obtained the earthquake catalogs from the United States Geological Survey (USGS). The number of iterations is varied to optimize the forecast of the latitude, longitude, magnitude, and date/time of occurrence of the earthquakes with reasonable accuracy. These precursors were observed several days to months before the occurrence of big earthquakes; hence, we have used ANN to forecast the occurrence of earthquakes.
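The role of the context nodes can be made concrete with a minimal forward pass of an Elman network. In the sketch below, the hidden activations of the previous time step are fed back as an extra input, so the same input produces different outputs depending on what came before. The tanh activation, the tiny layer sizes, and all weight values are assumptions chosen for illustration; training by backpropagation is not shown, and this is not the configuration used in the paper.

```python
import math


def elman_step(x, context, w_in, w_ctx, b_h, w_out, b_out):
    """One time step: returns (output, new hidden state to be used as context)."""
    hidden = []
    for j in range(len(b_h)):
        total = b_h[j]
        total += sum(w_in[j][i] * x[i] for i in range(len(x)))
        total += sum(w_ctx[j][k] * context[k] for k in range(len(context)))
        hidden.append(math.tanh(total))
    output = [b_out[o] + sum(w_out[o][j] * hidden[j] for j in range(len(hidden)))
              for o in range(len(b_out))]
    return output, hidden


def run_sequence(inputs, w_in, w_ctx, b_h, w_out, b_out):
    """Feed a sequence through the network, starting from a zero context."""
    context = [0.0] * len(b_h)
    outputs = []
    for x in inputs:
        y, context = elman_step(x, context, w_in, w_ctx, b_h, w_out, b_out)
        outputs.append(y)
    return outputs


# Hypothetical 1-input, 1-hidden, 1-output network.
outs = run_sequence([[1.0], [1.0]],
                    w_in=[[1.0]], w_ctx=[[0.5]], b_h=[0.0],
                    w_out=[[1.0]], b_out=[0.0])
print(outs)
```

The second output differs from the first although both inputs are identical, which is the memory effect provided by the context nodes.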

2 Study Area

Indonesia is prone to earthquakes due to its location on the Ring of Fire, an arc of volcanoes and fault lines in the Pacific Ocean basin. This horseshoe-shaped zone extends about 40,000 km (25,000 miles) and is where most of the world's earthquakes occur. Several large earthquakes have struck the Indonesian region because its tectonics are highly complex: many tectonic plates, such as the Eurasian Plate, the Australian Plate, and the Philippine Sea Plate, meet at this point, with the Pacific Plate between two oceanic plates [6]. Sumatra sits above the convergent plate boundary, where the Australia Plate is subducted along the Sunda megathrust beneath the Sunda Plate. The convergence on this section of the boundary is strongly oblique, and the strike-slip portion of the plate movement is accommodated along the right-lateral Great Sumatran Fault. The Sunda megathrust activity has triggered several tremendous earthquakes. The February 20, 2008 magnitude 7.4 Simeulue, Indonesia earthquake occurred as a result of thrust faulting on the boundary between the Australia and Sunda plates.


The Australia plate moves north-northeast toward the Sunda plate at a rate of about 55 mm/year at the location of this earthquake [7] (Table 1). In this study, 28 earthquakes of the Simeulue, Indonesia region are analyzed in terms of magnitude (Fig. 1).

Table 1 List of earthquakes occurred in Simeulue, Indonesia region since 2004 with magnitude >6 (data provided by USGS https://earthquake.usgs.gov)

Event | Origin time | Latitude | Longitude | Mag | Depth | Place
25-07-2012 | 00:27:45.260Z | 2.707 | 96.045 | 6.4 | 22 | Simeulue, Indonesia
26-01-2011 | 15:42:29.590Z | 2.205 | 96.829 | 6.1 | 23 | Simeulue, Indonesia
09-12-2009 | 21:29:02.890Z | 2.759 | 95.91 | 6 | 21 | Simeulue, Indonesia
29-03-2008 | 17:30:50.150Z | 2.855 | 95.296 | 6.3 | 20 | Simeulue, Indonesia
20-02-2008 | 08:08:30.520Z | 2.768 | 95.964 | 7.4 | 26 | Simeulue, Indonesia
22-12-2007 | 12:26:17.470Z | 2.087 | 96.806 | 6.1 | 23 | Simeulue, Indonesia
29-09-2007 | 05:37:07.260Z | 2.9 | 95.523 | 6 | 35 | Simeulue, Indonesia
07-04-2007 | 09:51:51.620Z | 2.916 | 95.7 | 6.1 | 30 | Simeulue, Indonesia
11-08-2006 | 20:54:14.370Z | 2.403 | 96.348 | 6.2 | 22 | Simeulue, Indonesia
19-11-2005 | 14:10:13.030Z | 2.164 | 96.786 | 6.5 | 21 | Simeulue, Indonesia
08-06-2005 | 06:28:10.920Z | 2.17 | 96.724 | 6.1 | 23.5 | Simeulue, Indonesia
28-04-2005 | 14:07:33.700Z | 2.132 | 96.799 | 6.2 | 22 | Simeulue, Indonesia
30-03-2005 | 16:19:41.100Z | 2.993 | 95.414 | 6.3 | 22 | Simeulue, Indonesia
26-02-2005 | 12:56:52.620Z | 2.908 | 95.592 | 6.8 | 36 | Simeulue, Indonesia
27-12-2004 | 20:10:51.310Z | 2.93 | 95.606 | 5.8 | 28.9 | Simeulue, Indonesia
28-12-2004 | 03:52:59.230Z | 2.805 | 95.512 | 5 | 24.9 | Simeulue, Indonesia
29-12-2004 | 10:52:52.000Z | 2.799 | 95.566 | 5.4 | 23.2 | Simeulue, Indonesia
01-01-2005 | 01:55:28.460Z | 2.91 | 95.623 | 5.7 | 24.5 | Simeulue, Indonesia
05-02-2005 | 04:09:53.640Z | 2.325 | 95.065 | 5.1 | 30 | Simeulue, Indonesia
09-02-2005 | 01:02:26.190Z | 2.278 | 95.156 | 5 | 28.7 | Simeulue, Indonesia
24-02-2005 | 07:35:50.460Z | 2.891 | 95.729 | 5.6 | 30 | Simeulue, Indonesia
28-03-2005 | 12:56:52.620Z | 2.335 | 96.596 | 5.4 | 28.9 | Simeulue, Indonesia
28-03-2005 | 16:34:40.570Z | 2.087 | 96.503 | 5.5 | 30.6 | Simeulue, Indonesia
28-03-2005 | 16:44:29.780Z | 2.276 | 96.183 | 5.1 | 30 | Simeulue, Indonesia
28-03-2005 | 17:03:34.430Z | 2.751 | 96.049 | 5.4 | 30 | Simeulue, Indonesia
28-03-2005 | 18:48:53.500Z | 2.467 | 96.758 | 5.1 | 26.8 | Simeulue, Indonesia
28-03-2005 | 19:54:01.090Z | 2.889 | 96.411 | 5.6 | 29.2 | Simeulue, Indonesia
28-03-2005 | 23:37:31.350Z | 2.914 | 96.387 | 5.4 | 28.6 | Simeulue, Indonesia


Fig. 1 Location map of the Simeulue, Indonesia region. Red marks the epicenters of the 2004–2018 earthquakes in the region that have a magnitude above 6, as listed in Table 1

3 Methodology

3.1 Anomalous Outgoing Longwave Radiation

Outgoing longwave radiation (OLR) at the top of the atmosphere is the thermal energy released into space by the earth's surface and atmosphere; it is part of the earth's radiation budget. OLR values are observed by NOAA polar-orbiting spacecraft and are expressed as an energy flux in W/m². From 160 °E to 160 °W longitude, the data is centered on equatorial regions. The raw data is converted into a regular index of anomalies with a grid resolution of 1° × 1° (latitude × longitude). OLR data is time series data spanning the entire planet. Variations in the anomaly were found 3–60 days or seven months before the devastating earthquakes. Anomaly variations of the OLR flux are determined from the mean OLR flux of the past 10 years,

$$\bar{E}_{lm\tau} = \sum_{\tau=1}^{t} E_{lm\tau} \qquad (1)$$


where t is the number of predefined previous years for which the mean OLR flux is determined for a given location (l, m) and time (τ).

$$\text{Flux index } (\Delta E_{lm\tau}) = \frac{E_{pq\tau} - \bar{E}_{lm\tau}}{\sigma_{lm\tau}} \qquad (2)$$

where
$\Delta E_{lm\tau}$ is the flux index value for latitude p, longitude q, and data acquisition time (τ),
$E_{pq\tau}$ is the current OLR flux value determined for spatial coordinates (p, q) and time (τ),
$\bar{E}_{lm\tau}$ is the mean OLR flux value determined for spatial coordinates (p, q) and time (τ).

The anomalous nature of the energy flux index, $[\Delta E_{lm\tau}]^{*}$, is determined by removing flux index values below the +2σ level of the mean OLR flux, which helps in maintaining the duration of the anomalous flux observed.

$$\text{If } \Delta E_{lm\tau} \geq \bar{E}_{lm\tau} + 2\sigma_{lm\tau} \text{ then } \Delta E_{lm\tau} = [\Delta E_{lm\tau}]^{*}, \text{ else } \Delta E_{lm\tau} = 0 \qquad (3)$$

where $[\Delta E_{lm\tau}]^{*}$ is the anomalous energy flux index observed for the given location and time.
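The anomaly index defined above can be illustrated with a minimal Python sketch. It rests on several assumptions that are not stated in the text: the multi-year mean is taken as the arithmetic mean of the historical values, σ is the population standard deviation of those values, and the +2σ rule is read as "keep the standardised index only when the current value is at least two standard deviations above the mean". The function names and sample values are hypothetical.

```python
from statistics import mean, pstdev


def flux_index(history, current):
    """Standardised deviation of the current OLR value from the historical mean.

    `history` holds the OLR flux for the same grid cell and calendar time in
    previous years (assumption: arithmetic mean and population std. dev.).
    """
    mu = mean(history)
    sigma = pstdev(history)
    if sigma == 0:
        return 0.0  # no variability in the record, so no anomaly can be scored
    return (current - mu) / sigma


def anomalous_flux_index(history, current, k=2.0):
    """Keep the index only when it reaches the +k sigma level, else return 0."""
    idx = flux_index(history, current)
    return idx if idx >= k else 0.0
```

For example, with ten hypothetical yearly values averaging 230 W/m², a current value of 236 W/m² is retained as an anomaly, while 231 W/m² is set to zero.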

3.2 Elman Backpropagation Neural Network

Earthquake prediction is studied here using recurrent neural networks (RNN). The pre-earthquake seismic, solid earth tide, and atmospheric parameters of earthquakes that occurred in the Simeulue, Indonesia region over the past 14 years (2004–2018) are given as input for the machine to learn. The neural network is used in two stages, namely training and testing. A retrospective analysis has been made of the earthquakes of the Simeulue, Indonesia region. In the present analysis, we looked at the stress during syzygy from the perspective of its effect on seismic activity through solid earth tides. The recurrent dy/dt deceleration causes the interlocking of the interface of the tectonic plate, leading to increased tidal stresses. Such rapid deformation therefore shifts the state of stress over the entire seismic zone, leading to the release of maximum energy and thus increasing the risk of earthquakes. Given this triggering effect of solid earth tides, earthquakes of greater magnitude are likely to occur (Fig. 2). Our current work focuses on the creation of a neural network model using a recurrent neural network, and the performance of the Elman backpropagation neural network is investigated with the corresponding input parameters to determine the accuracy of the network.

Fig. 2 Elman backpropagation network
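To make the recurrent structure of an Elman network concrete, the following is a minimal sketch of one forward step in plain Python. It is an illustration only, not the authors' implementation: bias terms are omitted, the hidden activation is tanh, the output layer is linear, and all names and weights are hypothetical. The defining feature is that the hidden activations of one step are fed back as the context input of the next step.

```python
import math


def elman_step(x, context, w_in, w_ctx, w_out):
    """One forward step of a simple Elman network.

    x       : input vector
    context : hidden activations from the previous step
    w_in    : hidden-by-input weight matrix
    w_ctx   : hidden-by-hidden context weight matrix
    w_out   : output-by-hidden weight matrix
    Returns (output, new_context).
    """
    hidden = []
    for j in range(len(w_in)):
        s = sum(w_in[j][i] * x[i] for i in range(len(x)))
        s += sum(w_ctx[j][k] * context[k] for k in range(len(context)))
        hidden.append(math.tanh(s))
    output = [
        sum(w_out[o][j] * hidden[j] for j in range(len(hidden)))
        for o in range(len(w_out))
    ]
    # The hidden state becomes the context for the next time step.
    return output, hidden
```

Calling `elman_step` repeatedly over a sequence, passing the returned context back in each time, gives the network its memory of earlier inputs.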

4 Findings and Discussions

Florido et al. [8] used the Levenberg–Marquardt backpropagation algorithm to train a feed-forward ANN and compared the results with those of the simple backpropagation algorithm. The number of neurons in the hidden layer was established empirically [9]. Cross-correlation is used to find the degree of similarity between tidal and OLR occurrence and the magnitude of earthquakes. In the present work, we identified a relationship between SET, OLR, and time of earthquake occurrence by analyzing the earthquakes in 2004 and 2014. The results indicate that solid earth tides (SET) contribute to the interlocking of tectonic plates, which releases huge amounts of thermal radiation through the heat of transformation phenomenon and results in irregular tectonic activity.

4.1 Training Function

A neural network model for earthquake prediction has been developed based on the network training function trainlm. Trainlm is a supervised neural network training algorithm that updates the weight and bias values according to Levenberg–Marquardt optimization.
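For reference, the standard Levenberg–Marquardt update can be written as below. This is the textbook form of the step, given here as a sketch rather than as a statement of the exact settings used in this work: w collects the weights and biases, e is the vector of network errors, J is the Jacobian of e with respect to w, and μ is a damping parameter that moves the step between gradient descent (large μ) and Gauss–Newton (small μ).

```latex
w_{k+1} = w_k - \left( J^{\top} J + \mu I \right)^{-1} J^{\top} e
```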


4.2 Layers and Description of Nodes

The input layer has eight nodes: two solid earth tide variables (the date of the solid earth tide anomaly and the weights allocated for continual anomaly days), the date of the atmospheric OLR anomaly, distance, day of OLR anomaly, latitude, longitude, and the pre-earthquake anomaly index. There is no fixed approach for choosing the optimum number of hidden nodes.

4.3 Input Nodes

A thermal abnormality in the area is obtained by analyzing the deviation index. Thermal radiation irregularities have been found to occur in the epicenter area several days or months before the event [10]. Radon gas emissions induce an abnormal decrease in relative humidity and a rise in OLR, owing to the core-to-earth drift of H+ ions as tectonic activity rises. This results in a drop in relative humidity at the surface of the earth. Because of the upward acceleration, the anomaly in OLR flux is due to latent heat release [11].

4.4 Variables Involved

The seismic information used in this work is derived from all instrumentally recorded earthquakes reported by the USGS for Simeulue, Indonesia. After removing aftershocks and foreshocks, earthquakes with a magnitude greater than 6 on the Richter scale are considered. The input parameters consist of three spatial variables related to the spatial characteristics of the earthquake, a single time variable, and two anomaly value-related variables.

4.5 Spatial Parameters

Longitude, latitude, and depth of the earthquake are the three parameters allocated to each event.

4.6 Time Variable

This variable covers the day and time between the anomaly and the event, taken at the minimum peak date and the maximum peak date. The peak date difference is the interval between the acceleration and the deceleration observed during the anomaly days (Table 2).

Table 2 List of parameters used for predicting the magnitude of earthquakes that occurred in the Simeulue, Indonesia region since 2004 with magnitude >6 (data provided by USGS, https://earthquake.usgs.gov)

Parameters                        Value
Network style                     Elman backprop
The feature used for training     Train LM
Adaption learning                 Learn GDM
Performance parameter function    MSE
Layers_count                      4
Number of neurons                 10
Transfer function                 Tansig
Training                          Levenberg–Marquardt
Data division                     Random
No. of epochs                     10

Since the earth is a highly heterogeneous medium, the task of predicting earthquakes remains difficult. To ease the difficulty, the neural network method is used to understand the precursors that emerge before the occurrence of major earthquakes due to different physical changes. Recurrent neural networks have a simple structure and are considered an effective tool for solving time sequence problems. They can also reflect the dynamic behavior of the system. Bhatia et al. [12] discussed techniques for evaluating the multilayer perceptron for different input parameters and sets of hyperparameters, and showed that time series analysis can be done effectively using LSTM with different inputs and different sets of hyperparameters.

In this work, a neural network is developed using the Elman backpropagation network. The input from the input layer is fed into the hidden layer, and the output from the hidden layer is then passed to the output layer. The activation function is applied in the intermediate layers of the network to obtain the output. The measured output error is then backpropagated toward the input layer, and the weights are changed accordingly. The same procedure continues until the output matches the target data or the output error is minimized.

In Fig. 3, the observed and predicted largest events using the recurrent neural network are compared using a threshold value of ±0.3 for latitude and longitude. For latitude, the output holds good for 18 of 28 earthquakes, so the forecasted output is 64.29% efficient compared with the actual values, and the error percentage is 35.7. For longitude, only five of 28 earthquakes hold good when compared with the actual values.
Hence, the longitude forecast is less efficient, at 17.86%, with an error percentage of 82.14. Figure 4a shows the expected versus actual values for the depth parameter and Fig. 4b for magnitude, using threshold values of ±2.5 for depth and ±0.45 for magnitude. For depth, the forecasted output holds good for 21 of 28 earthquakes, which is 75% efficient compared with the actual values, with an error percentage of 25. For magnitude, the output holds good for 14 of 28 earthquakes, so the forecasted output is 50% efficient compared with the actual values, and the error percentage is 50. The accuracy can be improved by adding more data to the neural network.

Fig. 3 Comparison of the expected versus actual predicted graph for spatial coordinates (error calculation for latitude and longitude; bins: above −0.3, 0 to −0.3, 0 to 0.3, above +0.3)

Fig. 4 Comparison of the expected versus actual predicted graph for a depth value, b magnitude (depth bins: above −2.5, 0 to −2.5, 0 to 2.5, above +2.5; magnitude bins: above −0.45, 0 to −0.45, 0 to +0.45, above +0.45)

The retrospective research findings follow the general consensus in the field of seismology:

Input: The precursory earthquake parameters for the Simeulue, Indonesia region are taken for analysis: the date of the solid earth tide anomaly and the weights assigned for the continual anomaly days, OLR anomaly date, distance, day of OLR anomaly, latitude, longitude, and anomaly index. Only input parameters that appear before the earthquake are chosen.

Output: The latitude, longitude, depth, and magnitude are the output parameters of the neural network.

The network is modeled with four hidden layers. Training compares a predefined target output with the actual output; the difference is the error, expressed as a percentage.
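The efficiency figures quoted above are simple hit rates: the share of events whose predicted value falls within a tolerance of the observed value. A minimal sketch of that calculation follows; the function name and the sample data are hypothetical, and only the thresholds (±0.3, ±2.5, ±0.45) and the counts out of 28 come from the text.

```python
def hit_rate(predicted, observed, tolerance):
    """Percentage of predictions within +/- tolerance of the observed value."""
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    hits = sum(
        1 for p, o in zip(predicted, observed) if abs(p - o) <= tolerance
    )
    return 100.0 * hits / len(predicted)
```

With 18 of 28 latitude predictions inside ±0.3, this gives 64.29% and an error percentage of 100 − 64.29 = 35.71; the depth and magnitude figures follow the same way with 21 and 14 hits.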


From the retrospective analysis, it is inferred that it is possible to forecast earthquakes (at least in this study region) with reasonable accuracy. The output reveals that the error percentage is minimal and the predicted output holds good for some input versus output parameters, as mentioned above. Although at present earthquake prediction cannot be made with a high degree of certainty, this research offers a scientific method for assessing the short-term seismic hazard potential of an area.

5 Conclusion

The significance of the anomaly obtained in SET shows that its link with OLR has a very high impact on earthquake triggering. Tidal amplitude irregularities of SET trigger plate tectonics and thereby lead to OLR anomalies, which act as a short-term precursor for detecting the time of occurrence of earthquakes. When the tidal triggering is stronger, a larger magnitude earthquake will occur. Through the analysis, we have identified a strong link between the precursors and the location of the devastating earthquake, and between outgoing longwave radiation and the magnitude of the earthquake. The result strongly suggests that a neural network model using multiparameter earthquake precursors can be developed into a short-term earthquake-forecasting model.

In this paper, the correlation of peculiar anomalies in solid earth tides (SET) and anomalous transient shifts in outgoing longwave radiation (OLR) with major earthquakes is obtained. We use a neural network to predict large earthquakes for the Simeulue, Indonesia region, and considered earthquakes with a magnitude greater than 5.0 that occurred during the period from 2004 to 2014. We discuss the issue of anticipating the spatial variables of the earthquake by finding a pattern in the history of earthquakes through a neural network. Preliminary outcomes of this research are discussed. Although the technique is capable of achieving effectiveness, further efforts are being made to reach a stringent conclusion. Also, since the networks tend to miss significant aftershocks and pre-shocks, it is expected that the results can be improved by including more data in the neural network.

Acknowledgements We are greatly indebted to the Ministry of Earth Sciences for financial assistance (Project No: MoES/P. O(seismo)/1(343)/2018). We thank the National Oceanic and Atmospheric Administration for providing outgoing longwave radiation data to the user community.

References

1. Ide S, Yabe S, Tanaka Y (2016) Earthquake potential revealed by tidal influence on earthquake size frequency statistics. Nat Geosci Lett. https://doi.org/10.1038/NGEO2796
2. Carreno E, Capote R, Yague A (2001) Observations of thermal anomaly associated to seismic activity from remote sensing. General Assembly of European Seismology Commission, Portugal, pp 265–269
3. Moustra M, Avraamides M, Christodoulou C (2011) Artificial neural networks for earthquake prediction using time series magnitude data or Seismic Electric Signals. Expert Syst Appl 38:15032–15039. https://doi.org/10.1016/j.eswa.2011.05.043
4. Asim KM, Idris A, Iqbal T, Martínez-Álvarez F (2018) Earthquake prediction model using support vector regressor and hybrid neural networks. PLoS ONE 13(7):e0199004. https://doi.org/10.1371/journal.pone.019900
5. Vardaan K, Bhandarkar T, Satish N, Sridhar S, Sivakumar R, Ghosh S (2019) Earthquake trend prediction using long short-term memory RNN. Int J Electr Comput Eng (IJECE) 9(2):1304–1312. ISSN: 2088-8708. https://doi.org/10.11591/ijece.v9i2.pp1304-1312
6. Boen T (2006) Structural damage in the March 2005 Nias-Simeulue earthquake. Earthq Spectra 22. https://doi.org/10.1193/1.2208147
7. Borrero J, McAdoo BG, Jaffe B, Dengler L, Gelfenbaum G, Higman B, Hidayat R, Moore A, Kongko W, Lukijanto L, Peters R, Prasetya G, Titov V, Yulianto E (2005) Field survey of the March 28, 2005 Nias-Simeulue earthquake and tsunami. Pure Appl Geophys 168:1075–1088. https://doi.org/10.1007/s00024-010-0218-6
8. Florido E et al (2016) Earthquake magnitude prediction based on artificial neural networks: a survey
9. Wang Q, Jackson DD, Kagan YY (2009) California earthquakes, 1800–2007: A unified catalog with moment magnitudes, uncertainties, and focal mechanisms. Seismol Res Lett 80(3):446–457
10. Jing F, Shen X, Kang C (2012) Outgoing long wave radiation variability feature prior to the Japan M9.0 earthquake on March 11, 2011. In: IEEE international geoscience and remote sensing symposium, Munich, pp 1162–1165. https://doi.org/10.1109/IGARSS.2012.6351341
11. Natarajan V, Bobrovskiy V, Shopin S (2019) Satellite and ground-based observation of pre-earthquake signals: a case study on the Central Italy region earthquakes. Indian J Phys
12. Bhatia AA, Pasari S, Mehta A (2018) Earthquake forecasting using artificial neural networks. In: The international archives of the photogrammetry, remote sensing and spatial information sciences, vol XLII-5

A Novel Method for Plant Leaf Disease Classification Using Deep Learning Techniques

R. Sangeetha and M. Mary Shanthi Rani

Abstract Agriculture is one of the important sectors that influence the Indian economy. One of the greatest challenges to agricultural productivity is plant disease, which is prevalent in almost all crops. Hence, plant disease detection has become an active research area for enhancing agricultural productivity. Automated detection of plant diseases is hugely beneficial to farmers, as it reduces the manual workload of monitoring and detects the symptoms of diseases at a very early stage. In this work, an innovative method to categorize tomato and maize plant leaf diseases is presented. The efficiency of the proposed method has been analyzed with the plant village dataset.

Keywords Agricultural productivity · Classification · Plant leaf disease

R. Sangeetha · M. Mary Shanthi Rani (B) Department of Computer Science and Applications, The Gandhigram Rural Institute (Deemed To Be University), Gandhigram, Dindigul, Tamil Nadu, India
© Springer Nature Singapore Pte Ltd. 2021 E. S. Gopi (ed.), Machine Learning, Deep Learning and Computational Intelligence for Wireless Communication, Lecture Notes in Electrical Engineering 749, https://doi.org/10.1007/978-981-16-0289-4_46

1 Introduction

Agriculture is one of the most important sectors influencing the economy of developing countries. Agriculture is the main occupation of 60% of the rural populace, and the livelihood of farmers depends solely on their agricultural productivity. The greatest challenge faced by farmers is the prevention and treatment of plant diseases. Despite the hard and sustained efforts of farmers, productivity is affected by crop diseases, which needs to be addressed. With the remarkable innovations in sensor and communication technologies, the agricultural sector is becoming digital, with automated farm practices such as water management, crop disease monitoring, pest control, and precision farming.

Classification and identification of plant disease is one of the important applications of machine learning. Machine learning deals with the development of algorithms that perform tasks mimicking human intelligence. It learns abstractions from data just as human beings learn from experience and observations. Machine learning has become a popular research area with its growing number of applications in computer vision in the fields of medicine, agriculture, remote sensing, forensics, law enforcement, etc. Deep learning is a subset of machine learning that learns data associations using deep neural networks [1]. Several researchers have explored the use of deep neural networks in plant disease classification and detection.

In this paper, a convolutional neural network (CNN) model has been constructed for fast and accurate categorization of diseases affecting tomato leaves and maize leaves. As tomato is one of the income-generating crops of farmers in rural South India, this work has been developed to help them detect disease early. In this research work, a pre-trained CNN model, the smaller VGG16 net, was used to classify the leaf diseases of various plants from the image dataset. The review of the literature is presented in Sect. 2. Our proposed work on plant leaf disease classification is described in Sect. 3. The experimental setup and a discussion of the results in terms of accuracy are presented in Sect. 4. The conclusion and future enhancements are discussed in Sect. 5.

2 Literature Review

Kawasaki et al. [2] developed a deep convolutional neural network method to distinguish healthy cucumber leaves from diseased ones. The CNN model is used to identify two harmful viral infections: melon yellow spot virus (MYSV) and zucchini yellow mosaic virus (ZYMV). The accuracy of this model is 94.9% for cucumber leaf disease.

Mohanty et al. [3] described a CNN model to classify leaf disease using three types of plant leaf dataset: colored images, gray-scaled images, and segmented leaves. Two standard architectures, AlexNet and GoogleNet, are used for classification. With transfer learning, the highest accuracy is 99.27% for AlexNet and 99.34% for GoogleNet.

Sladojevic et al. [4] developed a deep convolutional neural network method to classify plant diseases. Image transformations are used to increase the dataset size. The accuracy of this model is 96.3% with fine-tuning and 95.8% without fine-tuning.

Nachtigall et al. [5] discussed the application of CNN to detect and categorize images of apple trees. AlexNet was used to categorize the disease. They compared a shallow technique, for which the multilayer perceptron was chosen, against a deep convolutional neural network. The accuracy of the CNN model is 97.3% for apple tree leaves.

Brahimi et al. [6] described a convolutional neural network for classifying tomato leaf disease. The tomato leaves are split into nine classes of diseases. Two standard architectures, AlexNet and GoogleNet, are trained either from scratch or with transfer learning. GoogleNet improves the accuracy from 97.71 to 99.18%, and AlexNet improves the accuracy from 97.35 to 98.66%.

Dechant et al. [7] applied deep learning to classify maize plants. Three phases were suggested in this model. In the first phase, several networks were trained; in the second phase, a heat map was produced to indicate the probability of infection in each image. The heat map was used in the final phase to classify the picture. The total accuracy for maize leaves was 96.7%.

Lu et al. [8] applied a CNN model to classify rice leaf diseases. They collected 500 images from the field to build a dataset. AlexNet was used to create a rice disease classifier, with an overall accuracy of 95.48% for rice leaves.

Kulkarni et al. [9] discussed an artificial neural network (ANN) methodology for plant disease detection and classification. A Gabor filter is used for feature extraction, which gives better recognition results. An ANN classifier classifies the various kinds of plant diseases using a combination of color and leaf features.

Ferentinos [10] described CNN models to detect both diseased and non-diseased leaves. Several model architectures were trained, with the best achieving a 99.53% success rate in disease identification.

Fujita et al. [11] applied a CNN classifier to cucumber diseases. The dataset consists of seven different classes, including a healthy class. The work is based on the AlexNet architecture. The accuracy of the proposed work was 82.3%.

3 Materials and Methods

The major objective of this research work is to build an effective convolutional neural network for classification of tomato and maize leaf diseases. Tomato leaves are affected by seven common diseases: target spot, mosaic virus, yellow leaf curl virus, bacterial spot, early blight, late blight, and septoria leaf spot [12]. The three common diseases that affect maize leaves are northern leaf blight, brown spot, and round spot [13].

A CNN, also referred to as a ConvNet, is a category of artificial neural network well suited to object detection and classification problems, specifically in computer vision. It is a deep learning technique consisting of three kinds of layers: the input layer, which is the starting node; the output layer, which is the ending node; and the hidden layers in between, of which there can be several. The hidden layers contain convolution layers, ReLU, pooling (here we used max pooling), and fully connected layers. The following four steps are carried out in the convolution layer.

• Line up the filter with a patch of the image,
• Multiply each image pixel by the corresponding filter value,
• Add the products,
• Divide the sum by the total number of pixels in the filter.
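The four convolution steps listed above can be sketched in plain Python as follows. This is an illustrative reading of the list rather than the authors' code: it uses stride 1 with no padding, a single channel, and hypothetical function names, and it keeps the final division by the number of filter pixels, which deep learning libraries normally leave out.

```python
def filter_response(patch, kernel):
    """Multiply a patch by the filter element-wise, add up, and normalise."""
    rows, cols = len(kernel), len(kernel[0])
    total = 0.0
    for r in range(rows):
        for c in range(cols):
            total += patch[r][c] * kernel[r][c]   # multiply and add
    return total / (rows * cols)                  # divide by pixel count


def convolve(image, kernel):
    """Slide the filter over every position where it fits inside the image."""
    k_rows, k_cols = len(kernel), len(kernel[0])
    out = []
    for top in range(len(image) - k_rows + 1):
        row = []
        for left in range(len(image[0]) - k_cols + 1):
            # Line the filter up with the image patch at (top, left).
            patch = [r[left:left + k_cols] for r in image[top:top + k_rows]]
            row.append(filter_response(patch, kernel))
        out.append(row)
    return out
```

A response near 1 means the patch matches the filter pattern closely, and a response near −1 means it matches the inverse pattern.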


The pooling layer is used to reduce the spatial dimensions of an image. Batch normalization allows each layer of a network to learn somewhat more independently of the other layers. The proposed work uses the deep CNN smaller VGG16, with thirteen layers, for characterizing various sorts of diseases in tomato and maize leaves. The rectified linear unit (ReLU) is an activation function that outputs the positive part of its input. It is the most commonly used activation function, as it learns faster than other functions and is computationally less intensive. Figures 1 and 2 display the workflow of the proposed model. Detailed information on the number of classes and the number of images used in the dataset is given in Tables 1 and 2. Figure 3 shows healthy leaves and leaves affected by the ten types of diseases. The proposed method involves the following three main stages:

1. Preprocessing This step involves the selection and fine-tuning of the relevant dataset.
2. Training This stage is the core of a deep learning process, which trains the CNN model to categorize diseases using the preprocessed dataset.
3. Testing The trained model is validated with the test dataset, and the accuracy of the model is calculated in this stage.

Fig. 1 Flow diagram of our proposed work for tomato leaves


Fig. 2 Pictorial representation of our proposed work for maize leaves

Table 1 Dataset summary for tomato leaves

# Classes for tomato leaves    # Images
Early blight                   1246
Bacterial spot                 2127
Target spot                    1404
Septoria spot                  1771
Yellow leaf curl virus         1963
Late blight                    1909
Mosaic virus                   1246
Tomato healthy leaf            1591
Total                          13,257

3.1 Preprocessing

One of the most vital elements of any deep learning application is training the model on the dataset. In the proposed work, images are taken from the plant village dataset. It consists of 13,257 images of tomato leaves and 3150 images of maize leaves, including both healthy and non-healthy leaves. The dataset is initially divided in the ratio of 80:20 or 70:30 for the training phase and test phase to improve the results. The accuracy of the network depends on the size and proportion of the data taken for training and testing. Overfitting results in high test dataset error, and underfitting leads to both high training and test errors. In the proposed method, the dataset is divided 80:20. All the images are resized to 256 × 256 as a preprocessing step, to reduce the time complexity of the training phase.

Table 2 Dataset summary for maize leaves

# Classes for maize leaves    # Images
Northern leaf blight          854
Brown spot                    857
Round spot                    855
Maize healthy leaf            584
Total                         3150

Fig. 3 a Healthy leaf. b Bacteria spot. c Early blight. d Late blight. e Mosaic virus. f Septoria leaf spot. g Target spot. h Yellow leaf curl virus. i Northern leaf blight. j Brown spot. k Round spot
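As an illustration of the 80:20 division described above, the sketch below shuffles a list of items and splits it into training and test parts. It is an assumption-laden example rather than the authors' pipeline: the function name, the fixed seed, and the use of simple random (non-stratified) sampling are all hypothetical, and image resizing is not covered.

```python
import random


def split_dataset(items, train_fraction=0.8, seed=0):
    """Shuffle items reproducibly and split them into (train, test) lists."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)   # fixed seed: same split each run
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]
```

Applied to the 13,257 tomato images, this yields 10,605 training images and 2,652 test images. In practice the split would usually be done per class so that each disease keeps the same proportion in both parts.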

3.2 Training

In the training phase, the smaller VGG16 model is trained on the dataset with the ReLU activation function. One important feature of ReLU is that it eliminates negative values in the given input by replacing them with zero. This model uses binary cross-entropy rather than categorical cross-entropy.
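The two ingredients named above, ReLU and binary cross-entropy, can be written out in a few lines. The sketch below is a plain-Python illustration with hypothetical function names; the clipping constant `eps` is an assumption added to avoid taking the logarithm of zero.

```python
import math


def relu(values):
    """Replace every negative value with zero and keep the rest unchanged."""
    return [max(0.0, v) for v in values]


def binary_cross_entropy(targets, probabilities, eps=1e-12):
    """Mean binary cross-entropy between 0/1 targets and predicted probabilities."""
    total = 0.0
    for t, p in zip(targets, probabilities):
        p = min(max(p, eps), 1.0 - eps)   # keep log() finite
        total += -(t * math.log(p) + (1 - t) * math.log(1.0 - p))
    return total / len(targets)
```

With binary cross-entropy, each class output is scored as an independent yes/no decision, whereas categorical cross-entropy scores a single distribution over all classes.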


Fig. 4 Sample model for smaller VGG16 net architecture

3.2.1 Smaller VGG16 Net

Simonyan and Zisserman introduced the VGG network architecture. The proposed model uses a pretrained smaller VGG16 net. It has thirteen convolution layers, each followed by a ReLU layer. Max pooling follows some of the convolution layers to reduce the dimensions of the image. Batch normalization helps the network learn faster and achieve higher overall accuracy. Both the ReLU activation function and batch normalization are applied in all experiments. Dropout is a technique used to reduce overfitting of the model during training. The softmax function is used in the final layer of the deep learning-based classifier. The training phase using the VGG16 network is shown in Fig. 4.
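Two of the building blocks mentioned above, max pooling and the softmax output, can be sketched as follows. This is a simplified illustration, not the network's actual layers: it assumes a single-channel feature map, a 2 × 2 window with stride 2, and hypothetical function names.

```python
import math


def max_pool_2x2(feature_map):
    """Keep the largest value in each non-overlapping 2x2 window."""
    pooled = []
    for r in range(0, len(feature_map) - 1, 2):
        row = []
        for c in range(0, len(feature_map[0]) - 1, 2):
            row.append(max(
                feature_map[r][c], feature_map[r][c + 1],
                feature_map[r + 1][c], feature_map[r + 1][c + 1],
            ))
        pooled.append(row)
    return pooled


def softmax(scores):
    """Turn raw class scores into probabilities that sum to one."""
    peak = max(scores)                        # subtract max for stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]
```

Each 2 × 2 pooling step halves the height and width of the feature map, and the class with the largest softmax probability is taken as the predicted disease.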

3.3 Testing

In this stage, the validation set is used to predict whether each leaf is healthy or unhealthy, along with its disease name, in order to estimate the performance of the classifier. Fine-tuning: It helps to improve the accuracy of classification by making small modifications to the hyperparameters and increasing the number of layers.

4 Results and Discussion

The experimental results of our model VGG16 for the plant village dataset are given in Table 3. It lists the classification accuracy of each of the seven diseases along with healthy leaves. Accuracy is defined as the number of correctly classified images divided by the total number of images in the dataset. The dataset contains more than 1000 images under each disease class, with a maximum limit of 1246 images. The accuracy of the model is plotted in Fig. 5 for tomato leaves and in Fig. 6 for maize leaves. Table 4 gives the corresponding accuracies for the maize leaf diseases. Table 5 presents the influence of batch size on the classification accuracy. It is clear from Table 5 that accuracy increases with minibatch size. It is also worth noting that there is not much increase in accuracy for batch sizes 16, 32, and 64. There is a steep rise in accuracy from batch size 2 to 8.

Table 3 Classification accuracy of various tomato leaf diseases using smaller VGG16 net

No. of images   Bacterial spot   Early blight   Late blight   Septoria spot   Target spot   Yellow curl virus   Mosaic virus
400             72.49            68.24          63.29         60.35           40.86         78.97               61
728             84.86            91.59          82.29         75.86           80.75         96.76               91.4
953             90.26            94.86          93.12         85.45           91.5          85.82               78.65
1246            99.94            98.69          98.71         98.46           98.4          99.91               99.74
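The accuracy measure used in the tables is the fraction of correctly classified images, expressed as a percentage. A minimal sketch follows; the function name and the sample labels are hypothetical.

```python
def accuracy(true_labels, predicted_labels):
    """Percentage of images whose predicted label equals the true label."""
    if len(true_labels) != len(predicted_labels) or not true_labels:
        raise ValueError("inputs must be non-empty and of equal length")
    correct = sum(1 for t, p in zip(true_labels, predicted_labels) if t == p)
    return 100.0 * correct / len(true_labels)
```

For instance, three correct predictions out of four test images give an accuracy of 75%.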

Fig. 5 Classification accuracy for tomato leaf


Fig. 6 Classification accuracy for maize leaf

Table 4 Classification accuracy of various maize leaf diseases using smaller VGG16 net

No. of images   Northern leaf blight   Brown spot   Round spot
200             62.94                  68.24        63.29
400             74.52                  78.23        76.12
600             83.75                  84.91        83.08
850             95.17                  96.45        94.65

Table 5 also shows that our model achieves good accuracy, above 97%, with batch size 16 for early blight, yellow curl virus, and mosaic virus. The graphical representation of Table 5 is shown in Fig. 7.

Table 5 Classification accuracy for various batch sizes

Batch size   Bacterial spot   Early blight   Late blight   Septoria spot   Target spot   Yellow curl virus   Mosaic virus
2            72.49            68.24          63.29         68.11           73.58         69.95               81
8            91.97            70.14          93.1          93.78           88.48         97.82               98.21
16           95.39            97.55          94.57         95.95           91.91         98.84               98.91
32           98.7             98.41          97.96         98.16           95.8          99.5                98.24
64           99.05            99             98.25         98.73           98.78         99.12               99.74


Fig. 7 Analysis of classification accuracy for various batch sizes

Table 6 presents the influence of batch size on the classification accuracy for maize leaves. Table 6 clearly demonstrates that accuracy increases with minibatch size. It is also worth noting that there is not much increase in accuracy for batch sizes 16, 32, and 64. There is a steep rise in accuracy from batch size 2 to 8. The graphical representation of Table 6 is shown in Fig. 8. Figure 9 shows the outputs of the proposed model for test images of tomato and maize leaf diseases using the smaller VGG16 net. It is observable from Fig. 4 that our trained model has achieved 98% accuracy in classifying tomato leaf diseases and maize leaf diseases.

Table 6 Classification accuracy for various batch sizes

Batch size   Northern leaf blight   Brown spot   Round spot
2            69.27                  67.87        65.45
8            89.62                  70.53        87.76
16           90.46                  92.47        90.35
32           93.65                  94.01        93.43
64           95.31                  94.56        94.86


Fig. 8 Analysis of classification accuracy for various batch sizes

5 Conclusion

In this paper, a smaller VGG16 net has been used to classify the diseases affecting tomato and maize leaves using the plant village dataset. The model uses thirteen layers instead of the 16 layers in VGG16. The results have demonstrated that the model achieves an accuracy of 99.18% for tomato leaves and 94.91% for maize leaves. Classification accuracy is evaluated with 13,257 images of healthy and unhealthy tomato leaves and 3150 images of maize leaves. The performance of this model has been analyzed for different minibatch sizes and numbers of tomato and maize images. This paper is focused on classifying the diseases in tomato and maize leaves. In the future, this could be extended to classify diseases of other leaves as well.


Fig. 9 The testing accuracy comparison between the Tomato and Maize leaf disease (panels: Tomato Leaf Diseases, Maize Leaf Diseases)

Acknowledgements The experiments were carried out at the Advanced Image Processing Laboratory, Department of Computer Science and Application, The Gandhigram Rural Institute (Deemed to be University), Dindigul, and were funded by DST-FIST.

