Lecture Notes in Electrical Engineering Volume 966
Series Editors Leopoldo Angrisani, Department of Electrical and Information Technologies Engineering, University of Napoli Federico II, Naples, Italy Marco Arteaga, Departament de Control y Robótica, Universidad Nacional Autónoma de México, Coyoacán, Mexico Bijaya Ketan Panigrahi, Electrical Engineering, Indian Institute of Technology Delhi, New Delhi, Delhi, India Samarjit Chakraborty, Fakultät für Elektrotechnik und Informationstechnik, TU München, Munich, Germany Jiming Chen, Zhejiang University, Hangzhou, Zhejiang, China Shanben Chen, Materials Science and Engineering, Shanghai Jiao Tong University, Shanghai, China Tan Kay Chen, Department of Electrical and Computer Engineering, National University of Singapore, Singapore, Singapore Rüdiger Dillmann, Humanoids and Intelligent Systems Laboratory, Karlsruhe Institute for Technology, Karlsruhe, Germany Haibin Duan, Beijing University of Aeronautics and Astronautics, Beijing, China Gianluigi Ferrari, Università di Parma, Parma, Italy Manuel Ferre, Centre for Automation and Robotics CAR (UPM-CSIC), Universidad Politécnica de Madrid, Madrid, Spain Sandra Hirche, Department of Electrical Engineering and Information Science, Technische Universität München, Munich, Germany Faryar Jabbari, Department of Mechanical and Aerospace Engineering, University of California, Irvine, CA, USA Limin Jia, State Key Laboratory of Rail Traffic Control and Safety, Beijing Jiaotong University, Beijing, China Janusz Kacprzyk, Systems Research Institute, Polish Academy of Sciences, Warsaw, Poland Alaa Khamis, German University in Egypt El Tagamoa El Khames, New Cairo City, Egypt Torsten Kroeger, Stanford University, Stanford, CA, USA Yong Li, Hunan University, Changsha, Hunan, China Qilian Liang, Department of Electrical Engineering, University of Texas at Arlington, Arlington, TX, USA Ferran Martín, Departament d’Enginyeria Electrònica, Universitat Autònoma de Barcelona, Bellaterra, Barcelona, Spain Tan Cher Ming, College of Engineering, Nanyang Technological University, Singapore, Singapore Wolfgang Minker, Institute of Information Technology, University of Ulm, Ulm, Germany Pradeep Misra, Department of Electrical Engineering, Wright State University, Dayton, OH, USA Sebastian Möller, Quality and Usability Laboratory, TU Berlin, Berlin, Germany Subhas Mukhopadhyay, School of Engineering and Advanced Technology, Massey University, Palmerston North, Manawatu-Wanganui, New Zealand Cun-Zheng Ning, Electrical Engineering, Arizona State University, Tempe, AZ, USA Toyoaki Nishida, Graduate School of Informatics, Kyoto University, Kyoto, Japan Luca Oneto, Department of Informatics, Bioengineering, Robotics and Systems Engineering, University of Genova, Genova, Genova, Italy Federica Pascucci, Dipartimento di Ingegneria, Università degli Studi “Roma Tre”, Rome, Italy Yong Qin, State Key Laboratory of Rail Traffic Control and Safety, Beijing Jiaotong University, Beijing, China Gan Woon Seng, School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore, Singapore Joachim Speidel, Institute of Telecommunications, Universität Stuttgart, Stuttgart, Germany Germano Veiga, Campus da FEUP, INESC Porto, Porto, Portugal Haitao Wu, Academy of Opto-electronics, Chinese Academy of Sciences, Beijing, China Walter Zamboni, DIEM—Università degli studi di Salerno, Fisciano, Salerno, Italy Junjie James Zhang, Charlotte, NC, USA
The book series Lecture Notes in Electrical Engineering (LNEE) publishes the latest developments in Electrical Engineering—quickly, informally and in high quality. While original research reported in proceedings and monographs has traditionally formed the core of LNEE, we also encourage authors to submit books devoted to supporting student education and professional training in the various fields and applications areas of electrical engineering. The series cover classical and emerging topics concerning: • • • • • • • • • • • •
Communication Engineering, Information Theory and Networks Electronics Engineering and Microelectronics Signal, Image and Speech Processing Wireless and Mobile Communication Circuits and Systems Energy Systems, Power Electronics and Electrical Machines Electro-optical Engineering Instrumentation Engineering Avionics Engineering Control Systems Internet-of-Things and Cybersecurity Biomedical Devices, MEMS and NEMS
For general information about this book series, comments or suggestions, please contact [email protected]. To submit a proposal or request further information, please contact the Publishing Editor in your country: China Jasmine Dou, Editor ([email protected]) India, Japan, Rest of Asia Swati Meherishi, Editorial Director ([email protected]) Southeast Asia, Australia, New Zealand Ramesh Nath Premnath, Editor ([email protected]) USA, Canada Michael Luby, Senior Editor ([email protected]) All other Countries Leontina Di Cecco, Senior Editor ([email protected]) ** This series is indexed by EI Compendex and Scopus databases. **
N. Subhashini · Morris. A. G. Ezra · Shien-Kuei Liaw Editors
Futuristic Communication and Network Technologies Select Proceedings of VICFCNT 2021, Volume 1
Editors N. Subhashini School of Electronics Engineering Vellore Institute of Technology Chennai, Tamil Nadu, India
Morris. A. G. Ezra Lee Kong Chian Faculty of Engineering and Science Universiti Tunku Abdul Rahman Petaling Jaya, Malaysia
Shien-Kuei Liaw Department of Electronic and Computer Engineering National Taiwan University of Science and Technology (NTUST) Taipei, Taiwan
ISSN 1876-1100 ISSN 1876-1119 (electronic) Lecture Notes in Electrical Engineering ISBN 978-981-19-8337-5 ISBN 978-981-19-8338-2 (eBook) https://doi.org/10.1007/978-981-19-8338-2 © The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. The publisher, the authors, and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations. This Springer imprint is published by the registered company Springer Nature Singapore Pte Ltd. The registered company address is: 152 Beach Road, #21-01/04 Gateway East, Singapore 189721, Singapore
About This Book
Every year, communication technologies break through new limits, and the rate of development is no secret. There is still a lot of room for improvement, which allows us to discuss the newest developments and forecast future trends. This book aims at offering new ideas and in-depth information on research findings in the field of communication and networks, and contains the original research work presented at the Virtual International Conference on Futuristic Communication and Network Technologies (VICFCNT 2021), held on 10–11 December 2021 at Vellore Institute of Technology, Chennai. Problems, challenges, prospects, and research findings in communication and network technologies are the primary topics of discussion. The book is published in two volumes and covers cutting-edge research in cyber-physical systems, optical communication and networks, signal processing, wireless communication, antennas, microwave engineering, RF technologies, Internet of Things, MEMS, NEMS and wearable technologies, as well as other contemporary technological advances. It presents state-of-the-art innovations in the field of communication and offers promising solutions to many real-world problems. It will be a valuable resource for readers to expand their knowledge and refine their research ideas, as well as channelling them in the right direction for future research in these areas.
Contents
IoT-Based Monitoring, Communication and Control of Small Wind Turbines Using Azure Cloud Service . . . . . . . . . . . . . . . . . . . . . . . . . . . M. Shreya, V. Nimal Yughan, Jyotika Katyal, and P. Augusta Sophy Beulet Implementation of e-Healthcare Data Acquisition System Using IoT (Internet of Things) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Adarsh Ravi Mishra, Ragini Shukla, and Ravi Mishra Review of Discrete Wavelet Transform-Based Emotion Recognition from Speech . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Aditi Anand, Aishwarya Nambiar, Shruti Pani, and Mohanaprasad Kothandaram
1
13
25
Network Intrusion Detection Using Machine Learning . . . . . . . . . . . . . . . . Pratik Kumar Prajapati, Ishanika Singh, and N. Subhashini
55
Glacier Ice Surface Velocity Using Interferometry . . . . . . . . . . . . . . . . . . . . M. Geetha Priya, D. Krishnaveni, and I. M. Bahuguna
67
A Study on Various Optimization Techniques for Understanding the Challenges, Issues, and Opportunities of Hybrid Renewable Energy Built Microgrid Systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . K. Venkatasubramani and R. Ramya Intrusion Detection System on New Feature Selection Techniques with BFO . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . R. Rajeshwari and M. P. Anuradha
77
89
Frequency and Stability Control of Photovoltaic and Wind-Powered Grid-Connected DC Bus System . . . . . . . . . . . . . . . . . . 105 M. Moovendan, R. Arul, and S. Angalaeswari
An Evaluation of Feature Selection Methods Performance for Dataset Construction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 115 P. Usha and M. P. Anuradha IoT-Based Laboratory Safety Monitoring Camera Using Deep-Learning Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 129 Maddikera Kalyan Chakravarthi, Tamil Selvan Subramaniam, Ainul Hayat Abdul Razak, and Mohd Hafizi Omar ByWalk: Unriddling Blind Overtake Scenario with Frugal Safety System . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 141 Soumya Shaw, S. Siddharth, S. Ramnath, S. Kishore Nithin, Suganthi Kulanthaivelu, and O. S. Gnana Prakasi Levy Flight-Based Black Widow Optimization for Power Network Reconfiguration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 155 S. Dhivya and R. Arul Exploratory Spatial Data Analysis (ESDA) Based on Geolocational Area . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 167 P. Baby Shamini, Shubham Trivedi, K. S. Shriram, R. R. Selva Rishi, and D. Sayyee Sabarish Modified Hill Cipher with Invertible Key Matrix Using Radix 64 Conversion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 175 A. Ashok Kumar, S. Kiran, and D. Sandeep Reddy BT Classification Using Deep Learning Techniques from MRI Images—A Review . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 185 M. Neethu and J. Roopa Jayasingh Design and Development of Automated Smart Warehouse Solution . . . . . 193 B. Nagajayanthi and Roopa JayaSingh Emotion Recognition from Facial Expressions Using Videos and Prototypical Network for Human–Computer Interaction . . . . . . . . . . 205 Divina Lawrance and Suja Palaniswamy A Review on Early Diagnosis of Parkinson’s Disease Using Speech Signal Parameters Based on Machine Learning Technique . . . . . . . . . . . . 217 Rani Kumari and Prakash Ramachandran Investigation of Attention Deficit Hyperactivity Disorder with Image Enhancement and Calculation of Brain Grey Matter Volume using Anatomical and Resting-State functional MRI . . . . . . . . . . 235 K. Usha Rupni and P. Aruna Priya
VLSI Implementation for Noise Suppression Using Parallel Median Filtering Technique . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 251 Pobbathi Nithin Kumar, Shubhada Budhe, A. Annis Fathima, and Chrishia Christudhas Investigation on Performance of CNN Architectures for Land Use Classification . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 261 R. Avudaiammal, Vijayarajan Rajangam, A. Swarnalatha, P. S. Nancy, and S. Pavithra Enhanced ATM Security Using Facial Recognition, Fingerprint Authentication, and WEB Application . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 273 K. V. Gunalan, R. A. Sashidhar, R. Srimathi, S. Revathi, and Nithya Venkatesan Spatial and Temporal Analysis of Water Bodies in Bengaluru Urban Using GIS and Satellite Image Processing . . . . . . . . . . . . . . . . . . . . . 289 S. Meghana and M. Geetha Priya Shoreline Change Detection and Coastal Erosion Monitoring: A Case Study in Kappil–Pesolikal Beach Region of the Malabar Coast, Kerala . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 301 Sushma S. Bharadwaj and M. Geetha Priya A Novel Approach with Hybrid Technique for Monitoring and Leakage Detection of Water Pipeline Using IoT . . . . . . . . . . . . . . . . . . 311 D. Mahesh Kumar, BA. Anandh, A. Shankar Ganesh, and R. Sakthivel VGG-16 Architecture for MRI Brain Tumor Image Classification . . . . . . 319 N. Veni and J. Manjula Cryo-facies Mapping of Karakoram and Himalayan Glaciers Using Multispectral Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 329 K. R. Raghavendra, M. Geetha Priya, and S. Sivaranjani Ethereum-Based Certificate Creation and Verification Using Blockchain E. Mutharasan, J. Bharathi, K. Nithesh, S. Bose, D. Prabhu, and T. Anitha
339
IoT and Machine Learning Algorithm in Smart Agriculture . . . . . . . . . . . 355 A. Revathi and S. Poonguzhali Laminar Ice Flow Model-Based Thickness and Volume Estimation of Karakoram Glaciers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 371 S. Sivaranjani and M. Geetha Priya Monitoring of Melting Glaciers of Ny-Ålesund, Svalbard, Arctic Using Space-Based Inputs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 381 B. Shashank and M. Geetha Priya
Supraglacial Debris Cover for Ny-Ålesund Using Sentinel-2 Data . . . . . . 391 S. Dhanush and M. Geetha Priya Flood Mapping and Damage Assessment of Odisha During Fani Cyclone Using HSR Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 401 C. Rakshita, M. Geetha Priya, and D. Krishnaveni Disability Assistance System Using Speech and Facial Gestures . . . . . . . . 411 B. N. Ramkumar, S. L. Jayalakshmi, R. Vedhapriyavadhana, and R. Girija A Deep Learning Neural Network Model for Predicting and Forecasting the Cryptocurrency–Dogecoin Using LSTM Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 423 N. Shivaanivarsha, M. Shyamkumar, and S. Vigita Crop Monitoring of Agricultural Land in Chikkaballapura District of Karnataka Using HSR Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 437 A. Sowjanya and M. Geetha Priya RGB-to-Grayscale Conversion Using Truncated Floating-Point Multiplier . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 451 S. Sankar Ganesh and J. Jean Jenifer Nesam IoT-Based System Development for Online Power Quality Condition Monitoring of Transformer . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 467 A. Vijayalakshmi, R. Omana, S. Sanjiti, G. Abdul Samath, and B. Ebenezer Abishek A Cloud-Based Prediction and Self-Diagnosis System for PCOS Using Machine Learning Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 477 Jishnu Saurav Mittapalli, Kush Khanna, Jainav Amit Mutha, and Saranya Nair AN2DROM: Anti-drowsiness Device for Motorcyclist . . . . . . . . . . . . . . . . . 485 Maddikera Kalyan Chakravarthi, Tamil Selvan Subramaniam, M. F. A. B. Mohamad Basri, Y. V. Pavan Kumar, D. John Pradeep, and Ch. Pradeep Reddy One-Dimensional Flood Modeling of River Kaveri Using HEC-RAS . . . . 495 Mahesh B. and Geetha Priya M. Face Recognition-Based Attendance System with a Mobile Application Using Raspberry Pi . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 503 Yepuganti Karuna, Vandana Sai Sumanth, Allanki AkshayRao, Pandeti Sanjay Varma, and Saladi Saritha The Contemporary State of Glacial Lakes in Chandra Basin, Western Himalayas: A Case Study in 2020 . . . . . . . . . . . . . . . . . . . . . . . . . . . 519 S. Sriram and M. Geetha Priya
AI-Enabled Dimming Streetlight with Energy Optimization . . . . . . . . . . . 527 Akhil Pathak, M. S. Bala Murugan, and Manoj Kumar Rajagopal A Survey of QoE Framework for Video Services in 5G Networks . . . . . . . 541 K. B. Ajeyprasaath, P. Vetrivelan, Elizabeth Chang, and Sankara Gomathi Linearization of R-R Peak in Abdominal ECG Signals for Fetal ECG Separation Using Adaptive Filter . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 553 D. Edwin Dhas and M. Suchetha Accident Alert and Intensity Predictive System with Machine Learning Approach . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 565 Saiteja Ailneni, Anurag Sangem, and S. Sofana Reka IoT-Based Smart Health Monitoring System Using Cloud Services . . . . . 575 Karthik Patelkana, Charan Devapatla, and R. Ramesh Noninvasive Detection of Alzheimer’s Disease from Conversational Speech Using 1D-CNN . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 583 John Sahaya Rani Alex, Rishikesh Bhyri, Gowri Prasood Usha, and S. V. Arvind Machine Learning for Diabetes Detection . . . . . . . . . . . . . . . . . . . . . . . . . . . . 593 K. Annarose, Arushi Adhar, and Sathiya Narayanan
About the Editors
N. Subhashini is an Associate Professor in the School of Electronics Engineering, Vellore Institute of Technology, Chennai. She has over 16 years of teaching and research experience. She received her B.E degree from University of Madras, Master’s degree from College of Engineering, Guindy, India and PhD from VIT University. She is a gold medalist in her Post graduation. She has several research papers published in reputed peer-reviewed journals and conferences. Her research interests include optical metro/access networks, FTTx technologies, Next Generation architectures and services, optical fiber technology, WDM systems, network and information security. Morris. A. G. Ezra received his B.E. degree from Bharathiar University, his M.E. degree from Anna University, and his Ph.D. degree from Multimedia University, Malaysia. He started his career with the Karunya Institute of Technology as a lecturer in 1993 before moving to Malaysia in 1998. Professor Ezra has over 23 years of experience in the academic field. He has secured national and international research grants worth more than RM 1 million. He is actively involved in supervising undergraduate and postgraduate students. His research areas include digital signal processing, wireless ad-hoc networks, mobile communication, optimization using PSO, and GA/IGA. He has published over 40 papers in international journals, conferences, and co-authored book chapters. Shien-Kuei Liaw received double Ph.D. degrees from National Chiao-Tung University in photonics engineering and National Taiwan University in mechanical engineering. He joined the faculty of Taiwan Tech (also known as NTUST) in 2000. Currently, Prof. Liaw is Chairman of the Department of Electronics and Computer Engineering and Graduate Institute of Electro-Optical Engineering, NTUST. Besides inventing 40 patents, he authored and co-authored more than 280 journal articles and international conference presentations in optical communication, fiber sensing, and
optical devices. Professor Liaw was an academic visitor to the University of Oxford and the University of Cambridge in 2011 and 2018, respectively. He gave presentations as a keynote speaker or an invited speaker at many conferences. He also served as a guest editor for several textbooks. Professor Liaw was President of the Optical Society (OSA), Taiwan Section, and Secretary-General of Taiwan Photonics Society. Professor Liaw is a senior member of IEEE and OSA.
IoT-Based Monitoring, Communication and Control of Small Wind Turbines Using Azure Cloud Service M. Shreya, V. Nimal Yughan, Jyotika Katyal, and P. Augusta Sophy Beulet
Abstract Clean energy development is significant for combating global climate change and limiting its most devastating effects. A major challenge faced when trying to use wind energy to generate power is that wind is a variable. Wind farms are generally located at remote places, far away from the research centers. The time, manpower and cost required to collect the data from the turbines is one of the biggest challenges faced by the wind energy sector. The relatively newer wind turbines are equipped with a supervisory control and data acquisition also known as SCADA. However, the conventional turbines which have been installed for many years are not equipped with the SCADA system. This makes data collection and controlling of such turbines a very big task. This might result in such turbines becoming of no avail. Thus, a system has been proposed to help mitigate this issue through the use of a cloud service enabled with an IoT hub. A Raspberry Pi installed at each turbine will collect the values from the turbine sensors. The node red running on it will send these values to the Microsoft Azure cloud service. Through the use of Azure cloud service, the data from all turbines can be monitored at any place with the help of the Internet. The system promises to establish a two-way communication with the turbines which make it different from the SCADA system that can be installed on turbines even if they are not at par with the latest technology prevalent. Keywords Wind turbine · Azure · Remote monitoring · SCADA · Node red · Cloud integration
M. Shreya (B) · V. N. Yughan · J. Katyal · P. A. S. Beulet School of Electronics Engineering, VIT University, Chennai, India e-mail: [email protected] V. N. Yughan e-mail: [email protected] J. Katyal e-mail: [email protected] P. A. S. Beulet e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 N. Subhashini et al. (eds.), Futuristic Communication and Network Technologies, Lecture Notes in Electrical Engineering 966, https://doi.org/10.1007/978-981-19-8338-2_1
1 Introduction Variations in air pressure give rise to the earth's wind system. Uneven solar heating at different places on the earth results in varied air pressures, and wind is the resulting movement of air. The temperature difference between land and sea produces localized wind patterns. Wind speed increases with height, since the roughness of the land slows the wind near the ground. Reliable wind speed is extremely difficult to find in most regions of the earth; some specific areas have an average wind speed of 4–5 m/s, which can be utilized for the generation of electricity. The wind turbine is a device which converts the kinetic energy of the wind into electrical energy [1]. It is designed to take maximum advantage of the wind's kinetic energy using the data obtained from the wind vane and anemometer. The data obtained from the vane is used to align the turbine in the direction of the wind. The wind turns the blades; they provide maximum power at a wind speed of about 11 m/s, and when the wind speed exceeds 25 m/s, the turbine slows down to avoid excessive voltages. The low-speed shaft from the hub drives the gearbox housed in the nacelle, which steps up the rotational speed, and a generator connected to the gearbox through the high-speed shaft produces electricity. The voltage is raised, and the energy is then injected into the electrical grid. A group of wind turbines located in the same area to generate electricity is known as a wind farm or wind power plant. A wind farm should have a minimum of three wind turbines. Most wind farms are installed offshore or on land and have capacities of hundreds of MW. They contribute to the voltage control, frequency and power of a power system and help maintain its stability. Wind farms are located at remote places where significant power can be generated after estimating the average wind speed. It therefore becomes important to have a system that helps to monitor the data from each turbine at the base station, as well as a technology that controls the turbines in order to extract maximum power out of them. The real-time application of the proposed system aids in collecting data from all sensors present on each turbine and storing it in a database. This can further be used for wind power forecasting using time series analysis and for wind resource assessment. Wind speed, wind direction, temperature, and pressure are the four key criteria used to identify an area where wind energy can be efficiently harnessed. The information can also be used to calculate wind parameters like the mean wind speed and the wind shear (wind speed versus turbine height), and to determine whether the wind speed follows the Weibull distribution. The wind power density can be calculated as follows:

Wind power density: P = (1/2) ρ A v^3    (1)

P = (1/2) C_P ρ π R^2 v^3    (2)

where
v: mean wind speed
A: cross-rotational (rotor swept) area
ρ: air density.

To compare the performance of turbines, the C_P versus TSR curve is used:

Coefficient of performance, C_P = Power extracted (P) / Power available (P_w)    (3)

Tip speed ratio, TSR = speed of rotor tip / wind speed    (4)
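As a quick illustration of Eqs. (1)–(4), the following snippet (not part of the original paper) computes the available power, the extracted power and the tip speed ratio; the air density, rotor radius, rotor speed and power coefficient used here are arbitrary example values.

```python
import math

def wind_power_available(rho, radius, v):
    """Eq. (1): P = 0.5 * rho * A * v^3, with A = pi * R^2."""
    area = math.pi * radius ** 2
    return 0.5 * rho * area * v ** 3

def wind_power_extracted(cp, rho, radius, v):
    """Eq. (2): P = 0.5 * C_P * rho * pi * R^2 * v^3."""
    return cp * wind_power_available(rho, radius, v)

def tip_speed_ratio(rotor_rpm, radius, v):
    """Eq. (4): TSR = blade tip speed / wind speed."""
    tip_speed = 2 * math.pi * radius * rotor_rpm / 60.0  # m/s
    return tip_speed / v

# Example values (illustrative only)
rho, radius, v = 1.225, 20.0, 11.0        # kg/m^3, m, m/s
p_avail = wind_power_available(rho, radius, v)
p_out = wind_power_extracted(0.4, rho, radius, v)
print(f"Available power: {p_avail / 1e3:.1f} kW")
print(f"Extracted power (C_P = 0.4): {p_out / 1e3:.1f} kW")
print(f"C_P check (Eq. 3): {p_out / p_avail:.2f}")
print(f"TSR at 30 rpm: {tip_speed_ratio(30, radius, v):.2f}")
```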
To maintain constant TSR, when wind speed (v) increases, we need to increase the rotor speed. Constant TSR generates more power. Supervisory control and data acquisition also known as SCADA is a combination of hardware and software that helps industries to monitor, gather and process realtime data, control processes at remote locations and record events in a logger system. The ability to maintain efficiency, process data and communicate issues in the system makes SCADA very important to an organization as it helps to reduce the time taken to perform these actions manually. Its architecture consists of programmable logic controllers (PLCs) which make communication possible to factory machines, sensors and end devices which then sends the data to computers with SCADA software. The software helps the employees of an organization to monitor the data and analyze it in real time [2]. This SCADA system is now being utilized efficiently in the wind energy sector. The system runs on a computer in the control room of the wind farm or any computer accessing the farm using TCP/IP [3]. This helps in saving the time and energy required by humans to fetch data from each turbine manually. However, the conventional turbines which have been installed for many years are not equipped with the SCADA system. This makes data collection and controlling of such turbines a very big task. This might result in such turbines becoming of no avail. To avoid such a situation and to be able to exploit these wind turbines to the maximum potential, a system utilizing the Internet of Things can be developed. The work presented aims to establish a system where the data from the controllers of each of these turbines are sent to a cloud service, and this data can then be accessed by authorized personnel anywhere in the world using a computer with Internet access.
2 Literature Review The demand for renewable energy resources is on an increase and so is the need to solve the problems encountered when extracting the resources. A review of literature was conducted to analyze the past research related to remote monitoring and communication of wind turbines. A method to analyze the data collected from the wind turbine was proposed in [4]. The RS485 interface collects the data from the controller and sends it to the microcontroller which is then sent to the cloud server using the GSM module. The paper [5] suggested a wind turbine monitoring and control system using IoT in which the sensing unit is interfaced with the Arduino
for signal conditioning and then the Raspberry Pi acts as the IoT device where the parameters are monitored and controlled. The control unit sends out an alert when values exceed safe limits so as to maintain the optimum operation. In another method, the Raspberry Pi constantly senses the sensor data of the turbine and is remotely monitored by pushing the data using MQTT protocol to a dashboard [6]. The turbine is turned ON/OFF only based on the preset conditions given in the Raspberry Pi. In [7], the author put forward an architecture of remote monitoring of wind turbines where the parameter values stored in the data logger are sent to the cloud using Raspberry Pi as the gateway and are monitored real-time using GUI. The authors of [8] came up with a method which is more generalized and not limiting to wind turbines only. The SCADA system is mimicked using Raspberry Pi, and the data collected is sent to the server which has a MySQL database where it is stored. This is then monitored using a GUI. The authors of paper [9] demonstrated a model in which the sensor data is sent to Thinger.io using ESP32 Wi-Fi module and is monitored real time using a dashboard. In the paper [10], the authors propose a system in which the primary objective is to detect cable twist issues in the windmill using a flex sensor. The health of the turbine is supervised remotely, and if any issue, it gets notified using GSM through the PIC microcontroller. Paper [11] presented a method of monitoring and fault diagnosis of wind turbines using IoT. RF tags are placed with each component to identify it (smart turbine), and vibration and torque sensors are used to monitor the health of turbines. The collected data is sent to a central server with smart connection and is then stored and displayed. The authors of paper [12] propose a monitoring system for condition and SSCI detection in real time. The system was developed on an FPGA-CPU controller with IoT enabled along with a data logging facility. Another solution was proposed in [13] to monitor the output voltage and other parameters of 64 turbines in a wind farm which is IoT enabled through the NTP protocol. Power measurement and energy analysis was performed for each IoT node to estimate battery life. Through the survey of literature, it is evident that the authors try to remotely monitor the turbine using IoT solutions and try to establish a control over it by only using the pre-set conditions, but the real need is to develop a lost cost solution as an alternative for the SCADA system. The proposed solution should perform remote monitoring of the turbine and also enable two-way communication so that the turbine can also be controlled remotely as per the authorized user’s need. The monitoring dashboard should also contain various features to efficiently handle the data and gather more insights.
3 Architecture The basic architecture for both the practical and simulation model is the same. The simulation model helped us mimic the functions of sensors present on the turbines and how we can effectively retrieve the data in real time. To accurately explain how
the simulation works and how we should perceive it when implementing it with the actual hardware, we have divided the architecture model and pointed out the required changes.
3.1 Practical Model The wind turbine controller (e.g., a TMC controller) or the data logger of a particular turbine holds the parameters used in wind resource assessment, such as wind speed, wind direction, temperature and pressure, which are continuously monitored. A Python script running on the Raspberry Pi (placed in each turbine) accesses these values and forwards them to the node red flow active on the Raspberry Pi (Fig. 1). The node red flow performs the required preprocessing to extract only the values and drop the overhead data that is not of importance. The values are accumulated for a specified amount of time (e.g., 5 min), and their average is then delivered to the corresponding device in the Azure IoT Hub. The IoT Hub holds the data of all the turbines connected to it. The received data is then processed in an Azure Stream Analytics job using an SQL query to extract the device-specific data, which is monitored in real time using visualization tools like Microsoft Power BI. The cloud-to-device message is sent from the IoT Hub to the node red flow running on the Raspberry Pi, and that signal is transferred to the turbine through a relay, which aids in turning the turbines ON/OFF.
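The node red flow is configured graphically, but the accumulate-and-average step it performs can be sketched in Python purely for illustration. The read_turbine_parameters() helper, the sampling interval and the device identifier below are hypothetical placeholders, not details taken from the paper.

```python
import json
import time

WINDOW_SECONDS = 5 * 60   # accumulate values for 5 minutes, as in the flow

def read_turbine_parameters():
    """Hypothetical helper: one sample from the turbine controller/data logger."""
    return {"wind_speed": 7.4, "wind_direction": 210.0,
            "temperature": 29.5, "pressure": 1008.2}

def averaged_payload(device_id="turbine-01"):
    """Collect samples for the window, average them and build the message body."""
    samples, start = [], time.time()
    while time.time() - start < WINDOW_SECONDS:
        samples.append(read_turbine_parameters())
        time.sleep(5)  # assumed sampling interval
    # Simple arithmetic mean per parameter (wind direction would need a
    # circular mean in practice; kept simple here for illustration).
    avg = {k: sum(s[k] for s in samples) / len(samples) for k in samples[0]}
    avg["deviceId"] = device_id
    return json.dumps(avg)   # JSON string handed to the Azure IoT Hub node
```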
Fig. 1 Architecture of practical model
Fig. 2 Architecture of simulation model
3.2 Simulation Model A circuit designed in TINKERCAD (Fig. 2) will perform the process of generating sensor values and forwarding it to the node red flow. This is similar to the function of obtaining values from the TMC controller and the Python script used to extract values and send it to node red as explained in the practical model. The sensor attached to the Arduino microcontroller will be the simulated version of data from the wind turbine controller or data logger. The data of each device is sent to ThingSpeak through a Wi-Fi module which is then forwarded to the node red flow created using Application Programming Interfaces (APIs). The ThingSpeak merely acts as an interface between TinkerCAD and node red. The node red flow extracts the required data, aggregates the data for a specified amount of time and then formats the aggregated value along with device ID and other information to be sent to that device in Azure IoT Hub. The received data in the hub is processed in a stream analytics job and sent to Power BI for real-time visualization. The cloud-to-device message is sent from the hub to every device or a particular device alone using another node red flow which when received performs the control mechanism of the turbine.
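For reference, the request that the node red HTTP node issues against ThingSpeak can be reproduced directly with a short script. The channel ID and read API key below are placeholders, and mapping the temperature to field1 is an assumption about how the channel was configured.

```python
import requests

CHANNEL_ID = "1234567"          # placeholder ThingSpeak channel
READ_API_KEY = "YOUR_READ_KEY"  # placeholder read key

def latest_temperature():
    """Fetch the most recent feed entry from the ThingSpeak channel."""
    url = f"https://api.thingspeak.com/channels/{CHANNEL_ID}/feeds/last.json"
    resp = requests.get(url, params={"api_key": READ_API_KEY}, timeout=10)
    resp.raise_for_status()
    entry = resp.json()
    return float(entry["field1"])   # assumes temperature was written to field1

if __name__ == "__main__":
    print("Latest simulated turbine temperature:", latest_temperature())
```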
4 Hardware Required 4.1 Raspberry Pi Raspberry Pi is a single-board computer and provides the Raspberry Pi OS (formerly known as Raspbian). The role of Raspberry Pi is to use serial communication to extract turbine data from the logger or controller, process it and send it to Azure IoT Hub through the node red.
4.2 Relay Relay is an electrically operated switch which can be controlled by turning it ON or OFF based on the current flow through it. It can be controlled by low voltages and is used for switching smaller voltage to higher.
4.3 TMC Controller or Any Wind Turbine Controller The orbital TMCs are all-in-one wind turbine control systems with the ability to run in various wind turbine models by various manufacturers. The TMC3 controller was designed to operate with medium-sized turbines with capacity in the range from 10 to 750 kW. The controller performs the control action of the turbine, and various parameters are measured and logged. The values stored in the controller can be accessed from the system using the TMC2 communication software.
5 Software Required 5.1 TinkerCAD TinkerCAD is a free to use, online 3D modeling program. In this project, Arduino is simulated using TinkerCAD as Raspberry Pi cannot be simulated online and it also acts as the controller of the wind turbine along with mimicking Raspberry Pi’s functionality.
5.2 ThingSpeak The ThingSpeak platform is an IoT-specific service with which real-time data from the device can be analyzed in the cloud. Since the data from TinkerCAD cannot be sent to node red directly, ThingSpeak acts as an interface which receives data from TinkerCAD and sends it to node red flow.
5.3 Node Red It is a flow-based programming tool that helps visualize the IoT system through a set of nodes that help to easily integrate various devices in the IoT model. The GPIO pins of Raspberry Pi can be accessed directly from the flow. Node red sends data from
Raspberry Pi to the Azure IoT Hub device and receives cloud-to-device messages to send signal to relay.
5.4 Microsoft Azure IoT Hub Azure IoT Hub is a service that enables bidirectional communication between various IoT devices which is reliable, secure and fully managed. It provides multiple deviceto-cloud, cloud-to-device communication, message routing and monitors device status and events.
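In this work the device-to-cloud send is handled by the node red Azure node; for readers who prefer a script, the same send can be sketched with the Azure IoT Device SDK for Python. The connection string (built from the device's SharedAccessKey in the IoT Hub) is a placeholder.

```python
import json
from azure.iot.device import IoTHubDeviceClient, Message

# Placeholder connection string copied from the device entry in the IoT Hub
CONN_STR = "HostName=<hub>.azure-devices.net;DeviceId=turbine-01;SharedAccessKey=<key>"

def send_reading(reading: dict) -> None:
    """Send one averaged reading as a device-to-cloud message."""
    client = IoTHubDeviceClient.create_from_connection_string(CONN_STR)
    try:
        msg = Message(json.dumps(reading))
        msg.content_type = "application/json"
        msg.content_encoding = "utf-8"
        client.send_message(msg)
    finally:
        client.shutdown()

send_reading({"deviceId": "turbine-01", "temperature": 29.5, "wind_speed": 7.4})
```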
5.5 Power BI The Microsoft Power BI platform is mainly used as a business intelligence software wherein the user can perform data analysis and also manipulate the data effectively. The data of each device can be sent from a stream analytics job in the hub as output to Power BI and required visualization can be performed.
6 Implementation 6.1 Simulation Model The TinkerCAD software is used to replicate the controller function of the wind turbine. We designed three circuits on TinkerCAD that represent three turbines on a wind farm (Fig. 3). The temperature sensor is connected to the analog pin of the Arduino Uno which receives the sensor data. This data is transmitted to ThingSpeak platform (Fig. 4) using the ESP8266 module. This value is then pulled by the node red software and processed there. The node red flow shown in Fig. 5 will remain the same for all the turbines. The data from the ThingSpeak platform is pulled by the HTTP node which is triggered by the inject node every second. This data is converted into a JSON object, and the sensor value is extracted from it. The aggregator node takes an average of all the values received within 10 s and sends it to the function node. The aggregated value from the node is then passed to the function node which formats the message as required by the Azure node. The turbines are registered as separate devices in the IoT Hub. One extra device called ‘common’ is also created in order to control all the turbines at once. Each device has its own ‘SharedAccessKey’ which can be used to securely transmit data from the device to the IoT hub. The data received from the devices to the IoT Hub needs to be processed and visualized in a software. For this, we use
Fig. 3 TinkerCAD turbine circuit
Fig. 4 ThingSpeak platform
Fig. 5 Node red flow for Turbine 1
Azure Stream Analytics and Power BI. We need to first create a Stream Analytics Job on Azure and specify the input, output and query. In this case, the input will be the IoT hub and the output will be the three turbine datasets on Power BI. The query is simply a code used to process the input and output. Here we have more than one device, and hence, we specify the DeviceID. The values are displayed on the Power BI visualization dashboard. The C2D communication is also possible wherein a message can be sent to either a specific turbine or collectively to all the turbines. This message is received and processed by the Rpi, which then instructs the turbine to perform a specific function (say turn ON/OFF).
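On the turbine side the C2D message is handled by the node red flow and a relay; an equivalent handler can be sketched in Python with the device SDK and the Raspberry Pi GPIO library. This is only an illustrative sketch: the relay pin number and the "ON"/"OFF" message convention are assumptions, not values from the paper.

```python
from azure.iot.device import IoTHubDeviceClient
import RPi.GPIO as GPIO

RELAY_PIN = 17   # assumed BCM pin wired to the relay input
CONN_STR = "HostName=<hub>.azure-devices.net;DeviceId=turbine-01;SharedAccessKey=<key>"

GPIO.setmode(GPIO.BCM)
GPIO.setup(RELAY_PIN, GPIO.OUT)

def on_c2d_message(message):
    """Interpret the cloud-to-device message body as an ON/OFF command."""
    command = message.data.decode("utf-8").strip().upper()
    GPIO.output(RELAY_PIN, GPIO.HIGH if command == "ON" else GPIO.LOW)

client = IoTHubDeviceClient.create_from_connection_string(CONN_STR)
client.on_message_received = on_c2d_message   # waits for C2D messages
input("Listening for cloud-to-device commands, press Enter to stop...\n")
client.shutdown()
GPIO.cleanup()
```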
6.2 Practical Model In the practical setup, the TMC controller of the wind turbine sends the data of the sensors to the Raspberry Pi through serial communication. The node red flow running in the RPi triggers a Python script that runs simultaneously, which is programmed to extract data from the controller. This data is processed on node-red and aggregated for a certain time period. This final aggregated value is formatted and sent to the Azure IoT Hub. The process from here including the C2D communication will be similar to what was done in the simulation.
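The Python script triggered by the node red flow is not listed in the paper; a minimal sketch of what it could look like is given below, assuming the controller streams one comma-separated sample per line over a serial port. The port name, baud rate and line format are assumptions, since the actual TMC2 protocol is normally accessed through the vendor software.

```python
import json
import serial  # pyserial

PORT, BAUD = "/dev/ttyUSB0", 9600   # assumed serial settings

def read_one_sample():
    """Read one line from the controller and print it as JSON for node red."""
    with serial.Serial(PORT, BAUD, timeout=2) as link:
        line = link.readline().decode("ascii", errors="ignore").strip()
    # Assumed line format: wind_speed,wind_direction,temperature,pressure
    speed, direction, temp, pressure = (float(x) for x in line.split(","))
    print(json.dumps({"wind_speed": speed, "wind_direction": direction,
                      "temperature": temp, "pressure": pressure}))

if __name__ == "__main__":
    read_one_sample()
```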
7 Results and Discussion When the simulation on TinkerCAD starts running, the sensor values from each turbine are sent to their respective ThingSpeak graph. From here, the node red extracts the value and sends it to the Azure Hub for further processing. Once we
Fig. 6 C2D node red output of Turbine 2
start the stream analytics job, the data gets stored in the form of datasets on the Power BI dashboard. These datasets correspond to the temperature values from the respective turbines. We can visualize the real-time data transmission using different visualization styles. The corresponding turbine values have been displayed in the node-red debug window. The cloud-to-device or C2D communication is also made possible by the Azure cloud services. We can send messages to either the individual devices or all the devices through the message to device tab. The node red flow consists of two subflows (Fig. 6). The first subflow is common to all turbines in the wind farm, and the second one is device specific or turbine specific. The common subflow in all turbines will have the same ‘SharedAccessKey’, and hence, when we send a message through the common device on Azure, the message reaches all the turbines. If we want to send a message to the specific turbine, we select the turbine from the list of devices and can send a message through that.
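Besides the portal's message-to-device tab, the same cloud-to-device command can be sent programmatically; a sketch using the Azure IoT Hub service SDK for Python is shown below. The service connection string is a placeholder, and sending to the 'common' device mirrors the shared-key convention described above.

```python
from azure.iot.hub import IoTHubRegistryManager

# Placeholder service-side (iothubowner/service policy) connection string
SERVICE_CONN_STR = ("HostName=<hub>.azure-devices.net;"
                    "SharedAccessKeyName=service;SharedAccessKey=<key>")

def send_command(device_id: str, command: str) -> None:
    """Send a cloud-to-device command such as 'ON' or 'OFF' to one device."""
    registry = IoTHubRegistryManager(SERVICE_CONN_STR)
    registry.send_c2d_message(device_id, command, properties={})

send_command("turbine-02", "OFF")   # a specific turbine
send_command("common", "ON")        # all turbines listening on the common key
```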
8 Conclusion The system aimed at eliminating the problem of manual data collection from small wind turbines that were installed years ago. The installation of SCADA systems in order to make these turbines up to date with technology is a very expensive affair. Thus, a cost-efficient IoT-based turbine communication and control system has been proposed. It uses the Microsoft Azure cloud service which is equipped with an IoT Hub and can be paired with hundreds of devices at once. It also makes the C2D communication possible which is a huge step up from the existing systems and thus proves to be of great value in case of any emergency situation. The Power BI visualization tool helps visualize the sensor data of all turbines farm wise and is a very economical tool. The proposed system offers solutions to the problems mentioned earlier and also promises to be more budget friendly than the existing solutions. Currently, our discussion is limited to small wind turbines that lack the
SCADA system. Further research shall include trying to integrate all the turbines with SCADA systems to gather data and hence also provide a two-way communication.
References 1. Wind turbine Wikipedia, https://en.wikipedia.org/wiki/Wind_turbine 2. Inductive Automation SCADA Homepage, https://www.inductiveautomation.com/resources/ article/what-is-scada 3. DEIF SCADA Systems, https://www.deif.com/wind-power/applications/scada-systems 4. Wilson S, Kirubanand VB (2019) Wind turbine data collection using IoT. IJITEE 9(2) 5. Kalyanraj SLP, Sabareswar S (2016) Wind turbine monitoring and control systems using Internet of Things. In: 2016 21st century energy needs—materials, systems and applications (ICTFCEN), 2016, pp 1–4. https://doi.org/10.1109/ICTFCEN.2016.8052714 6. Singh VK, Rishav MBC, Ray UK (2018) IOT based windmill monitoring system. IJERT NCESC-2018 6(13) 7. Akyüz E, Demircan B (2019) IoT and cloud based remote monitoring of wind turbine. Celal Bayar Üniversitesi Fen Bilimleri Dergisi 15. https://doi.org/10.18466/cbayarfbe.540812 8. Sadab-Ul-Kabir, Khaled Saifullah Sadi M, Quamruzzaman M, Monitoring and Controlling of Multiple Industrial Processes by Using Raspberry Pi in SCADA System. https://www.aca demia.edu/37571146/Monitoring_and_Controlling_of_Multiple_Industrial_Processes_by_ Using_Raspberry_Pi_in_SCADA_System 9. Aghenta LO, Iqbal MT (2019) Low-cost, open source IoT-based SCADA system design using Thinger.IO and ESP32 thing. MDPI Electron 8:822. https://doi.org/10.3390/electroni cs8080822 10. Hayum A, Sethu Anand R et al (2019) Analysing and monitoring windmill using Internet of Things. IJESC 9(3) 11. Edison Prabhu K, Sanoofa AN, Implementation of IOT for real time monitoring and fault diagnosis of wind turbine. IJRTER, CELICS’18. https://doi.org/10.23883/IJRTER.CONF.021 80328.055.KSMLF 12. Zhao L, Zhou Y, Brandao Machado Matsuo I, Korkua S, Lee WJ (2019) The design of a holistic iot-based monitoring system for a wind turbine. https://doi.org/10.1109/ICPS.2019.8733375 13. Srbinovski GC, Morrison AP, Leahy P, Popovici E (2017) ECO: an IoT platform for wireless data collection, energy control and optimization of a miniaturized wind turbine cluster: power analysis and battery life estimation of IoT platform. In: 2017 IEEE international conference on industrial technology (ICIT), 2017, pp 412–417. https://doi.org/10.1109/ICIT.2017.7913266
Implementation of e-Healthcare Data Acquisition System Using IoT (Internet of Things) Adarsh Ravi Mishra, Ragini Shukla, and Ravi Mishra
Abstract A smart healthcare system monitors the vital parameters of a patient and continuously displays the information regarding the patient. The smart healthcare system is built from a few important sensors and a microprocessor. The main objective of this paper is to introduce a smart healthcare device that supports doctors as well as the guardians of patients in monitoring the patient's condition. Sensors are integrated and attached to the device to monitor the patient in a very effective manner. The device helps prevent the patient from reaching a critical condition because an alarm is attached to it: if a parameter rises above or falls below the set normal value, the alarm indicates the patient's condition. The device provides a Zigbee wireless connection for prompt monitoring by the doctor, which helps in diagnosing a sick patient easily. It is a device that supports the doctor in better treatment and frequent observation of the patient. Keywords Health care · Zigbee wireless connection · Monitor · Microprocessor · Sensors
1 Introduction Following today's trend of developing sensor-based healthcare platforms, IoT technology has attracted considerable attention, and its benefits cannot be ignored. A closely related technique that is currently in trend is the Web of Things (WoT) [1]. The WoT e-health application is developed at the application layer of the OSI seven-layer model, on top of the physical devices that connect to the Internet [2].
A. R. Mishra (B) · R. Shukla Department of Computer Application and Information Technology, Dr. C. V. Raman University, Bilaspur, India e-mail: [email protected]
R. Mishra Department of Electronics and Telecommunication, GHRIET, Nagpur, India e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 N. Subhashini et al. (eds.), Futuristic Communication and Network Technologies, Lecture Notes in Electrical Engineering 966, https://doi.org/10.1007/978-981-19-8338-2_2
One important study area among many is IoT-based health care, which uses a number of important techniques and application protocols [3]. The IoT also plays an important role in implementing better e-healthcare solutions in sensor-based health care through the health IoT gateway [4], which establishes the connection between the sensors and the Internet and provides the caretaker the facility to observe the patient's parameters properly, both locally and remotely. We have opted for the Raspberry Pi 3 board in our experiments, and there are many other useful experiments in which the Raspberry Pi 3 board has been used in different areas of health care [5, 6]. The Arduino Uno has also played an essential role in healthcare applications [7]. An experiment has also been reported in which Intel's Galileo board was used [8], communicating through the Zigbee protocol with an XBee module. In the healthcare area, there are various examples of connecting medical devices to the Internet in order to provide local and remote healthcare services such as online patient monitoring, online medical consultation, alarm-supported devices and many surgical interventions [9, 10]. A major growth area has been the development of portable healthcare/medical devices and objects. These objects may be used for ubiquitous measurement of patient health parameters, such as blood pressure and temperature, and for activity monitoring and recognition of remote patients [11]. We observe that Zigbee is an effective tool to monitor patient parameters remotely [12]. The Raspberry Pi is also beneficial for monitoring patient parameters effectively; with the Raspberry Pi, we can easily send the patient's data to the server, which makes it well suited for remote patient monitoring [13]. IoT is one of the latest technologies and is used efficiently in many different fields; for example, it has been used to control energy consumption efficiently and to maintain the flow of energy through smart grid systems [14]. It provides many services and applications that can be used to achieve effective, impactful results [15, 16]. This paper is based on monitoring the patient. We have designed an effective, reliable and economical patient monitoring system which is capable of sending real-time records related to the patient. Health care is a vast area that requires continuous testing, and this is one of our efforts, in which we have created a device that takes heartbeat rate and temperature as input and sends a message when a value goes higher or lower than its normal range. We are trying to make a small contribution in this area. The device continuously monitors the heart rate, ECG and body temperature. The doctor will be able to monitor the condition of the patient with the help of the GPS of the mobile and, if needed, will be immediately available to provide immediate treatment to the patient. After measuring patient-relevant parameters such as heart rate, ECG and temperature, the values are transmitted wirelessly to the central unit using Zigbee technology. Simultaneously, a record of the parameters is kept on the server, from which we can access and monitor the records from anywhere. The device is capable of recording, displaying and sending patient-related data to the server. The device has an Internet connection and GPRS
support so that any person can monitor the records of a sick person from anywhere in the world to know his or her condition. The patient is fitted with sensors, and whatever physiological data the sensors pick up from the patient is measured and recorded in electronic form, and we can store this data on the server through the GPRS network. IoT is one of the most powerful interfacing paradigms in today's scenario: due to higher computing and communication capabilities, our social and daily life is becoming part of the Internet. Over the past few years, most healthcare organizations have begun implementing IoT technology. One of the most challenging tasks for our society is to improve the quality of care and the efficiency of treatment of patients. For effective treatment, people want fast and accurate health care, and IoT- and sensor-based devices can be a great option for proper health care; therefore, IoT has become popular for productive health care in the population. Nowadays, different body parameters of patients are monitored and observed. An example of fast treatment is the intensive care unit (ICU), where the patient's parameters are observed quickly and medicines are also provided in a short amount of time. The number of people suffering from different diseases is quite high, the main reasons being pollution, high population, and poor healthcare facilities. Some serious diseases, like cardiac conditions, require regular monitoring; otherwise, they can be fatal.
2 Existing System Till now, a PC has been used as the data acquisition (DAQ) system for collecting a patient's vital records. Bluetooth technology has been used to measure the patient's parameters locally and remotely so that treatment can be provided immediately.
3 Proposed System Information collection and monitoring will be done through the microcontroller. If the patient's status is higher or lower than the normal condition, it will be uploaded to the server and an indication message will be sent to the doctor's mobile. This paper is dedicated to those patients who are not yet in a critical stage but can be monitored continuously to prevent them from going into an acute condition. If the patient's condition is abnormal, an indication message will be sent to the doctor, which will help him or her to provide fast treatment. The system can thus support treatment of a distant patient and monitor the patient's condition very quickly.
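A minimal sketch of the decision logic described above is given below; it is illustrative only, and the normal ranges and the upload/notify helpers are hypothetical placeholders, not values or functions from the paper.

```python
# Assumed normal ranges for the monitored vitals (illustrative only)
NORMAL_RANGES = {"heart_rate": (60, 100), "temperature": (36.1, 37.5)}

def check_vitals(reading: dict) -> list:
    """Return the parameters that are above or below their normal range."""
    abnormal = []
    for name, (low, high) in NORMAL_RANGES.items():
        if not low <= reading[name] <= high:
            abnormal.append(name)
    return abnormal

def upload_to_server(reading):                 # placeholder implementation
    print("uploading", reading)

def notify_doctor(reading, abnormal):          # placeholder alert/SMS step
    print("ALERT: abnormal", abnormal, "->", reading)

def process_reading(reading: dict) -> None:
    """Upload and raise an alert only when a value is outside its range."""
    abnormal = check_vitals(reading)
    if abnormal:
        upload_to_server(reading)
        notify_doctor(reading, abnormal)

process_reading({"heart_rate": 104, "temperature": 39.0})
```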
4 Methodology A few years ago, in the traditional method, devices were used only to monitor the patient's parameters accurately and to analyze the patient's medical data. Today, the device is also used to support better treatment of the patient, which has become possible through the development of IoT healthcare applications. The research work is based on some sensors and an IoT object, i.e., a microcontroller. The IoT object collects the patient's parameters, like temperature, pulse oxygen level, and cardiac activity, in real time. Data analysis is then performed and the data is placed in cloud storage to measure the health recovery rate and recovery time of the patient. In the healthcare object, the Raspberry Pi is the microcontroller, and an ECG electrode, a pulse oximeter, a temperature sensor and a MEMS accelerometer are used to measure the parameters of patients (Fig. 1).
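The acquisition loop implied by this methodology can be sketched as follows. The sensor read functions and the cloud endpoint URL are hypothetical placeholders, since the paper does not list its code.

```python
import time
import requests

CLOUD_ENDPOINT = "https://example.com/api/vitals"   # placeholder endpoint

def read_sensors():
    """Hypothetical helpers standing in for the real sensor drivers."""
    return {"timestamp": time.time(),
            "temperature": 37.0,      # from the temperature sensor
            "pulse": 92,              # from the pulse sensor
            "spo2": 97,               # from the pulse oximeter
            "activity": "resting"}    # from the MEMS accelerometer

def acquisition_loop(period_s=5):
    """Periodically read the sensors and push each record to cloud storage."""
    while True:
        record = read_sensors()
        requests.post(CLOUD_ENDPOINT, json=record, timeout=10)
        time.sleep(period_s)
```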
5 Block Diagram The block diagram shows the formal layout of the system hardware. The hardware is centred on the Raspberry Pi, which provides the functions needed to monitor the whole treatment of the sick person and helps in establishing the normal parameter values. Zigbee provides wireless access and monitoring for local or remote patients, and the integrated sensors are used to sense the body symptoms. The hardware is focused on convenient treatment of the patient. The diagram of the e-health system clearly represents how the system is defined and how the sensors are attached to the microcontroller; a portable platform has been developed so that people can easily carry it anywhere without worrying about moving it from one place to another (Fig. 2).
6 Hardware Description A brief description of each of the connected hardware components is given below. The microcontroller is the heart of our application: as per the written program, it communicates and interfaces with the devices and maintains the transmission of all biosensor records. Here we prefer the Raspberry Pi for the implementation of the application.
Fig. 1 Methodology of hardware
6.1 Zigbee Zigbee is a replacement for existing non-standard wireless technologies. It is built on the IEEE 802.15.4 Personal Area Network (PAN) standard and is basically designed for longer-range control applications.
6.2 LCD The LCD is very effective for showing the sensors' records/data and promptly displays the essential information.
Fig. 2 Block diagram of hardware
6.3 GPRS This IoT device includes a GPRS modem in its implementation. The GPRS link helps the microcontroller communicate so that a remote patient can be monitored. The modem supports both GPRS and GSM.
6.4 Temperature Sensor It is used to measure the patient's body temperature; thermistors are an effective temperature-sensing device.
6.5 ECG Electrode ECG electrodes are used to record the electrocardiogram; for this, the electrodes have to be attached to certain parts of the patient's body such as the chest, arms, and legs. They detect the ECG impulses produced by each heartbeat and transmit them to the machine, which converts the impulses into the familiar wavy trace and also maintains a record. This is helpful for diagnosing a cardiac emergency and for preventing an acute condition.
6.6 Pulse Sensor It senses the pulse rate from the fingertip and is a plug-and-play pulse rate sensor. It is helpful for enhancing an exercise routine and for keeping track of anxiety and lethargy.
6.7 MEMS This MEMS accelerometer is essentially an acceleration sensor; it helps detect the frequency and intensity of human body movement. Applying MEMS technology to accelerometers is a relatively new development. The MEMS sensor detects the deflection of the human body from its current position, and the deflection is measured as an electrical signal at the sensor output (Fig. 3).
Fig. 3 Hardware description
7 Software-Based Device The e-health system includes various software components and flexibly accommodates the hardware components it supports. It handles communication, transmission, storage, and display of the biosensor data, which can be synchronous or asynchronous. The event-driven architecture of Node.js provides the web interface and access to the biosensors, allowing users to monitor and analyze the patient's records/data either live or from recordings. Node.js is the server-side application that provides the web interface, and cloud storage is used in the backend to store the live records.
8 Experiment Outcome The healthcare system is placed on the patient during care to improve the accuracy of measurements and the response time for result analysis. The healthcare system acts as a patient monitor that collects data and sends real-time information on the patient's breathing phase, heartbeat, temperature, and activity level. Figure 4 shows the commands used to run the Raspberry Pi code. It is an interface to fetch the information and display the records on the console; the code for the sensors is written in Python, and each sensor gives output according to the patient's body response. Figure 5 shows the complete integrated hardware setup, in which the different sensors, such as the body temperature sensor, pulse sensor, and ECG sensor, are attached to the Raspberry Pi at the heart of the system to collect information about the patient's condition. Zigbee replaces Bluetooth in our hardware device, and GPRS provides remote communication. Figure 6 shows the results on the graphical user interface on the monitor. This interface is user friendly, so the observer can easily understand the data; if a value is varying, the person can see the fluctuation of the digits. Figure 7 shows the results on the web page (IoT). Through the Web site, we can monitor the patient's condition from anywhere; even if the patient is far away,
Fig. 4 Raspberry Pi console
Fig. 5 Integrated hardware setup
Fig. 6 GUI in standard output device
Fig. 7 Web-based result in web page (IoT)
the parameters can still be accessed very efficiently. The cloud is used in the backend to store the sensed data for visualization and for comparison of fluctuating values.
Table 1 Patient heart rate, temperature, and ECG report observed for 7 days

Day   | Heart rate | Temperature (in Celsius) | ECG report
Day 1 | 100        | 37                       | Normal
Day 2 | 92         | 38                       | Normal
Day 3 | 98         | 36                       | Normal
Day 4 | 96         | 37                       | Normal
Day 5 | 102        | 38                       | High
Day 6 | 104        | 39                       | High
Day 7 | 99         | 37                       | Normal
These are some of the results obtained from the patient during the example field test. The heart rate, temperature, and ECG report can be monitored as they change every day (Table 1). The data show that the temperature, ECG, and heart rate are fluctuating and are not constant from day 1 to day 7.
9 Conclusions and Future Scope A wireless e-healthcare monitoring device has been successfully designed, developed, and implemented, and the device has also been tested successfully in the field. Its flexibility is a key strength, because sensors can be integrated as needed to analyze, monitor, or access multiple patient parameters. The e-healthcare system is very effective for monitoring ECG, temperature, and pulse rate with the help of biosensors, and because it is based on Zigbee it is inexpensive and consumes little power. The medical information and records can be acquired for patients in a personalized manner. This paper presented an e-healthcare system combining multiple sensors to sense the patient's physiological parameters. It integrates ease of use with detailed graphical representation and efficiently allows viewing of local and remote patients' parameters. This e-health network technology offers a better solution for post-operative or discharged patients as well as for continuous monitoring from home. The current work can be extended further: additional parameters such as age, weight, and blood pressure can be added in the future. Medical data are very sensitive, so security and privacy, which are indispensable in an e-healthcare system, must be ensured.
References 1. Maia P, Batista T, Cavalcante E, Augusto B, Delicato FC, Pires PF, Zomaya A (2014) A web platform for interconnecting body sensors and improving health care. Proc Comp Sci 40:135–142 2. Duquennoy S, Grimaud G, Vandewalle JJ (2009) The web of things: interconnecting devices with high usability and performance. In: IEEE international conference on embedded software and systems (ICESS). IEEE Explore, pp 323–330 3. de Morais Barroca Filho I, de Aquino Junior GS (2017) IoT-based healthcare applications: a review. In: International conference on computational science and its applications. Springer International Publishing AC Cham, pp 47–62 4. Rahmani AM, Thanigaivelan NK, Gia TN, Granados J, Negash B, Liljeberg P, Tenhunen H (2015) Smart e-health gateway: bringing intelligence to Internet-of-Things based ubiquitous healthcare systems. In: IEEE consumer communications and networking conference (CNCC). IEEE Explore, pp 826–834 5. Orha I, Oniga S (2014) Wearable sensors network for health monitoring using e-Health platform. Carpathian J Electron Comp Eng (CJECE) 7(1):25–29 6. Orha I, Oniga S (2013) Automated system for evaluating health status. In: IEEE international symposium for design and technology in electronic packaging (SIITME), IEEE Explore Galati Romania, pp 219–222 7. Yakut O, Solak S, Bolat ED (2014) Measuring ECG signal using e-health sensor platform. In: International conference on chemistry, biomedical and environment engineering (ICCBEE), vol 1, Kocaeli University research Information System, pp 71–75 8. Kodali RK, Swamy G, Lakshmi B (2015) An implementation of IoT for healthcare. In: IEEE recent advances in intelligent computational systems (RAICS). IEEE Trivandrum India, pp 411–416 9. Sebestyen G, Saplacan G, Krucz L (2010) CARDIONET – A Distributed e-Health System for Patients with Cardio-Vascular Diseases, Workshop on medical informatics, part of ICCP, Cluj-Napoca (2010). 10. Yang L, Yang SH, Plotnick L (2012) How the internet of things technology enhances emergency response operations. Technol Forecast Soc Change Elsevier 80:1854–1867 11. Sebestyen G, Tirea A, Albert R (2012) Monitoring human activity through portable devices. Carpathian J Electron Comp Eng (CJECE) 5:101–106 12. Kim B, Kim Y, Lee I, You I (2008) Design and Implementation of a ubiquitous ECG monitoring system using SIP and the ZigBee networks. In: Proceedings of the future generation communication and networking (FGCN 2007). IEEE Jeju, Korea, pp 599–604 13. Xueliang X, Cheng T, Xingyuan F (2007) A health care system based on PLC and ZiGbee. In: Proceedings of the international conference on wireless communication, networking and mobile computing. IEEE Shanghai, China, pp 3063–3066 14. Mishra R, Pandey A, Savariya J (2020) Application of internet of things: last meter smart grid and smart energy efficient system. In: Proceedings of the international conference on power, control and computing technologies, IEEE Raipur, India, pp 32–37 15. Mishra AR, Mishra R, Shukla R (2019) A survey on e-Health data acquisition system for monitoring patients using internet of things. J Emerg Technol Innov Res 6:458–463 16. Mishra AR, Mishra R, Shukla R (2021) Application and services of Internet of Things (IoT) in healthcare: a comprehensive review. Int J All Res Educ Sci Method 9:2048–2055
Review of Discrete Wavelet Transform-Based Emotion Recognition from Speech Aditi Anand, Aishwarya Nambiar, Shruti Pani, and Mohanaprasad Kothandaram
Abstract This paper aims to review the difference between pre-processing with and without the Discrete Wavelet Transform (DWT), using an amalgamation of cepstral and non-cepstral feature extraction techniques. The paper considers three different classifiers (Support Vector Machine, Decision Tree and Linear Discriminant Analysis) to draw a comparative study based on their performance on the chosen feature set. The extraction of emotions from speech can be a challenging task. Four different features, namely zero-crossing rate (time-domain), energy (time-domain), pitch (frequency-domain) and Mel-Frequency Cepstrum Coefficients (MFCCs) (frequency-domain), are used to overcome the predicament of selecting appropriate features in the time and frequency domains. The zero-crossing rate combats the issue that arises while comprehending stress in speech. In addition, a real-time gender-dependent database is incorporated in our model, and the results obtained are compared with those from the currently most prominently used gender-independent emotion database (RAVDESS) to understand the gender dependency of the Speech Emotion Recognition (SER) system. These authors equally contributed to this work. Keywords Support vector machine · Decision tree · Linear discriminant analysis · MFCC · Gender-dependent database · Zero-crossing · Wavelet transform
1 Introduction Speech signals are known to be one of the fastest means of communication, used extensively and in an irreplaceable capacity in our day-to-day activities. As effortless and unchallenging as their detection and comprehension are for humans, they are a strenuous and challenging task for machines. Human discernment of emotion is better for acted speech than for spontaneous speech, since the former shows superior discriminating characteristics compared with the latter [1]. A. Anand · A. Nambiar · S. Pani · M. Kothandaram (B) School of Electronics Engineering (SENSE), VIT University, Chennai, Tamil Nadu 600127, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 N. Subhashini et al. (eds.), Futuristic Communication and Network Technologies, Lecture Notes in Electrical Engineering 966, https://doi.org/10.1007/978-981-19-8338-2_3
Over the years, technological advancements in the field of emotion recognition from speech have evolved abundantly. Real-time scenarios such as depression detection [2], lie detection/evaluation [3, 4], psychiatric diagnosis [5], mobile services [6], health care [7] and diagnosis of auxiliary diseases [8–11] require emotion recognition from speech. In the speech recognition process, linguistic information plays a decisive role; present-day studies have exemplified [12, 13] that it can add knowledge to the conventional spectrum-based recognition systems, which enhances their accuracy. Such systems also show increased robustness to acoustic degradation from noise effects and channels [14, 15]. The most often used measure to assess the performance of a Speech Emotion Recognition (SER) system is recognition accuracy. Although SER systems have shown remarkable advancement over the years, they continue to lag well behind human performance, particularly in unpropitious conditions. Integration of SER with Automatic Speech Recognition (ASR) substantially enhances its capability when dealing with disparate emotional speech [16]. Lately, several studies and research efforts have been dedicated to approaching the problem of dissimilarities among various languages. Hence, efforts have been made to focus on the abundant vocal cues [17] or on feature normalization and selection algorithms [18]. A fundamental step in speech recognition systems is identifying an acceptable and fitting set of features for any classifier. The primary indicators of the speaker's emotional state are the prosodic features. Energy, pitch, formants, duration, linear prediction cepstrum coefficients (LPCC) and Mel-Frequency Cepstrum Coefficients (MFCC) are the crucial features, as indicated by research on emotion in speech [19, 20]. In speaker-independent emotion classification systems, features characterizing the vocal tract, such as MFCCs and LPC-based group delay, are overshadowed by vocal cord parameters such as pitch and energy [21]. MFCCs are very discerning but suffer from speaker variability and do not work well with speaker normalization [22]. Pitch frequency directly affects the number of harmonics present in the spectrum [23]. Energy also aids in indicating the emotions; the study of energy relies on the short-term average amplitude and short-term energy [23]. In conjunction with this, the glottal excitation in the standard speech production model has also been used as a feature set. Prosodic features mainly deal with acoustic qualities of sound such as intonation, stress, voice quality, speech rate and rhythm [24]. It is beyond question that prosody plays an indisputable role in the speech act and in everyday communication [25, 26]. Speech becomes accustomed to the particularities and attributes of the speaker; thus, each person may use different variations of intensity, tone and rhythm in their spoken utterances. Emotion categories such as negative versus non-negative emotion [27], or the set of disgust, anger, joy, fear, surprise, sadness and neutrality [28], are differentiated by numerous automatic emotion recognition from speech systems. However, emotion in natural speech is not binary and can take an arbitrary value in between such categories, for example between angry and happy. To express this continuum, emotions can be constituted by three attributes called emotion primitives. As proposed in [29], these are valence, expressing the negative in contrast to the positive nature of an emotion; activation,
outlining the excitation on a scale from calm to excited; and dominance, describing the appearance of the person on a scale from subservient or compliant to controlling or assertive. Without loss of generality, they can each be normalized to take values in the range [−1, +1]. The organization of MFCCs provides the most practical combination of features for speaker-dependent systems with broad spectral measures. Mel-Frequency Cepstral Coefficients (MFCCs) are known to be the most frequently used parameters in state-of-the-art speech and speaker recognition technologies [30, 31]. The MFCC extraction process utilizes the discrete cosine transform (DCT), whereas the frequency filtering (FF) method [32] makes use of a simple filter. In utterance-level emotion recognition, statistics of spectral features are mixed with statistics of prosodic measures, and categorization is performed with the help of both sources. For instance, [33] made use of statistics such as energy, mean, skewness of pitch, standard deviation, range and MFCC to recognize emotions using Support Vector Machine (SVM) classifiers. Numerous types of classifiers have been diligently applied to SER. These include artificial neural networks (ANN) [34], the hidden Markov model (HMM) [35], the Support Vector Machine (SVM) [36], the Gaussian mixture model (GMM) [37], K-nearest neighbour (KNN) [38] and various others [36]. SVM and HMM are prominently used in nearly all speech-related applications [22, 35, 39, 40]. MFCCs or LPCCs are converted to suprasegmental parameterizations to link with prosodic feature vectors [34, 43, 44]. An alternative method for emotion recognition is the application of statistical functions to low-level acoustic features to determine global statistical features for classification. One of the most widely used classifiers for global features is the SVM [43, 44]; other classifiers include decision trees [45] and K-nearest neighbour (KNN) [46], and their use in Speech Emotion Recognition is increasing. The applications mentioned above require empirically chosen high-dimensional handcrafted features. Based on the literature survey summarized in this paper, various limitations were identified, such as complexity issues, the selection of speech databases, variation of accuracy with the gender, region and accent of speakers, the position of microphones, and the best combination of algorithms. Local prosodic features are gathered from the collection of energy, pitch and syllable duration values, and the combination of global and local features enhances the performance of the system. A speech utterance comprises silence, unvoiced and voiced speech; removing the noisy and silent frames that are part of the speech alleviates the computational complexity. The most prevalently used techniques for voice activity detection (VAD) are auto-correlation, zero-crossing rate and short-time energy. The zero-crossing rate gives a high count in unvoiced speech as opposed to voiced speech. MFCC and classical pitch performed well in comparison with TEO-AUTO-ENV and TEO-FM-VAR. Although principal component analysis (PCA) and independent component analysis (ICA) [54–56] are prominently used feature extraction techniques for various machine learning algorithms, they come with shortcomings. These techniques are increasingly used for capturing the main essence of the data, thereby reducing
the problem of redundancy and overfitting. However, this elimination causes information loss, leading to degradation of accuracy. This paper therefore focuses on the basic technique of MFCC, primarily because it offers sound error reduction and produces a robust feature in the presence of additive noise. The Hidden Markov Model (HMM) and Gaussian Mixture Model (GMM) were the two main classifiers used before the advent of the deep learning era. However, problems arise while working with HMMs that need to be solved for them to be useful in real-world applications; these involve evaluation, decoding and learning. Also, in some practical situations, convergence problems arise while using GMM-based learning from demonstration (LFD). In this paper, SVM is preferred due to its accuracy and robustness, i.e. owing to the optimal margin gap between separating hyperplanes, and its computational efficiency. In addition, decision trees automatically select the most discriminatory features, are faster to build and have fewer parameters to tune. Similarly, LDA is preferred because it is a simple model and helps in reducing dimensionality with less complexity. The organization of the paper is as follows: a quick overview of the emotion recognition system and data sets is given in Sect. 2, Sect. 3 describes the implementation of the proposed method, experimental results and discussion are presented in Sect. 4, and Sect. 5 concludes the findings.
2 Speech Emotion Recognition System Figure 1 displays the general block diagram of any SER system. Feature extraction, the training–testing model and the prediction block are the main building blocks of SER systems. In any SER system, the selection of features from the given speech data is a crucial step, as the working efficiency of the model depends on these feature data. To increase the accuracy and efficacy of the system, suitable pre-processing methods, combinations of classifiers or feature extraction techniques are also required.
2.1 Feature Extraction Speech recognition systems prominently use spectral and prosodic features. For a better realization of anger and stress, Teager energy operator (TEO) features are essential. Gross statistics of the prosodic features are used in the computation of global features, while local prosodic features are gathered from the collection of energy, pitch and syllable duration values. The combination of global and local features enhances the performance of the system. A speech utterance comprises silence, unvoiced and voiced speech; removing the noisy and silent frames that are part of the speech alleviates the computational complexity. The most prevalently used techniques for voice activity
Fig. 1 General block diagram of SER system
detection (VAD) are auto-correlation, zero-crossing rate and short-time energy. The zero-crossing rate gives a high count in unvoiced speech as opposed to voiced speech. From the literature, MFCC and classical pitch performed well in comparison with TEO-AUTO-ENV and TEO-FM-VAR. MFCC offers good error reduction and produces a robust feature in the presence of additive noise; hence, normalization is required to lessen the impact of the noise. It has been conclusively proved that MFCCs, along with their derivatives, are the most frequently chosen and preferred acoustic features for any feature extraction task. MFCC is known to have a superior frequency resolution in the low-frequency region, and the robustness to noise of the high-frequency coefficients is also commendable. Hence, the higher-order MFCCs are discarded and only the lower-order coefficients are used as audio feature parameters.
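As an illustration of the voice activity detection idea above, the sketch below marks a frame as voiced when its short-time energy is high and its zero-crossing rate is low. It is a generic Python sketch (the study itself was carried out in MATLAB), and the frame size and thresholds are arbitrary placeholders, not values from the paper.

```python
import numpy as np

def frame_signal(x, frame_len, hop):
    """Split a 1-D signal into overlapping frames."""
    n = 1 + max(0, (len(x) - frame_len) // hop)
    return np.stack([x[i * hop: i * hop + frame_len] for i in range(n)])

def short_time_energy(frames):
    return np.sum(frames ** 2, axis=1)

def zero_crossing_rate(frames):
    # fraction of sample-to-sample sign changes per frame
    return np.mean(np.abs(np.diff(np.sign(frames), axis=1)) > 0, axis=1)

def simple_vad(x, frame_len=480, hop=120, energy_thr=1e-3, zcr_thr=0.15):
    """Boolean mask of voiced frames: high energy and low zero-crossing rate."""
    frames = frame_signal(x, frame_len, hop)
    return (short_time_energy(frames) > energy_thr) & (zero_crossing_rate(frames) < zcr_thr)
```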
2.2 Classifiers The three most preferred algorithms employed for the classification purpose are Support Vector Machine (SVM), Decision Tree (DT) and Linear Discriminant Analysis (LDA).
2.2.1 Support Vector Machines
The Support Vector Machine (SVM) was initially brought to notice by Hava Siegelmann and Vladimir Vapnik. Since then, it has gathered an exceptionally high level of interest not only in the machine learning research community [49] but also in
the pattern recognition community, because it offers numerous computational and theoretical merits derived from statistical learning theory [47]. It is also one of the most popularly used classification algorithms in industrial applications. A Support Vector Machine (SVM) is a supervised machine learning model that makes use of classification algorithms for two-group classification problems. Usually, a linear SVM is implemented for linearly separable data to classify the data sets [48]. The support vectors are the training samples that determine the optimal hyperplane; they are known to be the most difficult patterns to classify [49]. SVM classifiers are considered to be elite and reliable classifiers for emotion recognition and classification [49, 50]. Support Vector Regression (SVR) is a redesigned version of SVMs which is used to minimize the structural risk and not just the training error (also known as the empirical risk) [47]. SVR fits more complex, non-linear regression problems with speedy delivery of outcomes at runtime; however, this paper uses only the SVM algorithm.
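A minimal scikit-learn sketch of the two-group (here multiclass) SVM classification described above is given below; it is purely illustrative, uses randomly generated toy features rather than the paper's data, and stands in for the authors' MATLAB experiments.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split

# Toy feature matrix (rows = utterances, columns = features) and emotion labels
X = np.random.rand(100, 13)
y = np.random.choice(["angry", "sad", "joy", "neutral"], size=100)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
clf = SVC(kernel="linear")        # linear SVM for (near-)linearly separable data
clf.fit(X_tr, y_tr)               # training samples near the margin become the support vectors
print("Test accuracy:", clf.score(X_te, y_te))
```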
2.2.2 Decision Tree
A basic structure of the decision tree, shown in Fig. 2, comprises nodes that form a rooted tree; in simple words, it is a directed tree with a node called the "root" that has no incoming edges. It is drawn upside down with its root at the top, which is helpful not only for a better visual understanding but also for an explicit representation of the decision tree, which in turn helps in making informed decisions. A test node or internal node is one with outgoing edges. All other nodes are referred to as terminal or decision nodes (also known as leaves). In a decision tree, the instance space is split into two or more sub-spaces at each internal node according to a particular discrete function of the input attribute values. Algorithms used in the construction of a decision tree include:
(a) ID3
(b) Gini Index
(c) Chi-Square
(d) Reduction in variance
A. ID3—ID3 is the core building block of decision trees. This method employs a greedy algorithm with no backtracking and follows a top-down order. For the construction of the decision tree, ID3 makes use of information gain and entropy.
Entropy: Since the decision tree follows a top-down order, starting from the root node, the data are partitioned into subsets while moving down, and these partitioned subsets consist of homogeneous samples. For the process of building a decision tree, two types of entropy need to be computed with the help of frequency tables:
(1) Computation of the entropy of a single attribute using a frequency table
Fig. 2 Decision tree classification
E(S) = \sum_{i=1}^{c} -p_i \log_2 p_i \quad (1)

(2) Computation of the entropy of two attributes using a frequency table:

E(T, X) = \sum_{c \in X} P(c)\, E(c) \quad (2)
Information Gain: After the data undergoes a split, the information gain is calculated from the resulting reduction in entropy and guides the choice of split.
B. Gini Index—This performs only binary splits. The higher its value, the greater the homogeneity in the data set. Since it works with two target outcomes, "success" or "failure", the Gini for a sub-node is computed as the sum of the squared probabilities of success and failure, i.e. (p^2 + q^2).
C. Chi-Square—This is one of the most prominently used statistical algorithms; its primary function is to compute the statistical significance of the differences between the parent node and the sub-nodes, calculated using the formula

\text{Chi-Square} = \left( \frac{(\text{Actual} - \text{Expected})^2}{\text{Expected}} \right)^{1/2} \quad (3)
D. Reduction in Variance—In contrast to all the algorithms mentioned above, which are used for categorical target variables, reduction in variance is used for continuous target variables. To choose the most fitting split, it uses the standard formula of variance; as a benchmark to split the dataset, the split with the lower variance is selected, computed as

\text{Variance} = \frac{\sum (x - \bar{x})^2}{n} \quad (4)
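The split criteria above translate into a few lines of Python; the following is a generic sketch of Eqs. (1), (2) and (4) and of the Gini measure, not code from the paper.

```python
import numpy as np
from collections import Counter

def entropy(labels):
    """Eq. (1): E(S) = sum_i -p_i log2 p_i over the class proportions."""
    probs = np.array(list(Counter(labels).values())) / len(labels)
    return float(-np.sum(probs * np.log2(probs)))

def information_gain(parent_labels, child_label_groups):
    """Entropy of the parent minus the weighted entropy of the children (cf. Eq. (2))."""
    n = len(parent_labels)
    weighted = sum(len(g) / n * entropy(g) for g in child_label_groups)
    return entropy(parent_labels) - weighted

def gini(labels):
    """Gini homogeneity p^2 + q^2, generalized to any number of classes."""
    probs = np.array(list(Counter(labels).values())) / len(labels)
    return float(np.sum(probs ** 2))

def variance(values):
    """Eq. (4): mean squared deviation, used for continuous targets."""
    values = np.asarray(values, dtype=float)
    return float(np.mean((values - values.mean()) ** 2))
```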
2.2.3 Linear Discriminant Analysis
Linear Discriminant Analysis (LDA) is one of the well-known dimensionality reduction techniques mainly used for supervised classification problems. As the name suggests, a dimensionality reduction algorithm decreases the number of dimensions (i.e. variables) in a data set while conserving as much information as possible. LDA is a prominent method used in statistical pattern classification to improve the discrimination between classes in a high-dimensional vector space [51]. LDA proceeds by assuming normally distributed conditional probability density functions p(x|y = 1) and p(x|y = 0), with means and covariances (\mu_1, \Sigma_1) and (\mu_0, \Sigma_0), respectively. As compared with Quadratic Discriminant Analysis (QDA), LDA makes a homoscedasticity assumption; in other words, the covariances of the classes (all of full rank) are identical, i.e. \Sigma = \Sigma_1 = \Sigma_0. Under such circumstances, various terms cancel out:

x^T \Sigma_1^{-1} x = x^T \Sigma_0^{-1} x \quad (5)

x^T \Sigma_i^{-1} \mu_i = \mu_i^T \Sigma_i^{-1} x, \ \text{since } \Sigma_i \text{ is Hermitian} \quad (6)

The derived decision criterion then becomes a threshold on the dot product:

w \cdot x > c, \ \text{where } w = \Sigma^{-1}(\mu_1 - \mu_0) \ \text{and} \ c = \tfrac{1}{2}\left(\mu_1^T \Sigma^{-1} \mu_1 - \mu_0^T \Sigma^{-1} \mu_0\right) \quad (7)

The criterion for the input x belonging to class y can thus be found from this linear combination of the given observations. Doddington made use of LDA to employ state-specific transformation matrices on this basis [52]. A different approach, in the case of high acoustic resolution, made use of a single class-independent LDA transformation matrix, which has proven to be highly effective [55]. The main objective is to find a linear transformation that maximizes the separation according to a suitable criterion. To create a new axis, LDA follows two criteria: • The distance between the means of the two classes is maximized. • Any variation that exists within each class is minimized.
Fig. 3 2D LDA along a new axis
Fig. 4 New generated axis
As observed in Fig. 3, a new axis (red) is generated and plotted in 2D satisfying the criteria mentioned above. This newly generated axis not only increases the separation between the data points of the two classes, but all the data points of the classes are also projected onto this new axis, as shown in Fig. 4 (X and Y are the two classes from which the new axis is generated). A demerit of LDA is that it cannot find a new axis when the means of the distributions coincide, in which case the classes cannot be separated linearly; if such cases arise, non-linear discriminant analysis should be used. The transformation is computed, via eigenvector decomposition, from the product of the inverse of the average within-class scatter matrix and the total-scatter matrix.
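A compact illustration of LDA as described above uses scikit-learn's implementation to project features onto the discriminant axes and classify; this is an illustrative Python sketch with toy random data, not the authors' MATLAB code.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X = np.random.rand(100, 13)                       # toy 13-dimensional feature vectors
y = np.random.choice(["angry", "sad", "joy", "neutral"], size=100)

lda = LinearDiscriminantAnalysis(n_components=2)  # reduce 13 features to 2 discriminant axes
X_proj = lda.fit_transform(X, y)                  # new axes maximize between-class separation
print("Projected shape:", X_proj.shape)           # (100, 2)
print("Training accuracy:", lda.score(X, y))      # LDA used directly as a classifier
```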
3 Proposed Method Extraction of emotion from speech alone can prove to be a challenging assignment. To overcome the predicament of selecting appropriate features in both the time and frequency domains, this paper focuses on extracting four different features, namely zero-crossing rate (time-domain), energy (time-domain), pitch (frequency-domain) and Mel-Frequency Cepstrum Coefficients (MFCCs) (frequency-domain).
Another issue that crops up is understanding the stress in the speech; this issue is combated with the help of the zero-crossing rate. The idea implemented in this project is to examine the difference between pre-processing with and without DWT using an amalgam of cepstral and non-cepstral feature extraction techniques. This paper focuses on three different classifiers (Support Vector Machine, Decision Tree and Linear Discriminant Analysis) instead of a single classifier, to make a comparative study based on each of their outcomes. The four emotional states considered in this paper are joy, sad, neutral and angry. Figure 5 summarizes the proposed methodology of this paper, and the terminologies used in the figure are detailed in their respective sections.
Fig. 5 Workflow diagram of the proposed methodology
Fig. 6 Block diagram of DWT system
3.1 Pre-Processing The proposed method uses two levels of the Daubechies 3 wavelet (db3) for the pre-processing technique. Daubechies is used because it outshines the other discrete wavelets with regard to our experiment. After completing the pre-processing of the data set, the next step is the extraction of features from the pre-processed data. Figure 6 illustrates the working of the DWT system, in which the input speech (Aj) is decomposed into approximation (Aj+1) and detail coefficients (Dj+1). The low-frequency information is stored in the approximation coefficients, whereas the detail coefficients contain the high-frequency information. To obtain the detail coefficients, the input speech passes through a High Pass Filter followed by a dyadic downsampler. Similarly, the input speech passes through a Low Pass Filter instead of the High Pass Filter, followed by dyadic downsampling, to obtain the approximation coefficients. The second level of decomposition is obtained from the first-level approximation coefficients to make the output more efficient.
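A sketch of the two-level Daubechies-3 decomposition described above, using the PyWavelets package, is shown below. The paper's pre-processing was done in MATLAB; this Python equivalent, with a random signal standing in for a speech utterance, is for illustration only.

```python
import numpy as np
import pywt

x = np.random.randn(16000)            # stand-in for one speech utterance (e.g. 1 s at 16 kHz)

# Level 1: split the signal into approximation (low-pass) and detail (high-pass) coefficients
a1, d1 = pywt.dwt(x, "db3")

# Level 2: decompose the level-1 approximation again, as in Fig. 6
a2, d2 = pywt.dwt(a1, "db3")

# a2 holds the low-frequency content passed on to the feature-extraction stage
print(len(x), len(a1), len(a2))
```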
3.2 Feature Extraction Mainly, zero-crossing rate, pitch, energy and MFCC features are extracted from each frame. Features are extracted by initially accumulating the speech samples and dividing them into frames of 30 ms with an overlap of about 75%. Afterwards,
Fig. 7 Block diagram of MFCC
MATLAB functions such as the Speech Pitch Detector audio plug-in example are applied to every frame, and voiced-speech detection is used to check whether the chosen samples correspond to a voiced speech segment. Using the above steps, the pitch and 13 MFCCs are determined for the entire file; the first MFCC coefficient is then replaced by the log energy of the audio signal, and only the pitch and MFCC information related to the voiced frames is retained. Finally, normalization is performed on the extracted features to eliminate bias in the classifier, as the computed pitch and MFCCs are not on the same scale. Figure 7 shows the block diagram of MFCC, which represents the short-term power spectrum of the speech signal. First, the speech data is framed into 30 ms segments, followed by windowing with a Hamming window to reduce leakage. Afterwards, the data is processed through the Fast Fourier Transform to convert it from the time domain to the frequency domain. The frequency-domain signal is then filtered through the MEL Filter Bank, where the energy within each filter is summed to form sub-band energies. Finally, a logarithmic function is applied, followed by the discrete cosine transform for compression of the data.
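The feature-extraction stage described above was implemented with MATLAB audio functions; a rough Python equivalent using librosa is sketched below. The file name, frame/hop sizes, pitch range and normalization are illustrative assumptions, not values taken from the paper.

```python
import numpy as np
import librosa

# "utterance.wav" is a placeholder file name
y, sr = librosa.load("utterance.wav", sr=None)
frame = int(0.030 * sr)          # 30 ms frames
hop = frame // 4                 # ~75% overlap

mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13, n_fft=frame, hop_length=hop)
zcr = librosa.feature.zero_crossing_rate(y, frame_length=frame, hop_length=hop)
rms = librosa.feature.rms(y=y, frame_length=frame, hop_length=hop)
pitch = librosa.yin(y, fmin=50, fmax=400, sr=sr, frame_length=frame, hop_length=hop)

T = min(mfcc.shape[1], zcr.shape[1], rms.shape[1], len(pitch))   # align frame counts
mfcc = mfcc[:, :T]
mfcc[0, :] = np.log(rms[0, :T] ** 2 + 1e-10)   # replace the first MFCC with log energy, as described above

features = np.vstack([mfcc, zcr[:, :T], pitch[np.newaxis, :T]])
# z-score normalization so pitch, MFCCs and ZCR lie on a comparable scale
features = (features - features.mean(axis=1, keepdims=True)) / (features.std(axis=1, keepdims=True) + 1e-10)
```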
3.3 Classification The pre-processed, extracted feature data from the previous step are used for classification. The three most preferred algorithms employed for classification purposes
are Support Vector Machine (SVM), Decision Tree (DT) and Linear Discriminant Analysis (LDA). A detailed study of these classifiers is given in Sect. 2. Finally, the confusion matrices obtained by training the classifiers contain valuable information regarding classifier performance and are one of the bases from which proper conclusions can be deduced.
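The classification-and-evaluation step can be mirrored in scikit-learn: the sketch below trains the three classifiers on the same feature matrix and prints an accuracy and confusion matrix for each. It is purely illustrative (toy random features; the study itself used MATLAB).

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, confusion_matrix

X = np.random.rand(120, 16)                          # toy feature matrix from the previous step
y = np.random.choice(["angry", "sad", "joy", "neutral"], size=120)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

for name, clf in [("SVM", SVC(kernel="linear")),
                  ("Decision Tree", DecisionTreeClassifier(random_state=0)),
                  ("LDA", LinearDiscriminantAnalysis())]:
    clf.fit(X_tr, y_tr)
    pred = clf.predict(X_te)
    print(name, "accuracy:", accuracy_score(y_te, pred))
    print(confusion_matrix(y_te, pred))               # per-emotion performance
```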
4 Results and Discussion
4.1 Data Set
4.1.1 RAVDESS
RAVDESS is one of the most gender-balanced and validated multi-modal databases of emotional speech and song. The database comprises 24 professional actors vocalizing lexically matched statements in a neutral North American accent. The emotions used in the speech part of the database are surprise, happy, angry, calm, sad, fearful and disgust expressions, while the song part consists of happy, sad, angry, calm and fearful emotions. The speech utterances were recorded using a Digidesign 003 mixing workstation and Pro Tools 8, at a sampling rate of 48 kHz and 16 bits, with files saved in uncompressed wave format.
4.1.2 Real-Time Data
This paper uses a gender-independent speech data set, a gender-dependent data set and real-time recorded data for simulation. The real-time input consists of four emotions, namely joy, neutral, sad and angry, recorded by three female speakers. A total of 96 training samples and 21 testing samples have been recorded in the English language.
4.2 Experimental Output
4.2.1 SVM Output with RAVDESS Data Set
Without and with DWT The results obtained on Support Vector Machine without DWT are satisfactory, whereas, in the case of DWT, there is a swift increase in the accuracy. Without DWT, the training accuracy obtained using SVM is 47.83%. 118 emotions were classified accurately out of 121 during re-substitution, whereas with the use of DWT, the training accuracy obtained using SVM is 77.68%. 94 emotions were classified
Fig. 8 Initial features of the data set without DWT and with DWT
accurately out of 121 during re-substitution. A Multiclass SVM was used for the classification among the four emotions and 13 features. Figure 8 shows the pattern of the initial data set without DWT and with DWT. Figure 9 shows the test data points (represented by X) plotted on a graph of trained data points, showing their classification. Figure 10 shows the regions of the various emotions. The first figure shows that there is an overlapping of emotions; however, in the second figure, after performing DWT, the regions for the various emotions are quite clear, reducing the confusion for the model. Here x1 and y1 are two different classes (i.e. training and testing points). This paper obtained a testing accuracy of 47.83% on the test data set of 23 records, with 11 out of 23 emotions recognized correctly without DWT. With the inclusion of DWT, 17 out of 23 emotions were correctly classified, giving a testing accuracy of 73.91%.
Fig. 9 Classified test point on trained model without and with DWT
Fig. 10 Plotting the region of various emotions without and with DWT
4.2.2 Decision Tree Output with RAVDESS Data Set
Without and with DWT The results obtained using the Decision Tree classifier are very good and the best among the three classifiers used. The training accuracy obtained using the D-tree is 100% both without and with DWT; 121 emotions were classified accurately out of 121 during re-substitution. A testing accuracy of 56.52% was obtained on the test data set of 23 records, with 13 out of 23 emotions recognized correctly without DWT. With the inclusion of DWT, 21 out of 23 emotions were correctly classified, giving a testing accuracy of 91.3%. This work used a Multiclass Decision Tree to classify among the four emotions and 13 features. Figure 11 shows the regions of the various emotions. In the first figure, there is an overlapping of emotions; however, in the second figure, after performing DWT, the regions for the various emotions are quite clear, reducing the confusion for the model. Here x1 and y1 are two different classes (i.e. training and testing points).
Fig. 11 Regions of different emotions in DT obtained without and with DWT
This paper uses a re-sampling procedure called cross-validation to evaluate the machine learning models on a limited data sample. In other words, it is a popular method that trains numerous models on subsets of the given input data and subsequently evaluates them on the complementary subset of the data. The primary use of cross-validation is to detect overfitting (which results from failing to generalize a problem). The loss computed between the training response data and the model's predicted response values (based on the input training data) is called the re-substitution loss. This is a scalar value, and the value returned is 0 both with and without DWT. In the case of a decision tree, the re-substitution error is smaller than the cross-validation error, which indicates overfitting. Obtaining a simpler, less complicated tree by pruning the original tree is one of the best solutions to this. Pruning removes the sections of the tree that offer little power to classify instances, thereby reducing the overall size of the decision tree. This method not only decreases the complexity of the final classifier but also enhances predictive accuracy by reducing overfitting, which ultimately helps in reducing the cross-validation error up to a certain point. The best decision tree selected here is the simplest tree with minimum cross-validation error. In the end, without DWT, the value obtained after pruning the decision tree is 0.0826, whereas, in the case of DWT, the value obtained is 0.067. Figure 12 shows the trained and test points of the Decision Tree classifier without and with DWT. Here X and Y are two different classes (i.e. training and testing points). Figure 13 displays the complete classification decision tree obtained with a pruning level of 8. The K-fold loss gives a cross-validation error of 0.1157 for our model.
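The pruning-with-cross-validation procedure described above can be approximated in scikit-learn with cost-complexity pruning: grow the full tree, then pick the pruning strength (ccp_alpha) that minimizes the cross-validation error. The sketch below is an analogous illustration on toy data, not the MATLAB pruning routine used in the paper.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import cross_val_score

X = np.random.rand(121, 16)                               # toy data standing in for the training features
y = np.random.choice(["angry", "sad", "joy", "neutral"], size=121)

full_tree = DecisionTreeClassifier(random_state=0).fit(X, y)
alphas = full_tree.cost_complexity_pruning_path(X, y).ccp_alphas

# For each candidate pruning strength, estimate the cross-validation error and keep the best
cv_errors = [1 - cross_val_score(DecisionTreeClassifier(random_state=0, ccp_alpha=a), X, y, cv=5).mean()
             for a in alphas]
best_alpha = alphas[int(np.argmin(cv_errors))]
pruned_tree = DecisionTreeClassifier(random_state=0, ccp_alpha=best_alpha).fit(X, y)
print("best alpha:", best_alpha, "CV error:", min(cv_errors))
```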
Fig. 12 Training and test point emotion classifier
Fig. 13 Complete classification tree obtained
With DWT Figure 14 displays the complete classification decision tree obtained. The best decision tree is the simplest tree with the least cross-validation error. The value finally obtained after pruning the decision tree is 0.0667.
Fig. 14 Classification tree obtained
4.2.3 LDA Output with RAVDESS Data Set
Without and with DWT Figure 15 depicts the regions of the various emotions. The first figure indicates that there is an overlapping of emotions; however, in the second figure, after performing DWT, the regions for the various emotions are quite clear, reducing the confusion for the model. Here x1 and y1 are two different classes (i.e. training and testing points). Figure 16 depicts the test data points, represented by X, plotted on a graph of trained data points, which shows their classification. Figure 17 shows the points misclassified by the LDA classifier without and with the inclusion of DWT. Without DWT, the training accuracy obtained using LDA is 97.52%, i.e. 118 emotions were classified accurately out of 121 during re-substitution, whereas with the use of DWT, the training accuracy obtained is 83.3%, i.e. 100 emotions were classified accurately out of 120 during re-substitution.
Fig. 15 Region of different emotions classified without and with the use of DWT
Fig. 16 Classified test points on the trained model without and with the use of DWT
Fig. 17 Misclassified points obtained without and with the use of DWT
Without DWT, a testing accuracy of 60.87% is obtained on the test data set of 23 records, with 14 out of 23 emotions recognized correctly; the LDA cross-validation error is 0.0744. With DWT, the testing accuracy obtained is 78.26%, and the corresponding confusion matrix of the LDA classifier is reported below; the LDA cross-validation error is 0.2667.
4.2.4 Confusion Matrix
Figure 18 illustrates the confusion matrices obtained using the three classifiers without DWT; the comparison gives a better understanding of the efficiency of each of the classifiers on each emotion. The first confusion matrix shows the accuracy of the SVM classifier without DWT, i.e. 93.33%, 96.77%, 100% and 100% for anger, sadness, joy and neutrality, respectively. Similarly, the accuracy obtained with the LDA classifier, depicted in the second confusion matrix, in the absence of DWT is 96.77% for anger, 96.77% for sad, 100% for joy and 96.66% for neutral. Finally, the Decision Tree's precision without performing DWT is 100% for every emotion. Figure 19 shows the confusion matrices of the three chosen classifiers for the four different emotions, obtained from the RAVDESS database with DWT. The first matrix shows the accuracy of the SVM classifier with DWT, i.e. 100%, 80%, 43.33% and 90% for anger, sad, joy and neutral, respectively. Similarly, the accuracy obtained with the LDA classifier, portrayed by the second matrix, with DWT is
Fig. 18 Confusion matrix of SVM, Decision Tree and LDA without DWT, respectively
Fig. 19 Confusion matrix of SVM, Decision Tree and LDA with DWT, respectively
Table 1 Output of classifiers with and without DWT of RAVDESS database
Classifier | Accuracy without DWT (%) | Accuracy with DWT (%)
SVM        | 47.83                    | 73.91
D-TREE     | 56.52                    | 91.3
LDA        | 60.87                    | 78.26
93.33% for anger, 80% for sad, 63.33% for joy and 96.66% for neutral. Finally, the Decision Tree's precision with DWT is 100% for every emotion. Table 1 lists the testing accuracies of the three different classifiers, i.e. SVM, Decision Tree and LDA, before and after performing DWT on the RAVDESS database. This paper concludes that by incorporating DWT as a pre-processing step, the accuracy is dramatically increased.
4.2.5 SVM Output with Real-Time Data Set
With and without DWT The results obtained with the Support Vector Machine are good. The training accuracy obtained using SVM is 94.73%; 90 emotions were classified accurately out of 95 during re-substitution. A Multiclass SVM performs the classification among the four emotions and 13 features. Figure 20 depicts the four original features of the data set. Figure 21 shows the regions of the various emotions. The first figure shows that there is an overlapping of emotions; however, in the second figure, after performing DWT, the regions for the various emotions are quite clear, reducing the confusion for the model. Here x1 and y1 are two different classes (i.e. training and testing points). Figure 22 depicts the test data points, represented by X, plotted on a graph of trained data points showing their classification. Without DWT, a testing accuracy of 81.82% is obtained on the test data set of 22 records, with 18 out of 22 emotions recognized correctly. With DWT, the testing accuracy obtained is 90.91%, with 20 out of 22 emotions recognized correctly.
Fig. 20 Original features of the data set
Fig. 21 Regions of various emotions classified by the SVM classifier
Fig. 22 Trained and test points of the SVM classifier with and without DWT
Fig. 23 Complete decision tree without implementing DWT
4.2.6 Decision Tree with Real-Time Data Set
With and without DWT Figure 23 displays the complete classification decision tree obtained without DWT. The best decision tree is the simplest tree with the least cross-validation error; the value finally obtained after pruning the decision tree is 0.035, as shown in Fig. 24. Figure 25 displays the complete classification decision tree obtained with DWT. Here the best decision tree is again the simplest tree with the least cross-validation error; the tree behaves as a simple decision tree, giving both a cross-validation error and a re-substitution error of 0, as shown in Fig. 26. Figure 27 shows the regions of the various emotions. There is an overlapping of emotions in the first figure, whereas in the second figure, after performing DWT, the regions are quite clear, significantly reducing the confusion for the model. Here x1 and y1 are two different classes (i.e. training and testing points). Figure 28 depicts the test data points, represented by X, plotted on a graph of trained data points showing their classification.
4.2.7 LDA Classifier with Real-Time Data Set
With and without DWT Figure 29 shows the points misclassified by the LDA classifier without DWT. Without DWT, the training accuracy obtained using LDA is 91.57%; 87 emotions were classified accurately out of 95 during re-substitution. A testing accuracy of 81.82% is obtained on the
Fig. 24 Simple tree with small CV error without implementing DWT
Fig. 25 Complete decision tree with implementing DWT
test data set of 22 records, with 18 out of 22 emotions recognized correctly. The LDA cross-validation error is 0.1368. With DWT, the training accuracy obtained using LDA is 98.9%; 94 emotions were classified accurately out of 95 during re-substitution. A testing accuracy of 95.45% is obtained on the test data set of 22 records, with 21 out of 22 emotions recognized correctly. The LDA cross-validation error is 0.0421.
Fig. 26 Simple tree with small CV error with implementing DWT
Fig. 27 Regions of various emotions classified by the DT classifier with and without DWT
Figure 30 depicts the test data points, represented by X, plotted on a graph of trained data points showing their classification. Figure 31 shows the regions of the various emotions: in the first figure there is an overlapping of emotions, whereas in the second figure, after performing DWT, the regions for the various emotions are quite clear, reducing the confusion for the model. Below, x1 and y1 are two different classes. Table 2 illustrates the testing accuracy of the three different classifiers, i.e. SVM, Decision Tree and LDA, before and after performing DWT on the real-time database.
Fig. 28 Test Data Points
Fig. 29 Misclassified points by LDA classifier with DWT
Fig. 30 Training and test point Emotion Classifier
Fig. 31 Regions of various emotions classified by the LDA classifier
Table 2 Output of classifiers with and without DWT of real-time database
Classifier | Accuracy without DWT (%) | Accuracy with DWT (%)
SVM        | 81.8                     | 90.91
D-TREE     | 95.45                    | 100
LDA        | 81.82                    | 95.45
The accuracy of SVM increased significantly, by 9.11%, after incorporating the DWT technique. Likewise, the Decision Tree accuracy rose from 95.45 to 100%. Finally, for the LDA classifier, the accuracy without DWT surged by 13.63% when DWT was included in the process.
5 Conclusion In this paper, three different classifiers are compared with and without DWT. The simulation results show that pre-processing the data with DWT does improve the accuracy of the classifiers and helps to recognize the emotions in a better and more accurate way on both data sets. It is also worth observing that gender dependency increases the accuracy of the SER system: comparing the results shows that the gender-dependent real-time data set performs far better than the standard RAVDESS data set. After conducting the required simulations, it is concluded that the accuracy of any SER system depends not only on the selection of methods, classifiers and the data set, but also on the language, gender, accent and quality of the speech data. These factors play a very significant role in determining the efficiency and accuracy of SER systems.
References 1. Schuller B, Seppi D, Batliner A, Maier A, Steidl S (2007) Towards more reality in the recognition of emotional speech. IEEE Conference Publication. https://ieeexplore.ieee.org/ 2. Jiang H, Hu B, Liu Z, Yan L, Wang T, Liu F et al (2017) Investigation of different speech types and emotions for detecting depression using different classifiers. Speech Commun 90:39–46 3. Gaspar J, Schweitzer M (2013) The emotion deception model: a review of deception in negotiation and the role of emotion in deception. Negot Confl Manage Res 6(3):160–179 4. Soumya Barathi C (2016) Lie detection based on facial micro expression, body language and speech analysis. Int J Eng Res V5(02) 5. Low D, Bentley K, Ghosh S (2020) Automated assessment of psychiatric disorders using speech: a systematic review. Laryngoscope Invest Otolaryngol 5(1):96–116 6. Yoon WJ, Park KS (2007) A study of emotion recognition and its applications. In: Torra V, Narukawa Y, Yoshida Y (eds) Modeling decisions for artificial intelligence. MDAI 2007. Lecture Notes in Computer Science, vol 4617. Springer, Berlin, Heidelberg 7. Johnson M, Lapkin S, Long V et al (2014) A systematic review of speech recognition technology in health care. BMC Med Inform DecisMak 14:94 8. Erik Marchi O (2012) Emotion in the speech of children with autism spectrum conditions: prosody and everything else. http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.430. 9242 9. Fan Y, Cheng Y (2014) A typical mismatch negativity in response to emotional voices in people with autism spectrum conditions. PLoS ONE 9(7):e102471 10. Landau M (2008) Acoustical properties of speech as indicators of depression and suicidal risk. Vanderbilt Undergrad Res J 4 11. Sönmez M, Shriberg E, Heck L, Weintraub M (1998) Ee.columbia.edu. Retrieved 9 June 2020, from https://www.ee.columbia.edu/~dpwe/papers/SonSHW98-prosmod.pdf 12. Reynolds D, Andrews W, Campbell J, Navratil J, Peskin B, Adami A et al (2003) The SuperSID project: exploiting high-level information for high-accuracy speaker recognition. Defense Technical Information Center 13. Carey M.J, Parris E.S, Lloyd-Thomas H, Bennett S. (1996): ‘Robust prosodic features for speaker identification’. Proc. ICSLP, Philadelphia, PA. 14. Atal B (1969) Automatic speaker recognition based on pitch contours. J Acoust Soc America 45(1):309–309 15. Fayek H, Lech M, Cavedon L (2017) Evaluating deep learning architectures for speech emotion recognition. Neural Netw 92:60–68 16. Johnson W (1986) Recognition of emotion from vocal cues. Arch Gen Psychiatry 43(3):280 17. Huang C, Song B, Zhao L (2016) Emotional speech feature normalization and recognition based on speaker-sensitive feature clustering. Int J Speech Technol 19(4):805–816 18. Ljolje A, Fallside F (1987) Recognition of isolated prosodic patterns using hidden Markov models. Comput Speech Lang 2(1):27–34 19. Shen P, Changjun Z, Chen X (2011) Automatic speech emotion recognition using support vector machine—IEEE conference publication. https://ieeexplore.ieee.org/document/6023178/ 20. Chomphan S (2009) Towards the development of speaker-dependent and speaker-independent hidden Markov model-based Thai speech synthesis. J Comput Sci 5(12):905–914 21. Sethu V, Ambikairajah E, Epps J (2009) https://www2.ee.unsw.edu.au/~speechgroup/vidhya/ pdf/icassp09.pdf 22. Ververidis D, Kotropoulos C (2006) Emotional speech recognition: resources, features, and methods. Speech Commun 48(9):1162–1181 23. Chavhan Y, Dhore M, Yesaware P (2010) Speech emotion recognition using support vector machine. 
Int J Comp Appl 1(20):8–11 24. Tusón J (2000) Diccionari de lingüística. Bibliograf [Publication in Spanish] 25. Nooteboom S (2014) The prosody of speech: melody and rhythm. In: Hardcastle WJ, Laver J (eds) The handbook of phonetic sciences, Blackwell Publishers Ltd., Oxford, pp 641–673
26. Wennerstrom A (2001) The music of everyday speech: prosody and discourse analysis. Oxford University Press, Oxford 27. Lee CM, Narayanan S, Pieraccini R (2001). Recognition of negative emotions from the speech signal. In: Proceedings IEEE automatic speech recognition and understanding Wsh (ASRU) 28. Schuller B, Reiter S, et al (2005). Speaker Independent Speech Emotion Recognition by Ensemble Classification. In Int. Conf. on Multimedia and Expo, pp. 864–867. 29. Kehrein R (2002) The prosody of authentic emotions. In: Speech prosody conference, pp 423–426 30. Davis SB, Mermelstein P (1980) Comparison of parametric representations for monosyllabic word recognition in continuously spoken sentences. IEEE Trans Acoustic Speech Signal Proc 357–366 31. Oppenheim AV, Schafer RW (2004) From frequency to quefrency: a history of the cepstrum. IEEE Signal Process Mag 95–106 32. Nadeu C, Hernando J, Gorricho M (1995) On the decorrelation of filter bank energies in speech recognition. In: Proceedings of Eurospeech, Madrid, Spain 33. Kwon OW, Chan K, Hao J, Lee TW (2003) Emotion recognition by speech signals. In: Proceedings of Eurospeech conference 34. Davood G, MansourS AN, Sahar G (2012) Speech emotion recognition using FCBF feature selection method and GA-optimized fuzzy ARTMAP neural network. Neural Comput Appl 21(8):2115–2126 35. New TL, Foo SW, Silva LCD (2003) Speech emotion recognition using hidden Markov models. Speech Commun 41(4):603–623 36. Ramakrishnan S, Emary IM (2013) Speech emotion recognition approaches in humancomputer interaction. Telecommun Syst 52(3):1467–1478 37. Sungrackand Y, Yoo CD (2011) Loss scaled large margin Gaussian mixture models for speech emotion classification. AIEEE Trans Audio, Speech Lang Proc 20(2):585–598 38. Pao TL, Chen YT, Yeh JH, Cheng YM and Lin YY (2007) A comparative study of different weighting schemes on KNN based emotion recognition in Mandarin speech. Adv Intell Comput Theor Appl with Aspects Theor Methodol 4681:997–1005 39. Morrison D, Wang R, Silva LD (2007) Ensemble methods for spoken emotion recognition in call-centres. Speech Commun 49(2):98–112 40. Mao X, Chen L (2010) Speech emotion recognition based on parametric filter and fractal dimension. IEICE Trans Inf Syst E93–D(8):2324–2326 41. Schuller B, Muller R, Lang M, Rigoll G (2005a) Speaker independent emotion recognition by early fusion of acoustic and linguistic features within ensembles. In: Proceedings of 9th Eurospeech–Interspeech. 42. Vogt T, André E (2006) Improving automatic emotion recognition from speech via gender differentiation. In: Proceedings of language resources and evaluation conference 43. Eyben F, Wollmer M, and Schuller B (2009) OpenEAR—introducing the Munich open-source emotion and affect recognition toolkit. In: Proceedings of ACII 2009. IEEE, pp 1–6 44. Mower E, Mataric MJ, Narayanan S (2011) A framework for automatic human emotion classification using emotion profiles. IEEE Trans Audio Speech Lang Process 19(5):1057–1070 45. Kim Y and Mower Provost E (2013). Emotion classification via utterance-level dynamics: a pattern-based approach to characterizing affective expressions. In: Proceedings of IEEE ICASSP 2013. IEEE 46. Yu D, Seltzer ML, Li J, Huang J-T and Seide F (2013). Feature learning in deep neural networksstudies on speech recognition tasks, Published at ICLAR 47. Vapnik V (1995) The nature of statistical learning theory. Springer, New York 48. 
Lim and Chang (2012) Enhancing support vector machine based speech/music classification using conditional maximum a Posteriori Criterion. Signal Process IET 6(4):335–340 49. Hasan Md. Al, Ahmad S (2018) predSucc-Site: lysine succinylation sites prediction in proteins by using support vector machine and resolving data imbalance issue. Int J Comp Appl 182(15):8–13
Review of Discrete Wavelet Transform-Based Emotion Recognition …
53
50. Yacoub S, Simske S et al (2003) Recognition of emotions in interactive voice response systems. Tech Rep, HP Laboratories Palo Alto 51. Duda RO, Hart PE (1973) Pattern classification and scene analysis. Wiley, New York 52. Doddington GR (1989) Phonetically sensitive discriminants for improved speech recognition. In: Proceedings of the IEEE international conference on acoustics, speech, and signal processing, Glasgow, UK, pp 556–559 53. Haeb-Umbach R, Geller D and Ney H, (1993) Improvements in connected digit recognition using linear discriminant analysis and mixture densities, proceeding ICASSP 54. Mohanaprasad K, Arulmozhivarman P (2015) Wavelet-based ICA using maximum likelihood estimation and information-theoretic measure for acoustic echo cancellation during double talk situation. Circ Syst Sign Process (Springer) 34(12):3915–3931 55. Mohanaprasad K, Arulmozhivarman P (2015) Wavelet based ICA using maximization of nongaussianity for acoustic echo cancellation during double talk situation. Appl Acous (Elsevier) 97:37–45 56. Mohanaprasad K, Singh A, Sinha K, Ketkar T (2019) Noise reduction in speech signals using adaptive independent component analysis (ICA) for hands free communication devices. Int J Speech Technol 22(1):169–177
Network Intrusion Detection Using Machine Learning Pratik Kumar Prajapati, Ishanika Singh, and N. Subhashini
Abstract With the rapid growth in the use of computer networks, the issues of maintaining a network also increase. Moreover, attackers keep changing their tools and techniques, which increases the necessity of adopting the right type of intrusion detection system (IDS). An IDS monitors the network for suspicious activity and protects the computer network from unauthorized access. In the proposed work, a network intrusion detection system was developed using various machine learning classifiers on the KDD99 data set; it is a predictive model that can distinguish between intrusions and normal connections. In the early stage of our implementation, we found the correlation between the attributes and mapped the categorical features as part of data pre-processing. Subsequently, we implemented several models, such as Naive Bayes, Decision Tree, Random Forest, Support Vector Classifier, Logistic Regression and Ensemble Voting Model. All the models were analysed based on their scores on metrics such as accuracy, recall, f1-score and precision, along with training and testing times. The results demonstrated that the Naive Bayes classifier achieved the lowest accuracy, whereas Random Forest had the highest accuracy and the best scores on all the metrics. Keywords Machine learning · Intrusion detection system · KDD99 · Classification
1 Introduction Security threats and attacks such as unsecured network intrusion, policy violations and more are now increasingly common to companies around the world, leading to huge revenue losses. It is important to make sure your business doesn’t fall victim to intrusion attacks. Due to implementation and design flaws, the network is vulnerable P. K. Prajapati (B) · I. Singh School of Computer Science and Engineering, Vellore Institute of Technology, Chennai, India e-mail: [email protected] N. Subhashini School of Electronics Engineering, Vellore Institute of Technology, Chennai, India © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 N. Subhashini et al. (eds.), Futuristic Communication and Network Technologies, Lecture Notes in Electrical Engineering 966, https://doi.org/10.1007/978-981-19-8338-2_4
to various attacks. The main flaw can be the cause of the most serious vulnerabilities, and these vulnerabilities can be exploited by attackers following a sequential process. Intrusion attacks can cause serious damage to your network and embedded systems. With the continued growth of online services, such as online banking and e-commerce, security has become an important requirement. An intrusion detection system (IDS) is a type of hardware and software used to identify and reduce threats and attacks. An IDS monitors your inbound and outbound network traffic and continually analyses activity patterns while alerting you to abnormal network behaviour. A misuse IDS uses tables of malicious signatures to identify attacks; if a packet matches a signature in the table, an alert is issued. An anomaly detection system, in contrast, defines the normal behaviour of the network and the host, where the normal state of the network and the host means that there is no attack in progress. When abnormal activity occurs, the anomaly detector raises an alarm. An anomaly detection system differs from signature and misuse detection systems in that it can detect new, unknown attacks, because its detection strategy is based on changes in state information rather than on the matching of attack signatures. The IDS monitors all incoming and outgoing traffic, inspects the data packets transmitted across the network and issues a warning if the traffic deviates from the normal pattern. In this way, one is warned about intrusion threats in advance, which allows one to respond to such threats firmly. But we must remember that a coin has two sides, and an IDS comes with both pros and cons. Sometimes an IDS can raise false alarms, which need to be identified by the organization. The organization needs to configure the system so that it can easily recognize normal traffic and distinguish any dangerous activity. Also, an IDS helps to monitor the network but does not prevent or resolve such issues; the organization needs to hire the right staff to look into such threats and take immediate action. An IDS also suffers from a major problem in that it can fail to detect new intrusions, as new malware may not match any previously observed pattern of behaviour. Therefore, the IDS must take steps to learn new patterns of behaviour so that the organization can take immediate security measures against such threats. Machine learning allows computers to learn and improve from experience without the need for explicit programming. The learning process begins with observing data and looking for patterns to predict categories. It can be divided into supervised learning and unsupervised learning. An unsupervised learning algorithm learns patterns without any labelled data set and reports anomalies; it can detect new types of attacks but is prone to false positives. To reduce this, we can use a labelled data set and build a supervised model that learns the difference between normal and attack packets. It can handle known attacks and can also detect variations of attacks. The supervised learning algorithms used in this work are Naive Bayes, Support Vector Machine (SVM), Decision Tree, Random Forest, Logistic Regression and Ensemble Voting Model. In the rest of the paper, Sect. 2 gives an overview of the literature survey we have done for this work. Section 3 describes the methodology and the various algorithms used; in Sect. 4, implementation details are described. The detailed analysis of these IDS models is discussed in Sect.
5 of the paper along with the description of the
parameters utilized for analysis as well as a brief discussion on the findings. Finally, Sect. 6 draws the conclusion of this study and proposes the future work which can be carried out for analysing and building more robust and efficient ML-based models.
2 Literature Survey Separating undesirable network traffic from the normal traffic is a difficult task. An analyst must search through a large amount of data to detect unfavourable network connections. To support this work, an application is required that enhances its domain knowledge using machine learning techniques to create the rules for detection. Genetic and decision-making technologies are used to automatically generate rules for separating network traffic. In paper [1], the authors explain how machine learning can be used for this application. In paper [2], the authors use multiple machine learning algorithms such as decision trees, random decision forests, random trees, Bayesian networks, Naive Bayes classifiers, decision tables and artificial neural networks to study the impact of these algorithms in the field of network security. Finally, the performance of different tests on the network security data set is evaluated through the various stages of the network attack, and performance indicators such as precision, recall, f1-score and accuracy are evaluated. In paper [3], distribution histograms, distribution areas and information gain are presented as tools for feature reduction. They first analyse the data from KDD Cup '99' and its potential for feature reduction. Each connection record in the database contains 41 features that support intrusion detection. Each connection is classified as "normal" or assigned to one of four types of attacks: denial of service, network probing, remote to local or user to root. Using their custom feature selection process, they show that the numerical features in the database can be significantly reduced to a few key features. Finally, a small collection of two-stage and multi-class detectors is introduced with 48 outstanding characteristics, as well as category detection for each attack; the performance of using the reduced features approaches the good performance of using all available features. The proposed procedure is generic and can be used for any similar database. In paper [4], the author works with PCA for feature selection together with Naïve Bayes in stages to create a network intrusion detection system. Experimental analysis is made using the KDD Cup 1999 intrusion detection benchmark data set, and a two-phase classification is performed. Test results show high accuracy of the proposed method with a lower false positive rate, and the time taken is less compared to other methods available for the construction of an effective network intrusion detection system. In paper [5], the authors compared the results of distributed attack detection by deploying a deep learning model in a single node and in multiple coordinated nodes. They also evaluated the effectiveness of deep learning and shallow learning approaches. In [6], the authors aim to use data mining techniques, including decision trees and support vector machines, to detect
intrusion detection. As the results show, the C4.5 algorithm is better than SVM in terms of detecting network intrusions and the false alarm level on the KDD CUP 99 database. In [7], an AI-based intelligent intrusion detection system using a deep neural network (DNN) was studied and tested using KDD Cup 99 data to deal with ongoing cyber-attacks. First, the data is processed by modifying it according to the requirements of the DNN model. Then, the DNN algorithm was applied to the pre-processed data to create a training model. The performance of the DNN model was studied, including accuracy, detection rate and false alarm level; it is shown to produce positive results for intrusion detection. In [8], an in-depth study of the lessons learnt in the use of machine learning for IDS is presented. Two ways have been introduced to detect network attacks: a tree-based method combined with genetic techniques that can surpass state-of-the-art results in the field, and a new approach to selecting IDS training data. By using a small subset of training data combined with other weak classification algorithms, an improved detector performance at lower running cost has been achieved. In paper [9], work has been carried out on reducing the burden on an analyst using automatic detection and on providing protection in an effective way. The method continuously scans the system and takes action according to the threat identified. Suspicious activity is detected with the help of a virtual analyst that works alongside the system, monitoring the network to protect the threatened area and take appropriate action. In the last stage, a package was developed to evaluate all the attack vectors and separate supervised (labelled) and unverified (unlabelled) data. When unlabelled data is encoded or converted into labelled data, the algorithm is automatically updated (Virtual Analyst Algorithm). If the algorithm uses active-learning mechanisms over time, it works more efficiently. In article [10], the authors evaluated the Aegean Wi-Fi Intrusion Dataset (AWID) with various ML methods. Various feature reduction strategies such as information gain (IG) and chi-square (CH) statistics were used for feature reduction, and the performance on the data set was assessed. Test results show that a reduced feature set leads to better results in terms of accuracy, processing time and complexity. In paper [11], the Spark-Chi-SVM model is used for intrusion detection. In this work, the feature selection is done using ChiSqSelector, and an intrusion detection model is then built using the Support Vector Machine classifier. They have used KDD99 data for model training and testing. They have compared the Chi-SVM classifier and the Chi-Logistic Regression classifier. The test results have shown that the Spark-Chi-SVM model works very well, reducing the training time and improving efficiency for large data. In paper [12], a Measured Intrusion Detection System (MIDS) is proposed which allows the system to detect unusual activities in the system even if the attacker hides them. A supervised ML model is used to classify the activities in an industrial control system to assess the performance of MIDS. Although MIDS can detect anomalies, it cannot prevent malicious intrusions at the network traffic layer. MIDS can detect anomalies even when attackers successfully trick the NIDS by mimicking normal behaviour, so MIDS is not a substitute for the NIDS. In paper [13], the ten most popular ML algorithms are evaluated on the NSL-KDD data set.
Then, evaluations of these ML models are made to determine the best ML algorithm according to the performance they give based on various parameters such as specificity, precision and
sensitivity. After the analysis of the four best algorithms, it is observed that they take a lot of time in the model building process. Therefore, to get the shortest possible time, a feature selection method is applied without any compromise in accuracy. It is observed that even after using the feature selection or reduction methods, the four best models take more time for model building than the Random Tree (without feature selection and reduction) while achieving almost the same precision. In [14], it was observed that a machine learning model based on an artificial neural network (ANN) with wrapper-based feature selection performs better than the Support Vector Machine (SVM) technique when classifying the network traffic. The NSL-KDD data set is used here to classify the network data using ANN and SVM supervised techniques. On comparison, it is observed that the proposed model is more efficient than the previously existing models. In most of the methods discussed above, the time to build and train a model is not considered for model comparison and evaluation. However, in real-world scenarios it is an important parameter. Hence, we consider time along with accuracy to build an ideal model which achieves a low false alarm rate and high accuracy.
3 Methodology This paper is about building an intrusion detection system using machine learning techniques. The learning task of an intrusion detector is to build a predictive model (i.e. a classifier) that is able to distinguish between "bad connections" (intrusion/attack) and good connections (normal). In Fig. 1, the system architecture is presented. In the presented work, we first intercept the network traffic using already available tools. The captured packet is then passed through the pre-trained ML classifier, which is a part of intrusion detection. As per the classification, an alert is generated, and the packet is blocked or forwarded accordingly. For intercepting the packet, we can use ARP poisoning and then analyse and forward the packet on a remote machine, or we can use a host-only interceptor as well.
Fig. 1 Proposed system architecture
This work mainly focusses on the intrusion detection part of the system architecture, under which we have trained various ML classifiers and analysed them for a faster and more accurate response for alert generation.
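As an illustration of the alert-generation step described above, the following is a minimal sketch, assuming a classifier trained on KDD99-style connection records has already been saved with joblib and that feature extraction from the intercepted packet is handled elsewhere; the file name and label values are placeholders, not artifacts of this work.

```python
import joblib
import numpy as np

# Hypothetical pre-trained model produced by the training step described in Sect. 4
model = joblib.load("ids_random_forest.joblib")

def handle_connection(feature_vector):
    """Classify one pre-processed connection record and decide what to do with it."""
    x = np.asarray(feature_vector, dtype=float).reshape(1, -1)
    label = model.predict(x)[0]
    if label == "normal":
        return "forward"   # benign traffic is forwarded
    print(f"ALERT: suspected intrusion of type '{label}'")
    return "block"         # suspicious traffic is blocked
```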
4 Implementation In this section, we discuss the implementation details of our work; they include data set exploration and pre-processing, and modelling of the classifiers for training and testing.
4.1 Dataset Exploration and Pre-processing The KDD Cup'99 data set is itself a subset of the original intrusion detection evaluation data set compiled by MIT's Lincoln Laboratory. It consists of network connection traces representing attacks and normal network activity. Each connection is described by 41 features such as duration, protocol type, the number of source bytes and the number of destination bytes. Attacks are of four different types: Denial of Service (DoS), R2L (unauthorized access from a remote machine), U2R (unauthorized access to local superuser (root) privileges) and Probing (surveillance and other probing). Figure 2 shows the frequency distribution (Y-axis) of different types of attacks in the data set (X-axis). In Fig. 3, the workflow is presented. In pre-processing, we find missing values and normalize them. The categorical features are found, and mapping is done. The correlation between the attributes is then found to assess the highly correlated features. If two features are highly correlated, then we remove one of them from the data frame, as highly correlated features don't contribute much in training a model.
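The following is a minimal pre-processing sketch along the lines described above, assuming the data set is available as a CSV file with the 41 features plus a label column; the file name, the column handling and the 0.95 correlation threshold are assumptions rather than values taken from the paper.

```python
import numpy as np
import pandas as pd

df = pd.read_csv("kddcup99.csv")  # hypothetical CSV with the 41 features and a 'label' column

# Map categorical attributes (protocol_type, service, flag, ...) to integer codes
for col in df.select_dtypes(include="object").columns:
    if col != "label":
        df[col] = df[col].astype("category").cat.codes

# Correlation-based pruning: drop one feature of every highly correlated pair
corr = df.drop(columns="label").corr().abs()
upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
to_drop = [c for c in upper.columns if (upper[c] > 0.95).any()]
df = df.drop(columns=to_drop)

X, y = df.drop(columns="label"), df["label"]
print(f"Dropped {len(to_drop)} highly correlated features: {to_drop}")
```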
Fig. 2 Number of entries for each type of attack
Fig. 3 Workflow diagram
4.2 Classifier Modelling and Training In the modelling phase, we split the data into training and testing sets in the ratio of 75% to 25%. Each of the models is then trained separately, and their accuracy, training and testing time are recorded for further analysis. In the decision tree, we used the ID3 method, and entropy was calculated to determine the root node. In Logistic Regression, the maximum number of iterations for normalizing the classification boundary was set to 1,200,000. Along with the decision tree, we also used Random Forest, which is an ensemble model of decision trees, and
the number of estimators, i.e. the number of different trees formed from the data set, was set to 10. In the Support Vector Machine, a linear kernel was considered with scaling. In the Ensemble Voting method, we took Logistic Regression, Random Forest and SVM as the base models and used hard voting to get the output. We further train the Naïve Bayes, Decision Tree, Logistic Regression, Random Forest, Support Vector Machine and Ensemble Voting models using the sklearn library in Python and record the training time, testing time and accuracy for each model.
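A condensed sketch of this training setup is given below; it assumes the pre-processed feature matrix X and labels y from Sect. 4.1 and uses the hyperparameters mentioned above (entropy splits, 1,200,000 iterations for Logistic Regression, 10 random-forest estimators, a scaled linear-kernel SVM and hard voting). Exact timings and scores will depend on the machine and the data.

```python
import time
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# X, y come from the pre-processing step in Sect. 4.1
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)

models = {
    "NB": GaussianNB(),
    "DT": DecisionTreeClassifier(criterion="entropy"),             # entropy-based splits (ID3 style)
    "RF": RandomForestClassifier(n_estimators=10),
    "SVM": make_pipeline(StandardScaler(), SVC(kernel="linear")),  # linear kernel with scaling
    "LR": LogisticRegression(max_iter=1_200_000),
}
models["En"] = VotingClassifier(                                   # hard-voting ensemble
    estimators=[("lr", models["LR"]), ("rf", models["RF"]), ("svm", models["SVM"])],
    voting="hard",
)

for name, clf in models.items():
    start = time.time()
    clf.fit(X_train, y_train)
    train_time = time.time() - start
    start = time.time()
    accuracy = clf.score(X_test, y_test)
    test_time = time.time() - start
    print(f"{name}: accuracy={accuracy:.4f}, train={train_time:.2f}s, test={test_time:.2f}s")
```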
5 Results and Analysis In this section, we analyse the various classification techniques such as Naïve Bayes (NB), Decision Tree (DT), Random Forest (RF), Support Vector Machine (SVM), Logistic Regression (LR) and Ensemble Voting Classifier (En) based on their scores on different performance metrics. Accuracy is defined as the percentage of correct predictions on the test data; it can be calculated by dividing the number of correct predictions by the total number of predictions. In our work, we have considered both training and testing accuracy of the models to make sure that the models don't overfit or underfit. Apart from accuracy and time, we have also taken precision, recall and f-score as metrics for model analysis. The formal equations for the metrics are

Precision = TP / (TP + FP)    (1)

Recall = TP / (TP + FN)    (2)

Fscore = 2 × (Precision × Recall) / (Precision + Recall)    (3)

Accuracy = (TP + TN) / (TP + TN + FP + FN)    (4)
Here TP denotes true positives, TN denotes true negatives, FP denotes false positives and FN denotes false negatives. The training and testing scores obtained didn't show much variation for any model, so all the models were fitting properly. The accuracy percentages of the training and testing scores of the models are given in Table 1. Figure 4 plots the accuracy scores of the models; we can see that apart from Naïve Bayes (NB), all other models have shown high accuracy scores. Similarly, we have recorded the training time and testing time (in seconds) of all models, which are provided in Table 2.
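To make Eqs. (1)–(4) concrete, the following is a small toy illustration (with made-up labels, not results from this work) showing how the four metrics follow from true and predicted labels using scikit-learn.

```python
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score

# Toy labels only for illustration: 1 = attack, 0 = normal
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 0, 1, 0]

# For these labels TP = 3, FP = 1, FN = 1, TN = 3
print("Precision:", precision_score(y_true, y_pred))  # 3 / (3 + 1) = 0.75, Eq. (1)
print("Recall:   ", recall_score(y_true, y_pred))     # 3 / (3 + 1) = 0.75, Eq. (2)
print("F1-score: ", f1_score(y_true, y_pred))         # 2*0.75*0.75 / (0.75 + 0.75) = 0.75, Eq. (3)
print("Accuracy: ", accuracy_score(y_true, y_pred))   # (3 + 3) / 8 = 0.75, Eq. (4)
```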
Table 1 Training and testing accuracy of models

Models                   Training accuracy (%)   Testing accuracy (%)
Naïve Bayes              87.96                   87.93
Decision tree            99.05                   99.04
Random forest            99.99                   99.99
Support vector machine   99.84                   99.85
Logistic regression      99.19                   99.18
Ensemble learning        99.71                   99.70
Fig. 4 Accuracy of models
Table 2 Training and testing time of models

Models                   Training time (s)   Testing time (s)
Naïve Bayes              0.6579              0.4298
Decision tree            1.1692              0.0349
Random forest            8.4544              1.2080
Support vector machine   261.3549            32.2670
Logistic regression      10.5083             0.5549
Ensemble learning        273.5025            33.4788
Figure 5 shows the variation in training time among the models; a huge variation is observed. It is observed that the SVM and Ensemble (En) models were slow in training compared to the others.
Fig. 5 Plot of training time of models
In Table 3, we have recorded the precision, recall, f1-score and accuracy for all the models. The recorded data shows that even though all the models have high accuracy, the other metrics vary substantially. The decision tree showed high accuracy, but its recall, f1-score and precision were the lowest with respect to the other models. Similarly, the Logistic Regression, Support Vector and Ensemble Voting classifiers showed high accuracy and had good precision, recall and f1-score, but were still outperformed by the Random Forest classifier, which showed the highest scores in all the performance metrics used. In Fig. 6, we can see the visual representation of the scores obtained by all the models for the different performance metrics. In the plot, we can observe that Random Forest (RF), which is represented in green, scores substantially higher in accuracy, f1-score, recall and precision as compared to the other models.

Table 3 Accuracy, f1-score, recall and precision for all the models

Metric      Naïve Bayes   Decision tree   Random forest   Support vector   Logistic regression   Ensemble voting
Accuracy    0.879         0.990           0.999           0.998            0.991                 0.997
F1-score    0.447         0.531           0.936           0.908            0.829                 0.842
Recall      0.738         0.574           0.913           0.886            0.782                 0.795
Precision   0.466         0.505           0.967           0.934            0.914                 0.932

Fig. 6 Scores of all the models based on accuracy, f1-score, recall and precision

It is observed that the Random Forest model fits well with our data by looking at all the metrics and the time complexity. It shows high accuracy, precision, recall and f1-score, and although its training time was not the fastest, it was still lower than that of the SVM, Logistic Regression and Ensemble Voting classifiers.
6 Conclusion The rapid growth in Internet use and the creation of a wealth of valuable digital data are attracting intruders seeking illegal financial gain. This work examines the use of ML algorithms in IDS, conducts a specific study on the KDD'99 Cup data set using ML algorithms, and then compares their results. It is important to note that even with the same feature selection techniques used for all classifiers, the classifiers which performed best in terms of accuracy are not always the best in terms of training and testing time; the only classification model which gives high accuracy in comparatively less time is Random Forest. The above conclusion is based on the KDD'99 database, and the performances of these classifiers may vary with different datasets. Thus, it is important to make the right choice of classifier for real-time use. Our results will provide future researchers with an in-depth performance analysis of the most common classification methods. In future work, we can evaluate the performances of these ML algorithms with different feature selection methodologies, compare these ML algorithms on the basis of their space requirements, and build a real-time system which deals with high-speed network data.
References 1. Sinclair C, Pierce L, Matzner S (1999) An application of machine learning to network intrusion detection. In: Proceedings 15th annual computer security applications conference (ACSAC’99), 1999 2. Alqahtani H, Sarker IH, Kalim A, Minhaz Hossain SM, Ikhlaq S, Hossain S (2020) Cyber intrusion detection using machine learning classification techniques. In: International conference on computing science, communication and security, 2020 3. Almseidin M, Alzubi M, Kovacs S, Alkasassbeh M (2017) Evaluation of machine learning algorithms for intrusion detection system. In: 2017 IEEE 15th international symposium on intelligent systems and informatics (SISY), 2017 4. Neethu B (2013) Adaptive intrusion detection using machine learning. IJCSNS Int J Comp Sci Netw Secur 13(3) 5. KAP da Costa, Papa JP, Lisboa CO, Munozb R, VHC de Albuquerque (2019) Internet of Things: a survey on machine learning-based intrusion detection approaches. Int J Comp Sci Inform Secur (IJCSIS) 6. Ektefa M, Memar S, Sidi F, Affendey LS (2010) Intrusion detection using data mining techniques. In: International conference on information retrieval and knowledge management (CAMP), 2010 7. Kim J, Shin N, Yeon Jo S, Kim SH (2017) Method of intrusion detection using deep neural network. In: IEEE international conference on big data and smart computing (BigComp) 8. Dang QV (2019) Studying machine learning techniques for intrusion detection systems. In: International conference on future data and security engineering 9. Repalle SA, Kolluru VR (2017) Intrusion detection system using AI and machine learning algorithm. Int Res J Eng Technol (IRJET) 10. Thanthrige USKPM, Samarabandu J, Wang X (2016) Machine learning techniques for intrusion detection on public dataset. In: IEEE Canadian conference on electrical and computer engineering (CCECE) 11. Othman SM et al (2018) Intrusion detection model using machine learning algorithm on Big Data environment. J Big Data 5(1):1–12 12. Mokhtari S et al (2021) A machine learning approach for anomaly detection in industrial control systems based on measurement data. Electronics 10(4):407 13. Malhotra H, Sharma P (2019) Intrusion detection using machine learning and feature selection. Int J Comp Netw Inform Secur 11(4) 14. Taher KA, Jisan BMY, Rahman MM (2019) Network intrusion detection using supervised machine learning technique with feature selection. In: 2019 international conference on robotics, electrical and signal processing techniques (ICREST). IEEE
Glacier Ice Surface Velocity Using Interferometry M. Geetha Priya, D. Krishnaveni, and I. M. Bahuguna
Abstract This research paper presents a case study utilizing the SAR interferometric technique for retrieving the ice velocity of the Samudra Tapu and Bada Shigri glaciers of the Chandra basin in Himachal Pradesh. Two Sentinel-1 images acquired 6 days apart during the ablation period were used and processed for an interferogram using the GMTSAR software. This is probably the best interval for getting coherence in SAR image pairs. Moreover, the changes in the surface characteristics of glaciers over 6 days are normally not significant, resulting in good coherence in interferometric pairs (a primary condition for good results). In general, various research groups report glacier velocity as an average over one or two years. The obtained results show that the velocity varies from a minimum of 3.2 cm/day to a maximum of 5.0 cm/day for the Bada Shigri glacier and from a minimum of 3.6 cm/day to a maximum of 3.9 cm/day for the Samudra Tapu glacier, respectively. Similar results for the two nearby glaciers indicate the consistency of the results. Keywords INSAR · Displacement · Surface velocity · Glacier · DEM
1 Introduction The movement of ice within a glacier system is central to all the processes which are governed by glacier–climate interactions. It is embedded in the definition itself of glaciers. This movement of ice varies in time and space and differs from one glacier to another. The primary reasons for the variability are the total mass of the glacier, slope of glacier bed, meltwater system, and conditions of stress and strain developed within M. G. Priya (B) CIIRC, Jyothy Institute of Technology, Bengaluru, India e-mail: [email protected] D. Krishnaveni Department of ECE, Jyothy Institute of Technology, Bengaluru, India I. M. Bahuguna Space Applications Centre, Indian Space Research Organization, Ahmedabad, India © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 N. Subhashini et al. (eds.), Futuristic Communication and Network Technologies, Lecture Notes in Electrical Engineering 966, https://doi.org/10.1007/978-981-19-8338-2_5
a glacier. So, ice velocity becomes an obvious parameter, monitoring of which can help in the understanding of glacier dynamics and understanding the effect of climate forcing on glaciers [1]. This parameter is also used to retrieve the ice thickness of glaciers. Ice velocity can be measured on the ground using GPS measurements made on the markers or stakes drilled into the ice. However, the ground methods are very difficult to be employed in Himalayan terrain, and it is not feasible to measure the ice velocity of several thousand glaciers in a short period. Therefore, the need of knowing velocity of ice has driven development of newer techniques utilizing images acquired by orbiting satellites. Two major approaches to retrieving velocity from images have evolved over the years; first, the techniques based on image correlations mainly utilize optical images and the other is based on SAR interferometry [2]. Both the approaches have their advantages and limitations related to imaging radiometry and geometry. SAR images are known to be more useful even in cloud cover conditions. SAR signals can also penetrate through dry snow, so SAR interferometric pairs can be utilized to find velocity even in winters when glaciers are covered with dry snow. Most of the work carried out so far in determining velocity in Himalayan or other mountain ranges is based on image correlation techniques utilizing optical and SAR images [3]. In SAR images, it is called intensity tracking. SAR interferometry is a much less explored area of finding velocity.
2 Study Area and Dataset The two glaciers, namely Bada Shigri and Samudra Tapu, from the Chandra basin in Lahaul and Spiti district, Himachal Pradesh, have been taken up for glacier ice surface velocity estimation using the SAR interferometry technique (Fig. 1). Bada Shigri (137 km2) is the largest glacier and Samudra Tapu (95 km2) is the second-largest glacier in the upper Chandra basin [4]. The existing snouts of the Bada Shigri and Samudra Tapu glaciers are located at approximately 4000 m elevation above MSL (32°06' N, 77°44' E), at a distance of about 4 km from the Chandra river bed, and at approximately 4200 m elevation above MSL (32°30' N, 77°32' E), about 10 km southwest of Chandra Tal Lake [5], respectively. The Sentinel-1 satellite constellation from the European Space Agency under the Copernicus program contains two satellites, Sentinel-1A and Sentinel-1B, with Synthetic Aperture Radar sensors operating at a frequency of 5.405 GHz (C band). Sentinel-1 offers a combined revisit period of 6 days and individually a 12-day revisit period. A Sentinel-1A and 1B SAR image pair with a 6-day interval (Table 1) was collected for the SAR interferometry study, in Interferometric Wide Swath (IW) mode covering a 250 km swath and with 5 × 20 m spatial resolution.
Fig. 1 Study area
3 Methodology Radar interferometry, shown in Fig. 2, is a process that is used to estimate displacement from two SAR images acquired with slightly different geometry and a small baseline distance, separated over a time period. The phase difference between the two SAR images (master and slave, respectively) with respect to the target is used to estimate movement in the terrain. For interferogram generation, defined by Eq. (1), the SAR SLC data with amplitude and phase information is considered.

Interferogram (InSAR) = SAR_Master × SAR_Slave = A_M A_S exp(θ)    (1)
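As a minimal numerical sketch of Eq. (1), assuming the two co-registered SLC images are available as complex NumPy arrays (random data stands in for real Sentinel-1 samples here), the interferogram is formed pixel-wise; in standard InSAR practice the slave image enters through its complex conjugate so that the two phases subtract.

```python
import numpy as np

rng = np.random.default_rng(0)
master = rng.standard_normal((512, 512)) + 1j * rng.standard_normal((512, 512))
slave = rng.standard_normal((512, 512)) + 1j * rng.standard_normal((512, 512))

# Eq. (1): complex interferogram from the master and (conjugated) slave SLCs
interferogram = master * np.conj(slave)
amplitude = np.abs(interferogram)        # A_M * A_S
wrapped_phase = np.angle(interferogram)  # phase difference theta, known only modulo 2*pi
```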
Table 1 Data specification

Parameters               Image-1                       Image-2
Date of acquisition      22/9/2016                     28/9/2016
Satellite sensor         Sentinel-1A (SAR)             Sentinel-1B (SAR)
Product level            Single look complex           Single look complex
Mode                     Interferometric wide swath    Interferometric wide swath
Perpendicular baseline   29 m
Temporal baseline        6 days (ascending pass)
Frequency                5.405 GHz
Wavelength               5.6 cm
Spatial resolution       5 × 20 m
Fig. 2 Geometric parameters of a satellite interferometric SAR system
and

θ = 4πR/λ    (2)

where A_M and A_S are the master and slave amplitude data, respectively, θ is the phase difference, λ is the SAR wavelength, and R is the radar–target distance. Equation (2) is the fundamental relation; the initial interferogram phase is wrapped into an interval of 0 to 2π. The generated interferogram, corrected by the "known" topography provided as a Digital Elevation Model (DEM), gives the Differential Interferogram (DInSAR). The DEM also provides optimal removal of "baseline decorrelation" in conjunction with precise satellite orbit information. The entire process was carried
out in the GMTSAR open-source InSAR processing system designed for users familiar with the Generic Mapping Tools (GMT). The two SAR images (master and slave) are co-registered to align with subpixel accuracy. The co-registered images are resampled to the same size using the DEM. The Shuttle Radar Topography Mission (SRTM) DEM, which covers most areas of the earth's surface between –60° and +60° latitude, is used. The generated interferogram contains the coherence and phase information (wrapped in the form of fringes) of the co-registered image pair. The coherence γ, often known as the correlation of the two signals S1 and S2 observed in interferometric mode [6], is given by Eq. (3), where * indicates the complex conjugate.

γ = |(S1 × S2*)| / √(|(S1 × S1*)| × |(S2 × S2*)|)    (3)
The coherence value γ varies from 0 (the interferometric phase is just noise) to 1 (complete absence of phase noise). The interferogram as generated from the SLCs has an almost linear phase trend across the image as a function of the slant range and baseline. The operation called interferogram flattening (topographic phase removal) generates a phase map proportional to the relative terrain altitude. As the topographic effects are removed, the resulting DInSAR scenes now only contain the differential phase information related to the movement between the acquisitions of the image pair [7, 8]. This differential phase is filtered using adaptive filtering algorithms that significantly reduce phase noise, improving both the accuracy of measurement and phase unwrapping, with some degradation remaining in areas of pure noise [9, 10]. The filtered differential phase is then unwrapped using SNAPHU and multiplied by λ/4π to obtain the line-of-sight (LOS) displacement (λ is the SAR wavelength), which is then divided by the time period between the image acquisitions to get the LOS velocity. The complete process of InSAR and DInSAR is given in Fig. 3.
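The final conversion step described above can be sketched as follows, assuming the unwrapped differential phase and the coherence are available as NumPy arrays from the earlier processing; the 5.6 cm wavelength and 6-day interval come from Table 1, and the 0.5 coherence threshold is the one adopted in Sect. 4.

```python
import numpy as np

WAVELENGTH_CM = 5.6          # Sentinel-1 C-band wavelength (Table 1)
TEMPORAL_BASELINE_DAYS = 6   # interval between the two acquisitions

def los_velocity(unwrapped_phase, coherence, threshold=0.5):
    """Convert unwrapped differential phase (radians) to LOS velocity in cm/day."""
    displacement_cm = unwrapped_phase * WAVELENGTH_CM / (4 * np.pi)  # phase -> LOS displacement
    velocity = displacement_cm / TEMPORAL_BASELINE_DAYS             # displacement -> cm/day
    return np.where(coherence >= threshold, velocity, np.nan)       # mask decorrelated pixels

# Example: one full fringe (2*pi radians) of phase change over the 6-day interval
phase = np.array([[2 * np.pi]])
coherence = np.array([[0.7]])
print(los_velocity(phase, coherence))  # half a wavelength (2.8 cm) over 6 days ≈ 0.47 cm/day
```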
4 Results The data processing methodology discussed under Sect. 3 has been used to estimate the glacier ice surface velocity along the direction of surface flow. The processed results at various stages of SAR interferometry are shown in Fig. 4. Figure 4a represents the three full swaths of the Sentinel-1 image in RGB, whereas Fig. 4b, c represent the coherence image and filtered phase image for the three full swaths obtained during the process of interferogram generation. The wrapped phase obtained from the interferogram process has been filtered using a Goldstein filter with a window size of 3 × 3 for removal of noise. The wrapped differential phase after the noise filtering process is unwrapped using the Statistical-Cost, Network-Flow Algorithm for Phase Unwrapping (SNAPHU). Figure 4d represents the unwrapped phase image for the three full swaths. Pixels with a coherence value of 1 indicate a higher degree of correlation between the two acquisitions. Based on the literature, pixels
Fig. 3 Process flow
with coherence values between 0.5 and 1 are considered for phase unwrapping. Pixels with coherence less than 0.5 indicate decorrelation between the two acquisitions and are generally not considered for phase unwrapping. For phase unwrapping, a coherence threshold value of 0.5 is considered in the present study, as coherence plays a vital role in interferometry and directly influences the accuracy of the results. The 6-day temporal interval used in the present study is significant, as the glaciated regions retained a coherence between 0.5 and 0.7 compared to the non-glaciated regions. As the data acquisition period is the end of September 2016 (end of the ablation season for the hydrological year 2015–2016), it is a suitable period for glacier velocity related studies. The results in Fig. 5 show that the velocity varies from a minimum of 3.2 cm/day to a maximum of 5.0 cm/day for Bada Shigri and from a minimum of 3.6 cm/day to a maximum of 3.9 cm/day for the Samudra
Tapu glacier, respectively. Positive and negative velocities in Fig. 5 indicate ice movement with respect to the line of sight (LOS) of the satellite. The LOS component is positive when the surface movement is along the LOS direction of the satellite and negative when the surface movement is toward the satellite. As glacier movement is dynamic and variable, several parameters affect the glacier velocity. Glacier flow varies with respect to topography, season, temperature, and many other factors. The velocity results presented in this study are relevant only for the mentioned 6-day period during the ablation season, as glacier movement is less during winter, more in summer, and varies from time to time. Similar results in the two glaciers indicate the consistency of the results in two nearby glaciers. These results also match the order of velocity found in other glaciers. The ice velocities of glaciers in different basins can't be compared, as the effect of climatic forcing varies from one basin to another. As the objective of this paper is to propose a method for estimating ice surface velocity using Sentinel-1 SAR datasets, access to field data for validating the obtained DInSAR results is beyond the scope of this study. For glacier velocity estimation on an annual scale, the DInSAR method could not be used due to decorrelation and coherence loss. Applying the DInSAR process to each 6-day interval throughout the year and accumulating the velocity results for an annual estimate would be a time-consuming and tedious process.
Fig. 4 a RGB image of study area, b coherence image, c filtered phase, and d unwrapped phase
Fig. 5 Glacier surface ice velocity. a Bada Shigri and b Samudra Tapu
5 Conclusion The outcome of this study shows that glacier velocities in the Western Himalayas can be calculated with Sentinel-1 (A and B) datasets using the SAR interferometry technique. Good coherence was observed over the glacier region with a combined revisit period of 6 days. The best time for velocity-related studies is the end of the ablation period. The Himalayan region is mostly covered with clouds, making optical data unsuitable for this period of study and making SAR data a consistent choice for monitoring glacier velocity. The results suggest greater utilization of this approach for finding the ice velocity of glaciers and for comparative studies on the variability of this parameter in time and space. This study also highlights the need for launching SAR sensors with interferometric capability. Acknowledgements This research work had been supported by the Space Applications Centre—ISRO, Ahmedabad, under the project "Integrated Studies of Himalayan Cryosphere (ISHC) using space-based inputs." The authors gratefully acknowledge the support and cooperation given by Dr. Krishna Venkatesh, Director, CIIRC, Jyothy Institute of Technology, Bangalore, India.
References 1. Sivaranjani S, Priya MG, Krishnaveni D (2021) Glacier surface flow velocity of Hunza Basin, Karakoram using satellite optical data. In: Bhateja V, Peng SL, Satapathy SC, Zhang YD (eds) Conference 2021, evolution in computational intelligence AISC, vol 1176, pp 669–677. Springer, Singapore 2. Sivalingam S, Murugesan GP, Dhulipala K, Kulkarni A, Devaraj S (2021) Temporal fluctuations of Siachen glacier velocity: a repeat pass SAR interferometry based approach. Geocarto Int, 1010–6049 3. Rupal B, Agrawal R, Rathore BP, Bahuguna IM, Rajawat AS (2018) Estimation of changes in ice thickness of Chhota Shigri glacier, Himachal Pradesh using DGPS and SRTM derived elevation profiles. J. Geomat. 12(2):117–121 4. Dobhal DP, Kumar S (1996) Inventory of glacier basins in Himachal Himalaya. J Geol Soc. India 48:671–681 5. Babu GRK, Dhar S, Kalia R, Kulkarni AV, Rathore BP (2006) Recession of Samudra Tapu glacier, Chandra basin, Himachal Pradesh. Photonirvachak 34(1):39–46 6. Bamler R, Hartl P (1998) Synthetic aperture radar interferometry. Inverse Prob 14:R1–R54 7. Hagen JO, Wangensteen B, Weydahl DJ (2005) Mapping glacier velocities on Svalbard using ERS tandem DInSAR data. Norsk Geografisk Tidsskrift. Norwegian J. Geograp. 59:276–285 8. Rao KS, Rao YS, Venkataraman G (2004) SAR interferometry for DEM generation and movement of Indian glaciers. In: IEEE International Geoscience and Remote Sensing Symposium, Alaska, 1128–1131 9. Goldstein RM, Werner CL (1998) Radar interferogram filtering for geophysical applications. Geophys Res Lett 25(21):4035–4038 10. Kouraev A, Sharov A, Strozzi T, Wegmüller U, Werner C, Wiesmann A (2008) Estimation of Arctic glacier motion with satellite L-band SAR data. Remote Sens Environ 112(3):636–645
A Study on Various Optimization Techniques for Understanding the Challenges, Issues, and Opportunities of Hybrid Renewable Energy Built Microgrid Systems K. Venkatasubramani and R. Ramya Abstract This study mainly focusses on the various optimization problems which occur in hybrid microgrid (HMG) systems. As the utilization of renewable energy sources (RES) has risen in recent decades, the microgrid is the only way to use RES effectively. This study deals with various optimization techniques that handle the multi-constrained and multi-objective problems arising because of multiple distributed generators (DGs). The effective performance of the whole microgrid system remains an open question for recent researchers. Hence, with multiple objectives and constraints involved in the architecture of critically coordinated different DGs on a common AC bus system, this paper presents a critical literature review of various optimization techniques. The study explains the concept of sustainable energy and the benefits of the different optimization algorithms. The study also concludes with the future aspects of optimization techniques involved in RES-built HMG. Keywords Constraints · Distributed generator · Hybrid microgrid · Optimization · Renewable energy
1 Introduction In order to achieve sustainable environment, to reduce harmful CO2 emission, and to reduce pollution, the use of renewable energy resources (RES) is considered as the alternation for replacing conventional fossil fuel-based energy sources [1]. The development in the field of RES is critically effective only based on the implementation in microgrids. Thus, the concept hybrid renewable energy built microgrid systems (HREBMS) came into the field of energy systems [2]. The microgrid is a system of connection of various Distributed Energy Resources (DERs) and interconnected K. Venkatasubramani (B) · R. Ramya Department of Electrical and Electronics Engineering, SRM Institute of Science and Technology, Chennai, India e-mail: [email protected] R. Ramya e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 N. Subhashini et al. (eds.), Futuristic Communication and Network Technologies, Lecture Notes in Electrical Engineering 966, https://doi.org/10.1007/978-981-19-8338-2_6
Fig. 1 Components of HREBMS
loads, with definite electrical network limits that acts as a single controllable unit with respect to the grid [3]. Microgrid has two modes of operation, if it is connected to the service utility it is called as grid connected mode, if it is disconnected from the service utility it is called as islanded mode. Thus, the age-old power systems are renovated to improve resiliency and reliability [2]. The concept of 3Ds also involves in the attraction of using HREBMS. 3Ds means decentralize, decarbonize, and democratize. This leads to concentrate on electricity cost and to provide reliable electricity to the area where electrical infrastructure is lacking. Thus, the limit has increased by considering the environmental and economic concerns from the above advantages. Some of the concerns are restructuring, boom in DER technologies, some economical risks like massive generators, transmission lines, research gradually shifted from massive to smaller decentralized HREBMS.
1.1 Components of HREBMS See Fig. 1.
1.2 Structure of Microgrid In this study, the main focus is on the economic operation of the microgrid for various applications using optimization techniques. From Fig. 2, it is
Fig. 2 Structure of microgrid
clearly noticed that the economic considerations were studied based on distributed generators, power electronic converters, and energy storage systems. The microgrid system is connected to the utility grid through a static transfer switch (STS) at the point of common coupling (PCC) [1]. An accurate economic model is required to describe the operating cost and the management of the working model. The microgrid is operated in grid-connected mode as well as in islanded mode, based on the application [4]. In order to reduce the operating cost to the minimum, optimization techniques are needed.
2 Challenges • There is undesirable power flow because of the complexity caused by distributed generators. • Issues such as harmonics, frequency variations and voltage variations exist in AC microgrids. • DC microgrids face issues such as bus faults, circulating currents and inrush currents. • Low inertia is a major issue, which results in frequency deviation at a high rate; this is because of the lack of proper control measures during the islanded mode of operation.
Fig. 3 Challenges in microgrid
• There is a shift from grid-connected operation to islanded operation because of faulty operation in the microgrid. • Because of multiple DGs, proper coordination control is needed. • As it is difficult to predict the continuous availability of the RES, the storage system should be highly efficient, because continuous power has to be supplied from the RES. • The economical operation of the microgrid is mainly based on achieving constant frequency and voltage values. • A good communication network for speedy operation is also one of the main challenges. • Regulatory bodies need to have sufficient understanding of microgrid policies and to allow infrastructure development (Fig. 3).
3 Optimization Techniques The optimization techniques adopted here are studied through simulation programs to obtain the optimal cost for the PV array, wind turbine, micro turbine, battery bank, power electronic converter devices, and diesel generator [4]. The different optimization techniques are studied in the following sections.
3.1 Graphical Method In [4], the authors reviewed the graphical method as a way of solving a problem of two variables by studying how one varies with the other. By considering the factor associated with solar irradiation, the optimal sizing was obtained; it was done using the lowest solar irradiation on a particular day. The authors showed an optimal cost and size for the hybrid microgrid system.
3.2 Probabilistic Technique Randomness, dependent on the collected data, is used in this approach [4]. Hence, a statistical tool is used, because the state variable is not known for separate values. The authors note that two benefits are obtained from this technique: one is reducing the cost of the system, and the second is minimizing the data associated with the load.
3.3 Deterministic Method The authors estimated the cost and the area required for the PV array by using this method [4]. Each set of state variables is individually evaluated by a parameter and substituted with the value of the earlier state. Hence, a continuous unique optimal solution is obtained.
3.4 Iteration Method This is a computational method; the authors of [4] proposed a scheme to arrive at a solution which was already predicted. For the optimization problem, the solution is arrived at by choosing a set of approximate values.
3.5 Genetic Algorithm (GA) This is a heuristic algorithm; this evolutionary algorithm uses selection, crossover, and mutation. The GA uses a genetic representation and a fitness function over the domain of solutions. Figure 4 gives the GA optimization method [4]. The authors used it to optimize the sizing of the PV array/wind turbine; this objective decreased the total initial costs.
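To illustrate the selection, crossover, and mutation loop in a sizing context, here is a toy sketch (not the model from [4]): it searches for the number of PV panels and wind turbines that meets an assumed average demand at minimum initial cost, with all cost and output figures invented for illustration.

```python
import random

PV_COST, WT_COST = 150.0, 3000.0   # assumed cost per PV panel / wind turbine
PV_KW, WT_KW = 0.3, 5.0            # assumed average output per unit (kW)
DEMAND_KW = 50.0                   # assumed average demand to be met

def fitness(individual):
    pv, wt = individual
    power = pv * PV_KW + wt * WT_KW
    cost = pv * PV_COST + wt * WT_COST
    return cost + (1e6 if power < DEMAND_KW else 0.0)   # penalize infeasible sizing

def genetic_algorithm(pop_size=30, generations=200, mutation_rate=0.2):
    population = [[random.randint(0, 300), random.randint(0, 20)] for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness)
        parents = population[: pop_size // 2]                 # selection (keep the fitter half)
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = random.sample(parents, 2)
            child = [a[0], b[1]]                              # one-point crossover
            if random.random() < mutation_rate:               # mutation
                child[random.randrange(2)] += random.randint(-5, 5)
            children.append([max(0, gene) for gene in child])
        population = parents + children
    best = min(population, key=fitness)
    return best, fitness(best)

best, cost = genetic_algorithm()
print("Best (PV panels, wind turbines):", best, "initial cost:", cost)
```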
Fig. 4 GA flowchart
3.6 Fuzzy Logic This method can map inexact situations, so it is a widely used method. Fuzzy regression, fuzzy grey prediction, and fuzzy clustering play a role in handling the fuzziness while ranking the variables. The authors used it to grade the significance of the electrical utility [4].
3.7 Artificial Neural Network (ANN) This technique is based on the analysis of neurons, which form an organized group in artificial intelligence. The authors of [4] used the ANN method for approaching the control measures related to prevention in the microgrid.
3.8 Artificial Bee Colony (ABC) Algorithm The authors of [5] discussed the ABC algorithm in two different stages based on the objective functions. The first stage provides the minimum fuel cost. Using this minimum cost function, the second stage obtains the minimum operating and maintenance cost. The authors [5] verified the results under various load conditions. This method attains the best set of distributed generators for the MG under various load demands at minimum cost. Figure 5 shows the flowchart of the ABC technique proposed by the authors [5].
3.9 Particle Swarm Optimization (PSO) The authors of [6] used the PSO technique as a simple approach for optimizing the power generated from multiple distributed generators in a microgrid system. The authors [6], with some justification, demonstrated its high intensity and sensitivity in solving the optimization problem. Figure 6 shows the flowchart of PSO.
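The following toy sketch illustrates the PSO loop on an assumed economic dispatch problem: sharing a fixed demand among three distributed generators with invented quadratic fuel-cost coefficients and limits. It is not the model of [6], only an illustration of the technique.

```python
import numpy as np

A = np.array([0.010, 0.015, 0.020])    # assumed quadratic fuel-cost coefficients
B = np.array([2.0, 1.8, 2.2])          # assumed linear fuel-cost coefficients
DEMAND = 100.0                         # assumed demand (kW)
P_MAX = np.array([60.0, 50.0, 40.0])   # assumed generator limits (kW)

def cost(p):
    penalty = 1e4 * abs(p.sum() - DEMAND)              # power-balance constraint as a penalty
    return float(np.sum(A * p ** 2 + B * p) + penalty)

rng = np.random.default_rng(0)
n_particles, n_iterations = 40, 200
x = rng.uniform(0.0, P_MAX, size=(n_particles, 3))      # particle positions (dispatch guesses)
v = np.zeros_like(x)
pbest, pbest_val = x.copy(), np.array([cost(p) for p in x])
gbest = pbest[pbest_val.argmin()].copy()

for _ in range(n_iterations):
    r1, r2 = rng.random(x.shape), rng.random(x.shape)
    v = 0.7 * v + 1.5 * r1 * (pbest - x) + 1.5 * r2 * (gbest - x)   # velocity update
    x = np.clip(x + v, 0.0, P_MAX)                                   # respect generator limits
    values = np.array([cost(p) for p in x])
    improved = values < pbest_val
    pbest[improved], pbest_val[improved] = x[improved], values[improved]
    gbest = pbest[pbest_val.argmin()].copy()

print("Best dispatch (kW):", gbest.round(2), "total cost:", round(cost(gbest), 2))
```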
3.10 Spotted Hyena (SH) Optimization This is an intelligent bio-inspired optimization algorithm. The hyena is an animal that lives in dry environments; it is clever and has multiple senses. It was observed that the animal's behaviour can be separated into three categories: Category I, resource accessibility and competition between creatures; Category II, individual behaviour; Category III, social relationships of the species. Figure 7 gives the flowchart for the SH optimization technique. With the constraint of voltage variation, this technique helps to optimize the different sets of parameters of the controllers. In this bio-inspired algorithm, the constraints in the form of objective functions are reactive power limits, real power limits, power balance and power loss, etc. The authors of [7], with the proposed method, reduced the cost and restored secure limits. The constraints are the availability of the RES and the demand from the load side. This is a meta-heuristic algorithm.
Fig. 5 ABC flowchart
3.11 Elephant Herd (EH) Optimization Elephants are social animals with a structure organized around females and calves. A female with her calves forms a clan. Female elephants live as a family, but male elephants prefer to live alone; after a certain age, male elephants leave the family. Though the male elephants live far away from the clan, they keep some contact with it. The authors of [7] modelled the herding characteristics of the elephants using two operators. The family of elephants is always made up of a number of clans, and each clan has a certain number of elephants. In every generation, some male elephants leave their family and live alone. In each clan, all elephants live
Fig. 6 PSO flowchart
under the leadership of a matriarch [7]. The steps of elephant herding are explained in Fig. 8. From the above discussion, SH optimization is used to optimize the sets of parameters of the controllers. With the help of data from SH, the EH optimization technique is used for the signal controllers (Table 1).
Fig. 7 Spotted Hyena flowchart
4 Conclusion and Future Scope In this literature study, various results of optimization techniques are compared, and the achievement of cost-economic operation of HREBMS is presented. The battery bank is found to be the most reliable energy storage device and acts as the best backup power supply for RES. Hence, obtaining the
Fig. 8 Elephant Herd flowchart
best architecture of HREBMS is the significant factor in reducing cost. Thus, the optimization solution achieves a reliable power supply. The costs of the various elements are the objectives in these optimization problems. To obtain accurate results, many objective functions and constraints are to be considered. This survey presents some of the mathematical and bio-inspired optimization techniques. There is large future scope in economical cost optimization using the latest optimization techniques, which include a maximum of objective functions and constraints and the design of a greater number of DGs in the proposed HREBMS.
Table 1 Summary of different optimization techniques

S. No.   Different optimization techniques   Description
1        Graphical method [4]                Achieved cost optimization and also optimal sizing of HREBMS
2        Probabilistic technique [4]         Achieved minimal cost of the system and load's data collection
3        Deterministic method [4]            Obtained unique optimal cost minimization
4        Iteration method [4]                Optimized PV, wind turbine and battery energy storage costs
5        GA [4]                              Achievement of optimization in islanded operation of hybrid microgrid
6        Fuzzy logic [4]                     Obtained grade significance of the utility grid
7        ANN [5]                             PV and wind systems hybrid architecture was designed and achieved forecasting of loads at various schedules
8        ABC [5]                             Achieved optimum configuration of the microgrid at minimum fuel cost, minimum operation and maintenance cost
9        PSO [6]                             Minimized the levelized cost of energy (LCE)
10       Spotted Hyena (SH) [7]              Optimized the set of parameters of the controllers
11       Elephant Herd (EH) [7]              Optimization of signal controllers
References
1. Singh P, Paliwal P, Arya A, A review on challenges and techniques for secondary control of microgrid. IOP Conf Ser: Mater Sci Eng 561:012075
2. Hirscha A, Paraga Y, Guerrerob J (2018) Microgrids: a review of technologies, key drivers, and outstanding issues. Renew Sustain Energy Rev 90:402–411
3. Ton DT, Smith MA, The US department of energy's microgrid initiative
4. Dawouda SM, Lina X, Okbaa MI (2018) Hybrid renewable microgrid optimization techniques: a review. Renew Sustain Energy Rev Part 3 82:2039–2052
5. Roy K, Mandal KK (2014) Hybrid optimization for modelling and management of micro grid connected system. Front Energy 8(3):305–314
6. Amer M, Namaane A, Sirdi NKM (2013) Optimization of Hybrid Renewable Energy Systems (HRES) using PSO for cost reduction. Energy Proced 42:318–327
7. Annapandi P, Banumathi R, Pratheeba NS, Manuela AA, An efficient optimal power flow management based microgrid renewable energy system using hybrid technique. 10.1177/0, 4233, 22096687
Intrusion Detection System on New Feature Selection Techniques with BFO R. Rajeshwari and M. P. Anuradha
Abstract As a result of the rapid growth of the Internet and the increase in attacks, intrusion detection systems (IDS) have become an important element of information security. As the name implies, the purpose of an IDS is to help computer systems protect themselves from threats. When an intrusion occurs, the anomaly detection system compares the observed behaviour against a database of normal behaviour and alerts the user to deviations from it. Host-based and network-based intrusion detection systems are distinguished by their data source: a network-based IDS scans the individual packets that travel through the network, while a host-based IDS scans the processes of a single computer or host. Feature selection in IDS helps to reduce classification time, and an effective network intrusion detection system (NIDS) has been established and implemented in this paper. To achieve this goal, a proposed feature selection (PFS) method with bacterial foraging optimization (BFO), based on the information gain ratio, has been recommended and implemented. The proposed algorithm selects the optimal number of features from the NSL-KDD dataset, and a support vector machine (SVM) is used to classify the data effectively. As a result, the system is particularly successful in detecting DoS attacks and in reducing the number of false alarms. Using the proposed feature selection (PFS) with BFO and SVM classification techniques, IDS attacks can be identified. In terms of detection accuracy and time, the proposed feature selection algorithm (PFS) with BFO outperforms several existing feature selection algorithms. Keywords Network intrusion detection system (NIDS) · Proposed feature selection (PFS) · Information gain · Bacterial foraging optimization (BFO)
R. Rajeshwari · M. P. Anuradha (B) Department of Computer Science, Bishop Heber College, Affiliated to Bharathidasan University, Tiruchirappalli, Tamil Nadu, India e-mail: [email protected] R. Rajeshwari e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 N. Subhashini et al. (eds.), Futuristic Communication and Network Technologies, Lecture Notes in Electrical Engineering 966, https://doi.org/10.1007/978-981-19-8338-2_7
1 Introduction Many networks connect a large number of computers, so security has become an important issue in many fields. Network security has become increasingly important because of the rapid rise of Internet communications and the availability of tools to compromise the network. The security policies in place today are not sufficient to protect the data in databases. Other technologies, such as firewalls, encryption, and authentication systems, can provide security, but they are still vulnerable to attackers who exploit software errors. Using a combination of a novel feature selection algorithm and the BFO approach, a new intrusion detection system is built in this work to protect the system from intruders. Hidden predictive information can be extracted from large databases, such as the KDD Cup database, using data mining techniques [1]. Any type of database, including data on the Internet, can be analysed using data mining techniques, although the algorithms and techniques may vary depending on the data type. Over the past few years, the Internet has become a part of everyday life, and Internet-connected information processing systems are now exposed to various risks that can cause damage and result in significant losses [2]. As a result, security becomes more important. The most basic purpose of network security is to defend networked systems from unwanted access, exposure, interference, modification, or destruction, which reduces the risks to the key security goals of confidentiality, integrity, and availability. The proposed methodology splits the dataset into training and testing sets and performs data pre-processing, feature selection, and classification.
2 Related Work High-speed network services have always been a basic requirement of networks, so this aspect needs no further motivation. NIDS is an important tool for protecting the network, and due to the growth of the Internet there have been many research projects on network security in recent years; NIDS has been discussed in many publications. Intrusion detection systems are used to identify intrusive attacks. Adnan et al. [3] devote a separate section to the presentation of IDS datasets; three key datasets are covered in particular: KDD99, NSL-KDD, and Kyoto. The study finds that, for a neural network (NN)-based IDS model in the IoT, the three issues of concept drift, high dimensionality, and computational cost all have comparable impact and must be addressed. Einy et al. [4] selected features, trained the network, and tested the model using the KDD Cup99 dataset. By finding a balance between the number of representative features and the training error, this technique was able to
improve the performance of the IDS, in terms of evaluation metrics, compared with earlier approaches, based on the evolutionary power of MOPSO. Alazzam et al. [5] proposed a wrapper-based feature selection algorithm for IDS that uses a pigeon-inspired optimizer; a new feature selection optimization methodology that can be used in conjunction with classification methods was proposed and compared with classic feature selection approaches based on swarm intelligence algorithms. Wu and Li [6] explore specific feature selection approaches and present an ensemble of Random Forest and neural networks to enhance detection performance, including the design of a smart scheme that can select the right path in an adaptive manner. Their study examines the reliability of this strategy using KDD99 data and evaluates its practical performance using real data from a Honeynet system. The experimental results show that the technique gives a better overall result than similar approaches by identifying essential and closely related properties. Lin et al. [7] proposed a novel intrusion detection framework that explores and compares various feature selection techniques and classification methods; an automatic parameter-tuning mechanism is designed for feature selection and ensemble classification, and since manual testing is not required to find the best parameters, it improves the flexibility of both the parameters and the model. The traditional NSL-KDD dataset and the more recent CICIDS2018 dataset are used in this comparative analysis, and the accuracy and false-positive rate are verified on the test data. Nimbalkar and Kshirsagar [8] employ the Information Gain (IG) and Gain Ratio (GR) feature selection algorithms to detect DoS and DDoS attacks; features with an IG/GR score above 50 percent are retained as the optimal features. Laghrissi et al. [9] employed deep learning approaches to detect attacks using long short-term memory (LSTM); principal component analysis (PCA) and mutual information are the two ways used to select the optimal features, and on the KDD99 dataset the findings show that PCA-based models perform better in binary and multi-class classification.
3 Data Collection and Preparation 3.1 Data Collector The NSL-KDD data are collected by a data collection agent and sent to the pre-processing section before being transferred to the final processing module. The records collected from the NSL-KDD dataset are either normal or malicious. Table 1 describes the NSL-KDD dataset features with their data types and categories.
Table 1 NSL-KDD features with their data types and category (Nu = Numeric; Nom = Nominal; Bi = Binary)

Category  Sno  Name         Datatype    Category  Sno  Name           Datatype
Basic     1    Dur          Nu          Content   22   is gulog       Bi
          2    p. type      Nom         Traffic   23   Ct             Nu
          3    Ser          Nom                   24   Srvc           Nu
          4    Flag         Nom                   25   Serrorr        Nu
          5    src          Nu                    26   Srvsr          Nu
          6    dst          Nu                    27   rerrorre       Nu
          7    Lan          Bi                    28   srvrerrorrate  Nu
          8    wf           Nu                    29   samesr         Nu
          9    Ur           Nu                    30   diffsrvr       Nu
Content   10   Hot          Nu                    31   srv diffhr     Nu
          11   nfl          Nu                    32   dsthc          Nu
          12   loged        —                     33   dsthsc         Nu
          13   Num          Nu                    34   dst hss rate   Nu
          14   Rs           Nu                    35   dsthdiffsr     Nu
          15   Su           Nu                    36   dsthssprate    Nu
          16   Numr         Nu                    37   dsthsdhr       Nu
          17   Numfc        Nu                    38   dst hsr        Nu
          18   Numsh        Nu                    39   dsthssr        Nu
          19   numaf        Nu                    40   dst hrr        Nu
          20   numoutcmds   Nu                    41   dsthossrr      Nu
          21   is log       Bi                    42   Class          —
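A minimal sketch of how the records described by the data collector could be loaded and made numeric before feature selection is shown below. The file name, the placeholder column names, and the presence of an extra difficulty column in some NSL-KDD distributions are assumptions for illustration; they are not specified by the paper.

```python
# Hedged sketch: loading NSL-KDD records for pre-processing.
# File name and column names are illustrative assumptions.
import pandas as pd

FEATURE_NAMES = [f"f{i}" for i in range(1, 42)] + ["class"]  # placeholder names

def load_nsl_kdd(path="KDDTrain+.txt"):
    df = pd.read_csv(path, header=None, names=FEATURE_NAMES, usecols=range(42))
    # Nominal attributes (protocol type, service, flag) are label-encoded so
    # that every column is numeric before feature selection.
    for col in df.select_dtypes(include="object").columns:
        if col != "class":
            df[col] = df[col].astype("category").cat.codes
    return df

records = load_nsl_kdd()
print(records.shape)  # expected (n_records, 42)
```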
4 Methodology The main aims of this feature selection optimization approach are to reduce the number of features, guided by the class label, and to reduce attack detection errors in the network, which increases the accuracy of predicting the test samples. A solution of the proposed feature selection problem is therefore a set of dominant solutions, in which each solution is a vector of two components: the number of features and the classification error rate. The feature selection problem is treated as a minimization problem that reduces the number of irrelevant features while lowering the classification error rate. The flow of the proposed method is shown in Fig. 1.
Fig. 1 Proposed block diagram
5 Feature Selection 5.1 Bacterial Foraging Optimization Algorithm (BFO) By simulating the foraging behaviour of E. coli in the human colon, Passino introduced the bacterial foraging optimization (BFO) algorithm in 2002 [10]. In general, the technique consists of four bacterial foraging mechanisms.
i. Chemotaxis: This step replicates the movement of an E. coli cell, which swims and tumbles using its flagella. Biologically, an E. coli bacterium can move in two different ways: it can swim in one direction for a long period or it can tumble, and it switches between these two modes of operation throughout its life. Let θ^i(j, k, l) denote the ith bacterium at the jth chemotactic, kth reproductive, and lth elimination-dispersal step, and let C(i) denote the step size taken in the random direction of the tumble (run-length unit). The movement of the bacterium can then be described by the computational chemotaxis

$$\theta^{i}(j+1,k,l)=\theta^{i}(j,k,l)+C(i)\,\frac{\Delta(i)}{\sqrt{\Delta^{T}(i)\,\Delta(i)}},$$
where Δ(i) denotes a random-direction vector with elements in the range [−1, 1].
ii. Swarming: Some motile bacteria species, including E. coli and S. typhimurium, exhibit an intricate and stable spatio-temporal group behaviour (swarming). A group of E. coli cells arranges itself in a travelling ring, moving up the nutrient gradient, when placed in the centre of a semi-solid matrix with a single nutrient chemo-effector. When stimulated by a high level of succinate, the cells release an attractant (aspartate), which helps them aggregate and move as concentric patterns of swarms with high bacterial density. The cell-to-cell signalling in an E. coli swarm can be represented by the following function:

$$J_{cc}(\theta,P(j,k,l))=\sum_{i=1}^{S}J_{cc}\big(\theta,\theta^{i}(j,k,l)\big)=\sum_{i=1}^{S}\Big[-d_{\mathrm{attractant}}\exp\Big(-w_{\mathrm{attractant}}\sum_{m=1}^{p}\big(\theta_{m}-\theta_{m}^{i}\big)^{2}\Big)\Big]+\sum_{i=1}^{S}\Big[h_{\mathrm{repellant}}\exp\Big(-w_{\mathrm{repellant}}\sum_{m=1}^{p}\big(\theta_{m}-\theta_{m}^{i}\big)^{2}\Big)\Big]$$
where J_cc(θ, P(j, k, l)) is the value added to the objective function, θ = [θ1, θ2, …, θp]^T is a point in the p-dimensional search domain, and d_attractant, w_attractant, h_repellant, and w_repellant are different coefficients that should be chosen properly [9, 11].
iii. Reproduction: The less healthy bacteria (those with the worse accumulated objective value) eventually die, while each of the healthier bacteria splits into two bacteria located at the same place. The size of the swarm is thus kept constant.
iv. Elimination and Dispersal: The bacteria living in a region of high nutrient concentration can be destroyed gradually or suddenly by changes in the local environment where the population lives, for instance when a significant local rise in temperature occurs. To model this event, BFOA eliminates some bacteria at random with a very small probability, while the new replacements are randomly initialized over the search space.
The bacterial foraging processing steps are as follows:
  FOR each elimination-dispersal step
    FOR each reproduction step
      FOR each chemotaxis step
        Swim and tumble (choose a direction)
        Calculate the fitness function
      END chemotaxis
      Bacteria with the worst accumulated fitness die off
    END reproduction
    Bacteria are regenerated (dispersed) with a small probability
  END elimination-dispersal steps
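A minimal sketch of this loop structure is given below. The population size, step counts, step length C, dispersal probability, and the toy fitness function are illustrative assumptions, not the authors' settings, and the reproduction rule assumes an even population size.

```python
# Hedged sketch of the BFO loop (chemotaxis, reproduction, elimination-dispersal).
import numpy as np

def bfo(fitness, dim, S=20, n_ed=2, n_re=4, n_ch=10, C=0.1, p_ed=0.25, seed=0):
    rng = np.random.default_rng(seed)
    theta = rng.uniform(-1.0, 1.0, size=(S, dim))        # bacteria positions
    for _ in range(n_ed):                                 # elimination-dispersal
        for _ in range(n_re):                             # reproduction
            health = np.zeros(S)
            for _ in range(n_ch):                         # chemotaxis
                delta = rng.uniform(-1.0, 1.0, size=(S, dim))
                step = delta / np.linalg.norm(delta, axis=1, keepdims=True)
                theta = theta + C * step                  # tumble/swim move
                cost = np.array([fitness(b) for b in theta])
                health += cost                            # accumulated cost
            order = np.argsort(health)                    # healthiest first
            theta = np.concatenate([theta[order[: S // 2]]] * 2)  # split best half
        mask = rng.random(S) < p_ed                       # disperse some bacteria
        theta[mask] = rng.uniform(-1.0, 1.0, size=(mask.sum(), dim))
    cost = np.array([fitness(b) for b in theta])
    return theta[np.argmin(cost)]

best = bfo(lambda x: float(np.sum(x ** 2)), dim=5)
```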
5.2 Proposed Algorithm for Proposed Feature Selection (PFS) Pre-processing techniques are required because it is difficult to process large quantities of network traffic data with all of the features needed to detect and prevent intrusions. This step is important to reduce the number of features required to build the IDS. The algorithm was developed using the information gain ratio (IGR) for attribute selection. The dataset DS is subdivided into n classes CS_i; the agent chooses the attributes F_i with the most non-zero values, and the IGR is calculated from

$$\mathrm{Info}(DS)=-\sum_{j=1}^{x}\frac{\mathrm{Freq}(CS_{j},DS)}{|DS|}\log_{2}\frac{\mathrm{Freq}(CS_{j},DS)}{|DS|} \qquad (1)$$

$$\mathrm{Info}(FN)=\sum_{i=1}^{y}\frac{|FN_{i}|}{|FN|}\times \mathrm{Info}(FN_{i}) \qquad (2)$$

$$IGR(R_{i})=\frac{\mathrm{Info}(DS)-\mathrm{Info}(FN)}{\mathrm{Info}(DS)+\mathrm{Info}(FN)} \qquad (3)$$
To protect successfully against attacks and to save computation time, the PFS algorithm selects ten important features. Calculating IGR values with standard feature selection techniques takes a long time; therefore, in this work a novel feature selection process, named the proposed feature selection (PFS) process, is proposed and implemented, which reduces the computing time. The process determines the information gain ratio (IGR) for the dataset's attributes and reduces the number of columns based on the IGR value. PFS improves detection accuracy by reducing false alarm rates. The simulated attacks are divided into four groups: User to Root (U2R), Denial of Service (DoS), Remote to Local (R2L), and Probe attacks. The novel feature selection algorithm is as follows.
Algorithm: Optimal Attribute Selection using an Intelligent Agent
Input: all attributes (41 attributes in the dataset)
Output: optimal set of attributes (R)
Step 1: Choose the attributes that have a wide range of values.
Step 2: Compute the Info(D) values for the chosen attributes using Eq. (1).
Step 3: Choose the attributes that have the most non-zero values.
Step 4: Compute the Info(D) values for the attributes chosen in Step 3 using Eq. (2).
Step 5: Calculate the IGR value using Eq. (3).
Step 6: Select the attributes based on the IGR value.
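A minimal sketch of the IGR ranking in Eqs. (1)–(3) and of the top-k selection is given below. The column names, the label column "class", and the threshold k = 10 are assumptions for illustration.

```python
# Hedged sketch: information-gain-ratio ranking as in Eqs. (1)-(3).
import numpy as np
import pandas as pd

def entropy(series):
    p = series.value_counts(normalize=True).to_numpy()
    return float(-(p * np.log2(p)).sum())

def gain_ratio(df, feature, label="class"):
    info_ds = entropy(df[label])                                 # Info(DS), Eq. (1)
    groups = df.groupby(feature)[label]
    weights = groups.size() / len(df)
    info_fn = float((weights * groups.apply(entropy)).sum())     # Info(FN), Eq. (2)
    denom = info_ds + info_fn
    return (info_ds - info_fn) / denom if denom else 0.0         # IGR, Eq. (3)

def select_top_features(df, label="class", k=10):
    scores = {c: gain_ratio(df, c, label) for c in df.columns if c != label}
    return sorted(scores, key=scores.get, reverse=True)[:k]
```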
6 Classification Deep learning is a subset of machine learning, which in turn is a subset of artificial intelligence. Artificial intelligence is a technique that enables a machine to mimic human behaviour, while machine learning algorithms learn decisions from patterns in the data and yield better results. A widely used machine learning algorithm is the support vector machine (SVM), which can handle a wide range of classification problems [12] and has great potential for detecting anomalous features in the network.
6.1 SVM Classifier The support vector machine (SVM) is the most popular and widely used classifier for intrusion detection systems. SVMs can be divided into binary classifiers (linear and nonlinear) and multi-class classifiers. In this paper, a meta-heuristic-assisted SVM-based IDS is used for classification. Using the minimum-risk rule, the SVM generates a hyperplane to distinguish between positive (normal) and negative (anomaly) instances, and it can incorporate various kernel types, including the sigmoid, radial basis function (RBF), linear splines, and Bessel function kernels. In the proposed work, the sigmoid kernel is used:

k(x, x_i) = tanh(α x^T x_i + β)    (4)

where α is the scaling parameter and β is the shifting parameter.
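A minimal sketch of training an SVM with the sigmoid kernel of Eq. (4) using scikit-learn follows. In scikit-learn, gamma plays the role of the scaling parameter α and coef0 the shift β; the split ratio and parameter values are illustrative assumptions, not the paper's settings.

```python
# Hedged sketch: sigmoid-kernel SVM classifier, as in Eq. (4).
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

def train_sigmoid_svm(X, y, alpha=0.1, beta=0.0):
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=42)
    scaler = StandardScaler().fit(X_tr)
    clf = SVC(kernel="sigmoid", gamma=alpha, coef0=beta)   # alpha, beta of Eq. (4)
    clf.fit(scaler.transform(X_tr), y_tr)
    preds = clf.predict(scaler.transform(X_te))
    return clf, accuracy_score(y_te, preds)
```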
7 Results and Discussion Experiments were performed to examine the behaviour of the proposed feature selection (PFS) algorithm with BFO classification using Python, on Windows 10 with an Intel Core i5-8400 (8M Cache, 2.80 GHz) and 32 GB of RAM. Since the Python programming language has shown its advantages of
being easy to use and fast to develop, many researchers and learners use Python widely in machine learning, so this project used Python to implement the machine learning model. Python also provides high-quality data science libraries that reduce code length and offer powerful functions. Table 2 summarises the performance and time analysis for the various types of attacks: it reports the detection accuracy and computation time obtained by applying the known and proposed feature selection algorithms to the NSL-KDD dataset. The classification of the different attack types begins with SVM classification. Table 2 presents the results in terms of accuracy and the time taken to classify the attacks using the SVM classifier without feature selection, for 6000 records. The classification accuracy of the SVM classifier without feature selection is shown in Fig. 2, and the corresponding classification time is represented in Fig. 3. Table 3 lists 1581 records classified as DoS attacks by the SVM classifier; compared with classification using all forty-one attributes of the dataset, classifying the DoS attack in PFS with BFO using the attributes selected by PFS takes less time.

Table 2 Performance analysis for SVM classification with all features

Attacks  Detection accuracy for all features (%)  Time in seconds
DoS      98.0                                     210
Probe    89.1                                     211
R2L      11.56                                    217
U2R      40.0                                     219
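A minimal sketch of how the per-attack accuracy and timing figures of Table 2 could be obtained follows. The attack label names, the presence of a "normal" class, and the trained classifier object are assumptions for illustration.

```python
# Hedged sketch: per-attack detection accuracy and classification time.
import time
from sklearn.metrics import accuracy_score

def evaluate_per_attack(clf, X_test, y_test, attacks=("DoS", "Probe", "R2L", "U2R")):
    results = {}
    for attack in attacks:
        mask = (y_test == attack) | (y_test == "normal")   # one attack vs normal
        start = time.perf_counter()
        preds = clf.predict(X_test[mask])
        elapsed = time.perf_counter() - start
        results[attack] = (accuracy_score(y_test[mask], preds), elapsed)
    return results
```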
Fig. 2 Classification accuracy of SVM classification without feature selection
Fig. 3 Classification time of SVM classification without feature selection
Table 3 Time examination for DoS attack in PFS through SVM

Data slice  Time taken in seconds
            Selected features using PFS (10)  Total features (41)
1           2.21                              2.32
2           2.35                              2.12
3           2.01                              2.13
4           2.04                              2.03
5           2.02                              2.16
Avg.        2.11                              2.13
The computation time for identifying DoS attacks in PFS with BFO is graphically represented in Fig. 4. Table 4 shows the calculation time required to classify the records obtained via PFS with BFO as probe attacks. Compared with classification using all forty-one attributes of the dataset, classifying the probe attack using the features selected by PFS takes less time; the corresponding classification time is graphically represented in Fig. 5. Table 5 shows the time required to classify the records obtained through PFS with BFO as U2R attacks. Classifying the U2R attack using the features selected by PFS takes less time than classifying it with all forty-one attributes of the dataset. The time analysis for 1745 records is shown in Table 5.
Fig. 4 Calculation time for classification of DoS attack

Table 4 Time analysis for probe attack in PFS with BFO

Data slice  Time taken in seconds
            Selected features using PFS (10)  Total features (41)
1           4.42                              7.23
2           4.09                              7.11
3           4.02                              6.92
4           3.76                              6.75
5           3.52                              6.52
Avg.        3.95                              6.91
Fig. 5 Calculation time for classification of probe attack
Table 5 Time examination for U2R attack in PFS through BFO

Data slice  Time taken in seconds
            Selected features using PFS (10)  Total features (41)
1           3.39                              6.59
2           3.30                              6.23
3           3.18                              6.05
4           3.25                              6.02
5           3.18                              6.03
Avg.        3.27                              6.14
The computation time required to classify the U2R attack is graphically represented in Fig. 6. Table 6 shows the time taken to classify the 1745 records obtained from PFS with BFO as R2L attacks. Compared with classification using all forty-one attributes of the dataset, classifying the R2L attack using the features selected by PFS takes less time.
Fig. 6 Calculation time for classification of U2R attack
Table 6 Time examination for R2L attack in PFS through BFO

Data slice  Time taken in seconds
            Selected features using PFS (10)  Total features (41)
1           3.53                              6.59
2           3.41                              6.41
3           3.37                              6.39
4           3.31                              6.20
5           3.39                              6.01
Avg.        3.41                              6.33
The computation time for classifying the R2L attack in PFS with BFO is graphically represented in Fig. 7, and Fig. 8 shows the accuracy analysis for the attacks in PFS with BFO. Table 7 reports the detection accuracy of PFS with BFO for various numbers of records. As the number of records grows, the accuracy of detection with the features selected by PFS progressively improves and approaches the accuracy obtained with all features.
Fig. 7 Calculation time for classification of R2L attack

Fig. 8 Accuracy analysis for attacks in PFS with BFO
Table 7 Accuracy analyses for attacks in PFS with BFO

Exp. no.  No. of records  DoS    Probe  U2R    R2L
1         6000            99.23  93.63  92.21  92.23
2         10,000          99.34  94.43  93.65  93.53
3         14,000          99.36  95.63  94.91  94.76
4         18,000          99.36  95.87  95.45  95.65
5         22,000          99.45  96.01  95.86  95.59
6         26,000          99.48  96.23  96.02  96.01
8 Conclusion The implementation and results of the proposed system are examined in this work. The results show that the proposed hybrid approach (PFS-BFO) achieves higher classification accuracy than the existing approach on the complete NSL-KDD dataset. Using the proposed feature selection (PFS) algorithm together with BFO and SVM classification, a novel IDS has been designed and applied to secure the system. On the NSL-KDD dataset, identifying and classifying records using all forty-one features takes a significant amount of time; to reduce detection and classification time, the PFS algorithm picks out only the most significant features. Further, BFO and PFS with BFO contribute to a higher level of precision. In addition to reducing false-positive rates, the proposed IDS reduces computation time.
References
1. Moustafa N, Hu J, Slay J (2019) A holistic review of network anomaly detection systems: a comprehensive survey. J Netw Comput Appl 128:33–55
2. Zhang Y, Zhang Y, Zhang N, Xiao M (2020) A network intrusion detection method based on deep learning with higher accuracy. Procedia Comp Sci 174:50–54
3. Adnan A, Muhammed A, Ghani AAA, Abdullah A, Hakim F (2021) An intrusion detection system for the internet of things based on machine learning: review and challenges. Symmetry 13(6):1011
4. Einy S, Oz C, Navaei YD (2021) Network intrusion detection system based on the combination of multiobjective particle swarm algorithm-based feature selection and fast-learning network. Wireless Comm Mobile Comp
5. Alazzam H, Sharieh A, Sabri KE (2020) A feature selection algorithm for intrusion detection system based on pigeon inspired optimizer. Expert Syst Appl 148:113249
6. Wu C, Li W (2021) Enhancing intrusion detection with feature selection and neural network. Int J Intell Syst 36(7):3087–3105
7. Lin C, Li A, Jiang R (2021) Automatic feature selection and ensemble classifier for intrusion detection. J Phys: Conference Series 1856(1):012067
8. Nimbalkar P, Kshirsagar D (2021) Feature selection for intrusion detection system in Internet-of-Things (IoT). ICT Express 7(2):177–181
9. Laghrissi FE, Douzi S, Douzi K, Hssina B (2021) Intrusion detection systems using long short-term memory (LSTM). J Big Data 8(1):1–16
10. Sugianela Y, Ahmad T (2020) Pearson correlation attribute evaluation-based feature selection for intrusion detection system. In: 2020 International Conference on Smart Technology and Applications (ICoSTA), pp 1–5. IEEE
11. Farahani G (2020) Feature selection based on cross-correlation for the intrusion detection system. Security Comm Netw
12. Chen K, Zhou F-Y, Yuan X-F (2019) Hybrid particle swarm optimization with a spiral-shaped mechanism for feature selection. Expert Syst Appl 128:140–156
Frequency and Stability Control of Photovoltaic and Wind-Powered Grid-Connected DC Bus System M. Moovendan, R. Arul, and S. Angalaeswari
Abstract As renewable resources are increasingly used and integrated with the grid, power stability and frequency stability issues have become a greater concern in the power system. An integrated Renewable Energy Sources (RES) model is connected to the DC bus through a KY-Boost Converter (KY-BC). The DC bus is connected to resistive (R), inductive (L), and capacitive (C) type loads and tested. The Energy Storage Device (ESD) connected in parallel with the RES also supplies the DC bus during peak demand. This paper makes a comparative analysis between the performance of the conventional PI and the Fractional order PID controller. Both open-loop and closed-loop systems are designed, modelled, and simulated using MATLAB-Simulink, and their outcomes are observed. The proposed approach gives an enhanced voltage output to the DC grid. Keywords Renewable energy sources (RES) · KY-boost converter (KY-BC) · KY converter (KYC) · Energy storage device (ESD) · Resistive (R) · Inductive (L) and capacitive (C) · Microgrids (MGs) · Continuous conduction (CC) · Supercapacitor (SC) · Photovoltaic (PV) · Wind energy (WE)
1 Introduction Microgrids (MGs) are usually supplied only by RES and ESD [1, 2]. The different sources, loads, and controllers will stabilize the MGs. Usually, the control strategies used over various RES will provide MGs with smooth and stable function. The MGs connected with constant power loads will experience sudden significant transient with the change in load levels, which escalate concerning time [3, 4]. The algorithm related to controls is a well-known cost-effective solution to these problems. Usual lead-lag management may improve the stability of MGs [5]. Higher-level transients are managed using nonlinear stabilizers for electronically connected RES using the M. Moovendan · R. Arul (B) · S. Angalaeswari School of Electrical Engineering, Vellore Institute of Technology, Chennai, Tamil Nadu 600127, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 N. Subhashini et al. (eds.), Futuristic Communication and Network Technologies, Lecture Notes in Electrical Engineering 966, https://doi.org/10.1007/978-981-19-8338-2_8
approach of backstepping [6–8]. The management of MGs’ voltage, current, and frequency fluctuations using a passivity-based control method was recorded [8]. A Hamiltonian plan based on non-precise stabilizers for MG’s has been established similarly [9]. The stabilizers implemented for MGs in these RES and ESD are modeled after synchro converters [10]. Newly developed MGs are also paying close attention to how well they operate. Lara et al. [11] describes a comprehensive power control approach for isolated MGs in the vicinity of RES, ESD, and variable loads. It’s worth noting that MGs that experience significant transients over terse period necessitate quick and restraint methods. Buck converters have been fitted with a nonlinear sliding mode controller [12]. In contrast to traditional linear controls, these control strategies are more resistant to system disturbances. However, in a multisource DC-MG, the function of this controller has not been studied. Furthermore, boost converters with time-varying models were not considered. Nonlinear regulators are added to buck converters, whereas it is not possible with boost converters, and all RES, ESD in DC-MG must be fitted with the stabilizer to ensure global stability [13]. The regulation of various transients following the constant power load changes and a new version of amalgam converting system attached with stabilizers was developed, which works without centralized information [14]. The voltage-lifting technique for SEPIC and Cuk converter has yielded better voltage conversion with high efficiency and minimum ripples [15]. The sliding mode current control technique has produced better performance in reducing the ripple content [16]. The automatic loop-bandwidth control implemented has a quicker response adding to lower ripples due to precise Pulse Width Modulated (PWM) control and short hysteresis control [17]. It is proven that the above converters have only one-half plane transfer function for Continuous Conduction (CC) mode, whose transient response is excellent and hard to achieve in practice. KYC, a newly invented DC-DC non-isolated boost converter proposed by K. I. Hwu et al., operating in CC mode, is recognized for low output voltage ripple and fast load transients [18]. KY-BC is a combination of KYC and conventional Synchronous Rectified converter. Unlike other traditional DC–DC converters described earlier, this converter possesses the whole plane as nonzero in the transfer function, which produces a good load transient response and is easily controllable in practice. This characteristic leads to minimized output current and voltage ripple of the KY-BC [19]. The minimized output current and voltage ripple could be effectively done by implementing the boost converter. ESD is the primary source in compact applications, and they are variable [20]. Due to high proficiency and tiny size, the switching mode converters are widely used. A lot of new converter topologies for the DC grid to achieve the developing need in the switching mode controls are tested. A novel combination of KYC and boost converters inherently operates in CC mode and reduces the output ripple [21]. The method of mixing KYC and Buck–Boost Converters enhances voltage gain [22–25]. Therefore, the KYC is a step-up converter; it is a boost converter. It contains a pair of the power switch, a diode, with a couple of capacitors, one for energy transferring and the other for output and an inductor [26, 27]. 
Fractional order differential equations are an excellent tool for analyzing different system behaviors and are well suited
for characterizing real-world systems. These qualities made these devices available in every field of research [28]. A novel control strategy is proposed for a grid-connected wind-PV-based Hybrid Renewable Energy System with a built-in—supercapacitor-based energy storage system—with the following objectives: (i) an operational control design using a KY-BC to support the integration of RES such as Solar Energy, Wind Energy, and ESD such as Supercapacitor (SC) to the DC microgrid. (ii) An aggregated model of overhead Photovoltaic (PV) panel, Wind energy (WE), and SC with KY-BC power generation. (iii) A dangle control technique for stabilizing converters connected to DC bus with variable loads. (iv) A comparative analysis between the performance of Integer order and Fractional order control.
2 System Description Renewable Energy plays a vital role in today’s power generation and leads to distributed generation, microgrid, and smart cities. Researchers have constantly been working on the converters to obtain better voltage gain and efficiency with reduced ripple in both voltage and current. The converters are otherwise known as electrical converters that change the electrical quantities such as voltage or current. DC to DC converters are electromechanical devices that convert the DC voltage from one value to another. Renewable Energy Systems require a high gain boost converters for most of the applications. The closed-loop block diagram and circuit diagram of the Proportional Integral Controller and Fractional order PID controller are shown below (Fig. 1). This proposed system is simulated for a 90 V DC as a testing voltage, considering the charging stations where the 90 V batteries of electric vehicles get charged both
Fig. 1 Closed-loop block diagram DC microgrid system
Fig. 2 Circuit diagram of closed-loop DC microgrid system
directly, as well as using the battery swapping technology, and for supplying the loads ranging from 0 to 90 V DC. In the above block diagram, PV, WE, and SC integration is done using rectifiers and KY-BCs. The KY-BC is a new type of converter which ensures boosting in a CC mode. The power through the KY-BC is supplied to the DC grid, which connects the R, L, C loads such as Resistors, Motor, and Battery. The voltage of the DC grid has been measured and feedback. The measured voltage is fed as an error signal to the controllers, which generates the pulse signal supplied to the converters. These converters are essential in stabilizing the power to the DC grid. The SCs are being connected in parallel with RES will take the highfrequency mismatch, whereas the RES will take the low-frequency mismatches. The phenomenon of using SC will help in reducing the high power demand compensation. In the mean of quick and terse-term ESD, the SC unit is critical to the stable response of the generation units. A bi-directional KY boost conversion is used for the SC unit to provide active normalization [29] (Fig. 2).
2.1 Proportional Integral Controller A proportional controller changes the control action in proportion to the error as the measured data change, driving the system towards a stable operating point. The proportional term achieves the regulation objective, but consistent steady-state operation is only possible when an integral term is added; the integral gain is therefore used to achieve the objective. The conventional PI controller, with the transfer function below, is used for the analysis:

C(s) = K_P + K_I / s    (1)
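A minimal sketch of a discrete-time implementation of the PI law in Eq. (1), using simple forward-Euler integration, is shown below. The gains, sample time, and saturation limits are illustrative assumptions; the paper's controllers are built and simulated in MATLAB-Simulink.

```python
# Hedged sketch: discrete PI controller C(s) = Kp + Ki/s.
class PIController:
    def __init__(self, kp, ki, dt, out_min=0.0, out_max=1.0):
        self.kp, self.ki, self.dt = kp, ki, dt
        self.out_min, self.out_max = out_min, out_max
        self.integral = 0.0

    def update(self, setpoint, measurement):
        error = setpoint - measurement
        self.integral += error * self.dt                 # integral term (Ki/s)
        u = self.kp * error + self.ki * self.integral
        return min(max(u, self.out_min), self.out_max)   # duty-cycle limits

pi = PIController(kp=0.05, ki=2.0, dt=1e-4)
duty = pi.update(setpoint=90.0, measurement=61.0)
```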
2.2 Fractional Order PID Controller For some dynamic systems, traditional integer-order controllers are not suitable. Fractional-order calculus is as old as integer-order calculus, and its ability to describe natural and dynamic systems mathematically has attracted increasing attention from researchers. Classical integer-order controllers respond quickly and are simple, but for nonlinear systems fractional-order controllers are better suited. In this paper, a Fractional order PID controller is simulated and compared with the classical integer-order PI controller. The generalized equation of the Fractional order PID controller, as given by Podlubny in 1999, is

G(s) = K_P + K_I s^(−λ) + K_D s^(μ)    (2)

In the above equation, Case 1: if (λ, μ) = (1, 1), the controller reduces to a PID controller; Case 2: if (λ, μ) = (1, 0), it reduces to a PI controller; Case 3: if (λ, μ) = (0, 1), it reduces to a PD controller; and Case 4: if (λ, μ) = (0, 0), it reduces to a simple gain [30]. The Fractional order PID controller, however, provides high flexibility to adapt to real-time dynamic behaviour; it describes reality more closely, which improves the control quality and can outperform even higher-order integer models. For fractional-order systems, fractional controllers are the most appropriate (Fig. 3). Fig. 3 Block diagram of fractional order PID controller
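A minimal sketch of evaluating the FOPID law in Eq. (2) numerically is shown below, using a Grünwald–Letnikov approximation of the fractional operators. The gains, orders, step size, and history length are illustrative assumptions, and the paper's actual implementation is a MATLAB-Simulink model, not this code.

```python
# Hedged sketch: Grünwald-Letnikov approximation for a FOPID controller,
# G(s) = Kp + Ki*s^(-lambda) + Kd*s^(mu).
import numpy as np
from scipy.special import binom

def gl_fractional_derivative(signal, alpha, h):
    # D^alpha f(t_n) ~ h^(-alpha) * sum_j (-1)^j * C(alpha, j) * f(t_{n-j})
    n = len(signal)
    coeffs = np.array([(-1) ** j * binom(alpha, j) for j in range(n)])
    return float(np.dot(coeffs, signal[::-1])) / h ** alpha

def fopid_output(error_history, kp, ki, kd, lam, mu, h):
    e = np.asarray(error_history, dtype=float)
    p_term = kp * e[-1]
    i_term = ki * gl_fractional_derivative(e, -lam, h)   # order -lambda: fractional integral
    d_term = kd * gl_fractional_derivative(e, mu, h)     # order mu: fractional derivative
    return p_term + i_term + d_term
```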
3 Simulation Results 3.1 Closed-Loop PI Controller Simulation Results The below block diagram shows the closed-loop system of the DC bus connected from RES and ESD. The DC bus receives the supply from the source through the KY-BC and is controlled by a conventional integer PI controller. Since we have a closed-loop system, the set point is kept as 90 V, and so the stability analysis is done for this rated set point. The applied input is 15 V for the KY-BC, and the voltage across R-load is 90 V. The current R-load is 0.98 Amp. The resultant output power is 86 watts. The motor speed ranges 1100 rpm, and the torque is 2.35 N-m. The battery rated 88 V is connected to the DC bus as a capacitive load (Figs. 4, 5, 6, and 7).
Fig. 4 Simulink diagram of closed-loop DC microgrid with PI control
Fig. 5 Input voltage
Fig. 6 Voltage output of closed-loop DC microgrid with PI control
Fig. 7 Output power of closed-loop DC microgrid with PI control
3.2 Closed Loop Fractional Order PID Controller Simulation Results In Fig. 4, the PI controller block is replaced with Fractional order PID controller to obtain the Simulink diagram of closed-loop DC microgrid with Fractional order PID control. The simulation results for the system controlled by Fractional order PID are shown. The input voltage is 15 V for the converter, and the voltage across R-load is 90 V. The current flowing through R-load is 0.9 Amps. The output power is 84 Watts, the current and voltage ripples are 0.02 A and 1 V, respectively. The set point is 90 V. The torque and speed of the motor are 2.35 N-m and 1096 rpm, respectively. The battery rated 88 V is connected to the DC bus as a capacitive load (Figs. 8, 9, and 10). The outcome of simulation results is tabulated. The output voltage is improved from 61 to 90 V. The system’s voltage ripple with controllers such as Fractional order PID and the Integer order PI is 2 V and 6 V, respectively. These results show that the KY-BC and Fractional order PID controllers are more effective than the traditional boost converter and PI controller (Fig. 11).
Fig. 8 Voltage output of closed-loop DC microgrid with Fractional Order PID controller
Fig. 9 Output power of closed-loop DC microgrid with Fractional Order PID controller
Fig. 10 Bar chart of conventional and proposed circuit diagram
Fig. 11 Bar chart of output voltage ripple
These simulated output data are obtained from MATLAB-Simulink, as shown above. The time-domain response comparison between the PI and Fractional order PID controllers is summarised as follows: by using the Fractional order PID controller, the rise time, settling time, peak time, and steady-state error are reduced from 0.63 s to 0.61 s, 2.0 s to 0.93 s, 0.72 s to 0.67 s, and 3.21 V to 2.20 V, respectively. The bar chart of the time-domain parameters using PI and FOPID is shown in Fig. 12. Hence, the outcomes show that the FOPID control is more advantageous than the Integer order PI control.
Fig. 12 Bar chart of time-domain parameters using PI and FOPID
4 Conclusion In this paper, the damping of the voltage ripples and the reduction of the rise time, settling time, peak time, and steady-state error of a DC bus have been achieved. The conventional PI controller is compared with the Fractional order PID controller, and the traditional boost converter is compared with the KY-BC. As a result, the proposed controller and converter give an effective output response. The simulation results are shown for the PI and Fractional order PID controllers with the KY-BC.
References 1. Bevrani H, Ise T (2016) Microgrid dynamics and control. Wiley, Hoboken, NJ, USA. Author F, Author S (2016) Title of a proceedings paper. In: Editor F, Editor S (eds) Conference 2016, LNCS, vol. 9999, pp 1–13. Springer, Heidelberg 2. Hirsch A, Parag Y, Guerrero J (2018) A review of technologies, key drivers, and outstanding issues. Renew Sustain Energy Rev 90:402–411 3. Lu X, Sun K, Guerrero JM, Vasquez JC, Huang L, Wang J (2015) Stability enhancement based on virtual impedance for dc microgrids with constant power loads. IEEE Trans Smart Grid 6(6):2770–2783 4. Mohamad AMI, Mohamed YARI (2018) Analysis and mitigation of interaction dynamics in active dc distribution systems with positive feedback islanding detection schemes. IEEE Trans Power Electron 33(3):2751–2773 5. Majumder R et al (2010) Improvement of stability and load sharing in an autonomous microgrid using supplementary droop control loop. IEEE Trans Power Syst 25(2):796–808 6. Ashabani SM, Mohamed YA-RI (2012) A flexible control strategy for grid-connected and islanded microgrids with enhanced stability using non-linear microgrid stabilizer. IEEE Trans Smart Grid 3(3):1291–1301
7. Ashabani SM, Mohamed YA-RI (2014) Integrating VSCs to weak grids by non-linear power damping controller with self-synchronization capability. IEEE Trans Power Syst 29(2):805–814 8. Azimi SM, Afsharnia S (2016) Multi-purpose droop controllers, incorporating a passivitybased stabilizer for unified control of electronically interfaced distributed generators including primary source dynamics. ISA Trans 63:140–153 9. Mojallal A, Lotfifard S, Azimi SM (2020) A non-linear supplementary controller for transient response improvement of distributed generations in micro-grids. IEEE Trans Sustain Energy 11(1):489–499 10. Zhong Q, Weiss G (2011) Synchronverters: inverters that mimic synchronous generators. IEEE Trans Ind Electron 58(4):1259–1267 11. Lara JD, Olivares DE, Cañizares CA (2019) Robust energy management of isolated microgrids. IEEE Syst J 13(1):680–691 12. Ma L, Zhang Y, Yang X, Ding S, Dong L (2018) Quasi-continuous second-order sliding mode control of buck converter. IEEE Access 6:17859–17867 13. Azimi SM, Hamzeh M, Mohamed YA-RI (2019) Non-linear large-signal stabilizer design for dc micro-grids. IETGener Transmiss Distrib 13(8):1297–1304 14. Azimi SM, Hamzeh M (2020) Voltage/current large transient suppression in DC microgrids using local information and active stabilizing capability. IEEE Syst J 14(1) 15. Alenka H, Miro M (2007) Dynamic analysis of SEPIC converter. Automatika 38:137–144 16. Zhu M, Luo FL (2008) Series SEPIC implementing a voltage-lift technique for DC-DC power conversion. IET Power Elect 1(1):109–121 17. Arulselvi S, Uma G, Sampath V (2006) Development of simple fuzzy logic controller for ZVS quasi-resonant converter: design, simulation, and experimentation. J Indian Instit Sci 86:215–233 18. Hwu KI, Yau YT (2009) KY converter and its derivatives. IEEE Trans Pow Elect 24(1) 19. Hwu KI, Yau YT (2010) A KY-Boost converter. IEEE Trans Pow Elect 25(11) 20. Abdul O, Ahmed R (2004) Hardware implementation of an adaptive network-based fuzzy controller for DC-DC converters. IEEE Trans Indust Appl 41(6):1557–1565 21. Abdul O, Ahmed R (2005) Real-time implementation of a fuzzy logic controller for DC-DC switching converters. IEEE Trans Indust Appl 42(6):1367–1374 22. Abraham A, Nath B (2000) Hybrid intelligent systems: a review of a decade of research. School of Computing and Information Technology, Faculty of Information Technology, Monash University, Australia, Technical Report Series, vol 5, pp 1–55 23. Agostinelli M, Priewasser R, Marsili S, Huemer M (2011) Fixed frequency pseudo sliding mode control for a Buck-Boost DC-DC converter in mobile applications: a comparison with linear PID controller. IEEE Int Symposium Circuit Syst, ISCAS 2011:1604–1607 24. Anil K, Chandra D, Seshachalam D, Tirupathi RK (2006) New fuzzy logic controller for a buck converter. In: Proceedings of IEEE International Conference on Power Electronics, Drives and Systems, No. 5, pp 1–3 25. Berenji HR, Khedkar P (1992) Learning and tuning fuzzy logic controllers through reinforcements. IEEE Trans Neural Networks 3:724–740 26. Bourdoulis MK, Spanomichos JG, Alexandridis AT (2011) Model analysis and intelligentANFIS control design for PWM-regulated ac/dc converters. In: 16th International Conference on Intelligent System Application to Power Systems, ISAP 2011, pp 1–6 27. Cetin E, Omer D, Huseyin S, Hasan (2009) Adaptive fuzzy logic controller for DC-DC converters. Int. J Expert Syst Applicant 36(2):1540–1548 28. 
Gutierrez RE, Rosario JM, Machado JT, Fractional order calculus: basic concepts and engineering applications. Hindawi Publishing Corporation, Mathematical Problems in Engineering, 2010, Article ID 375858, 19 p 29. Hamzeh M, Ghafouri M, Karimi H, Sheshyekani K, Guerrero JM (2016) Power oscillations damping in dc microgrids. IEEE Trans Energy Convers 31(3):970–980 30. Podlubny I (1999) Fractional-order systems and P, Iλ, Dμ–controllers. IEEE Trans Automatic Control 44(1)
An Evaluation of Feature Selection Methods Performance for Dataset Construction P. Usha and M. P. Anuradha
Abstract A significant growth of digital data in different applications increases the data size and storage. Digital data may include missing values, irrelevant information, incorrect values, and redundant features. Each attribute in data collection is called features of dimensions. More dimensions in a dataset make prediction a complicated task. Feature selection is a method that plays a vital role in reducing the dimension of data and it can be done as an initial step in processing. Feature algorithm extract the refined feature for better classification and accuracy of predictive models. The proper selection of features is used to increase the efficiency of a dataset and performance of a model. Feature selection methods are not only used to reduce dataset but also to reduce the overfitting problems in mining process. This paper presents various feature selection methods in order to extract consistent data. The algorithms such as CFS, CAE, IGE, GRE, and WSE are used to select features. To measure the performance of these selected feature Naive Bayes (NB) model and support vector machine (SVM) model are used. Experimental result shows CAE, GRE, and IGE with SVM model give better performance than other methods. Keywords Correlation-based feature selector · Correlation-based attribute evaluator · Information gain-based evaluator · Gain ratio-based evaluator · Wrapper-based subset evaluator · Naïve Bayes and support vector machine
1 Introduction Nowadays, advancements in science and technology have made the world a computerized one. Information is collected and stored in different formats and may contain a lot of irrelevant, redundant, and noisy features. Due to the growth of data, the
115
116
P. Usha and M. P. Anuradha
managing process becomes more tedious. To make the data meaningful, many algorithms and methods in various fields like big data analytics, data science, data mining, artificial intelligence, and deep learning are introduced. To get purified data, many feature selection algorithms are used. Feature selection technique is the process of selecting more and most relevant and significant attributes from large given datasets. Feature selection methods are used to select and exclude features without changing the features, whereas dimensionality reduction transforms the features into lower dimensions and creates an entirely new feature as input. Feature selection methods deal with the lesser number of attributes, and it will be used to reduce complexity. A feature subset can be classified as: (1) irrelevant and noisy, (2) weakly relevant and redundant, (3) non-redundant and weakly relevant, and (4) Most or strongly relevant [1]. Feature selection methods always prefer to select strongly relevant and non-redundant data from large datasets. The main aim of the feature selection process is to identify the subset of attributes which are meaningful from a huge number of collected features. Feature selection models are very useful because: (1) It makes machine learning algorithm to train a model faster. (2) It makes a model easy to interpret and reduces the complexity of a dataset. (3) It improves the performance of a model by means of accuracy when right features are selected. (4) A proper feature selection methods are used to avoid overfitting problem.
1.1 Feature Selection Procedure A feature is a measurable property in a dataset, so it should be observable while removing in the selection process. Feature selection [2–5] is the process to find relevant information from large dataset in order to obtain best performance metrics. The following steps (Fig. 1) are involved in feature selection process. 1. 2. 3. 4. 5.
Search direction Determine search strategy Evaluation criteria Stopping criteria Validate results
Search direction is the first phase in feature selection which is used to find the starting point. The searching process can be done in three ways. (1) forward searching. (2) backward searching. (3) random searching. Feature selection strategies are classified as follows. (1) Forward sequential selection (FSS) is used to ignore the insignificant and irrelevant features and obtain an optimal subset. (2) Backward sequential selection (BSS) includes all the features and removes the irrelevant or redundant features one by one until it gets the best subset of features. Compared to FSS, this method gives better computational effect on performance. (3) Hill Climbing (HC) which combines both FSS and BSS. In this
An Evaluation of Feature Selection Methods Performance for Dataset … Original Data
Determine Search Direction
Determine Search Strategy
117 Determine Evaluation Criteria
Features
No
Stopping Criteria Selected Feature Subset Yes Validate Result
Fig. 1 Feature selection process
method, the stopping criteria are set earlier by defining the number of iterations to select the optimal set. The last iteration returns the best subset for classifier model. Evaluation criteria is used to evaluate the best subset which is used to determine the relevancy toward the classification model. Stopping criteria is used to specify where to stop the feature selection process in order to obtain an optima subset of features. The most common stopping criteria’s are number of predefined feature, number of predefined iterations, and evaluation function. Validate result is used to check whether the selected features are valid or not; otherwise, it is used to check whether the selected features are meaningful for further process.
2 Literature Survey Feature Selection Technique (FST) can be classified as filter, wrapper, and embedded methods. A filter [2, 6, 7] FST is a supervised method which uses a statistical technique to evaluate the relationship between target and input variables. Filter method uses ranking technique to select features to build a classification model in preprocessing step to filter a less relevant information. Ranking techniques are used to select variables, and the selection process is independent of classifier. Filter process can be done in two steps. They are (1) Rank feature (2) Filtered out low ranked features. Based on the performance values of a classifier model best subset can be chosen. Filter methods include chi square test, F score, information gain, etc. This methods are very simple to execute and takes less computation time to process. Filter method performs feature selection process is prior to classification. Wrapper feature selection method [2, 8] creates a model with different subset of all input features. According to the performance metric, the best model is chosen. This method requires a learning algorithm in prior. Wrapper feature selection process has two steps. (1) Make a search to generate subset. (2) Evaluate the subset by learning
118
P. Usha and M. P. Anuradha
Table 1 Pros and cons of feature selection methods and feature selection algorithm S no.
Method
Advantages
Disadvantages
Algorithm
1
Filter
Faster and efficient Independent of classification model Suitable for low dimensional data
Low computational performance Ignore features Doesn’t consider feature dependency
Information gain Gain ratio Correlation-based feature selection
2
Wrapper
Consider feature dependency Interact with classifier model
High computational cost Not guaranteed for optimal solution High risk of overfitting
Sequential forward selection Sequential backward selection Genetic algorithm
3
Embedded
Faster running time Lower overfitting risk Interact with classification model
Poor generalization among features Not suited for high dimensional data
Decision tree Naïve Bayes SVM
algorithm. In this method, before selecting feature selection methods, the knowledge of classification algorithms are required. Compared to filter approach, this method has higher computational cost. An embedded feature selection is an optimal feature generation method. In this method, feature selection and learning are performed simultaneously. In training phase itself the feature is selected. This method provides the feature with the highest accuracy of the model. A feature selection algorithm [8–10] proposes a new feature subset based on the combination of search techniques and evaluation measure depends on feature. This algorithm is used to minimize the error rate. The evaluation metrics are important to analyze the performance of classifier model. Table 1 shows the merits and limitations of feature selection techniques and some of the algorithm which is used to select features.
3 Feature Selection Methods Correlation based Feature Selector (CFS) method evaluates subset which contains an attribute that is correlated to the class label but independent of each other. In this method, attributes are ranked according to the heuristic function based on correlation of attributes. It can be calculated using the formula. Ms = l · (tcf ) ÷
√
(l + l(l − 1)) · (tff )
(1)
An Evaluation of Feature Selection Methods Performance for Dataset …
119
where M s represents the subset, l is features, t cf is average correlation between class and feature and t ff represents correlation between two features. The pseudocode of CFS is given as:
Pseudo Code: CFS
Input: S(F1, F2, ..., Fn) // training dataset
       r // predefined correlation coefficient value
Output: S_Best // optimal subset
Step 1: Find the correlated features (CF)
        FOR i = 1 to n
            Calculate CF(x, y) = 2 · [IG(X|Y)] / [H(X) + H(Y)]  // IG: information gain, H: entropy
Step 2: IF CF >= r THEN add Fi to S_Select
Step 3: Rank the features
        FOR i, j = 1 to n
            IF CF_i > CF_j THEN S_Select = CF_i
Step 4: S_Best = S'_Select // optimal subset
Step 5: RETURN S_Best
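For illustration only (this is not code from the paper), the following Python sketch computes the two quantities used above: the symmetric-uncertainty correlation CF(x, y) from Step 1 and the subset merit of Eq. (1). The function names, the use of discrete-valued features, and the dictionary-style dataset access are assumptions.

```python
import numpy as np
from collections import Counter

def entropy(values):
    # Shannon entropy of a discrete variable
    counts = np.array(list(Counter(values).values()), dtype=float)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def symmetric_uncertainty(x, y):
    # CF(x, y) = 2 * IG(X|Y) / (H(X) + H(Y)), as in Step 1 of the pseudocode
    hx, hy = entropy(x), entropy(y)
    joint = entropy(list(zip(x, y)))        # joint entropy H(X, Y)
    ig = hx + hy - joint                    # information gain = mutual information
    return 2.0 * ig / (hx + hy) if (hx + hy) > 0 else 0.0

def cfs_merit(subset, data, target):
    # Merit M_s = (l * t_cf) / sqrt(l + l*(l - 1) * t_ff)  -- Eq. (1)
    # data: dict mapping feature name -> list of discrete values
    l = len(subset)
    t_cf = np.mean([symmetric_uncertainty(data[f], target) for f in subset])
    pairs = [(a, b) for i, a in enumerate(subset) for b in subset[i + 1:]]
    t_ff = np.mean([symmetric_uncertainty(data[a], data[b]) for a, b in pairs]) if pairs else 0.0
    return l * t_cf / np.sqrt(l + l * (l - 1) * t_ff)
```

A best-first search would then evaluate cfs_merit on candidate subsets and keep the highest-scoring one, mirroring Steps 2 to 5.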
The Correlation-based Attribute Evaluator (CAE) method [11] evaluates the correlation of an attribute with respect to the target class. Pearson's correlation measures the correlation between attributes as well as with the target class. It can be calculated by

PCC = (n Σ F1 F2 − Σ F1 · Σ F2) / sqrt[(n Σ F1² − (Σ F1)²) · (n Σ F2² − (Σ F2)²)]    (2)

Pseudo Code: CAE (Pearson's Correlation)
Input: S(F1, F2, ..., Fn) // training dataset
       PCC // Pearson's correlation coefficient
Output: S_Best // optimal subset
Step 1: FOR F1 to Fn
            Calculate PCC = Cov(F1, F2) / (σ_F1 · σ_F2)  // σ_F1, σ_F2: standard deviations
        or PCC as in Eq. (2)
Step 2: Select the optimal features  // n: total number of features
        S_Best = n · Average(PCC_fc) / sqrt(n + n(n − 1) · Average(PCC_ff))
        // ff: correlation between features, fc: correlation between feature and class
Step 3: RETURN S_Best
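As a hedged illustration of the ranker behaviour described above (not the authors' code), the sketch below scores every feature by its absolute Pearson correlation with the class, as in Eq. (2), and returns the features in ranked order. The NumPy array input shape is an assumption.

```python
import numpy as np

def pearson_ranking(X, y):
    # Score every feature column of X by |PCC| with the class label y (Eq. 2), then rank.
    y = np.asarray(y, dtype=float)
    scores = []
    for j in range(X.shape[1]):
        f = np.asarray(X[:, j], dtype=float)
        denom = f.std() * y.std()
        pcc = np.mean((f - f.mean()) * (y - y.mean())) / denom if denom > 0 else 0.0
        scores.append((abs(pcc), j))
    # Highest correlation first, like the Ranker search technique
    return sorted(scores, reverse=True)
```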
The Gain Ratio-based Attribute Evaluator (GRE) [12] is a memory-based learning method which computes a weight for a feature without considering the other feature values present in the dataset. Information gain is used to select the training attribute. It can be calculated by

Gain_Ratio(Class, Attribute) = [H(Class) − H(Class|Attribute)] / H(Attribute)    (3)
Input: S(F1, F2, ..., Fn) // training dataset/sample
       GR // gain ratio
Output: S_Best // optimal subset
Step 1: Calculate the information gain of a feature in S
        I(S) = − Σ_{i=1..m} p_i log2(p_i)  // p_i: probability, m: distinct classes
Step 2: Partition S into subsets on A and obtain the entropy
        E(A) = Σ_{i=1..m} [(S_1i + S_2i + ... + S_mi) / S] · I(S_i)
Step 3: Calculate the gain of A
        Gain(A) = I(S) − E(A)
Step 4: Set the splitting attribute
        SplitInfo(A) = − Σ_{i=1..m} (|S_i| / |S|) log2(|S_i| / |S|)
Step 5: Calculate the gain ratio
        GR(A) = Gain(A) / SplitInfo(A)
Step 6: Select the highest GR(A)
        S_Best = GR(A)
Step 7: RETURN S_Best
The Information Gain-based Evaluator (IGE) [13] is an entropy-based method in which the relevance of an attribute is measured against the target variable. It defines how much information is provided by a feature. The algorithm selects the features whose gain value is larger than a predefined threshold. It can be calculated by
Info_Gain(C, A) = H(C) − H(C|A)    (4)
where A represents the attribute, C the class, H(C) is the entropy of the class, and H(C|A) represents the conditional entropy of the class for the given attribute. IG is a good measure for identifying the relevance of an attribute, but it is less efficient when attributes have many different values. The pseudocode for IGE is given as:
Input: S(F1, F2, ..., Fn) // training dataset/sample
       IG // information gain
Output: S_Best // optimal subset
Step 1: Find the expected information of sample S
        Info(S) = − Σ_{i=1..m} p_s log2(p_s)  // p_s: probability, m: distinct classes
Step 2: Partition S into subsets on A and obtain the information related to S
        Info_A(S) = Σ_{i=1..m} (|S_i| / |S|) · Info(S_i)  // |S_i|/|S|: weight of the i-th partition
Step 3: Calculate the gain of A
        Gain(A) = Info(S) − Info_A(S)
Step 4: Select the highest Gain(A)
        S_Best = Gain(A)
Step 5: RETURN S_Best
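The two entropy-based evaluators can be illustrated together. The sketch below computes Info_Gain of Eq. (4) and Gain_Ratio of Eq. (3) for a single discrete attribute; it is an interpretation of the pseudocode above, not code from the paper.

```python
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def info_gain_and_gain_ratio(feature, labels):
    # Info_Gain(C, A) = H(C) - H(C|A)             -- Eq. (4)
    # Gain_Ratio(C, A) = Info_Gain / SplitInfo(A)  -- Eq. (3)
    n = len(labels)
    base = entropy(labels)
    cond = split_info = 0.0
    for value in set(feature):
        idx = [i for i, v in enumerate(feature) if v == value]
        w = len(idx) / n
        cond += w * entropy([labels[i] for i in idx])
        split_info -= w * math.log2(w)
    gain = base - cond
    return gain, (gain / split_info if split_info > 0 else 0.0)
```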
The Wrapper Subset Evaluator (WSE) [14] is used to find the interaction between variables. To find an optimal feature subset, the selection process can be done forward, backward, or bidirectionally. It is called a greedy method because it aims to find the best subset for building the final model. This wrapper method selects features in combination with the classification model.

Input: S(F1, F2, ..., Fn) // training dataset/sample
       WSE // Wrapper Subset Evaluator (greedy)
Output: S_Best // optimal subset
Step 1: Set i = 0, S_Best = 0, itera = 0, Opt = 0
Step 2: WHILE (i < n)
            Set K = size(F(i)), MAX = 0, Feature = 0
            FOR j = 1 to K
                Set Score = evaluate(F(i))
                IF (Score > MAX) THEN
                    MAX = Score
                    Feature = F(i) * j
                END IF
            END FOR
            IF (MAX > Opt) THEN
                Opt = MAX, itera = i
                S_Best → F(i) + Feature
                F(i+1) = F(i) − Feature
                i++
            END IF
Step 3: Return S_Best
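A compact way to realize the greedy wrapper search above is a forward selection loop that keeps adding the feature giving the largest cross-validated accuracy gain. The sketch below uses scikit-learn only as an example evaluator; the stopping rule and the 5-fold cross-validation are assumptions.

```python
from sklearn.model_selection import cross_val_score

def greedy_forward_selection(model, X, y):
    # Add, one at a time, the feature that most improves 5-fold CV accuracy;
    # stop when no remaining feature yields an improvement.
    selected, best = [], 0.0
    remaining = list(range(X.shape[1]))
    while remaining:
        candidates = [(cross_val_score(model, X[:, selected + [f]], y, cv=5).mean(), f)
                      for f in remaining]
        score, feat = max(candidates)
        if score <= best:
            break
        best = score
        selected.append(feat)
        remaining.remove(feat)
    return selected, best
```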
Classifier models: The Naïve Bayes classifier [1] is based on Bayes' theorem, which is used to find the probability of an event occurring based on a previously occurred event. It can be defined as

P(A|B) = P(B|A) P(A) / P(B)    (5)
where P(A|B) is the conditional probability and P(A) is called the class probability. The support vector machine (SVM) is a supervised learning model [15] based on statistical theory, in which hyperplanes are used to find the decision boundary that segregates the features into different classes. The decision boundary is called the hyperplane and can be represented as

Y = a · x + b    (6)

where b is a constant and a represents the slope of the hyperplane. SVM is suitable for both classification and regression problems.
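For completeness, the two classifier models can be evaluated on a reduced feature subset with a few lines of scikit-learn; the 70/30 split, the Gaussian Naïve Bayes variant, and the linear SVM kernel are assumptions made for this sketch (the paper itself uses the WEKA tool).

```python
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

def evaluate_classifiers(X, y):
    # Hold-out evaluation of the selected feature subset with NB and SVM
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=1)
    results = {}
    for name, clf in (("Naive Bayes", GaussianNB()), ("SVM", SVC(kernel="linear"))):
        clf.fit(X_tr, y_tr)
        results[name] = accuracy_score(y_te, clf.predict(X_te))
    return results
```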
4 System Model
Every dataset may contain irrelevant, redundant, and noisy data. Such inconsistent data are not suitable for predicting results; feature selection algorithms therefore help retrieve consistent data from huge datasets. The feature selection process comprises two parts: (1) the attribute evaluator and (2) the search method. An attribute evaluator is a technique in which each attribute in a dataset is assessed against the output variable. The search method is a technique in which the best subset of attributes is obtained by combining different attributes in the dataset. In this system model (Fig. 2), the dataset is given as input, and depending on the dataset an attribute evaluator and a search method are chosen to obtain the subset. These subsets are then given as the dataset to a classifier model such as Naïve Bayes or support vector machine [16, 17], which performs the classification task to predict performance metrics such as accuracy, sensitivity, and F-score.
Fig. 2 System model for feature subset selection (dataset → feature selection process: FST and search method → classifier model → performance analysis)
5 Experimental Setup
In this paper, two datasets, breast cancer and brain tumor, are used. The breast cancer dataset contains ten attributes, such as age, menopause, and tumor size, and is downloaded from the UCI repository. The brain tumor dataset contains 12 attributes, such as histology, grade, gender, and mutation status, and is downloaded from the Chinese Glioma Genome Atlas (CGGA). The WEKA [17] tool is used to select features. Table 2 shows a sample of the brain tumor dataset and Table 3 a sample of the breast cancer dataset.
6 Results and Discussion
In our experiment, five feature selection algorithms are chosen: Correlation-based Feature Selection (CFS), Correlation Attribute Evaluator (CAE), Gain Ratio Attribute Evaluator (GRE), Info Gain Attribute Evaluator (IGE), and Wrapper Subset Evaluator (WSE) [14]. The CFS method is combined with the best-first search technique to select the best features. The CAE, IGE, and GRE methods use the ranker technique to rank the features, while the WSE method uses the greedy stepwise search technique. To evaluate the effectiveness of the selected features, the Naïve Bayes and support vector machine algorithms are used. Table 4 shows the different FSTs and search techniques and the number of selected features for each dataset. Table 5 shows the performance results of the classifiers based on each feature selection algorithm. From this, we can identify that SVM with the CAE, GRE, and IGE models is best suited for health-related information. The result is shown in Fig. 3.
Table 2 The sample dataset brain tumor: ten CGGA records with the attributes CGGA ID, PRS_type (Primary/Recurrent), Histology (O, A, rO, AA, GBM), Grade (WHO II–IV), Gender, Age, IDH_mutation_status (Mutant/Wildtype), 1p19q_codeletion_status (Codel/Non-codel), OS, Censor, Radio_status, and Chemo_status
Table 3 Sample dataset breast cancer: ten records with the attributes Menopause (Premen/Age 40), Age (40 < 50, 50 < 59), Tum_Size, Inv_Nodes, Node_Cap (Y/N), Deg_Malig (1–3), Breast_Side (L/R), Breast_Quad (L_Up, L_Low, R_Up, R_Low, Central), Irradiat (Y/N), and Class (Recurrent/No recur)
Table 4 Feature selection algorithm and technique to select subset

| Feature selection technique | Search technique | Breast cancer (total no. of attributes = 10) | Brain tumor (total no. of attributes = 12) |
|---|---|---|---|
| Correlation-based feature selector (CFS) | Best first | 5 | 4 |
| Correlation-based attribute evaluator (CAE) | Ranker | 9 | 11 |
| Gain ratio-based attribute evaluator (GRE) | Ranker | 9 | 12 |
| Info gain-based attribute evaluator (IGE) | Ranker | 9 | 12 |
| Wrapper subset evaluator (WSE) | Greedy stepwise | 10 | 12 |
Table 5 Performance measure of NB and SVM (accuracy, %)

| Dataset | NB: CFS | NB: CAE | NB: GRE | NB: IGE | NB: WSE | SVM: CFS | SVM: CAE | SVM: GRE | SVM: IGE | SVM: WSE |
|---|---|---|---|---|---|---|---|---|---|---|
| Breast cancer | 72.37 | 75.17 | 75.17 | 75.17 | 70.28 | 73.43 | 76.23 | 76.22 | 76.22 | 70.28 |
| Brain tumor | 64.31 | 86.14 | 91.01 | 91.01 | 91.01 | 65.12 | 98.91 | 98.40 | 98.40 | 98.40 |
Fig. 3 Performance measurement: accuracy (%) of the Naïve Bayes and SVM classifiers for each feature selection technique (CFS, CAE, GRE, IGE, WSE) on the breast cancer dataset
This graphical representation shows the accuracy of each feature selection algorithm with the NB and SVM classifier models.
7 Conclusion
This paper examines the efficiency of various feature selection algorithms for selecting consistent data. From the results, we can conclude that the Correlation-based Attribute, Gain Ratio-based, and Information Gain-based evaluators select significant and relevant features efficiently. Each method has its own pros and cons: CAE is best for medical-related data, while IGE and GRE are best suited for discrete data and small amounts of data. In the future, instead of eliminating inconsistent data, we can treat it as consistent by applying a data cleaning process. If information is eliminated, the dataset becomes smaller, and reliable predictions cannot be made from a small dataset; thus, inconsistent instances should be treated as consistent ones. Filter feature selection techniques such as CAE, GRE, and IGE give good results when combined with the SVM classifier.
References
1. Bermejo S (2017) Ensembles of wrappers for automated feature selection in fish age classification. Comput Electron Agric 134:27–32
2. Mafarja M, Mirjalili S (2018) Whale optimization approaches for wrapper feature selection. Appl Soft Comput 62:441–453
3. Kadhim BS, Janabi A, Kadhim R (2018) Data reduction techniques: a comparative study for attribute selection methods. Int J Adv Comp Sci Tech 8(1):1–13. ISSN 2249-3123, Research India Publications, http://www.ripublication.com
4. Rozlini M, Munirah MY, Noorhaniza W (2018) A comparative study of feature selection techniques for Bat algorithm in various applications. MATEC Web of Conferences 150:06006. https://doi.org/10.1051/matecconf/201815006006
5. Venkatesh B, Anuradha J (2019) A review of feature selection and its methods. Cybernetics Inform Technol 19. ISSN: 1311-9702; Online ISSN: 1314-4081. https://doi.org/10.2478/cait2019-0001
6. Hasri NM, Wen NH, Howe CW, Mohamad MS, Deris S, Kasim S (2017) Improved support vector machine using multiple SVM-RFE for cancer classification. Int J Adv Sci Eng Inf Technol 7:1589–1594
7. Shah SAA, Shabbir HM, et al (2020) A comparative study of feature selection approaches: 2016–2020. Int J Scient Eng Res 11(2), February. ISSN 2229-5518
8. Das S, Singh PK, Bhowmik S, Sarkar R, Nasipuri M (2017) A harmony search based wrapper feature selection method for holistic Bangla word recognition. Procedia Comput Sci 89:395–403
9. Liu Z, Wang R, Japkowicz N et al (2019) Mobile app traffic flow feature extraction and selection for improving classification robustness. J Netw Comput Appl 125:190–208. https://doi.org/10.1016/j.jnca.2018.10.018
10. Kang C, Huo Y et al (2019) Feature selection and tumor classification for microarray data using relaxed Lasso and generalized multi-class support vector machine. J Theor Biol 463:77–91. https://doi.org/10.1016/j.jtbi.2018.12.010
11. Venkatesh B, Anuradha J, A review of feature selection and its methods. Cybernetics Inform Technol 19(1). Print ISSN: 1311-9702; Online ISSN: 1314-4081. https://doi.org/10.2478/cait2019-0001
12. Rahman MA, Muniyandi RC (2018) Feature selection from colon cancer dataset for cancer classification using artificial neural network. Int J Adv Sci Eng Inf Technol 8:1387–1393
13. Li J, Cheng K, Wang S, Morstatter F (2018) Feature selection: a data perspective. ACM Comp
14. Jameel S, Rehman SU (2018) An optimal feature selection method using a modified wrapper-based ant colony optimisation. J Nat Sci Foundation of Sri Lanka 46(2)
15. Wang H, Zheng B, Yoon SW, Ko HS (2018) A support vector machine-based ensemble algorithm for breast cancer diagnosis. Eur J Oper Res 267:687–699
16. Pratiwi AI, Adiwijaya (2018) On the feature selection and classification based on information gain for document sentiment analysis. Hindawi Appl Comput Intell Soft Comp, Article ID 1407817, 5 p. https://doi.org/10.1155/2018/1407817
17. Gnanambal S, Thangaraj M et al (2018) Classification algorithms with attribute selection: an evaluation study using WEKA. Int J Adv Networking Appl 9:3640–3644, 6 p. ISSN: 0975-0290
IoT-Based Laboratory Safety Monitoring Camera Using Deep-Learning Algorithm Maddikera Kalyan Chakravarthi, Tamil Selvan Subramaniam, Ainul Hayat Abdul Razak, and Mohd Hafizi Omar
Abstract The monitoring systems currently available in the market do not detect whether clothing complies with the safety standard. The purpose of this project is to develop, analyze, and evaluate an IoT-based laboratory safety monitoring camera using a deep-learning algorithm. The project is developed based on the System Development Life Cycle (SDLC) model, which has five steps: analyze, design, develop, implement, and evaluate. The OpenCV library for Python is used to develop the programming of the PPE detection algorithm, and the system also integrates an IoT element built on the Telegram platform to send notifications. Technical analysis was conducted on the circuit, the detection algorithm, and the notification system before the project was handed over to experts for evaluation purposes. The project was evaluated by three experts in the field of electrical and electronics using a checklist instrument covering design, necessity, and functionality. The experts' responses were very positive towards the project objective, and the system can be further optimized with expertly trained algorithms. This can increase the number of individuals being monitored, and the project can be further modified so that it could be used in any environment to control a door remotely. Keywords OpenCV · Raspberry Pi · Safety monitoring · IoT · Deep learning
M. K. Chakravarthi (B) School of Electronics Engineering, VIT-AP University, Amaravati, Andhra Pradesh, India e-mail: [email protected] T. S. Subramaniam Faculty of Technical and Vocational Education, Universiti Tun Hussein Onn, Parit Raja, Malaysia e-mail: [email protected] A. H. A. Razak Soft IP Logic Programmer (GT), INTEL PSG, Penang, Malaysia M. H. Omar Faculty of Electronics Engineering Technology, Universiti Malaysia Perlis, Arau, Malaysia e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 N. Subhashini et al. (eds.), Futuristic Communication and Network Technologies, Lecture Notes in Electrical Engineering 966, https://doi.org/10.1007/978-981-19-8338-2_10
1 Introduction
In the era of the fourth industrial revolution (IR4.0), the significant technological achievements came in the form of communication between machine and user and between machine and machine. Direct communication, big data, human–machine cooperation, long-distance sensing, monitoring and controlling, and autonomous tools, along with the interconnection between all of these, can be considered assets that cannot be ignored. As a result, the industry sector competes to find ways to increase productivity and reduce cost [1]. One of the key technological advancements within this era of industrial revolution is artificial intelligence. Much of the technological innovation used in industry was introduced during the fourth industrial revolution. According to the literature, the first revolution occurred in eighteenth-century Britain, and it was sparked because the salary structure in Britain was greatly superior compared with other nations [2]. This shows that if there is a huge investment in industry, it is bound to advance to a higher level. The technological discovery of the steam engine became the core of the development of new production machines, such as the sewing machine, which enabled clothing to be mass-produced at a greater scale. The new technology generated by such a revolution also introduces new challenges, which have to be overcome so that the transition of technology can occur safely. It is also stated that problems arise in the cooperation between robots and workers through close interaction, where the robot is used to help the worker conduct difficult and dangerous tasks [3]. The authors insist on the importance of developing a robot that is safety-conscious and able to recognize actions that could cause injury or a safety hazard to the worker. Internet search engines such as Google are a solid example of artificial intelligence [4]. The developer believes that artificial intelligence has gone through multiple stages of advancement over the years and that its level of intelligence is no longer far from that of a human. Therefore, in order to fully take advantage of this advancement in technology, a product has to be developed that fully utilizes the function and operation of artificial intelligence technology.
2 Research Background
Accidents in the laboratory occur due to unintentional causes which are mostly user-generated; users fail to follow the safety criteria provided by the management. Research shows that teachers and students have a 27.33% chance of being injured when they are inside a lab [5]. In addition, a study reported that by the end of 2017 there were 80,393 injuries at workplace laboratories and workshops [6]. Generally, accidents occur due to unsafe conditions and unsafe actions practiced by the worker. That is why disciplining laboratory users to follow the laboratory safety guidelines is necessary.
A report evaluating safety precautions in an electrical laboratory at a Polytechnic shows that the usage of personal protective equipment (PPE) and safe working standards in the electrical laboratory scored low, at 66.25% and 62.5%, respectively [7]. Most individuals simply take the easy way when working on a task and neglect the risks involved. As observed in one of the studies, workers in the laboratory or on site seem not to be bothered about safety and choose to complete tasks quickly, exposing themselves to unnecessary risk [8]. The developer believes that there is a huge need for improvement in the safety sector within the fourth industrial revolution, and a product that applies the technology available within IR4.0 has to be developed in order to solve the safety issues that occur. A list of steps that can be taken to increase the safety of workers in the fourth industrial revolution was also proposed; one of them is to develop a dynamic interface and emotion sensor to monitor workers and ensure their safety continuously [9].
3 Problem Statement
The monitoring systems currently available on the market do not have a feature to detect compliant safety clothing. In addition, there is a disadvantage in terms of the safety element that could cause injury or even death to the victim, and the increase in risk usually comes from the users themselves. The failure to discipline lab users to follow the rules and wear their PPE is a huge problem that has to be overcome. Moreover, the large size of a laboratory makes it hard for the staff to manage the risk. However, an autonomous system that monitors the PPE of each individual would greatly improve the safety of the laboratory. The development of such an autonomous monitoring system can eliminate human error, as the monitoring responsibility can be given fully to the system. The system can support the individuals responsible for laboratory safety with continuous monitoring, thus reducing the risk of injury to laboratory users. This is especially relevant in the lab environment of an education institute, specifically for certificate-level students, whose average age is young; they may be unable to determine the safety risks to themselves and are more than likely to ignore the safety rules of a laboratory if not monitored. Therefore, the objectives of this project are: (i) develop an artificial intelligence-based system to detect PPE and integrate the system with IoT, (ii) analyze the artificial intelligence-based system to detect PPE and its integration with IoT, and (iii) evaluate the artificial intelligence-based system to detect PPE and its integration with IoT.
4 Methodology
Previous products and research that have been developed and conducted were used as points of reference when developing the laboratory camera monitoring system based on IoT using a deep-learning algorithm. One developer built a safety monitoring camera for railway level crossings; the product was meant to prevent the train from crashing into pedestrians or vehicles using the crossing, and the monitoring was conducted autonomously using an object detection algorithm [10]. Another study found an increase in safety practice among the individuals under surveillance; the increase occurred at two stages, when the CCTV was first installed and again at the end of the research, proving that safety practice increases with the help of monitoring [11]. Other findings show that if a person is required to spend time evaluating a video feed, their performance degrades over time, which supports the need to replace the human with a computerized system that conducts the monitoring autonomously [12]. Finally, a stereo camera system covering three angles can be used to increase the safety level of working with a robot; the camera feed should be connected to the robot so that it can judge whether there is a risk to its human counterpart [13].
5 Discussion
The methodology used for the development of the IoT-based laboratory safety monitoring camera using a deep-learning algorithm follows the SDLC model, which has five main phases: analysis, design, development, implementation, and evaluation.
5.1 Phase 1: Analysis
Initially, an analysis was done to determine the existing problems and issues so that a project could be developed to reduce or eliminate them. For this project, the developer's analysis found that a safety monitoring system is necessary to help manage risk inside a laboratory. Previous products were compared to determine what steps to take in developing the system: which areas to improve and what innovation is required. All of these questions were asked, and the objectives were formed based on them.
5.2 Phase 2: Design
The design phase focuses on the design of the monitoring system and how the system will operate. At the end of this phase, a design for the system is decided. The developer sketched a design suitable for the IoT-based laboratory safety monitoring camera using a deep-learning algorithm. The design is a determinant of whether the objectives can be achieved. The implementation of the real-time work in this paper is elaborated using Figs. 1 and 2.
Fig. 1 Overall design of the project
Fig. 2 Design implementation of the system
Fig. 3 The notification and algorithm that were developed
5.3 Phase 3: Development and Implementation
The development of the system was done on a Raspberry Pi microcomputer. First, the Python compiler for the operating system was set up so that it could use the OpenCV library; the installation was mostly done through the Linux terminal. After the compiler was set up with OpenCV, the developer proceeded with the programming of the detection algorithm. The cascade classifier was trained using the Cascade-Trainer-GUI software through multiple stages of training. After the detection algorithm was done, the next step was the development of the notification system, which uses Telegram as the platform for sending notifications; Telegram provides its own API that the developer could use to take advantage of its messaging platform. After both parts of the system, the algorithm and the notification, were done, the system was fully functional and ready to be tested. Sample real-time image acquisition and analysis in software are explained by Fig. 3.
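A simplified sketch of how the two developed parts could fit together is given below: a trained cascade classifier scanning camera frames and a Telegram Bot API call sending the alert. The cascade file name, bot token, chat ID, and the alert text are placeholders, and the loop omits the GPIO/LED and lock handling described later; it is an illustration, not the project's actual code.

```python
import cv2
import requests

CASCADE_PATH = "ppe_cascade.xml"     # placeholder: classifier trained with Cascade-Trainer-GUI
BOT_TOKEN = "<telegram-bot-token>"   # placeholder
CHAT_ID = "<chat-id>"                # placeholder

def send_alert(text):
    # Telegram Bot API sendMessage endpoint
    url = f"https://api.telegram.org/bot{BOT_TOKEN}/sendMessage"
    requests.post(url, data={"chat_id": CHAT_ID, "text": text}, timeout=10)

def monitor():
    cascade = cv2.CascadeClassifier(CASCADE_PATH)
    cap = cv2.VideoCapture(0)        # Raspberry Pi / USB camera
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        hits = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
        if len(hits) > 0:
            send_alert("PPE detected at the laboratory entrance")
    cap.release()
```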
5.4 Phase 4: Testing and Evaluation
The analysis was done on the detection circuit and the door-locking mechanism. For the detection circuit, the analysis confirmed that all parts of the circuit were working as intended. In the detection circuit, an LED is used as an indicator for the user to know the status of detection; three LED lights turn on based on the progress of the detection. Table 1 shows the analysis of the LEDs. The lock worked flawlessly and responded to the programming well. Table 2 shows the results of implementation in different cases. When the lock is open, a considerable amount of heat is generated around the lock due to the induction within it.
Table 1 Analysis of indicator LED

| S. no | Findings |
|---|---|
| 1 | No LED turns on because no user wearing PPE is detected |
| 2 | The red LED turns on because a user wearing PPE is detected; the detection process is commencing |
| 3 | The yellow LED turns on because the detection process is complete and the detected object has been verified to be proper PPE |
| 4 | The green LED turns on; the user is now allowed to enter the laboratory |
The programming of the system can be classified into three parts: the detection algorithm, the control of the general-purpose input/output, and the notification system. Each part of the programming was thoroughly analyzed to confirm that it works properly. For the detection algorithm, the analysis was done simultaneously with its development because of the algorithm's unique traits: since the algorithm is developed by deep learning, analysis and development need to be conducted side by side. Multiple stages of development and analysis were carried out before a sufficient detection algorithm could be produced. A total of 231 negative samples and 191 positive samples were provided for training the algorithm. What can be concluded from the analysis is that, to increase the accuracy of the detection algorithm, the samples provided must have a solid distinction between positive and negative examples. The response from the experts was obtained through the provided checklist and their comments regarding the improvements that could be made to the system.
Table 2 Analysis of the detection algorithm

| S. no | Findings of each stage |
|---|---|
| 1 | The first stage shows errors when detecting the object, and at some moments the PPE could not be detected |
| 2 | In the second stage, there is an improvement in the algorithm; errors still occur when detecting the object, but the algorithm is getting more efficient |
| 3 | In the third stage, a different background was used to simulate varying conditions, and the algorithm was unable to detect the object |
| 4 | In the fourth stage, the algorithm is still unable to recognize the object; more input is needed to train the algorithm to identify it |
| 5 | In the fifth stage, even though the algorithm improves noticeably, it still detects wrong objects; more negative data is required for the algorithm to produce cognitive-like detection |
| 6 | In the sixth stage, considerable improvement was achieved; the PPE object was detected about 70% of the time, but objects that were not PPE were detected about 20% of the time |
| 7 | Finally, the algorithm was able to identify the PPE object that needed to be detected; the result was good and the error percentage is about 5% |
Tables 1 and 2 elaborate on practical cases of the implementation of the current research. The experts' main interest is in the field of electrical and electronics. The checklist is divided into three sections: the first covers the design, the second the suitability, and the third the functionality of the system. To analyze the checklist results, the developer uses each expert's level of satisfaction to produce a score for each section. The weightage, or full score, of each question is 5; a "very satisfied" response receives the full score of 5, and a "very unsatisfied" response receives a score of 0. The full mark each section can obtain is 25, and the scores are given in Figs. 4, 5, and 6.
Fig. 4 Score for the design of the system
Fig. 5 Score for the suitability of the system
Fig. 6 Score for the functionality of the system
6 Conclusions
The development of the IoT-based laboratory safety monitoring camera using a deep-learning algorithm was divided into two main phases: the first is the development of the software, and the second is the development of the hardware. The software development includes installing the operating system, configuring and installing OpenCV, and developing the detection algorithm and the notification system using Telegram. Analysis of the system was done to ensure it was in good condition before the project was handed over to experts for evaluation purposes. Two types of analysis were conducted: on the circuit of the system and on the programming. It is best to handle the casing of the lock with care, especially after the lock has been operated. For the programming, the first analysis was conducted on the detection algorithm; each stage of the algorithm was tested to verify its accuracy in detecting the object. After a few stages, the accuracy reached the level necessary for the project, and the developer stopped developing the algorithm once this level of accuracy was achieved. Next was the analysis of the notification system. The developer tested the ability of the system to send notifications through Telegram; the analysis shows that Internet speed is the main factor governing how quickly the notification is received. The functionality of the system was excellent, and it worked as intended. The analysis of the experts' checklists shows that, generally, most of the experts agreed on the design, suitability, and functionality of the system. However, suitability received the lowest score among the aspects; the developer believes this is because of the limitation that the system is only suitable for a certain kind of laboratory. The highest score was received by the system design, and the second highest differed by only two points. The developer believes that the design of the system was the best for its purpose, and all of the experts agreed. Generally, the development
of the IoT-based laboratory safety monitoring camera using a deep-learning algorithm has been a success, and the objectives were achieved. The detection system based on artificial intelligence to identify PPE was able to perform its function; the evidence of this has been thoroughly explained through the technical analysis, where its effectiveness was analyzed. This effectiveness is due to the technology of the fourth industrial revolution, machine learning, which makes the detection function a reality. Moreover, the technology behind the artificial intelligence for object detection can easily be reused to teach the machine to detect more objects; all it takes is to provide new samples to the algorithm so that it can learn the object that is required to be detected.
References 1. Badri A, Boudreau TB, Souissi AS (2018) Occupational health and safety in the industry 4.0 era: a cause for major concern? Safety Sci 109:403–411 2. Allen RC (2012) The British industrial revolution in global perspective. The British Indust Revol Global Persp 3. Beetz M, Bartels G, Albu-Schaffer A, Balint-Benczedi F, Belder R, Bebler D, Worch JH (2015) Robotic agents capable of natural and safe physical interaction with human co-workers. In: IEEE International Conference on Intelligent Robots and Systems, 6528–6535 4. Ramos C, Augusto JC, Shapiro D (2008) Ambient intelligence the next step for artificial intelligence. IEEE Intell Syst 23(2):15–18 5. Chatigny C, Riel J, Nadon L (2012) Health and safety of students in vocational training in Quebec: a gender issue? 41:4653–4660 6. Basori B (2018) The evaluation of occupational health and safety (OHS) implementation in Vocational High School Workshop, Surakarta, 121–125 7. Efendi A, Nugroho YS (2019) Has the electrical laboratory of Subang State Polytechnic applied occupational safety and health? Evaluation Report Automot Exp 2(2):47–52 8. Zahoor H, Chan APC, Masood R, Choudhry RM, Javed AA, Utama WP (2016) Occupational safety and health performance in the Pakistani construction industry: stakeholders’ perspective. Int J Constr Manag 16(3):209–219 9. Kagermann H, Wahlster W, Helbig J (2013) Securing the future of German manufacturing industry: recommendations for implementing the strategic initiative Industrie 4.0. Final Report of the Industrie 4.0 Working Group 10. Fakhfakh N (2011) A video-based object detection system for improving safety at level crossings. The Open Transp J 5:45–49 11. Cocca P, Marciano F, Alberti M (2016) Video surveillance systems to enhance occupational safety: a case study. Saf Sci 84:140–148 12. Howard CJ, Troscianko T, Gilchrist ID, Behera A, Hogg DC (2009) Searching for threat: factors determining performance during CCTV monitoring. Security, 1–7 13. Tan JTC, Arai T (2011) Triple stereo vision system for safety monitoring of human-robot collaboration in cellular manufacturing. In: IEEE International Symposium on Assembly and Manufacturing, 1–6
ByWalk: Unriddling Blind Overtake Scenario with Frugal Safety System Soumya Shaw, S. Siddharth, S. Ramnath, S. Kishore Nithin, Suganthi Kulanthaivelu, and O. S. Gnana Prakasi
Abstract Safety is crucial, and this truth is ineluctable in practice. We strive to rev up the safety protocols even more, in the field of road safety in particular. Countries like India face around 5,00,000 accidents, which lead to 1,80,000 demises each year. Two-lane one-way roads present a risk of the overtaking vehicle crashing into an incoming car (from the opposite direction) that the overtaking vehicle is unaware of. We seek to achieve two pivotal milestones with our idea for the blind overtake issue, namely technological aid and economic feasibility. This makes our concept equally impactful in all situations. The technological precision and advancement will help anyone with enough resources to use them tangibly, and economic feasibility ensures a threshold of safety levels that must be put into action. In fact, we are slightly inclined toward the frugality of our idea's architecture paradigm because 'safety' is everyone's right. On the economic side, we propose an LED board-based solution that presents enough information about the incoming vehicle so that a blind overtake condition can be avoided. Besides, we put forward the idea of vehicle-to-vehicle communication for streaming video content to the trailing cars with smarter selection and added ease for the drivers. Keywords SSD: single-shot detector · Temporal smoothing · Contour growth rate algorithm · V2V: Vehicle to Vehicle Communication
S. Shaw · S. Siddharth · S. Ramnath · S. K. Nithin · S. Kulanthaivelu (B) School of Electronics Engineering, Vellore Institute of Technology, Chennai, India e-mail: [email protected]
O. S. Gnana Prakasi CHRIST (Deemed to be University), Bangalore, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 N. Subhashini et al. (eds.), Futuristic Communication and Network Technologies, Lecture Notes in Electrical Engineering 966, https://doi.org/10.1007/978-981-19-8338-2_11
1 Introduction
India, being an extensive and populous country, has the second-largest road network (5.898 million km) in the world after the USA [1], and it encompasses 20.49% of single-lane roads. Such records come at a cost, as the heavy population density onsets
a chain of traffic accidents throughout the year. As per the 2019 report by the National Crime Records Bureau [2], 1,81,113 people lost their lives and 4,42,996 were injured on Indian roads, which is still a considerable number to digest. The report also indicates over-speeding and overtaking as the two main factors responsible for road accidents out of the many other categories, as shown in Fig. 1. Besides, tailgating is another contributing factor to the road accidents we encounter. The research sector has been active in this field lately, developing new systems and targeting the inefficiencies in traffic practices. Kim et al. propose a collision-avoidance system by identifying any vehicle that may be overtaking from the blind spot; they propose detecting blobs and, furthermore, identifying the approaching ones using motion vectors [3]. Traditional approaches for similar purposes use Support Vector Machines (SVMs) and the Histogram of Oriented Gradients (HOG) [4, 5]. Schamm et al. and Joung et al. have put forward ideas for using headlights as features for identifying vehicles at night time with the same algorithmic architecture [6, 7]. Advanced driver assistance systems (ADASs) have acquired substantial recognition in the recent history of technology, since numerous road accidents are induced primarily by tiredness and lack of awareness. Warning the driver of any danger which lies ahead on the road is vital for improving traffic safety and accident prevention. Thus, the assistance system's significant function is to detect the vehicles ahead of one's own by employing computer vision technologies, considering that the cost of an optical sensor like a CCD or CMOS is far lower than that of an active sensor like LASER or radar. Besides, optical sensors are more often used in a broader scope of applications, such as incident video recorders and lane-departure warning systems. We now discuss a particular situation on roads that is a combination of the causes mentioned earlier. Tailgating and trying to overtake vehicles on two-lane one-way roads, like typical Indian roads, present another level of danger. There is a risk of the overtaking vehicle crashing into an incoming vehicle (from the opposite direction) that the overtaking vehicle is unaware of. The driver's view is partially obstructed by the vehicle in front if it is a car; however, if the overtaking vehicle is preceded by a truck or something of the same category, the view is completely obstructed.
Fig. 1 Causes of accidents in India, 2019
The driver blindly assumes that there are no incoming vehicles and decides to overtake, completely ignoring the risk of an accident that he/she might encounter. Even though these situations appear to be obsolete in the case of multi-lane driveways, the roads of subcontinents like India are dominated by single-lane roads without dividers. Figure 2 presents a visual depiction of the same scenario [13]. For such roads, the threat posed by the situations mentioned above is ineluctable. These accidents prove to be fatal and need to be addressed with a solution. Samsung came up with its own solution to the blind overtake issue [14], naming it the "Samsung Safety Truck." The truck has a wireless camera mounted on its front to record the scene ahead, as shown in Fig. 3. This camera is attached to a screen wall consisting of four external monitors at the rear of the vehicle. The cameras give the drivers behind the truck a direct view of the road ahead of the vehicle. Samsung led the development of the prototype by supplying examples of large-format displays and testing with a local B2B client. To improve its field of view, the Samsung Safety Truck uses two built-in front cameras and a specially developed Ingematica transport software interface to record and relay a continuous video stream of the road ahead. This allows the trailing vehicles to gauge the passing opportunity more effectively and make more informed driving decisions. The stream is displayed via a high-quality monitor consisting of four Samsung OH46D video walls. The OHD Series video walls are IP56 accredited for quality results against demanding
Fig. 2 Blind overtake scenario
Fig. 3 Samsung safety truck (Image source https://news.samsung.com/global/the-safety-truckcould-revolutionize-road-safety)
environmental conditions and are built to be dustproof and waterproof, guaranteeing a clear image regardless of the road or temperature. Although it is a viable idea, the catch is that its usefulness is limited to two-lane roads and it is unnecessary on multi-lane highways. The cost has not yet been disclosed by the idea's owners, although it is not hard to predict that the expense of these devices would be on the pricier side for an ordinary vehicle owner. Thus, cost will be the biggest obstacle in introducing such a scheme, because 'an overpriced solution is not a solution at all.' Another concern is that a human driver can be easily confused by screens behind the truck. We naturally tend to get overwhelmed by screens; thus, when scanning the perimeter of the route, our eyes would inevitably be drawn to a screen with motion in it and might lose sight of the rest of the road, which would make it counterproductive and futile.
2 Methodology and Implementation For achieving the desired system, the whole process is demarcated into two parts— the input computational part and the output computational part. The first part is responsible for analyzing the traffic characteristics in front of the truck, and the second part is meant for relaying this pivotal information to the trailing motorists through peripherals.
Fig. 4 Temporal smoothing architecture
2.1 Input Computational Segment
In the first phase, the computational system comprises a Single-Shot Detector (SSD) model running continuously, detecting the vehicles and returning the bounding box for each detected vehicle. The relative velocity for each vehicle is calculated using the Contour Growth Rate algorithm described in the next subsection. Real-world motions are noisy and sporadic, so a temporal smoothing function is incorporated to reject the noisy outputs. The temporal smoothing architecture uses a four-element queue to save up to four relative velocity values in its memory. The summing element uses weighted recalls of the previous velocities and calculates the relative velocity for the present instance, as shown in Fig. 4. As a new value is pushed into the queue, the oldest stored value is dumped, and the other states are shifted.
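A minimal sketch of the four-element smoothing queue is shown below; the particular decaying weights are an assumption, since the paper does not list the weighting coefficients.

```python
from collections import deque

class TemporalSmoother:
    # Four-element queue: newest sample first, oldest dropped automatically.
    def __init__(self, weights=(0.4, 0.3, 0.2, 0.1)):  # weights assumed, not from the paper
        self.weights = weights
        self.queue = deque(maxlen=len(weights))

    def update(self, raw_velocity):
        self.queue.appendleft(raw_velocity)
        used = self.weights[:len(self.queue)]
        # Weighted recall of the stored velocities for the present instance
        return sum(w * v for w, v in zip(used, self.queue)) / sum(used)
```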
2.2 Contour Growth Rate
The algorithm utilizes the change in the SSD bounding box's height as a parameter to estimate the velocity with which the object is approaching. The idea considers the specific truck's state of motion as the inertial reference frame; hence, the velocity depicts the motion considering the truck as stationary. Each frame accounts for the change in the height of the bounding box. Observing the pattern analytically, the ratio of the change in height (Δh) to the height (h) gives a characteristic slope engendering the required relative velocity, as shown in Fig. 5 (relative because we consider the truck's inertial frame). Mathematically, it can be expressed as Eq. 1.
Fig. 5 Δh versus h relation graph

(h_{n+1} − h_n) / h_n ∝ relative velocity    (1)
The slope is unique for each speed level and is instantaneous. However, each capturing system may have different dimensions, and thus, the exact slope of the equation is dependent on the specific components being used. It is imperative to remember that the values will lie in both positive and negative axes since a vehicle receding away will output a negative relative velocity because of a negative slope.
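As an illustrative sketch of Eq. (1), the relative velocity can be estimated from two consecutive bounding-box heights; the calibration constant that converts the growth rate into a velocity-like value is camera-specific and is assumed here.

```python
def relative_velocity(h_prev, h_curr, slope_coefficient=1.0):
    # Per-frame contour growth rate (h_{n+1} - h_n) / h_n, scaled by a
    # camera-specific calibration constant (assumed value).
    if h_prev == 0:
        return 0.0
    return slope_coefficient * (h_curr - h_prev) / h_prev  # > 0 approaching, < 0 receding
```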
2.3 Output Computational Segment
This segment contains the primary hardware components responsible for adequately informing the motorists. To give a decent idea of what is in front of the truck, it is essential to encode the data into parameters that can be understood by the drivers. Focusing on that, we came up with a display board of RGB LEDs capable of showing different colors by varying the intensity of the individual colors, shown in Fig. 6 as an example. The blind overtake scenario is tricky, as it depends simultaneously on the relative velocity of the vehicle ahead and its distance from the truck. Hence, we developed a function that satisfies these requirements most closely. The conversion equation outputting the danger level can be mathematically represented by Eq. 2, with the activation function of Eq. 3.
Fig. 6 LED matrix board [15]
L = v + d, where d = height × coefficient    (2)

L_output = { 1, if L > 1;  L, if 0 < L < 1;  0, if L ≤ 0 }    (3)
The relative velocity parameter (v) can be denoted by Eq. 4:

v = { 1 / (1 + e^(−0.5x)), if x > 0;  1 / (1 + e^(−0.2x)), if x ≤ 0 }    (4)
This piecewise function ensures continuity on the number line and accurately distributes the danger level throughout the relative velocity plane. Also, the graph summarizes the implications of the function represented by the red line in Fig. 7.
Fig. 7 Relative velocity parameter function
Similarly, the distance parameter (d) is denoted by Eq. 5:

d = { 1 / log(0.1x − 10) − 0.1 log(x − 195), if x > 200;  1, if x ≤ 200 }    (5)
As shown in Fig. 8, any vehicle within the 200 m range poses a threat to an overtake maneuver and hence instigates the full danger level. Beyond that, the function elicits a decreasing danger level with increasing distance. The danger level can further be converted to a linear wavelength scale of colors ranging from 380 nm (λ_green) to 730 nm (λ_red) using Eq. 6. The range corresponds to the green–red spectrum of visible light. Using the wavelength as a parameter ensures continuity in the color space.
(6)
Finally, the wavelength to RGB values, compatible with the software system, can be mapped using the color map obtained from the work of Sumit et al. and shown in Fig. 9 [16].
Fig. 8 Distance parameter function
ByWalk: Unriddling Blind Overtake Scenario with Frugal Safety System
149
Fig. 9 RGB rendering chart
2.4 Architecture The three major software tools required to implement this project are Python, COM0COM, and Proteus. The architecture for the software implementation of this project is provided below in Fig. 10.
Fig. 10 Architecture for software implementation
150
S. Shaw et al.
A video file in mp4 format is fed as input to Python which uses a SSD model to detect objects in the video frames. The Python Scripts use this information to analyze the situation in front of the vehicle by calculating the distance and relative velocity of the vehicles in front. Python transmits this information serially to the Transmitter Virtual COM Port. The Transmitter and Receiver Virtual COM Ports must be created by COM0COM application beforehand. Once the Transmitter Virtual COM Port receives information from Python, it transmits the same data to Receiver Virtual COM Port which is supported by COM0COM application. Once the Receiver Virtual COM Port receives information, it is simulated as a Serial Port in Proteus. These data are sent serially from the Port to Arduino UNO. This microcontroller uses this information as input and produces an output which is sent to MAX7219 IC, thereby displaying the data in RGB LED matrix.
2.5 Position and Algorithm The LED matrix board is meant for the region near the number plate and rear bumper. The position of the board in such a region will not interfere with the remaining components and will be visible to the trailing drivers easily. Serial input/output common-cathode display driver configurations of microprocessors like MAX7219 IC are capable of handling such a task with LED matrix. The full flow of the algorithm is once again summarized into the flowchart in Fig. 11.
2.6 Vehicle-To-Vehicle Communication The recent advancements have opened new windows for realizing vehicle-to-vehicle (V2V) communication practically. Pereira et al. [17] presented their work advocating the use of a video transmission system over the IEEE 802.11p/WAVE communication technology. Their driving assistance system uses HTTP POST packets for data encapsulation and transmission. The FFserver allows multi-platform support, as even a browser is enough to watch the video stream at the receiving end. Their proposed algorithm even elicits assuring analytics in terms of transmission delays. It engenders an average delay of 1.22 and 2.44 ms in urban and highway scenarios, respectively. The overall architecture is shown in Fig. 12. Although the work seems to be implementable in our case, some real-time issues need to be handled first. The road will not be free of traffic, and under no circumstances, can we assume that there will only be one heavy vehicle blocking the driver’s sight. Hence, a prime need is to have a priority-based algorithm to select and accept the video stream. The available streams are sorted per their descending signal strength in dB and relay the closest vehicle’s video content. This inclusion makes the logical choice
Fig. 11 Flowchart of the full algorithm
Fig. 12 Vehicle-to-vehicle video stream service
viable since the truck/bus most proximate to the driver is most likely to block the sight. However, the driver can switch between vehicle streams with their voice command. It is an obvious choice as other means can prove to be distracting and eventually fatal for the driver. We also suggest using the vehicle’s number plate as the unique ID to know which video content the driver is accessing. In this way, the driver can exploit the V2V technology to overcome the blind overtake effectively.
3 Conclusion It can be concluded that a system needs to be developed that can assist the driver in different geographical conditions along with varying conditions of the road. It is known that overtaking is a complicated maneuver to perform on two-lane one-way roads. When it comes to Indian roads’ situations, it becomes challenging to achieve a safe overtaking maneuver provided the scenario we are discussing. Therefore, a system developed should assist the driver when to steer the vehicle for overtaking or avoid it while performing the blind overtake. It should present enough information to the driver about the viability of performing an overtake maneuver in a real-time scenario. This system should be independent of external infrastructure so that it can be used anywhere. Also, looking toward Indian roads and traffic conditions, a precise, accurate, and cheap system must be designed for these conditions. Autonomous driver assistance systems’ support to the driver in tricky driving situations is swiftly evolving from becoming a futuristic dream to being a conventional reality. The present correspondence introduced our approach to solving one of the leading causes of road fatalities: overtaking. While the system is very robust, it becomes more challenging during the night time due to inadequate light in the background that makes object detection less accurate. Besides, V2V communication between vehicles might not stream continuously with high quality at all times. The future versions can incorporate a more accurate SSD model for object detection to improve the accuracy of detecting objects. This will also improve the accuracy of the locality of things in the frames so that the bounding box coordinates do not change much with respect to time. Similarly, an object/vehicle tracker can be implemented to track the vehicles detected instead of running the SSD model for each frame, thereby increasing throughput. The system can even be merged with Sign Recognition systems to aid the driver in the road conditions.
References 1. Basic Road Statistics of India, Ministry of Road Transport and Highways (2016–17) 2. Accidental Deaths and Suicides in India, Nationals Crime Records Bureau (NCRB), (2019) 3. Kim SG, Kim JE, Yi K, Jung KH (2017) Detection and tracking of overtaking vehicle in Blind Spot area at night time. In: IEEE international conference on consumer electronics (ICCE), Las Vegas, NV, pp 47–48. https://doi.org/10.1109/ICCE.2017.7889224 4. Jung KH, Yi K (2015) Determination of moving direction by the pose transition of vehicle in blind spot area. In: Proceedings of IEEE international symposium on consumer electronics. Spain Madrid (2015) 5. Baek S-H, Kim H-S, Boo K-S (2014) A method for rear-side vehicle detection and tracking with vision system. J Korean Soc Precis Eng 31(3):233–241 6. Schamm T, von Carlowitz C, Z¨ollner JM (2010) On-road vehicle detection during dusk and at night. In: Proceedings of IEEE intelligent vehicles symposium, pp 418–423 7. Joung J-E, Kim H-K, Park J-H, Jung H-Y (2011) Night-time blind spot vehicle detection using visual property of head-lamp. IEMEK J Embed Sys Appl 6(5):311–317
8. Huang SS, Chen CJ, Hsiao PY, Fu LC (2004) On-board vision system for lane recognition and front-vehicle detection to enhance driver’s awareness. In: IEEE international conference on robotics and automation, proceedings. ICRA ‘04. New Orleans, LA, USA, vol 3, pp 2456–2461. https://doi.org/10.1109/ROBOT.2004.1307429 9. Sun Z, Bebis G, Miller R (2002) On-road vehicle detection using Gabor filters and support vector machines. In: 14th international conference on digital signal processing proceedings, vol 2, pp 1019–1022. https://doi.org/10.1109/ICDSP.2002.1028263 10. Ponsa D, Lopez A, Lumbreras F, Serrat J, Graf T (2005) 3D vehicle sensor supported sight. In: IEEE Proceedings intelligent transportation system, pp 1096–1101 11. Liu T, Zheng N, Zhao L, Cheng H, Learning based symmetric features selection for vehicle detection. In: 2005 IEEE intelligent vehicles symposium proceedings. Las Vegas, NV, USA, pp. 124–129. https://doi.org/10.1109/IVS.2005.1505089 12. Ponsa D, López A (2007) Cascade of classifiers for vehicle detection. In: Blanc-Talon J, Philips W, Popescu D, Scheunders P (eds) Advanced concepts for intelligent vision systems. ACIVS 2007. Lecture Notes in Computer Science, vol 4678. Springer, Berlin, Heidelberg. https://doi. org/10.1007/978-3-540-74607-2 13. Transport Accident Commission (TAC), Victoria State Government 14. News.samsung.com (2016) Samsung Presents First “Samsung Safety Truck” Prototype. [online] Available at: https://news.samsung.com/global/samsung-presents-first-samsung safety-truck-prototype [Accessed 7 Nov 2020] 15. Shop.pimoroni.com (2021) RGB LED Matrix Panel—Pimoroni. [online] Available at: https://shop.pimoroni.com/products/rgb-led-matrix-panel?variant=19321740999 [Accessed 6 Feb 2021] 16. Bhowmick S (2017) The RGB rendering of visible wavelength lights (2019 02 28 14 47 31 UTC). https://doi.org/10.13140/RG.2.2.14324.71040 17. Pereira J, Diaz-Cacho M, Sargento S, Zuquete A, Guardalben L, Luis M (2018) Vehicle-tovehicle real-time video transmission through IEEE 802.11p for Assisted-Driving. In: 2018 IEEE 87th vehicular technology conference (VTC Spring), Porto, pp. 1–6. https://doi.org/10. 1109/VTCSpring.2018.8417766
Levy Flight-Based Black Widow Optimization for Power Network Reconfiguration S. Dhivya
and R. Arul
Abstract Over the last few years, power loss minimization has been an essential task in the radial distribution system. Hence, in this paper, the Levy Flight-Based Black Widow Optimization (LFBWO) is designed to minimize real power losses by computing the optimal location of feeders. The proposed method combines the Levy Flight (LF) and Black Widow Optimization (BWO). The results showed that the recommended method is efficient compared with current methodologies. The suggested technique shows the feasibility of optimal feeder allocation problems. The performance of the BWO is enhanced with the help of the LF, which reduces the convergence problems in the method. The proposed procedure is implemented in MATLAB, and its performance is tested with the Indian 52-bus practical system. It is validated with different cases such as bus outage, line outage, and both bus and line outages. The suggested technique is compared with the artificial bee colony algorithm (ABC), flower pollination algorithm (FPA), and Black Widow Optimization (BWO) to validate its efficacy. Keywords Loss minimization · Radial distribution system · Levy Flight · Black widow optimization
1 Introduction Fossil fuel sources create environmental hazards, emissions of unwanted gases, which are also harmful to human beings. As an alternative, Distributed Generation (DG) is the ultimate solution to fulfill the requirements of electricity that is utilized in the power sector [1]. DG has many advantages, such as small-scale generation and costeffectiveness contrasted with central power plants. The renewable energy resources of DG are geothermal, photovoltaic (PV), hydrogen, wind turbine (WT) biogas, ethanol, biodiesel, and biomass, which are also eco-friendly sources [2]. Some new S. Dhivya · R. Arul (B) School of Electrical Engineering, Vellore Institute of Technology, Chennai, Tamil Nadu 600127, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 N. Subhashini et al. (eds.), Futuristic Communication and Network Technologies, Lecture Notes in Electrical Engineering 966, https://doi.org/10.1007/978-981-19-8338-2_12
algorithms are identified which are very helpful for the optimal network reconfiguration of the power networks. The recent approach is very effective and efficient that can easily manage and resolve the reconfiguration of the network along with the objective of the bringing fair voltage profile as well as the power loss minimization [3]. The main aim of this research study is to generate some results and focus on the power loss minimization because the engineers are facing troubles and challenges due to losses of energy in the power distribution network. So, it is needed to focus on the new algorithms that will focus on the minimization of the energy losses using the three proposed algorithms for the optimal network reconfiguration [4]. Furthermore, power distribution network delivers the variety of loads such as the industrial, residential as well as commercial which are generally subjected to the variations in the daily demands on a larger scale. This research is providing an optimal network reconfiguration in which different types of studies are involved and every study is providing results by using different algorithms [5]. Additionally, some analytical procedures are also available to detect the ideal position of feeders in the power networks. The analytical method was presented to obtain real power loss minimization [6]. This article presents new hybrid algorithm for performing network reconfiguration during different outage conditions. The remaining part of the study is arranged as follows. Section 2 elaborates prior studies on power loss minimization and metaheuristics in the radial distribution systems. Section 3 briefs the model of an objective function with power system constraints. Section 4 contains the proposed algorithm and process flow. The results and effects of the ideal positioning of feeders in the power system are described in the fifth section. The conclusion of the paper is found in the sixth part.
2 Literature Review The researchers developed many metaheuristic methods to minimize power loss by selecting optimal tie switch connections through reconfiguration in the radial distribution system (RDS). The problem of reconfiguring RDS is framed as mathematical programming, and it could be solved with the decomposition and coordination theories [7]. The complex network theory usage has got much importance as it is progressed [8]. The sub-graph indices [9] for rating the significance of network buses are proposed. It is proclaimed that the weighted power system reflects better node’s significance and functioning in existing methods. However, the power source relationship between the nodes is not counted in the power system with restoration techniques [10]. Power distribution network reconfiguration (PDNR) approaches had evolved significantly, from single objective to recent, multi-objective stochastic approaches [11]. It could be furnished with superfast emulators and cutting-edge visualization. PDNR is probably the most debated power system optimization issue as evidenced by the number of published articles on the subject over the previous three decades. The two major problems of PDNR techniques for RDS are the enormously vast combinative resolution space and the necessity of a highly rapid loss evaluation
methodology for repetitive and constant assessment of each design [12]. In addition, the progressive development of RDS, with the rapid popularity of DG, has been made. While PDNR methods were originally intended only for effective power loss minimization and load feeder balancing, they now serve various other goals. A bright vision of enhancing consistency and quality of power indices has been encircled in an environment of nonlinear approach. The flexible alternating current transmission system devices, innovative applications to ensure good quality and trustable power distribution, the introduction of advanced metaheuristic optimization techniques, and increasing conditions stochastically have contributed to this research area. Many metaheuristic algorithms have been reported to tackle many global optimization problems. Nature replicates like biological, physical, ethological, and different intelligence concepts have inspired many algorithms [13, 14]. Indeed, the widespread use of metaheuristic algorithms, particularly in engineering optimization challenges, can be traced to several factors, including elasticity, gradient-free mechanisms, and local avoidance goals based on basic ideas. Because of straightforward notions, metaheuristic algorithms inspired by nature are almost easy to understand. Levy Flights (LFs) may be regarded as a random walk. The proposed method is particularly targeted toward the environment of wireless sensor networks in conjunction with LF motions. The LF showed that in uncertain situations, it might improve the efficiency of resource searches. In reality, numerous naturally influenced events in the ecosystem can inspire LFs [15]. The Black Widow Optimization Method (BWO) is found as a new technique based on population and encouraged by the lifespan of black widow spiders. The BWO offers rapid computational speed and prevents local goals in the exploring stages. It is also worth noting that BWO can strike a balance between exploitation and exploration. Similarly, it can investigate a broad region to find the optimum global solution. BWO will be a suitable fit for a variety of optimization issues involving many local goals [16]. This article combines LF with BWO to obtain true optimal zone. It is then computed to solve network reconfiguration problem, and then, results are compared with artificial bee colony (ABC) [17] and flower pollination algorithm (FPA) [18] algorithms to validate its efficacy.
3 Problem Formulation

3.1 Power Loss Minimization

The initial goal is to reduce total real power loss by allocating optimal tie switch connections. The power loss minimization is formulated as follows in Eq. (1):

F = P_{Loss} = \sum_{I=1}^{N_{BR}} R_I \times \frac{P_I^2 + Q_I^2}{V_I^2}    (1)
where N_{BR} is the total number of branches of the distribution network, V_I is the voltage magnitude at the Ith bus, P_I is the active power load at the Ith bus, Q_I is the reactive power load at the Ith bus, and R_I is the resistance of the Ith branch, respectively. The optimal feeder connection is accomplished using the given algorithm, which must satisfy the following constraints.
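Before turning to the constraints, a minimal numerical illustration of Eq. (1) is sketched below. The branch resistances, power flows and bus voltages are made-up placeholder values, not data from the Indian 52-bus system.

```python
# Illustrative evaluation of Eq. (1): F = sum_I R_I * (P_I^2 + Q_I^2) / V_I^2
# All branch data below are placeholder values chosen for the example.
import numpy as np

R = np.array([0.05, 0.11, 0.08])   # resistance of each branch
P = np.array([1.2, 0.9, 0.4])      # active power load at each bus
Q = np.array([0.6, 0.3, 0.2])      # reactive power load at each bus
V = np.array([1.00, 0.98, 0.97])   # voltage magnitude at each bus

P_loss = np.sum(R * (P**2 + Q**2) / V**2)
print(f"Total real power loss F = {P_loss:.4f}")
```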
3.2 Power System Constraints The objective function of power loss is related to their constraints. These constraints should meet in the power system for maintaining stable operation [19]. The main constraints are formulated as follows.
3.2.1 Bus Voltage Limits
In the power system, the bus voltage must follow within their maximum and minimum limits, which are formulated as follows in Eq. (2):

V_{MIN,I} \le V_I \le V_{MAX,I}, \quad I = 1, \ldots, N_B    (2)

where V_{MAX,I} can be described as the maximum voltage of the Ith bus and V_{MIN,I} can be described as the minimum voltage of the Ith bus, respectively.
3.2.2 Real and Reactive Power Balance
In the power system, real and reactive powers are also managed within their limits, which are mathematically formulated as follows in Eqs. (3) and (4):

P_{SLACK} + \sum_{I=1}^{N_{DG}} P_{DG,I} = \sum_{J=1}^{N_B} P_{D,J} + \sum_{K=1}^{N_{BR}} P_{L,K}    (3)

Q_{SLACK} + \sum_{I=1}^{N_{DG}} Q_{DG,I} = \sum_{J=1}^{N_B} Q_{D,J} + \sum_{K=1}^{N_{BR}} Q_{L,K}    (4)
where P_{L,K} is the active power loss in the Kth branch, Q_{L,K} is the reactive power loss in the Kth branch, P_{D,J} is the active power load demand of the Jth bus, Q_{D,J} is the reactive power load demand of the Jth bus, P_{DG,I} denotes the active power output of the Ith DG unit, Q_{DG,I} denotes the reactive power output of the Ith DG unit, and P_{SLACK} is the active power supplied by the slack bus. In contrast, Q_{SLACK} is the reactive power provided by the slack bus. In the distribution network, the total
number of buses is N_B, while N_{DG} is the total number of DG units in the electricity system.
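A minimal sketch of how the voltage limits of Eq. (2) and the power-balance conditions of Eqs. (3)–(4) might be checked for a candidate configuration is given below; the function names, inputs and tolerance are assumptions chosen for illustration.

```python
# Constraint checks for Eqs. (2)-(4); all inputs are illustrative placeholders.
import numpy as np

def voltage_limits_ok(V, v_min=0.9, v_max=1.05):
    # Eq. (2): every bus voltage magnitude must lie between V_MIN and V_MAX
    return bool(np.all((V >= v_min) & (V <= v_max)))

def power_balance_ok(p_slack, p_dg, p_demand, p_loss, tol=1e-6):
    # Eq. (3)/(4): slack + total DG injection must equal total demand + total losses
    return abs((p_slack + np.sum(p_dg)) - (np.sum(p_demand) + np.sum(p_loss))) < tol

V = np.array([1.00, 0.97, 0.93])
print(voltage_limits_ok(V), power_balance_ok(2.5, [0.3, 0.2], [2.7, 0.2], [0.1]))
```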
4 Proposed Algorithm

The suggested LFBWO technique is used to determine the best feeder allocation for attaining goals such as power loss reduction. The LFBWO algorithm combines the LF and BWO algorithms into one. The Levy Flight is utilized to help the BWO overcome convergence problems. The Levy distribution function improves the efficiency of the BWO's operations. This section contains a detailed discussion of BWO, LF, and the suggested method.
4.1 Black Widow Optimization

The unique breeding behavior of black widow spiders may be used to describe this algorithm. The exclusive main stage of the BWO algorithm is known as cannibalism. Spiders are arthropods that breathe air and have eight legs as well as poisonous fangs [16]. The BWO algorithm starts with a population of spiders, each of which specifies a solution. These first spiders, in couples, attempt to create new offspring. According to the BWO algorithm, the female black widow devours the male during or after breeding. The female black widow then stores the sperm and produces eggs in sacs. The spiderlings emerge from the egg sacs after 11 days. In the BWO algorithm, the various answers to each problem are modeled after black widow spiders; each black widow spider is regarded as a potential solution. The following is the formula for the BWO's starting population.
4.2 Levy Flight-Based Black Widow Optimization (LFBWO) The LFBWO is made to understand the best feeder positions in the power system. It includes minimal power loss. In the BWO, the LF distribution is used to choose the appropriate feeder allocation efficiently. The BWO algorithm may be trapped into local zone of false convergence to provide the optimal solutions. The Levy Flight distribution function overcomes these convergence problems by applying it into the normal BWO algorithm. So, the convergence of the BWO algorithm is efficiently achieved with the help of Levy Flight distribution. Hence, high search efficiency is achieved by the Levy distribution function in the unidentified search space. The method is applied to create a new exploration once the mutation process is achieved. The convergence process is enhanced with the mutation process adapted with the Levy Flight distribution function. The Levy Flight distribution is utilized
with mutation, which is formulated as follows in Eq. (5):

Y = X_I + \alpha_0 \cdot (X_I - X_J) \oplus L(M)    (5)
where X I and X J can be considered as parents, L (M) is described as Levy function, ⊕ can be described as entry-wise multiplication, and α0 is described as a scaling factor. The best feeder placement is obtained when the suggested method is considered, which lowers power loss. Its performance is assessed in the next section.
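The sketch below illustrates the Levy-flight mutation of Eq. (5), drawing the Levy step with Mantegna's method; the exponent beta, the scaling factor alpha0 and the example parent vectors are assumptions chosen for illustration, not values taken from the paper.

```python
# Illustrative Levy-flight mutation of Eq. (5): Y = X_I + alpha0 * (X_I - X_J) (x) L(M)
# The Levy step is drawn with Mantegna's method; beta and alpha0 are assumed values.
import numpy as np
from math import gamma, sin, pi

def levy_step(dim, beta=1.5):
    sigma = (gamma(1 + beta) * sin(pi * beta / 2) /
             (gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2))) ** (1 / beta)
    u = np.random.normal(0.0, sigma, dim)
    v = np.random.normal(0.0, 1.0, dim)
    return u / np.abs(v) ** (1 / beta)

def levy_mutation(x_i, x_j, alpha0=0.01):
    # entry-wise product of the scaled parent difference with the Levy step
    return x_i + alpha0 * (x_i - x_j) * levy_step(len(x_i))

x_i = np.array([4.0, 7.0, 1.0])   # parent I (placeholder)
x_j = np.array([2.0, 6.0, 3.0])   # parent J (placeholder)
print(levy_mutation(x_i, x_j))
```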
5 Results and Discussion

The proposed technique will be validated and justified in the current section. It has been established in MATLAB, and its performance has been assessed. Indian 52 bus system [20] is used to authenticate the suggested approach. It is implemented on a RAM capacity of 32 GB and a 4 GHz Intel Core i7 system. This method is designed to reduce the power loss by optimal allocation of tie switches for feeder reconfiguration. The suggested methodology is assessed with three different cases, which are mentioned as follows:
• Case 1: bus outage
• Case 2: line outage
• Case 3: bus and line outages.
6 Performance Analysis of Indian 52 Bus System

The ability of the suggested technique is assessed using the Indian 52 bus system and three case studies. There are 52 buses in the test system, 51 branches feeding the total network load, and three feeders. The test system's power factor is 0.9 lagging. The base kV and kVA of the test system are 11 and 1000, respectively. The lowest and highest bus voltage magnitude limits of the test system are 0.9 p.u. and 1.05 p.u., correspondingly. The total reactive and active losses for the 52-bus practical distribution system without DG are 381.7 kVAr and 887.2 kW, respectively. The three DGs having real power sources of 10, 20, and 30 kW are placed at buses 15, 31, and 52. The battery energy storage system (ESS) of 30 kW real power is placed at bus 26. As shown in Fig. 1, the base case tie switch connections are represented by dotted lines. The optimal tie switch combinations for PDNR obtained using the different algorithms are shown in Table 1. The real power loss is reduced with the help of the proposed methodology, as visualized in Table 2. Initially, the objective function of the system is identified and minimized with the assistance of optimal tie switch connections in the power system. The proposed LFBWO is compared with the ABC, FPA, and
Fig. 1 Schematic diagram of practical Indian 52 bus system
BWO algorithm. The suggested approach achieves the greatest results for minimizing power loss based on the comparative study. Figure 2 depicts the power loss at each bus for bus outage condition (case 1). Meanwhile, Fig. 3 shows the power flow at each line of the Indian 52 bus system for line outage condition (case 2). The convergence characteristics of the Indian 52 bus system through all algorithms during both bus and line outages (case 3) are shown in Fig. 4. Comparatively, LFBWO converges quicker than all other techniques. The suggested method reduces real power loss while maintaining the system's typical characteristics. The proposed technique yielded the best optimum outcomes for lowering the objective function, according to the study.

Table 1 Tie switch combinations for Indian 52 bus system

Tie switches (TSs)   TS-1    TS-2    TS-3    TS-4    TS-5    TS-6
Base case            33–47   32–35   35–43   20–25   6–16    8–11
ABC                  18–47   45–33   34–13   44–15   42–6    11–19
FPA                  11–52   47–10   18–25   5–15    13–28   37–26
BWO                  46–10   36–16   31–5    6–40    13–25   8–51
LFBWO                29–6    8–25    11–9    31–10   42–15   33–20
Table 2 Analysis of real power loss in Indian 52 bus system

Performance             Case 1       Case 2       Case 3
Outage conditions       Bus 14       Line 17      Bus 14 and line 17
Real power loss (kW)
Base case               408.7435     433.1093     407.1257
Best        ABC         403.3059     427.9742     401.6888
            FPA         403.2534     427.9524     401.543
            BWO         402.8933     427.8438     401.2764
            LFBWO       402.7756     427.4351     401.1232
Worst       ABC         403.8846     428.6163     402.2675
            FPA         403.821      428.5152     402.2514
            BWO         403.7846     428.5163     402.1675
            LFBWO       403.6658     428.4531     402.0076
Mean        ABC         403.4506     428.2615     401.8335
            FPA         403.4293     428.123      401.6756
            BWO         403.3023     428.0792     401.6854
            LFBWO       403.1245     428.0542     401.4361
Standard    ABC         0.2571       0.2724       0.2571
deviation   FPA         0.3142       0.1243       0.2714
            BWO         0.3852       0.3291       0.3850
            LFBWO       0.0123       0.0453       0.0432
Fig. 2 Power loss at each bus for case 1
Fig. 3 Power flow at each line for case 2
Fig. 4 Convergence chart of different algorithms for case 3
7 Conclusion The hybrid LFBWO method is proposed to ensure stable operation by considering real power loss reduction. The single objective function is applied to analyze the best configuration of RDS. The proposed method’s performance has been evaluated using the Indian 52 practical system. Different conditions are verified in each network, such as bus outage, line outage, both bus and line outages. LFBWO is validated by doing a performance and comparative study on the system’s power loss. The proposed
technique yielded the best outcomes in feeder allocation on the RDS. The efficacy of the LFBWO is also supported by a comparison of the results from the proposed and other methods. Furthermore, it readily converges to choose the best tie switch connections. In future, the recommended approach will be integrated with optimal DG and ESS placement for this reconfigured RDS. Acknowledgements The authors are very much thankful to the authorities of Vellore Institute of Technology, Chennai, for providing all facilities to carry out this research work.
References 1. Angalaeswari S, Sanjeevikumar P, Jamuna K, Leonowicz Z (2020) Hybrid PIPSO-SQP algorithm for real power loss minimization in radial distribution systems with optimal placement of distributed generation. Sustainability 12(14):5787 2. Montoya OD, Molina-Cabrera A, Chamorro HR, Alvarado-Barrios L, Rivas-Trujillo E (2021) A hybrid approach based on SOCP and the discrete version of the SCA for optimal placement and sizing DGs in AC distribution networks. Electronics 10(1):26 3. Atteya II, Ashour H, Fahmi N, Strickland D (2017) Radial distribution network reconfiguration for power losses reduction using a modified particle swarm optimisation. CIRED-Open Access Proc J 2505–2508 4. Pegado R, Ñaupari Z, Molina Y, Castillo C (2019) Radial distribution network reconfiguration for power losses reduction based on improved selective BPSO. Electric Power Syst Res 206–213 5. Fathabadi H (2016) Power distribution network reconfiguration for power loss minimization using novel dynamic fuzzy c-means (dFCM) clustering based ANN approach. Int J of Electric Power Energy Syst 96–107 6. Samala RK, Mercy Rosalina K (2021) Optimal allocation of multiple photo-voltaic and/or wind-turbine based distributed generations in radial distribution system using hybrid technique with fuzzy logic controller. J Electric Eng Technol 16(1):101–113 7. Ferdavani AK, Zin ABM, Khairuddin AB, Naeini MM (2011) A review on reconfiguration of radial distribution networks through heuristic methods. In: Proceedings of ICMSAO, Kuala Lumpur, Malaysia, pp 1–5 8. Swarnkar A, Gupta N, Niazi KR (2012) Distribution network reconfiguration using population based AI techniques: a comparative analysis. In: Proceedings of IEEE PES- GM, San Diego, CA, USA, pp 1–6 9. Lavorato M, Franco JF, Rider MJ, Romero R (2012) Imposing radiality constraints in distribution system optimization problems. IEEE Trans Power Syst 27(1):172–180 10. Guedes LSM, Lisboa AC, Vieira DAG, Saldanha RR (2013) A multiobjective heuristic for reconfiguration of the electrical radial network. IEEE Trans Power Deliv 28(1):311–319 11. Karthik N, Parvathy AK, Arul R (2019) A review of optimal operation of microgrids. Indonesian J Electric Eng Comput Sci 14(1):1–8 12. Abadei C, Kavasseri R (2011) Efficient network reconfiguration using minimum cost maximum flow based branch exchanges and random walks based loss estimations. IEEE Trans Power Syst 26(1):30–37 13. Houssein EH, Saad MR, Hussain K, Zhu W, Shaban H, Hassaballah M (2020) Optimal sink node placement in large scale wireless sensor networks based on harris’ hawk optimization algorithm. IEEE Access 8:19381–19397 14. Çelik E (2020) A powerful variant of symbiotic organisms search algorithm for global optimization. Eng Appl Artif Intell 87:103294
15. Houssein EH, Saad MR, Hashim FA, Shaban H, Hassaballah M (2020) Lévy flight distribution: a new metaheuristic algorithm for solving engineering optimization problems. Eng Appl Artific Intell 94:103731. https://doi.org/10.1016/j.engappai.2020.103731 16. Hayyolalam V, Kazem AAP (2020) Black widow optimization algorithm: a novel metaheuristic approach for solving engineering optimization problems. Eng Appl Artif Intell 87:103249. https://doi.org/10.1016/j.engappai.2019.103249 17. Ganesh S (2014) Network reconfiguration of distribution system using artificial bee colony algorithm’. World academy of science, engineering and technology, open science index 86. Int J Electric Comput Eng 8(2):396–402 18. Mahendran G, Govindaraju C (2020) Flower pollination algorithm for distribution system phase balancing considering variable demand. Microprocess Microsyst 74:103008 19. Raut U, Mishra S (2020) A new Pareto multi-objective sine cosine algorithm for performance enhancement of radial distribution network by optimal allocation of distributed generators. Evolution Intell 1–22 20. Sabarinath G, Manohar TG (2019) Application of bird swarm algorithm for allocation of distributed generation in an Indian practical distribution network. IJ Intell Syst Appl 7:54–61. https://doi.org/10.5815/ijisa.2019.07.06
Exploratory Spatial Data Analysis (ESDA) Based on Geolocational Area P. Baby Shamini , Shubham Trivedi, K. S. Shriram, R. R. Selva Rishi, and D. Sayyee Sabarish
Abstract In our daily life, we have websites like 99acres, Magicbricks, etc., which help us to find rooms or flats on rent in any city, but they do not give an option to find accommodation according to our preferences, that is, food and budget. In this model, we help students to find the best area in any city by classifying their choices, such as food and budget. First, we gather the datasets; then, we clean the datasets according to our needs. Once we have our data, we need to understand it. The best way to understand data is by visualizing it via graphs. Graphs help us to present more precise information, which makes it easy to scan and understand. After visualizing the data, we run K-Means Clustering, which helps by grouping the locations, and we find the best K value for our population. From the Foursquare API, we get all the geolocational data needed to find accommodation for these people. Finally, we run K-Means Clustering on the data to plot the final results on the map. Keywords Data mining · Clustering · K-Means
P. B. Shamini (B) RMK Engineering College, Kavaraipettai, India e-mail: [email protected] S. Trivedi · K. S. Shriram · R. R. S. Rishi · D. S. Sabarish Rajalakshmi Institute of Technology, Chembarambakkam, India e-mail: [email protected] K. S. Shriram e-mail: [email protected] R. R. S. Rishi e-mail: [email protected] D. S. Sabarish e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 N. Subhashini et al. (eds.), Futuristic Communication and Network Technologies, Lecture Notes in Electrical Engineering 966, https://doi.org/10.1007/978-981-19-8338-2_13
1 Introduction People are finding it difficult to adjust to the current food environment. If a person is away from home and family members, he or she may not have the time to prepare food and then consume it because of a lack of time. So, in essence, that individual has no choice than to dine outdoors, yet he is unable to eat outside food for an extended period of time. As a result, if one is accustomed to eating home-cooked meals every day, it is not unusual to desire to treat oneself to a nice lunch out every now and then for social or recreational reasons. Whatever the case, the food that one consumes is a vital component of one’s lifestyle, regardless of where one lives. Now, assume that someone has relocated to a different location. He or she will have a set of likes and tastes that are unique to them. That person, as well as food service providers, would benefit from living in close proximity to their favored outlets. Since a result, if the site is close by, it will be more convenient for both the individual and the restaurant, as it will result in increased sales while also saving time. Aside from food delivery applications, this will also assist managers of restaurant chains and hotels in obtaining information about the locations where they wish to construct a restaurant or café. For example, if a restaurant manager is already familiar with the area where sales will be strong, he or she would prefer to open in a hotspot location where sales will be concentrated, ensuring that people will go and come to the restaurant in a short period of time and that more customers can be served. It will also assist individuals in locating the hotspot region where they can find their favorite outlets, hence increasing the likelihood that they will choose to stay in that location. For example, if a person is aware of the location of a restaurant serving his preferred cuisine, it will be much easier for him to remain at that location without having to search for food when he returns to his hotel. In this assignment, it will also be discussed how a person might locate a place that fits his or her budget, culinary preferences, and other factors.
2 Related Work The most important phase in a paper is a review of the literature and technology used. Prior to developing the proposed model, it is critical to consider all relevant factors such as time, budget, and purpose. The next step is to determine which software and languages will be used to deploy the model. Once all of these factors have been considered, programmers will begin developing the model using external resources. Professional programmers, diverse publications, and well-known websites all provide this external assistance. In the software development process, a literature survey is an integral part of each project. Once these items have been satisfied and thoroughly surveyed, the next step is to look into the software specifications in the respective system, such as what operating system is required for the proposed model
project and software that is required to move on to the next step, which is to deploy the tools and methods. Li [1], This work investigates that the results of indoor radio propagation are used to develop a system for evaluating the performance of super-resolution. The results of SR techniques are compared to the results of traditional TOA approaches. Diversification approaches on super-resolution technique performance are assessed. Bicheron [2], Multi-year global land cover maps are now possible with this project’s new 300-m spatial resolution service, which makes use of FRS data gathered by the MERIS sensor on the ENVISAT satellite. GlobCover orthorectified products need to be geolocated accurately because they are integrated into a single dataset. Orthorectification preprocessing chain is discussed in this article. Wang [3], In order to extract and deduce traffic jam information from these trajectories, we design techniques. A road network is created by matching the paths that have been cleaned. Traffic jam propagation graphs are a way of concatenating events that are both spatially and temporally connected. It is possible to use these graphs to describe a traffic gridlock at a high level, as well as how it spreads over time and space Visual exploration and analysis of traffic conditions in a major city are made possible by our system, which enables several views at three different levels: at the level of propagation graphs, at the level of road segments, and at the level of the entire city. Li [4], The enormous expansion in volume, velocity, and variety of social media data has sparked interest in leveraging it to guide classic remote sensing image retrieval and information extraction activities. They compare and contrast the similarities and differences between localization. Then, we show how to extract information from big remote sensing data repositories using social media data, despite the fact that the examples offered show tremendous potential for integrating localization and spatial technologies. Nugent [5], The program’s objectives are to prepare adolescents for the workplace of the twenty-first century provide the opportunities to learn STEM ideas. The NSFfunded project underwent thorough investigation and evaluation during its three-year duration. In the majority of research, which is descriptive in nature, this work employs quantitative approaches that include comparison groups and post hoc analyses. Zaniewicz [6], Mobile devices are becoming increasingly used in a variety of fields, including maritime navigation. The paper discusses the potential applications of sensors in navigation systems. The definition of a data model is followed by data examination and sensor availability in this field. Following that, many options for integrating spatial data into the system are discussed, including the standard GIS strategy of using geoinformatic web services. Zignani [7], In this research, we examine a small number of GPS-based traces in order to infer patterns of human motion. Using GPS data, we present a clustering method for extracting the most important points of interest, referred to as geolocations. A statistical examination yields the most appropriate distributions of distances covered by persons within a geolocation and between geolocations, as well as the
length of time spent pausing between each location. Last but not least, we investigate the elements that influence people’s decisions about where to go next in their movement.
3 Proposed Model

The proposed system follows an architecture consisting of the following modules: collecting the data, cleaning and visualizing the data, running K-Means Clustering on the data, getting geolocation data from Foursquare, and plotting the result on the map. We use Postman to get the geolocation data. K-Means Clustering: K-Means is a method of vector quantization that aims to partition 'n' observations into k clusters in which each observation belongs to the cluster with the nearest mean (cluster center), serving as a prototype of the cluster. Foursquare API: The Foursquare Places API helps us to build location-based experiences with different locations, users, photos, and check-ins. Postman: It is a popular API client that makes it easy for developers to create, share, and test APIs. This is done by allowing users to create and save simple and complex HTTP/s requests, as well as read their responses. The result is more effective and less tedious work. • Figure 1 illustrates the proposed work of our system. Collection of data: As we know, all people have different tastes in food, so on that basis, we gather all the information from the internet and import that information in tabular form in Python. Cleaning and visualizing the data: After getting the data, we clean the data by removing the information which is not relevant, and after cleaning the data, we use graphs to visualize the data because graphs help us to understand the data easily instead of reading thousands of rows of data. We import several Python libraries like matplotlib, seaborn, geopandas, etc. K-Means Clustering: K-Means Clustering helps in grouping the locations. For example, an area with a high number of shops nearby will be marked "Amenity Rich", while an area with a smaller number of shops will be marked "Amenity Poor". Similar locations will be clustered together. Get geolocation data: A person can set up a query to check for residential locations in a fixed radius around a point of their choosing. Plot the clusters on the map: Finally, we run K-Means Clustering on the data and plot the final results on a map, as sketched below. Here, we apply K-Means Clustering on the dataset of the chosen area, which helps us find the best area for each population.
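A minimal sketch of the clustering module described above, assuming a pandas DataFrame that holds the latitude, longitude and amenity counts collected for candidate locations; the column names, sample values and the choice of k are assumptions for illustration, not the actual project data.

```python
# Illustrative K-Means grouping of candidate locations; columns and k are assumed.
import pandas as pd
from sklearn.cluster import KMeans

df = pd.DataFrame({
    "lat":         [13.04, 13.05, 12.99, 13.10],
    "lon":         [80.17, 80.21, 80.18, 80.29],
    "restaurants": [12, 3, 25, 7],      # nearby outlets counted from the API response
    "avg_rent":    [9000, 6500, 12000, 7000],
})

# Cluster on the amenity/budget features; similar locations fall in the same group.
features = df[["restaurants", "avg_rent"]]
df["cluster"] = KMeans(n_clusters=2, n_init=10, random_state=42).fit_predict(features)
print(df)   # e.g. amenity-rich vs amenity-poor groups
```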
3.1 Tools Used This module is designed to help incoming students in any city to live with all the facilities available nearby, most importantly food choices.
Fig. 1 A processing of the dataset
Software used:
• Anaconda
• Jupyter Notebook
System requirements:
A. Hardware requirements:
• Hard disk: 20 GB
• RAM: 2 GB, processor speed: 1.2 GHz, processor: Core i3, i5, i7
B. Software requirements:
• Operating system: Windows 10
• Languages: Python
4 Results and Discussion

The project created a new framework as a solution and enhances the information based on user expectation. This proposed model is intended to discover the best housing for a person based on their preferences such as budget, cuisine preferences, location, and other factors. It assists in filtering the budget, high or low, based on the number of people in the same location. Many hotel and café managers and owners may consider opening a location near such a hotspot, where they would be able to grow their sales. After we get our data, we must analyze them. The best method to comprehend data is to represent it using graphs. Graphs allow us to display data in a more compact and exact way, making it easier to read information quickly and spot trends. We run K-Means Clustering after viewing the data, which aids by grouping the locations. The geolocational data are collected from the Foursquare API, and we find the best K value for our demographic, as illustrated in the sketch below. Finally, we use K-Means Clustering to visualize the final results on a map (Figs. 2, 3, 4, 5 and 6).
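One common way to choose the "best K value" mentioned above is the elbow method: run K-Means for several candidate K values and pick the point where the within-cluster inertia stops dropping sharply. The feature values below are synthetic placeholders used only to make the sketch runnable.

```python
# Elbow-method sketch for choosing K: inspect inertia over a range of cluster counts.
import numpy as np
from sklearn.cluster import KMeans

features = np.array([[12, 9000], [3, 6500], [25, 12000], [7, 7000],
                     [20, 11000], [5, 6800], [15, 9500], [2, 6000]])
for k in range(1, 6):
    model = KMeans(n_clusters=k, n_init=10, random_state=42).fit(features)
    print(k, round(model.inertia_, 2))
# Pick the K after which the inertia curve flattens out (the "elbow").
```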
Fig. 2 Dataset after cleaning
Fig. 3 Dataset in graph which helps us to visualize and understand data easily
Fig. 4 We have run K-Means Clustering on the dataset to find the clusters
Fig. 5 We have marked location on map with the help of Foursquare API and Postman
Fig. 6 After marking all the locations on map, again we have run K-Means Clustering on map to see the final result
References 1. Li X, Pahlavan K (2004) Super-resolution TOA estimation with diversity for indoor geolocation. IEEE Trans Wireless Commun 224–234 2. Bicheron P, Amberg V, Bourg L, Petit D, Huc M, Miras B, Brockmann C, Hagolle O, Delwart S, Ranera F, Leroy M, Arino O (2011) Geolocation assessment of MERIS GlobCover orthorectified products. IEEE Trans Geosci Remote Sens 49:2972–2982 3. Wang Z, Lu M, Yuan X, Zhang J, Van De Wetering H (2013) Visual traffic jam analysis based on trajectory data. IEEE Trans Visual Comput Graph 19:2159–2168 4. Li J, Benediktsson JA, Zhang B, Yang T, Plaza A (2017) Spatial technology and social media in remote sensing: a survey. Proc IEEE 105:1855–1864 5. Nugent G, Barker B, Grandgenett N, Adamchuk V (2009) The use of digital manipulatives in k-12: robotics, GPS/GIS and programming. In: 39th IEEE frontiers in education conference, 1–6 6. Zaniewicz G, Kazimierski W, Bodus-Olkowska I (2016) Integration of spatial data from external sensors in the mobile navigation system for inland shipping. Baltic Geodetic Congress (BGC Geomatics), 165–170 7. Zignani M, Gaito S (2010) Extracting human mobility patterns from gps-based traces. IFIP Wireless Days 1–5
Modified Hill Cipher with Invertible Key Matrix Using Radix 64 Conversion A. Ashok Kumar, S. Kiran, and D. Sandeep Reddy
Abstract Cryptography is one of the most important technological areas, providing security to data by using various cryptographic algorithms. The Hill cipher (HC) is a well-known symmetric encryption algorithm using linear matrix transformation. In spite of the simplicity of the Hill cipher algorithm, it has several advantages, such as masking the letter frequencies of the plaintext and high throughput. The traditional Hill cipher faces a serious setback due to its vulnerability against known plaintext–ciphertext attacks. So, to enhance security, a new variant of the Hill cipher method is proposed. The proposed algorithm develops in three stages: firstly, the plaintext is converted into 8-bit binary form; in the second stage, the last two bits from the MSB positions are removed, as they are 0 in the MSB and 1 in the MSB-1 position, respectively, which reduces the size of the data, and Radix 64 conversion is then applied to each row; in the last stage, the traditional Hill cipher algorithm (modulo 64) is performed with an invertible matrix as the key, and this produces the ciphertext. The proposed technique reduces the size of the ciphertext data after encryption. The security analysis of the modified algorithm using the avalanche effect shows a three-fold increase compared to the traditional Hill cipher algorithm. Further, the dependence of the performance of the modified Hill cipher algorithm on data size is also addressed, and it is found that the encryption time increases with an increase in data size. Keywords Information security · Radix 64 conversion · Invertible matrix · Hill cipher · Cryptography · The CIA triad
A. A. Kumar Physics, YSR Engineering College of YVU, Proddatur, AP, India S. Kiran (B) Department of CSE, YSR Engineering College of YVU, Proddatur, AP, India e-mail: [email protected] D. S. Reddy ISSE, NIT Jamshedpur, Jharkhand 831014, India © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 N. Subhashini et al. (eds.), Futuristic Communication and Network Technologies, Lecture Notes in Electrical Engineering 966, https://doi.org/10.1007/978-981-19-8338-2_14
1 Introduction There is rapid growth in the economy of the world due to digitization in every field from e-papers to Bitcoin transactions with the help of the Internet [1]. Because of digital communication, there is a rapid increase in usage of the Internet from the past few decades. Data security is the primary objective in the processing of digital data [2, 3]. To provide security for the data, the data are to be transmitted through a secure medium. This can be achieved by using suitable cryptographic algorithms [1–4]. Cryptography is a branch of science and technology which uses mathematical algorithms to encrypt and decrypt information [5]. The main aim of using different algorithms in cryptography is to enable us to provide security to the sensitive information, which preferably transmits in insecure channels like Internet. The data cannot be protected from being accessed by attackers instead the data can be made unreadable during transmission, and only the intended recipient can reconvert the unreadable to a readable form [4]. The process of converting the plaintext into an unreadable format (ciphertext) using any cryptographic algorithm is known as encryption. The ciphertext which can be decoded back into the original message is known as decryption [5]. Along with the cryptographic algorithm, a key may be used to encode or decode the plaintext; if the same key is used for both encryption and decryption, then that algorithm is known as the symmetric-key algorithm. If not, then it is known as the asymmetric-key algorithm [4]. The dependence of cryptographic protection algorithms of a system against attacks and unauthorized user penetration depends on many factors. Some of the important factors includes the strengthening of the keys and effective management of its associated protocols. It is also essential to develop algorithms such as secure key generation, storage, distribution, use, and destruction to enhance the protection of the keys used in the encryption process. Several effective symmetric key algorithms have been developed over a past decade which include Blowfish, DES, RC5, AES, RC6, etc. One method that is often used is a Hill cipher algorithm which refers to symmetric cryptosystems which largely relies on matrix operations [6]. Different approaches have been developed to produce different variants of Hill cipher algorithms to strengthen its security mechanisms. Several other specific matrices, such as Vandermonde matrix [7] and orthogonal matrix [8], have been proposed as key matrices. The applications of Hill cipher are not only limited to cryptography but can also be extended to encrypt biometric signals [9]. The use of matrices in cryptography is started with the invention of the Hill cipher algorithm, which is a polygraphic substitution cipher based on linear algebra [3, 10]. Each letter is represented by a number modulo 26. Often, the simple scheme A = 0, B = 1, …, Z = 25 is used, but this is not an essential feature of the cipher. The key in this algorithm is a n*n non-singular matrix [i.e., det(matrix)! = 0], then only the inverse for the matrix exists. Before that, cryptographic algorithm is using the same key for encryption and decryption, but in the case of the Hill cipher algorithm, same key is shared with the recipient, but for decryption, the inverse matrix of the key is used. The data that are transferred (irrespective of medium) travel in the form
of bits, not in the same way as they are sent. For this reason, we will be mostly concentrating on the bit operations in this paper. The matrix used for the encryption method is called the encoding matrix, and the matrix used for decryption is called the decoding matrix. An extension to the Hill cipher is provided in this paper which helps in reducing the data to be shared when compared with the data given. This is possible with the help of Radix 64 conversion, where two bits are removed from every 8 bits to reduce the data; the two bits are 0 in the MSB position and 1 in the MSB-1 position, which is observed in all the cases considered; if not, a possible alternative is chosen to correct this technique.
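A small sketch of the bit observation this scheme relies on: for characters with ASCII codes 64–127 (letters, '_', '|', '\'), the two most significant bits of the 8-bit code are always '01', so they can be dropped and later restored without loss. The sample string below is only an illustration.

```python
# For ASCII codes 64-127 the 8-bit pattern always starts with '01',
# so the two leading bits can be dropped (8 bits -> 6 bits) and added back later.
for ch in "Hill_CIPHER|\\z":
    bits = format(ord(ch), "08b")
    assert bits[:2] == "01", f"{ch!r} falls outside the assumed range"
    print(ch, bits, "->", bits[2:])
```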
2 Literature Survey The main limitation of the traditional Hill cipher algorithm is that it is prone to a known-plaintext attack. If the attacker has distinct pairs of plaintext and ciphertext, then the key value can be retrieved by solving the plaintext and ciphertext. Amit Kumar Mandle et al. [1] and Maheswari et al. [5] and other authors introduced various variants of the Hill cipher algorithm for encryption and decryption of messages. But, they have a limitation as it works only for the alphabets and space but not for other characters. For encryption and decryption, no single algorithm is sufficient to provide security. As researchers are working on it to provide greater security with the inclusion of different methods by modifying the Hill cipher algorithm. Aaref et al. [11] proposed a new hybrid secured approach of ciphering by using substitution followed by transposition cipher methods. They reported that the proposed method (Rail Hill) is highly secured and difficult to break and acts as a bridge between classical and modern ciphers. In this hybrid approach, they designed Hill cipher using substitution cipher algorithm and transposition cipher algorithm is used in rail fence. After evaluating the execution time, they found that the proposed algorithm is more secured and difficult to break than tradition substitution and transposition algorithms. Bakr et al. [12] proposed a modification on the Elliptic Curve Cryptography. This modification shows lesser computations’ time and further improves the performances of security. Lot of groups worked on modification of Hill cipher using matrix operations such as non-invertible key matrix [13] where the users do not have difficulty to identify inversion of a key matrix when the key matrix is invertible for decryption process, used rectangular matrix key is created using Playfair Cipher algorithm [14] to improve the security of the Hill cipher algorithm, genetic algorithms [15] have also been used to evaluate the correct key both for encryption and decryption processes of the Hill cipher algorithm, modified S-Box [16] is also utilized in the encryption process of Hill cipher algorithm to enhance the security, similarly P-box and M-Box techniques [17] are used to enhance the security issues of Hill cipher algorithm like avalanche effect and also demonstrated the data dependence of the algorithm, a block cipher is produced by introducing interweaving and iteration functions [18] because of which the plaintext has undergone several transformations before it has become the
ciphertext which significantly enhances the security, efficient methods are generated to generate self-invertible matrix [19] for Hill cipher algorithm where the methods are less computational complexity, a known plaintext attack of Hill cipher is suppressed by sharing a prime circulant key as secret key and non-singular matrix ‘G’ as public key [20], of which the determinant of GC gives 0 which generates infinite solution, and hence, security of this algorithm is enhanced.
3 Hill Cipher Algorithm

The Hill cipher algorithm is a polygraphic substitution cipher based on linear algebra [21]. It was the first polygraphic algorithm that converts three or more characters of the plaintext at a time [3, 10]. In the traditional Hill cipher algorithm, each character is assigned an integral value as A = 1, B = 2, …, Z = 26, Space = 0 [5]. The existing algorithm is explained in the successive steps defined below. Firstly, the plaintext is divided into column matrices of size m*1, where m is the size of the square invertible key matrix (m*m). The second step is to convert the plaintext into its respective integer values. This method is applicable only for the specified 27 characters and generates the ciphertext in the same range in which the plaintext is defined. To achieve the resulting values in the same range, the existing algorithm uses modular division (mod 27); here, 27 is the total number of characters in the specified range. Then, construct an m*m square invertible matrix. For a square matrix to be invertible, it must satisfy the condition det(matrix) ≠ 0. The key must be an invertible matrix such that the product of the key and its inverse is an identity matrix (K * K⁻¹ = I). Now, perform matrix multiplication of the key with each column matrix one at a time and apply mod 27 to the intermediate result (the result generated after matrix multiplication). Convert the resulting values into text form; the generated unreadable text is the ciphertext. Encryption can be mathematically defined as C = K ∗ P (mod 27), where C is the ciphertext formed, K is the key matrix, and P is the plaintext matrix. Decryption can be mathematically defined as P = I ∗ C (mod 27), where P is the plaintext formed,
I is the inverse of the key matrix, I = K⁻¹ = Adj(K)/det(K), and C is the ciphertext.

Table 1 Conversion table of character to a respective integer value

A  1    J  10    S  19
B  2    K  11    T  20
C  3    L  12    U  21
D  4    M  13    V  22
E  5    N  14    W  23
F  6    O  15    X  24
G  7    P  16    Y  25
H  8    Q  17    Z  26
I  9    R  18    SPACE  0
Explanation Let us consider the plaintext as: 'HILL CIPHER ALGORITHM'.
• Convert the plaintext into integer values with the reference of Table 1:

H  I  L  L  (space)  C  I  P  H  E  R  (space)  A  L  G  O  R  I  T  H  M
8  9  12 12    0     3  9  16 8  5  18    0     1  12 7  15 18 9  20 8  13
• Convert the plaintext into columnar matrices of size 3*1:

P1 = [8, 9, 12]^T, P2 = [12, 0, 3]^T, P3 = [9, 16, 8]^T, P4 = [5, 18, 0]^T, P5 = [1, 12, 7]^T, P6 = [15, 18, 9]^T, P7 = [20, 8, 13]^T

• Construct a key matrix:

K = [ 4  −2  3 ]
    [ 8  −3  5 ]
    [ 7  −2  4 ]
• Now, multiply each columnar matrix with the key value; the resulting value is the intermediate result. Apply mod 27 to the intermediate result and convert the numerical values to text format with the reference of Table 1 (values shown after mod 27):

C1 = K ∗ P1 = [50, 97, 86]^T (mod 27) = [23, 16, 5]^T = (W, P, E)
C2 = K ∗ P2 = [3, 3, 15]^T = (C, C, O), C3 = K ∗ P3 = [1, 10, 9]^T = (A, J, I), C4 = K ∗ P4 = [11, 13, 26]^T = (K, M, Z)
C5 = K ∗ P5 = [1, 7, 11]^T = (A, G, K), C6 = K ∗ P6 = [24, 3, 24]^T = (X, C, X), C7 = K ∗ P7 = [22, 12, 14]^T = (V, L, N)

• The ciphertext formed is 'WPECCOAJIKMZAGKXCXVLN'.
• The reverse of the encryption technique is the decryption technique.
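As a quick numeric check of the worked example above, the classical mod-27 computation can be reproduced with a few lines of NumPy; this is only an illustration using the character mapping of Table 1, not the authors' implementation.

```python
# Reproduce the worked example: C_i = K * P_i (mod 27), letters per Table 1 (A=1..Z=26, 0=space).
import numpy as np

K = np.array([[4, -2, 3], [8, -3, 5], [7, -2, 4]])
blocks = [[8, 9, 12], [12, 0, 3], [9, 16, 8], [5, 18, 0],
          [1, 12, 7], [15, 18, 9], [20, 8, 13]]
alphabet = " ABCDEFGHIJKLMNOPQRSTUVWXYZ"          # index 0 is the space character

cipher = ""
for p in blocks:
    c = (K @ np.array(p)) % 27
    cipher += "".join(alphabet[v] for v in c)
print(cipher)                                      # -> WPECCOAJIKMZAGKXCXVLN
```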
This traditional Hill cipher alone is not secure in its respective domain. So, to attain greater complexity of the ciphertext, different techniques are combined with the existing technique. In the proposed technique, a two-stage hybrid approach is utilised, which includes removing the common consecutive bit columns in each block and applying Radix 64 conversion.

4 Proposed Methodology

The traditional Hill cipher has several advantages in its respective domain. It is a symmetric-key cipher, but it does not use the same key matrix for the encryption and decryption processes [5]: it generates an invertible matrix for the decryption process from the key matrix. The main reason to develop this proposed methodology is that it reduces the data to be transferred when compared with the data that are given. The proposed methodology is described in four stages. Firstly, the plaintext characters are substituted. In the second stage, the plaintext is converted to 8-bit binary form. In the third stage, the MSB and MSB-1 bits are removed, as they are 0 and 1, respectively. Now, the 8-bit value is reduced to a 6-bit value, which is converted to decimal form. In the fourth stage, the traditional Hill cipher algorithm is performed with modulo 64 to generate the ciphertext in the range of the Radix 64 conversion table. The reverse procedure of encryption is decryption.
4.1 Radix 64 Conversion

Radix 64 conversion is the method of representing the characters in a 6-bit binary value concerning the Radix 64 conversion table. The Radix 64 conversion table consists of 64 characters: 26 uppercase characters (A–Z), 26 lowercase characters (a–z), ten digits (0–9), and two special characters ('+' and '/'). First, each character of the plaintext is converted into its respective ASCII value. After the removal of two bits from each 8-bit binary value, there will be 6 bits remaining, and the 6 bits left are in the Radix 64 form. The process of converting a character into its respective 6-bit binary value is called Radix 64 encoding, and the reverse process of converting the 6-bit binary value to its character is called Radix 64 decoding.

Algorithm of Encryption
1. Rewrite the plain text after necessary replacement of characters, i.e., SPACE is replaced by '_', '.' is replaced by '|' and ',' is replaced by '\'.
2. Convert the plain text into ASCII values and then to 8-bit binary form.
3. Remove the last two bits from the MSB positions, as they are '0' in the MSB and '1' in the MSB-1 position respectively, from every 8-bit binary value.
4. The 6 bits left after the removal of 2 bits are converted into the Radix 64 value.
5. Group these values into columnar matrices of size 3*1 or 4*1 or more.
6. Apply the traditional Hill cipher, apply modulo 64 to the result and convert the resulting values into the respective Radix 64 characters.
7. The resulting text is the ciphertext.

Algorithm of Decryption
1. Convert the ciphertext into its respective Radix 64 values and group these values into columnar matrices of size 3*1 or 4*1 or more.
2. Apply the traditional Hill cipher (with the inverse key matrix) and apply modulo 64 to the result.
3. Convert the above-resulted values into Radix 64 encoding (6-bit).
4. Add two bits to each 6-bit binary value, i.e., '0' at the MSB position and '1' at the MSB-1 position.
5. Convert the 8-bit binary value into its ASCII value and then to a character.
6. Now replace '_' with SPACE, '|' with '.' and '\' with ','.
7. The resulting text is the plain text.
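A compact sketch of the encryption steps listed above (character substitution, stripping the two leading bits, grouping into 3×1 blocks and applying the Hill transformation modulo 64). The key matrix, the padding value and the block size are assumptions for illustration; a real key must be invertible modulo 64 (odd determinant) so that decryption can use its modular inverse.

```python
# Sketch of the proposed encryption: substitute, strip '01' prefix bits, Hill cipher mod 64.
# The key matrix, padding and block size are illustrative assumptions.
import numpy as np

RADIX64 = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"
K = np.array([[4, 1, 3], [1, 1, 1], [3, 1, 2]])   # det = -1 (odd), so invertible mod 64

def encrypt(plaintext, key=K):
    text = plaintext.replace(" ", "_").replace(".", "|").replace(",", "\\")
    vals = []
    for ch in text:
        code = ord(ch)
        assert 64 <= code <= 127, "scheme assumes ASCII codes with leading bits '01'"
        vals.append(code & 0b00111111)             # drop the two MSBs -> 6-bit value
    while len(vals) % 3:                           # pad to a multiple of the block size
        vals.append(0)
    cipher = ""
    for i in range(0, len(vals), 3):
        block = np.array(vals[i:i + 3])
        cipher += "".join(RADIX64[v] for v in (key @ block) % 64)
    return cipher

print(encrypt("Hill Cipher"))
```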
Table 2 Various sizes of data after encryption and decryption

S. No.   Data size (in bits)   After encryption (in bits)   After decryption (in bits)
1        64                    48                           64
2        128                   96                           128
3        256                   192                          256
4        512                   384                          512
5        1 Kb                  768                          1024
6        2 Kb                  1536                         2048
7        4 Kb                  3072                         4096
8        5 Kb                  3840                         5120
9        10 Kb                 7680                         10,240
5 Performance and Security Analysis

5.1 Analysis of Data Size After Encryption and Decryption Processes

From the analysis, it is observed that the data size is reduced after the encryption process, since every 8-bit character is carried as a 6-bit Radix 64 value (a 25% reduction). The data are retrieved back to the same original size after the decryption process. Table 2 represents how the size of the data varies with respect to the encryption and decryption mechanisms.
5.2 Avalanche Effect

The security of the proposed algorithm was analyzed using the avalanche effect. It was found that a change of 1 bit in the plaintext results in a change of three characters, irrespective of the data size. It was also observed that a change of 1 bit in the key results in a 33.3% change of characters in the ciphertext.
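A sketch of how such an avalanche figure can be measured: flip a single bit of the plaintext, encrypt both versions with the same key, and count the fraction of ciphertext characters that differ. It assumes an encrypt(text) function like the illustrative sketch given earlier and is not the authors' measurement code.

```python
# Avalanche check sketch: percentage of ciphertext characters changed by a 1-bit plaintext flip.
# Assumes an encrypt(text) function such as the earlier sketch.
def avalanche(plaintext, bit_index, encrypt_fn):
    flipped = bytearray(plaintext, "ascii")
    flipped[bit_index // 8] ^= 1 << (bit_index % 8)          # flip one bit of the plaintext
    c1, c2 = encrypt_fn(plaintext), encrypt_fn(flipped.decode("ascii"))
    changed = sum(a != b for a, b in zip(c1, c2))
    return 100.0 * changed / max(len(c1), 1)

# Example usage: avalanche("Hill Cipher", bit_index=2, encrypt_fn=encrypt)
```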
5.3 Encryption Time Dependence on File Size

The encryption and decryption times were evaluated for the proposed algorithm for different file sizes. It is evident that the encryption time increases as the file size increases, but at a lower rate than the file size itself. The statistical analysis is given in Table 3. From the above analysis, it is observed that the security of the modified Hill cipher using Radix 64 improves more than three-fold compared to the original Hill cipher
Table 3 Variation of encryption time with file size

Size of file (in bytes)   Encryption execution time (in ms)   File size (in KB)   Encryption execution time (in ms)
64                        424 ms                              1 KB                799 ms
128                       442 ms                              2 KB                915 ms
256                       476 ms                              3 KB                1120 ms
512                       557 ms                              4 KB                1441 ms
1024                      799 ms                              5 KB                1542 ms
algorithm [16]. The proposed variant of the Hill cipher algorithm may be used in place of the traditional Hill cipher in view of the advantages mentioned in the performance analysis.
6 Conclusion

A new variant of the Hill cipher algorithm is proposed using Radix 64 conversion. The proposed algorithm encrypts and decrypts any type of message, not only alphabets and characters. The ciphertext generated using this algorithm is also in the same range as the plaintext. The size of the ciphertext formed using this proposed methodology is smaller when compared with the size of the plaintext. Because of the decrease in the size of the ciphertext after encryption, breaking the ciphertext becomes more difficult. The proposed methodology generates 48 bits of ciphertext for every 64 bits of plaintext. The security analysis and the dependence of the performance of the modified Hill cipher algorithm on data size are also addressed. The avalanche effect of the proposed algorithm shows a 33.3% change of characters in the ciphertext for every 1-bit change in the key. An increase in encryption time is observed with increasing file size, but at a lower rate.
References 1. Mandle AK, Namdeo V (2019) Encryption and decryption of a message involving byte rotation technique and invertible matrix. Int J Eng Adv Technol (IJEAT), ISSN: 2249–8958, 9(2):1160– 1163 2. Reddy SK, Kishore RS (2014) A new digital encryption scheme matrix rotations and bytes conversion encryption algorithm. Int J Eng Res Technol (IJERT) ISSN: 2278–0181 3(7):487– 494 3. Kalaichelvi V, Manimozhi K, Meenakshi P, Rajkumar B, Vimala Devi P (2017) A New variant of hill cipher algorithm for data security. Int J Pure Appl Math. ISSN: 1311–8080, 117(15):581– 588 4. Kumar BR, Murti PRK (2011) Data encryption and decryption process using bit shifting and stuffing (BSS) methodology. Int J Comput Sci Eng (IJCSE) ISSN: 0975–3397, 3(7):2818–2827
5. Maheswari D, Kaushika A, Jenifer A (2018) A study on data encryption and decryption using hill cipher algorithm. Int J Tech Innov Modern Eng Sci (IJTIMES). e-ISSN: 2455–2585, 4(3):21–26 6. Liew KJ, Nguyen VT (2020) Hill cipher key generation using skew-symmetric matrix. In: Proceedings of the 7th international cryptology and information security conference 2020 (CRYPTOLOGY2020) 7. Sharma P, Rehan M (2014) Modified hill cipher using vandermonde matrix and finite field. Int J Technol 4(1):252–256 8. Khan FH, Shams R, Qazi F, Agha D (2015) Hill cipher key generation algrithm by using orthogonal matrix. Proc Int J Innov Sci Modern Eng 3(3):5–7 9. Kaur H, Khanna P (2017) Non-invertible biometric encryption to generate cancelable biometric templates. Proc World Congress Eng Comput Sci 1:1–4 10. Siahaan MDL, Siahaan APU (2018) Application of hill cipher algorithm in securing text messages. Int J Innov Res Multidisciplin Field. 4(10):55–59, ISSN: 2455–0620 11. Aaref AM, Ablhd AZ (2017) A new cryptography method based on hill and rail fence algorithms. Diyala J Eng Sci 10(1):39–47 12. Bakr MA, Mokhtar A, Takieldeen A (2018) Elliptic curve cryptography modified hill cipher dependent on circulant matrix. Int J Indus Electron Electric Eng 6(7):24–29 13. Saxena MA, Mr. Lohiya H, Mr. Patidar K (2017) A novel technique of hill cipher for evaluation of non-invertible key matrix 06(1):856–862 14. Alawiyah T, Hikmah AB, Wiguna W, Kusmira M, Sutisna H, Simpony BK (2020) Generation of rectangular matrix key for hill cipher algorithm using playfair cipher. J Phys: Conf Series 1641 012094:1–5 15. Siahaan APU (2016) Genetic algorithm in hill cipher encryption. Am Int J Res Sci Technol Eng Math 15(1):84–89 16. Paragas JR, Sison AM, Medina RP (2019) A new variant of hill cipher algorithm using modified S-box. Int J Sci Technol Res 8(10):615–619 17. Khar S, Bhargawa N, Shukla R, Shukla M (2012) Implementation of enhanced modified hill cipher by P-box and -M-box technique. Int J Inf Technol Knowl Manag 5(1):53–58 18. Sastry VU, Shankar NR, Bhavani SD (2010) A modified hill cipher involving interweaving and iteration. Int J Netw Secur 11(1):11–16 19. Acharya B, Rath GS, Patra SK, Panigrahy SK (2007) Novel methods of generating selfinvertible matrix for hill cipher algorithm. Int J Secur 1(1):14–21 20. Reddy KA, Vishnuvardhan B, Madhuviswanatham, Krishna AVN (2012) A modified hill cipher based on circulant matrices. Procedia Technol 4:114–118 21. Abidin AFA, Chuan OY, Ariffin MRK (2011) A novel enhancement technique of the hill cipher for effective cryptographic purposes. J Comput Sci 7(5):785–789
BT Classification Using Deep Learning Techniques from MRI Images—A Review M. Neethu and J. Roopa Jayasingh
Abstract Cancers have been among the most widely reported causes of death across the world in the last decade, and brain tumours play a major role among them. At present, malignancy can be analysed only through a biopsy test, which is performed after surgery. Much research is in progress to determine the malignancy of tissues by studying scanned images. Artificial Intelligence is a very useful tool for analysing such image databases. New methods must allow the specialist to detect tumours more easily; this paper reviews the research already carried out in this area and suggests the most appropriate methods for each stage of image classification. Keywords Brain tumour · Artificial intelligence · Biomedical image processing · Accuracy
1 Introduction Despite the rapid developments in the fields of medicine and technology, the number of new cases identified with different cancer diseases is increasing every day. This is a significant problem that must be addressed seriously for the well-being of humankind. Among all the cancers, brain tumours play a major role. In the USA alone, about 23,000 fresh cases were reported in the year 2015 [1]. This number gradually increased in the following years, reaching approximately 80,000 fresh cases in 2018. A painful fact is that both children and adults are equally affected. There are different types of brain tumours, whose shares can be summarised as: Meningioma—36.3%, Gliomas—26.5%, Pituitary tumours—16.2%, and other types together (Medulloblastoma, Lymphomas, etc.)—21%. It is needless to say that timely identification and accurate detection are very important in the treatment.
M. Neethu (B) · J. R. Jayasingh Karunya Institute of Technology and Sciences, Coimbatore, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 N. Subhashini et al. (eds.), Futuristic Communication and Network Technologies, Lecture Notes in Electrical Engineering 966, https://doi.org/10.1007/978-981-19-8338-2_15
The three important things which decide the possibility of curing are the degree of the tumour, the pathological type and the category in
which the tumour belongs. The brain, which contains the tissues and nerve cells that regulate every action of the whole body, is the most complicated part of the human body. Every cell has its own capabilities and functions. Cells normally develop and function ordinarily; some cells, however, lose their proficiency, stop growing normally and become atypical. When the number of such abnormal cells increases, they form a tissue which we call a tumour. Such irregularly propagating tissues, formed by a bulk group of atypical cells in the brain, are called brain tumours [2–4]. To treat cancer and prevent its further spread, an accurate identification of the stage and kind of tumour is required. Magnetic resonance imaging (MRI) is used by radiologists all over the world for this purpose [5]. Physical features such as the shape, size and position of organs or tissues can be obtained by the magnetic resonance imaging technique without applying high-intensity ionising energy [6]. These images are rich enough to study and generally precise, depending on the efficiency of the equipment. The most important advantage of MRI technology is that it provides accurate guidance for locating the surgical site and avoids thoracotomy or laparotomy procedures for diagnostic purposes. Brain tumour MRI uses three-dimensional multiband imaging technologies, which help the medical practitioner locate the lesion area accurately. The 3D technology can provide exact coordinate positions compared with the older 2D technology, and 3D brain imaging provides the anatomy in three planes, namely sagittal, axial and coronal. Moreover, brain MRI also provides diverse structures of one tissue by applying different development sequences, known as a multimodal MRI output. There are four modes or sequences in brain MRI imaging, based on the Repetition Time (RT) and Time to Echo (TE) auxiliary conditions selected while imaging, namely T1 weighted, T2CE, T2 weighted and Fluid-Attenuated Inversion Recovery (FLAIR). The different sequences help to investigate different features of the same tumour [7]. Machine learning techniques are applied for the faster analysis and accurate extraction of information from MRI images [5]. The existing brain tumour segmentation methods, automatic as well as semiautomatic, can be categorised broadly into techniques based on generative models and techniques based on discriminative models [8]. The statistics gained via probabilistic image atlases are essential for generative model-based segmentation techniques; based on this prior information, brain tumour segmentation is modelled as an outlier detection problem. Unlike generative models, discriminative model-based techniques solve the problem in a pattern classification setting: the image voxels of tissues are classified as normal or abnormal based on features of the MRI images. Obviously, the performance of the latter models greatly depends on the algorithms used for feature extraction and classification of the MRI outputs. The image features adopted in brain tumour segmentation studies are various, e.g. local histograms, image textures and structure tensor eigenvalues. For pattern classification, the most popular among the available algorithms are Support Vector Machines (SVMs) [9] and Random Forests [10].
Image classification [11], object detection [12] and semantic segmentation [13] have been performed with improved accuracy by adopting deep learning techniques in recent studies. There are several methods based on deep learning that are used for image segmentation. Among them,
Convolutional Neural Network (CNN)-based methods have shown better performance than other methods. Three-dimensional CNN models were also tried in some studies for the segmentation of brain tumour (BT) MRI outputs.
2 Literature Review Maharjan et al. used the Softmax loss function, a deep learning technique, for classification in brain tumour detection from 3D MRI images. This method reduced the risk of overfitting compared with earlier studies; at the same time, it failed to evaluate the detection method on a large database, which is the most important factor for the accuracy of results [14]. Ahmet Cinar and Muhammed Yildirim developed a hybrid model, a modification based on the Resnet50 architecture, for the classification of images. Resnet50 is a hybrid CNN model belonging to deep learning, and the Softmax loss function is used before the final classification. This eight-layer method had a high accuracy rate; the drawback of this classification technique is its high computational complexity [15]. Gawad et al. performed brain tumour detection from MRI outputs using an optimised edge detection technique. The Balance Contrast Enhancement Technique (BCET) is used for improving the features of medical images, and then the edge detection is achieved through Genetic Algorithms (GAs). In this research, the method improved image features well and delivered improved image characteristics, but many more iterations are required to obtain a better classification result [16]. Li et al. developed multi-CNN, a combination of multimodal information fusion with a Convolutional Neural Network, for improving the accuracy of brain tumour detection from 3D images. This method showed good localisation and detected sharper edges effectively. Currently, it has been tested with a small dataset only and has to be tested on larger datasets that include different ages and races, and it has not yet been extended to other medical applications [7]. Noreen et al. tried another deep learning model based on the concatenation approach. The research includes two models, DenseNet201 and Inception-v3, with Softmax as the classifier. These methods showed good performance in tumour detection; however, when iterated with a large number of layers on the pre-trained models, the method failed to apply fine-tuning techniques [2]. Saba et al. tried one method for segmentation and another for fine tuning, the GrabCut method and the VGG 19 Transfer Learning model, respectively. This method also showed good performance in terms of tumour detection; its drawback is its high running time [17]. Hashemzehi et al. tried to develop a hybrid model by using the Neural Autoregressive Distribution Estimation (NADE) technique with CNN. Unlike other works, this method for image classification not only smoothened the boundaries of the MRI images and removed the unsought features, but also extracted the advantageous
features required. The massive computational complexity is the major setback of this method [5]. Zhao et al. developed a deep learning model by combining Conditional Random Fields (CRFs) with a Fully Convolutional Neural Network (F-CNN). Even though the method failed to get rid of the problem of imbalance in the training data, it showed good computational efficiency [18].
3 Challenges The challenges addressed in different brain tumour detection studies may be summarised as follows. A 2D F-CNN and CRF–RNN method was developed for brain tumour segmentation, with image slices used as the training data. Training is done using CRF–RNN, and fine tuning is done using an integrated model of CRF–RNN with F-CNN. The segmentation performance of this network may degrade because the number of pixels belonging to different classes varies across slices [18]. In [16], the major drawback of the method is the low accuracy of detection, which mostly depends on the images selected for training. In this proposal, the training time is considered as a function of the number of images used; in order to achieve an improved accuracy–speed trade-off, the number of training images may be varied. The challenge in the concatenation approach lies in two avenues: in applying fine-tuning techniques, the pre-trained models may be trained with a greater number of layers, and for classification, data augmentation techniques may be applied with scratch-based models [2]. The deep learning method improved the accuracy of detection and classification not only of brain tumours but also of other types of tumours; moreover, it decreased the computation time and increased the accuracy. The challenge lies in implementing the detection method for large datasets [14]. It is needless to say that it is a challenge to detect the edges of any MRI output accurately. For a precise diagnosis of disease, the accuracy of detection is critical, and the inaccuracy of the available data is the most commonly identified constraint in the analysis of MRI outputs.
4 Proposed Methodology This research primarily focusses on designing and developing a novel method for the detection of brain tumours from MRI images. An optimisation algorithm for this purpose will be the major contribution of the research. In a bird's-eye view, the procedure includes five stages, namely region of interest (RoI) extraction, preprocessing, segmentation, feature extraction and finally classification.
Fig. 1 Block diagram of the proposed brain tumour detection process
The thresholding-based RoI extraction module will be the primary one, intended to give the regions in the image for the further stages of study. In order to remove the noise in these extracted regions, a preprocessing module is designed; the T2F15 filter is preferred in this module. After preprocessing, the Fuzzy C-Means (FCM) clustering algorithm will be used for the segmentation of the image features [19]. The fourth module is dedicated to the extraction of features from the segmented images; to extract only the appropriate features, the operation uses texton features and curvelet transform-based features. The final and most prominent module is the classification module. It is proposed to perform the classification from the extracted features of the MRI images by using a new CNN-based technique [20]. After preparing the proposed optimisation algorithm, the classifier is to be trained using it; hence, the developed algorithm will be the combination of two algorithms. The implementation will be on MATLAB, and the dataset to be employed is from [21]. Analysis of performance based on different parameters is essential to judge the strengths and weaknesses of the new solution. Accuracy, specificity and sensitivity are to be analysed in depth and breadth, and a comparative study is to be performed using the above-mentioned performance metrics. The proposed algorithm is to be compared in depth with [14, 15] and [18]. The following block diagram depicts the stages of the proposed research for brain tumour detection from MRI outputs, in a nutshell (Fig. 1).
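As an illustration of the segmentation stage, the following is a minimal Fuzzy C-Means clustering sketch in Python/NumPy. The paper proposes a sparse FCM variant [19] implemented in MATLAB; this plain FCM on pixel intensities is only meant to show the basic update equations, and all parameter choices here are assumptions.

```python
import numpy as np

def fcm(pixels, n_clusters=3, m=2.0, n_iter=100, eps=1e-5):
    """Plain Fuzzy C-Means on a 1-D array of pixel intensities.
    Returns (cluster centres, membership matrix of shape [n_clusters, n_pixels])."""
    x = pixels.reshape(1, -1).astype(float)                 # (1, N)
    rng = np.random.default_rng(0)
    u = rng.random((n_clusters, x.shape[1]))
    u /= u.sum(axis=0, keepdims=True)                       # memberships sum to 1 per pixel
    for _ in range(n_iter):
        um = u ** m
        centers = (um @ x.T) / um.sum(axis=1, keepdims=True)  # (C, 1)
        dist = np.abs(centers - x) + 1e-12                     # (C, N) distances
        new_u = 1.0 / (dist ** (2 / (m - 1)))
        new_u /= new_u.sum(axis=0, keepdims=True)
        if np.linalg.norm(new_u - u) < eps:
            u = new_u
            break
        u = new_u
    return centers.ravel(), u

# Example: segment a grayscale MRI slice into 3 intensity clusters
# centers, u = fcm(img.ravel(), n_clusters=3)
# labels = u.argmax(axis=0).reshape(img.shape)
```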
5 Conclusion A very reliable and dependable algorithm for distinguishing brain tumours from other tissues is the need of the hour, and various studies are in progress on this account. Since even a small wrong approximation
may lead to drastic negative outcomes, which can never be tolerated in the field of medical science, new and improved algorithms are still relevant to develop. The following conclusions are made based on the review of various research works. The thresholding method has shown the highest effectiveness for the RoI extraction of medical images. For the segmentation process, the sparse FCM technique shows dependably good results, and for feature extraction, among all the methods tried, the curvelet transform is found to be the most reliable. The most important part of diagnosis is classification; various advantages and incompatibilities were noted in the current algorithms. Hence, it is proposed to combine more than one AI-based classification model to develop a dependably good algorithm.
References 1. Siegel RL, Miller KD, Jemal A (2015) Cancer statistics. Cancer J Clin 65(1):5–29 2. Noreen N, Palaniappan S, Qayyum A, Ahmad I, Imran M, Shoaib M (2020) A deep learning model based on concatenation approach for the diagnosis of brain tumour. IEEE Access 1–1 3. Razzak MI, Imran M, Xu G (2019) Efficient brain tumour segmentation with multiscale twopathway-group conventional neural networks. IEEE J Biomed Health Informat 23(5):1911– 1919 4. Rehman A, Naz S, Razzak MI, Akram F, Imran M (2019) A deep learning-based framework for automatic brain tumours classification using transfer learning. Circuits Syst Signal Process 39(2):757–775 5. Hashemzehi R, Mahdavi SJS, Kheirabadi M, Kamel SR (2020) Detection of brain tumours from MRI images base on deep learning using hybrid model CNN and NADE. Biocybernetics Biomed Eng 6. Ramalho M, Matos AP, Alobaidy M (2017) Magnetic resonance imaging of the cirrhotic liver: diagnosis of hepatocellular carcinoma and evaluation of response to treatment-part 1. Radiol Bras 50(1):38–47 7. Li M, Kuang L, Xu S, Sha Z (2019) Brain tumour detection based on multimodal information fusion and convolutional neural network. IEEE Access 7:180134–180146 8. Menze BH, Jakab A, Bauer S, Kalpathy-Cramer J, Farahani K, Kirby J (2015) The multimodal brain tumour image segmentation benchmark (BRATS). IEEE Trans Med Imaging 34:1993– 2024 9. Li H, Fan Y (2012) Label propagation with robust initialization for brain tumour segmentation. In: 2012 9th IEEE international symposium on biomedical imaging (ISBI), pp 1715–1718 10. Goetz M, Weber C, Bloecher J, Stieltjes B, Meinzer HP, Maier-Hein K (2014) Extremely randomized trees based brain tumour segmentation. In: Proceedings MICCAI BraTS (Brain Tumour Segmentation Challenge), pp 6–11 11. Krizhevsky, A., Sutskever, I., Hinton, G.E., Imagenet classification with deep convolutional neural networks, In: Advances in Neural Information Processing Systems, pp. 1097–1105, (2012). 12. Girshick R, Donahue J, Darrell T, Malik J (2014) Rich feature hierarchies for accurate object detection and semantic segmentation. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 580–587 13. Liu Z, Li X, Luo P, Loy CC, Tang X (2015) Semantic image segmentation via deep parsing network. In: Proceedings of the IEEE international conference on computer vision, pp 1377– 1385 14. Maharjan S, Alsadoon A, Prasad PWC, Salam M, Alsadoon OH (2019) A novel enhanced softmax loss function for brain tumour detection using deep learning. J Neurosci Methods 108520
15. Çinar A, Yıldırım M (2020) Detection of tumours on brain MRI images using the hybrid convolutional neural network architecture. Med Hypotheses 109684 16. Abdel-Gawad AH, Said LA, Radwan AG (2020) Optimized edge detection technique for brain tumour detection in MR images. IEEE Access 1–1 17. Saba T, Sameh Mohamed A, El-Affendi M, Amin J, Sharif M (2019) Brain tumour detection using fusion of hand crafted and deep learning features. Cognit Syst Res 18. Zhao X, Wu Y, Song G, Li Z, Zhang Y, Fan Y (2018) A deep learning model integrating FCNNs and CRFs for brain tumour segmentation. Med Image Anal 43:98–111 19. Chang X, Wang Q, Liu Y, Wang Y (2016) Sparse regularization in fuzzy c-means for highdimensional data clustering. IEEE Trans Cybernetics 47(9):2616–2627 20. Zhao D, Wang B, Liu D (2013) A supervised actor–critic approach for adaptive cruise control. Soft Comput 17(11):2089–2099 21. BraTS dataset, https://www.med.upenn.edu/sbia/brats2018/data.html, (Last accessed on 2020)
Design and Development of Automated Smart Warehouse Solution B. Nagajayanthi and Roopa JayaSingh
Abstract IoT networks billions of devices so that they can communicate with each other. The success of a smart city depends on productivity, and productivity depends on a well-maintained workforce. Automation provides reduced overhead, improved accuracy, flexibility in work hours, and low injury risk. The rise in E-Commerce sales during the Covid-19 pandemic requires strenuous social distancing solutions. As the need of the hour, a smart and lightweight warehouse prototype was designed to satisfy customer demand. RFID is used to segregate and identify the products. Based on the seasonal requirement and the algorithm, the products are sorted and stored in the allotted space. A web interface is designed to keep track of the stored items remotely. This prototype was tested and implemented using appropriate hardware and software components. The Smart Warehouse Management system provides a consolidated platform to manage storage and allows selected personnel to have real-time access to quality data, thereby ensuring security. This system facilitates forecasting, reduces labor cost and risk, reduces the requirement for skilled labor, reduces pickup time, provides optimized space for storage, improves pickup accuracy, and provides a real-time data feed about the stored products. The Smart Warehouse System utilizes IoT, predictive analysis, and big data for supply chain visibility, while RFID and sensors make the warehouse smart and secure. Due to the unprecedented massive demand in E-Commerce during Covid-19, there is a need for supply chain management and hence for an automated Smart Warehouse solution. Keywords IIoT · Smart warehouse · Sensors · RFID · Tracking
B. Nagajayanthi (B) Vellore Institute of Technology, Chennai Campus, Chennai, India e-mail: [email protected] R. JayaSingh Karunya Institute of Technology, Coimbatore, India © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 N. Subhashini et al. (eds.), Futuristic Communication and Network Technologies, Lecture Notes in Electrical Engineering 966, https://doi.org/10.1007/978-981-19-8338-2_16
1 Introduction The Industrial Internet of Things (IIoT) has revolutionized industry in the digital era by providing automation, real-time data analytics, predictive maintenance, labor planning, increased uptime, security optimization, supply chain visibility, and network visibility. The status of devices is monitored and managed remotely with accuracy and efficiency. The proposed Smart Warehouse for automated storage is customized to sort and store products automatically. Traditional inventory management has its own challenges in the digitized era: it requires skilled manpower, a long turnaround time to pick up a product, time to place the product, time to relocate the product in case of inadequate space, and time to monitor the product. To satisfy the demands of online shoppers, Alibaba and Amazon have resorted to Smart Warehouse Management. In Amazon's semi-automated warehouses, robots work along with employees; they are used for carrying products and scanning barcodes. A fully automated warehouse was developed by Alibaba in 2018, where around 700 robots provide error-free maintenance at great speed. With burgeoning E-Commerce, managing and tracking the inventory are a challenge in warehouse management.
2 Technologies Used to Promote Warehouse Management Several technologies are used in warehouse management for transport, storage, retrieval, sorting, and sequencing. IoT, AI, and ICT have played a major role in logistics, as discussed in [1]. Artificial Intelligence enables machines to collect, analyze, and learn from data. Automated Guided Vehicles (AGVs) used in transportation follow digitally guided paths along the shortest route. Drones are used to manage products specifically in hard-to-reach areas. More than 100,000 IoT-based robots are deployed by Amazon in their warehouse management system. With the development of Artificial Intelligence (AI), robots can choose an optimal path to pick up and deliver products. Blockchain is used for secured tracking. Sleeker collaborative robots (Cobots) work with the existing associates, and smart glasses aid users in shifting different items without having to memorize the details. Smart technologies and the cloud improve supply chain management [2]. The supply chain spans different locations; connected platforms help in tracking and monitoring the goods during transit, thereby ensuring supply chain visibility. By tracking using IoT, the goods are maintained at the right temperature, pressure, and moisture, which ensures the freshness of perishable products. Finding skilled talent and training them to manage a warehouse remain a challenge to date. Thus, an automated, connected Smart Warehouse improves productivity and efficiency.
3 Role of IoT in Smart Warehouse Smart Warehouses simplify automation, stocking, storing, loading, and management. IoT is in the limelight in today’s industry for customization due to some of the reasons highlighted below.
3.1 Analytics, Reporting, and Strategy IoT handles large amounts of data with precision. Real-time data are collected [3], visualized, and reported with low latency. This facilitates decision-making and predictive maintenance.
3.2 Better Transparency in Supply Chain and Maintenance IoT facilitates data accuracy. Details of supply chain are shared from the warehouse to balance stock as per requirement.
3.3 Cloud Computing IoT facilitates collection and processing of huge amount of data using cloud as reviewed in [4]. Even with less data storage, business could be managed. IoT has a prominent role in business [5].
3.4 Data Visualization Instead of using Enterprise Resource planning (ERP), if IoT is used, the data collected by a sensor could be consolidated and customized as per requirement. These data are made available for visualization in a dashboard or could be sent as an alert message.
3.5 IoT Data Employees are guided with precision data. This speeds up the productivity with confidence.
3.6 Last Mile Delivery IoT is connected and automated. This reduces delay due to transportation and unskilled laborers.
3.7 Low Power and Lightweight IoT devices save power, and its design is sleek [6]. Lightweight design saves space, and low power consumption saves energy. Low power prevents health hazards.
3.8 Predictive Maintenance Any failure in the system is notified, and this reduces downtime and improves productivity.
3.9 Real-Time Data IoT provides real-time data streaming from anywhere and at any time [7]. This improves productivity, transparency and saves time.
3.10 RFID Tags RFID tags and sensors, along with device-to-device connectivity, are utilized in day-to-day activities. As discussed by Xiaolin et al. [8], RFID is used to identify, track, and monitor objects in real time [9]. An RFID reader is faster than a barcode reader. The RFID tag needs to be secured [10].
3.11 Sensors Products’ condition could be maintained using appropriate sensors to maintain temperature, pressure, and humidity inside the warehouse. Based on this, Heating Ventilation and Air Condition (HVAC) system could be optimized automatically.
3.12 Tracking IoT tracks [11] and detects malfunction earlier, thereby avoiding downtime. This avoids failure and accidents, thereby improving accuracy. From product delivery to product storage, IoT tracks the inventory, thereby speeding up delivery.
3.13 Wearables Wearables used by the employee monitor the performance of the individual and guide him with the required details for work. Apart from providing industrial metrics, sensors also monitor the health of the employee at work.
4 Proposed Automated Smart Warehouse Architecture The proposed framework for Automated Smart Warehouse Architecture consists of a stacker crane, a conveyer belt, an RFID reader, and a database to keep record of the details stored in the warehouse as shown in Fig. 1. Product picking and sorting are done using RFID. Perishable products are given highest priority. A web interface or handheld device is used to monitor the warehouse activities. Fig. 1 Proposed smart warehouse architecture
4.1 Controller Architecture The Controller Architecture consists of the hardware part as shown in Fig. 2 and the software part as shown in Fig. 3. Based on the software controller architecture, the movement of the stacker crane is controlled.
Fig. 2 Controller architecture (hardware)
Fig. 3 Controller architecture (software)
Based on the algorithm, the stacker crane positions and stores the product. As per the priority algorithm, the product is placed in one of the three levels 1, 2, or 3. There are nine slots in total, namely 1.1, 1.2, 1.3, 2.1, 2.2, 2.3, 3.1, 3.2, and 3.3.
5 Simulation of the Proposed Smart Warehouse 5.1 Proteus Simulation Proteus is a design and development tool that provides business solutions. The proposed framework for the Smart Warehouse was tested using Proteus as shown in Fig. 4, and the results of the motors for the X, Y, and Z axes were displayed on the virtual terminal. The motor driver drives the motors to move the stacker crane along the corresponding X, Y, and Z axes.
Fig. 4 Proteus simulation for Smart Warehouse storage system
Fig. 5 CAD model for Smart Warehouse System
5.2 CAD Model Simulation of the Smart Warehouse System: A CAD model was made for each major component, comprising the conveyer belt, stacker crane, X-axis, Y-axis, and Z-axis. All the components were assembled and verified as shown in Fig. 5.
6 Implementation of Smart Warehouse System The Automated Smart Warehouse Management system hardware module consists of a storage unit with a (3 × 3) stack of nine slots. These slots are for three categories: food, cloth, and other products. The hardware consists of stepper motors for the 'X', 'Y', and 'Z' axes. The product to be stored was placed on the conveyer belt with an RFID tag, which holds the details of the product. A PCB was designed as shown in Fig. 6 with an ESP32 controller and three DRV8825 stepper motor drivers for the 'X', 'Y', and 'Z' axes, respectively. Linear ball bearings of 8 mm and 10 mm diameter were used for linear motion. The hardware module was switched ON and connected to the Wi-Fi module. High-Density Fiberboard (HDF) plywood sheets of 5 mm and Medium-Density Fiberboard (MDF) sheets of 3 mm were used. The motors use copper wiring to sustain and handle heat, and a heat sink was used to keep the motor driver cool. As the RFID card on the product is identified by the card reader module, the crane moves along the 'Z' axis to pick up the product. It then goes to the corresponding column and row and places the product in the stack; this is the functionality of the 'Y' axis. After placing the product, the stacker crane goes back to the home position, i.e., along the 'X' axis. The functioning of the Smart Warehouse is shown in Fig. 7. The firebase database gets updated with the details of the product, and the website has details of the stored products. As per the algorithm, two matrices 'board' and
Fig. 6 PCB design
Fig. 7 Smart warehouse system
‘val’ were created. ‘Board’ indicates the position of the product, while the ‘val’ value is assigned to the three categories ‘f’, ‘c’, and ‘o’. As per the algorithm, the (X, Y) coordinates are given to the stacker crane to set the position for the product. A website was designed for real-time tracking of the product. When the product ID is entered in the website by the manager to track the product, as shown in Fig. 8, the details of the product are displayed along with the category and location for supply chain visibility. Using the firebase database, duplicate entries of products with the same ID are avoided.
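The slot-allocation logic described above can be sketched as follows. The exact priority rules and matrix encoding used by the authors are not given in full, so the category-to-level mapping and the free-slot search below are assumptions for illustration only.

```python
# Hypothetical sketch of the 'board'/'val' bookkeeping: a 3x3 grid of slots,
# categories 'f' (food), 'c' (cloth), 'o' (other), with food given priority
# by reserving the most accessible level for it (assumed rule).

LEVEL_FOR = {"f": 0, "c": 1, "o": 2}          # assumed category -> preferred level
board = [[None] * 3 for _ in range(3)]        # board[level][column] = product id or None

def assign_slot(product_id: str, category: str):
    """Return (X, Y) slot coordinates for the stacker crane, or None if full."""
    preferred = LEVEL_FOR.get(category, 2)
    # try the preferred level first, then the remaining levels
    for level in [preferred] + [l for l in range(3) if l != preferred]:
        for col in range(3):
            if board[level][col] is None:
                board[level][col] = product_id
                return (col, level)           # X = column, Y = level
    return None

# e.g. assign_slot("RFID-0012", "f") -> (0, 0), i.e. slot 1.1
```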
7 Research Issues and Impediments The stacker crane was put through rigorous testing with a constant load for up to 20–25 min. A cooling fan would help to reduce the rise in motor body temperature from 32 °C to 50 °C. The temperature of the belts, pulleys, ESP32 controller, and linear bearings remained the same, and all the shafts, nuts, and bolts remained intact.
Fig. 8 Website for smart warehouse management
8 Results and Inferences Smart Warehouse Management optimizes space, sorts products, and provides real-time monitoring. The priority algorithm optimizes time complexity and space complexity. It takes up to 3 min to pick up and store a product weighing up to 1.5–2 kg in the rack; this time could be reduced using a more powerful motor. Poor space utilization was resolved using vertical storage of products (Z-axis). All the theoretical parameters tabulated in Table 1 were taken from the datasheets and from the manufacturer's website. All tested parameters could have an error of ± 5–8%. Table 1 Comparison of theoretical and tested values
No. | Component name | Theoretical capacity | Tested capacity
1 | Stepper motor | 2 Amp per phase | 750 mA per phase
2 | SMPS power supply | 5 Amp at 12 V | 3.2 Amp at 12 V
3 | Base platform (horizontal) | 6 kg | 2 kg
4 | Y-axis platform (vertical) | 3.5 kg | 1.3 kg
5 | Z-axis platform | 500 gm | 250 gm
6 | Linear bearing and rod—8 mm | 20 kg | 5 kg
7 | Linear bearing and rod—10 mm | 35 kg | 5 kg
Automated Smart Warehouses play a major role in industries, as discussed by Da et al. [12]. Data type, data format, and data value validation of the input were done. The software used was updated for enhanced security. NodeJS supported encryption, and end-to-end encryption was provided by 'bcryptjs' to ensure data security for the clients. Firebase prevented duplicate entries.
9 Conclusion and Future Work The Smart Warehouse Management system optimizes space and manages storage. This helps in data analytics, predictive maintenance, and prioritized storage. Warehouse management is the heart of supply chain management. A centralized database is maintained, and redundancy is reduced by using Google Firebase. Cyber-Physical Systems could keep a copy of the Smart Warehouse Management data for verification if required. With increased customer demand, companies need to embrace intelligent automated solutions for their warehouses.
References 1. Feng B, Ye O (2021) Operation management of smart logistics A literature review and future research. Frontiers Eng Manag 8:344–355 2. Dolgui A, Proth JM (2008) RFID in supply chain management: state of the art and perspectives. In: Proceedings of the 17th World congress-the international federation automatic control, Seoul, Korea 3. van Geest M, Tekinerdogen B, Catal C (2021) Design of a reference architecture for developing smart warehouses in Industry 4.0. Comput Indus 124 4. Nagajayanthi B, Vijayakumari V, Radhakrishnan R (2016) Secured seamless broadcasting using bluetooth enabled IoT cloud. Res J Appl Sci Eng Technol. Maxwell Scientific Publication Corporation. 13(4):325–330 5. Bilgeri D, Brandt V, Lang M, Tesch J, Weinberger M (2015) The IoT business model builder. A white paper of the Bosch IoT lab in collaboration with Bosch software innovations GmbH 6. Nagajayanthi B (2020) Energy efficient light weight security algorithm for low power IoT devices. Int J Eng Adv Technol 9(3) 7. Nagajayanthi B (2019) Energy efficacious IoT based nifty parking information system. Int J Innov Technol Explor Eng (IJITEE) 9(3) 8. Jia X, Feng Q, Fan T, Lei Q (2012) RFID technology and its applications in Internet of Things (IoT). In: 2nd international conference on consumer electronics, communications and networks (CECNet), IEEE, pp 1282–1285 9. Subramanya S, Tejesh B, Neeraja S (2018) Warehouse inventory management system using IoT and open-source framework. Alexandria Eng J 57:3817–3823 10. Khattab A, Jeddi Z, Amini E, Bayoumi M (2017) RFID security: a light weight paradigm. Springer, pp 1–162 11. Pawalowicz B, Salach M, Trybus B (2021) The infrastructure of RFID-based fast moving consumer goods system using cloud. In: Conference on automation-automation 2020: towards industry of the future, pp 216–226 12. Da Li X, He W, Li S (2014) Internet of things in industries: a survey. IEEE Trans Industr Inf 10(4):2233–2243
Emotion Recognition from Facial Expressions Using Videos and Prototypical Network for Human–Computer Interaction Divina Lawrance and Suja Palaniswamy
Abstract Emotions play a vital part in a person's day-to-day life. In this work, a method for emotion recognition from videos is proposed which uses a prototypical network and Long Short-Term Memory (LSTM). The method is named Prototypical network and LSTM for Emotion Recognition in Videos (PLERV). The method first finds the classification of selected frames in a video, and this is achieved using the metric meta-learning approach of the prototypical network. A sequence of classifications is generated from the selected frames in a video, which is in turn given to a Long Short-Term Memory (LSTM) model. The LSTM model provides the emotion classification of the video. Use of a meta-learning algorithm enabled better generalization with fewer samples. Experiments are done using the BU-4DFE dataset for the emotions Happy, Surprise, Angry, Fear, Sad, and Disgust, which gave an accuracy of 89%. Keywords BU-4DFE · Emotion recognition · Prototypical network · Meta-learning · LSTM
1 Introduction Emotion recognition has several applications and is important for Human–Computer Interaction. Traditional machine learning and deep learning have contributed to creating models with good accuracy. However, in the case of deep learning, models are trained using a large number of samples, and the model must be trained from scratch for a related task, whereas in the traditional machine learning approach, hand-engineered features are required to get better results. Few-shot learning (FSL) can generalize well for tasks that are not seen during training time [1]. Learning from a small number of data points is referred to as few-shot learning. A classifier that learns to categorize N classes using P samples per class is a 'P-shot N-way' classifier [1].
D. Lawrance · S. Palaniswamy (B) Department of Computer Science and Engineering, Amrita School of Engineering, Amrita Vishwa Vidyapeetham, Bengaluru, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 N. Subhashini et al. (eds.), Futuristic Communication and Network Technologies, Lecture Notes in Electrical Engineering 966, https://doi.org/10.1007/978-981-19-8338-2_17
FSL problems can be solved using meta-learning
approaches. In metric-based meta-learning, feature vectors are extracted from images, and the distance between the feature vectors is calculated to find the similarity between two images. Thus, the metric meta-learning approach can provide better results with few samples, unlike deep learning. Also, feature generation is not hand-engineered as in traditional machine learning, since the metric meta-learning approach uses a Convolutional Neural Network as the embedding network. The main contributions of PLERV are as follows: • Prototypical network and LSTM are used in PLERV, which is a novel technique for video emotion classification. • PLERV gives better generalization with fewer samples. • PLERV uses the temporal aspects of the video. The paper outline is as follows: Sect. 2 discusses related work. Section 3 delves into the proposed technique. The experiments conducted as part of this work are described in Sect. 4. In Sect. 5, results and analyses are described. Section 6 discusses the conclusion as well as future research.
2 Related Work Recent work in emotion recognition using facial expressions and meta-learning approach using prototypical network is explored as part of literature survey. Snell et al. [2] proposed prototypical network for the few-shot classification problem. They observed that the network worked better when the training-shot matches with the test shot and when more classes are used in a training episode. Also, performance improved when Euclidean distance is used instead of cosine distance. It is best to train and test with the same ‘shot’ number, i.e., number of samples in the case of prototypical network. Prototypical network is claimed to be the simplest and efficient than the recent meta-learning algorithms. Soumya and Suja Palaniswamy in their work [3, 4] used prototypical network for emotion recognition under occlusion for images. Temporal information can help to track minute changes in facial expression [5]. Long-Term Recurrent Convolutional Networks (LRCNs) were proposed in the study [6], which uses Recurrent Neural Network (RNN) and Convolutional Neural Network (CNN) for video classification. The temporal dynamics of videos are used in this approach. Abdullah et al. [7] employed CNN-LSTM for video classification task in their research as well. For feature extraction, deep CNN model is used. CNN model is pre-trained and fine-tuned as the dataset is small. For emotion recognition from videos, the work [8] uses the metric meta-learning technique Siamese network and LSTM. Siamese network in this work was trained with triplet loss. In the works [9–12], geometrical approach is used for feature extraction. The embedding model used in the work [8] is large, and hence, the computational intensity is more. Also, Siamese network with triplet loss uses image triplets for
comparison, whereas prototypical network uses prototype representation which gives better generalization capability for prototypical network. With a geometrical-based technique for feature extraction, the works [9–12] were able to get good results. But, these methods need facial landmarks which are hand-engineered features which is a drawback of traditional machine learning algorithms. According to literature, prototypical network gives good results for lesser image samples. LSTM likewise allows the extraction of temporal features. In the case of videos, a combination of prototypical network and LSTM will yield good results when the sample size is limited. PLERV method incorporates the following advantages of traditional machine learning and deep learning algorithms: • PLERV does not need hand-engineered features which is a drawback of traditional machine learning algorithms. • PLERV does not need very large dataset like deep learning methods.
3 Proposed Method The proposed method PLERV is a combination of metric meta-learning model prototypical network and LSTM. The three stages of the method are shown in Fig. 1.
3.1 Data Preprocessing Frames are extracted from the video, and each alternate frame is used for further processing. There are nearly 100 frames extracted from each video of nearly 4 s, and
Fig. 1 PLERV process
we are taking alternate frames to avoid redundant information. In each selected frame, the face is detected and then cropped. The frames are then resized to the input dimension of the prototypical network, which is 256 × 256 × 3.
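A minimal version of this preprocessing step is sketched below using OpenCV; the Haar cascade face detector and the every-other-frame sampling follow the description above, while details such as which detection is kept when several faces are found are assumptions.

```python
import cv2

def preprocess_video(path, size=(256, 256)):
    """Extract alternate frames, detect and crop the face, resize to 256x256x3."""
    cascade = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
    cap = cv2.VideoCapture(path)
    faces, idx = [], 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx % 2 == 0:                       # keep every alternate frame
            gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
            boxes = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
            if len(boxes) > 0:
                x, y, w, h = boxes[0]          # first detection assumed to be the face
                crop = frame[y:y + h, x:x + w]
                faces.append(cv2.resize(crop, size))
        idx += 1
    cap.release()
    return faces                               # list of 256x256x3 face images
```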
3.2 Prototypical Networks The prototypical network learns the prototype representation of each class during training. During classification, support points are provided for each class, and the class prototype is given by the mean of the support set. While classifying a query point, which is an unseen sample, the distance of its embedding to each class prototype is computed; the class prototype to which the distance is smallest gives the target class of the query point. In Fig. 2, query point X has the smallest distance to prototype P1 and therefore belongs to the class of P1. In the prototypical network, the prototype of each class, which is an M-dimensional representation, is learned as the mean of the embeddings of the support points in that class. The prototype representation for class $k$ is given by (1).

$$\mathrm{Pr}_k = \frac{1}{|S_k|} \sum_{(x_i, y_i) \in S_k} f_\theta(x_i) \qquad (1)$$
$|S_k|$ is the number of support points in class $k$, $\theta$ denotes the learnable parameters, and $f_\theta(x_i)$ is the embedding function applied to the support point $x_i$. The loss is calculated by (2).

$$L = -\log p_\theta(y = k \mid x) \qquad (2)$$

$p_\theta(y = k \mid x)$ is the probability of the query point $x$ belonging to class $k$, given by (3).
Fig. 2 Concept of prototypical network
Fig. 3 Episodic training
$$p_\theta(y = k \mid x) = \frac{\exp(-d(f_\theta(x), \mathrm{Pr}_k))}{\sum_{k'} \exp(-d(f_\theta(x), \mathrm{Pr}_{k'}))} \qquad (3)$$
$d(f_\theta(x), \mathrm{Pr}_k)$ is the distance between the embedding of the query point and the prototype. The prototypical network is trained using episodic training. For instance, given n_way = x, n_query = y, and n_support = z, where n_way is the number of classes, n_query is the number of query points, and n_support is the number of support points per class, the episodic training steps are shown in Fig. 3.
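Equations (1)–(3) translate directly into a few lines of code. The sketch below, written with NumPy for brevity, assumes the embeddings have already been produced by the embedding network $f_\theta$ and uses squared Euclidean distance, which is the choice reported in [2].

```python
import numpy as np

def prototypes(support_emb, support_labels, n_way):
    """Eq. (1): class prototype = mean of the support embeddings of that class."""
    return np.stack([support_emb[support_labels == k].mean(axis=0)
                     for k in range(n_way)])

def classify(query_emb, protos):
    """Eqs. (2)-(3): softmax over negative squared Euclidean distances."""
    # squared distances from each query embedding to each class prototype
    d = ((query_emb[:, None, :] - protos[None, :, :]) ** 2).sum(axis=-1)
    logits = -d
    logits -= logits.max(axis=1, keepdims=True)          # numerical stability
    p = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
    return p                                             # p[i, k] = p_theta(y=k | x_i)

def episode_loss(p, query_labels):
    """Eq. (2): mean negative log-probability of the true class over the query set."""
    return -np.log(p[np.arange(len(query_labels)), query_labels] + 1e-12).mean()
```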
3.3 Classification of Videos To make use of the temporal aspects, classification of the emotion videos is done using an LSTM, which consists of gates that enable information to be stored over longer time periods [13]. The classifications provided by the prototypical network for the selected frames in a video are used to form a sequence. Since the videos are not of fixed length, the number of frames selected during preprocessing will vary, but the LSTM is fed with a fixed frame length of 30. This frame length of 30 was selected after experimenting with different frame lengths, such that the model is able to give better accuracy with fewer frames.
4 Experiment Setup 4.1 Dataset The BU-4DFE [14] database is used for the experiments in this work. Facial expression videos are captured from 101 subjects. The database has six emotions posed by 101 subjects, which constitute 606 3D sequences with the labels anger, disgust, happiness, fear, sadness, and surprise. Videos are captured at a rate of 25 frames per second, and each video consists of frames starting from neutral up to the emotion expressed. There are 43 men and 58 women among the subjects. The subjects are between the ages of 18 and 45 and come from a variety of ethnic and racial backgrounds, including black, white, hispanic/latino, and Asian ancestry. Experiments are done using a subset of 80 subjects. About 80% of the selected videos are used for training and 20% for testing: the training set consists of 384 videos, and the test set consists of 96 videos.
4.2 Dataset for Training Prototypical Network For training the prototypical network, a dataset of images with seven classes is created. Angry, Disgust, Fear, Happy, Sad, Surprise, and Neutral are the labels of this dataset. One image of each emotion class is created from a single frame of each video, and one frame which represents Neutral is taken from each subject. This results in 448 images for training the prototypical network.
4.3 Implementation Preprocessing. OpenCV library is used to extract the frames from the video. OpenCV Haar Cascade is used to detect faces in the frames. Face is then cropped from the frame and resized to 256 × 256 × 3 which is the input dimension of prototypical network. Training and Testing of PLERV. Embedding network of the prototypical network is CNN, and the model used for our experiment is shown in Table 1. Model consists of four convolutional layers with 64 filters in each layer. Prototypical network converts the frames to an embedding size of 4096 and then finds the distance to the prototype representation of support points. Steps of PLERV training and testing are provided in Fig. 4. Prototypical network is trained with the image dataset created with seven class labels. Model was trained in 3 epochs with 50 episodes in each epoch. Adam optimizer is used with learning rate 0.0006. After each epoch, learning rate was reduced by half. Loss is calculated as the average loss in all episodes.
Table 1 Embedding model of PLERV
Layer | Filter | Kernel size/pool size | Stride | Padding | Activation
Conv2D | 64 | 3,3 | 1,1 | 1,1 | ReLU
BatchNormalization | | | | |
MaxPooling | | 2,2 | 2,2 | |
Conv2D | 64 | 3,3 | 1,1 | 1,1 | ReLU
BatchNormalization | | | | |
MaxPooling | | 2,2 | 2,2 | |
Conv2D | 64 | 3,3 | 1,1 | 1,1 | ReLU
BatchNormalization | | | | |
MaxPooling | | 2,2 | 2,2 | |
Conv2D | 64 | 3,3 | 1,1 | 1,1 | ReLU
BatchNormalization | | | | |
MaxPooling | | 2,2 | 2,2 | |
FullyConnected | 4096 | | | |
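The embedding network summarised in Table 1 can be written, for instance, as the following Keras sketch. This is a reconstruction from the table rather than the authors' code, and flattening the final 16 × 16 × 64 feature map before the 4096-unit fully connected layer is an assumption about how that layer is reached.

```python
from tensorflow.keras import layers, models

def build_embedding_net(input_shape=(256, 256, 3), embedding_dim=4096):
    """Embedding network of Table 1: four Conv-BN-MaxPool blocks, then a 4096-d embedding."""
    m = models.Sequential()
    m.add(layers.Input(shape=input_shape))
    for _ in range(4):
        m.add(layers.Conv2D(64, kernel_size=3, strides=1, padding="same", activation="relu"))
        m.add(layers.BatchNormalization())
        m.add(layers.MaxPooling2D(pool_size=2, strides=2))
    m.add(layers.Flatten())
    m.add(layers.Dense(embedding_dim))
    return m

# build_embedding_net().summary()  # spatial size 256 -> 128 -> 64 -> 32 -> 16, 64 channels
```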
Fig. 4 Training and testing of PLERV
The next step is to generate the training dataset for the LSTM. For this, alternate frames from the video are selected and passed to the prototypical network after preprocessing. The prototypical network provides the classification for the set of frames in the video, which results in a sequence of classifications. Such a sequence is generated for all 384 videos in the training set and is given the label of the corresponding video. The LSTM is then trained using this generated dataset. The LSTM model used as part of PLERV is shown in Fig. 5, and the model gave the best accuracy with a sequence length of 30.
Fig. 5 LSTM model used in PLERV
Optimizer used is Adam with learning rate 0.00006, and number of epochs is 390. Dropout is 0.2.
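Figure 5 is not reproduced here, but an LSTM classifier consistent with the description above (input sequences of 30 frame-level class labels, six output emotions, dropout 0.2, Adam with learning rate 0.00006) might look like the following sketch; the hidden size and the use of an embedding layer to encode the integer class labels are assumptions rather than the authors' exact configuration.

```python
from tensorflow.keras import layers, models, optimizers

def build_sequence_classifier(seq_len=30, n_frame_classes=7, n_emotions=6):
    """LSTM over a sequence of frame-level predictions -> video-level emotion."""
    m = models.Sequential([
        layers.Input(shape=(seq_len,)),
        layers.Embedding(input_dim=n_frame_classes, output_dim=8),  # assumed label encoding
        layers.LSTM(64, dropout=0.2),                               # assumed hidden size
        layers.Dense(n_emotions, activation="softmax"),
    ])
    m.compile(optimizer=optimizers.Adam(learning_rate=0.00006),
              loss="sparse_categorical_crossentropy", metrics=["accuracy"])
    return m
```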
5 Result and Analysis The training accuracy of the prototypical network with 448 images, n_way = 7, n_query = 5, and n_support = 5 is 99.8%, and the loss is 3.9%. The accuracy plot and the loss plot of the LSTM model are given in Fig. 6. The accuracy of the PLERV method for each emotion class is shown in Table 2. 100% accuracy is obtained for the emotions Angry, Disgust, Happy, and Surprise, and an average accuracy of 89.58% is obtained for this method. Table 3 shows the accuracy of methods in the existing literature evaluated on the BU-4DFE dataset. References [9–11] mentioned in Table 3 use traditional feature engineering based on facial landmarks. The method in [8] uses a combination of a Siamese network with triplet loss and LSTM; the CNN used for generating the embedding in [8] is large and computationally intense. Also, the Siamese network uses one-shot learning, and hence its generalization capability is lower compared to the prototypical network. The prototypical network is a better meta-learning model for the few-shot learning task and gives better accuracy with fewer samples than the Siamese network. In addition, the method does not need any traditional feature engineering such as facial landmarks; instead, it makes use of the feature extraction capability of the CNN.
Fig. 6 Accuracy and loss plots of LSTM
Table 2 Accuracy of PLERV method
Emotion | Angry | Disgust | Fear | Happy | Sad | Surprise | Average accuracy
Accuracy | 100 | 100 | 81.25 | 100 | 56.25 | 100 | 89.58
Table 3 Comparison of PLERV with existing literature
References | Method | Accuracy (%)
Lawrance [8] | Siamese Neural Network with triplet loss and LSTM (STLEV) | 87.5
Suja [9] | Euclidean distance, Support Vector Machine (SVM) | 92.1
Suja [9] | Euclidean distance, Neural Network (NN) | 74
Kalyan Kumar [10] | Euclidean distance (25 points) | 93.06
Kalyan Kumar [10] | Euclidean distance (8 points) | 86.11
Gowri Patil [11] | Optical flow, NN | 75.25
Proposed method | Prototypical network and LSTM (PLERV) | 89.58
6 Conclusion In this paper, we presented PLERV, a novel approach for emotion recognition from videos that employs the metric meta-learning model prototypical network and LSTM. PLERV makes use of the video’s spatial and temporal aspects. A prototypical network is trained to classify the frames of a video, resulting in a sequence of classes from video. LSTM provides the classification of this sequence which corresponds to the emotion classification of the video. Experiments are done using six emotions of 80 subjects of BU-4DFE database. The method gave an accuracy of 89.5%.
Future work can explore the feasibility of model-based or optimization-based meta-learning algorithms. Also, the proposed method needs to be evaluated on an in-house developed dataset.
References 1. Wang Y, Yao Q, Kwok J, Ni LM (2020) Generalizing from a few examples: a survey on few-shot learning 2. Snell J, Swersky K, Zemel RS (2017) Prototypical networks for few-shot learning. arXiv:1703. 05175 [cs, stat]. Available at: http://arxiv.org/abs/1703.05175 (Accessed 19 Jun 2021) 3. Soumya K, Palaniswamy S (2020) Emotion recognition from partially occluded facial images using prototypical networks. In: 2020 2nd international conference on innovative mechanisms for industry applications (ICIMIA). 2020 2nd international conference on innovative mechanisms for industry applications (ICIMIA), Bangalore, India: IEEE, pp 491–497. https://doi. org/10.1109/ICIMIA48430.2020.9074962 4. Kuruvayil S, Palaniswamy S, Emotion recognition from facial images with simultaneous occlusion, pose and illumination variations using meta-learning. J King Saud Univ—Comput Inf Sci. https://doi.org/10.1016/j.jksuci.2021.06.012 5. Haddad J, Lézoray O, Hamel P (2020) 3D-CNN for facial emotion recognition in videos. In: International symposium on visual computing, San Diego (Virtual), United States. hal02955528f 6. Donahue J et al (2016) Long-term recurrent convolutional networks for visual recognition and description. arXiv:1411.4389 [cs]. Available at: http://arxiv.org/abs/1411.4389 (Accessed: 19 Jun 2021) 7. Abdullah M, Ahmad M, Han D (2020) Facial expression recognition in videos: an cnn-lstm based model for video classification. In: 2020 international conference on electronics, information, and communication (ICEIC). 2020 International conference on electronics, information, and communication (ICEIC), Barcelona, Spain, IEEE, pp 1–3. https://doi.org/10.1109/ICEIC4 9074.2020.9051332 8. Lawrance D, Palaniswamy S (2021) Emotion recognition from facial expressions for 3D videos using siamese network. In: 2021 international conference on communication, control and information sciences (ICCISc), pp 1–6. https://doi.org/10.1109/ICCISc52257.2021.9484949 9. Palaniswamy S, Tripathi S (2017) Geometrical approach for emotion recognition from facial expressions using 4d videos and analysis on feature-classifier combination. Int J Intell Eng Syst 10(3):30–39. https://doi.org/10.22266/ijies2017.0430.04 10. Kalyan Kumar VP, Suja P, Tripathi S (2016) Emotion recognition from facial expressions for 4d videos using geometric approach. In: Thampi SM et al (eds) Advances in signal processing and intelligent recognition systems. Cham, Springer International Publishing, pp 3–14. https:// doi.org/10.1007/978-3-319-28658-7_1 11. Patil G, Suja P (2017) Emotion recognition from 3D videos using optical flow method. In: 2017 international conference on smart technologies for smart nation (SmartTechCon). 2017 international conference on smart technologies for smart nation (SmartTechCon), Bangalore, IEEE, pp 825–829. https://doi.org/10.1109/SmartTechCon.2017.8358488 12. Suja P, Kalyan Kumar VP, Tripathi S (2015) Dynamic facial emotion recognition from 4D video sequences. In: 2015 eighth international conference on contemporary computing (IC3). 2015 eighth international conference on contemporary computing (IC3), Noida, India, IEEE, pp 348–353. https://doi.org/10.1109/IC3.2015.7346705
13. Hochreiter S, Schmidhuber J (1997) Long short-term memory. Neural Comput 9(8):1735–1780. https://doi.org/10.1162/neco.1997.9.8.1735 14. Yin L, Chen X, Sun Y, Worm T, Reale M (2008) A high-resolutionz3D dynamic facial expression database. In: 2008 8th IEEE international conference on automatic face gesture recognition, pp 1–6
A Review on Early Diagnosis of Parkinson’s Disease Using Speech Signal Parameters Based on Machine Learning Technique Rani Kumari and Prakash Ramachandran
Abstract Early diagnosis means that an individual gets an indication of the disease at a very early stage. Today, many people across the globe are suffering from Parkinson's disease (PD). Early detection of Parkinson's disease is a better choice, as it allows the disease to be treated much earlier. Vocal cord disorders and speech impairments/speech disorders are early indicators of PD, since the initial stage of PD affects the human speech production mechanism. These speech impairments are not apparent to common listeners, so the initial stage of PD should be monitored carefully using proper expert systems. In this review, we mainly focus on speech signal analysis for the identification of PD with the help of machine learning techniques. Voice samples of people affected by PD can be used in an early detection algorithm using various classification models with different accuracy, sensitivity, specificity, etc. In our review, we found that mainly two types of techniques have been used for this problem: (a) conventional feature-based techniques and (b) machine learning-based techniques. A detailed review of both types of algorithms is presented in this paper. In feature-based applications, the mel frequency cepstral coefficient (MFCC) and linear predictive coding (LPC) are the most used features. Machine learning-based algorithms use intelligent architectures like the artificial neural network (ANN), convolutional neural network (CNN), hidden Markov model (HMM), XGBoost, support vector machine (SVM), etc. It is found that machine learning-based algorithms perform better in terms of accuracy, but with some limitations. Keywords Accuracy · Algorithm · Classification model · Early diagnosis · Machine learning · Parkinson's disease · Speech signal
R. Kumari (B) · P. Ramachandran School of Electronics Engineering, VIT, Vellore, India e-mail: [email protected] P. Ramachandran e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 N. Subhashini et al. (eds.), Futuristic Communication and Network Technologies, Lecture Notes in Electrical Engineering 966, https://doi.org/10.1007/978-981-19-8338-2_18
1 Introduction Parkinson's disease (PD) has emerged as the second most common neurological ailment of the human central nervous system, following Alzheimer's [1]. People suffering from PD cannot perform physical activities like standing, walking, and speaking like a normal person [2]; the support of somebody else is required to do these activities, and in daily life we can see its effects [3]. It is more common as age increases (above 50 years). As per a survey conducted in the USA by some agencies, more than one million people were estimated to be affected by PD by 2020 [4]. In Australia, 10,000 people and in the UK 1,45,000 people are suffering from Parkinson's disease [5], and India had around half a million people affected by Parkinson's disease up to 2016. The ratio of males and females is approximately equal, but males are more affected than females [2]. The term Parkinson's disease was coined for this paralysis by Dr. James Parkinson in the year 1817 [6]. Till now, the cause of Parkinson's disease has not been deciphered; we can only detect the sign of PD at an early stage through some form of speech and language disorder. Kathe S. Perez et al. (1996) studied laryngeal abnormalities using visual-perceptual ratings of endoscopic and stroboscopic investigations of 22 patients with idiopathic PD and 7 patients with Parkinson's plus syndromes. Pitch and loudness tremor was observed in most of the 29 patients, and laryngeal tremor was observed in PD patients at the first stage. The most significant stroboscopic findings for the idiopathic PD patients were abnormal phase closure and phase asymmetry. This paves the path for the use of speech analysis methods to detect PD at an early stage. Speech disorders have various areas like degeneration of spoken language (dysprosody), voice degeneration (dysphonia), and articulation (dysarthria). Parkinson's speech also has some features such as a silent voice, tremor of the voice, soft and monotonous speech, and imprecise dysarthria [7]. Approximately 90–95% of patients develop voice impairment [4] that might not be noticeable to listeners [1]. Parameters like voice pitch, subtle changes in voice frequency (jitter), the cycle-to-cycle difference in voice magnitude (shimmer), volume (amplitude), and frequency can be used to identify PD patients among various subjects [8]. Machine learning methods also have a great potential to give results with high accuracy [4, 9]. The supervised machine learning technique is used for real-life applications because it is the best technique in terms of accuracy [4]. In this paper, we made an attempt to review the research work done on early prediction of PD using speech processing. We logically divided the techniques into two different categories: (a) conventional feature-based techniques and (b) machine learning-based techniques.
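To make the acoustic measures mentioned above concrete, local jitter and shimmer can be computed from the sequence of pitch periods and cycle peak amplitudes of a sustained vowel. The sketch below uses their standard cycle-to-cycle definitions and assumes the period and amplitude sequences have already been extracted (for example, with Praat or a pitch tracker); the healthy-voice range quoted in the comment is a commonly used rule of thumb, not a result from this review.

```python
import numpy as np

def local_jitter(periods):
    """Mean absolute difference of consecutive pitch periods, as % of the mean period."""
    periods = np.asarray(periods, dtype=float)
    return 100.0 * np.mean(np.abs(np.diff(periods))) / np.mean(periods)

def local_shimmer(amplitudes):
    """Mean absolute difference of consecutive peak amplitudes, as % of the mean amplitude."""
    amps = np.asarray(amplitudes, dtype=float)
    return 100.0 * np.mean(np.abs(np.diff(amps))) / np.mean(amps)

# Example on a sustained /a/: per-cycle periods (s) and peak amplitudes
# jitter_pct = local_jitter(periods)     # healthy voices typically stay around 1% or below
# shimmer_pct = local_shimmer(amplitudes)
```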
2 Conventional Feature-Based Techniques Feature-based techniques are devised based on the speech production mechanism and an understanding of how speech signal parameters are affected by PD. Abhishek et al. (2020) used feature extraction and related algorithms utilizing parameters like jittering, shimmering, and voice signal frequency; the accuracy lies between 85% and 95% [8]. Gerald [10] made several observations in this research: (a) the intensity of voice can characterize the speech of a person having PD as compared to a healthy person; (b) PD patients have a higher pitch level than normal persons; (c) the speech delivery speed varies between PD patients and normal subjects. In the same way, pausing, phrasing, and syllable duration also show small differences between the two groups, but at the individual level a large deviation was observed [10]. The observations of Rusz et al. [11] reveal that seventy-eight percent of untreated PD patients develop a vocal disorder. From 19 representative measures, a significant deviation of fundamental frequency in monologue and emotional sentences was obtained, giving the best information to separate healthy controls from PD. Jitter is expected to give the highest accuracy, but other measures also achieved sufficient accuracy [11]. Hartelius and Svensson [12] observed symptoms of speech impairment and swallowing [12]. E. Jeffrey Metter et al. (1986) conducted studies using X-ray computed tomography (CT) and (F-18)-fluorodeoxyglucose (FDG) positron emission tomography (PET). A great deviation of speech characteristics was observed clinically and acoustically. These results emphasize that a good understanding of hypokinetic dysarthria can explain the large variability among patients [13]. Kristin [14] compared the electromyographic activity of the thyroarytenoid (TA) muscle of young and old persons with idiopathic Parkinson disease (IPD) at known loudness. Reduced activity relates to the characteristic hypophonic voice disorder (HVD) that usually accompanies IPD and aging [14]. Nelson Roy et al. studied articulatory variation over the course of therapy for muscle tension dysphonia. Two acoustic measures were used, namely centralization and decentralization, which characterize muscle tension dysphonia. The results show that manual circumlaryngeal therapy is advantageous for both the phonatory and articulatory systems [15]. Rhonda J. Holmes et al. (2000) tested the acoustic and perceptual voice properties of people with Parkinson’s (PWP). Compared with earlier work, this approach permitted more specificity in the description of voice characteristics [16]. Jun Shao et al. (2010) used parameters like percent jitter, percent shimmer, amplitude tremor intensity index (ATRI), frequency tremor intensity index (FTRI), and fundamental amplitude tremor frequency (Fatr) from the multi-dimensional voice program (MDVP). They also computed the nonlinear dynamic parameter (D2) from normal sounds of healthy people and tremulous sounds from affected people. D2 was very fruitful for distinguishing normal sound from tremulous sound [17]. Spencer et al. describe that people affected by PD are not able to do their work by themselves and also show reduced capability in speech motor programming and movement onset; their results provide preliminary support for treating ataxic and hypokinetic dysarthria as distinct motor execution disorders [18].
Fletcher et al. describe time-by-count measurement as a generalizable and accurate measurement of diadochokinetic syllable rate and also discuss the effects of sex, age, and stimulus order [19]. The importance of Sabine’s [20] research lies in determining the alteration of speech rate and pitch in PD-affected people relative to healthy ones. A special pattern of speech rate can be determined in terms of articulatory acceleration, and the development of this parameter due to the disease is stronger in male patients [20]. Jiri Mekyska et al. (2011) determined the optimal parameters for the description of the hypokinetic dysarthria speech disorder in PD. The speech parameters have two parts: one is used to identify hypokinetic dysarthria (HD), and the second is used to observe the development of the disorder. Their optimization allows for the best automatic classification [21]. Alexander M. Goberman et al. (2005) show that listeners can (a) distinguish the voice produced by a healthy person, (b) describe the voice produced by a person with PD, and (c) identify the strategies used by people with PD when producing speech. Compared with earlier research, it appears that PD speakers adopt a clear speaking style as an attempt to manage some of the speech deficits that accompany PD [22]. Max A. Little et al. (2008) found that non-standard measures combined with the conventional harmonics-to-noise ratio form an excellent method for differentiating a normal subject from an affected one and used it in a telemonitoring application. Further, applying this knowledge to the results of natural pitch deviation led to a new measure, pitch period entropy (PPE), giving a vital performance increase [23]. Rahn III et al. [24] closely examined a stage-variant and medication-variant group of PD patients and correlated various aspects of speech deterioration as PD progresses among patients [24]. Shimon Sapir et al. (2010) used the formant centralization ratio (FCR) and vowel space area (VSA) with the vowels /a/, /i/, and /u/ to see how well these metrics function on different vowels. The observations show that FCR is a dependable acoustic metric for segregating dysarthria from undistorted speech and for testing therapy effects [25]. Sabine Skodda et al. (2011) found the vowel articulation index (VAI), assumed to be better than the triangular vowel space area (TVSA), to be a sign of very small alterations of vowel articulation and beneficial for the early detection of distorted vowel articulation in PD. The prime result provides further confirmatory evidence of the benefit of VAI as an alternative indicator of vowel articulation at multiple levels of speech distortion [26]. Shimon [27] gives sustained support for the therapeutic results of the Lee Silverman voice treatment (LSVT) on articulatory operation in individuals having PD. The results show a positive influence on the speech process without alteration of other processes [27]. Jayram et al. observed in detail the segmentation performance of two segmentation methods, (a) the spectral transition measure (STM)-based segmentation and (b) the maximum likelihood (ML) segmentation, for various feature representations. They performed a thorough study of the robustness of the various feature representations to additive white noise.
They found that MFCC features with a novel lifter given by $1 + \sin^{1/2}(\pi n/L)$, $0 < n < L-1$, where L is the MFCC dimension, produced the best outcome in comparison with all LPC-derived features in both
clean and noisy speech up to −10 dB SNR [28]. The prime focus of the paper by Murty et al. is to show the complementarity of the residual phase and to reveal that this information indeed helps to improve the functioning of a traditional system based on spectral characteristics such as MFCC. The speaker recognition system results in an equal error rate (EER) of 22% while using only residual phase information and 14% while using only the MFCC features. The combination of the two gives an EER of 10.5%, which is superior to both individual systems [29]. Borrie et al. focus on utilizing perceptual learning for rehabilitative gain; it has so far been explored mainly in the context of the management of dysarthria, yet its application has a wider range. The learning source may be differentially emphasized by the nature of acoustic degradation [30]. Ahmed Ali et al. used the maximum normalized spectral slope (MNSS), a recent concept that represents the amplitudes and characteristics of the spectrum. They also found that average localized synchrony detection (ALSD) is very fruitful for voicing detection and articulation detection, converting the acoustic abstract property into a measurable parameter [31]. Sorin Dusan et al. (2007) show that polynomial approximation provides a beneficial and easy method for the reduction of speech parameters. Their paper addresses the theoretical problems and experimental outcomes related to this kind of reduction. A practical application in a 2400 b/s speech coder is demonstrated with objective and subjective evaluations of its functioning under noisy environments. The novel speech coder operates at a transfer rate of 1553 b/s and under all noisy surroundings performs better than the 2400 b/s standard speech coder [32]. Vikas et al. present different methods for the early-phase detection of the disease using voice analysis. Voice features like pitch, formants, jitter, shimmer, glottal pulse, and MFCC have been examined for both healthy and PD-affected males as well as females. The observations obtained were: (a) males with PD have a higher pitch than normal ones; (b) formants show more deviation in comparison with a normal person; (c) the glottal pulse has a similar pattern in normal persons but varies in PD persons; and (d) PD patients have a higher jitter value [33]. Based on this review of conventional feature-based techniques, the consolidated features are listed in Table 1. The review also reveals that a lot of statistical parameters are helpful for PD diagnosis. Some of these statistical parameters are mean intensity, standard deviation of intensity, mean fundamental frequency, standard deviation of fundamental frequency, relative standard deviation of fundamental frequency, variance, median, fundamental frequency variability, and Pearson correlation.
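Since MFCC and LPC recur throughout the feature-based studies reviewed above, a brief illustration may be useful. The following is a minimal sketch, not the pipeline of any reviewed paper, of extracting MFCC, LPC, and a simple jitter-style statistic from a voice recording; it assumes the librosa library and a hypothetical file name voice_sample.wav.

```python
# Minimal sketch of conventional feature extraction for PD screening.
# Assumptions: librosa is installed; "voice_sample.wav" is a hypothetical
# sustained-vowel recording, not a file from any of the reviewed studies.
import numpy as np
import librosa

y, sr = librosa.load("voice_sample.wav", sr=None)    # keep the native sampling rate

# Mel frequency cepstral coefficients (MFCC): 13 coefficients per frame.
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)

# Linear predictive coding (LPC): 12th-order all-pole model of the vocal tract.
lpc = librosa.lpc(y, order=12)

# Fundamental frequency track, used here only for a crude relative-jitter estimate.
f0 = librosa.yin(y, fmin=60, fmax=400, sr=sr)
f0 = f0[np.isfinite(f0) & (f0 > 0)]
periods = 1.0 / f0
jitter = np.mean(np.abs(np.diff(periods))) / np.mean(periods)

print("MFCC matrix shape:", mfcc.shape)
print("First LPC coefficients:", lpc[:5])
print("Mean F0: %.1f Hz, approximate jitter: %.4f" % (f0.mean(), jitter))
```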
3 Machine Learning-Based Techniques For the early diagnosis of PD, machine learning methods involve many steps such as study participants [34], data collection [34], testing, data processing [34], data analysis [34], machine learning [34], and statistical analysis [35], and the whole process is shown in Fig. 1.
Table 1 List of conventional feature-based techniques

| S. No. | List of features | Comments/Observations |
|---|---|---|
| 1 | Jittering, shimmering, and voice signal frequency | The range of accuracy lies between 85% and 95% |
| 2 | Pausing, phrasing, and syllable duration | PD subjects have a higher pitch level than normal persons |
| 3 | Phonation, articulation, and prosody | Like jitter, these also achieved sufficient accuracy |
| 4 | Centralization and decentralization | Physical circumlaryngeal therapy is advantageous for both phonatory and articulatory systems |
| 5 | Percent jitter, percent shimmer, ATRI, FTRI, and Fatr | The nonlinear dynamic parameter was very fruitful for distinguishing normal sound from tremulous sound |
| 6 | Speech rate and pitch | Determined in terms of articulatory acceleration |
| 7 | Hypokinetic dysarthria | Best automatic classification is demonstrated |
| 8 | Harmonics-to-noise ratio and pitch period entropy (PPE) | Demonstrated in a telemonitoring application |
| 9 | Formant centralization ratio (FCR) and vowel space area (VSA) | Dependable acoustic metrics for segregating dysarthria from undistorted speech and for testing therapy effects |
| 10 | Vowel articulation index (VAI) and triangular vowel space area (TVSA) | VAI, assumed to be better than TVSA, is beneficial for the early detection of distorted vowel articulation in PD |
| 11 | Mel frequency cepstral coefficient (MFCC), linear predictive coding (LPC), formants, and glottal pulse | Popular speech features |
| 12 | Maximum normalized spectral slope (MNSS) and average localized synchrony detection (ALSD) | ALSD is very fruitful for voice detection and articulation detection, converting the acoustic abstract property into a measurable parameter |
Various machine learning-based algorithms are reviewed in this paper. Radha et al. [3] use classification methods to segregate the samples of people affected by PD from healthy people based on the convolution neural network (CNN), artificial neural network (ANN), and hidden Markov model (HMM). The CNN uses the spectrogram of the speaker information, while the other methods use only acoustic features. On comparing ANN with HMM and CNN, the former gave the best result, i.e., ANN gives a 96.20% recognition rate, while CNN provides 88.33% and HMM 95% [3]. Shamrat et al. use three algorithms for Parkinson’s disease analysis, i.e., the support vector machine (SVM), k-nearest neighbor (k-NN), and linear regression (LR). SVM gave 100% accuracy, while LR gave 97% and k-NN only 60% for PD detection [4].
Fig. 1 Machine learning flow
Liaqat [1] used the linear discriminant analysis (LDA) model for dimensionality reduction and a neural network model for classification, with a leave-one-subject-out (LOSO) scheme. The experimental results showed that the newly developed method efficiently distinguishes affected persons from healthy ones with a precision of 95% and 100% on the training and testing databases, respectively [1]. Pahuja et al. [36] gave a new way to create an identification model by combining biomarkers and striatal binding ratio (SBR) values. The accuracy of the newly developed model can be tested using a known test database. SBR gives values for four brain regions along with five biological biomarkers. With the addition of one or more biological markers from plasma, PD identification gives 99.62% accuracy [36]. Rehman et al. (2019) used a comprehensive machine learning approach in which a combination of spatial-temporal gait characteristics forms the most important feature set for classification. Random forest (RF), SVM, and LR emerged as the best classifiers for their database, with RF giving 97% accuracy [37]. Akshay S et al. (2019) used a feed-forward technique to distinguish a PD patient from a healthy person, and it gives the best results [2]. Berus et al. used artificial neural networks with various configurations to make predictions of PD. A leave-one-subject-out (LOSO) scheme validated the results, and Pearson’s correlation coefficient, Kendall’s correlation coefficient, principal component analysis, and self-organizing maps were used. Kendall’s correlation coefficient gave the best results with 86% accuracy [7]. Zehra [9] presents a machine learning-based diagnosis of PD comprising feature selection and classification processes. Feature selection uses feature importance and the recursive feature elimination method, whereas classification uses regression trees, ANN, and SVMs. Recursive feature elimination gives the highest accuracy of 93.84% [9]. Kamran et al. [5] present an early diagnosis of PD; to overcome the high-variability challenge, they used combined PD handwriting datasets and transfer learning-based algorithms, which give an excellent result with 99.22% accuracy [5]. Caliskan et al. help improve the diagnosis of PD by using a deep neural network classifier for the detection of speech impairments. The given classifier proves to be better than earlier classifiers on the Oxford Parkinson’s Disease Detection (OPD) and Parkinson Speech Dataset (PDS) data and makes efficient classification [38]. Nissar et al. focus on how to evaluate PD diagnosis by analysing voice signals.
Various classifiers have been used for the voice data. Among all the classifiers, it has been concluded that XGBoost is the best. It achieves 92.76% accuracy with the recursive feature elimination (RFE) feature selection technique, and when the minimum redundancy maximum relevance (mRMR) feature selection technique is considered, it attains 95.39% accuracy [6]. Sakar et al. look into the Parkinson dataset with the help of known learning tools. Their study provides an opportunity to widen the acceptability and validity of the earlier developed models and can motivate biomedical signal processing and machine learning studies by giving differentiated result datasets for PD models [39]. The experiments of Hazan et al. reveal that (a) early identification of PD with the help of such data appears practical and precise, with results up to 90% on the two datasets, (b) some characteristics of the datasets appear to be language-dependent, (c) despite the earlier point, machine learning can be successfully used to train and test on a variety of datasets together, and (d) machine learning can be trained in one country and tested in a different one [40]. Frid et al. (2014) show that a differential diagnosis can be derived directly from the analog speech signal itself. A distinction can be found among 7 different stages of the disease, and the process showed that the acoustic signal itself can replace the choice, selection, and combination of features [41]. Cosi et al. [42] used hybrid artificial neural network/hidden Markov model systems with the Centre for Spoken Language Understanding (CSLU) speech tools for development and implementation. When the different front-end processing schemes and systems were compared using their best configurations, they obtained 98.92% accuracy in word recognition and 92.62% in the FIELD continuous digit recognition task. The best recently developed Italian recognizer consists of a CSLU toolkit design module, and a simple real-time Italian-language demonstration program has been developed [42]. Frid et al. elaborate an acoustic-phonetic analysis of the voiceless fricatives to obtain good characteristics for their classification. An SVM is trained on a certain set of phonemes and verified on all the phonemes. The process gives outstanding results, even though all fricatives have been used [43]. Hasegawa-Johnson et al. [44] focus on automatic isolated digit recognition for speakers with reduced intelligibility due to various symptoms associated with spastic dysarthria. The dynamic features of the hidden Markov model (HMM) give it a certain level of robustness against increased word-length variation, whereas the regularized discriminative error criterion of the SVM gives it a certain level of robustness against reduction and deletion of consonants [44]. Santiago Omar Caballero et al. (2009) correct the errors in a model of disordered speech instead of adapting the acoustic models. For this work, there are two approaches: (a) metamodels, which include a phonetic confusion matrix, and (b) weighted finite-state transducers. These methods allow the rectification of errors at the phonetic level and the determination of the language model, and they perform best at the experimental level [45].
Rabiner [46] carefully and methodically reviews the theoretical basis of this type of data modeling and demonstrates how it has been utilized for a particular problem in machine recognition of speech [46]. Tsanas et al. [47] tested how precisely new dysphonia algorithms can be applied to segregate the
affected subjects from healthy ones. Two classifiers were used, random forests and the SVM. Previously available datasets comprising 263 samples from 43 subjects were used, and it was demonstrated that new dysphonia methods can perform very well with state-of-the-art results, giving 39% overall precision of classification while using only 10 dysphonia characteristics. They observed that some of the latest proposed dysphonia methods complement available algorithms in enhancing the ability of classifiers to segregate the sufferers from the healthy ones [47]. Sakar et al. focus on the significance of quick diagnosis of PD. They study the ability of vocal characteristics using machine learning techniques following a twofold approach. First, they target finding the group with relatively higher severity of speech distortions with the help of patient data; for this, they used three supervised and two unsupervised machine learning techniques. Second, they create samples of PD patients with lower severity and use three classifiers with many settings to address this binary classification problem. Under this process, 96.40% was the maximum accuracy and 0.77 was the Matthews correlation coefficient [48]. Tsanas et al. apply signal processing algorithms to speech; from the data, they obtain medically useful characteristics of PD progression. Here, they use linear as well as nonlinear regression techniques, consisting of classical least squares and nonparametric classification and regression trees. They verify the results with the available database. These results corroborate the possibility of regular, remote, and precise unified Parkinson’s disease rating scale (UPDRS) tracking [49]. Benba et al. obtained the voiceprint from each collected voice sample by compressing the frames to their average value. For classification, they applied the leave-one-subject-out validation scheme with SVMs using various kernels. When using the first 12 MFCC coefficients with a linear-kernel SVM, the best accuracy of 91.17% was achieved [50]. Machine learning techniques play a crucial role in numerous areas, especially in the health sector. Machine learning provides a dynamic output when data is fed into it, and it is very advantageous for the diagnosis of disease. Nowadays, the use of machine learning algorithms is in trend; when more data is used, precision is enhanced and more accurate predictions are obtained [51]. To check the working of machine learning methods for segregating Parkinson patients and for the determination of accuracy [50], the following metrics are applied:
• Confusion matrix [7]
• Sensitivity [52]
• Specificity [52]
• Accuracy [52]
• Precision [52]
• F-1 score [52]
The confusion matrix determines four terms: true positive (TP), true negative (TN), false positive (FP), and false negative (FN) [7].
Sensitivity is the ability to correctly detect patients with Parkinson’s disease. It is defined as the ratio of true positives to the sum of true positives and false negatives [52]. Specificity reveals the proportion of correct negative predictions; it is the ability to correctly detect normal subjects. Mathematically, it is expressed as the ratio of true negatives to the sum of true negatives and false positives [52]. Accuracy is the ratio of the sum of true positives and true negatives to the sum of true positives, true negatives, false positives, and false negatives. It is directly proportional to correct prediction, i.e., the higher the accuracy, the better the prediction, and vice versa [52]. Precision measures the correctness of positive predictions. It is defined as the ratio of true positives to the sum of true positives and false positives [52]. The F-1 score is the combined effect of precision and sensitivity. It is expressed as the ratio of two times the true positives to the sum of two times the true positives, the false positives, and the false negatives [52]. Apart from these, we can also use the area under the curve (AUC) characteristic [52]. Table 2 consolidates the machine learning-based techniques and their performance. Information related to the datasets used in the various works is also included in the table.
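As a small illustration of the definitions above, the snippet below computes these metrics directly from the four confusion-matrix counts; the counts in the example call are made-up numbers, not results from any reviewed study.

```python
# Sketch: evaluation metrics defined above, computed from confusion-matrix counts.
def pd_metrics(tp: int, tn: int, fp: int, fn: int) -> dict:
    sensitivity = tp / (tp + fn)                 # true positive rate
    specificity = tn / (tn + fp)                 # true negative rate
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    f1 = 2 * tp / (2 * tp + fp + fn)             # combines precision and sensitivity
    return {"sensitivity": sensitivity, "specificity": specificity,
            "accuracy": accuracy, "precision": precision, "f1": f1}

# Example with illustrative (made-up) counts for a PD-vs-healthy classifier.
print(pd_metrics(tp=45, tn=40, fp=5, fn=10))
```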
4 Discussion Feature-based applications: The very early work [53] paves the path for PD detection from speech and throws light on the need for speech feature analysis to detect PD at an early stage. Later, in feature-based applications, various features like jitter, shimmer, harmonics-to-noise ratio, mel frequency cepstral coefficient (MFCC), LPC, voice signal frequency, ATRI, FTRI, Fatr, speech rate, pitch variation, pitch, PPE, formants, glottal pulse, vowel space area, formant centralization ratio (FCR), etc., have been used. MFCC and LPC have mostly been used, as they have better properties. Machine learning-based algorithms: In machine learning-based algorithms, the support vector machine (SVM), convolution neural network (CNN), artificial neural network (ANN), hidden Markov model (HMM), LR, LDA, LOSO validation, biomarkers, RF, Pearson correlation coefficient, Kendall’s correlation coefficient, principal component analysis, regression trees, deep neural networks, XGBoost, RFE, mRMR, random forest, Matthews correlation coefficient, classical least squares, nonparametric separation, linear kernels, etc., have been used. They gave better results in terms of highest accuracy as compared to the others.
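Several of the studies summarized above pair such classifiers with a leave-one-subject-out (LOSO) validation scheme. The sketch below shows that general experimental setup with scikit-learn; the feature matrix, labels, and subject identifiers are synthetic placeholders, and the linear-kernel SVM merely stands in for whichever classifier a given study actually used.

```python
# Sketch of a LOSO evaluation for PD-vs-healthy classification on voice features.
# X, y, and subjects are placeholders, not data from any reviewed paper.
import numpy as np
from sklearn.model_selection import LeaveOneGroupOut, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(120, 20))           # 120 recordings x 20 acoustic features
subjects = np.repeat(np.arange(40), 3)   # 3 recordings per subject
y = (subjects % 2 == 0).astype(int)      # synthetic PD / healthy labels

model = make_pipeline(StandardScaler(), SVC(kernel="linear", C=1.0))
scores = cross_val_score(model, X, y, groups=subjects, cv=LeaveOneGroupOut())
print("LOSO accuracy: %.3f +/- %.3f" % (scores.mean(), scores.std()))
```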
5 Conclusion This work presents a literature review on speech analysis-based early detection of PD, with the objective of segregating affected patients from healthy ones to give an early diagnosis that can reduce the complexities.
Table 2 List of machine learning-based techniques

| S. No. | Author’s name (year) | Methods used | Datasets | Accuracy | Remarks |
|---|---|---|---|---|---|
| 1 | Radha et al. [3] | Convolution neural network (CNN), artificial neural network (ANN), and hidden Markov model (HMM) | Dataset from King’s College London Hospital, Denmark Hill, London: 37 sound recordings in total; after sub-slicing, 191 healthy-control and 131 PD audio samples | ANN gives a 96.20% recognition rate, CNN 88.33%, and HMM 95% | Among all, ANN is the best |
| 2 | Shamrat et al. [4] | Support vector machine (SVM), k-nearest neighbor (k-NN), and linear regression (LR) | 62 individuals with PD and 15 healthy subjects | SVM 100%, LR 97%, and k-NN only 60% for PD detection | Among all, SVM is the best |
| 3 | Liaqat [1] | Linear discriminant analysis (LDA) for dimensionality reduction and a neural network model for classification | Training database of 20 PD patients and 20 healthy subjects | – | The newly developed method efficiently distinguishes affected persons from healthy ones with a precision of up to 95% |
| 4 | Rehman et al. [37] | SVM, RF, and LR | 1040 audio records | RF gave 97% accuracy for classification | – |
| 5 | Berus et al. [7] | Artificial neural network (ANN) | 20 healthy controls (10 male, 10 female) and 20 patients with PD (14 male, 6 female) | Kendall’s correlation coefficient gave the best results with 86% accuracy | Kendall’s correlation coefficient is the best among all |
| 6 | Zehra [9] | Artificial neural network (ANN) and support vector machine (SVM) | 31 people | Recursive feature elimination gives the highest accuracy of 93.84% | – |
| 7 | Caliskan et al. [38] | Deep neural network classifier | 1040 samples for the training set and 168 samples for the testing set | – | The given classifier proves to be better than earlier classifiers on the Oxford Parkinson’s Disease Detection (OPD) and PDS datasets |
| 8 | Nissar et al. [6] | XGBoost | 188 PD patients (107 men, 81 women) and control individuals (41 women, 23 men) | Achieves 92.76% accuracy | Among all the classifiers, XGBoost is concluded to be the best |
| 9 | Cosi et al. [42] | Artificial neural network/hidden Markov model | More than 1000 speakers | 98.92% accuracy in word recognition and 92.62% in the FIELD continuous digit recognition task | – |
| 10 | Frid et al. [43] | Support vector machine (SVM) | More than 35 parameters and features in both the time domain and the frequency domain were studied | – | The process gives outstanding results |
| 11 | Hasegawa-Johnson et al. [44] | Hidden Markov model (HMM) | Data recorded from three subjects with spastic dysarthria: two male and one female | – | The dynamic features of the HMM give a certain level of robustness against increased word-length variation |
| 12 | Tsanas et al. [47] | Random forests and support vector machine (SVM) | 263 samples from 43 subjects | 39% overall precision of classification while using only 10 dysphonia characteristics | The latest proposed dysphonia methods complement available algorithms in enhancing the ability of classifiers to segregate sufferers from healthy ones |
| 13 | Benba et al. [50] | Support vector machine (SVM) | 17 patients with PD (6 female, 11 male) and 17 healthy people (8 female, 9 male) | 91.17% was the best accuracy | – |
The speech-based model gives the best accuracy in predicting PD. This model can be helpful to patients living in far-off places where medical facilities are not available, and it will be outstandingly helpful for vulnerable people. In our review, we found that mainly two types of techniques have been used for this problem: (a) conventional feature-based techniques and (b) machine learning-based techniques.
A detailed review of these types of algorithms is presented in this paper. In feature-based applications, the mel frequency cepstral coefficient (MFCC) and LPC are the most used features. Machine learning-based algorithms use intelligent architectures like the artificial neural network (ANN), convolution neural network (CNN), hidden Markov model (HMM), XGBoost, support vector machine (SVM), etc. It is found that machine learning-based algorithms are doing better in terms of highest accuracy but with some limitations. Some of the machine learning algorithms we discussed in our review directly process the raw speech.
The total number of speech samples input to the neural network is very high, and sometimes useful features salient to PD are missed during the pre-processing stage. The collection of large samples from medical institutions (clinics and centres) can enhance the accuracy of detection and diagnosis. This research work creates a new direction for future work in this field. Nowadays, machine learning-based algorithms are doing better in terms of highest accuracy but with some limitations, and these limitations can be reduced if MFCC and LPC parameters are used.
References 1. Ali L, Zhu C, Zhang Z, Liu Y (2019) Automated detection of Parkinson’s disease based on multiple types of sustained phonations using linear discriminant analysis and genetically optimized neural network. IEEE J Transl Eng Health Med 7:2000410 2. Akshay S, Vincent K (2019) Identification of Parkinson disease patients classification using feed forward technique based on speech signals. IJEAT 8(5)
3. Radha N, Rm SM, Sameera HS (2021) Parkinson’s disease detection using machine learning techniques. J Adv Res Dyn Control Syst 30(2):543 4. Shamrat FMJM, Asaduzzaman M, Rahman AKMS, Tusher RTH, Tasnim Z (2019) A comparative analysis of Parkinson disease prediction using machine learning approaches. Int J Sci Technol Res 8(11):2576–2580. ISSN: 2277-8616 5. Kamran I, Naz S, Razzak I, Imran M (2021) Handwriting dynamics assessment using deep neural network for early identification of Parkinson’s disease. Future Gener Comput Syst 117:234–244 6. Nissar I, Rizvi D, Masood S, Mir A (2019) Voice-based detection of Parkinson’s disease through ensemble machine learning approach: A performance study. EAI Endorsed Trans Pervasive Health Technol 5(19):162806 7. Berus L, Klancnik S, Brezocnik M, Ficko M (2018) Classifying Parkinson’s disease based on acoustic measures using artificial neural networks. Sensors (Basel) 19(1):16 8. Abhishek MS, Chethan CR, Aditya CR, Divitha D, Nagaraju TR (2020) Diagnosis of Parkinson’s disorder through speech data using machine learning algorithms. 9(3):2278–3075 9. Karapinar Senturk Z (2020) Early diagnosis of Parkinson’s disease using machine learning algorithms. Med Hypotheses 138(109603):109603 10. Canter GJ (1963) Speech characteristics of patients with Parkinson’s disease: I. intensity, pitch, and duration. J Speech Hear Disord 28:221–229 11. Rusz J, Cmejla R, Ruzickova H, Ruzicka E (2011) Quantitative acoustic measurements for characterization of speech and voice disorders in early untreated Parkinson’s disease. J Acoust Soc Am 129(1):350–367 12. Hartelius L, Svensson P (1994) Speech and swallowing symptoms associated with Parkinson’s disease and multiple sclerosis: a survey. Folia Phoniatr Logop 46(1):9–17 13. Metter EJ, Hanson WR (1986) Clinical and acoustical variability in hypokinetic dysarthria. J Commun Disord 19(5):347–366 14. Baker KK, Ramig LO, Luschei ES, Smith ME (1998) Thyroarytenoid muscle activity associated with hypophonia in Parkinson disease and aging. Neurology 51(6):1592–1598 15. Roy N, Nissen SL, Dromey C, Sapir S (2009) Articulatory changes in muscle tension dysphonia: evidence of vowel space expansion following manual circumlaryngeal therapy. J Commun Disord 42(2):124–135 16. Holmes RJ, Oates JM, Phyland DJ, Hughes AJ (2000) Voice characteristics in the progression of Parkinson’s disease. Int J Lang Commun Disord 35(3):407–418 17. Shao J, MacCallum JK, Zhang Y, Sprecher A, Jiang JJ (2010) Acoustic analysis of the tremulous voice: assessing the utility of the correlation dimension and perturbation parameters. J Commun Disord 43(1):35–44 18. Spencer KA, Rogers MA (2005) Speech motor programming in hypokinetic and ataxic dysarthria. Brain Lang 94(3):347–366 19. Fletcher SG (1972) Time-by-count measurement of diadochokinetic syllable rate. J Speech Hear Res 15(4):763–770 20. Skodda S, Rinsche H, Schlegel U (2009) Progression of dysprosody in Parkinson’s disease over time–a longitudinal study. Mov Disord 24(5):716–722 21. Mekyska J, Rektorova I, Smekal Z (2011) Selection of optimal parameters for automatic analysis of speech disorders in Parkinson’s disease. In: 2011 34th international conference on telecommunications and signal processing (TSP) 22. Goberman AM, Elmer LW (2005) Acoustic analysis of clear versus conversational speech in individuals with Parkinson disease. J Commun Disord 38(3):215–230 23.
Little M, McSharry P, Hunter E, Spielman J, Ramig L (2008) Suitability of dysphonia measurements for telemonitoring of Parkinson’s disease. Derm Helv 24. Rahn DA 3rd, Chou M, Jiang JJ, Zhang Y (2007) Phonatory impairment in Parkinson’s disease: evidence from nonlinear dynamic analysis and perturbation analysis. J Voice 21(1):64–71 25. Sapir S, Ramig LO, Spielman JL, Fox C (2010) Formant centralization ratio: a proposal for a new acoustic measure of dysarthric speech. J Speech Lang Hear Res 53(1):114–125
26. Skodda S, Visser W, Schlegel U (2011) Vowel articulation in Parkinson’s disease. J Voice 25(4):467–472 27. Sapir S, Spielman JL, Ramig LO, Story BH, Fox C (2007) Effects of intensive voice treatment (the Lee Silverman voice treatment [LSVT]) on vowel articulation in dysarthric individuals with idiopathic Parkinson disease: acoustic and perceptual findings. J Speech Lang Hear Res 50(4):899–912 28. SaiJayram AKV, Ramasubramanian V, Sreenivas TV (2002) Robust parameters for automatic segmentation of speech. In: IEEE international conference on acoustics speech and signal processing 29. Murty KSR, Yegnanarayana B (2006) Combining evidence from residual phase and MFCC features for speaker recognition. IEEE Signal Process Lett 13(1):52–55 30. Borrie SA, McAuliffe MJ, Liss JM (2012) Perceptual learning of dysarthric speech: a review of experimental studies. J Speech Lang Hear Res 55(1):290–305 31. Ali AMA, Van der Spiegel J, Mueller P (2001) Acoustic-phonetic features for the automatic classification of fricatives. J Acoust Soc Am 109(5):2217–2235 32. Dusan S, Flanagan JL, Karve A, Balaraman M (2007) Speech compression by polynomial approximation. IEEE Trans Audio Speech Lang Process 15(2):387–395 33. Vikas, Sharma RK (2014) Early detection of Parkinson’s disease through voice. In: 2014 international conference on advances in engineering and technology (ICAET) 34. Adams WR (2017) High-accuracy detection of early Parkinson’s Disease using multiple characteristics of finger movement while typing. PLoS ONE 12(11):e0188226 35. Hlavnička J, Čmejla R, Tykalová T, Šonka K, Růžička E, Rusz J (2017) Automated analysis of connected speech reveals early biomarkers of Parkinson’s disease in patients with rapid eye movement sleep behaviour disorder. Sci Rep 7(1):12 36. Pahuja G, Nagabhushan TN, Prasad B (2019) Early detection of Parkinson’s disease by using SPECT imaging and biomarkers. J Intell Syst 29(1):1329–1344 37. Rehman RZU, Del Din S, Guan Y, Yarnall AJ, Shi JQ, Rochester L (2019) Selecting clinically relevant gait characteristics for classification of early Parkinson’s disease: a comprehensive machine learning approach. Sci Rep 9(1):17269 38. Caliskan A, Badem H, Basturk A, Yuksel ME (2017) Diagnosis of the Parkinson disease by using deep neural network classifier. IU-J Electr Electron Eng 17:3311–3318 39. Sakar BE et al (2013) Collection and analysis of a Parkinson speech dataset with multiple types of sound recordings. IEEE J Biomed Health Inform 17(4):828–834 40. Hazan H, Hilu D, Manevitz L, Ramig LO, Sapir S (2012) Early diagnosis of Parkinson’s disease via machine learning on speech data. In: 2012 IEEE 27th convention of electrical and electronics engineers in Israel 41. Frid A, Hazan H, Hilu D, Manevitz L, Ramig LO, Sapir S (2014) Computational diagnosis of Parkinson’s disease directly from natural speech using machine learning techniques. In: 2014 IEEE international conference on software science, technology and engineering 42. Cosi P, Hosoma JP, Valente A (2005) High performance telephone bandwidth speaker independent continuous digit recognition. In: IEEE workshop on automatic speech recognition and understanding, 2001. ASRU ’01 43. Frid A, Lavner Y (2010) Acoustic-phonetic analysis of fricatives for classification using SVM based algorithm. In: 2010 IEEE 26th convention of electrical and electronics engineers in Israel 44. Hasegawa-Johnson M, Gunderson J, Perlman A, Huang T (2006) HMM-based and SVM-based recognition of the speech of talkers with spastic dysarthria.
In: 2006 IEEE international conference on acoustics, speech and signal processing proceedings 45. Caballero Morales SO, Cox SJ (2009) Modelling errors in automatic speech recognition for dysarthric speakers. EURASIP J Adv Signal Process 2009(1) 46. Rabiner LR (1989) A tutorial on hidden Markov models and selected applications in speech recognition. Proc IEEE Inst Electr Electron Eng 77(2):257–286 47. Tsanas A, Little MA, McSharry PE, Spielman J, Ramig LO (2012) Novel speech signal processing algorithms for high-accuracy classification of Parkinson’s disease. IEEE Trans Biomed Eng 59(5):1264–1271
48. Sakar BE, Serbes G, Sakar CO (2017) Analyzing the effectiveness of vocal features in early telediagnosis of Parkinson’s disease. PLoS One 12(8):e0182428 49. Tsanas A, Little MA, McSharry PE, Ramig LO (2010) Accurate telemonitoring of Parkinson’s disease progression by noninvasive speech tests. IEEE Trans Biomed Eng 57(4):884–893 50. Benba A, Jilbab A, Hammouch A, Sandabad S (2015) Voiceprints analysis using MFCC and SVM for detecting patients with Parkinson’s disease. In: 2015 international conference on electrical and information technologies (ICEIT) 51. Anila M, Laksmaiah K (2020) Education foundation. A review on Parkinson’s disease diagnosis using machine learning techniques. Int J Eng Res Technol (Ahmedabad) V9(06) 52. Wang W, Lee J, Harrou F, Sun Y (2020) Early detection of Parkinson’s disease using deep learning and machine learning. IEEE Access 8:147635–147646 53. Perez KS, Ramig LO, Smith ME, Dromey C (1996) The Parkinson larynx: tremor and videostroboscopic findings. J Voice 10(4):354–361
Investigation of Attention Deficit Hyperactivity Disorder with Image Enhancement and Calculation of Brain Grey Matter Volume using Anatomical and Resting-State functional MRI K. Usha Rupni and P. Aruna Priya Abstract Attention deficit hyperactivity disorder (ADHD) is a common neurodevelopmental disorder affecting school children which often continues into their adulthood and makes their normal life difficult. Therefore, detection at an early stage of the disorder is very essential. To date, the diagnosis of ADHD is done by doctors using the guidelines in the American Psychiatric Association’s Diagnostic and Statistical Manual (DSM-5). This diagnosis is based on symptoms which change over time. Therefore, the use of non-invasive imaging techniques like functional magnetic resonance imaging (fMRI) is necessary and may support doctors in the diagnosis of ADHD. In this paper, we have used structural and functional MRI scan images to investigate and analyse the difference between ADHD and its subtypes. Most of the earlier research was done on the classification between ADHD-affected and normal brains. Therefore, in this research work, the difference between ADHD and non-ADHD brains, along with the ADHD subtypes, is investigated based on an image enhancement technique. The results show that image sharpening filters bring out the difference in the shape of the caudate nucleus for typically developing (TD) children and the ADHD subtypes. The grey matter volume of the brain is calculated, and we obtain values ranging approximately from 3.3 × 10^5 mm^3 to 3.8 × 10^5 mm^3 for an anatomical image. The results indicate that the grey matter volume of ADHD brains is slightly lower than that of TD children’s brains. Keywords Attention deficit hyperactivity disorder subtypes · Brain volume · Image enhancement · MRI
K. U. Rupni (B) · P. A. Priya Department of Electronics and Communication Engineering, SRM Institute of Science and Technology, Chennai, Tamil Nadu, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 N. Subhashini et al. (eds.), Futuristic Communication and Network Technologies, Lecture Notes in Electrical Engineering 966, https://doi.org/10.1007/978-981-19-8338-2_19
1 Introduction Attention deficit hyperactivity disorder (ADHD) is a neurodevelopmental disorder that is also called hyperkinetic disorder (HKD). A study performed in Coimbatore, India, showed the prevalence in children to be above the global estimate, at 11.32% [1]. Children aged 9 (26.4%) and 10 (25%) are the most affected by ADHD. Further, the study found that more males (66.7%) have ADHD. Children with ADHD not only perform poorly in school academics and show behavioural difficulty but also have problems in reading and writing. In the USA, it is estimated that 6.1 million children aged 2 to 17 have been diagnosed with ADHD [2]. There are biological variations between the ADHD brain and the brain of a healthy individual, and these differences are based on anatomy, function, and chemistry. Researchers at Radboud University Nijmegen Medical Centre reported that ADHD-affected children have smaller brain volumes in subcortical areas and that the total brain size is also smaller. The difference in brain size is larger in children than in adults with ADHD [3]. ADHD symptoms vary from person to person, and there are three subtypes, namely inattentive, impulsive/hyperactive, and combined. a. Inattentive Type ADHD (ADHD-I): This kind of ADHD person has attention problems alone, i.e., they find it difficult to focus and to listen to another person speaking. Other symptoms are easy distraction, an inability to pay close attention to details, forgetfulness in daily activities, frequent carelessness and losing things such as a pen, pencil, or water bottle, and a struggle to do organized tasks and activities. b. Impulsive/Hyperactive Type ADHD (ADHD-H): This kind of ADHD person has hyperactive and impulsive behaviours but no attention difficulty, and it is a rare type of ADHD. The impulsive symptoms are disturbing others, not being patient and having difficulty waiting for their turn, speaking continuously, and answering a question before it is completed. The symptoms of hyperactive behaviour are continuous movement, restlessness, being unable to focus on one job at a time, and not quietly engaging in any given activity. c. Combined Type ADHD (ADHD-C): This ADHD subtype is very common. People with this kind of ADHD have a combination of impulsivity, hyperactivity, and inattention symptoms. Only when more than six symptoms of inattentiveness and hyperactivity/impulsivity are present in children for a minimum of six months are they diagnosed as having combined-type ADHD. Magnetic resonance imaging (MRI) is a widely used technology for visualizing the structure, function, and metabolism of the living human brain. fMRI is a part of MRI that deals with the functional activity of the brain in the resting state or active state. All kinds of MRI data are obtained without exposing patients to ionizing radiation or radioactive isotopes. MRI and fMRI continue to be successful brain imaging techniques in psychiatric practice and research due to their safety, wide variety, and affordability. A study published in Radiology says that brain MRI can be used in ADHD diagnosis and that information from brain MRIs helps to distinguish among the subtypes of ADHD.
Fig. 1 Block diagram of image enhancement in the frequency domain (input image → Fourier transform of image → filtering using high-pass and low-pass filters → inverse Fourier transform of image → enhanced output image)
Therefore, we chose to carry out this research work using this non-invasive imaging technique. Medical images play an important role in clinical applications. Raw biomedical images are blurry due to noise from the scattering of photons (Rayleigh, Compton). The clarity of images for human visualization can be improved by image enhancement: the blur and noise of an image are removed to increase contrast and bring out detail. In digital image processing, we have enhancement techniques in the spatial and frequency domains. Spatial-domain image enhancement techniques are based on the conversion of the pixel values of an image but have the disadvantage that robustness and imperceptibility requirements are not sufficiently met. Thus, we go for frequency-domain enhancement techniques, which are based on modifying the Fourier transform of an image. In this technique, the image is enhanced by changing its transform coefficients, obtained for example via the discrete Fourier transform (DFT) or fast Fourier transform (FFT). The advantages of this method are simple computation, the freedom of viewing and changing the frequency composition of the image, and the easy applicability of special transform-domain properties. The process involved in image enhancement in the frequency domain is shown in Fig. 1. The smoothing, sharpening, and edge detection of an image are done using filters. Image smoothing is an enhancement technique that reduces most of the noise, but at the same time it blurs the edges of the image. To perform image smoothing, the image is convolved with a low-pass (lp) filter. In the image sharpening enhancement technique, the edges and fine details in an image are highlighted; it is achieved by convolving the image with a high-pass (hp) filter. The link between the transfer functions of a high-pass filter ($H_{hp}$) and a low-pass filter ($H_{lp}$) is given by Eq. 1:

$$H_{hp}(i, j) = 1 - H_{lp}(i, j) \quad (1)$$
The three main filters in digital image processing are the ideal filter, the Butterworth filter, and the Gaussian filter. Low-pass filters remove the high-frequency values of the Fourier transform whose distance from the origin of the transformed image is greater than a specified cutoff distance $D_0$. The filter transfer function is defined as the ratio of the output to the input as a function of frequency. The transfer functions of the ideal lp filter, Butterworth lp filter, and Gaussian lp filter are given by Eqs. 2, 3, and 4, respectively [4].
$$H_{lp}(i, j) = \begin{cases} 1, & \text{if } D(i, j) \le D_0 \\ 0, & \text{if } D(i, j) > D_0 \end{cases} \quad (2)$$

$$H_{lp}(i, j) = \frac{1}{1 + [D(i, j)/D_0]^{2n}} \quad (3)$$

$$H_{lp}(i, j) = e^{-D^2(i, j)/2D_0^2} \quad (4)$$

where $D(i, j) = \left[(i - M/2)^2 + (j - N/2)^2\right]^{1/2}$. In the case of the hp filter, all the low frequencies are removed and only the high frequencies are passed. The transfer functions of the ideal hp and Butterworth hp filter of order n are given by Eqs. 5 and 6, and the transfer function of the Gaussian hp filter is given by Eq. 7 [4]:

$$H_{hp}(i, j) = \begin{cases} 0, & \text{if } D(i, j) \le D_0 \\ 1, & \text{if } D(i, j) > D_0 \end{cases} \quad (5)$$

$$H_{hp}(i, j) = \frac{1}{1 + [D_0/D(i, j)]^{2n}} \quad (6)$$

$$H_{hp}(i, j) = 1 - e^{-D^2(i, j)/2D_0^2} \quad (7)$$
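The transfer functions of Eqs. 2 to 7 translate directly into frequency-domain masks. The snippet below is a minimal NumPy sketch of building those masks (it is not the MATLAB code used in this work); the image size M x N, cutoff D0, and order n are free parameters chosen only for illustration.

```python
# Sketch: ideal, Butterworth, and Gaussian filter masks from Eqs. 2-7.
import numpy as np

def distance_grid(M, N):
    # D(i, j): distance of each frequency sample from the centre (M/2, N/2).
    i, j = np.ogrid[0:M, 0:N]
    return np.sqrt((i - M / 2) ** 2 + (j - N / 2) ** 2)

def lowpass(M, N, D0, kind="gaussian", n=2):
    D = distance_grid(M, N)
    if kind == "ideal":
        return (D <= D0).astype(float)              # Eq. 2
    if kind == "butterworth":
        return 1.0 / (1.0 + (D / D0) ** (2 * n))    # Eq. 3
    return np.exp(-(D ** 2) / (2 * D0 ** 2))        # Eq. 4 (Gaussian)

def highpass(M, N, D0, kind="gaussian", n=2):
    # Eq. 1; algebraically identical to Eqs. 5-7 for the three filter types.
    return 1.0 - lowpass(M, N, D0, kind, n)

H = highpass(256, 256, D0=30, kind="gaussian")
print(H.shape, float(H.min()), float(H.max()))
```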
Anderson et al. [5] found decreased volume in regions associated with nodes of the default mode network (DMN), including the posterior cingulate, precuneus, and parahippocampal regions, for the ADHD-I group relative to ADHD-C. Saad et al. [6] utilized graph-theory measures of network topology computed from volumetric measures to distinguish between the subtypes of ADHD. Another study, by Al-Amin et al., observed that hippocampal subfield volume was reduced in the ADHD-C type relative to ADHD-I and controls [7]. Image enhancement for medical images is essential, as shown by the study of Salem et al. [8], where image enhancement was done based on a histogram algorithm. Partha et al. conducted a study on filters for biomedical imaging and image processing, and it shows that sharpening filters are more useful [9]. Some of the research works related to brain volume calculation and image enhancement discussed above are the motivation for this research work.
2 Methodology The dataset used in this research work is obtained from the Neuro Bureau ADHD-200 competition [10]. The majority of ADHD research has relied on this dataset, which consists of neuroimaging data on typically developing (TD) and ADHD subjects with a total of 947 records. The dataset covers eight different imaging sites and comprises resting-state fMRI data, anatomical data, and phenotypic information. In this research work, we have used anatomical and resting-state fMRI data from the New York University Medical Center (NYU) imaging site, as it has the largest amount of image data and the classification accuracy reported for it is higher compared to other sites [11]. In this paper, image smoothing and image sharpening are done with the ideal filter and the Gaussian filter. To enhance an image using the frequency-domain technique, the following steps are performed: (a) the input image is transformed to the Fourier domain using the FFT; (b) the filter is multiplied by the Fourier-transformed image; (c) the enhanced image is obtained by the inverse FFT (iFFT). Using sharpening filters, the brain images are subjectively more appealing. Changes in the shape of brain regions like the left temporal lobe, bilateral cuneus, and areas around the left central sulcus can be seen after image enhancement. More particularly, the difference in the shape of the caudate nucleus among TD, ADHD, and its subtypes can be viewed clearly. However, during this process, important features may disappear while artefacts that simulate pathological processes might be generated. The performance metric used for image enhancement by sharpening and smoothing filters is the mean-squared error (MSE). It is defined as the average squared difference between the original and enhanced image pixel values, and the formula is given by
$$\text{MSE} = \frac{1}{M \times N} \sum_{i=1}^{M} \sum_{j=1}^{N} \left( O_{i,j} - E_{i,j} \right)^2 \quad (8)$$
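Putting the three enhancement steps together with the MSE of Eq. 8, a compact illustration could look as follows. The original implementation is in MATLAB; this is only a rough Python equivalent, with the Gaussian high-pass mask rebuilt inline and a random array standing in for a brain slice.

```python
# Sketch: FFT -> filter -> inverse FFT enhancement, plus the MSE of Eq. 8.
import numpy as np

def gaussian_highpass(M, N, D0):
    i, j = np.ogrid[0:M, 0:N]
    D2 = (i - M / 2) ** 2 + (j - N / 2) ** 2
    return 1.0 - np.exp(-D2 / (2 * D0 ** 2))             # Eq. 7

def enhance(img, D0=30):
    M, N = img.shape
    F = np.fft.fftshift(np.fft.fft2(img))                 # step (a): FFT, centred
    G = gaussian_highpass(M, N, D0) * F                   # step (b): multiply by the filter
    return np.real(np.fft.ifft2(np.fft.ifftshift(G)))     # step (c): inverse FFT

def mse(original, enhanced):
    diff = original.astype(float) - enhanced.astype(float)
    return float(np.mean(diff ** 2))                      # Eq. 8

img = np.random.default_rng(0).random((256, 256))         # placeholder "slice"
sharpened = enhance(img, D0=30)
print("MSE between original and enhanced slice: %.3f" % mse(img, sharpened))
```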
In this paper, the brain image is also analysed based on the calculation of grey matter volume to distinguish between TD and the ADHD subtypes. The total brain volume and grey matter are reduced in children having ADHD, but white matter volume is the same in all lobes [12]. Therefore, we have calculated the grey matter brain volume alone. The volume per voxel (v) is given by

$$v = \text{voxel size} \times \text{slice thickness} \quad (9)$$
The grey matter brain volume (V) is given by

$$V = v \times A \quad (10)$$

where A is the area of the brain and the unit of V is mm^3.
The algorithm for image enhancement and for the calculation of the grey matter volume of the brain is written in MATLAB. The volume per voxel is given by the product of the voxel size and the slice thickness; the values of voxel size and slice thickness are obtained from the scan parameters available in the dataset. Therefore, we can calculate the grey matter volume of the brain as the product of the total brain area and the volume per voxel.
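The volume calculation of Eqs. 9 and 10 is straightforward once the grey-matter voxels have been identified. The authors implement this step in MATLAB; the following is only a rough Python sketch, assuming a NIfTI anatomical image loaded with nibabel, a hypothetical file name anat.nii.gz, and a crude intensity threshold standing in for a proper grey-matter segmentation.

```python
# Sketch: grey matter volume as (volume per voxel) x (grey-matter voxel count),
# following Eqs. 9 and 10. The threshold "segmentation" is only a placeholder.
import numpy as np
import nibabel as nib

img = nib.load("anat.nii.gz")                  # hypothetical anatomical scan
data = img.get_fdata()
dx, dy, dz = img.header.get_zooms()[:3]        # in-plane voxel size and slice thickness (mm)

v = dx * dy * dz                               # Eq. 9: volume per voxel in mm^3
grey = (data > 0.3 * data.max()) & (data < 0.7 * data.max())  # crude placeholder mask
A = int(grey.sum())                            # total grey-matter voxel count ("area")
V = v * A                                      # Eq. 10: grey matter volume in mm^3

print("Grey matter volume: %.3e mm^3" % V)
```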
3 Results and Discussion We have collected TD and ADHD-subtype anatomical and resting-state fMRI brain images from the ADHD-200 dataset. Single anatomical images of TD and the ADHD subtypes (see Fig. 2) are transformed to the Fourier domain using the FFT, and the transformed image is then multiplied by the filters. After the image enhancement process of the anatomical image, there is a clear difference in the image enhanced with the Gaussian hp filter compared to the original anatomical image (see Fig. 4). In particular, the shape of the caudate nucleus is different; it is a small structure in the subcortical region of the brain related to cognitive and motor control. Using a Gaussian lp filter, most of the features are lost, making the image very blurry (see Fig. 5).
Fig. 2 Original anatomical image
Fig. 3 Enhanced anatomical image using Ideal hp filter
Fig. 4 Enhanced anatomical image using Gaussian hp filter
Fig. 5 Image of anatomical brain using Gaussian lp filter
For fMRI images, the step-by-step images obtained during the whole enhancement procedure are provided. The FFT is used to transform single resting-state fMRI images of TD and the ADHD subtypes (see Fig. 6) to the Fourier domain. The transformed images (see Fig. 7) are multiplied by the filters, as shown in Figs. 8, 9, 10, and 11. The cutoff frequency D0 is kept less than 1, but for the ideal lp filter D0 is increased above 1 to retain some features. The enhanced brain images obtained after the iFFT are demonstrated in Figs. 12, 13, 14, and 15. The variation in the morphology of the caudate nucleus between the TD and ADHD subtypes is readily seen (see Fig. 14). Therefore, we can conclude that image sharpening using the Gaussian hp filter is a significant preprocessing step for brain images that may assist healthcare providers in looking for more details to detect neurodevelopmental disorders. Furthermore, the ideal filters result in a ringing effect, whereas the Gaussian filters give the filtered image without a ringing effect. The MSE performance metric obtained for the image enhancement filters on anatomical and resting-state fMRI brain images is shown in Table 1. The grey matter volume for TD and the ADHD subtypes is calculated, and the values are given in Table 2. We have omitted the ADHD-H subtype, as it is a rare type and we do not have many such images in the dataset. The grey matter brain volume is calculated and compared for TD, ADHD-I, and ADHD-C; the values range approximately from 3.3 × 10^5 mm^3 to 3.8 × 10^5 mm^3 for the anatomical images of the ADHD-200 dataset. The graphical comparison of the grey matter volume of the brain for TD and the ADHD subtypes is shown in Fig. 16. We have taken 15 sets of anatomical images, each set consisting of ADHD-I, ADHD-C, and TD brain images. The grey matter volume of TD children’s brains is slightly higher than that of the ADHD brains in most of the sets. Therefore, the calculation of grey matter volume from the brain scan may support the detection of ADHD.
Fig. 6 Original resting-state fMRI image
Fig. 7 Transformed image after FFT
Fig. 8 Ideal hp filter multiplied with Fourier transformed image
Fig. 9 Ideal lp filter multiplied with Fourier transformed image
From the results discussed, the most noteworthy findings are that the shape of the caudate nucleus in the ADHD subtypes differs from that of the TD brain and that the grey matter volume of the ADHD brain is lower when compared to the TD brain.
Fig. 10 Gaussian hp filter multiplied with Fourier transformed image
Fig. 11 Gaussian lp filter multiplied with Fourier transformed image
4 Conclusion ADHD continues to increase year after year in school children; diagnosis at an early stage is very essential for children to perform well in school and in their daily work without any difficulties. Behaviour therapy along with parent training and
Fig. 12 Enhanced resting-state fMRI image using Ideal hp filter
Fig. 13 Image of resting-state fMRI using Ideal lp filter
medications are two ways of treating ADHD, but for diagnosis, more research is required that may support the detection of this neurodevelopmental disorder. In this research work, we have used image enhancement techniques in the frequency domain and grey matter volume calculation for TD and ADHD subtypes for the investigation and analysis of the NYU imaging site of the ADHD-200 dataset.
Fig. 14 Enhanced resting-state fMRI image using Gaussian hp filter
Fig. 15 Image of resting-state fMRI using Gaussian lp filter
The difference in the shape of the caudate nucleus between TD and the ADHD subtypes can be viewed using a Gaussian hp filter. The MSE performance metric for the filters is calculated, and it shows that the image quality with the Gaussian hp filter is better. Furthermore, the calculated grey matter volume of the ADHD brain is slightly lower than that of the TD brain. Currently, doctors diagnose ADHD with the behavioural reports
Table 1 Comparison of MSE for filters

Filter               MSE (anatomical image)   MSE (resting-state fMRI image)
Gaussian high-pass   98.820337                398.942451
Gaussian low-pass    398.406233               9405.430378
Ideal high-pass      116.965746               549.053968
Ideal low-pass       410.907007               9827.016713
Table 2 Grey matter volume of the brain (in mm3) for TD and ADHD subtypes

Set   TD              ADHD-I          ADHD-C
1     3.47213 × 10^5   3.73327 × 10^5   3.52709 × 10^5
2     3.41393 × 10^5   3.10946 × 10^5   3.44381 × 10^5
3     3.37114 × 10^5   3.42353 × 10^5   3.55609 × 10^5
4     3.35390 × 10^5   3.51046 × 10^5   3.74422 × 10^5
5     3.60700 × 10^5   3.48505 × 10^5   3.46788 × 10^5
6     3.29888 × 10^5   3.61403 × 10^5   3.43353 × 10^5
7     3.43245 × 10^5   3.32517 × 10^5   3.64702 × 10^5
8     3.59334 × 10^5   3.41069 × 10^5   3.49343 × 10^5
9     3.37716 × 10^5   3.33058 × 10^5   3.38297 × 10^5
10    3.32930 × 10^5   3.59915 × 10^5   3.33930 × 10^5
11    3.31260 × 10^5   3.65154 × 10^5   3.63458 × 10^5
12    3.45158 × 10^5   3.72854 × 10^5   3.63877 × 10^5
13    3.34045 × 10^5   3.57664 × 10^5   3.63133 × 10^5
14    3.61518 × 10^5   3.66723 × 10^5   3.56535 × 10^5
15    3.37263 × 10^5   3.30692 × 10^5   3.45125 × 10^5

Fig. 16 Comparison of TD and ADHD subtypes based on grey matter volume of brain (volume in mm3 versus set number 1–15, for TD, ADHD-I and ADHD-C)
given by parents and teachers or with the symptoms of ADHD based on the guidelines in DSM-5. Therefore, we conclude that non-invasive fMRI imaging may support healthcare providers in the diagnosis of ADHD. Differentiating between the subtypes of ADHD is very difficult and requires more research. Our future research will use artificial intelligence for the detection and analysis of ADHD and its subtypes.
References
1. Venkata JA, Panicker AS (2013) Prevalence of attention deficit hyperactivity disorder in primary school children. Indian J Psychiatry 338–342
2. Danielson ML, Bitsko RH, Ghandour RM, Holbrook JR, Kogan MD, Blumberg SJ (2018) Prevalence of parent-reported ADHD diagnosis and associated treatment among U.S. children and adolescents, 2016. J Clin Child Adolesc Psychol 199–212
3. Hoogman M, Bralten J, Hibar DP et al (2017) Subcortical brain volume differences in participants with attention deficit hyperactivity disorder in children and adults: a cross-sectional mega-analysis. Lancet Psychiatry 310–319
4. Gonzalez RC, Woods RE (2007) Digital image processing, 3rd edn. Prentice Hall, United States
5. Anderson A, Douglas PK, Kerr WT, Haynes VS, Yuille AL, Xie J, Wu YN, Brown JA, Cohen MS (2014) Non-negative matrix factorization of multimodal MRI, fMRI and phenotypic data reveals differential changes in default mode subnetworks in ADHD. NeuroImage 207–219
6. Saad JF, Griffiths KR, Kohn MR, Clarke S, Williams LM, Korgaonkar MS (2017) Regional brain network organization distinguishes the combined and inattentive subtypes of attention deficit hyperactivity disorder. NeuroImage: Clinical 383–390
7. Al-Amin M, Zinchenko A, Geyer T (2018) Hippocampal subfield volume changes in subtypes of attention deficit hyperactivity disorder. Brain Res
8. Salem N, Malik H, Shams A (2019) Medical image enhancement based on histogram algorithms. Proc Comput Sci 300–311
9. Mondal PP, Rajan K, Ahmad I (2006) Filter for biomedical imaging and image processing. J Opt Soc Am A 1678–1686
10. ADHD-200 sample, http://fcon_1000.projects.nitrc.org/indi/adhd200/. Last accessed 21 Aug 2021
11. Riaz A, Asad M, Alonso E, Slabaugh G (2020) DeepFMRI: end-to-end deep learning for functional connectivity and classification of ADHD using fMRI. J Neurosci Methods
12. Batty MJ, Liddle EB, Pitiot A, Toro R, Groom MJ, Scerif G, Liotti M, Liddle PF, Paus T, Hollis C (2010) Cortical gray matter in attention-deficit/hyperactivity disorder: a structural magnetic resonance imaging study. J Am Acad Child Adolesc Psychiatry 49(3):229–238
VLSI Implementation for Noise Suppression Using Parallel Median Filtering Technique Pobbathi Nithin Kumar, Shubhada Budhe, A. Annis Fathima, and Chrishia Christudhas
Abstract This paper proposes a computationally efficient median filtering approach to reduce impulsive noise in a 2D signal. The median filter is a nonlinear method for reducing noise from an image. In this paper, a preprocessing step of identifying the corrupted pixels is included. This step ensures that an uncorrupted pixel preserves its value, unlike in the traditional median filter. A comparative analysis is done between the median filter and the modified median filter, and the results show that the modified median filter has an improved RMSE value. Further, in the Verilog implementation, parallelism is incorporated to improve latency. The proposed work is implemented using ModelSim and Xilinx. Keywords Impulse noise · Median filter · Parallelism · Verilog
P. N. Kumar (B) · S. Budhe · A. A. Fathima · C. Christudhas School of Electronics Engineering, VIT Chennai, Chennai, India e-mail: [email protected] S. Budhe e-mail: [email protected] A. A. Fathima e-mail: [email protected] C. Christudhas e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 N. Subhashini et al. (eds.), Futuristic Communication and Network Technologies, Lecture Notes in Electrical Engineering 966, https://doi.org/10.1007/978-981-19-8338-2_20
1 Introduction The most important field in digital image processing is the image restoration technique. Image restoration is the process of regaining the original image from the degraded one. Noise suppression is a technique to restore the real image from the erroneous image. Noise is a crucial problem while transferring images from all kinds of electronic devices. The most common noise in electronic communication systems is impulsive noise. To suppress noise from an image, many algorithms have been introduced in the past, but more efficient algorithms are needed to suppress noise more
effectively. Using less efficient techniques leads to faults like more blurring of the image. There is a great demand for a more efficient algorithm to denoise the image. Impulse noise is one of the most common types of noise, and it is also known as salt and pepper noise. It is caused by unstable voltage due to transmission or errors generated in the communication channel. Impulse noise sets a pixel to an extreme value, i.e., a value of 0 as pepper noise and a value of 255 as salt noise. The expression for impulse noise is

Noise(i, j) = 0 or 255 with probability Pn, and I(i, j) with probability 1 − Pn    (1)

where Noise denotes the corrupted pixel in the image, Pn is the probability of a pixel being corrupted with salt and pepper noise, and I(i, j) denotes the original pixel. Pn is given in Eq. 2, and the noise ratio lies between 0 and 1.

Pn = (1/2) × Noise Ratio    (2)
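A minimal sketch of one common reading of the noise model in Eqs. (1) and (2), assuming an 8-bit grayscale image stored as a NumPy array; the noise ratio value is illustrative.

```python
import numpy as np

def add_impulse_noise(img, noise_ratio=0.1, seed=0):
    """Corrupt pixels with pepper (0) or salt (255) noise, each with probability Pn."""
    rng = np.random.default_rng(seed)
    pn = 0.5 * noise_ratio                     # Eq. (2): Pn = (1/2) * noise ratio
    noisy = img.copy()
    r = rng.random(img.shape)
    noisy[r < pn] = 0                          # pepper noise: value 0
    noisy[(r >= pn) & (r < 2 * pn)] = 255      # salt noise: value 255
    return noisy                               # remaining pixels keep I(i, j)
```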
Noise filtering techniques are of two types: (1) linear techniques and (2) nonlinear techniques [1, 2]. In the linear filtering technique, the algorithm is applied to all the pixels in the image without determining whether the pixels are erroneous or not; it is applied even if erroneous pixels are not present. Hence, these filtering techniques are not well suited for removing noise from an image. On the other hand, the nonlinear filtering technique consists of two steps. Firstly, it identifies the corrupted pixels, and secondly, the corrupted pixels are filtered by the specified algorithm. A median filter is a nonlinear filtering technique used for the removal of noise from a degraded image. Median filtering is the most widely used filtering technique as it effectively suppresses noise while preserving the edge data. Thus, it is very useful for filtering or restoring corrupted or missing pixels in the image, such as those affected by impulsive (salt and pepper) noise caused by intense disturbances in the image signal. The application is not limited to filtering; it is also applied to pattern recognition [3]. The median filtering method is expensive and complex to compute. Many techniques have been proposed by researchers for different nonlinear filtering approaches to make them computationally efficient [4, 5]. There are filtering techniques like the center-weighted median filter (CWMF), tri-state median filter (TSMF), progressive switching median filter (PSMF) [6], adaptive progressive switching median filter (APSMF) [7], and the fast adaptive switching technique [8]. Further, researchers have proposed different architectures for noise suppression [9]. In this paper, a modified median filter algorithm is implemented which is efficient for denoising the corrupted pixels of an image. An 8 × 8, 8-bit resolution grayscale image has been chosen for the test. Verilog code for impulse noise suppression is implemented in Xilinx and ModelSim. We claim novelty for the proposed modified median filter algorithm, which shows an improvement in delay performance and also achieves parallelism.
2 Proposed Work 2.1 Traditional Median Filter Algorithm The median filter is a nonlinear filter used to suppress impulse noise from an image. It gives a better edge approximation. The median filter works by moving throughout the entire image pixel by pixel, replacing each value with the median of its neighboring pixel values. The median is calculated by sorting all pixel values from the window into numerical order and then replacing the pixel under consideration with the middle pixel value. There are two auxiliary filters: (1) the maximum filter, which chooses the highest pixel value from the sorted set of values in the window, and (2) the minimum filter, which chooses the lowest pixel value from the sorted values in the window. These filters are used for finding the minimum and maximum pixel values in the window. The minimum and maximum values of corrupted pixels are detected by the window, and the addresses of these corrupted pixels are then mapped into sheets. If the window detects the maximum pixel value, it maps that value into a sheet; the same is done for the minimum pixel value. Figure 1 shows the procedure of mapping. As it is a 3 × 3 window, it consists of 9 pixel values. The 3 × 3 window slides over the entire image, detecting the corrupted pixels one by one and replacing each pixel value with the median pixel value. In [10], the authors have proposed the mapping of pixels by creating two sheets, i.e., one sheet for maximum pixel addresses and the other for minimum pixel addresses. After replacing the corrupted pixel value, the minimum and maximum filters find the minimum and maximum pixel values and map their addresses into their respective sheets. In [10], these sheets contain the addresses of the minimum and maximum pixel values, which are then combined into one sheet. In the traditional median filter, the original image is regained by replacing not only the corrupted or missing pixels but also those which are not noisy pixels.
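For reference, the traditional filter that replaces every pixel, noisy or not, can be expressed in a few lines; this sketch assumes SciPy is available and is not the authors' Verilog implementation.

```python
from scipy.ndimage import median_filter

def traditional_median(noisy_img):
    """Apply a 3 x 3 median kernel to every pixel of the image."""
    return median_filter(noisy_img, size=3)
```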
Fig. 1 Procedure for mapping
Fig. 2 Modified procedure of mapping
2.2 Modified Median Filter Algorithm The problem with the approach in [10] is that the maximum and minimum sheets are not related to each other, although they should be. So, a small modification is made to the traditional median filter. Instead of creating two sheets, the pixel values are mapped into one sheet, i.e., the maximum and minimum addresses of corrupted pixel values are entered into one single sheet. If a pixel value is found to be the minimum in the window, then the corresponding address of this pixel is mapped into the sheet. Similarly, when a pixel value is found to be the maximum in the window, then the corresponding pixel address is mapped into the same sheet. The modification is done to achieve low latency. Here, the window taken is a 5 × 5 sliding window which slides over the entire 8 × 8 image. The window consists of 25 pixel values. If a pixel value is the minimum among the 25 values in the window (i.e., a pepper value of 0), it is taken as pepper noise. If a pixel value is the maximum among the 25 values in the window (i.e., a salt value of 255), it is taken as salt noise. By reducing the maximum and minimum sheets into one single sheet, we reduce the consumption of memory. These minimum and maximum pixel values are nothing but corrupted pixel values that are detected by the sliding window while sweeping through the image. The addresses of these corrupted pixel values are mapped into a single sheet. Figure 2 shows the proposed mapping procedure.
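A minimal NumPy sketch of the modified scheme as described above: a 5 × 5 window flags pixels that are the local minimum (pepper) or maximum (salt), their addresses form a single sheet, and the 3 × 3 median kernel is applied only at those positions. This is an illustrative software reconstruction, not the authors' Verilog design.

```python
import numpy as np
from scipy.ndimage import minimum_filter, maximum_filter, median_filter

def modified_median(noisy_img):
    """Restore only the pixels flagged as salt or pepper in a 5 x 5 window."""
    local_min = minimum_filter(noisy_img, size=5)
    local_max = maximum_filter(noisy_img, size=5)
    # Single sheet holding the addresses of suspected pepper (0) or salt (255) pixels
    sheet = ((noisy_img == local_min) & (noisy_img == 0)) | \
            ((noisy_img == local_max) & (noisy_img == 255))
    medians = median_filter(noisy_img, size=3)
    restored = noisy_img.copy()
    restored[sheet] = medians[sheet]   # uncorrupted pixels keep their original value
    return restored
```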
2.3 Block Diagram The corrupted pixels are filtered by the windowing technique. A 5 × 5 sliding window is incorporated in the median filter. These windows are independent of each other.
Fig. 3 Block diagram of modified median filter
Here, the window sweeps through the entire 8 × 8 image and detects and replaces only the erroneous pixel values with the median pixel value. The minimum and maximum pixel values are mapped in one single sheet, as shown in Fig. 3, to achieve high parallelism and improve latency.
3 Result In this section, a comparative study is done between the modified median filter and the traditional median filter. An original image is corrupted by introducing impulse noise in it. In Fig. 4, an original image named Pattern is shown, which is corrupted with impulse noise of noise density 0.1. To regain the original image, the corrupted image is processed by the filtration technique. The modified and traditional median filter algorithms are applied to the corrupted image. After filtering, the erroneous pixels are removed, and hence we get noise-free images as shown in Fig. 4c, d. From this, one can observe that the circle edge is not preserved in the image filtered by the traditional median filter. Thus, the image filtered by applying the modified filter algorithm is more effective in removing noisy pixels than the traditional median filter. Table 1 shows the mean square error, which is an important parameter in image noise filtering, at different noise densities. It is evident that the mean square error of the modified median filter using the single max/min sheet is much lower than that of the traditional median filter. Figure 5 shows the RTL schematic of the traditional median filter, the modified median filter, and the minimum and maximum sheets. The minimum and maximum pixel values are mapped in one single sheet. The area consumption is more in the modified median filter, but the power and delay required are less than those of the traditional median filter.
Fig. 4 a Original image, b image corrupted with noise density = 0.1, c filtered using proposed modified median filter, d filtered using traditional median filter
Table 2 summarizes the result. We observe that the modified median filter algorithm is more efficient in terms of power and delay than the traditional median filter algorithm, at the cost of a slight increase in area.
4 Experimentation The implementation of median filtering to achieve parallelism and reduce latency is done on the Xilinx ISE Spartan-6 software package. The complete algorithm is coded using Verilog HDL. The traditional median filter is modified by reducing the two sheets to one sheet for the mapping purpose. The modified median filter algorithm gives the best
Table 1 Mean square error estimation for different noise densities using existing and modified median filters

Test image       Noise density = 0.05                  Noise density = 0.1                   Noise density = 0.2
                 Traditional      Median using         Traditional      Median using         Traditional      Median using
                 median filter    max/min sheet        median filter    max/min sheet        median filter    max/min sheet
Pattern.tiff     0.1551           0.0566               0.4609           0.1204               0.8091           0.3840
Lena.jpg         16.3996          6.4341               17.6292          7.0572               20.3489          9.2097
Cameraman.tiff   5.4132           2.9509               6.4083           3.3453               8.7162           4.6509
Boards.tiff      62.5078          10.6359              63.4158          12.4360              65.5997          18.4458
Baboon.tiff      23.0775          5.7166               24.6738          6.5761               28.8647          10.4550
results in terms of qualitative and quantitative analyses. Instead of applying the 3 × 3 kernel on all the pixels, the corrupted pixels are first identified using the 5 × 5 sub-image. Hence, the 3 × 3 median kernel is applied only at the positions of the corrupted pixels, which reduces the computation. The results in Table 2 depict this. The RTL schematic of the modified and traditional median filters and the minimum and maximum sheets is shown in Fig. 5. The algorithm for the modified median filter is simulated in ModelSim, and the results are compared with the synthesis report generated in Xilinx.
Fig. 5 RTL schematic of a modified median filter, b traditional median filter, c minimum and maximum sheet

Table 2 Comparison results of proposed modified median filter with conventional median filter

Parameters    Median     Modified median
Power (mW)    896.35     280.52
Area (LUTs)   2228       2447
Delay (nS)    17.88E-6   0.4E-6
5 Conclusion In this paper, a median filter algorithm used for impulse noise suppression was reviewed, and a modification to reduce the power and delay was proposed on it. The modified median filter algorithm is implemented by achieving parallelism. The corrupted pixels are identified, and the median kernel is applied only on the identified pixels. The maximum and minimum sheets are constructed in parallel to reduce the latency. A comparative study is done between the traditional and modified median filters. The modified median filter algorithm is found to be more efficient and effective as it considerably decreased the power consumption and delay. However, the area taken by this filter is more as parallelism is applied. The proposed work is simulated in ModelSim and implemented in Xilinx.
References
1. Kumar A, Sodhi SS (2020) Comparative analysis of Gaussian filter, median filter and denoise autoencoder. In: IEEE international conference for sustainable global development (INDIACom)
2. Nieminen A, Neuvo Y (1988) Comments on theoretical analysis of the max/median filter, by GR Arce and MP McLaughlin. IEEE Trans Acoust Speech Signal Process 36:826–827
3. Gil J, Werman M (1993) Computing 2-D min, median and max filters. IEEE Trans Pattern Anal Mach Intell
4. Shrestha S (2014) Image denoising using new adaptive based median filter. Int J (SIPIJ) 5(4)
5. Hwang H, Haddad R (1995) Adaptive median filters: new algorithms and results. IEEE Trans Image Process 4(4):499–502
6. Wang Z, Zhang D (1999) Progressive switching median filter for the removal of impulse noise from highly corrupted images. IEEE Trans Circuits Syst II Analog Digit Signal Process 46:78–80
7. Fabijanska A, Sankowski D (2011) Noise adaptive switching median-based filter for impulse noise removal from extremely corrupted images. IET Image Process 5(5):472–480
8. Malinski L, Bogdan S (2016) Fast adaptive switching technique of impulsive noise removal in color images. J Real-Time Image Process 1–22. https://doi.org/10.1007/s11554-016-0599-6
9. Mukherjee M, Maitra M (2015) Reconfiguration architecture of adaptive median filter-an FPGA based approach for impulse noise suppression. In: Computer, communication, control and information technology (C3IT), IEEE third international conference, pp 1–6
10. Jelodari PT, Kordasiabi MP, Sheikhaei S, Forouzandeh B (2019) FPGA implementation of an adaptive window size image impulse noise suppression system. J Real-Time Image Process 16:2015–2026. https://doi.org/10.1007/s11554-017-0705-4
Investigation on Performance of CNN Architectures for Land Use Classification R. Avudaiammal, Vijayarajan Rajangam, A. Swarnalatha, P. S. Nancy, and S. Pavithra
Abstract Land use classification is useful to understand the fundamental utilities of a land area effectively in satisfying the needs of human society. Since land use information plays a major role in many applications, it is required to extract essential features based on patterns, tones, textures and shapes from satellite images for analysis and classification. Convolutional neural networks (CNNs) have been widely used in object detection and in image classification due to their end-to-end trainable network framework and their speedy and accurate feature extraction. The performance of a CNN for a specific application depends on the training set size, network size (number and type of layers), computing resources, activation function, convergence rate and accuracy. In order to have better accuracy, various CNN models are evolving through research with the inclusion of more layers and more parameters, leading to a requirement for high-end computing resources. Though a CNN has a large number of parameters, the optimum parameter setting in training a CNN is generally based on experience and practice. This paper presents the performance analysis of dilated CNN and UNET architectures on land use classification. Very high resolution images of the DSTL dataset are utilized to train the dilated CNN and UNET architectures. The analysis of the achieved performance shows that UNET achieves a faster convergence rate with lesser usage of parameters compared to the dilated CNN. The accuracy of the dilated CNN is 72.4%, whereas UNET is able to provide an accuracy of 92.8%. The accuracy achieved through the UNET architecture suggests that this model be adopted to discriminate land surfaces into buildings and vegetation area effectively. Keywords Land use classification · VHR · CNN · Dilated CNN · UNET
R. Avudaiammal (B) · A. Swarnalatha · P. S. Nancy · S. Pavithra Department of ECE, St. Joseph’s College of Engineering, Chennai, India e-mail: [email protected] V. Rajangam Centre for Healthcare Advancement, Innovation and Research, VIT Chennai, Chennai, India © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 N. Subhashini et al. (eds.), Futuristic Communication and Network Technologies, Lecture Notes in Electrical Engineering 966, https://doi.org/10.1007/978-981-19-8338-2_21
1 Introduction Land use classification is a method of discriminating land surfaces into forest area, agricultural area, residential area and industrial area based on their characteristics. It is mainly used to understand their fundamental utilities effectively in satisfying the needs of human society. Since land use information plays a major role in various applications such as tax assessment, water supply planning, waste water treatment, water resource inventory, flood control, public lands management and wildlife resources management, it is required to extract essential features based on patterns, tones, textures and shapes from satellite images for analysis and classification [1]. Very high resolution (VHR) images provide elaborate spatial information, but they do not logically match up in providing accuracies with respect to image interpretation, especially in urban areas [2]. This increases the burden on human experts in labeling the objects as well as identifying complex information from VHR images manually. Extraction of built-up areas and vegetation areas is very essential in land use classification. To ease the methodology of extracting information of objects like buildings and roads from VHR images, many researchers have introduced automated extraction methods to reduce the volume of human effort. The limitation of ML algorithms is the requirement of ground truth data to train the model, validate model results, and compute model error. This dependence on ground truth data makes it difficult to extrapolate to many applications [3]. In order to tackle these challenges, many researchers have started working on CNN-based land use classification. CNNs have been broadly used in object detection and in image classification due to their end-to-end trainable network framework for achieving faster and more accurate feature extraction [4–6]. The convolutional neural network comprises an input layer, hidden layers and an output layer similar to the conventional neural network. In addition, there are the convolution layer, the pooling layer and the fully connected layer. When the pixel matrix of the input image is given as the input to the CNN, the feature map of the image is determined using the convolution kernel [7]. The pooling layer further reduces the size of the image but retains important information. The extracted feature map of the image, passed through a sequence of convolution and pooling layers, is then fed into a fully connected layer. The fully connected layer is used to incorporate the local information with the classification details contained in both the convolution layer and the pooling layer to increase the performance of the entire CNN. In the last stage, the Softmax layer classifies the output. In order to have better accuracy, the traditional CNN requires the inclusion of more layers and more parameters, which leads to a requirement for more computing resources [8]. In order to handle these limitations, various CNN models have been developed by researchers. LeNet-5 had an innovative impact on the growth of deep CNNs as the combination of convolution layer, pooling layer and fully connected layer [9]. Ronneberger et al. [10] developed the UNET architecture for image segmentation, and it is being extensively used [11, 12]. Kudo et al. [13] proposed a dilated CNN model.
From the literature, it is observed that the performance of the CNN can be improved by replacing the traditional convolution layers with dilated convolution layers. It is also observed that the performance of a CNN for a specific application depends on the training set size, network size (number and type of layers), computing resources, activation function, parameters, convergence rate and accuracy. Though a CNN has a large number of parameters, the parameter setting in training a CNN is based mostly on iterative techniques achieved by optimum practice. In this work, the performance measures of land use classification using dilated CNN and UNET architectures are compared. The outline of the paper is as follows: Sect. 2 describes the dilated CNN and UNET architectures used to classify objects into vegetation area and building area automatically. Section 3 offers the result analysis of the dilated CNN and UNET architectures. Section 4 concludes with a discussion on the performance of the land use classification.
2 CNN Architectures In this paper, land use classification is performed using two CNN architectures, namely the dilated CNN and the UNET architecture. In this section, the architectures of these two deep networks are discussed.
2.1 Dilated CNN The dilated CNN [14] extracts the feature map from the network and expands the receptive field for extracting more information with lesser usage of computational resources by inserting zero weights (holes) into traditional convolution kernels. The dilation rate denotes the spacing between kernel values. Dilated convolutions help in producing a wider field of view and in achieving a faster convergence rate with lesser usage of parameters and lower computational cost. For example, a 3 × 3 kernel with a dilation rate of 2 produces the effect of a 5 × 5 kernel, with only 9 parameters instead of 25. It increases the size of the receptive field by inserting a greater number of zero weights while retaining the same number of parameters. The dilated CNN architecture is constructed using cropping layers, convolution layers, concatenation layers, dropout layers, and input and output layers. The shape of the dilated CNN architecture looks like an inverted pyramid. The objective of the dilated CNN architecture is to retain the larger segments of the receptive field and thus decrease the size of the images in order to perform pixel-level classification. To retain the larger portions of the receptive field and to decrease the size of the images, dilated convolution is introduced in the earlier stages of the network. Different paths of this network are combined later to form a single path. The input layer is followed by many cropping layers, which lead to convolution layers, which are then combined
by the concatenation layers to form a single network. The cropping layer is used to crop the feature map or the reference layer (size, height and width). The convolutional layer preserves the pixel data by learning the image features. The concatenation layer gets inputs in diverse dimensions and associates them along a specified dimension; the inputs must have the same size in all other dimensions. The dropout layer is used to avoid overfitting of the neural network in the visible or hidden layers. The dilated CNN has one input layer, one average pooling layer, 20 cropping layers, 56 convolution layers, 4 concatenation layers and 2 dropout layers.
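A short Keras sketch (assuming TensorFlow 2.x) of a single dilated convolution layer of the kind described above: a 3 × 3 kernel with a dilation rate of 2 covers a 5 × 5 receptive field while keeping only 9 weights per input channel; the input shape and filter count are illustrative.

```python
import tensorflow as tf

inputs = tf.keras.Input(shape=(256, 256, 3))
x = tf.keras.layers.Conv2D(filters=32, kernel_size=3, dilation_rate=2,
                           padding="same", activation="relu")(inputs)
model = tf.keras.Model(inputs, x)
# The layer has 32 * (3*3*3 + 1) = 896 trainable parameters, the same as an
# undilated 3 x 3 convolution, despite the enlarged 5 x 5 receptive field.
model.summary()
```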
2.2 UNET Architecture UNET is an encoder-decoder type model which contains contracting and expanding paths. The contracting path extracts features at different levels. The expanding path interpolates the result to achieve a higher resolution of the detected features. The UNET architecture is constructed using convolution layers, max pooling, up-sampling, and input and output layers. The UNET model contains 23 convolutional layers (19 convolutional layers and 4 transposed convolutional layers), 4 max pooling layers, 4 up-sampling layers and 4 concatenation layers. The attractive feature of the UNET architecture is that it requires very few training images to produce a more precise segmentation output. The reason is that pooling layers are replaced by up-sampling layers which increase the resolution of the output. A contracting path is placed on the left side of the model, and an expansive path is placed on the right side of the model. The classic architecture of a convolutional network is followed in the contracting path, where a 3 × 3 convolution is repeatedly applied. Then, the max pooling operation is carried out with stride 2 to perform down-sampling. At each down-sampling stage, the number of feature channels is doubled. In the expansive path, up-sampling of the feature map is followed by a 2 × 2 up-convolution, and at each up-sampling stage, the number of feature channels is halved. The cropped feature map obtained from the contracting path is concatenated with the corresponding feature in the expanding path, each followed by a ReLU. The cropping is essential due to the loss of boundary pixels in each convolution stage. In the last stage, each feature vector is assigned to one of the specific classes through a 1 × 1 convolution. Once the high-resolution features obtained from the contracting path are associated with the up-sampled output, the succeeding convolution layer uses this information to extract a more precise output. As the up-sampling stage of the model provides a higher number of features, context information is forwarded to higher resolution layers. Thus, a u-shaped architecture is formed with the symmetric expansive path and the contracting path. The input image is given to two consecutive convolutional layers. The output of the convolutional layer is sent to the max pooling layer of window size 2 × 2. The important part of the UNET model is the concatenate layer which concatenates
outputs from two layers, from the left and right sides of the UNET model. The last layer is a convolutional layer which uses the sigmoid activation function with a filter size of 1 × 1. During the compilation of the model, binary cross entropy is used as the loss function and the optimizer used is the adaptive learning rate algorithm Adam.
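A condensed Keras sketch (assuming TensorFlow 2.x) of the encoder-decoder pattern described above, with two down-sampling and two up-sampling stages instead of four to keep it short; the input size and filter counts are illustrative and do not reproduce the exact model used in the paper.

```python
import tensorflow as tf
from tensorflow.keras import layers

def tiny_unet(input_shape=(128, 128, 3)):
    inputs = tf.keras.Input(shape=input_shape)
    # Contracting path: repeated 3 x 3 convolutions and 2 x 2 max pooling
    c1 = layers.Conv2D(16, 3, activation="relu", padding="same")(inputs)
    p1 = layers.MaxPooling2D(2)(c1)
    c2 = layers.Conv2D(32, 3, activation="relu", padding="same")(p1)
    p2 = layers.MaxPooling2D(2)(c2)
    b = layers.Conv2D(64, 3, activation="relu", padding="same")(p2)
    # Expansive path: 2 x 2 up-convolutions with skip concatenations
    u2 = layers.Conv2DTranspose(32, 2, strides=2, padding="same")(b)
    u2 = layers.concatenate([u2, c2])
    c3 = layers.Conv2D(32, 3, activation="relu", padding="same")(u2)
    u1 = layers.Conv2DTranspose(16, 2, strides=2, padding="same")(c3)
    u1 = layers.concatenate([u1, c1])
    c4 = layers.Conv2D(16, 3, activation="relu", padding="same")(u1)
    # 1 x 1 convolution with sigmoid produces the building / vegetation mask
    outputs = layers.Conv2D(1, 1, activation="sigmoid")(c4)
    model = tf.keras.Model(inputs, outputs)
    model.compile(optimizer="adam", loss="binary_crossentropy",
                  metrics=["accuracy"])
    return model

model = tiny_unet()
```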
3 Results and Discussion The accuracy and the loss of the dilated CNN model and the UNET model have been analyzed in terms of the training epochs needed for network convergence, the training accuracy, the validation accuracy, and the model loss on the verification set.
3.1 Dataset In this work, images from the Defense Science and Technology Laboratory (DSTL) dataset obtained from the Worldview 3 satellite are taken as study images. The DSTL dataset consists of 450 images in both 3-band and 16-band formats, 25 of which have training labels. The 16-band format contains eight multispectral bands (red, red edge, coastal, blue, green, yellow, near-IR1 and near-IR2) in the range of 400–1040 nm, eight SWIR bands in the range of 1195–2365 nm, and a panchromatic image in the range of 450–800 nm. In this work, information of all 20 bands is utilized in training the model. The spatial resolution of the images is 1.24 m. The images covering several categories such as vegetation, barren land, bridges, roads, urban houses and commercial buildings are considered. Since the free cloud service platform Google Colab provides a free GPU, the CNN model is implemented in the Google Colab platform. It is a free Jupyter notebook environment that runs in the cloud and stores the notebook on Google Drive. The system configuration used is an Intel Core i5, 2.97 GHz and 8 GB RAM. The buildings and vegetation areas identified using dilated CNN and UNET from Study Site 1, Study Site 2 and Study Site 3 are illustrated in Figs. 1, 2 and 3, respectively. The classified land use (building area) outputs of the UNET and dilated CNN alone are given as the input to the Canny edge detector. The number of buildings identified from the built-up areas is as follows: from Study Site 1 (Fig. 1), 102 through dilated CNN and 80 through the UNET architecture; from Study Site 2 (Fig. 2), 149 through dilated CNN and 79 through UNET; and from Study Site 3 (Fig. 3), 365 through dilated CNN and 145 through UNET. The variation in accuracy versus the number of epochs is illustrated in Fig. 4a. The training accuracy (accuracy of the trained model) is shown in orange, and the validation accuracy (accuracy on the test image) is shown in blue. It is identified
Fig. 1 Classified buildings and vegetation from Study Site 1: a test image 1, b brightness image, d dilated CNN (buildings), e dilated CNN (vegetation), g UNET (buildings), h UNET (vegetation)
from Fig. 4a that accuracy increases with the increase in the number of epochs. It is also found that both the training and validation accuracies are about 92% after 8 epochs. The variation in loss with respect to the number of epochs is shown in Fig. 4b, in which the training loss and validation loss are shown in orange and blue, respectively. It is inferred from Fig. 4b that the loss decreases with the rise in the number of epochs. The validation loss and the training loss are almost the same after two epochs and reach almost zero. The batch processing technique is conventionally used in the training phase of the CNN model. The batch size determines the degree of optimization and the speed of the CNN model. Different batch sizes are used to train datasets on the CNN model to analyze the training results as shown in Table 1. The validation accuracy and loss of the architectures are analyzed by employing various activation functions and by varying batch sizes such as 8, 16 and 32. From
Fig. 2 Classified buildings and vegetation from Study Site 2: a test image 2, b brightness image, c dilated CNN (buildings), d dilated CNN (vegetation), e UNET (buildings), f UNET (vegetation)
Table 1, it is inferred that the sigmoid activation function gives more accuracy than the Softmax activation function. When the batch size of the sigmoid function is minimum, say 8, the time taken for the execution of each epoch increases and the total training accuracy increases. The time complexity of the UNET is more than that of the dilated CNN. The dilated CNN provides a validation accuracy of about 72.4%, whereas UNET provides a higher accuracy of about 92.8%.
Fig. 3 Classified buildings and vegetation from Study Site 3: a test image 3, b brightness image, c dilated CNN (buildings), d dilated CNN (vegetation), e UNET (buildings), f UNET (vegetation)
Even though the dilated convolution has the attractive feature of shorter training time due to the use of dilated kernels, the discontinuity among the dilated convolution kernels caused by the holes leads to poor accuracy. The discontinuity causes the exclusion of certain pixels, which in turn removes the continuity information of the image during feature map extraction. This loss of information reduces the validation accuracy.
Fig. 4 Performance analysis: a plot of model accuracy, b number of epochs versus loss
Therefore, the dilated CNN architecture is suitable only for pixel-level classification as a single model; a separate model is required to improve the accuracy, but the computation needed will be more. In the UNET architecture, up-sampling layers increase the resolution of the output and produce higher validation accuracy. The good performance measures achieved in land use classification through the UNET architecture show that the work can be adopted for the land use classification application.
Table 1 Performance analysis

Activation   Number of epochs   Batch size   Time (per epoch)   Validation accuracy (%)   Validation loss (%)
Softmax      2                  16           1049               17.13                     13.21
Sigmoid      5                  8            1088               90.67                     0.2578
                                16           1058               90.48                     0.2655
                                32           1080               90.16                     0.2601
Sigmoid      10                 8            1034               92.69                     0.1912
                                16           1032               92.56                     0.1876
                                32           1015               92.52                     0.1914
Sigmoid      15                 8            1096               93.87                     0.1609
                                16           1044               93.84                     0.1648
                                32           1004               93.73                     0.1591
4 Conclusion In this paper, an analysis of land use classification is carried out. We have classified the built-up area and vegetation area using the dilated CNN and UNET architectures. In this work, we have used three test images for assessing the land use classification model. The accuracy of the dilated CNN is 72.4%, and UNET is able to provide an accuracy of 92.8%. The good performance measures achieved in land use classification through the UNET architecture show that the work can be implemented for land use classification. The performance of the model may be enhanced by training with a bigger dataset. Future investigations can be made on modifying the UNET model to improve the performance of the land use classification.
References
1. Su W, Li J, Chen YH, Zhang JS, Hu DY, Liu CM (2007) Object-oriented urban land-cover classification of multi-scale image segmentation method: a case study in Kuala Lumpur City Center, Malaysia. J Remote Sens 04
2. Myint S, Lam N, Tyler J (2004) Wavelets for urban spatial feature discrimination: comparisons with fractal, spatial autocorrelation, and spatial co-occurrence approaches. Photogramm Eng Remote Sens 70:803–812
3. Holloway J, Mengersen K (2018) Statistical machine learning methods and remote sensing for sustainable development goals: a review. Remote Sens 10:1365
4. He K, Zhang X, Ren S, Sun J (2016) Deep residual learning for image recognition. IEEE Conf Comput Vis Pattern Recognit 1:770–778
5. Krizhevsky A, Sutskever I, Hinton GE (2012) ImageNet classification with deep convolutional neural networks. In: Advances in neural information processing systems, pp 1097–1105
6. Simonyan K, Zisserman A (2014) Very deep convolutional networks for large-scale image recognition. arXiv:1409.1556
7. Schmidhuber J (2015) Deep learning in neural networks: an overview. Neural Netw 61:85–117
8. Lei X, Pan H, Huang X (2019) Dilated CNN model for image classification. IEEE Access 7:124087–124094
9. Sainath TN, Mohamed A, Kingsbury B, Ramabhadran B (2013) Deep convolutional neural networks for LVCSR. In: IEEE international conference on acoustics, speech and signal processing, 26–31 May, Vancouver, BC, Canada, pp 8614–8618
10. Ronneberger O, Fischer P, Brox T (2015) U-Net: convolutional networks for biomedical image segmentation. In: Navab N, Hornegger J, Wells W, Frangi A (eds) Medical image computing and computer-assisted intervention—MICCAI 2015. Lecture notes in computer science, vol 9351. Springer, Cham
11. Wagner FH, Dalagnol R, Tarabalka Y, Segantine TYF, Thomé R, Hirye HCM (2020) U-Net-Id, an instance segmentation model for building extraction from satellite images—case study in the Joanópolis City, Brazil. MDPI
12. Pan Z, Xu J, Guo Y, Hu Y, Wang G (2020) Deep learning segmentation and classification for urban village using a worldview satellite image based on U-Net. Remote Sens
13. Li H, Lu H, Lin Z, Shen X, Price BL (2015) LCNN: low-level feature embedded CNN for salient object detection. arXiv preprint arXiv:1508.03928
14. Huang Z, Cheng G, Wang H, Li H, Shi L, Pan C (2016) Building extraction from multi-source remote sensing images via deep deconvolution neural networks. In: IEEE international geoscience and remote sensing symposium (IGARSS), 10–15 July, Beijing, China, pp 1835–1838
Enhanced ATM Security Using Facial Recognition, Fingerprint Authentication, and WEB Application K. V. Gunalan, R. A. Sashidhar, R. Srimathi , S. Revathi , and Nithya Venkatesan
Abstract Cash is the most liquid commodity and is used instantly to carry out economic activities such as acquisition, selling, or payment of debt and to meet a person's basic needs. There is an emergence of ATM frauds in the country and across the globe. The Automatic Teller Machine (ATM) is highly susceptible due to the weakness of its security. Therefore, the present security measures need to be improved based on a human biometric-based identification system. An enhanced ATM security system is proposed using RFID (Radio Frequency Identification), facial recognition, fingerprint authorization, and web development. The proposed system is devised using a Raspberry Pi microprocessor. The RFID and fingerprint authorization are the foremost security checks to initiate a transaction. Facial recognition, with the camera as the secondary security check of the system, enables access to the further steps of the transaction. In the event of a security breach, the respective details will be updated in the web application. The web application is accessed only by authorized personnel. The proposed system is also validated by experimental results for three different cases. From the results, it is evident that the proposed system is highly secure for money transactions in ATMs. Keywords Automated Teller Machine (ATM) · Biometric face recognition · Raspberry pi · Webpage
Supported by VIT University Chennai. K. V. Gunalan Wipro Technologies, Elcot SEZ, Chennai, India R. A. Sashidhar Amrita Vishwa Vidyapeetham University, Coimbatore, India R. Srimathi (B) · S. Revathi · N. Venkatesan SELECT, VIT Chennai, Chennai, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 N. Subhashini et al. (eds.), Futuristic Communication and Network Technologies, Lecture Notes in Electrical Engineering 966, https://doi.org/10.1007/978-981-19-8338-2_22
1 Introduction The Automated Teller Machine (ATM) is an Automatic Banking Machine (ABM). It enables the customer to complete simple money transactions without any assistance from the bank representatives. Magnetic stripe technology in cards is one of the popular inventions in modern business history. These cards were widely used as credit cards, identity cards, and tickets for transportation [1]. The magnetic stripe card details can be easily duplicated and misused. Moreover, the stolen data from the magnetic stripe card can be used to forge counterfeit cards to purchase goods fraudulently [2]. Thereby, magnetic stripe cards were replaced by EMV (Europay, Mastercard, and Visa) chip-embedded ATM cards to avoid these criminal activities. Chip cards can close the safety holes left by the magnetic stripe cards: the static, insecure data is replaced with cryptographic dynamic data [3]. These cards cannot be duplicated as counterfeit cards since they have a chip-based microcircuit. But chip-based cards have disadvantages such as minimal security during online transactions and delays in processing checkout speeds [4]. The RBI data states that the total number of fraud cases for amounts below INR 1 lakh has increased from 32,732 (FY18) to 50,438 (FY19). The total money involved is INR 59.43 crores (FY18) and INR 78.04 crores (FY19), respectively [5]. Hence, safety features such as hidden cameras are placed in ATM centers and safety pin codes for cards are issued to the customers. Moreover, some ATMs are placed in public spaces such as gas stations, shopping centers, grocery stores, and airports. This enables easy access to the ATM and less chance of theft or fraud. ATMs deal with currency notes; hence, the ATM design should also concentrate on the protection of notes [6]. ATM threats are categorized into three types of attacks, namely card and currency fraud, logical attacks, and physical attacks. The fraudulent redemption of money from a consumer's bank account by indirect attacks such as card data and ATM pin theft is also possible. To reduce these fraudulent activities, an RFID (Radio Frequency Identification)-based system has been proposed [7, 8]. This RFID system uses a unique number generator, and hence, it is impossible to predict the user data. Face recognition technology is also used for verification of user data in the ATM system. In facial recognition, the face of the individual is compared with all the other individual images stored in the database [9]. Tilt and vibration sensors are used to detect abnormal behavior on the ATM [10, 11]. Human biometrics are used for the identification of ownership of any credential [12]. Among them, fingerprint authentication is secure as each individual fingerprint is unique. Some systems also use a face-recognizing camera to capture the face of the person who accesses the ATM; this verification serves as an access grant [13]. However, facial recognition fails to distinguish between identical twins due to similar facial points. In such cases, unique fingerprint authentication is used to identify the particular individual precisely. Though fingerprints are unique, they can also be replicated. These replications can be produced from imprints on glue, polycarbonate molding, and high-resolution pictures [14]. All of the abovementioned systems are different security systems with a single level of security check. Therefore,
to prevent the ATM threats, a multistep security system that is hard to bypass is proposed and developed in this paper. The proposed security system uses computer vision, fingerprint authorization, and web development with a Raspberry Pi to perform money transactions. However, the system does not support GSM technology or components such as vibration sensors, DC motors, and stepper motors, which would allow the prediction of physical attacks such as gas attacks or theft on the ATM machine from outside. This paper has been organized as follows: the architecture of the proposed ATM security system is discussed in Sect. 2. The operation of the proposed system is presented in Sect. 3, whereas experimental results are analyzed in Sect. 4. Finally, conclusions are discussed in Sect. 5.
2 Architecture of the Proposed System The block diagram of the proposed enhanced ATM security system is shown in Fig. 1. The Main Processing Device (MPD) of the proposed system is the Raspberry Pi microprocessor. All the security-level devices such as the facial recognition camera, the Radio Frequency Identification (RFID) device, the biometric device, and a web server are connected to the MPD. The data from the RFID, the biometric device, the Pi camera, and the web application is processed simultaneously in the MPD. To initiate the transaction in an ATM, RFID chip-based ATM cards are used instead of EMV chip-based cards. The role of biometric authentication and facial identification is to examine whether the right cardholder is operating the ATM. The data from the facial points is trained in the MPD. In case of any security breach, an alert will be driven to the common web server which connects all the ATMs. The MPD conveys the important details like the time of the incident, the incident location, and the details of the security breach to the web server.
2.1 RFID RFID is an Automatic Identification (Auto-ID) technology that uses radio waves to automatically identify individuals or objects from a distance of several inches to a hundred feet [15, 16]. For the identification purpose, the combination of a tag and a reader is used. An RFID tag is a small electronic transponder that contains an antenna and a microchip. The code is contained in the RFID tag, and a physical object is connected to it. The object is distinctly recognizable by an identification code, and the code is transmitted to the reader. The reader thus receives the object information [17]. There are different types of RFID tags based on the various radio frequencies, and the choice of type is based on the application. They are low-frequency, high-frequency, ultra-high-frequency, and microwave RFID. Besides frequency, tags are also categorized as active and passive. An active tag has its own source of power, whereas a passive tag gets power from the radio frequency signal of the reader. The
Fig. 1 Block diagram of the proposed ATM security system
reader antenna transmits an RF (Radio Frequency) signal, and the tag gathers energy from the RF signal. To gain energy, inductive coupling is used in the case of low- and high-frequency tags and back-scatter coupling in the case of ultra-high-frequency tags. In the proposed security system, a low-frequency passive RFID tag (MFRC-522) is used as these tags are cheap and compact. The tag is configured by a write code for the first time to enable the tag, and the object information is written into the RFID tag. Similarly, the reader (ThinRFID-SICRFTH) is also initialized by a read code so that it can read the information from the tag. The read information is then transferred to the MPD. The processor compares the read information with the information stored in its database.
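A brief sketch of the card-read step on the Raspberry Pi, assuming the widely used mfrc522 Python package for the MFRC-522 module; the database lookup is only indicated, not implemented.

```python
from mfrc522 import SimpleMFRC522
import RPi.GPIO as GPIO

reader = SimpleMFRC522()
try:
    # Blocks until a card is presented, then returns its UID and stored text
    tag_id, tag_text = reader.read()
    print("Card UID:", tag_id)
    # The MPD would now look up tag_id in its database before the next
    # security level (fingerprint and face) is triggered.
finally:
    GPIO.cleanup()
```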
2.2 Facial Recognition Facial recognition is the task of identifying and verifying a face by computer vision. FaceNet [18] is a face recognition model used to derive the best facial features from the face, which are also called face embeddings. The face embeddings are derived from the given datasets. The derived embeddings are stored in the MPD as a graph in a library called TensorFlow. TensorFlow [19] is an open-source library for Artificial Intelligence (AI). The Application Program Interface (API) of AI in TensorFlow is used for image processing on the personal machine without any Graphical Processing Unit (GPU). Using the low-level to high-level APIs, face recognition is possible on the Central Processing Unit (CPU) of the personal computer.
Machine learning algorithms [20] of AI are used to train the datasets in the machine to classify the face of one person from other persons. A large dataset of face embeddings is used to train the machine. There are different types of machine learning algorithms such as unsupervised, supervised, and reinforcement learning algorithms. A support vector classifier (SVC) [21] is a type of supervised learning algorithm and is used in this ATM security system. The SVC is used to classify the positive face (the actual person's face) from negative faces (other stored persons' faces). The SVC takes a test face as input and gives the output as an array of percentages over all the given persons' faces. The percentage in the array represents the similarity of the test face with each person's face stored in the database. The highest percentage value in the output array is considered the correct output, and the respective face is taken as the final output. Each class of faces is stored in a different folder with a different name. According to the face, the respective name of the folder is given as the final output. In this system, the Pi camera (Raspberry Pi—100003) is used to capture the image of the individual. The captured face is converted into a face embedding by the code written in the microprocessor.
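A minimal sketch of the classification step: precomputed face embeddings (e.g. 128-dimensional FaceNet vectors) are classified with a support vector classifier from scikit-learn. The embedding files, labels, and helper name are illustrative assumptions, not artifacts of the paper.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.preprocessing import Normalizer

# X_train: (n_samples, 128) face embeddings, y_train: person name per embedding
X_train = np.load("embeddings.npy")          # illustrative file names
y_train = np.load("labels.npy")

norm = Normalizer(norm="l2")
clf = SVC(kernel="linear", probability=True)
clf.fit(norm.transform(X_train), y_train)

def identify(face_embedding):
    """Return the most likely person and the per-class probability array."""
    probs = clf.predict_proba(norm.transform(face_embedding.reshape(1, -1)))[0]
    return clf.classes_[np.argmax(probs)], probs
```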
2.3 Biometric Authentication Biometric authentication [22] is used to identify a person based on their biological characteristics. It is used in the security system to identify and verify a person with greater accuracy. A unique electronic signature is created for each individual at the time of enrollment, and it is saved in the database. There are three types of fingerprint/biometric scanners, namely optical, capacitive, and ultrasonic scanners. In this system, an optical fingerprint scanner (Generic R307) is used. Optical fingerprint scanners work on the concept of Frustrated Total Internal Reflection (FTIR) [23]. The scanner captures an optical image using Complementary Metal Oxide Semiconductor (CMOS) image sensors. Later, the position and direction of the ridges in the image are converted into numeric data using cryptographic hashing functions [24]. The security system verifies the captured biometric data of an individual against the data stored in the database.
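A short sketch of the fingerprint check, assuming the community pyfingerprint library for the R307 optical sensor on a serial port; the port name and baud rate are illustrative, and this is not the authors' implementation.

```python
from pyfingerprint.pyfingerprint import PyFingerprint

# Open the R307 sensor (port, baud rate, address, password are typical defaults)
sensor = PyFingerprint('/dev/ttyUSB0', 57600, 0xFFFFFFFF, 0x00000000)
if not sensor.verifyPassword():
    raise RuntimeError('R307 sensor password is wrong')

print('Place the finger on the sensor...')
while not sensor.readImage():          # wait until a finger image is captured
    pass
sensor.convertImage(0x01)              # convert the image into characteristics
position, score = sensor.searchTemplate()   # compare against enrolled templates
if position >= 0:
    print('Match at template position', position, 'with score', score)
else:
    print('No match: the transaction is blocked')
```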
2.4 Data Security Secure Shell (SSH) is a cryptographic network protocol that allows data to be communicated securely between the computer and the Raspberry Pi. To control the Raspberry Pi without using a monitor or keyboard, the IP address of the Pi is configured by a command on the command line. Moreover, each and every IP address of the ATM machines is mapped using a GUI (Graphical User Interface). Hence, the data of every ATM machine will be transferred on a secured platform. In the proposed new system, the facial data and fingerprint details will be stored in the bank's server (i.e., the database).
Whenever an ATM card holder accesses the ATM, the account details, facial details, and fingerprint details are compared with the database. The facial and fingerprint details are encrypted using hashing functions, which ensures the integrity of the information. The ATMs are connected to the bank server using a leased line through the host computer, so data is transferred without any third-party users.
2.5 EATMSP Web Application The Enhanced ATM Surveillance Portal (EATMSP) web application is built to provide tight security in money transactions. This application acts as the common server connecting all the ATM centers. Moreover, it also updates the information on a security breach to the authorized person, who belongs to the corresponding zone of the Reserve Bank of India (RBI). The server for this web application is hosted in the cloud. The Internet connection for the Raspberry Pi is provided by a laptop via an Ethernet cable. Front-end web development is responsible for making the User Interface (UI) of a webpage. The front end is built using HTML5, Cascading Style Sheets (CSS), Bootstrap, and JavaScript. The front-end UI is important because it lets a site or web application communicate with the user easily without presenting too much information. Back-end development handles the functional capability and computational logic of the web application. It is the server-side programming of the application and acts as a bridge that connects the web to the database, server, and application. The front-end UI receives the final processed information from the back end and displays it to the end-user. The back end is developed using the MEAN (MongoDB, Express.js, AngularJS, Node.js) stack [25]. The MEAN stack is built entirely on the JavaScript programming language and provides more productivity than the LAMP (Linux, Apache, MySQL, and PHP, Perl, or Python) stack [26]. A web application built with the MEAN stack can be deployed on any operating system, whereas the LAMP stack needs a Linux operating system. The alert message sent from the microprocessor to the web application uses the JSON (JavaScript Object Notation) data structure. The native data structure of the MEAN stack is JSON, so it can be accessed in the back end without any parsing. The EATMSP web application is built on the REST (Representational State Transfer) architecture. A RESTful application uses HTTP requests to post data (create or update), read data, and delete data in the web application. An HTTP (Hyper Text Transfer Protocol) request containing the security breach information is made from the Raspberry Pi (client) to the web application (server). The web application responds to the Raspberry Pi with a status code. The status code indicates the completion of a specific HTTP request; for a successful response, the code is in the range 200–299. In addition, a log-in system is developed in this web application to ensure data security and is given in Fig. 2b. Every ATM center of different banks in a particular zone is listed on the home page of the web application as given in Fig. 2a. An individual webpage is deployed for every bank, covering all the ATMs of a particular zone.
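The breach-alert request described above can be sketched as follows; the endpoint path, the unique ATM id, and the JSON field names are assumptions made for illustration, not the actual EATMSP interface.

```python
# Minimal sketch of the Raspberry Pi posting a breach alert as JSON (endpoint and fields assumed).
import requests

ATM_ID = "atm-0001"                                   # hypothetical unique id embedded in the URL
URL = f"https://eatmsp.example.com/api/atms/{ATM_ID}/alerts"

payload = {
    "issue": "fingerprint mismatch",
    "rfid_ok": True,
    "face_ok": True,
    "fingerprint_ok": False,
}

response = requests.post(URL, json=payload, timeout=10)

# 2xx status codes indicate that the alert was accepted by the web application.
if 200 <= response.status_code <= 299:
    print("Breach alert recorded")
else:
    print("Alert failed with status", response.status_code)
```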
The location of the ATM center and the bank's contact details are also available on the respective webpage as shown in Fig. 3a. Moreover, a separate webpage is deployed to add new ATM details to the web application. The details are added without disturbing the existing database of the web application, as shown in Fig. 3b. These details can be added only by the authorized person on the server. The web application interacts with the IoT devices [27] (RFID, biometric authorization, and facial recognition) present in the ATM to verify that no security breach has occurred. A unique id is generated for every ATM when the details of the respective ATM are entered into the database, and this unique id is embedded in the web address of the ATM webpage. The details of a security breach are updated by invoking a POST request (HTTP request) from the microprocessor to the web application; the request address includes the unique id. The request is decoded in the cloud using Node.js (back-end development), and the respective ATM is determined using the unique id. Subsequently, the issue is posted on the particular ATM page and also on the home page of the EATMSP web application.
3 Operation of the Proposed System All the devices, namely the Pi camera, the RFID tag and reader, and the biometric device, are interfaced with the MPD. The schematic of the proposed system is shown in Fig. 4. The Raspberry Pi 3 Model B+ in this security system has a Broadcom BCM2837 SoC with a 1.4 GHz 64-bit quad-core ARM Cortex-A53 processor and 512 KiB of shared L2 cache. It has a 40-pin header which includes 12 power (+5 V, +3.3 V, GND) pins, 26 General Purpose Input Output (GPIO) pins, and 2 ID EEPROM (ID_SD and ID_SC) pins. The GPIO pins include Universal Asynchronous Receiver Transmitter (UART) interface pins (RXD, TXD), Serial Peripheral Interface pins [SPI—MOSI (Master Out Slave In), MISO (Master In Slave Out), SCLK (Serial Clock), CE (Chip Enable, also called slave select)], Two Wire Interface (TWI) or I2C (Inter-Integrated Circuit) pins, and 4 PWM pins. UART pins are used for interfacing with sensors, while SPI pins are used to communicate with other boards or peripherals. SPI communication is used when the peripheral needs a high communication speed. In the proposed system, the RFID RC522 is interfaced with the microprocessor via the Serial Peripheral Interface (SPI) pins. Similarly, the fingerprint module and the Pi camera are interfaced through the USB (Universal Serial Bus) port and the CSI (Camera Serial Interface) port, respectively. The RFID RC522 is an eight-pin device powered with 3.3 V. SPI is configured on the Raspberry Pi using the command "sudo raspi-config". Once the microprocessor is configured, the processor acts as the master and the RFID device acts as the slave. The serial clock [SCLK pin of the Serial Peripheral Interface (SPI)] of the microprocessor provides pulses at a regular frequency to the SCK pin of the RFID reader. During the rising edge of every clock pulse, the handshake signal is transmitted from the master to the slave using the MOSI (Master Out Slave In) pin, to
Fig. 2 Webpage for all ATMs: a login system for ATM access, b webpage with all the ATMs
initiate the RFID device. Similarly, during the falling edge of every clock pulse, the user's RFID data is transmitted from the slave to the master via the MISO (Master In Slave Out) pin. The chip enable (CE0 and CE1) pins of the microprocessor are used to select the interfaced device that is ready to transmit data. These two pins are also used to enable multiple peripherals that share the common CLK, MOSI, and MISO pins of the microprocessor. The fingerprint (Generic R307) module transfers the biometric data of the user to the microprocessor. A USB to serial TTL (Transistor-Transistor Logic) UART converter is used so that the fingerprint sensor can be plugged into a USB port rather than using the UART pins of the Raspberry Pi. The image data from the fingerprint module is transmitted by its TX pin to the RX pin of the TTL UART. Subsequently,
Fig. 3 Webpage to add ATMs with location: a webpage for an ATM with its location, b webpage to add new ATM details
the data from TTL (TL-26098) is transferred to the microprocessor via the USB port. To establish the connection and to authenticate the data received from the microprocessor, the RX pin of the fingerprint module is connected to the TX pin of TTL. The pi camera is attached to the Camera Serial Interface (CSI) port of the microprocessor via a 15-way ribbon cable. The camera is also enabled by using the command “sudo raspi-config” available in the configuration menu of the microprocessor. Once an image is captured, the pixels of the image are read out by the camera sensor one row at a time. The frame lines are sent to the Image Signal Processor (ISP) present
Fig. 4 Schematic of the proposed ATM security system
in the Graphics Processing Unit (GPU). The Vision Processing Unit (VPU) of the GPU communicates with the microprocessor for the image control parameters to process the image. The processing of the image includes white balance, digital gain, noise reduction, brightness, and contrast; these steps are done to enhance the quality of the captured image. Later, the image of the user is transferred to the microprocessor. The image data is verified by the microprocessor for any mismatch. The flow chart of the proposed system is given in Fig. 5. The binary data received from the RFID reader, fingerprint module, and Pi camera is decoded by a program in the microprocessor. The program is written in Python. The library files RPi.GPIO, mfrc522, and PyFingerprint are used to encode and decode the data through the GPIO pins, the RFID reader, and the fingerprint module, respectively. The Python code interprets and processes the data received from the sensors. It also validates all the critical security checks of the proposed
Fig. 5 Flowchart of the proposed ATM security system
ATM security system. The critical checks include verification of the user data, namely RFID detection, biometric authentication, and facial recognition. In the event of any security breach, the issue is updated in the web application. Subsequently, the transaction initiated by the user is declined by the system.
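The flowchart logic can be summarized in a short sketch like the one below, where three boolean checks decide whether the transaction proceeds or a breach alert is raised; the function names are placeholders, and the alert call stands for the HTTP POST described in Sect. 2.5.

```python
# Minimal sketch of the critical-check logic in Fig. 5 (function names are placeholders).
def send_breach_alert(issue: str) -> None:
    """Placeholder for the HTTP POST that reports the breach to the EATMSP web application."""
    print("Security breach reported to EATMSP:", issue)

def authorize_transaction(rfid_ok: bool, face_ok: bool, fingerprint_ok: bool) -> bool:
    """Allow the transaction only when all three checks agree; otherwise decline and report."""
    checks = {"rfid": rfid_ok, "face": face_ok, "fingerprint": fingerprint_ok}
    failed = [name for name, ok in checks.items() if not ok]

    if not failed:
        return True                         # case 1: all data matched

    # cases 2 and 3: partial or no match -> decline and report the breach
    send_breach_alert("mismatch in " + ", ".join(failed))
    return False

print(authorize_transaction(rfid_ok=True, face_ok=True, fingerprint_ok=False))
```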
4 Experimental Results The proposed system is validated through a hardware setup as shown in Fig. 6. The hardware of the proposed ATM security system is tested for three different cases, namely:
1. All the data (fingerprint authorization, facial recognition, and RFID) of an individual is matched.
2. Only part of the data of an individual is matched.
3. None of the data of an individual is matched.
In all three cases, the user accessing the ATM has to undergo the basic protocols, i.e.,
• RFID information (unique ID) of the user is obtained from the tag.
• The user's face is captured by the Pi camera.
• Fingerprint data of the same user is recorded using the fingerprint sensor.
Fig. 6 Hardware setup of the proposed ATM security system: a hardware prototype, b working model with web portal
All the data of the user is obtained from the Pi camera, fingerprint sensor, and RFID tag. This information is transferred to MPD.
4.1 Case 1: Data Is Matched In this case, the information of the user obtained is compared with the data stored in the database of the MPD. All the captured details of the user data are matched with the data stored in the database. Hence, the user is authorized for money transaction and no warning message is displayed in the web application as shown in Fig. 7.
Fig. 7 Output of the system for case 1: a microprocessor console output, b web portal output
Fig. 8 Output of the system for case II: a microprocessor console output, b web portal output
4.2 Case II: Data Is Partially Matched When the user accesses the ATM, all the details of the user are obtained based on the protocol. The user's face is identified and is compared with the fingerprint data and RFID data stored in the database. The person is identified during the facial recognition process, but the user's biometric data does not match the data of the facial recognition and the RFID. Therefore, the person is restricted from further money transactions. The details of the security breach are updated in the web application for the mismatch of fingerprint authorization. The results of case II are shown in Fig. 8.
4.3 Case III: None of the Data Is Matched In this case, the user who accesses the ATM undergoes the basic protocols. The details of the user are compared with the details stored in the database. None of the captured data (fingerprint, facial recognition, and RFID) matches the data
Fig. 9 Output of the system for case III: a microprocessor console output, b web portal output
stored in the database. The user's face is marked as "Unknown", and the person is prevented from further transactions. An alert message is sent to the authorized web server. The details of the security breach are updated in the web application and are shown in Fig. 9. All three cases are verified using the given security system. The results are accurate and are processed within a few seconds. The proposed security system can be further strengthened with an iris scanner for the user and a vibration sensor to detect theft in the ATM. These two features are identified as future work for the proposed system.
5 Conclusion ATMs have become more important for people across the globe owing to the fact that millions of money transactions happen through ATMs every day. Though the existing ATM is secured with personal identification number (PIN) access, the security of the ATM is vulnerable and needs to be enhanced. Therefore, an enhanced ATM security system is proposed and implemented with the use of human biometrics such as fingerprint authentication and facial recognition. The human biometrics used in this system are hard to duplicate and make the security very difficult to breach. The system is also verified experimentally for three cases. From the results, it is evident that there will be an increase in the level of trust and confidence of society in electronic transactions via ATM. Moreover, the security system provided in this paper is of great value to the RBI for enhancing the present architecture of the ATM. The designed system is simple, easy to maintain, and operates at a low cost. The entire framework of the system is well protected, robust, and easy to use. The introduction of the 5G mobile network helps to control more IoT devices remotely in such applications due to its unique combination of high-speed connectivity and very low latency. More research and innovation are needed to enhance and strengthen the present security level of the ATM system in the future.
References
1. Bradbury D (2016) Why we need better ATM security. Eng Technol 11:32–35
2. Kouser F, Pavithra V, Sree B, Others (2018) Highly secure multiple account bank affinity card—a successor for ATM card. In: 2018 International conference on design innovations for 3Cs compute communicate control (ICDI3C), pp 115–119
3. Waller P (2017) Electronic payment mechanisms in social security: extending the reach of benefit and contribution transactions. Int Soc Secur Rev 70:3–30
4. Markantonakis K, Main D (2017) Smart cards for banking and finance. In: Smart cards, tokens, security and applications, pp 129–153
5. Rakhecha P, Tanwar M (2019) Study on e-banking and digital payment movement of India. J Gujarat Res Soc 21:470–473
6. Sagar B, Singh G, Saket R (2011) Design concept and network reliability evaluation of ATM system. Int J Comput Aided Eng Technol 3:53–76
7. Jain A, Ross A, Prabhakar S (2004) An introduction to biometric recognition. IEEE Trans Circ Syst Video Technol 14:4–20
8. Srivatsa K, Yashwanth M, Parvathy A (2010) RFID & mobile fusion for authenticated ATM transaction. Int J Comput Appl 3:5–10
9. Ayed M, Elkosantini S, Alshaya S, Abid M (2019) Suspicious behavior recognition based on face features. IEEE Access 7:149952–149958
10. Kishore R, Suriya S, Vivek K, Others (2019) Enhanced security for ATM machine with OTP and facial recognition features. Int Res J Multidisc Technovation 1:106–110
11. Shriram S, Shetty S, Hegde V, Nisha K, Dharmambal V (2016) Smart ATM surveillance system. In: 2016 International conference on circuit, power and computing technologies (ICCPCT), pp 1–6
12. Umar M, Mehmood A, Song H, Choo K (2017) I-marks: an iris code embedding system for ownership identification of multimedia content. Comput Electr Eng 63:209–219
13. Lim D, Devadas S (2005) Extracting secret keys from integrated circuits. IEEE Trans VLSI Syst 13:1200–1205
14. Schultz C, Wong J, Yu H (2018) Fabrication of 3D fingerprint phantoms via unconventional polycarbonate molding. Sci Rep 8:1–9
15. Parkash D, Kundu T, Kaur P (2012) The RFID technology and its applications: a review. Int J Electron Commun Instrum Eng Res Deve (IJECIERD) 2:109–120
16. Ahsan K, Shah H, Kingston P (2010) RFID applications: an introductory and exploratory study. ArXiv Preprint ArXiv:1002.1179
17. Uddin J, Reaz M, Hasan M, Nordin A, Ibrahimy M, Ali M (2010) UHF RFID antenna architectures and applications. Sci Res Essays 5:1033–1051
18. Wu Y, Liu H, Li J, Fu Y (2018) Improving face representation learning with center invariant loss. Image Vis Comput 79:123–132
19. Badave A, Jagtap R, Kaovasia R, Rahatwad S, Kulkarni S (2020) Android based object detection system for visually impaired. In: 2020 International conference on industry 4.0 technology (I4Tech), pp 34–38
20. Kumar D, Amgoth T, Annavarapu C (2019) Machine learning algorithms for wireless sensor networks: a survey. Inf Fusion 49:1–25
21. VenkateswarLal P, Nitta G, Prasad A (2019) Ensemble of texture and shape descriptors using support vector machine classification for face recognition. J Amb Intell Human Comput, pp 1–8
22. Tyagi A, Simon R, Others (2019) Security enhancement through IRIS and biometric recognition in ATM. In: 2019 4th International conference on information systems and computer networks (ISCON), pp 51–54
23. Vrzakova H, Begel A, Mehtätalo L, Bednarik R (2020) Affect recognition in code review: an in-situ biometric study of reviewers affect. J Syst Softw 159:110434
24. Verma G, Liao M, Lu D, He W, Peng X, Sinha A (2019) An optical asymmetric encryption scheme with biometric keys. Opt Lasers Eng 116:32–40
25. Tilkov S, Vinoski S (2010) Node.js: using JavaScript to build high-performance network programs. IEEE Internet Comput 14:80–83
26. Poulter A, Johnston S, Cox S (2015) Using the MEAN stack to implement a RESTful service for an Internet of Things application. In: 2015 IEEE 2nd world forum on internet of things (WF-IoT), pp 280–285
27. Calderoni L, Magnani A, Maio D (2019) IoT manager: an open-source IoT framework for smart cities. J Syst Arch 98:413–423
Spatial and Temporal Analysis of Water Bodies in Bengaluru Urban Using GIS and Satellite Image Processing S. Meghana and M. Geetha Priya
Abstract Wetlands, also known as water bodies, play a vital role in maintaining natural balance, and it is very important to maintain and preserve them. They act as sources of drinking water, replenish groundwater, and make the environment a better place to live. GIS and remote sensing data are widely used for monitoring wetland distribution. The use of remote sensing data is advantageous for wetland studies such as monitoring and creating an inventory, as it is less time-consuming than aerial photography; it is also economical and can be used by developing countries. Lakes in Bengaluru, Karnataka have been disappearing due to rapid urbanization, which has posed a major danger to natural resources in several parts of Bengaluru. Preservation of water bodies has become more vital in today's scenario; hence, the present study was conducted for major lakes in Bengaluru urban. For this study, Landsat-7 and Landsat-8 multispectral satellite data images were used for a study period of 20 years covering 2000–2021. The Normalized Difference Water Index (NDWI) was calculated to identify water bodies, and False Color Composite (FCC) images were used to identify the features of lakes for digitization. Results indicate that the lakes have been rapidly encroached in the past two decades. Keywords Encroachment · Landsat · Multispectral · NDWI · Lake
1 Introduction Water bodies play a vital role in sustaining life. A lake is a water-filled area surrounded by land. Lake ecosystems that are well designed can help mitigate the effects of floods and droughts by holding vast amounts of water and releasing it when needed. Lakes also help to replenish groundwater, improve the water quality of downstream S. Meghana Department of ECE, Jyothy Institute of Technology, Bengaluru, India M. Geetha Priya (B) CIIRC, Jyothy Institute of Technology, Bengaluru, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 N. Subhashini et al. (eds.), Futuristic Communication and Network Technologies, Lecture Notes in Electrical Engineering 966, https://doi.org/10.1007/978-981-19-8338-2_23
watercourses, provide recreational opportunities, and protect the area's biodiversity and ecosystem. Bengaluru, a fast-growing Asian city, has gained popularity as the "Silicon City" due to its advances in information technology. Formerly known as the Garden City, it is losing its lung space as a result of rapid urbanization and multi-faceted industrial development [1]. There are many lakes in Bengaluru. Bengaluru has experienced unprecedented urban growth for the past two decades due to strong industrial development activities and economic development in the region. This steady growth has led to an increase in population and subsequent pressures on infrastructure and natural resources. Lakes have suffered immensely as a result of urbanization, and Bengaluru is no exception in these developments; it is even worse than other cities [2]. Bengaluru's rapid urbanization has resulted in the unplanned spread of housing estates and parks, causing environmental damage. A survey of 105 lakes in Bangalore conducted in the year 2013 showed that 98% of the lakes have been encroached (lake bed, flood plain, etc.) and 90% of the lakes are sewage fed (sustained inflow of untreated sewage) [3]. Encroachment is a major factor in the disappearance of lakes and in the reduction of their storage capacity and area. The first stage in planning for water resource conservation and sustainable management is to establish an accurate and up-to-date survey. For monitoring, evaluation, and conservation of water resources, an inventory of water bodies and their characteristics is required. The best techniques for this purpose are geographic information systems (GIS) and satellite remote sensing. Remote sensing is one such technology that can provide cost- and time-effective solutions to mitigate these problems [4]. This study aims to assess the area lost and the percentage of encroachment on Bengaluru's important lakes, using a remote sensing approach by comparing 20 years of data.
2 Materials and Methods 2.1 Study Area Bengaluru (12°58′44″N, 77°35′30″E) is one of the largest cities in India and (Fig. 1) has been adopted as the study area for the present work. It is at an elevation of 3113 feet (949 m) above sea level. Bengaluru receives an annual rainfall of 900 mm with three different rainy seasons covering nine months of the hydrological year. Bengaluru is known as the "Land of Lakes" because it has a huge number of lakes (about 285) that were built to store water. The number of lakes in Bengaluru has reduced from nearly 285 to 194 [5]. Due to industrialization and growth in technology and science, the city acquired "Silicon Valley" status; however, with globalization, the city lost its glory due to unplanned, unrealistic, and irresponsible urbanization. Lakes in Bellandur, Begur, Hulimavu, Agara, Sarakki, Hoskerehalli, Arekere, Gottigere, Uttarahalli, and Puttenahalli (Bengaluru urban) have been considered for the study (Fig. 1 and Table 1). These are the lakes in Bengaluru urban which are
Fig. 1 Study area map
Table 1 Coordinates of the study area

S. No.  Lakes              Coordinates
1       Bellandur lake     12°56′3″N, 77°39′46″E
2       Begur lake         12°52′28.98″N, 77°37′40.78″E
3       Hulimavu lake      12.87°N, 77.60°E
4       Agara lake         12°55′15.6″N, 77°38′27.6″E
5       Sarakki lake       12°53′54″N, 77°34′40″E
6       Hoskerehalli lake  12°55′35″N, 77°32′4″E
7       Arekere lake       12°52′58.76″N, 77°35′53.19″E
8       Gottigere lake     12.85°N, 77.59°E
9       Uttarahalli lake   12.9078°N, 77.5415°E
10      Puttenahalli lake  12°53′26.37″N, 77°35′12.02″E
encroached by the public as well as government agencies [2] to the extent possible, thus creating a shortage in the groundwater level and affecting the ecosystem and biodiversity (flora and fauna).
Table 2 List of Landsat data used in the present study

Year  Satellite name  Sensor name  Date of acquisition
2000  Landsat-7       ETM+         16-03-2000
2001  Landsat-7       ETM+         03-03-2001
2002  Landsat-7       ETM+         18-02-2002
2003  Landsat-7       ETM+         09-03-2003
2004  Landsat-7       ETM+         11-03-2004
2005  Landsat-7       ETM+         14-03-2005
2007  Landsat-7       ETM+         05-04-2007
2008  Landsat-7       ETM+         06-03-2008
2009  Landsat-7       ETM+         05-02-2009
2010  Landsat-7       ETM+         08-02-2010
2011  Landsat-7       ETM+         31-03-2011
2012  Landsat-7       ETM+         01-03-2012
2013  Landsat-7       ETM+         31-01-2013
2014  Landsat-8       OLI          11-02-2014
2015  Landsat-8       OLI          13-01-2015
2017  Landsat-8       OLI          11-03-2017
2018  Landsat-8       OLI          22-02-2018
2019  Landsat-8       OLI          13-03-2019
2020  Landsat-8       OLI          27-01-2020
2021  Landsat-8       OLI          02-03-2021
2.2 Satellite Data Used The Landsat program, a NASA/USGS collaboration, is the world's longest-running satellite image acquisition project. Multispectral cloud-free satellite data of Landsat-7 (Enhanced Thematic Mapper Plus) and Landsat-8 (Operational Land Imager), from path and row numbers 144 and 51, respectively, have been downloaded from the USGS website (https://earthexplorer.usgs.gov/) for the study period 2000–2021. Data for the years 2006 and 2016 are not considered due to the unavailability of data. The RED, GREEN, BLUE, near-infrared (NIR), and shortwave infrared (SWIR) bands of 30 m resolution are used for the computation of NDWI and FCC. Table 2 shows the details of the satellite data used in the present study.
2.3 Process Flow Landsat-7 consists of 8 bands, and Landsat-8 consists of 11 bands. The Landsat-7 and Landsat-8 data downloaded from USGS are available in digital numbers (DN) format (Level 1, Collection 1). This DN data is preprocessed by converting it to
radiance and then into top-of-atmosphere (ToA) reflectance by using Eqs. (1) and (2), respectively [6].

Lλ = PL × Qcal + AL   (1)

where Lλ is the ToA spectral radiance (W/(m² sr µm)); PL and AL are the band-specific multiplicative and additive radiance rescaling factors, respectively, from the metadata; and Qcal is the calibrated and quantized standard product pixel value (digital number).

ρλ = Pρ × Qcal + Aρ   (2)

where ρλ is the ToA spectral reflectance; Pρ and Aρ are the band-specific multiplicative and additive reflectance rescaling factors, respectively, from the metadata; and Qcal is the calibrated and quantized standard product pixel value (digital number).
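A minimal sketch of this DN-to-radiance and DN-to-reflectance conversion is given below using numpy; the rescaling factors shown are placeholders, and in practice they are read from the Landsat MTL metadata file.

```python
# Minimal sketch: convert Landsat DN values to ToA radiance and reflectance (Eqs. 1 and 2).
# The rescaling factors below are placeholders; real values come from the *_MTL.txt metadata.
import numpy as np

Q_cal = np.array([[7423, 8012], [7630, 7711]], dtype=np.float64)   # sample DN pixels

P_L, A_L = 1.2622e-2, -63.117       # placeholder radiance gain/offset for one band
P_rho, A_rho = 2.0e-5, -0.1         # placeholder reflectance gain/offset for one band

radiance = P_L * Q_cal + A_L          # Eq. (1): ToA spectral radiance
reflectance = P_rho * Q_cal + A_rho   # Eq. (2): ToA reflectance (before sun-angle correction)

print(radiance)
print(reflectance)
```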
2.3.1 Estimation of NDWI
The water bodies are analyzed using the Normalized Difference Water Index (NDWI), which is computed from the near-infrared (NIR) and shortwave infrared (SWIR) reflectance bands as represented in Eq. (3). NDWI was developed by McFeeters (1996) to enhance the water-related features of the landscape and can be calculated by the following formula [7]:

NDWI = (NIR − SWIR)/(NIR + SWIR)   (3)

The NDWI was computed for Landsat-7 and Landsat-8 using Eqs. (4) and (5), respectively. For Landsat-7 data,

NDWI = (Band 4 − Band 5)/(Band 4 + Band 5)   (4)

For Landsat-8 data,

NDWI = (Band 5 − Band 6)/(Band 5 + Band 6)   (5)

2.3.2 FCC Computation
FCC is a picture created artificially in which the colors blue, green, and red are allocated to wavelength regions where they do not belong in nature. Blue is assigned to green radiation (0.5–0.6 µm), green is assigned to red radiation (0.6–0.7 µm), and red is assigned to near-infrared radiation (0.7–0.8 µm) in a typical False Color Composite
Fig. 2 Process flow: Landsat 7/8 data → radiometric and atmospheric corrections → NDWI and FCC → visual interpretation of water body boundaries → on-screen digitization → calculation of area and encroachment (20 years)
[8]. FCC is computed using bands 5, 4, and 3 (SWIR, NIR, and RED) for Landsat-7 and bands 6, 5, and 4 (SWIR, NIR, and RED) for Landsat-8. A vector layer (shapefile) was created by on-screen digitization based on visual interpretation of the water body boundaries for the 20 years of data for the study region. The area of each shapefile was calculated. The region of encroachment was also digitized, and the encroachment area was calculated. The process flow chart used in this study is shown in Fig. 2.
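As an illustration of the NDWI and FCC steps for a Landsat-8 scene, the sketch below uses rasterio and numpy; the band file names and the simple zero threshold for water are assumptions for illustration, not values prescribed by the paper.

```python
# Minimal sketch: NDWI (Eq. 5) and an FCC composite for a Landsat-8 scene.
# Band file names and the water threshold are assumptions for illustration.
import numpy as np
import rasterio

def read_band(path):
    with rasterio.open(path) as src:
        return src.read(1).astype("float64")

nir = read_band("LC08_B5.TIF")     # Band 5: NIR
swir = read_band("LC08_B6.TIF")    # Band 6: SWIR-1
red = read_band("LC08_B4.TIF")     # Band 4: RED

ndwi = (nir - swir) / (nir + swir + 1e-10)   # Eq. (5); epsilon avoids division by zero
water_mask = ndwi > 0                         # assumed threshold for water pixels

# FCC: SWIR, NIR, and RED assigned to the R, G, and B display channels
fcc = np.dstack([swir, nir, red])

print("Water pixels:", int(water_mask.sum()))
```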
3 Results Most of the lakes in Bengaluru city have undergone changes in the last two decades, including shrinking and encroachment for real estate purposes. In some of the lakes, a few cents to several acres of land have been encroached. The changes in the area of the lakes at Bellandur, Begur, Hulimavu, Agara, Sarakki, Hoskerehalli, Arekere, Gottigere, Uttarahalli, and Puttenahalli over the 20 years from 2000 to 2021 are shown in Fig. 3. The polygon in red color represents the change in area/encroachment as compared with the Google Earth application using historical imagery.
Fig. 3 Digitized lake boundaries for 2000 and 2021 with the encroached area marked: a Bellandur lake, b Begur lake, c Hulimavu lake, d Agara lake, e Sarakki lake, f Hoskerehalli lake, g Arekere lake, h Gottigere lake, i Uttarahalli lake, and j Puttenahalli lake
Table 3 Area of lakes from 2000 to 2021 (area in km²)

Year  Bellandur lake  Begur lake  Hulimavu lake  Agara lake  Sarakki lake
2000  3.373           0.492       0.431          0.397       0.281
2001  3.373           0.492       0.431          0.397       0.281
2002  3.373           0.492       0.429          0.397       0.281
2003  3.373           0.492       0.429          0.397       0.281
2004  3.373           0.492       0.429          0.361       0.281
2005  3.373           0.492       0.429          0.361       0.281
2007  3.251           0.492       0.425          0.361       0.229
2008  3.251           0.492       0.425          0.361       0.229
2009  3.251           0.488       0.421          0.361       0.225
2010  3.251           0.488       0.421          0.361       0.225
2011  3.251           0.488       0.421          0.361       0.221
2012  3.251           0.488       0.421          0.361       0.221
2013  3.251           0.481       0.421          0.361       0.221
2014  3.251           0.481       0.421          0.361       0.221
2015  3.251           0.481       0.421          0.361       0.221
2017  3.251           0.481       0.421          0.361       0.221
2018  3.214           0.481       0.421          0.361       0.221
2019  3.214           0.481       0.421          0.361       0.221
2020  3.214           0.481       0.407          0.361       0.221
2021  3.214           0.481       0.407          0.361       0.221
The boundary of the 10 lakes has been digitized individually, and the area has been calculated for 20 years from the FCC and NDWI images, as shown in Fig. 3 and Tables 3 and 4. From Tables 3, 4 and 5, the following observations are made:
• The minimum encroachment, seen in Begur lake, Gottigere lake, Bellandur lake, Arekere lake, and Hulimavu lake, is in the range of 2.2–5.5%. The changes in the area of these lakes are observed from the year 2005 onwards because of the construction of buildings.
• The average encroachment or shrinkage in the area of lakes such as Agara lake, Hoskerehalli lake, and Uttarahalli lake due to industrialization and construction of roads is approximately in the range of 8.6–11.1%.
• The maximum encroachment is seen in Sarakki lake and Puttenahalli lake, which is approximately from 21 to 28.2%. As these lakes are situated in prime locations of Bengaluru city, more encroachment can be seen in terms of residential area.
Table 4 Area of lakes from 2000 to 2021 (area in km²)

Year  Hoskerehalli lake  Arekere lake  Gottigere lake  Uttarahalli lake  Puttenahalli lake
2000  0.221              0.099         0.132           0.051             0.039
2001  0.221              0.099         0.132           0.051             0.039
2002  0.221              0.099         0.132           0.051             0.039
2003  0.221              0.099         0.132           0.051             0.039
2004  0.221              0.099         0.132           0.05              0.039
2005  0.221              0.099         0.132           0.05              0.039
2007  0.22               0.097         0.132           0.047             0.035
2008  0.22               0.097         0.132           0.047             0.035
2009  0.202              0.097         0.132           0.047             0.035
2010  0.202              0.097         0.128           0.047             0.03
2011  0.202              0.097         0.127           0.047             0.03
2012  0.202              0.097         0.127           0.047             0.03
2013  0.202              0.097         0.127           0.047             0.03
2014  0.202              0.097         0.127           0.047             0.03
2015  0.202              0.097         0.127           0.047             0.03
2017  0.202              0.096         0.127           0.045             0.03
2018  0.202              0.096         0.127           0.045             0.03
2019  0.202              0.096         0.127           0.045             0.03
2020  0.202              0.096         0.125           0.045             0.03
2021  0.202              0.096         0.125           0.045             0.03
Table 5 Change in area of lakes (2000–2021)

S. No.  Lake name           Change in area (km²)  Change in area (%)
1       Bellandur lake      0.1594                4.7249
2       Begur lake          0.0109                2.2149
3       Hulimavu lake       0.024                 5.56
4       Agara lake          0.0361                9.1
5       Sarakki lake        0.0592                21.097
6       Hoskerehalli lake   0.0192                8.6799
7       Arekere lake        0.0318                3.03
8       Gottigere lake      0.0049                5.303
9       Uttarahalli lake    0.0057                11.176
10      Puttenahalli lake   0.0089                22.8205
4 Conclusion Lakes are encroached due to unplanned urbanization. The disappearance and abandonment of lakes in urban areas have resulted in a slew of problems, including reduction of tank life span, water contamination, groundwater depletion, encroachments, and health dangers. In the present study, out of the ten lakes, an average of 8.15 acres per lake has been encroached; approximately 81.5 acres of the total area have been encroached or have shrunk due to the development of residential areas. It can be seen that an average of 9.37% of the lake area has been occupied. The encroached lakebeds have been used not only for infrastructure development but also for the construction of residential and government structures. Many cities have experienced a shortage of drinking water in recent years due to insufficient water resources. Rejuvenation of lakes has been carried out by the government to get back the encroached area; however, despite the attempts to restore a damaged lake to its natural state, the lack of ecological methods in the restoration has unfortunately left it in its current state. To avoid the encroachment of lakes, the area of the lakes needs to be measured regularly, and the lakes should be properly fenced to avoid any further encroachments. Dried lakes have to be monitored since they are more prone to encroachment. These restoration goals require intensive landscape planning, leadership, and funding, with active involvement not only from all levels of various organizations but also from every citizen living in the city. We will continue to lose lakes if sufficient and effective actions are not implemented, and we will be unable to maintain, preserve, and restore our lakes for future generations. Acknowledgements The authors thank the Director, CIIRC, Jyothy Institute of Technology, Bengaluru for lab facilities and access to the software. Also, the authors would like to thank the Principal, Jyothy Institute of Technology, Bengaluru for encouraging and supporting us.
References
1. Achutha VR, Najeeb KM, Jayaprakash HP, Rajarajan K (2002) Ground water information booklet Bangalore urban district, Karnataka. Government of India, Ministry of Water Resources, Central Ground Water Board, November 2002, pp 1–17. http://cgwb.gov.in/district_profile/karnataka/bangalore_urban_brochure.pdf
2. Thippaiah P (2009) Vanishing lakes: a study of Bangalore city
3. Ramachandra TV, Asulabha KS, Sincy V, Sudarshan PB, Aithal BH (2016) Wetlands: treasure of Bangalore. ENVIS technical report 101, January 2016. Energy & Wetlands Research Group, CES TE 15, Environmental Information System [ENVIS], Centre for Ecological Sciences, Indian Institute of Science, Bangalore 560012, India, p 106
4. Sekhon HK, Jain AK, Singh H (2016) Creation of inventory of water bodies in Hoshiarpur district using remote sensing and GIS. Int J Adv Remote Sens 4(1):32–41
5. Ramachandra TV, Vinay S, Mahapatra DM, Sincy V, Aithal BH (2016) Water situation in Bengaluru. ENVIS technical report, Environmental Information System, p 114
6. U.S. Geological Survey (2016) Landsat 8 data users handbook. NASA, p 97. https://landsat.usgs.gov/documents/Landsat8DataUsersHandbook.pdf
7. Bahadur (2018) NDVI, NDBI & NDWI calculation using Landsat 7
8. Cracknell AP, Hayes LBW (1991) Introduction to remote sensing. https://doi.org/10.1117/3.673407.ch1
Shoreline Change Detection and Coastal Erosion Monitoring: A Case Study in Kappil–Pesolikal Beach Region of the Malabar Coast, Kerala Sushma S. Bharadwaj and M. Geetha Priya
Abstract The geological borders that divide the seawater from the land region are known as shorelines. Erosion and accretion alter the shoreline, which has major ramifications for coastal habitats and settlements. Kerala’s coastline has been subjected to significant erosion and accretion in recent years, making it more susceptible to coastal alterations similar to those seen in other parts of the globe. A part of the Malabar coastline stretch from Kappil Beach to Pesolikal Beach has been chosen as the study area for the present study over the study period 2005–2020. Environmental System Research Institute’s (ESRI) software extension Digital Shoreline Analysis System (DSAS) along with Landsat 7/8 multispectral images was used to highlight the changes in the coastline location of the study region. The variations in coastline were statistically established using linear regression rate (LRR) and end point rate (EPR), and it was discovered that the Malabar coastline had suffered a severe reduction. Results suggested that the average accretion and erosion rates varied across regions of the study area, which were divided for accurate results. The work demonstrates the utility of DSAS as a research tool to study shoreline change and depicts the state of erosion along the Malabar coastline. Keywords Shoreline · Erosion · Accretion · DSAS · QGIS
1 Introduction Shorelines are dynamic aspects of coastal ecosystems that alter in response to wave energy, storm events, and changes in sea level and sediment supply [1]. Shoreline changes are primarily caused by natural processes such as erosion and accretion, which can occur as a result of a range of short- and long-term events [2]. Storms, S. S. Bharadwaj Department of ECE, Jyothy Institute of Technology, Bengaluru, India M. Geetha Priya (B) CIIRC, Jyothy Institute of Technology, Bengaluru, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 N. Subhashini et al. (eds.), Futuristic Communication and Network Technologies, Lecture Notes in Electrical Engineering 966, https://doi.org/10.1007/978-981-19-8338-2_24
waves, tides, and winds are all short-term occurrences, whereas glaciations and tectonic cycles, which can drastically affect sea levels, coastal land subsidence, or emergence, are long-term phenomena. Most coasts are inherently dynamic, and coastal erosion has become a growing problem as the economy has developed and threats have increased. Furthermore, detrimental environmental effects from anthropogenic activities such as sand mining, over-exploitation of groundwater, and land reclamation are all indirect factors of shoreline erosion. Shoreline exploitation is the primary focus of coastal zone management, which plays a crucial role in national development and environmental protection [2]. There is an estimated 504,000 km of shoreline on the planet, and more than half of the world's population lives within 100 km of the sea [3]. Generally, coastal cities have fought erosion by hardening the shoreline, which has included the construction of seawalls and breakwaters. The demand for erosion prevention by waterfront property owners and municipalities will grow as the human population along the coast continues to grow and sea-level rise accelerates. To effectively address the difficulties posed by coastal erosion risk and future sea-level change, relevant data must be collected and reliable methodologies should be used to demarcate the shoreline as well as precisely measure and reflect fluctuations in its position [4]. It is worth noting that, thanks to advancements in satellite imagery, image processing is currently the primary method for detecting shorelines [3]. The practice of gathering information about objects or places from afar, usually by aircraft or satellites, is known as remote sensing [5]. It is becoming more popular in coastal monitoring since it adds to the completeness of the radiometric data and enables automated or semi-automated shoreline extraction using image processing. Shoreline change detection analysis is a critical task with applications in a variety of domains, including setback designing, hazard management, erosion-accretion investigations, regional sediment budgets, and theoretical and predictive analysis of coastal morphodynamics [6]. There is a higher demand for the evaluation of shoreline detection systems as interest in beach monitoring and coastal erosion grows [3]. Coastal erosion has now become a key concern for a highly populous state like Kerala due to a severe lack of land, particularly due to the high population density in the coastal zone. The necessity of detecting and responding to changes in the shoreline cannot be overstated [7]. As a result, an attempt is undertaken in this study to generate a shoreline map and estimate shoreline change over 16 years using DSAS software in the Kappil–Pesolikal Beach region (part of the Malabar Coast) of Kerala.
2 Materials and Methods 2.1 Study Area Kerala is located between 8°17′30″N and 12°47′40″N in northern latitude and 74°27′47″E and 77°37′12″E in eastern longitude [8]. The state's coastline length
Fig. 1 Geographical location of the study area (a part of the Malabar Coast, Kerala)
is about 580 km, and its width ranges from 35 to 120 km. The study area chosen extends from Kappil Beach, near Varkala, to Pesolikal Beach, Kadinamkulam. Kappil is a tourist destination in Kerala's Thiruvananthapuram district. It is situated on the Arabian seashore in Edava Panchayat in Varkala Taluk, at 8°46′49″N latitude and 76°40′35″E longitude. Kadinamkulam is a panchayat situated in the northern suburb of Trivandrum City. It is about 20 km away from Varkala and is surrounded by Kadinamkulam Kayal (lake) to the east and the Arabian Sea on its west; its coordinates are 8°36′0″N and 76°49′0″E. Figure 1 shows the geographic mapping position of the study area (a part of the Malabar coast, Kerala).
2.2 Data Used The Malabar Coast’s long-term shoreline change assessment is studied for 16 years, between 2005 and 2020. The evaluation of shoreline change is based on a comparison of four shorelines derived from different periods of satellite imagery. The multispectral satellite cloud-free data of Landsat 7 Enhanced Thematic Mapper Plus (ETM+) and Landsat 8 Operational Land Imager (OLI), from the path and row number 144
Table 1 List of satellite data used

S. No.  Satellite                      Scene ID                                   Date of acquisition
1       Landsat 7 ETM+ C1 Level-1      LE07_L1TP_144054_20050314_20170117_01_T1   14-03-2005
2       Landsat 7 ETM+ C1 Level-1      LE07_L1TP_144054_20100107_20161216_01_T1   07-01-2010
3       Landsat 7 ETM+ C1 Level-1      LE07_L1TP_144054_20150222_20161029_01_T1   22-02-2015
4       Landsat 8 OLI/TIRS C1 Level-1  LC08_L1TP_144054_20200212_20200225_01_T1   12-02-2020
and 54, respectively, were downloaded from USGS (https://earthexplorer.usgs.gov/) website for the study area for the years 2005, 2010, 2015, and 2020. The red band of 30 m resolution was used for computation purposes [9]. The data utilized in this study are listed in Table 1.
2.3 Methodology The chosen study area falls under UTM Zone 43N, WGS 84. The downloaded Landsat 7 and 8 images for the years 2005, 2010, 2015, and 2020 were subjected to radiometric corrections (digital number to reflectance) using the semi-automatic classification plug-in (SCP), which is an open-source plugin for Quantum Geographic Information System (QGIS) [10]. To eliminate any errors in analyzing shoreline change, the satellite imagery utilized in this study was first mathematically corrected to match the image position with its geographic position and was then digitized manually [11]. Figure 2 shows the process used to determine the shoreline change. To determine the shoreline changes in DSAS, the following steps were followed. (a) A personal geodatabase was created in which two new feature classes were added—shoreline and baseline. Various field names were added for both the shoreline and baseline feature classes, such as object ID (a unique number assigned to each transect), shape, ID, shape length, date (original survey year), and uncertainty. (b) After the shorelines were digitized manually for all the years, a baseline was built which acted as a buffer (reference) for the transects. Histogram images were used as it was easy to differentiate the pixels. (c) To cast the transects, the buffer was created with a maximum search distance of 1000 m and with a transect spacing of 100 m for accurate analysis. Then, two statistical change measures were calculated, namely the end point rate (EPR) and the linear regression rate (LRR). The EPR is computed by dividing the distance
Fig. 2 Flowchart to determine the shoreline change: Landsat 7 ETM+ and Landsat 8 OLI data → radiometric corrections → georeferencing and digitization of the study area → shoreline digitization (DSAS) → baseline and transect establishment → calculation of statistical change measures (EPR and LRR) → shoreline change statistical map
between the data set's oldest and youngest shorelines by the time difference between them. The shoreline shift is estimated using the LRR method by fitting a least-squares regression line to all shoreline positions for a specific transect [12]. Thus, the technique began with the establishment of a baseline in the general direction of the shoreline, followed by the establishment of transects perpendicular
to the baseline and lastly, the calculation of the distance between shorelines along various transects.
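To make the two statistics concrete, the sketch below computes EPR and LRR for a single transect from hypothetical shoreline-to-baseline distances; the sample years and distances are made up for illustration and do not come from the study.

```python
# Minimal sketch: EPR and LRR for one transect (sample years/distances are made up).
import numpy as np

years = np.array([2005, 2010, 2015, 2020], dtype=float)
dist_m = np.array([152.0, 139.0, 131.0, 118.0])   # shoreline distance from the baseline (m)

# End point rate: displacement between the oldest and youngest shorelines over elapsed time.
epr = (dist_m[-1] - dist_m[0]) / (years[-1] - years[0])

# Linear regression rate: slope of the least-squares line fitted to all shoreline positions.
lrr = np.polyfit(years, dist_m, 1)[0]

print(f"EPR = {epr:.2f} m/year, LRR = {lrr:.2f} m/year")   # negative -> landward (erosion)
```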
3 Results Along the 28 km length of the Kappil Beach–Pesolikal Beach stretch, DSAS generated 259 transects that were aligned perpendicular to the baseline and spaced at 100 m intervals. DSAS was applied to compute the variation in shoreline rates using two out of six alternative statistical techniques—end point rate (EPR) and linear regression rate (LRR), where negative and positive values of EPR and LRR indicate landward and seaward movement, respectively. The stretch of Muthalapozhi–Pesolikal Beach is under accretion, whereas the region of Kappil–Varkala Beach is under erosion. The study area was divided into three regions for accurate analysis as shown in Fig. 3. The stretch of Kappil Beach to Varkala Beach, from transects 1 to 52, was labeled as Region A, while Region B was taken as Varkala Beach to Muthalapozhi Beach, from transects 52 to 202. The coastal line of Muthalapozhi Beach–Pesolikal Beach, from transects 203 to 259, was labeled as Region C. Figures 4 and 5 show the end point rate and linear regression rate, and Table 2 shows the shoreline change across the different regions of the study area. The erosion and accretion per year for each region were calculated. Region A, that is, the stretch of Kappil Beach–Varkala Beach, did not show any accretion, while there was an erosion of −5.6607 m/year and −5.3621 m/year with respect to the end point rate and the linear regression rate, respectively. Similarly, there was more erosion than accretion in the coastal line of Varkala Beach–Muthalapozhi Beach, which is Region B; the accretion was 2.535 m/year for EPR and 2.438 m/year for LRR, whereas the average erosion rate was about −4.4625 m/year (EPR) and −3.744 m/year (LRR). In Region C, that is, Muthalapozhi Beach–Pesolikal Beach, there was more accretion, of 4.5482 m/year (EPR) and 4.955 m/year (LRR). The main reasons for the shoreline alteration in the Kappil–Pesolikal Beach region over the last 16 years, according to the above analysis, are both man-made development and natural factors. The shorelines are shaped by natural processes such as geomorphology and geology, the joint action of oceanic currents and waves, storms, tectonic plates, and sea-level fluctuation. Several coastal landforms, such as bays, headlands, estuaries, mudflats, and beaches, were involved in the shoreline changes along the study area. Variations in sea level can also result in shoreline erosion or accretion. One of the causes of erosion is the lack of sediments on the coast due to natural factors. In today's world, human and anthropogenic activities have a significant impact on coastal variations. The principal cause of accretion is sand deposition on the seashore. The tides, wind, wave movement, and longshore current, as well as wind speed, all had a role in the sand deposition; the direction of the wind and the motion of the waves are crucial factors. Human activities like sand mining directly from beaches or nearby regions, which resulted in an overall sand loss for a brief time, or topographical changes in closer
Fig. 3 Shoreline change for the years 2005, 2010, 2015, and 2020
areas, which contributed to high wave energy on the beach and sand wash off, all contributed to more erosion in Region A.
Fig. 4 End point rate (EPR) along the transects of the study area, with regions A, B, and C marked
Fig. 5 Linear regression rate (LRR) along the transects of the study area, with regions A, B, and C marked
4 Conclusion Using DSAS, the regions of accretion and erosion along the Malabar Coast were studied. Over the last 16 years, the shoreline in the study region has changed dramatically. Due to accretion, the highest change in shoreline occurred in the coastal stretch of Muthalapozhi Beach to Pesolikal Beach, while a massive amount of erosion has taken place in the stretch of Kappil Beach to Varkala Beach. It is also observed that shoreline modifications along the curved sections had an uneven pattern, whereas straight stretches are less prone to shoreline alterations. Therefore, a similar kind
Table 2 Shoreline change for different regions (Region A: Kappil Beach to Varkala Beach, Region B: Varkala Beach to Muthalapozhi Beach, Region C: Muthalapozhi Beach to Pesolikal Beach)

Transects: A 1–51; B 52–202; C 203–259
Number of transects: A 50; B 150; C 56
Baseline distance from shoreline (m): A 100; B 100; C 100
Average accretion (m/year): A 0 (EPR), 0 (LRR); B 2.535 (EPR), 2.438 (LRR); C 4.5482 (EPR), 4.955 (LRR)
Average erosion (m/year): A −5.6607 (EPR), −5.3621 (LRR); B −4.4625 (EPR), −3.744 (LRR); C −1.6175 (EPR), −0.9 (LRR)
Maximum accretion (m/year): A 0 (EPR), 0 (LRR); B 5.33 (EPR), 12.12 (LRR) at transect 200; C 9.85 (EPR), 17.77 (LRR) at transects 216, 203
Maximum erosion (m/year): A −9.78 (EPR), −9.34 (LRR) at transect 12; B −12.57 (EPR), −12.66 (LRR) at transect 183; C −2.41 (EPR), −0.9 (LRR) at transects 243, 242
of detection needs to be carried out for other coastal regions that are prone to severe shoreline changes, for better understanding and future directions. Acknowledgements The authors thank the Director of the Centre for Incubation, Innovation, Research and Consultancy (CIIRC), Jyothy Institute of Technology, Bengaluru, for lab facilities and access to the software. Also, the authors would like to thank the Principal of Jyothy Institute of Technology, Bengaluru for the encouragement and support.
References
1. Salvador E, Rica C (2008) Climate change impacts on coastal ecosystems. Environment, 36–39
2. Amrutha TK, Reeba T (2021) Shoreline-change detection at Kodungallur–Chettuva region in Kerala state. IOP Conf Ser Mater Sci Eng 1114(1):012026. https://doi.org/10.1088/1757-899x/1114/1/012026
3. Toure S, Diop O, Kpalma K, Maiga AS (2019) Shoreline detection using optical remote sensing: a review. ISPRS Int J Geo-Inf 8(2). https://doi.org/10.3390/ijgi8020075
4. Nicholls RJ, Wong PP, Burkett VR, Codignotto JO, Hay JE, McLean RF, Ragoonaden S, Woodroffe CD (2007) Coastal systems and low-lying areas, pp 315–356
5. Dyring E (1973) Principles of remote sensing. Ambio 11(3):57–69. https://doi.org/10.4324/9780203714522-9
6. Paravolidakis V, Ragia L, Moirogiorgou K, Zervakis ME (2018) Automatic coastline extraction using edge detection and optimization procedures. Geosciences (Switzerland) 8(11). https://doi.org/10.3390/geosciences8110407
7. Temiz F, Durduran SS (2016) Monitoring coastline change using remote sensing and GIS technology: a case study of Acigöl Lake, Turkey. IOP Conf Ser Earth Environ Sci 44(4). https://doi.org/10.1088/1755-1315/44/4/042033
8. Vanaja C, Pk S (2020) Mosquito-borne diseases in Kerala, India: an update. Int J Mosquito Res 7(4):45–48
9. Rahaman KR, Hassan QK, Ahmed MR (2017) Pan-sharpening of Landsat-8 images and its application in calculating vegetation greenness and canopy water contents. ISPRS Int J Geo-Inf 6(6). https://doi.org/10.3390/ijgi6060168
10. Mondal I, Thakur S, Juliev M, Bandyopadhyay J, De TK (2020) Spatio-temporal modelling of shoreline migration in Sagar Island, West Bengal, India. J Coastal Conserv 24(4):1–20. https://doi.org/10.1007/s11852-020-00768-2
11. Joesidawati MI, Suntoyo (2016) Shoreline change in Tuban district, East Java using geospatial and Digital Shoreline Analysis System (DSAS) techniques. Int J Oceans Oceanogr 10(2):235–246
12. Oyedotun TDT (2014) Shoreline geometry: DSAS as a tool for historical trend analysis. Geomorphol Tech (Online Ed) 2(October):1–12
A Novel Approach with Hybrid Technique for Monitoring and Leakage Detection of Water Pipeline Using IoT D. Mahesh Kumar , BA. Anandh , A. Shankar Ganesh , and R. Sakthivel
Abstract A smart water pipeline monitoring system is discussed in this work to control water leakages. Water usage is increasing in daily life, and with it the amount of water wasted. As a result, a smart monitoring system based on the Internet of Things (IoT) has been devised and presented as a solution. The applications and benefits of IoT are limitless in today's world, and a wide range of sensors for measuring water flow is available in the market. The flow of water is monitored using a flow sensor in the pipe, and the impurity present in the water is detected by a turbidity sensor. The NodeMCU microcontroller, which is one of the most widely used microcontrollers for IoT applications, was employed in this system. The data from the turbidity and water flow sensors are transferred to a cloud server. Since the ThingSpeak cloud server is open and free to use, it was chosen to store the data in the cloud. The data obtained from the flow sensor are shown in the ThingSpeak cloud; as a result, it becomes very easy to monitor the water flow in the pipeline. Keywords Smart monitoring system · Cloud server · Flow sensor · IoT
1 Introduction Water can be considered a renewable asset when used appropriately. If we discard it recklessly, it will become a non-renewable resource on that site. Due to the expanding inhabitants, fast industrialisation, and increasing lifestyle standards, the need for water has increased throughout time. In an attempt to gather water, dams, reservoirs, as well as groundwater infrastructure such as wells have been constructed. Wastewater reuse and desalination are other solutions, but the costs are prohibitive. Three decades from now, 1/3rd of the people may get affected by water shortage. One of the techniques to decrease pipeline leaking is to build an underground water pipeline surveillance system. The substance of the pipelines is among the most important variables producing leaks within the water pipeline, and the ageing of the pipelines D. Mahesh Kumar (B) · BA. Anandh · A. Shankar Ganesh · R. Sakthivel Department of Electronics, PSG College of Arts & Science, Coimbatore, India e-mail: [email protected]; [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 N. Subhashini et al. (eds.), Futuristic Communication and Network Technologies, Lecture Notes in Electrical Engineering 966, https://doi.org/10.1007/978-981-19-8338-2_25
employed is another critical aspect. Water loss in the network is predicted to reach 50% due to the ageing of pipes. Pipes can also be damaged by digging by other utilities such as communication, gas, and electricity companies. It could take several days for the relevant people to locate the source of a leak and fix the leaking pipe, and a considerable volume of water would be lost during the delay. The main objective must therefore be to automate leak detection by utilizing sensor-based techniques that find leakages and track where they happen, allowing leaks to be repaired or prevented promptly, leading to limited water loss [1]. Tracking underground pipes is simple with the help of flow sensors and IoT. The flow rate is supervised using a flow sensor, and the amount of water flowing through the pipes is also measured. The volume of water and the flow rate inside the pipe are measured and transferred to the cloud via IoT for further analysis. IoT is utilized to take the communication system to the next stage by simultaneously connecting several objects to the Internet, allowing machine-to-machine and man-machine interactions [2]. Devices or sensors gather data from the outside world and transmit it to the cloud. Sensors can communicate with the cloud via the device utilizing Wi-Fi, cellular networks, and other networking methodologies. Selecting a good communication method for an IoT system is difficult due to the availability of various wireless systems [3]. Data collected from the devices are stored in the cloud for further handling by the users, based on the different applications. Such processing can be done in a variety of ways, such as sending emails or setting alerts on mobile phones. A customer will have access to an app that allows the IoT device to be managed remotely. Aya Ayadi et al. propose that the hardware and software technologies used for water pipeline monitoring will require modern algorithms to monitor security management, target movement, and leakage detection; they also stress maintaining the conservation of energy and the accuracy of communication with the hardware and software methods used in WSNs [1]. Angel Vergina et al. recommend water quality monitoring structures that use Internet of Things devices together with machine learning and fuzzy systems on dynamic data to screen water parameters; furthermore, a machine learning random forest method has been presented to improve the efficiency and accuracy of the water screening framework [4]. Rahmat et al. used an Arduino UNO and proposed employing kinematic physics and fluid mechanics to detect the leaking site of a water pipeline by harnessing water flow rate data gathered using a liquid flow meter sensor [5].
2 Proposed System Non-acoustic leakage detection using flow sensors is implemented in this system. The flow rate and the volume of water flowing in the pipe are determined by the YF-S201 water flow sensor. When leakage occurs in the pipe, the cleanliness of the water is affected by the mixture of soil with the water. As a result, it is essential to keep an eye on the water's quality.
So, a turbidity sensor is also used to inspect the water passing through the pipe for impurities, i.e. the purity of the water. IoT technology is used for real-time updates, wherein the microcontroller transmits the data to the cloud for subsequent operations. The block diagram of the proposed system is shown in Fig. 1. NodeMCU is a cost-effective microcontroller with a Wi-Fi module. The D7 and D8 I/O pins are connected to YF-S201 water flow sensors. This sensor has a rotor that sits in line with the water pipeline, with a pinwheel sensor to detect the quantity of fluid that has passed through the pipeline. An inbuilt magnetic Hall effect sensor generates an electrical pulse with each spin and is insulated from the water, keeping it dry and safe. Whenever water flows through the flow sensor, the rotor spins and pulses are generated at the output. These pulses are read as interrupts by the microcontroller, and the water flow can be determined by counting the output pulses from the sensor. Every pulse corresponds to about 2.25 mL. The purity of the water is monitored using a turbidity sensor, which has a light transmitter and receiver. The light received by the receiver changes depending on the particles found in the water. The sensor has both digital and analogue outputs; because of its higher accuracy, the analogue output is employed and is connected to the analogue read pin. The open and free ThingSpeak cloud server is used in this system. Data from the turbidity and flow sensors are sent to the cloud and can be viewed in the ThingSpeak web server. The data can be transferred to an Excel spreadsheet or given to MATLAB for analysis.
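As a concrete illustration of the ThingSpeak update step described above, the following is a minimal sketch using ThingSpeak's REST update endpoint. The write API key and the mapping of flow rate and turbidity voltage to field1/field2 are placeholders; on the actual NodeMCU the equivalent HTTP request would be issued from the firmware rather than from a Python script.

```python
import requests  # host-side illustration; the NodeMCU firmware makes the same HTTP call

THINGSPEAK_WRITE_KEY = "XXXXXXXXXXXXXXXX"  # placeholder write API key for the channel


def push_to_thingspeak(flow_rate_lpm, turbidity_voltage):
    """Send one sample to ThingSpeak via its REST update endpoint.

    field1 carries the flow rate (L/min) and field2 the turbidity sensor
    voltage; this field mapping is an assumption for illustration only.
    """
    resp = requests.get(
        "https://api.thingspeak.com/update",
        params={
            "api_key": THINGSPEAK_WRITE_KEY,
            "field1": flow_rate_lpm,
            "field2": turbidity_voltage,
        },
        timeout=10,
    )
    # ThingSpeak returns the entry ID of the new sample, or 0 on failure
    return int(resp.text)


# Example: push_to_thingspeak(12.4, 1.8)
```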
Fig. 1 Block diagram of the system
3 Working Procedure The turbidity sensor works based on the amount of light scattered through the water, and the corresponding analogue voltage is produced by the sensor. The output of the sensor is less than 2 V (NTU: 0–20) for pure water and about 4.8 V (NTU: 2500–3000) for contaminated water. The output from the turbidity sensor is stored in the ThingSpeak cloud server for analysis. The goal of utilizing a turbidity sensor is mainly to monitor contamination: when there is a leak in the pipe and soil gets into the pipeline, the water cannot be used for drinking. Even if the flow sensor fails to identify a leakage in the pipe, it can still be detected by the turbidity sensor. When the flow of water is heavy, the rotor rotates quickly, and when the water flow is slow, the rotor rotates slowly [5, 6]. Depending on the rotational speed of the rotor, TTL pulses of constant voltage are produced at varying rates. The active flow range is between one and thirty litres per minute, and about 450 pulses are created per litre. Since the flow sensor produces a digital output, it can be directly connected to the NodeMCU, which reads the pulses digitally from the time interval between them. The flow rate in litres per minute is determined by dividing the pulse frequency by 7.5, the flow per second is obtained by dividing the flow rate by 60, and the total amount of water that has passed through the pipe is obtained by accumulating this volume over time [4, 7]. The flow sensor is installed in the receiving line as well as the distribution region of the water source. The volume of water that flows through the flow sensor is sent to the ThingSpeak cloud, from where the data can be used for analysis. If there is any deviation in the volume of water from the distribution region to the receiving area, then there is a leak in the water pipe. The D8 (GPIO 15) pin is attached to the flow sensor of the main water source, the flow sensor of street A is connected to D7 (GPIO 13), and the flow sensor of street B is connected to the D6 (GPIO 12) pin. When the system is powered up, the NodeMCU switches on, establishes the connection with the available Wi-Fi network, and the data are updated in ThingSpeak. Each distribution path is installed with a flow sensor, and the water flow should be kept moderate since the flow sensor measures up to 30 L/min [8, 9]. Figure 2 shows the entire system flow, sketched in code below: first, the data are read through the flow sensors and sent to the cloud; then, the data in the cloud are analysed based on the distributed and received volumes of water. The received volume is compared with the distributed volume, and if the variation is beyond the threshold value, an alarm signal is raised to the user with the help of an app.
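The pulse-to-flow arithmetic and the leak check described above can be sketched as follows. The constant 7.5 (pulse frequency in Hz divided by 7.5 gives L/min) and the roughly 2.25 mL per pulse follow from the YF-S201 description in the text; the leak threshold is an assumed value, and on the real system this logic runs as interrupt-driven NodeMCU firmware.

```python
PULSES_PER_LPM = 7.5      # YF-S201: pulse frequency (Hz) / 7.5 = flow rate in L/min
LEAK_THRESHOLD_L = 2.0    # assumed tolerance before a discrepancy is treated as a leak


def flow_from_pulses(pulse_count, interval_s=1.0):
    """Convert pulses counted over `interval_s` seconds to flow rate and volume."""
    frequency_hz = pulse_count / interval_s
    flow_lpm = frequency_hz / PULSES_PER_LPM      # litres per minute
    volume_l = flow_lpm / 60.0 * interval_s       # litres that passed in this interval
    return flow_lpm, volume_l


def leak_detected(distributed_volume_l, received_volume_l, threshold=LEAK_THRESHOLD_L):
    """Compare the volume measured at the source with the total measured at the streets."""
    return (distributed_volume_l - received_volume_l) > threshold


# Example: 450 pulses in one second -> 60 L/min, i.e. 1 L in that second
print(flow_from_pulses(450))
print(leak_detected(150.0, 134.0))  # 16 L unaccounted for -> True
```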
4 Result and Discussion Data of each flow sensor from the ThingSpeak server are shown in Fig. 3. It indicates the amount of water flowing through the pipeline. The data are sent to the ThingSpeak
cloud every second/minute from the microcontroller.

Fig. 2 Flowchart for leak detection

In the ThingSpeak cloud, the data analysis is done by monitoring the amount of water distributed to the specified area, the consumption of water that occurred in that area, and also the water leakage that occurred in it. MATLAB processing available in ThingSpeak can also be utilized for analysing the data [10]. The differences between the distributed water and the received water can be analysed. When any discrepancy is detected, a notification can be generated by IFTTT, Twilio can be used to make calls, and Prowl can be used to send push notifications [11]. In the water distribution pipe, water flows at a rate of 1 L/s and is distributed to street A and street B. Figure 4 shows the water distribution without leakage through street A and street B. The graph shows that the total amount of water transmitted/released is around 150 L and the amounts of water received at street A and street B are 74 L and 76 L, respectively. So, the total amount of water received at the end of streets A and B is 150 L. Figure 5 shows the data comparison of the water consumed with leakage. Here, also, the total amount of water transmitted/released is around 150 L. On the receiving side, a small amount of water leakage occurred in street A and also in street B from the fifth day onwards. Finally, at the end of street A and street B, the total amount of water received is around 134 L, so there is a loss of 16 L of water on the receiver side. These figures also show the time and number of consumption readings in each street, which gives a clear idea about the total volume of water distributed and the amount of water consumed [12, 13].
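A minimal sketch of the cloud-side comparison described above is given below, reading recent entries back from ThingSpeak and checking the distributed volume against the received volume. The channel ID, read key, field mapping, and threshold are illustrative assumptions, not values from the paper.

```python
import requests

CHANNEL_ID = 123456                  # placeholder channel ID
READ_KEY = "YYYYYYYYYYYYYYYY"        # placeholder read API key


def unaccounted_volume(results=100):
    """Fetch recent feed entries and compare the source volume with street A + street B.

    Assumes field1 = water out (L), field2 = street A (L), field3 = street B (L);
    the field assignment is illustrative only.
    """
    url = f"https://api.thingspeak.com/channels/{CHANNEL_ID}/feeds.json"
    feeds = requests.get(url, params={"api_key": READ_KEY, "results": results},
                         timeout=10).json()["feeds"]
    out = sum(float(f["field1"] or 0) for f in feeds)
    received = sum(float(f["field2"] or 0) + float(f["field3"] or 0) for f in feeds)
    return out - received


loss = unaccounted_volume()
if loss > 2.0:   # assumed alarm threshold in litres
    print(f"Possible leak: {loss:.1f} L unaccounted for")  # e.g. 150 - 134 = 16 L as in Fig. 5
```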
Fig. 3 ThingSpeak webserver output
Fig. 4 Water distribution data comparison without leakage (bar chart of amount of water in litres against date of data reading, for water out (L), street A, and street B)
Fig. 5 Water distribution data comparison with leakage (bar chart of amount of water in litres against date of data reading, for water out (L), street A, and street B)

5 Conclusion Water pipelines are widely used over long distances, which makes them expensive and vulnerable in certain tough conditions. Pipeline leaks can result in economic and physical losses. Water pipeline infrastructure monitoring and leak detection have become major challenges in recent years. This system provides very good service to end-users and focuses mainly on water pipeline monitoring methods. The use of IoT helps in monitoring the flow of water through pipelines in any location. As a result, this system has a very good response time for monitoring and detecting leaks in the water pipeline. In future work, an advanced algorithm can be proposed to prevent the leakage from occurring in the water pipeline so that the water can be saved.
References
1. Ayadi A, Ghorbel O, BenSalah MS, Abid M (2019) A framework of monitoring water pipeline techniques based on sensors technologies. J King Saud Univ Comput Inf Sci. https://doi.org/10.1016/j.jksuci.2019.12.003
2. Karray F, Garcia-Ortiz A, Jmal MW, Obeid AM, Abid M (2016) EARNPIPE: a testbed for smart water pipeline monitoring using wireless sensor network. Procedia Comput Sci 96:285–294. https://doi.org/10.1016/j.procs.2016.08.141
3. Karray F, Triki M, Jmal MW, Abid M, Obeid AM (2018) An IoT-based wireless sensor network for water pipeline monitoring. Int J Electr Comput Eng 8(5):3250–3258. https://doi.org/10.11591/ijece.v8i5.pp3250-3258
4. Angel Vergina S, Bhavadharini RM, Kavalvizhi S, Kalpana Devi S (2020) An underground pipeline water quality monitoring using IoT devices. Eur J Mol Clin Med 7(8):2046–2054
5. Rahmat RF, Satria IS, Siregar B, Budiarto R (2017) Water pipeline monitoring and leak detection using flow liquid meter sensor. IOP Conf Ser Mater Sci Eng 190:012036. https://doi.org/10.1088/1757-899X/190/1/012036
6. Vijayan A, Narwade R, Nagarajan K (2019) Real-time water leakage monitoring system using IoT based architecture. Int J Res Eng Appl Manage 5(8):24–30
7. Sugantha Lakshmi R, Chandra Praba G, Abhirami K (2021) Automated water management and leakage detection system using IoT. Int J Eng Res Technol 9(5):401–403
8. Abdelhafidh M, Fourati M, Fourati LC, Mnaouer AB, Zid M (2018) Cognitive internet of things for smart water pipeline monitoring system. In: 2018 IEEE/ACM 22nd international symposium on distributed simulation and real-time applications (DS-RT), pp 1–8. IEEE, Madrid, Spain. https://doi.org/10.1109/DISTRA.2018.8600999
9. Shinde JP, Sapre S (2019) IoT based detection and control of leakage in water pipeline with a portable kit. J Emerg Technol Innov Res 6(4):559–562
10. Boniel GJM, Catarinen CC, Nanong RDO, Noval JPC, Labrador CJM, Cañada JR (2020) Water management system through wireless sensor network with mobile application. AIP Conf Proc 2278(1). https://doi.org/10.1063/5.0026155
11. Aba EN, Olugboji OA, Nasir A, Olutoye MA, Adedipe O (2021) Petroleum pipeline monitoring using an internet of things (IoT) platform. SN Appl Sci 3(180). https://doi.org/10.1007/s42452-021-04225-z
12. Abdelhafidh M, Fourati M, Fourati LC, Abidi A (2017) Remote water pipeline monitoring system IoT-based architecture for new industrial era 4.0. In: 2017 IEEE/ACS 14th international conference on computer systems and applications, pp 1184–1191, Hammamet, Tunisia. https://doi.org/10.1109/AICCSA.2017.158
13. Ahmed S, Le Mouël F, Stouls N (2020) Resilient IoT-based monitoring system for crude oil pipelines. In: Proceedings of the 7th international conference on internet of things: systems, management and security, Paris, France
VGG-16 Architecture for MRI Brain Tumor Image Classification N. Veni and J. Manjula
Abstract Brain cancer has the lowest survival rate among all types of cancer. Distinct kinds of brain cancer exist, based on size, shape, structure, and position. Magnetic resonance imaging (MRI) is the traditional technology for diagnosing various brain diseases, but the manual identification of brain abnormalities is time-consuming and challenging. In recent times, deep learning (DL) models have become increasingly popular in medical image analysis. In classifying the brain tumor images, features extracted using the transfer learning (TL) technique are fed to a convolutional neural network (CNN) model with various network layers. The initial stage is the preprocessing of sample images, followed by filtering using the pooling layers, feature extraction using the convolutional layers, and finally, classification using the FC layer of the architecture model. The MRI images from the REpository of Molecular BRAin Neoplasia DaTa (REMBRANDT) database are applied to pre-trained model architectures such as Visual Geometry Group (VGG) VGG-19, VGG-16, Inception-V3, Inception-V2, Residual Network (ResNet) ResNet-18, and ResNet-50. The results of the experiments present a comparative study of the accuracy, precision, recall, and F1-score performance metrics, with the VGG-16 approach achieving the best performance with little processing resources and less complexity. Keywords Magnetic resonance imaging · Deep learning · Residual network · Visual geometry group · Inception
N. Veni (B) · J. Manjula Department of Electronics and Communication Engineering, SRM Institute of Science and Technology, Chennai, Tamil Nadu, India e-mail: [email protected] J. Manjula e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 N. Subhashini et al. (eds.), Futuristic Communication and Network Technologies, Lecture Notes in Electrical Engineering 966, https://doi.org/10.1007/978-981-19-8338-2_26
1 Introduction A brain tumor is an abnormal or improper growth of brain cells that can cause serious harm to the central nervous system. It is tough and time-consuming to solve the problem of brain tumor classification using only standard medical image processing, and medical evidence shows that human-assisted classification can result in improper prediction and diagnosis. An accurate cancer-type diagnosis allows doctors to provide the finest treatment for the patient and preserve their life. The minute alterations in MRI scans are difficult to notice manually; the variation and resemblance of tumor and normal tissues are the main reasons for this. In this work, MRI brain images from the REpository of Molecular BRAin Neoplasia DaTa (REMBRANDT) database are utilized. Hence, choosing the right features and classifiers achieves the best outcome. An automated MRI brain image classification and tumor image detection using TL is presented in [1]. A DL model is developed using a TL approach for the brain tumor classification of MRI images, and many CNN designs are investigated by this system, including ResNet, Inception, and VGG networks. The use of DL and TL methods to diagnose and classify brain tumors using MRI brain images has proved to be a promising methodology [2]. The use of deep neural networks (DNN) in DL methods helps to categorize MRI brain images without the need for a manual feature extraction procedure. CNN is one of the most important deep learning approaches for dealing with difficult issues in a variety of applications. Convolutional network layers with a collection of filters, max-pooling layers, and fully connected (FC) layers are some of the sophisticated features used in CNN architectures [3]. The DL method is used to cluster the features and to identify and classify the brain images based on the stored database features to detect brain tumors. The detection and classification of brain tumors using the DL method are discussed in [4]: initially, the tumor image region is segmented, and then the necessary features are extracted by a pre-trained CNN using stochastic gradient descent. Applying a fine-tuning method [5] and optimizing it with a pre-trained VGG-19 CNN model is suggested for classifying multigrade tumors. In [6], a multinomial logistic regression algorithm is used to classify pituitary adenoma tumors. MRI image classification with a CNN framework is discussed in [7], which classifies abnormal brain images into minimum and maximum grade levels by modifying the ResNet CNN model. A pre-trained NN model is utilized to categorize different types of brain tumors [8]. An image processing approach, when coupled with a trained CNN model, can help identify different kinds of brain tumors [9]. MRI brain images are passed through a pre-trained CNN model with the TL method to extract deep features for image classification analyses [10]. Brain tumor classification is supported by three distinct pre-trained CNN models (VGGNet, ResNet, and GoogleNet); during this TL approach, VGG-16 modifies the pre-trained system architectures [11]. The unavailability of labeled data is one of the major obstacles to the penetration of DL in medical health care; as the recent development of DL applications in other fields has shown, the bigger the data, the better the accuracy rate [12].
Organization of the research paper: The rest of this work is organized as follows: The various deep learning models are addressed in Sect. 2. Section 3 discusses the experimental results of various architectures. This research article is concluded in the final section.
2 Methods and Materials The sample input MR brain images obtained from the REMBRANDT [13] dataset are shown in Fig. 1, and Fig. 2 illustrates the CNN model's architecture for brain tumor categorization. In this system, the obtained MR brain image is given as input to the chosen architecture model. Deep features extracted from the brain images are fed to the designed CNN model with pre-trained features. Inception-v2, Inception-v3, ResNet-18 [14], ResNet-50, VGG-19, and VGG-16 are the six pre-trained CNN models employed in this study. Then, in the FC softmax layers of the convolutional network, the collected features from the pre-trained models are classified using DL techniques [15]. In a typical DL method, the collected deep features are fed to convolutional layers comprising max-pooling layered networks with an FC layer, as illustrated in Fig. 2, to forecast the output as a tumor or normal brain image classification.
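A minimal Keras sketch of the generic pipeline of Fig. 2 is given below, assuming 64 × 64 single-channel inputs and a two-class softmax output; the 128/256/512 filter counts follow the blocks shown in the figure, while the remaining hyperparameters are illustrative.

```python
from tensorflow.keras import layers, models


def build_cnn(input_shape=(64, 64, 1), num_classes=2):
    """Convolution + pooling feature extractor followed by an FC softmax classifier."""
    model = models.Sequential([
        layers.Conv2D(128, (3, 3), activation="relu", padding="same",
                      input_shape=input_shape),
        layers.MaxPooling2D((2, 2)),
        layers.Conv2D(256, (3, 3), activation="relu", padding="same"),
        layers.MaxPooling2D((2, 2)),
        layers.Conv2D(512, (3, 3), activation="relu", padding="same"),
        layers.MaxPooling2D((2, 2)),
        layers.Flatten(),
        layers.Dense(128, activation="relu"),
        layers.Dense(num_classes, activation="softmax"),  # tumor vs normal
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```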
2.1 Pre-trained CNN Models for Deep Feature Extraction In general, CNN performs better in larger datasets than the smaller ones. When it is not possible to produce a big training dataset, TL can be utilized. Figure 1 shows how
Fig. 1 Sample MRI brain images showing normal and abnormal cases (REMBRANDT dataset)
Fig. 2 CNN model architecture for MRI brain image classification (64 × 64 input image → convolution and pooling layers with 128, 256, and 512 filters → flatten → FC layer → tumor/normal output)
TL may be utilized as a feature extractor for a variety of tasks with a relatively limited dataset, such as brain MRI images. TL has been effectively utilized in a variety of fields in recent years, including medical image processing and analysis for cancer classification. CNN is a type of DNN that uses convolutional layers to filter inputs for particular information. The output of neurons connected to specific areas of the input is computed by the convolutional filters in the CNN's convolutional layers, which helps extract spatial and temporal information from images. The total number of feature parameters in model architectures with convolutional layers is reduced using a weight-sharing mechanism. A CNN model involves three major components:
• A convolutional layer is used to learn spatial and temporal features.
• A sub-sampling or max-pooling layer lowers the dimensionality of an image.
• An FC layer aids in classifying the MR brain images into different classes.
The architectures of the several CNN models are described as follows.
2.1.1 VGGNet (VGG-16 and VGG-19) Architecture
In image recognition, the depth of the convolutional network layers affects the accuracy. Very small (3 × 3) convolution filters were used to push the depth from eleven to nineteen weighted layers in developing VGGNet; the VGG-16 and VGG-19 configurations employ 16 and 19 weighted layers, respectively. Initially, the MRI brain image is given as input to both the VGG-16 and VGG-19 architecture models, and the deep features are extracted from the convolutional and max-pooling layers of the network. With increasing depth, the classification error reduces until it reaches 19 layers, where it becomes saturated. The significance of depth
in graphical representations is also validated by the researchers. The VGGNet architecture model with convolutional layers, max-pooling layers, and a softmax layer, along with the input brain image, is depicted in Fig. 3.

Fig. 3 VGG-16 and VGG-19 network architecture (stacks of 3 × 3 convolution layers with 64, 128, 256, and 512 filters, interleaved 2 × 2 max-pooling layers, followed by FC 4096, FC 4096, FC 1000, and softmax layers)
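A minimal transfer-learning sketch of the VGG-16 feature-extraction approach described above is given below, assuming ImageNet weights and 224 × 224 three-channel inputs (MRI slices replicated across channels); the classifier head is illustrative rather than the exact configuration used in the paper.

```python
from tensorflow.keras import layers, models
from tensorflow.keras.applications import VGG16


def build_vgg16_classifier(input_shape=(224, 224, 3), num_classes=2):
    """VGG-16 convolutional base (ImageNet weights) reused as a deep feature extractor."""
    base = VGG16(weights="imagenet", include_top=False, input_shape=input_shape)
    base.trainable = False  # freeze the pre-trained convolution/pooling layers

    model = models.Sequential([
        base,
        layers.Flatten(),
        layers.Dense(256, activation="relu"),
        layers.Dropout(0.5),
        layers.Dense(num_classes, activation="softmax"),  # normal vs tumor
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```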
2.1.2 ResNet Architecture (ResNet-18 and ResNet-50)
Fig. 4 ResNet-18 architecture (input → stacked 3 × 3 convolution layers with 64, 128, 256, and 512 filters and stride-2 downsampling between stages → average pooling → FC → softmax)
ResNet is the short form of residual network. ResNet-18 is an 18-layer convolutional neural network as shown in Fig. 4, and ResNet-50 is a 50-layer convolutional neural network as shown in Fig. 5, trained here on MRI brain images from the REMBRANDT database [13]. The unique notion behind the architecture is the usage of skip connections, in which the signal coming into a layer is added to the output of a layer higher up the stack; Fig. 4 depicts a basic illustration of this notion. Shortcut connections whose input dimensions are smaller than the output dimensions are introduced in ResNet as one of three forms of skip. The identity shortcut handles the mapping with extra zero padding as the dimensions expand, and it does not require any
more parameters. Only the projection shortcut is utilized to increase dimensions; for other shortcuts, extra parameters for the identity are required. In this ResNet architecture, shown in Fig. 5, the MRI brain image is given as input to convolutional layers with max-pooling layers and an FC softmax layer for image classification.

Fig. 5 ResNet-50 architecture (a 7 × 7 convolution and max-pooling, followed by bottleneck blocks of 1 × 1, 3 × 3, and 1 × 1 convolutions repeated ×3, ×4, ×6, and ×3, then average pooling and an FC softmax layer)
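The skip-connection idea discussed above can be sketched as a basic residual block using the Keras functional API; this illustrates the identity/projection shortcut rather than reproducing the exact ResNet-18 or ResNet-50 configuration.

```python
from tensorflow.keras import layers


def residual_block(x, filters, downsample=False):
    """Basic ResNet-18-style block: two 3x3 convolutions plus a skip connection."""
    stride = 2 if downsample else 1
    shortcut = x

    y = layers.Conv2D(filters, (3, 3), strides=stride, padding="same")(x)
    y = layers.BatchNormalization()(y)
    y = layers.ReLU()(y)
    y = layers.Conv2D(filters, (3, 3), padding="same")(y)
    y = layers.BatchNormalization()(y)

    if downsample or shortcut.shape[-1] != filters:
        # projection shortcut: a 1x1 convolution matches the changed dimensions
        shortcut = layers.Conv2D(filters, (1, 1), strides=stride, padding="same")(shortcut)

    out = layers.Add()([y, shortcut])   # the skip connection described above
    return layers.ReLU()(out)
```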
2.1.3 GoogleNet (Inception-v2 and Inception-v3)
The architecture of GoogleNet is shown in Fig. 6, and it maximizes the use of the computational resources constructed inside the network. Dropout is applied once the FC layer is removed. There are three different levels of Inception. In Inception-v1, the block at each level is widened while hiding the last stage's high number of input filters from the subsequent layers. In Inception-v2, the learning rate is increased by eliminating dropout and local response normalization; the training instances are fully shuffled, and the L2 weight normalization and visual abnormalities are minimized. Inception-v3 is considerably faster to train, so its computing time is less than that of Inception-v2 but more than that of the VGG-16 architecture. The Inception module of the GoogleNet architecture, with the input brain image, employs an activation function with a max-pool layer and without an FC layer.
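For orientation, a sketch of the original (v1-style) Inception module is given below; Inception-v2 and v3 add batch normalization and factorized convolutions on top of this idea, which are omitted here.

```python
from tensorflow.keras import layers


def inception_module(x, f1, f3_reduce, f3, f5_reduce, f5, pool_proj):
    """GoogleNet-style Inception module: parallel 1x1, 3x3, 5x5 and pooled branches."""
    b1 = layers.Conv2D(f1, (1, 1), padding="same", activation="relu")(x)

    b2 = layers.Conv2D(f3_reduce, (1, 1), padding="same", activation="relu")(x)
    b2 = layers.Conv2D(f3, (3, 3), padding="same", activation="relu")(b2)

    b3 = layers.Conv2D(f5_reduce, (1, 1), padding="same", activation="relu")(x)
    b3 = layers.Conv2D(f5, (5, 5), padding="same", activation="relu")(b3)

    b4 = layers.MaxPooling2D((3, 3), strides=1, padding="same")(x)
    b4 = layers.Conv2D(pool_proj, (1, 1), padding="same", activation="relu")(b4)

    # concatenate the branch outputs along the channel axis
    return layers.Concatenate(axis=-1)([b1, b2, b3, b4])
```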
3 Results and Discussion The REMBRANDT dataset [13] provides 200 MRI brain images, with 100 normal brain images and 100 abnormal MRI brain images. The study employs various CNN architectures, including ResNet-18, ResNet-50, GoogleNet, VGGNet-16, and VGGNet-19, to classify brain MRI images as normal or abnormal, and each architecture is examined with TL approaches including fine-tuning and optimization. The advantage and efficiency of the VGG-16 architecture for this classification are demonstrated in this section. The database contains MRI images from both normal and tumor patients.
Fig. 6 Simplified block diagram of GoogleNet architecture (input → 7 × 7 convolution → 3 × 3 max-pooling → stacked Inception layers (×9) → 7 × 7 average pooling → FC → softmax output)
An input REMBRANDT dataset is produced with 140 brain images for the training phase and 60 MR brain images for the testing phase. Before sending the raw pixels of each image to the modified VGG-16 and other CNN model architectures, the model is pre-trained. After successful training, the VGG-16 and the other five CNN model architectures are implemented in the testing phase. This phase involves automated loading of each test image into the system and saving the output class (+1 for abnormal, −1 for normal). These are then compared to the ground truth data to determine the appropriate class. This method is carried out for all the testing images in the input database, and the results for VGG-19, VGG-16, Inception-v2, ResNet-18, Inception-v3, and ResNet-50 are obtained and evaluated in Table 1. The efficiency of the proposed system in categorizing the test MRI brain images into two classes, P—positive cases or N—negative cases, is evaluated based on the accuracy, recall, precision, and F1-score values. This demonstrates the most accurate architecture model for normal and abnormal brain image classification using the overall performance of the system, given by the following equations:

Accuracy = (TP + TN) / (TP + FN + TN + FP)  (1)

Recall = TP / (TP + FN)  (2)

Precision = TP / (TP + FP)  (3)
Table 1 Accuracy obtained for different architectures (accuracy in %)

#Run     ResNet-18   ResNet-50   Inception-v2   Inception-v3   VGG-16   VGG-19
1        89          90          95             93             95       95
2        90          94          94             92             96       94
3        92          91          94             92             97       93
4        89          90          96             93             96       95
5        92          94          95             93             95       95
6        92          91          93             92             96       96
7        90          89          91             90             95       95
8        88          90          95             94             97       94
9        89          92          93             92             97       96
10       88          92          94             93             95       96
Average  90.0        91.15       94             92.4           96       94.45
F1-score = 2 × (Precision × Recall) / (Precision + Recall)  (4)
where TP—true positive, TN—true negative, FN—false negative, and FP—false positive. Table 1 depicts the accuracy of the ResNet-18, ResNet-50, Inception-v2, Inception-v3, VGG-16, and VGG-19 systems. VGG-16 provides the highest accuracy of 96% among all the CNN models for image classification. Table 2 presents the recall, precision, and F1-score evaluation for the various model architectures mentioned above. Among them, VGG-16 provides the highest values of 96.13% recall, 95.9% precision, and 0.96 F1-score. Hence, the VGG-16 architecture is found to be more efficient than all the other architecture models.
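Equations (1)-(4) can be computed directly from the confusion-matrix counts, as in the short sketch below; the example counts are purely illustrative and are not taken from the experiments.

```python
def classification_metrics(tp, tn, fp, fn):
    """Compute the metrics of Eqs. (1)-(4) from a two-class confusion matrix."""
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    recall = tp / (tp + fn)
    precision = tp / (tp + fp)
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, recall, precision, f1


# Illustrative counts only (not taken from the paper): 29 TP, 29 TN, 1 FP, 1 FN out of 60 test images
print(classification_metrics(tp=29, tn=29, fp=1, fn=1))  # -> approx (0.967, 0.967, 0.967, 0.967)
```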
Table 2 Recall, precision, and F1-score evaluation of various architectures

Various model   Recall   Precision   F1 score
ResNet-18       90.08    89.9        0.9
ResNet-50       91.04    91.3        0.91
Inception-v2    94.02    94          0.94
Inception-v3    92.42    92.4        0.92
VGG-16          96.13    95.9        0.96
VGG-19          94.10    94.9        0.94
4 Conclusion In this study, with a limited dataset of brain MRI images, various CNN architectures are used to examine brain tumor classification. The MRI images from the REMBRANDT database are fed to the pre-trained architecture models to determine whether an image is a brain tumor image or a normal image. Then, the VGG-16 and the other five CNN model architectures are implemented in the testing phase, and the testing images in the input database are classified using the various CNN models. The accuracy of all the CNN models is obtained and tabulated. Finally, the diagnostic performance and computational consumption of all the models are compared, with the VGG-16 architecture reaching an accuracy of 96%. Hence, compared to the above-mentioned CNN model architectures, as shown in Tables 1 and 2, the VGG-16 architecture with 16 network layers provides higher accuracy, precision, F1-score, and recall measures of performance in classifying the MR brain images into tumor or normal images.
References
1. Arbane M, Benlamri R, Brik Y, Djerioui M (2021) Transfer learning for automatic brain tumor classification using MRI images. In: 2nd international workshop on human-centric smart environments for health and well-being. IEEE Press, Algeria, pp 210–214
2. Mohammed BA, Al-Ani MS (2021) An efficient approach to diagnose brain tumors through deep CNN. J Math Biosci Eng 18(1):851–867
3. Rehman A, Naz S, Razzak MI, Akram F, Imran M (2020) A deep learning-based framework for automatic brain tumors classification using transfer learning. J Circ Syst Signal Process 39:757–775
4. Chelghoum R, Ikhlef A, Hameurlaine A, Jacquir S (2020) Transfer learning using convolutional neural network architectures for brain tumor classification from MRI images. In: IFIP international conference on artificial intelligence applications and innovations, pp 189–200. Springer, Switzerland
5. Bhanumathi V, Sangeetha R (2019) CNN based training and classification of MRI brain images. In: 5th international conference on advanced computing & communication systems. IEEE Press, India, pp 129–133
6. Kaur T, Gandhi TK (2020) Deep convolutional neural networks with transfer learning for automated brain image classification. J Mach Vis Appl 31(3):1–16
7. Murugan S, Bhardwaj A, Ganeshbabu TR (2015) Object recognition based on empirical wavelet transform. Int J MC Square Sci Res 7(1):74–80
8. Jayachandran A, Andrews J, Prabhu LAJ (2019) Modified region growing for MRI brain image classification system using deep learning convolutional neural networks. In: International conference on innovative data communication technologies and application. Springer, Switzerland, pp 710–717
9. Khan HA, Jue W, Mushtaq M, Mushtaq MU (2020) Brain tumor classification in MRI image using convolutional neural network. J Math Biosci Eng 17:6203–6216
10. Kaur T, Gandhi TK (2019) Automated brain image classification based on VGG-16 and transfer learning. In: International conference on information technology. IEEE Press, India, pp 94–98
11. Talo M, Baloglu UB, Yıldırım O, Acharya UR (2019) Application of deep transfer learning for automated brain abnormality classification using MR Images. J Cogn Syst Res 54:176–188
12. Mohsen H, El-Dahshan ESA, El-Horbaty ESM, Salem ABM (2018) Classification using deep learning neural networks for brain tumors. J Future Comput Inform 3(1):68–71
13. REMBRANDT. https://wiki.cancerimagingarchive.net/display/Public/REMBRANDT
14. Tamilarasi R, Gopinathan S (2021) Inception architecture for brain image classification. Int J Phys Conf Ser 1964(7):072022
15. Rai HM, Chatterjee K (2020) Detection of brain abnormality by a novel Lu-Net deep neural CNN model from MR images. J Mach Learn Appl 2:100004
Cryo-facies Mapping of Karakoram and Himalayan Glaciers Using Multispectral Data K. R. Raghavendra, M. Geetha Priya, and S. Sivaranjani
Abstract Glaciers are units of the environment that actively respond to climate change. As the glacier is one of the important cryospheric elements, glacier facies can also be called cryo-facies. Cryo-facies represent different zones of the glacier, which helps in understanding glacier modeling. The present study aims to delineate the cryo-facies (glacier facies) of Karakoram and Himalayan (Western, Central & Eastern) glaciers. Multispectral data from the Landsat-7 and Landsat-8 satellites were used for the year 2020. MXL (Maximum Likelihood Classification), a supervised classification approach, is the machine learning algorithm that has been used to classify the cryo-facies. The MXL classifier is a pixel-based classifier, which segregates the image pixels into the wet snow, dry snow, melting ice, debris, and shadow regions of the glacier. The results obtained show that the melting ice zone is spatially variable from region to region, glacier to glacier, and within a glacier due to its topography. Keywords Glacier facies · Landsat-7/8 · Himalayas · Karakoram · Cryo-facies
1 Introduction Glacier is a huge mass of ice covered with snow that moves downwards due to gravity and its mass. In general, only benchmark glaciers that are larger in the area are considered for glaciology studies as smaller glaciers respond quickly to environmental and climate change. Almost all glaciers in the Himalayas are retreating at an annual rate of 16–35 m [1–3]. Karakoram region contains more than 60% of the glaciers of the Indus basin [4], the glaciers in the Karakoram region are retreating slowly as compared to glaciers in the Himalayan region [5, 6]. K. R. Raghavendra Department of ECE, Jyothy Institute of Technology, Bengaluru, India M. Geetha Priya (B) · S. Sivaranjani CIIRC, Jyothy Institute of Technology, Bengaluru, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 N. Subhashini et al. (eds.), Futuristic Communication and Network Technologies, Lecture Notes in Electrical Engineering 966, https://doi.org/10.1007/978-981-19-8338-2_27
Cryo-facies give the spatial variation of snow and ice cover over a glacier, which helps in understanding the response of the glacier to climatic changes, and they also help in building glacier models more accurately. Tracking the spatial variation of the snow and ice facies of glaciers helps us to understand surface albedo and regional mass balance. The cryo-facies give delineated information on the different zones of the glacier, such as dry snow, wet snow, melting ice, debris, and shadows. In the present study, supervised classification of Landsat datasets has been carried out using training sets for individual glacier features, region-wise.
2 Materials and Methods 2.1 Study Area Benchmark glaciers (important glaciers), which do not respond quickly to climate changes, are used in this study. Four glaciers, namely the Siachen glacier, Samudra Tapu, Khumbu glacier, and Zemu glacier, are selected as the study area (Fig. 1). The Siachen glacier is in the Eastern Karakoram region, with a large area and the highest elevation [7]. The Siachen glacier (Karakoram) and Samudra Tapu (Western Himalayas) receive winter snowfall due to westerly disturbances. The Khumbu glacier in the Central Himalayas also receives snowfall due to westerly disturbances, but not as much as the Siachen glacier and Samudra Tapu. The Zemu glacier, which is in the Eastern Himalayas, is monsoon-dominated. Table 1 gives the information for each glacier considered in this study.
2.2 Data Used The multispectral data from the Landsat-7 and Landsat-8 satellites were obtained from the United States Geological Survey (USGS) Earth Explorer (https://earthexplorer.usgs.gov/) for the year 2020 during the ablation period, with minimum cloud cover. Table 2 gives the details of the data used for the present study. Landsat-7 was launched on April 15, 1999 with an Enhanced Thematic Mapper Plus (ETM+) sensor, which has 8 bands and a swath of 185 km. Landsat-8 was launched on February 11, 2013 with the Operational Land Imager (OLI) and Thermal Infrared Sensor (TIRS); only OLI level-1 C-1 data, which have 11 bands and a swath of 185 km × 180 km, are used here for the study. The visible bands (Red, Green & Blue) from Landsat-7 and Landsat-8, which have a spatial resolution of 30 m, were used for the study. The glacier shapefiles, which delineate the glacier boundaries, were obtained from RGI 6.0 (Randolph Glacier Inventory) (https://www.glims.org/maps/glims).
Fig. 1 Study area map: A—Siachen Glacier, B—Samudra Tapu, C—Khumbu Glacier, D—Zemu Glacier

Table 1 Details of the study area

Glacier name     Glacier id  Region             Area in km2  Latitude         Longitude
Siachen Glacier  51,842      Karakoram          936          35°27′50.04″ N   77°2′38.04″ E
Samudra Tapu     95,590      Western Himalayas  65           32°28′48″ N      77°24′36″ E
Khumbu Glacier   96,230      Central Himalayas  39.5         27°58′19.2″ N    86°49′50.88″ E
Zemu Glacier     23,358      Eastern Himalayas  42           27°42′28.8″ N    88°11′33″ E
Table 2 List of data used

Scene id                                    Date of acquisition  Path no.  Row no.
LC08_L1TP_148035_20200919_20201006_01_T1    19-09-2020           145       38
LC08_L1TP_147037_20200912_20200919_01_T1    12-09-2020           147       37
LE07_L1TP_140041_20200412_20200508_01_T1    12-04-2020           140       41
LC08_L1TP_139041_20200124_20200128_01_T1    24-01-2020           139       41
2.3 Methodology The visible bands (Red, Blue & Green) of the level-1 C-1 product obtained from the Landsat-7/8 satellites should be preprocessed initially, as they are available in the form of Digital Numbers (DN). The pre-processing consists of 3 steps, namely radiometric correction, geometric correction, and atmospheric correction. In radiometric correction, the obtained data in Digital Number (DN) form are converted to radiance using Eq. 1 and subsequently to reflectance using Eq. 2, as shown in the process flow chart (Fig. 2).

Radiance = Gain × DN + Offset  (1)

Reflectance = (π × Radiance × L²) / (E_sun × cos ∅)  (2)
where L—distance from Earth to the Sun (in astronomical units), E_sun—mean solar exoatmospheric irradiance, and ∅—solar zenith angle. In geometric correction [8], the errors caused by the sensors are corrected. Atmospheric correction was done to rectify the absorption and scattering caused by atmospheric conditions, and reflectance rescaling was later applied to the atmospherically corrected output. The radiometric, geometric, and atmospheric corrections are carried out using a semi-automatic plugin of Quantum Geographic Information System (QGIS), which is open-source software used for this process. The preprocessed output has been clipped with the shapefile obtained from RGI 6.0, so the region of interest, that is, the glacier region, is obtained. Later, the clipped data are digitized by assigning the proper Universal Transverse Mercator (UTM) projection. The MXL classifier classifies the pixels into classes based on maximum likelihood; the likelihood is the posterior probability of a pixel belonging to class 'r', as given in Eq. 3, and MXL states the relationship between the class and the probability of the pixels. MXL classification was carried out using training sets for the individual features of the glacier to classify it into wet snow, dry snow, debris, melting ice, and shadows.

LH = P(r|X) = [P(r) × P(X|r)] / Σ_j [P(j) × P(X|j)]  (3)
where LH—Likelihood, P(r)—Prior probability of class r, P(X|r)–Probability density function (Fig. 2).
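A compact sketch of Eqs. (1)-(3) is given below, assuming per-band gain, offset, and E_sun values taken from the scene metadata and equal class priors; the actual study performed these steps with a semi-automatic plugin in QGIS rather than hand-written code.

```python
import numpy as np


def dn_to_reflectance(dn, gain, offset, esun, sun_zenith_deg, d_au):
    """Eqs. (1)-(2): DN -> at-sensor radiance -> top-of-atmosphere reflectance."""
    radiance = gain * dn.astype(np.float64) + offset                     # Eq. (1)
    reflectance = (np.pi * radiance * d_au ** 2) / (
        esun * np.cos(np.radians(sun_zenith_deg))                        # Eq. (2)
    )
    return reflectance


def mxl_classify(pixels, training):
    """Gaussian maximum likelihood classification in the spirit of Eq. (3).

    pixels:   (N, B) array of band reflectances to classify
    training: dict class_name -> (M, B) array of training samples
    Returns the most likely class label for each pixel (equal priors assumed).
    """
    names, scores = list(training), []
    for name in names:
        samples = training[name]
        mean = samples.mean(axis=0)
        cov = np.cov(samples, rowvar=False)
        inv, logdet = np.linalg.inv(cov), np.linalg.slogdet(cov)[1]
        d = pixels - mean
        # log of the Gaussian likelihood P(X|r) for every pixel
        log_like = -0.5 * (logdet + np.einsum("ij,jk,ik->i", d, inv, d))
        scores.append(log_like)
    return np.array(names)[np.argmax(np.stack(scores), axis=0)]
```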
Fig. 2 Process flow
3 Results As per the methodology described in the previous section, the process flow has been carried out for the satellite data, and the cryo-facies have been mapped using a supervised pixel-based classification technique with appropriate training samples. The mapped cryo-facies results are shown in Figs. 3, 4, 5 and 6, and Table 3 gives the spatial variation of the cryo-facies. It is observed that the glaciers in the Karakoram [9], Western Himalayas, and Central Himalayas are exposed during the ablation period (June, July, August & September), whereas the glaciers in the Eastern Himalayas are not exposed during the ablation period; instead, they are exposed during January and February, which are winter months.
Fig. 3 Cryo-facies mapping of Siachen Glacier
This shows that for the Eastern Himalayas, January and February are not true winters, as the region is monsoon-dominated; this is evident from the fact that the westerly disturbances become weak before they reach the Eastern Himalayas. From Figs. 3, 4, 5 and 6 and Table 3, the following observations are made:
• In general, dry snow is observed over the accumulation region, which is also evident from the results obtained, except for the Zemu glacier.
• The melting snow of the accumulation region, due to its moisture content, has been classified as wet snow and is noticed on parts of the accumulation region of the glaciers, except the Zemu glacier.
• Melting ice arises from the exposed ice of the ablation region of the glaciers. The results show that the Zemu glacier has melting ice over a few parts of the accumulation region as well.
• The Khumbu glacier ablation region is completely filled with debris (pixels classified as debris for the Khumbu glacier are possibly ice blocks due to icefall), whereas Siachen and Samudra Tapu have debris cover at the sides of the central flowline in the ablation region.
• The observations over the Zemu glacier indicate that the cryo-facies over the Eastern Himalayas are unique and variable on both spatial and temporal scales.
Fig. 4 Cryo-facies mapping of Samudra Tapu Glacier
Fig. 5 Cryo-facies mapping of Khumbu Glacier
Fig. 6 Cryo-facies mapping of Zemu Glacier

Table 3 Spatial analysis of cryo-facies

Glacier name     Dry snow                                    Wet snow                                              Melting ice                                           Debris
Siachen Glacier  Upper accumulation region                   Lower accumulation region                             Upper ablation region                                 Snout and sides of central flow line in the ablation region
Samudra Tapu     Upper accumulation region                   Lower accumulation region                             Upper ablation region and lower ablation region       Snout and sides of central flow line in the ablation region
Khumbu Glacier   Few parts of the upper accumulation region  Few parts of the lower accumulation region            Few parts of the accumulation region                  Parts of accumulation region, complete ablation region & snout
Zemu Glacier     Few parts of the accumulation region        Parts of the accumulation region and ablation region  Parts of the accumulation region and ablation region  Debris free
4 Conclusion In the present study, mapping of the cryo-facies of four important benchmark glaciers of the Karakoram and the Himalayas has been carried out. Open-source satellite datasets and QGIS software were used for the entire processing. The delineated cryo-facies provide better insight into and understanding of the different glacier zones and their behavior, along with their responses to climate change and global warming. Glacier facies help to build glacier models, which in turn help in understanding the changes in glaciers with respect to melting and mass loss. Acknowledgements This work is being financially supported by SPLICE-DST under the NMSHE Network Programme on Climate Change and Himalayan Cryosphere (DST/CCP/NHC/156/2018(G)). The authors gratefully acknowledge the support and cooperation given by Dr. Krishna Venkatesh, Director, CIIRC—Jyothy Institute of Technology, Bengaluru.
References
1. Dobhal DP, Gergan JT, Thayyen RJ (2004) Recession and morphogeometrical changes of Dokriani glacier (1962–1995) Garhwal Himalaya, India. Curr Sci 86:692–696
2. Kulkarni AV, Karyakarte Y (2014) Observed changes in Himalayan glaciers. Curr Sci 106:237–244
3. Kulkarni AV, Dhar S, Rathore BP, Babu Govindha Raj K, Kalia R (2006) Recession of Samudra Tapu glacier, Chandra river basin, Himachal Pradesh. J Indian Soc Remote Sens 34:39–46. https://doi.org/10.1007/BF02990745
4. Muhammad S, Tian L, Khan A (2019) Early twenty-first century glacier mass losses in the Indus Basin constrained by density assumptions. J Hydrol 574:467–475. https://doi.org/10.1016/j.jhydrol.2019.04.057
5. Muhammad S, Tian L (2016) Changes in the ablation zones of glaciers in the western Himalaya and the Karakoram between 1972 and 2015. Remote Sens Environ 187:505–512. https://doi.org/10.1016/j.rse.2016.10.034
6. Sivaranjani S, Priya MG, Krishnaveni D (2020) Glacier surface flow velocity of Hunza Basin, Karakoram using satellite optical data
7. Sivalingam S, Murugesan GP, Dhulipala K, Kulkarni AV, Devaraj S (2021) Temporal fluctuations of Siachen glacier velocity: a repeat pass SAR interferometry based approach. Geocarto Int 37(17):4888–4910. https://doi.org/10.1080/10106049.2021.1899306
8. Sowmya D, Deepa P, Venugopal P (2017) Remote sensing satellite image processing techniques for image classification: a comprehensive survey. Int J Comput Appl 161:24–37. https://doi.org/10.5120/ijca2017913306
9. Sivalingam S, Murugesan GP, Dhulipala K, Kulkarni AV, Pandit A (2021) Essential study of Karakoram Glacier velocity products extracted using various techniques. Geocarto Int, 1–14. https://doi.org/10.1080/10106049.2021.1974954
Ethereum-Based Certificate Creation and Verification Using Blockchain E. Mutharasan, J. Bharathi, K. Nithesh, S. Bose, D. Prabhu, and T. Anitha
Abstract Certificates are signals of achievement or membership. A web-based system for secure verification of academic certificates and a blockchain system for storing academic weblogs are proposed in this paper. Certificates are created digitally with an embedded QR code, and a search is used to find the certificate in the database. Any modification in the certificate can be identified by Base64 image comparison. The environment is less complex to design because Hyperledger is not a coin ("token") based blockchain; a transaction on Hyperledger does not require a virtual currency transfer, in contrast to Bitcoin or Ethereum. The application has two main components: the webserver and Ethereum. The disadvantage of storing information on a blockchain-based application is that it cannot store image files; instead, weblogs are used for storage. When proposing a digital way of handling certificates, utmost importance must be given to how it eliminates counterfeit certificates from the system. Making such a digital system will eventually lead to questions regarding the security of the digital content, i.e., the certificates. The digital certificates must therefore be stored in a secure way such that a malicious attacker or a user cannot change their content, and when a member tries to exploit the system, it must be ensured that they are removed from the system without causing much damage to its overall functioning. The proposed digitally created certificate can be verified by anyone with high security. Keywords Blockchain · Higher studies · Certificate · Encryption
E. Mutharasan · J. Bharathi · K. Nithesh · S. Bose · T. Anitha (B) Department of Computer Science and Engineering, College of Engineering, Anna University, Guindy, Chennai, India e-mail: [email protected] D. Prabhu Department of Computer Science and Engineering, University College of Engineering, Arni, India © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 N. Subhashini et al. (eds.), Futuristic Communication and Network Technologies, Lecture Notes in Electrical Engineering 966, https://doi.org/10.1007/978-981-19-8338-2_28
1 Introduction Certificates are important everywhere. They are the tokens of our life that prove our excellence and our worth to the world. Many of us have faced situations where certificate verification is time-consuming. This work aims to digitalize certificates so that they can be verified by anyone instantly. Certificates signify membership or achievement, and university degrees are essential for students to help them go on to higher studies or get jobs. To check the authenticity of a transcript in the current analog system, verifying universities are required to call the issuing institution. A system must hold the records of such transcripts and make them available online while ensuring their security and authenticity. If such a system is proposed and developed, authorities can easily verify the certificates and students can always hold a digital copy of their certificates. Another problem that arises when handling educational certificates is how to handle the case of counterfeit certificates. When proposing a digital way of handling certificates, utmost importance should be given to how it eliminates counterfeit certificates from the system. Making such a digital system will eventually lead to questions regarding the security of the digital contents, i.e., the certificates. So we must ensure that the digital certificates are stored in a secure way such that a malicious attacker or a user cannot change their content, and when a member tries to exploit the system, we must make sure that they are removed from the system without causing much damage to its overall functioning. A two-factor verification-based system for secure verification of academic certificates is proposed in this work. Universities can upload the certificates of their students and verify the certificates from other member universities. To ensure the authenticity and security of the certificate, we use a hash-based storage system to store the digital contents. The main contributions of the work are given below; a simplified code sketch of the first two contributions is given at the end of this section:
1. Creating a digital certificate from the webserver with an embedded encrypted QR code for identification, using QR creation and QR encryption.
2. Verification of the digital certificate for the user by decoding and decrypting the QR code and checking the records in the webserver.
3. Storing weblogs in the blockchain to protect the data; no logs can be modified or deleted from the blockchain.
This paper is organized as follows: Sect. 2 discusses the various existing methods for the modules of the proposed system, and Sect. 3 discusses the various concepts used in the proposed system along with the overall design. Section 4 discusses the implementation and experimental results of our proposed work. Section 5 discusses the conclusion and future work.
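A simplified sketch of contributions 1 and 2 is given below, using a salted SHA-256 fingerprint embedded in a QR image in place of the full QR encryption scheme; the record fields, the secret salt, the availability of the qrcode package, and the file name are illustrative assumptions.

```python
import hashlib
import json
import qrcode  # third-party package, assumed available (pip install qrcode)

SECRET_SALT = "issuer-secret"   # placeholder for the issuer-side secret/encryption step


def certificate_fingerprint(record: dict) -> str:
    """Deterministic SHA-256 fingerprint of the certificate fields."""
    canonical = json.dumps(record, sort_keys=True)
    return hashlib.sha256((canonical + SECRET_SALT).encode()).hexdigest()


def issue_qr(record: dict, path: str = "certificate_qr.png") -> str:
    """Embed the certificate ID and fingerprint into a QR image printed on the certificate."""
    fingerprint = certificate_fingerprint(record)
    qrcode.make(json.dumps({"id": record["id"], "hash": fingerprint})).save(path)
    return fingerprint


def verify(record: dict, scanned_hash: str) -> bool:
    """A verifier recomputes the fingerprint from the stored record and compares it with the QR."""
    return certificate_fingerprint(record) == scanned_hash


record = {"id": "2023-CSE-001", "name": "Student Name", "degree": "B.E. CSE", "year": 2023}
print(verify(record, issue_qr(record)))  # True for an unmodified certificate
```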
2 Related Work Nowadays, educational records are often used to describe a user’s educational process. These documents are significantly vital for a user’s future studies and career. The most prominent factor of records is that they are primitive, permitting them to reconstitute valid historical circumstances. As a result, students, potential employers, and educational institutions esteem these records. Educational records have been digitized as a result of advances in information technology. Digital records, in contrast with the standard physical records of paper, are held on a storage media with a high degree of variable, permitting them to be easily modified during workflows of storage, transmission, and sharing. Centralized storage is perhaps the most prominent option for storing and managing data, and this mode makes systems vulnerable to several attacks. Additionally, educational institutions maintain records from multiple learning stages on separate storage servers, and these storage servers are often configured to permit mainly internal staff access, without any kind of interoperability. A failure could also result in loss of data or leakage. As a result, organizations frequently develop security policies that limit the access and exchange of records to secure information. Institutions, on the other hand, lack of secure and worthwhile record-sharing techniques among institutions. When transferring from one institution to another, students may encounter difficulties, for example, when keeping their previous institution’s course completion record [1]. Validation of certificates in public key infrastructure is an essential part of creating security features on any network. There seems to be a lot of discussion about how efficiently validates digital certificates in Public key infrastructure, which is the foundation for the security of communication networks. The development of such a framework is critical because digital certificates must be validated fast and secure for a huge number of clients at a minimal cost for a short time frame. Our analysis of TLS holding hands in the dataset of Alexa’s top one million domains suggests that current popular certificate validation systems have been unable to distribute certificate validation of additional data to the clients in a well-timed manner and have a lot of overhead on the client-side, making them vulnerable to various types of attacks. The Secure Guard is a mechanism that verifies certificates and it is capable of handling certificate validation expeditiously during TLS holding hands, as a result of the observations. This method makes use of Internet service providers as the validation of certificate’s primary entity, taking advantage of the fact that all the requests for a proxy-cache server must be used by ISPs to provide Internet access. The Secure Guard evaluation explains its efficiency. Furthermore, it presents a quantitative analysis method for comparing the costs of the system and alternative certificate validation procedure in the same inception scenarios. Secure Guard can examine digital certificates with minimal network delay, according to the results [2]. Digital certificates are an essential part of online security because they provide public key authentication of network entities. A digital certificate is a digital document that has been signed by a certification authority, certifies that the named person holds the public key that was disclosed. In general, certificate authorities are incharge
of both certificate revocation and reissue, and certificates are deemed independent of one another by their nature. This work addresses the issue of certificate management in this research and presents a flexible framework for creating associated certificates. Then, it utilizes to build a multi-certificate public key infrastructure that allows users to perform self-service tasks like certificate substitution and self-reissue after selfrevocation. To the state of the art, it’s the first self-reissue mechanism for certificate holders. Additionally, there is the unidentified digital certificate, which connects a user’s public key to their identity in an unidentified manner controlled by the user. Due to the unlikability of these certificates, the identity-key bond can be divulged by the user to specify communication peers while remaining anonymous to the rest of the world, achieving privacy [3]. Because of nature’s technology, it has a wide range of applications in multiple industries. This latest technology, the subject of human resource management data must be kept private and secret to have a great research value. Distribution ledger technology is a revolutionary idea in the world of human resource management, especially when it comes to managing human resource records. A privacy-preserving framework has been used to create a transparent system for managing human resource records. Wallets can also be created utilizing an organization’s id and output with a public–private key pair, as well as matching privacy parameters with hashing. Confidentiality, integrity, and authentication are all provided by keys. The contract makes decisions in distributed with converged manner, with classified levels of privacy. Time, consumption of memory, failure point identification, and read–write latencies were used to evaluate the proposed work’s performance [4]. In the paper, the M-estimator algorithm is utilized to smooth two-dimensional (2D) splines. The novel techniques, M-estimator-based image processing algorithms, consider the spatial relationships between the picture elements. The sample’s contribution to the model is determined not by the sample’s residual, but also by the nearby residuals. Each processing window’s smoothing parameter is computed separately and adjusts to the image’s local structure. The proposed approach is used to filter images. The filter results in preserving details while minimizing additive Gaussian and impulsive noise [5]. Most image processing technologies do not make use of the substantial inter channel correlation found in vector-valued data like RGB color images or multimodal medical images. The angle between the spatial gradients in vector-valued images is a new way of dealing with vector-valued images. Parallel level sets in images could be generated by minimizing the cost function that disincentivizes large angles. To examine the Gateaux derivatives, which lead to a diffusion-like gradient descent technique, after formally introducing the idea of cost functional. Several examples of de-noising and demosaicking RGB color images are used to demonstrate the properties of the cost function. In this study, parallel level sets are demonstrated to be an effective tool for improving color images. For low noise levels, demosaicking using parallel level sets produces visually excellent results. Furthermore, as compared to existing techniques, the proposed technique produces clearer images [6]. 
In public key infrastructure (PKI), the public key digital certificate is quite often used to offer users public key authentication. User authentication cannot be done with
a public key digital certificate. In the authors' paper, generalized digital certificates (GDCs) are proposed for user authentication and key exchange. A GDC contains a user's public information, such as digital driving license or digital birth certificate information, together with a digital signature on that information produced by a trustworthy certificate authority. GDCs, on the other hand, do not contain any public key information. Due to the lack of a private and public key pair, GDC key management is easier than that of a public key digital certificate. Each user keeps the GDC as a concealed token, and the signature itself is never revealed for direct verification. Instead, by responding to the verifier's challenge, the owner proves knowledge of the signature to the verifier. The proposed discrete logarithm (DL) and integer factoring (IF) based methods for user authentication and secret key establishment are built on this approach [7]. Another methodology uses a new word shape coding scheme to index document images, which captures text content by annotating each word image with a word shape code. A combination of topological shape features is used to label word images, including character ascenders and descenders, character holes, and character water reservoirs. Document images can then be retrieved using the annotated word shape codes and either query keywords or a query document image. According to experimental results, the proposed document image retrieval technique is fast and effective [8]. To make proof of identity of uploaded documents faster and more convenient for any organization, another approach integrates a number of techniques, including public and private key cryptography, online storage security, digital signatures, hashes, and peer-to-peer networks [9]. However, these techniques perform less well in terms of security and reliability [10]. Scalable and cost-effective solutions that reduce overhead and make document verification a seamless process may now be deployed thanks to public blockchains like Ethereum, smart contracts, and decentralized applications. One such system uses an OCR module for extracting information from certificates, as well as a blockchain module for sending and verifying data stored on the blockchain. Docschain was created to address the three shortcomings of Blockcerts previously described [11]. Docschain integrates smoothly into the existing degree issuance system by operating on hard copies of the degree documents. A degree document is analyzed using optical character recognition (OCR), and the record of each degree document is saved with the relevant OCR template, allowing the logic behind the data provided throughout the document to be understood [12]. Another proposed technology creates a secure database system that cannot be tampered with, edited, destroyed, or altered. It also protects the security of papers issued within the college system as well as those exchanged with outside platforms, such as financial documents, official documents, and academic credentials. Furthermore, this technology ensures the security and confidentiality of data and information. The proposed technology is constructed on a database that holds 100% correct information about the platform's exports [13]. Smart contracts have been used to generate the data for blocks submitted to the Ethereum blockchain platform.
The InterPlanetary File System (IPFS) has been used to hold certificate files in a distributed environment for fast and accurate access. Certificate data can be stored on the Ethereum public
blockchain architecture, with the supporting files kept in the IPFS environment, according to the results [14, 15].
3 Proposed Methodology
The proposed system follows an architecture comprising the following modules: creating a digital certificate, verifying digital certificates, and storing web logs on the blockchain, as shown in Fig. 1.
3.1 Digital Certificate Creation
The data manager provides the student information to the webserver. The digital certificate is then created by the webserver with an embedded encrypted QR code for identification, using QR creation and QR encryption. The process involves two-factor authentication, encrypting the data using AES, creating a QR code from the encrypted data, and embedding the QR code into the certificate. The data manager logs into the webserver using two-factor authentication, with the OTP received through the registered mobile number.
QR Creation The QR code is created from the student roll number that is available in the web database.
QR Encryption The roll number is encrypted using the AES-128 cipher algorithm. We encrypt the QR content to avoid SQL injection attacks.
Two-Factor Authentication Two-factor authentication (2FA), also known as two-step verification or dual-factor authentication, is a security technique that requires users to confirm their identities using two separate authentication methods. Such a technique is used to protect the user's credentials and the resources to which the user has access. Single-factor authentication (SFA), wherein the user supplies only one factor (typically a password or passcode), provides a lower level of protection than two-factor authentication. Users must provide both a password and a second factor, which is usually a security token or a biometric element like a fingerprint or facial scan. Knowing a victim's password alone is then insufficient to pass the validation tests; therefore, two-factor authentication adds an extra layer of security to the authentication step, making it more difficult for attackers to gain access to a user's machines or online accounts. Two-factor authentication has long been used to
Fig. 1 Digital certificate creation and verification system architecture
protect access to particular systems and information, and Internet service providers are increasingly implementing it to protect their customers' credentials from hackers who may have stolen a password database or obtained user passwords through phishing attempts. The two-factor authentication module uses Twilio to send the OTP to the data manager as a text message via the account SID, as shown in Algorithm 1.
Algorithm 1 Texting OTP
1. Send OTP using sid, token from Twilio account
(a) client = new Client(sid, token)
(b) client.messages.create(from, body = pin).
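As a concrete, hedged illustration of Algorithm 1, the following Python sketch generates a one-time PIN and texts it through the Twilio REST client; the account SID, auth token, and phone numbers are placeholders rather than values from this work.

```python
# Minimal sketch of Algorithm 1: texting an OTP to the data manager via Twilio.
# ACCOUNT_SID, AUTH_TOKEN and the phone numbers below are placeholders, not real credentials.
import random
from twilio.rest import Client

ACCOUNT_SID = "ACxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"   # Twilio account SID (placeholder)
AUTH_TOKEN = "your_auth_token"                        # Twilio auth token (placeholder)

def send_otp(manager_number: str, twilio_number: str) -> str:
    """Generate a 6-digit OTP and text it to the data manager's registered number."""
    otp = f"{random.randint(0, 999999):06d}"
    client = Client(ACCOUNT_SID, AUTH_TOKEN)
    client.messages.create(
        from_=twilio_number,
        to=manager_number,
        body=f"Your certificate-portal OTP is {otp}",
    )
    return otp  # stored server-side and compared against the value the user enters
```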
3.2 Web Application
A web application is application software that relies on a webserver, as distinct from computer-based software programs that execute locally on the device's operating system. To access web applications, the user needs a web browser with an active network connection.
3.3 Security Protocol
AES is an iterative cipher rather than a Feistel cipher. It is based on a substitution–permutation network. It comprises a series of interconnected operations, some of which swap specific inputs for specific outputs (substitutions) and others which shuffle bits around (permutations). Notably, AES performs all of its calculations on bytes rather than bits. As a result, AES treats the 128 bits of a plaintext block as sixteen bytes. These sixteen bytes are arranged into four columns and four rows for matrix processing. Figure 2 shows the AES schematic structure, and Algorithm 2 depicts the AES encryption.
Fig. 2 AES schematic structure
In AES, the number of rounds is variable and is determined by the key length. AES uses 10 rounds for 128-bit keys, 12 rounds for 192-bit keys, and 14 rounds for 256-bit keys. Each of these rounds has its own 128-bit round key, which is generated from the initial AES key.
Algorithm 2 AES Encryption
Input: Roll Number from Database
Output: Encrypted String
1. Get Roll No from Database
2. Encrypt the roll no
(a) encryption = openssl_encrypt(plain_string, cipher, encryption_key, options, encryption_iv).
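The following Python sketch illustrates the AES-128 encryption step of Algorithm 2 using the cryptography package in CBC mode; the key handling, IV-prefixing convention, and Base64 output format are assumptions made for the example, not details confirmed by the original implementation.

```python
# Sketch of Algorithm 2: AES-128-CBC encryption of the roll number. The key and IV
# here are freshly generated for illustration; a deployment would load them securely.
import base64
import os
from cryptography.hazmat.primitives import padding
from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

def encrypt_roll_number(roll_no: str, key: bytes, iv: bytes) -> str:
    """Return the AES-128-CBC ciphertext of roll_no, Base64-encoded for QR embedding."""
    padder = padding.PKCS7(128).padder()                  # pad to the 16-byte block size
    padded = padder.update(roll_no.encode()) + padder.finalize()
    encryptor = Cipher(algorithms.AES(key), modes.CBC(iv)).encryptor()
    ciphertext = encryptor.update(padded) + encryptor.finalize()
    return base64.b64encode(iv + ciphertext).decode()     # prepend IV so verification can decrypt

key = os.urandom(16)                                      # 128-bit key -> 10 AES rounds
encrypted_string = encrypt_roll_number("19CS1234", key, os.urandom(16))
```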
3.4 QR Encoding
A QR code is square in shape, which makes it easy for QR code scanners to detect and enhances its efficiency for holding and transmitting data. Algorithm 3 demonstrates QR encoding. Seven important components make up its puzzle-like appearance:
• Positioning markings: Indicate the orientation in which the code is printed.
• Alignment markings: This feature facilitates the alignment of larger QR codes.
• Timing pattern: Lines that let a scanner determine the size of the data matrix.
• Version information: Identifies the QR code version used; versions 1–7 are the most commonly utilized of the 40 different QR code versions.
• Format information: Formatting patterns contain information on error tolerance and data mask patterns, making the code easier to scan.
• Data and error correction keys: The actual data is stored here.
• Quiet zone: Space that allows scanning algorithms to distinguish between the QR code and the materials around it.
Algorithm 3 QR Encoding
Input: Encrypted String
Output: Encoded QR
1. Get the string
(a) Resize image
(b) Detect cigarette
(c) Convert image into gray scale
(d) Detect face
(e) If both cigarette and face are detected, find the relative distance.
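To make the QR creation step of Sect. 3.1 concrete, the following is a minimal Python sketch using the qrcode package; the error-correction level, box size, and output file name are assumptions chosen for illustration.

```python
# Sketch of QR creation: encode the encrypted roll-number string as a QR image that
# can later be embedded into the certificate template.
import qrcode

def make_qr(encrypted_string: str, out_path: str = "roll_qr.png") -> None:
    qr = qrcode.QRCode(
        version=None,                                    # let the library pick the smallest version
        error_correction=qrcode.constants.ERROR_CORRECT_M,
        box_size=10,
        border=4,                                        # quiet zone around the code
    )
    qr.add_data(encrypted_string)
    qr.make(fit=True)
    qr.make_image(fill_color="black", back_color="white").save(out_path)
```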
3.5 Certificate Verification
Any user can now verify the digital certificate. Verification involves decoding the QR code from the uploaded certificate, decrypting the data, and checking the records in the database. The result is finalized with image comparison.
QR Decoding The QR code is decoded using JavaScript and yields the encrypted string, as shown in Algorithm 4.
Algorithm 4 QR Decoding
Input: QR
Output: Encrypted String
1. Function decodeImageFromBase64(data, callback)
// set callback
qrcode.callback = callback;
// start decoding
qrcode.decode(data).
Decrypting the String Decryption is the process of restoring encrypted data to its original state; it is the reverse of the encryption process. Decryption requires a secret key or password, so only an authorized user can decode the encrypted data. Privacy is one of the main reasons encryption is used. As data travels over the Internet, it is vital to prevent access by unauthorized groups or individuals, so the data is encrypted to prevent data loss and theft. Encrypted data includes text files, photographs, e-mail communications, user data, and directories, to name a few examples. To access the encrypted material, the receiver is prompted for a password or key. Decryption extracts and converts the scrambled data back into words and pictures that can be examined by a reader and processed by the device. Decryption can be done manually or automatically, and a set of keys or passwords might be used. Algorithm 5 shows the decryption scheme.
Algorithm 5 Decrypting String
Input: Encrypted String
Output: Decrypted String
1. decryption = openssl_decrypt(encryption, cipher, decryption_key, options, decryption_iv).
Find and Verify the Record Using Image Comparison With the decrypted string, we can check whether a record is available in the database and verify the mark sheet using image comparison. The image comparison converts both images into Base64 format and compares them, as shown in Algorithm 6.
Algorithm 6 Image Comparison
Input: Image
Output: Whether the images are the same or not
1. If (aBase64 === bBase64)
// they are identical
window.alert("both are same");
else
// they are not identical
window.alert("both are not same").
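A hedged Python sketch of the verification path (Algorithms 5 and 6) is given below; it reverses the encryption sketch shown earlier and then performs the Base64 image comparison. The 16-byte IV prefix convention follows that earlier sketch and is an assumption, not a detail of the original system.

```python
# Sketch of verification: decrypt the string recovered from the QR code, then compare
# the uploaded certificate image with the stored copy byte-for-byte via Base64.
import base64
from cryptography.hazmat.primitives import padding
from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

def decrypt_string(encoded: str, key: bytes) -> str:
    """Reverse of the encryption sketch: split off the IV, decrypt, and unpad."""
    raw = base64.b64decode(encoded)
    iv, ciphertext = raw[:16], raw[16:]
    decryptor = Cipher(algorithms.AES(key), modes.CBC(iv)).decryptor()
    padded = decryptor.update(ciphertext) + decryptor.finalize()
    unpadder = padding.PKCS7(128).unpadder()
    return (unpadder.update(padded) + unpadder.finalize()).decode()

def images_identical(path_a: str, path_b: str) -> bool:
    """Algorithm 6 as a comparison of the Base64 forms of the two mark-sheet images."""
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        return base64.b64encode(fa.read()) == base64.b64encode(fb.read())
```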
3.6 Store the Logs in Blockchain
The verify logs and upload logs are fetched from the database and then stored on the Ethereum blockchain. This is done using MetaMask, Ganache, and Truffle. Truffle creates and compiles the Solidity contracts, MetaMask connects the local information to Ganache, and the logs are then stored in Ganache as a blockchain transaction.
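As an illustration of this step, the following Python sketch pushes a log entry to a local Ganache chain through web3.py; the RPC port is Ganache's default, while the contract address, ABI, and the addLog function name are assumptions standing in for the Truffle-deployed contract.

```python
# Sketch of storing a verification/upload log on the local Ganache chain with web3.py.
# The contract address, ABI and the addLog() function are assumed placeholders for the
# contract produced by `truffle migrate`; the endpoint is the Ganache GUI default.
from web3 import Web3

def store_log(entry: str, contract_address: str, contract_abi: list) -> str:
    """Send one log entry as a transaction and return its hash for auditing."""
    w3 = Web3(Web3.HTTPProvider("http://127.0.0.1:7545"))   # Ganache default RPC endpoint
    log_contract = w3.eth.contract(address=contract_address, abi=contract_abi)
    tx_hash = log_contract.functions.addLog(entry).transact(
        {"from": w3.eth.accounts[0]}                         # first unlocked Ganache account
    )
    w3.eth.wait_for_transaction_receipt(tx_hash)
    return tx_hash.hex()
```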
4 Experimental Results
A computer with the Ubuntu operating system and 4 GB of RAM, running the Xampp, Ganache, MetaMask, Node.js, and Truffle tools, supports our system in carrying out the process of certificate creation and verification using blockchain. Twilio is a cloud-based platform-as-a-service that permits software developers to make and receive calls, send and receive messages, and accomplish similar communication tasks via its web service APIs. Truffle is an Ethereum programming environment, testing infrastructure, and asset pipeline aimed at making development easier for Ethereum developers. Built-in smart contract compilation, linking, deployment, and binary management are all available with Truffle, as well as automated contract testing using Mocha and Chai. It also provides a scriptable deployment and migrations framework, a configurable build pipeline with support for bespoke build procedures, network management for deployment to a large number of public and private networks, an external script runner that executes scripts within the Truffle framework, an interactive console for direct contract communication, and real-time asset rebuilding throughout development. Ganache is a private blockchain that makes it simple to create Ethereum and Corda distributed applications, and it is used for developing, deploying, and testing the applications in a secure and predictable environment. Ganache UI is a desktop program that is compatible with both Ethereum and Corda. Additionally, Ganache CLI, a command-line version of Ganache for Ethereum, is available. MetaMask is a convenient and efficient way to connect to blockchain-based services securely.
4.1 Evaluation Metrics
The system's performance must be assessed against a set of criteria; these criteria form the basis of the performance of any system. Such parameters are referred to as performance metrics. In order to evaluate the webserver, the following performance measures are used:
Throughput Throughput refers to how much data can be processed in a given time frame. The operations involve enrolling a new candidate's data or updating and retrieving a candidate's data. Throughput can be measured either in bits per second or operations per second.
Throughput = Total operations / Total time in seconds (1)
Figure 3 shows the time taken to process requests when the number of requests varies, with a varying number of peers handling the requests in the system. Based on the processing times in Fig. 3, the throughput of the system with different numbers of peers is graphed in Fig. 4.
Delay Delay is the time between when the request is submitted and when the response is received.
Delay (D) = Time when response received − Time when request submitted (2)
The average delay of multiple requests is shown in the following graph, where the Effective Parallel Request (EPR) is defined as the ratio of the number of parallel requests
Fig. 3 Average time taken to process requests
Fig. 4 No. of peers versus throughput
received at an instance to the number of peers handling the request.
Effective Parallel Request (EPR) = Number of requests at an instance / Number of peers handling the request (3)
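A short worked example of Eqs. (1)–(3) is given below; the sample numbers are illustrative only.

```python
# Worked example of the evaluation metrics: throughput (Eq. 1), delay (Eq. 2) and
# EPR (Eq. 3), with a decimal EPR rounded up to the next natural number.
import math

def throughput(total_operations: int, total_time_s: float) -> float:
    return total_operations / total_time_s            # operations per second, Eq. (1)

def delay(response_time: float, request_time: float) -> float:
    return response_time - request_time               # Eq. (2)

def effective_parallel_requests(parallel_requests: int, peers: int) -> int:
    return math.ceil(parallel_requests / peers)       # Eq. (3), rounded up

# e.g. 500 operations in 25 s -> 20 ops/s; 7 parallel requests on 4 peers -> EPR = 2
```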
When the value of EPR is a decimal, it is rounded up to the next natural number. For each successful transaction, a block is added to the ledger; therefore, the ledger height is directly proportional to the number of transactions successfully submitted to the network. Also, the greater the ledger size, the greater the latency value, since the system needs to process a larger number of blocks. The system is evaluated for various ledger heights and levels of EPR, as shown in Fig. 5. The generated mark sheet and the mark sheet comparison are shown in Figs. 6 and 7. The database is queried for both the verify and upload logs, which are then stored on the Ethereum blockchain. This is accomplished by combining MetaMask, Ganache, and Truffle. Truffle creates and compiles the Solidity contracts. MetaMask is a tool that connects the
Fig. 5 Ledger heights and levels of EPR
Fig. 6 Generated marksheet
Fig. 7 Marksheet comparison
local information to Ganache. The logs are now stored as blockchain transactions in Ganache, as shown in Fig. 8. The log stored in the blockchain is shown in Fig. 9.
5 Conclusion and Future Work
This project was successful in creating and verifying digital certificates. Thus, the proposed solution can help anyone who wants to obtain a certificate without going to the university, or who wants to verify a certificate. In conclusion, academic institutions can collaborate with employers and publish credentials on web applications to eradicate
Fig. 8 Ganache transactions
Fig. 9 Log stored in blockchain
fake educational certificates. The system is extremely flexible, allowing it to be expanded to include any sort of record, depending on the needs of different businesses. Additional improvements to our technology would involve more automation
to completely eliminate manual effort, and full integration with academic institutions' information systems to require minimal user engagement.
References
1. Li H, Han D (2019) Blockchain-based educational records secure storage and sharing scheme. IEEE Access 7:179273–179289
2. Alrawais A, Alhothaily A, Cheng X, Hu C, Yu J (2018) A certificate validation system in public key infrastructure. IEEE Trans Veh Technol 67:5399–5408
3. Zhu W-T, Lin J (2016) Generating correlated digital certificates: framework and applications. IEEE Trans Inf Forensics Secur 11:1117–1127
4. Turkanović M, Hölbl M, Košič K, Heričko M, Kamišalić A (2018) A blockchain-based higher education credit platform. IEEE Access 6:5112–5127
5. Karczewicz M, Gabbouj M (1998) Image modelling with application to image processing. IEEE Trans Image Process, pp 912–917
6. Ehrhardt MJ, Arridge SR (2014) Vector-valued image processing by parallel level sets. IEEE Trans Image Process 23:9–18
7. Harn L, Ren J (2011) Generalized digital certificate for user authentication and key establishment for secure communications. IEEE Trans Wirel Commun 10:2372–2379
8. Lu S, Li L, Tan CL (2008) Document image retrieval. IEEE Trans Pattern Anal Mach Intell, pp 1913–1918
9. Imam IT, Arafat Y, Alam KS, Shahriyar SA (2021) DOC-BLOCK: a blockchain based authentication system for digital documents. In: Proceedings of the third international conference on intelligent communication technologies and virtual mobile networks (ICICV), pp 1262–1267
10. Gaikwad H, Souza N, Gupta R, Tripathy AK (2021) A blockchain-based verification system for academic certificates. In: 2021 international conference on system, computation, automation and networking (ICSCAN)
11. Rasool S, Saleem A, Iqbal M, Dagiuklas T, Mumtaz S, ul Qayyum Z (2020) Docschain: blockchain-based IoT solution for verification of degree documents. IEEE Trans Comput Soc Syst 7
12. Kanan T, Kanan T, Al-Lahham M (2019) SmartCert blockchain imperative for educational certificates. In: 2019 IEEE Jordan international conference on electrical engineering and information technology, pp 629–633
13. Afrianto I, Heryanto Y (2020) Design and implementation of work training certificate verification based on public blockchain platform. In: 2020 fifth international conference on informatics and computing (ICIC)
14. Kim TH, Kumar G, Saha R, Thomas R, Buchanan WJ, Alazab M (2020) A privacy preserving distributed ledger framework for global human resource record management. IEEE Access 8:96455–96467
15. Yang W, Aghasian E, Garg S, Herber D, Disiuta L, Kang B (2019) A survey on blockchain-based internet service architecture. IEEE Access 7:75845–75872
IoT and Machine Learning Algorithm in Smart Agriculture A. Revathi and S. Poonguzhali
Abstract Internet of Things (IoT) refers to the interconnection of physical objects and their control using embedded technologies such as sensors and software that transfer data via the Internet. These data are processed using various decision-making algorithms. IoT devices and emerging technologies have been implemented in various sectors, and a major transformation has occurred in the field of agriculture. In agriculture, farmers face numerous challenges which could be resolved using IoT technologies. IoT-enabled smart farming results in improved crop productivity in the agricultural field. This study explores the IoT architecture and components, IoT technologies and sensors, various applications of agricultural IoT, and its benefits. This study also proposes that the use of a Remote Monitoring System (RMS) in agriculture for collecting data such as environmental temperature, environmental humidity, soil moisture, and soil pH using different sensors, along with a decision tree algorithm, improves decision making, resulting in a high yield of crops.
Keywords IoT · Agriculture · IoT technologies · Sensors · Decision tree
A. Revathi (B) Department of Computer Science, VISTAS, Chennai, India, e-mail: [email protected]
S. Poonguzhali Department of Computer Applications, VISTAS, Chennai, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 N. Subhashini et al. (eds.), Futuristic Communication and Network Technologies, Lecture Notes in Electrical Engineering 966, https://doi.org/10.1007/978-981-19-8338-2_29
1 Introduction
IoT is the technology that has created a revolution in the world by connecting devices in a smart way, wherein users can monitor and control them in a significant manner. IoT uses existing and trending technologies for networking, data sensing, and robotics. This technology has applications in many fields such as agriculture, smart education, and the e-health sector [1, 2]. Agriculture has an essential role in the production of essential food crops. Agriculture is not only about food but also dairy, forestry, cultivation, poultry, bee keeping, etc. Agriculture includes the production, promotion, filtering,
and sales of agricultural products and provides better employment opportunities to many people. Nowadays, precision agriculture is gaining more popularity; its main goal is availability to all common people at low cost with maximum crop productivity, while also helping to protect the environment [3]. The various devices used in precision agriculture are used for monitoring and analyzing different parameters like soil temperature, moisture, humidity, etc. for increasing productivity and reducing crop damage. These devices have interoperability so that every device is connected to the others and functions accordingly [4]. At present, IoT-enabled technology in agriculture has developed to such an extent that farmers use drones [5] and livestock monitoring [6] significantly and efficiently for better farming. Farmers can manage any upcoming challenges in the agricultural environment using smart IoT technology. This paper is divided into seven sections. The first section introduces a brief overview of IoT, the second section describes the related literature of this paper, and the third section explains the technical terms used in IoT. The fourth section provides a detailed discussion of agricultural IoT, and the fifth section describes the benefits of IoT. The sixth section gives the comparison of literature for arriving at the best IoT and machine learning algorithm, and finally, the seventh section concludes this survey. This paper identifies the four parameters of environmental temperature, environmental humidity, soil moisture, and soil pH as the most influential inputs for the decision tree algorithm to yield higher accuracy in predicting crop production, which in turn increases the crop yield.
2 Literature Survey
Patel et al. [7] have discussed the IoT, its architectures, characteristics, and applications. The authors have explained the different layers in the IoT architecture, such as the application layer, service management layer, sensor layer, and network layer, along with the need for future development in IoT architectures. They concluded by stating that IoT is a technology which can create a revolution and needs development in areas like privacy, security, and its availability to people all over the world. Vinayak et al. [8] have explained how IoT plays a major role in the field of agriculture using smaller and customized sensors, and provided key advantages in enhancing farming techniques using IoT technology. The use of various sensors like RFID chips to detect diseases in plants to prevent loss of crops, and the monitoring of factors like soil temperature, moisture, and pH level, can help farmers produce better results. The authors conclude that the use of IoT technology can help in minimizing human effort and growing the market with a single touch from anywhere. Alreshidi et al. [9] presented an overview of existing AI/IoT technologies used for smart sustainable agriculture and identified the architecture to further improve the existing platforms. Sensing layers hosted various types of IoT devices that were used for identification and collection of real-time data, and the work also categorized the main
domains of agriculture. The authors finally concluded by stating that the proposed system improved the overall architecture of the existing smart sustainable agriculture platform to increase the yield of the crops. Ratnaparkhi et al. [10] reviewed sensors, their types, applications, challenges, and uses in agriculture along with their pros and cons. According to the authors, the main goal of these sensors is to identify and monitor the properties of the surrounding environment and soil. IoT-enabled sensors play an essential role in smart agriculture by determining soil properties and helping farmers with real-time data. Sensors attached to IoT devices can be used in many applications like precision farming, smart greenhouses, and livestock monitoring. These sensors can improve the productivity of farming. The authors finally concluded that world hunger can be brought to an end if these technologies are implemented in the right way. Bhuvaneshwari et al. [5] described a method using a drone fitted with various kinds of sensors and a Raspberry Pi 3B module for monitoring and increasing the production of crops with advanced solutions. An MCP3008 ADC converted the analog sensor data to digital data for the Raspberry Pi 3B module. The devices used USB, Bluetooth, and Wi-Fi modules for transferring the data, and power consumption was reduced by using solar panels. The system mainly used a support vector machine for classification and achieved a better result. Mishra et al. [11] described an IoT-based system which collects data using various devices and sensors and transfers the data through the Internet. The transferred data is then analyzed and visualized using an IoT application. They also related the use of spectrum with IoT to achieve better results and gave an overview of LPWA technology, which was used to reduce the power consumption of IoT devices and prolong their durability. The authors finally concluded that IoT-based agriculture can solve many problems, such as the impacts of weather and climate on crops, and showed the advancements enabled by LPWA technology. Shali et al. [6] presented the importance of various segments of IoT in the cultivation of crops. They also described the advancements in networks used in IoT-based horticulture, including its layers and designs. Various networks and protocols that are used for purposes like monitoring the climatic conditions, pests, plant diseases, and livestock have been discussed. They finally concluded by stating that using IoT features like sensors helps farmers and ranchers greatly and makes horticulture applications more feasible. Patil et al. [12] monitored the environment around the crop from afar by using a Remote Monitoring System (RMS) along with Internet and wireless communications. The alert system informs users about the environment by sending messages and advice on weather patterns. The device is integrated with ICT, which is an essential system in the agricultural domain, for monitoring weather patterns and soil conditions. They finally concluded that providing accurate information on weather data is a challenging task and that it will help farmers profit by protecting and increasing the yield of the crops. Khanna et al. [13] provided a clear review of important applications in the IoT field, mainly precision agriculture. It is stated clearly that IoT contributed essentially to precision agriculture to further improve the techniques and the yield of the
crops. They gave a detailed overview of communication technologies in IoT and their applications in real life. While using IoT technology, many problems were faced, like confidentiality of the data and its security, which should be solved to perfect the modern technologies. They finally concluded that IoT enables farmers to control various things in agricultural land from a remote place, and that the current technology is not static and could change in the near future. Muangprathub et al. [14] developed a watering system for agricultural crops using wireless sensors for improving crop growth and reducing overall costs. The proposed system was used to control the irrigation system using node sensors in the agricultural field, and the data can be managed through a web-based application or smartphone. This system successfully maintained the moisture content of the soil and showed a good result in analyzing the fetched data. They finally concluded that the proposed system showed high potential in increasing agricultural productivity and reducing costs.
3 Technical Terms in IoT
3.1 IoT Definition
The term IoT is defined as the connecting of various devices together using the Internet, where each device has a unique Internet Protocol (IP) address. Internet of Things refers to objects which are recognizable and controllable through the Internet [7]. IoT enables each and every device to stay connected anytime and anywhere. An IoT system enables users to obtain better automation, integration, and analysis within a system.
3.2 Characteristics of IoT
The various characteristics of IoT are as follows:
Connectivity: Connectivity is an important aspect of the IoT infrastructure. IoT devices should always be connected with each other anytime, anywhere, without any interference.
Scalability: The system should be capable of managing an enormous number of devices over a vast area and should always stay connected and send and receive data efficiently.
Dynamic changes: The devices should be able to adapt themselves to their changing environment, and the number of devices and their locations can change dynamically.
Heterogeneity and safety: The devices may be composed of many different types of hardware platforms and networks. Therefore, each device should be able to interact with other devices through different platforms using various networks.
Safety should always be of the utmost priority to prevent data theft or unwanted access to the devices. Hence, equipment safety is a critical aspect of an IoT network [7].
3.3 IoT Architecture
IoT has a wide range of applications and works as per its design. It does not have a standard definition for its architecture, which depends upon its implementation in various sectors. The basic architecture of IoT is as follows:
Sensor layer: In this layer, the devices are connected with sensors or actuators. These sensors/actuators take input, process the data, and send it over networks such as Ethernet or Wi-Fi connections.
Network layer: In this layer, various network gateways and the Data Acquisition System (DAS) are present. Sensors send the captured data via a wired or wireless network transport medium. The DAS collects the data sent by sensors and converts analog data to digital data. Using advanced gateways, many functionalities like malware protection are performed.
Data processing layer: In this layer, the main processing unit of the IoT ecosystem is present. The data received through the network is analyzed and processed and then sent to the data cloud.
Application layer: In this final layer, the received data is stored in data centers or the data cloud, where the data is managed and is ready to use by the end-users. The end-users can use these data for many applications like health care, agriculture, etc. [7].
3.4 IoT Components
An IoT system consists of sensors or devices which send data to the cloud via a network. These sensors collect data on physical properties from their environment, like temperature readings, captured images, wind speed measurements, etc. [11]. According to Ratnaparkhi et al. [10], the goal of these sensors is the monitoring, control, diagnosis, and analysis of the properties of the surrounding environment while adapting to changing climate and weather conditions.
4 Agricultural IoT
IoT can be used in many sectors to improve current techniques by connecting things to each other and to the Internet. One of the important sectors is agriculture, which is the source of food for people around the world. As described by Alreshidi et al. [9], agricultural products are the final outcome of the cultivation of crops and livestock, from which humans obtain food and clothing. A few examples of
agricultural products are foods and grains, fibers, raw materials, and fuels. The most important form is food, and rice is mainly produced in India, which is the second largest producer in the world. The use of IoT in agriculture provides farmers the ability to monitor the agricultural field from a remote place, which saves a great amount of time, and farmers can even control or automate equipment such as motors in the field. Using this technology, the output can be maximized with high quality and low usage of capital and resources, which saves both time and money for farmers. It can also be used to manage the harvesting and storage of crops, to monitor livestock, and to detect pests and plant diseases at an early stage. In IoT, various devices such as sensors can be used in the farm to determine the soil properties and environmental conditions accurately [8]. As described by Candemir et al. [3], farmers can know the type of seeds suitable for a particular field by determining the air temperature, NPK content, CO2 content, etc., along with the types of fertilizers and the type of irrigation that can be used in the field. IoT operates in various domains of farming to increase time efficiency, water management, disease detection, soil management, crop monitoring, control of insecticides and pesticides, etc. It also reduces human effort, simplifies farming techniques, and helps to improve smart farming.
4.1 Applications of IoT in Agriculture
Monitoring the climatic conditions: Climatic change is one of the major factors that affects crops. Therefore, climatic conditions should be monitored on a regular basis to prevent loss of crops due to unpredictable weather. IoT devices attached with sensors gather data about the surroundings in the agricultural field so that precautions can be taken accordingly. These sensors recognize the climatic conditions, and the data is transferred to the cloud, where the user can use the data and take the necessary precautions [6]. The sensors attached to IoT devices detect real-time weather conditions like humidity, rainfall, and temperature, which are essential for crop production. Sensors are able to sense any difference in the climatic conditions that could reduce production. An alert is sent to the server regarding a change in climate, which helps to eliminate the need for physical presence. This leads to higher yields.
Monitoring the pests and crop diseases: The presence of pests in the agricultural field damages the crops, leading to a decrease in the quality as well as the quantity of output. Crop disease also damages the plants drastically. Therefore, pests and crop diseases should be identified at an early stage to prevent the further spread of diseases to other crops. Image detection using IoT technology can be used for the detection of these pests and diseases [6]. IoT technologies such as sensors can be placed in the agricultural field and used to sense data like soil moisture and temperature. Healthy crops are essential for human consumption.
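As a hedged illustration of this monitoring-and-alert flow, the following Python sketch posts sensor readings to a cloud endpoint and raises an alert when a reading leaves a safe range; the endpoint URL, field name, and thresholds are assumptions made for the example.

```python
# Illustrative sketch of the climate-monitoring flow: a field node posts its sensor
# readings to a (hypothetical) cloud endpoint and raises an alert when a reading
# leaves the safe range, so the farmer need not be physically present.
import requests

CLOUD_API = "https://farm-cloud.example/api"        # placeholder cloud endpoint

def report_climate(temperature_c: float, humidity_pct: float, rainfall_mm: float) -> None:
    reading = {"temp": temperature_c, "humidity": humidity_pct, "rain": rainfall_mm}
    requests.post(f"{CLOUD_API}/readings", json=reading, timeout=10)
    if temperature_c > 40.0 or humidity_pct < 20.0:  # example alert thresholds
        requests.post(
            f"{CLOUD_API}/alerts",
            json={"field": "field-1", "message": "Climate outside safe range"},
            timeout=10,
        )
```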
Managing the water resource: For the healthy growth of crops, water should be used adequately. Excess or shortage of water interferes with the growth of the plants and may even become a cause of disease. IoT sensors can be used here to determine the level of moisture in the soil and provide water to the crops accordingly [6]. The introduction of high-tech practices like smart irrigation, precision farming, crop water management, and real-time water metering provides better water management solutions by partially or fully automating irrigation and optimizing the use of human labor.
Livestock Monitoring: The use of IoT technology on domesticated animals enables farmers to know the location and even the health condition of the animals. Various animal monitoring sensors are fixed to the animals to record their activities, which reduces labor and cost [6]. The collected information helps farmers to identify sick animals and separate them from the others, thus reducing the spread of diseases to other animals.
Crop management: The management of crops is characterized by ecological, economic, and social aspects. In crop management, there is a major difference between indoor and outdoor crops. For instance, all the factors in an indoor system are under control, whereas in an outdoor system they are not. We can use IoT technology in the indoor system to manage the crops as well as factors like proper lighting, fans, water, etc. [15]. IoT devices should be placed in the field to collect data specific to crop farming, from temperature to leaf water potential and overall crop health. These collected data and information are used for improved farming practices. Sensors can monitor crop growth effectively to prevent diseases or infestations that could harm the yield.
Drones: In many places, the usage of drone technology for monitoring crops has become an essential part of large-scale precision agriculture [5]. A camera attached to the drone records images of the field for observation in real time. This enables farmers to observe the fields from a remote place and ensures safety by providing surveillance of the fields at all times. Ground-based and aerial-based drones are helpful in agriculture for irrigation, crop health assessment, crop monitoring, planting, crop spraying, and soil and field analysis. Various benefits like crop health imaging, saving time, integrated GIS mapping, and the potential to increase yields are achieved using drones. Drone technology gives a high-tech makeover to the agriculture industry by using real-time data.
Precision farming: Precision farming is the practice of making accurate decisions based on gathered data in order to get maximum output. It is one of the most effective technologies in agriculture for better results. It uses sensors to collect data like soil temperature, moisture, etc. [3]. Precision farming can be used to analyze the condition of the soil and other parameters to increase the efficiency of the operation.
Smart Green house: Greenhouse farming can be used to enhance the yield of fruits, vegetables, crops, etc. A greenhouse controls various parameters through manual intervention or a proportional control mechanism. A smart greenhouse can be made with the help of IoT. This model records as well as controls the climate and eliminates the need for manual intervention. A greenhouse can be automated using IoT technology. Using this approach, plants are provided with the right amount of water, temperature, and even proper lighting.
A remote controller can be used to control the
Fig. 1 Applications of agricultural IoT
devices from anywhere [10]. To make greenhouses more advanced, IoT has enabled weather stations which can automatically adjust the climate conditions according to a specific set of instructions. Finally, all the applications of IoT in agriculture are shown in Fig. 1.
4.2 Technologies Used in IoT for Agriculture
There are various technologies in IoT that can be used in many fields. Some examples are: (1) Cloud versus edge computing in agriculture: Edge computing is a technology which is used to process real-time data, whereas cloud computing is used for processing data which is not time-driven. In cloud-based programming, measuring and recovering of data are done in a more precise manner, which is useful for better smart farming. (2) Data analytics and machine learning: Technologies like machine learning can be used in agriculture in many ways, such as the detection of pests in crops and the prediction of diseases in plants. IoT combined with machine learning provides many good ways to solve various kinds of problems in agriculture [6]. Today's agriculture mainly uses sophisticated technologies such as temperature and moisture sensors, robots, aerial images, and GPS technology. These advanced devices, precision agriculture, and robotic systems allow businesses to be more efficient, profitable, safer, and more environmentally friendly. Radio-frequency identification (RFID) is a wireless communication technology composed of various RFID tags and readers and is used to identify and communicate with items and people. Long-term evolution (LTE) refers to a standard wireless communication protocol
which is used for transferring data at high speed based on GSM. Near-field communication (NFC) is a similar version of RFID, but the difference is that it works over a very short range of about 20 cm. Ultra-wide band (UWB) is a communication technology used to improve communications in areas having short-range coverage. This technology is very similar to NFC, except that it uses high bandwidth to interconnect sensors and is capable of 500 MHz of bandwidth [13]. According to Madushanki et al. [16], in recent years various IoT technologies have made significant improvements, among which Wi-Fi technology stands at the top with 30% of usage, while mobile technology is in second place with 20% of usage and radio communication is in last place with 2% of usage.
IoT-based devices in agriculture. IoT devices have been used in various sectors like industry and agriculture. The use of IoT devices like LoRa provides many advantages to users and is available at a very low cost. These devices can even detect the quality of food [17]. They can enable smart agriculture, through which farmers can gain benefits like high-yield crops, a precise watering system, and even automation of everything in the field. Long range (LoRa), a wireless platform for IoT, is a modulation technique based on spread spectrum which is mainly used for long-range communications with low power consumption. LoRa is ideal for places where there is a need for physical penetration through structures along with low power consumption. LoRa falls under the LPWAN and non-cellular category. Using IoT, agriculture can be intellectualized and the work in the farmland can be automated easily. There are various IoT technologies which can be used in agriculture, but each one of them serves a specific purpose. For instance, a GPRS system can be used to improve the irrigation system but requires a high amount of energy, and Wi-Fi can be used to save energy but the coverage area becomes smaller. LoRa is a beneficial system which can cover a large area while consuming less energy [18]. For example, Arduino, a single-board microcontroller, can be used in the agricultural field to control the irrigation system, motors, etc. Another example is the Raspberry Pi, a tiny monitoring device that can be used on the land to sense various parameters like soil temperature, moisture, nitrite content, etc. The use of these IoT devices can help farmers save time as well as money while gaining maximized production without any loss of crops.
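The following Python sketch illustrates the Raspberry Pi sensing role described above, reading a soil-moisture probe through an MCP3008 ADC with the gpiozero library; the ADC channel, sampling interval, and percentage scaling are assumptions chosen for illustration.

```python
# Illustrative sketch of field sensing on a Raspberry Pi: an analog soil-moisture
# probe is read through an MCP3008 ADC (SPI) using gpiozero.
from time import sleep
from gpiozero import MCP3008

soil_probe = MCP3008(channel=0)               # probe wired to ADC channel 0 (assumed)

def read_soil_moisture_pct() -> float:
    """Return the probe reading scaled to an approximate 0-100 % moisture value."""
    return round(soil_probe.value * 100, 1)   # .value is normalised to the range 0.0-1.0

if __name__ == "__main__":
    while True:
        print("Soil moisture:", read_soil_moisture_pct(), "%")
        sleep(60)                             # one reading per minute
```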
4.3 Types of Sensors in Agriculture
According to Ratnaparkhi et al. [10], there are different kinds of sensors used in agriculture:
Optical sensors: This type of sensor makes use of light to determine soil properties and is mainly installed on devices like satellites and drones.
Location sensors: This type of sensor is attached to tracking devices to accurately map farms, mainly farms with irregular landscapes.
Electrochemical sensors: This type of sensor is used to analyze the pH level of the soil by measuring the voltage between two points using electrodes. This sensor uses the contact method for analysis.
Acoustic sensors: This type of sensor is used to detect sound in its surroundings. The common use of this sensor is to detect pests in the field. The prevention of these pests can avoid diseases in plants.
4.4 Agriculture Data Analysis
Knowledge management system: It is used to store and manage the data while improving understanding and process alignment. A knowledge base is used by farmers for storing a huge amount of data and is mainly useful for providing accurate knowledge. Knowledge acquisition gathers and analyses the data using existing methods, where the data is obtained through agricultural universities and agricultural agencies along with other API services. Knowledge flow refers to the representation of the flow of the data in an efficient manner, providing optimal information to adapt to varying climatic conditions [19].
Data pre-processing: Data pre-processing is an essential step to improve the efficiency and accuracy of data mining, since the raw data may be inconsistent. The data is pre-processed as follows: (i) cleaning the data to remove irrelevant information, (ii) integrating the cleaned data, and (iii) transforming the data into the required form.
Data reduction and modeling: In this process, the data is encoded into a smaller representation while preserving the original data and producing the same analysis result. It uses numerosity reduction, including histograms, for the storage of the reduced data. In data modeling, knowledge is extracted from the previous data using analysis tools like classification, association, etc. to identify data patterns [14].
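A minimal pandas sketch of the pre-processing steps listed above (cleaning, integration, transformation) is shown below; the file and column names are assumptions made for the example.

```python
# Sketch of the three pre-processing steps: (i) clean, (ii) integrate, (iii) transform.
import pandas as pd

def preprocess(sensor_csv: str, weather_csv: str) -> pd.DataFrame:
    sensors = pd.read_csv(sensor_csv)                        # field sensor log (assumed columns)
    weather = pd.read_csv(weather_csv)                       # agency weather data (assumed columns)
    sensors = sensors.dropna().drop_duplicates()             # (i) remove missing/irrelevant rows
    merged = sensors.merge(weather, on="date", how="inner")  # (ii) integrate the two sources
    col = merged["soil_moisture"]
    merged["soil_moisture"] = (col - col.min()) / (col.max() - col.min())  # (iii) rescale to 0-1
    return merged
```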
4.5 Agricultural Practices
Soil mapping: This method is used to sow crop varieties in a particular field that match the soil properly. In this way, farmers can sow seeds in a smarter way by growing different crops together.
Phenotyping: In this process, plant genomics is linked with agronomy and eco-physiology. Crop breeding has progressed using various genetic tools in the recent decade, but even though it helps in modification, it has several disadvantages like lower pathogen resistance, lower grain weight, etc.
Crop disease and pest management: Crops are affected by poor maintenance and an increase in the number of pests. To obtain healthy and sustainable crop growth, disease and pest management is a necessary step to improve the
overall production. It can be done by using technologies like UAVs to spray pesticides and even to monitor crops. The use of various vehicles fitted with GIS and GPS facilities improves precision farming by identifying and sending the precise location of the devices. These technologies can be implemented on various platforms like airplanes, satellites, and even balloons as well as UAVs.
Sigfox and Hydroponics: Sigfox is a network provider for objects which consumes less energy as per the requirements. This service is based on ultra-narrow-band technology, i.e., using a narrow spectrum while encoding the radio waves. Hydroponics refers to growing plants in a nutrient solution without soil, which lowers water consumption and the required space drastically.
5 Benefits of IoT in Agriculture
• IoT can collect a huge amount of data automatically using sensors.
• Water usage can be improved by measuring the soil moisture, and automated watering can be implemented to eliminate the risk of overwatering.
• Pesticide usage is measured to improve the effectiveness of the pesticide strategy.
• IoT can give more profitability and increase the harvest by reducing human labor, and it is able to give accurate planning and predictions in a short time.
• Preventing crop and pest diseases is very essential, and it is made possible by using IoT sensors.
• Monitoring of soil temperature, livestock, climate and weather conditions, and the overall field environment is handled by IoT technologies.
• Greenhouse automation is very essential to enable the smart greenhouse using IoT devices.
• The drone is one of the IoT technologies which can be used for monitoring the field and spraying fertilizers.
• IoT can be used for routine operations through automatic machines and automatic vehicles.
• IoT allows costs to be reduced with less human intervention by using automation technologies [17].
6 Comparison of Literatures for Arriving at Best IoT and Machine Learning Algorithm
Araby et al. [22] have implemented a sensing network to collect data like temperature, humidity, and moisture level; the data was fed to the algorithm using a microcontroller named NodeMCU equipped with an ESP8266 Wi-Fi module, in which a Raspberry Pi 3 was used as a gateway between the perception layer and the application layer. Machine learning methods such as linear regression and neural networks were used to predict the occurrence of disease 8 days in advance, after which a warning message is sent to the
farmer through a GUI. Maneesha et al. [24] proposed a system to detect rice plant diseases using sensors to collect climate data and water data, which is then sent to the ThingSpeak cloud; the stored data is then extracted by a Naive Bayes algorithm for predicting the rice plant diseases. The proposed system achieved an accuracy of 90%. From Table 1, Sangeetha et al. [23] have taken two parameters, temperature and humidity, to predict the level of water; using these parameters, data was fed to the Arduino, and no decision system was used to arrive at an outcome. Reddy et al. [20] have taken three parameters, temperature, humidity, and moisture, to predict the amount of water required using a Raspberry Pi along with a decision tree. Dahane et al. [21] have taken five parameters, soil moisture, temperature, humidity, water flow, and luminous intensity; using these parameters, data can be fed to the Raspberry Pi microprocessor along with LSTM and GRU models, but here the authors used more parameters than the others to get higher accuracy for the required water resource. We can draw the conclusion that using four parameters increases the accuracy of prediction and monitoring of the irrigation system. Also, from Table 1, after comparing all the authors' studies, we can understand that the combination of NodeMCU along with a Raspberry Pi module performed better than using a Raspberry Pi or Arduino alone as the IoT device. The combination provided better accuracy when combined with linear regression and a neural network in predicting plant diseases at an earlier stage. In the field of agriculture, IoT automation is still at a developing stage. Monitoring and prediction using IoT technologies alone do not provide better solutions, as they only send alert messages to farmers and the further steps have to be taken manually. Therefore, if IoT is combined with machine learning (ML) algorithms such as decision trees, neural networks, linear regression, or support vector machines, it can help us to arrive at decisions along with the alert message. The decision drawn helps the farmer to arrive at quick and accurate prevention measures. This paper proposes the combination of IoT and machine learning algorithms to design and develop a prediction model for overcoming the limitations of traditional agriculture. In IoT technology, using a Raspberry Pi along with NodeMCU is the best way to collect the data from the agricultural field, together with a machine learning algorithm such as a decision tree, which gives higher accuracy and the best prevention steps; thus, the method will transform traditional agriculture into smart agriculture.
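As a hedged sketch of the proposed prediction step, the following Python example trains a decision tree on the four parameters identified in this paper; the CSV file, column names, and the crop-health label are assumptions for illustration rather than an actual dataset from this study.

```python
# Sketch of the proposed step: a decision tree trained on environmental temperature,
# environmental humidity, soil moisture and soil pH collected via NodeMCU/Raspberry Pi.
import pandas as pd
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

data = pd.read_csv("field_readings.csv")                 # assumed sensor-log file
X = data[["temperature", "humidity", "soil_moisture", "soil_ph"]]
y = data["crop_health"]                                  # e.g. healthy / at-risk label (assumed)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
model = DecisionTreeClassifier(max_depth=5, random_state=42)
model.fit(X_train, y_train)
print("Accuracy:", accuracy_score(y_test, model.predict(X_test)))
```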
7 Conclusion
IoT-based agriculture is very important to solve the main problems of farmers, like climate and weather monitoring, livestock monitoring, crop disease, and smart greenhouse management. In this paper, IoT and its applications in agriculture have been discussed briefly. This paper discusses in detail the characteristics, components, and technologies of IoT that improve crop productivity with less human intervention. This paper also discusses how to transform traditional agriculture into smart agriculture using IoT. It is proposed that data collection with the four parameters
Table 1 Comparison of IoT and machine learning algorithms

| S. No. | Reference No. | Proposed system | Parameters | Methodology: IoT | Methodology: Algorithm | Result |
|---|---|---|---|---|---|---|
| 1 | [20] | Smart irrigation system | Temperature, humidity, moisture | Raspberry pi | Decision tree | Using this method, the amount of water needed is sent to the farmer through the mail id |
| 2 | [21] | Smart farming system | Soil-moisture, temperature, humidity, water-level, water-flow, luminous intensity | Raspberry pi | Long short-term memory and gated recurrent units | Usage of water resources can be reduced for precision agriculture |
| 3 | [22] | Monitoring system | Temperature, humidity, moisture | NodeMCU and Raspberry pi | Linear regression and neural network | Predicts the disease, and the warning message is sent through a GUI |
| 4 | [23] | Monitoring system | Temperature, humidity, chlorophyll | Arduino | – | The level of water, presence of insects, and the amount of chlorophyll are automatically monitored |
| 5 | [24] | Plant disease prediction | Temperature, humidity, moisture | Raspberry pi | Naïve Bayes | Predicts the paddy disease and provides safety measures to be taken beforehand to avoid infection of plants |
(temperature of environment, humidity of environment, soil moisture, and soil pH) along with NodeMCU and Raspberry Pi IoT devices can be implemented for accurately predicting and monitoring crop productivity. It is also recommended to make use of the decision tree algorithm for increased accuracy in monitoring the health of the crop. As a future enhancement, research can focus on the usage of robotics for improving crop productivity, to overcome the rapidly changing climate and labor shortages. Additionally, research can focus on the usage of IoT along with ML algorithms in reducing the amount of herbicide used by farmers.
References
1. Sahoo J, Barrett K (2021) Internet of things (IoT) application model for smart farming. In: Southeast IEEE conference, pp 1–2
2. Reddy KSP, Mohana Roopa Y, Kovvada Rajeev LN, Nanda NS (2020) IoT based smart agriculture using machine learning. In: Proceedings of the second international conference on inventive research in computing applications (ICIRCA-2020), pp 130–134. ISBN 978-1-7281-5374-2
3. Candemir E, Sağır MM, Tahtalı A, Maral E, Taşkın S (2021) IoT based precision agriculture: Ema farming. In: IEEE conference
4. Mondal MA, Rehena Z (2018) IoT based intelligent agriculture field monitoring system. In: 8th international conference on cloud computing, data science & engineering (Confluence). IEEE, pp 625–629
5. Bhuvaneshwari C, Saranyadevi G, Vani R, Manjunathan A (2017) Development of high yield farming using IoT based UAV. IOP Conf Ser Mater Sci Eng 1055:012007, pp 1–5. https://doi.org/10.1088/1757-899x/1055/1/012007
6. Shali A, Sangeerani Devi A, Kavitha D (2021) Agricultural farming survey using IoT. J Phys Conf Ser 1724:012047, pp 1–5. https://doi.org/10.1088/1742-6596/1724/1/012047
7. Patel KK, Patel SM (2016) Internet of things—IoT: definition, characteristics, architecture, enabling technologies, application & future challenges. Int J Eng Sci Comput 6(5):6122–6131. https://doi.org/10.4010/2016.1482
8. Malavade VN, Akulwar PK (2016) Role of IoT in agriculture. In: National conference on "changing technology and rural development", pp 56–57
9. Alreshidi E (2019) Smart sustainable agriculture (SSA) solution underpinned by internet of things (IoT) and artificial intelligence (AI). Int J Adv Comput Sci Appl (IJACSA) 10(5):93–102
10. Ratnaparkhi S, Khan S, Arya C, Khapre S, Singh P, Diwakar M, Shankar A (2020) Smart agriculture sensors in IoT: a review. Mater Today Proc, pp 1–6
11. Mishra KN, Kumar S, Patel NR (2021) Survey on internet of things and its application in agriculture. J Phys Conf Ser 1714:012025, pp 1–9. https://doi.org/10.1088/1742-6596/1714/1/012025
12. Patil KA, Kale NR (2016) A model for smart agriculture using IoT. In: International conference on global trends in signal processing, information computing and communication. IEEE, pp 543–545
13. Khanna A, Kaur S (2019) Evolution of internet of things (IoT) and its significant impact in the field of precision agriculture. Comput Electron Agric, pp 218–231
14. Muangprathub J, Boonnam N, Kajornkasirat S, Lekbangpong N, Wanichsombat A, Nillaor P (2019) IoT and agriculture data analysis for smart farm. Comput Electron Agric, pp 467–474
15. Vitali G, Francia M, Golfarelli M, Canavari M (2021) Crop management with the IoT: an interdisciplinary survey. J Agronomy 11(181):1–18. https://doi.org/10.3390/agronomy11010181
Laminar Ice Flow Model-Based Thickness and Volume Estimation of Karakoram Glaciers S. Sivaranjani and M. Geetha Priya
Abstract Glaciers are true indicators of the impact caused by global temperature rise. The challenging, rugged terrain of the Karakoram is the reason for the very limited exploration of glaciers in that region. With the advent of remote-sensing techniques and data products, glacier dynamic parameters such as ice thickness and surface flow velocity in the Karakoram can be explored. The present study involves laminar ice flow model-based estimation of thickness for benchmark glaciers (Hispar, Batura, Baltoro, Biafo, Siachen, Rimo, Mani, Pratikrisht, Kutiah, and Avadyah) of five basins of the Karakoram Range. A digital elevation model, the surface flow velocity of the benchmark glaciers for HY 2017–18, and other parameters have been used as inputs to the model. Siachen Glacier possesses a maximum ice thickness of 895.96 m with an average ice thickness of around 81.75 m. Batura, Biafo, Baltoro, and Rimo glaciers possess maximum ice thickness values greater than 500 m, while Hispar, Mani, Pratikrisht, Kutiah, and Avadyah possess maximum ice thickness values less than 500 m. From this study, it is evident that the thickness of the glaciers varies spatially due to the bedrock topography and elevation, and the ice thickness of the benchmark glaciers of the Karakoram is higher than that of Himalayan glaciers. The total volume of water accumulated in the benchmark glaciers of the Karakoram, estimated using glacier area and average ice thickness, is 288.20 km3. Such ice volume estimates are crucial for managing glacial lake outburst flood (GLOF) hazards. Keywords Karakoram · Ice thickness · Velocity · Laminar flow model · Benchmark glaciers
1 Introduction

Glaciers are significant climate change indicators [1]. The Karakoram and Himalayan mountain ranges are home to numerous glaciers and serve as the primary source of several perennial rivers downstream [2]. As glaciers in the Karakoram are placed
at higher elevations, they exhibit a different behavior. Because of the rough terrain that exists in the Karakoram, exploration of glaciers in this region becomes a challenge [3]. Remote-sensing characteristics such as wide spatial coverage, frequent data collection over the same area, and the cost-effectiveness of satellite data when compared to field measurements have increased its application in glaciology. The datasets collected using satellite optical sensors are widely used to create a digital repository for glaciers in mountain areas [4]. Understanding glacier dynamics requires knowledge of several parameters, including mass balance, ice surface velocity, bed elevation, depth, and spatial extent of the glacier. Surface ice velocity and depth of the glacier are critical prerequisites for many glaciological studies [5, 6]. Any rapid temperature rise throughout the world changes a glacier's dynamic characteristics. However, the lack of in-situ data frequently limits our knowledge of glacier ice thickness distribution in the Karakoram. In general, glacier velocity is calculated using techniques such as offset tracking, interferometry, and pixel correlation (cross and subpixel) [7, 8]. The velocity in this analysis was calculated by the cross-correlation technique at the subpixel level. Future glacier change prognoses, estimates of available freshwater resources, and assessments of potential sea-level rise all require accurate glacier ice thickness. Several methods for estimating ice thickness based on remote-sensing observational data, such as surface mass balance, surface elevation, and surface velocity from satellite optical data, have been proposed. A study on approximating the thickness and volume of ice of the Karakoram glaciers has been done based on elevation [9]. Estimating glacier ice thickness will provide better insights for volume assessment at the basin and glacier scales. The perfect plasticity and laminar flow models are widely used to estimate glacier thickness [10, 11]. In this study, a laminar flow model with glacier velocity as a primary input is used to estimate the thickness of benchmark glaciers in the Karakoram. The area chosen for the present study, the types of data used, and the method adopted for estimating glacier velocity, along with the results and conclusion, are discussed subsequently.
2 Study Area Karakoram is the maximum ice-covered and distant mountain range in Asia. It outspreads from latitude 34° N to 37° N and longitude 73° E to 78° E. Areal extent of Karakoram is ~ 53,378 km2 with 18,231 km2 lined with ice. It is composed of five major basins, namely Hunza, Shigar, Shyok, Gilgit, and a part of the Indus basin [12]. Elevation of mountain chain ranges to a maximum of 8572 m.a.s.l and minimum of 767 m.a.s.l. Our present study is concentrated on some of the benchmark glaciers of the Karakoram Range (2 glaciers per basin), namely Hispar, Batura, Baltoro, Biafo, Siachen, Rimo, Mani, RGI60-14.03658, Kutiah, and RGI60-14.04884 as shown in Fig. 1.Glaciers, namely RGI60-14.03658 and RGI60-14.04884, of the Gilgit and Indus basin have been named as Pratikrisht and Avadyah in the present study, respectively. Details of all the benchmark glaciers are represented in Table 1.
Fig. 1 Study area—Benchmark glaciers of Karakoram, India

Table 1 Details of benchmark glaciers in the Karakoram

Benchmark Glacier | RGI Id | Area (km2) | Basin
Hispar | RGI60-14.04477 | 495.645 | Hunza
Batura | RGI60-14.02150 | 311.653 | Hunza
Baltoro | RGI60-14.06794 | 809.109 | Shigar
Biafo | RGI60-14.00005 | 559.807 | Shigar
Siachen | RGI60-14.07524 | 1077.95 | Shyok
Rimo | RGI60-14.05890 | 439.616 | Shyok
Mani | RGI60-14.04807 | 37.081 | Gilgit
Pratikrisht | RGI60-14.03658 | 29.422 | Gilgit
Kutiah | RGI60-14.04945 | 50.957 | Indus
Avadyah | RGI60-14.04884 | 18.11 | Indus
Table 2 Particulars of the data employed in the contemporary analysis

Data | Path | Row | Date of acquisition | Band number | Resolution (m)
Landsat-8 | 148 | 35 | 30-11-2017, 17-11-2018 | 8 | 15
Landsat-8 | 149 | 35 | 05-11-2017, 24-11-2018 | 8 | 15
Landsat-8 | 150 | 35 | 12-11-2017, 15-11-2018 | 8 | 15
3 Data Used Surface ice flow velocity estimation (Hydrological Year (HY) 2017–2018), for the benchmark glaciers, involves the use of band 8 (Panchromatic) of Landsat-8 satellite with 15 m spatial resolution. Global Digital Elevation Model (GDEM) with 30 m spatial resolution of Advanced Spaceborne Thermal Emission and Reflection Radiometer (ASTER) data is used for glacier ice thickness estimation using the laminar ice flow model. Randolph Glacier Inventory (RGI)-Version 6.0, an updated version of glacier boundary datasets, was used to extract the boundary of the benchmark glaciers. Table 2 provides particulars of the data employed in the contemporary analysis.
4 Methodology

The methodological flow involved in the current analysis is represented in Fig. 2. Multispectral satellite imagery of two consecutive years is used for subpixel correlation using the COSI-Corr module developed by Leprince et al. [13]. For correlating the consecutive multispectral images, a moving window of a 32 × 32 pixel frequency correlator was used (step size = 2). This subpixel-based correlation generates three results, namely the East–West (Y), North–South (X), and Signal-to-Noise Ratio (SNR) images, at a spatial resolution of 30 m. To overcome uncertainties due to noise, only the pixels with SNR greater than 0.9 are considered. Using the Euclidean distance of Eq. (1), the resultant displacement (D) is calculated. The time difference between the consecutive satellite images is used to determine the surface velocity (Vs) of the glacier on an annual scale. The glacier ice surface velocity estimated using the above methodology is used as an input to the laminar ice flow model for glacier ice thickness calculation using Eq. (2) [14].

D = √(X^2 + Y^2)    (1)

T = [1.5 Vs / (A f^3 (ρ g sin α)^3)]^(1/4)    (2)
where T is the glacier ice thickness (m), Vs the annual surface velocity (m/yr), ρ the ice density (900 kg m^-3), g the acceleration due to gravity (9.8 m s^-2), A the creep parameter (3.24 × 10^-24 Pa^-3 s^-1), f the shape factor (0.8), and α the slope angle. The glacier ice thickness equation has been derived by combining the laminar flow equation and the basal shear stress. Using pixel-based velocity measurements along the glacier's central flow line, the glacier ice thickness has been estimated for different elevation bands using Eq. (2). The spatial distribution of glacier ice thickness is obtained by interpolation using the Topo to Raster function in ArcGIS. The volume of water stored in the benchmark glaciers is then calculated using the average ice thickness value and the area of the glaciers [15]. Velocity-based ice thickness estimation models have also been used by researchers to estimate glacier volume and deepening of the glacier bed, which are useful for government policymakers when designing water use policies [16].
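To make the computation above concrete, the following minimal sketch applies Eq. (1), Eq. (2), and the area-times-average-thickness volume estimate to NumPy arrays. It is an illustration, not the processing chain used in the study: the constants and the one-year image separation are taken from the text, while the conversion of the annual velocity to m/s (for unit consistency with the creep parameter in Pa^-3 s^-1), the array handling, and the example numbers are assumptions.

```python
import numpy as np

# Constants listed with Eq. (2)
RHO = 900.0          # ice density, kg m^-3
G = 9.8              # acceleration due to gravity, m s^-2
A_CREEP = 3.24e-24   # creep parameter, Pa^-3 s^-1
F_SHAPE = 0.8        # shape factor
SECONDS_PER_YEAR = 365.25 * 24 * 3600

def surface_velocity(ew_offset_m, ns_offset_m, dt_years, snr, snr_min=0.9):
    """Annual surface velocity (m/yr) from E-W and N-S offsets, keeping only
    pixels whose correlation SNR exceeds the 0.9 threshold (Eq. 1)."""
    displacement = np.hypot(ew_offset_m, ns_offset_m)
    velocity = displacement / dt_years
    return np.where(snr > snr_min, velocity, np.nan)

def ice_thickness(velocity_m_per_yr, slope_rad):
    """Laminar-flow ice thickness (m) from Eq. (2); velocity is converted
    to m/s so that its units match the creep parameter."""
    vs = velocity_m_per_yr / SECONDS_PER_YEAR
    tau_cubed = (F_SHAPE * RHO * G * np.sin(slope_rad)) ** 3
    return (1.5 * vs / (A_CREEP * tau_cubed)) ** 0.25

def stored_volume_km3(average_thickness_m, area_km2):
    """Water volume (km^3) as glacier area times average ice thickness."""
    return area_km2 * (average_thickness_m / 1000.0)

# Illustrative values only
vel = surface_velocity(np.array([60.0]), np.array([80.0]), 1.0, np.array([0.95]))
print(vel)                                    # ~100 m/yr
print(ice_thickness(vel, np.radians(8.0)))    # a few hundred metres for this slope
print(stored_volume_km3(81.75, 1077.95))      # ~88.12, Siachen's entry in Table 3
```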
Fig. 2 Methodology adopted for glacier ice thickness estimation (flowchart: multispectral satellite imagery of consecutive years → subpixel correlation → X, Y, and SNR images → displacement → ice surface velocity; DEM → slope; velocity, slope, and glacier boundary → laminar ice flow model → glacier ice thickness; combined with glacier area → glacier volume)
5 Results

From the laminar flow model described in the methodology section, the glacier ice thickness distribution for the benchmark glaciers of the Karakoram has been obtained using the surface velocity (HY 2017–2018) and DEM data. The annual surface ice flow velocity of the benchmark glaciers of the Karakoram Range is shown in Fig. 3. From the obtained velocity results, the benchmark glaciers of the Karakoram Range experience a maximum velocity of 180 m/yr for HY 2017–2018. For the Batura, Biafo, Baltoro, Siachen, and Rimo glaciers, the maximum velocity (140–180 m/yr) has been observed in the main central flowline as well as at the nodes of the tributaries. The minimum velocity (20–40 m/yr) has been observed in the upper accumulation regions of the glaciers. The other glaciers, namely Hispar, Mani, Pratikrisht, Kutiah, and Avadyah, experienced their maximum velocity (40–60 m/yr) in very few parts of the upper accumulation regions. Velocity variations are mainly due to the underlying varied topography [17]. The benchmark glacier ice thickness derived using the laminar flow model with velocity as one of the inputs is shown in Fig. 4. The maximum and average thickness values for all benchmark glaciers are represented in Fig. 5. Among all, the Siachen Glacier possesses a maximum ice thickness of 895.96 m with an average ice thickness of around 81.75 m. Next to the Siachen Glacier, Baltoro, Rimo, Biafo, and Batura possess maximum and average ice thicknesses of 713.67 m, 609.99 m, 601.99 m, and 568.92 m and 79.85 m, 95.9 m, 77.84 m, and 61.22 m, respectively. The maximum ice thickness of the Hispar,
Fig. 3 Surface velocity of Karakoram benchmark glaciers
Mani, and Kutiah glaciers is observed to be 485.99 m, 297.96 m, and 320.52 m, respectively. Pratikrisht Glacier possesses a maximum ice thickness of 263.95 m with an average thickness of around 46.06 m. Similarly, Avadyah Glacier possesses a maximum ice thickness of 265.8 m with an average thickness of around 46.17 m. Pratikrisht and Avadyah glaciers have a lower maximum thickness as they cover a small area compared to the other glaciers, and their maximum thickness is spatially distributed over very few regions inside the glaciers. The ice thickness distribution of the benchmark glaciers shows that the thickness is maximum along the central flowline and gradually reduces while moving away from it. In the upper accumulation regions of the glaciers, the ice thickness is observed to be minimal. Thus, the underlying bedrock and the slope are the major contributors to the spatial distribution of the estimated ice thickness [18]. Ice thickness may also vary spatially as well as temporally due to the various parameters discussed in Eq. (2); among these, the slope of the glacier derived from elevation data has a significant impact on the ice thickness estimation. Field-based data are therefore essential to gain better insights into the ice thickness of the glacier. The volume of accumulated water in each benchmark glacier is represented in Table 3. The accumulated water volume of the Siachen Glacier is 88.12 km3, making it the largest storage reservoir in the Shyok basin of the Karakoram Range. Avadyah Glacier of the Indus basin stores 0.83 km3 of water and is
Fig. 4 Ice thickness distribution of Karakoram benchmark glaciers: a Hispar, b Batura, c Pratikrisht, d Baltoro, e Rimo, f Mani, g Biafo, h Siachen, i Kutiah, j Avadyah
Fig. 5 Maximum and average ice thickness of Karakoram benchmark glaciers
Table 3 Volume of water accumulated in benchmark glaciers in km3

Benchmark Glacier | Average thickness (km) | Area (km2) | Volume (km3)
Hispar | 0.04962 | 495.645 | 24.59
Batura | 0.06122 | 311.653 | 19.08
Baltoro | 0.07985 | 809.109 | 64.61
Biafo | 0.07784 | 559.807 | 43.57
Siachen | 0.08175 | 1077.95 | 88.12
Rimo | 0.0959 | 439.616 | 42.15
Mani | 0.03943 | 37.081 | 1.46
Pratikrisht | 0.04606 | 29.422 | 1.35
Kutiah | 0.04737 | 50.957 | 2.41
Avadyah | 0.04617 | 18.11 | 0.83
the smallest storage reservoir. Total volume of water accumulated in the benchmark glaciers is found to be 288.20 km3 . Again, the volume of water accumulated in the glacier is mainly dependent on the area occupied by the glacier as well as its estimated depth.
6 Conclusion

The ice thickness distribution for the benchmark glaciers of the Karakoram has been estimated using a laminar flow model with velocity data of HY 2017–18. In this study, the maximum ice thickness values exceed 500 m for the Batura, Biafo, Baltoro, Siachen, and Rimo glaciers, while Hispar, Mani, Pratikrisht, Kutiah, and Avadyah possess maximum ice thickness values below 500 m. The average ice thickness of all the benchmark glaciers lies in a similar range. The slope and bedrock topography of the glaciers are responsible for the spatial and temporal variations in the distribution of glacier ice thickness. The total volume of water accumulated in the selected ten benchmark glaciers of the Karakoram Range, computed using glacier area and average ice thickness, is estimated to be 288.20 km3, an average of 28.82 km3 per glacier. Thus, glacier ice thickness data and the volume of water accumulated in a glacier, together with its topographical information, are helpful in various hydrological applications at the basin scale.

Acknowledgements This research is funded by the NMSHE Climate Change and Himalayan Cryosphere Program (DST/CCP/NHC/156/2018(G)) within the SPLICE-DST Research Network. The authors are grateful to Dr. Krishna Venkatesh, Director, CIIRC—Jyothy Institute of Technology, Bengaluru, for his support and cooperation.
References

1. Bishop MP, Bush ABG, Copland L, Kamp U, Owen LA, Seong YB, Shroder JF (2010) Climate change and mountain topographic evolution in the central Karakoram, Pakistan. Ann Assoc Am Geogr 100:772–793. https://doi.org/10.1080/00045608.2010.500521
2. Mukhopadhyay B, Khan A, Gautam R (2015) Rising and falling river flows: contrasting signals of climate change and glacier mass balance from the eastern and western Karakoram. Hydrol Sci J 60:2062–2085. https://doi.org/10.1080/02626667.2014.947291
3. Bhutiyani MR (1999) Mass-balance studies on Siachen Glacier in the Nubra valley, Karakoram Himalaya, India. J Glaciol 45:112–118. https://doi.org/10.3189/s0022143000003099
4. Tak S, Keshari AK (2020) Investigating mass balance of Parvati glacier in Himalaya using satellite imagery based model. Sci Rep, 1–16. https://doi.org/10.1038/s41598-020-69203-8
5. Sivalingam S, Murugesan GP, Dhulipala K, Kulkarni AV, Devaraj S (2021) Temporal fluctuations of Siachen glacier velocity: a repeat pass SAR interferometry based approach. Geocarto Int 37(17):4888–4910. https://doi.org/10.1080/10106049.2021.1899306
6. Singh KK, Singh DK, Negi HS, Kulkarni AV, Gusain HS, Ganju A, Babu Govindha Raj K (2018) Temporal change and flow velocity estimation of Patseo glacier, Western Himalaya, India. Curr Sci 114:776–784. https://doi.org/10.18520/cs/v114/i04/776-784
7. Sivalingam S, Murugesan GP, Dhulipala K, Kulkarni AV, Pandit A (2021) Essential study of Karakoram glacier velocity products extracted using various techniques. Geocarto Int, 1–17. https://doi.org/10.1080/10106049.2021.1974954
8. Quincey DJ, Copland L, Mayer C, Bishop M, Luckman A, Belò M (2009) Ice velocity and climate variations for Baltoro Glacier, Pakistan. J Glaciol 55:1061–1071. https://doi.org/10.3189/002214309790794913
9. Najam S, Reba MN, Hussain D (2018) Elevation dependent thickness and ice-volume estimation using satellite derived DEM for mountainous glaciers of Karakorum range. IOP Conf Ser Earth Environ Sci 169
10. Singh KK, Negi HS, Singh DK (2019) Assessment of glacier stored water in Karakoram Himalaya using satellite remote sensing and field investigation. J Mt Sci 16:836–849. https://doi.org/10.1007/s11629-018-5121-0
11. Gharehchahi S, James WHM, Bhardwaj A, Jensen JLR, Sam L, Ballinger TJ, Butler DR (2020) Glacier ice thickness estimation and future lake formation in Swiss Southwestern Alps—the upper Rhône catchment: a VOLTA application. Remote Sens 12:1–28. https://doi.org/10.3390/rs12203443
12. Ashraf A, Roohi R, Naz R (2011) Identification of glacial flood hazards in Karakorum range using remote sensing technique and risk analysis. Sci Vis 16:71–80
13. Leprince S, Barbot S, Ayoub F, Avouac JP (2007) Automatic and precise orthorectification, coregistration, and subpixel correlation of satellite images, application to ground deformation measurements. IEEE Trans Geosci Remote Sens 45:1529–1558. https://doi.org/10.1109/TGRS.2006.888937
14. Gantayat P, Kulkarni AV, Srinivasan J (2014) Estimation of ice thickness using surface velocities and slope: case study at Gangotri Glacier, India. J Glaciol 60:277–282. https://doi.org/10.3189/2014JoG13J078
15. Remya SN, Kulkarni AV, Pradeep S, Shrestha DG (2019) Volume estimation of existing and potential glacier lakes, Sikkim Himalaya, India. Curr Sci 116:1–8
16. Sattar A, Goswami A, Kulkarni AV, Das P (2019) Glacier-surface velocity derived ice volume and retreat assessment in the Dhauliganga basin, central Himalaya—a remote sensing and modeling based approach. Front Earth Sci 7:1–15. https://doi.org/10.3389/feart.2019.00105
17. Sivaranjani S, Priya MG, Krishnaveni D (2020) Glacier surface flow velocity of Hunza Basin, Karakoram using satellite optical data. https://doi.org/10.1007/978-981-15-5788-0
18. Liu J, Wang S, He Y, Li Y, Wang Y, Wei Y, Che Y (2020) Estimation of ice thickness and the features of subglacial media detected by ground penetrating radar at the Baishui River Glacier no. 1 in Mt. Yulong, China. Remote Sens 12:1–23. https://doi.org/10.3390/rs12244105
Monitoring of Melting Glaciers of Ny-Ålesund, Svalbard, Arctic Using Space-Based Inputs B. Shashank and M. Geetha Priya
Abstract Snow cover is an essential part of Earth's climate system as it is an important factor influencing Earth's surface temperature. Climate change leads to variation in precipitation and temperature, which results in higher melting rates in summer compared to accumulation rates in winter. Hence, mapping snow cover will be helpful for studies on climate and temperature. Dry and wet snow mapping is necessary for understanding melting rates, which are directly related to climate change. The current research focuses on using remote sensing data to map dry and wet snow for seven glaciers of Ny-Ålesund, Svalbard, Arctic. The Arctic glaciers are important for research as they are least influenced by humans and least affected by pollution, making them an ideal place for studying and understanding climate change. The mapping and estimation of dry/wet snow using thresholding techniques (NDSI and NIR) for the hydrological year 2017–2018 (October 2017–September 2018) have been carried out using Sentinel-2 data. The results show that there are more dry snow regions in April, May, and June and more wet snow regions during July, August, and September. In March 2018, all seven glaciers exhibit a wet snow layer due to melting caused by a sudden temperature rise in February–March 2018. Keywords Arctic · Snow cover · Sentinel-2 · Glacier · NDSI
1 Introduction

The cryosphere's snow cover is an essential component, as well as one of the most active natural components on the terrestrial surface [1, 2]. A glacier is a continuous thick mass of ice that is continually moving owing to its weight [3]. Glaciers are
a vital and vulnerable component of our ecosystem that may be used to track global warming and climate change [4]. In particular, the Arctic and Greenland ice sheets/glaciers are very important as they help in balancing the world's climate. Glaciers are regarded as highly sensitive indicators of climate change since they are impacted by long-term climatic changes such as changing air temperature, precipitation, cloud cover, and so on [3]. Snowmelt process monitoring, local climate research, snow disaster assessment, and water resource management all benefit from snow cover extraction studies, particularly the identification of dry and wet snow [5]. Snow cover may be monitored over a wide area and with great precision using remote sensing data [6]. The study area chosen is in the Arctic region and is far from pollution, making it well suited for studying and understanding dry snow, wet snow, and the melting of ice. The present study classifies the snow cover into dry snow and wet snow using remote sensing techniques, which contributes to analyzing climate change since glaciers are influenced by long-term climatic changes.
2 Materials and Methods

2.1 Study Area

Ny-Ålesund is a tiny village on the island of Spitsbergen in Svalbard, Arctic. It is located on the Brøgger peninsula (Brøggerhalvøya) on the shore of Kongsfjorden bay. The seven glaciers of Ny-Ålesund, namely Vestre Brøggerbreen, Austre Brøggerbreen, Midtre Lovénbreen, Austre Lovénbreen, Edithbreen, Pedersenbreen, and Botnfjellbreen, shown in Fig. 1 and Table 1, are adopted for the present study. The Arctic glaciers are particularly significant in research since they are the least affected by people and the least polluted, making them a great location for researching and analyzing climate change. Debris is also seen in this region during the ablation period. The presence of debris is critical because as the thickness of debris increases, the melting of ice decreases.
2.2 Data Used Sentinel-2, a European wide-swath, high-resolution, multispectral data, is used for the present study. The twin satellites Sentinel 2 A and B which are in the same orbit but phased at 180°, offer a high revisit frequency of 5 days at the Equator, according to the complete mission specification. This also has an optical payload that samples 13 spectral bands: four at 10 m, six at 20 m, and three at 60 m spatial resolution. The orbital swath is 290 km wide. Only, the GREEN band (10 m) (band-3), NIR band (10 m) (band-8), SWIR band (20 m) (band-11) of the optical data required for the
Fig. 1 The geographical location of the study area
Table 1 List of glaciers used in the study

Sl. No. | Glacier name | Glacier ID | Area (km2)
1 | Vestre Brøggerbreen | G011694E78906N | 4.63
2 | Austre Brøggerbreen | G011895E78886N | 9.8
3 | Midtre Lovénbreen | G012039E78878N | 5.2
4 | Austre Lovénbreen | G012161E78870N | 5.01
5 | Edithbreen | G012119E78852N | 3.61
6 | Pedersenbreen | G012286E78855N | 6.24
7 | Botnfjellbreen | G012405E78843N | 5.86
hydrological year 2017–18 with less cloud cover was downloaded from the website https://earthexplorer.usgs.gov/. The data acquisition details are given below in Table 2. The shapefiles of the 7 glaciers are derived from the Randolph Glacier Inventory (RGI 6.0) (https://www.glims.org/maps/glims). The weather (temperature) data to correlate with the study has obtained from the Norwegian Meteorological Institute (MET Norway) (https://seklima.met.no/).
Table 2 List of satellite data used

Sl. No. | Acquisition date | Scene ID | Satellite | Orbital number
1 | March 2, 2018 | S2A_MSIL1C_20180321T125821_N0206_R038_T31XFH_20180321T132417.SAFE | Sentinel-2A | 38
2 | April 26, 2018 | S2B_MSIL1C_20180426T122649_N0206_R052_T33XVH_20180426T162408.SAFE | Sentinel-2B | 52
3 | May 14, 2018 | S2B_MSIL1C_20180514T132719_N0206_R024_T33XVH_20180514T152642.SAFE | Sentinel-2B | 24
4 | June 22, 2018 | S2A_MSIL1C_20180622T130711_N0206_R081_T33XVH_20180622T132733.SAFE | Sentinel-2A | 81
5 | July 3, 2018 | S2A_MSIL1C_20180703T123701_N0206_R095_T33XVH_20180703T144112.SAFE | Sentinel-2A | 95
6 | August 20, 2018 | S2B_MSIL1C_20180820T124659_N0206_R138_T33XVH_20180820T164304.SAFE | Sentinel-2B | 138
7 | September 21, 2018 | S2A_MSIL1C_20180921T123811_N0206_R095_T33XVH_20180921T144857.SAFE | Sentinel-2A | 95
2.3 Methodology

The Sentinel-2 data downloaded from the USGS website is directly available in the form of Top-Of-Atmosphere (TOA) reflectance as a Level-1C orthoimage product. The Normalized Difference Snow Index (NDSI) technique is used to estimate the snow cover map after the data has been post-processed, using Eq. (1) [7] in the raster calculator.

NDSI = (GREEN reflectance − SWIR reflectance) / (GREEN reflectance + SWIR reflectance)    (1)
Once the NDSI has been computed, a threshold of NDSI ≥ 0.4 is applied to segregate the snow pixels, and NDSI < 0.4 refers to non-snow pixels [8]. From the snow pixels, the snow cover area map is obtained. The snow cover map generated using the NDSI algorithm identifies snow, ice, snow under shadow, and even water bodies with frozen layers as snow pixels. Therefore, a mask of water bodies obtained from the July data was introduced to eliminate the overestimation of water bodies as snow pixels. To distinguish wet and dry snow within the snow-covered area, a threshold is applied to the NIR reflectance band, using the fact that NIR reflectance is lower for wet snow than for dry snow due to the meltwater layer [9]. NIR reflectance ≥ 0.5 is classified as dry snow within the snow-covered area, and NIR reflectance < 0.5 as wet snow [10]. To avoid classifying cloud pixels as snow pixels or as dry snow, cloud-free satellite data were used for the NDSI computation and the NIR reflectance thresholding, given the strong spectral sensitivity of clouds in this process. The same process (as shown in Fig. 2) is carried out for all seven glaciers to obtain the results.
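A compact sketch of the thresholding chain described above is given below. It assumes the green, SWIR, and NIR TOA-reflectance bands are already available as co-registered NumPy arrays; the 0.4 NDSI and 0.5 NIR thresholds and the water-mask step follow the text, while the function and array names, and the 10 m pixel size used for the area helper, are illustrative assumptions rather than the exact QGIS workflow.

```python
import numpy as np

def ndsi(green, swir):
    """Normalized Difference Snow Index (Eq. 1) from TOA reflectance."""
    return (green - swir) / (green + swir + 1e-10)  # epsilon avoids division by zero

def classify_snow(green, swir, nir, water_mask=None,
                  ndsi_thresh=0.4, nir_thresh=0.5):
    """Integer map: 0 = no snow, 1 = wet snow, 2 = dry snow."""
    snow = ndsi(green, swir) >= ndsi_thresh        # snow vs. non-snow pixels
    if water_mask is not None:
        snow &= ~water_mask                        # drop water misclassified as snow
    dry = snow & (nir >= nir_thresh)               # dry snow keeps high NIR reflectance
    wet = snow & (nir < nir_thresh)                # meltwater lowers NIR reflectance
    out = np.zeros(green.shape, dtype=np.uint8)
    out[wet] = 1
    out[dry] = 2
    return out

def class_area_km2(class_map, value, pixel_size_m=10.0):
    """Area of one class in km^2, assuming square 10 m Sentinel-2 pixels."""
    return np.count_nonzero(class_map == value) * pixel_size_m ** 2 / 1e6
```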
3 Results Using Sentinel-2A/2B data, the approach described above was used to generate snow cover maps and categorize them into wet and dry snow. The current study has been broadened to include seven main glaciers of Ny-Ålesund on the Norwegian island of Spitsbergen for one hydrological year, October 2017–September 2018, comprising accumulation and ablation seasons. The satellite data for the months October 2017– February 2018 was not available due to operational reasons of the European Space Agency (ESA). The wet and dry now maps along with the measured aerial extent are given in Figs. 5 and 6. Figure 3 represents the field photograph of Vestre Brøggerbreen in September 2019. From the results, it is observed that the presence of dry snow is maximum in the glaciers during the months April to June 2018. This is possible because April 2018 marks the end of the accumulation period whereas May & June 2018 are the onset of the ablation period. As the summer period progresses due to the
386
B. Shashank and M. Geetha Priya
Fig. 2 Process flow to estimate dry and wet snow
melting of snow/ice more wet snow area is observed over the glacier for the period July to September 2018. More interestingly, it is seen that the presence of wet snow is more than the dry snow during March 2018 (Fig. 5) which is part of the accumulation period. To confirm the melting process in March 2018 the temperature data for NyÅlesund was obtained from the Norwegian Meteorological Institute (MET Norway) [11] and shown in Fig. 4. From the temperature data, it is evident that positive temperatures had been recorded during February–March 2018. The same has been reported by NASA’s Goddard Institute for Space Studies (https://climate.nasa.gov/news/2714/ march-2018-was-one-of-six-warmest-marches-on-record/). The results suggest that climate changes can affect the glaciers at spatial and temporal scales from regional to global levels.
Fig. 3 Field photograph of Vestre Brøggerbreen during UAV survey by CIIRC-JIT team
Fig. 4 The maximum temperature recorded during February–March of 2018
4 Conclusion

For the hydrological year 2017–2018, snow cover mapping with wet and dry snow classification was carried out using Sentinel-2 multispectral images for seven glaciers in Ny-Ålesund, Svalbard, Arctic. The open-source Quantum GIS (QGIS) software was used to build an automated model with the processing steps required for this study. With the spatial and temporal coverage of snow/ice, the dry and wet snow areas have been estimated. It is observed that even a small variation in environment, weather, or climate can cause a noticeable change in glacier dynamics; hence, glaciers are known as natural climate indicators. Wet and dry snow mapping is necessary to analyze melting rates in a glacier, which gives information about how climate change is progressing. The present study period is confined to one hydrological year, whereas a similar study on a decadal scale would help in a better understanding of glacier dynamics.
Fig. 5 Dry and wet snow mapping for the study area
Fig. 6 Graphical representation of a Dry snow area and b Wet snow area calculated for the study area
Acknowledgements National Centre for Polar and Ocean Research, Ministry of Earth Sciences, Government of India, Goa has supported the logistical access to Ny-Ålesund, Svalbard, Arctic to carry out this scientific research work. The authors also thank the Director, CIIRC, and the Principal of Jyothy Institute of Technology, Bengaluru, Karnataka for supporting the research work.
References

1. Fierz C (2009) The international classification for seasonal snow on the ground. UNESCO, IHP (International Hydrological Programme)–VII, Technical Documents in Hydrology, No 83; IACS
2. Barnett TP, Adam JC, Lettenmaier DP (2005) Potential impacts of a warming climate on water availability in snow-dominated regions. Nature 438:303–309. https://doi.org/10.1038/nature04141
3. Luis AJ, Singh S (2020) High-resolution multispectral mapping facies on glacier surface in the Arctic using WorldView-3 data. Czech Polar Rep 10:23–36. https://doi.org/10.5817/CPR2020-1-3
4. Geetha Priya M, Varshini N, Suresh D (2021) Spatial analysis of supraglacial debris cover in Svalbard, Arctic Region—a decadal study. Environ Sci Pollut Res 28:22823–22831. https://doi.org/10.1007/s11356-020-12282-x
5. Ostheimer GJ, Hadjivasiliou H, Kloer DP, Barkan A, Matthews BW (2005) Structural analysis of the group II intron splicing factor CRS2 yields insights into its protein and RNA interaction surfaces. J Mol Biol 345:51–68. https://doi.org/10.1016/j.jmb.2004.10.032
6. Immerzeel WW, Droogers P, de Jong SM, Bierkens MFP (2009) Large-scale monitoring of snow cover and runoff simulation in Himalayan river basins using remote sensing. Remote Sens Environ 113:40–49. https://doi.org/10.1016/j.rse.2008.08.010
7. Kulkarni AV, Dhar S, Rathore BP, Babu Govindha Raj K, Kalia R (2006) Recession of Samudra Tapu glacier, Chandra river basin, Himachal Pradesh. J Indian Soc Remote Sens 34:39–46. https://doi.org/10.1007/BF02990745
8. Geetha Priya M, Chandhana G, Deeksha G, Parmanand Sharma DK (2021) Mapping of wet and dry snow for Himalayan Glaciers using band ratioing and thresholding techniques. In: Advances in automation, signal processing, instrumentation, and control, lecture notes in electrical engineering. Springer, Singapore, pp 1333–1340. https://doi.org/10.1007/978-981-15-8221-9
9. Gupta RP, Haritashya UK, Singh P (2005) Mapping dry/wet snow cover in the Indian Himalayas using IRS multispectral imagery. Remote Sens Environ 97:458–469. https://doi.org/10.1016/j.rse.2005.05.010
10. Rimarova K, Bernasovska K, Petril'akova T, Kovarova M (1998) Vplyv vybranych socialnych faktorov na reprodukcne ukazovatele rodiciek. Hygiena 43:131–136
11. Geetha Priya M, Venkatesh K, Shivanna L, Devaraj S (2020) Detecting short-term surface melt over Vestre Broggerbreen, Arctic glacier using indigenously developed unmanned air vehicles. Geocarto Int 1–12. https://doi.org/10.1080/10106049.2020.1849416
Supraglacial Debris Cover for Ny-Ålesund Using Sentinel-2 Data S. Dhanush and M. Geetha Priya
Abstract In several ways, glaciers are significant indicators of global warming and climate change. The melting of glaciers causes an increase in sea level, which has an impact on coastal areas. Whether the glaciers are growing or shrinking, researchers face challenges in studying their supraglacial debris cover. The study of debris cover is significant because debris cover affects the melting of ice in glaciers: the melting rate increases as the thickness of the debris cover decreases, and vice versa. The amount of supraglacial debris cover (SDC) determines the area of the ablation region covered with debris. Using Sentinel-2 multispectral data, the present study is an attempt to estimate the supraglacial debris cover for glaciers of Ny-Ålesund, Svalbard, Arctic for the Hydrological Year (HY) 2017–2018. Due to their high spatial resolution, the Sentinel-2A and 2B datasets were able to give a more accurate delineation of debris from snow/ice using a threshold of 0.23 on the normalized difference snow index (NDSI). The SDC has been calculated for seven glaciers of Ny-Ålesund, and it is found that nearly 30,500–430,900 m2 of the glaciated region was covered with debris in the month of August 2018. Keywords Climate change · Debris · Arctic · NDSI · Sentinel-2
1 Introduction

When compared to clean ice, the thickness of the SDC cover determines the rate at which the bottom ice melts [1]. Surface melt is affected by supraglacial debris cover, with thin debris cover increasing ablation and thick debris cover decreasing ablation [2]. A rise in the debris-covered area has been linked to overall glacier shrinking and mass loss, according to several studies [3–6]. When compared to
pure debris on rocks, the surface temperatures of black glaciers or debris-covered glaciers are a few degrees colder. As a result, the debris on top of the ice keeps the glaciers from melting too quickly [7]. Debris thickness tends to rise nearer the glacier borders and terminus due to higher underlying melt-out rate and concentration caused by decreased ice velocity. The quantity of energy available for sub-debris ablation is affected by weather conditions and local debris properties like reflectivity, rock properties, roughness, and moisture content, which vary the exact relationship between debris thickness and ablation rate [2]. Due to the complex climate pattern found in Ny-Ålesund, this place is suitable for glacier study. Hence, the present study is an attempt to study the debris cover of seven glaciers in Ny-Ålesund.
2 Materials and Methods

2.1 Study Area

Ny-Ålesund lies on the west coast of Svalbard (Spitsbergen), one of the Arctic's most northerly islands. This research base is situated on a coast surrounded by glaciers and moraines and is a worldwide center for a variety of modern Arctic research activities. In the winter, daily average temperatures in this humid area can sometimes reach 0 °C due to the North Atlantic warming current [8]. The seven glaciers of Ny-Ålesund, namely Vestre Brøggerbreen, Austre Brøggerbreen, Midtre Lovénbreen, Austre Lovénbreen, Edithbreen, Pedersenbreen, and Botnfjellbreen, are taken for the present study (Fig. 1). The areas of the seven glaciers are listed in Table 1. Only the ablation region of the glaciers is considered for the study of debris cover.
2.2 Data Used

Multispectral data from the Sentinel-2 satellite, downloaded from the USGS website (http://earthexplorer.usgs.gov) (Table 2), have been used in the present study. The Sentinel-2 Multispectral Instrument (MSI) has 13 spectral bands with spatial resolutions ranging from 10 to 60 m. The Green band (10 m) and SWIR band (20 m) have been used in this study. The glacier boundary from GLIMS RGI V6 (Randolph Glacier Inventory) of 2017 has been used to delineate the ablation region of the glaciers in Ny-Ålesund. To avoid misinterpretation of cloud pixels as snow pixels during the NDSI procedure, the satellite optical data used were cloud-free.
Fig. 1 Location of glaciers in Svalbard archipelago
Table 1 List of glaciers used

Sl. No. | Glacier name | Glacier ID | Area (km2)
1 | Vestre Brøggerbreen | G011694E78906N | 4.63
2 | Austre Brøggerbreen | G011895E78886N | 9.64
3 | Midtre Lovénbreen | G012039E78878N | 5.02
4 | Austre Lovénbreen | G012119E78852N | 4.83
5 | Edithbreen | G012119E78852N | 3.48
6 | Pedersenbreen | G012286E78855N | 6.02
7 | Botnfjellbreen | G012405E78843N | 5.66
2.3 Methodology

The Green and SWIR bands of the Level-1C product from Sentinel-2, which are directly available as TOA reflectance, are used for this study. Equation (1), which represents the normalized difference snow index (NDSI), is used to segregate snow/ice pixels from debris pixels by applying an appropriate threshold. In general, NDSI greater than 0.4 is used to obtain snow cover products.
Table 2 List of satellite data used

Sl. No. | Acquisition date | Scene ID | Satellite | Orbital number
1 | 02 March 2018 | S2A_MSIL1C_20180321T125821_N0206_R038_T31XFH_20180321T132417.SAFE | Sentinel-2A | 38
2 | 26 April 2018 | S2B_MSIL1C_20180426T122649_N0206_R052_T33XVH_20180426T162408.SAFE | Sentinel-2B | 52
3 | 14 May 2018 | S2B_MSIL1C_20180514T132719_N0206_R024_T33XVH_20180514T152642.SAFE | Sentinel-2B | 24
4 | 22 June 2018 | S2A_MSIL1C_20180622T130711_N0206_R081_T33XVH_20180622T132733.SAFE | Sentinel-2A | 81
5 | 03 July 2018 | S2A_MSIL1C_20180703T123701_N0206_R095_T33XVH_20180703T144112.SAFE | Sentinel-2A | 95
6 | 20 August 2018 | S2B_MSIL1C_20180820T124659_N0206_R138_T33XVH_20180820T164304.SAFE | Sentinel-2B | 138
7 | 21 September 2018 | S2A_MSIL1C_20180921T123811_N0206_R095_T33XVH_20180921T144857.SAFE | Sentinel-2A | 95
Fig. 2 Process flow (Sentinel-2 Level-1C Green and SWIR bands → TOA reflectance → NDSI → threshold of 0.23, combined with the glacier boundary → supraglacial debris cover map)
In order to differentiate snow/ice from debris, different threshold values are suggested in the literature for Landsat datasets. For Sentinel-2, due to its spectral signature, the existing threshold values from the literature are not applicable. Hence, a new threshold to delineate debris from snow/ice has been formulated in the present study using the green reflectance and NDSI values of Sentinel-2 data. Using the new threshold, SDC mapping for the seven glaciers of Ny-Ålesund, Svalbard has been carried out. The process flow is shown in Fig. 2.

NDSI = (GREEN reflectance − SWIR reflectance) / (GREEN reflectance + SWIR reflectance)    (1)
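As a minimal illustration of the thresholding just described (and of the flow in Fig. 2), the sketch below marks pixels inside the ablation-region boundary whose NDSI falls below 0.23 as supraglacial debris. The 0.23 value comes from the text; treating debris as the low-NDSI class (and snow/ice as the high-NDSI class), the 10 m pixel size, and the mask-based inputs are assumptions made for the example.

```python
import numpy as np

DEBRIS_NDSI_THRESHOLD = 0.23  # Sentinel-2 threshold derived in this study

def ndsi(green, swir):
    """NDSI from Sentinel-2 TOA reflectance (Eq. 1)."""
    return (green - swir) / (green + swir + 1e-10)

def supraglacial_debris_mask(green, swir, ablation_mask):
    """Boolean debris map: within the ablation region, pixels with NDSI below
    the threshold are treated as debris, the rest as snow/ice."""
    return ablation_mask & (ndsi(green, swir) < DEBRIS_NDSI_THRESHOLD)

def debris_cover_stats(debris_mask, ablation_mask, pixel_size_m=10.0):
    """Debris-covered area (m^2) and its share of the ablation region (%)."""
    debris_pixels = np.count_nonzero(debris_mask)
    total_pixels = max(np.count_nonzero(ablation_mask), 1)
    return debris_pixels * pixel_size_m ** 2, 100.0 * debris_pixels / total_pixels
```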
3 Results

To determine the appropriate threshold value to delineate supraglacial debris cover (SDC) by its surface and terrain characteristics, green reflectance as a function of NDSI was plotted as shown in Fig. 3. The average pixel values that differentiated snow/ice and debris were chosen as the final threshold value [9]. Around 300 samples of green reflectance and NDSI pixels representing each class (debris and snow/ice) were considered to find the threshold that distinguishes between debris and ice/snow. Green reflectance as a function of NDSI indicates that an NDSI threshold value of around 0.23 can be used to separate debris pixels from snow/ice pixels. The RGI V6 glacier
boundary was used to manually delineate the ablation region based on the satellite images (Fig. 4). As per the methodology described, supraglacial debris cover has been mapped for the seven glaciers of Ny-Ålesund for the months of June, July, August, and September of the year 2018. The results obtained are shown in the form of SDC map in Figs. 5, 6, 7, 8 and 9. The SDC area measured from the ablation region is also given in Fig. 4. From the results, it is observed that for the months of March, April, and May of the year 2018, the supraglacial debris are covered with snow due to the accumulation period for all the glaciers. Results show that with the onset of summer, and the glaciers SDC area increases gradually from the month of June to August 2018. It is also observed a decline in the SDC area in the month of September 2018 which was possibly due to sudden snowfall. It is also noticeable that the glaciers
Fig. 3 NDSI threshold value for separating ice from debris
Fig. 4 Representation of the debris area in square kilometer of Ny-Ålesund for HY 2017–2018
Fig. 5 SDC mapping for the month of March, April, and May 2018
Austre Brøggerbreen and Vestre Brøggerbreen have more debris cover in the range of 6.77–9.28% [10, 11]. All the glaciers of the study area except the Pedersenbreen Glacier have exposed maximum SDC in the month of August 2018. Pedersenbreen Glacier has maximum SDC in the month of July 2018 compared to SDC in August 2018 according to the satellite data obtained. This may be possible in case of a sudden avalanche/surge during August 2018. Figure 10 shows the supraglacial debris cover from the field photograph of Vestre Brøggerbreen Glacier.
4 Conclusion The present study is an attempt to understand the supraglacial debris cover of glaciers in Ny-Ålesund. Open-source software Quantum Geographic Information System (QGIS) and satellite data are used for the present study. The dataset of Sentinel-2 was analyzed, and NDSI was calculated. A new threshold value was found statistically and used to distinguish the debris from snow/ice from the NDSI. Results indicate nearly 0.51–9.28% of the glaciated region is covered with supraglacial debris in the month of August 2018 for the seven glaciers of Ny-Ålesund. The increased resolution of Sentinel-2 allows for more exact digitization, resulting in a more accurate delineation of SDC.
Fig. 6 SDC mapping for the month of June 2018
Fig. 7 SDC mapping for the month of July 2018
Fig. 8 SDC mapping for the month of August 2018
Fig. 9 SDC mapping for the month of September 2018
Fig. 10 Field photograph of Vestre Brøggerbreen glacier during an UAV survey by CIIRC-JIT team showing debris on the glacier surface
Acknowledgements National Centre for Polar and Ocean Research, Ministry of Earth Sciences, Government of India, Goa, has supported the logistical access to Ny-Ålesund to carry out this scientific research work. The authors also thank the Director, CIIRC, and the Principal of Jyothy Institute of Technology, Bengaluru, Karnataka, for supporting the research work.
References

1. Reid TD, Brock BW (2010) An energy-balance model for debris-covered glaciers including heat conduction through the debris layer, vol 56, pp 903–916
2. Nicholson LI, Mccarthy M, Pritchard HD, Willis I (2018) Supraglacial debris thickness variability: impact on ablation and relation to terrain properties, pp 3719–3734
3. Deline P (2005) Blanc massif, vol 2, pp 302–309
4. Stokes CR, Popovnin V, Aleynikov A, Gurney SD, Shahgedanova M (2007) Recent glacier retreat in the Caucasus Mountains, Russia, and associated increase in supraglacial debris cover and supra-/proglacial lake development, vol 2, pp 195–203
5. Kirkbride MP, Deline P, Bourget L (2013) The formation of supraglacial debris covers by primary dispersal from transverse englacial debris bands, vol 1792, pp 1779–1792. https://doi.org/10.1002/esp.3416
6. Glasser NF, Holt TO, Evans ZD, Davies BJ, Pelto M, Harrison S (2016) SC. Geomorphology. https://doi.org/10.1016/j.geomorph.2016.07.036
7. Ranzi R, Grossi G, Iacovelli L, Taschner S (2004) Use of multispectral ASTER images for mapping debris-covered glaciers within the GLIMS Project, pp 1144–1147
8. Ding M, Wang S, Sun W (2018) Decadal climate change in Ny-Ålesund, Svalbard, a representative area of the Arctic, pp 1–11. https://doi.org/10.3390/condmat3020012
9. Pratibha S, Kulkarni AV (2018) Decadal change in supraglacial debris cover in Baspa basin, Western Himalaya. https://doi.org/10.18520/cs/v114/i04/792-799
10. Geetha Priya M, Varshini Narayan SD (2021) Spatial analysis of supraglacial debris cover in Svalbard, Arctic Region—a decadal study, pp 22823–22831
11. Geetha Priya M, Venkatesh K, Shivanna L, Devaraj S (2020) Detecting short-term surface melt over Vestre Broggerbreen, Arctic glacier using indigenously developed unmanned air vehicles. Geocarto Int 1–12. https://doi.org/10.1080/10106049.2020.1849416
Flood Mapping and Damage Assessment of Odisha During Fani Cyclone Using HSR Data C. Rakshita, M. Geetha Priya, and D. Krishnaveni
Abstract Natural hazards, such as floods, earthquakes, volcanoes, storms, and forest fires, are the most serious threats confronting our planet. Among these natural disasters, floods are the most common. Flood mapping aids in estimating the extent of a flood on a wide scale as part of the emergency response following a flooding event, and it serves as a foundation for prevention actions against further occurrences. The present study is an attempt at flood mapping and damage assessment for the Angul District of Odisha during the Fani Cyclone in May 2019. Sentinel-2 data, a high spatial resolution (HSR) dataset from the European Space Agency (ESA), is used in the present study. The study delineates the flood extent with the Normalized Difference Water Index (NDWI) and determines the damaged infrastructure, such as affected areas and buildings, as well as any degraded areas of interest. The process results in one vector file containing the affected areas and another containing damaged buildings, along with flood extent mapping. Keywords Damage assessment · Flood mapping · Spectral reflectance · Sentinel-2 · Fani cyclone
1 Introduction Damage assessment is critical for stakeholders such as municipal, state, and federal governments, as well as insurance companies, following any disaster. Recent studies have shown that analysis of remotely sensed data can efficiently detect a wide range of damages caused by natural disasters. Flood, being one of the weather-related disasters, has a wide-range geographical impact [1].
The flood disaster management process is divided into two events: pre-event and post-event. Pre-event includes identifying regions that are often flooded, as well as locations that are likely to be affected by a flood. Post-event starts after the flood and includes identifying changes in the river causing the flood, the state of flood control works, riverbank erosion, drainage congestion, flood hazard, and risk vulnerability assessment [2]. Due to its regular occurrence, flooding in cities has become one of the world's most pressing issues, resulting in infrastructure damage and human deaths. The very brief time of ephemeral water flow, which seldom lasts more than one day, distinguishes a flash flood in an urban environment [3]. Obtaining the most recent version of hazard maps is the first step in flood control. Flood hazard mapping aids the decision-making process by broadening the information base required to comprehend the nature and features of floods in order to prevent risk in a neighborhood or city. Flood-affected regions must be mapped and modeled, which is a critical job: (i) identify the most vulnerable areas for civic protection, (ii) assess damages, and (iii) support urban planning [4]. The choice of the most convenient satellite data depends on many factors: (i) characteristics of the study area, (ii) spatial resolution, (iii) revisit time, (iv) time of acquisitions with respect to the moment of maximum inundation, and (v) availability and cost of images. According to these conditions, Sentinel-2 provides very practical satellite data to validate urban flood impacts [5].
2 Materials and Methods

2.1 Study Area

Angul is one of the thirty districts of Odisha, located in the central region of this eastern Indian state between 20° 31′ N–21° 40′ N latitude and 84° 15′ E–85° 23′ E longitude, with an area of 6232 km2. The elevation ranges from 564 to 1187 m. The district is bordered on the east by Cuttack District and Dhenkanal, on the north by Sundargarh, Deogarh, and Kendujhar districts, on the west by Sambalpur and Sonepur districts, and on the south by Boudh and Nayagarh districts. Natural resources abound throughout the district. Talcher, a locality in Angul, is the area examined in the study. Odisha is one of the states most susceptible to cyclones. Figure 1 shows the study area map.
2.2 Fani Cyclone 2019: Odisha Flood Situation Fani is a cyclone from a tropical depression that originated in the Indian Ocean west of Sumatra. Vertical wind shear hindered the storm’s development at first, but conditions improved for Fani on April 30th, 2019. It quickly became an exceptionally
Fig. 1 Study area map
strong cyclonic storm, reaching its peak strength on May 2nd, 2019. Its timing and strength are the two primary factors that distinguish it from most other tropical cyclones in May 2019 [6]. It claimed the lives of 72 individuals in India, 64 of whom were from Odisha. Figure 2 shows the rainfall data of the flood that took place in Odisha. On the morning of May 2nd, 2019, the cyclone Fani passed about 160 km east of Visakhapatnam, India, in the Bay of Bengal, and it had intensified to the equivalent of a category four hurricane on the Saffir–Simpson Hurricane Wind Scale. The Joint Typhoon Warning Center predicted Fani's winds at 245 kmph only hours before impact. Fani's eye landed in the southern portion of Odisha state near Puri between 8 a.m. and 10 a.m. local time on May 3rd, 2019, bringing strong winds and heavy rainfall. According to the Indian Meteorology Department, the core of Fani was situated just inland, a few miles west of Puri, at approximately 9:30 a.m. local time on May 3rd, 2019. Northeast India and western Bangladesh received heavy rainfall.
2.3 Data Used European Space Agency (ESA) launched the Sentinel-2 satellite in the year 2015 which has abundant merits. For instance, it has global coverage with a wide area view of 290 km, a fast revisit time of 5 days with two satellites and high resolutions
Fig. 2 Rainfall data: a pre-event and b post-event
of 10, 20, and 60 m are some of the advantages. The major objective of the Sentinel-2 mission is to provide high-resolution optical imagery for operational land cover maps and land change detection maps [7]. Through the multispectral instrument, Sentinel-2 collects data in 13 different spectral bands: four bands at 10 m, six bands at 20 m, and three bands at 60 m spatial resolution. Green and near-infrared (NIR) bands with a spatial resolution of 10 m are employed in this procedure. The Sentinel-2 products are geometrically corrected (UTM-WGS 84). The data used in this study were taken on April 18th and May 8th, 2019. The USGS Earth Explorer web browser is used to view Sentinel-2 images [8]. Table 1 shows the data used in the present study.
Table 1 Data used in the present study

Sl. No. | Satellite | Scene Id | Date of acquisition
1 | Sentinel-2A | LS2A_MSIL1C_20190418T044701_N0207_R076_T45QUD_20190418T075521.SAFE | 2019/04/18 (pre-event)
2 | Sentinel-2A | LS2A_MSIL1C_20190508T044701_N0207_R076_T45QUD_20190508T082652.SAFE | 2019/05/08 (post-event)
2.4 Methodology

The process flow (Fig. 4) adopted to study the affected area is as follows: (1) Sentinel-2 data download from USGS Earth Explorer, (2) Normalized Difference Water Index (NDWI) computation, (3) flood extent clipping, (4) binarization, (5) transformation of raster to vector, (6) geometry-simplified model, (7) layer overlay with buildings and affected area, and (8) damage assessment. The data is downloaded from the USGS Earth Explorer website (http://earthexplorer.usgs.gov), which is open source. Both pre-event and post-event data were downloaded from the Sentinel-2 satellite. The near-infrared (NIR) and green bands, which have 10 m spatial resolution, were selected and imported into QGIS. The coordinate reference system (CRS) was set to determine the real reflectance radiated by objects on Earth's surface. Utilizing the green and near-infrared bands in the raster calculator, the flood extent was determined. The Normalized Difference Water Index (NDWI) is given in Eq. (1):

NDWI = (Band Green − Band NIR) / (Band Green + Band NIR)    (1)
Figure 3 shows the NDWI of the pre-event and post-event datasets. The NDWI image must be binarized in order to recover the flood outline. This implies that each pixel will be assigned a value of 0 or 1, indicating whether the pixel is flooded or not. A threshold was determined to differentiate between flooded and non-flooded regions. This is done by taking an average of the digital number values in flooded regions. On averaging the two samples, the threshold value of 0.0058 is obtained [9]. In the raster calculator, the output binary GeoTIFF file is obtained, which is then converted to a vector file. The geometry of the flood extent shapefile obtained as an outcome of this method has to be corrected and its CRS redefined. A vast number of tiny polygons were generated during the polygonization process. These might correspond to wet streets, wet rooftops, or manmade water bodies like reservoirs or pools that do not depict the full scope of the flood. A large number of polygons necessitates lengthy processing periods, which obstructs our understanding of the flood’s impacts. As a result, it is recommended that the flood extent shapefile be cleaned of these polygons. The flood layer and the vector layer of the river can be distinguished by using buffer processing, since we are interested only in the flood extent and not the river. By putting a buffer of 20 m around all polygons in the local neighborhood, the areas inside the polygons are joined. The damage assessment of the impacted region is done by obtaining the intersection of the area polygons and the reduced flood polygons. The affected area (input) is intersected with the final flood extent to extract the impacted roads and make it easier to map them. These are called “intersections,” which show the regions of interest only.
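The raster-calculator steps above can also be scripted. The sketch below is an illustrative Python equivalent (not the QGIS workflow used by the authors), assuming the Sentinel-2 green (B03) and NIR (B08) 10 m bands have been exported as GeoTIFF files; the file names are placeholders.

```python
import numpy as np
import rasterio

THRESHOLD = 0.0058  # flooded/non-flooded threshold obtained from the averaged samples

# Read the green and NIR bands (assumed file names)
with rasterio.open("B03_green.tif") as g, rasterio.open("B08_nir.tif") as n:
    green = g.read(1).astype("float32")
    nir = n.read(1).astype("float32")
    profile = g.profile

# Eq. (1): NDWI, with a small constant to avoid division by zero
ndwi = (green - nir) / (green + nir + 1e-10)

# Binarization: 1 = flooded, 0 = non-flooded
binary = (ndwi > THRESHOLD).astype("uint8")

profile.update(dtype="uint8", count=1)
with rasterio.open("flood_binary.tif", "w", **profile) as dst:
    dst.write(binary, 1)
```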
Fig. 3 Images of a NDWI of pre-event, b NDWI of post-event, c binary value pre-event, and d binary value post-event
3 Results The present study analyzes the effect of the flood that took place between April 18th, 2019 and May 8th, 2019. April 18th, 2019, is considered as pre-event and May 8th, 2019, is considered as post-event. Understanding the uncertainties that exist at each stage of the process and deciding how to incorporate this uncertainty into subsequent risk management choices is a key issue in flood mapping and damage assessment. Studying the damage caused by flood events is an important step in this study. This process finally yields one vector file containing the affected
Fig. 4 Flowchart
areas and another containing the damaged buildings. Aerial observations and ground surveys are the traditional techniques of flood mapping; such approaches are expensive and time-consuming, and timely aircraft observations might be difficult owing to prohibitive weather conditions. The ideal technique for mapping and monitoring floods is to employ satellite remote sensing (RS) technology as a replacement. The study illustrates how the area was swamped over time. It also depicts the rate of flood recession over time. It can be seen from Fig. 5 that the highest flooding occurred in the month of May 2019; the blue color region indicates the affected area and the brown color indicates the damaged buildings. Figure 6 shows the number of buildings affected during the flood event in Odisha. Table 2 shows the flood extent mapped in the present study, accounting for approximately 701 km2 after the Fani Cyclone.
Fig. 5 Damage assessment a pre-event and b post-event
Fig. 6 Buildings affected in Odisha during Fani Cyclone May 2019
Table 2 Flood area extent
Sl. No. | Date | Event | Water body area in km2 | Flood extent in km2
1 | 2019/04/18 | Pre-event | 1000 | –
2 | 2019/05/08 | Post-event | 1701 | 701
4 Conclusion The present study on damage assessment of the flood caused by the cyclone Fani during May 2019 in the district of Angul, Odisha, demonstrates a rapid method for assessing flood damage and mapping the flood area. The option to exclude flood regions under a particular size or natural water bodies, as well as the thresholding technique, might lead to varying results. Results indicate a nearly 70% increase in the mapped water body area post-event due to the flooding. One of the major floods caused by the Fani Cyclone that occurred in 2019 is studied here. The present study can be extended to other affected regions as well to obtain an extensive analysis. Using microwave SAR data will be an added advantage for natural disaster-based studies due to its all-weather/all-time characteristics. Acknowledgements This research work was supported and funded by the Space Applications Centre-ISRO, Ahmedabad, under the NISAR mission (NASA-ISRO-SAR) with project ID number: NDM-03. The authors gratefully acknowledge the support and cooperation given by Dr. Krishna Venkatesh, Director, CIIRC—Jyothy Institute of Technology (JIT), Bengaluru, Karnataka, and Sri Sringeri Sharada Peetham, Sringeri, Karnataka, India.
References 1. Zeaiean Firouzabadi P, Saroei S (2015) Chapter twenty-four using remote sensing, a case study in golestan province 2. Brakenridge GR, Andersona E, Nghiemb SV, Caquard S, Shabaneh TB (2003) Flood warnings, flood disaster assessments, and flood hazard reduction. In: The roles of orbital remote sensing, 30th international symposium, remote sensing environment 3. Lin X (1999) Flash floods in arid and semi-arid zones. Technical Documents in Hydrology, no. 23, UNESCO, Paris, France 4. Amadio M, Mysiak J, Carrera L, Koks E (2016) Improving flood damage assessment models in Italy. Natural Haz 82(3):2075–2088 5. Khatun M, Garai S, Sharma J, Singh R (2021) Flood inundation mapping and damage assessment due to the super cyclone Yaas using Google Earth Engine in Purba Medinipur. West Bengal, India 6. Indian Red Cross Society: Odisha FANI cyclone Assessment Report 14 (2019) 7. Gascon F, Bouzinac C, Thepaut O et al (2017) Sentinel-2A calibration and products validation status. Remote Sens 9(6):584 8. Geetha Priya M (2018) The 39th Asian Conference on Remote Sensing 2018 At: Kualalumpur 9. Gao BC (1996) NDWI—a normalized difference water index for remote sensing of vegetation liquid water from space. In: Descour MR, Mooney JM, Perry DL, Illing LR (eds) Remote sensing of environment international society for optics and photonics, vol 58, pp 257–66
Disability Assistance System Using Speech and Facial Gestures B. N. Ramkumar, S. L. Jayalakshmi, R. Vedhapriyavadhana, and R. Girija
Abstract Physically disabled people are deprived of opportunities in computer technologies owing to their disabilities. Therefore, there is a need to develop a system which can help the disabled to interact with computer technologies by overcoming their disability. The main objective of this paper is to propose an assistance system for the disabled that uses a minimum amount of resources and work, with specialized interfaces for its users. The proposed system can be helpful for the inclusion of persons with amputation and vision impairment, allowing them to use the latest technologies with easy accessibility. The user can interact with the system using two options: head-based control or voice-based control. This system can create many job opportunities in today’s computer world, which can uplift the social and economic status of the disabled person. Keywords Assistance system · Hands-free cursor control · Voice control
1 Introduction In recent years, the growth of technology and the use of computer systems are drastically increasing. This leads to a point where we cannot spend a single day without a computer interaction for daily activities. But due to the disability, some people B. N. Ramkumar · R. Vedhapriyavadhana School of Computer Science and Engineering, VIT University, Chennai, India e-mail: [email protected] R. Vedhapriyavadhana e-mail: [email protected] S. L. Jayalakshmi (B) Department of Computer Science, School of Engineering and Technology, Pondicherry University (Main Campus), Puducherry, India e-mail: [email protected] R. Girija Department of Computer Science and Engineering, Manav Rachna University, Faridabad, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 N. Subhashini et al. (eds.), Futuristic Communication and Network Technologies, Lecture Notes in Electrical Engineering 966, https://doi.org/10.1007/978-981-19-8338-2_34
find it difficult to interact, hence affecting their day-to-day activities. So they are in need of some assistance to perform these activities [1–3]. But in the case of human assistance, they feel hesitant to ask for help and do not want to disturb others. This leads to the point where a disability system is needed by them to perform these daily activities without seeking help from others. This system can improve their quality of living and social status [4, 5]. Many people, due to their disabilities in arm or vision, are unable to use standard computer interaction devices like the mouse and keyboard. Therefore, the major objective of this paper is to propose a system which uses both facial gestures and voice recognition to interact with the computer instead of the standard mouse and keyboard. The users who find it difficult to use the mouse can use their head movement and facial gestures to move the cursor to do their task easily [6]. The speech recognition provides an advantage for people who have issues related to vision to use computer technology without any hindrance [7, 8]. This system can also help normal users to increase their efficiency of interaction with computer systems. Let us imagine a situation where two workspaces are used side by side on the computer. Instead of switching from one workspace to the other through the mouse, we can use a simple head movement, which takes less time. This is because it negates the time taken to move our hands to the mouse and perform the action. The voice-based interaction can also be used as an alternative for keyboard inputs. The users could simply speak the input that has to be typed into a document, which reduces both the work done and the time taken by the users. Apart from these uses, the system can also be used for other entertainment applications like games, graphic designing, and painting. The proposed system needs only simple hardware like a web camera and a microphone for functioning, and there is no need for sophisticated components like sensors and other electronic gadgets. Thus, this paper proposes a system using programming languages and technologies to overcome the difficulties faced by the disabled in using computer technology with a limited amount of work and resources.
2 Proposed Disability Assistance System The proposed disability assistance system provides assistance to the users in two ways: through facial gestures or through voice recognition. The system is very simple and straightforward and uses a small number of resources. It makes use of very common hardware like a web camera and a microphone, which are now readily available with all computer systems, as shown in Fig. 1. It also avoids the use of uncommon and costly tools like sensors, hand gloves, and trackballs that were used in previous systems. It provides the user with an easy-to-use interface, and therefore a novice user can easily learn and use the system without difficulty. It is made sure that the users get the desired action with less physical interaction than in conventional systems. This system consists of two important modules, named the face-based control and the voice-based control. The head-based control module tracks the head movements and facial features of the users to recognize the users’ needs and
Fig. 1 Proposed architecture of the disability assistance system
executes the desired action. This module helps persons with disabled arms or legs to interact with the computer system. The second module is the voice recognition module, which gets the user’s speech as input through the microphone and converts it into text. Later, the text is searched in the keyword list; if the text matches, the appropriate command is executed in the system. The above disability assistance system can be used by the users in two ways. The voice control module can be used by speaking commands to the system, where the microphone takes the input from the user. This input is converted into text using a speech-to-text converter. The text is later matched with the appropriate function, and the task is carried out. This methodology can be used with little effort by persons who are deaf or blind or have amputations. On the other hand, the facial control module is specifically made for disabled persons who have problems with their hands and cannot operate computers normally. This module uses the facial features of the user to understand the action they want to perform and maps it to the appropriate function. Though the system is targeted for the use of disabled persons, it can also be used by normal persons for easier interaction with computer systems. The following sections describe both modules in detail.
3 Theory and Calculation The proposed disability assistance system has the following two main modules: the facial control module and the voice control module. This section presents a detailed description and the working of both modules.
3.1 Facial Control Module The facial control module allows the user of the system to control the mouse cursor using facial movements. The module uses the webcam to collect video frames, extract the facial features, and map them to predefined actions. The module mainly works by predicting the facial landmarks of a given face. These detected facial landmarks can be used to find the facial actions of the users, such as blinking of the eye. To predict the facial landmarks of a face, we use Dlib’s prebuilt model, which can predict sixty-eight two-dimensional facial landmarks [9] as shown in Fig. 2 and also allows fast and precise face detection. The landmarks of the face are used to predict certain actions by the user. For example, we can detect blinking or winking of the eye by computing the eye aspect ratio, and detect actions like opening and closing of the mouth by computing the mouth aspect ratio; these actions can be mapped onto their respective controls as shown in Fig. 3. Eye Aspect Ratio The eye aspect ratio methodology is a simple and straightforward way to find eye movements such as blinking and closing of the eyes [10]. This method takes the six 2D facial landmarks near or on the eye detected by the Dlib library, as shown in Fig. 4, and finds the aspect ratio using Formula (1), where X denotes multiplication.
Fig. 2 Dlib’s prebuilt facial landmarks
Fig. 3 Actions and their functions of facial control module
Fig. 4 EAR ratio of both closed and open eye using the predicted landmark of facial feature
EAR = (|p2 − p6| + |p3 − p5|) / (2 X |p1 − p4|)  (1)
As Fig. 4 shows, the EAR ratio decreases when there is a blink of the eye, so this ratio can be used to determine whether the eyes are closed or open. The same formula can be applied to both eyes to find their actions. This is done using a threshold: if the EAR ratio goes below the threshold value (i.e., 0.2), the eye is considered to be in a closed state, and the appropriate functions can be performed by the assistance system.
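A minimal sketch of this EAR check is shown below; it assumes Dlib's 68-point shape predictor file is available locally, and the helper name eye_aspect_ratio and the use of SciPy for the distances are illustrative choices rather than the authors' exact implementation.

```python
from scipy.spatial import distance as dist
import dlib

detector = dlib.get_frontal_face_detector()
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")

EAR_THRESHOLD = 0.2  # below this value the eye is treated as closed

def eye_aspect_ratio(eye):
    # eye: six (x, y) landmark points p1..p6 ordered as in Formula (1)
    a = dist.euclidean(eye[1], eye[5])  # |p2 - p6|
    b = dist.euclidean(eye[2], eye[4])  # |p3 - p5|
    c = dist.euclidean(eye[0], eye[3])  # |p1 - p4|
    return (a + b) / (2.0 * c)

def eye_is_closed(eye):
    return eye_aspect_ratio(eye) < EAR_THRESHOLD
```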
Fig. 5 EAR ratio of both closed and open eye using the predicted landmark of facial feature
Mouth Aspect Ratio The mouth aspect ratio works similarly to the EAR: the landmarks of the mouth are taken into consideration, and the aspect ratio is found and used to determine whether the mouth is open or closed. The ratio is calculated by taking eight 2D landmarks of the mouth as shown in Fig. 5, and these landmarks are used to compute the MAR ratio using Formula (2), where X denotes multiplication.
MAR = (|p2 − p8| + |p3 − p7| + |p4 − p6|) / (2 X |p1 − p5|)  (2)
Similar to the EAR ratio, the MAR ratio of the mouth decreases when the mouth is closed, as shown in Fig. 5. So we can set a threshold value just like for the EAR: if the MAR ratio goes below the threshold of 0.5, then the mouth is in a closed state. This state can be mapped onto its respective action, and the appropriate function can be carried out by the assistance system.
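A companion sketch for the MAR check follows, with the eight mouth points p1..p8 ordered as in Formula (2); again the helper names are illustrative.

```python
from scipy.spatial import distance as dist

MAR_THRESHOLD = 0.5  # below this value the mouth is treated as closed

def mouth_aspect_ratio(mouth):
    # mouth: eight (x, y) landmark points p1..p8 ordered as in Formula (2)
    a = dist.euclidean(mouth[1], mouth[7])  # |p2 - p8|
    b = dist.euclidean(mouth[2], mouth[6])  # |p3 - p7|
    c = dist.euclidean(mouth[3], mouth[5])  # |p4 - p6|
    d = dist.euclidean(mouth[0], mouth[4])  # |p1 - p5|
    return (a + b + c) / (2.0 * d)

def mouth_is_open(mouth):
    return mouth_aspect_ratio(mouth) > MAR_THRESHOLD
```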
3.2 Voice Control Module Speech recognition has various uses in the field of automation [11, 12]. The voice control module is responsible for detecting the speech input from the users and mapping it to the appropriate function. This module has the ability to perform tasks such as playing music, opening applications, and operating the mouse cursor. The module works by detecting the speech from the user and converting it to the equivalent text using speech-to-text libraries as shown in Fig. 6. Then the detected texts are mapped onto their respective functions as shown in Table 1. The speech-to-text conversion is done via several preexisting libraries; we used Python's speech recognition library together with the Google speech-to-text online library to get accurate results when converting the voice commands to text. The converted text is searched in the table to check whether it is present or
Fig. 6 Working of voice control module (voice commands → speech-to-text → command execution)
Table 1 Appropriate voice commands and their functions
Voice command | Action
Mute | Mute/Unmute
Play | Play/Pause
Volume UP | Increase volume
Volume DOWN | Decrease volume
Mouse UP | Moves the mouse up by 200 pixels
Mouse DOWN | Moves the mouse down by 200 pixels
Right Click | Right clicks the mouse button
Open Chrome | Opens Chrome from taskbar
not. If it is present, the text is given as an input to the Python “pyautogui” library, where the appropriate command for the action is executed.
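The listening-and-dispatch loop can be sketched as below; the command strings follow Table 1, but the exact keyword table and pyautogui actions in the authors' implementation may differ.

```python
import pyautogui
import speech_recognition as sr

# Keyword table mapping recognized text to actions (illustrative subset of Table 1)
COMMANDS = {
    "volume up": lambda: pyautogui.press("volumeup"),
    "volume down": lambda: pyautogui.press("volumedown"),
    "mouse up": lambda: pyautogui.moveRel(0, -200),
    "mouse down": lambda: pyautogui.moveRel(0, 200),
    "right click": lambda: pyautogui.click(button="right"),
}

recognizer = sr.Recognizer()
with sr.Microphone() as source:
    audio = recognizer.listen(source)               # capture the spoken command

text = recognizer.recognize_google(audio).lower()   # Google speech-to-text
if text in COMMANDS:
    COMMANDS[text]()                                 # execute the matched action
```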
4 Results and Discussion The disability assistant system has an easy-to-use user interface, as shown in Fig. 7, where additional features like news and contacts specific to disabled people are added to the system. Users can use either the facial-based control or the voice-based control feature according to their disability. All the functions of the system are very easy to use, as ease of use is an essential need of disabled people [13–16].
4.1 Facial Control Module The facial control module is tested in a real-time scenario, and the results are very impressive. The MAR and EAR ratios are found using the facial landmarks of one of the authors (Mr. Ramkumar), as shown in Fig. 8. Then the appropriate actions are mapped to the functions of the disability assistant system (refer to Fig. 3).
Fig. 7 User interface of the disability assistant system
Fig. 8 Facial control module results (facial landmarks) for mouth closed (deactivate mouse control function) and opened state (activate mouse control function)
4.2 Voice Control Module The voice control module was given speech as input; the speech was converted into text, and the appropriate action was performed by the assistant system, as shown in Fig. 9. The response of the speech-to-text conversion with the Google Voice Recognition API was very good, and the results were very effective and efficient. In this work, we have used our own real-time images for analysis purposes. In the future, this work will be extended to benchmark datasets, and the performance will be compared with some of the state-of-the-art methods that have proven effective in the literature.
Fig. 9 Result of the voice control module
5 Conclusion The disability assistance system provides an easy way of interaction between persons and computer systems. In the proposed system, both voice and facial gestures were used as parameters to provide user accessibility. To work with the user input, two separate modules for voice and facial gestures were developed. The system was tested on basic computer interaction tasks like opening applications and moving the mouse cursor. The results prove that our system determines the user inputs and maps them to the appropriate actions successfully with little user effort. The system proves to be very effective and efficient for the use of disabled persons. In addition, this system can also be used by normal people to reduce the interaction time with the system. In future work, the interaction can be made simpler and more natural without using some awkward actions. Thus, the developed disability assistance system can be useful for disabled people to interact with the system.
References 1. Bakken JP, Varidireddy N, Uskov VL (2020) Smart universities: gesture recognition systems for college students with disabilities. In: Smart education and e-learning 2020. Springer, Singapore, pp 393–411 2. Donna K, Stoddart K (2021) Approaches that address social inclusion for children with disabilities: a critical review. In: Child and youth care forum. Springer US, pp 1–21 3. Sharmila A (2021) Hybrid control approaches for hands-free high level human–computer interface-a review. J Med Eng Technol 45(1):6–13 4. Chattoraj S, Vishwakarma K, Tanmay P (2017) Assistive system for physically disabled people using gesture recognition. In: International conference of signal and image processing (ICSIP), pp 60–65. https://doi.org/10.1109/SIPROCESS.2017.8124506 5. Muñoz-Saavedra L, Luna-Perejón F, Civit-Masot J, Miró-Amarante L, Civit A, DomínguezMorales M (2020) Affective state assistant for helping users with cognition disabilities using neural networks. Electronics 9(11):1843 6. Dorairangaswamy KS, Hands-free PC control for users with disabilities of their hands. Int J Mod Eng Res (IJMER) 84–86. ISSN: 2249-6645 7. Ascari REOS, Pereira R, Silva L (2020) Computer vision-based methodology to improve interaction for people with motor and speech impairment. ACM Trans Access Comp (TACCESS) 13(4):1–33 8. Khushitha A, Prince P, Jithin S (2021) Voice based assistant for windows. Int J Tech Res Appl 8:336–338. https://doi.org/10.1007/978-1-4302-5855-1_13 9. Davis, K (2009) Dlib-ml: a machine learning toolkit. J Mach Learn Res 10:1755–1758. https:// doi.org/10.1145/1577069.1755843 10. Soukupova T, Cech J (2016) Real-Time Eye Blink Detection using facial landmarks. In: 21st computer vision winter workshop (CVWW 2016), pp 1–8. https://api.semanticscholar.org/Cor pusID:21124316 11. Hassouneh A, Mutawa AM, Murugappan M (2020) Development of a real-time emotion recognition system using facial expressions and EEG based on machine learning and deep neural network methods. Inform Med Unlocked 20:100372 12. Durai S, Rajkumar N, Manikandan NK, Manivannan D (2016) Data entry works in computer using voice keyboard. Indian J Sci Technol 9(2). https://doi.org/10.17485/ijst/2016/v9i2/85814
13. Francy Irudaya Rani E, Vedhapriyavadhana R, Jeyabala S, Jothi Monika S, Krishnammal C (2018) Attendance monitoring using face recognition with message alert. Indo Iranian J Sci Res 2(2):105–113. ISSN: 2581-4362 14. Jayalakshmi SL, Chandrakala S, Nedunchelian R (2018) Global statistical features-based approach for acoustic event detection. Appl Acoust 139:113–118 15. Chandrakala S, Venkatraman M, Shreyas N, Jayalakshmi SL (2021) Multi-view representation for sound event recognition. Sign Image Video Process 1–9 16. Nikitha R, Vedhapriyavadhana R, Anubala VP (2018) Video saliency detection using weight based spatio-temporal features. In: 2018 international conference on smart systems and inventive technology (ICSSIT). IEEE, pp 343–347 17. de Oliveira Schultz Ascari RE, Silva L, Pereira R (2021) Computer vision applied to improve interaction and communication of people with motor disabilities: a systematic mapping. Technol Disabil 1–18
A Deep Learning Neural Network Model for Predicting and Forecasting the Cryptocurrency–Dogecoin Using LSTM Algorithm N. Shivaanivarsha, M. Shyamkumar, and S. Vigita
Abstract In this era, cryptocurrencies have become a worldwide experience in the financial sector. It was initially created as an alternative payment method and a modern investment instrument. There are many cryptocurrencies, for example, Bitcoin, Litecoin, Dogecoin, etc. But there is a slight risk involved in investing in them as their prices are highly volatile. We compared data of various cryptocurrencies and understood that the value of Dogecoin in the cryptocurrency market has increased a lot compared to others in the past few months. Cryptocurrency price prediction is one of the greatest challenges. This paper introduces a neural network framework with LSTM algorithm to handle the price volatility of the Dogecoin and to obtain high accuracy. This deep learning solution predicts and forecasts the future prices. These findings have potential implications for investment and trading strategies. An alert system is also implemented which sends email alert to the user, when the Dogecoin price exceeds the user’s predefined limit. Keywords Deep learning · LSTM · Dogecoin · Cryptocurrency
1 Introduction A cryptocurrency is a virtual currency which is secured by cryptography. The algorithm used here is very complex. It requires a connected network to conduct complex mathematical operations. They are not regulated or backed by the central bank of
N. Shivaanivarsha (B) · M. Shyamkumar · S. Vigita Department of ECE, Sri Sairam Engineering College, Chennai, India e-mail: [email protected]; [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 N. Subhashini et al. (eds.), Futuristic Communication and Network Technologies, Lecture Notes in Electrical Engineering 966, https://doi.org/10.1007/978-981-19-8338-2_35
the issuing nation. The popularity of cryptocurrencies has skyrocketed since 2017. At present, cryptocurrencies are used by people to build a new form of economy. They are highly volatile because they include certain computer protocols that are out of any government control. As a result, the value of a cryptocurrency can go very high, or very low, overnight. They have wild swings in their prices and are therefore very hard to predict. There are various reasons to invest in and own cryptocurrencies. • Cryptocurrency transactions are much faster compared to bank transactions. • The government cannot take your cryptocurrencies away. • No one can steal your personal data, as these transactions use two encryption keys: public and private. When the coins are sent somewhere, a mathematical function is applied to the key. Dogecoin is one of these cryptocurrencies. It was created as a joke back in 2013 as an alternative to Bitcoin, but now it has become a very real asset. It has grown more than 26,000% in the last six months to become one of the most valuable digital assets. Unlike Bitcoin, Dogecoins are found in abundance. 10,000 coins are mined per minute, so over 5 billion new coins appear every year. Therefore, it is safe to invest in Dogecoin.
2 Proposed Method This paper proposes deep learning methods for the development of Dogecoin price prediction models. Deep learning methods are based on neural networks, which are inspired by the functioning of the human brain [1]. In the proposed work, we use the LSTM algorithm, which is a recurrent neural network architecture [4], for future predictions of Dogecoin prices. The reason we choose LSTM over a simple RNN is that RNNs suffer from short-term memory and struggle to retain long-range dependencies. Figure 1 shows the flow diagram of the prediction process. In this proposed method, the historical data of Dogecoin are first collected and preprocessed. In the data preprocessing stage, the missing data (if any) in the dataset are taken care of. After that, the dataset is split into two: the testing set and the training set. Then, using a learning algorithm, we develop a model by training it on the train set. The learning algorithm we use is LSTM. After we develop the model, we use the test set to evaluate whether the predicted values for the test set are of good accuracy. Now our model is ready. With this model, we can predict and forecast the Dogecoin price for n days.
Fig. 1 Flow diagram of Dogecoin prediction process
3 Methodology There are various steps involved in the prediction of future prices of Dogecoins. Data Comparison Initially, we compared the data from various cryptocurrencies (Dogecoin, Bitcoin, Ethereum, Litecoin, Ripple, and Monero) along with Gold and Euro. Figure 2 shows the data uploaded for comparison. As we could infer from Fig. 3, 2015-08-07 can be considered as the common date in all the datasets. Then we plotted the graphs (Figs. 4, 5, 6, 7, 8, 9, 10, 11 and 12) for various cryptocurrencies along with gold and dollar to check the volatility by seeing how the closing price varies.
Fig. 2 Datasets of various cryptocurrencies uploaded
Fig. 3 Finding the minimum date in each dataset
Fig. 4 Daily returns of each currency starting from 2015-08-07
Fig. 5 Bitcoin price fluctuations
Looking at the graph in Fig. 4, we can see that there are peaks of volatility around 2018 and 2021, especially for Ethereum, Ripple, and Dogecoin, with the most noticeable change in Dogecoin, which spiked 11,000% in 2021. Investing in Dogecoin is a smart choice. To predict the future value of Dogecoin, we acquired a csv file
Fig. 6 Ethereum price fluctuations
Fig. 7 Euro dollar price fluctuations
Fig. 8 Gold price fluctuations
Fig. 9 Litecoin price fluctuations
from Yahoo Finance. Yahoo Finance is an open data source consisting of various stock price details.
Fig. 10 Ripple price fluctuations
Fig. 11 Monero price fluctuations
Fig. 12 Dogecoin price fluctuations
Now, these datasets are uploaded into our program. Figure 13 shows all the data available in the dataset acquired from Yahoo Finance.
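A minimal loading step is sketched below; the file name DOGE-USD.csv is an assumed placeholder for the history exported from Yahoo Finance.

```python
import pandas as pd

# Load the historical Dogecoin prices downloaded from Yahoo Finance
df = pd.read_csv("DOGE-USD.csv", parse_dates=["Date"])
print(df.shape)   # the dataset used here has 2435 rows and 7 columns
print(df.head())
```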
4 Data Visualization There are a total of 7 columns and 2435 rows in our dataset. Not all of these columns are required to predict the future prices of Dogecoin. So, we need to eliminate the unwanted columns. For this purpose, we use a library called AutoViz. AutoViz is used to perform exploratory data analysis. It visualizes any kind
Fig. 13 Data in the dataset
of dataset, from small to big, in different graphs and plots. Just a single line of code from this library results in beautiful visualizations. It gives results within seconds. It also eliminates the unwanted columns. Figure 14 shows how the code classified the variables in our Dogecoin dataset. There are a total of 6 numerical columns and one ID column. The ID variable was removed during plotting as it is a low-information variable. Figure 15 shows the relationship between two variables. Figure 16 shows the violin plot visualization. It plots numerical data similar to a box plot, with the difference being the addition of a rotated kernel density plot on each side. Violin plots also show the probability density of the data at different values, which is usually smoothed by a kernel density estimator. Figure 17 shows the magnitude of an occurrence as color in two dimensions (a heat map).
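The single-line AutoViz call can look like the sketch below; the file name is the same assumed placeholder as before.

```python
from autoviz.AutoViz_Class import AutoViz_Class

AV = AutoViz_Class()
# One call generates the scatter, violin, and heat-map plots shown in Figs. 15-17
report = AV.AutoViz("DOGE-USD.csv")
```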
Fig. 14 Classifying variables
Fig. 15 Scatter plot between two variables
Fig. 16 Violin plot visualization
5 Removing Missing Data Now that we have found out which columns to use and which not to, the next step is data preprocessing. Sometimes, data may be missing in the dataset; these are called missing data (Fig. 18). When such missing data occur in the dataset, the uploaded file will have NaN in place of the missing data. NaN stands for ‘Not A Number,’
Fig. 17 Heat map visualization
and it is one of the common ways the system represents a missing value in the dataset. During data analysis, this is a big problem. So, the rows having NaN should be removed to get better results. The dropna() function returns the data with the NaN rows removed. Now let us predict the trend of the ‘Close’ column of the historical data of Dogecoin.
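A short sketch of this step, assuming the DataFrame df loaded earlier, is shown below.

```python
# Keep only the 'Close' column and drop rows containing NaN values
close_df = df[["Date", "Close"]].dropna()
close_prices = close_df["Close"].values.reshape(-1, 1)  # shape (n, 1) for modeling
```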
Fig. 18 Removing the rows with missing data
Fig. 19 Splitting data into testing and training set
6 Test–Train Split The next step is to split the data into a test set and a training set. If we do not split the data, then we will be training and testing on the same data. We need to know whether the model will provide good accuracy on unseen data. So, we split the data into a training set and a testing set. We train our model using the train set and test that model using the test set; by comparing the test set results with the original data, we can see whether the model is producing good accuracy or not. For this proposed Dogecoin price prediction method, we have taken 65% of the data for the training set, and the remaining 35% is taken for the test set (Fig. 19).
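A sketch of the chronological 65/35 split is given below; the min-max scaling step is an assumption commonly used before LSTM training and is not described explicitly by the authors.

```python
from sklearn.preprocessing import MinMaxScaler

scaler = MinMaxScaler(feature_range=(0, 1))
scaled = scaler.fit_transform(close_prices)       # scale prices to [0, 1]

train_size = int(len(scaled) * 0.65)              # 65% for training
train_data, test_data = scaled[:train_size], scaled[train_size:]
```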
7 Forecasting Using Stacked LSTM Neural networks are a group of algorithms inspired by the functioning of the human brain [5]. They take in a large set of data, process it, and produce the output. Plain feedforward networks do not have a memory to understand sequential data such as language translation. RNN and LSTM have a similar chain-like structure, with the difference being in the repeating module. Instead of one neural network layer in the repeating module of an RNN, the LSTM has four. An LSTM module has one cell state, an input gate, an output gate, and a forget gate. After splitting the data, the required packages for producing a stacked LSTM model are imported. TensorFlow is a Python library (Fig. 20) which provides fast numerical computing. After importing the libraries, we use the training set data to develop a model. Figure 21 shows the summary of the model developed.
Fig. 20 Importing libraries for LSTM modeling
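A possible stacked LSTM built with the TensorFlow/Keras packages is sketched below; the 100-step sliding window and the 50 units per layer are illustrative choices, since the exact architecture is only shown in Fig. 21.

```python
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

def make_windows(series, steps=100):
    # Turn a 1-column series into (samples, steps) windows and next-step targets
    x, y = [], []
    for i in range(len(series) - steps - 1):
        x.append(series[i:i + steps, 0])
        y.append(series[i + steps, 0])
    return np.array(x), np.array(y)

x_train, y_train = make_windows(train_data)
x_test, y_test = make_windows(test_data)
x_train = x_train.reshape(-1, x_train.shape[1], 1)
x_test = x_test.reshape(-1, x_test.shape[1], 1)

model = Sequential([
    LSTM(50, return_sequences=True, input_shape=(100, 1)),
    LSTM(50, return_sequences=True),
    LSTM(50),
    Dense(1),
])
model.compile(loss="mean_squared_error", optimizer="adam")
model.fit(x_train, y_train, validation_data=(x_test, y_test),
          epochs=100, batch_size=64, verbose=1)
```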
Fig. 21 Summary of the model developed
Training a neural network takes more than a few epochs because a greater number of epochs produces a better learning model. Here we trained our model for 100 epochs in order to avoid the overfitting problem (Fig. 22). Now we use the test set to evaluate the model and to see if it predicts the values correctly. On calculating the ‘Root Mean Square Error’ for the model (Fig. 23), it is found to be approximately 1.9.
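The RMSE check can be done as below, inverse-transforming the scaled predictions back to price units before computing the error.

```python
import math
from sklearn.metrics import mean_squared_error

test_pred = scaler.inverse_transform(model.predict(x_test))
y_true = scaler.inverse_transform(y_test.reshape(-1, 1))
rmse = math.sqrt(mean_squared_error(y_true, test_pred))
print("Test RMSE:", rmse)
```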
Fig. 22 Model training using 100 epochs
Fig. 23 Root mean square error for the model
Fig. 24 Actual test data vs predicted test data
8 Result and Discussion After training the model, the test set is used to estimate its performance. A graph is plotted to compare the results. In the graph shown in Fig. 24, the blue line represents the entire data, the orange line shows the test data, and the green line shows the predictions on the test data [3]. Now this model can be used to predict and forecast the values for n days. For forecasting the next 30 days, we have used a loop that continues for 30 iterations, where it follows the same LSTM rule. In the graph shown in Fig. 25, the orange line shows the forecasted plot for the next 30 days. An alert system is also created to send an email notification once the Dogecoin price exceeds the predefined limit [2]. It searches Yahoo Finance, scrapes the current price of Dogecoin, and sends an alert message. Figures 26 and 27 show the predefined limit and the current Dogecoin value, and Fig. 28 shows the email alert sent on exceeding the predefined value.
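The 30-iteration forecasting loop can be sketched as follows: the last 100 scaled values seed a rolling window, each prediction is appended, and the window slides forward by one step. The window length matches the assumed 100-step input used above.

```python
import numpy as np

window = list(test_data[-100:, 0])   # seed with the most recent scaled values
forecast = []
for _ in range(30):                  # forecast the next 30 days
    x_input = np.array(window[-100:]).reshape(1, 100, 1)
    next_val = model.predict(x_input, verbose=0)[0, 0]
    forecast.append(next_val)
    window.append(next_val)          # slide the window forward

forecast_prices = scaler.inverse_transform(np.array(forecast).reshape(-1, 1))
```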
9 Conclusion This paper developed a methodology to predict the Dogecoin price, and at the end, a new model was created to forecast future prices. We used a deep learning method to build this robust model for price prediction. Thus, this model provides an important opportunity to contribute to the field of finance, as the results obtained have significant implications for future decisions, making it possible to anticipate big price changes and prevent losses.
Fig. 25 Prediction for next 30 days
Fig. 26 Predefined limit
Fig. 27 Current prize
Fig. 28 Mail alert
References 1. Sabry F, Labda W, Erbad A, Malluhi Q (2020) Cryptocurrencies and artificial intelligence: challenges and opportunities. IEEE Access 8:175840–175858. https://doi.org/10.1109/ACCESS. 2020.3025211 2. Mathai-Davis S, Trudeau E (2019) Bitcoin systematic trading algorithms in the cloud: challenges and opportunities. IEEE Cloud Summit 2019:25–30. https://doi.org/10.1109/CloudSumm it47114.2019.00011 3. Pillai S, Biyani D, Motghare R, Karia D (2021) Price prediction and notification system for cryptocurrency share market trading. In: 2021 International conference on communication information and computing technology (ICCICT), pp 1-7. https://doi.org/10.1109/ICCICT50803.2021. 9510122 4. Jay P, Kalariya V, Parmar P, Tanwar S, Kumar N (2020) Stochastic neural networks for cryptocurrency price prediction. IEEE Access 8:82804–82818 5. Biswas S, Pawar M, Badole S, Galande N, Rathod S (2021) Cryptocurrency price prediction using neural networks and deep learning. In: 2021 7th international conference on advanced computing and communication systems (ICACCS), 2021, pp 408–413. https://doi.org/10.1109/ ICACCS51430.2021.9441872
Crop Monitoring of Agricultural Land in Chikkaballapura District of Karnataka Using HSR Data A. Sowjanya and M. Geetha Priya
Abstract Remote sensing plays a major role in crop monitoring, as it helps in crop classification, crop health, and yield assessments. In the current environmental circumstances, crop condition monitoring, crop yield projections, irrigation monitoring, and over-fertilization monitoring are required. The present study utilizes Sentinel-2 multispectral data, which is high spatial resolution (HSR) satellite data from the European Space Agency (ESA). For the present study of crop monitoring, an open-source GIS software named the Sentinel Application Platform (SNAP) was used to process the HSR data. The study was carried out on agricultural lands of Kurubarahalli, Gauribidanur taluk of Chikkaballapura District of Karnataka. A total of six agricultural lands with different crops (maize, coconut, raagi, and lentils) were monitored over 7 months of the year 2020 by obtaining spectral band indices to understand the growth of the crops. The red edge bands of Sentinel-2 allow a greater diversity of spectral vegetation indices (VIs) to be calculated for the study and for vegetation characterization. The indices studied are leaf area index (LAI), normalized difference vegetation index (NDVI), fraction of absorbed photosynthetically active radiation (FAPAR), FCover, also known as fraction of vegetation cover (FVC), canopy chlorophyll content (CCC), and canopy water content (CWC). Keywords Crop monitoring · Sentinel-2 · Vegetation index · SNAP · Agriculture
1 Introduction Agriculture refers to the cultivation of plants and animals. It provides people with food which is essential to living. The sustainability of current agricultural products is threatened by fast population expansion [1] around the world. Due to the growth A. Sowjanya Department of ECE, Jyothy Institute of Technology, Bengaluru, India M. G. Priya (B) CIIRC, Jyothy Institute of Technology, Bengaluru, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 N. Subhashini et al. (eds.), Futuristic Communication and Network Technologies, Lecture Notes in Electrical Engineering 966, https://doi.org/10.1007/978-981-19-8338-2_36
of urbanization, there are now few places on the planet where arable land may be expanded. Increased global warming and environmental changes may also contribute to the loss of arable land. Agriculture is necessary to provide enough food and raw materials to an ever-increasing population. Crop productivity has advanced over the last few decades in order to meet the requirements of the ever-growing population. To feed the entire population, further innovation in crop management will be required in the future. For countries whose economies rely on crop harvests, accurate and early assessments are necessary. Crop monitoring is one such technique that helps in crop management, and it can be used for different temporal regions and different purposes. Crop monitoring is a real-time procedure that enables the farmers to make quick changes in the field so that the health and yield of crops are maintained well. Crop traits [2], or factors that determine agricultural performance or fitness, can be obtained, and they differ from one crop region to the next. Applications of remote sensing technologies have already begun to benefit several elements of crop management. The use of satellites to monitor crops and assist in their management is becoming popular [3]. Depending on the geographic area, crop diversity, field size, crop phenology, and soil condition, different band ratios of multispectral data and classification schemes have been applied. For several agricultural and ecological applications, an accurate quantitative estimate of biophysical factors is critical [4]. In the present study, the Sentinel-2 satellite is preferred because of its varying spatial resolution and vegetation red edge bands, which are useful in crop monitoring. The SNAP toolbox is used to visualize and analyze Sentinel-2 data, as well as to generate products. The generated products are the six indices that provide data on crop development. As acknowledged by international organizations such as GCOS and GTOS, the leaf area index (LAI), the fraction of photosynthetically active radiation absorbed by the canopy (FAPAR), and the fraction of vegetation cover (FVC) are essential climate variables (ECVs); the other three indices, canopy chlorophyll content, canopy water content, and normalized difference vegetation index (NDVI), are fundamental for the understanding of agricultural ecosystems.
2 Materials and Methods 2.1 Study Area Kurubarahalli is a village in the Gauribidanur Tehsil of Karnataka’s Chikkaballapura District. It is 5 km from Gauribidanur, the sub-district headquarters, and 34 km from Chikkaballapur, the district headquarters. Kurubarahalli Village has Hirebidanur as its gram panchayat. The village covers a total area of 365.98 hectares. This area lies between 13°34′51″ and 13°35′19″ latitude and 077°31′29″ and 077°32′05″ longitude, as represented in Fig. 1. The population of Kurubarahalli is 1587 people. Kurubarahalli Village has approximately 372 homes. The nearest town to Kurubarahalli is
Gauribidanur, which is around 5 km away. The village is surrounded by agricultural land that belongs to the people (https://villageinfo.in/karnataka/chikkaballapura/gauribidanur/kurubarahalli.html). Farmers make up the majority of the village’s population and actively participate in farming on the village agricultural lands. The region of interest is outlined in black in the location map and contains many small crop fields. Figure 2 represents the crop types in the study area, with maize1, maize2, and maize3 indicating the maize crops grown in three different plots.
Fig. 1 Location map of the study area Gauribidanur
Fig. 2 Crop-type map with colors assigned to each crop
Table 1 Sentinel-2 spectral bands used in the present study
Band | Name | Central wavelength in µm | Resolution in m
B2 | Blue | 0.490 | 10
B3 | Green | 0.560 | 10
B4 | Red | 0.665 | 10
B5 | Vegetation red edge | 0.705 | 20
B6 | Vegetation red edge | 0.740 | 20
B7 | Vegetation red edge | 0.783 | 20
B8 | NIR | 0.842 | 10
B8A | Vegetation red edge | 0.865 | 20
B11 | SWIR | 1.610 | 20
2.2 Data Used Sentinel-2 is a satellite mission that is often used for land surface monitoring. It is a constellation of Sentinel-2A (launched on 23 June 2015) and Sentinel-2B (launched on March 7, 2017). It has 13 bands which operate in the NIR, visible, and SWIR spectral ranges with 10–60 m resolutions. The main instrument of Sentinel-2 is its multispectral instrument (MSI) sensor. The inclusion of the three band types, namely NIR, visible, and SWIR, is a great combination for characterizing green vegetation and also for knowing the status of crop health. Table 1 provides the list of spectral bands of Sentinel-2 used in the present study (https://sentinels.copernicus.eu/web/sentinel/user-guides/sentinel-2-msi/resolutions/spatial). All the datasets were downloaded from the USGS Earth Explorer website. USGS provides Level-1C Sentinel-2 data that are geometrically corrected top-of-atmosphere reflectance. Seven cloud-free datasets over the study area, acquired from May to December of the year 2020, were downloaded (Table 2). All the datasets are from sensing orbit number 19 with descending direction. The monitoring period was chosen with the knowledge that the crops under study are Kharif crops.
2.3 Methodology Level-1C Sentinel-2 data, which are orthorectified top-of-atmosphere (TOA) reflectance with sub-pixel multispectral registration, were converted to Level-2A orthorectified bottom-of-atmosphere (BOA) reflectance, as shown in Fig. 3, by using the SNAP plugin named Sen2Cor. All
Table 2 Data used for the study
Date of acquisition | Satellite name | Scene ID
2020-05-13 | 2B | S2B_MSIL1C_20200513T050649_N0209_R019_T43PGR_20200513T090043
2020-06-22 | 2B | S2B_MSIL1C_20200622T050659_N0209_R019_T43PGR_20200622T085642
2020-07-07 | 2A | S2A_MSIL1C_20200707T050701_N0209_R019_T43PGR_20200707T084804
2020-10-25 | 2A | S2A_MSIL1C_20201025T050911_N0209_R019_T43PGR_20201025T071819
2020-11-19 | 2B | S2B_MSIL1C_20201119T051119_N0209_R019_T43PGR_20201119T072452
2020-11-24 | 2A | S2A_MSIL1C_20201124T051131_N0209_R019_T43PGR_20201124T062349
2020-12-24 | 2A | S2A_MSIL1C_20201224T051221_N0209_R019_T43PGR_20201224T062712
Fig. 3 Methodology adopted in the present study (Sentinel-2 Level-1C → resampling to 10 m → subset of study area → calculation of indices: LAI, NDVI, FAPAR, FCover, CCC, CWC → time-series map of crop monitoring)
geometrically corrected BOA reflectance bands were resampled to 10 m spatial resolution. The data were then subset according to the study area, with some buffer area around it. The indices were obtained from the S2ToolBox in SNAP (https://sentinel.esa.int/web/sentinel/toolboxes/sentinel-2) using the biophysical processor based on neural networks [5, 6], which generates the indices used in the present study. The process was repeated for all seven datasets, and then each index from each of the seven months was stacked. A time-series map was obtained by placing pins on the fields of interest.
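The index generation and pin placement were done interactively in SNAP; an equivalent scripted extraction of a per-field time series could look like the sketch below, assuming each monthly index map has been exported as one band of a stacked GeoTIFF and that the pin coordinate is expressed in the raster's CRS.

```python
import rasterio

# Placeholder pin location for a field of interest, in the raster's CRS
pin = (77.5290, 13.5830)

with rasterio.open("ndvi_stack.tif") as src:
    # sample() yields one value per band, i.e. one NDVI value per acquisition date
    time_series = list(src.sample([pin]))[0]

print(time_series)  # seven values, May-December 2020
```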
2.4 Description of Vegetation Indices Used The leaf area index (LAI) is one of the biophysical measures that can be used to evaluate plant nutritional and health status, as well as serve as a stress and damage indicator. It is an important vegetation leaf structure parameter in forest and agricultural ecosystems. LAI is the amount of leaf material in a canopy and can also be defined as the ratio of one-sided leaf area per unit ground area. LAI [7, 8] is unitless because it is a ratio of areas. LAI is calculated using Eqs. 1 and 2:
EVI = 2.5 × (NIR − RED)/(NIR + 6 × RED − 7.5 × BLUE + 1)  (1)
LAI = 3.618 × EVI − 0.118  (2)
Normalized difference vegetation index (NDVI) is often used in remote sensing for crop growth monitoring, and it indicates the greenness in the area taken; in this study, the NDVI time series is obtained to monitor crops. The common values for the green cropland are 0.2–0.8, and NDVI [9] can be obtained using Eq. 3:
NDVI = (NIR − RED)/(NIR + RED)  (3)
Fraction of Absorbed Photosynthetically Active Radiation (FAPAR) is the fraction of solar radiation absorbed by live leaves for photosynthesis. Only green plants that are alive and capable of photosynthesis give higher values, while dead and dried leaves are likely to have lower FAPAR values. FAPAR [10] is calculated using Eqs. 4, 5, 6, and 7:
SR = (1 + NDVI)/(1 − NDVI)  (4)
FAPAR_SR = [(SR − SR_min)/(SR_max − SR_min)] × (FAPAR_max − FAPAR_min) + FAPAR_min  (5)
FAPAR_NDVI = [(NDVI − NDVI_min)/(NDVI_max − NDVI_min)] × (FAPAR_max − FAPAR_min) + FAPAR_min  (6)
SR_max and SR_min correspond to the 98th and 2nd percentiles of the surface reflectance (SR) frequency distribution, NDVI_max and NDVI_min correspond to the maximum and minimum NDVI values obtained in the study, and FAPAR_min and FAPAR_max correspond to the minimum and maximum FAPAR values.
FAPAR = (FAPAR_SR + FAPAR_NDVI)/2  (7)
The FCover/Fraction of Vegetation Cover (FVC) simply refers to the fraction of ground covered by green vegetation. In crop monitoring, this can be used to measure the growth of the crop in terms of height as well as branching and leaves. FVC [11] is calculated using Eq. 8:
FCover = (NDVI − NDVI_soil)/(NDVI_veg − NDVI_soil)  (8)
Canopy Water Content (CW/CWC) is the mass of water present per unit ground area. Real-time monitoring of this index gives us information on water stress in the crop. CWC is calculated using Eq. 9:
CWC = Fresh weight − Dry weight  (9)
The Canopy Chlorophyll Content (CCC) is a two-dimensional remote sensing index created by combining the Normalized Difference Vegetation Index (NDVI) and the Normalized Difference Red Edge Index (NDREI). The index uses the spectral range between 670 and 790 nm, making it ideal for measuring nitrogen levels. This is mainly useful in canopy nutrition control. CCC is calculated using Eq. 10:
CCC = [(NIR − Red Edge)/(NIR + Red Edge)]/[(NIR − RED)/(NIR + RED)]  (10)
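As a compact illustration of Eqs. 1-3 and 8, the sketch below computes the ratio-based indices from resampled band arrays; it assumes the bands are surface reflectances in [0, 1], and NDVI_soil and NDVI_veg are scene-specific constants that would have to be chosen for the study area (the values shown are placeholders).

```python
import numpy as np

def ratio_indices(blue, red, nir, ndvi_soil=0.2, ndvi_veg=0.86):
    # Eq. (1): enhanced vegetation index
    evi = 2.5 * (nir - red) / (nir + 6.0 * red - 7.5 * blue + 1.0)
    # Eq. (2): leaf area index derived from EVI
    lai = 3.618 * evi - 0.118
    # Eq. (3): normalized difference vegetation index
    ndvi = (nir - red) / (nir + red + 1e-10)
    # Eq. (8): fraction of vegetation cover
    fcover = (ndvi - ndvi_soil) / (ndvi_veg - ndvi_soil)
    return evi, lai, ndvi, fcover
```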
3 Results Using the process flow discussed in the methodology section, various indices were calculated. The field photographs of the crops monitored are shown in Fig. 4.
3.1 Leaf Area Index (LAI) The leaf area index (LAI) is highly variable: deserts have values of less than 1, the densest forests have values as high as 9, and mid-latitude forest shrubs typically have LAI values between 3 and 6. When the crop is not fully grown, the LAI value over that area remains low, as seen in Fig. 5. All crops show LAI values of less than 1 at the start, and the value increases as the crop grows. In the month of November, the LAI value goes down at once, indicating cultivation. The LAI declines linearly after grain filling begins. If the values begin to go down before cultivation, it indicates that the crop has been affected by pests. In Fig. 4, we can see that raagi and maize show a similar pattern when it comes to LAI. The three maize fields differ a lot in values; maize1 has a much higher LAI value compared to the other two maize crops, which shows that the leaves were denser in the maize1 field compared to the other two fields. It is observed that all the crops follow a similar pattern for LAI.
Fig. 4 Field photographs of study area
Fig. 5 Time-series map of LAI
3.2 NDVI (Normalized Difference Vegetation Index) From Fig. 6, it is observed that the NDVI value rises significantly in all crops because of the change in the cropland. The drop in the values during November and December indicates that the crops were harvested and the land cover changed from lush green crops to bare land. NDVI is often high in value when the crop is healthy, pest-free, and stress-free. We can also see the difference in NDVI values among the maize crop types, which shows that maize2, with a lower NDVI value, was unhealthy compared to maize1, with higher NDVI values. The coconut crop shows a steady NDVI value. In Fig. 6, the maize1 crop shows a high NDVI value in October, and maize3 follows the same pattern till the peak in October; then there is a change in pattern because maize1 was harvested early compared to the other crops, which is represented by the sudden drop in its NDVI value. Lentils and maize2 have lower NDVI values.
Fig. 6 Time-series map of NDVI
Fig. 7 Time-series map of FAPAR
3.3 FAPAR (Fraction of Absorbed Photosynthetically Active Radiation) FAPAR is the fraction of solar radiation absorbed by live leaves for photosynthesis. Only green plants that are alive and capable of photosynthesis give higher values, while dead and dried leaves are likely to have lower FAPAR values. Figure 7 shows that coconut, maize1, raagi, and maize3 follow a similar pattern in which the FAPAR value peaks in October, the harvest season, meaning that the crop was livelier; lentils and maize2 have lower values compared to the other crops, which shows that those crops were not actively photosynthesizing.
3.4 FCover/Fraction of Vegetation Cover (FVC) When a crop branches out, it covers more of the surrounding area and also indicates that it is healthy, which can be seen for maize1, coconut, raagi, and maize3 in Fig. 8. This index can be used to spot unhealthy or pest-attacked areas, since the main indications in such situations are stunted growth and leaves being eaten by pests, which can be seen in lentils and maize2. FCover alone cannot be relied on, since unwanted plants also contribute to the vegetation cover, which may lead to confusion in calculating and analyzing crop growth.
3.5 CW/Canopy Water Content (CWC) Water is essential for healthy crop growth, but excessive water reduces the plants' ability to breathe as they store more water, the roots rot, and the leaves turn
Fig. 8 Time-series map of FCover
Fig. 9 Time-series map of CWC/CW
yellow. So, by this index, we can check the water content: if it is high, limiting the water supply, or if the CWC value is low, providing sufficient water to the crops helps maintain the crop's health. To capture the absorption properties of water and light penetration, reflectance measurements in the near-infrared and shortwave infrared bands are used to determine canopy water content. As seen in Fig. 9, the coconut farm shows high water content. Raagi, maize1, maize3, and lentils show steady water content and follow a similar pattern. Maize2 has less water content, which affects its health and also explains why the LAI, FAPAR, and FCover values of this field were lower.
3.6 CAB/Chlorophyll Content in Leaf (CCI) Nitrogen is very important for crop growth, and we can use the chlorophyll content index to assess the nitrogen requirement. The amount of chlorophyll in the crop area indicates the nitrogen requirement: the higher the chlorophyll, the lower the need for nitrogen, and vice versa. From Fig. 10, it is seen that maize1, maize3, and raagi show
Fig. 10 Time-series map of CAB/CCI
similar chlorophyll content patterns, while the coconut farm shows high chlorophyll content, exceeding 140. Lentils and maize2 show poor chlorophyll values, which can be interpreted as malnutrition in these crops.
4 Conclusion The study carried out using the Sentinel-2 biophysical indices and NDVI was summarized in this paper. By studying the characteristics and values of the indices, we can analyze the health of the crop. In this study, we observed that crops of the same type that were grown in the same surroundings had different values for all the indices due to malnutrition and poor health. The maps obtained showed that crops which had smaller index values were not properly maintained, and these crops lacked the nourishment required for them to grow well. By crop monitoring in real time, we can obtain information about this malnourishment and address it immediately. It was also seen that maize1, coconut, and raagi were well-nourished and had high CWC, CCC, and other index values. Acknowledgements The authors thank the Director, CIIRC, Jyothy Institute of Technology, Bengaluru, for laboratory facilities and access to the software. Also, the authors would like to thank the Principal, Jyothy Institute of Technology, Bengaluru, for encouraging and supporting us.
RGB-to-Grayscale Conversion Using Truncated Floating-Point Multiplier S. Sankar Ganesh and J. Jean Jenifer Nesam
Abstract Color space conversion in imaging using floating-point (FP) arithmetic requires simple multiplier and adder units that consume little area and power. This paper presents a directly truncated single-precision floating-point multiplier using a new bit pair recoding (NBPR) algorithm for area- and power-efficient RGB-to-grayscale conversion. NBPR has a major impact on partial product reduction, reducing the partial product row count for an n-bit multiplication from n to n/4. The algorithm generates approximately 57% fewer partial product rows than the modified Booth encoding (MBE) algorithm, without computing the 'neg' term and sign extension. This eliminates the additional circuitry required by MBE and saves substantial area and power. Direct truncation introduces a truncation error of 0.0045%, which is acceptable for error-tolerant image processing applications. The proposed floating-point multiplier is validated with a red, green, and blue (RGB) to gray image conversion application and produces images with enhanced peak signal-to-noise ratio (PSNR). Implementation results in a TSMC 65 nm CMOS technology show that the NBPR floating-point multiplier achieves 76% and 50% reductions in area and power, respectively, compared to a conventional Booth floating-point multiplier.

Keywords Color to grayscale conversion · Truncation · Floating-point multiplier · NBPR algorithm
S. S. Ganesh (B) Vellore Institute of Technology, Vellore, India e-mail: [email protected] J. J. J. Nesam Sri Venkateswara College of Engineering and Technology (SVCET) (A), Chittoor, AP, India © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 N. Subhashini et al. (eds.), Futuristic Communication and Network Technologies, Lecture Notes in Electrical Engineering 966, https://doi.org/10.1007/978-981-19-8338-2_37
1 Introduction

Numerous digital signal processing (DSP) applications such as 3D/computer graphics, speech recognition, and image recognition make rigorous use of floating-point (FP) operations to handle internal data. Accordingly, there is a need to design area- and power-efficient FP units [1]. Among all FP arithmetic units, the FP multiplier is the cardinal part of many applications, and its performance depends on how its mantissa multiplication is implemented. Optimized mantissa multiplication is performed with several algorithms, such as modified Booth encoding (MBE) [2], Karatsuba (divide-and-conquer) multiplication [3], and Wallace and Dadda multiplication [4]. The well-known Wallace multiplier uses a tree-type adder structure in the partial product (PP) reduction phase. In general, Wallace multiplication is performed in three phases. In the first phase, the n-bit multiplicand is ANDed with each bit of the n-bit multiplier, forming a PP matrix. In the second phase, the generated PPs are reduced to two 2n-bit rows (a 2n-bit sum and a 2n-bit carry). In the third phase, a final adder adds the sum and carry to produce the result. Wallace multipliers concentrate on the second phase, where the PP matrix is reduced to two rows. Wallace employs carry-save adders (CSA) in the PP reduction phase, and the two remaining rows are then summed by a carry-propagate adder (CPA) to provide the multiplier output. The number of stages in the Wallace PP reduction phase varies with the bit size. Wallace applies a CSA to three rows in each stage; a single leftover row in a stage is passed to the next stage without processing, and if only two rows remain after grouping, they are likewise passed on unprocessed. This process continues until only two rows remain. The number of adders used in the PP reduction phase determines the gate count and hence the area. The modified Wallace multiplier for unsigned multiplication defined in [4] reduces the number of half adders in the PP reduction phase, since a half adder does not reduce the PP row height; however, the reduction in half-adder count slightly increases the full-adder count. Moreover, speed is the key parameter considered in the Wallace and Dadda multipliers, while area consumption remains unaddressed. MBE alleviates this large area consumption using PP row height reduction techniques [5–8]. However, complexity remains in MBE owing to two unavoidable terms, negative encoding (NE) and sign extension (SE); these two terms occupy a larger gate count than the PP reduction phase in MBE. Since NE and SE are irrelevant to unsigned FP mantissa multiplication, the proposed NBPR generates the PPs without computing NE and SE. In NBPR, the multiplier is recoded as non-overlapping four-bit groups, and each group generates one PP row. This helps to lessen the area and power consumption. Apart from this algorithmic optimization, area and power are reduced further by truncating a few lower blocks of the multiplication [9, 10]. Truncation is adopted in many area- and energy-efficient multiplier designs, since DSP and multimedia applications perform fixed-width multiplication on the data [11, 12]. The FP multiplier is equivalent to an unsigned fixed-width multiplier.
The double-length product is trimmed to the actual input length after rounding. Rounding is the complex and time-consuming part of FP multiplication: it requires an n-bit addition based on the guard (G), round (R), and sticky (S) bit computation. The S-bit is computed as the OR of the bits that are to be truncated at the end of rounding [13–16]; if the S-bit is set to '1', the product value is incremented by '1'. Fortunately, many signal processing applications require an area- and power-optimized multiplier rather than an accurately rounded multiplier [17–20]. In this paper, an area- and power-reduced, directly truncated single-precision floating-point mantissa multiplication using the NBPR algorithm is presented. The 24-bit mantissa multiplication is performed as a decomposed unsigned fixed-width multiplication, i.e., a 24 × 24 multiplier with a 24-bit product, by applying the direct truncation method. This eliminates the substantial steps involved in the IEEE 754–2008 standard rounding methods. The truncation error due to truncating the lower part of the partial products (PPs) is analyzed mathematically. Finally, the performance is evaluated by observing the PSNR of RGB-to-grayscale converted images; this conversion requires multipliers and adders. The remainder of the paper is organized as follows: Sect. 2 describes the conventional way of implementing an FP multiplier, Sect. 3 presents the NBPR algorithm, Sect. 4 provides the truncated mantissa NBPR multiplication, Sect. 5 compares the implementation results, and Sect. 6 concludes the paper.
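To make the three Wallace phases described above concrete, here is a minimal bit-level sketch; the function name and structure are illustrative assumptions, not the hardware design evaluated in this paper.

def wallace_multiply(x: int, y: int, n: int = 8) -> int:
    """Bit-level model of unsigned Wallace multiplication (illustrative only)."""
    # Phase 1: AND the multiplicand with each multiplier bit -> shifted PP rows.
    rows = [(x if (y >> i) & 1 else 0) << i for i in range(n)]

    # Phase 2: carry-save (3:2) reduction until only two rows remain.
    while len(rows) > 2:
        reduced = []
        for j in range(0, len(rows) - len(rows) % 3, 3):
            a, b, c = rows[j:j + 3]
            s = a ^ b ^ c                                # bitwise sum of three rows
            carry = ((a & b) | (b & c) | (a & c)) << 1   # carries, shifted left
            reduced += [s, carry]
        reduced += rows[len(rows) - len(rows) % 3:]      # leftover rows pass through
        rows = reduced

    # Phase 3: final carry-propagate addition of the remaining two rows.
    return sum(rows)

assert wallace_multiply(0b10101100, 0b11101001) == 0b10101100 * 0b11101001

Each 3:2 step models a row of carry-save adders; the number of such steps, and hence the adder count, grows with the operand width, which is the area cost the NBPR approach targets.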
2 IEEE 754–2008 Standard

If X[23:0] and Y[23:0] are the mantissas of two normalized single-precision FP numbers, where X[23] = Y[23] = 1, a traditional IEEE 754–2008-compliant FP multiplication performs the following steps:

Step 1: Calculate the sign as the XOR of the signs of both operands.
Step 2: Compute the exponent as the sum of both operands' exponents minus the bias.
Step 3: Generate the partial products and produce carry and sum bits.
Step 4: Add the sum and carry to produce the product P[47:0].
Step 5: Pre-normalize the result so that P[47] = 1.
Step 6: Calculate the guard, round, and sticky bits (GRS); the sticky bit is computed as OR(P[22], P[21], P[20], …, P[0]).
Step 7: The rounding bit R ∈ {0, 1} is determined by the sticky bit.
Step 8: Perform post-normalization and exception handling.

The performance of an FP multiplier heavily depends on the unsigned mantissa multiplication. The conventional shift-and-add type of single-precision FP mantissa multiplier requires a large area, high power, and a long delay. MBE-based mantissa multiplication effectively reduces the area, power, and delay using the PP height reduction method. However, to ensure unsigned mantissa multiplication with MBE, the MSB of the normalized mantissa is extended with two zeros.
This requires one additional PP row [2] and increases the reduced PP row count from (n/2 + 1) to (n/2 + 2); MBE therefore needs fourteen PP rows for a 24 × 24 single-precision FP mantissa multiplication. In addition, the sign of each row is extended to the (2n − 1)-bit position. These properties of MBE, which amount to unnecessary computation for unsigned multiplication, increase the logic complexity.
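The steps listed above can be traced with a small software model. The sketch below is a simplified illustration, assuming normal (non-subnormal) inputs, round-to-nearest-even via the guard/sticky bits, and no overflow or exception handling; it is not the hardware flow proposed in this paper, and the function names are assumptions.

import struct

def fp32_bits(f: float) -> int:
    return struct.unpack('>I', struct.pack('>f', f))[0]

def fp32_mul(a: float, b: float) -> float:
    """Simplified IEEE 754-2008 single-precision multiply (normal inputs only)."""
    xa, xb = fp32_bits(a), fp32_bits(b)
    sign = (xa >> 31) ^ (xb >> 31)                          # Step 1: XOR of signs
    exp = ((xa >> 23) & 0xFF) + ((xb >> 23) & 0xFF) - 127   # Step 2: add exponents, remove bias
    ma = (xa & 0x7FFFFF) | 0x800000                         # 24-bit mantissas with hidden '1'
    mb = (xb & 0x7FFFFF) | 0x800000
    p = ma * mb                                             # Steps 3-4: 48-bit product P[47:0]
    if p & (1 << 47):                                       # Step 5: pre-normalize so P[47] = 1
        exp += 1
    else:
        p <<= 1
    frac = (p >> 24) & 0x7FFFFF                             # kept fraction bits
    guard = (p >> 23) & 1                                   # Step 6: guard bit
    sticky = 1 if (p & ((1 << 23) - 1)) else 0              # sticky = OR of the discarded bits
    if guard and (sticky or (frac & 1)):                    # Step 7: round to nearest even
        frac += 1
        if frac >> 23:                                      # Step 8: post-normalize after rounding
            frac &= 0x7FFFFF
            exp += 1
    out = (sign << 31) | ((exp & 0xFF) << 23) | frac
    return struct.unpack('>f', struct.pack('>I', out))[0]

print(fp32_mul(1.5, 2.5), 1.5 * 2.5)   # both print 3.75

The rounding branch is exactly the part that direct truncation removes: dropping the guard/sticky logic trades a small, bounded truncation error for a shorter and cheaper datapath.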
3 NBPR Algorithm

In MBE, overlapped 3-bit multiplier groups select a coefficient from {−2, −1, 0, 1, 2} to generate the PP rows. Because of the negative encoding and the negative coefficients, the computation of the 2's complement in the PP generation phase is mandatory in MBE. The conventional way of generating the 2's complement requires an n-bit inversion followed by an n-bit addition; since the inverted '1' is added in the PP reduction phase, it increases the total PP reduction height by 1. The way the 2's complement is generated defines the critical path in MBE, and the associated delay has been reduced in many designs in different ways [21]. However, the mandatory inversion and correction terms make MBE multipliers less suitable for color conversion in image processing applications. The proposed NBPR algorithm eliminates the inversion computation entirely, shortening the critical path and hence the delay. Moreover, the NBPR algorithm requires neither correction bits nor additional PP rows, which makes the algorithm more amenable to imaging.
3.1 PP Generation

The NBPR algorithm generates the PPs by adding the pre-calculated values listed in Table 1; the corresponding circuitry is shown in Fig. 1. The multiplier bits are recoded as non-overlapping four-bit groups. The lower two bits are given as the select signal to multiplexer 1 (M1), and the higher two bits are fed as the selection lines to multiplexer 2 (M2). The adder circuit for adding 'Y' and the 1-bit left-shifted 'Y' is limited to 2-input AND gates, 2-input OR gates, and inverters. In this design, the half adders are implemented using four gates and the full adders using nine gates, as shown in Figs. 2 and 3, respectively. In the full adder, the inputs pass through six and five gates to generate the sum and carry, respectively [delay (6, 5)]; similarly, the half adder has delays of 3 and 1 gates [delay (3, 1)]. The adder in MBE for generating the 2's complement takes inverted inputs, so its delay increases by one gate delay (Table 2), making the full-adder delay [delay (7, 6)] and the half-adder delay [delay (4, 2)] in the MBE-based implementation. The gate delays of the PP generation phase of MBE and NBPR are tabulated in Table 3, which shows that the proposed NBPR algorithm incurs 4.55% less delay than MBE.
Table 1 Pre-defined values of NBPR algorithm

Representation | Pre-defined values | Operation
0              | 0000               | 0
A              | 000Y               | Replace the 1's position by the 8-bit multiplicand
B              | 00Y0               |
C              | 0Y00               |
D              | Y000               |
E              | Y + (Y << 1)       | Add Y and 1-bit left-shifted Y
F              | E00                | Concatenate two 0's as LSBs with E

Y → 8-bit multiplicand
Fig. 1 Partial product generation circuit (MUX M1 selects from {0, A, B, E} via yi+1 yi, MUX M2 selects from {0, C, D, F} via yi+3 yi+2, and an (n + 4)-bit adder sums the two MUX outputs)
Fig. 2 Gate delay of half adder using 4 gates
3.2 PP Reduction

NBPR has a unique property for 8 × 8 multiplication: it merges the PP reduction and the final addition into a single phase. The 8 × 8 NBPR unsigned multiplier generates only two PP rows, which are passed directly to the final summing stage. The operation of the NBPR algorithm is illustrated in the following example.
Fig. 3 Gate delay of full adder using 9 gates

Table 2 MUX-based PP selection for PP generation

Recoded bits (y2i+3 y2i+2 y2i+1 y2i) | MUX output Z1 | MUX output Z2
0 0 0 0 | 0 | 0
0 0 0 1 | A | 0
0 0 1 0 | B | 0
0 0 1 1 | E | 0
0 1 0 0 | 0 | C
0 1 0 1 | A | C
0 1 1 0 | B | C
0 1 1 1 | E | C
1 0 0 0 | 0 | D
1 0 0 1 | A | D
1 0 1 0 | B | D
1 0 1 1 | E | D
1 1 0 0 | 0 | F
1 1 0 1 | A | F
1 1 1 0 | B | F
1 1 1 1 | E | F
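The selections in Tables 1 and 2 can be checked with a short software model. The sketch below is an illustration under assumed function and variable names, not the RTL described in this paper: it recodes the multiplier into non-overlapping 4-bit groups, forms each PP row as Z1 + Z2 from the pre-defined values, and confirms that the two resulting rows of an 8 × 8 multiplication sum to the exact product.

def nbpr_partial_products(y: int, z: int, n: int = 8):
    """Model of NBPR PP generation for an n-bit unsigned multiply (n/4 PP rows)."""
    A, B, C, D = y, y << 1, y << 2, y << 3          # Table 1 pre-defined values
    E = A + B                                       # Y + (Y << 1)
    F = E << 2                                      # E concatenated with two 0 LSBs
    z1_table = [0, A, B, E]                         # selected by y(2i+1) y(2i), Table 2
    z2_table = [0, C, D, F]                         # selected by y(2i+3) y(2i+2), Table 2
    rows = []
    for g in range(n // 4):                         # non-overlapping 4-bit groups of z
        group = (z >> (4 * g)) & 0xF
        z1 = z1_table[group & 0x3]
        z2 = z2_table[group >> 2]
        rows.append((z1 + z2) << (4 * g))           # PP row, weighted by group position
    return rows

# Values from Example 1: Y = 10101100, Z = 11101001.
y, z = 0b10101100, 0b11101001
rows = nbpr_partial_products(y, z)
assert len(rows) == 2 and sum(rows) == y * z        # two PP rows sum to the exact product
print([bin(r) for r in rows], bin(y * z))

Because each 4-bit group contributes exactly one row equal to (group value) × Y, an 8-bit multiplier yields only two rows and no negative encoding or sign extension is ever needed.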
Table 3 Gate delay comparison

Recoder | 8 × 8 gate delay | In %
MBE     | 22               | 100
NBPR    | 21               | 95.45
Table 4 Adder count for various 8 × 8 multipliers

Multiplier       | 8 × 8 gate count | In %
Wallace          | 402              | 100
Modified Wallace | 363              | 90.3
Dadda            | 343              | 85.32
Proposed         | 206              | 51.24
Example 1 Y is the multiplicand and Z is the multiplier of the 8 × 8 multiplier. For Y = 10101100 and Z = 11101001, the unsigned multiplication is performed in the following steps:

Step 1: Compute the pre-defined values
A = 00010101100; B = 00101011000; C = 01010110000; D = 10101100000; // padding zeros
E = 10101100 + (10101100 << 1) = 1000000100 // (Y + Y