Algorithms for Intelligent Systems Series Editors: Jagdish Chand Bansal · Kusum Deep · Atulya K. Nagar
Rajendra Prasad Yadav Satyasai Jagannath Nanda Prashant Singh Rana Meng-Hiot Lim Editors
Proceedings of the International Conference on Paradigms of Computing, Communication and Data Sciences PCCDS 2022
Algorithms for Intelligent Systems Series Editors Jagdish Chand Bansal, Department of Mathematics, South Asian University, New Delhi, Delhi, India Kusum Deep, Department of Mathematics, Indian Institute of Technology Roorkee, Roorkee, Uttarakhand, India Atulya K. Nagar, School of Mathematics, Computer Science and Engineering, Liverpool Hope University, Liverpool, UK
This book series publishes research on the analysis and development of algorithms for intelligent systems with their applications to various real world problems. It covers research related to autonomous agents, multi-agent systems, behavioral modeling, reinforcement learning, game theory, mechanism design, machine learning, meta-heuristic search, optimization, planning and scheduling, artificial neural networks, evolutionary computation, swarm intelligence and other algorithms for intelligent systems. The book series includes recent advancements, modification and applications of the artificial neural networks, evolutionary computation, swarm intelligence, artificial immune systems, fuzzy system, autonomous and multi agent systems, machine learning and other intelligent systems related areas. The material will be beneficial for the graduate students, post-graduate students as well as the researchers who want a broader view of advances in algorithms for intelligent systems. The contents will also be useful to the researchers from other fields who have no knowledge of the power of intelligent systems, e.g. the researchers in the field of bioinformatics, biochemists, mechanical and chemical engineers, economists, musicians and medical practitioners. The series publishes monographs, edited volumes, advanced textbooks and selected proceedings. Indexed by zbMATH. All books published in the series are submitted for consideration in Web of Science.
Rajendra Prasad Yadav · Satyasai Jagannath Nanda · Prashant Singh Rana · Meng-Hiot Lim Editors
Proceedings of the International Conference on Paradigms of Computing, Communication and Data Sciences PCCDS 2022
Editors Rajendra Prasad Yadav Department of Electronics and Communication Engineering Malaviya National Institute of Technology Jaipur, Rajasthan, India Prashant Singh Rana Department of Computer Science and Engineering Thapar Institute of Engineering and Technology Patiala, Punjab, India
Satyasai Jagannath Nanda Department of Electronics and Communication Engineering Malaviya National Institute of Technology Jaipur, Rajasthan, India Meng-Hiot Lim School of Electrical and Electronic Engineering Nanyang Technological University Singapore, Singapore
ISSN 2524-7565 ISSN 2524-7573 (electronic) Algorithms for Intelligent Systems ISBN 978-981-19-8741-0 ISBN 978-981-19-8742-7 (eBook) https://doi.org/10.1007/978-981-19-8742-7 © The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. The publisher, the authors, and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations. This Springer imprint is published by the registered company Springer Nature Singapore Pte Ltd. The registered company address is: 152 Beach Road, #21-01/04 Gateway East, Singapore 189721, Singapore
Preface
This book contains outstanding research papers as the proceedings of the International Conference on Paradigms of Computing, Communication and Data Sciences (PCCDS 2022). PCCDS 2022 was organized by Malaviya National Institute of Technology Jaipur, India, and technically sponsored by the Soft Computing Research Society, India. The conference was conceived as a platform for disseminating and exchanging ideas, concepts, and results among researchers from academia and industry, to develop a comprehensive understanding of the challenges of advancing intelligence from computational viewpoints. This book will help in strengthening congenial networking between academia and industry. We have tried our best to enrich the quality of PCCDS 2022 through a stringent and careful peer-review process. This book presents novel contributions to Computing, Communication and Data Sciences and serves as reference material for advanced research. PCCDS 2022 received many technically sound contributed articles from distinguished participants from home and abroad: 349 research submissions from 20 different countries, viz. Bangladesh, China, Germany, Greece, Iceland, India, Indonesia, Malaysia, Mexico, Morocco, Philippines, Poland, Qatar, Romania, Russia, Senegal, Serbia, Spain, Ukraine, and the USA. After a very stringent peer-reviewing process, only 62 high-quality papers were finally accepted for presentation and the final proceedings.

Jaipur, India – Rajendra Prasad Yadav
Jaipur, India – Satyasai Jagannath Nanda
Patiala, India – Prashant Singh Rana
Singapore – Meng-Hiot Lim
Contents

1. Optimized Watermarking Scheme for Copyright Protection of Medical Images – Rohit Thanki and Purva Joshi
2. MobileNet + SSD: Lightweight Network for Real-Time Detection of Basketball Player – Banoth Thulasya Naik and Mohammad Farukh Hashmi
3. Modified Hungarian Algorithm-Based User Pairing with Optimal Power Allocation in NOMA Systems – Sunkaraboina Sreenu and Kalpana Naidu
4. Design and Implementation of Advanced Re-Configurable Quantum-Dot Cellular Automata-Based (Q-DCA) n-Bit Barrel-Shifter Using Multilayer 8:1 MUX with Reversibility – Swarup Sarkar and Rupsa Roy
5. Recognition of Facial Expressions Using Convolutional Neural Networks – Antonio Sarasa-Cabezuelo
6. Identification of Customer Preferences by Using the Multichannel Personalization for Product Recommendations – B. Ramakantha Reddy and R. Lokesh Kumar
7. A Post-disaster Relocation Model for Infectious Population Considering Minimizing Cost and Time Under a Pentagonal Fuzzy Environment – Mayank Singh Bhakuni, Pooja Bhakuni, and Amrit Das
8. The Hidden Enemy: A Botnet Taxonomy – Sneha Padhiar, Aayushyamaan Shah, and Ritesh Patel
9. Intelligent Call Prioritization Using Speech Emotion Recognition – Sanjana Addagarla, Ravi Agrawal, Deep Dodhiwala, Nikahat Mulla, and Kaisar Katchi
10. The AdaBoost Approach Tuned by SNS Metaheuristics for Fraud Detection – Marko Djuric, Luka Jovanovic, Miodrag Zivkovic, Nebojsa Bacanin, Milos Antonijevic, and Marko Sarac
11. Prediction of Pneumonia Using Deep Convolutional Neural Network (CNN) – Jashasmita Pal and Subhalaxmi Das
12. State Diagnostics of Egg Development Based on the Neuro-fuzzy Expert System – Eugene Fedorov, Tetyana Utkina, Tetiana Neskorodieva, and Anastasiia Neskorodieva
13. Analysis of Delay in 16 × 16 Signed Binary Multiplier – Niharika Behera, Manoranjan Pradhan, and Pranaba K. Mishro
14. Review of Machine Learning for Antenna Selection and CSI Feedback in Multi-antenna Systems – Garrouani Yassine, Alami Hassani Aicha, Mrabti Fatiha, and Dhassi Younes
15. Cassava Leaf Disease Detection Using Ensembling of EfficientNet, SEResNeXt, ViT, DeIT and MobileNetV3 Models – Hrishikesh Kumar, Sanjay Velu, Are Lokesh, Kuruguntla Suman, and Srilatha Chebrolu
16. Scene Segmentation and Boundary Estimation in Primary Visual Cortex – Satyabrat Malla Bujar Baruah, Adil Zafar Laskar, and Soumik Roy
17. Dynamic Thresholding with Short-Time Signal Features in Continuous Bangla Speech Segmentation – Md Mijanur Rahman and Mahnuma Rahman Rinty
18. Fast Adaptive Image Dehazing and Details Enhancement of Hazy Images – Balla Pavan Kumar, Arvind Kumar, and Rajoo Pandey
19. Review on Recent Advances in Hearing Aids: A Signal Processing Perspective – R. Vanitha Devi and Vasundhara
20. Hierarchical Earthquake Prediction Framework – Dipti Rana, Charmi Shah, Yamini Kabra, Ummulkiram Daginawala, and Pranjal Tibrewal
21. Classification Accuracy Analysis of Machine Learning Algorithms for Gearbox Fault Diagnosis – Sunil Choudhary, Naresh K. Raghuwanshi, and Vikas Sharma
22. Stock Price Forecasting Using Hybrid Prophet–LSTM Model Optimized by BPNN – Deepti Patnaik, N. V. Jagannadha Rao, and Brajabandhu Padhiari
23. Identification of Genetically Closely Related Peanut Varieties Using Deep Learning: The Case of Flower-Related Varieties 11 – Atoumane Sene, Amadou Dahirou Gueye, and Issa Faye
24. Efficient Color Image Segmentation of Low Light and Night Time Image Enhancement Using Novel 2DTU-Net and FM2CM Segmentation Algorithm – Chandana Kumari and Abhijit Mustafi
25. An Architecture to Develop an Automated Expert Finding System for Academic Events – Harshada V. Talnikar and Snehalata B. Shirude
26. A Seismicity Declustering Model Based on Weighted Kernel FCM Along with DPC Algorithm – Ashish Sharma and Satyasai Jagannath Nanda
27. Wearable Small, Narrow Band, Conformal, Low-Profile Antenna with Defected Ground for Medical Devices – Archana Tiwari and A. A. Khurshid
28. A CPW Fed Grounded Annular Ring Embedded Dual-Band Dual-Sense Circular Polarization Antenna for 5G/Wi-MAX and C-Band Satellite Applications – Krishna Chennakesava Rao Madaka and Pachiyannan Muthusamy
29. An Analytical Appraisal on Recent Trends and Challenges in Secret Sharing Schemes – Neetha Francis and Thomas Monoth
30. A Comparative Study on Sign Language Translation Using Artificial Intelligence Techniques – Damini Ponnappa and Bhat Geetalaxmi Jairam
31. WSN-IoT Integration with Artificial Intelligence: Research Opportunities and Challenges – Khyati Shrivastav and Ramesh B. Battula
32. Time Window Based Recommender System for Movies – Madhurima Banerjee, Joydeep Das, and Subhashis Majumder
33. Approximate Multiplier for Power Efficient Multimedia Applications – K. B. Sowmya and Rajat Raj
34. A Study on the Implications of NLARP to Optimize Double Q-Learning for Energy Enhancement in Cognitive Radio Networks with IoT Scenario – Jyoti Sharma, Surendra Kumar Patel, and V. K. Patle
35. Automatic Generation Control Simulation Study for Restructured Reheat Thermal Power System – Ram Naresh Mishra
36. Processing and Analysis of Electrocardiogram Signal Using Machine Learning Techniques – Gursirat Singh Saini and Kiranbir Kaur
37. Design of High Voltage Gain DC-DC Converter with Fuzzy Logic Controller for Solar PV System Under Dynamic Irradiation Conditions – CH Hussaian Basha, G. Devadasu, Nikita Patil, Abhishek Kumbhar, M. Narule, and B. Srinivasa Varma
38. SmartFog: A Profit-Aware Real-Time Resource Allocation Strategy for Fog/Edge Computing – Ipsita Dalui, Arnab Sarkar, and Amlan Chakrabarti
39. A Comparative Approach: Machine Learning and Adversarial Learning for Intrusion Detection – Madhura Mulimani, Rashmi Rachh, and Sanjana Kavatagi
40. Blockchain-Based Agri-Food Supply Chain Management – N. Anithadevi, M. Ajay, V. Akalya, N. Dharun Krishna, and S. Vishnu Adityaa
41. Data Balancing for a More Accurate Model of Bacterial Vaginosis Diagnosis – Jesús Francisco Perez-Gomez, Juana Canul-Reich, Rafael Rivera-Lopez, Betania Hernández Ocaña, and Cristina López-Ramírez
42. Approximate Adder Circuits: A Comparative Analysis and Evaluation – Pooja Choudhary, Lava Bhargava, and Virendra Singh
43. Effect of Traffic Stream Speed on Stream Equivalency Values in Mixed Traffic Conditions on Urban Roads – K. C. Varmora, P. J. Gundaliya, and T. L. Popat
44. Intelligent System for Cattle Monitoring: A Smart Housing for Dairy Animal Using IoT – Sanjay Mate, Vikas Somani, and Prashant Dahiwale
45. Energy-Efficient Approximate Arithmetic Circuit Design for Error Resilient Applications – V. Joshi and P. Mane
46. Continuous Real Time Sensing and Estimation of In-Situ Soil Macronutrients – G. N. Shwetha and Bhat GeetaLaxmi Jairam
47. Design and Development of Automated Groundstation System for Beliefsat-1 – Rinkesh Sante, Jatin Bhosale, Shrutika Bhosle, Pavan Jangam, Umesh Shinde, Kavita Bathe, Devanand Bathe, and Tilottama Dhake
48. Towards Developing a Deep Learning-Based Liver Segmentation Method – Snigdha Mohanty, Subhashree Mishra, Sudhansu Shekhar Singh, and Sarada Prasad Dakua
49. Review on Vision-Based Control Using Artificial Intelligence in Autonomous Ground Vehicle – Abhishek Thakur and Sudhanshu Kumar Mishra
50. Ensemble Learning Based Feature Selection for Detection of Spam in the Twitter Network – K. Kiruthika Devi, G. A. Sathish Kumar, and B. T. Shobana
51. Small-Scale Islanded Microgrid for Remotely Located Load Centers with PV-Wind-Battery-Diesel Generator – Deepak Gauttam, Amit Arora, Mahendra Bhadu, and Shikha
52. A Review on Early Diagnosis of Lung Cancer from CT Images Using Deep Learning – Maya M. Warrier and Lizy Abraham
53. A Context-Based Approach to Teaching Dynamic Programming – András Kakucs, Zoltán Kátai, and Katalin Harangus
54. On the Applicability of Possible Theory-Based Approaches for Ranking Fuzzy Numbers – Monika Gupta and R. K. Bathla
55. Change Detection of Mangroves at Subpixel Level of Synthesized Hyperspectral Data Using Multifractal Analysis Method – Dipanwita Ghosh, Somdatta Chakravortty, and Tanumi Kumar
56. Analysis of the Behavior of Metamaterial Unit Cell with Respect to Change in Its Structural Parameters – Shipra Tiwari, Pramod Sharma, and Shoyab Ali
57. Mid-Term Load Forecasting by LSTM Model of Deep Learning with Hyper-Parameter Tuning – Ashish Prajesh, Prerna Jain, and Satish Sharma
58. A Comprehensive Survey: Benefits, Recent Works, Challenges of Optimal UAV Placement for Maximum Target Coverage – Spandana Bandari and L. Nirmala Devi
59. Comparative Study Between Different Algorithms of Data Compression and Decompression Techniques – Babacar Isaac Diop, Amadou Dahirou Gueye, and Alassane Diop
60. A Unique Multident Wideband Antenna for TV White Space Communication – Ankit Meghwal, Garima Saini, and Balwinder Singh Dhaliwal
61. Development of a Deep Neural Network Model for Predicting Reference Crop Evapotranspiration from Climate Variables – T. R. Jayashree, N. V. Subba Reddy, and U. Dinesh Acharya
62. A Novel Efficient AI-Based EEG Workload Assessment System Using ANN-DL Algorithm – R. Ramasamy, M. Anto Bennet, M. Vasim Babu, T. Jayachandran, V. Rajmohan, and S. Janarthanan

Author Index
About the Editors
Prof. Rajendra Prasad Yadav is currently working as a Professor-HAG in the Department of Electronics and Communication Engineering, Malaviya National Institute of Technology, Jaipur, Rajasthan, India. He has more than four decades of teaching and research experience. He was instrumental in starting new B.Tech and M.Tech courses and in formulating Ph.D ordinances for starting research work at Rajasthan Technical University (RTU) Kota and other affiliated engineering colleges as Vice Chancellor of the university. He has served as HOD of Electronics and Communication Engineering, President of Sports and Library, Hostel Warden, and Dean of Student Affairs at MNIT Jaipur, and has also been the Chief Vigilance Officer of MNIT Jaipur since 2015. Prof. Yadav received his Ph.D degree from MREC Jaipur and his M.Tech degree from IIT Delhi. Under his supervision, 15 Ph.D students have received their Ph.D degrees and 7 students are working toward theirs; forty M.Tech students have carried out their dissertation work under his guidance. He has published more than 200 peer-reviewed research papers, which have received 1150 citations. His research interests are error control codes and MIMO-OFDM, RF and antenna systems, mobile and wireless communication systems, optical switching and materials, mobile ad hoc and sensor networks, device characterization and MEMS, and cognitive radio networks.

Dr. Satyasai Jagannath Nanda has been an assistant professor in the Department of Electronics and Communication Engineering, Malaviya National Institute of Technology Jaipur, since June 2013. Prior to joining MNIT Jaipur, he received his Ph.D degree from the School of Electrical Sciences, IIT Bhubaneswar, and his M.Tech degree from the Department of Electronics and Communication Engineering, NIT Rourkela. He was the recipient of the Canadian Research Fellowship GSEP from the Department of Foreign Affairs and International Trade (DFAIT), Govt. of Canada, for the year 2009-10. He was awarded the Best Ph.D Thesis Award at SocProS 2015 by IIT Roorkee, and received best research paper awards at SocProS 2020 at IIT Indore, IC3 2018 at SMIT Sikkim, SocProS 2017 at IIT Bhubaneswar, IEEE UPCON 2016 at IIT BHU, and Springer OWT 2017 at MNIT. He is the recipient of the prestigious IEI Young Engineers Award of the Institution of Engineers, India, in the field of Electronics and Telecommunication Engineering for the year 2018-19. Dr. Nanda is a Senior Member of IEEE and the IEEE Computational Intelligence Society. He has received travel and research grants from SERB, UGC, CCSTDS (INSA), and INAE. To date he has published 40 SCI/Scopus journal articles and 50 international conference papers, which have received almost twelve hundred citations. He is in charge of the Digital Signal and Image Processing (DSIP) Lab at MNIT Jaipur. Under his supervision at MNIT Jaipur, six researchers have been awarded their Ph.D and five researchers are continuing their research work; he has also supervised 22 M.Tech theses. Dr. Nanda is co-coordinator of the Electronics and ICT Academy at MNIT Jaipur, set up by the Ministry of Electronics and IT, Govt. of India, with a grant of Rs. 10 crore.

Dr. Prashant Singh Rana is presently working as Associate Professor in the Computer Science and Engineering Department, Thapar Institute of Engineering & Technology, Patiala, Punjab. He received both his Ph.D and M.Tech from ABV-IIITM, Gwalior. His areas of research are machine learning, deep learning, bioinformatics, and optimization. He has published more than 70 research papers in different journals and conferences, completed five projects sponsored by DST, ICMR, and NVIDIA with one project ongoing, and published 10 patents. He has guided seven Ph.D students and 18 Masters students.

Dr. Meng-Hiot Lim is a faculty member in the School of Electrical and Electronic Engineering, holding a concurrent appointment as Deputy Director of the M.Sc in Financial Engineering and the Centre for Financial Engineering, anchored at the Nanyang Business School. He is a versatile researcher with diverse interests, focusing on computational intelligence, evolvable hardware, finance, algorithms for UAVs, and memetic computing. He is currently the Editor-in-Chief of the journal Memetic Computing, published by Springer, and the Series Editor of the Springer book series "Studies in Evolutionary Learning and Optimization".
Chapter 1
Optimized Watermarking Scheme for Copyright Protection of Medical Images
Rohit Thanki and Purva Joshi
1 Introduction

Image sharing is easy with today's open-access media. However, attackers or imposters can manipulate images on open-access media, which leads to copyright issues. Because of this, if an image is shared on an open-access medium, it must be protected by copyright. Watermarking can be used to solve this problem [1–11]. Watermarked content is generated from the cover image using an embedding factor. There are many types of watermarking, classified by the content of the cover, the processing domain, the resistance to attack, and the extraction method [1, 6]. Watermarks can be applied as text, image, or signal watermarks, depending on the content of the cover. Based on the processing domain, watermarking can be classified into four types: spatial domain, transform domain, hybrid domain, and sparse domain. According to their resistance to attacks, watermarks can be classified as robust or fragile. Additionally, based on how the watermark is extracted, watermarking can be blind, non-blind, or semi-blind. According to the literature [1–11], watermark embedding is performed using an embedding factor: the watermark information is inserted into, or modifies, the content of the cover medium according to the embedding factor value. In the literature, researchers have created watermarked information using their own embedding factors; unfortunately, there is no standard for embedding factors. Therefore, an optimization process is required for embedding-factor standardization: it identifies the optimal embedding factor that produces the best results for the watermarking algorithm.
R. Thanki (B) Senior Member, IEEE Gujarat Section, Rajkot, India e-mail: [email protected] P. Joshi University of Pisa, Pisa, Italy © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 R. P. Yadav et al. (eds.), Proceedings of the International Conference on Paradigms of Computing, Communication and Data Sciences, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-19-8742-7_1
The watermarking algorithm uses various optimization algorithms to find an optimal embedding factor. The embedding factor is determined by watermarking evaluation parameters combined in a fitness function f. Watermark transparency is measured by the peak signal-to-noise ratio (PSNR) or weighted PSNR. The structural similarity index measure (SSIM), bit error rate (BER), and bit correction rate (BCR) are used to measure robustness, and the watermark size and cover data size are used to calculate payload capacity. In the fitness function below, transparency, robustness, and payload capacity are measured by PSNR, NC, and PC, respectively. As a result, the fitness function may look like this:

$$f_m = \mathrm{PSNR}_m + w_1 \cdot \mathrm{NC}_m + w_2 \cdot \mathrm{PC}_m \qquad (1)$$
where m is the iteration number and w1 and w2 are weighting factors. The embedding factor is used to generate the watermarked image. In existing work, embedding factors were chosen manually according to the researchers' specifications, which is time consuming; optimization algorithms are therefore used to determine the best embedding factor for a watermarking technique. The embedding factor depends on the cover data, the watermark, and the embedding process. An array of optimization algorithms can be used for this purpose, including genetic algorithms (GAs) [12], particle swarm optimization (PSO) [13], genetic programming (GP) [14, 15], differential evolution (DE) [16], simulated annealing (SA) [17], tabu search [18], ant colony optimization [19], and harmony search [20]. Due to its ease of understanding and implementation, the PSO algorithm is widely used for optimizing embedding factors in watermarking [21]. The literature contains a few watermarking schemes using PSO to protect medical images from copyright infringement [22–25]. Findik et al. [22] proposed a PSO watermarking scheme for cover color images: a pseudo-random noise generator was used to determine where the watermark image should be embedded, and the watermark bit was inserted into a block of the cover image using PSO after finding the best pixel locations in that block. The payload capacity of this scheme is smaller than that of a blind scheme. Watermarking schemes based on PSO and the discrete wavelet transform (DWT) have been proposed by Fakhari et al. [23] and Wang et al. [24]. The Fakhari scheme used a grayscale medical cover image, while the Wang scheme used a standard grayscale image; both are non-blind schemes with limited payload capacity. Wang's scheme is robust against standard watermarking attacks, whereas Fakhari's is robust against geometric attacks. A DWT watermarking scheme was also presented by Chakraborty et al. [25], in which watermark bits select PN sequences that are added to a detail wavelet sub-band of the host image, with PSO used to find the embedding factors; the authors do not discuss the scheme's robustness against attacks in their paper. A new blind watermarking scheme is developed and proposed for medical images in this paper to overcome some limitations of the existing techniques [22, 23]. Watermarking schemes using RDWT offer better transparency and payload capacity than
those using wavelet transforms alone. The scheme has the following main features: (1) it uses the properties of RDWT and PSO to overcome some of the limitations of standard watermarking procedures, such as the selection of the embedding factor and limited payload capacity; (2) watermark images can be blindly extracted, which cannot be done with the existing schemes [23, 25]. The results show that the proposed scheme is robust. Moreover, the proposed scheme provides greater transparency and payload capacity than existing schemes such as Fakhari's [23] and Chakraborty's [25], and it selects the embedding factor through an optimization mechanism, improving on the traditional trade-offs of watermarking schemes. The paper proceeds as follows: Sect. 2 describes the proposed blind and optimized watermarking scheme; Sect. 3 presents the experimental results and a comparison with existing schemes; Sect. 4 concludes the paper.
2 Proposed Scheme

The PSO algorithm has been used to implement a robust and blind watermarking scheme [25] based on the discrete wavelet transform (DWT). However, the payload capacity of that scheme is very low, its watermarked images are less transparent, and the scheme applies only to certain types of images or signals. Using block RDWT and PSO, we propose an approach that embeds a monochrome watermark directly into the LH, HL, and HH wavelet sub-bands of the cover medical image. The detail wavelet sub-bands LH, HL, and HH carry less visually significant information, which makes them suitable for embedding; the proposed scheme therefore embeds the watermark in the detail sub-bands. The sub-bands are divided into non-overlapping blocks, and two uncorrelated noise sequences are used to embed watermark bits 0 and 1. For each noise sequence, the coefficients of its corresponding sub-band are modified using the optimal embedding factor, which is determined by the PSO algorithm. The proposed scheme consists of two processes: embedding and extraction.
2.1 Embedding Process

The steps of the embedding process are given below.

Step 1. A single-level RDWT is used to decompose the cover medical image into wavelet sub-bands: LL (approximation sub-band) and LH, HL, and HH (detail sub-bands).
Step 2. The monochrome watermark image is converted into a binary sequence.
Step 3. The detail wavelet sub-bands LH, HL, and HH are divided into non-overlapping blocks.
Step 4. Two uncorrelated noise sequences, each equal in size to a block, are generated by a noise generator; S0 and S1 are the noise sequences for watermark bits 0 and 1.
Step 5. The watermark sequence modifies the coefficients of the detail wavelet sub-bands based on the optimal embedding factor (k) obtained from PSO. This procedure is carried out for all coefficients of each block of the cover medical image.
Step 6. The output of step 5 is a set of modified wavelet sub-bands. A single-level inverse RDWT is applied to these modified sub-bands and the unmodified approximation sub-band to generate the watermarked medical image.
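To make these steps concrete, the following is a minimal sketch of the embedding process, assuming PyWavelets' stationary wavelet transform (swt2) as the RDWT, a Haar wavelet, 8 × 8 blocks, and one watermark bit per block; the wavelet choice, block size, PN-sequence seed, and bit-to-block mapping are illustrative assumptions, not the authors' exact configuration.

```python
import numpy as np
import pywt

def embed_watermark(cover, wm_bits, k, block=8, seed=7):
    """Sketch of Sect. 2.1 with the RDWT approximated by a stationary
    wavelet transform; cover dimensions must be even for swt2."""
    # Step 1: single-level RDWT -> LL (cA) and detail sub-bands LH/HL/HH.
    cA, (cH, cV, cD) = pywt.swt2(np.asarray(cover, float), "haar", level=1)[0]
    # Step 4: two uncorrelated PN sequences for watermark bits 0 and 1.
    rng = np.random.default_rng(seed)
    S0 = rng.standard_normal((block, block))
    S1 = rng.standard_normal((block, block))
    n = 0
    for sub in (cH, cV, cD):                     # Step 3: detail sub-bands
        for r in range(0, sub.shape[0] - block + 1, block):
            for c in range(0, sub.shape[1] - block + 1, block):
                if n == len(wm_bits):
                    break
                S = S1 if wm_bits[n] else S0     # Step 5: modify coefficients
                sub[r:r + block, c:c + block] += k * S
                n += 1
    # Step 6: inverse RDWT of modified details + unmodified approximation.
    return pywt.iswt2([(cA, (cH, cV, cD))], "haar")
```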
2.2 Extraction Process

The steps of the extraction process are given below.

Step 1. The watermarked medical image is decomposed into wavelet sub-bands using a single-level RDWT: an approximation sub-band and the detail sub-bands WLH, WHL, and WHH. WLH, WHL, and WHH are divided into non-overlapping blocks.
Step 2. The uncorrelated noise sequences generated during embedding are taken.
Step 3. Using the correlation between the noise sequences (S0, S1) and the detail wavelet coefficients (WLH, WHL, and WHH), the watermark bit is recovered from the detail wavelet coefficients of the watermarked medical image. C1 and C0 denote the correlation of the coefficients with noise sequences S1 and S0, respectively.
Step 4. Whenever C0 > C1, bit 0 is selected as the watermark bit value; otherwise, bit 1 is chosen.
Step 5. The recovered watermark image is created by reshaping the extracted sequence into a matrix.
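A matching sketch of the blind extraction, continuing the embedding sketch above with the same assumed wavelet, block size, and shared PN-sequence seed:

```python
import numpy as np
import pywt

def extract_watermark(watermarked, n_bits, block=8, seed=7):
    """Sketch of Sect. 2.2: blind extraction by correlating each block
    with the two PN sequences regenerated from the shared seed."""
    # Step 1: single-level RDWT of the watermarked image.
    _, (cH, cV, cD) = pywt.swt2(np.asarray(watermarked, float), "haar", level=1)[0]
    # Step 2: regenerate the noise sequences used during embedding.
    rng = np.random.default_rng(seed)
    S0 = rng.standard_normal((block, block)).ravel()
    S1 = rng.standard_normal((block, block)).ravel()
    bits = []
    for sub in (cH, cV, cD):
        for r in range(0, sub.shape[0] - block + 1, block):
            for c in range(0, sub.shape[1] - block + 1, block):
                if len(bits) == n_bits:
                    return np.array(bits)
                blk = sub[r:r + block, c:c + block].ravel()
                c0 = np.corrcoef(blk, S0)[0, 1]   # correlation C0
                c1 = np.corrcoef(blk, S1)[0, 1]   # correlation C1
                # Steps 3-4: pick bit 0 when C0 > C1, otherwise bit 1.
                bits.append(0 if c0 > c1 else 1)
    return np.array(bits)
```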
2.3 Generation of Optimal Embedding Factor

Any watermarking scheme depends on the embedding factor to meet its basic requirements. A watermarked image generated with a large embedding factor has lower transparency, but the watermark recovered from it has better quality. Many existing schemes in the literature keep the embedding factor constant, but a constant factor does not work well across different multimedia data; adaptive schemes are therefore needed to calculate appropriate embedding factors for the various types of multimedia data. This paper combines a block RDWT-based watermarking scheme with a popular optimization algorithm, PSO, to provide an optimal embedding factor value. In the proposed scheme, PSNR and NC are used to calculate the fitness function for each particle of the population, and the optimal solution (gbest) is selected as the one with the maximum fitness value. Equation (2) gives the fitness function:

$$\text{fitness} = \mathrm{PSNR}(C, W_A) + 100 \cdot \mathrm{NC}(w, w') \qquad (2)$$

where PSNR denotes the peak signal-to-noise ratio and NC the normalized correlation; C indicates the cover medical image, W_A the watermarked medical image, and w and w' the original and recovered watermark images. According to the experimental results, the proposed fitness function works well. The PSO parameters are selected to facilitate comparison between schemes: fixed values are used for the acceleration constants C1 and C2, the number of particles is 5, the number of iterations is 8, and the initial weight α is 0.9. Trials on the experimental medical images found the best embedding-factor range to lie between 0.0 and 250.0. The optimal embedding factors k provided by the PSO algorithm for the generation of watermarked medical images are given in Table 1.

Table 1 Obtained optimal embedding factors for the proposed scheme

| Embedding factor range | k1 | k2 | k3 |
|---|---|---|---|
| 0.0–1.0 | 0.5483 | 0.6346 | 0.8362 |
| 0.0–2.0 | 1.5718 | 1.8092 | 1.6667 |
| 0.0–3.0 | 2.0803 | 2.7933 | 2.6394 |
| 0.0–4.0 | 3.8015 | 3.8167 | 3.8451 |
| 0.0–8.0 | 7.6857 | 6.2683 | 5.6797 |
| 0.0–10.0 | 8.0542 | 9.8810 | 7.6335 |
| 0.0–50.0 | 42.8845 | 49.0747 | 44.0159 |
| 0.0–150.0 | 143.4951 | 148.4743 | 142.2608 |
| 0.0–250.0 | 233.1310 | 225.7728 | 218.1750 |
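To illustrate the optimization loop, here is a minimal PSO sketch for a single scalar embedding factor k that maximizes the fitness of Eq. (2), reusing the embedding and extraction sketches above. The swarm size (5), iteration count (8), and initial weight (0.9) follow the text; the acceleration constants c1 = c2 = 2.0 and the PSNR/NC helper implementations are standard choices assumed here, not taken from the paper.

```python
import numpy as np

def psnr(a, b):
    mse = np.mean((np.asarray(a, float) - np.asarray(b, float)) ** 2)
    return 10 * np.log10(255.0 ** 2 / mse)

def nc(w, w_rec):
    w, w_rec = np.asarray(w, float), np.asarray(w_rec, float)
    return np.sum(w * w_rec) / np.sqrt(np.sum(w * w) * np.sum(w_rec * w_rec))

def pso_embedding_factor(cover, wm_bits, k_max,
                         n_particles=5, iters=8, w0=0.9, c1=2.0, c2=2.0):
    rng = np.random.default_rng(0)
    k = rng.uniform(0.0, k_max, n_particles)   # particle positions
    v = np.zeros(n_particles)                  # particle velocities

    def fitness(kk):                           # Eq. (2)
        wa = embed_watermark(cover, wm_bits, kk)
        w_rec = extract_watermark(wa, len(wm_bits))
        return psnr(cover, wa) + 100 * nc(wm_bits, w_rec)

    pbest = k.copy()
    pfit = np.array([fitness(x) for x in k])
    gbest = pbest[np.argmax(pfit)]
    for _ in range(iters):
        r1, r2 = rng.random(n_particles), rng.random(n_particles)
        v = w0 * v + c1 * r1 * (pbest - k) + c2 * r2 * (gbest - k)
        k = np.clip(k + v, 0.0, k_max)         # keep k inside the search range
        fit = np.array([fitness(x) for x in k])
        better = fit > pfit
        pbest[better], pfit[better] = k[better], fit[better]
        gbest = pbest[np.argmax(pfit)]         # global-best embedding factor
    return gbest
```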
3 Experimental Results

A grayscale MRI image (512 × 512 pixels) [26] is used as the cover medical image and a hospital logo (64 × 64) as the watermark image in testing the proposed scheme (Fig. 1). In this example, the watermark image is inserted directly into the cover medical image, and the resultant images obtained with the proposed scheme are shown in Fig. 2. Table 2 summarizes the PSNR and NC values of the proposed scheme: PSNR measures the imperceptibility (transparency) of the embedded watermark in the cover image, while NC measures the robustness of extracting the watermark from the watermarked medical image.
Fig. 1 a Cover (MRI) image, b watermark image

Fig. 2 Generated watermarked medical images and recovered watermark images for embedding factor ranges from 0.0–1.0 up to 0.0–250.0
T_EMB (s) and T_EXT (s) indicate how long it takes to embed the watermark into a cover medical image and to extract the watermark from that image, respectively. In total, the algorithm generates a watermarked medical image in around 3 s.

Table 2 Quality measure values of the proposed scheme

| Range of embedding factor | PSNR (dB) | NC | T_EMB (s) | T_EXT (s) |
|---|---|---|---|---|
| 0.0–1.0 | 63.02 | 0.6805 | 1.9943 | 1.5326 |
| 0.0–2.0 | 55.05 | 0.7872 | 1.4002 | 1.5577 |
| 0.0–3.0 | 51.58 | 0.8493 | 1.4411 | 1.5326 |
| 0.0–4.0 | 47.95 | 0.9035 | 1.4463 | 1.5061 |
| 0.0–8.0 | 43.14 | 0.9475 | 1.3609 | 1.6488 |
| 0.0–10.0 | 40.88 | 0.9667 | 1.3595 | 1.5823 |
| 0.0–50.0 | 26.44 | 0.9862 | 1.5518 | 1.7159 |
| 0.0–150.0 | 16.37 | 0.9975 | 1.5061 | 1.6962 |
| 0.0–250.0 | 12.50 | 0.9996 | 1.5345 | 1.6310 |

Table 3 summarizes the NC values of the recovered watermark images under different watermarking attacks. The results show that the scheme also provides robustness for medical images, which indicates that telemedicine applications can use it to secure medical images.

Table 3 NC values of the proposed scheme under different watermarking attacks

| Attack | NC | Attack | NC |
|---|---|---|---|
| JPEG (Q = 80) | 0.9996 | Motion blurring | 0.8067 |
| JPEG (Q = 25) | 0.9993 | Gaussian blurring | 0.9996 |
| Median filtering (3 × 3) | 0.9979 | Sharpening | 1.0000 |
| Gaussian noise (variance = 0.005) | 0.9996 | Histogram equalization | 1.0000 |
| Salt and pepper noise (variance = 0.005) | 0.9996 | Rotation (20°) | 0.5649 |
| Speckle noise (variance = 0.005) | 0.9996 | Cropping (20%) | 0.9996 |
| Intensity adjustment | 1.0000 | Scaling (512–256–512) | 0.9755 |

In Table 4, the proposed scheme is compared with the Fakhari and Chakraborty schemes, which provide similar copyright protection for medical images. For embedding the watermark image, the Fakhari scheme [23] and the Chakraborty scheme [25] use DWT, whereas the proposed scheme uses RDWT. Fakhari's scheme [23] has a payload capacity of 10 bits and Chakraborty's scheme [25] a payload capacity of 1024 bits, both less than the payload capacity of the proposed scheme. The proposed scheme therefore performs better in transparency and payload capacity: its higher PSNR values in Table 4 imply a higher degree of transparency of the watermarked medical images.

Table 4 Performance comparison of the proposed scheme with the existing schemes [23, 25]

| Features | Fakhari scheme [23] | Chakraborty scheme [25] | Proposed scheme |
|---|---|---|---|
| Used transform | DWT | DWT | RDWT |
| PSNR_max (dB) | 51.55 | 25.7253 | 63.02 |
| Payload capacity (bits) | 10 | 1024 | 4096 |

Table 5 summarizes the performance comparison of the proposed scheme with recently published schemes (2022, 2021) [27–29]. The comparison is based on PSNR and NC values and on the optimization algorithm used; the proposed scheme outperformed these recently published schemes.

Table 5 Performance comparison of the proposed scheme with recently published schemes (2022, 2021) [27–29]

| Features | Rezaee scheme [27] | Sharma scheme [28] | Golda scheme [29] | Proposed scheme |
|---|---|---|---|---|
| Used optimization algorithm | Whale | Firefly | Social group | Particle swarm optimization |
| PSNR_max (dB) | 39.87 | 57.58 | 23.78 | 63.02 |
| NC_max | 0.9807 | Not reported | Not reported | 0.9996 |
4 Conclusion

For copyright protection of medical images, we present a watermarking scheme based on RDWT and PSO. The RDWT is used to increase payload capacity, while PSO is used to generate the optimal embedding factor. The proposed scheme for embedding a secret logo into medical images for copyright protection was secure and accurate. A limitation of the proposed scheme is that it can only embed binary watermark images. Overall, the proposed scheme performed better than the existing schemes.
References

1. Langelaar GC, Setyawan I, Lagendijk RL (2000) Watermarking digital image and video data. A state-of-the-art overview. IEEE Signal Process Mag 17(5):20–46
2. Thanki R, Borra S, Dwivedi V, Borisagar K (2017) An efficient medical image watermarking scheme based on FDCuT–DCT. Eng Sci Technol Int J 20(4):1366–1379
3. Lakshmi HR, Surekha B, Raju SV (2017) Real-time implementation of reversible watermarking. In: Intelligent techniques in signal processing for multimedia security. Springer, Cham, pp 113–132
4. Thanki R, Borra S (2018) A color image steganography in hybrid FRT–DWT domain. J Inf Secur Appl 40:92–102
5. Thanki R, Dwivedi V, Borisagar K, Borra S (2017) A watermarking algorithm for multiple watermarks protection using SVD and compressive sensing. Informatica 41(4):479–493
6. Borra S, Lakshmi H, Dey N, Ashour A, Shi F (2017) Digital image watermarking tools: state-of-the-art. In: Information technology and intelligent transportation systems: proceedings of the 2nd international conference on information technology and intelligent transportation systems, vol 296, Xi'an, China, p 450
7. Surekha B, Swamy GN (2013) Sensitive digital image watermarking for copyright protection. IJ Netw Secur 15(2):113–121
8. Surekha B, Swamy G, Reddy KRL (2012, July) A novel copyright protection scheme based on visual secret sharing. In: 2012 third international conference on computing communication & networking technologies (ICCCNT). IEEE, pp 1–5
9. Dey N, Roy AB, Das A, Chaudhuri SS (2012, October) Stationary wavelet transformation based self-recovery of blind-watermark from electrocardiogram signal in wireless telecardiology. In: International conference on security in computer networks and distributed systems. Springer, Berlin, Heidelberg, pp 347–357
10. Dey N, Dey G, Chakraborty S, Chaudhuri SS (2014) Feature analysis of blind watermarked electromyogram signal in wireless telemonitoring. In: Concepts and trends in healthcare information systems. Springer, Cham, pp 205–229
11. Dey N, Ashour AS, Chakraborty S, Banerjee S, Gospodinova E, Gospodinov M, Hassanien AE (2017) Watermarking in biomedical signal processing. In: Intelligent techniques in signal processing for multimedia security. Springer, Cham, pp 345–369
12. Holland JH (1975) Adaptation in natural and artificial systems. University of Michigan Press
13. Kennedy J, Eberhart RC (1995) Particle swarm optimization. In: Proceedings of the 1995 IEEE international conference on neural networks, Perth, Australia, pp 1942–1948
14. Koza JR (1992) Genetic programming: on the programming of computers by means of natural selection. MIT Press
15. Koza JR (1994) Genetic programming as a means for programming computers by natural selection. Stat Comput 4(2):87–112
16. Storn R, Price K (1997) Differential evolution – a simple and efficient heuristic for global optimization over continuous spaces. J Global Optim 11(4):341–359
17. Kirkpatrick S, Gelatt CD, Vecchi MP (1983) Optimization by simulated annealing. Science 220(4598):671–680
18. Glover F (1977) Heuristics for integer programming using surrogate constraints. Dec Sci 8(1):156–166
19. Dorigo M, Maniezzo V, Colorni A (1996) Ant system: optimization by a colony of cooperating agents. IEEE Trans Syst Man Cybern Part B 26(1):29–41
20. Geem ZW, Kim JH, Loganathan GV (2001) A new heuristic optimization algorithm: harmony search. Simulation 76(2):60–68
21. Li X, Wang J (2007) A steganographic method based upon JPEG and particle swarm optimization algorithm. Inf Sci 177(15):3099–3109
22. Findik O, Babaoğlu İ, Ülker E (2010) A color image watermarking scheme based on hybrid classification method: particle swarm optimization and k-nearest neighbor algorithm. Opt Commun 283(24):4916–4922
23. Fakhari P, Vahedi E, Lucas C (2011) Protecting patient privacy from unauthorized release of medical images using a bio-inspired wavelet-based watermarking approach. Digital Signal Process 21(3):433–446
24. Wang YR, Lin WH, Yang L (2011) An intelligent watermarking method based on particle swarm optimization. Expert Syst Appl 38(7):8024–8029
25. Chakraborty S, Samanta S, Biswas D, Dey N, Chaudhuri SS (2013, December) Particle swarm optimization-based parameter optimization technique in medical information hiding. In: 2013 IEEE international conference on computational intelligence and computing research (ICCIC). IEEE, pp 1–6
26. MedPix™ Medical Image Database. http://rad.usuhs.mil/medpix/medpix.html, https://medpix.nlm.nih.gov/home. Last accessed: 2021
27. Rezaee K, SaberiAnari M, Khosravi MR (2022) A wavelet-based robust medical image watermarking technique using whale optimization algorithm for data exchange through internet of medical things. In: Intelligent healthcare. Springer, Singapore, pp 373–394
28. Sharma S, Choudhary S, Sharma VK, Goyal A, Balihar MM (2022) Image watermarking in frequency domain using Hu's invariant moments and firefly algorithm. Int J Image Graph Signal Process 2:1–15
29. Golda D, Prabha B, Murali K, Prasuna K, Vatsav SS, Adepu S (2021) Robust image watermarking using the social group optimization algorithm. Mater Today Proc
Chapter 2
MobileNet + SSD: Lightweight Network for Real-Time Detection of Basketball Player
Banoth Thulasya Naik and Mohammad Farukh Hashmi
1 Introduction

In the field of computer vision, sports video analysis is one of the important topics. Much research, in particular, emphasizes field sports like basketball, soccer, and field hockey, which are extremely popular outdoor sports all over the world. Analysis of field sports videos can be used for a variety of purposes, including event detection and player/team activity analysis. Low-level structural processes, such as player detection, classification, and tracking, are required for high-level applications. Player detection is a challenging problem even though it is generally the first step in sports video analysis; the reason is that basketball is an extremely dynamic sport, with players continuously changing their positions and postures. This paper offers a robust and efficient system for real-time player detection in basketball that detects the position of the player in sports videos. Only one camera view is displayed at any given moment. The player's size does not always remain consistent as the camera moves from one side of the court to the other and zooms in at various points. Furthermore, some body parts may appear foreshortened in comparison to others as players change their angle in relation to the camera. Moreover, it is extremely common for a player to be partially covered by other players. This paper focuses on basketball player detection using a dynamic camera infrastructure. In field sports, player detection algorithms must deal with a wide range of challenges, including changing lighting and weather conditions, as well as positional changes of players in pictures, such as size and rotation, depending on the camera viewpoint. Depending on the distance from the camera and the direction in which they travel, players may appear at varying sizes, resolutions, and orientations. As there is a broad range of player uniform colors and textures, the team uniform and lighting have a significant impact on player appearance. An approach is proposed to address these problems for real-time player detection in basketball sports. The rest of the paper is organized as follows: Sect. 2 provides a review of the existing literature related to player detection; the proposed methodology is discussed in Sect. 3; the experimental results and performance metrics are presented in Sect. 4; finally, the conclusion and scope of future work are discussed in Sect. 5.

B. T. Naik (B) · M. F. Hashmi
Department of Electronics and Communication Engineering, National Institute of Technology, Warangal, India
e-mail: [email protected]
M. F. Hashmi e-mail: [email protected]

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023. R. P. Yadav et al. (eds.), Proceedings of the International Conference on Paradigms of Computing, Communication and Data Sciences, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-19-8742-7_2
2 Literature Survey

One of the most basic tasks in computer vision is extracting information from images. Researchers have developed systems that use structure-from-motion [1] to obtain geometric information, and detection and classification to find semantic information. Detecting players in images and videos is significant for a wide range of applications [2]; intelligent broadcast systems, for example, employ player positions to choose broadcast camera viewpoints [3, 4]. The authors of [5] proposed a mechanism to classify group activities by detecting players. Player detection and team categorization also offer metadata for player tracking [6], player pose estimation, and team strategy analysis [7]. As a subset of object detection in sports, player detection has attracted a lot of attention. Background subtraction-based approaches were presented in [8–10] to achieve real-time response in a basketball game; nonetheless, to detect foreground objects reliably, all of these techniques assume that the camera is stationary. Several learning-based approaches, such as Faster R-CNN [11] and YOLO [12], can be adapted to identify players with high detection accuracy; however, due to poor pixel resolution, they may miss distant players.
3 Methodology

The detection algorithm comprises a dual mechanism: a backbone and a head. In essence, the backbone is a network pre-trained for image classification; here, MobileNet [13], which was trained on over a million images, is used as the backbone, while SSD [14] is used as the head, as shown in Fig. 1. The SSD head contains various fixed layers, and the results are expressed as the classes of the predicted and ground-truth bounding boxes at the final one-dimensional fully connected layer.
Fig. 1 Architecture of MobileNetv1 + SSD detector
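As a sketch of how such an architecture can be assembled, the following Keras code taps two MobileNetv1 feature maps and attaches SSD-style class and box prediction convolutions. The tapped layer names come from Keras' MobileNet implementation, but the choice of layers, anchor count, and class count are illustrative assumptions, not the exact detector configuration used in this work.

```python
import tensorflow as tf

NUM_CLASSES = 2          # player + background (assumption)
ANCHORS_PER_CELL = 4     # anchors per feature-map cell (assumption)

def build_ssd_mobilenet(input_shape=(512, 512, 3)):
    backbone = tf.keras.applications.MobileNet(
        input_shape=input_shape, include_top=False, weights="imagenet")
    # Two multi-scale feature maps from the MobileNetv1 backbone.
    feats = [backbone.get_layer(n).output
             for n in ("conv_pw_11_relu", "conv_pw_13_relu")]
    cls_outs, box_outs = [], []
    for i, f in enumerate(feats):
        # Per-cell class scores and box offsets, SSD-style.
        cls = tf.keras.layers.Conv2D(ANCHORS_PER_CELL * NUM_CLASSES, 3,
                                     padding="same", name=f"cls_{i}")(f)
        box = tf.keras.layers.Conv2D(ANCHORS_PER_CELL * 4, 3,
                                     padding="same", name=f"box_{i}")(f)
        cls_outs.append(tf.keras.layers.Reshape((-1, NUM_CLASSES))(cls))
        box_outs.append(tf.keras.layers.Reshape((-1, 4))(box))
    return tf.keras.Model(
        backbone.input,
        [tf.keras.layers.Concatenate(axis=1)(cls_outs),
         tf.keras.layers.Concatenate(axis=1)(box_outs)])
```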
3.1 SSD Parameters

3.1.1 Grid Cell and Anchor Box

Player detection in an image entails determining the class and position of the objects in it. For example, an image divided into a 4 × 4 grid is shown in Fig. 2. The grid fixes the spacing and shape of the cells, and the anchor boxes act as receptive fields that cover the distinct parts of the image within the grid. In SSD, each grid cell is assigned several anchor (default) boxes, and these anchor boxes determine the shape and size of detections within each grid cell. Figure 2 shows two players, one matched by a tall anchor box and the other by a wide one, indicating that the anchor boxes are of various sizes. The class and location of an object are finalized by the anchor boxes that have a large intersection with it; this information is used to train the network and, once the network is trained, to predict the position of a detected object.

Fig. 2 Example of 4 × 4 grid and different size anchor boxes
Zoom Level and Aspect Ratio
The size of the anchor boxes does not have to be the same as the grid cells. It is used to determine the degree to which the anchor box wants the posterior to move upward or downward using grid cells. As illustrated in Fig. 2 based on varying grades, certain items in form are broader while others are longer. The SSD architecture allows anchor boxes to have a higher aspect ratio. The various aspect ratios of anchor box links are described using the range of proportions.
3.1.3 Receptive Field

The receptive field is the region of the input image visible to a particular feature of a convolutional neural network. Zeiler et al. [15] characterized it as the input region that contributes to a distinguishing feature activation at a given relative location. Because of the convolution operation, features of different layers correspond to regions of different sizes in the input image. In Fig. 3, starting with the bottom layer (5 × 5), a convolution produces the middle layer (3 × 3), in which a single green pixel represents a 3 × 3 section of the input (bottom) layer. A convolution is then applied to the middle (green) layer, giving the upper red layer (2 × 2), in which each individual feature corresponds to a 7 × 7 region of the input image. The green and red 2D arrays are feature maps: collections of features produced by applying the same feature extractor, in a sliding-window fashion, at different locations of the input map. Features within the same feature map therefore have receptive fields of the same size and look for the same pattern at different locations. As a result, the convolutional network operates locally.
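The figure's numbers are consistent with standard receptive-field arithmetic for two stacked 3 × 3 convolutions of stride 2 (an assumption, since the strides are not stated in the text); a small sketch of that arithmetic:

```python
def receptive_field(layers):
    """layers: list of (kernel_size, stride) pairs, bottom to top.
    rf grows by (kernel - 1) * jump per layer; jump (the distance in
    input pixels between adjacent features) is scaled by each stride."""
    rf, jump = 1, 1
    for k, s in layers:
        rf += (k - 1) * jump
        jump *= s
    return rf

print(receptive_field([(3, 2)]))          # 3 -> middle-layer feature sees 3x3
print(receptive_field([(3, 2), (3, 2)]))  # 7 -> top-layer feature sees 7x7
```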
Fig. 3 Feature map visualization and the receptive field

3.2 Implementation of Proposed Methodology

TensorFlow, Keras, and OpenCV libraries were used to build the deep learning models for player detection, which is akin to real-time object detection. First, the system was trained with known data, i.e., labeled basketball data, so that players appearing in unseen frames or videos can be detected. This work starts from a pre-trained lightweight detection model trained on third-party objects, whose training classes include most common objects but not the player class; therefore, some layers of the proposed network were modified so that the model could be trained on the labeled basketball data. Finally, combining the pre-trained network and the SSD approach, the system was ready to detect basketball players. Here, the pre-trained model (backbone) is MobileNetv1, which was combined with SSD to complete the player detection architecture. The architecture was then trained by detecting the labels of the training dataset based on bounding boxes. Frames are resized to a fixed resolution (512 × 512) for training the player detection model. MobileNetv1 was utilized as the backbone of the SSD architecture to improve detection accuracy and frame rate. To detect multiple players, this technique requires only a single shot, whereas techniques such as R-CNN, Fast R-CNN, and Faster R-CNN require two stages; the extra stage increases computation time as the number of parameters grows, which reduces detection speed. The SSD method divides the output space of bounding boxes into a sequence of default boxes with different proportions and sizes, quickly analyzes each default box for the presence of particular object classes, and combines the boxes to detect an exact object. The network also makes predictions from feature maps of several sizes and resolutions to handle objects of various scales. If no object is present in a region of the frame, that region is considered background and ignored.
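As an illustration of the deployment path this section describes, here is a hedged inference sketch using OpenCV's DNN module. The file names are placeholders for an exported, trained graph, and the (1, 1, N, 7) output layout is OpenCV's convention for TensorFlow SSD graphs, assumed here rather than taken from the paper.

```python
import cv2

# Hypothetical file names for a frozen TensorFlow SSD graph + config.
net = cv2.dnn.readNetFromTensorflow("frozen_inference_graph.pb",
                                    "ssd_mobilenet.pbtxt")

def detect_players(frame, conf_threshold=0.5):
    h, w = frame.shape[:2]
    blob = cv2.dnn.blobFromImage(frame, size=(512, 512), swapRB=True)
    net.setInput(blob)
    detections = net.forward()          # shape (1, 1, N, 7)
    players = []
    for det in detections[0, 0]:
        score = float(det[2])
        if score >= conf_threshold:
            # det[3:7] are normalized x1, y1, x2, y2 coordinates.
            x1, y1 = det[3] * w, det[4] * h
            x2, y2 = det[5] * w, det[6] * h
            players.append((int(x1), int(y1), int(x2), int(y2), score))
    return players
```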
4 Dataset and Experimental Results The proposed model carried out training on the basketball dataset [16] which was filmed during a basketball match (Resolution 1020 × 1080), and it contains a variable number of players, i.e., some frames contain 10 or 11 players while other frames contain 12 or 13 players of total 50,127 frames of which 40,300 frames are for training and 9827 frames are for testing. While training the proposed model, the resolution of frames was modified to 512 × 512, and various data augmentation techniques
Table 1 Configurations of experimental setup (model training/testing setup)

Names | Experimental configuration
OS | Windows 10 Pro
CPU/GHz | Intel Xeon 64-bit CPU @3.60
RAM/GB | 64
GPU | NVIDIA Quadro P4000, 8 GB, 1792 CUDA cores
GPU acceleration library | CUDA 10.0, cuDNN 7.4
TensorFlow | 2.x
Keras | 2.2.x
such as blur, flip, crop, affine, and contrast were applied to enhance the robustness of the model. The detection model was trained and tested using a workstation with the configuration listed in Table 1. Model training was stopped at 100 epochs, as it attained minimum training and testing losses of 0.12 and 0.45; at the same time, it reached training and testing accuracies of 98.3% and 96.8%, respectively, as shown in Figs. 4 and 5. The model was set to save checkpoints (i.e., the weight file of the detection model with its parameters) every 10 epochs. The size of the final weight file achieved is 12.4 MB, which makes this a lightweight player detection network that can be embedded on low-edge devices for real-time implementation and may achieve better accuracy in detecting players in real time. Though the basketball match was captured using a dynamic camera, players were detected accurately; the performance and robustness of the proposed method were measured using four metrics, and the results were compared and tabulated as shown in Table 2. Figure 6 depicts that almost all the players were detected even though the movement of the camera changed while capturing the match, as shown between frame-8 and frame-199.
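The checkpoint behaviour described above (a weight file every 10 epochs) can be reproduced with a small Keras callback; this is a hedged sketch, and the file-name pattern and trainer wiring are assumptions rather than the authors' actual training script.

```python
# Sketch: saving detector weights every 10 epochs, as described in the text.
# The file-name pattern is a hypothetical choice for illustration.
import tensorflow as tf

class PeriodicCheckpoint(tf.keras.callbacks.Callback):
    def __init__(self, every=10, pattern="player_det_epoch{epoch:03d}.h5"):
        super().__init__()
        self.every, self.pattern = every, pattern

    def on_epoch_end(self, epoch, logs=None):
        if (epoch + 1) % self.every == 0:          # epochs are 0-indexed
            self.model.save_weights(self.pattern.format(epoch=epoch + 1))

# usage: model.fit(train_ds, epochs=100, callbacks=[PeriodicCheckpoint()])
```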
Fig. 4 Training precision and training loss with respect to number of epochs
Fig. 5 Testing precision and testing loss with respect to number of epochs

Table 2 Comparative analysis of proposed detection algorithm with state-of-the-art methods on basketball dataset

Architecture | Precision (%) | Recall (%) | F1-score (%) | FPS
Multiplayer detection [9] | 88.65 | 92.19 | 90.39 | –
Proposed method | 92.1 | 73.8 | 81.3 | 57.2

Bold values indicate that the proposed methodology is superior in precision and FPS
Fig. 6 Detecting basketball players in a frame which is captured with a dynamic camera (the view of the camera changes and it can be observed from frame-8 to frame-199 in the figure)
5 Conclusion and Future Research Scope

In this paper, a lightweight network for real-time basketball player detection is proposed. The proposed mechanism effectively achieves a player detection speed of 57.2 fps. The experimental results on the proposed MobileNetv1 + SSD methodology achieved 92.1% precision and an 81.3% F1-score. The weight file obtained after training the model is 12.4 MB, which makes this a lightweight player detection network that is simple to deploy on low-edge embedded devices. In future work, in addition to addressing the above limitations, the proposed methodology will be deployed on embedded devices such as the PYNQ board and Jetson Nano, while other methods will be considered to optimize the proposed method and enhance scalability.
References 1. Hartley R, Zisserman A (2003) Multiple view geometry in computer vision. Cambridge University Press, Cambridge 2. Thomas G, Gade R, Moeslund TB, Carr P, Hilton A (2017) Computer vision for sports: current applications and research topics. Comput Vis Image Underst 159:3–18 3. Chen J, Le HM, Carr P, Yue Y, Little JJ (2016) Learning online smooth predictors for realtime camera planning using recurrent decision trees. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 4688–4696 4. Chen J, Little JJ (2017) Where should cameras look at soccer games: improving smoothness using the overlapped hidden Markov model. Comput Vis Image Underst 159:59–73 5. Ibrahim MS, Muralidharan S, Deng Z, Vahdat A, Mori G (2016) A hierarchical deep temporal model for group activity recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 1971–1980 6. Lu W-L, Ting J-A, Little JJ, Murphy KP (2013) Learning to track and identify players from broadcast sports videos. IEEE Trans Pattern Anal Mach Intell 35(7):1704–1716 7. Lucey P, Oliver D, Carr P, Roth J, Matthews I (2013) Assessing team strategy using spatiotemporal data. In: Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining, pp 1366–1374 8. Carr P, Sheikh Y, Matthews I (2012) Monocular object detection using 3d geometric primitives. In: European conference on computer vision. Springer, Berlin, Heidelberg, pp 864–878 9. Liu J, Tong X, Li W, Wang T, Zhang Y, Wang H (2009) Automatic player detection, labeling and tracking in broadcast soccer video. Pattern Recogn Lett 30(2):103–113 10. Parisot P, De Vleeschouwer C (2017) Scene-specific classifier for effective and efficient team sport players detection from a single calibrated camera. Comput Vis Image Understand 159:74– 88 11. Ren S, He K, Girshick R, Sun J (2016) Faster R-CNN: towards real-time object detection with region proposal networks. IEEE Trans Pattern Anal Mach Intell 39(6):1137–1149 12. Redmon J, Divvala S, Girshick R, Farhadi A (2016) You only look once: unified, real-time object detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 779–788 13. Howard AG, Zhu M, Chen B, Kalenichenko D, Wang W, Weyand T, Andreetto M, Adam H (2017) Mobilenets: efficient convolutional neural networks for mobile vision applications. arXiv:1704.04861 14. Liu W, Anguelov D, Erhan D, Szegedy C, Reed S, Fu C-Y, Berg AC (2016) Ssd: Single shot multibox detector. In: European conference on computer vision. Springer, Cham, pp 21–37
15. Zeiler MD, Fergus R (2014) Visualizing and understanding convolutional networks. In: European conference on computer vision. Springer, Cham, pp 818–833 16. Citraro L, Márquez-Neila P, Savare S, Jayaram V, Dubout C, Renaut F, Hasfura A, Shitrit HB, Fua P (2020) Real-time camera pose estimation for sports fields. Mach Vis Appl 31(3):1–13
Chapter 3
Modified Hungarian Algorithm-Based User Pairing with Optimal Power Allocation in NOMA Systems Sunkaraboina Sreenu and Kalpana Naidu
1 Introduction

With the rapid expansion of smart devices and multimedia applications, Non-Orthogonal Multiple Access (NOMA) has evolved as a cutting-edge technology for 5G networks, since it boosts user connectivity, total capacity, and the cell-edge user data rate [1–3]. Contrary to the prevailing Orthogonal Multiple Access (OMA) [4, 5], NOMA simultaneously serves a massive number of users in a single resource block through "power domain multiplexing" [6]. Moreover, the NOMA system employs "Superposition Coding (SC)" at the transmitter to superimpose several user symbols. Furthermore, "Successive Interference Cancellation (SIC)" is conducted to isolate the respective user symbols at the receiver [7, 8]. In addition, NOMA assigns large power to the bad channel state users (far users) and small power to the users having good channel conditions (near users) to preserve fairness among users [9, 10]. However, superimposing more users in the same resource unit leads to severe error propagation and higher latency [11, 12]. Therefore, efficient resource allocation in NOMA systems plays a substantial role in enriching sum capacity, energy efficiency, and user fairness. Recently, many studies have focused on the "User Pairing (UP) and Power Allocation (PA)" problems to enrich the overall system's potency. Thus, in [13], the authors proposed the exhaustive search UP scheme, which picks the best-throughput user group among all user combinations. Nevertheless, in [13], the computational intricacy rises exponentially with an upsurge in users. Hence, this strategy is impractical for a massive number of users. Conversely, the Random Pairing (RP) algorithm

S. Sreenu · K. Naidu (B) Department of Electronics and Communication Engineering, NIT Warangal, Warangal, India e-mail: [email protected] S. Sreenu e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 R. P. Yadav et al. (eds.), Proceedings of the International Conference on Paradigms of Computing, Communication and Data Sciences, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-19-8742-7_3
has been presented in [14], in which users are paired randomly; it offers low complexity, but RP performance is ineffective since users' channel information is not considered for pairing. In consequence, Near-Far Pairing (NFP) has been investigated in [15], wherein NFP forms the user pairs according to the channel states of users. Although channel conditions are considered in [15], the group of users in the cell center has a very small channel gain gap, leading to substantial interference among those users. Further, the author in [16] studied Uniform Channel Gain Difference (UCGD) user grouping to circumvent the issues of low channel gain gap user pairs in near-far user pairing. Additionally, the virtual user pairing technique has been propounded in [17], in which user groups are paired optimally when the number of weak signal strength users exceeds that of the strong users. However, to further upgrade system performance, various PA algorithms have been investigated in the literature in addition to UP algorithms [18]. Additionally, the authors in [19] presented a low-complexity user grouping and power allotment technique by employing the Fixed Power Allocation (FPA) approach for the multiplexed users of each pair. However, FPA provides poor performance. Analogously, Fractional Transmission Power Allocation (FTPA) was proposed in [6], which is employed to allocate fractional power between the sub-band users. Further, in [20], the Difference-of-Convex (DC) functions technique was investigated to improve spectral and energy efficiencies. However, this approach allocates the power between inter- and intra-sub-bands. Even though the PA scheme in [20] outperforms FTPA, it provides a sub-optimal solution. Furthermore, an optimum power distribution approach was explored in [21] to optimize the system's weighted sum rate while maintaining the Quality of Service (QoS) standards. In an analogous manner, a new resource allocation strategy has been studied in [22], in which Karush-Kuhn-Tucker (KKT) conditions have been used to solve for the optimal power factors among sub-band users while satisfying the BS transmit power and the QoS constraints of each user. Moreover, the authors in [22] exploited the Hungarian algorithm for user grouping. However, this approach has high complexity. In the existing literature, most of the works have not investigated sub-band power allocation in the NOMA system. Motivated by the fact that optimal PA among orthogonal sub-bands attains further improvement in the system performance [23], this article proposes an optimal resource allocation (i.e., UP and PA) algorithm, aiming to further enhance the system's sum capacity and minimize the algorithm's complexity. In this line, we develop the sum-rate maximization problem with the constraints of (a) total power available at the BS and (b) minimum required data throughput of users. However, finding the optimal global solution for the combined UP and PA problem is complex due to the problem's non-convexity [24]. As a result, we divide this problem into two parts and solve them in the following order:
• All the users are paired (two users on every sub-band) optimally based on the Modified Hungarian Algorithm (MHA). Moreover, MHA has lower complexity than the Hungarian algorithm [25, 26].
• We propose Water Filling-based Power Allocation (WFPA) to distribute the powers across sub-bands according to the channel conditions of each sub-band.
• Then, this allocated sub-band power is optimally shared between the sub-band users based on the KKT optimality conditions. From here onwards, the suggested hybrid Resource Allocation (RA) method for NOMA systems is labeled "MHA-WFPA-KKT". Ultimately, the proposed hybrid RA method shows significant performance gains and lower complexity compared to the existing RA techniques.

The remainder of the article is organized as follows. Section 2 elucidates the downlink NOMA network model. Then, Sect. 3 describes the proposed MHA-based user pairing algorithm. Subsequently, Sect. 4 demonstrates a method for power distribution across sub-bands and a strategy for power allotment among users on each sub-band. Section 5 validates the effectiveness of the MHA-WFPA-KKT scheme through simulation results. Ultimately, the article is wrapped up in Sect. 6.
2 Network Model

This work considers the downlink transmission model for the NOMA system. Here, the BS is established at the origin of the circular cell province. Besides, the diameter of the cell region is D meters, and M users are uniformly and randomly scattered in the cell. Further, the system's bandwidth is $B_T$, which is identically portioned into N Sub-Bands (SB), where the bandwidth of each SB is $B_n = B_T/N$. Moreover, the power assigned to the nth SB is $P_n$, such that $\sum_{n=1}^{N} P_n = P_T$, where $P_T$ is the total power available at the BS. Figure 1 depicts the fundamental concept of the multi-user downlink NOMA transmission model. The BS multiplexes the $W_n$ users' symbols on each SB by using SC. Consequently, the superposed signal broadcast by the BS is given by,

$$s_n = \sqrt{p_{1,n}}\,s_{1,n} + \sqrt{p_{2,n}}\,s_{2,n} + \cdots + \sqrt{p_{W_n,n}}\,s_{W_n,n} \quad (1)$$

$$\Rightarrow s_n = \sum_{w=1}^{W_n} \sqrt{p_{w,n}}\,s_{w,n} \quad (2)$$

where $s_n$ represents the BS transmitted signal and $p_{w,n}$ is the power allocated to the wth user on $SB_n$, with $\sum_{w=1}^{W_n} p_{w,n} = P_n$. Thus, the signal obtained at the wth user is written by,

$$r_{w,n} = g_{w,n}\, s_n + v_{w,n}, \quad (3)$$

$$\Rightarrow r_{w,n} = \sqrt{p_{w,n}}\,g_{w,n}\, s_{w,n} + \sum_{j=1,\, j \neq w}^{W_n} \sqrt{p_{j,n}}\,g_{w,n}\, s_{j,n} + v_{w,n}. \quad (4)$$
Fig. 1 Multi-user downlink system model of NOMA
In Eq. (4), $g_{w,n} = h_{w,n}/\sqrt{d_w^{\eta}}$ denotes the channel coefficient from the BS to user w on the nth SB, where $h_{w,n}$ is the Rayleigh fading channel gain. Besides, the distance from the BS to user w is denoted by $d_w$, and $\eta$ represents the path loss exponent. In addition, $v_{w,n}$ is the Gaussian noise and its power is $\sigma^2 = N_o B_n$, where $N_o$ is the noise spectral density of $v_{w,n}$. Let $G_{w,n} = |g_{w,n}|^2/\sigma^2$ be the "Channel gain to Noise Ratio (CNR)" of the wth user on sub-band n. Without loss of generality, according to CNRs, users are ordered as follows:

$$G_{1,n} \le G_{2,n} \le G_{3,n} \le \cdots \le G_{W_n,n}. \quad (5)$$
The users in the NOMA network utilize the SIC technique to remove interference from far users on the SB [27]. Furthermore, the ascending order of CNRs is the optimal detection sequence for SIC. According to this sequence, a user can perfectly decode the signals of the other users whose decoding order precedes it. As a result, interference from poor channel state users can be eliminated by a user with substantial channel gain. For instance, consider two users in $SB_n$ with $G_{1,n} \le G_{2,n}$. Then, the BS allots the transmission powers as $p_{1,n} \ge p_{2,n}$. In this case, user 2 performs SIC to remove the interference caused by user 1 (i.e., the far user) and then extracts its information, whereas user 1 decodes its signal by treating the user-2 signal as inter-user noise. Therefore, the "Signal to Interference plus Noise Ratio (SINR)" of the wth user is represented as follows:

$$\Gamma_{w,n} = \frac{G_{w,n}\, p_{w,n}}{\sum_{j=w+1}^{W_n} G_{w,n}\, p_{j,n} + 1} \quad (6)$$
Then, the capacity of the wth user is obtained as,

$$R_{w,n} = B_n \log_2\left(1 + \Gamma_{w,n}\right) \quad (7)$$

Therefore, the overall throughput of the NOMA system can be expressed as,

$$R_{total} = \sum_{n=1}^{N} \sum_{w=1}^{W_n} R_{w,n}. \quad (8)$$
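Equations (6)–(8) can be checked numerically. The following sketch evaluates the SINR and rate of each user on a single sub-band with $W_n = 2$; the bandwidth, CNRs, and power split are assumed values.

```python
# Sketch: per-user SINR (Eq. 6), rate (Eq. 7) and sub-band sum rate (Eq. 8)
# for one sub-band with W_n = 2 users. CNRs and powers are assumed values.
import numpy as np

Bn = 1e6                       # sub-band bandwidth [Hz] (assumption)
G = np.array([0.5, 4.0])       # CNRs, ordered G_1 <= G_2 as in Eq. (5)
p = np.array([0.8, 0.2])       # powers: far user gets more (NOMA principle)

rates = []
for w in range(len(G)):
    interference = G[w] * p[w + 1:].sum()      # users decoded after user w
    sinr = G[w] * p[w] / (interference + 1.0)  # Eq. (6)
    rates.append(Bn * np.log2(1.0 + sinr))     # Eq. (7)

print("per-user rates:", rates, "sum rate:", sum(rates))  # Eq. (8)
```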
3 MHA-Based User Pairing

The Hungarian Algorithm (HA) is one of the best combinatorial schemes for solving assignment problems, since it provides globally optimal solutions. Moreover, HA is perfectly suitable for pairing the users on the SBs in NOMA systems to improve the sum rate [13, 22, 28]. Nevertheless, the computational intricacy of this algorithm is high. So, we propose the modified Hungarian method for the user pairing problem, which provides the same performance as HA but with lower complexity [29]. In order to pair the users, the randomly deployed users are divided into two groups as follows: (a) the strong users' group $g_s = (U_1, U_2, \ldots, U_i)$ and (b) the weak users' group $g_w = (U_1, U_2, \ldots, U_j)$. Further, $g_s$ and $g_w$ represent rows and columns in the cost function, respectively. Therefore, the cost function is mathematically constructed as $R = [C_{ij}]$; $i, j \in \{1, 2, 3, \ldots, M/2\}$, where $C_{ij}$ is the sum of the achievable data rates of the ith strong user and the jth weak user. For instance, if 10 users are deployed in the cell and we assume $g_s = (U_1, U_2, U_5, U_8, U_{10})$ and $g_w = (U_3, U_4, U_6, U_7, U_9)$, then the corresponding cost matrix is given in Table 1. Then, the MHA pairs each strong user with a weak user on a specific sub-band at the maximum sum rate. The steps of the MHA-based user pairing are detailed lucidly in Algorithm 1.
Table 1 MHA-based user pairing cost matrix description for sum rate optimization

Strong users | Weak users: U3 | U4 | U6 | U7 | U9
U1 | C1,3 | C1,4 | C1,6 | C1,7 | C1,9
U2 | C2,3 | C2,4 | C2,6 | C2,7 | C2,9
U5 | C5,3 | C5,4 | C5,6 | C5,7 | C5,9
U8 | C8,3 | C8,4 | C8,6 | C8,7 | C8,9
U10 | C10,3 | C10,4 | C10,6 | C10,7 | C10,9
Algorithm 1. MHA-based user grouping scheme
1: Construct the cost function $C_{ij}$.
2: Obtain the maximum element of the whole cost function. Then subtract each element of the cost matrix from this largest element.
3: Identify the smallest element of each row and subtract it from every element in that specific row.
4: Similarly, identify the minimum element of each column and subtract it from every element in that column.
5: Draw the least number of lines on rows and columns required to cover all the zeros in the resulting cost function obtained from step 1 to step 4.
6: If the minimal number of lines (K) and the order of the cost function (k) differ by one (i.e., k − K = 1), then do the partial pairing in the following way:
(i) Mark all the zeros of single-zero rows and columns with a circle and cross out the rest of the zeros in the corresponding columns and rows.
(ii) If there are no unmarked zeros, proceed to step 7. But, if there exists more than one unmarked zero, randomly mark one of them with a circle and cross out the remaining zeros. Continue this procedure until there are no unmarked zeros in the cost function.
7: Subsequently, we have found the 'marked zeros' assignment of (k − 1) rows and (k − 1) columns of the cost matrix. Hence, one row and one column element remain to which no selection has been made.
8: Eventually, the smallest of all uncovered elements is the element at the cross-section of the row and column to which no earlier selection has been made. The best solution is the combination of the partial assignment and this extra assignment (all assigned elements are treated as matched user pairs).
In HA, we must modify the cost matrix until we get K = k, which takes more arithmetic operations, whereas in MHA, we perform the assignment (pairing) when k − K = 1; this procedure reduces the algorithm's complexity and still provides optimal pairs.
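For illustration, SciPy's standard Hungarian solver can play the role of Algorithm 1 on the cost matrix of Table 1; the MHA returns the same optimal pairs with fewer operations, and the rate entries below are randomly generated placeholders.

```python
# Sketch: optimal strong/weak user pairing on a sum-rate cost matrix, using
# SciPy's standard Hungarian solver as a stand-in for Algorithm 1 (MHA gives
# the same optimal assignment with lower complexity). Rates are placeholders.
import numpy as np
from scipy.optimize import linear_sum_assignment

rng = np.random.default_rng(0)
C = rng.uniform(1.0, 10.0, size=(5, 5))   # C[i, j]: sum rate of pair (i, j)

# linear_sum_assignment minimizes cost, so maximize the sum rate by negating C
rows, cols = linear_sum_assignment(-C)
for i, j in zip(rows, cols):
    print(f"strong user {i} <-> weak user {j}, pair sum rate {C[i, j]:.2f}")
print("total sum rate:", C[rows, cols].sum())
```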
4 Proposed Power Allocation Methods

Once the user pairs on the sub-bands have been determined, this part investigates the power allocation issue, which blends WFPA with the KKT optimality conditions in order to further enrich the aggregate data rate.
4.1 Sub-band Power Allocation

The proposed MHA-based user pairing method grouped only two users in each sub-band. As a result, every SB user can decode its signal perfectly. Besides, consider that the sum of the paired users' channel gains on the nth sub-band equals $\psi_n$. As previously mentioned, existing RA methods split the total power ($P_T$) equally across all sub-bands. Even though Equal Power Allocation (EPA) is simple, it produces a sub-optimal solution. The power allotted to an SB will impact its achievable data rate. Hence, in order to find the optimal power for every SB, we exploited the water-filling algorithm [31]. Therefore, the optimization problem for sub-band power allocation is formulated as,

$$\max_{P_n} \sum_{n=1}^{N} B_n \log_2\left(1 + P_n \psi_n\right) \quad (9)$$

$$\text{subject to:} \quad \sum_{n=1}^{N} P_n \le P_T \quad (10)$$

Here, Eq. (9) is a convex optimization problem. So, we solve the above optimization problem by exploiting the Lagrange multiplier method. Finally, the closed-form solution for the nth sub-band power is provided by,

$$P_n^{*} = \left[\frac{1}{\delta} - \frac{1}{\psi_n}\right]^{+} \quad (11)$$

and

$$\sum_{n=1}^{N} \left[\frac{1}{\delta} - \frac{1}{\psi_n}\right]^{+} = P_T \quad (12)$$

where $\delta$ is the Lagrange multiplier.
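A minimal bisection-based sketch of the water-filling solution in Eqs. (11)–(12) is given below; the $\psi_n$ values are assumed channel-gain sums, and the bisection search stands in for solving for the Lagrange multiplier $\delta$.

```python
# Sketch: water-filling over sub-bands per Eqs. (11)-(12). The water level
# mu = 1/delta is found by bisection; psi values are assumed gain sums.
import numpy as np

def water_filling(psi, P_T, iters=60):
    lo, hi = 0.0, P_T + 1.0 / psi.min()       # bracket for the water level
    for _ in range(iters):
        mu = 0.5 * (lo + hi)                  # candidate level mu = 1/delta
        P = np.maximum(mu - 1.0 / psi, 0.0)   # Eq. (11): [mu - 1/psi_n]^+
        lo, hi = (mu, hi) if P.sum() < P_T else (lo, mu)
    return P

psi = np.array([2.0, 0.7, 1.2, 0.3, 3.5])     # assumed per-sub-band gains
P = water_filling(psi, P_T=1.0)
print(P, P.sum())                             # powers satisfy Eq. (12)
```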
4.2 Power Assignment for Sub-band Users

In this part, our goal is to optimize the sum rate under the total BS transmission power constraint and also each user's minimum rate constraint. Accordingly, the sum-rate maximization problem is developed as follows:

$$\max_{p_{w,n}} \sum_{n=1}^{N} \sum_{w=1}^{W_n} R_{w,n} \quad (13)$$
$$\text{subject to:} \quad \sum_{n=1}^{N} \sum_{w=1}^{W_n} p_{w,n} \le P_T \quad (14)$$

$$R_{w,n} \ge R^{min}_{w,n}, \quad w = 1, 2, \ldots, W_n \quad (15)$$

where $R^{min}_{w,n}$ is the minimal required data rate of the wth user of $SB_n$. Equation (14) indicates the total power constraint, and Eq. (15) guarantees each user's minimal required rate. According to the optimization problem in Eq. (13), the total data rate can be found by summing the aggregate throughput of all sub-bands. Therefore, to maximize the system's sum rate, the data throughput of each sub-band must be optimized, so that the optimization problem can be modified as,

$$\max_{p_w} R_n = \sum_{w=1}^{W_n} R_w \quad (16)$$

$$\text{subject to:} \quad \sum_{w=1}^{W_n} p_w = P_n \quad (17)$$

$$R_w \ge R^{min}_w \quad (18)$$
We can achieve the optimal powers for the superimposed users on each SB in closed form by satisfying the KKT [30] conditions. The Lagrange function of the formulated problem (16) is obtained as,

$$\mathcal{L}(p_w, \lambda, \mu_w) = B_n \sum_{w=1}^{W_n} \log_2(1 + p_w \gamma_w) - \lambda\left(\sum_{w=1}^{W_n} p_w - P_n\right) - \sum_{w=1}^{W_n} \mu_w \left(R^{min}_w - B_n \log_2(1 + p_w \gamma_w)\right) \quad (19)$$

where $\lambda, \mu_w$ denote the Lagrange multipliers and $\gamma_w = \dfrac{G_{w,n}}{\sum_{i=w+1}^{W_n} p_i\, G_{w,n} + 1}$,

$$\mathcal{L} = B_n \sum_{w=1}^{W_n} (1 + \mu_w) \log_2(1 + p_w \gamma_w) - \lambda\left(\sum_{w=1}^{W_n} p_w - P_n\right) - \sum_{w=1}^{W_n} \mu_w R^{min}_w. \quad (20)$$
The Karush-Kuhn-Tucker conditions are attained in the following way:

$$\frac{\partial \mathcal{L}}{\partial p_w^{*}} = \frac{B_n}{\ln 2} \cdot \frac{(1 + \mu_w^{*})\, \gamma_w}{1 + p_w^{*} \gamma_w} - \lambda^{*} = 0, \quad \forall w \in W_n \quad (21)$$
$$\lambda^{*}\left(\sum_{w=1}^{W_n} p_w^{*} - P_n\right) = 0 \quad (22)$$

$$\mu_w^{*}\left(R^{min}_w - B_n \log_2(1 + p_w^{*} \gamma_w)\right) = 0, \quad \forall w \in W_n \quad (23)$$

$$\sum_{w=1}^{W_n} p_w^{*} - P_n \le 0 \quad (24)$$

$$R^{min}_w - B_n \log_2(1 + p_w^{*} \gamma_w) \le 0, \quad \forall w \in W_n \quad (25)$$

$$\lambda^{*} \ge 0 \quad \text{and} \quad \mu_w^{*} \ge 0, \quad \forall w \in W_n \quad (26)$$
If $\lambda^{*}, \mu_w^{*}$ are greater than zero, the optimal solution is obtained [22, 32]. Therefore,

$$\sum_{w=1}^{W_n} p_w^{*} = P_n \quad (27)$$

$$R^{min}_w = B_n \log_2(1 + p_w^{*} \gamma_w), \quad \forall w \in W_n \quad (28)$$

The closed-form result of the optimal power allocations within the $W_n$-user SB can be given as,

$$p_w^{*} = \frac{1}{\gamma_w}\left(2^{R^{min}_w / B_n} - 1\right), \quad \forall w \in \{2, 3, \ldots, W_n\} \quad (29)$$

and

$$p_1 = P_n - \sum_{w=2}^{W_n} p_w^{*}. \quad (30)$$
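Eqs. (29)–(30) translate directly into code. The sketch below splits one sub-band's power between two paired users; the bandwidth, minimum rate, and effective gains $\gamma_w$ are assumed values, with user index 0 playing the role of user 1 (the weakest user, which receives the residual power).

```python
# Sketch: closed-form intra-sub-band power split per Eqs. (29)-(30).
# Users w >= 2 get exactly the power meeting their QoS rate; user 1
# (index 0 here, the weakest user) takes the remaining sub-band power.
import numpy as np

Bn = 1e6                         # sub-band bandwidth [Hz] (assumption)
Pn = 0.5                         # power allotted to this SB by water-filling
R_min = np.array([0.0, 0.5e6])   # min rates; user 1's QoS handled via (30)
gamma = np.array([0.6, 3.0])     # effective gains gamma_w (assumed)

p = np.zeros(2)
p[1] = (2.0 ** (R_min[1] / Bn) - 1.0) / gamma[1]   # Eq. (29)
p[0] = Pn - p[1:].sum()                            # Eq. (30)
assert p[0] > 0, "P_n too small to meet the strong user's QoS"
print("powers:", p)
```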
5 Simulation Results

This section evaluates the effectiveness of the proposed MHA-WFPA-KKT resource allocation approach for the downlink NOMA system through MATLAB simulations. We presume that the BS is positioned at the central point of the cell with perfect CSI. In addition, the BS transmission power is set at 30 dBm, and the system bandwidth is 5 MHz. Also, we assume that the noise spectral density ($N_o$) is equal over all SBs. Besides, Table 2 presents the detailed simulation settings for the proposed system.
Table 2 Simulation specifications

Parameter's name | Value
No. of users (M) | 10
No. of sub bands (N) | 5
Cell diameter (D) | 1000 m
System bandwidth (B_T) | 5 MHz
Noise spectral density (N_o) | −174 dBm/Hz
Channel model | Rayleigh fading channel
Path loss exponent (η) | 3.7
FTPA decay factor (α) | 0.4
Min. rate for strong user (R^min_s) | 1 Mbps
Min. rate for weak user (R^min_w) | 0.5 Mbps
Fig. 2 Proposed MHA-based user pairing algorithm performance against various state-of-the-art pairing schemes and OMA
Figure 2 portrays the performance of the MHA-based UP scheme compared with existing user pairing strategies. Here, equal power is distributed to the orthogonal sub-bands (i.e., $P_n = P_T/N$), and FTPA has been employed for intra-sub-band power allocation. Further, Fig. 2 corroborates that the MHA-based UP scheme outperforms UCGD pairing, NFP, RP, and the OMA system in terms of the system's sum throughput. Moreover, the overall data rate improves as the transmission power increases from 0 to 1 Watt. For instance, if $P_T$ is fixed at 0.5 Watt, the proposed pairing performance is 4.5%, 8.5%, 20%, and 56.85% better than UCGD pairing, NFP, RP, and OMA, respectively. Further, MHA provides equal performance to HA with lower complexity. Figure 3 exhibits the relationship between total cell throughput and transmit power among three diverse power assignment algorithms. We can notice from Fig. 3 that as $P_T$ grows, the overall cell throughput also increases. Furthermore, when compared to
Fig. 3 Comparison of power allocation techniques
Fig. 4 Sum rate comparison among proposed resource allocation method, existing resource allocation methods and OMA system
the EPA-FTPA algorithm, the suggested technique performed better at low transmit power and was closer to EPA-FTPA at high transmit power. The EPA-FTPA algorithm had poor performance because the channel conditions were not considered in sub-band PA, and FTPA is a sub-optimal PA scheme. Although the EPA-KKT scheme obtained higher throughput than the EPA-FTPA approach, it still has to optimize the inter-sub-band powers. In this paper, the WFPA-KKT algorithm furnished optimal solutions for inter- and intra-sub-band power allocations and provided better sum-rate performance. Figure 4 displays the impact of $P_T$ on the sum throughput with 10 users assumed in the cell. From Fig. 4, it can be inferred that as $P_T$ changes from 0 to 1 W, the sum rate of the system also increases for all resource allocation methods. In addition, the proposed
hybrid MHA-WFPA-KKT system throughput outperformed all existing HNG-EPA-KKT, UCGD-EPA-FTPA, NFP-EPA-FTPA, RP-EPA-FTPA, and OMA systems.
6 Conclusion

In this paper, the sum-rate maximization problem is resolved subject to the BS transmission power budget while fulfilling each user's minimum required throughput constraints. Thus, to untangle this optimization problem, the modified Hungarian algorithm is proposed for user grouping, along with optimal power allocation for the users of every sub-band. Moreover, the proposed WFPA provides better performance than equal power allocation, as WFPA yields the optimal power distribution among sub-bands. Furthermore, compared to the Hungarian method, the proposed MHA for user pairing is less complex. Ultimately, the simulations exhibited that the sum data rate of the proposed MHA-WFPA-KKT is superior to the prevailing resource allocation methods for NOMA and OMA systems.
References 1. Dai L, Wang B, Ding Z, Wang Z, Chen S, Hanzo L (2018) A survey of non-orthogonal multiple access for 5G. IEEE Commun Surv Tutorials 20(3):2294–2323 2. Islam SMR, Zeng M, Dobre OA, Kwak K (April 2018) Resource allocation for downlink NOMA systems: key techniques and open issues. IEEE Wirel Commun 25(2):40–47 3. Liu Y, Qin Z, Elkashlan M, Ding Z, Nallanathan A, Hanzo L (Dec 2017) Nonorthogonal multiple access for 5G and beyond. Proc IEEE 105(12):2347–2381 4. Naidu K, Ali Khan MZ, Hanzo L (July 2016) An efficient direct solution of cave-filling problems. IEEE Trans Commun 64(7):3064–3077 5. Kalpana, Sunkaraboina S (24 Dec 2021) Remote health monitoring system using heterogeneous networks. Healthc Technol Lett 9(1–2):16–24 6. Saito Y, Benjebbour A, Kishiyama Y, Nakamura T (2013) System-level performance evaluation of downlink non-orthogonal multiple access (NOMA). In: 2013 IEEE 24th annual international symposium on personal, indoor, and mobile radio communications (PIMRC), pp 611–615 7. Saito Y, Kishiyama Y, Benjebbour A, Nakamura T, Li A, Higuchi K (2013) Non-orthogonal multiple access (NOMA) for cellular future radio access. In: 2013 IEEE 77th vehicular technology conference (VTC Spring), pp 1–5 8. Sunkaraboina S, Naidu K (Dec 2021) Novel user association scheme deployed for the downlink NOMA systems. In: International conference on communication and intelligent systems (ICCIS ). Delhi, India, pp 1–5 9. Aldababsa M, Toka M, Gökçeli S, Kurt GK, Kucur O (2018) A tutorial on nonorthogonal multiple access for 5G and beyond. Wirel Commun Mob Comput 2018:9713450, 24 10. J. Choi (2017) NOMA: principles and recent results. In: 2017 international symposium on wireless communication systems (ISWCS), pp 349–354 11. Al-Obiedollah H, Cumanan K, Salameh HB, Chen G, Ding Z, Dobre OA (Nov 2021) Downlink multi-carrier NOMA with opportunistic bandwidth allocations. IEEE Wirel Commun Lett 10(11):2426–2429 12. Ding Z, Fan P, Poor HV (August 2016) Impact of user pairing on 5G non-orthogonal multipleaccess downlink transmissions. IEEE Trans Veh Technol 65(8):6010–6023
13. Marcano AS, Christiansen H (2018) L: Impact of NOMA on network capacity dimensioning for 5G HetNets. IEEE Access 6:13587–13603 14. Aghdam MRG, Abdolee R, Azhiri FA, Tazehkand BM (2018) Random user pairing in massiveMIMO-NOMA transmission systems based on mmWave. In: 2018 IEEE 88th vehicular technology conference (VTC-Fall), pp 1–6 15. Dogra T, Bharti MR (2022) User pairing and power allocation strategies for downlink NOMAbased VLC systems: an overview. AEU-Int J Electron Commun 154184 16. Gao Y, Yu F, Zhang H, Shi Y, Xia Y (2022) Optimal downlink power allocation schemes for OFDM-NOMA-based internet of things. Int J Distrib Sens Netw 18(1) 17. Shahab MB, Shin SY (2017) On the performance of a virtual user pairing scheme to efficiently utilize the spectrum of unpaired users in NOMA. Phys Commun 25:492–501 18. Di B, Song L, Li Y (2016) Sub-channel assignment, power allocation, and user scheduling for non-orthogonal multiple access networks. IEEE Trans Wirel Commun 15(11):7686–7698 19. He J, Tang Z (2017) Low-complexity user pairing and power allocation algorithm for 5G cellular network non-orthogonal multiple access. Electron Lett 53(9):626–627 20. Parida P, Das SS (2014) Power allocation in OFDM based NOMA systems: a DC programming approach. In: 2014 IEEE globecom workshops (GC Wkshps), pp 1026–1031 21. He J, Tang Z (2017) Low-complexity user pairing and power allocation algorithm for 5G cellular network non-orthogonal multiple access. Electron Lett 53:626–627 22. Ali ZJ, Noordin NK, Sali A, Hashim F, Balfaqih M (2020) Novel resource allocation techniques for downlink non-orthogonal multiple access systems. Appl Sci 10(17):5892 23. Goswami D, Das SS (Dec 2020) Iterative sub-band and power allocation in downlink multiband NOMA. IEEE Syst J 14(4):5199–5209 24. Saraereh OA, Alsaraira A, Khan I, Uthansakul J (2019) An efficient resource allocation algorithm for OFDM-based NOMA in 5G systems. Electronics 8(12):1399 25. Dutta J, Pal SC (2015) A note on Hungarian method for solving assignment problem. J Inf Optim Sci 36(5):451–459 26. Akpan NP, Abraham UP (2016) A critique of the Hungarian method of solving assignment problem to the alternate method of assignment problem by Mansi. Int J Sci: Basic Appl Res 29(1):43–56 27. Dai L, Wang B, Yuan Y, Han S, Chih-Lin I, Wang Z (2015) Non-orthogonal multiple access for 5G: solutions, challenges, opportunities, and future research trends. IEEE Commun Mag 53(9):74–81 28. Kuhn HW (1955) The Hungarian method for the assignment problem. Naval Res Logistics Q 2(1–2):83–97 29. Dutta J, Pal PC (2015) A note on Hungarian method for solving assignment problem. J Inf Optim Sci 36(5):451–459 30. Boyd S, Boyd SP, Vandenberghe L (2004) Convex optimization. Cambridge university press 31. Khan MZA, Naidu K (May 2015) Weighted water-filling algorithm with reduced computational complexity. In: IEEE ICCIT 2015, 20–21, Abu Dhabi, UAE 32. Zuo H, Tao X (2017) Power allocation optimization for uplink non-orthogonal multiple access systems. In: 2017 9th international conference on wireless communications and signal processing (WCSP), pp. 1–5
Chapter 4
Design and Implementation of Advanced Re-Configurable Quantum-Dot Cellular Automata-Based (Q-DCA) n-Bit Barrel-Shifter Using Multilayer 8:1 MUX with Reversibility Swarup Sarkar and Rupsa Roy
1 Introduction

The proposed shifter, named the barrel-shifter, has until now been widely formed using CMOS technology because it follows Moore's Law [1], but fundamental CMOS-based circuitries face various types of small-scaling issues as device density increases. The device complexity, leakage current flow, power dissipation, and delay increase due to the device-size decrement in CMOS technology. Thus, utilizing more advanced technologies to design digital components is becoming a necessity in this recent nano-technical digital era. In this research work, a novel low-power, high-speed, beyond-transistor-level technology is proposed to design purely combinational circuit-based shift registers. This novel advanced technology is named Q-DCA technology, introduced by Lent et al. in 1993 [2, 3]. In this proposed technique, quantum cells or Q-cells with 4 dots [4] are applied to form the proposed digital circuitries, in which 2 dots are occupied by spintronic electrons (positioned crosswise at all times to maintain the electrostatic repulsive force between 2 consecutive similarly charged carriers), and the electrons move from one dot to another through a tunnel. These spintronic electron-based Q-cells are located one after another in this proposed technical field to form a Q-wire, and these quantum wires help the information flow from one Q-cell to another with little power, very low leakage current, and a THz frequency range [5]. In this nano-technical field, a multilayer 3D design with reversible gates can also be used to achieve efficient device size, delay, and dissipated power without any device-complexity increment, which is proved in this work.

S. Sarkar (B) · R. Roy Department of Electronics and Communication Engineering, SMIT, SMU, Rangpo, Sikkim, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 R. P. Yadav et al. (eds.), Proceedings of the International Conference on Paradigms of Computing, Communication and Data Sciences, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-19-8742-7_4
A combinational circuit-based barrel-shifter is proposed in this paper and designed using Q-DCA technology because it is more acceptable as an arithmetic logic unit (ALU) component than other sequential circuit-based shift registers; it helps to shift data, and the ALU is a major element of any processor in this recent era. A multi-bit barrel-shifter using IGFINFET is the recent progress of 2020 [6] (produced by Nehru Kandasamy et al.), where the advancement of that technology compared to "Transmission-gate" as well as "Pass-transistor-logic" is proved to optimize the value of the power-delay product; in the same year, a transistor logic-based advanced 8-bit barrel-shifter was presented by Ch. Padmavani et al. in paper [7], where the value of average power is discussed for different technologies from 65 to 250 nm using the Tanner 15.0 EDA tool with SVL or self-voltage-controllable logic. In this work, the flow of barrel-shifter design advancement is maintained by applying a low-power, high-speed, beyond-transistor-logic technology named Q-DCA with multilayer 3D advancement, a 16 nm cell design, and reversibility checking. The 8:1 multiplexer or MUX is the basic component of the proposed shifter structure of this paper. When recent quantum cell-based 8:1 MUX designs are discussed, the recent parametric changes of this component are also very important to present. In 2019, a 2:1 to 8:1 MUX model using single-layer quantum cell-based technology was presented by Li Xingjian et al. in paper [8], where a cost calculation is shown, and only 260 cells are used to get a 0.4 µm² area-occupied 8:1 MUX with a 9 ps propagation delay. The previously occupied area and cell count were optimized further, without changing the propagation delay, in 2022 [9] by Seyed-Sajad Ahmadpour et al.; in that paper, also, a 2:1–8:1 MUX design using single-layer Q-DCA is presented, where the ultimate optimized design is used to configure a "RAM-cell," but in this proposed work, the barrel-shifter is the ultimate design where the 8:1 MUX is used as the basic component. A novel structure of a multilayer 8:1 multiplexer with 75% reversibility is used in this work to design an 8-bit multilayer barrel-shifter. The effect of reducing the layer separation gap is also checked in this proposed work, with temperature tolerance checking, to reduce the device volume, which is a trade-off in multilayer circuitry. The major contributions of this projected work are as follows:
• A novel optimized multilayer 3D 8:1 MUX is designed on the Q-DCA platform and its reversibility is checked.
• A multilayer barrel-shifter using the proposed design of the 8:1 MUX is formed on the Q-DCA platform, and the parametric advancement of the Q-DCA-based proposed design compared to the recently widely used transistor-based design is presented.
• The high-thermal effect and layer-separation-gap-decrement effect of the proposed multilayer shifter designs are discussed.
The entire contribution is methodically presented in 5 different sections: Sect. 2 reveals the theoretical surroundings related to the suggested technical and logical field; Sects. 3 and 4 present the configuration and results of the projected 8:1 multiplexer and 8-bit barrel-shifter, respectively, with proper parametric investigations, and Sect. 5 presents the concluding part with future possibilities of the presented circuitries.
2 Theoretical Surroundings

2.1 QCA-Based Fundamental Gates

The "3-input majority gate" or "3-input M-G," the "5-input majority gate" or "5-input M-G," and the "inverter gate" are the most essential and highly used gates in our proposed low-power 4-dotted Q-DCA design technology, based on the previously discussed binary '0' and binary '1' selection in the Q-DCA technical platform [10, 11]. The polarity of the input part and the polarity of the output part in a "3-input M-G" are the same. If A, B, and C are the 3 inputs of this structure, then the output is presented in Eq. 1. The M-G with 3 inputs can also be used to configure an "AND-Gate" and an "OR-Gate" by altering the polarity of one of the three inputs from −1 to +1 and vice versa, as given in Eqs. 2 and 3, correspondingly (Fig. 1a presents a clear image of the "3-input M-G") [12, 13]. Another important multi-input M-G is the "5-input M-G." Equation 4 represents a "5-input M-G" with inputs A, B, C, D, and E. If 3 inputs among the five are merged and the clock zone (discussed below in the QCA-based clock-scheme part) is changed from clock 0 to clock 1 near the output section, it gives a "3-input XOR" output; without changing the clock zone, it presents the output of the normal "3-input M-G" with a 4% output-strength increment, but the cell complexity is increased from 5 to 11. The "5-input M-G" is presented in this section in Fig. 1b, which is representative of the "3-input XOR" operation. As we know, the "inverter gate" is frequently required to design any digital circuitry, as it gives a "NOT-Gate" outcome. A clear reflection of the basic single-layer "inverter gate" is also included here, as given in Fig. 1c.

$$M\text{-}G(A, B, C) = AB + BC + AC \quad (1)$$

$$M\text{-}G(A, B, 0) = A \cdot B \quad (2)$$

$$M\text{-}G(A, B, 1) = A + B \quad (3)$$

$$M\text{-}G(A, B, C, D, E) = ABC + ABD + ABE + ADE + ACE + ACD + BCE + BCD + BDE + CDE \quad (4)$$
Fig. 1 Q-DCA-based single-layer structure of a "3-input M-G," b "5-input M-G," and c "Inverter-Gate"
Fig. 2 “Clock-phases” used in Q-DCA technology
2.2 Clock Scheme Used in QCA Technology

In Q-DCA, a distinct timing scheme helps control the flow of information from one part of the circuit to another, maintains performance gain by restoring signal energy lost to the environment, and determines design delays. It is a concatenated structure where 4 clock zones with 4 clock phases are presented. The 4 different clock zones are as follows: "Clock Zone 1," "Clock Zone 2," "Clock Zone 3," and "Clock Zone 4." Likewise, the 4 different clock phases with a 90° phase difference are as follows: "switch, hold, release, and relax." All are shown in Fig. 2 [14]. When the clock is high, the potential barrier between the 2 dots drops and the total circuit polarization is 0, and when the clock is low, the potential barrier between the 2 dots is high, and electrons are placed in the dots by tunneling according to the cell polarization, which depends on the specified neighboring cells.
2.3 Reversible Logical Expression

In the conventional logic gates discussed above, "copy deletion" is not possible. Thus, energy is dissipated per bit, which can be controlled through design adiabaticity, and this adiabatic logic can be followed by using reversible gates, where "information erase with copy" can be maintained by the "Bennett Clock Scheme" [15–17]. So, the energy loss per bit can be preserved by adding this reversible gate. In a conventional gate, only the outputs are determined by the inputs, but in this type of gate (reversible gate), the inputs can also be recovered from the outputs. This means that
Fig. 3 The “block diagram” of a “reversible gate”
the output distribution is also able to represent the input distribution, and vice versa, in this proposed reversible gate. To create this arrangement in a reversible gate, it is necessary to maintain the same number of inputs and outputs. Thus, we can say that a proper exploration of the advantages of Q-DCA-based circuits is possible using a reversible gate, and due to its energy-controlled nature, it becomes more efficient on a multilayer platform. Therefore, this proposed design is formed in a hybrid way by adding a reversible gate to the widely used "3-input M-G." Figure 3 presents a proper block diagram of the basic reversible gate.
2.4 Q-DCA-Based Multilayer Structure

In circuit design, crossing two wires is a very common and important requirement, which becomes more complex as circuits grow. Delay, area, output power, and power dissipation also depend on this criterion. So, choosing a crossover design is the trickiest part of building a circuit. In our proposed Q-DCA technology, coplanar crossing, multilayer crossing, and crossing by changing the clock zones of the two crossing wires are available [18]. In our work, a multilayer crossover is used, where cells are specified in different layers, acting as an inverter across two consecutive layers. This type of design can increase the output strength compared to the coplanar form and also compared to the single-layer "inverter gate," with a 25% reduction in delay. In this multilayer structure, in contrast to transistor-based structures, the vertically separated quantum cells are tuned to match their kink energy in the horizontal plane [19]. Figure 4 presents a bridged multilayer structure based on Q-DCA.
Fig. 4 Multilayer Q-DCA-based formation
2.5 Occupied Area, Delay, Dissipated-Power, and Tunneling Resistivity Calculation

The Q-DCA-based structure performance model was primarily established by Timler and Lent in 2002, and this model is based on the cell-to-cell "Hartree–Fock" approach used to determine the "Hamiltonian matrix" [20]. The total amount of energy flowing through a Q-dot cell is the sum of the energy flow between the cells and the dissipated energy. Here, the total flow of energy through the quantum wires ($P_1$) depends on the "power of the clock ($P_{clk}$)," the "power of the input indicator ($P_{in}$)," the "power of the output indicator ($P_{out}$)," and the "dissipated power ($P_{diss}$)" in the Q-cell-based conductors. The relationship between these powers is illustrated in Eq. 5. This section also presents the gain equation (of each Q-cell) in Eq. 6, and the power dissipation calculation is discussed through Eq. 7. In this equation, $\tau$ = "the time of energy relaxation," $\gamma_{new}$ and $\gamma_{old}$ = "the clock energy when switching activity takes place," $P_o$ and $P_n$ = "the polarization of output and input," and $P_{old}$ and $P_{new}$ = "the polarization before and after Q-cell switching," in that order [21, 22].

$$P_1 = P_{in} + P_{out} + P_{clk} + P_{diss} \quad (5)$$

$$Gain = P_{out}/P_{in} \quad (6)$$
$$P_{diss} = \frac{1}{\tau}\left[\frac{2\gamma_{new}}{E_k}\,P_o P_n + \frac{\gamma_{old} - \gamma_{new}}{E_k} \cdot \frac{P_{old} P_{new}}{2} - (P_n - P_o)\right] \quad (7)$$
In our proposed Q-DCA-based nanoscale spin technology, the area-based circuit power dissipation is easily calculated considering a power dissipation of 100 W per cm2 area [20]. But, in this work, the power dissipation in the quantum wire is calculated using a few simple equations that depend on the switching time of the quantum cells, the complexity of the cell, and the distance between two quantum cells. The basic power dissipation equation used in this proposed work is also given in this section in Eq. 8.
$$P_{diss} = E_{diss}/(\text{Switching Time}) \quad (8)$$
In the above equation, the energy dissipation ($E_{diss}$) depends on the distance between two cells ($r$), the quantum cell length ($l$), the number of cells ($C$), the relative placement (translation angle) of two consecutive cells, and the cell kink energy. This relation is given in Eq. 9, and Eq. 10 represents the "Kink Energy." The "switching time" depends on the cell complexity and the tunneling speed ($T_r$ = 1/tunneling time), and this switching time is represented here by Eq. 11 [23].

$$\text{Energy Dissipation} = \{r(C - 1)/l\} \times (\text{Kink Energy}) \quad (9)$$

$$\text{Kink Energy} = \{23.04 \times 10^{-29}/r\}\ \text{J} \quad (10)$$

$$\text{Switching Time} = (C - 1)/T_r \quad (11)$$
Not only the estimated power, but also the delay and area estimations are expressed in this portion. The required area ($A$) of a Q-DCA structure is systematically computed as shown in Eq. 12, where $n$ = "number of cells used in the vertical portion," $m$ = "cells used in the horizontal portion," $l$ = "length of each cell," $w$ = "width of each cell," $q$ = "the distance between two consecutive cells vertically," and $r$ = "the distance between two consecutive cells horizontally." Not only the occupied area but also the utilized area is determined in this work, which is called the area utilization factor ("AUF") [24]; the used equation is presented in Eq. 13, where $C$ = the number of total used cells. The propagation latency of a Q-DCA formation depends on the clocking zones used in the "critical path" of the formed configuration. In this paper, the calculation of the tunneling resistivity ($\rho_T$) is also presented using Eq. 14, where $d$ = effective tunneling distance, $\tau$ = tunneling rate, $\varepsilon$ = permittivity, $e$ = electronic charge, and $c$ = speed of light.

$$A = \{(l \times n) + q\} \times \{(w \times m) + r\} \quad (12)$$

$$AUF = [\{(L \times n) + q\} \times \{(W \times m) + r\}]/(C \times L \times W) \quad (13)$$

$$\rho_T = \frac{d^3 \times \tau}{\varepsilon \times e \times c^2} \quad (14)$$
3 Proposed Multilayer 8:1 MUX

As we know, the multiplexer or MUX is the basic component of the barrel-shifter, and to design an 8-bit barrel-shifter, a novel 8:1 MUX design using multilayer Q-DCA technology is proposed in this paper. This proposed structure is based on the 2:1 MUX design of [25], and the design is 75% reversible if any one input of this 2:1 MUX acts as a direct output, because 6 of the 8 output bit combinations match the input bit combinations in this proposed MUX design; reversibility helps to preserve information before it is erased. The Q-DCA-based multilayer novel structure of the proposed 8:1 MUX is presented in this portion in two different ways: Fig. 5 presents all the layers separately, and Fig. 6 presents the combined-layer structure without showing any unwanted signals, where the 4 required clock zones are indicated with 4 different colors. Here, i0–i7 are the 8 inputs with S0–S2 as the three select lines, and OUT is the ultimate outcome. The simulated outputs of the used 2:1 MUX and the ultimate 8:1 MUX are also presented in this part in Figs. 7 and 8, respectively, where the outcomes of the 2:1 MUX show the 75% reversibility of the proposed design (when output q1 is the direct outcome of select line S). The proposed multilayer Q-DCA technology is able to present a more occupied-area-efficient, less complex, faster, and lower-power 8:1 MUX compared to single-layer designs, which is shown in this portion through a comparison table with recent related works (Table 1). The utilized area of the presented design is more than that of paper [9] and less than that of paper [8]. But the proposed design is more effective than paper [8]'s design due to a 75% occupied-area reduction, 54% cell-complexity reduction, 64% areal power dissipation reduction (based on the used cell number), 84% cost reduction, and 82% speed improvement for only a 30% utilized-area-factor reduction compared to the design of paper
Fig. 5 Layers of proposed Q-DCA-based 8:1 MUX
Fig. 6 Combined-layers structure of proposed multilayer Q-DCA-based 8:1 MUX
Fig. 7 Simulated outcomes of used 2:1 MUX with reversibility
Fig. 8 Simulated outcome of proposed multilayer Q-DCA-based 8:1 MUX
[8]. Beyond the previously discussed parameters, this multilayer design is also able to operate with a smaller layer separation gap, which can reduce the volume of the design compared to the normal value (11.5 nm) of multilayer Q-DCA-based designs. However, the high-temperature effect is a trade-off in multilayer circuitry, which affects the improvement of output strength because of the rise of electro-dispersion possibilities. In the presented 3D 3-layer 8:1 MUX design, the temperature tolerance is 2 K more than room temperature in the presence of a layer separation gap reduced down to 7 nm with the same output strength.
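At the logic level, the 8:1 MUX of this section is a tree of 2:1 MUX primitives. The following behavioural sketch (not a QCA-level model) shows that composition and verifies that select value n routes input i_n to the output.

```python
# Sketch: behavioural composition of the 8:1 MUX from 2:1 MUX primitives,
# mirroring the tree structure of the proposed design (logic level only).
def mux2(a, b, s):
    return b if s else a

def mux8(i, s2, s1, s0):            # i: list of 8 inputs, s2 is the MSB
    stage1 = [mux2(i[k], i[k + 1], s0) for k in range(0, 8, 2)]
    stage2 = [mux2(stage1[k], stage1[k + 1], s1) for k in range(0, 4, 2)]
    return mux2(stage2[0], stage2[1], s2)

bits = [0, 1, 1, 0, 1, 0, 0, 1]
for n in range(8):
    s2, s1, s0 = (n >> 2) & 1, (n >> 1) & 1, n & 1
    assert mux8(bits, s2, s1, s0) == bits[n]
print("mux8 selects input i_n for select value n")
```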
Table 1 Comparison table: Q-DCA-based 8:1 MUX design

Designs of 8:1 MUX, year | Occupied area (µm²) | AUF | Speed [1/propagation delay (ps)] (THz) | Cell complexity | Areal power dissipation (nW) | Cost (area × delay [26])
Single-layer design of paper [8] | 0.40 | 4.76 | 0.11 | 260 | 80 | 3.6
Single-layer design of paper [9] | 0.12 | 2.78 | 0.11 | 135 | 44 | 1.08
Proposed 3-layered design | 0.10 | 3.33 | 0.2 | 118 | 30 | 0.5
4 Proposed 8-Bit Barrel Shifter

The presented advanced configuration of the "8:1 MUX" is used for the design of an 8-bit Q-DCA-based barrel-shifter. As we know, the multiplexer is the main component of a barrel-shifter; in this work, the 8:1 MUX is directly used to design the proposed 8-bit shifter, in the same way that a 4:1 MUX is the basic component of a 4-bit barrel-shifter. The block diagram of this 4:1 MUX-based 4-bit barrel-shifter is given in paper [27]. The used layers of the proposed shifter are presented in Fig. 9a–g, and the combined-layer structure is presented separately in Fig. 10. The simulated outcomes are also presented in this portion in Fig. 11, where the inputs are i0–i7 with 3 select lines S0 to S2, and the outputs are O0–O7. The power dissipation (based on cell area and based on the switching effect) and the delay (propagation delay and switching delay) of the proposed shifter design are calculated in this work, and the values are compared with the most optimized IGFINFET-based 8-bit barrel-shifter. The comparison is presented in Table 2, and additionally, another table (Table 3) is presented in this portion, in which the other parameters of the proposed multilayer 8-bit shifter are listed.
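The MUX-based shifting principle can be expressed behaviourally in a few lines: output bit j of an 8-bit barrel (rotate) shifter selects input bit (j + shift) mod 8 through an 8:1 MUX. This is a logic-level illustration, not the Q-DCA layout, and the rotation direction is an assumption.

```python
# Sketch: behavioural 8-bit barrel (rotate) shifter. Output bit j takes
# input bit (j + shift) mod 8, which is exactly what one 8:1 MUX per
# output bit implements with the shift amount on the select lines.
def barrel_shift(bits, shift):
    return [bits[(j + shift) % 8] for j in range(8)]

word = [1, 0, 1, 1, 0, 0, 1, 0]
for s in range(8):
    print(f"shift {s}: {barrel_shift(word, s)}")
```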
5 Conclusion

A novel multilayer 8:1 multiplexer-based "8-bit Barrel-Shifter" is designed utilizing Q-DCA technology, where a 75% reversible 2:1 MUX is used to design the proposed 8:1 MUX without using any extra area. The proposed multilayer 8:1 MUX reduces the occupied area by 16.6%, the cell complexity by 12.5%, the power dissipation by 31.8%, and the cost by 33.7%, with a 19.7% utilized-area-factor improvement and an 81.8% speed improvement compared to the most optimized recent single-layer Q-DCA-based design. This optimization is utilized in this work to prove the advancement of
Fig. 9 Layers of proposed 8-bit barrel-shifter: a layer 1, b layer 2, c layer 3, d layer 4, e layer 5, f layer 6, g layer 7
Fig. 10 Combined-layers structure of proposed 8-bit barrel-shifter
quantum cell-based technology compared to transistor-level IGFINFET technology by forming a lower-size cell-based barrel-shifter, and all the calculated parametric values are presented in this paper with the required comparisons based on the occupied area, power dissipation, propagation delay, switching delay, AUF, temperature tolerance, layer separation gap, used cell numbers, and the total tunneling resistivity. In this proposed shifter design, a layer separation gap down to 8.5 nm can be applied to get a proper stable result. But a 10.5-nm layer separation gap presents the highest temperature tolerance with less output strength compared to the previous one; in other words, the proposed multilayer barrel-shifter design can tolerate an 81% greater high-temperature effect compared to normal temperature for a 19.5% increment of the layer separation gap and a 1% increment of the maximum output strength.
Fig. 11 Simulated outcomes of proposed 8-bit barrel-shifter
Table 2 Comparison table: 8-bit barrel-shifter

Technology | Power dissipation (W) | Delay (s)
20-nm IGFINFET [6] | 36.45 × 10⁻⁶ | 9.013 × 10⁻⁹
16-nm cell-based multilayer Q-DCA | 8 × 10⁻⁶ (due to switching effect); 6.7 × 10⁻⁶ (based on cell area) | 12 × 10⁻¹² (propagation delay); 1.05 × 10⁻¹⁵ (switching delay)
Table 3 Parametric outcomes of proposed 8-bit barrel shifter

Occupied area | 1.18 µm²
Used cells | 2644
Maximum temperature tolerance for most suitable layer separation gap | 11 K more than room temperature for 10.5-nm layer separation gap
AUF | 1.76
Total tunneling resistivity | 92.5 × 10⁵ Ω m
References 1. Moore GE (1965) Cramming more components onto integrated circuits. Electronics 38:114– 117 2. Lent CS, Tougaw PD, Porod W, Bernstein GH (1993) Quantum cellular automata. Nanotechnology 4:49–57 3. Tougaw PD, Lent CS (1994) Logical devices implemented using quantum cellular automata. J Appl Phys 75:1818–1825 4. Babaie S et al (2019) Design of an efficient multilayer arithmetic logic unit in quantum-dot cellular automata (QCA). IEEE Trans Circ Syst 66:963–967 5. MirzajaniOskouei S, Ghaffari A (2019) Designing a new reversible ALU by QCA for reducing occupation area. J Supercomput 75:5118–5144 6. Kandasamy N, Telagam N, Kumar P, Gopal V (2020) Analysis of IG FINFET based N-bit barrel shifter. Int J Integr Eng 12(8):141–148 7. Padmavani C, Pavani T, Sandhya Kumari T, Anitha Bhavani C (2020) Design of 8-bit low power barrel shifter using self controllable voltage level technique. Int J Adv Sci Technol 29(8):3787–3795 8. Xingjun L, Zhiwei S, Hongping C, Reza M, Haghighi J (2019) A new design of QCA-based nanoscale multiplexer and its usage in communications. Int J Commun Syst 33(4):1–12 9. Ahmadpour SS, Mosleh M, Heikalabad SR (2022) Efficient designs of quantum-dot cellular automata multiplexer and RAM with physical proof along with power analysis. J Supercomput 78:1672–1695 10. Singh R, Pandey MK (2018) Analysis and implementation of reversible dual edge triggered D flip flop using quantum dot cellular automata. Int J Innov Comput Inf Control 14:147–159 11. Safoev N, Jeon J-C (2017) Area efficient QCA barrel shifter. Adv Sci Technol Lett 144:51–57 12. Walus K, Dysart TJ et al (2004) QCA designer: a rapid design and simulation tool for quantumdot cellular automata. IEEE Trans Nanotechnol 3:26–31 13. Roy SS (2016) Simplification of master power expression and effective power detection of QCA device. In: IEEE students’ technology symposium, pp 272–277 14. Askari M, Taghizadeh M (2011) Logic circuit design in nano-scale using quantum-dot cellular automata. Eur J Sci Res 48:516–526 15. Narimani R, Safaei B, Ejlali A (2020) A comprehensive analysis on the resilience of adiabatic logic families against transient faults. Integr VLSI J 72:183–193
16. Pidaparthi SS, Lent CS (2018) Exponentially adiabatic switching in quantum-dot cellular automata. J Low Power Electron Appl 8:1–15 17. D’Souza N, Atulasimha J, Bandyopadhyay S (2012) An energy-efficient Bennett clocking scheme for 4-state multiferroic logic. IEEE Trans Nano Technol 11:418–425 18. Abedi D, Jaberipur G, Sangsefidi M (2015) Coplanar full adder in quantum-dot cellular automata via clock-zone based crossover. In: IEEE transactions on nanotechnology, 18th CSI international symposium on computer architecture and digital systems (CADS), vol 14, no 3, pp 497–504 19. Waje MG, Dakhole P (2013) Design implementation of the 4-bit arithmetic logic unit using quantum-dot cellular automata. IEEE, IACC, pp 1022–1029 20. Timler J, Lent CS (2020) Power gain and dissipation in quantum-dot cellular automata. J Appl Phys 91:823–831 21. Barughi YZ et al (2017) A three-layer full adder/subtractor structure in quantum-dot cellular automata. Int J Theor Phys 56:2848–2858 22. Ganesh EN (2015) Power analysis of quantum cellular automata circuit. Proc Mater Sci 10:381– 394 23. Roy SS (2017) Generalized quantum tunneling effect and ultimate equations for switching time and cell to cell power dissipation approximation in QCA devices. Phys Tomorrow, 1–12 24. Zahmatkesh M, Tabrizchi S, Mohammadyan S, Navi K, Bagherzadeh N (2019) Robust coplanar full adder based on novel inverter in quantum cellular automata. Int J Theor Phys 58:639–655 25. Majeed AH, Alkaldy E, Zainal MS, Navi K, Nor D (2019) Optimal design of RAM cell using novel 2:1 multiplexer in QCA technology. Circ World 46(2):147–158 26. Maharaj J, Muthurathinam S (2020) Effective RCA design using quantum-dot cellular automata. Microprocess Microsyst 73:1–8 27. Elamaran V, Upadhyay HN (2015) Low power digital barrel shifter datapath circuits using microwind layout editor with high reliability. Asian J Sci Res 8:478–489
Chapter 5
Recognition of Facial Expressions Using Convolutional Neural Networks

Antonio Sarasa-Cabezuelo
1 Introduction

Image recognition is a classic problem in the field of artificial intelligence [1] that has been treated with different techniques. The problem consists [2] of detecting, in an automated way, the similarity or equivalence between images as a result of processing the information they contain. In most solutions, the similarity is obtained by defining a distance [3] that measures the closeness between the identifying characteristics of the images. In recent years, this problem has been treated with machine learning algorithms [4] with different levels of success, but solutions based on neural networks [5] stand out for their efficiency. This is consistent with the nonlinear nature of the problem and with the strength of such networks in solving problems of this kind. However, despite the efficiency obtained, the classical neural network does not adapt well [6] to this problem because of the spatial factor associated with image recognition: images are generally represented by one matrix (grayscale) or three matrices (RGB), and many parameters need to be computed.

The rise of big data [7] has favored the evolution of neural network models with the aim of improving their predictive and classifying capacity, increasing their complexity in terms of intermediate processing layers and connection architectures. A more particular model of artificial neural network is the so-called convolutional neural network (CNN) [8], used mostly in the recognition and classification of images. A convolutional neural network is generally made up of three types of layers: the convolutional layers, which are responsible for performing feature extraction; the subsampling (or pooling) layers
that are used to reduce the size of the image; and the fully connected layers, which are essentially a traditional multilayer neural network responsible for classifying the data processed by the previous layers.

A particular problem related to image recognition is the recognition of the facial expressions associated with human emotions [9]. Facial expressions are part of nonverbal communication and are based on small variations of the surface of a person's face, such as the curvature of the mouth, that give rise to different expressions intended to convey an emotion. One difficulty in solving this problem [10] is that certain facial expressions are easily confused. For example, frowning can express anger but also disgust, and an open mouth can indicate joy, surprise, or fury. For this reason, there are different proposals [11] depending on whether the problem is considered discrete (each facial expression corresponds to a single emotion) or continuous (a facial expression can contain elements of several types of emotion).

The recognition of facial expressions has been treated using different approaches based on convolutional neural networks. In the first works [12], combinations of trained CNNs were used with the aim of minimizing the squared hinge loss. This approach was improved [13] by increasing the training and test data and by using an ensemble voting mechanism via uniform averaging, so that the ensemble predictions are integrated via weighted averaging with learned weights. Another proposal [14] is based on hierarchically integrating the predictions of the ensemble with network weights assigned according to performance on the validation set. In another work, with the aim of obtaining a trained model that generalizes its classification capacity, a single CNN based on the Inception architecture was used, in which data from multiple posed and naturalistic datasets were processed jointly. Improvements have also been implemented based on [15] increasing the amount of training data by using multiple datasets with heterogeneous labels and applying a patch-based feature extraction and registration technique, as well as feature integration through early fusion. Related to this idea, the use of registered and unregistered versions of a given face image during training and testing has been explored [16].

Other approaches [17] are based on the detection of facial landmarks in images; however, challenging facial expressions or partial occlusions [18] make detection difficult. Several solutions have been proposed for this problem, such as the detection of reference points in several versions of the same images [19], illumination correction [20], or the normalization of the images by means of histogram equalization and linear plane fitting [21]. Another line of work is based on varying the depth of the networks (the number of layers with weights), where it has been shown [22] that high efficiency can be obtained with a depth of 5. The influence of the dataset used in training has also been studied: some works use the union of seven [23] or three [24] datasets, while other works apply preprocessing functions to obtain additional information, such as the calculation of a vector of HoG features from facial patches [25] or the encoding of face pose information [26].
Likewise, other works augment the data using horizontal flipping [27], random cropping [28], or similar transformations. Other variants differ in which dataset is augmented: the training set [29] or the test set [30].
The objective of this work is to compare the performance of different convolutional neural networks in identifying the facial emotion represented in an image. Convolutional networks behave well here because they process the images in such a way that each element of the image depends directly on those that surround it. To carry out the study, several architectures are considered, and versions are generated in which some elements are improved, such as increased complexity, the introduction of regularization methods, or a change of activation function. Likewise, the size and type of the dataset have been experimented with: initially, the FER2013 [31] facial expression image dataset is used, to which artificially generated images are added through various mechanisms, with the aim of evaluating a performance improvement.

The structure of the paper is as follows. Section 2 briefly describes the dataset used as well as the different convolutional neural networks used in the experiment. Next, Sect. 3 shows the results obtained from the experiment. In Sect. 4, the results are discussed. Finally, Sect. 5 presents the conclusions and a set of lines of future work.
2 Materials and Methods

2.1 Materials

For the training and validation of the networks, the FER2013 database [32] has been used, which has 35,887 grayscale images divided into 28,709 training images and 7178 test images. Each image is labeled according to the emotion it represents: anger, disgust, fear, happiness, sadness, surprise, or neutral. The distribution of the images according to their labels can be seen in Table 1. The images are represented by a square matrix of dimension 48 of integers between 0 and 255, where 0 represents black and 255 white.
Table 1 Number of images per label

Label     | Training | Test | Total
Anger     | 3995     | 958  | 4953
Disgust   | 436      | 111  | 547
Fear      | 4097     | 1024 | 5121
Happiness | 7215     | 1774 | 8989
Sadness   | 4830     | 1247 | 6077
Surprise  | 3171     | 831  | 4002
Neutral   | 4965     | 1233 | 6198
Total     | 28,709   | 7178 | 35,887
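Purely as an illustration, the following sketch loads this dataset, assuming the standard fer2013.csv layout (columns emotion, pixels, and Usage); the file name and column names are assumptions, since the paper does not state how the data were read:

```python
import numpy as np
import pandas as pd

def load_fer2013(path="fer2013.csv"):
    """Hypothetical loader: 'emotion' is the label 0-6, 'pixels' holds
    2304 space-separated gray values, and 'Usage' splits the data into
    Training / PublicTest / PrivateTest (the two test parts are merged
    here to obtain the 28,709 / 7178 split of Table 1)."""
    df = pd.read_csv(path)
    images = np.stack([
        np.array(p.split(), dtype=np.uint8).reshape(48, 48)
        for p in df["pixels"]
    ])
    labels = df["emotion"].to_numpy()
    train = (df["Usage"] == "Training").to_numpy()
    return images[train], labels[train], images[~train], labels[~train]
```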
2.2 Methods

A convolutional neural network is made up of three types of layers: the convolutional layers proper, which are responsible for performing feature extraction; the subsampling (or pooling) layers, used to reduce the size of the image; and the fully connected layers (a multilayer neural network), which are responsible for classifying the data processed by the previous layers. Convolutional layers are based on [33] a matrix operation called convolution, defined as

$$Y[i,j] = \sum_{k_1=-\infty}^{\infty} \sum_{k_2=-\infty}^{\infty} X[i+k_1,\, j+k_2]\, W[k_1,k_2]$$
where X is the data matrix and W is the filter matrix that allows features to be extracted. Although the X and W matrices are finite, the sums are written as infinite because the indices at which the matrices are not defined are filled with 0 (padding). In this sense, the matrix operation represents a movement [34] of the filter matrix over the input matrix (after padding it with as many rows and columns of zeros as necessary) as a sliding window, calculating at each position the sum of element-by-element products. The number of indices the filter matrix moves over the input matrix is called the offset. Thus, the size of the result matrix depends on the offset applied to the filter in each iteration of the convolution and on the number of zeros with which the input matrix is padded. Depending on the desired size of the result matrix, three types of padding are distinguished [35]. The first type, same padding, involves adding as many rows and columns as necessary to the input matrix so that, taking into account the size of the filter and the offset used, the result matrix has the same size as the input matrix. The second type, valid padding, consists of adding no padding, so the result matrix is smaller than the input. The third type, full padding, seeks to subject all the elements of the matrix to the same number of convolutions, so that a result matrix larger than the input is obtained.

Note that the data matrix in the case of a grayscale image is provided [36] as a single matrix, or as three matrices in the case of a color image in RGB format: one representing the reds, another the greens, and a third the blues. In either case, the convolution operation sorts the input data to extract the salient features and, from them, builds the so-called feature hierarchy, in which low-level features (such as image edges) are combined into high-level ones (for example, the shape of an object).

The subsampling layers are based on [37] grouping adjacent data, generating from a given matrix a new, smaller one. To do this, the matrix is divided into submatrices, and the elements of each submatrix are combined into a single element. The most used operations [38] are taking the maximum of the submatrix (max-pooling) or the average of its elements (average pooling). The objective of these layers is to reduce the size of the features, which improves performance, decreases the degree of overlearning (taking specific features of the training sample as general when classifying), and helps to elaborate characteristics more resistant to noise, so that out-of-the-ordinary data do not spoil the sample.
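To make the two operations concrete, here is a literal (unoptimized) NumPy sketch of the convolution defined above and of max-pooling; it is illustrative only, not the implementation used in the experiments:

```python
import numpy as np

def conv2d(X, W, stride=1, padding="valid"):
    """Slide filter W over X and sum element-by-element products,
    exactly as in the equation above (valid or same padding)."""
    if padding == "same":
        ph, pw = W.shape[0] // 2, W.shape[1] // 2
        X = np.pad(X, ((ph, ph), (pw, pw)))
    out_h = (X.shape[0] - W.shape[0]) // stride + 1
    out_w = (X.shape[1] - W.shape[1]) // stride + 1
    Y = np.empty((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            patch = X[i * stride:i * stride + W.shape[0],
                      j * stride:j * stride + W.shape[1]]
            Y[i, j] = np.sum(patch * W)
    return Y

def max_pool(X, size=2):
    """Keep the maximum of each non-overlapping size x size block."""
    h, w = X.shape[0] // size, X.shape[1] // size
    return X[:h * size, :w * size].reshape(h, size, w, size).max(axis=(1, 3))
```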
The fully connected layers are formed [40] by a multilayer neural network that is responsible for classifying the data preprocessed by the previous layers. Before that, it is necessary to use a flattening layer that transforms the data into a one-dimensional array. There are other layers as well [41]: activation layers, where activation functions are applied to the data (they can be separate or integrated in the convolutional and fully connected layers); dropout layers, which decrease overlearning by deactivating a randomly chosen percentage of the connections in the network; and batch normalization layers, which normalize the output of another layer in such a way that extreme cases are avoided.

The structure of a convolutional network is configured by [42] setting how many convolutional layers will be used, where they will be placed, the size of the filters, the type of subsampling, and the activation functions used. The way the different layers and their parameters are combined gives rise to results of lower or higher quality depending on the problem in question. Next, the CNN architectures that will be used for the facial expression recognition problem are briefly described; a sketch of the first of them is given after the list.

(a) Le-Net 5 is a convolutional neural network [43] consisting of a convolutional layer that subjects the image to 6 square filters of order 5 with a shift of one unit and valid padding (i.e., no padding). In this way, 6 new images are obtained, one per filter, each a square of order 28. The hyperbolic tangent is applied as activation function to the images resulting from the convolution. After this, there is a subsampling layer of averages over square submatrices of order 2, which reduces the size of the image to 14 × 14. This is then repeated: a convolution layer with 16 filters of size 5 × 5, with one-unit offset and valid padding, is applied, followed again by tanh and subsampling. At the end of this process, 16 images of size 5 × 5 are available, which are transformed into a one-dimensional array. This array is used as input to a neural network that has 120 artificial neurons in its input layer, a hidden layer of 84, and an output layer of 10. Of these layers, the output layer uses a SoftMax activation function [44] and the other two the hyperbolic tangent.

(b) Basic CNN is a convolutional neural network [45] made up of 2 convolutional layers (which in this case use same padding, ReLU as the activation function, and a unit shift), each followed by a subsampling layer; after this, the data are passed as input to the fully connected layers that form the final neural network. These are 3 layers: one input, one hidden, and one output. Its main difference with respect to the Le-Net 5 architecture is the increase in the number of filters and artificial neurons, which means an increase in the number of parameters over which learning is performed.

(c) AlexNet is a convolutional neural network architecture [46] that features a convolutional layer applied to the input data with 48 filters of size 11 × 11, an
offset of 4, and no padding. This reduces the image from 224 × 224 to 55 × 55. After this, a maximum subsampling layer is applied. The data are then passed through another convolutional layer with a 5 × 5 filter, with same padding and an offset of 1, and a subsampling layer is applied again. To end the preprocessing, 3 equal convolutional layers with 3 × 3 filters and a final subsampling layer are applied. These data serve as input to a 3-layer neural network in which the first two layers, the input and the hidden one, have 4096 neurons each and the output has 1000, which corresponds to the number of classes to be differentiated. This network was the first to use the ReLU function as an activation function, and it generally performs well in image classification.

(d) ResNet-50 is based on a neural network design technique [47] characterized by using shortcuts in the network. A layer is not connected only to the immediately following one; instead, some layers are skipped and the output is connected to a later one. In this way, through an additional operation (or layer), the data that have passed through a series of layers are combined with data that come from further back through a shortcut. These architectures are called ResNet, and they are a very useful alternative when a network becomes complex by increasing the number of parameters or layers, since otherwise adequate performance may not be obtained and training times become very high. A particular case is the ResNet-50 network, whose structure consists mainly of the interleaved use of convolution and identity blocks in which shortcuts are used. In the former, the shortcut subjects the data matrix to a convolution with block normalization; in the latter, the data are passed through directly.
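As a rough illustration of the first of these architectures, the following tf.keras sketch follows the Le-Net 5 description in (a), with the input already adapted to the 48 × 48 FER2013 images and the output to 7 classes (as in implementation 1 of Sect. 3.1); the optimizer and loss are assumptions, since the paper does not specify them:

```python
import tensorflow as tf
from tensorflow.keras import layers

def lenet5(input_shape=(48, 48, 1), n_classes=7):
    return tf.keras.Sequential([
        # 6 filters of order 5, unit shift, valid padding, tanh
        layers.Conv2D(6, 5, strides=1, padding="valid",
                      activation="tanh", input_shape=input_shape),
        layers.AveragePooling2D(2),          # subsampling of averages
        layers.Conv2D(16, 5, strides=1, padding="valid", activation="tanh"),
        layers.AveragePooling2D(2),
        layers.Flatten(),                    # one-dimensional array
        layers.Dense(120, activation="tanh"),
        layers.Dense(84, activation="tanh"),
        layers.Dense(n_classes, activation="softmax"),
    ])

model = lenet5()
model.compile(optimizer="adam",              # assumed
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```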
2.3 Artificial Data Generation

The quality of the classification carried out by neural networks depends on the number of samples [48] in the training dataset. A sufficiently high number of samples is necessary to extract the general characteristics of each class that allow the classes to be differentiated (if the samples are not sufficient, aspects such as which way the face is turned or the relative position of the eyes and mouth in the image may be learned even though they provide no information about the expression). To address this problem, artificial training data will be generated from the existing data through 3 types of transformations (sketched in code after this list):

• Image flipping [49]: it modifies the image to the least extent. For images of 48 × 48 pixels, the column in position i is exchanged for the one in position 48 − i + 1, so a new, mirrored version of the original image is obtained. This simple method has been successfully tested on databases such as ImageNet or CIFAR-10.

• Translation [50]: it consists of displacing all the columns or rows of the image by the same amount and in the same direction. Since the training images are often centered, this may yield better results when the images being tested on are not centered. When moving the rows or columns of an image, those that fall outside the dimensions of the image are eliminated, while new ones have to be filled in on the opposite side. This padding can be any value; in this work, the first row or column has been repeated as many times as necessary. The number of positions by which the image is moved is smaller in the vertical case than in the horizontal one, since in these images the mouth generally sits very low and could otherwise be eliminated.

• Random elimination [51]: it consists of defining a number of submatrices of the image of variable dimension (image patches) and setting the elements that form them to a fixed value (white has been used in this work). This method produces good results, as it forces the network not to rely on certain features. Note that this transformation can decrease overlearning.
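A minimal sketch of the three transformations, assuming 48 × 48 grayscale images stored as NumPy arrays; the function names are hypothetical, and only positive shifts (down/right) are handled in translate:

```python
import numpy as np

rng = np.random.default_rng()

def flip(img):
    """Mirror: column i becomes column 48 - i + 1."""
    return img[:, ::-1]

def translate(img, shift, axis):
    """Shift rows (axis=0) or columns (axis=1) by a positive amount,
    repeating the first row/column on the vacated side."""
    out = np.roll(img, shift, axis=axis)
    if axis == 0:
        out[:shift] = img[0]           # repeat the first row
    else:
        out[:, :shift] = img[:, [0]]   # repeat the first column
    return out

def random_erase(img, n_patches=1, lo=5, hi=15, value=255):
    """Blank random patches with a fixed value (white), so the
    network cannot rely on any single local feature."""
    out = img.copy()
    for _ in range(n_patches):
        h = rng.integers(lo, hi + 1)
        w = rng.integers(lo, hi + 1)
        y = rng.integers(0, img.shape[0] - h)
        x = rng.integers(0, img.shape[1] - w)
        out[y:y + h, x:x + w] = value
    return out
```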
2.4 Key Performance Indicators

This section defines the metrics (or KPIs, Key Performance Indicators) used to evaluate the results of the algorithms [52]. In a classification problem of n classes, let $c_i v_j$ denote the number of elements classified as class i that truly belong to class j, and let N be the total number of elements to classify.

• Precision: the proportion of elements that truly belong to a class among all those that have been classified as such. The precision of class k is calculated as follows:

$$\mathrm{precision}_k = \frac{c_k v_k}{\sum_{i=1}^{n} c_k v_i}$$

• Recall: the proportion of the elements truly belonging to a class that have been classified as such. The recall of class k is calculated as follows:

$$\mathrm{recall}_k = \frac{c_k v_k}{\sum_{i=1}^{n} c_i v_k}$$

• Accuracy: the proportion of total success of the model; it coincides with the weighted average of the recall. It is calculated as follows:

$$\mathrm{accuracy} = \frac{\sum_{i=1}^{n} c_i v_i}{N}$$
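Assuming the $c_i v_j$ counts are stored in a confusion matrix C with rows indexed by predicted class and columns by true class, the three metrics can be computed as follows (an illustrative sketch, not the evaluation code of the paper):

```python
import numpy as np

def kpis(C):
    """C[i, j] = c_i v_j: elements classified as class i that truly
    belong to class j (rows: predicted class, columns: true class)."""
    diag = np.diag(C).astype(float)
    precision = diag / C.sum(axis=1)   # row sums: all classified as k
    recall = diag / C.sum(axis=0)      # column sums: all truly in k
    accuracy = np.trace(C) / C.sum()
    return precision, recall, accuracy
```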
3 Results

The neural architectures described have been trained on the same subset of data in order to maintain the same conditions between architectures. The only exception is the experiments in which artificial data generation is used, since those data are created randomly at runtime. First, the implementation that provides the best result is chosen and subjected to another round of training on the dataset after introducing the artificial data generation techniques. These techniques are always the same, although they provide different images due to the randomness on which they are based. For each image, whether each of the techniques is applied is decided independently, and one image may be subjected to several of them (a sketch of this policy is given below). This is decided as follows:

• Each image in the base training dataset has a 50% chance of generating a flipped copy.

• Each image in the base training dataset has a 50% chance of generating a translated copy. This copy has the same probability of being translated on the horizontal axis (with an offset of between 5 and 15 columns, decided randomly) as on the vertical axis (by 5–10 rows in this case).

• Each image in the base training dataset has a 25% chance of generating a copy of itself with patches removed. The number of patches and their size are decided randomly: between 1 and 3 patches of between 5 and 15 pixels in width and height each.

When evaluating each of the convolutional neural networks, the same test set is always used, and the precision, recall, and accuracy of each of the trained models are studied.
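A hedged sketch of this augmentation policy, assuming the flip, translate, and random_erase helpers (hypothetical names) from the sketch in Sect. 2.3 are in scope:

```python
import numpy as np

rng = np.random.default_rng()

def augment(images, labels):
    """Each original image can spawn up to three additional copies,
    with each transformation decided independently."""
    new_x, new_y = [], []
    for img, y in zip(images, labels):
        if rng.random() < 0.50:                  # 50%: flipped copy
            new_x.append(flip(img)); new_y.append(y)
        if rng.random() < 0.50:                  # 50%: translated copy
            if rng.random() < 0.5:               # horizontal: 5-15 columns
                t = translate(img, int(rng.integers(5, 16)), axis=1)
            else:                                # vertical: 5-10 rows
                t = translate(img, int(rng.integers(5, 11)), axis=0)
            new_x.append(t); new_y.append(y)
        if rng.random() < 0.25:                  # 25%: 1-3 erased patches
            new_x.append(random_erase(img, int(rng.integers(1, 4))))
            new_y.append(y)
    return new_x, new_y
```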
3.1 Le-Net 5

Using this architecture, 3 models have been implemented, changing some of its elements:

• Implementation 1: the last connected layer has been changed so that it contains 7 neurons, since this is the number of classes to classify.

• Implementation 2: in variant I1, the tanh activation function has been replaced by a ReLU function.

• Implementation 3: in the I2 variant, average subsampling has been changed to max subsampling, and dropout and block normalization have been introduced as regularization methods to prevent overlearning of the dataset (see the sketch after Table 2).

Note that the best results are achieved with implementation 3. It is for this reason that this variant is chosen for training with the artificially generated data. Table 2 shows the results obtained.

Table 2 Results of Le-Net 5

Implementation      | Precision | Recall | Accuracy
Implementation 1    | 0.56      | 0.33   | 0.33
Implementation 2    | 0.59      | 0.41   | 0.41
Implementation 3    | 0.63      | 0.45   | 0.45
Best implementation | 0.64      | 0.50   | 0.50
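As an illustration of the implementation 3 changes (ReLU, max subsampling, dropout, and block normalization), here is a possible tf.keras variant; the dropout rates and the placement of the normalization layers are assumptions, since the paper does not detail them:

```python
import tensorflow as tf
from tensorflow.keras import layers

model_i3 = tf.keras.Sequential([
    layers.Conv2D(6, 5, activation="relu", input_shape=(48, 48, 1)),
    layers.BatchNormalization(),      # block normalization (placement assumed)
    layers.MaxPooling2D(2),           # max instead of average subsampling
    layers.Conv2D(16, 5, activation="relu"),
    layers.BatchNormalization(),
    layers.MaxPooling2D(2),
    layers.Flatten(),
    layers.Dense(120, activation="relu"),
    layers.Dropout(0.3),              # rate assumed
    layers.Dense(84, activation="relu"),
    layers.Dropout(0.3),
    layers.Dense(7, activation="softmax"),
])
```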
3.2 Basic CNN

Using this architecture, 5 models have been implemented, changing some of its elements:

• Implementation 1: the base implementation is used. It is observed that, after applying the second convolution and subsampling layers, the image still has a considerable size (12 × 12). This means that the number of parameters to process is high and that the network takes a long time to train.

• Implementation 2: a third convolution layer with its corresponding subsampling layer is introduced into the previous implementation. This new layer has 248 filters, and their size is reduced from 5 × 5 to 3 × 3, because the image is not considered large enough for a 5 × 5 filter to provide relevant information. In addition, a second hidden layer, identical to the first, is introduced in the fully connected layers to see whether more information can be extracted from the available data.

• Implementation 3: using implementation 2, dropout and block normalization layers are introduced after each convolutional layer and fully connected layer as a technique to reduce overlearning.

• Implementation 4: using implementation 2, the convolution layers between the subsampling layers are doubled, introducing after each existing layer a new one identical to it, with one exception: all the new layers use filters of size 3 × 3, to see whether new information can be obtained that was not available before.

• Implementation 5: dropout and block normalization are introduced in the previous version (see the sketch after Table 3).

Note that the best results are achieved with implementation 5. It is for this reason that this variant is chosen for training with the artificially generated data. Table 3 shows the results obtained.

Table 3 Results of basic CNN

Implementation      | Precision | Recall | Accuracy
Implementation 1    | 0.60      | 0.42   | 0.42
Implementation 2    | 0.61      | 0.41   | 0.41
Implementation 3    | 0.63      | 0.41   | 0.41
Implementation 4    | 0.62      | 0.44   | 0.44
Implementation 5    | 0.60      | 0.52   | 0.52
Best implementation | 0.65      | 0.48   | 0.48
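A speculative tf.keras sketch of implementation 5: doubled convolution layers per block (the new ones with 3 × 3 filters), a third block with 248 filters, two identical hidden dense layers, and dropout/block normalization. The filter counts of the first two blocks, the dense layer sizes, and the dropout rates are not stated in the text and are assumptions:

```python
import tensorflow as tf
from tensorflow.keras import layers

def conv_block(model, filters, first_kernel):
    # one "doubled" block: original layer plus an extra 3x3 layer,
    # followed by block normalization, subsampling and dropout
    model.add(layers.Conv2D(filters, first_kernel, padding="same",
                            activation="relu"))
    model.add(layers.Conv2D(filters, 3, padding="same", activation="relu"))
    model.add(layers.BatchNormalization())
    model.add(layers.MaxPooling2D(2))
    model.add(layers.Dropout(0.25))              # rate assumed

model = tf.keras.Sequential()
model.add(tf.keras.Input(shape=(48, 48, 1)))
conv_block(model, 64, 5)      # filter count assumed
conv_block(model, 128, 5)     # filter count assumed
conv_block(model, 248, 3)     # 248 filters of 3x3, as described
model.add(layers.Flatten())
for _ in range(2):            # two identical hidden layers
    model.add(layers.Dense(256, activation="relu"))  # size assumed
    model.add(layers.Dropout(0.5))                   # rate assumed
model.add(layers.Dense(7, activation="softmax"))
```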
3.3 AlexNet

Using this architecture, 3 models have been implemented, changing some of its elements:

• Implementation 1: the first convolutional layer was adapted so that it does not reduce the size of the image. For this, the displacement was reduced from 4 to 1 and, since the FER2013 images are much smaller than those of the AlexNet input, the filter dimension was changed from 11 × 11 to 5 × 5, since it was considered that such a large filter could not give quality information about the image. Also, the last connected layer has been changed to have seven outputs and, due to the difference in the number of classes, the size of the hidden layer has been reduced to 2048 (a sketch of this adaptation is given after Table 4).

• Implementation 2: in the previous version, the fully connected layers are replaced by an input layer of 1024 units, two hidden layers of 2048, and an output layer of 7.

• Implementation 3: dropout and block normalization are introduced in the previous version.

Note that the best results are achieved with implementation 3. It is for this reason that this variant is chosen for training with the artificially generated data. Table 4 shows the results obtained.

Table 4 Results of AlexNet

Implementation      | Precision | Recall | Accuracy
Implementation 1    | 0.63      | 0.39   | 0.39
Implementation 2    | 0.62      | 0.46   | 0.46
Implementation 3    | 0.56      | 0.48   | 0.48
Best implementation | 0.63      | 0.52   | 0.52
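A rough sketch of the adapted network of implementation 1: stride reduced from 4 to 1 and 5 × 5 filters in the first layer, hidden layers of 2048 units, and 7 outputs. The filter counts of the middle layers follow classic AlexNet and are assumptions here:

```python
import tensorflow as tf
from tensorflow.keras import layers

alexnet_adapted = tf.keras.Sequential([
    # adapted front end: 48 filters, 5x5, stride 1 (instead of 11x11/4)
    layers.Conv2D(48, 5, strides=1, padding="valid", activation="relu",
                  input_shape=(48, 48, 1)),
    layers.MaxPooling2D(3, strides=2),
    layers.Conv2D(256, 5, padding="same", activation="relu"),   # assumed count
    layers.MaxPooling2D(3, strides=2),
    layers.Conv2D(384, 3, padding="same", activation="relu"),   # assumed count
    layers.Conv2D(384, 3, padding="same", activation="relu"),   # assumed count
    layers.Conv2D(256, 3, padding="same", activation="relu"),   # assumed count
    layers.MaxPooling2D(3, strides=2),
    layers.Flatten(),
    layers.Dense(2048, activation="relu"),   # reduced from 4096 per the text
    layers.Dense(2048, activation="relu"),
    layers.Dense(7, activation="softmax"),   # seven outputs
])
```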
3.4 ResNet-50

Using this architecture, 2 models have been implemented, changing some of its elements:

• Implementation 1: the final fully connected layer is replaced by one of 7 learning units (the number of classes to classify), as sketched below.

• Implementation 2: in the previous implementation, three fully connected layers are added before the output layer. The first has 1024 learning units, the other two have 2048, and the output has 7. Dropout and block normalization techniques are also used in these layers.

Note that the best results are achieved with implementation 1. It is for this reason that this variant is chosen for training with the artificially generated data. Table 5 shows the results obtained.

Table 5 Results of ResNet-50

Implementation      | Precision | Recall | Accuracy
Implementation 1    | 0.60      | 0.51   | 0.51
Implementation 2    | 0.60      | 0.43   | 0.43
Best implementation | 0.64      | 0.51   | 0.51
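A minimal sketch of implementation 1, assuming a tf.keras ResNet-50 backbone; whether pretrained weights were used is not stated (weights=None is an assumption), and the grayscale images are presumed to be replicated to three channels:

```python
import tensorflow as tf
from tensorflow.keras import layers

# Standard ResNet-50 body without its original 1000-class head;
# input: 48x48 grayscale images replicated to 3 channels (assumption).
base = tf.keras.applications.ResNet50(include_top=False, weights=None,
                                      input_shape=(48, 48, 3))
resnet_i1 = tf.keras.Sequential([
    base,
    layers.GlobalAveragePooling2D(),
    layers.Dense(7, activation="softmax"),   # final layer of 7 units
])
```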
4 Discussion

From the results of the metrics obtained in the experiments, the following conclusions can be drawn.

In the first place, it has been verified that the type of activation function used in the networks has a direct influence: the experiments show that when the ReLU activation function is used in all the layers of the networks (except the output layer, where SoftMax is used in all cases), better results are obtained than with other activation functions.

Secondly, it has been found that there is a direct relationship between the complexity of the networks (the number of filters in the convolutional layers and the number of learning units) and the quality of the results: in general, the more complex the network, the better the results obtained. This does not hold if the network was already complex enough (for example, in the ResNet-50 architecture, when several fully connected layers are added to the first implementation, the resulting second implementation does not improve, probably because the images had already been sufficiently processed before reaching the fully connected layers).

Thirdly, the results show that the dropout regularization and block normalization techniques, used in all the experiments, generally improve the accuracy of the networks.

Fourth, it has been verified for all types of networks that the extension of the training dataset shows in all cases an improvement in the precision of the model,
and also, in three of the four implementations, it increases or maintains the global accuracy.

Finally, it can be seen that results as good as those obtained with more complex networks can be obtained with a simple network. This phenomenon is observed in the Le-Net 5 network, where changing the activation function, using regularization techniques, and expanding the training dataset yield results as good as those obtained with other, more complex networks, and with a lower training time due to the simplicity of the network (it can be compared, for example, with the best implementation of the AlexNet network, where more filters and more preprocessing are introduced).

If the confusion matrices (Appendix A) are analyzed, the following phenomena can be observed. The anger expression class is the one that presents the most classification errors and on which the networks perform worst, since in all of them samples that do not belong to this class are classified as anger. In particular, the results show that this confusion with the expression of anger occurs mainly with the fear, sadness, and neutral classes, and to a lesser extent with the expressions of joy or surprise. This behavior could be explained by several reasons. Regarding the expressions of joy and surprise, the explanation could be that joy is the emotion opposite to anger and presents an equally opposite expression, while in the gesture of surprise an O is formed with the mouth, which is easier to recognize and therefore to distinguish. Regarding the misclassification with the expressions of fear, sadness, and neutral, the explanation could lie in the implementation of the models and the internal workings of the framework used: if there are not enough samples to recognize an image as belonging to a class, it may be classified as the most similar expression class among those with more samples (in this case, the expression of anger). Likewise, in the particular case of the confusion with the expression of sadness, the explanation could be the difficulty in distinguishing both expressions, because some of the facial features that can be associated with anger can also be associated with sadness (the line of the mouth straight or slightly curved downward).

On the other hand, the expression of disgust also presents classification problems. In this case, the explanation is that the number of disgust images (547) is much lower than that of other classes such as happiness (which has 8989 images). This problem could be reduced by using techniques to reduce class imbalance, introducing new images that can be classified in the disgust category. Finally, it is observed that, apart from the previous cases, the models classify the rest of the expression classes quite accurately.
5 Conclusions and Future Work

In this work, the ability of several types of convolutional neural networks to classify images showing different facial expressions corresponding to emotional states has been studied. For this, several experiments have been carried out. In the first place, 4 types of networks were chosen, and for each of them improvements were implemented by varying its architecture or its components. Each implemented variant was trained and tested with the same dataset obtained from the FER2013 facial expression image database, and the results obtained in each case were compared with the aim of selecting the variant that classifies best for each type of architecture. Secondly, artificial data were generated from the images of the FER2013 database by means of three artificial data generation techniques (flipping, translation, and random elimination), with the aim of increasing the number of samples for training and testing the networks and, in this way, measuring the impact of the number of samples on the results obtained. The dataset augmented with the artificially generated data was tested with the best variants obtained using the original data. The main conclusions obtained from the results of the experiments are:

(1) Using the ReLU activation function in the non-output layers performs better than the hyperbolic tangent activation function.

(2) The use of regularization techniques improves the accuracy of the network.

(3) Extending the dataset using artificial data generation techniques improves network performance.

(4) The greater the complexity of a network, the better the results obtained. However, it is possible to obtain results with simple networks similar to those obtained with more complex implementations if their parameters and components are adjusted appropriately.

(5) Poor behavior is obtained in the classification of the expression of anger, probably because this emotion shares features with the other expressions, such as the curvature of the mouth between anger and sadness.

(6) Poor behavior is also obtained with the expression of disgust, due to the low number of images in the dataset used compared with the number of images available for the other facial expressions.

There are several lines of future work to improve the results obtained. Firstly, it is proposed to extend the dataset used for training by means of other datasets or by generating more artificial data using other techniques, such as image blending. In this sense, it would also be interesting to compare artificial data generation techniques to analyze which are the most appropriate for this problem. Another line of future work is to analyze the impact on the classification results of a greater or lesser use of preprocessing techniques (for example, correcting the lighting of images or extracting facial features) or of the type of images used (in particular, analyzing the difference between the use of 2D and 3D images). Lastly, another line of work consists of analyzing how considering that facial expressions
66
A. Sarasa-Cabezuelo
of emotions are discrete or continuous influences the classification process; that is, considering that a facial expression can represent only a single type of emotion, or that traits of various emotions can be found intermingled in the same facial expression.

Acknowledgements I would like to thank Mateo García Pérez for developing the analyses.
References

1. Ali W, Tian W, Din SU, Iradukunda D, Khan AA (2021) Classical and modern face recognition approaches: a complete review. Multimedia Tools Appl 80(3):4825–4880
2. Liu Q, Zhang N, Yang W, Wang S, Cui Z, Chen X, Chen L (2017) A review of image recognition with deep convolutional neural network. In: International conference on intelligent computing. Springer, Cham, pp 69–80
3. Javidi B (2022) Image recognition and classification: algorithms, systems, and applications. CRC Press
4. Pak M, Kim S (2017) A review of deep learning in image recognition. In: 2017 4th international conference on computer applications and information processing technology (CAIPT). IEEE, pp 1–3
5. Quraishi MI, Choudhury JP, De M (2012) Image recognition and processing using artificial neural network. In: 2012 1st international conference on recent advances in information technology (RAIT). IEEE, pp 95–100
6. Chen H, Geng L, Zhao H, Zhao C, Liu A (2021) Image recognition algorithm based on artificial intelligence. Neural Comput Appl 2021:1–12
7. Hu Z, He T, Zeng Y, Luo X, Wang J, Huang S, Lin B (2018) Fast image recognition of transmission tower based on big data. Protect Control Mod Power Syst 3(1):1–10
8. Sapijaszko G, Mikhael WB (2018) An overview of recent convolutional neural network algorithms for image recognition. In: 2018 IEEE 61st international midwest symposium on circuits and systems (MWSCAS). IEEE, pp 743–746
9. Revina IM, Emmanuel WS (2021) A survey on human face expression recognition techniques. J King Saud Univ Comput Inf Sci 33(6):619–628
10. Dhillon A, Verma GK (2020) Convolutional neural network: a review of models, methodologies and applications to object detection. Progr Artif Intell 9(2):85–112
11. Nonis F, Dagnes N, Marcolin F, Vezzetti E (2019) 3D approaches and challenges in facial expression recognition algorithms—a literature review. Appl Sci 9(18):3904
12. Ekundayo O, Viriri S (2019) Facial expression recognition: a review of methods, performances and limitations. In: 2019 conference on information communications technology and society (ICTAS). IEEE, pp 1–6
13. Kodhai E, Pooveswari A, Sharmila P, Ramiya N (2020) Literature review on emotion recognition system. In: 2020 international conference on system, computation, automation and networking (ICSCAN). IEEE, pp 1–4
14. Abdullah SMS, Abdulazeez AM (2021) Facial expression recognition based on deep learning convolution neural network: a review. J Soft Comput Data Min 2(1):53–65
15. Masson A, Cazenave G, Trombini J, Batt M (2020) The current challenges of automatic recognition of facial expressions: a systematic review. AI Commun 33(3–6):113–138
16. Altaher A, Salekshahrezaee Z, Abdollah Zadeh A, Rafieipour H, Altaher A (2020) Using multi-inception CNN for face emotion recognition. J Bioeng Res 3(1):1–12
17. Owusu E, Kumi JA, Appati JK (2021) On facial expression recognition benchmarks. Appl Comput Intell Soft Comput
18. Ekundayo O, Viriri S (2021) Multilabel convolution neural network for facial expression recognition and ordinal intensity estimation. PeerJ Comput Sci 7:e736
19. Kaur P, Krishan K, Sharma SK, Kanchan T (2020) Facial-recognition algorithms: a literature review. Med Sci Law 60(2):131–139
20. Pham L, Vu TH, Tran TA (2020) Facial expression recognition using residual masking network. In: 2020 25th international conference on pattern recognition (ICPR). IEEE, pp 4513–4519
21. Alreshidi A, Ullah M (2020) Facial emotion recognition using hybrid features. Informatics 7(1)
22. Balasubramanian B, Diwan P, Nadar R, Bhatia A (2019) Analysis of facial emotion recognition. In: 2019 3rd international conference on trends in electronics and informatics (ICOEI). IEEE, pp 945–949
23. Chengeta K, Viriri S (2019) A review of local, holistic and deep learning approaches in facial expressions recognition. In: 2019 conference on information communications technology and society (ICTAS). IEEE, pp 1–7
24. Ko BC (2018) A brief review of facial emotion recognition based on visual information. Sensors 18(2):401
25. Dino H, Abdulrazzaq MB, Zeebaree SR, Sallow AB, Zebari RR, Shukur HM, Haji LM (2020) Facial expression recognition based on hybrid feature extraction techniques with different classifiers. TEST Eng Manage 83:22319–22329
26. Li Z, Liu F, Yang W, Peng S, Zhou J (2021) A survey of convolutional neural networks: analysis, applications, and prospects. IEEE Trans Neural Netw Learn Syst
27. Kumar A, Kaur A, Kumar M (2019) Face detection techniques: a review. Artif Intell Rev 52(2):927–948
28. Zhang H, Jolfaei A, Alazab M (2019) A face emotion recognition method using convolutional neural network and image edge computing. IEEE Access 7:159081–159089
29. Saxena A, Khanna A, Gupta D (2020) Emotion recognition and detection methods: a comprehensive survey. J Artif Intell Syst 2(1):53–79
30. Jaapar RMQR, Mansor MA (2018) Convolutional neural network model in machine learning methods and computer vision for image recognition: a review. J Appl Sci Res 14(6):23–27
31. Singh S, Nasoz F (2020) Facial expression recognition with convolutional neural networks. In: 2020 10th annual computing and communication workshop and conference (CCWC). IEEE, pp 0324–0328
32. Kusuma GP, Jonathan APL, Lim AP (2020) Emotion recognition on FER-2013 face images using fine-tuned VGG-16. Adv Sci Technol Eng Syst J 5(6):315–322
33. Kiranyaz S, Avci O, Abdeljaber O, Ince T, Gabbouj M, Inman DJ (2021) 1D convolutional neural networks and applications: a survey. Mech Syst Signal Process 151:107398
34. Yamashita R, Nishio M, Do RKG, Togashi K (2018) Convolutional neural networks: an overview and application in radiology. Insights Imaging 9(4):611–629
35. Zhou DX (2020) Theory of deep convolutional neural networks: downsampling. Neural Netw 124:319–327
36. Lindsay GW (2021) Convolutional neural networks as a model of the visual system: past, present, and future. J Cogn Neurosci 33(10):2017–2031
37. Khan A, Sohail A, Zahoora U, Qureshi AS (2020) A survey of the recent architectures of deep convolutional neural networks. Artif Intell Rev 53(8):5455–5516
38. Jiao L, Zhang F, Liu F, Yang S, Li L, Feng Z, Qu R (2019) A survey of deep learning-based object detection. IEEE Access 7:128837–128868
39. Alzubaidi L, Zhang J, Humaidi AJ, Al-Dujaili A, Duan Y, Al-Shamma O, Farhan L (2021) Review of deep learning: concepts, CNN architectures, challenges, applications, future directions. J Big Data 8(1):1–74
40. Boulent J, Foucher S, Théau J, St-Charles PL (2019) Convolutional neural networks for the automatic identification of plant diseases. Front Plant Sci 10:941
41. Véstias MP (2019) A survey of convolutional neural networks on edge with reconfigurable computing. Algorithms 12(8):154
42. Kimutai G, Ngenzi A, Said RN, Kiprop A, Förster A (2020) An optimum tea fermentation detection model based on deep convolutional neural networks. Data 5(2):44
43. LeCun Y, Bottou L, Bengio Y, Haffner P (1998) Gradient-based learning applied to document recognition. Proc IEEE 86(11):2278–2324
44. Kouretas I, Paliouras V (2019) Simplified hardware implementation of the softmax activation function. In: 2019 8th international conference on modern circuits and systems technologies (MOCAST). IEEE, pp 1–4
45. Jaafra Y, Laurent JL, Deruyver A, Naceur MS (2019) Reinforcement learning for neural architecture search: a review. Image Vis Comput 89:57–66
46. Ismail Fawaz H, Lucas B, Forestier G, Pelletier C, Schmidt DF, Weber J, Petitjean F (2020) InceptionTime: finding AlexNet for time series classification. Data Min Knowl Disc 34(6):1936–1962
47. Li B, Lima D (2021) Facial expression recognition via ResNet-50. Int J Cogn Comput Eng 2:57–64
48. Lateh MA, Muda AK, Yusof ZIM, Muda NA, Azmi MS (2017) Handling a small dataset problem in prediction model by employ artificial data generation approach: a review. J Phys Conf Ser 892(1)
49. Taylor L, Nitschke G (2018) Improving deep learning with generic data augmentation. In: 2018 IEEE symposium series on computational intelligence (SSCI). IEEE, pp 1542–1547
50. Body T, Tao X, Li Y, Li L, Zhong N (2021) Using back-and-forth translation to create artificial augmented textual data for sentiment analysis models. Expert Syst Appl 178:115033
51. Liu H, Motoda H (eds) (2007) Computational methods of feature selection. CRC Press
52. Peral J, Maté A, Marco M (2017) Application of data mining techniques to identify relevant key performance indicators. Comput Stand Interf 54:76–85
Chapter 6
Identification of Customer Preferences by Using the Multichannel Personalization for Product Recommendations

B. Ramakantha Reddy and R. Lokesh Kumar
1 Introduction

Previous publications show that the advertising business has changed considerably because of the rapid and widespread diffusion of the Internet. Both conventional and online advertising media use search engines and social media to enhance their promotional reach [1]. Customers therefore receive messages through various online and conventional channels; at the same time, many customers ignore online advertising [2], and as of now one-third of users employ ad-blocking software to remove unwanted advertisements. The literature shows that many companies use personalization concepts to improve the efficiency of their publicity [3]. Personalization is one of the most common product recommendation approaches in e-commerce, and it has been incorporated directly into retailers' websites and into the email communication between customers and organizations [4]. Initially, personalized recommendations were mostly related to intelligence technology; later, they transferred to substantive advertising methodology for attracting particular audiences. In [5], the authors show experimentally that personalization yields improved publicity rates. Research data from recent years clearly indicate that product endorsement systems are frequently applied in industrial applications [6]. In [7], a German retailer reports that its purchase order rate improved by 25% with the help of a product
recommendation system. The rapid growth in the number of advertising channels, together with mostly shrinking advertising budgets, forces companies to decide how to invest their funds effectively for publicity purposes. It is therefore very complex to determine the contribution of each advertising channel and of the messages it carries. Personalized advertisements are currently of concern across media channels when evaluating recommendations that help industries attract consumers [8]. Hence, in this article, personalized product endorsements are evaluated across different advertisement settings in terms of media channels and the underlying recommendation techniques; further design parameters depending on the customer are also considered. In [9], a single communication channel, the retailer's website, is used for the design of a product personalization system. Past research offers, first, an examination of how recommendation features affect customers' motivation to follow personalized recommendations and, second, a comparison of banner advertising, package inserts, and email advertising [10]. This article provides updated information on how personalization features affect advertising media, and it relates the design of product recommendations to personalization research. Here, gender has been considered as a factor: previously published studies indicate that female customers are more involved in clothing purchases than male customers [11], so gender variation is a major factor in advertising for clothing manufacturers, and more data are collected from customers before they buy clothes. Moreover, across generations, online shopping behavior and male purchasing behavior are changing rapidly [12], and male customers' enjoyment of shopping is increasing, which calls for a deeper gender-specific analysis. Therefore, a separate analysis has been made for each gender.
2 Realistic Choice-Based Conjoint Analysis

The experimental investigation in this section is organized around three elements: the research area, the design of the experimental personalized products, and the samples and data collection.
2.1 Area of Research

In recent papers, the effects of personalization on products are analyzed using a factorial strategy, which requires many samples to cover all conditions. Determining the preference for each of the various experimental alternatives is therefore not feasible under all conditions but only in particular
conditions. Because of this disadvantage, the factorial experiment is replaced by a choice-based investigation, in which Bayes estimation is used as a user-centric technique for evaluating preferences for personalized product recommendations [13]. Conventional conjoint analysis is applied to derive consumers' preference behavior from ranking or rating data [14]. In choice-based conjoint analysis, respondents are asked to select an option from choice sets, i.e., sets of alternatives [15]. Respondents may select a none option instead of one of the product choices, indicating that their choice would go to some other, unshown stimulus. This kind of selection among substitutes closely resembles real-world decisions in advertising contexts, so customers' choices can be taken into account in a realistic way [16].

In this article, all stimuli are presented visually, owing to the particular characteristics of product recommendations. The effect of a recommendation is determined using the underlying concept that a recommendation is evaluated through the combination of stimulus characteristics and personal characteristics. A number of published articles in the marketing field present alternatives visually [17]. Early marketing research of this kind deals with the optimization of landing pages and website interfaces; determining advertising effectiveness in several contexts remains a very difficult task. Similarly to landing page optimization, the design of complex ads has been completed by evaluating product recommendation features [18]. The shortcomings of the above methods are compensated by applying the conjoint concept: conjoint analysis is applied in the recommendation context to handle complex advertisements, and it can help determine the reasons for refusing media ads altogether. The conjoint-based concept features, first, the personalization system and, second, the selection of particular tasks including a none option [19]. A single-product conjoint analysis is a special case of choice-based analysis.
2.2 Experimental Strategy of Conjoint Analysis

In the conjoint investigation, respondents are presented with a pullover. Supposing the pullover was bought recently from the retailer, the choice-based analysis asks about the various subjects and the corresponding instantaneous product recommendations. Two major conjoint analyses have been carried out, separated by gender. Based on the conjoint features, the best-selling products are evaluated on the Amazon website [20]. The advertisement channel is analyzed visually, based on the integration of personalized recommendations with inserted package images. However, recommendation systems based on the collaborative concept face several challenges: the speed of recommendation initiation, the sparsity of the recorded data, the effectiveness of the recommendations, and scalability [21]. Numerous
attempts have been employed to limit the disadvantages of the collaborative approach; based on these attempts, the resulting recommendations can be highly accurate. The present work describes the various recommendation systems and the determinations they require. In the literature, recommender systems are classified as content-based systems, Knowledge Correlated Systems (KCS), memory-dependent systems, and Collaborative Related Filtering (CRF) [22]. Content-based recommender systems work by content filtering and the analysis of memory data. From one item to another, the collaborative filtering concept is used because of its simplicity, recommendation quality, high scalability, and fast data understanding and updating. Collaborative concepts in product recommendation have been implemented through the personalized recommendation pages of the Amazon website [23], where the style of the product description is given in the form of a label representing a unique proposal for you, depending on the products you bought recently. Currently, recommendation systems are developed to offer products to potential customers, and collaborative filtering is a basic technique among them, advising on similar customers on the basis of incoming and past transactions. Big data analysis is important in collaborative filtering recommendation networks [24], yet suggestions made using collaborative filtering alone can have rather low accuracy. In previous works, several researchers have employed association rules in various recommendation methods to enhance the accuracy of the suggestions, but the major drawbacks of such techniques are their long running times, which can make them unsuitable for real-world applications [25]. At present, a large number of researchers are working on personalized product recommendations. The probable causes for declining the use of product recommendations, independently of the advertising medium, are given in Table 1.
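As a purely illustrative aside, a minimal item-based collaborative filtering sketch of the kind described above (item-to-item cosine similarity, as popularized by Amazon); this is not the system evaluated in this study:

```python
import numpy as np

def item_similarity(R):
    """R: user-item interaction matrix (users x items, e.g., 0/1
    purchases); returns the item-item cosine similarity matrix."""
    norms = np.linalg.norm(R, axis=0)
    norms[norms == 0] = 1.0            # avoid division by zero
    Rn = R / norms                     # column-normalize items
    return Rn.T @ Rn

def recommend(R, user, k=4):
    """Score items by similarity to the user's purchases and return
    the k best items not yet bought."""
    scores = item_similarity(R) @ R[user]
    scores[R[user] > 0] = -np.inf      # do not re-recommend purchases
    return np.argsort(scores)[::-1][:k]
```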
2.3 Various Samples Plus Data Assembly

Following the literature, a group of students at a German institute was selected for understanding and analyzing the choice-based conjoint method. The attributes and their levels in the conjoint design are given in Table 2. The population is a major consideration for the current research topic: students born from the 1990s onwards grew up with digital technologies and are referred to as digital natives [26]. Digital technology plays a major role in present-day life, and the involvement of older people in digital systems is also a current trend. Given the strong influence of digital media and the personalization concept, it is important to combine traditional and digital recommendations across channels. In [27], the authors differentiate male and female candidates with respect to clothes shopping.
Table 1 Probable causes for declining the utilization of product endorsements

Product or cause                                               | Hypothesis
In principle, I ignore all personalized publicity              | Avoidance of publicity
In principle, I ignore all email-related publicity             | Avoidance of publicity
In principle, I ignore all banner publicity                    | Avoidance of publicity
In principle, I ignore all publicity in package inserts        | Avoidance of publicity
The firm may access my private data                            | Confidentiality anxiety
I find advertising material annoying                           | Annoyance
I do not like the personalized items                           | Quality of recommendation
Most of the personalized products are the same as one another  | Perceived variation of recommendation
I do not know why I continuously receive recommended mails     | Clarity

Table 2 Attributes and their levels in the choice-based conjoint analysis

Research hypothesis                                                           | Attribute                 | Levels
Customers prefer printed publicity channels to digital information-exchange channels (banners and emails) | Publicity channel         | Banner and package inserts
Consumers prefer email ads to exceptional publicity                           | Process                   | Email and banner publicity
Customers prefer product recommendations obtained from collaborative filtering | Process                   | Collective streaming, best-marketing items
Consumers prefer product-style descriptions to general descriptions          | Clarification             | Each recommendation related to recent product purchases
Customers prefer moderate recommendation sets to larger ones                 | Number of recommendations | 4, 8, 12
Customers prefer retailers with high-level credibility as the advertising source | Source                    | Amazon, baur, and vests
results, the gender variation is controlled accurately, and gender is known to shape people's preferences for various products and for their recommendation through the different media. The digital survey covered the faculties of economics and law. Students were sampled at random from those present at the survey time, so every student had an equal probability of being selected. In article [28], the authors placed four computers at the faculty block entrance and put up posters to make the survey effective, and participation was updated for every survey. In article [29], 334 students were selected for the analytical survey; two respondents who were not digital natives were then removed, and the remainder completed the survey quickly. Overall, 48.8% of the respondents were female and the remaining 51.2% were male [30]. Of the respondents, 76.2% were aged between 18 and 23, and 74.1% were undergraduates. However, the selected sample may not be fully representative due to small selection biases.
3 Research Results and Discussion
The factors considered for the analysis of results are goodness of fit, predictive validity, and the choice-based conjoint analysis with its detailed hypotheses.
3.1 Evaluation of Quality of Fit and Predictive Analysis
The quality of fit of the various models is evaluated using Average Root Probability (ARP) rates. The fit value is greater than the ARP chance level of 0.5 for the combined sample as well as for the individual samples. A high average Initial Selected Hit Rate (ISHR) likewise indicates good predictability. The choice behaviour was correctly forecast for 80.12% of the male candidates and 79.88% of the female samples. The gender-specific evaluation accuracy improves to 59% for male respondents and to 54% for female respondents when compared with the random value. Hence, choice-based conjoint analysis with realistic recommendation stimuli is a suitable method for identifying advertisement preferences, as given in Table 3.
Table 3 Internal and analytical validity of the effective utility estimations

 | Male (x = 195) | Female (x = 158)
ARP or RLH:
Cumulative | 0.783 | 0.721
Specific | 0.795 | 0.775
ISHR or FCHR:
Holdout assessment-1 (%) | 75.31 | 75.33
Holdout assessment-2 (%) | 79.98 | 72.77
Holdout assessment-3 (%) | 78.22 | 80.88
Holdout assessment-4 (%) | 84.55 | 85.11
Average | 78.02 | 79.85
3.2 Choice Associated Combined and Assumption-Based Testing
The conjoint results for the cumulative and individual samples, i.e. the mean importances and zero-centred part-worth utilities of the attribute levels, indicate the inclination to use personalized product recommendations delivered through the different media channels. The advertising channel separates male and female respondents most clearly and is the most important attribute for male candidates: its importance is 47% for male candidates and 43% for female candidates. From Table 4, male candidates prefer package inserts over the other advertising media for product recommendations, while female candidates prefer email advertisements for personalized recommendations.
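For context, attribute importance in a conjoint study is conventionally derived from the range of each attribute's zero-centred part-worths. The sketch below uses a subset of approximate male-sample part-worths from Table 4 purely to illustrate the standard calculation; the chapter's reported importances are computed over all attributes and respondents:

```python
# Zero-centred part-worth utilities for a subset of attributes
# (approximate male-sample values from Table 4)
part_worths = {
    "publicity channel": [41.93, 17.89, -68.00],  # package inserts, email, banner
    "algorithm": [-21.00, 21.23],                 # collaborative filtering, best seller
    "clarification": [-2.00, 2.25],               # style of item, undefined
}

# Importance of an attribute = its part-worth range as a share of all ranges
ranges = {a: max(u) - min(u) for a, u in part_worths.items()}
total = sum(ranges.values())
importance = {a: round(100 * r / total, 1) for a, r in ranges.items()}
print(importance)
```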
3.3 Product Recommendation Rejecting Reasons
On average, about one-third of the respondents, across the sixteen choice tasks combined with the 4 holdout tasks covering existing product recommendation systems, refused the use of personalized product recommendations; these candidates selected the 'none' option in at least 3 of the 16 choice tasks. However, the major reasons for rejecting the product recommendation system differ, and the significant variation between the two samples is shown in Fig. 1.
Table 4 Part-worth utilities and attribute importances for the two gender groups (entries: significance and average deviation)

Quality or level | Male candidates (x = 195) | Female candidates (x = 158)
Publicity channel | 46.66%, 17.5528 | 44.22%, 15.9334
Package supplements | 41.92934, 85.27832 | 26.85, 90.3287
Advertising based on email | 17.88834, 89.81167 | 36.98763, 64.734441
Advertising based on banner | −68.004521221 | −69.2999876001
Algorithm | 12.222%, 8.8999 | 18.66%, 14.044887
Collaborative filtering | −21.00044, 29.4655 | 6.224, 56.81123
Good selling item | 21.22982, 29.455430 | −4.89321, 57.228991
Clarification | 6.59012%, 5.55021 | 6.8213%, 5.00012
Style of item | −2.00031, 23.10321 | −10.274528, 19.9928
Undefined | 2.254377, 23.332299 | 10.732854, 19.99934
Recommendations | 19.665443%, 11.7756 | 19.4406%, 9.34288
4 | 52.9965, 55.88826 | −7.7756, 45.99878
8 | −23.8778, 30.6667 | −23.88865, 28.999654
12 | −17.9987, 44.4565021 | 29.99989, 40.99978765
Supplier | 14.88765%, 7.564330 | 14.9876554%, 7.003421
Amazon | 24.987654, 32.34667 | 26.365350, 30.6753635
Option none | 173.9678674, 225.87656745 | 99.9944563, 124.87654
Baur | −14.8754333, 28.54322 | −18.54330, 20.5433
Vestes deis | −12.32211, 23.9876 | −7.985446, 31.997654
4 Conclusion
In this work, personalized advertising is described in its various forms, namely through a choice-based conjoint design and visual presentation of the stimuli, which had not been clearly illustrated in previously published marketing works. The conjoint analysis effectively identifies customer preferences across advertisement channels, and the choice-based conjoint captures gender differences and the corresponding preferences for product recommendations in the various advertising channels. Hence, promoters and local sellers can make more informed decisions about advertising their products. Other factors have also been taken into account, such as the total number of personalized recommendations matching particular customer preferences.
Fig. 1 Major reasons for opposing the product recommendation system
References
1. Xie C, Teo P (2020) Institutional self-promotion: a comparative study of appraisal resources used by top- and second-tier universities in China and America. High Educ 80(2):353–371
2. Li D, Atkinson L (2020) Effect of emotional victim images in prosocial advertising: the moderating role of helping mode. Int J Nonprofit Voluntary Sector Market 25(4):e1676
3. Wongwatkit C, Panjaburee P, Srisawasdi N, Seprum P (2020) Moderating effects of gender differences on the relationships between perceived learning support, intention to use, and learning performance in a personalized e-learning. J Comput Educ 7(2):229–255
4. Kwayu S, Abubakre M, Lal B (2021) The influence of informal social media practices on knowledge sharing and work processes within organizations. Int J Inf Manage 58:102280
5. Huey RB, Carroll C, Salisbury R, Wang JL (2020) Mountaineers on Mount Everest: effects of age, sex, experience, and crowding on rates of success and death. PLoS ONE 15(8):e0236919
6. Selvaraj V, Karthika TS, Mansiya C, Alagar M (2021) An over review on recently developed techniques, mechanisms and intermediate involved in the advanced azo dye degradation for industrial applications. J Mol Struct 1224:129195
7. Schreiner T, Rese A, Baier D (2019) Multichannel personalization: identifying consumer preferences for product recommendations in advertisements across different media channels. J Retail Consum Serv 48:87–99
8. Hong T, Choi JA, Lim K, Kim P (2020) Enhancing personalized ads using interest category classification of SNS users based on deep neural networks. Sensors 21(1):199
9. Wang Y, Ma HS, Yang JH, Wang KS (2017) Industry 4.0: a way from mass customization to mass personalization production. Adv Manuf 5(4):311–320
10. Guitart IA, Hervet G, Gelper S (2020) Competitive advertising strategies for programmatic television. J Acad Mark Sci 48(4):753–775
11. Sen S, Antara N, Sen S (2021) Factors influencing consumers’ to take ready-made frozen food. Curr Psychol 40(6):2634–2643
12. Matuschek E, Åhman J, Webster C, Kahlmeter G (2018) Antimicrobial susceptibility testing of colistin–evaluation of seven commercial MIC products against standard broth microdilution for Escherichia coli, Klebsiella pneumoniae, Pseudomonas aeruginosa, and Acinetobacter spp. Clin Microbiol Infect 24(8):865–870
13. Haruna K, Akmar Ismail M, Suhendroyono S, Damiasih D, Pierewan AC, Chiroma H, Herawan T (2017) Context-aware recommender system: a review of recent developmental process and future research direction. Appl Sci 7(12):1211
14. Carlo AD, Hosseini Ghomi R, Renn BN, Areán PA (2019) By the numbers: ratings and utilization of behavioral health mobile applications. NPJ Digital Med 2(1):1–8
15. Gottschall T, Skokov KP, Fries M, Taubel A, Radulov I, Scheibel F, Gutfleisch O (2019) Making a cool choice: the materials library of magnetic refrigeration. Adv Energy Mater 9(34):1901322
16. Illgen S, Höck M (2019) Literature review of the vehicle relocation problem in one-way car sharing networks. Transp Res Part B Methodol 120:193–204
17. Sample KL, Hagtvedt H, Brasel SA (2020) Components of visual perception in marketing contexts: a conceptual framework and review. J Acad Mark Sci 48(3):405–421
18. He R, Kang WC, McAuley J (2017) Translation-based recommendation. In: Proceedings of the eleventh ACM conference on recommender systems, pp 161–169
19. Micu A, Capatina A, Cristea DS, Munteanu D, Micu AE, Sarpe DA (2022) Assessing an onsite customer profiling and hyper-personalization system prototype based on a deep learning approach. Technol Forecast Soc Chang 174:121289
20. Kaushik K, Mishra R, Rana NP, Dwivedi YK (2018) Exploring reviews and review sequences on e-commerce platform: a study of helpful reviews on Amazon. J Retail Consumer Serv 45:21–32
21. Wu Z, Li C, Cao J, Ge Y (2020) On scalability of association-rule-based recommendation: a unified distributed-computing framework. ACM Trans Web (TWEB) 14(3):1–21
22. Tan Z, He L (2017) An efficient similarity measure for user-based collaborative filtering recommender systems inspired by the physical resonance principle. IEEE Access 5:27211–27228
23. Yoneda T, Kozawa S, Osone K, Koide Y, Abe Y, Seki Y (2019) Algorithms and system architecture for immediate personalized news recommendations. In: IEEE/WIC/ACM international conference on web intelligence, Oct 2019, pp 124–131
24. Kamilaris A, Kartakoullis A, Prenafeta-Boldú FX (2017) A review on the practice of big data analysis in agriculture. Comput Electron Agric 143:23–37
25. Tarus JK, Niu Z, Mustafa G (2018) Knowledge-based recommendation: a review of ontology-based recommender systems for e-learning. Artif Intell Rev 50(1):21–48
26. Kirschner PA, De Bruyckere P (2017) The myths of the digital native and the multitasker. Teach Teach Educ 67:135–142
27. Bourabain D, Verhaeghe PP (2019) Could you help me, please? Intersectional field experiments on everyday discrimination in clothing stores. J Ethn Migr Stud 45(11):2026–2044
28. Schwab-McCoy A, Baker CM, Gasper RE (2021) Data science in 2020: computing, curricula, and challenges for the next 10 years. J Stat Data Sci Educ 29(sup1):S40–S50
29. Oswalt SB, Lederer AM, Chestnut-Steich K, Day C, Halbritter A, Ortiz D (2020) Trends in college students’ mental health diagnoses and utilization of services, 2009–2015. J Am Coll Health 68(1):41–51
30. Kao K, Benstead LJ (2021) Female electability in the Arab world: the advantages of intersectionality. Comp Polit 53(3):427–464
Chapter 7
A Post-disaster Relocation Model for Infectious Population Considering Minimizing Cost and Time Under a Pentagonal Fuzzy Environment Mayank Singh Bhakuni, Pooja Bhakuni, and Amrit Das
1 Introduction
Disasters are unanticipated catastrophic events that cause significant harm to the population. In recent years there has been a considerable increase in the frequency of natural and man-made disasters. These catastrophes cause human casualties and damage to public and private infrastructure. In the past, natural disasters like the Kedarnath flood in 2013, the Nepal earthquake in 2015, the Assam flood in 2017 and the Chennai flood in 2021 have devastated the lives of millions of people. Damage to infrastructure delays the delivery of basic amenities to the affected population. Considering their miserable situation and need for humanitarian goods, the population needs to be quickly transported to relief centres. The affected population comprises both infectious and non-infectious people. Transporting the infectious population poses a greater challenge because of deteriorating health conditions and the possibility of spreading the disease to the non-infectious population. Kouadio et al. [1] describe the need for prevention and the measures taken to tackle infectious populations during disasters. Loebach et al. [2] discuss the displacement of infectious populations as a consequence of natural disasters. To counter the challenges of the relocation process in the post-disaster phase, a multi-objective solid transportation (MOST) model is developed for transporting the infectious population from affected areas to relief centres. The model is based on the core notion of the solid transportation problem (STP) [3], an extension of the transportation problem [4] developed by F. L. Hitchcock in 1941. The STP has been studied in various environments with single and multiple objectives; Das et al. [5], Zhang et al. [6] and Kundu et al. [7] consider multiple objectives for their STPs.
M. S. Bhakuni · P. Bhakuni · A. Das (B) Department of Mathematics, School of Advanced Sciences, Vellore Institute of Technology, Vellore, India e-mail: [email protected]; [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 R. P. Yadav et al. (eds.), Proceedings of the International Conference on Paradigms of Computing, Communication and Data Sciences, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-19-8742-7_7
The post-disaster phase presents various challenges for the relocation process, including a limited budget [8], unavailability of conveyance, limited bedding in relief centres, etc. Cao et al. [9] proposed an optimization model for relief distribution in the post-disaster phase, and Zhang et al. [10] developed an optimization model for a humanitarian relief network. The constraints of the developed model are designed to address these challenges during the relocation process. The model consists of two crucial objective functions: total travel time, i.e., the transportation time of the affected population from sources to relief centres using different types of conveyance, and service time, i.e., the travel time of the affected population, the loading and unloading time, and the time taken to provide accommodation in relief centres. The objective functions are designed so that minimum resources are used and the infectious population is allocated in time. The consequences of a disaster are unpredictable: the exact transportation cost, time taken to reach relief centres, loading and unloading time, accommodation time, etc., are difficult to predict and vary with the condition of the source, destination and conveyance, and the intensity of damage caused to roads. To counter the impreciseness of real-life scenarios, we have considered the inputs as pentagonal fuzzy numbers (PFN). Fuzzy set theory was introduced by Zadeh [11]; later, Atanassov [12] proposed intuitionistic fuzzy sets. The reason for choosing PFNs as inputs is that a PFN uses five components to represent a fuzzy number, allowing it to capture vagueness to a great extent. The developed model is implemented in a case study of the Chennai flood, where the COVID-affected population of Chennai is transported from different sources to relief centres. The model involves multiple conflicting objectives, so compromise solution techniques are used to obtain the result: the global criterion method (GCM) and fuzzy goal programming (FGP). The model is solved in the LINGO optimization software, and a thorough analysis of the obtained result is carried out. To the best of our knowledge, our suggested model combines the following novel contributions:
– A mathematical model for the transportation of infectious populations.
– Consideration of accommodation time and loading and unloading time.
– Inputs of the MOST model in the form of PFNs.
2 Basic Concepts and Defuzzification Method for PFN
In this section, we define the PFN and the various criteria satisfied by its membership function, and we discuss the fuzzy equivalence relation. The advantage of utilizing a PFN over a standard fuzzy set is that it expresses a fuzzy number with five components, which helps capture uncertainty and ambiguity more effectively.
2.1 Basic Concepts
In this section we discuss the basic concepts of the PFN and some properties of its membership function.

Definition 1. Linear pentagonal fuzzy number [13]: A PFN is denoted as F̃ = (f1, f2, f3, f4, f5; p) and its membership function is represented as follows:

$$\zeta_{\tilde{F}}(x)=\begin{cases} p\,\dfrac{x-f_1}{f_2-f_1} & \text{if } f_1\le x\le f_2\\[4pt] 1-(1-p)\,\dfrac{f_3-x}{f_3-f_2} & \text{if } f_2\le x\le f_3\\[4pt] 1 & \text{if } x=f_3\\[4pt] 1-(1-p)\,\dfrac{x-f_3}{f_4-f_3} & \text{if } f_3\le x\le f_4\\[4pt] p\,\dfrac{f_5-x}{f_5-f_4} & \text{if } f_4\le x\le f_5\\[4pt] 0 & \text{otherwise} \end{cases}$$
Definition 2. Properties of PFN [13]: A PFN F̃ = (f1, f2, f3, f4, f5; p) and its membership function ζF̃(x) must satisfy the following conditions: (i) ζF̃(x) is a continuous function with range [0, 1]; (ii) ζF̃(x) is strictly increasing on [f1, f2] and [f2, f3]; (iii) ζF̃(x) is strictly decreasing on [f3, f4] and [f4, f5].
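For concreteness, a minimal Python sketch of the piecewise membership function in Definition 1 follows (the function name and the evaluation point are ours; the example PFN (78, 95, 125, 142, 154; 0.5) is the facility-cost input that appears later in Table 1):

```python
def pfn_membership(x, f1, f2, f3, f4, f5, p):
    """Membership grade of x in the linear PFN (f1, f2, f3, f4, f5; p)."""
    if f1 <= x <= f2:
        return p * (x - f1) / (f2 - f1)
    if f2 < x <= f3:
        return 1 - (1 - p) * (f3 - x) / (f3 - f2)  # rises from p at f2 to 1 at f3
    if f3 < x <= f4:
        return 1 - (1 - p) * (x - f3) / (f4 - f3)  # falls from 1 at f3 to p at f4
    if f4 < x <= f5:
        return p * (f5 - x) / (f5 - f4)
    return 0.0

print(pfn_membership(100, 78, 95, 125, 142, 154, 0.5))  # approx. 0.583
```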
2.2 Removal Area Method to Convert PFN to Crisp Number
Let us consider a PFN F̃ = (f1, f2, f3, f4, f5; p). The defuzzification methodology for converting a PFN to an equivalent crisp number using the removal area method was proposed by Chakraborty et al. [14] in 2019, where the authors considered five areas obtained from the different regions of the PFN. The average of these areas is calculated by summing them and dividing by 5. The resulting formula, used for the defuzzification of the PFN, is:

$$D(\tilde F)=\frac{\dfrac{(f_1+f_2)}{2}\,p+\dfrac{(f_2+f_3)}{2}\,(1-p)+f_3+f_4-\dfrac{(f_4-f_3)}{2}\,(1-p)+\dfrac{(f_5+f_4)}{2}\,p}{5} \qquad (1)$$

Equation (1) is simplified further using algebraic operations, yielding:

$$D(\tilde F)=\frac{f_2+4f_3+f_4+p\,(f_1-2f_3+2f_4+f_5)}{10} \qquad (2)$$
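Equation (2) is a one-line computation; as a sketch (function name ours), applied to the same facility-cost PFN (78, 95, 125, 142, 154; 0.5) that appears in Table 1:

```python
def defuzzify_pfn(f1, f2, f3, f4, f5, p):
    """Removal-area defuzzification of a linear PFN, Eq. (2)."""
    return (f2 + 4 * f3 + f4 + p * (f1 - 2 * f3 + 2 * f4 + f5)) / 10

print(defuzzify_pfn(78, 95, 125, 142, 154, 0.5))  # 87.0
```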
3 Problem Statement and Model Formation
This section presents the mathematical model for the transportation of infectious populations during the post-disaster phase. Transportation must be done as early as possible to avoid further deterioration of the health of the infectious population. At the same time, transportation must be cost-efficient, such that every individual is transported using minimum resources. Therefore, a mathematical model considering cost and time as objective functions is needed. Since the availability of conveyance plays a crucial role during transportation, we consider various types of conveyance, each restricted by its service time.
3.1 Modelling
In this section, we introduce the MOST model along with its assumptions, followed by a brief interpretation of the model.
Assumptions for the model:
(i) A limited budget is allotted for the relief operation.
(ii) Each relief centre has a limited capacity for the infectious population.
(iii) The working time of each conveyance is limited.
(iv) Each conveyance can carry only a restricted number of persons.
Indices:
(i) I: set of sources, indexed by i
(ii) J: set of relief centres, indexed by j
(iii) K: set of conveyances, indexed by k.
Parameters:
C̃_ijk  fuzzy transportation cost per person from source i to relief centre j using the kth conveyance
F̃_j    fuzzy facility cost of a person in relief centre j
T̃T_ijk fuzzy transportation time per person from source i to relief centre j using the kth conveyance
T̃LU_ijk fuzzy loading and unloading time of conveyance k while travelling from source i to relief centre j
ÃCT_j  fuzzy time taken to accommodate at relief centre j
T̃P    fuzzy total population that needs to be transported
P̃_i    fuzzy population present at source i that needs to be transported to relief centres
C̃C_k  fuzzy capacity of the kth conveyance
C̃R_j  fuzzy capacity of the jth relief centre
B̃     fuzzy budget allocated for relief work
T̃_k    fuzzy time limit for the kth conveyance
Decision variables:
x_ijk  number of persons to be shifted from source i to relief centre j using the kth conveyance
y_ijk = 1 if x_ijk > 0, and 0 otherwise
Mathematical Model

$$\min\ TC=\sum_{k=1}^{K}\sum_{i=1}^{I}\sum_{j=1}^{J}(\tilde C_{ijk}+\tilde F_j)\,x_{ijk} \qquad (3)$$

$$\min\ ST=\sum_{i=1}^{I}\sum_{j=1}^{J}\sum_{k=1}^{K}(\widetilde{TT}_{ijk}+\widetilde{TLU}_{ijk}+\widetilde{ACT}_j)\,y_{ijk} \qquad (4)$$

subject to

$$\sum_{i=1}^{I}\sum_{j=1}^{J}\sum_{k=1}^{K}x_{ijk}=\widetilde{TP} \qquad (5)$$

$$\sum_{j=1}^{J}\sum_{k=1}^{K}x_{ijk}=\tilde P_i,\quad i=1,2,\ldots,I \qquad (6)$$

$$\sum_{i=1}^{I}\sum_{j=1}^{J}x_{ijk}\le \widetilde{CC}_k,\quad k=1,2,\ldots,K \qquad (7)$$

$$\sum_{i=1}^{I}\sum_{k=1}^{K}x_{ijk}\le \widetilde{CR}_j,\quad j=1,2,\ldots,J \qquad (8)$$

$$\sum_{k=1}^{K}\sum_{i=1}^{I}\sum_{j=1}^{J}(\tilde C_{ijk}+\tilde F_j)\,x_{ijk}\le \tilde B \qquad (9)$$

$$\sum_{i=1}^{I}\sum_{j=1}^{J}(\widetilde{TT}_{ijk}+\widetilde{TLU}_{ijk})\,y_{ijk}\le \tilde T_k,\quad k=1,2,\ldots,K \qquad (10)$$

$$x_{ijk}\ge 0,\quad i=1,\ldots,I,\; j=1,\ldots,J,\; k=1,\ldots,K \qquad (11)$$
Model Interpretation
The objective function (3) minimizes the total cost of relocating a disaster-affected population, and it has two key terms. The first term delineates the transportation cost from source i to relief centre j using the kth conveyance, and
the second term represents the facility cost of relief centre j. The objective function (4) minimizes the total service time taken to settle the affected population, and it contains three essential terms: the transportation time between source i and relief centre j using the kth conveyance, the loading and unloading time of the overall population, and the accommodation time at each relief centre j. Constraint (5) indicates the overall population that needs to be transported to the relief centres. Constraint (6) depicts the number of people present at source i. Constraint (7) delineates the capacity of conveyance k. Constraint (8) represents the capacity of relief centre j. Constraint (9) depicts the available budget. Constraint (10) imposes the limited working time of each conveyance k. Constraint (11) demonstrates the non-negativity restrictions.
4 Solving Multi-objective Optimization Problems
The model proposed in Sect. 3 is complex due to the presence of different fuzzy parameters and multiple objectives. To solve the model, we first apply a defuzzification technique and then utilize a compromise programming approach to obtain an optimal solution.
4.1 Defuzzification of MOST Model
The MOST model described in Sect. 3 exists in a fuzzy environment and needs to be converted to a crisp environment before it can be solved. Using the defuzzification technique discussed in Sect. 2, we convert the fuzzy parameters C̃_ijk, F̃_j, T̃T_ijk, T̃LU_ijk, ÃCT_j, T̃P, P̃_i, C̃C_k, C̃R_j, B̃, T̃_k to their crisp counterparts C_ijk, F_j, TT_ijk, TLU_ijk, ACT_j, TP, P_i, CC_k, CR_j, B, T_k, respectively. The obtained crisp model is as follows:

$$\min\ TC=\sum_{k=1}^{K}\sum_{i=1}^{I}\sum_{j=1}^{J}(C_{ijk}+F_j)\,x_{ijk} \qquad (12)$$

$$\min\ ST=\sum_{i=1}^{I}\sum_{j=1}^{J}\sum_{k=1}^{K}(TT_{ijk}+TLU_{ijk}+ACT_j)\,y_{ijk} \qquad (13)$$

subject to

$$\sum_{i=1}^{I}\sum_{j=1}^{J}\sum_{k=1}^{K}x_{ijk}=TP \qquad (14)$$

$$\sum_{j=1}^{J}\sum_{k=1}^{K}x_{ijk}=P_i,\quad i=1,2,\ldots,I \qquad (15)$$

$$\sum_{i=1}^{I}\sum_{j=1}^{J}x_{ijk}\le CC_k,\quad k=1,2,\ldots,K \qquad (16)$$

$$\sum_{i=1}^{I}\sum_{k=1}^{K}x_{ijk}\le CR_j,\quad j=1,2,\ldots,J \qquad (17)$$

$$\sum_{k=1}^{K}\sum_{i=1}^{I}\sum_{j=1}^{J}(C_{ijk}+F_j)\,x_{ijk}\le B \qquad (18)$$

$$\sum_{i=1}^{I}\sum_{j=1}^{J}(TT_{ijk}+TLU_{ijk})\,y_{ijk}\le T_k,\quad k=1,2,\ldots,K \qquad (19)$$

$$\text{constraint } (11) \qquad (20)$$
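The chapter solves this model in LINGO (Sect. 4.3). Purely as an illustration of the crisp formulation, the sketch below sets up the cost objective (12) with constraints (14)–(18) in Python/PuLP on a toy instance; all data values are hypothetical, and the y-variables with the time objective (13) and constraint (19) are omitted for brevity:

```python
import pulp

# Toy crisp data (hypothetical, after defuzzification);
# i: sources, j: relief centres, k: conveyances
I, J, K = range(2), range(2), range(2)
C = {(i, j, k): 100 + 10 * i + 5 * j + k for i in I for j in J for k in K}
F = {0: 110, 1: 100}            # facility cost per person at centre j
P = {0: 60, 1: 40}              # population at source i
CC = {0: 70, 1: 80}             # conveyance capacities
CR = {0: 60, 1: 60}             # relief-centre capacities
TP, B = 100, 25000              # total population and budget

m = pulp.LpProblem("MOST_cost_objective", pulp.LpMinimize)
x = pulp.LpVariable.dicts("x", (I, J, K), lowBound=0, cat="Integer")

# Objective (12): transportation plus facility cost
m += pulp.lpSum((C[i, j, k] + F[j]) * x[i][j][k]
                for i in I for j in J for k in K)
# (14) everyone is transported; (15) source populations are cleared
m += pulp.lpSum(x[i][j][k] for i in I for j in J for k in K) == TP
for i in I:
    m += pulp.lpSum(x[i][j][k] for j in J for k in K) == P[i]
# (16) conveyance capacity; (17) relief-centre capacity
for k in K:
    m += pulp.lpSum(x[i][j][k] for i in I for j in J) <= CC[k]
for j in J:
    m += pulp.lpSum(x[i][j][k] for i in I for k in K) <= CR[j]
# (18) budget limit
m += pulp.lpSum((C[i, j, k] + F[j]) * x[i][j][k]
                for i in I for j in J for k in K) <= B

m.solve(pulp.PULP_CBC_CMD(msg=False))
print(pulp.LpStatus[m.status], pulp.value(m.objective))
```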
4.2 Compromise Programming Approach to Solve the MOST Model
The above model consists of two conflicting objective functions: the first minimizes the total cost and the second minimizes the overall service time. The conflict between the objective functions means there is no single dominant solution to the overall problem; instead, techniques for solving the MOST model yield a compromise solution. We use two solution methods, FGP and GCM, to solve the MOST model.
Global Criterion Method
The GCM is used to obtain a compromise solution for a MOST model. The absence of a Pareto ranking mechanism [15] gives this technique a significant advantage over other multi-objective optimization methods in terms of simplicity and efficiency. It minimizes a metric function representing the sum of the (normalized) differences between each objective function and its respective ideal solution. The procedure for solving the MOST model with GCM is as follows:
Step 1: Each objective function (ϕ1, ϕ2, ..., ϕM) of the MOST model is solved independently.
Step 2: The values obtained in Step 1 form the ideal objective vector (ϕ1^min, ϕ2^min, ..., ϕM^min).
Step 3: Using GCM, the MOST model is reduced to the single objective:
$$\min\ \left[\sum_{m=1}^{M}\left(\frac{\varphi_m-\varphi_m^{\min}}{\varphi_m^{\min}}\right)^{\eta}\right]^{1/\eta}$$
subject to constraints (14)–(20), with 1 ≤ η ≤ ∞.
The value of the integer-valued exponent η = 1 means that equal significance is given to each objective function [16], while η > 1 indicates that higher importance is given to the objective with the maximum deviation. For η = 1 we get a linear objective function, while for η > 1 we have a non-linear one [17].
Fuzzy Goal Programming
In 1961, Charnes and Cooper [18] developed goal programming. The basic idea underlying goal programming is to minimize the distance between the objective functions ϕ1, ϕ2, ..., ϕM and their aspiration levels ϕ̄1, ϕ̄2, ..., ϕ̄M, respectively. Further, Mohamed [19] proposed minimizing this distance by defining positive (δm+) and negative (δm−) deviation variables as shown below:

$$\delta_m^{+}=\max(0,\varphi_m-\bar\varphi_m)=\tfrac{1}{2}\{(\varphi_m-\bar\varphi_m)+|\varphi_m-\bar\varphi_m|\},\quad m=1,2,\ldots,M,$$

$$\delta_m^{-}=\max(0,\bar\varphi_m-\varphi_m)=\tfrac{1}{2}\{(\bar\varphi_m-\varphi_m)+|\bar\varphi_m-\varphi_m|\},\quad m=1,2,\ldots,M.$$
In 1972 Zimmermann [20] proposed fuzzy linear programming, where the objective function and the respective constraints are defined with fuzzy parameters. In 1997, Mohamed [19] drew attention to the resemblance between goal programming and fuzzy linear programming, as well as how one may lead to the other. Zangiabadi and Maleki [21] introduced an FGP technique that uses a special nonlinear (hyperbolic) membership function for each fuzzy objective and constraint. The steps to solve the MOST model described in Sect. 3 using FGP are as follows:
Step 1: Solve each MOST model objective function independently, i.e. take just one objective function at a time and ignore the rest. Assume that q1, q2, ..., qL are the values of the unknown variables acquired after solving each objective function.
Step 2: Using the unknown variables acquired in Step 1, obtain ϕ1(q1), ϕ1(q2), ..., ϕ1(qL), ϕ2(q1), ϕ2(q2), ..., ϕ2(qL), ..., ϕM(q1), ϕM(q2), ..., ϕM(qL).
Step 3: Calculate the best (bm) and the worst (wm) value of each objective function:

$$b_m=\min_{l=1,\ldots,L}\varphi_m(q_l)\quad\text{and}\quad w_m=\max_{l=1,\ldots,L}\varphi_m(q_l).$$
Step 4: The model described in Sect. 3 is represented as follows:
$$\begin{aligned}
&\min\ \xi\\
&\text{subject to}\\
&\frac{1}{2}\cdot\frac{e^{\left(\frac{b_m+w_m}{2}-\varphi_m\right)\nu_p}-e^{-\left(\frac{b_m+w_m}{2}-\varphi_m\right)\nu_p}}{e^{\left(\frac{b_m+w_m}{2}-\varphi_m\right)\nu_p}+e^{-\left(\frac{b_m+w_m}{2}-\varphi_m\right)\nu_p}}+\frac{1}{2}-\delta_m^{+}+\delta_m^{-}=1,\quad p=1,2,\ldots,P,\\
&\xi\ge\delta_m^{-},\qquad \delta_m^{+}\,\delta_m^{-}=0,\\
&\text{constraints (14)–(20)},\\
&0\le\xi\le1,\qquad \nu_p=\frac{6}{w_m-b_m}
\end{aligned}$$
4.3 LINGO Optimization Software
The MOST model proposed in Sect. 3 is solved using LINGO software, which comes with a collection of built-in solvers for various problem classes. The modeling environment is tightly coupled with the LINGO solver; because of this interconnectivity, problems are transmitted directly in memory, minimizing compatibility issues between the solver and the modeling components. It also uses multiple CPU cores for model simulation, giving faster results.
5 Numerical Experiments and Discussions
Floods cause loss of human life, damage to infrastructure, non-operation of infrastructural facilities, and worsening health conditions due to waterborne infections. For those suffering from infectious diseases the situation can wreak much greater harm, so timely relocation to relief centres is required. To counter this problem, we proposed the MOST model in Sect. 3; its practical implementation is demonstrated in the following subsections.
5.1 Input Data for the Real-Life Model
We have considered a case study based on Chennai, the capital of Tamil Nadu, India. In 2021, the city received more than 1000 mm of rainfall in four weeks, which resulted in floods in various regions. The peak of COVID made the situation worse and further delayed various rescue operations. Considering this problem, the designed model targets the relocation of the infectious population during the post-disaster phase. We have considered three source points located at Velachery, West Mambalam and Pullianthope. From these source points, the affected population is transported to two relief centres situated at Thiruvallur and Kanchipuram. Depending upon the transportation time, capacity and ease of travelling, we have considered three types of conveyance. The PFN inputs for the facility cost (F̃j), time taken to accommodate at the jth relief centre (ÃCTj), capacity of the relief centres (C̃Rj), population present at each source (P̃i), capacity of the conveyances (C̃Ck), time limit of the conveyances (T̃k), budget for the relief operation (B̃) and total population that needs to be transported (T̃P) are shown in Table 1, while Table 2 gives the transportation cost (C̃ijk), transportation time (T̃Tijk), and loading and unloading time (T̃LUijk).
Table 1 PFN inputs for MOST model

j | F̃j | C̃Rj | ÃCTj
1 | (78,95,125,142,154;0.5) | (8,13,19,25,40;0.6) | (435,491,556,641,715;0.7)
2 | (75,84,105,119,127;0.9) | (11,14,17,31,51;0.3) | (564,625,687,827,856;0.8)

i | P̃i | k | C̃Ck | T̃k
1 | (163,189,225,275,307;0.8) | 1 | (197,275,343,383,423;0.4) | (1359,1486,1635,1856,1959;0.8)
2 | (254,295,345,384,453;0.6) | 2 | (345,416,445,494,557;0.9) | (1238,1416,1698,1783,1902;0.9)
3 | (345,395,465,515,555;0.7) | 3 | (368,434,475,548,671;0.8) | (1292,1391,1535,1686,1761;0.6)

B̃ = (151901,184389,197653,213678,230675;0.8)
T̃P = (768,925,1078,1194,1315;0.6)
5.2 Result Analysis
The model is solved in the LINGO optimization solver using the FGP and GCM approaches. Using GCM with η = 1, a total of six allocations of the unknown variables are made, i.e., x113 = 182, x213 = 10, x211 = 231, x212 = 5, x222 = 7, x322 = 347, with total cost TC = 144338 and service time ST = 994. For η = 2 there are five allocations, i.e., x321 = 171, x122 = 182, x222 = 1, x312 = 176, x213 = 252, with total cost TC = 158835 and service time ST = 802. Solving with FGP, five allocations are made, i.e., x211 = 40, x222 = 12, x322 = 347, x113 = 182, x213 = 201, with TC = 148514 and service time ST = 817. Further, the numbers of people transported from sources to relief centres using GCM and FGP are shown in Figs. 1 and 2, respectively. After analyzing the results obtained using FGP and GCM, we infer the following points:
Table 2 PFN inputs for transportation cost and time

C̃ijk (j = 1):
i | k = 1 | k = 2 | k = 3
1 | (136,148,159,184,198;0.5) | (130,132,155,176,193;0.8) | (125,145,180,202,236;0.6)
2 | (58,86,125,138,156;0.9) | (95,136,156,185,197;0.3) | (96,124,153,176,198;0.7)
3 | (86,105,135,177,210;0.6) | (78,112,146,188,218;0.7) | (91,132,168,204,217;0.9)

C̃ijk (j = 2):
1 | (136,164,181,243,310;0.7) | (168,186,235,286,325;0.4) | (185,248,259,264,269;0.5)
2 | (132,175,205,225,253;0.4) | (86,115,193,221,248;0.8) | (145,178,215,245,289;0.5)
3 | (56,76,104,142,158;0.4) | (74,95,102,127,151;0.8) | (76,112,154,183,196;0.3)

T̃Tijk (j = 1):
1 | (105,134,156,192,198;0.4) | (130,132,155,176,193;0.8) | (125,145,180,202,236;0.6)
2 | (123,156,178,204,235;0.8) | (114,178,198,245,317;0.2) | (95,132,163,176,229;0.4)
3 | (158,186,221,284,331;0.4) | (128,156,191,237,270;0.7) | (125,149,167,205,219;0.9)

T̃Tijk (j = 2):
1 | (105,124,168,189,203;0.9) | (94,136,157,210,220;0.3) | (86,114,138,169,182;0.5)
2 | (165,178,214,267,299;0.7) | (143,168,195,246,265;0.6) | (125,158,198,231,247;0.5)
3 | (143,176,189,234,247;0.8) | (114,167,198,245,284;0.5) | (131,156,174,204,239;0.8)

T̃LUijk (j = 1):
1 | (18,27,34,43,52;0.5) | (14,21,24,34,36;0.7) | (8,14,21,25,38;0.5)
2 | (23,31,45,61,75;0.6) | (18,28,41,53,68;0.5) | (21,24,35,50,69;0.3)
3 | (18,26,35,44,64;0.6) | (18,21,24,39,57;0.8) | (16,21,28,37,41;0.4)

T̃LUijk (j = 2):
1 | (12,24,31,38,42;0.2) | (9,16,19,22,25;0.9) | (7,9,15,17,24;0.4)
2 | (15,28,34,48,77;0.9) | (18,28,31,46,57;0.4) | (9,17,28,35,57;0.2)
3 | (14,28,32,36,54;0.5) | (10,18,25,33,44;0.7) | (8,17,20,29,34;0.9)
Fig. 1 Solution of MOST model using GCM for η = 1 and η = 2
– The minimum value of the objective function TC is obtained using GCM with η = 1, and its maximum value with η = 2.
– For the objective function ST, the minimum value is obtained using GCM with η = 2, and the maximum value with η = 1.
– FGP provides an intermediate result for both objective functions TC and ST compared with the solutions obtained using η = 1 and η = 2 of GCM.
Decision-makers can choose any solution technique depending upon their priority. If they want minimum expenditure in the relief procedure (and have ample time), they can opt for GCM with η = 1. If they need to complete the relief procedure as fast as possible (without being restricted by budget), GCM with η = 2 can be used. If they want intermediate values for the cost and time objectives, they can opt for the FGP technique.
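To make the GCM scalarization of Sect. 4.2 tangible, the sketch below (function name ours) evaluates the global criterion for the FGP compromise solution reported above, using TC_min = 144338 (from η = 1) and ST_min = 802 (from η = 2) as the ideal values:

```python
def gcm_metric(phi, phi_min, eta=1):
    """Global criterion value for achieved objectives phi given ideal values phi_min."""
    s = sum(((p - pm) / pm) ** eta for p, pm in zip(phi, phi_min))
    return s ** (1 / eta)

# FGP compromise solution (TC = 148514, ST = 817) against the ideal values
print(gcm_metric([148514, 817], [144338, 802], eta=1))  # approx. 0.048
print(gcm_metric([148514, 817], [144338, 802], eta=2))  # approx. 0.034
```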
Fig. 2 Solution of MOST model obtained using FGP
6 Conclusion and Future Prospects
A MOST model comprising two objective functions is developed for the transportation of infectious populations during the post-disaster phase. The model is developed considering time, budget, and capacity constraints. Since the aftermath of a disaster is unpredictable, we have considered PFNs as the input parameters in order to grasp the uncertainty and vagueness. The model is successfully implemented in the case study conducted on the Chennai flood. The inputs, given in the PF environment, are converted to equivalent crisp values using the removal-area defuzzification method. The resulting crisp model is solved in the LINGO optimization solver using the compromise solution techniques GCM and FGP. Further, a detailed analysis of the optimal solution is carried out, and suggestions are made for the various choices of the decision-maker. In future, the model can be extended to other uncertain environments like type-2 fuzzy, stochastic, etc., and can be implemented in various case studies considering additional objectives and constraints.
References
1. Kouadio IK, Aljunid S, Kamigaki T, Hammad K, Oshitani H (2012) Infectious diseases following natural disasters: prevention and control measures. Expert Rev Anti-infect Ther 10(1):95–104
2. Loebach P, Korinek K (2019) Disaster vulnerability, displacement, and infectious disease: Nicaragua and Hurricane Mitch. Popul Environ 40(4):434–455
3. Haley KB (1962) New methods in mathematical programming: the solid transportation problem. Oper Res 10(4):448–463
4. Hitchcock FL (1941) The distribution of a product from several sources to numerous localities. J Math Phys 20(1–4):224–230
5. Das A, Bera UK, Maiti M (2018) Defuzzification and application of trapezoidal type-2 fuzzy variables to green solid transportation problem. Soft Comput 22(7):2275–2297
6. Zhang B, Peng J, Li S, Chen L (2016) Fixed charge solid transportation problem in uncertain environment and its algorithm. Comput Ind Eng 102:186–197
7. Kundu P, Kar S, Maiti M (2014) Multi-objective solid transportation problems with budget constraint in uncertain environment. Int J Sys Sci 45(8):1668–1682
8. Vahdani B, Veysmoradi D, Noori F, Mansour F (2018) Two-stage multi-objective location-routing-inventory model for humanitarian logistics network design under uncertainty. Int J Disaster Risk Reduction 27:290–306
9. Cao C, Liu Y, Tang O, Gao X (2021) A fuzzy bi-level optimization model for multi-period post-disaster relief distribution in sustainable humanitarian supply chains. Int J Prod Econ 235:108081
10. Zhang P, Liu Y, Yang G, Zhang G (2020) A distributionally robust optimization model for designing humanitarian relief network with resource reallocation. Soft Comput 24(4):2749–2767
11. Zadeh LA, Klir GJ, Yuan B (1996) Fuzzy sets, fuzzy logic, and fuzzy systems: selected papers, vol 6. World Scientific
12. Atanassov KT (1999) Intuitionistic fuzzy sets. Physica, Heidelberg, pp 1–137
13. Mondal SP, Mandal M (2017) Pentagonal fuzzy number, its properties and application in fuzzy equation. Future Comput Inform J 2(2):110–117
14. Chakraborty A, Mondal SP, Alam S, Ahmadian A, Senu N, De D, Salahshour S (2019) The pentagonal fuzzy number: its different representations, properties, ranking, defuzzification and application in game problems. Symmetry 11(2):248
15. Chiandussi G, Marco C, Ferrero S, Varesio FE (2012) Comparison of multi-objective optimization methodologies for engineering applications. Comput Math Appl 63(5):912–942
16. Salukvadze M (1974) On the existence of solutions in problems of optimization under vector-valued criteria. J Optim Theory Appl 13(2):203–217
17. Tabucanon MT (1988) Multiple criteria decision making in industry, vol 8. Elsevier Science Limited
18. Kade G, Charnes A, Cooper WW (1964) Management models and industrial applications of linear programming. Vol. I und II. New York-London, (1961) Book Review. J Econ/Zeitschrift für Nazionalökonomie 23:432
19. Mohamed RH (1997) The relationship between goal programming and fuzzy programming. Fuzzy Sets Syst 89(2):215–222
20. Zimmermann H-J (1975) Description and optimization of fuzzy systems. Int J Gen Syst 2(1):209–215
21. Zangiabadi M, Maleki HR (2007) Fuzzy goal programming for multiobjective transportation problems. J Appl Math Comput 24(1):449–460
Chapter 8
The Hidden Enemy: A Botnet Taxonomy Sneha Padhiar , Aayushyamaan Shah, and Ritesh Patel
1 Introduction
Modern society is heavily dependent on the use of the Internet. From simple tasks like opening an application or creating a presentation to complex tasks like playing music from an online application or playing MMO games, everyone uses the Internet. This use sometimes requires the download of proprietary software, which might not come cheap in some cases. Due to this, piracy has hit an all-time high, which in turn has made it easy for hackers to spread malware [1–3]. This malware is often made to utilize the resources of the infected machine as a ready-to-use compute node in a large, distributed yet interconnected network known as a botnet. In other words, a botnet is a network of interconnected, malware-infected, zombie-like computers that are managed by a central authoritative server known as the command and control server (the one controlling the C&C server is known as the botmaster) [4]. The botmaster is like the brain of the network: an infected machine will do whatever the botmaster asks it to do. One of the most well-known uses for such a botnet is to carry out a DDoS attack. A DDoS attack, in layman's terms, is where multiple computers start pinging or utilizing a server's resources in order to stop the actual clients of that server from accessing the services provided [5, 6]. A DDoS attack in itself is hard to prevent and identify if executed correctly, and a botnet adds to the difficulty. With the help of botnets, it becomes impossible
S. Padhiar (B) · A. Shah · R. Patel
U & P.U Patel Department of Computer Engineering, CSPIT, Charusat University of Science and Technology, Anand, GJ 388421, India
e-mail: [email protected]
A. Shah e-mail: [email protected]
R. Patel e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 R. P. Yadav et al. (eds.), Proceedings of the International Conference on Paradigms of Computing, Communication and Data Sciences, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-19-8742-7_8
for simple firewalls and security systems to distinguish a DDoS attack from heavy traffic [6]. The rest of the paper is organized as follows. Section 2 presents a detailed life cycle of a botnet. In Sect. 3, we analyze the botnet phenomenon, its classification based on architecture, and the most relevant detection techniques. Finally, Sect. 4 presents conclusions.
2 Botnet Life Cycle
Understanding the concept of a botnet is not a difficult task; designing one is a different matter, as many intricacies are involved. The designer has to keep in mind the following points before actually deploying it [7–9]:
• It should be very efficient in utilizing the resources of the infected computers.
• The code should be camouflaged to prevent it from being detected by anti-malware software.
• The infected machine should follow the command and control server's commands.
Keeping that in mind, a simple botnet life cycle consisting of five phases was created. The phases of the botnet life cycle (Fig. 1) are as follows:
A. Initial injection
In this step, the attacker identifies potential hosts by different methods. Once the attacker finds a suitable target, he/she uses a series of attacks to infect that host, including phishing, sending spam emails, creating backdoors, etc. When any one of these attacks is successful and the host is infected, the main goal of the attacker is to add the infected machine to the bot network [7–9]. This is done by periodically refreshing and/or updating this entry. At the end of this stage, we can call the infected computer a newborn or dormant bot.
Fig. 1 Botnet life cycle
B. Secondary injection
In this stage, the infected machine downloads the actual binary scripts of the malware. The download is done with the help of the HTTP, IRC, and peer-to-peer (P2P) protocols. Once the download is complete, the bot compiles the scripts if needed and runs them in the background. By doing this, the bot becomes ready to take useful commands from the command and control (C&C) servers. These ready bots are called active bots [7–9].
C. Connection
After infecting the computers, the botnet is still useless until the C&C server actually communicates with the bots. Hence, after the two infection phases, the C&C server makes contact with the bots to give command and control information on what is to be done. On receiving the command and control information from the C&C server, the bot replies to the server with an acknowledgment message proving that it is still active. The person controlling the C&C servers to send the command and control messages is known as the botmaster. To ensure the proper working of the bots, the botmaster might restart the bots to check their status (newer methods include simple pinging to know the status of a bot) [7–9].
D. Command and control server
The heart of a botnet is its command and control (C&C) servers. These servers are what the botmaster uses to communicate with and control the army of bots that was developed in the first two phases. C&C servers are created in different manners based on the use and requirements of the attacker. For example, a botnet whose bots are used to carry out an elaborate DDoS attack on a high-profile company might consist of multiple C&C servers in a tree-like (hierarchical) fashion, where one controls the other and the servers near the leaf nodes control the bots [7–9], whereas a botnet designed simply to gather information from the infected machines might consist of a single C&C server (centralized).
E. Upgrade and maintenance
The most crucial step in software development is maintenance and upgradation. Simply put, maintenance of a botnet is required to adapt it to new technologies, making it more efficient and easier to handle, and to prevent the C&C servers from being detected by anti-malware software and network analysts. To this end, the C&C servers are shifted to new locations, preventing them from being tracked by network traffic analysts [7–9].
3 Classification
Since there are no standards for creating malware, classifying malware is important for identifying similarities among samples. These classifications help researchers group malware into categories during research and practical implementations [10–12]. In this paper, we classify different botnets based on their architectures.
3.1 Botnet Architecture
Botnet architecture refers to the way a botnet is structured. As seen in Fig. 2, a botnet can fall into three different types of models based on its architecture [13].
3.1.1 Centralized Architecture
The simplest architecture in this classification is the centralized architecture. In this architecture (Fig. 3), all the bots are managed by one single C&C server or a single chain of C&C servers that acts as a single entity. The botmaster uses this single C&C server for all malicious activities. To connect to this type of server, bots use the HTTP and IRC protocols, which makes the implementation of the botnet comparatively easy [13, 14].
3.1.2 Decentralized Architecture
The next tier in the architecture of botnets is the decentralized architecture (Fig. 4). Here the botmaster uses a C&C server to control one or some bots, which in turn act as
Fig. 2 Types of botnet based on its architecture
Fig. 3 Centralized botnet architecture
Fig. 4 Decentralized botnet architecture
C&C servers for other bots. In other words, all the bots are capable of acting as bots as well as C&C servers for other bots [13, 14]. This type of network is called a P2P or peer-to-peer network and uses the aptly named P2P protocol for communication.
3.1.3 Hybrid Architecture
The final type of architecture is the hybrid architecture, a combination of the centralized and decentralized architectures. In this sense, it is a modified peer-to-peer architecture in which some aspects of the centralized architecture are implemented. The general idea is to create a mesh-like topology of bot networks where the nodes in the tree are sub-networks of bots (mini botnets). This can also be viewed as a centralized C&C server that controls multiple different C&C servers, which in turn are responsible for controlling the local botnets [13, 14].
Fig. 5 Botnet detection techniques
3.2 Botnet Detection Techniques See Fig. 5.
3.2.1 Honeynets and Honeypots-Based Detection Technique
The honeynet detection technique works with a honeypot and a honeywall, and is used to detect and observe the overall behavior of a botnet. A honeypot is a deliberately vulnerable machine that can be easily compromised [15, 16]; it is made vulnerable because it is intended to become part of a botnet and to attract botmasters to infect it. There are different ways to set up honeypots.
Anomaly-Based Detection Technique
Anomaly-based techniques work by monitoring network traffic. They indicate spamming bot activity by detecting unexpected network interactions, network traffic on unused and unusual ports, and unusual system behavior [17]. In order to spot botnets, it is necessary to distinguish malicious traffic from normal traffic. A number of existing techniques can detect and shut down a botnet, but none of them is 100% accurate [13–17]. Each technique uses different methods and produces results in different ways. Since all techniques have advantages and limitations, Table 1 compares the different techniques used for detecting botnets.
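As a purely illustrative sketch of the anomaly-based idea (the flow features, their distributions and the contamination rate below are hypothetical, not from the cited works), an off-the-shelf outlier detector can flag hosts whose traffic statistics deviate from a learned normal profile:

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)

# Hypothetical per-host flow features: [packets/s, bytes/packet, unique dst ports]
normal = rng.normal(loc=[50, 800, 5], scale=[10, 120, 2], size=(500, 3))
bot = rng.normal(loc=[400, 90, 60], scale=[40, 15, 10], size=(5, 3))  # beaconing-like

# Train on presumed-normal traffic; score new hosts against that profile
model = IsolationForest(contamination=0.01, random_state=0).fit(normal)
print(model.predict(bot))  # -1 = anomalous (suspected bot), 1 = normal
```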
Table 1 Comparison of different botnet detection techniques

Detection technique | Known bot detection | Unknown bot detection | Encrypted traffic detection | Structure and protocol independent | Real-time detection
Anomaly | Yes | Yes | Yes | Yes | No
Signature | Yes | No | No | No | No
DNS | Yes | Yes | Yes | No | No
Data mining | Yes | No | No | Yes | No
Honeypot | Yes | Yes | Yes | Yes | Yes
4 Conclusion and Future Scope
Botnets are considered the most dangerous form of cyber-security attack among malware. There has been substantial research on botnets in previous years, and they remain a difficult topic for researchers due to their complexity and dynamism. Furthermore, botnets can perform legal functions when they are used to monitor the activities of organizations and employers. This paper presents a brief summary of the botnet problem, what botnet attacks currently look like, the types of botnets, and their existing detection methods. Botnets can be created using different protocols and architectural designs, and several new types of botnets are emerging, such as bot clouds and mobile botnets; social bots attack Facebook, Twitter, and other social networking websites. The existing botnet detection techniques can be classified as setting up a honeynet or using an intrusion detection system (IDS). The use of machine learning, automatic models, and other methods is one way to identify bots; yet, no existing model or technique traces botnets with 100% accuracy. In the future, security researchers should investigate the various botnet detection techniques with respect to botnet architecture and improve their potential extensions. In addition, it is necessary to monitor the entire functioning of the different botnets and create a complete list of the bots with their signatures, as this will prove beneficial in developing new botnet detection models and techniques. In the field of cyber-security, providers should pay more attention to the latest botnet attacks such as bot clouds, mobile botnets, and social bots.
References
1. Kurt A, Erdin E, Cebe M, Akkaya K, Uluagac AS (2020) LNBot: a covert hybrid botnet on bitcoin lightning network for fun and profit. In: Computer security–ESORICS. Springer, Berlin, Germany
2. Shin S, Xu L, Hong S, Gu G (2016) Enhancing network security through software defined network (SDN). IEEE
3. Ali I, Ahmed AIA, Almogren A et al (2020) Systematic literature review on IoT-based botnet attack. IEEE Access 8:212220–212232
4. Almomani (2018) Fast-flux hunter: a system for filtering online fast-flux botnet. Neural Comput Appl 29(7):483–493
5. Al-Nawasrah A, Al-Momani A, Meziane F, Alauthman M (2018) Fast flux botnet detection framework using adaptive dynamic evolving spiking neural network algorithm. In: 2018 9th international conference on information and communication systems (ICICS). IEEE, pp 7–11
6. Sandip Sonawane M (2018) A survey of botnet and botnet detection methods. Int J Eng Res Technol (IJERT) 7(12)
7. Karim A, Salleh RB, Shiraz M, Shah SAA, Awan I, Anuar NB (2014) Botnet detection techniques: review, future trends and issues. J Zhejiang Univ Sci C 15(11):943–983
8. Liu CY, Peng CH, Lin IC (2014) A survey of Botnet architecture and botnet detection techniques. Int J Netw Secur 16(2):81–89
9. Kaur N, Singh M (2016) Botnet and botnet detection techniques in cyber realm. In: 2016 international conference on inventive computation technologies (ICICT)
10. Wu D, Fang B, Cui X, Liu Q (2018) Bot catcher: botnet detection system based on deep learning. J Commun 39(8):18–28
11. Grill M, Rehák M (2014) Malware detection using http user-agent discrepancy identification. In: 2014 IEEE international workshop on information forensics and security (WIFS), IEEE, pp 221–226
12. Zha Z, Wang A, Guo Y, Montgomery D, Chen S (2019) BotSifter: an SDN-based online bot detection framework in data centers. In: Proceedings of the 2019 IEEE conference on communications and network security (CNS), Washington DC, DC, USA, Nov 2019, pp 142–150
13. Xiong Z (2019) Research on botnet traffic detection methods for fast-flux and domain-flux. University of Electronic Science and Technology, Chengdu, China
14. Kirubavathi G, Anitha R (2018) Structural analysis and detection of Android botnets using machine learning techniques. Int J Inf Secure 17(2):153–167
15. Ghosh T, El-Sheikh E, Jammal W (2019) A multi-stage detection technique for DNS-tunneled botnets. Can Art Ther Assoc 58:137–143
16. Khan RU, Kumar R, Alazab M, Zhang X (2019) A hybrid technique to detect botnets, based on P2P traffic similarity. In: Proceedings of the 2019 cybersecurity and cyberforensics conference (CCC), Melbourne, Australia, May, 2019, pp 136–142
17. Kempanna M, Jagadeesh Kannan R (2015) A novel traffic reduction technique and ANFIS based botnet detection. In: International conference on circuits, systems, signal and telecommunications, 31 Jan 2015
Chapter 9
Intelligent Call Prioritization Using Speech Emotion Recognition Sanjana Addagarla, Ravi Agrawal, Deep Dodhiwala, Nikahat Mulla, and Kaisar Katchi
1 Introduction
Humans are intelligent beings, and one of the most important aspects that differentiates our species from the rest is our ability to communicate with each other using a well-defined language. This language consists of verbal communication and non-verbal cues that make us receptive to the speaker's emotional state of affairs. The past decade has invited much research into analyzing and improving the emotional capability of human–machine interaction (HMI). This research has branched into the different modalities of human communication, including physical analysis (by examining face tracking, facial expression recognition and eyesight tracking), speech analysis (by acoustic interpretation and voice tone quality), and physiological analysis using sensors to monitor heart rate, EEG signals and more. Customer Relationship Management (CRM) is a critical component of any business's growth and sustainability. Inbound call centers receive calls from customers to resolve complaints, answer queries or accept feedback. The Customer Service Representatives (CSR) play a significant role in creating a fluid, seamless and positive customer experience, and their performance is crucial for maintaining customer satisfaction and retention [1]. This process also relies on the compatibility of the customer and the assigned agent's capability to address the call subject. When all
S. Addagarla (B) · R. Agrawal · D. Dodhiwala · N. Mulla
Department of Information Technology, Sardar Patel Institute of Technology, Mumbai, Maharashtra 400053, India
N. Mulla e-mail: [email protected]
K. Katchi
Department of Applied Sciences and Humanities, Sardar Patel Institute of Technology, Mumbai, Maharashtra 400053, India
e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 R. P. Yadav et al. (eds.), Proceedings of the International Conference on Paradigms of Computing, Communication and Data Sciences, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-19-8742-7_9
agents are busy, the callers are placed in a waiting queue and are serviced on a first-come-first-served basis. So, in theory, a caller whose call needs immediate servicing will be placed in a wait queue and would have to wait until all the customers in front of them in the queue are serviced. Then the caller is matched with the next available agent. This process does not guarantee that the agent is a good match for the customer, and the wait time further sours the customer's experience. Call centers also review calls afterwards for performance purposes to improve the customer experience, but no analysis is done before the call, when the caller is added to the waiting queue. This paper focuses on leveraging speech emotion recognition and textual analysis to identify the caller's emotional state and match them with the right agent, while reducing and redistributing their waiting time to service the call as quickly as possible. This helps in the smart utilization of the workforce and the available data. Ekman [2] states that human emotions can be broken down into six primary emotions: anger, fear, surprise, sadness, happiness, and disgust. Using these principal emotions, the caller in the waiting queue is requested to record a short clip detailing their reason for the call, on which speech and textual analysis are performed. Priority is assigned to the callers based on the predicted emotion, and the calls in the wait queue are reordered accordingly. The identified emotion is used to match the caller with an appropriately emotionally skilled agent using our Emotion-Based Routing (EBR) to address the call. The underlying assumptions made in this paper are that some callers in the waiting queue are of a higher priority than others according to the parameters determined by the business use case. For a customer service call center, anger has been recognized as the most critical emotion for businesses [3]: it is in the business's interest to service angry callers as soon as possible to achieve a higher customer retention rate and a lower call abandonment rate. It is also assumed that the emotion of the caller is proportionately reflected in the voice and the lexicon of the caller. The paper is divided into the following sections: Sect. 2 covers the related work on speech emotion recognition and textual analysis, Sect. 3 covers the proposed methodology for the prioritization of calls in the waiting queue of the call center, the experimental results and observations are detailed in Sect. 4, and Sect. 5 summarizes the conclusion and possible future work.
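As a minimal sketch of the kind of acoustic front-end such a pipeline could use on the recorded clip (the file name and feature choices are illustrative assumptions, not the authors' implementation), MFCC and pitch features can be extracted with librosa:

```python
import librosa
import numpy as np

# Hypothetical path to the caller's recorded clip
y, sr = librosa.load("caller_clip.wav", sr=16000)

mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)  # shape: (13, n_frames)
f0 = librosa.yin(y, fmin=60, fmax=400, sr=sr)       # per-frame pitch estimate (Hz)

# A simple fixed-length feature vector: per-coefficient means and deviations
features = np.concatenate([mfcc.mean(axis=1), mfcc.std(axis=1), [f0.mean()]])
print(features.shape)  # (27,)
```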
2 Literature Survey

Petrushin [3] proposed emotion recognition software for call centers to process telephone-quality voice messages, which can be used for prioritizing them. He concluded that speech emotion recognition employing neural networks proved beneficial for fostering and enhancing CRM systems. The features most often considered for speech emotion recognition are Mel-frequency cepstral coefficients (MFCC), prosody, and voice quality features such as pitch, intensity, duration, harmonics, Perceptual Linear Predictive Cepstrum (PLPC), and more. Using the
combination of the different perceptual features of MFCC, PLP, MFPLPC, LPC, and their statistics gives the best result for a simple DNN model trained on the Berlin Database [4], as found in [5]. Vidrascu et al. [6] identified the 25 best features to select from 129: four from microprosody, four F0-related, four formant-related, five from energy, four from duration based on phonetic alignment, and six other cues from transcripts. Their research suggests that sadness is the hardest emotion to recognize without mixing in cues from the transcripts.

Research on call redistribution based on emotion [7] indicates that a multilayer perceptron used for speech emotion recognition does better when the neural network has more neurons in the hidden layer, but it is also more computationally expensive. Machine learning models such as the Linear Bayes Classifier are comparatively faster and also achieve higher accuracy. The calls are reordered in descending order by placing angry and fearful callers first, which significantly increased the waiting time of joyful and neutral callers. Dasgupta [8] considered the four parameters of pitch, SPL, timbre, and time gaps to perform an analysis using MATLAB and WavePad, considering only three emotional states (normal, angry, and panicked), and observed that the emotion of the speaker affects the pitch and the number of pauses the speaker takes.

Khalil et al. [9] reviewed the background of emotion detection and recognition in speech using different techniques. They classified the datasets into three kinds: simulated databases, created by experienced performers and actors; induced databases, taken without the knowledge of the speaker; and natural databases, recorded from call center conversations, conversations in the public domain, etc. They observed the best accuracy for the three emotion classes of anger (99%), happiness (99%), and sadness (96%) using a deep convolutional neural network. They also noted that deep learning models have the advantage of learning quickly and efficiently providing representations, but a disadvantage is that RvNNs are usually used for natural language processing rather than speech emotion recognition, though they can handle the different modalities.

In [10], Heracleous et al. focused on speech emotion recognition in multiple languages by drawing on lifelike emotional speech from films in three languages. Language-specific features are combined with features specific to certain emotions and are classified using an extremely randomized trees (ERT) classifier, which performed well, giving an unweighted average recall of 73.3%. Cho et al. [11] proposed combining acoustic information and text transcriptions from the audio for improved speech emotion recognition. With an LSTM network for acoustic analysis and a CNN for textual analysis trained on the IEMOCAP [12] dataset, they observed that the combination of text and speech models outperformed the text-trained and speech-trained models alone, proving that the textual information extracted from the audio is complementary. Jia et al. [13] experimented with a Long Short-Term Memory (LSTM) network and Latent Dirichlet Allocation (LDA) to extract the acoustic and textual features, respectively. They worked with the Sogou Voice Assistant to obtain a dataset of audio and text and used three indicators, namely descriptive, temporal, and geo-social. They also observed that time plays a crucial role in
affecting one's emotion; i.e., people tend to be more joyous early in the night and dull before dawn, which correlates with working hours.
3 Methodology

This section details the proposed methodology (see Fig. 1) for predicting the emotion and the order of the calls in the wait queue of a call center. The input to the system is the audio recording collected when the caller is placed in the wait queue; the recording is 15–30 s long. This audio clip goes through the Audio Preprocessing Module, where the features are extracted and the audio sample is transcribed to text, which is passed on to the next module. The Emotion Detection Module consists of two submodules: Speech Emotion Recognition and Textual Analysis. The speech emotion recognition submodule analyses the prosodic and waveform features to predict the emotion. The textual analysis submodule analyses the transcribed text from the audio recording to determine the underlying emotion. The output from both submodules is then combined to compute the overall emotion from the four basic emotions of anger, sadness, happiness, and neutral. This emotion and other extracted features are input into the Call Prioritizer Module, where the calls are reordered according to the prioritized emotion of the application. Section 3.1 describes the Audio Preprocessor Module, Sect. 3.2 illustrates the Emotion Detection Module's working, and Sect. 3.3 explains the Call Prioritizer Module algorithm.
3.1 Audio Preprocessor Module

Dataset. The datasets used are the Interactive Emotional Dyadic Motion Capture database developed by USC (USC-IEMOCAP) [12] and the Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS) [14]. IEMOCAP consists of multimodal information, i.e., it has video, audio, and text transcriptions of the same. We have
Fig. 1 System block diagram
used the audio and text from the dataset, which contains 10,038 corpus samples. The RAVDESS dataset contains 7356 files. The chosen datasets have audio files of actors enacting emotions, as real-world call datasets are not publicly available; the presumption made is that real-world conversations will not differ vastly from the acted-out audio samples. For the textual analysis module, we have used a combination of three datasets: DailyDialog [15], ISEAR [16], and Emotion-Stimulus [17].

Preprocessing and Feature Extraction. The audio samples are augmented to prevent overfitting and improve accuracy. The audio samples are subjected to noise injection, waveform shifting, and speed modification. The waveform and spectrogram for a sample audio clip from the dataset having the emotion "happy" are visualized in Figs. 2 and 3, respectively. The happy and sad samples are oversampled since they are underrepresented in the dataset. The following features are extracted from the audio sample for prediction: signal mean, signal standard deviation, pitch, root mean square error (RMSE), root mean square deviation (RMSD), duration of silence, harmonics, autocorrelation factors, and MFCCs, using the Librosa [18] library. The text from the audio clip is transcribed using the SpeechRecognition Python library, which runs on the Google Cloud Speech API. For textual analysis, the text is first tokenized, followed by counting of unique tokens. Each token sequence is padded and encoded for categorization. The text is also checked for buzzwords, which are important phrases and words determined by the business application. A minimal sketch of this feature-extraction step is given after Fig. 3.
Fig. 2 The waveform of an audio sample with “happy” emotion
Fig. 3 Spectrogram of the waveform in Fig. 2
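To make the preprocessing step concrete, the following is a minimal sketch of augmentation and feature extraction using Librosa. The exact feature set and augmentation parameters are not published in the paper, so the noise level, shift amount, pitch range, and MFCC count below are illustrative assumptions.

```python
import numpy as np
import librosa

def augment(y, sr):
    """Return simple augmented variants: noise injection, time shifting,
    and speed modification (parameter values are illustrative)."""
    noisy = y + 0.005 * np.random.randn(len(y))          # noise injection
    shifted = np.roll(y, int(0.1 * sr))                  # shift by 0.1 s
    sped_up = librosa.effects.time_stretch(y, rate=1.1)  # speed modification
    return [noisy, shifted, sped_up]

def extract_features(y, sr, n_mfcc=13):
    """Extract a flat feature vector resembling the one described above."""
    rms = librosa.feature.rms(y=y)[0]
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)
    f0 = librosa.yin(y, fmin=50, fmax=400, sr=sr)        # pitch estimate
    harmonic = librosa.effects.harmonic(y)               # harmonic component
    return np.hstack([
        y.mean(), y.std(),                 # signal mean and std deviation
        np.nanmean(f0),                    # average pitch
        rms.mean(), rms.std(),             # RMS statistics
        harmonic.mean(),                   # harmonic statistic
        mfcc.mean(axis=1), mfcc.std(axis=1),
    ])

y, sr = librosa.load("caller_clip.wav", sr=16000)  # hypothetical input clip
features = np.vstack([extract_features(a, sr) for a in [y] + augment(y, sr)])
```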
3.2 Emotion Detection Module

Speech Emotion Recognition Submodule. The speech emotion recognition submodule predicts the emotion based on the extracted features. The following deep learning models and machine learning algorithms were trained on the preprocessed dataset (Table 1). From Table 1, it is observed that though models such as Multi Naïve Bayes, XGBoost, and Logistic Regression had comparable performance, the Random Forest algorithm performs better than the deep learning models and the other machine learning algorithms. It is also faster compared to the deep learning algorithms, and its nimble, lightweight nature makes it useful in a real-time application. Deep learning algorithms also take significantly more time to train compared to the machine learning models. The speech emotion recognition submodule has the highest accuracy for the happiness and anger emotions, as these emotions have much more distinguishable auditory characteristics compared to the sadness and neutral emotions.

Table 1 Accuracy results of the models for the speech emotion recognition submodule

Classifier             Overall  Happiness  Anger  Sadness  Neutral
SVM                    0.76     0.92       0.86   0.89     0.86
Multi Naïve Bayes      0.88     0.98       0.94   0.88     0.96
Random forest          0.90     0.95       0.95   0.95     0.96
XGBoost                0.89     0.97       0.93   0.95     0.95
Multilayer perceptron  0.89     0.96       0.95   0.92     0.95
Logistic regression    0.89     0.98       0.94   0.89     0.96
CNN                    0.64     0.75       0.84   0.85     0.83
LSTM                   0.81     0.89       0.94   0.91     0.89

Textual Emotion Analysis Submodule. BERT is one of the most versatile methods for performing NLP, as this model can be fine-tuned on input data specific to the context of the language and the purpose of use. Another advantage of BERT is that it is pre-trained on a large dataset, which improves accuracy even without fine-tuning. For predicting the emotion from the text transcripts of the call, we have used transfer learning with BERT via the KTrain library. The model gives an unweighted average accuracy (UAA) of 82%. The accuracies of the BERT model and the alternative Convolutional Neural Network (CNN) and Long Short-Term Memory (LSTM) models are given in Table 2.
Table 2 Accuracy results of the models for the textual emotion analysis submodule

Classifier  Overall  Happiness  Anger  Sadness  Neutral
CNN         0.77     0.90       0.89   0.91     0.91
LSTM        0.73     0.87       0.89   0.89     0.91
BERT        0.82     0.94       0.91   0.93     0.93

From Table 2, it is observed that BERT outperforms the CNN and LSTM models by a considerable margin, which can also be noticed by interpreting the loss graphs of the CNN and LSTM models depicted in Figs. 4 and 5, respectively. Even though the training loss decreases significantly, the same is not reflected in the validation loss; these models would need more time and epochs to attain accuracy similar to the BERT model. The CNN and LSTM models are most accurate in identifying "Neutral," while the BERT model has the highest accuracy for "Happiness" and the lowest for "Anger." Hence, the BERT model is chosen for the textual analysis of the text transcribed from the input audio sample; a minimal fine-tuning sketch is given after Fig. 5.

Fig. 4 Textual emotion analysis—CNN loss graph
Fig. 5 Textual emotion analysis—LSTM loss graph
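As an illustration of the transfer-learning setup described in Sect. 3.2, the following is a minimal KTrain sketch for fine-tuning BERT on emotion-labeled sentences. The tiny in-memory dataset, maximum sequence length, learning rate, and epoch count are illustrative assumptions, not the authors' published configuration.

```python
import ktrain
from ktrain import text

# Hypothetical stand-in for the DailyDialog/ISEAR/Emotion-Stimulus combination.
train_texts = ["I am so angry about this bill", "thank you, that was great"]
train_labels = ["anger", "happiness"]
val_texts, val_labels = train_texts, train_labels

classes = ["anger", "sadness", "happiness", "neutral"]
(trn, val, preproc) = text.texts_from_array(
    x_train=train_texts, y_train=train_labels,
    x_test=val_texts, y_test=val_labels,
    class_names=classes, preprocess_mode="bert", maxlen=64,
)

model = text.text_classifier("bert", train_data=trn, preproc=preproc)
learner = ktrain.get_learner(model, train_data=trn, val_data=val, batch_size=32)
learner.fit_onecycle(2e-5, 3)  # one-cycle policy; rate/epochs are assumptions

predictor = ktrain.get_predictor(learner.model, preproc)
print(predictor.predict(["I have been waiting for hours and nobody helps!"]))
```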
3.3 Call Prioritizer Module

The predicted emotions from the speech emotion recognition and textual analysis submodules are combined in a weighted-average format to output the overall predicted emotion. We propose an algorithm to prioritize the callers in the waiting queue considering the following factors:

• Emotion Score: This is the combined prediction score generated by the Emotion Detection Module by analyzing both audio and textual features. This score is then multiplied by the emotion multiplying factor, depending on the detected emotion and business logic.
• Buzzword Score: This is set depending on the presence of a "buzzword" in the short recording of the caller.
• Loyalty Score: This score gives priority to different client tiers for a multi-tier business.
• Call Back Score: This factor considers whether a caller was disconnected by an error or is calling back multiple times because the issue is unresolved. In such a case, the caller is given an additional boost in the score to reduce the frustration of being at the bottom of the waiting queue to resolve the same issue again [19].
• Wait Time Score: This factor is added to prevent starvation of the callers and accounts for the time the caller has spent in the waiting queue.

Using the above-mentioned factors, a Priority Score is calculated for each caller, according to which they are placed in the waiting queue. After this, we use the Agent Emotion Matrix to calculate the Agent Suitability for each caller in the wait queue. The agents are ranked according to their ability to deal with callers having certain emotions. A separate Agent Waiting List is generated for each agent, considering their suitability rank for each caller. As soon as any agent becomes free, the caller on top of that agent's waiting list is dequeued and assigned to them.

Call Priority Score notation:

WT — wait time, i.e., the time spent in the waiting queue in seconds
CT — current time in seconds
AT — arrival time in seconds
S — priority score, based on which the call is prioritized
ES — emotion score
EM(e) — emotion multiplier for emotion e
L — loyalty score
CB — call back score
CBF — call back factor
BS — buzzword score
S(a) — suitability score for agent a, based on which the caller is placed in the waiting queue of agent a
R(a) — suitability rank of agent a for the caller
C — number of calls in the waiting queue
n — number of agents
A ≡ [a_ij] — agent emotion matrix of size C × n

WT = CT − AT                                          (1)

S = ES × EM(e) + L + CB × CBF + BS + WT/60            (2)

S(a) = S × (1 − R(a)/n)                               (3)

a_ij = e, when agent i supports emotion e at priority level j; 0, otherwise    (4)

where i ranges from 1 to C and j ranges from 1 to n.

When a new caller joins the waiting queue, the Agent Suitability is calculated using the Agent Emotion Matrix A (4). The caller is placed in the agent waiting queues based on the Suitability Score for each agent (3), which in turn depends on (1) and (2). As soon as an agent becomes free, the caller on top of that agent's waiting queue is assigned to the agent, removed from all waiting queues, and Agent Suitability is recalculated for the remaining callers in the waiting queue (see Fig. 6). A minimal sketch of this scoring and queue placement follows.
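The sketch below implements Eqs. (1)–(3) directly. The emotion-multiplier values and the toy caller/agent data are illustrative assumptions; the paper leaves them to the business use case.

```python
import time
import heapq

# Assumed emotion multipliers: anger is prioritized, per the business logic.
EMOTION_MULTIPLIER = {"anger": 2.0, "sadness": 1.5, "happiness": 1.0, "neutral": 1.0}

def priority_score(es, emotion, loyalty, callback, cbf, buzzword, arrival_time):
    wait_time = time.time() - arrival_time                      # Eq. (1)
    return (es * EMOTION_MULTIPLIER[emotion] + loyalty
            + callback * cbf + buzzword + wait_time / 60.0)     # Eq. (2)

def agent_queues(callers, agent_ranks, n_agents):
    """Place each caller into every agent's queue, ordered by Eq. (3).

    agent_ranks[agent][caller_id] gives R(a), the agent's suitability
    rank for that caller (1 = best match).
    """
    queues = {a: [] for a in agent_ranks}
    for caller_id, s in callers.items():
        for a, ranks in agent_ranks.items():
            s_a = s * (1 - ranks[caller_id] / n_agents)          # Eq. (3)
            heapq.heappush(queues[a], (-s_a, caller_id))         # max-heap
    return queues

callers = {"c1": 7.2, "c2": 4.1}   # hypothetical caller_id -> priority score S
ranks = {"agent1": {"c1": 1, "c2": 2}, "agent2": {"c1": 2, "c2": 1}}
print(agent_queues(callers, ranks, n_agents=2))
```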
4 Results and Observations

Table 3 shows the comparison between the proposed combined text-and-speech model, the speech-only model, and the text-only model. The proposed model surpasses the alternative models and has the highest accuracy for the "anger" emotion. Adding the semantic aspect of communication to the prosodic features enhances the recognition rate. Humans also analyze telephone conversations on these two parameters, and enabling the model to integrate both perspectives gives it the necessary edge to succeed. The proposed model not only has better overall accuracy but on average recognizes each emotion better than the other two models. This validates that speech emotion recognition and textual emotion analysis are highly complementary.

To simulate a call center using the proposed algorithm versus the first-come-first-served algorithm traditionally used in call centers, a list of callers with the attributes required by the call prioritization module is passed through the algorithm. The application in Fig. 7 is a simulation of the calls arriving at the call center and being
Fig. 6 Flowchart for call prioritization
Table 3 Accuracy comparison of three models

Model          Overall  Happiness  Anger  Sadness  Neutral
Only speech    0.90     0.95       0.95   0.95     0.96
Only text      0.82     0.94       0.91   0.93     0.93
Speech + text  0.92     0.95       0.97   0.96     0.96
processed by the proposed methodology. The upper panel indicates whether a given agent is free or busy and, if busy, which caller the agent is currently dealing with. The panel on the left shows the overall waiting queue and the individual waiting queues of each agent. The panel on the right simulates ten calls; there are three buttons for each call, one for starting the call, one for ending it, and one for listening to the audio file. Using the call and end-call buttons for each caller, we can simulate the running of the call center, as seen in the figure. When a call arrives, agent availability is checked, and if all agents are busy, the recorded audio clip is pushed through the Emotion Detection Module to extract the emotion. After detecting the emotion, the calls are dynamically reordered in the queues according to the calculated scores.

The findings shown in Table 4 depict the average waiting time before and after using the application. The average patience time is also listed in the table, which
Fig. 7 Simulation of call center using proposed methodology
is the average amount of time in seconds a caller is willing to stay in the waiting queue. It is observed that the waiting times for the "Anger" and "Sadness" emotions are drastically reduced; although those of the other emotions increase, they remain below the callers' average patience time. During a simulation of fifteen callers calling within a brief time period, two angry callers hung up while waiting to be assigned to an agent under first-come-first-served, while none hung up using the proposed algorithm.
Table 4 Comparing the waiting time difference before and after applying the algorithm

Emotion  Avg waiting time, first-come-first-served (s)  Avg waiting time, emotion-based routing (s)  Avg patience time (s)
Sad      45                                             30                                           390
Neutral  90                                             210                                          435
Angry    300                                            140                                          220
Happy    60                                             90                                           555
5 Conclusion

This research has identified and addressed a gap faced in call centers daily. The proposed solution analyses the emotional state of the caller before the call takes place, while the caller is in the waiting queue because all agents are busy. This helps prioritize the callers according to the use case of the application: in customer service centers, anger could be the prioritized emotion, whereas in national emergency helplines, fear could be the prioritized emotion. This gives the research the flexibility to adapt to any application domain, as it considers both the emotional aspects of speech and the lexical and linguistic aspects of the call. Call centers usually emphasize reviewing a call after it has occurred, but by performing pre-call analysis, the waiting time for callers with the prioritized emotion is cut down significantly and smartly redistributed among the other callers. The callers are not placed in a single waiting queue but in each agent's waiting queue according to their compatibility and the agent's emotional capability to address the call. This ensures that the caller receives the best possible service, creating the best outcome for both the caller and the call center. By accounting for additional parameters such as the loyalty, buzzword, and callback scores, the model increases its capability to make a well-informed decision. The experimental results, though obtained on a small scale, look promising and viable for working at a larger scale.

For future research, the model could be trained on real-world data from call centers to assess its efficacy on realistic data. The model would also benefit from post-call analysis and agent skill-based routing, in addition to the proposed research, to heighten its emotion recognition rate and reduce waiting time even further. More parameters could be added to the priority score to make it more adaptable and realistic.
References

1. Kumar M, Misra M (2020) Evaluating the effects of CRM practices on organizational learning, its antecedents and level of customer satisfaction. J Bus Ind Market
2. Ekman P (1999) Basic emotions. Handbook Cogn Emotion 98(45–60):16
3. Petrushin V (2000) Emotion in speech: recognition and application to call centers. In: Proceedings of artificial neural networks in engineering
4. Burkhardt F, Paeschke A, Rolfes M, Sendlmeier WF, Weiss B (2005, Sept) A database of German emotional speech. Interspeech 5:1517–1520
5. Lalitha S, Tripathi S, Gupta D (2019) Enhanced speech emotion detection using deep neural networks. Int J Speech Technol 22(3):497–510
6. Vidrascu L, Devillers L (2007, Aug) Five emotion classes detection in real-world call center data: the use of various types of paralinguistic features. In: Proceedings of international workshop on paralinguistic speech between models and data, ParaLing
7. Bojanić M, Delić V, Karpov A (2020) Call redistribution for a call center based on speech emotion recognition. Appl Sci 10(13):4653
8. Dasgupta PB (2017) Detection and analysis of human emotions through voice and speech pattern processing. arXiv:1710.10198
9. Khalil RA, Jones E, Babar MI, Jan T, Zafar MH, Alhussain T (2019) Speech emotion recognition using deep learning techniques: a review. IEEE Access 7:117327–117345
10. Heracleous P, Mohammad Y, Yoneyama A (2020, July) Integrating language and emotion features for multilingual speech emotion recognition. In: International conference on human-computer interaction. Springer, Cham, pp 187–196
11. Cho J, Pappagari R, Kulkarni P, Villalba J, Carmiel Y, Dehak N (2018) Deep neural networks for emotion recognition combining audio and transcripts. INTERSPEECH
12. Busso C, Bulut M, Lee CC, Kazemzadeh A, Mower E, Kim S, Chang JN, Lee S, Narayanan SS (2008) IEMOCAP: interactive emotional dyadic motion capture database. Lang Resour Eval 42(4):335–359
13. Jia J, Zhou S, Yin Y, Wu B, Chen W, Meng F, Wang Y (2018) Inferring emotions from large-scale internet voice data. IEEE Trans Multimedia 21(7):1853–1866
14. Livingstone SR, Russo FA (2018) The Ryerson audio-visual database of emotional speech and song (RAVDESS): a dynamic, multimodal set of facial and vocal expressions in North American English. PLoS ONE 13(5):e0196391
15. Li Y, Su H, Shen X, Li W, Cao Z, Niu S (2017) DailyDialog: a manually labelled multi-turn dialogue dataset. IJCNLP
16. Scherer KR, Wallbott HG (1994) Evidence for universality and cultural variation of differential emotion response patterning: correction. J Pers Soc Psychol 67(1):55–55
17. Ghazi D, Inkpen D, Szpakowicz S (2015, April) Detecting emotion stimuli in emotion-bearing sentences. In: International conference on intelligent text processing and computational linguistics. Springer, Cham, pp 152–165
18. McFee B, Raffel C, Liang D, Ellis DP, McVicar M, Battenberg E, Nieto O (2015, July) Librosa: audio and music signal analysis in python. In: Proceedings of the 14th python in science conference, vol 8, pp 18–25
19. Hu K, Allon G, Bassamboo A (2022) Understanding customer retrials in call centers: preferences for service quality and service speed. Manuf Serv Oper Manage 24(2):1002–1020
Chapter 10
The AdaBoost Approach Tuned by SNS Metaheuristics for Fraud Detection Marko Djuric, Luka Jovanovic, Miodrag Zivkovic, Nebojsa Bacanin, Milos Antonijevic, and Marko Sarac
1 Introduction

During the global COVID-19 pandemic, the world witnessed a large spike in e-commerce and online transactions. Most economies in the world relied on e-commerce to alleviate the pressure that quarantine brought [2]. According to UNCTAD, global e-commerce transactions increased to more than $26 trillion in 2019, equal to almost one-third of the worldwide gross domestic product (GDP). The Nilson Report states that by the year 2030 the global volume of credit card transactions is expected to reach $74.14 trillion, and the industry is projected to lose close to $49.32 billion to fraud; moreover, card fraud over the next ten years will amount to around $408.50 billion in cumulative losses. Credit card fraud is a type of identity theft involving obtaining a victim's private credit card data without authorization, with the malicious goal of charging payments to the card and/or withdrawing money from it. Looking at the growth of
credit card usage in the coming years, it is necessary to implement a precise system to detect fraudulent activity and protect everyone using these cards.

In this research, the authors utilized machine learning (ML) methods to detect fraudulent activity in credit card transactions, assessed on a real-world dataset formed in September 2013 from transactions by credit card users across Europe. Unfortunately, the dataset is exceedingly imbalanced. To mitigate this issue, the authors used the social network search (SNS) algorithm [38]. The machine learning algorithms utilized in this research include decision tree (DT), random forest (RF), extra tree (ET), support vector machine (SVM), extreme gradient boosting (XGBoost), and logistic regression (LR). These ML algorithms were compared separately in order to measure the quality of their classification. Also, the Adaptive Boosting (AdaBoost) technique was used with every model to make it more robust.

The focal point of this research is a parallel comparison of multiple machine learning algorithms on a publicly accessible dataset consisting of real-world card payments. Furthermore, this paper examines AdaBoost for boosting the observed classifiers on an exceedingly disproportional credit card dataset. Finally, the paper introduces the possibility of applying the novel SNS algorithm to handle the highly imbalanced dataset: the SNS metaheuristic was used to optimize the hyperparameters of AdaBoost. The authors compared their suggested solution to the already implemented solution and all other ML algorithms in [28]. The models were also compared using the Matthews correlation coefficient (MCC), the area under the curve (AUC), precision, accuracy, and recall.

The contributions of the research given in this manuscript include the following:

– A scalable framework designed to detect fraudulent credit card transactions.
– The social network search algorithm is implemented to address the problem of imbalanced classes within the employed dataset; it is used to optimize the hyperparameters of the AdaBoost model.
– AdaBoost is blended with the SNS metaheuristic to raise the performance of the proposed framework. Furthermore, a comparative analysis is conducted with the following metrics in mind: accuracy, precision, recall, AUC, and MCC.
– The framework is applied to an extremely imbalanced dataset in order to confirm its effectiveness.

The structure of the rest of this manuscript is as follows: Section 2 brings a brief literature survey of recent publications that employed machine learning algorithms for credit card fraud identification. Section 3 describes the employed SNS metaheuristic, later utilized in the experiments to further enhance the performance of AdaBoost. Section 4 presents the experimental configuration, the implementation of the suggested framework on a synthetic credit card fraud dataset, and the obtained experimental findings together with a comparative analysis. Section 5 concludes the research.
2 Literature Review and Background

In their work, Tanouz et al. [40] gave a structure for how to utilize machine learning methods to detect credit card fraud. The authors utilized the European cardholders dataset to determine the performance of the chosen methods, and employed an under-sampling approach to deal with the imbalance in the dataset. Random forest and logistic regression were examined, with accuracy as the main performance measurement. Results showed that random forest managed to detect fraud with 91.24% accuracy, while logistic regression achieved 95.16% accuracy. Additionally, the authors computed the confusion matrix to affirm whether their chosen methods performed adequately with regard to the negative and positive classes.

Randhawa et al. [35] suggested a method that pairs AdaBoost with majority voting to detect credit card fraud. The authors utilized the dataset produced by cardholders in Europe. The adaptive boosting (AdaBoost) method was paired with different machine learning methods such as support vector machines. The Matthews correlation coefficient and accuracy were chosen as the main measures of performance; AdaBoost-SVM had an MCC of 0.044 and 99.959% accuracy.

Rajora et al. [34] organized a parallel investigation of machine learning algorithms for determining fraudulent activity in credit card transactions using the dataset generated from European cardholders. Notable methods considered include random forest (RF) and K-nearest neighbors (KNN). The area under the curve and accuracy were the main performance measures. The obtained results show that the random forest algorithm accomplished an accuracy of 94.9% and an AUC of 0.94, while K-nearest neighbors achieved an accuracy of 93.2% and an AUC of 0.93. The authors did not explore the imbalance in the used dataset.

Dataset: The dataset utilized for the purpose of this paper was produced by collecting European cardholders' transactions in September 2013. It is openly available on Kaggle, although it is greatly skewed. Furthermore, it is not synthetic, as the transactions contained in it happened over a real time period. It has 284,807 transactions, of which 99.828% are valid and 0.172% are fraudulent, and it includes 30 attributes, among them time and amount.
2.1 Adaptive Boosting Algorithm

AdaBoost is employed to enhance the performance of the observed machine learning methods. It produces the best results when used in combination with weak learners, models that attain accuracy just above random chance. Combining various elementary or less precise models through boosting generally helps machine learning
solutions to develop and achieve higher accuracy. The adaptive boosting algorithm [26] has been utilized in this research to help with classification performance. AdaBoost generates weighted sums from a mix of individually boosted outputs. The mathematical formulation of the algorithm is described below:
G_N(x) = Σ_{t=1}^{N} g_t(x)    (1)

where g_t denotes the prediction of the weak learner at iteration t for a given input vector x. The prediction of a weak learner on training sample x_n is h(x_n). In every iteration, a coefficient β_t multiplies the picked weak learner so as to minimize the error of the training process, L_t, which is conveyed in the next equation:

L_t = Σ_n L[G_{t−1}(x_n) + β_t h(x_n)]    (2)

where G_{t−1} serves as the boosted classifier from the previous iteration t − 1, whilst β_t h(x_n) is the weak classifier considered for inclusion in the conclusive model.
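To make Eq. (1) concrete, the sketch below evaluates a boosted ensemble as a weighted sum of weak-learner outputs. The decision stumps and stage weights are invented toy values for illustration, not values fitted by the AdaBoost procedure.

```python
import numpy as np

# Toy weak learners: decision stumps thresholding one feature each.
stumps = [
    lambda X: np.where(X[:, 0] > 0.5, 1, -1),
    lambda X: np.where(X[:, 1] > 0.2, 1, -1),
    lambda X: np.where(X[:, 0] > 0.8, 1, -1),
]
betas = np.array([0.9, 0.6, 0.3])  # assumed per-stage weights (beta_t)

def boosted_predict(X):
    # G_N(x) = sum_t beta_t * h_t(x); the sign gives the class in {-1, +1}
    scores = sum(b * h(X) for b, h in zip(betas, stumps))
    return np.sign(scores)

X = np.array([[0.7, 0.1], [0.9, 0.4], [0.2, 0.3]])
print(boosted_predict(X))
```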
2.2 Metaphor-Based Metaheuristics

The bulk of metaheuristics is established on the basis of biological evolution, with special attention given to simulating different biological metaphors that diverge in the schemes used for representation. Three major paradigms have established themselves: immune systems, swarm, and evolutionary [1].

Artificial immune systems (AIS) find their inspiration in theoretical immunology and observable immune functions, with all their rules and methods. A case can be made that they are an algorithmic alternative to evolutionary algorithms. When used for optimization, antibodies serve as candidate solutions, which iteratively evolve through operators such as cloning, mutation, and selection. Antigens serve as the objective function, and memory cells are used to store satisfactory solutions. Nearly every AIS-based metaheuristic relies on clonal selection, such as the B-Cell algorithm, the clonal selection algorithm [23], the artificial immune network [22] used for optimization (opt-AINET) [19], and negative selection algorithms.

Swarm intelligence (SI) finds its inspiration in the collective behavior of animal communities, such as bees or ants. Swarm intelligence mostly relies on the decentralization principle, where all individuals are transformed by interacting with other solutions and the environment. Some of the more established algorithms include
particle swarm optimization (PSO) [29], the Harris hawks optimizer (HHO) [27], the bat algorithm (BA) [43], ant colony optimization (ACO) [24], moth-flame optimization (MFO) [31], the firefly algorithm (FA) [42], the grey wolf optimizer (GWO) [32], and many others.

Evolutionary algorithms (EA) mimic biological advancement on a cellular level, utilizing operators such as mutation, crossover, selection, and reproduction to achieve the best possible candidate solutions (chromosomes). Four paradigms of evolutionary computation are notable: genetic programming, evolutionary strategies, genetic algorithms, and evolutionary programming. The three concepts above are interdependent due to the connection of the operators used, specifically the selection and mutation operators; thus a case could be made that AIS and SI are subcategories of EA. That being said, what separates these approaches is the method by which they handle exploration and exploitation.

In the domain of informatics, computing, and information technologies, nature-inspired metaheuristic techniques have been extensively used for addressing NP-hard tasks, including MRI classifier optimization [15, 17], forecasting the rate of COVID-19 infections [44, 46], artificial neural network feature selection optimization and hyperparameter tuning [4, 7–10, 12, 14, 16, 21, 25, 37, 49], cloud-based task scheduling optimization [6, 13, 18, 48], network lifetime optimization in the domain of wireless sensor networks [5, 11, 45, 47, 50], and so on.
3 Social Network Search Algorithm

The social network search (SNS) algorithm [39] draws its inspiration from social networks. Human beings are a very social species, and social networks are mechanisms created for the purpose of connecting people. The SNS algorithm replicates the interaction between social network users in their attempts to achieve more popularity. The base of the SNS algorithm is the interaction between users across different social networks: they can alter and influence each other's opinions, all for the purpose of increasing their popularity. There are different kinds of social networks, but their users exhibit similar behavior; they familiarize themselves with other users' views and accept them if they are better.

Decision moods and mathematical model: The viewpoint of a user can be modified by alternative views in four moods: imitation, conversation, disputation, and innovation. Virtually every metaheuristic algorithm applies a set of operations to develop new solutions; the SNS algorithm creates new solutions by applying one of the four moods that mimic real-world social behavior.

Imitation: The predominant quality of social media is that people have the option to follow one another, and when someone shares something, that person's friends and followers are informed about it. If a new happening presents a challenge, users will aim to post about it. The formula for imitation is presented in Eq. (3):
X_i^new = X_j + rand(−1, 1) × R
R = rand(0, 1) × r                                    (3)
r = X_j − X_i

where X_j is the vector of the j-th user's opinion, picked at random with i ≠ j, and X_i is the vector of the i-th user's view. rand(−1, 1) and rand(0, 1) are arbitrary vectors in the intervals [−1, 1] and [0, 1]. The shock radius R represents the amount of influence of the j-th user, and its magnitude is considered as a multiple of r.

Conversation: Represents a category where people find out more information while communicating amongst themselves and grow their knowledge about different occurrences via private chat. Users tend to find a different perspective through conversation and can draw new conclusions. The mathematical formulation is presented in Eq. (4):

X_i^new = X_k + R
R = rand(0, 1) × D                                    (4)
D = sign(f_i − f_j) × (X_j − X_i)

where X_k represents the vector of the subject of conversation and is chosen at random. R is the effect the conversation has, established on the difference in opinions, and represents the change in perspective around X_k. D is the difference in the users' views, rand(0, 1) is an arbitrary vector in the interval [0, 1], X_j represents the vector of a random user's view in the chat, and X_i is the vector representing the view of the i-th user, where i ≠ j ≠ k. sign(f_i − f_j) determines the direction in which X_k moves by comparing f_i and f_j.

Disputation: Describes a state in which users explain and defend certain views on a subject to other users. In addition, users can create groups to discuss certain subjects, and users are thus influenced by seeing different opinions. Here an arbitrary number of social network users are considered to be commenters or members of a group, and new views are calculated according to Eq. (5):

X_i^new = X_i + rand(0, 1) × (M − AF × X_i)
M = (Σ_t^{N_r} X_t) / N_r                             (5)
AF = 1 + round(rand)

where X_i is the vector representing the view of the i-th user, rand(0, 1) is an arbitrary vector in the interval [0, 1], and M is the mean of the commenters' views. AF is the admission factor, which indicates how insistent users are on their opinion when discussing it with other people, and is represented by an integer between 1
and 2. round() is a function that rounds the input to the nearest integer, and rand is a random number in [0, 1]. N_r represents the number of commenters (the group size); it is a number between 1 and N_user, where N_user is the number of users of the network.

Innovation: Occasionally, shared content is the result of users' opinions and experiences, for example, when a person contemplates a particular issue, possibly views it in a different light, and becomes capable of understanding the nature of the problem more accurately or of discovering an entirely new perspective. A distinct subject may have particular features, and by changing the perception of some of those features, the broad perception of the subject changes. The mathematical formula for innovation is described in Eq. (6):

x_i^{d,new} = t × x_j^d + (1 − t) × n_new^d
n_new^d = lb_d + rand_1 × (ub_d − lb_d)               (6)
t = rand_2

where

X_i^new = [x_1, x_2, x_3, ..., x_i^{d,new}, ..., x_D]    (7)

X_i = { X_i,      if f(X_i) < f(X_i^new)
      { X_i^new,  if f(X_i^new) ≤ f(X_i)              (8)
The pseudocode for the SNS algorithm can be seen in Algorithm 1.
4 Experimental Setup

In the suggested setup, the SNS algorithm was utilized to optimize the AdaBoost hyperparameters. The following hyperparameters underwent the optimization procedure:

– n_estimators, within the boundaries [60, 200], which were empirically determined.
– base_estimator, integer-coded with three possible values: 0 denoting DT, 1 representing LR, and 2 being SVM.
– learning_rate, within the range [0.5, 2].

Each solution within the population is encoded as a vector of three parameters, where the first parameter is an integer in the range [60, 200], the second parameter is an integer in the range [0, 2], while the third parameter is continuous in the
range [0.5, 2]. Therefore, this particular optimization problem is a combined integer-continuous NP-hard task. The algorithm was tested with 20 units in the population over 100 iterations, and the best-obtained outcomes are reported in the result tables. To provide good insight into the results, the FA algorithm was also applied to tune the same AdaBoost hyperparameters, under the same conditions as the SNS algorithm.

The simulations were conducted in two stages, as suggested by the experiments published in [28]. It must be noted that the authors separately realized all machine learning methods for the purpose of this research and used their own results in the later comparative analysis. The obtained results of the ML methods that do not employ AdaBoost are presented in Table 1, while the scores of these methods with AdaBoost are given in Table 2. For the experiments with AdaBoost, another version of AdaBoost, with hyperparameters optimized by the FA algorithm, was implemented and tested under the same conditions to enable a more detailed comparison. The results of the basic AdaBoost are also provided.

Since the dataset is highly skewed, the synthetic minority over-sampling technique (SMOTE) was utilized to address the heavily imbalanced data. The KNN model is used within the SMOTE method to create the minority-class entries by connecting the data points to their k-nearest neighbors. This approach generates fictional synthetic data that is not a direct copy of the minority class samples, which helps control the overfitting problem. A minimal sketch of the solution decoding and SMOTE application is given below.
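The following is a minimal sketch of how one candidate solution could be decoded into an AdaBoost model and scored, with SMOTE applied to the training split. The fitness definition (MCC on a validation split) and the probability-enabled SVM setting are illustrative assumptions; note that newer scikit-learn versions rename `base_estimator` to `estimator`.

```python
from imblearn.over_sampling import SMOTE
from sklearn.ensemble import AdaBoostClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import matthews_corrcoef
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Codes 0/1/2 follow the encoding described above.
BASES = {0: DecisionTreeClassifier(max_depth=1),
         1: LogisticRegression(max_iter=1000),
         2: SVC(probability=True)}  # SVM needs probabilities inside AdaBoost

def fitness(solution, X_train, y_train, X_val, y_val):
    """Decode [n_estimators, base_id, learning_rate] and return MCC as fitness."""
    n_estimators, base_id, lr = int(solution[0]), int(solution[1]), solution[2]
    X_res, y_res = SMOTE(k_neighbors=5).fit_resample(X_train, y_train)
    model = AdaBoostClassifier(base_estimator=BASES[base_id],
                               n_estimators=n_estimators,
                               learning_rate=lr)
    model.fit(X_res, y_res)
    return matthews_corrcoef(y_val, model.predict(X_val))
```

A candidate such as `[120, 0, 1.3]` would then be evaluated by this fitness function inside the SNS (or FA) optimization loop.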
The actual implementation of this method is used exactly as proposed in [28], where the pseudo-code can also be found. Finally, the identical library (Imblearn) was used for the credit card dataset as suggested in [28], with the goal of establishing proper grounds for comparisons between the approaches.

Table 1 Simulation outcomes without AdaBoost employed

Method  ACC (%)  RC (%)  PR (%)  MCC
DT      99.91    75.60   79.85   0.80
RF      99.95    79.34   97.22   0.86
ET      99.93    78.22   96.28   0.88
XGB     99.90    59.43   83.98   0.73
LR      99.92    56.61   85.23   0.61

Table 2 Simulation outcomes with AdaBoost employed

Method        ACC (%)  RC (%)  PR (%)  MCC
DT-AdaBoost   99.69    99.02   98.81   0.98
RF-AdaBoost   99.94    99.80   99.92   0.99
ET-AdaBoost   99.97    99.95   99.91   0.99
XGB-AdaBoost  99.98    99.96   99.94   0.99
LR-AdaBoost   98.76    93.84   97.58   0.95
AdaBoost      99.94    99.93   99.91   0.99
FA-AdaBoost   99.97    99.97   99.95   0.99
SNS-AdaBoost  99.99    99.98   99.96   0.99

Entries in the utilized dataset are labeled 0 or 1 depending on whether the transaction is legitimate or fraudulent; therefore, this task belongs to the binary classification problems. In these circumstances, the main metrics utilized to validate the performance of the observed models are the precision (PR), accuracy (AC), and recall (RC), calculated according to the following mathematical expressions:
AC = (TP + TN) / (TP + TN + FP + FN)    (9)

PR = TP / (TP + FP)                     (10)

RC = TP / (TP + FN)                     (11)
where TN and TP mark the true negatives and true positives, while FN and FP denote the false negatives and false positives, in that order. True values indicate the situations where the model successfully estimated the negative/positive outcome; false values describe the situations where the model made a mistake in predicting a negative/positive outcome.

Since the utilized European cardholders dataset is in extreme disproportion, the performance of the observed models must be evaluated with additional metrics: the confusion matrix (CM), the area under the curve (AUC) [33], and the MCC [20]. This paper measures the quality of the classification by utilizing the MCC metric, with an allowed value range of [−1, 1]. Since objective quality is proportional to the MCC metric, higher values of the MCC indicate better classification performance. A confusion matrix is utilized to highlight the errors made by the observed classifier [30], while the AUC measure represents both the quality and reliability of the models, determining the effectiveness of the observed classifier. The values of the AUC fall in the range [0, 1], where a higher value indicates a more effective classifier [33].

MCC = ((TP × TN) − (FP × FN)) / √((TP + FP)(TP + FN)(TN + FP)(TN + FN))    (12)
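The metrics in Eqs. (9)–(12), plus the AUC, can be computed directly with scikit-learn; a minimal sketch with hypothetical labels and scores follows.

```python
from sklearn.metrics import (accuracy_score, matthews_corrcoef,
                             precision_score, recall_score, roc_auc_score)

# Hypothetical labels and scores for illustration only.
y_true = [0, 0, 1, 1, 0, 1, 0, 0]
y_pred = [0, 0, 1, 0, 0, 1, 1, 0]
y_score = [0.1, 0.2, 0.9, 0.4, 0.3, 0.8, 0.6, 0.2]  # predicted fraud probability

print("AC :", accuracy_score(y_true, y_pred))     # Eq. (9)
print("PR :", precision_score(y_true, y_pred))    # Eq. (10)
print("RC :", recall_score(y_true, y_pred))       # Eq. (11)
print("MCC:", matthews_corrcoef(y_true, y_pred))  # Eq. (12)
print("AUC:", roc_auc_score(y_true, y_score))
```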
The results depicted in Table 3 present the comparison of the suggested SNS technique against other well-known machine learning models. Additional details on the competitor techniques observed in the comparative analysis are available in [28]. The proposed model is significantly superior to the traditional models, with accuracy that is in some cases more than 5% higher than that of the standard RF and KNN approaches.

The models have also been evaluated on the synthetic credit card dataset that can be obtained from [3]. The fundamental characteristics of this dataset are shown in Table 4, and the experimental findings of the models on this dataset are presented in Table 5. As the results clearly show, the SNS-enhanced model achieved the best performance on this dataset.

Both experiments have shown the challenges that can arise from highly disproportional datasets. High accuracy can be misleading, as the model will correctly classify the valid transactions (which are dominant) while falsely classifying the minority class, consequently failing to detect some of the malicious transactions. The proposed SNS model has shown very promising results in this domain; however, further extensive testing with additional real-life datasets is required before putting it into use in practice.
Table 3 Existing standard ML models comparison

Author                   Model         AC (%)
Rajora and others [34]   RF            94.90
Rajora and others [34]   KNN           93.20
Trivedi and others [41]  RF            94.00
Tanouz and others [40]   RF            91.24
Tanouz and others [40]   LR            95.16
Riffi and others [36]    MLP           97.84
Riffi and others [36]    ELM           95.46
Suggested model          RF-AdaBoost   99.94
Suggested model          DT-AdaBoost   99.69
Suggested model          ET-AdaBoost   99.97
Suggested model          XGB-AdaBoost  99.98
Suggested model          AdaBoost      99.94
Suggested model          FA-AdaBoost   99.97
Suggested model          SNS-AdaBoost  99.99
Table 4 Description of the dataset key features

Properties: User, Card, Year, Month, Day, Time, Amount, Use Chip, Merchant Name, Merchant City, Merchant State, Zip, MCC, Errors
Class: Is fraud / Not fraud
Table 5 Synthetic dataset AdaBoost experimental findings

Method        ACC (%)  RC (%)  PR (%)  MCC
DT-AdaBoost   99.68    98.98   98.80   0.98
RF-AdaBoost   99.94    99.85   99.94   0.99
ET-AdaBoost   99.98    99.99   99.92   0.99
XGB-AdaBoost  99.98    99.99   99.92   0.99
LR-AdaBoost   100.0    98.90   78.84   0.17
AdaBoost      99.95    99.91   99.88   0.98
FA-AdaBoost   99.99    99.99   99.93   0.98
SNS-AdaBoost  100.0    100.0   99.95   0.99
5 Conclusion

This research focused on metaphor-based metaheuristic algorithms applied to fraud detection in credit card transactions. The algorithms covered in this paper include decision tree (DT), random forest (RF), logistic regression (LR), extra tree (ET), support vector machine (SVM), and extreme gradient boosting (XGBoost). Additionally, every algorithm was tested with the AdaBoost method in order to improve classification accuracy. This paper proposes an SNS approach to maximize these performances by optimizing the values of the AdaBoost hyperparameters with the SNS algorithm. The results from the conducted experiments clearly show the SNS algorithm to be superior, as the SNS-AdaBoost method obtained the best performance on the observed dataset. This establishes SNS-AdaBoost as a strong candidate for solving credit card fraud detection; however, further extensive testing of the model on more datasets is necessary. Possible future work in this field includes modifying the original implementation of the SNS algorithm to further improve its performance, and applying it in other application domains where it can address other NP-hard practical problems.
References 1. Abdel-Basset M, Abdel-Fatah L, Sangaiah AK (2018) Chapter 10—metaheuristic algorithms: a comprehensive review. In: Sangaiah AK, Sheng M, Zhang Z (eds) Computational intelligence for multimedia big data on the cloud with engineering applications, pp 185–231. Intelligent Data-Centric Systems, Academic Press. https://www.sciencedirect.com/science/article/ pii/B9780128133149000104 2. Alcedo J, Cavallo A, Dwyer B, Mishra P, Spilimbergo A (2022) E-commerce during covid: stylized facts from 47 economies. Working Paper 29729, National Bureau of Economic Research. http://www.nber.org/papers/w29729 3. Altman ER (2019) Synthesizing credit card transactions. arXiv preprint arXiv:1910.03033 4. Bacanin N, Alhazmi K, Zivkovic M, Venkatachalam K, Bezdan T, Nebhen J (2022) Training multi-layer perceptron with enhanced brain storm optimization metaheuristics. Comp Mater Cont 70(2):4199–4215. http://www.techscience.com/cmc/v70n2/44706 5. Bacanin N, Arnaut U, Zivkovic M, Bezdan T, Rashid TA (2022) Energy efficient clustering in wireless sensor networks by opposition-based initialization bat algorithm. In Computer networks and inventive communication technologies. Springer, pp 1–16 6. Bacanin N, Bezdan T, Tuba E, Strumberger I, Tuba M, Zivkovic M (2019) Task scheduling in cloud computing environment by grey wolf optimizer. In 2019 27th Telecommunications forum (TELFOR). IEEE, pp 1–4 7. Bacanin N, Bezdan T, Venkatachalam K, Zivkovic M, Strumberger I, Abouhawwash M, Ahmed A (2021) Artificial neural networks hidden unit and weight connection optimization by quasirefection-based learning artificial bee colony algorithm. IEEE Access 8. Bacanin N, Bezdan T, Zivkovic M, Chhabra A (2022) Weight optimization in artificial neural network training by improved monarch butterfly algorithm. In Mobile computing and sustainable informatics. Springer, pp 397–409
9. Bacanin N, Petrovic A, Zivkovic M, Bezdan T, Antonijevic M (2021) Feature selection in machine learning by hybrid sine cosine metaheuristics. In International conference on advances in computing and data sciences. Springer, pp 604–616 10. Bacanin N, Stoean R, Zivkovic M, Petrovic A, Rashid TA, Bezdan T (2021) Performance of a novel chaotic firefly algorithm with enhanced exploration for tackling global optimization problems: application for dropout regularization. Mathematics 9(21). https://www.mdpi.com/ 2227-7390/9/21/2705 11. Bacanin N, Tuba E, Zivkovic M, Strumberger I, Tuba M (2019) Whale optimization algorithm with exploratory move for wireless sensor networks localization. In International conference on hybrid intelligent systems. Springer, pp 328–338 12. Bacanin N, Zivkovic M, Bezdan T, Cvetnic D, Gajic L (2022) Dimensionality reduction using hybrid brainstorm optimization algorithm. In Proceedings of international conference on data science and applications. Springer, pp 679–692 13. Bacanin N, Zivkovic M, Bezdan T, Venkatachalam K, Abouhawwash M (2022) Modified firefly algorithm for workflow scheduling in cloud-edge environment. Neur Comput Appl 1–26 14. Bacanin N, Zivkovic M, Salb M, Strumberger I, Chhabra A (2022) Convolutional neural networks hyperparameters optimization using sine cosine algorithm. In Sentimental analysis and deep learning. Springer, pp 863–878 15. Bezdan T, Milosevic S, Venkatachalam K, Zivkovic M, Bacanin N, Strumberger I (2021) Optimizing convolutional neural network by hybridized elephant herding optimization algorithm for magnetic resonance image classification of glioma brain tumor grade. In 2021 Zooming innovation in consumer technologies conference (ZINC). IEEE, pp 171–176 16. Bezdan T, Stoean C, Naamany AA, Bacanin N, Rashid TA, Zivkovic M, Venkatachalam K (2021) Hybrid fruit-fly optimization algorithm with k-means for text document clustering. Mathematics 9(16):1929 17. Bezdan T, Zivkovic M, Tuba E, Strumberger I, Bacanin N, Tuba M (2020) Glioma brain tumor grade classification from mri using convolutional neural networks designed by modified FA. In International conference on intelligent and fuzzy systems. Springer, pp 955–963 18. Bezdan T, Zivkovic M, Tuba E, Strumberger I, Bacanin N, Tuba M (2020) Multi-objective task scheduling in cloud computing environment by hybridized bat algorithm. In International conference on intelligent and fuzzy systems. Springer, pp 718–725 19. de Castro LN, Von Zuben FJ (2002) ainet: an artificial immune network for data analysis. In Data mining: a heuristic approach. IGI Global, pp 231–260 20. Chicco D, Jurman G (2020) The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC Genom 21(1):1–13 21. Cuk A, Bezdan T, Bacanin N, Zivkovic M, Venkatachalam K, Rashid TA, Devi VK (2021) Feedforward multi-layer perceptron training by hybridized method between genetic algorithm and artificial bee colony. Data Sci Data Anal Oppor Chall 279 22. De Castro LN, Timmis J (2002) An artificial immune network for multimodal function optimization. In Proceedings of the 2002 congress on evolutionary computation. CEC’02 (Cat. No. 02TH8600), Vol 1. IEEE, pp 699–704 23. De Castro LN, Von Zuben FJ (2000) The clonal selection algorithm with engineering applications. In Proceedings of GECCO, Vol 2000, pp 36–39 24. Dorigo M, Birattari M, Stutzle T (2006) Ant colony optimization. IEEE Comput Intell Magaz 1(4):28–39 25. 
Gajic L, Cvetnic D, Zivkovic M, Bezdan T, Bacanin N, Milosevic S (2021) Multi-layer perceptron training using hybridized bat algorithm. In Computational vision and bio-inspired computing. Springer, pp 689–705 26. Hastie TJ, Rosset S, Zhu J, Zou H (2006) Multi-class adaboost. Statistics and its. Interface 2:349–360 27. Heidari AA, Mirjalili S, Faris H, Aljarah I, Mafarja M, Chen H (2019) Harris hawks optimization: algorithm and applications. Fut Gener Comp Syst 97:849–872 28. Ileberi E, Sun Y, Wang Z (2021) Performance evaluation of machine learning methods for credit card fraud detection using smote and adaboost. IEEE Access 9:165286–165294
29. Kennedy J, Eberhart R (1995) Particle swarm optimization. In Proceedings of ICNN’95international conference on neural networks, Vol 4. IEEE, pp 1942–1948 30. Luque A, Carrasco A, Martín A, de Las Heras A (2019) The impact of class imbalance in classification performance metrics based on the binary confusion matrix. Patt Recogn 91:216– 231 31. Mirjalili S (2015) Moth-flame optimization algorithm: a novel nature-inspired heuristic paradigm. Knowl Based Syst 89:228–249 32. Mirjalili S, Mirjalili SM, Lewis A (2014) Grey wolf optimizer. Adv Eng Softw 69:46–61 33. Norton M, Uryasev S (2019) Maximization of auc and buffered auc in binary classification. Math Program 174(1):575–612 34. Rajora S, Li DL, Jha C, Bharill N, Patel OP, Joshi S, Puthal D, Prasad M (2018) A comparative study of machine learning techniques for credit card fraud detection based on time variance. In 2018 IEEE symposium series on computational intelligence (SSCI), pp 1958–1963 35. Randhawa K, Chu Kiong L, Seera M, Lim C, Nandi A (2018) Credit card fraud detection using adaboost and majority voting. IEEE Access, pp 14277–14284 36. Riffi J, Mahraz MA, El Yahyaouy A, Tairi H, et al (2020) Credit card fraud detection based on multilayer perceptron and extreme learning machine architectures. In 2020 International conference on intelligent systems and computer vision (ISCV). IEEE, pp 1–5 37. Strumberger I, Tuba E, Bacanin N, Zivkovic M, Beko M, Tuba M (2019) Designing convolutional neural network architecture by the firefly algorithm. In 2019 International young engineers forum (YEF-ECE). IEEE, pp 59–65 38. Talatahari S, Bayzidi H, Saraee M (2021) Social network search for global optimization. IEEE Access 9:92815–92863 39. Talatahari S, Bayzidi H, Saraee M (2021) Social network search for global optimization 40. Tanouz D, Subramanian RR, Eswar D, Reddy GP, Kumar AR, Praneeth CV (2021) Credit card fraud detection using machine learning. In 2021 5th International conference on intelligent computing and control systems (ICICCS). IEEE, pp 967–972 41. Trivedi NK, Simaiya S, Lilhore UK, Sharma SK (2020) An efficient credit card fraud detection model based on machine learning methods. Int J Adv Sci Technol 29(5):3414–3424 42. Yang XS (2009) Firefly algorithms for multimodal optimization. In International symposium on stochastic algorithms. Springer, pp 169–178 43. Yang XS, Gandomi AH (2012) Bat algorithm: a novel approach for global engineering optimization. Eng Comput 29(5):464–483 44. Zivkovic M, Bacanin N, Djordjevic A, Antonijevic M, Strumberger I, Rashid TA, et al (2021) Hybrid genetic algorithm and machine learning method for covid-19 cases prediction. In Proceedings of international conference on sustainable expert systems. Springer, pp 169–184 45. Zivkovic M, Bacanin N, Tuba E, Strumberger I, Bezdan T, Tuba M (2020) Wireless sensor networks life time optimization based on the improved firefly algorithm. In 2020 International wireless communications and mobile computing (IWCMC). IEEE, pp 1176–1181 46. Zivkovic M, Bacanin N, Venkatachalam K, Nayyar A, Djordjevic A, Strumberger I, Al-Turjman F (2021) COVID-19 cases prediction by using hybrid machine learning and beetle antennae search approach. Sustain Cities Soc 66:102669 47. Zivkovic M, Bacanin N, Zivkovic T, Strumberger I, Tuba E, Tuba M (2020) Enhanced grey wolf algorithm for energy efficient wireless sensor networks. In 2020 Zooming innovation in consumer technologies conference (ZINC). IEEE, pp 87–92 48. 
Zivkovic M, Bezdan T, Strumberger I, Bacanin N, Venkatachalam K (2021) Improved Harris Hawks optimization algorithm for workflow scheduling challenge in cloud—edge environment. In Computer networks, big data and IoT. Springer, pp 87–102 49. Zivkovic M, Stoean C, Chhabra A, Budimirovic N, Petrovic A, Bacanin N (2022) Novel improved salp swarm algorithm: an application for feature selection. Sensors 22(5):1711 50. Zivkovic M, Zivkovic T, Venkatachalam K, Bacanin N (2021) Enhanced dragonfly algorithm adapted for wireless sensor network lifetime optimization. In Data intelligence and cognitive informatics. Springer, pp 803–817
Chapter 11
Prediction of Pneumonia Using Deep Convolutional Neural Network (CNN) Jashasmita Pal and Subhalaxmi Das
1 Introduction

Pneumonia is one example of a communicable disease; the infection is caused by bacteria, viruses, or other microorganisms. It is a respiratory infection that can cause lasting lung damage. When a healthy individual inhales, air fills the alveoli, the little sacs in the lungs. Patients with pneumonia experience difficulty breathing and limited oxygen intake due to clogged alveoli in the lungs. The disease is extremely harmful for children under the age of five, as well as for those in their declining years [1]. Fortunately, antibiotics and antiviral medicines are often effective in treating pneumonia. Early identification and treatment of pneumonia, on the other hand, are critical in preventing death. Pneumonia can be diagnosed using a variety of techniques, including chest X-rays, CT scans, chest ultrasounds, and chest MRIs [2, 3]. Chest X-rays are the most popular and well-known clinical approach for identifying pneumonia nowadays. In some circumstances, pneumonia appears indistinct on chest X-ray pictures and is mistaken for another disease. These discrepancies result in many subjective judgments and variations among radiologists when it comes to diagnosing pneumonia. As a result, a computerized system is required to assist radiologists in detecting pneumonia from X-rays. Convolutional neural networks (CNNs) have recently achieved tremendous results in picture categorization and segmentation using deep learning approaches (see Fig. 1).

J. Pal (B) · S. Das
Odisha University of Technology and Research, Bhubaneswar, Odisha, India
e-mail: [email protected]
S. Das
e-mail: [email protected]
Fig. 1 a Normal chest X-ray. b Pneumonia chest X-ray
The chest cavity appears dark in the left image of Fig. 1 because the lung chambers are filled with air; as fluid fills the air sacs [4], the radiological image of the chest cavity brightens, indicating pneumonia, as seen in Fig. 1 on the right. Deep learning is currently making significant inroads in the medical industry and can aid a physician in making the best decision possible regarding early detection. In this study, we present an approach for determining whether or not a person has pneumonia. Deep learning methods and convolutional neural networks (CNNs) were employed to extract and classify the features. Finally, the performance of the models was tested using several performance measures. The remainder of this chapter is organized as follows: Sect. 2 describes the literature review. Section 3 describes the methodology, which contains information on deep learning techniques. Section 4 presents the suggested work. Section 5 compares the results of the classification algorithms, and Sect. 6 presents the conclusion.
2 Literature Survey

Recently, many researchers have made efforts to predict pneumonia. Rahman et al. [1] described a deep learning system to diagnose pneumonia, as well as bacterial and viral illness, from X-ray images. The performance of four different pre-trained networks (AlexNet, ResNet18, DenseNet201, and SqueezeNet) is investigated in the proposed study, which also provides methodological information for future studies. DenseNet201 surpasses the other three networks: after training the various deep CNNs, DenseNet201 can accurately categorize pneumonia using only a small number of complicated image datasets, yielding less bias and better generalization. Sibbaluca et al. [2] employed deep learning to identify pneumonia through computer vision, using five convolutional neural network models. The picture datasets were taken from the database of the Radiological Society of North America. The only models that validated the researchers' observations with
an accuracy rate of 95–97% were AlexNet, LeNet, GoogleNet, and VGGNet. In all of the models, pneumonia was successfully recognized 74% of the time, while normal chest X-rays were correctly detected 76% of the time. Using a densely connected convolutional neural network (DenseNet-169), Varshni et al. [3] proposed a pneumonia detection method. After analyzing many pre-trained CNN models and other classifiers, the authors chose DenseNet-169 to extract features and SVM to classify them, based on the statistical results. According to Ayan et al. [4], their study used both the Xception and VGG16 convolutional neural network models; transfer learning and fine-tuning were applied during the training phase. In terms of accuracy, the VGG16 network outperformed the Xception network, with accuracies of 0.87 and 0.82, respectively. Among the two well-known models used by Ayush Pant et al. [5], the "EfficientNet-B4-based U-Net" model outperformed the "ResNet-based U-Net" model: the EfficientNet-B4 U-Net provides high precision, whereas the ResNet-34 U-Net provides high recall, and amazing results can be obtained by combining these two models. To improve pneumonia detection accuracy, Tilve et al. [6] explored image preprocessing techniques together with models such as CNN, ResNet, CheXNet, DenseNet, ANN, and KNN, which are critical to converting raw X-ray images into standard formats for analysis and detection. Rudraraju et al. [7] demonstrated a system that uses a convolutional neural network trained from scratch on a variety of chest X-ray imaging modalities to detect the presence of pneumonia; it constructs a convolutional neural network without any prior preparation to extract the features of a chest X-ray image and classify it, returning a decision on whether someone is infected with pneumonia or not. Mubarok et al. [8] examine how well residual networks and Mask-RCNNs, two well-known deep convolutional architectures, identify and diagnose pneumonia; the outcomes are contrasted and scrutinized, and in terms of detecting pneumonia, the residual network outperforms the Mask-RCNN. The chest X-rays used by Chakraborty et al. [9] enabled them to uncover some cutting-edge pneumonia identification results. They employed a convolutional neural net for detection, which allowed them to analyze the spatial relevance of the information contained within the images in order to correctly identify whether the chest X-rays were indicative of pneumonia. Li et al. [10] investigated illness features in CXR images and described an attention-guided CNN-based pneumonia diagnosis technique. Using SE-ResNet as a backbone, they build a fully convolutional neural network model for end-to-end object detection. According to the findings, the proposed method outperforms the state-of-the-art object detection model in terms of accuracy and false-positive rate. Li et al. [11] developed an improved convolutional neural network approach for identifying pneumonia. The model in this work was created by adding a convolutional layer, a pooling layer, and a feature integration layer to the initial conventional
LeNet-5 model, and then extensively abstracting the acquired features. Finally, remarkable results were obtained on both the training and test sets of two public datasets, confirming the robustness of the proposed model. To classify pneumonia from X-ray pictures, Islam et al. [12] employed pre-trained deep neural networks as feature extractors in conjunction with classical classification approaches. They also used chest X-ray images to retrain the pre-trained networks, and they chose the two networks with the highest accuracy and sensitivity to use as feature extractors. Radiologists could use the model presented by Yue et al. [13] to make decisions regarding pneumonia diagnosis from digital chest X-rays. The purpose of that work is to optimize the weighted predictions of AI models such as ResNet18, Xception, InceptionV3, DenseNet121, and MobileNetV3 by using a weighted classifier. Mohammad Farukh Hashmi et al. developed a method for detecting pneumonia using digital chest X-ray images, which they believe would aid radiologists in making better decisions. It also included a weighted classifier that combines the weighted predictions from cutting-edge deep learning models such as ResNet18, Xception, InceptionV3, DenseNet121, and MobileNetV3 to produce the best possible outcome. Shah et al. [14] developed a model that properly recognizes chest X-rays to diagnose pneumonia. The model's loss is minimized during training, and as a result, accuracy improves with each epoch, yielding distinct outcomes for distinguishing between pneumonia-affected and non-affected people. As a result of the data augmentation and preprocessing steps, the convolutional neural networks and deep neural networks are not overfitted, guaranteeing that the outputs are consistent. From all these papers, we observed that in most cases CNN gives better results and better accuracy. Different authors used this technique to implement different algorithms. So, we use the CNN technique in this chapter for the prediction of pneumonia disease.
3 Methodology

3.1 Deep Learning

Deep learning, in general, refers to a machine learning approach that takes an input X and predicts an output Y. Given a big dataset of input and output pairs, a deep learning algorithm will attempt to narrow the gap between its prediction and the expected output. Deep learning algorithms search for links between inputs and outputs using a neural network. In a neural network, "nodes" make up the input, hidden, and output layers. Input layers encode knowledge numerically (for example, photos with pixel specifications), hidden layers perform the majority of the calculation, and output layers provide predictions; neural networks are used to execute deep learning [3, 5].
Fig. 2 Diagram of neurons
The biological neuron is the source of inspiration for neural networks. A neuron is nothing but a brain cell. In Fig. 2, the diagram of a neuron, we have dendrites that are used to provide input to the neuron; as there are multiple dendrites, many inputs are provided to the neuron. Inside the cell body, we have a nucleus that performs some function [6]. The output then travels through the axon toward the axon terminals, and the neuron fires this output toward the subsequent neuron. Note that two neurons are never directly connected to each other; the gap between two neurons is called a synapse. These are the fundamentals of neurons. Depending on the categories of data, different neural networks can be used: feed-forward neural networks, convolutional neural networks (CNNs), recurrent neural networks (RNNs), modular neural networks, artificial neural networks (ANNs), and multilayer perceptrons. We will go over the convolutional neural network (CNN) in this chapter.
3.2 Convolutional Neural Networks (CNNs)

A CNN finds relevant traits without the need for human intervention. Input images, an in-depth feature extractor, and a classifier are the three main components. Images that have not been processed (or have been pre-processed) are used as input. The feature extractor automatically learns the important features [1]. The learned features are fed into a classifier, such as softmax, which sorts them into categories based on the learned qualities. CNNs are particularly popular due to their superior picture classification capabilities. CNN is a feed-forward neural network technique of artificial intelligence, widely used in the field of image recognition.
Fig. 3 CNN architecture
An array of multidimensional data is used by CNN to represent the input data. It works well with a large amount of labeled data [2] (see Fig. 3). Figure 3 depicts CNN's architecture, which is made up of three main sorts of layers: (1) the convolution layer is the convolutional network's first layer and is used to find features; (2) the max-pooling (subsampling) layer reduces dimensionality and down-samples the image, lowering computational costs (the most common pooling approach is max-pooling, which takes the most significant element from each region of the feature map); and (3) a fully connected layer that gives the network classification capabilities [3].
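To make the three layer types concrete, the following is a minimal sketch of such a network, assuming TensorFlow/Keras is available; the filter counts, input size, and dense width are illustrative choices, not the exact configuration used in this chapter.

```python
# Minimal CNN sketch: convolution layers for feature detection, max-pooling
# for down-sampling, and a fully connected head for classification.
from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Conv2D(32, (3, 3), activation='relu',
                  input_shape=(150, 150, 3)),   # convolution: find features
    layers.MaxPooling2D((2, 2)),                # max-pooling: down-sample
    layers.Conv2D(64, (3, 3), activation='relu'),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dense(64, activation='relu'),        # fully connected layer
    layers.Dense(1, activation='sigmoid'),      # normal vs. pneumonia
])
model.compile(optimizer='adam', loss='binary_crossentropy',
              metrics=['accuracy'])
```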
3.3 Pre-trained Convolutional Neural Networks

CNN, ResNet50, VGG19, and InceptionV3 are the four well-known deep learning CNNs employed for pneumonia diagnosis in this paper. The following is a quick description of the pre-trained networks.
3.4 ResNet

ResNet stands for Residual Network, a specific kind of neural network that was introduced by Kaiming He et al. in 2015. Residual neural networks use skip connections, or shortcuts, to jump over some layers. There are two main reasons to add skip connections: to alleviate vanishing gradients and to mitigate the degradation problem [4, 7]. ResNet comes in several flavors: ResNet18, ResNet50, and ResNet101. ResNet has been effectively used for transfer learning in biomedical picture classification. We utilized ResNet50 to detect pneumonia in this article.
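As an illustration of the skip-connection idea, here is a hypothetical single residual block written with the Keras functional API; it is a sketch, not the actual ResNet50 block used in the experiments, and it assumes the input already has `filters` channels so the addition is shape-compatible.

```python
# One residual block: the shortcut skips over two convolution layers.
from tensorflow.keras import layers

def residual_block(x, filters=64):
    shortcut = x                                # skip connection ("shortcut")
    y = layers.Conv2D(filters, (3, 3), padding='same', activation='relu')(x)
    y = layers.Conv2D(filters, (3, 3), padding='same')(y)
    y = layers.Add()([shortcut, y])             # jump over the two layers
    return layers.Activation('relu')(y)         # helps gradients flow back
```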
3.5 InceptionV3

Inception-v3 is a deep convolutional neural network with 48 layers and pre-trained weights. It is a variant of the network that has been trained on millions of pictures from the ImageNet collection. InceptionV3 requires a 299 × 299-pixel input picture. Convolutions, average pooling, maximum pooling, concatenations, dropouts, and fully connected layers are among the symmetric and asymmetric building blocks of the model [6]. Batch normalization is applied to the activation inputs throughout the model. Softmax is used to compute the loss.
3.6 VGG19

VGG is an acronym for Visual Geometry Group. It is a multi-layered convolutional neural network and the foundation for cutting-edge deep neural network-based object identification models [7, 9, 10]. VGG comes in a variety of forms, including VGG11, VGG16, VGG19, and others. In this study, we look at the VGG19 pre-trained model. It is based on the VGG model and has 19 weight layers in total (16 convolution layers and 3 fully connected layers), along with 5 max-pooling layers and 1 softmax layer. The "19" represents the model's weight layers.
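A common way to employ these pre-trained networks, and a reasonable reading of the transfer-learning setup described here, is to load ImageNet weights from keras.applications and attach a small classification head; the head below (pooling plus a 128-unit dense layer) is an assumed, illustrative choice.

```python
# Transfer-learning sketch: frozen pre-trained backbone + new binary head.
from tensorflow.keras.applications import VGG19   # ResNet50, InceptionV3 analogous
from tensorflow.keras import layers, models

base = VGG19(weights='imagenet', include_top=False,
             input_shape=(150, 150, 3))
base.trainable = False                       # keep ImageNet features frozen

model = models.Sequential([
    base,
    layers.GlobalAveragePooling2D(),
    layers.Dense(128, activation='relu'),
    layers.Dense(1, activation='sigmoid'),   # normal vs. pneumonia
])
```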
4 Proposed Work

In this part, we discuss the proposed work of this topic. We design the ensemble framework of CNN, ResNet50, InceptionV3, and VGG19, which are explained in detail in the previous section. In this section, we discuss the proposed method, the preprocessing and augmentation techniques, and the performance measures (see Fig. 4).
Fig. 4 Overview of model framework (input phase: dataset → data preprocessing and data augmentation → deep CNN pre-trained models: CNN, ResNet50, InceptionV3, VGG19 → performance classification and evaluation → final prediction: Normal / Pneumonia)
Figure 4 depicts an overview of the model structure. The input part of the setup is connected to the second part, which is the preprocessing stage. In these stages, we preprocess the data and resize the pixels; that is, we convert the data into a clean dataset in an understandable format. Then we use different data augmentation techniques to expand the size of the training dataset by creating modified images from it. Then the various deep convolutional transfer learning models are applied: in this work, four different types of deep learning algorithms (CNN, ResNet50, InceptionV3, and VGG19) are used to classify the output. Hence, the output is identified as either normal or pneumonia. In this study, four evaluation metrics (accuracy, recall, precision, and F1-score) were applied to the base CNN models.
4.1 Preprocessing and Augmentation

One of the important steps in this methodology is data preprocessing. The image input is resized for the various algorithms [11], and all images are normalized in line with the pre-trained model. As we know, CNNs work best with a large dataset. As a result, data augmentation techniques are frequently used to construct alternative versions of an actual dataset in order to increase its size or to build a whole new training dataset [1]. Deep learning algorithms can benefit from data augmentation to improve their accuracy. Gray scaling, horizontal and vertical flips, random crops, color jitter, translations, rotations, resizing, scaling, and a variety of other augmentation techniques are all available. Using these tactics on our data, we may easily double or quadruple the amount of training data, resulting in a very strong model. We used different augmentation techniques for each algorithm in this paper to get new datasets: for CNN, we used (rescale = 1./255, zoom range = 0.3, vertical flip = true, width shift range = 0.1, height shift range = 0.1); for ResNet50, we used (horizontal_flip = True, width_shift_range = 0.2, height_shift_range = 0.2, shear_range = 0.2,
Table 1 Preprocessing and augmentation techniques for the different algorithms

Algorithm | Augmentation technique
CNN | Vertical flip = true, rescale = 1/255, zoom range = 0.3, width shift range = 0.1, height shift range = 0.1
ResNet50 | horizontal_flip = True, width_shift_range = 0.2, height_shift_range = 0.2, shear_range = 0.2, zoom_range = 0.2
InceptionV3 | Resized (150, 150, 3)
VGG19 | horizontal flip = True, validation split = 0.1, rotation range = 40, rescale = 1./255, shear range = 0.2, zoom range = 0.2
zoom_range = 0.2); for InceptionV3, we resize to (150, 150, 3); and for VGG19, we used (rotation_range = 40, rescale = 1./255, shear_range = 0.2, zoom_range = 0.2, horizontal_flip = True, validation_split = 0.1); see Table 1.
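As a sketch, the CNN settings from Table 1 map directly onto Keras' ImageDataGenerator; the generators for the other networks would be built the same way with their respective parameters.

```python
# Augmentation sketch for the base CNN, using the Table 1 parameters.
from tensorflow.keras.preprocessing.image import ImageDataGenerator

train_datagen = ImageDataGenerator(
    rescale=1. / 255,          # normalize pixel values to [0, 1]
    zoom_range=0.3,            # random zoom
    vertical_flip=True,        # random vertical flips
    width_shift_range=0.1,     # random horizontal translation
    height_shift_range=0.1,    # random vertical translation
)
```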
4.2 Performance Metrics

K-fold cross-validation is used to train and test the four CNNs. The performances of the various networks on the testing datasets are evaluated and compared using performance metrics such as accuracy, sensitivity, recall, precision, AUC, and F1-score. A True Positive (TP) is the model's correct identification of the positive class [12]. False Positives (FPs) are negative-class samples that are incorrectly classified as positive [13]. Negatives correctly identified by the model are categorized as True Negatives (TN). False Negatives (FN) are positive-class samples that have been wrongly classified as negative. The performance metrics for the deep CNNs are shown below:
\text{Accuracy} = \frac{TP + TN}{TP + FN + FP + TN} \quad (1)

\text{Recall} = \frac{TP}{TP + FN} \quad (2)

\text{Sensitivity} = \frac{TP}{TP + FN} \quad (3)

\text{Precision} = \frac{TP}{TP + FP} \quad (4)

\text{F1-score} = \frac{2 \cdot TP}{2 \cdot TP + FN + FP} \quad (5)

Note that recall and sensitivity denote the same quantity, so Eqs. (2) and (3) coincide.
In the preceding equations, true positive (TP) refers to the number of pneumonia images that were classified as pneumonia, and true negative (TN) to the number of normal images that were classified as normal [7]. The term "accuracy" relates to a model's overall correctness, i.e., how many correct predictions it makes. The "precision" and "recall" values show how well the model performs on the positive label: precision is the ratio of the model's correct positive predictions to all of its positive predictions, while recall measures the percentage of ground-truth positives that the model correctly predicted. The "F1-score" strikes the right blend of precision and recall. Thus, instead of focusing just on the accuracy rate, these assessment metrics are used in medical picture classification to obtain an explicit identification of non-diseased but nonetheless suffering patients [8].
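As a worked example of Eqs. (1)-(5), the metrics can be computed directly from the four confusion-matrix counts; the counts below are made up purely for illustration.

```python
# Metrics from confusion-matrix counts (illustrative numbers).
TP, TN, FP, FN = 370, 210, 24, 20

accuracy  = (TP + TN) / (TP + TN + FP + FN)   # Eq. (1)
recall    = TP / (TP + FN)                    # Eqs. (2)-(3), = sensitivity
precision = TP / (TP + FP)                    # Eq. (4)
f1_score  = (2 * TP) / (2 * TP + FN + FP)     # Eq. (5)

print(accuracy, recall, precision, f1_score)
```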
5 Results and Discussion

The evaluation findings of the suggested model are discussed in this section, along with the dataset used by the model. Table 2 shows the results of the different deep learning algorithms with the different calculated measures, i.e., the training and testing performance of the different CNNs. It can be seen that InceptionV3 produces the highest accuracy: for normal and pneumonia classification, the accuracy is 96%. In this study, we also calculate the recall, precision, and F1-score; Table 2 presents these performance metrics for the CNNs. This suggested work is also compared with the results of recently published work. Tawsifur Rahman et al. [1] described a deep learning system to diagnose pneumonia, as well as bacterial and viral illness, from X-ray images; the performance of four different pre-trained networks (AlexNet, ResNet18, DenseNet, and SqueezeNet) is investigated in their study, with accuracies of 0.94, 0.96, 0.98, and 0.96, respectively. Sibbaluca et al. [4] classify a person as normal or pneumonia and reported an accuracy of 0.84 and sensitivity, precision, and AUC of 0.89, 0.91, and 0.87, respectively.

Table 2 Different performance metrics for the different deep CNNs
Algorithms | Accuracy | Recall | Precision | F1-score
CNN | 0.90 | 0.94 | 0.90 | 0.92
ResNet50 | 0.94 | 0.91 | 0.88 | 0.90
InceptionV3 | 0.96 | 0.94 | 0.88 | 0.91
VGG19 | 0.74 | 0.80 | 0.77 | 0.82
Table 3 Overview of dataset

Types | Training set | Test set | Val
Normal | 1341 | 234 | 8
Pneumonia | 3875 | 390 | 8
Total | 5216 | 624 | 16
5.1 Dataset

The dataset comprises chest X-ray images (pneumonia): https://www.kaggle.com/datasets/paultimothymooney/chest-xray-pneumonia. The 1 GB Kaggle chest X-ray pneumonia database, which contains 5856 chest X-ray images, was used in this study. The images are in jpg or png format to fit the model. There are three sections in the data folder: train, test, and validate. Each directory has subdirectories for every image category (Normal, Pneumonia). 1341 chest X-rays were judged to be normal, whereas 3875 were found to show pneumonia. Dataset specifics are covered in Table 3.
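Given the train/test/validate directory layout described above, a sketch of reading the images is shown below; the `chest_xray/...` paths are assumptions about where the Kaggle archive was unpacked.

```python
# Reading the dataset folders; labels are inferred from the Normal and
# Pneumonia subdirectories (class_mode='binary').
from tensorflow.keras.preprocessing.image import ImageDataGenerator

datagen = ImageDataGenerator(rescale=1. / 255)
train_gen = datagen.flow_from_directory(
    'chest_xray/train', target_size=(150, 150),
    batch_size=32, class_mode='binary')
test_gen = datagen.flow_from_directory(
    'chest_xray/test', target_size=(150, 150),
    batch_size=32, class_mode='binary')
```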
5.2 Simulation Results

This section discusses the training graphs for the different algorithms, namely CNN, ResNet50, VGG19, and InceptionV3. Figure 5 presents the graphs of model accuracy and model loss for the CNN; the graphs show the number of epochs, loss, accuracy, val_loss, and val_accuracy. Here the number of epochs is 20, the loss is 0.21, the accuracy is 0.90, and the val_loss and val_accuracy are 0.43 and 0.68. For ResNet50 (see Fig. 6), the number of epochs is 6; we get 94% accuracy after 6 epochs, with a val_accuracy of 0.93 and a loss of 0.23. Here the training accuracy seems to be steadily increasing, and its continued increase might be a sign of overfitting. Figure 7 shows the graphs for InceptionV3: the number of epochs is 6, the loss is 0.09, the accuracy is 0.96, the val_loss is 0.70, and the val_accuracy is 0.78. The accuracy of InceptionV3 is higher than that of the other algorithms. Figure 8 shows the graphs for the VGG19 model: the number of epochs is 20, the loss is 0.58, the accuracy is 0.74, the val_loss is 0.58, and the val_accuracy is 0.74. In the model accuracy graphs, the X-axis represents the number of epochs and the Y-axis reflects the algorithm's accuracy; the model loss graphs display the epoch and the loss. The Python programming language is used to train, test, and evaluate the various algorithms in this research. In order to train the various models, the machine is equipped with a 64-bit Windows 8 operating system as well as an Intel i3 processor running at 2.40 GHz with 4 GB of RAM.
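The accuracy/loss curves of Figs. 5-8 can be reproduced along the following lines; this sketch assumes the `model` and generators from the earlier snippets and is not the authors' exact plotting code.

```python
# Train for 20 epochs and plot the model-accuracy curve (Figs. 5-8 style).
import matplotlib.pyplot as plt

history = model.fit(train_gen, epochs=20, validation_data=test_gen)

plt.plot(history.history['accuracy'], label='Accuracy')
plt.plot(history.history['val_accuracy'], label='Val Accuracy')
plt.xlabel('Epoch')          # X-axis: number of epochs
plt.ylabel('Accuracy')       # Y-axis: algorithm's accuracy
plt.legend()
plt.show()
```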
Fig. 5 Model_CNN

Fig. 6 Model_ResNet50

Fig. 7 Model_InceptionV3

Fig. 8 Model_VGG19
6 Conclusion and Future Work

This work demonstrates how deep CNN-based transfer learning may be used to identify pneumonia automatically. Four different pre-trained CNN algorithms are trained and assessed on chest X-rays to distinguish between normal and pneumonia patients [15]. In order to determine the most appropriate course of treatment and ensure that pneumonia does not pose a life-threatening risk to the patient, early identification of pneumonia is critical [14]. The most common method for diagnosing pneumonia is to take a chest radiograph. It has been found that InceptionV3 has a better level of accuracy than the others: for normal and pneumonia, the classification accuracy, recall, precision, and F1-score are 96, 94, 88, and 91%, respectively. Every year, 1,000,000 children die as a result of this critical disease. Many lives can be saved by a speedy recovery combined with efficient therapy based on an accurate identification of the ailment. A timely diagnosis of pneumonia is essential to determining the best course of treatment and preventing life-threatening complications in patients. Deep convolutional neural networks have a unique style that produces high accuracy and superior results. This research can also be used to help in the diagnosis of other health problems. We intend to build a larger database as part of our future work. As a result, other deep learning methodologies can be used to train and evaluate the system in order to more precisely predict outcomes.
References 1. Rahman T, Muhammad EHC, Khandakar A, Islam KR, Islam KF, Mahbub ZB, Kadir MA, Kashem S (2020) Transfer learning with deep convolutional neural network (CNN) for pneumonia detection using chest X-ray. Appl Sci 10(9) 2. Varshni D, Thakral K, Agarwal L, Nijhawan R, Mittal A (2019) Pneumonia detection using CNN based feature extraction. In 2019 IEEE international conference on electrical, computer and communication technologies (ICECCT), pp 1–7. IEEE 3. Militante SV, Dionisio NV, Sibbaluca BG (2020) Pneumonia detection through adaptive deep learning models of convolutional neural networks. In: 2020 11th IEEE control and system graduate research colloquium (ICSGRC), pp 88–93. IEEE 4. Sibbaluca BG (2020) Pneumonia detection using convolutional neural network. Int J Sci Technol Res 5. Al Mubarok AF, Dominique Jeffrey AM (2019) Pneumonia detection and classification from chest x-ray image using deep learning approach. IEEE 6. Pant A, Jain A, Nayak KC, Gandhi D, Prasad BG (2020) Pneumonia detection: an efficient approach using deep learning. In: 2020 11th international conference on computing, communication and networking technologies (ICCCNT). IEEE, pp 1–6 7. Ayan E, Ünver HM (2019) Diagnosis of pneumonia from chest X-ray images using deep learning. In: 2019 scientific meeting on electrical-electronics & biomedical engineering and computer science (EBBT). IEEE, pp 1–5 8. Sirish Kaushik V (2020) Pneumonia detection using convolutional neural networks (cnns). In: Proceedings of first international conference on computing, communications, and cybersecurity (IC4S 2019). Springer
9. Al Mubarok AF, Faqih A, Dominique JAM, Thias AH (2019) Pneumonia detection with deep convolutional architecture. In: 2019 international conference of artificial intelligence and information technology (ICAIIT). IEEE 10. Li X, Chen F, Hao H, Li M (2020) A pneumonia detection method based on improved convolutional neural network. In: 2020 IEEE 4th information technology, networking, electronic and automation control conference (ITNEC), vol 1. IEEE, pp 488–493 11. Mehta SSH (2020) Pneumonia detection using convolutional neural networks. In: Third international conference on smart systems and inventive technology (ICSSIT 2020), IEEE Xplore Part Number: CFP20P17-ART, IEEE 12. Krishnan S (2018) Prevention of pneumonia using deep learning approach 13. Imran A, Sinha V (2019) Training a CNN to detect pneumonia 14. Chakraborty S, Aich S, Sim JS, Kim H-C (2019) Detection of pneumonia from chest X-rays using a convolutional neural network architecture. In: International conference on future information & communication engineering, vol 11, no 1, pp 98–102 15. Bozickovic J, Lazic I, Turukalo TL (2020) Pneumonia detection and classification from X-ray images: a deep learning approach
Chapter 12
State Diagnostics of Egg Development Based on the Neuro-fuzzy Expert System Eugene Fedorov , Tetyana Utkina , Tetiana Neskorodieva , and Anastasiia Neskorodieva
1 Introduction

Artificial incubation of poultry eggs, both industrially and in small households, has significant advantages over the classical method using a mother hen. But sometimes, the egg incubation results can be unsatisfactory due to the detection of a significant percentage of losses. Losses, as a rule, include unfertilized eggs identified during candling, eggs with blood rings, freezing of the embryo in development, physiological abnormalities, etc. If many of these are detected, it is necessary to diagnose errors in the incubation modes [1]. The most common incubation errors are related to storage, high humidity, too high or low hatch temperature, incorrect turning, or insufficient ventilation. Therefore, first, it is necessary to revise the microclimate parameters: temperature, humidity, the composition of ventilated air, etc. It should also not be forgotten that the main condition for breeding healthy poultry chicks during the incubation process is the quality of the eggs themselves. It is advisable to choose poultry eggs according to their average weight and size. As a rule, the weight of an egg should fall in the range: chicken 55–60 g, duck 80–92, turkeys 82–85, geese 160–180, quails 9–11. However, deviations of a few grammes are allowed. For incubation, eggs are chosen without growths and thickenings on the shell, of the correct oval shape, and without cracks [1].
E. Fedorov (B) · T. Utkina
Cherkasy State Technological University, Shevchenko Blvd., Cherkasy 460, 18006, Ukraine
e-mail: [email protected]
T. Utkina
e-mail: [email protected]
T. Neskorodieva · A. Neskorodieva
Vasyl' Stus Donetsk National University, 600-Richcha Str., 21, Vinnytsia 21021, Ukraine
e-mail: [email protected]; [email protected]
On initial candling, there should be no noticeable bloody rings or spots, yolk displacement, air-cell problems, or a uniform glow. Candling of eggs should be done carefully, without unnecessary shaking, shocks, or prolonged cooling, and, as a rule, no more than 3 times [2]. In addition, no one is immune from the case when, due to external factors (power outages, a programme error, or incorrect operator actions), the hatchery stops working in its normal mode. Diagnosis and elimination of errors in the egg incubation mode are of paramount importance for breeding healthy poultry, as this allows creating and controlling optimal external conditions in the incubator corresponding to normal embryo development in the egg. The insufficient number of works related to the creation of systems for intelligent diagnostics of the state of development of poultry eggs is a gap in the existing literature and a motivation to correct this shortcoming. Therefore, the development of a neuro-fuzzy expert system for intelligent diagnostics of the state of development of poultry eggs in the process of incubation is an urgent task. This will make it possible to easily and timely diagnose and eliminate errors in the mode of incubation of poultry eggs, based on biological signs indicating violations of the development of the embryo due to the adverse influence of external conditions, which will ensure a greater percentage of the yield of high-quality and healthy offspring of poultry.

Methods of artificial intelligence are now used for intelligent diagnostics of the egg development state; the most popular are:
1. Machine learning:
(1) Metric approach (for example, k-nearest neighbours) [1].
(2) Probabilistic approach: logistic and multinomial regression [3]; linear discriminant analysis [4, 5]; naive Bayesian classifier [6].
(3) Logical approach (for example, decision trees) [1].
(4) Connectionist approach: support vector machine [7, 8]; multilayer perceptron [9, 10]; convolutional neural networks [11, 12].
(5) Taxonomical approach (for example, k-means [13]).
(6) Metaheuristic approach (for example, clonal selection [14, 15], genetic algorithm [16]).
2. Expert systems: expert system [17]; fuzzy expert system [18].

Among machine learning methods, artificial neural networks are currently the most popular for intelligent diagnostics of the egg development state [19, 20]. Among expert systems, fuzzy expert systems are the most popular [21]. Recently, neural networks have been combined with fuzzy expert systems, and metaheuristics can be used for training the parameters of the membership functions [22, 23]. Thus, it is relevant to develop an intelligent system for diagnosing the state of egg development which will eliminate these shortcomings; this is a new contribution to the study of this problem.
The purpose of this work is to increase the efficiency of diagnostics of egg development states by means of a neuro-fuzzy expert system trained on the basis of metaheuristics. To achieve this goal, it is necessary to solve the following problems:
1. Creation of a neuro-fuzzy expert system for egg development state diagnostics.
2. Creation of mathematical models of the neuro-fuzzy expert system of egg development state diagnostics.
3. Choice of criteria for evaluating the efficiency of the mathematical models of the neuro-fuzzy expert system of egg development state diagnostics.
4. Parameter identification of the mathematical model of the neuro-fuzzy expert system of egg development state diagnostics based on the back propagation algorithm in batch mode.
5. Parameter identification of the mathematical model of the neuro-fuzzy expert system of egg development state diagnostics based on the grey wolf optimizer.
2 The Neuro-fuzzy Expert System of Egg Development States Diagnostics Creation

For diagnostics of egg development states, the fuzzy expert system, which represents knowledge about egg development states in the form of fuzzy rules understandable to a person, was further improved in this work; it performs the following stages [24, 25]: formation of linguistic variables; formation of the base of fuzzy rules; fuzzification; aggregation of subconditions; activation of conclusions; aggregation of conclusions; defuzzification. Beforehand, features are extracted from the image by means of grey-scale transformation, segmentation, threshold processing (detection of cracks and sites of egg without shell), and morphological operations (obtaining the whole egg surface by filling shell defects) [18].
2.1 Linguistic Variables Formation

The following crisp input variables were chosen:
• availability of blood on a shell x1;
• existence of a site of egg without shell x2;
• share of superficial cracks x3;
• egg size x4.

The following linguistic input variables were chosen:
• blood on a shell x̃1 with the values a11 = present, a12 = absent, whose value domains are the fuzzy sets Ã11 = {x1 | μ_Ã11(x1)}, Ã12 = {x1 | μ_Ã12(x1)};
• site of egg without shell x̃2 with the values a21 = present, a22 = absent, whose value domains are the fuzzy sets Ã21 = {x2 | μ_Ã21(x2)}, Ã22 = {x2 | μ_Ã22(x2)};
• share of superficial cracks x̃3 with the values a31 = large, a32 = medium, a33 = small, a34 = zero, whose value domains are the fuzzy sets Ã31 = {x3 | μ_Ã31(x3)}, Ã32 = {x3 | μ_Ã32(x3)}, Ã33 = {x3 | μ_Ã33(x3)}, Ã34 = {x3 | μ_Ã34(x3)};
• egg size x̃4 with the values a41 = small, a42 = medium, a43 = large, whose value domains are the fuzzy sets Ã41 = {x4 | μ_Ã41(x4)}, Ã42 = {x4 | μ_Ã42(x4)}, Ã43 = {x4 | μ_Ã43(x4)}.

As the crisp output variable, the number of the egg state y was chosen. As the linguistic output variable, the egg state ỹ was chosen with the values β1 = invalid, β2 = poor, β3 = average, β4 = good, β5 = excellent, whose value domains are the fuzzy sets B̃1 = {y | μ_B̃1(y)}, B̃2 = {y | μ_B̃2(y)}, B̃3 = {y | μ_B̃3(y)}, B̃4 = {y | μ_B̃4(y)}, B̃5 = {y | μ_B̃5(y)}.
2.2 Formation of Base of Fuzzy Rules

The offered fuzzy rules consider all possible combinations of the values of the input linguistic variables and the corresponding values of the output linguistic variable:

R^1: If x̃1 is a11 and x̃2 is a21 and x̃3 is a31 and x̃4 is a41, then ỹ is β̃1 (F^1)
…
R^48: If x̃1 is a12 and x̃2 is a22 and x̃3 is a34 and x̃4 is a43, then ỹ is β̃5 (F^48),

where F^r are the coefficients of the fuzzy rules R^r. For example, fuzzy rule R^1 corresponds to the following knowledge: if blood is present on the shell, and a site of egg without shell is present, and the share of superficial cracks is large, and the egg size is small, then the egg development state is invalid.
2.3 Fuzzification

Let us define the degree of truth of each subcondition of each fuzzy rule by means of the membership function μ_Ãij(xi). The membership function of a subcondition is defined in the form
\mu_{\tilde{A}_{ij}}(x_i) = \left(1 + \left(\frac{x_i - \gamma_{ij}}{\alpha_{ij}}\right)^{2\beta_{ij}}\right)^{-1}, \quad i \in \overline{1,4}, \; j \in \overline{1,n_i},   (1)

or in the form

\mu_{\tilde{A}_{ij}}(x_i) = \begin{cases} 0, & x_i \le \alpha_{ij} \\ \dfrac{x_i - \alpha_{ij}}{\beta_{ij} - \alpha_{ij}}, & \alpha_{ij} \le x_i \le \beta_{ij} \\ 1, & \beta_{ij} \le x_i \le \gamma_{ij} \\ \dfrac{\delta_{ij} - x_i}{\delta_{ij} - \gamma_{ij}}, & \gamma_{ij} \le x_i \le \delta_{ij} \\ 0, & x_i \ge \delta_{ij} \end{cases} \quad i \in \overline{1,4}, \; j \in \overline{1,n_i},   (2)
where \alpha_{ij}, \beta_{ij}, \gamma_{ij}, \delta_{ij} are parameters, and n_1 = 2, n_2 = 2, n_3 = 4, n_4 = 3.
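To make Eqs. (1)-(2) concrete, a direct Python transcription is given below; the parameter values in the example calls are illustrative only, not the identified ones.

```python
# Bell-shaped (1) and trapezoidal (2) membership functions.
def bell(x, alpha, beta, gamma):
    # Eq. (1); the ratio is squared first so the base is non-negative.
    return 1.0 / (1.0 + (((x - gamma) / alpha) ** 2) ** beta)

def trapezoid(x, a, b, c, d):
    # Eq. (2) with (a, b, c, d) = (alpha, beta, gamma, delta), a < b <= c < d.
    if x <= a or x >= d:
        return 0.0
    if b <= x <= c:
        return 1.0
    return (x - a) / (b - a) if x < b else (d - x) / (d - c)

print(bell(0.4, alpha=0.2, beta=2, gamma=0.5))   # close to 1 near gamma
print(trapezoid(0.4, 0.1, 0.3, 0.6, 0.9))        # 1.0 on the plateau
```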
2.4 Aggregation of Substates

Let us define the degree of truth of the condition of each fuzzy rule by means of the membership function μ_Ã^r(x). The membership function of a condition is defined in the form

\mu_{\tilde{A}^r}(x) = \mu_{\tilde{A}_{1i}}(x_1)\,\mu_{\tilde{A}_{2j}}(x_2)\,\mu_{\tilde{A}_{3k}}(x_3)\,\mu_{\tilde{A}_{4l}}(x_4),   (3)

or in the form

\mu_{\tilde{A}^r}(x) = \min\{\mu_{\tilde{A}_{1i}}(x_1), \mu_{\tilde{A}_{2j}}(x_2), \mu_{\tilde{A}_{3k}}(x_3), \mu_{\tilde{A}_{4l}}(x_4)\},   (4)

where r \in \overline{1,48}, i \in \overline{1,n_1}, j \in \overline{1,n_2}, k \in \overline{1,n_3}, l \in \overline{1,n_4}.
2.5 Activation of the Conclusions

Let us define the degree of truth of the conclusion of each fuzzy rule by means of the membership function μ_C̃^r(x, z). The membership function of the conclusions is defined in the form

\mu_{\tilde{C}^r}(x, z) = \mu_{\tilde{A}^r}(x)\,\mu_{\tilde{B}_m}(z)\,F^r, \quad r \in \overline{1,48},   (5)

or in the form

\mu_{\tilde{C}^r}(x, z) = \min\{\mu_{\tilde{A}^r}(x), \mu_{\tilde{B}_m}(z)\}\,F^r, \quad r \in \overline{1,48}.   (6)

In this work, the membership function \mu_{\tilde{B}_m}(z) and the weight coefficients of the fuzzy rules F^r are defined in the form
\mu_{\tilde{B}_m}(z) = [z = m] = \begin{cases} 1, & z = m \\ 0, & z \neq m \end{cases}, \quad m \in \overline{1,5},   (7)

F^r = 1.
2.6 Conclusions Aggregation

Let us define the degree of truth of the conclusion by means of the membership function \mu_{\tilde{C}}(x, z). The membership function of the conclusion is defined in the form

\mu_{\tilde{C}}(x, z) = 1 - (1 - \mu_{\tilde{C}^1}(x, z)) \cdot \ldots \cdot (1 - \mu_{\tilde{C}^{48}}(x, z)), \quad z \in \overline{1,5},   (8)

or in the form

\mu_{\tilde{C}}(x, z) = \max\{\mu_{\tilde{C}^1}(x, z), \ldots, \mu_{\tilde{C}^{48}}(x, z)\}, \quad z \in \overline{1,5}.   (9)
2.7 Defuzzification

To obtain the number of the egg development state, the method of the maximum of the membership function is used:

z^* = \arg\max_{z} \mu_{\tilde{C}}(x, z), \quad z \in \overline{1,5}.   (10)
3 Creation of Mathematical Models of a Neuro-fuzzy Expert System of Egg Development States Diagnostics

For the neuro-fuzzy expert system of egg development state diagnostics, the mathematical models of artificial neural networks were further improved in this work through the use of pi-sigma, inverted-pi, and min–max neurons, which makes it possible to model the stages of a fuzzy logical inference and defines the model structure. The model structure of the neuro-fuzzy expert system is presented in the form of the graph in Fig. 1. The input (zero) layer contains four neurons (corresponding to the number of input variables). The first hidden layer realises fuzzification and contains eleven neurons (corresponding to the number of values of the linguistic input variables). The second hidden layer realises aggregation of subconditions and contains 48 neurons (corresponding to the number of fuzzy rules).
Fig. 1 Model structure of a neuro-fuzzy expert system in the graph form (layers 0–4: inputs x1–x4, fuzzification, aggregation of subconditions, activation of conclusions, outputs y1–y5)
The third hidden layer realises activation of the conclusions and contains forty-eight neurons (corresponding to the number of fuzzy rules). The output (fourth) layer realises aggregation of the conclusions and contains five neurons (corresponding to the number of values of the linguistic output variable). The functioning of the neuro-fuzzy expert system is presented as follows (Fig. 1). In the first layer, the membership functions of the subconditions are calculated on the basis of:
xi ≤ α α ≤ xi ≤ β β ≤ xi ≤ γ , i ∈ 1, 4, j ∈ 1, n i ; γ ≤ xi ≤ δ xi ≥ δ
(11)
• bell-shaped function
−1 xi − γi1 2βi j μ A˜ i j (xi ) = 1 + , i ∈ 1, 4, j ∈ 1, n i , αi j
(12)
where n_1 = 2, n_2 = 2, n_3 = 4, n_4 = 3. In the second layer, the condition membership function is calculated on the basis of:

• product of sums

\mu_{\tilde{A}^r}(x) = \prod_{i=1}^{4} \sum_{j=1}^{n_i} w_{ij}^r \mu_{\tilde{A}_{ij}}(x_i), \quad r \in \overline{1,48}, \; w_{ij}^r \in \{0, 1\};   (13)

• minimization of maximization

\mu_{\tilde{A}^r}(x) = \min_i \max_j \left\{ w_{ij}^r \mu_{\tilde{A}_{ij}}(x_i) \right\}, \quad i \in \overline{1,4}, \; j \in \overline{1,n_i}, \; r \in \overline{1,48}, \; w_{ij}^r \in \{0, 1\}.   (14)

In the third layer, the membership function of the conclusions is calculated on the basis of:

• product

\mu_{\tilde{C}^r}(x, z) = w^r \mu_{\tilde{A}^r}(x)\,\mu_{\tilde{B}^r}(z), \quad z \in \overline{1,5}, \; r \in \overline{1,48};   (15)

• minimization

\mu_{\tilde{C}^r}(x, z) = w^r \min\{\mu_{\tilde{A}^r}(x), \mu_{\tilde{B}^r}(z)\}, \quad z \in \overline{1,5}, \; r \in \overline{1,48}, \; w^r = F^r.   (16)

In the fourth layer, the membership function of the conclusion is calculated on the basis of:

• the inverted product

y_z = \mu_{\tilde{C}}(x, z) = 1 - \prod_{r=1}^{48} w_r^z (1 - \mu_{\tilde{C}^r}(x, z)), \quad z \in \overline{1,5}, \; w_r^z \in \{0, 1\};   (17)

• maximization

y_z = \mu_{\tilde{C}}(x, z) = \max_r \left\{ w_r^z \mu_{\tilde{C}^r}(x, z) \right\}, \quad z \in \overline{1,5}, \; r \in \overline{1,48}, \; w_r^z \in \{0, 1\}.   (18)

Thus, the mathematical model of a neuro-fuzzy expert system based on the bell-shaped function, the product of sums, the product, and the inverted product is presented in the form

y_z = \mu_{\tilde{C}}(x, z) = 1 - \prod_{r=1}^{48} w_r^z \left( 1 - w^r \left( \prod_{i=1}^{4} \sum_{j=1}^{n_i} w_{ij}^r \mu_{\tilde{A}_{ij}}(x_i) \right) \mu_{\tilde{B}^r}(z) \right), \quad z \in \overline{1,5}.   (19)
Thus, the mathematical model of a neuro-fuzzy expert system based on the trapezoidal function, minimization of maximization, minimization, and maximization is presented in the form

y_z = \mu_{\tilde{C}}(x, z) = \max_{r \in \overline{1,48}} \left\{ w_r^z w^r \min\left\{ \min_{i \in \overline{1,4}} \max_{j \in \overline{1,n_i}} \left\{ w_{ij}^r \mu_{\tilde{A}_{ij}}(x_i) \right\}, \mu_{\tilde{B}^r}(z) \right\} \right\}, \quad z \in \overline{1,5}.   (20)

For making a decision on the egg development state for models (19)–(20), the method of the maximum of the membership function is used:

z^* = \arg\max_z y_z = \arg\max_z \mu_{\tilde{C}}(x, z), \quad z \in \overline{1,5}.   (21)
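A minimal sketch of the forward pass of model (20) followed by Eq. (21) is given below; it assumes the min-max variants with w^r = F^r = 1, and the rule-to-state mapping and membership degrees are illustrative stand-ins rather than the identified ones.

```python
# Forward pass of the min-max model: Eqs. (14), (16), (18), and (21).
import itertools

n = [2, 2, 4, 3]                                  # values per input variable
rules = list(itertools.product(*(range(k) for k in n)))  # 2*2*4*3 = 48 rules
rule_state = {r: r % 5 for r in range(len(rules))}       # assumed rule -> state

def infer(mu):
    """mu[i][j]: membership degree of value j of input i, from Eqs. (11)-(12)."""
    y = [0.0] * 5                                 # outputs y_1..y_5
    for r, combo in enumerate(rules):
        fire = min(mu[i][j] for i, j in enumerate(combo))  # Eq. (14)
        z = rule_state[r]
        y[z] = max(y[z], fire)                    # Eqs. (16) and (18), weights 1
    return y.index(max(y)) + 1                    # Eq. (21): state in 1..5

mu = [[0.9, 0.1], [0.2, 0.8], [0.1, 0.3, 0.5, 0.1], [0.6, 0.3, 0.1]]
print(infer(mu))                                  # illustrative inputs
```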
4 Choice of Criteria for Efficiency Evaluation of Mathematical Models of a Neuro-fuzzy Expert System of Egg Development States Diagnostics

In this work, for the assessment of the parametric identification of the mathematical models (19), (20) of the neuro-fuzzy expert system of egg development state diagnostics, the following criteria are chosen:

• the accuracy criterion, which means the choice of such values of the parameters \theta = (\alpha_{11}, \beta_{11}, \gamma_{11}, \ldots, \alpha_{43}, \beta_{43}, \gamma_{43}) or \theta = (\alpha_{11}, \beta_{11}, \gamma_{11}, \delta_{11}, \ldots, \alpha_{43}, \beta_{43}, \gamma_{43}, \delta_{43}) that deliver a minimum of the mean square error (the difference between the model output and the test output)

F = \frac{1}{3P} \sum_{p=1}^{P} \sum_{z=1}^{5} (y_{pz} - d_{pz})^2 \to \min_\theta,   (22)

where d_{pz} is the test output, d_{pz} \in \{0, 1\}, y_{pz} is the model output, and P is the number of test realizations;

• the reliability criterion, which means the choice of such values of the parameters \theta = (\alpha_{11}, \beta_{11}, \gamma_{11}, \ldots, \alpha_{43}, \beta_{43}, \gamma_{43}) or \theta = (\alpha_{11}, \beta_{11}, \gamma_{11}, \delta_{11}, \ldots, \alpha_{43}, \beta_{43}, \gamma_{43}, \delta_{43}) that deliver a minimum of the probability of a wrong decision (the difference between the model output and the test output)

F = \frac{1}{P} \sum_{p=1}^{P} \left[ \arg\max_{z \in \overline{1,5}} y_{pz} \ne \arg\max_{z \in \overline{1,5}} d_{pz} \right] \to \min_\theta,   (23)

where the indicator \left[ \arg\max_z y_z \ne \arg\max_z d_{pz} \right] equals 1 if the arg maxima differ and 0 otherwise;

• the speed criterion, which means the choice of such values of the parameters \theta = (\alpha_{11}, \beta_{11}, \gamma_{11}, \ldots, \alpha_{43}, \beta_{43}, \gamma_{43}) or \theta = (\alpha_{11}, \beta_{11}, \gamma_{11}, \delta_{11}, \ldots, \alpha_{43}, \beta_{43}, \gamma_{43}, \delta_{43}) that deliver a minimum of the computing complexity

F = T \to \min_\theta.   (24)
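The accuracy and reliability criteria can be evaluated over a test set as sketched below; the arrays y (model outputs) and d (one-hot test outputs) of shape (P, 5) are illustrative.

```python
# Evaluation criteria, Eqs. (22)-(23).
import numpy as np

def criteria(y, d):
    P = y.shape[0]
    mse = np.sum((y - d) ** 2) / (3 * P)                   # Eq. (22)
    wrong = np.mean(y.argmax(axis=1) != d.argmax(axis=1))  # Eq. (23)
    return mse, wrong

y = np.array([[0.1, 0.7, 0.1, 0.05, 0.05],
              [0.6, 0.2, 0.1, 0.05, 0.05]])
d = np.array([[0, 1, 0, 0, 0],
              [1, 0, 0, 0, 0]])
print(criteria(y, d))
```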
5 Numerical Research

The numerical research of the offered mathematical models of a neuro-fuzzy expert system and of a usual multilayer perceptron was conducted in a Python package. The computing complexities, mean square errors (MSE), and probabilities of wrong decisions in diagnosing the egg development state were obtained on the Chicken benchmark [26], which contains RGB images of size 1080 × 800, and are presented in Table 1. From 380 images, 80% were randomly selected for the training sample and 20% for the test sample. The diagnostics were performed by means of an artificial neural network of the multilayer perceptron (MLP) type with back propagation (BP) and with the grey wolf optimizer (GWO), and by the offered models (19) and (20) with back propagation (BP) and the grey wolf optimizer (GWO), respectively. The MLP had 2 hidden layers (each consisting of 11 neurons, like the input layer). It was experimentally established that the parameter ε = 0.05 (Table 1).

Table 1 Quality characteristics of the diagnostics of the state of development of the egg
Model and method of parameter identification | MSE | Probability of the wrong decision | Computing complexity
Usual MLP with BP in the consecutive mode | 0.49 | 0.19 | T = PN
Usual MLP with GWO without parallelism | 0.38 | 0.14 | T = PNI
Author's model (19) with BP in batch mode with bell-shaped membership function | 0.10 | 0.04 | T = N
Author's model (20) with GWO with parallelism with trapezoidal membership function | 0.05 | 0.02 | T = N
According to Table 1, the best results are yielded by model (20) with identification of parameters based on GWO and with the trapezoidal membership function. Based on the experiments performed, it is possible to draw the following conclusions. The parameter identification procedure based on the grey wolf optimizer is more effective than training based on back propagation due to the automatic choice of the model structure, the reduced probability of falling into a local extremum, and the use of parallel information processing technology.
6 Conclusions

To solve the problem of increasing the efficiency of diagnostics of egg development states, the corresponding methods of artificial intelligence were investigated. This research showed that today the most effective approach is the use of a neuro-fuzzy expert system in combination with metaheuristics. The novelty of the research is that the offered neuro-fuzzy expert system provides representation of knowledge about egg development states in the form of fuzzy rules understandable to a person, and reduces the computing complexity, the probability of a wrong decision, and the mean square error due to the automatic choice of the model structure, the reduced probability of falling into a local extremum, and the use of parallel information processing technology for the grey wolf optimizer and for back propagation in batch mode. As a result of the numerical research, it was established that the offered neuro-fuzzy expert system provides a probability of wrong decisions on egg development states of 0.02 and a mean square error of 0.05. A further prospect of the research is the use of the offered neuro-fuzzy expert system in various intelligent decision support systems.
References 1. Kertész I, Zsom-Muha V, András R, Horváth F, Németh C, Felföldi J (2021) Development of a novel acoustic spectroscopy method for detection of eggshell cracks. Molecules 26:1–10. https://doi.org/10.3390/molecules26154693 2. Zhihui Z, Ting L, Dejun X, Qiao-hua W, Meihu M (2015) Nondestructive detection of infertile hatching eggs based on spectral and imaging information. Int J Agric Biol Eng 8(4):69–76. https://doi.org/10.25165/ijabe.v8i4.1672 3. Lai C-C, Li C-H, Huang K-J, Cheng C-W (2021) Duck eggshell crack detection by nondestructive sonic measurement and analysis. Sensors 21:1–11 4. Sun L, Feng S, Chen C, Liu X, Cai J (2020) Identification of eggshell crack for hen egg and duck egg using correlation analysis based on acoustic resonance method. J Food Process Eng 43(8):1–9. https://doi.org/10.1111/jfpe.13430 5. Teimouri N, Omid M, Mollazade K, Mousazadeh H, Alimardani R, Karstoft H (2018) On-line separation and sorting of chicken portions using a robust vision-based intelligent modeling approach. Biosys Eng 167:8–20
6. Nikolova M, Zlatev ZD (2019) Analysis of color and spectral characteristics of hen egg yolks from different manufacturers. Appl Res Technics Technol Educ 7(2):103–122. https://doi.org/ 10.15547/artte.2019.02.005 7. Zhu Z, Ma M (2015) The identification of white fertile eggs prior to incubation based on machine vision and least square support vector machine. Int J Animal Breed Genetics 4(4):1–6 8. Bhuvaneshwari M, Palanivelu LM (2015) Improvement in detection of chicken egg fertility using image processing techniques. Int J Eng Technol Sci 2(4):65–67 9. Hamdany AHS, Al-Nima RRO, Albak LH (2021) Translating cuneiform symbols using artificial neural network. TELKOMNIKA Telecommun Comput Electron Control 19(2):438–443. https://doi.org/10.12928/telkomnika.v19i2.16134 10. Saifullah S, Suryotomo AP (2021) Chicken egg fertility identification using FOS and BP-neural networks on image processing. J Rekayasa Sistem dan Teknologi Informasi 5(5):919–926. https://doi.org/10.29207/resti.v5i5.3431 11. Geng L, Hu Y, Xiao Z, Xi J (2019) Fertility detection of hatching eggs based on a convolutional neural network. Appl Sci 9(7):1–16. https://doi.org/10.3390/app9071408 12. Fedorov E, Lukashenko V, Patrushev V, Lukashenko A, Rudakov K, Mitsenko S (2018) The method of intelligent image processing based on a three-channel purely convolutional neural network. CEUR Workshop Proc 2255:336–351 13. Saifullah S K-means Segmentation based-on lab color space for embryo egg detection, 1–11 (2021). arXiv:2103.02288. https://doi.org/10.48550/arXiv.2103.02288 14. Grygor OO, Fedorov EE, Utkina TY, Lukashenko AG, Rudakov KS, Harder DA, Lukashenko VM (2019) Optimization method based on the synthesis of clonal selection and annealing simulation algorithms. Radio Electron Comput Sci Control 2:90–99. https://doi.org/10.15588/ 1607-3274-2019-2-10 15. Fedorov E, Lukashenko V, Utkina T, Lukashenko A, Rudakov K (2019) Method for parametric identification of Gaussian mixture model based on clonal selection algorithm. CEUR Workshop Proc 2353:41–55 16. Loshchilov I CMA-ES with restarts for solving CEC-2013 benchmark problems. In: 2013 IEEE congress on evolutionary computation proceedings, pp 369–376 17. Patel VC, McClendon RW, Goodrum JW (1998) Development and evaluation of an expert system for egg sorting. Comput Electron Agric 20(2):97–116 18. Omid M, Soltani M, Dehrouyeh MH, Mohtasebi SS, Ahmadi H (2013) An expert egg grading system based on machine vision and artificial intelligence techniques. J Food Eng 118(1):70–77. https://doi.org/10.1016/j.jfoodeng.2013.03.019 19. Haykin S (2009) Neural networks and learning machines. Pearson Education, Inc., Upper Saddle River, New Jersey, 3rd ed 20. Du K-L, Swamy MNS (2014) Neural networks and statistical learning. Springer, London. https://doi.org/10.1007/978-1-4471-5571-3 21. Rancapan JGC, Arboleda ER, Dioses JL, Dellosa RM (2019) Egg fertility detection using image processing and fuzzy logic. Int J Sci Technol Res 8(10):3228–3230 22. Yang X-S (2018) Nature-inspired algorithms and applied optimization. Springer, Charm 23. Nakib A, Talbi El-G (2017) Metaheuristics for medicine and biology. Springer, Berlin 24. Fedorov E, Nechyporenko O (2021) Dynamic stock buffer management method based on linguistic constructions. CEUR Workshop Proc 2870:1742–1753 25. Tsiutsiura M, Tsiutsiura S, Yerukaiev A, Terentiev O, Kyivska K, Kuleba M (2020) Protection of information in assessing the factors of influence. 
In: 2020 IEEE 2nd international conference on advanced trends in information theory proc, pp 285–289 26. Fedorov E Chicken eggs image models. https://github.com/fedorovee75/ArticleChicken/raw/main/chicken.zip
Chapter 13
Analysis of Delay in 16 × 16 Signed Binary Multiplier Niharika Behera, Manoranjan Pradhan, and Pranaba K. Mishro
1 Introduction

Multiplication is a basic and vital operation which helps in implementing algebraic arithmetic computations. It is equally important in both unsigned and signed operations, such as multiply-and-accumulate, fast Fourier transform, digital signal processors, microprocessors, and filtering applications. The processing speed of these processors depends on the type of multipliers used; with conventional approaches, the operational speed falls short of what is needed. Vedic mathematics was developed from the primeval Indian Vedas in the early twentieth century. It is a type of ancient mathematics discovered by the Indian mathematician Jagadguru Sri Bharati Krishna Tirthaji. According to him, Vedic mathematics consists of 16 sutras and 13 sub-sutras, which help in computing calculus, geometry, conics, algebra, and arithmetic [1]. Normal mathematical problems can be easily solved and equally improved with the utilization of Vedic mathematics. It is not only a mathematical astonishment but also an analytical one; the analytical support of Vedic mathematics has a standard of elevation which cannot be discarded. Due to these exceptional features, it has become a flagship in the exploration of mathematical models. It is a very riveting area and gives a number of effective methods [2]. These can be employed in several fields of engineering, such as computational and analytical models and digital signal processing. The sixteen Vedic algorithms (sutras) can be employed for solving the difficulties in the design of binary multipliers. Two sutras of Vedic mathematics, the Nikhilam and Urdhva Tiryagbhyam (UT) sutras, are commonly used in multiplication operations. Similarly, the Yavadunam sutra is used in square and cube operations.

N. Behera (B) · M. Pradhan · P. K. Mishro
Veer Surendra Sai University of Technology, Burla, Odisha 768018, India
e-mail: [email protected]
With the use of these sutras, the mathematical operation can be analyzed in a short period of time. This can further reduce the chip area on the FPGA board, and the processing delay can also be improved. The details about signed and unsigned numbers are discussed in [3]; the author established that the performance of multipliers is efficient when using these signed and unsigned numbers. Performance evaluation of the squaring operation by a Vedic mathematics multiplier is discussed in [4]. For the squaring operation, the authors suggested a design architecture which can be implemented in a short time period by using the Yavadunam sutra in VHDL. The performance is compared with the conventional Booth's algorithm; parameters like time delay and area occupied on the Xilinx Virtex are considered for the comparison. However, the delay performance is not convenient. In [5], the authors extended the decimal algorithm to the binary number system. Most of the operations have been done by using only unsigned numbers and decimal numbers [6]; however, there is scope for improving the speed and delay and reducing the area. Booth multipliers produce a smaller number of partial products by considering groups of multiplier bits, compared with array multipliers [7]. Time delay and area performance are compared with conventional multipliers in [8]; however, high carry propagation is the major issue, occurring for larger operand sizes in array multipliers. A generalized architecture for the cube operation based on the Yavadunam sutra of Vedic mathematics is discussed in [9]. The sutra converts the cube of a higher-magnitude number into a lower-magnitude number and an insertion operation. Using Xilinx ISE 14.5 software, the cubic architecture is synthesized and simulated using various FPGA devices for comparison purposes. Many research works have been reported using these sixteen sutras, covering addition, multiplication, division, squares, and cubes [9–11]. In [12], the authors reported a fast 16 × 16-bit Vedic multiplier utilizing this sutra, obtaining less delay than the Wallace tree, array, and Booth multipliers. Similarly, a multiplier realized in an application-specific integrated circuit design utilizing the Vedic sutra is discussed in [13]. An FPGA implementation of a complex multiplier based on a signed Vedic multiplier is reported in [14, 15]; it multiplies signed numbers in 2's complement form and produces the result in 2's complement form [16–18]. All the early designs using Vedic sutras were commonly focused on unsigned operations. This has motivated us to design a signed multiplier using the Vedic UT sutra. The suggested multiplier is designed with more than 1 bit of multiplier and multiplicand in every cycle. This technique is experimented with the multiplication of decimal numbers as well as binary numbers. Multiplication operations using the UT sutra are easier than the conventional approaches; we can easily find the product of large signed or unsigned numbers in one step using the suggested design. The architecture may also be useful for future exploration of signed-bit multiplications. The rest of the paper is organized as follows: Sect. 2 explains the standard UT sutra. In Sect. 3, the proposed design is elaborated with a representative block diagram. The results and related discussion are presented in Sect. 4. Finally, the paper is concluded in Sect. 5 with future scope.
2 Urdhva Tiryagbhyam Sutra

In this section, a generalized UT sutra is presented. The sutra is found to be suitable for all cases of multiplication. The explicit structure of this sutra is the "vertically and crosswise" operation. It is applied for multiplication in both signed and unsigned number systems (Fig. 1). For two four-digit operands m3m2m1m0 and n3n2n1n0, the partial results y0–y6 are obtained as follows:

Step 1: y0 = m0 × n0
Step 2: y1 = m1 × n0 + m0 × n1
Step 3: y2 = m2 × n0 + m1 × n1 + m0 × n2
Step 4: y3 = m3 × n0 + m2 × n1 + m1 × n2 + m0 × n3
Step 5: y4 = m3 × n1 + m2 × n2 + m1 × n3
Step 6: y5 = m3 × n2 + m2 × n3
Step 7: y6 = m3 × n3
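As an illustration only (the designs in this paper are realized in Verilog hardware, not software), the vertical-and-crosswise pattern above can be viewed as a digit-wise convolution: each y_k gathers every product of digits whose indices sum to k, exactly as in Steps 1–7. The following Python sketch mirrors this for operands of any length; the function names are ours and are not part of the proposed architecture.

```python
def ut_partial_terms(m, n):
    """Urdhva Tiryagbhyam (vertical-and-crosswise) partial results y_k for
    two digit lists m and n (least significant digit first). Each y_k is
    the sum of all products m[i] * n[j] with i + j == k."""
    y = [0] * (len(m) + len(n) - 1)
    for i, mi in enumerate(m):
        for j, nj in enumerate(n):
            y[i + j] += mi * nj
    return y

def ut_multiply(m, n, base=10):
    """Combine the partial results with carry propagation to obtain the
    final product digits in the given base."""
    digits, carry = [], 0
    for term in ut_partial_terms(m, n):
        total = term + carry
        digits.append(total % base)
        carry = total // base
    while carry:
        digits.append(carry % base)
        carry //= base
    return digits  # least significant digit first

# Example: 1234 * 5678 = 7,006,652
m = [4, 3, 2, 1]   # 1234, least significant digit first
n = [8, 7, 6, 5]   # 5678, least significant digit first
print(ut_multiply(m, n))  # [2, 5, 6, 6, 0, 0, 7]
```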
Example 1: Finding the signed multiplication of the two decimal numbers 12 and (−13) using the UT sutra, as shown in Fig. 2.
Step-1: The leftmost digit 1 of 12 is multiplied vertically by the leftmost digit (−1) of the multiplier −13, giving (−1), which is set down as the left part of the answer.
Step-2: The digit 1 of 12 is multiplied crosswise with (−3) of −13, and (−1) of −13 with 2 of 12; adding the two products gives (−5), which is set down as the middle part of the answer.
Step-3: The digit 2 of 12 and (−3) of −13 are multiplied vertically, giving (−6), which is set down as the last (rightmost) part of the answer.
Step-4: Finally, the left, middle, and right parts are concatenated with their place values to obtain the final signed product: 12 × (−13) = (−156).
Fig. 1 Generalized procedure of the UT sutra
Fig. 2 Multiplication of decimal numbers using UT sutra
Fig. 3 Multiplication of binary numbers using UT sutra
Example 2: Finding the signed binary multiplication of (−1) (multiplicand) and (−3) (multiplier) using the UT sutra, as shown in Fig. 3.
Step-1: Find the 2's complement of the multiplicand: −1 = 1111.
Step-2: Find the 2's complement of the multiplier: −3 = 1101.
Step-3: Vertical multiplication of the least significant bits (LSBs) of the multiplicand and multiplier (1 × 1 = 1) gives the right part result (RPR).
Step-4: Cross-multiplication is performed (1 × 1 + 1 × 0 = 1); 1 is stored as the 1st bit of the middle part of the result (MPR).
Step-5: Cross-multiplication is performed (1 × 1 + 1 × 1 + 1 × 0 = 10); the LSB 0 is stored as the 2nd bit of MPR and the MSB 1 as the 1st bit of the carry c1.
Step-6: Cross-multiplication is performed (1 × 1 + 1 × 1 + 1 × 1 + 1 × 0 = 11); the MSB 1 becomes the 2nd bit of carry c1, the LSB 1 is added to the 1st bit of carry c1, and 10 is formed. 0 is stored as the 3rd bit of MPR, and 1 is taken as the 1st bit of the carry c2.
Step-7: Cross-multiplication is performed (1 × 0 + 1 × 1 + 1 × 1 = 10); the MSB 1 becomes the 3rd bit of the carry c1, the LSB 0 is added to the 2nd bit of c1 and the 1st bit of c2, and 11 is obtained. The LSB 1 is stored as the 4th bit of MPR, and the MSB 1 is taken as the 2nd bit of c2.
Step-8: Cross-multiplication is performed (1 × 1 + 1 × 1 = 10); the MSB 1 becomes the 4th bit of the carry c1, the LSB 0 is added to the 3rd bit of c1 and the 2nd bit of c2, and 11 is obtained. The LSB 1 is stored as the 5th bit of MPR, and the MSB 1 is taken as the 3rd bit of c2.
Step-9: Lastly, vertical multiplication of the MSBs of the multiplicand and multiplier (1 × 1 = 1) is performed; the result is added to the 4th bit of c1 and the 3rd bit of c2, and 11 is obtained. The LSB 1 is stored as the 6th bit of MPR, and the MSB 1 is stored as the left part result (LPR). The concatenation of LPR, MPR, and RPR gives the final result.
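A companion sketch (again illustrative Python, not the paper's Verilog) applies the same pattern to signed binary operands. Instead of the explicit LPR/MPR/RPR bookkeeping of Example 2, it sign-extends both operands to the product width first, which is one standard way of obtaining a correct 2's complement product; it reuses ut_partial_terms from the previous sketch.

```python
def to_bits(value, width):
    """LSB-first two's complement bit list of value (Python's arithmetic
    right shift yields the correct bits for negative numbers)."""
    return [(value >> i) & 1 for i in range(width)]

def from_bits(bits_list):
    """Interpret an LSB-first bit list as a two's complement integer."""
    raw = sum(b << i for i, b in enumerate(bits_list))
    return raw - (1 << len(bits_list)) if bits_list[-1] else raw

def signed_ut_multiply(a, b, bits=4):
    """Signed multiply: sign-extend both operands to the product width,
    apply the UT (vertical-and-crosswise) pattern in base 2, propagate
    carries, and keep the 2*bits product bits."""
    width = 2 * bits
    m, n = to_bits(a, width), to_bits(b, width)
    out, carry = [], 0
    for term in ut_partial_terms(m, n):   # from the previous sketch
        total = term + carry
        out.append(total & 1)
        carry = total >> 1
    return from_bits(out[:width])

print(signed_ut_multiply(-1, -3))           # 3
print(signed_ut_multiply(12, -13, bits=8))  # -156
```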
3 Proposed Design

The block diagram of the signed multiplier using the conventional method is shown in Fig. 4. The two inputs m and n correspond to the multiplier and the multiplicand, each 16 bits wide.
Fig. 4 Proposed 16 × 16 signed multiplier architecture using conventional method
The complement of the multiplicand or multiplier is obtained by taking the 2's complement of the corresponding 16-bit binary number. The 2's complement stage thus generates the complemented 16-bit multiplier operand (−m) and the complemented 16-bit multiplicand operand (−n), which form the two inputs of the signed multiplier module. After signed multiplication, the 32-bit product is stored as the final product. The proposed 16 × 16 signed Vedic multiplier (VM) using the "Urdhva Tiryagbhyam" sutra is shown in Fig. 5. It requires four 8 × 8 signed Vedic multiplier modules, one 16-bit ripple carry adder (RCA), and two 17-bit binary adder stages. The 16-bit RCA and the 17-bit binary adder modules assemble the final 32-bit signed product in three fields: (s31–s16), (s15–s8), and (s7–s0). The bits s7–s0 are the eight least significant bits of the 16-bit result of the rightmost 8 × 8 signed multiplier module. The 16-bit RCA adds three 16-bit operands: the concatenation of "11111111" with the eight most significant bits of the rightmost 8 × 8 signed VM module's output, and the 16-bit outputs of the second and third 8 × 8 signed VM modules. The RCA produces two 16-bit outputs, a sum part and a carry part, which are fed into the first 17-bit binary adder to produce a 17-bit sum. The middle field (s15–s8) corresponds to the eight least significant bits of this 17-bit sum. The 16-bit output of the leftmost 8 × 8 signed VM module and the concatenation of "1111111" with the nine most significant bits of the 17-bit sum are fed into the second 17-bit binary adder; the field s31–s16 corresponds to its 16-bit sum. The full result is 33 bits wide, and the final 32-bit product is taken from its lower significant part.
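The datapath just described follows the familiar split of each 16-bit operand into a high and a low byte. The hedged Python model below shows only the arithmetic being implemented for unsigned operands, not the adder wiring or the constant-padding details of Fig. 5; mul8 is a stand-in for the 8 × 8 signed VM module, and the sign handling (2's complement of the operands) wraps around this datapath as described above.

```python
def vedic_16x16(m, n, mul8=lambda a, b: a * b):
    """Illustrative model of the 16 x 16 decomposition: four 8 x 8
    multiplications combined by shifted additions (performed in hardware
    by the RCA and the two 17-bit adder stages)."""
    m_lo, m_hi = m & 0xFF, (m >> 8) & 0xFF
    n_lo, n_hi = n & 0xFF, (n >> 8) & 0xFF
    p0 = mul8(m_lo, n_lo)            # rightmost 8 x 8 module
    p1 = mul8(m_hi, n_lo)            # cross terms, summed by the adders
    p2 = mul8(m_lo, n_hi)
    p3 = mul8(m_hi, n_hi)            # leftmost 8 x 8 module
    return (p3 << 16) + ((p1 + p2) << 8) + p0

assert vedic_16x16(0x1234, 0x0056) == 0x1234 * 0x0056
```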
4 Results and Discussion

The conventional design shown in Fig. 4 has been coded and synthesized in the Verilog hardware description language using the Xilinx ISE 14.5 software. Figure 6 shows the simulation waveform of the proposed conventional signed architecture: for the 16 × 16-bit multiplier, the applied inputs are a = 1111111111011111 (−33) and b = 1111111111111111 (−1), and the obtained 32-bit product is y = 00000000000000000000000000100001 (33). The proposed 16 × 16 binary signed VM design shown in Fig. 5 is likewise synthesized in Verilog through the ISE Xilinx 14.5 software. Further, the proposed design is implemented on a Virtex 4 FPGA device. Figure 7 (in signed decimal radix) and Fig. 8 (in binary radix) show the simulation waveforms of the proposed 16 × 16 signed architecture using the UT sutra: the applied inputs are a = 0000000000001111 (15) and b = 1111111111111110 (−2), and the obtained 32-bit product is y = 11111111111111111111111111100010 (−30). Here, we multiplied a nonnegative number with a signed negative number using the UT sutra and obtained the correct result.
Fig. 5 Proposed 16 × 16 signed VM architectures
Fig. 6 Simulation result of 16 × 16 conventional signed multiplier
Fig. 7 Simulation result of 16 × 16 signed VM multiplier
Fig. 8 Simulation result of 16 × 16 binary signed VM multiplier
Table 1 Synthesis report of proposed multipliers (device: Virtex 4 vlx15sf363-12)

Parameter           | 16 × 16 conventional multiplication | 16 × 16 signed VM
Path delay (ns)     | 10.948                              | 9.956
Area (4-input LUTs) | 64 out of 10,944                    | 25 out of 10,944
Power (W)           | 166.38                              | 166.38
FPGA implementation is an essential step for validating Verilog designs. The performance of the proposed multiplier design is examined on different families of FPGA devices. In Table 1, path delay denotes the total propagation delay along the multiplier's critical path, given in nanoseconds (ns); area is the total number of lookup tables (LUTs) used to realize the proposed architecture; and power consumption is also an important parameter, since higher power consumption is a demerit for a multiplier. Table 1 shows the synthesis report of the conventional multiplier and the proposed signed VM multiplier on the Virtex 4 FPGA device. From the table, it is observed that the proposed signed VM architecture shows a significant improvement in area and delay over the conventional multiplication method; the area of the designs is measured by the number of lookup tables. Hence, the Vedic multiplier plays a significant role in the proposed work. After performing multiplication using the UT sutra, we obtained the results and compared them with existing standard models. The proposed structure is also implemented on a Spartan 3, XC3S50 FPGA device. Table 2 compares the proposed 16 × 16 signed VM design with the Wallace tree multiplier [12], the Booth multiplier [12], and the signed Vedic multipliers of [15] and [8]. The suggested design has almost 72.62, 65.97, 58.21, and 36.45% less combinational delay than these designs, respectively.

Table 2 Delay comparison for the proposed signed VM in Spartan-3 device
16-bit multiplier            | Combinational path delay (ns) | Percentage of improvement
Wallace tree multiplier [12] | 46.046                        | 72.62
Booth multiplier [12]        | 37.041                        | 65.97
Signed Vedic multiplier [15] | 30.16                         | 58.21
Signed Vedic multiplier [8]  | 19.832                        | 36.45
Proposed signed VM           | 12.603                        | –
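The improvement percentages in Table 2 follow directly from the delays, as the short check below shows; the computed values agree with the table to within the last digit (the table appears to truncate rather than round).

```python
# Improvement = (reference_delay - proposed_delay) / reference_delay.
proposed = 12.603
references = [("Wallace tree multiplier [12]", 46.046),
              ("Booth multiplier [12]", 37.041),
              ("Signed Vedic multiplier [15]", 30.16),
              ("Signed Vedic multiplier [8]", 19.832)]
for name, delay in references:
    print(name, round(100 * (delay - proposed) / delay, 2))
# -> 72.63, 65.98, 58.21, 36.45
```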
5 Conclusion

In this work, we proposed the design of a conventional signed multiplier and an efficient signed Vedic multiplier using the UT sutra. A simplified signed-multiplier design is presented for use in digital verification systems; the architecture covers the multiplication process from unsigned decimal numbers to signed binary numbers. The suggested design is synthesized using ISE Xilinx 14.5 and implemented on different FPGA devices. From the results, we observed that the use of the UT sutra in signed binary multipliers helps reduce the combinational path delay and area, which also improves system performance in terms of execution speed. The proposed design is compared with previously reported multiplier architectures [8, 12, 15], and the results establish the superiority of the suggested design. The work can be extended to higher bit-width multiplier designs in the future.
References 1. Tirtha S, Agrawala V, Agrawala S (1992) Vedic mathematics. Motilal Banarsi Dass Publ, India 2. Patali P, Kassim S (2020) An efficient architecture for signed carry save multiplication. IEEE Lett Comput Soc 3(1):9–12 3. Parhami B (2010) Computer arithmetic: algorithms and hardware designs. Oxford University Press, New York 4. Poornima M, Shivukumar S, Shridhar KP, Sanjay H (2013) Implementation of multiplier using vedic algorithm. Int J Innov Technol Explor Eng 2(6):219–223 5. Gorgin S, Jaberipur G (2009) A fully redundant decimal adder and its application in parallel decimal multipliers. Microelectron J 40(10):1471–1481 6. Thapliyal H, Arbania HR (2004) A time-area-power efficient multiplier and square architecture based on ancient Indian vedic mathematics. In: Proceedings of the international conference on VLSI (VLSI'04), Las Vegas, Nevada, pp 434–439 7. Thapliyal H, Srinivas MB (2004) High speed efficient N × N parallel hierarchical overlay multiplier architecture based on ancient Indian vedic mathematics. Trans Eng Comput Technol 8. Sahoo S, Bhoi B, Pradhan M (2020) Fast signed multiplier using Vedic Nikhilam algorithm. IET Circuits Dev Syst 14(8):1160–1166 9. Barik R, Pradhan M (2017) Efficient ASIC and FPGA implementation of cube architecture. IET Comput Digital Tech 11(1):43–49 10. Kasliwal PS, Patil BP, Gautam DK (2011) Performance evaluation of squaring operation by vedic mathematics. IETE J Res 57(1):39–41 11. Sethi K, Panda R (2015) Multiplier less high-speed squaring circuit for binary numbers. Int J Electron 102(3):433–443 12. Bansal Y, Madhu C (2016) A novel high-speed approach for 16 × 16 vedic multiplication with compressor adders. Comput Electr Eng 49:39–49 13. He Y, Yang J, Chang H (2017) Design and evaluation of booth-encoded multipliers in redundant binary representation. In: Proceedings of embedded systems design with special arithmetic and number systems, pp 113–147 14. Palnitkar S (2003) Verilog HDL: a guide to digital design and synthesis. Prentice Hall Professional, India 15. Barik RK, Pradhan M, Panda R (2017) Time efficient signed vedic multiplier using redundant binary representation. J Eng 2017(3):60–68
16. Madenda S, Harmanto S (2021) New approaches of signed binary number multiplication and its implementation in FPGA. Jurnal Ilmiah Teknologi dan Rekayasa 26(1):56–68 17. Imaña JL (2021) Low-delay FPGA-based implementation of finite field multipliers. IEEE Trans Circuits Syst II Express Briefs 68(8):2952–2956 18. Ullah S, Schmidl H, Sahoo SS, Rehman S, Kumar A (2020) Area-optimized accurate and approximate softcore signed multiplier architectures. IEEE Trans Comput 70(3):384–392 19. Paldurai K, Hariharan K (2015) Implementation of signed vedic multiplier targeted at FPGA architectures. ARPN J Eng Appl Sci 10(5):2193–2197 20. Pichhode K, Patil M, Shah D, Chaurasiya B (2015) FPGA implementation of efficient vedic multiplier. In: Proceedings of international conference of IEEE, pp 565–570
Chapter 14
Review of Machine Learning for Antenna Selection and CSI Feedback in Multi-antenna Systems Garrouani Yassine, Alami Hassani Aicha, Mrabti Fatiha, and Dhassi Younes
1 Introduction

In the last three decades, multi-antenna systems have been suggested as the best solution to improve the reliability of wireless communication systems. Ranging from single-user MIMO systems to massive MIMO ones, they have succeeded in combating the deep fading events that single-antenna systems used to suffer from, mainly through transmit-diversity and receive-diversity setups. In addition, thanks to spatial multiplexing, many users can now be served simultaneously over the same time–frequency resource, which contributes substantially to improving the overall system's spectral efficiency. However, for these breakthroughs to come to fruition, some limiting factors need consideration during the design of such systems. On the one hand, Gao et al. [1] showed that within the antenna array there might be some impaired antennas not performing as expected, causing a degradation in system performance. Moreover, hardware cost and energy consumption have also been subjects of concern, especially since, in an ideal setup, every antenna should have its own radio-frequency (RF) module. On the other hand, as wireless communication is carried over an ever-changing environment, phenomena such as fast fading caused by high-mobility users and pilot contamination arising from pilot reuse in neighboring cells are to be expected; this makes the task of acquiring channel state information (CSI) more challenging, especially in systems operating in the frequency division duplexing (FDD) mode. In such systems, the wireless channel is estimated in the downlink by the user equipment (UE) and then fed back to the base station (BS) on the uplink, which introduces an overhead that scales with the number of antennas at the base station side.
To overcome the limitations mentioned above, and as a workaround for the impaired-antenna issue as well as the cost and energy concerns, Arash et al. [2] and Ouyang and Yang [3] suggested deploying fewer RF modules than antennas and activating only the antennas that optimize system performance. This implies selecting a subset of antennas among the available ones with respect to a given criterion such as bit-error-rate (BER) or signal-to-noise ratio (SNR); the unselected antennas are seen as degrees of freedom that can be resorted to later, when their conditions become favorable. Machine learning (ML) has recently emerged as one of the best choices for handling the problems arising in wireless communication systems in a new, data-driven way, as detailed by Wen et al. [4] and Zhang et al. [5]. Besides the diverse optimization techniques used by Jounga and Sun [6], Gorokhov et al. [7] and Gharavi-Alkhansari and Gershman [8], and mainly the conventional schemes relying on exhaustive search and iterative algorithms, Joung [9] and Yao et al. [10, 11] used machine learning to approach this kind of problem in a new way, namely by treating antenna selection as a multiclass classification problem. Regarding the overhead in FDD-based massive MIMO systems, many researchers have proposed techniques to mitigate its impact on system performance by means of CSI compression prior to feedback. To fulfill this compression task, Wen et al. [12], Yang et al. [13], Lu et al. [14] and Liu et al. [15] used machine learning and especially deep learning techniques. The rest of the paper is organized as follows: Sect. 2 describes the system model of transmit antenna selection (TAS) for single-user MIMO as well as untrusted relay networks, as adopted by Joung [9] and Yao et al. [10, 11], and then details the ML-based schemes suggested to solve the TAS problem. In Sect. 3, we discuss the problem of channel state information (CSI) in FDD-based massive MIMO systems, especially the overhead originating from feeding the measured CSI back to the base station transceiver; we then describe methods that use ML-based schemes to achieve optimal CSI feedback, before explaining compression via convolutional neural networks. Finally, Sect. 4 wraps up the paper.
2 TAS for Single-User MIMO and Untrusted Relay Networks

2.1 System Model

As illustrated in Fig. 1, Joung [9] assumed a single-user MIMO system where the transmitter and the receiver are equipped with N_T and N_R antennas, respectively. At the transmitter side, only N_s RF chains are deployed, where N_s < N_T. The wireless channel between the two communicating sides is represented by a matrix of size N_R × N_T, denoted H and described as follows:
Fig. 1 MIMO system with antenna selection
H = \begin{bmatrix}
h_{11} & h_{12} & \cdots & h_{1N_T} \\
h_{21} & h_{22} & \cdots & h_{2N_T} \\
\vdots & \vdots & \ddots & \vdots \\
h_{N_R 1} & h_{N_R 2} & \cdots & h_{N_R N_T}
\end{bmatrix}    (1)
where each coefficient h_{ij} represents the complex fading coefficient between the ith receiving antenna and the jth transmitting antenna. The magnitude |h_{ij}| of these coefficients is assumed to be Rayleigh distributed. The communication between the two sides is modeled using the equation below:

y = Hx + n    (2)
where y is the N_R × 1 signal received at the receiver side, H is the CSI matrix representing the wireless channel over which communication is carried, x is the N_T × 1 data signal transmitted by the base station, and n represents the additive white Gaussian noise. The aim is to select the best N_s antennas out of the N_T available antennas with respect to a key performance indicator (KPI) and activate them for communication. To this end, Joung [9] considered the use of K-nearest neighbors (KNN) and support vector machine (SVM) to build a CSI classifier that takes a new CSI matrix as input and outputs the subset for which the communication would be optimal. Regarding untrusted relay networks, Yao et al. [10, 11] considered a TDD-based wireless communication system comprising a source equipped with N_T transmitting antennas and serving a single-antenna destination. They assumed the communication to be carried along a non-line-of-sight path with the presence of
Fig. 2 Untrusted relay network
shadowing objects; hence, the communication between the source and the destination is achieved via an untrusted relay node whose primary role is to amplify the attenuated signals before forwarding them to the end user. The wireless channel between the source and the relay is represented by a vector h of size N_T × 1, while the channel between the relay and the destination is represented by a scalar g. Each entry in h represents the complex fading coefficient between a transmitting antenna and the single-antenna relay node. The communication is performed in two phases, as illustrated in Fig. 2. In the first phase, the source sends pre-coded data to the relay, while the destination sends a jamming signal whose primary purpose is to prevent the relay from interpreting the pre-coded data. In the second phase, the relay amplifies the signal and forwards it to the destination. As in Joung [9], the authors considered the source to have only N_s RF modules, where N_s < N_T, and the aim is to find the best antenna subset with respect to a KPI. They suggested the following ML techniques to solve this combinatorial search problem: SVM, Naïve Bayes, KNN and deep neural networks (DNNs).
2.2 Dataset Generation and CSI Classifier Building

Joung [9] generated a training set comprising M channel realizations of size N_R × N_T. As a KPI, the singular value decomposition (SVD) of submatrices was used to evaluate the C_{N_T}^{N_s} possible antenna combinations of each training CSI matrix, looking for the best antenna combination. The purpose of this operation is to label the training CSI matrices against which new channel realizations will be evaluated. After that, each of the M CSI matrices is reshaped into a 1 × N feature vector, where N = N_R · N_T, whose coefficients are the squared magnitudes |h_{ij}|^2 of the complex channel coefficients; each vector is assigned the label found during the evaluation phase. Similarly, Yao et al. [10, 11] generated a training set containing N channel realizations of size N_T × 1 and computed the magnitude |h_i| of each complex fading coefficient to construct the feature vector {|h_1|, |h_2|, ..., |h_{N_T}|, |g|}. As a KPI, they used the secrecy rate and evaluated the C_{N_T}^{N_s} antenna combinations looking for the one maximizing that rate. The training datasets obtained in the three aforementioned papers are normalized prior to evaluating new channel realizations.
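For concreteness, the labeling procedure can be sketched in Python/NumPy as follows. The KPI used here, the smallest singular value of the selected submatrix, is one plausible reading of the Max[SVD(H_min)] criterion listed in Table 1; the sizes and seeding are assumptions made purely for illustration, with the index of the best subset serving as the class label.

```python
import itertools
import numpy as np

def generate_labeled_dataset(M, n_r, n_t, n_s, rng):
    """For each of M Rayleigh channel realizations, evaluate every antenna
    subset of size n_s by an SVD-based KPI and record the index of the
    best subset as the label (in the spirit of [9])."""
    subsets = list(itertools.combinations(range(n_t), n_s))
    features, labels = [], []
    for _ in range(M):
        h = (rng.standard_normal((n_r, n_t)) +
             1j * rng.standard_normal((n_r, n_t))) / np.sqrt(2)
        kpi = [np.linalg.svd(h[:, list(s)], compute_uv=False).min()
               for s in subsets]
        features.append(np.abs(h).ravel() ** 2)   # |h_ij|^2 feature vector
        labels.append(int(np.argmax(kpi)))
    return np.array(features), np.array(labels)

rng = np.random.default_rng(0)
X, y = generate_labeled_dataset(M=2000, n_r=2, n_t=6, n_s=2, rng=rng)
X = (X - X.mean(axis=0)) / X.std(axis=0)          # normalize, as in [9, 10]
```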
Fig. 3 KNN classification
2.2.1 KNN
KNN is a non-parametric method that can be used for classification as well as regression. In the case of multi-antenna systems, KNN is used as a classification method to build a CSI classifier. It is a quite simple algorithm that assigns a sample to the class of the majority of its K closest neighbors, closest for example in terms of Euclidean distance, as illustrated in Fig. 3. From the example above, we can clearly see how the choice of K influences the classification decision. In the case of antenna selection, this bias can be considered a shortcoming that might lead to choosing a less optimal subset of antennas, so K is also subject to optimization. Joung [9] and Yao et al. [10] precede the evaluation of a new channel realization by extracting its feature vector and normalizing it. After that, an optimal antenna subset is found by computing the Euclidean distance between every feature vector in the training dataset and the newly constructed feature vector. Once done, the results are sorted in ascending order; finally, among the K smallest values, the majority label is chosen, and the new CSI matrix is assigned that label.
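A minimal sketch of such a KNN-based CSI classifier, using scikit-learn and the X, y arrays from the dataset sketch above (K = 5 is an arbitrary choice and, as noted, would itself need tuning):

```python
from sklearn.neighbors import KNeighborsClassifier

knn = KNeighborsClassifier(n_neighbors=5, metric='euclidean')
knn.fit(X[:1500], y[:1500])                 # labeled training CSI vectors
predicted = knn.predict(X[1500:])           # best subset index per new CSI
print('selection accuracy:', (predicted == y[1500:]).mean())
```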
2.2.2 SVM
Like KNN, SVM can be used for both classification and regression. But unlike KNN, SVM aims to find a decision boundary, or hyperplane, that separates the data points into different classes, as illustrated in Fig. 4. Many hyperplanes or boundaries could separate the two classes of data; however, SVM aims to find
Fig. 4 SVM classification
a boundary located at a maximal distance from the data points of both classes; this helps classify future data points accurately and with more confidence. Since antenna selection is approached as a multi-class classification problem, Joung [9] and Yao et al. [10] used the one-vs-all (also known as one-vs-rest) method to find the boundary separating each antenna subset from the remaining subsets. As binary classification is used, the M labeled training feature vectors are split into two sub-training sets: one containing the feature vectors having the same label, and the other containing the remaining training samples. For every binary classification, a 1 × M label vector is generated, whose elements are set to 1 for the indices having the same label and 0 otherwise. The boundary between each class and the remaining classes is found by solving a logistic-regression-style problem whose purpose is to find the optimal parameters that minimize a cost function. If there are L classes, then there will be L optimal parameter vectors, i.e., {θ_1, θ_2, ..., θ_L}. After finding all the θ_i, a new channel realization can be evaluated: its feature vector f is extracted and normalized, and then θ_i^T f is computed for each i ∈ {1, ..., L}. The largest value is kept, and the new channel realization is assigned its label.
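The one-vs-rest procedure can be sketched as below. The text frames each binary boundary as a logistic-regression fit, and a linear-SVM variant is obtained by swapping in sklearn.svm.LinearSVC; this is an illustration, not the exact training setup of [9, 10].

```python
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier

ovr = OneVsRestClassifier(LogisticRegression(max_iter=1000))
ovr.fit(X[:1500], y[:1500])
# decision_function gives one score (theta_i^T f) per class; the class
# with the largest score names the selected antenna subset.
scores = ovr.decision_function(X[1500:])
predicted = scores.argmax(axis=1)
```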
2.2.3 Naïve Bayes
The Naïve Bayes algorithm is a probabilistic classifier based on the Bayes formula expressed below, and it is quite simple to implement. To predict the class of an input sample, the algorithm considers each feature of the input sample to contribute independently to the probability that the sample belongs to a given class. It overlooks the correlations between features and whatever relationships might exist between them. In other words, it assumes that the presence of a feature in a
class of samples is unrelated to the presence of the remaining features in that class. The formula below enables us to compute the posterior probability of a new feature vector f belonging to a given class c:

P(c|f) = \frac{P(f|c)\,P(c)}{P(f)}    (3)
To perform transmit antenna selection as suggested by Yao et al. [10], for every new channel realization the feature vector is constructed following the same operations as in training-dataset generation, and the posterior conditional probability that this vector belongs to a given class is computed. This operation is repeated for all classes; among all the computed probabilities, the maximal value is kept, and the vector is assigned that class label.
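A hedged sketch with a Gaussian Naïve Bayes model (the cited works do not specify the likelihood model, so GaussianNB is an assumption):

```python
from sklearn.naive_bayes import GaussianNB

nb = GaussianNB()
nb.fit(X[:1500], y[:1500])
posteriors = nb.predict_proba(X[1500:])   # P(c|f), one column per subset
predicted = posteriors.argmax(axis=1)     # class with maximal posterior
```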
2.2.4 DNN
Deep neural networks are structured into many layers, as illustrated in Fig. 5. Each layer comprises a set of nodes called neurons connected to the nodes of the next layer. The transition from a layer to its successor is achieved through a weight matrix, which controls the mapping between layers. At the output layer, we obtain a hypothesis function, the classifier, that can predict the class label of an unseen input. Yao et al. [11] fed the labeled training dataset to the neural network so that it learns the mapping between each feature vector in the training set and its corresponding label. Since transmit antenna selection is approached as a multi-class classification problem, the output layer comprises L nodes.
Fig. 5 Neural network with multiple nodes at the output layer
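A compact stand-in for the DNN classifier of [11]; the hidden-layer sizes below are assumptions, since the paper's exact architecture is not reproduced here.

```python
from sklearn.neural_network import MLPClassifier

dnn = MLPClassifier(hidden_layer_sizes=(64, 64), max_iter=500)
dnn.fit(X[:1500], y[:1500])               # learn feature-vector -> label map
predicted = dnn.predict(X[1500:])         # one of the L subset labels
```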
Table 1 summarizes the setups used to simulate the ML-based classifiers in the aforementioned papers and presents the pros as well as the cons of the adopted ML algorithms.
2.3 Performance Analysis of ML-Based CSI Classifiers

Besides the advantages and disadvantages summarized in Table 1, the performance of the adopted ML algorithms has to be evaluated with respect to the operations carried out in the training phase as well as in decision making, not to forget their extensibility to large-scale MIMO systems. Regarding the training carried out by Joung [9], Yao et al. [10, 11], Junior et al. [16] and Vu et al. [17], evaluating all possible combinations in search of the best one may be affordable in single-user MIMO systems, where the number of antennas does not exceed eight. But for massive MIMO systems, where the number of antennas deployed at the base station side is large, this becomes computationally heavy and time consuming. One way to accomplish this phase is to offload the training to an edge datacenter, where the evaluation of the training CSI matrices can be performed with parallel computations, accelerating the process of building the CSI classifiers. KNN computes M Euclidean distances and bases its decisions only on the K smallest ones. Computing M Euclidean distances is acceptable in single-user MIMO systems, where the flattened feature vector is relatively small and the dataset comprises only 2000 samples; but for massive MIMO systems this becomes heavy, as both that vector and the training dataset grow. Moreover, the choice of K is crucial, since different values can lead to different outcomes, which makes it a parameter that is also subject to optimization and should not be set statically. In addition, a large memory is required at the base station to store the M labeled training feature vectors. Regarding SVM, binary classification was used to solve the multi-class classification problem, meaning that C_{N_T}^{N_s} binary classifications are required before finding the class label of the best antenna combination; this also implies solving C_{N_T}^{N_s} regression problems in search of the optimal parameters that minimize the cost function. As the number of antennas in a massive MIMO system is large, the number of possible combinations gets larger; consequently, the training phase will require considerable time before converging and, ultimately, finding the hyperplanes that separate the different classes. Not to forget that for both KNN and SVM, the number of samples in the training set must be much greater than the number of possible antenna combinations (M > C_{N_T}^{N_s}). The Naïve Bayes approach computes the probability of a new feature vector given that it belongs to a specific class, which implies computing the frequency of occurrence of each of its coefficients within the training feature vectors. In the worst case, given that all possible classes might have occurred in the training phase, C_{N_T}^{N_s} posterior probabilities have to be computed, which might be affordable in single-user MIMO but not in large-scale MIMO systems. In addition, for some new CSI realizations it is possible to obtain a null conditional probability for all classes and hence become
Table 1 Summary of setups for TAS in single-user MIMO and untrusted relay networks

System                | Single-user MIMO            | Untrusted relay network
{N_T, N_S, N_R}       | {8, 1, 1} and {6, 2, 2}     | {6, 2, 1} and {6, 1, 1}
Training dataset size | 2000 CSI matrices           | 10,000 CSI matrices
Number of classes     | C_8^1 = 8 and C_6^2 = 15    | C_6^1 = 6 and C_6^2 = 15
KPI                   | Max[SVD(H_min)]             | Secrecy rate
Feature vector        | |h_ij|^2                    | |h_ij|
Algorithms            | KNN and SVM                 | SVM, KNN, Naïve Bayes and DNN

Pros
KNN: It is easy to implement and fast, as there is no training phase. There is a variety of metrics: Euclidean, Manhattan, Minkowski distance.
SVM: It is accurate on cleaner and relatively small datasets.
Naïve Bayes: It is simple, fast on small datasets and suitable for real-time applications; in addition, it is not sensitive to irrelevant features.
DNN: It is fast once trained, and its performance improves with more data.

Cons
KNN: It is slow for large datasets, as in the case of TAS for massive MIMO systems. In addition, it requires a high memory to store the training data, not to forget the difficulty of choosing an optimal K.
SVM: It is slow for large datasets and performs poorly on noisier datasets.
Naïve Bayes: It overlooks the dependency between features, especially the correlation between adjacent antennas, and it suffers from the zero-frequency problem.
DNN: It is difficult to choose a network model. It is seen as a black box, which makes interpretability of the output difficult, and it is computationally expensive.
unable to select the optimal antenna subset. This is a problem the Naïve Bayes approach suffers from, and it can be combatted only by increasing the size of the training dataset so that the odds of getting a null conditional probability are substantially minimized. Regarding DNN, the number of neurons in the input layer is equal to N = N_T · N_R, which is manageable in single-user MIMO systems but might not be in massive MIMO ones. In addition, unlike KNN, SVM and Naïve Bayes, the black-box nature of neural networks makes the interpretability of the output class label difficult, and this gets more complicated when there are many hidden layers in the network. Not to forget the choice of a network architecture, which might require adjustments, if not testing diverse architectures, before finding the one that performs best on our problem. In addition to the above points, as the environment over which wireless communication is carried evolves in time, so should training. If the built models were trained on
a given time period, there may exist some antennas that outperform the others over that period but not necessarily over upcoming time periods. This requires re-training the models on a regular basis so that they can keep up with future changes and hence avoid penalizing some antennas. The frequency at which re-training should be carried out depends on the long-term behavior of the wireless channel, which needs to be determined. The re-training rate can be reduced by training on channel realizations acquired at different time instances with enough separation in time.
3 FDD-Based Massive MIMO Systems

For massive MIMO systems, time division duplexing (TDD) is considered the preferred operating mode compared with frequency division duplexing (FDD), since channel estimation is performed only by the base station transceiver on the uplink and the acquired CSI matrix is used to pre-code users' data in the downlink; this has been examined thoroughly by Flordelis et al. [18]. This trend is also motivated by the fact that channel estimation in FDD mode introduces a high overhead: in this mode, the channel needs to be estimated by users in the downlink and then fed back to the base station in the uplink. Moreover, the feedback overhead increases with the number of antennas in the array, degrading system performance. Many researchers have proposed techniques to enable FDD-based massive MIMO systems; most of them aim mainly to reduce the feedback overhead by means of CSI compression. Their concept can be summarized as follows: the channel estimated by UEs on the downlink undergoes some processing before being fed back to the BS. To reduce the dimensionality of the estimated CSI matrix, its key features are extracted and then compressed before being sent back to the base station. In the following part, we give a brief description of the compression schemes adopted by Wen et al. [12], Yang et al. [13], Lu et al. [14] and Liu et al. [15] before explaining in detail the core technique of these schemes, the convolutional neural network.
3.1 System Model

The system model adopted in the aforementioned papers is described as follows: the BS is equipped with N_T antennas and serves a single-antenna UE. The multi-carrier transmission technique known as OFDM is used, and data is transmitted over N_C subcarriers. The communication between the BS and the UEs is modeled by the following equation:

y = Hvx + n    (4)
where y is the signal received at the UE, H is a matrix of size N_C × N_T representing the wireless channel response over the N_C subcarriers between the BS and the served UE, v is the pre-coding vector used to pre-code the signal x in the downlink, and n is the additive white Gaussian noise. The aim is to find the most compact possible representation of H and feed it back to the BS. To fulfill this task, deep learning (DL) techniques were used. Wen et al. [12] used compressed sensing, a technique that enables the recovery of an estimated channel from far fewer samples than required by the Nyquist–Shannon theorem. It assumes the channel to be sparse in some spatial directions, so that it can be represented using short code-words, which reduces the feedback overhead considerably. The suggested CSI sensing and recovery network, called CsiNet, comprises an encoder that serves primarily for dimensionality reduction and a decoder at the BS side to recover the CSI from the received code-words. The architecture of CsiNet is as follows: the estimated CSI matrix at the UE is transformed to the angular-delay domain using the 2D discrete Fourier transform (DFT), and its real and imaginary parts are fed to a convolutional neural network (CNN) for feature extraction. The acquired features are then fed to a fully connected layer to compute the appropriate code-word. At the BS side, the reverse operations are carried out: the received code-word is fed to a fully connected layer to reconstruct the real and imaginary parts of H, RefineNet units are used to refine the reconstructed CSI matrix further, and ultimately the real and imaginary parts of H are combined to start pre-coding data in the downlink. Yang et al. [13] proposed a network that comprises a feature extractor module, a quantization module and an entropy encoder module, as illustrated in Fig. 6. The feature extraction is achieved through a CNN that outputs a low-dimensional representation of the channel matrix, the quantization module quantizes this representation into a discrete-valued vector, and the last module pursues the highest possible compression rate by means of entropy encoding. The same modules are mirrored at the base station: its CSI recovery module comprises a feature decoder as well as an entropy decoder. Figure 6 illustrates the proposed architecture:
Fig. 6 CSI feedback with entropy encoder/decoder
Fig. 7 CSI feedback with encoder–decoder modules
which will enable the system to catch the temporal correlation between channel realizations and hence improve the performance of the hypothesis function. The decoder at the base station side performs the reverse processing. It comprises a feature decoder and a feature decompressing module. The architecture described above is illustrated in Fig. 7. In order to further improve the downlink estimated CSI for FDD systems, Liu et al. [15] proposed a scheme that exploits the correlation that has been proved in the previous works between the downlink and the uplink channels. The encoder separates the magnitude of the CSI matrix from the phase. This latter is quantized and then sent directly to the base station, while the magnitude is fed to a convolutional layer for feature extraction. After reshaping the acquired feature map, it is fed to a fully connected layer to compute the code-word corresponding to the magnitude of the input channel realization. On the other hand, the decoder does the reverse processing but in addition, it exploits the estimated uplink CSI available at the base station side to improve the estimated downlink CSI, and it uses two blocks of residual layers to overcome the vanishing gradient descent problem which prevents artificial neural network from keeping training. At the output level, the estimated magnitude as well as the estimated phase are combined together to recover the channel matrix. Table 2 summarizes the setups used to simulate the DL-based CSI feedback scheme suggested in the papers mentioned above.
3.2 CNN for Compression

The common factor among the feedback schemes suggested by Wen et al. [12], Yang et al. [13], Lu et al. [14], Liu et al. [15] and Madadi et al. [19] is the use of CNNs. In this part, we explain the architecture of a CNN in the context of image analysis, but the same concept holds for massive MIMO systems, since the channel matrix can be seen as a two-dimensional image. A convolutional neural network is a deep learning technique applied to analyze images. It is useful for catching the key features of an image without losing the features that are critical for its accurate recognition. Moreover, it has the ability to capture the spatial as well as temporal dependencies in an image by applying relevant filters. It comprises three main modules: convolutional layers, pooling layers and a fully connected layer.
Table 2 Summary of setups in DL-based CSI feedback schemes

Paper                     | [12]                            | [13]                    | [14]                     | [15]
DL scheme                 | CsiNet                          | DeepCMC                 | RecCsiNet, PR-RecCsiNet  | DualNet-MAG, DualNet-ABS
Channel model             | COST 2100                       | COST 2100               | COST 2100                | COST 2100
Scenario                  | Indoor 5.3 GHz; outdoor 300 MHz | Indoor 5 GHz            | Indoor                   | Indoor 5.3 GHz UL / 5.3 GHz DL; semi-urban outdoor 260 MHz UL / 300 MHz DL
Number of antennas N_T    | 16, 32, 48                      | 32                      | 32                       | 32
Number of subcarriers N_C | 1024                            | 128, 160, 192, 224, 256 | 1024                     | 1024
KPI                       | NMSE and cosine similarity      | NMSE                    | NMSE                     | NMSE
Compression rate          | 1/14, 1/16, 1/32, 1/64          | 1/32, 1/64              | –                        | –
Figure 8 illustrates the convolution operation between an input image of size 5 × 5 × 1 (represented in green) and a kernel of size 3 × 3 × 1 (represented in yellow, with its coefficients written in red). A convolutional layer aims to extract low-level features through a filter called the kernel. Small in size compared with the input image, the kernel hovers over portions of the image of its own size, repeating the process until the entire image is fully covered. Since one convolutional layer is able to catch only low-level features, adding more convolutional layers enables the network to catch high-level features and acquire a full understanding of the input image.

Fig. 8 Convolution operation between an input image and a kernel
Fig. 9 Pooling with a 3 × 3 × 1 kernel
Passing through these convolutional layers can lead to two possible outcomes: a reduction in feature dimensionality by means of the VALID padding technique, or its preservation (or increase) through the so-called SAME padding technique. The output of the convolutional layers is then fed to the next layer, called the pooling layer. This layer aims to reduce the spatial size of the convolved features; the operation it performs extracts the dominant features, which stay invariant under rotation as well as translation, and can be considered a down-sampling operation. There are two types of pooling, as shown in Fig. 10: max pooling, where the kernel keeps the maximal value of the image portion it hovers over, and average pooling, which computes the average value of the image portion in question. Figure 9 illustrates pooling with a 3 × 3 × 1 kernel. The output of the pooling layers is then flattened and fed to a fully connected layer to output the most compact possible representation of the input. The whole architecture of a CNN is illustrated in Fig. 11.

Fig. 10 Two pooling techniques
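The convolution (with VALID padding) and pooling operations just described can be sketched in a few lines of NumPy; as is conventional in CNNs, no kernel flip is applied:

```python
import numpy as np

def conv2d_valid(image, kernel):
    """2-D convolution with VALID padding: the kernel hovers over every
    same-sized patch of the image, producing a smaller feature map."""
    ih, iw = image.shape
    kh, kw = kernel.shape
    out = np.empty((ih - kh + 1, iw - kw + 1))
    for r in range(out.shape[0]):
        for c in range(out.shape[1]):
            out[r, c] = np.sum(image[r:r + kh, c:c + kw] * kernel)
    return out

def max_pool(feature_map, size):
    """Non-overlapping max pooling: keep the largest value per window."""
    h, w = feature_map.shape
    fm = feature_map[:h - h % size, :w - w % size]
    return fm.reshape(h // size, size, w // size, size).max(axis=(1, 3))

image = np.arange(25, dtype=float).reshape(5, 5)   # the 5 x 5 x 1 example
kernel = np.ones((3, 3)) / 9.0                     # a 3 x 3 x 1 kernel
fmap = conv2d_valid(image, kernel)                 # 3 x 3 feature map
print(max_pool(fmap, 3))                           # single pooled value
```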
Fig. 11 Convolutional neural network architecture
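To make the encoder/decoder split concrete, the following PyTorch sketch mimics the overall CsiNet shape described above. It is a simplification: RefineNet stages, quantization and entropy coding are omitted, and all layer sizes are our assumptions rather than the published configurations. With n_t = 32, the 2048-element CSI input is squeezed into a 64-element code-word, a 1/32 compression rate of the kind listed in Table 2.

```python
import torch
import torch.nn as nn

class CsiEncoder(nn.Module):
    """CsiNet-style UE-side encoder sketch [12]: a convolution extracts
    features from the 2-channel (real/imaginary) angular-delay CSI matrix,
    and a fully connected layer maps them to a short code-word."""
    def __init__(self, n_t=32, codeword_len=64):
        super().__init__()
        self.conv = nn.Conv2d(2, 2, kernel_size=3, padding=1)
        self.fc = nn.Linear(2 * n_t * n_t, codeword_len)
    def forward(self, h):                 # h: (batch, 2, n_t, n_t)
        features = torch.relu(self.conv(h))
        return self.fc(features.flatten(start_dim=1))

class CsiDecoder(nn.Module):
    """Mirror stage at the BS: expand the code-word back to the CSI
    dimensions; the refinement stages are omitted here."""
    def __init__(self, n_t=32, codeword_len=64):
        super().__init__()
        self.fc = nn.Linear(codeword_len, 2 * n_t * n_t)
        self.n_t = n_t
    def forward(self, s):
        return self.fc(s).view(-1, 2, self.n_t, self.n_t)

h = torch.randn(8, 2, 32, 32)              # truncated angular-delay CSI
codeword = CsiEncoder()(h)                 # UE side: compress
h_hat = CsiDecoder()(codeword)             # BS side: reconstruct
```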
3.3 Analysis of the ML-Based CSI Feedback Techniques

To obtain the most compact possible representation of CSI prior to feedback, Wen et al. [12], Yang et al. [13], Lu et al. [14], Liu et al. [15] and Madadi et al. [19] used CNNs, a lossy compression technique that tries to capture the main features in a matrix. After the convolved CSI matrix undergoes max pooling, a considerable number of complex channel coefficients are lost and cannot be recovered after feedback: if the pooling filter is of size N × N, then N² − 1 values are discarded and only the maximal value is kept, which is a loss of information. Besides the errors caused by the noisy wireless channel, the quantization suggested by Yang et al. [13] adds extra errors, which might adversely affect the recovery of the right CSI at the base station side. As suggested by Wen et al. [12], Yang et al. [13], Lu et al. [14] and Liu et al. [15], after transforming the channel matrix H to the delay domain, the N_C × N_T matrix is truncated to an N_T × N_T square matrix, assuming that the remaining N_C − N_T rows are all equal to zero; this implies that the signals carried over the N_C − N_T subcarriers undergo deep fading that attenuates them totally, an assumption that is not practical in massive MIMO-OFDM systems. In addition, the assumed spatial sparsity requires some hardware-related conditions to be met, such as deploying a large antenna array in terms of number of antennas as well as aperture, and in most cases it does not hold. Wen et al. [12], Yang et al. [13], Lu et al. [14], Liu et al. [15], Madadi et al. [19] and similar papers aimed mainly to reduce the feedback overhead on the uplink by means of CSI compression, but none of them explained how the CSI matrix was first acquired at the UEs, or even addressed the mapping of pilot symbols over the OFDM resource grid in the downlink. As is known, when pilot-aided channel estimation (PACE) is used in FDD systems, the base station and the UEs share a priori the same cell reference signals, commonly known as pilot signals. These signals enable the users to extract the channel response at the pilot positions and then infer
the channel response of the remaining subcarriers by means of a time–frequency interpolator. In a multi-antenna LTE system, the mapping of pilot symbols over the OFDM resource grid assumes that, within a resource block (RB) spanning 12 subcarriers in frequency and 7 OFDM symbols in time when the normal cyclic prefix is used, when one antenna transmits a pilot symbol over a resource element, the remaining antennas must stay silent or transmit so-called spectral nulls over the same resource element; this is required to avoid interference between the pilot symbols and to ease channel estimation at the UE side. But, owing to the half-and-half rule, which states that if more than half of the available time–frequency resources are used for operations other than data transmission the digital communication system is no longer optimal, the extension of the LTE approach to massive MIMO systems is not possible, as the whole resource block would be consumed by pilot symbols alone. So the feedback overhead is really a concern in FDD-based massive MIMO systems, but we should not overlook how the CSI is acquired in the first place, and especially how the pilot symbols are mapped over the resource grid, as this is a very challenging task. Assuming the availability of CSI at the UE side, the aforementioned papers tried to compress the whole CSI matrix before feeding it back to the base station. An important point we want to emphasize here is the following: given that the base station and the UEs share the pilot signals a priori, why not make the UEs compress only the sub-channels measured at the positions of the pilot symbols, instead of applying the ML-based compression schemes to whole resource blocks? Proceeding this way, the recovery of the whole resource grid at the base station side would still be possible, since expanding the fed-back sub-matrix, after decompressing it, to the whole CSI matrix requires only a time–frequency interpolator. By doing so, a considerable amount of time–frequency resources would be conserved on the uplink, and an important load would be shifted to the base station, given its processing ability compared with battery-powered UEs; this would also contribute to improving the energy efficiency at the UE side.
4 Conclusion

In this paper, we have put the spotlight on two main topics where machine learning techniques have shown their upper hand over classical ones. Antenna selection can be used to turn off some antennas when the traffic demand in a cell falls below a predefined threshold or when there are few active users. Regarding CSI feedback in FDD-based MIMO systems, we believe that pilot-aided channel estimation should be given more attention, as the literature offers no mapping scheme for pilot symbols over the resource grid in this setting. In addition, as machine learning techniques are computationally demanding and need training before converging, we have to note that the implementation of the ML-based schemes is mainly challenged by the energy constraints at the UE side. Consequently, future works have to put UE energy
efficiency as their core objective before proposing schemes that solve issues at the base station side but ultimately come at the expense of the UE.
References 1. Gao X, Edfors O, Tufvesson F, Larsson EG (2015) Massive MIMO in real propagation environments: do all antennas contribute equally? IEEE Trans Commun 1–12 (early access articles) 2. Arash M, Yazdian E, Fazel MS, Brante G, Imran G (2017) Employing antenna selection to improve energy-efficiency in massive MIMO systems. arXiv:1701.00767 [cs.IT] 3. Ouyang C, Yang H (2018) Massive MIMO antenna selection: asymptotic upper capacity bound and partial CSI. arXiv:1812.06595 [eess.SP] 4. Wen C-K, Shih W-T, Jin S (2019) Machine learning in the air. arXiv:1904.12385 [cs.IT] 5. Zhang C, Patras P, Haddadi H (2019) Deep learning in mobile and wireless networking: a survey. arXiv:1803.04311 [cs.NI] 6. Jounga J, Sun S (2016) Two-step transmit antenna selection algorithms for massive MIMO. In: IEEE international conference on communications 7. Gorokhov A, Gore D, Paulraj A (2003) Receive antenna selection for MIMO flat-fading channels: theory and algorithms. IEEE Trans Inf Theory 8. Gharavi-Alkhansari M, Gershman AB (2004) Fast antenna subset selection in MIMO systems. IEEE Trans Signal Process 9. Joung J (2016) Machine learning-based antenna selection in wireless communications. IEEE Commun Lett 10. Yao R, Zhang Y, Qi N, Tsiftsis TA (2018) Machine learning-based antenna selection in untrusted relay networks. arXiv:1812.10318 [eess.SP] 11. Yao R, Zhang Y, Wang S, Qi N, Tsiftsis TA, Miridakis NI (2019) Deep learning assisted antenna selection in untrusted relay networks. arXiv:1901.02005 [eess.SP] 12. Wen C-K, Shih W-T, Jin S (2018) Deep learning for massive MIMO CSI feedback. IEEE Wirel Commun Lett 13. Yang Q, Mashhadi MB, Gunduz D (2019) Deep convolutional compression for massive MIMO CSI feedback. arXiv:1907.02942 [cs.IT] 14. Lu C, Xu W, Shen H, Zhu J, Wang K (2018) MIMO channel information feedback using deep recurrent network. arXiv:1811.07535 [cs.IT] 15. Liu Z, Zhang L, Zhi D (2019) Exploiting bi-directional channel reciprocity in deep learning for low rate massive MIMO CSI feedback. IEEE Wirel Commun Lett 16. de Souza W Jr, Bruza Alves TA, Abrão T (2021) Antenna selection in non-orthogonal multiple access multiple-input multiple-output systems aided by machine learning 17. Vu TX, Nguyen V-D, Nguyen DN, Ottersten B (2021) Machine learning-enabled joint antenna selection and precoding design: from offline complexity to online performance 18. Flordelis J, Rusek F, Tufvesson F, Larsson EG, Edfors O (2017) Massive MIMO performance—TDD versus FDD: what do measurements say? 19. Madadi P, Jeon J, Cho J, Lo C, Lee J, Zhang J (2022) PolarDenseNet: a deep learning model for CSI feedback in MIMO systems
Chapter 15
Cassava Leaf Disease Detection Using Ensembling of EfficientNet, SEResNeXt, ViT, DeIT and MobileNetV3 Models Hrishikesh Kumar, Sanjay Velu, Are Lokesh, Kuruguntla Suman, and Srilatha Chebrolu
1 Introduction

Image classification [2] is the task of understanding the main content of an image, which is easy for humans but hard for machines. Existing approaches to the diagnosis of plant leaf diseases need the assistance of an agricultural specialist to visibly investigate and diagnose plants. These methods are labour-intensive, low-yield and expensive. As an added challenge, successful solutions for diagnosing the disease must perform well under notable constraints, utilising the least possible resources, since some users may only have access to low-quality mobile cameras. Image-based disease diagnosis has made much more impact than the older traditional practices previously in use, as it is efficient, effective and non-subjective [14, 16]. Image processing is one of the major technologies used for localizing infected parts in disease-ridden plant leaves. However, model accuracy faces a bottleneck, as trained models struggle to detect the presence of a disease confidently owing to its visual similarity with other diseases. Recent advancements in machine learning, especially in deep learning, have guided us towards promising performance in disease detection [11]. The proposed model leverages such advances and creates an ensemble model that efficiently classifies and localizes Cassava leaf disease. The model takes a Cassava leaf image as input to identify and localize its disease. The dataset used for the training and validation of this model is the Cassava leaf disease dataset [15], introduced by the Makerere University AI Lab. The dataset is divided into two parts, for (i) training and (ii) validation purposes. The leaf images are RGB coloured. The
training dataset contains around 15,000 images, and the validation dataset consists of around 6000 images; the total number of images in the dataset is around 21,400. The dataset is categorised into five classes: (i) healthy Cassava leaves, (ii) Cassava Bacterial Blight (CBB), (iii) Cassava Brown Streak Disease (CBSD), (iv) Cassava Green Mottle (CGM) and (v) Cassava Mosaic Disease (CMD). The proposed model will accurately detect the presence of disease and swiftly recognise infected plants, helping to preserve crops before the disease inflicts irreparable damage.
2 Related Work

This section discusses the state-of-the-art models used in the proposed ensemble model. The models employed in the ensemble learning are EfficientNet [25], Squeeze-and-Excitation ResNeXt (SEResNeXt) [10, 29], Vision Transformer (ViT) [3], Data-efficient Image Transformer (DeIT) [26] and MobileNetV3 [8].
2.1 EfficientNet

A Convolutional Neural Network (CNN) [1] model's architecture is created with fixed computational resources and then scaled up to improve accuracy. EfficientNet increases a model's performance by carefully balancing network depth, width and resolution. A baseline network is chosen using neural architecture search, and this baseline architecture is scaled up to build a family of models known as EfficientNet. These models outperform other ConvNets [13] in terms of accuracy. The compound scaling method scales the network over width, depth and resolution and can be generalized to existing CNN architectures such as MobileNet [9] and ResNet [7]; however, choosing an efficient baseline network is critical for acquiring the best results. Compound scaling enhances the network's predictive capability by replicating the convolutional operations and network architecture of the baseline network.
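A sketch of the compound scaling rule: a single coefficient φ scales depth, width and resolution together. The α, β, γ constants below are those reported for scaling EfficientNet-B0, while the baseline values are placeholders; treat the whole block as illustrative.

```python
def compound_scale(phi, alpha=1.2, beta=1.1, gamma=1.15,
                   base_depth=18, base_width=1.0, base_res=224):
    """Compound scaling sketch: depth, width and resolution are scaled
    jointly by one coefficient phi, with alpha * beta**2 * gamma**2 kept
    near 2 so that FLOPs roughly double per unit increase of phi."""
    depth = round(base_depth * alpha ** phi)        # number of layers
    width = base_width * beta ** phi                # channel multiplier
    resolution = round(base_res * gamma ** phi)     # input image size
    return depth, width, resolution

for phi in range(4):   # B0-like through B3-like scaling steps
    print(phi, compound_scale(phi))
```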
2.2 Squeeze-and-Excitation Networks

In a CNN, the central building block is the convolution operator, which serves the task of constructing informative features at each layer. The Squeeze-and-Excitation (SE) block [10] is an addition to the convolution operator that adaptively re-calibrates the channel-wise feature responses by explicitly modelling the inter-dependencies between channels. These fundamental blocks are stacked together to form a SENet architecture, which is able to generalize effectively to a
greater extent across different datasets. The SE blocks also bring significant performance improvements to existing state-of-the-art CNNs, at only a slightly higher computational cost.
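As an illustration, the following is a minimal PyTorch sketch of an SE block as described in [10]; the reduction ratio of 16 is the original paper's default, and the module can be attached after any convolutional stage.

```python
import torch.nn as nn

class SEBlock(nn.Module):
    """Squeeze-and-Excitation block [10]: squeeze by global average pooling,
    excite through a two-layer bottleneck, then rescale the channels."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x):
        b, c, _, _ = x.shape
        s = x.mean(dim=(2, 3))            # squeeze: (B, C) channel statistics
        w = self.fc(s).view(b, c, 1, 1)   # excitation weights in (0, 1)
        return x * w                      # channel-wise re-calibration
```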
2.3 ResNeXt

ResNeXt50 [29] is an advanced computer vision model from the ResNet family. The ResNet-50 model of that family is a CNN with a depth of 50 neural layers; it can be pre-trained and is able to classify images into 1000 different categories. The ResNeXt architecture is simple and highly modularized for the classification of images. The network is created by stacking a building block that aggregates a set of transformations with the same topology. This unique design with only a few hyper-parameters forms a homogeneous, multi-branch architecture, and leads to the discovery of a new dimension called cardinality [29]. The 'NeXt' in ResNeXt refers to this newly found dimension, which improves the classifier efficiency of the model even under constrained environments simply by increasing the cardinality of the architecture, in contrast to going into deeper layers.
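Cardinality is typically realised with grouped convolutions, as in this hedged sketch of a ResNeXt-style bottleneck (the residual skip connection is omitted for brevity, and the channel widths are illustrative, not the exact ResNeXt50 configuration):

```python
import torch.nn as nn

def resnext_block(in_ch=256, width=128, cardinality=32):
    # 1x1 reduce -> 3x3 grouped conv (32 parallel paths) -> 1x1 expand
    return nn.Sequential(
        nn.Conv2d(in_ch, width, kernel_size=1, bias=False),
        nn.BatchNorm2d(width), nn.ReLU(inplace=True),
        nn.Conv2d(width, width, kernel_size=3, padding=1,
                  groups=cardinality, bias=False),
        nn.BatchNorm2d(width), nn.ReLU(inplace=True),
        nn.Conv2d(width, in_ch, kernel_size=1, bias=False),
        nn.BatchNorm2d(in_ch),
    )
```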
2.4 ViT

ViT [3] is a pure Transformer [27] applied directly to sequences of image patches; it shows high performance on image classification tasks without requiring CNNs. ViT models are generally pre-trained on large datasets and then fine-tuned on smaller, downstream tasks. Fine-tuning at a higher resolution than the one used for pre-training proves especially beneficial. The result of this technique is a larger effective sequence length, as the patch size is kept the same while higher-resolution images are fed to the network. The ViT is thus able to handle arbitrary sequence lengths, and, in addition, the pre-trained position embeddings are adapted to maintain the integrity of the ViT model.
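The sketch below shows this patch-sequence view in practice; the ViT-Base patch size of 16 and embedding dimension of 768 are assumptions for illustration. At the 384 × 384 resolution used in this work the sequence grows to 576 tokens, which is why the pre-trained position embeddings must be adapted.

```python
import torch
import torch.nn as nn

patch, dim = 16, 768
to_patches = nn.Conv2d(3, dim, kernel_size=patch, stride=patch)

img = torch.randn(1, 3, 384, 384)                    # one 384x384 RGB image
tokens = to_patches(img).flatten(2).transpose(1, 2)  # (1, 576, 768)
print(tokens.shape)                                  # (384/16)^2 = 576 patches
```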
2.5 DeiT

DeiT [26], introduced by Touvron et al., is a convolution-free transformer model that does not rely on the statistical priors of images that CNNs build in. DeiT models are an advanced version of ViT models that do not require any external data for the purpose of training. This makes DeiT one of the finest models for image classification in constrained-resource environments. The training strategy
of DeiT is a plus point, as it can simulate training on a much larger dataset than the computing environment may allow. DeiT contains an additional distillation token. Since the performance of ViT decreases when it is trained on insufficient data, distillation becomes an easy way to deal with this problem. Distillation is of two types: soft distillation and hard distillation. Hard labels can also be converted to soft ones via label smoothing, as in InceptionV3 [24].
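The two variants can be written as losses combining the ground-truth labels with the teacher's output, as in the sketch below; the temperature and the equal mixing weight are illustrative choices, not DeiT's exact hyper-parameters.

```python
import torch.nn.functional as F

def soft_distill_loss(student_logits, teacher_logits, labels, tau=3.0):
    # KL divergence to the teacher's temperature-softened distribution
    kd = F.kl_div(F.log_softmax(student_logits / tau, dim=1),
                  F.softmax(teacher_logits / tau, dim=1),
                  reduction="batchmean") * tau ** 2
    return 0.5 * F.cross_entropy(student_logits, labels) + 0.5 * kd

def hard_distill_loss(student_logits, teacher_logits, labels):
    # The teacher's hard decision serves as a second set of labels
    hard = teacher_logits.argmax(dim=1)
    return 0.5 * (F.cross_entropy(student_logits, labels)
                  + F.cross_entropy(student_logits, hard))
```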
2.6 MobileNetV3

MobileNets [9] are designed to handle a wide range of computer vision tasks: they are suitable for object detection, face attribute recognition, fine-grained classification, and large-scale geo-localization. They were primarily designed for use in mobile phones as well as embedded vision applications. These models use a streamlined architecture built on depthwise separable convolutions, which yields lightweight deep neural networks. MobileNetV3 uses two global hyper-parameters that trade off efficiently between accuracy and latency; with these hyper-parameters and various other usage constraints, MobileNet can decide the correct model size for the prescribed application.
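A depthwise separable convolution factors a standard convolution into a per-channel spatial filter followed by a 1 × 1 pointwise mixing step, as in the minimal sketch below; ReLU6 and batch normalization follow MobileNet conventions, while the interface itself is illustrative.

```python
import torch.nn as nn

def depthwise_separable(in_ch, out_ch, stride=1):
    return nn.Sequential(
        nn.Conv2d(in_ch, in_ch, kernel_size=3, stride=stride,
                  padding=1, groups=in_ch, bias=False),      # depthwise (per-channel)
        nn.BatchNorm2d(in_ch), nn.ReLU6(inplace=True),
        nn.Conv2d(in_ch, out_ch, kernel_size=1, bias=False),  # pointwise (1x1) mixing
        nn.BatchNorm2d(out_ch), nn.ReLU6(inplace=True),
    )
```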
2.7 Stacked Ensemble Learning

Stacked generalization, or stacking [28], involves integrating the predictions obtained from various machine learning models evaluated on the same dataset. In stacking, the constituent models generally differ from one another and are fitted on the same data. Unlike boosting [19], stacking uses a solitary model to figure out how to weight the predictions of the different models. The design incorporates two levels: base models, referred to as level-0 models, and a meta-model that joins the predictions obtained from the base models. The base models are trained on the complete training data, while the training dataset for the meta-model is typically prepared from out-of-sample contributions of the base models, e.g., via k-fold cross-validation [17, 18]. Stacking is suitable when the various models have different skills, i.e., when the outcomes they predict, or the errors they make, possess small correlation. Other ensemble learning [4] algorithms may also be utilized as base models.
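A toy illustration of this scheme on synthetic data is sketched below using scikit-learn (all model choices here are illustrative): two level-0 base models feed a logistic-regression meta-model, and 5-fold cross-validation produces the out-of-fold predictions on which the meta-model is trained.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

X, y = make_classification(n_samples=500, n_classes=3, n_informative=6,
                           random_state=0)
stack = StackingClassifier(
    estimators=[("rf", RandomForestClassifier(random_state=0)),
                ("svc", SVC(probability=True, random_state=0))],  # level-0
    final_estimator=LogisticRegression(max_iter=1000),            # meta-model
    cv=5)                              # out-of-fold predictions via 5-fold CV
print(stack.fit(X, y).score(X, y))
```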
3 Proposed Method

This section describes the proposed model for the classification and localization of Cassava leaf disease using DeiT, EfficientNet, ViT, SEResNeXt, CropNet [6] and MobileNetV3. The architecture of the model is discussed with an explanation of pipeline-1 and pipeline-2, and is depicted in Fig. 1.
3.1 Architecture

The visual identification and classification of the four diseases are done by the ensemble model proposed in this paper. Moreover, transfer learning is employed for training the model pipelines. The
Fig. 1 The proposed model architecture for Cassava leaf disease detection
complete end-to-end pipeline design of the proposed model is shown in Fig. 1. Following are the two pipelines defined for the classification tasks.

Pipeline 1: The Cassava leaf dataset is used for training the models. The models used in this pipeline are EfficientNet, SEResNeXt, ViT and DeiT. All the models are first trained on the dataset, and then their neural weights are recorded and kept for use during inference. This pipeline has two modes with different architectures: mode 1 is suitable for low-quality handheld devices, while mode 2 uses stacking and is more efficient and more powerful. The classification done by these models falls under the category of multi-label classification.

Mode 1: This mode contains only one model, DeiT, for inference purposes. The images are trained at dimensions 384 × 384 × 3. These image transformers do not require a massive quantity of records and can be trained on a smaller amount of data. A linear transformation is applied to DeiT to further increase its efficiency. The classification efficiency of DeiT is lower than that of mode 2, but it can perform under low resource availability.

Mode 2: This mode consists of three models stacked in sequence for inference purposes. A linear transformation is applied to the models to improve the stacking efficiency. The first of the three models in the stack, EfficientNet, works on images of dimensions 512 × 512 × 3; it uses a scaling approach that uniformly scales all dimensions of depth, width and resolution by means of a compound coefficient, and directly starts the inference of each image after the scaling. The second model in the stack, ViT, uses images of dimensions 384 × 384 × 3. The ViT model represents an entire image as a sequence of image patches, analogous to the sequence of word embeddings used when applying transformers to text, and directly predicts class labels for the image. The third and last model in this stacking, SEResNeXt, requires images of dimensions 512 × 512 × 3. The SEResNeXt structure is an extension of the deep residual network that replaces the usual residual block with one that leverages a split-transform-merge strategy. This model uses segmentation when inferencing the images, thus reducing the chances of any overfitting or misrepresentation of output.

Pipeline 2: In the second pipeline, a pre-trained model named the CropNet classifier for Cassava leaf disease is used. The CropNet classifier is built on the MobileNetV3 architecture and shows an accuracy of 88% on Cassava leaf disease classification. A sequential MobileNetV3 model [23] has been used by the proposed architecture to create a CropNet replica; a weight loader captures and loads the classifier's neural weights into the sequential model of input dimensions 224 × 224 × 3. The model thus formed is used to create pipeline-2 of the ensemble model. This replication is done because the CropNet classifier classifies images into six classes (CBB, CBSD, CGM, CMD, healthy, or unknown), in contrast to the five classes of pipeline-1. Slicing of the prediction values is performed to remove the unknown class and obtain a suitable number of classes.
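A sketch of this slicing step is given below using the published TF Hub model of [6]; it assumes the 'unknown' class occupies the last output index (matching the class order quoted above) and that inputs are 224 × 224 × 3 images.

```python
import tensorflow as tf
import tensorflow_hub as hub

# Wrap the pre-trained CropNet classifier [6] in a sequential model and
# slice off the sixth ("unknown") output so it aligns with the five
# classes of pipeline-1.
cropnet = hub.KerasLayer(
    "https://tfhub.dev/google/cropnet/classifier/cassava_disease_V1/2")

pipeline2 = tf.keras.Sequential([
    tf.keras.layers.InputLayer(input_shape=(224, 224, 3)),
    cropnet,
    # keep (CBB, CBSD, CGM, CMD, healthy); drop "unknown"
    tf.keras.layers.Lambda(lambda p: p[:, :5]),
])
```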
4 Experimental Results and Analysis

The complete experimental environment is the Kaggle Python kernel. Compute Unified Device Architecture (CUDA) [5] is used for the training of image-level detection and classification in both pipelines. The evaluation metric used here is classification accuracy. Test Time Augmentation (TTA) [20] is performed in this work, as it helps to increase the efficiency of the model at the inference stage: TTA creates multiple augmented versions of each test image during inference. The data pre-processing involves various techniques based on the data flow. First, the training data and validation data are segregated, and then transformation techniques are applied to both sets. The data is resized as per the specific model requirements. A random 50% of each set is picked each time, and the selected images are (i) transposed, (ii) flipped vertically or horizontally, (iii) rotated, and then subjected to (iv) hue-saturation and (v) brightness and contrast changes. After this procedure the data is normalised so that the system does not create redundant data. The batch size for the training data is set to 128. The dataset thus created is provided to mode-1 and mode-2 of pipeline-1, where mode-1 contains the DeiT model and mode-2 consists of EfficientNet, ViT and SEResNeXt, with stacking performed on the output of mode-2. Either of these modes can be chosen to serve pipeline-1. The data is passed through pipeline-1, which provides its result to the ensemble model. Pipeline-2 uses a direct way of image classification and prediction; as this model also predicts images of the unknown class, its output dimensions are modified to attach it to pipeline-1. On obtaining the results from both pipelines, weighted-box-fusion [22] is employed after the pipelining, where the results from both pipelines are merged using a weighted-average technique. The weights of the pipelines are assigned on the basis of their sensitivity scores. Thus the final ensemble model is formed, in which a stacked model pipeline and a pre-trained model pipeline are combined to provide better results than the individual model architectures.

Experiments are conducted on the Cassava leaf disease dataset using the proposed architecture to identify and localise the leaf diseases. Table 1 shows the results of the seven experiments conducted. Each experiment is conducted with a unique set of values for hyper-parameters such as the optimisers, TTA and batch size. The last column of Table 1 shows the classification score obtained for each experiment. Among all the experiments conducted, Experiment 1 has achieved the highest classification score of 90.75%. These experiments can be found in the Kaggle Cassava leaf disease classification competition [12]. Figure 2 shows the sample input images. Figure 3 depicts the localization of leaf disease, represented with bounding boxes, and the classification with identification of the type of disease. The sixth image in Fig. 3 does not have any bounding box as it is identified as a healthy leaf by our proposed model. Moreover, this technique also provides better results in terms of time taken to train. The final ensemble model is efficient and shows on-par results.
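The inference-time steps above can be summarised in the following sketch; the augmentations mirror the training-time transforms, `predict_fn` stands for any of the trained models, and the fusion weights are illustrative stand-ins for the sensitivity-based weights described in the text.

```python
import numpy as np

def tta_predict(predict_fn, image, n_aug=5, seed=0):
    """Average predictions over randomly augmented copies of one image."""
    rng = np.random.default_rng(seed)
    preds = []
    for _ in range(n_aug):
        aug = image
        if rng.random() < 0.5:
            aug = np.transpose(aug, (1, 0, 2))  # transpose H and W
        if rng.random() < 0.5:
            aug = aug[:, ::-1]                  # horizontal flip
        if rng.random() < 0.5:
            aug = aug[::-1, :]                  # vertical flip
        preds.append(predict_fn(aug))
    return np.mean(preds, axis=0)

def fuse_pipelines(p1, p2, w1=0.6, w2=0.4):
    """Weighted-average fusion of the two pipelines' class probabilities."""
    return (w1 * np.asarray(p1) + w2 * np.asarray(p2)) / (w1 + w2)
```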
Table 1 Experimental results obtained by fine-tuning the hyper-parameters of the proposed model
Experiment | Optimizer per model (DeiT, EfficientNet-B4, SEResNeXt50, ViT, MobileNetV3) | Batch size | TTA value | Classification score
Experiment 1 | Adam, Adam, Adam, Adam, Adam | 64 | 5 | 0.9075
Experiment 2 | SGD, Adam, Adam, Adam, Adam | 128 | 10 | 0.8993
Experiment 3 | SGD, SGD, SGD, SGD, Adam | 128 | 10 | 0.9073
Experiment 4 | SGD, Adam, Adam, Adam, Adam | 64 | 5 | 0.9002
Experiment 5 | SGD, Adam, Adam, Adam, Adam | 128 | 8 | 0.8998
Experiment 6 | SGD, Adam, Adam, Adam, Adam | 64 | 10 | 0.8991
Experiment 7 | SGD, Adam, Adam, Adam, Adam | 128 | 10 | 0.9068

SGD stochastic gradient descent, TTA test time augmentation
The results obtained by the proposed method are compared with the CNN [21] and the CropNet Cassava leaf disease classifier [6]. Table 2 shows the classification scores obtained by each of the methods. CNN obtained a score of 85.3% and CropNet achieved a score of 88.01%, whereas the proposed model achieved the highest score of 90.75%.
Fig. 2 The sample input images given to the proposed model
Fig. 3 Localization and classification of the sample input images

Table 2 Comparison of the proposed method with other models

Model | Classification score
CNN | 0.853
CropNet Cassava classifier | 0.8801
Proposed method | 0.9075
5 Conclusion

In this work, an ensemble deep learning model having two pipelines is proposed for automatic Cassava leaf disease identification and classification using coloured images. The model is able to classify all four primary diseases of a Cassava leaf: CBB, CBSD, CGM and CMD. The model is a standalone deep learning network and does not require any external applications for use. The experimental results show that this model can be used to build a classifier that efficiently predicts the presence of Cassava leaf diseases. The model will be helpful for amateur as well as experienced farmers, as it eliminates the need for human assistance and the related complications. The categorization accuracy achieved by the proposed model is 90.75%. The architecture of the model is simple enough to be used in software demanding real-time performance. Likewise, the model is able to detect other leaf diseases from coloured images if additional training data is provided. It can therefore help farmers in Cassava fields to get appropriate assistance in the identification and classification of diseases. These precautionary measures have the potential to improve the management, survival, and prospects of Cassava plant fields.
References
1. Albawi S, Mohammed TA, Al-Zawi S (2017) Understanding of a convolutional neural network. In: Proceedings of the international conference on engineering and technology, pp 1–6
2. Chan TH, Jia K, Gao S, Lu J, Zeng Z, Ma Y (2015) PCANet: a simple deep learning baseline for image classification? IEEE Trans Image Process 24(12):5017–5032
3. Dosovitskiy A, Beyer L, Kolesnikov A, Weissenborn D, Zhai X, Unterthiner T, Dehghani M, Minderer M, Heigold G, Gelly S, et al (2020) An image is worth 16x16 words: transformers for image recognition at scale. arXiv preprint arXiv:2010.11929
4. Ganaie M, Hu M, et al (2021) Ensemble deep learning: a review. arXiv preprint arXiv:2104.02395
5. Ghorpade J, Parande J, Kulkarni M, Bawaskar A (2012) GPGPU processing in CUDA architecture. arXiv preprint arXiv:1202.4347
6. Google: TensorFlow CropNet Cassava disease classification model. https://tfhub.dev/google/cropnet/classifier/cassava_disease_V1/2. Accessed 20 Jul 2022
7. He K, Zhang X, Ren S, Sun J (2016) Deep residual learning for image recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), pp 770–778
8. Howard A, Sandler M, Chen B, Wang W, Chen LC, Tan M, Chu G, Vasudevan V, Zhu Y, Pang R, Adam H, Le Q (2019) Searching for MobileNetV3. In: Proceedings of the IEEE/CVF international conference on computer vision, pp 1314–1324
9. Howard AG, Zhu M, Chen B, Kalenichenko D, Wang W, Weyand T, Andreetto M, Adam H (2017) MobileNets: efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861
10. Hu J, Shen L, Sun G (2018) Squeeze-and-excitation networks. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 7132–7141
11. Janiesch C, Zschech P, Heinrich K (2021) Machine learning and deep learning. Electron Mark 31(3):685–695
12. Kumar H (2022) Kaggle Cassava leaf disease detection. https://www.kaggle.com/code/hrishikesh1kumar/cassava-leaf-disease-detection. Accessed 20 Jul 2022
13. Liu Z, Mao H, Wu CY, Feichtenhofer C, Darrell T, Xie S (2022) A ConvNet for the 2020s. arXiv preprint arXiv:2201.03545
14. Mohanty SP, Hughes DP, Salathé M (2016) Using deep learning for image-based plant disease detection. Front Plant Sci 7:1419
15. Mwebaze E, Gebru T, Frome A, Nsumba S, Tusubira J (2019) Cassava 2019 fine-grained visual categorization challenge. arXiv preprint arXiv:1908.02900
16. Ramcharan A, Baranowski K, McCloskey P, Ahmed B, Legg J, Hughes DP (2017) Deep learning for image-based cassava disease detection. Front Plant Sci 8:1852
17. Raschka S (2018) Model evaluation, model selection, and algorithm selection in machine learning. arXiv preprint arXiv:1811.12808
18. Refaeilzadeh P, Tang L, Liu H (2009) Cross-validation. Encycl Database Syst 532–538
19. Schapire RE (2003) The boosting approach to machine learning: an overview. In: Nonlinear estimation and classification, pp 149–171
20. Shanmugam D, Blalock D, Balakrishnan G, Guttag J (2021) Better aggregation in test-time augmentation. In: Proceedings of the IEEE/CVF international conference on computer vision, pp 1214–1223
21. Shkliarevskyi M (2022) Kaggle Cassava leaf disease: Keras CNN prediction. https://www.kaggle.com/code/maksymshkliarevskyi/cassava-leaf-disease-keras-cnn-prediction. Accessed 20 Jul 2022
22. Solovyev R, Wang W, Gabruseva T (2021) Weighted boxes fusion: ensembling boxes from different object detection models. Image Vis Comput 107:104117
23. Sutskever I, Vinyals O, Le QV (2014) Sequence to sequence learning with neural networks. Adv Neur Inform Process Syst 27
24. Szegedy C, Vanhoucke V, Ioffe S, Shlens J, Wojna Z (2016) Rethinking the inception architecture for computer vision. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 2818–2826
25. Tan M, Le Q (2019) EfficientNet: rethinking model scaling for convolutional neural networks. In: Proceedings of the international conference on machine learning, pp 6105–6114
26. Touvron H, Cord M, Douze M, Massa F, Sablayrolles A, Jégou H (2021) Training data-efficient image transformers and distillation through attention. In: Proceedings of the international conference on machine learning, pp 10347–10357
27. Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, Kaiser LU, Polosukhin I (2017) Attention is all you need. In: Advances in neural information processing systems, vol 30. Curran Associates, Inc.
28. Wolpert DH (1992) Stacked generalization. Neur Netw 5(2):241–259
29. Xie S, Girshick R, Dollár P, Tu Z, He K (2017) Aggregated residual transformations for deep neural networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 5987–5995
Chapter 16
Scene Segmentation and Boundary Estimation in Primary Visual Cortex
Satyabrat Malla Bujar Baruah, Adil Zafar Laskar, and Soumik Roy
1 Introduction

Previous research has shown plenty of evidence of neurons' strong computational capacities [24]. Neurons have distinct morphologies that are tuned to a specific frequency of inputs in order to retrieve unique information. Distinct retinal ganglion cell (RGC) morphologies in the visual cortex are linked with exact connectome specificity, affecting the primary visual cortex's global response [3, 34]. However, very little is known about the role of dendritic arbors' electrophysiology and topologies in producing such complicated responses. The majority of the investigated neural networks are defined as learning systems because they focus on the mathematical interpretation of global behavior rather than the local dynamics affecting global responses [6, 30]. Basic operations processed in the striate cortex of primate vision, such as edge recognition, scene segmentation, multi-resolution feature extraction, depth perception, and motion estimation, are suspected to be intrinsic behavior [1, 35] rather than an exhaustive learning process [11, 26]. The processing of these basic visual activities is thought to be aided by a broad variety of neuron morphologies [12, 16]. In this work, an attempt is made to link parasol RGC physiology, nonlinear dynamics, and connectome specificity to magnocellular scene segmentation, which aids in boundary prediction and object-tracking-type activity. To replicate their local behavior,
morphologically detailed parasol RGCs were constructed and modeled, including active and passive membrane dynamics. To simulate the global responses in these layers, a peculiar arrangement of midget and parasol RGCs forming RGC layers has been built, as reported in in vivo investigations.
2 Method

The morphology of parasol cells and midget cells that project information to the magnocellular and parvocellular layers of the visual cortex via specialized parallel pathways has long been known [9, 23, 33]. Identifying the anatomical and functional relationships between the component neurons at consecutive points in the path is a key problem in understanding the organization and function of parallel visual pathways [15, 35]. Our model employs unique parasol RGCs connected to associated bipolar cells, along with nonlinear neural electrophysiology, to drive the scene segmentation functionalities. Natural images in 'tiff', 'png', and 'jpg' formats are fed to the model and converted to spatio-temporal square pulses by bipolar cells. A temporal signal with an offset of 10 ms, pulse width of 150 ms, and total temporal length of 250 ms is generated by the bipolar cells, considering the average response time of primate vision. The amplitude of the temporal signal is scaled proportionally to the signal intensity within the RGC sensitivity range of 1024–1016 nA [6]. These spatio-temporal signals are fed to the scene segmentation network, which generates a segmentation map of the visual stimuli that is sent to the magnocellular layer. An orientation-selective RGC layer in the magnocellular region then extracts the edge boundary from the segmented image. Details of the RGC morphology and connectome specificity with bipolar cells, as well as boundary estimation in the visual cortex, are discussed in Sects. 2.1 and 2.2.
2.1 The RGC Morphology

The proposed framework emphasizes the computational role of unique neuron morphology, particularly that of parasol RGCs, in shaping visual scene segmentation and object boundary estimation. A moderate receptive field size has been taken [10, 18, 19] to optimize the computational complexity of the model; the morphology is shown in Fig. 1. The RGC morphology in Fig. 1a is used for the scene segmentation model and the RGC morphology in Fig. 1b is used for the boundary estimation model, where the junctions, cell body synapses, and dendritic fibers are color encoded. Similar color at the synapses indicates connection of the RGC solely with ON bipolar cells. Junctions and soma are modeled as summing nodes that perform temporal summation and re-encoding of incoming cumulative signals. Re-encoding at localized active ion channels [29, 31] has been modeled using Izhikevich's membrane model and is given as
Fig. 1 Parasol RGC morphologies used in striate cortex and magnocellular layer
$$C\,\frac{dv}{dt} = k\,(v - v_r)(v - v_t) - u + I \tag{1}$$

$$\frac{du}{dt} = a\,\big[b\,(v - v_r) - u\big]; \qquad \text{if } v \ge v_t:\ v \leftarrow c,\ u \leftarrow u + d \tag{2}$$
where v is the membrane potential, I is the stimulus to the neuron, u is the recovery current, v_r is the resting membrane potential, and v_t is the threshold potential. Different spiking activities such as regular spiking, chattering, and intrinsic bursting are controlled by the parameters a, b, c, d, k, C. Izhikevich's membrane model [21, 22] is used in our proposed model because of its low computational complexity and its robustness in mimicking mammalian neurodynamics. The passive dendritic branches in the RGC morphology facilitate decremental conduction of the propagating signal. Decremental conduction in a passive fiber has been modeled using the equations

$$I_{in}^{Total} = I_t + I_{out} \tag{3}$$

$$I_{in}^{Total} = \frac{V_{out} - V_{in}}{R_{lon}} \tag{4}$$

$$I_t + C_m\,\frac{dV_{out}}{dt} + G_L\,(V_{out} - E_L) = 0 \tag{5}$$
from our previously published modeling work [4, 5]. Here V_in is the action potential generated by the localized active region, V_out is the membrane potential at the junction (with initial membrane potential equal to the resting membrane potential), C_m is the equivalent capacitance of the fiber, R_lon is the axial resistance, G_L is the membrane leakage conductance, E_L is the equilibrium potential due to the leakage ion channels, I_in^Total is the total current propagating toward the nodes/soma, I_t is the transmembrane current due to membrane dynamics, and I_out is the total delivered current.
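For illustration, a forward-Euler sketch of Eqs. (1)-(2) is given below, using the bursting parameters listed later in Table 1; the time step and stimulus amplitude are illustrative choices, and the reset follows this paper's condition v ≥ v_t.

```python
import numpy as np

def izhikevich(I, T=250.0, dt=0.1, a=0.01, b=5.0, c=-56.0, d=130.0,
               C=150.0, k=1.2, vr=-65.0, vt=-35.0):
    """Integrate Eqs. (1)-(2) with forward Euler; returns v(t) in mV."""
    n = int(T / dt)
    v = np.full(n, vr)
    u = np.zeros(n)
    for i in range(n - 1):
        v[i+1] = v[i] + dt * (k*(v[i]-vr)*(v[i]-vt) - u[i] + I) / C
        u[i+1] = u[i] + dt * a * (b*(v[i]-vr) - u[i])
        if v[i+1] >= vt:              # spike: apply the resets of Eq. (2)
            v[i+1], u[i+1] = c, u[i+1] + d
    return v

trace = izhikevich(I=600.0)           # 250 ms response to a constant stimulus
```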
2.2 Connectome Specificity

Connectome specificity refers to the connectivity of parasol RGCs with bipolar cells in the context of the proposed framework. Shown in Fig. 2 are the excitatory-type connectivity patterns of the RGC morphology of Fig. 1a with ON bipolar cells, which are connected in oriented patterns [2, 8, 17, 20, 28]. A value of 1 in the connectivity matrix shown in Fig. 2 corresponds to excitatory connectivity with the ON bipolar cell, a value of 2 suggests two distal dendrites from opposite parent dendrites connected to the ON bipolar cell, and a value of 0 suggests no connectivity. Scene-segmentation-type RGC morphologies connected in oriented patterns normalize the small gradient changes corresponding to fine features and encode them in terms of their spiking frequency. Four orientation bands optimize the small local gradient change corresponding to a specific orientation, and their outputs are then passed through a max-pool operator to generate the segmentation-type response. The segmentation-type images are then fed to the boundary detection network to extract the boundary information. With the minor gradients corresponding to fine features removed, when the segmented response is passed through the orientation-selective RGC network, the network tracks the major gradients corresponding to the boundaries of objects. The boundary estimation network employs the parasol RGC shown in Fig. 1b with excitatory as well as inhibitory connectivity in the specific oriented patterns shown in Fig. 3. A value of 1 in the connectivity matrix suggests excitatory connectivity with the RGC, a value of −1 suggests inhibitory connectivity, and a value of
Fig. 2 Parasol RGC connectivity with ON bipolar cells for normalizing gradient change along specific orientations
Fig. 3 Connectivity matrix for the boundary-detection-type RGC shown in Fig. 1b with segmentation-type response, with orientation specificity to 0°, 45°, 90° and 135°
0 suggests no connectivity. These connectivity patterns detect the gradient variation corresponding to the 1s and −1s and start firing at high frequency when the gradient is very high.
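In image-processing terms, these oriented ±1 connectivity patterns behave like oriented gradient detectors. The sketch below is a simplified functional analogue using illustrative 3 × 3 kernels; the actual connectivity matrices of Figs. 2 and 3 are larger and follow the RGC morphology.

```python
import numpy as np
from scipy.signal import convolve2d

# Illustrative +1/-1 kernels at the four orientations of Fig. 3
kernels = {
      0: np.array([[ 1,  1,  1], [ 0,  0,  0], [-1, -1, -1]]),
     45: np.array([[ 0,  1,  1], [-1,  0,  1], [-1, -1,  0]]),
     90: np.array([[ 1,  0, -1], [ 1,  0, -1], [ 1,  0, -1]]),
    135: np.array([[ 1,  1,  0], [ 1,  0, -1], [ 0, -1, -1]]),
}

def boundary_response(image):
    """Max over the orientation bands, mimicking the max-pool step."""
    responses = [np.abs(convolve2d(image, k, mode="same"))
                 for k in kernels.values()]
    return np.max(responses, axis=0)
```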
3 Simulation Results and Discussion

The suggested model was simulated using the Python 3.6 interpreter, with packages like 'OpenCV' and 'scikit' for basic image operations and the 'scipy.integrate' package for solving the differential equations; the 'matplotlib' package was used for plotting response images and other plot generation. Natural images in the 'tif', 'png', and 'jpg' formats, collected from the Berkeley segmentation database (BSDS500), were used as input to the suggested model to stimulate the photoreceptor cells. Shown in Fig. 4 are some of the input images fed to the proposed model and their corresponding segmented responses and boundary estimation responses. As can be seen from the 'Segmented' responses of Fig. 4a, c, the model successfully maps most of the boundary regions of the objects, whereas the responses corresponding to Fig. 4c, d show some texture extraction, which is due to the receptive field size of the parasol RGCs in the segmentation layer. Increasing the receptive field size of the segmentation layer would remove most of the local gradient change, making the boundary regions more prominent. Thus, a larger receptive field projecting to the magnocellular region seems necessary for better normalization of fine textures. But due to the computational complexity of the model, larger receptive fields are not considered here and remain of interest for our future work. Table 1 lists the model parameters for modeling the passive membranes' low-
Fig. 4 Segmentation and boundary responses of the segmentation type RGC layer and boundary estimation RGC layer to input images (a–d)
Table 1 Izhikevich bursting and chattering membrane parameters and passive membrane propagation parameters

Izhikevich parameter | Bursting | Chattering
a | 0.01 | 0.03
b | 5 | −2
c | −56 | −50
d | 130 | 100
C | 150 nF | 100 nF
k | 1.2 | 0.7
v_r | −65 mV | −60 mV
v_t | −35 mV | −30 mV

Propagation parameter | Value
C_m | 1 µF
R_lon | 2
E_L | −65 mV
G_L | 10⁻⁶ S
Table 2 Performance comparison of the proposed framework with existing state-of-the-art models on the BSDS500 database

References | Methods | ODS | OIS
[25] | RCF | 0.806 | 0.823
[32] | DeepContour | 0.757 | 0.776
[38] | Human | 0.803 | 0.803
[38] | BDCN | 0.779 | 0.792
[38] | BDCN-w/o SEM | 0.778 | 0.791
[7] | DeepEdge | 0.753 | 0.772
[36] | HED | 0.788 | 0.808
[14] | SE | 0.75 | 0.77
[27] | Multicue | 0.72 | −
[37] | CEDN | 0.788 | 0.804
This work | Parvocellular region | 0.727 | 0.835
pass features [4, 5] as well as Izhikevich's membrane dynamics [21, 22]. Izhikevich's membrane model has been included to imitate the behavior of the human visual cortex owing to its capacity to emulate Ca2+ ion channel dynamics. The proposed method has also been tested on the BSDS500 database because of the ground-truth references in the database. The proposed model is also compared with some existing state-of-the-art models, as given in Table 2. All the models performed their analysis on the same BSDS500 dataset that our proposed model considers. These models used modern algorithms based on convolutional neural nets and other specific methods for the estimation of the features. The Optimal Dataset Scale (ODS) and Optimal Image Scale (OIS) values of the existing models in the literature are compared with our proposed model in Table 2. The proposed method performs very well in edge perception and
reaches a maximum OIS score of 83.5%, which is nearly the same as an average human being's perception performance, and an average ODS score of 72.7%. Validation of the performance has been carried out using 'Piotr's Matlab Toolbox' [13].
4 Conclusion

The proposed model gives an insight into how natural scenes fed to the visual cortex are segmented in the primary layer, which later helps in the formation of object boundaries. Even though the exact specificity of connectivity for boundary estimation is not yet well explored due to the unavailability of measuring devices, the proposed methodology has been built with reference to in silico experimentation that refers to connectivity of neural networks specifically to either ON or OFF bipolar cells. Connectivity of the network solely to ON-type bipolar cells gives rise to segmentation with a relatively moderate receptive field. Moderate to larger receptive fields with connectivity to a single type of bipolar cell give rise to segmentation-type behavior, and an orientation-selective RGC layer connected to the segmentation-type responses gives rise to object boundary detection, which is one of the major features projected onto the magnocellular region of the visual cortex. Thus, the proposed model gives insight into the process of object boundary estimation, which later helps in the formation of complex functions such as object tracking and object motion estimation.

Acknowledgment This publication is an outcome of the R&D work undertaken in a project under the Visvesvaraya Ph.D. Scheme of Ministry of Electronics and Information Technology, Government of India, being implemented by Digital India Corporation.
References
1. Aleci C, Belcastro E (2016) Parallel convergences: a glimpse to the magno- and parvocellular pathways in visual perception. World J Res Rev 3(3):34–42
2. Antinucci P, Hindges R (2018) Orientation-selective retinal circuits in vertebrates. Front Neural Circ 12:11
3. Barlow HB (1982) David Hubel and Torsten Wiesel: their contributions towards understanding the primary visual cortex. Trends Neurosci 5:145–152
4. Baruah SMB, Gogoi P, Roy S (2019) From cable equation to active and passive nerve membrane model. In: 2019 second international conference on advanced computational and communication paradigms (ICACCP). IEEE, pp 1–5
5. Baruah SMB, Nandi D, Roy S (2019) Modelling signal transmission in passive dendritic fibre using discretized cable equation. In: 2019 2nd international conference on innovations in electronics, signal processing and communication (IESC). IEEE, pp 138–141
6. Baruah SMB, Nandi D, Gogoi P, Roy S (2021) Primate vision: a single layer perception. Neural Comput Appl 33(18):11765–11775
7. Bertasius G, Shi J, Torresani L (2015) Deepedge: a multi-scale bifurcated deep network for top-down contour detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 4380–4389
8. Briggman KL, Helmstaedter M, Denk W (2011) Wiring specificity in the direction-selectivity circuit of the retina. Nature 471(7337):183–188
9. Callaway EM (2005) Structure and function of parallel pathways in the primate early visual system. J Physiol 566(1):13–19
10. Cooler S, Schwartz GW (2021) An offset on-off receptive field is created by gap junctions between distinct types of retinal ganglion cells. Nat Neurosci 24(1):105–115
11. Dacey DM, Brace S (1992) A coupled network for parasol but not midget ganglion cells in the primate retina. Visual Neurosci 9(3–4):279–290
12. Dipoppa M, Ranson A, Krumin M, Pachitariu M, Carandini M, Harris KD (2018) Vision and locomotion shape the interactions between neuron types in mouse visual cortex. Neuron 98(3):602–615
13. Dollár P. Piotr's computer vision Matlab toolbox (PMT). https://github.com/pdollar/toolbox
14. Dollár P, Zitnick CL (2014) Fast edge detection using structured forests. IEEE Trans Pattern Anal Mach Intell 37(8):1558–1570
15. Edwards M, Goodhew SC, Badcock DR (2021) Using perceptual tasks to selectively measure magnocellular and parvocellular performance: rationale and a user's guide. Psychonom Bull Rev 28(4):1029–1050
16. Garg AK, Li P, Rashid MS, Callaway EM (2019) Color and orientation are jointly coded and spatially organized in primate primary visual cortex. Science 364(6447):1275–1279
17. Garg AK, Li P, Rashid MS, Callaway EM (2019) Color and orientation are jointly coded and spatially organized in primate primary visual cortex. Science 364(6447):1275–1279
18. Gauthier JL, Field GD, Sher A, Greschner M, Shlens J, Litke AM, Chichilnisky E (2009) Receptive fields in primate retina are coordinated to sample visual space more uniformly. PLoS Biol 7(4):e1000063
19. Gauthier JL, Field GD, Sher A, Shlens J, Greschner M, Litke AM, Chichilnisky E (2009) Uniform signal redundancy of parasol and midget ganglion cells in primate retina. J Neurosci 29(14):4675–4680
20. Guo T, Tsai D, Morley JW, Suaning GJ, Kameneva T, Lovell NH, Dokos S (2016) Electrical activity of on and off retinal ganglion cells: a modelling study. J Neural Eng 13(2):025005
21. Izhikevich EM (2003) Simple model of spiking neurons. IEEE Trans Neural Netw 14(6):1569–1572
22. Izhikevich EM (2007) Dynamical systems in neuroscience. MIT Press
23. Kling A, Gogliettino AR, Shah NP, Wu EG, Brackbill N, Sher A, Litke AM, Silva RA, Chichilnisky E (2020) Functional organization of midget and parasol ganglion cells in the human retina. BioRxiv
24. Koch C, Segev I (2000) The role of single neurons in information processing. Nat Neurosci 3(11):1171–1177
25. Liu Y, Cheng MM, Hu X, Bian JW, Zhang L, Bai X, Tang J (2019) Richer convolutional features for edge detection. IEEE Trans Pattern Anal Mach Intell 41(8):1939–1946. https://doi.org/10.1109/TPAMI.2018.2878849
26. Manookin MB, Patterson SS, Linehan CM (2018) Neural mechanisms mediating motion sensitivity in parasol ganglion cells of the primate retina. Neuron 97(6):1327–1340
27. Mély DA, Kim J, McGill M, Guo Y, Serre T (2016) A systematic comparison between visual cues for boundary detection. Vision Res 120:93–107
28. Nelson R, Kolb H (1983) Synaptic patterns and response properties of bipolar and ganglion cells in the cat retina. Vision Res 23(10):1183–1195
29. Nusser Z (2012) Differential subcellular distribution of ion channels and the diversity of neuronal function. Curr Opin Neurobiol 22(3):366–371
30. Riesenhuber M, Poggio T (1999) Hierarchical models of object recognition in cortex. Nat Neurosci 2(11):1019–1025
31. Shah MM, Hammond RS, Hoffman DA (2010) Dendritic ion channel trafficking and plasticity. Trends Neurosci 33(7):307–316
32. Shen W, Wang X, Wang Y, Bai X, Zhang Z (2015) Deepcontour: a deep convolutional feature learned by positive-sharing loss for contour detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 3982–3991
33. Soto F, Hsiang JC, Rajagopal R, Piggott K, Harocopos GJ, Couch SM, Custer P, Morgan JL, Kerschensteiner D (2020) Efficient coding by midget and parasol ganglion cells in the human retina. Neuron 107(4):656–666
34. Troncoso XG, Macknik SL, Martinez-Conde S (2011) Vision's first steps: anatomy, physiology, and perception in the retina, lateral geniculate nucleus, and early visual cortical areas. In: Visual prosthetics, pp 23–57
35. Wang W, Zhou T, Zhuo Y, Chen L, Huang Y (2020) Subcortical magnocellular visual system facilities object recognition by processing topological property. BioRxiv
36. Xie S, Tu Z (2015) Holistically-nested edge detection. In: Proceedings of the IEEE international conference on computer vision, pp 1395–1403
37. Yang J, Price B, Cohen S, Lee H, Yang MH (2016) Object contour detection with a fully convolutional encoder-decoder network. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 193–202
38. Zhao Y, Li J, Zhang Y, Song Y, Tian Y (2021) Ordinal multi-task part segmentation with recurrent prior generation. IEEE Trans Pattern Anal Mach Intell 43(5):1636–1648. https://doi.org/10.1109/TPAMI.2019.2953854
Chapter 17
Dynamic Thresholding with Short-Time Signal Features in Continuous Bangla Speech Segmentation
Md Mijanur Rahman and Mahnuma Rahman Rinty
1 Introduction

Segmentation is essential in any voice-activated system, as it decomposes the speech signal into smaller units [1]. Words, phonemes, and syllables are the fundamental acoustic units of the speech waveform. The word is the most acceptable candidate for the natural speech unit with a well-defined acoustic representation. Short-term features [2], dynamic thresholding [3], wavelet transforms [4], fuzzy approaches [5], neural networks [6], and hidden Markov models (HMM) [7] have mostly been used for speech segmentation. To segment an entity into separate, non-overlapping components is the basic goal of segmentation [8]. There are many different ways to categorize continuous speech segmentation techniques [9], but a fundamental division is made between approaches that are assisted and those that are blind [10, 11]. The significant distinction between aided and blind segmentation is whether the computer processes the target speech units utilizing previously gathered data or external features. The most significant prerequisite for any speech recognition system is speech feature extraction, commonly referred to as the signal-processing front-end. It is a mathematical representation of the voice file, which turns the speech waveform into a parametric representation for further analysis and processing; the gathering of notable features is carried out using this parametric representation. The speech signal can be parametrically encoded in a variety of ways, including short-time energy, zero-crossing rates, level-crossing rates, spectral centroid, and other related parameters [12]. When it comes to speech segmentation and recognition, a good feature set can help considerably.
2 Short-Time Speech Signal Features

The basic idea behind the method is to use short-term speech features to locate segment borders by tracking frequency or spectrum variations in a spoken signal. Temporal and spectral signal characteristics are used to segment speech signals. Segmentation techniques that rely on time-domain information, such as signal energy/amplitude and the average zero-crossing rate (ZCR), are straightforward to apply and calculate. The signal energy is computed on a short-term basis to locate voiced sounds in continuous speech, which have higher power than silent/unvoiced segments (see Fig. 1). It is usually calculated by short-term windowing of the speech frame, squaring the sample values, and taking root-mean-square (RMS) values [13].
Fig. 1 Speech sentence "Amader Jatiya Kabi Kazi Nazrul Islam": a original speech signal, b short-time energy curves, and c short-time average zero-crossing rate curves
The root-mean-square amplitude of a voice signal is a measure of its energy. It provides a measure of the variation in amplitude over time when computed in successive windows of a speech signal. The short-term RMS energy of a speech frame with length N is given by:

$$E_n^{(RMS)} = \sqrt{\frac{1}{N}\sum_{m=1}^{N}\big[x(m)\,w(n-m)\big]^2} \tag{1}$$
The average ZCR, on the other hand, keeps track of how many times the signal's amplitude crosses zero during a specific period [14]. In general, silent/unvoiced segments have greater ZCR values than voiced ones. The speech ZCR curve, as depicted in Fig. 1c, has peaks and troughs arising from the unvoiced and voiced sections, respectively. The short-time averaged ZCR is defined as

$$Z_n = \frac{1}{2}\sum_{m=1}^{N}\big|\,\mathrm{sgn}[x(m)] - \mathrm{sgn}[x(m-1)]\,\big|\;w(n-m) \tag{2}$$

where:

$$\mathrm{sgn}[x(m)] = \begin{cases} 1, & x(m) \ge 0 \\ -1, & x(m) < 0 \end{cases} \tag{3}$$
and the rectangular window w(n) has length N. A basic speech/non-speech determination can be made using a combination of RMS and average ZCR: the average ZCR is higher in unvoiced sounds, while the RMS value is higher in voiced sounds and lower in non-speech sounds. The resulting speech signals after extracting the short-time (time-domain) signal features are shown in Fig. 1. The frequency range of most voice information is 250–6800 Hz [15]. The discrete Fourier transform (DFT) handles the frequency-domain features that provide information about each frequency contained in a signal [16]. Two frequency-domain characteristics were used in the segmentation methods: spectral centroid and spectral flux. The spectral centroid depicts the center of gravity of a voice signal; high values represent vocal sounds, and this feature assesses the spectral position [17]. The spectral centroid SC_i of the i-th frame is described as "the center of gravity of its spectrum" and is given by the following equation:

$$SC_i = \frac{\sum_{m=0}^{N-1} f(m)\,X_i(m)}{\sum_{m=0}^{N-1} X_i(m)} \tag{4}$$
Here, f(m) stands for the center frequency of bin m, X_i(m) is the amplitude corresponding to bin m of the DFT spectrum of the i-th frame, and N is the frame length. As illustrated in Fig. 2b, higher values of this characteristic, which measures spectral location, correspond to "brighter sounds". Finally, spectral flux is a measurement of how quickly a signal's
power spectrum varies. It is determined by comparing the spectra of two successive frames (via the Euclidean distance) [18]. The spectral flux can be used, among other things, to assess the timbre of an audio stream or in onset detection. The equation for spectral flux is:

$$SF_i = \sqrt{\sum_{k=1}^{N/2}\big[\,|X_i(k)| - |X_{i-1}(k)|\,\big]^2} \tag{5}$$
The DFT coefficient of the i-th short-term frame of length N is represented here by X_i(k). SF_i shows the rate of spectral changes in the speech features. Figure 2 shows the generated speech signals after extracting the short-time (frequency-domain) signal characteristics.
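For illustration, the four short-time features of Eqs. (1)-(5) can be computed per frame as in the following numpy sketch; a rectangular window and a 16 kHz sampling rate are assumptions, and `prev_spectrum` carries the previous frame's magnitude spectrum for the flux.

```python
import numpy as np

def frame_features(x, prev_spectrum=None, fs=16000):
    """Short-time features of one (already windowed) frame x of length N."""
    N = len(x)
    rms = np.sqrt(np.mean(x ** 2))                     # Eq. (1)
    zcr = 0.5 * np.sum(np.abs(np.diff(np.sign(x))))    # Eqs. (2)-(3)
    X = np.abs(np.fft.rfft(x))                         # magnitude spectrum
    f = np.fft.rfftfreq(N, d=1.0 / fs)                 # bin center frequencies
    centroid = np.sum(f * X) / (np.sum(X) + 1e-12)     # Eq. (4)
    flux = (np.sqrt(np.sum((X - prev_spectrum) ** 2))  # Eq. (5)
            if prev_spectrum is not None else 0.0)
    return rms, zcr, centroid, flux, X
```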
Fig. 2 Graphs of a original signal, b spectral centroid features, and c spectral flux features of the same speech sentence
3 Dynamic Thresholding Approach

A dynamic thresholding approach is used to find the uttered words after extracting the speech feature sequences. In segmentation, this approach uses a separate threshold value for each speech feature vector. The approach automatically finds two thresholds, T1 and T2, for the energy and spectral centroid sequences, respectively. Finally, sequential frames whose feature values are greater than the calculated thresholds are used to produce the targeted voiced segments. Figure 3 shows both filtered feature sequence curves (signal energy and spectral centroid) with their threshold settings. Figure 4 shows the overall segmentation results for another Bangla speech sentence, which contains 5 (five) speech words. The other segmentation approach utilizes dynamic thresholding with a blocking method, also known as the "blocking black area" method, on the spectrogram image of a Bangla speech sentence. After generating the spectrogram image of a speech signal, the thresholding method first transforms the image into a grayscale illustration [19]. The threshold analysis algorithm is then used to decide which pixels are turned black or white [20], as shown in Fig. 5. Static and dynamic thresholding are the two most used thresholding procedures. In the proposed approach, each pixel in the spectrogram image has its own threshold. Finally, sequential frames with individual
Fig. 3 Median filtered feature sequence curves with threshold values
Fig. 4 Overall segmentation results: a the original signal, b the short-time signal energy features curve with a threshold value, c the spectral centroid sequence curve with a threshold value, and d the segmented words in dashed circles
feature values greater than the calculated thresholds are used to produce the targeted voiced segments. However, the issue is determining how to choose the desired threshold. Therefore, this study adopts Otsu's thresholding method to calculate the preferred threshold value. Otsu's technique, developed in 1979 by Nobuyuki Otsu [22], is typically more successful at segmenting images [21]. This method assumes that the image has two main areas, background and foreground, and then determines "an optimal threshold that minimizes the weighted within-class variance and maximizes the between-class variance".
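A minimal sketch of Otsu's criterion is given below; it scans all 256 candidate gray levels and keeps the one maximizing the between-class variance (in practice, OpenCV's built-in Otsu option can serve the same purpose).

```python
import numpy as np

def otsu_threshold(gray):
    """Return the gray level maximizing the between-class variance [22]."""
    hist, _ = np.histogram(gray, bins=256, range=(0, 256))
    p = hist / hist.sum()                     # gray-level probabilities
    best_t, best_var = 0, 0.0
    for t in range(1, 256):
        w0, w1 = p[:t].sum(), p[t:].sum()     # class weights
        if w0 == 0 or w1 == 0:
            continue
        mu0 = (np.arange(t) * p[:t]).sum() / w0
        mu1 = (np.arange(t, 256) * p[t:]).sum() / w1
        var_between = w0 * w1 * (mu0 - mu1) ** 2
        if var_between > best_var:
            best_t, best_var = t, var_between
    return best_t
```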
4 Blocking Method

A new approach, the blocking method, is introduced into speech segmentation [23]. It groups the voiced parts of the thresholded illustration of continuous speech into multiple black boxes that separate them from the silent/unvoiced segments. The black areas represent voiced parts, and the white areas represent unvoiced parts. The edges of each black block indicate speech word borders in the continuous speech. Correctly identifying the speech unit boundaries (i.e., locating start and end points) indicates proper
Fig. 5 Bangla speech sentence graph and its thresholded (spectrogram) images
segmentation. As a result, in the voiced parts of the spoken sentence, this method generates rectangular black boxes, each of which shows the appropriate speech segment (e.g., words/syllables). The overall segmentation process with the blocking method is illustrated in Fig. 6. After identifying the start and stop positions of each black box, these two points are utilized to automatically label the word boundaries in the original speech sentence, dividing each speech segment from the
Fig. 6 Thresholded spectrogram image (top), rectangular black boxes after blocking the voiced regions (middle), and speech word segments representing the speech segmentation using the blocking black area approach (bottom)
speech sentence. According to Fig. 6, the speech sentence presents 6 (six) black boxes, which represent the 6 (six) word fragments of the "Amader Jatiya Kabi Kazi Nazrul Islam" speech sentence.
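The box-finding step can be sketched as a simple run-length scan over the thresholded spectrogram, as below; `binary` is assumed to hold 1 for black (voiced) pixels, and the minimum run length is an illustrative guard against spurious specks.

```python
import numpy as np

def black_boxes(binary, min_len=5):
    """Return (start, end) column indices of contiguous black regions."""
    col_has_black = binary.any(axis=0).astype(int)   # project onto time axis
    edges = np.diff(np.concatenate(([0], col_has_black, [0])))
    starts = np.where(edges == 1)[0]                 # rising edges: box starts
    ends = np.where(edges == -1)[0]                  # falling edges: box ends
    return [(s, e) for s, e in zip(starts, ends) if e - s >= min_len]
```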
5 Conclusion

We have offered a straightforward approach based on short-term speech features for efficiently segmenting continuous speech into smaller units. This feature-based approach gives a foundation for differentiating voiced parts from unvoiced components. Moreover, tracking the features on a short-term basis may expose the tempo and periodicity character of the targeted speech signal. This paper also presents an effective dynamic thresholding algorithm with a blocking method for continuous speech segmentation. However, it faced difficulty in segmenting some words adequately, owing to numerous sources of speech variability such as phonetic features, pitch and amplitude, speaker characteristics, and device and environment properties.

Acknowledgement The authors would like to convey their sincere gratitude to the Jatiya Kabi Kazi Nazrul Islam University's research and extension center (Nazrul University, Bangladesh) for their financial support and cooperation in conducting this research work.
References
1. Rahman MM, Khan MF, Bhuiyan MA-A (2012) Continuous Bangla speech segmentation, classification and feature extraction. Int J Comput Sci Issues 9(2):67
2. Rahman MM, Bhuiyan MAA (2012) Continuous Bangla speech segmentation using short-term speech features extraction approaches. Int J Adv Comput Sci Appl 3(11)
3. Rahman MM, Bhuiyan MA-A (2013) Dynamic thresholding on speech segmentation. Int J Res Eng Technol 2(9):404–411
4. Hioka Y, Hamada N (2003) Voice activity detection with array signal processing in the wavelet domain. IEICE Trans Fundamentals Electron Commun Comput Sci 86(11):2802–2811
5. Beritelli F, Casale S (1997) Robust voiced/unvoiced speech classification using fuzzy rules. In: IEEE workshop on speech coding for telecommunications proceedings. Back to basics: attacking fundamental problems in speech coding. IEEE, pp 5–6
6. Rahman MM, Bhuiyan MA-A (2015) Comparison study and result analysis of improved backpropagation algorithms in Bangla speech recognition. Int J Appl Res Inf Technol Comput 6(2):107–117
7. Kadir MA, Rahman MM (2016) Bangla speech sentence recognition using hidden Markov models. Int J Multidisc Res Dev 3(7):122–127
8. Rahman MM, Bhuiyan MA-A (2011) On segmentation and extraction of features from continuous Bangla speech including windowing. Int J Appl Res Inf Technol Comput 2(2):31–40
9. Rahman MM, Khan MF, Moni MA (2010) Speech recognition front-end for segmenting and clustering continuous Bangla speech. Daffodil Int Univ J Sci Technol 5(1):67–72
10. Sharma M, Mammone R (1996) Blind speech segmentation: automatic segmentation of speech without linguistic knowledge. In: Proceedings of fourth international conference on spoken language processing. ICSLP'96, vol 2. IEEE, pp 1237–1240
11. Schiel F (1999) Automatic phonetic transcription of non-prompted speech. In: Proceedings of the ICPhS, pp 607–610
12. Rahman MM (2022) Continuous Bangla speech processing: segmentation, classification and recognition. B. P. International
13. Zhang T, Kuo C-C (1999) Hierarchical classification of audio data for archiving and retrieving. In: IEEE international conference on acoustics, speech, and signal processing. Proceedings. ICASSP99 (cat. no. 99CH36258), vol 6. IEEE, pp 3001–3004
14. Rabiner LR, Sambur MR (1975) An algorithm for determining the endpoints of isolated utterances. Bell Syst Techn J 54(2):297–315
15. Niederjohn R, Grotelueschen J (1976) The enhancement of speech intelligibility in high noise levels by high-pass filtering followed by rapid amplitude compression. IEEE Trans Acoust Speech Signal Process 24(4):277–282
16. Cooley JW, Tukey JW (1965) An algorithm for the machine calculation of complex Fourier series. Math Comput 19(90):297–301
17. Giannakopoulos T (2009) Study and application of acoustic information for the detection of harmful content, and fusion with visual information. Thesis, Department of Informatics and Telecommunications
18. Bello JP, Daudet L, Abdallah S, Duxbury C, Davies M, Sandler MB (2005) A tutorial on onset detection in music signals. IEEE Trans Speech Audio Process 13(5):1035–1047
19. Shapiro LG, Stockman GC (2001) Computer vision. Prentice Hall
20. Rahman M, Khatun F, Islam MS, Bhuiyan MA-A (2015) Binary features of speech signal for recognition. Int J Appl Res Inf Technol Comput 6(1):18–25
21. Gonzalez RC, Woods RE (1992) Digital image processing. Addison-Wesley, Reading, MA
22. Otsu N (1979) A threshold selection method from gray-level histograms. IEEE Trans Syst Man Cybern 9(1):62–66
23. Rahman MM, Khatun F, Bhuiyan MA-A (2015) Blocking black area method for speech segmentation. Int J Adv Res Artif Intell 4(2):1–6
Chapter 18
Fast Adaptive Image Dehazing and Details Enhancement of Hazy Images
Balla Pavan Kumar, Arvind Kumar, and Rajoo Pandey
1 Introduction

The quality of outdoor images is affected by bad weather conditions such as fog. In such scenarios, the performance of many real-time video applications, such as automated driver assistance systems [1] and video surveillance, is badly affected. Hence, there is a need to develop a suitable algorithm to reduce the haze effect. There are many dehazing algorithms that can effectively eliminate haze; however, there are still some challenges that need to be addressed. As mentioned in [2], dehazing methods can be categorized on the basis of (i) image enhancement, (ii) image fusion, (iii) image restoration, and (iv) neural networks. The image enhancement-based dehazing methods [3, 4] improve the visibility of the curved edges of a hazy image, but they produce halo artifacts and cannot efficiently remove the haze. The image fusion-based dehazing methods [5, 6] work well for images with homogeneous haze, but they produce the halo effect and exhibit high execution times. The image restoration-based algorithms are currently trending in image dehazing research. Although these methods [7–10] produce natural outputs with effective haze removal, they yield over-dark outcomes and also exhibit high computation times. The dehazing method of [11] uses a neural network framework to remove the fog effect. Although neural network techniques are trending these days, they are not well suited for image dehazing, as the atmospheric light is empirically chosen, which may result in an unnatural
outcome, as mentioned in [12]. Although haze-relevant features can be learned in these methods, they rely on huge training datasets. Most dehazing methods do not consider local variations, i.e., regions 'less affected by haze' and 'more affected by haze'. If all areas of a hazy image are treated uniformly, the regions less affected by haze may be over-dehazed, producing dark outputs, while the regions more affected by haze may be under-dehazed, so the haze in these regions is not effectively eliminated. Hence, for a better outcome, a dehazing algorithm should be designed adaptively, treating the different areas of a hazy image appropriately. Inspired by our work [13], a hazy image can be categorized as shown in Fig. 1. Moreover, many existing dehazing methods are unsuitable for real-time image processing systems because of their high computational time. To overcome this problem, a fast dehazing algorithm with low complexity has to be implemented. In this paper, a fast adaptive dehazing algorithm is proposed, inspired by our previous work [13]. First, the hazy image is categorized into 'less affected by haze' and 'more affected by haze' regions. The input hazy image is then passed separately to two blocks, namely 'less affected by haze' and 'more affected by haze', for adaptive dehazing. In each block, the input hazy image is decomposed into base and detail layers with different scale smoothing factors. In each block, the fast dehazing algorithm of [15] is applied to the base layer for dehazing, and the fast Laplacian enhancement of [16] is applied to the detail layer for detail enhancement.
Fig. 1 Hazy image categorization (dark-region layer and non-dark region), inspired by [13], for the hazy image 'KRO_Google_143' of the RESIDE dataset [14]
In each block, the dehazed image is fused with the detail-enhanced image to obtain the recovered image. Finally, the recovered outputs of the two blocks are fused based upon the regional categorization. The proposed method exhibits good outcomes and is also suitable for real-time video dehazing at 25 frames per second (FPS). The rest of the paper comprises (i) the background of hazy image representation, (ii) the methodology of the proposed method, (iii) the experimental results of the proposed and existing dehazing methods, and (iv) the conclusion.
2 Background
As given in [17], the physical description of a hazy image can be expressed as
$$I(x) = J(x)\,t(x) + A\,(1 - t(x)) \tag{1}$$
where x denotes the pixel location, J represents the haze-free image, A denotes the atmospheric light, and I represents the hazy image. As given in [17], the transmission map t is expressed as
$$t(x) = e^{-\beta\, d(x)} \tag{2}$$
where d represents the distance from the camera to the scene, and β is the attenuation constant. From Eqs. (1) and (2), it can be observed that the greater the distance and/or the attenuation constant, the greater the effect of the atmospheric light on the image (the haze effect). As the different areas of a hazy image are affected differently with respect to distance and/or attenuation, adaptive dehazing is essential to treat each region according to its level of haze.
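To make the model concrete, here is a minimal NumPy sketch that synthesizes a hazy image from Eqs. (1) and (2); the function name and the default values of A and β are illustrative assumptions, not values prescribed by the paper.

```python
import numpy as np

def synthesize_haze(J, d, A=0.9, beta=1.0):
    """Render a hazy image I from a haze-free image J (H x W x 3, values in
    [0, 1]) and a depth map d (H x W); A and beta are toy constants."""
    t = np.exp(-beta * d)            # Eq. (2): transmission map
    t = t[..., None]                 # broadcast over the RGB channels
    return J * t + A * (1.0 - t)     # Eq. (1): atmospheric scattering model
```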
3 Methodology
3.1 Hazy Image Classification
The input hazy image is classified into 'less affected by haze' and 'more affected by haze' regions based on the pixel intensity values. The influence of the atmospheric light is greater where there is more haze. In most cases, the pixel intensity of the atmospheric light lies in the range (0.7, 1), where the pixel intensity values of any image lie in the range (0, 1). Hence, it can be deduced that pixels with low-intensity values are less affected by haze due to the lower influence of atmospheric light. Empirically, it can be concluded that pixel intensities of less than 0.5 would be
considered as the 'less affected by haze' region, while the rest of the pixels fall under the 'more affected by haze' region. The hazy image categorization is expressed as
$$\text{Haze level of region} = \begin{cases} \text{Less affected by haze}, & \text{if } I(x) < 0.5 \\ \text{More affected by haze}, & \text{otherwise} \end{cases} \tag{3}$$
here, I represents the hazy image, and x represents the pixel-coordinate of I.
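A possible implementation of this classification is sketched below; the 0.5 threshold follows Eq. (3), while the function name and the reduction of RGB images to mean intensity are our own assumptions.

```python
import numpy as np

def classify_regions(I, threshold=0.5):
    """Return a boolean mask that is True where a pixel belongs to the
    'less affected by haze' region of Eq. (3)."""
    if I.ndim == 3:                  # reduce RGB to a per-pixel intensity
        I = I.mean(axis=2)
    return I < threshold
```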
3.2 Fast Hazy Image Decomposition
After the classification, the input hazy image is applied to the 'less affected by haze' and 'more affected by haze' blocks, as shown in Fig. 2. For each block, a different set of scale smoothing factors is chosen for the decomposition of the hazy image. The hazy image is decomposed into base and detail layers using the fast guided filter of [15] as
$$q_i = a_k I_{bl,i} + b_k, \quad \forall i \in w_k \tag{4}$$
where $I_{bl}$ represents the base layer of I, q denotes the filtered image, i is the pixel index, $w_k$ is a square window with index k, and a and b represent the linear coefficient map and the constant coefficient map, respectively. The input hazy image is downscaled by a factor p for faster execution of the guided filter.
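The following sketch shows the self-guided filter decomposition behind Eq. (4), using SciPy box filters for the local means; the radius and ε defaults are illustrative assumptions. The fast variant of [15] would additionally subsample the image by the ratio p before computing the coefficient maps and upsample them afterward.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def guided_filter_decompose(I, radius=8, eps=1e-6):
    """Split a grayscale image I (floats in [0, 1]) into base and detail
    layers with a self-guided filter (guidance = input)."""
    size = 2 * radius + 1
    mu = uniform_filter(I, size)                  # window means
    var = uniform_filter(I * I, size) - mu ** 2   # window variances
    a = var / (var + eps)                         # linear coefficient map a_k
    b = mu - a * mu                               # constant coefficient map b_k
    base = uniform_filter(a, size) * I + uniform_filter(b, size)  # Eq. (4)
    return base, I - base                         # base layer, detail layer
```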
Fig. 2 Block diagram of fast region-based adaptive dehazing algorithm
3.3 Fast Dehazing
After image decomposition, the base layer is dehazed using the atmospheric model of Eq. (1). As given in [18], from the mean of b, the atmospheric light map $A_{map}$ can be calculated as
$$A_{map}(x) = \operatorname{mean}(b) = \mu_q(x) - \frac{\sigma_q^2(x)}{\sigma_q^2(x) + \varepsilon}\,\mu_q(x) \tag{5}$$
where $\mu_q$ and $\sigma_q^2$ are the mean and variance of the guided filter output q of Eq. (4), respectively, and ε is the scale smoothing value. The above process is implemented separately in the two blocks with different scale smoothing values, i.e., ε = 10⁻⁶ for the 'less affected by haze' block and ε = 10⁻⁵ for the 'more affected by haze' block. The transmission map T can be obtained from [18] as
$$T(x) = 1 - \omega D(x)/A \tag{6}$$
where ω is a constant parameter, empirically chosen as 0.7, D denotes the dark channel, and A is the atmospheric light value obtained as the median of the topmost 1% of pixels of $A_{map}$. The dark channel can be evaluated from [7] as
$$D(x) = \min_{c \in \{r,g,b\}} I_{bl}^{c}(x) \tag{7}$$
where $I_{bl}^{c}$ denotes an RGB color channel of $I_{bl}$. The transmission map is refined using the spatio-temporal Markov random field, as given in [18]. From the refined transmission map $T_R$ and the atmospheric light A, the dehazed image J can be obtained from Eq. (1) as
$$J = \frac{I_g - A}{T_R} + A_B \tag{8}$$
where $A_B$ is the balanced atmospheric light; as in [19], $A_B$ can be expressed as
$$A_B = \begin{cases} A\,v_B, & \text{if } \sigma_A > th \\ A, & \text{otherwise} \end{cases} \tag{9}$$
where th is the threshold value (as given in [19], it is empirically chosen as th = 0.2), $v_B$ denotes the normalizing vector $\left(\tfrac{1}{\sqrt{3}}, \tfrac{1}{\sqrt{3}}, \tfrac{1}{\sqrt{3}}\right)$, and $\sigma_A$ represents the variance of the atmospheric light. The above dehazing process is implemented separately for the two blocks, 'less affected by haze' and 'more affected by haze', with scale smoothing values ε = 10⁻⁶ and ε = 10⁻⁵, respectively.
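The steps of Eqs. (6)–(8) can be sketched as below. The patch size and transmission floor are illustrative assumptions; for brevity, the atmospheric light is picked from the brightest dark-channel pixels in the style of [7] rather than from the $A_{map}$ of Eq. (5), and the MRF-based refinement of [18] and the balanced-light case of Eq. (9) are omitted.

```python
import numpy as np
from scipy.ndimage import minimum_filter

def dehaze_base_layer(I_bl, omega=0.7, patch=15, t_floor=0.1):
    """Dehaze a base layer I_bl (H x W x 3, floats in [0, 1])."""
    dark = minimum_filter(I_bl.min(axis=2), size=patch)     # Eq. (7)
    n_top = max(1, dark.size // 100)                        # brightest 1%
    rows, cols = np.unravel_index(
        np.argsort(dark, axis=None)[-n_top:], dark.shape)
    A = np.median(I_bl[rows, cols], axis=0)                 # per-channel A
    T = 1.0 - omega * dark / A.mean()                       # Eq. (6)
    T = np.clip(T, t_floor, 1.0)[..., None]                 # avoid division blow-up
    return (I_bl - A) / T + A                               # Eq. (8), with A_B = A
```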
3.4 Fast Multilayer Laplacian Enhancement
After the image decomposition, the detail layer is enhanced using the fast multilayer Laplacian enhancement of [16]. It is expressed as
$$M(x) = \begin{cases} \dfrac{s_w}{1 + e^{-s(R - R_s)}}, & \text{if } -0.5\,s_w < R_i < 0.5\,s_w \\ x, & \text{otherwise} \end{cases} \tag{10}$$
where M denotes the linear mapping function of the fast Laplacian enhancement, $s_w$ represents the scale mapping width, s denotes the scale mapping factor, and R and $R_s$ denote the residual image and the mean of the residual images, respectively. The above enhancement process is implemented separately for the two blocks, 'less affected by haze' and 'more affected by haze', with scale mapping factors s = 20 and s = 40, respectively. After that, in each block, the dehazed image and the detail-enhanced image are fused to attain the recovered image, as shown in Fig. 2. The final adaptive dehazing result is obtained by fusing the recovered images from both blocks according to the image categorization, i.e., from the recovered image of the 'less affected by haze' block only the less-affected regions are chosen, and vice versa for the 'more affected by haze' block, as sketched below. The proposed dehazing algorithm is fast and dehazes adaptively according to the level of haze. The experimental results of the proposed dehazing technique show significant improvements when compared to the existing methods.
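The final fusion step can then be sketched as follows, reusing the illustrative classify_regions helper from Sect. 3.1; the inputs are the recovered images produced by the two per-block pipelines.

```python
import numpy as np

def fuse_recovered(I_hazy, recovered_less, recovered_more):
    """Pick each output pixel from the block matching its haze level."""
    mask = classify_regions(I_hazy)[..., None]   # True = less affected by haze
    return np.where(mask, recovered_less, recovered_more)
```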
4 Experimental Results
The proposed fast adaptive dehazing technique is tested with the Foggy Cityscapes [20] and Foggy Zurich [21] datasets. These datasets contain road images in foggy weather conditions and are primarily used in applications for automated driver assistance systems. The quantitative results for the existing methods [6–11] as well as for the proposed fast adaptive dehazing method are given in Table 1. The proposed technique is implemented with low complexity to suit real-time video dehazing applications. The proposed method exhibits a computation time of around 0.04 s on average, which corresponds to around 25 FPS. Although the existing methods produce good quantitative results (NIQE [22] and BRISQUE [23]), they exhibit large execution times, which are unsuitable for real-time video processing. The qualitative results of the proposed algorithm and the existing techniques [6–11] for images from the Foggy Cityscapes [20] and Foggy Zurich [21] datasets are shown in Fig. 3. The existing methods [6–8, 10, 11] produce better dehazing results than the proposed method, but they yield over-dark outcomes after dehazing. The existing method MOF [9] produces an over-color-saturated outcome. The proposed
Table 1 Image quality parameter values for the '1508039851_start_01m00s_000351' and 'frankfurt_003920_beta_0.005' images of the Foggy Zurich and Foggy Cityscapes datasets, respectively

Image | Parameter | DEFADE [6] | Fast DCP [7] | BCCR [8] | MOF [9] | D-NET [11] | OTM [10] | Proposed
1508039851_start_01m00s_000351 | Computation time (s) | 9.07735 | 1.0552 | 3.0868 | 1.298 | 1.0471 | 18.4023 | 0.0288
1508039851_start_01m00s_000351 | NIQE | 2.1594 | 4.1498 | 2.2764 | 2.2764 | 2.5279 | 2.3849 | 7.2775
1508039851_start_01m00s_000351 | BRISQUE | 22.2386 | 12.0204 | 25.0862 | 25.0862 | 37.1121 | 23.2916 | 64.2919
frankfurt_003920_beta_0.005 | Computation time (s) | 9.09372 | 1.0504 | 3.5979 | 1.369 | 1.1115 | 15.9539 | 0.0603
frankfurt_003920_beta_0.005 | NIQE | 3.489 | 5.647 | 3.6279 | 3.6279 | 3.7946 | 3.7767 | 7.6359
frankfurt_003920_beta_0.005 | BRISQUE | 30.579 | 31.655 | 28.6412 | 28.6412 | 42.8589 | 33.3614 | 46.0747
method produces a natural output without over-darkening or over-saturating the colors. The analysis of the hazy image and the proposed dehazing result w.r.t. semantic segmentation [24] is shown in Fig. 4. The semantic segmentation of the proposed dehazed output
Fig. 3 Results of existing dehazing methods and the proposed method. a, i Hazy images '1508039851_start_01m00s_000351' and 'frankfurt_003920_beta_0.005' from the Foggy Zurich and Foggy Cityscapes datasets, respectively. b–g, j–o Dehazed outputs of the existing methods DEFADE [6], Fast DCP [7], BCCR [8], MOF [9], D-NET [11], and OTM [10]. h, p Dehazed results of the proposed technique
Fig. 4 Analysis of hazy image and proposed dehazing result w.r.t semantic segmentation [24]. a Hazy image ‘U080-000041’ from the FRIDA dataset [25]. b Dehazed output of proposed method. c Semantic segmentation of (a). d Semantic segmentation of (b)
shows better results compared to that of the hazy image, as shown in Fig. 4c, d, respectively. Overall, the proposed fast dehazing method produces fast and efficient results, suitable for real-time video processing systems such as automated driver assistance systems.
5 Conclusion
The challenges faced by the existing dehazing methods are high complexity and high time consumption. For real-time video processing systems, the dehazing algorithm must have low complexity and execute quickly. A fast dehazing algorithm is proposed in this paper to overcome these challenges. A hazy image is first classified into 'less affected by haze' and 'more affected by haze' regions on the basis of pixel intensity values. The image decomposition, image dehazing, and detail enhancement of the hazy image are performed separately in the blocks named 'less affected by haze' and 'more affected by haze', with different scale factors. The results of these two blocks are fused based upon the regional categorization for adaptive dehazing. The proposed adaptive fast dehazing method produces good dehazed results at a rate of 25 FPS, which is suitable for real-time video processing systems.
References 1. Huang SC, Chen BH, Cheng YJ (2014) An efficient visibility enhancement algorithm for road scenes captured by intelligent transportation systems. IEEE Trans Intell Transp Syst 15(5):2321–2332 2. Wang W, Yuan X (2017) Recent advances in image dehazing. IEEE CAA J. Autom. Sinica 4(3):410–436
3. Kim JH, Jang WD, Sim JY, Kim CS (2013) Optimized contrast enhancement for real-time image and video dehazing. J Vis Commun Image Represent 24(3):410–425 4. Li Z, Tan P, Tan RT, Zou D, Zhiying Zhou S, Cheong LF (2015) Simultaneous video defogging and stereo reconstruction. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 4988–4997 5. Ancuti CO, Ancuti C (2013) Single image dehazing by multi-scale fusion. IEEE Trans Image Process 22(8):3271–3282 6. Choi LK, You J, Bovik AC (2015) Referenceless prediction of perceptual fog density and perceptual image defogging. IEEE Trans Image Process 24(11):3888–3901 7. He K, Sun J, Tang X (2010) Single image haze removal using dark channel prior. IEEE Trans Pattern Anal Mach Intell 33(12):2341–2353 8. Meng G, Wang Y, Duan J, Xiang S, Pan C. Efficient image dehazing with boundary constraint and contextual regularization. In: Proceedings of the IEEE international conference on computer vision, pp 617–624 9. Zhao D, Xu L, Yan Y, Chen J, Duan LY (2019) Multi-scale optimal fusion model for single image dehazing. Signal Process Image Commun 74:253–265 10. Ngo D, Lee S, Kang B (2020) Robust single-image haze removal using optimal transmission map and adaptive atmospheric light. Remote Sens 12(14):2233 11. Cai B, Xu X, Jia K, Qing C, Tao D (2016) Dehazenet: an end-to-end system for single image haze removal. IEEE Trans Image Process 25(11):5187–5198 12. Haouassi S, Wu D (2020) Image dehazing based on (CMTnet) cascaded multi-scale convolutional neural networks and efficient light estimation algorithm. Appl Sci 10(3):1190 13. Kumar BP, Kumar A, Pandey R (2022) Region-based adaptive single image dehazing, detail enhancement and pre-processing using auto-colour transfer method. Signal Process Image Commun 100:116532 14. Li B, Ren W, Fu D, Tao D, Feng D, Zeng W, Wang Z (2018) Benchmarking single-image dehazing and beyond. IEEE Trans Image Process 28(1):492–505 15. He K, Sun J (2015) Fast guided filter. arXiv:1505.00996 16. Talebi H, Milanfar P (2016) Fast multilayer Laplacian enhancement. IEEE Trans Comput Imaging 2(4):496–509 17. Koschmieder H (1924) Theorie der horizontalen Sichtweite. Beitrage zur Physik der freien Atmosphare, 33–53 18. Cai B, Xu X, Tao D (2016) Real-time video dehazing based on spatio-temporal mrf. In: Pacific Rim conference on multimedia. Springer, Cham, pp 315–325 19. Shin YS, Cho Y, Pandey G, Kim A (2016) Estimation of ambient light and transmission map with common convolutional architecture. In: OCEANS 2016, MTS/IEEE. Monterey. IEEE, pp 1–7 20. Sakaridis C, Dai D, Van Gool L (2018) Semantic foggy scene understanding with synthetic data. Int J Comput Vis 126(9):973–992 21. Sakaridis C, Dai D, Hecker S, Van Gool L (2018) Model adaptation with synthetic and real data for semantic dense foggy scene understanding. In: Proceedings of the European conference on computer vision (ECCV), pp 687–704 22. Mittal A, Soundararajan R, Bovik AC (2012) Making a “completely blind” image quality analyzer. IEEE Signal Process Lett 20(3):209–212 23. Mittal A, Moorthy AK, Bovik AC (2012) No-reference image quality assessment in the spatial domain. IEEE Trans Image Process 21(12):4695–4708 24. Chen LC, Zhu Y, Papandreou G, Schroff F, Adam H (2018) Encoder-decoder with atrous separable convolution for semantic image segmentation. In: Proceedings of the European conference on computer vision (ECCV), pp 801–818 25. 
Tarel JP, Hautiere N, Cord A, Gruyer D, Halmaoui H (2010) Improved visibility of road scene images under heterogeneous fog. In: 2010 IEEE intelligent vehicles symposium, pp 478–485
Chapter 19
Review on Recent Advances in Hearing Aids: A Signal Processing Perspective
R. Vanitha Devi and Vasundhara
1 Introduction
According to the definition provided by the World Health Organization (WHO), a person has hearing loss if they are not able to hear as well as someone with normal hearing, i.e., hearing thresholds of 20 dB or better in both ears. Nowadays, hearing loss is becoming a common problem caused by noise, aging, disease, and heredity. Herein, the consequences of hearing loss are described systematically. Conversations with friends and family may be difficult for people with hearing loss. They may also have difficulty hearing doorbells and alarms, as well as responding to warnings. Overall, hearing loss affects one in three adults between the ages of 65 and 74, and the count is expected to be half or a little more among people over the age of 75 [1]. However, some people may be hesitant to admit they have hearing problems. Older adults who have trouble hearing may become depressed or withdraw from others because they are disappointed or embarrassed by their inability to understand what is being said to them. Because they cannot hear properly, older adults are sometimes misconstrued as being confused, unresponsive, or uncooperative. Based on its origin, hearing loss is classified into the following types:
• Conductive hearing loss: This phrase refers to hearing loss in which sound cannot pass through the outer and middle ear. As a consequence, hearing soft sounds becomes difficult for the patient.
• Sensorineural hearing loss: This phrase refers to hearing loss caused by a problem in the cochlea, the hearing nerve, or both. The cochlea, a "sensing organ", accounts for the "sensori" part, while the hearing nerve accounts for the "neural" part.
• Mixed hearing loss: A person is said to have a mixed hearing loss if they suffer from both conductive and sensorineural hearing loss.
Furthermore, hearing loss can also be categorized by listening ability as minor, mild, moderate, and severe or profound hearing loss. People who are mildly deaf have a hard time understanding normal speech. Moderate hearing loss makes it difficult to understand even loud speech. In the case of severe hearing loss, one can follow only very clear, loud speech, and people with a profound level of hearing loss have problems understanding even clear speech. Therefore, hearing aid devices are of paramount importance in the lives of people who suffer from hearing loss. In the subsequent sections, the literature on various types of hearing aid devices and their challenges is discussed.
1.1 Hearing Aid Devices Hearing aids are electronic devices that convert sound waves received through a microphone into electrical signals, which are then processed thoroughly before being amplified and transmitted to a loudspeaker. These devices are classified as (i) Analog hearing aid devices and (ii) Digital hearing aid devices.
1.2 Analog Hearing Aid Devices
The sound coming from outside, including speech and all ambient noise, is amplified by an analog hearing aid. According to Halawani et al. [2], these devices are made up of (i) a small battery, (ii) a microphone, (iii) a speaker, and (iv) a simple electronic circuit with a transistor to amplify and control the sound coming from the source. Figure 1 displays a block diagram of an analog hearing aid. Analog hearing aids amplify both speech and background noise, since they are unable to distinguish between desired (speech) and undesired (noise) signals. As a result, background noise can interfere with a conversation; analog devices cannot provide any noise-cancellation technology.
1.3 Digital Hearing Aid Devices
In 1996, fully digital and programmable hearing aids were exhibited to the public for the first time. In comparison with analog hearing aids, digital hearing aids have more flexibility and can be fine-tuned to the patient's demands [2].
Fig. 1 Block diagram of basic analog hearing aid device
The block diagram of a digital hearing aid is given in Fig. 2. These devices both amplify speech and reduce noise in the signal. Furthermore, it is interesting to note that digital devices can distinguish between speech and background noise. Speech enhancement and noise reduction techniques are used in digital hearing aids. As shown in Fig. 2, the microphone in a digital hearing aid takes the incoming signal and transforms it into digital form. The digital hearing aid contains a microcontroller and a small loudspeaker that transmits the processed signal into the ear canal. Hitherto, enormous work has been carried out on tuning the signals in hearing aids; thus, a systematic review of this work is essential. Therefore, in this article, an attempt has been made to consolidate the prominent results reported in this field.
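As a rough illustration of the digital forward path described above, the sketch below digitizes the signal into time-frequency frames and applies per-band gains before resynthesis; the sampling rate, band split, and gain values are illustrative assumptions only, not a fitting rule from the literature.

```python
import numpy as np
from scipy.signal import stft, istft

def digital_ha_forward(x, fs=16000, gains_db=(0, 5, 10, 20)):
    """Apply increasing gain toward higher frequency bands (a common
    pattern for high-frequency hearing loss) and resynthesize."""
    f, _, X = stft(x, fs=fs, nperseg=256)                    # to digital frames
    edges = np.linspace(0, f[-1], len(gains_db) + 1)[1:-1]   # band boundaries
    gain = 10 ** (np.asarray(gains_db)[np.digitize(f, edges)] / 20)
    _, y = istft(X * gain[:, None], fs=fs, nperseg=256)      # back to audio
    return y
```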
Fig. 2 Digital hearing aid block diagram
2 Smart Hearing Aid Using Deep Learning
In speech processing research, restoring a clean speech signal by completely removing the noise is regarded as a considerably difficult problem [3]. In recent times, deep learning has gained significant momentum for handling numerous problems that were previously difficult to probe and solve using traditional signal processing approaches. Deep learning algorithms can learn a function f that maps an input variable X to the appropriate output variable Y, even when the mapping is complex and nonlinear [4]. As a direct consequence, deep learning can be used to learn the complex nonlinear function that maps noisy speech to the desired speech, thereby separating the desired signal from the undesired one. Different deep neural network topologies have been utilized for speech enhancement. One of these networks, the deep belief network (DBN), is addressed in [5–10]. For the pre-training phase, it employs Restricted Boltzmann Machine (RBM) layers, and for the fine-tuning phase, a feedforward network is utilized [11]. The RBMs learn the dataset's features in the pre-training phase using an unsupervised training strategy, where each succeeding RBM uses the features learned by the preceding RBM as input to learn ever-higher-level features [12]. During the fine-tuning step, the weights of a conventional back-propagation-driven feedforward neural network (FNN) are initialized using these highly informative features. This method of weight initialization aids the discovery of better parameter values during training. Another architecture utilized for speech enhancement is the convolutional neural network (CNN), as shown in [13, 14], in which convolutions are used to learn informative features of the input data. The dimensionality of the input differs between an FNN and a CNN: a CNN takes three-dimensional inputs, whereas an FNN takes only one-dimensional vectors. The CNN is made up of three parts: the input layer, the feature extraction layers, and the output layer [15]. Other speech enhancement methods combine two architectures, such as that described in [16], which mixes a CNN with a recurrent neural network (RNN). In this scenario, features are extracted from the input data by the CNN before being sent to the RNN for learning and estimation. RNNs have been used successfully in applications involving sequence data because of their short-term memory [15]. When making decisions, RNNs consider both the current input and what has been learned from previously received inputs. However, some types of noise are as useful as the desired speech signal in hearing aid applications: car horns, fire alarms, sirens, and several other sounds will lead to major problems if a hearing-impaired person cannot hear them. To manage this problem, people with hearing difficulty have used specific external alarm systems, which use a flashlight or a vibrating object triggered when the relevant sound occurs [17, 18]. The disadvantages of these devices are that (1) they are costly and (2) a separate system is needed for each type of alert sound. Further, a new approach has been developed in which speech enhancement and
alerting are carried out in the same device, turning hearing aids into smart hearing aids [19].
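As a concrete illustration of the noisy-to-clean mapping f: X → Y discussed above, the following PyTorch sketch defines a small feedforward network operating on spectral frames; the layer sizes and frame dimension are illustrative assumptions and do not reproduce any specific cited architecture.

```python
import torch
import torch.nn as nn

class DenoisingFNN(nn.Module):
    """Map a noisy log-spectral frame X to an estimate of the clean frame Y."""
    def __init__(self, n_bins=257, hidden=512):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_bins, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, n_bins),
        )

    def forward(self, x):
        return self.net(x)

# Training would minimize a regression loss, e.g. nn.MSELoss(), between
# the network output and the corresponding clean-speech frames.
```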
3 Smartphone-Based Hearing Aids
People who use hearing aids along with auxiliary devices have an improved user experience compared with people who use hearing aids alone [20–23]. In [20], an onboard smartphone chipset with a Linux operating system is utilized to assist the operation of a hearing aid; it gives high performance with low-noise audio, but because an additional board is included, its cost is significantly high. In [21], a smartphone-based app is used, and a commercial Bluetooth headset has also been tested. In [22, 23], a system has been tested with a smartphone and personal frequency modulation. Various deep learning-based speech enhancement approaches have been created and proven useful, but they have failed to reach real-time processing, which is essential in the case of hearing aids. In this context, Wang and co-workers have contributed significantly to the field of smartphone-based binaural hearing aids [24]. A smartphone-based hearing self-assessment (SHSA) system has been developed; the SHSA utilizes a hearing aid as the audio source, with a "coarse-to-fine focus" algorithm, and a smartphone as the controller and user interface. According to the findings, its average test accuracy differs from a traditional hearing test by less than 6 dB HL, while saving more than 50% of the testing time. People with hearing loss can use the SHSA system to assess their own hearing capability. Further, Bhat et al. have developed a multi-objective learning-based CNN speech enhancement (SE) method [25]. The suggested CNN-based SE method is computationally efficient and can perform speech enhancement in real time with reduced audio latency on smartphones. Their results confirm the suggested method's functionality and its practical applicability in the real domain under varying noise levels and low SNRs. Moreover, Sun et al. introduced a supervised speech enhancement method based on an RNN structure to handle the real-time challenge [26]; it increases speech intelligibility and quality even at low SNR, and the structure is designed to lower the overall computational complexity.
4 Occlusion Effect in Hearing Aids
When the cartilaginous portion of the ear canal is totally or considerably occluded by the hearing aid (because fitting it in the bony section leads to physical discomfort), acoustic energy is confined inside the ear canal. When a hearing aid user speaks or chews, vibrations are carried through the cartilaginous region of the ear
canal, which serves as an elastic membrane, giving users the impression that their own speech is muffled. People who regularly hear low-frequency sounds well but have severe to profound sensorineural losses at high frequencies are more susceptible to the occlusion effect (OE). This happens as a result of an increase in power at low frequencies, mostly in the range of 200–500 Hz. It is reported in [27] that an active noise cancellation (ANC) system with a fixed digital controller was used to reduce the OE. A controller was constructed, and simulations demonstrated a significant reduction of the OE in the 60–700 Hz frequency range. The proposed method does not require a precise fit and is robust in a variety of everyday situations; however, the controller must be tuned individually to ensure the best performance. Furthermore, a complete analysis of occlusion effect cancellation (OEC) and its relationship to ANC, as known from noise-canceling headphones, is presented in [28]. Acoustic measurements, design restrictions, system topologies, synergies, filter design, and subjective and objective evaluations are all covered in detail. The primary advantage of the suggested OEC structure is that it almost decouples the performance and design of the feedforward and feedback filters. By swapping filter coefficients, the system may switch between ANC and OEC operation modes.
5 Feedback Cancellation in Hearing Aids
According to the block diagram in Fig. 3a, a hearing aid is made up of a microphone that accepts the input signal s(n), a signal processing block G(z) that handles the amplification, and a receiver that works as a loudspeaker. All signal processing for noise reduction, signal amplification, and sub-band processing based on the user's level of hearing loss is contained in the hearing aid's forward path, denoted G(z). The microphone and receiver of a hearing aid are placed close to one another owing to the hearing aid's small size. Furthermore, the user would feel uncomfortable if the hearing aid were fitted too tightly. As seen in Fig. 3a, this creates a feedback path F(z) between the receiver and the microphone. The microphone picks up again the receiver signal Y(z) that is supposed to be delivered to the user's ear, creating a closed-loop system. This phenomenon in hearing aids is called acoustic feedback. From Fig. 3a, the closed-loop transfer function between the input signal S(z) and the received signal Y(z) is
$$H(z) = \frac{Y(z)}{S(z)} = \frac{G(z)}{1 - G(z)F(z)}$$
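The stability implication of this transfer function can be seen numerically: whenever the loop gain G(z)F(z) approaches 1, H(z) blows up and the device howls. The sketch below evaluates the loop gain on the unit circle for a toy flat forward gain and a toy feedback impulse response; both are assumptions for illustration.

```python
import numpy as np

w = np.linspace(0, np.pi, 512)           # frequencies on the unit circle
z = np.exp(1j * w)
G = 10.0                                  # toy flat forward-path gain
f = np.array([0.0, 0.05, -0.03, 0.02])    # toy feedback impulse response
F = np.polyval(f[::-1], 1 / z)            # F(z) = sum_k f[k] z^{-k}
H = G / (1 - G * F)                       # closed-loop transfer function
print("max loop gain |G(z)F(z)|:", np.abs(G * F).max())  # howling as -> 1
```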
Acoustic feedback is the major problem found in hearing aid devices. It causes the device to oscillate at high gain, limits the maximum gain available to the user, and disturbs the sound significantly, causing howling, screeching, and whistling. As a result, acoustic feedback minimization has gained utmost importance in hearing aids. According
Fig. 3 a Hearing aid block diagram with acoustic feedback. b Hearing aid block diagram with normalized least mean square algorithm based on adaptive feedback cancellation (AFC)
to the survey, several techniques have been utilized to solve the problem of acoustic feedback. Some methods are proposed in [29–33], and a full evaluation of numerous AFC proposals for hearing aids is reported in [34, 35]. It is worth noting that when a hearing-impaired user holds a phone near the ear, the acoustic feedback between the receiver and the microphone can change rapidly and dramatically [36]. AFC is performed using an adaptive filter W(z) to model the acoustic feedback path F(z), as shown in Fig. 3b. Because of its simplicity, robustness, and ease of implementation, the normalized least mean square (NLMS) method [37] is the most widely used adaptive algorithm for AFC. The received signal y(n) and the microphone signal x(n), which is the sum of s(n) and y_f(n), serve as the input signal and desired response, respectively, for W(z). These two signals are highly correlated, resulting in a biased convergence of W(z) and, consequently, non-optimal acoustic feedback cancellation. Over the past few decades, many researchers have focused on developing efficient adaptive filtering algorithms; the goal of this review is to highlight important works in this field.
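A minimal sketch of the NLMS update for W(z) is given below; the filter length, step size, and regularization constant are illustrative assumptions, and the forward path G(z) is omitted for simplicity.

```python
import numpy as np

def nlms_afc(x_mic, u_rcv, L=64, mu=0.01, eps=1e-8):
    """Adapt W(z) toward F(z): u_rcv is the receiver (loudspeaker) signal
    and x_mic = s(n) + y_f(n) the microphone signal. Returns the
    feedback-compensated error signal e(n) and the final weights."""
    w = np.zeros(L)
    e = np.zeros(len(x_mic))
    for n in range(L, len(x_mic)):
        u = u_rcv[n - L:n][::-1]             # last L receiver samples
        e[n] = x_mic[n] - w @ u              # cancel estimated feedback
        w += mu * e[n] * u / (u @ u + eps)   # NLMS weight update
    return e, w
```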
In hearing aids, a closed loop is created; because of this, the loudspeaker output and the input signal are highly correlated, which biases the feedback path estimate. To minimize this bias, various decorrelation methods were introduced, viz. delay insertion [38–42], probe noise insertion [43–46], frequency shifting [47, 48], phase modulation [49], and pre-whitening filters [50, 51]. Adaptive feedback cancellation utilizing the prediction error method (PEM-AFC) is well known, with applications in both the time [51–56] and frequency [39, 57–61] domains. In this method, pre-filters are used to pre-whiten the adaptive filter signals, which results in less correlation and less bias. Further methods such as sub-band techniques [62–65], multiple microphones [56, 66–71], fast-converging adaptive filters [51, 52, 54, 69, 72–74], filters with affine combination [59], variable step size (VSS) [48, 75–77], and combinations of these techniques [68, 78, 79] have also improved AFC performance. The work in [80] proposes an AFC strategy based on decomposing an adaptive filter into a Kronecker product of two shorter filters. Although the aforementioned AFC approaches can improve system performance to some extent, the need for a dependable AFC technique persists. When the least mean square (LMS) and normalized LMS (NLMS) algorithms are used, AFC performance suffers due to the sparse nature of the feedback path [81, 82] and correlated input signals. To further enhance the convergence and tracking rates, the hybrid normalized least mean square method (H-NLMS) for AFC has been introduced [83]. Additionally, a hardware implementation of acoustic feedback cancellers (AFCs) in hearing aids has been achieved using the partitioned time-domain block LMS (PTBLMS) method [84]. To further address changing sparsity conditions, the re-weighted zero-attracting proportionate normalized sub-band adaptive filtering (RZA-PNSAF) algorithm was developed; it increased the perceptual evaluation of speech quality (PESQ) values and the maximum stable gain by 3–5 dB [85]. A switching PEM that employs soft-clipping for AFC was also created; it operates on a new rule for updating the adaptive filter coefficients [86]. Additionally, the convex proportionate NLMS (CPNLMS) and convex improved proportionate NLMS (CIPNLMS) algorithms were proposed to improve the convergence rate and steady-state performance of the adaptive filter for acoustic feedback cancellation in hearing aids [87]. Moreover, a sparsity-aware affine-projection-like robust set membership M-estimate (SAPL-RSM) filter was considered in HAs to reduce the effect of impulsive noise on the adaptive weights of the feedback canceller [88]. Details of advances in smart hearing aids and algorithms implemented for feedback cancellation are systematically summarized in Tables 1 and 2, respectively.
Table 1 Recent advances in smart hearing aids

Authors (Year) | Contribution | References
Park and Lee (2016) | Redundant convolutional encoder-decoder (R-CED) network was used to map noisy speech to clean speech via supervised learning to remove babble noise | [13]
Nossier et al. (2019) | A smart hearing aid was developed which distinguishes important sounds, such as fire alarms, within noisy signals | [19]
Chern et al. (2017) | A smartphone-based hearing assistance system (SmartHear) was developed for a variety of target users who could gain from improved listening clarity in the classroom. The SmartHear system includes speech transmitter and receiver devices (smartphone and Bluetooth headset), as well as an Android mobile application that connects and controls the various devices through Bluetooth or WiFi | [23]
Chen et al. (2019) | Smartphone-based hearing self-assessment (SHSA) system was used to self-check the degree of hearing loss, using a smartphone as the user interface and controller and a hearing aid as the audio source | [24]
Bhat et al. (2019) | A smartphone application was built which performs real-time speech enhancement to assist a HA device; a multi-objective learning-based CNN was used, which is computationally fast and reduces processing delay | [25]
Sun et al. (2020) | A supervised speech enhancement method using an RNN for real-time application in HAs | [26]
6 Conclusion
The present review article provides a summary of recent advances in the performance of hearing aids. We summarize the basics of hearing aids, smartphone-based hearing aids, the occlusion effect, and feedback cancellation in hearing aids. Further, the adaptive signal processing techniques employed for occlusion and feedback mitigation have been discussed. In the past decade, various adaptive filtering techniques have been employed for acoustic feedback mitigation in hearing aids. In recent times, focus has shifted toward making smart hearing aids integrated with Android or smartphone-based platforms. Researchers can take up work toward improving the perceptual speech quality delivered by hearing aids and making them more self-adjustable and self-sufficient. Further, hearing aids can be integrated with machine learning and artificial intelligence-based notions in the upcoming days. With the advent of the latest technologies, the paradigm can be shifted toward making the hearing aid a complete health monitoring device by embedding several health and cognitive monitoring facilities. We firmly believe that the present review article will provide readers with significant insight into the chosen topic.
Table 2 Algorithms for feedback cancellation in hearing aids

Authors (Year) | Contribution | References
Van Waterschoot et al. (2011) | AFC and its challenges were reported | [40]
Guo et al. (2012) | Probe noise inserted and enhanced to reduce bias; convergence rate was reduced by a factor of 10 | [44]
Nakagawa et al. (2014) | Injecting probe noise into the loudspeaker reduces signal quality, so the probe signal is shaped and the forward path delay is reduced | [46]
Schroeder (1964) | Frequency shifting with prediction error method was used, which responded faster to feedback changes | [47]
Guo et al. (2012) | Phase modulation was used and compared with frequency shifting (FS); FS was observed to give slightly better AFC performance than phase modulation | [49]
Spriet et al. (2005) | Two-channel AFC and a decoupled PEM-AFC are used; PEM-AFC is preferred for highly non-stationary signals | [51]
Tran et al. (2016) | NLMS adaptive filters provide a slow convergence rate with colored incoming signals; the affine projection algorithm (APA) was suggested to increase the convergence rate | [52]
Tran et al. (2017) | The proportionate NLMS (PNLMS) and improved PNLMS (IPNLMS) algorithms were proposed to speed up convergence and tracking for sparse echo responses | [54]
Bernardi et al. (2015) | PEM-based pre-whitening and a frequency-domain Kalman filter (PEM-FDKF) for AFC are compared with the standard frequency-domain adaptive filter (FDAF); the proposed algorithm reduces estimation error and improves sound quality | [59]
Tran et al. (2016) | An improved practical VSS algorithm (IPVSS) is proposed which uses a variable step size, with upper and lower limits, in the weight update equation of the adaptive filter | [77]
Vasundhara et al. (2016) | Convex proportionate normalized Wilcoxon LMS (PNWLMS) algorithm was proposed, showing better cancellation performance than the filtered-x LMS algorithm | [81]
Nordholm et al. (2018) | The hybrid-AFC scheme was proposed, where a soft-clipping-based stability detector decides which algorithm (NLMS or PEM) is used to update the adaptive filter; computational complexity increased slightly | [83]
Vasundhara et al. (2018) | Partitioned time-domain block LMS (PTBLMS) algorithm was proposed, realizing a hardware implementation of AFC | [84]
Tran et al. (2021) | A switched PEM with soft-clipping (swPEMSC) was proposed which improved the convergence and tracking rates, giving a better ability to recover from instability/howling; the howling effect was reduced and the system became stable | [86]
Vanamadi et al. (2021) | Convex improved proportionate NLMS (CIPNLMS) algorithm was proposed to improve AFC performance | [87]
Vasundhara (2021) | Sparsity-aware affine-projection-like robust set membership M-estimate (SAPL-RSM) filtering improves AFC through its weight update method when impulsive noise enters the HA processing; misalignment was reduced and sound quality improved | [88]
References
1. National Institute on Aging, https://www.nia.nih.gov/health/hearing-loss-common-problem-older-adults. Accessed on 01 Apr 2022
2. Halawani SM, Al-Talhi AR, Khan AW (2013) Speech enhancement techniques for hearing impaired people: digital signal processing based approach. Life Sci J 10(4):3467–3476
3. Loizou PC (2013) Speech enhancement: theory and practice, 2nd edn. CRC, Boca Raton, FL
4. LeCun Y, Bengio Y, Hinton G (2015) Deep learning. Nature 521(1):436–444
5. Xu Y, Du J, Dai L-R, Lee C-H (2015) A regression approach to speech enhancement based on deep neural networks. IEEE/ACM Trans Audio Speech Lang Process 23(1):7–19
6. Lu X, Tsao Y, Matsuda S, Hori C (2013) Speech enhancement based on deep denoising autoencoder. Interspeech
7. Wang Y, Wang DL (2012) Boosting classification based speech separation using temporal dynamics. In: 13th proceedings on Interspeech. ISCA Archive, pp 1528–1531
8. Wang Y, Wang DL (2012) Cocktail party processing via structured prediction. In: Proceedings of advances in neural information processing systems. Curran Associates, pp 224–232
9. Wang Y, Wang DL (2013) Towards scaling up classification based speech separation. IEEE Trans Audio Speech Lang Process 21(7):1381–1390
10. Healy EW, Yoho SE, Wang Y, Wang DL (2013) An algorithm to improve speech recognition in noise for hearing impaired listeners. J Acoust Soc Am 134(4):3029–3038
11. Bengio Y (2009) Learning deep architectures for AI. Foundat Trends Mach Learn 2(1):1–127
12. Erhan D, Courville A, Bengio Y, Vincent P (2010) Why does unsupervised pre-training help deep learning. J Mach Learn Res 11:625–660
13. Park SR, Lee J (2016) A fully convolutional neural network for speech enhancement. Available: https://arxiv.org/abs/1609.07132
14. Fu SW, Tsao Y, Lu X, Kawai H (2017) Raw waveform-based speech enhancement by fully convolutional networks. Available: https://arxiv.org/abs/1703.02205
15. Nivarthi PM, Nadendla SH, Kumar CS (2021) Comparative study of deep learning techniques used for speech enhancement. In: 2021 IEEE 6th international conference on computing, communication and automation (ICCCA), pp 161–165
16. Zhao H, Zarar S, Tashev I, Lee C-H (2018) Convolutional-recurrent neural networks for speech enhancement. Available: https://arxiv.org/abs/1805.00579
17. Khandelwal R, Narayanan S, Li L (2006) Emergency alert service [Online]. Available: https://patents.google.com/patent/US7119675B2/en
18. Ketabdar H, Polzehl T (2009) Tactile and visual alerts for deaf people by mobile phones. In: Proceedings 11th international ACM SIGACCESS conference on computer access, pp 253–254
19. Nossier SA, Rizk MRM, Moussa ND, Shehaby S (2019) Enhanced smart hearing aid using deep neural networks. Alex Eng J 58(2):539–550
20. Pisha L, Hamilton S, Sengupta D, Lee C-H, Vastare KC, Zubatiy T, Luna S, Yalcin C, Grant A, Gupta R, Chockalingam G, Rao BD, Garudadri H (2018) A wearable platform for research in augmented hearing. In: Proceedings 52nd Asilomar conference signals, system, computing, pp 223–227
21. Panahi IMS, Kehtarnavaz N, Thibodeau L (2018) Smartphone as a research platform for hearing study and hearing aid applications. J Acoust Soc Am 143(3):1738
22. Lin Y-C, Lai Y-H, Chang H-W, Tsao Y, Chang Y-P, Chang RY (2018) Smart hear: a smartphone-based remote microphone hearing assistive system using wireless technologies. IEEE Syst J 12(1):20–29
23.
Chern A, Lai Y-H, Chang Y-P, Tsao Y, Chang RY, Chang H-W (2017) A smartphone-based multi-functional hearing assistive system to facilitate speech recognition in the classroom. IEEE Access 5:10339–10351 24. Chen F, Wang S, Li J, Tan H, Jia W, Wang Z (2019) Smartphone-based hearing self-assessment system using hearing aids with fast audiometry method. IEEE Trans Biomed Circuits Syst 13(1):170–179
25. Bhat GS, Shankar N, Reddy CKA, Panahi IMS (2019) A real-time convolutional neural network based speech enhancement for hearing impaired listeners using smartphone. IEEE Access 7:78421–78433 26. Sun Z, Li Y, Jiang H, Chen F, Xie X, Wang Z (2020) A supervised speech enhancement method for smartphone-based binaural hearing aids. IEEE Trans Biomed Circuits Syst 14(5):951–960 27. Liebich S, Jax P, Vary P (2016) Active cancellation of the occlusion effect in hearing aids by time invariant robust feedback. Speech communication. In: 12th ITG symposium. Germany, pp 1–5 28. Liebich S, Vary P (2022) Occlusion effect cancellation in headphones and hearing devices—the sister of active noise cancellation. IEEE/ACM Trans Audio Speech Lang Process 30:35–48 29. Maxwell J, Zurek P (1995) Reducing acoustic feedback in hearing aids. IEEE Trans Speech Audio Process 4:304–313 30. Edwards BW (1998) Signal processing techniques for a DSP hearing aid. Proc IEEE ISCAS 6:586–589 31. Bustamante DK, Worrall TL, Williamson MJ (1989) Measurement and adaptive suppression of acoustic feedback in hearing aids. Proc IEEE ICASSP 3:2017–2020 32. Kaelin A, Lindgren A, Wyrsch S (1998) A digital frequency domain implementation of a very high gain hearing aid with compensation for recruitment of loudness and acoustic echo cancellation. Signal Process 64(1):71–85 33. Kates JM (1999) Constrained adaptation for feedback cancellation in hearing aids. J Acoust Soc Am 106(2):1010–1019 34. Kates JM (2008) Digital hearing aids. Plural Publishing 35. Ma G, Gran F, Jacobsen F, Agerkvist FT (2011) Adaptive feedback cancellation with bandlimited LPC vocoder in digital hearing aids. IEEE Trans Audio Speech Lang Process 19(4):677– 687 36. Spriet A, Moonen M, Wouters J (2010) Evaluation of feedback reduction techniques in hearing aids based on physical performance measures. J Acoust Soc Am 128(3):1245–1261 37. Douglas SC (1994) A family of normalized LMS algorithms. IEEE Signal Process Lett 1(3):49– 51 38. Siqueira MG, Alwan A (2000) Steady-state analysis of continuous adaptation in acoustic feedback reduction systems for hearing-aids. IEEE Trans Speech Audio Process 8(4):443–453 39. Spriet A, Doclo S, Moonen M, Wouters J (2008) Feedback control in hearing aids. In: Springer handbook of speech processing. Springer, Berlin/Heidelberg, pp 979–1000 40. Van Waterschoot V, Moonen M (2011) Fifty years of acoustic feedback control: state of the art and future challenges. Proc IEEE 99(2):288–327 41. Hellgren J, Forssell U (2001) Bias of feedback cancellation algorithms in hearing aids based on direct closed loop identification. IEEE Trans Speech Audio Process 9(8):906–913 42. Laugesen S, Hansen KV, Hellgren J (1999) Acceptable delays in hearing aids and implications for feedback cancellation. J Acoust Soc Am 105(2):1211–1212 43. Kates J (1990) Feedback cancellation in hearing aids. In: Proceedings of the IEEE international conference on acoustics, speech, and signal processing. NM, pp 1125–1128 44. Guo M, Jensen SH, Jensen J (2012) Novel acoustic feedback cancellation approaches in hearing aid applications using probe noise and probe noise enhancement. IEEE Trans Audio Speech Lang Process 20(9):2549–2563 45. Guo M, Elmedyb TB, Jensen SH, Jensen J (2012) On acoustic feedback cancellation using probe noise in multiple-microphone and single-loudspeaker systems. IEEE Signal Process Lett 19(5):283–286 46. Nakagawa CRC, Nordholm S, Yan WY (2014) Feedback cancellation with probe shaping compensation. IEEE Signal Process Lett 21(3):365–369 47. 
Schroeder MR (1964) Improvement of acoustic-feedback stability by frequency shifting. J Acoust Soc Am 36(9):1718–1724 48. Strasser F, Puder H (2015) Adaptive feedback cancellation for realistic hearing aid applications. IEEE/ACM Trans Audio Speech Lang Process 23(12):2322–2333
49. Guo M, Jensen SH, Jensen J, Grant SL (2012) On the use of a phase modulation method for decorrelation in acoustic feedback cancellation. In: Proceedings of the European signal processing conference (EUSIPCO). Bucharest, pp 2000–2004 50. Hellgren J (2002) Analysis of feedback cancellation in hearing aids with filtered-X LMS and the direct method of closed loop identification. IEEE Trans Speech Audio Process 10(2):119–131 51. Spriet A, Proudler I, Moonen M, Wouters J (2005) Adaptive feedback cancellation in hearing aids with linear prediction of the desired signal. IEEE Trans Signal Process 53(10):3749–3763 52. Tran LTT, Dam HH, Nordholm S (2016) Affine projection algorithm for acoustic feedback cancellation using prediction error method in hearing aids. In: Proceedings of the IEEE international workshop on acoustic signal enhancement (IWAENC), Xi’an 53. Rombouts G, Van Waterschoot T, Moonen M (2007) Robust and efficient implementation of the PEM-AFROW algorithm for acoustic feedback cancellation. J Audio Eng Soc 55(11):955–966 54. Tran LTT, Schepker H, Doclo S, Dam HH, Nordholm S (2017) Proportionate NLMS for adaptive feedback control in hearing aids. In: Proceedings of the IEEE international conference on acoustics, speech and signal processing, New Orleans, LA 55. Gil-Cacho JM, van Waterschoot T, Moonen M, Jensen SH (2012) Transform domain prediction error method for improved acoustic echo and feedback cancellation. In: Proceedings of the European signal processing conference (EUSIPCO). Bucharest, pp. 2422–2426 56. Tran LTT, Nordholm SE, Schepker H, Dam HH, Doclo S (2018) Two-microphone hearing aids using prediction error method for adaptive feedback control. IEEE/ACM Trans Audio Speech Lang Process 26(5):909–923 57. Spriet A, Rombouts G, Moonen M, Wouters J (2006) Adaptive feedback cancellation in hearing aids. Elsevier J Frankl Inst 343(6):545–573 58. Bernardi G, Van Waterschoot T, Wouters J, Moonen M (2015) An all-frequency-domain adaptive filter with PEM-based decorrelation for acoustic feedback control. In: Proceedings of the workshop on applications of signal processing to audio and acoustics (WASPAA). New Paltz, NY, pp 1–5 59. Bernardi G, Van Waterschoot T, Wouters J, Hillbratt M, Moonen M (2015)A PEM-based frequency-domain Kalman filter for adaptive feedback cancellation. In: Proceedings of the 23rd European signal processing conference (EUSIPCO). Nice, pp 270–274 60. Schepker H, Tran LTT, Nordholm S, Doclo S (2016) Improving adaptive feedback cancellation in hearing aids using an affine combination of filters. In: Proceedings of the IEEE international conference on acoustics, speech and signal processing, Shanghai 61. Tran LTT, Schepker H, Doclo S, Dam HH, Nordholm S (2018) Frequency domain improved practical variable step-size for adaptive feedback cancellation using pre-filters. In: Proceedings of the 2018 16th international workshop on acoustic signal enhancement (IWAENC). Tokyo, pp 171–175 62. Yang F, Wu M, Ji P, Yang J (2012) An improved multiband-structured subband adaptive filter algorithm. IEEE Signal Process Lett 19(10):647–650 63. Strasser F, Puder H (2014) Sub-band feedback cancellation with variable step sizes for music signals in hearing aids. In: Proceedings of the IEEE international conference on acoustics, speech and signal processing. Florence, pp 8207–8211 64. Khoubrouy SA, Panahi IMS (2016) An efficient delayless sub-band filtering for adaptive feedback compensation in hearing aid. J Signal Process Syst 83:401–409 65. 
Pradhan S, Patel V, Somani D, George NV (2017) An improved proportionate delayless multiband-structured subband adaptive feedback canceller for digital hearing aids. IEEE/ACM Trans Audio Speech Lang Process 25(8):1633–1643 66. Nakagawa CRC, Nordholm S, Yan WY (2012) Dual microphone solution for acoustic feedback cancellation for assistive listening. In: Proceedings of the IEEE international conference on acoustics, speech and signal processing. Kyoto, pp 149–152 67. Nakagawa CRC, Nordholm S, Yan WY (2015) Analysis of two microphone method for feedback cancellation. IEEE Signal Process Lett 22(1):35–39 68. Tran LTT, Nordholm S, Dam HH, Yan WY, Nakagawa CR (2015) Acoustic feedback cancellation in hearing aids using two microphones employing variable step size affine projection
algorithms. In: Proceedings of the IEEE international conference on digital signal processing (DSP). Singapore, pp 1191–1195
69. Albu F, Nakagawa R, Nordholm S (2015) Proportionate algorithms for two-microphone active feedback cancellation. In: Proceedings of the 23rd European signal processing conference (EUSIPCO). Nice, pp 290–294
70. Schepker H, Nordholm SE, Tran LTT, Doclo S (2019) Null-steering beamformer-based feedback cancellation for multi-microphone hearing aids with incoming signal preservation. IEEE/ACM Trans Audio Speech Lang Process 27(4):679–691
71. Schepker H, Nordholm S, Doclo S (2020) Acoustic feedback suppression for multi-microphone hearing devices using a soft-constrained null-steering beamformer. IEEE/ACM Trans Audio Speech Lang Process 28:929–940
72. Lee S, Kim IY, Park YC (2007) Approximated affine projection algorithm for feedback cancellation in hearing aids. Comp Methods Programs Biomed 87(3):254–261
73. Lee K, Baik YH, Park Y, Kim D, Sohn J (2011) Robust adaptive feedback canceller based on modified pseudo affine projection algorithm. In: Proceedings of the annual international conference of the IEEE engineering in medicine and biology society. Boston, MA, pp 3760–3763
74. Pradhan S, Patel V, Patel K, Maheshwari J, George NV (2017) Acoustic feedback cancellation in digital hearing aids: a sparse adaptive filtering approach. Appl Acoust 122:138–145
75. Thipphayathetthana S, Chinrungrueng C (2000) Variable step-size of the least-mean-square algorithm for reducing acoustic feedback in hearing aids. In: Proceedings of the IEEE Asia-Pacific conference on circuits and systems. Tianjin, pp 407–410
76. Rotaru M, Albu F, Coanda H (2012) A variable step size modified decorrelated NLMS algorithm for adaptive feedback cancellation in hearing aids. In: Proceedings of the international symposium on electronics and telecommunications. Timisoara, pp 263–266
77. Tran LTT, Schepker H, Doclo S, Dam HH, Nordholm S (2016) Improved practical variable step-size algorithm for adaptive feedback control in hearing aids. In: Proceedings of the IEEE international conference on signal processing and communication systems, Surfers Paradise, QLD
78. Albu F, Tran LTT, Nordholm S (2017) A combined variable step size strategy for two microphones acoustic feedback cancellation using proportionate algorithms. In: Proceedings of the Asia-Pacific signal and information processing association annual summit and conference (APSIPA ASC). Kuala Lumpur, pp 1373–1377
79. Tran LTT, Schepker H, Doclo S, Dam HH, Nordholm SE (2017) Adaptive feedback control using improved variable step-size affine projection algorithm for hearing aids. In: Proceedings of the 2017 Asia-Pacific signal and information processing association annual summit and conference (APSIPA ASC). Kuala Lumpur, pp 1633–1640
80. Bhattacharjee SS, George NV (2021) Fast and efficient acoustic feedback cancellation based on low rank approximation. Signal Process 182:107984
81. Vasundhara, Panda G, Puhan NB (2016) A robust adaptive hybrid feedback cancellation scheme for hearing aids in the presence of outliers. Appl Acoust 102:146–155
82. Maheshwari J, George NV (2016) Robust modeling of acoustic paths using a sparse adaptive algorithm. Appl Acoust 101:122–126
83. Nordholm S, Schepker H, Tran LTT, Doclo S (2018) Stability-controlled hybrid adaptive feedback cancellation scheme for hearing aids. J Acoust Soc Am 143(1):150–166
84. Vasundhara, Mohanty BK, Panda G, Puhan NB (2018) Hardware design for VLSI implementation of acoustic feedback canceller in hearing aids. Circuits Syst Signal Process 37(4):1383–1406
85. Vasundhara, Puhan NB, Panda G (2019) Zero attracting proportionate normalized sub band adaptive filtering technique for feedback cancellation in hearing aids. Appl Acoust 149:39–45
240
R. Vanitha Devi and Vasundhara
86. Tran LTT, Nordholm SE (2021) A switched algorithm for adaptive feedback cancellation using pre-filters in hearing aids. Audiol Res 11(3):389–409 87. Vanamadi R, Kar A (2021) Feedback cancellation in digital hearing aids using convex combination of proportionate adaptive algorithms. Appl Acoust 182:108175 88. Vasundhara (2021) Sparsity aware affine-projection-like filtering integrated with robust set membership and M-estimate approach for acoustic feedback cancellation in hearing aids. Appl Acoust 175:107778
Chapter 20
Hierarchical Earthquake Prediction Framework
Dipti Rana, Charmi Shah, Yamini Kabra, Ummulkiram Daginawala, and Pranjal Tibrewal
1 Introduction Earthquakes are perhaps the most dangerous catastrophic events, caused by the movement of rock layers or the displacement of the earth's tectonic plates. This sudden movement releases a tremendous amount of energy that creates a kind of seismic wave. The resulting vibrations that travel through the earth's surface cause harm to the people living in earthquake-prone regions through injuries and loss of life, damage to roads and bridges, property damage, etc., and hurt the economy in numerous ways [1]. The earth has four significant layers: the inner core, outer core, mantle, and crust. The crust and the top of the mantle make up a thin layer on the outside of our planet. Yet, this layer is not one solid piece. It consists of many pieces, like a puzzle, covering the surface of the earth. Not only that, but these interlocking pieces keep gradually moving around, sliding past and bumping into one another. We call these pieces tectonic plates, and the edges of the plates are known as the plate boundaries. The plate boundaries contain many faults, and the majority of earthquakes throughout the planet happen on these faults. Since the edges of the plates are rough, they get stuck while the remainder of the plate continues to move. At last, when the plate has moved far enough, the edges come unstuck on one of the faults, and there is an earthquake [2]. Earthquakes generally happen suddenly and do not permit much time for individuals to respond. In this way, earthquakes can cause serious injuries and death
D. Rana (B) · C. Shah (B) · Y. Kabra · U. Daginawala · P. Tibrewal Computer Science and Engineering Department, Sardar Vallabhbhai National Institute of Technology, Surat, India e-mail: [email protected] C. Shah e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 R. P. Yadav et al. (eds.), Proceedings of the International Conference on Paradigms of Computing, Communication and Data Sciences, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-19-8742-7_20
tolls, destroy immense structures and infrastructure, and lead to great financial loss. The prediction of earthquakes is essential for the well-being of the general public, but it has proven to be a complicated problem in seismology. Owing to its severity, earthquake prediction remains an open research area for the seismologist community [2]. Some statistics about earthquakes are [3]: • An earthquake is considered a common phenomenon as it always happens somewhere, every day. An average of 20,000 earthquakes every year (about 50 a day) are recorded by The National Earthquake Information Center (NEIC) worldwide. • On a yearly basis, USGS detects about half a million earthquakes worldwide. However, millions of earthquakes also occur every year that are too weak to be registered. Japan and Indonesia are the two nations with the highest number of seismic tremors. • The deadliest tremor ever hit China back in 1556, killing around 830,000 individuals. • The rim of the Pacific Ocean called the "Ring of Fire" is a region where 80% of the planet's earthquakes occur. • On May 22, 1960, in Chile, the largest earthquake in the world was recorded, having a magnitude of 9.5.
1.1 Types of Earthquakes The earth seems like a solid place from the outside, but it is rather dynamic underneath the surface, with four primary layers: a solid crust, a nearly solid mantle, a liquid outer core, and a solid inner core. The relentless movement puts stress on the earth's crust and causes a variety of earthquakes, which include tectonic earthquakes, fault-zone earthquakes, and volcanic earthquakes.
1.2 Impacts of Seismic Tremor on Earth While the edges of faults are stuck together and the remainder of the block is moving, the energy that would typically make the blocks slide past each other is accumulated. When the force of the moving blocks at last overcomes the friction of the jagged edges of the fault and it unsticks, all that stored-up energy is released. The energy radiates outward from the fault in seismic waves, just like ripples on a lake. The seismic waves shake the earth as they travel through it, and when the waves arrive at the earth's surface, they shake the ground and anything on it, such as our homes and us [4].
1.3 Challenges for the Earthquake Prediction These days, earthquake warning systems are installed in numerous remote and volcanic regions; they acquire data about tremor characteristics and their effects on the surrounding area, and may increase the number of survivors. Machine learning has been utilized to make advancements in data analysis and forecasting. Still, earthquake prediction has remained an underachieved objective due to numerous challenges. Some of them are as follows: • There is an absence of a sufficient volume of information for a successful prediction or forecast procedure. • There is a lack of technology for precisely observing the stress changes, pressure, and temperature variations deep beneath the crust through scientific instruments, which eventually results in the unavailability of comprehensive data about seismic features. • Bridging the gap between seismologists and computer scientists to explore the various avenues of technology is a challenging task. • Some predictions' results have not given a precise forecast. • The heuristic "Many foreshocks (small magnitude earthquakes) lead up to the 'main' earthquake" has been analyzed by many researchers, but not implemented.
1.4 Application It is estimated that around 500,000 earthquakes occur across the world each year, and the number is increasing with time. As this figure is quite prominent and seismologists have not found an appropriate earthquake prediction method till now, earthquakes remain largely unpredictable. Therefore, our approach to earthquake prediction can help predict the magnitude and regions of upcoming earthquakes, which can help us take necessary and immediate precautions that can save the lives of millions of people and reduce the financial loss that occurs due to collapsing buildings and properties. This research aims to prepare a technique to predict the location and magnitude of expected earthquakes with as high accuracy as possible. Thus, this research explored algorithms and methodologies to predict earthquakes using machine learning and deep learning, along with various preprocessing techniques to convert the data into a proper format before feeding it to the machine learning models, using spatio-temporal relations in a hierarchical way by considering the relevant data.
1.5 Organization of Paper This research organization is as follows: The second section lists the previous techniques used to predict earthquakes. The third section provides the proposed framework planned for this research. The fourth section displays all the simulations, graphs, and data analysis implemented. The final section concludes the research and mentions the possible future work for earthquake prediction.
2 Literature Review The study of earthquake prediction is essential for the benefit of society. This section discusses the literature on earthquake prediction techniques and lists the seismic parameters required for the prediction model. Most earthquake prediction works are classified into five categories. In the first place, a few works utilize numerical or statistical analysis to make earthquake predictions. Kannan [5] predicted earthquake focal points according to the hypothesis of spatial connection, i.e., earthquakes happening within a fault zone are related to each other. Predictions are made by taking advantage of the Poisson range identifier function (PRI), Poisson distributions, and so forth. Boucouvalas et al. [6] worked on the Fibonacci, Dual, and Lucas (FDL) strategy and proposed a plan to anticipate earthquakes by utilizing a trigger planetary angle date preceding a strong seismic tremor as a seed for the unfolding of the FDL time spiral. However, these works were tested with a highly restricted amount of data and did not give excellent outcomes (low success rate). Some works predicted earthquakes based on studies of precursor signals. Unfortunately, it is difficult to draw conclusions from these precursor signals due to very insufficient data. Besides, these precursor signals alone usually cannot lead to accurate prediction results, and hence they are not elaborated here. Later, machine learning was utilized as an effective strategy to make earthquake predictions. Last et al. [7] compared several data mining and time series analysis methods, which include J48, AdaBoost, information network (IN), multi-objective info-fuzzy network (M-IFN), k-nearest neighbors (k-NN), and SVM, for predicting the magnitude of the highest anticipated seismic event based on earlier listed seismic events in the same region. Moreover, prediction characteristics based on the Gutenberg–Richter ratio and some new seismic indicators are much more valuable than those traditionally used in the earthquake prediction literature, i.e., the average number of earthquakes in each region. Cortés et al. [8] analyzed the sensitivity of the present seismic indicators reported in the literature by adjusting the input attributes and their parameterization. It is noticed that most machine learning methods make earthquake predictions based on seismic indicators, where only time-domain but not space-domain correlations are examined.
Fourth, lately, deep learning methods have been employed in earthquake prediction. Narayanakumar et al. [9] evaluated the performance of BP neural network techniques in predicting earthquakes. The outcomes show that the BP neural network method can give better prediction accuracy for earthquakes of magnitude 3–5 than former works, but still cannot give good results for earthquakes of magnitude 5–8 due to the shortage of adequate data. It is noticed that most of these neural network methods use various features as input to predict the earthquakes' time and/or magnitudes, but very few of them study the spatial relations between earthquakes. The fifth way examines the spatio-temporal relations of earthquakes and predicts the location of impending earthquakes. This method studies the sequence of earthquakes and recognizes long-term and short-term patterns of earthquakes in a particular area using the LSTM method. Recognizing patterns over both short and long periods can help increase the accuracy of the prediction. This work believes that the seismic properties of one location correlate with the seismic properties of another location. It also considers that seismic activities tend to have specific patterns over long periods.
2.1 Seismic Parameters From the literature review, a variety of important features were discovered that need to be computed in this research work to improve the performance of the model. The following seismic parameters are taken into consideration: time, mean magnitude of the last n events, rate of the square root of seismic energy released (dE), Gutenberg–Richter equation coefficients, summation of the mean square deviation from the regression line based on the Gutenberg–Richter inverse power law (η value), difference between the maximum observed and the maximum expected earthquake magnitude, mean time between characteristic events among the last n events, and deviation from the above mean time.
3 Proposed Framework The literature review shows many methods to predict earthquakes or the location of earthquakes. The proposed work considers the spatio-temporal relations to indicate the earthquake location and uses the seismic parameters to predict the magnitude range and exact magnitude value. The proposed framework is as shown in Fig. 1. The work is divided into two parts which include location prediction and magnitude prediction. Location prediction is done using LSTM network with two-dimensional input, and for magnitude prediction, multiclass classification is used to estimate if the earthquake will be mild, medium, or fatal, and ANN is used to estimate the exact magnitude of the earthquake.
Fig. 1 Proposed hierarchical earthquake prediction framework
3.1 Dataset The dataset is downloaded from the USGS website [10] for the period 1971–2021 and over the range of 24°–40° latitude and 60°–76° longitude. The information of the earthquakes in the earthquake-prone regions of Afghanistan, Pakistan, Tajikistan, and the Northern part of India is recorded in the dataset.
3.2 Data Preprocessing for Location Prediction The area is split into four sub-regions of equal size. Each instant of our original data records an earthquake that occurred, together with the matching date and other information. For the model inputs, we want our data to be in a weekly format; therefore, we consider weeks, where a one refers to the existence of one or
more earthquakes in that particular week. The time slot is set to one week. A one-hot vector is used as the input. Each vector has four indices, one for each region, and there are 2628 vectors in total (the total number of weeks in the mentioned period). The state of an index is 1 if one or more earthquakes happened in the corresponding region during that week. This will be used as input to the dense neural network or LSTM model.
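As a minimal sketch of this preprocessing step (the file name, column names, and the 2 × 2 regional split used here are illustrative assumptions about the USGS catalog export, not details from the paper), the weekly one-hot vectors could be built as follows:

```python
import numpy as np
import pandas as pd

# Illustrative file name; assumes a USGS export with time/latitude/longitude columns
df = pd.read_csv("usgs_catalog.csv", parse_dates=["time"])

# Assign each event to one of four equal sub-regions (assumed 2 x 2 split of the area)
lat_mid, lon_mid = 32.0, 68.0  # midpoints of the 24-40 latitude / 60-76 longitude window
df["region"] = 2 * (df["latitude"] >= lat_mid) + (df["longitude"] >= lon_mid)

# Bucket events into calendar weeks and mark a region 1 if any event occurred there
df["week"] = df["time"].dt.to_period("W")
weeks = pd.period_range(df["week"].min(), df["week"].max(), freq="W")
onehot = np.zeros((len(weeks), 4), dtype=np.float32)
week_index = {w: i for i, w in enumerate(weeks)}
for _, row in df.iterrows():
    onehot[week_index[row["week"]], row["region"]] = 1.0

print(onehot.shape)  # roughly (2628, 4) for the stated period
```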
3.3 Location Prediction For estimating the location, the preprocessed vectors are fed as input to the dense/LSTM network. There will be W vectors, where W represents the number of weeks considered, each of size R, where R is the number of regions considered; in our case, it is 4. X_t will be the input at time t. H_t is the hidden state of the LSTM layer at that particular time. The architecture contains an LSTM layer that includes memory cells and will have X_t and H_{t-1} as its input and H_t as its output. These memory cells will retain the information needed for prediction and remove the information not required, on a short-term and long-term basis. This H_t will then be fed to the dropout layer, which prevents the model from overfitting. Following this, we plan to have a dense network to learn the features which help us make predictions. Finally, the softmax function may be chosen as the activation function and applied to the output of the dense network. The output of this is a vector, Y_t, containing four indices. Every index corresponds to a location. Elements in this vector are 1 or 0 depending on whether an earthquake is predicted to occur or not at time t + 1. Similarly, we have predicted using four models: DNN (dense), RNN, LSTM+CNN, and LSTM. The format of our data is given (transposed) in Fig. 2. For the rest of the networks, the preprocessed data will be given as input similarly, but the LSTM layer would be replaced by the respective model layers. The network will be densely connected, and the result will be given as output accordingly.
Fig. 2 Data preprocessing for location prediction
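A minimal Keras sketch of this architecture, assuming a window of past weekly vectors as the input sequence (the window length, layer sizes, and dropout rate are illustrative assumptions; only the LSTM-dropout-dense-softmax structure comes from the text):

```python
from tensorflow import keras
from tensorflow.keras import layers

W, R = 100, 4  # look-back window in weeks (assumed) and number of regions

model = keras.Sequential([
    layers.Input(shape=(W, R)),            # X_t: sequence of weekly one-hot vectors
    layers.LSTM(64),                       # memory cells producing hidden state H_t
    layers.Dropout(0.2),                   # guards against overfitting
    layers.Dense(32, activation="relu"),   # dense feature-learning layer
    layers.Dense(R, activation="softmax"), # Y_t: one score per region for week t+1
])
```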
3.4 Preprocessing for Magnitude Prediction The collected data cannot be fed directly to the classifier or the ANN model. Hence, we need to preprocess the data before providing it to the classification or the ANN model.
1. The data contains a lot of null values, which will be resolved in the near future.
2. The original data consists of rows representing every earthquake. In our case, we use weekly data to train our models, i.e., 1 in our dataset represents that there have been one or more earthquakes in the corresponding week. For this purpose, our data is calculated week-wise and then given as input to our models.
3. Seismic parameters that we take into consideration need to be calculated as explained above for all the regions.
3.5 Magnitude Prediction The output of location prediction will give us the regions where there is a maximum probability of an earthquake taking place. These regions will be considered for the further stage of magnitude prediction. Different models will be trained for each location, i.e., four different locations in our case. Seismic parameters calculated on the past n earthquakes are given as input for the prediction of the magnitude of the (n + 1)th earthquake. Magnitude prediction is performed in two ways.
3.6 Magnitude Range Classification Classification algorithms are used to perform multiclass classification and predict if the earthquake will be mild, medium, or fatal. Different algorithms like the random forest classifier and SVM will be trained, tested, and compared on the same inputs.
3.7 Magnitude Estimation ANN is used to estimate the exact magnitude of the earthquake. The model will be trained using backpropagation, and the output layer will have a linear activation function that will estimate the exact magnitude. If we have more than one location to predict for a given week, the predictions will be made by inputting the values into the corresponding models.
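A sketch of such a regression network in Keras, trained by backpropagation via `fit` (the hidden-layer sizes and the assumption of eight input features, one per seismic parameter listed in Sect. 2.1, are illustrative):

```python
from tensorflow import keras
from tensorflow.keras import layers

n_features = 8  # seismic parameters computed over the past n events (assumed count)

ann = keras.Sequential([
    layers.Input(shape=(n_features,)),
    layers.Dense(32, activation="relu"),
    layers.Dense(16, activation="relu"),
    layers.Dense(1, activation="linear"),  # linear output unit: exact magnitude estimate
])
ann.compile(optimizer="adam", loss="mse", metrics=["mae"])
# ann.fit(X_train, y_train, epochs=100, validation_split=0.2)  # training data assumed
```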
4 Experimental Analysis 4.1 Dataset The dataset for earthquake prediction has been downloaded from the USGS website [10] for the earthquake-prone regions of Afghanistan, Pakistan, Tajikistan, and the northern part of India for the period 1970–2021 and over the range of 24°–40° latitude and 60°–76° longitude. The dataset includes the following columns: time, latitude, longitude, depth, mag, magType, nst, gap, dmin, rms, net, ID, updated, place, type, horizontalError, depthError, magError, magNst, status, locationSource, and magSource.
4.2 Data Preprocessing for Location Prediction The collected data cannot be fed directly to the dense/LSTM model, and hence, preprocessing of the data is required. For this, the region is divided into four equal sub-regions. Each instant of our original data consists of every earthquake occurring with the corresponding date and other information. We want our data to be in a weekly format for the model inputs, and hence, we achieve that by considering weeks where a one corresponds to the presence of one or more earthquakes in that particular week. The data is filtered and processed as needed for the input of the dense/LSTM model. The time slot is defined to be one week. The input is defined as a one-hot vector. Each vector contains four indices corresponding to each location, and there are 2628 vectors (the total number of weeks in the mentioned time). The state of each index is 1 if one or more earthquakes have occurred in the region during the particular week. This will be fed into the dense/LSTM model as input. The data is shown in Fig. 2.
4.3 Location Prediction For estimating the location of the earthquake, we have built various time series models. This can be done in two ways: one-dimensional input corresponding to every region, and two-dimensional input in which the whole dataset is fed into the model for training. In one-dimensional input, we create different vectors for each region. In two-dimensional input, we include all four regions together and give that as input to our model. Our data consists of weeks with information on whether earthquakes occurred or not. The time frame we have taken is of 100 earthquake events; on the basis of the past 100 events, the 101st event is predicted. We have used DNN, LSTM, LSTM+CNN, and RNN models for prediction. We have used the softmax activation function for all these models, and the learning rate is set to 0.01 with optimizers such as Adam for the dense network and RMSProp for the rest of the models. The results are summarized in Table 1.
Fig. 3 Region wise preprocessed data
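A hedged sketch of the training configuration just described, reusing the `model` from the Sect. 3.3 sketch (the batch size, the cross-entropy loss, and the training arrays `X_train`/`y_train` are assumptions; the learning rate, optimizer choice, and epoch counts follow the text and Table 1):

```python
from tensorflow import keras

# Adam for the dense network, RMSProp for the RNN / LSTM / LSTM+CNN variants
opt = keras.optimizers.RMSprop(learning_rate=0.01)
model.compile(optimizer=opt, loss="categorical_crossentropy")

history = model.fit(X_train, y_train, epochs=50, batch_size=32, verbose=0)
print(history.history["loss"][-1])  # loss after 50 epochs, as reported in Table 1
```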
4.4 Data Preprocessing for Magnitude Prediction The collected raw data cannot be fed directly into the classification model. Hence, preprocessing of the raw data into an understandable format is required. First, the data is divided region wise into four parts. Different seismic parameters are calculated using the respective formulae for each region. Region wise preprocessed data are shown in Fig. 3.
4.5 Magnitude Range Classification For estimating the magnitude range of the earthquake occurring in a particular region, the preprocessed data for each of the four regions is fed as input to classification models. This input data is used to predict whether the earthquake's magnitude will be mild, moderate, or fatal with respect to a particular range of magnitude. If the earthquake's magnitude is less than 4.5, it is considered a mild earthquake; if the earthquake's magnitude lies between 4.5 and 5.9, it is considered a moderate earthquake; and if the magnitude is greater than 5.9, it is considered a fatal earthquake. The SVM classifier is used for the prediction of the earthquake's magnitude range. Precision and recall are calculated in the following ways: • Micro average—Calculated using the sum of total true positives, false negatives, true negatives, and false positives. • Macro average—Calculated using the average of precision and recall of each label in the dataset. • Weighted average—Calculated by considering the proportion of each label in the dataset when averaging the precision and recall of each label.
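These three averaging schemes correspond directly to scikit-learn's `average` parameter; a small sketch (the label encoding and predictions are illustrative, not values from the experiments):

```python
from sklearn.metrics import precision_score, recall_score

y_true = [0, 1, 2, 2, 1, 0, 2, 2]  # 0 = mild, 1 = moderate, 2 = fatal (illustrative)
y_pred = [0, 1, 2, 1, 1, 0, 2, 2]

for avg in ("micro", "macro", "weighted"):
    p = precision_score(y_true, y_pred, average=avg)
    r = recall_score(y_true, y_pred, average=avg)
    print(f"{avg:>8}: precision={p:.2f} recall={r:.2f}")
```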
20 Hierarchical Earthquake Prediction Framework
251
Table 1 Loss after the given epochs of respective models

Model used | Loss after 25 epochs | Loss after 50 epochs
Dense      | 0.67 | 0.56
RNN        | 0.52 | 0.30
LSTM + CNN | 0.51 | 0.42
LSTM       | 0.51 | 0.39
Table 2 Result obtained for SVM

Region | Accuracy (%) | Precision micro (%) | Precision macro (%) | Precision weighted (%) | Recall micro (%) | Recall macro (%) | Recall weighted (%)
1   | 89 | 89 | 44 | 78 | 89 | 50 | 89
2   | 82 | 82 | 52 | 81 | 82 | 51 | 82
3   | 96 | 96 | 57 | 96 | 96 | 50 | 96
4   | 78 | 78 | 50 | 76 | 78 | 49 | 78
All | 83 | 83 | 83 | 83 | 83 | 61 | 83
Table 3 Result obtained for Naïve Bayes

Region | Accuracy (%) | Precision micro (%) | Precision macro (%) | Precision weighted (%) | Recall micro (%) | Recall macro (%) | Recall weighted (%)
1   | 89 | 89 | 44 | 78 | 89 | 50 | 89
2   | 59 | 59 | 36 | 64 | 59 | 63 | 59
3   | 96 | 96 | 57 | 95 | 96 | 53 | 96
4   | 64 | 64 | 38 | 62 | 64 | 38 | 64
All | 72 | 72 | 43 | 71 | 72 | 42 | 72
The results in Table 2 show that justifiable accuracy is achieved for the magnitude range prediction using SVM. From the analysis of the results, it is found that, compared to the data of region 3, the data for regions 1, 2, and 4 is imbalanced and requires balancing.
4.6 Magnitude Estimation To calculate the estimated magnitude of the earthquake occurring in a particular region, the preprocessed data for each of the four regions is fed as input to the artificial neural network regression model. This model will predict the numeric value of the magnitude.
Table 4 Result obtained for random forest

Region | Accuracy (%) | Precision micro (%) | Precision macro (%) | Precision weighted (%) | Recall micro (%) | Recall macro (%) | Recall weighted (%)
1   | 91 | 91 | 96 | 92 | 91 | 62 | 91
2   | 83 | 82 | 54 | 82 | 83 | 51 | 83
3   | 94 | 96 | 65 | 94 | 94 | 40 | 94
4   | 81 | 78 | 53 | 80 | 81 | 51 | 81
All | 85 | 85 | 72 | 85 | 85 | 56 | 85
Table 5 Result obtained for ANN

Region | MSE (%) | MAE (%)
1   | 0.11 | 0.27
2   | 0.25 | 0.38
3   | 0.21 | 0.36
4   | 0.15 | 0.29
All | 0.18 | 0.32
5 Conclusions This research discussed the various causes of earthquakes and the significant factors affecting them, in order to understand the different parameters that can act as a backbone for estimating earthquakes. Here, we have reviewed various research papers that acquaint us with the multiple methodologies undertaken to aid this prediction. The research proposed a hierarchical framework for more accurate prediction and implemented location prediction using a dense neural network as a time series model, as well as justifiable magnitude range prediction and magnitude value prediction, achieved by feeding the preprocessed data to the SVM and ANN models, respectively. In the future, the work will be carried out to implement the LSTM model for location prediction, improve the framework, and generalize the model to improve accuracy in predicting the exact magnitude of future earthquakes.
References
1. Murwantara IM, Yugopuspito P, Hermawan R (2020) Comparison of machine learning performance for earthquake prediction in Indonesia using 30 years historical data. TELKOMNIKA (Telecommun Comput Electron Control) 18:1331
2. Wang Q, Guo Y, Yu L, Li P (2020) Earthquake prediction based on spatiotemporal data mining: an LSTM network approach. IEEE Trans Emerg Topics Comput 8:148–158
4. Earthquake Statistics and Facts for 2021 | PolicyAdvice, Feb 2021
5. Kannan S (2014) Improving innovative mathematical model for earthquake prediction. Eng Fail Anal 41:89–95. https://doi.org/10.1016/j.engfailanal.2013.10.016
6. Boucouvalas AC, Gkasios M, Tselikas NT, Drakatos G (2015) Modified-Fibonacci-Dual-Lucas method for earthquake prediction. In: Third international conference on remote sensing and geoinformation of the environment (RSCy2015), vol 9535, 95351. https://doi.org/10.1117/12.2192683
7. Last M, Rabinowitz N, Leonard G (2016) Predicting the maximum earthquake magnitude from seismic data in Israel and its neighboring countries. PLoS ONE 11:e0146101
8. Cortés GA, Martínez-Álvarez F, Morales-Esteban A, Reyes J, Troncoso A (2017) Using principal component analysis to improve earthquake magnitude prediction in Japan. Logic J IGPL jzx049:1–14. https://doi.org/10.1093/jigpal/jzx049
9. Narayanakumar S, Raja K (2016) A BP artificial neural network model for earthquake magnitude prediction in Himalayas, India. Circuits Syst 07(11):3456–3468
10. Search Earthquake Catalog, https://earthquake.usgs.gov/earthquakes/search/
Chapter 21
Classification Accuracy Analysis of Machine Learning Algorithms for Gearbox Fault Diagnosis Sunil Choudhary, Naresh K. Raghuwanshi, and Vikas Sharma
1 Introduction Monitoring of machine condition through vibration measurement is a very popular method in industries. Gears are among the most important critical components of gearboxes, which are used in aircraft, automobiles, machining tools, etc. The gearbox failure of these rotating machines is the most common reason for machine breakdown. Therefore, early-stage fault detection of the gearbox is the main task to avoid sudden failure. A lot of work has been reported on fault diagnosis of gearboxes using conventional signal processing techniques, and many researchers are working on artificial intelligence techniques for machine fault diagnosis. The extraction of statistical features from gearbox vibration data has been studied based on time-domain, frequency-domain, and time–frequency-domain analysis [1–4]. Machine learning algorithms such as decision tree classification (DT) [5], fault detection using proximal support vector machine (PSVM) and artificial neural network (ANN) [6], support vector machines (SVM) [7], ANN and SVM with genetic machine algorithms [8], ANN [9, 10], wavelet transform (WT) and ANN [11, 12] are the common algorithms that have been used by many researchers. Jedlinski and Jonak [7] used the SVM algorithm, which shows good classification accuracy. SVM is a popular technique in the current decade for fault diagnosis. ANN is widely used by researchers for gearbox fault diagnosis with single as well as multiple faults in the gear teeth [10]. Another widely
S. Choudhary · N. K. Raghuwanshi (B) Department of Mechanical Engineering, Malaviya National Institute of Technology, Jaipur, Rajasthan, India e-mail: [email protected]
V. Sharma Department of Mechanical-Mechatronics Engineering, The LNM Institute of Information Technology, Jaipur, Rajasthan, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 R. P. Yadav et al. (eds.), Proceedings of the International Conference on Paradigms of Computing, Communication and Data Sciences, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-19-8742-7_21
used algorithm for gearbox fault detection is SVM [7, 8, 13, 14]. The random forest classifier has depicted very good detection accuracy, which demonstrates the feasibility and effectiveness of machine learning algorithms for gearboxes [15]. A semi-supervised graph-based random forest has also been applied for gearbox fault diagnosis [16]. Wei et al. [17] used random forest to diagnose faults in the planetary gearbox. The decision tree classifier (J48 type) is also capable of detecting faults present in the spur gearbox with good accuracy [18]. In the present work, a comparison of some of the above and additional machine learning algorithms, namely Naive Bayes, decision tree, K-nearest neighbor (KNN), random forest, and SVM, is carried out; these algorithms are applied to vibration data of different classes of gear faults (e.g., chipped, eccentric, and broken). This provides a better idea for selecting suitable machine learning algorithms, with their classification accuracy levels, for gear faults (e.g., chipped, eccentric, and broken, or two faults occurring simultaneously) present in the gearbox.
2 Experimental Vibration Data Collection 2.1 Experimental Gearbox Test Setup The data used here is the vibration data of the generic gearbox. This vibration data is imported from the 2009 PHM Society Data challenge [19]. The block diagram of the gearbox is shown in Fig. 1. In this gearbox model, two types of gearbox are used, one is spur gears, and another is helical gears. The spur gearbox has input shaft, idler shaft, and output shaft. Spur Gear 1 (G1) with 32 teeth is mounted on the input shaft. Spur Gear 2 (G2) with 96 teeth and Spur Gear 3 (G3) with 48 teeth are mounted on the idler shaft. Spur Gear 4 (G4) with 80 teeth is mounted on the output shaft. The helical gearbox also has input shaft, idler shaft, and output shaft. Helical Gear 1 (G1) with 16 teeth is mounted on the input shaft. Helical Gear 2 (G2) with
Fig. 1 Generic gearbox block diagram
48 teeth and Helical Gear 3 (G3) with 24 teeth are mounted on the idler shaft. Helical Gear 4 (G4) with 40 teeth is mounted on the output shaft. B1, B2, B3, B4, B5, and B6 represent the bearings of the gearbox. In Fig. 1, A1 and A2 are the vibration data collected by accelerometers at the input and output sides, respectively. Tm is the tachometer signal (10 pulses per revolution) collected at the input shaft. The gear reduction ratio of the gearbox setup is 5:1. Two Endevco accelerometers having a sensitivity of 10 mV/g (where g = 9.81 m/s2) are mounted on the gearbox casing at the input and output sides, with an error of ±1%. Resonance occurs above 45 kHz frequency. The sampling frequency is 66,666.67 Hz or 200 kHz. To evaluate the effect of different faults on the gearbox vibration signal, tests have been conducted for six fault conditions of the gearbox for both spur and helical types. The different fault conditions of the gearbox considered are shown in Tables 1 and 2. The operating speed of the shaft is varied in continuous increments of 5 Hz, starting from 30 Hz up to 50 Hz, with high- and low-load conditions. Two samples are collected at the high-load condition and two at the low-load condition with the same shaft speed. Initially, in the first vibration signal data, the signal readings depict the healthy condition of the spur and helical gearbox. After that, different faults are created in different gears at different locations of the gearbox, and data is collected, respectively, as depicted in Tables 1 and 2.

Table 1 Spur gearbox fault conditions

Condition | Description
Spur Gear 1 (SG1) | Good or healthy condition
Spur Gear 2 (SG2) | G1 chipped and G3 eccentric
Spur Gear 3 (SG3) | G3 eccentric
Spur Gear 4 (SG4) | G3 eccentric and G4 broken
Spur Gear 5 (SG5) | G1 chipped, G3 eccentric, and G4 broken

Table 2 Helical gearbox fault conditions

Condition | Description
Helical Gear 1 (HG1) | Good or healthy condition
Helical Gear 2 (HG2) | G3 chipped
Helical Gear 3 (HG3) | G3 broken
Helical Gear 4 (HG4) | Imbalance at input shaft
Helical Gear 5 (HG5) | G3 broken
2.2 Methodology and Procedure The experimental gearbox setup contains multiple or individual faults in different gears of the spur and helical gearboxes. The gearbox is composed of several components, such as the gearbox housing, bearings, shafts (input and output sides), and gears (four gears are used in each type of gearbox). The vibration generated due to the interaction of gear faults is transferred from the gears to the shaft axis, and from the shaft axis to the bearings, which are mounted
on the gearbox housing, and then it is transferred from the bearings to the gearbox housing. This systematic connectivity generates the vibration signal, which is measured with the help of two accelerometers (input and output sides). The vibration signals are generated due to input, idler, and output shaft rotation, gear tooth meshing, impact, speed variation, and bearings. The time-domain signal data is measured using the two accelerometers at regular intervals. For recognizing the spur and helical gearbox faults in the gearbox model, raw vibration data is collected from double-stage spur and helical gearboxes with different fault conditions, which are shown in Tables 1 and 2. In the faulty condition of the gearbox, a minimum of one and a maximum of five gear faults are present in the gearbox. After the data collection, each sample is stored in a .csv format file. This .csv file is used for analysis by the five machine learning algorithms in a Python script. There are four features used for the classification of faults, and one feature is generated as the target class, where 0 indicates the healthy class of vibration data and 1 indicates the faulty class. Then, by applying the machine learning algorithms, the final classification accuracy results are presented for each class of faults present in the gearbox.
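A small pandas sketch of this labeling step (the file and column names are illustrative assumptions; the fourth "frequency" feature mentioned above is omitted here since its construction is not specified in the text):

```python
import pandas as pd

# Illustrative file names; each PHM sample is stored as a .csv file with the two
# accelerometer channels and the tachometer channel
cols = ["acc_input", "acc_output", "tachometer"]
healthy = pd.read_csv("spur_SG1.csv", names=cols)
faulty = pd.read_csv("spur_SG2.csv", names=cols)

healthy["target"] = 0  # 0 indicates the healthy class of vibration data
faulty["target"] = 1   # 1 indicates a faulty class of data
data = pd.concat([healthy, faulty], ignore_index=True)
```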
3 Classification Accuracy Selection Process The data is labeled in three columns, where the first and second columns contain the vibration data collected by the accelerometers mounted at the input side and output side, respectively, and the third column is the data measured by the tachometer mounted at the input side. A fourth column is generated for frequency in the Python script to help in the classification of faults. The flow diagram for the analysis is given in Fig. 2.
4 Machine Learning Algorithms In this work, five machine learning algorithms are used for gearbox fault diagnosis. The algorithms are Naive Bayes, KNN, decision tree, random forest, and SVM. Brief explanations of these algorithms are given in the next subsections.
4.1 Naive Bayes Machine Learning Classifier The gearbox healthy and faulty data is classified using the Naive Bayes classifier. This classifier is based on the Bayes theorem of probability. It is simple and fast in calculation, yet effective for classification. It works well where the vibration signals are noisy. If the target dataset contains a large number of numeric features, the reliability of the Naive Bayes model is limited.
Fig. 2 Flow diagram for classification accuracy selection process
4.2 KNN Machine Learning Classifier KNN is easy to implement, simple, versatile, and one of the best machine learning algorithms. KNN is built on a feature-similarity method for fault diagnosis. KNN is a lazy learner algorithm: the gearbox model structure is defined by the data points themselves. Being a lazy learning algorithm for fault classification means this algorithm does not build a model from the training dataset in advance for predicting the class of new data; all training dataset points are used in the testing phase for fault classification. In the KNN algorithm, K is the number of nearest neighbors used for classification. It does little work in the training of datasets and a large amount of work in the
testing stage to make a classification accurately. It stores the training data points instantly and acts on upcoming data accordingly. KNN shows better results with a small number of classification features compared to a large number of classification features. In this algorithm, increasing the dimension of the dataset by adding classification features generates overfitting issues for the recorded datasets.
4.3 Decision Tree Machine Learning Classifier The decision tree machine learning algorithm is one of the best algorithms for classification. As the name indicates, it generates a database model in the form of a tree-like structure. Its classification accuracy for the given features is competitive with other strategies. This algorithm can be used for multi-dimensional analysis with multiple classes of fault detection. The objective of decision tree learning is to create a model which predicts the target class value of the outcome based on the new input variables in the classification feature vectors. A feature vector is defined by each node in the decision tree classifier. The output variable of the result is evaluated by following a proper path that starts from the root and is directed by the values of the input variables. A decision tree is usually represented in the format shown in Fig. 3. Each internal node of the decision tree classifier (indicated by boxes) tests an attribute (indicated as A and B within the boxes). Each branch of this algorithm corresponds to an attribute value, which is indicated by P and Q in the figure. Each leaf node defines the class of fault. The first node is known as the root node, and branches generated from the root node lead to branch nodes and leaf nodes. Here, A is the root node, and B is the branch node. P and Q are known as leaf nodes. For small numbers of data points, not much mathematical and computational effort is required to understand the model. It works well in most cases. It has the ability to handle small and large training datasets. It gives a definite clue about which features are more helpful for classification. Often, it is biased toward classification features which have more levels.
Fig. 3 Block diagram of decision tree classifier
Large trees are complex
to interpret and define properly. There is an overfitting issue related to this classifier, so to overcome this problem, the random forest classifier was developed, which resolves most of the overfitting issues in dataset evaluation.
4.4 Random Forest Machine Learning Classifier The random forest classifier is also a supervised machine learning algorithm. It is capable of classification as well as regression analysis. Random forest has maximum flexibility and is simple to use as a machine learning algorithm. It is an ensemble classifier that combines many decision tree classifiers to overcome the overfitting issues of the datasets. It trains a large number of decision trees on subsets of the data and generates features for the models. After that, a majority vote is applied to get the combined output of the different trees. This classifier works efficiently on larger numbers of data points for fault classification.
4.5 Support Vector Machine (SVM) SVM is able to do linear classification as well as regression. It is based on the concept of constructing a surface called a hyperplane. It creates a boundary between the datasets plotted in the multi-dimensional feature space for classification. The output predicts the class of new upcoming data according to the best-fitting predefined class learned at the time of training the algorithm. It can create an N-dimensional hyperplane that assigns new data into one of the two output classes (healthy and faulty). SVM can work for classification problems of two classes or multiple classes. First, 70% of the dataset, classified into the two classes (healthy and faulty), is used for training, after which the model structure is created. The key task of an SVM algorithm is to predict the class to which a new upcoming data point belongs. An SVM creates a graphical map of all the classified data with the maximum margin available between the two classes, as shown in Fig. 4.
Fig. 4 SVM algorithm classification
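A hedged scikit-learn sketch of comparing the five classifiers described above (the feature matrix `X` and labels `y` are assumed to come from the preprocessing step; the 70/30 split follows the SVM description, while the hyperparameters shown are illustrative defaults):

```python
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC

# 70% training / 30% testing, as in the SVM description
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

classifiers = {
    "Naive Bayes": GaussianNB(),
    "KNN": KNeighborsClassifier(n_neighbors=5),
    "Decision tree": DecisionTreeClassifier(),
    "Random forest": RandomForestClassifier(n_estimators=100),
    "SVM": SVC(kernel="rbf"),
}
for name, clf in classifiers.items():
    clf.fit(X_train, y_train)
    print(name, clf.score(X_test, y_test))  # classification accuracy per classifier
```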
5 Fault Detection Results and Discussions The database is created by conducting different faulty and healthy tests on the gearbox. The final classification accuracy results using the different classifiers are analyzed in a Python script using Jupyter notebook. All the machine learning classifiers predict different accuracy levels of classification for the different gear faults already listed in Tables 1 and 2. When the features for fault detection are increased, the accuracy of the Naive Bayes classifier also increases, and vice-versa. The classification accuracy assessment of the Naive Bayes classifier, KNN classifier,
decision tree classifier, random forest classifier, and support vector machine (SVM) is carried out with a similar number of data samples for training and testing. It can be seen that the training and testing classification accuracies differ between classifiers and between the different classes of gear faults. In the case of the spur gearbox, the Naive Bayes classifier shows the best fault classification accuracy for all gear faults (chipped, eccentric, and broken) except gear fault type 1 (GFT-1). The results are given in Table 3. The SG3, SG4, and SG5 cases of the Naive Bayes classifier show accuracies of 54.80%, 61.55%, and 66.34%, respectively. However, the decision tree classifier shows the best accuracy for the SG2 case. The SVM classifier remains almost consistent across gear fault types. Similarly, in the case of the helical gearbox, the Naive Bayes classifier shows the best classification accuracy for most gear faults, as given in Table 4. The best classifier is highlighted for each class of gear fault. Again, the SVM classifier remains almost consistent across gear fault types. For HG2, HG3, and HG5, the accuracy of the Naive Bayes classifier for gearbox fault detection is 52.96%, 52.71%, and 64.16%, respectively. The random forest classifier accuracy is observed to be best for the HG4 case, at 52.82%.

Table 3 Classification accuracy of model with different classifiers of spur-type gearbox

S. No. | Model class (SG1 versus) | Gear fault type (GFT) | Naive Bayes (%) | KNN (%) | Decision tree (%) | Random forest (%) | SVM (%)
1 | SG2 | GFT-1 | 55.60 | 53.12 | 55.86 | 54.05 | 49.76
2 | SG3 | GFT-2 | 54.80 | 52.28 | 54.26 | 52.93 | 49.86
3 | SG4 | GFT-3 | 61.55 | 58.23 | 59.50 | 60.06 | 50.02
4 | SG5 | GFT-4 | 66.34 | 63.80 | 65.41 | 66.21 | 49.14
Table 4 Classification accuracy of model with different classifiers of helical-type gearbox

S. No. | Model class (HG1 versus) | Gear fault type (GFT) | Naive Bayes (%) | KNN (%) | Decision tree (%) | Random forest (%) | SVM (%)
1 | HG2 | GFT-1 | 52.96 | 51.59 | 52.58 | 52.75 | 49.18
2 | HG3 | GFT-2 | 52.71 | 51.83 | 51.77 | 51.93 | 50.75
3 | HG4 | GFT-3 | 51.51 | 51.68 | 53.71 | 52.82 | 49.88
4 | HG5 | GFT-4 | 64.15 | 60.04 | 63.45 | 62.74 | 50.49
6 Conclusions All the machine learning algorithms used for gearbox fault detection have been established with the help of time-domain vibration signals. The gearbox fault diagnosis is done using the Naive Bayes, KNN, decision tree, random forest, and SVM algorithms, and their accuracy results are compared. The best classification accuracy is found using the Naive Bayes classifier for most of the gearbox fault cases. However, SVM is observed to be a reliable algorithm due to its consistency in the results. The present work employed Naive Bayes, KNN, and random forest for gearbox fault diagnosis and found that these algorithms can also be used effectively for this task.
References
1. Laxmikant S, Mangesh D, Chaudhari B (2018) Compound gear-bearing fault feature extraction using statistical features based on time-frequency method. Measurement 125:63–77
2. Loutas TH, Sotiriades G, Kalaitzoglou I, Kostopoulos V (2009) Condition monitoring of a single-stage gearbox with artificially induced gear cracks utilizing on-line vibration and acoustic emission measurements. Appl Acoust 70:1148–1159
3. Assaad B, Eltabach M, Antoni J (2014) Vibration based condition monitoring of a multistage epicyclic gearbox in lifting cranes. Mech Syst Signal Process 42:351–367
4. Li Y, Ding K, He G, Lin H (2016) Vibration mechanisms of spur gear pair in healthy and fault states. Mech Syst Signal Process 81:183–201
5. Saravanan N, Ramachandran KI (2009) Fault diagnosis of spur bevel gear box using discrete wavelet features and decision tree classification. Expert Syst Appl 36:9564–9573
6. Saravanan N, Siddabattuni VNSK, Ramachandran KI (2010) Fault diagnosis of spur bevel gear box using artificial neural network (ANN) and proximal support vector machine (PSVM). Appl Soft Comput 10:344–360
7. Jedlinski L, Jonak J (2015) Early fault detection in gearboxes based on support vector machines and multilayer perceptron with a continuous wavelet transform. Appl Soft Comput 30:636–641
8. Samanta B (2004) Gear fault detection using artificial neural networks and support vector machines with genetic algorithms. Mech Syst Signal Process 18:625–644
9. Rafiee J, Arvani F, Harifi A, Sadeghi MH (2007) Intelligent condition monitoring of a gear box using artificial neural network. Mech Syst Signal Process 21:1746–1754
10. Dhamande LS, Chaudhari MB (2016) Detection of combined gear-bearing fault in single stage spur gear box using artificial neural network. Proc Eng 144:759–766
11. Saravanan N, Ramachandran KI (2010) Incipient gear box fault diagnosis using discrete wavelet transform (DWT) for feature extraction and classification using artificial neural network (ANN). Expert Syst Appl 37:4168–4181
12. Wu JD, Chan JJ (2009) Faulted gear identification of a rotating machinery based on wavelet transform and artificial neural network. Expert Syst Appl 36:8862–8875
13. Bordoloi DJ, Tiwari R (2014) Support vector machine based optimization of multi-fault classification of gears with evolutionary algorithms from time-frequency vibration data. Meas J Int Meas Confed 55:1–14
14. Bordoloi DJ, Tiwari R (2014) Optimum multi-fault classification of gears with integration of evolutionary and SVM algorithms. Mech Mach Theory 73:49–60
15. Zarnaq MH, Omid M, Aghdamb EB (2022) Fault diagnosis of tractor auxiliary gearbox using vibration analysis and random forest classifier. Inform Process Agric 9:60–67
16. Chen S, Yang R, Zhong M (2021) Graph-based semi-supervised random forest for rotating machinery gearbox fault diagnosis. Control Eng Pract 117:104952
17. Wei Y, Yang Y, Xu M, Huang W (2021) Intelligent fault diagnosis of planetary gearbox based on refined composite hierarchical fuzzy entropy and random forest. ISA Trans 109:340–351
18. Gunasegaran V, Muralidharan V (2020) Fault diagnosis of spur gear system through decision tree algorithm using vibration signal. Mater Today Proc 22:3232–3239
19. PHM Data Challenge Homepage, https://c3.ndc.nasa.gov/dashlink/resources/997/. Last accessed 1 Feb 2022
Chapter 22
Stock Price Forecasting Using Hybrid Prophet—LSTM Model Optimized by BPNN Deepti Patnaik, N. V. Jagannadha Rao, and Brajabandhu Padhiari
1 Introduction The stock market is the group of markets involving the regular selling, purchasing, and issuance of shares of different publicly held companies through institutionalized formal exchange. Such activities are operated in a marketplace under a defined set of regulations [1]. The shares are traded in the stock market through a stock exchange. The stock exchange, being the designated market for trading, ensures complete transparency in the transactions [2]. The stock price depends upon various parameters such as financial situations, administrative decisions, market fluctuations, and pandemic effects [3]. Again, stock prices are dynamic, nonlinear, and noisy in nature [4]. Thus, the prediction of stock prices is a complex issue because of irregularities, volatility, changing trends, and noisy, complex features [5]. A time series is a collection of data with equal spacing that should be in sequence. In a time series, the successive observations are usually not independent. Thus, any variable which changes over time can be included in time series analysis [6]. Stock prices basically vary with respect to time and thus can be tracked for the short term (every business day) or the long term, such as every month over the course of the last 18–20 years. A highlight of the stock market is the seasonal trend [7]. It is observed from the literature that Ariyo et al. applied the autoregressive integrated moving average (ARIMA) model to New York Stock Exchange (NYSE) and Nigeria Stock Exchange (NSE) data and proved that the ARIMA model compares reasonably well with emerging forecasting techniques in short-term predictions [8]. Chen and Chen proposed a fuzzy time series model for stock prediction using linguistic values
D. Patnaik (B) · N. V. J. Rao GIET-University, Gunupur, Rayagada, Odisha 765 022, India e-mail: [email protected]
D. Patnaik · B. Padhiari IIMT Bhubaneswar, Khurda, Odisha 751 003, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 R. P. Yadav et al. (eds.), Proceedings of the International Conference on Paradigms of Computing, Communication and Data Sciences, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-19-8742-7_22
of data. For nonlinear and dynamic datasets, these methods are widely accepted [9]. Elsir and Faris employed regression, artificial neural network, and support vector machine (SVM) models for forecasting stock data [10]. Bharambe and Dharmadhikari showed that stock prediction depends on the declaration of dividends, estimations of future incomes, management changes, and so on [11]. Shrivas and Sharma applied various machine learning techniques, including SVM, regression, and artificial neural network (ANN), to analyze and predict the stock trends for Bombay Stock Exchange (BSE) stock prices, and proved SVM to be the best technique among all [12]. Xu et al. used historical price data to forecast the stock price value based on a deep neural network [13]. Li et al. proposed deep learning algorithms to predict market impacts [14]. For the right investments with high profit, More et al. proposed a neuro-linguistic programming (NLP) approach for analyzing stock charts [15]. From these literature studies, it is found that the stock price is a time series stochastic process, and its stochastic nature introduces variation in volatility and thus risk [16]. Although the causes are known, their quantification is very difficult [17]. Thus, in this work, a hybrid model which is a combination of prophet and long short-term memory (LSTM) is proposed to overcome these issues. Again, at the end, the forecast is further optimized by a backpropagation neural network (BPNN). The performance parameters used here are root mean square error (RMSE), mean absolute percentage error (MAPE), and mean absolute error (MAE).
2 Prophet Model Prophet or 'Facebook Prophet' was developed by Facebook for forecasting with an additive time series model, where nonlinear trends are fit with yearly, weekly, and daily seasonality, as well as holiday effects. It is a novel model that handles missing data, shifts in the trend, and outliers. Its mathematical dynamics are given by [18],
Y(t) = g(t) + s(t) + h(t) + e(t)
(1)
where g(t) represents the nonlinear trend function, s(t) represents the seasonality changes, h(t) represents the holidays or irregular schedules, and e(t) is the error value. The modeling flowchart of the prophet model is given in Fig. 1.
Fig. 1 Flowchart of prophet model
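A minimal sketch using the `prophet` Python library (used later in Sect. 5); the `dates` and `closing_prices` arrays are assumed inputs, and the Fourier orders 10 and 3 follow the seasonality settings stated in Sect. 5:

```python
import pandas as pd
from prophet import Prophet

# The library expects a dataframe with 'ds' (dates) and 'y' (values) columns
train = pd.DataFrame({"ds": dates, "y": closing_prices})  # assumed inputs

m = Prophet(yearly_seasonality=10, weekly_seasonality=3)  # Fourier orders per Sect. 5
m.fit(train)
future = m.make_future_dataframe(periods=365)  # forecast one year ahead
forecast = m.predict(future)

# Residual e(t) over the training window, later handed to the LSTM stage
residual = train["y"] - forecast["yhat"][: len(train)]
```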
3 LSTM Model LSTM is a special kind of recurrent neural network (RNN) which can learn long-term dependencies in the data. This is achieved by a combination of four interacting layers, which overcomes the vanishing gradient problem. An LSTM unit consists of three types of gates: (i) the forget gate, which decides what information to throw away from the network; (ii) the input gate, which decides which values in the memory to update; and (iii) the output gate, which decides the output based on the input and the memory block. It works like a mini state machine, where the weights of the gates are learned during the training process. The internal short-term memory is given by [19],
h_t = σ(U x_t + V h_{t-1})
(2)
where h_t is the model's internal short-term memory, σ is the sigmoid function, U is the weight applied to the input, x_t is the sequential input, and V is the weight applied to the short-term memory. The output is given by [20],
Y_t = W h_t
(3)
where W is the weight applied to the output.
4 MAE, RMSE, MAPE 4.1 Mean Absolute Error (MAE) For a given dataset, the mean absolute error is given by [21],
\mathrm{MAE} = \frac{1}{N} \sum_{i=1}^{N} |e_i|
(4)
where e_i is the error (difference between the original and predicted value) and N is the total number of samples. It is also called the mean absolute deviation. It shows the overall error in the forecasting of stocks.
268
D. Patnaik et al.
4.2 Root Mean Square Error (RMSE) Root mean square error is a balanced error measure and is a very effective parameter in the accuracy measurement of stock forecasting. It is given by [21],
\mathrm{RMSE} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} e_i^2}
(5)
4.3 Mean Absolute Percentage Error (MAPE) Mean absolute percentage error is also called the loss function in forecasting and is given by [21],
\mathrm{MAPE} = \frac{1}{N} \sum_{i=1}^{N} \left| \frac{e_i}{o_i} \right| \times 100\%
(6)
where e_i is the error (difference between the original and predicted value), N is the total number of samples, and o_i is the original sample value.
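A direct NumPy rendering of Eqs. (4)-(6) (the function and array names are illustrative):

```python
import numpy as np

def forecast_errors(actual, predicted):
    actual, predicted = np.asarray(actual), np.asarray(predicted)
    e = actual - predicted                      # e_i, per-sample error
    mae = np.mean(np.abs(e))                    # Eq. (4)
    rmse = np.sqrt(np.mean(e ** 2))             # Eq. (5)
    mape = np.mean(np.abs(e / actual)) * 100.0  # Eq. (6), in percent
    return mae, rmse, mape
```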
5 Proposed Method At first, the historical data were collected from the Yahoo Finance website for the last 7 years, from 15.04.2015 to 14.04.2022. These data include the closing prices of NIFTY 50, S&P 500, and Nikkei 225. To visualize the trend, the price indices of these stock exchanges are shown in graphs (Figs. 2, 3, and 4). The flowchart of the proposed model is given in Fig. 5. Generally, the stock price value is the composition of linear and nonlinear components. It is given as input to the prophet model, which is designed to tune its parameters automatically without any prior knowledge. The prophet model is robust against missing data, holiday data, etc. Thus, data interpolation is not required. The prophet library of Python 3.10 is used here for simulation purposes. Seasonality is addressed here by using a Fourier series. The default values of the Fourier components used here are 10 and 3 for yearly and weekly seasonality, respectively. Out of 7 years of data, 6 years are used for training purposes, and the last one year (15.04.2021 to 14.04.2022) is used for testing purposes. The accuracy of the model is tested using the original values and forecasted values in terms of RMSE, MAPE, and MAE. The error terms or residual terms, concerned with the nonlinearity of the data, are forwarded to the next model, the deep learning LSTM model. Before applying the data to the LSTM model, they have first been normalized between 0 and 1. The number of epochs
set for the LSTM model is 250. Adam optimizer has been used for the LSTM model. Finally, the LSTM output stock price data have been applied to BPNN for fine tuning and optimization. BPNN consists of an input layer, hidden layer, and output layer.
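As a hedged, end-to-end sketch of this pipeline (the look-back window and layer sizes are assumptions; `residual` is the Prophet residual from the Sect. 2 sketch, and the 250 epochs, Adam optimizer, and [0, 1] scaling follow the text):

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler
from tensorflow import keras
from tensorflow.keras import layers

# Scale the Prophet residuals to [0, 1] and build sliding windows
scaler = MinMaxScaler()
res = scaler.fit_transform(np.asarray(residual).reshape(-1, 1))
win = 30  # look-back window length (assumed)
X = np.array([res[i:i + win] for i in range(len(res) - win)])
y = res[win:]

# LSTM models the nonlinear residual component: 250 epochs, Adam, as in the text
lstm = keras.Sequential([layers.Input(shape=(win, 1)), layers.LSTM(50), layers.Dense(1)])
lstm.compile(optimizer="adam", loss="mse")
lstm.fit(X, y, epochs=250, verbose=0)

# BPNN (a plain feed-forward net trained by backpropagation) fine-tunes the LSTM output
bpnn = keras.Sequential([layers.Input(shape=(1,)),
                         layers.Dense(16, activation="relu"),
                         layers.Dense(1)])
bpnn.compile(optimizer="adam", loss="mse")
bpnn.fit(lstm.predict(X), y, epochs=100, verbose=0)
```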
270
D. Patnaik et al.
Fig. 2 Stock price index value for S&P 500
Fig. 3 Stock price index value for Nikkei 225
Fig. 4 Stock price index value for NIFTY 50
Fig. 5 Flowchart of the proposed hybrid model
Table 1 Statistical analysis of price indices of S&P 500

Name                               RMSE    MAPE (%)   MAE
Prophet                            18.66   0.711      15.56
Hybrid model (Prophet and LSTM)    16.23   0.674      14.24
Proposed (Prophet–LSTM and BPNN)   13.52   0.588      11.57
Table 2 Statistical analysis of price indices of Nikkei 225

Name                               RMSE     MAPE (%)   MAE
Prophet                            208.76   0.998      151.22
Hybrid model (Prophet and LSTM)    201.55   0.945      149.61
Proposed (Prophet–LSTM and BPNN)   192.16   0.8412     145.34
Table 3 Statistical analysis of price indices of NIFTY 50

Name                               RMSE     MAPE (%)   MAE
Prophet                            152.56   0.967      111.62
Hybrid model (Prophet and LSTM)    150.66   0.954      109.34
Proposed (Prophet–LSTM and BPNN)   145.87   0.901      105.23
6 Results and Discussion

The proposed model is simulated in a Python environment for the standard stock indices S&P 500, Nikkei 225, and NIFTY 50. Out of 7 years of data, 6 years are used for training and the last year for testing. The RMSE, MAPE, and MAE obtained for the testing dataset are shown in Tables 1, 2, and 3 for the stock index values of S&P 500, Nikkei 225, and NIFTY 50, respectively. It is observed that the hybrid model performs better in all three cases because it accounts for both the linear and nonlinear variations and is additionally optimized by the BPNN network.
7 Conclusion

In this work, a novel hybrid forecasting model (Prophet–LSTM) is proposed and optimized by a BPNN network. It is found that the proposed optimized model performs better in all respects for several standard stock index values collected from various countries across the globe. The model will be very useful for forecasting future stock prices. It may further be analyzed with other optimization techniques and for various other stock values.
References

1. Granger CWJ, Newbold P (2014) Forecasting economic time series. Academic Press
2. Idrees SM, Alam MA, Agarwal P (2019) A prediction approach for stock market volatility based on time series data. IEEE Access 7:17287–17298. https://doi.org/10.1109/ACCESS.2019.2895252
3. Wen M, Li P, Zhang L, Chen Y (2019) Stock market trend prediction using high-order information of time series. IEEE Access 7:28299–28308. https://doi.org/10.1109/ACCESS.2019.2901842
4. Zavadzki S, Kleina M, Drozda F, Marques M (2020) Computational intelligence techniques used for stock market prediction: a systematic review. IEEE Lat Am Trans 18(04):744–755. https://doi.org/10.1109/TLA.2020.9082218
5. Devadoss A, Antony L (2013) Forecasting of stock prices using multi-layer perceptron. Int J Web Technol 2:52–58. https://doi.org/10.20894/IJWT.104.002.002.006
6. Ding X, Zhang Y, Liu T, Duan J (2015) Deep learning for event-driven stock prediction. In: IJCAI'15: Proceedings of the 24th international conference on artificial intelligence
7. Li W, Liao J (2018) A comparative study on trend forecasting approach for stock price time series. In: Proceedings of the international conference on anti-counterfeiting, security and identification
8. Ariyo AA, Adewumi AO, Ayo CK (2014) Stock price prediction using the ARIMA model. In: 2014 UKSim-AMSS 16th international conference on computer modelling and simulation, pp 106–112
9. Chen MY, Chen BT (2015) A hybrid fuzzy time series model based on granular computing for stock price forecasting. Inform Sci 294:227–241
10. Elsir AFS, Faris H (2015) A comparison between regression, artificial neural networks and support vector machines for predicting stock market index. Int J Adv Res Artif Intell 4(7)
11. Bharambe MMP, Dharmadhikari SC (2017) Stock market analysis based on artificial neural network with big data. In: Proceedings of 8th post graduate conference for information technology
12. Shrivas AK, Sharma SK (2018) A robust predictive model for stock market index prediction using data mining technique
13. Xu B, Zhang D, Zhang S, Li H, Lin H (2018) Stock market trend prediction using recurrent convolutional neural networks. In: Proceedings of CCF international conference on natural language processing and Chinese computing, pp 166–177
14. Li X, Cao J, Pan Z (2019) Market impact analysis via deep learned architectures. Neural Comput Appl 31:5989–6000
15. More AM, Rathod PU, Patil RH, Sarode DR (2018) Stock market prediction system using Hadoop. Int J Eng 16138
16. Sezer OB, Gudelek MU, Ozbayoglu AM (2020) Financial time series forecasting with deep learning: a systematic literature review 2005–2019. Appl Soft Comput J 90
17. Chatzis SP, Siakoulis V, Petropoulos A, Stavroulakis E, Vlachogiannakis N (2018) Forecasting stock market crisis events using deep and statistical machine learning techniques. Expert Syst Appl 112:353–371
18. Bashir T, Haoyong C, Tahir MF, Liqiang Z (2022) Short term electricity load forecasting using hybrid Prophet–LSTM model optimized by BPNN. Energy Reports 8. Elsevier
19. Zhang X, Tan Y (2018) Deep stock ranker: a LSTM neural network model for stock selection. In: International conference on data mining and big data. Springer, pp 614–623
20. Charan VS, Rasool A, Dubey A (2022) Stock closing price forecasting using machine learning models. In: International conference for advancement in technology (ICONAT), pp 1–7. https://doi.org/10.1109/ICONAT53423.2022.9725964
21. Rezaei H, Faaljou H, Mansourfar G (2020) Stock price prediction using deep learning and frequency decomposition. Expert Systems with Applications. Elsevier
Chapter 23
Identification of Genetically Closely Related Peanut Varieties Using Deep Learning: The Case of Fleur 11-Related Varieties
Atoumane Sene, Amadou Dahirou Gueye, and Issa Faye
1 Introduction

In Senegal, groundnut (Arachis hypogaea L.) is the most important cash crop, with a sown area of about 1 million hectares (FAO [1]). Annual production is around one million tons per year, making Senegal one of the largest groundnut producers in Africa. This production is supported by the development of improved varieties by ISRA and an increasing demand from producers for certified seed of these new improved varieties. In the seed certification process, there is a need for variety identification. However, variety identification has traditionally been based on morphological traits, particularly pod and seed characteristics. Moreover, these characteristics are highly influenced by environmental factors. Thus, this method of identification can be difficult when genetically related varieties are involved. Isogenic lines resulting from a selection scheme based on backcrossing are particularly difficult to differentiate from each other and from their parent on the basis of morphological characters. Thus, the development of more advanced identification methods based on artificial intelligence approaches could contribute to the improvement of variety identification methods. In the ECOWAS region, Distinctness, Uniformity and Stability (DUS) tests are carried out according to the International Union for the Protection of New Varieties of Plants (UPOV) guidelines for groundnuts (TG/93/4) in UPOV [2]. Similarly, the
official seed control services use the same descriptor (www.upov.int). With the need for new varieties according to the various uses of the stakeholders, there is a real need to strengthen the varietal portfolio. In Senegal, ten new groundnut varieties were recently released in WAAPP [3]. These varieties are interspecific introgression lines resulting from a cross between the Fleur 11 variety and the amphidiploid (Arachis duranensis × Arachis ipaënsis) with four backcrosses made on the Fleur 11 parent variety WAAPP [3]. These varieties are morphologically quite similar and all resemble the parent variety Fleur 11. Identification based solely on the UPOV descriptor characteristics may not be very effective. Traditional methods of variety identification are relatively limited. This may affect the quality of the seed. So far, little research has been done to develop innovative methods to improve variety identification WAAPP [3]. Therefore, the objective of this study is to propose an automated identification model based on deep learning to identify the variety from its pod and seed characteristics. This model is based on VGG16 which is an improved learning technique using deep convolutional networks. It has the advantage of being fast and reliable for the identification of peanut varieties, especially those that are genetically very close. The rest of the paper will be organized as follows: in Sect. 2, we present related work in peanut variety identification. In Sect. 3, we first propose the methodology adopted. In Sect. 4, we present the VGG16 model on which our proposal is based.
2 Related Work In this section, we first present the situation of groundnut seed production and certification in Senegal; then we detail the existing identification methods for groundnut seeds.
2.1 Situation of Peanut Seed Production and Certification in Senegal

Peanut (Arachis hypogaea L.) is a leguminous plant native to Latin America (Kouadio [4]). It is cultivated throughout the inter-tropical zone and is of great nutritional and economic importance. It is the sixth most important oilseed crop in the world (FAO [5]) and is cultivated in more than 100 countries on more than 26.4 million hectares, with an average productivity of 1.4 tons per hectare (FAO [5]; Ntare et al. [6]). In Senegal, groundnuts generate income for about a third of the population (Dia et al. [7]) and are the second most important export of the entire agricultural sector after fisheries products (ANSD [8]). Moreover, Senegal is the leading exporter of groundnuts in Africa in terms of value and volume, and these exports are primarily
Fig. 1 Groundnut production and yield in Senegal. Source DAPSA 2019
directed to China Baborska [9]. The sector has been going through a deep crisis for several decades, resulting from a combination of structural factors (disorganization of the chain, dilapidated processing infrastructure, degradation of seed capital); climatic factors (irregular and insufficient rainfall, soil degradation); and cyclical factors (emergence of new, cheaper oils at the international level). Between 2006 and 2014 (see Fig. 1), production rarely exceeded 700 thousand tons due to low yields.
2.2 Existing Methods of Identification of Peanut Seeds in Senegal

In the seed certification process, varietal purity is an important parameter. Thus, the identification of varieties on the basis of distinguishing characteristics is necessary. At present, identification is based on a set of morphological characteristics which are presented in document TG/93/4 (UPOV [2]) on Guidelines for the Examination of Distinctness, Uniformity and Stability of Peanut. These distinguishing characteristics are plant habit, plant density, anthocyanin coloration of branches, branching type, leaflet shape, and pod and seed shape. These observations may have limitations when dealing with genetically closely related varieties.
3 Methodology

The methodological approach is based, in a first step, on image acquisition of the seeds and pods of a panel of six varieties related to Fleur 11. In a second step, we proceeded to build a dataset of images classified according to the varieties.
3.1 Plant Material

The material used is composed of seven varieties, all of which are derived from the Fleur 11 variety. The Taaru variety is a cross between the Fleur 11 variety and the 73-30 variety. The other six varieties—Raw Gadu, Rafet Kaar, Yakaar, Jaambar, Tosset and Kom Kom—are all sister lines resulting from a cross between Fleur 11 and the amphidiploid (Arachis duranensis × Arachis ipaënsis) with four backcrosses. For each variety, images of the seed and shell were recorded, giving a dataset of 1102 images divided into three folders: train, test and validation.
3.2 Image Acquisition

The samples of the peanut varieties used in this paper all come from the National Center for Agricultural Research (CNRA) in Bambey, Senegal. They are all approved and rigorously selected to ensure accurate results. An iPhone 11 Pro Max camera with a resolution of 12 megapixels was used to record the images. The phone was mounted on a stand that allowed easy vertical movement and gave the camera a stable support. For each peanut variety, clear images of the shell and seed were obtained. All seeds and hulls in the sample were of certified varieties, selected by hand into bags. Each hull or seed could be placed in any random orientation and at any position in the field of view. The background was a white tablecloth.
Fig. 2 Seeds and pods of the six peanut varieties derived from Fleur 11
The field of view was 12 mm × 9 mm, and the spatial resolution was approximately 0.019 mm/pixel. For each variety, images of the seed and shell were recorded. Thus, we have a dataset of 1102 images divided into three folders: train, test and validation (Fig. 2).
4 Model Construction

Deep learning is a specific subfield of machine learning: a new approach to learning representations from data that focuses on learning successive layers of increasingly meaningful representations. The word "deep" in "deep learning" does not refer to any deeper understanding achieved by this approach; it stands for the successive layers of representations. The number of layers contributing to a data model is called the model depth Datascientest [10]. We have used the CNN model using the Python language. In the field of deep learning, a convolutional neural network (CNN) is a class of deep neural networks, most often applied to visual imagery analysis. It uses a special technique called convolution. In mathematics, convolution is an operation on two functions that produces a third function expressing how the shape of one is modified by the other. In the convolutional layer, a matrix named the kernel is slid over the input matrix to create a feature map for the next layer. Convolution is thus a specialized type of linear operation that is widely used in many fields including image processing, statistics, and physics. If we have a 2-dimensional image input, I, and a 2-dimensional kernel filter, K, the convolved image is calculated as follows:

F(i, j) = (I ∗ K)(i, j) = \sum_{m} \sum_{n} I(i − m, j − n) K(m, n)   (1)
Fig. 3 VGG16 architecture. Source Datascientest
We used the open-source software library TensorFlow for machine learning together with Keras, a high-level neural network library built on top of TensorFlow, as a wrapper, and Sublime Text as the development environment. We then used the VGG16 model, a well-known algorithm in computer vision that is often reused through transfer learning to avoid re-training it from scratch and to solve problems similar to those on which VGG has already been trained Datascientest [10]. As the model is trained on a large dataset, it has learned a good representation of low-level features such as space, edges, rotation, illumination, and shapes, and these features can be shared to allow knowledge transfer and act as a feature extractor for new images in different categories Aung et al. [11]. These new images can be of completely different categories from the source dataset on which the model was pre-trained. In this paper, we unleash the power of transfer learning using a pre-trained VGG16 model, which serves as an efficient feature extractor to classify the six (06) varieties. With a dataset of 1102 images consisting of seeds and pods for each variety listed above, our model, through each layer, filters each image, keeping only discriminative information such as atypical geometric shapes (Fig. 3).
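A minimal Keras sketch of this transfer-learning setup (illustrative: the 224 × 224 input, frozen VGG16 base, and six-class output follow the text; the classifier head is an assumption):

```python
from tensorflow import keras
from tensorflow.keras.applications import VGG16

# Pre-trained VGG16 as a frozen feature extractor (ImageNet weights, no top classifier)
base = VGG16(weights="imagenet", include_top=False, input_shape=(224, 224, 3))
base.trainable = False

model = keras.Sequential([
    base,
    keras.layers.Flatten(),
    keras.layers.Dense(256, activation="relu"),   # assumed head size
    keras.layers.Dense(6, activation="softmax"),  # six Fleur 11-related varieties
])
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
```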
5 Model Training

We used a reduced number of images to add a constraint on the input images. We took 1102 images of seed and pod data of the peanut varieties listed above for training and 90 images for validation. The pre-trained VGG16 model served as a feature extractor, reusing low-level features such as edges, corners, and rotations for the target problem, which is to classify these images according to the corresponding variety, as proposed by Naveen and Diwan [12].
The default input image size of VGG16 is 224 × 224, so we started by resizing the images in our dataset. Then we converted the images into pixel arrays (NumPy arrays) with the function img_to_array() [13]. Finally, the number of epochs was set to 100 and the training was launched; with each iteration over the dataset, the model is strengthened and therefore becomes better. Following the training, our model was tested through the generation of accuracy and loss curves, as shown in Figs. 4 and 5. From these curves, we can see that the model has not finished learning; indeed, the curve for the validation dataset is stagnating. This led us to use data augmentation to avoid overfitting. To improve our model, we need large amounts of data. The quantity and especially the quality of our dataset play a major role in building a good model, and it is logical to have
Fig. 4 Evolution of accuracy during training
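The resizing and pixel-array conversion just described might look as follows (a sketch; the file path is hypothetical):

```python
import numpy as np
from tensorflow.keras.preprocessing.image import load_img, img_to_array

def load_seed_image(path):
    """Resize to VGG16's default 224x224 input and convert to a pixel array in [0, 1]."""
    img = load_img(path, target_size=(224, 224))
    return img_to_array(img) / 255.0

x = np.expand_dims(load_seed_image("dataset/train/kom_kom/img_001.jpg"), axis=0)  # hypothetical path
```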
Fig. 5 Evolution of accuracy during training
data that are comparable between them, that is to say, that they have the same format, the same size and length, etc. This is why we decided to use data augmentation. Data augmentation is based on the principle of artificially increasing our data by applying transformations to it, thereby increasing the diversity, and therefore the learning field, of our model, which will then adapt better when predicting new data. After a new training run, we obtain the following curves, with a denser accuracy and a loss that tends toward 0 as training proceeds, as shown in Figs. 6 and 7. Data augmentation is applied only to the training data.
Fig. 6 Evolution of loss during training
Fig. 7 Evolution of the validation loss during training
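One common way to set this up (a sketch assuming Keras's ImageDataGenerator, reusing the model from the earlier sketch; the transform parameters are illustrative):

```python
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Augmentation applies only to training data; validation images are just rescaled.
train_gen = ImageDataGenerator(rescale=1.0 / 255, rotation_range=30,
                               width_shift_range=0.1, height_shift_range=0.1,
                               zoom_range=0.2, horizontal_flip=True)
val_gen = ImageDataGenerator(rescale=1.0 / 255)

train_flow = train_gen.flow_from_directory("dataset/train", target_size=(224, 224),
                                           batch_size=32, class_mode="categorical")
val_flow = val_gen.flow_from_directory("dataset/validation", target_size=(224, 224),
                                       batch_size=32, class_mode="categorical")
model.fit(train_flow, validation_data=val_flow, epochs=100)  # 100 epochs, as in the text
```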
Fig. 8 Evaluation of model performance and accuracy
6 Model Evaluation and Testing

Figure 8 shows the loss rate results for the convolutional neural network on the training and test sets over 100 epochs. This indicates that the convolutional neural network learned the data efficiently and can serve as a good model for variety recognition and efficient variety identification. After the data augmentation, we evaluated our model and noted a loss of 0.54 and an accuracy of 0.72, which allows us to say that we succeeded in improving our model. Following this, we took an image from the test dataset, which constitutes 30% of the data, and randomly selected an image of the Kom Kom variety (any other image could be tested). We called our already trained and evaluated model to test this image. For a given image, the model draws on the distinctive characteristics of the seed or pod to give the likelihood of resemblance to the already registered varieties. In the test results, we tested a "kom kom" image, and Fig. 9 shows the representations obtained for each variety through categorical classification.
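Continuing the earlier sketches, inference on a single test image could look like this (illustrative; the class order is whatever the data generator produced):

```python
import numpy as np

probs = model.predict(x)[0]                       # model and x come from the sketches above
class_names = sorted(train_flow.class_indices, key=train_flow.class_indices.get)
best = int(np.argmax(probs))
print(f"{class_names[best]}: {probs[best]:.0%}")  # e.g. 'kom_kom: 60%'
```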
7 Conclusion

In this paper, we sought to provide an answer to the problem of identifying peanut varieties derived from Fleur 11, given that they are genetically very similar and therefore difficult to identify with the naked eye. The choice of the Fleur 11 lineage
Fig. 9 Identification results of seed image of the kom kom variety
is justified by several reasons relating to nutrition, yield, and adaptation to climate change conditions. To achieve this, we approached the agricultural experts at the CNRA in Bambey to obtain images of these approved varieties in order to build our dataset of 1102 images of peanut shells and seeds classified according to the six (06) selected varieties. We first trained our deep learning model on this dataset of 1102 images and found that the model was not optimal. We then applied data augmentation and re-trained the model, which gave satisfactory performance results with a loss of 0.54 and an accuracy of 0.72. After testing, we were able to classify a Kom Kom seed with a probability of 60%. In future work, we intend to increase the size of our dataset, improve the quality of the images taken, and increase the training time of our model for more precision. We also intend to go beyond the characteristics of the pod and the seed by adding to our model other characteristics related to the morphology of the plant, i.e., the color and the leaves of the plant, to obtain a model with better varietal purity and thus better output.
References

1. FAO (2021) Bases de données FAOSTAT: Données de l'alimentation et de l'agriculture
2. UPOV: TG/93/4(proj.2). https://www.upov.int/meetings/fr/doc_details.jsp?meeting_id=25646&doc_id=202761. Last accessed 23 June 2022
3. WAAPP. http://www.waapp-ppaao.org/senegal/index.php/actualit%C3%A9s/240-seuls30-des-besoins-en-semences-d%E2%80%99arachide-assur%C3%A9s-expert.html. Last accessed 10 Jan 2022
4. Kouadio AL (2007) Prévision de la production nationale d'arachide au Sénégal à partir du modèle agrométéorologique AMS et du NDVI. In: ULG-Gembloux, p 54
5. FAO (2003) L'évaluation de la dégradation des terres au Sénégal. Projet FAO land degradation assessment. Rapport préliminaire, Avril, p 59
6. Ntare BR, Diallo AT, Ndjeunga J, Waliyar F (2008) Groundnut seed production manual. International Crops Research Institute for the Semi-Arid Tropics
7. Dia D, Diop AM, Fall CS, Seck T (2015) Sur les sentiers de la collecte et de la commercialisation de l'arachide au Sénégal. Les notes politiques de l'ISRA-BAME n°1. ISRA Bureau d'Analyses Macro Économiques (BAME), Dakar
8. ANSD (2007) Note d'Analyse du Commerce Extérieur. Edition 2018, Gouvernement du Sénégal, Dakar
9. Baborska R (2021) Suivi des politiques agricoles et alimentaires au Sénégal 2021. Rapport d'analyse politique. Suivi et analyse des politiques agricoles et alimentaires (SAPAA). FAO, Rome
10. Datascientest. https://datascientest.com/computer-vision. Last accessed 25 Feb 2022
11. Aung H, Bobkov AV, Tun NL (2021) Face detection in real time live video using YOLO algorithm based on VGG16 convolutional neural network. In: 2021 International conference on industrial engineering, applications and manufacturing (ICIEAM)
12. Naveen P, Diwan B (2021) Pre-trained VGG-16 with CNN architecture to classify X-rays images into normal or pneumonia. In: International conference on emerging smart computing and informatics (ESCI)
13. Simonyan K, Zisserman A (2015) Very deep convolutional networks for large scale image recognition. In: International conference on learning representations
14. TALKAG. https://www.talkag.com/blogafrique/?p=3199. Last accessed 15 Jan 2022
15. ANSD: Situation Économique et Social du Sénégal 2017–2018. https://satisfaction.ansd.sn/. Last accessed 01 Mar 2022
Chapter 24
Efficient Color Image Segmentation of Low Light and Night Time Image Enhancement Using Novel 2DTU-Net and FM2CM Segmentation Algorithm
Chandana Kumari and Abhijit Mustafi
1 Introduction

The significance of semantic segmentation (SS) is highlighted by the drastically rising demand for autonomous driving with higher-level scene understanding abilities [1]. A system that categorizes every pixel of an input image into a predetermined class is termed SS [2]. Several SS models centered on Convolutional Neural Networks (CNN) have been widely researched alongside the advancement of deep learning technologies and computer hardware [3]. Many prevailing SS models show high performance in the daytime; however, at night time, the performance is low [4]. Owing to an inadequate quantity of external light, the brightness of images is extremely low at night, and the noise caused by the camera sensor rises [5]. Moreover, because of the camera's longer exposure time, motion and optical blur are engendered in the images [6]. SS is very complex in a low-light (LL) environment, and performance enhancement is a difficult issue because of these problems [7]. Numerous LL image segmentation methodologies have been researched to tackle this issue. The prevailing techniques have high accuracy; however, the neural network's training procedure is tedious and the training time is huge [8, 9]. Moreover, the prevailing systems' accuracy rates aren't adequate [10]. Thus, by employing a new 2DTU-Net and FM2CM that effectively segment the LL images with a better accuracy rate, an effectual color image segmentation of LL and night time image enhancement is proposed.

The balance of the paper is arranged as follows: the related works concerning the proposed system are elucidated in Sect. 2; the proposed technique is expounded in Sect. 3; the outcomes and discussions grounded on performance metrics are illustrated in Sect. 4; and the paper's conclusion with future work is depicted in Sect. 5.
2 Literature Survey

Cho et al. [11] introduced an altered Cycle Generative Adversarial Network (CycleGAN)-centric multi-class segmentation methodology, which enhanced the multi-class segmentation execution for LL images. LL databases engendered by two road scene open databases, which offer segmentation labels, were wielded. When weighed against state-of-the-art techniques, this system depicted higher performance, but it had the limitation of deprived reliability and high needs.

Cho et al. [12] presented a LL image segmentation technique grounded on an altered CycleGAN. LL databases engendered from two well-known road scene open databases, the Cambridge-driving Labeled Video Database (CamVid) and KITTI, were wielded. When analogized to the prevailing state-of-the-art methodology, this system depicted enhanced segmentation performance in hugely LL surroundings. Nevertheless, the disadvantage is that the cost constraint frequently limits larger-scale applications.

Bao et al. [13] determined an effectual segmentation technique, which merged LL enhancement of dark images, bias correction via level set segmentation, and local entropy segmentation. Initially, the dark images were improved through histogram equalization and the V-channel of the Hue, Saturation, and Value (HSV) color space. Next, the level set technique corrected the enhanced image's bias field. Afterward, the image local entropy was obtained. Ultimately, the segmentation outcomes were achieved. The system's efficacy and feasibility were exhibited, but it is tedious to perform segmentation for several objects at the same time.

Li et al. [14] conducted a network architecture termed Edge-Conditioned CNN (EC-CNN) for thermal image semantic segmentation. In EC-CNN, a gated feature-wise transform layer was implemented, which adaptively merged edge prior knowledge. With edge guidance, the entire EC-CNN was trained end-to-end and engendered high-quality segmentation outcomes. For comprehensive appraisals in thermal image semantic segmentation, the "Segmenting Objects in Day And Night" (SODA) dataset was wielded. The EC-CNN's effectiveness against state-of-the-art methodologies is expounded through extensive experiments on SODA. However, handling of noisy data was not effective.

Kim et al. [15] developed a multi-spectral unsupervised domain adaptation meant for thermal image semantic segmentation. With pixel-level domain adaptation bridging the day and night thermal image domains, this scheme developed the thermal segmentation network's generalization ability. Hence, devoid of any ground-truth labels, the thermal image segmentation network acquired superior performance. The system's efficacy and robustness are exhibited quantitatively and qualitatively. Nevertheless, the accuracy rate was extremely low.
3 Proposed Methodology

Images attained in real-world LL circumstances are not just low in brightness, but also undergo color bias, unknown noise, detail loss, and halo artifacts. Thus, LL image segmentation is tedious. By utilizing a new 2DTU-Net and the FM2CM segmentation algorithm, an effective color image segmentation of LL and night time image enhancement is proposed. The technique involves several steps to enhance the efficiency of the segmentation procedure. First, the input LL or night images are resized into a standard format. Next, by deploying the 2DTU-Net, the contrast enhancement function is applied to the resized images. After that, the contrast-enhanced images are segmented by FM2CM. The architecture of the proposed technique is elucidated in Fig. 1.
3.1 Input Source

The LL or night time images are taken as the input source from the KITTI dataset. The dataset includes 7481 training images annotated with 3D bounding boxes and is openly accessible on the Internet.

3.2 Image Resizing

Here, the input LL or night time images are resized. The procedure that modifies the image size by increasing or decreasing the total number of pixels is termed image resizing. The image is resized to a length × width of 320 × 240; thus, the zoomed image quality can be improved. This is equated as,
3.2 Image Resizing Here, the input LL or night time images are resized. The procedure that modifies the image size by maximizing and minimizing the entire pixels is termed image resizing. The image is resized into length*width of 320*240; thus, the zoom image quality can be improved. This is equated as, Input LDR Image
Image Enhancement Color Conversion
Image Threshold
RGB-HSV
AMOT
Enhanced HDR Image
RCAB-RDMCNN
Image Segmentation RBSHM
Fig. 1 Architecture of the proposed framework
288
C. Kumari and A. Mustafi
Rz_img = λ_R(In_img)   (1)

where Rz_img signifies the resized images and λ_R(In_img) is the resizing function applied to the input images.
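As a sketch of this step (the chapter reports a MATLAB implementation; this is an illustrative Python/OpenCV equivalent):

```python
import cv2

def resize_input(img):
    """Resize a low-light input frame to the standard 320x240 (width x height) used in the text."""
    return cv2.resize(img, (320, 240), interpolation=cv2.INTER_CUBIC)
```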
3.3 Image Enhancement

The images' contrast is improved once the images are resized. The manipulation or redistribution of image pixels in a linear or nonlinear fashion to enhance the images' contrast is termed contrast enhancement. In this way, image details in the low dynamic range can be evaluated effectively and the quality of the images improved, so contrast enhancement is a significant process in image processing. The images' contrast is improved by utilizing the 2DTU-Net.
3.3.1 Image Enhancement Using 2DTU-Net
U-Net is grounded on CNN. The U-Net architecture follows a particular encoder-decoder plan. In each layer, the encoder decreases the spatial dimensions and boosts the channels. Alternatively, while decreasing the channels, the decoder maximizes the spatial dimensions. In the end, the spatial dimensions are restored to make a prediction for every pixel. In Fig. 2, the U-Net architecture's general structure is depicted.

Encoder (contraction path): This includes the repeated application of two 3×3 convolutions, every convolution followed by ReLU and batch normalization. Next, a 2×2 max-pooling operation is implemented to decrease the spatial dimensions. The feature channels get doubled at every down-sampling step.

Decoder (expansion path): Every step in the decoder encompasses up-sampling of the feature map followed by a 2×2 transpose convolution that halves the feature channels, concatenation with the corresponding feature map from the contracting path, and 3×3 convolutions, each followed by a ReLU. To engender the contrast-enhanced images, a 1×1 convolution is wielded in the last layer.

The U-Net usually employs the Rectified Linear Unit (ReLU) activation function; however, when a huge gradient flows through a ReLU in the CNN layers, it can make the neuron useless and unable to respond to other data points again for the remaining process. Hence, the ReLU activation is replaced with the Tanish function, which constantly updates the weight and bias values even if the gradient is large. Tanish represents the combined Tanh and swish activation functions and is depicted by,

Tanish = T(z) ∗ Sw   (2)

where T(z) indicates the Tanh activation function and Sw the swish activation function.
Fig. 2 General architecture of the U-Net
T(z) = \frac{e^z − e^{−z}}{e^z + e^{−z}}   (3)

Sw = X ∗ Sigmoid(X)   (4)

Hence, the images' contrast is considerably raised by the 2DTU-Net. x signifies the contrast-enhanced images.
3.4 Segmentation

Next, image segmentation is conducted. The process of dividing an image into several regions, which helps decrease the image complexity and make the image simpler to process, is termed image segmentation. The images are segmented by utilizing FM2CM.
3.4.1 Segmentation Using FM2CM
A soft clustering methodology in which every pixel might belong to two or more clusters with differing degrees of membership is termed Fuzzy C-Means (FCM). The objective function is signified by the sum of distances between cluster centers and patterns. Usually, for estimating the distance between the cluster centers and the initial pixels, FCM wields the Euclidean Distance (ED). However, the ED estimation is intricate and degrades the clustering accuracy, and for a huge amount of data the ED isn't apt. Thus, the Chebyshev distance, also called the Maximum Metric (M2), which effectively surpasses the above-mentioned issue, is employed rather than the ED. The usual FCM is termed FM2CM due to that alteration. The FM2CM steps are given below.

Step 1: Input the pixels of the contrast-enhanced image x, which are equated as:

x_j = {x_1, x_2, x_3, …, x_N}   (5)
where N is the number of pixels in the image x.

Step 2: Input the cluster centroids Cn_k arbitrarily, given by,

Cn_k = {Cn_1, Cn_2, Cn_3, …, Cn_M}   (6)
Step 3: Next, the M2 distance D(x_j, Cn_k) betwixt the initial pixels x_j and the cluster centroids Cn_k is estimated by,

D(x_j, Cn_k) = Max(|x_2 − x_1|, |Cn_2 − Cn_1|)   (7)
Step 4: The FM2CM objective function Obj_FN is calculated by,

Obj_FN^a = \sum_{j=1}^{N} \sum_{k=1}^{M} (Z_{jk})^a \cdot D^2(x_j, Cn_k)   (8)
where a is a real number greater than 1 that maintains the degree of fuzziness, and Z_jk depicts the membership function, with Z_jk ∈ [0, 1].

Step 5: The fuzzy membership function Z_jk is equated as,

Z_{jk} = \sum_{k=1}^{M} \left( \frac{\lVert x_j − Cn_k \rVert}{\lVert x_j − Cn_i \rVert} \right)^{−2/(a−1)}   (9)
where Cn_i is the cluster centroid selected from Cn_k.

Step 6: By reassigning the cluster centroids Cn_k and Cn_i, the process continues to optimize the objective function Obj_FN until the following condition is met:

|Z_{jk}(N) − Z_{jk}(N + 1)| ≤ Q   (10)
where Q is a constant that ranges from 0 to 1.

Step 7: Hence, the images are effectively segmented by the proposed FM2CM (see the pseudo-code in Fig. 3). The segmented images are equated as,
Fig. 3 Pseudo-code for the FM2CM
S_img = {S_1, S_2, S_3, …, S_n}   (11)
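A compact NumPy sketch of the FM2CM loop of Eqs. (5)–(11) (illustrative, using the standard normalized form of the membership update; the fuzziness a, cluster count M, and tolerance Q are assumed values):

```python
import numpy as np

def fm2cm(pixels, M=3, a=2.0, Q=1e-4, iters=100):
    """Fuzzy C-Means with Chebyshev (maximum-metric) distance on an (N, d) pixel array."""
    N = len(pixels)
    Z = np.random.dirichlet(np.ones(M), size=N)          # memberships; each row sums to 1
    for _ in range(iters):
        w = Z ** a
        centroids = (w.T @ pixels) / w.sum(axis=0)[:, None]
        # Chebyshev distance replaces the Euclidean distance of plain FCM
        D = np.abs(pixels[:, None, :] - centroids[None, :, :]).max(axis=2) + 1e-12
        Z_new = D ** (-2.0 / (a - 1.0))
        Z_new /= Z_new.sum(axis=1, keepdims=True)        # normalize memberships
        if np.abs(Z_new - Z).max() <= Q:                 # stopping rule of Eq. (10)
            Z = Z_new
            break
        Z = Z_new
    return Z.argmax(axis=1), centroids                   # hard labels plus centroids
```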
4 Results and Discussion

Here, the evaluation of the proposed methodologies' final results is elucidated. The performance and comparative evaluations are done to establish the efficacy. The proposed method is deployed in the MATLAB working platform. The input data are gathered from the KITTI dataset, which is openly accessible on the Internet.
4.1 Performance Analysis of Proposed 2DTU-Net

Regarding Peak Signal-to-Noise Ratio (PSNR), Mean Square Error (MSE), together with Structural Similarity Index (SSIM), the proposed 2DTU-Net's performance appraisal is verified against several prevailing U-Net, CNN, and Deep Neural Network (DNN) methodologies.
The proposed 2DTU-Net is compared with several prevailing U-Net, CNN, and DNN methodologies regarding PSNR, MSE, and SSIM in Fig. 4. A high PSNR value indicates enhanced image quality (Fig. 4a): for PSNR, the 2DTU-Net attains 20.14345, while the prevailing methodologies achieve a value of 11.54058. The MSE metric quantifies the image degradation caused by image compression and other processing techniques, and a low MSE value indicates the model's efficacy: the 2DTU-Net obtains an MSE of 0.01142, while the current methodologies reach 0.11071. Moreover, the 2DTU-Net was also evaluated on the SSIM metric, obtaining 0.63568, while the prevailing methodologies attain the lower value of 0.33896. The 2DTU-Net is thus a low-error scheme and conveys a quality image without any distortion.
4.2 Performance Analysis of Proposed FM2CM

Regarding clustering time and accuracy, the proposed FM2CM's performance evaluation is examined, and the results are compared with the prevailing FCM, K-means, K-medoid, and Mean Shift (MS) techniques.

The clustering time analysis for the proposed FM2CM is depicted in Table 1. The time consumed by the whole system to create an effectual cluster is termed the clustering time; the outcomes become complicated if more time is consumed. The FM2CM takes less time, 16,352 ms, while the current methodologies consume considerably more: 23,421 ms for FCM, 26,617 ms for K-means, 30,337 ms for K-medoid, and 34,294 ms for MS. Hence, with limited time and cost, the FM2CM achieves effectual clusters and decreases complications; thus, the time complexity can be lightened.

The clustering accuracies of the proposed FM2CM and the prevailing FCM, K-means, K-medoid, and MS methodologies are compared in Fig. 5. For forming clusters, the FM2CM has an accuracy of 97%, while 89, 82, 73, and 70% are the clustering accuracies of the current FCM, K-means, K-medoid, and MS methodologies, respectively, which are comparatively low. Thus, in cluster formation, the FM2CM exhibits superior performance, which provides an enhanced effect on the segmented images.
5 Conclusion

By deploying a fresh 2DTU-Net and FM2CM segmentation algorithm, an effective color image segmentation of LL and night time image enhancement is proposed. Resizing, image enhancement, and image segmentation were the three key steps on which the proposed system focused. Next, the experimental evaluation is done to examine the proposed methodologies' efficacy, where the performance and comparative evaluations are conducted with regard to a few performance metrics.
Fig. 4 Comparative analysis of proposed 2DTU-Net based on (a) PSNR, (b) MSE, and (c) SSIM

Table 1 Performance analysis of proposed FM2CM algorithm in terms of clustering time

Techniques        Clustering time (ms)
Proposed FM2CM    16,352
FCM               23,421
K-means           26,617
K-medoid          30,337
Mean shift        34,294

Fig. 5 Comparative analysis of the proposed FM2CM in terms of clustering accuracy
Several uncertainties can be tackled by the presented technique, and it can give propitious outcomes. With a limited time of 16,352 ms, the clustering algorithm forms effective clusters and segments the images with an accuracy rate of 97%. The proposed framework surpasses the prevailing methodologies and remains reliable and robust. In the future, the system will be elaborated with a few enhanced neural networks, and the image segmentation procedure will be performed on complicated datasets.
References

1. Valada A, Vertens J, Dhall A, Burgard W (2017) AdapNet: adaptive semantic segmentation in adverse environmental conditions. In: IEEE international conference on robotics and automation (ICRA), May 29–June 3, 2017, Singapore
2. Dai D, Van Gool L (2018) Dark model adaptation: semantic image segmentation from daytime to nighttime. In: 21st International conference on intelligent transportation systems (ITSC), November 4–7, 2018, Maui, Hawaii, USA
3. Long J, Shelhamer E, Darrell T (2015) Fully convolutional networks for semantic segmentation. arXiv:1411.4038v1
4. Lore KG, Akintayo A, Sarkar S (2016) LLNet: a deep autoencoder approach to natural low-light image enhancement. Pattern Recogn 61:650–662
5. Wang Y, Ren J (2018) Low-light forest flame image segmentation based on color features. J Phys Conf Ser 1069(1):1–9
6. Shen L, Yue Z, Feng F, Chen Q, Liu S, Ma J (2017) MSR-net: low-light image enhancement using deep convolutional network. arXiv:1711.02488v1
7. Dev S, Savoy FM, Lee YH, Winkler S (2017) Nighttime sky/cloud image segmentation. In: IEEE international conference on image processing (ICIP), 17–20 Sept 2017, Beijing, China
8. Badrinarayanan V, Kendall A, Cipolla R (2016) SegNet: a deep convolutional encoder-decoder architecture for scene segmentation. IEEE Trans Pattern Anal Mach Intell 39(12):2481–2495
9. Haltakov V, Mayr J, Unger C, Ilic S (2015) Semantic segmentation based traffic light detection at day and at night. Springer, Cham. ISBN: 978-3-319-24946-9
10. Sun L, Wang K, Yang K, Xiang K (2019) See clearer at night: towards robust nighttime semantic segmentation through day-night image conversion. In: Proceedings artificial intelligence and machine learning in defense applications, 19 Sept 2019, Strasbourg, France
11. Cho SW, Baek NR, Koo JH, Park KR (2020a) Modified perceptual cycle generative adversarial network-based image enhancement for improving accuracy of low light image segmentation. IEEE Access 9:6296–6324
12. Cho SW, Baek NR, Koo JH, Arsalan M, Park KR (2020b) Semantic segmentation with low light images by modified cycle GAN-based image enhancement. IEEE Access 8:93561–93585
13. Bao XY, Sun ZL, Wang N, Chen YQ (2019) Solar panel segmentation under low contrast condition. In: Chinese control and decision conference (CCDC), 3–5 June 2019, Nanchang, China
14. Li C, Xia W, Yan Y, Luo B, Tang J (2020) Segmenting objects in day and night: edge-conditioned CNN for thermal image semantic segmentation. IEEE Trans Neural Netw Learn Syst 32(7):3069–3082
15. Kim YH, Shin U, Park J, Kweon IS (2021) MS-UDA: multi-spectral unsupervised domain adaptation for thermal image semantic segmentation. IEEE Robot Autom Lett 6(4):6497–6504
Chapter 25
An Architecture to Develop an Automated Expert Finding System for Academic Events
Harshada V. Talnikar and Snehalata B. Shirude
1 Introduction

An expert in a specific area is a person who is knowledgeable and has in-depth skill in that area. Expert finding problems have various applications in the business operations field and in everyone's everyday life. For instance, people may seek experts' advice related to domains like academics, medical problems, law, and finance [1]. As a result of the emergence of innovative technologies and recent swift advancements, there is a huge data flow all over the world. Most search engines concentrate on words rather than concepts and allow only a certain number of keywords to narrow the search. While using such search engines, search outcomes may be either relevant or irrelevant; even when relevant, their number sometimes varies from tens to hundreds. To meet this problem, the proposed work presents the use of natural language model-based information retrieval, which recovers meaningful insights from the enormous amount of data available on the Internet [2].
2 Related Work

• Wu et al. [3] used ResearchGate and explored its features using questionnaires and interviews. The proposed approach considered pages and navigations. It showed the complete process of finding an expert on different academic social
networking sites. A pathway defined by the authors is a series of navigations between pages, linked in chronological order. The study summarized specific indications for academic social networking sites to (a) improve search pages, (b) focus on individual users' needs, and (c) use relational networks.

• Fu and Li [4] proposed a novel recurrent memory reasoning network (RMRN), exploring the implicit relevance between a requester's question and a candidate expert's historical records, with both perception and reasoning taken into consideration. The approach introduced a Gumbel-Softmax-based mechanism to select relevant historical records from candidate experts' answering histories. The judgment was made that the proposed method could achieve better performance than the existing state-of-the-art methods.

• Javadi et al. [5] suggested a recommendation system for finding experts in online scientific communities. A dataset including bibliographic information, venue, and various related published papers was used. In an evaluation using the IEEE database, the proposed method reached an accuracy of 71.50%, which seems to be an acceptable result.

• Joby [2] emphasized using natural language model-based information retrieval to recover meaningful insights from an enormous amount of data. The method used latent semantic analysis to retrieve significant information from the questions raised by the user or from bulk documents. It utilized the fundamentals of semantic factors occurring in the dataset to identify useful insights. The experimental analysis of the proposed method was carried out with a few state-of-the-art datasets such as TIME, LISA, CACM, and NPL, and the results obtained demonstrated the superiority of the method in terms of precision, recall, and F-score.

• Hussain et al. [6] reported on expert finding systems for the span 2010–2019. The authors indicated a specific scope by formulating five different research questions. This study focused on useful sources, which fall into three different categories, viz., text, social networks, and hybrid. The literature depicts the models for building expert finding systems, like generative and discriminative probabilistic, network-based, voting, and some hybrid models. Datasets were used to evaluate the expert finding systems; different datasets were broadly used in environments such as enterprises, academics, and social networks. The differences between expert retrieval and expert seeking were discussed. Finally, the review concluded that nearly 65% of expert finding systems are designed for the academic purpose and domain.

• de Campos et al. [7] proposed a machine learning perspective, clustering expert textual sources to build profiles and capture the different hidden topics in which the experts are interested. The approach represented experts by means of multifaceted problems. A judgment was made that it is a valid technique to improve the performance of expert finding and document filtering.
• Shirude and Kolhe [8] investigated a framework for finding experts required for academic programs and committees. The framework used online research groups such as ResearchGate and Google Scholar as the resources. The authors assumed that performance depends on the retrieval of keywords from many online research groups and suggested improvement by weighting the keywords in the vectors.

• Rostami and Neshati [9] used the idea of an agile team, which software companies require. The idea suggested a T-shaped model for expert finding and used the XEBM and RDM models.

• Yuan et al. [10] reviewed and categorized the current progress of expert finding in community question answering (CQA). The existing solutions were categorized into matrix factorization-based models, gradient boosting tree-based models, deep learning-based models, and ranking models. According to the authors, matrix factorization-based models outperform the others.

• Lin et al. [1] reviewed and summarized expert finding methods, categorized according to their underlying algorithms and models. The review concluded with the categorization of models as (a) generative probabilistic models (candidate generation models, topic generation models, document models), (b) voting models, and (c) network-based models (HITS and PageRank algorithms, propagation models). The authors pointed out many unsolved challenges, such as the notion of an expert, finding relevant people, model evaluation, and knowledge area classification.

Automated expert finding tasks are challenging, as large-scale expertise-related information is available across various data sources. For the expert finding purpose, the information retrieval process is popularly used to examine information from a big amount of data, and there are multitudes of possibilities available in information retrieval. The enormous quantity of information flowing through Web pages heightens the difficulty of useful as well as reliable information retrieval. Firstly, to decide who is an expert in a specific domain, it is necessary to acquire all relevant data about the person [11]. The following are the three channels used as data resources:

(a) Meta databases
(b) Document collections
(c) Referral networks.
3 Findings About Expert Finding Systems

The literature survey resulted in the classification of various studied expert finding systems. It made it possible to classify the systems according to domains used and applied techniques.
Fig. 1 Classification based on domain
3.1 Classification Based on Used Domains

The task of searching for experts falls into two major categories according to the domains used, as shown in Fig. 1:

i. Enterprise: This first category uses three sources to find an expert's area and level: (a) self-disclosed documents, (b) documents, and (c) social networks. The study summarized that self-disclosed information is difficult to update in a timely manner, while the two other sources, documents and social networks, are important.

ii. Community question answering (CQA) platforms such as Quora, StackOverflow, and Yahoo Answers: This second category, i.e., online communities, uses two sources: social networks and documents. According to Wang et al. [12], knowledge in online communities has a heterogeneous structure and may be of low quality, which highly affects the performance of expert finding systems.
3.2 Classification Based on Used Techniques

Regardless of whether the domain is an enterprise or an online community, expert finding techniques can be classified by the techniques used. Such a classification divides the used techniques into two groups, graph-based techniques and machine learning-based techniques, as demonstrated in Fig. 2 [12].
3.2.1 Graph-Based Techniques
These techniques make use of a graphical representation of the retrieved data. A graph G = (V, E) is prepared, where V is a set of experts and E is the set of links connecting them by means of question asker–answerer relations, co-authorship, email
Fig. 2 Classification based on techniques
communication, etc. A number of methods are applied on these graphs to rank the experts, either by computing measures (e.g., PageRank, HITS) or by graph properties (e.g., centrality, connections) [13].
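As an illustrative sketch (assuming the networkx library; the expert graph is toy data):

```python
import networkx as nx

# Co-authorship / asker-answerer graph over candidate experts (toy data)
G = nx.Graph([("alice", "bob"), ("alice", "carol"), ("bob", "dave")])

scores = nx.pagerank(G)                         # PageRank over the expert graph
centrality = nx.degree_centrality(G)            # a simple graph-property measure
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking[:3])                              # top-ranked candidate experts
```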
3.2.2 Machine Learning-Based Techniques
Machine learning (ML) is useful to identify patterns in a given training dataset. ML methods make use of feature extraction from various sources, whether enterprises or online communities. Features are considered either content-based or non-content-based. Some examples of ML-based techniques are logistic regression, support vector machines, reinforcement learning, clustering, group recommendation, etc.
4 Proposed Architecture

The proposed methodology includes the following important tasks:

(a) Identify the need for experts
(b) Ascertain various online research groups
(c) Remove noise from the collected data
(d) Apply artificial intelligence-based machine learning techniques
(e) Use natural language processing (NLP) tools to retrieve relevant information
(f) Extract names, addresses, expertise areas, etc., of experts from the relevant information
(g) Compare the result with existing tools.

Let us summarize the process (Fig. 3). The proposed architecture consists of the following steps:
I. Resource selection
Fig. 3 Conceptual expert finding procedure
It is a pinpoint task to correctly identify an expertise area. To initiate the process of an expert finding system, the first step is to identify the exact purpose for finding the expert [14]. To meet the requirement, it is necessary to select the appropriate resources and to collect the relevant data. The useful data resources may be any of the following:

• Meta databases: Some organizations use databases to store the expertise of their employees.
• Document collections: One approach is to construct a database manually, but it is better to extract it automatically from documents like reports, publications, emails, question-and-answer forums, comments in such forums, Web pages, etc.
• Referral networks: There are some Web groups of people who share the same interests and are in the same profession. Such groups may create referral networks, which may consist of colleagues, students and teachers, authors, etc. In these networks, an expert is recommended by another person who knows about the expert's knowledge and specific skills.

II. Data cleaning

These selected resultant documents containing Web pages can have noise contents such as extra spaces, tabs, delimiters, advertisements, unwanted images, etc. Once the noise is removed, the produced clean data make it possible to extract the required contents easily and in less time. This improves the accuracy of the system. If one has wrong or bad-quality data, then it can be detrimental to processing and analysis. On the other hand,
good-quality data can produce outstanding results using a simple algorithm. There are many kinds of constraints the data must conform to in order to be valid, such as range, data type, cross-field examination, uniqueness requirements, set membership restrictions, regular patterns, accuracy, completeness, consistency, uniformity, and many more. Considering the importance of data cleaning, the following data cleaning techniques may be implemented (a minimal sketch follows the list):
• Remove duplicate or irrelevant observations
• Filter outliers
• Avoid typo errors
• Perform data type conversions
• Deal with missing values
• Get rid of extra spaces
• Delete auto-formatting.
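The following is a minimal, hypothetical sketch of several of these steps using pandas; the column names ("name", "citations") and the 3-sigma outlier rule are illustrative assumptions, not part of the proposed architecture.

```python
import pandas as pd

def clean_expert_data(df: pd.DataFrame) -> pd.DataFrame:
    # Remove duplicate or irrelevant observations
    df = df.drop_duplicates()
    # Get rid of extra spaces in text fields
    for col in df.select_dtypes(include="object").columns:
        df[col] = df[col].str.strip()
    # Data type conversion: coerce citation counts to numeric
    df["citations"] = pd.to_numeric(df["citations"], errors="coerce")
    # Deal with missing values in a mandatory field
    df = df.dropna(subset=["name"])
    # Filter outliers with a simple 3-sigma rule on the numeric column
    z = (df["citations"] - df["citations"].mean()) / df["citations"].std()
    return df[z.abs() < 3]
```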
III. Build strategy to retrieve experts from cleaned data
In the strategy building step, innovative and modified strategies may be planned to improve outcomes. The earlier studied techniques helped to design new combinatorial strategies, which enabled us to propose well-defined concrete structures. An expert retrieval problem may consider the following two search criteria:
(1) "Who is the expert person on topic Y?": this query helps to find an expert in a particular knowledge domain. It is termed expertise identification [18].
(2) "What does expert X know?": this query helps to find an expert's information and knowledge. It is termed expert identification.
Most of the currently used algorithms focus on the first search criterion stated above. An expert finding strategy concerns:
• Representation of experts,
• Finding expertise evidence, and
• Association of query topics to candidates [13].
Any expert retrieval model has three components: candidate (a person to consider as an expert), document (the data resources), and topic (the specific domain). The following approaches are used in the expert finding task:
(a) Generative probabilistic models
These models are used in many expertise retrieval methods. The idea is to rank a candidate Ca with the probability p(Ca|q), i.e., the probability of candidate Ca being an expert on topic q. Balog et al. [13] stated two different generative probabilistic models: (i) candidate generation and (ii) topic generation models. The candidate generation model computes the probability of a candidate Ca being an expert on topic q directly as p(Ca|q). The topic generation model finds the probability using Bayes' theorem as
$$p(Ca \mid q) = \frac{p(q \mid Ca)\, p(Ca)}{p(q)} \tag{1}$$

where p(Ca) is the candidate's prior probability and p(q) is the query probability. Generative probabilistic models are built on the foundation of language modeling; a small numerical sketch of this ranking rule is given below.
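As a numerical illustration of Eq. (1), the sketch below ranks candidates by the unnormalized posterior p(q|Ca)p(Ca); since p(q) is constant for a fixed query, it can be ignored for ranking. The probability values are invented toy numbers, not data from the study.

```python
# Toy topic generation model: rank candidates by p(q|Ca) * p(Ca) (Eq. 1)
likelihood = {"alice": 0.30, "bob": 0.10}   # p(q|Ca), assumed values
prior = {"alice": 0.40, "bob": 0.60}        # p(Ca), assumed values

posterior = {ca: likelihood[ca] * prior[ca] for ca in likelihood}
ranking = sorted(posterior, key=posterior.get, reverse=True)
print(ranking)  # ['alice', 'bob'] since 0.12 > 0.06
```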
(b) Voting models
Expert finding can be carried out with a voting process. Macdonald and Ounis (2006) used rankings of retrieved documents [15, 16]. Data fusion techniques were used for the ranking: the votes for every candidate were aggregated to determine a final candidate ranking. Zhang et al. stated a reciprocal rank (RR) fusion technique, in which a candidate's expertise is calculated as

$$\mathrm{Score}_{RR}(Ca, q) = \sum_{d:\, Ca \in d,\ d \in R(q)} \frac{1}{\mathrm{rank}(d, q)} \tag{2}$$

where R(q) is the set of documents retrieved as a result of query q and rank(d, q) is the rank of document d. This reciprocal rank (RR) fusion technique is the simplest method to determine ranking in the voting model; a minimal sketch is given below.
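The following is a minimal Python sketch of Eq. (2); the ranked document list and the document-to-candidate mapping are hypothetical inputs, not part of the original architecture.

```python
from collections import defaultdict

def reciprocal_rank_fusion(ranked_docs, doc_candidates):
    """Score each candidate by summing 1/rank(d, q) over the
    retrieved documents d that mention the candidate (Eq. 2)."""
    scores = defaultdict(float)
    # ranked_docs: documents in retrieval order for query q (rank 1 first)
    for rank, doc in enumerate(ranked_docs, start=1):
        for candidate in doc_candidates.get(doc, []):
            scores[candidate] += 1.0 / rank
    # Higher score = stronger expertise evidence for the query
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Example usage with toy data
ranking = reciprocal_rank_fusion(
    ranked_docs=["d1", "d2", "d3"],
    doc_candidates={"d1": ["alice"], "d2": ["alice", "bob"], "d3": ["bob"]},
)
print(ranking)  # [('alice', 1.5), ('bob', 0.8333...)]
```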
(c) Network-based models
For finding experts' information, referral Webs and social networks are common channels for data retrieval. In such network-based models, expert retrieval graphs can be constructed in either of the following two ways:
i. A graph in which nodes represent documents and candidates' expertise, and edges represent their associations.
ii. A graph in which nodes represent only candidates and edges represent their relationships.
In the various network-based models, HITS and PageRank algorithms are used, and random walk propagation is often employed as well.
IV. Forming expert groups
The identified experts can be put into one group according to their domain of expertise. Such collaborative groups can provide a good platform for communication as well as knowledge exchange [8], and may prove advantageous for knowledge upgradation.
5 Result and Discussion

The discussed framework aims to develop an artificial intelligence-based system with the highest possible accuracy, considering a broad domain for expert finding tasks. An effective implementation can produce systems that are improved and more efficient than the existing ways to find domain-specific experts, resulting in an intelligent application with enriched domain-specific expert suggestions. Collaborative groups of experts from similar domains can be found as an additional outcome of the implemented process. The field where research needs to focus is the design of an efficient expert retrieval algorithm [17]. The discussion of building a strategy for retrieving experts from cleaned data points toward appropriate model selection. Identifying an expert by analyzing his related documents and collecting expertise evidence is an effective way, but there is a possibility that such evidence lies outside the scope of the collected documents. This gives rise to a need for implicit or explicit relation mapping among people and documents [18]. Network-based models are found to be well suited for such mapping. Generative probabilistic models have good empirical performance and the potential to incorporate extensions in a transparent manner. The voting models for domain-specific expert search aggregate scores from a single strategy across members of a document aggregate, rather than aggregating multiple systems' scores on only one document. The study has observed that identifying the domain-relevant experts and ranking them above the non-relevant experts is a challenging task [19, 20].
6 Conclusion

This article investigates the primary key issues in the field of expert finding tasks, such as resource selection, expertise data retrieval, and retrieval model extension. For each issue, specific tasks and algorithms need to be implemented. In future, the proposed architecture may be implemented in several applicable domains. Further, the use of natural language model-based information retrieval may play an important role in the development of expert finding systems.
References
1. Lin S, Hong W, Wang D, Li T (2017) A survey on expert finding techniques. J Intell Inf Syst. Springer Science+Business Media New York
2. Joby P (2020) Expedient information retrieval system for web pages using the natural language modeling. J Artifi Intell Caps Netw 02(02):100–110
3. Wu D, Fan S, Yuan F (2021) Research on pathways of expert finding on academic social networking sites. Inform Process Manage 58(2)
4. Fu J, Li Y, Zhang Q, Wu Q, Ma R, Huang X, Jiang YG (2020) Recurrent memory reasoning network for expert finding in community question answering. Assoc Comput Mach 20
5. Javadi S, Safa R, Azizi M, Mirroshandel SA (2020) A recommendation system for finding experts in online scientific communities. J AI Data Mining 8(4):573–584
6. Husain O, Salim N, Alinda RA, Abdelsalam S, Hassan A (2019) Expert finding systems: a systematic review. Appl Sci
7. de Campos Luis M, Fernandez-Luna JM, Huete JF, Luis RE (2019) Automatic construction of multi-faceted user profiles using text clustering and its application to expert recommendation and filtering problems, vol 190. Elsevier, Knowledge-Based Systems
8. Shirude S, Kolhe S (2019) A conceptual framework of expert finding system for academic events and committees. Int J Comp Sci Eng 7(2)
9. Rostami P, Neshati M (2019) T-shaped grouping: expert finding models to agile software teams retrieval. Expert Syst Appl 118:231–245
10. Yuan S, Zhang Y, Tang J, Hall W, Cabotà JB (2018) Expert finding in community question answering: a review. https://biendata.com/competition/bytecup2016/, Accessed on 10 Apr 2021
11. Huna A, Srba I, Bielikova M (2016) Exploiting content quality and question difficulty in CQA reputation systems
12. Wang GA, Jiao J, Abrahams Alan S, Fan W, Zhang Z (2013) ExpertRank: a topic-aware expert finding algorithm for online knowledge communities. Decision Support Systems 54:1442–1451. Elsevier B.V.
13. Balog K, Yi F, De Rijke M, Serdyukov P, Si L (2012) Expertise retrieval. Found Trends Inf Retr 6(2–3):127–256
14. Zhang J, Tang J, Liu L, Li J (2008) A mixture model for expert finding. PAKDD 2008, LNAI 5012, Springer-Verlag Berlin Heidelberg, pp 466–478
15. Macdonald C, Ounis I (2007) Using relevance feedback in expert search. In: European conference on information retrieval. Springer, Heidelberg, pp 431–443
16. Macdonald C, Ounis I (2006) Voting for candidates: adapting data fusion techniques for an expert search task. In: Proceedings of the 15th ACM international conference on information and knowledge management. ACM, pp 387–396
17. Dawit Y, Alfred K (2003) Expert finding systems for organizations: problem and domain analysis and the DEMOIR approach. J Org Comput Electron Commer. https://doi.org/10.1207/S15327744JOCE1301_1
18. Petkova D, Croft WB (2008) Hierarchical language models for expert finding in enterprise corpora. Int J Artif Intell Tools 17(01):5–18
19. Zhang M, Song R, Lin C, Ma S, Jiang Z, Jin Y, Liu Y, Zhao L, Ma S (2003) Expansion-based technologies in finding relevant and new information: THU TREC 2002: novelty track experiments. NIST Spec Publ SP 251:586–590
20. McDonald DW, Ackerman MS (2000) Expertise recommender: a flexible recommendation system and architecture. In: Proceedings of the 2000 ACM conference on computer supported cooperative work, pp 231–240
Chapter 26
A Seismicity Declustering Model Based on Weighted Kernel FCM Along with DPC Algorithm

Ashish Sharma and Satyasai Jagannath Nanda
1 Introduction

Earthquakes are linked with different types of clusters in the space-time domain that generate complex patterns. These seismic clusters are more predictable around major faults and tectonic boundary regions (spatial clustering) and are linked with aftershocks-foreshocks and earthquake swarms [1, 2]. They represent various triggering processes like static and dynamic stress transfer, fluid flow, and seismic mass flow along the faults [3, 4]. The process of categorizing the events into aftershocks-mainshocks-foreshocks (clusters) and regular events (backgrounds) is known as seismicity declustering. A declustered catalog is used in many applications like understanding the interaction between active fault line structures [5], time-dependent probability estimation [6], and in many robust estimations like climatic, tidal, and seasonal triggering of seismicity [7], development of seismic hazard maps [8], focal inversion for background stress fields [9], and localization of seismicity before mainshocks [10]. Segregation of seismic catalogs into clustered and regular events is a complex task due to the high correlation in the spatial-temporal domain, as there is no unique solution. Consequently, the final declustered catalogs deviate significantly according to the employed method. Seismic declustering is necessary to remove the temporal and spatial bias due to aftershocks that overestimate seismic rates in regions. Researchers have also shown that it is essential to correct seismic rates to compensate for the reduction in rates due to declustering [11]. Many researchers have studied various declustering algorithms to find the observed seismicity [12–15]. These approaches are based on
constraints derived from the characteristics of space-time patterns of seismicity [16]. K-means clustering is a well-known algorithm widely used in the seismicity analysis of earthquake regions. Rehman et al. analyzed the seismic events of the Pakistan region and categorized them using K-means clustering [17]. The problem with K-means algorithms is that they do not detect accurate cluster centroids. It is more challenging to find centroids due to heterogeneous features in the seismic catalog (latitude, longitude, time, and magnitude). Recently, Zhuang et al. developed an epidemic-type aftershock sequence (ETAS) model for seismicity analysis based on the event's intensity. Hainzl et al. analyzed the background (BG) events based on the inter-event time distribution [18]. Nanda et al. developed a tri-stage clustering model using an iterative K-means algorithm with a single distance formula. They used spatial-temporal clustering and magnitude thresholding to segregate the catalogs of earthquake-prone regions. Later, Vijay et al. proposed a tetra-stage model and included a depth parameter to analyze the seismicity [19]. Both tri- and tetra-stage models are based on the K-means algorithm and fail to provide good results in the case of non-spherical datasets. Recently, Vijay et al. proposed a shared nearest neighborhood-based model and categorized the events based on magnitude, event location, and occurrence time for the Iran and Philippines regions [20]. Florent et al. designed a model based on a random forest trained on data generated by an epidemic-type aftershock sequence model and compared it with classical machine learning models in terms of AF events [21]. Density-based clustering approaches (DBSCAN) are more effective and able to identify clusters of arbitrary shapes even in the presence of outliers [22]. The main advantage of DBSCAN is that it does not require any prior information about the number of clusters; it identifies clusters based on the density of each data point and its surroundings. It requires two parameters, the radius "Eps" and the minimum number of points in the neighborhood "MinPts." Recently, Vijay et al. proposed a variable epsilon-DBSCAN algorithm in which Eps is made dependent on the magnitude of the event [23]. Evolutionary algorithm-based models like the quantum gray wolf optimization model [24], the binary NSGA-II model [25], and the most recent multi-objective chimp optimization algorithm [26] have been developed to solve the seismicity declustering problem. These analyses and results motivate researchers to build a more efficient and robust model to reduce the complexity associated with seismicity declustering. Recently, the authors developed a model based on fuzzy C-means with density peak clustering for seismicity analysis of the Philippines and New Zealand [27]. The major drawback of the fuzzy C-means algorithm is that it is susceptible to noise and outliers and is only suitable for spherical or ellipsoidal clustering. This manuscript proposes a multi-stage systematic approach using event coordinates, time, and magnitude information to accurately estimate aftershock and background events in the catalog. In the first phase, an improved fuzzy kernel clustering algorithm, known as the weighted kernel fuzzy C-means (WKFCM) algorithm, is introduced to find the potential seismic zones in the spatial domain with sufficient events in each zone. The major earthquake mainshocks are cluster centroids, and the dataset is classified based on the number of mainshocks.
Later, each spatial zone is analyzed by weighted density peak temporal clustering, based on the well-known clustering by fast
search and find of density peaks algorithm [28]. A decision graph is plotted to detect the potential cluster centroids, and every event is allocated to a respective cluster to bifurcate the catalog into AF and BG events. The decision graph shows the distribution of the local density of events and the distances on the X and Y axes, respectively. The events having higher local density and distance are chosen as cluster centroids in the temporal domain. The decision graph ensures the correct assignment of each event to its corresponding cluster. The proposed model has less computational cost due to fewer mathematical calculations of Euclidean distances. The performance of the proposed model is tested on historical seismic catalogs of Japan and Indonesia with the help of the Epicenter Plot, Cumulative Plot, Lambda Plot, and Coefficient of Variance. The rest of the paper is organized as follows: Sect. 2 provides brief details about the earthquake catalogs used in the analysis. Section 3 gives the detailed step-wise procedure of the proposed spatio-temporal fuzzy clustering with density peaks to classify the earthquake catalogs. The obtained results are discussed in Sect. 4. The key points of the proposed declustering model are concluded in Sect. 5.
2 Seismic Catalog Used in the Analysis

In this paper, the historical earthquake catalogs of Japan and Indonesia are used in the analysis to measure the performance of the proposed model. The catalogs were downloaded from the official website of the United States Geological Survey (USGS) [29] by setting the input parameters mentioned in Table 1. Brief details of each catalog used in the analysis are as follows.
• Japan Catalog: Japan is one of the most seismically active regions globally due to its location on the "Pacific Ring of Fire," lying across three tectonic plates, including the Pacific plate under the Pacific Ocean and the Philippine Sea Plate. Here, a total of 19510 events from the year 1992 to the year 2022 are used in the analysis. Japan has seen various devastating earthquakes, like the Great Hanshin earthquake in 1995 and the Tohoku earthquake in March 2011. The Epicenter Plot of the catalog is shown in Fig. 1a.
• Indonesia Catalog: Indonesia is among the most seismically active areas on the planet, with a long history of powerful eruptions and earthquakes due to its position on the Indian plate and the Eurasian plate. A total of 18106 seismic events are used in the analysis, comprising significant earthquakes like the Sulawesi earthquake in the year 2018, the Sumatra earthquake in the year 2009, and the Java earthquake in the year 2006. Figure 1b represents the Epicenter Plot of the earthquake catalog.
Table 1 Input parameters to download the earthquake catalog of Japan and Indonesia from USGS [29]

Region      Start time             End time               Latitude (Min–Max)   Longitude (Min–Max)   Magnitude (Min–Max)   Depth (Min–Max, km)   Number of events
Japan       01/01/1990 00:00:00    01/01/2022 23:59:59    30.9°–41.4°          129.8°E–143.3°E       2.5–9.3               0–570                 19510
Indonesia   01/01/1990 00:00:00    01/01/2022 23:59:59    6°S–12°S             90°E–120°E            2.5–9.5               0–700                 18106
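For reproducibility, the following is a hedged sketch of how such a catalog might be fetched from the USGS FDSN event web service [29]; the endpoint and parameter names follow the public FDSN convention, and the bounds mirror the Japan row of Table 1 as reconstructed above, so treat both as assumptions rather than the authors' exact query.

```python
import pandas as pd

# Hypothetical sketch: query the USGS FDSN event service for the Japan
# window of Table 1 and load the result as a DataFrame (CSV format)
BASE = "https://earthquake.usgs.gov/fdsnws/event/1/query"
params = (
    "?format=csv&starttime=1990-01-01&endtime=2022-01-01"
    "&minlatitude=30.9&maxlatitude=41.4"
    "&minlongitude=129.8&maxlongitude=143.3"
    "&minmagnitude=2.5"
)
catalog = pd.read_csv(BASE + params)
print(len(catalog), "events downloaded")
```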
Fig. 1 Epicenter distribution plot of seismic events for a Japan and b Indonesia
3 Proposed Model

The main goal of seismicity declustering is to precisely estimate and discriminate between highly dense clusters (AFs) and uniformly distributed BGs. In this analysis, non-spatio-temporal parameters like magnitude and depth are also considered because seismic clusters strongly depend on them. This manuscript proposes a two-phase clustering model that detects effective seismic clusters in the space and time domains. Clustered AF events and uniformly distributed BGs are determined based on density. The complete flowchart of the
proposed model is given in Fig. 2. The step-wise procedure of the proposed model is given as follows.

Step 1: Seismic Catalog
The input dataset to the proposed model is the earthquake catalog given as

$$E_{N \times D} = \begin{bmatrix} e_1 \\ e_2 \\ \vdots \\ e_N \end{bmatrix} = \begin{bmatrix} t_1 & \theta_1 & \phi_1 & m_1 & d_1 \\ t_2 & \theta_2 & \phi_2 & m_2 & d_2 \\ \vdots & \vdots & \vdots & \vdots & \vdots \\ t_N & \theta_N & \phi_N & m_N & d_N \end{bmatrix} \tag{1}$$
Any ith event, where i = 1, 2, 3, …, N, in the earthquake catalog has information about the origin time (t), coordinate location in terms of latitude (θ) and longitude (φ), earthquake magnitude (m), and depth (d). N represents the total number of events in the catalog.

Step 2: Identification of Shallow and Deep Focus Events
The seismic events with epicenters near the earth's surface are more hazardous and generate more AF events than deep focus earthquakes. Here, a depth threshold (dth) of 70 km is applied to segregate the shallow (Sc) and deep focus (Dc) catalogs, and the analysis is carried out separately.
Fig. 2 Proposed declustering model
The segregation is expressed as

$$E_{N \times D} = \begin{cases} e_i \in \text{Deep Catalog}\ (D_c), & \text{if } d \ge d_{th} \\ e_i \in \text{Shallow Catalog}\ (S_c), & \text{otherwise} \end{cases} \tag{2}$$
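Continuing the download sketch above, Eq. (2) reduces to a depth filter; the column name "depth" follows the USGS CSV convention and is an assumption here.

```python
# Eq. (2): split the catalog into shallow and deep focus events, d_th = 70 km
d_th = 70.0
deep_catalog = catalog[catalog["depth"] >= d_th]
shallow_catalog = catalog[catalog["depth"] < d_th]
```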
Step 3: Identification of Mainshocks
The mainshocks are those events with higher magnitude whose epicenters are near the earth's surface. The spatial seismic zones are identified based on an optimal number of mainshocks, and these pre-determined mainshocks are the centroids of the spatial zones. The mainshock set is represented as

$$M_{K \times D} = \begin{bmatrix} e_{11} & e_{12} & e_{13} & \cdots & e_{1D} \\ e_{21} & e_{22} & e_{23} & \cdots & e_{2D} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ e_{K1} & e_{K2} & e_{K3} & \cdots & e_{KD} \end{bmatrix} \tag{3}$$

where M_{K×D} ∈ Sc and K is the pre-determined number of mainshock events.

Step 4: Spatial Analysis using Weighted Kernel Fuzzy C-means Algorithm
Fuzzy C-means clustering (FCM) is one of the most famous classical fuzzy clustering algorithms. The initial centroids of FCM are selected randomly. The input data to FCM comprise P features, and the output is the matrix U having c rows and n columns, where c represents the number of clusters and n represents the number of data points in each cluster. The events nearby in space and time are correlated with the primary mainshock, and the Euclidean distance function measures the similarity between the events. Let the earthquake catalog E = (ei, i = 1, 2, 3, …, N) be the input to FCM, let c be the pre-determined number of categories according to the number of mainshocks, and let η(i,j) be the membership function, where i = 1, 2, 3, …, N and j = 1, 2, 3, …, K. Then, the distance function is calculated as

$$D(e_i, m_k) = \sum_{i=1}^{N_s} \sum_{j=1}^{K} \eta_{i,j}^{z} \left[ (e_i(\theta) - m_j(\theta))^2 + (e_i(\phi) - m_j(\phi))^2 \right]^{1/2} \tag{4}$$

$$D(e_i, m_k) = \sum_{i=1}^{N_s} \sum_{j=1}^{K} \eta_{i,j}^{z} \left[ d_{i,j,\theta} + d_{i,j,\phi} \right]^{1/2} \tag{5}$$

$$D_\tau(i, j) = |t_i - t_j| \tag{6}$$
where z represents the constant used to control the degree of fuzzy overlapping. Then, the degree of membership between the jth mainshock and any event ei is calculated as

$$\eta_{i,j}^{z} = \left[ \sum_{k=1}^{K} \left( \frac{\lVert e_i(\theta, \phi) - m_j(\theta, \phi) \rVert}{\lVert e_i(\theta, \phi) - m_k(\theta, \phi) \rVert} \right)^{\frac{2}{z-1}} \right]^{-1} \tag{7}$$
Here, j = 1, 2, …, K indexes the mainshocks and i = 1, 2, …, Nc indexes the events in the shallow catalog Sc. One of the problems with classical FCM is that it is only suitable for spherical and ellipsoidal clustering and is highly sensitive to outliers in the dataset. This problem can be effectively solved by introducing two new parameters. The first is a kernel function in the clustering: the basic idea is to map the input space into a high-dimensional feature space using a nonlinear transformation. The frequently used nonlinear transformation is a kernel function, here the radial basis function. The second is a weight function (ai), which allows the algorithm to assign weights to different classes, improving the clustering effect. Let the events E_{N×D} ⊂ R^q form the feature data space mapped to a sample dataset in the feature space R^s. The mathematical formulation for weighted kernel fuzzy C-means clustering is given as

$$D(e_i, m_k) = \sum_{i=1}^{N_s} \sum_{j=1}^{K} \eta_{i,j}^{z}\, a_i^{m} \left[ d_{K,i,j,\theta} + d_{K,i,j,\phi} \right]^{1/2} \tag{8}$$
If a kernel function K is selected, then the kernel-induced Euclidean distance between the seismic event vector ei and the mainshock event vector mj is calculated as

$$d_{K,i,j,\theta} = \left[ K(d_{i,j,\theta}) + K(d_{i,k,\theta}) \right]^{1/2} \tag{9}$$
The ai represent dynamic weights. The significance of dynamic weights is that the classes having more elements in the iterative process have a denser concentration and higher importance, so the membership degrees of the corresponding elements will be larger. At the same time, a class with fewer, sparsely distributed elements has less importance. The ai satisfy the following condition:

$$\sum_{i=1}^{C} a_i = 1 \tag{10}$$
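To make the formulation concrete, a small sketch of the fuzzy membership computation of Eq. (7) for fixed mainshock centroids is given below; it uses plain Euclidean distances in (θ, φ) rather than the kernel-induced distance of Eq. (9), so it is a simplified assumption, not the full WKFCM.

```python
import numpy as np

def fcm_memberships(events, mainshocks, z=2.0):
    """Eq. (7): membership of event i in the zone of mainshock j,
    given fixed centroids (the pre-determined mainshocks)."""
    # events: (N, 2) array of (latitude, longitude); mainshocks: (K, 2)
    d = np.linalg.norm(events[:, None, :] - mainshocks[None, :, :], axis=2)
    d = np.maximum(d, 1e-12)               # avoid division by zero
    ratio = d[:, :, None] / d[:, None, :]  # ||e_i - m_j|| / ||e_i - m_k||
    eta = 1.0 / np.sum(ratio ** (2.0 / (z - 1.0)), axis=2)
    return eta                             # shape (N, K); rows sum to 1
```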
Step 4: Spatial Seismic Zone Identification
Each seismic event is allocated to a seismic zone (Sz) based on the distance between the event and the mainshocks, using Eq. 7:

$$\text{Label}_i = \min_{j}\, (d_{K,i,j,\theta}) \tag{11}$$

where the events are indexed by i = 1, 2, …, N and the mainshocks by j = 1, 2, …, m. Then, the identified seismic zones are given as

$$E_{N \times D} = \sum_{i=1}^{N_1} E_1(i,:) + \sum_{i=1}^{N_2} E_2(i,:) + \sum_{i=1}^{N_3} E_3(i,:) + \cdots + \sum_{i=1}^{N_T} E_m(i,:) \tag{12}$$

$$E_{N \times D} = S_{z1} + S_{z2} + \cdots + S_{zm} \tag{13}$$

where Sz represents the seismic zones according to the predefined m number of mainshocks.
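Continuing the membership sketch above, the hard zone assignment of Eq. (11) is simply the nearest-mainshock label; the same simplified Euclidean distance is assumed.

```python
# Eq. (11): each event joins the zone of its nearest mainshock, then
# Eqs. (12)-(13): the catalog is the union of the resulting zones
d = np.linalg.norm(events[:, None, :] - mainshocks[None, :, :], axis=2)
labels = np.argmin(d, axis=1)                      # zone index per event
zones = [events[labels == j] for j in range(len(mainshocks))]
```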
Step 5: Temporal Analysis using Weighted Density Peak Clustering
In this phase, the seismic zones identified in step 4 are further classified based on their density in the temporal domain. The objective is to identify clustered, highly correlated events in the time domain, i.e., events that occurred nearby in time on the same fault line with high intensity. This is carried out with the help of the temporal density peak clustering proposed by Rodriguez and Laio [28], augmented with a better weight adjustment mechanism. These methods find the clusters by assuming that cluster centroids are surrounded by points with comparatively lower local density and lie at a relatively large distance from any point of higher local density. The key advantage of the algorithm is that it determines the centroids based on density and assigns the rest of the points to the corresponding clusters with a highest-density nearest-neighbor approach. It also identifies the clusters irrespective of their shape and dimensions. The procedure determines two parameters for every spatial zone (Sz): the first is the local density ρi, and the second is the distance δi between the ith event of the zone and events of higher density. The local density of each event in a specific spatial zone is determined with the help of a Gaussian kernel function using the time and magnitude information of each event in Sz, given as

$$\rho_i = \sum_{j} \exp\left( -\frac{d_s(t_i, t_j)^2}{d_c^2} \right) \tag{14}$$
The cutoff dc is a critical parameter for robust clustering; its value corresponds to around 1–2% of the total events present in the seismic zone Sz. Then, the weighted local density for any ith event is determined using the magnitude information of each event in the seismic zone Sz, given as

$$\rho_i^{w} = \rho_i \times \bar{M}_i \tag{15}$$

$$\bar{M}_i = \frac{M_i}{\max(M_{S_z})} \tag{16}$$
where Mi is the magnitude of the ith seismic event and M_{Sz} is the maximum magnitude of an event in the seismic zone Sz. The distance δi is calculated by finding the minimum distance between the event and any other event having a higher weighted local density, given as

$$\delta_i = \begin{cases} \min_j d_\tau(t_i, t_j), & \text{if } \exists j: \rho_j^{w} \ge \rho_i^{w} \\ \max_j d_\tau(t_i, t_j), & \text{otherwise} \end{cases} \tag{17}$$
On the basis of the weighted local density and δi, a decision graph is plotted to find the points with high density and large distance. These points are considered the cluster centroids in the temporal domain.
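A compact sketch of the weighted density peak quantities of Eqs. (14)-(17) for one seismic zone follows; the times are assumed to be numeric values in a common unit, and the magnitude normalization follows Eq. (16).

```python
import numpy as np

def weighted_dpc(times, mags, dc):
    """Compute the weighted local density (Eqs. 14-16) and the
    distance delta (Eq. 17) for every event of one seismic zone."""
    dt = np.abs(times[:, None] - times[None, :])        # Eq. (6)
    rho = np.exp(-(dt ** 2) / dc ** 2).sum(axis=1)      # Eq. (14)
    rho_w = rho * (mags / mags.max())                   # Eqs. (15)-(16)
    n = len(times)
    delta = np.empty(n)
    for i in range(n):
        higher = rho_w > rho_w[i]
        # min distance to a denser event, else max distance (Eq. 17)
        delta[i] = dt[i, higher].min() if higher.any() else dt[i].max()
    return rho_w, delta   # plot delta vs rho_w as the decision graph
```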
Step 6: Identification of Background Events
The AF events are identified based on their density. The density of each point in the seismic zone is compared with the density of the cluster centroids, ρc. Those points having a higher density than ρc are assigned to a cluster, and the rest are considered BG events. This procedure effectively segregates the events located nearby in the space-time domain, which are considered AF events; the remaining events, not part of any cluster, are BG events.

Step 7: Magnitude Thresholding of the Deep Catalog
In this stage, magnitude thresholding is applied to the deep seismic catalog Dc of size X × D, where X is the number of events. The events with magnitude intensity higher than a specific threshold value are considered AF events:

$$D_{c_{X \times D}} = \begin{cases} e_i \in AF, & \text{if } M_{e_i} \ge \bar{M} \\ e_i \in BG, & \text{otherwise} \end{cases} \tag{18}$$

where M̄ is the mean value of the magnitude for the deep focus catalog. In this way, the aftershock events are identified in the spatial-temporal domain.
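As a final fragment of the same hypothetical pipeline, the magnitude thresholding of Eq. (18) compares each deep event against the mean magnitude; the column name "mag" is again an assumption from the USGS CSV format.

```python
# Eq. (18): deep focus events at or above the mean magnitude are AFs
m_bar = deep_catalog["mag"].mean()
deep_af = deep_catalog[deep_catalog["mag"] >= m_bar]
deep_bg = deep_catalog[deep_catalog["mag"] < m_bar]
```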
4 Result and Discussion

In this section, the performance of the proposed model, tested on the earthquake catalogs described in Sect. 2, is explained. The proposed model is applied to both catalogs, and the events are classified into AF and BG events. The higher magnitude earthquake events are identified based on their magnitude intensity; these events are considered the cluster centroids for each cluster and are represented as black stars, as shown in Fig. 3a, b. A total of 8 and 10 cluster centroids are identified using the WKFCM algorithm for Japan and Indonesia, respectively. Based on the spatial information of the centroids, events are classified into different spatial zones by applying the procedure mentioned in step 3. The identified spatial zones for Japan and Indonesia are represented with different colors in Fig. 3a, b. After categorizing the events into the respective spatial zones, the weighted density peak clustering algorithm is applied to both catalogs. Initially, the unweighted temporal density is determined using Eq. 14; here, the value of dc = 0.1 is considered in the analysis. In spatio-temporal analysis, the density of events directly depends on the magnitude, so the weighted local density is identified using Eq. 15. A decision graph drawn between the distance δ and the weighted local density represents the higher magnitude earthquake events in the temporal domain, as shown in Fig. 3c, d. It is observed from Fig. 3c, d that the events having higher distance δ and weighted local density are clearly separable from the rest of the events. The events having high local density represent the cluster centroids in the temporal domain. In spatio-temporal analysis of the seismic events, the clusters may overlap due to occurrence at the same spatial location while being distant in the temporal domain; the overlapping clusters have a low δ value in both space and time. In the proposed model, spatial separation between the events is performed using the WKFCM algorithm; then, temporal density peak clustering finds the events nearby in
Fig. 3 Results obtained from the proposed model: spatial seismic zones identified for a Japan and b Indonesia; decision graph to identify potential seismic events in the temporal domain for c Japan and d Indonesia
time. In the spatial domain, clusters are already defined according to the mainshocks to avoid overlapping. Then, density peak clustering finds the non-overlapping centroids according to the decision graph and avoids merging clusters in the time domain even if they occupy the same spatial location. After this, the events are classified as AF and BG events based on the density of the cluster centroid, as mentioned in step 6 of Sect. 3. The obtained results are explained in the following subsections.
4.1 Epicenter Plot

The events classified into AF and BG for the Japan and Indonesia catalogs are depicted in Fig. 4a, b, respectively. It is observed from the figures that the aftershock events (black dots) are highly dense and compact near the locations of the mainshocks. Aftershocks mainly exist at the fault boundaries where several mainshock events occur.
Fig. 4 Epicenter distribution plot of seismic events with depth for a Japan and b Indonesia
The events not associated with mainshocks are considered BG events, represented by gray dots in Fig. 4 for Japan and Indonesia. It is observed that the BGs are uniformly distributed across the entire region for both catalogs. The BG events are not associated with any significant event and show the absence of a dense region.
4.2 Cumulative and Lambda Plot

Cumulative and Lambda Plots are essential measures to test the effectiveness of the proposed model in terms of total events, AF events, and BG events. A Cumulative Plot represents the cumulative sum of events with respect to a specific time interval, while a Lambda Plot signifies the number of occurrences within a given period. Cumulative and Lambda Plots in terms of total events, clustered aftershocks, and non-clustered background events are shown in Fig. 5. It is observed from Fig. 5a, b that the cumulative rate for BG events (gray curve) follows a linear trend with time, revealing that these events, occurring over the period across the entire region, follow a uniform distribution. The characteristics of the aftershock events (black curve) follow the exact pattern of the total events (pink curve). The non-uniform characteristics of the AF events for both catalogs, and the similar patterns between the AF events and the true events shown in Fig. 5a, b, indicate that the events are efficiently segregated using the proposed model. Figure 5c, d shows the occurrence rate of the number of events in a year, known as the Lambda Plot. It is evident that the seismicity rate in the case of BG events (gray line) is uniformly distributed and does not deviate even in the presence of a significant seismic event, revealing that the BG seismicity rate is independent of mainshock events and stays stationary throughout the interval. It is also observed that the AF events show a non-uniform seismicity rate and the presence of significant
Fig. 5 Cumulative Plots for a Japan and b Indonesia, and Lambda Plots for c Japan and d Indonesia, in terms of total events, clustered AFs, and non-clustered BGs
peaks at the times of the mainshocks, at years 11 and 22 in Fig. 5c and years 5, 11, and 16 in Fig. 5d. The characteristics of the AF (black curve) and true events (pink curve) follow a similar trend, indicating the proposed model's potential for segregating events.
4.3 Temporal Seismicity Analysis

Time domain analysis of all the events is performed using the Coefficient of Variance (COVT) [30]. It is the ratio of the standard deviation to the mean of the inter-event times (τ). The inter-event time for consecutive earthquake events is given as

$$\tau = T_i - T_{i+1}, \quad \forall i = 1, 2, \ldots, N \tag{19}$$
Then, COVT is determined as

$$\mathrm{COV}_T = \frac{\sqrt{E[\tau^2] - (E[\tau])^2}}{E[\tau]} \tag{20}$$
E[·] represents the average of the inter-event times within the given interval. The value of COVT segregates the events into three categories:
• In the case of a periodic time series, τ is constant and COVT = 0.
• If COVT = 1, the time series follows a Poisson distribution and τ varies exponentially.
• If COVT > 1, the time series follows a power law distribution, with τ growing along with time.
The values of COVT for AF, BG, and total events are mentioned in Table 2. The obtained results in terms of COV are described in the following subsection.
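A direct transcription of Eqs. (19)-(20) is sketched below; the occurrence times are assumed to be sorted and expressed in a common unit.

```python
import numpy as np

def cov_t(event_times):
    """Coefficient of variation of the inter-event times (Eqs. 19-20):
    0 for periodic, about 1 for Poissonian, > 1 for clustered series."""
    tau = np.diff(np.sort(event_times))        # inter-event times, Eq. (19)
    return np.sqrt(np.mean(tau**2) - np.mean(tau)**2) / np.mean(tau)  # Eq. (20)
```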
4.4 Comparative Analysis with State-of-the-Art Declustering Techniques

For many years, researchers and seismologists have made various attempts to classify earthquake catalogs. Gardner and Knopoff [12] developed a spatio-temporal window technique and analyzed different magnitude ranges to identify AF and BG events; the events that fall within the window are considered AFs, and the rest are treated as BG events. Reasenberg [13] classified the earthquake sequences based on interaction zones in the space-time domain: the spatial extent is determined near mainshocks using the stress distribution, and the time extent is identified using Omori's law. The results of these algorithms are highly dependent on the default parameter settings. Here, the performance is also compared with the Uhrhammer window [14] technique and the recently developed tetra-stage model [19] in terms of the number of clusters, the number of classified events, and COVT. The results obtained from each algorithm are given in Table 2. Gardner's method detects more AF events and fewer BG events, and its number of clusters is also higher compared to the other methods. The results obtained from the Uhrhammer and Reasenberg methods are opposite: Uhrhammer identifies more AF events (though fewer than the GK method), whereas Reasenberg detects fewer AFs. This shows that all these methods give inconsistent results. The results obtained from the tetra-stage model are more promising, but its high value of COV_BG is contradictory. For the proposed model, the value of COV_BG is near unity, and the high statistical values of COV_AF show its superiority.
Table 2 A comparative analysis between the proposed model and benchmark declustering algorithms

Catalog (Total Events)   Metric     Gardner Knopoff   Gruenthal Window   Uhrhammer Method   Reasenberg Method   Tetra-stage Model   Proposed Model
Japan (19510)            AF         12336             13209              9527               7916                8510                8782
                         BG         7174              6301               9989               11594               10984               10728
                         Clusters   1210              1080               1170               889                 760                 802
                         COVT       4.63              3.78               4.82               4.51                3.87                3.34
                         COV_AF     5.69              5.87               6.31               5.87                4.89                5.21
                         COV_BG     2.69              2.87               3.25               3.62                1.96                1.12
Indonesia (18106)        AF         10522             12127              11356              13289               8726                8951
                         BG         8988              7383               6780               4817                9380                9155
                         Clusters   806               995                910                827                 886                 851
                         COVT       3.26              4.23               4.56               3.91                2.89                3.29
                         COV_AF     4.31              6.52               6.83               5.49                4.28                4.51
                         COV_BG     2.01              3.2                2.99               2.76                1.89                1.25
5 Conclusion

In this manuscript, a two-phase space-time clustering is reported for the segregation of aftershocks and background events in the seismic catalog. WKFCM clustering is applied in the spatial domain with a predefined number of mainshocks to determine the potential seismic zones. Then, a Gaussian kernel-based density is estimated in the temporal domain, and a magnitude-based weighted strategy is applied in the decision graph to identify the mainshocks in the time domain. This multi-stage clustering approach is used to decluster the seismicity of the Japan and Indonesia regions. The results obtained from the proposed model are evaluated in terms of the Cumulative Plot, Lambda Plot, Epicenter Plot, number of clusters, and Coefficient of Variance. The results reveal that the proposed model efficiently declusters the seismicity and outperforms the other conventional methods.
References
1. Yehuda B-Z (2008) Collective behavior of earthquakes and faults: continuum-discrete transitions, progressive evolutionary changes, and different dynamic regimes. Rev Geophys 46(4)
2. Utsu T (2002) Statistical features of seismicity. Int Geophys Ser 81(A):719–732
3. Lengliné O, Enescu B, Peng Z, Shiomi K (2012) Decay and expansion of the early aftershock activity following the 2011 Mw 9.0 Tohoku earthquake. Geophys Res Lett 39(18)
4. Ross et al (2017) Aftershocks driven by afterslip and fluid pressure sweeping through a fault-fracture mesh. Geophys Res Lett 44(16):8260–8267
5. Ruhl CJ, Abercrombie RE, Smith KD, Zaliapin I (2016) Complex spatiotemporal evolution of the 2008 Mw 4.9 Mogul earthquake swarm (Reno, Nevada): interplay of fluid and faulting. J Geophys Res Solid Earth 121(11):8196–8216
6. Edward et al (2017) A spatiotemporal clustering model for the third uniform California earthquake rupture forecast (UCERF3-ETAS): toward an operational earthquake forecast. Bull Seismol Soc Am 107(3):1049–1081
7. Johnson CW, Fu Y, Bürgmann R (2017) Stress models of the annual hydrospheric, atmospheric, thermal, and tidal loading cycles on California faults: perturbation of background stress and changes in seismicity. J Geophys Res Solid Earth 122(12):10–605
8. Irsyam et al (2020) Development of the 2017 national seismic hazard maps of Indonesia. Earthquake Spectra 36(1_suppl):112–136
9. Petersen et al (2017) 2017 one-year seismic-hazard forecast for the central and eastern United States from induced and natural earthquakes. Seismol Res Lett 88(3):772–783
10. Ben-Zion Y, Zaliapin I (2020) Localization and coalescence of seismicity before large earthquakes. Geophys J Int 223(1):561–583
11. Eroglu Azak T, Kalafat D, Şeşetyan K, Demircioğlu MB. Effects of seismic declustering on seismic hazard assessment: a sensitivity study using the Turkish earthquake catalogue. Bull Earthquake Eng 16(8):3339–3366
12. Gardner JK, Knopoff L (1974) Is the sequence of earthquakes in southern California, with aftershocks removed, Poissonian? Bull Seismol Soc Am 64(5):1363–1367
13. Reasenberg P (1985) Second-order moment of central California seismicity, 1969–1982. J Geophys Res Solid Earth 90(B7):5479–5495
14. Uhrhammer RA (1986) Characteristics of northern and central California seismicity. Earthquake Notes 57(1):21
15. Knopoff L (2000) The magnitude distribution of declustered earthquakes in southern California. Proc Nat Acad Sci 97(22):11880–11884
16. Utsu T (1969) Aftershocks and earthquake statistics (1)-some parameters which characterize an aftershock sequence and their interrelations. J Fac Hokkaido Univ Ser 7, 3:125–195
17. Rehman K, Burton PW, Weatherill GA (2014) K-means cluster analysis and seismicity partitioning for Pakistan. J Seismol 18(3):401–419
18. Hainzl S, Scherbaum F, Beauval C (2006) Estimating background activity based on interevent-time distribution. Bull Seismol Soc Am 96(1):313–320
19. Vijay RK, Nanda SJ (2017) Tetra-stage cluster identification model to analyse the seismic activities of Japan, Himalaya and Taiwan. IET Signal Process 12(1):95–103
20. Vijay RK, Nanda SJ (2019) Shared nearest neighborhood intensity based declustering model for analysis of spatio-temporal seismicity. IEEE J Sel Top Appl Earth Observ Remote Sens 12(5):1619–1627
21. Aden-Antoniow F, Frank WB, Seydoux L (2021) Transfer learning to build a scalable model for the declustering of earthquake catalogs
22. Ester M, Kriegel H-P, Sander J, Xiaowei X et al (1996) A density-based algorithm for discovering clusters in large spatial databases with noise. In: KDD 96:226–231
23. Vijay RK, Nanda SJ (2019) A variable epsilon-DBSCAN algorithm for declustering earthquake catalogs. In: Soft computing for problem solving. Springer, pp 639–651
24. Vijay RK, Nanda SJ (2019) A quantum grey wolf optimizer based declustering model for analysis of earthquake catalogs in an ergodic framework. J Comput Sci 36:101019
25. Sharma A, Nanda SJ, Vijay RK (2021) A binary NSGA-II model for de-clustering seismicity of Turkey and Chile. In: 2021 IEEE congress on evolutionary computation (CEC). IEEE, pp 981–988
26. Sharma A, Nanda SJ (2022) A multi-objective chimp optimization algorithm for seismicity de-clustering. Appl Soft Comput, 108742
27. Sharma A, Nanda SJ, Vijay RK (2021) A model based on fuzzy c-means with density peak clustering for seismicity analysis of earthquake prone regions. In: Soft computing for problem solving. Springer, pp 173–185
28. Rodriguez A, Laio A (2014) Clustering by fast search and find of density peaks. Science 344(6191):1492–1496
29. United States Geological Survey. https://earthquake.usgs.gov/earthquakes/search/ (2022)
30. Bottiglieri M, Lippiello E, Godano C, De Arcangelis L (2009) Identification and spatiotemporal organization of aftershocks. J Geophys Res Solid Earth 114(B3)
Chapter 27
Wearable Small, Narrow Band, Conformal, Low-Profile Antenna with Defected Ground for Medical Devices

Archana Tiwari and A. A. Khurshid
1 Introduction

The emergence of antennas has led to many revolutionary discoveries in the fields of defense, health care, communication, etc. In the field of health care, the need for microstrip antennas is rising rapidly. The demand for upgraded features and reduced size of biomedical devices is increasing day by day, and this technological enhancement necessitates a miniaturized form of antenna suitable for medical applications. The microstrip monopole antenna used in microwave imaging is a less expensive solution for detecting diseases at early stages [1, 2]. These microstrip antennas have a huge potential for further development and also play an important role in wireless applications due to their comparatively greater bandwidth, higher directivity, low profile, easy fabrication, and integration [3, 4]. The rapid progress in body area networks (BAN) [5, 6] has resulted in major research success in the past years due to their encouraging applications. Since these antennas are positioned in the near vicinity of the human body, the lossy tissues create a loading effect, and therefore, efficient antenna design is challenging [7]. This work introduces a compact antenna which can be used for on-body/biomedical telemetry applications. Through parametric optimization of the antenna and evaluation of its performance, it has been verified that the proposed miniaturized antenna meets the requirements of radiation pattern, bandwidth, and frequency. The paper presents the literature review in Sect. 2, describes the design and analysis
in Sect. 3, and the fabrication results are tabulated in Sect. 4. Section 5 presents the comparison of the proposed design with the work of other researchers cited in the literature, along with the concluding remarks.
2 Literature Review

In the process of designing, several research works related to miniaturization techniques and to structures suitable for medical devices were reviewed; the relevant literature is described below. Compact monopole patch antennas can be used for a variety of applications in the ISM band (2.4–5.8 GHz) and have therefore attracted the interest of researchers. Their miniaturized shape makes these antennas suitable for embedding directly into biomedical and communication devices [8–10]. Al-Zoubi et al. [11] proposed a circular microstrip patch antenna with a ring-shaped patch with a return loss of −35 dB, a bandwidth of 12.8%, and a simulated gain of 5.7 dBi at 5.8 GHz. Peng et al. [12] proposed a monopole patch antenna with three stubs; the antenna was excited by a microstrip line, and a peak antenna gain of 1.90–2.16 dBi for the 2.4 GHz band and 3.30–3.85 dBi for the 5 GHz band was obtained. Liu et al. [13] presented a microstrip monopole antenna with a circular patch, achieving a bandwidth of 18% with a gain of 6 dBi. Rahaman et al. [14] designed a compact microstrip wideband antenna targeting a resonant frequency of 2.45 GHz, with a return loss of −48.99 dB and a bandwidth of 900 MHz; the dimensions of the proposed design were 30 * 40 * 1.76 mm3, and the simulated gain achieved was 4.59 dBi. Yang and Xiao [15] designed a single-feed, wide-bandwidth implantable antenna that operates at 2.4 GHz with antenna dimensions of 11 * 7.6 * 0.635 mm3; the bandwidth ranged from 2.24 to 2.59 GHz with a peak gain of 20.8 dBi. Rahaman and Hossain [16] proposed a compact open-end slot feed microstrip patch antenna targeting a resonant frequency of 2.45 GHz and achieved a return loss of −46.64 dB, a bandwidth of 16%, and a gain of 7.2 dBi. Different structures have been analyzed for their suitability as wearable antennas, including perpendicular monopoles [7] and planar microstrip monopole antennas [12, 13]. The planar monopole antennas have a small area, but significant energy goes into the human body because their radiation properties are omnidirectional. This work focuses on an efficient narrowband, low-profile, and small form factor antenna design. Since monopole antennas are simple to design, efficient, and have relatively high reactive impedance over the frequency range, their suitability can be further explored for on-body medical devices. Though monopole antenna impedances vary in an isolated chamber, impedance matching can be controlled by the designer without the need for extraneous matching components. With the use of emerging defected ground structure techniques, the antenna parameters can be improved [17–19].
From the literature survey, it can be concluded that miniaturization of antennas is a promising approach to enhance the scalability of wearable devices, and the merits of monopoles can be utilized by exploring their suitability. Hence, this work is directed toward designing a narrowband, low-profile, efficient monopole antenna for the ISM band.
3 Antenna Design and Analysis

In order to achieve narrow bandwidth and compactness, an inverted G-shape patch with a meandering element was simulated with a full ground plane, a half ground plane, and a defected ground structure, using FR4 as the substrate material. The meandering line is used to transform the design of the monopole antenna. The variations were experimented with the objective of increasing the radiating part and decreasing the electrical length so as to achieve miniaturization, and the experience of each simulation was combined with the next to achieve the desired results. Through parametric variations in HFSS, the optimal design was derived with dimensions 23 * 20 * 1.6 mm3, including a finite ground plane of dimensions 8 * 20 mm2. The defect on the ground is etched on the bottom side, as shown in Fig. 1. A substrate dielectric constant of 4.4 with a thickness of 1.6 mm was used for the simulation. Table 1 depicts the derived parameters of the proposed antenna in Fig. 1. The designed inverted G-shape patch with a meandering element is an inset fed antenna. Using the technique from [17] on the defected ground, it is inferred that with an increase in the length of the slots on the ground plane, the resonant frequency is reduced.
Fig. 1 Proposed antenna design view: front and back
Table 1 Dimensions of the antenna

Antenna parameters   Variables   Size (mm)
Patch                M           3
                     A1          11
                     A2          8
                     A3          3
                     A4          7.5
                     A5          1
                     A6          5
                     A7          2
                     A8          0.8
Ground               Q           20
                     Z           23
                     Zg          8
                     Q2          20
Slot                 c1          2.5
                     c2          3
Therefore, using Optimetrics, the slot dimensions are adjusted to achieve a frequency of 2.54 GHz. It has been observed that, with the use of defects on the ground plane, the current distribution is disturbed in a way that depends on the dimension and shape of the defect, thereby changing the input impedance. Thus, through variations of the slots, the excitation and wave propagation through the substrate are controlled, and a better degree of compactness is obtained. The variations are shown in Table 2; after multiple variations, design 6, having a slot size of 3 * 2.5 mm2, achieved the desired return loss as shown in Fig. 2. The gain of the proposed design is found to be 3.8 dBi, as shown in Figs. 3 and 4, with a directivity of −3 dB. Figure 5 shows the impedance Smith chart plot of the proposed design, which is found to match the 50 Ω impedance. The defected ground structure has improved the radiation without the use of additional circuits.
4 Fabrication Results and Analysis

The proposed design 6 (Table 2) is fabricated using a low-profile FR4 substrate material with a thickness of 1.6 mm and a dielectric constant of 4.4. The fabricated antenna front and back views are presented in Fig. 6. Figure 7 shows the return loss vs frequency plot and the Smith chart plot for the fabricated antenna. Table 3 compares the simulated and fabricated designs; the return loss for both is found to be the same, −16.54 dB. It is observed that resonance occurs at 2.27 GHz for the fabricated antenna. It can be concluded that the proposed antenna offers a good compromise between the simulated and fabricated results. The actual measurement data are improved
Table 2 Ground plane and slot variations of the antenna

Antenna design   Ground dimensions (Zg * Q2) in mm   Slot dimensions (c2 * c1) in mm   Frequency in GHz   Return loss in dB   Gain in dBi
Design 1         11 * 20                             4 * 4                             3.5                −10.001             0.15
Design 2         11 * 20                             3 * 3                             3.5                −17.57              0.07
Design 3         10 * 20                             3 * 3                             3                  −8.84               −11.79
Design 4         9 * 20                              3 * 3                             3.5                −0.0004             1.21
Design 5         8 * 20                              3 * 3                             2.54               −14.24              1.21
Design 6         8 * 20                              3 * 2.5                           2.54               −16.54              3.89
Design 7         7 * 20                              3 * 2.5                           3.5                −10.29              0.99
Design 8         6 * 20                              3 * 2.5                           3                  −9                  0.99
Fig. 2 Return loss plot of proposed design
Fig. 3 2D gain plot of proposed design
in comparison to the simulation data. The meandered line model and the defected ground technique are able to address these problems. Table 4 compares the antenna parameters with the work done by other researchers, indicating that the proposed design and the technique used enable miniaturization.
Fig. 4 3D gain plot of proposed design
Fig. 5 Smith chart plot of proposed design
Fig. 6 Front and back view of fabricated antenna
The design observes the required conditions of a compact structure meeting the set performance objective, thus providing an improved solution as a reference for future designers.
Fig. 7 Return loss plot and Smith chart plot of fabricated antenna

Table 3 Comparison of simulated and fabricated design

Sr. No.   Parameter of comparison   Simulation results   Fabrication results
1         Return loss               −16.54 dB            −16.54 dB
2         Frequency                 2.54 GHz             2.27 GHz
3         Bandwidth                 60 MHz               130 MHz
4         Impedance                 50 Ω (approx.)       60 Ω
Table 4 Proposed antenna work and existing work comparison

Paper reference            [20]              [21]                [22]                      [23]                Proposed antenna
Substrate material         FR4               Rogers RO4003C      Rogers Arlon DiClad 880   FR4                 FR4
Dielectric constant        4.3               3.38                2.2                       4.4                 4.4
Frequency (in GHz)         2.2               2.53                2.4                       2.6                 2.54
Size of antenna (in mm3)   45 * 38 * 0.065   40 * 41.5 * 0.508   57 * 46 * 0.07            31.3 * 34.9 * 1.6   23 * 20 * 1.6
Return loss (in dB)        −19               −21                 > −10                     > −10               −16.54
5 Conclusion

The inverted G-shaped antenna with a meandering element and a monopole patch has been proposed and analyzed in this paper. The miniaturized design is realized with a high dielectric constant FR4 substrate and a meandered line, thereby controlling the size. The fundamental constraints were modeled and estimated with HFSS software, and the results of the fabricated design with the defected ground structure are found to be in agreement. The effects of using the defected ground structure have been experimented with over different dimensions, and its role in dimension reduction is successfully demonstrated. The performance of the different designs is evaluated on the basis of their radiation, bandwidth, and return loss characteristics. The proposed design provides a good compromise between volume, bandwidth, and efficiency, and it can be concluded that it is suitable for medical devices. The fixed substrate material and thickness limit the work; in the future, other substrate materials can be explored to achieve further miniaturization.
References 1. Ahadi M, Nourinia J, Ghobadi C (2021) Square monopole antenna application in localization of tumors in three dimensions by confocal microwave imaging for breast cancer detection: experimental measurement. Wirel Pers Commun 116:2391–2409. https://doi.org/10.1007/s11 277-020-07801-5 2. Rodriguez-Duarte DO, Tobón Vasquez JA, Scapaticci R, Crocco L, Vipiana F (2021) Assessing a microwave imaging system for brain stroke monitoring via high fidelity numerical modelling. IEEE J Electromagn RF Microw Med Biol 5(3) 3. Zhang ZY, Fu G, Gong SX, Zuo SL, Lu QY (2010) Sleeve monopole antenna for DVB-H applications. Electron Lett 46:879–880. https://doi.org/10.1049/el.2010.1035 4. Ahmad S, Paracha KN, Ali Sheikh Y, Ghaffar A, Dawood Butt A, Alibakhshikenari M, Soh PJ, Khan S, Falcone F (2021) A metasurface-based single-layered compact AMC-backed dualband antenna for off-body IoT devices. IEEE Access 9 5. Hall PS, Hao Y (2012) Antenna and propagation for body-centric wireless communications. Artech House 6. Jiang ZH, Cui Z, Yue T, Zhu Y, Werner DH (2017) Compact, highly efficient, and fully flexible circularly polarized antenna enabled by silver nanowires for wireless body-area networks. IEEE Trans Biomed Circ Syst 11(4) 7. Hall PS (2007) Antennas and propagation for on-body communication systems. IEEE Antenn Propag Mag 49:41–58 8. Ammann MJ, Chen ZN (2003) A wide-band shorted planar monopole with Bevel. IEEE Trans Antenn Propag 51:901–903. https://doi.org/10.1109/TAP.2003.811061 9. Suh SY, Stutzman W, Davis WA (2018) A new ultrawideband printed monopole antenna: the planar inverted cone antenna (PICA). IEEE Trans Antenn Propag 52:1361–1364. https://doi. org/10.1109/TAP.2004.827529 10. Elsheakh D, Elsadek HA, Abdallah E, Elhenawy H, Iskander MF (2009) Enhancement of microstrip monopole antenna bandwidth by using EBG structures. IEEE Trans Antenn Wirel Propag Lett 8:959–962. https://doi.org/10.1109/LAWP.2009.2030375 11. Al-Zoubi A, Yang F, Kishk A (2009) A broadband center-fed circular patch-ring antenna with a monopole like radiation pattern. IEEE Trans Antenn Propag 57(3):789–792. https://doi.org/ 10.1109/TAP.2008.2011406,March
12. Peng L, Ruan CL (2007) A microstrip fed monopole patch antenna with three stubs for dual-band WLAN applications. J Electromagn Waves Appl 21:2359–2369. https://doi.org/10.1163/156939307783134263
13. Liu J, Xue Q, Wong H, Lai HW, Long Y (2013) Design and analysis of a low-profile and broadband microstrip monopolar patch antenna. IEEE Trans Antenn Propag 61:11–18. https://doi.org/10.1109/TAP.2012.2214996
14. Rahaman A, Hossain QD (2018) Design of a miniature microstrip wide band antenna for on-body biomedical telemetry. In: International conference on smart systems and inventive technology (ICSSIT 2018). IEEE Xplore Part Number: CFP18P17-ART, ISBN: 978-1-5386-5873-4
15. Yang ZJ, Xiao S (2018) A wideband implantable antenna for 2.4 GHz ISM band biomedical application. National Natural Science Foundation of China under Grants 61331007 and 61731005, IEEE 978-1-5386-1851-6/18
16. Anisur Rahaman M, Hossain QD (2019) Design and overall performance analysis of an open-end slot feed miniature microstrip antenna for on-body biomedical applications. In: International conference on robotics, electrical and signal processing techniques (ICREST)
17. Yi N et al (2010) Characterization of narrowband communication channels on the human body at 2.45 GHz. IET Microw Antenn Propag 4:722–732
18. Khandelwal MK, Kanaujia BK, Kumar S (2017) Defected ground structure: fundamentals, analysis, and applications in modern wireless trends. Int J Antenn Propag, Article ID 2018527, 22 pp
19. Abdel Halim AS (2019) Low-profile wideband linear polarized patch antenna using metasurface: design and characterization. Res Rev J Eng Technol. ISSN: 2319-9873
20. Zhang H, Chen D, Zhao C (2020) A novel printed monopole antenna with folded stepped impedance resonator loading. IEEE Access 8:146831–146837
21. Zhang H, Chen D, Zhao C (2020) A novel printed monopole antenna with stepped impedance hairpin resonator loading. IEEE Access 8:96975–96980
22. Johnson AD, Manohar V, Venkatakrishnan SB, Volakis JL (2020) Low-cost S-band reconfigurable monopole/patch antenna for CubeSats. IEEE Open J Antenn Propag 1:598–603
23. Modak S, Khan T, Laskar RH (2020) Penta-notched UWB monopole antenna using EBG structures and fork-shaped slots. Radio Sci 55:1–11
Chapter 28
A CPW Fed Grounded Annular Ring Embedded Dual-Band Dual-Sense Circular Polarization Antenna for 5G/Wi-MAX and C-Band Satellite Applications

Krishna Chennakesava Rao Madaka and Pachiyannan Muthusamy
1 Introduction

In mobile communication systems, the positions of the transmitter and receiver are not fixed; rather, they change continuously with respect to each other. If conventional linearly polarized (LP) antennas are used, this may result in zero signal reception due to polarization mismatch. Circularly polarized (CP) radiation is more immune to polarization mismatch losses, multipath fading and antenna orientation. A multiband circularly polarized antenna provides an improved communication link with reduced antenna size. Various techniques have been reported to implement multiband dual-sense antennas accounting for polarization diversity. A C-shaped grounded strip [1] and an L-patch with grounded rectangular stubs in a square slot [2] are used to achieve dual-sense characteristics; in [3], parasitic elements are introduced; in [4], a dielectric resonator is loaded with a circular patch; in [5], a cylindrical dielectric resonator with truncated notches and a pair of arc-shaped slots, and in [6], a rectangular DRA with an asymmetrical square ring, are used to obtain dual-sense polarization. The use of asymmetric resonators excited by a substrate integrated waveguide has been studied in [7], and a two-port feeding technique in [8]. A circular slot with a tri-strip embedded corner-truncated rectangular patch [9], a dual-polarized monopole antenna with a parasitic annular ring in a quasi-pentagonal slot [10], and a slanted patch in an asymmetric square slot and ground [11] have also been studied to obtain a dual-sense nature. However, all these reported techniques suffer from the major constraints of large antenna size and complex antenna design.

K. C. R. Madaka (B) · P. Muthusamy
Vignan's Foundation for Science, Technology and Research, Vadlamudi, Andhra Pradesh 522213, India
e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
R. P. Yadav et al. (eds.), Proceedings of the International Conference on Paradigms of Computing, Communication and Data Sciences, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-19-8742-7_28

In this proposed work, a
single-port antenna is studied to obtain dual-band dual-sense (DBDS) characteristics using a grounded annular ring, which is suitable for 5G communication (2.7–3.7 GHz) and satellite communication (6.8–7.6 GHz) applications [12–14].
2 Antenna Design and Analysis The schematic representation and the geometrical dimensions (in mm) of the proposed (30 mm × 30 mm) antenna are illustrated in Fig. 1 and Table 1, respectively. A square slot is engraved in the ground plane and a slitted rectangular patch is placed in it. The radiating patch is excited by a 50 Ω feed of 3.2 mm line width and is located 0.4 mm from the ground plane. A fire-retardant FR4 substrate of 1.6 mm height, dielectric constant 4.4 and loss tangent 0.02 is used to realize this antenna. λg/2 auxiliary stubs are attached to the rectangular patch to obtain two orthogonal field components of the same amplitude in phase quadrature.
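As a rough cross-check on the stub sizing, the guided wavelength can be estimated from the quantities given above. The snippet below is an illustrative sketch only: it assumes the coarse CPW approximation εeff ≈ (εr + 1)/2, which the chapter itself does not state, and evaluates λg/2 at the lower CP band.

```python
# Illustrative estimate of the lambda_g/2 stub length (sketch, not from the chapter).
# Assumption: eps_eff ~ (eps_r + 1) / 2, a coarse approximation for a CPW line.

C0 = 3e8      # speed of light in vacuum (m/s)
EPS_R = 4.4   # FR4 dielectric constant (from the chapter)
F = 3.2e9     # lower-band CP frequency (Hz)

eps_eff = (EPS_R + 1) / 2                 # ~2.7
lambda_g = C0 / (F * eps_eff ** 0.5)      # guided wavelength (m)

print(f"lambda_g   ~ {lambda_g * 1e3:.1f} mm")      # ~57 mm
print(f"lambda_g/2 ~ {lambda_g * 1e3 / 2:.1f} mm")  # ~28.5 mm
```

Under this assumption, λg/2 comes out at roughly 28–29 mm, of the same order as the largest dimensions listed in Table 1, which is consistent with the stubs being about a half guided wavelength long.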
Fig. 1 a Proposed antenna geometry. b Fabricated antenna
Table 1 Geometrical dimensions (in mm)

l1 | l2 | lg | lpp | ls | l | w | H | wf | wp | f | r
26.2 | 16.1 | 4.2 | 8.1 | 8.1 | 30 | 30 | 1.6 | 5.2 | 1.4 | 3.2 | 2
2.1 Prototype Analysis The hierarchical steps in the evolution of the antenna are demonstrated in Fig. 2 with antenna prototypes; their resulting return loss and axial ratio (AR) values are plotted in Figs. 3 and 4, respectively.
Fig. 2 Prototype analysis
Fig. 3 Comparison of S11
Fig. 4 Comparison of AR
The design is initiated by engraving a modified square slot with a narrow slit in the quasi ground plane and placing a rectangular patch in it, as portrayed in prototype I. It resonates with poor return loss characteristics and a poor axial ratio (AR > 30 dB). By controlling the flared ground plane at the feeding port, the input impedance is gradually transformed. In prototype II, two half-wavelength rectangular stubs are connected to the patch at its diagonally opposite ends to obtain orthogonal field components with a 90° phase difference. This improves the impedance matching and also produces dual-band behavior with linear polarization, the AR values ranging between 7.1 and 10.4 dB. The 3 dB axial ratio bandwidth (ARBW) is tuned to improve CP by connecting another half-guided-wavelength stub parallel to the auxiliary stub, as shown in prototype III. Three parallel slits of width 0.3 mm are etched in the patch and a semicircular ring is attached, as shown in prototypes IV and V, respectively, for further improvement of the return loss in the lower band. The dual-sense CP nature is obtained with the intruded annular ring in the square slot, as figured in prototype VI. From Fig. 3, it is evident that the proposed prototype VI antenna resonates with good return loss characteristics in both bands and provides a wide impedance bandwidth extending from 2.6 to 3.9 GHz and from 6.6 to 8.7 GHz. From Fig. 4, it is observed that prototypes I and II are linearly polarized. The CP nature is introduced from prototype III onward. The axial ratios observed in prototypes III, IV and V are poor in the lower band; in prototype VI, the axial ratios in both resonating bands are improved. The proposed prototype VI antenna exhibits good circular polarization with ARBW extending from 2.7 to 3.7 GHz and from 6.8 to 7.6 GHz.
2.2 CP Mechanism and Analysis The distribution of surface current vectors in both circularly polarized bands is sketched in Figs. 5 and 6. The sense of polarization in the azimuthal plane at 3.2 and 7.3 GHz is studied by using the advancing current vectors at 0°, 90°, 180° and 270°. With the +z axis as the propagation direction, the predominant current vectors rotate anticlockwise in the lower band, as depicted in Fig. 5, and clockwise in the higher band, as shown in Fig. 6. This confirms right-handed circular polarization (RHCP) for the lower radiating band and left-handed circular polarization (LHCP) for the higher radiating band.
Fig. 5 Surface current vectors at 3.2 GHz: a 0°, b 90°, c 180°, d 270°
Fig. 6 Surface current vectors at 7.3 GHz: a 0°, b 90°, c 180°, d 270°
3 Results and Discussion The measured return loss of the presented antenna is sketched in Fig. 7. It shows a −10 dB impedance bandwidth (ImBW) extending over the 2.6–3.9 GHz and 6.6–8.7 GHz frequency bands. Good agreement with slight deviation is observed between the measured and simulated return loss; the deviation is due to connector and soldering losses. The two diagonally connected auxiliary stubs of the radiating patch and the intruded annular ring provide the broadband CP characteristics. The 3 dB ARBW extends from 2.7 to 3.7 GHz and from 6.8 to 7.6 GHz, as illustrated in Fig. 8. The gain of the antenna in both resonating bands is sketched in Fig. 9, confirming a flat gain of ≈3 dBi with a variation of ±0.2 dBi in both bands. The dual-sense CP nature of the antenna is investigated using normalized RHCP and LHCP radiation patterns in the xz-plane (E-plane) and yz-plane (H-plane) at 3.2 and 7.3 GHz.
Fig. 7 Measured versus simulated return loss
Fig. 8 Axial ratio
At 3.2 GHz, right-handed polarization is dominant with good polarization purity (>30 dB) in both the xz and yz planes, confirming RHCP in the lower band, as depicted in Fig. 10a and b. At 7.3 GHz, left-handed polarization is dominant with good polarization purity (>30 dB) in both planes, confirming LHCP in the higher frequency band, as plotted in Fig. 10c and d.
Fig. 9 Antenna gain (in dBi)
The proposed circularly polarized dual-sense antenna is compact when compared to the state-of-the-art antennas listed in Table 2.
4 Conclusion A CPW-fed dual-band dual-sense circularly polarized antenna has been investigated in this paper. The antenna exhibits both RHCP (2.7–3.7 GHz) and LHCP (6.8–7.6 GHz) characteristics. The precise transformation of the input impedance is realized by flaring the ground at the feed, which results in dual resonating bands. Circular polarization is generated by using auxiliary stubs of length λg/2. Dual-sense characteristics are achieved by perturbing the square slot with an annular ring embedded in the ground plane. The lower radiating band is right-handed circularly polarized while the higher radiating band is left-handed circularly polarized. The presented antenna has a compact geometry with a simplified single-port structure and can be implemented for 5G communication (2.7–3.7 GHz) and satellite communication (6.8–7.6 GHz) applications.
Fig. 10 Radiation patterns of normalized RHCP and LHCP. a 3.2 GHz (xz-plane), b 3.2 GHz (yz-plane), c 7.3 GHz (xz-plane), d 7.3 GHz (yz-plane)

Table 2 Comparison of DBDS antennas

References | l × w | fc (GHz) | Dual-sense
[2] | 1.3 λg × 1.3 λg | 3.1 | Yes
[3] | 1.3 λg × 1.5 λg | 3.5 | Yes
[5] | 2.8 λg × 2.8 λg | 5.3 | No
[6] | 1.1 λg × 0.96 λg | 2.7 | Yes
[9] | 0.8 λg × 0.8 λg | 3.1 | No
[10] | 1.7 λg × 1.7 λg | 4.9 | No
Proposed | 0.7 λg × 0.7 λg | 3.2 | Yes
References

1. Chen YY, Jiao YC, Zhao G, Zhang F, Liao ZL, Tian Y (2011) Dual-band dual-sense circularly polarized slot antenna with a C-shaped grounded strip. IEEE Antenn Wirel Propag Lett 10:915–918
2. Rui X, Li J, Wei K (2016) Dual-band dual-sense circularly polarized square slot antenna with simple structure. Electron Lett 52(8):578–580
3. Saini RK, Dwari S, Mandal MK (2017) CPW-fed dual-band dual-sense circularly polarized monopole antenna. IEEE Antenn Wirel Propag Lett 16:2497–2500
4. Pan YM, Zheng SY, Li W (2014) Dual-band and dual-sense omnidirectional circularly polarized antenna. IEEE Antenn Wirel Propag Lett 13:706–709
5. Zhou YD, Jiao YC, Weng ZB, Ni T (2015) A novel single-fed wide dual-band circularly polarized dielectric resonator antenna. IEEE Antenn Wirel Propag Lett 15:930–933
6. Sahu NK, Sharma A, Gangwar RK (2018) Design and analysis of wideband composite antenna with dual-sense circular polarization characteristics. Microw Opt Technol Lett 60(8):2048–2054
7. Kumar K, Dwari S, Mandal MK (2018) Dual-band dual-sense circularly polarized substrate integrated waveguide antenna. IEEE Antenn Wirel Propag Lett 17(3):521–524
8. Saini RK, Dwari S (2016) A broadband dual circularly polarized square slot antenna. IEEE Trans Antenn Propag 64(1):290–294
9. Khan MI, Chandra A, Das S (2019) A dual band, dual polarized slot antenna using coplanar waveguide. Adv Comp Commun Contr, Lect Notes Netw Syst 41:95–103
10. Madaka KC, Muthusamy P (2020) Mode investigation of parasitic annular ring loaded dual band coplanar waveguide antenna with polarization diversity characteristics. Int J RF Microwave Comput Aided Eng 30(4). https://doi.org/10.1002/mmce.22119
11. Fu Q, Feng Q, Chen H (2021) Design and optimization of CPW-fed broad band circularly polarized antenna for multiple communication systems. Progr Electromagn Res Lett 99:65–75
12. Tang H, Zong X, Nie Z (2018) Broadband dual-polarized base station antenna for fifth-generation (5G) applications. Sensors 18(8):2701
13. Federal Communications Commission, Regulations 2019/12/16. https://docs.fcc.gov/public/attachments/FCC-19130A1.pdf
14. Intelsat, Polarization 2013. http://www.intelsat.com/wp-content/uploads/2013/02/Polarization.pdf
Chapter 29
An Analytical Appraisal on Recent Trends and Challenges in Secret Sharing Schemes

Neetha Francis and Thomas Monoth
1 Introduction

Security is a challenging matter in the present scenario, where everyone is connected to a public network and data are usually stored on large servers. Anybody can steal the private data of an organization when it is exposed in a public place, yet organizations need to protect their data from disclosure. One way to protect information is conventional encryption. But what happens when the encrypted information is corrupted or when the secret key is lost? Secret sharing addresses this problem and offers solutions that ensure both confidentiality and reliability. Instead of storing valuable data in a single place, it is distributed and stored at several places; when the need arises, it can be reconstructed from the distributed shares.

The loss of a cryptographic key is equivalent to data loss, as the original data cannot be retrieved without the encryption key. The security of secret keys used in cryptographic algorithms is therefore very important. A key kept in a single location is highly undependable, since a single misfortune, such as a computer failure or the sudden death of the person possessing the secret, may lead to great problems. An obvious answer is to store copies of the key at several places. But this makes the situation even worse, as it provides more opportunities for hackers and makes the key vulnerable to different types of attacks. The secret sharing-based solution provides better key management, offering both confidentiality and reliability.

This paper presents a comprehensive study of various Secret Sharing Schemes (SSS).

N. Francis (B)
Department of Information Technology, Kannur University, Kannur, Kerala, India
e-mail: [email protected]
T. Monoth
Department of Computer Science, Mary Matha Arts & Science College, Mananthavady, Wayanad, Kerala, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
R. P. Yadav et al. (eds.), Proceedings of the International Conference on Paradigms of Computing, Communication and Data Sciences, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-19-8742-7_29

Recent research developments in secret sharing are reviewed and
an analysis is conducted on different SSS. The major issues are verifiability, cheating detection and cheater identification. In order to overcome the challenges in existing SSS, new techniques using threshold SSS and Verifiable Secret Sharing (VSS) can be developed using various mathematical models.
2 Secret Sharing Schemes

The concept of secret sharing is to begin with a secret and split it into pieces called shares or shadows, which are given to shareholders in such a way that the collective shares of chosen subsets make the reconstruction of the secret possible. Secret sharing provides a strong key management scheme that is secure and reliable. The key is made secure by distributing it to n shareholders. If t or more shareholders cooperate, they can reconstruct it by combining their separate shares. An authorized set can be defined as any subset of shareholders which comprises t or more participants. This method is called a t-out-of-n threshold scheme and is denoted (t, n), where n is the total number of shareholders and t is the threshold value. Knowledge of fewer than t shares will not reveal any information about the secret.

The size of the share is very important in a SSS. The efficiency of the scheme can be measured by the information rate, which is the ratio of the size of the secret to the size of a share. A SSS whose information rate equals one is considered ideal.

In a secret sharing scheme, the dealer is assumed to be honest. However, a dishonest dealer may send inconsistent shares to the participants. To avoid such malicious behavior, protocols need to be implemented which permit the participants to validate the consistency of the shares. VSS lets the shareholders confirm that their shares are consistent. There are two types of VSS protocols: interactive proofs and non-interactive proofs. In a publicly VSS scheme, besides the participants, everyone can check whether the shares are properly allocated.

Shamir [1] proposed a scheme based on Lagrange's interpolating polynomials. For a (t, n) threshold scheme, the dealer picks a random polynomial of degree t − 1:

$q(x) = a_0 + a_1 x + \cdots + a_{t-1} x^{t-1}$  (1)

where $a_0$ is the secret S, and chooses a prime p such that p ≥ n + 1. The dealer then generates n shares $S_1 = q(1), S_2 = q(2), \ldots, S_n = q(n)$ and securely distributes them to the n participants. Shamir's scheme is depicted in Fig. 1. Let $q(x) = a_0 + a_1 x + \cdots + a_{t-1} x^{t-1}$, where $a_0 = S$. The n shadows are computed by evaluating q(x) at n distinct values $x_1, x_2, \ldots, x_n$, with $x_i \neq 0$ for every i:

$S_i = q(x_i), \quad 1 \le i \le n$  (2)

Each point $(x_i, S_i)$ is a point on the curve defined by the polynomial. The values $x_1, x_2, \ldots, x_n$ need not be private and could simply be the numbers 1, …, n. As
Fig. 1 Shamir’s Secret Sharing Scheme
t points uniquely determine the polynomial q(x) of degree t − 1, the secret S can be constructed from t shares. If P is the set of participants and A ⊆ P is a set of the access structure such that |A| ≥ t, then q(x) can be reconstructed using Lagrange's interpolation formula with the t shares of those participants:

$q(x) = \sum_{j=1}^{t} S_{i_j} \prod_{1 \le k \le t,\, k \ne j} \frac{x - x_{i_k}}{x_{i_j} - x_{i_k}}$  (3)
Since S = q(0), we can rewrite the formula as

$S = q(0) = \sum_{j=1}^{t} S_{i_j} \prod_{1 \le k \le t,\, k \ne j} \frac{x_{i_k}}{x_{i_k} - x_{i_j}}$  (4)
The above method is illustrated with a numerical example. Given S = 9406, n = 5, t = 3, pick a prime p larger than the secret: p = 104729. Generate two random coefficients, 55142 and 238, so the polynomial is q(x) = 9406 + 55142x + 238x². Evaluate q(x) mod p at x = 1, 2, 3, 4, 5:

q(1) = 64786 mod 104729 = 64786
q(2) = 120642 mod 104729 = 15913
q(3) = 176974 mod 104729 = 72245
q(4) = 233782 mod 104729 = 24324
q(5) = 291066 mod 104729 = 81608

Hence the shares are (1, 64786), (2, 15913), (3, 72245), (4, 24324), (5, 81608). Suppose we hold the shares at x = 2, 3 and 5. To reconstruct, apply Lagrange's interpolation and compute q(0).
With the shares (2, 15913), (3, 72245) and (5, 81608), Lagrange interpolation at x = 0 gives

$q(0) = \sum_{i=1}^{3} y_i \prod_{j \ne i} \frac{0 - x_j}{x_i - x_j} \pmod{p}$

q(0) = 15913 · ((−3)(−5)) / ((−1)(−3)) + 72245 · ((−2)(−5)) / ((1)(−2)) + 81608 · ((−2)(−3)) / ((3)(2)) (mod 104729)
     = 15913 · 5 + 72245 · (−5) + 81608 (mod 104729)
     = −200052 (mod 104729) = 9406

Thus the secret S = 9406 is recovered successfully.
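The scheme translates directly into code. Below is a minimal sketch of Shamir's (t, n) scheme over GF(p) that reproduces the worked example; it is an illustration, not code from the paper. `pow(x, -1, p)` (Python 3.8+) computes the modular inverse.

```python
# Minimal sketch of Shamir's (t, n) secret sharing over GF(p).
# Reproduces the worked example above; not an implementation from the paper.

P = 104729  # public prime, larger than the secret


def make_shares(secret, coeffs, n, p=P):
    """Evaluate q(x) = secret + coeffs[0]*x + coeffs[1]*x^2 + ... at x = 1..n."""
    poly = [secret] + list(coeffs)
    return [(x, sum(a * pow(x, k, p) for k, a in enumerate(poly)) % p)
            for x in range(1, n + 1)]


def reconstruct(shares, p=P):
    """Lagrange interpolation at x = 0 using any t shares (x_i, y_i)."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num = den = 1
        for j, (xj, _) in enumerate(shares):
            if j != i:
                num = num * (-xj) % p
                den = den * (xi - xj) % p
        secret = (secret + yi * num * pow(den, -1, p)) % p
    return secret


shares = make_shares(9406, [55142, 238], n=5)
print(shares)                                          # (1, 64786), (2, 15913), ...
print(reconstruct([shares[1], shares[2], shares[4]]))  # 9406, from shares at x = 2, 3, 5
```

Running it prints the five shares above and recovers 9406 from the three shares at x = 2, 3 and 5.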
3 Recent Research Advances in Secret Sharing Schemes

Threshold SSS were proposed independently by Shamir and Blakley [2], and much research has since been carried out in this area. Both schemes implement t-out-of-n schemes: polynomial-based constructions are used by Shamir, whereas vector space constructions are used by Blakley. Schemes based on number theory were also introduced for threshold secret sharing. The Mignotte scheme [4] is based on the Chinese Remainder Theorem (CRT) and modular arithmetic. The shares are created using a special sequence of integers called a Mignotte sequence, and the secret is reconstructed by solving a set of congruence equations using the CRT. This scheme is not perfect. Another, perfect, CRT-based scheme was introduced by Asmuth and Bloom [3]; it also uses a special sequence of pairwise co-prime positive integers. Threshold Secret Sharing (TSS) schemes are thus primarily based on polynomials, vector spaces and number theory. The following section gives a brief description of the various secret sharing methods that appeared in the literature during the period 2006 to 2022.

Bai [5] presented a robust (k, n) threshold SSS with k access levels using matrix projection. The secrets are represented as the entries of a square matrix S. Here, the size of the share is considerably smaller than the size of the secret. Due to its information concealment ability, it has many desirable properties. Even though the technique is not a perfect SSS, the secrets are kept secure. The scheme is capable of sharing multiple secrets and is efficient, secure and reliable.

Kaya et al. [6] studied the possibility of applying threshold cryptography with the Asmuth-Bloom SSS and proposed three different function sharing schemes for ElGamal and RSA. These schemes were based on the Asmuth-Bloom SSS and are considered the first security-enhanced systems of this kind. It would be a noteworthy improvement if there were a method to compute the messages or signatures without the correction phase. Also, extra characteristics like robustness and proactivity can
be integrated into future techniques. The concepts described in that paper helped to obtain function sharing for various public key cryptosystems.

Tang and Yao [7] proposed a novel (t, n) TSS method. It is based on Secure Multiparty Computation (SMC), in which the secret K remains protected even if t − 1 users are fraudulent, assuming the discrete logarithm problem is hard. It also relies on multi-prover zero-knowledge arguments. As in the distribution protocol of Shamir's TSS scheme, the dealer distributes the secret K. An SMC protocol allows any set of t users to rebuild K, and the t participants can prove that they possess the secret K using multi-prover zero-knowledge arguments.

Wang and Wong [8] studied the communication efficiency of secret reconstruction in SSS. They proved that there is a relation between the communication cost and the number of users included in the process of secret reconstruction. They relaxed the requirement of a confidential point-to-point communication channel assumed in traditional methods and showed that partial broadcast channels are enough to perform secret reconstruction. An interesting research challenge is to discover more efficient structures with optimal or suboptimal communication complexity.

Lou and Tartary [9] analyzed the properties of threshold changeable schemes. A novel CRT-based secret sharing scheme is introduced which permits several threshold changes after the initial set-up phase without requiring any communication with the dealer. One advantage of their construction is that the secret is always guaranteed to be reconstructed after any threshold change, unlike other schemes where recovery is only probabilistic. Handling users who deviate from the threshold update procedure is one of the challenging issues faced by the system.

Bai and Zou [10] introduced a novel and safe proactive secret sharing (PSS) technique constructed on matrix projection. This technique permits sharing of more than one secret, whereas Shamir's method allows sharing of one secret at a time. The method concentrates on forming a distributed PSS scheme to withstand passive attacks, which are hard to fix. A matrix is generated using Pythagorean triples, which ensures security against passive attacks.

Lin and Harn [11] proposed two variations of Shamir's SSS. In the first method, each participant preserves both the x and y coordinates of a polynomial point as their confidential share. Any t private shares together with some public shares enable secret reconstruction. These revised techniques are proved to be ideal and perfect. The suggested method utilizes polynomials to generate shares for the participants and applies Lagrange's interpolation to rebuild the secret. A multi-level TSS method is constructed for secret reconstruction and proved to be safe and secure.

Wang et al. [12] proposed a multiple SSS based on the matrix projection method. This method has the benefit that there is no restriction on the number of secrets that can be shared, and it is not required to fill dummy elements into the secret matrix. It attains a share size that does not vary from that of a single secret. The proactive feature of the matrix projection technique increases the scheme's overall security: it can periodically update shares without changing the secrets. The method is not completely verifiable, owing to the features of the projection matrix. As it is only
required to modify the public remainder matrix to distribute another set of secrets, the scheme is said to be dynamic with respect to secret change.

Shi and Zhong [13] investigated the issue of changing the threshold of Shamir's scheme without the assistance of the dealer. The difficulty with the existing methods is that they require the dealer to pre-compute public information for each threshold or to announce the public function in advance. A new method, with the threshold value increasing in the semi-honest model, is presented in their paper. In the proposed technique, all users work together to take the role of the dealer and to carry out the share renewal process. Each user stores only one share, which has the same size as the secret; hence the method is perfect, secure and ideal. The open challenge is how to realize the protocol in the malicious model.

Lin and Harn [14] proposed a new technique to model a multi-SSS with unconditional security. Here, the dealer creates a polynomial of degree t − 1 to allow n participants to share t master secrets. Every participant stores a single private share and applies this share to retrieve the t secrets successively. This multi-threshold scheme, which builds on Shamir's scheme, turns out to be more efficient.

Sun et al. [15] utilized and optimized a SSS rather than Lagrange's method directly. Two-way authentication is offered to guarantee that only the approved sets of participants can retrieve the correct session key. Every participant has to keep only one secret share for all sessions. The secret shares stored by participants can be reused instead of changing with the various group keys, as they exist before the generation of the group key.

Farras and Padro [16] presented a natural definition for the family of hierarchical access structures and a general characterization of the ideal hierarchical access structures. It is shown that every hierarchical matroid port admits an ideal linear SSS over a finite field. An open challenge is the optimization of the size of the shares with respect to the size of the secret in SSS.

Zhang et al. [17] analyzed the security of four recently presented VSS. The study revealed that each of these techniques is vulnerable to cheating by dishonest dealers. The dealer has to announce some repeated information for the purpose of consistency testing. The dealer randomly provides secrets for sharing, and one reusable share can be stored by every shareholder. Every shareholder can detect cheating by other shareholders with the help of a non-interactive protocol. The dealer and the shareholders are connected by an open channel, and the dealer is not aware of the other users' shadows.

Singh et al. [18] presented a multi-level SSS based on the CRT. In this method, participants are categorized into various security subsets and each participant holds a part of a multi-secret. Multiple secrets are distributed among the users, one per subset in successive order. Upper-level shares can be used by a lower-level subset to recover the secret. Verification is offered to identify cheating in the proposed technique. An Asmuth-Bloom sequence is utilized to allocate multiple secrets, with a controlled range to share a single secret. The shares are reusable in this method, and it is unconditionally secure and efficient.

Muthukumar and Nandhini [19] discussed two algorithms for securely sharing medical data in the cloud. The SSS uses polynomials to divide the data, whereas the
information dispersal algorithm uses matrices. This decreases the transmission cost and space complexity and increases the security of the system. It is used for dynamic databases where the participants are not involved and is capable of sharing highly confidential data in a multi-cloud environment.

Deepika and Sreekumar [20] presented two modifications of a SSS using Gray code and XOR. The shares are created with Gray code and the secret is retrieved by XOR-ing the shares. Using this method, two different schemes, a 7-out-of-7 scheme and a 3-out-of-7 scheme, are constructed. Two groups of shares, a Qualified set and a Forbidden set, are also created: the Qualified set includes 3 of the 7 shares and the Forbidden set includes 4 of the 7 shares. The presented technique can be combined with algorithms in cryptography and secret sharing.

Basit et al. [21] proposed a hierarchical, multi-stage SSS based on a one-way function and polynomials. This technique has the same security level as Shamir's method; it differs in the level of hardness of the one-way function. Shareholders are divided into various levels based on a hierarchical access structure, and every level has a separate threshold value. Only one share of a multi-secret is held by each shareholder, which decreases the participants' difficulty in holding more than one share. It is not essential to refresh the shares for future communication. Shares of higher-level shareholders can be used for reconstructing the secrets if the number of available shareholders is smaller than the threshold.

Babu et al. [22] considered a multi-stage SSS based on the CRT. This method requires only n − t + k + 2 public values. To reconstruct the secret, only one Lagrange interpolation polynomial is needed. All shareholders can jointly verify whether the share submitted by a shareholder is correct or not. By computing n additional public values and publishing them on the bulletin board, cheater identification can be performed by the participants. The size of the secret is increased k times, even though each shareholder stores only one share for all the secrets.

Liu et al. [23] discussed cheating issues in bivariate polynomial-based SSS and proposed two algorithms for cheater identification. The first can detect cheaters among the m participants who are included in secret reconstruction. The second achieves cheater identification with greater capability: it is performed with the help of the remaining n − m participants who are not engaged in the secret reconstruction. The proposed algorithms are thus competent in terms of cheater identification capability.

Jia et al. [24] proposed a threshold changeable SSS in which the threshold can be modified in the interval [t, t'] without renewing the shares. In this method, a new threshold can be initiated at any time using the public broadcast channel. The scheme makes use of a constructed sequence of nested closed intervals based on large co-prime numbers.

Harn et al. [25] presented an extended SSS, called secret sharing with secure secret reconstruction, in which the secret is safeguarded in the retrieval stage from both inside and outside attacks. Outsiders would have to capture all the broadcast shares to reconstruct the secret; since a larger number of shares is required in reconstruction, this enhances the security. The fundamental scheme is extended so that the reconstructed secret is only available to participants.
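Several of the schemes reviewed above ([18], [22], [24]) build on CRT-based sharing. As a concrete illustration, here is a minimal Mignotte-style sketch with toy parameters of our own choosing (not taken from any of the cited papers): shares are residues of the secret modulo pairwise co-prime integers, and any t of them determine the secret via the CRT.

```python
# Minimal sketch of Mignotte-style CRT secret sharing (toy parameters).
# A (t, n) Mignotte sequence of pairwise co-prime, increasing moduli must
# satisfy  m_{n-t+2} * ... * m_n  <  S  <  m_1 * ... * m_t.

from math import prod

MODULI = [11, 13, 17, 19, 23]   # pairwise co-prime, n = 5
T = 3                           # threshold
# Valid secret range for these moduli: 19*23 = 437 < S < 11*13*17 = 2431
SECRET = 1000

shares = [(m, SECRET % m) for m in MODULI]


def crt_reconstruct(subset):
    """Chinese Remainder Theorem over any t shares (m_i, s_i)."""
    M = prod(m for m, _ in subset)
    x = 0
    for m, s in subset:
        Mi = M // m
        x += s * Mi * pow(Mi, -1, m)   # modular inverse of Mi mod m
    return x % M


print(shares)                        # [(11, 10), (13, 12), (17, 14), (19, 12), (23, 11)]
print(crt_reconstruct(shares[:T]))   # 1000
```

Any two shares leave the secret ranging over many residues modulo the product of the two moduli, which is why the scheme is only statistically, not perfectly, secure.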
Kandar et al. [26] presented a VSS with cheater identification. The method enables combiner verification: the shareholders verify whether a request for share submission comes from an authenticated combiner. The scheme is shown to withstand different types of attacks. Each user is allotted a shadow share, which eliminates the risk involved in reconstruction by combining the shares of the minimum number of participants. The authenticity of the combiner is also checked by each participant before submitting their shares, which eliminates the risk of an opponent acting as a combiner.

Meng et al. [27] proposed a threshold changeable SSS using a bivariate symmetric polynomial which is both cheating-immune and prevents illegal participation attacks. In the basic threshold changeable scheme, during secret reconstruction the threshold is permitted to increase from t to the exact number of participants present; if valid shares are produced by all participants, the secret can be recovered. Moreover, a revised TCSS scheme is proposed in order to reduce the number of coefficients in the shares for each participant.

Huang et al. [28] used the error correction capability of QR codes to propose an (n, n) TSSS. A secret QR code can be divided and encrypted into n cover QR codes. The created QR codes still contain cover messages so that unauthorized people cannot detect the presence of the secret messages while they travel over the public channel. The secret QR code can be easily recreated using the XOR operation if all n authorized participants provide their shares. The method is shown to be both feasible and robust.

Yuan et al. [29] proposed a hierarchical multi-SSS based on linear homogeneous recurrence (LHR) relations and a one-way function. This method decreases the computational complexity of hierarchical SSS from exponential time to polynomial time. It can simultaneously share multiple secrets, and every participant holds only a single share during execution. The method is both perfect and ideal, and it avoids the verification of non-singularity of the matrices required in the earlier method.

Ding et al. [30] analyzed the security of existing secure secret reconstruction schemes based on bivariate polynomials. A theoretical model for the construction of secure secret reconstruction schemes in the dealer-free and non-interactive scenario is proposed. The share sizes are identical to other existing insecure (t, n) SSR schemes.
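Several of the schemes above ([23], [25], [27], [30]) rely on bivariate polynomials. The sketch below, with toy coefficients not drawn from the cited papers, shows the key property they exploit: for a symmetric polynomial F(x, y) = F(y, x), participant i holding the univariate share fi(y) = F(xi, y) and participant j holding fj(y) can each compute the common value F(xi, xj), which can serve as a pairwise key during reconstruction.

```python
# Minimal sketch of shares derived from a symmetric bivariate polynomial
# over GF(p). Toy coefficients; not parameters from the cited papers.

P = 104729
# Symmetric coefficient matrix A (A[j][k] == A[k][j]) defines
# F(x, y) = sum_{j,k} A[j][k] * x^j * y^k, so F(x, y) = F(y, x).
A = [[17, 42, 7],
     [42, 5, 99],
     [7, 99, 23]]


def F(x, y, p=P):
    return sum(A[j][k] * pow(x, j, p) * pow(y, k, p)
               for j in range(3) for k in range(3)) % p


def share(xi):
    """Participant i's share: the univariate polynomial y -> F(xi, y)."""
    return lambda y: F(xi, y)


f1, f2 = share(1), share(2)
# Pairwise common value: participant 1 evaluates f1 at x2, participant 2
# evaluates f2 at x1, and symmetry guarantees that they agree.
assert f1(2) == f2(1) == F(1, 2)
print(f1(2))
```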
4 Comparative Analysis of Various Secret Sharing Schemes

An analysis has been made of the different methods used in secret sharing, along with their advantages and challenges. A summary of the various modifications of SSS explained in the previous sections is shown in Table 1. From Fig. 2, it can be seen that 55% of the research works are based on polynomial methods, 26% use CRT-based methods, 16% use matrix-based methods and 3% use vector-based methods. Threshold secret sharing schemes are mainly based on polynomials, vector spaces, matrices and number theory. Other threshold schemes are hierarchical threshold secret sharing, weighted threshold secret
Table 1 Comparison of various SSS

Sl. No | Authors and Year | SS schemes used | Techniques used | Advantages/Challenges
1 | Shamir [1] | Threshold | Polynomial based | Information-theoretic security; minimal, extensible, dynamic, flexible; not verifiable
2 | Blakley [2] | Threshold | Vector space based | Secret is an element of a vector space; shares are n distinct (t−1)-dimensional hyperplanes; not perfect
3 | Asmuth and Bloom [3] | Threshold | CRT based | Uses a special sequence of pairwise co-prime integers; perfect
4 | Mignotte [4] | Threshold | CRT based | Uses a Mignotte sequence of integers; not perfect
5 | Bai [5] | Threshold | Matrix projection based | Information concealment capability; smaller share size; not perfect
6 | Kaya et al. [6] | Threshold | CRT based | Function sharing schemes for RSA, ElGamal and Paillier cryptosystems; robustness and proactivity have to be integrated
7 | Tang and Yao [7] | Threshold | Polynomial based | Secure multiparty computation; proof by zero-knowledge arguments
8 | Wang and Wong [8] | Threshold | Polynomial based | Partial broadcast channels for secret reconstruction; easy to implement; smaller share size; requires secure multiparty cryptographic protocols
9 | Lou and Tartary [9] | Threshold | CRT based | Allows multiple threshold changes; perfect security; dealer-free environment
10 | Bai and Zou [10] | Proactive | Matrix projection based | Shares multiple secrets; counters passive adversary attacks; no measures to withstand active attacks
11 | Lin and Harn [11] | Threshold | Polynomial based | Ideal, perfect, multi-level threshold SS
12 | Wang et al. [12] | Multiple threshold | Matrix projection based | Constant share size; partially verifiable; dynamic
13 | Shi and Zhong [13] | Threshold | Polynomial based | Participants renew shares; secure, perfect, ideal
14 | Lin and Harn [14] | Multi-threshold | Polynomial based | Shareholder keeps one private share; unconditional security
15 | Sun et al. [15] | Threshold | Polynomial based | Provides mutual authentication; reduced storage cost; improved computation efficiency
16 | Farras and Padro [16] | Hierarchical threshold | Polymatroid based | Total characterization of ideal hierarchical access structures; optimization of share length
17 | Zhang et al. [17] | Verifiable | Polynomial based | No secret channels; reusable shadows; detects cheating; lacks a consistency test of information
18 | Singh et al. [18] | Multi-level multi-stage threshold | CRT based | Detects cheating; reusable shares; unconditionally secure, efficient
19 | Muthukumar and Nandhini [19] | Threshold | Polynomial based, matrix based | Reduced transmission overhead and space complexity; shares highly sensitive data in a multi-cloud environment
20 | Deepika and Sreekumar [20] | Threshold | Gray code and XOR operation | Used as a cryptographic algorithm for SS and visual SS; no information loss; can be used for visual cryptography
21 | Basit et al. [21] | Hierarchical multi-stage multi-secret threshold | Polynomial based | Unconditionally secure; shares can be reused; participants' risk of keeping multiple shares is minimized; no verification
22 | Babu et al. [22] | Multi-stage threshold | Polynomial based | Fewer public values; cheater identification; size is increased by k times
23 | Liu et al. [23] | Threshold | Polynomial based | Identifies cheaters by the m participants engaged in secret reconstruction and by the remaining n−m participants who are not engaged in reconstruction
24 | Jia et al. [24] | Threshold | CRT based | Threshold can be changed in an integer interval; smaller share size and low complexity for recovery
25 | Harn et al. [25] | Threshold | Polynomial based | Secure from insider and outsider attacks; uses a symmetric bivariate polynomial to generate the shares; enhanced security
26 | Kandar et al. [26] | Verifiable | Polynomial based | Cheater identification feature; combiner verification by the shareholders
27 | Meng et al. [27] | Threshold | Polynomial based | Based on univariate and bivariate symmetric polynomials; prevents illegal participant attacks; reduces coefficients of shares; dealer-free, non-interactive and cheating-immune
28 | Huang et al. [28] | Threshold | Polynomial based | Utilizes the error correction capacity of QR codes; feasible; high robustness; higher security
29 | Yuan et al. [29] | Hierarchical multi-secret | Polynomial based | Reduces computational complexity; shares multiple secrets; each participant holds only one share; both perfect and ideal
30 | Ding et al. [30] | Threshold | Polynomial based | Based on asymmetric bivariate polynomials; easy to construct; same share size; dealer-free and non-interactive
Fig. 2 Graphical representation of techniques used in SSS (polynomial 55%, CRT 26%, matrix 16%, vector 3%)
sharing and compartmented secret sharing. Based on this review, a polynomial-based method can be considered the best approach.
5 Conclusion A comprehensive study of SSS for information security has been presented in this paper, comparing and analyzing the recent research advances in secret sharing made by different researchers. The literature review shows that several SS methods have been investigated to overcome the challenges in the fundamental methods proposed by Shamir and Blakley. Most of the studies are based on modifications to TSS schemes, as they are easy to implement; hence future studies can focus on multi-level schemes using TSS. As there is a need to improve the efficiency and security of existing SS methods and to obtain a cheating-immune system, an approach combining TSS with VSS schemes is suggested on the basis of this review.
References

1. Shamir A (1979) How to share a secret. Commun ACM 22(11):612–613
2. Blakley GR (1979) Safeguarding cryptographic keys. In: Managing requirements knowledge, pp 313–313. IEEE Computer Society, New York
3. Asmuth C, Bloom J (1983) A modular approach to key safeguarding. IEEE Trans Inf Theory 29(2):208–210
4. Mignotte M (1982) How to share a secret. In: Workshop on cryptography, pp 371–375. Springer, Berlin, Heidelberg
5. Bai L (2006) A strong ramp secret sharing scheme using matrix projection. In: 2006 International symposium on a world of wireless, mobile and multimedia networks, pp 5–656. IEEE
6. Kaya K, Selçuk AA (2007) Threshold cryptography based on Asmuth-Bloom secret sharing. Inf Sci 177(19):4148–4160
7. Tang C, Yao ZA (2008) A new (t, n)-threshold secret sharing scheme. In: International conference on advanced computer theory and engineering, pp 920–924. IEEE
8. Wang H, Wong DS (2008) On secret reconstruction in secret sharing schemes. IEEE Trans Inf Theory 54(1):473–480
9. Lou T, Tartary C (2008) Analysis and design of multiple threshold changeable secret sharing schemes. In: International conference on cryptology and network security, pp 196–213. Springer, Berlin, Heidelberg
10. Bai L, Zou X (2009) A proactive secret sharing scheme in matrix projection method. Int J Secure Network 4(4):201–209
11. Lin C, Harn L, Ye D (2009) Ideal perfect multilevel threshold secret sharing scheme. In: Fifth international conference on information assurance and security, vol 2, pp 118–121. IEEE
12. Wang K, Zou X, Sui Y (2009) A multiple secret sharing scheme based on matrix projection. In: 33rd annual IEEE international computer software and applications conference, vol 1, pp 400–405. IEEE
13. Shi R, Zhong H (2009) A secret sharing scheme with the changeable threshold value. In: International symposium on information engineering and electronic commerce, pp 233–236. IEEE
14. Lin C, Harn L (2012) Unconditionally secure multi-secret sharing scheme. In: IEEE international conference on computer science and automation engineering, vol 1, pp 169–172. IEEE
15. Sun Y, Wen Q, Sun H, Li W, Jin Z, Zhang H (2012) An authenticated group key transfer protocol based on secret sharing. Procedia Engineering 29:403–408
16. Farras O, Padró C (2012) Ideal hierarchical secret sharing schemes. IEEE Trans Inf Theory 58(5):3273–3286
17. Liu Y, Zhang F, Zhang J (2016) Attacks to some verifiable multi-secret sharing schemes and two improved schemes. Inf Sci 329:524–539
18. Singh N, Tentu AN, Basit A, Venkaiah VC (2016) Sequential secret sharing scheme based on Chinese remainder theorem. In: IEEE international conference on computational intelligence and computing research, pp 1–6. IEEE
19. Muthukumar KA, Nandhini M (2016) Modified secret sharing algorithm for secured medical data sharing in cloud environment. In: Second international conference on science technology engineering and management, pp 67–71. IEEE
20. Deepika MP, Sreekumar A (2017) Secret sharing scheme using gray code and XOR operation. In: Second international conference on electrical, computer and communication technologies, pp 1–5. IEEE
21. Basit A, Kumar NC, Venkaiah VC, Moiz SA, Tentu AN, Naik W (2017) Multi-stage multi-secret sharing scheme for hierarchical access structure. In: International conference on computing, communication and automation, pp 557–563. IEEE
22. Babu YP, Kumar TP, Swamy MS, Rao MV (2017) An improved threshold multi-stage secret sharing scheme with cheater identification. In: International conference on big data analytics and computational intelligence, pp 392–397. IEEE
23. Liu Y, Yang C, Wang Y, Zhu L, Ji W (2018) Cheating identifiable secret sharing scheme using symmetric bivariate polynomial. Inf Sci 453:21–29
24. Jia X, Wang D, Nie D, Luo X, Sun JZ (2019) A new threshold changeable secret sharing scheme based on the Chinese Remainder Theorem. Inf Sci 473:13–30
25. Harn L, Xia Z, Hsu C, Liu Y (2020) Secret sharing with secure secret reconstruction. Inf Sci 519:1–8
26. Kandar S, Dhara BC (2020) A verifiable secret sharing scheme with combiner verification and cheater identification. Journal of Information Security and Applications 51:102430
27. Meng K, Miao F, Huang W, Xiong Y (2020) Threshold changeable secret sharing with secure secret reconstruction. Inf Process Lett 157:105928
28. Huang PC, Chang CC, Li YH, Liu Y (2021) Enhanced (n, n)-threshold QR code secret sharing scheme based on error correction mechanism. Journal of Information Security and Applications 58:102719
29. Yuan J, Yang J, Wang C, Jia X, Fu FW, Xu G (2022) A new efficient hierarchical multi-secret sharing scheme based on linear homogeneous recurrence relations. Inf Sci 592:36–49
30. Ding J, Ke P, Lin C, Wang H (2022) Bivariate polynomial-based secret sharing schemes with secure secret reconstruction. Inf Sci 593:398–414
Chapter 30
A Comparative Study on Sign Language Translation Using Artificial Intelligence Techniques

Damini Ponnappa and Bhat Geetalaxmi Jairam
1 Introduction

Gestures are naturally used to convey meaning among humans, and evolution in the IT area has had a vital impact on the manner in which individuals interact with each other; research has always been centered around the exchange of information [1]. Communication facilitates interaction between humans to reciprocate emotions and intentions. The community of deaf-mutes faces plenty of challenges in communicating with hearing people [2]. Typically, this problem is solved by having a dedicated person serve as an interpreter to facilitate communication. Nevertheless, a substitute solution must be offered, as a translator, unlike a computer program, may not be available at any given time [1].

Sign Language (SL) is a language in which communication is nonverbal [3]. In SL, gestures and signs are grouped to form a single language. SL uses the fingers, hands, arms, eyes, head, face and more to communicate, and each gesture has its own meaning; understanding the gestures is the key to understanding the meaning of the words. When a person using gestural SL solely for communication tries to communicate with someone who does not understand it, communication breaks down. It is important to note that every nation possesses its own sign language; in India it is referred to as Indian Sign Language (ISL) [4].

Gestures are the general means of communication for people who face hardship in speaking and hearing. Despite not being able to communicate easily with most people, they can interact well with each other [1]. Various analysts are engaged in developing designs that are transforming how humans and computers

D. Ponnappa (B) · B. G. Jairam
Department of Information Science and Engineering, The National Institute of Engineering, Mysore, India
e-mail: [email protected]
B. G. Jairam
e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
R. P. Yadav et al. (eds.), Proceedings of the International Conference on Paradigms of Computing, Communication and Data Sciences, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-19-8742-7_30
Fig. 1 Major subsets of Artificial Intelligence
interact with each other in light of advances in science and engineering. Computer programs are developed in such a fashion that they can help in the translation of sign language to a textual format which includes frames that are static and dynamic [3]. With recent advancements in artificial intelligence, several algorithms are developed which are categorized under deep learning and machine learning, this will improve the quality and accuracy of predicting sign language.
1.1 Artificial Intelligence (AI) The replication of human intelligence by machines that can simulate human actions and even think like humans is labeled artificial intelligence. The term may also apply to any machine that exhibits human characteristics, such as problem-solving and learning. Owing to its rapid development, artificial intelligence is proving highly useful in the prediction of sign language. The most popular subsets of AI are machine learning and deep learning, as shown in Fig. 1.
1.2 Machine Learning (ML) The subsection of AI that enables machines to learn and improve rather than being explicitly programmed is called machine learning. Data acquisition and self-learning are the core functions of machine learning. To begin the machine learning process, real-time or previously collected data is used. In order to draw conclusions from the examples presented, it seeks to find patterns in the data. By removing the need for human intervention and letting computers modify their own behavior, ML enables computers to train by themselves.
1.3 Deep Learning (DL) Deep learning is a subdivision of AI and ML that imitates the way humans gain insight. DL is highly useful in areas that involve collecting, inspecting and depicting enormous amounts of data, as it accelerates and reduces the complexity of the technique. At its most fundamental level, DL can be considered a way to automate predictive analytics. DL algorithms are arranged in a chain of rising complexity and abstraction, unlike classic ML algorithms, which are comparatively straightforward.
2 Literature Review

• Someshwar et al. [5] used TensorFlow, a library utilized for the design and development of the model. To serve the purpose of image recognition, a DL algorithm, the Convolutional Neural Network (CNN), was employed. The CNN translates the images into matrix form, which is recognized by the developed model and made ready for classification. OpenCV behaves as the eyes of the system, capturing and processing real-time hand gestures.
• Yusnita et al. [1] made use of computer vision, which is concerned with obtaining images with the help of image processing and extracting the key details of the image. A classification procedure compares and classifies the current gesture performed by the user against the trained model. The basis of the procedure is ML, and an Artificial Neural Network (ANN) is the categorization method employed in the conducted research.
• Guo et al. [6] introduced a basic Convolutional Neural Network for image categorization. An examination was conducted considering alternative learning-rate settings and different optimization algorithms. This provided the optimal parameters for picture classification using the Convolutional Neural Network. It was also noticed how the composition of different Convolutional Neural Networks affects picture classification results.
• Harini et al. [3] generated a model for the recognition of sign language that transforms signs into text in both static and dynamic frames. The gestures are pre-processed and photographed with a webcam. Background subtraction is employed to remove the background in the pre-processing stage, allowing the model to adjust to any change in the background. A key challenge with software-based designs is that the image must be rightly collected and filtered. Figure 2 depicts a summary of the layers used in the proposed CNN model.

The Convolution Layer This layer performs the task of identifying specific characteristics of an image. Feature extraction is performed by filtering the input with a kernel: the kernel is slid over the image, and at each position the resulting value is the dot product of the kernel matrix and the overlapped image patch.
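To make the convolution operation concrete, the following is a minimal NumPy sketch, illustrative rather than taken from any of the surveyed papers, that slides a kernel over the image and takes the dot product with each overlapped patch.

```python
# Minimal 2D "valid" convolution sketch (strictly, cross-correlation,
# which is what CNN frameworks actually compute). Illustrative only.
import numpy as np

def conv2d(image, kernel):
    kh, kw = kernel.shape
    oh = image.shape[0] - kh + 1
    ow = image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            # Dot product of the kernel with the overlapped image patch
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

image = np.arange(25, dtype=float).reshape(5, 5)
edge_kernel = np.array([[1.0, 0.0, -1.0]] * 3)  # simple vertical-edge filter
print(conv2d(image, edge_kernel))               # 3x3 feature map
```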
Fig. 2 Overview of the CNN model
The Max-pooling Layer It may be necessary to reduce a convolved image without sacrificing its features if it is too large. In max-pooling, only the highest value within each region of the feature map is kept.

The Flattening Layer The multi-dimensional matrices are converted to 1-D arrays in the flattening layer so that they can be fed into a classifier.

• Shinde and Kagalkar [7] utilized the Canny Edge Detection (CED) technique, since it has higher accuracy and consumes less time. The method effectively removes noise from the input and detects a clear image for the next level of processing. The error rate of the algorithm is relatively low, with localized edge points and single responses. The system was created with Java NetBeans.
• Badhe and Kulkarni [4] identified movements from the collected input photos and transformed them into a graded format, i.e., the recognized hand gestures are expressed as English words. The video input is separated into single frames, and every frame is sent to the pre-processing function. Each frame passes through numerous filters to remove unnecessary regions and improve speed.
• Mapari and Kharat [8] developed a system based on a type of supervised machine learning algorithm called the Support Vector Machine (SVM). The data was obtained from students who already knew how to make sign gestures or had undergone training to do so. A still camera with 1.3 megapixels was used to record the data. Because only a few motions were taken into account, the precision of the trained model was determined to be 93.75%.
• Wu and Nagahashi [9] used a novel training methodology based on an AdaBoost classifier trained on the images' Haar-like features. The AdaBoost operates on Haar-like characteristics, including frame differences, to study the skin color and instantly detect whether the hand is left or right. The classifier
Fig. 3 Example of Haar-like features
has an enhanced tracking algorithm that uses the patch of the hand from the preceding frame to generate a fresh patch for the present frame. The algorithm correctly anticipates gestures 99.9% of the time. As shown in Fig. 3, a collection of rectangular masks is used to calculate Haar-like features: the sum of pixel intensities inside the white rectangle is subtracted from the sum of pixel intensities inside the black rectangle to determine the value of each feature. AdaBoost is an ML technique that is applied in a stage-wise manner. The approach chooses weak classifiers based on the Haar-like characteristics and then combines them to improve performance; all the weak classifiers together form a single strong classifier. To reduce the training error, the strong classifier adaptively modifies the weights of the samples. This kind of weight adjustment is too slow to operate in real time, so the weak classifiers are arranged in a cascade, with each subsequent classifier being trained exclusively on the examples that have passed through the previous classifiers.

• Nath and Anu [10] applied the Perceptron, a neural network technique, to develop their system. SciPy, a Python library, was used to create the design. Characteristics including accuracy, F1 score and recall are used as the system's performance measures. The developed model uses a strategy called pruning to help cut the network size and increase performance. The hidden layers were gradually increased from 10 to 120 throughout the training phase.
• Sajanraj and Beena [2] made use of an ARM Cortex-A8 processor to implement sign recognition in the system. The OpenCV Python library was used to capture and process images in real time. Haar training characteristics were employed to handle both positive and negative pictures.
• Rao et al. [11] designed a model trained on a dataset that included 300 distinct ISL number pictures. In 22 epochs, the system's accuracy reached 99.56%. Various activation functions and learning rates were used to test the model. The Keras API backed by TensorFlow was utilized to build the backend. The algorithm correctly predicted the static symbols when tested with a hundred photos for every sign.
• Bantupalli and Xie [12] employed four convolution layers of varying window size in their system, together with the ReLU activation function. The developed model was put to the test with three different pooling algorithms, with
stochastic pooling proving to be the most effective. For feature extraction, two stochastic pooling layers were used.
• Suresh et al. [13] employed a CNN as the initial design for gesture recognition, to extract spatial characteristics. A Recurrent Neural Network (RNN) model was utilized to extract temporal information from the video stream. Both the CNN and RNN models were tested separately using identical samples for training and testing. This guarantees that neither the CNN nor the RNN makes use of test data to improve prediction during the training phase. The ADAM optimizer, which minimizes loss, was utilized to train both models.
• Jiang and Chen [14] proposed a system to predict SL motions produced by users; the designed model was constructed using a 2-layered CNN. Two separate models were utilized for classifying and comparing prediction accuracy. The SGD and Adam optimizers, both using the categorical cross-entropy cost function, were applied separately to optimize the output. Even with blurry images and under varying lighting conditions, the model was found to accurately anticipate gestures. The developed system identified a total of six distinct SLs, with SGD achieving a precision of 99.12% and Adam a precision of 99.51%. The accuracy is higher when the Adam optimizer is used. A minimal sketch of such a network appears below.
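As a rough illustration of the models surveyed above, the following sketch builds a small 2-layer CNN with max-pooling and flattening, compiled with the Adam optimizer and categorical cross-entropy. It is a minimal sketch rather than the architecture of any cited paper; the input shape (64 x 64 grayscale) and the number of output classes (26) are assumptions made for illustration.

# Minimal 2-layer CNN sketch for sign-language image classification (assumed shapes)
from tensorflow.keras import layers, models

def build_sign_cnn(input_shape=(64, 64, 1), num_classes=26):
    model = models.Sequential([
        layers.Conv2D(32, (3, 3), activation="relu", input_shape=input_shape),
        layers.MaxPooling2D((2, 2)),   # keep only the highest value in each region
        layers.Conv2D(64, (3, 3), activation="relu"),
        layers.MaxPooling2D((2, 2)),
        layers.Flatten(),              # multi-dimensional feature maps -> 1D array
        layers.Dense(128, activation="relu"),
        layers.Dense(num_classes, activation="softmax"),
    ])
    # Adam optimizer with the categorical cross-entropy cost function,
    # as used by several of the surveyed systems
    model.compile(optimizer="adam",
                  loss="categorical_crossentropy",
                  metrics=["accuracy"])
    return model

Swapping optimizer="adam" for tensorflow.keras.optimizers.SGD() reproduces the SGD variant compared in [14].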
3 Comparison of Artificial Intelligence Techniques
Table 1 gives the accuracy of each AI technique employed for sign language translation, along with suggestions to improve it. The performance of the AI algorithms based on accuracy is depicted in Fig. 4.
4 Conclusion
This paper presents an analysis of prior work on the recognition and translation of sign language employing several artificial intelligence techniques. It is understood that the dataset input and the selection of features are equally critical in acquiring better prediction results. This comparative analysis observed that the Convolutional Neural Network and the AdaBoost classifier are the prominent techniques that individually yield higher accuracy in detecting and predicting sign language.
Table 1 Comparison table of existing artificial intelligence techniques

SL No | AI Technique used | Dataset input | Method used | Remarks
1 | Convolutional Neural Network [5] | Real-time hand gestures | CNN is used for image recognition and for making the classifier ready | CNN has an individual accuracy of 99.91%, which is commendable
2 | Artificial Neural Network [1] | — | ANN is the image classification method used here | ANN has an individual accuracy of 90%, which can be improved by increasing the count of hidden layers
3 | Convolutional Neural Network [6] | The MNIST dataset | CNN is the image classification method used here | CNN has an individual accuracy of 99.91%, which is commendable
4 | Convolutional Neural Network [3] | Real-time hand gestures | CNN is used for the analysis and classification of images | CNN has an individual accuracy of 99.91%, which is commendable
5 | The Canny Edge Detection technique [7] | Real-time hand gestures | CED removes noise from the input and helps in detecting a clear image for processing | CED has an accuracy of 91.56%, which can be improved by employing more parameters
6 | Gesture Recognition Algorithm [4] | Indian Sign Language | GRA conducts data acquisition and pre-processing of signs to follow hand movements | GRA has an accuracy of 97.5%, which can be improved by employing more parameters
7 | Support Vector Machine [8] | Real-time data from students who knew sign language | SVM is the image classification method used here | SVM has an accuracy of 93.75%, which can be improved by using an ensemble of SVMs
8 | An AdaBoost classifier based on Haar-like features [9] | Videos containing sign language (SIBI—an Indonesian SL) | An AdaBoost classifier is used for training the model centered on the images' Haar-like features | AdaBoost has an individual accuracy of 99.9%, which is commendable
9 | Perceptron, a neural network mechanism [10] | Real-time hand gestures | Perceptron uses the pruning technique to reduce the size and improve the system's efficiency | Perceptron has a precision of 88%, which can be improved by increasing the count of hidden layers
10 | Convolutional Neural Network, Haar classifier [2] | Indian Sign Language | The combination of CNN and Haar classifier was used for classification and image recognition | The accuracy of the model is higher at 99.56% under normal light conditions compared with 97.26% under low-light conditions; this can be improved by integrating with other neural networks
11 | Convolutional Neural Network with Keras API [11] | Indian Sign Language | CNN was used for image recognition and classification, with the Keras API at the backend | The CNN with Keras model has an accuracy of 99.56%, which is acceptable but can be enhanced by making use of ensemble learning
12 | Convolutional Neural Network with ReLU activation function, stochastic pooling [12] | A Kaggle dataset containing twenty-six American Sign Language letters | CNN was used for image recognition and classification along with the ReLU activation function; stochastic pooling was applied to extract the features | The model has an accuracy of 99.3%, which can be enhanced by employing ensemble learning
13 | Convolutional Neural Network, Recurrent Neural Network, Adam optimizer [13] | Video streams | CNN was used for gesture recognition and RNN for feature extraction; the Adam optimizer was utilized to train the two models | The model has a higher accuracy of 99.91% when CNN is used; the performance of the RNN can be improved with ensemble learning
14 | Convolutional Neural Network, SGD optimizer and Adam optimizer [14] | 6 different sign languages | A 2-layer CNN was used for the prediction of sign language, where the SGD and Adam optimizers were used for the optimization of output | The model has a higher accuracy of 99.51% when the Adam optimizer is used and 99.12% with the SGD optimizer; SGD generalizes better and Adam converges faster
Fig. 4 Performance of the Techniques used based on Accuracy
5 Future Work
As per the analysis, it is clear that some techniques, such as the Support Vector Machine and the Artificial Neural Network, do not provide the expected accuracy because the number of hidden layers employed is limited. This concern can perhaps be overcome in the future by increasing the number of hidden layers and by using alternative combinations or hybrid AI algorithms that apply ensemble learning over the existing techniques.
References
1. Yusnita L, Rosalina R, Roestam R, Wahyu R (2017) Implementation of real-time static hand gesture recognition using artificial neural network. CommIT J 11(2):85
2. Sajanraj TD, Beena MV (2018) Indian sign language numeral recognition using region of interest convolutional neural network. In: 2nd International conference on inventive communication and computational technologies, Coimbatore, India
3. Harini R, Janani R, Keerthana S, Madhubala S, Venkatasubramanian S (2020) Sign language translation. In: 6th International conference on advanced computing and communication systems, Coimbatore, India
4. Badhe PC, Kulkarni V (2015) Indian sign language translator using gesture recognition algorithm. In: IEEE international conference on computer graphics, vision and information security, Bhubaneswar, India
5. Someshwar D, Bhanushali D, Chaudhari V, Swathi N (2020) Implementation of virtual assistant with sign language using deep learning and TensorFlow. In: Second international conference on inventive research in computing applications, Coimbatore, India
6. Guo T, Dong J, Li H, Gao Y (2017) Simple convolutional neural network on image classification. In: IEEE 2nd international conference on big data analytics, pp 1–2
7. Shinde A, Kagalkar R (2015) Sign language to text and vice versa recognition using computer vision in Marathi. In: National conference on advances in computing, Kochi, India
8. Mapari R, Kharat G (2012) Hand gesture recognition using neural network. Int J Comput Sci Network 1(6):48–50
9. Wu S, Nagahashi H (2013) Real-time 2D hands detection and tracking for sign language recognition. In: 8th International conference on system of systems engineering, Maui, HI, USA
10. Nath GG, Anu VS (2017) Embedded sign language interpreter system for deaf and dumb people. In: International conference on innovations in information embedded and communication systems, Coimbatore, India
11. Rao GA, Syamala K, Kishore PVV, Sastry ASCS (2018) Deep convolutional neural networks for sign language recognition. In: Conference on signal processing and communication engineering systems, Vijayawada, India
12. Bantupalli K, Xie Y (2018) American sign language recognition using deep learning and computer vision. In: IEEE international conference on big data, Seattle, WA, USA
13. Suresh S, Mithun H, Supriya MH (2019) Sign language recognition system using deep neural network. In: 5th International conference on advanced computing & communication systems, Coimbatore, India
14. Jiang S, Chen Y (2017) Hand gesture recognition by using 3DCNN and LSTM with Adam optimizer. In: Advances in multimedia information processing, Harbin, China
Chapter 31
WSN-IoT Integration with Artificial Intelligence: Research Opportunities and Challenges
Khyati Shrivastav and Ramesh B. Battula
1 Introduction
Sensor networks are formed not only from text data but also from audio, video, images and data from small-scale industrial sectors. Multimedia sensor networks require both data clustering and power efficiency. Salah ud din et al. [1] presented a robust clustering strategy that improved packet data transmission under different traffic conditions; cluster head formation, network lifetime and packet delivery ratio are the usual concerns of such methods, and the packet delivery ratio is used for finding the number of successful packet transfers. To obtain quality of service (QoS) for IoT, actual smart sensors need to be used for city network management with no human intervention; IoT-enabled devices are used in smart cities and in intelligent, balanced networks, as recommended by Keshari et al. [2]. IoT networks are unique networks used for a range of localized data, error localization and other variances, and IoT does not apply to every category of environment. Localization in forests, oceans and buildings may be free range or involve some interconnecting ranges. The IoT objective is to set up networks with better performance using minimal system resources. IoT devices are deployed in industrial regions, schools, colleges and campuses, both outdoors and indoors; buildings, traffic, oceans, deserts, etc., are also networked in WSN-IoT. The global positioning system (GPS) is prevalent for finding positions, but in indoor or other building environments localization is less feasible, as proposed by Barshandeh et al. [3]. IoT is set up for millions or billions of devices in industry, smart cities and homes, where human interference is least possible.
K. Shrivastav (B) Gautam Buddha University, Greater Noida, GB Nagar, India e-mail: [email protected]
R. B. Battula Malaviya National Institute of Technology, MNIT, Jaipur, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 R. P. Yadav et al. (eds.), Proceedings of the International Conference on Paradigms of Computing, Communication and Data Sciences, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-19-8742-7_31
This IoT has to be
integrated with machine learning (ML), deep learning (DL) and other industrial analytics for the growth of fifth-generation (5G) and blockchain systems. Such a system has seven layers in its architecture, comprising physical devices, device-to-device (D2D) communication, edge computing, data storage, network management, application and other collaborative things. From an innovative standpoint, technologies have to be derived for both IoT communities and attackers; Sharma et al. [4] proposed that more research is thus required in the fields of IoT, blockchain, data sciences, cloud and fog computing, etc. There are many data compression techniques for wireless sensor networks. Data collection happens on a large scale for applications related to healthcare monitoring, industrial and agricultural management and other industrial areas which come under WSN. Communication takes place through short-range devices like Bluetooth, ZigBee, etc. Parameters of complexity include information processing, static or dynamic memory, data exchange, redundant data acquisition, security, real-time data management, robustness and consciousness for the realization of QoS, as stated by Ketshabetswe et al. [5]. Energy can be minimized using modulation techniques for the IoT; WSNs have a short lifespan for a given power supply. An M-ary quadrature amplitude modulation (M-QAM) scheme is used as the modulation technique, and Abu-Baker et al. [6] proposed that cluster density and size are important in energy savings, forming lower and upper bounds for the calculation of energy consumption and throughput. For 5G IoT networks, security and privacy are the most important requirements: protecting IoT networks from unfair Internet means is a prime need in today's world, and the latest technologies are required for hybrid combinations of machine learning, AI and other devices, by Kumar et al. [7]. Communication and computing latency are among the factors affecting wireless sensor network coverage and server stability; there are trade-offs among latency, network coverage and stability, by Chen et al. [8]. Cloud-based AI engines are an application for host server regions that use sensor nodes for data exchange policies in a group or cluster. Various constructed or obstructed IoT networks have AI-based components for making decisions. IoT sensor nodes (SNs) and low-power wide area network (WAN) technologies work together in long-range communication; a well-planned IoT network follows the literature-based range-predicting learning techniques of Lami and Abdulkhudhur [9]. Large amounts of data are considered in the concentration of wireless sensor networks. Query-driven and query-finding WSNs work in collaboration with normal low-power sensor nodes to solve data collection and aggregation problems related to agriculture, traffic monitoring, and earthquake and flood detection. Various protocols, issues and applications exist for time-driven and hybrid architectures in prioritizing data transmission and achieving process accuracy, by Sahar et al. [10]. The Internet has emerged for sharing, with technologies that connect communication, embedded actuator and sensor nodes, readers, target IDs, etc., establishing communication through protocols and applications for smart cities, structural health of buildings, waste management, and air quality and noise monitoring up to a certain level of traffic congestion.
Challenges of the IoT paradigm comprise the components
of privacy, compatibility, scalability, mobility management and cost effectiveness. Kassab and Darabkh [11] proposed that disconnection is a problem while establishing connections among heterogeneous devices. IoT services now provide seamless connectivity between the upper and physical worlds. IoT-related terms such as the web of things (WoT), the cloud of things, as well as machine-to-machine (M2M) communication are suitable in systems where providing sensing capability is a major factor. Connectivity, storage, computational and other capabilities have led to the development of fog, cloud, networking, server-based and nano-technological solutions which can work with IoT, by Colakovic and Hadzialic [12].
2 Related Works
WSN clustering and routing methods have been investigated in previous research works. Nowadays, with the development of algorithms/protocols for IoT, it has become easier to integrate WSN-IoT for specific AI applications. These works are highlighted in the following subsections.
2.1 Algorithms Related to WSN as Well as IoT
Intelligent computation and adaptation in WSNs relate to deployment, changing topology, and computation, storage and communication capability. Applications and protocols must be energy efficient, scalable and robust; the environment or context can change, and intelligent behavior needs to be demonstrated, by Serpen et al. [13]. Wireless sensor networks face big data complexities for data residing either inside or outside the networks; data collected from networks are distributed and centralized with other collected data outputs, by Fouad et al. [14]. The development of IoT-based energy-efficient algorithms maximizes lifetime based on an analytical hierarchical process and a genetic clustering protocol (LiM-AHP-G-CP) for IoT-based areas divided into different IoT areas, dimensions, field sizes for sensor nodes and residual-energy selections, presented by Darabkh et al. [15]. Virtual and physical devices, entities and components which are heterogeneous in nature are the IoT things, which can number in the millions and billions. Internet of things devices communicate intelligently with each other and vary in their structure, abilities and other issues. Business and social network structure enabling technologies provide new open challenges for IoT, by Li et al. [16]. The internet of things is built on the challenges faced by wireless sensor networks and radio frequency identification (RFID) tags. There are layers of presentation in IoT, viz. the application layer, perception layer, presentation layer and transportation layer. Time and memory are some of the physical constraints, while other limitations include energy processing, etc. There is a huge amount of data generated by IoT which needs to be set up in WSN-IoT networks or
heterogeneous systems, with the attendant danger of security attacks and threats, by Jing et al. [17]. Vehicular networks, smart cities and grids also come under IoT, with research areas spanning memory networks, cellular networks, social networking, etc. Wireless sensor and actuator networks (WSANs), e-health issues, cloud computing and software-defined networking are among the latest research areas. For cellular networks and machine-to-machine communication, cluster heads and cluster neighbors are used for the selection of suitable 5G networks. Medium access control (MAC) and routing protocols also work with IoT in contexts related to the networking or transport layers, and geographical information and geo-social distances come under location-based IoT services, by Rachedi et al. [18]. There is an implementation of a multi-level time-sensitive networking protocol on a platform of real-time communication systems using distributions in a variety of ways. Quality-of-service profiles account for packet loss in global positioning system (GPS) signal acquisition and connection; for sensors and adapted systems, latency, throughput, etc., need to be calculated, by Agarwal et al. [19]. IoT affects millions of devices and sensors, like smart phones and wearables, which are deployed and used for various purposes. They are used on various platforms for smart farming, grids, manufacturing, etc. They involve large volumes, heterogeneous data and useful IoT data analysis; cloud data and real-time data can be essential parts of IoT. Georgakopoulos and Jayaraman [20] presented security and privacy as important for developing both large- and small-scale devices. WSN appears across different areas of day-to-day life, like embedded systems, sensors and actuators, etc., for future use. A scalable cloud for the end user meets demands for huge amounts of data to be usefully utilized in IoT operations, in industry, government and non-government organizations, by Gubbi et al. [21]. WSN is later to be involved in social IoTs (SIoTs) with the necessary resources, protecting them for privacy matters; connecting people, objects, vehicles, etc., is used in IoT applications. The source location protected protocol based on dynamic routing (SLPDR) addresses source-location privacy policies; transmission delays, secure networks and lifetime with different energy consumption levels are some of its characteristics, presented by Han et al. [22]. Low-cost IoT devices play an important role in research, development and sensor deployment for integrated software and hardware components in this environment. Standard protocols, ease of development, and simulation or emulation are used with Contiki and routing protocols; storage, memory capabilities, efficiency and connectivity are important terms associated with these protocols, by Zikria et al. [23]. From an application point of view, IoT-based museum processes serve the preservation of security with AI and IoT for maintaining infrastructure, wireless systems, etc. For preserving culture using smart city systems, sensor networks and the web are important for test, validity and other investigative cases in a proper fashion, by Konev et al. [24].
5G-enabled IoT is used as the backbone or strength of data in blockchain industrial automation with smart city, home, agriculture and healthcare applications. Billions of things/data connected together for various 5G applications involve different devices and protocols with a centralized architecture. Industry-related applications for high connectivity among different networks are specifically useful for healthcare systems and other dynamic processes, by Mistry et al. [25]. Hybrid techniques are used for IoT applications with sensors, mobile devices, cluster nodes, real-time data and other methods. Low cost and minimal energy utilization and management make hybrid chip approaches useful, on which WSN and IoT are integrated as a system-on-chip (SoC) with sensor arrays, power supply, distribution units and wireless communication interfaces; in this integration process, various nodes act in different ways. Nowadays, AI and WSN are the most important IoT concepts for the cyber-security approaches of smart city monitoring and e-governance systems, which can be pictured through applications in various sectors, by Sundhari and Jaikumar [26]. There are virtual and real devices for smart and intelligent management of things; networking, processing, security and privacy are important terms related to IoT things and devices. Saving time, decision-making methods, health care, home automation, smart cities, vehicles and parking are some tasks and systems governed by IoT. Multimedia wireless ad hoc networks (WANETs) can also be part of such IoT options, by Goyal et al. [27]. Fog computing is associated with real-time IoT applications and latency-related areas. Linking IoT smart systems, cloud centers, and businesses associated with potential technologies and environments has led to thinking about the weaknesses and strengths of IoT technologies with AI/ML; interoperability is also important for these sections of society, by Zahmatkesh and Turjman [28]. IoT comprises huge numbers of heterogeneous devices whose data collaborate with other sensor networks to achieve QoS and QoE. Clustering in the form of cluster members and cluster heads is modeled according to the selection of cluster heads for secure communication, including the different numbers of cluster heads used in sensor networks. Quality of experience is important for 5G purposes in terms of coverage, latency, longer lifetime and reliability, to provide fast, secure systems, by Kalkan [29]. IoT is a new communication area or technology used for people, things and physical or biological entities. Agricultural sectors, live field views and financial areas are affected by IoT environments. In an agricultural sector, land area, crop, field size, water and humidity are all parameters for efficient monitoring of agricultural land and validation of performance in real time. Computer, Internet and IoT communication technologies are giving the best outcomes in the government sector, using RFID, Internet protocols (IP), etc. IoT deals with data updating to the level of balanced energy consumption among sensors, by Pachayappan et al. [30]. Health care, smart cities and fitness are among the many different IoT applications. Body sensor networks have unique advantages, leading to motivations for different features, decisions and data extraction at various levels of applying logic-wise sensors at different times.
Fusion leads to different sensor
data collected from body parts is associated with the cloud level for scalable workflow connection and security, by Gravina et al. [31]. Sensors fulfill their tasks efficiently or inefficiently depending on how they are used in applications, considering smart things, computation, energy consumption and transmission range; these parameters are decided at various stages for small or large sets of devices. Deploying sensors for data collection and communication among end-side or middle-level users is associated with topology-related contexts. Mobility, bio-inspired protocols, latencies and challenges are features providing motivation and guidance to researchers, by Hamidouche et al. [32]. Conventional or non-conventional approaches such as dynamic security are used in machine learning applications to overcome attacks, and powerful technologies need to be developed for secure approaches. Spoofing, denial of service and jamming are WSN problems that can change in the future if algorithmic models are developed for them; ML-based algorithms can provide comfort, an easy-going life, smoothness and security against attacks or unwanted effects, filling gaps in such technologies, by Tahsien et al. [33]. Spectrum is shared in cognitive radio networks with IoT for solutions in smart things associated with packets and their scheduling. The worldwide network has all sensor devices connected for handling latency and short or long response times; fairness of queuing delay, dropped packets and complexity-associated issues lead nodes, servers and IoT devices to deal with complex issues of spectrum and index, by Tarek et al. [34]. Systems based on IoT and other service technologies are an important criterion for web services and heterogeneous information fusion, with RFID, sensors, GPS and other laser-scanning-based approaches. Intelligent transport, environmental protection, government work and industry monitoring with positioning and tracking of things can have smart home and residential capabilities; applications of research and new modern stages have homes, intelligent districts and component techniques available for future-based applications, proposed by Li and Yu [35].
Major research issues for WSN-IoT toward AI. In collaboration with WSN-IoT, there is a need for analysis of the algorithmic models and approaches for IoT. These are formulated according to parameters which are prominent features or research areas for intelligent systems:
• IoT connectivity with AI
• Heterogeneity at different levels among devices
• Scalability of the networks where IoT and AI exist together
• Size, shape and dynamic nature of the network
• Choice of appropriate AI algorithms/protocols for WSN/IoT applications
• Clustering based on AI techniques
• AI-oriented WSN/IoT layers.
Summary of Existing WSN, IoT Algorithms/Protocols. The following Table 1 summarizes some recent algorithms/protocols for clustering/routing in WSN-IoT systems.
31 WSN-IoT Integration with Artificial Intelligence: Research …
375
Table 1 Algorithms/Protocols associated with WSN-IoTs and their advantages

Algorithm/Protocol | Full form of algorithm/protocol | Advantages/Methods
PE-WMoT [1] | Power efficient multimedia of things | Cluster head and sub-cluster head formation; fuzzy technique for network period enhancement
GWOAP algorithm [2] | Gray wolf optimization affinity propagation | Fitness function used for minimizing the cost of communication in SDN-IoT networks; deployed with multiple microcontrollers in smart cities for balancing traffic load
Data compression algorithm [5] | Modified ALDC algorithm | Data bits reduced in size; compression is done for energy efficiency
Distance-based adaptive step function [6] | Adaptive modulation with clustering | Minimum energy consumption between cluster members and cluster heads
DXN [9] | Dynamic AI-based analysis and optimization of IoT networks | Nodes are freely available, and the lifetime of sensor nodes is longer with balanced energy consumption
LiM-AHP-G-CP [15] | Lifetime maximizing based on analytical hierarchical process and genetic clustering protocol | IoT cluster head selection and their hop decision
TSN [19] | Multi-level time-sensitive networking (TSN) protocol | Network traffic and data sending on the basis of set priority
SLPDR [22] | Source location protection protocol based on dynamic routing | Boundary nodes used for the packet forwarding process, with sending of dummy packets
HCSM and DSOT [26] | IoT-assisted hierarchical computation strategic making and dynamic-assisted stochastic optimization technique | Enhancing lifetime and sustainability for smart buildings and cities in IoT contexts
SUTSEC [29] | SDN utilized trust-based secure clustering | Energy-efficient mechanism; reliable communication with key distributions that give less preference to user contexts
3 WSN-IoTs and AI Collaboration
Earlier developments in WSN-IoT have been a motivating factor for their interconnection with AI, so there are common regions of research interest in WSN-IoT, AI-IoT and WSN-AI. More prominently, a joint WSN-IoT-AI intelligent system can be developed that utilizes all three sectors in an interdependent manner. This is
Fig. 1 Representation of common areas/regions of WSN, IoT and AI
Fig. 2 Top to bottom layered approach for WSN-IoT with AI
shown below in Fig. 1. AI techniques, with minor or major changes, can be designed to work with WSN-IoT according to the desired applications. A layered architecture in the form of a top-to-bottom approach is depicted in Fig. 2. Different layers for the working of WSN/IoT have been in existence since the development of their protocols and algorithms. Presently, there is a need for a top-to-bottom layered architecture with some unique layers on which WSN/IoT can work with AI to form smart and intelligent systems like smart cities, industries or cyber-physical systems. If WSN-IoT clustering is to be connected with AI algorithms, the flowchart in Fig. 3 below provides a brief and concise idea for the development of an intelligent system; a minimal code sketch follows it.
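To make the flowchart concrete, the sketch below is one minimal, assumed realization of AI-driven clustering for WSN-IoT: node coordinates are grouped with k-means, and the node with the highest residual energy in each cluster is elected cluster head. The field size, node count and energy values are illustrative assumptions, not a protocol from the surveyed literature.

# Illustrative k-means clustering of sensor nodes with energy-based cluster-head election
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
positions = rng.uniform(0, 100, size=(50, 2))   # 50 nodes in an assumed 100 x 100 m field
energy = rng.uniform(0.5, 2.0, size=50)         # assumed residual energy per node (J)

k = 5                                           # assumed number of clusters
labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(positions)

# In each cluster, elect the node with the highest residual energy as cluster head
cluster_heads = {c: int(np.argmax(np.where(labels == c, energy, -np.inf)))
                 for c in range(k)}
print(cluster_heads)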
Fig. 3 Flowchart for AI algorithm in clustering of WSN-IoT intelligent systems
4 Conclusion and Future Scope
WSN has primitively been an essential network in the field of wireless communicating devices that are small in size and low in cost. As the years passed, a new term arrived: the multimedia of things. Later, with the development of devices that can be connected to the Internet, WSN-IoT came into existence. Further, clustering/routing approaches have been developed to work together with AI algorithms/protocols. AI algorithms can be applied to WSN-IoT networks for the collection, aggregation, clustering and sending of data to solve issues related to the transmission of data packets, balanced power consumption and energy efficiency. In the future, WSN-IoT and AI can work together by considering various communication parameters like mobility, security, heterogeneity and reliability to achieve QoS and QoE. AI-based intelligent systems need to be designed and developed for different real-world applications. Analysis of such systems and comparison with previously developed models will prove their efficiency and robustness. Mathematical modeling of data collected by heterogeneous sensors, together with a clustering protocol using an AI approach, could be a motivating factor attracting the attention of researchers to this field.
References
1. Salah ud din M, Rehman MAU, Ullah R, Park CW, Kim DH, Kim B (2021) Improving resource-constrained IoT device lifetimes by mitigating redundant transmissions across heterogeneous wireless multimedia of things. Digital Commun Netw Elsevier J 1–17
2. Keshari SK, Kansal V, Kumar S (2021) A cluster based intelligent method to manage load of controllers in SDN-IoT networks for smart cities. Scalable Comput: Pract Experience 22(2):247–257
3. Barshandeh S, Masdari M, Dhiman G, Hosseini V, Singh KK (2021) A range-free localization algorithm for IoT networks. Int J Intell Syst 1–44
4. Sharma P, Jain S, Gupta S, Chamola V (2021) Role of machine learning and deep learning in securing 5G-driven industrial IoT applications. Ad Hoc Netw 123(3):1–38
5. Ketshabetswe KL, Zungeru AM, Mitengi B, Lebekwe CK, Prabaharan SRS (2021) Data compression algorithms for wireless sensor networks: a review and comparison. IEEE Access 9:136872–136891
6. Abu-Baker A, Alshamali A, Shawaheen Y (2021) Energy-efficient cluster-based wireless sensor networks using adaptive modulation: performance analysis. IEEE Access 9:141766–141777
7. Kumar GEP, Lydia M, Levron Y (2021) Security challenges in 5G and IoT networks: a review. In: Velliangiri S, Gunasekaran M, Karthikeyan P (eds) Secure communication for 5G and IoT networks. EAI/Springer innovations in communication and computing. Springer, Cham
8. Chen Y, Liu J, Siano P (2021) SGedge: stochastic geometry-based model for multi-access edge computing in wireless sensor networks. IEEE Access 9:111238–111248
9. Lami I, Abdulkhudhur A (2021) DXN: dynamic AI-based analysis and optimization of IoT networks connectivity and sensor nodes performance. Signals 2:570–585
10. Sahar G, Bakar KA, Rahim S, Khani NAKK, Bibi T (2021) Recent advancement of data-driven models in wireless sensor networks: a survey. Technologies 9(76):1–26
11. Kassab W, Darabkh KA (2020) A–Z survey of internet of things: architectures, protocols, applications, recent advances, future directions and recommendations. J Netw Comput Appl 163:1–49
12. Colakovic A, Hadzialic M (2018) Internet of things (IoT): a review of enabling technologies, challenges and open research issues. Comput Netw 144:17–39
13. Serpen G, Li J, Liu L (2013) AI-WSN: adaptive and intelligent wireless sensor network. Procedia Comput Sci 20:406–413
14. Fouad MM, Oweis NE, Gaber T, Ahmed M, Snasel V (2015) Data mining and fusion techniques for WSNs as a source of the big data. Procedia Comput Sci 65:778–786
15. Darabkh KA, Kassab WK, Khalifeh AF (2020) Maximizing the lifetime of wireless sensor networks over IoT environment. In: Fifth international conference on fog and mobile edge computing (FMEC). IEEE, Paris, France, pp 1–5
16. Li S, Xu LD, Zhao S (2015) The internet of things: a survey. Inf Syst Front 17:243–259
17. Jing Q, Vasilakos AV, Wan J, Lu J, Qiu D (2014) Security of the internet of things: perspectives and challenges. Wireless Netw 20:2481–2501
18. Rachedi A, Rehmani MH, Cherkaoui S, Rodrigues JJPC (2016) The plethora of research in internet of things (IoT). IEEE Access Editorial 4:9575–9579
19. Agarwal T, Niknejad P, Barzegaran MR, Vanfretti L (2019) Multi-level time-sensitive networking (TSN) using the data distribution services (DDS) for synchronized three-phase measurement data transfer. IEEE Access 7:131407–131417
20. Georgakopoulos D, Jayaraman PP (2016) Internet of things: from internet scale sensing to smart devices. Computing 1–18
21. Gubbi J, Buyya R, Marusic S, Palaniswami M (2013) Internet of things (IoT): a vision, architectural elements, and future directions. Future Gener Comput Syst 29(7):1645–1660
22. Han G, Zhou L, Wang H, Zhang W, Chan S (2018) A source location protection protocol based on dynamic routing in WSNs for the social internet of things. Futur Gener Comput Syst 82:689–697
23. Zikria YB, Afzal MK, Ishmanov F, Kim SW, Yu H (2018) A survey on routing protocols supported by the Contiki internet of things operating system. Futur Gener Comput Syst 82:200–219
24. Konev A, Khaydarova R, Lapaev M, Feng L, Hu L, Chen M, Bondarenko I (2019) CHPC: a complex semantic-based secured approach to heritage preservation and secure IoT-based museum processes. Comput Commun 148:240–249
25. Mistry I, Tanwar S, Tyagi S, Kumar N (2020) Blockchain for 5G-enabled IoT for industrial automation: a systematic review, solutions and challenges. Mech Syst Signal Process 135:1–21
26. Sundhari RPM, Jaikumar K (2020) IoT assisted hierarchical computation strategic making (HCSM) and dynamic stochastic optimization technique (DSOT) for energy optimization in wireless sensor networks for smart city monitoring. Comput Commun 150:226–234
27. Goyal P, Sahoo AK, Sharma TK (2021) Internet of things: architecture and enabling technologies. Mater Today: Proc 34(3):719–735
28. Zahmatkesh H, Turjman FA (2020) Fog computing for sustainable smart cities in the IoT era: caching techniques and enabling technologies—an overview. Sustain Cities Soc 59:1–15
29. Kalkan K (2020) SUTSEC: SDN utilized trust based secure clustering in IoT. Comput Netw 178:1–11
30. Pachayappan M, Ganeshkumar C, Sugundan N (2020) Technological implication and its impact in agricultural sector: an IoT based collaboration framework. Procedia Comput Sci 171:1166–1173
31. Gravina R, Alinia P, Ghasemzadeh H, Fortino G (2017) Multi-sensor fusion in body sensor networks: state-of-the-art and research challenges. Inf Fusion 35:68–80
32. Hamidouche R, Aliouat Z, Gueroui AM, Ari AAA, Louail L (2018) Classical and bio-inspired mobility in sensor networks for IoT applications. J Netw Comput Appl 121:70–88
33. Tahsien SM, Karimipour H, Spachos P (2020) Machine learning based solutions for security of internet of things (IoT): a survey. J Netw Comput Appl 161:1–18
34. Tarek D, Benslimane A, Darwish M, Kotb AM (2020) A new strategy for packets scheduling in cognitive radio internet of things. Comput Netw 178:1–11
35. Li B, Yu J (2011) Research and application on the smart home based on component technologies and internet of things. Procedia Eng 15:2087–2092
Chapter 32
Time Window Based Recommender System for Movies
Madhurima Banerjee, Joydeep Das, and Subhashis Majumder
1 Introduction
Shopkeeper: "Good morning, how can I help you?"
Customer: "Can you show me some blue shirts?"
Shopkeeper: "Yes, of course, there are several designs in blue shirts you'll get in my shop… here they are… I think this will definitely look good on you."
Customer: "Yes, you are right."
Shopkeeper: "Since it is a formal shirt, I guess you would need a trouser and a tie. Let me show you some choices in trousers and ties."
Customer: "Sure."
Above is a very common scenario: a shopkeeper analyses the requirement of a customer and then recommends items to the customer. Reasons for the recommendation:
1. The customer cannot possibly know all the options available for a certain item in that shop.
2. To make the options readily available to the customer so that he does not get frustrated fishing for items.
3. To make him feel important.
4. To show the customer more items associated with the item he wants to purchase.
5. To increase the sale of items in the shop.
We find from the above that recommendation systems have always existed in our lives and were always in practice. E-commerce highly needed recommendation systems to add a personal touch to the selling process that would otherwise be missing on an online platform.
M. Banerjee (B) · J. Das The Heritage Academy, Kolkata, WB, India e-mail: [email protected]
J. Das e-mail: [email protected]
S. Majumder Dept. of Computer Sc. & Engg, Heritage Institute of Technology, Kolkata, WB, India e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 R. P. Yadav et al. (eds.), Proceedings of the International Conference on Paradigms of Computing, Communication and Data Sciences, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-19-8742-7_32
More importantly, with the availability of data over the internet, the concept of recommender systems has successfully emerged as a support system serving customers. Recommender systems recommend items and products to users depending on the requirements and likings of the user [2, 9, 13, 14]. The system analyzes the need of a customer by leveraging the existing data from the internet and generates a list of items that the customer may possibly be interested in. Content-based filtering, collaborative filtering and hybrid filtering are the three well-known methods of generating recommendations, but with the growing availability of data and the need for better recommendations, these methods are combined with other dimensions like context, demography and time. Demographic parameters like age, gender and locality might affect the preference of people towards various items. Huynh et al. [8] rightly observed that "Similar users may prefer different items in a different context." Thus, time might also prove to be an important factor in influencing the preferences of users. In our work, we have considered the dimension of time within our recommendation algorithm. Let us consider that User A used the site and rated movies on the site in year X, while User B used the same site 10 years later. The question now is: if user-user collaborative filtering is used, should User A be considered a suitable neighbor to recommend movies to User B? The internet is getting overloaded with information about users. Continuing with the example of movie recommending sites, there might be information in the system ranging over several years. As the database keeps growing, so does the un-clustered dataset for recommendation prediction, and as a result, the time for recommendation also increases. The aim of this paper is to find out how the temporal context can be used to cluster the dataset so that the recommendation quality can be made more accurate and the recommendation process can be scaled. The rest of the paper is organized as follows. In Sect. 2, we provide background information and past work related to context-aware recommender systems. Section 3 outlines our contribution, while Sects. 4 and 5 present our clustering and recommendation schemes respectively. In Sect. 6, we describe our experimental settings, while in Sect. 7, we report and analyze our results. We conclude in Sect. 8, discussing our future research directions.
2 Related Work
In recent years, there has been an increasing trend of incorporating contextual information in recommendation algorithms [10, 12]. Das et al. proposed a scalable collaborative filtering algorithm by clustering the users of the system on the basis of contextual attributes [4]. The authors utilized age, gender and time as the contextual attributes and projected the ratings of the users as vectors in the contextual space. Qi et al. have mentioned that in Quality of Service (QoS) recommender systems, the time context makes the data dynamic [15]. Without time, the QoS data would become static, which would not give accurate results in the recommender system. In their work, they extended the traditional LSH-based service recommendation system to incorporate a
time factor. They proposed a new time-aware and privacy-preserving service recommendation approach to improve recommendation accuracy. In another work, Ahmadian et al. pointed out that most recommendation methods focus on prediction accuracy, but the sanctity of the data being used should also be considered. They noted that the likes and dislikes of a user are likely to change over time; therefore, it is important to consider the time factor in the recommendation model as well [1]. De Zwart showed that user-user correlations change when time is taken into consideration [5]. The author incorporates a temporal recency factor, which means that more recent ratings should contribute more to the prediction. In his paper on temporal dynamics, Koren [11] stated that one challenge of recommendation systems is data that drifts with time. In almost every aspect of recommendation we encounter drifting data: data and preferences change with time. Apart from time, some other circumstantial aspects are present as well. Wasid and Ali have also worked on a movie recommendation system. According to them, traditional recommendation systems do not work efficiently for clusters that are based on multiple criteria [17]. They proposed a method where the neighbors of a user are found based on their Mahalanobis distance within the user cluster. Recently, a time-aware music recommender system has been proposed [16]. The authors state that for music recommendation, time is a very important factor, since users' preferences in music change over time. They observe that music as a product is different in that one user can listen to the same music several times; moreover, unlike most other products, a piece of music or a song is not bought singularly. In their work, they considered the "time of day" as a context for recommending music.
3 Our Contribution
In this manuscript, we try to study and establish that time is an important context when we consider user-user collaborative filtering. In our work, we propose a clustering approach where one of the clusters is identified as the cluster of contemporary users. The concept of contemporary users as considered in this paper is as follows. Let TS_{u,m} denote the timestamp at which a user u rated a movie m. We convert this timestamp into years in order to find the active year of the user u. Since u might have rated more than one movie, we find all the active years of u. The average of these active years is then calculated and used as the pivotal year (TS_u) of u. A user x is considered to be a contemporary user of the target user u if the timestamp TS_x lies between TS_u − n and TS_u + n, where the quantity 2n + 1 is a chosen number of years for deciding whether two users are contemporary. The year range {TS_u − n to TS_u + n} is considered as u's contemporary years. In other words, if the timestamp TS_x of a user x falls within the range of contemporary years of u, then x is considered a contemporary user in relation to user u. All other users in the database are divided into sets U_1, U_2, ..., U_n, where the users in these sets fall into different timestamps beyond the contemporary years. A small sketch of this membership test follows.
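As a minimal illustration (the function name and the plain-integer year representation are assumptions), the contemporary-user test reduces to a simple window check:

def is_contemporary(ts_x, ts_u, n):
    # x is contemporary to u if TS_x lies in [TS_u - n, TS_u + n],
    # a window spanning 2n + 1 years centered on u's pivotal year
    return ts_u - n <= ts_x <= ts_u + n

# e.g., with n = 2 (a 5-year window), a user with pivotal year 28
# is contemporary to a target user with pivotal year 26
assert is_contemporary(28, 26, 2)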
Now our aim is to show that predictions computed using the preferences of the contemporary users of a target user yield better results. At first, the years were divided into ranges of five years taking TS_u as the pivotal year, and it was found that the data sparsity beyond 10 years is too high to consider. Finally, after some deliberation, the range of years for a target user u with timestamp TS_u was decided to be TS_u − 12 to TS_u + 12, i.e., 12 years preceding and succeeding the pivotal year of the target user. The range was chosen such that adequate similar users are available to estimate the result. The year of rating is used to cluster the users into windows of {5, 7, 9} years, and we intend to find the window that gives the best prediction for the target user.
4 Clustering Scheme
The clustering scheme used in this work is very simple. In most clustering methods, users are assigned to a pre-defined cluster [3]. To study the importance of the time context, in this work clusters are created around individual users, after which we reach a conclusion as to whether time makes any difference to the recommendations. We already stated that TS_c is the average of the years of rating in the dataset for the target user c. Now, for every other user t in the dataset, the timestamp TS_t is calculated and the clustering is done as per the procedure presented in Algorithm 1. An example of our clustering process is shown pictorially in Figs. 1, 2 and 3. In the figures, TS_c indicates the timestamp of the target user c and TS_t indicates the timestamp of another user t in reference to the timestamp of user c. Note that in Figs. 1, 2 and 3, the colored bubbles represent the clusters of contemporary users.
Algorithm 1: Clustering the Users
Input: Set of users U, window size y, pivotal year TS_c of the target user
Output: Clusters of users based on TS_c
1: Form the cluster of contemporary users CL_x such that the timestamp x of every user in the cluster lies between TS_c − y/2 and TS_c + y/2
2: Let t = TS_c − y/2
3: while (t > TS_c − 12) do
4:   Form cluster CL_y such that the timestamp of every user in the cluster lies in the window max(t − y, TS_c − 12) to (t − 1) years
5:   t = t − y
6: end
7: Let t = TS_c + y/2
8: while (t < TS_c + 12) do
9:   Form cluster CL_y such that the timestamp of every user in the cluster lies in the window (t + 1) to min(t + y, TS_c + 12) years
10:   t = t + y
11: end
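A minimal Python rendering of Algorithm 1 is sketched below; the function name and the data layout (a dict mapping each user to its pivotal year) are illustrative assumptions, not the authors' implementation.

def cluster_users(timestamps, ts_c, y, horizon=12):
    # timestamps: dict user -> pivotal year; ts_c: target user's pivotal year;
    # y: window size in years. Returns (low, high, members) windows over ts_c +/- horizon.
    half = y // 2
    windows = [(ts_c - half, ts_c + half)]            # cluster of contemporary users
    t = ts_c - half
    while t > ts_c - horizon:                         # windows preceding the pivot
        windows.append((max(t - y, ts_c - horizon), t - 1))
        t -= y
    t = ts_c + half
    while t < ts_c + horizon:                         # windows succeeding the pivot
        windows.append((t + 1, min(t + y, ts_c + horizon)))
        t += y
    return [(lo, hi, [u for u, ts in timestamps.items() if lo <= ts <= hi])
            for lo, hi in windows]

For y = 5 and ts_c = 26 this yields the five windows of Fig. 1: (24, 28), (19, 23), (14, 18), (29, 33) and (34, 38).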
Fig. 1 Clustering for time slot of 5 years
Fig. 2 Clustering for time slot of 7 years
Fig. 3 Clustering for time slot of 9 years
5 Recommendation Scheme
In this work, we have used user-user collaborative filtering, where the correlation between two users is calculated using Pearson's correlation coefficient. Suppose we have two users p and q. The Pearson correlation coefficient (similarity) between them is calculated using Eq. 1:

sim(p, q) = \frac{\sum_{i \in I} (r_{p,i} - \bar{r}_p)(r_{q,i} - \bar{r}_q)}{\sqrt{\sum_{i \in I} (r_{p,i} - \bar{r}_p)^2} \, \sqrt{\sum_{i \in I} (r_{q,i} - \bar{r}_q)^2}}    (1)
where I is the set of items rated by both users p and q, \bar{r}_p and \bar{r}_q are the average ratings given by p and q, and r_{p,i} and r_{q,i} are respectively the ratings of user p and user q on item i. For all users having a Pearson correlation greater than a threshold value, the prediction for the target user u on an item p is calculated as
pred_{u,p} = \bar{r}_u + k \sum_{j=1}^{n} sim(u, j)\,(r_{j,p} - \bar{r}_j)    (2)
where n denotes the number of users similar to u, while \bar{r}_u and \bar{r}_j represent the average rating of user u and its neighboring user j respectively. k is a normalizing factor, and sim(u, j) calculates the correlation or similarity between u and j. After prediction, the Root Mean Square Error (RMSE) [7] for the training set is calculated. Three sets of results for each user have been calculated, with the threshold Pearson correlation taken as 0.55, 0.65 and 0.75. We calculate the Pearson correlation of a target user with every other user in the system. The entire recommendation module is presented in Algorithm 2. In Algorithm 2, the subroutine User_Cluster() clusters the users using Algorithm 1.
Algorithm 2: Recommendation Algorithm
Input: Pearson correlation threshold r, set of users L
Output: RMSE of predicted recommendation for every cluster
1: for each user c in L do
2:   for each user I in L do
3:     Find PC_{c,I}            // Pearson correlation between c and I
4:   end
5: end
6: for y in {5, 7, 9} do
7:   for each user c in L do
8:     Call User_Cluster(L − c, y, TS_c)    // TS_c = timestamp of user c
9:     for each cluster CL do
10:      for each r in {0.55, 0.65, 0.75} do
11:        For all users I in CL, find the RMSE of the prediction for c using each I where PC_{c,I} > r
12:      end
13:    end
14:  end
15:  for each r in {0.55, 0.65, 0.75} do
16:    for each user c in L do
17:      Find the cluster CL with minimum RMSE for the user c at Pearson correlation r
18:    end
19:  end
20:  Count the number of minimum RMSE under each cluster
21:  Find the percentage of minimum RMSE in each cluster
22: end
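The two core computations of Algorithm 2, Eqs. 1 and 2, can be sketched as follows. This is a minimal illustration over ratings stored as dicts of user -> {item: rating}; the means in pearson() are taken over the co-rated items, one common variant of Eq. 1, and all names are assumptions.

import math

def pearson(ratings_p, ratings_q):
    # Eq. 1: Pearson correlation over the items I rated by both users
    common = set(ratings_p) & set(ratings_q)
    if len(common) < 2:
        return 0.0
    mean_p = sum(ratings_p[i] for i in common) / len(common)
    mean_q = sum(ratings_q[i] for i in common) / len(common)
    num = sum((ratings_p[i] - mean_p) * (ratings_q[i] - mean_q) for i in common)
    den = (math.sqrt(sum((ratings_p[i] - mean_p) ** 2 for i in common)) *
           math.sqrt(sum((ratings_q[i] - mean_q) ** 2 for i in common)))
    return num / den if den else 0.0

def predict(target, item, neighbors, ratings, sims):
    # Eq. 2: target's mean rating plus normalized weighted deviations of neighbors
    mean_u = sum(ratings[target].values()) / len(ratings[target])
    usable = [j for j in neighbors if item in ratings[j]]
    num = sum(sims[j] * (ratings[j][item]
                         - sum(ratings[j].values()) / len(ratings[j])) for j in usable)
    k = sum(abs(sims[j]) for j in usable)   # normalizing factor k
    return mean_u + num / k if k else mean_u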
Table 1 ML-10M dataset: example

User | Item | Rating | Timestamp
1 | 122 | 5 | 838985046
1 | 185 | 5 | 838983525
1 | 231 | 5 | 838983392
1 | 292 | 5 | 838983421
1 | 316 | 5 | 838983392
6 Experimental Settings
6.1 Data Description
We have tested our algorithms on the MovieLens-10M (ML-10M) [6] dataset. The dataset contains 10,000,054 ratings from 71,567 users on 10,681 movies. These ratings are integers on a scale from 1 to 5. Note that the ML-10M dataset includes only those users who have rated at least 20 movies. The dataset also contains the timestamp of the ratings in seconds since the Unix epoch (01 Jan 1970, 00:00:00 UTC). An example of the dataset is shown in Table 1. From the data reported in Table 1, the timestamp TS_{u,m} of a user u for a movie m is calculated in years as follows:

TS_{u,m} = \mathrm{Round}(\mathrm{Timestamp}_{u,m} / 31{,}536{,}000)    (3)
where Timestamp_{u,m} is the timestamp of user u rating movie m as given in the table, and 31,536,000 is the number of seconds in a non-leap year of 365 days. To keep things simple we avoided considering the leap year separately; even if it were considered, it would have just shifted the boundaries between the years slightly from where they are now. The timestamps given in Table 1 are expressed in seconds, and we divide them by 31,536,000 to convert the seconds into years. For example, consider the timestamp 838,985,046 given in Table 1: 838,985,046 / 31,536,000 gives 26 years. Since the dataset contains the timestamps of the ratings in seconds since the Unix epoch (01 Jan 1970, 00:00:00 UTC), 26 years signifies the year 1970 + 26 = 1996. Since 1970 has already been specified as the zeroth year in the dataset, and in this paper we do not consider any context that depends on the marked year in the Gregorian calendar, we have simplified the calculation process and consider 26 as the active year of the user. One user can rate several movies at different timestamps; we consider all the years calculated from the different timestamps as the active years of that user. The arithmetic mean of the active years of a target user is termed the pivotal year, around which the clusters for the target user are created. The pivotal year TS_u for a user u is calculated as follows:
TS_u = \frac{\mathrm{Round}\left(\sum_{j=1}^{n} TS_{u,j}\right)}{n}    (4)
where n is the number of movies u has rated and TS_{u,j} is the timestamp (in years) at which u rated movie j. Note that TS_{u,j} is calculated using Eq. 3.
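A small sketch of this conversion follows (helper names are assumptions). Note that Eq. 3 writes Round, while the worked example above truncates (838,985,046 / 31,536,000 is about 26.6 but is read as 26), so the sketch follows the worked example.

SECONDS_PER_YEAR = 31_536_000   # 365-day year, as in Eq. 3

def active_year(timestamp_seconds):
    # Eq. 3: seconds since the Unix epoch -> active year index (truncated,
    # following the worked example in the text)
    return int(timestamp_seconds / SECONDS_PER_YEAR)

def pivotal_year(timestamps_seconds):
    # Eq. 4: mean of the active years of all ratings by one user
    years = [active_year(t) for t in timestamps_seconds]
    return sum(years) / len(years)

# e.g., active_year(838985046) == 26, i.e., the year 1996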
6.2 Evaluation Metric
In order to evaluate the accuracy of our proposed recommendation approach, the RMSE (Root Mean Square Error) metric has been used. We have calculated the RMSE for every target user considering each and every cluster of 5, 7 and 9 years separately. The RMSE for a user u considering a cluster CL is defined as

RMSE_{u,CL} = \sqrt{\frac{\sum_{j=1}^{n} (\mathrm{ratingActual}_{u,j} - \mathrm{ratingPredicted}_{u,j,CL})^2}{n}}    (5)
where ratingActual_{u,j} is the original rating given by user u to movie j, ratingPredicted_{u,j,CL} is the predicted rating of user u for movie j calculated by the algorithm on the basis of similar users in cluster CL, and n is the total number of movies rated by u. For a user in the training dataset, every movie rated by that user is predicted using each of the clusters created. Lower RMSE values denote better prediction accuracy. If RMSE_{u,CL} < RMSE_{u,CL_1}, where CL and CL_1 are two different clusters, then the predictions computed on the basis of similar users in cluster CL are better (closer to the actual ratings) than those computed on the basis of similar users in cluster CL_1.
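For completeness, Eq. 5 in code form, as a minimal sketch over NumPy arrays of actual and predicted ratings:

import numpy as np

def rmse(actual, predicted):
    # Eq. 5: root mean square error between actual and predicted ratings
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean((actual - predicted) ** 2)))

# e.g., rmse([4, 5, 3], [3.8, 4.6, 3.4]) is approximately 0.35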
7 Results and Discussions
In Tables 2 and 3 we have tabulated the RMSE values of two sample users, user1 and user2, for every 5-year cluster, where the clusters are created based on the timestamps of user1 and user2 respectively. In Table 2, we notice that the first two clusters (2nd and 3rd columns) contain no RMSE data. This is because no similar users corresponding to the time ranges of the first and second clusters were available for user1, and thus no prediction could be calculated. Similarly, in Table 3, no similar users corresponding to the time range of the first cluster (2nd column) were available for user2, and therefore no prediction could be calculated.
32 Time Window Based Recommender System for Movies Table 2 RMSE data for user1 using 5 years clusters Pearson T St − 12 to T St − 7 to T St − 2 to correlation T St − 8 T St − 3 T St + 2 0.55 0.65 0.75
– – –
– – –
0.456399 0.413239 0.397759
Table 3 RMSE Data for user2 using 5 years clusters Pearson T St − 12 to T St − 7 to T St − 2 to correlation T St − 8 T St − 3 T St + 2 0.55 0.65 0.75
– – –
0.659119 0.390583 0.369447
0.504781 0.437442 0.387289
389
T St + 3 to T St + 7
T St + 8 to T St + 12
0.482681 0.459307 0.462992
0.548482 0.5488 0.581031
T St + 3 to T St + 7
T St + 8 to T St + 12
0.55534 0.4728 0.406254
0.618151 0.66903 0.626839
Table 4 Percentage of target users having similar users in 5 years clusters T St − 12 to T St − 7 to T St − 2 to T St + 3 to T St − 8 T St − 3 T St + 2 T St + 7 24.62311558
60.8040201
100
94.97487437
Table 5 Percentage of target users having similar users in 7 years clusters All years = T St + 11 39.1959799
In Tables 4, 5 and 6 we have tabulated the percentage of target users having similar users in the different clusters, i.e., we have calculated the number of similar users (considering the entire ML-10M data) in each cluster for each user in the training dataset. To explain further, let us consider an example. In Tables 2 and 3, we find that the RMSE value in the first cluster (2nd column) for both the users (considering Pearson Correlation 0.55) are empty. It implies that for both the users, similar users are not available in that cluster. So, if we consider the 2 users as the only users in the system, then the percentage of users having similar users in the first cluster would be (0/2 ∗ 100) = 0%. For the 2nd cluster (3rd column), user1 does not have any RMSE value, hence no similar users. However for user2, we have got RMSE values. Thus 1 out of 2 users have similar users in second cluster and accordingly the percentage of users having similar users in the 2nd cluster is (1/2 ∗ 100) = 50%. In this way we calculate the percentages in the different clusters. From the data reported in Tables 4, 5 and 6, it is clear that the density of available data is 100% for time slot containing the contemporary years (T St − 2 to T St + 2), whereas it is evident that density of data reduces as we go beyond the contemporary
390
M. Banerjee et al.
Table 6 Percentage of target users having similar users in 9 years clusters All years < = T St − 5 T St − 4 to T St + 4 All years > = T St + 5 41.20603015
100
80.90452261
Table 7 Percentage of minimum RMSE under time slot of 5 years cluster Year slot of 5 years Pearson T St − 12 to T St − 7 to T St − 2 to T St + 3 to correlation T St − 8 T St − 3 T St + 2 T St + 7 0.55 0.65 0.75
2.010050251 4.020100503 3.015075377
7.035175879 3.51758794 6.030150754
83.41708543 85.42713568 87.93969849
7.537688442 6.532663317 2.512562814
Table 8 Percentage of minimum RMSE under time slot of 7 years cluster Year slot of 7 Years Pearson All years = T St + 11 0 0 0.502512563
Table 9 Percentage of minimum RMSE under time slot of 9 years cluster Year slot of 9 years Pearson correlation All years = T St + 5 0.55 0.65 0.75
6.030150754 4.020100503 5.527638191
93.46733668 94.97487437 91.45728643
0.502512563 1.005025126 3.015075377
years. Therefore it can be concluded that there is a considerable improvement in data density when data is clustered around the active years of a target user. From Table 5 it is clear that beyond 10 years the density of available similar users has reduced drastically. So it can be said that as we move away from the contemporary years, it is less likely to get similar users corresponding to a target user. In Tables 7, 8 and 9 we have tabulated the percentage of minimum RMSE obtained from each cluster. In the clusters of Tables 7, 8 and 9, we have reported the percentages of users for whom we got the best RMSE (minimum) from that cluster for the different Pearson’s correlation values. The best results have been marked in blue and bold. For example, in Table 7, we can see that the percentage of minimum RMSE for the 1st cluster for Pearson Correlation of 0.55 is 2.01. It means that in the training dataset, for only 2% of the population, best results were obtained from the 1st cluster, whereas,
32 Time Window Based Recommender System for Movies Table 10 Average RMSE for pivotal cluster Pearson No of clusters Clustering on 5 correlation years 0.55 0.65 0.75
0.4377384 0.38532707 0.359144155
0.408510402 0.347097256 0.308908849
391
Clustering on 7 years
Clustering on 9 years
0.417659111 0.357899407 0.324361211
0.42305104 0.365267116 0.335202503
Fig. 4 RMSE comparisons for different Pearson Correlation values
for 83.42% of the population, best results were obtained from the 3rd cluster. From Tables 7, 8 and 9, it is clear that in more than 80% of the cases the minimum RMSE for the individual users have been obtained from the cluster of contemporary years. We have presented the average RMSE of the time slot based clusters and average RMSE value of the unclustered data in Table 10. We observe in Table 10 that irrespective of the Pearson’s correlation value the average RMSE for 5 years cluster is minimum (marked in blue), indicating that clustering users in clusters of 5 years around the pivotal year gives much better results than the unclustered data and the other clusters considered in this study. From the subgraphs of Fig. 4, it can be concluded that percentage of minimum RMSE and Average RMSE both increase with larger time slots. This is quite obvious because when the number of years in the time
392
M. Banerjee et al.
slot increases, more users get clustered in the larger time slots resulting in maximum number of minimum RMSE falling in the larger time slot. Thus we see as number of years increases, the bars in the chart becomes taller for all the three Pearson Correlation values. Again from Fig. 4 it is clear that Average RMSE for 5 years slot is minimum since due to context of time, the users falling in 5 years time slot, i.e., T St − n and T St + n, n = 2 is closer to the target user, yielding better results than 7 years or 9 years which have users with larger difference in years, n > 2. Further to the above observation, from Table 10 we find that best result for average RMSE is given by time slot of 5 years and Pearson Correlation of 0.75. From this, inference can be drawn that 0.75 correlation filters better correlated users than correlations of 0.65 and 0.55, and applying time slot of 5 years gives very close neighbors yielding in good predictions. From Table 10 it is also clear that average RMSE for clustered data set is better than un-clustered data set, which does not consider the context of time.
8 Conclusion and Future Work Time is an important context when designing recommendation system. Since items change over time, the users, age group, items also change over a period of time. While clustering user based on time slot, important task is to judiciously choose the length of time. Too large time slot, or too narrow time slot, might yield bad predictions. Now in this work, we have studied the effect of time context at individual level. This method may help us to study the effect of context of time, but would not be feasible when predicting in real time environment, where clusters are predefined and for scalability, users are grouped in clustered during an offline process. Thus in future scope, a dynamic method of clustering around a user can be proposed to enhance prediction results.
References 1. Ahmadian S, Joorabloo N, Jalili M, Ahmadian M (2022) Alleviating data sparsity problem in time-aware recommender systems using a reliable rating profile enrichment approach. Expert Syst Appl 187:115849 2. Anelli VW, Di Noia T, Di Sciascio E, Ferrara A, Mancino ACM (2021) Sparse feature factorization for recommender systems with knowledge graphs. In: Fifteenth ACM conference on recommender systems, pp 154–165 3. Das J, Banerjee M, Mali K, Majumder S (2019) Scalable recommendations using clustering based collaborative filtering. In: 2019 international conference on information technology, pp 279–284 4. Das J, Majumder S, Mali K (2017) Context aware scalable collaborative filtering. In: 2017 international conference on big data analytics and computational intelligence (ICBDAC), pp 184–190
32 Time Window Based Recommender System for Movies
393
5. De Zwart T (2018) Time-aware neighbourhood-based collaborative filtering. Res Paper Bus Analyt 1–46 6. Harper FM, Konstan JA (2016) The movielens datasets: history and context. ACM Trans Inter Intell Syst 5(4) 7. Herlocker JL, Konstan JA, Terveen LG, Riedl J (2004) Evaluating collaborative filtering recommender systems. ACM Trans Inform Syst 22(1):5–53 8. Huynh HX, Phan NQ, Pham NM, Pham V, Hoang Son L, Abdel-Basset M, Ismail M (2020) Context-similarity collaborative filtering recommendation. IEEE. Access 8:33342–33351 9. Jalili M, Ahmadian S, Izadi M, Moradi P, Salehi M (2018) Evaluating collaborative filtering recommender algorithms: a survey. IEEE Access 6:74003–74024 10. Jeong SY, Kim YK (2021) Deep learning-based context-aware recommender system considering contextual features. Appl Sci 12(1):45 11. Koren Y (2009) Collaborative filtering with temporal dynamics. In: Proceedings of the 15th ACM SIGKDD international conference on knowledge discovery and data mining, pp 447–456. KDD ’09 12. Kulkarni S, Rodd SF (2020) Context aware recommendation systems: a review of the state of the art techniques. Comp Sci Rev 37:100255 13. Lu J, Wu D, Mao M, Wang W, Zhang G (2015) Recommender system application developments: a survey. Decis Supp Syst 74:12–32 14. Musto C, Gemmis MD, Lops P, Narducci F, Semeraro G (2022) Semantics and content-based recommendations. In: Ricci F, Rokach L, Shapira B (eds) Recommender systems handbook, pp 251–298. Springer 15. Qi L, Wang R, Hu C, Li S, He Q, Xu X (2019) Time-aware distributed service recommendation with privacy-preservation. Inform Sci 480:354–364 16. Sánchez-Moreno D, Zheng Y, Moreno-García MN (2020) Time-aware music recommender systems: modeling the evolution of implicit user preferences and user listening habits in a collaborative filtering approach. Appl Sci 10(15):5324 17. Wasid M, Ali R (2018) An improved recommender system based on multi-criteria clustering approach. Proc Comp Sci 131:93–101
Chapter 33
Approximate Multiplier for Power Efficient Multimedia Applications K. B. Sowmya and Rajat Raj
1 Introduction Approximate computing is a novel method in digital design that minimizes the requirement for the exact calculation to gain considerable power, speed, and area advantages. This method is becoming increasingly significant for embedded and mobile systems that must operate under severe energy and speed limitations. In several error-tolerant scenarios, approximate computing can be advantageous. Multimedia processing, image multiplication, and machine learning are some examples. Multipliers are essential components in microprocessors, DSPs, and embedded systems and have a variety of uses, including filtering and convolutional neural networks. Unfortunately, due to their advanced logic architecture, multipliers are one of the most power-hungry digital components [1, 2]. A multiplier comprises three basic blocks: partial product production, partial product reduction, and carrypropagate addition. A partial product Pj, i is often produced by an AND gate (i.e., Pj, i = AiBj) where Ai and Bj are the ith and jth LSBs of the inputs A and B, respectively. Some of the partial product accumulation structures that are commonly used include Dadda, Wallace tree, and a carry-save adder array. The same technique is repeated until only two rows of partial products remain in each layer’s adders, which function concurrently without carrying propagation. Approximations can be added to any of these blocks, as shown in Fig. 1 [3]. Often, while optimizing one parameter, a restriction for the other parameter is taken into account. Specifically, getting the necessary performance (speed) while
K. B. Sowmya (B) · R. Raj RV College of Engineering, Bengaluru, India e-mail: [email protected] R. Raj e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 R. P. Yadav et al. (eds.), Proceedings of the International Conference on Paradigms of Computing, Communication and Data Sciences, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-19-8742-7_33
395
396
K. B. Sowmya and R. Raj
Fig. 1 Fundamental arithmetic operation of 4*4 unsigned multiplication
taking into account the restricted power budget of portable devices is a difficult issue. The multiplier is one of the most common arithmetic blocks used in a variety of applications, notably signal processing. There are two major designs for multipliers: sequential and parallel. While sequential designs consume little electricity, they have a relatively long delay. Parallel architectures, on either hand (e.g.., the Wallace tree and Dadda), are quick while consuming a lot of power. In high-performance applications, parallel multipliers are used to avoid hotspots on the device due to their high-power consumption [4].
2 Related Work In the literature, numerous designs have been presented. Narayanamoorthy et al. [5] proposed multiplier designs that could balance computational precision and energy consumption at the time of design. The suggested multiplier may operate with an average computational error of 1% while using less energy per operation than a precise multiplier. The design and study of two, approximate 4–2 compressors for use in a multiplier are presented by Momeni et al. in the paper [6]. The findings demonstrate that the suggested designs significantly reduce power dissipation, latency, and
33 Approximate Multiplier for Power Efficient Multimedia Applications
397
transistor count compared to an exact design. Additionally, excellent image multiplication competences are present in two of the proposed multiplier designs. Chang et al. [7] proposed approximate 4–2 compressor can save power consumption and delay by 56 and 39%, respectively, in comparison with the exact compressor. According to the simulation findings, the approximate 4–2 compressor-based multiplier may decrease power consumption and latency by 33 and 30%, respectively, when compared to the exact multiplier. Reddy et al. [8] propose designing a new, approximate 4-to-2 compressor. For optimal use of the proposed compressor and to minimize error, a modified Dadda multiplier architecture is presented. Some image processing applications analyze the multiplier’s effectiveness. The suggested multiplier typically processes images that are 85% structurally similar to the exact output image.
3 4:2 Compressors A 4:2 compressor is used to shorten the latency of the parallel multiplier’s partial product summation stage. As illustrated schematically in Fig. 2a, the compressor has four equal-weighted inputs (M1–M4) and an input carry (C in ), as well as two outputs (sum and carry) and an output C out [4]. The output sum is the same weight as the inputs, but Carry and C out have double the weight [9]. Carry is not reliant on C in because of the compressor’s architecture. An exact 4:2 compressor’s internal construction is made by serially coupling two full adders, as illustrated in Fig. 2b. C out , Carry, and Sum are given as expressions 1–3. Cout = M3(M1 ⊕ M2) + M1(M1 ⊕ M2)
(1)
Carry = Cin (M1 ⊕ M2 ⊕ M3 ⊕ M4) + M4(M1 ⊕ M2 ⊕ M3 ⊕ M4)
(2)
SUM = C ⊕ M1 ⊕ M2 ⊕ M3 ⊕ M4
(3)
The truth table for the exact compressor is shown in Table 1. Multipliers have a larger area, long latency, and high-power consumption. As a result, developing a multiplier with fast speed and less power is a major challenge. However, as area and speed are typically antagonistic, advances in speed results in more areas. To get a result, a variety (multiplicand) is multiplied by another number (multiplier) several times (product). In AI and DSP applications, multiplication is certainly a performance-deciding operation. Many applications need parallel operations at high speed with acceptable precision, which necessitates the need for high-speed multiplier designs. Approximation in multipliers enables faster computations with less power consuming hardware, complexity, latency and while retaining acceptable accuracy. Partial product summation is the multiplication step that cannot be completed quickly due to the propagation delay in adder networks. Compressors are used to shorten
398
K. B. Sowmya and R. Raj
Fig. 2 a Block diagram b Full adder-based 4:2 compressor configuration
propagation time. Compressors at each level compute the sum and carry it at the same time [4].
4 16*16 Bit Approximate Dadda Multiplier Architecture Two alternative approximate 4:2 compressor architectures are used to construct a 16 × 16 Dadda multiplier. One is a high-speed variant, while the other is a modified version of the first. As the error is higher with high-speed architecture, it will be utilized for the first few LSBs, while the later one will be used in the middle of the architecture as explained in Fig. 3. For the last few bits, we utilized the exact multiplier because they will have the most weightage in the final result [10]. The simulation results are compared with the exact multiplier. SUM is generated using a multiplexer (MUX)-based design technique. The XOR gate’s output serves as the MUX’s select line. (M3M4) is picked when the select line is high, while (M3 + M4) is selected when it is low. The suggested 4: 2 compressors can use OR gate to simplify carry generation logic by inserting an error with error distance (ED) 1 in the truth table of the exact compressor [11, 12]. The following Eq. (4–5) are the logical formulations for realizing SUM and CARRY. SUM = (M1 ⊕ M2)M3M4 + (M1 ⊕ M2)(M3 + M4)
(4)
CARRY = M1 + M2
(5)
For the binary input numbers 0011, 0100, 1000, and 1111, an error has been introduced. To guarantee that equal positive and negative deviation with error distance = 1 (minimum) is attained. The term “ED” refers to the difference between the exact and
33 Approximate Multiplier for Power Efficient Multimedia Applications
399
Table 1 Exact multiplier truth table A1
A2
A3
A4
C in
C out
CARRY
SUM
0
0
0
0
0
0
0
0
0
0
0
0
1
0
0
1
0
0
0
1
0
0
0
1
0
0
0
1
1
0
1
0
0
0
1
0
0
0
0
1
0
0
1
0
1
0
1
0
0
0
1
1
0
0
1
0
0
0
1
1
1
0
1
1
0
1
0
0
0
0
0
1
0
1
0
0
1
0
1
0
0
1
0
1
0
0
1
0
0
1
0
1
1
0
1
1
0
1
1
0
0
1
0
0
0
1
1
0
1
1
0
1
0
1
1
1
0
1
0
1
0
1
1
1
1
1
1
0
1
0
0
0
0
0
0
1
1
0
0
0
1
0
1
0
1
0
0
1
0
0
1
0
1
0
0
1
1
0
1
1
1
0
1
0
0
1
0
0
1
0
1
0
1
1
0
1
1
0
1
1
0
1
0
1
1
0
1
1
1
1
1
0
1
1
0
0
0
1
0
0
1
1
0
0
1
1
0
1
1
1
0
1
0
1
0
1
1
1
0
1
1
1
1
0
1
1
1
0
0
1
0
1
1
1
1
0
1
1
1
0
1
1
1
1
0
1
1
0
1
1
1
1
1
1
1
1
400
K. B. Sowmya and R. Raj
Fig. 4 a Dual-stage compressor b High-speed compressor c Exact multiplier
approximate 4: 2 compressor output. The approximation is made here by removing C out . While the inputs are ‘1111’ this brings down a problem. When the input bits are ‘1111’ the CARRY and SUM are both set to ‘11’ and a −1 error is imported [4]. In addition to the MUX, the high-speed area-efficient compressor design requires one AND, one XOR, and two OR gates, AND and OR gates each need six transistors. The paper offers a design using NOR and NAND gates, as depicted in Fig. 4b, to minimize transistor count [4]. Even if the sum and carry obtained by the modified design are not identical to those generated by the suggested 4: 2 compressor architecture, the inaccuracy is eliminated by cascading the compressor in multiples of 2 [13–17]. The proposed and exact compressor designs presented are utilized to build 16*16 Dadda multipliers. Levels 1 to 17 of the 16*16 Dadda multipliers use approximate compressors, whereas levels 18 to 32 use exact compressors. Figure 5 illustrates a 16*16 input multiplication procedure utilizing the proposed compressors. When there are two stages of cascaded partial products for summation, the improved dualstage compressors are utilized. For all other partial product levels below 14, should employ full adders, half adders, and the suggested high-speed area-efficient 4: 2 compressors. The simulation is run in Vivado using Verilog HDL. The high-speed compressor block takes a minimum number of cells and nets compared to the dual-stage compressor. The analysis is done using Vivado design suite and shown in Table 2. The tradeoff is the error involved with both blocks.
33 Approximate Multiplier for Power Efficient Multimedia Applications
401
Fig. 5 Approximate 16 × 16 multiplier with proposed 4:2 compressor
Table 2 Comparison of the architectural designs
Cell
Nets
Dual-stage compressor
9 cells
12 nets
High-speed 4:2 compressor
8 cells
11 cells
Exact compressor
11 cells
15 nets
4.1 Comparison of High-Speed 4:2 Compressor with Dual-Stage Compressor For comparing the device utilization and power consumption of both the designs, an 8*the 8-bit Dadda multiplier is tested in both designs, i.e., with only high-speed 4:2 compressor and with only dual-stage compressor. The simulation and synthesis are done in Vivado design suite.
402
K. B. Sowmya and R. Raj
5 Implementation Results and Analysis The exact and approximate multipliers were functionally verified using the Vivado simulator with 16-bit input data. Vivado design suite includes a Vivado simulator which is a compiled-language simulator that enables mixed-language simulation using Verilog, System Verilog, VHDL, and System C, as well as IP-centric and system-centric development environments. It allows you to simulate behavior and timing. The built-in logic simulator, i.e., ISE simulator is used for high-level logic synthesis in Vivado. The Vivado tool was used to synthesize both multipliers, and the device utilization and power consumption are given in Table 3. Figures 6 and 8 depict the synthesized designs for the exact multiplier and the approximate multiplier, respectively. Figures 7, 8, and 9 illustrate the simulation results for both architectures implemented in Verilog HDL. Specific test benches were created for the encoder and decoder, and the results of the behavioral simulation are given in Figs. 7, 8, and 9. During simulation, the time resolution was set to 1 ps. Table 4 displays the component distribution used to synthesize the exact and approximate multipliers. The exact multiplier is significantly more complicated and requires more LUTs. The proposed multiplier reduces total on-chip power, LUT count, and latency. Table 3 Comparison of high-speed 4:2 compressor with dual-stage compressor Dual-stage compressor
High-speed 4:2 compressor
Slice LUTS (32,600)
48
48
IO (150)
33
33
Total on-chip power
9.647 W
9.647 W
Data path delay (Max at slow process corner)
6.448 ns
9.752 ns
Data path delay (Min at fast process corner)
1.974 ns
2.248 ns
Fig. 6 RTL schematic of exact 16 bit multiplier
33 Approximate Multiplier for Power Efficient Multimedia Applications
403
Fig. 7 Exact 16 bit multiplier output waveform
Fig. 8 RTL schematic of approximate 16 bit multiplier
Fig. 9 Approximate 16 bit multiplier output waveform
6 Conclusion Two approximate 4:2 compressor topologies are shown in this study. To begin, a highspeed area-efficient compressor architecture is designed and evaluated in terms of delay to a modified dual-stage compressor without affecting the accuracy metrics for
404
K. B. Sowmya and R. Raj
Table 4 Device utilization and power consumption Exact 16 bit-multiplier Approx. 16 bit-multiplier Slice LUTS (32,600)
421
285
IO (150)
54
65
Total on-chip power
38.867 W
33.056 W
Data path delay (max at slow process corner) 17.185 ns
15.417 ns
Data path delay (min at fast process corner)
1.974 ns
1.974 ns
an 8 bit multiplier. When compared to the exact multiplier, a 16 bit multiplier design using a mix of both the approximate compressor design and the exact multiplier yielded a significant reduction in latency, area, and power. The architecture was developed and built with the Vivado design suite, which includes a Vivado simulator. The proposed approximate 4:2 compressor multiplier designs are intended for errortolerant applications. Future work involves developing an approximate multiplier that can be enhanced further to reduce delay and power usage. Since the area reductions are obtained at the expense of greater power usage and an increased critical path delay compared to separated configurations.
References 1. Zacharias N, Lalu V (2020) Study of approximate multiplier with different adders. In: 2020 International conference on smart electronics and communication (ICOSEC), pp 1264–1267. https://doi.org/10.1109/ICOSEC49089.2020.9215425 2. Esposito D, Strollo AGM, Napoli E, De Caro D, Petra N (2018) Approximate multipliers based on new approximate compressors. IEEE Trans Circuits Syst I Regul Pap 65(12):4169–4182. https://doi.org/10.1109/TCSI.2018.2839266 3. Jiang H, Liu C, Liu L, Lombardi F, Han J (2017) A review, classification, and comparative evaluation of approximate arithmetic circuits. ACM J Emerg Technol Comput Syst (JETC) 13:1–34. https://doi.org/10.1145/3094124 4. Edavoor PJ, Raveendran S, Rahulkar AD (2020) Approximate multiplier design using novel dual-stage 4:2 compressors. IEEE Access, vol 8, pp 48337–48351. https://doi.org/10.1109/ ACCESS.2020.2978773 5. Narayanamoorthy S, Moghaddam HA, Liu Z, Park T, Kim NS (2015) Energy-efficient approximate multiplication for digital signal processing and classification applications. IEEE Transa Very Large-Scale Integr (VLSI) Syst 23(6):1180–1184. [6858039]. https://doi.org/10.1109/ TVLSI.2014.2333366 6. Momeni A, Han J, Montushi P, Lombardi F (2015) Design and analysis of approximate compressors for multiplication. IEEE Trans Comput 64:984–994 7. Chang Y-J et al (2019) Imprecise 4–2 compressor design used in image processing applications. IET Circuits Devices Syst 13:848–856 8. Reddy KM, Vasantha MH, Nithin Kumar YB, Dwivedi D (2019) Design and analysis of multiplier using approximate 4–2 compressor. Int J Electron Commun (AEÜ) 107:89–97 9. Van Toan N, Lee J (2020) FPGA-based multi-level approximate multipliers for highperformance error-resilient applications. IEEE Access 8:25481–25497. https://doi.org/10. 1109/ACCESS.2020.2970968
33 Approximate Multiplier for Power Efficient Multimedia Applications
405
10. Gu F-Y, Lin I-C, Lin J-W (2022) A low-power and high-accuracy approximate multiplier with reconfigurable truncation. IEEE Access 10:60447–60458. https://doi.org/10.1109/ACCESS. 2022.3179112 11. Akbari O, Kamal M, Afzali-Kusha A, Pedram M (April 2017) Dual-quality 4:2 compressors for utilizing in dynamic accuracy configurable multipliers. IEEE Trans Very Large Scale Integr (VLSI) Syst 25(4):1352–1361. https://doi.org/10.1109/TVLSI.2016.2643003 12. Vahdat S, et al (2017) TruncApp: a truncation-based approximate divider for energy efficient DSP applications. In: Design, automation & test in Europe conference and exhibition (DATE) 2017, pp 1635–1638 13. Maheshwari N, Yang Z, Han J, Lombardi F (2015) A design approach for compressor based approximate multipliers. In: 2015 28th international conference on VLSI design, pp. 209–214. https://doi.org/10.1109/VLSID.2015.41 14. Strollo AGM, Napoli E, De Caro D, Petra N, Meo GD (2020) Comparison and extension of approximate 4–2 compressors for low-power approximate multipliers. IEEE Trans Circuits Syst I Regul Pap 67(9):3021–3034. https://doi.org/10.1109/TCSI.2020.2988353 15. Salmanpour F, Moaiyeri MH, Sabetzadeh F (Sept 2021) Ultra-compact imprecise 4:2 compressor and multiplier circuits for approximate computing in deep nanoscale. Circ Syst Sig Process 40(9):4633–4650. https://doi.org/10.1007/s00034-021-01688-8 16. Gorantla A, Deepa P (2017) Design of approximate compressors for multiplication. ACM J Emerg Technol Comput Syst (JETC) 13:1–17 17. Garg B, Patel SK (2021) Reconfigurable rounding based approximate multiplier for energy efficient multimedia applications. Wirel Pers Commun 118:919–931
Chapter 34
A Study on the Implications of NLARP to Optimize Double Q-Learning for Energy Enhancement in Cognitive Radio Networks with IoT Scenario Jyoti Sharma, Surendra Kumar Patel, and V. K. Patle
1 Introduction The conservation system across the globe has become more popular enough it has been considered to be a smart system in the country’s infrastructure development. The country becomes smart [1] in many aspects by considering technological development especially in the computing science arena or due to the development in information technology. The developments are more and more unimaginable in the course of time, earlier in the era of mechanical engineering technology lasted for a few years but in the information technology or digital era the technology lasts for few days or sometimes for few hours in certain cases. The contribution made by these technologies has been immeasurable. Network Lifetime Aware Routing Protocol (NLARP) has contributed with its best of the algorithm that has imbibed the [2] automation system and been utilized in many distinctive ways resulting in much conservation especially in energy harvesting, control system as well apart from conservation system. Q-learning [3] has equally contributed in the same area, but it has been considered as overoptimistic in most of the areas. The recent trends and the development in the information technology sector is IoT-Internet of Things where it utilizes the best of its kind in various sectors that optimizes in the area where it has been adopted. The study has taken the four distinctive areas into consideration and tries to measure the implications made by Network Lifetime Aware Routing Protocol (NLARP) by inculcating double Q-learning [4]. Cognitive radio networks along with
J. Sharma · V. K. Patle Computer Science and IT Pt., Ravishankar Shukla University, Raipur, Chhattisgarh, India S. K. Patel (B) Department of Information Technology Government Nagarjuna P.G. College of Science, Raipur, Chhattisgarh, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 R. P. Yadav et al. (eds.), Proceedings of the International Conference on Paradigms of Computing, Communication and Data Sciences, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-19-8742-7_34
407
408
J. Sharma et al.
Internet of Things (IoT) and the drawn contribution will enhance the energy system being adopted in the country.
2 Methods An embedded system using cognitive radio networks with Internet of Things for the Network Lifetime Aware Routing. The system has been developed using the double Q-learning algorithm where the relevant developments have shown a positive enhancement in many energy conservation projects earlier and now the same can be used right here, Q-learning is a widely used reinforcement learning algorithm [5]. The Q-earning that was proposed by Watkins have showed better performance but an error of over estimation in the course of time, therefore imbibing double Q-learning into the same system will enhance the purpose of the paper on (i) optimization of Network Lifetime Aware Routing Protocol and (ii) energy enhancement in cognitive radio networks. Fig. 1 has been established and clarifies while applying the system to work. Applying the double Q-learning algorithm to [6] enhance and overcome the challenges faced by the cognitive radio networks has been explained below. Q t+1 st, at = Q t (st , at ) + αt st, at r t + γ Q t st+1, a − Q t (st , at )
(1)
Q-learning used belongs to off policy statement whereas SARSA works on policy learning, where maximum Q value is available in the next state Q ⇒ Q∗, Optimal action-value function. The action are repeated as. Initialize Q(s, a).
Fig. 1 Key challenges in cognitive radios network and with their machine learning-based solutions [7]
34 A Study on the Implications of NLARP to Optimize Double Q-Learning …
409
Picking the S t Start state and at . Do observe at . Selecting which is based on Q(s, a) e-greedy. Do a and concern r, s’ Q(s, a) = Q(s, a) + α r + γ max Q(s’, a’) − Q(s, a) a’
S = s Until S point of terminal influences. Q-learning back up the value of a’ ∗ which is the best next event performance with the higher of highest Q value of Q(s’, a’), the actions are followed by the behavior and not a ∗ . With this QL-off policy optimal policy is followed along edge than SARSA where non-optimal policy along with edge is followed ∈ -greedy algorithm denotes. ∈ = 0, SARSA is used in the online, when ε → 0 gradually both converge to optimal. In the above equation, the finding value of all the states is worked along with the pairs Q(s’, a’), it indicates how the event action is performed in the network state st and then make observing the optimal policy for the intellectual best action as (Fig. 2) V ∗ (St ) = max at Q ∗ (st , at )
(2)
For each possible state of st + 1, the probability value is moved like P(st+1 |st , at ) and performed continuously with whatever top policy to the expected collective
Fig. 2 Q-learning Q(s’, a’)
410
J. Sharma et al.
reward of V (st+1 ), so the steps are discounted one step later. With the expected reward and action of at the bellman’s equation (Bellman 1957 is applied as). First step to initiation of the V (s). Perform and repeat the action or value. For every s ∈ S. For every a ∈ A P(s\s, a) V s Q(s, a) ← E r |s , a + γ s,∈S
V (s) ← max a Q(s, a) Till V (s) converge. The Q ∗(st , at ) values, by performing a greedy search strategy at step, the ideal system of steps that maximizes the collective reward. The above equation, Qt (s, a) gives the value of the action a in states at time t and thereward rt is drawn from a fixed recompense delivery R : S × A × S → R where s = Rsa . E (s, a, s’) = st , at, st+1 Hence, the next number of states st+1 is estimated or evaluated by a fixed state transition distribution P : S × A × S → R[0, 1],
s produced the chance of probability with ending up in state s’ after where Psa s performing a in s, and Psa = 1. s
Performing and learning cost αt (s, a) ∈ [0, 1] which ensures that the update averages value over a number of possible randomness in their wards and the number of transitions performed in order to converge in the ending limit to the best event of action-value function. Hence, the best value function [8] is the outcome to solve the following set of expressions. ∀s, a : Q ∗ (s, a) =
s s Rsa Psa + γ Q∗ s, a
(3)
s
The above expression solves the problem of overestimation and thus brings the network to meet the requirements of the system we develop. We can use the single estimator technique to calculate approx. performing the value of the next state by maximizing over the estimated event of action in that network state. The value obtaining of the maximum by implementing single estimator method E{Xi} = E{μi} ≈ μi(s)
(4)
The value obtaining of the maximum by implementing single estimator method
34 A Study on the Implications of NLARP to Optimize Double Q-Learning …
E{Xi} = E μ B i ≈ μ B a ∗
411
(5)
Therefore, we will obtain the maximum from the implemented Network Lifetime Aware Routing.
3 Background Theorem: We concern a state that denotes s in which all the true ideal performance values are equal at. Q*(s; a) = V *(s) for some V *(s). Let Qt are the arbitrary value which calculate that are on the complete unbiased in the sense that a(Qt(s, a) − V *(s)) = 0, hence that should not all correct form, such that 1/m a(Qt(s, a) − V *(s))2 = C for some C > 0, where m ≥≥ 2 performs a number of actions in states. Under these conditions, c ∗ Q t (s, a) ≥ V (s) + m−1 . Hence, this lower bound value is tight. [9] Under the identical conditions, the inferior bound on the utter fault of the double Q-learning approximation is zero. This method is one of the baselines [10] through which the study has chosen to develop the system and understand the impact of Network Lifetime Aware Routing. Implementation of the machine learning [1] aspects will be an added advantage to the system where Internet of Things contributes the best and enhances the facility. Cognitive radio network is widespread technology where it applies the secondary devices to engage to acquire the ideal range of spectrum from the basic user. It avoids the false intruder in the [11, 12] cognitive network and thus brings the possible alarm as such. The cognitive radio network also proposes to implement Q-learning, while implementing double Q-learning is an added advantage.
3.1 Cognitive Radio Network 3.2 The History and Technology of Internet of Things (IoT’s) The IoT technology has become very familiar nowadays, but the history has started some time ago that connects much devices starting from Car, medical devices, mobile phones, [13] computers and many more, but that was just a beginning it has moved further to all the majority of the devices being used inside the home, from home it has moved to the city to make it smart, many of the India’s city are the witnesses, and now the benefit has been sustained to the agriculture as well. All the [9] applications have been consuming the energy using cognitive radio networks and now the study takes the Network Lifetime Aware Routing Protocol that connects all the said technologies and logics (Figs. 3 and 4).
412
J. Sharma et al.
Fig. 3 CR surrounded by different RATs [7]
Fig. 4 Impact of connected life, understanding IoT [14]
4 Implications of NLARP The study has identified the available platform in the form of technology, model, algorithm and network using intellectual and fast cognitive radio networks and many more. A common phenomenon being [15] understood is to double Q-learning and the Internet of Things. All the mentioned technologies and logics have been applied to find the best outcome and measure its implication while implementing Network Lifetime Aware Routing Protocol (NLARP) to enhance energy in the cognitive radio network. The result of the analysis is to enhance the energy in this particular [16] network or routing plan where the following suggestions and intentions have been identified for the betterment.
34 A Study on the Implications of NLARP to Optimize Double Q-Learning …
413
4.1 Network Layer for IoT Devices And hence, below graphical representation describes that the [17] network has been layered in the IoT, being implanted in the devices we utilize on a device used in the day to day life. Both the figures have listed its implication on the devices where it has been applied (Figs. 5 and 6). The above figures depict the application of IoT in the regular or routine [19] usage of human life and advanced technology. The above figure is also the evidence of the state and the [20] implication of the desired NLARP devices that has been used for
Fig. 5 Network layer for IoT devices [18]
Fig. 6 Other IoT devices in the daily use [18]
414
J. Sharma et al.
Fig. 7 Value chain [23]
routine activity in human life. It also states that the device that consumes energy has been very successful in the conservation and in its performance as well.
4.2 The GSMA’s Visualization of IoT Services—The Linked Life The understanding of any network started and became more familiar among the people through the mobile services by the service providing [21] company. The figure below explains where the embedded system can be enhanced, and the application flow is based on the value chain created through the system created with the help of all the [22] technology or logic being adapted in the study (Fig. 7).
4.3 Matrix of Energy and Security Visualization by Using Payoff in Coalition Game The below table has made an [23] important impact using the network, and it synthesizes the impact on the energy and security while keeping the energy constant or decreasing as mentioned in table (Table 1).
34 A Study on the Implications of NLARP to Optimize Double Q-Learning …
415
Table 1 Based on energy and security related matrix representation using payoff in coalition game Payoff matrix
No interference
Interference with noise in selected network
There is no interference
Power and security are the same System and network security thing enhances with constant level of energy and also decreasing energy
Interference in the specified System and network security channel enhances with constant level of energy and also decreasing energy
System privacy increases with a way of decreasing energy
Interference with noise in indolent network system
System privacy increases with a way of decreasing energy
System and network security enhances with constant level of energy and also decreasing energy
Table 2 Spectrum utilization using Q-learning Monitoring of energy related with spectrum scenario
Method of coalition with Q-learning by means of battery with 50%
Method of non-coalition by means of battery with 50%
Energy spent
56.1
51.2
Energy consumption
49.3
47.1
Spectrum utilization packets sent
55,681
50,572
4.4 Spectrum Utilization Using Q-Learning As mentioned in the table which inculcates the information and the impact from the previous table that keeps the [23] spectrum as the key point. Energy and spectrum have been monitored by taking a method of coalition with Q-learning and the battery usage was also taken into consideration. The output [13] has brought a great change in the energy spent, consumed and the spectrum utilization, the table witnesses that there is a reduction in turn that benefits in the course of time (Table 2).
5 Conclusion The Network Lifetime Aware Routing Protocol has made a significant implication, while double Q-learning being utilized and the attempt of energy enhancement using cognitive radio network with Internet of Things being embedded were found positive. The implication has been measured right from the history stating that the spread of application is really vast almost in all segments. Furthermore, the application has
416
J. Sharma et al.
been moved from the mobile services and enters into the medical segment and the crux is the energy enhancement. While going through all the above history and the double Q-learning integration have made the concept more confident to take it forward, there has been a witness of saving and earning money in many aspects. Spectrum utilization has also made a great and positive impact in the course of time. The impact has been tested with the Q-learning system along with the utilization of the battery following two different strategies like coalition and non-coalition. All the implications have been found positive, and the energy will definitely be enhanced while following the logic, system technology and much more.
References 1. Bindhu V (2020) Constraints mitigation in cognitive radio networks using computing. J Trends Comput Sci Smart Technol 2(1):1–10 2. Gu Y, Chen H, Zhai C, Li Y, Vucetic B (2019) Minimizing age of information in cognitive radio-based IoT systems: underlay or overlay? IEEE Internet Things J 6:10273–10288 3. Azade Fotouhi MD (2021) Deep Q-learning for two-hop communications of drone base station. J Sens 21(6)1–14 4. Albaire NB (2021) Cognitive radio based internet of things: applications, challenges and future research aspects. Int J Eng Inf Syst 5(5):58–62 5. Wenli Ning XH (2020) Reinforcement learning enabled cooperative spectrum sensing in cognigive radio networks. J Commun Networks 22(1):12–21 6. Koushik AF (2019) Intelligent spectrum management based on transfer actor-critic learning for rateless transmissions in cognitive radio networks. J IEEE 1–11 7. Upadhye A, Saravanan P (19 June 2021) A survey on machine learning algorithms for applications in cognitive radio networks, arXiv:2106.10413v1 [eess.SP] 8. Macro Lombardi FP (2021) Internet of Things: a general overview between architectures, protocols and applications. J Inf 12(2):12–87 9. Thuslimbanu DK (2014) Spectrum holes sensing policy for cognitive radio network spectrum holes sensing policy for cognitive radio network. Int J Adv Res Comput Sci Technol 2(1):170– 175 10. Zhou JS (2020) Dependable scheduling for real-time workflows on cyber-physical cloud systems. IEEE Trans Ind Inf 109(1):1–10 11. Sharma DK (2018) A machine learning based protocol for efficient routing in opportunistic networks. IEEE Syst J 12(3):2207–2213 12. Jiang T (2011) Reinforcement learning-based spectrum sharing for cognitive radio. New York, Department of Electronics University of York 13. Zhang WZ (2018) Satellite mobile edge computing: improving QoS of high-speed satellite terrestrial networks using edge computing techniques. IEEE Network 97(c):70–76 14. https://www.gsma.com/iot/wp-content/uploads/2014/08/cl_iot_wp_07_14.pdf. Accessed 13 April 2022 15. Djamel Sadok CM (2019) An IOT sensor and scenario survey for data researchers. J Braz Comput Soc 25(4):2–17 16. Zikira HY (2020) Cognitive radio networks for internet of things and wirless sensor network. J Sens 20(5288):1–6 17. Liu XM (2021) Movement based solutions to energy limitation in wireless sensor networks: state of the art and future trends. IEEE Networks 9(1):188–193 18. Nilsson E, Anderson D (2018) Internet of things a survey about thoughts and knowledge. National Category Engineering and Technology
34 A Study on the Implications of NLARP to Optimize Double Q-Learning …
417
19. Wu Z (2020) Scheduling-guided automatic processing of massive hyperspectral image classification on cloud computing architectures. IEEE Trans Cybern 51(7):1–14 20. Marchese M, Patrone F (2018) Energy-aware routing algorithm for DTN-nanosatellite networks. In: Proceedings of IEEE global communications conference, Abu Dhabi 21. Zhao YM (2020) On hardware trojan-assisted power budgeting system attack targeting many core systems. J Syst Archit 109(10):1–11 22. Zhang WG (2017) IRPL: an energy efficient routing protocol for wireless sensor networks. J Syst Archit 11(3):35–49 23. Vimal Shanmuganathan LK (2021) EECCRN: energy enhancement with CSS approach using Q-learning and coalition game modelling in CRN. Inf Technol Control 50(1) 24. Suresh P (2014) A state of the art review on the internet of things (IoT) history, technology and fields of deployment. In: 2014 International conference on science engineering and management research (ICSEMR), pp 1–8 25. Jyoti Sharma SK (2020) Hybrid firefly optimization with double Q-learning for energy enhancement in cognitive radio networks. Int J Eng Res Technol 7(3):5227–5232 26. Deng XH (2020) Task allocation algorithm and optimization model on edge collaboration. J Syst Archit 110:1–14 27. Sun YZ (2019) An efficient and scalable framework for processing remotely sensed big data in cloud computing environments. IEEE Trans Geosci Remote Sens 4294–4308
Chapter 35
Automatic Generation Control Simulation Study for Restructured Reheat Thermal Power System Ram Naresh Mishra
1 Introduction A crucial component of improving the operation of the power system is automatic generation control (AGC). This is used to adjust the frequency in response to client power demands [1]. Basically, the bulky power systems contain the control areas indicating the coherent sets of generators, which is needed to monitor the frequency including tie-line power closer for fixed values. The system frequency is vulnerable by the load variations of the power system, and the reactive power has lower sensitivity due to the frequency deviations [2]. So, the control operations of active with reactive power could be independent. The frequency is not similar to power generation together with load requirement when the deviation occurs in frequency. Numerous studies have been published that consider various types of traditional AGC schemes [2, 3]. Companies for power generation, transmission, distribution, and individual power producers (IPPs) are the different firms that carry out their tasks with a monitoring unit as independent system operators in a restructured environment [3]. The constant level of power transmission triggers the power system functions to make it good and reliable. Because of the increasing number of industries, the power system became exceptionally complex. The variation of active power demand of industries creates the changes in system frequency. The unbalanced condition among the supply and load decreases the power system performance and makes the complex control operation
R. N. Mishra (B) GLA University, Mathura, UP 281406, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 R. P. Yadav et al. (eds.), Proceedings of the International Conference on Paradigms of Computing, Communication and Data Sciences, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-19-8742-7_35
419
420
R. N. Mishra
[4]. The control related issues of power systems mainly occur during the implementation process as well as operation process. The load frequency control (LFC) is also called frequency regulation of power systems that is important for automated generation control. The major goal of LFC is to diminish the frequency deviation, interchange tie-line power then to ensure the zero steady state errors. Christie and Bose [5] highlighted an increased system performance with better stability function under lower interruptions in the system. The implementation or operational difficulties become a major problem for the interconnected power system due to variation of structures, system size. The frequency as well as load demand should have a certain limit at any time to achieve reliable power delivery and increase the power system performance. The AGC provides the better solutions for achieving the reliable power delivery and increasing system performance but which is only possible through LFC. In addition, the AGC provides scheduled tie-line power as well as required system frequency to overcome inequality of the system. To overcome the inequality among the power generation and power demand, the set point of the power generating source is automatically changed by adjusting the speed [5]. In a deregulated environment, Donde et al. [6] applied AGC of interconnected energy networks by using the concepts of DPM as well as area participation factor (apf) to represent bilateral agreements. Nevertheless, several references to various aspects of load frequency control (LFC)/AGC of electrical systems in restructured environments have been found in the literature [4, 5]. Muthuraman et al. [7] discussed two-area power systems in a reformed environment based on PSO optimized LFC. Ranjan and Lal Bahadur [11] discussed integral control schemes for multi-area deregulated AGC systems. Hassan [12] investigated LFC models for conventional as well as smart power systems. The AGC of a linked two region power framework in a restructured condition is created in this research using a PSOA-optimized PID controller. The paper has been written as follows. Segment 2 presents the proposed power system model, and Sect. 3 discusses the use of the PSOA-optimized PID controller. Segment 4 focuses on the analysis and findings of the simulation. The specified work’s conclusion is described in phase 5.
2 Restructured Reheat Thermal Power System Model Two regionally reformed reheat thermal power systems are taken into consideration for this study’s examination. Figure 1 depicts the contracted power of DISCOs. Figure 2 displays the transfer function block diagram for this system. In [11], system data are provided. Ptie Scheduled = (Demand from GENCOs in control area 1 to DISCOs in control area 2) − (Demand from GENCOs in control area 2 to DISCOs in control area 1)
35 Automatic Generation Control Simulation Study for Restructured …
421
Fig. 1 Contracted power of DISCOs
Scheduled Ptie
=
3 4 i=1 j=3
Cpfij PLj −
6 2
Cpfij PLj
(1)
i=4 j=1
2π T 12 (F1 − F2) S
(2)
Error Actual Scheduled = Ptie − Ptie Ptie
(3)
actual Ptie =
Error decreases to zero. Here are provided the area control In a stable state, Ptie errors (ACEs) for two areas. Error ACE1 = B1F1 + Ptie
(4)
Error ACE2 = B2F2 + α12Ptie
(5)
Bi = constant for frequency bias of ith area (pu MW/Hz) and F i = frequency deviation of ith area (Hz), and i = 1, 2.a12 = size ratio of control area. Due of their apf for AGC, each region has three GENCOs and the ACE signal is transferred between them. The sum of each area’s apfs must, therefore, equal to 1, and each
422
R. N. Mishra
Fig. 2 Two-area restructured reheat thermal power system model
area’s variance in contracted local load demand can be stated as PL1LOC = PL1 + PL2
(6)
PL2LOC = PL3 + PL4
(7)
35 Automatic Generation Control Simulation Study for Restructured …
⎛
Cpf 11 Cpf 12 ⎜ Cpf 21 Cpf 22 ⎜ ⎜ ⎜ Cpf 31 Cpf 32 DPM = ⎜ ⎜ Cpf 41 Cpf 42 ⎜ ⎝ Cpf 51 Cpf 52 Cpf 61 Cpf 62
⎞ Cpf 13 Cpf 14 Cpf 23 Cpf 24 ⎟ ⎟ ⎟ Cpf 33 Cpf 34 ⎟ ⎟ Cpf 43 Cpf 44 ⎟ ⎟ Cpf 53 Cpf 54 ⎠ Cpf 63 Cpf 64
423
(8)
3 Application of PSOA for Tuning PID Controller To get the greatest outcomes, the PID controller’s primary job is to activate the feedback device. Three gains are the parameters of PID controller. Its transfer function is
1 u(s) = kP 1 + + td s g(s) = e(s) tI s where kI = ktIP and kD = kP tD Kennedy and Eberhart created the particle swarm optimization algorithm (PSOA), an optimization technique based on population [8]. To get the best outcomes; the PID controller is modified to obtain the controller gains based on the PSOA. It is said in the literature that ITAE is a superior objective function. Therefore, to optimize the gains of PID controllers utilizing PSOA, the goal function is chosen as integral time absolute error (ITAE). PSOA will continue to update based on this objective feature until there are no more iterations possible. The PSOA flowchart is displayed in Fig. 3 [9].
424
R. N. Mishra
Fig. 3 Flowchart of PSOA
4 Simulation Results and Discussion
4.1 Poolco-Based Transaction (PBT) In this instance, DISCO-1 and 2 have agreed that the load they are requesting from the GENCOs belongs to their respective areas. No GENCOs are required to supply power to the DISCOs in region 2. Therefore, there is no analogous cpfs in DPM.
35 Automatic Generation Control Simulation Study for Restructured …
425
A total load fluctuation that only affects area 1 (=0.01 pu MW), while total load fluctuation in area 2 is zero. The following mentions DPM for PBT [10]. ⎛
⎞ 0.3333 0.3333 0 0 ⎜ 0.3333 0.3333 0 0 ⎟ ⎜ ⎟ ⎜ ⎟ ⎜ 0.3333 0.3333 0 0 ⎟ DPM = ⎜ ⎟ ⎜ 0 0 0 0⎟ ⎜ ⎟ ⎝ 0 0 0 0⎠ 0 0 00 Figure 4 proves dynamic responses for deviation in tie-line power (pu MW), deviation in system frequency, and GENCOs power responses (pu MW) for both areas subjected to PBT. The overall performance of the PSO-tuned PID controller to manage variations in frequency and power of tie-line of PBT for restructured thermal (reheat) power system is proven in accordance with the time of settling including apex undershoot/overshoot.
4.2 Bilateral-Based Transaction (BBT) In this situation, every DISCO has a contract with every GENCO, and they all accept the terms of that contract [10]. ⎛
0.2 ⎜ 0.2 ⎜ ⎜ ⎜ 0.1 DPM = ⎜ ⎜ 0.2 ⎜ ⎝ 0.2 0.1
0.1 0.2 0.3 0.1 0.2 0.1
0.3 0.1 0.1 0.1 0.2 0.2
⎞ 0 0.1666 ⎟ ⎟ ⎟ 0.1666 ⎟ ⎟ 0.3336 ⎟ ⎟ 0.1666 ⎠ 0.1666
According to cpfs, each DISCO only requests 0.005 pu MW of power from the GENCOS in their region. As a result, each area’s overall load disturbance is 0.01 pu. In Fig. 5, which has been exposed to BBT, it is demonstrated that dynamic responses for deviations in system frequency for each area, deviations in tie-line power (pu MW), and GENCOs power responses (pu MW) for both areas. The overall performance of the PSO-tuned PID controller to manage variations in frequency and power of tie-line of the BBT for the restructured thermal (reheat) power system is tested in accordance with the time of settling including apex undershoot/overshoot.
426
Fig. 4 Dynamic responses for PBT
R. N. Mishra
35 Automatic Generation Control Simulation Study for Restructured …
427
Fig. 4 (continued)
4.3 Contract Violation-Based Transaction (CVBT) In this instance, DISCO-1 receives 0.003 pu MW of contract extensions power from the GENCOS in its region. As a result, the updated figure of area-1’s load demand is 0.013 pu MW [10], where 0.01 pu MW represents the total load disturbance that occurs in each region. Figure 6 shows dynamic responses for deviation in tieline power (pu MW), variance in system frequency (pu Hz) for each location, and GENCOs power responses (pu MW) for both areas. The overall effectiveness of the PSO-tuned PID controller to manage variations in frequency and power of tie-line of CVBT for the restructured thermal (reheat) power system is tested in accordance with the time of settling including apex undershoot/overshoot.
5 Conclusions In this paper, a PSOA-tuned PID controller is used to study the AGC of a twoarea reconstructed reheat power system. There are three types of power transactions that are taken into consideration; poolco, bilateral, and contract violations. Where
Fig. 5 Dynamic responses for BBT
Fig. 5 (continued)
When DISCOs break their contracts, the frequency deviations are larger; furthermore, when an agreement is broken, the response of the tie-line power deviation deteriorates. This simulation study demonstrates that the overall performance of the PSOA-tuned PID controller in regulating frequency and tie-line power deviations for the restructured thermal (reheat) power system is validated for poolco-, bilateral-, and contract violation-based power transactions. Performance is measured in terms of settling time and peak undershoot/overshoot. Additionally, the power generation of the GENCOs in each control area is adequate.
Fig. 6 Dynamic responses for CVBT
Fig. 6 (continued)
References

1. Elgerd OI (1971) Electric energy systems theory: an introduction, 2nd edn. McGraw Hill Education, New Delhi, India
2. Ibraheem, Kumar P, Kothari DP (2005) Recent philosophies of automatic generation control strategies in power systems. IEEE Trans Power Syst 20(1):346–357
3. Shayeghi H, Shayanfar HA, Jalili A (2009) Load frequency control strategies: a state-of-the-art survey for the researcher. Energy Convers Manage 50(2):344–353
4. Mishra RN, Chaturvedi DK, Kumar P (2020) Recent philosophies of AGC techniques in deregulated power environment. J Inst Eng India Ser B. https://doi.org/10.1007/s40031-020-00463-8
5. Christie RD, Bose A (1996) Load frequency control issues in power system operations after deregulation. IEEE Trans Power Syst 11:1191–1200
6. Donde V, Pai MA, Hiskens IA (2001) Simulation and optimization in an AGC system after deregulation. IEEE Trans Power Syst 16(3):481–489
7. Muthuraman, Priyadarsini A, Arumugom (2016) PSO optimized load frequency control of two area power system in deregulated environment. J Electr Electr Syst 5(4). https://doi.org/10.4172/2332-0796.1000210
8. Kennedy J, Eberhart RC (1995) Particle swarm optimization. In: Proceedings of IEEE international conference on neural networks, pp 1942–1948
9. Alam MN (2016) Particle swarm optimization: algorithm and its codes in MATLAB. https://doi.org/10.13140/RG.2.1.4985.3206
10. Sharma M, Dhundhara S, Arya Y (2020) Frequency stabilization in deregulated energy system using coordinated operation of fuzzy controller and redox flow battery. Int J Energy Res 7457–7473
11. Kumar R, Prasad LB (2021) Performance analysis of automatic generation control of multi-area restructured power system. In: ICAECT. https://doi.org/10.1109/ICAECT49130.2021.9392417
12. Alhelou HH, Hamedani-Golshan M-E, Zamani R, Heydarian-Forushani E, Siano P (2018) Challenges and opportunities of load frequency control in conventional, modern and future smart power systems: a comprehensive review. Energies 11:2497. https://doi.org/10.3390/en11102497
Chapter 36
Processing and Analysis of Electrocardiogram Signal Using Machine Learning Techniques

Gursirat Singh Saini and Kiranbir Kaur
1 Introduction The electrocardiogram (ECG) is a diagnostic tool that records the electrical activity of the heart as it pumps blood. Interpreting this electrical activity can help reveal various underlying abnormalities of the heart. The ECG can give significant information about a person's heart rhythm, increased thickness of the heart muscle, a past heart attack, indications of diminished oxygen delivery to the heart, and issues with conduction of the electrical current from one part of the heart to the next. The ECG poses no danger: no electricity is passed through the body, so there is no risk of shock. The ECG can be used to identify issues, if present, related to conduction of impulses or to contraction of the heart muscles due to damaged muscle fibers.
1.1 Conditions that Are Diagnosed with the ECG

1. It detects if the heart rate is abnormally fast.
2. It detects if the heart rate is abnormally slow.
3. If the waveform of the ECG deviates from normal, it can indicate underlying issues in the heart.
4. It can indicate a trace of a past heart attack.
5. Evidence of an evolving and acute cardiac attack.
6. It can depict whether any heart damage occurred due to reduced blood supply to the heart during a heart attack.
7. Heart diseases can have an effect on the heart and can be seen in the ECG.
8. Lung diseases such as emphysema can also show deviations in the ECG.
9. Congenital heart abnormalities can also be seen as variations of the ECG from the normal waveform. An abnormal wave can also suggest an imbalance of electrolytes such as sodium and potassium.
10. The magnitude of peaks can indicate inflammation or enlargement of the heart.
1.2 Limitations of the ECG The ECG is a static recording and does not show the dynamics of the heart's condition during different periods of the day. A person may have a heart problem while the ECG appears normal; in such a case, an ECG taken under stress can reveal the underlying problem. Sometimes variations in the ECG cannot be interpreted directly and may represent more than one issue in the heart, as the variations are not always specific. This can be resolved by doctor consultation and other cardiovascular tests.
2 ECG Signal Processing In the past few years, the electrocardiogram has played a vital role in heart disease detection, emotional and stress assessment, human computer interface (HCI), etc. Karthikeyan et al. [1] reported that digital signal processing and data analysis are the most commonly applied methods in biomedical engineering research. These methods find wide application in ECG signal processing in particular, and extensive research has been done in this field over the past 30 years. Researchers and industries across the world have progressed a lot and have been quite successful in acquiring the ECG signal and then processing it for detection and classification. The major aim of ECG signal processing is to improve accuracy and reproducibility and to extract features that are difficult to discern from the signal with the naked eye. At times the signal is recorded under stress conditions, and the ECG is corrupted by various kinds of noise originating from other physiological activities of the person. Therefore, noise reduction is a crucial aim of electrocardiogram signal processing; the noise can mask the waveforms of interest so strongly that their presence is revealed only after signal processing is first applied. Hamilton [2] pointed out that to identify intermittent disturbances in the rhythm of the heart, electrocardiographic signals may be recorded for several days. Hence, a very large amount of data comes
out from the ECG recording and quickly fills the available storage space. Another application that involves a large amount of data is the transmission of signals across public telephone networks. Data compression is therefore an important step in both situations and is another important aim of ECG signal processing. Liu et al. [3] explained that signal processing has contributed significantly to the current understanding of the electrocardiogram and its changing features as determined by beat morphology and changes in rhythm; a standard printed ECG cannot reveal either of these oscillatory signal properties. A few fundamental algorithms have been developed that process the signal in the presence of different kinds of artifacts and noise, detect heartbeats, extract basic ECG measurements of wave durations and amplitudes, and compress the data for better transmission and storage. These steps are common to various kinds of ECG analysis, such as ambulatory testing, stress monitoring, resting ECG interpretation, or intensive care monitoring, as reported by Sornmo and Laguna [4]. Sometimes these algorithms are incorporated into other algorithms, without disturbing their sequence, to improve performance. The noise present in ambulatory ECG scanning is much higher than that in a resting ECG; thus, the complexity of the algorithms changes with the application of the ECG diagnostic system. ECG signal analysis yields relevant data about the features of the signal under analysis, and these data can be further processed by higher-level, application-specific algorithms to obtain the desired conclusions about the morphology of the heart.
3 Significance of Denoising The noise present in the ECG signal can hinder accurate interpretation of the signal. The noises are of three types, as mentioned above. Denoising the ECG signal yields a clean signal that carries a great deal of information about the morphology of the heart, which can be used to interpret any underlying heart problems precisely. Recently, with the advent of MATLAB and its increasing popularity, MATLAB tools and functions can be used to devise codes that reduce unwanted noise in the waveform; hence, new techniques for denoising the ECG signal have been implemented using MATLAB. Noises such as baseline wander and power line interference can be removed by designing specific filters. The EMG noise overlaps with the QRS complex in the spectral domain and is therefore more difficult to remove from the signal; but since the ECG is recurrent, more ECG cycles can be taken to reduce the misinterpretation caused by EMG noise. During pre-processing, the signal is filtered to obtain a high-quality signal for feature extraction. Filtering should not affect the information in the ECG but only the noise. Jeyarani and Singh [5] explained the use of three different filters for the three major types of noise in the ECG signal. The frequency of baseline wander is low, with a frequency band below 1 Hz; this noise can be removed by using a high pass filter with a cut-off frequency near 1 Hz.
The power line interference noise can be reduced by using a notch filter with the notch at 50 Hz. A moving average filter can be used to reduce high frequency noise; it smooths the signal by averaging the values around each instant of time. A sketch of this three-filter chain is given below.
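As an illustration, the following sketch applies the three filters described above to a sampled trace using SciPy; the sampling rate, filter orders, and the synthetic test signal are assumed values for demonstration only.

```python
import numpy as np
from scipy.signal import butter, filtfilt, iirnotch

fs = 360.0  # assumed sampling rate in Hz

def remove_baseline_wander(x):
    # High pass Butterworth filter with cut-off near 1 Hz
    b, a = butter(2, 1.0 / (fs / 2), btype="highpass")
    return filtfilt(b, a, x)

def remove_powerline(x, f0=50.0, q=30.0):
    # Notch filter centred at the 50 Hz mains frequency
    b, a = iirnotch(f0, q, fs)
    return filtfilt(b, a, x)

def moving_average(x, n=5):
    # Simple moving average to suppress high frequency noise
    return np.convolve(x, np.ones(n) / n, mode="same")

# Synthetic demonstration signal: slow "ECG-like" tone + drift + mains hum
t = np.arange(0, 10, 1 / fs)
ecg = (np.sin(2 * np.pi * 1.0 * t) + 0.5 * np.sin(2 * np.pi * 0.2 * t)
       + 0.2 * np.sin(2 * np.pi * 50.0 * t))
clean = moving_average(remove_powerline(remove_baseline_wander(ecg)))
print(clean[:5])
```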
4 Related Works Over the years, numerous papers have reported studies of ECG signals using various techniques. Chazal and Reilly [6] reported a study focused on a patient-adapting heartbeat classifier using ECG morphology and heartbeat interval features. Übeyli [7] illustrated ECG beat classification using multiclass support vector machines with error correcting output codes and highlighted the advantages of multiclass support vector machines. Pourbabaee and Lucas [8] studied automatic detection and prediction of paroxysmal atrial fibrillation using ECG signal feature classification methods and described the merits of this technique. Melgani and Bazi [9] reported a study of ECG classification employing the particle swarm optimization method together with support vector machine classification. They applied these methods to detect the working of the heart in different forms such as normal, atrial premature beat, ventricular premature beat, paced beat, and right and left bundle branch block; the analysis considered data from the MIT-BIH arrhythmia database for twenty patients, with around 40,438 pulses in total, and a highest accuracy level of 89.72%. Llamedo and Martinez [10] described heartbeat classification using feature selection driven by database generalization criteria and presented significant findings that may play a very important role in predicting the actual problem in a patient's heart. Zia et al. [11] discussed the role of efficient and simplified adaptive noise cancellers for ECG sensor-based remote health monitoring; there, noise cancelation from ECG signals is performed using error normalization-based adaptive filters on real signals with different artifacts obtained from the MIT-BIH database, with computational complexity measured in multiply-and-accumulate (MAC) operations and performance in signal-to-noise ratio (SNR). Swathi et al. [12] presented a study of R peak detection and feature extraction for the diagnosis of heart diseases and detected the R peaks in the denoised ECG signal with 97.56% accuracy; they achieved 80% classifier accuracy by applying an algorithm to the MIT-BIH arrhythmia database. An investigation focusing on computational techniques for ECG analysis and interpretation, in light of their contribution to medical advances, was presented by Lyon et al. [13]. A big data classification approach using LDA with an enhanced SVM method for ECG signals in the cloud was reported by Varatharajan et al. [14]. They used finite impulse response (FIR) and infinite impulse response (IIR) filters to remove unwanted noise and achieved 92% accuracy by employing a support vector machine (SVM) with linear discriminant analysis (LDA).
5 Various Techniques Used for Analysis and Prediction 5.1 Wavelet Transform Technique Earlier methods of analyzing ECG signals were based on time-domain analysis, but the ECG also carries information in the frequency domain. To extract the information present in the frequency domain, the fast Fourier transform (FFT) is used. The shortcoming of this method is that the exact location of a frequency component in the time domain cannot be found, so a time-frequency representation is not possible. For this, the short-time Fourier transform (STFT) is used to obtain a time-frequency representation of the ECG signal; its drawback is that the STFT does not give a very precise time-frequency representation. This disadvantage is overcome by the wavelet transform, which decomposes the signal into coefficients. Karthikeyan et al. [15] described that the wavelet transform has its own time and frequency resolution. These coefficients can be used to analyze the ECG signal, as they carry information about both time and frequency bands, making the transform suitable for denoising the ECG. Saritha et al. [16] highlighted that wavelets can be broadly classified into two major types: the continuous wavelet transform (CWT) and the discrete wavelet transform (DWT). A minimal sketch of DWT-based denoising follows.
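The sketch below shows DWT denoising via wavelet thresholding, assuming the PyWavelets package; the wavelet family, decomposition level, and threshold rule are illustrative choices, not those fixed by the study.

```python
import numpy as np
import pywt  # PyWavelets

def wavelet_denoise(signal, wavelet="db4", level=4):
    # Decompose the signal into approximation and detail coefficients
    coeffs = pywt.wavedec(signal, wavelet, level=level)
    # Estimate the noise level from the finest detail band (robust MAD estimate)
    sigma = np.median(np.abs(coeffs[-1])) / 0.6745
    thresh = sigma * np.sqrt(2 * np.log(len(signal)))  # universal threshold
    # Soft-threshold every detail band, keep the approximation untouched
    coeffs[1:] = [pywt.threshold(c, thresh, mode="soft") for c in coeffs[1:]]
    return pywt.waverec(coeffs, wavelet)[: len(signal)]

noisy = np.sin(np.linspace(0, 8 * np.pi, 1024)) + 0.3 * np.random.randn(1024)
print(wavelet_denoise(noisy)[:5])
```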
5.2 Machine Learning and Predictive Analysis Machine learning is a branch of artificial intelligence in which predictive models are generated from a training dataset. The training dataset consists of known outcomes, and the models are built from the known properties of the training data. When these models are applied to a test set of ECG signals (unscreened signals), they yield a probable label of normality or abnormality for each ECG signal. In our study, we have employed a machine learning approach to generate binary classifiers that can classify a set of ECG signals as normal or abnormal. The following techniques have been used for this purpose. Weka uses a 2 × 2 confusion matrix consisting of four parts: true positives (TP) for normal signals correctly classified as normal, false positives (FP) for abnormal signals incorrectly predicted as normal, true negatives (TN) for abnormal signals classified as abnormal, and false negatives (FN) for normal signals incorrectly classified as abnormal. Since false negatives are more important, misclassification costs are set on false negatives to minimize their number. However, increasing the cost of false negatives simultaneously increases the false positive rate, so we have placed a limit of 20% to control the rate of false positives: the misclassification cost for false negatives is incremented until the rate of false positives reaches 20%. The misclassification
cost setting in Weka depends on the base classifier used. We will use the following four classifiers. Naive Bayes classifier. Based on Bayes' theorem, it assumes that the presence of one descriptor has no effect on the others, as all descriptors are treated as independent; the overall class probability is taken as the product of all the descriptor-based probabilities. Random forest classifier. Developed by Leo Breiman, it is an ensemble classifier that uses multiple decision trees, with the output taken as the mode of the individual trees' outputs; it is often the most accurate of these classifiers. J48 (implementation of the C4.5 decision tree learner). Developed by J. Ross Quinlan, it builds a decision tree in which one attribute of the data is selected and the data is split into subsets, one for every value of the attribute; the decision is made by the attribute carrying the maximum information. Decision tables. A decision table is a visual representation for selecting or specifying the tasks to be performed based on conditions; it represents conditional logic by creating a list of tasks. The proposed method in Fig. 1 follows a methodology that starts with dataset collection, for which the MIT-BIH dataset has been used. This is followed by denoising of the data as described above and by detection of peaks and features. Finally, machine learning classifiers are applied to the extracted features to decide the normality or abnormality of the ECG signal and to detect diseases in the case of an abnormal ECG; a sketch of such a classifier comparison is given below. Fig. 1 Method flow diagram
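As an illustration of the classifier comparison described above, the sketch below trains scikit-learn counterparts of three of the four classifiers (Gaussian Naive Bayes, random forest, and a decision tree standing in for J48) on hypothetical feature vectors. The feature matrix and labels are placeholders, and cost-sensitivity is approximated with class weights rather than Weka's cost matrix.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix

# Placeholder data: rows are ECG feature vectors (e.g. intervals, amplitudes),
# labels are 1 = normal, 0 = abnormal.
rng = np.random.default_rng(42)
X = rng.normal(size=(400, 6))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)
Xtr, Xte, ytr, yte = train_test_split(X, y, test_size=0.25, random_state=0)

classifiers = {
    "Naive Bayes": GaussianNB(),
    # class_weight penalizes mistakes on normal beats more, loosely mimicking
    # a higher misclassification cost on false negatives
    "Random forest": RandomForestClassifier(class_weight={0: 1, 1: 3}),
    "Decision tree (C4.5-like)": DecisionTreeClassifier(class_weight={0: 1, 1: 3}),
}
for name, clf in classifiers.items():
    clf.fit(Xtr, ytr)
    tn, fp, fn, tp = confusion_matrix(yte, clf.predict(Xte)).ravel()
    print(f"{name}: TP={tp} FP={fp} TN={tn} FN={fn}")
```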
6 Results and Discussion 6.1 Step-Wise Results for a Normal ECG Signal Figure 2 depicts the original ECG signal given as input, which is used for denoising and for detecting whether it is normal or abnormal. For this purpose, we perform denoising of the signal using the discrete wavelet transform technique, and the denoised signal is shown in Fig. 3. In Fig. 4, removal of baseline wander is shown for the ECG signal obtained in Fig. 3; this removes low frequency artifacts caused by breathing, electrically charged electrodes, body movements, etc., by means of specifically designed filters. Fig. 2 Zoomed original signal
Fig. 3 Denoised signal
After obtaining a fully denoised ECG signal, peaks are detected to extract the features, and by applying machine learning classifiers to these features, the algorithm decides on the normality or abnormality of the ECG signal. Figure 5 depicts the normality of the signal obtained by denoising the original ECG signal.
Fig. 4 ECG after baseline wandering removal with new baseline at 0 mV
Fig. 5 Detected peaks and final result showing that the ECG signal is normal and the denoising percentage
Table 1 Conditions for detecting various heart diseases (s = seconds, mV = millivolts)

Ventricular tachycardia: R-R interval < 0.6 s and QRS interval > 0.12 s
Long Q-T syndrome: Q-T interval > 0.57 s
Sinus bradycardia: R-R interval > 1 s or P-P interval > 1 s
Hyperkalemia: Q-T interval < 0.35 s and tall T (Tamp > 0.4 mV)
Hypokalemia: Q-T interval > 0.43 s and flat T (Tamp < 0.05 mV)
Hypercalcemia: Q-T interval < 0.35 s
Hypocalcemia: Q-T interval > 0.43 s
First degree atrio-ventricular block: P-R interval > 0.20 s
Right atrial enlargement (RAE): Pamp > 0.25 mV
Myocardial ischemia: Tamp > 0.5 mV
Atrial flutter: P-P or R-R interval < 0.6 s and QRS interval < 0.12 s and regular tachycardia and visible P waves and atrial rate > ventricular rate
6.2 Results for Abnormal Signals After listing the diseases and the corresponding conditions that are sufficient to detect them, an algorithm was written for detection using basic "if-else" statements. Table 1 lists the diseases along with the conditions that are sufficient to detect them; following this information, code was generated for detecting the listed diseases, and if the conditions match in the input signal, the corresponding disease is reported. The steps remain the same for abnormal signals; the only difference is that the detection of the disease must be done accurately, and the output should display the name of the disease(s) after the program is run on the input signal. Here is a result for one of the input signals, showing possible diseases. In Fig. 6, as the height of the P-wave is more than 0.25 mV for some of the P peaks, right atrial enlargement has been diagnosed; as the height of the T-wave is more than 0.5 mV in most cases, myocardial ischemia has been diagnosed based on the conditions described in Table 1. Either of the two diseases can be confirmed by further medical tests. A sketch of such an if-else rule set is given below.
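The following sketch encodes a few rows of Table 1 as simple if-else rules over extracted interval and amplitude features; the feature names and example values are hypothetical, while the thresholds are taken directly from Table 1.

```python
def diagnose(f):
    """Return the diseases whose Table 1 conditions hold for the feature
    dict f (intervals in seconds, amplitudes in millivolts)."""
    findings = []
    if f["rr"] < 0.6 and f["qrs"] > 0.12:
        findings.append("Ventricular tachycardia")
    if f["qt"] > 0.57:
        findings.append("Long Q-T syndrome")
    if f["rr"] > 1.0 or f["pp"] > 1.0:
        findings.append("Sinus bradycardia")
    if f["qt"] < 0.35 and f["t_amp"] > 0.4:
        findings.append("Hyperkalemia")
    if f["pr"] > 0.20:
        findings.append("First degree atrio-ventricular block")
    if f["p_amp"] > 0.25:
        findings.append("Right atrial enlargement (RAE)")
    if f["t_amp"] > 0.5:
        findings.append("Myocardial ischemia")
    return findings or ["Normal by these rules"]

# Hypothetical feature values extracted from one beat
features = {"rr": 0.8, "pp": 0.8, "qrs": 0.09, "qt": 0.38,
            "pr": 0.16, "p_amp": 0.3, "t_amp": 0.6}
print(diagnose(features))
# -> ['Right atrial enlargement (RAE)', 'Myocardial ischemia'], matching Fig. 6
```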
7 Conclusion This study aimed to process ECG signals and analyze them so as to detect whether a signal is normal or abnormal. Exhaustive research and study were done before the initialization of the project. After having reviewed a slew of research
Fig. 6 Detection of right atrial enlargement (RAE) and myocardial ischemia
papers related to biomedical engineering and ECG signal processing, we started the project, and we finally have positive results. We have developed codes which first generate the ECG signal in MATLAB, then denoise the signal, remove the baseline wander, and detect the peaks; finally, machine learning algorithms are applied to the extracted dataset for computational prediction of normal and abnormal ECG signals and for detecting the disease if abnormal. We have been successful in denoising the original ECG signal completely using the wavelet transform method. The peaks and all the diseases for which we have made algorithms are detected successfully and accurately. This research work can potentially be utilized for real-time detection of diseases if implemented on hardware and can be very useful for doctors and organizations in the medical field.
References

1. Karthikeyan P, Murugappan M, Yaacob S (2011) Review on stress inducement stimuli for assessing human stress using physiological signals. In: Taib MN, Adnan R, Samad AM, Tahir NM, Hussain Z, Rahiman MHF (eds) 2011 IEEE 7th international colloquium on signal
processing and its applications. Penang, Malaysia, pp 420–425
2. Hamilton P (2002) Open source ECG analysis software documentation. E.P Limited, Somerville, USA, pp 101–104
3. Liu X, Zheng Y, Phyu MW, Endru FN, Navaneethan V, Zhao B (2012) An ultra-low power ECG acquisition and monitoring ASIC system for WBAN applications. IEEE J Emerg Sel Top Circuits Syst 2(1):60–70
4. Sornmo L, Laguna P (2006) Electrocardiogram (ECG) signal processing. Wiley Encyclopedia Biomed Eng 1–16
5. Jeyarani AD, Singh TJ (2010) Analysis of noise reduction techniques on QRS ECG waveform by applying different filters. In: Recent advances in space technology services and climate change 2010 (RSTS & CC-2010), pp 149–152
6. de Chazal P, Reilly RB (2006) A patient-adapting heartbeat classifier using ECG morphology and heartbeat interval features. IEEE Trans Biomed Eng 53:2535–2543
7. Übeyli ED (2007) ECG beats classification using multiclass support vector machines with error correcting output codes. Digit Signal Process 17:675–684
8. Pourbabaee B, Lucas C (2008) Automatic detection and prediction of paroxysmal atrial fibrillation based on analyzing ECG signal feature classification methods. In: 2008 Cairo international biomedical engineering conference, pp 1–8
9. Melgani F, Bazi Y (2008) Classification of electrocardiogram signals with support vector machine and particle swarm optimization. IEEE Trans Inf Technol Biomed 12(5):667–677
10. Llamedo M, Martinez JP (2011) Heartbeat classification using feature selection driven by database generalization criteria. IEEE Trans Biomed Eng 58:616–625
11. Rahman MZUR, Shaik RA, Reddy DVR (2012) Efficient and simplified adaptive noise cancellers for ECG sensor based remote health monitoring. IEEE Sens J 12(3):566–573
12. Swathi ON, Ganesan M, Lavanya R (2017) R peak detection and feature extraction for the diagnosis of heart diseases. In: 2017 International conference on advances in computing, communications and informatics (ICACCI). Manipal, India, pp 2388–2391
13. Lyon A, Minchole A, Martinez JP, Laguna P, Rodriguez B (2018) Computational techniques for ECG analysis and interpretation in light of their contribution to medical advances. J R Soc Interface 15(138):20170821
14. Varatharajan R, Manogaran G, Priyan MK (2018) A big data classification approach using LDA with an enhanced SVM method for ECG signals in cloud computing. Multimedia Tools Appl 77(8):10195–10215
15. Karthikeyan P, Murugappan M, Yaacob S (2012) ECG signal denoising using wavelet thresholding techniques in human stress assessment. Inter J Electr Eng Inf 4(2):306–319
16. Saritha C, Sukanya V, Murthy YN (2008) ECG signal analysis using wavelet transforms. Bulg J Phys 35:68–77
Chapter 37
Design of High Voltage Gain DC-DC Converter with Fuzzy Logic Controller for Solar PV System Under Dynamic Irradiation Conditions

CH Hussaian Basha, G. Devadasu, Nikita Patil, Abhishek Kumbhar, M. Narule, and B. Srinivasa Varma
1 Introduction At present, the usage of nonrenewable energy sources for power generation is decreasing drastically because of their limited availability on Earth. From the literature review, conventional energy sources include oil, coal, fuel wood, thermal, and nuclear [1]. The disadvantages of conventional energy sources are compensated by applying renewable power resources. The most popular nonconventional resources are geothermal, hydropower, tidal, marine energy, solar, and wind [2–5]. In this article, solar power is used to supply electricity to the grid. Solar is a most attractive and popular source because of its unlimited accessibility in the environment; its features are good flexibility, zero noise generation, and high abundance. A solar PV cell works comparably to a basic P–N junction diode: photons strike the P and N semiconductors so that free electrons move from one region to another [6]. The potential generated by a single cell is 0.75 V to 0.8 V, which is not useful to customers by itself. To obtain a higher voltage rating, cells are connected in series and parallel to form a module; several modules form a panel, and the interconnection of panels forms an array. If the supply voltage
Fig. 1 Structure of PV with DC-DC converter for grid-connected solar system
required by the customer is high, then many modules are connected in series; otherwise, they may be connected in parallel [7]. From the literature review, the classifications of PV panels are polycrystalline silicon, carbon-nanotube, graphene, thin film, monocrystalline, and cesium. Among all of these, the monocrystalline manufacturing technique gives the highest efficiency [8], so most researchers work on monocrystalline PV cell technologies. There are many research topics in solar PV power systems, including maximum power point tracking design, the type of PV cell circuit used, and the interfacing of converters, inverters, and load. Here, the major focus is the MPPT methodology. Basically, solar power systems exhibit nonlinear voltage and current characteristics under widely varying irradiation conditions, so finding the operating power point of the PV is one of the major tasks in PV grid-connected systems [9]. In this article, a flexible MPPT controller is designed to transfer the maximum power from source to load under dynamic irradiation conditions. The schematic structure of the proposed PV system is given in Fig. 1. Another disadvantage of PV is its high per-unit power installation and generation cost, which is mitigated by using a boost converter. As of now, two types of converter topologies are recommended to step up the voltage: isolated, with a transformer, and non-isolated, without a transformer [10]. Isolated converters require an additional rectifier in the circuit, so the cost of the converter circuit is high and it requires more space in the PV system. To overcome this disadvantage, in this article an inductor-coupled, non-isolated, single-switch, high voltage gain DC-DC converter is applied to improve the potential conversion ratio of the PV system.
2 Design of Solar PV Array As discussed previously, PV cells have been manufactured using different silicon materials. In this article, the PV model is designed by utilizing
Fig. 2 Mathematical modeling of PV
monocrystalline silicon. From the literature study, there are plenty of PV cell circuit topologies available. The basic PV cell models are the ideal diode, the single diode-based PV cell, the dual diode-based PV cell, and the triple diode-based PV cell [11]. In this work, a single diode PV cell is applied to implement the PV array; the attractive features of this PV cell are its easy design and implementation. The variables considered for the design of the single diode PV cell are the series resistance (Rs), the diode diffusion current factor (I0), the parallel resistance (Rp), the cell output current (IPV), and the air mass (a). From Fig. 2, the PV array output current is obtained as

$$I_{PV} = I_{pv} - I_0\left[\exp\!\left(\frac{V_{pv} + I_{pv}R_s}{n_s V_t}\right) - 1\right] - \frac{V_{pv} + I_{pv}R_s}{R_p} \quad (1)$$

$$V_t = \frac{A\,k\,T_{STC}}{q} \quad (2)$$

From Eqs. (1) and (2), IPV and VPV are the PV array output current and output voltage, respectively. Similarly, ns and Vt are the total number of series-connected cells per string and the PV thermal voltage. Here, TSTC and A are the junction operating temperature at standard test condition and the quality factor of the P–N diode. From Eq. (1), the PV system parameters are derived as follows:
$$I_{sc} = I_{pv} - I_0\left[\exp\!\left(\frac{I_{sc}R_s}{n_s V_t}\right) - 1\right] - \frac{I_{sc}R_s}{R_p} \quad (3)$$

$$0 = I_{pv} - I_0\left[\exp\!\left(\frac{V_{oc}}{n_s V_t}\right) - 1\right] - \frac{V_{oc}}{R_p} \quad (4)$$

$$I_{MPPT} = I_{pv} - I_0\left[\exp\!\left(\frac{V_{MPPT} + I_{MPPT}R_s}{n_s V_t}\right) - 1\right] - \frac{V_{MPPT} + I_{MPPT}R_s}{R_p} \quad (5)$$
From the nonlinear I–V and P–V characteristics, at the peak power point of the solar cell, the derivative of power with respect to voltage vanishes, as given in Eq. (6). Similarly, at short circuit the derivative of current with respect to voltage equals the negative reciprocal of the shunt resistance, as given in Eq. (7). The detailed design constraints of the solar array are given in Table 1.
Table 1 Solar array designed parameters

Peak to peak power (PMPPT): 5.188 kW
Voltage of PV at open circuit (Voc): 497.38 V
Peak to peak voltage (VMPPT): 402.0 V
Total parallel strings (Np): 2
Total series cells for each string (ns): 24
Maximum peak current (IMPPT): 12.89 A
Resistance at parallel (Rp): 86.803 Ω
Resistance at series (Rs): 0.27328 Ω
$$\left.\frac{dP_{pv}}{dV}\right|_{V = V_{MPPT}} = 0 \quad (6)$$

$$\left.\frac{dI}{dV}\right|_{I = I_{sc}} = -\frac{1}{R_p} \quad (7)$$
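To make Eq. (1) concrete, the sketch below solves the implicit single-diode equation for the array current over a voltage sweep using SciPy; the diode parameters are illustrative placeholders, not the Table 1 values.

```python
import numpy as np
from scipy.optimize import brentq

# Illustrative single-diode parameters (placeholders, not the Table 1 values)
I_ph = 13.0              # photo-generated current (A)
I_0 = 1e-9               # diode saturation current (A)
R_s, R_p = 0.27, 86.8    # series and parallel resistances (ohm)
V_T = 2.0                # effective string thermal voltage n_s*A*k*T_STC/q (V)

def array_current(V):
    """Solve the implicit Eq. (1) for I at a given terminal voltage V."""
    f = lambda I: (I_ph - I_0 * (np.exp((V + I * R_s) / V_T) - 1.0)
                   - (V + I * R_s) / R_p - I)
    return brentq(f, 0.0, 2.0 * I_ph)

volts = np.linspace(0.0, 45.0, 46)
powers = [v * array_current(v) for v in volts]
i_best = int(np.argmax(powers))
print(f"approximate MPP: V = {volts[i_best]:.1f} V, P = {powers[i_best]:.1f} W")
```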
3 Analysis of Single Switched Power Converter In most articles, the PV array power is directly fed to the inverter for converting direct current to alternating current, as its desirable characteristics are small size, fair transmission power losses, high efficiency, and low cost; however, this gives a low utilization factor of the solar PV [12]. Here, the coupled inductor concept is used in the proposed converter to raise the voltage conversion ratio of the supply system, and its interfacing between the supply and load is a challenging task. The advantage of the coupled inductor technique is that it helps obtain a fast and accurate dynamic response. The proposed structure of the converter is given in Fig. 3.
Fig. 3 Single switch inductor coupled converter
Table 2 DC-DC coupled converter and DC-AC values

rIN (internal input resistance of converter): 32.2 mΩ
cs2 (secondary coupled capacitor): 11.0 mF
cin (primary coupled capacitor): 11.00 mF
lot (output inductor): 14.13 mH
cs1 (secondary capacitor): 9.98 mF
li (input inductor): 15.60 mH
rot (output resistor): 72.01 mΩ
rtotal (total equivalent resistor): 0.278 Ω
lx (magnetizing inductor): 12.88 mH
cx (source side capacitor): 0.62 mF
From Fig. 3, it is clearly noticed that the converter boosts the solar system's output power without adjusting the duty cycle of the converter. Here, the windings are placed on the two outer limbs of the core to obtain the high voltage gain of the converter, and a small gap is provided between the two limbs of the inductor to avoid flux dispersion. The design parameters of the converter are given in Table 2. The inductor core is designed with low magnetic ripple to operate the converter at high efficiency. From Fig. 3, the terms Rin and Rot are the internal resistances of the primary and secondary windings. The switch is selected as an insulated gate bipolar transistor (IGBT). The benefits of the power converter are high flexibility, moderate electromagnetic interference, improved efficiency, and low switching plus conduction power losses. The output voltage of the converter and the corresponding primary and secondary inductances are derived in terms of the transformer turns ratio:
$$\frac{V_0}{V_{PV}} = \frac{1 + N_{ratio}\,D}{D} \quad (8)$$

$$L_i = \frac{L_{total}}{\left(1 + N_{ratio}\right)^2} \quad (9)$$

$$L_{0t} = \frac{N_{ratio}^2\,L_{total}}{\left(1 + N_{ratio}\right)^2} = N_{ratio}^2\,L_i \quad (10)$$

$$R_{in} = \frac{R_{total}}{1 + N_{ratio}} \quad (11)$$

$$R_{0t} = \frac{N_{ratio}\,R_{total}}{1 + N_{ratio}} \quad (12)$$
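As a quick check of Eqs. (9)–(12), the snippet below computes the split of a total winding inductance and resistance for an assumed turns ratio; the numeric inputs are placeholders that only loosely echo Table 2.

```python
# Split a coupled winding's total inductance/resistance between primary and
# secondary sides per Eqs. (9)-(12); inputs are illustrative placeholders.
N_ratio = 1.0                 # assumed winding turns ratio
L_total = 30.0e-3             # total winding inductance (H)
R_total = 0.278               # total equivalent resistance (ohm)

L_i = L_total / (1 + N_ratio) ** 2          # Eq. (9): input inductor
L_0t = N_ratio ** 2 * L_i                   # Eq. (10): output inductor
R_in = R_total / (1 + N_ratio)              # Eq. (11): primary resistance
R_0t = N_ratio * R_total / (1 + N_ratio)    # Eq. (12): secondary resistance

print(f"L_i = {L_i*1e3:.2f} mH, L_0t = {L_0t*1e3:.2f} mH")
print(f"R_in = {R_in*1e3:.1f} mOhm, R_0t = {R_0t*1e3:.1f} mOhm")
```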
From Eqs. (8)–(10), the terms Li, L0t, and Ltotal are the primary winding, secondary winding, and total winding inductances, and their corresponding internal resistances are Rin, R0t, and Rtotal. The turns ratio of the coupled windings is denoted Nratio. The design values of the inverter and the sliding controller are given in Table 3.

Table 3 Design values of sliding technique

C(s) (integral output): 1.01
Ki (integral gain): 5.4
A (error signals): 0.20
ωh (high pass frequency): 10 rad/s
ω (reference frequency): 100 rad/s
Slider surface: 1.5
3.1 Sliding Controller for Single Switch Converter In this work, a low order filter is used to obtain a good time-varying response of the system. The advantages of the low order filter are wide input and output operation, a high voltage conversion ratio of the converter, a highly stable output voltage, and an effective capability to handle nonlinear behavior. This low pass filter also filters the harmonics of the converter's output DC-link capacitor. Based on the proper selection of state variables, the magnetizing current of the converter's coupled winding is determined, as given in Eq. (13).

$$I_{mag} = \begin{cases} I_i & t \in T_{on} \\ I_i\,(1 + N) & t \in T_{off} \end{cases} \quad (13)$$
From Eq. (13), the on time of the IGBT is Ton = D·T, and the blocking time period of the switch is Toff = (1 − D)·T. Finally, the total switching period covering forward-bias and reverse-bias operation is T = Ton + Toff.
3.2 Design and Analysis of Adaptive MPPT Controller As discussed previously, conventional power point tracking controllers are not suitable for the varying insolation levels of PV arrays. Here, an MPPT controller is proposed for obtaining the maximum power of the PV module; under diverse time-varying irradiation conditions, the adaptive controller works based on the rotation of the PV arrays. The demodulator angle (φd) and integrator gain (Ki) are selected as 1 and 5.4, respectively. A lead compensator, denoted A(s), is used for the correction of error signals. A generic sketch of duty-cycle-based MPPT updating is given below.
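The adaptive controller itself is not fully specified here, so the sketch below shows the common perturb-and-observe scheme as a stand-in to illustrate how an MPPT loop updates the converter duty cycle from PV voltage/power measurements; the step size, limits, and toy P–V curve are assumptions.

```python
def perturb_and_observe(v, p, state, step=0.005, d_min=0.1, d_max=0.9):
    """One MPPT iteration: nudge the duty cycle in the direction that
    increased PV power on the previous step (perturb and observe)."""
    d, v_prev, p_prev = state
    if p >= p_prev:
        # Power rose: keep perturbing in the same direction the voltage moved
        d += step if v >= v_prev else -step
    else:
        # Power fell: reverse the perturbation direction
        d += -step if v >= v_prev else step
    d = min(max(d, d_min), d_max)
    return d, (d, v, p)

# Toy usage: fake measurements on a hypothetical P-V curve peaking at 402 V
state = (0.5, 0.0, 0.0)
for v in [380, 390, 400, 405, 402]:
    p = -(v - 402) ** 2 + 5188
    duty, state = perturb_and_observe(v, p, state)
print("duty after a few iterations:", round(duty, 3))
```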
4 Design and Analysis of Two-Leg Inverter The design of the proposed PV-fed grid-connected system is given in Fig. 4 [13]. In this work, a two-leg power inverter circuit is used to supply the PV power to the load continuously. From Fig. 4, phase 'x' supplies the line voltage Vxy, and phase 'y' supplies Vyx; the root mean square value of the inverter voltage is calculated from the sum of Vxy and Vyx. The inverter produces phase voltages denoted Vpx and Vpy with corresponding phase currents Ipx and Ipy. The filter inductor (Lx) is connected in series with the line, and the capacitor (Cx) is connected in parallel with the grid. The mathematical model of the inverter is expressed as

$$\frac{dI_j}{dt} = \frac{2}{3L_t}\left(V_j - 2R_t I_j - V_{pj} - I_s R_t\right) - I_U \quad (14)$$

$$I_U = \frac{1}{3L_t}\left(V_s - 2R_t I_s - V_{ps} - I_j R_t\right) \quad (15)$$

$$\frac{dI_{inve\,j}}{dt} = \frac{1}{C_x}\left(I_j + I_{inve\,j} - I_{pj}\right) \quad (16)$$

$$\frac{dV_{cs1}}{dt} = \frac{-1}{C_{s1}}\left(u_x I_{inve\text{-}x} + u_b I_{inve\text{-}y}\right) + \frac{I_{mag}(1 - u)}{C_{s1}(1 + N_{ratio})} \quad (17)$$

$$\frac{dV_{cs2}}{dt} = \frac{1}{C_{s2}}\left[(1 - u_a) I_{inve\text{-}x} + (1 - u_b) I_{inve\text{-}y}\right] + \frac{I_{mag}(1 - u)}{C_{s2}(1 + N_{ratio})} \quad (18)$$
Fig. 4 Working analysis of proposed inverter circuit
Table 4 States of operation of the three phase inverter (node states x, y; switch states T1–T4; inverter outputs Vinve_x, Vinve_y, Vinve_z)

x = 0, y = 0: T1–T4 = (1, 0, 1, 0); outputs = (−0.33 V0, −0.33 V0, 0.66 V0)
x = 0, y = 1: T1–T4 = (1, 1, 0, 0); outputs = (−V0, V0, 0)
x = 1, y = 0: T1–T4 = (0, 0, 1, 1); outputs = (V0, −V0, 0)
x = 1, y = 1: T1–T4 = (0, 1, 0, 1); outputs = (0.33 V0, 0.33 V0, −0.66 V0)
where the subscripts j, K ∈ {x, y}. From Eqs. (17) and (18), the DC-link capacitor voltage depends purely on the converter transformation ratio and the sliding surface of the converter, and the inverter's generated voltages vary with the magnetizing current of the coupled inductor. The detailed operation of the inverter is given in Table 4. The open-delta phase voltages and currents are expressed as vpx, vpy, ipx, and ipy. Basically, a sliding controller could be used in this three phase inverter, but it has the drawback of requiring many sensors to sense all the error signals for every switching pulse generation, and it is not effective for inverter operation. The disadvantages of the sliding technique are complex implementation, costly sensors, and less accurate control of the grid voltage and current. To avoid these drawbacks, in this work a fuzzy logic-based PWM generator is used to obtain the switching pulses for the DC-AC converter. The working diagram of the fuzzy controller is given in Fig. 5. In this controller, three operations are involved: fuzzification, inference, and defuzzification. The DC-DC circuit voltages and load voltages are fed as inputs to the fuzzification stage, which converts the real values into linguistic functions. The middle process of the fuzzy controller is inference, which evaluates all converter- and inverter-based supply and output parameters. The last operation is defuzzification, which converts the linguistic functions back into crisp outputs for the inverter. The main objective of the fuzzy logic PWM is to supply the maximum solar power to the grid at constant frequency. The phase voltage displacement angle is obtained using Eq. (19), and the RMS reference value of the inverter phase voltage is calculated using Eq. (20).

$$\alpha = K_P\left(V_0^{ref} - V_0\right) + K_i \int \left(V_0^{ref} - V_0\right) dt \quad (19)$$

$$V_{ph}^{ref} = \frac{V \sin(\varsigma)}{\sin(\varsigma - \alpha)} \quad (20)$$
Fig. 5 Working flow of fuzzy pulse generation
5 Discussion of Simulation Results As discussed previously, the solar PV system's output voltage varies continuously because of sudden changes in atmospheric conditions, as shown in Fig. 6. In this manuscript, a PV array built on the single-diode topology is implemented, as shown in Fig. 2. The nonlinear curves of the solar system indicate that an MPP exists on the power curve for each irradiation condition. From the nonlinear curves, at 1000 W/m2 the peak voltage of the proposed PV system is 415 V, and the corresponding current and power are 23.33 A and 9.681 kW, respectively. Similarly, at the second irradiation of 750 W/m2, the PV array voltage, current, and power are 406 V, 16.82 A, and 6.828 kW, respectively. Finally, at 500 W/m2, the solar array peak power, voltage, and current are 5.181 kW, 402 V, and 12.89 A, respectively. The overall system is designed using Simulink software. The proposed system involves two stages: DC-DC power conversion and DC-AC power transmission to the grid. The solar array output power waveform at the peak operating point is given in Fig. 7 for diverse irradiation conditions. From Fig. 7, the solar array power is maximum at 1000 W/m2 and is maintained constant up to the time of 4 s; the rise time of the power is 0.05 s, and the settling time is 0.8 s. After 4 s, the power reduces from 9.681 kW to 6.828 kW over the interval from 4 s to 4.2 s. Finally, the PV power reduces from 6.828 kW to 5.181 kW at 500 W/m2. The solar power is fed to the inductor-coupled power converter with wide input and output operation to increase the effective utilization of the solar PV installation. The converter steps up the PV voltage from 410 V to 1200 V by adjusting its winding turns, which is also useful for automotive industry applications. Here, the converter topology consists of a single switch, so the ripple in the converter's generated voltage is greatly decreased, as shown in Fig. 8. From Fig. 8, when the irradiation steps down from 1000 W/m2 to 800 W/m2, the corresponding voltage is disturbed for a time interval of up to 0.7 s. However, this distortion is very
Fig. 6 Current versus voltage curves and P–V curves at 1000, 750, and 500 W/m2
Fig. 7 Solar array output power at different irradiation conditions
less when compared to other conventional controllers, because the slider works effectively through the proper selection of the sliding surface, and the adaptive controller gives the optimum duty cycle to the converter. Similarly, the inverter's DC-link capacitors generate the maximum AC voltage for commercial load applications. The obtained voltage waveforms of the DC-link capacitors and their zoomed views are shown in Fig. 9. From Fig. 9, it can be said that the DC capacitors work under balanced conditions based on their output voltages; if the capacitors gave unbalanced voltages, the switches in the inverter would break down. So the capacitor values should be equal for the IGBT switches to operate efficiently. The obtained DC-link capacitor voltages at 1000 W/m2 are 600 V and 580 V. The inverter
Fig. 8 Inductor coupled single switch converter output voltage at different irradiation conditions
Fig. 9 Inverter capacitor voltages at time varying insolation condition
output is connected to a filter (Lf–Cf) to eliminate unwanted harmonic components in the grid currents, as shown in Fig. 10. The nominal voltage and operating current of the inverter are 20.5 V and 15.2 A, respectively. The per unit grid voltages and currents are given in Fig. 11. Here, the voltage and current are in phase, so the phase angle difference between them is zero. Hence, the grid operates effectively at unity power factor.
Fig. 10 3-phase network currents at dynamic insolation condition of solar PV
Fig. 11 Per unit three phase grid currents and voltages at time varying irradiation condition
6 Conclusion The proposed MPPT controller traces the operating point of the PV efficiently and effectively with high convergence speed. The merits of this MPPT technique are fewer fluctuations, high error detection accuracy, independence from the PV array installation, and the need for fewer sensors to sense the variables. The sliding controller gives the optimum duty cycle to the boost converter to achieve a good output voltage with low distortion. The two-leg inverter provides a continuous power supply to the local loads and the grid. The merits of the fuzzy technique are low design complexity, simple understanding, and an accurate, fast response.
References

1. Ghose D, Pradhan S, Shabbiruddin (2022) Development of model for assessment of renewable energy sources: a case study on Gujarat, India. Int J Ambient Energy 43(1):1157–1166
2. Kebede AA, Kalogiannis T, Van Mierlo J, Berecibar M (2022) A comprehensive review of stationary energy storage devices for large scale renewable energy sources grid integration. Renew Sustain Energy Rev 159:112213
3. Basha CH, Rani C (2020) Design and analysis of transformerless, high step-up, boost DC-DC converter with an improved VSS-RBFA based MPPT controller. Int Trans Electr Energy Syst 30(12):e12633
4. Hussaian Basha CH, Rani C, Brisilla RM, Odofin S (2020) Simulation of metaheuristic intelligence MPPT techniques for solar PV under partial shading condition. In: Soft computing for problem solving. Springer, Singapore, pp 773–785
5. Govinda Chowdary V, Udhay Sankar V, Mathew D, Hussaian Basha CH, Rani C (2020) Hybrid fuzzy logic-based MPPT for wind energy conversion system. In: Soft computing for problem solving. Springer, Singapore, pp 951–968
6. Basha CH, Rani C (2020) Different conventional and soft computing MPPT techniques for solar PV systems with high step-up boost converters: a comprehensive analysis. Energies 13(2):371
7. Gao L, Chen L, Huang S, Li X, Yang G (2019) Series and parallel module design for large-area perovskite solar cells. ACS Appl Energy Mater 2(5):3851–3859
8. Du Z, Artemyev M, Wang J, Tang J (2019) Performance improvement strategies for quantum dot-sensitized solar cells: a review. J Mater Chem A 7(6):2464–2489
9. Prol JL (2018) Regulation, profitability and diffusion of photovoltaic grid-connected systems: a comparative analysis of Germany and Spain. Renew Sustain Energy Rev 91:1170–1181
10. Narwaria A, Swarnkar P, Gupta S (2022) A review on multi-input DC-DC converter and its controlling for hybrid power system. Intell Comput Tech Smart Energy Syst 277–288
11. Hussaian Basha CH, Rani C, Brisilla RM, Odofin S (2020) Mathematical design and analysis of photovoltaic cell using MATLAB/Simulink. In: Soft computing for problem solving. Springer, Singapore, pp 711–726
12. Basha CH, Murali M (2022) A new design of transformerless, non-isolated, high step-up DC-DC converter with hybrid fuzzy logic MPPT controller. Int J Circuit Theory Appl 50(1):272–297
13. Sharma B, Dahiya R, Nakka J (2019) Effective grid connected power injection scheme using multilevel inverter based hybrid wind solar energy conversion system. Electr Power Syst Res 171:1–14
Chapter 38
SmartFog: A Profit-Aware Real-Time Resource Allocation Strategy for Fog/Edge Computing

Ipsita Dalui, Arnab Sarkar, and Amlan Chakrabarti
1 Introduction Service-oriented computing (SOC) is gradually acquiring huge importance both in research and in organizational and industrial implementations. This is primarily because of the inherent benefits of SOC, including reduced infrastructural investments, maintenance, manpower, and energy consumption, along with the ability to provide flexible demand-based scalability and seamless interoperability. These benefits often lead to a significant increase in return on investment. Cloud computing virtualizes centrally controlled, remotely located resources so that they can be shared commercially on demand. On the basis of the nature of the services being provided, cloud services are typically categorized as Infrastructure as a Service (IaaS), Platform as a Service (PaaS), and Software as a Service (SaaS). Recently, there has been a proliferation of Internet of Things (IoT) applications in domains such as vehicular networks, smart grids, and wireless sensor and actuator networks, where a significant fraction of the data is generated in close proximity to the edge devices and must be processed within short time scales. Cloud computing, with its high inter-host and client-to-host geographical distances, has repeatedly been shown to fail in providing acceptable performance to these IoT applications.
Fig. 1 Fog extends the cloud closer to devices producing data. Source CISCO
With the objective of mitigating the latency and connectivity issues as well as the deployment costs of cloud computing, an alternate architecture called fog computing was proposed by Cisco in 2013. Fog is distinguished from cloud by its closer proximity to end devices; thus it has emerged as a promising environment to provide software, storage, and computation resources for many latency sensitive IoT applications in various domains. Any device having storage, computing, and networking capability can serve as a fog node. While remaining primarily responsible for providing services to a set of concerned edge devices, a fog service may also be connected to the cloud in order to satisfy heavier resource demands. In 2015, Cisco documented the 3-tier fog architecture [1] in which fog extends the cloud closer to the things at the edge of the network that produce and act on IoT data, as depicted in Fig. 1. Different heterogeneous resource-sharing models and frameworks for fog computing (FC) have been proposed in the literature. A Platform as a Service (PaaS) programming model called Mobile Fog was presented by Hong et al. in 2013 [2]; the authors presented applications of Mobile Fog in vehicle tracking. Nishio et al. [3] presented a framework where all heterogeneous resource parameters, such as latency and bandwidth, are unified and equivalently mapped so that heterogeneous fog devices can offer affordable service rates. Stojmenovic in 2014 [4] proposed a 3-level hierarchical model consisting of edge devices, fog cells, and clouds; in his paper, he considered the cloudlet as a special case of fog computing, analysed smart grids, software-defined networks (SDN), etc., as real-world applications of fog computing, and discussed the related security and privacy issues. A mathematical network model for fog computing was presented by Sarkar et al. [5] in 2015, where fog computing performance metrics like service latency, power consumption, and CO2 emission (for various renewable and non-renewable resources) are mathematically characterized; a comparative performance evaluation of cloud computing against fog computing was also performed there with a high number of Internet-connected devices demanding real-time services. Barbarossa et al. [6] in 2014 discussed several aspects in connection with the 5G network environment, and the energy-related issues in mobile cloud computing were considered by Lin et al. [7] in the same year. The charging schedule algorithms for smart grid scenarios reviewed by Mukherjee and Gupta [8], and the scheduling of virtual machines in cloud computing by Bazarbayev [9], could also be relevant to task scheduling in fog computing. Luan et al. [10] focused on the practical issues that mobile subscribers face in edge computing.
and scheduling of virtual machines in cloud computing by Bazarbayev [9] could be relevant in task scheduling in fog computing scenarios also. Luan et al. [10] focused on practical issues of mobile subscriber faces in edge computing. In this work, we present a new fog-based architecture which alleviates the need for dedicated fog computing servers to provide a seamless and cheap computation backbone by comprehensively aggregating the residual computing powers of a set of dynamically arriving, distributed and heterogeneous local computing resources. The principal motivation behind this endeavour is that today, we observe a huge proliferation of networked computing devices of various types whose computing capacities remain unharnessed to a large extent, at least during certain spans of time in a day. These edge devices with surplus computing capabilities can act as service providers (hosts/resources) for short periods of their idle times for earning some money. Alternatively, some edge devices may require computing services to accomplish a set of latency sensitive real-time tasks (tasks/ sinks) within deadline and ready to finance in specific budget. We introduce a third party scheduler, namely Aggregator, which maintains accounts of these residual computing capacities that the devices aim to serve commercially and overall instantaneous service demands of various tasks with their respective budgets as well. Thereafter it allocated the resources to the tasks as per the scheduling principle in real time. Both resource and task edge devices are connected to the Aggregator via network. Each Aggregator is connected to its neighbouring Aggregators and also connected to the Cloud. Our goal is to devise a scheduling mechanism for the Aggregator that attempts to maximize resource utilization at the network edge on the basis of Earliest Deadline First (EDF) scheduling algorithm. In the EDF algorithm the tasks are selected according to deadlines, i.e. the tasks with deadlines earlier will be executed with higher priorities. This algorithm can guarantee to meet the deadlines of the tasks with high probability. The strategy is extended by allowing a limited split-ability of lately arrived tasks, if eligible, for maximization of resource utilization. It is further extended by including profit-awareness to secure a minimum amount of monetary benefit to the Aggregator. The rest of the paper is organized as follows: we describe the system model and define the system parameters including resource availability matrix and task list in Sect. 2; we propose our resource allocation scheme in Sect. 3— it is executed in four steps by inclusion of cost of communication and also profit awareness of Aggregator in Sects. 3.1, 3.2, 3.3, and 3.4; the simulation framework and the simulation results with explanation are provided in Sect. 4 (Fig. 2).
2 System Model and Problem Formulation

2.1 System Model

The SmartFog environment considered in this work is a distributed system consisting of sets T = {T1, T2, …, Tn} and R = {R1, R2, …, Rm} of arriving/departing
Fig. 2 Proposed Aggregator-based network architecture
dynamic tasks and resources, respectively, along with a single dedicated Aggregator node A. A task Ti in T is a 4-tuple entity {TAi, TDi, TLi, TBi}, where TAi denotes the time of arrival, TDi the deadline, TLi the task length, and TBi the allocated financial budget. A resource Rj in R is also a 4-tuple entity {RAj, RDj, RCj, RPj}, where RAj is the time of arrival, RDj the time of departure, RCj the speed of computation, and RPj the price rate at which Rj can deliver its services. The tasks are basically service receptors, spawned by client edge devices, which demand such services within stipulated deadlines from fog/edge computing resources. On the other hand, the resources are service providing entities obtained from edge devices that lend their surplus computing capacities over a specified span of time. The Aggregator A is a third party resource manager residing at a stipulated fog node that allocates client tasks to appropriate fog computing resources so that the computational requirements of the tasks are satisfied within their respective deadlines. The allocation policy of the Aggregator ensures a fair distribution of resources to the tasks, so that every resource gets a chance to provide services and secure a financial gain while the client tasks suffer minimal starvation. The strategy also ensures a financial remuneration to the Aggregator, in the form of brokerage for its services as a scheduler, after paying the resources for their computing services and the communication operators for their communication services. To implement its allocation policy, the Aggregator maintains accounts of the overall instantaneous service demands with budgets and the resource capacities with prices using the following two data structures. Resource Availability Matrix It is a two dimensional matrix whose rows specify the discrete values of the span of availability of the resources in seconds and whose columns specify the discrete range of computing power in terms of Million Instructions per Second (MIPS). The Aggregator, upon receiving the information advertised by a resource, first checks the span of time for which it is available and selects the row number accordingly; next it
verifies the computing speed of the resource and determines the column number. For example, a resource that is available for a span of 3T to 4T seconds and has a computing speed between 2C and 3C MIPS, where T and C are the chosen units of discretization of the span of availability and the computing speed of the resources, respectively, will be placed in row 4 and column 3 of this matrix. In practice, linked lists of the arrival times of the resources with the same span of time and computing speed are maintained as the elements of this matrix. The choice of the values of C and T is an important design issue.

Task Demand List

It is a one-dimensional list consisting of task node structures. Each structure contains the information provided by the 4-tuple format. The Aggregator, upon receiving a packet, recognizes it as a task packet and extracts the necessary information to maintain the list. In this model, the Aggregator is also a commercial unit and claims remuneration for providing services to the tasks. A task Ti is therefore required to pay (i) Pr, the cost of computing, to the resources, (ii) Pn, the cost of communication, to the network operators, and (iii) Pa, the cost of its services, as brokerage to the Aggregator. Hence, only if the budget TBi of the ith task is sufficient to pay all these financial costs is the task considered to be served. In other words, the ith task is granted to be executed on one or more resources by the Aggregator if and only if the following condition holds:

TBi > (Pr + Pn + Pa)
(1)
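The placement rule above can be sketched in a few lines of Python; the ΔT and ΔC values and the dictionary-of-lists representation below are illustrative assumptions, not the paper's implementation:

import math
from collections import defaultdict

DELTA_T = 200   # discretization unit for span of availability, in seconds (assumed)
DELTA_C = 600   # discretization unit for computing speed, in MIPS (assumed)

# matrix[(row, col)] holds the arrival times of resources sharing that cell,
# mirroring the linked lists of arrival times described above
matrix = defaultdict(list)

def place_resource(arrival, departure, speed_mips):
    span = departure - arrival
    row = math.ceil(span / DELTA_T)        # span in (3T, 4T] -> row 4
    col = math.ceil(speed_mips / DELTA_C)  # speed in (2C, 3C] -> column 3
    matrix[(row, col)].append(arrival)
    return row, col

# Speed 3500 MIPS -> column 6, as in the worked example later in the paper
# (the 500 s span here is an assumed value producing row 3)
print(place_resource(554, 1054, 3500))  # (3, 6)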
2.2 Problem Formulation

Given a set of tasks T which arrive for services at the Aggregator at a certain time instant t, the objective of the Aggregator is to appropriately allocate the set of resources available at t, such that the tasks are minimally starved (ensuring fairness) while satisfying the constraints related to task budgets, resource price rates and the lower bound on the brokerage profit of the Aggregator.
3 SmartFog: The Proposed Framework

The current work proposes a new resource allocation strategy to provide real-time fog computing services to the edge devices (tasks/resources) arriving and departing arbitrarily. The Aggregator maintains accounts of the overall instantaneous service demands of the tasks with their respective budgets as well as the resource capacities with their respective prices and schedules the tasks as per the scheduling principle in real time.
3.1 Task–Resource Allocation Based on the Proposed Best_Fit Algorithm

SmartFog allocates the available resources to the tasks following the proposed Best_Fit algorithm, using an epoch-based scheduling strategy. The entire timeline is divided into fixed-length epochs. A task can be scheduled only at an epoch boundary, i.e. at the beginning of an epoch. When a task arrives at time instant t, it is considered for scheduling immediately only if t is an epoch boundary. If t is not an epoch boundary, then the arrived task has to wait until the end of the current epoch and can be considered for scheduling at the beginning of the next epoch. At the epoch boundary, it is examined whether the newly arrived tasks can be accommodated or not. To take this decision, the status of the already available resources is updated and the newly arrived resources are taken into account by the Aggregator. The existing schedule of tasks, if any, is first shifted as late as possible; these task–resource allocation pairs are kept as they are. For each of the newly arrived tasks, sorted on the basis of the Earliest Deadline First (EDF) scheduling strategy, the minimum speed of computation required to accomplish the task is calculated using the formula: (length of the task)/(deadline of the task − arrival time of the task). The resources with a computing speed higher than this required value are selected from the Resource Availability Matrix Rm as the eligible resources to accomplish the task. Among them, the one that would be left with the least residual computing capacity, were it allocated the task, is selected as the best resource to serve the task. If no resource is found with a computing speed higher than the required minimum speed of a task, then that task is rejected. Applying the same principle to all the newly arrived tasks in the current epoch, the Aggregator aims to accommodate as many tasks as possible. The modified task schedule T includes the newly arrived tasks of the current epoch along with the already existing task–resource pairs, which are then shifted to be executed as early as possible.
3.2 Inclusion of Communication Cost

Once the task and resource allocation table is constructed by the Aggregator, the communication links are required to be established by the network operators to transfer the jobs from the task nodes to their respective allocated resource nodes for execution. After execution, the results are also required to be transferred back to the task nodes from the resource nodes via the communication links. The links from tasks to resources for input data and from resources back to the tasks for output results need not be the same. The total cost of communication incurred in this process will therefore be the sum of the costs of communicating: (i) the input data from task to resource (αi), (ii) the programme from task to resource (βi) and (iii) the output data from resource back to task (γi).
If δ is the cost of sending 1 bit via a communication link, the cost of communication λi incurred for the completion of a task Ti on a remote resource may be calculated as

λi = (αi + βi + γi) · δ
(2)
Among the three factors mentioned above, βi plays the most significant role compared to αi and γi, and hence these two terms may be ignored. SmartFog then calculates the cost of communicating the tasks to their allocated resources based on the lengths of the tasks. It then verifies whether the required time of execution plus the time to send the outputs back to the task nodes violates the deadlines of the tasks. If the deadline of a task is crossed, then the schedule to execute that task on the corresponding resource node is cancelled by the Aggregator.
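A small sketch of this check, with β dominating as stated above; the effective transfer rate of 10,000 length-units per second mirrors the paper's worked example and is an assumption here:

def communication_cost(alpha_bits, beta_bits, gamma_bits, delta):
    # Eq. (2): total bits moved over the link times the per-bit price
    return (alpha_bits + beta_bits + gamma_bits) * delta

def meets_deadline(task_len, speed_mips, start, deadline, rate=10_000):
    comm_time = task_len / rate        # ship the job to the resource
    exec_time = task_len / speed_mips  # execute it remotely
    return start + comm_time + exec_time <= deadline

# T2 on R2 in the later example: finishes at about 312 s, before its 600 s deadline
print(meets_deadline(250_000, 1500, start=120, deadline=600))  # True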
3.3 Applying the Strategy of Pre-emption and Migration

SmartFog further considers the scenario in which a new task arrives in the system and no listed resource has the capacity to serve that task alone, although some resources still have residual capacities unutilized. In this condition, the Aggregator checks whether the newly arrived task can be allocated by allowing pre-emption and migration of the task among those resources. At this point, no resource is unallocated when the new task arrives in the system. The allocated tasks are scheduled on the resources as late as possible, as per the scheduling strategy, and thus the resources provide room to accommodate the new task. Next, the system greedily selects the fastest resource to be allocated to fulfil the requirement of the new task as far as permissible. If the residual computation capacity of the fastest resource is exhausted before the accomplishment of the newly arrived task, then the task is pre-empted from the allocated resource, leaving the resource for the execution of its previously allocated task, and is migrated to the second fastest resource to accomplish as much of the task as possible. This strategy is carried out until the completion of the new task or as long as the residual capacities of the resources are not exhausted. In the second possible scenario, it is concluded that the newly arrived task cannot be accommodated in the system, and hence the request is rejected. In the first possible scenario, it is possible to host the new task but with additional costs of pre-emption as well as migration across different resources, both in terms of time and monetary value. Only if this additional cost is bearable can the newly arrived task be granted. SmartFog limits the number of pre-emptions and migrations of a single task to a maximum value that the system may allow. In the first possible scenario, after accommodation of the newest task, the previously allocated tasks slide back to be scheduled as early as possible. In the second possible scenario, the system maintains the previous status quo of allocations. Let the cost of pre-emption and migration of a task Ti from one resource be denoted by πi, and let the cost of communication incurred to execute a task in a
remote resource be ρi. Hence, to pre-empt the newly arrived task from one resource and to migrate it into another will incur an additional cost μi, which is the sum of these two costs, i.e. μi = πi + ρi.

Algorithm 1: Best_Fit Algorithm: To construct the Task–Resource Allocation

L = set of existing tasks
for each time slot t do
    if new tasks have arrived then
        create ordered list of newly arrived tasks T
        if t is an epoch boundary then
            for each task in T
                ACCEPT = CoSchedule()
        else
            continue()  // wait till the end of the epoch
        end if
        if ACCEPT == TRUE then
            allocate the resource and update L
        end if
    end if
    if t is an epoch boundary then
        if no new task arrives then
            continue with executing existing tasks
        end if
    end if
end for
Algorithm 2: CoSchedule(): The Schedule Generator

R: list of available resources in the current epoch
T: list of tasks sorted on the basis of Earliest Deadline First (EDF)
for each task T[i] in T
    required[i]: minimum computing speed required for task T[i]
    for each resource R[j] in R
        residue[j]: computing capacity available at R[j]
        if computing speed of R[j] >= required[i] and residue[j] >= length of T[i] then
            include R[j] in Temp[]
        end if
    initially set FLAG = FALSE
    best = Find_Best_Resource(T[i], Temp[])
    if best != NULL then
        schedule T[i] on best
        residue[best] = residue[best] - length of T[i]
        set FLAG = TRUE
    end if
RETURN FLAG
Algorithm 3: Find_Best_Resource

Input: task T, set of selected resources Temp[]
Output: best

best = Temp[0]
res: residual computing capacity of resource Temp[0]
L: length of the task T
resBest = res - L
i ← 1
while Temp[i] != NULL do
    residual: residual computing capacity of Temp[i] - L
    if residual < resBest then
        best = Temp[i]
        resBest = residual
    end if
    i ← i + 1
end while
RETURN best

Inserting Profit Awareness: a task Ti allocated to resource Rj is admitted only if its budget covers the dealing price of the allocation together with the Aggregator's brokerage, i.e.

TBi ≥ dij + pa
(5)
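A compact Python rendering of the selection logic in Algorithms 2 and 3 may help; the dictionary task/resource representation is an assumption, and the epoch bookkeeping of Algorithm 1 is omitted:

def co_schedule(tasks, resources):
    """Allocate each EDF-sorted task to the eligible resource that would be
    left with the least residual capacity (the Best_Fit rule)."""
    schedule = {}
    for t in sorted(tasks, key=lambda t: t["deadline"]):
        required = t["length"] / (t["deadline"] - t["arrival"])  # minimum MIPS
        eligible = [r for r in resources
                    if r["speed"] >= required and r["residue"] >= t["length"]]
        if not eligible:
            continue  # task rejected in this epoch
        best = min(eligible, key=lambda r: r["residue"] - t["length"])
        best["residue"] -= t["length"]
        schedule[t["id"]] = best["id"]
    return schedule

tasks = [{"id": "T1", "arrival": 0, "deadline": 644, "length": 200_000}]
resources = [{"id": "R1", "speed": 1000, "residue": 580 * 1000}]
print(co_schedule(tasks, resources))  # {'T1': 'R1'}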
Illustration with Example

Suppose the Aggregator has received the advertisements of five tasks and three resources with the specifications tabulated in Tables 1 and 2. The Aggregator first constructs the Resource Availability Matrix, denoted Rm. At each epoch boundary, upon receiving the information of the currently arrived resources, it places the resources in their proper positions in the matrix. For example, if the discretization unit for the span of availability of the resources ΔT is chosen as 200 s, and the discretization unit for the range of computing power ΔC is chosen as 600 MIPS, then a resource with arrival time 554 and computing speed 3500 MIPS will be placed in the resource matrix Rm in position (3, 6). The 4-tuple tasks are also sorted and placed in a list T in each epoch applying the Earliest Deadline First (EDF) strategy, which arranges the currently available tasks at each epoch in increasing order of their deadlines. The Aggregator has to allocate the resources to the tasks in T, if feasible, following the proposed scheduling strategy. The choice of epoch length is a major design issue. We have chosen it as 10 per cent of the earliest deadline. T2 has the earliest deadline, 600 s; hence, the epoch length is set to 60 s. We consider the speed of communication of the underlying network to be 10 Mbps.

Calculations at epoch 1 (0–60 s): In the first epoch, only one resource, R1, and one task, T1, are present. The task length is 200,000 Million Instructions (MI), and the computing speed of the resource is 1000 MIPS. Hence, the minimum time to accomplish the task is 200,000/1000 = 200 s. The resource R1 is present up to 580 s. Therefore, R1 is eligible to serve T1. Since no other task is present in this epoch, T1 is scheduled to be executed on R1.

Calculations at epoch 2 (60–120 s): Since both T2 and R2 arrive after the commencement of epoch 2, both have to wait till the end of this epoch. Therefore, in epoch 2, T1 continues to be executed on R1.
Table 1 Task information

Task ID | Arrival time (s) | Deadline (s) | Task length (MI) | Budget (INR)
T1      | 0                | 644          | 200,000          | 600
T2      | 90               | 600          | 250,000          | 850
T3      | 110              | 902          | 300,000          | 1000
T4      | 506              | 1140         | 450,000          | 1500
T5      | 800              | 1270         | 350,000          | 750
Table 2 Resource information

Resource ID | Arrival time (s) | Departure time (s) | Computing speed (MIPS) | Price per sec (INR)
R1          | 0                | 580                | 1000                   | 8.00
R2          | 70               | 1100               | 1500                   | 12.00
R3          | 300              | 1315               | 600                    | 6.00
Calculations at epoch 3 (120–180 s): Two resources, R1 and R2, and three tasks, T1, T2 and T3, are present in this epoch. T1 is already scheduled on R1, and the Aggregator will not disturb this allocation. Then T2 is required to be scheduled. If it is scheduled on R1: the residual computing capacity of R1 after serving T1 is (580 − 200) × 1000 MI, i.e. 380,000 MI, whereas the length of the task T2 is 250,000 MI. The required computing time is (250,000/1000) = 250 s, and the required communication time is (250,000/10,000) = 25 s. Hence, a total of (250 + 25) = 275 s is required. Thus, to serve both T1 and T2, the resource R1 needs (220 + 275) = 495 s, whereas it is present in the system from 0 to 580 s. Therefore, R1 is eligible to serve both the tasks T1 and T2. It is to be noted that in that case the remaining computing capacity of R1 will be (580 − 495) × 1000 MI = 85,000 MI. Alternatively, if T2 is scheduled to be executed on R2, then the required computing time will be (250,000/1500) = 167 s, and the communication time will be (250,000/10,000) = 25 s. Therefore, R2 will be engaged for a total time of (167 + 25) = 192 s starting from the beginning of epoch 3. Therefore, (120 + 192) = 312 s is the time instant of finishing the task T2 on R2, which is earlier than both the deadline of T2 and the departure time of R2. Hence, R2 is also eligible to serve T2. In this case, the residual computing capacity of R2 will be (1100 − 312) × 1500 MI = 1,180,500 MI. Since R1 would be left with the least residual computing capacity if T2 were scheduled on it, the Aggregator decides to schedule T2 on R1. In epochs 1 and 2, i.e. in the first 120 s, 20 s were spent buffering the task T1, and the remaining 100 s were utilized for computing T1. Hence, (100 × 1000) MI = 100,000 MI of the task T1 has already been executed. The remaining 100,000 MI is shifted as late as possible at the beginning of epoch 3. The deadline of the task T1 is at 644 and the departure of R1 is at 580. Hence, the last 100 s of R1 are allocated to T1. Therefore, R1 can serve T2 from 121 to 480 s. T2 requires 275 s of processing on R1, including communication and buffer time. Therefore, T2 can reserve R1 from 206 to 480 s. The task T3 arrived meanwhile, at 110 s. Hence, in epoch 3 (120–180 s), T3 is also required to be scheduled. The length of T3 is 300,000 MI, whereas the residual capacity of R1 is (206 − 120) × 1000 MI = 86,000 MI. Hence, it is not possible for R1 to serve T3. As the resource R2 arrives in the system at 70 s, it is available to serve from time instant 120 s. Hence, T3 can be scheduled to be executed on R2 from time instant 120. The length of the task T3 is 300,000 MI. Hence, it needs (300,000/10,000) = 30 s for communication and (300,000/1500) = 200 s for computation. So it needs a total of (200 + 30) = 230 s to be executed on the resource R2. As the resource R2 is available from time instant 120, the deadline of T3 is 902, and no other resource is available, T3 will be scheduled to be executed on R2 for the time span 120 to 350 s. As there is no other task to be scheduled in this epoch, the entire schedule on R2 is shifted as early as possible, i.e. the resource R2 will be occupied from time instant 120 s to 495 s.

Calculation for the task T4: The task T4 arrives at time instant 506, i.e. in epoch 9 (480–540), and hence can be scheduled no earlier than epoch 10. In this epoch, resource R1 will depart at 580 s, that is, 40 s after the beginning of this epoch. The size of T4 is 450,000 MI.
The required communication time is therefore (450,000/10,000) = 45 s. Hence, R1 cannot serve T4. R2 has the computing capacity (980 − 585) × 1500 = 592,500 MI and is hence eligible to serve T4. The residual capacity of R2 will
then be (592,500 − 450,000) MI = 142,500 MI. R3 is present up to time instant 1315. 45 s will be spent buffering the task, and hence R3 can compute (1315 − 540 − 45) × 600 = 438,000 MI, which is not sufficient to serve T4. Therefore, T4 will be scheduled to be executed on R2. The speed of computation of R2 is 1500 MIPS. Hence, it will take (450,000/1500) = 300 s for computation and 45 s for communication, in total 345 s on R2, starting from 541 s up to the (540 + 345) = 885 time instant.

Calculation for the task T5: The length of the task T5 is 350,000 MI. Hence, the required communication time is (350,000/10,000) = 35 s. It arrives at the 800 time instant, i.e. in epoch 14 (780–840); hence, it can be taken into consideration by the Aggregator at epoch 15, i.e. at the 840 time instant. After being communicated for 35 s, till the 875 instant, it can start execution. The residual computing capacity of R2 is then (980 − 875) × 1500 = 157,500 MI. Thus R2 cannot accomplish the task T5. The resource R3 departs at time instant 1315, and the deadline of the task is 1270. Hence, (1270 − 875) × 600 = 237,000 MI can be executed by R3 before the deadline of T5, whereas the length of T5 is 350,000 MI. Thus the task T5 can be accomplished by none of the resources within its deadline. Hence, the task T5 is REJECTED.

Allowance of Pre-emption and Migration: To examine whether the residual computing capacities of the resources R2 and R3 in combination can accommodate the hitherto rejected task T5, it is observed that the task T5 arrives in epoch 14 and hence cannot be scheduled before epoch 15. All other tasks have already been executed and have departed. Thus T5 is the only task present at epoch 15. It is also noted that R2 is faster than R3; thus, more computation can be accomplished by R2 than by R3 in the same time span. The communication time of T5 is 35 s. Therefore, computation can start from the 920 time instant. If an amount x of the task is executed on R2, then (x/1500) s is required. The remaining amount of the task will be communicated to R3 for execution before the departure of R2, and that will take (L − x)/10,000 s. Now, R2 departs at the 1100 instant. The available time of computation is therefore (1100 − 885 − 35) = 180 s less the communication time of the rest of the task. Hence, (x/1500) + (350,000 − x)/10,000 = 180. From this equation, the value of x is approximately 256,000 MI. This amount of T5 is executed on R2, and the rest, 94,000 MI, can be communicated to R3, which requires about 10 s. To compute this amount of the task, R3 needs (94,000/600) = 157 s. Therefore, it requires (157 + 10) = 167 s. As R2 departs at the 1100 time instant, T5 can be scheduled on R3 from 1101 to (1100 + 167) = 1267 s. As the deadline of T5 is 1270 s and the departure time of R3 is 1315, it is observed that, if pre-emption and migration are allowed, the hitherto rejected task T5 can be accommodated in the system.

Inserting Profit Awareness: The allocation policy up to this stage does not consider the financial aspect. To make the proposed model profit aware, let the price of the communication medium per second be 1 unit and the Aggregator's brokerage be 2% of the dealing price. The dealing prices and total payables of the tasks are shown in the following tables. The allocation status of the tasks according to their budgets shows that the task T5 will be rejected for its insufficient budget.
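The split point x in the pre-emption step above comes from a single linear equation; a sketch of the computation (the transfer rate of 10,000 is assumed, as before):

def split_between(fast_mips, rate, total_mi, window_s):
    # Solve x/fast_mips + (total_mi - x)/rate = window_s for x, the amount
    # of the task executed on the faster resource before migration
    return (window_s - total_mi / rate) * fast_mips * rate / (rate - fast_mips)

x = split_between(1500, 10_000, 350_000, 180)
print(round(x))  # ~255,882 MI, reported as 256,000 MI in the text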
Table 3 Allocation summary

Task ID | Allocated resource ID | Communication time (s) | Computing time (s)
T1      | R1                    | 20                     | 200
T2      | R1                    | 25                     | 250
T3      | R2                    | 30                     | 200
T4      | R2                    | 45                     | 300
T5      | R2, R3                | 35, 10                 | 171, 157
Table 4 Price calculation

Task ID | Dealing price (INR) | Aggregator's brokerage (INR) | Total payables (INR) | Task budget (INR)
T1      | 1620                | 32.4                         | 1652.4               | 2000
T2      | 2025                | 40.5                         | 2065.5               | 2500
T3      | 2430                | 48.6                         | 2478.6               | 3000
T4      | 3645                | 72.9                         | 3717.9               | 4000
T5      | 2994                | 59.88                        | 3053.88              | 3000
As observed, the task T5 cannot be scheduled due to insufficient funds. In this particular example, the Aggregator secures a profit of 254.28 units in the form of brokerage from the client tasks (Tables 3 and 4).
4 Simulation Framework: SmartFog

We have evaluated the performance of the proposed algorithm by varying the system load and observing the percentage rejection of tasks at that load. Results have been generated by varying parameters such as the weights and number of the tasks and the computing capacities and number of the resources. We have developed a simulator and a synthetic data set generator to execute the experiments and assess the performance of the proposed model. The simulator executes the algorithm by entering tasks and resources into the system using a separate module for their generation. It comprises three modules: (1) to generate resources and tasks with their respective characteristics, (2) to allocate appropriate resources to the tasks by the Aggregator and (3) to compute the performance evaluation parameters, namely the system load and the percentage rejections.

Synthetic Data Set Generation

Synthetic resource and task data sets with their respective characteristics are generated using a separate generator module. We introduce a fictitious machine with a defined computing power (in MIPS) corresponding to the mean computation capacity of the devices that can arrive at the Aggregator. A set of tasks is generated using a
normal distribution for their arrival time, deadline, length and financial budget. A set of resources is generated in a similar way, i.e. using a normal distribution of their arrival time, departure time, computing speed and price of service. The generator supplies these sets of tasks and resources, called normal tasks and normal resources, respectively, to the simulator. The system parameters can be modulated as desired by varying the mean values of the parameters. The system parameters related to tasks are defined as follows:

weight = (execution time)/(deadline − arrival time)
execution time = (task length)/(fict MIPS)
fict MIPS = fictitious machine's MIPS

The computing capacity of a resource is defined as

capacity = (computation capacity of device)/fict MIPS

The generation of the data sets is such that the system load can be maintained within some specified bound, where

system load = (sum of the weights of the active tasks)/(sum of the capacities of the available devices)
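A sketch of this generator using numpy; the means and standard deviations below are illustrative assumptions, not the values used in the reported experiments:

import numpy as np

rng = np.random.default_rng(42)
FICT_MIPS = 1000  # the fictitious machine's MIPS (assumed)

def task_weights(n, mean_len=300_000, sd_len=50_000, mean_win=800, sd_win=100):
    lengths = rng.normal(mean_len, sd_len, n).clip(min=1)
    windows = rng.normal(mean_win, sd_win, n).clip(min=1)  # deadline - arrival
    return (lengths / FICT_MIPS) / windows

def resource_capacities(m, mean_mips=1200, sd_mips=300):
    return rng.normal(mean_mips, sd_mips, m).clip(min=1) / FICT_MIPS

def system_load(weights, capacities):
    return weights.sum() / capacities.sum()

print(round(system_load(task_weights(50), resource_capacities(20)), 3))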
A task is considered to be active if it has arrived in the system but its deadline has not passed, and a resource is considered to be available if it has arrived at the Aggregator and has not departed. The tasks and resources are added to the data set randomly, following the normal distributions of their generation, while maintaining the system load as desired throughout the span of the experiment. Experiments were conducted with varying system load, obtained by varying the number of active tasks in the system and again by varying the number of available resources in the system. The performance of the system is evaluated in terms of the percentage rejection of the active tasks.

Result and Discussion

The analysis of the output after several executions of SmartFog with different data sets reveals, as shown in Figs. 3, 4 and 5, that the greater the length of a task, the lower its chance of fitting into the system: lightly weighted tasks suffer less rejection than heavier ones. The second observation is that the greater the budget of a task, the lower its chance of rejection. The reason is quite intuitive: tasks with large budgets can be accommodated on almost any resource, which makes their budget constraints effectively irrelevant. Another noticeable observation is that the inclusion of the cost of communication together with the allowance of pre-emption and migration results both in increased rejection and in increased accommodation of tasks. On the one hand, pre-emption and migration increase the chance of accommodation; on the other hand, the cost of pre-empting a task from one resource and communicating the residue to another resource for migration is substantial and may again lead to the rejection of tasks.
Fig. 3 System performance for various resource capacities
Fig. 4 System performance for various numbers of tasks
5 Conclusion

An efficient EDF-based framework for scheduling tasks in edge/fog computing environments is proposed. In this model, a four-step scheduling strategy is employed. In each step, the Aggregator tries to maximize the utilization of the resources as well as to minimize the rejection of tasks. A simulation framework for task scheduling in
Fig. 5 System performance for various ranges of resource prices
an edge/fog computing environment is implemented in the C programming environment. It is found that allowing pre-emption and migration of a task across multiple resources improves the performance of the system by accommodating tasks that were earlier rejected. However, the cost of pre-empting a task from one resource and communicating it to another resource for migration lowers the performance, too. The budget limitations of the tasks finalize the schedule. As far as we know, the said problem has not been dealt with using a similar approach before. The proposed model could be further improved by considering the cost of communication as incurred in the real environment, and the profit-aware fair scheduling algorithm may be turned into a profit maximization version.
References 1. Maxwell JC (1892) The power of the internet of things. In: A treatise on electricity and magnetism, 3rd edn, vol 2. Oxford: Clarendon, pp 68–73. (Cisco and/or its affiliates) 2. Hong K, Lillethun D, Ramachandran U, Ottenwälder B, Koldehofe B (2013) Mobile fog: a programming model for large-scale applications on the internet of things. In: Proceedings of the second ACM SIGCOMM workshop on mobile cloud computing. ACM, pp 15–20 3. Nishio T, Shinkuma R, Takahashi T, Mandayam NB (2013) Service-oriented heterogeneous resource sharing for optimizing service latency in mobile cloud. In: Proceedings of the first international workshop on mobile cloud computing and networking. ACM, pp 19–26 4. Stojmenovic I (2014) Fog computing: a cloud to the ground support for smart things and machine-to-machine networks. In: Telecommunication networks and applications conference (ATNAC), 2014 Australasian. IEEE, pp 117–122 5. Sarkar S, Chatterjee S, Misra S (2015) Assessment of the suitability of fog computing in the context of internet of things. IEEE Trans Cloud Comput 6. Barbarossa S, Sardellitti S, Di Lorenzo P (Nov 2014) Distributed mobile cloud computing over 5G heterogeneous networks. IEEE Sig Process Mag 45 7. Lin X, Wang Y, Xie Q, Pedram M (2014) Energy and performance aware task scheduling in a mobile cloud computing environment. In: IEEE international conference on cloud computing
8. Mukherjee JC, Gupta A (2015) A review of charge scheduling of electric vehicles in smart grid. IEEE Syst J 9(4):1541–1553 9. Bazarbayev S. Content-based scheduling of virtual machines (VMs) in the cloud. University of Illinois at Urbana-Champaign, AT&T Labs Research 10. Luan TH, Gao L, Li Z, Xiang Y, Sun L (2015) Fog computing: focusing on mobile users at the edge. arXiv preprint arXiv:1502.01815
Chapter 39
A Comparative Approach: Machine Learning and Adversarial Learning for Intrusion Detection

Madhura Mulimani, Rashmi Rachh, and Sanjana Kavatagi
M. Mulimani (B) · R. Rachh · S. Kavatagi
Department of Computer Science and Engineering, Visvesvaraya Technological University, Machhe, Belagavi, Karnataka, India
e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 R. P. Yadav et al. (eds.), Proceedings of the International Conference on Paradigms of Computing, Communication and Data Sciences, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-19-8742-7_39

1 Introduction

With a more significant number of connected devices in the cloud environment, the risks in the network for users and resources have also escalated. Several techniques have been proposed to mitigate the risks. However, due to the evolution of the attack surface, whose growth has been proportional to the development and advancement of the information technology field, even the different security mechanisms such as firewalls, antivirus software, etc., cannot successfully thwart the novel attacks [1]. Traditional intrusion detection systems (IDS) used to identify and detect attacks are categorized as signature-based and anomaly-based IDS. Signature-based IDSs match the incoming traffic against a known set of attacks, while anomaly-based IDSs detect the attacks that deviate from the expected behavior. The growth of the network has led to the creation of diverse types of attacks, as attackers use polymorphic mechanisms to create new attacks very often. Hence, traditional intrusion detection systems cannot detect these novel attacks, which increases the vulnerability of the systems to attacks by malicious users. Nevertheless, the security of the system is the topmost priority for any organization. In a cloud environment with many users, it is challenging for a network administrator to identify and detect all the attacks in the massive incoming network traffic. Machine learning (ML) techniques are used to automate the tasks of identifying and detecting attacks, which can then be reported to the network administrator, who can take further action. Machine learning has become very popular and is used in many security applications like malware detection, anomaly detection, and spam detection [2]. In the area of machine learning, many researchers have focused on improving the
performance of machine learning systems for quite a long time. They have found that these ML systems are highly susceptible to attacks from adversaries, and with the ever-evolving attacks, their use in security-critical applications is raising increased concern. An attacker creates adversarial examples by altering the input such that the ML model is forced to misclassify the input and produce incorrect predictions [3]. The attacker may try to evade detection, corrupt the data, and recreate the model from an existing model to misclassify the data. The area that studies these attacks and has gained much attention in the software industry is called adversarial machine learning [4]. These malicious attacks present a significant challenge to the security of machine learning models, as they can deceive a model through small changes to the input, thereby making it susceptible to hacking [3]. Attackers use different attack methods based on their goal, capability, strategic knowledge about the target model, etc. [5]. It is thus imperative to protect machine learning models and make them robust against such attacks. This is done by training the machine learning model with adversarial examples, the most popular defense strategy, called adversarial training. It incorporates adversarial examples into the training process and trains the machine learning models with the augmented dataset containing the adversarial examples, which are generated using various attack generation methods such as Carlini and Wagner (CW), Fast Gradient Sign Method (FGSM), Basic Iterative Method (BIM), Jacobian Saliency Map Attack (JSMA), Projected Gradient Descent (PGD), and DeepFool [6]. These small perturbations are imperceptible to a human, yet they trick the machine learning model into making incorrect predictions. The adversarial examples are used in various forms of attacks like evasion, poisoning, and model stealing. The attackers may have either complete or very limited knowledge about the victim machine learning model, based on which the attacks are categorized as white-box and black-box attacks, respectively. In the proposed work, machine learning techniques such as Naïve Bayes, logistic regression, and neural network have been used on the NSL-KDD dataset. The main aim of the work is to build an intrusion detection system that is robust against adversarial examples generated using methods such as JSMA, FGSM, and DeepFool. The remainder of the paper is organized as follows: Section 2 provides the literature survey and gives a gist of the research work carried out in the field of adversarial machine learning. Section 3 provides the background of the area and a brief description of adversarial machine learning, attack generation methods, and adversarial training. Section 4 describes the methodology carried out in the current work. Section 5 discusses the experimental results obtained. Section 6 concludes the paper by emphasizing that adversarial training is essential to have robust intrusion detection systems and provide security to the network systems.
2 Related Work

Various authors have conducted several research studies to examine the impact of adversarial examples on machine learning models. Most of the research on adversarial examples has been carried out on images. However, other researchers have also explored the impact of adversarial examples on textual data. The literature survey presented in this section discusses the works of all those authors who have experimented on different datasets and have used different attack generation methods. Qureshi et al. [1] have generated adversarial attacks using the Jacobian Saliency Map Attack (JSMA) algorithm and implemented swarm optimization capabilities to train the system with the Artificial Bee Colony (ABC) algorithm. The random neural network-based adversarial IDS thus created is evaluated and compared with deep neural networks (DNN) to reveal that the proposed system performs better. Martins et al. [4] have experimented with different machine learning algorithms to study their behavior in an adversarial context. They have used different datasets for the experiment and have implemented four different adversarial attack techniques with multiple perturbation magnitudes. They found that the performance of the classifiers deteriorates in the presence of adversarial attacks. The denoising autoencoder has demonstrated the highest attack resilience among the different classifiers used. Khamis and Matrawy [7] have investigated the effectiveness of different evasion attacks and the ways in which deep learning-based IDSs can be trained using various neural networks. In their experiment, they have used different deep learning algorithms and datasets; they found that the robustness of the network can be increased using a min–max formulation based on adversarial training, as it is considered a reliable defense technique against various adversarial attacks on DNNs. Zheng et al. [8] have proposed a method that accumulates adversarial perturbations through epochs and uses them to enhance the robustness of trained models and, as a result, improves the training efficiency. To overcome the challenges of data augmentation and drastic model parameter change encountered when samples are transferred between epochs, they have proposed inverse data augmentation and periodic perturbation-reset techniques. Overall, their proposed method increases the adversarial accuracy on CIFAR10 and takes much less training time on the MNIST and CIFAR10 datasets, ultimately improving model robustness. Alhajjar et al. [3] have explored adversarial example generation techniques such as genetic algorithms, particle swarm optimization, and generative adversarial networks and evaluated the performance of these algorithms on the UNSW-NB15 and NSL-KDD datasets. Wang et al. [9] have proposed a first-order stationary condition criterion that evaluates the adversarial examples for the convergence quality found in inner maximization. Using the examples with better convergence quality at later stages of training ensures better robustness. Hence, to gradually increase the convergence quality, they have proposed a dynamic training strategy that considerably improves the robustness of adversarial training. Martins et al. [10] have analyzed different machine learning algorithms using the NSL-KDD and CICIDS2017 datasets and different adversarial attack techniques with multiple perturbation magnitudes. They have considered
three different scenarios in which the training and testing sets vary. Their experiment shows that the presence of adversarial attacks causes a decline in the performance of the classifiers. The denoising autoencoder is highly resilient to attacks among the different techniques used. Through their experiment, McCarthy et al. [11] show that the features in the dataset are susceptible to perturbation attacks. Additionally, they demonstrate how such vulnerable features can be removed while maintaining an acceptable classifier accuracy. The classifier they have built distinguishes between DDoS attacks and benign traffic behaviors. Benzaid et al. [12] have proposed a framework that uses deep learning and software-defined networks to detect and mitigate application-layer DDoS attacks. They have trained the model on adversarial DDoS flows to make the framework robust and have thus performed adversarial training. Jeong et al. [13] have generated adversarial samples using the FGSM and JSMA methods and injected them into convolutional neural network (CNN) and autoencoder classification models. They have measured the detection accuracy of the models in the presence of adversarial samples and found that it is reduced by between 21.82% and 39.08%. Qureshi et al. [14] and Tcydenova et al. [15] have used the JSMA method to identify the feature that can cause the maximum change to the benign samples with slight perturbations added. Using the adversarial samples generated by the JSMA method, they evaluate the performance metrics of the proposed method and compare them with those of a deep neural network. The comparison reveals that the proposed random neural network-based adversarial intrusion detection system (RNN-ADV) performs better in terms of accuracy, recall, precision, training epochs, and F1-score. Yin et al. [16] have suggested a framework that uses a generative adversarial network (GAN) to enhance the classifier's performance. They use the generative model of the GAN to continuously generate adversarial training samples and assist the classifier and the discriminator in identifying different categories. Their experimental results show that the performance of the intrusion detection improves, and the classifier's generalization is boosted. Peng et al. [17] have proposed a framework for evaluating IDSs based on deep learning. They have generated adversarial attacks using the attack models PGD, momentum iterative FGSM, limited-memory Broyden–Fletcher–Goldfarb–Shanno (L-BFGS), and simultaneous perturbation stochastic approximation (SPSA), and trained machine learning models such as DNNs, SVM, random forest, and logistic regression on the NSL-KDD dataset. They have evaluated the models' performance considering the adversarial attacks in the dataset. Their experimental results show a substantial reduction in the performance of the models due to the attacks.
3 Background

Adversarial machine learning (AML) is an emerging research field that lies at the intersection of machine learning and computer security. Machine learning techniques are used to automate tasks in various fields. They train the machine
to learn from the data presented to it. Since the data may be collected from several sources, it is vulnerable to being tampered with at various points before it reaches the machine learning technique. There is a high probability that attackers or malicious users may introduce small alterations that are not noticeable by the human eye. However, the machine learning models can misclassify the input containing even the slightest alterations. The attacker may have complete or no knowledge at all about the machine learning model that it attacks, based on which the attacks that the attacker launches are classified as white-box, gray-box, or black-box attacks.
3.1 Types of Attacks

Adversarial attack models are categorized into two types based on whether the classification result is prespecified: they may be targeted or untargeted. An adversary may or may not have all the information about the target model, based on which the attacks are classified as black-box and white-box attacks [7].

• White-box Attacks: An attacker has complete information about the target model and full access to it and, hence, can use backpropagation to compute gradients. An attacker knows the parameters, network architecture, and training data. These attacks are called white-box attacks.
• Gray-box Attacks: An attacker has limited access to retrieve insights about the target model, e.g., only the architecture of the target model. Gray-box attacks, in comparison with white-box attacks, are difficult to execute but are more hazardous for the target models [5].
• Black-box Attacks: An attacker can freely access the input and output of a targeted DNN but cannot execute backpropagation on the network to compute gradients, as he has no insights into the training data, network architecture, or parameters. Such attacks are called black-box attacks [18].
3.2 Adversarial Machine Learning Fundamentals

Szegedy et al. [19] were the first to discover adversarial examples. Following this, Goodfellow et al. [20] proposed the FGSM to generate adversarial examples using a single step. JSMA, CW, BIM, DeepFool, etc., are some of the other attack generation methods used for generating adversarial examples. The generated adversarial examples are injected into the dataset during the model's training phase or used during the testing phase to escape the detection process using different strategies. They are broadly classified as evasion, poisoning, and model stealing.

• Evasion: When an attacker intentionally causes a machine learning model to perform malicious activities or misclassify the input samples, it is called an evasion attack.
• Poisoning: When an attacker can partially modify the training data used by the machine learning model, it is called a poisoning attack.
• Model Stealing: When an attacker probes into an already constructed machine learning model, reconstructs it, and trains it with altered parameters or training data, such an attack is called a model stealing attack [21].
3.3 Adversarial Attack Strategies

• Fast Gradient Sign Method (FGSM): This method is used to generate adversarial examples with a single gradient step [22]. It linearizes the cost function J used to train a model f around the neighborhood of the training point x that the adversary wants to get forcibly misclassified by f. The adversarial example corresponding to input x is calculated as

x∗ ← x + ε · sign(∇x J(f, θ, x))
(1)
where ε is a parameter that controls the magnitude of the perturbation introduced. The larger the value, the greater the likelihood that the model f will misclassify x∗; however, a larger value also makes the perturbation easier for a human to detect [23].
• Jacobian Saliency Map Attack (JSMA): This attack generation algorithm generates adversarial samples by mapping the input directly to the desired output and uses the following mathematical model:

arg min over σc of ‖σc‖ s.t. F(C + σc) = D∗
(2)
where σc is the perturbation vector, ‖·‖ is the relevant norm for the RNN input comparison, D∗ is the required adversarial output data point, and C + σc = C∗ is the adversarial sample [1, 24–26].
• DeepFool: It is an attack generation method that generates adversarial perturbations by iteratively linearizing the classifier. It computes the final perturbation using the following rule:

arg min over αi of ‖αi‖2 s.t. f(pi) + ∇f(pi)T αi = 0
(3)
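As a concrete toy instance of Eq. (1), the FGSM step for a binary logistic-regression classifier can be written directly, since the input gradient of the cross-entropy loss has the closed form (p − y)·w; the weights and inputs below are arbitrary assumptions, not the experimental setup of this paper:

import numpy as np

def fgsm(x, y, w, b, eps=0.3):
    # x* = x + eps * sign(grad_x J), with J the cross-entropy loss
    p = 1.0 / (1.0 + np.exp(-(x @ w + b)))  # predicted probability
    grad = (p - y) * w                      # dJ/dx for logistic regression
    return x + eps * np.sign(grad)

w = np.array([0.8, -1.2]); b = 0.1
x = np.array([0.5, 0.4]); y = 1.0
print(fgsm(x, y, w, b))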
3.4 Defense Strategies

Adversarial examples are used to make the model smoother and limit its sensitivity to small perturbations of its inputs. Injecting such examples during training can improve the generalization of the machine learning model. This makes the machine learning model robust and is a defense strategy called adversarial training. Other
defense strategies include defensive distillation, gradient masking, and ensemble learning that are used during testing phases [27].
4 Proposed Model

This section discusses the framework used in the experiment. In a network, many activities occur that may be performed by legitimate or malicious users. It is thus imperative for a network administrator to identify the activities of malicious users and take the appropriate action. Machine learning techniques can be used for such tasks. However, if the machine learning model has not been trained on these attacks, it misclassifies them. For instance, it may misclassify a normal transaction as anomalous and an anomalous transaction as a normal record. As a result, the accuracy of the machine learning model reduces drastically, paving the way for the attacks to escape detection by the intrusion detection system. This may further lead to a breach of security or cause severe damage to the system. In this experiment, we aim to compare the performance of different machine learning models used for intrusion detection. The machine learning models are tested and evaluated on clean and adversarial data. Figure 1 depicts the architecture used in the experiment. Machine learning and deep learning techniques are used to train the machine on the NSL-KDD dataset. The attacks are introduced into the test dataset using the attack generation methods [28]. The machine learning models are then tested with the new adversarial test dataset. The performance of the machine learning models decreases drastically because the machine has been trained on clean data but tested with adversarial data that it has not seen before. Hence, it is essential to train the machine on the adversarial data to build a robust intrusion detection system.
Fig. 1 Proposed architecture used in the experiment
The following subsections discuss the dataset, machine learning techniques, experimental setup, and performance metrics used in the experiment.
4.1 Dataset and Its Preprocessing

The experiment is carried out on the NSL-KDD dataset. It contains 44 features that may be binary, categorical, or continuous. The categorical features are encoded using the one-hot encoding method. The features are normalized using min–max normalization. The training dataset contains 80% of the records from the original dataset and has 100,778 records. 20% of the original dataset is used as the test dataset and, hence, contains 25,195 records.
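A sketch of this preprocessing pipeline; the file name and label column are assumptions (protocol_type, service, and flag are NSL-KDD's categorical features):

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

df = pd.read_csv("nsl_kdd.csv")  # assumed local copy of the dataset

# One-hot encode the categorical features, as described above
df = pd.get_dummies(df, columns=["protocol_type", "service", "flag"])

X = df.drop(columns=["label"]).values
y = (df["label"] != "normal").astype(int).values  # anomaly vs. normal

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.20, random_state=0)  # the 80/20 split described above

# Min-max normalization, fitted on the training data only
scaler = MinMaxScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)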
4.2 Machine Learning Models

In the experiment, machine learning techniques such as Naïve Bayes, neural network, and logistic regression, along with a deep learning technique (a multilayer perceptron), have been used to train the models.
4.3 Experimental Setup

The experiment has been carried out on an Intel Core i7 CPU @ 2.5 GHz with 16 GB RAM using the IBM Adversarial Robustness Toolbox (ART) library, the Keras API, and the NumPy, pandas, and sklearn packages. Additionally, the experiment has also been carried out on Google Colab. The IBM ART is a Python library for machine learning security. It consists of different attack generation methods that can be used to generate evasion, poisoning, extraction, and inference attacks.
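A minimal sketch of attack generation with ART, using its FastGradientMethod class with the eps value from Table 2; the surrounding model and data objects are assumed to exist from the earlier preprocessing step:

from art.attacks.evasion import FastGradientMethod
from art.estimators.classification import SklearnClassifier
from sklearn.linear_model import LogisticRegression

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
classifier = SklearnClassifier(model=model)

attack = FastGradientMethod(estimator=classifier, eps=0.3)
X_test_adv = attack.generate(x=X_test)

print("clean accuracy:      ", model.score(X_test, y_test))
print("adversarial accuracy:", model.score(X_test_adv, y_test))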
4.4 Performance Metrics

The experimental results are evaluated using different performance metrics like precision, recall, F1-score, and accuracy. These metrics are computed using the values of True Positive (TP), True Negative (TN), False Positive (FP), and False Negative (FN) from the confusion matrix, which is a 2 × 2 matrix. Eqs. (4)–(7) are used to calculate the metrics. TP indicates an anomalous record that is correctly detected, TN indicates a normal record that is correctly detected, FP indicates a normal record that is incorrectly detected as anomalous, and FN indicates an anomalous record that is
incorrectly detected as a normal record.

Precision P = TP/(TP + FP)    (4)
Recall R = TP/(TP + FN)    (5)
F1-Score = (2 · P · R)/(P + R)    (6)
Accuracy = (TP + TN)/(TP + FP + TN + FN)    (7)
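Eqs. (4)–(7) in code, computed from the four confusion-matrix counts (the counts below are toy values):

def metrics(tp, tn, fp, fn):
    precision = tp / (tp + fp)                           # Eq. (4)
    recall = tp / (tp + fn)                              # Eq. (5)
    f1 = 2 * precision * recall / (precision + recall)   # Eq. (6)
    accuracy = (tp + tn) / (tp + fp + tn + fn)           # Eq. (7)
    return precision, recall, f1, accuracy

print(metrics(tp=90, tn=85, fp=10, fn=15))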
The NSL-KDD dataset and the adversarial data are used to train and test the machine learning models. The adversarial data is generated using the JSMA, FGSM, and DeepFool adversarial attack generation algorithms. Each algorithm uses a different value for adding the perturbation to the original sample.
5 Results and Discussion

In the proposed framework, the model is trained with all 44 features, and the performance of the model is evaluated in terms of accuracy. Table 1 lists the accuracies of the different machine learning models. In the experiment, a deep neural network with an input layer, an output layer, and two hidden layers is created. All the layers use the 'ReLU' activation function, except the output layer, which uses the 'softmax' activation function. The neural network classifies the records as normal or anomalous. The attack generation methods FGSM, JSMA, and DeepFool are used to generate the attacks against the MLP classifier. Table 2 gives the parameter settings used for the different attack generation methods in the experiment. The machine learning models obtained earlier in the experiment are tested with the new dataset containing the attacks, i.e. the adversarial test dataset. Table 3 gives the accuracies of the different machine learning models on the adversarial test dataset.

Table 1 Accuracy of ML models and DL model with a clean dataset
Classifier            | Clean data (%)
Naïve Bayes           | 91.15
Neural network        | 99.60
Logistic regression   | 97.03
Multilayer perceptron | 97.39
Table 2 Parameters of the different attack generation methods

Attack generation method | Epsilon | Theta | Gamma | Iteration | Targeted | nb_grads
FGSM                     | 0.3     | –     | –     | –         | No       | –
JSMA                     | –       | 0.1   | 1     | –         | No       | –
DeepFool                 | 0.0001  | –     | –     | 100       | No       | 10
Table 3 Accuracy of ML models and DL model with clean and adversarial test datasets

Classifier            | Clean data (%) | DeepFool (%) | FGSM (%) | JSMA (%)
Naïve Bayes           | 91.15          | 9.19         | 19.48    | 79.80
Neural network        | 99.60          | 33.62        | 19.53    | 98.70
Logistic regression   | 97.03          | 18.67        | 19.48    | 94.41
Multilayer perceptron | 97.39          | 2.83         | 19.48    | 4.69
From Table 3, it is clear that the performance of a model trained on clean data deteriorates drastically when tested with data containing attacks, such as the adversarial test dataset. For instance, the accuracy of Naïve Bayes when tested with the original test dataset is 91.15%, whereas, with the adversarial test dataset containing the DeepFool attacks, its accuracy is reduced to 9.19%, with FGSM attacks to 19.48%, and with JSMA attacks to 79.80%. This indicates that Naïve Bayes identifies and detects the JSMA attacks much better than the DeepFool and FGSM attacks. Among all the machine learning and deep learning techniques used in the experiment, the neural network has the best performance in detecting any of the DeepFool, JSMA, or FGSM attacks, whereas the multilayer perceptron has the worst performance in identifying these attacks. Figure 2 compares the accuracies of the different machine learning techniques in detecting the DeepFool, FGSM, and JSMA attacks and shows that the neural network model performs well on adversarial test data, whereas the performance of the multilayer perceptron deteriorates when tested on adversarial data.
6 Conclusion

The presented work considers a machine learning-based framework that identifies and classifies the attacks in the NSL-KDD dataset using different machine learning algorithms. FGSM, JSMA, and DeepFool attack generation methods have been used to generate the attacks. The performances of the different machine learning techniques with the original test dataset and with the adversarial test dataset containing
Fig. 2 Accuracy comparison of different ML and DL models with clean and adversarial data
the FGSM, JSMA, and DeepFool attacks are compared. A slight perturbation in the unseen data causes the classification rate to drop drastically. Consequently, the attacks can escape the detection process and cause harm to the system. Thus, it is imperative to build a robust intrusion detection system by implementing a defense strategy against the attacks.
References 1. Qureshi AUH, Larijani H, Mtetwa N, Yousefi M, Javed A (2020) An adversarial attack detection paradigm with swarm optimization. In: Proceedings of the international joint conference on neural networks. IEEE. Glasgow, UK 2. Alatwi HA, Morisset C (2021) Adversarial machine learning in network intrusion detection domain: a systematic review. http://arxiv.org/abs/2112.03315, pp 1–21 3. Alhajjar E, Maxwell P, Bastian N (2021) Adversarial machine learning in network intrusion detection systems. Expert systems with applications, vol 186. Elsevier Ltd. p 115782 4. Martins N, Cruz JM, Cruz T, Abreu PH (2019) Analyzing the footprint of classifiers in adversarial denial of service contexts. In: Artificial intelligence and lecture notes in bioinformatics, vol 11805. LNCS, pp 256–267 5. Zhu Q, Sun Z, Liang X, Xiong Y, Zhang L (2020) A survey of adversarial learning on graph. 35th IEEE/ACM Int Conf Autom Softw Eng 37(4):883–894. Melbourne, VIC, Australia 6. Zeng G, Qi F, Zhou Q, Zhang T, Ma Z, Hou B, Zang Y, Liu Z, Sun M (2021) OpenAttack: an open-source textual adversarial attack toolkit. In: 59th annual meeting of association for computational linguistics and the 11th international joint conference on natural language processing, proceedings of the system demonstrations, pp 363–371 7. Khamis RA, Matrawy A (2020) Evaluation of adversarial training on different types of neural networks in deep learning-based IDSs. In: International symposium on networks, computers, and communications. IEEE, Montreal, QC, Canada, pp 1–6 8. Zheng H, Zhang Z, Gu J, Lee H, Prakash A (2020) Efficient adversarial training with transferable adversarial examples. In: Proceedings of the IEEE computer society conference on computer vision and pattern recognition. IEEE, Seattle, WA, USA, pp 1178–1187
9. Wang Y, Ma X, Bailey J, Yi J, Zhou B, Gu Q (2019) On the convergence and robustness of adversarial training. In: 36th international conference on machine learning. PMLR 97, Long Beach, California, pp 11426–11438 10. Martins N, Cruz JM, Cruz T, Henriques Abreu P (2020) Adversarial machine learning applied to intrusion and malware scenarios: a systematic review. IEEE Access 8:35403–35419 11. Mccarthy A, Andriotis P, Ghadafi E, Legg P (2021) Feature vulnerability and robustness assessment against adversarial machine learning attacks. In: 2021 international conference on cyber situational awareness, data analytics, and assessment. Dublin, Ireland, pp 1–8 12. Benzaid C, Boukhalfa M, Taleb T (May 2020) Robust self-protection against applicationlayer (D)DoS attacks in SDN environment. In: IEEE wireless communications and networking conference. IEEE, Seoul, Korea (South) 13. Jeong JH, Kwon S, Hong MP, Kwak J, Shon T (2020) Adversarial attack-based security vulnerability verification using deep learning library for multimedia video surveillance. Multimedia Tools Appl 79(23–24):16077–16091 14. Qureshi AUH, Larijani H, Yousefi M, Adeel A, Mtetwa N (2020) An adversarial approach for intrusion detection systems using jacobian saliency map attacks (JSMA) algorithm. MDPI Comput 9(3):1–14 15. Tcydenova E, Kim TW, Lee C, Park JH (2021) Detection of adversarial attacks in ai-based intrusion detection systems using explainable AI. HCIS 11(35):1–13 16. Yin C, Zhu Y, Liu S, Fei J, Zhang H (2020) Enhancing network intrusion detection classifiers using supervised adversarial training. J Supercomputing 76(9):6690–6719 17. Peng Y, Su J, Shi X, Zhao B (2019) Evaluating deep learning based network intrusion detection system in adversarial environment. In: 9th international conference on electronics information and emergency communication. IEEE, Beijing, China, pp 61–66 18. Alzantot M, Sharma Y, Chakraborty S, Zhang H, Hsieh C-J, Srivastava MB (2019) GenAttack: practical black-box attacks with gradient-free optimization. In: GECCO’19. Prague, Czech Republic, pp 1111–1119 19. Szegedy C, Zaremba W, Sutskever I, Bruna J, Erhan D, Goodfellow I, Fergus R (2014) Intriguing properties of neural networks. In: 2nd international conference on learning representations. Banff, AB, Canada, pp 1–10 20. Goodfellow IJ, Shlens J, Szegedy C (2015) Explaining and harnessing adversarial examples. In: 3rd international conference on learning representations. San Diego, CA, USA, pp 1–11 21. Pawar Y, Amayri M, Bouguila N (2021) Performance evaluation of adversarial learning for anomaly detection using mixture models. In: Proceedings of the IEEE international conference on industrial technology, March. IEEE, Valencia, Spain, pp 913–918 22. Wong E, Rice L, Kolter JZ (2020) Fast is better than free: revisiting adversarial training. ICLR, pp 1–17. http://arxiv.org/abs/2001.03994 23. Nicolas Papernot N, Carlini N, Goodfellow I, Feinman R, Faghri F, Matyasko A, Hambardzumyan K, Juang Y, Kurakin A, Sheatsley R, Garg A, Lin Y, Hendricks P, McDaniel P (2016) Cleverhans v2.0.0 an adversarial machine learning library, pp. 1–7. http://arxiv.org/ abs/1610.00768 24. Mukeri AF, Gaikwad DP (2022) Adversarial machine learning attacks and defenses in network intrusion detection systems. I J Wireless Microwave Technol MECS 1(2):12–21 25. Debicha I, Debatty T, Dricot J-M, Mees W (2021) Adversarial training for deep learning-based intrusion detection systems. The sixteenth international conference on systems ICONS 202. Porto, Portugal 26. 
Ren K, Zheng T, Qin Z, Liu X (2020) Adversarial attacks and defenses in deep learning. Eng Elsevier 6(3):346–360 27. Bai T, Luo J, Zhao J, Wen B, Wang Q (2021) Recent advances in adversarial training for adversarial robustness. Int Joint Conf Artif Intell 2:4312–4321. Montreal, Canada 28. Yang K, Liu J, Zhang C, Fang Y (2019) Adversarial examples against the deep learning based network intrusion detection systems. In: Proceedings—IEEE military communications conference MILCOM, vol 10. Los Angeles, CA, USA, pp 559–564
Chapter 40
Blockchain-Based Agri-Food Supply Chain Management
N. Anithadevi, M. Ajay, V. Akalya, N. Dharun Krishna, and S. Vishnu Adityaa
N. Anithadevi · M. Ajay (B) · V. Akalya · N. Dharun Krishna · S. Vishnu Adityaa
Department of Information Technology, Coimbatore Institute of Technology, Coimbatore, India
e-mail: [email protected]
N. Anithadevi
e-mail: [email protected]
V. Akalya
e-mail: [email protected]
N. Dharun Krishna
e-mail: [email protected]
S. Vishnu Adityaa
e-mail: [email protected]

1 Introduction

Several years have passed since the arrival of the whitepaper "Bitcoin: A Peer-to-Peer Electronic Cash System" by the pseudonymous author Nakamoto. A blockchain is an electronic transaction ledger maintained by many independent computing machines, with no reliance on a third party for upkeep or technical support. Individual transaction records (blocks) are managed through dedicated software platforms that allow the data to be exchanged, processed, stored, and presented in a comprehensible form. In the original bitcoin design, each block contains a header with a timestamp, the transaction data, and a link to the previous block. With the invention of blockchain technology, bitcoin notably solved the double-spending problem. For every block,
depending on its contents, a hash is generated, which is then recorded in the header of the next block. Hence, any change to a given block produces a mismatch in the hashes of all subsequent blocks. Each transaction is distributed through the network of machines running the blockchain protocol and must be validated by all computing nodes. The notable utility of a blockchain is its capacity to keep a consistent view and agreement among the participants (i.e., consensus).

A significant constraint of the current food supply chain system is its lack of transparency. End users are not satisfied with the product details they receive for the products they have bought, and the farmers who worked hard on product development do not receive a sufficient reward for their work because of the middleman system. Even when users were provided with data, they could not verify whether that data was tamper-proof or had been tampered with; thus, the existing system lacked tamper-proof data. Accountability and scalability were also major concerns in the existing system, even for a single product.

In the remainder of this paper, Sect. 2 presents the domain explanation and Sect. 3 the related works; Sect. 4 describes the proposed system; Sect. 5 presents the conclusion, and Sect. 6 outlines future work.
2 Domain Explanation

2.1 Blockchain

A blockchain is a growing list of records, known as blocks, that are linked using cryptography. Each block contains a hash of the previous block, a timestamp, and transaction data, which makes a blockchain resistant to modification of its data. A blockchain is generally a public, distributed ledger capable of recording transactions between a collection of parties efficiently and in a verifiable and permanent manner. Once recorded, the data in any given block cannot be altered retroactively without altering every subsequent block, which requires the consensus of the peer majority. Although blockchain records are not strictly unalterable, a blockchain may be regarded as a distributed computing system with high Byzantine fault tolerance. Blockchain was invented by a person (or group of people) using the name Satoshi Nakamoto in 2008 to serve as the public transaction ledger of the digital currency bitcoin; the identity of Satoshi Nakamoto remains unknown. The invention of the blockchain for bitcoin made it the first digital currency to solve the double-spending problem without the need for a trusted authority or central server.
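To make the tamper-evidence property concrete, the following minimal R sketch (illustrative only; it is not part of the system described in this chapter, and the `make_block` helper is hypothetical) chains blocks by storing each predecessor's hash in the next block's header:

```r
library(digest)  # digest() provides general-purpose hashing in R

# Build a block whose header stores the previous block's hash.
make_block <- function(transactions, prev_hash) {
  block <- list(timestamp    = as.numeric(Sys.time()),
                transactions = transactions,
                prev_hash    = prev_hash)
  block$hash <- digest(block, algo = "sha256")  # hash over the block contents
  block
}

genesis <- make_block("genesis", prev_hash = NA)
b1 <- make_block("farmer -> distributor: 100 kg rice", genesis$hash)
b2 <- make_block("distributor -> retailer: 100 kg rice", b1$hash)
```

Altering `genesis$transactions` changes its recomputed hash, which then no longer matches `b1$prev_hash`, and the mismatch propagates to every subsequent block.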
2.2 Keccak-256

Keccak is a family of sponge functions that has been standardized in the form of the SHAKE128 and SHAKE256 extendable-output functions and of the SHA3-224 to SHA3-512 hash functions in FIPS 202, as well as the cSHAKE128, cSHAKE256, and other functions in NIST SP 800-185, as depicted in Fig. 1. A brief description of Keccak can be given in pseudo-code; the objective here is to present Keccak with an emphasis on readability and clarity. For a more formal treatment, the reader is referred to the reference specifications or the FIPS 202 standard. Keccak is a family of hash functions built on the sponge construction and is therefore a sponge function family. In Keccak, the underlying operation is a permutation chosen from a set of seven Keccak-f permutations, denoted Keccak-f[b], where b ∈ {25, 50, 100, 200, 400, 800, 1600} is the width of the permutation. The width of the permutation is also the width of the state in the sponge construction. The state is organized as an array of 5 × 5 lanes, each of length w ∈ {1, 2, 4, 8, 16, 32, 64}, with b = 25w. When implemented on a 64-bit processor, a lane of Keccak-f[1600] can be represented as a 64-bit CPU word. We obtain the Keccak[r, c] sponge function, with parameters capacity c and bitrate r, by applying the sponge construction to Keccak-f[r + c] and applying a specific padding to the message input.
Fig. 1 KECCAK-256
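The parameter relationships stated above can be summarized compactly as a reading aid (this restates the constraints already given rather than adding material from the specification):

\[
b = 25w, \qquad w \in \{1, 2, 4, 8, 16, 32, 64\}, \qquad \text{Keccak}[r, c]:\; r + c = b,
\]

so that, for example, SHA3-256 uses Keccak-f[1600] with capacity c = 512 and bitrate r = 1088.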
2.3 Ethash

Ethash is a proof-of-work algorithm that is a modified version of a predecessor algorithm known as Dagger-Hashimoto. With it, the output produced by the hashing process must result in a hash value that is below a particular threshold. This notion is known as difficulty, and it involves the Ethereum network increasing and decreasing the threshold to control the rate at which blocks are mined on the network. If the rate at which blocks are found increases, the network automatically increases the difficulty, that is, it lowers the network threshold so that the number of valid hashes that can be found decreases. Conversely, if the rate of found blocks decreases, the network threshold increases to yield a larger set of valid hash values that can be found. On average, one block is produced by the network every twelve seconds. Ethash is based on a large, randomly generated dataset referred to as a Directed Acyclic Graph (DAG), as illustrated in Fig. 2. The DAG is refreshed once every 30,000 blocks, and the current DAG size of Ethereum is, at the time of writing, 2.84 GB; the DAG will continue to grow as the blockchain grows. Ethereum's proof-of-work model is often contrasted with models such as bitcoin's, which is based on SHA-256 hashing. In the case of bitcoin, the proof-of-work scheme is compute-bound, meaning that the time taken to complete a mining task is determined mainly by the speed of a computer's central processing unit. Given that bitcoin's mining algorithm requires a simple SHA-256 calculation, we have seen the emergence of ASIC chips integrated and designed for the sole purpose of computing billions of SHA-256 hashes. This has made it practically impossible for miners with general-purpose CPU and GPU hardware to compete, as ASIC chips are far more efficient at this computation.
Fig. 2 Ethash
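The threshold mechanism can be pictured with a small, heavily simplified R sketch (an illustration of the difficulty idea only; real Ethash hashes block headers against the DAG, which is not modeled here, and the 5% step size is an arbitrary choice):

```r
# A hash is valid when, read as a number, it falls below a threshold
# that shrinks as the difficulty grows.
meets_target <- function(hash_value, difficulty, hash_space = 2^64) {
  threshold <- hash_space / difficulty  # higher difficulty => lower threshold
  hash_value < threshold
}

# Toy retargeting rule aiming at a ~12-second block time: blocks arriving
# too fast raise the difficulty (lowering the threshold), and vice versa.
retarget <- function(difficulty, observed_seconds, target_seconds = 12) {
  if (observed_seconds < target_seconds) difficulty * 1.05 else difficulty * 0.95
}

meets_target(hash_value = 3.2e15, difficulty = 4000)  # TRUE for this toy input
```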
2.4 Smart Contract

Code may either be the sole genuine manifestation of the agreement between the parties or may supplement a traditional text-based contract and execute certain provisions, such as transferring funds from Party A to Party B. The code itself is replicated across multiple nodes of a blockchain and therefore benefits from the security, permanence, and immutability that a blockchain offers. That replication also means that, as each new block is added to the blockchain, the code is, in effect, executed. If the parties have indicated, by initiating a transaction, that certain parameters have been met, the code will execute the step triggered by those parameters; if no such transaction has been initiated, the code will not take any steps. Most smart contracts are written in one of the programming languages directly suited to such programs, such as Solidity. Smart contracts are currently best suited to executing automatically two kinds of "transactions" found in many contracts: (1) ensuring the payment of funds upon certain triggering events, and (2) imposing financial penalties if certain objective conditions are not met. In either case, human intervention, including through a trusted escrow holder or the judicial system, is not required once the smart contract has been deployed and is operational, thereby reducing the execution and enforcement costs of the contracting process, as shown in Fig. 3.
Fig. 3 Smart contract

3 Related Works

Blockchain technology, owing to its distributed network structure, is believed to enable a more transparent supply chain and to rebuild trust between participants. Accordingly, blockchain-enabled e-agriculture is widely viewed as the next step toward sustainable agriculture. Nevertheless, implementing blockchain technology still presents several key issues, and a new
perspective on sustainable data management systems is required. In that work, the data demands of all related parties who care about sustainability achievements in the agricultural sector are considered [2]. The paper proposes a general systems approach to embedding blockchain technology into the existing agri-food supply chain and offers interesting insights into how to achieve sustainability by creating a new agreement framework among blockchain network members [3]. Conventional supply chains are generally centralized and rely heavily on a third party for trading; such systems lack transparency, accountability, and auditability. In [4, 5], the authors present a complete solution for a blockchain-based agriculture and food (agri-food) supply chain. It uses the key features of blockchain and smart contracts, deployed over Ethereum blockchain networks. Blockchain provides immutability of data and records in the network; however, it still has limits, as it cannot by itself solve several complicated problems in supply chain management, such as the credibility of the involved entities, accountability of the trading process, and traceability of the products. Consequently, there is a need for a reliable framework that guarantees traceability, trust, and delivery mechanisms in the agri-food supply chain. In that proposed system, all transactions are written to the blockchain, which ultimately uploads the data to the InterPlanetary File System (IPFS); the storage system returns a hash of the information, which is stored on the blockchain. The framework provides smart contracts along with the corresponding algorithms, showing the interaction of the entities in the system. Moreover, simulations and evaluations of the smart contracts, together with security and vulnerability analyses, are also presented in that work [4, 5]. Ensuring food safety requires fully monitoring the overall process of handling, preparing, and storing food so as to reduce the risk of people becoming ill due to contamination or any mismanagement in the overall cycle. Smart agriculture incorporates an efficient, clean, and safe food supply chain framework that can handle these issues more intelligently than the current framework. The agricultural food supply chain denotes the mechanism explaining how farm food reaches our tables. The study in [6, 7] presents a smart model for the transformation of the traditional food supply chain based on blockchain technology. The model promises to give all stakeholders participating in the agricultural food supply chain equal opportunities, even when they are unfamiliar with one another, without the help of trusted third-party service providers; the authors validate the proposed blockchain-based smart model against their own design that does not use blockchain [6, 7]. The application of blockchains to improve product quality and safety control in food supply chains through transparent, trusted, and secure end-to-end traceability frameworks has received increased attention over the last few years. The use of blockchains, however, does not come free of cost: poor design of blockchain-based applications can lead to prohibitive costs, unacceptable delays, and non-scalable systems.
That work investigates various architectures for blockchain-based traceability and quality control of produce, proposing, evaluating, and comparing four different scenarios. The evaluation uses
public and private Ethereum instances and assesses the considered models in terms of cost and overall throughput [8, 9]. In [10, 11], a public blockchain concept was chosen over a private blockchain to guarantee transparency by allowing any individual to access the network. Instances of the smart contract were created for each physical product and deployed to the blockchain network; a Quick Response (QR) code, which contained the address of the instance, served as a reference to the virtual product. Every actor involved in the supply chain should interact with the framework to achieve transparency. Farmers can place a certification request regarding their products, and they can acquire reputation tokens for every accreditation done by peers. The proposed framework was implemented as a prototype and validated within the study [10, 11]. Blockchain has transformed business processes from centralized to decentralized. It can help make the food grain supply chain fully decentralized with peer-to-peer (P2P) business by eliminating intermediaries, thereby lowering the overall cost on the end-user side with better returns to farmers. In [12], the authors investigate blockchain technology and the potential of smart contracts for reliable and incentivized P2P trading of food grains. They propose various smart contracts, for example for food grain supply, bidding, trading, and usage, which are deployed on the Ethereum blockchain for the decentralized trading of food grains, and they use the Vickrey-Clarke-Groves (Vickrey auction) method to achieve incentivized trading for both farmers and end users. The proposed framework offers P2P trading, security of food grain data, data transparency, user anonymity, and incentives in the trading process. The performance of the proposed system is evaluated and analyzed in terms of fulfilling the requirements of food grain supply management [12]. The Unified Theory of Acceptance and Use of Technology 2 (UTAUT2) has been used as a reference to develop an extended model by considering perceived trust, given the capabilities of blockchain technology for food traceability and transparency. The results obtained show that blockchain technology can foster trust and influence purchase intention, which is essentially highlighted and impacted by performance expectancy, effort expectancy, and habit. Moreover, the results show that technographics can noticeably improve the use of blockchain technology to trace food. The research gives insight to practitioners and specialists on how to increase intention toward blockchain [13]. Centralized data storage makes it harder to guarantee the quality, rate, and origin of products. Thus, a decentralized system is needed in which transparency is available, satisfying everyone from producers to consumers. Blockchain technology is a digital technology that allows us to obtain traceability and transparency in the supply chain; making use of this technology in fact improves the community between the various stakeholders and farmers. The properties of blockchain essentially provide increased capacity, better security, immutability, faster settlement, and full traceability of stored transaction records.
That paper presents a fully decentralized blockchain-based traceability solution that provides building blocks for farming and continuously integrates
IoT devices from provider to consumer. As an implementation, the authors introduce the "Provider-Consumer Network," a hypothetical end-to-end food traceability application. The goal is to create a distributed ledger that is accessible by all clients in the network, which in turn brings transparency [14]. Blockchain-based frameworks allow a coordinated system to operate without a third party by distributing all data to every member of the supply chain involved. The investigation is carried out by creating the Food Trail Blockchain design using four blockchain system abstraction layers [15]. Food Trail Blockchain records and tracks the transfer and transformation of food products in the supply chain; transfers and transformations are stored in the blockchain and can be traced using a depth-first search algorithm. The design was then implemented in a prototype using the Hyperledger Sawtooth framework. The prototypes built were evaluated against the aspects that should be considered in a traceability framework (breadth, depth, precision, and access) and in a blockchain (distributed, verified, and immutable). The evaluation shows that Food Trail Blockchain satisfies the distributed, verified, and immutable aspects. The framework has low performance in transaction handling with respect to the breadth and depth aspects, owing to limited server capacity and a complicated process. However, the framework enjoys benefits related to access and precision, since all transactions are verified, immutable, and stored locally in every node [16].
4 Proposed System

The proposed system addresses all the user needs in the system; it considers both small-scale supply chains and large-scale industrial supply chains, and it allows the seller-buyer exchange of products by completely eliminating the middleman concept. Sellers are able to price their products. A buyer can buy a product only by paying, in crypto, the price of the product decided by the seller at registration time. Different types of products can be registered for purchase. An event log is used to register all the details needed for user access. The buyer is only allowed to pay the contract, and only after delivery confirmation from the buyer does the contract pay the seller. Every activity and functionality can be accessed only by trusted users. The structural diagram contains six major actors, as illustrated in Fig. 4; they are:

(a) Seller
(b) Buyer
(c) Smart contract (payment)
(d) Delivery
(e) Resell
(f) Refund.
Fig. 4 Architectural diagram
4.1 Seller Phase

The first function of the seller is to enter the product details into the system. The seller must then confirm the product details and register the product on the blockchain. After successful registration, the seller can track the product using its hash value. After the buyer pays for the product, the seller hands the product over to the pickup agent. The seller receives the payment only after the two-step confirmation of the product delivery described in the payment phase.
4.2 Buyer Phase

Initially, buyers can search for a desired product in the system. After choosing a product, the buyer must confirm it and pay the crypto to the blockchain smart contract, provided the amount of crypto paid by the buyer matches the seller's price
and provided the seller does not buy their own product. Once the buyer places and pays for the product, it will be delivered within the given period of time. After receiving the product, the buyer can rate and review it, and can either confirm the product delivery or request a refund if not satisfied with the product. Ownership is transferred from seller to buyer after successful delivery.
4.3 Payment Phase

For any transaction, such as registering or viewing, the user must pay a small amount of gas on the Ethereum network. The buyer is allowed to pay the product price only to the contract; the seller receives the payment for the product from the contract after a two-step verification, that is, verification of the product delivery from the side of the delivery agent and from the side of the buyer. If the buyer forgets to verify the product delivery, then after a certain period of time the smart contract itself makes the payment to the seller based on the verification done on the delivery-agent side. The refund works as follows: if the buyer selects the refund option instead of (or before) verifying the product delivery, the smart contract refunds the buyer, sends a notification to all users (seller, buyer, and delivery agent), and the product is delivered back to the seller by the delivery agent, with the previous buyer and the buyer's review recorded for that product.
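The settlement rules of this phase can be summarized in the following sketch (a hypothetical simulation written in R for readability; in the actual system this logic would live in the Ethereum smart contract, and the timeout length is an assumed parameter):

```r
# Decide what the escrow contract should do, given the current evidence.
settle_escrow <- function(agent_confirmed, buyer_confirmed,
                          refund_requested, days_since_delivery,
                          timeout_days = 7) {        # timeout length assumed
  if (refund_requested && !buyer_confirmed) return("refund buyer")
  if (agent_confirmed && buyer_confirmed)   return("pay seller")
  if (agent_confirmed && days_since_delivery >= timeout_days)
    return("pay seller after timeout")               # buyer never confirmed
  "hold funds in contract"
}

settle_escrow(agent_confirmed = TRUE, buyer_confirmed = FALSE,
              refund_requested = FALSE, days_since_delivery = 10)
#> [1] "pay seller after timeout"
```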
4.4 Delivery Phase

The physical transport of the product between the seller and buyer is handled by the delivery agency. The delivery agency is notified after every successful transaction between the seller and buyer and allocates a delivery person to exchange the product. Delivery agencies are paid through an equal share from buyer and seller. After successful delivery, both the delivery agent and the buyer are given the option of confirming the product delivery. Through this two-step verification of the delivery, the payment from the smart contract to the seller's account is initiated; if the buyer fails to confirm the product delivery, the smart contract waits for a certain period of time and then automates the process, transferring the amount to the seller's account.
4.5 Resell Phase

Once the buyer has received the product and confirmed its delivery, ownership of the product is transferred from the seller to the buyer, and the buyer becomes the owner of the product. The new owner can then resell the product, with or without modifying it, and the seller-buyer cycle repeats until the product reaches the end consumer.
4.6 Refund Phase

In the refund module, the buyer can request a refund only at the delivery stage, that is, before confirming the product delivery; instead of confirming the delivery, the buyer may choose the refund option if not satisfied with the product received. The smart contract validates the request, refunds the crypto to the buyer's account, and takes the necessary steps to return the product to the seller.
5 Conclusion and Contribution to Research

The proposed system allows the seller-buyer exchange of products by completely eliminating the middleman concept. Sellers are able to price their products, and a buyer can buy a product only by paying, in crypto, the price decided by the seller at registration time. Different types of products can be registered for purchase. An event log is used to register all the details needed for user access. The buyer is only allowed to pay the contract, and only after delivery confirmation from the buyer does the contract pay the seller. Every activity and functionality can be accessed only by trusted users.
6 Future Work

Future work will include designing a user-friendly UI and making arrangements for delivery agents. The product tracking status will be updated during the
delivery process. The delivery of products could also be made global. The two-step verification process will be improved, along with better search engines and advanced recommendation systems like those of e-commerce sites.
References

1. Ahamed NN, Karthikeyan P, Anandaraj SP, Vignesh R (2020) Seafood supply chain management using blockchain. In: 6th international conference on advanced computing and communication systems (ICACCS). IEEE, Coimbatore, India, pp 473–476
2. Thejaswini S, Ranjitha KR (2020) Blockchain in agriculture by using decentralized peer to peer networks. In: 2020 fourth international conference on inventive system and control (ICISC). IEEE, Coimbatore, India, pp 600–306
3. Song L, Wang X, Merveille N (2020) Research on blockchain for sustainable e-agriculture. In: IEEE technology & engineering management conference (TEMSCON). IEEE, Novi, MI, USA, pp 1–5
4. Shahid A, Almogren A, Javaid N, Al-Zahrani FA, Zuair M, Alam M (2020) Blockchain-based agri-food supply chain: a complete solution. IEEE Access 8:69230–69243
5. Awan SH, Nawaz A, Ahmed S, Khattak HA, Zaman K, Najam Z (2020) Blockchain based smart model for agricultural food supply chain. In: International conference on UK-China emerging technologies (UCET). IEEE, Glasgow, UK, pp 1–5
6. Voulgaris S, Fotiou N, Siris VA, Polyzos GC, Tomaras A, Karachontzitis S (2020) Hierarchical blockchain topologies for quality control in food supply chains. In: European conference on networks and communications (EuCNC). IEEE, Dubrovnik, Croatia, pp 139–143
7. Salah K, Nizamuddin N, Jayaraman R, Omar M (2019) Blockchain-based soybean traceability in agricultural supply chain. IEEE Access 7:73295–73305
8. Basnayake BMAL, Rajapakse C (2019) A blockchain-based decentralized system to ensure the transparency of organic food supply chain. In: International research conference on smart computing and systems engineering (SCSE). IEEE, Colombo, Sri Lanka, pp 103–107
9. Wang S, Li D, Zhang Y, Chen J (2019) Smart contract-based product traceability system in the supply chain scenario. IEEE Access 7:115122–115133
10. Jaiswal A, Chandel S, Muzumdar A, Madhu GM, Modi C, Vyjayanthi C (2019) A conceptual framework for trustworthy and incentivized trading of food grains using distributed ledger and smart contracts. In: 16th India council international conference (INDICON). IEEE, Rajkot, India, pp 1–4
11. Tsang YP, Choy KL, Wu CH, Ho GTS, Lam HY (2019) Blockchain-driven IoT for food traceability with an integrated consensus mechanism. IEEE Access 7:129000–129017
12. Yeh J-Y, Liao S-C, Wang Y-T, Chen Y-J (2019) Understanding consumer purchase intention in a blockchain technology for food traceability and transparency context. In: Social implications of technology (SIT) and information management (SITIM). IEEE, Matsuyama, Japan, pp 1–6
13. Madumidha S, Siva Ranjani P, Vandhana U, Venmuhilan B (2019) A theoretical implementation: agriculture-food supply chain management using blockchain technology. In: TEQIP III sponsored international conference on microwave integrated circuits, photonics and wireless networks (IMICPW). IEEE, Thiruchirappalli, India, pp 174–178
14. Hayati H, Nugraha IGBB (2018) Blockchain based traceability system in food supply chain. In: International seminar on research of information technology and intelligent systems (ISRITI). IEEE, Yogyakarta, Indonesia, pp 120–125
15. Lin W, Huang X, Fang H, Wang V, Hua Y, Wang J, Yau L (2020) Blockchain technology in current agricultural systems: from techniques to applications. IEEE Access 8:143920–143937
16. Zhang X, Sun P, Xu J, Wang X, Yu J, Zhao Z, Dong Y (2020) Blockchain-based safety management system for the grain supply chain. IEEE Access 8:36398–36410
Chapter 41
Data Balancing for a More Accurate Model of Bacterial Vaginosis Diagnosis
Jesús Francisco Perez-Gomez, Juana Canul-Reich, Rafael Rivera-Lopez, Betania Hernández Ocaña, and Cristina López-Ramírez
J. F. Perez-Gomez · J. Canul-Reich (B) · B. H. Ocaña · C. López-Ramírez
Universidad Juarez Autonoma de Tabasco, CP 86690 Cunduacán, Tabasco, Mexico
e-mail: [email protected]
R. Rivera-Lopez
Instituto Tecnológico de Veracruz, CP 91800 Veracruz, Veracruz, Mexico

1 Introduction

Bacterial Vaginosis (BV) is one of the great enigmas in women's health: a condition of unknown etiology, associated with significant morbidity and unacceptably high recurrence rates [1]. BV is related to a wide array of health issues, such as preterm birth, pelvic inflammatory disease, increased susceptibility to HIV infection, and other health problems [2]. This illness is detected mainly through two clinical procedures: the Amsel criteria and the Nugent score. The first method requires the presence of three of the following four criteria: increased homogeneous thin vaginal discharge; secretion pH greater than 4.5; amine odor when potassium hydroxide solution is added; and the presence of clue cells in wet preparations [3]. In the second method, the morphotypes that correspond to large Gram-positive rods, small Gram-negative rods, and curved Gram-variable rods are quantified [4].

Previous studies with machine-learning methods have shown advances in bacterial vaginosis research. In [5], the authors experimented with classifiers such as the support vector machine (SVM) and logistic regression (LR), which classified BV categories with accuracies of 95 and 71%, respectively. Performance measures such as balanced accuracy, sensitivity, and specificity were calculated and compared between the models created. There, based on feature selection methods such as Relief and DT, feature rankings were created with the most important features in the BV dataset. From these rankings, two sub-datasets with the 15 most important features were created and used in additional experiments. Features such as Prevotella,
Megasphaera, Atopobium, Dialister, and pH turned out to be the most relevant. Those results are compared with the results of this work.

The work of Menardi and Torelli [6] motivated the use of methods for data balancing. They demonstrated that with the Random Over-Sampling Examples (ROSE) technique used as a remedy for data imbalance, a slight increase in classification performance can be obtained. ROSE's creation of synthetic samples by oversampling the minority class improves the area under the curve (AUC) of classifiers such as classification trees and logistic models. Their empirical analysis indicates that the larger the original dataset, the higher the AUC obtained. It also shows that, in most of the scenarios examined, ROSE outperforms other techniques such as SMOTE. The motivation for this work is to improve the classification performance obtained in the previous work [5] by using a more balanced dataset.

This paper is organized as follows. Section 2 details the datasets used for the experiments and the methods and techniques applied from the machine-learning area. Section 3 describes the experimental design. Section 4 presents the results obtained through the experiments. Finally, Sect. 5 presents the conclusions.
2 Materials and Methods

2.1 Dataset

The dataset used is based on a study of vaginal bacterial communities by Ravel et al. [7]. The samples were obtained at three clinical sites: two in Baltimore at the University of Maryland School of Medicine and one in Atlanta at Emory University. The dataset contains vaginal microbiology information on 396 asymptomatic North American women, of whom 97 were classified as positive for BV. The vaginal samples were obtained through two self-collected swabs: the first swab provided the Nugent criteria and clinical data, while the other was subjected to whole-genome DNA extraction for pyrosequencing of the barcoded 16S rRNA gene amplicon. The dataset contains 253 features/columns and 396 samples/instances. The information and dataset of this study are publicly available at clinicaltrials.gov under ID Number NCT00576797 (Emory University, 2007).

The preprocessing of this dataset was conducted as follows. First, all categorical values in the dataset were replaced by integer numbers (e.g., for NugentScore_Cat, the "low" category was replaced by "1", the "intermediate" category by "2", and the "high" category by "3"). Additionally, explicit class labels of BV + were given to instances with a Nugent score value ≥ 7; otherwise, the instance class was set to BV −, according to Beck and Foster's definition [8]. The features after the preprocessing phase are shown in Table 1.
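For illustration, the preprocessing just described might be expressed in R as follows (a sketch under the assumption that the raw data are loaded into a hypothetical data frame `bv_data` with the column names of Table 1):

```r
# Recode the categorical Nugent score groups as integers 1-3.
bv_data$NugentScore_Cat <- as.integer(factor(bv_data$NugentScore_Cat,
                             levels = c("low", "intermediate", "high")))

# Assign explicit class labels following Beck and Foster's rule [8]:
# a Nugent score >= 7 marks the instance as BV positive.
bv_data$BV <- factor(ifelse(bv_data$NugentScore >= 7, "BV+", "BV-"))
table(bv_data$BV)  # expected: 97 BV+, 299 BV-
```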
Table 1 Summary of features in the preprocessed bacterial vaginosis dataset [7]

| Features | Values |
|---|---|
| BV | 1 = positive or 2 = negative for BV |
| EthnicGroup | Ethnic group to which the test subject belongs: 1 = Asian, 2 = African American, 3 = Hispanic, and 4 = White |
| pH | Degree of alkalinity or acidity of a sample |
| NugentScore | Scoring system for vaginal swabs to diagnose BV: 7 to 10 is consistent with BV + |
| NugentScore_Cat | Nugent score groups according to their values |
| CommunityGroup | Microbial community to which the test subject belongs |
| Megasphaera, Eggerthella, Lactobacillus Crispatus, and 247 other features | Count of microorganisms in the vaginal analysis obtained by the qPCR technique on the 16S rRNA gene |
2.2 Machine Learning Methods and Metrics

The techniques, methods, and algorithms used across all experiments are briefly described. R (version 3.6.1) was used through the RStudio environment (version 1.2.5001).

Support Vector Machine

SVM is an algorithm capable of creating models that represent the sample points in a space so as to separate the classes as much as possible [9]. A hyperplane (a straight line in the two-dimensional case) is placed so as to leave the largest possible margin on each side [10]. New samples fall into one class or the other according to how close they are to the model. SVM evaluates how well an individual feature contributes to the separation between classes [10], which can be used to produce a feature ranking. In an SVM process, feature relevance is obtained using the feature weights as criteria, computed as the product between the alpha values and the support vectors of the resulting model [11]. Features with a weight nearest to 1 are the most relevant; in contrast, features with a weight closer to −1 are the least relevant. The e1071 [12] R software package provides an implementation of the SVM algorithm, which was run with the following parameters: type "C-classification", kernel "linear", cost "1".

Decision Tree

The Decision Tree (DT) is a rule-based classification method. In each iteration, the algorithm selects the attribute with the highest gain ratio as the attribute on which the tree branches (the splitting attribute) [13]. To classify a new instance, it is routed down the tree according to the values of the attributes tested in successive
nodes, and when a leaf is reached the instance is classified according to the class assigned to that leaf [14]. In the decision tree creation process, feature relevance is obtained using entropy as the ranking criterion, a theoretical measure of the "information uncertainty" contained in a training dataset. Features are selected during the construction of the decision tree, with the most relevant features near the root of the tree and the least important near the leaves [15]. The decision tree algorithm was implemented using the caret [12] R software package with the following parameters: type c("Classification"), C = 0.01, M = 1.

Logistic Regression

This classification method relates a set of predictor variables to a categorical variable using a linear model [16]. It performs a regression for each class, setting the output equal to one for training instances that belong to the class and zero for those that do not; the result is a linear expression for the class. Then, given a test example of unknown class, it calculates the value of each linear expression and chooses the largest one [14]. A feature ranking can be obtained from a logistic regression model: the mean absolute value of the coefficient magnitudes was used as the ranking criterion. This value quantifies the intensity of the linear relationship between two features. Features with the highest coefficient values are the most relevant, as opposed to those with the lowest coefficient values, which are the least relevant. The LR algorithm was implemented using the caret [12] R software package with the parameters method = "glm", family = "binomial".

Relief

The Relief algorithm is an individual-evaluation, filter-type feature selection method. Relief feature scoring is based on identifying feature value differences between nearest-neighbor instance pairs [17]. These feature statistics are referred to as feature weights or feature scores and can range from −1 (worst) to +1 (best) [18]. The Relief method is implemented in the FSelector [19] R software package.

ROSE

ROSE is a method based on the bootstrap technique that aids the task of binary classification in the presence of rare (imbalanced) classes. It generates synthetic balanced samples to strengthen the subsequent estimation of any binary classifier [20]. To create the synthetic examples, the smoothed bootstrapping method is applied in the feature-space neighborhood around the minority class. Unlike other data balancing methods, ROSE can apply the oversampling only to the minority class, leaving the other class unaffected. With this process, the two classes can be completely balanced in proportion to the majority class. The ROSE method is provided in the ROSE [20] R software package.
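For concreteness, a minimal sketch of how these packages might be invoked with the parameters listed above is shown below (object names such as `bv_train` are hypothetical, and the caret method chosen for the decision tree is an assumption: the C/M parameters reported above correspond to a C4.5-style tree such as caret's "J48"):

```r
library(e1071)      # SVM
library(caret)      # train() wrapper for the DT and LR models
library(FSelector)  # Relief

# bv_train: hypothetical training partition with factor column BV.
svm_model <- svm(BV ~ ., data = bv_train,
                 type = "C-classification", kernel = "linear", cost = 1)
# Linear-kernel feature weights: product of the alpha values (coefs)
# and the support vectors, as described above.
svm_weights <- t(svm_model$coefs) %*% svm_model$SV

dt_model <- train(BV ~ ., data = bv_train, method = "J48")  # assumed method

lr_model <- train(BV ~ ., data = bv_train,
                  method = "glm", family = "binomial")

relief_weights <- relief(BV ~ ., data = bv_train)
```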
K-Fold Cross-Validation

This is a method for obtaining reliable estimates on datasets with few instances and is usually called k-fold cross-validation, or simply k-FCV. Torgo [21] describes the method as follows: the dataset is randomly divided into k partitions; the method then performs k loops in which k − 1 partitions of the original dataset are used for training and the remaining one for testing. For each fold, the evaluation measure of the model, obtained from the confusion matrix, is calculated and accumulated; when all k loops have ended, the cross-validation accuracy is obtained. An advantage of cross-validation is that the variance of the resulting estimate is reduced as k increases [13]. The most popular value for k is 10 according to the specialized literature [8], and this value is also used in the experiments of this study.
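A single 10-FCV run of the kind used throughout this study could be sketched as follows (illustrative; `bv_data` is hypothetical, and each of the 30 repetitions would use a different seed):

```r
library(caret)  # createFolds()
set.seed(1)     # changed on every one of the 30 repetitions
folds <- createFolds(bv_data$BV, k = 10)  # 10 held-out test index sets

fold_acc <- sapply(folds, function(test_idx) {
  fit  <- e1071::svm(BV ~ ., data = bv_data[-test_idx, ],
                     type = "C-classification", kernel = "linear", cost = 1)
  pred <- predict(fit, newdata = bv_data[test_idx, ])
  mean(pred == bv_data$BV[test_idx])      # fold accuracy
})
mean(fold_acc)  # cross-validation accuracy for this run
```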
2.3 Performance Measures

Accuracy

This estimates the performance of a classifier, i.e., how well the classifier partitions samples into the BV categories (positive or negative) [22]. It refers to the proportion of instances that are correctly classified and is also known as overall accuracy. It is calculated using Eq. (1):

\[ \text{Accuracy} = \frac{tp + tn}{tp + tn + fp + fn} \tag{1} \]

where tp denotes true positive, tn true negative, fp false positive, and fn false negative prediction values in a confusion matrix.

Balanced Accuracy

Also known as weighted accuracy, this is the accuracy computed individually for each class without distinguishing between the other classes [14]. It is calculated from the prediction values in the confusion matrix with Eq. (2):

\[ \text{Balanced Accuracy} = \frac{1}{2}\left( \frac{tp}{tp + fn} + \frac{tn}{fp + tn} \right) \tag{2} \]

Precision

This measure, also named Positive Predictive Value (PPV), is the proportion of instances classified as positive that are really positive [23]. It can be calculated from the prediction values using Eq. (3):

\[ \text{Precision} = \frac{tp}{tp + fp} \tag{3} \]
Sensitivity

This performance measure, also named recall, gives the proportion of positive instances classified as positive [24]. The sensitivity is calculated using Eq. (4):

\[ \text{Sensitivity} = \frac{tp}{tp + fn} \tag{4} \]
Table 2 Structure to calculate the mean importance value (MIV) for each feature in the dataset

| Features | Run1 | … | Run30 | MIV |
|---|---|---|---|---|
| Feat1 | Importance | … | Importance | MIV_Feat1 |
| Feat2 | Importance | … | Importance | MIV_Feat2 |
| Feat3 | Importance | … | Importance | MIV_Feat3 |
| … | … | … | … | … |
| Feat253 | Importance | … | Importance | MIV_Feat253 |
Specificity

This is the proportion of negative instances classified as negative [24]. The specificity is calculated using Eq. (5):

\[ \text{Specificity} = \frac{tn}{tn + fp} \tag{5} \]
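Equations (1)-(5) translate directly into a small helper (a sketch; in practice the counts would come from each fold's confusion matrix):

```r
# Performance measures from confusion-matrix counts.
bv_metrics <- function(tp, tn, fp, fn) {
  c(accuracy          = (tp + tn) / (tp + tn + fp + fn),       # Eq. (1)
    balanced_accuracy = (tp / (tp + fn) + tn / (tn + fp)) / 2, # Eq. (2)
    precision         = tp / (tp + fp),                        # Eq. (3)
    sensitivity       = tp / (tp + fn),                        # Eq. (4)
    specificity       = tn / (tn + fp))                        # Eq. (5)
}

bv_metrics(tp = 95, tn = 290, fp = 9, fn = 2)  # illustrative counts only
```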
Feature Relevance

According to Yu and Liu [25], the relevance of a feature indicates its level of importance for an optimal subset. A strongly relevant feature cannot be removed without affecting the original conditional class distribution; weak relevance suggests that the feature is not always necessary but may become necessary for an optimal subset under certain conditions; an irrelevant feature is not necessary at all. Table 2 shows the structure used to obtain the relevance of each feature in the dataset with the proposed feature selection algorithms (FSA); this procedure was also implemented in [5]. The Mean Importance Value (MIV) is the discrimination measure used to create the feature rankings: it measures the average importance of the feature across the 30 runs of 10-FCV. This is performed for each FS algorithm.
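The MIV computation of Table 2 amounts to a row mean over the per-run importances (a sketch; `importance_runs` is a hypothetical 253 × 30 matrix, one column per 10-FCV run):

```r
miv     <- rowMeans(importance_runs)     # average importance per feature
ranking <- sort(miv, decreasing = TRUE)  # the feature ranking
top15   <- names(ranking)[1:15]          # features kept for a sub-dataset
```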
3 Experimental Design

This work is based on experiments with classifier methods evaluated on a BV dataset. Three scenarios were explored, as detailed below.
3.1 Scenario One: "Performance on Imbalanced Dataset"

The first scenario consisted of 30 runs implementing a 10-fold cross-validation scheme for each of the SVM, LR, and DT classification algorithms. In each experiment performed in this scenario, the original imbalanced Ravel dataset described in
Sect. 2.1 was used. Here, the complete BV dataset was used to create the classification models. In each cross-validation iteration, 9 folds of instances were used for the model training phase and the 10th fold for the testing phase. In each run, the performance measures of the predictive models detailed in Sect. 2.3 were calculated. In the end, the performance measures obtained across all 30 runs were averaged to obtain an overall performance of the models. To ensure data randomness in the training and testing phases, different seeds were used across the 30 runs; the same seed number was used in all classifier runs for comparison purposes.
3.2 Feature Ranking Based on Four Criteria on the Imbalanced Dataset

In the previous work [5], two feature rankings with the most relevant features were obtained using the DT and Relief methods (more details on how feature relevance is obtained can be found in Sect. 2). In this work, two more feature rankings, using the SVM and LR methods, were calculated to evaluate the feature relevancies on the original imbalanced dataset. Experiments with each method consisted of a 10-FCV scheme repeated 30 times. In each iteration of the cross-validation scheme, a relevancy level was calculated for each feature on the training set, that is, 90% of the original imbalanced dataset. The feature relevance measure is based on the SVM and LR ranking criteria described in Sect. 2. The relevancy measures were averaged over the 10-FCV process, and an overall relevance (MIV) across the 30 runs was obtained for each feature in the BV dataset. To ensure the randomness of the data, a different seed was used in each 10-FCV run. Comparisons across all feature rankings obtained with each ranking criterion are provided.
3.3 Scenario Two: "Performance on Balanced Dataset"

The second scenario consisted of experiments conducted similarly to Sect. 3.1, that is, within a 10-FCV scheme repeated 30 times with different seeds for each of the SVM, LR, and DT classification algorithms. Unlike the first scenario, the dataset was previously balanced with an oversampling technique: the ROSE technique was applied over the original BV dataset using the smoothed bootstrapping method. With this preprocessing, a completely balanced dataset was obtained (299 instances per class). For a more detailed explanation of the ROSE technique, see Sect. 2.2. As in scenario one, the mean performance measures described in Sect. 2.3 were calculated. Table 3 shows the total instances in both the imbalanced and balanced datasets.
Table 3 Results of data balancing with the random over-sampling examples (ROSE) technique

|                       | BV + | BV − | Total instances |
|-----------------------|------|------|-----------------|
| Imbalanced BV dataset | 97   | 299  | 396             |
| Balanced BV dataset   | 299  | 299  | 598             |
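The balancing step could be reproduced along these lines (a sketch; `bv_data` is hypothetical, and since ROSE draws each synthetic example's class with probability p, the per-class counts for a given seed match the 299/299 design of Table 3 only approximately):

```r
library(ROSE)
set.seed(1)
# Smoothed-bootstrap synthetic sample: N = 598 rows with p = 0.5
# targets the fully balanced dataset of Table 3.
balanced <- ROSE(BV ~ ., data = bv_data, N = 598, p = 0.5)$data
table(balanced$BV)
```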
3.4 Feature Ranking Based on Four Criteria on the Balanced Dataset

With the BV dataset balanced, four feature rankings with the most relevant BV features were obtained. All features in the dataset were evaluated to obtain a relevancy measure for BV diagnosis. This relevancy measure is based on the SVM, LR, DT, and Relief ranking criteria detailed in Sect. 2. All feature relevancies were calculated using the same validation scheme as in Sect. 3.2, that is, within a 10-FCV scheme repeated 30 times with different seeds for each method, using the balanced BV dataset. In this way, an averaged feature relevancy over the 30 runs was obtained for each feature in the balanced dataset. From the resulting feature rankings, four sub-datasets containing only the top fifteen features were created, one per ranking. This number of features was used in previous works on BV predictors [5, 7, 8] and is used here for comparative purposes. Only the sub-datasets of the fifteen most relevant features created in this section are used in the experiments of Sect. 3.5.
3.5 Scenario Three: "Performance on Four Sub-Datasets"

The third scenario consisted of experiments with the classifiers conducted similarly to Sects. 3.1 and 3.3, involving 30 runs of each classification method under 10-FCV. Here, the four sub-datasets with the fifteen most relevant features obtained in Sect. 3.4 were used to create the classification models. From these experiments, all performance measures described in Sect. 2.3, such as accuracy, balanced accuracy, and precision, were calculated. The general process of all experiments is shown in Fig. 1.
4 Results

4.1 Feature Rankings Based on the Imbalanced BV Dataset

The rankings with the most relevant features of the imbalanced BV dataset were calculated. For this, machine-learning methods were implemented to calculate the feature relevancy using the original imbalanced BV dataset.
Fig. 1 The general process of all experiments to evaluate the performance of three classification methods with a dataset of bacterial vaginosis (BV), implemented in three scenarios. The red, blue, and green lines symbolize "Scenario one", "Scenario two", and "Scenario three", respectively. SVM: support vector machine, DT: decision tree, LR: logistic regression, ROSE: random over-sampling examples
The relevancy of each feature in the original imbalanced dataset was calculated based on the SVM, LR, DT, and Relief ranking criteria detailed in Sect. 2. The fifteen most relevant features for BV are shown in Table 4.
4.2 Feature Rankings Based on the Balanced BV Dataset

The feature rankings obtained by the Relief, SVM, LR, and DT methods using the dataset balanced with the ROSE method are shown in Table 5. From those rankings, the sub-datasets used in the "balanced and filtered performance" experiments were created, one per ranking, by selecting only the top fifteen features with the highest relevance measure.
Table 4 Feature rankings of bacterial vaginosis (BV) obtained by the relief, decision tree (DT), logistic regression (LR), and support vector machine (SVM) methods with the use of the original imbalanced dataset

| # | Relief Ranking | DT Ranking | LR Ranking | SVM Ranking |
|---|----------------|------------|------------|-------------|
| 1 | Nugent_score_ba | Nugent_scorec | Firmicutes_3 | Nugent_score_ba |
| 2 | Nugent_scorec | Nugent_score_ba | Nugent_score_ba | Finegoldia |
| 3 | Prevotellac | Prevotellac | Clostridiales_15 | Lactobacillales_2 |
| 4 | Megasphaerac | Dialisterc | Ruminococcac_8 | Clostridiales_17 |
| 5 | Community_grcc | Gardnerella | Firmicutes_5 | Gemella |
| 6 | pHc | Megasphaerac | Bacteria_4 | Prevotellaceae_2 |
| 7 | Sneathiac | pHc | Acidovorax | Peptococcus |
| 8 | Dialisterc | Atopobiumc | Neisseria | Arcanobacterium |
| 9 | Eggerthellac | Eggerthellac | Bacteria_23 | L._vaginalis |
| 10 | Ruminococcac_3c | Sneathiac | OD1_genera_se | Bifidobacterium |
| 11 | Lachnospirac_8 | Community_grcc | Conchiformibius | Lactobacillus_2 |
| 12 | Atopobiumc | Parvimonas | Bacillus_c | Varibaculum |

Features common to all four rankings are labeled "a" (1). Features common to three rankings are labeled "b" (0). Features common to two rankings are labeled "c" (12).

Table 5 Feature rankings of bacterial vaginosis (BV) obtained by the relief method, decision tree (DT), logistic regression (LR), and support vector machine (SVM) with the use of a balanced BV dataset

| # | Relief Ranking | DT Ranking | LR Ranking | SVM Ranking |
|---|----------------|------------|------------|-------------|
| 1 | Nugent_score_bb | Nugent_scoreb | Firmicutes_3 | Nugent_scoreb |
| 2 | Nugent_scoreb | Nugent_score_bb | Nugent_score_bb | Gardnerella |
| 3 | L._iners | Dialisterc | Firmicutes_5 | Eggerthellab |
| 4 | Megasphaerab | Prevotellab | Ruminococ_8 | Fusobacterium |
| 5 | Community_grc_c | Megasphaerab | Bacteria_4 | pHb |
| 6 | L._crispatus | Gardnerella | Clostridiales_15 | Peptostreptococ |
| 7 | pHb | Eggerthellab | OD1_genera_se | Lactobacilla_1 |
| 8 | Prevotellab | Atopobium | Bacteria_23 | Ruminococ_3b |
| 9 | Ethnic_Groupa | Sneathiac | Acidovorax | Lachnospirac_7 |
| 10 | Lactobacilla_7c | pHb | Flexibacterac_3 | Megasphaerab |
| 11 | Sneathiac | Community_gc_c | Actinobacillus | Mycoplasma_1 |
| 12 | Ruminococcac_3b | Parvimonas | Lactobacilla_7c | Actinomyces |
| 13 | Lactobacllales_2 | Ruminococca_3b | Haemophilus | Peptoniphilus |
| 14 | Dialisterc | Peptoniphilus | Bradyrhizob | Streptococcus |
| 15 | Eggerthellab | Aerococcus | Gallicola | Prevotellab |

Features common to all four rankings are labeled "a" (0). Features common to three rankings are labeled "b" (7). Features common to two rankings are labeled "c" (4).
Table 6 Total frequencies of bacterial vaginosis (BV) features common to the eight feature rankings: four of them using the original imbalanced dataset obtained from previous work [5] and the other four using the balanced dataset obtained in this work

| BV feature | Frequency |
|---|---|
| Nugent_score_categoryb | 7 |
| Eggerthella | 5 |
| Megasphaera | 5 |
| Nugent_score | 5 |
| pH | 5 |
| Prevotella | 5 |
| Ruminococcaceae_3 | 5 |
| Community_groupc_ | 4 |
| Dialister | 4 |
| Peptoniphilus | 4 |
| Sneathia | 4 |
| Atopobium | 3 |
| Gardnerella | 3 |
4.3 Bacterial Vaginosis Predictors

The feature rankings obtained in the previous research [5] and the feature rankings obtained in this work are compared. Both papers used the BV dataset described in Sect. 2.1; however, the previous work used the original imbalanced dataset, whereas in this work a data balancing technique was applied prior to its use with the algorithms. A frequency distribution was developed to identify the number of times each feature appears in the eight feature rankings, comparing all feature rankings obtained by the Relief, SVM, DT, and LR methods using both the imbalanced and balanced datasets. This analysis is shown in Table 6. Many features are found in common between the feature rankings obtained using the imbalanced dataset and those obtained using the balanced dataset. Even though both investigations used the same dataset, the class-balancing technique allowed other predictors to be identified. In many cases, the feature rankings share similar attributes, but the relevance levels vary slightly. Clinical features such as Nugent_score_category, Nugent_score, and pH, and microorganisms such as Eggerthella, Megasphaera, Prevotella, Dialister, and Gardnerella, among others, are highlighted as the most relevant and the most related to BV diagnosis in all the feature rankings compared.
4.4 Bacterial Vaginosis Predictive Models in Three Scenarios

The experiments to evaluate the performance of the SVM, LR, and DT classification methods with the BV dataset in three scenarios were completed.
Table 7 Performance achieved by the classification methods with the use of the bacterial vaginosis (BV) dataset implemented in three scenarios

| Method | Sce | Dataset | Acc | Bal Acc | Prec | Sens | Spec |
|--------|-----|---------------|--------|--------|--------|--------|--------|
| SVM | 1 | Imbalanced | 0.9739 | 0.9586 | 0.9781 | 0.9884 | 0.9288 |
| SVM | 2 | Balanced | 0.9905 | 0.9905 | 0.9945 | 0.9866 | 0.9943 |
| SVM | 3 | Relief subset | 1 | 1 | 1 | 1 | 1 |
| SVM | 3 | SVM subset | 1 | 1 | 1 | 1 | 1 |
| SVM | 3 | DT subset | 1 | 1 | 1 | 1 | 1 |
| SVM | 3 | LR subset | 1 | 1 | 1 | 1 | 1 |
| LR | 1 | Imbalanced | 0.7594 | 0.7134 | 0.8695 | 0.8037 | 0.6231 |
| LR | 2 | Balanced | 0.8914 | 0.8914 | 0.9708 | 0.8081 | 0.9747 |
| LR | 3 | Relief subset | 1 | 1 | 1 | 1 | 1 |
| LR | 3 | SVM subset | 1 | 1 | 1 | 1 | 1 |
| LR | 3 | DT subset | 1 | 1 | 1 | 1 | 1 |
| LR | 3 | LR subset | 0.9985 | 0.9985 | 1 | 0.9970 | 1 |
| DT | 1 | Imbalanced | 1 | 1 | 1 | 1 | 1 |
| DT | 2 | Balanced | 1 | 1 | 1 | 1 | 1 |
| DT | 3 | Relief subset | 1 | 1 | 1 | 1 | 1 |
| DT | 3 | SVM subset | 1 | 1 | 1 | 1 | 1 |
| DT | 3 | DT subset | 1 | 1 | 1 | 1 | 1 |
| DT | 3 | LR subset | 1 | 1 | 1 | 1 | 1 |

Sce: scenario, Acc: accuracy, Bal Acc: balanced accuracy, Prec: precision, Sens: sensitivity, Spec: specificity, Ms: microseconds, SVM: support vector machine, DT: decision tree, LR: logistic regression
In each case, the mean of the performance measures across all 30 runs was calculated. The performances obtained by the SVM, DT, and LR methods are shown in Table 7. In the "performance on imbalanced dataset" scenario (scenario 1), all classifiers except DT obtained their worst performance with respect to the other scenarios; DT obtained the highest values on all performance measures, and SVM was the second most accurate. In the "performance on balanced dataset" scenario, DT was again the most accurate classifier, obtaining 100% on all performance measures. In the "performance on four sub-datasets" scenario, all classifiers obtained 100% on all performance measures, except LR: using the sub-dataset created by the LR ranking criterion, the LR classifier obtained the lowest performance of all classifiers in this scenario, although LR still performed better here than in the other scenarios. Fig. 2 compares the performance of the three classifiers in two cases: when the imbalanced dataset is used and when the balanced dataset is used.
Fig. 2 Graphic comparison of the performance obtained by the support vector machine (SVM), logistic regression (LR), and decision tree (DT) classification methods with a bacterial vaginosis dataset used in two cases: imbalanced and balanced
The performance achieved by the support vector machine and logistic regression methods on the imbalanced dataset was improved by using ROSE as the data-balancing technique. With this process, SVM increased its accuracy from 97 to 99% and LR increased its overall performance from 75 to 89%. DT maintained 100% in all performance measures in both scenarios compared.
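A minimal sketch of this balancing step in R, using the ROSE package of Lunardon et al. [20] together with caret, is given below. The toy bv data frame is a hypothetical stand-in for the real dataset, and svmRadial is just one of caret's SVM variants.

```r
library(ROSE)   # Lunardon et al. [20]
library(caret)

set.seed(1)
# Hypothetical 90/10 imbalanced stand-in for the BV dataset
bv <- data.frame(pH     = c(rnorm(90, 4.2, 0.3), rnorm(10, 5.5, 0.4)),
                 nugent = c(rnorm(90, 2.0, 1.0), rnorm(10, 8.0, 1.0)),
                 class  = factor(rep(c("noBV", "BV"), c(90, 10))))

balanced <- ROSE(class ~ ., data = bv, seed = 1)$data  # synthetic balanced sample
table(balanced$class)                                  # roughly 50/50 after ROSE

svm_fit <- train(class ~ ., data = balanced, method = "svmRadial",
                 trControl = trainControl(method = "cv", number = 10))
```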
5 Conclusion
In this paper, predictive models for BV diagnosis using a machine learning approach were investigated. Experiments to evaluate classification models such as the support vector machine, logistic regression, and decision tree were performed in three scenarios. First, the performance of these classifiers was evaluated on the original imbalanced BV dataset. Second, the three methods were evaluated on the BV dataset previously balanced with the ROSE technique. Third, the
classifiers were evaluated with the use of sub-datasets created by four ranking criteria. After running the experiments, the results were compared. Based on the results, all the classifiers improved their performance when the balanced dataset was used instead of the imbalanced one. The general performance of the classifiers also improved with the use of sub-datasets containing only the top BV predictors. Tree-based classifiers such as DT show high classifying ability for BV diagnosis, but their computational cost is higher than that of the support vector machine, so SVM is highlighted as the most trustworthy and the fastest classification method for BV diagnosis. In addition to the well-known microorganisms associated with BV, such as Prevotella, Gardnerella vaginalis, and Mycoplasma, among others [26], the machine learning methods implemented in this work identified other relevant microorganisms. Some of these microorganisms appear in common among the feature rankings obtained in this paper and are highlighted by a high correlation with the presence of BV. The biological significance of these microorganisms has been analyzed together with experts in the medical field. This work is ongoing research; other techniques and methods from the machine learning area are being explored.
References 1. Javed A, Parvaiz F, Manzoor S (2019) Bacterial vaginosis: an insight into the prevalence, alternative regimen treatments and it’s associated resistance patterns. Microb Pathog 127:21– 30. https://doi.org/10.1016/j.micpath.2018.11.046 2. Onderdonk AB, Delaney ML, Fichorova RN (2016) The human microbiome during bacterial vaginosis. Clin Microbiol Rev 29:223–238. https://doi.org/10.1128/CMR.00075-15 3. Amsel R, Totten PA, Spiegel CA, Chen KCS, Eschenbach D, Holmes KK (1983) Nonspecific vaginitis: diagnostic criteria and microbial and epidemiologic associations. Am J Med 74:14– 22. https://doi.org/10.1016/0002-9343(83)91137-3 4. Nugent RP, Krohn MA, Hillier SL (1991) Reliability of diagnosing bacterial vaginosis is improved by a standardized method of gram stain interpretation. J Clin Microbiol 29:297–301. https://doi.org/10.1128/jcm.29.2.297-301.1991 5. Pérez-Gómez JF, Canul-Reich J, Hernández-Torruco J, Hernández-Ocaña B (2020) Predictor selection for bacterial vaginosis diagnosis using decision tree and relief algorithms. Appl Sci 10:3291. https://doi.org/10.3390/app10093291 6. Menardi G, Torelli N (2014) Training and assessing classification rules with imbalanced data. Data Min Knowl Disc 28:92–122. https://doi.org/10.1007/s10618-012-0295-5 7. Ravel J, Gajer P, Abdo Z, Schneider GM, Koenig SSK, McCulle SL, Karlebach S, Gorle R, Russell J, Tacket CO, Brotman RM, Davis CC, Ault K, Peralta L, Forney LJ (2011) Vaginal microbiome of reproductive-age women. Proc Natl Acad Sci U S A 108:4680–4687. https:// doi.org/10.1073/pnas.1002611107 8. Beck D, Foster JA (2015) Machine learning classifiers provide insight into the relationship between microbial communities and bacterial vaginosis. BioData Min 8:1–9. https://doi.org/ 10.1186/s13040-015-0055-3 9. Wang H, Zheng B, Yoon SW, Ko HS (2018) A support vector machine-based ensemble algorithm for breast cancer diagnosis. Eur J Oper Res 267:687–699. https://doi.org/10.1016/j.ejor. 2017.12.001
10. Guyon I, Weston J, Barnhill S, Vapnik V (2002) Gene selection for cancer classification using support vector machines. Mach Learn 46:389–422. https://doi.org/10.1023/A:1012487302797 11. Canul-Reich J (2010) An iterative feature perturbation method for gene selection from microarray data. Retrieved from https://scholarcommons.usf.edu/cgi/viewcontent.cgi?article= 2587&context=etd 12. Kuhn M (2008) Building predictive models in R using the caret package. J Stat Softw 28:1–26. https://doi.org/10.18637/jss.v028.i05 13. Hernandez-Torruco J, Canul-Reich J, Frausto-Solis J, Mendez-Castillo JJ (2015) Towards a predictive model for Guillain-Barré syndrome. In: Proc Annu Int Conf IEEE Eng Med Biol Soc (EMBS), pp 7234–7237. https://doi.org/10.1109/EMBC.2015.7320061 14. Witten IH, Frank E, Geller J (2002) Data mining: practical machine learning tools and techniques with java implementations. Elsevier. https://doi.org/10.1145/507338.507355 15. Duch W, Grabczewski K, Winiarski T, Biesiada J, Kachel A (2002) Feature selection based on information theory, consistency and separability indices. In: ICONIP 2002 - Proc 9th Int Conf Neural Inf Process Comput Intell E-Age, vol 4, pp 1951–1955. https://doi.org/10.1109/ ICONIP.2002.1199014 16. Han J, Kamber M, Pei J (2012) Data mining: concepts and techniques, 3rd edn. Elsevier Amsterdam, Champaign, IL 17. Ghosh P, Azam S, Jonkman M, Karim A, Shamrat FMJM, Ignatious E, Shultana S, Beeravolu AR, De Boer F (2021) Efficient prediction of cardiovascular disease using machine learning algorithms with relief and lasso feature selection techniques. IEEE Access 9:19304–19326. https://doi.org/10.1109/ACCESS.2021.3053759 18. Urbanowicz RJ, Meeker M, La Cava W, Olson RS, Moore JH (2018) Relief-based feature selection: introduction and review. J Biomed Inform 85:189–203. https://doi.org/10.1016/j.jbi. 2018.07.014 19. Romanski P (2013) Package “FSelector”. Retrieved from http://cran.r-project.org/web/pac kages/FSelector/FSelector.pdf 20. Lunardon N, Menardi G, Torelli N (2014) ROSE: A package for binary imbalanced learning. R J 6:79–89. https://doi.org/10.32614/rj-2014-008 21. Torgo L (2010) Data Mining with R, learning with case studies. Chapman and Hall/CRC 22. Beck D, Foster JA (2014) Machine learning techniques accurately classify microbial communities by bacterial vaginosis characteristics. PLoS One 9(2):e87830. https://doi.org/10.1371/ journal.pone.0087830 23. Bramer M (2016) Principles of data mining. Springer, London. https://doi.org/10.1007/978-14471-7307-6 24. Bramer M (2013) Introduction to data mining. In: Principles of data mining. Undergraduate topics in computer science. https://doi.org/10.1007/978-1-4471-4884-5_1 25. Yu L, Liu H (2004) Efficient feature selection via analysis of relevance and redundancy. J Mach Learn Res 5:1205–1224 26. Sobel JD (2000) Bacterial vaginosis. Annu Rev Med 51:349–356. https://doi.org/10.1146/ann urev.med.51.1.349
Chapter 42
Approximate Adder Circuits: A Comparative Analysis and Evaluation Pooja Choudhary, Lava Bhargava, and Virendra Singh
1 Introduction
With the growing importance of IoT, Big Data, Artificial Intelligence, and neural networks, these applications demand huge volumes of data, complex computation, and data acquisition. Today's equipment and general-purpose computers require energy-efficient, high-performing integrated circuits (ICs), and ASICs are in huge demand to process large amounts of data while shrinking in size and embedding new technologies. Power consumption and time are the main components of energy efficiency; to improve it, both components must be reduced. A promising way to achieve this is approximate computing, discussed by Han and Orshansky in [1]. It is a technique that enhances performance and reduces power consumption by reducing accuracy. Exact computation is not always necessary: for applications where human senses are in play, approximate computing is appropriate because small errors are not easily recognized. Generally, the two types of methods for enhancing speed, performance, and efficiency are Voltage Over-Scaling (VOS), discussed in Hegde and Shanbhag [2], and redesigning the circuit for inherently error-resilient applications, presented in Liu et al. [3] and Mohapatra [4].
P. Choudhary (B) Department of Electronics and Communication Engineering, Malaviya National Institute of Technology, Jaipur 302017, Rajasthan, India e-mail: [email protected] P. Choudhary · L. Bhargava Department of Electronics and Communication Engineering, Swami Keshvanand Institute of Technology Management & Gramothan, Jaipur 302017, Rajasthan, India e-mail: [email protected] V. Singh Dept. of Electrical Engineering & Dept. of Computer Science & Engineering, Indian Institute of Technology Bombay, Mumbai 400076, Maharashtra, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 R. P. Yadav et al. (eds.), Proceedings of the International Conference on Paradigms of Computing, Communication and Data Sciences, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-19-8742-7_42
The paper is arranged in the following way: Sect. 2 focuses on approximate adders and their classification; Sect. 3 reviews the approximate adder circuits and compares them based on evaluations of their error and circuit characteristics, with the error definitions also given; Sect. 4 concludes the manuscript.
2 Approximate Adders
2.1 Preliminaries
Adders are utilized for the summation of two numbers. The most prevalent types of adders are the half adder, full adder, ripple-carry adder (RCA), and carry-lookahead adder (CLA), developed in Koren and Parhami [5, 6]. The half adder is the basic adder, and a full adder is built from half adders. In an RCA, full adders are cascaded, and the carry of every full adder is propagated to the next one, so the delay grows proportionally to n: each adder block has to wait for the carry from the previous one. The CLA, on the other hand, consists of carry-generate and carry-propagate blocks, and the sum and carry outputs are expressed in terms of the carry-generate and carry-propagate signals. This reduces the delay but requires a large circuit area; many methods and techniques were proposed by Lu [7] for the CLA to reduce its critical path and circuit complexity.
Since 1960, to speed up division, the Newton-Raphson algorithm presented by Rabinowitz [8] has been used to approximate quotients, followed by the iteration-based Goldschmidt algorithm discussed by Goldschmidt [9]. For multiplication and division, a logarithmic algorithm was proposed by Mitchell [10]. Even though dedicated approximation approaches for arithmetic circuits did not exist then, some simple approximation techniques were created, such as truncation-based multipliers that produce an output of the same bit width as the input. These fixed-width multipliers accumulate partial products to achieve approximations, as discussed by Lim, Schulte, and Swartzlander [11, 12]. In 2004, approximation was applied to adders and Booth multipliers by Lu [7] to increase the clock frequency. The adder circuit was approximated using the concept of an effective carry chain, discussed by Burks et al. [13], to shorten the full carry chain; compared with the original adder circuit, the critical path is much shorter. From 2008 onward, the approximation of adders and multipliers gathered huge attention, which resulted in many designs: Verma et al. [14] proposed the almost correct adder, Zhu et al. [15] the error-tolerant adder, Mahdiani et al. [16] the lower-part-OR adder, and Mohapatra et al. [4] the equal segmentation adder. New techniques and methods for logic synthesis also started developing that help lower area, complexity, and power dissipation; in approximation, error constraints play a major role. Iterative and automated approaches for generating approximate adders and multipliers have also been investigated by Venkataramani et al. [17], Vasicek et al. [18], and Mrazek et al. [19].
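To make the delay argument concrete, the following sketch (in R, the language used elsewhere in this volume, rather than a hardware description language) models exact ripple-carry addition bit by bit; the serial carry loop is exactly what makes the RCA delay grow with n.

```r
# Behavioral model of an exact n-bit ripple-carry adder: each stage's carry
# feeds the next stage, which is why the RCA delay grows linearly with n.
rca <- function(a, b, n = 16L) {
  carry <- 0L
  s <- 0L
  for (i in 0:(n - 1)) {
    ai <- bitwAnd(bitwShiftR(a, i), 1L)
    bi <- bitwAnd(bitwShiftR(b, i), 1L)
    si <- bitwXor(bitwXor(ai, bi), carry)                             # sum bit i
    carry <- bitwOr(bitwAnd(ai, bi), bitwAnd(carry, bitwXor(ai, bi))) # carry out
    s <- bitwOr(s, bitwShiftL(si, i))
  }
  s  # n-bit sum; the final carry-out is dropped
}
rca(40000L, 12345L)  # 52345 for n = 16
```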
2.2 Classification
2.2.1 Speculative Adders
A speculative design, as an early concept discussed by Lu [7], takes advantage of the fact that in most circumstances the longest actual carry chain of an n-bit adder is significantly shorter than n. In the n-bit speculative adder of Fig. 1a, only the preceding k bits (k < n) are used to forecast the carry for each sum bit. The idea of the speculative adder is used to design the almost correct adder (ACA); prediction of the carry is done from the LSBs, as shown in Fig. 1b. Due to its parallel implementation, the path delay decreases to O(log(k)), as described by Verma et al. [14]. However, the hardware overhead of this design is substantial; the overhead issue is minimized by sharing components. A behavioral sketch of this speculation scheme is given after Fig. 1 below.
Fig. 1 a An approximate speculative adder [7]. b ACA: almost correct adder; boxes represent the carry propagation path [14]
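The speculation can be mimicked in software. In the hedged R sketch below (a behavioral model, not the gate-level design of [14]), the carry into each bit is recomputed from only the k bits immediately below it, which is the essence of the ACA scheme.

```r
# Behavioral model of an n-bit speculative (ACA-style) adder: the carry into
# bit i is predicted from a window of at most k predecessor bits.
aca <- function(a, b, n = 16L, k = 5L) {
  s <- 0L
  for (i in 0:(n - 1)) {
    lo <- max(0L, i - k)                      # window covers bits lo .. i-1
    mask <- bitwShiftL(1L, i - lo) - 1L
    aw <- bitwAnd(bitwShiftR(a, lo), mask)
    bw <- bitwAnd(bitwShiftR(b, lo), mask)
    cin <- bitwShiftR(aw + bw, i - lo)        # speculative carry into bit i
    ai <- bitwAnd(bitwShiftR(a, i), 1L)
    bi <- bitwAnd(bitwShiftR(b, i), 1L)
    s <- bitwOr(s, bitwShiftL(bitwXor(bitwXor(ai, bi), cin), i))
  }
  s
}
aca(32767L, 1L)  # a carry chain longer than k is mispredicted, so the sum is wrong
```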
Fig. 2 Segmented adder: basic structure. The h-bit inputs are a_{i,h−1:0} and b_{i,h−1:0}
Segmented Adders
The adder is made up of multiple concurrent sub-adder blocks, each with its own carry-in, as discussed by Zhu et al. [15], Venkataramani et al. [17], Kahng and Kang [20], and Zhu et al. [21]. Figure 2 presents the basic structure of a segmented adder.
Equal Segmentation Adder (ESA) Mohapatra et al. [4] propose a method named dynamic segmentation with error compensation (DSEC) to approximate an adder. This approach breaks an n-bit adder down into a series of smaller adders that work in parallel with fixed carry inputs. The error compensation technique is not used in this study, as the main emphasis is on the approximation; hence the equal segmentation adder is used instead. Figure 3 presents the basic circuit for DSEC: n/k sub-adders are used, where l and k are the sizes of the first and subsequent sub-adders, respectively. As a result, ESA has an O(log(k)) delay and significantly lower hardware overhead than ACA. A sketch of this segmentation scheme follows Fig. 3 below.
Error Tolerant Adder Type-II (ETA-II) The next approximation concept discussed by Zhu et al. [15] for designing a segmentation-based adder is the error tolerant adder type-II. It differs from ESA in that it comprises separate carry and sum generators: the previously generated carry signal propagates to the next sum generator, as shown in Fig. 4. It is considered more accurate than ESA because a larger amount of information is used to predict each carry bit. It has a larger delay than ESA, but a similar level of circuit complexity, as discussed by Miao et al. [22].
Fig. 3 ESA: equal segmentation adder. k stands for the maximum length of the carry chain; l stands for the size of the first sub-adder [4]
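A behavioral R model of the equal-segmentation idea (with hypothetical parameters, and ignoring DSEC's error compensation, as the chapter does) makes the failure mode obvious: any carry that should cross a segment boundary is simply lost.

```r
# Behavioral model of an equal segmentation adder: independent k-bit
# sub-adders run in parallel with carry-in 0, so no carry crosses segments.
esa <- function(a, b, n = 16L, k = 4L) {
  s <- 0L
  mask <- bitwShiftL(1L, k) - 1L
  for (lo in seq(0L, n - 1L, by = k)) {
    seg <- bitwAnd(bitwShiftR(a, lo), mask) + bitwAnd(bitwShiftR(b, lo), mask)
    s <- bitwOr(s, bitwShiftL(bitwAnd(seg, mask), lo))  # segment carry-out dropped
  }
  s
}
esa(255L, 1L)  # 240 instead of 256: the carry out of the low segment is lost
```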
Fig. 4 ETA-II: error tolerant adder type-II; shaded blocks indicate carry propagation [15]
Accuracy-Configurable Approximate Adder (ACAA) Kahng and Kang [20] propose ACAA, an accuracy-configurable approximate adder whose circuit configuration may be changed at runtime to adjust the accuracy. To design an n-bit adder, [n/k − 1] 2k-bit sub-adders are used: each sub-adder adds 2k consecutive bits with a k-bit overlap, and all sub-adders work in parallel to keep the latency at O(log(2k)). The errors caused by the sub-adders are corrected using an error detection and correction circuit, and the approximate adder implements the accuracy configuration with a pipelined design. With the same k value for each sum bit, the carry propagation in ACAA and ETAII is the same, so their error characteristics are the same. The structure of ACAA is generalized in GeAr, the generic accuracy-configurable adder of Yang et al. [23].
Dithering Adder With upper and lower bounding modules, this type of adder employs a more significant and a less significant sub-adder, as presented in Camus et al. [24]. The carry-in signal of the MSB sub-adder is used to select the sum output of the less significant sub-adder, which is driven by an additional control signal. The dithering adder divides the n-bit adder into sub-adders [24]: the upper sub-adder is accurate, while the lower sub-adder consists of conditional upper and lower bounding modules. The "dither control" signal identifies the sum's upper and lower bounds, which are then passed via the sub-adder, resulting in a small error variance.
2.2.2 Approximate Carry Select Adders
Ebrahimi-Azandaryani et al. [25] and Du et al. [26] proposed carry select adders. These adders introduce approximations in the selection of the carry-in and of the sub-adder outputs. They are known as the approximate carry-select adder with sum
Fig. 5 Approximate CSA with sum selection [25]
Fig. 6 Approximate CSA with carry-in selection [25, 26]
selection, as seen in Fig. 5, and the approximate carry-select adder with carry-in selection, as illustrated in Fig. 6.
Speculative Carry Select Adder (SCSA) As shown in Fig. 7, an n-bit SCSA consists of m = [n/k] sub-adders known as window adders. Each window comprises adder0 and adder1, where adder0 has carry-in "0" and adder1 has carry-in "1", as presented by Kim et al. [27]. The critical path delay of SCSA is the sum of the delays of a sub-adder and a multiplexer. ETAII and SCSA acquire the same accuracy because of this carry function, but SCSA is more complex, as it contains an extra adder and a multiplexer compared with ETAII.
Carry Skip Adder (CSA) The adder is broken down into [n/k] blocks, each composed of a sub-adder and a sub-carry generator. The propagate signal of the ith block determines the carry-in of the (i + 1)th sub-adder: when the block propagates, the carry-in is taken from the (i − 1)th sub-carry generator's carry-out. As a result, this technique improves the accuracy of the carry prediction, as discussed by Ye et al. [28].
Gracefully-Degrading Accuracy-Configurable Adder (GDA) The GDA adder is made up of several basic adder units, each of which is a k-bit adder that can be implemented in any design scheme, as proposed by Lin et al. [29]. Control signals managed by a multiplexer circuit are used in each sub-adder to set the accuracy of GDA, and the carry propagation determines the delay of GDA.
Fig. 7 SCSA: Speculative Carry select adder [27]
Carry Speculative Adder (CSPA) This adder consists of a sum generator, two carry generators (carry0 and carry1), and a carry predictor. The carry signal for the (i + 1)th sum generator is selected by the ith carry predictor. Compared with SCSA, the hardware is reduced, as shown by Li and Zhou [30].
Consistent Carry Approximate Adder (CCA) The SCSA concept is used, and the sum is selected from adder0 and adder1 by a multiplexer; in CCA, however, the propagation of the Sel signals in the current and preceding blocks controls the multiplexer, so the carry prediction depends on both LSBs and MSBs. The path delay and area complexity are the same as for SCSA, as discussed by Hu and Qian [31].
Generate Signals Exploited Carry Speculation Adder (GCSA) The structures of GCSA and CSA are the same; the main difference is the carry selection. The carry-in signal is taken from the block's most significant generate signal; otherwise, it is the sub-carry generator's carry-out signal. The maximum relative error is efficiently controlled by this carry selection method, developed by Gupta et al. [32].
2.2.3 Approximate Full Adders
Further, another way to reduce an adder's delay path and power dissipation is to approximate the full adder itself, as proposed by Mahdiani et al. [16]. As illustrated in Fig. 8, approximate full adders (AFAs) are used to implement the l LSBs of an n-bit adder (l < n), while the (n − l) MSBs are computed by an accurate adder.
Lower-Part-OR Adder (LOA) In an n-bit LOA, the adder is divided into an (n − l)-bit MSB adder and an l-bit LSB part. The LSB inputs are simply processed by OR gates, while the MSB part is an accurate adder; its carry-in signal is generated using an AND gate. A behavioral sketch of this scheme is given below.
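The following R sketch of the LOA scheme just described (OR-ed lower part, AND-gate carry, accurate upper part) illustrates how small the error stays when only the low bits are approximated; the operand values are arbitrary.

```r
# Behavioral model of the lower-part-OR adder (LOA): the l LSBs are OR-ed,
# the (n - l) MSBs are added exactly, and an AND of the two operand bits at
# position l - 1 generates the carry into the accurate upper part.
loa <- function(a, b, n = 16L, l = 8L) {
  mask <- bitwShiftL(1L, l) - 1L
  lower <- bitwOr(bitwAnd(a, mask), bitwAnd(b, mask))       # approximate lower part
  cin <- bitwAnd(bitwAnd(bitwShiftR(a, l - 1L), 1L),
                 bitwAnd(bitwShiftR(b, l - 1L), 1L))        # AND-gate carry
  upper <- bitwShiftR(a, l) + bitwShiftR(b, l) + cin        # accurate upper part
  bitwOr(bitwShiftL(upper, l), lower)
}
loa(1000L, 2000L)  # 3064 instead of 3000: only the low bits are inexact
```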
Fig. 8 AFA: Approximate Full Adders [16]
Approximate Mirror Adders (AMAs) By lowering the number of transistors in the mirror adder and their internal node capacitance, five AMAs are proposed by Yang et al. [33]; these are used in the LSB block of the adder. The critical path delay is higher in AMA circuits 1-4 than in LOA because the carry propagates through each bit; in the fifth AMA circuit, the carry-out acts as an input and there is no carry propagation.
Approximate Full Adders Using Pass Transistors These adders use pass transistors as multiplexers and are based on XOR/XNOR gates; they are mostly used in low-power applications. The delay, area, and power-delay product (PDP) of the approximate XOR/XNOR-based adders (AXAs) are investigated and compared with an exact adder, with the error distance measure used to evaluate the robustness of the approximate designs. The design proposed by Almurib et al. [34] consumes less power and achieves higher performance than the accurate XOR/XNOR-based adder.
Inexact Adder Cells For approximate computation, three architectures for an inexact adder cell are suggested; compared with an exact adder and a known inexact design, they need a significantly smaller number of transistors. These non-ideal cells are simulated and compared on metrics such as latency, complexity, and energy-delay product. Pashaeifar et al. [35] propose a ripple-carry adder in which exact cells are replaced with inexact cells.
Approximate Reverse Carry Propagate Adder (RCPA) In the RCPA structure, the carry signal propagates from the most significant bit to the least significant bit, as discussed in Angizi et al. [36]; the carry input is therefore applied at the most significant position rather than taken as an output carry. In the presence of delay variations, this type of carry propagation provides more stability. Three implementations of the reverse carry propagate full adder (RCPFA) cell are presented, each with its own delay, power, energy, and accuracy trade-off. To create hybrid adders with configurable levels of
accuracy, the proposed structure can be paired with an exact carry adder. The results show that using the proposed RCPAs in hybrid adders can improve delay, energy, and energy-delay product by 27%, 6%, and 31%, respectively, while delivering improved levels of accuracy.
MOS/Spintronic Non-Volatile Full Adders Based on the non-volatile (NV) logic-in-memory structure, a new circuit-level design for an approximate adder is given by Mrazek et al. [37] and Liu et al. [38]. Two types of NV approximate adders are implemented using circuit reconfiguration and insufficient writing current. Jiang et al. [39] proposed a magnetic full adder (MFA) in which the spin-transfer-torque magnetic tunnel junction (STT-MTJ) is used as the NV memory element. The proposed approximate MFAs are realized in a 28 nm fully depleted silicon-on-insulator (FD-SOI) technology with ultra-thin body and buried oxide (UTBB). Power consumption, circuit delay, leakage power, error distance, and reliability are all reported as simulation results.
Finally, in addition to the categories listed above, a simply truncated adder (TruA) with reduced precision is considered as a baseline design. Using Cartesian genetic programming (CGP) and a multi-objective genetic algorithm, an autonomous library of 430 approximate 8-bit adders was also created.
3 Comparative Analysis
Error Characteristics A range of design metrics and analytical approaches can be used to evaluate approximate arithmetic circuits, as discussed by Qureshi and Hasan [40]. The error characteristics were evaluated using the metrics listed below. The error rate (ER) and the error distance (ED) are the two basic error measurements: the ER reflects the likelihood of an incorrect result being produced, and the ED depicts the arithmetic difference between the approximate and accurate outcomes. With the approximate outcome denoted E′ and the accurate outcome denoted E, the error distance and relative error distance are

ED = |E′ − E|,  RED = ED / E.

ED and RED reveal two crucial aspects of an approximate design: when two input combinations result in the same ED, the one with the lower correct result E has the higher RED. The mean ED (MED) and mean relative ED (MRED) are commonly used to characterize the accuracy of a design.
MED = Σ_{i=1}^{N} ED_i · P(ED_i)    (1)

MRED = Σ_{i=1}^{N} RED_i · P(RED_i)    (2)

where N is the total number of input combinations for a circuit, and P(ED_i) and P(RED_i) are the probabilities of occurrence of the ith ED and RED values. The NMED is defined as the normalization of MED by the maximum output of the precise design, and it can be used to compare the magnitudes of error of approximate designs of different sizes. The mean squared error (MSE) and root-mean-square error (RMSE) are additional popular measures of the magnitude of arithmetic errors:

MSE = Σ_{i=1}^{N} ED_i² · P(ED_i)    (3)

RMSE = √MSE    (4)
The normalized average error is defined as the average error divided by the maximum output of the accurate design, and the worst-case error of an approximate circuit reflects the largest ED value, as discussed in Hanif et al. [41] and Jiang et al. [42]. For the evaluation of the error characteristics, 16-bit approximate adders are considered; the approximate adders are simulated in MATLAB, as presented in Jiang et al. [43], using random input combinations. Table 1 depicts the simulation results of the various adder circuits. The number associated with each adder is the number of LSBs employed in the speculative adders; it also denotes the segment length in segmented adders and the number of truncated LSBs in truncated adders. Each sum bit has the same carry propagation in ETA II, ACAA, and SCSA, due to which these adders have the same ER, NMED, and MRED characteristics. As can be seen from Table 1, in terms of MRED, CSA is the most accurate of these approximate adders, while GCSA is the second most accurate. The design of LOA differs from that of the other approximate adders: its MSB part is completely accurate and the approximate portion is smaller, as presented in Reda and Shafique [44]. As a result, while LOA's MRED is small, its ER is huge; for a similar reason, TruA has the highest ER and the most prominent MRED. For ESA and CSPA, the ER and MRED are higher than those of most other approximate designs, while CCA, ETAII, SCSA, and ACAA have modest ER and MRED compared with the other approximate adders. LOA shows the lowest average error because it generates both positive and negative errors that can compensate each other, whereas the other approximate adders generate only negative errors, causing errors to accumulate; as a result, LOA works well for accumulative applications.
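These metrics can be evaluated exhaustively at small bit widths. The R sketch below (a stand-in for the MATLAB simulations of [43]) sweeps all input pairs of an 8-bit adder and computes ER, MED, NMED, and MRED; it reuses the loa() model sketched in Sect. 2.2.3.

```r
# Exhaustive error evaluation of an approximate adder at a small width:
# sweep all 2^n x 2^n input pairs and compute ER, MED, NMED and MRED.
eval_adder <- function(approx_add, n = 8L) {
  vals <- 0:(2^n - 1)
  grid <- expand.grid(a = vals, b = vals)
  exact  <- grid$a + grid$b
  approx <- mapply(approx_add, grid$a, grid$b, MoreArgs = list(n = n))
  ed <- abs(approx - exact)                      # error distance per input pair
  c(ER   = mean(ed != 0),
    MED  = mean(ed),
    NMED = mean(ed) / max(exact),
    MRED = mean(ifelse(exact > 0, ed / exact, 0)))
}
eval_adder(function(a, b, n) loa(a, b, n, l = 4L))  # LOA with a 4-bit lower part
```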
Table 1 Comparative error characteristics of the approximate adders

| Adder circuit | ER (%) | NMED (10⁻³) | MRED (10⁻³) |
|---|---|---|---|
| Speculative adders | | | |
| ACA-5 | 6.61 | 2.9 | 8.2 |
| GDA-4 | 5.8 | 4.7 | 9.1 |
| GeAr-4 | 11.5 | 2.1 | 7.3 |
| Segmented adders | | | |
| ESA-5 | 85 | 5.8 | 16.8 |
| ETAII-5 | 2.39 | 0.2 | 0.60 |
| ACAA-5 | 2.39 | 0.2 | 0.60 |
| Carry select adders | | | |
| SCSA-5 | 2.39 | 0.2 | 0.60 |
| CSPA-5 | 10.3 | 1.0 | 2.2 |
| CCA-5 | 3.08 | 0.2 | 0.4 |
| GCSA-5 | 2.02 | 0.1 | 0.5 |
| Approximate full adders | | | |
| LOA-8 | 90 | 0.54 | 2 |
| TruA-8 | 100 | 1.5 | 4.5 |
In terms of NMED and MRED, the segmented adders are not very precise, and the truncated adder is the least accurate of all the designs compared, with very large ER and MRED. The error characteristics of the three adder types ETAII, ACAA, and SCSA are identical. To evaluate the circuit characteristics, implementations of the approximate adders and of an accurate CLA are used, as in Reda and Shafique [44]. The designs are described in a hardware description language and synthesized with Synopsys Design Compiler, which generates the circuit characteristics; the same process, voltage, and temperature are used for all designs. The maximum performance of each circuit is evaluated through its delay, and the minimum power is examined through the area. The power, delay, and speed of the approximate circuits are compared directly against their ERs, NMEDs, and MREDs. Table 2 shows the comparative analysis of the circuit characteristics, and the summary results are tabulated in Table 3. Each SCSA block has two sub-adders and one multiplexer, so SCSA has the largest power dissipation and area despite being the quickest of ETA II, SCSA, and ACAA. Due to its long critical path, ACAA is extremely slow. The ETA II block is less complicated than the SCSA and ACAA blocks; as a result, ETA II uses less energy and takes up less space than SCSA and ACAA, and large-area circuits are more likely to consume more power. Among all adders, CLA has the longest delay. TruA is not the fastest design, but it is the most power- and space-efficient. According to the analysis, LOA is area-efficient in comparison with the other adders. ESA is the slowest, but its simple segmentation structure makes it power- and area-efficient. CCA has a complex hardware design because it is power- and area-
Table 2 Comparative analysis of circuit characteristics

| Adder circuit | Area (µm²) | Power (µW) | Delay (ps) |
|---|---|---|---|
| Speculative adders | | | |
| ACA-5 | 115 | 230 | 68.1 |
| GDA-4 | 95 | 260 | 68.0 |
| GeAr-4 | 101 | 285 | 59.2 |
| Segmented adders | | | |
| ESA-5 | 49.6 | 249 | 49.5 |
| ETAII-5 | 85.9 | 672 | 69.4 |
| ACAA-5 | 80.4 | 587 | 72.4 |
| Carry select adders | | | |
| SCSA-5 | 151 | 450 | 126.7 |
| CSPA-5 | 111.6 | 287 | 83.3 |
| CCA-5 | 195.8 | 485 | 131.1 |
| GCSA-5 | 98.7 | 380 | 169.7 |
| Approximate full adders | | | |
| LOA-8 | 61.9 | 425 | 49.3 |
| TruA-8 | 59.2 | 368 | 51.4 |
Table 3 Comparative analysis summary of the approximate adder circuits: qualitative ratings (LOW/HIGH for ER, ED, and speed; SMALL/LARGE for power and area) for ACA, ACAA, ESA, ETA II, GDA-4, GeAr-4, SCSA, CSA, CSPA, CCA, GCSA, LOA, and TruA; the per-adder assessments are restated in the surrounding discussion and in Sect. 4
consuming. CSPA and GCSA both dissipate considerable amounts of power, although CSPA is faster. CSA offers a medium range of speed and power.
4 Conclusion
This paper has examined current approximate adders together with their error and circuit characteristics. In summary, adders such as the speculative ACA are quite accurate, with modest ER and MRED (CSPA being an exception). LOA is an approximate adder with a reasonable MRED and the lowest average error, but a relatively large ER. Despite their moderate performance, the carry select adders tend to have significant power consumption and area. The segmented adders conserve both power and space. The speculative adder design is the fastest of all, but it also consumes a large amount of power and occupies a large area. The approximate full adders are sluggish, but their area and power consumption are low, which makes them quite efficient.
Acknowledgements This work is supported by the Visvesvaraya Ph.D. Scheme, MeitY, Govt. of India, MEITY-PHD-2950.
References 1. Han J, Orshansky M (2013) Approximate computing: an emerging paradigm for energyefficient design. In: 2013 18th IEEE European test symposium (ETS), Avignon, France. IEEE, pp 1–6 2. Hegde R, Shanbhag NR (2001) Soft digital signal processing. IEEE Trans Very Large Scale Integration VLSI Syst IEEE 9(6):813–823 3. Liu Y, Zhang T, Parhi KK (2009) Computation error analysis in digital signal processing systems with overscaled supply voltage. IEEE Trans Very Large Scale Integr VLSI Syst IEEE 18(4):517–526 4. Mohapatra D, Chippa VK, Raghunathan A, Roy K (2011) Design of voltage-scalable metafunctions for approximate computing. In: 2011 Design, automation & test in Europe, Grenoble, France. IEEE, pp 1–6 5. Koren I (2002) Computer arithmetic algorithms, 2nd edn. A K Peters 6. Parhami B (2010) Computer arithmetic, vol 20. Oxford University Press, New York, NY 7. Lu SL (2004) Speeding up processing with approximation circuits. Comput IEEE 37(3):67–73 8. Rabinowitz P (1961) Multiple-precision division. Commun ACM 4(2):98 9. Goldschmidt RE (1964) Applications of division by convergence. Doctoral dissertation, Massachusetts Institute of Technology 10. Mitchell JN (1962) Computer multiplication and division using binary logarithms. IRE Trans Electron Comput IEEE 4(11):512–517 11. Lim YC (1992) Single-precision multiplier with reduced circuit complexity for signal processing applications. IEEE Trans Comput IEEE 41(10):1333–1336 12. Schulte MJ, Swartzlander EE (1993) Truncated multiplication with correction constant [for DSP]. In: Proceedings of IEEE workshop on VLSI Signal Processing, Veldhoven, Netherlands. IEEE, pp 388–396
13. Burks AW, Goldstine HH, Neumann JV (1982) Preliminary discussion of the logical design of an electronic computing instrument. In: The origins of digital computers, Berlin, Heidelberg. Springer, pp 399–413 14. Verma AK, Brisk P, Ienne P (2008) Variable latency speculative addition: a new paradigm for arithmetic circuit design. In: Proceedings of the conference on design, automation and test in Europe, Munich, Germany. ACM, pp 1250–1255 15. Zhu N, Goh WL, Wang G, Yeo KS (2010) Enhanced low-power high-speed adder for errortolerant application. In: 2010 International SoC design conference, Incheon, Korea. IEEE, pp 323–327 16. Mahdiani HR, Ahmadi A, Fakhraie SM, Lucas C (2009) Bio-inspired imprecise computational blocks for efficient VLSI implementation of soft-computing applications. IEEE Trans Circuits Syst I Regul Pap 57(4):850–862 17. Venkataramani S, Sabne A, Kozhikkottu V, Roy K, Raghunathan A (2012) SALSA: systematic logic synthesis of approximate circuits. In: DAC Design automation conference, San Francisco, CA, USA. IEEE, pp 796–801 18. Vasicek Z, Sekanina L (2014) Evolutionary approach to approximate digital circuits design. IEEE Trans Evol Comput IEEE 19(3):432–444 19. Mrazek V, Sarwar SS, Sekanina L, Vasicek Z, Roy K (2016) Design of power-efficient approximate multipliers for approximate artificial neural networks. In: 2016 IEEE/ACM International conference on computer-aided design (ICCAD), Austin, TX,USA. ACM, pp 1–7 20. Kahng AB, Kang S (2012) Accuracy-configurable adder for approximate arithmetic designs. In: Proceedings of the 49th annual design automation conference, San Francisco, California. ACM, pp 820–825 21. Zhu N, Goh WL, Yeo KS (2011) Ultra low-power high-speed flexible probabilistic adder for error-tolerant applications. In: 2011 International SoC design conference, Jeju, Korea. IEEE, pp 393–396 22. Miao J, He K, Gerstlauer A, Orshansky M (2012) Modeling and synthesis of quality-energy optimal approximate adders. In: Proceedings of the International conference on computer-aided design, San Jose, California, pp 728–735 23. Yang X, Xing Y, Qiao F, Wei Q, Yang H (2016) Approximate adder with hybrid prediction and error compensation technique. In: 2016 IEEE Computer society annual symposium on VLSI (ISVLSI), Pittsburg, PA, USA. IEEE, pp 373–378 24. Camus V, Schlachter J, Enz C (2016) A low-power carry cut-back approximate adder with fixed-point implementation and floating-point precision. In: 2016 53nd ACM/EDAC/IEEE Design automation conference (DAC), Austin, TX, USA. IEEE, pp 1–6 25. Ebrahimi-Azandaryani F, Akbari O, Kamal M, Afzali-Kusha A, Pedram M (2019) Block-based carry speculative approximate adder for energy-efficient applications. IEEE Trans Circuits Syst II Express Briefs IEEE 67(1):137–141 26. Du K, Varman P, Mohanram K (2012) High performance reliable variable latency carry select addition. In: 2012 Design, automation & test in europe conference & exhibition (DATE), Dresden, Germany. IEEE, pp 1257–1262 27. Kim Y, Zhang Y, Li P (2013) An energy efficient approximate adder with carry skip for error resilient neuromorphic VLSI systems. In: 2013 IEEE/ACM International conference on computer-aided design (ICCAD), San Jose, CA, USA. IEEE, pp 130–137 28. Ye R, Wang T, Yuan F, Kumar R, Xu Q (2013) On reconfiguration-oriented approximate adder design and its application. In: 2013 IEEE/ACM International conference on computer-aided design (ICCAD), San Jose, CA, USA. IEEE, pp 48–54 29. 
Lin C, Yang YM, Lin CC (2014) High-performance low-power carry speculative addition with variable latency. IEEE Trans Very Large Scale Integr VLSI Syst IEEE 23(9):1591–1603 30. Li L, Zhou H (2014) On error modeling and analysis of approximate adders. In: 2014 IEEE/ACM International conference on computer-aided design (ICCAD), San Jose, CA, USA. IEEE, pp 511–518 31. Hu J, Qian W (2015) A new approximate adder with low relative error and correct sign calculation. In: 2015 Design, automation & test in Europe conference & exhibition (DATE),Grenoble, France. IEEE, pp 1449–1454
32. Gupta V, Mohapatra D, Raghunathan A, Roy K (2012) Low-power digital signal processing using approximate adders. IEEE Trans Comput Aided Des Integr Circuits Syst, IEEE 32(1):124–137 33. Yang Z, Jain A, Liang J, Han J, Lombardi F (2013) Approximate XOR/XNOR-based adders for inexact computing. In: 2013 13th IEEE International conference on nanotechnology (IEEENANO 2013), Beijing, China. IEEE, pp 690–693 34. Almurib HAF, Kumar TN, Lombardi F (2016) Inexact designs for approximate low power addition by cell replacement. In: 2016 Design, automation & test in Europe conference & exhibition (DATE), Dresden, Germany. IEEE, pp 660–665 35. Pashaeifar M, Kamal M, Afzali-Kusha A, Pedram M (2018) Approximate reverse carry propagate adder for energy-efficient DSP applications. IEEE Trans Very Large Scale Integr VLSI Syst, IEEE 26(11):2530–2541 36. Angizi S, Jiang H, DeMara RF, Han J, Fan D (2018) Majority-based spin-CMOS primitives for approximate computing. IEEE Trans Nanotechnol IEEE 17(4):795–806 37. Mrazek V, Hrbacek R, Vasicek Z, Sekanina L (2017) EvoApprox8b: library of approximate adders and multipliers for circuit design and benchmarking of approximation methods. In: Design, automation & test in Europe conference & exhibition (DATE), Lausanne, Switzerland. IEEE, pp 258–261 38. Liu C, Han J, Lombardi F (2014) An analytical framework for evaluating the error characteristics of approximate adders. IEEE Trans Comput IEEE 64(5):1268–1281 39. Jiang H, Angizi S, Fan D, Han J, Liu L (2021) Non-volatile approximate arithmetic circuits using scalable hybrid spin-CMOS majority gates. IEEE Trans Circuits Syst I Regul Pap IEEE 68(3):1217–1230 40. Qureshi A, Hasan O (2018) Formal probabilistic analysis of low latency approximate adders. In: IEEE Transactions on computer-aided design of integrated circuits and systems, vol 38, no 1. IEEE, pp 177–189 41. Hanif MA, Hafiz R, Hasan O, Shafique M (2020) PEMACx: a probabilistic error analysis methodology for adders with cascaded approximate units. In: 2020 57th ACM/IEEE Design automation conference (DAC), San Francisco, CA, USA. IEEE, pp 1–6 42. Jiang H, Liu C, Liu L, Lombardi F, Han J (2017) A review, classification, and comparative evaluation of approximate arithmetic circuits. ACM J Emerg Technol Comput Syst (JETC) ACM 13(4):1–34 43. Jiang H, Liu L, Lombardi F, Han J (2019) Approximate arithmetic circuits: design and evaluation. In: Approximate circuits. Springer, Cham, pp 67–98 44. Reda,S, Shafique M (2019).Error analysis and optimization in approximate arithmetic circuits. In: Approximate circuits. Springer, Cham
Chapter 43
Effect of Traffic Stream Speed on Stream Equivalency Values in Mixed Traffic Conditions on Urban Roads K. C. Varmora, P. J. Gundaliya, and T. L. Popat
1 Introduction
In many cities of India, the urban roads are witnessing rapid and uncontrolled growth in the number of vehicles, as vehicle ownership among the urban population has increased with rising incomes during the last few decades. This situation has resulted in heavily mixed traffic conditions. The size and composition of vehicles are significant factors affecting the equivalency factor, which in turn influences the stream speed of traffic as well as the flow rate of vehicles; this generally applies to urban roads, where the flow rate of vehicles may vary frequently. In mixed traffic, all vehicles use the same road at the same time, which makes it necessary to convert the different vehicles into standard vehicles. Altogether, this creates major problems for transportation professionals. It is not easy to fully understand the fluctuation in the PCU value for any vehicle type at a given time; under such a situation it becomes necessary to use the term "Dynamic Passenger Car Unit (DPCU)" [1]. The stream equivalency (SE) factor (or value) is a new concept used to capture the variation (or fluctuation) in the flow rate of vehicles moving at the corresponding speeds in mixed traffic conditions. Many times the collected data are not consistent, and some missing values need to be predicted; in such cases it is necessary to acquire consistent data by predicting the missing or new SE values with an accurate method. Very few researchers have
P. J. Gundaliya—Passed away
T. L. Popat—Retired from the Gujarat Technological University
K. C. Varmora (B) · T. L. Popat Gujarat Technological University, Ahmedabad, India e-mail: [email protected]
P. J. Gundaliya Civil Engineering Department, L.D. College of Engineering, Ahmedabad, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 R. P. Yadav et al. (eds.), Proceedings of the International Conference on Paradigms of Computing, Communication and Data Sciences, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-19-8742-7_43
demonstrated a clear correlation between the DPCU per hour and the hourly vehicle flow when deriving the stream equivalency value. This research attempts to predict missing SE values using a neural network in "R" at different stream speeds, examining their effect on SE values in mixed traffic conditions on selected urban road segments in the city of Ahmedabad.
2 Literature Review
"The Highway Capacity Manual (HCM-1965) [1] discussed the term Passenger Car Unit (PCU) or Passenger Car Equivalent (PCE), defined as the number of passenger cars displaced in the traffic flow by other transport modes under the prevailing roadway and traffic conditions." Considerable research effort has gone into estimating PCUs for the different vehicles running on various categories of roads in cities around the world. Mohammad [2] mentioned that vehicle speed, along with the projected plan area of vehicles, gives an appropriate estimate of passenger car equivalent values. Patel and Joshi [3] carried out a study to obtain equivalency factors for Indian mixed traffic conditions. Chandra and Sikdar [4] contributed the very useful concept of the DPCU value and formulated an equation, based on the plan area and speed of different vehicles, to obtain the passenger car unit. In another study, "Chandra and Kumar [5] derived values of PCU for various types of vehicles by employing speed as an important parameter". "Dhamaniya and Chandra [6] proposed a useful methodology to present a new concept of stream equivalency factor for heterogeneous traffic by collecting field data on six urban arterial roads in New Delhi, India"; they used simulation to observe the effect of traffic composition and volume on the stream equivalency factor. Dhamaniya and Patel [7] carried out a study with a new methodology to estimate the saturation flow by developing stream equivalency based on the PCUs derived during the saturated green time. Indo-HCM [8] gives a very useful methodology to find stream equivalency factors for Indian conditions. Gultekin et al. [9] developed a neural-network-based traffic-flow prediction model for Istanbul, Turkey, and Gallo et al. [10] attempted to forecast passenger flow using ANNs on railway sections of the Naples Metro, Italy. The methodology employed in this paper correlates traffic stream speed with its effects on stream equivalency values under mixed traffic conditions.
Table 1 Length and carriageway width of road segments

| Sr. No | Name of urban road | Length (m) | Carriageway width (m) | No. of lanes |
|---|---|---|---|---|
| 1 | Kalupur road (Kalupur Circle to Kalupur Railway Station segment) | 380.0 | 11.50 | 3 |
| 2 | Ring road (IIMA to GMDC segment) | 900.0 | 19.00 (9.50 + 9.50) | 4 |
3 Methodology
3.1 Selection of Study Area
Segments of two urban roads, namely Kalupur road and Ring road (known as the 132-feet Ring road) of Ahmedabad city, were selected for the study. No gradient was observed on either road, and the pavement condition was excellent. Both road segments were selected in such a way that interference from signalized or non-signalized intersections is eliminated. The segment lengths and carriageway widths of both roads are presented in Table 1.
3.2 Collection of Data
Two important parameters, the traffic stream speed and the SE values, were required for this study; computing them requires the travel time (s) and the flow of vehicles. Many methods are available to obtain the travel time of observed vehicles, such as the license plate method, the photographic method, the elevated observation method, and the moving observer method. As these methods are less accurate, many researchers have used GPS and number plate detection for this purpose, and in this study the number plate detection method was employed to obtain accurate speed values. Each road section was covered by placing two HD video cameras, one at the entry and one at the exit of the section in the respective direction of traffic flow. The same timestamp was set on each camera, displaying the time to an accuracy of one second, and the cameras were placed so that both the number plates and the count of vehicles could be captured. The data were collected at 5-min intervals for 3-4 h during the morning peak hours and for the same period during the evening peak hours, including off-peak hours.
3.3 Analysis of Data
Traffic Stream Speed The travel time (s) and distance (m) were converted into hours and kilometres, respectively, to obtain the speed of each vehicle in km/h from the following equation:

V_i = d / t

where V_i is the travel speed of the ith vehicle, d is the total distance travelled by the vehicle in km, and t is the time required to travel distance d. After obtaining the speeds of the individual vehicles, the traffic stream speed (km/h) was computed for every 5-min interval.
SE Value In mixed traffic conditions there is great variation in the size and speed of the different vehicle types; the vehicle sizes were taken from INDO-HCM [8]. The following equation, derived by Dhamaniya and Chandra [6], was used to obtain the stream equivalency factor:

S_i = (flow in PCU/h) / (flow in vehicles/h)

where the flow in PCU/h is considered as the flow in DPCU/h. Figures 1 and 2 show the scatter plots of the obtained SE values against traffic stream speed (km/h) for Kalupur road and Ring road, respectively.
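A small R illustration of these two computations is given below; the timestamps and PCU factors are hypothetical stand-ins for the number-plate-matched field records of one 5-min interval.

```r
# Hypothetical matched records for one 5-min interval on the Kalupur segment
veh <- data.frame(t_entry = c(0, 12, 30, 45),        # entry time (s)
                  t_exit  = c(40, 55, 68, 90),       # exit time (s)
                  pcu     = c(1.0, 0.5, 0.5, 3.0))   # assumed DPCU factors
d_km <- 0.380                                        # segment length (km)

speed_kmh    <- d_km / ((veh$t_exit - veh$t_entry) / 3600)  # V_i = d / t
stream_speed <- mean(speed_kmh)                             # stream speed (km/h)

# S_i: both flows cover the same interval, so the ratio reduces to the mean PCU
SE <- sum(veh$pcu) / nrow(veh)
```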
3.4 Development of Neural Network Model
The collected data reflected the actual field conditions and was found to be inconsistent; hence, to obtain consistent data, it was necessary to predict the missing values. During the last few decades, neural networks have become very useful for predicting missing values at a greater accuracy level than a simple linear model, and they can be applied in almost any domain. Generally, a neural network consists of three components, as shown in Fig. 3: the inputs form the input layer, the middle layer that performs the processing is called the hidden layer, and the outputs form the output layer. The basic operation performed by a neuron can be stated as:

output = sum(inputs × weights) + bias
Fig. 1 Plot between traffic stream speed and SE value for Kalupur road (Kalupur circle to Kalupur railway station segment)
Fig. 2 Plot between traffic stream speed and SE value (Ring road-IIMA to GMDC segment)
Many application tools and software packages are available that support neural networks, such as MATLAB, "R", Neural Designer, Darknet, Keras, etc. In this study, "R" is used to develop the neural network model. As the aim is to predict the missing SE values, the original traffic stream speed observations are taken as the input, while the SE values are taken as the output. The data set was split for training and testing to obtain the output, and the missing data were then predicted to eliminate the inconsistencies. The output from the model was compared with the original SE values, and the statistical parameters R² and root mean square error (RMSE) were applied to ascertain the accuracy of the neural network model.
Fig. 3 Schematic Neural network diagram
Figures 4 and 5 show the plots between the observed and predicted SE values. The R² values obtained for Kalupur road and Ring road were 0.76 and 0.80, respectively, and the corresponding RMSE values were 0.077 and 0.036. Based on these statistical values, the model was accepted for predicting the missing values.
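A minimal sketch of such a model in R is shown below, using the neuralnet package as one possible implementation (the chapter does not name the package it used); the observations are synthetic stand-ins for the field data.

```r
library(neuralnet)  # one of several R packages that can fit such a model

set.seed(7)
# Synthetic stand-in for the field data: stream speed (km/h) vs SE value
obs <- data.frame(speed = runif(60, 10, 50))
obs$SE <- 1.4 - 0.02 * obs$speed + rnorm(60, sd = 0.05)

# Standardize the single input, then fit one small hidden layer
ctr <- mean(obs$speed); scl <- sd(obs$speed)
obs$speed_s <- (obs$speed - ctr) / scl
fit <- neuralnet(SE ~ speed_s, data = obs, hidden = 4, linear.output = TRUE)

# Predict SE over a missing-speed range (e.g. 11-15 km/h at 0.5 km/h steps)
gap  <- data.frame(speed_s = (seq(11, 15, by = 0.5) - ctr) / scl)
pred <- compute(fit, gap)$net.result

# Accuracy checks in the spirit of the chapter: R^2 and RMSE on the fitted data
fitted <- as.numeric(compute(fit, obs["speed_s"])$net.result)
c(R2 = cor(obs$SE, fitted)^2, RMSE = sqrt(mean((obs$SE - fitted)^2)))
```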
Fig. 4 Observed and Predicted SE value for Kalupur road (Kalupur circle to Kalupur railway station segment)
Fig. 5 Observed and Predicted SE value for Ring road (IIMA to GMDC segment)
3.5 Prediction of Missing Values
To predict the missing SE values, traffic stream speed values were given as input. Referring to Fig. 1, missing SE values are observed in the 11-15 km/h and 34-41 km/h speed ranges, whereas Fig. 2 shows missing values in the 13-16 km/h and 38-47 km/h ranges. Hence, for these ranges, speed values were entered as input at intervals of 0.5 km/h to obtain the missing SE values.
4 Results
As discussed in Sects. 3.4 and 3.5, the statistical parameters R² and RMSE are applied to ascertain the accuracy of the predicted SE values. The following Eqs. (1) and (2) were obtained from the neural network model in "R" for Kalupur road and Ring road, respectively:

SE = 3.72947 × 10⁻⁶X⁴ − 0.00051X³ + 0.02498X² − 0.50598X + 4.31257    (1)

SE = 1.30869 × 10⁻⁶X⁴ − 0.00019X³ + 0.01038X² − 0.23249X + 2.34947    (2)

where X is the traffic stream speed (km/h).
Both equations give a more accurate SE output than a simple linear equation for a given stream speed in this study. After the prediction of the missing SE values, the results are plotted to show the relationship between traffic stream speed and the original and predicted SE values, as shown in Figs. 6 and 7. For Kalupur road, the observed SE value is 1.17 for the minimum speed of 10.4 km/h and 0.48 for the maximum speed of 49.10 km/h, while for Ring road (Fig. 7) the observed SE value is 0.71 for the minimum speed of 13.0 km/h and 0.32
for the maximum speed of 54.0 km/h. The results show that the SE values vary rapidly with increases or decreases in traffic stream speed.
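Filling a gap with the fitted model then reduces to evaluating the corresponding polynomial; for example, Eq. (1) can be applied to the missing 11-15 km/h range for Kalupur road as follows.

```r
# Evaluate Eq. (1) (Kalupur road) over a missing-speed range at 0.5 km/h steps
se_kalupur <- function(x)
  3.72947e-6 * x^4 - 0.00051 * x^3 + 0.02498 * x^2 - 0.50598 * x + 4.31257

speeds <- seq(11, 15, by = 0.5)
data.frame(speed_kmh = speeds, SE = se_kalupur(speeds))
```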
Fig. 6 Missing SE values for Kalupur road (Kalupur circle to Kalupur Railway station segment)
Fig. 7 Missing SE values for Ring Road (IIMA to GMDC segment)
5 Conclusion
As discussed earlier, SE is the ratio of the DPCU per hour to the total number of vehicles per hour, and it represents the relative composition of the different vehicles at the corresponding speed; the stream speed has a great impact on the SE values. The data being collected on two urban road segments having no curvature and gradient was a limitation of this study. A neural network model was developed in "R" to predict the SE values, and the statistical parameters R² and RMSE were applied to test the accuracy of the model. The SE values decrease sharply with an increase in traffic stream speed and increase as the stream speed decreases. The methodology employed in this study can be helpful for obtaining new or missing SE values, and their variation at a given traffic stream speed, for similar types of urban road segments.
References
1. Highway Capacity Manual (1965) Special Report 87: Highway Research Board, Washington, DC
2. Muhammad A (2014) Passenger car equivalent factors in heterogeneous traffic environment-are we using the right numbers? Procedia Eng 77(2014):103–113
3. Patel C, Joshi G (2015) Equivalency factor using optimization for Indian mixed traffic condition. Int J Traffic Transp Eng 5(2):210–224
4. Chandra S, Sikdar P (2000) Factors affecting PCU in mixed traffic situations on urban roads. Road Transp Res 9(3):40–50
5. Chandra S, Kumar U (2003) Effect of lane width on capacity under mixed traffic conditions in India. J Transp Eng 129(2):155–160
6. Dhamaniya A, Chandra S (2013) Concept of stream equivalency factor for heterogeneous traffic on urban arterial roads. J Transp Eng 139(11):1117–1123
7. Dhamaniya A, Patel P (2018) Stream equivalency factor for mixed traffic at urban signalized intersections. Transp Res Procedia 37(2019):362–368
8. Indian Highway Capacity Manual (2012–2017), New Delhi
9. Gültekin B, Murat S, Oğuz B (2019) A neural network-based traffic-flow prediction model. Math Comput Appl 15(2):269–278
10. Gallo M, De Luca G, D'Acierno L, Botte M (2019) Artificial neural networks for forecasting passenger flows on metro lines. Sensors 19(15):3424. https://doi.org/10.3390/s19153424
Chapter 44
Intelligent System for Cattle Monitoring: A Smart Housing for Dairy Animal Using IoT Sanjay Mate, Vikas Somani, and Prashant Dahiwale
1 Introduction
A very important task is to identify and evaluate village lifestyle, health, and economic infrastructure challenges. Table 1 shows that in 2020, 43.85% of the world's population and 65.07% of India's population resided in villages, which is lower than in all previous years. The key reason behind the decline in the overall percentage of the rural population is urbanization and declining income sources. Papua New Guinea, located in Melanesia in the southwestern Pacific Ocean, has the world's highest share of rural population, with almost 87% of its people living in rural areas, while several territories, such as Singapore, Kuwait, and Hong Kong SAR, China, have no rural population at all. Table 2 shows that the rural share of the population decreases as income rises; in short, the rural population earns less day after day.
S. Mate (B) Sangam University, Bhilwara, India e-mail: [email protected] Government Polytechnic Daman, Daman, India V. Somani Computer Science Engineering Department, Sangam University, Bhilwara, India e-mail: [email protected] P. Dahiwale Computer Engineering Department, Government Polytechnic Daman, Daman, India © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 R. P. Yadav et al. (eds.), Proceedings of the International Conference on Paradigms of Computing, Communication and Data Sciences, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-19-8742-7_44
Table 1 Year-wise rural population percentage of the world out of total population [1]

| Year | Rural population of the world (% of total population) | Rural population of India (% of total population of India) |
|---|---|---|
| 1960 | 66.382 | 82.08 |
| 1970 | 63.446 | 80.24 |
| 1980 | 60.65 | 76.90 |
| 1990 | 56.971 | 74.45 |
| 2000 | 53.311 | 72.33 |
| 2010 | 48.354 | 69.07 |
| 2020 | 43.85 | 65.07 |
Table 2 Economy-based distribution of the rural population of the world in 2020 [1]

| Economy (income) | World rural population w.r.t. income (%) |
|---|---|
| Low | 67 |
| Low middle | 58 |
| Low and middle | 49 |
| Middle | 47 |
| Upper middle | 32 |
| High | 18 |
For the indispensable reform of the village into a smart village, IoT helps drive large growth in agriculture and associated sectors. Figure 1 shows the smart village design cycle phases, consisting of identification and mapping, decision-making, prototype development, and evaluation and scale-up. In each stage, IoT components and data processing systems are used as per the requirement.
Fig. 1 Phases in smart village design cycle
2 A Smart Village Design Cycle

Figure 1 shows the phases in the smart village design cycle. It shows four significant blocks: identification and mapping, decision-making, prototype development, and evaluation and scale-up.

a. Identification and Mapping: This phase focuses on identifying the issues or problems to be addressed. It tries to map problems to innovative IoT-based solutions and evaluate the challenges. Smart village solutions must cover cost-effective, energy-efficient, low-power, or resource-based options.
b. Decision Making: Precise decision-making makes a system robust. It should consider all the different stakeholders, digital infrastructure, data repositories, algorithms, etc.
c. Prototype Development: This phase focuses on designing a small-scale, precise working model that covers all scenarios and fulfills the desired goals, to be implemented later on a large scale.
d. Evaluate and Scale-Up: Once a successful prototype is developed, tested, and implemented, and feedback-based action is taken, the model is scaled up to benefit all stakeholders.

Some important factors to consider in developing a village into a smart village are the agriculture and allied sectors and the socio-economic framework. Agricultural waste management can be enhanced with IoT by monitoring collection, transportation, and disposal treatment [2]. The real problems concern education and hands-on technology associated with agriculture, food processing, fishing, poultry, cattle dairying, honey harvesting, etc.; finance and business models are important issues in smart village socio-economic reform [3]. As shown in Table 3, IoT-based communication technology has a significant role in turning a village into a smart village: RFID, ZigBee, Bluetooth, WiFi, LPWAN, Cellular, and similar technologies connect the different smart village verticals.

Table 3 Mapping of smart village verticals and IoT-based communication

Smart village vertical       RFID   ZigBee   Bluetooth   WiFi   LPWAN   Cellular
Climate monitoring           –      –        –           –      Yes     –
Dairy                        Yes    Yes      –           –      Yes     –
Smart lighting               –      –        –           –      Yes     –
Smart village home           –      Yes      Yes         Yes    –       Yes
Water and waste management   Yes    –        Yes         Yes    Yes     –

Developing better e-services through open and wide paradigms, multiple applications, agents, technologies, and social innovations requires hard work and effort, and it is time to revamp traditional, unsustainable public infrastructure [4]. Good water infrastructure requires clean water with suitable pH, oxygen, and turbidity values [5]. The Korean livestock industry uses IoT-based cattle sheds [6]. Fixed and variable cycle algorithms were
used in cattle shed management systems, and a BLE beacon-based smart cattle shed design improves power consumption in cattle sheds [7].
3 Literature Review

Cattle's bodily and emotional fitness, diet, shed atmosphere, and milk parlor cleanliness are monitored in an efficient manner to increase milk production. Both the overall temperature of the cattle shed and the cattle's own temperature are essential to monitor, as even eye temperature is affected by pain and heat stress in cattle [8, 9]. As the ambient temperature exceeds its benchmark threshold value, heat stress is experienced by cattle; skin, eye, and rectal temperatures are also affected [10]. One experiment considered dairy farm owners at the frontiers of smart farming [11]. In the era of high-capability, high-capacity milk parlors, milking processes include pasteurization, cleaning and sanitation, and sanitary equipment design [12]. Various sensor-based models were developed and studied for identifying clinical mastitis [13] and other health issues in cattle. Mixed linear models or principal component analyses were used to fit lactation curves in dairy experiments [14]. Milking frequency needs specialized monitoring according to lactation and milk production, and a number of studies compared udder health under conventional milking and automatic milking (AM) [15]. A great deal of research has been done on udder health management and herd somatic cell count (SCC), typically in European countries; a number of standard practices are linked with lowering the SCC, such as blanket dry-cow therapy, sand bedding, parenteral selenium supplementation, recurrent use of the California Mastitis Test, a close watch on the dry-cow udder for mastitis, free-stall systems, udder hair management, and cleaning the calving pen after each calving [16]. The automatic milking system (AMS) has been examined for its pros and cons regarding mastitis issues and milk quality [17], and AMS analyses also cover milk, animal health issues, and welfare [18]. Lameness influences the overall behavior of cattle: changes occur in the cubicle area when cattle are resting and in their interaction with the milking parlor and the AMS [19]. In European countries, research on the relationship between technical, biological, and economic issues on AM farms has been conducted on a large scale [20]. Regarding milking interval (MI) and milking frequency (MF) and their effect on milk yield in pasture-based AMS cows, some studies identified that, compared to indoor-fed AMS cows, a lower MF is observed in pasture-based AMS cows [21]. Milk fatty acid composition and herbage intake were impacted by botanical composition and grassland management: less saturated fatty acid (SFA) and more polyunsaturated fatty acids (PUFA) are identified in cows grazing bio-diverse pasture, while housed cows have more SFA and less PUFA [22]. Milk producers face losses because high-productivity breeds are more susceptible to diseases like udder inflammation compared to
local breeds; abnormalities in the construction of the udder were found to be associated with higher inflammation incidence, and a few studies suggested that deficiencies of vitamin E, vitamin A, and selenium result in an increased number of mastitis incidents [23]. Studies and experiments were made to connect technologies, processes, and practices [24]. Electrical conductivity (EC) was used to detect clinical mastitis, but with varying results [25]. AMS is useful for increasing milk yield and reducing human effort, and researchers and enthusiasts are trying to make the processes in milk parlors hassle-free and error-free [26] and to equip AMS with advanced, improved tools that will help detect diseases in their early stages [27]. Even limited technology assessment and planned methods helped detect diseases in the early stages of the AMS era [28]. Reducing the dependence on manual labor helps decrease mistakes in milk parlors; on the other hand, AMS helps improve the overall milking system and raises alerts about cow health [29]. Automation and similar innovations in milking are becoming widely adopted and popular, as they have various benefits compared to conventional approaches. Another essential aspect is their usefulness in the early detection of health issues like udder problems, mastitis, and lameness, suggesting remedies or first aid from a data repository. A biosensor-based model is available for the early detection of health problems, which alerts users to take precautionary steps and generates reports on health problems. In total, there are multiple benefits of automated milking systems and sensor-based health monitoring of cattle. Some unique thermal and non-thermal mechanisms, along with technological advances, are available for pasteurizing milk: ultrasonic, ultraviolet, and irradiation techniques were used in the non-thermal approach, while ohmic heating, microwave, and radio frequency were used in the thermal approach [30]. Non-thermal plasma used for milk sterilization reduces the number of microbial cells in the milk [31]. Innovative systems help increase milk production with the help of robotic milking systems and analyze, process, and preserve the milk [32].
4 Proposed Model

Many different sensors are used at various stages in smart cattle shed management. As shown in Fig. 2, the proposed model has a few major interlinked blocks. Sensors used for cattle diet, health, shed atmosphere, AMS, milk processing, and waste management send real-time data to the system. Cattle diet sensors report the moisture and weight of the feed and the pH and intake of drinking water. Cattle health sensors report body temperature, saliva alkalinity, heart pulse, electrical conductivity, body movement, etc. Cattle shed sensors provide information about room temperature, daylight, air ventilation, humidity, toxic gases, etc. The AMS provides information about devices such as the pulsator, teat cup shells and liners, milk receptacle, vacuum pump and gauge, vacuum tank, and regulator. A temperature sensor and a timer play an important role in milk processing.
Fig. 2 A model for an advanced system for cattle monitoring
Cattle waste management sensors report sewage tank capacity and status, biogas plant operating status, and bio-fertilizer status. Data received by the system is processed and analyzed, and the system makes precise decisions with the help of an algorithm. Further, per the algorithm's instructions, suggested actions take place with the help of actuators. For example, if there is cloudy weather during the daytime, electric lights are switched on in the shed; on rising humidity, sprinklers and foggers start and vents open for fresh air. On a drastic change in cattle health parameters (saliva alkalinity not between 8.55 and 8.90, body temperature not between 37.8 and 39.2 °C, or pulse rate not between 40 and 80 per minute), the system informs stakeholders via alert notifications such as a text message, email, or system-generated call. Precise decisions are provided by the algorithm on receiving sensor inputs, and a set of actuators takes action. A cattle shed mainly consists of equipment such as feeder pots used to provide wet and dry feed and drinking water, crates or crushes used to inspect cattle physically, a weighing machine, different varieties of grass cutters, and IoT-based sensor devices and actuators. This paper discusses smart cattle sheds, focusing mainly on monitoring and controlling things precisely. A smart cattle shed maintains the overall atmosphere by controlling air ventilation, light, nutritionally rich wet and dry feed, pH-balanced drinking water, toxic gases, animal waste management, biofuel, biodegradable waste, bio-fertilizers, etc. Smart cattle shed health monitoring includes observing the cattle's body temperature, saliva, breathing, heartbeat, walking, lying, grazing, weight, insemination and reproductive cycle, milk quality, etc. Innovations in smart dairy farms are mainly categorized into two parts: product and process innovation. Process innovation consists of cattle diet, health, cattle shed
atmosphere, and automatic milking systems (AMS). Product innovation consists of milk processing and cattle waste management, etc.
4.1 Cattle Shed Atmosphere Monitoring and Controlling

An Arduino Uno helps to maintain the cattle shed atmosphere. Daylight, air ventilation, temperature, humidity, and gases such as O2, CO, water vapor, methane, etc., can be monitored and controlled within the cattle shed using a set of different sensors, for example light, flow, room temperature, absolute or relative humidity, and Micro Electro Mechanical System (MEMS) gas sensors. Room temperature, humidity, and gases need to be monitored and controlled for the better physical health of cattle, while good ventilation and daylight are keys to better emotional health and stress-free cattle. Sensors provide run-time values to the system; if the values reach or cross a threshold, actuators start working. For example, as a particular gas concentration increases, the windows or shutters of the shed open automatically for a short time; the electric lighting starts in dim light, in cloudy weather, or during nighttime; and in the presence of toxic fumes or gases, an alarm is generated via short messages, emails, a ring-bell, etc. Monitoring cattle outside the shed is another important factor in a smart cattle shed and is done via a wearable GPRS collar and a ZigBee network.
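To make this threshold-driven control concrete, here is a minimal Python sketch of one control round. The chapter targets an Arduino Uno; this host-side sketch, with its sensor names, threshold values, and actuator stubs, is an illustrative assumption rather than the chapter's implementation.

```python
class Actuators:
    """Stub actuator interface; a real deployment would drive relays/GPIO."""
    def open_vents(self): print("vents opened")
    def start_foggers(self): print("foggers started")
    def lights_on(self, dim=False): print("lights on", "(dim)" if dim else "")
    def send_alert(self, msg): print("ALERT:", msg)

METHANE_PPM_MAX = 1500   # illustrative threshold (assumption)
HUMIDITY_MAX = 80        # % relative humidity (assumption)
LUX_MIN = 200            # illuminance below which lights turn on (assumption)

def control_shed(readings, actuators):
    # One control round: compare sensor readings to thresholds, fire actuators
    if readings["methane_ppm"] > METHANE_PPM_MAX:
        actuators.open_vents()
        actuators.send_alert("toxic gas concentration high")
    if readings["humidity_pct"] > HUMIDITY_MAX:
        actuators.start_foggers()
    if readings["lux"] < LUX_MIN:   # cloudy weather or nighttime
        actuators.lights_on(dim=True)

control_shed({"methane_ppm": 1800, "humidity_pct": 85, "lux": 120}, Actuators())
```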
4.2 Cattle Health Monitoring

Cattle health is monitored by placing various sensors on different parts of the animal: a temperature sensor and a microphone at the neck, a load sensor under the feet, a heartbeat sensor at the vein on the neck, a gas sensor near the nose, an electrical conductivity sensor at the udder, and accelerometer sensors at the neck, feet, udder, and near the tail. These sensors provide real-time data, which is processed to generate calls or alerts to stakeholders such as veterinary doctors and the cattle shed management team. Figure 3 shows a schematic of smart cattle farms indicating the parameters and biomarkers measured by IoT; it details the cattle feed, the expected behavior under certain atmospheric conditions in the cattle shed, and the automated control system. For example, an adult cow's body temperature ranges between 37.8 and 39.2 °C, so it is closely monitored; if it departs from the standard values, the temperature sensor generates alerts to stakeholders. Similarly, adult cattle have 48–84 heartbeats per minute; if the sensor detects abnormal pulse values, it generates an alert to the respective stakeholders.
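A minimal sketch of the alerting rule just described, using the vital-sign ranges stated in this section; the notify callback stands in for the SMS/email/call mechanism and is our assumption.

```python
TEMP_RANGE_C = (37.8, 39.2)   # adult cattle body temperature (from the text)
PULSE_RANGE_BPM = (48, 84)    # adult cattle heartbeats per minute (from the text)

def check_vitals(cow_id, temp_c, pulse_bpm, notify):
    # Generate one alert per out-of-range vital sign
    if not TEMP_RANGE_C[0] <= temp_c <= TEMP_RANGE_C[1]:
        notify(f"cow {cow_id}: abnormal temperature {temp_c} °C")
    if not PULSE_RANGE_BPM[0] <= pulse_bpm <= PULSE_RANGE_BPM[1]:
        notify(f"cow {cow_id}: abnormal pulse {pulse_bpm} bpm")

check_vitals(17, 40.1, 90, notify=print)  # prints two alerts
```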
Fig. 3 Schematic of smart cattle farms indicating parameters and biomarkers measured by IoT
4.3 Cattle Food and Drinking Water Monitoring and Management

Nutritionally rich soil gives healthier plants [33], and nutritionally rich food keeps animals physically and emotionally healthy. Cattle grazing can be grouped into two types: grass-fed or house-fed in a shed, and grazing on open grassland. Cattle in a shed are mostly grass-fed or house-fed, whereas cattle grazed on open grassland are comparatively healthier, both emotionally and physically. A set of sensors is placed at the feeding pot in the smart cattle farm. Sensors report the moisture and weight of the feed provided to the cattle; dry feed, wet feed, oil cakes, other nutritional supplements, etc., are monitored for moisture and weight. A sensor also helps monitor the cattle's drinking water for parameters like pH, temperature, and intake. A separate water pot is highly recommended for controlling saliva-spread diseases like ulceration and drooling. The system can maintain a log of dry and wet feed, oil cake, supplements, and water intake. Such logs help detect feed-intake-related illnesses early, and the cattle diet can be advised and monitored per season.
4.4 Cattle Waste Management

Biodegradable waste: Composting is a safe way to biodegrade organic waste, and compost helps remediate soil polluted by heavy metals. Several insects play a vital role in compost by degrading biomolecules; popular ones are the Black Soldier Fly (BSF) [34], Milichiidae [35], housefly larvae [36], Japanese beetles [37], and crickets [38].
Fig. 4 By-products from cattle waste, i.e., feedstock sources [39]
Berkeley rapid composting, Indian Indore composting, Indian Bangalore composting, sheet composting, static composting, vermicomposting, windrow composting, and in-vessel composting are popular methods used for waste management in India. The use of compost increases soil fertility, soil amendment, crop yield, and erosion control. Figure 4 shows biofuel in the form of biogas prepared from cow dung with the help of a biogas plant. Biogas plants are being advanced with IoT technology, which eases monitoring, control, and overall management.

Biofuels: External parameters considered while planning the production of biofuel, fertilizer, electricity, etc., from bio-waste are weather conditions and the feedstock source. At the biogas plant, improved planning and monitoring of anaerobic digestion are required, as shown in Fig. 4, which can be achieved with IoT-based sensors. Sensors help monitor and control the pressure of methane gas and estimate gas production from the available inputs.

Bio-fertilizers: The digestion residues are used to produce bio-fertilizers. A few popular bio-fertilizers are Rhizobium, Azospirillum, phosphate-solubilizing microorganisms, and silicate-solubilizing bacteria. There are five significant steps in making bio-fertilizers: choosing active organisms, isolation and selection of target microbes, selection of method and carrier material, selection of propagation method, and prototype testing.
4.5 Milk Process Equipment

Traditionally, heat treatments are preferred in milk processing, homogenization, and fermentation to produce yogurts. The significant types of milk processing are thermal and non-thermal. Thermal processing has several types: spray drying, baking milk, ultra-high temperature (UHT) processing, sterilization, and pasteurization. The thermal methods of pasteurization, sterilization, and UHT use temperatures of 65, 120, and 135 °C, respectively [40], and the processing time for heating the milk varies from a few seconds to twenty minutes. Non-thermal processing also has several types: homogenization, high-pressure homogenization (HPH), ultrasonic treatment, enzymatic processes, and fermentation. A few novel techniques for milk processing are irradiation, microwave treatment, and cold plasma treatment. Heat processing is mainly used for milk pasteurization, sterilization, and UHT processing. The degree of structural change of the proteins during heat treatment depends on the thermal conditions and treatment time as well as on the type of protein and other food components such as lipids and carbohydrates, known as the "matrix effect" [41]. IoT-based temperature sensors are helpful for real-time temperature reading and for managing the time intervals of the various heating processes. For a successful pasteurization treatment, a few things should be followed effectively: (a) each particle of milk must be heated in accordance with the time and temperature criteria, (b) the equipment, utensils, or apparatus must be appropriately designed, (c) the equipment, utensils, or apparatus must be accurately operated, (d) a temperature standard must be met, and (e) a holding time to meet the standard must be achieved in complement to the temperature.
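To illustrate the time and temperature criterion in (a), (d), and (e), the sketch below checks a logged heating profile against a pasteurization target; the 65 °C set point comes from the text, while the 30-minute holding time and the sampling format are assumptions for illustration.

```python
def meets_hold(profile, target_c, hold_s):
    """profile: list of (timestamp_s, temp_c) samples from an IoT sensor.
    Returns True if the temperature stayed >= target_c for a contiguous
    stretch of at least hold_s seconds."""
    run_start = None
    for t, temp in profile:
        if temp >= target_c:
            if run_start is None:
                run_start = t
            if t - run_start >= hold_s:
                return True
        else:
            run_start = None   # hold interrupted, restart the clock
    return False

# Pasteurization at 65 °C (temperature from the text); the 30-minute
# hold time below is an assumed example value
log = [(i, 66.0) for i in range(0, 2000, 10)]
print(meets_hold(log, target_c=65, hold_s=1800))  # True
```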
5 Conclusion

A nutritional supply of feed and fresh, pH-balanced water intake lead cattle to a healthy lifestyle, and a well-managed cattle shed environment keeps them better physically and emotionally. Ultimately this benefits milk production. Further, the model can help farm owners adopt recent IoT technologies to improve cattle waste management by-products, and the processes at milk parlors benefit from early detection of health diseases like mastitis. The proposed system is an overall architecture for improving cattle farm sheds, cattle health, milk processing, waste management, etc. IoT-based cattle farming is more efficient, although in the early setup stage smart cattle farms need a huge investment. This model serves investors as a guide, and improved technological infrastructure helps to balance investment and earnings.
References

1. World Bank staff estimates based on the United Nations Population Division's World Urbanization Prospects (2018). Retrieved from https://data.worldbank.org/indictor/SP.RUR.TOTL.ZS?most_recent_value_desc=true
2. Gnanaraj AA, Jayanthi JG (2017) An application framework for IoTs enabled smart agriculture waste recycle management system. In: World congress on computing and communication technologies (WCCCT). IEEE, Tiruchirappalli, India, pp 1–5. https://doi.org/10.1109/WCCCT.2016.11
3. Anand PB, Navio-Marco J (2018) Governance and economics of smart cities: opportunities and challenges. Telecommun Policy Elsevier 42(10):795–799
4. Saunders T, Baeck P (2015) Rethinking smart cities from the ground up. Nesta SAGE, London
5. Ramesh MV, Nibi KV, Kurup A, Mohan R, Aiswarya A, Arsha A, Sarang PR (2017) Water quality monitoring and waste management using IoT. In: 2017 IEEE Global humanitarian technology conference (GHTC). IEEE, San Jose, CA, pp 1–7
6. Choe JG, Jang YH, Kwon YJ (2016) Implementation of user's moving direction detecting system using BLE beacon and its application. In: Proceeding of Korean society for internet information, KSII, pp 71–72
7. Yang S-S, Jang Y-H, Ju Y-W, Park S-C (2017) Design of smart cattle shed system based on BLE beacon to improve power consumption. In: Advances in computer science and ubiquitous computing. Springer, Singapore, pp 77–82. https://doi.org/10.1007/978-981-10-7605-3_13
8. Godyń D, Herbut P, Angrecka S (2018) Measurements of peripheral and deep body temperature in cattle – a review. J Thermal Biol Elsevier 79:42–49. https://doi.org/10.1016/j.jtherbio.2018.11.011
9. McManus C, Tanure CB, Peripolli V, Seixas L, Fischer V, Gabbi AM, Menegassi SRO, Stumpf MT, Kolling GJ, Dias E et al (2016) Infrared thermography in animal production: an overview. Comput Electron Agric Elsevier 123:10–16. https://doi.org/10.1016/j.compag.2016.01.027
10. Yan G, Li H, Shi Z (2021) Evaluation of thermal indices as the indicators of heat stress in dairy cows in a temperate climate. Anim Basel MDPI 11(8):2459. https://doi.org/10.3390/ani11082459
11. Akbar MO, Khan MSS, Ali MJ, Hussain A, Qaiser G, Pasha M, Pasha U, Missen MS, Akhtar N (2020) IoT for development of smart farming. J Food Qual Hindawi 2020:1–8. https://doi.org/10.1155/2020/4242805
12. Rankin SA, Bradley RL, Miller G, Mildenhall KB (2017) A 100-year review: a century of dairy processing advancements—pasteurization, cleaning and sanitation, and sanitary equipment design. J Dairy Sci American Dairy Science Association 100(12):9903–9915. https://doi.org/10.3168/jds.2017-13187
13. Hogeveen H, Kamphuis C, Steeneveld W, Mollenhorst H (2010) Sensors and clinical mastitis - the quest for the perfect alert. Sens MDPI 10(9):7991–8009. https://doi.org/10.3390/s100907991
14. Macciotta NPP, Dimauro C, Rassu SPG, Steri R, Pulina G (2011) The mathematical description of lactation curves in dairy cattle. Ital J Anim Sci Taylor and Francis Ltd 10(4):213–223. https://doi.org/10.4081/ijas.2011.e51
15. Hovinen M, Pyörälä S (2011) Invited review: udder health of dairy cows in automatic milking. J Dairy Sci American Dairy Science Association 94(2):547–562. https://doi.org/10.3168/jds.2010-3556
16. Dufour S, Fréchette A, Barkema HW, Mussell A, Scholl DT (2011) Invited review: effect of udder health management practices on herd somatic cell count. J Dairy Sci American Dairy Science Association 94(2):563–579. https://doi.org/10.3168/jds.2010-3715
17. Edmondson P (2012) Mastitis control in robotic milking systems. In Pract Wiley 34(5):260–269. https://doi.org/10.1136/inp.e2660
18. Jacobs JA, Siegford JM (2012) Invited review: the impact of automatic milking systems on dairy cow management, behavior, health, and welfare. J Dairy Sci American Dairy Science Association 95(5):2227–2247. https://doi.org/10.3168/jds.2011-4943
19. Varlyakov I, Penev T, Mitev J, Miteva T, Uzunova K, Gergovska Z (2012) Effect of lameness on the behavior of dairy cows under intensive production systems. Bul J Agric Sci Agriculture Academy 18(1):125–132
20. Gaworski M, Leola A, Priekulis J (2013) Comparative analysis on effectiveness of AMS use on an example of three European countries. Agron Res Estonian Agriculture University 11(1):231–238
21. Lyons NA, Kerrisk KL, Garcia SC (2014) Milking frequency management in pasture-based automatic milking systems: a review. Livestock Sci Elsevier 159:102–116. https://doi.org/10.1016/j.livsci.2013.11.011
22. Elgersma A (2015) Grazing increases the unsaturated fatty acid concentration of milk from grass-fed cows: a review of the contributing factors, challenges and future perspectives. Eur J Lipid Sci Technol Wiley 117(9):1345–1369. https://doi.org/10.1002/ejlt.201400469
23. Litwińczuk Z, Król J, Brodziak A (2015) Factors determining the susceptibility of cows to mastitis and losses incurred by producers due to the disease—a review. Ann Anim Sci Walter de Gruyter GmbH 15(4):1–24. https://doi.org/10.1515/aoas-2015-0035
24. Wigboldus S, Klerkx L, Leeuwis C, Schut M, Muilerman S, Jochemsen H (2016) Systemic perspectives on scaling agricultural innovations. A review. Agron Sustain Dev Springer 36:46. https://doi.org/10.1007/s13593-016-0380-z
25. Khatun M, Clark CEF, Lyons NA, Thomson PC, Kerrisk KL, Garciá SC (2017) Early detection of clinical mastitis from electrical conductivity data in an automatic milking system. Anim Prod Sci CSIRO Publishing 57(7):1226–1232. https://doi.org/10.1071/AN16707
26. Jiang H, Wang W, Li C, Wang W (2017) Innovation, practical benefits and prospects for the future development of automatic milking systems. Front Agric Sci Eng Higher Education Press 4(1):37–47. https://doi.org/10.15302/J-FASE-2016117
27. Hejel P, Jurkovich V, Kovács P, Bakony M, Könyves L (2018) Automatic milking systems—factors involved in growing popularity and conditions of effective operation: literature review. Magyar Allatorvosok Lapja Magyar Mezogazdasag Ltd 140(5):289–301
28. Penry JF (2018) Mastitis control in automatic milking systems. Vet Clin North Am Food Anim Pract Elsevier 34(3):439–456. https://doi.org/10.1016/j.cvfa.2018.06.004
29. Pitkäranta J, Kurkela V, Huotari V, Posio M, Halbach CE (2019) Designing automated milking dairy facilities to maximize labor efficiency. Vet Clin North Am Food Anim Pract W.B. Saunders Ltd 35(1):175–193. https://doi.org/10.1016/j.cvfa.2018.10.010
30. Barbosa-Cánovas G, Bermudez-Aguirre D (2010) Other novel milk preservation technologies: ultrasound, irradiation, microwave, radio frequency, ohmic heating, ultraviolet light and bacteriocins. In: Improving the safety and quality of milk. Woodhead Publishing Series in Food Science, Technology and Nutrition, vol 1, pp 420–450. https://doi.org/10.1533/9781845699420.4.420
31. Widyaningrum D, Sebastian C, Pirdo KT (2020) Application of non-thermal plasma for milk sterilization: a review. In: IOP Conference series: earth and environmental science, vol 794, 4th International conference on eco engineering development. IOP Publishing Ltd, Banten, Indonesia
32. Nleya SM, Ndlovu S (2021) Smart dairy farming overview: innovation, algorithms and challenges. In: Smart agriculture automation using advanced technologies. Transactions on Computer Systems and Networks, Springer, Singapore. https://doi.org/10.1007/978-981-16-6124-2_3
33. Mate S (2021) Internet of Things (IoT) based irrigation and soil nutrient management system. J Embed Syst 9(2):22–28. i-Manager's Publications. https://doi.org/10.26634/jes.9.2.18072
34. Purkayastha D, Sarkar S, Roy P, Kazmi A (2017) Isolation and morphological study of ecologically-important insect 'Hermetia illucens' collected from Roorkee compost plant. Pollut University of Tehran 3(3):453–459
35. Morales GE, Wolff M (2010) Insects associated with the composting process of solid urban waste separated at the source. Rev Bras Entomol SciELO 54(4):645–653
36. Wang H, Wang S, Li H, Wang B, Zhou Q, Zhang X, Li J, Zhang Z (2016) Decomposition and humification of dissolved organic matter in swine manure during housefly larvae composting. Waste Manage Res J Int Solid Wastes Public Cleansing Assoc Sage Pub 34(5):465–473. https://doi.org/10.1177/0734242X16636675
37. Piñero JC, Shivers T, Byers PL, Johnson H-Y (2020) Insect-based compost and vermicomposting production, quality and performance. Renew Agric Food Syst Cambridge University Press 35(1):1–7. https://doi.org/10.1017/s1742170518000339
38. Ozdemir S, Dede G, Dede O, Turp S (2019) Composting of sewage sludge with mole cricket: stability, maturity and sanitation aspects. Int J Environ Sci Technol Springer Berlin Heidelberg 16(10):5827–5834. https://doi.org/10.1007/s13762-018-02192-4
39. Cinar S, Cinar SO, Wieczorek N, Ihsanullah S, Kuchta K (2021) Integration of artificial intelligence into biogas plant operation. Processes MDPI 9(1):85–103. https://doi.org/10.3390/pr9010085
40. Geiselhart S, Podzhilkova A, Hoffmann-Sommergruber K (2021) Cow's milk processing—friend or foe in food allergy. Foods MDPI 10(3):572–589. https://doi.org/10.3390/foods10030572
41. Nowak-Wegrzyn A, Fiocchi A (2009) Rare, medium, or well done? The effect of heating and food matrix on food protein allergenicity. Curr Opin Allergy Clin Immunol Wolters Kluwer 9(3):234–237. https://doi.org/10.1097/ACI.0b013e32832b88e7
Chapter 45
Energy-Efficient Approximate Arithmetic Circuit Design for Error Resilient Applications

V. Joshi and P. Mane
1 Introduction

Accurate systems always cost a lot; meanwhile, many application domains, such as machine learning, big data applications, computer vision, and signal processing, have an intrinsic tolerance to inaccuracy. Approximate computing is a research agenda that tries to better match the exact specifications of system abstractions with the needs and use of approximate programs (Reda and Shafique [1]). A literature survey reveals that many research works have demonstrated the capability to significantly improve final results under a small amount of computation error, which supports the concept of approximate computing. The overall impact of AC is a trade-off between quality and performance (Jiang et al. [2]). Figure 1 describes the basic concept of approximation. An approximation is usually introduced either in construction, at the logic level, or in the hardware description of arbitrary circuits; in other words, approximation is achieved at four different levels (Shafique et al. [3]). The first is the algorithm level, where the actual algorithm is kept intact and either the inputs or the hyperparameters are altered; this is preferred in machine learning and called meta-learning. Another approach is the application level, where algorithms are not kept intact and are modified to achieve a level of approximation; loop perforation, in which the loop iterations are managed by the user, is one example. The third approach works at the architectural level, where approximation is achieved by modifying the instruction set with a bounded error resilience for each instruction. The last one works at the circuit level and is related to hardware circuits. Based on this, a lot of work is available on adders and multipliers, categorized into deterministic and non-deterministic types, as discussed in the literature survey.

V. Joshi (B) · P. Mane: BITS Pilani, K.K. Birla Goa Campus, Goa, India. e-mail: [email protected]
Fig. 1 The basic concept of approximate computing
The remainder of this paper is organized as follows. Section 2 presents the literature review for various approximate adders. Section 3 defines the scope for approximate adder design following the literature survey. Section 4 presents the proposed energy-efficient adder design with a schematic and an example. Section 5 covers evaluation metrics, including circuit and error metrics. Section 6 discusses applications, and Sect. 7 presents a case study on image processing applications. Finally, Sect. 8 concludes the article.
2 Literature Review

Adders, subtractors, and multipliers are the basic building blocks of any arithmetic circuit, and an approximation can be added at one of these stages or at all of them. A literature review of approximate adders and related work follows.
2.1 Review of Approximate Adders

Seok et al. [4] proposed a scheme for circuit metric improvement by adding some hardware for error reduction; implemented in 65 nm technology and compared with a traditional adder, it shows 69 and 70% reductions in EDP and NMED respectively without much degradation in output quality. Raha et al. [5] designed a Reconfigurable Adder/subtractor Block (RAB) for video encoding which controls the degree of approximation (DA) across different videos, maintaining PSNR degradation within 1–10% while achieving up to 38% energy saving compared to exact encoders. Prabakaran et al. [6] proposed different FPGA-based multi-bit architectures for performing approximate addition, achieving gains of 50% in area, 38% in latency, and 53% in power-delay product as
compared to a 16-bit exact adder; the Register Transfer Level and behavioral codes of these approximate modules are open-source and can be used to further fuel research in designing approximate adders. Dutt et al. [7] proposed energy-efficient, high-performance Approximate Full Adders (AFAs) using the idea of breaking the carry chain subject to a low error rate, and constructed an N-bit approximate adder showing 46.31 and 28.57% improvements in power and area respectively with respect to a ripple carry adder; to improve Error Distance (ED) and Error Rate (ER), the concepts of carry lifetime and error detection and correction, which provide bit-width-aware constant delay, are used. Masadeh et al. [8] proposed that the type and position of the approximate full adders used in an approximate multiplier are crucial to designing the target module, and carried out Pareto-space exploration to identify the most optimal approximate designs. Gupta et al. [9] proposed various inexact full adders with a reduced number of transistors in the basic full adder cell and utilized them to construct multi-bit approximate adders for signal processing applications, showing up to 69% power saving without much degradation in output quality.
3 Scope for Approximate Adder Design

After going through the literature survey on approximate adders, the following gaps are noticed where there is scope for research.

1. Error dynamics (mean error, maximum error, error rate, etc.) trade against speed, area, and power consumption. With an upper bound on the error parameters, approximate circuits can be designed with little degradation in area and power consumption and considerable improvement in speed.
2. Error parameters, speed, area, and power consumption are strong functions of operand size in arithmetic circuits. Their dependence on operand size can be considerably reduced using approximate circuits.
3. Signed arithmetic circuits have their own challenges in terms of interpreting results, and limited work has been reported in the literature on signed approximate arithmetic circuits.
4. Error detection and correction units can be designed for adaptive bounds on error parameters so that circuits can be used in a multitude of applications. There will be degradation in speed, area, and power, but there is no need to use different circuits for different applications.
5. The multiplier is a basic building block for many electronic systems as well as the most complex logic circuit, because of repeated addition with carry. There is scope to design an approximate carry-independent adder as part of an approximate multiplier.
4 Proposed Design of Energy-Efficient Adder

The proposed idea is to add two numbers without carry propagation: throughout the addition, from LSB to MSB, no carry is considered or propagated. Figure 2 shows the schematic of this technique for the addition of two numbers of any bit length. The important features of this technique are:

• No carry is considered while adding numbers; in short, it is a carry-independent technique.
• As no carry ripples from LSB to MSB, the carry chain propagation problem does not exist.
• The absence of the carry chain propagation problem speeds up the arithmetic operations while relaxing the limit on the power requirement.
• This in turn makes the method best suited for battery-operated appliances.
• The accuracy attainable through circuit parameters makes this method flexible for exact as well as approximate applications.

The proposed method is slightly different because of the absence of carry propagation, but it can also give exact results. Though the number of steps required to generate the correct result is larger than in the conventional method, it is energy efficient as well as faster, because carry chain propagation (in the conventional method) consumes more energy and lowers the speed. Supporting results are discussed in Sect. 5 under evaluation metrics.
Fig. 2 Schematic of the proposed method for addition
Fig. 3 Addition Example a Conventional Method (Carry Propagation) b Proposed Method (No Carry Propagation)
Let us see the same technique with the help of a suitable example, explained in Fig. 3.
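The repeated EX-OR/AND process referred to in Sect. 5 can be made concrete with a minimal Python sketch: XOR produces the carry-free partial sum, AND (shifted left) produces the pending carries, and the iteration either runs until the carries vanish (exact mode) or stops at a step budget (approximate mode). The function name and interface here are our own.

```python
def carry_free_add(a, b, max_steps):
    """Iterative XOR/AND addition: exact if run to completion,
    approximate if the step budget is exhausted first."""
    for _ in range(max_steps):
        if b == 0:             # no pending carries: the result is exact
            break
        a, b = a ^ b, (a & b) << 1
    return a                   # pending carries left in b are dropped

print(carry_free_add(11, 7, max_steps=32))  # 18 (exact)
print(carry_free_add(11, 7, max_steps=2))   # approximate result
```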
5 Evaluation Metrics

Approximate circuits play a vital role in energy-efficient applications, so each approximate circuit is analyzed on the basis of circuit metrics and error metrics, and the combined results decide the application level. The basic circuit metrics include critical path delay (latency), power, and area [LPA] for regular circuits, with one additional constraint called 'error' added for approximate circuits (Reda and Shafique [1], Mazahir et al. [10], Akbari et al. [11]). It has already been shown that a larger approximation is tolerated in multipliers than in adders, because for the same operand length, complex computations are more prone to errors in addition than in multiplication (Jiang et al. [2], Mazahir et al. [10]).
5.1 Error Metrics

Various approaches are used to perform error analysis of approximate circuits using MATLAB or Python. A few papers suggest data-based error analysis with random inputs, so as to cover the maximum number of inputs for a particular bit length (Wu et al. [12]). This is the most widely used method for error analysis: as the number of iterations/trials is increased, the analysis results approach accuracy. Another approach is to use probabilistic error modeling for predicting the possibility of error in the output (Momeni et al. [13]); this method works on bit-position error probabilities so that the prediction is expected to be closer to reality. The important error parameters considered for comparative analysis are Error Rate (ER), Mean Error Distance (MED), Average Error, Average Hamming Distance, Mean Relative Error Distance, Normalized Mean Error Distance, Acceptance Probability (AP), etc. Error rate and acceptance probability (expressed as percentages) play a vital role in deciding the level of approximation for a particular application. Here the first approach, using random inputs, is applied for error analysis. Table 1 describes the behavior of a 256-bit adder for various numbers of steps, generating different error rates and acceptance probabilities. The number of steps decides the mode of operation of the circuit: it switches progressively from approximate to exact mode as the number of steps varies from 5 to 256. The trials are the iterations over which the circuit's error performance is evaluated. Based on the observations in Table 1, the responses are plotted for AP (Fig. 4) and ER (Fig. 5) against an increasing number of bits. The number of steps is the number of repetitions of the logical EX-ORing and ANDing of the operands needed to reach the correct result. Figure 4 describes the acceptance probability variation w.r.t. bit length for a fixed number of trials, say 1000.

Table 1 Error metric analysis for 256-bit length

Size   Trials   Number of steps   Error rate (%)   Acceptance probability (%)
256    1000     5                 98.7             84.1
256    1000     9                 22               99.1
256    1000     13                1.1              100
256    1000     17                0.1              100
256    1000     21                0                100
256    1000     41                0                100
256    1000     61                0                100
256    1000     121               0                100
256    1000     221               0                100
256    1000     256               0                100

Italic: approximate mode; Bold: accurate mode.
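The random-input analysis behind Table 1 can be sketched as a short Monte Carlo loop, reusing carry_free_add from Sect. 4; the acceptance rule used here (relative error below 1%) is an assumption for illustration, since the exact acceptance criterion is not stated in the text.

```python
import random

def error_metrics(bits, steps, trials=1000):
    errors = accepted = 0
    for _ in range(trials):
        a, b = random.getrandbits(bits), random.getrandbits(bits)
        exact = a + b
        approx = carry_free_add(a, b, steps)
        if approx != exact:
            errors += 1
        # assumed acceptance rule: relative error below 1%
        if abs(approx - exact) <= 0.01 * exact:
            accepted += 1
    return 100 * errors / trials, 100 * accepted / trials

er, ap = error_metrics(bits=256, steps=9)
print(f"ER = {er:.1f}%, AP = {ap:.1f}%")
```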
Fig. 4 Acceptance probability versus bit length
Fig. 5 Error rate versus bit length
As the number of steps varied through 5, 9, 13, 17, and 21, the acceptance probability increased by 15.9%; this is because the increase in accuracy drives the results towards exactness. In the case of the response in Fig. 5, when the number of steps is varied from 9 through 13 to 17, the error rate is reduced by 98.7%, again because of the increase in accuracy. The improvement in error rate (accuracy) comes at the cost of increased delay and power demand.
5.2 Circuit Metrics

Any design is synthesized for basic circuit metrics like area, power dissipation, and critical path delay (latency). Along with these, compound metrics like power-delay product (PDP), area-delay product (ADP), and energy-delay product (EDP) can also be studied. Electronic Design Automation (EDA) tools are used for circuit design and performance evaluation. Circuit performance is evaluated on the basis of different process technologies (foundries) and component libraries (or software compilers), such as the Cadence RTL Compiler for the 45 nm Nangate Opencell Library (Reda and Shafique [1], Liu et al. [14]), the 45-nm predictive technology model (PTM), 28-nm CMOS, and 15-nm FinFET models. The Monte Carlo algorithm is used as an assessment method for PVT analysis of circuits. For a perfect and accurate comparison, the same circuit configuration should be tried for the different designs. Table 2 shows the circuit metric analysis using the Cadence Genus tool for synthesis. The resource parameters chosen are area, power, and delay (latency), and the circuit is analyzed under three different operating modes, viz. full approximate mode (0.5 × bit-length steps), half approximate mode (0.75 × bit-length steps), and accurate mode (1 × bit-length steps). Figure 6 shows the area occupied w.r.t. bit-length variation: when a circuit operates in an approximate mode, the area occupied is less because less hardware is involved in generating results; the circuit occupies 29.90% more area when operated in full exact mode than in full approximate mode for 256-bit length. In Fig. 7, power demand is plotted versus bit length: when the number of steps is reduced (approximate mode), the power (energy) demand is reduced, and the circuit demands 1.38% more power when switching from approximate mode to exact mode for 256-bit length; in other words, switching from exact to approximate mode makes the circuit more energy efficient. Figure 8 shows the delay offered w.r.t. bit length: the delay increases by 9.08% when the circuit switches from approximate to accurate mode for 256-bit length. After the complete analysis, it is concluded that this kind of approximate adder is more energy-efficient and faster in its approximate mode of operation, with a bounded compromise in accuracy.
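The compound metrics named above follow directly from the basic ones; the short sketch below derives PDP, ADP, and EDP for the 256-bit accurate-mode row of Table 2 (the unit conventions are our assumption).

```python
def compound_metrics(area, power_mw, delay_ns):
    pdp = power_mw * delay_ns   # power-delay product (mW * ns = pJ)
    adp = area * delay_ns       # area-delay product (unit^2 * ns)
    edp = pdp * delay_ns        # energy-delay product (pJ * ns)
    return pdp, adp, edp

# 256-bit accurate mode, values taken from Table 2
print(compound_metrics(area=144509.706, power_mw=212.937, delay_ns=39.702))
```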
6 Applications

Although AC is very promising, it will not always be the first choice, because of a few limitations such as the proper selection of the operating mode, the approximation technique, and the code or data portion selected for approximation.
Table 2 Circuit metric analysis

Bit length   Resource parameter   Steps = 0.5 × bit length   Steps = 0.75 × bit length   Steps = bit length
8            Area (unit²)         215.1                      233.2                       263.3
8            Power (mW)           2.466                      6.368                       6.356
8            Delay (nS)           5.429                      5.710                       5.922
16           Area (unit²)         650.1                      732.906                     776.340
16           Power (mW)           4.2                        12.079                      12.081
16           Delay (nS)           5.96                       6.5                         7.004
32           Area (unit²)         2170.30                    2558.844                    2639.556
32           Power (mW)           8.808                      26.143                      26.159
32           Delay (nS)           7.046                      8.147                       9.172
64           Area (unit²)         7867.3                     9373.536                    9522.99
64           Power (mW)           17.139                     51.246                      51.299
64           Delay (nS)           9.202                      11.372                      13.502
128          Area (unit²)         29,264.598                 35,249.94                   36,140.166
128          Power (mW)           104.035                    104.566                     104.773
128          Delay (nS)           13.602                     17.941                      22.231
256          Area (unit²)         111,244.392                136,385.838                 144,509.706
256          Power (mW)           210.029                    212.010                     212.937
256          Delay (nS)           22.497                     31.201                      39.702
All these parameters are application-oriented, and the same choice can cause quality loss from one application to the next. Usually, error rates below 10% and PSNR values above 30 dB are acceptable in error-resilient and image processing applications respectively (Rahimi et al. [15]). In image processing applications, approximate multipliers with a lower error rate and higher acceptance probability outperform the same circuit with a higher error rate for tasks such as image sharpening, image smoothing, brightness and contrast control, and image compression (Mazahir et al. [10]). But the same rule of thumb does not apply to approximate adders: for complex operations, these adders do not give the expected results. The operand length and the approximation technique are the most important factors on which evaluation reports depend. After proper evaluation of an arithmetic circuit, its application domain is decided. It is always advisable to use a combination of an approximate adder and a multiplier to construct a higher-quality processed image instead of using an approximate adder or multiplier alone (Lin and Lin [16]).
Fig. 6 Area versus bit length
Approximate adders are mainly designed to reduce critical path delay while keeping error rates low and improving circuit performance. However, since carries are ignored or reduced throughout the addition, approximate adders are prone to generating single-sided errors that become prominent in repeated or iterative addition. When approximate full adders are used at the LSBs, they produce higher error but low power dissipation. So while selecting an approximate adder for any application, the best trade-off should be achieved between error metrics and circuit metrics. Unsigned multipliers show improvement in circuit area by truncating part of the partial products (PPs) or part of the LSBs of the input operands, at the cost of degraded accuracy; the trade-off is between area and accuracy (Lin and Lin [16], Kulkarni et al. [17], Jiang et al. [18]). A few show a trade-off between energy and accuracy (Jiang et al. [18]). For signed multipliers, the Booth multiplier shows promising results due to good error compensation.
7 Case Study In order to illustrate the applicability of the proposed technique to real-time use, it is applied to benchmark images. It is tested by implementing image processing applications like image multiplication Bhardwaj et al. [19]. The proposed addition technique
Fig. 7 Power (energy) versus bit length
Fig. 8 Delay versus bit length
is used in approximate multipliers for partial product generation and summation. Such an approximate multiplier is used for showcasing the corresponding effect on the chosen standard image. It is shown in Fig. 9.
Fig. 9 Original image (a) and output images using accurate and approximate adders in an approximate multiplier (b–f)
As mentioned earlier, an application-level approximation technique is used here, where the approximation is introduced by managing the loop iterations. The proposed adder can be applied in any machine learning algorithm where addition and subtraction are performed heavily. The output generated by an accurate adder is considered the golden reference for the application.

• Image 'a' is the original image considered for this application.
• Image 'b' is the output of an accurate multiplier operating in conventional mode.
• Image 'c' is the output of the approximate multiplier operating in accurate mode (due to the exact adder, with the number of steps equal to 1 × bit-length).
• Image 'd' is the output of the approximate multiplier with a built-in approximate adder with the number of steps equal to 0.75 × bit-length.
• Image 'e' is the output of the approximate multiplier with a built-in approximate adder with the number of steps equal to 0.5 × bit-length.
• Image 'f' is the output of the same circuit when the approximate adder is operated with fewer steps than 0.5 × bit-length, with a visible deterioration in image quality.

A sketch of such a multiplier, built from the proposed adder, follows this list. The case study continues with the effect of variation in approximation levels on the brightness and contrast parameters of the resultant images. Along with this practical application approach, the circuit metric analysis has already shown that the proposed adder significantly reduces hardware resource consumption.
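Since the chapter states that the proposed addition technique is used inside the approximate multiplier for partial product generation and summation, the sketch below shows one plausible shift-and-add arrangement reusing carry_free_add; the policy of giving every accumulation the same fractional step budget is our assumption, not a detail stated in the text.

```python
def approximate_multiply(a, b, bits, step_fraction=0.5):
    """Shift-and-add multiplication; every partial-product summation
    uses the step-limited carry-free adder (approximate mode)."""
    steps = max(1, int(step_fraction * bits))
    product = 0
    for i in range(bits):
        if (b >> i) & 1:                         # partial product a << i
            product = carry_free_add(product, a << i, steps)
    return product

print(approximate_multiply(13, 11, bits=8, step_fraction=1.0))  # 143, exact
print(approximate_multiply(13, 11, bits=8, step_fraction=0.5))  # may deviate
```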
8 Conclusion

This article gives an overall review of approximate circuits and also proposes an energy-efficient approximate adder. It covers the basics, related work, approaches to approximation and their challenges, evaluation, applications, a case study, and future scope. Approximate arithmetic adders are designed to improve efficiency as well as critical path delay, but errors sometimes become prominent in complex applications due to the approximation techniques, even when the error rates are low; therefore, adders show less scope for approximation than multipliers (Mazahir et al. [10]). On the other hand, approximate multipliers with lower error rates show promising results in complex applications. While selecting any multiplier, the trade-off between accuracy and circuit metrics, viz. area and energy, is taken into account. The most important trade-off to be considered in the case of adders is between circuit metrics and error metrics, which has been cross-verified for the proposed method through synthesis results using Genus and Python; it shows improvements of 29.90, 1.38, and 9.08% respectively in the circuit metrics of area, power, and latency. As a case study, the proposed technique is applied to an image and its efficacy demonstrated. Depending on the QoR (Quality of Results), the level of approximation and the application are decided; the method is most suitable for big data applications, machine learning, and battery-operated applications. A literature survey of image processing applications shows that almost all applications are implemented for static systems and not for dynamic ones, so there seems to be a big scope for research work there. Overall, approximate computing is a real source of hope for green energy.
References

1. Reda S, Shafique M (2019) Approximate circuits: methodologies and CAD. Springer Nature Switzerland, Springer
2. Jiang H, Santiago FJH, Mo H, Liu L, Han J (2020) Approximate arithmetic circuits: a survey, characterization, and recent applications. Proc IEEE 108(12):2108–2135
3. Shafique M, Hafiz R, Rehman S, El-Harouni W, Henkel J (2016) Invited - cross-layer approximate computing: from logic to architectures. In: 53rd ACM/EDAC/IEEE Design Automation Conference (DAC), Austin, TX, USA, pp 1–6
4. Seok H, Seo H, Lee J, Kim Y (2021) COREA: delay- and energy-efficient approximate adder using effective carry speculation. Electron 10(18):2234. https://doi.org/10.3390/electronics10182234
5. Raha A, Jayakumar H, Raghunathan V (2016) Input-based dynamic reconfiguration of approximate arithmetic units for video encoding. IEEE Trans Very Large Scale Integr VLSI Syst 24(3):846–857
6. Prabakaran BS, Rehman S, Hanif MA, Ullah S, Mazaheri G, Kumar A, Shafique M (2018) DeMAS: an efficient design methodology for building approximate adders for FPGA-based systems. In: IEEE Design, automation & test in Europe conference & exhibition (DATE), Dresden, Germany, pp 917–920
7. Dutt S, Nandi S, Trivedi G (2018) Accuracy enhancement of equal segment based approximate adders. IET Comput Digital Tech 12(5):206–215
8. Masadeh M, Hasan O, Tahar S (2018) Comparative study of approximate multipliers. In: CoRR, vol abs/1803.06587, pp 415–418. Retrieved from http://arxiv.org/abs/1803.06587
9. Gupta V, Mohapatra D, Raghunathan A, Roy K (2013) Low-power digital signal processing using approximate adders. IEEE Trans Comput Aided Des Integr Circuits Syst 32(1):124–137
10. Mazahir S, Hasan O, Hafiz R, Shafique M, Henkel J (2017) Probabilistic error modeling for approximate adders. IEEE Trans Comput 66(3):515–530
11. Akbari O, Kamal M, Afzali-Kusha A, Pedram M (2018) RAP-CLA: a reconfigurable approximate carry look-ahead adder. IEEE Trans Circuits Syst II Express Briefs 65(8):1089–1093
12. Wu Y, Li Y, Ge X, Gao Y, Qian W (2019) An efficient method for calculating the error statistics of block-based approximate adders. IEEE Trans Comput 68(1):21–38
13. Momeni A, Han J, Montuschi P, Lombardi F (2014) Design and analysis of approximate compressors for multiplication. IEEE Trans Comput 64(4):984–994
14. Liu W, Cao T, Yin P, Zhu Y, Wang C, Swartzlander EE, Lombardi F (2019) Design and analysis of approximate redundant binary multipliers. IEEE Trans Comput 68(6):804–819
15. Rahimi A, Ghofrani A, Cheng K-T, Benini L, Gupta RK (2015) Approximate associative memristive memory for energy-efficient GPUs. In: IEEE Design, automation & test in Europe conference and exhibition (DATE), pp 1497–1502
16. Lin C-H, Lin I-C (2013) High accuracy approximate multiplier with error correction. In: IEEE 31st International conference on computer design (ICCD), Asheville, NC, USA, pp 33–38
17. Kulkarni P, Gupta P, Ercegovac M (2011) Trading accuracy for power with an underdesigned multiplier architecture. In: IEEE 24th International conference on VLSI design, IIT Madras, Chennai, India, pp 346–351
18. Jiang H, Liu C, Liu L, Lombardi F, Han J (2017) A review, classification and comparative evaluation of approximate arithmetic circuits. ACM J Emerg Technol Comput Syst 13(4):1–34, Article 60
19. Bhardwaj K, Mane PS, Henkel J (2014) Power- and area-efficient approximate Wallace tree multiplier for error-resilient systems. In: IEEE Fifteenth international symposium on quality electronic design, Santa Clara Convention Center, Santa Clara, CA, pp 263–269
Chapter 46
Continuous Real Time Sensing and Estimation of In-Situ Soil Macronutrients

G. N. Shwetha and Bhat GeetaLaxmi Jairam
1 Introduction

Agriculture plays a very important role in increasing food production. Farmers sow their crops and wait for a healthy harvest. However, due to variables like the environment, excessive fertilizer use, ignorance of the amount of fertilizer to be applied on the farm, or government fertilizer subsidies, farmers often apply fertilizer without being aware of the proportions of the soil parameters. West et al. [1] have shown that fertilizer must be applied according to the deficiency found in the nutrients; but due to fertilizer subsidies and the lack of knowledge about fertilizer application, farmers apply fertilizers to their crops regardless. Ball et al. [2] have reported that excess concentrations of macronutrients flow into water bodies; drinking such water causes cancer in humans, and the runoff leads to eutrophication, which reduces the population of marine species. An imbalance in soil parameters also directly affects the growth of the crop, leading to poor yield and loss to the farmer. Farooq et al. [3] have explained that the Internet of Things (IoT) is a promising technology that provides dependable and efficient solutions for the modernization of many areas, including agriculture. Bacco et al. [4] have shown that the application of AI and IoT in farm management is a key component of the technology known as "smart agriculture". Amato et al. [5] have highlighted that the core principle of precision agriculture is to enhance spatial management techniques to both increase crop yield and minimize the inappropriate use of fertilizers and pesticides. Pivoto et al. [6] have shown that smart farming may employ a variety of sensors to gather information about the environment (such as temperature, humidity, light, pressure, and presence) and communication networks to send and receive the information. Leonard [7] says that the utilization of the data generated by smart farming increases production and reduces waste by enabling the execution of critical activities in the proper amount, timing, and location.

G. N. Shwetha (B) · B. G. Jairam: The National Institute of Engineering, Mysore, Karnataka, India. e-mail: [email protected]; [email protected]
about the environment (such as temperature, humidity, light, pressure, and presence) and communication networks to send and receive the information. Leonard [7] says that utilization of the data generated by smart farming increases production and reduces waste by enabling the execution of critical activities in the proper amount, at the proper time, and in the proper location.

Table 1 Standard laboratory values of NPK for soil [9]

Soil fertility level | N value (kg/ha) | P value (kg/ha) | K value (kg/ha)
Very low | < 140 | – | –
Very high | > 700 | > 35 | > 360
2 Literature Survey

The soil components considered in this review are Nitrogen (N), Phosphorus (P), Potassium (K), Temperature (T), Electrical Conductivity (EC), and pH (potential of Hydrogen).
2.1 NPK

Zewdie and Reta [8] have explained the importance of macronutrients for good yield and crop quality. The macronutrients are N, P, and K. Plants need a large amount of nitrogen, as it is the main component of nucleic acids, chlorophyll, protein, and protoplasm; the right amount of nitrogen gives a dark green color to leaves. Phosphorus helps in the process of photosynthesis. The presence of potassium gives good yield as well as increases the quality of the crop. Table 1 gives the standard NPK values in soil.
2.2 pH

Khadka et al. [10] have described the relationship between the pH value and the concentration of NPK in soil. A pH range of 6.5–7.5 in soil is considered neutral. Soil is acidic if the pH value is below 6.5, and alkaline if the pH value is above 7.5. Plants will not show proper growth in either of these conditions. Maintaining an appropriate pH is essential for the availability of macronutrients in soil.
Fig. 1 Relationship between N and pH [10]
Fig. 2 Relationship between P and pH [10]
If the pH deviates from the neutral range (e.g., the soil becomes acidic), the availability of N, P, and K decreases. Therefore, maintaining the correct pH is very important for healthy plant growth. Figures 1, 2 and 3 show the relationship between pH and N, P and K, respectively.
Fig. 3 Relationship between K and pH [10]

2.3 Temperature

Geng et al. [11] have shown that an increase in temperature leads to lower availability of NPK. Ma et al. [12] have expressed that maintaining a temperature of 25 °C maintains the EC value as well. Menzies and Gillman [13] have reported that the pH of the soil increases at high temperatures, i.e., in the range of 25–39 °C. Broadbent [14] says that when soil temperature increases, soil moisture content decreases.
2.4 EC

Patel [15] has shown the importance of EC in soil. Soil naturally contains a small amount of salt, and the right EC level aids good production; therefore, maintaining the proper EC in the soil is crucial. Sometimes the salinity comes from the fertilizer applied to the soil, from irrigation water, or from dissolved soil minerals. When the salt content in soil increases, it reduces the yield of the crop. Visconti and de Paz [16] have expressed that EC is dependent on temperature: when the temperature increases, the EC of the soil also increases. Thus, a standard temperature of 25 °C should be maintained for good crop quality. Table 2 shows the standard EC ranges for soil.
Table 2 Standard values of electrical conductivity in soil [17]

Parameter | Unit | Range | Interpretation
EC | dS/m | 0–2 | Salt free
 | | 4–8 | Slightly saline
 | | 8–15 | Moderately saline
 | | > 15 | Highly saline
3 Soil Testing Methods

Suchithra and Pai [18] have explained how soil testing determines the amounts of soil components in soil. Soil testing distinguishes nutrient-deficient areas from non-deficient ones, so that fertilizer use can be targeted to achieve the economically optimum yield per hectare. Figure 4 shows the relationship between the application of fertilizer, soil testing, and the yield of the crop.
3.1 Colorimetry Testing Method

Dimkpa et al. [19] have explained that in the colorimetric testing method, soil samples are collected and, using chemical analysis, different nutrients are represented by different colors. The sample must be prepared before proceeding: the mixture containing the soil sample and the reagent for a specified nutrient must be shaken for a particular amount of time. This is a time-consuming process.
Fig. 4 Relationship between Soil testing, fertilizer application and crop yield [19]
3.2 Spectroscopy Method

Monavar [20] has estimated different soil components by using UV–VIS and VIS–NIR spectrometers; Partial Least Squares Regression (PLSR) is used for modeling. Morellos et al. [21] have used two linear multivariate and two machine learning methods for the prediction of soil nutrients like Total Nitrogen (TN), Organic Carbon (OC), and Moisture Content (MC). The two linear multivariate methods used are Principal Component Regression (PCR) and PLSR; linear multivariate methods work on linear data. The two machine learning methods considered are Least Squares Support Vector Machines (LS-SVM) and Cubist, which work well for the prediction of non-linear data. Both machine learning methods outperformed the linear multivariate methods: LS-SVM gave the best prediction for MC and OC, and the Cubist ML technique gave good predictions for TN.

Yang et al. [22] have compared one linear and three non-linear ML techniques for the prediction of Soil Organic Matter (SOM) and pH in the soil. The linear technique considered is PLSR, and the non-linear techniques are LS-SVM, Extreme Learning Machines (ELM), and the Cubist regression model. For the prediction of SOM, ELM outperformed all the algorithms; for the prediction of pH, the LS-SVM and ELM methods outperformed the PLSR and Cubist models. Trontelj and Chambers [23] have compared six classification algorithms, namely Random Forest (RF), Decision Tree (DT), Naïve Bayes (NB), Support Vector Machine (SVM), LS-SVM, and Artificial Neural Network (ANN), for the prediction of K, P, and Mg (Magnesium). RF gave good accuracy in predicting K, and LS-SVM gave the best results compared to ANN for predicting P and Mg.

Subramanian [24] has estimated soil macronutrients like N, P, and K for the rice crop. There are three phases involved: data pre-processing, clustering, and classification. Two datasets are used, a crop dataset and a fertilizer dataset. The crop dataset contains types of soil like alluvial, rabi, black soil, etc., of Andhra Pradesh, Tamil Nadu, Kerala, and Hyderabad. The spectroscopy method is used for the recommendation of fertilizer. Using the K-means algorithm, the collected data are clustered with the help of previously collected historical data, and using the crop dataset, the amount of nutrient required for the soil is calculated from the clustered data. For classification, RF classifiers are used for good predictive analysis. Depending upon the estimated value of NPK, a suitable fertilizer is recommended by referring to the fertilizer dataset. The proposed method gives an accuracy of 93.33% for the recommendation of fertilizer. Li et al. [25] have considered four different algorithms, namely PCR, PLSR, LS-SVM, and Back Propagation Neural Network (BPNN) models, for the prediction of Total Carbon (TC), TN, Total Phosphorus (TP), Total Potassium (TK), available nitrogen (AN), available phosphorus (AP), available potassium (AK), and slowly available potassium (SK). LS-SVM and PLSR had better stability, and BPNN and LS-SVM gave good accuracy for different types of soil.
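To make the PLSR workflow in these studies concrete, the following is a minimal sketch of predicting a soil property from reflectance spectra. The spectra and target values are synthetic placeholders, and scikit-learn's PLSRegression is just one possible implementation, not the toolchain used in [20, 21].

```python
# Minimal PLSR sketch: predict a soil property (e.g., total nitrogen)
# from VIS-NIR-like spectra. All data below are synthetic stand-ins.
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score

rng = np.random.default_rng(0)
n_samples, n_bands = 120, 200                       # 200 spectral bands per sample
spectra = rng.normal(size=(n_samples, n_bands)).cumsum(axis=1)  # smooth-ish curves
tn = spectra[:, 50] * 0.02 + rng.normal(scale=0.1, size=n_samples)  # hypothetical TN

X_train, X_test, y_train, y_test = train_test_split(spectra, tn, random_state=0)
pls = PLSRegression(n_components=10)                # latent variables to tune
pls.fit(X_train, y_train)
print("R^2 on held-out spectra:", r2_score(y_test, pls.predict(X_test).ravel()))
```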
Monavar [20] has reported that the spectroscopy method is very efficient compared to laboratory analysis, but it is time consuming and costly when compared to chemical methods.
3.3 Sensor Based Method

Akhter and Sofi [26] have mentioned that combining machine learning and IoT data analytics would increase the quality and quantity of the crop. Iorliam et al. [27] have observed that, using IoT-enabled devices, soil nutrients like NPK can be collected and classified using an Artificial Neural Network; the ANN gives an accuracy of 97.14%. This study also mentions that linear regression, logistic regression, and decision trees may be used for classification in the future.

Gholap et al. [28] have compared three classification techniques, namely Naïve Bayes (NB), J48 (the C4.5 decision tree), and JRip. In total, 1988 soil samples with 9 soil components, namely pH, EC, OC, P, K, Iron (Fe), Zinc (Zn), Manganese (Mn), and Copper (Cu), were used. The accuracy of NB was 38.40%, of J48 (C4.5) 91.90%, and of JRip 90.24%; J48 outperformed NB and JRip. For the prediction of phosphorus, two methods were used: Linear Regression (LR) and Least Median Square Regression (LMSR). LMSR predicts better but takes more time for model building than LR; hence the cost of computation is less for LR.

Arunesh and Rajeswari [29] have shown the analysis of soil lime status on 203 data instances using various classifiers: NB, J48, Random Tree (RT), JRip, OneR, and ZeroR. The Naïve Bayes classifier outperformed all the algorithms with an accuracy of 93.81%, followed by J48 and JRip with the same accuracy of 93.46%; RT gave an accuracy of 50.66%, and finally OneR and ZeroR gave the same accuracy of 39.92%. As a future enhancement, they want to build a recommender system for the prediction and recommendation of fertilizer and crops that suit the lime status of the soil.

Hemageetha and Nagalakshmi [30] have considered 701 soil samples from Salem, Tamil Nadu for classification with Simple Naive Bayes, BayesNet, J48, and JRip. J48 outperformed all the other methods with an accuracy of 100%; BayesNet gave an accuracy of 99.84%, JRip 99.71%, and Simple Naive Bayes 95.14%. The authors also suggest building an innovative model to choose base crops based on the type of the soil as future work.

Khadse et al. [31] have compared five classification algorithms, namely KNN, NB, DT, RF, and Logistic Regression (LR), with feature reduction performed using the PCA algorithm. An increase in the size of the dataset has the following effects:
i. It decreases the performance of NB and LR.
ii. It increases the time taken for the execution of all the algorithms.
An increase in the number of features in the dataset has the following effects:
i. There is a decrease in the efficiency of NB and LR.
ii. There is an increase in the efficiency of DT, RF, and KNN.

Alam et al. [32] have compared several classification techniques, namely SVM, KNN, Linear Discriminant Analysis (LDA), NB, C4.5, C5.0, ANN, and Deep Learning ANNs (DLANNs). The limitations of SVM are:
i. SVM shows inefficiency in computation.
ii. It consumes more system resources.
iii. Its processing speed is very low.
iv. Its cost of computation is high.
The limitations of DLANNs are:
i. The structure of DLANNs is very complex.
ii. Their computational requirements are high.
iii. They need a large amount of system resources.
iv. They have the highest execution time among all the eight algorithms considered.
KNN is lighter and has low execution times. Compared to SVM, KNN, NB, and LDA, the C4.5, C5.0, ANN, and DLANN algorithms performed well. ANN algorithms are computationally expensive and very complex to build. The accuracies of the algorithms are as follows: C4.5–97.15%, C5.0–96.61%, and ANN–96.19%. Of the considered techniques, C4.5 classified best, followed by C5.0. Wu et al. [33] and Burbidge and Buxton [34] have reported that, in order to reduce the cost of computation and to increase the scalability of SVM, many optimization techniques are used.

Chettri et al. [35] have compared different classification algorithms, namely CBR, NB, and KNN, for their accuracy. The accuracies of the algorithms are as follows: CBR–92%, NB–85%, and KNN–72%. The results indicate the following points:
i. KNN performs well for smaller datasets.
ii. NB gives the same accuracy even when the dataset size and the number of features are increased.
iii. NB performed well compared to KNN.

Li et al. [36] have compared classification algorithms like Multiple Linear Regression (MLR), SVM, and ANN. The soil components considered are OM, TN, N, P, and K. The accuracies of the SVM and GRNN models are 77.87% and 83.00%, respectively. Phanikumar and Velide [37] have used classification algorithms like NB, J48 (C4.5), and JRip for soil data analysis; among the three, JRip is a simple, efficient classifier of soil data. The selected soil attributes were N, P, Calcium (Ca), Magnesium (Mg), Sulphur (S), Iron (Fe), Zinc (Zn), K, pH, and Humus (Hs). The attributes were predicted by LR, LMSR, and simple regression; for analysis, LMSR gave good results compared to LR. For classification, JRip gave good accuracy. The accuracies of the algorithms are as follows: Naive Bayes–38.74%, J48–87.06%, JRip–92.53%.
Bhuyar [38] has investigated the application of the J48, NB, and RF classifiers by considering parameters like pH, EC, Fe (Iron), Cu, Zn, OC, P2O5 (phosphorus pentoxide), K2O (potassium oxide), and FI (fertility index) of soil. The accuracies of the algorithms are as follows: J48–98.17%, RF–97.92%, NB–77.18%; J48 gave the best results. Taher et al. [39] have compared the accuracy of various data mining techniques like KNN, RF, DT, and Naïve Bayes (NB) for the classification of N, P, K, S, Fe, Zn, EC, and pH of soil. Out of the four algorithms, KNN performed well, and the authors suggest KNN for forecasting soil features. Pandith et al. [40] have compared different classification algorithms, namely KNN, NB, MLR, ANN, and RF, for their accuracies in mustard crop yield prediction. Saranya and Mythili [41] have proposed a model for the prediction of the soil type and the type of crop that can be grown in a particular soil. The soil components taken are pH, salinity, OM, K, S, Zn, B (Boron), Ca, Mg, Cu, Fe, and Mn. Different machine learning algorithms are used, such as KNN, Bagged Tree, SVM, and Logistic Regression; out of these, SVM gave better results, with an accuracy of 96% for a total of 383 soil samples. Rajeswari and Arunesh [42] have considered attributes like the name of the village, type or color of soil, texture of soil, pH, EC, lime status, and P of 110 data samples. Here the authors have predicted the soil type, like red or black, based on the pH and EC values. Three classification algorithms, JRip, J48, and NB, are compared for the prediction of soil type; their accuracies are 98.18%, 97.27%, and 86.36%, respectively. JRip predicted best.
4 Comparison of Soil Testing Methods

The performance metrics for soil testing methods, including accuracy, time taken, cost involved, and real-time analysis, are listed in Table 3.
5 Comparison of Classification Algorithms

In this section, the different classification algorithms with their accuracies, the total number of soil samples, and the soil components considered are listed in Table 4, and a few of the best algorithms for classifying soil components are listed in Table 5.
Table 3 Comparison of different soil testing methods for performance parameters

1. Laboratory method/chemical method (place of development: laboratory)
   - Accuracy: The laboratory method gives accurate results [43]
   - Time: Soil samples have to undergo chemical analysis and pretreatment, hence the time taken is high [44–46]
   - Cost: The cost incurred is high [44]
   - Real-time analysis: Not possible

2. Colorimetry method (laboratory/in field)
   - Accuracy: Gives good accuracy compared to the laboratory method [47]
   - Time: Less than the electrochemical sensor and optical sensor methods [48]
   - Cost: Less costly compared to the electrochemical sensor and optical sensor methods
   - Real-time analysis: Uses a soil testing kit; continuous real-time analysis is not done [48]

3. Electrochemical sensor based method, using ISEs or ISFETs (laboratory/in field)
   - Accuracy: The accuracy obtained is the same as the laboratory method [49]
   - Time: The time taken for testing soil components is less when ISFET sensors are used [46]
   - Cost: ISE sensors can be fabricated at low cost, whereas ISFET sensors are costly [46]
   - Real-time analysis: Due to their delayed response, ISE sensors cannot be used in real-time analysis [46]

4. Optical sensor using the spectroscopy method (laboratory/in field)
   - Accuracy: Good when compared to laboratory methods [46]
   - Time: This method takes more time, as the soil has to undergo complex steps [46]
   - Cost: Sensors have to be calibrated frequently, which increases cost; the cost of testing is ~150 Yuan/sample [43, 46]
   - Real-time analysis: Done either by using a kit or by inserting the sensors at the time of analysis [46]

5. IoT sensors (in field)
   - Accuracy: Gives good accuracy compared with the laboratory method when the terrain of the farm varies and real-time analysis is needed [43]
   - Time: The time taken is less [43]
   - Cost: Low-cost sensors are used for a large area [43]
   - Real-time analysis: Done either by using a kit or by inserting the sensors at the time of analysis [43]
Table 4 Different classification techniques with their accuracies

Sl. No | Classification method | Accuracy (%) | Soil samples | Soil components
1 | Random Forest [24] | 93.33 | 9 | NPK
2 | ANN [27] | 97.14 | 5820 | NPK
3 | Naïve Bayes [28] | 38.40 | 1988 | pH, K, EC, P, Fe, Zn, Mn, Cu
 | J48 [28] | 91.90 | |
 | JRip [28] | 90.24 | |
4 | NB [29] | 93.81 | 203 | Lime status level in soil
 | J48 [29] | 93.46 | |
 | JRip [29] | 93.46 | |
 | Random Tree [29] | 50.66 | |
 | OneR [29] | 39.92 | |
 | ZeroR [29] | 39.92 | |
5 | Simple Naïve Bayes [30] | 95.14 | 701 | pH, EC, NPK, OC
 | BayesNet [30] | 99.84 | |
 | J48 [30] | 100 | |
 | JRip [30] | 99.71 | |
6 | SVM [36] | 77.87 | 27 | OM, TN, N, P, K
 | GRNN [36] | 92.86 | |
7 | NB [37] | 38.74 | 2400 | N, S, Ca, Mg, Zn, Fe, P, K, pH, Hs
 | J48 [37] | 87.06 | |
 | JRip [37] | 92.53 | |
8 | J48 [38] | 98.17 | 1639 | pH, EC, Fe (Iron), Cu, Zn, OC, P2O5, K2O, FI (fertility index)
 | RF [38] | 97.12 | |
 | NB [38] | 77.18 | |
9 | KNN [39] | 84 | 10 | N, P, K, S, Fe, Zn, EC, pH
 | DT [39] | 53.85 | |
 | RF [39] | 53.85 | |
 | NB [39] | 69.23 | |
10 | MLR [40] | 80.24 | 5000 | Mustard crop yield
 | RF [40] | 94.13 | |
 | KNN [40] | 88.67 | |
 | ANN [40] | 76.86 | |
 | NB [40] | 72.33 | |
11 | SVM [41] | 96 | 383 | pH, salinity, OM, K, S, Zn, B, Ca, Mg, Cu, Fe, Mn
12 | JRip [42] | 98.18 | 110 | pH, EC, K, Fe, Cu
 | J48 [42] | 97.27 | |
 | NB [42] | 86.36 | |
Table 5 Best algorithms for the estimation of soil components

References | Classification methods | Best algorithm | Soil components
[21] | PCR, PLSR, LS-SVM, Cubist | LS-SVM; Cubist | MC, OC; TN
[22] | PLSR, LS-SVM, ELM, Cubist | ELM; LS-SVM and ELM | SOM; pH
[23] | RF, DT, NB, SVM, LS-SVM, ANN | RF; LS-SVM | K; P, Mg
[24] | RF, SVM | RF | NPK
[28] | NB, J48 (C4.5), JRip | J48 | pH, EC, OC, P, K, Fe, Zn, Mn, Cu
[29] | J48, Random Tree, JRip, OneR, ZeroR, NB | NB | pH, lime status
[30] | NB, J48, BayesNet, JRip | J48 | pH, EC, NPK, OC
[36] | GRNN, SVM | GRNN | OM, TN
[37] | NB, J48 (C4.5), JRip | JRip | N, S, Ca, Mg, Zn, Fe, P, K, pH, humus
[38] | J48, NB, RF | J48 | pH, EC, Fe, Cu, Zn, OC, P2O5, K2O, FI
[39] | KNN, DT, NB, RF | KNN | N, P, K, S, Fe, Zn, EC, pH
[40] | KNN, MLR, NB, ANN, RF | RF | Mustard yield prediction
[41] | KNN, Bagged Tree, SVM, LR | SVM | Soil type, type of crop to be cultivated
6 Comparative Analysis of Classification Algorithms

Based on the observations made in the previous section, JRip, J48, ANN, and RF give good accuracy when different numbers of soil samples are considered for analysis.
6.1 J48

Alam and Pachauri [50] have explained that this algorithm works by creating a decision tree. Labeled input data are given to the algorithm, and the decision tree is formed depending upon them; unlabeled test data are then used to test generalization. This decision tree is used for classification. J48 also includes capabilities for handling missing data, pruning of decision trees, continuous attribute value ranges, derivation of rules, etc. Being a decision tree classifier, J48 employs a predictive machine-learning model that determines the outcome value of a new sample based on the attribute values of the available data. This makes J48 one of the best methods.
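As an illustration of the idea, the sketch below trains an entropy-based decision tree on placeholder soil data; note that scikit-learn implements CART rather than J48/C4.5 itself, so this only approximates the behaviour described above, and the features, thresholds, and labels are synthetic assumptions.

```python
# Entropy-based decision tree as a rough stand-in for J48 (C4.5):
# learn a tree from labeled soil data, then classify an unseen sample.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(1)
X = rng.uniform(low=[0, 0, 0], high=[700, 35, 360], size=(300, 3))  # N, P, K (kg/ha)
y = (X[:, 0] > 280).astype(int)          # hypothetical "fertile" label

tree = DecisionTreeClassifier(criterion="entropy", max_depth=4)  # depth acts as pruning
tree.fit(X, y)
print(tree.predict([[150.0, 20.0, 200.0]]))   # classify an unseen soil sample
```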
6.2 JRip

Sonawani and Mukhopadhyay [51] have reported that JRip implements Repeated Incremental Pruning to Produce Error Reduction (RIPPER). JRip is a bottom-up method that finds a set of rules covering all members of a class by treating a particular judgment of the instances in the training data as a class. Over-fitting is avoided by using cross-validation and minimum-description-length approaches, which yields the best performance.
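scikit-learn has no RIPPER implementation; the third-party wittgenstein package provides one. The sketch below follows that package's documented usage on placeholder data; treat the exact API as an assumption to verify against the installed version.

```python
# RIPPER rule induction via the third-party `wittgenstein` package
# (pip install wittgenstein). API usage follows the package docs;
# verify against the installed version. Data are placeholders.
import numpy as np
import wittgenstein as lw

rng = np.random.default_rng(2)
X = rng.uniform(size=(200, 3))               # placeholder soil features
y = (X[:, 0] + X[:, 1] > 1.0).astype(int)    # hypothetical binary class

ripper = lw.RIPPER(k=2)        # k = number of rule-optimization passes
ripper.fit(X, y, pos_class=1)
ripper.out_model()             # print the induced rule set
print(ripper.predict(X[:5]))
```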
6.3 RF

Breiman [52] says that random forests are a combination of tree predictors, where each tree depends on the values of a random vector sampled independently and with the same distribution for all trees in the forest. The growth of the trees in a random forest increases the model's randomness. Since the random forest can accommodate missing values and can handle continuous, categorical, and binary data, it is suitable for high-dimensional data modeling. Random forest is robust to over-fitting due to the bootstrapping and ensemble approach, so pruning the trees is not necessary. For a variety of dataset types, random forest is non-parametric, effective, and interpretable, in addition to having excellent prediction accuracy.
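A minimal random forest sketch using scikit-learn follows; the out-of-bag score illustrates the bootstrapping/ensemble behaviour that makes explicit pruning unnecessary. The features and labels are synthetic placeholders.

```python
# Random forest sketch: bagged trees with an out-of-bag (OOB) estimate.
# Data are placeholders, not a real soil dataset.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(3)
X = rng.normal(size=(500, 8))             # e.g., pH, EC, OC, N, P, K, Fe, Zn
y = (X[:, 0] * X[:, 1] > 0).astype(int)   # hypothetical fertility label

forest = RandomForestClassifier(n_estimators=200, oob_score=True, random_state=0)
forest.fit(X, y)
print("OOB accuracy estimate:", forest.oob_score_)
print("Per-feature importances:", forest.feature_importances_.round(2))
```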
6.4 ANN

Bala and Kumar [53] state that the artificial neural network training technique is built on the initial parameter setup: the weights, the biases, and the algorithm's learning rate. It starts off with baseline values and adjusts the weights after each iteration of the learning process. This approach helps to transform the input so that the network gives the best result without redesigning the output procedure. Figure 5 shows the accuracies of the JRip, J48, RF, and ANN classification algorithms when different numbers of soil samples are considered.
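The weight-update behaviour described above can be sketched with scikit-learn's MLPClassifier, where the initial learning rate is an explicit parameter; the data are synthetic placeholders.

```python
# Small feed-forward ANN: weights are initialized, then adjusted each
# iteration according to the learning rate, as described above.
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(4)
X = rng.normal(size=(400, 6))              # placeholder soil features
y = (X.sum(axis=1) > 0).astype(int)        # hypothetical label

ann = MLPClassifier(hidden_layer_sizes=(16, 8),
                    learning_rate_init=0.01,   # initial learning rate
                    max_iter=500, random_state=0)
ann.fit(X, y)
print("Training accuracy:", ann.score(X, y))
```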
7 Conclusion

This paper surveys static soil data collection and analysis. A comparison has been made among many classification algorithms, and the observation is that J48 and JRip performed very well for both small and large datasets, while ANN and RF gave good performance for large datasets.
Fig. 5 Accuracies of JRip, J48, ANN, RF under different soil samples
8 Future Enhancement

Traditional soil testing methods give static data collection, which leads to poor analysis of soil components. As the soil components vary frequently, a robust method is needed. The limitation of static analysis of soil parameters can be overcome by burying sensors in the soil, collecting data about the soil components continuously in real time, and storing them in the cloud. At regular time intervals, the collected data should be analyzed and communicated to the farmer. This helps the farmer get a good yield.
References 1. West PC, McKenney B, Monfreda C, Biggs R (2013) Feeding the world and protecting biodiversity. In: Encyclopedia of Biodiversity, 2nd edn. Elsevier 2. Ball AS, Wilson WS, Hinton R (1999) Managing risks of nitrates to humans and the environment. Woodhead Publishing 3. Farooq MS, Riaz S, Abid A, Abid K, Naeem MA (2019) A survey on the role of IoT in agriculture for the implementation of smart farming. IEEE Access 7:1–36 4. Bacco M, Barsocchi P, Ferro E, Gotta A, Ruggeri M (2019) The digitisation of agriculture: a survey of research activities on smart farming. Array 3–4:1–11 5. Amato F, Havel J, Gad A, El-Zeiny A (2015) Remotely sensed soil data analysis using artificial neural networks: a case study of El-Fayoum depression Egypt. ISPRS Int J Geo Inf 4:677–696
6. Pivoto D, Waquil PD, Talamini E, Finocchio CPS, Corte VFD, de Vargas Mores G (2018) Scientific development of smart farming technologies and their application in Brazil. Inf Process Agric 5:21–32
7. Leonard EC (2016) Precision agriculture. In: Encyclopedia of food grains, vol 4. Elsevier, Amsterdam, The Netherlands, pp 162–167
8. Zewdie I, Reta Y (2021) Review on the role of soil macronutrient (NPK) on the improvement and yield and quality of agronomic crops. Direct Res J Agric Food Sci 9(1):7–11
9. Pawar DR, Shah EKM (2009) Laboratory testing procedure for soil and water sample analysis. Water Resources Department, Directorate of Irrigation Research and Development, Pune
10. Khadka D, Lamichhane S, Thapa B (2016) Assessment of relationship between soil pH and macronutrients. J Chem Biol Phys Sci 6(2):303–311
11. Geng Y, Baumann F, Song C, Zhang M, Shi Y, Kühn P, Scholten T, He J-S (2017) Increasing temperature reduces the coupling between available nitrogen and phosphorus in soils of Chinese grasslands. Sci Rep
12. Ma R, McBratney A, Whelan B, Minasny B, Short M (2011) Comparing temperature correction models for soil electrical conductivity measurement. Precision Agric 12:55–66
13. Menzies NW, Gillman GP (2003) Plant growth limitation and nutrient loss following piled burning in slash and burn agriculture. Nutr Cycl Agroecosyst 65:23–33
14. Broadbent FE (2015) Soil organic matter. Sustain Options Land Manage 2:34–38
15. Patel AH (2015) Electrical conductivity as soil quality indicator of different agricultural sites of Kheda district in Gujarat. Int J Innovative Res Sci Eng Technol 4(8):7305–7310
16. Visconti F, de Paz JM (2015) Electrical conductivity measurements in agriculture: the assessment of soil salinity. In: Intech, chap 5, pp 99–126
17. Methods Manual, Soil Testing in India (2011) Department of Agriculture and Cooperation, Ministry of Agriculture, Government of India, New Delhi, India
18. Suchithra MS, Pai ML (2020) Improving the prediction accuracy of soil nutrient classification by optimizing extreme learning machine parameters. Inf Process Agric 7(1):72–82
19. Dimkpa C, Bindraban P, Mclean JE, Gatere L, Singh U, Hellums D (2017) Methods for rapid testing of plant and soil nutrients. Springer International Publishing AG, pp 1–42
20. Monavar HM (2016) Determination of several soil properties based on ultra-violet, visible, and near-infrared reflectance spectroscopy. In: ICFAE, Copenhagen, Denmark
21. Morellos A, Pantazi X-E, Moshou D, Alexandridis T, Whetton R, Tziotzios G, Wiebesohn J, Bill R, Mouazen A (2016) Machine learning based prediction of soil total nitrogen, organic carbon and moisture content by using VIS-NIR spectroscopy. Biosys Eng 152:104–116
22. Yang M, Xu D, Chen S, Li H, Shi Z (2019) Evaluation of machine learning approaches to predict soil organic matter and pH using vis-NIR spectra. Sens, MDPI, pp 1–14
23. Trontelj ml. J, Chambers O (2021) Machine learning strategy for soil nutrients prediction using spectroscopic method. Sens, MDPI, pp 1–13
24. Subramanian KSR (2020) Design and implementation of fertilizer recommendation system for farmers. TEST 83:8840–8849
25. Li X-Y, Fan P-P, Liu Y, Hou G-L, Wang Q, Lv M-R (2019) Prediction results of different modeling methods in soil nutrient concentrations based on spectral technology. J Appl Spectrosc 86:765–770
26. Akhter R, Sofi SA (2021) Precision agriculture using IoT data analytics and machine learning. J King Saud Univ Comput Inf Sci
27. Iorliam A, Adeyelu A, Otor S, Okpe I, Iorliam I (2020) A novel classification of IoT-enabled soil nutrients data using artificial neural networks. IJIREEICE 8(4):103–109
28. Gholap J, Ingole A, Gohil J, Gargade S, Attar V (2012) Soil data analysis using classification techniques and soil attribute prediction. Int J Comput Sci Issues 9(3)
29. Arunesh K, Rajeshwari V (2017) Agricultural soil lime status analysis using data mining classification techniques. IJATES 5(2):28–35
30. Hemageetha N, Nagalakshmi N (2018) Classification techniques in analysis of Salem district soil condition for cultivation of sunflower. JCSE 6(8):642–646
31. Khadse VM, Mahalle PN, Shinde GR (2020) Statistical study of machine learning algorithms using parametric and non-parametric tests: a comparative analysis and recommendations. IJACI 11(3):80–105
32. Alam F, Mehmood R, Katib I, Albeshri A (2016) Analysis of eight data mining algorithms for smarter Internet of Things (IoT). Procedia Comput Sci 98:437–442
33. Wu X, Kumar V, Quinlan JR, Ghosh J, Yang Q, Motoda H et al (2008) Top 10 algorithms in data mining. Knowl Inf Syst 14:1–37
34. Burbidge R, Buxton B (2001) An introduction to support vector machines for data mining. In: Semantic Scholar
35. Chettri R, Pradhan S, Chettri L (2015) Internet of Things: comparative study on classification algorithms (k-NN, Naive Bayes and Case based Reasoning). Int J Comput Appl 130(12):7–9
36. Li H, Leng W, Zhou Y, Chen F, Xiu Z, Yang D (2014) Evaluation models for soil nutrient based on support vector machine and artificial neural networks. Sci World J 2014:1–7. Hindawi Publishing Corporation
37. Phanikumar V, Velide L (2014) Data mining plays a key role in soil data analysis of Warangal region. Int J Sci Res Publ 4(3):1–3
38. Bhuyar V (2014) Comparative analysis of classification techniques on soil data to predict fertility rate for Aurangabad district. IJETTCS 3(2):200–203
39. Taher KI, Abdulazeez AM, Zebari DA (2021) Data mining classification algorithms for analyzing soil data. Asian J Res Comput Sci 8:17–28
40. Pandith V, Kour H, Singh S, Manhas J, Sharma V (2020) Performance evaluation of machine learning techniques for mustard crop yield prediction from soil analysis. J Sci Res 64(2):394–398
41. Saranya N, Mythili A (2020) Classification of soil and crop suggestion using machine learning techniques. IJERT 9(2):671–673
42. Rajeswari V, Arunesh K (2016) Analysing soil data using data mining classification techniques. IJST 9(19):1–5
43. Burton L, Jayachandran K, Bhansali S (2020) Review—the "real-time" revolution for in situ soil nutrient sensing. J Electrochem Soc
44. Lavanya G, Rani C, Ganeshkumar P (2018) An automated low cost IoT based fertilizer intimation system for smart agriculture. SUSCOM, Elsevier
45. Shukre VA, Patil SS (2020) Comparative study of different methodologies used for measuring soil parameters: a review. ICSITS, Pune 8:1–3
46. Lin J, Wang M, Zhang M, Zhang Y, Chen L (2007) Electrochemical sensors for soil nutrient detection: opportunity and challenge. In: CCTA, Wuyishan, China, vol II, pp 1349–1353
47. Yamin M, bin Wan Ismail WI, bin Mohd Kassim MS, Aziz SBA, Akbar FN, Shamshiri RR, Ibrahim M, Mahns B (2020) Modification of colorimetric method based digital soil kit for determination of macronutrients in oil palm plantation. IJABE 13:188–197
48. Agarwal S, Bhangale N, Dhanure K, Gavhane S, Chakkarwar VA, Nagori MB (2018) Application of colorimetry to determine soil fertility through Naive Bayes classification algorithm. In: ICCCNT, Bengaluru
49. Sibley KJ, Brewster GR, Astatkie T, Adsett JF, Struik PC (2010) In-field measurement of soil nitrate using an ion-selective electrode. In: Advances in measurement systems, chap 1, pp 1–28
50. Alam F, Pachauri S (2017) Comparative study of J48, Naïve Bayes and One-R classification technique for credit card fraud detection using WEKA. ACST 10:1731–1743
51. Sonawani S, Mukhopadhyay D (2013) A decision tree approach to classify web services using quality parameters
52. Breiman L (2001) Random forests. Mach Learn 45(1):5–32
53. Bala R, Kumar D (2017) Classification using ANN: a review. IJCIRV 13:1811–1820
Chapter 47
Design and Development of Automated Groundstation System for Beliefsat-1 Rinkesh Sante, Jatin Bhosale, Shrutika Bhosle, Pavan Jangam, Umesh Shinde, Kavita Bathe, Devanand Bathe, and Tilottama Dhake
1 Introduction

BeliefSat-1 is a 2p-PocketQube standard nano-satellite being developed by the undergraduate students of KJSIEIT. The satellite itself is a sub-part of the team's proposal under the PS4-Orbital Platform Program, which has been accepted by ISRO for their Student Satellite Program, wherein the team aims to demonstrate indigenously developed technologies for PocketQube standard nano-satellites [1]. One of the most important parts of this system (or any satellite, for that matter) is indigenously developing an open-source, low-cost Ground Station system for communication with the satellite in orbit. A typical satellite communication system
comprises a ground segment and a space segment. The basic parameters of a communication satellite are the communication frequency and the orbit. The orbit is the trajectory followed by the satellite; several types of orbits are possible, each suitable for a specific application or mission. The satellite's coverage area on the Earth depends on the orbital parameters. Groundstations can communicate with LEO (Low Earth Orbit) satellites only when the satellite is in their visibility region. The duration of visibility, and hence the communication duration, varies for each satellite pass over the Groundstation. For low-cost LEO satellite Groundstations in an urban environment, it is a big challenge to ensure communication down to the horizon: communication at low elevation angles can be hindered by natural barriers or interfered with by man-made noise. This paper proposes a low-cost and easy-to-implement design using Commercial Off-The-Shelf (COTS) components while increasing the reliability of the system. The organization of the paper is as follows: Sect. 2 presents the literature review; Sect. 3 presents the methodology used; Sect. 4 presents results and a discussion of the system; Sect. 5 concludes the paper. References are presented at the end.
2 Literature Review

There are multiple Groundstations available in the market. However, to track satellites' positions around the globe, a network of Groundstations is required; AWS and SatNOGS are the most prominent in this field. The SatNOGS project is free software and has published a few open-source hardware designs to be replicated by space enthusiasts [1]. The SatNOGS designs require many 3D-printed parts, which are not accessible to everyone, especially in developing nations like India. A comparison of a few Groundstations is given in Table 1. There are multiple types of Groundstations available in the commercial market and the open-source domain; for most of them, the setup cost is very high, and they require some specific equipment that might not be available to everyone. Our goal was to lower this entry barrier to encourage people to take an interest in the space technology domain. The motivation behind the design of this Groundstation is to aid the deployment and operation of Beliefsat-1, which is the student satellite of KJSIEIT, India [7]. We have taken inspiration from SatNOGS for its modular and network structure [1].
Table 1 Related work

[1] SatNOGS: satellite networked open Groundstation. Highlights: popular and big network; supports multiple Groundstations and satellites; has a web interface. Limitations: requires access to a 3D printer; the 3D-printed material tends to degrade over time.
[2] Small satellite ground station in Pilsen–experiences with VZLUSAT-1 commanding and future modifications toward open reference ground station solution. Highlights: based on radio amateur equipment; easy and quick to implement. Limitations: several bottlenecks of common Groundstations; absence of an antenna rotator results in data loss.
[3] Practical "low cost" LEO receiving ground station. Highlights: need for an antenna rotator is eliminated by using a combination of Eggbeater and Lindenblad antennas; homebrew antennas result in very low fabrication costs. Limitations: the Groundstation is not autonomous; absence of an antenna rotator might result in data loss; the system is not robust.
[4] Mercury: a satellite Groundstation control system. Highlights: centralizes the station interface control; able to control the Groundstation via the Internet; the modular structure simplifies the operation of the Groundstation.
[5] AWS Groundstation. Highlights: supported by AWS, ensuring high speed and computation capabilities; well-established global network. Limitations: not suitable to replicate for a small project.
[6] Nyan-Sat. Highlights: uses COTS components like SDR and ESP32. Limitations: complex to fabricate.

3 Methodology

For the process of communication with a satellite in orbit, multiple factors need to be accounted for while designing the system. The system comprises the following modules, as described in Fig. 1:

1. Antenna
2. Antenna directional rotator
3. Transceiver system
4. Data management system
The modular structure of the system allows users to tweak the system according to particular requirements; this also eliminates the issue of the unavailability of specific components. The use of Commercial Off-The-Shelf (COTS) components further keeps the cost low.
Fig. 1 Automated Groundstation system block diagram
3.1 Antenna

There are multiple options available for the antenna. An omnidirectional antenna does not have a limited beamwidth and hence does not require an antenna rotator. However, the satellite will be orbiting at heights of more than 500 km above the Earth's surface, and the gain of omnidirectional antennas that are simple to fabricate, like the monopole, J-pole, V-pole, and dipole, is not sufficient for such an application. The other options are easy-to-fabricate directional antennas like the Yagi and Cross Yagi antennas. The Yagi antenna has linear polarization, which can introduce polarization mismatch loss; this can be addressed by the use of a circularly polarized Cross Yagi antenna [8]. BeliefSat-1 uses the VHF amateur radio band for communication using APRS (AX.25) protocol packets. As the operating frequency for telecommand and telemetry is in the 2 m VHF amateur radio band, a 2 m VHF band Cross Yagi antenna is used for this purpose. Multiple satellites operate in the 2 m VHF amateur radio band, but if this frequency band is not suitable for a particular case, an antenna with the required specification shall be used [7]. The design for the required Cross Yagi antenna is provided in Fig. 2. Since the antenna shall be mounted on the antenna rotator, it needs to be lightweight to ensure reliability and to reduce the cost of the rotator.
Fig. 2 Cross yagi antenna design
The material used to fabricate the antenna should be corrosion resistant, have high conductivity, and have low cost. A comparison between the various materials is given in Table 2.

Table 2 Material comparison for antenna

Sr. No | Factor | Aluminum | Copper | Stainless steel
1 | Conductivity (S/m) | 3.8 × 10^7 | 5.8 × 10^7 | 1.1 × 10^6
2 | Tensile strength | 35 MPa | 70 MPa | 215 MPa
3 | Elasticity | 68.0 GPa | 110 GPa | 193 GPa
4 | Corrosion | Low corrosion | Highly corrosive | Very low corrosion
5 | Weight | Very low | Moderate | High
6 | Endurance | Moderate | Low | High
7 | Cost | Low | Moderate | Low

After considering all the factors, aluminum is the most suitable in most cases. The fabricated antenna is presented in Fig. 3.
3.2 Antenna Directional Rotator

As the satellite will be orbiting around the Earth and the Cross Yagi antenna is a directional antenna, the antenna needs to be pointed towards the satellite continuously during a satellite pass. To achieve this, an antenna directional rotator is required.
Fig. 3 Cross yagi antenna model
The antenna rotator is a mechanical device on which the antenna is fixed and which can change the antenna's azimuth and elevation [9].

Analysis of Satellite Passes/Single Axis Rotator To understand the working of an antenna rotator, it is important to analyze the behavior of the satellite with respect to the Groundstation. The satellite orbits around the Earth in an elliptical orbit. For an observer on the ground, the satellite is a moving object which rises from a certain direction, climbs to a maximum angle of elevation, and then sets in some other direction; this is referred to as a "satellite pass". During a satellite pass, the angle of azimuth and the angle of elevation keep changing continuously. A study was conducted on the passes of all satellites with an orbital height between 500 and 700 km over 10 days to analyze the pace of change of these angles [9]. Upon analysis, it was found that most of the satellite passes had a maximum angle of elevation of less than 60°. The graph in Fig. 4 shows the frequency of the maximum angle of elevation in different angle ranges. To verify these findings mathematically, calculations were performed to determine the probability of the occurrence of a satellite at different angles of elevation.
Fig. 4 Analysis of satellite passes
Probability = (Area of the orbital sphere in the given elevation range) / (Area of the orbital sphere above the horizon)    (1)
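Equation (1) can be checked numerically by sampling points uniformly on the orbital sphere, keeping those above the local horizon, and measuring their elevation angles. The sketch below assumes a 600 km orbital sphere (an arbitrary value in the 500–700 km band studied) and only verifies the trend shown in Fig. 5.

```python
# Monte Carlo check of Eq. (1): fraction of the visible orbital sphere
# lying above a given elevation angle, for an assumed 600 km altitude.
import numpy as np

R_EARTH, H_ORBIT = 6371.0, 600.0                # km
r = R_EARTH + H_ORBIT
rng = np.random.default_rng(0)

u = rng.normal(size=(1_000_000, 3))
u /= np.linalg.norm(u, axis=1, keepdims=True)   # uniform points on the unit sphere
sat = r * u                                     # points on the orbital sphere
station = np.array([R_EARTH, 0.0, 0.0])         # ground station on the surface
zenith = station / np.linalg.norm(station)

d = sat - station
elev = np.degrees(np.arcsin(d @ zenith / np.linalg.norm(d, axis=1)))

visible = elev[elev > 0]                        # satellites above the horizon
for lo in (0, 15, 30, 45, 60, 75):
    print(f"P(elevation >= {lo:2d} deg) = {np.mean(visible >= lo):.3f}")
```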
The graph in Fig. 5 shows the findings of the probability calculations. Figure 6 shows that at lower angles of elevation, the range to the satellite is maximum; as the angle of elevation increases, the range decreases exponentially.
Fig. 5 Probability of satellite passes
Fig. 6 Range versus angle of elevation
The use of a single-axis antenna rotator allows only azimuth movement, which results in an inability to cover the roughly 5% of the sky area vertically above the Groundstation. However, a single-axis rotator requires only one motor and a simple structure, resulting in a low-cost, simple system (see Fig. 7).

Fig. 7 Antenna rotator mechanism (front angle)

Dual Axis Rotator The dual-axis antenna rotator can be used to cover the entire visible sky. The 3D model of a dual-axis antenna rotator using Commercial Off-The-Shelf (COTS) [10] components is shown in Fig. 8. This model does not require any 3D-printed material, and its components can be changed to appropriate alternatives. Apart from these two designs, the Yaesu G-5500 (a commercial rotator) [10] or the antenna rotator design from SatNOGS [1] can be used.

Fig. 8 Dual axis antenna directional rotator
3.3 Transceiver System

The RF signal from the antenna needs to be demodulated and decoded for further analysis. The transceiver system should be capable of transmitting and receiving RF signals, controlling the antenna directional rotator, and communicating data and commands with the server. The transceiver should be able to communicate in the 2 m amateur radio band (144–146 MHz); the frequency band may vary according to requirements. It should have high gain, high transmit power, and good sensitivity. The transceiver must be able to switch frequencies quickly and in real time over
configuration commands from the MCU to compensate for the Doppler shift [11]. Multiple RF ICs can be used to implement such systems; Fig. 9 represents an implementation where the transceiver system uses the DRA818V RF IC, similar to BeliefSat-1 [7]. The MCU, which is the processing unit, communicates with the server to fetch configuration data and post the data logs to the server. The data encoder and decoder are required to convert the signal to plain text and vice versa. The system further comprises an LNA (Low Noise Amplifier), RF filters, power amplifiers, and an RF switch to aid the processing of the RF signal. The above RF circuitry can also be replaced by an SDR (Software Defined Radio), reducing the cost and providing the option to work on a wide band of frequencies [12]. Since Beliefsat-1 and many other satellites use the APRS protocol for communication, the system has been integrated with the Direwolf software for conversion of APRS audio to text and vice versa [13]; kissutil is the terminal interface to Direwolf [14]. The prediction of satellite passes is very important for the operation of the system: it is used to schedule operations and control the antenna directional rotator. The system controls the antenna directional rotator using the EasyComm protocol [1]. SGP4 (Simplified General Perturbations 4) uses the satellite data in the form of a TLE (Two-Line Element) file and the Groundstation coordinates to make predictions [14]. Due to the drag from the very thin atmosphere in Low Earth Orbit, the TLE files should be updated at regular intervals [15]; updated TLE files can be fetched from the Gpredict server. Figure 10 describes the data flow diagram of the transceiver system.
Fig. 9 Transceiver System block diagram
Fig. 10 Data flow diagram for transceiver system
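For reference, the Doppler correction that the MCU applies follows from the range rate between the station and the satellite. The sketch below uses the non-relativistic approximation with an assumed 145.825 MHz downlink; in practice the range rate would come from an SGP4 propagation step.

```python
# Doppler correction sketch: the received frequency is offset by
# -f0 * (range_rate / c); a positive range rate means the satellite
# is receding. The downlink frequency here is an assumed example.
C = 299_792_458.0            # speed of light, m/s

def doppler_corrected(f0_hz: float, range_rate_ms: float) -> float:
    """Frequency observed at the ground for a transmitted f0 (non-relativistic)."""
    return f0_hz * (1.0 - range_rate_ms / C)

f0 = 145_825_000.0           # assumed 2 m band downlink, Hz
for rr in (-7000.0, 0.0, 7000.0):      # typical LEO range rates, m/s
    print(f"range rate {rr:+7.0f} m/s -> tune to {doppler_corrected(f0, rr):,.0f} Hz")
```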
3.4 Data Management System

The data management system will be the primary interface for users to interact with the Groundstation system, and hence it should be simple to operate while providing all the essential features. It will be a web application with a modern graphical user interface and a database. It will primarily be hosted on the cloud and serve as the common node for multiple Groundstations and multiple satellites, but it can also be deployed locally to ensure security for very critical missions. The use case diagram for the data management system is represented in Fig. 11. The server has been implemented using the Django framework for its simple and quick development cycle [16]. The server has a robust authentication and authorization system for access control at various levels. It also provides graph and report generation to improve the user experience. Figures 12 and 13 show screenshots of the satellite pass prediction page and the graph generation page, respectively.

Fig. 11 Use case diagram for server
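A minimal sketch of how the pass data might be modeled in Django is shown below; the model names and fields are illustrative assumptions, not the actual schema of the Beliefsat-1 server.

```python
# Illustrative Django models for tracked satellites and predicted passes.
# Names and fields are assumptions, not the project's actual schema.
from django.db import models

class Satellite(models.Model):
    name = models.CharField(max_length=64)
    norad_id = models.PositiveIntegerField(unique=True)
    tle_line1 = models.CharField(max_length=70)   # refreshed at regular intervals
    tle_line2 = models.CharField(max_length=70)

class PassPrediction(models.Model):
    satellite = models.ForeignKey(Satellite, on_delete=models.CASCADE)
    rise_time = models.DateTimeField()
    set_time = models.DateTimeField()
    max_elevation_deg = models.FloatField()       # used for scheduling priority
```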
4 Results and Discussions

4.1 Antenna

The antenna has been simulated using HFSS (High-Frequency Structure Simulator) software, and the resulting VSWR (Voltage Standing Wave Ratio) was observed. The simulation results for the VSWR and the S parameter are provided in Figs. 14 and 15, respectively [17].
Fig. 12 Screenshot of server GUI (satellite pass prediction)
Fig. 13 Screenshot of server GUI (graph generation)
After analysis of the above data in Table 3, it was found that the S parameter readings of the antenna have an accuracy of 92.5666% and the VSWR readings have an accuracy of 94.6287%.
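These figures can be reproduced (approximately) by averaging the per-row percentage deviations in Table 3. The paper does not state which column serves as the reference for each metric; the choices below are the ones that match the quoted values.

```python
# Approximate reproduction of the reported accuracies from Table 3:
# accuracy = 100 - mean(|observed - expected| / |reference| * 100).
import numpy as np

s11_obs = np.array([-16.9625, -19.9186, -15.5106])
s11_exp = np.array([-16.2321, -18.4774, -17.2341])
vswr_obs = np.array([1.4971, 1.2222, 1.4975])
vswr_exp = np.array([1.5631, 1.2732, 1.6103])

def accuracy(obs, exp, ref):
    return 100.0 - np.mean(np.abs(obs - exp) / np.abs(ref) * 100.0)

print("S parameter accuracy ~", round(accuracy(s11_obs, s11_exp, s11_exp), 4))   # ~92.57
print("VSWR accuracy        ~", round(accuracy(vswr_obs, vswr_exp, vswr_obs), 4))  # ~94.63
```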
5 Conclusion

There are multiple Groundstation systems available in the market as well as in the open-source domain. Though these systems are professionally designed and well tested, their overall cost is very high due to the requirement of very specific components, which creates an entry barrier for hobbyists and small institutions. In this paper, we have compared various types of antennas and fabrication materials for them.
Fig. 14 VSWR plot for cross yagi antenna
Fig. 15 S Parameter plot for cross yagi antenna

Table 3 Antenna testing results

Frequency (MHz) | S parameter observed (dB) | S parameter expected (dB) | VSWR observed | VSWR expected
144.600 | −16.9625 | −16.2321 | 1.4971 | 1.5631
145.000 | −19.9186 | −18.4774 | 1.2222 | 1.2732
145.800 | −15.5106 | −17.2341 | 1.4975 | 1.6103
The Cross Yagi antenna using aluminum will be the most suitable in most situations. We then analyzed the satellite trajectories to study the viability of the single-axis rotator, which results in a blind spot covering only about 5% of the sky directly above the station. We proposed a simple implementation using discrete RF circuitry as well as a Software Defined Radio (SDR). The use of a server-based data management system provides a common interface for interaction with the system. The use of Commercial Off-The-Shelf (COTS) components can reduce the overall cost of the system while maintaining good accuracy and reliability. The modular approach allows users to tune the system according to their requirements and the availability of components.
References

1. White DJ, Giannelos I, Zissimatos A, Kosmas E, Papadeas D (2015) SatNOGS: satellite networked open ground station. In: Engineering Faculty Publications, Valparaiso University, Valparaiso, Indiana
2. Vertat I, Linhart R, Pokorny M, Masopust J, Fiala P, Mraz J (2018) Small satellite ground station in Pilsen–experiences with VZLUSAT-1 commanding and future modifications toward open reference ground station solution. In: 28th International conference radioelektronika (RADIOELEKTRONIKA). IEEE, pp 1–6
3. Shamutally F, Soreefan Z, Suddhoo A, Mmple JM (2018) Practical "low cost" LEO receiving ground station. In: IEEE Radio and antenna days of the Indian Ocean (RADIO). IEEE, pp 1–2
4. Cutler JW, Kitts CA (1999) Mercury: a satellite ground station control system. In: IEEE Aerospace conference proceedings (Cat. No. 99TH8403), vol 2. IEEE, pp 51–58
5. AWS: AWS Ground Station. Retrieved from https://aws.amazon.com/ground-station/
6. Nyan-sat. Retrieved from https://nyan-sat.com/. Accessed on 21 Mar 2022
7. Bokade R (2020) Designing BeliefSat-1: an open-source technology demonstrator PocketQube. In: National conference on small satellite technology and applications
8. Devi DT, Nandhini BS, Balaakshaya S, Shankar HRL (2020) UHF band ground station antenna to track amateur radio satellites. Int J Sci Eng Res 11(5):386–391
9. Liu K, Hao J, Yang J, Ye Y, Yang F, Guo F (2021) Automatic tracking system of LEO satellite based on SGP4. In: ACM Turing award celebration conference–China (ACM TURC 2021), pp 192–199
10. Instruction Manual G-5500. Retrieved from https://www.yaesu.com/downloadFile.cfm?FileID=8814&FileCatID=155&FileName=G-5500_IM_ENG_E12901004.pdf&FileContentType=application%2Fpdf. Accessed on 21 Mar 2022
11. Ali I (1998) Doppler characterization for LEO satellites. IEEE Trans Commun 46(3):1–5
12. Ulversoy T (2010) Software defined radio: challenges and opportunities. IEEE Commun Surv Tutorials 12(4):531–550
13. Harsono SD, Rumadi, Ardinal R (2019) Design and implementation of SatGate/iGate YF1ZQA for APRS on the LAPAN-A2 satellite. In: IEEE International conference on aerospace electronics and remote sensing technology (ICARES)
14. Vallado DA, Crawford P (2008) SGP4 orbit determination. In: AIAA/AAS Astrodynamics specialist conference and exhibit
15. Doornbos E, Klinkrad H, Visser P (2005) Atmospheric density calibration using satellite drag observations. Adv Space Res 36(3):515–521
16. Django. Retrieved from https://www.djangoproject.com/. Accessed on 21 Mar 2022
17. Remski R (2000) Analysis of photonic bandgap surfaces using Ansoft HFSS. Microwave J Euroglobal Ed 43(9):190–199
Chapter 48
Towards Developing a Deep Learning-Based Liver Segmentation Method Snigdha Mohanty, Subhashree Mishra, Sudhansu Shekhar Singh, and Sarada Prasad Dakua
1 Introduction

There are several causes of death; interestingly, liver disease is among the leading ones, being the eighth- and fifth-most common in women and in men, respectively [1]. Among the malignant neoplasms, colorectal metastasis and primary hepatocellular carcinoma (HCC) are the most frequent problems, amounting to 500,000 and 700,000 new cases per year, respectively [2]. The incidence of primary liver cancer varies largely with the geographical region and is particularly high in Eastern Asia and Middle Africa. This is due to the fact that about 80% of primary liver cancer cases are linked to Hepatitis B or C, which are frequent in those areas. Another risk factor for primary liver cancer is cirrhosis, and the combination of viral hepatitis and cirrhosis leads to the highest risk of cancer development. On the worldwide average, the yearly fatality ratio for liver cancer is close to 1, indicating that the survival of liver cancer patients is typically less than a year [3]. Considering this seriousness, clinical research on computer-assisted planning of surgeries suggests that the survival time of non-resected patients is in the range of 16–24 months and that survival beyond 5 years is uncommon; treatment of a larger fraction of liver cancer patients would be possible if surgical precision were increased, since surgical removal or even other treatments (such as chemotherapy, radiation therapy, and ablation) of tumors can only be carried out with such precision. Delineation [4–8] is thus of utmost importance in any interventional/surgical act; delineation errors may complicate treatment, potentially also leading to poor clinical outcomes.
2 Materials and Methods

2.1 Dataset

We have used the Combined Healthy Abdominal Organ Segmentation (CHAOS) challenge dataset [9]. The dataset contains CT DICOM (Digital Imaging and Communications in Medicine) images of 20 subjects and their ground truth in PNG (Portable Network Graphics) format. Out of the 20 subjects, the images of 16 subjects are utilized for training and those of four subjects for testing the network architecture. The values of slice thickness, space between slices, pixel spacing, image dimension, bit allocation, and bit depth are 1.6–3.2 mm, 1.0–2.0 mm, 0.65–0.79 mm, 512 × 512, 16 bit, and 12 bit, respectively.
2.2 Methodology

In general, medical images contain noise from their acquisition, and this noise has proved to be an obstacle in the image segmentation process. Furthermore, the contrast level is certainly an issue in image segmentation [11]. Thus, we perform pre-processing with regard to contrast and noise. In summary, this method has two key steps: 1) contrast enhancement and 2) image segmentation using U-net. Conventional methods use filtering, but that could blur/smooth the edges, degrading the image quality. Considering this, we intend to utilize the noise present in the input image for its contrast enhancement using stochastic resonance (SR).

Contrast Enhancement Using Stochastic Resonance A system performing SR should satisfy some basic requirements: a sub-threshold signal (i.e., a signal with small amplitude), a non-linearity in terms of a threshold, and a source of additive noise. This kind of noise is present in input CT images. Bistable systems are well suited for such behavior [12]. As far as the functional behavior of the bistable system with respect to the input image is concerned, at low noise levels the weak signal finds it difficult to cross the threshold, giving a low signal-to-noise ratio (SNR), whereas for large noise levels the output is noise dominated, again providing a low SNR. At some optimum noise level, however, the noise allows the signal to cross the threshold, providing maximum SNR. This characteristic is shown in Fig. 1a, where the SNR is a function of noise intensity. In this work, we propose a DSR-based contrast enhancement method in the wavelet transform domain. We intend to conduct a simulation investigating the conditions under which a noisy image, such as a liver CT image, can be optimized. The optimization will be with respect to the contrast level in the image at a certain non-zero noise setting. The bistable parameters of the system will be optimized to maximize performance during the course of the simulation.
Fig. 1 a SNR is a function of noise intensity. b Bistable double potential well
Dynamic Stochastic Resonance Traditionally, noise has been considered a bottleneck in any operation. However, a few recent studies have proved that noise can also be used in non-linear systems to increase the SNR. The Dynamic Stochastic Resonance (DSR) phenomenon works in the sub-threshold domain. Since a low-contrast image (irrespective of being dark or bright) has very little variation in intensity values, it can be considered a sub-threshold signal and becomes eligible for DSR-based enhancement.

Mathematical Background and Proposition A Langevin equation is used to model stochastic resonance in the form of a classic 1-D nonlinear dynamic system, as given below:

dx/dt = −dV(x)/dx + √W ξ(t)    (1)
where V (x) represents the bistable potential that is provided in (1). W and ξ(t) are the noise variance and noise, respectively. −
g 2 h 4 x + x = V (x) 2 4
(2)
where $g$ and $h$ are the positive bistable parameters of the double-well system. The double-well system (shown in Fig. 1b) is stable at $x_m = \pm\sqrt{g/h}$, the two wells being separated by a barrier of height $\Delta V = g^{2}/(4h)$ when $\xi(t) = 0$. The addition of a periodic input signal $B_i \sin(\omega t)$ to the bistable system makes it time-dependent, with dynamics governed by (3):

$$\frac{dx(t)}{dt} = -\frac{dV(x)}{dx} + B_i \sin(\omega t) + \sqrt{W}\,\xi(t) \quad (3)$$
where $B_i$ is the amplitude and $\omega$ is the frequency of the periodic signal. The original input signal amplitude plays a crucial role in this process; it is presumed that the amplitude is so small that, in the absence of noise, it cannot push the particle of unit mass from one side of the well to the other, so the system merely fluctuates around its local stable states, as described by (4):

$$\frac{dx(t)}{dt} = -\frac{dV(x)}{dx} + B_i \sin(\omega t) + \sqrt{W}\,\xi(t) \quad (4)$$
The switching between the two sides of the well takes place only at some resonant value of noise; the hopping is synchronized by the signal frequency and the average waiting time of the particle in a well. It has been observed that the SNR reaches its maximum when $g = 2\sigma_0^{2}$. The other bistable parameter, $h$, can then be calculated from $g$: the sub-threshold condition is ensured when $h = 4g^{3}/27$ for weak signals [7]. Solving the stochastic differential equation in (4), after substituting $V(x)$ from (2) and using the stochastic version of the Euler–Maruyama iterative discretization, we get:

$$x(n+1) = x(n) + \Delta t\left[g\,x(n) - h\,x^{3}(n) + \mathrm{Input}(n)\right] \quad (5)$$

where $\mathrm{Input}(n) = B_i \sin(\omega t) + \sqrt{W}\,\xi(t)$ is the input signal plus noise, with the initial condition $x(0) = 0$. Here $\Delta t$ is the sampling time, taken as 0.015 experimentally. In this image enhancement scenario, the x-axis corresponds to the normalized pixel intensity value with respect to the threshold, which is defined as $x = 0$. Appropriate values of $g$ and $h$ provide the optimized state, resulting in the enhanced image. The goal of this exercise is to add stochastic fluctuation (noise) to the weak input pixel value/signal, activating the pixel "particle" to jump over the detector threshold and pass into the strong-signal state, thereby enhancing the signal.

Discrete Wavelet Transform. The 2-D Discrete Wavelet Transform (DWT) splits the input image into four lower-resolution components: the approximation image (LL) and the horizontal (HL), vertical (LH), and diagonal (HH) detail components. This process is repeated to compute a multi-scale wavelet decomposition. Different resolutions are obtained by analyzing the signal at different frequencies with the help of multiresolution analysis. The base functions, the wavelets, are obtained by shifting and dilating the mother (prototype) wavelet $\psi(t)$ [7]. The Wavelet Transform (WT) has some unique advantages over the Discrete Cosine Transform (DCT) or the Fast Fourier Transform (FFT); for instance, WT can model signals more accurately than FFT or DCT. WT is preferred over the others when scalability and tolerable degradation are needed, and it is used wherever multi-resolution analysis or wavelet coding is required. In this work, the DWT coefficients are thoroughly explored to help improve the contrast level of an image.

Proposed DSR-Based Algorithm. We have used wavelets inside DSR to improve the contrast of the input CT images. During this process, we optimize the bistable parameters $g$, $h$, and $\Delta t$, adapting the number of iterations through an adaptive procedure discussed later. The input signal also contains noise; thus, the input image signal is formulated as $\mathrm{Input}(n) = B_i \sin(\omega t) + \sqrt{W}\,\xi(t)$, where the second term is the noise component (salt-and-pepper noise, for example). The DWT coefficients contain such noise, so they are considered to carry both signal (image information) and noise. Several iterations are needed to process these coefficients through an appropriate simulation.

Proposed Algorithm
1. An unprocessed input image is taken.
2. The real-time image is subjected to external noise of varying standard deviations.
3. DWT is applied to obtain the sub-bands A, H, V, and D.
4. The DSR algorithm is applied to the A, H, V, D coefficients; for optimized values of $g$, $h$, and $\Delta t$, the best tuned, de-noised coefficients are obtained: $x(n+1) = x(n) + \Delta t\left[g\,x(n) - h\,x^{3}(n) + \mathrm{DWT}_{\mathrm{coeff}}\right]$.
5. The inverse DWT (IDWT) is applied to obtain the stochastically reconstructed, de-noised image.
6. The results are quantified in terms of the performance parameters noise mean value (NMV), mean square difference (MSD), and noise standard deviation (NSD).
7. The obtained results are compared with existing techniques such as median filtering and average filtering.
8. Comparing the outputs of the different techniques, it can be deduced that the DSR-based technique gives better results in terms of visual information and correction. These enhanced images are the input for the following image segmentation task.
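To make the wavelet-domain DSR loop concrete, the following is a minimal Python sketch of steps 3–5 above, using PyWavelets. The iteration count and the noise-level estimate are illustrative assumptions, not values from the paper.

```python
import numpy as np
import pywt

def dsr_tune(coeff, g, h, dt=0.015, n_iter=300):
    """Iterate the discretized bistable equation (5) on one DWT sub-band."""
    x = np.zeros_like(coeff, dtype=np.float64)   # initial condition x(0) = 0
    for _ in range(n_iter):                      # illustrative iteration count
        x = x + dt * (g * x - h * x**3 + coeff)
    return x

def dsr_enhance(img, dt=0.015, n_iter=300):
    img = img.astype(np.float64) / img.max()     # normalize intensities
    sigma0 = img.std()                           # crude noise-level estimate
    g = 2.0 * sigma0**2                          # resonance condition g = 2*sigma0^2
    h = 4.0 * g**3 / 27.0                        # sub-threshold condition h = 4g^3/27
    cA, (cH, cV, cD) = pywt.dwt2(img, "db1")     # step 3: A, H, V, D sub-bands
    tuned = [dsr_tune(c, g, h, dt, n_iter) for c in (cA, cH, cV, cD)]
    # step 5: IDWT of the tuned coefficients reconstructs the enhanced image
    return pywt.idwt2((tuned[0], (tuned[1], tuned[2], tuned[3])), "db1")
```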
3 Image Segmentation

The U-Net is a Convolutional Neural Network (CNN) variant developed specifically for medical image segmentation [10–14], as opposed to conventional methods [15]; it is an encoder–decoder architecture. However, the proposed architecture (Fig. 2) differs from the architecture proposed by Ronneberger et al. [10] in the decoder. The original architecture uses half of the features from the previous layer in the up-convolution, whereas we keep the same number of features as the previous layer in our proposed architecture; thus, information loss during up-convolution is prevented. Subsequently, the respective features are concatenated with those from the corresponding encoding layers, so the third dimension of the concatenated layer differs from the original U-Net architecture. The network consists of an encoding (contracting) path and a decoding (expanding) path. Along the encoding path, the output dimensions decrease according to the pooling parameter; on the contrary, moving upward, the output dimensions of each layer increase according to the up-sampling.
Fig. 2 Proposed architecture
While going downward the number of features increases, whereas going upward the number of features decreases. Throughout the network, 2 × 2 max pooling with a stride of 2 and 2 × 2 up-sampling are used to reduce and increase the output dimensions, respectively. The 3 × 3 convolution layers extract the features, and each convolution layer is activated by a rectified linear unit (ReLU). At the end of the network, a 1 × 1 convolution layer converts the 32-component vector to the required number of classes, which is two in this case (0 and 1). There are 11 layers in the network. The model is implemented in Keras with a TensorFlow backend on the Python platform, and model creation was carried out on NVIDIA GPUs.

Experimental Set-up. After pre-processing, the pre-processed data are stored in RAM before training. A TensorFlow dataset generator with prefetching is used so that the neural networks are efficiently fed with the input and ground truth. Keras is used to define and train the networks. The Adam optimizer with a learning rate of 0.0001 and a batch size of 1 was used to train the networks for 150 epochs. An HP Z8 workstation with an Intel Xeon(R) Silver 4216 CPU with a 2.10 GHz base clock (64 cores) and 128 GB of system memory was used to train the model; the workstation also contained an Nvidia Quadro RTX 5000 GPU with 16 GB of VRAM.
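The following condensed Keras sketch illustrates the encoder–decoder just described. The depth and filter counts are illustrative and do not reproduce the paper's exact 11-layer configuration; note that the decoder keeps the full feature count of the previous layer before concatenating the encoder features, as described above.

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

def conv_block(x, filters):
    x = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
    return layers.Conv2D(filters, 3, padding="same", activation="relu")(x)

def build_unet(input_shape=(512, 512, 1), base_filters=32):
    inputs = layers.Input(input_shape)
    e1 = conv_block(inputs, base_filters)
    e2 = conv_block(layers.MaxPooling2D(2, strides=2)(e1), base_filters * 2)
    b  = conv_block(layers.MaxPooling2D(2, strides=2)(e2), base_filters * 4)
    d2 = layers.UpSampling2D(2)(b)                    # feature count preserved
    d2 = conv_block(layers.Concatenate()([d2, e2]), base_filters * 2)
    d1 = layers.UpSampling2D(2)(d2)
    d1 = conv_block(layers.Concatenate()([d1, e1]), base_filters)
    outputs = layers.Conv2D(2, 1, activation="softmax")(d1)  # 1x1 conv, 2 classes
    return Model(inputs, outputs)

model = build_unet()
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),
              loss="sparse_categorical_crossentropy")
```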
4 Results and Discussion

After selecting the initial bistable parameters, the selected coefficients are processed more times than the unselected coefficients and later recombined according to (5). The tuning process is adaptive: after the optimum number of iterations, the tuned coefficients are inverse transformed and reconstructed to produce the enhanced output image. Image quality metrics (namely the relative image enhancement factor, the distribution separation measure (DSM), and the standard-deviation-based target-to-background enhancement measure) are determined for the output after each iteration, and all bistable parameters are optimized with respect to them. The final output image is obtained using the optimized parameters after adaptively iterating the coefficients while the performance metrics rise, until they reach a maximum and begin to decrease. We have trained the network with 16 subjects (2362 CT images) after contrast enhancement (the results are provided in Fig. 3) and tested on four subjects (512 CT images). Quantitative evaluation of the proposed method is performed in terms of various preferred evaluation parameters with respect to the ground truth [16]. The average Relative absolute volume difference (RAVD), Dice similarity coefficient (DC), Maximum symmetric surface distance (MSSD), Average symmetric surface distance (ASSD), Hausdorff distance (HD), and Precision (shown in Fig. 4) are found to be 0.03 (std: 0.02), 0.97 (std: 0.03), 1.12 (std: 0.5), 2.82 (std: 1.89), 1.01 (std: 0.39), and 0.93 (std: 0.12), respectively. The most common observation in contrast enhancement is that, with respect to the bistable system parameters, the performance metric DSM increases non-linearly up to a particular value of the bistable parameter and then decreases. This implies that maximum performance is obtained at a particular "resonant" value of the bistable parameter, i.e., the system converges and reaches stability. The results look promising; however, there is scope for improvement. For instance, the pre-processing could be accommodated inside the training, and different loss functions could be combined to determine the best one for a specific application.
Fig. 3 Contrast enhancement results due to SR: a input image, b contrast enhancement after the iteration step 100, c contrast enhancement after the iteration step 200, d contrast enhancement after the iteration step 300
Fig. 4 Segmentation results of three subjects: a raw DICOM image, b corresponding ground-truth, c predicted mask, and d segmentation result
As a potential application of the liver segmentation, real-time fusion of pre-operative CT with intra-operative US can be envisaged for hepatobiliary procedures, where the fused image might provide better visualization than the individual modalities (CT, US) [17–20].
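For reference, two of the reported metrics can be computed from binary masks as in the simple sketch below; this is a generic illustration, not the evaluation code used in the study.

```python
import numpy as np

def dice_coefficient(pred, gt):
    """DC = 2|P ∩ G| / (|P| + |G|) for binary masks."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    denom = pred.sum() + gt.sum()
    return 2.0 * np.logical_and(pred, gt).sum() / denom if denom else 1.0

def precision(pred, gt):
    """TP / (TP + FP) on the foreground class."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    tp = np.logical_and(pred, gt).sum()
    return tp / pred.sum() if pred.sum() else 0.0
```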
5 Conclusion

In this paper, a U-Net-based semantic segmentation method is proposed. The performance of the trained model has been evaluated against the available ground truth using the metrics ASSD, DC, HD, MSSD, Precision, and RAVD; the results are promising. It has been observed that the first 5–10 and last 5–10 slices of a subject produce minor false positives, which we believe warrants further investigation. In the future, we aim to focus more on post-processing to enhance the accuracy of the model.
Acknowledgement This publication was made possible by NPRP- 11S-1219-170106 from the Qatar National Research Fund (a member of Qatar Foundation). The findings herein reflect the work, and are solely the responsibility of the authors.
References 1. Rai P, Abinahed J, Dakua S, Balakrishnan S (2021) Feasibility and efficacy of fusion imaging systems for immediate post ablation assessment of liver neoplasms: protocol for a rapid systematic review. Int J Surg Prot IJS Press 25(1):209–215 2. Rhee P, Joseph B, Pandit V, Aziz H, Vercruysse G, Kulvatunyou N, Friese R (2014) Increasing trauma deaths in the united states. Ann Surg 260(1):13–21 3. Akhtar Y, Dakua S, Abdalla A, Aboumarzouk O, Ansari MY, Abinahed J, Elakkad MSM, AlAnsari A (2021) Risk assessment of computer-aided diagnostic software for hepatic resection. IEEE Trans Rad Plasma Med Sci. https://doi.org/10.1109/TRPMS.2021.3071148 4. Dakua S (2021) Towards left ventricle segmentation from magnetic resonance images. Sens J IEEE 17(18):1–11 5. Dakua S, Sahambi JS (2011) Detection of left ventricular myocardial contours from ischemic cardiac MR images. IETE J Res 57:372–384 6. Dakua S, Sahambi JS (2011) Automatic contour extraction of multi-labeled left ventricle from CMR images using cantilever beam and random Walk Approach. Cardiovascul Eng 10:30–43 7. Dakua S, Abinahed J, Ahmed AZ, Balakrishnan S, Younes G, Navkar N, Al-Ansari A, Zhai X, Bensaali F, Amira A (2019) Moving object tracking in clinical scenarios: application to cardiac surgery and cerebral aneurysm clipping. Int J Comp Assis Radiol Surg 14(12):2165–2176 8. Dakua S (2015) LV segmentation using stochastic resonance and evolutionary cellular automata. Int J Patt Recogn Artif Intell World Sci 29(3):1557002, 1–26 9. Kavur AE, Gezer NS, Bars M, Aslan S, Conze PH, Groza V, Pham DD, Chatterjee S, Ernst P, Ozkan S, Baydar B, Lachinov D, Han S, Pauli J, Isensee F, Perkonigg M, Sathish R, Rajan R, Sheet D, Dovletov G, Speck O, Nurnberger A, Maier-Hein KH, Bozdag Akar G, Nal G, Dicle O, Selver MA (2021) CHAOS challenge—combined (CT-MR) healthy abdominal organ segmentation. Med Image Anal 69:101950. https://doi.org/10.1016/j.media.2020.101950 10. Ronneberger O, Fischer P, Brox T (2015) U-Net: convolutional networks for biomedical image segmentation. In: Navab N, Hornegger J, Wells W, Frangi A (eds) Medical image computing and computer-assisted intervention MICCAI 2015. MICCAI 2015. Lecture Notes in Computer Science, vol 9351. Springer, Cham 11. Orlando N, Gyacskov I, Gillies DJ, Guo F, Romagnoli C, D‘Souza D, Cool DW, Hoover DA, Fenster A (2022) Effect of dataset size, image quality, and image type on deep learning-based automatic prostate segmentation in 3d ultrasound. Phys Med Biol 67:074002 12. Khan MZ, Gajendran MK, Lee Y, Khan MA (2021) Deep neural architectures for medical image semantic segmentation: review. IEEE Access 9:8300283024 13. Siddique N, Paheding S, Elkin CP, Devabhaktuni V (2021) U-net and its variants for medical image segmentation: a review of theory and applications. IEEE Access 9:82031–82057 14. Torres-Velzquez M, Chen WJ, Li X, McMillan AB (2021) Application and construction of deep learning networks in medical imaging. IEEE Trans Rad Plasma Med Sci 5:137–159 15. Dakua S, Sahambi JS (2009) LV contour extraction from cardiac MR images using random walk approach. In: IEEE international advance computing conference, Patiala, India, pp 228–233 16. Ansari MY, Abdalla A, Ansari MY, Ansari MI, Malluhi B, Mohanty S, Mishra S, Singh SS, Abinahed J, Al-Ansari A, Balakrishnan S, Dakua SP (2022) Practical utility of liver segmentation methods in clinical surgeries and interventions. BMC Med Imaging 22(97):1–17 17. 
Mohanty S, Dakua S (2022) Toward computing cross-modality symmetric non-rigid medical image registration. IEEE Access 10:24528–24539
18. Dakua S, Nayak A (2022) A review on treatments of hepatocellular carcinoma—role of radio wave ablation and possible improvements. Egyptian Liver J 12(30):1–10 19. Al-Kababji A, Bensaali F, Dakua SP (2022) Scheduling techniques for liver segmentation: ReduceLRonPlateau versus OneCycleLR. In: Bennour A, Ensari T, Kessentini Y, Eom S (eds) Intelligent systems and pattern recognition. ISPR 2022. Communications in computer and information science, vol 1589. Springer, Cham 20. Halabi O, Balakrishnan S, Dakua SP, Navab N, Warfa M (2020) Virtual and augmented reality in surgery. In: Doorsamy W, Paul B, Marwala T (eds) The disruptive fourth industrial revolution. Lecture Notes in Electrical Engineering, vol 674. Springer, Cham
Chapter 49
Review on Vision-Based Control Using Artificial Intelligence in Autonomous Ground Vehicle Abhishek Thakur and Sudhanshu Kumar Mishra
1 Introduction

Automation has entered nearly every aspect of agriculture, such as applying crop protectants and fertilizers. These automated operations are carried out by vehicles designed with AI techniques that are either semi-automated or fully automated. An autonomous vehicle can travel between any two defined points without additional input from any person or cargo in the vehicle, or any external control or system. Every day many people travel to offices, schools, hospitals and other places. During these travels, roughly half of the people use public transport, and the other half either arrange their own mode of transport or depend on others to move from one place to another. Most accidents occur because people drive with random thoughts in their mind, often failing to notice what is coming ahead of them. Semi-autonomous vehicles play a key role in reducing the number of accidents. Many people do not use turn signals and simply wave their hands to take turns. All these human errors can be overcome with the help of the advancing technology present in Autonomous Ground Vehicles [1]. The development of new advanced technology is an interesting and challenging area for many researchers, and advances in this field are growing at an exponential rate. Autonomous vehicles can provide mobility for people who cannot drive. This field is expected to enhance productivity and help people travel safely. Autonomous vehicles need to consider many factors, including road conditions, regions, obstacles, and weather [2].
A. Thakur (B) · S. K. Mishra BIT Mesra, Ranchi, Jharkhand 835215, India e-mail: [email protected] S. K. Mishra e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 R. P. Yadav et al. (eds.), Proceedings of the International Conference on Paradigms of Computing, Communication and Data Sciences, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-19-8742-7_49
One such concept is the self-driving, or fully autonomous, vehicle. "Autonomous driving" is a rapidly advancing technology that aims to replace, partially or entirely, the need for human drivers. At one extreme, people believe autonomous cars will ensure a better future, reduce infrastructure costs, and enhance mobility. At the other extreme, many fear automotive hacking incidents, the risk of fatal crashes, and the loss of driving-related jobs [3]. It is important to understand exactly how these vehicles work, as well as what types of sensors help them know where to go and recognize objects on the road to prevent car accidents. In general, autonomous vehicles use sensors to stay connected with the environment and keep track of the path they travel [4]. An autonomous vehicle has the potential to generate 2 petabytes (2 million GB) of data every year; with its infrastructure and dense network, 5G makes the future of autonomous vehicles possible. Computer Vision (CV) with Artificial Intelligence (AI) can recognize patterns and objects effectively, similar to human vision. A notable number of path planning algorithms have been developed for Unmanned Surface Vehicles (USVs), such as Voronoi-Visibility (VV) path planning, roadmap-based path planning, and Dijkstra's search [5]. Autonomous technology (robotics, drones, vehicles, and appliances) is a rapidly growing trend with broad impact in this digitally transformed era. Most of these technologies are based on AI, a learning-based approach that follows the behavior of the human brain. With the incorporation of AI into an autonomous ecosystem, it is inspiring to envision the transformational possibilities that lie ahead for various industries. Figure 1 displays the major current application areas of autonomous vehicles [6]. The leading self-driving car, Waymo, gives its users a good experience to date and is regularly upgraded. Agriculture is one of the most promising areas, where good progress is being made through the application of autonomous ground vehicles. Cars operating in urban environments are typically denied Global Positioning System (GPS) support due to buildings, trees, and other factors; vehicles operating in fields, however, will most likely have GPS support and therefore much better position measurements than are available via many recent methods. Key areas of agriculture where automation currently dominates include autonomous tractors, which drive through the field following a certain pattern; an autonomous tractor with AI and location tracking can determine its own path, making the sought-after efficiency easier to attain. Drones and seed-planting robots can cultivate crops over a large area within a short span of time, smart spraying controls pests in and out of the field, and picking is handled by robotic-arm harvesters. In this paper, we discuss in detail the benefits of using certain AI techniques, algorithms, and other necessary calculations for such automotive machines.
Fig. 1 Fields with implementation of autonomous ground vehicles: agriculture, military, transportation, space, heavy traffic areas, nuclear power plants, and crawler excavators
2 Literature Review

Rochan et al. [7] have proposed a method for computing the steering angle of a vision-based autonomous vehicle. The authors explain the process in three stages and achieve lower computing costs. The road region is extracted by applying a Gaussian Mixture Model (GMM) with the Expectation–Maximization (EM) algorithm; after reducing the angle-transition noise with a Kalman Filter (KF), the steering angle is computed. The authors mention that this model works under different lighting conditions, but the road region is not extracted dynamically, which is left as future work. Cai et al. [8] have used Deep Imitative Reinforcement Learning (DIRL) to achieve fast autonomous racing cars. An expert-driving dataset has been used with two different variations, placing random obstacles and noise to increase the difficulty and enhance performance. The authors state that they will revise the proposed model to acquire the required data and lower the hardware burden, which is in progress for real-world applications. A method for the hardware setup of autonomous vehicles implementing computer vision (image acquisition, processing, analysis, and understanding) along with OpenCV was proposed by Sahu et al. [9]. A Raspberry Pi camera is used to find the directions of movement; once the captured image is processed, it is used to generate signals such as stop, left turn, and right turn. However, OpenCV is believed to be slower at loading images than matplotlib.
In addition, the paper claims that the vehicle will follow the symbols on the road using the Raspberry Pi camera, but the signs cannot be expected to be clear on an unstructured road or in rural areas. Valera et al. [10] have proposed an autonomously moving electric vehicle design using computer vision and artificial intelligence. The computer vision component performs image processing via OpenCV and TensorFlow and decides the correct fit of road lanes; an AI-based decision-making algorithm was simulated in a real-time environment setup and then tested with a 3D go-kart design from CAD. The idea of guiding autonomous vehicles by processing road images obtained from a single camera was proposed by Manivannan et al. [11]: Inverse Perspective Mapping (IPM) processes the individual frames from the video of a single camera, and fuzzy logic algorithms detect the left edge and move the vehicle to the center of the path. The model has been tested with the P3-DX mobile robot. Huang and Nitschke [12] have proposed a model for an efficient vehicle controller by considering objective-based search parameters, namely the fitness function, novelty search, and hybrid search; however, vehicle controllers must cope with real driving behaviors and complicated road networks. Mitchell et al. [13] have proposed a method to drive an autonomous vehicle in multiple path scenarios. They implemented a sim2real approach and a mixed-reality setup with virtual reality, and conclude that a few runs in mixed reality significantly reduce collisions. Lane-changing methods considering safety criteria are the focus of Péter et al. [14]. Two separate algorithms for managing the road information improve the constructiveness of the approach. Computer vision ensures that markings on the road are noticed while processing the image, and a 3D-printed box around the camera cancels direct sunlight. The vehicle's actual pose and the images missed by the camera are replaced with Ackermann Steering Vehicle Odometric Calculations (ASVOC). Kumer et al. [15] have put forth an approach for sending data from the autonomous vehicle to the cloud using a Raspberry Pi as the core processor. These data are further observed by the vehicle for speed, distance, and the movement of other vehicles on city roads; an ultrasonic sensor identifies nearby vehicle positions and also decides the directions for the unmanned vehicle. Lu and Wong [16] have put forth an approach to reduce the large error faced by vehicles turning at corners with the help of a Convolutional Neural Network (CNN), providing a highly rotation-invariant property. ORB features are used to maintain high-speed performance, switching to the CNN when the rotation is high; the CNN is also responsible for semantically selecting the static background features for querying and mapping. The authors claim that the approach achieves precise localization with high efficiency. An analysis of the difficulties posed by different road and weather conditions, which challenge lane-line detection, has been put forth by Stević et al. [17]. The authors utilize color thresholding to detect lane edges, implemented with OpenCV and computer vision.
To correct the distortion produced by the camera, OpenCV calibrates the camera and returns the camera matrix and distortion coefficients, which are then used to undistort each frame. A polygon with four vertices is selected to perform a perspective transform and find the region of interest (ROI); the image is then converted to HLS color space. Finally, the Hough transform is used to extract line segments and connect them to find the lane for the unmanned vehicle. Vukić et al. [18] have addressed the issue of collecting reference data for autonomous vehicles and sensor processing. Processing the reference data would take many man-hours, but this is made easier with computer simulation. The authors developed a Unity-based simulation environment containing all the typical city objects, as this is an efficient way to test the sensors and algorithms of autonomous vehicles and to show deviations from the reference data. A single vehicle equipped with a stereo camera setup generates simulated stereo images, which are later processed with OpenCV using a semi-global block matching algorithm. Boukerche and Ma [19] have presented a review of deep-learning-based models for vision-based autonomous vehicle recognition in terms of detection, re-identification, and model recognition. Feature extraction is done by stacking multiple CNNs, in which the fully connected (FC) layer is followed by a softmax or sigmoid function. The authors point out recent improvements in CNNs, such as choosing different filter sizes for the convolutional layer, increasing depth and width, and adopting different strategies for stacking convolutional layers; various CNN architectures are analyzed, including LeNet, AlexNet, and GoogLeNet. Prédhumeau et al. [20] have developed a hybrid pedestrian model by combining the social force model (SFM) with a new decision model to capture interactions between conflicting pedestrians and vehicles. The model collects the behavior of a social group of pedestrians and evaluates it through comparisons with qualitative and quantitative ground-truth trajectories, making it easier for the autonomous vehicle to predict obstacles in a space shared with surrounding pedestrians. The SFM, deployed for non-conflicting interactions, focuses on avoiding static obstacles, pedestrians, and other autonomous vehicles while moving toward the destination and staying within the social group; the decision model, deployed for conflicting interactions, follows the group decision to run, stop, step back, or turn sharply. Mayilvaganam et al. [21] have come up with a path planning algorithm for autonomous vehicles using a heuristics-based A* algorithm, which evaluates candidate positions to find an obstacle-free path (a generic sketch of grid-based A* follows below). The authors claim that this method can be used by any kind of autonomous vehicle, such as an Autonomous Underwater Vehicle (AUV) or an Autonomous Ground Vehicle (AGV). A local grid frame is generated around the vehicle's position, discretizing the nearby space, and a finite number of neighboring locations are considered as the next position. The homogeneous transformation formulation and the A* algorithm are based on potential functions, and the generated tree is an RRT whose nodes and branches grow in the vehicle's viable directions. The authors add that the computational efficiency will be analyzed in terms of the number of node levels.
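To make the grid-based heuristic search idea concrete, the following is a minimal, generic A* sketch on a 2-D occupancy grid; it illustrates the technique only and is not the exact formulation of [21].

```python
import heapq

def astar(grid, start, goal):
    """grid[r][c] == 1 marks an obstacle; returns a list of cells or None."""
    rows, cols = len(grid), len(grid[0])
    h = lambda p: abs(p[0] - goal[0]) + abs(p[1] - goal[1])   # Manhattan heuristic
    open_set = [(h(start), 0, start, [start])]
    best_g = {start: 0}
    while open_set:
        _, g, node, path = heapq.heappop(open_set)
        if node == goal:
            return path
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):     # 4-connected moves
            nxt = (node[0] + dr, node[1] + dc)
            if 0 <= nxt[0] < rows and 0 <= nxt[1] < cols and grid[nxt[0]][nxt[1]] == 0:
                ng = g + 1
                if ng < best_g.get(nxt, float("inf")):
                    best_g[nxt] = ng
                    heapq.heappush(open_set, (ng + h(nxt), ng, nxt, path + [nxt]))
    return None

print(astar([[0, 0, 0], [1, 1, 0], [0, 0, 0]], (0, 0), (2, 0)))
```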
Lu [22] has proposed unmanned driving technology (vision sensors) and an autonomous obstacle avoidance system (monocular vision detection methods) based on machine vision to improve overall traffic efficiency and reduce congestion. An eight-degrees-of-freedom dynamics model, including tire and driveline modeling, is driven by feature-point extraction and matching, and the machine-vision-based technique is trained with reinforcement learning; information fusion of the machine-vision sensors is also performed. The decision-making layer prepares the strategy for the unmanned vehicle, the planning layer prepares the trajectory path, and the control layer realizes the trajectory by controlling the steering wheel, throttle, and braking. The paper concludes that many machine vision methods could improve efficiency further. Morais et al. [23] have presented a hybrid control architecture combining Deep Reinforcement Learning (DRL) and a Robust Linear Quadratic Regulator (RLQR) to improve the vision-based lateral control of autonomous vehicles, keeping the vehicle at the center of the lane at constant velocity. Qiao and Zulkernine [24] have proposed a vision-based object detection and distance estimation algorithm using deep learning techniques. They implemented Faster R-CNN for higher accuracy, even though it takes longer to detect objects, and YOLOv4 (You Only Look Once) for real-time object detection; this improves detection precision and shortens the inference time. Reebadiya et al. [25] have put forth a blockchain-based intelligent sensing and tracking architecture for AV systems using beyond-5G communication networks. The possible attacks on, and safety measures for, the data collected by autonomous vehicles are discussed, and a solution is proposed by applying AI algorithms at the edge servers. The methodology has four layers: a real-time infrastructure deployment layer, a mobile-edge server layer, a blockchain layer, and a cloud computing layer; the blockchain layer is accessed only when high intelligence is required. Shuai et al. [26] have proposed a personnel identification method for tasks such as team attendance, tracking and obstacle avoidance on unmanned vehicles, and distance measurement, based on the fusion of YOLOv4 (based on CNN) and binocular stereo vision (used for personnel distance detection). A summary of all these works in terms of the image capture setup, the environment, and the algorithm deployed is presented in Table 1. Singh et al. [27] have proposed a fuzzy logic controller for obstacle avoidance in an autonomous ground vehicle. The controller takes the distances from the left and right sensors as input and provides the rotational angle of the steering as output; a total of 36 rule bases are used to design the fuzzy logic controller that provides the information for vehicle movement. Eshraghi et al. [29] have proposed anomaly modeling of navigation systems for autonomous vehicles using machine learning techniques. The paper presents a detailed explanation of the various attacks that occur on deep learning models; after this examination, an approach is presented for modeling anomalies in driving systems that incorporate machine learning techniques.
Table 1 Comparison of techniques used for vision-based autonomous ground vehicles

Development setup used for capturing/image processing | Algorithms | Pros | Cons
Raspberry Pi Camera [9], 2019 | Computer vision algorithm based on color detection | Low cost; high processing power | Raspberry Pi lacks an analog-to-digital converter
Stereo camera setup [18], 2019 | Semi-global block matching algorithm | Analysis based on consideration of all the surrounding pixels | Requires higher computations
Calibration function in OpenCV [24], 2020 | YOLOv4, R-CNN | R-CNN makes the processing up to 25 times more accurate than CNN | YOLOv4 struggles to detect small and close objects; comparatively low recall and more localization error
Gazebo simulator + Udacity dataset [7], 2018 | Gaussian Mixture Model using Expectation–Maximization | Works in dynamic weather conditions on structured and unstructured road regions | Less systematic processing
LIDAR sensors + Gazebo simulator [10], 2019 | OpenCV | Well optimized for image processing; high accuracy even at 30+ frames per second | Does not have its own editor
Image Acquisition Toolbox in Matlab [11], 2018 | Line detection algorithm + boundary detection algorithm | Easy to use for processing any complex analysis; platform-independent for any amount of data | Matlab is slower compared with OpenCV
Smart camera [14], 2019; Matlab Simulink environment with AutoBox from dSPACE | Control algorithm; lane-changing algorithm; trajectory-following algorithm | The position of the camera with a 3D-printed support box cancels out the unwanted reflection | Since the camera is inside, certain regions of the left and right lanes may not be captured
C310 HD web camera + IR sensor + Raspberry Pi + ultrasonic sensors [15], 2021 | Computer vision programming | Captures both image and video and provides the output at 5-megapixel resolution | No option for plug and play; clarity is not up to HD
KITTI dataset + Siamese framework [16], 2019 | CNN | Faster to train and obtain the output | Quality is compromised
OpenCV with Python [17], 2020 | Hough transformation; OpenCV undistortion function | Good for object detection; improves accuracy with noise removed | A lot of memory and computation is required by the Hough transformation
Lin et al. [28] have proposed a framework of an image processing module for autonomous underwater vehicles (AUVs) using a stereoscopic image reconstruction approach. They constructed a vehicle with a wide-angle-lens camera and a binocular vision device mounted on the bow to provide image input. The vehicle contains navigation, communication, power, post-driver, and image processing modules; the overall architecture is composed of the bow, hull, and stern, and the carrier plate within the pressure hull carries the various modules and experimental instruments. Das and Mishra [30] have reviewed different vision-based obstacle avoidance techniques for Autonomous Unmanned Ground Vehicles (AUGVs). They mention that a LIDAR-based camera embedded at the top of the vehicle, directly connected to a GPS monitoring system, has been used by many researchers to acquire images. Many researchers have also applied swarm-intelligence-based optimization techniques, such as Particle Swarm Optimization (PSO), to this interesting and challenging problem. Nature-inspired techniques, probabilistic models, heuristic optimization techniques, and the like have been incorporated to accomplish path optimization as well as controller and system-parameter optimization for static and dynamic obstacle avoidance. The authors in [31] have reviewed and compared different machine-learning-based techniques for designing an intelligent PID controller for autonomous vehicles. They compare conventional PID controllers and fuzzy-based PID controllers in terms of rise time and percentage overshoot, and also compare tuning the Kp, Ki, and Kd parameters with the conventional Ziegler–Nichols method and with a genetic algorithm (GA); a sketch of such a controller follows below.
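For context, a discrete PID controller of the kind compared in [31] can be written as the short sketch below. The gains shown are placeholders that would come from Ziegler–Nichols tuning or a GA in practice.

```python
class PID:
    """Discrete PID controller: u = Kp*e + Ki*∫e dt + Kd*de/dt."""
    def __init__(self, kp, ki, kd, dt):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral, self.prev_error = 0.0, 0.0

    def step(self, setpoint, measurement):
        error = setpoint - measurement
        self.integral += error * self.dt
        derivative = (error - self.prev_error) / self.dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative

controller = PID(kp=1.2, ki=0.1, kd=0.05, dt=0.01)          # illustrative gains
steering = controller.step(setpoint=0.0, measurement=0.3)    # e.g. lateral offset
```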
3 Conclusion

Autonomous vehicles are one of the fastest-growing research fields, and innovative approaches are discovered regularly. Though the field has roots in many sectors, its contribution to agriculture remains an attraction for many researchers and engineers. We have reviewed several works, focusing on each category of vision-based control application for autonomous vehicles: planning the path, preserving the quality of the image, and processing the data obtained from the images.
Similarly, different techniques for detecting and recognizing traffic signs have been reviewed extensively. We have observed that algorithms combining AI and 5G improve the capability of sensing and processing the image features of obstacles in a more efficient manner.
References 1. Wang J, Zhang L, Huang Y, Zhao J (2020) Safety of autonomous vehicle. Hindawi J Adv Transp 2020’ 2. Asadi K, Jain R, Qin Z, Sun M, Noghabaei M, Cole J, Han K, Lobaton E (2019) Vision-based obstacle removal system for autonomous ground vehicles using a robotic arm. In: Computing in civil engineering 2019: data, sensing, and analytics. Reston, VA: American Society of Civil Engineers, pp 328–335 3. Weon I, Lee S, Ryu J (2020) Object recognition based interpolation with 3D LIDAR and vision for autonomous driving of an intelligent vehicle. IEEE Access 8:65599–65608 4. Guastella DC, Muscato G (2020) Learning-based methods of perception and navigation for ground vehicles in unstructured environments: a review. Sens 21(1):73 5. Niu H, Savvaris A, Tsourdos A, Ji Z (2019) Voronoi-visibility roadmap-based path planning algorithm for unmanned surface vehicles. J Navig 72(4):850–874 6. Al-Dahhan, MRH, Schmidt KW (2020) Voronoi boundary visibility for efficient path planning. IEEE Access 8:134764–134781 7. Rochan MR, Alagammai AK, Sujatha J (2018) Computer vision-based novel steering angle calculation for autonomous vehicles. In: Second IEEE international conference on robotic computing (IRC). IEEE, pp 143–146 8. Cai P, Wang H, Huaiyang H, Liu Y, Liu M (2021) Vision-based autonomous car racing using deep imitative reinforcement learning. In: IEEE Robotics and automation letters (RA-L) & IROS 2021. IEEE, pp 7262–7269 9. Sahu BK, Sahu BK, Choudhury J, Nag A (2019) Development of hardware setup of an autonomous robotic vehicle based on computer vision using raspberry pi. In: Innovations in power and advanced computing technologies (i-PACT), IEEE Xplore, vol 1. IEEE, pp 1–5 10. Valera J, Huaman L, Pasapera L, Prada E, Soto L, Agapito L (2019) Design of an autonomous electric single-seat vehicle based on environment recognition algorithms. In: 2019 IEEE Sciences and humanities international research conference (SHIRCON). IEEE, pp1–4 11. Manivannan PV, Ramakanth P (2018) Vision based intelligent vehicle steering control using single camera for automated highway system. Procedia Comput Sci Elsevier 133:839–846 12. Huang A, Nitschke G (2020) Automating coordinated autonomous vehicle control. In: AAMAS 2020: Proceedings of the 19th international conference on autonomous agents and multiagent systems. ACM pp 1867–1868 13. Mitchell R, Fletcher J, Panerati J, Prorok A (2020) Multi-vehicle mixed reality reinforcement learning for autonomous multi-lane driving. In: AAMAS 2020: Proceedings of the 19th international conference on autonomous agents and multiagent systems, pp 1928–1930 14. Péter G, Kiss B, Tihanyi V (2019) Vision and odometry based autonomous vehicle lane changing. ICT Express Elsevier 5(4):219–226 15. Kumer SVA, Nadipalli LSPS, Kanakaraja P, Kumar KS, Kavya KCS (2021) Controlling the autonomous vehicle using computer vision and cloud server. Mater Today Proc Elsevier 37(2):2982–2985 16. Lu G, Wong X (2019) Taking me to the correct place: vision-based localization for autonomous vehicles. In: 2019 IEEE International conference on image processing (ICIP). IEEE, pp 2966– 2970
17. Stević S, Dragojević M, Krunić M, Cetić N (2020) Vision-based extrapolation of road lane lines in controlled conditions. In: 2020 Zooming innovation in consumer technologies conference (ZINC). IEEE, pp 174–177 18. Vukić M, Grgić B, Dinčir D, Kostelac L, Marković I (2019) Unity based urban environment simulation for autonomous vehicle stereo vision evaluation. In: 2019 42nd International convention on information and communication technology, electronics, and microelectronics (MIPRO). IEEE, pp 949–954 19. Boukerche A, Ma X (2022) Vision-based autonomous vehicle recognition: a new challenge for deep learning-based systems. ACM Comput Surv 54(4):1–37 20. Prédhumeau M, Mancheva L, Dugdale J, Spalanzani A (2021) An agent-based model to predict pedestrians trajectories with an autonomous vehicle in shared spaces. In: AAMAS 2021: Proceedings of the 20th international conference on autonomous agents and multiagent systems. ACM, pp 1–9 21. Mayilvaganam K, Shrivastava A, Rajagopal P (2021) An A* based path planning approach for autonomous vehicles. In: AIR2021: Advances in robotics - 5th international conference of the robotics society. ACM, Article No 24, pp 1–8 22. Lu S (2021) Design of autonomous obstacle avoidance system for driverless vehicle based on machine vision. In: 2021 3rd International conference on artificial intelligence and advanced manufacture. ACM, pp 1936–1940 23. de Morais GAP, Marcos LB, Bueno JNAD, de Resende NF, Terra MH, Jr VG (2020) Vision-based robust control framework based on deep reinforcement learning applied to autonomous ground vehicles. Control Eng Pract 104:104630. Elsevier 24. Qiao D, Zulkernine F (2020) Vision-based vehicle detection and distance estimation. In: 2020 IEEE Symposium series on computational intelligence (SSCI). IEEE, pp 2836–2842 25. Reebadiya D, Rathod T, Gupta R, Tanwar S, Kumar N (2021) Blockchain-based secure and intelligent sensing scheme for autonomous vehicles activity tracking beyond 5G networks. Peer-to-Peer Netw Appl 14:2757–2774 26. Shuai G, Wenlun M, Jingjing F, Zhipeng L (2020) Target recognition and range-measuring method based on binocular stereo vision. In: 2020 4th CAA International conference on vehicular control and intelligence (CVCI). IEEE, pp 623–626 27. Singh M, Das S, Mishra SK (2020) Static obstacle avoidance in autonomous vehicle navigation using fuzzy logic controller. In: IEEE INCET. IEEE, pp 1–6 28. Lin YH, Chen SY, Tsou CH (2019) Development of an image processing module for autonomous underwater vehicles through integration of visual recognition with stereoscopic image reconstruction. J Mar Sci Eng MDPI 7(4) 29. Eshraghi H, Majidi B, Movaghar A (2020) Anomaly modelling in machine learning based navigation system of autonomous vehicle. In: 2020 6th Iranian conference on signal processing and intelligent systems (ICSPIS). IEEE Xplore 30. Das S, Mishra S (2019) A review on vision based control of autonomous vehicles using artificial intelligence techniques. In: International conference on information technology (ICIT), Bhubaneswar, India. IEEE, pp 500–504 31. Vartika V, Singh S, Das S, Mishra SK, Sahu SS (2019) A review on intelligent PID controllers in autonomous vehicle. In: Reddy MJB, Mohanta DK, Kumar D, Ghosh D (eds) Advances in smart grid automation and industry 4.0. Lecture notes in electrical engineering, vol 693. Springer, Singapore
Chapter 50
Ensemble Learning Based Feature Selection for Detection of Spam in the Twitter Network K. Kiruthika Devi , G. A. Sathish Kumar , and B. T. Shobana
1 Introduction

Social media has become an irreplaceable aspect of people's daily lives in recent years. Social media is defined as a structure built on technology that enables deep social interaction, group formation, and cooperation. In recent years, the use of social media networks, especially Twitter, has drastically increased across the globe due to their popularity; Twitter serves as a powerful tool for users to share information and opinions about events, upload pictures and videos, and spread information. Moreover, the social networking apps on smartphones facilitate user access to such sites. However, the popularity and significance of the Twitter platform attract spammers to a large extent, and social network spam is particularly challenging due to the short size of tweets. Twitter spam holds deceptive information to attract the interest of victims and lure them to malicious sites, and thus poses a major threat to security. Prominent microblogging sites like Twitter have inevitably become a prime target of spammers, and cases have shown that the security threats caused by Twitter spam have had a major impact on the real world. Hence, detecting spammers, spam bots, and spam messages on Twitter is a significant task. Machine learning techniques have been broadly used to solve various problems across different domains with remarkable success. One such domain is cyber security, where researchers have adopted these methods to identify spamming behavior. A majority of the methods for detecting spam on Twitter rely on machine learning, using various classification and clustering algorithms, and many machine learning models have been developed using diverse features of Twitter data.
K. K. Devi (B) · G. A. S. Kumar · B. T. Shobana Sri Venkateswara College of Engineering, Sriperumbudur, TamilNadu, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 R. P. Yadav et al. (eds.), Proceedings of the International Conference on Paradigms of Computing, Communication and Data Sciences, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-19-8742-7_50
The proposed work aims to select the most important lightweight features from the user account and tweet content using ensemble feature selection, in order to build an efficient spam detection model. The proposed work employs multiple feature selection methods, namely Chi-Square, Extra Trees, Relief-F, Recursive Feature Elimination, Random Forest, and Gain Ratio, as the base feature selection methods on the dataset, and uses ensemble-based feature selection to obtain the final ranked list of features. Subsequently, various machine learning algorithms are applied to find an accurate spam detection model.
2 Literature Survey

One of the main challenges faced by researchers in classification and regression problems is the high dimensionality of the dataset. Online social networks such as Twitter and Facebook generate a huge amount of structured and unstructured data, which leads to data of large dimension. Analyzing these online social networks with large dimensions leads to computational complexity; hence, feature selection algorithms need to be applied to reduce the number of dimensions. Li and Liu [1] discuss feature selection methods for high-dimensional datasets. Feature selection methods are generally categorized as filter-based, wrapper, and embedded. Filter-based approaches such as chi-square (Jin et al. [2]), variance (Ebenuwa et al. [3]), and correlation (Hall [4]) have been used for ranking features, as they are fast, computationally inexpensive, and independent of the classifier, whereas wrapper approaches depend on the classifier (a small chi-square ranking example follows below).
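As a concrete illustration of filter-based ranking, the following scikit-learn snippet scores features with the chi-square test. The data is synthetic, since chi-square requires non-negative feature values.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, chi2

rng = np.random.default_rng(0)
X = rng.integers(0, 100, size=(500, 10))      # 500 samples, 10 count-like features
y = rng.integers(0, 2, size=500)              # binary spam / non-spam label

selector = SelectKBest(score_func=chi2, k=5).fit(X, y)
print(selector.scores_)                        # higher score = more relevant feature
print(selector.get_support(indices=True))      # indices of the top-5 features
```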
2.1 Features for Twitter Spam Detection

Song et al. [5] utilized graph-based features to determine whether an account is spam or not. The use of graph and network features can contribute to efficient spam detection, but applying them to real-time tweets is a time-consuming and complex process. A blacklisting technique was used by Ma et al. [6] to detect Twitter spam by identifying suspicious URLs; this technique is very laborious since it requires manual labeling, and users may still click on suspicious URLs before they are added to the blacklist. Chen et al. [7] employed user-account and tweet-based features for detecting spam on Twitter. User-account and tweet-based features require less computing power than blacklist approaches or graph- and network-based features, and it is feasible to gather and analyze a great number of them.
2.2 Feature Selection Methods for Spam Detection

Wald et al. [8] used wrapper-based feature selection on two Twitter datasets to improve classifier accuracy. Different types of learners were used for the wrapper-based feature selection and for classification, to determine the best learner for each task; the final results showed that the Naïve Bayes learner performed best for feature selection and the multi-layer perceptron was best for classification. Reddy and Reddy [9] employed principal component analysis (PCA) to reduce the dimensionality of Twitter data; the reduced-dimensional data is then fed to the classifiers, and the KNN classifier performed best with PCA. Morchid et al. [10] applied PCA to tweet-based features to select the best features and thereby increase classification accuracy. The results show that, although there are many tweet-based features for identifying spammers, the correlated features obtained using PCA improved the performance of the SVM classifier. Mishra [11] proposed a method to find correlated features at the user and tweet levels using Pearson's correlation coefficient; the highly correlated features are combined by computing their products, and an ANN showed better accuracy with the correlated features compared to the other classifiers. Herzallah et al. [12] utilized three feature selection techniques, Mean Square Error, Information Gain, and Relief-F, to test the significance of features; the top five ranked features from the three techniques were used for detecting spammers in the Twitter network. Khalil et al. [13] proposed clustering-based feature extraction from Twitter datasets to extract important features and thereby improve the classification accuracy of the model.
2.3 Machine Learning Algorithms for Spam Detection

To combat spam, researchers utilize various machine learning algorithms to develop spam detection models. Supervised, unsupervised, and semi-supervised machine learning algorithms are commonly used to detect spam on Twitter by mining the features of Twitter data. Imam and Vassilakis [14] proposed a semi-supervised approach with unlabeled data to handle Twitter spam drift. Chen et al. [7] proposed a spam detection system employing different supervised learning algorithms such as Random Forest, C4.5, Naïve Bayes, K-Nearest Neighbor, and Support Vector Machine. Washha et al. [15] proposed an unsupervised learning algorithm to develop a spam detection model using unlabeled tweets.
3 Dataset Description

Due to security concerns, Twitter does not provide real Twitter data. Though Twitter streaming APIs have been utilized by researchers for retrieving real Twitter data, the tweets are not allowed to be made publicly available. Hence, the proposed work is evaluated on the dataset used by Chen et al. [7], a part of which is accessible to researchers on the web. A portion of the dataset is represented in Table 1. The target attribute is a label which refers to the presence of spam: '0' refers to the absence of spam and '1' refers to the presence of spam.
4 Proposed System

The proposed work aims to detect spam on Twitter from lightweight optimal features using machine learning algorithms. The derived dataset has a total of 27 features, which include user details, tweet content details, and follower/following details. The optimal feature types and feature descriptions are shown in Table 2. The proposed architecture is depicted in Fig. 1. The first step involves feature engineering, where a few features are derived from the existing features in the dataset; this is an important preprocessing step, as it plays a vital role in selecting robust features from the set of all features to build an efficient model. In the second step, ensemble feature selection is applied to the dataset in order to obtain the prominent features by simple voting, as shown in Table 3: from a total of 27 features, 15 are chosen based on the votes. The third step is the training phase, where randomly chosen training samples of different sizes are used to build the spam detection model using four machine learning algorithms: SVM, Logistic Regression, Decision Tree, and Random Forest. Finally, the test samples are evaluated against the trained classification models to measure the spam detection rate.
4.1 Feature Selection Methods Used in the Proposed Work

a. Chi-Square: Chi-square is a statistical, filter-based feature selection method in which the features are evaluated with respect to the target class. A relevant feature is one with a high chi-square value.
b. Random Forest: Random forest is an embedded feature selection method for determining the significance of a feature. Each tree computes the importance of a feature based on its capability to decrease the impurity of the leaves.
c. Extra Tree Classifier: The Gini index is used to select the optimal features when the Extra Trees classifier is used for feature selection. The features are ranked based on their Gini index score; the higher the Gini index value, the better the feature is for detecting spam.
Table 1 Portion of the twitter dataset

Id | Account age | No follower | No following | No user favourites | No lists | No tweets | No retweets | No hashtag | No user mention | No urls | No char | No digits | Class
4992 | 525 | 4 | 28 | 0 | 1 | 6027 | 0 | 3 | 0 | 1 | 50 | 0 | 0
4993 | 16 | 1039 | 1936 | 927 | 2 | 13713 | 0 | 1 | 1 | 1 | 95 | 4 | 0
4994 | 1291 | 0 | 0 | 37 | 0 | 38 | 2142 | 0 | 1 | 1 | 60 | 0 | 0
4995 | 345 | 103 | 107 | 1 | 1 | 1080 | 0 | 0 | 0 | 1 | 63 | 0 | 0
4996 | 14 | 112 | 248 | 3055 | 0 | 14899 | 0 | 1 | 0 | 1 | 115 | 9 | 0
4997 | 192 | 124 | 273 | 4 | 0 | 352 | 0 | 0 | 0 | 1 | 46 | 1 | 0
4998 | 718 | 3093 | 3444 | 1 | 0 | 16108 | 102,195 | 0 | 1 | 1 | 106 | 0 | 1
4999 | 93 | 407 | 1663 | 1 | 1 | 2556 | 1253 | 0 | 1 | 1 | 32 | 0 | 1
5000 | 664 | 479 | 263 | 1762 | 1 | 5595 | 0 | 0 | 0 | 1 | 44 | 2 | 1
5001 | 967 | 3037 | 2429 | 26 | 0 | 3271 | 135 | 0 | 1 | 1 | 118 | 7 | 1
5002 | 177 | 0 | 5 | 0 | 0 | 26 | 3898 | 0 | 1 | 1 | 75 | 0 | 1
5003 | 708 | 18 | 1909 | 0 | 0 | 633 | 0 | 0 | 0 | 1 | 59 | 0 | 1
5004 | 7 | 16 | 83 | 6 | 0 | 360 | 2036 | 0 | 1 | 1 | 74 | 4 | 1
5005 | 1376 | 210 | 247 | 6 | 0 | 506 | 0 | 0 | 0 | 1 | 37 | 0 | 1
5006 | 395 | 0 | 1 | 0 | 0 | 2172 | 0 | 0 | 0 | 1 | 22 | 0 | 1
Table 2 Optimal features for spam detection

Feature type | Feature name | Feature definition
User account based features | User_age | Age of the twitter user
User account based features | Account_age | Age of the user account
User account based features | Number_followers | Number of followers
User account based features | Num of following | Number of users being followed
User account based features | Number of favorites | The count of favorites obtained by the user
User account based features | Number of user_groups | Number of groups the user belongs to
User account based features | Number of tweets_liked | Count of tweets liked by the user
User account based features | Number of tweets | Count of tweets sent by the user
Tweet based features | Number of retweets | Count of the number of retweets for a tweet
Tweet based features | Number of favorites | Count of the favorites obtained for a tweet
Tweet based features | Number of hashtags | Number of hashtags in this tweet
Tweet based features | Number of user mentions | Number of users mentioned in this tweet
Tweet based features | Num of URLs | Number of URLs in this tweet
Tweet based features | Num of characters | Number of characters in this tweet
Tweet based features | Num of special characters | Number of special characters in the tweet
Fig. 1 Ensemble learning for optimal feature selection in twitter spam detection
Table 3 Optimal feature selection using ensemble methods

Feature name | Feature no | Extra trees | Random forest | Recursive feature elimination | Relief-F | Chi-Square | Gain ratio | Votes
Number_followers | 2 | 1 | 0 | 1 | 0 | 0 | 1 | 3
Number_following | 1 | 1 | 0 | 1 | 0 | 0 | 1 | 3
Account_age | 5 | 0 | 1 | 0 | 1 | 1 | 1 | 4
User_age | 7 | 1 | 0 | 1 | 0 | 1 | 1 | 4
Number of favorites | 12 | 0 | 0 | 1 | 0 | 0 | 0 | 4
Number of user_groups | 13 | 1 | 0 | 0 | 0 | 0 | 0 | 4
Number of tweets_liked | 6 | 1 | 1 | 0 | 1 | 0 | 1 | 4
Number of tweets | 11 | 0 | 1 | 1 | 1 | 0 | 1 | 4
Number of retweets | 4 | 1 | 0 | 1 | 0 | 1 | 1 | 4
Number of favorites | 3 | 1 | 0 | 1 | 0 | 1 | 1 | 4
Number of hashtags | 14 | 0 | 1 | 0 | 1 | 1 | 1 | 4
Number of usermentions | 10 | 0 | 1 | 1 | 1 | 0 | 1 | 4
Num of URLs | 8 | 1 | 0 | 0 | 1 | 1 | 1 | 4
Num of characters | 9 | 1 | 1 | 0 | 1 | 1 | 0 | 4
Num of special characters | 15 | 1 | 1 | 1 | 1 | 1 | 1 | 6
d. Relief-F: Relief-F helps find features that are statistically related to the target variable. Relief is a feature selection algorithm applicable to features with continuous and discrete values; Relief-F extends it to handle multi-class classification problems as well as datasets with incomplete and noisy data.
e. Recursive Feature Elimination: A feature selection method that determines the optimal features by recursively eliminating the weakest ones.
f. Gain Ratio: A feature selection method that determines relevant features better than plain information gain. A sketch of the overall voting scheme is given below.
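The voting scheme of Table 3 can be reproduced in outline with off-the-shelf selectors. The following is a minimal sketch, not the authors' implementation: it casts votes with four of the six methods available directly in scikit-learn (chi-square, random forest and extra trees importances, RFE); Relief-F and gain ratio would require third-party packages and are omitted here, and all data and column names are placeholders.

```python
import numpy as np
import pandas as pd
from sklearn.feature_selection import SelectKBest, chi2, RFE
from sklearn.ensemble import RandomForestClassifier, ExtraTreesClassifier
from sklearn.linear_model import LogisticRegression

def vote_features(X: pd.DataFrame, y: pd.Series, k: int = 15) -> pd.Series:
    """Count, per feature, how many selectors pick it among their top k."""
    votes = pd.Series(0, index=X.columns)

    # Chi-square filter: top-k features by chi2 score (requires non-negative X)
    chi = SelectKBest(chi2, k=k).fit(X, y)
    votes[X.columns[chi.get_support()]] += 1

    # Random forest and extra trees: top-k by impurity-based importance
    for model in (RandomForestClassifier(n_estimators=100, random_state=0),
                  ExtraTreesClassifier(n_estimators=100, random_state=0)):
        imp = model.fit(X, y).feature_importances_
        votes[X.columns[np.argsort(imp)[-k:]]] += 1

    # Recursive feature elimination around a simple base estimator
    rfe = RFE(LogisticRegression(max_iter=1000), n_features_to_select=k).fit(X, y)
    votes[X.columns[rfe.support_]] += 1

    return votes.sort_values(ascending=False)  # most-voted features first
```

The final feature subset is then the k features with the highest vote counts, mirroring the last column of Table 3.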
4.2 Classifier Training Four supervised classifiers, namely random forest, logistic regression, support vector machine, and decision tree, are used to assess the performance of the proposed ensemble feature selection method.
a. Random Forest: A supervised machine learning technique that builds multiple decision trees and makes the final decision from their combined outputs.
b. Logistic Regression: A supervised learning algorithm that outputs a probability between 0 and 1 for the target variable.
c. Decision Tree: A supervised machine learning algorithm in which nodes represent features, branches represent decision rules, and leaf nodes hold the final results.
d. SVM (Support Vector Machine): An algorithm that works well for many classification problems by finding a hyperplane that separates the data points in a K-dimensional space.
A minimal training sketch follows the list.
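A minimal sketch of the training and evaluation step with default scikit-learn models; the hyper-parameters shown are illustrative, not those used in the proposed work.

```python
from sklearn.svm import SVC
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import (accuracy_score, precision_score,
                             recall_score, f1_score)

# The four classifiers compared in Sect. 5
models = {
    "SVM": SVC(),
    "Logistic regression": LogisticRegression(max_iter=1000),
    "Decision tree": DecisionTreeClassifier(),
    "Random forest": RandomForestClassifier(n_estimators=100),
}

def evaluate(X_train, y_train, X_test, y_test):
    """Fit each model on the selected features and report the four metrics."""
    for name, model in models.items():
        pred = model.fit(X_train, y_train).predict(X_test)
        print(f"{name}: acc={accuracy_score(y_test, pred):.4f} "
              f"prec={precision_score(y_test, pred):.4f} "
              f"rec={recall_score(y_test, pred):.4f} "
              f"f1={f1_score(y_test, pred):.4f}")
```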
5 Experimental Evaluation and Results In the proposed work, the performance of the classifiers is evaluated under various experimental conditions using standard metrics. Varying sizes of training data are used, ranging from 2000 to 200,000 tweets in total, as shown in Table 4. In both the training and testing data, a balanced ratio of spam and non-spam tweets is maintained. The classifiers are tested with 100,000 spam and 100,000 non-spam tweets to evaluate their performance. The spam detection rate is evaluated by training four machine learning models: random forest, decision tree, support vector machine, and logistic regression. The training set is gradually increased from 1000 to 100,000 tweets per class, and the performance of the proposed models is evaluated. From the results it is concluded that the random forest classifier has higher detection accuracy than the other three classifiers.
Table 4 The training and testing data

Data set | Training: no. of spam tweets | Training: no. of non-spam tweets | Testing: no. of spam tweets | Testing: no. of non-spam tweets
DS1 | 1000 | 1000 | 100,000 | 100,000
DS2 | 10,000 | 10,000 | 100,000 | 100,000
DS3 | 100,000 | 100,000 | 100,000 | 100,000
Table 5 presents the performance evaluation of SVM, logistic regression, decision tree, and random forest on datasets of varying size. Figure 2 shows that all four classifiers exhibit stable accuracy; the random forest classifier outperforms the other models and is found best suited for Twitter spam detection.

Table 5 Performance evaluation of machine learning models

Performance metric (in %) | SVM | Logistic regression | Decision tree | Random forest
Accuracy | 83.19 | 78.75 | 79.3 | 86.95
Precision | 20 | 25 | 79.29 | 87.07
Recall | 83.25 | 78.47 | 79.3 | 86.95
F-measure | 79.07 | 70.55 | 80.52 | 86.87
Fig. 2 Detection accuracy of machine learning classifiers (accuracy of random forest, decision tree, support vector machine and logistic regression on DS1-DS3)
6 Conclusion The proposed work investigates optimal feature selection using ensemble methods and its impact on the stability of machine learning algorithms under varying training data. The experimental results reveal that the random forest classifier outperformed the other machine learning models. In the future, the performance of the models can be improved further by applying boosting techniques, and the spam detection models should also be investigated on an imbalanced dataset suitable for real-time spam detection.
References

1. Li J, Liu H (2017) Challenges of feature selection for big data analytics. IEEE Intell Syst 32(2):9-15
2. Jin C, Ma T, Hou R, Tang M, Tian Y, Al-Dhelaan A, Al-Rodhaan M (2015) Chi-square statistics feature selection based on term frequency and distribution for text categorization. IETE J Res 61(4):351-362
3. Ebenuwa SH, Sharif MS, Alazab M, Al-Nemrat A (2019) Variance ranking attributes selection techniques for binary classification problem in imbalance data. IEEE Access 7:24649-24666
4. Hall MA (2000) Correlation-based feature selection of discrete and numeric class machine learning
5. Song J, Lee S, Kim J (2011) Spam filtering in twitter using sender receiver relationship. In: International workshop on recent advances in intrusion detection. Springer, pp 301-307
6. Ma J, Saul LK, Savage S, Voelker GM (2009) Identifying suspicious URLs: an application of large scale online learning. In: Proceedings of international conference on machine learning
7. Chen C, Zhang J, Chen X, Xiang Y, Zhou W (2015) 6 million spam tweets: a large ground truth for timely Twitter spam detection. In: 2015 IEEE international conference on communications (ICC). IEEE, pp 7065-7070
8. Wald R, Khoshgoftaar TM, Napolitano A (2013) Should the same learners be used both within wrapper feature selection and for building classification models? In: 2013 IEEE 25th international conference on tools with artificial intelligence. IEEE, pp 439-445
9. Reddy KS, Reddy ES (2019) Using reduced set of features to detect spam in twitter data with decision tree and KNN classifier algorithms. Int J Innovative Technol Exploring Eng (IJITEE) 8(9):6-12
10. Morchid M, Dufour R, Bousquet PM, Linares G, Torres-Moreno JM (2014) Feature selection using principal component analysis for massive retweet detection. Pattern Recogn Lett 49:33-39
11. Mishra P (2019) Correlated feature selection for tweet spam classification. arXiv preprint arXiv:1911.05495v4
12. Herzallah W, Faris H, Adwan O (2018) Feature engineering for detecting spammers on twitter: modelling and analysis. J Inf Sci 44(2):230-247
13. Khalil H, Khan MUS, Ali M (2020) Feature selection for unsupervised bot detection. In: 2020 3rd international conference on computing, mathematics and engineering technologies (iCoMET). IEEE, pp 1-7
14. Imam NH, Vassilakis VG (2019) A survey of attacks against twitter spam detectors in an adversarial environment. Robotics 8(3):50
15. Washha M, Qaroush A, Sedes F (2016) Leveraging time for spammers detection on twitter. In: Proceedings of the 8th international conference on management of digital ecosystems, pp 109-116
Chapter 51
Small-Scale Islanded Microgrid for Remotely Located Load Centers with PV-Wind-Battery-Diesel Generator Deepak Gauttam, Amit Arora, Mahendra Bhadu, and Shikha
1 Introduction A MG is defined as a "local grid that connects distributed energy sources with organized loads and is usually connected to the traditional central grid in a synchronous manner" [1]. The sources in a microgrid are called micro-sources and can be battery storage, solid oxide fuel cells, wind energy, solar energy, diesel generators, etc. The load is connected to a distribution network, and the power supplied to the distribution network comes from the micro-sources and the mains [2]. Establishing a MG is one way of reducing the severity of power outages, since it can ensure continuous power supply to critical loads by generating electricity within the power distribution facility [3]. Microgrids can operate in two modes. Grid-connected mode: under normal operating conditions, the MG and the main grid are connected through a point of common coupling (PCC) on the alternating current (AC) bus bar, and the power and frequency of the microgrid are synchronized with the mains. Island mode: in the event of a mains failure, the MG is disconnected from the mains at the PCC by operating a switch that separates the two [4]. As demand for electricity increases, we need environment-friendly and continuous power generation, for which the MG is the most suitable option; the operating cost of a MG is very low compared to the conventional grid [5]. Due to many factors, the demand for electricity has increased significantly in the last few years. During a grid outage, or when renewable power generation is low, a diesel generator is used as an emergency source to meet the load demand [6]. A MG can be located at the consumer end, whereas the conventional grid is located far away from residential areas [7].
D. Gauttam (B) · A. Arora · M. Bhadu Electrical Engineering Department, Engineering College Bikaner, Bikaner, Rajasthan, India e-mail: [email protected] Shikha University College of Engineering and Technology, Bikaner, Rajasthan, India © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 R. P. Yadav et al. (eds.), Proceedings of the International Conference on Paradigms of Computing, Communication and Data Sciences, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-19-8742-7_51
As we know, all residential equipment works on AC, so for AC loads an AC MG is the best option, while for direct current (DC) loads such as electric vehicles and PV systems a DC MG is preferred. In the latest trends, the hybrid MG (HMG) is most popular because it works on both AC and DC: by using converters, a MG can serve AC as well as DC loads, whereas conventional electric grids work on AC only [8]. Several independent microgrids can supply large remote areas, with each MG operating separately; traditionally they are designed to be energy self-sufficient [9], and the individual microgrids are interconnected to avoid voltage and current deviations and to ensure continuous power supply [10]. Issues such as peak power consumption, CO2 emissions, reserve generation capacity, large land requirements, transmission and distribution losses, and massive fuel or operating costs (which drive up electricity costs) can be reduced by the use of a MG. As there is no fuel cost, a MG promotes the use of local renewable energy sources [11]. Many remote townships around the world are physically or financially unable to connect to the electricity grid. In these regions, small isolated diesel generators are employed to meet the electrical demand. The operational costs of some of these diesel generators can be extremely high because of fossil fuel costs and difficulties in fuel delivery and generator servicing. In such situations, renewable energy sources (RESs) such as solar photovoltaic (PV) and wind turbine generators provide a feasible alternative to diesel generators for electricity production in off-grid communities [12]. Meteorological conditions are one of the biggest challenges in using a MG, since power generation may decrease or increase with the weather. This problem can be mitigated with the Perturb & Observe (P&O) MPPT algorithm and other control schemes such as feedforward dq control for the inverter and dq control for the wind system [13].
2 Modeling of the Test Power System A microgrid is a distributed network of distributed energy sources (DESs, such as wind and PV). In this model, the microgrid consists of a PV system that provides DC supply to the inverter, a wind system that produces electricity from wind, a battery system that stores energy and supplies it as needed, and a diesel generator used for backup. An LC filter, preferred for islanded MGs, is used for filtering and controlling the DC voltage and current. The load is connected to the inverter through the PCC at the AC bus. The capacity of the MG is 5 MW, with the PV and wind systems generating 2 MW and 3 MW, respectively; for backup purposes, a diesel generator rated at 5 MW and a 6500 Ah battery are used in this model. Figure 1 shows the single-line diagram of the small-scale 5 MW islanded MG.
Fig. 1 Small-scale 5 MW islanded MG (single-line diagram: PV with MPPT and feedforward dq-controlled PWM inverter, battery system with bi-directional DC-DC converter and charging/discharging current control, diesel generator with PCC on/off connection, AC-DC-AC converter, LC filter and loads 1-3 on the PCC/AC bus)
3 Control Strategies Four control schemes are used for this MG: P&O MPPT and feedforward dq control for the PV system, current control for battery charging and discharging, dq control for the wind system, and an energy management logic implemented on the diesel generator to turn it on/off. Figure 2 shows the complete control strategies for the small-scale 5 MW islanded MG.
3.1 Control Strategies for Battery Charging/Discharging Controller A current control scheme is used for charging and discharging the battery, and a logic is developed to generate the reference current signal according to the operating condition; see Figs. 3 and 4 [13]. In Fig. 4, the battery reference current and the measured battery current are taken as the two inputs; their error is fed to a PI controller whose output sets the duty cycle of the PWM generator that produces the pulses for the converter. A sketch of this loop follows Fig. 4.
Fig. 2 Control strategies for 5 MW small-scale islanded MG (P&O MPPT and feedforward dq control for the PV system, current control for battery charging/discharging, dq control for the DFIG wind system, and energy management for the diesel generator)

Fig. 3 Block representation of logic of battery charging/discharging

Fig. 4 Current control scheme for charging & discharging of battery
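A minimal sketch of the current control loop of Fig. 4: the error between the reference and measured battery currents drives a discrete PI controller whose output is clamped to a PWM duty cycle. The gains, sampling time, and limits are placeholders, not values from this work.

```python
class PICurrentController:
    """Discrete PI loop for the battery charging/discharging converter."""

    def __init__(self, kp: float, ki: float, dt: float):
        self.kp, self.ki, self.dt = kp, ki, dt
        self.integral = 0.0

    def step(self, i_ref: float, i_bat: float) -> float:
        """One sampling step: returns the PWM duty cycle in [0, 1]."""
        error = i_ref - i_bat                # current error drives the loop
        self.integral += error * self.dt     # accumulate the integral term
        duty = self.kp * error + self.ki * self.integral
        return min(max(duty, 0.0), 1.0)      # clamp to a valid duty cycle
```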
3.2 Control Strategies for PV System In a PV system, P&O MPPT gives good performance under rapid environmental change and extracts the maximum power. Figure 5 shows the concept of P&O MPPT, and Fig. 6 shows the flowchart of the P&O MPPT algorithm. The P&O method works by increasing or decreasing the PV array voltage, based on the change in power (P), to reach the MPP [14]; a sketch of this update is given after Fig. 6.
51 Small-Scale Islanded Microgrid for Remotely Located Load Centers … (v) ΔPPV = 0 ΔVPV = 0
PPV PMPP
641
MPP
(iii) ΔPPV > 0 ΔVPV < 0
(ii) ΔPPV > 0 ΔVPV > 0
(iv) ΔPPV < 0 ΔVPV > 0
(i) ΔPPV < 0 ΔVPV < 0 Duty decrease
VMPP
Duty increase
VPV
Fig. 5 Concept of P&O MPPT
Because the output of the solar array is DC, an inverter with a control scheme is needed to convert this DC to AC. The inverter is controlled using a feedforward dq control method. From Eqs. (1) and (2), the coupling between the d-axis and q-axis quantities in the dq mathematical model makes designing the controller complex. Adding feedforward compensation to the PI regulators decouples the two axes and provides stable closed-loop control, so the feedforward dq control approach is adopted, as shown in Fig. 7. The feedforward decoupling control strategy provides independent control of real power (P) and reactive power (Q) in the inner current loop of a three-phase PV grid-connected inverter [15].

$$u_d = L\frac{di_d}{dt} + Ri_d - \omega L i_q + e_d \quad (1)$$

$$u_q = L\frac{di_q}{dt} + Ri_q + \omega L i_d + e_q \quad (2)$$
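A minimal sketch of the decoupling computation behind Eqs. (1), (2) and Fig. 7: the PI output on each current axis is combined with the cross-coupling compensation terms and the grid-voltage feedforward to form the inverter voltage references. Variable names mirror the equations; all values are placeholders.

```python
def feedforward_dq(pi_d: float, pi_q: float,
                   i_d: float, i_q: float,
                   e_d: float, e_q: float,
                   omega: float, L: float) -> tuple[float, float]:
    """Form the inverter voltage references u_d, u_q from the PI outputs."""
    u_d = pi_d - omega * L * i_q + e_d  # d-axis reference with decoupling term
    u_q = pi_q + omega * L * i_d + e_q  # q-axis reference with decoupling term
    return u_d, u_q
```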
3.3 Control Scheme for Doubly Fed Induction Generator (DFIG) A DFIG is utilized for variable-speed wind operation. In wind turbines, a DFIG consists of a wound-rotor induction generator and an AC/DC/AC converter. The dq control method is used for the DFIG wind system. As shown in Fig. 8, the control system generates the pitch angle command as well as the voltage control signals Vr and Vgc for the rotor-side converter (Cr) and the grid-side converter (Cgrid), in order to manage the wind turbine's power, the DC bus voltage, and the reactive power or voltage at the grid terminals [16].
Fig. 6 P&O MPPT technique for PV system (flowchart: sample V(k) and I(k); if ΔP = 0 keep Vref unchanged, otherwise increase or decrease Vref according to the signs of ΔP and ΔV)
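A minimal sketch of the standard P&O update of Figs. 5 and 6: each sampling step compares the present PV power and voltage with the previous samples and perturbs the voltage reference toward the MPP. The perturbation step size is a placeholder.

```python
def perturb_and_observe(v, i, v_prev, p_prev, v_ref, step=0.5):
    """One P&O step; returns the new reference and the samples to store."""
    p = v * i
    if p != p_prev:                           # if power changed, decide direction
        if (p > p_prev) == (v > v_prev):
            v_ref += step                     # climbing the P-V curve: keep going
        else:
            v_ref -= step                     # descending: reverse the direction
    return v_ref, v, p                        # caller stores v, p as new previous
```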
3.4 Energy Management Control Strategies for Diesel Generator to Turn on/off Here, a control logic is applied to turn the diesel generator on/off. If the PV and wind generation is greater than the load demand, the diesel generator is kept in the off state; if the PV and wind generation is less than the load demand, the diesel generator is switched on and supplies the excess load demand [17, 18]. A sketch of this dispatch rule is given below.
Fig. 7 dq-current controller block diagram with feedforward decoupling (PI controllers on the d- and q-axis current errors, ωL cross-coupling compensation, and grid-voltage feedforward terms ed, eq)
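A minimal sketch of the dispatch rule of Sect. 3.4, assuming instantaneous power measurements in MW; it returns the generator state and the deficit the generator must cover.

```python
def diesel_dispatch(p_pv: float, p_wind: float, p_load: float):
    """On/off decision for the diesel generator from the renewable balance."""
    renewable = p_pv + p_wind
    if renewable >= p_load:
        return False, 0.0                # generator off, renewables cover load
    return True, p_load - renewable      # generator on, supply the deficit
```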
4 Result & Discussion The suggested MG design has been tested in a variety of situations, including (i) varying weather conditions and (ii) varying load demand. This section depicts the MG's various characteristics under the simulated situations.
4.1 Case 1: Varying Irradiance Condition Variable weather conditions limit the electricity generated by solar panels, which is one of the challenges for PV systems. In this part, the study was carried out to demonstrate the efficiency of the suggested MG under meteorological conditions that change over time. The load demand was kept constant at 4.5 MW for this simulation case. Weather conditions were simulated with a time-varying irradiance profile applied to the solar panels. Solar panels generate the most electricity under standard conditions when exposed to 1000 W/m2 of irradiance. The irradiance profile depicted in Fig. 9 was used to illuminate the solar panels.
Solar panels were irradiated at 200 W/m2 for two seconds, the irradiance was then raised to 1000 W/m2 over four seconds and maintained at 1000 W/m2 for the following two seconds. After that, the irradiance was reduced to 200 W/m2 over four seconds and maintained at that level, and the corresponding PV panel current and voltage were recorded. Since the DC-link voltage across the solar panels remains at 500 V, this is the MPP voltage for the solar panels. Figure 10 depicts the battery system's SOC, which confirms the proper operation of the battery charging/discharging controller. When the irradiance is 200 W/m2, the PV system's output is insufficient to run the loads; thus, the diesel generator and the battery system provide the extra power. As a result, the battery begins to discharge, as seen in the SOC curve, which drops throughout this period. During the irradiance transition from 200 W/m2 to 1000 W/m2, the battery SOC curve turns from discharging to charging as the PV system begins to provide adequate power to run the loads. When the solar panels are illuminated at 1000 W/m2, surplus energy is available from the panels, which charges the battery system, as seen in the rising SOC values. The current drawn from the battery system is shown in Fig. 10: negative battery current signifies that the battery system is being charged, whereas positive battery current signifies that it is being discharged. The power transfer to and from the battery system is also depicted in Fig. 10. Figure 11 shows the THD at the inverter output across the load, a short span of the load voltage and current, and the instantaneous power flow to the load. Here, the wind profile varies from 12 m/s to 5 m/s over the 0-14 s simulation window. Figure 12 shows the wind generator voltage, current, and power. Initially, the diesel generator is in the off state when the wind speed is 12 m/s, but as the speed decreases from 12 m/s to 5 m/s, the diesel generator comes into action and supplies the excess power to the load while also charging the battery.
Fig. 8 Control scheme for DFIG (wind turbine with wound rotor, rotor-side and grid-side AC/DC/AC converters, and a control system that generates the pitch angle and the voltage control signals Vr and Vgc)
Fig. 9 PV panel measurement
Fig. 10 Battery measurement
4.2 Case 2: Constant Irradiance and Variable Load Condition In this case, constant irradiance (1000 W/m2) and variable load conditions are simulated in MATLAB/Simulink. This case exhibits the suggested system's ability to properly absorb load variations. In this simulation, the load across the MG is increased in 1.5 MW steps: a 1.5 MW load was connected during the simulation interval 0-4.6 s, and at the 4.6 s and 9.6 s instants a further 1.5 MW load was added each time. As a result, the total load on the MG was 3 MW from 4.6 to 9.6 s of simulation time and 4.5 MW from 9.6 to 14 s. During the simulation of Case 2, the irradiance remained constant at 1000 W/m2, and the maximum power drawn from the PV system, together with the maximum theoretical power available from it, is presented in Fig. 13. Figure 14 depicts the charging/discharging of the battery in the form of the battery SOC; the power required from the battery grows as the load on the system increases. Excess power was transferred to the battery when the load on the system was less than the power provided by the PV system, as evidenced by the negative battery power during 0-4.6 s of simulation. When the load exceeds the energy available from the PV system, the battery system kicks in and delivers the deficit power to the loads, as shown in Fig. 15 over the simulation interval 4.6-9.8 s. The three-phase voltage and current waveforms across the load are shown in Fig. 15, which also depicts the total instantaneous power across the load as well as the overall THD across the simulation duration.
Fig. 11 Load measurement
Fig. 12 Wind generator measurement
During the whole simulation of load fluctuations, the THD is less than 5%. Figure 16 shows a zoomed version of the measured load current and voltage. Figure 17 shows the effective power generation and the voltage and current waveforms of the DFIG wind system. As in Case 1, the diesel generator is initially in the off state when the wind speed is 12 m/s, but as the speed decreases from 12 m/s to 5 m/s, the diesel generator comes into action and supplies the excess power to the load while also charging the battery.
5 Conclusion A MG with a capacity of 5 MW has been developed in this study. Solar and wind energy have been selected as the principal sources of power in the proposed design. To offset the harmful impact of climatic conditions on solar generation capacity, battery-based energy storage has been included: the batteries store surplus energy produced by the MG and discharge it when consumption exceeds generation. A diesel generator set has also been used to power critical loads and as a backup system to prevent blackouts during the night or in exceptional weather conditions. The THD is less than 5% for the entire simulation time. MATLAB/Simulink was used to construct and simulate the recommended MG.
Fig. 13 PV panel measurement
Fig. 14 Battery measurement
Fig. 15 Load measurement
Fig. 16 Load voltage and current zoomed version
Fig. 17 Wind generator measurement
The MG's operation has also been shown in a variety of weather and load circumstances, and in all circumstances the recommended design is suitable.
References

1. Mao M, Zhu W, Chang L (2018) Stability analysis method for interconnected AC islanded microgrids. In: IEEE international power electronics and application conference and exposition (PEAC), pp 1-6. https://doi.org/10.1109/PEAC.2018.8590583
2. Rathor B, Bhadu M, Bishnoi SK (2018) Modern controller techniques of improve stability of AC microgrid. In: 5th international conference on signal processing and integrated networks (SPIN), pp 592-596. https://doi.org/10.1109/SPIN.2018.8474249
3. Wei X, Xiangning X, Pengwei C (2018) Overview of key microgrid technologies. Int Trans Electr Energy Syst 28(7). https://doi.org/10.1002/etep.2566
4. Liu X, Gao Z, Bian Y (2018) Large signal stability analysis of AC microgrids considering the storage system. In: 21st international conference on electrical machines and systems (ICEMS), pp 2023-2027. https://doi.org/10.23919/ICEMS.2018.8549445
5. Rathor B, Utreja N, Bhadu M, Sharma D (2018) Role of multi-band stabilizers on grid connected microgrid. In: 2nd international conference on micro-electronics and telecommunication engineering (ICMETE), pp 318-322. https://doi.org/10.1109/ICMETE.2018.00076
6. Agrawal V, Rathor B, Bhadu M, Bishnoi SK (2018) Discrete-time mode PSS controller techniques to improve stability of AC microgrid. In: 8th IEEE India international conference on power electronics (IICPE), pp 1-5. https://doi.org/10.1109/IICPE.2018.8709509
7. Men Y, Ding L, Du Y, Lu X, Zhao D, Cao Y (2020) Holistic small-signal modeling and AI-assisted region-based stability analysis of autonomous AC and DC microgrids. In: IEEE energy conversion congress and exposition (ECCE), pp 6162-6169. https://doi.org/10.1109/ECCE44975.2020.9236022
8. Ranjan V, Arora A, Bhadu M (2021) Stability enhancement of grid connected AC microgrid in modern power systems. In: International conference on computational intelligence and emerging power system, algorithms for intelligent systems. Springer, Singapore, ISBN 978-981-16-4102-2. https://doi.org/10.1007/978-981-16-4103-9_29
9. Jadhav AM, Patne NR, Guerrero JM (2019) A novel approach to neighborhood fair energy trading in a distribution network of multiple microgrid clusters. IEEE Trans Industr Electron 66(2):1520-1531. https://doi.org/10.1109/TIE.2018.2815945
10. Arefi A, Shahnia F (2017) Tertiary controller-based optimal voltage and frequency management technique for multi-microgrid systems of large remote towns. IEEE Trans Smart Grid 9(6):5962-5974. https://doi.org/10.1109/tsg.2017.2700054
11. Jin X (2015) Analysis of microgrid comprehensive benefits and evaluation of its economy. In: 10th international conference on advances in power system control, operation & management, pp 1-4. https://doi.org/10.1049/ic.2015.0279
12. Sawle Y, Gupta SC, Kumar Bohre A, Meng W (2016) PV wind hybrid system: a review with case study. Cogent Eng 3(1). https://doi.org/10.1080/23311916.2016.1189305
13. Nkambule M, Hasan A, Ali A (2019) Proportional study of perturb & observe and fuzzy logic control MPPT algorithm for a PV system under different weather conditions. In: 2019 IEEE 10th GCC conference & exhibition (GCC), pp 1-6. https://doi.org/10.1109/GCC45510.2019.1570516142
14. Rosewater D, Ferreira S, Schoenwald D, Hawkins J, Santoso S (2019) Battery energy storage state-of-charge forecasting: models, optimization, and accuracy. IEEE Trans Smart Grid 10(3):2453-2462. https://doi.org/10.1109/TSG.2018.2798165
15. Huang T, Shi X, Sun Y, Wang D (2013) Three-phase photovoltaic grid-connected inverter based on feedforward decoupling control. In: International conference on materials for renewable energy and environment, pp 476-480. https://doi.org/10.1109/ICMREE.2013.6893714
16. Hu J, Huang Y, Wang D, Yuan H, Yuan X (2015) Modeling of grid-connected DFIG-based wind turbines for DC-link voltage stability analysis. IEEE Trans Sustain Energ 6(4):1325-1336. https://doi.org/10.1109/TSTE.2015.2432062
17. Bhadu M, Punia VS, Bishnoi SK, Rathor B (2017) Robust noise mitigation control techniques for SMIB power system. In: 2017 international conference on computing and communication technologies for smart nation (IC3TSN), pp 7-12. https://doi.org/10.1109/IC3TSN.2017.8284441
18. Bhadu M, Senroy N, Janardhanan S (2016) Discrete wide-area power system damping controller using periodic output feedback. Electr Power Componen Syst 44(17):1892-1903. https://doi.org/10.1080/15325008.2016.1204571
Chapter 52
A Review on Early Diagnosis of Lung Cancer from CT Images Using Deep Learning Maya M. Warrier and Lizy Abraham
1 Introduction The term cancer was introduced by the Greek physician Hippocrates (c. 460 BC - c. 370 BC), also known as the "Father of Medicine". He coined it from the Greek word "karkinos", meaning crab or tumor, a name suggested by the similarity of a crab to a tumor with veins stretched on all sides. Later, the Roman encyclopaedist Celsus (c. 25 BC-50 AD) translated karkinos into the Latin cancer. Cancers refer to a large family of diseases that involve abnormal cell growth with the ability to spread to other parts of the body, and they form a subset of neoplasms. A group of cells that has undergone unregulated growth, often forming a mass or lump but possibly distributed diffusely, is called a neoplasm or malignant tumor, whereas benign tumors are those which do not spread. A lump, abnormal bleeding, prolonged cough, unexplained weight loss, and a change in bowel movements may indicate cancer, but can also have other causes. There are almost 100 types of cancer that affect human beings. Cancers are identified initially by the appearance of signs and symptoms or through screening, and a definite diagnosis requires examination of a tissue sample by a pathologist. Tissue diagnosis helps classify the proliferating cell type into malignant tumors (carcinoma, sarcoma, leukemia) or benign tumors (lipoma, fibroma, adenoma, hemangioma, etc.). Patients diagnosed with cancer are subjected to more detailed medical investigations such as serological (blood) tests and imaging tests. As of 2019, 18.5 million cases are reported in a year, with around 8.9 million deaths. The common sites of cancer in males are the lung, prostate, colon, rectum, and stomach. In females, the common types are lung cancer, colon and rectal cancer,
cervical cancer, and breast cancer. Figure 1 depicts the leading sites of cancer deaths estimated by the American Cancer Society in 2022 [35]. The patient's survival depends on the extent of metastasis and the cancer type. Practicing a healthy diet, by consuming more vegetables and fruits and less alcohol and red meat, helps in reducing the risk of cancer. A targeted multimodality treatment for cancer usually includes chemoradiation and surgery. Lung cancer, or lung carcinoma, is characterized by uncontrolled cell growth in lung tissues. This uncontrolled growth can spread beyond the lung to other parts of the body. Cancers that form in the lungs are called primary lung cancers and are carcinomas. Small cell lung carcinoma (SCLC) and non-small cell lung carcinoma (NSCLC) are the two major types. Symptoms that lead to lung cancer diagnosis are coughing up blood, chest pains, weight loss, fatigue, wheezing, shortness of breath, allergies, etc. [1]. The major causes of lung cancer are long-term tobacco smoking, exposure to second-hand smoke, radon gas, genetic factors, asbestos, and other forms of air pollution. Lung cancer can be identified using chest radiography (X-ray), computed tomography (CT) scans, and magnetic resonance imaging (MRI). CT takes three-dimensional (3D) images of the chest, and the presence of a nodule (any abnormal spot) in the lungs is diagnosed as a sign of cancer. Early stage lung cancer is not easily diagnosed on a CT scan since the nodule is very small and because of the location of the pulmonary gland (small round lesions in the lung that can cause cancer if diagnosed late); its symptoms become visible only at advanced stages. Figure 2 shows a CT lung image with benign, primary malignant, and metastatic nodules. In 2022, an estimated 2.4 million new lung cancer cases and about 1.5 million deaths were expected, placing lung cancer as the most common cancer with the highest mortality rate in both men and women; its early detection can save many lives. The increasing number of preventive and early-detection measures in the medical field has raised the demand for computerized solutions that provide accurate diagnosis in less time at reduced medical cost. Artificial intelligence (AI) is the ability of computer algorithms to mimic human cognition. AI gathers data, processes it, and gives a well-defined result to the end-user with the help of machine learning and deep learning algorithms, which makes it widely utilized in medical imaging applications. In order to achieve useful insights and predictions, these models are trained using a large amount of input data. In cancer research, AI has shown its potential to predict cancer with higher accuracy than a general statistical expert in less time. Recently, computer-aided diagnosis systems have widely adopted deep learning algorithms that extract image features automatically, and several medical image processing tasks have been a success with this deep learning technology. Hence, researchers are developing several deep learning models to improve the accuracy of lung cancer detection with CT scans. Motivated by the application of deep learning approaches in cancer prediction, this article provides a review of the existing deep learning research works for the early prognosis of lung cancer.
Fig. 1 Estimated deaths for common cancer types in 2022 (www.cancer.org)
2 Related Works With the growing number of patients diagnosed with cancer and the huge amount of data gathered during the treatment phase, there has been an increase in the need for AI to improve oncologic care.
Fig. 2 Lung CT image showing the three classifications of nodule-benign (left), primary malignant (middle), and metastatic malignant (right) [2]
Several researchers have effectively implemented different machine learning and deep learning algorithms for the automated diagnosis of cancer. This section gives an overview of the existing deep learning models for lung cancer prediction. In [3], Aonpong et al. implemented the genotype-guided radiomics method (GGR) using the public radiogenomics dataset of NSCLC, which includes CT images and gene expression data; results showed an accuracy of 83.28% at a low cost. An adaptive hierarchical heuristic mathematical model (AHHMM) was proposed by Yu et al. [5]. This method uses preprocessing, binarization, thresholding, segmentation, feature extraction, and finally cancer detection by a deep neural network (DNN) with an accuracy of 96.67%, and it improves image quality by using the likelihood distribution method. A low-dose CT scan system was developed by Ozdemir et al. [6], in which a three-dimensional convolutional network is used for lung nodule detection and gives 96.5% accuracy. Silva et al. [4] used an MLP for the classification of epidermal growth factor receptor (EGFR) mutation status; this innovative approach for assessing gene mutation status uses the LIDC-IDRI and NSCLC Radiogenomics datasets and obtained the best prediction ability. Early prediction of lung cancer is time-consuming and a challenge for radiologists. A method combining deep learning and cloud computing was used by Masood et al. [8] to help radiologists reach lung nodule detection faster. This 3D deep CNN (3DDCNN) uses a multi-region proposal network (mRPN) in its architecture and predicted lung nodules (>3 mm diameter) with 98.5% accuracy, but showed less accuracy in detecting nodules of diameter less than 3 mm. Zhang et al. [7] put forward a multi-scene deep learning framework (MSDLF) that uses a vesselness filter to determine large nodules (>3 mm) with increased accuracy and dramatically reduced false-positive rates. The model is a four-channel CNN and involves several steps: collection of datasets, mending of the lung contour and lung parenchyma segmentation, vessel removal, standardization of
the dataset, CNN design, segmentation, and classification. Abdullah et al. [33] summarized different machine learning approaches, such as MLP, gradient boosted trees, SVM, neural networks, stochastic gradient descent, decision trees, and ensemble classifiers, used by different researchers for classifying lung malignancies. An ensemble classifier comprising five machine learning models was proposed by Shanbhag et al. [47] for detecting lung cancer in CT images; the model includes KNN, decision tree, SVM, etc., gives an accuracy of 85% in differentiating benign and malignant nodules, and points to using CNNs in ensemble approaches as future research. Pang et al. [10] conducted deep learning research on identifying lung cancer by accessing CT images of patients from Shandong Providence Hospital. Several image preprocessing operations, such as transformation, rotation, and translation, were performed on the original images to enlarge the training data and tackle the low-data problem. Densely connected convolutional networks (DenseNet) were trained to classify the lung cancer images, and an adaptive boosting algorithm (Adaboost) was implemented to combine multiple classification results. The model was evaluated in real time by Shandong hospital and achieved 89.85% accuracy for lung cancer prediction. The risk of mortality in lung cancer was predicted by Guo et al. [9], who proposed a knowledge-based analysis of mortality prediction network (KAMP-Net). This method used data augmentation to train the CNN, thereby improving its performance; the evaluation criteria include cross-entropy and ROC AUC on the National Lung Screening Trial (NLST) dataset. Exact classification of carcinomatous lung nodules and feature score regression are important for automated lung nodule analysis models. Such a model, named MTMR-Net, was proposed by Liu et al. [11], incorporating a Siamese network architecture. The architecture consists of three main modules: a feature extraction module comprising a convolutional layer, one Res Block A, and three Res Block B; a classification module, which is a fully connected layer; and a regression module with two fully connected layers. This model was evaluated on accuracy, ROC, sensitivity, AUC, and specificity, and achieved 93.5% accuracy. Chen et al. [14] developed a hybrid cancer segmentation algorithm based on convolutional neural networks for accurate early diagnosis of SCLC. The proposed method incorporates a lightweight 3D CNN followed by a 2D CNN; the former learns long-range 3D contextual information, whereas the latter captures fine-grained semantic information, and a hybrid features fusion module (HFFM) merges the 2D and 3D features. This method gave an accuracy of 90.9% and a sensitivity of 87.2% using a dataset from a hospital affiliated with Shandong University. While the HSN outperformed other state-of-the-art methods, it had drawbacks such as high computation and memory requirements and an insufficient dataset of healthy people's CT scans. A Mask Region Convolutional Neural Network (Mask R-CNN)-based 3D visualization method was proposed by Cai et al. [13] for detection and segmentation of nodules using the LUNA-16 and Ali TianChi challenge datasets. The model consists of three modules: a preprocessing module (PrM), a segmentation
module (DSM), and a 3D reconstruction module (3DRM); it was evaluated on accuracy, specificity, F-score, and false-positives per scan, and achieved 88.7% sensitivity for lung cancer. A deep CNN with a scale transfer module (STM) was proposed by Zheng et al. [12] for improving the diagnosis of pulmonary adenocarcinoma. This STM-Net model passes the images of pulmonary nodules through four convolution layers, then unifies the size of the feature maps using max pooling and the STM, and finally uses channel fusion to reach the classification. The results showed an accuracy of 95.455% using the Zhongshan Hospital Fudan University dataset. Nasser et al. [15] developed an artificial neural network (ANN) to detect the presence of lung cancer cells; its architecture consists of an input layer, hidden layers, and an output layer, and the experiment showed the ability of neural networks to diagnose cancer with an accuracy of 96.67%. A 3D convolutional neural network was proposed by Alakwaa et al. [43] for the detection of lung nodules, with a U-Net architecture as a preprocessing step detecting regions of interest for the 3D CNN; an accuracy of 86.6% was attained by the proposed system. Zhang et al. [36] developed a 3D convolutional neural network consisting of three modules: preprocessing, segmentation and lung reconstruction, and finally image enhancement; results showed a sensitivity of 84.4%. Obtaining the structural information of a nodule is a difficult task, for which Sahu et al. [38] developed a multiple-view sampling-based multi-section CNN. This method consists of five processes: nodule cross-sectioning, shared parameters, element-wise maximum pooling, final layer retraining, and nodule classification; the model detects lung cancer with 93.18% accuracy. A double convolutional deep neural network (CDNN) together with a regular CDNN was proposed by Jakimovski et al. [37] to classify nodules from images, with the K-means algorithm used for image pre-classification. The double CDNN gave 0.909 accuracy, whereas the regular CDNN gave 0.872. An improved profuse clustering technique (IPCT) and deep learning with instantaneously trained neural networks (DITNN) were proposed by Shakeel et al. [16] for improving the quality of lung images and diagnosing lung cancer. IPCT provides two functions: image noise removal and image quality enhancement. The method takes images from the cancer imaging archive (CIA) dataset, removes noise using a weighted mean histogram, performs segmentation using IPCT, extracts features, and finally classifies lung cancer with 98.42% accuracy using DITNN. In another work, Shakeel et al. [46] developed an improved deep neural network for lung segmentation: a hybrid spiral optimization intelligent-generalized rough set approach selects the features, and the final classification into benign and malignant levels is achieved through an ensemble classifier. To reduce the false-positives among candidate nodules, a deep 3D residual convolutional neural network (CNN) was proposed by H. Jin et al. [17]; a spatial pooling and cropping (SPC) layer was incorporated to capture multi-level contextual information from CT images.
Liu et al. [41] designed a dense convolutional binary-tree network (DenseBTNet) with higher accuracy in classifying lung nodules; it retains the DenseNet mechanism for separating lung nodules into different malignancy levels and enhances the multi-scale properties. A deep learning-based CAD system to detect large nodules (>3 mm) and predict the probability of their malignancy was proposed by Li et al. [42]; the model showed 86.2% sensitivity for cancer prediction using the LIDC-IDRI and NLST datasets. Teramoto et al. [40] proposed a deep convolutional neural network (DCNN) with convolutional, pooling, and fully connected layers for automatic and accurate classification of lung nodules; the automated system screens candidate nodules and reduces false-positives. Dou et al. [39] designed 3D convolutional neural networks (CNNs) comprising 3D convolutional, 3D max pooling, fully connected, and softmax layers; each layer encodes unique patterns, and the system detects lung cancer with 94.4% sensitivity. A deep learning method with FPSO is proposed by Asuntha et al. [44] for improving the efficiency of CNNs while reducing their computational complexity. The image is preprocessed with the histogram equalization technique and an adaptive bilateral filter (ABF), followed by lung region extraction using artificial bee colony (ABC) segmentation. Feature extraction methods such as wavelet transform-based features, scale-invariant feature transform (SIFT), Zernike moments, local binary patterns (LBP), and histogram of oriented gradients (HoG) are used for geometric, volumetric, texture, and intensity features. A fuzzy particle swarm optimization (FPSO) algorithm selects the best of these features, which are then classified using deep learning. The authors identify grading the malignancy levels of nodules for clinical applications as future research. Nasrullah et al. [45] proposed a multi-strategy deep learning model for the classification of pulmonary nodules that reduces false-positive rates at an early stage. A 3D Faster R-CNN with CMixNet was used first to analyze a 3D CT lung image; nodules were identified using a U-Net-like encoder-decoder and then studied through 3D CMixNet with a gradient boosting machine (GBM) to classify them into benign, normal, and malignant nodules. Finally, the results were evaluated against several factors such as age, smoking history, patient family history, clinical biomarkers, and nodule size/location. Table 1 summarizes the observations from the literature review on lung cancer detection using deep learning algorithms.
3 Datasets Datasets are important for any machine learning or deep learning approach; it is the quality of the available data that helps develop and improve the algorithms. This section gives an overview of the datasets used in research works related to deep learning for the early detection of lung cancer. The Lung Image Database Consortium (LIDC-IDRI) [26] is an international resource that consists of 1018 cases of clinical thoracic CT scans with marked-up annotated lesions. It was initiated by the National Cancer Institute (NCI) for the development and evaluation of computer-aided diagnostic methods for lung cancer detection.
Table 1 Summary of the reviewed articles for lung cancer detection using deep learning

S. No. | Author | Summary
1 | Aonpong et al. [3] | Genotype-guided radiomics method (GGR); obtained an accuracy of 83.28% at a low cost; NSCLC public radiogenomic datasets
2 | Silva et al. [4] | MLP architecture; best prediction approach for the study of gene mutation status; LIDC-IDRI, NSCLC datasets
3 | Yu et al. [5] | Adaptive hierarchical heuristic mathematical model (AHHMM); uses the likelihood distribution technique for improving image quality; attained an accuracy of 96%
4 | Ozdemir et al. [6] | Low-dose CT scan system; an accuracy of 96.5% and reduces the cancer mortality rate by 20% annually; limited by large-nodule datasets; LUNA-16, LIDC-IDRI, Kaggle datasets
5 | Zhang et al. [7] | Multi-scene deep learning framework (MSDLF); effective in detecting large nodules with reduced false-positive rates; LIDC-IDRI dataset
6 | Masood et al. [8] | 3D convolutional deep neural network (3DDCNN); achieves 98% accuracy by combining cloud computing and deep learning; less accurate for nodules less than 3 mm in diameter; datasets: LUNA-16, LIDC-IDRI, ANODE09, Shanghai Hospital
7 | Guo et al. [9] | Knowledge-based analysis of mortality prediction network (KAMP-Net); improved CNN performance in predicting mortality risk due to effective data augmentation; the manual selection and measurement of datasets challenges the system; NLST dataset
8 | Pang et al. [10] | Densely connected convolutional networks (DenseNet) along with Adaboost; a deep learning system that solves low-data issues; obtained an accuracy of 89.85%; Shandong hospital dataset
9 | Liu et al. [11] | MTMR-Net model; automated nodule detection system with 93.5% accuracy
10 | Zheng et al. [12] | Deep CNN with a scale transfer module (STM-Net); improves detection of pulmonary adenocarcinoma with 95.5% accuracy; Zhongshan hospital dataset
11 | Cai et al. [13] | Mask Region Convolutional Neural Network (Mask R-CNN); this 3D visualization method generalizes to unseen data; memory optimization is a problem; datasets: LUNA-16, Ali TianChi challenge
12 | Chen et al. [14] | Hybrid segmentation network (HSN); mean accuracy of 0.909; computation and memory requirements are a challenge; dataset from a hospital under Shandong university
13 | Nasser et al. [15] | Artificial neural network (ANN); accuracy of 96.67%; NSCLC dataset
14 | Shakeel et al. [16] | Deep learning with instantaneously trained neural networks (DITNN); obtained an accuracy of 98.42% and a minimum classification error of 0.038; dataset from the cancer imaging archive (CIA)
15 | Jin et al. [17] | Deep 3D residual CNN model; the SPC layer improves prediction accuracy by reducing false-positive nodule detections; LUNA-16 challenge dataset
16 | Alakwaa et al. [43] | 3D CNN model that gives 86.6% accuracy on the test set; deeper networks and hyper-parameter tuning can improve performance; datasets: Kaggle CT scans, LUNA-16
17 | Zhang et al. [36] | 3D convolutional neural network architecture; achieved high sensitivity and specificity for nodule classification; limited datasets and heterogeneity in image quality were drawbacks; datasets: LUNA-16, Kaggle, Guangdong Provincial Hospital
18 | P. Sahu et al. [38] | Multiple-view sampling-based multi-section CNN model; obtained nodule structural information with 93.18% accuracy
19 | Jakimovski et al. [37] | Double convolutional deep neural network (CDNN) with a regular CDNN; gives 90.9% accuracy for early cancer prediction
20 | Liu et al. [41] | Dense convolutional binary-tree network (DenseBTNet) model; introduces a new center-crop operation into the DenseNet mechanism and improves malignancy-level classification of lung nodules; learns more compact models than DenseNet; LIDC-IDRI dataset
21 | Li et al. [42] | Deep learning-based computer-aided diagnosis (DL-CAD) system for nodule detection and classification with higher accuracy rates; the false-positive rate and less accurate nodule characterization are disadvantages; NLST and LIDC-IDRI datasets
22 | Teramoto et al. [40] | Deep convolutional neural network using microscopic images as input; obtained an accuracy of 70%, comparable to that of pathologists; dataset of 76 cases of cancer cells
23 | Dou et al. [39] | 3D convolutional neural network architecture; computer-aided lung nodule detection method for reducing false-positive rates; LUNA-16 dataset
24 | Asuntha et al. [44] | Fuzzy particle swarm optimization CNN (FPSOCNN) model; uses the best feature extraction methods for locating cancerous lung nodules; reduced the computational complexity of the CNN; datasets: LIDC-IDRI, Arthi Scan Hospital
25 | Nasrullah et al. [45] | Deep 3D customized mixed link network (CMixNet) architecture; aims to reduce false-positive rates with less computational cost; considers several clinical features for diagnosing nodule malignancy levels, making the classification process challenging; datasets: LIDC-IDRI, LUNA-16
26 | Shakeel et al. [46] | Improved deep neural network (DNN) and ensemble classifier model; uses optimized image processing and deep learning to predict lung cancer with an accuracy of 96.2%; can be optimized by including more datasets; dataset from the cancer imaging archive (CIA)
The LUNA-16 (LUng Nodule Analysis) dataset [27] is another resource for lung segmentation, consisting of 1186 lung nodules annotated in 888 CT scans. Another dataset is the National Lung Screening Trial (NLST), a randomized controlled clinical trial of screening tests for lung cancer that compared two ways of detecting lung cancer [28]: low-dose helical computed tomography (CT) and the standard chest X-ray. Almost 54,000 participants enrolled in the period August 2002-April 2004. The Automatic Nodule Detection (ANODE09) dataset [29] was provided by the NELSON study, the largest CT lung cancer screening trial in Europe, and consists of around 55 CT scans. Data Science Bowl 2017 is a database [30] of CT scans that was created
in 2014 as a social-good competition and was presented by Booz Allen Hamilton and Kaggle. SPIE-AAPM-NCI LungX [31] and the Danish Lung Cancer Screening Trial (DLCST) [32] are some other datasets for lung cancer detection.
4 Deep Learning in Lung Cancer Detection Deep learning, or deep structured learning, is a subfield of machine learning based on artificial neural networks, with multiple layers that extract higher-level features from the input data. Different deep learning architectures, such as deep neural networks, recurrent neural networks, deep belief networks, deep reinforcement learning, and convolutional neural networks, are being widely applied in fields like computer vision, bioinformatics, speech recognition, medical image analysis, and natural language processing, producing results comparable to or better than human expert performance. The deep learning system we experience today was pioneered by Frank Rosenblatt; Rina Dechter introduced the term deep learning to the machine learning community in 1986, and Igor Aizenberg introduced it to artificial neural networks in 2000. The efficiency of deep learning has increased its application in medically assisted diagnosis and can save the lives of many patients. Deep learning studies for the detection of lung nodules (a small mass of tissue in the lung that appears as a round white shadow in a chest X-ray or CT scan) are mainly focused on lung CT images. By observing nodule textures in CT images, doctors can clearly identify the risk stage of lung cancer in their patients. In the training phase, deep learning can perform end-to-end detection by learning the most important features, given an ample training set. The property of variability enables the system to analyze invariant features from malignant nodules and gives a better performance; the trained network generalizes its learning and predicts malignant SPNs on new cases it has never seen before. Deep learning is applied to medical image analysis through several steps: lung CT image acquisition, data preprocessing, lung segmentation, and pulmonary nodule detection, followed by classification into normal, benign, and malignant nodules for lung cancer prediction [35]. Figure 3 shows the steps involved in the deep learning process for lung cancer analysis. The Lung Image Database Consortium (LIDC) and Image Database Resource Initiative (IDRI) provide the most commonly used public dataset for researchers in medical imaging analysis. Figure 4a is a 2D CT scan in Digital Imaging and Communications in Medicine (DICOM) format available in the LIDC dataset. To eliminate the initial noise and distortions in the CT scan image, preprocessing of the most relevant data is performed, thereby enhancing the features in the image. The pixel values are converted into Hounsfield Units (HU), a quantitative scale for radiodensity [19], for image enhancement. The HU reference for lung is -500, and it is used to mask all portions of CT scans whose HU measures are not consistent with lung tissue. Lung parenchymal segmentation is carried out after resampling and 3D plotting [20]. A sketch of this conversion is given below.
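A minimal sketch of the HU conversion described above using pydicom; the file name is illustrative, and the -500 HU cut-off follows the reference value quoted in the text.

```python
import numpy as np
import pydicom

# Read one CT slice in DICOM format (illustrative file name)
ds = pydicom.dcmread("slice_0001.dcm")

# Map raw pixel values to Hounsfield Units with the stored slope/intercept
hu = ds.pixel_array.astype(np.float32) * float(ds.RescaleSlope) \
     + float(ds.RescaleIntercept)

# Mask out everything whose radiodensity is not consistent with lung tissue
lung_mask = hu < -500
```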
Fig. 3 Deep learning system for lung cancer detection
Fig. 4 a 2D CT scan slice which shows lung parenchyma as a dark region and surrounding tissue with a higher intensity. b The region marked in red is a nodule and that marked in green is a non-nodule
52 A Review on Early Diagnosis of Lung Cancer from CT Images Using …
665
Image modification such as lung segmentation is applied to identify the Region of Interest (ROI) so that detection can be performed within this region. The non-homogeneity in lung regions makes this task a challenging one [21]. In a lung CT image, the brighter parts are the blood vessels and surrounding tissue, while the darker region is the air-filled lung parenchyma. Our ROI is the lung parenchyma, and its proper segmentation improves the accuracy and reduces the detection time. Several pieces of research are being carried out to improve the segmentation methods and increase this accuracy. Figure 4a shows lung parenchyma [18]. There are mainly two types of lung segmentation methods: the first uses the HU value, and the other uses a binary deformation model such as the level set method or the snake method. In [22], a threshold method is used to binarize images, and the morphological opening operation provides edge compensation in the lung parenchyma. Researchers in [23] used region-based segmentation, fuzzy connectivity, and optimal key point analysis for an optimally segmented lung image. A wavelet transform theory is used by the authors in [24], which gives great results for lung parenchyma that are irregularly shaped. The presence of a nodule is shown in Fig. 4b, and Fig. 5 depicts the nodule detection stages from an input CT slice. A two-stage method is commonly used for pulmonary nodule detection [25]. The first step is candidate detection, and the second is the classification of nodules into true- and false-positive nodules. The most common deep learning models for nodule detection are R-CNN, Faster R-CNN, U-Net, etc. True/false-positive nodule classification can be performed using convolutional neural networks. Su in [49] proposed a solitary pulmonary nodule detection scheme with the Faster R-CNN framework as shown in Fig. 6. A patch-based 3D U-Net depicted in Fig. 7 was proposed by Zhao et al. [50] for automatic nodule segmentation and classification. After successful detection of pulmonary nodules, it is necessary to predict the shape, size, and other features of the nodules to diagnose whether each nodule is benign, normal, or malignant. Deep learning uses a sequence of candidate nodules to train a deep neural network. This phase is called the training phase, where candidate approaches such as deep learning algorithms, transfer learning, and ensemble models are evaluated and the best is selected based on the available data.
Fig. 5 Result of nodule detection from an input CT slice [18]
Fig. 6 Faster R-CNN framework for nodule identification process [49]
Fig. 7 Lung nodule segmentation and classification using 3D U-Net architecture [50]
Deep belief networks (DBN), recurrent neural networks (RNN), and convolutional neural networks (CNN) are examples of deep learning algorithms. Transfer learning is a machine learning method that focuses on the transfer or sharing of knowledge among models. The ensemble deep learning approach combines several deep learning models into one model to improve prediction accuracy and yields a more stable system with less noise compared to a single model. Well-known advanced ensemble techniques include boosting, bagging, and stacking. The final stage of a deep learning system is the classification phase, in which the trained model predicts to which class a particular image belongs, i.e., whether the given lung CT image indicates a normal, benign, or malignant lung. A system that predicts lung cancer through preprocessing, segmentation, and classification is presented in Fig. 8 [34].
Fig. 8 Machine-supported system for lung cancer prediction [34]
Finally, different performance metrics such as sensitivity, specificity, F-score, accuracy, FROC [48], precision, the receiver operating characteristic (ROC) curve, the area under the curve (AUC), and the Competition Performance Metric (CPM) are used to analyze the output of the developed deep learning model.
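As a hedged illustration of how these metrics are computed in practice, the snippet below evaluates a toy set of predictions with scikit-learn; the labels and scores are placeholders, not results from any cited model.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, f1_score, roc_auc_score

y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])                   # ground-truth nodule labels
y_score = np.array([0.9, 0.2, 0.7, 0.4, 0.1, 0.6, 0.8, 0.3])  # model scores
y_pred = (y_score >= 0.5).astype(int)                         # thresholded predictions

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
sensitivity = tp / (tp + fn)                                  # true-positive rate (recall)
specificity = tn / (tn + fp)
precision = tp / (tp + fp)
accuracy = (tp + tn) / (tp + tn + fp + fn)
print(sensitivity, specificity, precision, accuracy,
      f1_score(y_true, y_pred),                               # F-score
      roc_auc_score(y_true, y_score))                         # area under the ROC curve (AUC)
```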
5 Conclusion

The importance of deep learning for early lung cancer prediction was discussed in this article. The paper summarized different existing deep learning algorithms along with their advantages, and the survey shows that deep learning plays an important part in serving the health community by providing fast and accurate early prediction of disease at a low cost, saving many lives. Most models are based on convolutional neural networks (CNN), and some other researchers use hybrid and traditional machine learning techniques. Methods that utilize CNN and hybrid classifiers outperform others in carcinoma prediction with higher accuracy. Even though several researchers have shown significant results in their deep learning algorithm accuracy, there is an unmet need for resolving outstanding challenges in lung cancer diagnosis. One of them is to design an algorithm to detect pulmonary nodules of size less than 3 mm. Ensemble deep learning methods can help address this challenge in the future. Acknowledgements The authors are indebted to the Kerala State Council for Science, Technology and Environment (KSCSTE), Kerala, India, for the funding support under the grant number KSCSTE/972/2018-FSHP-MAIN. The authors are grateful to LBS Institute of Technology for Women, Kerala, India, for providing the infrastructure and library resources.
References
1. Abdullah DM, Ahmed NS (2021) A review of most recent lung cancer detection techniques using machine learning. Int J Sci Bus 5(3):159–173
2. Alakwaa W, Nassef M, Badr A (2017) Lung cancer detection and classification with 3D convolutional neural network (3D-CNN). Lung Cancer 8(8):409
3. Aonpong P, Iwamoto Y, Han XH, Lin L, Chen YW (2021) Genotype-guided radiomics signatures for recurrence prediction of non-small cell lung cancer. IEEE Access 9:90244–90254
4. Armato III SG, McLennan G, Bidaut L, McNitt-Gray MF, Meyer C, Reeves AP, Zhao B, Aberle DR, Henschke CI, Hoffman EA et al (2015) Data from LIDC-IDRI. https://wiki.cancerimagingarchive.net/display/Public/LIDC-IDRI
5. Armato III SG, Hadjiiski L, Tourassi GD, Drukker K, Giger ML, Li F, Redmond G, Farahani K, Kirby JS, Clarke LP (2015) SPIE-AAPM-NCI lung nodule classification challenge dataset. Cancer Imaging Arch 10:K9
6. Asuntha A, Srinivasan A (2020) Deep learning for lung cancer detection and classification. Multimedia Tools Appl 79(11):7731–7762
7. Besbes A, Paragios N (2011) Landmark-based segmentation of lungs while handling partial correspondences using sparse graph-based priors. In: 2011 IEEE international symposium on biomedical imaging: from nano to macro. IEEE, pp 989–995
8. Cai L, Long T, Dai Y, Huang Y (2020) Mask R-CNN-based detection and segmentation for pulmonary nodule 3D visualization diagnosis. IEEE Access 8:44400–44409
9. Chen W, Wei H, Peng S, Sun J, Qiao X, Liu B (2019) HSN: hybrid segmentation network for small cell lung cancer segmentation. IEEE Access 7:75591–75603
10. Data Science Bowl (2017) Available online: https://www.kaggle.com/c/data-science-bowl-2017
11. Danish Lung Cancer Screening Trial (DLCST)—Full-text view—ClinicalTrials.gov. Available online: https://clinicaltrials.gov/ct2/show/NCT00496977
12. Dou Q, Chen H, Yu L, Qin J, Heng PA (2016) Multilevel contextual 3-D CNNs for false positive reduction in pulmonary nodule detection. IEEE Trans Biomed Eng 64(7):1558–1567
13. Essaf F, Li Y, Sakho S, Kiki MJM (2019) Review on deep learning methods used for computer-aided lung cancer detection and diagnosis. In: Proceedings of the 2019 2nd international conference on algorithms, computing and artificial intelligence, China, pp 104–111
14. Guo H, Kruger U, Wang G, Kalra MK, Yan P (2019) Knowledge-based analysis for mortality prediction from CT images. IEEE J Biomed Health Inform 24(2):457–464
15. Hosseini H, Monsefi R, Shadroo S (2022) Deep learning applications for lung cancer diagnosis: a systematic review. arXiv preprint arXiv:2201.00227
16. Jakimovski G, Davcev D (2019) Using double convolution neural network for lung cancer stage detection. Appl Sci 9(3):427
17. Jin H, Li Z, Tong R, Lin L (2018) A deep 3D residual CNN for a false-positive reduction in pulmonary nodule detection. Med Phys 45(5):2097–2107
18. Kang G, Liu K, Hou B, Zhang N (2017) 3D multi-view convolutional neural networks for lung nodule classification. PLoS ONE 12(11):e0188290
19. Khehrah N, Farid MS, Bilal S, Khan MH (2020) Lung nodule detection in CT images using statistical and shape-based features. J Imaging 6(2):6
20. Kuan K, Ravaut M, Manek G, Chen H, Lin J, Nazir B, Chen C, Howe TC, Zeng Z, Chandrasekhar V (2017) Deep learning for lung cancer detection: tackling the Kaggle data science bowl 2017 challenge. arXiv preprint arXiv:1705.09435
21. Lee IJ, Gamsu G, Czum J, Wu N, Johnson R, Chakrapani S (2005) Lung nodule detection on chest CT: evaluation of a computer-aided detection (CAD) system. Korean J Radiol 6(2):89–93
22. Li W, Nie SD, Cheng JJ (2007) A fast automatic method of lung segmentation in CT images using mathematical morphology. In: World congress on medical physics and biomedical engineering 2006. Springer, Berlin, Heidelberg, pp 2419–2422
23. Li L, Liu Z, Huang H, Lin M, Luo D (2019) Evaluating the performance of a deep learning-based computer-aided diagnosis (DL-CAD) system for detecting and characterizing lung nodules: comparison with the performance of double reading by radiologists. Thoracic Cancer 10(2):183–192
24. Liu Y, Hao P, Zhang P, Xu X, Wu J, Chen W (2018) Dense convolutional binary-tree networks for lung nodule classification. IEEE Access 6:49080–49088
25. Liu L, Dou Q, Chen H, Qin J, Heng PA (2019) Multi-task deep model with margin ranking loss for lung nodule analysis. IEEE Trans Med Imaging 39(3):718–728
26. Mansoor A, Bagci U, Xu Z, Foster B, Olivier KN, Elinoff JM, Suffredini AF, Udupa JK, Mollura DJ (2014) A generic approach to pathological lung segmentation. IEEE Trans Med Imaging 33(12):2293–2310
27. Masood A, Yang P, Sheng B, Li H, Li P, Qin J, Lanfranchi V, Kim J, Feng DD (2019) Cloud-based automated clinical decision support system for detection and diagnosis of lung cancer in chest CT. IEEE J Transl Eng Health Med 8:1–13
28. Nasrullah N, Sang J, Alam MS, Mateen M, Cai B, Hu H (2019) Automated lung nodule detection and classification using deep learning combined with multiple strategies. Sensors 19(17):3722
29. Nasser IM, Abu-Naser SS (2019) Lung cancer detection using artificial neural network. Int J Eng Inf Syst (IJEAIS) 3(3):17–23
30. Ozdemir O, Russell RL, Berlin AA (2019) A 3D probabilistic deep learning system for detection and diagnosis of lung cancer using low-dose CT scans. IEEE Trans Med Imaging 39(5):1419–1429
31. Pang S, Zhang Y, Ding M, Wang X, Xie X (2019) A deep model for lung cancer type identification by densely connected convolutional networks and adaptive boosting. IEEE Access 8:4799–4805
32. Powers DM (2020) Evaluation: from precision, recall and F-measure to ROC, informedness, markedness and correlation. arXiv preprint arXiv:2010.16061
33. Redmon J, Farhadi A (2018) YOLOv3: an incremental improvement. arXiv preprint arXiv:1804.02767
34. Saba T (2020) Recent advancement in cancer detection using machine learning: systematic survey of decades, comparisons and challenges. J Infect Public Health 13(9):1274–1289
35. Sahu P, Yu D, Dasari M, Hou F, Qin H (2018) A lightweight multi-section CNN for lung nodule classification and malignancy estimation. IEEE J Biomed Health Inform 23(3):960–968
36. Setio AAA, Traverso A, De Bel T, Berens MS, Van Den Bogaard C, Cerello P, Chen H, Dou Q, Fantacci ME, Geurts B, van der Gugten R (2017) Validation, comparison, and combination of algorithms for automatic detection of pulmonary nodules in computed tomography images: the LUNA16 challenge. Med Image Anal 42:1–13
37. Shakeel PM, Burhanuddin MA, Desa MI (2020) Automatic lung cancer detection from CT image using improved deep neural network and ensemble classifier. Neural Comput Appl, pp 1–14
38. Shakeel PM, Burhanuddin MA, Desa MI (2019) Lung cancer detection from CT image using improved profuse clustering and deep learning instantaneously trained neural networks. Measurement 145:702–712
39. Shanbhag GA, Prabhu KA, Reddy NS, Rao BA (2022) Prediction of lung cancer using ensemble classifiers. In: Journal of physics: conference series 2161(1):012007. IOP Publishing
40. Shojaii R, Alirezaie J, Babyn P (2007) Automatic segmentation of abnormal lung parenchyma utilizing wavelet transform. In: 2007 IEEE international conference on acoustics, speech and signal processing-ICASSP'07, USA. IEEE, vol 1, pp I-1217
41. Silva F, Pereira T, Morgado J, Frade J, Mendes J, Freitas C, Negrão E, De Lima BF, Da Silva MC, Madureira AJ, Ramos I (2021) EGFR assessment in lung cancer CT images: analysis of local and holistic regions of interest using deep unsupervised transfer learning. IEEE Access 9:58667–58676
42. Teramoto A, Tsukamoto T, Kiriyama Y, Fujita H (2017) Automated classification of lung cancer types from cytological images using deep convolutional neural networks. BioMed Res Int
43. Trial Summary—Learn—NLST—The cancer data access system. Available online: https://biometry.nci.nih.gov/cdas/learn/nlst/trial-summary/
44. Van Ginneken B, Armato III SG, de Hoop B, van Amelsvoort-van de Vorst S, Duindam T, Niemeijer M, Murphy K, Schilham A, Retico A, Fantacci ME, Camarlinghi N (2010) Comparing and combining algorithms for computer-aided detection of pulmonary nodules in computed tomography scans: the ANODE09 study. Med Image Anal 14(6):707–722
45. Yu H, Zhou Z, Wang Q (2020) Deep learning assisted predict of lung cancer on computed tomography images using the adaptive hierarchical heuristic mathematical model. IEEE Access 8:86400–86410
46. Zeb I, Li D, Nasir K, Katz R, Larijani VN, Budoff MJ (2012) Computed tomography scans in the evaluation of fatty liver disease in a population-based study: the multi-ethnic study of atherosclerosis. Acad Radiol 19(7):811–818
47. Zhang Q, Kong X (2020) Design of automatic lung nodule detection system based on multi-scene deep learning framework. IEEE Access 8:90380–90389
48. Zhang C, Sun X, Dang K, Li K, Guo XW, Chang J, Yu ZQ, Huang FY, Wu YS, Liang Z, Liu ZY (2019) Toward an expert level of lung cancer detection and classification using a deep convolutional neural network. Oncologist 24(9):1159–1165
49. Zhao C, Han J, Jia Y, Gou F (2018) Lung nodule detection via 3D U-Net and contextual convolutional neural network. In: 2018 international conference on networking and network applications (NaNA), China. IEEE, pp 356–361
50. Zheng J, Yang D, Zhu Y, Gu W, Zheng B, Bai C, Zhao L, Shi H, Hu J, Lu S, Shi W (2020) Pulmonary nodule risk classification in adenocarcinoma from CT images using deep CNN with scale transfer module. IET Image Proc 14(8):1481–1489
Chapter 53
A Context-Based Approach to Teaching Dynamic Programming András Kakucs , Zoltán Kátai , and Katalin Harangus
1 Introduction

In the dynamically evolving world of recent decades, it has become apparent to professionals that more emphasis needs to be placed on the development of computational thinking. Not only is it essential for the labor market to train students who can be effective in complex problem situations, but it is also necessary to develop skills to understand how the digital world works and what lies beneath the surface of the information society. Recent studies indicate that teaching programming skills occupies a central role in developing computational thinking skills in numerous fields of science. Although this statement is more relevant to higher education, the importance of teaching programming at the high-school level is becoming more and more evident, and there are initiatives for introducing it in primary education as well. On the other hand, decades of experience show that teaching programming in an effective manner is challenging even in the case of university students. A context-based approach is an effective way to teach programming. It assumes that we place the basic elements and theorems of programming in a context that is close to the target group. It was observed that using a teaching tool that helps one see through a sequence of actions and understand planning in order to solve a problem speeds up the process of understanding [1]. A. Kakucs (B) · Z. Kátai · K. Harangus Faculty of Technical and Human Sciences, Sapientia Hungarian University of Transylvania, O.P. 9, C.P. 4, 540485 Târgu-Mureș, Romania e-mail: [email protected] Z. Kátai e-mail: [email protected] K. Harangus e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 R. P. Yadav et al. (eds.), Proceedings of the International Conference on Paradigms of Computing, Communication and Data Sciences, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-19-8742-7_53
In our research, we focused on developing students' computational thinking through the development of context-based algorithmic visualization teaching tools and task sets. We have developed two types of visualization tools for understanding algorithms: a hardware-based tool and a software-based one. This study presents the tools developed along with the task sequences that visualize the algorithms.
2 Algorithm Visualization in Programming Education

Teaching algorithms with visualization tools can be an effective way to teach computer science. Their role and place in education have already been demonstrated in a number of studies [2–4]. Visualization of algorithms helps the instructor present the algorithm on the one hand [5–7], and it helps students understand the operation and design strategies of the algorithms on the other [8, 9]. The method provides an opportunity to answer and solve the problems that arise [10]. Algorithm visualization has now evolved into a complex structure of interaction systems through which students can not only learn how algorithms work but can also change them or create their own visualizations [4, 11]. Törley [12] puts it this way: "Algorithm visualization (AV) is a subclass of software visualization, and it handles the illustration of high-level mechanisms of computer algorithms, usually in order to help the pupils understand the function of the procedures of the algorithm better." When we simulate an algorithm in education using some visualization tool, we can convey more than just studying the text of the algorithm. Mayer calls this effect the multimedia effect. He formulates five principles for multichannel teaching in the "Cognitive Theory of Multimedia Learning" [13], as cited in [12]:

1. Multiple Representation Principle: Teaching is more effective if we give our presentation not only in a traditional way, but also with elements of multimedia.
2. Contiguity Principle: The elements of the graphic illustration should be displayed at the same time as the text of the presentation, not separately.
3. Split-Attention Principle: During the presentation, in addition to the visual material, the explanation should not be presented visually, but conveyed audibly.
4. Individual Differences Principle: The effects of multimedia can vary depending on the prior knowledge students have.
5. Coherence Principle: Use as few irrelevant concepts or images as possible in your multimedia explanation.

Algorithm visualization tools can be divided into different levels and classes depending on how they can be changed and on how much and at what level they communicate with the user while running the visualization. Considering the supplemental taxonomy defined by Myller et al. [14], Bende [15] categorized the types of tasks that help programming instruction with algorithm visualization and animation, according to the level of interactivity:

1. Responding: The level of interactivity only extends to the assessment of knowledge, not its acquisition.
2. Changing: Tasks provide an opportunity to modify the visualization.
3. Constructing: Students can create their own algorithms as well as their associated visualizations.
4. Presenting: During interactivity, students create their own visualization of the algorithm and present it to their peers.

According to Bende [15], we can also link the live representations of each task to algorithm visualization. A good example of this is the book CSUnplugged, which has task-specific descriptions, usability notes, and materials that make the operation and use of the algorithm realistic [16]. The book Algorithms Unplugged also presents similar tasks with more complex algorithms [17]. The approach is similar in both books: its goal is to motivate students to learn programming by presenting algorithms with detailed explanations and a variety of visualization examples. The most common visualization tools are software based. For example, in the Flexbox Froggy [18] teaching game, the student can easily learn some basic commands in a CSS environment. The game consists of 24 levels. Its visual interface is colorful and compelling, but a drawback is that the source code cannot be changed by the student. Grid Garden [19] can also be used to teach CSS programming. It is similar to the first game regarding its operation and structure. Esteves et al. [20] have developed an online role-playing game that can be used for teaching the C, C++, and C# programming languages. In the game called Second Life, each student is given a character (or, as Esteves puts it, an avatar), and they must complete tasks in the game world as in the real world. However, in order to accomplish this, they need programming skills. Researchers at the University of Joensuu (Finland) have developed a Java-based algorithm visualization system. Jeliot 3 [21] was designed for beginner programmers; in it, animations represent the step-by-step execution of Java programs. The animation of the steps simulates how the program code is interpreted. Students learn to code while being able to follow its operation through animation. Also, one of the strengths of algorithm visualization may be the involvement of multiple senses in learning. Through their studies [22–24], Kátai et al. confirm the hypothesis that a teaching method that affects different sensory organs effectively supports the teaching and learning of algorithms. Building on the basic idea of involving more senses in programming education, the AlgoRhythmics project was born. In 2011, six videos were published, with four more following in 2018, in which computer algorithms were illustrated with different dance styles [25]. As its name suggests, this educational strategy is a blend of algorithm (informatics) and rhythm (dance), that is, science and art. Its uniqueness lies in the fact that it uses different algorithms and their associated courses to help users develop algorithmic thinking by guiding them from dance to code. All this is made possible by five basic learning steps: the phases of video, animation, orchestrating, code building, and the code brought to life [26]. It is more difficult to find hardware-based illustrative tools for the visualization of algorithms. This involves not only sharing them on the Internet, as with software-based tools, but also manufacturing them in series and making them available to
users. The institution or the teacher needs to obtain the right tool to visualize the algorithm, and, at the same time, students have to share the use of the tools. The Tower of Hanoi mathematical game is a hardware-based approach to illustrating a recursive algorithm. The game has donut-shaped disks of different sizes and colors, and three support bars attached to a common base on which the disks can be placed. The essence of the Tower of Hanoi puzzle is that disks stacked in a pyramid-like descending order must be moved from one rod to another so that only one disk can be transferred in each step; a larger disk may not be placed on a smaller disk, and all three bars are available for use.
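As a software counterpart to such a hardware demonstration, the puzzle's recursive solution can be written in a few lines; the sketch below is illustrative and is not part of the tools developed in this study.

```python
def hanoi(n, source, target, auxiliary, moves):
    """Move n disks from source to target without placing a larger disk on a smaller one."""
    if n == 0:
        return
    hanoi(n - 1, source, auxiliary, target, moves)  # clear the way for the largest disk
    moves.append((source, target))                  # move the largest free disk
    hanoi(n - 1, auxiliary, target, source, moves)  # restack the smaller disks on top

moves = []
hanoi(3, "A", "C", "B", moves)
print(len(moves), moves)  # 7 moves, i.e., 2**3 - 1
```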
3 The Two Selected Tasks

In this study, dynamic programming was chosen as the algorithm family for visualization. According to students, this is one of the most difficult parts of our Computer Science course, but at the same time, it is a currently applied problem-solving method that results in efficient algorithms for many problems. Its essence is that it breaks down the initial problem into subproblems and expresses the solution of the initial problem with the solutions of the subproblems. The following conditions were formulated for the development of the tools: 1. The algorithm should be built "from the bottom up" by solving the subtasks in the correct order from the basic case to the original problem. As a first step, the subtasks had to be defined; the data structure (one- or two-dimensional array) had to be selected, and efforts had to be made to minimize the number of operations to be performed and the amount of data to be stored. 2. The subtasks should be arranged in such a way that when solving the current subproblem, all the others on which it depends are already calculated. In our research, assembling and programming the hardware-based devices proved the greatest challenge. The following were used to make them: an LED strip of 20 LEDs, a 10 × 10 LED array with all its accessories, Arduino Nano programmable microcontrollers, and other accessories. The LED strips consist of individually programmable RGB LEDs. Each LED contains a programmable microcircuit that receives and, if necessary, transmits the color code to be displayed to the next LED. This makes data transfer (adjusting the color of the LEDs) simpler and faster, which also improves the visual effect. Each of the visualization tools is a complex circuit. With the help of a button, we can control the algorithm running on them. The intervention is necessary to proceed, as the circuit does not automatically move to the next subtask after completing the previous one. Pausing the program provides an opportunity for the user to conveniently study and understand the algorithm. We implemented the operation of the algorithm on the Arduino so that we can follow it visually on the LED strip and on the matrix. An LED strip was used to illustrate the one-dimensional data, and an LED matrix was used to illustrate the two-dimensional data. For the software-based tool, we used
Blender, an open-source, free-to-use 3-dimensional graphic modeling and animation software. The following two types of tasks have been developed to illustrate the principles of dynamic programming: (1) Task for the algorithm implemented on the LED strip: Given n types of coins, the values of which are stored in an array w[1…n], and a sum X. Pay the amount X using a minimum number of coins (any number of coins of each type can be used). The algorithm for solving the problem can be generally described as follows. Although the goal is to pay the sum X, we solve the problem for all sum values i = 0, 1, …, X (from value to value). The optimal numbers of coins are stored in the array c[0…X]; for a given i, the minimum number of coins is stored in cell c[i]. For the current i, we calculate c[i] from the already available c[j] values (j ≤ i). Construction from the bottom up is done through a recursive formula that leads from optimum to optimum. For the case i = 0, the minimum number of coins is obviously 0 (c[0] = 0). The following question can lead to the formula: from which sums j (j < i) can the current sum i (i = 1…X) be constructed by adding a coin? Since this can be any of the n kinds of coins, the following formula describes it (a runnable sketch of this recurrence is given below):

c[i] = min{ c[i − w[k]] + 1 : k = 1…n, w[k] ≤ i }.
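The following compact bottom-up implementation is a sketch of this recurrence (it is not the authors' Arduino code); float('inf') marks sums that cannot be paid with the given coins.

```python
def min_coins(w, X):
    """Return the minimum number of coins needed to pay the sum X."""
    c = [0] + [float('inf')] * X              # base case: c[0] = 0
    for i in range(1, X + 1):                 # solve every subproblem 1..X in order
        for wk in w:
            if wk <= i and c[i - wk] + 1 < c[i]:
                c[i] = c[i - wk] + 1          # c[i] = min over k of c[i - w[k]] + 1
    return c[X]

print(min_coins([1, 3, 4], 6))                # 2, since 6 = 3 + 3
```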
54 On the Applicability of Possible Theory-Based Approaches for Ranking …

For two generalized L−R fuzzy numbers Ã1 = (a1, b1, m1, n1; ω1)L−R and Ã2 = (a2, b2, m2, n2; ω2)L−R, the approach of Qiupeng and Zuxing [9] ranks them as: (i) Ã1 ≻ Ã2 if Mag(Ã1) > Mag(Ã2), (ii) Ã1 ≺ Ã2 if Mag(Ã1) < Mag(Ã2), (iii) Ã1 ∼ Ã2 if Mag(Ã1) = Mag(Ã2), where

\[ \operatorname{Mag}(\tilde{A}_1) = \frac{a_1+b_1}{2} - \frac{m_1}{\omega_1^2}A + \frac{n_1}{\omega_1^2}B + \sqrt{(a_1-b_1)^2\frac{\omega_1^2}{4} + (b_1-a_1)(m_1A + n_1B) + \frac{1}{2}\left(m_1^2\,AA + 2m_1n_1\,AB + n_1^2\,BB\right)} \tag{1} \]

\[ \operatorname{Mag}(\tilde{A}_2) = \frac{a_2+b_2}{2} - \frac{m_2}{\omega_2^2}A' + \frac{n_2}{\omega_2^2}B' + \sqrt{(a_2-b_2)^2\frac{\omega_2^2}{4} + (b_2-a_2)(m_2A' + n_2B') + \frac{1}{2}\left(m_2^2\,A'A' + 2m_2n_2\,A'B' + n_2^2\,B'B'\right)} \tag{2} \]

with

\[ A = \int_0^{\omega_1} \lambda\,L^{-1}\!\left(\tfrac{\lambda}{\omega_1}\right)d\lambda,\quad B = \int_0^{\omega_1} \lambda\,R^{-1}\!\left(\tfrac{\lambda}{\omega_1}\right)d\lambda,\quad AA = \int_0^{\omega_1} \lambda\left[L^{-1}\!\left(\tfrac{\lambda}{\omega_1}\right)\right]^2 d\lambda,\quad BB = \int_0^{\omega_1} \lambda\left[R^{-1}\!\left(\tfrac{\lambda}{\omega_1}\right)\right]^2 d\lambda,\quad AB = \int_0^{\omega_1} \lambda\,L^{-1}\!\left(\tfrac{\lambda}{\omega_1}\right)R^{-1}\!\left(\tfrac{\lambda}{\omega_1}\right)d\lambda, \]

\[ A' = \int_0^{\omega_2} \lambda\,L^{-1}\!\left(\tfrac{\lambda}{\omega_2}\right)d\lambda,\quad B' = \int_0^{\omega_2} \lambda\,R^{-1}\!\left(\tfrac{\lambda}{\omega_2}\right)d\lambda,\quad A'A' = \int_0^{\omega_2} \lambda\left[L^{-1}\!\left(\tfrac{\lambda}{\omega_2}\right)\right]^2 d\lambda,\quad B'B' = \int_0^{\omega_2} \lambda\left[R^{-1}\!\left(\tfrac{\lambda}{\omega_2}\right)\right]^2 d\lambda,\quad A'B' = \int_0^{\omega_2} \lambda\,L^{-1}\!\left(\tfrac{\lambda}{\omega_2}\right)R^{-1}\!\left(\tfrac{\lambda}{\omega_2}\right)d\lambda. \]

Qiupeng and Zuxing [9, Sec. 3, Theorem 3.3, pp. 678] proved that if Ã1 and Ã2 are two generalized trapezoidal fuzzy numbers, then the same approach can be used by considering
\[ \operatorname{Mag}(\tilde{A}_1) = \frac{3a_1+3b_1-m_1+n_1}{6} + \sqrt{\frac{(b_1-a_1)^2}{4} + \frac{(b_1-a_1)(m_1+n_1)}{6} + \frac{(m_1+n_1)^2}{24}}\;\omega_1 \tag{3} \]

and

\[ \operatorname{Mag}(\tilde{A}_2) = \frac{3a_2+3b_2-m_2+n_2}{6} + \sqrt{\frac{(b_2-a_2)^2}{4} + \frac{(b_2-a_2)(m_2+n_2)}{6} + \frac{(m_2+n_2)^2}{24}}\;\omega_2. \tag{4} \]
Qiupeng and Zuxing [9, Sec. 3, Theorem 3.4, pp. 678] also proved that if Ã1 and Ã2 are two generalized triangular fuzzy numbers, then the same approach can be used by considering

\[ \operatorname{Mag}(\tilde{A}_1) = \frac{6a_1-m_1+n_1}{6} + \frac{1}{\sqrt{24}}(m_1+n_1)\,\omega_1 \tag{5} \]

and

\[ \operatorname{Mag}(\tilde{A}_2) = \frac{6a_2-m_2+n_2}{6} + \frac{1}{\sqrt{24}}(m_2+n_2)\,\omega_2. \tag{6} \]
3 Invalidity of the Existing Ranking Approaches

It is a well-known fact that if, on applying a ranking approach for the fuzzy numbers Ã1 and Ã2, the relation Ã1 ≻ Ã2 holds, then on applying the same ranking approach for the fuzzy numbers −Ã1 and −Ã2, the relation −Ã1 ≺ −Ã2 should also hold; otherwise the considered ranking approach is not valid. Although Qiupeng and Zuxing [9, Sec. 4, Example 4, pp. 680] considered a numerical example to show that on applying their proposed ranking approach for the generalized triangular fuzzy numbers Ã1 = (0.3, 0.1, 0.2; 1) and Ã2 = (0.4, 0.15, 0.3; 1), the relation Ã1 ≻ Ã2 holds, and on applying the same approach for the fuzzy numbers −Ã1 = (−0.3, 0.2, 0.1; 1) and −Ã2 = (−0.4, 0.3, 0.15; 1), the relation −Ã1 ≺ −Ã2 holds, the following clearly indicates that this is not always true, i.e., the ranking approaches proposed by Qiupeng and Zuxing [9; Sec. 3; Theorem 3.2, pp. 676; Theorem 3.3, pp. 678; Theorem 3.4, pp. 678] cannot be used for the ranking of all generalized fuzzy numbers. (1) If Ã1 = (a1, b1, m1, n1; ω1)L−R and Ã2 = (a2, b2, m2, n2; ω2)L−R are two generalized L−R fuzzy numbers such that
(i) a1 = a2 = a (let),
(ii) b1 = b2 = b (let),
(iii) m1 = m2 = n1 = n2 = m (let),
(iv) L(x) = R(x),

i.e., Ã1 = (a, b, m, m; ω1)L−L, Ã2 = (a, b, m, m; ω2)L−L and hence −Ã1 = (−b, −a, m, m; ω1)L−L, −Ã2 = (−b, −a, m, m; ω2)L−L, then using the existing expressions (1) and (2),

\[ \operatorname{Mag}(\tilde{A}_1) = \frac{a+b}{2} + \sqrt{(a-b)^2\frac{\omega_1^2}{4} + 2(b-a)mA + 2m^2\,AA} \tag{7} \]

\[ \operatorname{Mag}(\tilde{A}_2) = \frac{a+b}{2} + \sqrt{(a-b)^2\frac{\omega_2^2}{4} + 2(b-a)mA' + 2m^2\,A'A'} \tag{8} \]

\[ \operatorname{Mag}(-\tilde{A}_1) = \frac{-b-a}{2} + \sqrt{(a-b)^2\frac{\omega_1^2}{4} + 2(b-a)mA + 2m^2\,AA} \tag{9} \]

\[ \operatorname{Mag}(-\tilde{A}_2) = \frac{-b-a}{2} + \sqrt{(a-b)^2\frac{\omega_2^2}{4} + 2(b-a)mA' + 2m^2\,A'A'} \tag{10} \]
Subtracting Eq. (8) from Eq. (7)
\[ \operatorname{Mag}(\tilde{A}_1) - \operatorname{Mag}(\tilde{A}_2) = \sqrt{(a-b)^2\frac{\omega_1^2}{4} + 2(b-a)mA + 2m^2\,AA} - \sqrt{(a-b)^2\frac{\omega_2^2}{4} + 2(b-a)mA' + 2m^2\,A'A'} \tag{11} \]
Subtracting Eq. (10) from Eq. (9),

\[ \operatorname{Mag}(-\tilde{A}_1) - \operatorname{Mag}(-\tilde{A}_2) = \sqrt{(a-b)^2\frac{\omega_1^2}{4} + 2(b-a)mA + 2m^2\,AA} - \sqrt{(a-b)^2\frac{\omega_2^2}{4} + 2(b-a)mA' + 2m^2\,A'A'} \tag{12} \]
It is obvious from Eqs. (11) and (12) that Mag(Ã1) − Mag(Ã2) = Mag(−Ã1) − Mag(−Ã2). Therefore,
(a) Mag(Ã1) − Mag(Ã2) > 0 ⇒ Mag(−Ã1) − Mag(−Ã2) > 0, i.e., Mag(Ã1) > Mag(Ã2) ⇒ Mag(−Ã1) > Mag(−Ã2), i.e., Ã1 ≻ Ã2 ⇒ −Ã1 ≻ −Ã2, which is mathematically incorrect.
(b) Mag(Ã1) − Mag(Ã2) < 0 ⇒ Mag(−Ã1) − Mag(−Ã2) < 0, i.e., Mag(Ã1) < Mag(Ã2) ⇒ Mag(−Ã1) < Mag(−Ã2), i.e., Ã1 ≺ Ã2 ⇒ −Ã1 ≺ −Ã2, which is mathematically incorrect.
(2) If Ã1 = (a1, b1, m1, n1; ω1) and Ã2 = (a2, b2, m2, n2; ω2) are two generalized trapezoidal fuzzy numbers such that (i) a1 = a2 = a (let), (ii) b1 = b2 = b (let), (iii) m1 = m2 = m (let), (iv) n1 = n2 = n (let), i.e., Ã1 = (a, b, m, n; ω1), Ã2 = (a, b, m, n; ω2) and hence −Ã1 = (−b, −a, n, m; ω1), −Ã2 = (−b, −a, n, m; ω2), then according to the existing expressions (3) and (4),
\[ \operatorname{Mag}(\tilde{A}_1) = \frac{3a+3b-m+n}{6} + \sqrt{\frac{(b-a)^2}{4} + \frac{(b-a)(m+n)}{6} + \frac{(m+n)^2}{24}}\;\omega_1 \tag{13} \]

\[ \operatorname{Mag}(\tilde{A}_2) = \frac{3a+3b-m+n}{6} + \sqrt{\frac{(b-a)^2}{4} + \frac{(b-a)(m+n)}{6} + \frac{(m+n)^2}{24}}\;\omega_2 \tag{14} \]

\[ \operatorname{Mag}(-\tilde{A}_1) = \frac{-3a-3b+m-n}{6} + \sqrt{\frac{(b-a)^2}{4} + \frac{(b-a)(m+n)}{6} + \frac{(m+n)^2}{24}}\;\omega_1 \tag{15} \]

\[ \operatorname{Mag}(-\tilde{A}_2) = \frac{-3a-3b+m-n}{6} + \sqrt{\frac{(b-a)^2}{4} + \frac{(b-a)(m+n)}{6} + \frac{(m+n)^2}{24}}\;\omega_2 \tag{16} \]

Subtracting Eq. (14) from Eq. (13),

\[ \operatorname{Mag}(\tilde{A}_1) - \operatorname{Mag}(\tilde{A}_2) = \sqrt{\frac{(b-a)^2}{4} + \frac{(b-a)(m+n)}{6} + \frac{(m+n)^2}{24}}\;[\omega_1 - \omega_2] \tag{17} \]
Subtracting Eq. (16) from Eq. (15),

\[ \operatorname{Mag}(-\tilde{A}_1) - \operatorname{Mag}(-\tilde{A}_2) = \sqrt{\frac{(b-a)^2}{4} + \frac{(b-a)(m+n)}{6} + \frac{(m+n)^2}{24}}\;[\omega_1 - \omega_2] \tag{18} \]
It is obvious from Eqs. (17) and (18) that Mag(Ã1) − Mag(Ã2) = Mag(−Ã1) − Mag(−Ã2). Therefore,
(a) Mag(Ã1) − Mag(Ã2) > 0 ⇒ Mag(−Ã1) − Mag(−Ã2) > 0, i.e., Mag(Ã1) > Mag(Ã2) ⇒ Mag(−Ã1) > Mag(−Ã2), i.e., Ã1 ≻ Ã2 ⇒ −Ã1 ≻ −Ã2, which is mathematically incorrect.
(b) Mag(Ã1) − Mag(Ã2) < 0 ⇒ Mag(−Ã1) − Mag(−Ã2) < 0, i.e., Mag(Ã1) < Mag(Ã2) ⇒ Mag(−Ã1) < Mag(−Ã2), i.e., Ã1 ≺ Ã2 ⇒ −Ã1 ≺ −Ã2, which is mathematically incorrect.
(3) If Ã1 = (a1, m1, n1; ω1) and Ã2 = (a2, m2, n2; ω2) are two generalized triangular fuzzy numbers such that (i) a1 = a2 = a (let), (ii) m1 = m2 = m (let), (iii) n1 = n2 = n (let), i.e., Ã1 = (a, m, n; ω1), Ã2 = (a, m, n; ω2) and hence −Ã1 = (−a, n, m; ω1), −Ã2 = (−a, n, m; ω2), then according to the existing expressions (5) and (6),

\[ \operatorname{Mag}(\tilde{A}_1) = \frac{6a-m+n}{6} + \frac{1}{\sqrt{24}}(m+n)\,\omega_1 \tag{19} \]

\[ \operatorname{Mag}(\tilde{A}_2) = \frac{6a-m+n}{6} + \frac{1}{\sqrt{24}}(m+n)\,\omega_2 \tag{20} \]

\[ \operatorname{Mag}(-\tilde{A}_1) = \frac{-6a+m-n}{6} + \frac{1}{\sqrt{24}}(m+n)\,\omega_1 \tag{21} \]

\[ \operatorname{Mag}(-\tilde{A}_2) = \frac{-6a+m-n}{6} + \frac{1}{\sqrt{24}}(m+n)\,\omega_2 \tag{22} \]

Subtracting Eq. (20) from Eq. (19),
\[ \operatorname{Mag}(\tilde{A}_1) - \operatorname{Mag}(\tilde{A}_2) = \frac{1}{\sqrt{24}}(m+n)[\omega_1 - \omega_2] \tag{23} \]

Subtracting Eq. (22) from Eq. (21),

\[ \operatorname{Mag}(-\tilde{A}_1) - \operatorname{Mag}(-\tilde{A}_2) = \frac{1}{\sqrt{24}}(m+n)[\omega_1 - \omega_2] \tag{24} \]

It is obvious from Eqs. (23) and (24) that Mag(Ã1) − Mag(Ã2) = Mag(−Ã1) − Mag(−Ã2). Therefore,
(a) Mag(Ã1) − Mag(Ã2) > 0 ⇒ Mag(−Ã1) − Mag(−Ã2) > 0, i.e., Mag(Ã1) > Mag(Ã2) ⇒ Mag(−Ã1) > Mag(−Ã2), i.e., Ã1 ≻ Ã2 ⇒ −Ã1 ≻ −Ã2, which is mathematically incorrect.
(b) Mag(Ã1) − Mag(Ã2) < 0 ⇒ Mag(−Ã1) − Mag(−Ã2) < 0, i.e., Mag(Ã1) < Mag(Ã2) ⇒ Mag(−Ã1) < Mag(−Ã2), i.e., Ã1 ≺ Ã2 ⇒ −Ã1 ≺ −Ã2, which is mathematically incorrect.
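The triangular case (3) can also be checked numerically. The short sketch below implements Eqs. (5)–(6) and shows, for illustrative values (the numbers are assumptions, not taken from [9]), that the Mag-based order of two fuzzy numbers differing only in height ω is not reversed by negation.

```python
import math

def mag_triangular(a, m, n, w):
    """Mag of a generalized triangular fuzzy number (a, m, n; w), as in Eqs. (5)-(6)."""
    return (6 * a - m + n) / 6 + (m + n) * w / math.sqrt(24)

def neg(a, m, n, w):
    """Negation: -(a, m, n; w) = (-a, n, m; w)."""
    return (-a, n, m, w)

A1 = (2.0, 0.4, 0.6, 1.0)   # same a, m, n ...
A2 = (2.0, 0.4, 0.6, 0.5)   # ... but a different height w

print(mag_triangular(*A1) > mag_triangular(*A2))              # True: Mag(A1) > Mag(A2)
print(mag_triangular(*neg(*A1)) > mag_triangular(*neg(*A2)))  # also True: -A1 ranked above -A2, the inconsistency
```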
4 Validity of the Existing Ranking Approaches

It is obvious from Sect. 3 that the existing ranking approaches [9; Sec. 3; Theorem 3.2, pp. 676; Theorem 3.3, pp. 678; Theorem 3.4, pp. 678] are not valid for all types of generalized fuzzy numbers. In this section, those generalized fuzzy numbers are discussed for which the existing ranking approaches [9; Sec. 3; Theorem 3.2, pp. 676; Theorem 3.3, pp. 678; Theorem 3.4, pp. 678] will be valid.
4.1 Validity of the Existing Approach for the Ranking of Generalized L−R Fuzzy Numbers

In the existing approach for the ranking of generalized L−R fuzzy numbers [9, Sec. 3, Theorem 3.2, pp. 676], the expression Mag(Ã1) actually represents the sum of M(Ã1) and σ(Ã1), i.e., Mag(Ã1) = M(Ã1) + σ(Ã1), where

\[ M(\tilde{A}_1) = \frac{a_1+b_1}{2} - \frac{m_1}{\omega_1^2}A + \frac{n_1}{\omega_1^2}B \quad\text{and}\quad \sigma(\tilde{A}_1) = \sqrt{(a_1-b_1)^2\frac{\omega_1^2}{4} + (b_1-a_1)(m_1A + n_1B) + \frac{1}{2}\left(m_1^2\,AA + 2m_1n_1\,AB + n_1^2\,BB\right)}. \]

Similarly, Mag(Ã2) = M(Ã2) + σ(Ã2), where

\[ M(\tilde{A}_2) = \frac{a_2+b_2}{2} - \frac{m_2}{\omega_2^2}A' + \frac{n_2}{\omega_2^2}B' \quad\text{and}\quad \sigma(\tilde{A}_2) = \sqrt{(a_2-b_2)^2\frac{\omega_2^2}{4} + (b_2-a_2)(m_2A' + n_2B') + \frac{1}{2}\left(m_2^2\,A'A' + 2m_2n_2\,A'B' + n_2^2\,B'B'\right)}. \]

It can be easily verified that if Ã1 = (a1, b1, m1, n1; ω1)L−R and Ã2 = (a2, b2, m2, n2; ω2)L−R are two generalized L−R fuzzy numbers such that σ(Ã1) = σ(Ã2), then the relation Mag(Ã1) > Mag(Ã2) ⇒ Mag(−Ã1) < Mag(−Ã2) will always hold, i.e., the relation Ã1 ≻ Ã2 ⇒ −Ã1 ≺ −Ã2 will always hold. Hence, the existing approach [9, Sec. 3, Theorem 3.2, pp. 676] will always be valid only for such generalized L−R fuzzy numbers Ã1 and Ã2 for which the condition σ(Ã1) = σ(Ã2) is satisfied. It is pertinent to mention that if this condition is not satisfied, then the relation Ã1 ≻ Ã2 ⇒ −Ã1 ≺ −Ã2 may or may not hold. Hence, it is mathematically incorrect to use the existing approach [9, Sec. 3, Theorem 3.2, pp. 676] for the ranking of such generalized L−R fuzzy numbers for which the condition σ(Ã1) = σ(Ã2) is not satisfied.
4.2 Validity of the Existing Approach for the Ranking of Generalized Trapezoidal Fuzzy Numbers

In the existing approach for the ranking of generalized trapezoidal fuzzy numbers [9, Sec. 3, Theorem 3.3, pp. 678], the expression Mag(Ã1) actually represents the sum of M(Ã1) and σ(Ã1), i.e., Mag(Ã1) = M(Ã1) + σ(Ã1), where

\[ M(\tilde{A}_1) = \frac{3a_1+3b_1-m_1+n_1}{6} \quad\text{and}\quad \sigma(\tilde{A}_1) = \sqrt{\frac{(b_1-a_1)^2}{4} + \frac{(b_1-a_1)(m_1+n_1)}{6} + \frac{(m_1+n_1)^2}{24}}\;\omega_1. \]

Similarly, Mag(Ã2) = M(Ã2) + σ(Ã2), where

\[ M(\tilde{A}_2) = \frac{3a_2+3b_2-m_2+n_2}{6} \quad\text{and}\quad \sigma(\tilde{A}_2) = \sqrt{\frac{(b_2-a_2)^2}{4} + \frac{(b_2-a_2)(m_2+n_2)}{6} + \frac{(m_2+n_2)^2}{24}}\;\omega_2. \]

It can be easily verified that if Ã1 = (a1, b1, m1, n1; ω1) and Ã2 = (a2, b2, m2, n2; ω2) are two generalized trapezoidal fuzzy numbers such that σ(Ã1) = σ(Ã2), then the relation Mag(Ã1) > Mag(Ã2) ⇒ Mag(−Ã1) < Mag(−Ã2) will always hold, i.e., the relation Ã1 ≻ Ã2 ⇒ −Ã1 ≺ −Ã2 will always hold. Hence, the existing approach [9, Sec. 3, Theorem 3.3, pp. 678] will always be valid only for such generalized trapezoidal fuzzy numbers Ã1 and Ã2 for which the condition σ(Ã1) = σ(Ã2) is satisfied. It is pertinent to mention that if this condition is not satisfied, then the relation Ã1 ≻ Ã2 ⇒ −Ã1 ≺ −Ã2 may or may not hold. Hence, it is mathematically incorrect to use the existing approach [9, Sec. 3, Theorem 3.3, pp. 678] for the ranking of such generalized trapezoidal fuzzy numbers for which the condition σ(Ã1) = σ(Ã2) is not satisfied.
4.3 Validity of the Existing Approach for the Ranking of Generalized Triangular Fuzzy Numbers

In the existing approach for the ranking of generalized triangular fuzzy numbers [9, Sec. 3, Theorem 3.4, pp. 678], the expression Mag(Ã1) actually represents the sum of M(Ã1) and σ(Ã1), i.e., Mag(Ã1) = M(Ã1) + σ(Ã1), where

\[ M(\tilde{A}_1) = \frac{6a_1-m_1+n_1}{6} \quad\text{and}\quad \sigma(\tilde{A}_1) = \frac{1}{\sqrt{24}}(m_1+n_1)\,\omega_1. \]

Similarly, Mag(Ã2) = M(Ã2) + σ(Ã2), where

\[ M(\tilde{A}_2) = \frac{6a_2-m_2+n_2}{6} \quad\text{and}\quad \sigma(\tilde{A}_2) = \frac{1}{\sqrt{24}}(m_2+n_2)\,\omega_2. \]

It can be easily verified that if Ã1 = (a1, m1, n1; ω1) and Ã2 = (a2, m2, n2; ω2) are two generalized triangular fuzzy numbers such that σ(Ã1) = σ(Ã2), then the relation Mag(Ã1) > Mag(Ã2) ⇒ Mag(−Ã1) < Mag(−Ã2) will always hold, i.e., the relation Ã1 ≻ Ã2 ⇒ −Ã1 ≺ −Ã2 will always hold. Hence, the existing approach [9, Sec. 3, Theorem 3.4, pp. 678] will always be valid only for such generalized triangular fuzzy numbers Ã1 and Ã2 for which the condition σ(Ã1) = σ(Ã2) is satisfied. It is pertinent to mention that if this condition is not satisfied, then the relation Ã1 ≻ Ã2 ⇒ −Ã1 ≺ −Ã2 may or may not hold. Hence, it is mathematically incorrect to use the existing approach for the ranking of such generalized triangular fuzzy numbers for which the condition σ(Ã1) = σ(Ã2) is not satisfied.
5 Conclusion

An existing possibility theory-based approach for ranking generalized L-R fuzzy numbers was discussed. It was pointed out that the discussed approach cannot, in general, be used to compare arbitrary pairs of L-R fuzzy numbers. It was also pointed out that this approach can be used to rank generalized
L-R fuzzy numbers only if certain conditions are satisfied for the generalized L-R fuzzy numbers. Furthermore, it was pointed out that this approach can be used to rank generalized trapezoidal fuzzy numbers only if certain conditions are satisfied for the generalized trapezoidal fuzzy numbers. Finally, it was pointed out that this approach can be used to rank generalized triangular fuzzy numbers only if certain conditions are satisfied for the generalized triangular fuzzy numbers. In the future, one may try to generalize the existing approaches in such a manner that they can be used to rank generalized L-R fuzzy numbers, generalized trapezoidal fuzzy numbers, and generalized triangular fuzzy numbers without any restriction. Furthermore, with the help of the extended ranking method, new methods for solving decision-making problems under a fuzzy environment and its various extensions may be developed.
References
1. Chen SH (1985) Ranking fuzzy numbers with maximizing set and minimizing set. Fuzzy Sets Syst 17:113–129
2. Delgado M, Verdegay JL, Villa MA (1988) A procedure for ranking fuzzy numbers using fuzzy relations. Fuzzy Sets Syst 26:49–62
3. Mabuchi S (1988) An approach to the comparison of fuzzy subsets with an α-cut dependent index. IEEE Trans Syst Man Cybern SMC 18:264–272
4. Buckley JJ, Chanas S (1989) A fast method of ranking alternatives using fuzzy numbers (short communications). Fuzzy Sets Syst 30:337–339
5. Dubois D, Prade H (1983) Ranking of fuzzy numbers in the setting of possibility theory. Inform Sci 30:183–224
6. Abbasbandy S, Hajjari T (2009) A new approach for ranking of trapezoidal fuzzy numbers. Comput Math Appl 57:413–419
7. Wu JF, Xu RN (2014) An improved method for ranking fuzzy numbers based on centroid. J Guangzhou Univ Nat Sci 13(1):7–11
8. Janizade-Haji M, Zare HK, Eslamipoor R, Sepehriar A (2014) A developed distance method for ranking generalized fuzzy numbers. Neural Comput Appl 25:727–731
9. Qiupeng G, Zuxing X (2017) A new approach for ranking fuzzy numbers based on possibility theory. J Comput Appl Math 309:674–682
10. Chutia R (2021) Ranking interval type-2 fuzzy number based on a novel value-ambiguity ranking index & its applications in risk analysis. Soft Comput 25:8177–8196
11. Firozja MA, Balf FR, Agheli B, Chutia R (2022) Ranking of generalized fuzzy numbers based on accuracy of comparison 19:49–61
12. Hop NV (2022) Ranking of fuzzy numbers based on relative positions and shape characteristics 191:116–132
Chapter 55
Change Detection of Mangroves at Subpixel Level of Synthesized Hyperspectral Data Using Multifractal Analysis Method Dipanwita Ghosh, Somdatta Chakravortty, and Tanumi Kumar
1 Introduction

In recent times, mangrove forests in many parts of the world have been declining at an alarming rate. The Sunderban delta region is rich in various kinds of mangrove species. In the current scenario, it has been observed that various mangrove species are endangered in this region. As a consequence, the ecological balance and biodiversity are getting disturbed. In the coastal regions, various causes like sea-level rise, increase in salinity level, rapid sedimentation, and other coastal hazards reduce the growth of mangroves. The continuous natural or anthropogenic disruption in the coastal zone induces various kinds of adaptive changes in mangrove communities [1]. In this area, various mangrove communities interact with each other, and the outcomes of the interactions are reciprocal as well as mutual. In this study, Earth Observation 1 Hyperion data has been used to analyze mangrove dynamics, and multifractal analysis has been used for land cover change detection, applied to a heterogeneous temporal dataset. Chakravortty et al. [2] introduced a new modified parametric approach based on a modified fractal decomposition method for detecting changes in land cover classes; in that work, a fractal-based projection method was used. Other authors have used a sparse unmixing-based change detection method on multi-temporal data at subpixel level; a dictionary pruning method has also been used for change detection [3]. Change vector analysis was adopted to estimate change in spectral information for multi-temporal data. Various algorithms like principal component analysis, k-means, and the independent component analysis algorithm D. Ghosh · S. Chakravortty (B) Maulana Abul Kalam Azad University of Technology, Kolkata, West Bengal, India e-mail: [email protected] T. Kumar Regional Remote Sensing Centre, Indian Space Research Organization, Kolkata, India © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 R. P. Yadav et al. (eds.), Proceedings of the International Conference on Paradigms of Computing, Communication and Data Sciences, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-19-8742-7_55
were used for change detection using hyperspectral data. In Dalla Mura et al. [4], the authors proposed an unsupervised technique that integrates morphological filters. This new technique was applied to Very High Resolution (VHR) images, and different morphological indices were used for change detection. In Celik [5], genetic algorithms were used for producing change detection maps. In Aleksandrowicz et al. [6], a k-means clustering technique was used to detect land cover classes. It is also noteworthy that physical monitoring and ground surveys are used as the key means to study the dynamics of mangrove ecosystems [1]. In this work, image-derived fractional abundances have been used for detecting changes at subpixel level, and a unique approach, multifractal analysis, has been used to predict change at subpixel level. Multifractal image analysis builds on the idea of the self-similarity property. High-dimensional images can be characterized by using the multifractal spectrum, or singularity spectrum, instead of a single fractal dimension. Multifractal analysis [7] also helps to measure the level of regularity of a component, both globally and locally; in this work, it has been used to measure local and global change in a quantitative way. It is important to mention that for several years multifractal analysis has been used for medical image analysis as well as in satellite image classification and segmentation. From a remote sensing perspective, this method has also been used for change detection at pixel level. Therefore, the novelty of this work is that multifractal analysis is uniquely applied for change detection at subpixel level on multi-temporal moderate-resolution satellite imagery. As this study focuses on the change detection of mangrove communities, it will also help to analyze the dynamic behavior of mangrove communities over a period of time.
2 Study Area

In this study, Henry Island, Sunderban, West Bengal was chosen as the study area. The geographic location of the study area extends between 88°12′E and 88°26′E longitude and 21°35′N and 21°40′N latitude. This island was considered for this study as it harbors both non-homogeneous and homogeneous communities of mangroves.
3 Data Used

In this study, time-series Landsat images have been used for multifractal analysis. The Landsat images were acquired by the Thermal Infrared Sensor (TIRS) and Operational Land Imager (OLI) sensors on 5th March, 2014 and 8th March, 2021, respectively. These two datasets have been used for the change detection study over Henry Island.
4 Methodology

In this section, the proposed technique (Fig. 1) for the analysis of the dynamic behavior of the mangrove ecosystem is presented. In this study, multi-temporal, multispectral data have been used. As Landsat data has moderate spatial resolution and low spectral resolution, spectral reconstruction methods such as dictionary-based learning followed by sparse coding [8, 9] have been applied in this study to reconstruct multispectral images with high spectral resolution. These regenerated multi-temporal images have been used for change detection [10, 11, 12, 13]. Next, spectral unmixing has been applied on the regenerated multi-temporal dataset to extract the fractional abundance value in each pixel. Then the multifractal spectrum has been calculated to detect global and local change.

Linear Spectral Unmixing The main aim of linear spectral unmixing is to obtain the abundance values for the different endmembers. In this method, linear algebra is used to calculate the values of the different variables. The abundance values signify the proportion of endmembers present in a mixed pixel. To obtain the output pixel intensity, the abundance values are multiplied by their respective pure endmember spectra, and thus a fractional image is generated for the entire scene. In a hyperspectral image, if a pixel containing a single endmember spectrum or mixed spectra of multiple endmembers is represented as S, the individual endmember spectra are represented by E, the presence of each endmember (as a percentage) in each pixel is represented by A, and the error term is represented by e, then the relation is expressed as

S = EA + e  (1)
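For a single pixel, the abundances A in Eq. (1) can be estimated by constrained least squares. The sketch below is illustrative (the endmember matrix and pixel spectrum are toy values), using non-negative least squares from SciPy and a sum-to-one normalization.

```python
import numpy as np
from scipy.optimize import nnls

def unmix_pixel(E, s):
    """Estimate abundances a minimizing ||E a - s|| subject to a >= 0."""
    a, _ = nnls(E, s)
    return a / a.sum() if a.sum() > 0 else a   # normalize so abundances sum to 1

E = np.array([[0.9, 0.1],                      # 3 bands x 2 endmember spectra (toy values)
              [0.5, 0.4],
              [0.2, 0.8]])
s = 0.7 * E[:, 0] + 0.3 * E[:, 1]              # a perfectly mixed pixel
print(unmix_pixel(E, s))                       # approximately [0.7, 0.3]
```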
Multifractal Spectrum Multifractal analysis [14, 15] is one of the well-known methods used for the assessment of change detection for different land covers. The method uses a continuous spectrum of exponents (the singularity spectrum) instead of a single exponent. Moreover, it can be used to detect change in the inner structure of a component, and the generalized dimension of the whole spectrum is denoted by Dq [6]. In the multifractal approach [16], information is extracted from the singularities of the data. Multifractality takes a significant value for a non-homogeneous structure and zero for a homogeneous structure. In multifractal analysis methods, quantitative multifractal parameters are derived from the image. To determine the multifractal spectrum, a box counting-based method (BCBM) has been used in this study. Using the BCBM, changes have been detected at subpixel level, attempting to find potential changes in each and every pair of subsets. In the BCBM, an image of size m × m is split into N(δ) = (m/δ)² square boxes of size δ × δ, where 1 ≤ δ ≤ m.
Fig. 1 Flow chart of methodology: Start → Data Acquisition → Preprocessing of Landsat Data → Apply Multifractal Analysis Method → Calculate Degree of Multifractality → Validated through Ground Survey
\[ \mu_{ij}(\delta) = P_i^{(\delta)} \Big/ \sum_{i=1}^{N(\delta)} P_i^{(\delta)} \tag{2} \]
where i = 1…N(δ) labels the individual boxes of size δ, P_i^{(δ)} denotes the total weight of the ith box, the denominator indicates the total weight of the image, and j = 1…n indexes the endmembers present in the image. In this particular case, SUM and MAX functions have been used for calculating the probabilistic weight values: the SUM function measures the self-similarity of an object in an image, and the MAX function measures the brightest object in an image. In the next step, partition
functions x_j(q, δ) have been calculated for the different endmembers present in the image, where D_q denotes the generalized dimension and the degree of multifractality is expressed as Δ_j:
\[ x_j(q,\delta) = \sum_{i=1}^{N(\delta)} \left[\mu_{ij}(\delta)\right]^q \tag{3} \]

\[ x_j(q,\delta) \propto \delta^{D_q(q-1)} \tag{4} \]

\[ D_{qj} = \frac{1}{1-q}\,\lim_{\delta\to 0} \frac{\log \sum_{i=1}^{N(\delta)} \left[\mu_{ij}(\delta)\right]^q}{-\log \delta} \tag{5} \]

\[ \Delta_j = D_{-\infty\,j} - D_{+\infty\,j} \tag{6} \]
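A minimal sketch of this box-counting computation for one fractional-abundance image is given below; the function name, the choice of q values, and the regression over box sizes are illustrative assumptions, not the exact implementation used in this study.

```python
import numpy as np

def generalized_dimension(img, q, deltas):
    """Estimate D_q of one abundance image by box counting with the SUM measure."""
    logs_chi, logs_delta = [], []
    m = min(img.shape)
    for d in deltas:
        k = m // d                                       # boxes per side
        crop = img[:k * d, :k * d]
        P = crop.reshape(k, d, k, d).sum(axis=(1, 3))    # total weight per box
        mu = P / P.sum()                                 # Eq. (2)
        mu = mu[mu > 0]
        logs_chi.append(np.log(np.sum(mu ** q)))         # Eq. (3)
        logs_delta.append(np.log(d))
    slope = np.polyfit(logs_delta, logs_chi, 1)[0]       # Eq. (4): x scales as delta^(D_q (q-1))
    return slope / (q - 1)                               # Eq. (5), for q != 1

abund = np.random.rand(256, 256)                         # placeholder abundance map
deltas = [2, 4, 8, 16]
# large |q| approximates D_{-inf} and D_{+inf} in Eq. (6)
delta_j = generalized_dimension(abund, -10, deltas) - generalized_dimension(abund, 10, deltas)
print(delta_j)                                           # degree of multifractality
```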
Ground Survey To endorse the output of the proposed multifractal-based change detection method, a credible accuracy assessment method was applied. A survey was organized to determine and collect ground truth information on different mangrove species in the study area. Ground visits were arranged in the years 2014 and 2021 for ground truth data collection, immediately after the image acquisitions for these years. The quadrat survey method was applied for mangrove species enumeration. Quadrats of size 30 × 30 square meters (equivalent to the spatial resolution of the Landsat and Hyperion images [17]) were placed in the study area, and accordingly, different mangrove communities and zonations were identified for the different quadrats (Table 2). Using this strategy, some pure and mixed communities were identified. During the ground survey, it was observed that mangrove communities have a mixed nature, so it is very difficult to identify regions with only one single community, which would be considered a pure patch of a mangrove community. Therefore, in the present study, pure or Mixed Mangrove communities were identified based on the crown size of mangrove trees. Specifically, if in a quadrat the presence of a single species was more than 80% of the total number of trees, then the quadrat was considered a pure mangrove patch/pure mangrove community; on the contrary, if none of the communities had 80% presence in a quadrat, then it was considered a Mixed Mangrove patch/Mixed Mangrove community. The rule is encoded in the short sketch below.
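This is an illustrative encoding of the 80% quadrat rule stated above; counts is a hypothetical mapping from species observed in one 30 × 30 m quadrat to their tree counts.

```python
def label_quadrat(counts):
    """Label a quadrat as a pure community if one species exceeds 80% of the trees."""
    total = sum(counts.values())
    dominant, n = max(counts.items(), key=lambda kv: kv[1])
    return f"pure {dominant}" if n > 0.8 * total else "Mixed Mangrove"

print(label_quadrat({"Avicennia": 42, "Excoecaria": 5}))    # pure Avicennia (42/47 > 80%)
print(label_quadrat({"Avicennia": 20, "Excoecaria": 18}))   # Mixed Mangrove
```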
5 Result and Discussion
In the present approach, multifractal analysis was used to detect the change [18, 19] in mangrove communities between two different years (Fig. 2). It has been observed that the strengths of singularity α+10 and α−5 had comparable values for both years. As a result, it has been shown that different mangrove community subsets
exhibited different degree-of-multifractality values (Table 1). For the Excoecaria agallocha dense community and the saline blank, the value of the degree of multifractality increased. On the contrary, the value of the degree of multifractality slightly decreased for the Avicennia dense community and significantly dropped for the Mixed Mangrove community. So, it can be interpreted that in Henry Island the Excoecaria agallocha dense community and saline blanks dominated over the other communities like Avicennia dense and Mixed Mangrove.
Fig. 2 Integrated fractional images of different Mangrove communities in 2014 and 2021 (panels: 2014, 2021, Classified Image 2014, Classified Image 2021)
Table 1 Values of degree of multifractality for different mangrove communities in Henry Island in the years 2014 and 2021

Year   Excoecaria agallocha dense   Mixed mangrove   Avicennia dense   Saline blank
2014   6.11                         7.97             6.37              7.19
2021   6.50                         6.97             6.10              8.10
Table 2 Total area covered by different mangrove communities in Henry Island during 2014 and 2021

Community                    Area in square meters (2014)   Area in square meters (2021)
Excoecaria agallocha dense   3,721,500                      4,837,500
Mixed mangrove               1,813,500                      1,706,400
Avicennia dense              4,239,900                      3,418,200
Saline blank                 1,648,800                      3,927,600
6 Accuracy Assessment

Based on the ground truth observations, the total area for each particular community [20] was calculated. In our study, the total area occupied by Excoecaria agallocha dense and saline blanks increased, while the total area occupied by Mixed Mangrove and Avicennia dense decreased. From the ground truth assessment, it was concluded that Excoecaria agallocha dense and saline blanks were dominant over the other communities like Avicennia dense and Mixed Mangrove. The estimated community-wise areas were comparable with the outputs obtained using the multifractal analysis method.
7 Conclusion and Future Work

From our experimental results, it can be interpreted that the value of the multifractality level can be used as an indicator of mangrove community-level change, and the increase in the degree of multifractality can signify the type of community change. It has also been concluded that the performance of the proposed method depends on the size of the neighborhood and the function type used for the multifractal analysis; here the SUM measure has been used for the multifractal spectrum calculation. Using this methodology, the global description of change can be accurately interpreted. In the future, this method will be applied on different datasets for change detection.
References
1. Jenerowicz M, Wawrzaszek A, Drzewiecki W, Krupiński M, Aleksandrowicz S (2019) Multifractality in humanitarian applications: a case study of internally displaced persons/refugee camps. IEEE J Sel Top Appl Earth Observations Remote Sens 12(11):4438–4445
2. Chakravortty S, Li J, Plaza A (2017) A technique for subpixel analysis of dynamic mangrove ecosystems with time-series hyperspectral image data. IEEE J Selected Topics Appl Earth Observations Remote Sens 11(4):1244–1252
3. Zhou J, Kwan C, Ayhan B, Eismann MT (2016) A novel cluster kernel RX algorithm for anomaly and change detection using hyperspectral images. IEEE Trans Geosci Remote Sens 54(11):6497–6504
4. Dalla Mura M, Benediktsson JA, Bovolo F, Bruzzone L (2008) An unsupervised technique based on morphological filters for change detection in very high resolution images. IEEE Geosci Remote Sens Lett 5(3):433–437
5. Celik T (2009) Multiscale change detection in multitemporal satellite images. IEEE Geosci Remote Sens Lett 6(4):820–824
6. Aleksandrowicz S, Wawrzaszek A, Drzewiecki W, Krupiński M (2016) Change detection using global and local multifractal description. IEEE Geosci Remote Sens Lett 13(8):1183–1187
7. Mignotte M (2020) A fractal projection and Markovian segmentation-based approach for multimodal change detection. IEEE Trans Geosci Remote Sens 58(11):8046–8058
8. Gao L, Hong D, Yao J, Zhang B, Gamba P, Chanussot J (2020) Spectral superresolution of multispectral imagery with joint sparse and low-rank learning. IEEE Trans Geosci Remote Sens 59(3):2269–2280
9. Ghosh D, Chakravortty S (2022) Reconstruction of high spectral resolution multispectral image using dictionary-based learning and sparse coding. Geocarto Int, pp 1–2
10. Hussain M, Chen D, Cheng A, Wei H, Stanley D (2013) Change detection from remotely sensed images: from pixel-based to object-based approaches. ISPRS J Photogramm Remote Sens 80:91–106
11. Chakravortty S, Ghosh D (2018) Automatic identification of saline blanks and pattern of related mangrove species on hyperspectral imagery. In: 2018 4th international conference on recent advances in information technology (RAIT), pp 1–6. IEEE
12. Chen CF, Son NT, Chang NB, Chen CR, Chang LY, Valdez M, Aceituno JL (2013) Multidecadal mangrove forest change detection and prediction in Honduras, Central America, with Landsat imagery and a Markov chain model. Remote Sens 5(12):6408–6426
13. Son NT, Chen CF, Chang NB, Chen CR, Chang LY, Thanh BX (2014) Mangrove mapping and change detection in Ca Mau Peninsula, Vietnam, using Landsat data and object-based image analysis. IEEE J Sel Top Appl Earth Observations Remote Sens 8(2):503–510
14. Krupiński M, Wawrzaszek A, Drzewiecki W, Jenerowicz M, Aleksandrowicz S (2020) Multifractal parameters for classification of hyperspectral data. In: IGARSS IEEE international geoscience and remote sensing symposium. IEEE
15. Mukherjee K, Ghosh JK, Mittal RC (2013) Variogram fractal dimension based features for hyperspectral data dimensionality reduction. J Indian Soc Remote Sens 41(2):249–258
16. Ghosh D, Chakravortty S (2020) Change detection of tropical mangrove ecosystem with subpixel classification of time series hyperspectral imagery. In: Artificial intelligence techniques for satellite image analysis, pp 189–211. Springer, Cham
17. Kumar T, Mandal A, Dutta D, Nagaraja R, Dadhwal VK (2019) Discrimination and classification of mangrove forests using EO-1 Hyperion data: a case study of Indian Sundarbans. Geocarto Int 34(4):415–442
18. Chakravortty S, Ghosh D, Sinha D (2018) A dynamic model to recognize changes in mangrove species in Sunderban delta using hyperspectral image analysis. In: Progress in intelligent computing techniques: theory, practice, and applications, pp 59–67. Springer, Singapore
19. Ghosh D, Chakravortty S, Miguel AJP, Li J (2021) Change prediction and modeling of dynamic mangrove ecosystem using remotely sensed hyperspectral image data. J Appl Remote Sens 15(4):042606
20. Kumar T, Kaur P, Chandrasekar K, Bandyopadhyay S (2020) AVIRIS-NG hyperspectral data for mapping mangrove forests and their health spatially. J Trop For Sci 32(3):317–331
Chapter 56
Analysis of the Behavior of Metamaterial Unit Cell with Respect to Change in Its Structural Parameters Shipra Tiwari, Pramod Sharma, and Shoyab Ali
1 Introduction

Metamaterials are a subclass of a broader category of artificial electromagnetic materials. We think of metamaterials as both bulk metamaterials (MM) [1] and metasurfaces [2]; there are also materials such as plasmonics [3], photonic crystals [4], artificial dielectrics, frequency selective surfaces [5], and so on, and all of those can be broadly described as artificial electromagnetic materials along with metamaterials [6]. In particular, they give us a great ability to control many of the scattering properties of light [7] and good control over its spatial properties [8] and temporal properties [9]. The electromagnetic response of metamaterials comes from the geometry of the MM unit cell and not from the constituent materials or their chemistry, which has the advantage that we get independent and direct control of ε (epsilon) and µ (mu), the two material parameters that enter directly into Maxwell's equations [10]; this gives good insight into solving electromagnetic problems, and such materials are multifunctional. The electromagnetic response comes from the atom itself, i.e., the unit cell, and not from the array; array effects are secondary to the main response, and therefore we can simply design a unit cell. The dimension of the unit cell is much smaller than the operating wavelength. A metamaterial shows homogeneous behavior when it comes into contact with electromagnetic waves, the same as the behavior of lattice crystal materials under the influence of electromagnetic waves. This research work is mainly divided into three sections. In the first section, we design a simple metamaterial unit cell; this design is taken from the Smith research paper [11], and we analyze the resonant frequency of the material.

S. Tiwari (B) · P. Sharma · S. Ali
Regional College for Education Research and Technology, Jaipur, India
e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
R. P. Yadav et al. (eds.), Proceedings of the International Conference on Paradigms of Computing, Communication and Data Sciences, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-19-8742-7_56

In Sect. 2,
we analyze the variation in the resonant frequency of the metamaterial with respect to changes in the length and width of the split rings. Similarly, in Sect. 3, we analyze the change in resonant frequency with respect to the gap in the split rings and the width of the thin wire, respectively. In the initial work [12], the authors compared only the gap size of the split ring for different geometries, and the orientation of the wave was also different.
2 Design of MM Unit Cell

For the past several years, metamaterials have been considered as effective media. The scattering elements of metamaterials are arranged as periodic structures; in general, they are split rings along with thin wires. Therefore, the single unit cell is also called a split-ring resonator. For mathematical analysis, we solve Maxwell's equations for a single unit cell of a periodic structure. The single unit cell is a symmetrical geometry in the direction of wave propagation. A single unit cell consists of two conducting split rings, one surrounding the other, on a dielectric substrate, and a thin wire on the opposite side of the substrate from the split rings, as shown in Fig. 1. The split rings generate the negative value of mu, and the thin wire generates the negative value of epsilon; the combination of split rings and thin wire generates the double negative (DNG) property in the material. Figure 2 shows the top view of the MM unit cell along with its dimensions. The length (L) and width (W) of the unit cell are 2.5 mm. The spacing (S) between the two split rings is 0.15 mm, and the width (D) of the rings is 0.2 mm. The gap (G) is 0.3 mm, and the length and width (p) of the thin wire are 2.5 mm and 0.14 mm, respectively. The dielectric substrate is FR4 epoxy with dielectric constant 4.4 and thickness 0.25 mm.
Fig. 1 Structural diagram of MM unit cell
Fig. 2 Top view of MM unit cell
2.1 Configuration of the MM Unit Cell to Extract S-Parameters

To explore the negative values of epsilon and mu, we have to extract the S-parameters of the MM unit cell. To obtain the S-parameters from the MM unit cell, we perform a wave port analysis. In the wave port analysis, we consider a waveguide, and the unit cell is placed inside the waveguide. The EM wave propagates along the split-ring geometry. The electric and magnetic fields are perpendicular to each other and to the direction of propagation, as shown in Fig. 3. Simulation of the above configuration is done in HFSS V15 software. Perfect E and perfect H boundaries are assigned to the waveguide surfaces, and the waveguide is excited by wave port excitation according to Fig. 3.
2.2 Simulation Results for S-Parameter Extraction

Figure 4 shows the simulation results obtained from the HFSS software. The S-parameters of the unit cell can be calculated and plotted in the HFSS software itself. To obtain the permittivity and permeability of the unit cell, we write a piece of code in Matlab and draw the graphs of permittivity and permeability with respect to frequency. Figure 4a and b shows the S-parameters of the unit cell, where Fig. 4a is the magnitude versus frequency diagram and Fig. 4b the phase versus frequency diagram. From Fig. 4, we can say that there is a sudden change in the phase of S21, which indicates a negative refractive index; this change occurs at 9.4 GHz. Figure 4c shows that the real part of the permeability is negative for a certain frequency band.
Fig. 3 Configuration of MM unit cell for wave port analysis
The most negative value occurs at 10.8 GHz. Similarly, in Fig. 4d, the real part of the permittivity also shows negative values, and we can see that the imaginary part of the permittivity is also negative at 10.5 GHz.
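The Matlab retrieval code used for Fig. 4c, d is not included in the paper; a minimal Python sketch of the standard homogenization retrieval described by Smith et al. [11] is given below. It assumes complex S-parameters referenced to the slab faces, an effective slab thickness d (a modeling choice), and the principal logarithm branch (m = 0), which is valid for electrically thin slabs; thicker slabs need branch tracking.

```python
import numpy as np

def retrieve_eps_mu(f_hz, s11, s21, d):
    """Effective eps and mu from complex S11/S21 of a slab of thickness d (m)."""
    k0 = 2 * np.pi * f_hz / 3e8                       # free-space wavenumber
    z = np.sqrt(((1 + s11) ** 2 - s21 ** 2) /
                ((1 - s11) ** 2 - s21 ** 2))          # wave impedance
    z = np.where(z.real < 0, -z, z)                   # passivity: Re(z) >= 0
    x = s21 / (1 - s11 * (z - 1) / (z + 1))           # x = exp(i n k0 d)
    n = -1j * np.log(x) / (k0 * d)                    # refractive index, m = 0
    return n / z, n * z                               # eps = n/z, mu = n*z

# eps, mu = retrieve_eps_mu(f, s11, s21, d=0.25e-3)   # d: assumed slab depth
```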
3 Effect of Change in Split Ring Dimensions on Resonant Frequency

In the subsequent part of this paper, we perform experiments with variable dimensions of the split rings and thin wire. First, we change the width of the split ring and analyze the effect of this variation in the width of the split ring on the resonant frequency of the unit cell, at which we get negative values of epsilon and mu.
3.1 Simulation Results for Variable Length and Width of SRR

In the above simulation, we have seen that a single MM unit cell with split rings and a thin wire can generate negative epsilon and mu for a certain frequency band; in that experiment, the resonant frequency was approximately 9.3 GHz. In the next stage, we make the dimensions of the split ring variable. The outer length and breadth of the split ring were 2.2 mm, and the inner length and breadth were 1.8 mm. We take 102 observations by varying the inner and outer length and breadth of the split ring from 1.9 to 2.2 mm with a step size of 0.05 mm. From Fig. 5, we can see that for variable length and width of the split ring there is a continuous shift in the negative values of epsilon and mu.
Fig. 4 Simulation results of MM unit cell
Fig. 5 Epsilon and mu graph with real and imaginary values for variable length and width of SRR. [b = breadth of SRR, l = length of SRR]
The imaginary part of the epsilon is shifted from 9.5 to 11 GHz, and the same shift occurs for the real part of mu.
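For the sweeps that follow, the resonance can be located automatically as the frequency at which Re(µ) is most negative. The snippet below is a toy illustration: the Lorentzian µ(f) merely stands in for the retrieved HFSS curves, and the two resonance values are arbitrary.

```python
import numpy as np

def resonant_frequency(f, mu):
    """Frequency where Re(mu) is most negative, used as the resonance marker."""
    return f[np.argmin(np.real(mu))]

f = np.linspace(8, 12, 401)                              # GHz
lorentzian_mu = lambda f0: 1 - 0.8 * f0**2 / (f**2 - f0**2 + 0.05j * f)
for f0 in (9.5, 10.9):                                   # pretend sweep results
    print(f0, "->", round(float(resonant_frequency(f, lorentzian_mu(f0))), 2))
```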
3.2 Simulation Results for Variable Gap Size of SRR

In the previous section, we have seen that the resonant frequency of the split-ring resonator (SRR) shifts with the length and width of the split ring. In the next experiment, we vary the gap size and analyze its effect on the resonant frequency. The nominal gap size of the split ring is 0.3 mm, and we take 15 observations from 0.1 to 0.3 mm with a step size of 0.01 mm. Simulation results for negative epsilon and mu are shown in Fig. 6.
Fig. 6 Epsilon and mu graph with real and imaginary values for variable gap size
Fig. 7 Epsilon and mu graph with real and imaginary values for variable width of thin wire (w = width of strip line)
3.3 Simulation Results for Variable Width of Thin Wire

The thin wire is placed at the bottom of the substrate, opposite to the split rings. The thin wire is responsible for controlling the electrical property, or permittivity, of the unit cell. In this geometry, the length of the thin wire is 2.5 mm, and the width of the wire is 0.14 mm. Here, we take observations by varying the width of the strip from 0.01 to 0.14 mm with a step size of 0.01 mm. Simulation results for negative epsilon and mu are shown in Fig. 7. In Fig. 7, we can see that there is no significant frequency shift in the negative values of epsilon and mu, which means the width of the thin wire does not produce a significant effect on epsilon and mu. However, another interesting result is that all the real values of epsilon are negative from 5 to 9.5 GHz; this indicates that the thin wire is responsible for the negative epsilon. At the same time, the negative value of mu also does not shift.
4 Result and Discussion

Metamaterials are future engineering materials that will be used in several microwave and millimeter-wave applications. We have seen that these materials have the property of negative epsilon and negative mu, which is not found in natural materials. In this research work, we design a metamaterial unit cell, extract its parameters, and show that for the given design the permittivity and permeability are negative between roughly 9 and 12 GHz. In the next part of the research work, we establish the relationship between the dimensions of the split-ring resonator and the resonant frequency at which the values of epsilon and mu are negative. In the first experiment, we vary the length and width of the SRR and observe that the resonant frequency shifts from 9.5 to 10.9 GHz (Table 1). Results of the second and third experiments are shown in Tables 2 and 3. In the second experiment, we varied the gap size from 0.01 to 0.3 mm.
Table 1 Resonant frequency for variable length and breadth

S. No. | Dimension in mm    | Resonant frequency (GHz)
1      | L = 1.9, b = 1.9   | 9.5
2      | L = 2.05, b = 1.95 | 10
3      | L = 2.1, b = 2.1   | 10.2
4      | L = 2.2, b = 2.2   | 10.4
5      | L = 2, b = 2.2     | 10.9

Table 2 Resonant frequency for variable gap size

S. No. | Dimension | Resonant frequency (GHz)
1      | G = 0.3   | 10.9
2      | G = 0.17  | 11.2
3      | G = 0.09  | 11.6
4      | G = 0.05  | 11.8
5      | G = 0.01  | 13

Table 3 Resonant frequency for variable width of thin wire

S. No. | Dimension | Resonant frequency (GHz)
1      | w = 0.01  | 10.9
2      | w = 0.03  | 10.9
3      | w = 0.05  | 10.9
4      | w = 0.09  | 10.9
5      | w = 0.14  | 10.9
We observed that the resonant frequency shifts from 10.9 to 13 GHz. In the last experiment, we varied the width of the thin wire from 0.01 to 0.14 mm and found that there is no frequency shift. To perform these experiments, we collected 102 samples for each experiment, but here we show only 5 sample values in the tables above.
5 Conclusion

Metamaterials are a subclass of a broader category of artificial electromagnetic materials. This research work is mainly divided into three sections. In the first section, we designed a simple metamaterial unit cell to measure the resonant frequency of the material. In the second section, we analyzed the variation in the resonant frequency of the metamaterial with respect to changes in the length and width of the split rings. In the third section, we analyzed the change in resonant frequency with respect to the gap in the split rings and the width of the thin wire, respectively. We design and simulate the MM unit cell in HFSS and extract the values of epsilon and mu. After extracting the permittivity and permeability, we create a database
with 102 samples. Using Matlab software, we analyze the change in resonant frequency with changes in the geometrical parameters of the metamaterial unit cell. From this experiment, we can conclude that the resonant frequency of a metamaterial unit cell can be controlled by altering its dimensions, such as the gap between the split rings, the length and width of the split ring, and the width of the strip. The results show that there is a significant change in the resonant frequency when a design parameter is altered without changing the other parameters of the unit cell. By changing the width of the split ring, we found a variation of 1.5 GHz in the resonant frequency; by changing the gap of the split ring, we found a variation of 3 GHz; and by changing the width of the strip, we found no variation in the resonant frequency.
References

1. Caloz C, Itoh T (2005) Introduction. In: Electromagnetic metamaterials: transmission line theory and microwave applications. Wiley Interscience, NJ, USA, ch 1, sec 1.1, pp 1–2
2. Glybovski SB, Tretyakov SA, Belov PA, Kivshar YS, Simovski CR (2016) Metasurfaces: from microwaves to visible. Phys Rep 634:1–72
3. Boltasseva A (2014) Empowering plasmonics and metamaterials technology with new material platforms. MRS Bull 39:461
4. Mallick SB, Jung IW, Meisner AM, Provine J, Howe RT, Solgaard O (2011) Multilayered monolithic silicon photonic crystals. IEEE Photonics Technol Lett 23(11):730–732. https://doi.org/10.1109/LPT.2011.2132698
5. Bouslama M, Traii M, Gharsallah A, Denidni TA (2015) Reconfigurable radiation pattern antenna based on a new frequency selective surface. In: Antennas and propagation in wireless communications (APWC), IEEE-APS topical conference on
6. Barroso RH, Malpica W (2020) An overview of electromagnetic metamaterials. IEEE Lat Am Trans 18(11):1862–1873. https://doi.org/10.1109/TLA.2020.9398627
7. Ptitcyn G, Mirmoosa MS, Tretyakov SA (2017) Instantaneous control of scattering from a time-modulated meta-atom. In: 2019 thirteenth international congress on artificial materials for novel wave phenomena (metamaterials), pp X-321–X-323
8. Vdovychenko OV, Bulgakov AA, Fedorin IV (2014) Polarization and spectral properties of a photonic crystal with a ferrite-semiconductor metamaterial inclusion. Int Conf Math Methods Electromagnet Theory 2014:187–190
9. Ramaccia D, Alù A, Toscano A, Bilotti F (2021) Propagation and scattering effects in metastructures based on temporal metamaterials. Fifteenth Int Cong Artif Mater Novel Wave Phenomena (Metamaterials) 2021:356–358
10. Pendry JB, Holden AJ, Robbins DJ, Stewart WJ (1999) Magnetism from conductors and enhanced nonlinear phenomena. IEEE Trans Microw Theory Tech 47(11):2075–2084
11. Smith DR, Vier DC, Koschny T, Soukoulis CM (2005) Electromagnetic parameter retrieval from inhomogeneous metamaterials. Phys Rev E 71(3), 11 pp
12. Pandey A, Chaudhary P (2017) Comparative analysis of resonance characteristics of single sided SRR type metamaterials for varying gap size. Int Conf Inventive Commun Comput Technol (ICICCT) 2017:120–122
Chapter 57
Mid-Term Load Forecasting by LSTM Model of Deep Learning with Hyper-Parameter Tuning Ashish Prajesh, Prerna Jain, and Satish Sharma
1 Introduction

In today's technological era, electrical energy plays an essential role in supporting a country's economy. Electricity is a commodity that cannot be stored economically on a large scale, so it is essential to balance demand and supply to address the energy security concern. Since electricity demand is growing rapidly day by day, it has become essential to know how much power will be needed in a particular geographical area. To understand the accurate future power demand of a specific geographical area, we go for load forecasting and plan future power generation and consumption in that region. There are many methods used for load forecasting; some important ones are time series analysis and deep learning methods. In time series analysis, the observed sequence of data is stored in time series form, and we can extract all the essential information from the time series data. So, the ultimate aim of time series analysis is to estimate a model that can describe the pattern of the time series data and do the forecasting. Some essential time series analysis methods are auto-regression methods, exponential smoothing methods, the auto-regressive integrated moving average (ARIMA) method, etc. But the problem with these methods is that they start producing wrong predictions when a new unique situation is added; thus, their adaptability to the selected time series is poor [1].

A. Prajesh
M Tech, MNIT Jaipur, Jaipur, India
P. Jain (B) · S. Sharma
Department of Electrical Engineering, MNIT, Jaipur, India
e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
R. P. Yadav et al. (eds.), Proceedings of the International Conference on Paradigms of Computing, Communication and Data Sciences, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-19-8742-7_57

A traditional deep learning neural network is a computing system inspired by a biological neural network; just as the biological network learns from life experience, in
the same way, an artificial neural network (ANN) learns from experiences [1]. A deep learning neural network thus has the unique capability of learning complex and nonlinear relationships among the load data and producing future estimates [1, 2]. Statistical methods have several limitations: firstly, their models cannot work with missing data, otherwise they produce bad results; secondly, statistical methods cannot work with multivariate datasets, only univariate ones; and thirdly, they cannot establish complex nonlinear feature relationships among the training data. Deep learning neural network methods can handle all of these limitations, as they are not sensitive to missing data, can work with multivariate datasets, and can establish complex and nonlinear relationships among the training data [1, 2]. ANN models depend closely on recent data trends, so they work well for short-term forecasting but give poor mid-term or long-term load estimates, as there is a lack of training data for mid-term or long-term estimation [9, 15]. Thus, it is challenging to establish relationships among the training data, and this results in an increase in cumulative error for long-term estimation [3–8]. To mitigate this issue, we go for the recurrent neural network (RNN), which has the unique capability of generating artificial data through its loop: the output of the hidden layer is given as an input to that layer with the help of a loop, and this improves the algorithm [2]. However, there are problems of vanishing and exploding gradients in the recurrent neural network (RNN). All these problems are addressed by the updated RNN, the long short-term memory (LSTM). LSTM removes the issues of the vanishing gradient, the exploding gradient, and the shortage of training data [3, 4]. In the LSTM model, all the parameters, such as the number of neurons, the number of hidden layers, and the number of epochs, are given random values; according to that, it gives its result, and forecasting is done [10–14]. Adding to the LSTM model, we have done hyper-parameter tuning, where the tuning of the parameters of the LSTM is done so that the LSTM model gives the best result. In this paper, we have tuned the number of layers, the number of units, and the learning rate with the help of a hyper-parameter tuning method. There are basically two methods of hyper-parameter tuning: grid search and random search [16–19]. In grid search, we form a model for every probable combination of the parameters of the LSTM model and see which combination provides the best result; we then select that model for forecasting. It is used for smaller sets of training data and is time consuming [16]. In the random search method, we form random combinations of the parameters of the LSTM model, and it returns the best model for which the system gives the best result. It has the unique capability of focusing on important parameters and does not waste redundant time searching over unimportant parameters. Thus, it can be used for large sets of data and is less time consuming [16–19]. In this paper, we have used the LSTM model of deep learning for mid-term load forecasting, where we have tuned the number of layers, the number of units, and the learning rate by the random search method of hyper-parameter tuning. With its help, we have reduced the root mean square error.
This paper is organized as follows: Sect. 2 presents the mathematical modeling and architecture of the LSTM model of deep learning; Sect. 3 presents the load forecasting model, where we discuss the flowchart, data preprocessing, the construction of the LSTM model, and the hyper-parameter tuning process; a case study is presented in Sect. 4; finally, the conclusion is discussed in Sect. 5.
2 LSTM Model

Long short-term memory networks are a special kind of RNN, as shown in Fig. 1. In particular, LSTM networks are designed to overcome the long-term dependency problems faced by RNNs due to vanishing or exploding gradients [10–14]. An LSTM has three gates.

Forget gate: It identifies which information is useless and should be discarded from the cell state. This decision is taken by a sigmoid layer, also known as the forget gate layer [3]:

f_t = σ(w_f [h_{t−1}, x_t] + b_f)  (1)

where w_f = weight, h_{t−1} = output from the previous state, x_t = new input, and b_f = bias.

Input gate: It identifies which new knowledge needs to be stored in the cell state [3]. It comprises the following two steps: (1) a sigmoidal layer, called the input gate layer, that chooses which values to update, and (2) a tanh layer, which forms the vector of new candidate values that is appended to the state. The mathematical expressions are given as follows:
Fig. 1 LSTM architecture [3]
i_t = σ(w_i [h_{t−1}, x_t] + b_i)  (2)

c̃_t = tanh(w_c [h_{t−1}, x_t] + b_c)  (3)

where w_i, w_c = weights, h_{t−1} = output from the previous state, x_t = new input, and b_i, b_c = biases.

Output gate: Here we run a sigmoid layer to choose which portion of the cell state is going to the output. The cell state is then passed through the tanh function and multiplied by the output of the sigmoid gate to give the output [3]. The mathematical expressions are:

o_t = σ(w_o [h_{t−1}, x_t] + b_o)  (4)

h_t = o_t ∗ tanh(c_t)  (5)

where w_o = weight, h_{t−1} = output from the previous state, x_t = new input, b_o = bias, and o_t = output.
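Collecting Eqs. (1)-(5) into one time step gives the update below; a minimal NumPy sketch with random placeholder weights follows. The standard cell-state update c_t = f_t ∗ c_{t−1} + i_t ∗ c̃_t, which the text implies but does not number, is included explicitly.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM time step following Eqs. (1)-(5); W/b map gate -> weights/bias."""
    z = np.concatenate([h_prev, x_t])        # [h_{t-1}, x_t]
    f_t = sigmoid(W["f"] @ z + b["f"])       # Eq. (1), forget gate
    i_t = sigmoid(W["i"] @ z + b["i"])       # Eq. (2), input gate
    c_tilde = np.tanh(W["c"] @ z + b["c"])   # Eq. (3), candidate values
    c_t = f_t * c_prev + i_t * c_tilde       # cell-state update
    o_t = sigmoid(W["o"] @ z + b["o"])       # Eq. (4), output gate
    h_t = o_t * np.tanh(c_t)                 # Eq. (5), hidden output
    return h_t, c_t

# Tiny demo: 1 input feature, 4 hidden units, random placeholder weights
rng = np.random.default_rng(0)
n_in, n_h = 1, 4
W = {g: rng.normal(scale=0.1, size=(n_h, n_h + n_in)) for g in "fico"}
b = {g: np.zeros(n_h) for g in "fico"}
h, c = np.zeros(n_h), np.zeros(n_h)
for x in np.sin(np.linspace(0, 3, 10)):      # toy load series
    h, c = lstm_step(np.array([x]), h, c, W, b)
print(h)
```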
3 Load Forecasting Model

In this paper, we make use of a deep learning algorithm (the LSTM model) for mid-term load forecasting. In Fig. 2, firstly the previous load data of one year is taken in time series form, and then that data is normalized. After that, training and testing samples are generated. In the second step, the LSTM parameters are initialized and the model is constructed; if the mean square error is less than the expected value, then the model is stored and forecasting can be done; if not, the second step is repeated [4].

Data preprocessing: In this paper, the load data collected in time series form is preprocessed before being sent to the LSTM model. The load dataset has multiple features, such as load demand power, temperature, etc., so it is important to do feature scaling for the machine learning algorithm. In this paper, we use normalization, where we scale the features between 0 and 1 [4]. The mathematical expression of normalization is

X_new = (X_i − min(X)) / (max(X) − min(X))  (6)

Then we convert the dataset into a 2D matrix input shape. Later, the data is split into 90% training and 10% testing data. The preprocessed data is fed to LSTM layer 1; after the data is processed by the LSTM model, it is passed to a dropout layer of 20%, which prevents overfitting. The next layer is similar to layer 1. At last, all the data is summarized at a dense layer (Table 1).
Fig. 2 Flowchart of LSTM model for load forecasting
Table 1 Construction of LSTM model

Layer     | Output shape    | Param #
LSTM-1    | (None, 23, 150) | 91,200
Dropout-1 | (None, 23, 150) | 0
LSTM-2    | (None, 75)      | 67,800
Dropout-2 | (None, 75)      | 0
Dense     | (None, 1)       | 76
Total params: 159,076. Trainable params: 159,076. Non-trainable params: 0.

The mean square error (MSE) is used as a performance index to evaluate the prediction result [4]:

MSE = (1/n) Σ_{i=1}^{n} (Y_i − Ŷ_i)²  (7)
Fig. 3 Hyper-parameter tuning process
where MSE is the mean square error, Y_i are the observed values, and Ŷ_i are the predicted values.
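The pipeline above can be sketched in a few lines of Keras. The sketch below reproduces the Table 1 architecture exactly (91,200 + 67,800 + 76 = 159,076 parameters) but, since the utility's load data is not public, it uses a synthetic hourly series of the same length as a stand-in; window length 23 matches the (None, 23, 1) input shape.

```python
import numpy as np
import tensorflow as tf

# Stand-in for the 8784-point hourly load series, then Eq. (6) normalization
load = np.sin(np.linspace(0, 60, 8784)) + 1.5
x = (load - load.min()) / (load.max() - load.min())

# Sliding windows: 23 past hours -> next hour
win = 23
X = np.stack([x[i:i + win] for i in range(len(x) - win)])[..., None]
y = x[win:]
split = int(0.9 * len(X))                         # 90% train / 10% test

model = tf.keras.Sequential([
    tf.keras.Input(shape=(win, 1)),
    tf.keras.layers.LSTM(150, return_sequences=True),
    tf.keras.layers.Dropout(0.2),                 # prevents overfitting
    tf.keras.layers.LSTM(75),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")       # Eq. (7) as training loss
model.summary()                                   # 159,076 params, as in Table 1
model.fit(X[:split], y[:split], epochs=2, batch_size=64,
          validation_data=(X[split:], y[split:]))
```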
3.1 Hyper-Parameter Tuning

In an LSTM (RNN), there are a large number of hyper-parameters, such as the number of layers, neurons, epochs, batch size, loss function, and learning rate. The two types of hyper-parameter tuning methods are grid search and random search.

Grid search: We form a model for each combination of the hyper-parameter values provided, evaluate each model [16], and pick out the architecture which yields the best results. Each model is fit to the training data and evaluated on the validation data. It works best for small datasets, performing grid search over the defined hyper-parameter space.

Random search: Each iteration tries a random combination of hyper-parameters from this grid, records the performance [16–19], and finally returns the combination of hyper-parameters which provided the best performance. Random search is preferred over grid search in most cases, as not all hyper-parameters are equally important. Random search is used for large datasets [16].

In this paper, we use random search hyper-parameter tuning. The hyper-parameters are tuned to get the best-fit model: the number of neuron units, the number of layers, and the learning rate are chosen and passed through model training and model evaluation; if the MSE is lower for the tuned parameters, then this is the best model fit; otherwise, the tuning search loop continues, as shown in Fig. 3.
3.2 Hyper-Parameter Tuning Summary

In this paper, we apply the random search method to find the best parameters for the LSTM model. We tune the number of layers, the number of units, and the learning rate in the ranges specified in Table 2.
Table 2 Minimum and maximum values of the LSTM model's parameters

Parameter        | Min. value | Max. value | Step
Number of layers | 2          | 6          | 1
Number of units  | 32         | 512        | 32
Learning rate    | 0.001      | 0.1        | (values 0.1, 0.01, 0.001)

Table 3 Construction of LSTM model with hyper-parameter tuning

Layer     | Output shape    | Param #
LSTM-1    | (None, 23, 320) | 412,160
Dropout-1 | (None, 23, 320) | 0
LSTM-2    | (None, 23, 320) | 820,480
Dropout-2 | (None, 23, 320) | 0
LSTM-3    | (None, 160)     | 307,840
Dropout-3 | (None, 160)     | 0
Dense     | (None, 1)       | 161
This model searches for the optimal number of layers from a minimum of 2 to a maximum of 6 with a step size of 1, the number of units for each layer from 32 to 512 with a step size of 32, and a learning rate of 0.1, 0.01, or 0.001. Among all of these, the best parameters found are: number of layers: 3; number of units: 320, 320, 160; and learning rate: 0.001. The preprocessed data is fed to LSTM layer one with the updated parameters; after the data is processed by the LSTM model, it is passed to a dropout layer of 20%, which prevents overfitting. The next layer is similar to layer one. At last, all the data is summarized at a dense layer. This process is repeated until the RMSE converges (Table 3). Total params: 1,540,641. Trainable params: 1,540,641. Non-trainable params: 0.
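The random search over the Table 2 ranges can be sketched as a plain loop (reusing X, y, and split from the earlier sketch). For brevity, all recurrent layers share one sampled unit count, whereas the tuner in the paper selected per-layer units (320, 320, 160); the number of trials and epochs are arbitrary choices.

```python
import numpy as np
import tensorflow as tf

def build_model(n_layers, units, lr, win=23):
    """Stack of n_layers LSTM blocks, each followed by 20% dropout."""
    layers = [tf.keras.Input(shape=(win, 1))]
    for k in range(n_layers):
        layers.append(tf.keras.layers.LSTM(units,
                                           return_sequences=(k < n_layers - 1)))
        layers.append(tf.keras.layers.Dropout(0.2))
    layers.append(tf.keras.layers.Dense(1))
    model = tf.keras.Sequential(layers)
    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=lr),
                  loss="mse")
    return model

rng = np.random.default_rng(1)
best = (np.inf, None)
for _ in range(10):                                    # random trials
    cfg = (int(rng.integers(2, 7)),                    # layers in [2, 6]
           int(rng.choice(np.arange(32, 513, 32))),    # units in {32, ..., 512}
           float(rng.choice([0.1, 0.01, 0.001])))      # learning rate
    m = build_model(*cfg)
    hist = m.fit(X[:split], y[:split], epochs=2, batch_size=64,
                 validation_data=(X[split:], y[split:]), verbose=0)
    val = hist.history["val_loss"][-1]
    if val < best[0]:
        best = (val, cfg)
print("best (val MSE, (layers, units, lr)):", best)
```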
4 Case Study

The data for our experiment is taken from a local power company, from January 1, 2016, to January 1, 2017. The sampling period is 1 h, and a total of 8784 sampling points are taken. Data from January 1, 2016, to December 13, 2016, is taken as training samples for predicting the load from December 14, 2016, to January 1, 2017 (Fig. 4).
Fig. 4 One-year load data from a city power company
In this paper, 8344 samples in total are taken for training, and 440 load data samples are taken as test data. The model is trained, and the prediction is made with the help of the LSTM model. In order to get better results, we perform hyper-parameter tuning, and a comparative analysis is made.
4.1 Error Calculation Without Hyper-Parameter Tuning

The LSTM model trains on sufficient data of the year 2016 until the error reduces to a desirable limit. In order to estimate the error, the root mean square error (RMSE) is calculated. The average RMSE was 0.1387325335 (Fig. 5).
Fig. 5 Estimation result of power demand for year 2016 without hyper-parameter tuning (orange: predicted; blue: ground truth)
Fig. 6 Estimation result of power demand for year 2016 with hyper-parameter tuning (orange: predicted; blue: ground truth)
4.2 Error Calculation with Hyper-Parameter Tuning

The RNN (LSTM) with hyper-parameter tuning trains on sufficient data of the year 2016 until the error reduces to the desirable limit. Here the random search CV method is applied. In order to estimate the error, the root mean square error (RMSE) is calculated. The average RMSE was 0.11124 (Fig. 6). From the above study, we can deduce that the LSTM model with hyper-parameter tuning gives better results compared with the plain LSTM forecasting model: the RMSE drops from 0.13873 to 0.11124, i.e., by (0.13873 − 0.11124)/0.13873 ≈ 19.81%.
5 Conclusion

The LSTM model of deep learning is very useful in mid-term load forecasting. The LSTM model not only solves the problems of the vanishing and exploding gradients but also reduces the accumulated error. The LSTM model with hyper-parameter tuning (random search) optimizes the model's architecture by tuning the number of layers, units, and learning rate. With the help of this model, we have reduced the RMS error from 0.1387325335 to 0.11124, and our forecasting has improved by 19.81%.
References

1. Yegnanarayana B (2006) Artificial neural networks, 1st edn. Prentice-Hall of India Private Limited, India
2. Baek SM (2019) Mid-term load pattern forecasting with recurrent artificial neural network. IEEE Access 7:172830–172838
3. Mao H, Zeng XJ, Leng G, Zhai YJ, Keane JA (2009) Short-term and midterm load forecasting using a bilevel optimization model. IEEE Trans Power Syst 24(2):1080–1090
4. Song KB, Ha SK (2004) An algorithm of short-term load forecasting. Trans Korean Inst Electr Eng A 53(10):529–535
5. Song KB, Baek YS, Hong DH, Jang G (2005) Short-term load forecasting for the holidays using fuzzy linear regression method. IEEE Trans Power Syst 20(1):96–101
6. Chen H, Canizares CA, Singh A (2001) ANN-based short-term load forecasting in electricity markets. In: 2001 IEEE power engineering society winter meeting. Conference proceedings (Cat. No. 01CH37194), vol 2, pp 411–415. IEEE
7. Lee KY, Cha YT, Park JH (1992) Short-term load forecasting using an artificial neural network. IEEE Trans Power Syst 7(1):124–132
8. Rui Y, El-Keib AA (1995) A review of ANN-based short-term load forecasting models. In: Proceedings of the twenty-seventh southeastern symposium on system theory, pp 78–82. IEEE
9. Yahaya AS, Javaid N, Latif K, Rehman A (2018) An enhanced very short-term load forecasting scheme based on activation function. In: 2019 international conference on computer and information sciences (ICCIS), pp 1–6. IEEE
10. Cui C, He M, Di F, Lu Y, Dai Y, Lv F (2020) Research on power load forecasting method based on LSTM model. In: 2020 IEEE 5th information technology and mechatronics engineering conference (ITOEC), pp 1657–1660. IEEE
11. Kong W, Dong ZY, Jia Y, Hill DJ, Xu Y, Zhang Y (2019) Short-term residential load forecasting based on LSTM recurrent neural network. IEEE Trans Smart Grid 10(1):841–851
12. Bouktif S, Fiaz A, Ouni A, Serhani MA (2020) Multi-sequence LSTM-RNN deep learning and metaheuristics for electric load forecasting. Energies 13(2):391
13. Zang H, Xu R, Cheng L, Ding T, Liu L, Wei Z, Sun G (2021) Residential load forecasting based on LSTM fusing self-attention mechanism with pooling. Energy 229:120682
14. Kathirgamanathan A, Patel A, Khwaja AS, Venkatesh B, Anpalagan A (2022) Performance comparison of single and ensemble CNN, LSTM and traditional ANN models for short-term electricity load forecasting. J Eng
15. Velasco LCP, Arnejo KAS, Macarat JSS (2022) Performance analysis of artificial neural network models for hour-ahead electric load forecasting. Procedia Comput Sci 197:16–24
16. Mantovani RG, Rossi AL, Vanschoren J, Bischl B, De Carvalho AC (2015) Effectiveness of random search in SVM hyper-parameter tuning. In: 2015 international joint conference on neural networks (IJCNN), pp 1–8. IEEE
17. Villalobos-Arias L, Quesada-López C (2021) Comparative study of random search hyper-parameter tuning for software effort estimation. In: Proceedings of the 17th international conference on predictive models and data analytics in software engineering, pp 21–29
18. Bergstra J, Bengio Y (2012) Random search for hyper-parameter optimization. J Mach Learn Res 13(2)
19. Villalobos-Arias L, Quesada-López C, Guevara-Coto J, Martínez A, Jenkins M (2020) Evaluating hyper-parameter tuning using random search in support vector machines for software effort estimation. In: Proceedings of the 16th ACM international conference on predictive models and data analytics in software engineering, pp 31–40
Chapter 58
A Comprehensive Survey: Benefits, Recent Works, Challenges of Optimal UAV Placement for Maximum Target Coverage Spandana Bandari and L. Nirmala Devi
1 Introduction

UAVs, often referred to as drones in the business world, are utilized in a wide range of applications. They offer a good way to reach locations that have not been investigated before and to learn about diverse occurrences without jeopardizing human safety (Mozaffari et al. [1]). Despite their origins in military circumstances, improved portable sensors as well as satellite positioning have made UAVs suitable for civil applications. Vertical takeoff and landing versions are preferred in civil applications because of their low cost and mobility. Multirotor UAVs are a type of UAV that may be used in a variety of agricultural and environmental applications (Lyu et al. [2]). UAV Base Stations (BSs) have developed as a cost-effective way to provide wireless services. The need for UAV-BSs might arise in a variety of circumstances, such as when the terrestrial network fails or when traffic from a crowded macro-BS has to be offloaded. A UAV-BS can also play a crucial role in enabling energy-efficient Internet of Things (IoT) connectivity by collecting information from IoT devices and transmitting it onward (Alzenad et al. [3]). Considering its various advantages, the UAV still faces several hurdles. Unlike terrestrial channels, in which the position of the BS is fixed and the path loss therefore depends only on the user's position, the Air-to-Ground (A2G) channel is a function of the locations of both the users and the UAV-BS [4]. Where to place the UAV-BS is a fundamental difficulty in UAV-aided communications. Moreover, unlike terrestrial BS placement, UAV-BS placement is not a 2D problem; it is, in fact, a 3D positioning issue. As a result, the UAV-BS does not cover the whole service area, and only partial coverage may be available (Yang et al. [5]).

S. Bandari (B) · L. N. Devi
Department of Electronics and Communication Engineering, University College of Engineering, Osmania University, Hyderabad, India
e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
R. P. Yadav et al. (eds.), Proceedings of the International Conference on Paradigms of Computing, Communication and Data Sciences, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-19-8742-7_58

Considering a restricted UAV-BS described
in terms of the received SNR, a fundamental difficulty tackled is where to place the UAV-BS so as to maximize the count of covered users (Mozaffari et al. [6]). An aerial BS in UAV communications is often a low-altitude platform whose performance depends on the "altitude, position, transmit power, and kind of UAVs, as well as the environment's features". The optimal positioning of UAVs has piqued researchers' attention in this respect (Bushnaq et al. [7]). The majority of the UAVs' positions, however, cannot be determined constantly or with confidence in various real-world situations, such as in GPS-denied areas (Zhang et al. [8]). It is necessary to examine both the multi-UAV relative localization and area coverage problems in this situation. The control inputs are obtained by handling the optimal UAV coverage placement issue under communication and motion constraints using different "optimization algorithms such as Particle Swarm Optimization (PSO), Genetic Algorithm (GA)", heuristic algorithms, and so on, to balance locational optimization placement against area coverage placement and relative localization accuracy (Lagum et al. [9]). There are still numerous technological obstacles in the investigation of deployment coverage. To begin with, unlike for a land BS, the path loss of a UAV-BS is set by both the user's and the UAV's positions. The placement of a UAV-BS, moreover, represents a 3D issue having nonlinear restrictions. Furthermore, many modern UAV-BSs are battery-powered and have a short operational period (Karakaya [10]). As a result, it is very important to research a strategy that covers the work area quickly and effectively while also adjusting to a range of circumstances and achieving the user's QoS. The contributions of this paper are:

• To provide a critical literature review on intelligent UAV placement for attaining maximum target coverage by gathering the existing works.
• To review and categorize the algorithms together with the performance measures and the implementation platforms of the various UAV placement models in each contribution.
• To identify the issues and limitations that must be addressed in modeling and understanding competent, effective, and robust UAV deployment models, as well as future research areas.

The paper is organized as follows. Section 1 gives the introduction to optimal UAV placement. The literature review on optimal UAV placement is given in Sect. 2. Section 3 explains the algorithmic analysis and the performance measures concentrated on in optimal UAV placement. The research gaps and challenges are discussed in Sect. 4. The conclusion is in Sect. 5.
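As a toy illustration of why UAV-BS placement is a 3D problem, the sketch below grid-searches candidate positions to maximize the count of covered users. The fixed-aperture coverage disk of radius h·tan(θ) is a deliberate simplification standing in for an SNR-based A2G coverage model, and every number in it is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(2)
users = rng.uniform(0, 1000, size=(200, 2))        # user positions, metres

def covered(uav_xy, h, theta_deg=30.0):
    """Users inside the disk of radius h*tan(theta) centred under the UAV."""
    r = h * np.tan(np.radians(theta_deg))
    return int(np.sum(np.linalg.norm(users - uav_xy, axis=1) <= r))

# Exhaustive search over a 3D grid of candidate placements (x, y, altitude)
grid = np.linspace(0, 1000, 21)
heights = np.linspace(50, 300, 6)
best = max((covered(np.array([gx, gy]), h), gx, gy, h)
           for gx in grid for gy in grid for h in heights)
print("covered users, x, y, h:", best)
```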
2 Literature Review on Optimal UAV Placement for Maximum Target Coverage

2.1 Chronological Review

The chronological review of optimal UAV placement concerning maximum target coverage is shown in Fig. 1. Here, 20 papers are considered from the years 2017–2021. The UAV-based approach was initially developed in 1993; a few of the conventional UAV-based research works from 1999 to 2014 are depicted here. In Chen et al. [11], the author implemented and modified the Max–Min Ant System (MMAS) algorithm for solving a practical optimization issue; the suggested method generated routing plans for a restricted count of UAVs to cover the greatest number of targets considering their flight range. In Ruddle et al. [12], the author analyzed how path planning for a UAV tracking a ground target can be computed as an optimal control issue comprising a set of boundary conditions, a system dynamic, a cost criterion, and control constraints. Thyson and Shui [13] analyzed the requirements of observers for flights of an Uninhabited Air Vehicle (UAV) that were experimented with the help of a desktop Virtual Environment (VE). In Torun [14], a UAV system acts as a UAV-aided weapon system for performing boost-stage intercept and attacking the booster's launcher. In Hu et al. [15], the author analyzed UAV requirements such as operational restrictions, effects on UAV missions, battlefield situations, technological limits, and environmental situations. In this survey, we collect research works from 2017 to 2022; the maximum number of papers is from the year 2021. Here, 5% of the works are gathered from 2017, 10% from 2018, 25% from 2019, 20% from 2020, and the remaining 40% from 2021.
2.2 Literature Review

In 2017, Hu et al. and Alzenad et al. [16] investigated the challenge of searching for moving objects with multiple UAVs, searching the mission area for a number of moving targets; the targets could acquire location information from sensing devices on an irregular basis and take steps to increase the distance between themselves and the UAVs. In comparison with the coverage search technique and the random search technique, simulation findings demonstrated that the suggested technique could effectively increase cooperative searchability and identify more targets per unit time. In 2018, Alzenad et al. and Khan et al. [17] investigated a new 3D UAV-BS deployment that optimizes the count of users covered while meeting varying QoS criteria. To solve the placement problem, a low-complexity technique called the Maximum Weighted Area (MWA) algorithm was
Fig. 1 Chronological review
offered. Simulation studies demonstrated that the MWA algorithm performed very similarly to the ES method with a considerable reduction in complexity. Khan et al. and Khuwaja et al. [18] studied the subject of low-complexity target tracking to capture and track moving targets with flying robots. To optimize the effectiveness of the suggested algorithms, partial information associated with the target mobility was used. "Predictive fuzzy, predictive incremental fuzzy, and local incremental fuzzy" were three relatively effective techniques proposed and deployed on drone quadcopters in a real-world indoor testbed dubbed drone-be-gone. The deployment backed up the simulation findings and showed that the offered methods were suitable for real-time applications. In 2019, Khuwaja et al. and Chauhan et al. [19] suggested a coordinated multi-UAV tactic in two situations. In the first scenario, UAVs were considered to be symmetrically placed at the same optimum height and transmit power. The numerical findings showed that the "SINR threshold, separation distance, and count of UAVs" and their formations should all be carefully chosen to maximize coverage within the target region while minimizing needless expansion beyond it. As a result, this study offered essential design principles for multiple UAV deployments in the context of co-channel interference. Chauhan et al. and Zhang and Duan [20] proposed an integer linear programming framework to optimize coverage. The novel formulation was known as the "Maximum Coverage Facility Location Problem with Drones (MCFLPD)". It was a complicated issue, and even for comparatively small problem sizes, an existing MIP solver might take too long to discover viable solutions. A variety of scenarios were run to demonstrate the impact of changing drone battery capacities on coverage. Zhang and Duan and Yao et al.
[21] investigated two rapid UAV deployment problems: one was to minimize the overall deployment delay (min–sum) for performance reasons, and the other was to minimize the maximum deployment time among all the UAVs (min–max) for fairness reasons. It was shown that both the min–max and min–sum problems are, in general, NP-complete. An optimal min–max method with low computational complexity O(n²) was offered for deploying UAVs from the same place. Yao et al. and Das et al. [22] concentrated on "UAV offline route planning for a coverage search mission in a river environment". This brief attempted to develop viable UAV paths. To begin, the prior likelihood distribution was approximated using a Gaussian mixture model, and the river segments with higher detection probability, matching Gaussian components, could be recovered. Moreover, simulations on a genuine river map were used to evaluate the performance of the introduced method, and the results showed that it performed well in a variety of circumstances. Das et al. and Campo et al. [23] investigated the challenge of mobile target tracking for two distinct categories of targets with the lowest count of mobile trackers. Approaches were presented for calculating the least count of trackers and the trajectories necessary to monitor all the mobile targets, given the observation duration and target trajectories. Two types of mobile targets were regarded: (1) targets that must be tracked for the duration of the observation and (2) targets that must be tracked at least once throughout the observation time. It was shown that the problem is significantly more challenging, i.e., NP-complete, even when target trajectories are announced in advance. In 2020, Viviana et al. and Zhao et al. [24] recommended deploying a low-cost Lightweight UAV (LUAV) to collect agricultural data. A LUAV, however, has a shorter flight time and is less sturdy compared with professional vehicles. To get over these constraints, a LUAV agent was created that used a heuristic method to optimize coverage paths in known areas. A Kalman filter modification was used to estimate motions during missions, which added to the robustness of outdoor positioning. Zhao et al. and Pellegrino et al. [25] suggested a relative distance-oriented UAV deployment technique using a BS; furthermore, the algorithm was adaptable to a wider range of circumstances. The suggested algorithm's coverage was 22.4% greater than random deployment and 9.9%, 4.7%, and 2.1% better than the "equivalent virtual force-oriented node, circular binary segmentation, and hybrid local virtual force methods", respectively, according to simulation findings. Pellegrino et al. and Li et al. [26] presented a method for coordinating dynamic groups of UAVs to tackle a particular challenge of area coverage; previous solutions to the problem usually used a set of fixed UAVs. Xiong et al. and Kyriakakis et al. [27] looked at how to arrange several UAVs' trajectories for sweep coverage. A mathematical model was developed that took into account the various targets in a provided region. Next, based on the model's features, a heuristic approach called "Weighted Targets Sweep Coverage (WTSC)" was presented to determine the best path, taking into account target weights as well as UAV performance restrictions. Furthermore, several numerical tests and comparisons with prior work were presented to describe the advantage of the recommended technique.
In 2021, Kyriakakis et al. and Cho et al. [28] proposed a "Cumulative UAV Routing Problem (CUAVRP)" technique. The goal of coverage path design was to find a route that passes through all of the points of interest in a given region. This article analyzed a search and rescue mission with a homogeneous fleet of UAVs; the goal was to reduce the total arrival times at all the sites in the region of interest, finishing the search with the least amount of delay possible. A min–max objective was also developed and verified. The cumulative UAV routing problem was solved using three variants of the "Parallel Weighted Greedy Randomized Adaptive Search Procedure-Variable Neighborhood Descent (GRASP-VND)" method. Won et al. and Savkin and Huang [29] suggested a two-phase technique in maritime SAR for tackling the "Coverage Path-Planning (CPP)" problem of multiple UAV regions. A Randomized Search Heuristic (RSH) technique was designed to handle the problem for large-scale cases. To test the algorithm's performance, a series of numerical tests was run; experimental findings demonstrated that the RSH produced a superior solution with a 0.7% optimality gap in a fraction of the time it took a commercial solver to compute. Furthermore, in terms of solution quality, the grid-oriented CPP method beat those utilized in prior studies. In addition, the findings of real-world flying tests conducted in the maritime region using the suggested method were published. Savkin and Huang and Bushnaq et al. [30] looked at using self-driving drones to give Internet access to disaster victims. A range-oriented reactive drone deployment method was presented, as opposed to certain existing location-oriented techniques; it is decentralized and simple to put into action in real time. Comparisons and simulations with a benchmark system were used to show the algorithm's performance. Bushnaq et al. and Zhang et al. [31] compared the performance of "T-UAV and regular/untethered UAV (U-UAV)-based cellular traffic offloading from a geographical region with heavy traffic". Initially, joint distance distributions were constructed within "hotspot users, the Terrestrial Base Station (TBS), and the UAV" with the help of stochastic geometry techniques. A user association strategy was created, and the related association areas were mathematically determined, to optimize the end-to-end SNR. With adequate GS sites' tether length and accessibility, numerical studies demonstrated that the T-UAV beats the U-UAV. Zhang et al. and Pehlivanoglu and Pehlivanoglu [32] focused on relative localization-oriented optimal area coverage placement with the help of numerous UAVs. Cooperative coverage control and multi-UAV relative localization must be performed concurrently in this instance, which is a difficult task. In this work, a relative localization method based on a distributed coverage control law and a single landmark was presented; it balanced optimal coverage control and optimal relative localization and was addressed with Sequential Quadratic Programming (SQP) methods. According to simulation findings, the suggested technique could ensure that a group of UAVs is effectively localized cooperatively while also completing the area coverage job. Pehlivanoglu and Pehlivanoglu and Li et al. [33] used artificial intelligence approaches such as the "Genetic Algorithm (GA), Ant Colony Optimizer (ACO), Voronoi diagram, and clustering algorithms".
This article’s major contribution was to offer early population augmentation strategies in GA, which would speed up the convergence procedure. Three techniques were incorporated into the initial
population stage of the GA to prevent a UAV from crashing. To get out of difficulties, the first technique used Voronoi vertices as additional waypoints; the second technique used the cluster centers of Voronoi vertices as additional waypoints. The results demonstrated that collision with the terrain surface is a local event and that addressing the problem through the cluster center of collision locations yielded the optimal outcomes, with at least a 70% reduction in the count of objective function evaluations required. Li et al. and Zheng and Ma [34] investigated the challenge of numerous UAVs cooperating to seek dynamically moving objects. To unify environmental data within the UAVs, an updated Least Square Method (LSM) compensated for the lost data. The efficacy of the above cooperative searching technique was validated by simulated data and comparative studies with previous techniques. Zheng et al. and Kishk et al. [35] suggested an intelligent target detection approach for UAV swarms. To partition the search space into cubes, a target-feature-information-oriented decomposition approach was initially developed; it allowed the UAVs' routes to be optimized throughout the search phase. All targets, within detection angle limits, could be identified by the UAVs using the suggested technique, according to simulation findings. Furthermore, using the 3D probability map improved the search effectiveness from 23.4 to 78.1%.
3 Algorithmic Analysis and Performance Measures Concentrated in Optimal UAV Placement

3.1 Algorithmic Analysis

The algorithmic analysis of optimal UAV placement for maximum target coverage is given in Fig. 2. Various forms of algorithms, such as optimization algorithms, greedy algorithms, and several miscellaneous algorithms, are considered in the gathered contributions. In Savkin and Huang [29], the author developed a Randomized Search Heuristic (RSH) algorithm to handle large-scale instances. In Zhao et al. [24], the author implemented Dijkstra's algorithm to select the path. In Khan et al. [17], the author designed a low-complexity algorithm, namely the Maximal Weighted Area (MWA) algorithm, to deal with the placement issue. In Chauhan et al. [19], the author developed a path-planning algorithm to maximize energy efficiency and minimize interference and latency. In Bushnaq et al. [30], the author implemented a range-based reactive drone deployment algorithm for minimizing the average drone-user distance. In Pehlivanoglu and Pehlivanoglu [32], the author developed a single-landmark-based relative localization algorithm with a distributed coverage control law. In Kyriakakis et al. [27], the author implemented a heuristic algorithm, Weighted Targets Sweep Coverage (WTSC), to find the optimal path. In Kishk et al. [35], the author developed a Kuhn–Munkres (KM) algorithm-aided path-planning approach for UAVs to traverse the cubes.
form of an optimal placement algorithm, a local optimization algorithm, hybrid PSO, and GA. Here, 25% of the works considered optimization algorithms, 15% used greedy algorithms, and the remaining miscellaneous algorithms (the RSH, wavefront, Dijkstra's, spiral, path-planning, reactive drone deployment, simulated annealing search, SQP, dynamic swarm, WTSC, pseudo-polynomial time, clustering, auction, and KM algorithms) were each used in only 5% of the works.
3.2 Performance Metrics

The different kinds of performance measures used in the collected works are listed in Table 1. Here, the majority of the works considered time measurement, followed by distance, coverage ratio, coverage probability, and RMSE. Time measurement is used in 75% of the works, distance and coverage ratio in 25% each, coverage probability and RMSE in 10% each, and the remaining miscellaneous measures such as sensitivity, CDF, density ratio,
delay, distribution probability, speed, and radius are each used in only 5% of the works.

Table 1 Performance metrics utilized in the UAV optimal placement. For each surveyed work, from Alzenad et al. [16] to Kishk et al. [35], the table marks which of time, distance, coverage ratio, coverage probability, RMSE, and the miscellaneous measures (sensitivity; CDF and density ratio; delay; distribution probability; speed and radius) are reported.
3.3 Implemented Platforms

The platforms utilized in the considered contributions are shown in Fig. 3.

Fig. 3 Implemented platforms for optimal UAV placement

Here, the majority of the works have used MATLAB as the platform tool. Across the collected works, MATLAB is used in 56% of the works, Gurobi in 13%, Qt and ROS-Indigo in 7% each, and the remaining tools (MOSEK, CPLEX Optimizer, ROS/Gazebo, and Monte Carlo simulation) in only 6% each.
4 Research Gaps and Challenges

Coordination of collaborating UAVs has become a hot topic in academia. As a building block, most techniques for coordinating these swarms incorporate a response to a category of issues known as area coverage (Pan et al. [36]). The goal of area coverage problems is to gather data about a specific area of interest (usually a polyhedron or polygon) with the help of a group of UAVs, usually with decentralized coordination and little human participation. This challenge represents the needs of a variety of applications, including emergency and disaster response, gas leak detection, and finding lost individuals, to name a few (Mozaffari et al. [37]). To conduct identification or tracking duties, "Wireless Sensor Networks (WSNs)" are often established in the target region. WSNs are concerned with sweep coverage because it allows them to monitor many targets in a given region with fewer mobile sensor nodes. With the fast advancement of UAV technology, UAVs are becoming more common in both military and civilian applications. UAVs may be thought of as mobile sensor nodes (Zhang et al. [38]); given their limited endurance in flying missions, comprehensive coverage of all targets in a large-scale monitoring situation is hard to obtain. UAV networks have developed as a possible method for quickly providing coverage, yet the related placement
issues in sensor networks have only lately been examined. UAVs, unlike sensors, must be placed in the air, and their flight speed, operational altitude, and wireless coverage radius are completely distinct (Azari et al. [39]). Mobile WSNs have been widely used to improve surveillance and environmental monitoring. They are especially useful when rapid, low-cost, or short-term visual sensing solutions are needed. Nowadays, UAVs have become an increasingly significant and integrated aspect of civilian and military activities. The primary goal of many UAV missions is to visit specified checkpoints in the operating space [39]. Determining a workable solution may take too long as the number of checkpoints and constraints grows. The most common use of UAV swarms is the effective and comprehensive identification of unknown objects. Targets, in the majority of cases, have directional features that allow them to be spotted only from specific angles. The major difficulty to be resolved in such instances is how to coordinate the UAVs and designate optimal paths for them to effectively identify all targets.
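As a concrete baseline for the area coverage problem sketched above, the following Python snippet (our own illustration, not taken from any surveyed work) generates the simple back-and-forth "lawnmower" sweep that most coverage path planners refine:

def lawnmower_path(width, height, step=1.0):
    # Waypoints of a back-and-forth sweep over a width x height rectangle;
    # rows alternate direction so the UAV never retraces a lane.
    waypoints = []
    y, left_to_right = 0.0, True
    while y <= height:
        xs = (0.0, width) if left_to_right else (width, 0.0)
        waypoints += [(x, y) for x in xs]
        left_to_right = not left_to_right
        y += step
    return waypoints

print(lawnmower_path(4, 2))
# [(0.0, 0.0), (4, 0.0), (4, 1.0), (0.0, 1.0), (0.0, 2.0), (4, 2.0)]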
5 Conclusion

This study has presented a comprehensive evaluation of the literature on intelligent UAV placement for optimal target coverage. The algorithms developed in all contributions were classified and reviewed. The performance metrics and implementation platforms of the different UAV placement models were also examined. Furthermore, it identified the issues and limitations that must be addressed in modeling and understanding competent, effective, and robust UAV placement models, as well as future research directions.
References

1. Mozaffari M, Saad W, Bennis M, Debbah M (2017) Mobile unmanned aerial vehicles (UAVs) for energy-efficient internet of things communications. arXiv:1703.05401
2. Lyu J, Zeng Y, Zhang R, Lim TJ (2017) Placement optimization of UAV-mounted mobile base stations. IEEE Commun Lett 21(3):604–607
3. Alzenad M, El-Keyi A, Lagum F, Yanikomeroglu H (2017) 3-D placement of an unmanned aerial vehicle base station (UAV-BS) for energy-efficient maximal coverage. IEEE Wireless Commun Lett 6(4):434–437
4. Zeng Y, Zhang R (2017) Energy-efficient UAV communication with trajectory optimization. IEEE Trans Wireless Commun 16(6):3747–3760
5. Yang Z et al (2018) Joint altitude, beamwidth, location, and bandwidth optimization for UAV-enabled communications. IEEE Commun Lett 22(8):1716–1719
6. Mozaffari M, Kasgari ATZ, Saad W, Bennis M, Debbah M (2019) Beyond 5G with UAVs: foundations of a 3D wireless cellular network. IEEE Trans Wireless Commun 18(1):357–372
7. Bushnaq OM, Celik A, Elsawy H, Alouini M-S, Al-Naffouri TY (2019) Aeronautical data aggregation and field estimation in IoT networks: hovering and traveling time dilemma of UAVs. IEEE Trans Wireless Commun 18(10):4620–4635
8. Zhang H, Song L, Han Z (2020) Cellular assisted UAV sensing. In: Unmanned aerial vehicle applications over cellular networks for 5G and beyond. Springer, Cham, pp 101–221
9. Lagum F, Bor-Yaliniz I, Yanikomeroglu H (2018) Strategic densification with UAV-BSs in cellular networks. IEEE Wireless Commun Lett 7(3):384–387
10. Karakaya M (2014) UAV route planning for maximum target coverage. CSEIJ 4(1)
11. Chen H, Chang K, Agate CS (2009) Tracking with UAV using tangent-plus-Lyapunov vector field guidance. In: 2009 12th international conference on information fusion, Seattle, WA, USA
12. Ruddle RA, Savage JC, Jones DM (1999) Effects of camera configurations on target observation that is performed from an uninhabited air vehicle 43(1)
13. Thyson NA, Shui VH (1996) Digitizing BMC3 and fire control for UAV-based theater missile defense. In: Digitization of the battlefield, Proc SPIE, vol 2764
14. Torun E (1999) UAV requirements and design consideration. Technical & Project Management Department, pp 26–28
15. Hu X, Liu Y, Wang G (2017) Optimal search for moving targets with sensing capabilities using multiple UAVs. J Syst Eng Electron 28(3):526–535
16. Alzenad M, El-Keyi A, Yanikomeroglu H (2018) 3-D placement of an unmanned aerial vehicle base station for maximum coverage of users with different QoS requirements. IEEE Wireless Commun Lett 7(1):38–41
17. Khan M, Heurtefeux K, Mohamed A, Harras KA, Hassan MM (2018) Mobile target coverage and tracking on drone-be-gone UAV cyber-physical testbed. IEEE Syst J 12(4):3485–3496
18. Khuwaja AA, Zheng G, Chen Y, Feng W (2019) Optimum deployment of multiple UAVs for coverage area maximization in the presence of co-channel interference. IEEE Access 7:85203–85212
19. Chauhan D, Unnikrishnan A, Figliozzi M (2019) Maximum coverage capacitated facility location problem with range constrained drones. Transp Res Part C Emerg Technol 99:1–18
20. Zhang X, Duan L (2019) Fast deployment of UAV networks for optimal wireless coverage. IEEE Trans Mob Comput 18(3):588–601
21. Yao P, Xie Z, Ren P (2019) Optimal UAV route planning for coverage search of stationary target in river. IEEE Trans Control Syst Technol 27(2):822–829
22. Das A, Shirazipourazad S, Hay D, Sen A (2019) Tracking of multiple targets using optimal number of UAVs. IEEE Trans Aerosp Electron Syst 55(4):1769–1784
23. Campo LV, Ledezma A, Corrales JC (2020) Optimization of coverage mission for lightweight unmanned aerial vehicles applied in crop data acquisition. Expert Syst Appl 149
24. Zhao T, Wang H, Ma Q (2020) The coverage method of unmanned aerial vehicle mounted base station sensor network based on relative distance. Int J Distrib Sens Netw 16(5)
25. Pellegrino G, Mota G, Assis F, Gorender S, Sá A (2020) Simple area coverage by a dynamic set of unmanned aerial vehicles. In: 2020 X Brazilian symposium on computing systems engineering (SBESC), pp 1–8
26. Li J, Xiong Y, She J, Wu M (2020) A path planning method for sweep coverage with multiple UAVs. IEEE Internet Things J 7(9):8967–8978
27. Kyriakakis NA, Marinaki M, Matsatsinis N, Marinakis Y (2021) A cumulative unmanned aerial vehicle routing problem approach for humanitarian coverage path planning. Eur J Oper Res 300(3):992–1004
28. Cho SW, Park HJ, Lee H, Shim DH, Kim SY (2021) Coverage path planning for multiple unmanned aerial vehicles in maritime search and rescue operations. Comput Ind Eng 161
29. Savkin AV, Huang H (2021) Range-based reactive deployment of autonomous drones for optimal coverage in disaster areas. IEEE Trans Syst Man Cybern Syst 51(7):4606–4610
30. Bushnaq OM, Kishk MA, Celik A, Alouini M-S, Al-Naffouri TY (2021) Optimal deployment of tethered drones for maximum cellular coverage in user clusters. IEEE Trans Wireless Commun 20(3):2092–2108
31. Zhang Z, Xu X, Cui J, Meng W (2021) Multi-UAV area coverage based on relative localization: algorithms and optimal UAV placement. Sensors 21(7)
32. Pehlivanoglu YV, Pehlivanoglu P (2021) An enhanced genetic algorithm for path planning of autonomous UAV in target coverage problems. Appl Soft Comput 112
33. Li L, Zhang X, Yue W, Liu Z (2021) Cooperative search for dynamic targets by multiple UAVs with communication data losses. ISA Trans 114:230–241
34. Zheng X, Ma C (2021) An intelligent target detection method of UAV swarms based on improved KM algorithm. Chin J Aeronaut 34(2):539–553
35. Kishk MA, Bader A, Alouini M-S (2019) On the 3-D placement of airborne base stations using tethered UAVs. arXiv:1907.04299
36. Pan C, Ren H, Deng Y, Elkashlan M, Nallanathan A (2019) Joint blocklength and location optimization for URLLC-enabled UAV relay systems. IEEE Commun Lett 23(3):498–501
37. Mozaffari M, Saad W, Bennis M, Nam Y-H, Debbah M (2019) A tutorial on UAVs for wireless networks: applications, challenges, and open problems. IEEE Commun Surv Tutor 21(3):2334–2360
38. Zhang H, Song L, Han Z (2020) UAV assisted cellular communications. In: Unmanned aerial vehicle applications over cellular networks for 5G and beyond. Springer, Cham, pp 61–100
39. Azari MM, Rosas F, Pollin S (2019) Cellular connectivity for UAVs: network modeling, performance analysis, and design guidelines. IEEE Trans Wireless Commun 18(7):3366–3381
Chapter 59
Comparative Study Between Different Algorithms of Data Compression and Decompression Techniques Babacar Isaac Diop, Amadou Dahirou Gueye, and Alassane Diop
1 Introduction

With COVID-19, data exchange, distance education, e-commerce, online activities and telework have become essential. This has made data both crucial and voluminous, while its storage and transfer consume ever more media and cost. The problem is even discussed at the UN, because the rare earths from which electronic chips and certain storage media are made are scarce and coveted, to the point of fueling economic and asymmetric wars and potential future conflicts, both internal and international. Despite the many solutions proposed, such as increasing storage space (data centers, hard disks, storage media), increasing bandwidth (5G, fiber optics, air-fiber antennas, etc.), and centralizing data so that it is not replicated (cloud, data centers, servers, etc.), data compression remains the most efficient and least expensive. There are many techniques in this field, applied to 3D, images [1], big data, e-commerce, telework, security, finance, medicine, smart cards, IoT, AI, etc. In this context, we compare three of them (BID, Huffman, LZ77) in order to determine the best. In this paper, we deal with (1) the state of the art, (2) the presentation of the algorithms, and (3) their comparison, and we end with (4) a conclusion.
B. I. Diop (B) · A. D. Gueye ICT, Alioune DIOP University of Bambey, Bambey, Senegal e-mail: [email protected] A. D. Gueye e-mail: [email protected] A. Diop ICT, Virtual University of Senegal, Diamniadio, Senegal e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 R. P. Yadav et al. (eds.), Proceedings of the International Conference on Paradigms of Computing, Communication and Data Sciences, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-19-8742-7_59
But before that, we will present related works on compression and decompression techniques.
2 State of the Art

The BID compression algorithm [2] is a lossless compression method. Its basic formula is f(x) = a^b ± c, such that size(a) + size(b) + size(c) < size(f(x)). Its performance increases as the value f(x) of the file approaches a^b, since c then becomes small. With the advancement of technology, compression software is used in many areas, and new techniques are constantly being created or improved. In this part, we present some algorithms for data compression and decompression.

In 2021, works on security combining AES with Huffman [3] and with LZ77 [4] were presented. AES is the most popular encryption technique for protecting data; it is combined with a compression method [3, 5] to reduce the size of the encrypted data. During the same year, a fast authentication technique (login and password) using a very efficient compression method was presented [6]; it reduces the comparison time and therefore the response time. There was also work on compression applied to big data, to reduce occupied space and facilitate data transfer, and on the classification of malicious software, to better process and order it. Compression was likewise applied to storage [7], images [1], and blocks [4] to reduce data size and save space. These works improved the LZ compression dictionary [8], the operators [6], the transmission of aerodynamic data, the code frequency [9], and the fluidity of communication [10].

In 2020, a conjecture resolution was presented in [1]: a compression method that replaces repeated strings with a smaller symbol. A DNA data compression method was proposed, based on an enhancement of LZ77 [11] by CWT to reduce its compression percentage. A publication showed the use of LZ77 [8] in 1D and 2D to compress images, drawings, figures or plans in one and two dimensions. The LZ77 logic decompression method presented in [10] accelerates LZ77 in order to reduce the size of the data to be transmitted, which saves bandwidth and resources (memory, CPU, etc.). A tree decomposition technique to improve Huffman was also proposed; it uses the best probability tree to reduce image data as much as possible [6].

In 2019, a comparison of Huffman (a probability-based compression method) and LZ77 (a dictionary-based compression technique) on a color image was presented in [8], and a comparison of Huffman and DICOM (a compression technique for medical imaging) in [6]; LZ77-based compression was treated in [4] and bidirectional text compression in [5].
In 2018, a paper was published explaining Blowfish (an encryption algorithm) combined with the Huffman compression technique [12]. Its main purpose is to reduce the size of the encrypted data, improving on the plain Blowfish technique. During 2017, a comparison between BWT (a compression technique based on the Burrows-Wheeler transform) and LZ77 [8] was made, along with an implementation of RLBWT based on BWT and its application with LZ77 [10]. In 2016, a comparison of Huffman and LZ77 on progressive images was demonstrated in [1]. Dynamic relative compression, a technique based on dynamic partial sums and the concatenation of substrings, also allows data compression. The implementation of a hashing algorithm combined with the LZ77 compression technique [4] makes it possible to reduce the volume of secure data. Big data genomic compression consists of compressing bioinformatics genome data with LZ77 and BWT. This section has related the compression methods to representative works. In the following, we present three lossless algorithms and compare them.
3 Presentation of Algorithms

To date, there are two types of compression: lossy and lossless. As the name suggests, lossy compression does not allow the original data to be recovered exactly: if A is compressed to give B, the reverse operation from B to A is not possible. Examples of lossy compression include MP3, JPEG, MP4, and AAC. In lossless compression, the compressed data can be fully recovered (WinRAR, ZIP, Huffman, LZ77, BID, etc.). In this part of the document, we present the three lossless algorithms that will be compared.
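As a quick check of the lossless property, Python's standard zlib module (whose DEFLATE format itself combines LZ77 matching with Huffman coding, the two families compared below) recovers the input exactly:

import zlib

data = b"for each rose, a rose is a rose"
packed = zlib.compress(data)              # DEFLATE = LZ77 + Huffman
assert zlib.decompress(packed) == data    # lossless: original recovered bit for bit
print(len(data), "->", len(packed), "bytes")  # tiny inputs may grow (header overhead)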
3.1 Huffman Coding

This method is a probability-based lossless compression technique. Huffman coding is very powerful and widely used. Example: here is a text: "for each rose, a rose is a rose".
740
B. I. Diop et al.
Token: a | each | is | for | rose | ,
Frequency: 2 | 1 | 1 | 1 | 3 | 1
(spaces are not assigned codes in this example)

Fig. 1 Tree of Huffman coding
After establishing the statistics, we are going to design the Huffman tree (Fig. 1). This tree (Fig. 1) allows us to establish the compression by following the position (the probability) of the composing of the text in the sheets, example: “for” = 0010 et “rose” = 1. Thus, Original text: for each rose, a rose is a rose. Compressed text: 0010 0000 1 0001 01 1 0011 01 1.
3.2 Code LZ77

LZ77 is a lossless compression technique found in much compression software. It is a method that uses a dictionary to compress. To make the idea accessible to a non-specialist, consider the following analogy: "et cetera = etc.", "United States of America = USA", "Mister = Mr.", "Madam = Mme.". We send only the abbreviations to the recipient, who has the same dictionary and can therefore reconstruct the four original expressions; thus, we reduce the size of the message before sending it. The reality is much more complex, as shown by [1, 4, 8, 10].
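In real LZ77, the "dictionary" is a sliding window over the data already processed, and the output is a stream of (offset, length, next character) triples. The following compact Python sketch of that mechanism is our own illustration and ignores the windowing and bit-packing optimizations of [1, 4, 8, 10]:

def lz77_compress(data, window=255):
    # Emit (offset, length, next_char) triples; offset/length reference the
    # longest match found in the sliding window, (0, 0, ch) is a literal.
    i, out = 0, []
    while i < len(data):
        best_off = best_len = 0
        for j in range(max(0, i - window), i):        # candidate match starts
            k = 0
            while i + k < len(data) - 1 and data[j + k] == data[i + k]:
                k += 1
            if k > best_len:
                best_off, best_len = i - j, k
        out.append((best_off, best_len, data[i + best_len]))
        i += best_len + 1
    return out

def lz77_decompress(triples):
    data = []
    for off, length, ch in triples:
        for _ in range(length):
            data.append(data[-off])                   # copy from the window
        data.append(ch)
    return "".join(data)

triples = lz77_compress("abracadabra abracadabra")
print(triples)                                        # 7 triples instead of 23 chars
assert lz77_decompress(triples) == "abracadabra abracadabra"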
3.3 Code BID

It is a method that was first presented in [2]. BID is a lossless compression algorithm. Its basic formula is

F(x) = a^b ± c (1)
This property allows it to compress the data and recover it exactly after decompression.
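As a toy illustration of formula (1), the following Python sketch (our own; the actual BID algorithm of [2] operates on file contents and uses a far more efficient search) brute-forces a representation f(x) = a^b ± c with the smallest possible c:

def bid_encode(fx):
    # Find (a, b, sign, c) with fx == a**b + sign*c and minimal |c| (brute force).
    best = None
    a = 2
    while a * a <= fx + 1:
        b, p = 1, a
        while p <= 2 * fx:
            c = fx - p                    # fx = a**b + c, where c may be negative
            cand = (abs(c), a, b, 1 if c >= 0 else -1)
            if best is None or cand < best:
                best = cand
            b, p = b + 1, p * a
        a += 1
    if best is None:
        raise ValueError("fx too small for this toy search")
    c, a, b, sign = best
    return a, b, sign, c

def bid_decode(a, b, sign, c):
    return a ** b + sign * c

a, b, sign, c = bid_encode(65536)
print(a, b, sign, c)                      # 2 16 1 0, i.e. 65536 = 2**16
assert bid_decode(a, b, sign, c) == 65536

Compression pays off exactly when the bits needed to store (a, b, c) fall below the size of f(x), as in the 17-bit example that follows.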
To make it easier to understand, here is a simple numeric example: a = 4 = 100 in binary, b = 8 = 1000 in binary, and f(x) = 4^8 = 65,536 = 1 0000 0000 0000 0000 in binary. Size of a = 3 bits, size of b = 4 bits, size of f(x) = 17 bits. So f(x), at 17 bits, is reduced to 3 bits + 4 bits, which equals 7 bits (with c = 0 here). The paper [2] is more explicit about this compression method. Since no pseudocode exists in the referenced articles, we give one below as an illustration:
Algorithm 1. Algorithm BID
Algorithm: BID
1. Input: file
2. Output: a, b, c, x
3. Fx