Algorithms for Intelligent Systems Series Editors: Jagdish Chand Bansal · Kusum Deep · Atulya K. Nagar
Nibaran Das Juwesh Binong Ondrej Krejcar Debotosh Bhattacharjee Editors
Proceedings of International Conference on Data, Electronics and Computing ICDEC 2022
Algorithms for Intelligent Systems Series Editors Jagdish Chand Bansal, Department of Mathematics, South Asian University, New Delhi, Delhi, India Kusum Deep, Department of Mathematics, Indian Institute of Technology Roorkee, Roorkee, Uttarakhand, India Atulya K. Nagar, School of Mathematics, Computer Science and Engineering, Liverpool Hope University, Liverpool, UK
This book series publishes research on the analysis and development of algorithms for intelligent systems with their applications to various real world problems. It covers research related to autonomous agents, multi-agent systems, behavioral modeling, reinforcement learning, game theory, mechanism design, machine learning, meta-heuristic search, optimization, planning and scheduling, artificial neural networks, evolutionary computation, swarm intelligence and other algorithms for intelligent systems. The book series includes recent advancements, modification and applications of the artificial neural networks, evolutionary computation, swarm intelligence, artificial immune systems, fuzzy system, autonomous and multi agent systems, machine learning and other intelligent systems related areas. The material will be beneficial for the graduate students, post-graduate students as well as the researchers who want a broader view of advances in algorithms for intelligent systems. The contents will also be useful to the researchers from other fields who have no knowledge of the power of intelligent systems, e.g. the researchers in the field of bioinformatics, biochemists, mechanical and chemical engineers, economists, musicians and medical practitioners. The series publishes monographs, edited volumes, advanced textbooks and selected proceedings. Indexed by zbMATH. All books published in the series are submitted for consideration in Web of Science.
Editors Nibaran Das Department of Computer Science and Engineering Jadavpur University Kolkata, India Ondrej Krejcar Faculty of Informatics and Management University of Hradec Kralove Hradec Králové, Czech Republic
Juwesh Binong Department of Electronics and Communication Engineering North Eastern Hill University Shillong, India Debotosh Bhattacharjee Department of Computer Science and Engineering Jadavpur University Kolkata, India
ISSN 2524-7565 ISSN 2524-7573 (electronic) Algorithms for Intelligent Systems ISBN 978-981-99-1508-8 ISBN 978-981-99-1509-5 (eBook) https://doi.org/10.1007/978-981-99-1509-5 © The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. The publisher, the authors, and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations. This Springer imprint is published by the registered company Springer Nature Singapore Pte Ltd. The registered company address is: 152 Beach Road, #21-01/04 Gateway East, Singapore 189721, Singapore Paper in this product is recyclable.
Preface
With great pleasure, we welcome you to the ICDEC-2022 conference in the Town of Shillong. The conference has attracted papers on various topics, which are grouped into five tracks: “Computer Networks, Communication and Security Track,” “Image, Video and Signal Processing Track,” “IOT, Cloud computing and Smart City Track,” “AI/ML, Big Data and Data Mining Track,” and “VLSI Design, Antenna, Microwave and Control Track.” In addition, tutorials at ICDEC-2022 are a great way to get up to speed on important and active research topics. In this event, we have three tutorials offered by the experts. The technical program includes six oral presentation sessions, three keynote addresses, one invited talk, and one industry lecture. We would like to express our sincere gratitude to the Program Committee and reviewers for carefully reviewing and evaluating the submissions. The success of this conference represents the efforts of all our colleagues, which are too numerous to name individually. We want to thank all of you for participating in this conference, making this a lively community dedicated to the advancement of technology. We hope you will find the ICDEC-2022 program exciting and stimulating and enjoy interacting with researchers from various fields. We offer a grand welcome to all the delegates at the International Conference on Data, Electronics, and Computing (ICDEC-2022), organized by the Department of Electronics and Communication Engineering and the Department of Computer Application, North-Eastern Hill University (A Central University), Shillong, Meghalaya, India, being held during September 7–9, 2022. It is an honor for us to lead ICDEC-2022 this year. ICDEC-2022 maintains a competitive review process, high-quality technical papers, stimulating in-depth tutorials, and a wide variety of keynote presentations. The successful organization of ICDEC-2022 has required the direction and advice provided by the steering committee of COMSYS educational trust, the ICDEC-2022 advisory board, and various other organizing committee members. We appreciate the initiative, dedication, and time of all the committee members and volunteers and the strong support we have received from all our sponsors, especially North-Eastern Hill University, Intel India, for their partial grant-in-aid. I thank all the authors and v
presenters for their contributions—helping us to have an excellent technical program. We also thank all members of our team who have helped us organize the logistics and event management and worked sincerely in providing publicity to raise adequate awareness in the community. With all efforts perfectly coordinated and aligned in the right path, we are confident that the conference will set the trend around its theme of achieving new heights in the area of “Machine learning, Computational Intelligence, Communications, VLSI, Networks and Systems, Bioinformatics, Internet of Things (IoT) and Security.” We would like to mention the guidance, support, and especially motivation we received from our respected professors at Jadavpur University, without which this conference would never get its present form. We offer our gratitude to Prof. Prabha Shankar Shukla, Vice Chancellor, NEHU, Shillong; Prof. L. Joyprakash Singh, Dean, School of Technology, NEHU, Shillong; Prof. Mita Nasipuri, Jadavpur University, Kolkata; Prof. Subhadip Basu, Jadavpur University, Kolkata; Prof. Ram Sarkar, Jadavpur University, Kolkata; Prof. Dipak Kumar Kole, Jalpaiguri Government Engineering College, Jalpaiguri, West Bengal; Dr. Swagata Mondal, Jalpaiguri Government Engineering College, Jalpaiguri, West Bengal; Dr. Rupaban Subadar, NEHU, Shillong; Dr. Arnab Kumar Maji, NEHU, Shillong; and Dr. Nilanjan Dey, JIS University, Kolkata, for their continuous support and contribution. Finally, a warm welcome to you all to North-Eastern Hill University, Shillong, a beautiful town in Northeastern India. Kolkata, India Shillong, India Hradec Králové, Czech Republic Kolkata, India
Nibaran Das Juwesh Binong Ondrej Krejcar Debotosh Bhattacharjee
Contents
Artificial Intelligence
Correlation Analysis of Stock Index Data Features Using Sequential Rule Mining Algorithms . . . 3
Nayanjyoti Mazumdar and Pankaj Kumar Deva Sarma
A CRF-Based POS Tagging Approach for Bishnupriya Manipuri Language . . . 19
Bhubneswar Das and Smriti Kumar Sinha
IFS: An Incremental Feature Selection Method to Classify High-Dimensional Data . . . 29
Nazrul Hoque, Hasin A. Ahmed, and Dhruba K. Bhattacharyya
Computer-Aided Identification of Loom Type of Ethnic Textile, the Gamusa, Using Texture Features and Random Forest Classifier . . . 37
Kangkana Bora, Lipi B. Mahanta, C. Chakraborty, Prahlad Borah, Kungnor Rangpi, Barun Barua, Bishnu Sharma, and R. Mala
Spanning Cactus Existence, Optimization and Extension in Windmill Graphs . . . 51
Chinmay Debnath, Krishna Daripa, Ritu Mondal, and Alak Kumar Datta
Effect of Noise in Khasi Speech Recognition System . . . 59
Fairriky Rynjah, Khiakupar Jyndiang, Bronson Syiem, and L. Joyprakash Singh
Text and Language Independent Classification of Voice Calling Platforms Using Deep Learning . . . 67
Tapas Chakraborty, Rudrajit Bhattacharyya, Priti Shaw, Sourav Kumar, Md Mobbasher Ansari, Nibaran Das, Subhadip Basu, and Mita Nasipuri
An Application of Anomaly Detection to Financial Fraud Using Machine Learning . . . 77
Udochukwu Okoro, Usman Ahmad Baba, and Sandip Rakshit
A Pre-processing-Aided Deep Transfer Learning Model for Human Object Detection in Crowd Scenarios . . . 85
Soma Hazra, Sunirmal Khatua, and Banani Saha
Signal Processing
Component Adaptive Superpixel-Based Joint Sparse Representation for Hyperspectral Image Classification . . . 97
Amos Bortiew and Swarnajyoti Patra
Convolutional Autoencoder-Based Models for Image Denoising: A Comparative Study . . . 107
Rowsonara Begum and Ayatullah Faruk Mollah
Simultaneous Prediction of Hand Gestures, Handedness, and Hand Keypoints Using Thermal Images . . . 117
Sichao Li, Sean Banerjee, Natasha Kholgade Banerjee, and Soumyabrata Dey
3D Point Cloud-Based Hand Gesture Recognition . . . 129
Soumi Paul, Ayatullah Faruk Mollah, Mita Nasipuri, and Subhadip Basu
Motion-Based Representations for Trajectory-Based Hand Gestures: A Brief Overview . . . 139
Debajit Sarma, Trishna Barman, M. K. Bhuyan, and Yuji Iwahori
An Approach Toward Detection of Doubling Print Defect Using SSIM Algorithm . . . 153
Jayeeta Saha and Shilpi Naskar
Multi-variant Statistical Tools and Soft Computing Methodology-Based Hybrid Model for Classification and Characterization of Yeast Data . . . 165
Shrayasi Datta and J. Pal Choudhury
Recent Challenges and Opportunities of Multilingual Natural Scene Text Recognition and Its Real World Deployment . . . 173
Kalpita Dutta, Soupik Chowdhury, Mahantapas Kundu, Mita Nasipuri, and Nibaran Das
Healthcare
Automated Cervical Dysplasia Detection: A Multi-resolution Transform-Based Approach . . . 185
Kangkana Bora, Kasmika Borah, Lipi B. Mahanta, M. K. Bhuyan, and Barun Barua
An IoT-Enabled Vital Cardiac Parameter Monitoring System on Real-Time Basis . . . 201
Nayana Dey and Pramit Ghosh
Elucidating the Inhibition Mechanism of FDA-Approved Drugs on P-glycoprotein (P-gp) Transporter by Molecular Docking Simulation . . . 221
Abira Dey, Ruoya Li, Nathalie Larzat, Jean Bernard Idoipe, Ahmet Kati, and Ashwani Sharma
A Fast Restoration of Weather Degraded Images . . . 231
Avra Ghosh, Ajoy Dey, and Sheli Sinha Chaudhuri
Multi-level Feature-Based Subcellular Location Prediction of Apoptosis Proteins . . . 241
Soumyendu Sekhar Bandyopadhyay, Anup Kumar Halder, Kaustav Sengupta, Piyali Chatterjee, Mita Nasipuri, Dariusz Plewczynski, and Subhadip Basu
The Identification of Chromatin Contact Domains (CCD) in Human Genomes from ChIA-PET Data Using Graph Methods . . . 251
Rafał Chabasiński, Kaustav Sengupta, and Dariusz Plewczynski
Prediction of COVID-19 Drug Targets Based on Protein Sequence and Network Properties Using Machine Learning Algorithm . . . 259
Barnali Chakraborty, Atri Adhikari, Akash Kumar Bhagat, AbhinavRaj Gautam, Piyali Chatterjee, and Sovan Saha
A Meta-consensus Strategy for Binarization of Dendritic Spines Images . . . 269
Shauvik Paul, Nirmal Das, Subhrabesh Dutta, Dipannita Banerjee, Soumee Mukherjee, and Subhadip Basu
Malignancy Identification from Cytology Images Using Deep Optimal Features . . . 279
Soumyajyoti Dey, Soumya Nasipuri, Oindrila Ghosh, Sukanta Chakraborty, Debashri Mondal, and Nibaran Das
Source Camera Identification Using GGD and Normalized DCT Model-Based Feature Extraction . . . 289
Pabitra Roy, Shyamali Mitra, and Nibaran Das
Lesion Image Segmentation for Skin Cancer Detection Using Pix2Pix: A Deep Learning Approach . . . 303
Nemai Roy, Achisman Kundu, Pritiman Sikder, and Showmik Bhowmik
Pap-Net: A Patch-Based Multi-Scale Deep Learning Framework for Nucleus Segmentation from Pap Smear Images . . . 313
Bijoyini Bagchi, Kaushiki Roy, Debotosh Bhattacharjee, and Christian Kollmann
Performance Evaluation of Different Deep Learning Models for Breast Cancer Detection in Mammograms . . . 321
Jayanta Das, Sourav Pramanik, and Debotosh Bhattacharjee
Electronics and Communications
Compact Multi-Band Bandpass Filters Using Substrate Integrated Waveguide and Fractal Resonators . . . 333
Amartya Paul and Pankaj Sarkar
Testing a 6 Axis Force-Torque Tactile Sensor for Robotic Arm . . . 341
Marut Deo Sharma and Juwesh Binong
ASER of Dual-Hop DF Relaying Systems with Coherent Modulation Scheme and MRC at Destination over Fisher-Snedecor F Fading Channels . . . 361
Darilangi S. Lyngdoh and Rajkishur Mudoi
Modeling and Characterization of Propagation Delay of Negative Capacitance Field-Effect Transistors . . . 371
Raunak Roy, Rahul Dhabbal, Gargi Bandhyopadhay, Tuhin Karmakar, Rajdeep Sarkar, Anirban Samanta, and Alokesh Mondal
Outage Probability of a DF Relay-Assisted Communication System over Mixed Nakagami-m and Weibull Fading Channels . . . 381
Deep Baishya and Rajkishur Mudoi
A Triple Band Reconfigurable Filtering Antenna with High Frequency Selectivity . . . 387
Sangeeta Das and Pankaj Sarkar
3D Thermal Modelling of SiC-Avalanche Transit Time Oscillator Under Large-Signal Pulsed Operating Conditions . . . 399
Niratyay Biswas, Debraj Chakraborty, Madhurima Chattopadhyay, and Moumita Mukherjee
A Compact Miniaturized Implantable Antenna for 2.45 GHz ISM Band Application . . . 415
Santoshkumar Singh Moirangthem, Sourav Roy, Soumendu Ghosh, and Abhishek Sarkhel
Miniaturized Dielectric Disc Loaded Monopole Antenna . . . 423
Khan Masood Parvez, SK. Moinul Haque, and Laxmikant Minz
Design of Optimum n-bit ALU Using Crossbar Gate . . . 435
Rakesh Das, Alongbar Wary, Arindam Dey, Raju Hazari, Chandan Bandyopadhyay, and Hafizur Rahaman
ML-Based PCB Classification with Gabor and Statistical Features . . . 445
Kangkana Bora, M. K. Bhuyan, Yuji Iwahori, Genevieve Chyrmang, and Debajit Sarma
Capacity Analysis Over Shadowed BX Fading Channels for Various Adaptive Transmission Schemes . . . 457
Sisira Hawaibam and Aheibam Dinamani Singh
Performance Comparison of L-SC and L-MRC over Various Fading Channels . . . 467
Sisira Hawaibam and Aheibam Dinamani Singh
A Learning Model for Channel Selection and Allocation in Cognitive Radio Networks . . . 479
Subhabrata Dhar, Sabyasachi Chatterjee, and Prabir Banerjee
An Ultrathin Multifunctional Polarization Converting Metasurface with Wide Angular Stability . . . 491
Soumendu Ghosh, Sourav Roy, Moirangthem Santoshkumar Singh, and Abhishek Sarkhel
Author Index . . . 499
About the Editors
Nibaran Das received his B.Tech. in Computer Science and Technology from Kalyani University, M.C.S.E. and Ph.D. (Engineering) degrees from Jadavpur University in 2003, 2005, and 2012, respectively. He joined in Computer Science and Engineering Department of Jadavpur University in 2006. Dr. Das has published more than 140 research articles in various international journals, conference proceedings, and chapters in the areas of pattern recognition and image analysis using different machine learning and deep learning techniques. He has co-authored three books. He has supervised around 40 Master degree students till date. He is supervising six Ph.D. students at present in the field of medical and document image processing. He has successfully completed five different projects as a principal Investigator. He has also worked as a co-supervisor in three different projects. He has been a member of the advisory/program/organizing committees of different reputed International conferences and workshops. Juwesh Binong is an assistant professor at the Department of Electronics and Communication Engineering (ECE) in the North-Eastern Hill University (NEHU), Shillong, India. He received his Bachelor of Engineering (B.E.) degree in Electronics and Communication Engineering from the Bhavanagar University, Gujarat, India, in 1999; M.Sc. in Information Technology from the Sikkim Manipal University in 2007, and Ph.D. in Computer Science and Engineering from the Tezpur University, Tezpur, Assam, India, in 2017. His research interests are in the areas of artificial intelligence and its applications. Ondrej Krejcar is a full professor in systems engineering and informatics at the University of Hradec Kralove, Faculty of Informatics and Management, Center for Basic and Applied Research, Czech Republic, and a research fellow at MalaysiaJapan International Institute of Technology, University Technology Malaysia, Kuala Lumpur, Malaysia. In 2008 he received his Ph.D. title in technical cybernetics at Technical University of Ostrava, Czech Republic. He is currently a vice-rector for science and creative activities of the University of Hradec Kralove from June 2020. At present, he is also a director of the Center for Basic and Applied Research at the xiii
University of Hradec Kralove. In years 2016–2020, he was the vice-dean for science and research at Faculty of Informatics and Management, UHK. His h-index is 21, with more than 1800 citations received in the Web of Science, where more than 120 IF journal articles are indexed in JCR index. In 2018, he was the 14th top peer reviewer in Multidisciplinary in the World according to Publons and a top reviewer in the Global Peer Review Awards 2019 by Publons. Currently, he is on the editorial board of the MDPI Sensors IF journal (Q1/Q2 at JCR) and several other ESCI indexed journals. He is a vice-leader and Management Committee member at WG4 at project COST CA17136, since 2018. He has also been a Management Committee member substitute at project COST CA16226 since 2017. Since 2019, he has been the chairman of the Program Committee of the KAPPA Program, Technological Agency of the Czech Republic as a regulator of the EEA/Norwegian Financial Mechanism in the Czech Republic (2019–2024). Since 2020, he has been the chairman of the Panel 1 (Computer, Physical, and Chemical Sciences) of the ZETA Program, Technological Agency of the Czech Republic. Since 2014 until 2019, he has been the deputy chairman of the Panel 7 (Processing Industry, Robotics, and Electrical Engineering) of the Epsilon Program, Technological Agency of the Czech Republic. At the University of Hradec Kralove, he is a guarantee of the doctoral study program in Applied Informatics, where he is focusing on lecturing on Smart Approaches to the Development of Information Systems and Applications in Ubiquitous Computing Environments. His research interests include control systems, smart sensors, ubiquitous computing, manufacturing, wireless technology, portable devices, biomedicine, image segmentation and recognition, biometrics, technical cybernetics, and ubiquitous computing. His second area of interest is in biomedicine (image analysis), as well as biotelemetric system architecture (portable device architecture, wireless biosensors), development of applications for mobile devices with use of remote or embedded biomedical sensors. Debotosh Bhattacharjee is working as a full professor in the Department of Computer Science and Engineering, Jadavpur University, with sixteen years of post-Ph.D. experience. His research interests pertain to the applications of machine learning techniques for face recognition, gait analysis, hand geometry recognition, and diagnostic image analysis. He has authored or co-authored more than 312 journals, conference publications, including several chapters in the areas of biometrics and medical image processing. Two US patents have been granted on his works. Prof. Bhattacharjee has been granted sponsored projects by the Government of India funding agencies like Department of Biotechnology (DBT), Department of Electronics and Information Technology (DeitY), and University Grants Commission (UGC) with a total amount of around INR 3 Crore. For postdoctoral research, Dr. Bhattacharjee has visited different universities abroad like the University of Twente, The Netherlands; Instituto Superior Técnico, Lisbon, Portugal; University of Bologna, Italy; ITMO National Research University, St. Petersburg , Russia; Univer-
sity of Ljubljana, Slovenia; Northumbria University, Newcastle Upon Tyne, UK, and Heidelberg University, Germany. He is a life member of Indian Society for Technical Education (ISTE, New Delhi), Indian Unit for Pattern Recognition and Artificial Intelligence (IUPRAI), a senior member of IEEE (USA), and a fellow of West Bengal Academy of Science and Technology.
Artificial Intelligence
Correlation Analysis of Stock Index Data Features Using Sequential Rule Mining Algorithms Nayanjyoti Mazumdar and Pankaj Kumar Deva Sarma
Abstract A massive changeover has been witnessed in stock markets worldwide during the recent pandemic, complicating people's investment choices. Investors are eager to examine market movements minutely in order to design investment strategies and carry out profit–loss analysis. For that, precise exploration of historical market information is necessary to anticipate market fluctuations. The BSE Sensex and the NSE Nifty are the major capital market segments in India that manage a number of indices and are capable of representing market trends. These indices may be indirectly impacted by a number of components. The correlation and occurrence frequency of these components can unfold much unknown yet consequential information. In this study, we consider the Nifty 50 Index data of the last 25 years and implement the AprioriAll sequence mining algorithm using TRIE data structures. We take seven features, namely the previous day's closing price, daily opening price, highest price, lowest price, closing price, shares traded, and daily turnover, and thoroughly investigate and analyze the correlation among them and verify their impacts on the overall market movements. A comparison of the in-memory space requirements for holding the generated candidate sequences while implementing the algorithm is also presented. Keywords Capital market · BSE · NSE · Sequence mining · AprioriAll · TRIE
1 Introduction
Data analysis has drawn the attention of researchers worldwide because the patterns hidden in historical data have much more value than the data itself. Time plays a very crucial role in the occurrence of an event. An individual, distinct event may carry very little information with it. But when a huge number of events take place one after another or simultaneously, they may follow or generate some patterns. These patterns may produce some meaningful knowledge.
N. Mazumdar (B) · P. K. D. Sarma, Department of Computer Science, Assam University, Silchar, Assam, India. e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023. N. Das et al. (eds.), Proceedings of International Conference on Data, Electronics and Computing, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-99-1509-5_1
Such patterns can be found in many application domains, be it financial transactions, buying behaviors, educational datasets, weather reports, bio-informatics, genome sequences, the recent COVID-19 pandemic-related datasets, etc. For example, Stock Index data possesses a number of such events and features, such as daily closing, opening, high and low prices, number of shares traded, and the day's turnover, which are highly time oriented and sequential in nature [1]. The interrelation among these events may yield meaningful interpretations. There are many algorithms and techniques [2–4] that can be used to analyze and verify important associations or dependence among the participating features. In this paper, the AprioriAll algorithm and its variants [5] are considered for the discovery of sequential rules from Stock Index datasets. This algorithm fits well in this scheme because of its level-wise, breadth-first search approach and the hierarchical tree-based data structures used with it. The rest of the paper is organized as follows. In Sect. 2, the different segments of the Nifty capital market and the data preprocessing activities involved in sequence mining are discussed. In Sect. 3, different algorithms and techniques for stock market analysis are mentioned. In Sect. 4, the performance metrics, tools, and data structures used are discussed. In Sect. 5, the implementation strategies are described. In Sect. 6, a detailed analysis of the results obtained is presented. The paper concludes with a discussion of different application domains of sequence mining algorithms and of future possibilities for exploring multiple dimensions of the stock features.
2 Stock Index Data—Collection and Preprocessing
A capital market is a financial instrument for dealing with equity shares, bonds, and long- and short-term investments. The National Stock Exchange (NSE) is one of the capital market segments in India that deals with different stock indices. A Stock Index is a mathematical measure computed upon the average buy and sell stock volumes of a group of enlisted companies [6]. The Nifty capital market is a popular index capable of representing the large-, mid- and small-capital market segments. Different segments of the Nifty markets are Nifty 500, Nifty Midcap 150, Nifty 50, etc., as shown in Fig. 1.
Fig. 1 Different segments of the indices of Nifty capital market
Stock exchanges keep track of share price data of the different segments on a per-tick, per-second, per-minute, daily, weekly, quarterly, or yearly basis, which are good examples of big datasets. For this study, 6221 days of daily market movement records (from 1997 to 2021) of the Nifty 50 Market Index [7] are collected for finding meaningful patterns using sequence mining algorithms. Nifty 50 is a group of fifty premier companies in India associated with different sectors like Financial Services, IT, Oil, Gas, Automobile, Healthcare, Metals, Consumables, Telecom, Construction, Power, Service Providers, etc. Some of the major contributors to the Nifty 50 Index are Reliance Industries Limited, Infosys, HDFC Bank, ICICI Bank, TCS Limited, ITC Limited, Larsen & Toubro Limited, Axis Bank Limited, etc. We consider seven features of the dataset, namely Previous Day's Closing Price (PCVAL), Day's Opening Price (OPVAL), Day's High Price (HIVAL), Day's Low Price (LOVAL), Day's Closing Price (CLVAL), Total Shares Traded (SHTRD) in the day, and the Total Turnover in the day (TURNO). These features are mutually exclusive and take variable values on different days. The raw dataset, as shown in Fig. 2a, is not directly suitable for applying sequential algorithms. Necessary cleaning and transformations are performed on the raw data to get a sequence dataset, as shown in Fig. 2b. Feature values are mapped into performance scales which are calculated upon the percentage change of each feature relative to the previous day's value. We consider the performance scales—Very High (percentage ≥ 1), Little High (percentage ≥ 0.6 and < 1), Average High (percentage > 0 and < 0.6), Average Low (percentage ≥ −0.6), Little Low (percentage ≥ −1 and < −0.6), and Very Low (percentage < −1).
Fig. 2 Conversion of Nifty 50 Index dataset into sequence dataset
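To make the discretisation concrete, the following is a minimal sketch of the kind of mapping described above, assuming the raw index data is loaded into a pandas DataFrame; the column name, the sample values, and the "CL" prefix are illustrative placeholders rather than the authors' actual code.

```python
import pandas as pd

def to_scale(pct_change):
    """Map a day-on-day percentage change to a performance scale label."""
    if pct_change >= 1.0:
        return "VH"   # Very High
    elif pct_change >= 0.6:
        return "LH"   # Little High
    elif pct_change > 0.0:
        return "AH"   # Average High
    elif pct_change >= -0.6:
        return "AL"   # Average Low
    elif pct_change >= -1.0:
        return "LL"   # Little Low
    return "VL"       # Very Low

# Hypothetical closing prices; the first row has no previous day, so its change is taken as 0.
df = pd.DataFrame({"Close": [17200.0, 17350.5, 17310.2, 17500.8]})
pct = df["Close"].pct_change().mul(100).fillna(0.0)
df["CL_scale"] = "CL" + pct.apply(to_scale)   # feature prefix + scale, e.g. "CLVH"
print(df)
```

Applying the same mapping to each of the seven features yields per-day rows of items such as PCVH or OPAL, which is the sequence-dataset form used in the rest of the paper.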
3 Algorithms and Techniques for Analyzing Stock Index Data Literature reveals that there is an upward shift of research works in the field of stock markets prediction [8]. Due to financial liquidation, open marketing strategies, or digitalization of business facilities worldwide, more and more companies are evolved and enlisted in the stock markets. This causes a massive changeover in the marketing strategies and investment plans of people. The primary and the foremost goal of any investor or trader is to make some profit out of their investments in a short or longer period of time. It is a very bitter truth that most of the time, the assumptions of the investors about the selected share prices become unfruitful as a result of which huge losses are to be paid up. One of the reasons behind this may be the incapability of human brains to manually analyze different influencing factors in a quickly manner. A sufficient time and effort is required to identify performance parameters and their cause and effects on share price movements. Computerized infrastructure with the use of efficient algorithms or techniques can improve the analysis process for identifying different contributing factors of the stock prices and increase the accuracy level of assumptions. Fundamental technical analysis is a very basic tool for analyzing a company’s future performances. It takes the revenue, assets, expenses, liabilities, and other financial statements as inputs and generates suggestions for future price movements. Qualitative versus quantitative analysis, CANSLIM investing, income investing, simple and exponential moving averages, Dogs of the Dow, parabolic SAR, moving average convergence/divergence (MACD), Relative Strength Index (RSI), etc. are some of the classical strategies for picking up stock indices and predicting market movements [9]. These common strategies yield an average estimates of possible future stock movements. None of these strategies are accurate enough to predict stock prices. Several new algorithms have come up that are successfully employed in stock markets to predict risk and rewards with greater accuracy. Support vector machines, K-means clustering, decision trees, Bayesian networks, C4.5, CART, and expectation maximization algorithms are successfully used for stock market analysis. Advantages and disadvantages in using these algorithms are also reported [10]. Machine learning models are effectively used for classification of stock indices and predicting stock market movements [11–13]. Use of deep learning methods can be seen for analyzing company’s financial failures [14]. It is seen that stock markets are greatly impacted by News and Social media feeds [15]. Data mining methods can support smart decisions for systematic investment plans in stock markets [16]. Studies show the use of neural network techniques for analysis of stock markets [17]. It is seen that premier computational techniques are used mostly for prediction of the stock market prices. Very less works are found that analyze the intra- and inter-relations among the different stock prices [18] and effect of a particular index price on the movement of other index prices. Sequential pattern mining algorithms here can play a very strong role in analyzing the relations among different features involved in Stock Index prices.
4 Performance Metrics, Data Structures, and Tools Used
A subsequence or a sequence may occur in a dataset only once or multiple times. The frequency of a subsequence is the number of times the subsequence occurs across all the transactions in a dataset. The support of a sequence is its frequency of occurrence (σ) divided by the total number of transactions in the sequence dataset. The confidence of two sequences is the frequency of occurrence of these sequences together divided by the frequency of occurrence of one of the sequences [19]. Lift defines the importance of the generated rules and is the ratio of the confidence of a rule to its expected confidence. For example, for a sequence (p, q) in a sequence dataset having N transactions, support, confidence, and lift are given by

Support(p, q) = σ(p, q) / N    (1)

Confidence(p, q) = σ(p, q) / σ(p)    (2)

Lift(p ⇒ q) = Support(p ∩ q) / (Support(p) × Support(q))    (3)
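As a quick illustration of Eqs. (1)–(3), the sketch below computes the three measures over a toy set of day-transactions; treating each day as an unordered set of feature-scale items, and the item names themselves, are simplifying assumptions for illustration only.

```python
# Toy sequence dataset: each transaction is the set of feature-scale items for one day.
transactions = [
    {"PCVH", "OPVH", "CLVH"},
    {"PCVH", "OPVH", "CLAL"},
    {"PCLL", "OPLL", "CLVL"},
    {"PCVH", "OPAH", "CLVH"},
]
N = len(transactions)

def freq(items):
    """Number of transactions containing all the given items (sigma)."""
    return sum(1 for t in transactions if set(items) <= t)

def support(items):
    return freq(items) / N                                        # Eq. (1)

def confidence(p, q):
    return freq(set(p) | set(q)) / freq(p)                        # Eq. (2)

def lift(p, q):
    return support(set(p) | set(q)) / (support(p) * support(q))   # Eq. (3)

print(support({"PCVH", "OPVH"}))       # 0.5
print(confidence({"PCVH"}, {"OPVH"}))  # 0.666...
print(lift({"PCVH"}, {"OPVH"}))        # 1.333... (> 1, a strong association)
```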
Frequent sequences are those whose support is greater than or equal to some user-specified minimum support (min_sup). Maximal sequences are those that qualify at a certain level of min_sup but do not qualify at some higher min_sup. When the number of maximal sequences decreases as min_sup increases, stronger dependence among the subsequences is found. Sequential rules define the intra- and inter-relations among sequences. When a higher level of confidence is satisfied, stronger rules are found (Lift > 1). When there is a lower level of confidence among sequences, weak rules are generated (Lift < 1). When Lift = 1, the sequences are independent of each other. When sequence mining is performed, algorithms are required to generate all the subsequences of the transaction dataset, called candidates. A large volume of candidate sequences is generated even if a small dataset is considered for sequence mining: if there are N items in a single sequence, 2^N − 1 candidates are generated. To handle this volume of candidates in main memory and perform searching operations, efficient data structures [20] must be used with the algorithms. The hash tree, the TRIE or prefix tree [21], and the binary search tree (BST) are the natural options owing to the simplicity of their working principles. The AVL tree, red–black tree, and splay tree are variants of the BST [22]. It often happens that similar candidates are generated again and again from the transactions in the dataset. This situation is more common in datasets like students' passing sequences, customers' buying behavior, stock market data, etc. The hash tree is not efficient enough at handling the resulting collision issues [23]. A TRIE improves the handling of large and redundant candidate sets, minimizes searching time, and supports prefix (autocomplete-style) lookups.
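The TRIE mentioned above can be sketched with nested dictionaries, as shown below; this is only an illustrative toy (real implementations add pruning and support counting at every node), but it shows how candidates sharing a prefix share storage and how a stored candidate's count can be looked up without hashing.

```python
class Trie:
    """Minimal prefix tree for storing candidate sequences with counts."""

    def __init__(self):
        self.root = {}

    def insert(self, sequence):
        node = self.root
        for item in sequence:
            node = node.setdefault(item, {})        # walk/create the path of items
        node["#count"] = node.get("#count", 0) + 1  # count kept at the final node

    def count(self, sequence):
        node = self.root
        for item in sequence:
            if item not in node:
                return 0
            node = node[item]
        return node.get("#count", 0)

trie = Trie()
trie.insert(["PCVH", "OPVH", "CLVH"])
trie.insert(["PCVH", "OPVH", "CLAL"])         # shares the PCVH -> OPVH prefix
print(trie.count(["PCVH", "OPVH", "CLVH"]))   # 1
```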
For this study, we consider the open access programming environments. Compilers/debuggers: Anaconda Library with Python 3.5.9, Jupyter Notebook, JavaScript, Parquet viewer, IDE & Editor: JSON Library, VS Code, Notepad++, and MS Excel; hardware configurations: Intel Core i5-1035G1 processor (1.0 GHz up to 3.6 GHz with Intel turbo boost, 6 MB L3 cache); and memory: 8 GB DDR4 RAM, 512 GB PCI SSD HDD.
5 Implementation of AprioriAll Algorithm on Nifty 50 Index Dataset AprioriAll is a very efficient algorithm capable of generating sequential rules from sequence datasets. It is flexible enough to incorporate different in-memory data structures with it. Traditional AprioriAll algorithm demands multiple database passes over the dataset for producing candidates and uses hash tree structures for handling the candidates. In first pass, candidates of length-1 are produced. In the second pass, length-2 candidates are produced, and so on. After generating all the candidates of variable lengths, support, and confidence of the candidates are calculated. Depending on some user-specified min_sup and confidence, strong and weak sequential rules are found. Those candidates which are capable of satisfying a certain level of confidence produce strong rules. And those which do not satisfy would yield weaker rules. We applied a modified version of this AprioriAll algorithm by using TRIE data structure with database scan reduction techniques. The flowchart of the same is given in Fig. 3. TRIE has its own advantages as discussed, and reduction of database scans can considerably reduce processing time and space.
Fig. 3 Flowchart for generating candidates using TRIE with database scan reduction
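The level-wise loop of Fig. 3 can be compressed into the following illustrative sketch. It treats each day as an unordered set of feature-scale items and omits the TRIE-based counting and the database scan reduction of the actual implementation, so it should be read as a simplified rendering of the join, count, and prune idea rather than the authors' code.

```python
from itertools import combinations

def level_wise_frequent(transactions, min_sup):
    """Simplified AprioriAll-style discovery of frequent item combinations."""
    N = len(transactions)
    frequent = {}
    candidates = {frozenset([item]) for t in transactions for item in t}
    k = 1
    while candidates:
        # One pass over the data: count every candidate of length k.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        level = {c: n / N for c, n in counts.items() if n / N >= min_sup}
        frequent.update(level)
        # Join step: merge frequent k-candidates into (k + 1)-candidates.
        candidates = {a | b for a, b in combinations(level, 2) if len(a | b) == k + 1}
        k += 1
    return frequent

days = [{"PCVH", "OPVH", "CLVH"}, {"PCVH", "OPVH", "CLAL"}, {"PCLL", "OPLL", "CLVL"}]
print(level_wise_frequent(days, min_sup=0.6))
# -> PCVH, OPVH and {PCVH, OPVH} are frequent, each with support 2/3
```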
5.1 Implementation Result-1 (Large Sequences and Maximal Sequences)
Large sequences are those which satisfy some user-specified support. For the Nifty 50 Index dataset, all the large sequences found against varying values of min_sup are given in Fig. 4a. As min_sup increases, fewer sequences qualify as large. The length of a sequence, denoted Lk, is the number of items (k) present in that sequence. For example, (PCVH) is an L1 sequence, (PCVH, OPVH) is an L2 sequence, and so on. Sequences that qualify at a certain level of min_sup but fail to satisfy a higher min_sup are called maximal sequences. The total number of maximal sequences against variable min_sup is given in Fig. 4b. A sample of the maximal sequences generated from the Nifty 50 Index dataset is given in Fig. 5. These maximal
sequences constitute the sequential rules generated out of the dataset. It is observed that the number of large and maximal sequences decreases as the supplied min_sup increases.
Fig. 4 Generation of large and maximal sequences
Fig. 5 Some maximal sequences generated with min_sup as 5%
Fig. 6 Calculation of STDAVG for performance scales
5.2 Implementation Result-2 (Calculation of Standard Averages for Performance Scales)
The number of occurrences of the performance scales VH, LH, AH, AL, LL, and VL for each of the considered features PCVAL, OPVAL, HIVAL, LOVAL, and CLVAL in the Nifty 50 Index dataset is calculated as shown in Fig. 6. We calculate the STDAVG of all the performance scales using Eq. 4, which is an average of the occurrences of all the features for a performance scale:

STDAVG = (1/n) Σ_{i=0}^{n} σ(X_i)    (4)
Here, n is the total number of features considered (5 features in this case). We consider this average as the standard value for a particular performance scale. For example, standard value for the scale VH (Very High) is 1191. Likewise, standard value for Very Low (VL) is 1031. The STDAVG is taken as the basis for calculation of confidences of each of the features.
5.3 Implementation Result-3 (Calculation of Support and Confidence) The confidence of occurrence of a particular feature for a performance scale is calculated against all other features. The higher the value of confidence, the stronger is the relation among the features. The lower the confidence value, the weaker is the relation among the features. This means when the confidence percentage between any two features is elevated, there is an increase in possibility of occurrence of these features together. In the following example given in Fig. 7, the confidence of the feature scales PCVH, PCLH, PCAH, PCAL, PCLL, and PCVL is calculated against all other feature scales such as OPVH, OPLH, OPAH, OPAL, OPLL, and OPVL to find the relations among them. Based on the user-specified min_sup and confidence,
important features are selected and unimportant features are pruned out. Likewise, the confidence of higher length sequences can also be found against some features.
Fig. 7 Calculation of confidence of PCVAL feature against all other features
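A confidence table of the kind shown in Fig. 7 can be obtained directly from the discretised day-rows with a row-normalised cross-tabulation; the sketch below assumes columns named PC_scale and OP_scale analogous to those produced in the preprocessing sketch earlier, and the values are invented for illustration.

```python
import pandas as pd

days = pd.DataFrame({
    "PC_scale": ["PCVH", "PCVH", "PCLL", "PCVH", "PCAL"],
    "OP_scale": ["OPVH", "OPVH", "OPLL", "OPAH", "OPAL"],
})

# Row-normalised crosstab: each cell is the confidence of an OP scale given a PC scale,
# i.e. count(PC, OP) / count(PC), matching Eq. (2).
conf_table = pd.crosstab(days["PC_scale"], days["OP_scale"], normalize="index")
print(conf_table.round(2))
```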
6 Analysis of the Results Obtained Result analysis is one of the most important activities of scientific experiments that deals with the meaning of the results explored. It is a process of technical observation of the cause and effects and intra- and inter-relations among participating parameters. In this analysis, we did a lot of experimentations in multiple dimensions of the dataset for finding relations among all the seven features considered. Some interesting hidden facts from the dataset are uncovered. These hidden rules can support the decision-making process when dealing with the dataset.
6.1 Result Analysis-1 (Intra- and Inter-Relations Among Features) For verifying the relations among different features, we take confidence as the major factor. For the analysis of the relations of previous days’ closing price (PCVAL), we evaluate the confidence of each of the other features as shown in the following Fig. 8. The reason of taking PCVAL is that, based on the previous days’ closing price, one may take decisions for the next days’ trading activities. In the first case, Fig. 8a, we analyze PCVAL against instances of the next days’ opening price (OPVAL). There is a maximum chance of opening the market in a very high price in the next day when the previous days’ closing was at very high. In the second case, Fig. 8b, we
evaluate PCVAL against instances of the next day's closing price (CLVAL). The previous day's closing price has only an average relation with the next day's closing price; no significant confidence among these features is observed. In the third case, Fig. 8c, PCVAL is analyzed against the next day's highest value (HIVAL). Here also, no significant confidence is seen. In the fourth case, Fig. 8d, PCVAL is analyzed against the next day's lowest price (LOVAL). It is seen that when the previous day's closing price is high, the next day's lowest price also increases by a comparable amount. Several other investigations of this type can be made on the dataset. Instances with lower confidence also carry important meaning: when there is a low level of confidence among features, they hardly ever occur together. For instance, when the previous day's closing price is very high, there is barely any chance of the next day's high price being very low, as shown in Fig. 8a. These types of strategies are generally used for intraday trading activities. High confidence may support stock buying decisions, and lower confidence may support stock selling decisions. The number of shares traded in a day has a direct impact on some of the other features. From Fig. 9, we observe that when there is a heavy fall in the number of shares traded in a day, there is a high chance of an upward movement in the Nifty market. When the dataset is analyzed from other dimensions, several other rules are likely to come up.
Fig. 8 Confidence analysis of PCVAL against all other features
Fig. 9 Confidence of SHTRD against all other features
6.2 Result Analysis-2 (Candidate Generation and Memory Consumption) Sequential mining process requires generation of a huge number of candidates as discussed in Sect. 4. Subsequences of a sequence are also a sequence. When the subsequences of all the transactions are generated, it creates huge sequence lattice as shown for a sample dataset in Fig. 10. This lattice or the candidate list is to be held
in main memory for carrying out the mining activities. For this reason, efficient data structures have to be selected. The candidate sequences generated out of a sequence dataset consume a considerable amount of memory compared to the dataset size. The number of candidate sequences and the consumed memory against different dataset sizes are given in Fig. 11. One of the interesting facts observed from the experimentation is that, with the increase in dataset size, there is a linear decrease in the number of candidates generated. This is because there remains a possibility of generating more common sequences when the dataset grows vertically. This scenario is more visible when we deal with transactional datasets such as customers' buying patterns, students' passing sequences, and trading market data.
Fig. 10 Formation of a candidate sequence lattice
Fig. 11 Analysis of candidate generation and storage requirements
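The 2^N − 1 growth of the candidate lattice in Fig. 10 is easy to reproduce; the following minimal sketch enumerates every non-empty, order-preserving subsequence of a length-4 sequence (the item names are again only placeholders).

```python
from itertools import combinations

def all_subsequences(sequence):
    """Yield every non-empty subsequence of `sequence`, preserving item order."""
    n = len(sequence)
    for k in range(1, n + 1):
        for positions in combinations(range(n), k):
            yield tuple(sequence[p] for p in positions)

seq = ["PCVH", "OPVH", "HIVH", "CLVH"]
candidates = list(all_subsequences(seq))
print(len(candidates))   # 2**4 - 1 = 15 candidate subsequences from one sequence
```

For a daily record carrying all seven features this already gives 127 candidates per transaction, which is why the choice of in-memory structure holding them matters.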
7 Other Applications of Sequential Mining Algorithms In recent times, linear growth is visible in sequential mining techniques both in algorithmic and applications point of views. Due to automation feasibility, sequential mining algorithms can be applied to outspread range of domains. As such, a number of computing systems have evolved that use new sequential mining algorithms and techniques [24] for finding hidden meanings from sequence datasets. Sequence mining is used to find occurrences of sequences and study the relations among them. These relations and their cause and effect to each other produce more meaning than what the sequence itself is. The following subsection gives an overview of different applications of sequential mining algorithms and techniques in recent times. Intrusion Detection: Cyber-attacks and information hackings have become a common threat to both dedicated and virtual computing environments. SVM with the aid of sequential mining algorithms is used for detecting malwares and spams with greater accuracy [25–28]. Recommender Systems: Internet is an ocean of learning resources. Finding the relevant and authentic information out of the huge contents sometimes becomes a very difficult task. Collaborative filtering with the use
of sequential mining algorithms is now able to provide more accurate and personalized information [28, 29]. The transport sector especially in urban areas has a common problem of road traffic. Moreover, timely arrival to the destination is always preferable. The traffic recommender systems provide traffic-related information and route planning on the go [30, 31]. Tourism Industry: Tourism industry nowadays not only providing transport and logistics supports. They are now more sophisticated to design tourism packages and compare and suggest traveling destinations and a lot. Sequential mining has been extensively and successfully employed to analyze travel history of tourists for the same [32]. Medical Sciences: For diagnosis purposes, risk prediction, effectiveness of medications, treatments schedule designing, and drug discovery Electronic Health Records (I) analysis are some of the alarming areas of sequence mining [33– 36]. R&D Planning: Resource utilization for any business organization is a prime factor which decides the success of the organization. Sequential mining algorithms and techniques are used to provide support for planning and development of business situations [35, 36]. Pollution Control and Global Warming: Applications of sequential mining techniques are found in detecting air pollutions by analyzing granularity level of air pollutants in different cities [37]. IoT and Office Automations: In different IoT applications of home and office automations, uses of sequential rules are found [38]. Manufacturing: For validating and improving product quality in production industries, sequential algorithms are used nowadays [39]. Bio-informatics: For DNA sequencing, drug discovery, and bio-inspired techniques, sequential mining techniques are rigorously used which are also the flaring area of research [40–44]. New sequential mining algorithms and techniques are evolving each day, and they are opening up new prospects for application to newer domains. Hybrid sequential mining algorithms are coming up for generating rules out of rules from sequence datasets.
8 Conclusion and Future Works Over the last two decades, stock markets have witnessed a lot of fluctuations, and as a result, newer tools and techniques are evolved to inspect and anticipate future market movements for supporting the investment decisions. Different features may take part in markets directly or indirectly, and some of their correlations can act as the deciding factors for the market trends. Objective of this work was to implement a premier sequential mining algorithm—AprioriAll to analyze dependence among different daily price fluctuating scales of the Nifty 50 Index dataset and to study the precise behavior of the Nifty market. A lot of interesting behaviors and results are observed and are presented in digestible formats. Application of the sequential mining algorithms can be extended to other dimensions of stock market estimates such as deleterious stock features spotting, cross-stocks feature associations, and
opponent-stock identification. Inter-occurrence-distance unfolding and periodicity analysis of stock price movements are some of the areas where sequential mining techniques may be used.
References 1. Agrawal R, Srikant R (1995) Mining sequential patterns. In: Proceedings of the international conference on data engineering (ICDE’95), Taipei, Taiwan, pp 3–14 2. Mooney CH, Roddick JF (2013) Sequential pattern mining—approaches and algorithms. ACM N(M 20YY):1–46 3. Motegaonkar VS et al (2014) A survey on sequential pattern mining algorithms. Int J Comput Sci Inform Technol 5(2) 4. Kour A (2017) Sequential rule mining, methods and techniques: a review. Int J Comput Intell Res 13. ISSN 0973-1873 5. Agrawal R, Srikant R (1994) Fast algorithms for mining association rules. In: Proceedings of the 20th VLDB conference, Santiago, Chile 6. NSE online sources as on 10th May (2022). https://www.nseindia.com/products-services/ind ices-nifty50-index 7. National stock exchange—Nifty 50 historical data downloading online sources, updated as on 10th May 2022. https://www1.nseindia.com/products/content/equities/indices/historical_i ndex_data.htm 8. Bustos O, Pomares-Quimbaya A (2020) Stock market movement forecast: a systematic review. Exp Syst Appl 156:113464 9. Drakopoulou V (2015) A review of fundamental and technical stock analysis techniques. J Stock Forex Trad. https://doi.org/10.4172/2168-9458.1000163 10. Liu H, Huang S, Wang P, Li Z (2021) A review of data mining methods in financial markets. Data Sci Finan Econ DSFE 1(4):362–392. https://doi.org/10.3934/DSFE.2021020 11. Barboza F, Kimura H, Altman E (2017) Machine learning models and bankruptcy prediction. Exp Syst Appl 83:405–417 12. Lee TK, Cho JH, Kwon DS et al (2019) Global stock market investment strategies based on financial network indicators using machine learning techniques. Exp Syst Appl 117:228–242 13. Rouf N et al (2021) Stock market prediction using machine learning techniques: a decade survey on methodologies, recent developments, and future directions. MDPI. https://doi.org/ 10.3390/electronics10212717 14. Aljawazneh H, Mora AM, Garcia-Sanchez P et al (2021) Comparing the performance of deep learning methods to predict companies’ financial failure. IEEE Access 9:97010–97038 15. Javed Awan M, Mohd Rahim MS, Nobanee H et al (2021) Social media and stock market prediction: a big data approach. Comput Mater Con 67:2569–2583 16. Farid S, Tashfeen R, Mohsan T et al (2020) Forecasting stock prices using a data mining method: evidence from emerging market. Int J Finan Econ 17. Kumar DA, Murugan S (2013) Performance analysis of Indian stock market index using neural network time series model. In: 2013 international conference on pattern recognition, informatics and mobile engineering. IEEE 18. Ting J, Fu T, Chung F (2018) Mining of stock data: intra- and inter-stock pattern associative classification 19. Online source: https://t4tutorials.com/support-confidence-minimum-support-frequent-ite mset-in-data-mining/ 20. Online source: https://pages.di.unipi.it/pibiri/papers/phd_thesis.pdf 21. Online source: https://www.baeldung.com/cs/hash-table-vs-trie-prefix-tree 22. Online source: http://cslibrary.stanford.edu/110/BinaryTrees.html
Correlation Analysis of Stock Index Data Features Using Sequential …
17
23. Online sources: https://www.geeksforgeeks.org/hashing-set2-separate-chaining/amp/ https:// runestone.academy/ns/books/published/pythonds/SortSearch/Hashing.html 24. Biswas S, Chaki S, Mahbub K, Ahmed S (2021) Stock market prediction: a survey and evaluation. Conference Paper, ResearchGate. https://www.researchgate.net/publication/357 205022 25. Nissim N, Lapidot Y, Cohen A, Elovici Y (2018) Trusted system calls analysis methodology aimed at detection of compromised virtual machines using sequential mining. Knowl-Based Syst 153:147–175 26. Husak M, Bajtos T, Kaspar J, Bou-Harb E, Celeda P (2020) Predictive cyber situational awareness and personalized blacklisting: a sequential rule mining approach. ACM Trans Manage Inform Syst 11(4):1–16. Article No. 19 27. Zheng L, Guo N, Chen W, Yu J, Jiang D (2020) Sentiment-guided sequential recommendation. In: SIGIR’20: proceedings of the 43rd international ACM SIGIR conference on research and development in information retrieval, July-2020, pp 1957–1960. https://doi.org/10.1145/339 7271.3401330 28. Tarus JK, Niu Z, Kalui D (2018) A hybrid recommender system for e-learning based on context awareness and sequential pattern mining. Soft Comput 22:2449–2461 29. Anwar T, Uma V (2019) CD-SPM: cross-domain book recommendation using sequential pattern mining and rule mining. J King Saud Univ Comput Inform Syst 30. Ibrahim R, Shafiq MO (2019) Detecting taxi movements using random swap clustering and sequential pattern mining. J Big Data. Article-39 31. Bermingham L, Lee I (2020) Mining distinct and contiguous sequential patterns from large vehicle trajectories. Knowl-Based Syst 189 32. Vu HQ et al (2017) Travel diaries analysis for sequential rule mining. J Travel Res 57(3):399– 413 33. Rjeily CB, Badr G, El Hassani AH, Andres E (2018) Medical data mining for heart diseases and the future of sequential mining. Mach Learn Paradigms 149:71–99 34. Kaur I, Doja MN, Ahmad T (2020) Time-range based sequential mining for survival prediction in prostate cancer. J Biomed Inform 110 35. Choi J, Jeong B, Yoon J (2019) Technology opportunity discovery under the dynamic change of focus technology fields: application of sequential pattern mining to patent classification. Tech Forecasting Soc Change 148 36. Lee G, Kim D, Lee C (2020) A sequential pattern mining approach to identifying potential areas for business diversification. Asian J Technol Innov 28(I) 37. Zhang L, Yang G, Li X (2020) Mining sequential patterns of PM2.5 pollution between 338 cities in China. J Environ Manage 262 38. Srivastava G, Lin JC-W, Zhang X, Li Y (2020) Large-scale high-utility sequential pattern analytics in internet of things. Internet Things 8(16) 39. Yao L, Huang H, Chen S-H (2020) Product quality detection through manufacturing process based on sequential pattern considering deep semantic learning and process rules. In: Fault detection and process diagnostics by using big data analytics in industrial applications, June2020. MDPI 40. Pushpalatha K, Ananthanarayana VS (2018) Multimedia document mining using sequential multimedia feature patterns. ResearchGate, https://arxiv.org/pdf/1808.01038 41. Spreafico R, Soriaga LB (2020) Advances in Genomics for drug development. MDPI. https:// www.mdpi.com/2073-4425/11/8/942/pdf 42. Levi M, Hazan I (2020) Deep learning based sequential mining for user authentication in web applications. In: International workshop on emerging technologies for authorization and authentication, ETAA-2020. Springer, pp 1–15 43. 
Estiri H, Strasser ZH et al (2020) Transitive sequencing of medical records for mining predictive and interpretable temporal representations. Patterns 1(4) 44. Valdez F, Castilli O, Melin P (2021) Bio-inspired algorithms and its applications for optimization fuzzy clustering. MDPI. https://doi.org/10.3390/a14040122
A CRF-Based POS Tagging Approach for Bishnupriya Manipuri Language Bhubneswar Das and Smriti Kumar Sinha
Abstract Bishnupriya Manipuri is an endangered language. Because resources are very limited, carrying out any computational task for such an endangered language is very challenging, and Bishnupriya Manipuri is also among the less computationally explored languages. In this paper, we present an automated part-of-speech tagging approach based on conditional random fields using the CRF-Suite library; POS tagging is an important and basic task for any natural language processing pipeline. We have carried out experiments on a corpus annotated by us using gold standard tags and achieved a satisfactory tagging accuracy of 86.21%. Further, the results are compared with some existing state-of-the-art approaches. Keywords Part-of-speech tagging · Bishnupriya Manipuri · Conditional random field
1 Introduction Part-of-speech (POS) tagging is the process of labeling the lexical items in a sentence of a particular language. The labeling is done based on various morphological, semantic, and contextual properties. For a morphologically rich language, the task of POS tagging is not trivial because of the huge number of inflections and because the tagging depends on the context. Automatic POS tagging assigns a tag or label for the respective lexical category to each lexical item of a particular language with very little human effort. Automatic POS tagging is very challenging for a resource-poor language due to the lack of an available corpus. To the best of our knowledge, no annotated corpus is available for the Bishnupriya Manipuri language. In this paper, we have carried out our work on the Bishnupriya Manipuri language. As per the existing literature survey, no automatic POS tag identification work has been reported; to the best of our knowledge, this is the first conditional random field (CRF)-based automatic POS tag identification work for the Bishnupriya Manipuri language. As there is no freely available annotated corpus of the Bishnupriya Manipuri language, we have created a corpus for our experimental work. The paper is organized as follows. In the next section, we give a brief introduction to the language. In Sect. 3, we review the existing literature on POS tagging of Indian languages using similar methodologies. In Sects. 4 and 5, we describe the proposed methodology and the experimental results, respectively. Section 6 presents a brief discussion, and Sect. 7 concludes the paper.
2 Bishnupriya Manipuri Language The Bishnupriya Manipuri language falls under the Indo-Aryan language group. The structure of the language is the irrefutable proof in this respect. To clarify the position, it may be said that, first, by far the greatest majority of words and roots of this language are of Indo-Aryan origin. Secondly, all the pronominal forms and the conjugational and declensional endings, which are the most stable elements of a language, are also of Indo-Aryan origin [19]. The Meiteis and the Bishnupriyas are the two groups of Manipuri people. The Meiteis call their language Meitei or Manipuri, which is the state language of Manipur. The Bishnupriya Manipuri language belongs to the Indo-Aryan language family, whereas Meitei is a language of the Tibeto-Burman group. Though the two languages differ in many respects, these two sections of people have a common stock of culture; their kirtana, dances, music, dress, etc., are all of the same type. The Bishnupriya Manipuri language, though of Indo-Aryan origin, has incorporated numerous words (about 4000) from Meitei [19]. Originally, the Bishnupriya Manipuri language was restricted only to the surroundings of the lake Loktak in Manipur. Bishnupur, Khangabok, Heirok, Ningthaukhong, Mayang-Yumphan, Khunau, and Thamnapokpi are the principal localities where this language is primarily spoken [19]. The Madai Gang and the Rajar Gang are the two available dialects of the Bishnupriya Manipuri language. These dialects are not restricted only to specific geographical areas; they also exist in some other nearby areas in the same localities [19]. The Bishnupriya Manipuri language is practically dead in its place of origin. However, the language is retained by its speakers in diaspora, mostly in Assam, Tripura, and Bangladesh. The number of speakers of this language, according to a random sampling held by Dr. K. P. Sinha in 1966, was about 90,000 in India and about 45,000 in Bangladesh. Besides, there were about 50,000 people in Manipur, about 21,000 in the Khangabok-Heirak area, about 22,000 in the Ningthaukhong-Bishnupur area, and about 7000 people scattered here and there, who spoke Meitei but were known as Bishnupriyas [19]. The language is enlisted as an endangered language by UNESCO [2]. Interestingly, a thorough linguistic study of this little-known language was done by Dr. K. P. Sinha. The study was mainly on morphological analysis, and an etymological dictionary has also been published [20]. However, no study on the computational linguistics of this language has been done. To the best of our knowledge, the present work, based on the available publications, is the first of its kind in this direction.
3 Literature Review This section presents a survey of the existing approaches for POS tagging in Indian languages. In the last two or three decades, several POS taggers have been developed for Indo-Aryan languages. All these approaches can be classified as supervised or unsupervised. For supervised approaches such as hidden Markov model [6, 15, 18], conditional random field [5, 16, 17], support vector machine [4], neural network [13], and maximum entropy [7, 9], we must have an annotated corpus for training the model. While the unsupervised approach doesn’t require an annotated corpus and is therefore useful for the languages where it is very hard to find a suitable annotated corpus. Some hybrid approaches are also used in the literature in order to improve the tagging accuracy, e.g., hidden Markov and maximum entropy model [7], CRF and transformation-based learning (TBL) [16], etc. Ojha et al. [10] presented an approach for POS tagging using CRF++ and SVM for Bhojpuri, Hindi, Odiya languages. The training corpus comprises 90k tokens. For testing, 2k tokens were used. In the case of CRF-based methodology, the maximum accuracy is 86.7%, whereas, in the case of SVM, it is 93.7%. Barman et al. [5] propose a POS tagging approach using the CRF and TBL. They applied the models on the Assamese language. An accuracy of 87.17% for TBL and 67.73% for CRF is obtained by these approaches. They further reported that the inability to recognize the suffixes, prefixes, and the morphological information is the reason for low-performance of the CRF model as compared to TBL. Ahmad et al. [3] propose a CRF-based POS tagging method for the Kashmiri language. An accuracy of 81.10% is obtained when the system is evaluated with training data of 27k and test data of 3k. Ekbal et al. [8] propose an approach for the Bengali language by using a CRF model. The tagset was of 26 different tags. The system is trained with 72,341 words, and it produces 90.30% accuracy, when it is tested with 20,000 words. They reported that the unknown words can be handled using suffixes, named entity recognizer, and the lexicon, etc. Patel et al. [14] propose a CRF-based POS tagging approach. The corpus is annotated with 26 tags. 10k words were used for training, while in testing, 5k words were used. The reported accuracy is 92%. Pallavi et al. [11] present a CRF-based approach for part-of-speech tagging for the Kannada language. Their corpus is annotated with 36 tags. They used 64k words for training and 16k for testing purposes. 12 numbers of distinct features based on linguistic clues were extracted and trained. The proposed approach delivers an accuracy of 92.94%. Warjri et al. [22] present a CRF-based part-of-speech tagger for the Khasi language. It obtained an F-measure of 0.921 on a Khasi corpus developed by them (Table 1).
Table 1 Summary of POS tagging approaches for some Indian languages

| Refs. | Language | Model | Data and tagset | Accuracy |
|---|---|---|---|---|
| [5] | Assamese | CRF and Transformation-based learning | – | 87.17 for TBL and 67.73 for CRF |
| [8] | Bengali | CRF | Training data 72k and test data 20k | 90.30 |
| [16] | Hindi | CRF | Training data 21,470 words and test data 2924 | 78.66 |
| [10] | Hindi | CRF++, SVM | 90k tokens | Between 82 and 86.7 for CRF++, 88–93.7 for SVM |
| [12] | Tamil | CRF | 36k sentences | F-score 0.89 |
| [3] | Kashmiri | CRF | 30k | 81.10 |
| [21] | Kannada | CRF | 80k tokens, 36 tags | 96.86 |
4 Methodology In this paper, we present a CRF-based approach for POS tagging of the Bishnupriya Manipuri language. In the subsections below, we also describe the various tags used in building the annotated corpus.
4.1 Tagset Used We have used the gold standard tagset used in several corpora released by TDIL-DC. Table 2 lists the various tags used in building the corpus.
4.2 Preprocessing Preprocessing is a preliminary step that is needed for cleaning the data. The text data that we collected for designing the corpus contains some English language text as well as some English abbreviations. At the initial stage, those texts and abbreviations are removed to make the corpus contain Bishnupriya Manipuri text only. The text data in the corpus is organized in such a way that each line contains only one sentence. We use punctuation markers as sentence end markers. Next, we apply the tokenization function to make each word a token. Our target is to convert each sentence in the corpus into a list of (word, POS) pairs. All
these steps are carried out to convert the data into CoNLL format so that we can apply any machine learning algorithm. In CoNLL format, a sentence is represented as a list of (word, POS) tuples.
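As a rough illustration of this pre-processing step, the sketch below reads a one-sentence-per-line file and builds the (word, POS) lists; the word/TAG annotation format, the file name, and the filtering of English tokens are assumptions for illustration, not the authors' exact pipeline.

```python
import re

def load_annotated_corpus(path):
    """Read a one-sentence-per-line file and return CoNLL-style
    [(word, POS), ...] lists, one list per sentence."""
    sentences = []
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            pairs = []
            for tok in line.split():
                if "/" not in tok:                  # skip unannotated tokens
                    continue
                word, tag = tok.rsplit("/", 1)
                if re.search(r"[A-Za-z]", word):    # drop English words/abbreviations
                    continue
                pairs.append((word, tag))
            if pairs:
                sentences.append(pairs)
    return sentences

sentences = load_annotated_corpus("bishnupriya_tagged.txt")  # assumed file name
```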
4.3 Annotation Annotation means assigning lexical category tags to the words in the raw text corpus. This annotation is a challenging task for morphologically rich languages. The same word can appear as a different lexical category in different contexts. This problem is referred to as an ambiguity problem. In Bishnupriya Manipuri language also, the ambiguity problem exists. For example, consider the following words. The text sentences in the corpus are annotated using gold standard tags as defined by TDILDC (Indian Language Technology Proliferation and Deployment Centre). A total of 22 tags are used for tagging purposes. For training the model, the tagging is done manually. The designed annotated corpus is verified by one Bishnupriya Manipuri linguistic expert.
4.4 Bishnupriya Manipuri Corpus For experimental work, we need a sizeable corpus to get a satisfactory result, and a supervised approach additionally requires an annotated corpus. These types of resources are very hard to find for a language like Bishnupriya Manipuri. We have created an annotated corpus by manually tagging a portion of the sentences in our unannotated corpus with the help of the information obtained in the analysis step for our experiments. We have collected many texts from the Bishnupriya Manipuri version of Wikipedia [1] as well as from the Kaliprasad Darshan magazine. The unannotated corpus contains approx. 11,000 Bishnupriya Manipuri sentences. The texts are in Unicode format. This unannotated corpus is created in such a way that each line contains only one sentence. From this corpus, a part of 1000 sentences is tagged with the gold standard tagset given in Table 2 for our experimental work. A total of 22 tags are used while preparing this annotated corpus. The annotated corpus is created with the help of a native Bishnupriya Manipuri speaker.
4.5 Feature Selection The selection of features has a great impact on the performance of a model. A feature dictionary of (token, label) pair for every token in a sentence is created for this purpose, which stores the surrounding context information, e.g., tag labels of the previous two words of a token. Generally, in sequence labeling task, the labels of the
Table 2 POS tagset for Bishnupriya Manipuri language

| S. No. | Tag | Description |
|---|---|---|
| 1 | NN | Noun |
| 2 | PRON | Pronoun |
| 3 | DM | Dative marker |
| 4 | PSP | Post position |
| 5 | CONJ | Conjunction |
| 6 | VB | Verb |
| 7 | VAUX | Auxiliary verb |
| 8 | NNP | Personal noun |
| 9 | V_VM_VF | Verb in finite form |
| 10 | RB | Adverb |
| 11 | JJ | Adjective |
| 12 | PR_PRP | Personal pronoun |
| 13 | PR_PDP | Proximate demonstrative pronoun |
| 14 | QTF | Quantifier |
| 15 | VBZ | Verb with gerund |
| 16 | DM_DMQ | Quantifier question |
| 17 | RP_NEG | Verb negation |
| 18 | PUNC | Punctuation |
| 19 | V_VM_INF | Verb in non-finite form |
| 20 | DM_DMQ | Pronoun for question |
| 21 | PR_PRL | Remote demonstrative pronoun |
| 22 | QT_QTC | Quantifier/number suffix |
surrounding tokens are essential information for predicting the label of the current token. The probabilities are actually calculated as a feature function of each lexical entity along with the neighboring contextual information.
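A hedged sketch of such a per-token feature dictionary is shown below; the exact feature template (suffix/prefix windows, previous-word context) is an illustrative assumption rather than the feature set used in the paper.

```python
def word2features(sent, i):
    """Feature dictionary for the i-th token of a [(word, POS), ...] sentence."""
    word = sent[i][0]
    feats = {
        "bias": 1.0,
        "word": word,
        "prefix3": word[:3],          # crude affix clues for a morphologically rich language
        "suffix3": word[-3:],
        "is_first": i == 0,
        "is_last": i == len(sent) - 1,
    }
    if i > 0:
        feats["-1:word"] = sent[i - 1][0]   # previous-word context
    if i > 1:
        feats["-2:word"] = sent[i - 2][0]   # context two tokens back
    return feats

def sent2features(sent):
    return [word2features(sent, i) for i in range(len(sent))]

def sent2labels(sent):
    return [tag for _, tag in sent]
```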
4.6 Proposed Approach In this section, we present a brief introduction to the CRF model that is used in our proposed POS tagging approach for Bishnupriya Manipuri language. The proposed architecture is as shown in Fig. 1. Conditional random field is a discriminative model designed especially for sequence modeling tasks. Due to its ability in modeling sequential data, it is used in various NLP tasks like named entity recognition (NER), sequence prediction, shallow parsing, morphological analysis, etc. CRFs have been applied to a large variety of other domains also, like computer vision, bioinformatics, etc. It is basically an
Fig. 1 Proposed architecture
Fig. 2 Graphical model of CRF
undirected graph of X and Y nodes, with a chain structure as shown in Fig. 2. Here, X is a sequence of words (x_1, x_2, ..., x_n) in a sentence, and Y is the sequence of tag labels (y_1, y_2, ..., y_n) of the corresponding words. Our target is to find the most probable tag sequence Y for a given sequence X, i.e., P(Y|X). The model calculates the conditional probability P(Y|X) of a state sequence Y = <y_1, y_2, ..., y_n> given an observation sequence X = <x_1, x_2, ..., x_n> as:

P(Y \mid X) = \frac{1}{Z_0} \exp\Big(\sum_{i=1}^{T} \sum_{m} \lambda_m f_m(Y_{i-1}, Y_i, X, i)\Big) \qquad (1)

where f_m(Y_{i-1}, Y_i, X, i) is a feature function whose weight \lambda_m is to be learned in the training phase. To make all the conditional probabilities sum up to 1, the normalization term Z_0 is computed as follows:
Z_0 = \sum_{s} \exp\Big(\sum_{i=1}^{N} \sum_{m} \lambda_m f_m(Y_{i-1}, Y_i, X, i)\Big) \qquad (2)

We use the sklearn-crfsuite library for our experiment. First, we convert the training data into the CoNLL format supported by crfsuite. The features considered in the feature selection phase for the feature function are extracted from the training data. The most probable sequence Y given X can be calculated as follows:

Y = \arg\max_{Y} P(Y \mid X) \qquad (3)

Table 3 Comparative result of the proposed approach with some state-of-the-art approaches

| Approaches | Training data (no. of sentences) | Accuracy (%) |
|---|---|---|
| NLTK bigram | 1000 | 77.60 |
| NLTK trigram | 1000 | 79.82 |
| NLTK bigram and trigram combined | 1000 | 82.83 |
| CRF (proposed) | 1000 | 86.21 |
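A minimal sketch of training and applying the tagger with the sklearn-crfsuite library mentioned above is given below; it reuses the sentence loader and feature helpers sketched earlier, and the hyper-parameter values and split indices are illustrative assumptions.

```python
import sklearn_crfsuite
from sklearn_crfsuite import metrics

X = [sent2features(s) for s in sentences]   # sentences/helpers from the sketches above
y = [sent2labels(s) for s in sentences]

split = int(0.8 * len(X))                   # 80:20 split as described in Sect. 5
X_train, X_test = X[:split], X[split:]
y_train, y_test = y[:split], y[split:]

crf = sklearn_crfsuite.CRF(
    algorithm="lbfgs",
    c1=0.1, c2=0.1,                         # L1/L2 regularisation strengths (assumed)
    max_iterations=100,
    all_possible_transitions=True,
)
crf.fit(X_train, y_train)

y_pred = crf.predict(X_test)                # decoding: Y = argmax_Y P(Y|X)
print("Tagging accuracy:", metrics.flat_accuracy_score(y_test, y_pred))
```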
5 Experimental Results For experimental analysis, we have created training and test data from the annotated corpus. We divided the corpus data in the ratio of 80:20 for the experiment, i.e., 80% of the data are used for training purposes, and the rest are used for testing and validation. In the training phase, an annotated corpus of 1000 Bishnupriya Manipuri sentences is considered. It is annotated using the gold standard tagset and comprises various types of sentences, including simple and compound sentences. From the experimental results, it has been observed that some words are labeled with incorrect tags. This may be due to the ambiguity problem mentioned earlier. For evaluation, we calculate the accuracy using the Sklearn library as a single measure of overall performance. The accuracy of our proposed CRF-based approach is found to be 86.21%. The obtained accuracy is based on a training corpus of 1000 sentences; the result may vary with the sizes of the training and test data. The comparative result of the proposed approach with some state-of-the-art approaches is shown in Table 3.
6 Discussion and Challenges From the results, we can say that the CRF model can produce promising accuracy in tagging Bishnupriya Manipuri words. The limitation of the proposed work is that it assigns incorrect labels to some unknown and ambiguous words. To address this problem, we will try to increase the size of the annotated corpus. During the collection of text data and the design of the corpus, we encountered several challenges: the first one is the collection of digital data, and the next challenge in designing the annotated corpus for Bishnupriya Manipuri is the ambiguity problem. The ambiguities arise in situations where the same lexical items belong to different lexical categories when they are used in different contexts. Assigning labels or tags to such words of a given sentence is sometimes difficult.
7 Conclusion and Future Works In this paper, we discussed various POS tags as well as inflection markers available in the Bishnupriya Manipuri language that are useful for manual tagging as well as for designing an automated model. Further, we presented an automated CRF-based POS tagging approach. The language lacks linguistic resources and basic language processing tools to perform any computational tasks and hence is less computationally explored. To the best of our knowledge, no annotated corpus is yet freely available for the Bishnupriya Manipuri language, and we are unaware of any CRF-based POS tagging work on the Bishnupriya Manipuri language. The result of the proposed approach is compared with some state-of-the-art techniques. Depending on the size of the corpus and the selection of features, the results may vary. From the experimental results, it can be inferred that our proposed approach can produce a satisfactory result. In future, we will further increase the size of this corpus to improve the accuracy.
References 1. Bishnupriya Manipuri Wikipedia. http://bpy.wikipedia.org 2. UNESCO atlas of the world’s languages in danger. http://www.unesco.org/culture/languagesatlas/en/atlasmap.html 3. Ahmad A, Syam B (2014) Kashmir part of speech tagger using CRF. Comput Sci 3(3):3 4. Antony P, Mohan SP, Soman K (2010) SVM based part of speech tagger for Malayalam. In: 2010 international conference on recent trends in information, telecommunication and computing. IEEE, pp 339–341 5. Barman AK, Sarmah J, Sarma SK (2013) POS tagging of Assamese language and performance analysis of CRF++ and FNTBL approaches. In: 2013 UKSim 15th international conference on computer modelling and simulation. IEEE, pp 476–479 6. Daimary SK, Goyal V, Barbora M, Singh U (2018) Development of part of speech tagger for Assamese using HMM. Int J Synth Emot (IJSE) 9(1):23–32
7. Dandapat S, Sarkar S, Basu A (2007) Automatic part-of-speech tagging for Bengali: an approach for morphologically rich languages in a poor resource scenario. In: Proceedings of the 45th annual meeting of the association for computational linguistics companion volume proceedings of the demo and poster sessions, pp 221–224 8. Ekbal A, Haque R, Bandyopadhyay S (2007) Bengali part of speech tagging using conditional random field. In: Proceedings of the seventh international symposium on natural language processing, SNLP-2007 9. Ekbal A, Haque R, Bandyopadhyay S (2008) Maximum entropy based Bengali part of speech tagging. Adv Nat Lang Process Appl Res Comput Sci (RCS) J 33:67–78 10. Ojha AK, Behera P, Singh S, Jha GN (2015) Training and evaluation of POS taggers in IndoAryan languages: a case of Hindi, Odia and Bhojpuri. In: The proceedings of 7th language and technology conference: human language technologies as a challenge for computer science and linguistics, pp 524–529 11. Pallavi K, Pillai AS (2016) Kannpos-Kannada parts of speech tagger using conditional random fields. In: Emerging research in computing, information, communication and applications. Springer, pp 479–491 12. Pandian SL, Geetha T (2009) CRF models for Tamil part of speech tagging and chunking. In: International conference on computer processing of oriental languages. Springer, pp 11–22 13. Parikh A (2009) Part-of-speech tagging using neural network. In: Proceedings of ICON 14. Patel C, Gali K (2008) Part-of-speech tagging for Gujarati using conditional random fields. In: Proceedings of the IJCNLP-08 workshop on NLP for less privileged languages 15. Paul A, Purkayastha BS, Sarkar S (2015) Hidden Markov model based part of speech tagging for Nepali language. In: 2015 international symposium on advanced computing and communication (ISACC). IEEE, pp 149–156 16. Avinash Pvs, Karthik G (2007) Part-of-speech tagging and chunking using conditional random fields and transformation based learning. In: Shallow parsing for South Asian languages, vol 21, pp 21–24 17. Rao D, Yarowsky D (2007) Part of speech tagging and shallow parsing of Indian languages. In: Shallow parsing for South Asian Languages, vol 17 18. Saharia N, Das D, Sharma U, Kalita J (2009) Part of speech tagger for Assamese text. In: Proceedings of the ACL-IJCNLP 2009 conference short papers, pp 33–36 19. Sinha KP (1981) The Bishnurpiya Manipuri language. Firma KLM Prvt Ltd. 20. Sinha KP (1986) An etymological dictionary of Bishnupriya Manipuri. Puthi Pustak 21. Suraksha N, Reshma K, Kumar KS (2017) Part-of-speech tagging and parsing of Kannada text using conditional random fields (CRFs). In: 2017 international conference on intelligent computing and control (I2C2). IEEE, pp 1–5 22. Warjri S, Pakray P, Lyngdoh S, Maji AK (2021) Adopting conditional random field (CRF) for Khasi part-of-speech tagging (KPOST). In: Proceedings of the international conference on computing and communication systems. Springer, pp 75–84
IFS: An Incremental Feature Selection Method to Classify High-Dimensional Data Nazrul Hoque, Hasin A. Ahmed, and Dhruba K. Bhattacharyya
Abstract Feature selection (FS) is the problem of finding the most informative features that lead to optimal classification accuracy. In high-dimensional data classification, FS can save a significant amount of computation time as well as can help improve classification accuracy. An important issue in many applications is handling the situation where new instances arrive dynamically. A traditional approach typically handles this situation by recomputing the whole feature selection process on all instances, including new arrivals, an approach that is computationally very expensive and not feasible in many real-life applications. An incremental approach to feature selection is meant to address this issue. In this paper, we propose an effective feature selection method that incrementally scans the data once and computes credibility scores for the features with respect to the class labels. The effectiveness of the proposed method is evaluated on high-dimensional gene expression datasets using different machine learning classifiers. Keywords Features · Classification · Incremental · KNN · Supervised learning
1 Introduction Feature selection (FS) chooses an optimal, informative/relevant, and non-redundant subset of features from a large pool of features. It is widely used in machine learning to identify or classify instances. The main purpose of FS is to select m features from n features such that m ≤ n and three conditions hold: (i) data description is simplified; (ii) the task of data collection is reduced; and (iii) classification accuracy and/or quality of problem-solving is improved [9]. If a minimum number of features can yield better accuracy, then it is superfluous and time-consuming to consider a larger number of features to solve the same problem. Due to the high dimensionality of data, it is often hard to analyze and classify instances efficiently. Sometimes, the data also becomes drastically sparse in the space it occupies as the number of dimensions becomes high, leading to the curse of dimensionality for both supervised and unsupervised learning. Incremental FS (IFS) uses a dynamic approach to select an optimal feature set. It computes an optimal feature set from the information available from the instances, and when a new instance arrives, it updates the feature set dynamically without reconsidering the earlier instances. The main purpose of IFS is to reduce the feature selection cost and complexity. Moreover, IFS is important in many applications such as network intrusion detection and text categorization, where real-time analysis is performed on the data instances to categorize them into classes. The organization of the paper is as follows. In Sect. 2, we explain the motivation and contributions of the paper. A brief discussion of related work is presented in Sect. 3. The proposed incremental feature selection method is explained in Sect. 4. In Sect. 5, we analyze and discuss the experimental results. Finally, future work with conclusions is discussed in Sect. 6.
2 Motivation and Contributions FS is a generic pre-processing step used in machine learning. The accuracy of a classifier depends on the number of features, the number of instances, and the behavior of the attributes of the training objects. In many real-life data classification problems, real-time analysis in optimal time with high detection accuracy is important. Thus, for ideal performance, a classifier should work with a minimal, non-redundant, and most relevant feature set to analyze data objects during classification. A static feature selection algorithm is often inadequate, especially for handling online data for a classifier, because it recomputes everything when new objects arrive. This is the reason why we develop an incremental feature selection algorithm that can select features to enable the classification of large volumes of data. Similarly, in situations such as Internet surfing, quick document classification is necessary to select the most relevant documents against the search words. An effective incremental feature selection method may select an optimal subset of features dynamically, possibly improving the classification accuracy on a dataset of any dimensionality with any number of instances. The main contribution of this paper is an incremental feature selection method to classify any large dataset in real time with high classification accuracy. We evaluate the method in terms of classification accuracy when data objects arrive in an incremental way. We use three classifiers to validate the proposed FS method along with some existing FS methods.
3 Related Work Classification assigns objects to one or several predefined categories. Classification is used to classify unknown instances considering a large number of known or labeled instances. A good classification method categorizes objects to their respective class labels with high classification accuracy but with low computational cost. However, the curse of dimensionality [5] of a dataset may lead to low detection accuracy and a high false alarm rate for a classifier. FS plays an important role for a classifier by selecting the most relevant features among a large number of features to yield high classification accuracy. Many FS algorithms are designed as data pre-processing steps to help a classifier [1, 4]. People use statistical, probabilistic, information theoretical, rough-set, fuzzy-set, and some other optimization techniques to develop feature selection methods. Most FS methods use correlation measures, information gain, particle swarm optimization, ant colony optimization, simulated annealing, and genetic algorithms. Although a significant number of feature selection methods have been proposed, most methods select a subset of relevant features in an offline environment. However, most data in many applications such as network anomaly detection are dynamic in nature. To analyze the dynamic behavior of data instances, traditional methods for feature selection are not adequate. So, in the recent past, several significant efforts have been made to handle the feature selection problem for dynamically updated data. Such an incremental feature selection algorithm is proposed by Liu et al. [9] using a probabilistic approach. The algorithm is an incremental version of the Las Vegas Filter (LVF) [2], which is designed to achieve the goal that features selected from the reduced data should not generate more inconsistencies than those optimal from the whole dataset. However, in the LVF algorithm, if the data size is reduced, then the number of inconsistency checks may be reduced as well. As a result, the set of features selected from the reduced data may not be suitable for the whole data. Therefore, the incremental version of LVF overcomes this problem without sacrificing the quality of feature subsets in terms of the number of features and their relevance. An embedded incremental feature selection method for reinforcement learning, called IFSE-NEAT, is proposed by Wright et al. [12]. This is an incremental version of Whiteson et al.'s [11] neuroevolutionary function approximation algorithm called NEAT. IFSE-NEAT aims to select features incrementally and embed them into the neuroevolutionary process of NEAT. The beauty of IFSE-NEAT is that it starts the network with only one feature, as opposed to NEAT, where the network starts with the full set of features. IFSE-NEAT evaluates features iteratively considering the weights and topology of the network, and in each iteration it adds to the current best network the features that contribute most to its performance improvement. Katakis et al. [7] present an incremental FS method and apply it to text data classification. FS methods select an optimal set of features using an effective selection objective or individual discriminative criteria. Commonly used searching strategies include exhaustive, heuristic, and random searches to find the best set of features. Due to the exponential and quadratic time complexities of exhaustive search and heuristic search, respectively, people prefer to apply random search, which operates in linear time
complexity. So, to improve the search time for optimal feature subset selection, Ruiz et al. [10] propose a wrapper-based selection method called best incremental ranked subset (BIRS) to select optimal feature sets incrementally. The time complexity of BIRS is reduced significantly by transforming the combinatorial sequential search into a quadratic search. An incremental forward feature selection (IFFS) method inspired by an incremental reduced support vector machine is proposed by Lee et al. [8]. IFFS uses the forward feature selection wrapper approach. The main objective of this method is to select a new feature that brings the most additional information. The amount of new information contributed by a feature is computed using statistical measures. It excludes highly linearly correlated features to remove redundancy among the features.
4 IFS: Proposed Incremental Feature Selection Method The proposed IFS algorithm maintains a profile for each class label of an object. The profile contains four parameters, viz. label of an object, total number of objects belong to a particular class profile, a representative vector that stores the average feature values of all the objects that belong to a class profile and finally, a credibility vector that stores a value for each feature that represents relative strength of a feature to identify the class label of an object. Using these profile parameters, the method computes a credibility score for each individual feature with respect to given class label. The credibility score of a feature represents how much important the feature is to identify the class of an object. The profile of the objects is updated incrementally as the new objects arrive. The framework of our proposed incremental feature selection method is shown in Fig. 1. Let, Oi be an object with features ( f 1 , f 2 , . . . , f n ) where the last feature, i.e., f n represents the class label of object Oi . Object Oi is represented by a profile with the following information as shown in Fig. 2. With the arrival of new object, the following cases may arise 1. Case 1: If the existing profiles do not cover the class of the newly arrived object, the method builds a profile for the class with the profile parameters shown in Fig. 2. In the next iteration, if an object arrives with that class label, then the parameters of this class profile will be updated. 2. Case 2: When an existing profile covers but needs to be updated to accommodate (may be, newly created) the new instance. In such case instead of creating a new profile, only the feature values of the parameters of the class profile are updated. Initially, the credibility vector contains the value 1 for each individual feature of the object. If a profile already exists for an object class, then the representative vector and credibility vector for that class profile may be updated. The proposed feature selection method handles numeric and categorical features separately when updating the representative vector and credibility vector for each class. For each numeric feature f k of a newly arrived object O j , the method first chooses the closest
profile, say profile(i), for each feature f_k from the available object profiles. If the class label of the closest profile(i) is the same as the class label of the newly arrived object O_j, then the representative vector and credibility vector for profile(i) are updated as follows:

profile(i).representative(k) = \frac{(profile(i).objectcount - 1) \times profile(i).representative(k) + f_k}{profile(i).objectcount} \qquad (1)

profile(i).credibility(f_k) = profile(i).credibility(f_k) + 1 \qquad (2)

Fig. 1 Framework of the proposed method (flowchart: for each new or old object O_i, check whether a profile for O_i exists; if not, create a profile with its attributes; otherwise, compute the relevance score and update the profile base; the profiles are then read to compute feature weights and ranking, and the ranked feature subset is passed to a classifier to classify objects)

Fig. 2 Profile parameters: profile(i).label (class label of object O_i), profile(i).objectcount (number of objects belonging to the class of the object), profile(i).representative (feature-wise representative values of the class), and profile(i).credibility (relevance score of each feature of an object)
Equation 1 computes the average of each individual feature value over the objects that belong to the same class profile. Equation 2 increases the credibility score of an individual feature of an object when it is matched to a particular class profile. For each categorical feature f_c of an object O_j, the method first determines the profile(i) for which profile(i).label = label(O_j). If the value of profile(i).representative(f_c) == O_j(f_c), then only the credibility vector of the profile is updated as in Eq. 2. At this point, each profile contains the representative vector, the credibility vector, and the number of objects belonging to each individual class. Finally, the weight of each individual feature f_k is computed from the m class profiles using Eq. 3:

weight(f_k) = \frac{1}{m} \sum_{j=1}^{m} \frac{credibility_j(f_k)}{profile(j).objectcount} \qquad (3)
Equation 3 first computes the average credibility of a feature from all the class profiles. The credibility of a feature represents a relevance score to determine the class label for an object. The credibility score and objectcount parameters are used to compute a weight for each individual feature. Based on the weight of each individual feature, a rank has been assigned, i.e., the feature that has the highest weight is given the highest rank. During distance computation, we handle numeric and categorical features differently. The distance between two categorical features is 1 if they are matching exactly otherwise the distance is 0. But, for numeric features, we use L1 distance computation.
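The sketch below is one possible reading of the profile update (Eqs. 1 and 2) and the weight computation (Eq. 3) for numeric features only; the per-feature closest-profile search and the handling of counts are simplified assumptions, and categorical features are omitted.

```python
class IFS:
    """Incremental profile maintenance for numeric features (one pass over the data)."""

    def __init__(self, n_features):
        self.n = n_features
        self.profiles = {}   # label -> {"count": int, "rep": [...], "cred": [...]}

    def update(self, x, label):
        """x: list of n numeric feature values, label: class of the arriving object."""
        if label not in self.profiles:                        # Case 1: unseen class
            self.profiles[label] = {"count": 1,
                                    "rep": list(x),
                                    "cred": [1.0] * self.n}
            return
        p = self.profiles[label]                              # Case 2: update profile
        p["count"] += 1
        for k, fk in enumerate(x):
            # closest profile for feature k (L1 distance to the representative value)
            closest = min(self.profiles.values(),
                          key=lambda q: abs(q["rep"][k] - fk))
            if closest is p:                                  # label match -> Eqs. (1) and (2)
                p["rep"][k] = ((p["count"] - 1) * p["rep"][k] + fk) / p["count"]
                p["cred"][k] += 1

    def feature_weights(self):
        """Eq. (3): credibility per object, averaged over all class profiles."""
        m = len(self.profiles)
        return [sum(p["cred"][k] / p["count"] for p in self.profiles.values()) / m
                for k in range(self.n)]
```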
4.1 Complexity Analysis of the Proposed IFS The complexity of the proposed IFS algorithm is presented in this section. 1. To find the closest profile of an object O_i over each individual feature, the method takes O(m × n) time, where m is the number of class profiles and n is the number of features of an object. 2. The method takes O(n) time to compute the feature weights of the individual features. 3. The overall complexity of the method is therefore O(m × n) + O(n).
5 Results and Experimental Analysis on IFS 5.1 Dataset Description During our experimental analysis, we use three gene expression datasets. These datasets contain numerical values and have no missing values. A description of the datasets is given in Table 1.
Table 1 Dataset description

| Gene expression dataset | Number of instances | Number of attributes |
|---|---|---|
| Leukemia | 72 | 7131 |
| SRBCT | 83 | 2301 |
| Colon cancer | 62 | 2001 |
Fig. 3 Comparison on Colon Cancer dataset
Fig. 4 Comparison on SRBCT dataset
Fig. 5 Comparison on Leukemia dataset
Analysis on Gene Expression datasets On the gene expression datasets, the proposed incremental feature selection algorithm gives high classification accuracy for all three classifiers on the Colon Cancer, SRBCT, and Leukemia datasets. As shown in Figs. 3, 4, and 5, the accuracy of all three classifiers is high for all five feature selection methods, except that the KNN classification accuracy on the Leukemia dataset is slightly lower than that of ReliefF when only 2001, 3001, and 4001 features are selected.
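For completeness, a hedged sketch of this evaluation protocol is given below: rank features with the incremental method, keep the top-k, and compare the three classifiers with tenfold cross-validation. The dataset loader load_gene_expression and the value of k are hypothetical placeholders, not part of the authors' code.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier

X, y = load_gene_expression("colon_cancer.csv")   # hypothetical loader returning numpy arrays
ifs = IFS(X.shape[1])
for xi, yi in zip(X, y):                          # single incremental pass over the instances
    ifs.update(list(xi), yi)

k = 2001                                          # number of top-ranked features (placeholder)
top = np.argsort(ifs.feature_weights())[::-1][:k]

for name, clf in [("KNN", KNeighborsClassifier()),
                  ("Decision tree", DecisionTreeClassifier()),
                  ("Random forest", RandomForestClassifier())]:
    acc = cross_val_score(clf, X[:, top], y, cv=10).mean()   # tenfold cross-validation
    print(f"{name}: {acc:.3f}")
```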
6 Conclusion An incremental feature selection method is reported to support effective classification of high-dimensional data. The method selects a subset of highly ranked features that are useful for a classifier to predict data objects. The method uses the filter approach and assigns a relevance score to each individual feature. We analyzed the classification accuracy of our incremental feature selection method using decision tree, random forest, and KNN classifiers based on tenfold cross-validation, and the results have been found highly satisfactory. As future work, we are planning to develop a distributed version of the IFS method for classification of high-dimensional network data.
Acknowledgements The work is funded by UGC under Start-up-Grant (2021–2023) Order No. F.30-592/2021(BSR), Govt of India.
References 1. Sang B, Chen H, Yang L, Li T, Xu W (2021) Incremental feature selection using a conditional entropy based on fuzzy dominance neighborhood rough sets. IEEE Trans Fuzzy Syst 2. Brassard G, Bratley P (1996) Fundamentals of algorithmics. Prentice-Hall, Inc 3. Ren D, Fei C, Taoxin P, Neal S, Qiang S (2014) Feature selection inspired classifier ensemble reduction. IEEE Trans Cybern 44(8):1259–1268 4. Hoque N, Bhattacharyya DK, Kalita JK (2014) Mifs-nd: a mutual information-based feature selection method. Exp Syst Appl 41(14):6371–6385 5. Hughes G (1968) On the mean accuracy of statistical pattern recognizers. IEEE Trans Inform Theor 14(1):55–63 6. Anil J, Douglas Z (1997) Feature selection: evaluation, application, and small sample performance. IEEE Trans Pattern Anal Mach Intell 19(2):153–158 7. Katakis I, Tsoumakas G, Vlahavas I (2006) Dynamic feature space and incremental feature selection for the classification of textual data streams. In: Knowledge discovery from data streams, pp 107–116 8. Yuh-Jye L, Chien-Chung C, Chia-Huang C (2008) Incremental forward feature selection with application to microarray gene expression data. J Biopharmaceut Stat 18(5):827–840 9. Huan L, Rudy S (1998) Incremental feature selection. Appl Intell 9(3):217–230 10. Ruiz R, Riquelme JC, Aguilar-Ruiz JS (2006) Incremental wrapper-based gene selection from microarray data for cancer classification. Pattern Recogn 39(12):2383–2392 11. Shimon W, Peter S (2006) Evolutionary function approximation for reinforcement learning. J Mach Learn Res 7:877–917 12. Robert W, Steven L, Yu L (2012) Embedded incremental feature selection for reinforcement learning. Technical report, DTIC Document
Computer-Aided Identification of Loom Type of Ethnic Textile, the Gamusa, Using Texture Features and Random Forest Classifier Kangkana Bora, Lipi B. Mahanta, C. Chakraborty, Prahlad Borah, Kungnor Rangpi, Barun Barua, Bishnu Sharma, and R. Mala
Abstract In the context of Industrial Revolution (IR4.0), artificial intelligence (AI) has been playing a key role even in the textile industry for augmenting the quality of textiles in terms of identifying defects. However, a more vital application has emerged in recent times because of the fraudulent use of a type of loom, triggering much socio-economic pain. An effort has been made here to develop an AI-based loom recognizer for a particular Handloom textile item, called the 'Gamusa', which is prolifically used by the masses in the study region. The imitated Powerloom counterparts of this item have flooded the market, causing immense loss. The proposed methodology is developed based on a textile database of 7200 images of the two different loom types (Handloom and Powerloom). The texture features of these looms are extracted, and the significant ones, based on a t-test, are used to design the feature set. Next, all possible feature combinations are identified and adopted for training. The performance of the classifier is evaluated based on seven different measures. The proposed methodology achieves a highest average accuracy of 97.83% for the automated recognition of looms. It also shows high Precision [Handloom = 97%, Powerloom = 98%], Recall [Handloom = 98%, Powerloom = 97%], and F1-score [Handloom = 98%, Powerloom = 98%]. High values of Precision, Recall, and F1-score indicate that the texture features can be successfully used as an optimal feature vector in loom type identification.
K. Bora · P. Borah · K. Rangpi · B. Barua Department of Computer Science and IT, Cotton University, Guwahati, Assam 781001, India L. B. Mahanta (B) Mathematical and Computational Sciences Division, Institute of Advanced Study in Science and Technology, Guwahati, Assam 781035, India e-mail: [email protected] C. Chakraborty Department of Computer Science and Engineering, NIITTTR, Kolkata, India B. Sharma · R. Mala Department of Computer Applications, Assam Engineering College, Guwahati, Assam, India
Keywords Textile manufacturing · Loom identification · Texture features · Machine learning · Random forest classifiers · Performance evaluation
1 Introduction The Handloom textile sector plays a vital role in the Indian economy [10], and the state of Assam alone accounts for 38.6% of the total workforce [15, 18]. In the context of Industrial Revolution (IR4.0), artificial intelligence (AI) has been playing a key role even in the textile industry for augmenting the quality of textiles in terms of identifying defects [2, 3, 4, 18]. However, a more vital application has emerged in recent times because of the fraudulent use of a type of loom, triggering much socio-economic pain. The Gamusa (written in Assamese script) is an item of reverence of the culture and a significant part of the Assamese Handloom Industry. Unfortunately, the Gamusa Handloom industry is dying due to the mushrooming of imitated Powerloom products [12]. The level of artistry and intricacy achieved in the Gamusa Handloom fabrics is unparalleled, and specific weaves/designs are still beyond the scope of modern machines [18]. It is hard to distinguish between the two loom types on a manual check. Hence, it is imperative to provide a scientific basis for this classification to the administration experts to enable them to take proactive actions against the selling of Powerloom products, protect this culture, and boost the Handloom industry. An automated system using artificial intelligence and image processing provides an efficient means to identify the patterns and differentiate them from the Powerloom Gamusa. The proposed work's objective is to design an automated system to detect the Handloom Gamusa, which will be cost and time effective. Our current study is a novel framework rather than a novel algorithm. We generated a database of 7200 images of two different loom types (Handloom and Powerloom) in the proposed work. The images' texture features have been computationally analyzed using first-order histogram statistics, GLCM second-order statistics, discrete wavelet transform (DWT) with different filter combinations, and porosity analysis. Then an independent sample t-test has been performed to check the statistical significance of the features concerning the two different class types. The significant features are then considered for the final feature set design. Next, all possible feature combinations are identified and classified using a random forest classifier, and assessments are performed based on seven different measures. As per our knowledge, this is the first work in the direction of loom type identification. Notably, such algorithms were already applied in other application areas, but not for loom type classification. Thus, it is our first attempt to use such multi-disciplinary feature extraction, selection, and classification strategies together for loom type classification and learning. The rest of the paper is organized as follows: Sect. 2 reviews the related works; Sect. 3 provides the overview and explains the methodology used in the experiments. Section 4 includes the results and discussion, followed by a conclusion in Sect. 5.
2 Related Works It is observed that direct image processing and AI-related study specifically on gamusa are not available. Still, some studies are available on image processing and AI techniques in the textile sector. Some of the literature is included in the following table (Table 1). The duration considered for the literature survey is 2005–2020. Yildirim et al. [21] have provided a literature review on the application of data mining and machine learning in the textile industry. Fabric texture analysis using computer vision technique is explored by Wang et al. [20], and Khan et al. [11]. Some hardware/sensor-based study is also available, which may not be directly related to our work [16, 17].
3 Methodology Figure 1 shows the block diagram of the proposed work. The work has been completed in six phases. Phase 1: The image database of gamusa has been generated during this phase. In doing so, a repository of two different types of gamusa samples, namely Handloom and Powerloom, has been collected from twelve different sources. Acquisition of data is one of the most crucial parts of any image processing work. Here, only the body part of a gamusa is considered for image acquisition, as shown in Fig. 2. A NIKON D3400 camera with a 55 mm lens is used for image capture. The camera was held parallel to the object (gamusa), and a distance of 10 cm is maintained in each capture. From 20 gamusa of each loom type, 20 images of the body are captured; then image slicing is performed, and 9 images are generated from every single image captured. So the total number of images generated is 3600 (= 20 gamusa * 20 images each * 9 sliced images) for each loom type, giving a total size of 7200. The Image Slicer module of Python (https://samdobson.github.io/image_slicer/) is used for obtaining 9 images from one single image. This is done for data augmentation purposes and to reduce resolution, which also decreases the computation time. Phase 2: All the images are preprocessed using two popular techniques, viz. CLAHE and the Gaussian filter, to reduce noise. The root mean square error (RMSE) has been observed for both processes, and the Gaussian filter is selected for final pre-processing.
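A small sketch of Phases 1–2 under stated assumptions is given below: each captured body image is sliced into 9 tiles with the Image Slicer module and then denoised with a Gaussian filter. The file names, the assumed default tile naming pattern, and the kernel size are illustrative, not the authors' exact settings.

```python
import glob
import cv2
import image_slicer   # https://samdobson.github.io/image_slicer/

# Slice one captured body image into 9 tiles (saved next to the source image).
image_slicer.slice("gamusa_body_01.jpg", 9)

# Denoise every tile with a Gaussian filter (tile naming pattern and kernel size assumed).
for path in glob.glob("gamusa_body_01_*.png"):
    tile = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    cv2.imwrite(path, cv2.GaussianBlur(tile, (5, 5), 0))
```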
Table 1 Literature review on the application of image processing and AI techniques in the textile sector

| S/L | Authors | Objective | Application | Dataset | Methodology | Findings |
|---|---|---|---|---|---|---|
| 1 | Ahmet et al. [1] | Assessment of textile porosity by the application of the image analysis techniques | Plain woven cotton fabrics | 30 microscopic images | Porosity calculation using a mathematical formula | Light transparency of the looser fabrics is higher than that of the tighter because of the bigger pore dimensions |
| 2 | Paramasivam et al. [14] | Identifying defects in a Handloom silk fabric using image analysis techniques | No information | No information | Discrete wavelet transforms and the first-order statistical features, such as mean and standard deviation, are obtained and stored in a library. The obtained value is compared with the reference image value for determining any kind of defects on the fabric | Defect identification is shown visually but without any quantitative assessment and validation |
| 3 | Zheng et al. [9] | Identification of warp and weft accurately | Three kinds of yarn-dyed cotton | 24 samples of microscopic images | A novel structure detection method is developed based on Radon transform by using the high-resolution images of fabric yarn patterns | The edge-based projection method performs better than the gray projection method, especially when there is long hairiness on the fabric surface |
| 4 | Huang et al. [6] | In this study, textile grading of fleece based on pilling assessment was performed using image processing and machine learning methods | Fabrics | 320 representative samples were collected and classified as grade 2, 3, 4, or 5. Each grade comprised of 80 samples | The obtained grayscale images were filtered using two methods: the DFT method combined with Gaussian filtering was used to smooth the grayscale images. ANN and SVM are used for classification | Classification accuracies of the ANN and SVM were 96.6% and 95.3%, respectively, and the overall accuracies of the Daubechies wavelet were 96.3% and 90.9%, respectively |
| 5 | Jing et al. [8] | Fabric defect detection | No information | TILDA database | Gabor filters are used for feature extraction, followed by feature reduction using kernel PCA. Euclidean normal and OTSU are used for similarity matrix calculation | Sensitivity, specificity, and detection success rate are measured where sensitivity is in the range of 90–96%. Specificity is in the range above 96%, and the detection success rate is above 93% for different defect types |
| 6 | Pawening et al. [16] | Textile image classification based on texture | Different cloth material with variant design | 450 different textured images | They have used feature extraction methods GLCM, local binary pattern, and moment invariant (MI). Then feature reduction is performed using PCA, followed by classification using SVM | The accuracy achieved is 74.15% |
Fig. 1 Overview of the proposed work

Fig. 2 Image acquisition and augmentation strategy
Fig. 3 Output of segmentation process for porosity-based feature analysis
Phase 3: Feature extraction is the primary focus for computational learning. Three different techniques, namely discrete wavelet transform (DWT), histogram analysis, and gray-level co-occurrence matrix (GLCM), have been analyzed to study the texture features. Porosity analysis is done to explore the morphological features using the proposed technique. A simple yet effective algorithm is developed for porosity analysis. The input images are pre-processed using CLAHE, followed by the global OTSU thresholding technique. Figure 3 shows the segmentation process for porosity analysis. The objects in the output represent the pores. Once the pores are identified, the total number of pores in an image, the maximum and minimum area of the pores, and the maximum and minimum perimeter of the pores are extracted. So, the features extracted after porosity analysis are shown in the following equation: f_porosity = {no_of_pores, minimum_area, maximum_area, minimum_perimeter, maximum_perimeter}. Phase 4: In this step, an independent sample t-test is performed to analyze the statistical significance of the features considered, and final feature sets are designed. Phase 5: Here, final feature sets are designed by all possible combinations of features obtained from Phases 3 and 4. Phase 6: Finally, the classification of the Gamusa images is performed, and final assessments are obtained using the random forest classifier, a popular machine learning method.
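The following is a minimal sketch of the porosity analysis described above (CLAHE, global Otsu thresholding, then per-pore statistics) using OpenCV; the CLAHE parameters, the file path, and the assumption that pores appear as the darker phase are illustrative choices, not the authors' exact settings.

```python
import cv2

def porosity_features(gray):
    """CLAHE -> global Otsu threshold -> per-pore area/perimeter statistics."""
    clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
    enhanced = clahe.apply(gray)
    # Pores are assumed to be the darker phase, hence the inverted Otsu threshold.
    _, binary = cv2.threshold(enhanced, 0, 255,
                              cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
    contours, _ = cv2.findContours(binary, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    areas = [cv2.contourArea(c) for c in contours]
    perims = [cv2.arcLength(c, True) for c in contours]
    return {
        "no_of_pores": len(contours),
        "minimum_area": min(areas, default=0.0),
        "maximum_area": max(areas, default=0.0),
        "minimum_perimeter": min(perims, default=0.0),
        "maximum_perimeter": max(perims, default=0.0),
    }

gray = cv2.imread("tiles/handloom_tile_01.png", cv2.IMREAD_GRAYSCALE)  # assumed path
print(porosity_features(gray))
```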
4 Results and Discussion Experimental setup: Experiments are performed on a personal computer with a Core i5-9300H processor and 16 GB RAM, with Python version 3.8.3. Assessment measures: For evaluation purposes, we have used six assessment measures, listed in Table 2. Analysis of pre-processing filters: In this work, we have compared two pre-processing techniques: firstly, CLAHE, and secondly, the Gaussian filter. Then the RMSE values of each method are evaluated. The range of RMSE values associated with both techniques is depicted using the boxplot in Fig. 4. As observed in the figure, the RMSE value of the Gaussian filter is much lower than that of CLAHE for both the Handloom and Powerloom classes. Since a lower RMSE value is always preferred, the Gaussian filter is confirmed for further pre-processing (Table 2). Analysis of feature selection method: The t-test is a straightforward, yet powerful statistical method to identify the signal-to-noise ratio of any given set of values. It thus makes an effective comparison of different such sets of values in identifying the set for which this ratio is the least.
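A small sketch of this Phase-2 comparison is shown below: the RMSE of each filtered image against its original is computed, and the filter with the lower value is preferred. The file path and parameter values are illustrative assumptions.

```python
import cv2
import numpy as np

def rmse(a, b):
    return float(np.sqrt(np.mean((a.astype(np.float64) - b.astype(np.float64)) ** 2)))

img = cv2.imread("tiles/handloom_tile_01.png", cv2.IMREAD_GRAYSCALE)  # assumed path
clahe_out = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8)).apply(img)
gauss_out = cv2.GaussianBlur(img, (5, 5), 0)

print("RMSE of CLAHE   :", rmse(img, clahe_out))
print("RMSE of Gaussian:", rmse(img, gauss_out))   # the lower RMSE wins
```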
Fig. 4 Comparison of RMSE values of CLAHE and Gaussian filter
Table 2 Selection criteria of each measure are also listed in the table

| Measures | Formula | Criteria of evaluation |
|---|---|---|
| Accuracy | (TP + TN)/(TP + TN + FN + FP) | Higher the better |
| Precision | TP/(TP + FP) | Higher the better |
| Recall | TP/(TP + FN) | Higher the better |
| F1-score | (2 × Precision × Recall)/(Precision + Recall) | Higher the better |
The ‘Sig value’ denotes the level of significance of the t-test conducted for a set of values. It gives the probability that the observed difference is due to random chance. Hence, lower the value, the lower is the probability that it is not due to chance, indicating higher significance of that set. The next step was to conduct the t-test for all the features sets. The results are reflected in Table 3. It is clear from this table that few features are not statistically significant. Hence, they were ignored for further analysis. Analysis of classification Different feature sets have been designed for classification purposes using all possible combinations of histogram, GLCM, DWT, and porosity feature sets. The same has been names as H (histogram features), G (GLCM features), D (DWT features), P (porosity features), G + H (the combination of GLCM and histogram features), G + P, G + D, H + D, H + P, P + D, H + P + D, G + P + D, G + H + P, G + H + D, and G + H + P + D (all features combinations). Each feature set has been classified using random forest classifiers with default parameters first. The confusion matrices associated with each feature set classification are displayed in Fig. 5. Assessments of Table 3 Result of independent sample t-test to identify significant features Methods
Table 3 Result of independent-sample t-test to identify significant features

Methods | Feature name | Sig value | Features before selection | Features after selection
Histogram | Mean | 0.001 | 6 | 5
Histogram | Energy | 0.479 | |
Histogram | Standard deviation | 0.000 | |
Histogram | Skewness | 0.000 | |
Histogram | Kurtosis | 0.000 | |
Histogram | Entropy | 0.000 | |
GLCM | All 44 features | 0.000 | 44 | 44
DWT (COIF3) | LL mean | 0.000 | 8 | 7
DWT (COIF3) | LL variance | 0.000 | |
DWT (COIF3) | LH mean | 0.000 | |
DWT (COIF3) | LH variance | 0.000 | |
DWT (COIF3) | HL mean | 0.000 | |
DWT (COIF3) | HL variance | 0.000 | |
DWT (COIF3) | HH mean | 0.000 | |
DWT (COIF3) | HH variance | 0.124 | |
Porosity | No of pores | 0.000 | 5 | 5
Porosity | Mean area | 0.000 | |
Porosity | Max area | 0.000 | |
Porosity | Mean perimeter | 0.028 | |
Porosity | Max perimeter | 0.000 | |
Assessments over all six measures have been listed in Table 4. It has been observed that all the feature sets contribute strongly to the classification of the loom types. The G + H + P + D feature set gives the highest accuracy, 97.82%, with default parameters in the classification of looms. It also shows high Precision [Handloom = 98%, Powerloom = 97%], Recall [Handloom = 98%, Powerloom = 97%], and F1-score [Handloom = 98%, Powerloom = 97%]. These high values of accuracy, Precision, Recall, and F1-score indicate that the histogram, GLCM, DWT, and porosity texture features can be successfully used as a feature vector for loom type identification (Table 5).
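A minimal sketch of this evaluation loop is given below, assuming the extracted features are available as NumPy arrays; here random placeholder arrays stand in for the real histogram (H), GLCM (G), DWT (D), and porosity (P) feature blocks, and scikit-learn's default random forest is used.

```python
import numpy as np
from itertools import combinations
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n_samples = 200
# Placeholder blocks standing in for the real histogram (6), GLCM (44), DWT (8), porosity (5) features
blocks = {"H": rng.normal(size=(n_samples, 6)), "G": rng.normal(size=(n_samples, 44)),
          "D": rng.normal(size=(n_samples, 8)), "P": rng.normal(size=(n_samples, 5))}
y = rng.integers(0, 2, size=n_samples)      # 0 = Handloom, 1 = Powerloom (placeholder labels)

for r in range(1, len(blocks) + 1):
    for combo in combinations(sorted(blocks), r):
        X = np.hstack([blocks[name] for name in combo])
        clf = RandomForestClassifier(random_state=0)          # default parameters
        acc = cross_val_score(clf, X, y, cv=5, scoring="accuracy").mean()
        print(" + ".join(combo), f"accuracy = {acc:.4f}")
```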
Fig. 5 Confusion matrices associated with each feature set classification
Table 4 Accuracy score with random forest

Combination | Accuracy | Precision (Handloom) | Precision (Powerloom) | Recall (Handloom) | Recall (Powerloom) | F1-score (Handloom) | F1-score (Powerloom)
H | 0.96 | 0.96 | 0.97 | 0.97 | 0.96 | 0.97 | 0.97
G | 0.95 | 0.95 | 0.95 | 0.95 | 0.95 | 0.95 | 0.95
P | 0.80 | 0.82 | 0.80 | 0.80 | 0.82 | 0.81 | 0.81
D | 0.885 | 0.88 | 0.89 | 0.88 | 0.88 | 0.89 | 0.88
G + H | 0.9722 | 0.97 | 0.97 | 0.97 | 0.97 | 0.97 | 0.97
G + P | 0.967 | 0.97 | 0.96 | 0.96 | 0.97 | 0.97 | 0.97
G + D | 0.9560 | 0.96 | 0.95 | 0.95 | 0.96 | 0.96 | 0.96
H + D | 0.9356 | 0.93 | 0.94 | 0.93 | 0.93 | 0.94 | 0.94
H + P | 0.9625 | 0.96 | 0.96 | 0.96 | 0.96 | 0.96 | 0.96
P + D | 0.88 | 0.89 | 0.87 | 0.86 | 0.90 | 0.88 | 0.88
G + H + P | 0.9782 | 0.98 | 0.98 | 0.98 | 0.98 | 0.98 | 0.98
H + P + D | 0.9444 | 0.95 | 0.94 | 0.94 | 0.95 | 0.94 | 0.94
G + P + D | 0.96435 | 0.96 | 0.96 | 0.96 | 0.96 | 0.96 | 0.96
G + H + D | 0.9694 | 0.97 | 0.97 | 0.97 | 0.97 | 0.97 | 0.97
G + H + P + D | 0.97824 | 0.98 | 0.98 | 0.98 | 0.97 | 0.97 | 0.97
Table 5 Some examples of the inference of the proposed model

Image | Ground truth | Features | Prediction
(Gamusa image) | Handloom | G + H + P + D | Handloom
(Gamusa image) | Handloom | G + H + P + D | Handloom
(Gamusa image) | Powerloom | G + H + P + D | Powerloom
(Gamusa image) | Powerloom | G + H + P + D | Powerloom
5 Conclusion The present study establishes that experts and officials concerned with Handloom and textiles can successfully implement AI-based techniques to detect the type of loom to provide a scientific basis for their manual process. The proposed method
uses texture feature sets to distinguish handloom from powerloom weaving on the textile item called ‘Gamusa’ of Assam, with 97.824% accuracy. In the future, additional features and more variations of the item, resulting in more diverse images, can be incorporated to make the algorithm more robust. Further, it is evident that the system can be scaled to include other fabric items.
References 1. Çay A, Vassiliadis S, Maria Rangoussi IT (2007) On the use of image processing techniques for the estimation of the porosity of textile fabrics. Int J Mater Text Eng 2(2):421–424. https:// doi.org/10.5281/zenodo.1077048 2. Kulkarni AH, Patil SB (2012) Automated garment identification and defect detection model based on texture features and PNN. Int J Latest Trends Eng Technol 1(2):37–43 3. Sabeenian RS, Paramasivam M, Dinesh PM (2012) Computer vision based defect detection and identification in Handloom silk fabrics. Int J Comput Appl 42(17):41–48. https://doi.org/ 10.5120/5789-8106 4. Ghosh A, Guha T, Bhar RB, Das S (2011) Pattern classification of fabric defects using support vector machines. Int J Cloth Sci Technol 23(2/3):142–151. https://doi.org/10.1108/095562211 11107333 5. Haralick RM, Shanmugam K, Dinstein I (1973) Textural features for image classification. IEEE Trans Syst Man Cybern SMC-3(6):610–621. https://doi.org/10.1109/TSMC.1973.4309314 6. Huang M-L, Fu C-C (2018) Applying image processing to the textile grading of fleece based on pilling assessment. Fibers 6(4):73. https://doi.org/10.3390/fib6040073 7. Jamali N, Sammut C (2011) Majority voting: material classification by tactile sensing using surface texture. IEEE Trans Rob 27(3):508–521. https://doi.org/10.1109/TRO.2011.2127110 8. Jing J, Fan X, Li P (2016) Automated fabric defect detection based on multiple Gabor filters and KPCA. Int J Multimedia Ubiquitous Eng 11(6):93–106. https://doi.org/10.14257/ijmue. 2016.11.6.09 9. Zhang J, Wang Y, Zhiyu Zhang CX (2011) Comparison of wavelet, Gabor and curvelet transform for face recognition. Optica Applicata 41(1):183–193. https://opticaapplicata.pwr.edu.pl/ article.php?id=2011100183 10. Kadapa-Bose S (2018) Did you know that Indian handlooms hold 95% of the handwoven fabrics in the world? https://www.thebetterindia.com/155158/indias-handlooms-handwovenfabrics-heritage/ 11. Khan B, Wang Z, Han F, Iqbal A, Masood R (2017) Fabric weave pattern and yarn color recognition and classification using a deep ELM network. Algorithms 10(4):117. https://doi. org/10.3390/a10040117 12. Zhang K, Butler C, Qingping Yang YL (1997) A fiber optic sensor for the measurement of surface roughness and displacement using artificial neural network. IEEE Trans Instrum Meas 46(4). http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.425.723 13. Machine-made fabrics pose threat to handloom weavers (2011, August 7). Times of India. https://timesofindia.indiatimes.com/city/varanasi/machine-made-fabrics-pose-threatto-handloom-weavers/articleshow/9520927.cms 14. Paramasivam ME, Sabeenian RS (2010) Handloom silk fabric defect detection using first order statistical features on a NIOS II processor. In: Communications in computer and information science, vol 101. Springer, Berlin, Heidelberg, pp 475–477. https://doi.org/10.1007/978-3-64215766-0_77 15. Patil UN (2012) Role of handloom industry in India. Int Indexed Referred Res J 4(39):1–2
16. Pawening RE, Dijaya R, Brian T, Suciati N (2015) Classification of textile image using support vector machine with textural feature. In: 2015 international conference on information and communication technology and systems (ICTS), pp 119–122. https://doi.org/10.1109/ICTS. 2015.7379883 17. Soh L-K, Tsatsoulis C (1999) Texture analysis of SAR sea ice imagery using gray level cooccurrence matrices. IEEE Trans Geosci Rem Sens 37(2):780–795. https://doi.org/10.1109/ 36.752194 18. Subramaniam V (2017) Why handloom is still an attractive Industry for Startups. Entrepreneur India 19. Sundari BS (2017, March 21) Handlooms are dying—and it’s because of our failure to protect them. The Wire. https://thewire.in/culture/handlooms-are-dying-and-its-because-ofour-failure-to-protect-them 20. Wang X, Georganas ND, Petriu EM (2011) Fabric texture analysis using computer vision techniques. IEEE Trans Instrum Meas 60(1):44–56. https://doi.org/10.1109/TIM.2010.206 9850 21. Yildirim P, Birant D, Alpyildiz T (2018) Data mining and machine learning in textile industry. Wiley Interdisc Rev Data Mining Knowl Discov 8(1):e1228. https://doi.org/10.1002/widm. 1228
Spanning Cactus Existence, Optimization and Extension in Windmill Graphs Chinmay Debnath, Krishna Daripa, Ritu Mondal, and Alak Kumar Datta
Abstract We have studied the spanning cactus existence problem, the minimum spanning cactus problem, and the minimum spanning cactus extension problem on windmill graphs. We have proved that there always exists a spanning cactus in every windmill graph. Next, we prove that it is NP-hard to compute the minimum spanning cactus of a windmill graph. We present a necessary and sufficient condition for the spanning cactus extension of a forest in a windmill graph and prove that the minimum spanning cactus extension of a forest in a windmill graph can be solved in polynomial time. Keywords Windmill graph · Cactus · NP-completeness · Minimum spanning cactus · Spanning cactus
1 Introduction Presently, cactus structures are frequently used for constructing the backbone of network communication systems. Communication failures can be significantly reduced if a cactus graph model is used instead of a tree structure as the backbone; it enhances the accessibility and reliability of the network. Several pieces of literature on cactus graphs are available in [2, 4–10, 17, 21]. An undirected connected graph G(V, E), where every edge of G is contained in exactly one cycle, is called a cactus. A partial cactus is an undirected connected graph where every edge is contained in at most one cycle. Traffic estimation [24], genome comparisons [22], and the representation of cuts of graphs [11, 12] are also some of the important application areas of cactus structures. C. Debnath (B) · K. Daripa (B) · R. Mondal · A. K. Datta Department of Computer and System Sciences, Visva-Bharati University, Santiniketan 731235, West Bengal, India e-mail: [email protected] K. Daripa e-mail: [email protected] A. K. Datta e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 N. Das et al. (eds.), Proceedings of International Conference on Data, Electronics and Computing, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-99-1509-5_5
51
52
C. Debnath et al.
The spanning cactus existence problem (SCEP) is NP-complete [6, 17]. In [5, 6, 8, 17], the minimum spanning cactus problem (MSCP) in general graphs has been studied. The NP-completeness of the MSCP in an undirected graph has been presented in [6, 17]. The MSCP for directed complete graphs is NP-complete as well [21]. A polynomial time algorithm has been presented for the minimum spanning cactus extension (MSCE) problem on a complete graph in [17]. In [8], Debnath and Datta have presented a linear time algorithm for the MSCP and MSCE problems on outerplanar graphs [19, 20]. A linear time algorithm for the MSCP on a Halin graph has been presented in [7]. A brief study on the SCEP in the Petersen graph has been reported by Debnath and Datta in [9]. They also presented an algorithm for the SCEP on a three-dimensional (3 × 3 × 3) grid graph in O(1) time [10]. Here, we have proved that there always exists a spanning cactus in a windmill graph. We have also proved that computing a minimum spanning cactus (MSC) in a windmill graph is NP-complete [1, 3, 13]. Next, we have taken up the minimum spanning cactus extension problem. We have presented a necessary and sufficient condition for spanning cactus extendibility of a forest in a windmill graph. At the end, we have presented an O(mn³) time algorithm for the minimum spanning cactus extension of a forest in a windmill graph, if it exists.
2 Preliminaries A spanning cactus (SC) of a graph G is a subgraph G′ of G such that G′ is a cactus spanning all the vertices of G [23]. Multiple spanning cacti may exist in a graph G [14–16, 18]. Figures 2 and 3 represent two spanning cacti of the graph in Fig. 1, whereas for some graphs, as in Fig. 4, an SC may not exist at all. Suppose a partial cactus PC(V, E′) of a graph G(V, E) is given. If the inclusion of some edges from E \ E′ into PC results in a cactus, then PC is cactus extendible; otherwise, it is non-extendible. If the sum of the edge costs of the produced cactus is the minimum among all such spanning cacti, then it is called the minimum spanning cactus extension.
Fig. 1 Undirected graph G
Fig. 2 One of the several spanning cacti of G
Fig. 3 Spanning cactus of G other than that shown in Fig. 2
Fig. 4 Graph containing no spanning cactus
Fig. 5 Windmill graph W G(6, 3)
3 Windmill Graph The windmill graph W G(n, m) where n > 2, m ≥ 1 is an undirected graph consisting of m number of complete graphs K n sharing a universal vertex {v} that is contained in all K n s. Figure 5 presents a windmill graph.
3.1 Construction of Windmill Graph W G(n, m) Let {v, a1, a2, . . . , an−1} be the n vertices, and construct a complete graph K n on these vertices. Again, suppose {v, b1, b2, . . . , bn−1} is the vertex set of the second K n. In a similar fashion, we take m complete graphs K n that are articulated at vertex v. Some properties of the windmill graph W G(n, m) are:
• The windmill graph W G(n, m) has (n − 1)m + 1 vertices and (n − 1)nm/2 edges.
• Except for the vertex v, all vertices are of degree (n − 1).
• The degree of the central vertex v is m(n − 1).
• The windmill graph is non-Hamiltonian, as the central vertex is a cut vertex.
• The windmill graph W G(n, m) is non-planar for n > 4, because W G(n, m) contains K 5 when n > 4.
Fig. 6 Spanning cactus of the windmill graph in Fig. 5
4 SCEP in a Windmill Graph Theorem 1 There exists a spanning cactus in every windmill graph. Proof In a windmill graph W G(n, m), there are m copies of K n. Since the complete graph K n, where n > 2, is Hamiltonian, each K n contains at least one spanning cactus, namely its Hamiltonian cycle. Since all the K n are articulated at a single central vertex v in W G(n, m), the m cacti share only the vertex v and no other vertex, so their union is again a cactus. Hence, there exists an SC in every W G(n, m). Figure 6 shows an SC of the windmill graph in Fig. 5. Hence the proof.
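As an illustration of this construction (not part of the original paper), the following sketch builds W G(n, m) from m copies of K n glued at a common vertex and forms a spanning cactus as the union of one Hamiltonian cycle per copy; networkx and the vertex naming are assumptions.

```python
import networkx as nx

def windmill_graph(n, m):
    """Build WG(n, m): m copies of K_n articulated at the common vertex 'v'."""
    G = nx.Graph()
    for i in range(m):
        clique = ["v"] + [f"a{i}_{j}" for j in range(n - 1)]
        G.add_edges_from((u, w) for idx, u in enumerate(clique) for w in clique[idx + 1:])
    return G

def spanning_cactus(n, m):
    """Union of one Hamiltonian cycle per K_n copy, as in the proof of Theorem 1."""
    C = nx.Graph()
    for i in range(m):
        cycle = ["v"] + [f"a{i}_{j}" for j in range(n - 1)]
        C.add_edges_from(zip(cycle, cycle[1:] + cycle[:1]))   # close the cycle at 'v'
    return C

G, C = windmill_graph(6, 3), spanning_cactus(6, 3)
assert set(C.nodes) == set(G.nodes)                 # C spans all vertices of WG(6, 3)
assert all(G.has_edge(u, w) for u, w in C.edges)    # and uses only edges of WG(6, 3)
```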
5 MSCP in a Windmill Graph We have mentioned earlier that there may be multiple spanning cacti of a graph. In Sect. 4, we have shown that an SC always exists in a windmill graph. As there are many spanning cacti of every complete graph, there are also many spanning cacti of every windmill graph. In addition, if the given graph is weighted, that is, all the edges are assigned non-negative weights, then different SCs of a windmill graph may have different weights. The MSCP is the problem of finding an SC of minimum total weight. The NP-completeness of the MSCP in a general weighted graph is presented in [6, 17, 21], whereas for directed graphs it has been studied in [21]. For a complete graph with edge weights satisfying the triangle inequality, the MSCP is equivalent to the TSP [17]. Using this equivalence, the hardness of approximating the MSCP under the triangle inequality was proved [17]. Now, let us consider the complexity of the MSCP in a windmill graph W G(n, m).
Theorem 2 The MSCP in a weighted windmill graph W G(n, m) is NP-complete. Proof We use the proof-by-restriction method [13]. Assume a restricted instance of the problem where m = 1, i.e., there is only one complete graph K n in the windmill graph G. In this case, the problem reduces to the MSCP in a complete graph. As the MSCP in a complete graph is NP-complete [17], the MSCP in a windmill graph is also NP-complete.
6 MSCE in a Windmill Graph Now we consider a more general problem, the spanning cactus extension (SCE) problem. Let F be a forest of a graph G. In some cases, F can be augmented by adding edges from G \ F to obtain an SC of G. However, it may not always be possible to extend F to a cactus. If it is possible, we say that F is cactus extendable in G. If F is cactus extendable in G, then it may be possible to extend it to a cactus by adding different subsets of edges taken from G \ F, and these different spanning cacti may be of different weights. The minimum spanning cactus extension problem (MSCEP) is the problem of extending the forest to a cactus in G such that the cost of the cactus obtained is the minimum. A necessary and sufficient condition for F to be extendible to a cactus in a complete graph is as follows: Lemma 1 A spanning forest F is extendible to a cactus in a complete graph iff there exists an even-degree vertex in each tree of the forest F [6, 17]. Theorem 3 A spanning forest F in a windmill graph W G(n, m) is cactus extendible iff F has an even-degree vertex in each of its trees in each complete graph K n. Proof Condition is necessary: Let F be cactus extendable to a cactus C in W G(n, m). Let F1, F2, . . . , Fm be the forests induced by F in the m complete graphs C G 1, C G 2, C G 3, . . . , C G m of W G(n, m). Also, let Ci be the subgraph of C induced by the vertices in C G i. Since the central vertex v is a cut vertex, Ci is an SC of C G i for all i. So Ci is an SCE of Fi in C G i. By Lemma 1, Fi must have a vertex of even degree in each of its trees. Condition is sufficient: Let Fi have a vertex of even degree in each of its trees in C G i for all i = 1, 2, 3, . . . , m. Then by Lemma 1, Fi is spanning cactus extendable in C G i. Let Ci be the SCE of Fi in C G i. As the central vertex v is a cut vertex and common to all Ci, C = ∪Ci, i = 1, 2, . . . , m, is an SCE of F in W G(n, m). Hence the proof. Theorem 4 Spanning cactus extendibility of a forest F in a windmill graph W G(n, m) can be tested in linear time. Proof It is only required to test the degree of all vertices of Fi in each C G i. This takes linear time.
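A sketch of the linear-time extendibility test implied by Theorems 3 and 4 is given below; the representation of the forest and of the K n blocks as networkx objects is an assumption for illustration.

```python
import networkx as nx

def is_cactus_extendible(forest, blocks):
    """forest: spanning forest of WG(n, m) as an nx.Graph;
    blocks: list of vertex sets, one per complete block K_n (each containing 'v')."""
    for block in blocks:
        F_i = forest.subgraph(block)                       # forest induced inside this K_n
        for tree in nx.connected_components(F_i):
            if not any(F_i.degree(u) % 2 == 0 for u in tree):
                return False                               # a tree with no even-degree vertex
    return True
```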
If F is cactus extendable in W G(n, m), then we focus on finding the MSCE of F in W G(n, m). Theorem 5 The MSCE of a forest F in a windmill graph W G(n, m) can be computed in O(mn³) time. Proof C is the MSCE of F iff Ci is the minimum spanning cactus extension of Fi in the complete graph C G i for all i. This holds because the central vertex v is a cut vertex. The MSCE of a forest in a complete graph can be computed in O(n³) time [6]. Hence, the MSCE of a forest in a windmill graph W G(n, m) can be computed in O(mn³) time.
7 Conclusion We have studied three problems, SCEP, MSCP, and MSCEP, on windmill graphs. We have shown that there exists an SC in every windmill graph. Next, we have shown that the MSCP on a windmill graph is NP-complete. We have also presented a necessary and sufficient condition for spanning cactus extendibility of a forest in a windmill graph. In the end, we have shown that the MSCE problem on a windmill graph W G(n, m) can be solved in O(mn³) time. To the best of our knowledge, all these are new results.
References 1. Aho AV, Hopcroft JE (1974) The design and analysis of computer algorithms. Pearson Education India 2. Boaz B-M, Binay B, Qiaosheng S, Arie T (2007) Efficient algorithms for center problems in cactus networks. Theor Comput Sci 378(3):237–252 3. Cormen TH, Leiserson CE, Rivest RL, Stein C (2010) Introduction to algorithms. PHI Learning Pvt. Ltd., Originally MIT Press 4. Kalyani D, Madhumangal P (2008) An optimal algorithm to find maximum and minimum height spanning trees on cactus graphs. Adv Model Optim 10(1):121–134 5. Datta AK (2015) Approximate spanning cactus. Inform Process Lett 115(11):828–832 6. Datta AK, Debnath C (2017) Spanning cactus: complexity and extensions. Discrete Appl Math 233:19–28 7. Debnath C, Datta AK Minimum spanning cactus problem on Halin graphs. Commun Elsevier J 8. Debnath C, Datta AK Spanning cactus and spanning cactus extension on outerplanar graphs. Commun Elsevier J 9. Debnath C, Datta AK (2020) A short note on spanning cactus problem of Petersen graph. In: Dawn S, Balas VE, Esposito A, Gope S (eds) Intelligent techniques and applications in science and technology. Springer International Publishing, Cham, pp 757–760 10. Debnath C, Datta AK (2020) Spanning cactus existence in a three-dimensional (3 × 3 × 3) grid. In: Proceedings of the international conference on innovative computing and communications (ICICC) 11. Dinits EA (1976) On the structure of a family of minimal weighted cuts in a graph. In: Studies in discrete optimization
12. Lisa F (1999) Building chain and cactus representations of all minimum cuts from hao-orlin in the same asymptotic run time. J Algorithms 33(1):51–72 13. Garey MR, Johnson DS (1979) Computers and intractibility: a guide to the theory of NPcompleteness. W.H. Freeman and Co., San Francisco 14. Gutin G, Punnen AP (2006) The traveling salesman problem and its variations, vol 12. Springer Science & Business Media 15. Harary F (1969) Graph theory. Addison-Wesley 16. Hobbs AM (1979) Hamiltonian squares of cacti. J Comb Theor Ser B 26(1):50–65 17. Kabadi Santosh N, Punnen Abraham P (2013) Spanning cactus of a graph: existence, extension, optimization and approximation. Discrete Appl Math 161(1):167–175 18. Kruskal JB (1956) On the shortest spanning subtree of a graph and the traveling salesman problem. Proc Am Math Soc 7(1):48–50 19. Maheshwari A, Zeh N (1999) External memory algorithms for outerplanar graphs. In: International symposium on algorithms and computation. Springer, pp 307–316 20. Mitchell SL (1979) Linear algorithms to recognize outerplanar and maximal outerplanar graphs. Inform Process Lett 9(5):229–232 21. Anna P (2005) Complexity of the directed spanning cactus problem. Discrete Appl Math 146(1):81–91 22. Paten B, Diekhans M, Earl D, St John J, Ma J, Suh B, Haussler D (2011) Cactus graphs for genome comparisons. J Comput Biol 18(3):469–481 23. Pulleyblank WR (1979) A note on graphs spanned by Eulerian graphs. J Graph Theor 3(3):309– 310 24. Zmazek B, Zerovnik J (2005) Estimating the traffic on weighted cactus networks in linear time. In: Ninth international conference on information visualisation (IV’05). IEEE, pp 536–541
Effect of Noise in Khasi Speech Recognition System Fairriky Rynjah, Khiakupar Jyndiang, Bronson Syiem, and L. Joyprakash Singh
Abstract In this paper, we investigated the effect of noise in a Khasi automatic speech recognition (ASR) system. Acoustic models (AMs) were built with clean speech data and tested with or without noise to analyze the performance. The AMs were trained with 39-dimensional cepstral mean-variance normalized (CMVN) Mel-frequency cepstral coefficient (MFCC) and perceptual linear prediction (PLP) features. Linear discriminant analysis (LDA) was applied, and the features were then de-correlated using maximum likelihood linear transformation (MLLT) to project the integrated frames to 40 dimensions. Speaker adaptation was also used to produce word lattices for each utterance generated by the hidden Markov model (HMM). The study shows that increasing the signal-to-noise ratio (SNR) leads to improvement of the recognition performance. Further, the subspace Gaussian mixture model (SGMM) outperformed the other conventional models irrespective of the data used. Additionally, with the increase of the SNR level from 10 to 30 dB, a reduction of the word error rate (WER) in the range of 37.75–14.30% for MFCC and 37.70–14.91% for PLP was achieved, respectively. Keywords Gaussian mixture model · Hidden Markov model · Khasi speech recognition system · Mel-frequency cepstral coefficients · Noisy data · Perceptual linear prediction · Signal-to-noise ratio · Subspace Gaussian mixture model · Word error rate
1 Introduction The automation of speech recognition for devices or machines has been a focus of research for many years [1]. Not only are we witnessing the applications of this research but even we are using them in our daily life. Despite the success achieved in this field, there is still no system that will recognize speech accurately. F. Rynjah (B) · K. Jyndiang · B. Syiem · L. J. Singh Department of Electronics and Communication Engineering, NEHU, Shillong, 793022, India e-mail: [email protected] L. J. Singh e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 N. Das et al. (eds.), Proceedings of International Conference on Data, Electronics and Computing, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-99-1509-5_6
The main objective of this work is to obtain better performance of ASR systems even with a limited amount of data and in the presence of noise. The addition of noise to the clean signal is considered because in real-life applications it is not possible to have an ideal signal or data. In this paper, we developed the system using the Kaldi ASR toolkit. Kaldi is an open-source toolkit for developing ASR systems under the Apache license. In order to report the best WER, we follow the standard procedures of the Kaldi toolkit [2]. This toolkit generates better quality lattices and is fast enough to recognize speech in real time [3]. To achieve some improvement over the conventional GMM, we use the SGMM, which is one of the efficient ASR frameworks for acoustic modeling and speaker adaptation training [4]. This modeling technique ties fewer parameters to each acoustic state, while several parameters are globally shared [5]. Some of the related works mentioned below have motivated this study to extract and analyze the spectral features using additive white Gaussian noise (AWGN) at different signal-to-noise ratio (SNR) levels. Dutta et al. developed a phonetic engine comparing MFCC, PLP, and LPCC using HTK and observed the performance of each spectral feature. AWGN at SNR levels of 5, 10, 15, and 20 dB was added to each data set. In all modes of the speech data, the recognition accuracy using MFCC and PLP gives better results than using LPCC with increasing SNR levels [6]. Krishnamoorthy et al. in their work described speaker recognition with limited and noise-added data. They used the MFCC feature extraction technique to extract features from the TIMIT database and incorporated noise into both the training and testing datasets. Using the GMM-UBM modeling technique, they achieved an accuracy of 78.20%; using both noisy and limited data, an accuracy of 80% was achieved. It was observed that adding noise with a higher SNR level to the speech data increased the performance of the system [7]. A robust Mizo digit recognition system was developed by Sarma et al., where the system was trained with clean data as well as with noise-added clean data. In the white Gaussian noise condition, they added noise to the training data at SNR levels of 0, 5, 10, 15, and 20 dB. It was reported that the performance of SGMM-HMM and DNN-HMM was better than that of GMM-HMM, whether trained with clean data or noise-added clean data, with increasing SNR level [8]. Povey et al. described a work on speech recognition using SGMM. They demonstrated that using SGMM results in a lower WER than using the conventional model: the best baseline model gives a WER of 46%, while the SGMM results show an improvement with 44.5% WER [9]. A work on children's speech recognition using SGMM and DNN-HMM was done by Giuliani and Bagher on an Italian speech database. From the results obtained, it was observed that SGMM is an effective approach for speech recognition, but DNN-HMM gives better performance [5]. Tong et al. performed computer-assisted language learning using SGMM; their outcomes showed better performance compared to the baseline MMI-GMM/HMM by 25% for native and 47% for non-native training data [10]. Guglani and Mishra reported the impact of using deep neural network (DNN) techniques in speech recognition systems using the Kaldi toolkit for the continuous Punjabi language. Karel's DNN model provides better performance than Dan's DNN model, and MFCC features give better results than PLP features irrespective of the model trained, for both bi-gram and tri-gram language models [11].
2 The Khasi Language The Khasi language is widely spoken in the state of Meghalaya and even across the borders of the neighboring states. The language is classified as an Austro-Asiatic language belonging to the Mon-Khmer family [12]. As per the Statistical Handbook of Meghalaya 2008, based on the 2001 language data, around 48.6% of the population speaks the Khasi language, with variations of dialects depending on the geographical region [13]. The Sohra dialect is considered the standard Khasi dialect [14]. In this work, the focus is on the standard Khasi dialect since it is the common dialect for communication.
3 Subspace Gaussian Mixture Model The basic equations of the model are expressed as follows:

p(x \mid j) = \sum_{k=1}^{K} w_{jk} \, \mathcal{N}(x;\, \mu_{jk}, \Sigma_k)    (1)

\mu_{jk} = M_k v_j    (2)

w_{jk} = \frac{\exp(w_k^{T} v_j)}{\sum_{k'=1}^{K} \exp(w_{k'}^{T} v_j)}    (3)
where x is the feature vector and p(x | j) is the probability model for speech state j. K is the number of shared Gaussians used to model state j with full covariances Σ_k, and w_jk and μ_jk are the mixture weights and means, respectively. In the SGMM, the mixture weights and means of the speech states are obtained from a globally shared weight vector w_k and mean-projection matrix M_k together with a low-dimensional state-specific vector v_j, rather than being trained directly as in a traditional GMM. An SGMM model has fewer parameters than a traditional GMM model, since the full covariance matrices are shared among states. This implies that comparable models can be developed with less training data while maintaining system accuracy [10].
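A small numerical sketch of Eqs. (1)–(3) is given below; the dimensions, random parameter values, and identity covariances are placeholders and do not correspond to the trained Khasi models.

```python
import numpy as np
from scipy.stats import multivariate_normal

D, S, K = 40, 3, 5                      # feature dim, subspace dim, shared Gaussians
rng = np.random.default_rng(0)
M = rng.normal(size=(K, D, S))          # shared mean-projection matrices M_k
w = rng.normal(size=(K, S))             # shared weight-projection vectors w_k
Sigma = [np.eye(D) for _ in range(K)]   # shared full covariances (identity here)
v_j = rng.normal(size=S)                # state-specific vector for state j

mu_j = M @ v_j                          # Eq. (2): K state-dependent means
logits = w @ v_j
w_j = np.exp(logits) / np.exp(logits).sum()   # Eq. (3): state-dependent mixture weights

x = rng.normal(size=D)                  # one acoustic feature vector
p_x_given_j = sum(w_j[k] * multivariate_normal.pdf(x, mean=mu_j[k], cov=Sigma[k])
                  for k in range(K))    # Eq. (1): state likelihood
```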
4 Database Development In this work, a standard Khasi (Sohra) dialect speech corpus has been used. The speech data was collected from native speakers of various age groups and gender. Native speakers were provided with sentences to read. To record the speech data, Zoom H4n handy portable digital recorder was used. A total of 12,000 speech wave
files with a sampling rate of 16 kHz were used in the experiment. The duration may vary from one speech file to another in the range 1–15 sec approximately. Each speech file has a corresponding transcribed label file. The speech and transcription files are prepared accordingly as per the ASR toolkit requirement.
5 Feature Extraction In any ASR system, acoustic features play a vital role. The most commonly used feature extraction techniques are PLP and MFCC, and both were used in this experiment. In the feature extraction process, a 25 ms window size was used with a 10 ms frame shift. Prior to feature extraction, the speech files were sampled at a rate of 16 kHz. In a single window frame, the samples are reduced to 13 coefficients (static features). The first- and second-order derivatives of these features were also computed as the dynamic features (26 additional features), so a 39-dimensional feature vector is extracted from each frame of the speech files. The feature vectors are then normalized using cepstral mean and variance normalization (CMVN) [3] and further used for training the AMs.
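A minimal sketch of this 39-dimensional front end with per-utterance CMVN is shown below using librosa; the file name is hypothetical, and Kaldi's own feature pipeline was used in the actual experiments.

```python
import librosa
import numpy as np

y, sr = librosa.load("utterance.wav", sr=16000)                   # hypothetical file name
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13,
                            n_fft=int(0.025 * sr), hop_length=int(0.010 * sr))
feats = np.vstack([mfcc, librosa.feature.delta(mfcc), librosa.feature.delta(mfcc, order=2)])

# Per-utterance cepstral mean and variance normalization (CMVN)
feats = (feats - feats.mean(axis=1, keepdims=True)) / (feats.std(axis=1, keepdims=True) + 1e-8)
print(feats.shape)                                                # (39, number_of_frames)
```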
6 Experimental Approach In this study, the experiments were conducted using the Kaldi ASR toolkit on Ubuntu 18.04 long term support (LTS). The acoustic features were computed as explained in Sect. 5. The extracted features were used to train the monophone model, using the default parameters of Kaldi. From the trained monophone model, the alignments of the utterances were calculated and then used in the triphone1 model. The delta features are computed in the triphone1 model; we used 2500 leaves and 15,000 Gaussians to train this model. To train the triphone2 (LDA+MLLT) model, the number of dimensions is first projected down to 40, as specified by default, with the use of LDA and MLLT, and the alignments from triphone1 were used. The numbers of leaves and Gaussians used in this model were the same as in the previous model. The triphone3 (LDA+MLLT+SAT) model was trained using the SAT algorithm, performed by estimating one feature-space maximum likelihood linear regression (fMLLR) transform for every speaker. The numbers of leaves and Gaussians were kept the same as in triphone1, and the alignments were taken from the preceding model (i.e., triphone2). The SGMM model was developed on top of LDA+MLLT+SAT (triphone3). In order to build this model, the universal background model (UBM) first has to be developed; in this experiment, the UBM was built by setting the number of Gaussians to 400. Finally, the SGMM was built with 7000 leaves and 9000 Gaussians. Like the triphone2 model, the SGMM uses 40-dimensional feature vectors as input [15].
7 Results and Discussion Noise addition is performed in order to increase the number of speech feature samples artificially [7]. In this experiment, AWGN was added to the clean speech to produce noisy data at different SNR levels (i.e., 10, 20, and 30 dB). To observe the distribution of features with and without additive noise, further analysis was carried out on 200 ms of speech data. The results are shown in Figs. 1 and 2. Figure 1a–d shows the scattering of the first 3 MFCC coefficients (c1, c2, and c3) computed separately from clean speech and from 10, 20, and 30 dB noise. Similarly, Fig. 2a–c shows the scattering of the first 3 MFCC coefficients after adding 10, 20, and 30 dB noise to the clean speech. From the results, it can be observed that when noise at a 10 dB SNR is added to the clean speech, the coefficients scatter differently. As the SNR level is increased, particularly at 30 dB, the distribution of the coefficients becomes more similar to that of the clean speech. Furthermore, the log-spectral distance (log-SD) between clean speech and noisy data was plotted for 50 frames, and the result is shown in Fig. 3. It is evident from the figure that the log-SD values between the clean and the clean+30 dB speech data are close.
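For reference, the standard construction for adding AWGN to a clean signal at a chosen SNR is sketched below; this is a generic implementation, not the authors' exact script.

```python
import numpy as np

def add_awgn(clean, snr_db, seed=0):
    """Add white Gaussian noise to a clean signal (NumPy array) at the given SNR in dB."""
    rng = np.random.default_rng(seed)
    signal_power = np.mean(clean ** 2)
    noise_power = signal_power / (10 ** (snr_db / 10.0))
    noise = rng.normal(0.0, np.sqrt(noise_power), size=clean.shape)
    return clean + noise

# Example: clean_plus_10dB = add_awgn(clean_signal, 10)   # clean_signal is a placeholder array
```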
Fig. 1 Distribution of the first 3 MFCC coefficients in feature space for clean and different SNR level of noise (in dB)
64
F. Rynjah et al.
Fig. 2 Distribution of the first 3 MFCC coefficients in feature space for clean with different level of additive noise
Fig. 3 Log-spectral distance (Log-SD) for clean and noisy data
Table 1 Comparison of WER (in %) for clean and noisy data evaluated from different models

Data | Monophone (MFCC) | Monophone (PLP) | Triphone1 (MFCC) | Triphone1 (PLP) | Triphone2 (MFCC) | Triphone2 (PLP) | Triphone3 (MFCC) | Triphone3 (PLP) | SGMM (MFCC) | SGMM (PLP)
Clean | 23.87 | 24.65 | 15.61 | 16.12 | 14.82 | 15.53 | 13.92 | 14.59 | 12.57 | 13.80
Clean+10 dB | 78.72 | 78.10 | 60.31 | 61.66 | 57.52 | 57.36 | 42.63 | 41.89 | 37.75 | 37.70
Clean+20 dB | 44.79 | 44.22 | 21.88 | 21.38 | 19.80 | 19.97 | 18.53 | 18.01 | 16.53 | 15.09
Clean+30 dB | 25.89 | 27.65 | 19.98 | 20.01 | 19.17 | 19.13 | 19.17 | 16.90 | 14.30 | 14.91
However, as the SNR level decreases, the log-SD, particularly for the clean+10 dB data, deviates more with respect to the clean speech. Further investigations were made by building different AMs and testing them with data under different conditions. The comparison of results is given in Table 1. The study shows that SGMM performed better than the other conventional models under the different conditions. This may be because SGMM is a larger class of generative model and is well suited to the case of limited training data, as stated in [5] (Fig. 3). The performance of the ASR system is evaluated in terms of WER, which is defined in Eq. (4):

WER(\%) = \frac{D + S + I}{N} \times 100    (4)

where D, S, I, and N are the numbers of deletions, substitutions, and insertions, and the total number of words in the test data, respectively [11]. In terms of the features used, it was observed that MFCC provided better performance than PLP in most of the conditions. This could be due to the non-linearity of the speech signal, as stated in [16], and since MFCC is reliable for more coefficients and filters [11].
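A small sketch of the WER computation in Eq. (4) via edit-distance alignment is given below; in the actual experiments, Kaldi's scoring scripts report this value directly.

```python
import numpy as np

def wer(reference, hypothesis):
    """Word error rate (%) between a reference and a hypothesis transcription."""
    ref, hyp = reference.split(), hypothesis.split()
    d = np.zeros((len(ref) + 1, len(hyp) + 1), dtype=int)
    d[:, 0] = np.arange(len(ref) + 1)      # deletions
    d[0, :] = np.arange(len(hyp) + 1)      # insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = d[i - 1, j - 1] + (ref[i - 1] != hyp[j - 1])
            d[i, j] = min(sub, d[i - 1, j] + 1, d[i, j - 1] + 1)
    return 100.0 * d[len(ref), len(hyp)] / len(ref)

print(wer("ka ktien khasi", "ka ktien khasi"))   # 0.0
```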
8 Conclusion This paper reports the impact of noise on a Khasi speech recognition system. The study shows that the recognition performance is greatly affected by the level of noise: lower SNR levels lead to deteriorated recognition performance. As far as the AM is concerned, SGMM performed better than the other conventional models under different conditions. Also, MFCC provides better results than PLP in most of the conditions. In future work, a larger amount of speech data and other machine learning methods may be explored to enhance the recognition performance.
References 1. Rabiner L, Juang B, Yegnanarayana B (2009) Fundamentals of speech recognition. Pearson 2. Ali A, Zhang Y, Cardinal P, Dahak N, Vogel S, Glass J (2015) A complete KALDI recipe for building Arabic speech recognition systems. In: IEEE workshop on spoken language technology, proceedings. pp 525–529. https://doi.org/10.1109/SLT.2014.7078629 3. Upadhyaya P, Farooq O, Abidi MR, Varshney YV (2017) Continuous Hindi speech recognition model based on Kaldi ASR toolkit. In: International conference on wireless communications, signal processing and networking (WiSPNET), pp 786–789. https://doi.org/10.1109/ WiSPNET.2017.8299868 4. Motlicek P, Dey S, Madikeri S, Burget L (2015) Employment of subspace Gaussian mixture models in speaker recognition. In: IEEE international conference on acoustics, speech and signal processing (ICASSP), pp 4445–4449. https://doi.org/10.1109/ICASSP.2015.7178811 5. Giuliani D, Bagher B (2015) Large vocabulary children’s speech recognition with DNNHMM and SGMM acoustic modeling. Interspeech, pp 1635–1639. https://doi.org/10.21437/ Interspeech.2015-378 6. Dutta SK, Nandakishor S, Singh LJ (2017) A Comparative study on feature dependency of the Manipuri language based phonetic engine. In: 2nd international conference on communication systems, computing and IT applications (CSCITA), pp 5–10. https://doi.org/10.1109/CSCITA. 2017.8066533 7. Krishnamoorthy P, Jayanna HS, Prasanna SRM (2011) Speaker recognition under limited data condition by noise addition. Exp Syst Appl 38(10):13487–13490. https://doi.org/10.1016/j. eswa.2011.04.069 8. Sarma BD, Dey A, Lalhminghlui W, Gogoi P, Sarmah P, Prasanna SRM (2018) Robust Mizo digit recognition using data augmentation and tonal information. In: 9th international conference on speech prosody, pp 621–625. https://doi.org/10.21437/SpeechProsody.2018-126 9. Daniel P, Burget L, Agarwal M, Akyazi P, Feng K, Ghoshal A, Glembek O, Goel NK, Karafiát M, Rastrow A, Rose RC, Schwarz P, Thomas S (2010) Subspace Gaussian mixture models for speech recognition. In: IEEE international conference on acoustics, speech and signal processing (ICASSP), pp 4330–4333. https://doi.org/10.1109/ICASSP.2010.5495662 10. Tong R, Lim P, Chen NF, Ma B, Li H (2017) Subspace Gaussian mixture model for computerassisted language learning. In: IEEE international conference on acoustics, speech and signal processing (ICASSP), pp 5347–5351. https://doi.org/10.1109/ICASSP.2014.6854624 11. Guglani J, Mishra A (2018) Continuous Punjabi speech recognition model based on kaldi asr toolkit. Int J Speech Technol 21:211–216. https://doi.org/10.1007/s10772-018-9497-6 12. Syiem B, Dutta SK, Binong J, Singh LJ (2021) Comparison of Khasi speech representations with different spectral features and hidden Markov states. J Electron Sci Technol 19(12):1–7. https://doi.org/10.1016/j.jnlest.2020.100079 13. Syiem E (2014) Ka Ktien Nongkrem ha ki pdeng rngi lum ka ri lum Khasi, 19 14. Bareh S (2004) Khasi proverbs: analysing the ethnography of speaking folklore, Ph.D. thesis 15. Gamage B, Pushpananda R, Ruvan W, Thilini N (2020) Usage of combinational acoustic models (DNN-HMM and SGMM) and identifying the impact of language models in Sinhala speech Recognition. In: 20th international conference on advances in ICT for emerging regions (ICTer), pp 17–22. https://doi.org/10.1109/ICTer51097.2020.9325439 16. Rynjah F, Syiem B, Singh LJ (2022) Investigating Khasi speech recognition systems using a recurrent neural network-based language model. Int J Eng Trends Technol 70(7):269–274. 
https://doi.org/10.14445/22315381/IJETT-V70I7P227
Text and Language Independent Classification of Voice Calling Platforms Using Deep Learning Tapas Chakraborty, Rudrajit Bhattacharyya, Priti Shaw, Sourav Kumar, Md Mobbasher Ansari, Nibaran Das, Subhadip Basu, and Mita Nasipuri
Abstract Audio and video conferencing apps like Google Meet, Zoom, and mobile call conferencing are becoming more and more popular. Conferencing apps are used not only by professionals for remote work, but also for keeping up social relations. The present situation demands understanding these platforms in detail and extracting useful features to recognize them. Identification of conference call platforms will add value in forensic analysis. Our research focuses on collecting audio data using various conferencing apps. Audio data are collected in real-world situations, i.e., in noisy environments, where speakers spoke in a conversational style using multiple languages. After data collection, we have examined whether platform-specific properties are present in the audio files or not. Pre-trained deep learning models (DenseNet, ResNet) are used to extract features automatically from the audio files. High recognition accuracy (99%) clearly indicates that these audio files contain a significant amount of platform-specific information. Keywords Voice calling platforms · Audio conferencing · Google meet · Zoom · Discord · CNN · DenseNet · ResNet · Signal processing
T. Chakraborty (B) · R. Bhattacharyya · P. Shaw · S. Kumar · M. M. Ansari · N. Das · S. Basu · M. Nasipuri Jadavpur University, Kolkata 700 032, India e-mail: [email protected] N. Das e-mail: [email protected] S. Basu e-mail: [email protected] M. Nasipuri e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 N. Das et al. (eds.), Proceedings of International Conference on Data, Electronics and Computing, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-99-1509-5_7
1 Introduction During the pandemic, the use of audio and video conferencing apps increased significantly. Conferencing apps are used not only for remote work or distance education, but also for social relations. Zoom and Google Meet are more popular than the other conferencing apps; in this research, we have also considered a mobile call conferencing app and Discord. Zoom is video conferencing software developed by “Zoom Video Communications”. It offers several packages to its customers: the free plan allows conference calls with up to one hundred participants at a time and a maximum time limit of forty minutes, whereas the paid plan supports up to one thousand participants and calls lasting up to thirty hours. Google Meet (also called Hangouts Meet) is another video conferencing app, developed by Google. Some of the features that both Zoom and Google Meet offer are given below: (a) multi-way audio and video calls, (b) chat between participants, (c) joining through a web browser or through mobile apps, (d) screen-sharing, and (e) joining using dial-in numbers. Discord is another calling and messaging platform popular among gaming groups or friends who want to spend time together. Here users can communicate with each other using voice or video calls and text messaging, and can send media and files. Audio data analysis from various conference calling platforms requires a considerable amount of data. However, such data is not readily available, so we decided to collect the data first and then perform the analysis. In this experiment, audio data were collected using various conferencing apps. Speakers spoke in a conversational style using multiple languages and in real-world situations, i.e., in noisy environments. The audio data is recorded using one of the devices participating in the conference call. Our focus is to extract platform-specific features from the audio files. Pre-trained deep learning models have the capability to extract useful features automatically; hence, these models are used for this purpose. The audio data is first pre-processed to remove noise and non-voice parts as described in [2, 4]. Spectrograms are then generated from the processed audio data. We have used Librosa, a Python library [8], for audio data pre-processing and generating spectrograms. The spectrograms are given as input to a pre-trained model. During the training phase, the model is trained using audio data of known platforms; during testing, the model is tested with audio files from unknown platforms. A block diagram of the overall process is given in Fig. 1. The remaining part of this paper is organized as follows: the data collection is described in Sect. 2; Sect. 3 describes the audio pre-processing methodologies and the CNN architecture used for classification; and Sects. 4 and 5 present the experiments, results, and conclusions.
Fig. 1 Block diagram of the overall process
2 Data Collection Audio data was collected mainly from the following four apps: (a) mobile conference call, (b) Google Meet, (c) Zoom, and (d) Discord. Two or more participants talk to each other (a) in languages of their own choice, (b) in a conversational style where topics are not pre-defined, (c) in the real world, i.e., with environmental noises present, and (d) speaking at any time, which is why some overlapping conversations are present. A link to sample audio files is given in [3]. There are 100 such audio files, each of approximately 5 min duration. Initially, the conference calls were recorded using the device of one of the participants. Audio data have been used directly, while video data were converted into audio files. These audio files were then segmented into several 5 s audio clips using the Audacity software. During this conversion, non-voice regions were removed as much as possible. Figure 2 shows the platform-wise distribution of the audio data.
3 Methodology The first step of this experiment is to determine the input to be provided for classification. Heavily processed data should not be used as input to a deep learning model like a CNN, because the CNN needs to extract key features automatically; the natural input of the CNN would therefore be the raw audio signal. However, Dieleman et al. [5] experimentally showed that a CNN performs better when the inputs are spectrograms, so we have used spectrograms as input to the CNN. The audio signals are pre-processed first to remove silent and noisy parts, and then the spectrograms are generated.
Fig. 2 Distribution of conference call recordings
3.1 Pre-Emphasis A high-pass filter (HPF) is applied to the audio data to raise the amplitude of the high frequencies; this process removes lower-frequency noise as well. When F(t) denotes the audio signal, pre-emphasis is performed using the equation below [4]:

F(t) = F(t) − α · F(t − 1)    (1)

α is a parameter; usually a value of 0.97 is used for α.
3.2 Removal of Silence Frames Silence, i.e., the non-voice regions of the audio signal, was removed by applying STSA. The pre-emphasized signal is broken down into many short frames (20 ms window with an overlap of 10 ms). Silence frames were identified by comparing each frame's energy K_i with the average frame energy [1]:

K_{avg}(t) = \frac{\sum_i |f_i(t)|^2}{N}    (2)

where N is the total number of frames. If K_i > m · K_{avg}, that specific frame is treated as a voiced region; otherwise, it is considered a silence frame and is removed. Here, m is a parameter, set to 0.2.
71
Fig. 3 Mel-spectrogram generated from Zoom, Google meet, mobile conference, and discord, respectively
3.3 Audio Spectrogram Generation Spectrograms have been produced from the pre-processed audio data. Figure 3 shows various Spectrograms generated from Zoom, Google meet, Mobile conference, and Discord, respectively. Below Spectrograms indicates that they contain significant amount of audio information and further audio processing is not required.
3.4 Model Architecture Convolutional neural network (CNN) is a type of deep neural networks. When CNN is used for classification, less pre-processing is required. This indicates that CNN can learn key features from the data automatically.
72
T. Chakraborty et al.
Fig. 4 DenseNet model architecture
Rather than building a CNN from scratch, models developed for other tasks can be used for our purpose. This technique is called Transfer learning, which is becoming more and more popular now a days. In this study, we have used two variations of such CNN models, DenseNet-201 [7] and Resnet-50 [6]. Figure 3 shows architecture of the model that we have used in this paper (Fig. 4).
4 Experiment and Results 4.1 Data Preparation Audio data collected from various voice conferencing apps are the source for this experiment. Hundred files are the source for this experiment, as described in Data collection section. These hundred files are sub-divided into 5 sec audio files and Spectrograms are generated from them. CNN usually requires balanced training data. However, the data we collected, does not have balanced data. Some platforms have extremely low data, that makes this experiment more challenging. Three fold cross validation method is applied on this data and recognition accuracy figures are noted.
4.2 Model Training DenseNet-201 and ResNet-50 are implemented using tensor flow. Training data is of the form (X i , Yi ) where X i is input spectrogram info for i th platform of shape 3 × 224 × 224 and Yi is the input label for i th platform. Our target is to reduce the overall loss w.r.t. all platforms.
The DenseNet-201 model has been used with the SGD optimizer and the categorical cross-entropy loss function. The model is trained on the training data set and validated on the validation data set. Accuracy is measured for up to 150 epochs, and the best model is determined by the highest accuracy.
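A minimal Keras sketch of this transfer-learning setup (DenseNet-201 backbone, softmax head over the four platforms, SGD with categorical cross-entropy) is given below; the learning rate, pooling choice, and input pipeline are assumptions.

```python
import tensorflow as tf

base = tf.keras.applications.DenseNet201(include_top=False, weights="imagenet",
                                          input_shape=(224, 224, 3), pooling="avg")
model = tf.keras.Sequential([
    base,
    tf.keras.layers.Dense(4, activation="softmax"),   # Zoom, Google Meet, Mobile conference, Discord
])
model.compile(optimizer=tf.keras.optimizers.SGD(learning_rate=0.01),
              loss="categorical_crossentropy", metrics=["accuracy"])

# model.fit(train_spectrograms, train_labels,
#           validation_data=(val_spectrograms, val_labels), epochs=150)   # placeholder arrays
```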
4.3 Classification by Using Deep Learning Models Suppose there are n platforms, S = {1, 2, . . . , n}. The output layer of the DenseNet has n nodes, one for each platform. An unknown audio sample is given to the model, which generates a vector of n scores; the ith score signifies the chance of that audio file belonging to the ith platform, and the highest score is taken. The formula for this process is given below:

\hat{c} = \arg\max_{k \in S} p(x_k)    (3)

In this case, \hat{c} is the classified platform, and the ith platform's score p(x_i) is given by the softmax formula:

p(x_i) = \frac{e^{x_i}}{\sum_{k=1}^{n} e^{x_k}}    (4)
Number of platforms correctly classified Total number of platforms
∗ 100
(5)
Performance of this model was evaluated using confusion matrix.
4.5 Results Experiment is performed on the audio data collected from four platforms. Platform wise and overall accuracy figures have been reported below Table 1. Figure 5 is the confusion matrix, which clearly indicates that very few samples are incorrectly classified.
Table 1 Overall accuracy and platform-wise break up

Platform | DenseNet accuracy | ResNet accuracy
Zoom | 99.76 | 99.92
Google meet | 100 | 100
Mobile conference | 100 | 97.36
Discord | 92.30 | 92.30
Overall | 99.72 | 99.73
Fig. 5 Confusion matrix
5 Conclusion The main contribution of this research is to verify whether audio files from various conferencing platforms contain platform-specific information or not. Data have been collected from various conferencing platforms frequently used today, and standard approaches have been followed to classify the audio files. The high recognition accuracy indicates that audio files from various conferencing platforms retain platform-specific information. Recognition of the voice calling platform used for a conference call will add value in forensic analysis. In future, we are planning to address the class imbalance problem by collecting more audio data. Data will also be collected from other conferencing platforms like Skype, WhatsApp, Facebook Messenger, Microsoft Teams, and Cisco Webex. Acknowledgement: This project is partially supported by the CMATER laboratory of the Computer Science and Engineering Department, Jadavpur University, India.
References 1. Barai B, Das D, Das N, Basu S, Nasipuri M (2018) Closed-set text-independent automatic speaker recognition system using VQ/GMM. In: Intelligent engineering informatics. Springer, pp 337–346 2. Barai B, Chakraborty T, Das N, Basu S, Nasipuri M (2022) Closed-set speaker identification using VQ and GMM based models. Int J Speech Technol 3. Chakraborty T (2021) Audio files recorded using different voice calling platforms. Figshare. media. https://doi.org/10.6084/m9.figshare.14731629.v1 4. Chakraborty T, Barai B, Chatterjee B, Das N, Basu S, Nasipuri M (2020) Closed-set deviceindependent speaker identification using CNN. In: Bhateja V, Satapathy SC, Zhang YD, Aradhya VNM (eds) Intelligent computing and communication. Springer Singapore, Singapore, pp 291– 299 5. Dieleman S, Schrauwen B (2014) End-to-end learning for music audio. In: 2014 IEEE international conference on acoustics, speech and signal processing (ICASSP). IEEE, pp 6964–6968 6. He K, Zhang X, Ren S, Sun J (2016) Deep residual learning for image recognition. In: 2016 IEEE conference on computer vision and pattern recognition (CVPR), pp 770–778 7. Huang G, Liu Z, Van Der Maaten L, Weinberger KQ (2017) Densely connected convolutional networks. In: 2017 IEEE conference on computer vision and pattern recognition (CVPR), pp 2261–2269 8. McFee B, Raffel C, Liang D, Ellis DP, McVicar M, Battenberg E, Nieto O (2015) Librosa: audio and music signal analysis in python
An Application of Anomaly Detection to Financial Fraud Using Machine Learning Udochukwu Okoro, Usman Ahmad Baba, and Sandip Rakshit
Abstract With the increasing rate of adoption of financial technologies otherwise known as fintech applications, financial fraud has also morphed and infiltrated the financial systems we have today. Financial transactions in today’s world are divided into card-present and card-not-present transactions. Card-not-present transactions are especially prone to these fraudulent attacks. However, as fraudulent transactions become more undetectable with the increase in the skill set of the perpetrators, it is important to identify a model capable of detecting these fraudulent cases accurately. This paper utilized supervised learning algorithms such as support vector machine algorithm, logistic regression, and Naïve Bayes algorithms to classify clean transactions and fraudulent transactions. A model for identifying fraudulent transactions was derived. It was identified that the SVM approach was the most efficient with an accuracy of 99.5%, a recall of 1, and a precision of 0.5. This is essential to improve our means of tracking and identifying fraudulent transactions as they occur in the financial system. Keywords Fintech · Card-not-present fraud · Support vector machine · Financial fraud · Logistic regression
U. Okoro (B) · S. Rakshit American University of Nigeria, Yola, Nigeria e-mail: [email protected] S. Rakshit e-mail: [email protected] U. A. Baba Pen Resource University, Gombe, Nigeria e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 N. Das et al. (eds.), Proceedings of International Conference on Data, Electronics and Computing, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-99-1509-5_8
1 Introduction The global pandemic changed the financial space. It became harder to identify the data to highlight fraudulent financial transactions. Two types of financial transactions exist in today’s fintech sector. They are card-present banking and card-not-present [1, 2]. Card-present transactions refer to those transactions that require the physical presence of the customer’s bank card. These are commonly used in Automated Teller Machines and Point of Sale Machines. In most cases, the only layers of security in the card-present transactions are the physical bank card and a four- or six-digit pin that the user is supposed to keep secret. In other cases, such as the contactless payment methods, the customer does not even need to input a pin, and they can simply place their card close to the reader and complete the transaction [1]. Card-not-present transactions which are the focus of this study do not require a physical banking card to be present for a transaction to take place [2]. It uses other forms of authentication such as secret codes, one-time passwords, or other forms of authentication [2, 3]. These transactions usually require the set of numbers on the card; this includes the card number, date of expiry, and the card verification value (CVV) [2]. It will then require the user to input a code that will be sent to the user’s email or phone number [2, 3]. These types of transactions are more fraud-prone due to the remoteness and isolative nature of the transaction [3]. Online banking is also a means of card-not-present transactions in which money can be transferred between parties seamlessly with the click of a button [3]. The layer of security commonly found in this method of transaction is usually the provision of a four or six-digit pin that the user will provide to the bank upon activation of his or her online banking portal [3]. This code can easily be intercepted and hijacked by fraudsters, and this puts this means of transacting money at the risk of fraudulent practices. However, some characteristics exist in the case of fraudulent transactions that set them apart from regular transactions [2]. Both online banking and online card transactions appear to have these characteristics that machine learning algorithms can detect to differentiate between fraudulent and not-fraudulent transactions [4]. As a result, this paper will be exploring the characteristics of a fraudulent transaction in an attempt to map out a definition that financial systems can utilize in the identification, flagging, and termination of fraudulent transactions/anomalies in financial systems [2, 3, 5]. This paper will compare the performance of Naïve Bayes, support vector machine, and logistic regression models in identifying the legitimacy and credibility of transactions carried out in a financial system [4]. Major contributing literatures will be reviewed, and an explanation of the methodology will be provided followed by the discussion of results, conclusion, and then recommendation.
2 Literature Review The application of machine learning to financial fraud detection has been explored by various authors using different approaches. In 2021, Roy et al. explored the detection of credit card fraud with the use of various classification algorithms such as isolation forest, support vector machine, and logistic regression [5]. They found that the logistic regression algorithm was very effective in classifying the data, returning a 99.8% accuracy, the highest among all the algorithms used [5]. Dankhad et al. also utilized logistic regression, random forest, and XGB classifiers to classify fraudulent and non-fraudulent credit card transactions [6] and found that logistic regression worked exceptionally well in comparison with the other machine learning algorithms used for this classification task [6]. Chen et al. conducted a study in 2020 that focused on modeling anti-fraud technology with the use of machine learning techniques [7]. They utilized the AdaBoost algorithm, a highly adaptive combined classification algorithm originally proposed by Freund et al. [7]. An analysis of the performance of several machine learning techniques was carried out by Sadgali et al. in [8], where about ten classification algorithms were compared side by side in terms of performance; the logistic regression algorithm was determined to return the highest accuracy [8]. Jierula et al. established in 2021 that different accuracy metrics are appropriate for different scenarios [9]. They tested seven accuracy metrics, namely R, R², MSE, RMSE, MAE, MAPE, and SMAPE, on cases of imbalanced data and showed that there is no universal accuracy metric that always gives the best representation of model validity [9].
3 Methodology 3.1 Data Sourcing and Compilation Financial datasets for fraud detection are mostly synthetic in nature due to security and financial privacy concerns. A synthetic dataset was created to aid researchers in the modeling and testing of financial fraud detection models, and it is publicly available on Kaggle [10]. The data contains features such as the origin account and its balance, the recipient account and its balance, and the amount sent. These features are essential in depicting the true nature and circumstances of each transaction. The data was then cleaned by replacing all null values with the means of the columns they fell into. The get_dummies method was also used to encode the textual content of the dataset numerically to ensure accurate model generation. The data was then split (7:3) into training and validation datasets.
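As a rough illustration, the data preparation described above can be sketched as follows; the file name and the column names ('type', 'isFraud', 'nameOrig', 'nameDest') are assumptions taken from the public Kaggle dataset description rather than details stated in this paper.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Load the synthetic PaySim-style dataset (local file name assumed).
df = pd.read_csv("paysim.csv")

# Impute null values in numeric columns with the column means.
numeric_cols = df.select_dtypes(include="number").columns
df[numeric_cols] = df[numeric_cols].fillna(df[numeric_cols].mean())

# Drop identifier-like columns and one-hot encode the categorical 'type' column.
df = df.drop(columns=["nameOrig", "nameDest"], errors="ignore")
df = pd.get_dummies(df, columns=["type"])

# 7:3 split into training and validation sets.
X = df.drop(columns=["isFraud"])
y = df["isFraud"]
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y)
```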
Fig. 1 Histogram showing outcome distribution
In Fig. 1, the distribution of the dataset is shown, where 0 on the x-axis represents legitimate transactions and 1 represents illegitimate transactions. As in real-world scenarios, fraudulent transactions are greatly outnumbered by legitimate ones.
3.1.1 Support Vector Machine
Classification algorithms utilize labeled data to establish the model through which the test data will be passed in order to be classified. The support vector machine is a widely used machine learning algorithm that can be applied to the task of classifying data points [11, 12]. It uses the concept of a separating line, called the hyperplane, to split the data points into two groups [12]. The model is only as efficient as the optimality of this hyperplane [11]. The data points closest to the hyperplane are used to optimize it [11]; these points are referred to as the support vectors, and the goal of the algorithm is to find the widest margin between the two classes [12]. These support vectors are then used to classify the rest of the data points. The equation of a hyperplane is derived from two-dimensional vectors and the equation of a line; where w is the slope of the line and b is the intercept, we have

w · x + b = 0    (1)
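The following small sketch (toy data, not the study's dataset) illustrates Eq. (1): after fitting a linear SVM with scikit-learn, the learned weight vector w and intercept b define the hyperplane, and the sign of w · x + b decides the class.

```python
import numpy as np
from sklearn.svm import SVC

# Toy two-class data with two features per sample.
X = np.array([[0.1, 0.2], [0.2, 0.1], [0.9, 0.8], [0.8, 0.9]])
y = np.array([0, 0, 1, 1])

clf = SVC(kernel="linear").fit(X, y)
w = clf.coef_[0]        # normal vector (slope) of the hyperplane
b = clf.intercept_[0]   # intercept

# Decision rule corresponding to Eq. (1): sign(w . x + b).
x_new = np.array([0.15, 0.15])
print(np.sign(np.dot(w, x_new) + b))
```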
3.1.2 Logistic Regression
Logistic regression is used to predict the probability of a categorical dependent variable, where this variable is binary and contains only 1's and 0's [13].

P(Y = 1) = f(x)    (2)
Logistic regression comes in three types: binomial logistic regression, which is used to predict a categorical output with only two possible outcomes, for example, 'spam' or 'not spam' in a mailbox [14]; multinomial logistic regression, which is used to classify categorical variables with three or more possible outcomes on the same level, for example, food types such as carbohydrates, proteins, and so on [13, 14]; and ordinal logistic regression, which is used to classify categorical variables that have an order of hierarchy, for example, reviews and ratings on a scale of 1–10 [13].
3.1.3 Gaussian Naïve Bayes
The Naïve Bayes classification algorithm is based on Bayes' theorem, and it assumes total independence among the predictors; this means that the algorithm works on the premise that a particular attribute in a predictor is independent of the existence of other attributes [15, 16]. Bayes' theorem is concerned with determining the probability of an event given another event that has already occurred [17]. Naïve Bayes thus works on the assumption that the features of the classes are independent. Mathematically, we can represent this as

P(A|B) = P(B|A) P(A) / P(B)    (3)
3.2 Metrics and Evaluation The generated models will be evaluated using three metrics: accuracy, precision, and recall. Accuracy measures the overall percentage of correct predictions made by the model. The precision metric reflects how many of the transactions flagged as fraudulent are truly fraudulent, i.e., how many false positives occur; this is very important in anomaly detection, as false positives trigger false alarms and reduce model usefulness [18]. The recall metric captures how many of the true positives have been identified by the model; in fraud detection, it is important because
if a fraudulent transaction is flagged as not-fraudulent, it can be detrimental to the effectiveness of the model [18].
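A hedged sketch of the comparison set up above is given below, training the three classifiers and reporting accuracy, precision, and recall with scikit-learn; synthetic stand-in data is used here, and the default hyper-parameters are assumptions that may differ from the study's settings.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score, precision_score, recall_score

# Imbalanced stand-in data (the actual study uses the Kaggle dataset of Sect. 3.1).
X, y = make_classification(n_samples=5000, n_features=10,
                           weights=[0.99, 0.01], random_state=0)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y)

models = {
    "Logistic regression": LogisticRegression(max_iter=1000),
    "Naive Bayes": GaussianNB(),
    "Support vector machine": SVC(),
}

for name, model in models.items():
    y_pred = model.fit(X_train, y_train).predict(X_val)
    print(f"{name}: accuracy={accuracy_score(y_val, y_pred):.4f}, "
          f"precision={precision_score(y_val, y_pred, zero_division=0):.2f}, "
          f"recall={recall_score(y_val, y_pred, zero_division=0):.2f}")
```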
4 Results and Discussions The results of the study, as shown in Table 1, identified the support vector machine as the algorithm that produced the most accurate results. This algorithm scored an accuracy of 99.5%, a recall score of 1.0, and a precision score of 0.5 (Fig. 2). The logistic regression algorithm also recorded a high performance with an accuracy score of 99.0%, which proved it to be accurate in carrying out this classification task; however, it recorded two false positive cases. The Naïve Bayes algorithm was the worst-performing algorithm, with an accuracy of 98% and four false negative cases, the lowest accuracy score among the three algorithms used. It is possible that the Naïve Bayes algorithm scored the lowest due to its independence assumption over the features of the training dataset; in this case, it was important for the algorithm to treat the features as dependent in order to capture the connections needed for accurate classification in the testing phase. The results tallied with Roy et al.'s findings that SVM is a good approach to detecting anomalies in card-not-present fraud identification.

Table 1 Display of metrics of the model

               Logistic regression   Naïve Bayes   Support vector machine
Accuracy (%)   99                    98.0          99.5
Recall         0.0                   0.0           1
Precision      0.0                   0.0           0.5
Fig. 2 Confusion matrix for SVM, Naïve Bayes, and logistic regression
5 Conclusion Card-not-present transactions have become a gateway for fraudulent practices in the financial industry [2]. Due to the low level of verification and security attached to these methods, they have become a haven for fraudulent individuals [3]. However, with the use of machine learning algorithms, it is possible to develop a model capable of identifying unseen transactions and tagging them as potentially fraudulent or not-fraudulent [4]. A comparative analysis of the Naïve Bayes, logistic regression, and support vector machine algorithms applied to the financial fraud dataset generated accurate models through which the fraudulent or not-fraudulent nature of a card-not-present transaction could be established [5]. Unseen data points were passed into the generated models to validate their efficiency. The validation analysis found that the support vector machine algorithm was very efficient in classifying data related to financial fraud cases in comparison with the logistic regression and Naïve Bayes approaches. Although the logistic regression algorithm also performed well, the dataset worked most effectively with the SVM approach.
6 Recommendations Future studies can explore the utilization of these machine learning algorithms with datasets generated by financial institutions. There is a need for these institutions to make such datasets available so that more accurate real-world models can be generated and deployed to combat financial fraud effectively. The exploration of deep learning techniques for this task is also encouraged, as the precise nature of deep learning promises a more specific and accurate model when provided with an expansive dataset.
References 1. Parusheva S (2015) Card-not-present fraud - challenges and counteractions. In: Narodnostopanski Arkhiv. Varna 2. Akers M, Bellovary J (2006) What is fraud and who is responsible? Accounting faculty research and publications 3. Ionela-Corina C (2017) How to prevent fraud? In: CES working papers, vol 1 4. Kondratev I, Bazanov V, Uskov D, Kuchebo A, Sereda T (2021) Comparative analysis of methods for detecting fraudulent transactions. In: IEEE conference of Russian young researchers in electrical and electronic engineering (ElConRus). St. Petersburg and Moscow 5. Roy P, Rao P, Gajre J, Katake K, Jagtap A, Gajmal Y (2021) Comprehensive analysis for fraud detection of credit card through machine learning. In: International conference on emerging smart computing and informatics (ESCI). Pune
6. Dankhad S, Mohammed E, Far B (2018) Supervised machine learning algorithms for credit card fraudulent transaction detection: a comparative study. In: IEEE international conference on information reuse and integration for data science. Salt Lake City 7. Chen Y, Wang D (2020) Research on anti-financial fraud technology based on machine learning. In: 2nd international conference on information technology and computer application (ITCA). Guangzhou 8. Sadgali ISN, Benabbou F (2018) Performance of machine learning techniques in the detection of financial frauds. In: Second international conference on intelligent computing in data sciences. Casablanca 9. Jierula A, Wang S, Oh T-M, Wang P (2021) Study on accuracy metrics for evaluating the predictions of damage locations in deep piles using artificial neural networks with acoustic emission data. Appl Sci 11(2314) 10. Lopez-Rojas E (2017) Synthetic financial datasets for fraud detection. Kaggle, 2017. [Online]. Available: https://www.kaggle.com/datasets/ealaxi/paysim1/metadata. Accessed 22 May 2022 11. Wu Y-X, Guo L, Li Y, Shen X-Q, Yan WI (2006) Multi-layer support vector machine and its application. In: International conference on machine learning and cybernetics. Dalian 12. Mohan L, Pant J, Suyal P, Kumar A (2020) Support vector machine accuracy improvement with classification. In: 2020 12th international conference on computational intelligence and communication networks (CICN). Bhimtal 13. Zou X, Hu Y, Tian Z, Shen K (2019) Logistic regression model optimization and case analysis. In: IEEE 7th international conference on computer science and network technology (ICCSNT). Dalian 14. Pavlyshenko S (2016) Machine learning, linear and Bayesian models for logistic regression in failure detection problems. In: IEEE international conference on big data (big data). Washington DC 15. Ji Y, Yu S, Zhang Y (2011) A novel Naive Bayes model: packaged hidden Naive Bayes. In: 6th IEEE joint international information technology and artificial intelligence conference. Chongqing 16. Ma TM, Yamamori K, Thida A (2020) A comparative approach to Naïve Bayes classifier and support vector machine for email spam classification. In: IEEE 9th global conference on consumer electronics (GCCE). Kobe 17. Hairani H (2021) The abstract of thesis classifier by using Naive Bayes method. In: International conference on software engineering and computer systems and 4th international conference on computational science and information management (ICSECS-ICOCSIM). Pekan 18. Powers D, AILab (2011) Evaluation: from precision, recall and F-measure to ROC, informedness, markedness and correlation. J Mach Learn Technol 2
A Pre-processing-Aided Deep Transfer Learning Model for Human Object Detection in Crowd Scenarios Soma Hazra, Sunirmal Khatua, and Banani Saha
Abstract Human object detection in crowd scenarios is crucial for crime prevention and overall monitoring; however, it is a very challenging task. Typically, a wide range of surveillance systems operates in real-time applications, and the quality of the captured real-time footage is not comparable across cameras in terms of resolution, size, and orientation. Thus, improving the quality of those videos/images is necessary to detect objects accurately. In this work, we employ a pre-processing-aided deep learning technique for detecting human objects in images of crowd scenarios. First, we enhance the image quality using a variety of pre-processing approaches. Then, objects are detected in the pre-processed images of crowded scenarios using several pre-trained deep learning (DL) models. The efficacy of the proposed framework has been analyzed on the standard WIDER-FACE dataset with respect to various performance measures, both with and without pre-processing of images. From the experimental results, it is found that the use of pre-processing on top of deep learning models performs significantly better than without pre-processing. Keywords Object detection · Pre-processing · Transfer learning · Crowded scenarios · Deep learning
1 Introduction Object detection and object recognition have always been considered very important tasks for any crowd surveillance system. There is a wide variety of surveillance systems operating in real-time applications. The qualities of capturing the real-time footage of the cameras are not equal. They can differ with respect to image resolution, image size, image orientation, etc. Thus, successful object detection in this variable S. Hazra (B) Sister Nivedita University, Kolkata, India e-mail: [email protected] S. Khatua · B. Saha University of Calcutta, Kolkata, India © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 N. Das et al. (eds.), Proceedings of International Conference on Data, Electronics and Computing, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-99-1509-5_9
environment is quite a difficult task. Nowadays, many deep learning-based techniques are available to detect human objects in a real-time crowded environment. Although the performance of those algorithms can differ with variant backgrounds and camera resolutions, therefore, object detection in real-time videos is a classic problem that has received significant attention in recent years. With recent advances in data-driven deep learning (DL), various DL models such as convolutional neural networks (CNN) are being widely employed to detect objects in real-time videos or images [1]. Typically, CNN-based object detection has been focused mainly on two strategies: two-stage detection and one-stage detection. Twostage method is referred to as a region-based technique. First, region proposals are generated from the given scene. After that, candidate frames are verified and objects are detected through regression and classification. Thus, it requires more complex training and performs slower to generate the desired outcome. One-stage detection technique can overcome this. There is no region proposal that is employed in onestage detection. The input image can be uniformly sampled to extract features for classification and regression appropriately. Over the past years, researchers have proposed several handcrafted feature-based and DL-based solutions to the problem of recognizing objects in real-time videos [1]. The researchers used Histogram of Oriented Gradients (HOG) to extract human characteristics in a study by Dalal and Triggs [2]. SURF [3], LBP [4], Edgelet [5], Haar-like [6], Shapelet [7], and NNPACK [8] among the other characteristics are used to extract human features. Aside from the SVM approach, two other methods for creating the human classifier are Naive Bayesian [9] and AdaBoost [10]. Besides, lightweight CNN [11], R-CNN [12], Faster R-CNN [13], Mask R-CNN [14], YOLO [15], and SDD [16] are well-known deep learning-based object identification algorithms, according to research. In our work, we have used one two-stage and two one-stage deep learning models for object detection in the frames of crowded scenarios. First, input image frames are scaled, normalized, and augmented as a pre-processing task. Then, pre-processed image frames are given as input to the deep learning models for object detection. The need of pre-processing to improve the performance of deep learning models has been stressed in this method. To assess the method, we have used WIDER-FACE database and various performance indicators. Also, we have tested the method for both the pre-processed and without pre-processed image frames. It has been observed that pre-processed frames can accelerate the performance of deep learning models over a diverse variety of crowded environments. The remainder of the paper is structured in the following manner. Section 2 describes the proposed methodology in detail. Detailed experimentation and results are discussed in Sect. 3. Section 4 concludes the paper with future scope.
2 Proposed Methodology This section presents our proposed method for object detection in crowded scenarios. The proposed method starts with pre-processing of frames followed by object detection using the deep learning technique. Figure 1 shows the flow of the proposed method.
2.1 Pre-processing In the context of real-time object detection scenarios, captured video quality can vary widely. The efficacy of object detection systems can be affected when the obtained images have varying sizes, non-homogeneous resolution, and occlusions. Thus, some pre-processing of the input videos is required in order to detect objects efficiently. The pre-processing techniques used in this work include scaling and normalization. Scaling: In most cases, the size of the captured video frames varies due to camera quality and background, which may have an impact on the subsequent phases of the object detection process. Thus, the input video frames are scaled in this approach using the bilinear interpolation method [17]; all video frames are scaled to 640 × 640. Figure 2 shows sample frames and the corresponding scaled frames. Normalization: The resolution of captured video frames also varies in most circumstances owing to camera quality and background, which could likewise affect the subsequent stages of the object detection process. Hence, an image normalization technique, as used in [18], is applied here to standardize the range of pixel values in the scaled video frames so that they can be converted to a unified resolution. In this method [18], a min–max algorithm is used to normalize the pixel values of the scaled video frames. Sample frames and the corresponding normalized frames are shown in Fig. 2.
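A brief sketch of these two steps is shown below, using OpenCV's bilinear interpolation for scaling and a min–max rule for normalization; only the 640 × 640 target size comes from the text, while the input file name and the exact normalization constants are illustrative assumptions.

```python
import cv2
import numpy as np

def preprocess_frame(frame, size=(640, 640)):
    # Scale the frame to a fixed size using bilinear interpolation.
    scaled = cv2.resize(frame, size, interpolation=cv2.INTER_LINEAR)

    # Min-max normalization of pixel intensities to the range [0, 1].
    scaled = scaled.astype(np.float32)
    lo, hi = scaled.min(), scaled.max()
    return (scaled - lo) / (hi - lo + 1e-8)

# Example: read and pre-process one frame of a surveillance video (path assumed).
cap = cv2.VideoCapture("crowd.mp4")
ok, frame = cap.read()
if ok:
    frame = preprocess_frame(frame)
cap.release()
```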
Fig. 1 Flow diagram of the proposed method
Fig. 2 a Original video frames, and b frames after applying scaling and normalization
2.2 Data Augmentation Stronger deep learning models can be created by utilizing larger, higher-quality training datasets through a process called data augmentation. Variable object position, partial or full occlusions, and a restricted dataset are all regular issues when dealing with real-time crowded situations. As a result, we employed data augmentation techniques in the object detection models to mitigate the effects of all of the aforementioned problems. The MixUp [19] approach is utilized in our work for data augmentation. MixUp randomly selects two samples from the training images and applies a weighted summation to them; the labels of the samples are combined with the same weights. Two images are randomly cropped and randomly flipped horizontally, and these images are then combined by taking the weighted average of the pixel values for each of the RGB channels. The new image is assigned the correspondingly mixed label.
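A compact numpy sketch of the MixUp step described above is given below; the Beta-distribution parameter is an assumption in line with the original MixUp formulation and is not stated in this paper.

```python
import numpy as np

def mixup(images, labels, alpha=0.2):
    """Return a MixUp-augmented batch of the same size."""
    lam = np.random.beta(alpha, alpha)          # mixing weight
    idx = np.random.permutation(len(images))    # random pairing of samples
    mixed_images = lam * images + (1.0 - lam) * images[idx]
    mixed_labels = lam * labels + (1.0 - lam) * labels[idx]
    return mixed_images, mixed_labels

# Usage with a dummy batch of 8 RGB frames and one-hot labels for 2 classes.
imgs = np.random.rand(8, 640, 640, 3).astype(np.float32)
lbls = np.eye(2)[np.random.randint(0, 2, size=8)].astype(np.float32)
aug_imgs, aug_lbls = mixup(imgs, lbls)
```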
2.3 Object Detection After pre-processing of the input videos, our next task is to detect objects in the given videos. In the field of object detection, DL-based algorithms have emerged strongly in recent years. Here, we have employed one two-stage and two one-stage pre-trained CNN models to detect human faces in the pre-processed images efficiently. These models are Faster R-CNN [20], YOLO-v3 [21], and YOLO-v5 [22].
Faster R-CNN: The Faster R-CNN [20] is a deep convolutional network that appears to the user as a single, end-to-end, unified network for object detection. This model can predict the positions of various objects with high accuracy and speed. It distributes computations across all regions of interest (RoIs) rather than performing them separately for each proposal; the RoI pooling layer is responsible for making this network faster. We employed the pre-trained Faster R-CNN with the Inception-V2 model [23] in our work for the detection of human faces in the given pre-processed videos. Inception-V2 is a module that aims to make convolutional networks less complicated. A region proposal network has been added to this model to generate proposals with different dimensions and aspect ratios, and an RoI pooling layer is used to extract a fixed-length feature vector from each of the image's region proposals. Instead of pyramids of pictures, this approach introduced the idea of anchor boxes: an anchor box is a reference box with a predetermined scale and aspect ratio, which can be thought of as a pyramid of reference anchor boxes. Each region is then mapped to each reference anchor box, allowing for the detection of objects of various sizes and aspect ratios. Details of this model can be found in [23]. YOLO-v3: YOLO-v3 [21] is a major improvement in the YOLO family, representing a leap forward in a number of areas, particularly accuracy and speed. The following are some of the most significant changes in YOLO-v3: (i) it consists of 106 layers with 75 convolutional layers, in contrast to v1 and v2; (ii) it employs residual layers at regular intervals to address the vanishing gradient problem; (iii) to produce predictions at different scales, it uses feature pyramid networks (FPNs); and (iv) it utilizes the DarkNet [24] architecture as a feature extractor. YOLO-v3 uses logistic regression to calculate an objectness score, and a logistic classifier is used in this model instead of a Softmax function for class prediction. Using feature pyramid networks, YOLO-v3 can detect objects at several scales. YOLO-v5: YOLO-v5 [22] is the most well-known and efficient one-stage detector. The following enhancements are employed in this version: (i) the model is enhanced with a focus structure and a CSP network [25] as the backbone; (ii) PANet [26] is utilized as the neck in YOLO-v5 to obtain feature pyramids, which aid models in achieving good object scaling generalization; and (iii) the model head creates the final output vectors with class probabilities, objectness scores, and bounding boxes after anchor boxes are applied to the features. In YOLO-v5, the middle/hidden layers use the leaky ReLU activation function, whereas the final detection layer uses the sigmoid activation function. In this work, the SGD optimizer is used during the training of the network.
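As one concrete point of reference, a pre-trained YOLO-v5 model can be obtained through PyTorch Hub and run on a pre-processed frame as sketched below; this is generic inference with the publicly released weights, not the fine-tuned face detector used in this work, and the image path is illustrative.

```python
import torch

# Load a small pre-trained YOLO-v5 model from the Ultralytics repository.
model = torch.hub.load("ultralytics/yolov5", "yolov5s", pretrained=True)

# Run detection on an image; results hold boxes, confidences, and class ids.
results = model("crowd_frame.jpg")
detections = results.xyxy[0]   # tensor of [x1, y1, x2, y2, confidence, class]
print(detections)
```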
2.4 Transfer Learning Transfer learning is a machine learning technique that allows a model developed for one task to be used for another. Generally, CNNs need a lot of data both for training and for the trained model to be generalizable. There is a chance of over-fitting on the
smaller dataset, where the model tends to memorize the training data and the corresponding outputs; this is especially true of deeper and more complex models. For this reason, CNNs are not often trained from scratch. In this work, we consider three standard pre-trained CNN models, namely Faster R-CNN, YOLO-v3, and YOLO-v5, to detect objects in images of crowd scenarios. Since our approach targets the detection of human faces alone, we have utilized the strategy of transfer learning to fine-tune the aforementioned models on the experimental dataset consisting of human faces in different scenarios. The output layer of the models outlined above uses the Softmax activation function to normalize the input values into a vector of values that follow a probability distribution and sum to 1. The Softmax binary classifier employs a binary cross-entropy loss function to update the aforesaid models during training with the goal of minimizing the loss.
3 Experimental Results The proposed method has been evaluated on the WIDER-FACE dataset [27], a face detection benchmark dataset. The database consists of 32,203 images with 393,703 labeled faces exhibiting a high degree of variability in scale, pose, and occlusion. The WIDER-FACE dataset is organized into 61 event classes, and for each event class we have selected 70% of the data as the training set and 30% as the testing set. The network is trained for 20 epochs on each training set with an initial learning rate of 10^-4 and a momentum decay rate of 0.9997. The network is trained using a 3.5 GHz AMD Ryzen 3 1300 quad-core processor with 128 GB memory and an Nvidia GeForce GTX 1060 6 GB GPU, and the TensorFlow framework is used to run the networks in Python. Three performance indicators, namely precision, recall, and mAP at an IoU threshold of 0.5 [28], are used to evaluate the efficacy of the proposed method. Each of the deep learning models has been tested on both pre-processed and non-pre-processed images. On our test datasets, we also evaluated additional threshold values such as 0.7, 0.8, and 0.85; however, the object detection performance degraded in each of these cases, and the proposed model had the maximum accuracy at a threshold of 0.6. Table 1 illustrates the performance comparison of Faster R-CNN with and without pre-processed images; it can be seen that Faster R-CNN with pre-processed images performed significantly better than without pre-processing. Similarly, Tables 2 and 3 show the performance of YOLO-v3 and YOLO-v5 for pre-processed and non-pre-processed images, respectively; both models performed significantly better with pre-processed images. Hence, it can be concluded that appropriate pre-processing of images/frames is a very significant step for object detection using CNN models. Finally, in Table 4, we summarize the performance of the different CNN models under consideration for object detection in crowd scenarios; it can be seen that YOLO-v5 performs significantly better than the other related methods.
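Since mAP counts a detection as correct only when its intersection-over-union (IoU) with a ground-truth box exceeds the chosen threshold, a small helper for that overlap computation is sketched below.

```python
def iou(box_a, box_b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    x2, y2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])

    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter + 1e-8)

# A predicted face box counts as a true positive when IoU >= 0.5.
print(iou((10, 10, 60, 60), (20, 20, 70, 70)))
```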
Table 1 Performance of Faster R-CNN with and without pre-processed images

Method: Faster R-CNN      Precision (%)   Recall (%)   mAP (%)
Without pre-processing    81.1            82.4         84.76
With pre-processing       82.4            83.9         86.8
Table 2 Performance of YOLO-v3 with and without pre-processed images

Method: YOLO-v3           Precision (%)   Recall (%)   mAP (%)
Without pre-processing    82.45           83.8         85.6
With pre-processing       86.4            87.12        91.1
Table 3 Performance of YOLO-v5 with and without pre-processed images

Method: YOLO-v5           Precision (%)   Recall (%)   mAP (%)
Without pre-processing    84.2            85.4         94.76
With pre-processing       91.4            89.9         96.8
Table 4 Performance comparison of different CNN models

Method          Precision (%)   Recall (%)   mAP (%)
Faster R-CNN    82.4            83.9         86.8
YOLO-v3         86.4            87.12        91.1
YOLO-v5         91.4            89.9         96.8
4 Conclusion This paper presents transfer learning-based deep learning methods for detecting human objects that can retain performance efficacy in variable crowded environments. This work outlines how deep learning models can improve their accuracy in real-time applications despite constraints such as varying image size, resolution, and occlusions. It has been observed that our technique is capable of generating superior performance with minimal loss. We were only able to run the experiment for 20 epochs due to the scarcity of resources; the results could be enhanced greatly by raising the number of epochs. Nevertheless, there remain several challenging situations for which our network must be upgraded. We are working on this, and we anticipate significant advancements in this field of research in the future.
References 1. Khalifa AF, Badr E, Elmahdy HN (2019) A survey on human detection surveillance systems for Raspberry Pi. Image Vis Comput 85:1–13 2. Dalal N, Triggs B (2005) Histograms of oriented gradients for human detection. In: IEEE computer society conference on computer vision and pattern recognition (CVPR). San Diego, CA, USA 3. Bay H, Tuytelaars T, Gool LV (2006) SURF: speeded up robust features. ECCV 3951:404–417 4. Oliver A, Lladó X, Freixenet J, Martí J (2007) False positive reduction in mammographic mass detection using local binary patterns. In: MICCAI. Springer, Berlin, Heidelberg, pp 286–293 5. Bhuvaneswari K, Rauf HA (2009) Edgelet based human detection and tracking by combined segmentation and soft decision. In: International conference on control, automation, communication and energy conservation 6. Dung HV, Jo KH, Vavilin A (2012) Fast human detection based on parallelogram Haar-like features. In: 38th annual conference on IEEE industrial electronics society (IECON) 7. Sabzmeydani P, Mori G (2007) Detecting pedestrians by learning shapelet features. In: The IEEE conference on computer vision and pattern recognition, 17–22 Jun 2007 8. Dukhan M (2016) NNPACK: acceleration package for neural networks on multi-core CPUs. [Online]. Available: https://github.com/Maratyszcza/NNPACK 9. Eng HL, Wang J, Kam AH, Yau WY (2004) A Bayesian framework for robust human detection and occlusion handling human shape model. In: Proceedings of the 17th international conference on pattern recognition, ICPR 2004 10. Adiono T, Parkoso KS, Putratama CD (2018) HOG-AdaBoost implementation for human detection employing FPGA ALTERA DE2-115. Int J Adv Comput Sci Appl 9(10) 11. Nikouei SY, Chen Y, Song S, Xu R, Choi BY, Faughnan TR (2018) Real-time human detection as an edge service enabled by a lightweight CNN. In: IEEE international conference on edge computing (EDGE). San Francisco, CA, USA 12. Girshick R, Donahue J, Darrell T, Malik J (2014) Rich feature hierarchies for accurate object detection and semantic segmentation. In: The IEEE conference on computer vision and pattern recognition 13. Girshick R (2015) Fast R-CNN. In: The IEEE conference on computer vision and pattern recognition 14. He K, Gkioxari G, Dollar P, Girshick R (2018) Mask R-CNN. In: The IEEE conference on computer vision and pattern recognition 15. Redmon J, Divvala S, Girshick R, Farhadi A (2016) You only look once: unified, real-time object detection. In: The IEEE conference on computer vision and pattern recognition, 27–30 Jun 2016 16. Liu W, Anguelov D, Erhan D, Szegedy C, Reed S, Fu CY (2015) SSD: single shot multibox detector. IEEE Comput Vis Pattern Recog (CVPR) 17. Sen W, Kejian Y (2008) An image scaling algorithm based on bilinear interpolation with VC++. J Tech Autom Appl 27(7):44–45 18. Jha S, Kumar R, Priyadarshini I, Smarandache F, Long HV (2019) Neutrosophic image segmentation with dice coefficients. Measurement 134:762–772 19. Zhang H, Cisse M, Dauphin YN, Lopez-Paz D (2017) Mixup: beyond empirical risk minimization. arXiv preprint arXiv:1710.09412 20. Ren S, He K, Girshick R, Sun J (2017) Faster R-CNN: towards real-time object detection with region proposal networks. IEEE Trans Pattern Anal Mach Intell 39(6):1137–1149 21. Redmon J, Farhadi A (2018) YOLOv3: an incremental improvement. IEEE Trans Pattern Anal 15:1125–1131 22. Liu Y, Lu BH, Peng J et al (2020) Research on the use of YOLOv5 object detection algorithm in mask wearing recognition. World Sci Res J 6(11):276–284 23. 
Halawa LJ, Wibowo A, Ernawan F (2019) Face recognition using faster R-CNN with inceptionV2 architecture for CCTV camera. In: 2019 3rd international conference on informatics and computational sciences (ICICoS)
24. Li C, Wang R, Li J, Fei L (2020) Face detection based on YOLOv3. In: Jain V, Patnaik S, Popențiu-Vlădicescu F, Sethi I (eds) Recent trends in intelligent computing, communication and devices. Advances in intelligent systems and computing, vol 1006. Springer, Singapore 25. Wang C-Y, Mark Liao H-Y, Wu Y-H, Chen P-Y, Hsieh J-W, Yeh I-H (2020) CSPNet: a new backbone that can enhance learning capability of CNN. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (CVPR) workshops, pp 390–391 26. Dadboud F, Patel V, Mehta V, Bolic M, Mantegh I (2021) Single-stage UAV detection and classification with YOLOV5: mosaic data augmentation and PANet. In: 2021 17th IEEE international conference on advanced video and signal based surveillance, pp 1–8 27. Yang S, Luo P, Loy C-C, Tang X (2016) WIDER FACE: a face detection benchmark. In: IEEE conference on computer vision and pattern recognition (CVPR), pp 5525–5533 28. Padilla R, Passos WL, Dias TL, Netto SL, Da Silva EA (2021) A comparative analysis of object detection metrics with a companion open-source toolkit. Electronics 10(3):279
Signal Processing
Component Adaptive Superpixel-Based Joint Sparse Representation for Hyperspectral Image Classification Amos Bortiew and Swarnajyoti Patra
Abstract Sparse representation classifiers (SRCs) that consider spatial contextual information have outperformed traditional pixel-based classifiers. Fixed-size square windows have been the most popular choice for capturing spatial information; however, they do not represent the actual spatial neighbourhood of the image. To overcome this, many superpixel-based SRC techniques were developed that generate adaptive windows to capture spatial information for classification purposes. These superpixels were generated using segmentation algorithms and hence require prior information. In this paper, we use the image's connected components to define superpixels, which can better capture spatial information. Using two hyperspectral images, the success of the proposed technique was assessed against other similar techniques. The proposed technique produces better accuracy, with an OA of 75.89% for the Pavia University dataset and an OA of 89.48% for the Indian Pines dataset. Keywords Attribute filtering · Max-tree · Min-tree · Sparse representation · Superpixel
1 Introduction Hyperspectral image classification is considered a vital application in remote sensing [7]. Several pixel-based classifiers exist in the literature, such as neural networks (NNs), ensemble classifiers (random forest), and support vector machines (SVMs) [1]. Another interesting technique that has been rising rapidly in recent years for classification is sparse representation classification (SRC). It is based on the theory that pixels of the same class lie in the same low-dimensional subspace. It works by first taking an unlabelled pixel and representing it as a linear combination of labelled samples (atoms) present in the dictionary. Then the selected atoms, which may belong to different classes, are used in a class-by-class fashion to approximate the unlabelled pixel. Finally, the class with the minimum error is assigned to the pixel [5].
Traditional single-pixel classifiers, including SRC, have often been found to produce unsatisfactory results due to the noisy nature of hyperspectral images and the availability of limited labelled samples. One way to overcome this is by incorporating spatial contextual information into the model for classification. Chen et al. [2] developed a joint sparse representation classification (JSRC) model that uses contextual information captured by a fixed square window to classify the unlabelled pixels. This produces a better result than SRC, signifying the effectiveness of adding spatial information. But the drawback of this model is that, in real-life scenarios, the fixed-size square window does not represent the actual neighbourhood in the image. Several authors use segmentation algorithms to group similar pixels together and create superpixels. The segmentation algorithm known as simple linear iterative clustering (SLIC) is one such popular approach in the literature [8, 9]. It uses three principal components (PCs) to decompose hyperspectral images into homogeneous segments in the CIELAB colour space. Zhang et al. [10] developed an improved version of SLIC (known as ISLIC), which uses the entire set of bands instead of three PCs, to maintain the valuable discriminative spectral information underlying the HSI data. Han et al. [6] use entropy rate superpixels (ERS), a widely used graph-based segmentation method in computer vision. It maps the original image to a graph and divides it into various sub-graphs that are compact, homogeneous and size-balanced. These sub-graphs form the superpixels, and their shapes vary according to the spatial structures present in the HSI. Using the spatial information from these superpixels, the class of the unlabelled pixels is then assigned by using the JSRC. Though the use of superpixels mitigates the problem that arises because of fixed-size windows, the segmentation algorithms require prior information about the image to get the appropriate shape of the spatial neighbourhood. Thus, the above techniques have some limitations when it comes to acquiring spatial information. To minimize this drawback, in this work, we exploit connected components of the image for defining the superpixels. To accomplish this, our proposed technique builds a modified max-tree and a modified min-tree using attribute filtering to represent the image's connected components. The proposed method merges all the connected components whose attribute values are lesser than the selected threshold, in contrast to conventional max-tree and min-tree-based attribute filtering [3]. The filter images obtained from the modified max-tree and modified min-tree are then exploited to define superpixels for incorporating spatial information. These superpixels' size and shape are adaptive in nature and only include pixels related to the neighbouring homogeneous region. The rest of the paper is organized as follows. Section 2 presents the proposed technique. The datasets used for the experiments are described in Sect. 3. Section 4 shows the results of the experiments and their analysis. Finally, Sect. 5 provides the conclusion and future directions.
2 Proposed Method As mentioned, the formation of superpixels based on segmentation techniques for incorporating spatial information is dependent on the segmentation results. Therefore, in this paper, we propose a different method to generate superpixels and obtain better spatial neighbourhoods by analysing the image's connected components. The details of the proposed technique are given below.
2.1 Generation of Component Adaptive Superpixels One of the most popular approaches in the literature for extracting and processing an image's connected components is attribute filtering. It follows a three-step procedure. First, the max-tree (or min-tree) is constructed to represent all the connected components of the input image. Then, the connected components whose attribute values are less than the specified threshold are merged into their parent nodes. Finally, the processed tree is restored to create the filter image. For this conventional approach, it was discovered that not all connected components with values less than the threshold are merged to their parents. As a result, the filter images have many connected components that are smaller than the threshold and are thus not suitable for forming superpixels. To mitigate this problem, our technique generates a modified max-tree (modified min-tree) by combining the tree construction and filtering processes. The proposed technique reduces the conventional three-step procedure to a two-step procedure: tree construction and image restitution. In the tree construction step, each child node goes through the tree construction and filtering process before being created, as opposed to the traditional approach, which filters the child nodes after the tree has been constructed. This ensures that each node represents connected components with attribute values greater than the predefined threshold. Finally, the image restitution phase of conventional filtering is used to create the filter image after building the modified max-tree (modified min-tree). The steps of the proposed technique are shown in Fig. 1 as they are applied to a toy image. The image consists of numerous connected components identified by the letters {A, B, C, D, E, F, G, H, I}, each succeeded by a number in {0, 1, 2, 3}, which denotes the connected component's grey value. Here, we create the filter image using area as the attribute, with the filtering criterion that the area of the connected component be bigger than the threshold λ = 20 pixels. Additionally, we assume that the connected components B1, E1, G1, H1 and I3 have areas that are less than 20 pixels. When this toy image is processed using the conventional max-tree approach, it is found that only B1 and I3 are merged to their parents. This is because E1, G1 and H1 are represented by the same node, and their combined area is greater than the threshold; hence, these connected components could not be merged even though individually their attributes are below the threshold.
Fig. 1 Construction of modified max-tree for generating the filter image
Fig. 2 Formation of component adaptive superpixels
However, when the image is processed using the proposed technique to generate a modified max-tree, each connected component is processed individually, and those that do not meet the criterion are merged into their neighbouring connected component with the next higher grey value. From Fig. 1, it can be seen that B1 was merged to C3 in Step 2, E1, G1 and H1 were merged to F2 in Step 4, and finally, I3 was merged to F2 in Step 6. The same procedure was adopted to construct a modified min-tree. Figure 2 represents the filter images obtained from the modified max-tree and modified min-tree. The connected components present in these filter images are all greater than the predefined threshold (i.e. λ = 20). But the modified max-tree and modified min-tree merged the uncertain connected components (e.g. E1) into different neighbour components. Therefore, in order to increase homogeneity among the superpixels, the uncertain connected components must be separated from the others. Hence, the two filter images are combined to get homogeneous component adaptive superpixels (CASs). Figure 2 shows the component adaptive superpixels of the toy image.
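A small numpy sketch of this final combination step is given below: assuming the two filter images are already available as connected-component label maps, every unique pair of (max-tree label, min-tree label) defines one component adaptive superpixel. The tree construction and filtering themselves are the authors' custom procedure and are not reproduced here.

```python
import numpy as np

def combine_filter_labels(labels_max, labels_min):
    """Intersect two connected-component label maps into CAS labels.

    labels_max, labels_min: integer arrays of identical shape holding the
    connected-component label of each pixel in the filter images obtained
    from the modified max-tree and modified min-tree, respectively.
    """
    # Each unique (max-label, min-label) pair becomes one superpixel id.
    flat = (labels_max.ravel().astype(np.int64) * (int(labels_min.max()) + 1)
            + labels_min.ravel())
    _, cas = np.unique(flat, return_inverse=True)
    return cas.reshape(labels_max.shape)

# Toy example: two 4 x 4 label maps with two components each -> 4 superpixels.
lm = np.array([[0, 0, 1, 1]] * 4)
ln = np.array([[0] * 4, [0] * 4, [1] * 4, [1] * 4])
print(combine_filter_labels(lm, ln))
```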
2.2 Classification of Hyperspectral Images Once the CASs are generated, for each unlabelled pixel, all the pixels that belong to the corresponding CAS are considered as spatial neighbours for classification. Since the size of the CASs is defined by the connected components of the filter images, there is a possibility of obtaining some large CASs with many redundant neighbour pixels, which makes the classification process more time-consuming. To solve this problem, we use a smart sampling technique to pick a small number of pixels from the CASs whose sizes are greater than a given threshold T. In this sampling method, a large number of pixels close to the central pixel are randomly sampled, and the percentage of sampled pixels steadily decreases as the distance from the central pixel rises. After obtaining the neighbours of each unlabelled pixel, they are classified using the joint sparse representation classifier (JSRC). The JSRC assumes that the pixels adjacent to an unknown pixel can be jointly approximated by linearly combining the same atoms in the dictionary; as a result, the sparse vectors have the same sparsity pattern but different coefficients. Readers may refer to [2] for more details on JSRC.
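A condensed numpy sketch of this joint model is shown below: the spectra of a superpixel's pixels are approximated jointly by a few dictionary atoms via simultaneous orthogonal matching pursuit, and the class whose atoms give the smallest reconstruction residual is assigned. This is a generic implementation under the sparsity level K = 3 used in the experiments, not the authors' exact code.

```python
import numpy as np

def jsrc_label(D, atom_labels, X, K=3):
    """Label one superpixel with a joint sparse representation classifier.

    D           : (bands, atoms) dictionary of labelled training pixels,
                  columns normalised to unit norm.
    atom_labels : (atoms,) class label of each dictionary atom.
    X           : (bands, pixels) spectra of the pixels in one superpixel.
    """
    residual, support = X.copy(), []
    for _ in range(K):
        # Select the atom most correlated with the joint residual (SOMP step).
        corr = np.linalg.norm(D.T @ residual, axis=1)
        corr[support] = -np.inf
        support.append(int(np.argmax(corr)))
        # Least-squares fit on the current support, then update the residual.
        coef, *_ = np.linalg.lstsq(D[:, support], X, rcond=None)
        residual = X - D[:, support] @ coef

    # Class-wise residuals: reconstruct using only each class's selected atoms.
    support = np.array(support)
    errors = {}
    for c in np.unique(atom_labels[support]):
        sel = support[atom_labels[support] == c]
        coef_c, *_ = np.linalg.lstsq(D[:, sel], X, rcond=None)
        errors[c] = np.linalg.norm(X - D[:, sel] @ coef_c)
    return min(errors, key=errors.get)
```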
3 Description of Datasets The proposed method was tested using two well-known hyperspectral datasets, i.e. Indian Pines and Pavia University. A description of each is given below. The first dataset was captured by the ROSIS-03 airborne optical sensor over the urban area of the University of Pavia, Italy. The image size is 610 × 340 pixels with 103 spectral bands, and 9 classes have been defined in the area for classification (Fig. 3). The second dataset was recorded by the AVIRIS sensor over the Indian Pines farmland in Indiana, USA. The image size is 145 × 145 pixels with 200 spectral bands, and 16 classes have been defined in the area for classification (Fig. 4).
Fig. 3 Pavia University a) ground truth and classification maps of b) SLIC, c) ISLIC, d) ERS, e) CASJSRC and f) CASJSRC-C
Fig. 4 Indian Pines a) ground truth and classification maps of b) SLIC, c) ISLIC, d) ERS, e) CASJSRC and f) CASJSRC-C
4 Experimental Results This section explains the experimental setup and the results obtained from the experiments conducted.
4.1 Experimental Design To exhibit the effectiveness of the novel component adaptive superpixel-based joint sparse representation classification (CASJSRC) technique, it is compared with other similar state-of-the-art sparse representation-based techniques. The state-of-the-art techniques considered are SLIC [8], ISLIC [10] and ERS [6].
Table 1 Overall accuracy (OA), average accuracy (AA) and kappa (κ) accuracy obtained by the SLIC, ISLIC, ERS, CASJSRC and CASJSRC-C

                    SLIC     ISLIC    ERS      Proposed CASJSRC   Proposed CASJSRC-C
Pavia University
  OA                70.14    74.78    74.48    75.63              75.89
  AA                74.20    78.48    77.28    78.64              79.01
  κ                 0.60     0.66     0.65     0.68               0.69
Indian Pines
  OA                87.27    89.27    81.74    88.79              89.48
  AA                84.14    93.60    76.29    92.98              92.97
  κ                 0.76     0.86     0.85     0.87               0.88
The proposed technique may result in some superpixels with only a few pixels. Therefore, we apply a post-processing method, called clean-up, that merges such superpixels into their neighbours. We call this method component adaptive superpixel-based joint sparse representation classification with clean-up (CASJSRC-C), and it is also considered for comparison. In this experiment, the same parameters are used for both datasets. The proposed CASJSRC and CASJSRC-C are tested for different thresholds λ = [25, 50, 75, 100, 125, 150, 175, 200]. For fairness, we also calculated the number of superpixels n from the above thresholds for SLIC, ISLIC and ERS using n = N_sup/λ, where N_sup is the total number of pixels. Many classification results were therefore obtained, and the best one produced by each technique was chosen for comparison. The smart sampling threshold is fixed at T = 300, the clean-up threshold is fixed at s = 5, and the sparsity level K = 3 was chosen for all the techniques.
4.2 Results We evaluated the potential of the proposed method on all considered datasets using the standardized (fixed) test and training sets produced by the remote sensing community, as described in [4]. Each technique is evaluated using three quantitative metrics: overall accuracy (OA), average accuracy (AA) and kappa (κ). Table 1 shows the results of the experiment for both the Pavia and Indian Pines datasets, with the best results highlighted in bold. The classification maps of the considered techniques for the Pavia and Indian Pines datasets are shown in Figs. 3 and 4, respectively. It is observed that the proposed techniques produce results better than or similar to those of the best state-of-the-art method. For the Pavia University dataset, the proposed CASJSRC-C produces an overall accuracy of 75.89% and an average accuracy of 79.01%, almost 1% higher than the values produced by the best method considered,
ISLIC. Similarly, for the Indian Pines dataset, CASJSRC-C produces results similar to those of the best method, ISLIC, with an overall accuracy of 89.48% and an average accuracy of 92.97%. For both datasets, the proposed CASJSRC-C produces slightly better results than the proposed CASJSRC. These results validate the effectiveness of the proposed technique for capturing better spatial information.
5 Conclusion The addition of contextual information has significantly improved SRC. But the use of fixed-size windows or of superpixels generated by segmentation algorithms for considering contextual information has certain drawbacks. Therefore, in this paper, we propose a technique that generates superpixels using connected components of the image, which does not require a segmentation algorithm. The superpixels were created by combining the filter images from the proposed modified max-tree and modified min-tree, and the contextual information from these superpixels is then used for classifying the unlabelled pixels. Utilizing two real HSI datasets, the success of the proposed technique is assessed by comparing it with several similar state-of-the-art approaches, and it was found that the proposed technique yields noticeably better classification results for both datasets. Acknowledgements This work is partially supported by the Science and Engineering Research Board, Government of India, with Grant No. CRG/2020/003018
References 1. Boggavarapu L, Prabukumar M (2017) Survey on classification methods for hyper spectral remote sensing imagery. In: 2017 international conference on intelligent computing and control systems (ICICCS). IEEE, pp 538–542 2. Chen Y, Nasrabadi NM, Tran TD (2011) Hyperspectral image classification using dictionary-based sparse representation. IEEE Trans Geosci Remote Sens 49(10):3973–3985 3. Das A, Patra S (2020) A rough-GA based optimal feature selection in attribute profiles for classification of hyperspectral imagery. Soft Comput 24(16):12569–12585 4. Ghamisi P, Maggiori E, Li S, Souza R, Tarabalka Y, Moser G, De Giorgi A, Fang L, Chen Y, Chi M et al (2018) New frontiers in spectral-spatial hyperspectral image classification: the latest advances based on mathematical morphology, Markov random fields, segmentation, sparse representation, and deep learning. IEEE Geosci Remote Sens Mag 6(3):10–43 5. Hamdi MA, Salem RB (2019) Sparse representations for the spectral-spatial classification of hyperspectral image. J Indian Soc Remote Sens 47(6):923–929 6. Han M, Zhang C, Wang J (2016) Superpixel-based sparse representation classifier for hyperspectral image. In: 2016 international joint conference on neural networks (IJCNN). IEEE, pp 3614–3619 7. Lu Q, Wei L (2021) Multiscale superpixel-based active learning for hyperspectral image classification. IEEE Geosci Remote Sens Lett 19:1–5
8. Sakaci SA, Ertürk S (2018) Superpixel based spectral classification of hyperspectral images in different spaces. In: 2018 5th international conference on electrical and electronic engineering (ICEEE). IEEE, pp 384–388 9. Sun X, Zhang F, Yang L, Zhang B, Gao L (2015) A hyperspectral image spectral unmixing method integrating slic superpixel segmentation. In: 2015 7th workshop on hyperspectral image and signal processing: evolution in remote sensing (WHISPERS). IEEE, pp 1–4 10. Zhang Y, Liu K, Dong Y, Wu K, Hu X (2019) Semisupervised classification based on slic segmentation for hyperspectral image. IEEE Geosci Remote Sens Lett 17(8):1440–1444
Convolutional Autoencoder-Based Models for Image Denoising: A Comparative Study Rowsonara Begum and Ayatullah Faruk Mollah
Abstract Images are often corrupted with noise for a variety of reasons, including camera sensor defects, transmission through noisy channels, and faulty memory locations in hardware. Effective denoising techniques are required to deal with such noisy images. This paper examines some prospective deep learning-enabled convolutional autoencoder networks for image denoising and presents a comparative view based on their quantitative performance on the same datasets in terms of standard evaluation measures such as PSNR. Such a study will be useful in selecting an appropriate image denoising model for different applications. Keywords Convolutional autoencoder · Image denoising · Gaussian noise · Deep learning
1 Introduction The task of noise elimination from images is of utmost importance in many applications. Noise often gets introduced into an image during acquisition, representation, or transmission. Early denoising approaches relied on various filters and transforms [1]; later, learning-driven approaches were explored [2–6]. Particularly with the advent of the autoencoder [7], deep denoising is gradually becoming a reality [8]. The basic autoencoder is a network that learns to reduce and encode data and then reconstructs the data from the encoded form as closely as possible. By definition, an autoencoder decreases data dimensionality by figuring out how to disregard noise in the input. Image denoising can therefore be accomplished with autoencoders such as the denoising autoencoder, which can denoise complicated images that are difficult to handle using existing techniques. Additionally,
with the popular and successful incorporation of the convolution operation into neural networks, convolutional autoencoders have become a fruitful reality [9–11]. They are being applied in various applications such as seismic noise attenuation [12], drift chamber denoising [13], and medical image denoising [14]. Luo et al. [15] have applied a multi-scale convolutional autoencoder for denoising ground-penetrating radar images. In short, image denoising has been studied for a long period, yet it remains a critical and only partially solved problem because, from a mathematical standpoint, image denoising is an inverse problem with no unique solution. The convolutional autoencoder may be considered a major breakthrough in image denoising or image reconstruction. However, not many models of such networks have been explored yet. In this paper, three prospective convolutional autoencoder models have been studied for different levels of noise. This study reveals the power of such models while reducing the latent representation.
2 Autoencoder Models for Denoising Autoencoders usually consist of three parts: (i) the encoder, (ii) the bottleneck, and (iii) the decoder. The encoder is supposed to represent the input data in an encoded form which is, most often, much smaller than the original data. The bottleneck is the layer that holds this compressed representation. It is the decoder which uses the encoded data and reconstructs the original data as closely as possible. Several architectures of autoencoder have surfaced. They may be categorized as (i) deep autoencoders, (ii) denoising autoencoders, (iii) sparse autoencoders, (iv) contractive autoencoders, (v) variational autoencoders, and (vi) convolutional autoencoders. Convolutional autoencoders, a relatively recent development, utilize the benefit of the powerful convolution operator. They learn to encode the input as a set of simpler signals and then attempt to recreate the input from these signals. In this work, three architectures of convolutional autoencoder have been implemented, and their performances are studied to reveal insights. Below, we discuss the details of the three autoencoder architectures designed for image denoising.
2.1 Description of Model 1 It consists of only one convolution layer and one deconvolution layer. In the convolution layer, 8 kernels with learnable weights of dimension 3 × 3 are employed. The layer-wise configuration is shown in Fig. 1. It may be noted that as the stride size is 1, the size of the encoded image remains the same as the input. Such a model is included in this study to check how it responds in comparison with the other models.
Fig. 1 Layer-wise configuration details of the convolutional autoencoder model 1
2.2 Description of Model 2 Unlike model 1, which does not reduce the input dimension, model 2 and model 3 reduce the input dimension to some extent. In model 2, the stride size is taken as 2 during convolution, which halves every spatial dimension. For instance, an input shape of 28 × 28 reduces to 14 × 14. Figure 2 shows the configuration as logged from the tensorflow.keras library.
2.3 Description of Model 3 Model 3 reduces the input dimensions further by incorporating two successive convolution layers with stride size 2. As a result, the dimension reduces to only 7 × 7 (for an input dimension of 28 × 28) at the bottleneck layer, which is followed by two successive deconvolution layers to retrieve the original input dimensions. An overview of the configuration of this model may also be realized from Fig. 3.
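Since the three architectures differ mainly in how many stride-2 stages they use, a single builder function can sketch all of them. The following tensorflow.keras snippet is a minimal illustration and not the authors' exact configuration: the filter count of 8 is taken from the model 1 description, while the filter counts of the deeper models, the optimizer, and the training loss are assumptions.

```python
# Minimal sketch of the three convolutional autoencoder variants described
# above. n_stride2_stages = 0, 1, 2 roughly corresponds to model 1, 2, 3.
from tensorflow.keras import layers, models

def build_cae(n_stride2_stages=0, filters=8, input_shape=(28, 28, 1)):
    inputs = layers.Input(shape=input_shape)
    x = inputs
    if n_stride2_stages == 0:
        # Model 1: a single stride-1 convolution, so the encoded size stays 28 x 28
        x = layers.Conv2D(filters, 3, strides=1, padding='same', activation='relu')(x)
        x = layers.Conv2DTranspose(filters, 3, strides=1, padding='same', activation='relu')(x)
    else:
        # Models 2 and 3: each stride-2 convolution halves the spatial dimensions
        for _ in range(n_stride2_stages):
            x = layers.Conv2D(filters, 3, strides=2, padding='same', activation='relu')(x)
        for _ in range(n_stride2_stages):
            x = layers.Conv2DTranspose(filters, 3, strides=2, padding='same', activation='relu')(x)
    # Last convolution in the decoder uses a 'sigmoid' activation (see Sect. 3.2)
    outputs = layers.Conv2D(1, 3, padding='same', activation='sigmoid')(x)
    return models.Model(inputs, outputs)

model3 = build_cae(n_stride2_stages=2)         # bottleneck of 7 x 7 for 28 x 28 inputs
model3.compile(optimizer='adam', loss='mse')   # optimizer and loss are assumptions
```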
Fig. 2 Configuration of model 2 in different layers
3 Results and Analysis In order to evaluate the performance of the three convolutional autoencoder models for image denoising, experiments have been carried out on the popular and publicly available Fashion-MNIST dataset [16]. Fortunately, it is also available at the datasets repository of the tensorflow.keras library. Below, in Sect. 3.1, a brief description of this dataset is included. Then, experimental setup and evaluation protocol are discussed in Sect. 3.2, results are presented in Sect. 3.3, and related discussion is made in Sect. 3.4.
3.1 Dataset Description The Fashion-MNIST dataset contains 70,000 grayscale images of 28 × 28 pixels belonging to 10 different types of fashion products. The training set has 60,000 images, whereas the test set contains 10,000 images. A few sample images from this dataset are shown in Fig. 4. As the values of the pixel intensities range from 0 to 255, we rescaled them to 0.0–1.0 by dividing by 255.
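For reference, the dataset can be pulled directly from the tensorflow.keras datasets repository and rescaled as described; the snippet below is a minimal sketch and the variable names are illustrative.

```python
# Load Fashion-MNIST and rescale intensities from [0, 255] to [0.0, 1.0]
import numpy as np
from tensorflow.keras.datasets import fashion_mnist

(x_train, _), (x_test, _) = fashion_mnist.load_data()   # 60,000 / 10,000 images of 28 x 28
x_train = x_train.astype('float32') / 255.0
x_test = x_test.astype('float32') / 255.0
# Add a channel axis so the images fit a (28, 28, 1) network input
x_train = x_train[..., np.newaxis]
x_test = x_test[..., np.newaxis]
```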
Fig. 3 Configuration of different layers of model 3
Fig. 4 Sample images from the Fashion-MNIST dataset [16] (one sample from each of the first five classes is shown)
3.2 Experimental Setup and Evaluation Measure We consider the library’s pre-partitioned training and test sets for all the models implemented in this work, and the training is done for 10 epochs. Among other parameters, we use padding = ‘same’ and activation = ‘relu’ (for both convolution
and deconvolution layers). However, for the last convolution layer in the decoder part, we use the ‘sigmoid’ activation function. Peak signal-to-noise ratio (PSNR) is used as the evaluation measure for denoising performance. It refers to the ratio of a signal's maximum possible power to the power of the noise. PSNR is a frequently used general-purpose metric in many problems, such as measuring compressor or filter efficiency. In the context of image denoising, it is measured by comparing the reconstructed image with the original one. It is mathematically defined as shown in Eq. 1.
\mathrm{PSNR} = 10 \log_{10} \frac{(L-1)^2}{\mathrm{MSE}} = 20 \log_{10} \frac{L-1}{\mathrm{RMSE}} \qquad (1)
where L is the maximum number of intensity levels, i.e., 256, and RMSE is the square root of the mean squared error (MSE), which is measured as shown in Eq. 2.
\mathrm{MSE} = \frac{1}{mn} \sum_{i=0}^{m-1} \sum_{j=0}^{n-1} \big( O(i, j) - D(i, j) \big)^2 \qquad (2)
where O denotes the obtained image, D denotes the desired image, m denotes the number of rows, and n denotes the number of columns. The higher the PSNR, the better the reconstruction. In the ideal situation, i.e., when there is no noise at all, PSNR is ∞ (infinity).
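A direct NumPy translation of Eqs. 1 and 2 might look as follows; it assumes 8-bit images (L = 256), so images rescaled to [0, 1] would first be mapped back to the 0–255 range (or L adjusted accordingly).

```python
import numpy as np

def psnr(obtained, desired, L=256):
    """PSNR between the obtained (reconstructed) image O and the desired image D."""
    mse = np.mean((obtained.astype(np.float64) - desired.astype(np.float64)) ** 2)
    if mse == 0:
        return np.inf                      # no noise at all: PSNR is infinity
    return 10.0 * np.log10((L - 1) ** 2 / mse)
```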
3.3 Denoising Performance The models are tested by injecting noise in different quantities, and the obtained quantitative figures are presented in Table 1. In this work, Gaussian noise is introduced for noise factors of 0.1, 0.2, and 0.3. The higher the noise factor, the greater the noise. Image denoising results are visually shown in Fig. 5.
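One common way of injecting such noise, shown below as an assumption rather than the authors' exact code, is to add zero-mean Gaussian noise scaled by the noise factor and clip the result back to the valid intensity range.

```python
import numpy as np

def add_gaussian_noise(images, noise_factor=0.1, seed=None):
    rng = np.random.default_rng(seed)
    noisy = images + noise_factor * rng.standard_normal(images.shape)
    return np.clip(noisy, 0.0, 1.0)        # keep pixel values in [0, 1]

# e.g., the strongest noise level used in Table 1
# x_test_noisy = add_gaussian_noise(x_test, noise_factor=0.3)
```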
3.4 Discussion From Fig. 5, it may be seen that fair reconstruction has taken place in most cases. As the bottleneck size decreases from model 1 to model 3, so does their performance. This is understandable because the size of the latent tensor becomes gradually smaller. Additionally, it may be noticed that when the noise factor is lower, reconstruction is better, as expected.
Table 1 Denoising performance at different noise levels for the three employed convolutional autoencoder models
Model               Noise level   PSNR mean   PSNR std. dev.
Model 1 (28 × 28)   0.1           30.31       1.86
                    0.2           25.16       1.97
                    0.3           23.46       2.67
Model 2 (14 × 14)   0.1           29.69       1.72
                    0.2           25.09       1.96
                    0.3           22.33       2.21
Model 3 (7 × 7)     0.1           27.24       2.26
                    0.2           24.36       2.09
                    0.3           21.13       2.08
Bold figures indicate the highest among the respective category
4 Conclusion In this paper, we have implemented three convolutional autoencoder models for image denoising and presented a comparative view with related insights. Empirical studies have been made by injecting Gaussian noise in different quantities. It is evident that convolutional autoencoder-based image denoising models are very powerful in reconstructing an image from its noisy version. All three models considered in this work are found to yield reasonably fair representations of the original images, as the mean PSNR is more than 25 in all cases. However, reconstruction heavily depends upon the size of the latent tensor in the bottleneck layer. The larger the size, the better the reconstruction. Accordingly, it is noticed that reconstruction is better in the case of model 1 than model 3. This study can be further extended in the future in a number of directions, such as considering other types of noise, various other models, and observations on multiple datasets.
Fig. 5 Image reconstruction performance for noise factor of 0.1 on five sample images: a original images, b noise-injected images, c reconstructed with model 1, d reconstructed with model 2, e reconstructed with model 3
References 1. Fan L, Zhang F, Fan H, Zhang C (2019) Brief review of image denoising techniques. Vis Comput Ind Biomed Art 2:1–12, art. 7 2. Tian C, Xu Y, Fei L, Yan K (2018) Deep learning for image denoising: a survey. In: Proceedings of international conference on genetic and evolutionary computing. Springer, Singapore, pp 563–572 3. Quan Y, Chen Y, Shao Y, Teng H, Xu Y, Ji H (2021) Image denoising using complex-valued deep CNN. Pattern Recognit 111, art. 107639
4. Tian C, Fei L, Zheng W, Xu Y, Zuo W, Lin CW (2020) Deep learning on image denoising: an overview. Neural Netw 131:251–275 5. Thakur RS, Yadav RN, Gupta L (2019) State-of-art analysis of image denoising methods using convolutional neural networks. IET Image Proc 13(13):2367–2380 6. Ilesanmi AE, Ilesanmi TO (2021) Methods for image denoising using convolutional neural network: a review. Complex Intell Syst 7(5):2179–2198 7. Bank D, Koenigstein N, Giryes R (2020) Autoencoders. arXiv:2003.05991 8. Bajaj K, Singh DK, Ansari MA (2020) Autoencoders based deep learner for image denoising. Procedia Comput Sci 171:1535–1541 9. Gondara L (2016) Medical image denoising using convolutional denoising autoencoders. In: Proceedings of 16th international conference on data mining workshops, pp 241–246 10. Lee D, Choi S, Kim HJ (2018) Performance evaluation of image denoising developed using convolutional denoising autoencoders in chest radiography. Nucl Instrum Methods Phys Res, Sect A 884:97–104 11. Nishio M, Nagashima C, Hirabayashi S, Ohnishi A, Sasaki K, Sagawa T, Hamada M, Yamashita T (2017) Convolutional auto-encoder for image denoising of ultra-low-dose CT. Heliyon 3(8):e00393:1–19 12. Qian F, Guo W, Liu Z, Yu H, Zhang G, Hu G (2022) Unsupervised erratic seismic noise attenuation with robust deep convolutional autoencoders. IEEE Trans Geosci Remote Sens 60:1–16, art. 5913016 13. Thomadakis P, Angelopoulos A, Gavalian G, Chrisochoides N (2022) De-noising drift chambers in CLAS12 using convolutional auto encoders. Comput Phys Commun 271, art. 108201 14. Ahmed AS, El-Behaidy WH, Youssif AA (2021) Medical image denoising system based on stacked convolutional autoencoder for enhancing 2-dimensional gel electrophoresis noise reduction. Biomed Signal Process Control 69, art. 102842 15. Luo J, Lei W, Hou F, Wang C, Ren Q, Zhang S, Luo S, Wang Y, Xu L (2021) GPR B-scan image denoising via multi-scale convolutional autoencoder with data augmentation. Electronics 10(11), art. 1269 16. Fashion MNIST dataset. https://www.kaggle.com/datasets/zalando-research/fashionmnist. Accessed on 10 June 2022
Simultaneous Prediction of Hand Gestures, Handedness, and Hand Keypoints Using Thermal Images Sichao Li, Sean Banerjee, Natasha Kholgade Banerjee, and Soumyabrata Dey
Abstract Hand gesture detection is a well-explored area in computer vision with applications in various forms of human–computer interaction. In this work, we propose a technique for simultaneous hand gesture classification, handedness detection, and hand keypoints localization using thermal data captured by an infrared camera. Our method uses a novel deep multi-task learning architecture that includes shared encoder–decoder layers followed by three branches dedicated to each of the mentioned tasks. We performed extensive experimental validation of our model on an in-house dataset consisting of 24 users’ data. The results confirm higher than 98% accuracy for gesture classification, handedness detection, and fingertips localization, and more than 91% accuracy for wrist points localization. Keywords Hand gesture detection · Thermal imaging · Hand keypoints localization · Multi-task learning · Deep learning
S. Li · S. Banerjee · N. K. Banerjee · S. Dey (B) Clarkson University, 8 Clarkson Avenue, New York, NY 13699, USA e-mail: [email protected] URL: https://www.clarkson.edu/people/soumyabrata-dey © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 N. Das et al. (eds.), Proceedings of International Conference on Data, Electronics and Computing, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-99-1509-5_12
1 Introduction With the fast-changing technology landscape, the use of interconnected smart devices equipped with sensors has become progressively popular. Smart devices are used in various important applications such as smart homes, self-driving cars, smart infrastructure, and smart cities. However, interactions with smart devices are still not easy because users often need to learn and remember different settings and operation manuals specific to each device. As we become more dependent on technology in the near future, the methods of interaction with smart devices need to be more user-friendly. Because of recent technological developments, gesture-activated and voice command-based devices are becoming available in the market. These devices provide users with natural ways of interacting with them, reducing the trouble of
remembering complex setting information. Hand gesture-based human–computer interaction (HCI) is one of the major fields in computer vision that has been studied for many years. However, most of the works explored hand detection [6, 13, 16, 18, 26, 28] and gesture identification [4, 5, 9, 15] tasks that use data from red, green, and blue (RGB) cameras. Many of these works use a skin color database to segment the hand regions from the rest of the scene and traditional machine learning or deep learning techniques for gesture classification [5, 6, 9, 24]. Recently, other sensor modalities such as depth and thermal cameras are becoming widely available. Regardless, there have been a comparatively limited number of attempts for hand-related applications such as hand gesture classification using depth [23, 27, 30] and thermal data [3, 7, 12, 20, 22, 25]. While RGB data-based methods can suffer from problems such as lighting condition variations and skin color variations that can negatively impact the accuracy of the hand detection method, the thermal data-based approaches are less affected by those variations and can complement RGB-based techniques. Therefore, extensive study for hand gesture detection using thermal data is necessary to understand the capability of a complementary data modality and for a possible robust future approach combining both color and thermal data modalities. In this paper, we propose a novel deep learning architecture for simultaneous gesture classification, handedness detection, and hand keypoints localization using thermal images. The deep learning (DL) model uses a shared encoder–decoder component followed by three branches for gesture classification, handedness detection, and fingertips and wrist points localization. The network is trained by backpropagating the error estimated using a joint loss function. Furthermore, we introduced an intelligent post-processing step that utilizes the insights from the other two branches of the DL network for refining the hand keypoints localization results. We summarize the contributions of our work below. 1. We prepare a new dataset consisting of thermal imaging data from 24 users and 10 gestures per user. 2. We propose a novel multi-task DL network architecture that performs gesture classification, left- and right-hand detection, and hand keypoints localization. To the best of our knowledge, this is the first work that can perform all three tasks using a single network. 3. We demonstrate through experimental validation the superior performance of our model on all three tasks. On average, the accuracy of the model is > 98% for gesture classification, handedness detection, and fingertips localization. The wrist points localization accuracy is > 91%. 4. Instead of a threshold-based finger point localization, we introduce an adaptive filtering technique that utilizes the insight learned in the other branches of the network.
Fig. 1 Ten different gestures for left (top row) and right (bottom row) hands
Table 1 Dataset sample counts per gesture (G#) and left/right hands
        G1     G2     G3     G4     G5     G6     G7     G8     G9     G10    Left    Right
Train   4780   4700   4630   4520   5510   5390   4620   4570   4668   4938   23636   24690
Test    970    1120   1120   1110   1120   1120   1090   1100   1120   1100   5570    5400
Total   5750   5820   5750   5630   6630   6510   5710   5670   5788   6038   29206   30090
2 Dataset We collected a customized thermal imaging dataset to train and validate our model on all three proposed tasks. The dataset details are provided below. Data collection: Data is collected from 24 users with a Sierra Olympic Viento-G thermal camera. Each user is recorded at 30 fps (frames per second) performing 10 different gestures using left and right hands. All the data is captured in indoor conditions (temperature between 65◦ and 70 ◦ F) and stored as 16 bit 640 × 480 TIFF image sequences. The camera is fixed to a wooden stand and oriented downward focusing on a table. Users are required to turn their palms toward the tabletop and the back of their hands toward the camera. In total, we collected 59,296 frames: 29,206 left-hand frames and 30,090 right-hand frames. Table 1 summarizes the dataset information, and Fig. 1 illustrates all gestures used in this study. Data preparation: Given the recorded frames, we go through a sequence of steps to prepare the data for our experiments. First, we crop the images to a predefined 640 × 440 pixel region so that only the tabletop is visible. Next, the training data is prepared by running a Python script that allows manual selection of the fingertips and wrist points, cropping of the hand region, and segmentation of the images into binary foreground–background regions (background pixels = 0 and foreground pixels = 1). Finally, two data augmentation techniques, such as rotation and variable-length forearm inclusion, are applied to each image. The model is trained with a variablelength forearm because in the test scenario a user can wear clothing of different sleeve lengths occluding different lengths of the forearm. The model needs to learn to ignore this variation and identify the gesture correctly irrespective of the length of the forearm. We used 10 different lengths of the forearm per image. Figure 2a, b illustrates the data augmentations.
Fig. 2 Example of data augmentation. a Forearm cropping, b rotation, c automatic hand segmentation during testing
Fig. 3 Architecture used for gesture classification, left-right-hand detection, and hand keypoints localization
During the test scenario, the hand images are generated by an automatic algorithm inspired by [3]. The algorithm uses background subtraction to detect binary hand regions and k-means clustering to isolate each hand, then crops the image to tightly include each hand region and resizes it to a 100 × 100 pixel image. Figure 2c shows all the steps.
3 Methods We propose a novel DL network architecture for learning the three proposed tasks through a joint loss function. The fingertips are further refined using the detection results of the gesture classification and handedness detection branches. The whole process is described below in detail.
3.1 Model Structure The architecture of the proposed model is presented in Fig. 3. The model expects a fixed input size of 100 × 100. The shared part of the model is similar to a U-Net [19] architecture, and it consists of four convolutional layers, four max-pooling layers, and two up-convolutional layers. The encoder part of the network uses a series of max-pooling and convolutional layers to down-sample the resolution to 6 × 6. This is followed by two up-convolution layers with skip connections to increase the resolution back to 25 × 25. The first two branches of the network perform the gesture classification and left–right-hand detection tasks. They share two convolutional layers before separating into different paths. The gesture classification path uses a global average pooling layer followed by two dense layers. The handedness detection path uses a global average pooling layer and a dense output layer. The last branch is responsible for the fingertips and wrist points localization task. It consists of four convolutional layers followed by an up-convolution and two convolutional layers. The final output dimensions of this block are 50 × 50 × 6. The first five channels predict the fingertip locations, and the sixth channel predicts the wrist points. This is described in Sect. 3.2. All convolutional layers use kernel size 3 × 3, stride 1, and padding ‘same’. All up-convolution layers use kernel size 3 × 3 and stride 2. Our model uses the ‘ReLU’ activation function. We use the stochastic gradient descent (SGD) algorithm with a learning rate of 0.001, a weight decay of 1e−3, and a momentum of 0.95. Batch normalization [10] is used between each convolution layer and its activation layer.
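The following tensorflow.keras sketch illustrates one way such a three-branch network could be wired up under the constraints given above (100 × 100 input, a shared encoder–decoder recovering a 25 × 25 feature map with skip connections, and a 50 × 50 × 6 keypoint output). Filter counts, the exact skip-connection padding, the keypoint output activation, and the omission of batch normalization and weight decay are simplifying assumptions, not the authors' exact configuration.

```python
from tensorflow.keras import layers, models

def build_multitask_net(num_gestures=10):
    inp = layers.Input(shape=(100, 100, 1))
    # Shared encoder: four conv + max-pool stages, 100 -> 50 -> 25 -> 12 -> 6
    c1 = layers.Conv2D(16, 3, padding='same', activation='relu')(inp)
    p1 = layers.MaxPooling2D()(c1)                                        # 50 x 50
    c2 = layers.Conv2D(32, 3, padding='same', activation='relu')(p1)
    p2 = layers.MaxPooling2D()(c2)                                        # 25 x 25
    c3 = layers.Conv2D(64, 3, padding='same', activation='relu')(p2)
    p3 = layers.MaxPooling2D()(c3)                                        # 12 x 12
    c4 = layers.Conv2D(64, 3, padding='same', activation='relu')(p3)
    p4 = layers.MaxPooling2D()(c4)                                        # 6 x 6
    # Shared decoder: two up-convolutions with skip connections, back to 25 x 25
    u1 = layers.Conv2DTranspose(64, 3, strides=2, padding='same', activation='relu')(p4)   # 12 x 12
    u1 = layers.Concatenate()([u1, c4])
    u2 = layers.Conv2DTranspose(64, 3, strides=2, padding='valid', activation='relu')(u1)  # 25 x 25
    shared = layers.Concatenate()([u2, c3])
    # Branches 1 and 2 share two convolution layers before splitting
    b = layers.Conv2D(64, 3, padding='same', activation='relu')(shared)
    b = layers.Conv2D(64, 3, padding='same', activation='relu')(b)
    g = layers.GlobalAveragePooling2D()(b)
    g = layers.Dense(64, activation='relu')(g)
    gesture = layers.Dense(num_gestures, activation='softmax', name='gesture')(g)
    h = layers.GlobalAveragePooling2D()(b)
    handedness = layers.Dense(1, activation='sigmoid', name='handedness')(h)
    # Branch 3: keypoint heatmaps, 25 x 25 -> 50 x 50 x 6
    k = shared
    for _ in range(4):
        k = layers.Conv2D(64, 3, padding='same', activation='relu')(k)
    k = layers.Conv2DTranspose(32, 3, strides=2, padding='same', activation='relu')(k)     # 50 x 50
    k = layers.Conv2D(32, 3, padding='same', activation='relu')(k)
    keypoints = layers.Conv2D(6, 3, padding='same', name='keypoints')(k)  # linear output assumed
    return models.Model(inp, [gesture, handedness, keypoints])

model = build_multitask_net()
```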
3.2 Hand Keypoint Detection We trained the network to predict the hand keypoints in a 50 × 50 × 6 output map. The first five channels in the map are trained to localize the five fingertips in the sequence of thumb, index, middle, ring, and little finger. The last channel is dedicated to the two wrist points. Ground truth output maps are created with the following rules. Depending on the gesture, a few fingers will be visible, while others will be occluded in an input image. For an output channel, if the corresponding finger is not visible, we set all pixels to 0. Otherwise, a two-dimensional Gaussian with variance 1.5 is used to set the pixel values at and around the fingertip location. Similarly, in the 6th channel, the pixel values at and around the two wrist point locations are assigned using the same Gaussian distribution. Two sets of example ground truth maps are shown in Fig. 4 for better understanding. The ground truth maps are compared against predicted output maps to estimate loss. A similar approach is used in [26] for fingertips detection from color images. Fingertips localization: Once the model is trained, the network starts predicting keypoint locations as the output of the third branch of the network. However, multiple pixels are predicted as ‘non-zero’ in each channel of the output keypoint maps.
Fig. 4 Examples of ground truth output maps for right-hand gesture 10 (top row) and left-hand gesture 5 (bottom row)
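A minimal NumPy sketch of this ground-truth construction is given below: channels 0–4 hold one Gaussian per visible fingertip (variance 1.5) and channel 5 holds both wrist points. Keypoints are assumed to be (row, column) locations already scaled to the 50 × 50 output map, and occluded fingers are passed as None.

```python
import numpy as np

def make_ground_truth(fingertips, wrist_points, size=50, var=1.5):
    """fingertips: list of 5 (row, col) tuples or None (thumb ... little finger)."""
    gt = np.zeros((size, size, 6), dtype=np.float32)
    rows, cols = np.mgrid[0:size, 0:size]
    def gaussian(r, c):
        return np.exp(-((rows - r) ** 2 + (cols - c) ** 2) / (2.0 * var))
    for ch, pt in enumerate(fingertips):       # channels 0-4: visible fingertips only
        if pt is not None:
            gt[:, :, ch] = gaussian(*pt)
    for pt in wrist_points:                    # channel 5: both wrist points
        gt[:, :, 5] = np.maximum(gt[:, :, 5], gaussian(*pt))
    return gt
```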
One possible approach, as in [26], for localizing a fingertip in each channel is removal of all ‘non-zero’ pixels with a value less than a threshold p and selection of the pixel with the highest non-zero value (if any pixel remains after removing pixels with value < p). However, finding an ideal threshold can be difficult. Moreover, valid fingertips are often rejected because their predicted pixel values fall below the threshold, resulting in lower prediction accuracy. We adopted a different approach to resolve this problem. We defined a filtering function that takes advantage of the predictions in the gesture classification branch and the handedness detection branch. The core idea is that, given a predicted gesture, we can easily determine which channels should predict valid fingertips and which should not. For the channels with valid fingertips, we select the pixels with the highest prediction values. The second part of the function solves the finger index misorder problem. For example, a channel dedicated to the index finger may predict the fingertip location of the middle finger and vice versa. This may happen because the two fingers are closely adjacent to each other, and a small prediction error can swap their corresponding channels. To resolve this problem, we utilize the predicted handedness information. Depending on whether the prediction is for a left-hand or a right-hand image, our function locates the wrist point closer to the thumb and connects it with the other wrist point (wrist-line) and all fingertip points (finger-lines). Each finger-line then creates an angle with the wrist-line. Because of the hand geometry, the highest to the lowest angles are produced by the thumb-line, index-line, middle-line, ring-line, and little finger-line, respectively. We use this constraint to correct the misordered fingertips. The concept is highlighted in Fig. 5.
Fig. 5 Steps for hand keypoints misordering correction: a Depending on the left–right-hand prediction, the wrist point closest to the thumb is selected as ‘origin’ (the red dot). b Wrist-line and finger-lines are drawn by connecting origin with other wrist point and fingertip, respectively. c Based on hand geometry, the thumb-line creates the biggest angle and the little finger-line creates the smallest angle when joined with the wrist-line
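The reordering step can be sketched as below. The rule used to pick the ‘origin’ wrist point from the predicted handedness, and the assumption that the two wrist points can be ordered by image column, are illustrative guesses at details the text does not spell out.

```python
import numpy as np

def angle_deg(v1, v2):
    cos = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2) + 1e-8)
    return np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))

def reorder_fingertips(fingertips, wrist_points, is_right_hand):
    """fingertips: 5 predicted (row, col) points or None, in channel order."""
    pts = sorted(wrist_points, key=lambda p: p[1])            # order wrist points by column
    origin, other = (pts[0], pts[1]) if is_right_hand else (pts[1], pts[0])
    wrist_line = np.subtract(other, origin)
    visible = [(ch, p) for ch, p in enumerate(fingertips) if p is not None]
    # A larger angle between finger-line and wrist-line means closer to the thumb
    by_angle = sorted(visible, reverse=True,
                      key=lambda cp: angle_deg(np.subtract(cp[1], origin), wrist_line))
    corrected = [None] * len(fingertips)
    for (channel, _), (_, point) in zip(visible, by_angle):
        corrected[channel] = point                            # thumb-to-little ordering
    return corrected
```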
Table 2 Ablation study for the gesture, hand keypoints, and handedness detection tasks
              Gesture                     Fingertips                  Wrists                      Handedness
              Recall   Prec.   Acc.       Recall   Prec.   Acc.       Recall   Prec.   Acc.       Recall   Prec.   Acc.
Each branch   98.23    98.27   98.23      83.98    96.49   91.32      98.36    92.74   91.33      98.71    98.77   98.72
All branch    98.36    98.37   98.34      98.78    98.51   98.96      98.35    92.78   91.36      99.71    99.72   99.72
Wrist points localization: Wrist points are filtered using a different approach. The location corresponding to the highest pixel value in the sixth channel is assigned to the first wrist point. The location of the second-highest pixel value, which is more than d_th pixels away from the first wrist point, is assigned to the second wrist point. The distance threshold is used to impose the condition that the two wrist points should not be detected in close proximity. We empirically determined that d_th = 5 pixels produces very good results.
3.3 Loss Function The total loss is defined as: L = α L_keypoints + β L_gesture + γ L_handedness. The loss function has three parts corresponding to the three branches of the network. L_gesture, corresponding to branch 1, estimates the mean gesture classification error. A categorical cross-entropy function is used to compute this. L_handedness is the loss corresponding to the left–right-hand detection branch, and a binary cross-entropy function is used for this. Finally, the L_keypoints loss is computed after the fingertips and wrist points localization branch. A mean squared error is used to compute this. The parameters α, β, and γ are empirically estimated using a grid search method. We used α = 0.77, β = 0.15, and γ = 0.08.
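With named output branches (as in the sketch of Sect. 3.1), this joint objective can be expressed in tensorflow.keras through per-branch losses and loss weights; the output names below are those assumed in the earlier sketch, and the weight decay term is omitted here for brevity.

```python
import tensorflow as tf

# 'model' is the multi-task network sketched earlier, with outputs named
# 'gesture', 'handedness', and 'keypoints'.
model.compile(
    optimizer=tf.keras.optimizers.SGD(learning_rate=1e-3, momentum=0.95),
    loss={'keypoints': 'mse',
          'gesture': 'categorical_crossentropy',
          'handedness': 'binary_crossentropy'},
    loss_weights={'keypoints': 0.77, 'gesture': 0.15, 'handedness': 0.08})
```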
4 Experiment Our training dataset consists of frames from 20 users, and the test dataset is formed by frames from the last 4 users. All the results reported in this section are on the test dataset. Additionally, we evaluated the performance of our model on four online datasets [2, 11, 14, 17]. All the experiments are performed on a desktop with the Windows 10 operating system, an AMD RYZEN 9 3900X CPU, and a GTX 1080ti graphics card. For training our model, a batch size of 32 and 100 training epochs are used. Optimal model parameters are selected based on the cross-validation results on the training dataset. Ablation study: We computed the accuracy when separate models are trained for gesture classification, keypoints detection, and handedness prediction. Table 2 summarizes the results and compares them with the case when all branches are trained together.
Table 3 Hand gesture classification performance comparisons
Dataset   Method         G1     G2     G3     G4     G5     G6     G7     G8     G9     G10    Avg. acc.
D1 [2]    VGG16          100    75.7   N/A    97.7   84.3   100    N/A    99.3   100    N/A    93.86
          MobileNet      100    97.7   N/A    38.0   99.7   100    N/A    95.0   100    N/A    90.05
          InceptionNet   100    58.0   N/A    62.7   74.0   100    N/A    66.7   100    N/A    80.19
          Ours           100    99.7   N/A    94.7   96.3   100    N/A    100    100    N/A    98.67
D2 [11]   VGG16          82.8   N/A    33.8   41.0   83.8   N/A    N/A    84.3   N/A    N/A    64.76
          MobileNet      89.4   N/A    21.6   38.0   81.4   N/A    N/A    100    N/A    N/A    65.46
          InceptionNet   88.8   N/A    44.6   40.4   96.0   N/A    N/A    98.7   N/A    N/A    73.25
          Ours           88.4   N/A    60.0   71.0   97.4   N/A    N/A    93.6   N/A    N/A    81.87
D3 [17]   VGG16          65.5   43.6   N/A    84.7   82.9   N/A    N/A    N/A    35.0   N/A    63.90
          MobileNet      34.1   66.7   N/A    84.6   88.6   N/A    N/A    N/A    42.1   N/A    63.20
          InceptionNet   53.7   77.1   N/A    90.4   100    N/A    N/A    N/A    40.2   N/A    72.99
          Ours           72.5   98.0   N/A    99.0   93.7   N/A    N/A    N/A    78.5   N/A    88.32
Self      VGG16          94.9   97.5   97.6   85.9   99.0   99.4   95.2   95.9   97.3   88.7   95.17
          MobileNet      95.4   99.6   99.9   99.0   100    99.1   97.9   99.2   97.1   98.0   98.56
          InceptionNet   99.0   99.1   96.9   95.7   100    100    94.0   98.1   95.4   98.6   97.68
          Ours           99.8   99.8   94.0   100    99.7   100    97.1   99.7   98.1   95.3   98.34
As can be seen, the results are comparable for single-branch and multi-task learning except for fingertips detection. Fingertips detection heavily benefited from the information feedback of the other two branches. Gesture classification: We computed the gesture classification performance of our method on our dataset and 3 external datasets [2, 11, 17]. To compute the accuracy on the external datasets, we trained the model on our training dataset and generated the test results on those datasets. Since not all datasets have the same 10 gestures as our training dataset, we could only compute the accuracy for gestures that are common to the training dataset and the external datasets. Some of the datasets are challenging because of noisy hand segmentation data. Finally, we compared our results with the accuracy of well-known classification networks. All the results are reported in Table 3. Our network provides the best accuracy on the external datasets and near-best accuracy on the self-collected dataset. This suggests that the multi-task network can learn the tasks in a generalized manner. Hand keypoints detection: We compare the hand keypoints localization performance of YOLSE [26], the Unified Learning Approach (ULA) [1], and our method on an external dataset (D4) [14] and the self-collected dataset (Table 4). As can be seen, our results on both datasets for fingertips and wrist points localization are far superior to those of the state-of-the-art methods. For comparison, the forward pass time and the total number of network parameters for ULA are 18 ms and 20.5 million, for YOLSE 34 ms and 2.86 million, and for ours 27 ms and 6.19 million.
Table 4 Keypoints detection results
Dataset   Methods   Fingertips                    Wrists
                    Recall   Prec.   Acc.         Recall   Prec.   Acc.
D4 [14]   YOLSE     66.58    73.84   70.03        88.95    90.82   81.61
          ULA       37.44    89.36   57.34        45.86    88.19   66.36
          Ours      84.59    88.91   86.71        96.55    86.86   84.24
Self      YOLSE     74.70    85.11   81.87        94.16    89.31   84.62
          ULA       29.93    98.73   65.15        34.02    98.95   70.42
          Ours      98.51    98.78   98.96        98.35    92.78   91.36
Table 5 Handedness detection results
Methods     Recall   Prec.   Acc.
VGG16       96.45    96.73   96.50
MobileNet   99.82    99.83   99.83
Ours        99.71    99.72   99.72
Handedness detection: We compare handedness detection results with two well-known classification networks, VGG16 [21] and MobileNet [8] (Table 5). Our network performed almost as well as MobileNet. Noticeably, our network is much more lightweight compared to the other two networks. While VGG16 and MobileNet have 14.72 and 3.23 million parameters, respectively, for a single branch, our network has only 6.19 million parameters for all three branches. Even though MobileNet produces accuracy slightly above our reported accuracy, it does not have any multi-tasking capability. Our results are significant in the sense that we can simultaneously produce very high accuracy for all the tasks.
5 Discussion and Conclusion In this paper, we introduced a multi-task network that simultaneously learns to predict hand gestures, hand keypoints, and handedness from thermal image inputs. We also collected a dataset of 24 users performing the gestures. In our experimental validation, we showed the effectiveness of the network, as it learns to perform all the tasks with very high accuracy. We also showed that the network is able to learn generalized concepts, and as a result, it performs well on external datasets where other well-known networks fail. This work shows promise, especially in the current technology landscape where there is interest in interacting with intelligent devices in natural ways, such as using gestures. Moreover, our study is based on an alternative data modality that can be combined with color image data to build a better and more robust system. Our future
research directions will explore those possibilities. To the best of our knowledge, this is the first attempt to simultaneously learn the three proposed tasks using a single DL network with thermal images. In the future, we will explore combining different modalities of data in a single DL network pipeline for better performance. Multi-task deep learning is advantageous because it allows simultaneous learning of multiple correlated tasks. It therefore serves as a method of regularization, because it encourages learning only the features relevant to all tasks in the shared part of the network. This aids generalized learning. Our network shows this trait, as it performed far better than the other models on the external datasets. Multi-task learning can also be used to learn related intermediate tasks and use the knowledge from this intermediate learning to boost the performance of the final tasks [29]. One of our future research directions is adapting this idea to improve the performance of the three tasks we addressed in this work.
References 1. Alam MM, Islam MT, Rahman SMM (2021) A unified learning approach for hand gesture recognition and fingertip detection. CoRR. arXiv:2101.02047 2. Arya R. Hand gesture recognition dataset. https://www.kaggle.com/datasets/aryarishabh/handgesture-recognition-dataset 3. Ballow JM, Dey S (2022) Real-time hand gesture identification in thermal images. In: Sclaroff S, Distante C, Leo M, Farinella GM, Tombari F (eds) Image analysis and processing—ICIAP 2022. Springer International Publishing, Cham, pp 491–502 4. Chen ZH, Kim JT, Liang J, Zhang J, Yuan YB (2014) Real-time hand gesture recognition using finger segmentation. Sci World J 267872 5. Dardas NH, Georganas ND (2011) Real-time hand gesture detection and recognition using bagof-features and support vector machine techniques. IEEE Trans Instrum Measur 60(11):3592– 3607 6. Gao Q, Liu J, Ju Z (2020) Robust real-time hand detection and localization for space humanrobot interaction based on deep learning. Neurocomputing 390:198–206 7. Gately J, Liang Y, Wright MK, Banerjee NK, Banerjee S, Dey S (2020) Automatic material classification using thermal finger impression. In: Ro YM, Cheng WH, Kim J, Chu WT, Cui P, Choi JW, Hu MC, De Neve W (eds) Multimedia modeling. Springer International Publishing, Cham, pp 239–250 8. Howard AG, Zhu M, Chen B, Kalenichenko D, Wang W, Weyand T, Andreetto M, Adam H (2017) Mobilenets: efficient convolutional neural networks for mobile vision applications. arXiv:1704.04861 9. Hu Z, Zhu X (2019) Gesture detection from RGB hand image using modified convolutional neural network. In: 2019 2nd International conference on information systems and computer aided education (ICISCAE), pp 143–146 10. Ioffe S, Szegedy C (2015) Batch normalization: accelerating deep network training by reducing internal covariate shift, vol 37, pp 448–456. https://proceedings.mlr.press/v37/ioffe15.html 11. Jain K. Hand gesture dataset. https://www.kaggle.com/datasets/kritanjalijain/gestures-hand 12. Kim S, Ban Y, Lee S (2017) Tracking and classification of in-air hand gesture based on thermal guided joint filter. Sensors (Basel, Switzerland) 17 13. Li C, Kitani KM (2013) Pixel-level hand detection in ego-centric videos. In: 2013 IEEE conference on computer vision and pattern recognition, pp 3570–3577 14. Mantecón T, del Blanco CR, Jaureguizar F, García N (2019) A real-time gesture recognition system using near-infrared imagery. PLoS ONE 14
15. McBride TJ, Vandayar N, Nixon KJ (2019) A comparison of skin detection algorithms for hand gesture recognition. In: 2019 Southern African universities power engineering conference/robotics and mechatronics/pattern recognition association of South Africa (SAUPEC/RobMech/PRASA), pp 211–216 16. Mueller F, Bernard F, Sotnychenko O, Mehta D, Sridhar S, Casas D, Theobalt C (2018) GANerated hands for real-time 3D hand tracking from monocular RGB. In: 2018 IEEE/CVF conference on computer vision and pattern recognition, pp 49–59 17. Oshea R. Finger digits 0-5. https://www.kaggle.com/datasets/roshea6/finger-digits-05 18. Park M, Hasan MM, Kim J, Chae O (2012) Hand detection and tracking using depth and color information 19. Ronneberger O, Fischer P, Brox T (2015) U-net: convolutional networks for biomedical image segmentation, pp 234–241 20. Sato Y, Kobayashi Y, Koike H (2000) Fast tracking of hands and fingertips in infrared images for augmented desk interface. In: Proceedings fourth IEEE international conference on automatic face and gesture recognition (Cat. No. PR00580), pp 462–467 21. Simonyan K, Zisserman A (2015) Very deep convolutional networks for large-scale image recognition. CoRR. arXiv:1409.1556 22. Song E, Lee H, Choi J, Lee S (2018) AHD: thermal image-based adaptive hand detection for enhanced tracking system. IEEE Access 6:12156–12166 23. Sridhar S, Mueller F, Oulasvirta A, Theobalt C (2015) Fast and robust hand tracking using detection-guided optimization. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 3213–3221 24. Stergiopoulou E, Papamarkos N (2009) Hand gesture recognition using a neural network shape fitting technique. Eng Appl Artif Intell 22(8):1141–1158. https://doi.org/10.1016/j.engappai. 2009.03.008 25. Vandersteegen M, Reusen W, Beeck KV, Goedemé T (2020) Low-latency hand gesture recognition with a low resolution thermal imager. CoRR. arXiv:2004.11623 26. Wu W, Li C, Cheng Z, Zhang X, Jin L (2017) YOLSE: egocentric fingertip detection from single RGB images. In: 2017 IEEE international conference on computer vision workshops (ICCVW), pp 623–630 27. Wu D, Pigou L, Kindermans PJ, Le NDH, Shao L, Dambre J, Odobez JM (2016) Deep dynamic neural networks for multimodal gesture segmentation and recognition. IEEE Trans Pattern Anal Mach Intell 38(8):1583–1597 28. Xu C, Cai W, Li Y, Zhou J, Wei L (2020) Accurate hand detection from single-color images by reconstructing hand appearances. Sensors (Basel, Switzerland) 20 29. Xu D, Ouyang W, Wang X, Sebe N (2018) Pad-net: multi-tasks guided predictionand-distillation network for simultaneous depth estimation and scene parsing. CoRR. arXiv:1805.04409 30. Yao Z, Pan Z, Xu S (2013) Wrist recognition and the center of the palm estimation based on depth camera. In: 2013 International conference on virtual reality and visualization, pp 100–105
3D Point Cloud-Based Hand Gesture Recognition Soumi Paul, Ayatullah Faruk Mollah, Mita Nasipuri, and Subhadip Basu
Abstract The main focus of this paper is to estimate the class of 3D point cloud gestures using some statistical descriptors. To check the performance of the proposed descriptors, we have evaluated them on a public 3D hand gesture dataset. We used several classification techniques with different feature combinations and found that the random forest classifier provides the best performance. It achieves an accuracy of 95.20% on this public dataset. Keywords 3D hand gesture recognition · Point cloud · Sign language · Statistical descriptors
1 Introduction Hand gestures have been popular as forms of communication in human society for ages, and in modern days they have turned out to be an essential means of non-verbal communication amongst speech- and hearing-impaired people. As opposed to human-computer interaction devices such as keyboards, mice, or joysticks, their distinct characteristics are that they are intuitive, touchless, and non-invasive. However, despite decades of research in the domain, hand-operated devices are not commonly used in our daily lives. The currently developed systems offer reasonable reliability only in a controlled laboratory environment. S. Paul (B) · M. Nasipuri · S. Basu Department of Computer Science and Engineering, Jadavpur University, Kolkata, India e-mail: [email protected] M. Nasipuri e-mail: [email protected] S. Basu e-mail: [email protected] A. F. Mollah Department of Computer Science and Engineering, Aliah University, Kolkata, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 N. Das et al. (eds.), Proceedings of International Conference on Data, Electronics and Computing, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-99-1509-5_13
3D recognition has become a state-of-the-art approach in diverse research domains, such as object/gesture recognition and reconstruction, because it is simple, flexible, and powerful in terms of representation. Unlike triangle meshes, the point cloud does not need to store the polygonal mesh connectivity [1] or topological consistency [2]. Hence, the use of point clouds in pattern recognition can yield better performance. These prominent advantages make point cloud processing a hot research topic. In this paper, we focus on hand gesture recognition from 3D point clouds collected using a depth sensor, and then on the processing and classification of the data. There have been different low-cost depth sensors in the market, such as Kinect [3] and time-of-flight cameras [4]. These devices make it feasible to acquire point clouds in different domains. However, the raw point cloud data collected from these cameras include outliers [5, 6] and are affected by noise. The reasons behind such contamination of data are: limitations of sensor parameters, internal noise of the camera, the reflectivity of the surface, the surrounding light, or artifacts in the visibility range [7]. Therefore, the raw point cloud data need to be filtered strategically before they can be used for further processing. A systematic literature review of hand gesture recognition based on infrared information and machine learning algorithms can be found in [8]. Another recent work [9] tabulates the performance of different hand gesture methods. The motivation behind this paper stems from the fact that there is a huge body of work on two-dimensional image-based hand gesture recognition, and that area is saturated to some extent. Though research on 3D point clouds has gained some momentum, compared to the two-dimensional case there are many more variants remaining to be explored, and there is scope for attempting new types of features for comparable or better performance. In our current work, we derive geometrical features based on point cloud distances and angles, as well as convex hull and moments-based features. Moreover, we perform repeated zoning of the three-dimensional volume by slicing through vertical and horizontal planes and then derive the same features from each zone as was done for the original volume. Finally, we test the efficiency by running cross-validation with three state-of-the-art classifiers, namely bagging, J48, and random forest, on the public dataset, yielding an accuracy of up to 95.20%. The outline of this paper is as follows: Sect. 2 presents the dataset and preprocessing; Sect. 3 explains the details of computing different descriptors from the point cloud; Sect. 4 presents and discusses the experimental results for different classifiers; and Sect. 5 formulates the conclusions and future work.
2 Dataset and Preprocessing For this work, a public dataset of Polish Sign Language (PSL) [10] is used, which contains a set of hand postures performed with the right hand located about 20 cm from the face. This dataset was recorded by three people (named as person I, person II, and person III) and consists of symbols ‘a’, ‘b’, ‘c’, ‘e’, ‘i’, ‘l’, ‘m’, ‘n’, ‘o’, ‘p’,
Fig. 1 Sample gestures of the public PSL dataset (taken from [10])
‘r’, ‘s’, ‘t’, ‘u’, ‘w’, ‘y’. Each symbol is captured 20 times, so altogether the dataset contains 960 images. The postures have variable orientations and positions relative to the camera, and differ in the arrangement of fingers and thumbs. Sample gestures of this dataset are shown in Fig. 1. After dataset collection, the next important task is the elimination of noise, which consists of background objects such as parts of the hand, parts of the body, and parts of the surroundings. For the PSL dataset, visualization of the point cloud reveals that it contains the background and body parts, so extracting the points of interest is a challenging task. Noise elimination therefore happens in two phases: segmentation and outlier removal. The step-by-step process from preprocessing to feature extraction is shown in Fig. 2. The point coordinates are given as (X, Y, Z) values, and thresholding in the Z direction is required to segment the region of interest. First, the statistical mean and standard deviation of the set V of all the points are calculated with respect to the Z-coordinates only. Then, for each point, if its Z-coordinate is away from the mean by at least k times the standard deviation, where k is a suitably chosen parameter, that point is removed from the cloud. After substantial experimentation and parameter tuning, a suitable value of k was found to be 1.5 for the PSL dataset. The output of this algorithm is V_seg ⊆ V. The output of segmentation for the PSL dataset goes as input to outlier removal. The process for outlier removal is as follows. First, we extract the centroid of the point cloud. Then we calculate the Euclidean distances from the centroid C
to each point in V. Next, the mean and the standard deviation of these distances are calculated. As most of the points cluster in the hand region, the average value is also likely to lie close to that cluster, so any point with a much larger distance from the centroid than the average distance is likely to be an outlier. Hence, for each point, if its distance from the centroid exceeds the mean distance by t_h times the standard deviation, where t_h is a suitably chosen parameter, then that point is removed from the point cloud as an outlier. By repeated trials, a suitable value of t_h was found to be 3 for the PSL dataset. After removing these points, a more localized point cloud V ⊆ V_seg is found.
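A NumPy sketch of these two preprocessing phases is shown below; V_raw stands for the (N, 3) array of raw (X, Y, Z) points, and the exact inequality conventions are assumptions consistent with the description above.

```python
import numpy as np

def segment_depth(V, k=1.5):
    """Keep points whose Z-coordinate lies within k standard deviations of the mean."""
    z = V[:, 2]
    return V[np.abs(z - z.mean()) < k * z.std()]

def remove_outliers(V, t_h=3.0):
    """Drop points much farther from the centroid than the average distance."""
    centroid = V.mean(axis=0)
    d = np.linalg.norm(V - centroid, axis=1)
    return V[d < d.mean() + t_h * d.std()]

# V_raw: raw (N, 3) point cloud loaded from the dataset (hypothetical variable)
# V_clean = remove_outliers(segment_depth(V_raw, k=1.5), t_h=3.0)
```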
3 Feature Descriptor In this section, we describe in detail the features used to perform gesture recognition. A few notations are introduced here. Let H be the set of points on the three-dimensional convex hull of V = {V_1, . . ., V_m}. Let C and C_H be the centroids of V and H, respectively. Let the set of faces of the convex hull be denoted by F. Below, all the features used are grouped into different categories and described one by one.
3.1 Features from Points and Angles We have directly computed a group of features from the points and related angles in the point cloud. Point Cloud Distances. We calculate the distances of the vertices (V_i) from the centroid (C) of the point cloud. There are a total of m distances, namely d(C, V_i), 1 ≤ i ≤ m. After binning, 10 histogram values are obtained from these distances as the first ten features. The minimum, the maximum, the weighted average along a specified axis, and the standard deviation of these distances are also considered as features. Together, these values constitute the first feature set of fourteen values. Point Cloud Angles. For extracting this set of features, first, a line L between C and C_H is drawn. This line is orientation free, and it is used as a reference line. The angles between L and L_i, the latter being the lines between C and each point V_i, are considered as features. Then a histogram of these angles with 10 bins is taken, and again the minimum, the maximum, the average, and the standard deviation of all the angle values are considered as the next set of features. These values are the second feature set.
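A sketch of how the first two feature sets could be computed is shown below; a plain mean is used in place of the weighted average mentioned above, and scipy's ConvexHull supplies C_H, so the details are indicative rather than exact.

```python
import numpy as np
from scipy.spatial import ConvexHull

def hist_stats(values, bins=10):
    """10-bin histogram plus min, max, mean, and standard deviation (14 values)."""
    hist, _ = np.histogram(values, bins=bins)
    return np.concatenate([hist, [values.min(), values.max(), values.mean(), values.std()]])

def point_and_angle_features(V):
    C = V.mean(axis=0)                             # centroid of the point cloud
    C_H = V[ConvexHull(V).vertices].mean(axis=0)   # centroid of the convex hull points
    d = np.linalg.norm(V - C, axis=1)              # centroid-to-point distances
    L = C_H - C                                    # orientation-free reference line
    Li = V - C
    cos = (Li @ L) / (np.linalg.norm(Li, axis=1) * np.linalg.norm(L) + 1e-8)
    angles = np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))
    return np.concatenate([hist_stats(d), hist_stats(angles)])   # 28 values
```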
Fig. 2 Step-by-step visualization of PSL Database: a raw point cloud visualization, b segmented point of interest from background, c pre-processed after outlier removal, d convex hull of the pre-processed image, e the zoning process at level 0, yielding the four zones of level 1
3.2 Convex Hull Based Features This group relates to the convex hull of the point cloud and several of its geometric attributes. Convex Hull-Based Distances. First, the distances from C_H to all the points of H are calculated. Then a histogram of these distances with 10 bins is considered, and the minimum, the maximum, the average, and the standard deviation of these distances are also taken as features. These values are the third feature set. Convex Hull Face-Based Volumes. Next, the volume of each tetrahedron formed by four points is calculated: three points are the corners of a face f ∈ F, and C_H is the fourth point. A histogram of the volumes with 10 bins is calculated, and the minimum, the maximum, the average, and the standard deviation of all the volumes are also extracted. These values are the fourth feature set. Convex Hull Face-Based Distance. The shortest distances from C_H to each face f ∈ F are calculated. Then, a histogram of these distances with 10 bins and the minimum, the maximum, the average, and the standard deviation of these distances are extracted. These are the fifth feature set. Convex Hull Face-Based Area. As the next features, the areas of the triangular faces formed by the 3D convex hull points are calculated. A histogram of these areas with 10 bins is calculated, and the minimum, the maximum, the average, and the standard deviation of all these areas are extracted. These values are the sixth feature set. Convex Hull Face-Based Consecutive Angles. The consecutive angles of the faces formed by the 3D convex hull points are also calculated. A histogram of these angles with 10 bins and the minimum, the maximum, the average, and the standard deviation of all these angles are extracted. These values are the seventh feature set.
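The face-based quantities can be obtained from scipy's convex hull, as in the sketch below: for every triangular face, its area, the volume of the tetrahedron it forms with C_H, and the perpendicular distance from C_H to the face plane. Pooling into histograms and statistics would follow exactly as for the other feature sets; this is an illustrative implementation, not the authors' exact one.

```python
import numpy as np
from scipy.spatial import ConvexHull

def face_quantities(V):
    hull = ConvexHull(V)
    C_H = V[hull.vertices].mean(axis=0)
    areas, volumes, dists = [], [], []
    for simplex in hull.simplices:              # each hull face is a triangle (a, b, c)
        a, b, c = V[simplex]
        n = np.cross(b - a, c - a)              # unnormalized face normal
        areas.append(0.5 * np.linalg.norm(n))
        volumes.append(abs(np.dot(n, C_H - a)) / 6.0)                        # tetrahedron with C_H
        dists.append(abs(np.dot(n, C_H - a)) / (np.linalg.norm(n) + 1e-8))   # point-to-plane distance
    return np.array(areas), np.array(volumes), np.array(dists)
```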
3.3 Moments-Based Features The statistical moments up to the twelfth order of the entire point cloud are calculated as the next feature set; a moment is a specific quantitative measure of the shape of a set of points. The n-th moment of a variable X about a value c is E[(X − c)^n]. When c = E[X], these are called the central moments. The zeroth moment is the total probability, the first moment is the expected value, the second central moment is the variance, the third standardized moment is the skewness, and the fourth is the kurtosis.
3.4 Zoning So far we have discussed two types of features: 7 categories of non-moment features and a set of 13 different moments (of orders 0–12). From each non-moment category, a total of 14 features (10 from the histogram binning and the remaining 4 from the minimum, the maximum, the average, and the standard deviation) are extracted, giving a total of 7 × 14 = 98 features. From each of the 13 moments, three values are obtained, yielding an additional 3 × 13 = 39 features. So, adding the number of features from the non-moment and the moment categories, a total of 98 + 39 = 137 features are obtained. Note that all these 137 features are calculated from the entire point cloud; this is termed level 0. To go one step further, the point cloud region is divided into four zones, called the zones of level 1, by a horizontal and a vertical plane through the center of the point cloud, as shown in Fig. 2e. From each of these four zones, 137 features are again calculated as above. Now, each zone of level 1 is divided into four sub-zones, yielding a total of 4 × 4 = 16 zones of level 2. Further, each of these 16 zones of level 2 is divided into four sub-zones, giving a total of 16 × 4 = 64 zones of level 3. Thus, from these four levels, numbered 0–3, a total of 1 + 4 + 16 + 64 = 85 zones are created, each of which gives a set of 137 features. In total, 85 zones × 137 features per zone = 11,645 features are extracted. Figure 2 shows the visualization of the preprocessing, convex hull construction, and the zoning process. Note that while zoning, only the planes perpendicular to the X-axis and the Y-axis are considered; the plane perpendicular to the Z-axis is ignored. The reason is that the Z-depth of the hand is much smaller compared to the other two dimensions, and slicing perpendicular to the Z-direction is likely to cluster most points on one side only.
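The recursive zoning can be sketched as follows; the zone centre is taken here as the mean of the points in that zone (an assumption), the Z-axis is never sliced, and extract_137_features stands for the per-zone feature extractor described above. Empty sub-zones would need special handling in practice.

```python
import numpy as np

def split_into_four(V):
    cx, cy = V[:, 0].mean(), V[:, 1].mean()    # vertical and horizontal slicing planes
    return [V[(V[:, 0] <  cx) & (V[:, 1] <  cy)],
            V[(V[:, 0] <  cx) & (V[:, 1] >= cy)],
            V[(V[:, 0] >= cx) & (V[:, 1] <  cy)],
            V[(V[:, 0] >= cx) & (V[:, 1] >= cy)]]

def zone_features(V, extractor, level=0, max_level=3):
    feats = [extractor(V)]                     # 137 features for this zone
    if level < max_level:
        for zone in split_into_four(V):
            feats.extend(zone_features(zone, extractor, level + 1, max_level))
    return feats                               # 1 + 4 + 16 + 64 = 85 zones at max_level = 3

# features = np.concatenate(zone_features(V_clean, extract_137_features))  # 11,645 values
```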
4 Experimental Results For experimentation on the PSL dataset, we have used three classifiers: bagging, J48, and random forest. For each classifier, two types of cross-validation are performed: tenfold (denoted by CV10) and 20-fold (denoted by CV20). We have carried out this experimentation with seven different feature combinations. In each of these combinations, the non-moment features remain the same, and only the moment features are varied. Accordingly, these combinations are given different names, as follows: (1) M0-12, i.e., all the moments of order 0–12 are present; (2) M0-4, i.e., only the moments of order 0–4 are present; (3) M0691012, i.e., only the moments of order 0, 6, 9, 10, and 12 are present; (4) M0-4691012, i.e., only the moments of order 0 to 4 and 9, 10, and 12 are present; (5) M0-12PCA800, i.e., all the moments of order 0–12 with 800 principal components are present; (6) M0-12PCA500, i.e., all the moments of order 0–12 with 500 principal components are present; and (7) M0-12PCA100, i.e., all the moments of order 0–12 with 100 principal components are present.
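The evaluation protocol maps naturally onto scikit-learn, as in the sketch below; the original experiments appear to be Weka-style (J48 is a C4.5 decision tree), so the estimators shown here are close equivalents rather than the exact ones used.

```python
from sklearn.ensemble import BaggingClassifier, RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import cross_val_score

def evaluate(X, y):                            # X: (n_samples, 11645) features, y: labels
    classifiers = {'Bagging': BaggingClassifier(),
                   'J48-like tree': DecisionTreeClassifier(),
                   'Random forest': RandomForestClassifier(n_estimators=100)}
    for name, clf in classifiers.items():
        for folds in (10, 20):                 # CV10 and CV20
            acc = cross_val_score(clf, X, y, cv=folds).mean()
            print(f'{name}, CV{folds}: {100 * acc:.2f}%')
```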
Table 1 Classification performance on PSL dataset with different classifiers and feature combinations (in each combination, all the non-moment features are also present)
Feature combination   Bagging            J48                Random forest
                      CV10      CV20     CV10      CV20     CV10      CV20
M0-12                 89.89     89.47    76.87     81.14    95        95.20
M0-4                  89.68     88.22    76.97     81.35    94.27     93.85
M0691012              89.37     90.10    67.70     67.91    94.27     93.43
M0-4691012            89.58     90.52    76.97     81.25    94.68     94.47
M0-12PCA800           85.83     84.68    78.95     78.95    89.79     90.52
M0-12PCA500           85.20     84.58    79.68     78.95    91.56     91.87
M0-12PCA100           86.14     86.56    78.54     79.58    91.87     92.5
M0-12PCA800           84.16     86.35    75.10     76.25    89.37     88.75
M0-12PCA500           83.64     84.89    74.79     77.70    90.93     90.72
M0-12PCA100           86.04     85.41    78.85     78.33    91.66     91.97
The results of the experiments performed on the PSL dataset are given in Table 1. It is observed that the random forest classifier executed on the feature combination M0-12 with CV20 gives the best accuracy of 95.20%, and the same feature combination with CV10 is also quite close, i.e., 95%.
5 Conclusion This work focuses on point-cloud-based static hand gesture recognition using statistical features. To increase distinctiveness, first the features from the entire point cloud are extracted, and then the cloud is divided into multiple rectangular zones and the same types of features are extracted from each zone. One important aspect of these features is that they are orientation independent. Experimental results on a public dataset show that our method achieves good accuracy in real time.
The dataset we have worked with is small and only for static digit gestures. However, given more time and resources, more datasets can be experimented on with more samples. More investigation is needed to explore possible new features as well as effects of different classifiers. Future work in the area may also include the development of more effective methods for rejecting points belonging to the forearm, which may improve the accuracy of classification. Recognition of dynamic gestures in point clouds seems to be another interesting direction of research.
References 1. Botsch M, Pauly M, Kobbelt L, Alliez P, Levy B, Bischoff S, Roessl C (2007) Geometric modeling based on polygonal meshes. ACM SIGGRAPH 2007 Papers—International conference on computer graphics and interactive techniques 2. Preiss K (1982) Topological consistency rules for general finite element meshes. In: Pipes A (ed) CAD82. Butterworth-Heinemann, pp 453–460 3. Han J, Shao L, Xu D, Shotton J (2013) Enhanced computer vision with microsoft kinect sensor: a review. IEEE Trans Cybern 43:1318–1334 4. Park J, Kim H, Tai YW, Brown MS, Kweon I (2011) High quality depth map upsampling for 3D-TOF cameras. In: 2011 International conference on computer vision. IEEE, pp 1623–1630 5. Landa J, Procházka D, Stastny J (2013) Point cloud processing for smart systems. Acta Univ Agric Silvic Mendelianae Brunensis 61:2415–2421 6. Xie H, McDonnell KT, Qin H (2004) Surface reconstruction of noisy and defective data sets. In: IEEE visualization 2004. IEEE, pp 259–266 7. Zaman F, Wong YP, Ng BY (2017) Density-based denoising of point cloud. In: 9th International conference on robotic, vision, signal processing and power applications. Springer, Berlin, pp 287–295 8. Nogales RE, Benalcázar ME (2021) Hand gesture recognition using machine learning and infrared information: a systematic literature review. Int J Mach Learn Cybern 1–28 9. Oudah M, Al-Naji A, Chahl J (2020) Hand gesture recognition based on computer vision: a review of techniques. J Imaging 6:73 10. Kapuscinski T, Oszust M, Wysocki M, Warchol D (2015) Recognition of hand gestures observed by depth cameras. Int J Adv Rob Syst 12:36
Motion-Based Representations for Trajectory-Based Hand Gestures: A Brief Overview Debajit Sarma, Trishna Barman, M. K. Bhuyan, and Yuji Iwahori
Abstract Action/gesture representation, especially the modeling of actions/gestures, has a special role in the recognition process. In this paper, we primarily look at motion-based hand gesture representations, which are widely used but less talked about. Model-based and appearance-based methods are the two primary techniques for hand gesture representation. Apart from these two, motion-based approaches have achieved quite impressive performance in various applications. Many researchers generally include motion-based methods among appearance-based methods, but here we discuss the motion-based methods separately, with special attention to representing hand gestures. Most representations generally depend on the shape, size, and color of the body or body part, but these may vary depending on many factors, e.g., illumination variation, image resolution, skin color, clothing, etc. Motion estimation, however, should be independent of these factors. Optical flow and motion templates are the two major motion-based representation schemes that can be used directly to describe human gestures/actions. The main benefits of these techniques are their simplicity, ease of implementation, competitive performance, and efficiency. Keywords Action and gesture recognition · MEI–MHI · Dynamic image
D. Sarma (B) · M. K. Bhuyan Department of Electronics and Electrical Engineering, Indian Institute of Technology (IIT) Guwahati, Guwahati, Assam 781039, India e-mail: [email protected] M. K. Bhuyan e-mail: [email protected] T. Barman Department of Electronics and Communication Engineering, Tezpur University, Tezpur, Assam 784028, India e-mail: [email protected] Y. Iwahori Department of Computer Science, Chubu University, Kasugai 487-8501, Japan e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 N. Das et al. (eds.), Proceedings of International Conference on Data, Electronics and Computing, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-99-1509-5_14
1 Introduction The primary task of gesture-based interfaces is to detect and recognize visual information for communication. The straightforward approach to a gesture-based recognition system is to acquire visual information about a person in a certain environment and try to extract the necessary gestures. This must be performed as a sequence of stages, namely acquisition, detection, and preprocessing; gesture representation and feature extraction; and recognition (Fig. 1). In the literature [12, 30, 32], it is mentioned that gestures are represented by either a model-based or an appearance-based model (shown in Fig. 2). Here, we analyze motion-based representations specifically for trajectory-based gesture recognition. But before going into details, we would like to recap the other methods and also discuss why a specific representation is required for trajectory-based or dynamic gestures. A gesture must be represented using a suitable model for its recognition. Based on feature extraction methods, gesture representations are of the following types: model based and appearance based (Fig. 2).
1. Model based: Here gestures can be modeled using either a 2D model or a 3D model. The 2D model essentially relies on either different color-based models such as RGB, HSV, and YCbCr, or on silhouettes or contours obtained from 2D images. The deformable Gabarit model relies on the arrangement of active deformable shapes. 3D models, in turn, can be classified into mesh models [22], geometric models, volumetric models, and skeletal models [37]. The volumetric model represents hand motions with high accuracy. The skeletal model reduces the hand gesture to a set of equivalent joint angle parameters with segment lengths. For instance, Rehg and Kanade [31] used a 27 degree-of-freedom (DOF) model of the human hand in their framework called 'Digiteyes'. Local image-based trackers are used to align the projected model lines to the finger edges against a solid background. The work of Goncalves et al. [18] advanced three-dimensional tracking of the human arm using a two-cone arm model and a single camera in a uniform background. One significant drawback of model-based representation using a single camera is self-occlusion [18], which often happens in articulated objects like a hand. To avoid it, a few frameworks use multiple/stereo cameras and restrict the motion to small regions [31]. But this also has its own disadvantages in terms of precision, accuracy, etc. [12].
2. Appearance based: The appearance-based model attempts to recognize gestures either directly from visual images/videos or from features derived from the raw data. Inputs to such models may be either the image sequences themselves or a few features obtained from the images, which can be used for hand-tracking or classification purposes. For instance, Wilson and Bobick [40] presented results using activities, mostly hand motions, where the actual grayscale images (with no background) are used in the action representation. Rather than using raw grayscale images, Yamato et al. [44] used body silhouettes, and Akita [2] used body shapes/edges. Yamato et al. [44] used low-level silhouettes of human activities in a hidden Markov model (HMM) system, where
Fig. 1 Basic architecture of a typical gesture recognition system
Fig. 2 Different hand models for hand gesture representation
binary silhouettes of background-subtracted images are vector quantized and used as input to the HMMs. In Akita's work [2], edges and some simple two-dimensional body configuration information were used to determine the body parts in a hierarchical way (first find the legs, then the head, arms, and trunk) based on stability. When two- or three-dimensional structural data are used, individual features or properties have to be extracted and tracked from each frame of the video sequence. Consequently, motion understanding is essentially accomplished by recognizing a sequence of static configurations, which requires prior detection and segmentation of the object. Furthermore, from the early days, sequential state-space models such as generative hidden Markov models (HMMs) [26] and discriminative conditional random fields (CRFs) [6] have been proposed to model the dynamics of action/gesture videos. Temporal ordering models like dynamic time warping (DTW) [3] have likewise been applied to dynamic action/gesture recognition, where an incoming gesture is matched to a set of pre-defined representations. In both of the above-mentioned representations, feature extraction is one of the most important steps, and most feature extraction procedures need to segment the body/body part from the background of the image, which can be done using a color scheme. But this approach is not reliable due to the large variation in both skin color and luminance. Even the use of the chrominance component, as in the YCbCr or HSV color space, may not give the required segmentation accuracy [33, 34]. Most applications using feature extraction use domain-based models and thus provide very specialized solutions. Moreover, due to the large variability of motion patterns in a video, latent sequential models (i.e., state-space models like HMM and CRF) may not be very efficient [17]. A problem with the sequence
matching (i.e., DTW) approach is that the high variety of actions/gestures executed by different kinds of people cannot be matched [5]. To cope with these problems, there is a need for motion-based representation. Optical flow and motion templates are the two major motion-based representation schemes and can be used directly to describe human gestures/actions [35]. Optical flow gives the motion flow of the moving object frame after frame, whereas motion templates describe the video-wide temporal evolution of the video-level dynamics or appearance of the motion.
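To make the color-scheme segmentation mentioned above concrete, the following is a minimal sketch, not the authors' pipeline, of chrominance-based skin masking in the YCbCr space using OpenCV; the function name and the fixed Cb/Cr ranges are illustrative assumptions, and such fixed thresholds are exactly what breaks down under varied skin tones and lighting.

```python
import cv2
import numpy as np

def skin_mask_ycbcr(frame_bgr, cb_range=(77, 127), cr_range=(133, 173)):
    """Return a binary skin mask using fixed Cb/Cr chrominance bounds.

    The bounds are commonly quoted heuristics, not values from this paper;
    they illustrate why purely color-based hand segmentation is brittle.
    """
    ycrcb = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2YCrCb)  # OpenCV channel order: Y, Cr, Cb
    _, cr, cb = cv2.split(ycrcb)
    mask = ((cb >= cb_range[0]) & (cb <= cb_range[1]) &
            (cr >= cr_range[0]) & (cr <= cr_range[1])).astype(np.uint8) * 255
    # Light morphological clean-up of the mask
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5, 5))
    mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)
    return mask
```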
2 Motion-Based Approaches The shape and appearance of the body/body part depend on many factors, e.g., clothing, illumination variation, image resolution, etc. But the estimation of the motion field is invariant to shape and appearance (at least in theory) and can be used directly to describe human gestures/actions. Optical flow and motion templates are the two main motion-based representation methods.
2.1 Optical Flow Techniques Optical flow is the apparent motion or displacement of objects/pixels as perceived by an observer. Optical flow indicates the change in image velocity of a point moving in the scene, also called a motion field. Here the goal is to estimate the motion field (velocity vector), which can be computed from the horizontal and vertical flow fields. Ideally, the motion field represents the 3D motion of the points of an object across 2D image frames for a definite frame interval. The following points are considered or assumed in estimating optical flow.
• Brightness constancy: Flow is independent of illumination changes in the scene.
• Small motion: Points do not move very far in consecutive frames.
• Spatial coherence: Points move like their neighbors.
• Unwanted object motion: Motion of unwanted objects like shadows should not affect the optical flow.
Out of the different optical flow techniques found in the literature, the most common methods are (a) Lucas-Kanade [27], (b) Horn-Schunck [19], (c) Brox 2004 [10], (d) Brox 2011 [11], and (e) Farneback [15]. All these methods presume the above-mentioned criteria, and the choice of optical flow method primarily depends on the power of the resulting histogram of optical flow (HOF) descriptor or motion boundary histogram (MBH) descriptor. HOF gives the optical flow vectors in the horizontal and vertical directions. The basic idea of MBH is to represent the oriented gradients
computed over the vertical and horizontal optical flow components. Once the horizontal and vertical optical flow components are acquired, histograms of oriented gradients are computed on each flow component. The result of this process is a pair of horizontal (MBHx) and vertical (MBHy) descriptors. Laptev et al. [25] combined HOG and HOF descriptors for recognizing realistic human actions in movies. Dalal et al. [13] additionally proposed to compute derivatives of the optical flow, focusing on optical flow differences between frames (motion boundaries). Yacoob and Davis [43] used optical flow estimates to track pre-defined polygonal patches placed on regions of interest for facial expression recognition. Wixson [41] introduced an integrated approach in which the optical flow is accumulated frame by frame over time by considering the consistency of direction. In [28], optical flow was used to detect the direction of motion along with the RANSAC algorithm, which in turn helped to further localize the motion points. In [21], the authors used optical flow guided trajectory images for dynamic hand gesture recognition with a deep learning-based classifier. But the main problem with the optical flow technique is its high computational cost. Moreover, it is very sensitive to noise and outliers due to background motion. Here we explain the basic concept of mitigating different constraints related to optical flow estimation, assuming a fixed camera position. To get rid of the problem of noise and outliers due to background motion, a Gaussian smoothing operation is applied to the image frame k(x, y, t), where (x, y) denotes the location of the pixel and t denotes time. Smoothing is done prior to differentiation by convolving each frame with a Gaussian kernel $G_\sigma(x, y)$ of standard deviation $\sigma$:

$$I(x, y, t) := (G_\sigma * k)(x, y, t) \tag{1}$$
The low-pass effect of Gaussian convolution removes noise and other destabilizing high-frequency outliers. In a subsequent procedure, σ, also called the 'noise scale', can be chosen with different values. While some moderate pre-smoothing improves the results, great care should be taken not to apply too much pre-smoothing, since this would severely destroy important image structure. Another problem is tracking points that move long distances with a higher speed of motion. This can be mitigated by coarse-to-fine optical flow estimation using an image pyramid. All these steps are shown in Fig. 3. When applying the single-scale Lucas-Kanade optical flow algorithm, it is assumed that the window has little motion, so that higher-order terms in the Taylor expansion can be ignored. But this assumption fails for objects with long-distance movement in consecutive frames. In this case, the iterative coarse-to-fine method helps a lot; it is applied to an image pyramid built from multiple copies of each image frame at different resolutions. Each level in the pyramid is one-fourth of the size of the previous higher-resolution level. To get rid of the small motion constraint, we first start from the lowest-resolution level. The iterative optical flow is then used to estimate the potential motion velocity at this level, which is then expanded to a higher-resolution level through warping and up-sampling. This is done because a lower-resolution image can provide
Fig. 3 Steps to obtain optical flow from input video frames
better optical flow for large motion compared to a higher-resolution image. So we first start at a coarse resolution and warp it to a fine resolution through interpolation. The main drawback of this technique is that it makes the computation somewhat more expensive. In the iterative process, the potential optical flow is estimated at one level on the window corresponding to one pixel, and the estimated vector is then reapplied to warp the image to a new position. This process is repeated for several iterations until the residual motion is sufficiently small.
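The Gaussian pre-smoothing of Eq. (1) and the coarse-to-fine estimation described above can be sketched with OpenCV as follows; this is an illustrative approximation rather than the authors' implementation (Farneback's dense flow builds its own internal pyramid), and all parameter values are assumptions.

```python
import cv2
import numpy as np

def dense_flow(prev_gray, next_gray, sigma=1.5):
    """Gaussian pre-smoothing (Eq. 1) followed by coarse-to-fine dense flow.

    prev_gray, next_gray: consecutive grayscale (uint8) frames.
    """
    # I(x, y, t) := (G_sigma * k)(x, y, t): suppress noise before differentiation
    prev_s = cv2.GaussianBlur(prev_gray, (0, 0), sigma)
    next_s = cv2.GaussianBlur(next_gray, (0, 0), sigma)
    # Farneback dense flow; 'levels' realizes the image pyramid for large displacements
    flow = cv2.calcOpticalFlowFarneback(
        prev_s, next_s, None,
        pyr_scale=0.5, levels=4, winsize=15,
        iterations=3, poly_n=5, poly_sigma=1.2, flags=0)
    return flow  # flow[..., 0] = horizontal, flow[..., 1] = vertical component

def hof_descriptor(flow, bins=9):
    """Histogram of optical flow orientations, weighted by flow magnitude."""
    mag, ang = cv2.cartToPolar(flow[..., 0], flow[..., 1])
    hist, _ = np.histogram(ang, bins=bins, range=(0, 2 * np.pi), weights=mag)
    return hist / (hist.sum() + 1e-8)
```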
2.2 Motion Templates In this section, we describe three major methods that have been used specifically in dynamic gesture recognition, where researchers have attempted to characterize the motion of the body/body part by converting the whole video dynamics into a single image. Basically, these images are compact representations of videos useful for video analysis, where a single image summarizes the appearance and dynamics of the whole video sequence. Hence, these images are named motion-fused images or temporal templates. Let us briefly discuss three widely used motion fusion strategies, namely MEI–MHI, dynamic images, and methods based on PCA, and also explain how to obtain them.
1. MEI–MHI: The pioneering work on motion-fused images is the motion-energy image (MEI) and motion-history image (MHI) of Davis and Bobick [9] in the year 2001. MEI and MHI are proposed to represent the motion evolution of an object in a video, where all the frames in the video sequence are projected onto one image across the temporal axis. This was the start of a novel approach in which a complete dynamic video is converted, via motion templates, into a single image. MEI represents where motion has occurred in an image sequence, whereas MHI represents how the object is moving (Fig. 4). MEI describes the motion shape and spatial distribution of a motion, and MHI is a function of the intensity of motion of each pixel at that location. The advantage of the MEI–MHI representation is that a range of frames may be encoded into a single
Fig. 4 MEI and MHI example from [9]
frame, and in this way it compresses the timescale of human actions/gestures. Moreover, MEI can be generated by thresholding the MHI above zero. To make the system view invariant, the authors of [9] used seven Hu moments [20], which are translation and scale invariant. For each view of each movement, a statistical model of the moments (mean and covariance matrix) is generated for both the MEI and the MHI. To recognize an input movement, the Mahalanobis distance is calculated between the moment description of the input and each of the known movements. The grayscale MHI is sensitive to the direction of motion, unlike the MEIs, and is hence better suited for discriminating between actions of opposite directions (e.g., 'sitting down' vs. 'standing up'). Though many modifications of the MEI–MHI implementation exist in the literature [1], it still has some crucial problems [1]. First, it fails to separate the motion information when there is self-motion-occlusion or overwriting of prior information, for example if a person sits down and then stands up. Second, a change in the standing position of a person while executing an action may produce false recognition of the action. Third, the MEI–MHI method in its basic representation (which is based on background subtraction or image differencing approaches) is not suitable for dynamic backgrounds; there is always a requirement of having stationary objects in the background. Also, it is unable to discriminate among similar motions, which makes it difficult to employ for recognition purposes. This is because the
Fig. 5 Dynamic images summarizing the actions and motions that happen in (from left to right and top to bottom): blowing hair dry, band marching, balancing on beam, golf swing, fencing, and playing the cello [8]
MEI–MHI method takes into account the global motion calculation of the image frames, which depends on the variance in movement duration. MEI–MHI is a preferred representation for action recognition only when temporal segmentation is available and the actors are fully visible and can be separated from each other. The major advantage of the MEI–MHI method is its simplicity and low computational complexity. MEI–MHI can be implemented by the following algorithm, and the outcome is shown in Fig. 4.
MEI–MHI Algorithm [9]:
• Image sequences
$$I(x, y, t) = (I_1, I_2, \ldots, I_n) \tag{2}$$
$$B(x, y, t) = |I(x, y, t) - I(x, y, t-1)| \tag{3}$$
• Image binarization
Fig. 6 Principal motion components for the gesture dataset of helicopter signals: Each row is associated with a different gesture, the first three columns of each row display top three principal motion components of the gesture; columns 4–6 show the MHI, motion maps, and a visual description of the corresponding gesture, respectively [14]
where
$$B(x, y, t) = \begin{cases} 1 & \text{if } B(x, y, t) > \xi \\ 0 & \text{otherwise} \end{cases}$$
• MEI
$$E_\tau(x, y, t) = \bigcup_{i=0}^{\tau-1} B(x, y, t-i) \tag{4}$$
• MHI
$$H_\tau(x, y, t) = \begin{cases} 1 & \text{if } B(x, y, t) = 1 \\ \max\big(0,\, H_\tau(x, y, t-1) - \delta\big) & \text{otherwise} \end{cases}$$
where τ decides the temporal extent of the motion in terms of frames and δ is the decay parameter.
2. Dynamic images: The dynamic image (DI) [8] (shown in Fig. 5) is a novel video-wide temporal evolution representation. It captures the video-wide temporal dynamics of a video by converting it into a single image, suitable for action/gesture recognition. It is observed that even if the execution time of actions varies greatly, the temporal ordering is typically preserved. So the dynamic image generally uses a technique called rank pooling, which is the process of ranking the frame content to capture the video-wide temporal evolution and pool the whole video into a single image [17]. Major advantages of rank pooling are (a)
rank pooling is useful and robust for encoding video-wide temporal information, and (b) since it does not extract any trajectories or other more sophisticated features, it is not computationally expensive. So this novel dynamic image is a simple, efficient, compact, and very powerful method to extract the video-wide temporal evolution into a single image, particularly useful in the context of deep learning. Another notable advantage of DI compared to other classical methods is that it performs quite well for both fast/slow and short/long actions. Normally, classical methods are applicable only to slow (< 30 frames per second) and short (only a few seconds) videos. In such cases, the dynamic image method is applicable wherever there exist characteristic motion patterns and dynamics [8]. In [7], the authors mention MHI as a direct competitor to the dynamic image method. They show that DIs provide a more detailed representation of the videos, as the range of intensity values is not limited to the number of frames as in MHIs. Second, DIs are more robust to moving viewpoints, long-range motion, and background motion. Finally, in contrast to DIs, MHIs can only represent the motion gradient at object boundaries. In [16], the authors presented a hierarchical rank pooling method that consists of a network of nonlinear operations and rank pooling layers. It has shown substantial performance improvement over other temporal encoding and pooling methods such as max-pooling [24], average pooling [24], rank pooling [7], temporal pyramids [24], and LSTMs [38].
Dynamic Image (DI) Algorithm [8]:
• Image sequences
$$V = (I_1, I_2, \ldots, I_T) \tag{5}$$
• Time-average feature
$$\psi(t) = \frac{1}{t}\sum_{\tau=1}^{t} \psi(I_\tau) \tag{6}$$
• Dynamic image
$$d^* = \rho(I_1, I_2, \ldots, I_T; \psi) = \operatorname*{argmin}_{d} E(d) \tag{7}$$
• Optimization problem
$$E(d) = \frac{\lambda}{2}\|d\|^2 + \frac{2}{T(T-1)}\sum_{k>l} \max\big(0,\, 1 - S(k|d) + S(l|d)\big) \tag{8}$$
where k > l ⇒ S(k|d) > S(l|d), i.e., later times are given a larger score.
3. PCA and robust PCA using the PCP method: The use of principal component analysis (PCA) as a foreground-detection technique is well known in various applications like object detection [29], pedestrian detection [23], and video surveillance. But there are only a few instances where a PCA-based method is used for gesture [14] (shown in Fig. 6) or activity [4] recognition. Robust PCA is a matrix factorization method that decomposes the input matrix I into the sum of two matrices,
i.e., I = L + S, where L is a low-rank matrix and S is a sparse matrix. The background sequence is then modeled by a low-rank subspace that generally changes gradually over time, while the moving foreground objects are constituted by the correlated sparse matrix. This is done by solving the following optimization problem, called principal component pursuit (PCP):
$$\min_{L, S} \|L\|_* + \lambda \|S\|_1 \quad \text{s.t.} \quad L + S = I \tag{9}$$
where $\| \cdot \|_*$ and $\| \cdot \|_1$ are the nuclear norm (which is the $l_1$-norm of the singular values) and the $l_1$-norm, respectively, and λ > 0 is a balancing parameter. Major advantages of the PCA-based method are [14] (a) it performs quite well on both RGB and depth video, and (b) it is particularly well suited for the case when motion happens in different locations of the image stream.
4. Additional comments: The main disadvantage of all three motion-template methods lies in representing static gestures, or cases where a user remains static while performing some gesture/action in the video. Moreover, there may also be some difficulty in generating these motion templates if the background contains a moving object. Motion estimation of the image pixels is the key factor in optical flow, whereas in motion templates, the video-wide temporal evolution and its representations are widely used for action/gesture recognition. Both of these methods have their own advantages and are accordingly applied in motion analysis and other related applications. There are also a few examples, like [35, 36, 39, 42], where these two methods are combined.
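As a compact illustration of the MEI–MHI templates of Sect. 2.2, the sketch below follows Eqs. (2)–(4) and the MHI update under the simplifying assumptions of a static camera and simple frame differencing; the threshold, temporal extent, and decay values are illustrative, not taken from [9].

```python
import numpy as np

def motion_templates(frames, xi=25, tau=30, delta=None):
    """MEI/MHI sketch following the algorithm above, with values normalized to [0, 1].

    frames: list of grayscale (H x W, uint8) frames.
    xi: binarization threshold, tau: temporal extent in frames,
    delta: per-frame decay; 1/tau by default so the history fades over roughly tau frames.
    These defaults are illustrative assumptions, not values from [9].
    """
    if delta is None:
        delta = 1.0 / tau
    h, w = frames[0].shape
    H = np.zeros((h, w), dtype=np.float32)   # motion-history image H_tau
    E = np.zeros((h, w), dtype=np.uint8)     # motion-energy image E_tau
    recent = []                              # last tau binarized difference images B
    for t in range(1, len(frames)):
        diff = np.abs(frames[t].astype(np.int16) - frames[t - 1].astype(np.int16))
        B = (diff > xi).astype(np.uint8)                        # Eq. (3) followed by thresholding
        recent = (recent + [B])[-tau:]
        E = np.bitwise_or.reduce(np.stack(recent), axis=0)      # Eq. (4): MEI as union of B
        H = np.where(B == 1, 1.0, np.maximum(0.0, H - delta))   # MHI update with decay delta
    return E, H   # E could equivalently be obtained by thresholding H above zero
```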
3 Conclusion In this paper, we have analyzed various motion-based representation techniques for hand gestures and, in general, action recognition. Optical flow and motion templates are the two major motion-based representation schemes. Motion estimation of the image pixels is the key factor in optical flow, whereas in motion templates, the video-wide temporal evolution and its representations are widely used for action/gesture recognition. Action recognition is widely applied in intelligent video surveillance, injury and abnormal behavior recognition, action analysis and video retrieval, human–computer interaction, etc. On the other hand, gestures constitute a common and natural means of non-verbal communication. The applications of gesture recognition cover various domains, ranging from sign language to medical assistance to virtual reality. There is a broad range of areas where these gesture representation techniques can be applied, specifically in activity recognition, gesture recognition, video surveillance, etc.
References 1. Ahad MAR, Tan JK, Kim H, Ishikawa S (2012) Motion history image: its variants and applications. Mach Vis Appl 23(2):255–281 2. Akita K (1984) Image sequence analysis of real world human motion. Pattern Recognit 17(1):73–83 3. Alon J, Athitsos V, Yuan Q, Sclaroff S (2009) A unified framework for gesture recognition and spatiotemporal gesture segmentation. IEEE Trans Pattern Anal Mach Intell 31(9):1685–1699 4. Arunraj M, Srinivasan A, Juliet AV (2018) Online action recognition from RGB-D cameras based on reduced basis decomposition. J Real-Time Image Process 1–16 5. Barros P, Magg S, Weber C, Wermter S (2014) A multichannel convolutional neural network for hand posture recognition. In: International conference on artificial neural networks. Springer, Berlin, pp 403–410 6. Bhuyan MK, Kumar DA, MacDorman KF, Iwahori Y (2014) A novel set of features for continuous hand gesture recognition. J Multimodal User Interfaces 8(4):333–343 7. Bilen H, Fernando B, Gavves E, Vedaldi A (2017) Action recognition with dynamic image networks. IEEE Trans Pattern Anal Mach Intell 8. Bilen H, Fernando B, Gavves E, Vedaldi A, Gould S (2016) Dynamic image networks for action recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 3034–3042 9. Bobick AF, Davis JW (2001) The recognition of human movement using temporal templates. IEEE Trans Pattern Anal Mach Intell 23(3):257–267 10. Brox T, Bruhn A, Papenberg N, Weickert J (2004) High accuracy optical flow estimation based on a theory for warping. In: European conference on computer vision. Springer, Berlin, pp 25–36 11. Brox T, Malik J (2011) Large displacement optical flow: descriptor matching in variational motion estimation. IEEE Trans Pattern Anal Mach Intell 33(3):500–513 12. Chakraborty BK, Sarma D, Bhuyan M, MacDorman KF (2017) Review of constraints on visionbased gesture recognition for human-computer interaction. IET Comput Vis 12(1):3–15 13. Dalal N, Triggs B, Schmid C (2006) Human detection using oriented histograms of flow and appearance. In: European conference on computer vision. Springer, Berlin, pp 428–441 14. Escalante HJ, Guyon I, Athitsos V, Jangyodsuk P, Wan J (2017) Principal motion components for one-shot gesture recognition. Pattern Anal Appl 20(1):167–182 15. Farnebäck G (2003) Two-frame motion estimation based on polynomial expansion. In: Scandinavian conference on image analysis. Springer, Berlin, pp 363–370 16. Fernando B, Anderson P, Hutter M, Gould S (2016) Discriminative hierarchical rank pooling for activity recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 1924–1932 17. Fernando B, Gavves E, Oramas J, Ghodrati A, Tuytelaars T (2017) Rank pooling for action recognition. IEEE Trans Pattern Anal Mach Intell 39(4):773–787 18. Goncalves L, Di Bernardo E, Ursella E, Perona P (1995) Monocular tracking of the human arm in 3d 19. Horn BK, Schunck BG (1981) Determining optical flow. Artif Intell 17(1–3):185–203 20. Hu MK (1962) Visual pattern recognition by moment invariants. IRE Trans Inf Theory 8(2):179–187 21. Kavyasree V, Sarma D, Gupta P, Bhuyan M (2020) Deep network-based hand gesture recognition using optical flow guided trajectory images. In: 2020 IEEE applied signal processing conference (ASPCON). IEEE, pp 252–256 22. Keskin C, Kıraç F, Kara YE, Akarun L. Real time hand pose estimation using depth sensors. In: Consumer depth cameras for computer vision. Springer, Berlin, pp 119–137 23. 
Kim H et al (2013) Novel and efficient pedestrian detection using bidirectional PCA. Pattern Recognit 46(8):2220–2227
24. Lan Z, Lin M, Li X, Hauptmann AG, Raj B (2015) Beyond gaussian pyramid: multi-skip feature stacking for action recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 204–212 25. Laptev I, Marszalek M, Schmid C, Rozenfeld B (2008) Learning realistic human actions from movies. In: IEEE conference on computer vision and pattern recognition. CVPR 2008. IEEE, pp 1–8 26. Lee HK, Kim JH (1999) An HMM-based threshold model approach for gesture recognition. IEEE Trans Pattern Anal Mach Intell 21(10):961–973 27. Lucas BD, Kanade T et al (1981) An iterative image registration technique with an application to stereo vision 28. Mahbub U, Imtiaz H, Ahad MAR (2011) An optical flow based approach for action recognition. In: 14th International conference on computer and information technology (ICCIT 2011). IEEE, pp 646–651 29. Malagón-Borja L, Fuentes O (2009) Object detection using image reconstruction with PCA. Image Vis Comput 27(1–2):2–9 30. Pavlovic VI, Sharma R, Huang TS (1997) Visual interpretation of hand gestures for humancomputer interaction: a review. IEEE Trans Pattern Anal Mach Intell 19(7):677–695 31. Rehg JM, Kanade T (1995) Model-based tracking of self-occluding articulated objects. In: Fifth international conference on computer vision. Proceedings. IEEE, pp 612–617 32. Sarma D, Bhuyan M (2021) Methods, databases and recent advancement of vision-based hand gesture recognition for HCI systems: a review. SN Comput Sci 2(6):1–40 33. Sarma D, Bhuyan M (2022) Hand detection by two-level segmentation with double-tracking and gesture recognition using deep-features. Sens Imaging 23(1):1–29 34. Sarma D, Bhuyan MK (2018) Hand gesture recognition using deep network through trajectoryto-contour based images. In: 15th IEEE India council international conference (INDICON), pp 1–6 35. Sarma D, Bhuyan MK (2020) Optical flow guided motion template for hand gesture recognition. In: Proceedings of the 2nd IEEE conference on applied signal processing (ASPCON) 36. Sarma D, Kavyasree V, Bhuyan M (2020) Two-stream fusion model for dynamic hand gesture recognition using 3D-CNN and 2D-CNN optical flow guided motion template. arXiv:2007.08847 37. Shotton J, Fitzgibbon A, Cook M, Sharp T, Finocchio M, Moore R, Kipman A, Blake A (2011) Real-time human pose recognition in parts from single depth images. In: 2011 IEEE conference on computer vision and pattern recognition (CVPR). IEEE, pp 1297–1304 38. Srivastava N, Mansimov E, Salakhudinov R (2015) Unsupervised learning of video representations using LSTMs. In: International conference on machine learning, pp 843–852 39. Tsai DM, Chiu WY, Lee MH (2015) Optical flow-motion history image (OF-MHI) for action recognition. Signal Image Video Process 9(8):1897–1906 40. Wilson AD, Bobick AF (1995) Learning visual behavior for gesture analysis. In: International symposium on computer vision, 1995. Proceedings. IEEE, pp 229–234 41. Wixson L (2000) Detecting salient motion by accumulating directionally-consistent flow. IEEE Trans Pattern Anal Mach Intell 22(8):774–780 42. Xu H, Li L, Fang M, Zhang F (2018) Movement human actions recognition based on machine learning. Int J Online Biomed Eng (IJOE) 14(04):193–210 43. Yacoob Y, Davis LS (1996) Recognizing human facial expressions from long image sequences using optical flow. IEEE Trans Pattern Anal Mach Intell 18(6):636–642 44. Yamato J, Ohya J, Ishii K (1992) Recognizing human action in time-sequential images using hidden Markov model. 
In: IEEE computer society conference on computer vision and pattern recognition, 1992. Proceedings CVPR’92. IEEE, pp 379–385
An Approach Toward Detection of Doubling Print Defect Using SSIM Algorithm Jayeeta Saha and Shilpi Naskar
Abstract This paper presents a computer vision-based print quality assessment approach for doubling printing defect detection using the Structural Similarity Index Measure (SSIM) algorithm. Doubling is a common print problem in which a non-directional double image is created. This type of problem occurs due to a loose printing blanket, gear jerking, an impression from the previous print, etc. In terms of print quality assessment, doubling is unwanted and must be detected to achieve good print production. In this paper this is achieved using a computer vision method in which the SSIM algorithm is used to detect the doubling defect in several print samples containing both text and images. The overlapped region of the doubling print problem is difficult to segregate; here an approach is presented that compares the structural similarity of the overlapped edges to detect the double impression. In both cases the SSIM algorithm shows considerable results for doubling defect detection. Moreover, a comparative study has also been done with other print quality assessment metrics like the mean square error (MSE) and the Feature Similarity Indexing Method (FSIM), which demonstrates the effectiveness of the SSIM algorithm for this kind of print defect detection. The results show the strength of the presented technique, which can be an alternative to the present subjective manual detection of doubling. Keywords Doubling · Printing defect · Print quality assessment · SSIM · Computer vision
1 Introduction In the offset lithography printing process, when a faint duplicate printed impression occurs along with the actual printed image, the defect is called doubling [1]. Doubling occurs in wet-on-wet printing when previously applied ink is transferred to the sheet through the blanket [1]. Doubling must be eliminated before the measurement of dot gain. J. Saha · S. Naskar (B) Department of Printing Engineering, Jadavpur University, Saltlake Campus, Kolkata, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 N. Das et al. (eds.), Proceedings of International Conference on Data, Electronics and Computing, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-99-1509-5_15
Doubling can occur for many reasons, such as improperly packed printing blankets, blankets that are not torqued properly, misalignment of drive gears, dimensional changes in the substrate due to moisture absorption, poorly timed grippers, a poor infeed setup, a smashed blanket, cylinder surface issues, etc. Doubling creates a non-directional extra image. When halftone dots are printed with a tiny shadow, it appears as doubling [2]. Though a certain amount of dot growth is unavoidable, increasing dot gain strongly indicates the presence of doubling. Doubling can also occur when the substrate comes into contact with the blanket twice, creating a double impression. In offset printing, doubling can be repeated in different image elements on the same page. When the blanket pulls a part of an already-printed image off the sheet and then transfers this image slightly out of register onto the next sheet passing under the blanket, the doubling problem occurs. Doubling is an undesirable print problem; it can be resolved by cleaning the blanket, changing the ink grade, or changing the paper grade. In Fig. 1, two such doubling print defects of a printed sample image are shown. Since in the doubling defect the double impression overlaps the actual printed text or image, this paper presents an approach to detect the repeated printed structure using the Structural Similarity Index Measure (SSIM) algorithm. SSIM compares images on the basis of structural information, including luminance, contrast, and structure. The main purpose of this proposed work is to make the detection procedure less time consuming and reduce manual intervention. Previous work in the domain of printing defect identification and application of the SSIM algorithm includes the following: print defects on metal containers detected based on chromatism and SSIM [3], real-time printed circuit board (PCB) defect detection applying SSIM and MobileNet-V3 [4], structural similarity measurement with a metaheuristic algorithm for content-based image retrieval [5], image quality metrics: PSNR versus SSIM [6], detection and classification of PCB
Fig. 1 Sample image (the word 'DOUBLING' printed with the doubling defect)
using an image subtraction method [7], a local high-order correlation method used for defect detection [8], print defect identification in pharmaceutical tablet blisters [9], and scumming in printing identified using DCT [10]. A survey of image quality measures is given in [11], where the authors review image quality assessment with different image quality metrics. A comparative analysis between images through quality assessment metrics like FSIM, SSIM, MSE, and PSNR is presented in [12]. In this paper, the above-mentioned printing defect detection, in terms of print quality assessment, is done by applying the Structural Similarity Index Measure (SSIM), and to prove the effectiveness of this approach two other quality assessment algorithms, the mean square error (MSE) and the Feature Similarity Index (FSIM), are also applied. Pre-processing and post-processing of the image for the detection purpose are the same for these three algorithms.
2 Presented Method The proposed approach is presented in the flowchart in Fig. 2. First of all, a digital copy of the doubling-defect printed sample is taken with an imaging device; in this process a mobile camera is mainly used to take the digital copies of the samples. After that, the captured RGB image is converted into a grayscale image to ease the further operations of doubling print defect detection. In a grayscale image the value of each pixel carries only intensity information, i.e., the intensity of a pixel ranges from 0 (black) to 1 (white) with any fractional values in between. So it is much easier to perform image processing operations like edge detection, image segmentation, smoothing, sharpening, etc. in gray color space rather than RGB color space. Next, pixel connectivity is computed, since in the doubling print defect the doubled impressions of the print overlap each other. For edge detection, 8-connectivity of pixels is labeled, and based on that further image processing operations are performed. Connectivity of pixels is mainly computed to detect the distinct and overlapped regions in the image. In the next step, adaptive thresholding is performed to segment the foreground from the background. Then morphological operations like closing and opening are performed. The opening operation first erodes the image and then dilates it with the same structuring element, and the closing operation erodes the dilated image. Since the sample images contain overlapped portions, it is necessary to erode and dilate the image as required to get the connected contour of the region of interest. After the morphological operations, a logical operation is performed in order to get a better result. In this work, the Prewitt operator is used to detect the edge of the double-printed area. In the next step, the region of interest is cropped from the doubling-edge-detected image; the two edges of the double-printed area are taken to check their structural similarity. SSIM carries the structural information of pixels that are spatially close. SSIM is calculated between two common-sized (N × N) images x and y.
Fig. 2 Flowchart of presented method
$$\mathrm{SSIM}(x, y) = \frac{(2\mu_x \mu_y + c_1)(2\sigma_{xy} + c_2)}{(\mu_x^2 + \mu_y^2 + c_1)(\sigma_x^2 + \sigma_y^2 + c_2)} \tag{1}$$
where $\mu_x$ = average of x; $\mu_y$ = average of y; $\sigma_x^2$ = variance of x; $\sigma_y^2$ = variance of y; $\sigma_{xy}$ = covariance of x and y; $c_1 = (K_1 L)^2$, $c_2 = (K_2 L)^2$; and L = dynamic range of the pixel values. $K_1 = 0.01$ and $K_2 = 0.03$ by default. Image distortion between the sample images can be modeled by SSIM by comparing three factors: contrast distortion (c), luminance distortion (l), and loss of structural correlation (s) [11].
$$l(x, y) = \frac{2\mu_x \mu_y + c_1}{\mu_x^2 + \mu_y^2 + c_1} \tag{2}$$
$$c(x, y) = \frac{2\sigma_x \sigma_y + c_2}{\sigma_x^2 + \sigma_y^2 + c_2} \tag{3}$$
$$s(x, y) = \frac{\sigma_{xy} + c_3}{\sigma_x \sigma_y + c_3} \tag{4}$$
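For concreteness, a minimal single-window implementation of Eqs. (1)–(4) is sketched below; practical SSIM averages Eq. (1) over local sliding windows, whereas this global version only follows the formulas, with K1 = 0.01 and K2 = 0.03 as stated and with L = 255 and c3 = c2/2 as assumptions.

```python
import numpy as np

def ssim_global(x, y, K1=0.01, K2=0.03, L=255.0):
    """Single-window SSIM following Eqs. (1)-(4); x, y are same-sized grayscale arrays.

    L = 255 assumes 8-bit images; c3 = c2/2 is a common convention, the paper
    leaves c3 unspecified.
    """
    x = x.astype(np.float64)
    y = y.astype(np.float64)
    c1, c2 = (K1 * L) ** 2, (K2 * L) ** 2
    c3 = c2 / 2.0
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()
    cov_xy = ((x - mu_x) * (y - mu_y)).mean()
    l = (2 * mu_x * mu_y + c1) / (mu_x**2 + mu_y**2 + c1)                    # luminance, Eq. (2)
    c = (2 * np.sqrt(var_x) * np.sqrt(var_y) + c2) / (var_x + var_y + c2)    # contrast, Eq. (3)
    s = (cov_xy + c3) / (np.sqrt(var_x) * np.sqrt(var_y) + c3)               # structure, Eq. (4)
    ssim = ((2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)) / \
           ((mu_x**2 + mu_y**2 + c1) * (var_x + var_y + c2))                 # Eq. (1)
    return ssim, (l, c, s)
```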
In this work, the percentage of structural similarity is measured between the double-printed edges. It is important to note that, since in the doubling defect the printed edges are
overlapped due to the double impression, these above-mentioned factors need to be compared to detect the similarity between the overlapped edges of the printed image. Now, for comparative analysis, the mean square error (MSE) is applied after cropping the region of interest. MSE measures the error between the images, where the average of the squared errors is represented by MSE [12]:
$$\mathrm{MSE} = \frac{1}{MN}\sum_{n=0}^{M}\sum_{m=1}^{N}\big(\hat{g}(n, m) - g(n, m)\big)^2 \tag{5}$$
Another print quality assessment algorithm, the Feature Similarity Indexing Method (FSIM), is also applied to the images. This algorithm mainly maps the similarity between two images based on their features. Phase congruency and gradient magnitude are the two most important criteria used to measure the characteristics of images. Phase congruency is contrast invariant, whereas the gradient magnitude measures the horizontal and vertical gradients of the image. The gradient magnitude of an image F(x) can be represented as [13]:
$$\mathrm{GM} = \sqrt{G_x^2 + G_y^2} \tag{6}$$
where $G_x$, $G_y$ are the horizontal and vertical gradients of the image F(x). Now similarity can be measured in two ways: from the gradient magnitude and from the phase congruency. Combining the two similarity results, the similarity of two images can be represented as:
$$S_L(x) = [S_{PC}(x)]^{\alpha} \cdot [S_G(x)]^{\beta} \tag{7}$$
where $S_{PC}$ = similarity obtained from the phase congruency and $S_G$ = similarity obtained from the gradient magnitude.
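The processing chain of Fig. 2 can be sketched in Python as follows (the authors' implementation is in MATLAB); the parameter values, the manually supplied ROI boxes, and the use of scikit-image's SSIM are assumptions made for illustration.

```python
import cv2
import numpy as np
from skimage.metrics import structural_similarity as ssim

def doubling_score(image_path, roi1, roi2):
    """Grayscale -> adaptive threshold -> morphology -> Prewitt edges -> compare ROIs.

    roi1, roi2: (x, y, w, h) boxes around the two edges of the double-printed area;
    both must have the same width and height. They are supplied manually here,
    whereas the paper crops them from the detected edge image.
    Returns SSIM (in %) and MSE between the two edge patches.
    """
    gray = cv2.cvtColor(cv2.imread(image_path), cv2.COLOR_BGR2GRAY)
    fg = cv2.adaptiveThreshold(gray, 255, cv2.ADAPTIVE_THRESH_MEAN_C,
                               cv2.THRESH_BINARY_INV, 35, 10)   # foreground segmentation
    kernel = np.ones((3, 3), np.uint8)
    fg = cv2.morphologyEx(fg, cv2.MORPH_CLOSE, kernel)
    fg = cv2.morphologyEx(fg, cv2.MORPH_OPEN, kernel)
    # Prewitt edge detection via explicit kernels (OpenCV has no built-in Prewitt operator)
    kx = np.array([[1, 0, -1], [1, 0, -1], [1, 0, -1]], dtype=np.float32)
    ky = kx.T
    edges = cv2.magnitude(cv2.filter2D(fg.astype(np.float32), -1, kx),
                          cv2.filter2D(fg.astype(np.float32), -1, ky))
    def crop(roi):
        x, y, w, h = roi
        return edges[y:y + h, x:x + w]
    p1, p2 = crop(roi1), crop(roi2)
    rng = float(edges.max()) or 1.0
    score = ssim(p1, p2, data_range=rng) * 100.0
    mse = float(np.mean((p1 - p2) ** 2))
    return score, mse
```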
3 Results and Discussion The proposed method is tested with a number of print samples collected from commercial offset presses. The pictorial results of four samples are presented in Figs. 3, 4, 5, and 6, where the sample images are both text and image based. In Fig. 3a the captured RGB image of a double-printed text sample is shown. To detect the edge of the double-printed area, the image is first converted into a grayscale image, shown in Fig. 3b. Figure 3c shows the foreground image after the adaptive thresholding operation. Figure 3d shows the detected edge of the doubling-defected image after the morphological and logical operations. In Fig. 3e, f the cropped regions of interest from the detected edge of the doubling image are shown. These two cropped ROI images are compared with the SSIM algorithm. In this presented work the percentage
of the SSIM index is calculated. It is proposed that if the percentage similarity index is greater than or equal to 90%, the two tested ROI patterns are considered to be the same; otherwise they are considered not to be the same. In the case of the text image, the structural similarity of the two edge patterns of the doubling image is above 95%. It is thereby proposed that the doubling defect can be identified according to the structural similarity percentage, which can be more convenient than subjective measurement. But in the case of the image-based doubling defect sample, as the overlapped edge is quite difficult to segregate, there is less accuracy in both the edge detection and the similarity percentage measurement. In Fig. 4a the original RGB image is shown, and in Fig. 4b the grayscale image of the doubling defect sample is shown. Next, after thresholding, the foreground image is segmented from the background, as shown in Fig. 4c. In Fig. 4d the detected edge of the double-print image is shown. Next, the two regions of interest are cropped in Fig. 4e, f. The similarity percentage of these two ROIs is 87%. So in the case of an image-based doubling defect, this lower accuracy may be acceptable, because the doubling defect is quite difficult to detect in image reproduction. In Figs. 5 and 6 the pictorial presentation of another two samples is given. In Table 1 the results for these four sample images are shown in terms of similarity percentage. Moreover, this method has been tested with numerous samples, both text and images, for comparative analysis. So it can be concluded that, with the proposed method, the same edge pattern is detected in text-based doubling images with higher accuracy than in image-based doubling images. The mean square error (MSE) is a common image quality assessment tool that represents the absolute error between two images. Here MSE is applied to the cropped regions of interest, and a value closer to zero is considered a better result. MSE gives only the absolute error, which is not normalized; on the other hand, SSIM and FSIM are normalized and give results based on the features and structure of the two images. Now, for the purpose of doubling error detection, detecting the repeatability of the same printed edge pixels is very much needed. This repeatability depends on the contrast distortion, luminance distortion, and structural correlation. SSIM helps to model the distortion by comparing these factors, whereas FSIM stresses only the features of the images in the frequency domain and is invariant to contrast and luminance. So, particularly for this kind of print defect detection, SSIM shows more strength compared with the other two algorithms (MSE and FSIM). The results in Table 1 also depict the same.
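The 90% decision rule and the SSIM percentages of Table 1 can be combined in a short check; the sample labels and threshold follow the text above, and the snippet is purely illustrative.

```python
# SSIM percentages taken from Table 1; the >= 90% rule is the one proposed above
samples = {"Sample 1": 98, "Sample 2": 87, "Sample 3": 97, "Sample 4": 90}
for name, score in samples.items():
    verdict = "same doubled edge pattern" if score >= 90 else "not confirmed as same pattern"
    print(f"{name}: SSIM = {score}% -> {verdict}")
```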
Fig. 3 Pictorial representation of different steps of the presented method for Sample 1. a Original print image, b grayscale image, c foreground image, d detected edge of double print text, e ROI-1, and f ROI-2
4 Conclusion In this paper, an approach to doubling print defect identification based on the SSIM algorithm is presented. The proposed technique is tested with a number of test samples collected from offset presses. The detection procedure is carried out entirely in the MATLAB environment, which makes the presented technique require comparatively less manual intervention than subjective measurement. The presented technique also has the potential to become a mobile-based approach for the detection of this print problem, making the print job less time consuming in terms of problem detection as well as good quality production. Moreover, artificial intelligence applications may be the future scope of this work. The results show the potential of the presented approach for text-based images, but for image-based doubling images there are some limitations in the measurement of structural similarity. The limitation of not getting 100% similarity between the edge patterns of double-printed samples can be rectified by further post-processing operations. Moreover, the effectiveness of SSIM is proven by comparison with other image quality assessment techniques applied to this kind of print defect. However, SSIM is more suitable than the other two metrics as
Fig. 4 Pictorial representation of different steps of the presented method for Sample 2. a Original print image, b grayscale image, c foreground image, d detected edge of double print image, e ROI-1, and f ROI-2
the contrast and luminance of the double-printed edges are also considered in this algorithm. So it can be concluded that for the doubling printing defect, using SSIM is a comparatively better approach from the human perspective. The next follow-up work should be the removal of the unwanted double-printed text or image and reconstruction of the print.
Fig. 5 Pictorial representation of different steps of the presented method for Sample 3. a Original print image, b grayscale image, c foreground image, d detected edge of double print image, e ROI-1, and f ROI-2
Fig. 6 Pictorial representation of different steps of the presented method for Sample 4. a Original print image, b grayscale image, c foreground image, d detected edge of double print image, e ROI-1, and f ROI-2
Table 1 Result of MSE, FSIM, and SSIM for print image samples

| Sample | MSE | FSIM (in %) | SSIM (in %) |
|---|---|---|---|
| Sample 1 (Fig. 3) | 22.20 | 96 | 98 |
| Sample 2 (Fig. 4) | 13.15 | 72 | 87 |
| Sample 3 (Fig. 5) | 22.75 | 97 | 97 |
| Sample 4 (Fig. 6) | 12.34 | 86 | 90 |
References 1. Leach RH, Pierce RJ. The printing ink manual, 4th edn, 353pp 2. Barnard M. The print and production manual, 8th edn 3. Zhou M, Wang G, Wang J, Hui C, Yang W (2017) Defect detection of printing images on cans based on SSIM and chromatism. In: 2017 3rd IEEE international conference on computer and communications
4. Xia B, Cao J, Wang C (2019) SSIM-NET: real-time PCB defect detection based on SSIM and Mobile Net-V3. In: 2019 2nd World conference on mechanical engineering and intelligent manufacturing (WCMEIM) 5. Anandababu P, Kamarasan M (2019) Structural similarity measurement with metaheuristic algorithm for content based image retrieval. In: 2019 Second international conference on smart systems and inventive technology (ICSSIT 2019) 6. Horé A, Ziou D. Image quality metrics: PSNR vs. SSIM. In: 2010 International conference on pattern recognition 7. Kaur B, Kaur G, Kaur A (2014) Detection and classification of printed circuit board defects using image subtraction method. In: Proceedings of 2014 RAECS UIET, Panjab University Chandigarh 8. Yankai T, Qinghu C, Lijuan L, Wei D (2009) Research on print quality assessment and identification: evaluation of print edge roughness. IEEE 9. Karthik D, Vijayarekha K, Arun AR. Printing defect identification in pharmaceutical blisters using image processing. Asia J Pharm Clin Res 11 10. Saha J, Naskar S, Chatterjee A, Paul KC (2018) Print scum identification using DCT based computer vision method. In: Proceedings-2018 fourth IEEE international conference on research in computational intelligence and communication networks, ICRCICN 2018, pp 103–107. ISBN: 978-1-5386-7639-4 11. Thung K-H, Raveendran P (2009) A survey of image quality measures. In: IEEE technical postgraduates (TECHPOS) international conference, Kuala Lumpur, 14–15 Dec 2009, pp 1–4 12. Sara U, Akter M, Uddin MS (2019) Image quality assessment through FSIM, SSIM, MSE and PSNR—a comparative study. J Comput Commun 7(3) 13. Kumar R, Moyal V (2013) Visual image quality assessment technique using FSIM. Int J Comput Appl Technol Res 2:250–254. https://doi.org/10.7753/IJCATR0203.1008
Multi-variant Statistical Tools and Soft Computing Methodology-Based Hybrid Model for Classification and Characterization of Yeast Data Shrayasi Datta and J. Pal Choudhury
Abstract The study of pharmacology has grown considerably in recent years, and pharmacological dataset research has naturally made a significant impact on society. In this study, a pharmacology dataset, yeast, has been closely examined and its properties have been investigated. Various statistical procedures such as the factor analysis algorithm (FA), the distance vector algorithm (DV), and the principal component analysis algorithm (PCA), as well as machine learning models such as artificial neural networks, fuzzy logic, and fuzzy rule bases, are used to propose a hybrid model, H = {Factor Analysis, Fuzzy Time Series, ANN, GA}. Residual analysis was used to test the data obtained and the performance of the suggested model. Keywords Pharmacological dataset · Factor analysis · Total effect · Cumulative effect · Fuzzy time series · ANN · Genetic algorithm · Yeast
1 Introduction Pharmacology is a biomedical discipline that focuses on the investigation, discovery, and characterization of substances with biological effects, as well as the understanding of cellular and organ function in connection to these chemicals. Drug composition and characteristics, synthesis and drug design, and molecular and cellular mechanisms are all covered in this field [1]. In a nutshell, pharmacological study is concerned with how medications affect the body and how the body reacts to them. Because of its widespread use in pharmaceuticals, yeast is known to have pharmacological properties. The yeast dataset [2], which was downloaded from UCI laboratories, was chosen for its therapeutic potential in pharmacological research in S. Datta (B) Jalpaiguri Government Engineering College, Jalpaiguri, W.B., India e-mail: [email protected] J. P. Choudhury Narula Institute of Technology, Kolkata, W.B., India © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 N. Das et al. (eds.), Proceedings of International Conference on Data, Electronics and Computing, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-99-1509-5_16
this investigation. Horton and Nakai [3] classified two datasets, E. coli and yeast, using the K-nearest neighbor algorithm, the decision tree algorithm, and a Bayesian classifier; the K-nearest neighbor algorithm performed best, with classification accuracies of 60% and 86%. Classification of the yeast and E. coli datasets [4] has been performed using a decision tree and the perceptron learning algorithm with a feed-forward artificial neural network, and it is concluded that the performance of the algorithms is similar on these datasets. In [5], a deep dive has been made by analyzing the performance of a number of classical machine learning algorithms on different datasets, and it has been concluded that there is no single machine learning methodology that is best for all datasets; also, in most cases a combination of these algorithms performs better than applying a single algorithm. A new cost function has been proposed which is specifically designed for the classification of imbalanced datasets [6]. In [7], two clustering algorithms have been applied to some datasets from UCI and the results evaluated using the DB index; the fuzzy C-means clustering algorithm performs better. An advanced particle swarm optimization algorithm is proposed in [8], which the authors claim will quicken the convergence of ANN learning. A new algorithm [9] has been proposed which determines and optimizes the neural network structure. The algorithm uses variance sensitivity analysis, and the stopping criteria are based on the performance of the MLP. Tested on different classification and regression problems on different datasets, the results show that the pruned networks resulting from the algorithm [9] outperform their classical versions. In [10], the yeast dataset, obtained from UCI, has been classified using fuzzy logic and ANN. In [11], classification of yeast has been done with ANN using different training functions, and the results have been compared. In [12], the authors proposed a multi-variant statistical model with PSO and fuzzy time series to characterize yeast data. In [13], a hybrid model has been proposed with multi-variant statistical models and different soft computing techniques to evaluate the characteristics and working of the yeast dataset. In [14], another multi-variant statistical model comprising soft computing models and swarm intelligence models has been proposed by the authors. Researchers around the world are working tirelessly to examine and model various pharmacological datasets, which is aiding the drug industry. Here also, a contribution is made by proposing a hybrid model comprising soft computing and statistical algorithms like factor analysis, the distance vector algorithm, and principal component analysis (PCA), with fuzzy time series (FTS), a multi-layer feed-forward neural network, and a genetic algorithm (GA), for the characterization and classification of yeast data. Results are compared using residual analysis. The paper is organized as follows: Sect. 2 describes the dataset with a short narrative of residual analysis. Sections 3 and 4 cover the implementation and results, respectively. And Sect. 5 draws the conclusion.
2 Methodology 2.1 Dataset Description The yeast dataset [2], from the UCI machine learning repository, has been used for the study. The dataset consists of 8 attributes. On the basis of the attribute values, the output is classified into 10 classes. There are no missing attribute values, and the dataset contains a total of 1484 instances.
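A minimal sketch of loading the yeast data is given below; the file name, whitespace delimiter, and column names follow the standard UCI distribution and the attribute names of Table 1, and are assumptions rather than details stated in the paper.

```python
import pandas as pd

# Columns: sequence name, the 8 attributes listed in Table 1, and the class label
cols = ["seq_name", "mcg", "gvh", "alm", "mit", "erl", "pox", "vac", "nuc", "site"]

# 'yeast.data' is whitespace-delimited in the UCI distribution (assumed local path)
yeast = pd.read_csv("yeast.data", sep=r"\s+", header=None, names=cols)

print(yeast.shape)               # expected (1484, 10): 1484 instances
print(yeast["site"].nunique())   # 10 localization classes
X = yeast[cols[1:9]].to_numpy()  # the 8 numeric attributes used in this study
```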
2.2 Residual Analysis Residual analysis was used to assess the performance of the various methodologies used here. The results are analyzed on the basis of various performance factors (sum of absolute residual, mean absolute residual, sum of mre, mean of mre, standard deviation of absolute residual, average error, and average square error; see Table 2).
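The performance factors used here (the row labels of Table 2) can be sketched as follows; since their exact definitions are not spelled out in this excerpt, the formulas below are straightforward readings of those labels and should be treated as assumptions.

```python
import numpy as np

def residual_analysis(actual, predicted):
    """Residual-analysis factors corresponding to the rows of Table 2.

    'mre' is read here as the magnitude of relative error |residual| / |actual| and
    'average error' as the mean relative error expressed in percent; both are
    interpretations of the table labels, not definitions quoted from the paper.
    Assumes the actual values are nonzero.
    """
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    abs_res = np.abs(actual - predicted)
    mre = abs_res / np.abs(actual)
    return {
        "sum of absolute residual": abs_res.sum(),
        "mean absolute residual": abs_res.mean(),
        "sum of mre": mre.sum(),
        "mean of mre": mre.mean(),
        "standard deviation of absolute residual": abs_res.std(),
        "average error": 100.0 * mre.mean(),
        "average square error": np.mean((actual - predicted) ** 2),
    }
```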
3 Implementation The implementation can be broadly subdivided into the sections mentioned below:
1. a. Application of the factor analysis (FA) algorithm, the principal component analysis (PCA) algorithm, and the distance vector algorithm (DV) to compute the "total effect" of each sample. b. The results obtained from Section 1 are compared using residual analysis, and on that basis the output of one method is selected for further work.
2. a. The fuzzy time series (FTS) method is applied to the result obtained and selected from Section 1. b. Residual analysis is performed on the de-fuzzified outputs obtained by applying the FTS model.
3. a. A feed-forward ANN is applied to the de-fuzzified outputs obtained from Section 2. b. Residual analysis is performed on the outputs obtained from the last step.
4. a. A genetic algorithm is applied to the outputs obtained from Section 3. b. Residual analysis is performed on the outputs obtained from Section 3.
1a. Application of FA, PCA, and DV to compute the "total effect" of each sample: FA, PCA, and DV have been applied to the yeast dataset to compute the cumulative effect of each sample. Due to space constraints, the steps of all three algorithms cannot be furnished here; however, factor analysis has been described in one of the authors' previous papers [12]. In Table 1, the eigen value, percentage of contribution, and cumulative effect have been listed for FA.
Table 1 Eigen value, percentage of contribution, and cumulative effect

Attribute | Eigen value | Percentage of contribution | Cumulative effect
mcg       | 1.8142      | 22.6770                    | 0.9465
gvh       | 0.4066      | 5.0821                     | 0.9409
alm       | 1.2703      | 15.8791                    | 0.9845
mit       | 0.7564      | 9.4546                     | 0.9930
erl       | 0.8020      | 10.0255                    | 0.9989
pox       | 0.9352      | 11.6904                    | 0.999
vac       | 1.0212      | 12.7655                    | 0.9920
Nuc       | 0.9941      | 12.4259                    | 0.9957
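For illustration only, the sketch below shows how eigenvalues and percentage-of-contribution figures of the kind listed in Table 1 could be produced for the yeast attributes with scikit-learn. The paper does not give the exact formula for the per-sample "total effect", so the contribution-weighted sum used at the end is an assumption, and the column names and file name are hypothetical.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

# Hypothetical loading of the UCI yeast data; column names are assumed.
cols = ["name", "mcg", "gvh", "alm", "mit", "erl", "pox", "vac", "nuc", "site"]
yeast = pd.read_csv("yeast.data", sep=r"\s+", names=cols)
X = StandardScaler().fit_transform(yeast[cols[1:9]].values)

pca = PCA(n_components=8).fit(X)
eigenvalues = pca.explained_variance_                  # analogue of the "Eigen value" column
contribution = 100.0 * pca.explained_variance_ratio_   # "Percentage of contribution"

# One plausible reading of the per-sample "total effect": attribute values
# weighted by their contribution (assumption; the paper does not state the formula).
weights = contribution / contribution.sum()
total_effect = X @ weights
print(eigenvalues, total_effect[:5])
```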
The goal is to compute the total effect of each sample, which is obtained from the cumulative effect of each attribute value given in Table 1; in this way the total effect has been calculated for factor analysis. Likewise, the total effect has been calculated for the principal component analysis (PCA) algorithm and the distance vector (DV) algorithm.

1b. Residual analysis has been performed for all three statistical methods, i.e., FA, PCA, and DV, and the results are summarized in Table 2. From Table 2, it can be seen that FA gives the least average error, so the cumulative effect obtained from factor analysis (FA) is chosen for further study.

2a. A time-constant FTS model, inspired by the model proposed by Song and Chissom [15–17], has been applied to the cumulative effect obtained from FA. The steps are described below:
Step 1: The universe of discourse (UID) is defined by the minimum and maximum values of the total effects.
Step 2: The UID is partitioned into four equal intervals, and four fuzzy sets F1, F2, F3, and F4 are defined. The cumulative effect obtained from factor analysis is fuzzified using a Gaussian membership function.

Table 2 Residual analysis of the FA, PCA, and DV algorithms

                                        | PCA       | FA         | DV
Sum of absolute residual                | 4478.4    | 106.08     | 3035.3
Mean absolute residual                  | 0.37722   | 0.008935   | 0.25567
Sum of mre                              | 10,284    | 219.89     | 6974.6
Mean of mre                             | 0.86623   | 0.018521   | 0.58749
Standard deviation of absolute residual | 0.1981    | 0.01193    | 0.13414
Average error                           | 86.623    | 1.8521     | 58.749
Average square error                    | 0.0036226 | 6.7079e-08 | 0.0016672
The cumulative effect values (input data), their fuzzified values (for F1, F2, F3, and F4), and their assigned fuzzy sets are given in Table 3, Columns 2, 3, and 4; the data for 5 samples are furnished.
Step 3: Using the notation proposed by Song and Chissom [15–17], the fuzzy logical relationships have been computed.
Step 4: Following the time-constant fuzzy time series model, the cumulative sum (R) of all the fuzzy logical relationships has been obtained as

R = | 1    1    1    0.6 |
    | 1    1    1    1   |
    | 1    1    1    0.8 |
    | 0.6  0.6  1    1   |
Step 5: The predicted fuzzy output membership value for each input is calculated using Eq. (1) and listed in Table 3, Col. 5:

Ai = Ai−1 ◦ R    (1)
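The sketch below illustrates Steps 1–5 on the five sample values of Table 3. It assumes the standard max–min composition of Song and Chissom's model for Eq. (1), four equal partitions of the universe of discourse, and a Gaussian membership width of half the partition size; these details are assumptions where the paper does not state them, and R is the matrix given above.

```python
import numpy as np

def gaussian_mf(x, centre, sigma):
    # Gaussian membership value of x in a fuzzy set centred at `centre`.
    return np.exp(-((x - centre) ** 2) / (2.0 * sigma ** 2))

def fuzzify(values, n_sets=4):
    # Steps 1-2: universe of discourse from min/max, four equal partitions,
    # one Gaussian fuzzy set per partition (sigma is an assumed choice).
    lo, hi = values.min(), values.max()
    edges = np.linspace(lo, hi, n_sets + 1)
    centres = (edges[:-1] + edges[1:]) / 2.0
    sigma = (hi - lo) / (2.0 * n_sets)
    return np.array([[gaussian_mf(v, c, sigma) for c in centres] for v in values]), centres

def forecast(memberships, R):
    # Step 5 / Eq. (1): Ai = A(i-1) o R with max-min composition; the first
    # sample has no predecessor and stays undefined (the "-" row of Table 3).
    out = np.full_like(memberships, np.nan)
    for i in range(1, len(memberships)):
        prev = memberships[i - 1]
        out[i] = np.max(np.minimum(prev[:, None], R), axis=0)
    return out

R = np.array([[1, 1, 1, 0.6],
              [1, 1, 1, 1.0],
              [1, 1, 1, 0.8],
              [0.6, 0.6, 1, 1]])
total_effect = np.array([2.6692, 2.5628, 3.2264, 2.9847, 2.8637])
A, centres = fuzzify(total_effect)
predicted = forecast(A, R)
```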
Step 6: The predicted fuzzy output membership values obtained from the previous step are fuzzy values, so de-fuzzification is performed. For de-fuzzification, a modified centroid method is applied; it has been observed that the modified method gives better results than the basic centroid method. The modified centroid method is as follows:
(i) If the output fuzzy membership has one maximum, the middle value of that fuzzy membership interval is selected as the de-fuzzified output.
(ii) If the output fuzzy membership has two or more successive maxima, the middle value of the total consecutive range is selected as the de-fuzzified output.
(iii) In any other case, the middle value of each interval is used to calculate the de-fuzzified output using the centroid method.

Table 3 Fuzzified input data with fuzzy membership set value, the output fuzzy membership value, and de-fuzzified value of total effects using FTS

Col. 1 | Col. 2       | Col. 3: Input fuzzy     | Col. 4    | Col. 5: Output membership | Col. 6
Sample | Total effect | F1   F2   F3   F4       | Fuzzy set | F1   F2   F3   F4         | De-fuzzified value
1      | 2.6692       | 0.8  1    0.2  0        | A2        | –    –    –    –          | –
2      | 2.5628       | 1    0.8  0    0        | A1        | 1    1    1    1          | 3.103825
3      | 3.2264       | 0    0.6  1    0.4      | A3        | 1    1    1    0.8        | 3.068568
4      | 2.9847       | 0.4  1    0.6  0        | A2        | 1    1    1    0.8        | 3.068568
5      | 2.8637       | 0.6  1    0.4  0        | A2        | 1    1    1    1          | 3.103825
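A minimal sketch of the modified centroid rules (i)–(iii) is given below. It assumes the interval midpoints come from the four partitions of the universe of discourse; it is an illustration of the rules as stated, not a reproduction of the exact values in Table 3 (which were computed over the full dataset).

```python
import numpy as np

def defuzzify(output_membership, midpoints):
    """Modified centroid de-fuzzification of a 4-element output membership
    vector, following rules (i)-(iii); `midpoints` are the middle values of
    the four intervals of the universe of discourse."""
    m = np.asarray(output_membership, dtype=float)
    mids = np.asarray(midpoints, dtype=float)
    max_idx = np.flatnonzero(np.isclose(m, m.max()))
    if len(max_idx) == 1:                              # rule (i): single maximum
        return mids[max_idx[0]]
    if np.all(np.diff(max_idx) == 1):                  # rule (ii): consecutive maxima
        return (mids[max_idx[0]] + mids[max_idx[-1]]) / 2.0
    return np.sum(m * mids) / np.sum(m)                # rule (iii): centroid of midpoints
```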
The de-fuzzified outputs are listed in Table 3, Col. 6.
2b. Residual analysis. Residual analysis is performed, and the result obtained is listed in Table 4, Column 2.
3a. A feed-forward artificial neural network (ANN) with the backpropagation training algorithm has been applied to the forecasted fuzzy outputs of the fuzzy time series model (listed in Table 3, Col. 5). The inputs are the forecasted fuzzy outputs, and the outputs are also fuzzy sets, so the output obtained from this step is again de-fuzzified using the same principle as described in Step 6 of Section 2a.
3b. Residual analysis has been done, and the result is shown in Table 4, Col. 3.
4a. The genetic algorithm has been applied to the outputs computed in Section 3a. The steps of applying the genetic algorithm to the de-fuzzified ANN output values derived from Section 3a are described below.
Step 1: For each data point, four random population members have been generated.
Step 2: The fitness function has been calculated for every population member using the following formula: e = |Actual value − Forecasted value|.
Step 3: The two members with the minimum fitness value, among the four random members generated in Step 1, have been chosen as parent 1 and parent 2.
Step 4: Crossover is executed, and the resulting chromosomes form child 1 and child 2.

Table 4 Residual analysis

Column 1                                | Column 2: Fuzzy time series (FTS) | Column 3: FTS + ANN | Column 4: Genetic algorithm on de-fuzzified ANN output
Sum of absolute residual                | 500.8      | 755.8     | 311.78
Mean of absolute residual               | 0.33747    | 0.5093    | 0.2101
Sum of mean relative error              | 178.33     | 813.47    | 110.07
Mean of mean relative error             | 0.12017    | 1.5341    | 0.07417
Standard deviation of absolute residual | 0.00057021 | 0.032119  | 0.0050231
Average error                           | 12.017     | 50.93     | 7.417
Average squared error                   | 0.0039149  | 0.0012538 | 1.243e-09
Step 6: Among the four chromosomes (parent 1, parent 2, child 1, child 2), the chromosome with the minimum fitness value has been selected for mutation.
Step 7: The mutation process has been executed, and the fitness value of the chromosome resulting from mutation has been computed; if the fitness value is better than the previous one, the mutated chromosome is selected, otherwise it is discarded.
Step 8: Steps 4 onwards have been repeated until the average error becomes less than or equal to 0.10 or the number of iterations reaches 1000.
Step 9: The above steps have been repeated for all data points.
4b. Performance analysis for the genetic algorithm. The genetic algorithm has been applied to the de-fuzzified ANN outputs obtained in Section 3a. Residual analysis is performed, and the outputs are shown in Table 4, Column 4.
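The sketch below follows the GA loop of Steps 1–9 for each sample. The paper does not specify the crossover and mutation operators or how the initial population is generated, so the arithmetic crossover, Gaussian mutation, and population spread used here are assumptions; only the fitness function, population size, and stopping criteria come from the text.

```python
import numpy as np

rng = np.random.default_rng(0)

def refine_with_ga(actual, forecast, max_iter=1000, tol=0.10):
    """Per-sample GA refinement: fitness e = |actual - candidate|, four candidates,
    crossover of the two best, mutation of the best chromosome (operators assumed)."""
    actual = np.asarray(actual, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    refined = np.empty_like(forecast)
    for i, (a, f) in enumerate(zip(actual, forecast)):
        pop = f + rng.normal(scale=0.1 * abs(f) + 1e-6, size=4)   # Step 1 (assumed spread)
        for _ in range(max_iter):
            fit = np.abs(a - pop)                                 # Step 2: fitness
            parents = pop[np.argsort(fit)[:2]]                    # Step 3: two best
            children = np.array([parents.mean(),                  # Step 4: crossover (assumed)
                                 0.75 * parents[0] + 0.25 * parents[1]])
            pool = np.concatenate([parents, children])
            best = pool[np.argmin(np.abs(a - pool))]              # Step 6: best chromosome
            mutated = best + rng.normal(scale=0.01)               # Step 7: mutation (assumed)
            if abs(a - mutated) < abs(a - best):
                best = mutated
            pop = np.concatenate([pool[np.argsort(np.abs(a - pool))[:3]], [best]])
            if 100.0 * abs(a - best) / abs(a) <= tol:             # Step 8: stopping rule
                break
        refined[i] = best                                         # Step 9: repeat for all data
    return refined
```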
4 Result

Finally, Table 4 shows that the fuzzy time series (FTS), when applied to the cumulative effect values of the yeast data, gives a satisfactory result, but the genetic algorithm applied to the de-fuzzified ANN outputs gives an even better result (average error 7.417). The ANN does not give an impressive performance when applied to the FTS outputs (average error 50.93).
5 Conclusion and Future Scope

Artificial neural networks, fuzzy logic, and genetic algorithms are regarded as the main pillars of machine learning. In this paper, these three basic algorithms are hybridized with a statistical algorithm, factor analysis. The "cumulative effect" has been calculated for each sample, and the hybrid model has been applied to these cumulative effect values. The results show that applying fuzzy time series (FTS) to the cumulative effect gives an acceptable result, but applying an ANN to the output obtained from FTS worsens the result; finally, when the GA is applied to the outputs obtained from the ANN, it gives a good result. This analysis can be applied to other datasets, and other soft computing methodologies can be added to the proposed hybrid model to improve its performance.
References
1. Bober L et al (2011) Pharmacological classification of drugs by principal component analysis applying molecular modeling descriptors and HPLC retention data. J Chromatogr Sci 49:758–763
2. UCI machine learning repository. http://archive.ics.uci.edu/ml
3. Horton P, Nakai K (1997) Better prediction of protein cellular localization sites with the k nearest neighbor classifier. In: ISMB-97 proceedings of the American Association for Artificial Intelligence, pp 147–152
4. Chen Y. Predicting the cellular localization sites of proteins using decision tree and neural networks. http://www.cs.iastate.edu/~yetianc/cs572/files/CS572_Project_YETIANCHEN.pdf
5. Tan AC, Gilbert D (2003) An empirical comparison of supervised machine learning techniques in bioinformatics. In: Proceedings of the first Asia-Pacific bioinformatics conference, vol 19. Australian Computer Society, Australia, pp 419–422. ISBN 0-909-92597-6
6. Vorraboot P, Rasmequan S, Lursinsap C, Chinnasarn K (2012) A modified error function for imbalanced dataset classification problem. In: 7th IEEE international conference on computing and convergence technology (ICCCT), Seoul, pp 854–859
7. Ashok P, Kadhar GM, Elayaraja E, Vadivel V (2013) Fuzzy based clustering method on yeast dataset with different fuzzification methods. In: Fourth international conference on computing, communications and networking technologies (ICCCNT), IEEE, Tiruchengode, India, pp 1–5. ISBN 978-1-4799-3925-1
8. Beheshti Z, Shamsuddin SMH, Beheshti E et al (2014) Enhancement of artificial neural network learning using centripetal accelerated particle swarm optimization for medical diseases diagnosis. Soft Comput 18(11):2253–2270. https://doi.org/10.1007/s00500-013-1198-0
9. Thomas P, Suhner M (2015) A new multilayer perceptron pruning algorithm for classification and regression applications. Neural Process Lett 42(2):437–458. https://doi.org/10.1007/s11063-014-9366-5
10. Datta S, Pal Choudhury J (2015) A comparative study on the performance of fuzzy rule base and artificial neural network towards classification of yeast data. Int J Inf Technol Comput Sci 7(5)
11. Datta S, Pal Choudhury J (2015) A framework for selection of neural network training functions towards the classification of yeast data. In: Proceedings of the national conference on computational technologies-2015, Department of Computer Science & Application, University of North Bengal, India
12. Datta S, Pal Choudhury J (2016) A framework of multivariant statistical model based tool using particle swarm optimization with fuzzy data for the classification of yeast data. In: 2016 international conference on microelectronics, computing and communications (MicroCom), IEEE, Durgapur, pp 1–7
13. Datta S, Pal Choudhury J (2016) A framework for the development of multivariant statistical model based tool using artificial neural network for the classification of yeast data. In: 3rd international conference on business and information management (IEEE-ICBIM-2016), NIT Durgapur, pp 85–105
14. Datta S, Pal Choudhury J (2020) A comparative study on the performance of fuzzy logic, particle swarm optimization, firefly algorithm and cuckoo search algorithm using residual analysis. In: Proceedings of ICIMSAT-2019, intelligent techniques and applications in science and technology, LAIS series, vol 12. Springer, Berlin, pp 923–930
15. Song Q, Chissom BS (1993) Forecasting enrollments with fuzzy time series, part I. Fuzzy Sets Syst 54:1–9
16. Song Q, Chissom BS (1993) Fuzzy time series and its models. Fuzzy Sets Syst 54:269–277
17. Song Q, Chissom BS (1994) Forecasting enrollments with fuzzy time series, part II. Fuzzy Sets Syst 62:1–8
Recent Challenges and Opportunities of Multilingual Natural Scene Text Recognition and Its Real World Deployment

Kalpita Dutta, Soupik Chowdhury, Mahantapas Kundu, Mita Nasipuri, and Nibaran Das

Abstract Multilingual natural scene text recognition is difficult due to its complex text font styles, difficult image backgrounds, multilingual text formats, etc. In vision-based applications, natural scene text plays an important role in industrial automation, robot navigation, application software for visually impaired persons, instant translation for multilingual access, etc. Although deep learning techniques have produced encouraging results recently, there are still several issues with multilingual natural scene text recognition tasks that need to be resolved. The main objective of this paper is to summarize the recent challenges and future opportunities of multilingual natural scene text detection and recognition. We have also proposed a real-world deployment model for multilingual natural scene text recognition. The system employs a text localization method using an MSER-based region detection technique involving the Niblack local thresholding method with some modification. This paper also suggests future research directions for multilingual natural scene text detection and recognition tasks.

Keywords Natural scene text detection · MSER · Niblack · ELM · Image analysis
1 Introduction

Text recognition from natural scene images in a multilingual text environment is a difficult task. In the past decade, many research studies have been conducted in this domain. The text recognition task has some common sub-problems, such as text localization [21], script identification in a multilingual context [1, 4, 7], and end-to-end text recognition [2, 15]. However, many other complexities of natural scene text have been targeted recently, such as natural scene text with curved font styles [15, 20].
Fig. 1 Some example images of JUDVLP-MNSTextdb.v1 dataset with different view points
The Total-Text [3] dataset contains natural scene text with curved orientation while, on the other side, ICDAR 2003, ICDAR 2013, and ICDAR 2015 [14] contain single-script text, and most of the text data have a front view. SVT is a Google Street View dataset with a single script. All our experiments have been done on our new dataset, JUDVLP-MNSTextdb.v1. Figure 1 shows some example images of multilingual natural scene text from the JUDVLP-MNSTextdb.v1 dataset: (a) shows natural scene text on a signboard, (b) contains multilingual text appearing on the side of a truck, (c) is a multilingual signboard of a shop, and (d) is a glow sign board with natural scene text.
1.1 Applications

Some examples of the uses of multilingual natural scene text detection and recognition are:
1. Handling transportation in an intelligent way.
2. Construction of geocoding systems, which are helpful for travel and also help users overcome language barriers, e.g., automatic road sign board recognition and text language translation.
3. Information retrieval from natural scene text images, e.g., business card recognition, car number plate recognition, etc.
4. Industrial automation, for example, package information recognition.
5. Access to visual information for visually impaired persons. According to the World Health Organization (WHO), more than 2.2 billion people have a vision impairment. Multilingual natural scene text recognition can improve their quality of life: speech-to-text can help in searching for specific road directions and ATM navigation, whereas text-to-speech can help in reading road sign boards.
1.2 Different New Challenges

Multilingual natural scene text images pose different challenges due to complicated backgrounds, fancy text styles, bad image quality, perspective distortion, etc. In Fig. 1, (e) has a difficult font style (the letter A is hard to recognize), (f) has curved text, and the images in (g) and (h) have poor contrast. Some common and recent challenges are:
1. Text with multi-orientation: Natural scene texts can have different orientations in a real-world environment. Reading texts of different orientations is necessary to take full advantage of the textual information in multi-oriented natural scene text images.
2. Natural scene text with arbitrary shape: Natural scene images contain advertisement signboards with fancy, arbitrarily shaped, eye-catching text fonts. Arbitrarily shaped text adds another level of complexity to natural scene text detection tasks.
3. Multilingual natural scene text: Previous methods are mostly concerned with a single script such as English, while some other works involve texts in other languages (Chinese, Urdu, and Bengali). Developing detection and recognition methods that handle multilingual texts is challenging.
4. Exploring small, large, or angular text: Another challenge is better exploiting small and extra-large text fonts for training. In the past, the Maximally Stable Extremal Region (MSER) technique has been used for text region detection; the method depends on a threshold value called delta, and its sensitivity to small-scale or large-scale fonts is based on the delta value selected. On the other hand, training with different deep convolutional neural network models has shown significant success in many text detection and recognition tasks, but deep models may ignore small or extra-large fonts. Many papers have addressed this by merging different scale factors [5] or using pyramid-structured models [19], and although many multi-scale methods are available, the problem is not yet fully solved.
5. Big data analysis in the field of deep learning: Combining deep learning algorithms with huge training datasets seems to dominate natural scene text detection and recognition tasks. Deep convolutional networks are data hungry, and an increasing amount of data can affect the final results.
1.3 Comprehensive Multilingual Natural Scene Text Understanding

Multilingual natural scene text images have many complexities, including standard scene image complexity. In addition to problems related to training natural scene text datasets, an important step is investigating comprehensive multilingual natural scene text image understanding. Besides recognizing and locating texts in a natural scene image, humans infer different inter-text relations, text attributes, and natural scene layouts. Acquiring a broader understanding of multilingual natural scene text would
Fig. 2 Comprehensive understanding of multilingual natural scene text
facilitate applications such as robot navigation, which often requires knowledge beyond text location. This task involves the perception of the natural scene and gives a cognitive understanding of the physical world. Different ways of reaching this goal are presented in Fig. 2.
1.4 Contribution

• We summarize the recent challenges and future opportunities of multilingual natural scene text recognition tasks. Moreover, we give an overview of a deeper understanding of multilingual natural scene text.
• We propose a text detection method based on MSER [16] region detection, fine-tuned with Niblack local thresholding and connected component labelling to find the detected binary text regions. We use two classifiers, SVM and ELM [11], for text detection; MSER with Niblack thresholding and ELM performs best among the compared methods.
• We develop a multi-script, multilingual scene text image database covering the challenges mentioned in Sect. 1.2.

The remaining part of the paper is organized as follows: the related literature survey and the proposed work are described in Sects. 2 and 3, respectively. Section 4 describes the dataset and analyses the performance of the proposed work on the new dataset. The paper is concluded in Sect. 5.
2 Related Study

Many traditional machine learning-based methods [6, 9, 12, 17], hybrid methods (combining machine learning and deep learning) [13, 18], and purely deep learning-based methods [5, 10, 21] have been applied to natural scene text localization and detection tasks. These methods have some specific drawbacks.
Fig. 3 Flow diagram of multilingual natural scene text localization task
A traditional machine learning-based method needs a correct choice of hand-crafted features and of pre-processing and post-processing methods for filtering out detected non-text regions, whereas a deep learning-based method needs a high-end computer configuration and a large training dataset, and execution of the whole model is also time-consuming. Hybrid methods involve a complex merging of methods. For the traditional machine learning-based MSER text detection method, connected component analysis is a central part of the final text detection task. ERs/MSERs [16, 17] and the Stroke Width Transform (SWT) [9] are two significant methods for natural scene text detection.
3 Proposed Work

Building on these past research efforts, the main focus of this work is to develop a text detection method using MSER and the Niblack method with an ELM classifier. After detecting the MSER regions, they are converted into binary form using the Niblack local thresholding method. A major problem for scene text detection and recognition arises from the presence of both dark text on a light background and light text on a dark background, which leaves some of the text regions black and some white after binarization; this affects the connected component analysis, which may miss some text regions. MSER detects all text region pixels when the correct threshold value is chosen, but it detects many non-text pixels as well. With the help of the HOG, LBP, and colour features of the detected regions, the SVM and ELM classifiers select the exact text regions from among the text and non-text candidates. The flow diagram in Fig. 3 describes the proposed work.
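The original system was implemented in MATLAB; the Python sketch below only illustrates the pipeline shape (MSER candidates, Niblack binarization, connected components, HOG/LBP features). The Niblack window size and k, the patch size, and the LBP settings are assumed values not stated in the paper.

```python
import cv2
import numpy as np
from skimage.filters import threshold_niblack
from skimage.measure import label, regionprops
from skimage.feature import hog, local_binary_pattern

def candidate_text_regions(bgr_image, delta=10, window=25, k=-0.2):
    """MSER detection, Niblack local thresholding, and connected-component
    labelling to produce candidate text bounding boxes (illustrative settings)."""
    gray = cv2.cvtColor(bgr_image, cv2.COLOR_BGR2GRAY)
    mser = cv2.MSER_create()
    mser.setDelta(delta)
    regions, _ = mser.detectRegions(gray)

    # Mask of all MSER pixels (text and non-text candidates).
    mask = np.zeros_like(gray, dtype=np.uint8)
    for pts in regions:
        mask[pts[:, 1], pts[:, 0]] = 255

    # Niblack local threshold so that both dark-on-light and light-on-dark text survive.
    niblack = gray > threshold_niblack(gray, window_size=window, k=k)
    binary = np.logical_and(mask > 0, niblack)

    # Connected components give the candidate bounding boxes.
    return [r.bbox for r in regionprops(label(binary))]

def region_features(gray_patch):
    # HOG + LBP features of a candidate patch, later fed to the SVM/ELM classifier.
    patch = cv2.resize(gray_patch, (64, 64))
    h = hog(patch, pixels_per_cell=(8, 8), cells_per_block=(2, 2))
    lbp = local_binary_pattern(patch, P=8, R=1, method="uniform")
    lbp_hist, _ = np.histogram(lbp, bins=10, range=(0, 10), density=True)
    return np.concatenate([h, lbp_hist])
```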
4 Experiments

4.1 Dataset Description

This paper presents the performance of the proposed text detection method on the JUDVLP-MNSTextdb.v1 [8] dataset. Besides the scene text images, the JUDVLP-
MNSTextdb.v1 dataset contains both the localization coordinates and recognized words of the corresponding text word images as ground truth. The images contain mainly three languages: Bengali, English, and Hindi. This dataset concentrated on various complexities, such as multilingual text, skewed (left or right direction) text images, multi-oriented text, and curved text. Previously available public datasets such as ICDAR 2003, ICDAR 2013, ICDAR 2015, and SVT do not have such complexities.
4.2 JUDVLP-MNSTextdb.v1 Dataset Image Analysis

Most previously proposed methods have design limitations that depend on their modelling approach: the models are designed to perform well on specific benchmark datasets. It is therefore essential to know the image patterns and specific features of a dataset in order to select a suitable model; training with those datasets might produce unexpected results if the test images are different. Some models have performed well on the MNIST database with small text, while others have performed well on the ICDAR 2003, 2013, and 2015 natural scene text datasets with horizontal text in a single script. This paper briefly outlines different problems associated with multilingual natural scene text datasets:
1. Sometimes the image needs to be resized (shrunk or enlarged) for training the model. Reducing the image size shrinks the absolute ROI of the respective objects, whereas enlarging it stretches the desired object and affects the object's properties. The multilingual natural scene text dataset has both small and large text instances, so it is better to crop the image rather than resize it.
2. A partially labelled dataset, or overlapping instances of two classes, creates confusion in the training model: the model cannot define the specific class correctly.
3. Class imbalance can be a big issue in text detection tasks. A standard classification task can control a class's contribution to the loss by over-sampling and down-sampling the dataset, but if the dataset has co-occurring classes, it is difficult to drop some of the labels because it would send a mixed signal as to what the background is.
For the image analysis itself, some traditional measurements were computed: (1) a raw image comparison of natural scene images with and without text, (2) an image format description (height, width, and mean of the image), (3) image metrics such as brightness, dimensions, and resolution, and (4) the average pixel values of natural scene images with and without text.
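A small sketch of such per-image measurements is shown below; it is an illustrative Python/OpenCV version, not the analysis code used in the paper, and the choice of grey-level mean as the brightness measure is an assumption.

```python
import cv2

def image_summary(path):
    """Per-image statistics of the kind used in the dataset analysis:
    dimensions, mean intensity (brightness), and per-channel pixel averages."""
    img = cv2.imread(path)                        # BGR image
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    h, w = gray.shape
    return {
        "height": h,
        "width": w,
        "brightness": float(gray.mean()),                      # mean intensity as brightness proxy
        "channel_means": img.reshape(-1, 3).mean(axis=0).tolist(),
    }
```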
Table 1 Accuracy of different methods applied on the JUDVLP-MNSTextdb.v1 dataset

Method                                                     | Accuracy (%)
MSER + Otsu method with SVM classifier                     | 78
MSER + Otsu method with ELM classifier                     | 86
MSER + Niblack method with SVM classifier                  | 85
MSER + Niblack method with ELM classifier                  | 89
MSER + Niblack with some modification with ELM classifier  | 90
4.3 Results Analysis

Before applying the MSER method, we converted the original image from RGB to greyscale. MSER has been used for text region detection with the delta threshold value set to 10. The detected regions have been converted into binary format, and a morphological erosion-over-dilation operation with a diamond structuring element has been applied to enhance the detected character regions and merge them into word regions. The connected regions are then marked with bounding boxes enclosing the text words, and those regions are cropped. HOG, LBP, and colour features are extracted from the cropped regions for training the SVM and ELM classifiers and for testing their performance; finally, falsely detected non-text regions are removed. The whole method has been deployed as a system using MATLAB. Table 1 shows the comparative results of the different methods applied to our JUDVLP-MNSTextdb.v1 dataset.
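For completeness, a minimal extreme learning machine in the spirit of Huang et al. [11] is sketched below: a random hidden layer followed by a closed-form least-squares output layer. The hidden size, activation, and integer class labels are illustrative assumptions, not the settings used in the paper.

```python
import numpy as np

class SimpleELM:
    """Minimal ELM classifier: random hidden projection + pseudo-inverse readout."""

    def __init__(self, n_hidden=500, seed=0):
        self.n_hidden = n_hidden
        self.rng = np.random.default_rng(seed)

    def fit(self, X, y):
        # y is assumed to hold integer class labels 0..K-1.
        n_classes = int(y.max()) + 1
        T = np.eye(n_classes)[y]                          # one-hot targets
        self.W = self.rng.normal(size=(X.shape[1], self.n_hidden))
        self.b = self.rng.normal(size=self.n_hidden)
        H = np.tanh(X @ self.W + self.b)                  # random hidden features
        self.beta = np.linalg.pinv(H) @ T                 # closed-form output weights
        return self

    def predict(self, X):
        H = np.tanh(X @ self.W + self.b)
        return np.argmax(H @ self.beta, axis=1)
```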
5 Conclusion

We have deployed a multilingual natural scene text recognition system that uses MSER for the text detection task. Hand-crafted features (HOG, LBP, and colour) are used to train the ELM classifier, which filters out the non-text regions falsely detected by MSER. We have achieved satisfactory results for the text detection task on our own dataset, JUDVLP-MNSTextdb.v1. Our main motive is to chalk out a route map for multilingual natural scene text recognition tasks by summarizing recent challenges and opportunities of multilingual natural scene text detection and recognition.
References
1. Bhunia AK, Konwer A, Bhunia AK, Bhowmick A, Roy PP, Pal U (2019) Script identification in natural scene image and video frames using an attention based convolutional-LSTM network. Pattern Recogn 85:172–184
2. Busta M, Neumann L, Matas J (2017) Deep TextSpotter: an end-to-end trainable scene text localization and recognition framework. In: Proceedings of the IEEE international conference on computer vision, pp 2204–2212
3. Ch'ng CK, Chan CS (2017) Total-Text: a comprehensive dataset for scene text detection and recognition. In: 2017 14th IAPR international conference on document analysis and recognition (ICDAR), vol 1. IEEE, pp 935–942
4. Dastidar SG, Dutta K, Das N, Kundu M, Nasipuri M (2021) Exploring knowledge distillation of a deep neural network for multi-script identification. In: International conference on computational intelligence in communications and business analytics. Springer, Berlin, pp 150–162
5. Dutta K, Bal M, Basak A, Ghosh S, Das N, Kundu M, Nasipuri M (2020) Multi scale mirror connection based encoder decoder network for text localization. Pattern Recogn Lett 135:64–71
6. Dutta K, Das N, Kundu M, Nasipuri M (2019) Text localization in natural scene images using extreme learning machine. In: 2019 second international conference on advanced computational and communication paradigms (ICACCP). IEEE, pp 1–6
7. Dutta K, Dastidar SG, Das N, Kundu M, Nasipuri M (2022) Script identification in natural scene text images by learning local and global features on Inception net. In: International conference on computer vision and image processing. Springer, Berlin, pp 458–467
8. Dutta K, Ghosh Dastidar S, Das N, Kundu M, Nasipuri M (2022) MSMED-Net: an optimized multiscale mirror connected encoder-decoder network for multi-lingual natural scene text recognition. In: 2022 7th international conference on emerging applications of information technology (EAIT 2022)
9. Epshtein B, Ofek E, Wexler Y (2010) Detecting text in natural scenes with stroke width transform. In: 2010 IEEE computer society conference on computer vision and pattern recognition. IEEE, pp 2963–2970
10. Hu Z, Wu X, Yang J (2021) TCATD: text contour attention for scene text detection. In: 2020 25th international conference on pattern recognition (ICPR). IEEE, pp 1083–1088
11. Huang GB, Zhu QY, Siew CK (2006) Extreme learning machine: theory and applications. Neurocomputing 70(1–3):489–501
12. Huang W, Lin Z, Yang J, Wang J (2013) Text localization in natural images using stroke feature transform and text covariance descriptors. In: Proceedings of the IEEE international conference on computer vision, pp 1241–1248
13. Huang W, Qiao Y, Tang X (2014) Robust scene text detection with convolution neural network induced MSER trees. In: European conference on computer vision. Springer, Berlin, pp 497–511
14. Karatzas D, Gomez-Bigorda L, Nicolaou A, Ghosh S, Bagdanov A, Iwamura M, Matas J, Neumann L, Chandrasekhar VR, Lu S et al (2015) ICDAR 2015 competition on robust reading. In: 2015 13th international conference on document analysis and recognition (ICDAR). IEEE, pp 1156–1160
15. Liu Y, Chen H, Shen C, He T, Jin L, Wang L (2020) ABCNet: real-time scene text spotting with adaptive Bezier-curve network. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 9809–9818
16. Matas J, Chum O, Urban M, Pajdla T (2004) Robust wide-baseline stereo from maximally stable extremal regions. Image Vision Comput 22(10):761–767
17. Nistér D, Stewénius H (2008) Linear time maximally stable extremal regions. In: European conference on computer vision. Springer, Berlin, pp 183–196
18. Wang Y, Shi C, Xiao B, Wang C, Qi C (2018) CRF based text detection for natural scene images using convolutional neural network and context information. Neurocomputing 295:46–58
19. Xie E, Zang Y, Shao S, Yu G, Yao C, Li G (2019) Scene text detection with supervised pyramid context network. In: Proceedings of the AAAI conference on artificial intelligence, vol 33, pp 9038–9045
20. Zhan F, Lu S (2019) ESIR: end-to-end scene text recognition via iterative image rectification. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 2059–2068
21. Zhou X, Yao C, Wen H, Wang Y, Zhou S, He W, Liang J (2017) EAST: an efficient and accurate scene text detector. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 5551–5560
Healthcare
Automated Cervical Dysplasia Detection: A Multi-resolution Transform-Based Approach Kangkana Bora, Kasmika Borah, Lipi B. Mahanta, M. K. Bhuyan, and Barun Barua
Abstract Pattern detection and classification of cervical cell dysplasia is an important ongoing study. The primary goal of this study is to create a complete model for real-world application to cervical dysplasia that has the highest degree of accuracy and the least computation time. The study first builds the model that is used to train and evaluate the system for classifying dysplasia. Three different color models, three transforms (each with different filters), two feature representation schemes, and two well-known classification approaches are used in conjunction to determine the optimal combination of 'transform (filter) ⇒ color model ⇒ feature representation ⇒ classifier'. Extensive studies on two datasets, one indigenous and the other public, demonstrate that the NSCT-based classification performs well. Compared with two existing approaches, the proposed model yields the most satisfactory results on both datasets, with an accuracy of 98–99.50%.

Keywords Pap smear · Ripplet transform · Non-subsampled Contourlet Transform
1 Introduction

Cervical cancer is the second most prevalent cancer among women after breast cancer [4]. One of the early diagnostic techniques for cervical cancer is Pap smear screening. For a pathologist, screening for cervical cancer means finding the potentially very few Low-grade Squamous Intraepithelial Lesion (LSIL) and High-grade Squamous Intraepithelial Lesion (HSIL) (or Squamous Cell Carcinoma, abbreviated as SCC)
cells among around 100,000 normal cells on a smear. By examining the morphological, textural, and color characteristics of the cervical cells, the degree of dysplasia may be determined. The Bethesda System (TBS) offers a consistent framework for classifying the severity of dysplasia [21]. Therefore, a system that automatically classifies cervical cells based on their characteristics and adheres to the TBS framework would be useful in terms of diagnostic efficacy. With this motivation, a methodology is proposed in this paper to automate the detection of cervical dysplasia and make diagnosis efficient. The selection of features for dysplasia detection and classification is an important consideration. Methodologies could be based on morphological features (namely changes in area, perimeter, circularity, etc., of the object of interest) or on texture (the regular repetition of an element or pattern on a surface) and color features (namely energy, entropy, contrast, etc.) [9, 22]. Morphological features can be identified using an efficient segmentation technique, but no segmentation technique is ideal (i.e., gives 100% accuracy), and segmentation is also database dependent. In this context, texture and color features fare better, as they do not depend on an underlying segmentation technique. Pap smear cell images are very complex, containing different curves and contours of the objects (namely nuclei and cytoplasm), so a feature vector that efficiently represents these details should lead to an effective automated system design. We summarize some related work in Table 1. With fewer coefficients and the capacity to view pictures at different scales, orientations, and resolutions, multi-resolution transforms can describe images more effectively. Additionally, they are localized in both time and frequency; therefore, no prior segmentation method is required for analysis. On the basis of its strong mathematical foundations, the multi-resolution approach offers a fresh take on image representation. In recent years, much effort has been placed in designing directional representations of images, such as the Ridgelet [5], Curvelet (CVT) [6], Contourlet (CRT) [13], Non-subsampled Contourlet Transform (NSCT) [10], Ripplet (RT) [28], and Shearlet (SH) [18] (and many more), under the concept of Multi-scale Geometric Analysis (MGA). To date, these transforms have mostly been applied in Content-Based Image Retrieval (CBIR) systems [17, 26]. Pros and cons of each method can be found in Ref. [2]. The main objective of this work is to develop a comprehensive model applying multi-resolution transforms in combination with two feature representation schemes (F1: first-order statistical features, mean 'μ' and variance 'σ'; F2: Generalized Gaussian Distribution (GGD) features 'α' and 'β') followed by well-known classification techniques: Least Square Support Vector Machine (LSSVM) and Multilayer Perceptron (MLP). We build the model by training and testing to output the degree of dysplasia. Three different color models, viz. YCbCr, HSV, and RGB, are used to select the best color channel. We then apply three transforms, the Discrete Wavelet Transform (DWT), Ripplet Transform (RT), and Non-subsampled Contourlet Transform (NSCT), each with different filters, and evaluate the effectiveness of these three MGA tools. RT and NSCT are chosen as they are considered two superior transforms for studying the hierarchical representation of image features
Table 1 Related works on Pap smear image analysis (the original table lists, for each study, the analysis level, the database, and the feature extraction, segmentation, feature selection, and classification methods used; only a condensed summary is reproduced here). The studies covered are Li et al. [16], Chankong et al. [7], Sarwar et al. [25], Garcia-Gonzalez et al. [14], Plissiti et al. [23], Plissiti et al. [24], Chen et al. [8], Lu et al. [20], Bora et al. [4], Bora et al. [3], Yaman et al. [29], and Liu et al. [19]. They work at cell or smear level on databases including Herlev, ERUDIT, LCH, SIPaKMeD, Mendeley LBC, CRIC, and the authors' generated images, with segmentation methods such as radiating gradient vector flow snakes, fuzzy C-means, multi-scale edge detection, physical deformable models, and joint level-set optimization, and classifiers ranging from Bayesian, LDA, KNN, SVM, and ANN to ensembles with weighted majority voting, LSSVM, softmax regression, Cubic SVM, and CNN+ViT+MLP models.
and DWT served as the basis of all comparative studies. Although these transforms already exist in the literature, to the best of our knowledge they have not been studied for Pap smear image analysis. In this phase, we identify the best combination of 'transform (filter) ⇒ color model ⇒ feature representation ⇒ classifier' to build the model. The output of this work is compared with existing studies to check its acceptability. The novelty of this paper lies in the proposed framework, the creation of an indigenous dataset, and the results obtained during the experimentation. The paper is organized as follows: Sect. 2 describes the proposed methodology, Sect. 3 presents the results obtained during the experimentation, and Sect. 4 demonstrates the performance of the proposed method in comparison with existing approaches.
2 Methods

Figure 1 displays the block diagram of the proposed work. The work is completed in five phases: in Phase 1 the database is generated; Phase 2 involves decomposition of the images into different color channels; this is followed by multi-resolution transform feature extraction in Phase 3 using DWT, RT, and NSCT, each with different combinations of filters; feature representation is performed in Phase 4; and finally, classification is performed in Phase 5.
2.1 Database Generation and Ground Truth Preparation

Two databases were used for all the experiments. Generated database: The data were generated from smears collected at two renowned public hospitals/pathological centers of this region: Dr B. Borooah Cancer Institute (BBCI) (a cancer center under the Department of Atomic Energy, Govt. of India) and Ayursundra Healthcare Pvt. Ltd (one of the most reputed private pathological centers in the city). Staining and preparation of the slides were performed at the respective laboratories where the data were collected, under the supervision of certified cytopathologists engaged in these institutions, and images were captured using a Leica ICC50 HD microscope at 400X magnification with 24-bit color depth. Samples of 34 and 98 smears were collected from the two centers, respectively. The images captured by us were monitored, validated, and finally ground-truth marked by the pathologists of each center. To design the cell-level database, the cervical cells marked by the pathologists were manually cropped from the 132 slide images (one per patient). Finally, the database contained 1611 single cervical cells. This was followed by categorization of these cell images according to the TBS classification framework, i.e., NILM, LSIL, and HSIL (including SCC), with the help of the pathologists. Following this classification, the database contained 1001 NILM, 400 LSIL, and 210 HSIL (including SCC) images. This study has been approved by the Ethical Committee
Fig. 1 Overview of the proposed work
for Human Studies of IASST with registration number ECR/248/Indt/AS/2015 of Rule 122D, Drugs and Cosmetic Rule, 1945. Herlev Database: The proposed system is also trained and tested on the benchmark Herlev University database to check its consistency (Herlev database is available on: http://labs.fme.aegean.gr/decision/downloads). This database contained 242 NILM, 182 LSIL and 489 HSIL (including SCC) cases.
2.2 Color Channel Study Signifying Its Importance

A color channel study is very important when dealing with images taken under uneven illumination. The RGB color model can only define uniform characteristics of color and fails when variation in chromaticity information is present under uneven illumination. Luminance- and hue-based models like YCbCr and HSV perform better than RGB, as they are capable of describing uneven, non-uniform color characteristics through color and texture information. In the YCbCr color space, the intensity plane (Y) characterizes the texture information, while the decomposition over the chromaticity planes (Cb and Cr) reflects the color information. Likewise, in HSV, Hue (H) and Saturation (S) represent color information, and Value (V) represents intensity-based texture information. Both YCbCr and HSV can reflect uneven illumination, but in YCbCr the 'Cb' and 'Cr' components are uncorrelated, making it more favorable than HSV, where the 'S' and 'V' components are highly correlated. For this study, the color spaces YCbCr, HSV, and RGB are explored individually for all three feature vectors.
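For illustration, the sketch below decomposes an image into the three color models studied here. OpenCV is used only as an example toolkit; note that its YCrCb conversion returns the chroma planes in Cr, Cb order, and the input is assumed to be in OpenCV's BGR layout.

```python
import cv2

def colour_planes(bgr_image):
    """Split a Pap smear image into YCbCr, HSV, and RGB planes (illustrative)."""
    ycrcb = cv2.cvtColor(bgr_image, cv2.COLOR_BGR2YCrCb)   # OpenCV order: Y, Cr, Cb
    hsv = cv2.cvtColor(bgr_image, cv2.COLOR_BGR2HSV)
    y, cr, cb = cv2.split(ycrcb)
    h, s, v = cv2.split(hsv)
    b, g, r = cv2.split(bgr_image)
    return {"YCbCr": (y, cb, cr), "HSV": (h, s, v), "RGB": (r, g, b)}
```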
2.3 Feature Extraction

The following sub-sections give a brief discussion of the methodology of each transform used for feature extraction.
A. DWT. The DWT is localized in both the time and frequency domains. 2D wavelets are tensor products of 1D wavelets and thus have only three directions, namely horizontal, vertical, and diagonal [30]. Application of the DWT generates four sub-bands with the following information: (a) the Low-Low (LL) sub-band contains the average information of the image, (b) Low-High (LH) contains information on horizontal components, (c) High-Low (HL) contains information on vertical components, and (d) High-High (HH) contains information on diagonal components. Further decomposition of the LL sub-band can be performed iteratively to obtain image components at different resolutions. The wavelet is considered the basis of all MGA tools, which is why any comparative study without DWT is considered incomplete; however, it is unable to describe image features along edges. For DWT-based classification, Haar (db1), Daubechies (dbN, N = 2, 4, 6, 8, 10), Coiflet (coifN, N = 1, 2, 3, 4, 5), and bi-orthogonal (biorN.N, N.N = 1.3, 2.2, 3.5, 4.4, 6.8) filters were used.
B. RT. Type-I RT was proposed by Jun Xu et al. [28] to overcome the drawbacks of the CVT. It can localize a signal in both the spatial and frequency domains and can capture 2D singularities along different types of curves by providing flexible degree and support parameters; it generalizes the parabolic scaling law by introducing these two new parameters. For digital image-based applications, RT needs to be in discrete form, which is achieved by discretizing the parameters involved in the calculation. The pyramidal (pyr) and directional filter (dir) combinations used in the study are {(pyr, dir) : (5-3, 9-7), (9-7, 5-3), (5-3, pkva), (9-7, pkva)}.
C. NSCT. NSCT has the properties of flexible multi-scale anisotropy, multidimensional expandability, full shift-invariance, and fast implementability [11, 12]. It is an improved version of the contourlet transform, which is not shift-invariant. The proposed work uses the algorithm of Cunha et al. [12] to decompose the frequency plane into sub-bands. In doing so, a Non-subsampled Pyramidal Filter (NSPF) is applied to ensure multi-scale anisotropy, and a Non-subsampled Directional Filter Bank (NSDFB) provides multi-directional expandability. In performing NSCT, an NSP split first decomposes the input into a lowpass and a high-pass sub-band; then an NSDFB decomposes the high-pass sub-band into several directional sub-bands, and the scheme is iterated repeatedly on the lowpass sub-band. This results in a frequency division in which the number of directions increases with frequency. The pyramidal (pyr) and directional filter (dir) combinations used for NSCT decomposition are {(pyr, dir) : (9-7, sinc), (9-7, pkva), (pyrexc, sinc), (pyrexc, pkva)}.
2.4 Feature Representation

Two feature representation schemes are used in this work to represent the coefficients. Using F1, two statistical features, μ and σ, are extracted from the sub-band coefficients, resulting in a feature vector of dimension 66 (= 33 × 2) for each color model. Using F2, two GGD parameters, α and β, are estimated by maximum likelihood, composing a feature vector of dimension 66 (= 33 × 2). The pyramidal (pyr) and directional filter (dir) combinations listed above are used for RT and NSCT.
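As an illustration of the F1 scheme for the wavelet branch, the sketch below computes the mean and variance of the DWT sub-band coefficients of one colour channel with PyWavelets. The number of decomposition levels is an assumed value, and the RT/NSCT branches (for which no standard Python package is assumed here) would feed their own sub-bands through the same statistics.

```python
import numpy as np
import pywt

def dwt_f1_features(channel, wavelet="coif5", levels=3):
    """F1 features (mean, variance) over all DWT sub-bands of one channel."""
    coeffs = pywt.wavedec2(channel, wavelet=wavelet, level=levels)
    feats = [coeffs[0].mean(), coeffs[0].var()]          # approximation sub-band
    for (cH, cV, cD) in coeffs[1:]:                      # horizontal, vertical, diagonal details
        for band in (cH, cV, cD):
            feats.extend([band.mean(), band.var()])
    return np.asarray(feats)
```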
2.5 Classification

Two well-known classifiers, LSSVM [11, 15, 27] and MLP [1], are used in this study. All classifiers were trained and tested individually on the two databases, and the best experimental results are reported as a combination of the best transform (filter), color space, feature representation, and classifier. Assessments were performed using fivefold cross-validation, and conclusions are drawn from five performance measures: Accuracy, Precision, Recall, Specificity, and F-score [4]. The parameters used for the MLP are as follows: learning rate = 0.7, ratio to increase learning rate = 1.05, maximum validation failures = 15, memory/speed trade-off factor = 1, minimum performance gradient = 1e-10, initial momentum = 0.95, iterations = 4000, performance goal = 0.001. For LSSVM, the kernel used is the RBF kernel with fivefold cross-validation.
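A minimal sketch of fivefold cross-validation of an MLP on the F1 feature vectors is given below, using scikit-learn for illustration. The hidden layer size is an assumption, and the MATLAB-style training parameters listed above (learning rate, momentum, performance goal) do not all map one-to-one onto scikit-learn's arguments, so this is not an exact reproduction of the paper's setup.

```python
from sklearn.model_selection import cross_val_score, StratifiedKFold
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

def evaluate_mlp(X, y):
    # X: F1 feature vectors (66-dimensional per colour model); y: NILM/LSIL/HSIL labels.
    model = make_pipeline(
        StandardScaler(),
        MLPClassifier(hidden_layer_sizes=(66,), max_iter=4000, tol=1e-3, random_state=0),
    )
    cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
    return cross_val_score(model, X, y, cv=cv, scoring="accuracy")
```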
3 Results In this section, we describe the detailed experimental results and discussions of the three multi-resolutional transforms, explained above.
3.1 Results and Discussion Using DWT

Figure 2 displays the experimental results of applying the DWT in Pap smear classification using feature representation schemes F1 and F2, respectively. In the figure, the X axis represents the different filter and classifier combinations, and the Y axis represents the accuracy for the different color models. Figure 2a shows that the combination 'DWT (coif5) ⇒ YCbCr ⇒ MLP' gives the most satisfactory performance on the generated database, with a classification accuracy of 91.99% using feature representation scheme F1. The same combination 'DWT (coif5) ⇒ YCbCr ⇒ MLP' also shows the best performance using F2 (Fig. 2b). Coiflets were originally derived from the Daubechies wavelet; their use of more strongly overlapping windows, with six scaling and wavelet functions, results in increased pixel averaging and differencing, which leads to a smoother wavelet and increased capability in several image processing techniques. Due to this property, coif5 can represent the smoothly changing patterns of an image; as no sharp boundaries are found in Pap images, coif5 can easily capture those smoothly changing patterns and represent them with a lower number of coefficients. Regarding the Herlev database, the best combination obtained is 'DWT (bior2.2) ⇒ YCbCr ⇒ F1 ⇒ MLP'. 'bior2.2' has the ability to avoid boundary artifacts and can also represent smooth changes through optimal time-frequency localization. Bi-orthogonal filters are more symmetric than Coiflet filters and can represent more
Fig. 2 Showing results of DWT using a F1 and b F2. Here, X axis shows different classifier and filter combination, and Y axis is showing accuracy of the method under color models YCbCr, HSV, and RGB. Both results are obtained on generated database
details of an image. Since the Herlev database image quality is poor, stronger filters are needed for its representation; as a result, 'bior2.2' works better than 'coif5'.
3.2 Results and Discussion Using RT Experimental results using RT on generated database are depicted in Fig. 3. It shows different combination of filters, classifiers, color models along two different feature representations. It can be observed that the combination ‘RT (5/3, pkva) ⇒ MLP ⇒ HSV’ gives most satisfactory results (Accuracy = 96.51%) applying feature representation scheme F1 and the same combination ‘RT (5/3, pkva) ⇒ MLP ⇒ HSV’ also gives best result applying scheme F2 with an accuracy of 92.13%. ‘pkva’ directional filter performs better as it captures the high frequency content of the images like smooth contours and directional edges, which play significant role in describing Pap images. On the other hand in pyramidal filter ‘5/3’ performs best where 5 represents
Fig. 3 Results of RT under F1 and F2 using generated database. Here, X axis shows different classifier and filter combination, and Y axis is showing accuracy of the method under color models YCbCr, HSV, and RGB
the size of the lowpass filters and 3 represents the size of the high-pass filters. The performance of the HSV color model is better for RT-based classification, showing the dominance of intensity-based texture features. On the Herlev database, the combination 'RT (9/7, pkva) ⇒ MLP ⇒ YCbCr ⇒ F1' gives a good classification result with an accuracy of 93.45%; the other comparison measures are displayed in Table 2. In the case of the Herlev database, the '9/7' pyramidal filter works better than the '5/3' filter used for the generated database. '9/7' denotes the lengths of the dual scaling filters, with four vanishing moments, a lowpass filter of size 9, and a high-pass filter of size 7. A filter with fewer vanishing moments gives less smoothing and removes fewer details. The Herlev database image quality is poor compared with the generated database, and in the Herlev database the images have been cropped such that images of malignant cells look generally smaller than normal ones. That is why larger filters are required to extract salient information from these images.
3.3 Results and Discussion Using NSCT Figure 4 displays the graphical result of NSCT-based classification. Applied to the generated database, it reveals that the combination ‘NSCT (pyrexc,pkva) ⇒ YCbCr ⇒ MLP’ gives most satisfactory classification result (97.02% accuracy) using F1, using F2 the best results (90.47% accuracy) are obtained using the combination ‘NSCT (pyrexc,sinc) ⇒ YCbCr ⇒ MLP’. These results were obtained on generated database. Other measures are listed in Table 2. It is observed that in all the above cases
Fig. 4 Results of NSCT under F1 and F2 using generated database. Here, X axis shows different classifier and filter combination, and Y axis is showing accuracy of the method under color models YCbCr, HSV, and RGB
‘pyrexc’ NSPF works best. This filter is derived from 1D using the maximal mapping function with two vanishing moments but exchanging two high-pass filters. It can represent smooth edges efficiently. Further, ‘pkva’ NSDFB works better than ‘sinc’ as it is used to capture the high frequency content of the images like smooth contours and directional edges. It has the best PSNR performance. Due to this property, ‘pkva’ performs best on real Pap images which has complex features to interpret. On Herlev database, best classification result is obtained with the combination ‘NSCT (pyrexc,sinc) ⇒ HSV ⇒ MLP ⇒ F1’ which gives an accuracy of 93.06%. ‘sinc’ filter performs better in Herlev database as it removes all frequency components above a given cut off frequency, without affecting lower frequencies. Therefore, the smooth regions are efficiently represented by the small size low pass filter by using the ‘sinc’ filter. Other performance measures used in comparison for all the above mentioned combinations are displayed in Table 2.
4 Comparison

4.1 Final Observations

The final observations from the study are as follows:
• It is observed that NSCT-based classification showed satisfactory results on both databases. NSCT is fully shift-invariant and has the advantage of robust directional frequency localization, so it can be concluded that classification performance is strongly affected by the shift-variance resulting from the sub-sampling
Table 2 Global comparison of performance of all the transforms

Database  | Transform | Filter       | Classifier | Feature representation | Color model | Accuracy | Precision | Sensitivity | Specificity | F-score
Generated | DWT       | coif5        | MLP        | F1                     | YCbCr       | 91.99    | 89.33     | 88.63       | 93.72       | 87.94
Generated | RT        | 5/3, pkva    | MLP        | F1                     | HSV         | 96.51    | 89.51     | 90.45       | 93.78       | 89.98
Generated | NSCT      | pyrexc, pkva | MLP        | F1                     | YCbCr       | 97.02    | 95.37     | 94.70       | 96.04       | 95.02
Herlev    | DWT       | bior2.2      | MLP        | F1                     | YCbCr       | 90.60    | 89.91     | 88.76       | 89.12       | 86.51
Herlev    | RT        | 9/7, pkva    | MLP        | F1                     | YCbCr       | 93.45    | 87.86     | 88.03       | 92.68       | 87.94
Herlev    | NSCT      | pyrexc, sinc | MLP        | F1                     | HSV         | 93.60    | 88.60     | 78.29       | 80.65       | 94.05

Bold values in the original table indicate the best performances.
Automated Cervical Dysplasia Detection: A Multi-resolution …
197
Fig. 5 Comparison with already developed methods
operators. It is also seen that multi-directional information contributes to the final results, justifying why DWT features showing lower statistical significance than NSCT due to its lack of multi-directional property. • In all results, MLP classifier performs best and F1 feature representation scheme outperforms F2. GGD has been used in modeling when the behavior of the concentration of values around the mean and the tail are of particular interest. But from the results, it shows insignificant impact as compared to the first order statistics. It can be concluded that mean and variance is sufficient to represent the transform coefficients. It is worth mentioning that F1 has the added advantage of low computational cost to the approach. • As far as color channels are concerned, YCbCr model outperforms HSV and RGB model in most cases. YCbCr works better than HSV as the two components ‘Cb’ and ‘Cr’ are not correlated unlike ‘S’ and ‘V’ components of HSV. RGB color model is not suitable in practical application because of its uniform characteristics which fail in the presence of highly varying color (chrominance) and intensity (luminance) information. But luminance and hue-based models like YCbCr always work better than RGB as they can describe color and texture information under uneven illumination. Image acquisition normally performs under different light conditions of the microscope that is why uneven illumination is of major concern. • The best combination for the generated database to build the model is NSCT (pyrexc, pkva) ) ⇒ YCbCr ⇒ F1 ⇒ MLP".
4.2 Comparison with Existing Works We have compared the finally selected NSCT-based approach with two existing approaches developed by us. In Approach 1 [4], shape, texture, and color features of the images were considered for classification purpose. Shape features were extracted using proposed segmentation technique. Ripplet Type 1 transform was used for color
198
K. Bora et al.
and texture feature extraction. Some other texture features were extracted using histogram analysis and GLCM method. Finally, classification was performed using ensemble classification using weighted majority voting as a measure. In Approach 2 [3], deep learning features were considered for classification. Here, deep features were extracted using CNN which was followed by feature selection using an unsupervised feature selection technique and classification using LSSVM and softmax regression. The comparison statistics are showed graphically in Fig. 5. From Fig. 5, it is observed that between the first two approaches, Approach 1 is far better both in terms of accuracy and time. Further, the performance of the newly proposed model is more satisfactory than Approach 1 due to two main reasons: a) The accuracy of the newly proposed method (98.00–99.50%) is almost equal to the accuracy of Approach 1 (98.00–99.01%), whereas b) the computational time of the new method is less than half that of Approach 1.
5 Conclusion
In this research, we have performed an in-depth study of multi-resolution transforms, viz. DWT, RT, and NSCT, for Pap smear classification and dysplasia detection. The results are also compared with existing approaches. Extensive experiments show that NSCT-based classification gives the best performance compared to DWT and RT. The impact of different color models and feature representations can also be observed clearly from the study: an appropriate selection of color channel and feature representation scheme improves the representation of the transform coefficients considerably. The proposed model also has a low computational time, which adds to its advantage. For future work, different feature selection techniques are to be studied to eliminate redundant features and further improve classifier performance. We also aim to implement a complete decision support system by considering confounding factors like age and menstrual cycle, which may also affect the final diagnostic decision.
References 1. Betker A, Szturm T, Moussavi Z (2003) Application of feedforward backpropagation neural network to center of mass estimation for use in a clinical environment. In: Engineering in medicine and biology society, proceedings of the 25th annual international conference of the IEEE. vol. 3. IEEE, pp 2714–2717 2. Bora K, Mahanta LB, Das AK (2018) Fuzzy nsct based feature extraction method for automated classification of pap smear images. Int J Appl Eng Res 13:6709–6716 3. Bora K, Chowdhury M, Mahanta LB, Kundu MK, Das AK (2016) Pap smear image classification using convolutional neural network. In: Proceedings of the tenth Indian conference on computer vision, graphics and image processing, 8. ACM, USA, pp 55, 1–55 4. Bora K, Chowdhury M, Mahanta LB, Kundu MK, Das AK (2017) Automated classification of pap smear images to detect cervical dysplasia. Comput Methods Programs Bio-med 138:31–47
5. Candes EJ, Donoho D (1999) Ridgelets: a key to higher-dimensional intermittency. Philos Trans Lond Royal Soc 357:2495–2509 6. Candes EJ, Donoho D, Ying L (2006) Fast discrete curvelet transforms. Multiscale Model Simul 5:861–899 7. Chankong T, Theera-Umpon N, Auephanwiriyankul S (2014) Automatic cervical cell segmentation and classification in pap smears. Comput Methods Programs Biomed 113:539–556 8. Chen YF, Huang PC, Lin HH, Wang LE (2014) Semi-automatic segmentation and classification of pap smear cells. IEEE J Biomed Health Inform 18:94–108 9. Chin Neoh S, Srisukkham W, Zhang L, Todryk S, Greystoke B, Peng Lim C, Alamgir Hossain M, Aslam N (2015) An intelligent decision support system for leukaemia diagnosis using microscopic blood images. Sci Rep 5:14938 EP 10. Chowdhury M, Das S, Kundu MK (2013) Compact image signature generation: an application in image retrieval. In: 5th international conference on computer science and information technology (CSIT). IEEE, Jordan, pp 1–7 11. Chowdhury M, Kundu MK (2015) Comparative assessment of efficiency for content based image retrieval systems using different wavelet features and pre-classifier. Multimedia Tools Appl 74:11595–11630 12. Cunha ALd, Zhou J, Do MN (2006) The non subsampled contourlet transform: theory, design and application. IEEE Tran Image Process 15:3089–3101 13. Do MN, Vetterli M (2005) The contourlet transform: an efficient directional multiresolution image representation. IEEE Trans Image Proc 14:2091–2106 14. Garcia-Gonzalez D, Garcia-Silvente M, Aguirre E (2016) A multiscale algorithm for nuclei extraction in pap smear images. Expert Syst Appl 64:512–522 15. Hsu C, Lin CJ (2002) A comparison of methods for multi-class support vector machines. IEEE Trans Neural Netw 13:425–425 16. Li K, Lu Z, Liu W, Yin J (2012) Cytoplasm and nucleus segmentation in cervical smear images using radiating gvf snake. Pattern Recogn 45:1255–1264 17. Li S, Yang B, Hu J (2011) Performance comparison of different multi-resolution transforms for image fusion. Inf Fusion 12:74–84 18. Lim WQ (2010) The discrete shearlet transform: a new directional transform and compactly supported shearlet frames. IEEE Trans Image Process 19:1166–1180 19. Liu W, Li C, Xu N, Jiang T, Rahaman MM, Sun H, Wu X, Hu W, Chen H, Sun C et al (2022) Cvm-cervix: a hybrid cervical pap-smear image classification framework using cnn, visual transformer and multilayer perceptron. Pattern Recogn 108829 20. Lu Z, Carneiro G, Bardley AP (2015) An improved joint optimization of multiple level set function for the segmentation of overlapping cervical cells. IEEE Trans Image Process 24:1261– 1272 21. Nayar R, WIlbur D (2015) The pap test and bethesda 2014 “the reports of my demise have been greatly exaggerated.” (after a quotation from mark twain). J Am Soc Cytopathol 4:170–180 22. Parmar C, Leijenaar RTH, Grossmann P, Rios Velazquez E, Bussink J, Rietveld D, Rietbergen MM, Haibe-Kains B, Lambin P, Aerts HJWL (2015) Radiomic features clusters and prognostic signatures specific for lung and head & neck cancer. Sci Rep 5:11044 EP 23. Plissiti ME, Nikou C, Charchanti A (2011) Automated detection of cell nuclei in pap smear images using morphological reconstruction and clustering. IEEE Trans Inf Technol Biomed 15:233–241 24. Plissiti M, Nikou C (2012) Overlapping cell nuclei segmentation using a spatially adaptive active physical model. IEEE Trans Image Process 21:4568–4580 25. 
Sarwar A, Sharna V, Gupta R (2015) Hybrid ensemble learning technique for screening of cervical cancer using papanicolaou smear image analysis. Personalized Med Universe 4:54–62 26. Shan H, Ma J, HY (2009) Comparison of wavelets, contourlets and curvlets in seismic denoising. J Appl Geoph 69:103–115 27. Suykens JAK, Vandewalle J (1999) Least squares support vector machine classifiers. Neural Process Lett 9:293–300
28. Xu J, Yang L, Wu D (2010) Ripplet: a new transform for image processing. J Visual Commun Image Represent 21:627–639 29. Yaman O, Tuncer T (2022) Exemplar pyramid deep feature extraction based cervical cancer image classification model using pap-smear images. Biomed Signal Process Control 73:103428 30. Zhang W, Xia ZJ, Wang Z, Xia C (2011) Comparison of wavelet, gabor and curvlet transformm for face recognition. Opt Appl XLI:183–193
An IoT-Enabled Vital Cardiac Parameter Monitoring System on Real-Time Basis Nayana Dey and Pramit Ghosh
Abstract Health is an indicator of a person's physiological and mental state; it not only reflects the absence of illness but also the overall development of that person (Huang et al. in International Journal of Communication Systems 34(4):e4683, [1]). Keeping track of our health status regularly is inconvenient because of the strenuous schedules of our daily life. This system not only captures bio-signals like the ECG, a prime indicator of heart condition, using the AD8232, but also monitors other required parameters such as the SpO2 level and pulse rate using the MAX30102 and body temperature using the DS18B20. It also helps provide a controlled atmosphere, as in a hospital, by monitoring environmental parameters like temperature and humidity with a DHT11 sensor. It uses an ESP32 microcontroller for Wi-Fi connectivity; here it is used as a station point. The system performs real-time monitoring of the parameters of cardiac patients, and the data is uploaded to the Cayenne cloud, which provides data visualization, storage, and alerting. The system also sends a notification if an alarming condition occurs: when any of the parameters goes outside the typical range, it sends an alert via e-mail or SMS to the registered mail address or mobile number to notify the user or the user's family. Keywords Internet of things (IoT) · Temperature sensor (DHT11 · DS18B20) · ECG sensor (AD8232) · Pulse oximetry sensor (MAX30102) · Cayenne IoT application
N. Dey (B) · P. Ghosh Department of Computer Science and Engineering, RCC Institute of Information Technology, Kolkata, West Bengal, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 N. Das et al. (eds.), Proceedings of International Conference on Data, Electronics and Computing, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-99-1509-5_19
1 Introduction
The Internet of Things (IoT) is growing day by day in many sectors, and health care is one of its most successful applications. Health is the most valuable resource and should be maintained by providing proper diagnosis and medical care. In the traditional health monitoring workflow, a person needs to visit a pathological center for diagnosis and pay for that service; the cost of diagnostics is skyrocketing day by day, so this is a costly solution for health care that not everyone can afford. People receive their diagnosis reports only after some time, so doctors cannot make decisions based on real-time data, and collecting reports later is also time-consuming. This is a major problem of traditional health monitoring. Owing to factors such as a shortage of health services and a large discrepancy between rural and urban areas, not everyone receives the same health facilities at the proper time; a gap is created in health services and, unfortunately, people suffer. In rural areas, hospitals are few and not located nearby, so patients have to travel long distances for check-ups, which is even more difficult for senior citizens who require check-ups regularly. Several gadgets are available in this domain, such as MI bands and smartwatches, but they are expensive and therefore not a feasible solution for everyone. An ECG machine captures only the ECG bio-signal; it cannot measure other heart-related parameters and does not monitor the patient 24*7. A Holter monitor can provide 24*7 monitoring but cannot measure other parameters or monitor the surrounding conditions, and an oximeter measures only the oxygen level and pulse rate but not signals like the ECG [2]. So, to monitor a cardiac patient completely on a daily basis, a single device already available in the market is not sufficient, and using several devices increases complexity as well as cost. The proposed system is a one-time investment, so it is cost-efficient, and it provides real-time monitoring of a cardiac patient's necessary health parameters like SpO2 level, body temperature, and pulse rate, in addition to capturing ECG signals. The heart is responsible for pumping blood, so if the body's oxygen level is not normal, the heart is not pumping well; thus, the SpO2 level needs to be measured regularly for a heart patient [3]. At the time of heart failure, the heart muscle may not pump well because it is weaker; a low oxygen level therefore implies that the heart is not working well and oxygen therapy should be given to the patient. If the oxygen level goes below 90 percent, it is dangerous for heart patients, so SpO2 levels should be monitored daily. Heart rate, also known as pulse rate, is the number of times the heart beats in one minute [4]. Cardiac arrest is not captured only by an ECG; when cardiac arrest occurs, it has several symptoms, such as an abnormal pulse rate, sweating, and an abnormal SpO2 level. An ECG machine can capture a discrepancy in the ECG graph but cannot capture all these parameters, and people cannot take an ECG at any time because they have to go to a nearby diagnostic center to do so. The proposed system can capture all those symptoms as well as the ECG signal, so proper care can be taken at the right time and from anywhere, and
Fig. 1 a Traditional health monitoring system patient’s activity [5], b IoT-based smart healthcare [6]
severe conditions can be avoided. In pandemics like COVID-19, cardiac patients have suffered the most, as they are critical patients and need care on a regular basis. During the pandemic, people could not go outside for check-ups, and without proper diagnostics some even died. COVID-19 was more dangerous for heart and lung patients, so they needed more care, but they did not get it (Fig. 1). The proposed system can help such patients: as they become aware of their health condition, they can take proper precautions. For post-surgery patients who are not willing to stay in hospital for a long time but still require routine check-ups, this system is a feasible solution. Heart patients should be kept in a controlled environment; this system also measures surrounding conditions such as room temperature and humidity level along with the physiological parameters, and can thus help provide a hospital-like atmosphere. As we lead strenuous lives, stress is common and affects our health badly, and due to lack of time people do not go for check-ups until a serious condition occurs. Using this system, people can check their necessary physiological parameters in their leisure time and stay healthy. As the accessibility of care increases, patient involvement in daily check-ups increases, so critical diseases can be recognized at an earlier stage. Doctors can easily provide recommendations based on the real-time data collected through the sensors without the patient visiting the doctor's chamber physically; this saves a lot of time for doctors and patients, diagnosis is done at the proper time, and severe diseases can be caught earlier. This system also has a distinctive notification method: it can send notifications to the user via e-mail or SMS if an alarming condition arises. Smartwatches and other devices can send alerts like "Normal" or "Above the normal range" based on physiological parameters and real-time readings, but they cannot notify the user's family. This system sends alerts, when an alarming condition occurs, via e-mail or SMS to a registered mail address or mobile number, which can belong to a nearby hospital or the user's family, to help patients who live alone or cannot seek help on their own.
2 Related Work
Patients in severe health conditions may die abruptly due to the dysfunction of an organ that is not monitored on a regular basis. The organs create abnormal signals, called bio-signals, which can be captured through sensors. We therefore propose a system that is fully automated and also provides notifications. Tamilselvi et al. [7] proposed a system to monitor parameters like heart rate, oxygen saturation level, temperature, and eye movement using sensors and an Arduino UNO. A MAX30205, an SpO2 sensor, a heartbeat sensor, and temperature, eye-blink, and body-movement sensors were used for health monitoring, but the performance of each individual sensor was not reported; results were shown on a liquid crystal display (LCD). Acharya et al. [8] developed a system that monitors body parameters such as ECG, heartbeat, body temperature, and respiration using sensors and a Raspberry Pi. The collected data is monitored and analyzed on the Raspberry Pi, but there is no interface such as an LCD for visualizing the results. Islam et al. [9] introduced a system that continuously monitors not only the patient's body parameters, like heartbeat and body temperature, but also room temperature, humidity, and CO and CO2 levels, to provide better care to critical patients, especially post-surgery patients. For this purpose, an LM35 sensor is used for body temperature, a DHT11 sensor for room temperature and humidity, an MQ-9 to detect the CO level, an MQ-135 to detect the CO2 level, a heartbeat sensor for the heartbeat, and an ESP32 for communication with other Wi-Fi and Bluetooth devices. Data is sent to a web server, but there is no alerting application if one of the levels goes too high, and no interface such as an LCD to show the data without opening the server. Islam et al. [10] proposed a system to monitor necessary physiological parameters remotely, developed specifically for asthma patients. For asthma patients, regular monitoring of the SpO2 level and the pulse rate is very important; the quality of the air also needs to be monitored because they need pure air to breathe, and in a pandemic like COVID-19 asthma patients need even more care, since a drop in the SpO2 level is dangerous for them. To implement that system, several sensors are used: a MAX30100 pulse and SpO2 sensor, an MQ-135 air quality sensor, and a DHT11 temperature and humidity sensor, with an ESP8266 microcontroller for connecting to other devices through Wi-Fi or Bluetooth. An SMS-sending framework alerts patients if one of the necessary parameters goes above the normal range, which is an important feature of that system (Fig. 2).
Fig. 2 a Remote patient monitoring system [11], b general IoT in health care [12]
3 Problems in the Related Domain and the Solution Provided by the System
MI bands, health bands, and smartwatches can monitor heart rate, step count, and other parameters, but they are costly, so not everyone can afford to buy them. The proposed system addresses the same domain but is cheap and, being lightweight, can easily be carried from one place to another. MI bands check only some heart-related parameters and cannot capture ECG signals. An ECG machine captures only the ECG signal, not the other heart-related parameters, and not over a 24*7 duration. A Holter machine provides 24*7 monitoring of the ECG signal, which is a good option for a cardiac patient, but it cannot monitor other parameters such as pulse rate and temperature, or environmental parameters like the humidity and temperature of the surroundings, which are also important for keeping a heart patient in good condition. Thus, a patient has to accommodate many devices to be monitored completely, which increases complexity as the number of devices grows; a person has to learn the usage of several different devices, which is time-consuming and also costly. People lead strenuous lives and thus do not have time for regular check-ups, so critical diseases are not recognized at an earlier stage. As this device allows the user to do check-ups anytime and anywhere, people can do check-ups in their leisure time and stay healthy. Patients who do not want to stay at the hospital can be monitored by this system at home, where they live with their families and recover earlier. Cardiac arrest comes with symptoms such as fatigue and sweating, which are not always recognized at home, because to perform an ECG patients have to go to a nearby diagnostic center. The proposed system can recognize those symptoms, as it contains an ECG sensor along with temperature, SpO2, and humidity sensors. MI bands and similar systems can send alerts when any of the parameters goes above or below the typical value, but they cannot notify the patient's family. This system can send alerts to a registered mobile number or e-mail address, via SMS or mail, when an alarming condition occurs, that is, when the values of the parameters go
above or below the threshold level. The registered mail address or number can belong to the patient's family or a nearby hospital, to help those who stay alone but are in a critical condition, which makes this system unique. Our device integrates all the measurements required by a heart patient and generalizes the restrictive facilities provided by any single electronic gadget.
4 Challenges
Security and privacy: Data is transmitted through an embedded system that contains several connected devices, such as smartphones and sensors; during transmission it is exposed to a higher risk of hacking, so the data must be encrypted in transit.
Integration: Integrating multiple devices and protocols successfully is another difficult task. Smartphones actively collect data from the networks to which they are connected, and this information can be aggregated using appropriate protocols.
Technology adoption: A new app that embodies ideas to help doctors and patients still has to be monetized within the healthcare system.
5 Advantages
Accessibility: IoT-driven health devices increase the availability of health services; routine diagnostics such as ECG capture, blood pressure measurement, blood sugar measurement, SpO2 checking, and body temperature checking are now available at the doorstep using those devices. They can also maintain hospital-like room conditions by continuously monitoring room temperature, air quality, humidity level, etc.
Cost: IoT-driven devices can provide health services at home, so patients do not need to travel, which reduces both the cost of availing those services and the travel cost. Necessary parameters can be checked on a regular basis, so critical diseases can be detected before they get out of hand and proper treatment can be given at the right time.
Patients get better outcomes: Doctors and caregivers can provide recommendations on the basis of real-time data, so patients are able to get proper diagnostics remotely. Doctors can make a diagnosis based on regularly collected data, which helps them to provide better outcomes [13].
Decrease in errors: IoT-embedded devices consist of several sensors and actuators; data is sensed through the sensors and sent through the network to the server automatically, so the error rate decreases compared with manual handling [13].
Patient engagement increases: Patients get engaged more actively in health monitoring; they can check their health status more frequently and on a regular basis because they do not need to spend money and time on it. It is a one-time investment and cheaper than the traditional healthcare services available in society [13].
6 Methodology
The entire methodology and the sensors required to implement a system for monitoring cardiac patients remotely are described in this section. The main objective of this system is to develop an interface through which cardiac patients can get regular check-ups of their vital physiological parameters at home, without going to a doctor's chamber or a local diagnostic center, which makes health care more convenient and easier. Patients can monitor the bio-signals of vital organs using sensors on their own, or with the help of family members, so that severe diseases can be recognized at an earlier stage; this also helps to prevent premature death. Patients, especially older ones or post-surgery patients, find it difficult to go for check-ups frequently. As patients get regular monitoring and hospital-like medical care at home, their mental health also improves, which is important for fast recovery, and a lot of cost is saved because patients need not stay at hospitals for a long period. In this project, the DS18B20 is used for sensing human body temperature, the MAX30102 for pulse and oxygen level checking, the AD8232 for the ECG signal, and the DHT11 for room temperature and humidity measurement. These sensors are connected to a control unit, which computes the values of all the sensors. The ESP32 microcontroller provides Wi-Fi; it sends data to the local server via the HTTP protocol and to the cloud using the MQTT protocol. All the calculated values are displayed on a website that can be accessed using an IP address (like 192.168.43.2). The Cayenne IoT platform is used as the cloud, to which the data is uploaded and which shows live values of the patient's vital parameters. When data is sent to the server, if any of the real-time sensor readings goes above the normal range, an alert is sent to the user so that immediate care can be taken to avoid a severe condition (Fig. 3). There are two stages in developing this system: the planning stage and the actual development stage. In the planning stage, we first identify the healthcare problem and the lacking medical facilities that make people suffer, analyze the problem, survey related works, and compare those works; we then outline the patient monitoring system from the corresponding perspective. In the actual development stage, several sensors continuously collect human physiological parameters
Fig. 3 Overall architecture of patient monitoring system
like temperature, SpO2 level, pulse rate, and ECG signals, as well as the room's conditions like temperature and humidity level. The data is then sent to the server as well as to the Cayenne IoT cloud, and users can view their vital details from the cloud or the server. A notification is delivered to the user if an alarming condition occurs, i.e., if one of the parameters goes above its threshold level, via an e-mail or SMS sending system to the registered phone number and e-mail address; this number or mail address can belong to a nearby health center or to family members, to help patients who are unable to go on their own or who are staying alone.
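The overall cycle described above can be summarized in a short Arduino-style structural sketch. It is only an outline: the helper names (readVitals, publishReadings, alertIfNeeded) are hypothetical placeholders for the sensor, web-server, and notification code detailed in the following sections, and the thresholds are the ones quoted in this paper (37 °C body temperature, 100 BPM pulse rate, 90% SpO2).

// Structural sketch of the monitoring cycle (hypothetical helper names).
struct Vitals {
  float bodyTempC;     // DS18B20
  float spo2;          // MAX30102, in %
  float pulseBpm;      // MAX30102
  int   ecgRaw;        // AD8232 sample
  float roomTempC;     // DHT11
  float roomHumidity;  // DHT11, in %
};

Vitals readVitals() {
  Vitals v = {};       // placeholder: the sensor code of Sect. 7 fills these fields
  return v;
}

void publishReadings(const Vitals &v) {
  // placeholder: serve the Ajax web page and push the values to Cayenne over MQTT
}

void alertIfNeeded(const Vitals &v) {
  if (v.bodyTempC > 37.0 || v.pulseBpm > 100 || v.spo2 < 90.0) {
    // placeholder: raise the trigger that sends the e-mail/SMS notification
  }
}

void setup() {
  Serial.begin(115200);
  // Wi-Fi, sensor, and Cayenne initialisation would go here.
}

void loop() {
  Vitals v = readVitals();
  publishReadings(v);
  alertIfNeeded(v);
  delay(1000);         // sampling interval (assumption)
}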
7 Hardware and Sensors
ESP32: The ESP32 is a flexible, low-cost System on a Chip (SoC) and an improvement on the ESP8266. It is a powerful dual-core chip, with two 32-bit Tensilica Xtensa LX6 microprocessors, and has integrated connectivity protocols such as Wi-Fi and Bluetooth.
Fig. 4 ESP32 as Wi-Fi station [16], DS18B20 temperature sensor [17]
The ESP32 offers the state-of-the-art characteristics of low-power chips: it wakes up only when a specific condition arises or on a periodic basis, so it can be used efficiently in low-power IoT sensor hub applications [14]. It can be used as a station point, an access point, or both; here it is used as a station point.
DS18B20 Temperature Sensor: The DS18B20 measures the temperature of a particular spot and provides readings with 9- to 12-bit resolution. It measures temperatures from -55 °C to +125 °C (-67 °F to +257 °F) [15]. Each DS18B20 contains a unique 64-bit serial code stored in on-board ROM, which allows multiple DS18B20 devices to operate on the same one-wire bus. In parasite power mode, no external power is required and only two pins (DQ and GND) are needed for operation (a minimal reading sketch is shown below).
Alarm signal operation: The master device sends an alarm search command [ECh] to check the status of the alarm flags, i.e., whether an alarming condition has occurred in any DS18B20 on the bus. A DS18B20 with a set alarm flag reacts to the alarm search command [ECh], so the master device learns which DS18B20 sensor is in an alarm condition [15] (Fig. 4).
MAX30102 pulse oximetry and heart rate monitoring sensor: This sensor contains two LEDs, a RED and an IR LED, and a photodetector. It operates by shining both lights onto a part of the body where the skin is not too thick for the light to penetrate the tissue, most often the finger or earlobe, and the amount of reflected light is measured by the photodetector. This method of detecting the pulse using light is known as photoplethysmography (Fig. 5).
Heart Rate Measurement: In arterial blood, oxygenated hemoglobin (HbO2) absorbs IR light; the redder the blood, the more hemoglobin it contains and the more IR light it absorbs. At each heartbeat, as blood is pumped through the finger, the amount of reflected light changes, which produces a changing waveform at the output of the photodetector [19]. Hence, by switching on the light and continuously taking readings from the photodetector, we soon obtain a heartbeat pulse reading.
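Returning to the DS18B20, the sketch below shows a minimal body-temperature read-out; it assumes the widely used OneWire and DallasTemperature Arduino libraries and the GPIO D2 data pin used in this project.

#include <OneWire.h>
#include <DallasTemperature.h>

const int ONE_WIRE_PIN = 2;               // DQ pin of the DS18B20 (GPIO D2)
OneWire oneWire(ONE_WIRE_PIN);
DallasTemperature ds18b20(&oneWire);

void setup() {
  Serial.begin(115200);
  ds18b20.begin();
}

void loop() {
  ds18b20.requestTemperatures();                // start a conversion on the bus
  float bodyTempC = ds18b20.getTempCByIndex(0); // first sensor on the bus
  Serial.print("Body temperature (C): ");
  Serial.println(bodyTempC);
  delay(1000);
}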
Fig. 5 a MAX30102 pin configuration [18], b MAX30102 components [19], c heart rate and oximetry measurement [20]
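Building on the heart-rate principle just described, a minimal beat-detection sketch is shown below. It assumes the SparkFun MAX3010x Arduino library (whose MAX30105 driver class also supports the MAX30102) and its checkForBeat() helper; it is an approximation of the measurement procedure, not the exact firmware used in this project.

#include <Wire.h>
#include "MAX30105.h"          // SparkFun MAX3010x library (assumption)
#include "heartRate.h"         // beat-detection helper from the same library

MAX30105 particleSensor;
unsigned long lastBeatMs = 0;

void setup() {
  Serial.begin(115200);
  particleSensor.begin(Wire, I2C_SPEED_FAST);  // SDA = GPIO 21, SCL = GPIO 22 on ESP32
  particleSensor.setup();                      // default LED currents and sample rate
}

void loop() {
  long irValue = particleSensor.getIR();       // reflected IR level at the photodetector
  if (checkForBeat(irValue)) {                 // true when a heartbeat pulse is detected
    unsigned long now = millis();
    if (lastBeatMs > 0) {
      float bpm = 60000.0 / (now - lastBeatMs); // beats per minute from the beat interval
      Serial.print("Pulse rate (BPM): ");
      Serial.println(bpm);
    }
    lastBeatMs = now;
  }
}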
Pulse Oximetry Measurement: To calculate the SpO2 level in the blood, we measure the ratio of the IR and RED light received by the photodetector, since deoxygenated blood absorbs more RED light and oxygenated blood absorbs more IR light.
DHT11 temperature and humidity sensor: A humidity sensor is present along with a thermistor. A moisture-holding substrate is sandwiched between two electrodes and determines the conductivity between them: if the relative humidity increases, the resistance between the two electrodes decreases, and vice versa. The NTC thermistor is a thermal resistor whose resistance changes with temperature; "Negative Temperature Coefficient" means that the resistance decreases as the temperature increases.
AD8232 ECG sensor: The AD8232 ECG module contains the AD8232 IC, a single chip responsible for the extraction, amplification, and filtering of bio-potential signals like the ECG (Fig. 6); a combined DHT11/AD8232 reading sketch is shown after Fig. 6. Electrocardiography (ECG) is a non-invasive procedure, meaning it does not physically penetrate the skin to capture bio-signals; the ECG is a graphical record of the heart's activity. An ECG graph consists of repetitive cycles, each of which includes the P, Q, R, S, and T waves: the P wave indicates atrial depolarization, the QRS complex corresponds to ventricular depolarization and atrial repolarization, and the T wave indicates ventricular repolarization.
Fig. 6 a AD8232 [21], b ECG signal graph plotting [22]
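The DHT11 and AD8232 readings described above need only a few lines of code. The sketch below assumes the Adafruit DHT sensor library for the DHT11, simple analog sampling of the AD8232 output (assumed here to be wired to the VP/GPIO 36 analog input), and the LO+/LO- lead-off pins for electrode-contact checking.

#include "DHT.h"                // Adafruit DHT sensor library (assumption)

const int DHT_PIN   = 4;        // DHT11 data pin (GPIO D4)
const int ECG_PIN   = 36;       // AD8232 analog output (assumption: VP = GPIO 36)
const int LO_PLUS   = 35;       // AD8232 LO+ lead-off detect
const int LO_MINUS  = 34;       // AD8232 LO- lead-off detect

DHT dht(DHT_PIN, DHT11);
unsigned long lastDhtMs = 0;

void setup() {
  Serial.begin(115200);
  dht.begin();
  pinMode(LO_PLUS, INPUT);
  pinMode(LO_MINUS, INPUT);
}

void loop() {
  if (millis() - lastDhtMs > 2000) {            // DHT11 should not be polled faster than ~1-2 s
    lastDhtMs = millis();
    Serial.print("Room: ");
    Serial.print(dht.readTemperature());        // room temperature in Celsius
    Serial.print(" C, ");
    Serial.print(dht.readHumidity());           // relative humidity in %
    Serial.println(" %");
  }
  if (digitalRead(LO_PLUS) == HIGH || digitalRead(LO_MINUS) == HIGH) {
    Serial.println("ECG leads off");            // an electrode is not making contact
  } else {
    Serial.println(analogRead(ECG_PIN));        // raw ECG sample for the serial plotter
  }
  delay(10);                                    // fast sampling for the ECG waveform
}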
The SA node (sinoatrial node), known as the pacemaker of the heart, is the source of the ECG and produces the P wave. The signal then travels to the atrioventricular (AV) node, which produces the QRS wave.
Local Web Server: Data is uploaded both to the web server (on the local IP address) and to the Cayenne IoT cloud. Uploading the data to the web server is done with the HTTP protocol. We include libraries such as #include <WiFi.h> and #include <WebServer.h>, and define our Wi-Fi network SSID and password to connect the ESP32 to the Wi-Fi network. The web server runs on port 80. After the connection is made, an IP address is assigned dynamically, here 192.168.43.2; copying this IP address into a web browser displays the sensor readings. To make the page dynamic, we use Ajax, so the data is updated automatically. The page also shows alerts when an alarming condition occurs, i.e., when any of the human physiological parameters goes above its threshold level.
The Cayenne Cloud: Data is also uploaded to the Cayenne IoT platform using the MQTT protocol, which we use to control the ESP32 remotely. We include libraries and definitions such as #include <CayenneMQTTESP32.h>, #define CAYENNE_DEBUG, and #define CAYENNE_PRINT Serial, provide the SSID and password of our Wi-Fi network, and then provide the Cayenne authentication information:
char username[] = "MQTT_USERNAME";
char password[] = "MQTT_PASSWORD";
char clientID[] = "CLIENT_ID";
The Cayenne IoT platform then shows the sensor readings. In the Cayenne platform, communication follows a publish/subscribe model using the Message Queuing Telemetry Transport (MQTT) protocol. The MQTT connection is established between the client and the broker, so both must have a TCP/IP stack; clients cannot connect with each other directly. To initialize a connection, the client transmits a CONNECT message to the broker, and the broker reacts with a CONNACK message together with a status code. A well-formed MQTT CONNECT message carries fields such as Client Id, Clean Session, Username/Password, Will Message, and Keep Alive. After receiving a CONNECT message, the broker responds with a CONNACK message, which includes two entries: the session-present flag and a connect return code.
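A skeleton combining the two upload paths described above is sketched below. The Cayenne calls follow the CayenneMQTTESP32 Arduino library; the Wi-Fi SSID, password, MQTT credentials, and channel number are placeholders, and the sensor value is assumed to be filled in elsewhere by the sensor code.

#include <WiFi.h>
#include <WebServer.h>
#include <CayenneMQTTESP32.h>

char ssid[]     = "YOUR_SSID";          // placeholders
char wifiPass[] = "YOUR_PASSWORD";
char username[] = "MQTT_USERNAME";      // Cayenne credentials (placeholders)
char mqttPass[] = "MQTT_PASSWORD";
char clientID[] = "CLIENT_ID";

WebServer server(80);                   // local web server on port 80
float bodyTempC = 0;                    // updated elsewhere by the sensor code

void handleRoot() {
  // A real page would use Ajax to refresh values; here we return plain text.
  server.send(200, "text/plain", String("Body temperature: ") + String(bodyTempC, 1));
}

void setup() {
  Serial.begin(115200);
  Cayenne.begin(username, mqttPass, clientID, ssid, wifiPass);  // connects Wi-Fi and MQTT
  server.on("/", handleRoot);
  server.begin();
}

void loop() {
  Cayenne.loop();                       // keeps the MQTT session alive
  server.handleClient();                // serves the local dashboard page
  Cayenne.virtualWrite(0, bodyTempC);   // publish one reading to Cayenne channel 0
  delay(1000);                          // simplistic rate limiting for the sketch
}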
8 Circuit Diagram See Fig. 7.
Fig. 7 a Pin configuration of the system, b circuit diagram of the system
9 Connections
• MAX30102 to ESP32: VIN to 3.3V, SCL to GPIO 22, SDA to GPIO 21, GND to GND
• DS18B20 to ESP32: VCC to 5V, DQ to GPIO D2, GND to GND
• DHT11 to ESP32: VCC to 3.3V, data pin to GPIO D4, GND to GND
• AD8232 to ESP32: output pin to VPP, LO+ to GPIO D35, LO- to GPIO D34, GND to GND
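For reference, the wiring above maps onto the following pin constants used in the sketches of this chapter; the AD8232 output is assumed to be on the VP input (GPIO 36), since "VPP" is not a standard ESP32 pin label.

// Pin mapping (ESP32 GPIO numbers) corresponding to the connection list above.
const int PIN_DS18B20_DQ   = 2;   // DS18B20 data (one-wire)
const int PIN_DHT11_DATA   = 4;   // DHT11 data
const int PIN_ECG_OUTPUT   = 36;  // AD8232 analog output (assumption: VP = GPIO 36)
const int PIN_ECG_LO_PLUS  = 35;  // AD8232 LO+ lead-off detect
const int PIN_ECG_LO_MINUS = 34;  // AD8232 LO- lead-off detect
// The MAX30102 uses the default I2C pins: SDA = GPIO 21, SCL = GPIO 22.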
10 Result Analysis
The DHT11 is connected to the ESP32 GPIO D4 pin. This sensor works as expected, as its readings change with changes in the atmosphere. For measuring a person's body temperature, the DS18B20 sensor is used; it is connected to the ESP32 on the GPIO D2 pin. If the body temperature goes above the threshold level of 37 °C, the system also notifies the user. This sensor's values can be verified by measuring the body temperature with a thermometer; by comparing the actual and observed values, we derived the error percentage, and as the error rate is minor, we can conclude that the sensor works well (Figs. 8 and 9; Table 1). The MAX30102 is used to measure the pulse rate and blood oxygen level (SpO2 level). SpO2 level and pulse rate are parameters that should be measured on a regular basis to avoid severe conditions; if the pulse rate exceeds the threshold level of 100 BPM, the patient needs immediate medical care. We take 100 samples to measure the SpO2 level and the pulse rate.
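The error percentage reported in Tables 1, 2, and 3 can be reproduced with a one-line helper, sketched here only for illustration; the first row of Table 1 checks out (|31.2 - 31.12| / 31.2 * 100 ≈ 0.26%).

#include <math.h>

// Error percentage between a reference reading and the sensor reading.
float errorPercent(float actual, float observed) {
  return fabsf(actual - observed) / actual * 100.0f;
}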
Fig. 8 Block diagram where sensors are connected to ESP32 microcontroller
Fig. 9 a Table to show the actual value, observed value of body temperature, and the error percentage by comparing those values, b picture of the serial monitor where the readings of the sensors have been shown
Table 1 Actual body temperature measured by a thermometer, observed value measured by the DS18B20 sensor, and the error percentage obtained by comparing those values

Actual value (°C)   Observed value (°C)   Error rate (%)
31.2                31.12                 0.26
34.3                34.41                 0.32
32.2                32.31                 0.34
The MAX30102 is connected to the ESP32 over the I2C pins (SDA to GPIO 21, SCL to GPIO 22), as listed in Sect. 9. The system also provides a notification if the SpO2 level or the pulse rate exceeds the normal range. This sensor's values can be verified by measuring the same parameters with a pulse oximeter; as the error rate is minor, we conclude that the sensor works well and provides approximately accurate readings (Tables 2 and 3; Figs. 10, 11 and 12). The AD8232 sensor is used to capture ECG signals. The ECG signal acts as an indicator of how healthy the heart is; if a person suffers a heart attack, it is detected in the ECG signal, so by capturing this signal anywhere and at any time we can save lives and take proper precautions at the right time. This ECG sensor has three electrodes: red, green, and yellow. The green electrode is placed
Table 2 Actual pulse rate (bpm) measured by a pulse oximeter, observed pulse rate (bpm) measured by the MAX30102 sensor, and the error percentage with respect to the actual values

Actual value (bpm)   Observed value (bpm)   Error rate (%)
91                   93                     2.19
82                   83                     1.21
75                   74                     1.33
Table 3 Actual SpO2 (%) measured by a pulse oximeter, observed SpO2 (%) measured by the MAX30102 sensor, and the error rate (%) with respect to the actual values

Actual value (%)   Observed value (%)   Error rate (%)
97                 98                   1.03
94                 95                   1.06
96                 95                   1.04
Fig. 10 a Table to show the actual value, observed value of the pulse rate, and the error percentage by comparing those values, b table to show the actual value, observed value of the SpO2 level, and the error percentage by comparing those values
Fig. 11 We have taken 100 samples to measure average SpO2 and average heart rate
Fig. 12 a ECG signal of the first person is shown on the serial plotter, b ECG signal of the second person
on the right side of the patient's heart, the red electrode on the left side of the heart, and the yellow electrode just below the patient's ribcage. The ECG signal was collected from two persons to obtain a periodic and clear signal. To capture a proper ECG signal, factors such as the shapes and intervals of the heart pulses have to be maintained. We capture the ECG signal on the serial plotter of the ESP32 microcontroller, and by comparing the captured signals with a standard ECG signal we can identify the P-Q-R-S-T parameters of the ECG. Data is uploaded both to the web server (on the local IP address), using the HTTP protocol, and to the Cayenne IoT cloud. On the Cayenne IoT platform, we have to register first and then log in to see the sensor readings (Figs. 13, 14, 15 and 16). Data charts are available in the Cayenne cloud, where the data can be viewed graphically over minutes, hours, months, and years, and the data sheet can be downloaded to track the record offline.
Fig. 13 All sensors’ readings are displayed on the web server and at Cayenne IoT platform after login to that portal
Fig. 14 Room temperature (Celsius) data chart in hours which is available at Cayenne IoT platform (it is taken at the time of starting) and room humidity (%) data chart in minutes which is available at Cayenne IoT platform (it is taken after 25 min of starting)
Fig. 15 Body temperature (Celsius) data chart in hours which is available at Cayenne IoT platform (it is taken after 15 min of starting) and pulse rate (BPM) data chart in minutes which is available at Cayenne IoT platform (it is taken at the time of starting)
Fig. 16 Pulse rate (BPM) data chart in minutes which is available at Cayenne IoT platform (it is taken at the time of starting)
The Cayenne IoT platform has another feature: it can notify the user through e-mail or SMS if an alarming condition arises. If one of the sensor readings goes above the threshold level, it informs the user by sending a notification via mail or SMS. To implement this, we simply define triggers for each of the
Fig. 17 The Cayenne IoT platform sends a notification about an alarming condition, indicating the channel, i.e., the physiological parameter, for which the alarming condition arises and whether the threshold value has been crossed (above or below)
Fig. 18 Cayenne IoT platform is sending a notification that “Your MQTT needs your attention”
widgets, i.e., we provide upper and lower ranges for each specific parameter; if the value goes above or below that range, the platform automatically sends the alert text entered in the trigger-name field via the mail or SMS sending system. The notification is sent to the registered mobile number or e-mail address, which can belong to a nearby health center or to family members, to help patients who are unable to go on their own or who are staying alone (Figs. 17 and 18).
Contribution: The measurements required by a heart patient include the ECG, heart rate, pulse rate, oxygen level, temperature, etc. An ECG machine measures only the ECG, while a health band measures parameters like pulse rate, heart rate, and oxygen level but cannot measure the ECG. Our device integrates all the measurements required by a heart patient and generalizes the restrictive facilities provided by any single electronic gadget. It is also helpful for a patient in an emergency situation, instead of going to a pathological center, as it has a unified setup; apart from this, it reduces cost and complexity and is user-friendly. As described above, the Cayenne IoT platform can notify the user through e-mail or SMS when an alarming condition arises, and the notification can be directed to a nearby health center or to family members to help patients who are unable to go on their own or who are staying alone.
Future Plan: In the future, we want to develop a framework such as an Android app with two panels, one for the patients and another for the doctor assigned to those patients. Both panels should have register and login features to avoid unauthorized use. In the "Register" feature, users have to provide their detailed information, and an OTP system automatically generates an OTP once the mobile number is provided and sends it to the registered mobile number to secure the validation process. In the "Login" feature, there will be a "forgot password" option; if users forget their password, they can recover their account by clicking on it. The patient dashboard should have four icons: Profile, ECG, Health information, and Doctors. In the Profile panel, patients upload their personal information, such as name, age, address, contacts, and medical history of previous diseases. In the ECG panel, the patient's ECG is uploaded; in the Health information panel, the patient's other health parameter details are uploaded; and in the Doctors panel, the doctors assigned to the patient are shown, and they can also give prescriptions. The doctor's panel will have two icons: the doctor's designation and the assigned patients' medical details. In the designation panel, the doctor's data is shown and uploaded; in the assigned patients' panel, the patients assigned to that doctor are shown. After a successful login, data will be uploaded to Firebase and then stored in a database such as MySQL. We want to implement this in the future to make the patient monitoring system a complete system that can be accessed through an Android app by a valid user or doctor (Fig. 19).
Fig. 19 Future scope of this project
11 Conclusion
IoT-driven devices provide a feasible solution for remote tracking, especially in health care. Essential physiological indicators such as body temperature, SpO2 level, pulse rate, and ECG signal can be measured using this system, and the system also monitors the room's atmosphere, such as room temperature and humidity. The proposed system has been tested on different people, and the accuracy of the sensor values has been checked by measuring most of the parameters with other gadgets; as the error rate of the sensors is minor, we can conclude that the system works well. All sensors are connected to the Internet with the help of an ESP32 microcontroller. If any alarming condition occurs, both the web server and the Cayenne IoT platform are capable of sending notifications to the user, and Cayenne can send them via mail or SMS. This system helps many patients, especially cardiac patients, to get medical services at a lower cost and without spending a lot of time at diagnostic centers. It also gives patients a hospital-like atmosphere and routine check-ups anywhere and at any time, without staying in hospitals or visiting diagnostic centers; in pandemics like COVID-19, when people cannot go outside, they can get those facilities at home. The system is flexible: if we want to measure more physiological parameters, we just have to add new sensors to the circuit. It uses the HTTP protocol to upload the data to the web server and the MQTT protocol to upload the data to the Cayenne IoT platform. So, to improve the quality of living and to save lives in time, this system is a good option.
Acknowledgements The authors are thankful to the Department of Computer Science and Engineering, RCC Institute of Information Technology, for providing the required infrastructure during the progress of the work.
References 1. Huang J et al (2021) Internet of things in health management systems: a review. Int J Commun Syst 34(4):e4683 2. Ghosh P, Bhattacharjee D, Nasipuri M (2021) Passive auto focusing of pathological microscope with intelligent field image collection mechanism. J Med Syst 45(2):1–15 3. Ghosh P, Bhattacharjee D, Nasipuri M (2021) Dynamic diet planner: a personal diet recommender system based on daily activity and physical condition. IRBM 42(6):442–456 4. Ghosh P, Bhattacharjee D, Kollmann C (2020) A framework to classify the calcification region from USG images of thyroid nodules. In: Intelligent vision in healthcare. Springer, Singapore, pp 45–58 5. Rathee G, Sharma A, Saini H, Kumar R (2020) A hybrid framework for multimedia data processing in IoT-healthcare using block chain technology. Multimedia Tools Appl 79(15):9711–9733 6. Hameed K, Bajwa IS, Ramzan S, Anwar W, Khan A (2020) An intelligent IoT based healthcare system using fuzzy neural networks. Sci Program 2020:1–15, Article ID 8836927
7. Tamilselvi V, Sribalaji S, Vigneshwaran P, Vinu P, GeethaRamani J (2020) IoT based health monitoring system. In: 2020 6th international conference on advanced computing and communication systems (ICACCS). IEEE, pp 386–389 8. Acharya AD, Patil SN (2020) IoT based health care monitoring kit. In: 2020 fourth international conference on computing methodologies and communication (ICCMC). IEEE, pp 363–368 9. Islam MM, Rahaman A et al (2020) Development of smart healthcare monitoring system in IoT environment. SN Comput Sci 1(3):1–11 10. Islam K, Alam F, Zahid AI, Khan MM, Abbasi MI (2022) Internet of Things- (IoT-) based real-time vital physiological parameter monitoring system for remote asthma patients. Wireless Commun Mob Comput, Wiley 11. Liu W, Wang X, Peng W (2022) Secure remote multi-factor authentication scheme based on chaotic map zero-knowledge proof for crowdsourcing internet of things. IEEE Access PP(99):1–1 December 2019, https://doi.org/10.1109/ACCESS.2019.2962912, 30 May 2022 12. Shrimali R (2022) How IoT is Transforming the Healthcare Industry By Rahil Shrimali 30 May 2022. https://embeddedcomputing.com/application/healthcare/telehealth-healthcare-iot/ how-iot-is-transforming-the-healthcare-industry 13. Lubel BA (2022) Internet of Things healthcare applications, benefits and challenges, May 31 2022, https://www.iotworldtoday.com/2017/10/13/internet-things-healthcare-applicationsbenefits-and-challenges/ 14. ESP32 Series Datasheet 02 June 2022. https://www.espressif.com/sites/default/files/docume ntation/esp32_datasheet_en.pdf 15. DS18B20 programmable resolution 1-wire digital thermometer, 02 June 2022. https://datash eets.maximintegrated.com/en/ds/DS18B20.pdf 16. ESP32 Useful Wi-Fi Library Functions (Arduino IDE), 31 May 2022. https://randomnerdtutor ials.com/esp32-useful-wi-fi-functions-arduino/ 17. DS18B20 temperature sensor module, 31 May 2022. https://www.robotrack.co.in/index. php?route=product/product&product_id=435&search=DS18B20+Temperature+Sensor+Mod ule&description=true&gclid=CjwKCAjw5NqVBhAjEiwAeCa97QV51F05ui5gNz9znDotJ tMhC_LBEuhKxAHRpP4O4LA9ISYuWZ9VTxoCFrQQAvD_BwE 18. Shojaei AM (2022) Interfacing MAX30102 pulse oximeter heart rate module with Arduino, written by Amir Mohammad Shojaei,31 May 2022, https://electropeak.com/learn/interfacingmax30102-pulse-oximeter-heart-rate-module-with-arduino/ 19. Interfacing MAX30102 pulse oximeter and heart rate sensor with Arduino, 31 May 2022, https://lastminuteengineers.com/max30102-pulse-oximeter-heart-rate-sensor-arduinotutorial/ 20. Interfacing MAX30102 Pulse Oximeter and Heart Rate Sensor with Arduino, 31 May 2022, https://www.google.com/imgres?imgurl=https%3A%2F%2Flastminuteengineers.b-cdn.net% 2Fwp-content%2Fuploads%2Farduino%2FMAX30102-Pulse-Detection-Photoplethysmog ram.png&imgrefurl=https%3A%2F%2Flastminuteengineers.com%2Fmax30102-pulse-oxi meter-heart-rate-sensor-arduino-tutorial%2F&tbnid=PTf8o58cBllZ0M&vet=12ahUKEwi i0qLHusr4AhWujNgFHUl5AKYQMygCegUIARDIAQ..i&docid=JaJHY-7T3StQOM&w= 358&h=274&q=max30102%20working%20principle&ved=2ahUKEwii0qLHusr4AhWujNg FHUl5AKYQMygCegUIARDIAQ 21. ECG graph monitoring with AD8232 ECG sensor & Arduino, 02 June 2022. https://how2elect ronics.com/ecg-monitoring-with-ad8232-ecg-sensor-arduino/ 22. Martin LO, Picazo-Sanchez P, Peris-Lopez P, Tapiador J (2022) Heartbeats do not make good pseudo-random number generators: an analysis of the randomness of inter-pulse intervals, 01 June 2022. 
https://www.researchgate.net/publication/322800438_Heartbeats_Do_Not_ Make_Good_Pseudo-Random_Number_Generators_An_Analysis_of_the_Randomness_of_I nter-Pulse_Intervals
Elucidating the Inhibition Mechanism of FDA-Approved Drugs on P-glycoprotein (P-gp) Transporter by Molecular Docking Simulation Abira Dey, Ruoya Li, Nathalie Larzat, Jean Bernard Idoipe, Ahmet Kati, and Ashwani Sharma
Abstract P-glycoprotein (MDR1) is an efflux transporter that regulates the elimination of substrates, drugs, and drug metabolites from the liver into the bile canaliculi. Impaired P-gp activity due to the action of drugs can lead to Drug-Induced Liver Injury (DILI). Therefore, it is always recommended to understand the interaction of drugs with P-gp. However, the mechanistic pathway of P-gp inhibition is not clear to date; therefore, in this study we aim to gain mechanistic insights into the inhibitory effects of seven P-gp inhibitors using a molecular docking approach. Our molecular docking results revealed that Elacridar (IC50: 0.05 µM, -10.70 kcal/mol, no. of interactions: 11, Fig. 1A) and Zosuquidar (IC50: 0.18 µM, -10.50 kcal/mol, no. of interactions: 11, Fig. 1B), with low IC50, had greater affinities, while Quinidine (IC50: 51.3 µM, -8.40 kcal/mol, no. of interactions: 7, Fig. 2A) and Verapamil (IC50: 76 µM, -7.40 kcal/mol, no. of interactions: 5, Fig. 2B), with high IC50, had lower affinities for P-gp. Our in-silico approach furnishes a deep conception of the interaction of inhibitor compounds with the P-gp protein and can perhaps be helpful in designing in vitro trials using the P-gp protein so as to estimate minutely the contribution of inhibitor compounds to hepatotoxicity. Keywords P-gp protein · DILI · Efflux pump protein · Bile acid · Molecular docking A. Dey Indian Science and Technology Foundation, Delhi 110053, India R. Li · N. Larzat · J. B. Idoipe · A. Sharma (B) Insight Biosolutions, 35000 Rennes, France e-mail: [email protected] A. Kati Experimental Medicine Research and Application Center, University of Health Sciences Turkey, Uskudar, Istanbul, Turkey Department of Biotechnology, Institution of Health Sciences, University of Health Sciences Turkey, Uskudar, Istanbul, Turkey © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 N. Das et al. (eds.), Proceedings of International Conference on Data, Electronics and Computing, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-99-1509-5_20
1 Introduction
The human liver is the most significant organ responsible for the elimination of drugs; it is the connection between the gastrointestinal tract and the systemic circulation. When medicines are taken orally, they are absorbed by the intestine but have to pass through the liver prior to their entry into the systemic blood circulation. During this process, a fraction of the drug enters the systemic circulation and the rest is eliminated, which may have a considerable influence on the bioavailability of the drug [1]. Transporters are proteins responsible for transporting different materials, such as ions, small molecules, and other macromolecules, across the biological membranes of a living organism [2]. After drug metabolism in the hepatic cells, the hepatic transporters play important roles in the elimination of metabolites and parent compounds from liver cells into the bile canaliculi. However, the function of the hepatic transporters can be altered by genetic polymorphisms and drug-drug interactions (DDI), and this can lead to variations in the pharmacokinetics of the transporter substrate drugs, which in turn affects their pharmaceutical effects or toxicity [3]. Permeability glycoprotein (P-gp), encoded in humans by the ABCB1 gene, is a significant ATP-dependent efflux pump protein that takes part in preventing the entry of exogenous substances into delicate organs [4]. Victor Ling discovered P-gp in 1971, and the protein is considered to have two functional sites, the "H" site and the "R" site. At least two sites of P-gp are used for drug binding, and these two sites were experimentally verified by Shapiro and Ling in 1997 using a fluorescence technique [5]. P-gp mediates the repeated pumping of xenobiotics from the intestine into the intestinal lumen and from the liver into the bile ducts [6–8]. The absorption of drugs through the blood-brain barrier becomes more difficult due to the presence of P-gp, which affects their distribution and elimination [9], and inhibition of P-gp activity due to drug interaction leads to Drug-Induced Liver Injury (DILI) [10–12]. Wakasugi et al. reported that the transport of digoxin, a P-gp substrate, can be inhibited by clarithromycin, leading to a drug-drug interaction with a remarkable rise in plasma exposure and a reduction in renal clearance [13]. Okamura et al. reported the interaction mechanism of digoxin and cyclosporin A: the renal tubular secretion of digoxin by the kidney was reduced, and cyclosporin A was responsible for it, although the transport of cyclosporin A by P-gp was not affected by digoxin. It is therefore suggested that when digoxin and cyclosporin A are prescribed to a patient simultaneously, the serum concentration of digoxin should be monitored carefully [14]. Hence, it is always recommended to understand the interaction of drugs with P-gp. However, the mechanistic inhibition of P-gp is not clear to date; therefore, in this study we aim to understand the interaction of drugs with P-gp using a molecular docking approach, to elucidate their inhibition mechanism.
2 Method
2.1 Structure of P-gp
The P-gp structure was obtained from the RCSB databank (PDB ID 6C0V). It is the molecular structure of human P-gp in the outward-facing conformation with ATP bound. The structure of the P-gp protein was then prepared for molecular docking using the Discovery Studio software: all heteroatoms, such as water molecules and ligand molecules, were removed from the PDB file, and only chain A was considered for the docking simulation. The processed protein file was saved in PDB format.
2.2 Selection of the Compounds
The details and IC50 values of seven P-gp inhibitors, namely Elacridar, Ketoconazole, Quinidine, Reserpine, Ritonavir, Verapamil, and Zosuquidar, were obtained from the literature [15–17]. The 3D structures of these inhibitors were obtained from PubChem and saved in PDB format. The SMILES strings of the inhibitors were also obtained from PubChem and submitted to the VEGA HUB software for toxicity prediction. The Avogadro software was used to optimize all the inhibitor structures in order to obtain energy-minimized structures.
2.3 Binding Site Prediction The binding site information of the P-gp protein was obtained using the PDBsum resource. The amino acid residues of the binding site were used as the center of the docking grid during our simulations.
2.4 Molecular Docking by AutoDock Vina We performed molecular docking of the drugs against the binding site of P-gp using the AutoDock Vina software [18]. The drug and P-gp files were prepared and checked for missing hydrogen atoms. After adding the missing hydrogens, the charges were calculated for each atom, and the files were saved in .pdbqt format. The grid box was generated around the binding site of the P-gp protein using a grid box size of 100 × 100 × 100 points with a grid spacing of 0.375 Å. This box size covered all active sites of the P-gp. The config file was prepared with the P-gp receptor, the grid box size, the grid center, the number of modes (num_modes = 200), and an exhaustiveness of 100. These parameters were set for running the docking simulations by our Perl script.
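A sketch of such a configuration and a single run is given below; the grid-center coordinates are placeholders, and the 37.5 Å box side is simply the 100-point, 0.375 Å-spaced grid expressed in Vina's Ångström units, which is an assumption rather than a value stated by the authors.

```python
# Sketch: write a Vina config with the parameters reported above and launch one docking run.
import subprocess

center = (19.0, 55.0, 4.0)   # hypothetical binding-site center derived from the PDBsum residues

config = f"""receptor = pgp_chainA_clean.pdbqt
center_x = {center[0]}
center_y = {center[1]}
center_z = {center[2]}
size_x = 37.5
size_y = 37.5
size_z = 37.5
exhaustiveness = 100
num_modes = 200
"""

with open("config.txt", "w") as fh:
    fh.write(config)

subprocess.run(
    ["vina", "--config", "config.txt",
     "--ligand", "elacridar.pdbqt", "--out", "elacridar_pgp_docked.pdbqt"],
    check=True,
)
```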
The ligand files were prepared in .pdbqt format, and the drug library was listed in a ligand.txt file. Automated molecular docking was performed using our Perl script, and each drug–P-gp docking complex was analyzed with the Discovery Studio software for hydrogen-bond formation and non-bonded interactions.
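The authors' Perl driver is not reproduced here; an equivalent batch loop, assuming one .pdbqt file per line of ligand.txt and the config file from the previous sketch, could look like this.

```python
# Sketch: dock every ligand listed in ligand.txt against the prepared P-gp receptor.
import subprocess
from pathlib import Path

with open("ligand.txt") as fh:
    ligands = [line.strip() for line in fh if line.strip()]

for ligand in ligands:
    out_file = f"{Path(ligand).stem}_pgp_docked.pdbqt"
    subprocess.run(["vina", "--config", "config.txt", "--ligand", ligand, "--out", out_file],
                   check=True)
    print(f"Docked {ligand} -> {out_file}")
```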
3 Results 3.1 Energy Minimization of the Compounds Energy minimization of the compounds was carried out prior to molecular docking using the Avogadro software in order to decrease the overall potential energy of the inhibitor compounds. Biological systems are highly dynamic and favor low-potential-energy states for spontaneous interactions, and energy minimization helps in achieving a conformation with a lower potential energy value.
3.2 Docking of Inhibitors with P-gp According to the docking study, Elacridar and Zosuquidar were found to have the lowest binding energies with P-gp; Ritonavir, Ketoconazole, and Reserpine had intermediate binding energies; while Quinidine and Verapamil had the highest binding energies with P-gp (Table 1). Binding energy is inversely related to binding affinity. Therefore, Elacridar and Zosuquidar had the greatest affinities for P-gp, Ritonavir, Ketoconazole, and Reserpine had intermediate affinities, while Quinidine and Verapamil had the lowest affinities for P-gp. It was observed that inhibitor compounds with a lower IC50 value had greater interaction with the P-gp protein, and vice versa. IC50, or half-maximal inhibitory concentration, is a quantitative estimate of the ability of a substance to obstruct a specific biological or biochemical function. It indicates how much of a specific inhibitor, such as a drug, is needed in an in-vitro model to inhibit a particular biological process or component by 50%. We studied the binding of Elacridar and Zosuquidar in the active site cavity of P-gp. As shown in Fig. 1A and B, these two inhibitor compounds bound exactly at the binding site of P-gp. The number of interactions with P-gp was found to be 11 for Elacridar and 11 for Zosuquidar. Elacridar was found to have H-bonding interactions with Aspartic acid, Alanine, and Threonine and is surrounded by Arginine, Asparagine, Glutamic acid, Glycine, Lysine, Phenylalanine, and Valine. Zosuquidar was found to have H-bonding interactions with Glutamine, Leucine, Threonine, and Tyrosine and is surrounded by Arginine, Asparagine, Glutamic acid, Glycine, Isoleucine, Lysine, Serine, and Valine. We also studied the binding of Quinidine and Verapamil in the active site cavity of P-gp. As shown in Fig. 2A and B, these inhibitor compounds
Table 1 Docking analysis data of interaction between P-gp and the inhibitor compounds

Compounds | IC50 value (µM)a | Binding energy (kcal/mol)
Elacridar | 0.05 | −10.7
Zosuquidar | 0.18 | −10.5
Ritonavir | 1.1 | −9.9
Ketoconazole | 5.6 | −9.9
Reserpine | 10 | −9.3
Quinidine | 51.3 | −8.4
Verapamil | 76 | −7.4
a Ref 16, 17, 18, 19
Fig. 1 Binding of A Elacridar and B Zosuquidar on the active site cavity of P-gp (3D and 2D interaction views)
did not bind exactly at the binding site of the P-gp. The number of interactions with P-gp was found to be 7 for Quinidine and 5 for Verapamil. Quinidine was found to have H-bonding interactions with Glutamic acid and Threonine and is surrounded by Arginine, Asparagine, Aspartic acid, Isoleucine, Lysine, Phenylalanine, and Valine. Verapamil was found to have H-bonding interactions with Arginine, Glutamine, Serine, Threonine, and Valine and is surrounded by Asparagine, Aspartic acid, Glutamic acid, Isoleucine, Lysine, Phenylalanine, and Tyrosine.
Fig. 2 Binding of A Quinidine and B Verapamil on the active site cavity of P-gp (3D and 2D interaction views)
Therefore, using the in-silico approach, we can say that compounds having a lower IC50 value inhibit the activity of the P-gp protein, which can lead to the accumulation of bile acid within hepatocyte cells and can result in drug-induced liver injury (DILI). Our in-silico study is effective in predicting the mechanism of binding of different inhibitor compounds to P-gp, thus helping in understanding their affinities for P-gp.
3.3 Toxicity Prediction of the Inhibitors We have predicted the toxicities of the drugs using a QSAR-based in-silico approach. We used the VEGA HUB software to understand potential health alerts for the compounds. Our toxicity prediction reveals that Elacridar shows most of the toxicity alerts. However, Zosuquidar, the drug with the second highest affinity, produces no toxicity alerts except for in vivo micronucleus activity. Therefore, our in-silico toxicity prediction method can help in assessing the safety of the drugs before the experimental approach (Table 2).
Table 2 Prediction of the toxicological effects of the drugs using the VEGA HUB QSAR tool

Compounds | CAS No | Mutagenicity (CAESAR) | Mutagenicity (KNN-Read-Across) | Carcinogenicity model (CAESAR) | Acute Toxicity (LD50) model (KNN) (mg/kg) | In vivo Micronucleus activity (IRFMN)
Elacridar | 143664-11-3 | Mutagenic | Mutagenic | Non-Carcinogen | 5729.94 | Toxic
Zosuquidar | 167354-41-8 | Non-Mutagenic | Non-Mutagenic | Non-Carcinogen | N/A | Toxic
Ritonavir | 155213-67-5 | Non-Mutagenic | Non-Mutagenic | Non-Carcinogen | 168.08 | Toxic
Ketoconazole | 65277-42-1 | Non-Mutagenic | Non-Mutagenic | Non-Carcinogen | N/A | Toxic
Reserpine | 50-55-5 | Non-Mutagenic | Non-Mutagenic | Carcinogen | 421.19 | Toxic
Quinidine | 56-54-2 | Non-Mutagenic | Non-Mutagenic | Non-Carcinogen | 99.48 | Non-genotoxic
Verapamil | 52-53-9 | Non-Mutagenic | Non-Mutagenic | Non-Carcinogen | 235.07 | Not predicted
4 Conclusion Our molecular docking shows that Elacridar and Zosuquidar had the greatest affinities for P-gp, Ritonavir, Ketoconazole, and Reserpine had intermediate affinities, while Quinidine had the lowest affinity for P-gp. In this study, it was observed that inhibitor compounds having a lower IC50 value had a greater affinity for the P-gp protein, and vice versa. Therefore, using the in-silico approach, we can say that compounds having lower IC50 values inhibit the activity of the P-gp protein. Our in-silico study is effective in predicting the mechanism of binding of different inhibitor compounds to P-gp, thus helping in understanding their affinities for P-gp, and it furnishes a deeper understanding of the interaction of inhibitor compounds with the P-gp protein. Therefore, our in-silico approach can perhaps be helpful in designing in-vitro experiments using the P-gp protein so as to estimate more precisely the contribution of inhibitor compounds to hepatotoxicity.
References
1. Vildhede A (2015) In vitro and in silico predictions of hepatic transporter-mediated drug clearance and drug-drug interactions in vivo. Ph.D. dissertation
2. Sadava D, Heller HC, Hillis DM, Berenbaum M (2009) Life, the science of biology, 9th edn. Macmillan Publishers, p 119. ISBN 1-4292-1962-9
3. David S, Hamilton JP (2010) Drug-induced liver injury. US Gastroenterol Hepatol Rev 6:73–80
4. Tanigawara Y (2000) Role of P-glycoprotein in drug disposition. Ther Drug Monit 22(1):137–140
5. Shapiro AB, Ling V (1997) Positively cooperative sites for drug transport by P-glycoprotein with distinct drug specificities. Eur J Biochem 250(1):130–137
6. Cummins CL, Wu CY, Benet LZ (2002) Sex-related differences in the clearance of cytochrome P450 3A4 substrates may be caused by P-glycoprotein. Clin Pharmacol Ther 72(5):474–489
7. Lin JH, Yamazaki M (2003) Role of P-glycoprotein in pharmacokinetics: clinical implications. Clin Pharmacokinet 42(1):59–98
8. Szakács G, Váradi A, Ozvegy-Laczka C, Sarkadi B (2008) The role of ABC transporters in drug absorption, distribution, metabolism, excretion and toxicity (ADME-Tox). Drug Discov Today 13(9–10):379–393
9. Schinkel AH, Wagenaar E, Mol CA, van Deemter L (1996) P-glycoprotein in the blood-brain barrier of mice influences the brain penetration and pharmacological activity of many drugs. J Clin Investig 97(11):2517–2524
10. Kis E, Ioja E, Rajnai Z, Jani M, Méhn D, Herédi-Szabó K, Krajcsi P (2012) BSEP inhibition: in vitro screens to assess cholestatic potential of drugs. Toxicol In Vitro 26(8):1294–1299
11. Yang K, Köck K, Sedykh A, Tropsha A, Brouwer KL (2013) An updated review on drug-induced cholestasis: mechanisms and investigation of physicochemical properties and pharmacokinetic parameters. J Pharm Sci 102(9):3037–3057
12. Ghanem CI, Manautou JE (2022) Role and regulation of hepatobiliary ATP-Binding Cassette (ABC) transporters during chemical-induced liver injury. Drug Metab Dispos Biol Fate Chem DMD-MR-2021-000450
13. Wakasugi H, Yano I, Ito T, Hashida T, Futami T, Nohara R, Sasayama S, Inui K (1998) Effect of clarithromycin on renal excretion of digoxin: interaction with P-glycoprotein. Clin Pharmacol Ther 64(1):123–128
14. Okamura N, Hirai M, Tanigawara Y, Tanaka K, Yasuhara M, Ueda K, Komano T, Hori R (1993) Digoxin-cyclosporin A interaction: modulation of the multidrug transporter P-glycoprotein in the kidney. J Pharmacol Exp Ther 266(3):1614–1619
15. https://www.solvobiotech.com/transporters/MDR1-P-gp
16. Jouan E, Vee ML, Mayati A, Denizot C, Parmentier Y, Fardel O (2016) Evaluation of P-glycoprotein inhibitory potential using a rhodamine 123 accumulation assay. Pharmaceutics 8:12
17. Bierman WFW, Scheffer GL, Schoonderwoerd A, Jansen G, van Agtmael MA, Danner SA, Scheper RJ (2010) Protease inhibitors atazanavir, lopinavir and ritonavir are potent blockers, but poor substrates, of ABC transporters in a broad panel of ABC transporter-overexpressing cell lines. J Antimicrob Chemother 65(8):1672–1680
18. Trott O, Olson AJ (2010) AutoDock Vina: improving the speed and accuracy of docking with a new scoring function, efficient optimization and multithreading. J Comput Chem 31:455–461
A Fast Restoration of Weather Degraded Images Avra Ghosh, Ajoy Dey, and Sheli Sinha Chaudhuri
Abstract Real-time haze removal from weather-corrupted images is a challenging process. Due to the presence of atmospheric particles, light coming from the object scatters, which results in hazy images. In this research work, a novel dehazing method is presented that introduces parallel calculation of the transmission coefficient (TC) and atmospheric light (ALE), followed by a contrast enhancement step. This improves image quality and reduces computational complexity. The effectiveness of this method is demonstrated by a comparative analysis with existing methods in terms of statistical parameters such as MSE, SSIM, PSNR, correlation, and computational complexity. Keywords Parallel processing · Dehazing · Real-time · Contrast enhancement
1 Introduction Outdoor images get degraded due to turbid weather conditions like fog, smog, vog, water droplets, etc. Light travels through the air in a straight line, but due to the presence of atmospheric particles it scatters, resulting in hazy images. When such images are fed to computer vision systems like video surveillance, satellite imaging, automatic car driving systems, etc., they reduce the performance of the system. The range of applications is huge, but the solutions available for real-time systems are still limited. Mainly four types of dehazing methods are used, as shown in Fig. 1. First, in multi-image comparison-based methods, a hazy image is compared with a single ground truth or multiple ground truths. Though this is a very effective approach, the time complexity and the availability of ground truth in real-time scenarios are problematic. Second, contrast enhancement-based methods function by adjusting the contrast or performing histogram equalization of the image. The processing time is low, but the output is not of a standard suitable for subsequent applications. The third approach proposes different deep learning and machine learning models, where the output depends upon previously seen data. A. Ghosh (B) · A. Dey · S. S. Chaudhuri Jadavpur University, Kolkata, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 N. Das et al. (eds.), Proceedings of International Conference on Data, Electronics and Computing, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-99-1509-5_21
Fig. 1 Image dehazing processes
Lastly, in the prior-based approaches, different masks are created and applied depending upon the ALE and the TC. DCP and CAP are some of the most common examples of prior-based methods [1–5]. In 2008, Tan [6] proposed a novel method for single image dehazing. The work was inspired by two key observations: first, an image with no weather degradation (or a color day image) has more contrast than an image which has been degraded by weather conditions. Based on these observations, the author developed a method to perform automatic dehazing from a single input image. In 2009, He et al. [7] developed a novel dehazing algorithm using a single image. The authors proposed a method based on the dark channel prior, which enabled very effective estimation of the airlight and the transmittance map. In 2011, Kim et al. [8] proposed a method for single image dehazing by enhancing the contrast of the degraded image. In 2020, Ghosh et al. [3] proposed a parallel-architecture method based on the dark channel prior for single image dehazing. It is divided into basically two parts, one to calculate the atmospheric light and the other to calculate the transmission coefficient in parallel, so that it can be applied in real-time applications. In 2021, Ghosh and Chaudhuri [2] presented a Raspberry Pi-based single image dehazing machine, where the image can be processed and dehazed locally. The rest of the paper is organized as follows: the discussion related to previous work is in Sect. 2, the proposed mechanism is shared in Sect. 3, followed by experimental observations in Sect. 4, and the conclusion is shared in Sect. 5.
2 Background In 1924, Koschmieder [4] presented the following expression for the representation of hazy images: I(x) = J(x)t(x) + A(1 − t(x)) (1)
where J(x) is the original scene without haze, t(x) is the transmission coefficient, A is the atmospheric light, and I(x) is the hazy image. If we analyze the above equation, the only known quantity is the hazy image I(x), whereas the haze-free image J(x), the transmission coefficient t(x), and the atmospheric light A are all unknown. The calculation of these three unknown parameters depends on three steps. First, the atmospheric light A is calculated, which is further divided into two parts: dark channel calculation followed by atmospheric light calculation. This dark channel prior method was presented by Kaiming He et al. in 2009 [7]. The equation is presented below:
I^dark(x) = min_{y∈Ω(x)} ( min_{C∈{R,G,B}} I^C(y) )   (2)
Here Ω(x) represents a square patch centered at x over which the minimum is taken, and C denotes each color channel among Red, Green, and Blue. The dark channel represents the shaded areas in the whole image. The atmospheric light A is estimated from the 0.1% brightest pixels in the dark channel. Second, they proposed a method for calculating the transmission coefficient. When the haze is homogeneous, the transmission coefficient can be represented as
t(x) = e^(−βd(x))   (3)
where β is the scattering coefficient of the atmosphere. This expression clearly represents the attenuation of haze depending upon the distance d. Considering the aerial perspective, the transmission coefficient is modified and calculated in the HSV model, where the values of S(x) and V(x) [9] are represented as below:
S(x) = 0, if max_{C∈{R,G,B}} I^C(x) = 0; otherwise S(x) = 1 − ( min_{C∈{R,G,B}} I^C(x) ) / ( max_{C∈{R,G,B}} I^C(x) )   (4)
V(x) = max_{C∈{R,G,B}} I^C(x)   (5)
After calculating S(x) and V(x), the transmission coefficient is estimated as per the proposed equation. The distance d(x) can be calculated very accurately in the HSV model and can be represented as
d(x) = θ0 + θ1·S(x) + θ2·V(x)   (6)
The values are θ0 = 0.12, θ1 = 0.96, and θ2 = −0.78 [9]. S(x) and V(x) are independent parameters used to calculate the transmission coefficient, which shows that the ALE and the TC can be calculated in parallel. The scene can be recovered from the following equation, which is derived from Eq. (1) [10]:
J(x) = (I(x) − A) / max(t(x), t0) + A   (7)
where J(x) is the recovered haze-free scene and t0 restricts the lower limit of the transmission coefficient, which helps to recover the scene properly. We use this formulation to implement the technique in our algorithm.
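The background model of Eqs. (2)–(7) can be sketched in a few NumPy functions, as below; the patch size, β, t0 and the 0.1% airlight heuristic are typical choices for this model rather than values taken from this paper, and the input image I is assumed to be an RGB float array in [0, 1].

```python
# Sketch of the background model, Eqs. (2)-(7).
import numpy as np
from scipy.ndimage import minimum_filter

def dark_channel(I, patch=15):
    # Eq. (2): local minimum over a patch of the per-pixel channel minimum
    return minimum_filter(I.min(axis=2), size=patch)

def atmospheric_light(I, patch=15):
    # A is taken from the 0.1% brightest dark-channel pixels
    dark = dark_channel(I, patch)
    n_top = max(1, int(0.001 * dark.size))
    idx = np.unravel_index(np.argsort(dark, axis=None)[-n_top:], dark.shape)
    return I[idx].mean(axis=0)

def transmission(I, beta=1.0):
    # Eqs. (4)-(6): saturation/value channels feed a linear depth estimate, Eq. (3) gives t(x)
    V = I.max(axis=2)
    S = np.where(V > 0, 1.0 - I.min(axis=2) / np.maximum(V, 1e-6), 0.0)
    d = 0.12 + 0.96 * S - 0.78 * V
    return np.exp(-beta * d)

def recover(I, A, t, t0=0.1):
    # Eq. (7): scene radiance with a lower bound t0 on the transmission
    return np.clip((I - A) / np.maximum(t, t0)[..., None] + A, 0.0, 1.0)
```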
3 Proposed Method In a hazy image, the brightness and saturation vary depending upon the haze quality. For a haze-free image, the difference between saturation and brightness is close to zero. Using this property, a parallel architecture is proposed, where the ALE and the transmission coefficient are calculated in parallel, and then a contrast enhancement method is applied to improve the image quality. Due to the parallel calculation, the time complexity of the computation decreases considerably. The atmospheric light is measured in the RGB model. The flow chart explains the procedure step by step: first, the atmospheric light and the TC are calculated in parallel, then the scene is recovered, and lastly contrast enhancement is applied to improve the image quality (Fig. 2). The contrast enhancement method is applied after scene recovery. First, the existing contrast factor is calculated on the whole image using the equations below; an extra weight is added to each pixel so that the contrast can be enhanced using these values.
C = int(CMax / 2)   (8)
contrast = int((contrast − 0) × 2C / (CMax − C))   (9)
Here CMax represents the maximum value of any color channel among R, G, and B. If the computed value reaches a threshold level, the contrast is modified accordingly.
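The parallel ALE/TC branch and the contrast adjustment of Eqs. (8)–(9) might be expressed as in the sketch below, reusing the atmospheric_light and transmission helpers from the previous sketch; the clipping policy stands in for the authors' thresholding rule, which is an assumption here.

```python
# Sketch: run the ALE and TC branches in parallel threads (atmospheric_light and
# transmission as defined in the previous sketch) and apply Eqs. (8)-(9) afterwards.
from concurrent.futures import ThreadPoolExecutor
import numpy as np

def enhance_contrast(J):
    c_max = J.max()                 # per-image maximum channel value
    C = c_max / 2.0                 # Eq. (8)
    # Eq. (9): stretch the values around C; clipping keeps the result displayable
    return np.clip((J - 0.0) * (2.0 * C) / max(c_max - C, 1e-6), 0.0, 1.0)

def fast_dehaze(I, t0=0.1):
    with ThreadPoolExecutor(max_workers=2) as pool:
        fut_A = pool.submit(atmospheric_light, I)   # ALE branch (RGB model)
        fut_t = pool.submit(transmission, I)        # TC branch (HSV model)
        A, t = fut_A.result(), fut_t.result()
    J = np.clip((I - A) / np.maximum(t, t0)[..., None] + A, 0.0, 1.0)   # Eq. (7)
    return enhance_contrast(J)
```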
4 Experimental Observations The experiments were carried out on a device with 16 GB RAM, an AMD Ryzen 5 5500U 2.10 GHz processor, and the Windows 11 Home Single Language operating system. The qualitative results for different images are shown in Table 2, with the available datasets used in different algorithms. The quantitative analysis of the image quality parameters is shown in Table 1: Structural Similarity Index (SSIM), Mean Square Error (MSE), Correlation (Cor), and Peak Signal to Noise Ratio (PSNR) are reported in the quantitative analysis.
Fig. 2 Flow chart of proposed dehazing algorithm

Table 1 Comparison of reference image quality parameters for images

Serial No. | Image | SSIM | PSNR | MSE | Cor
1 | Fishers | 0.9911 | 35.7303 | 17.37982 | 0.99774
2 | Foggy_bench | 0.9660 | 29.9711 | 65.4586 | 0.9903
3 | Foggy_oaks | 0.9073 | 31.4246 | 46.8402 | 0.9861
4 | FoggyHouse | 0.9665 | 28.8945 | 83.8741 | 0.9955
5 | Haze.jpg | 0.9126 | 30.6263 | 56.2916 | 0.9980
6 | HazyDay_input | 0.9592 | 28.9899 | 82.0509 | 0.9939
7 | House_input.jpg | 0.87603 | 28.1955 | 98.5204 | 0.9925
8 | Lilyhazy | 0.9469 | 29.5462 | 72.1869 | 0.9986
9 | Marin-headlands-bridge | 0.8325 | 29.9279 | 66.1128 | 0.9966
10 | Moebius_input | 0.9570 | 29.3647 | 75.2675 | 0.9952
11 | Mountain-input | 0.9460 | 28.4774 | 92.3289 | 0.9924
12 | Trees | 0.8596 | 27.4586 | 116.7397 | 0.9893
Table 2 Computational complexity for our method

Serial No. | Image | Image size | TC (in secs)
1 | Fishers | 346*512*3 | 2.0914
2 | Foggy_bench | 600*800*3 | 5.5734
3 | Foggy_oaks | 376*520*3 | 2.2424
4 | FoggyHouse | 1536*2048*3 | 29.7202
5 | Haze.jpg | 300*400*3 | 1.0901
6 | HazyDay_input | 576*768*3 | 3.9548
7 | House_input.jpg | 448*440*3 | 2.3640
8 | Lilyhazy | 480*640*3 | 2.8576
9 | Marin-headlands-bridge | 680*1024*3 | 8.0971
10 | Moebius_input | 956*1200*3 | 9.7631
11 | Mountain-input | 384*512*3 | 2.3341
12 | Trees | 723*1076*3 | 8.6899
The qualitative results compared with other existing algorithms are shown in Table 3. Specifically, we have shown comparisons with histogram equalization (H.E.) of the image, the dark channel prior-based method [7], the color attenuation prior-based method [10], and Meng's algorithm [11]. With less time complexity, our algorithm shows competitive results. The computational complexity is shown in Table 4; TC refers to the time complexity of our method. Tables 5, 6, 7 and 8 show the quantitative comparison between the different algorithms for SSIM, MSE, PSNR, and Correlation, respectively. The statistics show competitive results with respect to the other algorithms.
5 Conclusion Real-time haze removal from hazy images has been carried out in this paper in a novel way by implementing parallel calculation of the ALE and TC, followed by a contrast enhancement method. The comparison of qualitative and quantitative results with other existing approaches also shows competitive results. The statistical parametric study (MSE, SSIM, PSNR, Correlation, FPS) shows the objective effectiveness of the proposed model. The computational complexity was also measured and found to be suitable for further processing. The qualitative analysis shown in the results section above likewise demonstrates competitive output. This research was conducted in the DCIP Lab, ETCE Department, Jadavpur University.
Table 3 Experimental results (hazy and dehazed image pairs) for the test images: Fishers, Foggy_bench, Foggy_oak, Foggyhouse, Haze, HazyDay_input, House_input, Lilyhazy, and Marin-headlands-bridge
Table 4 Qualitative (visual) comparison of the hazy image, H.E., DCP, CAP, Meng, and our method for the images Fishers, Hazyday, House, Lilyhazy, and Trees
Table 5 Quantitative comparison with different algorithms for SSIM

Image name | H.E. | DCP | CAP | Meng | Our method
Fishers | 0.6321 | 0.9364 | 0.6112 | 0.4241 | 0.9911
Hazyday | 0.7257 | 0.8674 | 0.7108 | 0.5861 | 0.9592
House | 0.7593 | 0.9751 | 0.9073 | 0.8068 | 0.8760
Lilyhazy | 0.8029 | 0.9282 | 0.7663 | 0.7472 | 0.9469
Trees | 0.6971 | 0.9548 | 0.8973 | 0.9053 | 0.8596

Table 6 Quantitative comparison with different algorithms for MSE

Image name | H.E. | DCP | CAP | Meng | Our method
Fishers | 115.5912 | 116.5042 | 106.6335 | 99.7171 | 17.3799
Hazyday | 82.8156 | 89.7741 | 114.5023 | 107.7012 | 82.0510
House | 105.7126 | 47.2803 | 70.7754 | 84.8389 | 98.5205
Lilyhazy | 74.6058 | 100.1549 | 109.9089 | 123.2106 | 72.1869
Trees | 107.2912 | 60.2679 | 114.7142 | 116.1447 | 116.7397

Table 7 Quantitative comparison with different algorithms for PSNR

Image name | H.E. | DCP | CAP | Meng | Our method
Fishers | 27.5016 | 27.4674 | 27.8519 | 28.1431 | 35.7303
Hazyday | 28.9497 | 28.5993 | 27.5427 | 27.8086 | 28.9900
House | 27.8895 | 31.3840 | 29.6320 | 28.8449 | 28.1955
Lilyhazy | 29.4031 | 28.1241 | 27.7205 | 27.2243 | 29.5462
Trees | 27.8252 | 30.3299 | 27.5346 | 27.4808 | 27.4586

Table 8 Quantitative comparison with different algorithms for correlation

Image name | H.E. | DCP | CAP | Meng | Our method
Fishers | 0.8815 | 0.9763 | 0.8018 | 0.7231 | 0.9977
Hazyday | 0.9622 | 0.9474 | 0.8997 | 0.7955 | 0.9940
House | 0.9710 | 0.9407 | 0.9068 | 0.8405 | 0.9925
Lilyhazy | 0.9605 | 0.9938 | 0.9785 | 0.9435 | 0.9987
Trees | 0.9537 | 0.8207 | 0.7644 | 0.7968 | 0.9893
References
1. Yongmin P, Tae-Hwan K (2018) Fast execution schemes for dark-channel-prior-based outdoor video dehazing. IEEE Access 6:10003–10014
2. Ghosh A, Chaudhuri SS (2021) IoT based portable image dehazing machine. In: 2021 8th international conference on signal processing and integrated networks (SPIN), pp 31–35
3. Ghosh A, Roy S, Chaudhuri SS (2020) Hardware implementation of image dehazing mechanism using Verilog HDL and parallel DCP. In: 2020 IEEE applied signal processing conference (ASPCON), pp 283–287
4. Hans I, Kasten F (1959) Koschmieders Theorie der horizontalen Sichtweite. VS Verlag für Sozialwissenschaften, Wiesbaden, pp 7–10
5. Qingsong Z, Jiaming M, Ling S (2015) A fast single image haze removal algorithm using color attenuation prior. IEEE Trans Image Process 24(11):3522–3533
6. Tan RT (2008) Visibility in bad weather from a single image, pp 1–8
7. Kaiming H, Jian S, Xiaoou T (2011) Single image haze removal using dark channel prior. IEEE Trans Pattern Anal Mach Intell 33(12):2341–2353
8. Kim J-H, Sim J-Y, Kim C-S (2011) Single image dehazing based on contrast enhancement. In: 2011 IEEE international conference on acoustics, speech and signal processing (ICASSP)
9. Yu-Hsuan L, Wu B-H (2019) Algorithm and architecture design of a hardware-efficient image dehazing engine. 29:2146–2161
10. Qingsong Z, Jiaming M, Ling S (2015) A fast single image haze removal algorithm using color attenuation prior. IEEE Trans Image Process 24(11):3522–3533
11. Meng G, Wang Y, Duan J, Xiang S, Pan C (2013) Efficient image dehazing with boundary constraint and contextual regularization. In: 2013 IEEE international conference on computer vision, pp 617–624
Multi-level Feature-Based Subcellular Location Prediction of Apoptosis Proteins Soumyendu Sekhar Bandyopadhyay, Anup Kumar Halder, Kaustav Sengupta, Piyali Chatterjee, Mita Nasipuri, Dariusz Plewczynski, and Subhadip Basu
Abstract Apoptosis is considered a vital component of various processes including normal cell turnover, proper development and functioning of the immune system, hormone-dependent atrophy, embryonic development, and chemical-induced cell death (Elmore in Toxicol Pathol 35:495–516, 20). Apoptosis proteins are strongly related to many diseases like neurodegenerative diseases, ischemic damage, autoimmune disorders, and many types of cancer and play an indispensable role in maintaining the dynamic balance between cell death and division. Many apoptosis proteins are identified but their activity at cellular or molecular level needs to be investigated. The prediction of subcellular localization of an apoptosis protein is still a challenging task. The subcellular localization prediction of apoptosis proteins can help to understand their function and the role of metabolic processes. In this paper, we have S. S. Bandyopadhyay (B) · M. Nasipuri · S. Basu Department of Computer Science and Engineering, Jadavapur University, Kolkata 700032, India e-mail: [email protected] S. Basu e-mail: [email protected] S. S. Bandyopadhyay Department of Information Technology, Institute of Engineering & Management Kolkata, University of Engineering & Management, Kolkata 700091, West Bengal, India A. K. Halder · K. Sengupta · D. Plewczynski Faculty of Mathematics and Information Sciences, Warsaw University of Technology, Koszykowa 75, 00-662 Warsaw, Poland Laboratory of Functional and Structural Genomics Centre of New Technologies, University of Warsaw, Banacha 2C Street, 02-097 Warsaw, Poland K. Sengupta e-mail: [email protected] D. Plewczynski e-mail: [email protected]; [email protected] P. Chatterjee Department of Computer Science and Engineering, Netaji Subhash Engineering College, Kolkata 700152, India © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 N. Das et al. (eds.), Proceedings of International Conference on Data, Electronics and Computing, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-99-1509-5_22
applied RF-based classification algorithms for predicting the localization of apoptosis proteins in the CL317 database using mutual information (MMI), Normalized Moreau-Broto Autocorrelation (NMBAC) (Ding et al. in BMC Bioinform 17:1–13, 2016), and GO frequency values (GOi). Keywords Apoptosis proteins · MMI · NMBAC · GOi · Random forest (RF)
1 Introduction Apoptosis, the process of programmed cell death, occurs during early development to eliminate unwanted cells and to maintain a stable internal environment. If apoptosis is prevented for whatever reason, it can result in uncontrolled cell division and the subsequent development of a tumor, thus playing a significant role in the development of cancer. Too much apoptosis in a normally functioning human results in several so-called neurodegenerative diseases, like Parkinson's disease, that can lead to death [1]. It takes a lot of work and time to determine a protein's subcellular localization via immunolabeling or tagging. As a result, an in-silico technique for swiftly and properly annotating apoptosis proteins is needed, which will help us understand how apoptosis works and develop new pharmacological treatments. Zhang et al. [1] proposed a novel feature extraction technique for protein sequences with clustered weights and utilized support vector machines to estimate the subcellular location of apoptosis proteins. Another method was developed by Chen et al. [2], using a distinct set of information parameters derived from the primary sequences of the CL317 apoptosis proteins, based on the local compositions of twin amino acids and the distribution of hydropathy. The concept of distance frequency was applied with an SVM by Zhang et al. [3] to obtain the highest overall accuracy on the CL317 and ZW225 datasets, respectively. Liu et al. [4] proposed an SVM classifier-based PSSM-AC model, where a novel sequence representation is used to incorporate the evolutionary information in PSSMs through the auto-covariance transformation, on the CL317 and ZW225 datasets. Recently, there have been many machine learning applications which use novel features based on protein sequence and evolutionary information. Liang and Zhang [5] proposed FTC-DFMCA-PSSM by combining two different descriptors of evolutionary information, which include the 190 features from a detrended forward moving-average cross-correlation analysis (DFMCA) based on a position-specific scoring matrix (PSSM) and the 192 frequencies of triplet codons (FTC) in the RNA sequence derived from the protein's primary sequence. A 5-dimensional feature vector was created via the generalized chaos game representation (GCGR), which is based on the frequency and distribution of the residues in the protein primary sequence, together with novel statistics and information theory (NSI) features that reflect local position information of the sequence, by Li et al. [6]; without using machine learning-based classifiers, this achieved a reasonable accuracy on the CL317 and ZW225 datasets.
Table 1 Proportion of various subcellular location proteins in the CL317 dataset

Dataset | Subcellular locations | # of proteins
CL317 | Cytoplasm (Cy) | 112
CL317 | Endoplasm (Er) | 47
CL317 | Membrane (Me) | 55
CL317 | Mitochondrion (Mi) | 34
CL317 | Nucleus (Nu) | 52
CL317 | Secreted (Se) | 17
Chen et al. [7] proposed two new evolutionary information-based feature extraction approaches: one derived from the PSSM using absolute entropy correlation analysis, and one based on evolutionary information from the transition matrix of the consensus sequence. Motivated by these works, we have used multivariate mutual information (MMI), which is used to calculate the k-gram feature representation, and we have extracted physicochemical properties of amino acids from the protein sequences [8]. We have fused the GO terms of the proteins with the MMI and physicochemical properties to achieve better prediction accuracy.
2 Material and Methods 2.1 Datasets In this work, we have selected the CL317 dataset for comparison with other related methods. All the protein sequences in the dataset were extracted from SWISS-PROT (http://www.ebi.ac.uk/swissprot/). Table 1 describes the proportion of the various subcellular location proteins in the CL317 dataset.
2.2 Feature Extraction 2.2.1
Multivariate Mutual Information (MMI)
We included a primary sequence-based feature in our method for retrieving structurally similar proteins. Following the strategy described in [9], we begin by extracting the trigram (n = 3) frequency feature from the sequence data. MMI is derived from trigram sub-patterns and is used to pairwise compare protein sequences. Second, a greedy heuristic clustering algorithm is developed, in which sequences are grouped based on a pairwise distance threshold. Finally, two structural alignment properties are applied to the generated clusters. We begin by building an amino acid
frequency model with n-grams. In this experiment, we use a trigram (n = 3) to build the model. The representation has a dimensionality that is exponential (20^n); as a result, the dimensionality of the trigram feature increases to 8000. For a particular amino acid, a trigram can be represented as a pattern of three contiguous amino acids, with the other two considered as neighboring positions. For any trigram pattern (x, y, z), MMI is defined as
I(x, y, z) = I(x, y) − I(x, y|z)
where x, y, and z are three amino acids in one unit, and I(x, y), the mutual information of a bigram (n = 2), is defined as
I(x, y) = f(x, y) ln( f(x, y) / ( f(x) f(y) ) )
where f(x, y) is the frequency composition of the bigram (f(x, y) = f(y, x)), and f(x) and f(y) are the unigram frequencies of amino acids x and y, respectively. I(x, y|z) denotes the conditional mutual information, defined as
I(x, y|z) = H(x|z) − H(x|y, z)
Approximately, H(x|z) and H(x|y, z) can be calculated as
H(x|z) = −( f(x, z) / f(z) ) ln( f(x, z) / f(z) )
H(x|y, z) = −( f(x, y, z) / f(y, z) ) ln( f(x, y, z) / f(y, z) )
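A literal transcription of these formulas into Python is sketched below; it estimates frequencies from contiguous n-grams of a single sequence and evaluates the MMI of one trigram pattern, whereas the full feature vector in the paper covers all 8000 patterns (and grouped amino-acid alphabets), which is omitted here.

```python
# Sketch: n-gram frequencies and the MMI value of one trigram pattern (x, y, z).
import math
from collections import Counter

def ngram_freq(seq, n):
    counts = Counter(seq[i:i + n] for i in range(len(seq) - n + 1))
    total = max(1, len(seq) - n + 1)
    return {g: c / total for g, c in counts.items()}

def mmi(seq, x, y, z, eps=1e-9):
    f1, f2, f3 = ngram_freq(seq, 1), ngram_freq(seq, 2), ngram_freq(seq, 3)

    def f(*aa):
        # look up the frequency of a 1-, 2- or 3-letter pattern (eps if unseen)
        return {1: f1, 2: f2, 3: f3}[len(aa)].get("".join(aa), eps)

    I_xy = f(x, y) * math.log(f(x, y) / (f(x) * f(y)))            # I(x, y)
    H_x_z = -(f(x, z) / f(z)) * math.log(f(x, z) / f(z))          # H(x|z)
    H_x_yz = -(f(x, y, z) / f(y, z)) * math.log(f(x, y, z) / f(y, z))  # H(x|y, z)
    I_xy_z = H_x_z - H_x_yz                                       # I(x, y|z)
    return I_xy - I_xy_z                                          # I(x, y, z)
```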
2.2.2
Normalized Moreau-Broto Autocorrelation (NMBAC)
Inspired by Feng et al. [10], an auto-correlation function has been introduced in combination with six physicochemical properties of amino acids, viz.: hydrophobicity (H), volumes of side chains of amino acids (VSC), polarity (P1), polarizability (P2), solvent-accessible surface area (SASA), and net charge index of side chains (NCISC). Proteins are translated into six vectors, with each amino acid represented by the normalized values of the six descriptors. Mathematically, NMBAC can be given as
AC(lag, j) = 1/(n − lag) × Σ_{i=1}^{n−lag} X_{i,j} · X_{i+lag,j},   where i = 1, 2, …, n − lag and j = 1, 2, …, 6
Here, the position in the protein sequence X is denoted by i, j represents one of the six descriptors, and lag is the serial distance between two residues.
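For one protein, the computation can be sketched as below, where X is an (n × 6) array of the normalized descriptor values per residue; the maximum lag is a tunable assumption, as the paper does not state the value used.

```python
# Sketch of the NMBAC autocorrelation feature for one protein.
import numpy as np

def nmbac(X, max_lag=30):
    n = X.shape[0]
    feats = []
    for lag in range(1, max_lag + 1):
        # AC(lag, j) = 1/(n - lag) * sum_{i=1}^{n-lag} X[i, j] * X[i+lag, j]
        ac = (X[:n - lag] * X[lag:]).sum(axis=0) / (n - lag)
        feats.extend(ac)            # six values per lag, one per descriptor
    return np.asarray(feats)
```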
2.2.3
GO Frequency Value (GOi )
GO [11] is a controlled and structured vocabulary of ontological terms that describe information about a protein's localization within cellular components (CC), participation in biological processes (BP), and association with molecular functions (MF). The GO frequency value (GOi) for the domain is computed by counting the occurrence of the i-th GO term in our dataset of subcellular protein sequences and is scaled to the range [0, 1]. Each protein is therefore represented by GOi as a 7125-dimensional vector. The j-th element in the vector is assigned the value 1 if the protein has the j-th GO term associated with it; the rest of the elements are set to 0.
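Following the binary assignment described above, the per-protein vector might be built as in this sketch; go_vocabulary stands for the ordered list of the 7125 GO terms observed in the dataset and is assumed to be precomputed.

```python
# Sketch: binary GO occurrence vector for one protein.
import numpy as np

def go_vector(protein_go_terms, go_vocabulary):
    index = {term: j for j, term in enumerate(go_vocabulary)}
    v = np.zeros(len(go_vocabulary))
    for term in protein_go_terms:
        if term in index:
            v[index[term]] = 1.0   # 1 if the j-th GO term is annotated to the protein
    return v
```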
3 Results 3.1 Performance Evaluation and Validation Method In statistical prediction, several validation methods are commonly used to measure the performance of the prediction model, including the jackknife test, independent dataset test, and k-fold cross validation. In this paper, we validated our dataset using tenfold cross validation, and we use the following measurement standards to evaluate the reliability and effectiveness of the proposed method: Area under ROC curve.
3.2 Classifier Selection To check the efficacy of the classifiers, our prediction method was evaluated on the dataset using the combination of the above-mentioned features with four machine learning algorithms: support vector machine (SVM), random forest (RF), K-nearest neighbors (KNN), and Gaussian Naïve Bayes (GNV). The performance of all classifiers, for all six locations, is presented in Table 2.
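A sketch of such a comparison with scikit-learn is shown below, using per-location one-vs-rest AUC under tenfold cross-validation; the hyperparameters are illustrative defaults, not the settings used by the authors.

```python
# Sketch: per-location one-vs-rest ROC-AUC with tenfold CV for the four classifiers.
import numpy as np
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import StratifiedKFold, cross_val_score

classifiers = {
    "SVM": SVC(probability=True),
    "KNN": KNeighborsClassifier(),
    "RF": RandomForestClassifier(n_estimators=500, random_state=0),
    "GNV": GaussianNB(),
}

def auc_table(X, y, locations=("Cy", "Er", "Me", "Mi", "Nu", "Se")):
    y = np.asarray(y)
    cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
    scores = {}
    for name, clf in classifiers.items():
        scores[name] = {
            loc: cross_val_score(clf, X, (y == loc).astype(int), cv=cv, scoring="roc_auc").mean()
            for loc in locations
        }
    return scores
```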
3.3 Feature Selection In this section, to further evaluate the effectiveness of the proposed method, three basic features as described in the previous sections are being considered, and the combinations of the features are also being evaluated based on their respective AUC
Table 2 Selection of classifier based on the AUC score (dataset: CL317)

Classifier | Cy | Er | Me | Mi | Nu | Se
SVM | 0.928 | 0.783 | 0.942 | 0.928 | 0.864 | 0.854
KNN | 0.715 | 0.68 | 0.756 | 0.748 | 0.701 | 0.902
RF | 0.95 | 0.867 | 0.962 | 0.978 | 0.917 | 0.978
GNV | 0.899 | 0.854 | 0.922 | 0.85 | 0.849 | 0.774
Table 3 Selection of features based on the AUC score (dataset: CL317)

Feature | Cy | Er | Me | Mi | Nu | Se
GO + MMI | 0.963 | 0.897 | 0.963 | 0.987 | 0.937 | 0.982
GO + NMBAC | 0.967 | 0.835 | 0.958 | 0.981 | 0.927 | 0.931
MMI + NMBAC | 0.939 | 0.837 | 0.924 | 0.968 | 0.913 | 0.938
GO + MMI + NMBAC | 0.953 | 0.867 | 0.963 | 0.978 | 0.917 | 0.978
score to select the most efficient feature. The evaluation was carried out for all six subcellular locations. Table 3 shows the AUC scores of the different features (viz. MMI, GO, NMBAC) in all possible combinations. The average accuracy over tenfold cross validation represents the overall accuracy of the classifier. The results show that the combination of the GO and MMI features gives the best results among all. It gives ~ 96% AUC for Cytoplasm, ~ 89% AUC for Endoplasmic Reticulum, ~ 96% for Membrane, ~ 98% for Mitochondria, ~ 93% for Nucleus, and ~ 98% for secreted proteins. The ROC-AUC curves for all six locations are given in Fig. 1.
3.4 Overall Classification Based on the selected classifier (RF) and feature (combination of GO and MMI), overall classification is being performed for all six subcellular locations as given in Table 4. The classification is done based on five parameters (viz. precision (Pr), sensitivity (Se), accuracy (Accu), F1-score (F1), and AUC score (AUC)). All the results are obtained using tenfold cross validation, and the average results have been listed in the table for all the parameters.
Fig. 1 ROC-AUC curve for all six locations using the combination of GO and MMI feature
Table 4 Classification results of CL317 dataset using GO + MMI feature

Location | Pr | Se | Accu | F1 | AUC
Cy | 0.885 | 0.893 | 0.886 | 0.858 | 0.963
Er | 0.884 | 0.933 | 0.864 | 0.863 | 0.897
Me | 0.892 | 0.936 | 0.914 | 0.904 | 0.963
Mi | 0.763 | 0.973 | 0.95 | 0.758 | 0.987
Nu | 0.87 | 0.935 | 0.848 | 0.86 | 0.937
Se | 0.628 | 0.943 | 0.921 | 0.685 | 0.982
3.5 Method Comparison In this section, to further evaluate the effectiveness of the proposed method, we compare it with some previous methods on the same apoptosis protein dataset. Our proposed method is compared with five other methods, viz. Chen et al. [2], Chen et al. [12], Ding et al. [13], Liu et al. [4], and Zhang et al. [3]. For the comparison, the sensitivity score (Se) is considered for all six locations. Except for the sensitivity scores of Cy and Er reported by Ding et al. [13], and of Er, Membrane, and Nucleus reported by Liu et al. [4], our method outperforms the sensitivity scores for all the subcellular locations reported by the other methods. With reference to the overall accuracy (OA) score, our proposed method outperforms the first two but fails to beat the scores reported by Ding et al., Zhang et al., and Liu et al. Our proposed method provides state-of-the-art performance in predicting the subcellular locations of secreted proteins and mitochondrial proteins. The results are described in Table 5.
Table 5 Comparison with different methods on the CL317 dataset (the Cy–Se columns give the sensitivity score, Se; OA is the overall accuracy)

Methods | Cy | Er | Me | Mi | Nu | Se | OA
Chen et al. [2] | 0.813 | 0.83 | 0.818 | 0.853 | 0.827 | 0.882 | 0.827
Chen et al. [12] | 0.911 | 0.872 | 0.891 | 0.794 | 0.731 | 0.588 | 0.842
Zhang et al. [3] | 0.929 | 0.865 | 0.855 | 0.765 | 0.936 | 0.765 | 0.88
Ding et al. [13] | 0.989 | 0.979 | 0.836 | 0.794 | 0.904 | 0.824 | 0.915
Liu et al. [4] | 0.982 | 0.957 | 0.964 | 0.941 | 0.962 | 0.824 | 0.959
Our method | 0.893 | 0.933 | 0.936 | 0.973 | 0.935 | 0.943 | 0.897
4 Conclusion In this paper, we focus on designing novel features for predicting the subcellular locations of apoptosis proteins. We introduce an encoding scheme in which the twenty amino acids are clustered into seven functional groups and the protein sequences are encoded accordingly. The sequence information feature is combined with a physicochemical property-based auto-correlation function and an ontology-based binary feature. Among all the features, based on the AUC score, the combination of GO and MMI is selected when applied to the benchmark dataset CL317. Though our proposed method achieves an overall accuracy of ~ 90% and fails to beat the scores reported by Ding et al. [13] and Liu et al. [4], it provides state-of-the-art performance in predicting the subcellular locations of secreted proteins and mitochondrial proteins. Incorporation of evolutionary information may boost the prediction accuracy of these classifiers. Acknowledgements This work is partially supported by the CMATER research laboratory of the Computer Science and Engineering Department, Jadavpur University, India Funding This work has been supported by the Polish National Science Centre (2019/35/O/ST6/02484 and 2020/37/B/NZ2/03757) and by Warsaw University of Technology within the Excellence Initiative: Research University (IDUB) program.
References
1. Zhang Z-H, Wang Z-H, Zhang Z-R, Wang Y-X (2006) A novel method for apoptosis protein subcellular localization prediction combining encoding based on grouped weight and support vector machine. FEBS Lett 580(26):6169–6174
2. Chen Y-L, Li Q-Z (2007) Prediction of the subcellular location of apoptosis proteins. J Theor Biol 245(4):775–783
3. Zhang L, Liao B, Li D, Zhu W (2009) A novel representation for apoptosis protein subcellular localization prediction using support vector machine. J Theor Biol 259(2):361–365
4. Liu T, Tao P, Li X, Qin Y, Wang C (2015) Prediction of subcellular location of apoptosis proteins combining tri-gram encoding based on PSSM and recursive feature elimination. J Theor Biol 366:8–12
5. Liang Y, Zhang S (2018) Prediction of apoptosis protein's subcellular localization by fusing two different descriptors based on evolutionary information. Acta Biotheor 66(1):61–78
6. Li B, Cai L, Liao B, Fu X, Bing P, Yang J (2019) Prediction of protein subcellular localization based on fusion of multi-view features. Molecules 24(5):919
7. Pan X et al (2021) Identification of protein subcellular localization with network and functional embeddings. Front Genet 11:1800
8. Ding Y, Tang J, Guo F (2016) Predicting protein-protein interactions via multivariate mutual information of protein sequences. BMC Bioinform 17(1):1–13
9. Halder AK, Chatterjee P, Nasipuri M, Plewczynski D, Basu S (2018) 3gClust: human protein cluster analysis. IEEE/ACM Trans Comput Biol Bioinform 1(1)
10. Feng Z-P, Zhang C-T (2000) Prediction of membrane protein types based on the hydrophobic index of amino acids. J Protein Chem 19(4):269–275
11. Gene Ontology Consortium (2019) The gene ontology resource: 20 years and still going strong. Nucleic Acids Res 47(D1):D330–D338
12. Chen Y-L, Li Q-Z (2007) Prediction of apoptosis protein subcellular location using improved hybrid approach and pseudo-amino acid composition. J Theor Biol 248(2):377–381
13. Gu Q, Ding Y-S, Jiang X-Y, Zhang T-L (2010) Prediction of subcellular location apoptosis proteins with ensemble classifier and feature selection. Amino Acids 38(4):975–983
The Identification of Chromatin Contact Domains (CCD) in Human Genomes from ChIA-PET Data Using Graph Methods Rafał Chabasiński, Kaustav Sengupta, and Dariusz Plewczynski
Abstract The aim of the study is to implement selected graph-based methods for the identification of chromatin contact domains (CCD) in human genomes. Such genomic domains (CCDs) represent the segments of chromatin 10 nm fiber composed from DNA double helix and proteins that are tightly compacted forming the higherorder chromosomal structures. The spatial contacts between genomic loci within and between CCDs are detected by the various 3C-type experimental methods, yet they are affected by the low signal-to-noise ratio, and they are averaged over millions of cells. In the proposed method, we will implement computational methods based on graph theory. Next, we will evaluate the performance of these algorithms comparing their results with the orthogonal methods and the current state of knowledge about the biophysical nature of globular domains in mammalian genomes. Keywords CCD identification · Graph-based methods · Human genome · Bioinformatics · CTCF · ChIA-PET
1 Introduction Human DNA is not a simple, straight polymer. DNA is a 2-m-long chain of nucleotides which is condensed in the cell nucleus within a space of about 2 micrometers. This condensation leads to the formation of a 3D structure mediated by various
protein factors like CTCF. These structures are hierarchical in nature and can be classified into various levels of hierarchy. The most basic structure is called a chromatin loop. Loops are created when two parts of chromatin are brought together in 3D space by factors such as CTCF or by promoter-promoter or promoter-enhancer contacts. A set of loops, or a region which has a higher concentration of such loops, is called a topologically associating domain (TAD) or chromatin contact domain (CCD). These TADs are globular, and they have similar levels of gene activation [1–3]. The TAD regions have more loops internally than externally and often contain enhancers and their target genes [4]. The next structures observed are more associated with the functionality of the genome. In [5], it has been shown that these topological domains are folded in such a way that they form six different subunits based on histone marks. These subunits are also called sub-compartments, and based on the transcriptional level, we can classify them into two compartments, namely active (A) and inactive (B). The active A compartment consists of the A1 and A2 subunits, whereas B consists of the B1, B2, B3, and B4 subunits. Also, interactions between TADs can form higher-order structures such as chromatin compartments [4, 6] or specific topologies called meta-TADs [7], which display increased tissue specificity (Fig. 1). TADs/CCDs can be identified because TAD boundaries are enriched for promoters of active genes and for transcription, suggesting that these genomic elements are key factors in 3D genome architecture [8]. CCCTC-binding factor (CTCF) is also enriched at many TAD boundaries [1], and genomic rearrangements of these boundaries can disrupt interactions of CTCF binding sites with upstream convergent sites and alter the expression of neighboring genes [4]. The loop extrusion model [9] has been proposed as an elegant mechanism to generate TADs and compaction of chromatin; a model that aims to explain how cohesin acts as a chromatin extruder in the absence of
Fig. 1 Hierarchical structure of the genome
CTCF. However, it is debatable whether all TADs are formed through the same mechanisms. For example, TAD boundaries can be retained upon deletion of CTCF binding sites, and TAD-like domains are also found in species that do not have CTCF homologs. In the past years, efforts have focused on identifying TADs from Hi-C data [10]. The methods designed so far can be classified into four broad types: (i) insulation score methods, which break the genome into bins and assign each bin a score based on the number of loops, in order to find the bins with the most interactions and detect TADs; (ii) statistical methods, which rely on the statistics of the loop distribution; (iii) clustering methods, which are mostly based on hierarchical clustering of genomic regions; and (iv) the most recent methods, which tend to use graphs by representing the Hi-C matrix as an adjacency matrix and detecting communities in these graphs [10, 11]. With the advancement of various other methods to detect 3D chromosomal organization, like ChIA-PET, promoter capture Hi-C, or GAM, the present graph-based methods can be modified and used for TAD detection. The loops/interacting fragments identified by these methods can easily be represented as graphs. The hypothesis we propose in this paper is: can the problem of TAD identification be modeled as identifying highly interconnected regions in graphs, and can we use simple cluster detection algorithms to identify TADs with reasonable computational and time complexity? To address the aforementioned task, in the proposed work we have used ChIA-PET data and represented them as graphs. In our graphs, the CTCF anchors are represented as nodes, and the loops are represented as edges. We then apply three simple graph algorithms (modularity maximization, the Markov cluster algorithm, and k-clique percolation) to show how efficiently each of them detects TADs in blood cells from different individuals, and we compare the results among them.
2 Material and Methods ChIA-PET Data: In the proposed work, we use data from CTCF and RNAPOL2 ChIA-PET [12] (Chromatin Interaction Analysis by Paired-End Tag Sequencing) experiments. The ChIA-PET method is based on the identification of pairs of DNA sequence segments which form a spatial contact. The result of the experiment is a set of such pairs, where each pair is described by its localization on the whole-genome DNA sequence and the likelihood of spatial interaction (i.e., the strength of contact between them, measured by the number of observed pairs of reads from those genomic locations). A chromatin domain is represented in such a network model as a densely connected cluster of vertices. In the proposed method, we focus on the human lymphoblastoid cell line GM12878, representing white blood B cells. We compare the different algorithms over the lymphoblastoid cell lines HG00512, HG00513, HG00514 (Han Chinese), HG00731, HG00732, HG00733 (Puerto Rican), and NA19238, NA19239, NA19240 (Yoruba), variations of the GM12878 cell
line. In the proposed method, the new human hg38 genome is used as the reference DNA sequence.
2.1 Graph Construction The experimental data collected from the next-generation sequencing of ChIA-PET libraries can be converted into a graph, where edges represent spatial contacts between pairs of chromatin fragments (anchors), which are represented as vertices [3]. In the proposed work, we have identified genomic domains for these 9 cell lines and then compared them with each other, evaluating the variability of the CCD domains between different cell types and between different individuals from the human population.
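One way to build such a graph with networkx is sketched below; the BEDPE-like column layout and the PET-count filter are assumptions about the input format, not a specification from the paper.

```python
# Sketch: build a graph from ChIA-PET loops, with anchors as nodes and loops as weighted edges.
import networkx as nx
import pandas as pd

def chiapet_graph(bedpe_path, min_pet=2):
    cols = ["chr1", "start1", "end1", "chr2", "start2", "end2", "pet_count"]
    loops = pd.read_csv(bedpe_path, sep="\t", header=None, names=cols)
    G = nx.Graph()
    for row in loops.itertuples(index=False):
        if row.pet_count < min_pet:
            continue                              # drop weakly supported contacts
        a = (row.chr1, row.start1, row.end1)      # anchor 1
        b = (row.chr2, row.start2, row.end2)      # anchor 2
        G.add_edge(a, b, weight=row.pet_count)
    return G
```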
2.2 Modularity Maximization The first algorithm that we use is modularity maximization; specifically, we use the greedy version proposed by Clauset–Newman–Moore. The algorithm basically follows two steps. 1. Iterate over all nodes, placing every node in the community that gives the maximum modularity gain; this stops when no more moves can improve the modularity score. The main reason why the algorithm performs so well is that computing the modularity difference when putting a single node into a specific cluster is quite easy. 2. Create a new network out of the communities found during the first phase.
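With the graph from the previous sketch, the corresponding networkx call might look as follows; treating PET counts as edge weights is an assumption.

```python
# Sketch: CCD candidates from greedy (Clauset-Newman-Moore) modularity maximization.
from networkx.algorithms.community import greedy_modularity_communities

def modularity_ccds(G):
    # each community is returned as a set of anchor nodes
    return [set(c) for c in greedy_modularity_communities(G, weight="weight")]
```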
2.3 Markov Cluster Algorithm The Markov clustering algorithm (MCL) is a dynamic clustering algorithm that typically detects sets of highly interconnected nodes and classifies them as clusters. The algorithm is based on the premise that if we start in a node, and then randomly travel to connected nodes, we are more likely to stay in a cluster rather than travel between clusters. So we define a random walk using a Markov chain (sequence of variables, where given the present state, the past, and future states are independent). We define a probability for going from one node to another. If edges in a graph are of equal weight, chance of traveling to neighbor is 1/number of neighbors, and chance of traveling to non-neighbor is 0. We can represent all probabilities of travels from node to node in a probability matrix, where each column sums to one.
2.4 k-Clique Percolation A k-clique is a complete subgraph on k vertices. The method searches for such cliques in a network and then merges two k-cliques if they share k − 1 common nodes. The method thus finds maximal unions of connected k-cliques, forming clusters; these structures are called k-clique chains. This approach is based on the intuition that cliques are likely to be found inside clusters, while their occurrence is unlikely among the nodes connecting the clusters. It is worth noting that changing the parameter k results in identifying communities of different strength, which is quite unique for a clustering algorithm. This method allows each vertex to belong to a number of communities, and because of that, each community can be connected with a large number of other communities, which is representative of how real networks look. This contrasts with the divisive and agglomerative methods, where each node can belong to only one community, and thus communities are separated from each other, which in turn leads to the loss of many communities in the network.
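The same graph can be fed to networkx's clique percolation routine, as in the sketch below; k = 3 is only an illustrative default.

```python
# Sketch: overlapping CCD candidates from k-clique percolation; larger k gives denser,
# stricter communities, and one anchor may belong to several of them.
from networkx.algorithms.community import k_clique_communities

def clique_ccds(G, k=3):
    return [set(c) for c in k_clique_communities(G, k)]
```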
2.5 Comparing the Algorithms In order to compare the sets of chromatin domains, the concordance measure was used. For comparison of the structure of subgraphs representing individual CCDs between different cells and persons, measures such as Wiener index, closeness/ betweenness centrality, clustering coefficient and measures related to entropy were used.
3 Result We compared the results for three individuals on the GM12878 cell line and considered the average accuracy when compared with the ground truth. We found that modularity maximization gave the best results, with an overlap of 0.994, followed by k-clique percolation with an overlap score of 0.951, while the worst performance was obtained by the MCL algorithm. Figure 2 shows the CCDs detected in chromosome 8 and their length distribution for each method (Table 1).
Fig. 2 Top panel shows the graphs and detected CCDs for chromosome 8, and down panel shows the distribution of CCD lengths from each method
Table 1 Accuracy of three graph-based methods when compared to ground truth

Algorithm | Accuracy
Modularity maximization | 0.994
Markov clustering | 0.621
k-Clique percolation | 0.951
4 Conclusion In this work, we show that simple graph-based algorithms identify the TADs across individuals for the human lymphoblastoid cell line GM12878. Each algorithm discussed in this work has its own set of metrics, so the selection of the best algorithm will depend on the specific biological question being addressed. Overall, all the algorithms seem to perform the clustering well. Every one of them has its own preferences in what it treats as a cluster and where one ends. Some are stricter, while others are more liberal; some create larger clusters, while others create smaller clusters. Another significant finding from the graphs is that sometimes there were interactions composed of very distant parts of the chromatin; for example, there were interactions from the start of the chromatin to almost its end. This is most likely the reason why some of the created clusters were so long and spanned almost the whole chromatin. This made us consider whether these long edges were not somehow accidental and whether they should be filtered out when looking for domains. This can be considered as a probable future direction of the proposed work.
Acknowledgements This work has been supported by the Polish National Science Centre (2019/35/O/ST6/02484 and 2020/37/B/NZ2/03757), Foundation for Polish Science co-financed by the European Union under the European Regional Development Fund (TEAM to DP). The work was co-supported by European Commission Horizon 2020 Marie Skłodowska-Curie ITN Empathy grant ‘Molecular Basis of Human enhanceropathies’; and National Institute of Health USA 4DNucleome grant 1U54DK107967-01 “Nucleome Positioning System for Spatiotemporal Genome Organization and Regulation”. DP was co-funded by Warsaw University of Technology within the Excellence Initiative: Research University (IDUB) program, and co-supported as RENOIR Project by the European Union Horizon 2020 research and innovation program under the Marie Skłodowska-Curie grant agreement No 691152 and by Ministry of Science and Higher Education (Poland), grant Nos. W34/H2020/2016, 329025/PnH/2016.
Prediction of COVID-19 Drug Targets Based on Protein Sequence and Network Properties Using Machine Learning Algorithm Barnali Chakraborty, Atri Adhikari, Akash Kumar Bhagat, AbhinavRaj Gautam, Piyali Chatterjee, and Sovan Saha
Abstract Recently, human health has been critically exposed to a pandemic caused by coronavirus (COVID-19), which has threatened public health for the last 2 years. Some medications that treat other diseases seem effective in treating COVID-19 without explicit supporting evidence, and a search for new drugs/drug targets is underway. This research focuses on the main virus-based and host-based targets that may provide valuable insights into discovering drugs in medicinal chemistry. The task of identification and selection of drug targets is becoming a very promising line of research in drug discovery. Computational analyses are beneficial in providing information about the principles of proteins and drugs by analyzing drug target features. At the same time, in-silico target identification is attractive in terms of time and cost for large-scale human genomic and proteomic data. This work mainly deals with predicting COVID-19 drug targets and non-targets in humans through several machine learning approaches like decision tree, random forest classifier, support vector machine, K-means, and logistic regression based on protein sequence features and network properties. The random forest classifier obtains an overall accuracy of 0.83, significantly higher than the other existing state-of-the-art methods. Keywords COVID-19 · Drug targets · Machine learning · Random forest classifier · Protein sequence features · Network features · Protein–protein interaction network · COVID-19 drug targets · COVID-19 drug non-targets
B. Chakraborty · A. Adhikari · A. K. Bhagat · A. Gautam · P. Chatterjee Department of Computer Science and Engineering, Netaji Subhash Engineering College, Garia, Calcutta 700152, India S. Saha (B) Department of Computer Science and Engineering, Institute of Engineering and Management, Salt Lake Electronics Complex, Calcutta, West Bengal 700091, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 N. Das et al. (eds.), Proceedings of International Conference on Data, Electronics and Computing, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-99-1509-5_24
1 Introduction Various diseases like Ebola and various types of flu have already had an adverse impact on human lives. The recent coronavirus pandemic has raised that level of adversity even further. Due to the fast transmission rate, the disease spreads from one infected person to another healthy person within a short time. The COVID-19 virus consists of both structural and non-structural proteins. The most recommended or suggested drugs are Azithromycin [1], Remdesivir [2], Lopinavir [3], Ritonavir [3], etc. However, approval for their generalized use in COVID-19 treatment needs many more clinical trials and satisfactory results. The development of new drugs primarily depends on three strategies: (1) An existing wide range of anti-virals is tested first [4]. This category involves the application of drugs like cyclophilin inhibitors, ribavirin, etc. The main advantage of this strategy is that drug dosage, probable effects, and side effects are known, as these drugs are already approved for use in viral infections. The matter of concern is that these drugs are applicable to a wide field of viral infections, so they might not work in target-specific viral infections like coronavirus infections. Moreover, these drugs' unknown adverse side effects in this specific type of viral infection cannot be neglected. (2) Screening of effective molecules through high-throughput screening (HTS) [5] from the existing databases, which might have the capability to impact COVID-19 significantly. This strategy can further exploit new functions of drug molecules (like the anti-HIV property of lopinavir). (3) Development of new target-specific drugs based on genomic and pathological information of different COVID-19 species. As both genomic and pathological information are embedded in this strategy, these drugs usually reflect high anti-COVID-19 efficacy. Despite the availability of these three strategies, the fact cannot be denied that developing any new drug requires ample time and cost [6]. The need for an effective drug to combat coronavirus increases with the rise in COVID-19 deaths. This is why drug repurposing (or drug repositioning) has gained so much popularity: it is a methodology for reusing existing drugs for new diseases [7]. It involves much less cost and time than developing new drugs. Drug repurposing methodology can be classified into several groups [8] like (1) PPIN-based [9], (2) protein target-based [9], (3) protein-path based, etc. These methodologies help researchers execute drug repurposing on several existing drugs and quickly test them on larger disease datasets. Zhou et al. [10] incorporated anti-viral properties in a drug repurposing methodology, which implements a network medicine platform based on pharmacology. It quantifies the inter-relationship between human drug targets and the complete interactome of the pathogen (HCoV) and host (human). Based on this statistical analysis, they filtered out 16 anti-HCoV drugs as probable contenders. Human cell gene enrichment analyses have further validated the effectiveness of these detected drugs. In another work, Dezső and Ceccarelli [11] proposed a machine learning method to differentiate oncology drug targets from non-targets using protein and PPIN features.
So potential drug target identification will lead to the advancement of possible clinical trials and the development of drugs. With the mass build-up of several approved, experimental, and clinical-trial drugs, it is apparent that potential drug targets reveal various vital features, including biological functions significantly essential for multiple diseases. Nevertheless, they also incorporate other features that favor existing binding sites, resulting in the binding of the proteins with other small molecules. In this work, a machine learning approach is proposed to classify COVID-19 drug targets and non-targets based on features extracted from protein sequence information and protein–protein interaction network (PPIN) properties. A random forest classifier is proposed to prioritize proteins according to their feature similarity to approved drug targets and non-targets. This computational approach can become an efficient and cost-effective tool for COVID-19 drug target discovery. In the proposed approach, a model is initially built on a training set of approved drug targets and a negative set of non-drug targets. Then, this model is used to classify the test proteins as COVID-19 drug targets or non-targets.
2 Materials and Method 2.1 Data Collection The proposed work primarily depends on two sets of proteins: (1) COVID-19 drug targets (proteins) and (2) non-targets (proteins). A set of ninety COVID-19 drug targets is collected from the therapeutic target database (TTD) [12]. TTD is a database that provides therapeutic protein information and explores the corresponding drug targets, associated diseases, pathway information, etc. A corresponding set of ninety COVID-19 drug non-targets is selected randomly from the UniProt [13] human interactome, consisting of 20,371 reviewed proteins, after subtracting the ninety COVID-19 drug targets from it.
2.2 Sequence Features The amino acid sequences of COVID-19 drug targets and non-targets are retrieved from TTD and UniProt, respectively. Once the sequences are extracted, various sequence features are computed using Pfeature [14]. Pfeature is a Web server that can compute a wide range of features from the amino acid sequence of proteins. The standard physico-chemical features considered in this research are (1) Positively Charged, (2) Negatively Charged, (3) Neutrally Charged, (4) Polarity, (5) Non-Polarity, (6) Aliphaticity, (7) Cyclic, (8) Aromaticity, (9) Cyclic, (10) Acidicity, (11) Basicity, (12) Neutral pH, (13) Hydrophobicity, (14) Hydrophilicity, (15) Neutral, (16) Hydroxylic, (17) Sulfur Content, (18) Tiny, (19) Small, (20) Large, (21) Secondary
Structure (Helix), (22) Solvent Accessibility (Buried), (23) Solvent Accessibility (Intermediate).
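To give a flavor of what such physico-chemical composition features look like in practice, the sketch below computes the fraction of residues belonging to a few standard amino acid groups directly from a sequence string. The group definitions are common textbook assignments chosen for illustration and are not guaranteed to match Pfeature's exact definitions.

```python
# Illustrative residue groupings (assumed, not Pfeature's exact specification)
GROUPS = {
    "positively_charged": set("KRH"),
    "negatively_charged": set("DE"),
    "aromatic": set("FWY"),
    "aliphatic": set("AVLI"),
    "hydrophobic": set("AVLIMFWC"),
    "tiny": set("AGCS"),
}

def composition_features(sequence: str) -> dict:
    """Fraction of residues in each group for a single protein sequence."""
    seq = sequence.upper()
    n = max(len(seq), 1)
    return {name: round(sum(aa in group for aa in seq) / n, 3) for name, group in GROUPS.items()}

print(composition_features("MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"))
```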
2.3 Network Features The human protein–protein interaction network (PPIN) from the STRING database [15] is used to fetch the interactions of the corresponding selected set of protein targets and non-targets. These interactions are used to compute the network feature values of these proteins. CytoNCA [17], a Cytoscape [16] plugin, is used for this purpose. It analyzes the entire PPIN of a particular protein and computes its centrality-based network features: (1) Subgraph Centrality [18], (2) Degree Centrality [19], (3) Eigenvector Centrality [20], (4) Information Centrality, (5) Local Average Connectivity (LAC) Centrality [21], (6) Betweenness Centrality [22], (7) Closeness Centrality [23], and (8) Network Centrality [24].
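Outside of Cytoscape, most of these centralities can also be computed with NetworkX; the sketch below assumes the PPIN has already been loaded as an undirected graph and uses a built-in toy graph as a stand-in (CytoNCA-specific measures such as LAC and network centrality are omitted because NetworkX has no direct equivalent).

```python
import networkx as nx

def centrality_features(G: nx.Graph) -> dict:
    """Per-node centrality features roughly mirroring the list above."""
    return {
        "degree": nx.degree_centrality(G),
        "betweenness": nx.betweenness_centrality(G),
        "closeness": nx.closeness_centrality(G),
        "eigenvector": nx.eigenvector_centrality(G, max_iter=1000),
        "subgraph": nx.subgraph_centrality(G),
        "information": nx.information_centrality(G),   # needs a connected graph
    }

G = nx.karate_club_graph()                  # stand-in for a STRING-derived PPIN
feats = centrality_features(G)
print({name: round(values[0], 3) for name, values in feats.items()})   # features of node 0
```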
2.4 Classification The problem of differentiating COVID-19 drug targets from non-targets can be represented as a binary classification problem (i.e., 0 for non-targets and 1 for targets). So, several well-known machine learning classifiers like decision tree (DT), random forest (RF), support vector machine (SVM), K-means, and logistic regression (LR) are used.
Decision Tree (DT) This classification model is based on a decision tree in which each internal node performs a test on an attribute/feature of the dataset, the edges emanating from that node hold the decision rules, and the leaf nodes yield the possible outcomes of the test. The scikit-learn Python package [25] is used to implement this model.
Random Forest (RF) In RF, a subset of features is selected randomly to generate several individual decision trees, which are grown simultaneously. In this type of classifier, each tree votes for a specific class, and the class that obtains the majority of votes emerges as the model's prediction. The scikit-learn Python package [25] is also used for this classifier, and the model is executed with optimal parameters to gain the best performance.
Support Vector Machine (SVM) In this classification model, the data is explicitly mapped over a vector space to generate a hyperplane or a decision boundary that will maximize the margin of
the data points belonging to the two separate classes. It is implemented with optimal gamma and cost parameters of the scikit-learn Python package [25] to obtain the best performance.
K-Means It is an iterative algorithm that partitions the unlabeled dataset into k different clusters so that each data point falls into exactly one group with similar characteristics or features. Since the number of clusters can be dictated, it can easily be applied in a classification setting where the data are divided into a number of clusters equal to or greater than the number of defined classes. The scikit-learn Python package [25] is used to execute K-means.
Logistic Regression (LR) It is a statistical model that mainly depends on the application of probability. It is used in problems where the target or dependent variable is categorical. It applies the sigmoid function to generate the probability of a label. The scikit-learn Python package is selected in this research to implement this model over the prepared dataset.
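A compact sketch of how these five models can be instantiated and compared with scikit-learn is given below. The synthetic data, hyperparameter values, and the majority-label mapping used to turn K-means clusters into class predictions are illustrative assumptions, not the tuned settings of this study.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.linear_model import LogisticRegression
from sklearn.cluster import KMeans

X, y = make_classification(n_samples=180, n_features=31, random_state=0)  # stand-in feature table
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

supervised = {
    "DT": DecisionTreeClassifier(random_state=0),
    "RF": RandomForestClassifier(n_estimators=200, random_state=0),
    "SVM": SVC(kernel="rbf", gamma="scale", C=1.0),
    "LR": LogisticRegression(max_iter=1000),
}
for name, model in supervised.items():
    model.fit(X_tr, y_tr)
    print(name, round(accuracy_score(y_te, model.predict(X_te)), 3))

# K-means is unsupervised: cluster first, then map each cluster to its majority training label
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X_tr)
label_map = {c: np.bincount(y_tr[km.labels_ == c]).argmax() for c in range(2)}
km_pred = np.array([label_map[c] for c in km.predict(X_te)])
print("K-means", round(accuracy_score(y_te, km_pred), 3))
```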
2.5 Performance Measures The performance of the previously described five machine learning models on our curated dataset is estimated by computing the precision (P), recall (R), and F-score (F) for both the COVID-19 drug targets and non-targets, using Eqs. (1), (2), and (3):

Precision (P) = TP / (TP + FP) (1)

Recall (R) = TP / (TP + FN) (2)

F-score (F) = (2 × P × R) / (P + R) (3)
where (1) true positive (TP) is the number of COVID-19 drug targets correctly identified as COVID-19 drug targets. (2) False positive (FP) is the number of COVID-19 drug non-targets incorrectly identified as COVID-19 drug targets. (3) False negative (FN) is the number of COVID-19 drug targets incorrectly identified as COVID-19 drug non-targets.
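These quantities can be obtained directly from predicted and true labels with scikit-learn; the sketch below uses toy labels purely for illustration.

```python
from sklearn.metrics import confusion_matrix, precision_recall_fscore_support

y_true = [1, 1, 1, 0, 0, 0, 1, 0, 1, 0]   # 1 = drug target, 0 = non-target (toy labels)
y_pred = [1, 1, 0, 0, 0, 1, 1, 0, 1, 0]

tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
precision, recall, f1, _ = precision_recall_fscore_support(
    y_true, y_pred, labels=[0, 1], zero_division=0)

print("TP, FP, FN, TN:", tp, fp, fn, tn)
print("per-class precision:", precision, "recall:", recall, "F1:", f1)
```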
Table 1 Performance estimation of classification models (DT, RF, SVM, K-means, and LR)

Classification model   Label   Precision   Recall   F1-score   Overall accuracy
RF                     0       1           0.71     0.83       0.83
                       1       0.71        1        0.83
DT                     0       0.83        0.71     0.77       0.72
                       1       0.65        0.73     0.69
SVM                    0       0           0        0          0.42
                       1       0.42        1        0.59
K-means                0       0.58        1        0.74       0.58
                       1       0           0        0
LR                     0       0.58        1        0.74       0.58
                       1       0           0        0
3 Results Significant features like the physico-chemical and PPIN features are used to differentiate between COVID-19 drug targets and non-targets. After the generation of features, it is noted that several missing values for a specific set of features need to be appropriately handled, so data cleaning has been performed: the median value of every column is calculated, and the empty cells are filled with it. After data cleaning, label encoding is performed on the label column, with 1 representing COVID-19 drug targets and 0 representing COVID-19 drug non-targets. The entire dataset is split into 20% test data and 80% training data. Then, these data are used with several classifiers like DT, RF, SVM, K-means, and LR. Initially, each model is fitted on the training set, while the performance of these models is evaluated using the test set. The results are highlighted in Table 1. The receiver operating characteristic (ROC) graphs are also generated for DT, RF, SVM, and K-means and are shown in Fig. 1. These data clearly show that the random forest classifier gives the best results and accuracy.
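The preprocessing and evaluation steps described above translate almost line for line into pandas and scikit-learn. The sketch below assumes the curated features sit in a DataFrame with a binary `label` column; the synthetic demo data and the chosen random seeds are assumptions for illustration only, not a reproduction of the actual experiment.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report

def run_pipeline(df: pd.DataFrame):
    # data cleaning: fill missing feature values with each column's median
    X = df.drop(columns=["label"])
    X = X.fillna(X.median(numeric_only=True))
    y = df["label"].astype(int)        # 1 = COVID-19 drug target, 0 = non-target

    # 80/20 train/test split; fit on the training set, evaluate on the held-out test set
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, stratify=y, random_state=42)
    clf = RandomForestClassifier(n_estimators=300, random_state=42).fit(X_tr, y_tr)
    print(classification_report(y_te, clf.predict(X_te), zero_division=0))
    return clf

# hypothetical demo with synthetic data standing in for the real feature table
rng = np.random.default_rng(0)
demo = pd.DataFrame(rng.normal(size=(180, 31)), columns=[f"f{i}" for i in range(31)])
demo.loc[rng.choice(180, 20), "f0"] = np.nan   # simulate missing values
demo["label"] = rng.integers(0, 2, 180)
run_pipeline(demo)
```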
4 Discussion This research predicts COVID-19 drug-target and non-target human proteins based on protein sequence features and PPIN properties. Though this work helps in future COVID-19 drug discovery, this research can be further improved. In this research work, only 32 features are used; if the experimentation were done with more features in the dataset, then the accuracy of the model could be improved. Some physiochemical properties like molecular shape, ionization, and hydrogen bonding capacity could be used. Also, the dataset consists of 90 targets and non-targets
Fig. 1 ROC curve for DT, RF, SVM, and K-means
of COVID-19 human proteins. Incorporating more COVID-19 drug targets and non-targets might enhance the model's effectiveness. Moreover, this model is currently restricted to COVID-19 only; it might be extended to other diseases like Ebola and Dengue in our future works.
References 1. Gyselinck I, Liesenborghs L, Belmans A, Engelen MM, Betrains A, Van Thillo Q et al (2022) Azithromycin for treatment of hospitalised COVID-19 patients: a randomised, multicentre, open-label clinical trial (DAWn-AZITHRO). ERJ Open Res 8(1):00610-2021. https://doi.org/ 10.1183/23120541.00610-2021 2. Beigel JH, Tomashek KM, Dodd LE, Mehta AK, Zingman BS, Kalil AC et al (2020) Remdesivir for the treatment of Covid-19—final report 383(19):1813–1826. https://doi.org/10.1056/NEJ Moa2007764 3. Horby PW, Mafham M, Bell JL, Linsell L, Staplin N, Emberson J et al (2020) Lopinavir–ritonavir in patients admitted to hospital with COVID-19 (RECOVERY): a randomised, controlled, open-label, platform trial. Lancet 396(10259):1345–1352. https://doi.org/10.1016/S0140-673 6(20)32013-4 4. Chan JF, Chan KH, Kao RY, To KK, Zheng BJ, Li CP et al (2013) Broad-spectrum antivirals for the emerging Middle East respiratory syndrome coronavirus. J Infect 67(6):606–616. https:// doi.org/10.1016/j.jinf.2013.09.029
5. de Wilde AH, Jochmans D, Posthuma CC, Zevenhoven-Dobbe JC, van Nieuwkoop S, Bestebroer TM et al (2014) Screening of an FDA-approved compound library identifies four smallmolecule inhibitors of Middle East respiratory syndrome coronavirus replication in cell culture. Antimicrob Agents Chemother 58(8):4875–4884. https://doi.org/10.1128/aac.03011-14 6. Wu C, Liu Y, Yang Y, Zhang P, Zhong W, Wang Y et al (2020) Analysis of therapeutic targets for SARS-CoV-2 and discovery of potential drugs by computational methods. Acta Pharmaceutica Sinica B 10(5):766–788. https://doi.org/10.1016/j.apsb.2020.02.008 7. Talevi A, Bellera CL (2020) Challenges and opportunities with drug repurposing: finding strategies to find alternative uses of therapeutics. Expert Opin Drug Discov 15(4):397–401. https://doi.org/10.1080/17460441.2020.1704729 8. Dotolo S, Marabotti A, Facchiano A, Tagliaferri R (2020) A review on drug repurposing applicable to COVID-19. Brief Bioinform 22(2):726–741. https://doi.org/10.1093/bib/bbaa288% JBriefingsinBioinformatics 9. Saha S, Halder AK, Bandyopadhyay SS, Chatterjee P, Nasipuri M, Bose D et al (2022) Drug repurposing for COVID-19 using computational screening: is fostamatinib/R406 a potential candidate? Methods 203:564–574. https://doi.org/10.1016/j.ymeth.2021.08.007 10. Zhou Y, Hou Y, Shen J, Huang Y, Martin W, Cheng F (2020) Network-based drug repurposing for novel coronavirus 2019-nCoV/SARS-CoV-2. Cell Discov 6(1):14. https://doi.org/10.1038/ s41421-020-0153-3 11. Dezs˝o Z, Ceccarelli M (2020) Machine learning prediction of oncology drug targets based on protein and network properties. BMC Bioinform 21(1):104. https://doi.org/10.1186/s12859020-3442-9 12. Chen X, Ji ZL, Chen YZ (2002) TTD: therapeutic target database. Nucleic Acids Res 30(1):412– 415. https://doi.org/10.1093/nar/30.1.412 13. Consortium TU (2020) UniProt: the universal protein knowledgebase in 2021. Nucleic Acids Res 49(D1):D480–D489. https://doi.org/10.1093/nar/gkaa1100%JNucleicAcidsResearch 14. Pande A, Patiyal S, Lathwal A, Arora C, Kaur D, Dhall A et al (2019) Computing wide range of protein/peptide features from their sequence and structure 2019:599126. https://doi.org/10. 1101/599126. bioRxiv 15. Szklarczyk D, Gable AL, Nastou KC, Lyon D, Kirsch R, Pyysalo S et al (2020) The STRING database in 2021: customizable protein–protein networks, and functional characterization of user-uploaded gene/measurement sets. Nucleic Acids Res 49(D1):D605–D612. https://doi.org/ 10.1093/nar/gkaa1074%JNucleicAcidsResearch 16. Shannon P, Markiel A, Ozier O, Baliga NS, Wang JT, Ramage D et al (2003) Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Res 13(11):2498–2504. https://doi.org/10.1101/gr.1239303 17. Tang Y, Li M, Wang J, Pan Y, Wu F-X (2015) CytoNCA: a cytoscape plugin for centrality analysis and evaluation of protein interaction networks. Biosystems 127:67–72. https://doi. org/10.1016/j.biosystems.2014.11.005 18. Estrada E, Rodríguez-Velázquez JA (2005) Statistical, nonlinear, physics SM. Subgraph centrality in complex networks. Phys Rev 71(5 Pt 2):056103 19. Jeong H, Mason SP, Barabási AL, Oltvai ZN (2001) Lethality and centrality in protein networks. Nature 411(6833):41–42. https://doi.org/10.1038/35075138 20. Bonacich P (1987) Power and centrality: a family of measures. Am J Sociol 92(5):1170–1182 21. Li M, Wang J, Chen X, Wang H, Pan Y (2011) A local average connectivity-based method for identifying essential proteins from the network level. 
Comput Biol Chem 35(3):143–150. https://doi.org/10.1016/j.compbiolchem.2011.04.002 22. Joy MP, Brock A, Ingber DE, Huang S (2005) High-betweenness proteins in the yeast protein interaction network. J Biomed Biotechnol 2005(2):96–103. https://doi.org/10.1155/JBB.200 5.96 23. Wuchty S, Stadler PF (2003) Centers of complex networks. J Theor Biol 223(1):45–53. https:// doi.org/10.1016/s0022-5193(03)00071-7
24. Wang J, Li M, Wang H, Pan Y (2012) Identification of essential proteins based on edge clustering coefficient. IEEE/ACM Trans Comput Biol Bioinform 9(4):1070–1080. https://doi.org/ 10.1109/tcbb.2011.147 25. Pedregosa F, Varoquaux G, Gramfort A, Michel V, Thirion B, Grisel O et al (2011) Scikit-learn: machine learning in python. J Mach Learn Res 12:2825–2830
A Meta-consensus Strategy for Binarization of Dendritic Spines Images Shauvik Paul, Nirmal Das, Subhrabesh Dutta, Dipannita Banerjee, Soumee Mukherjee, and Subhadip Basu
Abstract Image binarization is the process of separating pixel values into two groups, foreground and background. Thresholding can be categorized into global thresholding and local thresholding. This paper describes a consensus-based strategy that takes into account the predictions of some classic thresholding methods like Otsu, Niblack, and Sauvola, and a deep learning model called UNet, and takes a decision based on majority voting. In this work, we present different ways of combining the classic binarization methods. The qualitative and quantitative results show that the proposed strategy outperforms each individual binarization method. The quantitative results are presented in terms of statistical parameters: F-measure, recall, and precision. Keywords Component · Digital phantom design · Carotid vasculature · Fuzzy distance transformation · 3D rendering · Geodesic paths
S. Paul · N. Das (B) · S. Basu Department of CSE, Jadavpur University, Calcutta 700032, India e-mail: [email protected] S. Basu e-mail: [email protected] S. Dutta · D. Banerjee Department of MCA, Jadavpur University, Calcutta 700032, India S. Mukherjee Department of CSE, Heritage Institute of Technology, Calcutta 700107, India N. Das Department of CSE (AIML), Institute of Engineering and Management, Calcutta 700091, India S. Paul Department of MCA, Techno Main Saltlake, Calcutta 700091, India © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 N. Das et al. (eds.), Proceedings of International Conference on Data, Electronics and Computing, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-99-1509-5_25
1 Introduction Binarization of images has been an active area of research due to its potential in reducing the complexity of images. It is therefore a very important part of many image processing pipelines. In many medical image analysis techniques, binarization is the first step for further segmentation and analysis tasks [1]. In this paper, we are interested in the binarization of microscopic images of dendritic spines. Dendritic spines are membranous protrusions of neuronal dendrites [2], and they play a central role in controlling electrical and biochemical compartmentalization and in the activity and signal transmission of neural circuits [3]. The shape of dendritic spines changes spontaneously, or in response to neuronal stimulation, and these changes are related to learning, memory [4], and many neuropsychiatric and neurodegenerative diseases [5, 6] such as Alzheimer's disease, schizophrenia, etc. Many aspects of the structure–function relationship that exists in dendritic spines are still largely unknown due to their complex morphology. Therefore, structural analysis of dendritic spines is crucial in neurobiology. Microscopic images of dendritic spines are of three types: ex vivo, in vivo, and in vitro. There exist two types of approaches for segmentation and analysis of dendritic spines: one type of method works directly on the 3D images [7], and the other type works on 2D projection images of the 3D images [8]. Both of these methods have their own advantages and disadvantages. The most commonly used projection method for microscopic 3D images is maximum intensity projection (MIP). This process, however, due to its simplicity, results in loss of information and structural overlap [6]. The complexity of the MIP images increases due to the inherent noise present in the microscopic images. Binarization of those MIP images is difficult, and there is no general binarization algorithm that works for all of them. A large number of algorithms have been introduced over the decades for image binarization. The most common algorithms involve selecting a local or global threshold for pixel values: the pixels with intensity values greater than the threshold are considered foreground, and the pixels with intensity values lower than the threshold are considered background. Algorithms like Otsu [9], Niblack [10], and Sauvola [11] are very popular threshold-based image binarization methods used in many image processing domains. Nowadays, deep learning methods are also being used for image binarization. But deep learning models require a lot of training images, time, computational resources, and ground truth annotations by experts to meticulously learn the complex features of medical images. Accurate ground truth annotations for complex microscopic images are difficult for neurobiologists to achieve and sometimes infeasible, and this lack of proper ground truth, especially in the case of lower-resolution microscopic images, doesn't always retain the structural information to be learnt by deep learning models. On the contrary, the existing thresholding algorithms don't require such labeled data, and they take a lot less time than their deep learning counterparts. Algorithms like Otsu [9], Niblack [10], and Sauvola [11] perform with varying degrees of success, which will also be demonstrated in this paper. In this paper, we propose a consensus-based method for binarization of MIP images of dendritic spines. The majority consensus of Otsu, Niblack, Sauvola, and a
low-cost UNet model is taken to decide whether a pixel is foreground or background. The results of the proposed consensus-based binarization method are compared with the results of the individual Otsu [9], Niblack [10], and Sauvola [11] methods and a low-cost UNet model [12]. The qualitative and quantitative results show that the proposed consensus-based binarization outperforms the individual methods.
2 Methodology In this section, we first describe the basic definitions and notations related to 2D digital space. Next, we describe the proposed meta-consensus-based strategy for binarization of MIP images of dendritic spines.
2.1 Basic Definitions and Notations A 2D image is denoted by {Z 2 |where Z is the set of positive integers}. A point on the grid referred to as a pixel is a member of Z 2 , denoted by (x1 , x2 ). Two pixels p = (x1 , x2 ) and q = (y1 , y2 ) are adjacent if {max(|xi − yi |) ≤ 1|1 ≤ i ≤ 2}, where |.| means the absolute value. Two adjacent pixels are often called neighbors of each other, and one pixel can have 8 such neighbors, excluding itself. A collection of such N × N neighbor pixels around a particular pixel selected together is called a window.
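These definitions translate directly into a few lines of code; the small sketch below is only meant to make the adjacency and window notions concrete.

```python
def are_adjacent(p, q):
    """8-adjacency: two distinct pixels are adjacent if they differ by at most 1 in each coordinate."""
    return p != q and max(abs(p[0] - q[0]), abs(p[1] - q[1])) <= 1

def window(img, i, j, n):
    """The n x n neighborhood (window) around pixel (i, j), clipped at the image border."""
    half = n // 2
    return [row[max(j - half, 0): j + half + 1] for row in img[max(i - half, 0): i + half + 1]]

print(are_adjacent((3, 4), (4, 5)), are_adjacent((3, 4), (5, 4)))   # True False
```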
2.2 Our Proposed Algorithm We take our input image as a grayscale image. Let img[i, j] be the matrix representation of the input grayscale image and h and w be the height and width of that image. At first, we tried to improve the adaptive binarization (Sauvola) method [11], as it is recommended for binarizing images having uneven illumination, light texture, and stains. The original Sauvola method has two parameters, named k and R, with recommended values k = 0.5 and R = 128. But applying these values to dendritic spine images does not give the desired result. After experimentation and lots of trial and error, we decided to use k = 0.1 and R = 90. For each pixel [i, j], we select windows of sizes from 5 × 5 up to 17 × 17, and for each window, we calculate the mean and standard deviation. The threshold values Ts(i, j) are then calculated using the equation below:

Ts(i, j) = μ × (1 + k × (σ/R − 1)) (1)
where μ = mean of the pixel intensities of the window, σ = standard deviation of the pixel intensities of the window, k = 0.1, and R = 90 (k and R being constants). The threshold calculated by the Niblack method [10] is given by the equation below:

Tn(i, j) = μ + k × σ (2)
For each pixel, a decision is taken using a consensus strategy. Here, we use two types of consensus strategy: one is the quality consensus strategy, and the other is the meta-consensus strategy. In the quality consensus method, we consider only Sauvola's method; for each window size, a decision is taken on whether the pixel is foreground or background, and based on these decisions, the consensus value is taken into consideration. If one of the decisions for a particular pixel says foreground and another says background, then foreground is taken. Similarly, we form consensus decisions requiring agreement from at least two decisions up to all seven decisions, respectively. For the meta-consensus method, the decision values for the same pixel from different methods, such as Otsu's method, Niblack's method, Sauvola's method, and the UNet model, are considered. Each method produces a decision for the same pixel, and whatever the majority of the models decide the pixel intensity for that particular pixel should be in the final image is taken into consideration. Then, the output from a very simplistic model that had been trained using transfer learning was considered and once again compared with the existing threshold value. The images obtained, however, have some background noise around them, so a filter is applied to remove all the background white noise by inspecting whether a pixel is surrounded by 8, 7, or 6 black pixels. The images which come out sometimes have the mushrooms or dendrites detached from a few spines. We therefore apply the morphological close operation on those filtered images using a 3 × 3 window, and after this step, the quality of the images improves significantly.
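A condensed sketch of the meta-consensus voting is shown below using scikit-image's thresholding filters and a simple majority vote. The window size, the Sauvola parameters, the optional pre-computed UNet mask, and the use of a plain morphological closing in place of the 8/7/6 noise filter are simplifying assumptions for illustration; this is not the exact pipeline evaluated in the paper.

```python
import numpy as np
from skimage import io, img_as_float, morphology
from skimage.filters import threshold_otsu, threshold_niblack, threshold_sauvola

def meta_consensus_binarize(gray, window=15, k=0.1, r=90, unet_mask=None):
    """Majority vote of Otsu, Niblack, and Sauvola (plus an optional model mask)."""
    votes = [
        gray > threshold_otsu(gray),
        gray > threshold_niblack(gray, window_size=window, k=k),
        gray > threshold_sauvola(gray, window_size=window, k=k, r=r),
    ]
    if unet_mask is not None:
        votes.append(unet_mask.astype(bool))
    consensus = np.sum(votes, axis=0) >= (len(votes) // 2 + 1)     # strict majority
    # 3 x 3 morphological closing to reconnect thin spine necks after voting
    return morphology.binary_closing(consensus, np.ones((3, 3), dtype=bool))

# hypothetical usage on a grayscale MIP image file name
# gray = img_as_float(io.imread("mip_sample.tif", as_gray=True))
# binary = meta_consensus_binarize(gray)
```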
3 Experimental Results and Discussion 3.1 Dataset Description The predictions of our model were made on a dataset of 25 input neuronal images of rat brains, with several features like dendrites, spines, and mushrooms. The images are of the in vivo, in vitro, and ex vivo types, and the details of culturing and processing them are mentioned in Basu et al. [13]. The ground truth of each of these images was painstakingly prepared by hand, using paint tools and manual thresholding of the images to binarize them. Each sample took around 2 h to meticulously construct all the spines, mushrooms, and dendrites in the image. Sample images are shown below.
3.2 Results and Discussion Displayed beneath is a random selection of 2 samples (sample1 being 508 × 512 px and sample2 being 1024 × 1024 px) from our dataset and their corresponding outputs for a variety of thresholding methods of binarization (like Sauvola, Niblack, Otsu, a variety of ensembles of these models, and a deep learning model, a simplistic UNet), along with their precision, recall, and F-measure scores in a tabular format. The UNet has been trained on 20 labeled samples of the DRIVE [14] dataset of images of the human retina, and the following results were obtained. It is to be noted that while training the UNet, transfer learning was used, as the images the model was trained on were those of human retinae, while the predictions were made on images of dendritic spines. There may be other models that can produce better results, but within our limited computational resources and given the unavailability of a huge dataset of images of dendritic spines and annotated labels for the same, the following section will show that our proposed consensus-based strategy performed much better in terms of F-measure and recall than most of the other methods listed below. On more than one occasion, the images below show that the proposed method is more similar to the ground truth in terms of retention of the structure of the spines (where the UNet and Niblack don't seem to do a good job) and the overall noise reduction in the images (where the simple Sauvola and Niblack do not do a good job). However, in spite of achieving a good F-measure and better precision, our proposed method cannot produce output images in which all the spines and dendritic mushrooms are connected; after observation, it is clear that there exist a few spines that are not connected to the dendrite. Also, our proposed algorithm and the implemented Python code need some optimization. These methods are often slow, since the computation of image features from the local neighborhood has to be done for each image pixel, but the time taken is a lot less compared to their deep learning counterparts (Figs. 1, 2, 3 and Table 1). Plots of different metrics like F-measure, recall, and precision are shown in Fig. 4.
Fig. 1 a Input image and b ground truth
4 Conclusion The present work describes a consensus-based binarization method that considers a few classic binarization techniques and a UNet model for binarization of MIP images of dendritic spines. The proposed process is automatic, and the user can modify various parameters to generate better results as per the needs of a particular dataset. We argue that the proposed algorithm successfully produces better binarization for better segmentation results, operating within an acceptable margin of error. Segmented dendritic spines and spine counts are essential in the study of the human brain and various cognitive diseases [6]. The proposed consensus-based binarization may be applied to other domains of bio-medical imaging in the future.
Fig. 2 Comparisons of different binarization algorithms with our proposed meta-consensus method for sample1 a input image, b ground truth, c Otsu’s method, d Niblack’s method (with 876 filter, morphological closing), e Sauvola’s method (window size = 15, 876 filter, morphological closing), f UNet predictions, and g proposed meta-consensus model results
Fig. 3 Comparisons of different binarization algorithms with our proposed meta-consensus method for sample2 a input image, b ground truth, c Otsu’s method, d Niblack’s method (with 876 filter, morphological closing), e Sauvola’s method (window size = 15, 876 filter, morphological closing), f UNet predictions, and g proposed meta-consensus model results
Table 1 Average F-measure, recall, and precision data with different methods

Method                       F-measure   Recall   Precision
Otsu                         0.744       0.988    0.597
Niblack                      0.823       0.974    0.713
Sauvola WS = 5               0.863       0.881    0.846
Sauvola WS = 7               0.868       0.922    0.821
Sauvola WS = 9               0.883       0.948    0.826
Sauvola WS = 11              0.889       0.962    0.826
Sauvola WS = 13              0.877       0.963    0.805
Sauvola WS = 15              0.87        0.961    0.796
Sauvola WS = 17              0.862       0.958    0.783
Sauvola multi (voting)       0.933       0.963    0.905
Sauvola ensemble (QC1)       0.882       0.959    0.817
Sauvola ensemble (QC2)       0.907       0.967    0.853
Sauvola ensemble (QC3)       0.924       0.969    0.884
Sauvola ensemble (QC4)       0.933       0.963    0.905
Sauvola ensemble (QC5)       0.938       0.948    0.929
Sauvola ensemble (QC6)       0.928       0.919    0.937
Sauvola ensemble (QC7)       0.915       0.88     0.952
Meta-consensus (voting)      0.894       0.978    0.824
UNet                         0.618       0.448    0.997
Fig. 4 Comparison of various parameters like F-measure, precision, and recall for different binarization methods for a sample of 10 images. The black line denotes the performance of the proposed method. a Recall, b F-measure and c precision
Acknowledgements We are very much thankful to Prof. Jakub Włodarczyk and Ewa Bączyńska of Nencki Institute of Experimental Biology, Warsaw, Poland for providing the images used in this paper.
References 1. Zhang H, Fritts JE, Goldman SA (2008) Image segmentation evaluation: A survey of unsupervised methods. Comput Vis Image Underst 110(2):260–280. https://doi.org/10.1016/J.CVIU. 2007.08.003 2. Harris KM (1999) Structure, development, and plasticity of dendritic spines. Curr Opin Neurobiol 9(3):343–348 3. Lee KFH, Soares C, Béïıque J-CB (2012) Examining form and function of dendritic spines. Neural Plast. https://doi.org/10.1155/2012/704103 4. Sala C, Segal M (2014) Dendritic spines: the locus of structural and functional plasticity. Physiol Rev 94(1):141–188. https://doi.org/10.1152/PHYSREV.00012.2013 5. Fiala JC, Spacek J, Harris KM (2002) Dendritic spine pathology: cause or consequence of neurological disorders? Brain Res Rev 39:29–54. www.elsevier.com/locate/bres 6. Das N et al (2021) 3dSpAn: an interactive software for 3D segmentation and analysis of dendritic spines. Neuroinformatics. https://doi.org/10.1007/s12021-021-09549-0
7. Basu S, Saha PK, Roszkowska M, Magnowska M, Baczynska E, Das N, Plewczynski D, Wlodarczyk J (2018) Quantitative 3-D morphometric analysis of individual dendritic spines. Sci Rep 8(1):1–13 8. Basu S, Plewczynski D, Saha S, Roszkowska M, Magnowska M, Baczynska E, Wlodarczyk J (2016) 2dSpAn: semiautomated 2-d segmentation, classification and analysis of hippocampal dendritic spine plasticity. Bioinformatics 32(16):2490–2498 9. Otsu N (1979) A threshold selection method from gray-level histograms. IEEE Trans Sys Man Cybern 9:62–66 10. Niblack W (1986) An introduction to digital image processing. Prentice Hall, Eaglewood Cliffs, Prentice Hall, Eaglewood Cliffs, pp 115–116 11. Sauvola J, PietikaKinen M (2000) Adaptive document image binarization. Pattern Recogn 33:225–236 12. Weng W, Zhu X (2021) INet: convolutional networks for biomedical image segmentation. IEEE Access 9:16591–202116603. https://doi.org/10.1109/ACCESS.2021.3053408 13. Basu S et al (2016) 2dSpAn: semiautomated 2-d segmentation, classification and analysis of hippocampal dendritic spine plasticity. Bioinformatics 32(16):2490–2498 (Oxford University Press). http://bioinformatics.oxfordjournals.org/ 14. Introduction—Grand Challenge. https://drive.grand-challenge.org/
Malignancy Identification from Cytology Images Using Deep Optimal Features Soumyajyoti Dey, Soumya Nasipuri, Oindrila Ghosh, Sukanta Chakraborty, Debashri Mondal, and Nibaran Das
Abstract Automated cytology image classification using a computer-aided diagnosis (CAD)-based system is an important task for diagnosing cancer at an early stage. For the last few decades, researchers have remained involved in research in the cytology domain. Optimal feature selection is of utmost importance to enhance the performance of cytology image classification. In this article, we have proposed the artificial electric field algorithm to find the most appropriate features by discarding the less relevant ones, and have thereby boosted the performance of cytology image classification. The features are first extracted from a traditional ResNet-18 model. Finally, these optimal subsets of features are classified by an SVM classifier, achieving an accuracy of 85%. The relevant codes of the proposed classification model are publicly available on GitHub. Keywords Cytology · Artificial electric field · Convolution neural network · Classification · Support vector machine
1 Introduction In modern society, cancer is the most severe disease [13]. According to World Health Organization (WHO), the mortality rate due to cancer per year is 4.6 million [5]. If it is diagnosed at an earlier stage, the chance of mortality may decrease. Among different types of cancer diagnosis processes, fine needle aspiration cytology(FNAC) is S. Dey · S. Nasipuri (B) · O. Ghosh · N. Das Jadavpur University, Kolkata, India e-mail: [email protected] N. Das e-mail: [email protected] S. Chakraborty · D. Mondal Theism Medical Diagnostics Centre, Kolkata, India e-mail: [email protected] D. Mondal e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 N. Das et al. (eds.), Proceedings of International Conference on Data, Electronics and Computing, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-99-1509-5_26
more convenient and performed in the shortest turnaround time. So, cytology image classification is more crucial in the early stage of cancer diagnosis. For the past several decades, researchers are still trying to develop many computer-aided diagnosis (CAD)-based systems [10] for automated cytology image classification. In the recent year, deep learning has taken an important role for this task. Previously, in machine learning technique, hand crafted features are extracted from cytology images, but in recent days, deep learning features are extracted for image classification. Many optimization algorithms like genetic algorithm [1, 7], differential evolution (DE) [9], particle swarm optimization [2], PCA and GWO [4], artificial bee colony optimization [12] are used for optimal features selection. In this paper, we have proposed an optimal features selection technique for boosting the deep learning-based classification performances of cytology images. First, we have extracted features by convolution neural network (CNN) model, and then, these features are optimized by artificial electric field algorithm [14]. Finally, these features are classified by SVM classifier. In summary, our main contributions are as follows: • The deep features of the cytology images are extracted from the last layer using ResNet-18 CNN architecture. • The deep optimal features set is selected by artificial electric field algorithm using SVM as a classifier. The experimental results came out to be satisfactorily good.
2 Literature Survey Many works are reported on medical image classification using optimal feature selection. Das et al. [6] proposed an optimized feature selection technique using Jaya algorithm. For selecting the optimal features, they used the Jaya optimization algorithm (FSJaya) on top of the novel feature selection method (FS), followed by supervised machine learning techniques for measuring classification accuracy. The process used a search methodology to identify the most appropriate features by updating the worst features to lower the feature space’s dimensions. The performance is evaluated on the Pima dataset and achieved an accuracy of 79%. Kabir et al. [8] proposed a new hybrid genetic algorithm (HGA) for feature selection (FS), called HGAFS, which contains a novel local search operation to tune the search method in the FS process. This improvement guides the search process such that less correlated (different) information such as general and special properties of a dataset can be used to adapt the newly created offspring, and it was tested on 11 real-world classification datasets, which have dimensions that vary from 8 to 7129, and the performance was compared with ten existing well-known FS algorithm and found that HGAFS produced better performances with essential features with better accuracy.
Mitra et al. [11] proposed a superpixel-based cytology image segmentation approach by applying various morphological and clustering algorithms like anisotropic diffusion, DBscan, Fuzzy C-means, etc. After that, the features were extracted from the segmented nuclei, and finally, they achieved an accuracy of 91% on the SVM classifier. Agrawal et al. [3] have implemented the artificial bee colony (ABC) algorithm in CT scan images for diagnosing cervical cancer. ABC and SVM (Gaussian Kernel) have been used for feature selection, and classification and the accuracy came out to be 99%.
3 Material and Method 3.1 Dataset Description The cytology images of the FNAC test are collected from “Theism Diagnostics Center, West Bengal”, in the presence of professional practitioners. The images are captured by CMOS Camera attached with Olympus trinocular microscope in 40x magnification. In this work, 94 benign and 109 malignant samples are used for experiment purpose. Due to fewer cytology images, the dataset is split into 75%, 13%, and 12% for train, test, and validation sets. Some samples from each class are mentioned in Fig. 1.
3.2 Proposed Methodology The artificial electric field algorithm (AEFA) [14] is a metaheuristic optimization algorithm primarily based on Coulomb’s law of electrostatics. Coulomb’s law states that the electrostatic force of attraction or repulsion between two charged particles is proportional to the product of the magnitude of their charges and inversely proportional to the square of the distance between them. The electrostatic force repels or attracts charged particles in space. Let’s consider the search space of an optimization problem to be the space and the population of the solution to be the charged particles residing in them; we can see that the particles change position due to the electrostatic forces among them. Also, the electrostatic force is the only connecting link among the charges, and as a result, the problem solution takes the position of the charges in the space. In AEFA, the magnitude of charges of the particles is considered a fitness factor to evaluate the population, and the positions of the particles in the search space are considered the solution to the optimization problem. Only the attractive electrostatic force is considered in the algorithm; consequently, the charges with high magnitude attract the charges of lower charges. In turn, those charges show leisurely pace while moving along the space. As a result, the artificial electric field algorithm
is seen as a closed system of charges that obeys Coulomb's law of electrostatic force and Newton's laws of motion. The challenge of selecting the most appropriate optimal subset of features is NP-hard and requires an exhaustive search. With an increase in the number of features, the state space of the search algorithm expands exponentially. To overcome this, we have used AEFA to select the optimal subset of features from the given set of features. A feature selection algorithm using an optimization algorithm generally works in the following way: (1) From the given feature space, a subset of features is selected based on the optimization algorithm. (2) The fitness value of the selected features is calculated. (3) The features are then modified using the optimization
Fig. 1 Some cytology image samples (dim: 96 × 1280 pixels): (a) benign sample, (b) malignant sample
Fig. 2 Flow diagram of proposed malignancy identification technique
Fig. 3 Flow diagram of optimal features selection using AEFA
algorithm. (4) Until the termination condition occurs, the previous three steps are continually executed. The flowchart of the proposed methodology is shown in Fig. 2. The optimal feature selection algorithm's goal is to choose a significantly well-fitted feature set c from the initial feature set d, such that c ≤ d. Firstly, a random population of N particles S = [S1, S2, S3, . . . , SN]^T is sampled. Here, every Si = [Si,1, Si,2, Si,3, . . . , Si,d] represents a binary vector of a subset of features in a d-dimensional feature space. Si,d = 0 represents the fact that the feature is discarded, and Si,d = 1 signifies that the feature is present or, in other words, selected. The classification method considers the randomly selected features for fitness (accuracy) computation. The accuracy or the fitness of the sample is calculated using the support vector machine with an RBF kernel. The optimal feature selection method aims to choose the most appropriate subset of optimal attributes to maximize the fitness value. The workings of the AEFA for the ith sample of the population are shown in Fig. 3. The optimal feature selection technique is described in the following steps:
1. The values of D (dimension), N (total number of subsets), and maxiteration (maximum number of iterations) are initialized.
2. The binary feature subsets (S1, S2, S3, . . . , SN) are initialized randomly in the search range.
3. The velocities are initialized to random values.
4. The fitness values (fS1, fS2, fS3, . . . , fSN) of (S1, S2, S3, . . . , SN) are evaluated.
5. The iteration counter is set to g = 0 (g represents the generation of the population).
6. Coulomb's constant K, the best fitness, and the worst fitness are calculated.
7. The fitness fSi is calculated for the particular iteration g.
8. The total force F_{i,d}^{g}, the total electric field e_{i,d}^{g}, and the acceleration a_{i,d}^{g} are calculated.
9. The velocity v_{i,d}^{g+1} and position S_{i,d}^{g+1} of each particle are updated.
10. The local maximum is checked for a particular subset. If the fitness of the updated value is greater than the previous fitness value, the new subset is considered for the next iteration. Otherwise, the previous subset is taken into consideration.
11. The probability is calculated for each subset, and the encoding function is used, which creates the binary vector from the feature space.
– prob(S_{i,d}^{g+1}) = 1 / exp(−v_{i,d}^{g+1})
– If prob(S_{i,d}^{g+1}) > a random number, then S_{i,d}^{g+1} = 1; otherwise, S_{i,d}^{g+1} = 0.
12. Steps 6 to 11 are continually repeated until the termination criteria are fulfilled.
13. Finally, the optimal subset is returned.
14. The accuracy for the optimal subset is checked against the validation data.
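For orientation, a heavily stripped-down wrapper loop in the spirit of the steps above is sketched below: it keeps the binary encoding, the improvement-only acceptance, and the SVM (RBF) fitness, but replaces the full AEFA force/field/velocity update with a random velocity-like perturbation. The stand-in dataset and all parameter values are assumptions for illustration; this is not the authors' implementation.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score
from sklearn.datasets import load_breast_cancer

def fitness(mask, X, y):
    """Fitness of a binary feature subset = 3-fold cross-validated SVM (RBF) accuracy."""
    if mask.sum() == 0:
        return 0.0
    return cross_val_score(SVC(kernel="rbf"), X[:, mask.astype(bool)], y, cv=3).mean()

def wrapper_select(X, y, n_particles=10, max_iter=10, seed=0):
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    particles = rng.integers(0, 2, size=(n_particles, d))     # random binary subsets
    scores = np.array([fitness(p, X, y) for p in particles])
    for _ in range(max_iter):
        for i in range(n_particles):
            # velocity-like perturbation mapped to {0, 1} via a probability (binary encoding step)
            prob = 1.0 / (1.0 + np.exp(-rng.normal(size=d)))
            candidate = (prob > rng.random(d)).astype(int)
            score = fitness(candidate, X, y)
            if score > scores[i]:                              # keep only improving subsets
                particles[i], scores[i] = candidate, score
    best = particles[scores.argmax()].astype(bool)
    return best, scores.max()

X, y = load_breast_cancer(return_X_y=True)                     # stand-in for the deep feature table
mask, acc = wrapper_select(X, y)
print(mask.sum(), "features selected, cross-validated accuracy:", round(acc, 3))
```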
4 Results and Discussion Training Process: We first trained the ResNet-18 model for extracting features. Due to the small amount of data, the training set is augmented by some traditional approaches: the original images are cropped randomly to a dimension of 224 × 224, flipped in the horizontal and vertical directions, and rotated by a 30-degree angle. The model is trained in the PyTorch environment and run on an NVIDIA-510 6GB GPU. The best trained model is saved where the validation loss is minimum. At the time of training, hyperparameters such as the batch size, number of epochs, optimizer, and loss function are set to 4200, Adam, and negative log-likelihood loss. The features are extracted from the convolution layer before the fully connected layer of the best-performing ResNet-18 model. The graph of the training versus validation loss is shown in Fig. 4. Finally, the test set is classified by a support vector machine (SVM) classifier with a radial basis function (RBF) kernel. The maximum number of iterations and the population size are set to 10 and 25, respectively, and the stopping criterion for the experiment is set to 10. The recall (R), precision (P), and F1-score (F1) for the model are calculated using the following formulae, where TP = True Positive, FN = False Negative, FP = False Positive, and TN = True Negative:
Fig. 4 Graphical representation of training loss (red) versus validation loss (blue) of the ResNet-18 model. The X-axis represents the number of epochs

Table 1 Performances of the SVM classifier with and without optimal feature selection

Performance metrics     Without feature selection    With feature selection
Number of features      512                          91
Accuracy (%)            0.62                         0.85
F1-score                0.61                         0.83
Precision               0.62                         0.81
Recall                  0.64                         0.82
Bold—Our approach achieves better than the traditional approach
R = TP / (TP + FN);   P = TP / (TP + FP);   F1 = (2 × P × R) / (P + R)
The confusion matrices before and after feature selection are shown in Fig. 5. When the features (512 features) were in unfiltered form, the accuracy, precision, recall, and F1-score were 62%, 62%, 64%, and 61%, respectively. After the optimal subset was selected using the artificial electric field algorithm (AEFA), the results were better. The number of features in the optimal subset is 91, and we achieved accuracy, precision, recall, and F1-score of 85%, 71%, 82%, and 83%, respectively, which is better than the performance on the unfiltered feature set. In brief, the results with and without feature selection are summarized in Table 1.
Fig. 5 Confusion matrices of classification by SVM. Left: after optimal feature selection, right: before features selection
5 Conclusion This paper aims to use the AEFA method to determine the most suited optimal feature subset successfully. The experimental results reveal that the proposed model outperforms the competition by a wide margin. The suggested model performs better computationally for most cases and achieves higher accuracy in selecting the best optimal subset of features. Several innovative and effective metaheuristic algorithms could be combined with the wrapper-based features selection technique in the future to choose optimal features from various classification models. These feature selection models could be used for disease prediction, weather prediction, and text data analysis, among other things. Acknowledgements This work is financially supported by SERB (DST), Govt. of India (Ref. No.: EEQ/2018/000963). Authors are grateful to the Theism Diagnosis Centre, Kolkata for supplying the FNAC slides for experimental purpose.
References 1. Aalaei S, Shahraki H, Rowhanimanesh A, Eslami S (2016) Feature selection using genetic algorithm for breast cancer diagnosis: experiment on three different datasets. Iran J Basic Med Sci 19(5):476 2. Aghdam MH, Heidari S (2015) Feature selection using particle swarm optimization in text categorization. J Artif Intell Soft Comput Res 5 3. Agrawal V, Chandra S (2015) Feature selection using artificial bee colony algorithm for medical image classification. In: 2015 eighth international conference on contemporary computing (IC3). IEEE, pp 171–176 4. Basak H, Kundu R, Chakraborty S, Das N (2021) Cervical cytology classification using PCA and GWO enhanced deep features selection. SN Comput Sci 2(5):1–17 5. Carioli G, Malvezzi M, Bertuccio P, Boffetta P, Levi F, La Vecchia C, Negri E (2021) European cancer mortality predictions for the year 2021 with focus on pancreatic and female lung cancer. Ann Oncol 32(4):478–487
Source Camera Identification Using GGD and Normalized DCT Model-Based Feature Extraction Pabitra Roy, Shyamali Mitra, and Nibaran Das
Abstract In recent decades, blind feature-based source camera detection has drawn a lot of attention. In literature, researchers used a specific sort of distortion, such as vignetting effects, chromatic aberration, and radial lens distortion, etc., to distinguish between different camera models. Therefore, it becomes specific to the distortions present in the particular camera model. However, distortion-specific approaches perform poorly in the absence of the specific distortion in an image. To develop a source camera identification system, we introduce a non-distortion-specific methodology using normalized DCT coefficients, a generalized Gaussian distribution (GGD) model-based blind features. DCT is performed on the sub-images of N × N blocks after the image is normalized using mean subtracted contrast normalization (MSCN). Over three scales, the DCT features are extracted. To perform camera model identification, a multi-class SVM is used to utilize all the extracted features. The experiment was conducted on the Dresden image database, which contains 10,173 images captured by 43 devices distributed over 12 camera models. The proposed method showed a maximum accuracy of 98.82% on the dataset. Keywords Digital image forensic · Mean subtracted contrast normalization (MSCN) · Discrete cosine transform (DCT) · Generalized Gaussian distribution (GGD)
P. Roy (B) Department of Computer Science and Engineering, Ramkrishna Mahato Government Engineering College, Purulia, India e-mail: [email protected] S. Mitra Department of Instrumentation and Electronics Engineering, Jadavpur University, Kolkata, India e-mail: [email protected] N. Das Department of Computer Science and Engineering, Jadavpur University, Kolkata, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 N. Das et al. (eds.), Proceedings of International Conference on Data, Electronics and Computing, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-99-1509-5_27
1 Introduction The world is experiencing unprecedented growth in the popularity of digital imaging devices in modern society and its applications. With the increase of popularity of digital camera, it is available in a variety of digital devices such as smartphone, surveillance system, and online education system. Powerful personal computers, digital technology, and available high-speed Internet have made it possible for an average person to produce and transmit digital information. The growing practice of digital photography using the advanced photo-editing software makes it difficult to identify the authenticity of the photographic pictures, loosing the credibility as accurate documents of events. Thus, digital images have emerged as a challenging field of forensic research aimed at exposing tampering operations and counterfeiting to digital images [1]. The acquisition of a visual digital object can lead to various processing stages in its lifetime, with the goal of enhancing the quality, blending pre-existing elements to create new content, or even interfering with the content. So, apparently “seeing is no longer believing” [2]. With reference to the authenticity of an image, the following two questions may arise: (a) Is the captured image related to the camera device being claimed? (b) Does the captured image represent the original scene? Answer to the first question allows one to identify the device or the user that compiled the image and question two let us know if there is any image adulteration. It is possible to address the above questions, in presence of the original image. But in practical cases, no prior knowledge about the image is available. Therefore, authentication of image is done in a completely blind way. In the past, to restore the credibility of digital images [3], the trustworthy digital camera, or secure digital camera [4], has been used. The trustworthy camera either embedding a digital watermark [5, 6] in the image or uses a digital signature to make it authenticate. In the case of the passive method, no previous knowledge about the image is required to prove the authenticity of the digital image. In multimedia forensics [7, 8], every image leaves a distinct mark or traces on the data in the form of a digital fingerprint through the image processing pipeline, i.e., from its real-world scene to the final digital image [9]. Digital fingerprints not only help to identify the digital image source but also help to decide whether it is genuine or doctored by identifying its presence, absence, or inconsistency with features associated with digital content [2]. We are quickly reaching to a condition where one can no longer take the authenticity and integrity of digital images for granted. Image Forensics can help us to assess the reality and integrity of a given digital image by following the research directions toward the identification of the source camera and image fraud detection. Nowadays, a lot of work is being done to make use of the traces left by a camera’s demosaicing process [10–12]. Chen and Stamm [10] introduced a camera identification system based on demosaicing. Marra et al. [12] use steganalysis to adapt rich model attributes to accomplish camera model identification. Other methods [13–15] rely on imprints left by the full image processing stages and demosaicing algorithms. Tuama et al. [15] used the concatenation of color band noise residual co-occurrence
matrix and features derived in DCT domain with a Markovian model. In [16, 17], a deep learning technique to camera recognition was examined. Every digital camera has a complex optical system that transfers the measured light intensity to a tiny CCD or CMOS sensor. Every time an image is collected, distortions are added due to the lens’ projection onto the sensor. They include vignetting effects, chromatic aberration, and radial lens distortion. The usage of various optical systems by various camera manufacturers creates fair distortions during image acquisition [18]. Dirik et al. [19] made use of the traces left by dust particles on a digital camera’s sensor in 2008. SCI was also carried out by Lucas et al. [20] and Chen et al. [14] based on PRNU, which was caused by errors in the manufacturing process of the sensor. In order to distinguish between different camera models, Thai et al. [21] concentrated on modeling the total noise, including photon noise, dark (current) noise, and read noise that corrupts the digital image. A generalized noise model with additional processing and compression steps was employed by Thai et al. [22]. Each author employed a specific sort of distortion, such as vignetting effects, chromatic aberration, radial lens distortion, etc., to distinguish between different camera models. Therefore, the success of those models is dependent on those distortions present in the particular camera models. In the current paper, we employ a non-distortion-specific methodology. Initially, we used DCT with the GGD model. The reason for choosing DCT is that it can capture distortion present in an image, but the average accuracy was only 96.5%. The use of MSCN in conjunction with DCT gives an average accuracy of 98.82%. The average accuracy is enhanced by 2.32% after using MSCN, proving that the statistics of the DCT features are insufficient on their own to identify source cameras. The suggested technique extracts features faster and has a feature dimension of 180 which reduces the calculation time for detecting the camera model and enables real-world application. In addition, the suggested technique produces results that are analogous to those obtained by cutting-edge methods. The layout of the paper is outlined below. Section 2 reveals the source camera identification with a detailed explanation of the proposed method. The experiment setup and results analysis are described in Sect. 3. Finally, Sect. 4 brings the paper to a conclusion.
2 Source camera identification 2.1 Mean Subtracted Contrast Normalization (MSCN) At various phases of the image gathering process, several sources of flaws and noise come into the picture. The final digital image will have modest variances in intensity across individual pixels, even if the imaging sensor records a perfectly evenly lit environment. Fixed pattern noise, PRNU noise, low-frequency defects noise, and so on are just a few examples of noise. Depending on the type of distortion/impairments
present in an image, the MSCN coefficient distribution is affected differently [23–25]. The MSCN coefficient was chosen because, first, it can capture any form of imperfections/distortions present in an image, which are related to each camera model, and second, its computing cost is low. MSCN is a recent approach for converting image intensity to luminance at a specific pixel. The MSCN coefficients are calculated using the following equations:

Î(i, j) = (I(i, j) − μ(i, j)) / (σ(i, j) + c)   (1)

where I(i, j) is the image intensity at a given pixel (i, j) and Î(i, j) is the corresponding luminance, with i = 1, 2, 3, …, M and j = 1, 2, 3, …, N being the spatial indices over the dimensions of the image. μ(i, j) and σ(i, j) are the local mean field and the local variance field, respectively, defined as

μ(i, j) = Σ_{m=−M}^{M} Σ_{n=−N}^{N} w_{m,n} I(i + m, j + n)   (2)

σ(i, j) = √( Σ_{m=−M}^{M} Σ_{n=−N}^{N} w_{m,n} [I(i + m, j + n) − μ(i, j)]² )   (3)

where w = {w_{m,n} | m = −M, …, M; n = −N, …, N} is a 2D circularly symmetric Gaussian weighting function sampled out to three standard deviations (M = N = 3) and rescaled to unit volume. The value of c is set at 1.
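A minimal sketch of Eqs. (1)–(3) is given below, assuming a 7 × 7 Gaussian window; the window standard deviation and the border handling are assumptions, since the text fixes only the three-standard-deviation extent and c = 1.

```python
# MSCN coefficients of a grayscale image (Eqs. (1)-(3)).
import numpy as np
from scipy.ndimage import correlate

def gaussian_window(size=7, sigma=7/6):
    """2D circularly symmetric Gaussian weights rescaled to unit volume."""
    ax = np.arange(size) - size // 2
    xx, yy = np.meshgrid(ax, ax)
    w = np.exp(-(xx**2 + yy**2) / (2.0 * sigma**2))
    return w / w.sum()

def mscn(image, c=1.0):
    img = image.astype(np.float64)
    w = gaussian_window()
    mu = correlate(img, w, mode="nearest")                                # Eq. (2)
    sigma = np.sqrt(np.abs(correlate(img**2, w, mode="nearest") - mu**2)) # Eq. (3)
    return (img - mu) / (sigma + c)                                       # Eq. (1)
```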
2.2 Overview of the Proposed Method When a camera device captures an image, some distortions are intrinsically embedded in it during acquisition, throughout the whole image processing pipeline. The presence of distortion reduces the quality of the image. A non-distortion-specific algorithm is proposed in this paper using normalized DCT coefficients with a GGD model. An image without any distortion fits the GGD model best; the presence of distortion causes a deviation from the GGD model, and the GGD model parameters vary accordingly. The algorithm is trained with features extracted from normalized DCT coefficients. The motivation for choosing the DCT domain is that, in the presence of distortion, the statistics of the DCT features also vary. We apply normalization before extracting the DCT coefficients because the normalized coefficients tend toward a Gaussian distribution, which is best captured by the GGD model. In Fig. 1a, we plot the histogram of a DCT coefficient block against the normal distribution and observe that the distribution of the DCT coefficients is symmetrical; Fig. 1b shows the histogram of the normalized DCT coefficients against the normal distribution,
Fig. 1 Distribution of DCT coefficient. a DCT coefficient block versus normal distribution, b normalized DCT coefficient versus normal distribution
where the distribution tends toward a Gaussian. The distribution deviates from this normal behavior in the presence of any kind of distortion. MSCN coefficients are a well-established tool that has previously been used to measure image quality [23, 24]. When an image is degraded by any kind of distortion, the statistical distribution of the MSCN coefficients departs from its regular behavior, whereas for an undistorted image it is highly regular; the normal distribution is modified in the presence of distortion [23]. MSCN thus quantifies both the naturalness and the quality of an image in the presence of distortion [24]. For feature extraction, the local DCT coefficients are modeled using a GGD. DCT coefficients have a symmetrical distribution [25], tend toward a Gaussian distribution when MSCN is applied, and are therefore well suited to a GGD model; DCT computation is also fast. The proposed method uses statistical features that change from one camera model to another in the presence of imperfections/distortions, with the help of both MSCN and DCT coefficients, since both can capture the distortions. A high-level overview of the proposed method is depicted in Fig. 2.
2.3 Detailed Description of Proposed Method We were motivated by Saad et al. [25], in which the authors extract features for image quality evaluation. The following steps are performed in order to extract the features as shown in [25]. In step (a), the image is normalized using MSCN, then the image is partitioned into blocks, and DCT is performed on each block (Fig. 2). The non-DC DCT coefficient blocks are further partitioned into three oriented sub-regions and three radial frequency sub-bands as shown in Fig. 3, and the GGD fit is applied to each orientation and each radial frequency sub-band individually as given in steps (b), (c), and (d). The GGD model parameters are estimated using step (e).
Fig. 2 High-level overview of the proposed method
Fig. 3 Partition of image block. a 5 × 5 DCT coefficient matrix, b DCT coefficients along three oriented sub-region, c upper, middle, and lower frequency sub-bands
Steps:
a. Normalize the image using Eqs. (1), (2), and (3).
b. Partition the image into N × N blocks and perform a 2D DCT computation on each block. Then apply the GGD model to each non-DC DCT coefficient block.
c. To capture directional information, partition the DCT coefficient blocks into three oriented sub-regions and apply the GGD model to each orientation individually.
d. Partition the DCT coefficient blocks into upper, middle, and lower radial frequency sub-bands, which correspond to the low-, mid-, and high-frequency DCT sub-bands. The GGD fit is obtained for each radial frequency sub-band.
e. Compute the GGD model parameters using steps I–IV below.
We extract four model-based features; their detailed description is given below. A univariate GGD [25] is given by Eq. (4):
f(x | α, β, γ) = α e^(−(β|x − μ|)^γ)   (4)
where μ, γ, α, and β represent the mean, shape, normalizing, and scaling parameters, respectively. α and β are defined as

α = βγ / (2Γ(1/γ))   (5)

β = (1/σ) √( Γ(3/γ) / Γ(1/γ) )   (6)

where σ and Γ(·) are the standard deviation and the gamma function, respectively. Γ(·) is given by Eq. (7):
Γ(z) = ∫_0^∞ t^(z−1) e^(−t) dt   (7)
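A minimal sketch of fitting the GGD of Eq. (4) to a block of (zero-mean) DCT coefficients follows. The moment-matching search used here is a standard estimator, not necessarily the one used by the authors, and the search grid is an assumption; α and β then follow from Eqs. (5) and (6).

```python
# GGD fit by moment matching: gamma is found by matching E[x^2]/E[|x|]^2 to
# Gamma(1/g) Gamma(3/g) / Gamma(2/g)^2, then alpha and beta follow from (5)-(6).
import numpy as np
from scipy.special import gamma as G

def fit_ggd(x, grid=np.arange(0.1, 6.0, 0.001)):
    x = np.asarray(x, dtype=np.float64).ravel()
    sigma = x.std()
    rho_hat = np.mean(x**2) / (np.mean(np.abs(x))**2 + 1e-12)
    rho = G(1.0 / grid) * G(3.0 / grid) / G(2.0 / grid)**2
    g = grid[np.argmin(np.abs(rho - rho_hat))]                         # shape gamma
    beta = (1.0 / (sigma + 1e-12)) * np.sqrt(G(3.0 / g) / G(1.0 / g))  # Eq. (6)
    alpha = beta * g / (2.0 * G(1.0 / g))                              # Eq. (5)
    return g, alpha, beta
```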
I. GGD shape parameter (γ): The shape parameter is computed over all the non-DC DCT coefficients by fitting the GGD model to each N × N block. γ = 2 and γ = 1 yield the Gaussian (normal) and Laplacian density functions, respectively; a smaller value of the shape parameter (γ) corresponds to a more peaked distribution [26]. II. Coefficient of frequency variation (η): The frequency variation is computed over all the blocks in the image using Eq. (8):

η = σ_|x| / μ_|x|   (8)
where σ_|x| and μ_|x| are the standard deviation and mean of the DCT coefficient magnitude |x|: μ_|x| measures the center of the DCT coefficient magnitude distribution, and σ_|x| measures its spread. III. Energy sub-band ratio (R_n): These features measure the relative distribution of energies in the lower and higher bands; the presence of distortion can affect both. The average energy for frequency band n is calculated by Eq. (9):

E_n = σ_n²   (9)

where the variance σ_n² corresponds to band n. The ratio R_n is computed for n = 2, 3 using Eq. (10), given by
Table 1 Feature list

Feature_Id/Scale    Description of feature
f1–f5               Shape parameter features
f6–f10              Frequency variation (coefficient) features
f11–f15             Energy sub-band ratio features
f16–f20             Orientation based features
R_n = ( E_n − (1/(n−1)) Σ_{j<n} E_j ) / ( E_n + (1/(n−1)) Σ_{j<n} E_j )   (10)
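A minimal sketch of the coefficient-of-frequency-variation and energy sub-band-ratio features of Eqs. (8)–(10) is given below. The assignment of coefficients to the three radial frequency sub-bands is an assumption (a simple radial index is used); it is meant only to illustrate the computation.

```python
# Frequency variation (Eq. (8)) and energy sub-band ratios (Eqs. (9)-(10))
# computed from the non-DC DCT coefficients of an N x N block.
import numpy as np

def frequency_variation(coeffs):
    """eta = sigma_|x| / mu_|x| over the DCT coefficient magnitudes."""
    mag = np.abs(np.asarray(coeffs, dtype=np.float64))
    return mag.std() / (mag.mean() + 1e-12)

def energy_subband_ratios(block):
    """R_n for n = 2, 3 with E_n the variance of the coefficients in band n."""
    yy, xx = np.indices(block.shape)
    radius = xx + yy                                   # crude radial index, DC at 0
    edges = np.linspace(1, radius.max() + 1, 4)        # three bands, DC excluded
    E = [block[(radius >= lo) & (radius < hi)].var()   # Eq. (9) per band
         for lo, hi in zip(edges[:-1], edges[1:])]
    ratios = []
    for n in (2, 3):                                   # 1-based band index
        mean_lower = np.mean(E[: n - 1])
        ratios.append((E[n - 1] - mean_lower) / (E[n - 1] + mean_lower + 1e-12))
    return ratios
```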
Pout = 1 − Prob(γ1 > γth) Prob(γ2 > γth)   (3)
For the S–R link, Pr ob(γ1 > γth ) can be calculated as
Prob(γ1 > γth) = ∫_γth^∞ pγ1(γ1) dγ1   (4)
Substituting the expression of (1) into (4), it is obtained as

Prob(γ1 > γth) = (1/Γ(m)) (m/γ̄1)^m ∫_γth^∞ γ1^(m−1) exp(−mγ1/γ̄1) dγ1   (5)
Using ([9], (3.381.3)), the expression of (5) can be simplified as

Prob(γ1 > γth) = (1/Γ(m)) Γ(m, mγ1,th/γ̄1)   (6)
Prob(γ2 > γth ) may be evaluated by considering Weibull fading for R–D hop as,
Prob(γ2 > γth) = ∫_γth^∞ pγ2(γ2) dγ2   (7)
Substituting the expression of (2) into (7), we obtain

Prob(γ2 > γth) = (c/2) (Γ(1 + 2/c)/γ̄2)^(c/2) ∫_γth^∞ γ2^(c/2 − 1) exp(−(Γ(1 + 2/c) γ2/γ̄2)^(c/2)) dγ2   (8)
Using ([9], (3.381.8)), the expression of (8) can be derived as

Prob(γ2 > γth) = Γ(1, (Γ(1 + 2/c)/γ̄2)^(c/2) γ2,th^(c/2))   (9)
Putting the expression of (6) and (9) into (3), Pout can be derived as
Pout = 1 − (1/Γ(m)) Γ(m, mγ1,th/γ̄1) Γ(1, (Γ(1 + 2/c)/γ̄2)^(c/2) γ2,th^(c/2))   (10)
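Equation (10) can be cross-checked numerically against a Monte Carlo simulation of the dual-hop DF link (Nakagami-m first hop, Weibull second hop). The sketch below is illustrative only: the parameter values are assumptions, and SciPy's regularized upper incomplete gamma function is used for Γ(m, ·)/Γ(m).

```python
# Closed-form outage probability of Eq. (10) versus a Monte Carlo estimate.
import numpy as np
from scipy.special import gamma, gammaincc   # gammaincc(a, x) = Gamma(a, x)/Gamma(a)

def pout_analytical(m, c, g1_avg, g2_avg, g_th):
    p1 = gammaincc(m, m * g_th / g1_avg)                              # Eq. (6)
    p2 = np.exp(-(gamma(1 + 2.0 / c) * g_th / g2_avg) ** (c / 2.0))   # Eq. (9), Gamma(1, x) = e^-x
    return 1.0 - p1 * p2                                              # Eq. (10)

def pout_monte_carlo(m, c, g1_avg, g2_avg, g_th, n=2_000_000, seed=1):
    rng = np.random.default_rng(seed)
    # Nakagami-m fading -> hop-1 SNR is gamma distributed with shape m, mean g1_avg
    g1 = rng.gamma(shape=m, scale=g1_avg / m, size=n)
    # Weibull fading -> hop-2 SNR is Weibull with shape c/2 and scale
    # g2_avg / Gamma(1 + 2/c), which reproduces the PDF used in Eq. (8)
    g2 = rng.weibull(c / 2.0, size=n) * (g2_avg / gamma(1 + 2.0 / c))
    # DF relaying: outage if either hop falls below the threshold
    return np.mean(np.minimum(g1, g2) < g_th)

m, c = 2, 5
g_th = 10 ** (2 / 10)           # 2 dB threshold SNR
g_avg = 10 ** (10 / 10)         # 10 dB average SNR on both hops (assumed)
print(pout_analytical(m, c, g_avg, g_avg, g_th))
print(pout_monte_carlo(m, c, g_avg, g_avg, g_th))
```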
4 Simulation Results The analytical outcomes of the derived equation are plotted and discussed in this section. The threshold SNR (γth ) and Nakagami-m fading parameter (m) are the key criteria for performance design in the Fig. 2. Figure 2 shows the outage probability vs average SNR for the threshold SNR γth = 2 dB and γth = 6 dB. In the Fig. 2, it can be noticed that for γth = 2 dB, the outage probability becomes low with the increment of fading parameter m as compared to γth = 6 dB. For better outage performance, γth = 2 dB is better when the parameter m is high. It can be seen that when the average SNR rises, the outage performance improves. In the Fig. 3, the outage measure of a dual-hop DF type relaying method is illustrated for varying Weibull fading parameter (c) and keeping m = 4. The values of threshold SNR are γth = 2 dB and γth = 6 dB in the Fig. 3. In the figure, it is observed that the best outage probability performance can be achieved when the Weibull fading parameter is high. The outage performance is better for low γth .
Fig. 2 Outage probability of a DF type relay-assisted dual-hop communication technique with c = 5 and varying m
Fig. 3 Outage probability of a DF relay-based dual-hop communication system with m = 4 and varying c
5 Conclusions This article analyzes the outage probability of a dual-hop DF type relaying scheme under mixed Nakagami-m as well as Weibull fading channels. The PDF approach is used. The outage probability is derived in terms of gamma function. Simulation data verifies the correctness and the accuracy of the analytical analysis.
References 1. Duong TQ, Bao VNQ, Zepernick HJ (2009) On the performance of selection decode-and-forward relay networks over Nakagami-m fading channels. IEEE Commun Lett 13(3):172–174 2. Wang M, Zhong Z (2012) Optimal power allocation and relay location for decode-and-forward dual-hop systems over Weibull fading channels. In: 8th international wireless communications and mobile computing conference, vol 12, pp 651–981 3. Suraweera HA, Louie RHY, Li YH, Karagiannidis GK, Vucetic B (2009) Two hop amplifyand-forward transmission in mixed Rayleigh and Rician fading channels. IEEE Commun Lett 13(4):227–229 4. You M, Sun H, Jiang J, Zhang J (2016) Effective rate analysis in Weibull fading channels. IEEE Wirel Commun Lett 5:340–343 5. Ikki SS, Ahmed MH (2009) Performance analysis of dual hop relaying over non-identical Weibull fading channels. IEEE Veh Technol Conf 69:1864–1868 6. Kapucu N, Bilim M, Develi M (2013) Outage probability analysis of dual-hop decode-andforward relaying over mixed Rayleigh and generalized gamma fading channels. Wirel Pers Commun 71:947–954 7. Simon MK, Alouini MS (2009) Digital communication over fading channels, 2nd edn. Wiley, New York, NY 8. Ikki SS, Ahmed MH (2007) Performance analysis of dual-hop relaying communications over generalized Gamma fading channels. In: IEEE Global Telecommunications Conference, Washington, DC, USA, pp 3888–3893 9. Gradshteyn IS, Ryzhik IM (2000) Table of integrals, series, products, 6th edn. Academic Press Inc., New York
A Triple Band Reconfigurable Filtering Antenna with High Frequency Selectivity Sangeeta Das and Pankaj Sarkar
Abstract The manuscript confers a compact triple band reconfigurable filtering antenna. Primarily, a novel dual band reconfigurable filter is investigated. The proposed filter comprises two λg /2 (λg is the guided wavelength) open-ended uniform impedance resonators (UIRs) at the upper half of the feed lines which is resonating at 5.45 GHz. The lower half of the feed lines constitutes a high impedance folded half wavelength T-shaped resonator to obtain a pass band centered at 2.8 GHz. A transmission line is introduced beneath the feeding line to produce flexible transmission zeroes. Secondly, two non-resonating UIRs are introduced to the proposed dual band filter in such a way that it can eliminate a band from 5.35 to 5.5 GHz. Thus the designed filter can be easily reconfigured to a triple band bandpass filter. Here, PIN diodes are associated and biased so that switching can be feasible from dual pass band to triple pass band and single pass band and also all stop operation and vice versa. Later, a broadband antenna is developed using a ground ring and an inverted L-shaped resonator which is radiating from 2.6 to 5.85 GHz. In the end, the filter is cascaded to the antenna structure aiming to execute the intended triple band reconfigurable filtering antenna. Keywords Filtering antenna · Broadband monopole antenna · Triple band reconfigurable filter · Wide band and narrow band applications · Transmission zero · PIN diodes
1 Introduction Over the last few decades, modern wireless communication system and its function, viz. Bluetooth, WiMAX, Wi-Fi, GPS, WLAN and many others have certified speedy growth. In addition, using multiple antennas bearing a vast frequency range S. Das (B) · P. Sarkar Electronics and Communication Engineering Department, North-Eastern Hill University, Shillong, Meghalaya 793022, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 N. Das et al. (eds.), Proceedings of International Conference on Data, Electronics and Computing, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-99-1509-5_36
to furnish multiple wireless services expands the size and the cost of a system. There is a continuing trend that several wireless applications merge into one device. Gradually, the device is turning miniaturized making it tough to implement numerous RF applications. To decode this aforesaid complication, it is recommended that reconfigurable antennas are extensively used to accomplish multiband system within one physical unit [1–3]. The reconfigurable antenna allows enough applications utilizing ON and OFF switching elements like PIN diode, varactor diode, RF MEMS switch, etc., to grasp various radiation characteristics [4, 5]. The fabrication is still challenging because the switching elements must have the association of biasing lines in radiator of the antenna to reallocate the radiating bands. In case, the designer fails to design the biasing lines accurately, the interference between the radiating elements of the antenna and the biasing lines accomplishes certain troublesome frequencies in the antenna radiating band. Also, it demolishes the radiation characteristic from the design requirements [6]. Consequently, RF designers have been focusing more to diminish unwanted noise at the receiver. This could be established by cascading filter to the antenna feedline. In recent years, the minimization and the seamless fusion of the antenna with reconfigurable filter are given weightage owing to their pivotal role in multiple-input multiple-output (MIMO), cognitive radio and 5G communication systems. Hence, a new approach termed as filtering antenna attracts RF researcher’s mind in current years [7–10]. In [11], a compact frequency agile multiband filtering antenna is suggested and the prototype supports untraditional switching among four radiating bands. A circularly polarized wide band to narrow band reconfigurable filtering antenna is furnished for UWB/WiMAX applications in [12]. In [13], a tri component Quasi-Yagi-Uda antenna array is designed using a tri configuration filter and a reconfigurable power divider which achieves a wide band to bandpass or bandstop application. In [14], a novel and cheaper reconfigurable filtering antenna is suggested based on single pole double throw RF switch coupled resonators. Unfortunately, these reported reconfigurable filtering antennas have various drawbacks such as big circuit size, complex structure, large quantity switching elements, low return loss, less gain, etc., which limit their practical multiband applications. A very few triple band reconfigurable filtering antennas realizing both wide band and narrow band application in one physical unit have been reported in the past with attractive spurious suppression. Therefore, tri-band reconfigurable filtering antennas have been studied extensively for the newly developed communication system to fulfill the demand of compact dimension, low cost, independently switched multiple wide band and narrow band applications. Here, a new triple band reconfigurable filtering antenna is intended with compact size and good harmonic suppression. A monopole antenna using a ground ring and an inverted L-shaped resonator is excited incorporating a triple band reconfigurable band pass filter which is combined to the feeding line of the intended antenna. The approached filtering antenna supports easier switching from dual band to triple band and single band configurations and non-radiating configurations or vice versa to meet the user’s frequency band of interest. 
The prototype is also flexible for converting wide band to narrow band operation with the help of the proper placement and biasing
Fig. 1 a Layout of the intended dual band filter. b S parameter responses with different length of transmission line (R1)
of the PIN diodes. The proposed architecture is designed on an FR4 substrate with a height of 1.0 mm.
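Since every resonator length in the design that follows is tied to a guided half-wavelength on this substrate, a rough λg/2 estimate is sketched below using the standard quasi-static microstrip approximation. The FR4 relative permittivity (εr ≈ 4.4) and the 2 mm line width are assumptions, as only the 1.0 mm substrate height and the band centres are stated; the value obtained at 5.45 GHz lands close to the 15 mm UIR segments quoted in the next section.

```python
# Rough guided half-wavelength estimate for a microstrip resonator on FR4.
import math

def eps_eff(eps_r, h_mm, w_mm):
    """Quasi-static effective permittivity of a microstrip line (w/h >= 1)."""
    return (eps_r + 1) / 2 + (eps_r - 1) / 2 / math.sqrt(1 + 12 * h_mm / w_mm)

def half_guided_wavelength_mm(f_ghz, eps_r=4.4, h_mm=1.0, w_mm=2.0):
    c = 299.792458  # speed of light in mm*GHz
    lam_g = c / (f_ghz * math.sqrt(eps_eff(eps_r, h_mm, w_mm)))
    return lam_g / 2

for f in (2.8, 5.45):
    print(f"{f} GHz: lambda_g/2 ~ {half_guided_wavelength_mm(f):.1f} mm")
```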
2 Design Methods of Triple Band Reconfigurable Filter A. Dual Band Pass Filter Design The layout of the proposed dual band reconfigurable filter is portrayed in Fig. 1a. It can be seen, the lower pass band is generated due to the fundamental resonance of resonator R2 which is a half wavelength microstrip line centrally fixed to a bent Tshaped open-ended stub operating at 2.5–3.2 GHz. The upper pass band is produced as a result of coupling between two open-ended half wavelength UIRs R3 and R4 operating at 5.45 GHz. R1 identifies a source to load coupling structure which is incorporated in the design to reject undesirable frequencies. Therefore, the first spurious frequency cannot be coupled to the i/p and o/p lines for all the resonators. The optimal design parameters are w = 2 mm, w1 = 0.2 mm, w2 = 0.2 mm, w3 = 2.2 mm, w4 = 2.1 mm, w5 = 0.2 mm, w6 = 0.2 mm, l = 2.5 mm l 1 = 15 mm, l 2 = 15 mm, l3 = 15 mm, l4 = 7 mm, l5 = 4 mm, l6 = 11 mm, l7 = 17 mm, g1 = 0.15 mm, g2 = 0.15 mm, g3 = 0.25 mm, g4 = 0.15 mm, g5 = 2 mm. The significance of transmission line (R1) is to enhance the proposed filter performance by suppressing unwanted frequencies. Figure 1b shows that insertion of coupling line generates additional transmission zeroes between the pass bands as the length increases from 9 to 13 mm. It is observed that three additional transmission zeroes have been created in the desired operating band due to the length of 11 mm of R1. Thus, it improves the selectivity of the proposed dual band filter. After parametric study, the values are considered to be L 5 = 3 mm and W 4 = 1.5 mm to obtain the best result. B. Triple Band Reconfigurable Filter Configuration
Fig. 2 Configuration of the designed triple band reconfigurable filter
Figure 2 represents a triple band reconfigurable filter. Two non-resonating openended half wavelength UIRs (R5 and R6) are incorporated in the developed dual band filter with a coupling gap of 0.6 mm from resonator R3 and R4, respectively as displayed in Fig. 2. UIRs are designed in such a way that it can establish a notch at 5.45 GHz so that the structure can be further tuned to triple band filter. For switching the desired pass band, PIN diodes are deployed to the resonators with DC biasing circuit as shown in Fig. 2. Total four no. of PIN diodes (Skywork SMP1340-079LF) are exploited as switchable components in order to realize reconfigurable multiband. A DC bias voltage of 5 V is accessed with the help of an inductance 2.4 nH RFC to make the diode on. The diodes can be forward biased with respect to a 5 V bias voltage. The other end of the PIN diode is affixed to ground by inductor. For simulation, the PIN diodes are designed via 2 resistor and 0.3 pF capacitance in series for on and off state, respectively. The size of the proposed triple band filter is very minimized which is around (25 × 25 mm2 ). C. Simulation Results and Discussions of the Proposed Filter The suggested filter retains six different output characteristics corresponding to six diode configurations. For the first diode configuration, considering D1, D3 and D4 as kept off and D2 as kept on, the proposed filter accomplishes a dual band configuration. The EM simulated response of the filter is presented in Fig. 3a. It is visualized that the first pass band is operating at 2.5–3.2 GHz owning 10 dB FBW of around 25%. The second pass band is obtained at 5.45 GHz maintaining 10 dB FBW of around 7.5%. So, the structure provides two wide band applications for this diode configuration with less insertion loss of 0.2 dB. Total four no. of transmission zeroes are produced at 2.29, 3.69, 4.68, and 6.37 GHz due to source to load coupling structure which boosts the selectivity of the proposed dual band filter. The conversion of the dual band configuration to solo wide band configuration resonating at 2.5–3.2 GHz is carried out by second diode configuration, i.e., all diodes are in off state and the simulated
Fig. 3 EM simulated responses of the suggested triple band filter for various diode configurations
response of the filter is mapped in Fig. 3b. Consequently, the intended filter performs identical to a single wide pass band filter. The upper pass band is demolished with a 40 dB attenuation level. Similarly, for the third diode composition, when D3 and D4 are off and D1 and D2 are on, another solo pass band filter is established operating at 5.45 GHz. The characteristic is shown in Fig. 3c. The lower pass band is demolished with a 15 dB attenuation level. The return loss characteristics are below −20 dB for both the single pass band configuration. For the fourth diode configuration, when D2, D3 and D4 are turned off and D1 is turned on, the designed filter reacts as an all stop filter as illustrated in Fig. 3d. Both the pass band are destructed with a satisfactory attenuation level. For the fifth diode configuration, i.e., when D1 is off and D2, D3, D4 are on state, a triple band filter can be achieved as shown in Fig. 3e. At the moment D3 and D4 on, the non-resonating resonators (R5 and R6) are active and it eliminates a band from 5.35 to 5.5 GHz from second pass band by producing a notch at 5.45 GHz. Thus the dual band filter can be fast switched to the triple band
filter, finally resonating at 2.5–3.2, 5.25–5.35 and 5.5–5.65 GHz. It is seen that the structure is flexible for both wide band and two narrow band applications for this diode configuration. Considering the sixth diode composition, while four diodes are switched to on condition, the developed filter deploys as a narrow band filter. The EM simulated response is plotted in Fig. 3f.
3 Broadband Antenna Configuration Figure 4a demonstrates the planned broadband antenna permeating radiating band of 2.6–5.85 GHz. The designed antenna is modeled on the identical FR4 substrate maintaining 1 mm height. The antenna composes a monopole radiator, a ground ring and an inverted L-shaped resonator. First, the antenna is designed with a half wavelength monopole radiator which is radiating at 4.45 GHz with an impedance bandwidth of 1.4 GHz as shown in ant 1st, in Fig. 4a. Accurate impedance matching can be owned via a partial ground plane. Further, the rectangular ground is enlarged in the shape of a ring in order to achieve a broadband characteristic. The monopole radiator benefits from the ground ring which increases the electrical length of the proposed antenna. The full wavelength resonance of the ground ring merges with radiator resonance generating a broadband from 2.7 to 4.3 GHz. As a result, bandwidth is enhanced from 32.3% to 47% as shown in ant 2nd, in Fig. 4b. Moreover, an inverted half wavelength L-shaped resonator is incorporated near to the monopole radiator to further enhance impedance matching of the antenna, thus generating a broadband from 2.6 to 5.85 GHz as shown in case III, in Fig. 4c. The gap between the inverted L-shaped resonator and the radiator is parametrically studied to realize desired radiating band of interest. As a result, a broadband antenna is assigned obtaining a 10 dB FBW of 83.33%. The antenna size is very compact, i.e., (28 × 28) mm2 . After parametric study, L G1 = 22 mm and W G = 1.5 mm is kept to obtain the desired broadband characteristic. The distribution of the surface current for the planned antenna at lower
Fig. 4 a Intended antenna structure of L = 28, W = 26, L 1 = 18, L 2 = 16, L G = 5, L G1 = 22, L G2 = 23, W 1 = 2, W 2 = 1, W 3 = 3, W G = 1.5, g = 0.5 (in mm). b Frequency plot for the developed broadband antenna
and higher resonating nodes, i.e., 2.8 and 5.55 GHz, respectively, is demonstrated in Fig. 5. It is seen that high surface current is induced in ground ring and the monopole radiator at lower resonating node as given in Fig. 5a. Figure 5b indicates that a very high surface current is persuading between the inverted L-shaped resonator and radiator for higher resonating node which improves impedance matching as well as the bandwidth of the radiating band. Figure 5c illustrates the gain of the intended antenna under numerous frequencies. It can be noted that 2.86, 2.93 and 3.5 dBi simulated gains are achieved at three antenna resonating nodes, i.e., 2.8, 3.9 and 5.55 GHz, respectively. Figure 6 portrays the radiation pattern for the antenna at phi = 0° and 90° plane at three resonating nodes. It depicts the eight-shaped omnidirectional radiation properties. In the last, the filter structure is deployed to the feeding line of the antenna to grasp a compact reconfigurable filtering antenna as depicted in Fig. 6d. Figure 6e represents the ground plane of the proposed filtering antenna. The four PIN diodes are biased outwardly with the help of a 2.4 nH RFC. The other ends for all the diodes are associated to ground using inductors. The EM simulated response of the proposed composite structure is exposed in Fig. 7a. The composite structure also establishes six excellent radiating band configurations for six diode configurations as specified earlier. As can be seen that the filtering antenna is comfortably applicable for various multiband applications including triple band, dual band, single band and no radiating band configurations with the help of properly biasing the PIN diodes. The single composite structure is also convenient for both wide band and narrow band applications. The non-radiating bands create a no. of transmission zeroes which enlarges the selectivity of the developed structure. Figure 7b plots the gain for the combined structure beneath multiple frequencies with a maximum peak gain of 6.36 dBi.
Fig. 5 Distribution of surface current for the planned broadband antenna at a lower resonating nodes, b higher resonating nodes. c Plot for the antenna gain under various radiating frequencies
Fig. 6 Radiation pattern plot for the developed broadband antenna at phi = 0° and 90° plane at three resonating nodes a 2.8 GHz, b 3.9 GHz and c 5.55 GHz. configuration of the designed triple band reconfigurable filtering antenna. d Top view and e bottom view
Fig. 7 a EM simulated responses of the suggested triple band filtering antenna for six diode configurations. b Gain plot for the planned composite structure beneath various frequencies
4 Results and Discussions of the Proposed Filtering Antenna Figure 8 represents the radiation pattern properties of the combined architecture for dual band configuration for the two primary planes of F = 0° and 90° which defines omnidirectional radiation properties satisfying Bluetooth, WiMax and WLAN functions. It acquires gain of about 3.85 and 5.84 dBi for dual radiating band configurations for lower and upper pass band, respectively. Similarly, radiation pattern for obtained two narrow radiating bands is displayed in Fig. 9 with simulated gain of about 6.36 and 5.41 dBi which fulfills a satisfactory gain. Variation of the gain is found to be less for all radiating band configurations.
Fig. 8 Radiation pattern of the proposed filtering antenna for dual band configurations at F = 0° and F = 90° plane. a For first radiating band 2.5–3.2 GHz. b For second radiating band centered at 5.45 GHz
Fig. 9 Radiation pattern of the proposed filtering antenna for narrow band configurations at F = 0° and F = 90° plane. a For first radiating band 5.2–5.35 GHz. b For second radiating band 5.5–5.65 GHz
5 Conclusion This manuscript introduces a novel and compact triple band reconfigurable filtering antenna employing PIN diodes. The operating bands of the suggested configuration can be changed to different multiband functions, namely triple band, dual band, single bands and all stop configuration and vice versa to mitigate the user’s demand. The structure is flexible for both wide band and narrow band applications. The source load transmission line is used to demolish the unwanted frequencies from the desired band of interest and also to boost the filter selectivity. The architecture offers omnidirectional radiation pattern to satisfy Bluetooth, WiMax and WLAN band. The intended composite structure is compact (60 × 26) mm2 that allows the architecture well-suited for wireless applications. Compared with many reported reconfigurable filtering antennas, the proposed structure furnishes suitable return loss, low insertion loss, broad upper stop band properties and a satisfactory gain. Therefore, a convenient reconfigurable filtering antenna structure is perceived so that users can accomplish up-to-date multiple band communication techniques.
References 1. Balanis CA (2016) Antenna theory: analysis and design, 4th edn. John Wiley & Sons 2. Jayamani K, Rahulkrishnan SA, Nachiya Raj RA, Atchaya C (2020) A survey on frequency reconfigurable antenna for wireless applications. Int J Psychosoc Rehabil 24(05):2020 3. Costantine J, Tawk Y, Barbin SE, Christodoulou CG (2015) Reconfigurable antennas: design and applications. Proc IEEE 103(3):424–437 4. Priyadarshani M, Gupta SK, Kumar A, Kumar M, Jaiswal AK, Singh Chauhan E (2018) Dual band reconfigurable pin diode based microstrip patch antenna with and without slot. Int J Eng Trends Technol (IJETT) 59(2):79–83 5. Majid HA, Rahim MKA, Hamid MR, Murad NA, Ismail MF (2013) Frequency reconfigurable microstrip patch-slot antenna. IEEE Antennas Wirel Propag Lett 12:218–220 6. Cetiner BA, Crusats GR, Jofre L, Biyikli N (2010) RF MEMS integrated frequency reconfigurable annular slot antenna. IEEE Trans Antennas Propag 58(3):626–632 7. Tang MC, Wen Z, Wang H, Li M, Ziolkowski RW (2017) Compact, frequency-reconfigurable filtenna with sharply defined wideband and continuously tunable narrowband states. IEEE Trans Antennas Propagat 65(10):5026–5034 8. Wen Z, Tang MC, Ziolkowski RW (2019) Band-and frequency-reconfigurable circularly polarised filtenna for cognitive radio applications. IET Microw Antennas Propag 13(7):1003– 1008 9. Qin PY, Wei F, Guo YJ (2015) A wideband-to-narrowband tunable antenna using a reconfigurable filter. IEEE Trans Antennas Propag 63(5):2282–2285 10. Deng J, Hou S, Zhao L, Guo L (2018) A reconfigurable filtering antenna with integrated bandpass filters for UWB/WLAN applications. IEEE Trans Antennas Propag 66(1):401–404 11. Kingsly S, Thangarasu D, Kanagasabai M, Thipparaju RR (2018) Multiband reconfigurable filtering monopole antenna for cognitive radio applications. IEEE Antennas Wirel Propag Lett 17(8):1416–1420
12. Yassin ME, Mohamed HA, Abdallah EAF, Elhennawy HS (2019) Circularly polarized wideband-to-narrowband switchable antenna. IEEE Access 7:36010–36018 13. Malakooti SA, Fumeaux C (2019) Pattern-reconfigurable antenna with switchable wideband to frequency-agile bandpass/bandstop filtering operation. IEEE Access 7:167065–167075 14. Mao CX, Zhang L, Khalily M, Xiao P (2022) Single-pole double-throw filtering switch and its application in pattern reconfigurable antenna. IEEE Trans Antennas Propag 70(2):1581–1586
3D Thermal Modelling of SiC-Avalanche Transit Time Oscillator Under Large-Signal Pulsed Operating Conditions Niratyay Biswas, Debraj Chakraborty, Madhurima Chattopadhyay, and Moumita Mukherjee
Abstract The three dimensional thermal modelling for large bandgap semiconductor (4H-SiC)-oriented ATT device (e.g. IMPATT) around 94 GHz (W-band) has been presented in this paper. Although the hexagonal SiC-based IMPATT diodes are capable to produce sufficient energy-flux at millimetre and submillimetre wave frequency regime, the loss of power in the form of heat limits the efficient device operation at high frequencies. For the first time, the authors have simulated the thermal model of SiC-based ATT device, operating at W-band, where different parametric effects have been incorporated. Moreover, the effect of variations of DC current density and doping profile on the large-signal output power, obtainable from SiCbased ATT device at W-band, has also been studied. The authors have studied the role of guard ring in thermal model and reported in the present paper. For system requirement in defence application, 100 ns realistic bias pulse current has been used in the model, and it is found that the pulsating signal with 33 μs repetition time can be used to generate sufficient power at the desired frequency level. The authors have thoroughly studied the time required to highlight the active area of the SiC-based ATT devices, and it has been observed that within 1 μs the junction temperature comes to ambient temperature. Keywords IMPATT · Large-signal · Thermal model · Heat-flow · Guard-ring
N. Biswas Department of ECE, Asansol Engineering College, West Burdwan, Asansol, West Bengal 713305, India D. Chakraborty · M. Mukherjee (B) Department of Physics, Adamas University, Kolkata 700126, India e-mail: [email protected] M. Chattopadhyay Department of AEIE, Heritage Institute of Technology, Kolkata, West Bengal, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 N. Das et al. (eds.), Proceedings of International Conference on Data, Electronics and Computing, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-99-1509-5_37
1 Introduction Scientists pan the world are in search of high-power semiconductor sources, which can be utilised as efficient millimetre wave power resource. IMPATT, which is the acronym of Impact Avalanche Transit Time devices, is regarded as the solid-state sources, capable of generating promisingly high power at mm and sub-mm-wave frequency bands. These devices are extensively used in different space communication and civilian systems as well as in missile seekers, high-power radars, and so forth. Gallium arsenide (GaAs) and silicon (Si)-based conventional IMPATT diodes are found to be reliable. Owing to the fundamental limitations of the material parameters, these diodes are limited by power and operating frequencies. The power combining technique is one of the vital approaches, to enhance the power-output of the IMPATT diodes. But the combination of large number of devices is practically difficult. As an alternative suitable option, IMPATT devices can be developed from wide bandgap (WBG) semiconductors of high-thermal conductivity (K) and verylarge amount of critical electric field (E c ). The high-saturation drift velocity of charge carriers (vsn,sp ) including large amount of E c can be treated as essential requisites to optimise base semiconductor material for high-power generating IMPATT diodes based on E c 2 . vs 2 relationship. Besides, on behalf of achieving prolong thermal stability in MMW devices, high value of thermal conductivity (K) is required in the base material also. Silicon-carbide (SiC), as an wide bandgap (WBG) semiconductor material, can be opted for designing high-power IMPATT device, owing to several unique properties as (a) 10 × E c , (b) 1.5 × vs , and (c) 3 × K, compared to those of Si, GaAs, and indium phosphide (InP). The availability of 4H-SiC polytypes in bulk wafer form [10, 12] has helped SiC to emerge as relatively mature wide bandgap semiconductor technologies. So, in the light of the maturity of the fabrication technology and the unique material parameters, WBG semiconductors, especially SiC, appear to be the best choice, overall, for the next decade of device development particularly at THz region. At low-frequency region, the superiority of 4H-SiC-based IMPATT over the traditional IMPATTs is already reported [3, 10, 12] (Mukherjee et al. 2000). Presently, two SiC polytypes out of its nearly 250 polytypes are popular in SiC research: 6H-SiC and 4H-SiC. Although both the polytypes have similar properties, 4H-SiC is preferred over 6H-SiC because the carrier nobilities in 4H-SiC are isotropic, that is, identical along the two planes (parallel and perpendicular to c-axis) of the hexagonal semiconductor, whereas in 6H-SiC, carriers exhibit anisotropic mobility. Also the contact resistivity for 4HSiC is comparatively low compared to 6H-SiC. Moreover, it is already reported that the high-frequency performance of 4H-SiC IMPATT is far better than its 6HSiC counterpart in terms of as well as [3]. Thus, the possibility of generating high power from an IMPATT has been investigated by studying the DC and small-signal properties of WBG 4H-SiC-based flat profile DDR (double drift region) IMPATT diode simulated for operation at MMW regime. For getting optimal performance, thermal stability of the devices is extremely important. Since the efficiency of the devices is not so high (maximum could be obtained ~ 25%), a considerable amount
of energy is dissipated as heat and that in turn increase the junction temperature and affects the device performance. Thus, thermal stability of the designed devices is very important point for consideration. The authors in this report have developed a 3D thermal model to study the stability of the devices. Also the study would be able to obtain the junction temperature of the device under realistic operating conditions for pulsed mode of operation. To initiate electrical isolation, SiC devices comprise of a mesa structure which is formed using reactive ion etching. However, it is reported that the output power levels are lesser than the simulated outcome. A Group of Scientists have developed a 4H-SiC-based high-power generating IMPATT diode [1, 2, 4, 5, 7, 9, 13–15] that consists of a highly resistive and amorphous guard ring which surrounds the periphery of the diode. It is constructed with vanadium ion implantation technique. Due to the high resistivity and permittivity of the guard ring, the electric flux can penetrate the guard ring area, which is not similar to the mesa structure. In mesa structure, the corner of the diode periphery is surrounded by air. Besides, the area of guard ring occupies higher thermal conductivity that enables easy spreading of heat into the junction area. In this way, the power dissipation becomes higher than with the same size mesa junction. The authors have studied the role of guard ring in thermal model and reported in the present paper.
2 Equations Involving the Study of Large-Signal Properties of IMPATT Devices The large-signal analysis of an IMPATT diode gives insight into the microwave performance of the device. If the AC field in the depletion region is small compared to the DC breakdown field, the variation of the ionisation rate with electric field can be assumed to be linear, and the small-signal solution of the time-varying Poisson and continuity equations can be carried out by linearization, which represents the low-amplitude limit of the large-signal analysis. Gummel and Blue proposed a small-signal analysis of the basic equations free from all simplifying assumptions. Following the Gummel-Blue approach, two second-order equations can be derived for the real part (R) and imaginary part (X) of the diode impedance Z(x, ω) at any point in the depletion region:

∂²R/∂x² + (αn(x) − αp(x)) ∂R/∂x − 2r(ω/v) ∂X/∂x − (H(x) + ω²/v²) R − 2α(ω/v) X − 2α/(vε) = 0   (1)

∂²X/∂x² + (αn(x) − αp(x)) ∂X/∂x − 2r(ω/v) ∂R/∂x − (H(x) + ω²/v²) X + 2α(ω/v) R + 2ω/(v²ε) = 0   (2)
N. Biswas et al.
where the following notations have been introduced, 0.5 αp vp + αn vn vp − vn v = vp vn , α = ,r = , 2v 2v
d αn − αp J dα H= +y 2 , vε dE dE H is the linearization factor, J DC = Total DC current density, ε = Permittivity of the semiconductor. The boundary conditions for R and X are given by: [n side and p side, respectively]. At x = −x 1 y=
J
vε dEm
dx ωX 1 ∂R + and =− ∂x vns vns ε ωR ∂X − =0 ∂x vns
(3)
At x = x 2 , ∂R 1 ωX and = − ∂x vps vps ε ∂X ωR + =0 ∂x vps
(4)
where Z(x, ω) = R(x, ω) + jX(x, ω). A generalised computer algorithm for smallsignal simulation of the negative resistivity and reactivity in the space charge region is used in the analysis. Following the Gummel-Blue approach, the equations involving R(x, ω) and X(x, ω) are determined, and the modified Runge–Kutta method is used for numerical analysis. The total integrated diode −ve resistance (Z R ) and reactance (Z X ) at a particular frequency (ω) and current density JDC are computed from numerical integration of the R(x) and X(x) profiles over the active space charge layer. Thus, x2 ZR =
R(x)dx
(5)
X (x)dx
(6)
−x1
x2 ZX = −x1
The total diode impedance Z total (ω) is given by
3D Thermal Modelling of SiC-Avalanche Transit Time Oscillator Under …
403
x2 Z total (ω) =
Z (x, ω)dx = Z R + j Z X
(7)
−x1
The diode admittance is expressed as Y =
1 Z total
= G + jB =
1 (Z R + j Z X )
(8)
The small-signal admittance characteristics, negative resistivity profiles, and device quality factor (Q = −B/G at peak frequency) of the designed diodes are determined by this technique after satisfying the appropriate boundary conditions. The diode total negative conductance G and positive susceptance B are calculated from the following expressions, |−G(ω)| = |B(ω)| =
ZR Z 2R + Z 2X
−Z X Z 2R + Z 2X
(9) (10)
The accuracy of the method is increased by incorporating realistic doping profiles, considering recently reported values of material parameter at 500 K and including the effect of mobile space charge. Large-signal analysis provides information regarding: a. Range of frequency in which IMPATT diode will oscillate and also predict the operating current range. b. Magnitude of maximum negative conductance. c. Optimum frequency of operation at which the device exhibits maximum negative conductance. d. Magnitude of negative resistance. e. Device quality factor (Fig. 1).
3 Simulation Methodologies Design of Doping Profile: The frequency of operation of an IMPATT diode essentially depends on the transit time of charge carriers to cross the depletion layer of the diode. A double drift p+ pnn+ structure of SiC IMPATT has been designed by using computer simulation technique for operation at 0.3 THz frequency by using the transit time formula of Sze and Ryder which is W n,p = 0.37 V sn,sp /f ; where W n,p , V sn,sp, and f are the total depletion layer width (n or p side), saturation velocity of electrons/holes, and operating frequency,
404
N. Biswas et al.
Fig. 1 Schematic doping profile of flat type DDR IMPATT diode
Table 1 Material parameters of 4H-SiC at 300 K Parameter
Value
a1
10.00
b1 (× 1017 v3 m−2 )
4.0268
a2 (× 10–18 v−1 m2 )
4.1915
b2 (×
10−10
m)
4.6428
ah1*
10.00
bh1* (× 1017 v3 m−2 ) ah2* (×
10–18
v−1
4.0268
m2 )
4.1915
bh2* (× 10−10 m)
4.6428
Saturation drift velocity of electrons, vsn (× 105 m/s) Saturation drift velocity of holes, vsp (×
105
m/s)
2.12 1.08
Mobility of electrons, μn (× 10−1 m2 /V s)
1.00
Mobility of holes, μp (m2 /V s)
0.10
Permittivity, ε (×
76.00734
* Here,
10–12
F/m)
value of ah1, bh1, ah2, bh2 is taken same both in low and high field
respectively. Here, n+ and p+ -layers are highly doped substrates, and n and p are epilayer. Material and Design Parameters: The material parameters of 4H-silicon-carbide (4H-SiC) (at 300 K) are enlisted in Table 1.
3D Thermal Modelling of SiC-Avalanche Transit Time Oscillator Under …
4 Simulation Results DC and Large-Signal Analysis of 4H-SiC (Figs. 2, 3, 4 and 5):
Fig. 2 Electric field profile of DDR 4H-SiC IMPATT at 94 GHz
Fig. 3 Normalised doping profile of DDR 4H-SiC IMPATT at 94 GHz
405
406
N. Biswas et al.
Fig. 4 Doping profile of DDR 4H-SiC IMPATT at 94 GHz
Fig. 5 GB plot for different bias current density (J o ) optimised at 94 GHz
5 Design Specification of DDR (n+ n-pp+ ) 4H-SiC with Peak Optimum Frequency 94 GHz
Parameters
4H-SiC
W n (m)
0.70 × 10–6
W p (m)
0.70 × 10–6
Nn
(m−3 )
2.6 × 1023 (continued)
3D Thermal Modelling of SiC-Avalanche Transit Time Oscillator Under …
407
(continued) Parameters
4H-SiC
Pp (m−3 )
2.6 × 1023
W sub (m−3 )
1.0 × 1026
Area Jo
(m2 )
10–9 5.1 × 108
(A/m2 )
6 Simulated Results J o (A/ Peak V B (V) Efficiency −Z R ( m2 ) −GP (mho/ BP (S/ Q = m2 ) × electric (η) × × 10−8 m2 ) × 107 m2 ) × |BP / 8 field (× 100% 10 107 GP | 108 V/ m)
POUT (W)
4.9
3.53
246
16.56
1.353
7.18
1.20
0.168 543
5.1
3.53
246
16.54
1.237
7.96
0.98
0.123 605
5.2
3.52
246
16.39
1.194
8.09
1.50
0.150 614
7 Thermal Modelling Thermal Modelling using vanadium guard ring: Bias current density (J o ) (A/ m2 )
Breakdown voltage (V B ) (V)
Efficiency(η) %
Heat flux (W/m2 )
5.1 × 108
246
16.56
1.0 × 1011
For the purpose of thermal modelling, a highly resistive vanadium doped guard ring surrounds the diode periphery p+ -p-n-n+ layer formed by ion implantation which is proposed. Since its thermal material properties are quite similar with bulk (4HSiC). For Ohmic contact, very thin Ni layer (20 nm) on bottom of p+ and the on the top of n+ layer is taken on the bottom and top of Ni layer a very thin layer (20 nm) of Pt/Ti/Au alloy is formed respectively for Ohmic contact. On the bottom on Ti/Pt/ Au layer diamond, heat sink (150 μm × 150 μm) is used instead of copper heat sink because the thermal conductivity of 4H-SiC is greater than that of copper (Fig. 6).
Fig. 6 Schematic diagram of 4H-SiC-based DDR IMPATT diode with diamond heat sink at 94 GHz
8 Results from COMSOL Multiphysics
The COMSOL Multiphysics 4.0a 2D simulator is used for the thermal modelling of the 4H-SiC IMPATT, and the variations of the junction temperature and of the heat flow to the diamond heat sink with pulse-on duration (pulse width) are thoroughly observed at a particular inflow heat flux. The distribution of temperature inside the diode active region in the presence of different heat fluxes is also observed. Owing to the system requirement, a 100 ns pulse-on duration has been used, and the variation of junction temperature with time has been observed at different inflow heat fluxes.
8.1 Variation of Junction Temperature with Time for Different Heat Fluxes and Validation of the Thermal Model
Inflow heat flux (in each case study) = 1.0 × 10^11 W/m^2. See Figs. 7, 8, 9, 10, 11, 12, 13, 14 and 15.
Fig. 7 Distribution of temperature at 50 ns
Fig. 8 Distribution of temperature at 100 ns
Fig. 9 Distribution of temperature at 150 ns
Fig. 10 Distribution of temperature at 1 μs
9 Comparison of Results
We have verified the simulator by developing a parallel model for the same purpose and comparing the junction temperatures for a 100 ns pulse-on duration; the results are found to be nearly the same.
Fig. 11 Distribution of temperature at 2 μs
Fig. 12 Variation of junction temperature with pulse ON/OFF duration at different inflow heat flux
Junction temperature (K): 756 from COMSOL Multiphysics, 742.6 from the parallel model
Fig. 13 Rise of junction temperature with pulse ON duration at different inflow heat flux
Fig. 14 Plot of transient thermal resistance with pulse width
Fig. 15 Plot of junction temperature with pulse width [in the figure, junction temperature (K) is plotted against transient thermal resistance (°C/W); the marked data point is 742.6 K at 1.249 °C/W]
10 Conclusion
The authors have, for the first time, thoroughly studied the guard-ring effect in the thermal modelling of MMW/THz ATT devices. For electrical isolation, SiC diodes normally occupy a mesa structure, which is generally formed using reactive ion etching. A group of scientists has developed an IMPATT device based on 4H-SiC material, which is found capable of generating higher output power; this diode also includes a highly resistive, amorphous guard ring surrounding the diode periphery, which is formed by vanadium ion implantation instead of the mesa structure. The authors have studied the role of the guard ring in the thermal model and reported it in the present paper. To meet the system requirement of a defence application, a realistic 100 ns bias current pulse has been used in the model, and it is found that a pulsed signal with a 33 μs repetition time can be used to generate sufficient power at the desired frequency. The authors have also studied the time required for the active area of the SiC-based ATT device to cool, and it has been observed that within 1 μs the junction temperature returns to the ambient temperature. Hence, with this heat-sink arrangement, the efficiency of SiC-based ATT devices can be optimised and the power level maximised.
Acknowledgements The author wishes to acknowledge Asansol Engineering College for providing the necessary infrastructure and facilities for conducting the research work.
References 1. Adlerstein MG, Holway LH, Chu SL (1983) Measurement of series resistance in IMPATT diodes. IEEE Trans Electron Devices 30:179–182 2. Acharya A, Mukherjee J, Mukherjee M, Banerjee JP (2011) Heat sink design for IMPATT diode sources with different base materials operating at 94 GHz. Arch Phys Res 2(1):107–112 3. Chakraborty D, Mukherjee M (2020) Si/SiC heterostructure MITATT oscillator for higherharmonic THz-power generation: theoretical reliability and experimental feasibility studies of quantum modified non-linear classical model. Microsyst Technol 26(7):2243–2265. https:// doi.org/10.1007/s00542-019-04580-3 4. Gibbons G (1973) Avalanche diode microwave oscillators. Clarendon Press, Oxford 5. Gilden M, Hines ME (1966) Electronic tuning effects in the read microwave avalanche diode. IEEE Trans Electron Devices 13(1):169–175 6. Mukherjee M, Majumder N (2007) Optically illuminated 4H-SiC terahertz IMPATT device. Egypt J Solids 30(1):87–101 7. Mukherjee M, Majumder N, Roy SK (2008) Prospects of 4H-SiC double drift region IMPATT device as a photo-sensitive high-power source at 0.7 terahertz frequency regime. Act Passive Electron Compon 2008:1–9 8. Mukherjee M, Mazumder N, Roy SK, Goswami K (2007) Terahertz frequency performance of double drift IMPATT diode based on opto-sensitive semiconductor. In: Proceedings of AsiaPacific microwave conference, pp 1–4 9. Mukherjee M, Mazumder N, Roy SK, Goswami K (2007) GaN IMPATT diode: a photosensitive high power terahertz source. Semicond Sci Technol 22(12):1258–1260 10. Mukherjee M, Mazumder N, Roy SK (2010) α-SiC nanoscale transit time diodes: performance of the photo-irradiated terahertz sources at elevated temperature. Semicond Sci Technol 25(5):055008 11. Mukherjee M, Roy SK (2010) Wide-bandgap III–V nitride-based avalanche transit-time diode in terahertz regime: studies on the effects of punch through on high frequency characteristics and series resistance of the device. Curr Appl Phys 10(2):646–651 12. Mukherjee M, Roy SK (2009) Optically modulated III–V nitride-based top-mounted and flipchip IMPATT oscillators at terahertz regime: studies on the shift of Avalanche transit time phase delay due to photogenerated carriers. IEEE Trans Electron Devices 56(7):1411–1417 13. Roy SK, Sridharan M, Ghosh R, Pal BB (1979)Computer method for the dc field and carrier current profiles in the IMPATT device starting from the field extremum in the depletion layer. In: Miller JH (ed) Proceedings of the 1st conference on numerical analysis of semiconductor devices (NASECODEI), Dublin, Ireland, pp 266–274 14. Vassilevski K, Zorenko A et al (2001) 4H-SiC IMPATT diode fabrication and testing. In: Technical digest of international conference on SiC and related materials. ICSCRM 713. https:// doi.org/10.4028/www.scientific.net/MSF.389-393.1353 15. Vassilevski KV, Zekentes K, Zorenko AV, Romanov LP (2000) Experimental determination of electron drift velocity in 4H-SiC p+ nn+ avalanche diodes. IEEE Electron Device Lett 21:485
A Compact Miniaturized Implantable Antenna for 2.45 GHz ISM Band Application Santoshkumar Singh Moirangthem, Sourav Roy, Soumendu Ghosh, and Abhishek Sarkhel
Abstract A compact miniaturized implantable antenna for biotelemetry application at 2.45 GHz frequency is proposed in this article. The proposed antenna has a small volume of 16.53 mm3 . The proposed antenna performance covers a − 10 dB impedance band of 2.395–2.52 GHz giving an impedance bandwidth of 125 MHz. The antenna analysis is performed with the integration of other components for system-level configuration. The antenna obtained − 27.5 dB peak gain with a broadside pattern. The simulated specific absorption rate satisfied the standard regulation for human safety. Keywords Antenna · Implantable · ISM band · Miniaturized · SAR
1 Introduction The recent advancement in technology is influencing day-to-day human life. The rapid growth in wireless communication and biomedical technologies in recent times has attracted many researchers in body-centric applications. The medical devices are developed for monitoring, stimulation, and drug delivery and assisting in diagnostic and therapeutic purposes. Wearable and implantable medical devices (IMDs) are gaining popularity in human healthcare sector. An implantable antenna with efficient performance is one of the essential components in modern wireless IMDs for establishing reliable communication [1–4]. Many researchers have been taking interest on the design of antenna for implantable applications in recent past. Designing an implantable antenna involves many challenges, including antenna size, frequencies of operation, human safety regulations, and biocompatibility [1]. In [2], an implantable antenna had been reported for biomedical telemetry application operating at MedRadio band. A wearable antenna had been proposed for the 5G application [5]. In [6], S. S. Moirangthem (B) · S. Ghosh · A. Sarkhel National Institute of Technology Meghalaya, Shillong 793003, India e-mail: [email protected] S. Roy North Tripura District Polytechnic College, Dharmanagar 799253, Tripura, India © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 N. Das et al. (eds.), Proceedings of International Conference on Data, Electronics and Computing, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-99-1509-5_38
Fig. 1 Proposed antenna simulation environment for implantable scenario
we had proposed a miniaturized implantable antenna for biotelemetry application in the 2.41 GHz industrial, scientific, and medical (ISM) band. In this article, we propose a compact miniaturized implantable antenna for ISM-band biotelemetry application. The proposed antenna covers a − 10 dB impedance band of 2.395–2.52 GHz, giving an impedance bandwidth (IBW) of 125 MHz. The antenna simulation analysis is performed with the integration of dummy layers for the electronic circuitry, to replicate the real implantable device scenario [7, 8]. The proposed antenna exhibits a considerably compact volume of 16.53 mm3 with a − 27.5 dB peak gain. The simulated specific absorption rate (SAR) complies with the standard regulation for human safety.
2 Design Methodology of Antenna
2.1 Antenna Simulation Environment
As the reported antenna is meant for an implantable scenario, the antenna is simulated by implanting it inside a homogeneous skin phantom (HSP) of dimension 110 × 110 × 50 mm3. Frequency-dependent electrical properties are considered for the HSP [9]. In the simulation analysis, the antenna is incorporated with the other components to form the implant system for a realistic scenario. The implant system is enclosed by a radiation box whose size is greater than λ0/4 from the edges of the antenna structure in all directions. The proposed antenna simulation environment for the implantable scenario is shown in Fig. 1. The proposed antenna, along with the other components, is covered with biocompatible ceramic alumina (εr = 9.8, thickness = 0.25 mm) to prevent a short circuit with the conductive human tissue. Perfect electric conductor (PEC)-backed dielectric dummy layers are used to replicate the electronic circuitry for power management and sensors [7, 8]. A PEC layer is also used for the battery. The antenna simulation analysis is performed using the Ansys electromagnetic solver.
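For reference, the sketch below simply evaluates the free-space quarter wavelength at the 2.45 GHz design frequency, which sets the minimum clearance of the radiation box mentioned above (illustrative arithmetic only).

```python
# Free-space quarter wavelength at the 2.45 GHz design frequency, which sets the
# minimum clearance of the radiation box from the antenna edges.
c = 299_792_458.0    # speed of light (m/s)
f = 2.45e9           # design frequency (Hz)

lambda0 = c / f
print(f"lambda0   = {lambda0 * 1e3:.1f} mm")      # ~122.4 mm
print(f"lambda0/4 = {lambda0 * 1e3 / 4:.1f} mm")  # ~30.6 mm minimum clearance
```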
A Compact Miniaturized Implantable Antenna …
417
Fig. 2 Proposed antenna a developmental stages b geometrical configuration [D = 9, l1 = 3.9, h = 0.13, l2 = 1.4, s = 0.2, ( f x , f y ) = (2, 2), (unit: mm)]
Fig. 3 Proposed antenna response a reflection coefficient b input impedance
2.2 Antenna Optimization
The design process of the compact proposed antenna follows various developmental stages, as shown in Fig. 2a. A high-dielectric material (εr = 10.2, tan δ = 0.0035) is taken as the substrate and superstrate in the antenna design process, which also assists in the antenna miniaturization. The antenna is excited using a 50 Ω coaxial cable. Figure 2b shows the geometrical configuration of the proposed implantable antenna, and the optimized values of the design parameters are listed in the figure caption. Initially, in stage 1, a traditional circular patch antenna is considered. The antenna shows a poorly matched resonance at 5.09 GHz, as shown in Fig. 3a. For achieving a miniaturized frequency, an L-shaped slot is loaded in stage 2. The loading of this L-shaped slot results in an improvement of the effective reactance, which is observed in Fig. 3b, and thus shifts the resonance down to 3.43 GHz, a 32.6% miniaturization in resonant frequency. However, to achieve the desired ISM band of operation at 2.4–2.48 GHz, the L-shaped slots are extended in stage 3 to obtain the proposed antenna. The extended slots lengthen the current path and
Fig. 4 Surface current distribution at resonant frequency
Fig. 5 Antenna pattern at resonant frequency a 2D b 3D
help in the further miniaturization. The antenna is well matched in this stage and resonates at 2.45 GHz, covering a − 10 dB impedance band of 2.395–2.52 GHz, as observed in Fig. 3a. The antenna is well terminated at this resonant frequency. The working mechanism of the proposed antenna can be understood from the surface current distribution at the resonant frequency shown in Fig. 4. It is seen that the area near the end of the slot on the top radiating patch is highly excited. In the bottom ground plane, the area near the feed and the edge of the structure is excited. The antenna obtained a − 27.5 dB peak gain at the resonant frequency in the broadside direction, which is shown in Fig. 5a. The proposed antenna's 3D pattern, shown in Fig. 5b, indicates that the antenna radiates along the desired off-body direction for the implantable scenario.
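The stage-wise frequency miniaturization quoted above is simple arithmetic; the short check below reproduces the 32.6% figure and also prints the overall reduction from stage 1 to the final 2.45 GHz design (the latter percentage is our own derived number, not stated in the text).

```python
# Relative reduction in resonant frequency between the design stages quoted above.
def reduction_percent(f_ref_ghz: float, f_new_ghz: float) -> float:
    return 100.0 * (f_ref_ghz - f_new_ghz) / f_ref_ghz

print(f"stage 1 -> stage 2: {reduction_percent(5.09, 3.43):.1f} %")  # ~32.6 %
print(f"stage 1 -> stage 3: {reduction_percent(5.09, 2.45):.1f} %")  # overall reduction
```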
3 Specific Absorption Rate Analysis It is worth noting that, designing an implantable antenna needs to comply with the standard regulation for human safety. When the human tissues are exposed to the
Fig. 6 Simulated SAR at resonant frequency for human head model a 1 g average b 10 g average
Table 1 Simulated SAR for human head model at resonant frequency
Frequency: 2.45 GHz
SAR (1 g average): 402.72 W/kg
SAR (10 g average): 67.71 W/kg
Max. acceptable input power (1 g): 3.97 mW
Max. acceptable input power (10 g): 29.54 mW
electromagnetic wave, the tissue absorbs energy, which can damage it. The amount of energy absorbed is evaluated by the SAR. The IEEE standard regulation for human safety limits the SAR to 1.6 W/kg averaged over 1 g of tissue and 2 W/kg averaged over 10 g of tissue [10, 11]. The simulated SAR response of the proposed antenna at the resonant frequency for the human head model, over 1 g and 10 g average tissue, is shown in Fig. 6a and b, respectively. The simulated SAR and the maximum acceptable input power are listed in Table 1. The standard safety regulation is fulfilled for input powers of less than 3.97 mW and 29.54 mW over 1 g average and 10 g average tissue, respectively.
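A minimal sketch of how the maximum acceptable input powers in Table 1 follow from the simulated SAR values, under the assumption (ours, since the paper does not state it) that the quoted SAR figures are normalised to 1 W of accepted input power:

```python
# Maximum acceptable input power derived from the simulated SAR of Table 1,
# assuming the SAR values correspond to 1 W of accepted input power.
sar_1g, sar_10g     = 402.72, 67.71   # simulated SAR (W/kg) at 2.45 GHz
limit_1g, limit_10g = 1.6, 2.0        # IEEE limits (W/kg) over 1 g and 10 g

p_max_1g_mw  = limit_1g / sar_1g * 1e3
p_max_10g_mw = limit_10g / sar_10g * 1e3
print(f"P_max (1 g):  {p_max_1g_mw:.2f} mW")    # ~3.97 mW
print(f"P_max (10 g): {p_max_10g_mw:.2f} mW")   # ~29.54 mW
```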
4 Measurement and Discussion To validate the proposed antenna in realistic scenario, the proposed antenna prototype is fabricated using standard PCB fabrication technology. The antenna response is measured by implanting inside the pork meat slab using PNA Network Analyzer N5221A. Figure 7a shows the fabricated antenna prototype with components and measurement environment. The measured and simulated reflection coefficient response is consistent which is shown in Fig. 7b.
Fig. 7 a Fabricated prototype with components and measurement environment. b Reflection coefficient response
5 Conclusion
This article proposed a compact and miniaturized implantable antenna for biotelemetry application in the 2.45 GHz ISM band. The proposed antenna covers a − 10 dB impedance band of 2.395–2.52 GHz, giving an IBW of 125 MHz. The proposed antenna configuration has a small footprint of π × 4.5^2 × 0.26 mm^3 with a considerable − 27.5 dB peak gain. The antenna analysis is performed in a system-level configuration to reflect the real implantable device scenario. The simulated SAR also satisfies the IEEE standard regulation for human safety. The simulated response is in a good degree of consistency with the measured response.
References 1. Kiourti A, Nikita KS (2012) A review of implantable patch antennas for biomedical telemetry: challenges and solutions [Wireless Corner]. IEEE Antennas Propag Mag 54(3):210–228 2. Li H, Guo Y, Liu C, Xiao S, Li L (2015) A miniature-implantable antenna for MedRadio-band biomedical telemetry. IEEE Antennas Wirel Propag Lett 14:1176–1179 3. Liu XY, Wu ZT, Fan Y, Tentzeris EM (2017) A miniaturized CSRR loaded wide-beamwidth circularly polarized implantable antenna for subcutaneous real-time glucose monitoring. IEEE Antennas Wirel Propag Lett 16:577–580 4. Xia Z et al (2020) A wideband circularly polarized implantable patch antenna for ISM band biomedical applications. IEEE Trans Antennas Propag 68(3):2399–2404 5. Singh MS, Roy S, Ghosh S, Sarkhel A (2021) Wearable textile based MIMO antenna for 5G application. In: Indian conference on antennas and propagation 2021. IEEE, pp 159–162 6. Singh MS, Ghosh J, Ghosh S, Sarkhel A (2021) Miniaturized dual-antenna system for implantable biotelemetry application. IEEE Antennas Wirel Propag Lett 20(8):1394–1398 7. Shah IA, Zada M, Yoo H (2019) Design and analysis of a compact-sized multiband spiralshaped implantable antenna for scalp implantable and leadless pacemaker systems. IEEE Trans Antennas Propag 67(6):4230–4234
8. Faisal F, Zada M, Ejaz A, Amin Y, Ullah S, Yoo H (2020) A miniaturized dual-band implantable antenna system for medical applications. IEEE Trans Antennas Propag 68(2):1161–1165 9. Gabriel S, Lau RW, Gabriel C (1996) The dielectric properties of biological tissues: II. Measurements in the frequency range 10 Hz to 20 GHz. Phys Med Biol 41(11):2251–2269 10. IEEE Standard for Safety Levels with Respect to Human Exposure to Radio Frequency Electromagnetic Fields, 3 kHz to 300 GHz. IEEE Std C95.1-1999 (1999) 11. IEEE Standard for Safety Levels with Respect to Human Exposure to Radio Frequency Electromagnetic Fields, 3 kHz to 300 GHz (Revision of IEEE Std C95.1-1991) (2006)
Miniaturized Dielectric Disc Loaded Monopole Antenna Khan Masood Parvez, SK. Moinul Haque, and Laxmikant Minz
Abstract This paper deals with a miniaturization technique based on frequency reduction using top-loaded dielectric discs. In contrast to a simple monopole, the resonant frequency of a monopole loaded with two dielectric discs changes from 1.98 to 1.29 GHz, resulting in a 34.84% reduction in resonant frequency while keeping the antenna length (36.00 mm) unaltered. It is a well-known fact that dielectric material can trap the energy delivered from the source to the antenna, and as a result the antenna is unable to radiate efficiently. Any approach that uses dielectric material for miniaturization must therefore couple it to the antenna in such a way that the antenna can still radiate efficiently. The dielectric disc on top of the monopole creates an inductive condition, in a manner similar to an oppositely directed wire loop, which compensates the capacitive effect present at the monopole and causes the reduction in resonant frequency. This concept is implemented without sacrificing desired features such as bandwidth, radiation characteristics, and efficiency (more than 98%), and is analyzed with an equivalent circuit model. Experimental results illustrate good agreement with simulation results. This monopole antenna can be designed for in-car use for GPS, car-to-car communication, GSM, or CDMA operation. Keywords Dielectric disc loaded · Miniaturization · Monopole antenna
1 Introduction The monopoles are the simplest and most widely used class of antennas for almost a century since the innovation of wireless radio communication. The quarter wave antennas are also appropriate for the network for unattended ground sensors, mobile K. M. Parvez (B) · SK. Moinul Haque Antenna Research Laboratory, Department of Electronics and Communication Engineering, Aliah University, Kolkata, India e-mail: [email protected] L. Minz Korea Advanced Institute of Science and Technology, 5207, 3-2, 291 Daehak-Ro, Yuseong-gu, Daejeon 34141, South Korea © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 N. Das et al. (eds.), Proceedings of International Conference on Data, Electronics and Computing, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-99-1509-5_39
communication, and terrestrial applications and situation where earth surface can be used as an infinite ground plane. Therefore, investigations of electrically small monopoles are given the opportunity to fulfil the demand for compact wireless systems. The monopole antenna has “ka” value less than or equal to 0.5 which is known to be an electrically small antenna [1, 2], where “k” is the wave number in free space, and “a” is the radius of the smallest sphere circumscribing the antenna. In [3], “ka” value for single loop loaded monopole is 0.58 which is very close to the limit. The monopole is also loaded by similar loop and reported 23.88% reduction in resonant frequency with 96.10% efficiency. An approach for electrically small monopole antenna in [4] is based on multi elements. In reference [5], the peanocurve loaded monopole is reported to operate at 2.45 GHz frequency. A miniature dielectric loaded monopole antenna for 2.4/5 GHz WLAN applications is presented in [6]. Top loaded monopole with shorting post is studied in [7]. Slot antenna miniaturization techniques have been delineated in [8–10]. Antenna miniaturization using high permittivity material has been reported in [11, 12]. The sinusoidal-Galerkin moment method for monopole antenna at centre of circular disc has been described in [13]. Wheeler’s design criteria for electrically small antenna designs with theoretical formulations have been briefly illustrated in [14]. The disc monopoles have been addressed for input impedance matching [15] and time domain analysis [16]. The monopole antenna horizontally loaded with circular disc has been presented for enhance bandwidth in literature [17–20]. Printed circular, elliptical, square, rectangular, and hexagonal disc monopole antennas have been highlighted for wide band applications in [21]. In this study, a small monopole antenna is presented based on frequency reduction technique using top loaded dielectric discs. Firstly, the simple monopole is efficiently coupled with one dielectric disc on top of monopole in similar way coupled to opposite directed wire loop [8]. This concept yields 25.75% reduction on resonance frequency keeping monopole length unaltered to bring it much close to electrically small limit. In order to achieve more reduction in resonant frequency, we have introduced one more circular disc on the top of monopole. And as a result, the proposed antenna produces 34.84% reduced resonant frequency with 22.55% bandwidth and 97.18% efficiency. An equivalent circuit model is also proposed to better understand the inductive effect on monopole. The “ka” [1, 2] value for monopole antenna loaded with single and double dielectric discs is 0.55 and 0.48, respectively, without imposing any additional matching network.
2 Single Dielectric Disc Loaded Monopole Antenna Miniaturization
2.1 Description of Antenna Structure
It is well known that the length of a monopole is inversely proportional to the antenna resonance frequency. Reducing the resonant frequency while keeping the antenna length unaltered is a challenging task for the research community because of its adverse effect on radiation pattern, gain, bandwidth, efficiency, etc. The schematic diagram of the monopole antenna is shown in Fig. 1. The length (L) and diameter (D) of the monopole structure are 36.00 mm and 1.00 mm, respectively. A rectangular copper ground plane of dimension 50.00 mm × 50.00 mm × 0.90 mm (L_G × W_G × H) is used to construct the antenna, considering its finite conductivity of 5.8 × 10^7 Siemens/m. The current on the monopole structure induces an electric field along the monopole in the near-field region. The capacitive reactive environment below the resonant frequency of the monopole antenna enhances the stored energy, which is responsible for a high quality factor (Q). One possible methodology [3] to cancel out the capacitive effect is to introduce an inductive environment on the monopole structure in such a way that it is well coupled with the monopole. Here, we replace the conductive loop with a circular dielectric disc. The use of a dielectric not only provides a higher degree of miniaturization but also better efficiency, since no metal loop is used. It is a well-known fact that an antenna configuration with a high-dielectric material [11] can trap the energy delivered to the antenna because of its high dielectric constant. Any attempt to construct a miniaturized antenna using dielectric material must, therefore, couple it in such a manner that the antenna can radiate efficiently. We have used an RT/Duroid 6010LM substrate (εr = 10.2, tan δ = 0.0023)
Fig. 1 Monopole
Fig. 2 Monopole antenna loaded with a single dielectric disc
to implement the present concept. The proposed antenna does not require any additional matching network; it itself behaves as a self-resonant structure [2, 3]. Figure 2 illustrates the single disc loaded monopole antenna. The diameter (D2) and thickness (t) of the dielectric substrate are 21.00 mm and 2.54 mm, respectively. The monopole length (L), diameter (D1), and ground plane size (L_G × W_G × H) are the same as for the reference monopole of Fig. 1. The antenna simulations were carried out using the full-wave electromagnetic solver Ansys HFSS v.19.2 [22].
2.2 Equivalent Circuit Model
The equivalent circuit model of the single dielectric disc loaded monopole antenna is shown in Fig. 3. The circuit model has been analyzed using the NI AWR [23] software. The parameter values are as follows: L1 = 10.87 nH, L2 = 0.2683 nH, R1 = 42.1 Ω, R2 = 1 × 10−8 Ω, C1 = 1.001 pF. Here, all symbols have their usual meaning.
Fig. 3 Equivalent circuit model for Fig. 2
2.3 Results and Description
Figure 4a presents the return loss plots for the unloaded monopole antenna and the disc loaded monopole. The simulated results are compared with the measured results and are found to agree well with each other. The operating frequency of the proposed antenna is shifted towards the left in the return loss plot in comparison with the unloaded monopole. The simulated resonant frequency of the unloaded monopole is 1.99 GHz, and that of the disc loaded monopole is 1.49 GHz. The measured resonance frequency of the ordinary monopole antenna is 1.98 GHz, and of the disc loaded monopole 1.47 GHz, giving a 25.75% reduction in resonance frequency in comparison with the ordinary monopole. It is also noticeable that the dielectric disc on the monopole does not trap the energy delivered to the antenna despite its high permittivity. The − 10 dB bandwidth for the top loaded dielectric disc antenna is 20.66%. The equivalent circuit response is compared with the simulated response in Fig. 4b. The proposed antennas are very simple and easy to implement in comparison with the miniaturized top-hat loaded, complementary split ring resonator (CSRR)-based antenna [24] and the inductively coupled capacitively loaded miniaturized monopole [25]. The normalized radiation characteristics of the proposed monopole are illustrated in Fig. 5a, b. The cross-pol levels in both the E- and H-planes are below the accepted value. The dielectric disc has been integrated into the monopole antenna in such a way that it can radiate efficiently in a distributed and symmetric manner. Hence, the single dielectric disc does not distort the radiation characteristics relative to the reference monopole, unlike [24]. The Wheeler cap method [26] is used to measure the efficiency. The measured efficiency and gain are 97.41% and − 1.26 dBi, respectively.
Fig. 4 a Return loss for monopole antenna loaded with single dielectric disc, b equivalent circuit response
Fig. 5 Radiation characteristics at resonance frequency a E-plane, b H-plane
It is seen from Fig. 6 that the reactance of the reference monopole crosses the horizontal axis at 1.99 GHz, where its input resistance is 50 Ω. After the introduction of the single dielectric disc, the reactance that the reference monopole presents at 1.49 GHz is cancelled out by the inductive reactance of the circular dielectric disc. This results in a 25.12% reduction of the resonant frequency (simulation value), unlike [27], where miniaturization is achieved using a very-high-refractive-index metamaterial. In [28], an RLC resonant circuit and an RL circuit had been used to model the circular loop and the feed line, respectively. In the current approach, an equivalent circuit has been designed by combining these two circuits, as shown in Fig. 3. The resonant frequency is f = 1/[2π√(LC)]. The equivalent circuit has been analyzed using the NI AWR [23] circuit simulation tool and validated against the antenna simulation. The frequency responses of the equivalent circuit and of the single dielectric disc loaded monopole antenna are shown in Fig. 4b. Excellent agreement is also found in the frequency response. The "ka" value for this proposed antenna is 0.55, which is very close to the electrically small antenna limit for the grounded structure. The "ka" value of an antenna is defined in [2] as ka = (2π/λ0) × a, where λ0 is the wavelength of the proposed antenna at the resonance frequency of 1.47 GHz, and "a" is the radius of the smallest sphere circumscribing the antenna.
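As a hedged numerical illustration of this resonance formula, the sketch below evaluates f = 1/(2π√(LC)) with the L1 and C1 values quoted in Sect. 2.2, treating them as the dominant resonator and neglecting L2 and the resistive elements; this is a simplification of the full circuit of Fig. 3, used only to show that the equivalent circuit lands near the simulated resonance.

```python
import math

# f = 1 / (2*pi*sqrt(L*C)) with the equivalent-circuit values L1 and C1 of Sect. 2.2,
# ignoring L2 and the resistive elements (an approximation for illustration only).
L1 = 10.87e-9    # H
C1 = 1.001e-12   # F

f_res = 1.0 / (2.0 * math.pi * math.sqrt(L1 * C1))
print(f"f_res ~ {f_res / 1e9:.2f} GHz")   # ~1.53 GHz, near the simulated 1.49 GHz
```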
Fig. 6 Simulated input impedance plot of reference monopole antenna and proposed antenna
3 Double Dielectric Discs Loaded Monopole Antenna Miniaturization
3.1 Description of Antenna Structure
The dimensions of the double dielectric disc loaded monopole antenna are the same as those described in Fig. 2, except that here two dielectric discs, parallel to each other, are used on top of the monopole antenna.
3.2 Equivalent Circuit Model
The equivalent circuit model of the double dielectric disc loaded monopole is shown in Fig. 7. The parameter values are as follows: L1 = 7.09 nH, L2 = 111.5 nH, L3 = 8.862 nH, R1 = 1 × 10−5 Ω, R2 = 165.1 Ω, R3 = 42.45 Ω, C1 = 0.023 pF, C2 = 0.9278 pF; and, for the conductive materials, L1 = 0.7855 nH, L2 = 565.1 nH, L3 = 11.01 nH, R1 = 1 × 10−11 Ω, R2 = 0.1 Ω, R3 = 39.22 Ω, C1 = 1 × 10−9 pF, C2 = 1.482 pF. The symbols have their usual meaning.
Fig. 7 Equivalent circuit model for proposed monopole antenna
3.3 Results and Description
As depicted in Fig. 8a, the simulated resonance frequency of the double dielectric disc loaded monopole is 1.31 GHz at a − 20.23 dB depth. The measured resonant frequency is 1.29 GHz, corresponding to a 34.84% reduction of the resonant frequency. The − 10 dB bandwidth is 22.52%. The proposed antenna is itself an electrically small antenna, with a ka value of 0.48. The HFSS simulation and the equivalent circuit frequency response are shown in Fig. 8b. The simulated and measured radiation characteristics of the double dielectric disc loaded monopole are shown in Fig. 9a, b, respectively. The cross-pol levels in both cases are well below − 20 dB. Thus, the introduction of the double dielectric discs does not have any degrading influence on the radiation characteristics. The measured efficiency of this proposed antenna is 97.18%, and the measured gain is − 1.58 dBi. It is noticeable that the directions of the current flow on the reference monopole and on the loaded monopole are opposite in nature, as shown in Fig. 10a. The antenna prototype is shown in Fig. 10b. This work has a preprint version [29].
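The quoted frequency reductions are straightforward percentages relative to the unloaded 36 mm monopole; the small check below reproduces them from the measured resonant frequencies (the printed values differ from the text only by rounding).

```python
# Percentage reduction in measured resonant frequency relative to the unloaded monopole.
f_ref    = 1.98   # unloaded monopole (GHz)
f_single = 1.47   # single dielectric disc loaded (GHz)
f_double = 1.29   # double dielectric disc loaded (GHz)

for label, f in (("single disc", f_single), ("double disc", f_double)):
    print(f"{label}: {100.0 * (f_ref - f) / f_ref:.2f} % reduction")
# prints 25.76 % and 34.85 % (quoted as 25.75 % and 34.84 % in the text)
```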
Fig. 8 a Return loss of proposed monopole antenna. b Equivalent circuit response
Fig. 9 Radiation characteristics at resonance frequency a E-plane, b H-plane
Fig. 10 a Current distributions on double dielectric discs, b prototype
4 Conclusions A novel monopole antenna miniaturization technique is presented based on frequency reduction technique using top loaded dielectric discs, keeping antenna length unaltered. The monopole is efficiently coupled with one dielectric disc on top of monopole which yields 25.75% reduced resonant frequency in comparison with a simple monopole. We have introduced one more circular disc on top of monopole to achieve a higher degree of miniaturization. And as a result, proposed antenna achieves 34.84% reduced resonance frequency with 22.55% bandwidth and 97.18% measured efficiency. The monopole antenna in car can be designed for GPS system, car to car
communication, GSM, or CDMA operation. The proposed antennas can be used as whip antennas on cars, where the monopole is covered with dielectric material.
References 1. McLean JS (1996) A re-examination of the fundamental limits on the radiation Q of electrically small antenna. IEEE Trans Antennas Propag 44(5):672–676 2. Erentok A, Ziolkowski RW (2008) Metamaterial-inspired efficient electrically small antennas. IEEE Trans Antennas Propag 56(3):691–707 3. Ghosh B, Haque SKM, Mitra D, Ghosh S (2010) A loop loading technique for the miniaturization of non-planar and planar antennas. IEEE Trans Antennas Propag 58(6):2116–2121 4. Hong W, Sarabandi K (2009) Low-profile, multi-element, miniaturized monopole antenna. IEEE Trans Antennas Propag 57(1):72–80 5. McVay J, Hoorfar A (2007) Miniaturization of top-loaded monopole antennas using Peanocurves. In: Proceedings of IEEE radio and wireless symposium, pp 253–256 6. Lin YF, Lin CH, Hall PS (2006) A miniature dielectric loaded monopole antenna for 2.4/5 GHz WLAN applications. IEEE Microw Wirel Compon Lett 16(11):591–593 7. Noro T, Kazama Y (2006) Low profile and wide bandwidth characteristics of top loaded monopole antenna with shorting post. In: Proceedings of IEEE international workshop on antenna technology small antennas and novel metamaterials, pp 108–111 8. Haque SKM, Parvez KM (2017) Slot antenna miniaturization using slit, strip and loop loading techniques. IEEE Trans Antennas Propag 65(5):2215–2221 9. Ghosh B, Haque SKM, Mitra D (2011) Miniaturization of slot antennas using slit and strip loading. IEEE Trans Antennas Propag 59(10):3922–3927 10. Ghosh B, Haque SKM, Yenduri NR (2013) Miniaturization of slot antennas using wire loading. IEEE Antennas Wirel Propagat Lett 12:488–491 11. Colburn JS, Rahmat-Samii Y (1999) Patch antennas on externally perforated high dielectric constant substrates. IEEE Trans Antennas Propagat 47:1785–1794 12. Mongia RK, Ittipiboon A, Cuhaci M (1994) Low profile dielectric resonator antennas using a very high permittivity material. Electron Lett 30(17):1362–1363 13. Richmond JH (1984) Monopole antenna on circular disk. IEEE Trans Antennas Propagat 32(12):1282–1287 14. Simpson TL (2004) The disk loaded monopole antenna. IEEE Trans Antennas Propag 52(2):542–550 15. Hammoud PP, Colomel F (1993) Matching the input impedance of a broadband disc monopole. Electron Lett 29:406–407 16. Guo L (2007) Performances of ultra-wideband disc monopoles in time domain. IET Microw Antennas Propag 1(4):955–959 17. Friedman CH (1985) Wide-band matching of a small disk-loaded monopole. IEEE Trans Antennas Propag 33(10):1142–1148 18. Jung J-H, Park I (2003) Electromagnetically coupled small broadband monopole antenna. IEEE Antennas Wireless Propag Lett 2:349–351 19. Lee JW, Cho CS, Kim J (2005) A new vertical half disc-loaded ultra-wideband monopole antenna (VHDMA) with a horizontally top-loaded small disc. IEEE Antennas Wireless Propag Lett 4:198–201 20. Akhoondzadeh-Asl L, Hill J, Laurin J-J, Riel M (2013) Novel low profile wideband monopole antenna for avionics applications. IEEE Trans Antennas Propag 61(11):5766–5770 21. Agrawall NP, Kumar G, Ray KP (1998) Wide-band planar monopole antennas. IEEE Trans Antennas Propag 46(2):294–295 22. Ansys HFSS ver 19.2, Ansys Corp., Pittsburgh, PA, USA (2018) 23. NI AWR ver 13, National Instrument Corporation, EI Segundo, CA, USA (2019)
24. Tang M-C, Ziolkowski RW (2013) A study of low-profile, broadside radiation, efficient, electrically small atennas based on complementary split ring resonators. IEEE Trans Antennas Propag 61(1):4419–4430 25. Oh J, Sarabandi K (2012) Low profile, miniaturized, inductively coupled capacitively loaded monopole antenna. IEEE Trans Antennas Propag 60(3):1206–1213 26. Pozar DM, Kaufman B (1998) Comparison of three methods for the measurement of printed antenna efficiency. IEEE Trans Antennas Propag 36(1):136–139 27. Gudibandi BR, Murugan HA, Dhamodharan SK (2020) miniaturization of monopole antenna using high reflective index metamaterial loading. Int J RF Microw Comput Aided Eng 30(5):e22163 28. Saraswat K, Harish AR (2016) Split ring loaded monopole antenna. IET Micro Antennas Propag 10(4):420–425 29. Parvez KM, Haque SM, Minz L (2021) Miniaturized dielectric disc loaded monopole antenna. Preprints 2021, 2021030518. https://doi.org/10.20944/preprints202103.0518.v1
Design of Optimum n-bit ALU Using Crossbar Gate Rakesh Das, Alongbar Wary, Arindam Dey, Raju Hazari, Chandan Bandyopadhyay, and Hafizur Rahaman
Abstract In recent times, the photonics industry has gained ample interest, not only because of its capability of performing high-speed logical computations on-chip but also because of the easy market access to ultra-high-speed optical devices and low-power interconnects for designing logical components. This progress has drawn the attention of many researchers towards finding ways to implement various optical circuits efficiently. Several works have been proposed in this field using MZIs, crossbar gates, and optical interconnects. Motivated by this, in this paper we propose an efficient n-bit ALU using crossbar gates that contains an optimal number of optical components and can operate with minimal clock cycles. To support this design, we also introduce the required optical blocks: adders and a multiplier. The experimental results are summarized at the end of this work.
R. Das · H. Rahaman Department of Information Technology, Indian Institute of Engg. Science and Technology Shibpur, Shibpur, India A. Wary · A. Dey School of Computer Science and Engineering, Vellore Institute of Technology, Vellore, AP, India e-mail: [email protected] A. Dey e-mail: [email protected] A. Wary Indira Gandhi Delhi Technical University for Women (IGDTUW), Delhi, India R. Hazari National Institute of Technology, Calicut, India e-mail: [email protected] C. Bandyopadhyay (B) Department of Computer Science and Engineering, Dr. B. C. Roy Engineering College, Durgapur, India e-mail: [email protected] Department of Computer Science and Engineering, University of Bremen, Bremen, Germany © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 N. Das et al. (eds.), Proceedings of International Conference on Data, Electronics and Computing, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-99-1509-5_40
Keywords Crossbar Gate (CG) · ALU · Full Adder (FA) · Beam Combiner (BC) · Beam Splitter (BS)
1 Introduction In the recent past, optical computation has attracted many researchers because of the importance of optical devices which hold important properties like fast switching between devices, operates in very high speed, consumes considerably less power to operate, and generates less noise in communication medium and easiness in IC fabrication. As all the optical components are operated by photon movements, which produces unmatched speed with information processing, this architecture empowers the optical circuit to be one of the promising field of circuit design for future applications. There exist some well-known optical devices and interconnects like (terahertz optical asymmetric demultiplexer (TOAD), Mach–Zehnder Interferometers (MZI), [1, 2], crossbar gate [3]which have been investigated by many researchers for the design of optical logic circuits. But here in this work we used crossbar gate to reduce design overhead. Due to the importance of photon-based technology [4, 5], several works based on dedicated optical components have been proposed. Cheri et al. [4] have shown the design of an optical adder circuit which performs addition of two signed-digits in ultra-fast speed. For this purpose, they have employed SOAs and MZIs to build the design. A tree architecture-based algorithm has been proposed by Roy [5] which uses MZIs to design optical networks for any input logical functions. Some adder circuits like Carry skip adder [6] and Carry look ahead adder [7] have been presented in recent times. Along with the development of various optical circuit-based fabrication technologies, we noticed the importance of designing efficient logic modules, and in the recent years, several articles have published in this domain. Now, here we are discussing some of the works on optical-based logic designs. Scalable synthesis of optical circuits using MZIs is developed in [8], where BDD is used as Intermediate Representation (IR) toward transforming a logic function into optical circuit. Further, cost-efficient representation of MZI-based optical circuit using AIG as IR is developed in [9]. To overcome the overhead incurred in the previous two approaches, an advanced synthesis scheme based on BDDs has been shown in [10] where signal loss issue in optical circuits has been addressed by developing splitter-free architecture. Further, in [11], an efficient MIG/XMG-based technique has been developed to infuse high scalability and optimality in circuit performance. With the development of photon-based fabrication technologies for optical devices, some efficient logic modules are designed in past few years, and several works have come out in this field. Synthesis of optical circuits using graph algorithms [12] is also an emerging zone where several investigations are made. In this work, we have focused on designing an important logic module ALU that often is used in many circuits to perform multi-logic computation. Arithmetic Logic Unit
(ALU) is a basic building block of many central processing units, performing addition, subtraction, multiplication, division, and some logic operations like AND, XOR, XNOR, etc. The main design aspect that we focused on while building the circuit is to make the ALU architecture operate in fewer clock cycles and with a minimum number of hardware components. Some further logic components, like the full adder [13, 14] and the binary adder circuit [15], are also used in the design to perform specific arithmetic functions in the ALU. In this work, we have reduced the design overhead of the optical ALU module using one of our previous works [16] on TOAD-based design. The rest of the paper is organized as follows. Section 2 covers the basic idea of the ALU circuit and the crossbar gate. The proposed design is discussed in Sect. 3. In Sect. 4, we summarize the experimental results, and finally, the work is concluded in Sect. 5.
2 Background
2.1 Optical Circuit
If all the components and interconnects of a circuit are designed with optical devices, then the circuit can be termed an all-optical circuit.
2.2 Beam Combiner and Beam Splitter
Beam Combiner: An optical device which combines multiple signals into a high-strength optical signal by adding all the incoming signals' wavelengths is termed a Beam Combiner (BC).
Beam Splitter: An optical device that breaks a high-strength signal into low-strength signals is known as a Beam Splitter (BS). The use of a high number of BSs in an optical circuit diminishes the signal strength. The incurred cost of an optical circuit is measured by two parameters, namely optical cost and delay.
2.3 Optical Cost and Delay
The overall cost of an all-optical representation is the number of crossbar gates present in the design, whereas the delay is the amount of time required to obtain the output from a gate. If n crossbar switches (gates) are connected in series, then
438
R. Das et al.
Fig. 1 A crossbar gate
the optical cost of the design is n and the delay is nΔ, but if they are connected in a parallel configuration, then the delay becomes only Δ, where the symbol Δ represents the delay of a single crossbar gate.
2.4 Crossbar Gate
A crossbar gate is an all-optical gate which maps a Boolean function B^3 → B^2: it operates on two optical inputs with one select input and has two optical outputs. The inputs as well as the two outputs are carried by waveguides, and the value of x determines the functionality at the output: if x is low (i.e., x = 0), then the inputs are simply transferred to the outputs, and if x is high (i.e., x = 1), then the inputs are interchanged at the outputs. The resulting behaviour is shown in Fig. 1. A crossbar gate can be considered as a 2 × 1 MUX, where the third input x works as the select line of the MUX.
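A minimal behavioural sketch of this pass/swap functionality (not an optical model, just the Boolean mapping described above):

```python
# Behavioural model of the crossbar gate: with select input x = 0 the two inputs
# pass straight through, and with x = 1 they are swapped at the outputs.
def crossbar(a: int, b: int, x: int) -> tuple[int, int]:
    return (a, b) if x == 0 else (b, a)

# quick truth-table check
for a in (0, 1):
    for b in (0, 1):
        for x in (0, 1):
            print(f"a={a} b={b} x={x} -> outputs {crossbar(a, b, x)}")
```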
2.5 1-Bit ALU
The single-bit ALU depicted in Fig. 2a uses one 4 × 1 MUX, one full adder, and one logic block. In the design, the full adder circuit performs addition, while the logical unit performs logic operations like AND, XOR, and XNOR. The functional outputs are shown in Tables 1 and 2: Table 1 defines the arithmetic functions, whereas Table 2 shows the logical operations. Figure 2b shows the block diagram of a 4-bit ALU built by cascading 1-bit ALUs. In the circuit of Fig. 2a, the outputs of the full adder and of the logic block are fed as inputs to the 4 × 1 MUX, and the two select lines S0 and S1 of the 4 × 1 MUX choose the operation to be performed.
3 Proposed Technique
In this paper, we propose an n-bit ALU that performs arithmetic operations such as addition, subtraction, multiplication, division, increment, decrement, and transfer, and logic operations like AND, OR, XOR, and XNOR. An important part of the ALU is the full adder, which performs the addition operation and largely determines the performance of the circuit. It can also be used to perform subtraction
Design of Optimum n-bit ALU Using Crossbar Gate
439
Fig. 2 a 1-bit ALU, b 4-bit ALU using 1-bit ALU
Table 1 Arithmetic operation of 1-bit ALU
S0 S1 Cin: F (output)
0 0 X: Addition
0 1 1: Subtraction
1 0 0: Transfer A
1 0 1: Increment A
1 1 0: Decrement A

Table 2 Logic operation of 1-bit ALU
S0 S1: F (output)
0 0: A ⊕ B
0 1: A · B
1 0: A + B
1 1: A ⊙ B (XNOR)
as the addition of a negative number; an inverter is used to negate the number. In our design, we have used a MUX-based full adder for an efficient implementation of the circuit. Another important aspect of the MUX-based implementation is the crossbar gate, since a 2 × 1 MUX can be replaced with a single crossbar gate.
3.1 Arithmetic Block of ALU Circuit
Figure 3a represents the arithmetic block, and the corresponding results are shown in Table 1. The dashed box determines which arithmetic function will be performed, depending on the values of S1 and S0, and its result is passed as one input to the full adder. This circuit performs five basic arithmetic functions, which are listed in
Table 1. If S0 = 0 and S1 = 0, then B is simply transferred to the full adder circuit, and the addition operation is performed. If S0 = 0, S1 = 1, and Cin = 1, then the complement of B is transferred to the full adder and the subtraction operation is performed. In a similar way, if Cin = 1 and S0 = 1, S1 = 0, then the increment operation is performed. Again, when Cin = 0, if S0 = 1, S1 = 0, then input A is transferred to the output, and if S0 = 1, S1 = 1, then the decrement operation is performed.
3.2 Logic Block of ALU Circuit
Figure 3b represents the logic block, which performs four bitwise logic operations, i.e., AND, OR, XOR, and XNOR, as listed in Table 2. If S0 = 0 and S1 = 0, then the output will be F = A ⊕ B. If S0 = 0 and S1 = 1, then the output will be F = A · B. If S0 = 1 and S1 = 0, then F = A + B, and if S0 = 1 and S1 = 1, then F = A ⊙ B (XNOR).
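A small behavioural sketch of this select logic, following Table 2 (purely illustrative; the actual block is built from crossbar gates and beam combiners):

```python
# Behavioural model of the 1-bit logic block of Table 2: select lines (S0, S1)
# choose between XOR, AND, OR and XNOR of the operand bits A and B.
def logic_block(a: int, b: int, s0: int, s1: int) -> int:
    if (s0, s1) == (0, 0):
        return a ^ b            # XOR
    if (s0, s1) == (0, 1):
        return a & b            # AND
    if (s0, s1) == (1, 0):
        return a | b            # OR
    return 1 - (a ^ b)          # XNOR

print(logic_block(1, 0, 0, 0))  # XOR  -> 1
print(logic_block(1, 1, 1, 1))  # XNOR -> 1
```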
3.3 Multiplier and Divider Circuit Block of ALU Circuit
Next, we add the multiplier and divider circuit and combine it with the arithmetic block and the logic block, as shown in Fig. 3c. Here, the outputs of the arithmetic block and of the MUL/DIV circuit are the inputs to the first crossbar, as shown in Fig. 3c, and S2 works as the select line of this MUX. In the next level, the outputs of the arithmetic block and of the first crossbar are passed as inputs to the second crossbar, with S3 as its third input; F is the final output of the 1-bit ALU. Due to its complexity, we have drawn the MUL/DIV circuit as a separate block in Fig. 3d. It consists of an XOR block, a full adder circuit, and an arithmetic shift right (ASR) function. In that figure, the dashed block consists of the XOR function and two AND functions realised using crossbar switches, and the functional unit consists of the full adder, the ASR, etc. Further details of the multiplier circuit are presented in [16]. This circuit can be used for multiplication as well as for division with a slight modification.
3.4 Proposed ALU Circuit
Table 3 shows all the operations of our proposed ALU. The circuit shown in Fig. 4a represents a 4-bit ALU using the 1-bit ALU as a basic building block, and in Fig. 4b an n-bit ALU is proposed. Each module produces one carry-out bit, which is passed as an input to the next module. The two signals S0 and S1 are common to all modules and determine which block will be selected, whereas S2 and S3 are integrated within the arithmetic and logic blocks to select the specific operation within the block.
Fig. 3 a Arithmetic block of ALU using crossbar gate, b logic block of ALU using crossbar gate, c proposed 1-bit ALU, d 1-bit MUL/ DIV circuit using crossbar gate
Table 3 All operation of proposed 1-bit ALU
S3 S2 S1 S0 Cin: F (output)
0 0 0 0 X: Addition
0 0 0 1 1: Subtraction
0 0 1 0 0: Transfer A
0 0 1 0 1: Increment A
0 0 1 1 0: Decrement A
0 1 0 0 0: A ⊕ B
0 1 0 1 0: A · B
0 1 1 0 0: A + B
0 1 1 1 0: A ⊙ B (XNOR)
0 0 0 0 0: Multiplication
1 0 0 0 1: Division
Fig. 4 a Design of 4-bit ALU using 1-bit ALU as a basic building block, b design of n-bit ALU using 1-bit ALU as a basic building block
For example, if S 0 = 0, S 1 = 0, S 2 = 0, S 3 = 0, and C in = 0, then simply transfer operation is performed. Again, when S 0 , S 1 , S 2 , S 3 are all set to 0 and C in = 0 then multiplication operation is performed, and similarly division operation is performed when S 0 , S 1 , S 2 , S 3 are all set to 0 and C in = 1.
Table 4 Cost metrics of proposed design
Ancilla input: 6 (4-bit ALU); n + 2 (n-bit ALU)
Optical cost: 136 (4-bit ALU); 34n (n-bit ALU)
Optical delay: 20 (4-bit ALU); 5n (n-bit ALU)
4 Experimental Evaluation
In the above design, three different types of blocks are used. For the arithmetic block (1-bit ALU), 11 crossbar gates and 9 Beam Splitters have been used. For the logic block (1-bit ALU), we have taken nine crossbar gates and three Beam Combiners, and for the 4-bit multiplier, 48 crossbar gates are required. So, for the 4-bit ALU, the total optical cost = (no. of gates in the arithmetic blocks + no. of gates in the logic blocks + no. of gates in the multiplier/divider circuit block) + 2 extra gates per unit block = 44 + 36 + 48 + 8 = 136. The optical delay is calculated as the maximum time taken to perform an operation by any of the blocks; so, the total delay for the 4-bit ALU = maximum delay to perform any operation = 20. In Table 4, we have shown various cost metrics (optical cost, optical delay, ancilla inputs, etc.) for our proposed design.
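The cost arithmetic above, and the n-bit expressions of Table 4 evaluated at n = 4, can be checked in a few lines (simple bookkeeping, no assumptions beyond the figures quoted in the text):

```python
# Optical cost of the 4-bit ALU from the per-block gate counts quoted above,
# and the n-bit expressions of Table 4 evaluated at n = 4 as a sanity check.
n = 4
arithmetic_gates = 11 * n     # 11 crossbar gates per 1-bit arithmetic block
logic_gates      = 9 * n      # 9 crossbar gates per 1-bit logic block
mul_div_gates    = 48         # 4-bit multiplier/divider block
extra_gates      = 2 * n      # 2 extra gates per unit block

total_cost = arithmetic_gates + logic_gates + mul_div_gates + extra_gates
print(total_cost)             # 136
print(34 * n, 5 * n)          # Table 4: optical cost 34n = 136, optical delay 5n = 20
```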
5 Conclusion
In this work, an n-bit ALU has been implemented using crossbar gates and other optical interconnects. The proposed design comprises adders, a multiplier/divider, and several logic blocks. First, we proposed a 1-bit ALU combining the arithmetic and logic blocks; then we designed the n-bit ALU by integrating n such 1-bit ALUs. Each module has been verified for its functional correctness, and different cost parameters, such as the number of gates and the optical delay, have been reported.
References 1. Ho B, Peng F, Wu S, Hwang S (2016) Fabrication and characterization of Mach–Zehnder interferometer based on a hollow optical fiber filled with radial-aligned liquid crystal. J Appl Phys 49 2. Haack G, Forster H, Buttiker M (2010) Parity detection and entanglement with a Mach-Zehnder interferometer. Phys Rev B 82(15):155303 3. Condrat C, Kalla P, Blair S (2011, May) Logic synthesis for integrated optics. In: Proceedings of the 21st edition of the great lakes symposium on great lakes symposium on VLSI, pp 13–18 4. Bieri E, Weiss M, Goktas O, Hauser M, Schonenberger C, Oberholzer S (2009) Finitebias visibility dependence in an electronic Mach-Zehnder interferometer. Phys Rev B 79(24):245324
5. Kaliraj PK, Sieber P, Ganguly A, Datta I, Datta D (2012) Performance evaluation of reliability aware photonic network-on-chip architectures. In: 2012 international on green computing conference (IGCC). IEEE, pp 1–6 6. Das R, Bandyopadhyay C, Rahaman H (2016) All optical reversible design of MachZehnder interferometer based carry-skip adder. In: IEEE international conference on distributed computing, VLSI, electrical circuits and robotics, DISCOVER 2016—proceedings 2016, 7806228, pp 73–78 7. Dutta P, Bandyopadhyay C, Giri C, Rahaman H. Mach-Zehnder interferometer based all optical reversible carry-look ahead adder. https://doi.org/10.1109/ISVLSI.2014.102 8. Schonborn E, Datta K, Wille R, Sengupta I, Rahaman H, Drechsler R (2015) BDD-based synthesis for all-optical Mach-Zehnder interferometer circuits. In: 2015 28th international conference on VLSI design (VLSID). IEEE, pp 435–440 9. Deb A, Wille R, Drechsler R (2017, Nov) Dedicated synthesis for MZI-based optical circuits based on AND-inverter graphs. In: 2017 IEEE/ACM international conference on computeraided design (ICCAD). IEEE, pp 233–238 10. Bandyopadhyay C, Das R, Wille R, Drechsler R, Rahaman H (2018) Synthesis of circuits based on all-optical Mach-Zehnder interferometers using binary decision diagrams. Microelectron J 71:19–29 11. Das R, Bandyopadhyay C, Rahaman H (2022) An improved synthesis technique for optical circuits using MIG and XMG. Microelectron J 120:105341 12. Deb A, Wille R, Keszöcze O, Shirinzadeh S, Drechsler R (2017) Synthesis of optical circuits using binary decision diagrams. Integration 59:42–51 13. Kotiyal S, Thapliyal H, Ranganathan N (2012) Mach-Zehnder interferometer based design of all optical reversible binary adder, pp 721–726 14. Datta K, Sengupta I (2014) All optical reversible multiplexer design using Mach-Zehnder interferometer. IEEE VLSI-Design 2014, pp 539–544 15. Roy JN (2009) Mach–Zehnder interferometer based tree architecture for all-optical logic and arithmetic operations. Optik 120(7):318–324 16. Manna A, Saha S, Das R, Bandyopadhyay C, Rahaman H (2017, Dec) All optical design of cost efficient multiplier circuit using terahertz optical asymmetric demultiplexer. In: 2017 7th international symposium on embedded computing and system design (ISED). IEEE, pp 1–5
ML-Based PCB Classification with Gabor and Statistical Features Kangkana Bora, M. K. Bhuyan, Yuji Iwahori, Genevieve Chyrmang, and Debajit Sarma
Abstract In this paper, an algorithm for the classification of Printed Circuit Boards (PCBs) into true and pseudo defects has been proposed. The research focuses on feature extraction from real images of defective PCBs, where two kinds of features have been extracted, viz. statistical and Gabor features. The scatter plots of the principal components of each feature subset exhibit the resolution between the various kinds of defects. Further, the implications of evolutionary optimization approaches and of supervised feature selection based on the Mahalanobis distance are also explored. Finally, K-means and a Support Vector Machine (SVM) are used for categorization. In simulations and experiments on real images, an overall accuracy of 98.27% has been recorded with the proposed work, which can be considered highly satisfactory. Keywords Gabor filter · Genetic algorithm · Mahalanobis distance · Principal component analysis · SVM · KNN
K. Bora (B) · G. Chyrmang Department of Computer Science and IT, Cotton University, Guwahati, India e-mail: [email protected] G. Chyrmang e-mail: [email protected] M. K. Bhuyan · D. Sarma Department of Electrical and Electronics Engineering, Indian Institute of Technology (IIT) Guwahati, Guwahati, India e-mail: [email protected] D. Sarma e-mail: [email protected] Y. Iwahori Department of Computer Science, Chubu University, Kasugai, Japan e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 N. Das et al. (eds.), Proceedings of International Conference on Data, Electronics and Computing, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-99-1509-5_41
1 Introduction The printed circuit board (PCB) is a vital component of the modern electronics. Their size varies from a small flash drive to a large motherboard, with designs of different complexities. The tedious manufacturing process and other agents lead to various anomalies in the production line. Certain defects are found to be irredeemable and cannot be repaired. Many times, PCBs develop weak rust or a layer of dust on them with time. Usually, such conditions are also termed as defects but they are repairable with adequate and cheap processes. Thus, in perusal of recycle and reuse, the defective PCBs are broadly classified in two broad categories, namely ‘true or irreparable defects’ and ‘pseudo or reparable defects’. The accurate classification of PCBs into mentioned categories would not only improve the industrial productivity but would also have considerable environmental impacts. Some of the examples of these true and pseudo defect PCBs are illustrated with real images of PCBs (Fig. 1). The two categories of defective PCBs, i.e., pseudo and true defects can be further classified into sub categories depending on the type of defect. The pseudo defects are majorly caused due to dust deposition and weak rust, while the true defects are borne due to (i) not connected, (ii) connected, (iii) projection, (iv) independent, (v) thick rust, (vi) wear rust and (vii) mouse bite. A dataset of 1164 samples of defective PCBs have been used to carry out the experiments and analysis in this current research, out of which, 882 samples are of PCBs with true defects and 282 samples of PCBs with pseudo defects. Initially, the PCBs were classified manually owing to the varied designs of PCBs and subclasses of mentioned defects as shown in Fig. 1, though, automatic optical inspection (AOI) system has been applied to this effect using various techniques of image processing and machine learning [1–7]. The relevant texture information is extracted from the defect regions of the images in the form of various feature sets. The defect regions are subsequently obtained by image subtraction of defective images with the respective reference images as illustrated in Fig. 2. The various kinds of features are used to describe the texture of the defect region obtained. Iwahori et al. [7] focus on the statistical features like cooccurrence matrix etc., while Sikka et al. [8] uses the mean and variance of the filtered images of wavelet transform as effective classification features. But in recent past, the Gabor filter-based texture feature extraction based on a Gabor filter bank has gained Fig. 1 Images of defective PCBs with both true and pseudo defects. (i) Wear rust, (ii) not connected, (iii) connected, (iv) independent, (v) thick rust, (vi) projection, (vii) dust and (viii) weak rust
Fig. 2 Defect region obtained from the defective PCBs with the subtraction method; (i) defective image, (ii) reference image, (iii) difference image
a lot of popularity [9–13], as they provide a multi-channel feature representation of a texture pattern, which is similar to the multi-channel filtering mechanism of the human visual system in perceiving texture information. Thus, a set of Gabor filter banks has been designed and implemented to obtain the respective filtered images, from which the relevant features are obtained. The parameters that define the Gabor filters are the highest frequency (F_M), the total number of central frequencies (n_F), the total number of orientations (n_o), the smoothness parameters of the Gaussian envelope (γ along the x-axis and η along the y-axis) and the frequency ratio (F_r), which is the ratio between the central frequency of a filter (F_n) and that of the filter at the next lower frequency (F_{n−1}). In [14], it has been shown that the smoothness parameters of the Gaussian envelope and the frequency ratio play a more significant role in determining the filter response than the frequency and orientation parameters. In this regard, the genetic algorithm (GA) has been applied to effectively tune these parameters so as to obtain the optimal filter bank. The GA has primarily been used in place of other optimization techniques based on differential minima and maxima because the implementation of a GA does not require a mathematical model of the system to be optimized, but only a fitness function. In our case, this fitness function is aimed at maximizing the mean of the classification accuracy and minimizing its variance. In pattern recognition problems, scatter plots are drawn to visualize and infer the relation between the data samples. In the current research work, the scatter plots of the data samples are also plotted with the extracted features (statistical and Gabor features) after reducing the dimension of the vectors using principal component analysis (PCA). The PCA reduces the dimension of the extracted feature vector to its principal components. It was experimentally observed that the first two principal components contained over 99.9% of the original information. While plotting the scatter plots, the average values were used for each kind of defect. The proposed AOI system obtains both statistical features and Gabor features for the classification of defective PCBs into true and pseudo defects. In order to ensure efficient discrimination between the two classes, both the Gabor and the statistical features have been adequately optimized. The statistical features have been selected on the basis of their power to discriminate the textures of pseudo and true defect images, with an algorithm employed in [15] based on the Mahalanobis separability. After its implementation, the feature bank is reduced, which in turn increases the accuracy of the classification. Moreover, a compact filter bank of Gabor filters was designed with the help of the GA. In the implementation of the GA, the fitness function was aimed at achieving a stable and higher classification accuracy. The
experimental results show that the algorithm removes the redundant features and thus improves the running time efficiency and the overall accuracy of classification. The scatter plots of statistical features (after selected with supervised feature selection) and Gabor features (optimized with genetic algorithm) indicates that the defective PCBs can be classified to reflect the level of defects. In the proposed algorithm, as shown in Fig. 3, the Gabor features are first used to cluster the data samples into two separate clusters, say cluster 1 and cluster 2 using the K-means classifier. Then the statistical features are subsequently extracted from both clustered data samples. Two separate support vector machines (SVM) are trained using both the training clustered groups followed by subsequent testing of the trained machines with testing clustered data samples. Both the Gabor features and statistical features are normalized to zero mean and unit variance. An overall accuracy of 98.27% is observed with the proposed approach. Main contribution of the paper lies in the framework that is designed where different machine learning techniques are deployed and achieved a satisfactory result. Rest of the work is organized as follows—Sect. 2 includes the methodology. Section 3 deals with the optimization techniques. The feature selection and dimension reduction is briefly discussed in Sect. 4. Section 5 contains the essential details of the experiments done in this regard and the relevant observations and results are
Fig. 3 Proposed approach for better classification accuracy of defective PCBs using both Gabor and statistical features
Fig. 4 Flowchart of supervised feature selection algorithm
discussed in Sect. 6. The conclusions and future work are briefly discussed in the later sections.
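For illustration, the following minimal sketch outlines the split-clustering/dual-classification pipeline described above using scikit-learn. The array names (gabor_train, stat_train, y_train) are placeholders and not part of the original work, and the conversion of sigma = 0.8 to the RBF gamma parameter is an assumption; the settings follow the values quoted later in the paper (RBF kernel with sigma = 0.8 and C = 1).

```python
# Hypothetical sketch of the proposed pipeline: K-means on Gabor features splits the
# samples into two groups, then one SVM per group is trained on statistical features.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

def train_pipeline(gabor_train, stat_train, y_train):
    # Normalize both feature sets to zero mean and unit variance
    gabor_scaler = StandardScaler().fit(gabor_train)
    stat_scaler = StandardScaler().fit(stat_train)
    g = gabor_scaler.transform(gabor_train)
    s = stat_scaler.transform(stat_train)

    # Step 1: cluster the samples into two groups using the Gabor features
    kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(g)

    # Step 2: train one SVM per cluster on the statistical features
    # (gamma = 1 / (2 * sigma**2) with sigma = 0.8 is an assumed conversion)
    svms = {}
    for c in (0, 1):
        idx = kmeans.labels_ == c
        svms[c] = SVC(kernel="rbf", C=1.0, gamma=1.0 / (2 * 0.8 ** 2)).fit(s[idx], y_train[idx])
    return gabor_scaler, stat_scaler, kmeans, svms

def predict(sample_gabor, sample_stat, gabor_scaler, stat_scaler, kmeans, svms):
    c = kmeans.predict(gabor_scaler.transform(sample_gabor))[0]    # which group?
    return svms[c].predict(stat_scaler.transform(sample_stat))[0]  # true or pseudo defect
```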
2 Methodology Feature extraction is a crucial step in pattern recognition, in which the essential information is extracted from the data samples in the form of a feature vector. The features containing information about the texture of images have been broadly classified into five major categories in [16], namely statistical features [17, 18], geometrical features, structural features, model-based features, and signal processing features. Among these, the most effective features are found to be the Fourier transform, Gabor filters, wavelet transform, Markov random field (MRF), co-occurrence features, and local binary patterns (LBP) [19]. In the current study, two kinds of features, viz. statistical/co-occurrence features and Gabor features, are used for pattern recognition and defect detection/classification in defective PCBs.
2.1 Feature Extraction (a) Statistical Features: Kumar et al. [7] have used various features primarily based on spatial distribution of gray values. They carefully analyzed the intensities of the data samples and made an observation that the gray level ratio, i.e., the ratio of number of pixels having pixel value greater than 70 to the number of pixels having pixel value lying between 20 and 70 is more in true defect when compared with pseudo defect. For the proposed work, the statistical features considered are average gray level, standard deviation, smoothness, entropy, uniformity, third moment, gray level ratio. (b) Gabor features: Initially, Daugman [13] proposed the use of Gabor filters in the modeling of the receptive simple cells in the visual cortex of some mammals. But later it was established that the frequency and orientation representations of Gabor filters are similar to those of the human visual system, and thus they have been found to be particularly appropriate for texture representation and discrimination. Turner [20] and Bovik et al. [21] proposed the use of such filters for the texture analysis of the image, as it was found that they provide the multi-resolution in both frequency and orientation domain and that too with optimal localization. In this paper, the postprocessing of Gabor feature matrices is done by calculating the mean and standard deviation values obtained from each filtered image as corresponding features.
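As an illustration of the two feature families, the snippet below sketches how the statistical descriptors listed above and the Gabor features (mean and standard deviation of each filtered image) could be computed for a grayscale defect region. The gray-level thresholds of 20 and 70 follow the description in the text; the smoothness normalization, the isotropic Gaussian envelope (a single sigma instead of separate γ and η), and the filter-bank settings are simplifying assumptions, not the authors' exact implementation.

```python
import numpy as np
from scipy.ndimage import convolve

def statistical_features(region):
    """Statistical texture descriptors for an 8-bit grayscale defect region."""
    z = region.astype(float).ravel()
    levels = np.arange(256)
    hist, _ = np.histogram(z, bins=256, range=(0, 256))
    p = hist / hist.sum()                                  # gray-level probabilities
    mean = (levels * p).sum()                              # average gray level
    var = ((levels - mean) ** 2 * p).sum()
    std = np.sqrt(var)                                     # standard deviation
    smoothness = 1.0 - 1.0 / (1.0 + var)                   # normalization convention may vary
    third_moment = ((levels - mean) ** 3 * p).sum()
    uniformity = (p ** 2).sum()
    entropy = -(p[p > 0] * np.log2(p[p > 0])).sum()
    # gray level ratio: pixels above 70 vs. pixels in [20, 70], per the description above
    gl_ratio = (z > 70).sum() / max(((z >= 20) & (z <= 70)).sum(), 1)
    return np.array([mean, std, smoothness, entropy, uniformity, third_moment, gl_ratio])

def gabor_kernel(freq, theta, sigma=4.0, size=31):
    """Real part of a Gabor filter: Gaussian envelope times a cosine carrier."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)
    yr = -x * np.sin(theta) + y * np.cos(theta)
    return np.exp(-(xr ** 2 + yr ** 2) / (2 * sigma ** 2)) * np.cos(2 * np.pi * freq * xr)

def gabor_features(region, n_f=4, n_o=4, f_max=0.25, f_ratio=np.sqrt(2)):
    """Mean and standard deviation of each filtered image, for an n_f x n_o filter bank."""
    feats = []
    for i in range(n_f):
        for j in range(n_o):
            k = gabor_kernel(f_max / f_ratio ** i, np.pi * j / n_o)
            resp = convolve(region.astype(float), k, mode="nearest")
            feats += [resp.mean(), resp.std()]
    return np.array(feats)
```

With n_f = 4 frequencies and n_o = 4 orientations, the sketch yields 32 Gabor features per region, consistent with the "Gabor features (32)" entries reported later in Table 3.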
2.2 Genetic Algorithm The application of a GA to design an optimized Gabor filter bank for efficient texture classification is discussed in detail in Ref. [15]. For the experiments, a population of binary strings is generated randomly. Parameter sets that generate classification accuracies with no variance but a low mean accuracy should not receive a good fitness; in our case, the fitness function is therefore aimed at maximizing the mean of the classification accuracy and minimizing its variance. Parent selection is done using tournament selection. This method does not require any global knowledge of the population. In a tournament of size k, individuals are selected randomly from the population set and compete with each other in terms of fitness value. The winner is included as one of the parents. One-point crossover is used as the recombination operator: a random crossover point is chosen in the range [0, l], where l is the length of the individuals, both parents are split at that point, and two children are created by exchanging the tails with probability pc. The mutation operator uses the bitwise method, which considers each gene separately and allows each bit to flip with a small probability pm. Survivor selection is done according to the elitism method, in which the worst individuals of the offspring population are replaced with the best members of the parent population. This operator has been shown to increase the speed of convergence of the GA, because it ensures that the best solutions found in each generation are retained. The termination condition is satisfied when the fitness of one individual is zero or the maximum number of iterations has been reached.
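A compact sketch of the GA loop described above is given below (binary encoding, tournament selection of size k, one-point crossover with probability pc, bitwise mutation with probability pm, and elitist survivor selection). The fitness function is only a placeholder for the mean-minus-variance accuracy criterion; in the actual system it would decode the bit string into Gabor filter parameters and evaluate the classifier.

```python
import random

def genetic_algorithm(fitness, n_bits, pop_size=20, pc=0.8, pm=0.05,
                      k=3, elite_frac=0.25, max_iter=100):
    pop = [[random.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    n_elite = max(1, int(elite_frac * pop_size))
    for _ in range(max_iter):
        ranked = sorted(pop, key=fitness, reverse=True)          # best parents first

        def tournament():
            return max(random.sample(pop, k), key=fitness)       # tournament selection

        children = []
        while len(children) < pop_size:
            p1, p2 = tournament(), tournament()
            if random.random() < pc:                             # one-point crossover
                cut = random.randrange(1, n_bits)
                c1, c2 = p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]
            else:
                c1, c2 = p1[:], p2[:]
            for child in (c1, c2):                               # bitwise mutation
                children.append([b ^ 1 if random.random() < pm else b for b in child])
        # elitism: worst offspring are replaced by the best members of the parent population
        children.sort(key=fitness, reverse=True)
        pop = ranked[:n_elite] + children[:pop_size - n_elite]
    return max(pop, key=fitness)

# Placeholder usage with a toy fitness; the real fitness would train/test the classifier
# with the decoded Gabor filter-bank parameters:
# best = genetic_algorithm(lambda bits: sum(bits), n_bits=16)
```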
2.3 Feature Selection Consider a total of N training images, and let M be the dimension of the feature set extracted from each of these images. The Mahalanobis distance can be used to measure the distance between class i and class j along a particular feature dimension (multiple features are allowed), with the correlation between features removed. Hence, it gauges the discriminative power of feature groups for the mentioned classes: a high value of the Mahalanobis distance reflects less correlation between the features and hence greater discriminative power for a feature group. The Mahalanobis separability measure J_{i,j,k} between class i and class j along the k-th feature group can be defined as follows:

J_{i,j,k} = \left(m_{i,k} - m_{j,k}\right)^{T} C_{i,j,k}^{-1} \left(m_{i,k} - m_{j,k}\right),

where m_{i,k} and m_{j,k} are the mean vectors of class i and class j along the k-th feature group, respectively, and C_{i,j,k} is the covariance matrix of the training samples of class i and class j along the k-th feature group. The Mahalanobis separability measure provides an effective evaluation of the feature groups and can easily be extended to combinations of feature groups as well. Feature groups or feature group combinations with a large Mahalanobis distance imply large saliency and should be retained. The proposed algorithm automatically selects the feature groups using the Mahalanobis distance as the metric together with a forward search over the features. The selected feature subset retains the discriminative power of the full feature set. The Mahalanobis distance is used only as a metric to gauge the discriminative power along a selected feature group, not to estimate the overall classification performance. Figure 4 describes the flowchart used in this work.
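The sketch below illustrates a forward search driven by the Mahalanobis separability for a two-class problem. The pooled covariance and the stopping rule (a fixed number of selected features) are assumptions made for illustration; the flowchart in Fig. 4 should be consulted for the exact procedure used in the paper.

```python
import numpy as np

def mahalanobis_separability(Xi, Xj, cols):
    """J between class i and class j along the feature columns `cols`."""
    a, b = Xi[:, cols], Xj[:, cols]
    d = a.mean(axis=0) - b.mean(axis=0)
    # pooled covariance of the two classes along the selected columns (an assumption)
    C = np.atleast_2d(0.5 * (np.cov(a, rowvar=False) + np.cov(b, rowvar=False)))
    return float(d @ np.linalg.pinv(C) @ d)

def forward_select(Xi, Xj, n_keep):
    remaining = list(range(Xi.shape[1]))
    selected = []
    while remaining and len(selected) < n_keep:
        # add the feature whose inclusion maximizes the separability of the current group
        best = max(remaining,
                   key=lambda f: mahalanobis_separability(Xi, Xj, selected + [f]))
        selected.append(best)
        remaining.remove(best)
    return selected
```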
3 Experimental Setup Some of the important parameters for the experiments are listed in Table 1, which also include the database details. Images are generated at Chubu University, Japan. The available data samples are randomly divided into the test and training samples with cross validation. During cross validation the ‘Holdout’ approach is followed with 10% as leftover rate, i.e., 1048 images have been used for training purposes and 116 images have been used for test purposes. The statistical features are directly extracted from the difference images of data samples. But while designing the Gabor filter, the highest frequency, total number of
Table 1 Different parameters for experiments

Database size
  Sample with true defects        882
  Sample with pseudo defects      288
  Total size                      1164
Parameters for Gabor filter bank creation
  Number of frequencies           4
  Number of orientations          4
  Frequency ratio                 √2
  Smoothing parameters            (0.5, 1)
Parameters for genetic algorithm
  Population size                 20
  Crossover probability           0.8
  Mutation probability            0.5
  Tournament size                 3
  Elitism count                   25%
orientations and the number of frequencies are first specified, and all combinations of frequencies and orientations are then formed to create a filter bank with n_F × n_o filters. However, it has been reported in [14] that the smoothing parameters of the Gaussian envelope play a more important role than the frequency and orientation parameters. Thus, an initial filter bank is designed with arbitrarily chosen values for the design parameters, as given in Table 1. The scatter plot of the principal components of the derived Gabor features is plotted in Fig. 5. Later, a GA was applied to optimize the design parameters of the Gabor filter bank according to the available dataset and the problem statement, i.e., better classification of PCBs into true and pseudo defects. Table 1 also lists the parameters set during the application of the GA. The extracted features were normalized to zero mean and unit variance. Besides this, the Gabor features usually consist of both imaginary and real
parts; thus, each Gabor feature is split into the corresponding magnitude and argument (phase angle), as the SVM cannot be applied to complex-valued data. The extracted features were also selected using supervised feature selection in order to remove redundant features and hence improve the classification performance. Table 2 gives the selected statistical features along with the individual and cumulative classification accuracies. When using the SVM classifier [13, 22, 23] with statistical features, the RBF kernel is used, as it is mentioned in [7] that a nonlinear radial basis function (with sigma = 0.8 and C = 1) provides the best classification results. When deciding K in K-means, an odd value of K is chosen (1 or 3), as it removes the possibility of a tie while clustering a test sample. The formula for the classification accuracy is as follows:

Accuracy = \frac{TP + TN}{TP + TN + FP + FN} \times 100
The accuracies have been calculated as the average over 20 random train/test splits in order to suppress the effect of the random selection of training and testing samples on the classifier. Due to this, the values of TT, TP, PT and PP are written as float values (being the average of 20 observations).

Table 2 Statistical features along with the accuracies of classification (the entropy feature has not made any significant contribution)

Statistical features    Accuracy (%)   TT      TP     PT     PP
Uniformity              75.90          87.40   0      27.95  0.65
Smoothness              87.84          84.20   3.40   10.70  17.70
Standard deviation      89.00          85.25   2.50   10.25  18.00
Average gray level      93.66          84.20   4.55   2.80   24.45
Third moment            93.01          83.25   5.30   2.80   24.65
Gray level ratio        94.35          83.45   4.10   2.45   26.00
Entropy                 93.70          83.10   4.75   2.55   25.60

Fig. 6 Scatter plot of various defects with unoptimized Gabor features
Table 3 Classification accuracies and performance with different feature sets and classification algorithms

Features + classifier                              Accuracy (%)    TT     TP     PT     PP
Statistical features (7) + linear SVM              94.22 ± 2.11    84.00  4.85   1.85   25.30
Selected statistical features (6) + linear SVM     94.22 ± 2.48    83.10  4.90   1.80   26.20
Statistical features (7) + RBF SVM                 93.36 ± 2.20    82.45  5.55   2.15   25.85
Statistical features (7) + KNN (k = 3)             91.93 ± 2.67    84.80  4.00   5.35   21.85
Gabor features (32) + linear SVM                   48.27 ± 21.85   40.85  46.25  13.75  15.15
Gabor features (32) + RBF SVM                      52.58 ± 27.71   45.30  43.30  11.70  15.70
Gabor features (32) + KNN (k = 3)                  79.05 ± 2.94    81.95  6.40   17.90  9.75
Accuracy and performance with the proposed algorithm
SVM 1 (Group 1)                                    98.18           34     0      2      74
SVM 2 (Group 2)                                    100             1      0      0      5
Overall accuracy (SVM 1 + SVM 2)                   98.27           35     0      2      79

Bold value indicates the best result obtained
The scatter plot of the statistical feature principal components is shown in Fig. 5; it is evident that, with the statistical features, the pseudo defect type of weak rust lies well away from the rest of the true defect types, with the exception of mouse bite. An average accuracy of about 94% has been recorded in Table 3 as well. Figure 6 shows the scatter plot of various defects with the unoptimized Gabor features. In Fig. 7, the scatter plot with the optimized Gabor features is shown. As marked by red circles, two clusters are identified, viz. Group 1 with the defect types of weak rust, projection and independent, while the rest of the data samples form Group 2. As seen, both clusters have just one pseudo defect type, viz. Group 1 with weak rust and Group 2 with the dust type. Thus, once the group of a test sample is identified, it can be easily classified as a true or pseudo defect type using the statistical features. The above assertion can be reasoned out from Figs. 5 and 7 as well as the flowchart of the proposed algorithm shown in Fig. 3.
4 Results The accuracies have been calculated as the average over 20 random train/test splits to suppress the effect of the random selection of training and testing samples on the classifier. Due to this, the values of TT, TP, PT, and PP are written as float values (being the average of 20 observations). In Table 2, the features are listed in increasing order of their discriminative power based on the Mahalanobis distance-based feature selection method. The accuracy corresponding to each feature is calculated by augmenting it with the preceding features. The inclusion of the entropy feature does not lead to a better classification,
Fig. 7 Scatter plot of various defects with optimized Gabor features
hence it can be considered an insignificant feature. This experimental result is in agreement with the order suggested by the employed feature selection algorithm. The fractional values of TT, TP, PT and PP are due to the averaging performed over 20 randomly cross-validated observations of the test images; 'T' indicates true defects and 'P' indicates pseudo defects. In Table 3, the accuracy values are calculated for varied types and numbers of features in combination with different classification algorithms. Table 3 also lists the accuracy of the proposed algorithm, where an accuracy of 98.27% is obtained, showing the high efficiency of the proposed work.
5 Conclusions It is observed and concluded in the current research work that the Gabor features, upon optimization, can provide good separation among some kinds of defects, and the statistical features, after selection, can do the same for another set of defects. Thus, the proposed algorithm of split clustering and dual classification can provide a satisfactory accuracy. It has shown an overall accuracy of 98.27% in classifying the true and pseudo defects. Acknowledgements Iwahori's research is supported by JSPS Grant-in-Aid for Scientific Research (C) (20K11873) and a Chubu University Grant.
References 1. Moganti M, Ercal F, Dagli CH, Tsunekawa S (1996) Automatic PCB inspection algorithms: a survey. Comput Vis Image Underst 63:287–313 2. Roh B, Yoon C, Ryu Y, Oh C (2001) A neural network approach to defect classification on printed circuit boards. J Japan Soc of Precis Eng 67:1621–1626 3. Tanaka T, Hotta S, Iga T, Nakamura T (2007) Automatic image filter creation system: to use for a defect classification system. IEICE Tech Rep 106:195–198 4. Rau H, Wu CH (2005) Automatic optical inspection for detecting defects on printed circuit board inner layers. Int J Adv Manuf Technol 25:940–946 5. Kondo K, Kikuchi K, Shibuya H, Maeda S (2009) Defect classification using random feature selection and bagging. J Inst Image Electr Eng Japan 38:9–15 6. Iwahori Y, Futamura K, Adachi Y (2011) Discrimination of true defect and indefinite defect with visual inspection using SVM. In: International conference on knowledge-based and intelligent information and engineering systems, pp 117–125 7. Iwahori Y, Kumar D, Nakarawa T, Bhuyan MK (2012) Improved defect classification of printed circuit board using SVM. Intell Decis Technol 2:355–363 8. Sahil S, Karan S, Bhuyan MK, Yuji I Pseudo VC true defect classification in printed circuit boards using wavelet features. Comput Vis Pattern Recogni. arXiv:1310.6654 9. Pakdel M, Tajeripour F (2011) Texture classification using optimal Gabor filters. In: 1st international e-conference on computer and knowledge engineering (ICCKE), pp 208–213 10. Hyun JP, Hyun SY (2001) Invariant object detection based on evidence accumulation and Gabor features. Pattern Recogn Lett 22:869–882 11. Kamarainen J, Kyrki V, Kalviainen H (2006) Invariance properties of Gabor filter based features—overview and applications. IEEE Trans Image Process 15(5):1088–1099 12. Kyrki V, Kamarainen JK, Kalviainen H (2004) Simple Gabor feature space for invariant object recognition. Pattern Recogn Lett 25:311–318 13. Daugmann JG (1985) Uncertainty relation for resolution in space, spatial frequency, and orientation optimized by two-dimensional visual cortical filters. J Opt Soc Am A 2:1160–1169 14. Bianconi F, Fernández A (2007) Evaluation of the effects of Gabor filter parameters on texture classification. Pattern Recogn 40:3325–3335 15. Li W, Mao K, Zhang H, Chai T (2010) Designing compact Gabor filter banks for efficient texture feature extraction. In: 11th international conference on control automation robotics & vision, pp 1193–1197 16. Wu WY, Wang MJJ, Liu CM (1996) Automated inspection of printed circuit boards through machine vision. Comput Ind 29:103–111 17. Vapnik VN (1998) Statistical learning theory. John Wiley & Sons, Inc 18. Tuceryan M, Jain AK (1993) Texture analysis. In: Handbook of pattern recognition and computer vision, C.H., pp 235–276 19. Wallace AM (1988) Industrial application of computer vision since 1982. IEEE Proc E Comput Digital Techn 135:117–136 20. Turner MR (1986) Texture discrimination by Gabor functions. Biol Cybern 55:71–82 21. Bovik (1989) Gabor filters as texture discriminator. Biol Cybern 61(2):103–113 22. Dey A, Bhoumik D, Dey KN (2019) Automatic multi-class classification of beetle pest using statistical feature extraction and support vector machine. In: Emerging technologies in data mining and information security. Springer, Singapore, pp 533–544 23. Singhania AV et al (2021) A machine learning based heuristic to predict the efficacy of online sale. In: Emerging technologies in data mining and information security. Springer, Singapore, pp 439–447
Capacity Analysis Over Shadowed BX Fading Channels for Various Adaptive Transmission Schemes Sisira Hawaibam and Aheibam Dinamani Singh
Abstract The shadowed Beaulieu-Xie (SBX) fading channel is a fading model that is suitable for describing emerging millimeter-wave channels. Channel capacity is an important performance measure for a communication channel. This work presents the capacity evaluation of SBX fading channels with different adaptive transmission schemes. The mathematical expressions of capacity with various adaptive transmission schemes are obtained. The adaptive schemes considered are channel inversion with fixed rate, truncated channel inversion with fixed rate, optimal simultaneous power and rate adaption, and optimal rate adaption with constant power. The plots presented show how the channel capacity behaves with different shadowing and fading parameters. Keywords CIFR · OPRA · ORA · Shadowed Beaulieu-Xie fading channel · TIFR
1 Introduction The study of wireless communication system under numerous fading environment is a popular research area. There are many fading models for different scenarios [1]. However, the traditional fading models are not able to efficiently represent the channel encountered in mobile communication. Rayleigh fading channel does not support line of sight (LOS) components. It considers non-line of sight (NLOS) signal transmission. Rician fading channel [2] supports signal transmission through direct LOS components. The setback is that it cannot characterize NLOS components and has very limited range of fading situations and is therefore not flexible. Nakagami-m fading channel can characterize multiple NLOS components. It is flexible with wide range of fading parameters but it does not support LOS components [3–5]. A composite model called shadowed Beaulieu-Xie (SBX) fading model was introduced in [2]. SBX fading model is grounded upon BX fading model which is able to characterize large- and small-scale fading. This fading model is desired over traS. Hawaibam (B) · A. D. Singh NIT Manipur, Langol, Manipur, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 N. Das et al. (eds.), Proceedings of International Conference on Data, Electronics and Computing, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-99-1509-5_42
ditional fading models because of its flexibility over range of fading parameters and its compatibility for LOS/NLOS components [6]. In [2], the authors also present the relationship between SBX fading channels and other fading channels such as BX, Rayleigh, Rician, Nakagami-m, etc. SBX fading model also has an advantage of being fit for millimeter-wave wireless communication system in 28 GHz band. The performance analysis for average bit error rate for various modulation schemes and outage probability via SBX fading model is carried out in [2]. One of the major performance metrics of the wireless channel is the capacity in communication system. The channel capacity is defined as the maximal information rate that can be conveyed to the receiver via a channel with a least error. In the literature, there has been many works on capacity analysis over fading channels [7–11]. In [12], capacity analysis for various transmission schemes for SBX fading model is studied. The authors have also evaluated unified ergodic capacity for various diversity techniques. To the best of our knowledge, in [12], channel capacity with channel inversion with fixed rate (CIFR) and truncated channel inversion with fixed rate (TIFR) schemes for different fading and shadowing parameters are not compared and plotted graphically. Among the adaptive transmission schemes, CIFR and TIFR are important schemes for practical implementations. The approach to evaluation of capacity under optimal simultaneous power and rate adaption (OPRA), CIFR, and TIFR schemes is different in this paper. Motivated by the lack of such works, capacity analysis of different adaptive transmission schemes under SBX fading channel is studied. For this paper, numerical expressions of different adaptive transmission schemes are obtained via SBX fading channel. The different adaptive transmission schemes studied are CIFR, TIFR, OPRA, and optimal rate adaption with constant power (ORA). This study is done considering the effects of shadowing and fading parameters on the channel capacity also. The organization of the paper is as follows. In Sect. 2, SBX fading channel is given. Amount of fading is analyzed in Sects. 3, and 4 presents capacity analysis under different adaptive transmission schemes over SBX fading channels. The obtained results are presented in Sect. 5 and followed by conclusions in Sect. 6.
2 Fading Channel The SBX fading channel is identified by four parameters, namely m_X, \Omega_X, m_Y and \Omega_Y. The SBX fading channel is derived from the Nakagami-m and BX fading channels [2], assuming that the NLOS and LOS components undergo different types and levels of fluctuation, i.e., the NLOS and LOS components experience separate fading and shadowing severity parameters. The envelope PDF of the SBX fading channel is given in [2] as

f_Q(q) = \int_0^{\infty} \frac{2 m_X q^{m_X}}{\Omega_X\, s^{m_X - 1}} \exp\!\left(-\frac{m_X}{\Omega_X}\left(q^2 + s^2\right)\right) I_{m_X - 1}\!\left(\frac{2 m_X}{\Omega_X}\, s q\right) f_S(s)\, ds,    (1)
Capacity Analysis Over Shadowed BX Fading …
459
where \Omega_X denotes the average power of the NLOS components and m_X represents the fading parameter of both the LOS and NLOS signals. In the SBX fading channel, m_X can take any value greater than zero, unlike in the BX fading channel [2]. Severe fading takes place at low values of m_X, and the fading becomes lighter as m_X increases. I_a(\cdot) denotes the a-th order modified Bessel function of the first kind [2]. The LOS amplitude s is modeled by a Nakagami-m distribution, whose PDF is given in [2] as

f_S(s) = \frac{2\, m_Y^{m_Y}}{\Omega_Y^{m_Y}\, \Gamma(m_Y)}\, s^{2 m_Y - 1} \exp\!\left(-\frac{m_Y s^2}{\Omega_Y}\right), \quad s \ge 0,    (2)

where the shadowing parameter is denoted by m_Y \ge 0 [2]. This range is more flexible than that of the Nakagami-m fading channel, and it is suitable for severe shadowing. \Omega_Y is the average power of the LOS components [2]. The LOS amplitude s in (1) is assumed to fluctuate owing to the dynamic nature of the channel. The envelope PDF of the SBX fading channel is then obtained from (1) and (2) in [2] as

f_Q(q) = \frac{2\, \xi_Y}{\Gamma(m_X)} \left(\frac{m_X}{\Omega_X}\right)^{m_X} q^{2 m_X - 1} \exp\!\left(-\frac{m_X}{\Omega_X} q^2\right) {}_1F_1\!\left(m_Y;\, m_X;\, \psi q^2\right),    (3)

where \xi_Y = \left(\frac{m_Y \Omega_X}{m_X \Omega_Y + m_Y \Omega_X}\right)^{m_Y}, \psi = \frac{m_X^2 \Omega_Y}{\Omega_X \left(m_X \Omega_Y + m_Y \Omega_X\right)}, \Gamma(\cdot) denotes the gamma function, and {}_1F_1(\cdot;\cdot;\cdot) is the confluent hypergeometric function [13, p. 1023]. The SNR PDF of the SBX fading channel (with s now denoting the instantaneous SNR) is determined from (3) and is obtained in [2] as

f_S(s) = \xi_Y \left(\frac{m_X}{\Omega_X}\right)^{m_X} s^{m_X - 1} \sum_{z=0}^{\infty} \frac{\Gamma(m_Y + z)}{z!\, \Gamma(m_X + z)\, \Gamma(m_Y)}\, (\psi s)^{z} \exp\!\left(-\frac{m_X}{\Omega_X} s\right).    (4)
The instantaneous CDF of the received SNR of the SBX fading channel is described in [2] as

F_S(s) = \xi_Y \sum_{z=0}^{\infty} \frac{\Gamma(m_Y + z)}{z!\, \Gamma(m_X + z)\, \Gamma(m_Y)}\, (\xi_X)^{z}\, g\!\left(m_X + z,\, \frac{m_X}{\Omega_X} s\right),    (5)

where \xi_X = \frac{m_X \Omega_Y}{m_X \Omega_Y + m_Y \Omega_X} and g(\cdot,\cdot) is the lower incomplete gamma function [14]. The v-th moment of the output SNR is given in [10] as

E[s^v] = \int_0^{\infty} s^{v}\, f_S(s)\, ds.    (6)
460
S. Hawaibam and A. D. Singh
Substituting (4) into (6) and solving using [15, (3.381.4)], the v-th moment is expressed as

E[s^v] = \xi_Y \left(\frac{m_X}{\Omega_X}\right)^{m_X} \sum_{z=0}^{\infty} \frac{\Gamma(m_Y + z)\, \Gamma(m_X + z + v)}{z!\, \Gamma(m_X + z)\, \Gamma(m_Y)}\, \psi^{z} \left(\frac{m_X}{\Omega_X}\right)^{-m_X - z - v}.    (7)
3 Amount of Fading The amount of fading is defined as a measure of how severely a channel experiences fading [10], and is expressed as

af = \frac{E[s^2]}{E[s]^2} - 1.    (8)

By substituting v = 1 and v = 2 in Eq. (7) and using the results in (8), the final expression can be obtained.
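As a quick numerical cross-check, the moments in Eq. (7) can also be obtained by numerically integrating the series form of the SNR PDF in Eq. (4). The sketch below follows the expressions as reconstructed above; the truncation depth and parameter values are illustrative assumptions only.

```python
import numpy as np
from scipy.special import gammaln
from scipy.integrate import quad

def sbx_snr_pdf(s, mX, mY, OmX, OmY, terms=60):
    """Truncated series form of the SNR PDF in Eq. (4), evaluated per term in log space."""
    xiY = (mY * OmX / (mX * OmY + mY * OmX)) ** mY
    psi = mX ** 2 * OmY / (OmX * (mX * OmY + mY * OmX))
    total = 0.0
    for z in range(terms):
        log_term = (gammaln(mY + z) - gammaln(z + 1) - gammaln(mX + z) - gammaln(mY)
                    + z * np.log(psi * s) + mX * np.log(mX / OmX)
                    + (mX - 1) * np.log(s) - mX * s / OmX)
        total += np.exp(log_term)
    return xiY * total

def amount_of_fading(mX, mY, OmX, OmY):
    m1 = quad(lambda s: s * sbx_snr_pdf(s, mX, mY, OmX, OmY), 0, np.inf)[0]
    m2 = quad(lambda s: s ** 2 * sbx_snr_pdf(s, mX, mY, OmX, OmY), 0, np.inf)[0]
    return m2 / m1 ** 2 - 1          # Eq. (8)

# Example with illustrative parameters: amount_of_fading(2, 1, 1.0, 1.0)
```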
4 Capacity Analysis In this section, the capacity of the SBX fading channel is analyzed. The study considers four adaptive transmission schemes, namely CIFR, TIFR, OPRA, and ORA.

4.1 CIFR In CIFR, the transmitter tries to support a fixed data rate irrespective of the channel conditions. The transmitter adjusts its transmission power according to the severity of the channel fading; by doing this, the impact of fading is tackled. The channel capacity expression for CIFR is [16]

C_{CIFR} = B \log_2\!\left(1 + \frac{1}{\int_0^{\infty} \frac{f_S(s)}{s}\, ds}\right),    (9)

where the channel bandwidth is denoted by B (hertz). The final expression for CIFR under the SBX fading channel is obtained by using [15, (3.381.4)] and solving (9) as

C_{CIFR} = B \log_2\!\left(1 + \frac{1}{\xi_Y \sum_{z=0}^{\infty} \frac{\Gamma(m_Y + z)\, \Gamma(m_X + z - 1)}{z!\, \Gamma(m_X + z)\, \Gamma(m_Y)}\, \psi^{z} \left(\frac{m_X}{\Omega_X}\right)^{-z + 1}}\right).    (10)
Capacity Analysis Over Shadowed BX Fading …
461
4.2 TIFR The disadvantage of the CIFR scheme is that a huge amount of power is necessary to compensate for severe fading. The TIFR scheme overcomes this limitation, as power is allocated by the transmitter only when the received SNR is above a fixed threshold SNR s_{th}. The channel capacity with the TIFR scheme is given in [10] as

C_{TIFR} = B \log_2\!\left(1 + \frac{1}{\int_{s_{th}}^{\infty} \frac{f_S(s)}{s}\, ds}\right) \times (1 - P_o),    (11)

where P_o is the outage probability of the SBX fading channel. It is expressed in [2] as

P_o = \xi_Y \sum_{z=0}^{\infty} \frac{\Gamma(m_Y + z)}{z!\, \Gamma(m_X + z)\, \Gamma(m_Y)}\, (\xi_X)^{z}\, g\!\left(m_X + z,\, \frac{m_X}{\Omega_X} s_o\right),    (12)

where s_o is the threshold level. First, the term Y = \int_{s_{th}}^{\infty} \frac{f_S(s)}{s}\, ds is solved using [15, (3.381.3)] and is obtained as

Y = \xi_Y \sum_{z=0}^{\infty} \frac{\Gamma(m_Y + z)}{z!\, \Gamma(m_X + z)\, \Gamma(m_Y)}\, \psi^{z} \left(\frac{m_X}{\Omega_X}\right)^{-z + 1} \Gamma\!\left(m_X + z - 1,\, \frac{m_X}{\Omega_X} s_{th}\right),    (13)

where \Gamma(\cdot,\cdot) denotes the upper incomplete gamma function. The final expression of the TIFR capacity over the SBX fading channel can be calculated by placing (12) and (13) in (11).
4.3 OPRA In the OPRA scheme, the channel state information is available at both the transmitting and the receiving side. The channel capacity with OPRA is given in [17] as

C_{OPRA} = B \int_{s_{th}}^{\infty} \log_2\!\left(\frac{s}{s_{th}}\right) f_S(s)\, ds,    (14)

where s_{th} denotes the cut-off SNR level. Data transmission is suspended when the received SNR falls below s_{th}. The cut-off level must fulfill the power constraint

\int_{s_{th}}^{\infty} \left(\frac{1}{s_{th}} - \frac{1}{s}\right) f_S(s)\, ds = 1.    (15)
462
S. Hawaibam and A. D. Singh
Now, by putting (4) into (15) and solving the integral, the constraint is obtained as

\xi_Y \sum_{z=0}^{\infty} \frac{\Gamma(m_Y + z)\, \psi^{z}}{z!\, \Gamma(m_X + z)\, \Gamma(m_Y)} \left(\frac{m_X}{\Omega_X}\right)^{-z} \left\{\frac{\Gamma\!\left(m_X + z,\, \frac{m_X}{\Omega_X} s_{th}\right)}{s_{th}} - \frac{m_X}{\Omega_X}\, \Gamma\!\left(m_X + z - 1,\, \frac{m_X}{\Omega_X} s_{th}\right)\right\} = 1.    (16)

The final expression of the OPRA capacity over the SBX fading channel can be calculated by using [18, (64)] in (14) as

C_{OPRA} = B \log_2(e)\, \xi_Y \sum_{z=0}^{\infty} \frac{\Gamma(m_Y + z)\, \psi^{z}\, (m_X + z - 1)!}{z!\, \Gamma(m_X + z)\, \Gamma(m_Y)} \left(\frac{m_X}{\Omega_X}\right)^{-z} \sum_{r=0}^{m_X + z - 1} \frac{\Gamma\!\left(r,\, \frac{m_X}{\Omega_X} s_{th}\right)}{r!}.
4.4 ORA In the ORA scheme, the channel state information is available at the receiving side only, while the transmit power remains constant. The transmission rate is adjusted by the transmitter based on the fading condition of the channel. The channel capacity expression for the ORA scheme is [10]

C_{ORA} = B \int_0^{\infty} \log_2(1 + s)\, f_S(s)\, ds.    (17)

The final expression of the ORA capacity over the SBX fading channel is calculated by using [18, (78)] in (17) and is obtained as

C_{ORA} = B\, \xi_Y \left(\frac{m_X}{\Omega_X}\right)^{m_X} \sum_{z=0}^{\infty} \frac{\Gamma(m_Y + z)\, \psi^{z}}{z!\, \Gamma(m_X + z)\, \Gamma(m_Y)}\, G^{3,1}_{2,3}\!\left(\frac{m_X}{\Omega_X}\;\middle|\; \begin{matrix} -m_X - z,\; 1 - m_X - z \\ 0,\; -m_X - z,\; -m_X - z \end{matrix}\right),    (18)

where G^{u,v}_{y,z}(\cdot) represents the Meijer G-function [19].
Fig. 1 Capacity analysis under CIFR scheme
Fig. 2 Capacity analysis under TIFR scheme
5 Numerical Results In this part, mathematical expressions of channel capacity with adaptive transmission schemes obtained in the above are shown. The graphs are plotted against average SNR in dB considering with different fading and shadowing parameters with constant Y = 1 dB. The analytical results are verified with standard results. Figure 1 illustrates the plot for capacity with CIFR scheme against average SNR for SBX fading channel. The graph is plotted for variable values of fading parameter and shadowing parameter. As observed, capacity improves as the value of fading
Fig. 3 Capacity analysis under OPRA and ORA schemes
Fig. 4 Comparison of various adaptive transmission schemes
parameter goes from m_X = 2 to m_X = 4. The capacity is improved by 0.5 bits/s/Hz at a fixed average SNR of 10 dB. It is also noticed that at lower SNR values, the curves with m_Y = 50 perform better than those with m_Y = 1: lower values of m_Y represent heavy shadowing, whereas higher values of m_Y represent light shadowing. Figure 2 depicts the plot of capacity with the TIFR scheme against the average SNR for the SBX fading channel. The graph is plotted for several values of m_X and a constant value of the shadowing parameter, i.e., m_Y = 1. The difference between the capacity at m_X = 1 and m_X = 4 for a fixed average SNR of 20 dB is approximately 1.5
bits/s/Hz. This is because the system deteriorates at smaller values of m_X and improves as the value rises. Figure 3 depicts the plot of capacity with the OPRA and ORA schemes versus the average SNR for the SBX fading channel. The graph is plotted for m_X = 1, 4 and a constant value of the shadowing parameter, m_Y = 1. At a fixed average SNR of 15 dB, the capacity at m_X = 4 is approximately 4.5 bits/s/Hz, whereas the capacity at m_X = 1 is 4 bits/s/Hz. The capacity of the OPRA scheme improves compared with the ORA scheme at low values of SNR, and the two converge at higher values of SNR. The effect of fading decreases as the value of the fading parameter increases. Figure 4 illustrates the plot of capacity with the different adaptive schemes against the average SNR for the SBX fading channel. The curves for the different adaptive schemes, namely TIFR, CIFR, OPRA, and ORA, are compared for fixed values of m_X = 4 and m_Y = 1. The capacity of OPRA is superior to that of ORA, TIFR, and CIFR. During severe fading, the transmitter transmits at its full power, but such signals may not be recovered at the receiver; consequently, the capacity of the channel is lowered. At higher average SNR values, the capacities of TIFR and CIFR, and of OPRA and ORA, converge to the same values.
6 Conclusions This paper evaluates mathematical expressions for channel capacity analytically for SBX fading channel. The channel capacity was analyzed with different adaptive transmission schemes namely CIFR, TIFR, OPRA, and ORA. The impact of fading and shadowing parameter was taken into consideration in the analysis. The numerical results were plotted for all the schemes and were also compared. The results obtained showed that capacity improved when the values of m Y and m X increased. Out of the four adaptive transmission schemes, capacity of OPRA was better than other schemes, while capacity of CIFR was the least.
References 1. Proakis G (2001) Digital communications. McGraw-Hill, New York 2. Olutayo A, Cheng J, Holzman JF (2020) A new statistical channel model for emerging wireless communication systems. IEEE Open J Commun Soc 1:916–926. https://doi.org/10.1109/ OJCOMS.2020.3008161 3. Yacoub MD (2010) Nakagami-m phase-envelope joint distribution: a new model. IEEE Trans Veh Technol 59(3):1552–1557. https://doi.org/10.1109/TVT.2010.2040641 4. Beaulieu NC, Saberali SA (2014) A generalized diffuse scatter plus line-of-sight fading channel model. In: IEEE international conference on communications 2014 (ICC). Sydney, NSW, pp 5849–5853. https://doi.org/10.1109/ICC.2014.6884255 5. Wyne S, Singh AP, Tufvesson F, Molisch AF (2009) A statistical model for indoor office wireless sensor channels. IEEE Trans Wireless Commun 8(8):4154–4164. https://doi.org/10. 1109/TWC.2009.080723
6. Samimi MK, MacCartney GR, Sun S, Rappaport TS (2016) 28 GHz millimeter-wave ultrawideband small-scale fading models in wireless channels. In: IEEE 83rd vehicular technology conference 2016 (VTC Spring), pp 1–6. Nanjing, China 7. García-Corrales C, F Cañete J, Paris JF (2014) Capacity of κ − μ shadowed fading channels. Int J Antennas Propag 2014:1–8. https://doi.org/10.1155/2014/975109 8. Li X, Chen X, Zhang J, Liang Y, Liu Y (2017) Capacity analysis of α − η − κ − μ fading channels. IEEE Commun Lett 21(6):1449–1452. https://doi.org/10.1109/LCOMM.2017.2672960 9. Srinivasan M, Kalyani S (2018) Secrecy capacity of κ − μ shadowed fading channels. IEEE Commun Lett 22(8):1728–1731. https://doi.org/10.1109/LCOMM.2018.2837859 10. Kansal V, Singh S (2021) Capacity analysis of maximal ratio combining over Beaulieu-Xie fading. Ann Telecommun 76(1–2):43–50. https://doi.org/10.1007/s12243-020-00762-7 11. Kaur M, Yadav RK (2020) Performance analysis of Beaulieu-Xie fading channel with MRC diversity reception. Trans Emerging Tel Tech 31(7). https://doi.org/10.1002/ett.3949 12. Silva HS, Almeida DBT, Queiroz WJL, Silva HTP, Fonseca IE, Oliveira ASR, Madeiro F (2022) Capacity analysis of shadowed Beaulieu-Xie fading channels. Digital Signal Process 122:103367 13. Gradshte˘ın IS, Ryzhik NIM, Jeffrey A (2000) Table of integrals, series, and products, 6th ed. Academic Press, Amsterdam, Boston 14. Aalo VA, Zhang J (2001) Performance analysis of maximal ratio combining in the presence of multiple equal-power cochannel interferers in a Nakagami fading channel. IEEE Trans Veh Technol 50(2):497–503 15. Gradshte˘ın IS, Ryzhik IM, Jeffrey A (2007) Table of integrals, series, and products, 7th ed. Academic Press, Amsterdam, Boston (2007) 16. Kumar S, Soni SK, Jain P (2018) Performance of MRC receiver over Hoyt/lognormal composite fading channel. Int J Electron 105(9):1433–1450 17. Rasethuntsa TR, Kumar S, Kaur M (2019) A comprehensive performance evaluation of a DF-based multi-hop system over α − κ − μ and α − κ − μ-extreme fading channels [online]. Available http://arxiv.org/abs/1903.09353 18. Alouini MS, Goldsmith AJ (1999) Capacity of Rayleigh fading channels under different adaptive transmission and diversity-combining techniques. IEEE Trans Veh Technol 48(4):1165– 1181 19. Wolfram Research, Inc., Wolfram Research (2020). Available http://functionswolfram.com/id
Performance Comparison of L-SC and L-MRC over Various Fading Channels Sisira Hawaibam and Aheibam Dinamani Singh
Abstract The communication channel undergoes different kinds of fading which affects the received signals during transmission. Diversity is one of the most promising features to mitigate multipath fading in wireless communication system. In this paper, maximal ratio combining (MRC) and selection combining (SC) with L-branches over different fading channels are analyzed. The fading channels that have been used for performance comparison are Rayleigh, Rician, and Nakagami-m fading channels. Outage probability and average bit error rate (ABER) for coherent modulation are studied to compare their performances over the different fading channels. Mathematical expressions for ABER and outage probability over the fading channels with SC and MRC have been studied. The results are analytically obtained through numerical calculations and plotted using MATLAB. Keywords Average bit error rate · Fading channels · MATLAB · Maximal ratio combining (MRC) · Selection combining (SC)
1 Introduction For all wireless communication system, the channel undergoes multipath fading effects. The reason for its causes is reflection, diffraction, and scattering of the signals. These effects can distort the signal and severely degrade the overall system performance. There are many techniques to reduce the effects of fading such as equalization, channel estimation, diversity, or cyclic prefix. Diversity is one common technique which plays a key role to combat the effects of fading. Diversity allows several similar copies of signal to transmit via independent different paths. They are combined at the receiver. Since the paths are different and independent, the overall probability of fading over all branches is reduced. This will provide overall increase in received signal to noise ratio (SNR). In [1], performances have been compared S. Hawaibam (B) · A. D. Singh NIT Manipur, Langol, Manipur, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 N. Das et al. (eds.), Proceedings of International Conference on Data, Electronics and Computing, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-99-1509-5_43
between optimum selection combining (OSC), conventional selection combining (CSC), and maximal ratio combining (MRC). It has been stated that for same diversity order, asymptotic bit error rate (BER) has been evaluated. It is found that MRC is the most efficient while CSC is the least. Different diversity order and modulation order are studied over multiple input multiple output (MIMO) system in terms of BER [2]. Many diversity techniques have been introduced, out of these, the simplest is the selection combining (SC) technique. The branch with maximal SNR is picked at the combiner in the SC. Others include MRC, switched and stay combining, equal gain combining (EGC), hybrid combining, etc. Millimeter wave communications over fluctuating-two ray (FTR) fading channel with SC are analyzed in [3]. Average symbol error rate (ASER) for quadrature amplitude modulation (QAM) with L-MRC receivers for η − μ and κ − μ fading channels is analyzed in [4]. MRC has been preferred as it gives good SNR gain against other techniques. The fading channels used for the study of the system performances are Nakagami-m, Rayleigh, and Rician fading models. Rayleigh fading applies when dominant non line of sight (NLOS) exists. Performance for L-branch SC in generalized Rayleigh fading channel has been evaluated in [5]. Rician fading considers one dominant LOS signal along with weaker signals in the propagation path. In [6], performance analysis for L-branch EGC over Rician fading is done. The results showed that EGC performed as good as MRC. Evaluation of L-branch diversity for equal correlated Rician fading channel is presented in [7]. SC and EGC have been used to perform ABER for different modulation schemes. Nakagami-m fading channel closely approximates Rayleigh distribution and Rician distribution. Exact expressions of different coherent ASER with L-MRC receivers through correlated Nakagami-m fading channels have been evaluated for M-QAM or M-phase shift keying (PSK) [8]. In [9], performance of single and dual SC receivers with PSK modulation technique is focused under Rayleigh and Nakagami-m fading channels. An experiment for SC technique over Rayleigh fading channel to reduce errors during image transmission in wireless channels was performed in [10]. The experiment was done for BER which showed that the system improved with diversity. This paper presents the communication system performance comparison over Rayleigh, Rician, and Nakagami-m fading channel with L-branch of SC and MRC diversity techniques. The expressions of outage probability and ABER with coherent modulation scheme are presented for the different fading channel with L-branch diversity. The coherent modulation used is binary PSK. The performances of the three above stated fading channels are compared in the form of outage probability and ABER. As per the literature survey, this kind of comparison of the performances has not been familiar. The organization of the paper is as follows: Diversity techniques are presented in Sect. 2. Performance analysis of ABER with coherent modulation (binary PSK) and outage probability is carried out Sect. 3. Then, numerical analysis and discussions in Sect. 4 followed by conclusion in Sect. 5.
2 Diversity Techniques The two diversity techniques that have been considered for performance analysis are SC and MRC with L-branch receivers. SC is considered the least complicated because only one of the diversity branches is processed. It will choose the branch with strongest SNR from multiple branches [11]. To achieve a good performance, independent fading of the channels must take place with equal spaced antennas. Practically, it is not possible to achieve theoretical maximum diversity gain. The SNR output at the SC combiner is given in [11] as sSC = max (s1 , s2 , ...s L ) ,
(1)
where s_1, s_2, \ldots, s_L denote the instantaneous received SNRs of the L diversity branches. MRC is an effective receiver when the interference is low, regardless of the fading statistics on the different diversity branches [11]. All the incoming signals from the different diversity branches are co-phased, weighted proportionally, and added algebraically at the combiner output. There are many works in the literature on MRC over fading channels. Though its performance is optimum, it rarely finds physical implementation because of its complexity, since it needs to know all the channel fading parameters. The SNR output at the MRC combiner is given in [11] as

s_{MRC} = \sum_{l=1}^{L} s_l,    (2)
where sl denotes instantaneous received SNR with ‘l’ as the number of diversity branches ranging from 1 to L.
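The combining rules in (1) and (2) are easy to validate by simulation. The short Monte Carlo sketch below draws i.i.d. per-branch SNRs for a Rayleigh channel (exponentially distributed SNR), applies SC and MRC, and estimates the outage probability for a given threshold; all parameter values are illustrative, and equal average SNR across branches is assumed.

```python
import numpy as np

def simulate_outage(L=3, avg_snr_db=10.0, sth_db=5.0, n_trials=200_000, seed=0):
    rng = np.random.default_rng(seed)
    s_bar = 10 ** (avg_snr_db / 10)           # average per-branch SNR (linear)
    s_th = 10 ** (sth_db / 10)                # outage threshold (linear)
    # Rayleigh fading: per-branch instantaneous SNR is exponential with mean s_bar
    snr = rng.exponential(s_bar, size=(n_trials, L))
    sc = snr.max(axis=1)                      # Eq. (1): selection combining
    mrc = snr.sum(axis=1)                     # Eq. (2): maximal ratio combining
    return (sc < s_th).mean(), (mrc < s_th).mean()

p_sc, p_mrc = simulate_outage()
print(f"Outage probability: SC = {p_sc:.4f}, MRC = {p_mrc:.4f}")  # MRC should be lower
```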
3 Performance Analysis

3.1 ABER of L-SC Output SNR Through Rayleigh Fading Channel The ABER analytical expression for L-SC receivers via the Rayleigh fading channel for binary PSK is given in [12, Eq. 13] as

P^{SC}_{e,Rayleigh}(\bar{s}) = \frac{L}{2} \sum_{q=0}^{L-1} \binom{L-1}{q} \frac{(-1)^q}{1+q} \left[1 - \frac{1}{\sqrt{1 + \frac{1+q}{\bar{s}}}}\right],    (3)

where L denotes the number of diversity branches, s is the output SNR, and \bar{s} is the average output SNR of the channel.
3.2 ABER of L-SC Output SNR Through Rician Fading Channel The ABER analytical expression of L-SC receivers via Rician fading channel for coherent modulation is solved by putting [13, Eq. 11] in [14, Eq. 13] and using [15, Eq. 6.455.1] is obtained as: SC Pe,Rician (s)
AL = 2π s 2 ∞
L−1 (q+1) B L −1 (−1)q e− s −K q 2 q=0
1 K K k (2 + t) e − ×
2+t 3
B t! + 1s t=0 k=0 k! 2 + t 2 1 5
, × 2 F1 1, 2 + t; + t; B 2 s 2 + 1s t−1
(4)
where 2 F1 (.) is hypergeometric function. (.) is Gamma function. K is Rician factor. Binary PSK coherent modulation is used for the performance analysis where A = 1 and B = 2 [16].
3.3 ABER of L-SC Output SNR Through Nakagami-m Fading Channel The ABER analytical expression of L-SC receivers via Nakagami-m fading channel in coherent modulation is shown in [17, Eq. 19] as L−1 j (m−1) k AB (−1) j 2π (B)(m) j=0 k=0 i=k−(m−1) ⎞ ⎛ m m+k Di( j−1) L − 1 ⎜ (m + k + B) s ⎟ × ⎝ m+k+B ⎠ j (k − i)! (m + k) A + ( j+1)m s 1 5
, × 2 F1 1, 2 + t; + t; B 2 s 2 + 1s
SC (s) = Pe,Nakagami
where m denotes fading parameter.
(5)
3.4 ABER of L-MRC Output SNR Through Rayleigh Fading Channel The ABER analytical expression for L-MRC receivers via the Rayleigh fading channel for coherent modulation is obtained by putting [11, Eq. 9.5] in [14, Eq. 13] and solving with the help of [15, Eq. 6.455.2] as

P^{MRC}_{e,Rayleigh}(\bar{s}) = \frac{A\, \sqrt{B/2}\; \Gamma\!\left(L + \tfrac{1}{2}\right)}{2\sqrt{\pi}\, \bar{s}^{L} (L-1)!\, L \left(\tfrac{B}{2} + \tfrac{1}{\bar{s}}\right)^{L + \frac{1}{2}}}\; {}_2F_1\!\left(1,\, L + \tfrac{1}{2};\, L + 1;\, \frac{1}{\tfrac{\bar{s} B}{2} + 1}\right).    (6)
∞ Bs K m eK A −L − m + 1, 1 G 2,1 | , 0, 0.5 2π m=0 m!(m + L) 2,2 2(L + K )
(7)
− where G m,n p,q .|− is the Meijer-G function.
3.6 ABER of L-MRC Output SNR Through Nakagami-m Fading Channel The ABER analytical expression for L-MRC receivers via the Nakagami-m fading channel for coherent modulation is obtained after putting [19, Eq. 2] in [14, Eq. 13] and solving using [15, Eq. 6.455.1] as

P^{MRC}_{e,Nakagami}(\bar{s}) = \frac{A\, \sqrt{B/2}\; \Gamma\!\left(m L + \tfrac{1}{2}\right) \left(\tfrac{m}{\bar{s}}\right)^{m L}}{2\sqrt{\pi}\, \Gamma(m L)\, m L \left(\tfrac{B}{2} + \tfrac{m}{\bar{s}}\right)^{m L + \frac{1}{2}}}\; {}_2F_1\!\left(1,\, m L + \tfrac{1}{2};\, m L + 1;\, \frac{1}{\tfrac{B \bar{s}}{2 m} + 1}\right).    (8)
3.7 Outage Probability of L-SC Output SNR Through Rayleigh Fading Channels The analytical expression of the outage probability for L-SC receivers over the Rayleigh fading channel is given in [11, Eq. 9.298] as

P^{SC}_{out,Rayleigh}(s_{th}) = \left(1 - e^{-s_{th}/\bar{s}}\right)^{L}.    (9)
3.8 Outage Probability of L-SC Output SNR Through Rician Fading Channel The analytical expression of the outage probability for L-SC receivers over the Rician fading channel is given in [11, Eq. 9.300] as

P^{SC}_{out,Rician}(s_{th}) = \left[1 - Q_1\!\left(\sqrt{2K},\, \sqrt{\frac{2(1+K)\, s_{th}}{\bar{s}}}\right)\right]^{L},    (10)

where Q_M(x, y) is the Marcum Q-function [15].
3.9 Outage Probability of L-SC Output SNR Through Nakagami-m Fading Channel The analytical expression of the outage probability for L-SC receivers via the Nakagami-m fading channel is given in [11, Eq. 9.299] as

P^{SC}_{out,Nakagami}(s_{th}) = \left[\frac{\Gamma(m) - \Gamma\!\left(m,\, \frac{m}{\bar{s}}\, s_{th}\right)}{\Gamma(m)}\right]^{L}.    (11)
3.10 Outage Probability of L-MRC Output SNR Through Rayleigh Fading Channels The analytical expression of the outage probability for L-MRC receivers over the Rayleigh fading channel is given in [11, Eq. 9.301] as

P^{MRC}_{out,Rayleigh}(s_{th}) = 1 - e^{-s_{th}/\bar{s}} \sum_{k=1}^{L} \frac{\left(s_{th}/\bar{s}\right)^{k-1}}{(k-1)!}.    (12)
3.11 Outage Probability of L-MRC Output SNR Through Rician Fading Channel The analytical expression of the outage probability for L-MRC receivers via the Rician fading channel is given in [11, Eq. 9.303] as

P^{MRC}_{out,Rician}(s_{th}) = 1 - Q_L\!\left(\sqrt{2K},\, \sqrt{\frac{2(1+K)\, s_{th}}{\bar{s}}}\right).    (13)
3.12 Outage Probability of L-MRC Output SNR Through Nakagami-m Fading Channel The analytical expression of the outage probability for L-MRC receivers via the Nakagami-m fading channel is given in [11, Eq. 9.302] as

P^{MRC}_{out,Nakagami}(s_{th}) = \frac{\Gamma(L m) - \Gamma\!\left(L m,\, \frac{m}{\bar{s}}\, s_{th}\right)}{\Gamma(L m)}.    (14)
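For reference, the closed-form outage expressions (9), (11), (12), and (14) can be evaluated directly with standard special functions, and the Marcum Q-function in (10) and (13) can be obtained from the noncentral chi-square distribution, as sketched below. All parameter values are illustrative assumptions.

```python
import numpy as np
from math import factorial
from scipy.special import gammainc          # regularized lower incomplete gamma P(a, x)
from scipy.stats import ncx2

def marcum_q(M, a, b):
    # Q_M(a, b) = P(X > b^2), X ~ noncentral chi-square with 2M dof and noncentrality a^2
    return ncx2.sf(b ** 2, df=2 * M, nc=a ** 2)

def outage_sc_rayleigh(s_th, s_bar, L):            # Eq. (9)
    return (1 - np.exp(-s_th / s_bar)) ** L

def outage_sc_rician(s_th, s_bar, L, K):           # Eq. (10)
    return (1 - marcum_q(1, np.sqrt(2 * K), np.sqrt(2 * (1 + K) * s_th / s_bar))) ** L

def outage_sc_nakagami(s_th, s_bar, L, m):         # Eq. (11)
    return gammainc(m, m * s_th / s_bar) ** L

def outage_mrc_rayleigh(s_th, s_bar, L):           # Eq. (12)
    x = s_th / s_bar
    return 1 - np.exp(-x) * sum(x ** k / factorial(k) for k in range(L))

def outage_mrc_nakagami(s_th, s_bar, L, m):        # Eq. (14)
    return gammainc(L * m, m * s_th / s_bar)

# Example: outage_mrc_rayleigh(s_th=10**0.5, s_bar=10.0, L=3)
```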
4 Numerical Analysis The mathematical expression for ABER and outage probability through various fading channels for L-SC and L-MRC is evaluated numerically and plotted here. Different values of K and m have been used to plot different curves. The performances are compared between the fading channels. Figure 1 shows ABER versus average SNR for L-branch SC. The fixed parameter value which is m = 1 and K = 1 dB is considered for plotting the graph. Binary PSK technique for coherent modulation is considered. It is the simplest form of PSK. At L = 3 branch, performance of Rician is the best when compared to Rayleigh and finally Nakagami-m fading channel. It is because at m = 1, Nakagami-m suffers from more fluctuations and possibility of error increases in the signal. It is noticed that system performances over Rayleigh with L = 3 and Rician with L = 2 are close to each other. It means that error rates are almost same. Since Rician has a greater number of diversity branch, there will be independent fading of the branch. This will improve the overall performance. Figure 2 depicts ABER versus average SNR for L-branch MRC. The curves indicate that for L = 3 Rayleigh fading channel decreases the most which is desirable for performance. The possibility of interference and distortion over communication channel is reduced. Between Rician and Nakagami-m, Rician performs better at K = 1 dB. Although the diversity branch increases, Nakagami-m suffers deep fading
Fig. 1 ABER for L-branch SC
Fig. 2 ABER for L-branch MRC
at m = 0.5. At L = 2, Nakagami-m is seen to perform the worst again affecting the system performance. Figure 3 shows outage probability against average received SNR for L-branch SC. The graph consists of fixed value of, i.e., K = 1 dB and m = 2. From the graph, the following observations can be made. Rician fading channel has strong LOS component between the transmitter and receiver which makes it to perform the best with L = 3. Therefore, SC will be easily able to choose one branch from all the branches. Next, Nakagami-m with L = 3 performs better than other fading channels. As observed Rayleigh with L = 2 and L = 3 degrades the system performance. This is because Rayleigh fading channel has no dominant component along the LOS
Fig. 3 Outage probability for L-branch SC
Fig. 4 Outage probability for L-branch MRC
between transmitter and receiver and makes it difficult for SC to choose a branch with highest SNR. Figure 4 shows outage probability against average received SNR for L-branch MRC. m ranges from 0.5 to infinity. It closely approximates Rician fading channel for m>1. It is seen that Nakagami-m and Rician fading channel are very close to each other for L = 3. It is advantageous for the system to make outage at lesser values. They perform almost equivalent to each other with Rician factor K = 1 dB. When m = 1, Nakagami-m includes Rayleigh fading as a special case. Therefore, the performance degrades for Rayleigh at L = 2 and L = 3 because the performance undergoes more fading.
5 Conclusion This paper presents a performance comparison over the Rayleigh, Rician, and Nakagami-m fading channels. The ABER for coherent modulation (binary PSK) and the outage probability have been evaluated for L-branch diversity. The diversity techniques used for the analysis are SC and MRC. Diversity techniques exploit the multipath channel and reduce deep fades in the system. The L-branch results show that the performance improves with a greater number of antenna branches. The results also show that the fading parameters (m and K) affect the system and demonstrate the importance of diversity in communication. It is concluded that the performance improves as the number of diversity branches increases. The MRC diversity technique is favorable as it gives a good SNR gain.
References 1. Kong N (2009) Performance comparison among conventional selection combining, optimum selection combining and maximal ratio combining. In: IEEE international conference on communications. Dresden, Germany, pp 1–6 2. Singh S, Rao SAP (2017) Performance analysis of diversity in MIMO wireless system. In: International conference on signal processing and communication 2017 (ICSPC). Coimbatore, pp 130–134 3. Al-Hmood H, Al-Raweshidy HS (2021) Performance analysis of mmWave communications with selection combining over fluctuating-two ray fading model. IEEE Commun Lett 25(8):2531–2535 4. Dixit D, Sahu PR (2012) Performance of L-branch MRC receiver in and fading channels for QAM signals. IEEE Wirel Commun Lett 1(4):316–319 5. Malluri S, Pamula VK (2014) Performance analysis of selection combining diversity scheme in generalised Rayleigh fading channels. In: 2014 international conference on communication and signal processing. Melmaruvathur, India, pp 957–961 6. Abu-Dayya AA, Beaulieu NC (1994) Microdiversity on Rician fading channels. IEEE Trans Commun 42(6):2258–2267 7. Chen Y, Tellambura C (2004) Performance of L-branch diversity combiners in equally correlated Rician fading channels. In: IEEE global telecommunications conference, vol 5. Dallas, TX, USA, pp 3379–3383 8. Lih-Feng Tsaur, Lee, D. C.: Performance analysis of general coherent MRC receivers in Nakagami-m fading channels. In: IEEE wireless communications and networking conference, vol 1. Chicago, IL, USA, pp 404–408 9. Simon MK, Alouini M-S (1999) A unified performance analysis of digital communication with dual selective combining diversity over correlated rayleigh and Nakagami-m fading channels. IEEE Trans Commun 47(1):11 10. Baharuddin LA, Andre H, Angraini R (2021) Performance analysis of diversity selection combining technique in rayleigh fading channel. IOP Conf Ser Mater Sci Eng 1041(1):012009 11. Simon MK, Alouini MS (2005) Digital communication over fading channels, 2nd edn. Wiley, California 12. Eng T, Kong N, Milstein LB (1996) Comparison of diversity combining techniques for Rayleigh-fading channels. IEEE Trans Commun 44(9):1117–1129 13. Weng JF, Leung SH (2000) On the performance of DPSK in Rician fading channels with class A noise. IEEE Trans Veh Technol 49(5):1934–1949. https://doi.org/10.1109/25.892596
A Learning Model for Channel Selection and Allocation in Cognitive Radio Networks Subhabrata Dhar, Sabyasachi Chatterjee, and Prabir Banerjee
Abstract Channel selection plays an essential role in uninterrupted and efficient cognitive communication. A secondary user applies a comparative sensing approach in different time slots to select reliable channels for cognitive communication. However, the reliable channel selection in real-time may be affected severely due to variation of the channel parameters. Hence, a learning-based model for channel selection and allocation has been developed in this paper to mitigate the problems due to the real-time variations and improve the accuracy of channel selection. The proposed learning model has been trained accurately with past input data so that the learning capability of the model improves. The fuzzy c means clustering algorithm has been incorporated in our work to further enhance the accuracy of the learning model. The graphical analysis of this algorithm reveals that reliable active channels have an improved accuracy rate of more than 30% based on selection performance. Therefore, the proposed learning model will improve the spectrum utilization efficiency of the cognitive radio network. Keywords Cognitive radio network · Receiver · Learning model · Fuzzy c means clustering · Spectrum utilization efficiency
S. Dhar (B) Department of Electronics and Communication, Guru Nanak Institute of Technology, Calcutta, West Bengal, India e-mail: [email protected] S. Chatterjee · P. Banerjee Department of Electronics and Communication, Heritage Institute of Technology, Calcutta, West Bengal, India e-mail: [email protected] P. Banerjee e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 N. Das et al. (eds.), Proceedings of International Conference on Data, Electronics and Computing, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-99-1509-5_44
1 Introduction Cognitive radio (CR) is a promising technology that can learn about the surrounding radio frequency (RF) environment through various sensing methods and, in turn, take firm decisions on how to use the available spectrum more efficiently [1]. Hence, spectrum sensing and selection are the most significant features of the cognitive radio network (CRN). In a CRN, cognitive radio users (CRUs) often act as secondary users (SUs) or unlicensed users, whereas the pre-allocated channel spectrum users are primary users (PUs) or licensed users. The SU performs various localization techniques to monitor the PU status continuously, so that it can utilize the free spectrum without causing interference to the PUs. However, there is a strong chance of interference between PUs and SUs, which can significantly degrade the quality of service (QoS) of the PU. Therefore, spectrum sensing techniques need to be improved to make the system more reliable and interference-free. SUs must also be aware of changes in the surrounding RF environment and take proper actions to modify the operating factors so as to utilize the spectrum productively [2]. Therefore, spectrum selection and allocation depend on cognition capability and auto-configurability [3]. Cognition capability refers to learning the surrounding RF environment through the sensing process, while auto-configurability signifies the ability of the SU to adapt to sudden changes in the values of the operating parameters based on the learning outcome [3].

The allocation of channel spectrum based on past learning is essential for efficient CRN performance. The reappearance of a PU in a channel during secondary data transmission results in severe interference. Due to inappropriate selection, SUs often cannot detect the reappearance of PUs. Therefore, problems like misdetection and false alarms degrade the CRN performance. Many channel selection and allocation techniques were developed in [4–6] to enhance the spectrum access reliability of licensed bands, but those schemes were unable to overcome the problem of channel uncertainty, which leads to poor performance accuracy of the CRN. Hence, an analytical data prediction (ADP) model has been developed in this paper to overcome channel uncertainty. The ADP model enhances the accuracy of channel selection by handling real-time channel uncertainty issues efficiently. The proposed model can also distinguish between active and busy channels to improve channel selection performance.

The proposed model establishes a learning-based channel selection scheme using machine learning (ML) technology. ML can acquire information from past data, learn from it, and thereby adapt itself based on the received knowledge [7]. Therefore, ML is an efficient technique for learning-based channel selection and allocation. In the proposed scheme, channel selection for different time slots has been performed, where the received signal strength indicator (RSSI) has been considered as the parameter to select the reliable channel. To further enhance the accuracy of the proposed ADP model, the fuzzy c means clustering (FCM) algorithm has been implemented in our work. The FCM algorithm [8] is an efficient ML-based technique that can significantly mitigate the channel interference
problem during channel selection. Hence, the channel selection accuracy can be enhanced to a great extent. The salient contributions of this paper are listed below:
• The proposed ADP model is incorporated to mitigate the computational delay through efficient channel selection.
• The proposed ADP model can minimize the transmission delay problem in the network by optimizing the RSSI value of the channel.
• The FCM algorithm has been implemented to enhance the accuracy level of the proposed ADP model. This is possible due to the reduction of the channel interference problem in the CRN.
The rest of the paper is organized as follows: Sect. 2 details the literature survey; Sect. 3 explains the system model; Sect. 4 exhibits results and analysis. Finally, Sect. 5 concludes the paper.
2 Literature Survey In the last few years, several models have been proposed to exploit the learning capability of SUs. The SUs perform a sensing operation to select reliable channels for proper access. Hence, an optimal sensing strategy was implemented in [9] to determine the best channel spectrum for secondary allocation. However, this work fails to predict the idle channels accurately by analyzing the channel utilization pattern. In [10], a Markovian algorithm was proposed to select the best operating channel spectrum. Many researchers have shown that the Markovian algorithm develops mathematical models to evaluate the sensing period; however, with the help of such models, SUs fail to predict the reappearance of PUs. Therefore, the occurrence probability of false alarms and misdetections increases. Hence, the hidden Markov model (HMM) was proposed in [11] to overcome detection problems, but the HMM fails to handle the channel uncertainty problem in real-time scenarios due to shadowing and fading effects. Therefore, centralized cooperative spectrum sensing was incorporated to control the channel uncertainty problem [12]. Centralized cooperative spectrum sensing reduces the delay and interference problems, but the SUs still fail to distinguish between the primary and secondary signals; hence, interference increases. Delay and interference are significant problems that need to be minimized for proper transmission in the channel. Therefore, a fast physical layer signaling protocol was developed in [13] to mitigate delay and interference problems. However, all these methods were not sufficient for proper channel sensing and channel allocation, and due to improper channel sensing, SUs cannot transmit suitably in the channel. Therefore, a genetic algorithm was developed in [14] to improve the detection probability, so that the occurrence probability of misdetection and false alarms is also minimized. The genetic algorithm is a classic biologically inspired model that can manage highly complicated problems by finding appropriate solutions. A deep
learning-based algorithm was introduced in [15] to enhance the accuracy of channel selection and allocation. Several works [16–18] have used deep neural network (DNN) models to minimize false detection when the signal-to-noise ratio (SNR) is low. In [19], a glowworm swarm algorithm was developed on a convolutional neural network to reduce interference and detection problems in the cognitive radio network. However, none of these works [9–19] has considered the received signal strength indicator (RSSI) as the parameter to select and allocate channels for secondary communication. RSSI is an important parameter that needs to be considered for proper channel selection and allocation. In this paper, the fuzzy c means clustering algorithm has been implemented to ensure suitable transmission in the channel. In the next section, a system model is developed to improve the efficiency of multi-slot channel selection and allocation by SUs.
3 System Model In this work, we have performed a simulation of a single channel in different time slots to exhibit the channel selection scheme using an analytical data prediction (ADP) model. In the proposed ADP model, the set of time slots is denoted as m ∈ {1, 2, 3, . . . , n}, where n is the total number of time slots. Algorithm 1 shows the channel prediction technique in the cognitive radio network (CRN). The linear regression model [20] has been implemented in our proposed model. The best-fit line of linear regression can be depicted in Eq. 1 as:

y = c_0 + c_1 x + e    (1)

where y is the predicted variable, x is the data sample, c_0 is the intercept, c_1 is the slope of the line, and e is the prediction error. The slope c_1 is defined in Eq. 2 as:

c_1 = \frac{P_{xy}}{P_{xx}}    (2)
where P_{xy} is the regression coefficient of the x and y variables and P_{xx} is the regression coefficient of two x variables. Here,

P_{xy} = \sum_{p=1}^{n} (x_p - \bar{x})(y_p - \bar{y})    (3)

and

P_{xx} = \sum_{p=1}^{n} (x_p - \bar{x})^2    (4)
where \bar{x} is the mean of the data samples, \bar{y} is the mean of the predicted variables, and n is the total number of time slots. Replacing the values of P_{xy} and P_{xx} in Eq. 2, we get

c_1 = \frac{\sum_{p=1}^{n} (x_p - \bar{x})(y_p - \bar{y})}{\sum_{p=1}^{n} (x_p - \bar{x})^2}    (5)
Equation 5 is the final derived equation of the slope of the line. The final derived equation of the threshold error (e) can be expressed in Eq. 6 as:

e = c_1 (x - \bar{x})    (6)
The threshold error e is actually the vertical distance from each data point projected onto the best-fit line. In our proposed ADP model, we have considered a threshold error of RSSI of 2 dBm. The main objective of our proposed ADP model is to identify active channels (free channels) or busy channels by using training data. The secondary user (SU) learns from past information and trains on the data based on that past knowledge, so the ADP model can mitigate the computational delay problem in the network without performing the sensing operation; the sensing operation takes a long time and causes considerable delay in the network. In the ADP model, if the threshold error lies within the specified value, then the channel in the particular time slot is considered an active channel; otherwise, the channel is considered a busy channel. Also, if the RSSI value of the channel is high, then the distortion of the signal in the channel will be less and the signal-to-noise ratio (SNR) will improve, which can reduce the transmission delay problem in the network to a great extent. However, sometimes secondary users (SUs) fail to detect the primary user (PU) due to a high channel interference problem. Hence, in our work, the fuzzy c means clustering (FCM) algorithm has been incorporated to improve the channel selection accuracy of the proposed ADP model. Figure 1 shows the schematic representation of the FCM algorithm. FCM is a clustering-based machine learning (ML) technique that works by initializing a membership value for each data sample [8]. Here, the data samples are the channels in each time slot. The set of time slots is denoted as m, m ∈ {1, 2, 3, . . . , n}, where n is the total number of time slots. The channels in each time slot have been categorized into several clusters, and each cluster has a center point or centroid. The set of cluster centers is represented as c, c ∈ {1, 2, 3, . . . , g}, where g is the total number of cluster centers. In our simulation, we have considered two clusters. The fuzzy membership value β varies based on the Euclidean distance between the data point and the centroid. The fuzzy membership value can be represented in Eq. 7 as:
Fig. 1 Fuzzy c means clustering algorithm [8]

\beta_{ma} = \left[ \sum_{c=1}^{g} \left( h_{ma}^{2} / h_{mc}^{2} \right)^{1/(z-1)} \right]^{-1}    (7)
Here, h_{ma} is the Euclidean distance between the mth data point and the ath cluster center, and z is the fuzziness parameter (generally considered as 2 or 3); in our work, we have taken z as 2. The cluster center (χ_a) can be calculated in Eq. 8 as:

\chi_a = \frac{\sum_{m=1}^{n} \beta_{ma}^{z} x_m}{\sum_{m=1}^{n} \beta_{ma}^{z}}    (8)
where a ∈ {1, 2, . . . , g} and x_m is the mth data point. The fuzzy membership values as well as the cluster centers must be updated after each iteration to obtain better results; in our work, we have executed 20 iterations to get accurate results. In FCM, the objective function must be reduced to acquire good membership values. The objective function can be demonstrated in Eq. 9 as:

F = \sum_{m=1}^{n} \sum_{a=1}^{g} \beta_{ma}^{z} \, \| x_m - \chi_a \|^{2}    (9)
If the distance between the mth data point and the ath cluster center decreases, then overlapping of clusters occurs; therefore, there is a high probability that the data point belongs to more than one cluster. The overlapping of clusters leads to accurate predictions. In our simulation, the lower threshold of β has been considered as 0.5, whereas the upper threshold of β has been considered as 0.6, to obtain overlapped data points. Therefore, learning-based channel selection and allocation can be enhanced by using FCM. Figure 2 exhibits the flowchart representation of the FCM algorithm.
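To make the update rules of Eqs. (7)–(9) concrete, a minimal sketch is given below. The two clusters, the fuzziness z = 2, and the 20 iterations follow the text; the function and variable names, the random initialization, and the NumPy implementation are our own illustrative choices, not the authors' code.

```python
import numpy as np

def fcm(x, g=2, z=2, n_iter=20, seed=0):
    """Minimal fuzzy c-means on 1-D samples x (e.g. per-slot RSSI values).

    beta[m, a] is the membership of sample m in cluster a (Eq. 7) and
    centers[a] is the cluster centroid (Eq. 8); alternating the two updates
    drives down the objective of Eq. 9.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    beta = rng.dirichlet(np.ones(g), size=len(x))       # random memberships, rows sum to 1
    for _ in range(n_iter):
        w = beta ** z
        centers = (w * x[:, None]).sum(axis=0) / w.sum(axis=0)   # Eq. (8)
        d2 = (x[:, None] - centers[None, :]) ** 2 + 1e-12        # h_ma^2
        inv = d2 ** (-1.0 / (z - 1))
        beta = inv / inv.sum(axis=1, keepdims=True)              # Eq. (7)
    return beta, centers

# Example: the eight active-channel RSSI values later reported in Table 1 (dBm)
rssi = [-65.3, -64.2, -58.3, -57.5, -61.0, -63.4, -58.0, -57.6]
memberships, centroids = fcm(rssi)
```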
Algorithm 1 Analytical data prediction model
1. Assign the pth channel to m time slots, m ∈ {1, 2, . . . , n}, where n is the total number of time slots. The secondary user learns from the past data in each time slot to identify whether the channel is free or busy
2. while p = 1 do {
3. for (m = 1 to n) {
4. Estimate the best-fit line by using the equation y = c_0 + c_1 x + e, where y is the RSSI of the channel and x is the time slot
5. Initialize the vertical distance as d_m, ∀m
6. Estimate d_m^2, ∀m
7. Calculate the computational error e
8. e ← min d_m^2
9. Calculate the slope of the best-fit line (c_1) by using Eq. 5
10. Calculate the intercept (c_0) of the best-fit line from Eq. 1
11. Put the values of c_0, c_1, and e in step 4 to obtain the predicted output
12. if (e ≤ 2) the channel is considered an active channel
13. else the channel is considered a busy channel
} # end of for loop
} # end of do-while loop
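A compact reading of Algorithm 1 in code is sketched below. The best-fit line of Eqs. (1) and (5) and the 2 dBm decision threshold come from the paper; the absolute-deviation error measure and the standard least-squares intercept c_0 = ȳ − c_1 x̄ are our interpretation of steps 5–11, which the paper only sketches, so this snippet is illustrative and is not guaranteed to reproduce Table 1 exactly.

```python
import numpy as np

def adp_classify(time_slots, rssi_dbm, threshold_dbm=2.0):
    """Fit y = c0 + c1*x to past RSSI samples and label each slot as
    active ('A') when its deviation from the line is within the threshold
    error (2 dBm in the paper), otherwise busy ('B')."""
    x = np.asarray(time_slots, dtype=float)
    y = np.asarray(rssi_dbm, dtype=float)
    c1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)  # Eq. (5)
    c0 = y.mean() - c1 * x.mean()                    # least-squares intercept for Eq. (1)
    err = np.abs(y - (c0 + c1 * x))                  # vertical distance from the best-fit line
    return ["A" if e <= threshold_dbm else "B" for e in err]

# Input RSSI values of the 18 time slots listed in Table 1 (dBm)
rssi = [-68.5, -56.2, -68.2, -65.8, -65.3, -64.2, -58.3, -57.5, -50.2,
        -57.0, -61.0, -63.4, -65.0, -58.0, -66.5, -57.6, -67.5, -55.0]
print(adp_classify(range(1, 19), rssi))
```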
4 Result and Analysis For the proposed analytical data prediction model described in Algorithm 1, we have performed a QUALNET simulation to select reliable free channels for secondary allocation. We have taken 3000 samples to obtain accurate results. In the proposed system model, we have observed the behavior of a single channel in 18 different time slots. The RSSI has been analyzed in each time slot to estimate the channel performance.
4.1 Reliable Channel Detection by Using the Proposed ADP Model In Fig. 3a, a multi-slot channel selection scheme is depicted. The threshold error of RSSI has been kept within 2 dBm. The graphical analysis exhibits that the channel in time slots (5, 6, 7, 8, 11, 12, 14, 16) is suitable for secondary user allocation, since the squared vertical distances of the channel points from the best-fit regression line are within 2 dBm (as per Algorithm 1). The vertical distance represents the deviation, or computational error. Table 1 gives the summary of active and busy channels with respect to RSSI and time slot. In Table 1, channel state 'A' represents an active channel and channel state 'B' represents a busy channel. In the proposed ADP model, we have taken input RSSI values for 18 time slots. We have considered an RSSI range of − 70 to − 50 dBm since this range is suitable
Fig. 2 Flowchart representation of the FCM algorithm

Fig. 3 a Channel detection by using the proposed ADP model (RSSI versus time slot) and b prediction of the RSSI of the channel in each time slot (actual versus predicted RSSI, with active and busy channels marked)
Table 1 Summary of active channels and busy channels with respect to RSSI

Time slot     1          2          3          4          5          6
RSSI (dBm)    −68.5 (B)  −56.2 (B)  −68.2 (B)  −65.8 (B)  −65.3 (A)  −64.2 (A)

Time slot     7          8          9          10         11         12
RSSI (dBm)    −58.3 (A)  −57.5 (A)  −50.2 (B)  −57 (B)    −61 (A)    −63.4 (A)

Time slot     13         14         15         16         17         18
RSSI (dBm)    −65 (B)    −58 (A)    −66.5 (B)  −57.6 (A)  −67.5 (B)  −55 (B)

(A = active channel, B = busy channel)
Fig. 4 a Comparative analysis of our proposed ADP model with the DNN algorithm [21] and RLNN [22] (predicted error versus time slot) and b mitigation of the channel interference problem by using the FCM algorithm (RSSI of active channels versus time slot, showing Cluster 1, Cluster 2, and their centroids)
for proper cognitive communication. It is clearly visualized from Fig. 3 that the channels with RSSI values of − 65.3, − 64.2, − 58.3, − 57.5, − 61, − 63.4, − 58, and − 57.6 dBm are suitable for secondary allocation. To further verify the accuracy of the proposed model, we have applied the 18 RSSI values as inputs for training the model. Iterations have been performed 2000 times to obtain accurate results, and the error goal has been adjusted to ± 2. Figure 3b depicts the predicted RSSI output for active channels and busy channels in each time slot. In Fig. 3b, we can observe that the predicted RSSI output is very close to the actual RSSI input; therefore, the prediction error is very small. This has enhanced the efficiency of our proposed ADP model. Figure 4a shows the comparative analysis of our proposed ADP model with respect to other existing models.
4.2 Comparative Analysis of Our Proposed ADP Model Figure 4a exhibits the comparative analysis of predicted error between the proposed analytical data prediction (ADP) model and other existing works such as deep neural
network algorithm (DNN) [21] and reinforcement learning neural network (RLNN) [22]. In Fig. 4a, it can be observed that the ADP model produces much less error in comparison with the DNN and RLNN algorithms. In the case of the ADP model, the maximum testing error is 2.1%, whereas in the case of the DNN and RLNN algorithms the maximum testing errors are 33.28% and 53.23%, respectively. Our proposed ADP model has shown better validation accuracy due to its high learning capability gained by gathering past data information. Table 2 gives a comparison of the proposed ADP model with previously existing algorithms such as DNN and RLNN based on prediction accuracy. However, to mitigate the channel interference problem in the network, the fuzzy c means clustering (FCM) algorithm has been implemented in Fig. 4b. Here, the channels in each time slot have been divided into two clusters (Cluster 1 and Cluster 2), and each cluster has a cluster center (or centroid). If the membership value is between 0.5 and 0.6, then there is a probability that a channel point belongs to two clusters; the overlapping data gives good prediction and high accuracy. In Fig. 4b, the RSSI of the active channels is analyzed by using the clustering technique. It has been computed from the figure that the active channels in time slots 7, 8, and 12 have very good membership values of 0.5387, 0.6221, and 0.6350, respectively. Therefore, these channels suffer less interference and thus can be allocated to secondary users (SUs). Table 3 gives the allocation possibilities of the channels based on the membership values.

Table 2 Comparative analysis of models based on prediction accuracy

Feature                                                         ADP model (implemented)   DNN model [21]   RLNN model [22]
Predicted time slot estimation for active channels in the CRN   97.9%                     53.23%           60%
Table 3 Summary of channel allocation based on membership value

Active channel time slot    Membership value (β)    Final decision
5                           0.9475                  Not allocated
6                           0.9897                  Not allocated
7                           0.5387                  Allocated
8                           0.6221                  Allocated
11                          0.826                   Not allocated
12                          0.635                   Allocated
14                          0.9628                  Not allocated
16                          0.8988                  Not allocated
5 Conclusion A learning-based channel selection model has been incorporated in this paper to enhance the performance of CRNs. The ADP model has been proposed to perform reliable channel selection for secondary communication, and it has considerably reduced the computational as well as transmission delay in the network. The efficiency of the proposed model has been enhanced by applying the FCM algorithm. The graphical analysis depicts that the active channels in time slots 7, 8, and 12, with RSSI values of − 58.3, − 57.5, and − 63.4 dBm, respectively, are suitable for secondary allocation with a prediction accuracy of 99.1%. MATLAB and QUALNET simulators have been used to execute the network simulations in this paper. In the future, the proposed model may be implemented for the successful deployment of non-orthogonal-based 5G cognitive cellular networks.
References 1. Sharma V, Joshi S (2018) A literature review on spectrum sensing in cognitive radio applications. In: Second international conference on intelligent computing and control systems (ICICCS), pp 883–893 2. Quan Z, Cui S, Sayed AH (2008) Optimal linear cooperation for spectrum sensing in cognitive radio networks. IEEE J Selected Topics Signal Process 2(1):28–40 3. Akyildiz IF, Lee WY, Vuran MC, Mohanty S (2006) NeXt generation/dynamic spectrum access/ cognitive radio wireless networks: a survey. Comput Netw 50(13):2127–2159 4. Mandal A, Chatterjee S (2017) A comprehensive study on spectrum sensing and resource allocation for cognitive cellular network. In: Devices for integrated circuit, pp 100–102 5. Joarder P, Chatterjee S (2018) Resource allocation in cognitive cellular hybrid network using particle swarm optimization. Int J Comput Sci Eng 6(5):744–749 6. Hassan Y, El-Tarhuni M, Assaleh K (2012) Learning-based spectrum sensing for cognitive radio systems. J Comput Net Commun 2012:1–13 7. Rehman K, Ullah I, Habib M (2019) Detail survey of cognitive radio communication system, pp 1–11 8. Paul A, Maity SP (2017) On energy efficient cooperative spectrum sensing using possibilistic fuzzy c-means clustering. In: International conference on computational intelligence, communications, and business analytics, pp 382–396 9. Yin S, Chen D, Zhang Q, Li S (2010) Prediction-based throughput optimization for dynamic spectrum access. IEEE Trans Veh Technol 60(3):1284–1289 10. Loganathan J, Janakiraman S (2016) Improved history based channel allocation scheme for cognitive radio networks. In: 2016 world conference on futuristic trends in research and innovation for social welfare (startup conclave), pp 1–8 11. Chatziantoniou E, Allen B, Velisavljevic V (2013) An HMM-based spectrum occupancy predictor for energy efficient cognitive radio. In: IEEE 24th international conference on personal indoor and mobile radio communications, pp 601–605 12. Sharma V, Joshi S (2018) A literature review on spectrum sensing in cognitive radio applications. In: 2018 second international conference on intelligent computing and control systems (ICICCS), pp 883–893 13. Fan R, An J, Jiang H, Bu X (2016) Adaptive channel selection and slot length configuration in cognitive radio. Wirel Commun Mob Comput 16(16):2636–2648
14. Supraja P, Gayathri VM, Pitchai R (2019) Optimized neural network for spectrum prediction using genetic algorithm in cognitive radio networks. Clust Comput 22(1):157–163 15. Xing H, Qin H, Luo S, Dai P, Xu L, Cheng X (2022) Spectrum sensing in cognitive radio: a deep learning based model. Trans Emerg Telecommun Technol 33(1):e4388 16. Danesh K, Vasuhi S (2021) An effective spectrum sensing in cognitive radio networks using improved Convolution Neural Network by glow worm swarm algorithm. Trans Emerg Telecommun Technol 32(11):1–20 17. Lee W, Kim M, Cho DH (2019) Deep cooperative sensing: cooperative spectrum sensing based on convolutional neural networks. IEEE Trans Veh Technol 68(3):3005–3009 18. Raj V, Kalyani S (2018) Back propagating through the air: deep learning at physical layer without channel models. IEEE Commun Lett 22(11):2278–2281 19. Mao Q, Hu F, Hao Q (2018) Deep learning for intelligent wireless networks: a comprehensive survey. IEEE Commun Surv Tutor 20(4):2595–2621 20. Uyanık GK, Güler N (2013) A study on multiple linear regression analysis. Proc Soc Behav Sci 106:234–240 21. Muteba KF, Djouani K, Olwal TO (2020) Deep reinforcement learning based resource allocation for narrowband cognitive radio-IoT systems. Proc Comput Sci 175:315–324 22. Ferreira PVR, Paffenroth R, Wyglinski AM, Hackett TM, Bilén SG, Reinhart RC, Mortensen DJ (2017) Multi-objective reinforcement learning-based deep neural networks for cognitive space communications. In: 2017 cognitive communications for aerospace applications workshop (CCAA), pp 1–8
An Ultrathin Multifunctional Polarization Converting Metasurface with Wide Angular Stability Soumendu Ghosh, Sourav Roy, Moirangthem Santoshkumar Singh, and Abhishek Sarkhel
Abstract A polarization converter is a device that can change the polarization state of an electromagnetic wave for several applications such as satellite communication and radar stealth. In this work, a multifunctional polarization converter on an ultrathin metasurface for simultaneous linear to circular (L-t-C) and linear to linear (L-t-L) polarization conversion is proposed for microwave frequency applications. The unit cell geometry of the anisotropic metasurface comprises a diagonal fence-shaped structure on top of a thin dielectric substrate, whereas a continuous metallic layer is printed on the back of the substrate. The proposed metasurface demonstrates highly efficient linear to linear polarization conversion over the frequency bands of 6.27–6.56 and 9.18–9.82 GHz. Simultaneously, it converts linear polarization to circular polarization over the frequency bands of 6.00–6.17, 6.71–8.93, and 10.03–10.52 GHz. The theoretical mechanism of the polarization conversion is explained here through surface current distribution and u–v phase response analysis. Moreover, the proposed metasurface shows relatively stable performance under oblique incidence up to a 45° angle of incidence for several practical applications such as multi-polarization communication devices and radar cross-section reduction. Keywords Multifunctional metasurface · Polarization converter · Broadband circular polarization · Linear polarization
S. Ghosh (B) · M. S. Singh · A. Sarkhel Department of ECE, National Institute of Technology, Meghalaya, India e-mail: [email protected] S. Roy Electronics and Telecommunication Engineering, North Tripura District Polytechnic College, Tripura, India © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 N. Das et al. (eds.), Proceedings of International Conference on Data, Electronics and Computing, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-99-1509-5_45
1 Introduction Artificially engineered metamaterials have drawn a significant amount of attention in several applications, such as absorbers [1] and metamaterial antennas [2], due to their unusual electromagnetic properties. Very recently, the planar counterpart of the metamaterial, called the metasurface [3], has been gaining popularity due to its easy fabrication and device integration. The metasurface also has several interesting properties for different practical applications. Among them, polarization conversion is one of the fascinating properties of the metasurface, by which the polarization state of an electromagnetic wave can be manipulated. In [4], Huang et al. proposed a strip- and ring-shaped structure-based metasurface for dual-band linear to linear polarization conversion. Recently, Kamal et al. proposed a split-ring-resonator-based metasurface for dual bands of linear to orthogonal linear polarization conversion in [5]. Apart from dual-band linear polarization conversion, in [6], Karamirad et al. proposed a wideband L-t-L polarization converter based on an oval-shaped metasurface. Besides L-t-L polarization converters, in [7], a work on L-t-C polarization conversion using a slot-type frequency selective surface was demonstrated by Clendinning et al. Later on, Baghel et al. developed an L-t-C polarization converter metasurface for the 8.8 GHz frequency [8]. Recently, dual bands of L-t-C polarization conversion using an anisotropic metasurface were presented by Fahad et al. [9]. However, very few works on multifunctional polarization converters for simultaneous linear to linear and linear to circular polarization conversion have been reported [10, 11]. In this context, it is worth mentioning that they only depict a single band of linear to linear and dual narrow bands of linear to circular polarization conversion [10, 11]. Moreover, low structural thickness is also an essential parameter for practical applications, and both multifunctional metasurfaces [10, 11] have been developed on relatively thick substrates. In this regard, to address the previously mentioned problems, we have proposed a multifunctional polarization converter for simultaneous linear to circular and linear to linear polarization conversion with wide angular stability. The proposed metasurface depicts dual bands of L-t-L polarization conversion at 6.27–6.56 and 9.18–9.82 GHz. Simultaneously, the proposed ultrathin converter presents broadband L-t-C polarization conversion in the 6.71–8.93 GHz frequency band, and it also depicts linear to circular polarization conversion in the 6.00–6.17 and 10.03–10.52 GHz frequency bands. Moreover, the proposed converter is developed on a thin dielectric substrate, which makes it suitable for various practical applications such as multi-polarization communication devices and radar cross-section reduction.
2 Structural Geometry The schematic structure of the proposed ultrathin metasurface, composed of 12 × 12 unit cells, and the structure of the unit cell with its side view are depicted in Fig. 1. The metasurface consists of a diagonal fence-shaped unit cell structure on top
Fig. 1 Design of the ultrathin metasurface and its unit cell geometry and its side view
and a continuous PEC (metal) ground layer on the bottom of the FR4 epoxy dielectric material (ε_r = 4.4 and tan δ = 0.02). The optimized structural parameters are M1 = 1.10 mm, M2 = 1.00 mm, M3 = 1.80 mm, M4 = 6.00 mm, N1 = 1.00 mm, N2 = 10.20 mm, N3 = 2.00 mm, N4 = 1.60 mm. The periodicity and the dielectric substrate height are P = 10 mm and t = 1.4 mm, respectively.
3 Metasurface Analysis and Discussion The performance of the polarization converting metasurface is primarily represented by its reflection behavior in terms of the co-polarized and cross-polarized reflections and their respective reflection phases. In order to obtain the reflection behavior, the unit cell geometry of the metasurface is numerically simulated in the ANSYS HFSS simulator using suitable excitation and periodic boundary conditions. The co-polarized and cross-polarized reflection behavior is observed to determine the L-t-L polarization conversion as well as the L-t-C polarization conversion [10]. The reflection coefficient performance (both co- and cross-polarized reflection) is shown in Fig. 2, and the respective reflection phase behavior is depicted in Fig. 3. The co- and cross-polarized reflections are denoted by R_yy and R_xy, respectively, and their respective reflection phases are signified by φ_yy and φ_xy. Figure 2 shows high cross-polarized reflection (R_xy ≥ −3 dB) and low co-polarized reflection (R_yy ≤ −10 dB) over the frequency bands of 6.27–6.56 and 9.18–9.82 GHz. The high R_xy (R_xy ≥ −3 dB) and low R_yy (R_yy ≤ −10 dB) confirm linear to linear polarization conversion with high efficiency in that frequency region. The L-t-L polarization conversion can be further explained by the polarization conversion ratio (PCR), which can be derived as PCR = |r_{yx}|^2 / (|r_{yx}|^2 + |r_{xx}|^2) [11]. The calculated PCR, as shown in Fig. 4, further ensures highly efficient L-t-L polarization conversion over the wideband 6.27–6.56 and 9.18–9.82 GHz frequency bands of interest.
Fig. 2 Reflection co-efficient of the proposed ultrathin metasurface
Along with linear to linear polarization conversion, linear to circular polarization conversion is another important characteristic of a multifunctional polarization converter. The two reflection components, i.e., R_yy and R_xy, show nearly equal magnitude in the 6.00–6.17, 6.71–8.93, and 10.03–10.52 GHz frequency bands, as shown in Fig. 2. Moreover, an nπ/2 (where n = odd integer) reflection phase difference between the co-polarized reflection phase φ_yy and the cross-polarized reflection phase φ_xy is observed in the 6.00–6.17, 6.71–8.93, and 10.03–10.52 GHz frequency bands of interest, as shown in Fig. 3. The two nearly equal reflection components with an nπ/2 phase difference confirm L-t-C polarization conversion in those frequency regions. The L-t-C polarization conversion can be further clarified through the axial ratio (AR), which can be defined as

AR = \left( \frac{|r_{xx}|^2 + |r_{yx}|^2 + \sqrt{a}}{|r_{xx}|^2 + |r_{yx}|^2 - \sqrt{a}} \right)^{0.5}

where a = |r_{xx}|^4 + |r_{yx}|^4 + 2 |r_{xx}|^2 |r_{yx}|^2 \cos(2\Delta\phi_{yx}) and \Delta\phi_{yx} = \phi_{xx} - \phi_{yx} [9]. The axial ratio performance (AR ≤ 3 dB) in Fig. 5 further confirms the linear to circular polarization conversion in the 6.00–6.17, 6.71–8.93, and 10.03–10.52 GHz frequency regions. Moreover, the performance of the proposed polarization converter is observed under different oblique incidences to examine its applicability in several practical scenarios. The PCR and AR under different oblique incidences from 0° to 60° are depicted in Figs. 6 and 7, respectively. The linear to linear polarization conversion performance does not reduce significantly at higher oblique incidence, as shown in Fig. 6. Moreover, the axial ratio performance does not change much up to 30° of oblique incidence, as shown in Fig. 7. However, the axial ratio performance for circular polarization shows some deviation, with relatively stable performance at higher oblique incidence in the previously mentioned frequency regions of interest. These high angular stabilities make the metasurface a good contender for several interesting practical applications.
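Both figures of merit can be evaluated directly from the simulated complex reflection coefficients. The short post-processing sketch below follows the PCR and AR expressions quoted above; it is an illustrative script (the function names and the sanity-check values are ours), not part of the authors' HFSS workflow.

```python
import numpy as np

def pcr(r_yx, r_xx):
    """Polarization conversion ratio from cross- and co-polarized reflection."""
    return np.abs(r_yx) ** 2 / (np.abs(r_yx) ** 2 + np.abs(r_xx) ** 2)

def axial_ratio_db(r_xx, r_yx):
    """Axial ratio (dB) of the reflected wave from the two complex reflection
    coefficients, following the AR expression quoted in the text."""
    mx2, my2 = np.abs(r_xx) ** 2, np.abs(r_yx) ** 2
    dphi = np.angle(r_xx) - np.angle(r_yx)
    a = mx2 ** 2 + my2 ** 2 + 2 * mx2 * my2 * np.cos(2 * dphi)
    ar = np.sqrt((mx2 + my2 + np.sqrt(a)) / (mx2 + my2 - np.sqrt(a)))
    return 20 * np.log10(ar)

# Sanity checks: equal magnitudes with a 90 deg phase offset give AR ~ 0 dB
print(axial_ratio_db(1.0 + 0j, 1j))      # circular polarization -> ~0 dB
print(pcr(r_yx=0.95, r_xx=0.1))          # strong cross-pol reflection -> PCR ~ 0.99
```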
Fig. 3 Reflection phase behavior of the proposed ultrathin metasurface
Fig. 4 The calculated frequency versus PCR of the proposed ultrathin metasurface
Fig. 5 Calculated frequency versus AR of the proposed polarization converter
Fig. 6 Calculated PCR under different oblique incidence
Fig. 7 Calculated axial ratio (AR) under different oblique incidence
Fig. 8 Reflection co-efficient and reflection phase under u- and v-polarization
Fig. 9 Surface current distribution a upper layer at 6.4 GHz, b lower layer at 6.4 GHz, c upper layer at 9.5 GHz, d lower layer at 9.5 GHz
4 Theoretical Aspect To illustrate the theoretical aspect of the proposed polarization conversion, the incident electric field is decomposed into u- and v-axis components, which lie at ∓45° with respect to the x- and y-axes. Both co-polarized reflection coefficients, i.e., R_uu and R_vv, are nearly equal over the operating frequency band, as shown in Fig. 8. Moreover, in the 6.27–6.56 and 9.18–9.82 GHz frequency bands, a 180° phase difference between the u- and v-component reflection phases is observed in Fig. 8, which fulfills the criterion for linear to linear polarization conversion. Besides, in the 6.00–6.17, 6.71–8.93, and 10.03–10.52 GHz frequency bands, around a 90° phase variation between the u- and v-reflection phases is observed in Fig. 8, which satisfies the requirement for linear to circular polarization conversion. Furthermore, Fig. 9 shows the surface current distribution of the upper and lower layers at the resonances, i.e., 6.4 and 9.5 GHz. The upper- and lower-layer surface currents flow in opposite directions at both 6.4 and 9.5 GHz. These opposite surface currents indicate a magnetic dipole resonance, which results in L-t-L polarization conversion in the above frequency regions [5].
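The u–v description maps back to the co- and cross-polarized quantities of Sect. 3. Assuming the u- and v-axes are uncoupled eigen-axes of the anisotropic surface (i.e., R_uv = R_vu = 0), a y-polarized incidence gives r_yy = (r_uu + r_vv)/2 and r_xy = (r_uu − r_vv)/2, which is a quick way to see why a 180° u–v phase difference yields linear cross-polarized reflection and a 90° difference yields circular polarization. The small numeric illustration below is ours, under that no-cross-coupling assumption.

```python
import numpy as np

def yy_xy_from_uv(r_uu, r_vv):
    """Map the u/v eigen-reflections to co- (r_yy) and cross- (r_xy) polarized
    reflection for y-polarized incidence, assuming no u-v cross-coupling."""
    return (r_uu + r_vv) / 2, (r_uu - r_vv) / 2

# 180 deg u-v phase difference -> pure cross-polarized (linear) reflection
print(yy_xy_from_uv(1.0 + 0j, np.exp(1j * np.pi)))       # ~(0, 1)
# 90 deg u-v phase difference -> |r_yy| = |r_xy| with 90 deg offset (circular)
print(yy_xy_from_uv(1.0 + 0j, np.exp(1j * np.pi / 2)))   # magnitudes ~0.707 each
```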
5 Conclusion In this article, we have proposed an ultrathin metasurface for simultaneous L-t-L and L-t-C polarization conversion with wide angular stability. The proposed ultrathin metasurface depicts improved linear to circular polarization conversion bandwidth with reduced structural thickness as compared to [10, 11]. Moreover, its high angular stability can fulfill many criteria for different practical applications such as radar cross-section reduction and multi-polarization communication devices.
References 1. Sarkhel A, Chaudhuri SRB (2017) Compact quad-band polarization-insensitive ultrathin metamaterial absorber with wide angle stability. IEEE Antennas Wirel Propag Lett 16:3240–3244 2. Ghosh S, Roy S, Chakraborty U, Sarkhel A (2022) An ELC meta-resonator inspired wideband wearable MIMO antenna system for biotelemetry applications. J Electromagn Waves Appl 36(8):1113–1129 3. Holloway CL, Kuester EF, Gordon JA, O’Hara J, Booth J, Smith DR (2012) An overview of the theory and applications of metasurfaces: the two-dimensional equivalents of metamaterials. IEEE Antennas Propag Mag 54(2):10–35 4. Huang X, Yang H, Zhang D, Luo Y (2019) Ultrathin dual-band metasurface polarization converter. IEEE Trans Antennas Propag 67(7):4636–4641 5. Kamal B, Chen J, Yin Y, Ren J, Ullah S, Ali U (2021) Design and experimental analysis of dual-band polarization converting metasurface. IEEE Antennas Wirel Propag Lett 20(8):1409– 1413 6. Karamirad M, Ghobadi C, Nourinia J (2020) Metasurfaces for wideband and efficient polarization rotation. IEEE Trans Antennas Propag 69(3):1799–1804 7. Clendinning S, Cahill R, Zelenchuk D, Fusco V (2019) Bandwidth optimization of linear to circular polarization convertors based on slot FSS. Microw Opt Technol Lett 61(5):1200–1207 8. Baghel AK, Kulkarni SS, Nayak SK (2019) Linear-to-cross-polarization transmission converter using ultrathin and smaller periodicity metasurface. IEEE Antennas Wirel Propag Lett 18(7):1433–1437 9. Fahad AK, Ruan C, Nazir R, Haq TU, He W (2020) Dual-band ultrathin meta-array for polarization conversion in ku/ka-band with broadband transmission. IEEE Antennas Wirel Propag Lett 19(5):856–860 10. Pouyanfar N, Nourinia J, Ghobadi C (2021) Multiband and multifunctional polarization converter using an asymmetric metasurface. Sci Rep 11(1):1–15 11. Nguyen TKT, Nguyen TM, Nguyen HQ, Cao TN, Le DT, Bui XK, Bui ST, Truong CL, Vu DL, Nguyen TQH (2021) Simple design of efficient broadband multifunctional polarization converter for x-band applications. Sci Rep 11(1):1–12
Author Index
A AbhinavRaj Gautam, 259 Abhishek Sarkhel, 415, 491 Abira Dey, 221 Achisman Kundu, 303 Aheibam Dinamani Singh, 457, 467 Ahmet Kati, 221 Ajoy Dey, 231 Akash Kumar Bhagat, 259 Alak Kumar Datta, 51 Alokesh Mondal, 371 Alongbar Wary, 435 Amartya Paul, 333 Amos Bortiew, 97 Anirban Samanta, 371 Anup Kumar Halder, 241 Arindam Dey, 435 Ashwani Sharma, 221 Atri Adhikari, 259 Avra Ghosh, 231 Ayatullah Faruk Mollah, 107, 129
B Banani Saha, 85 Barnali Chakraborty, 259 Barun Barua, 37, 185 Bhubneswar Das, 19 Bhuyan, M. K., 139, 185, 445 Bijoyini Bagchi, 313 Bishnu Sharma, 37 Bronson Syiem, 59
C Chakraborty, C., 37 Chandan Bandyopadhyay, 435 Chinmay Debnath, 51 Christian Kollmann, 313
D Darilangi S. Lyngdoh, 361 Dariusz Plewczynski, 241, 251 Debajit Sarma, 139, 445 Debashri Mondal, 279 Debotosh Bhattacharjee, 313, 321 Debraj Chakraborty, 399 Deep Baishya, 381 Dhruba K. Bhattacharyya, 29 Dipannita Banerjee, 269
F Fairriky Rynjah, 59
G Gargi Bandhyopadhay, 371 Genevieve Chyrmang, 445
H Hafizur Rahaman, 435 Hasin A. Ahmed, 29
J Jayanta Das, 321
Jayeeta Saha, 153 Jean Bernard Idoipe, 221 Joyprakash Singh, L., 59 Juwesh Binong, 341
K Kalpita Dutta, 173 Kangkana Bora, 37, 185, 445 Kasmika Borah, 185 Kaushiki Roy, 313 Kaustav Sengupta, 241, 251 Khan Masood Parvez, 423 Khiakupar Jyndiang, 59 Krishna Daripa, 51 Kungnor Rangpi, 37
L Laxmikant Minz, 423 Lipi B. Mahanta, 37, 185
M Madhurima Chattopadhyay, 399 Mahantapas Kundu, 173 Mala, R., 37 Marut Deo Sharma, 341 Md Mobbasher Ansari, 67 Mita Nasipuri, 67, 129, 173, 241 Moinul Haque, SK., 423 Moirangthem Santoshkumar Singh, 491 Moumita Mukherjee, 399
N Natasha Kholgade Banerjee, 117 Nathalie Larzat, 221 Nayana Dey, 201 Nayanjyoti Mazumdar, 3 Nazrul Hoque, 29 Nemai Roy, 303 Nibaran Das, 67, 173, 279, 289 Niratyay Biswas, 399 Nirmal Das, 269
O Oindrila Ghosh, 279
P Pabitra Roy, 289 Pal Choudhury, J., 165
Pankaj Kumar Deva Sarma, 3 Pankaj Sarkar, 333, 387 Piyali Chatterjee, 241, 259 Prabir Banerjee, 479 Prahlad Borah, 37 Pramit Ghosh, 201 Pritiman Sikder, 303 Priti Shaw, 67
R Rafał Chabasiński, 251 Rahul Dhabbal, 371 Rajdeep Sarkar, 371 Rajkishur Mudoi, 361, 381 Raju Hazari, 435 Rakesh Das, 435 Raunak Roy, 371 Ritu Mondal, 51 Rowsonara Begum, 107 Rudrajit Bhattacharyya, 67 Ruoya Li, 221
S Sabyasachi Chatterjee, 479 Sandip Rakshit, 77 Sangeeta Das, 387 Santoshkumar Singh Moirangthem, 415 Sean Banerjee, 117 Shauvik Paul, 269 Sheli Sinha Chaudhuri, 231 Shilpi Naskar, 153 Showmik Bhowmik, 303 Shrayasi Datta, 165 Shyamali Mitra, 289 Sichao Li, 117 Sisira Hawaibam, 457, 467 Smriti Kumar Sinha, 19 Soma Hazra, 85 Soumee Mukherjee, 269 Soumendu Ghosh, 415, 491 Soumi Paul, 129 Soumyabrata Dey, 117 Soumyajyoti Dey, 279 Soumya Nasipuri, 279 Soumyendu Sekhar Bandyopadhyay, 241 Soupik Chowdhury, 173 Sourav Kumar, 67 Sourav Pramanik, 321 Sourav Roy, 415, 491 Sovan Saha, 259 Subhabrata Dhar, 479
Subhadip Basu, 67, 129, 241, 269 Subhrabesh Dutta, 269 Sukanta Chakraborty, 279 Sunirmal Khatua, 85 Swarnajyoti Patra, 97
T Tapas Chakraborty, 67 Trishna Barman, 139
Tuhin Karmakar, 371
U Udochukwu Okoro, 77 Usman Ahmad Baba, 77
Y Yuji Iwahori, 139, 445