Lecture Notes in Networks and Systems 690
Ram Sarkar · Sujata Pal · Subhadip Basu · Dariusz Plewczynski · Debotosh Bhattacharjee Editors
Proceedings of International Conference on Frontiers in Computing and Systems COMSYS 2022
Lecture Notes in Networks and Systems Volume 690
Series Editor Janusz Kacprzyk, Systems Research Institute, Polish Academy of Sciences, Warsaw, Poland Advisory Editors Fernando Gomide, Department of Computer Engineering and Automation—DCA, School of Electrical and Computer Engineering—FEEC, University of Campinas—UNICAMP, São Paulo, Brazil Okyay Kaynak, Department of Electrical and Electronic Engineering, Bogazici University, Istanbul, Türkiye Derong Liu, Department of Electrical and Computer Engineering, University of Illinois at Chicago, Chicago, USA Institute of Automation, Chinese Academy of Sciences, Beijing, China Witold Pedrycz, Department of Electrical and Computer Engineering, University of Alberta, Alberta, Canada Systems Research Institute, Polish Academy of Sciences, Warsaw, Poland Marios M. Polycarpou, Department of Electrical and Computer Engineering, KIOS Research Center for Intelligent Systems and Networks, University of Cyprus, Nicosia, Cyprus Imre J. Rudas, Óbuda University, Budapest, Hungary Jun Wang, Department of Computer Science, City University of Hong Kong, Kowloon, Hong Kong
The series “Lecture Notes in Networks and Systems” publishes the latest developments in Networks and Systems—quickly, informally and with high quality. Original research reported in proceedings and post-proceedings represents the core of LNNS. Volumes published in LNNS embrace all aspects and subfields of, as well as new challenges in, Networks and Systems. The series contains proceedings and edited volumes in systems and networks, spanning the areas of Cyber-Physical Systems, Autonomous Systems, Sensor Networks, Control Systems, Energy Systems, Automotive Systems, Biological Systems, Vehicular Networking and Connected Vehicles, Aerospace Systems, Automation, Manufacturing, Smart Grids, Nonlinear Systems, Power Systems, Robotics, Social Systems, Economic Systems and other. Of particular value to both the contributors and the readership are the short publication timeframe and the world-wide distribution and exposure which enable both a wide and rapid dissemination of research output. The series covers the theory, applications, and perspectives on the state of the art and future developments relevant to systems and networks, decision making, control, complex processes and related areas, as embedded in the fields of interdisciplinary and applied sciences, engineering, computer science, physics, economics, social, and life sciences, as well as the paradigms and methodologies behind them. Indexed by SCOPUS, INSPEC, WTI Frankfurt eG, zbMATH, SCImago. All books published in the series are submitted for consideration in Web of Science. For proposals from Asia please contact Aninda Bose ([email protected]).
Editors Ram Sarkar Department of Computer Science and Engineering Jadavpur University Kolkata, India
Sujata Pal Department of Computer Science and Engineering IIT Ropar Rupnagar, Punjab, India
Subhadip Basu Department of Computer Science and Engineering Jadavpur University Kolkata, India
Dariusz Plewczynski Faculty of Mathematics and Information Sciences Warsaw University of Technology Warsaw, Poland
Debotosh Bhattacharjee Department of Computer Science and Engineering Jadavpur University Kolkata, West Bengal, India
ISSN 2367-3370 ISSN 2367-3389 (electronic) Lecture Notes in Networks and Systems ISBN 978-981-99-2679-4 ISBN 978-981-99-2680-0 (eBook) https://doi.org/10.1007/978-981-99-2680-0 © The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. The publisher, the authors, and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations. This Springer imprint is published by the registered company Springer Nature Singapore Pte Ltd. The registered company address is: 152 Beach Road, #21-01/04 Gateway East, Singapore 189721, Singapore
Preface
COMSYS-2022, the third International Conference on Frontiers in Computing and Systems, was organized by the Indian Institute of Technology-Ropar, Punjab, India, and the COMSYS Educational Trust, Kolkata, from 19 December to 21 December 2022. Like its previous two editions, COMSYS-2020 and COMSYS-2021, COMSYS-2022 offered a unique platform for scientists and researchers in computing and systems to interact, exchange scientific ideas, and present their novel contributions in front of a distinguished audience, fostering business and research collaborations. The conference accepted papers on several important and cutting-edge topics that have been grouped into six tracks: 1) Artificial Intelligence, Machine Learning, and Data Science, 2) Devices, Circuits, and Systems, 3) Computational Biology and Bioinformatics, 4) Communication Networks, Cloud Computing, and Internet of Things, 5) Image, Video, and Signal Processing, and 6) Security and Privacy. We received 132 submissions from different educational institutes and research organizations in India as well as abroad. After thorough reviews and plagiarism checking, 50 papers were accepted for oral presentation, an acceptance rate of around 38%. Accepted papers were spread over 8 technical sessions and presented at IIT-Ropar. In addition, the COMSYS-2022 technical program included three keynote lectures by eminent scientists and academicians from Germany and India and three engaging tutorial sessions by speakers from the UAE, UK and India. The overall technical program of COMSYS-2022 effectively blended a wide area of interest in computing and systems and brought together experts from both industry and academia. We are especially thankful to the submitting authors for their strong and diverse submissions, which helped the review committee members choose a strong set of technically sound research papers. A good number of students and research scholars from India and abroad also registered for the conference. COMSYS-2022 received considerable global and national attention, with technical program committee members and reviewers from 30+ different countries voluntarily participating in the technical process. Participants from 7 countries outside India and 13 different states in India attended the conference. We would like to express our sincere gratitude to all the technical program committee members and reviewers for their wholehearted cooperation and support
to complete the review process smoothly. This conference was basically an output of great teamwork. We would like to thank all for making this a lively community dedicated to the advancement of technology. COMSYS-2022 was inaugurated by the Chief Guest, Prof. Manoj Singh Gaur, Director, IIT Jammu, and the Guest of Honor, Prof. Rajeev Ahuja, Director, IIT Ropar, on 20 December 2022, in the presence of distinguished dignitaries from renowned institutions of India and abroad. In a word, it is always a team effort that defines a successful conference. We look forward to seeing all of you at the next edition of COMSYS.

Ram Sarkar, Kolkata, India
Sujata Pal, Rupnagar, India
Subhadip Basu, Kolkata, India
Dariusz Plewczynski, Warsaw, Poland
Debotosh Bhattacharjee, Kolkata, India
Contents
Machine Learning and Applications
AVENet: Attention-Based VGG 16 with ELM for Obscene Image Classification . . . 3
Sonali Samal, Shivam Pandit, Bunil Kumar Balabantaray, Arun kumar Sahani, and Rajashree Nayak
Analyzing Market Dynamics of Agricultural Commodities: A Case Study Based on Cotton . . . 13
Nikhila Korivi, Peteti Sravani, Jafar Ali, and C. C. Sobin
Feature Selection for Nepali Part-of-Speech Tagging in a Conditional Random Fields-Based System . . . 23
Pooja Rai and Sanjay Chatterji
Design and Development of a ML-Based Safety-Critical Fire Detection System . . . 35
Anukruthi Karre, Mahin Arafat, and Akramul Azim
Compressed Image Super-Resolution Using Pre-trained Model Assistance . . . 51
Umar Masud and Friedhelm Schwenker
Optimization of Character Classes in Devanagari Ancient Manuscripts and Dataset Generation . . . 59
Sonika Rani Narang, Munish Kumar, and M. K. Jindal
Deep Learning-Based Classification of Rice Varieties from Seed Coat Images . . . 71
Mondal Dhiman, Chatterjee Chandra Churh, Roy Kusal, Paul Anupam, and Kole Dipak Kumar
Leaf-Based Plant Disease Detection Using Intelligent Techniques—A Comprehensive Survey . . . 81
Sourav Chatterjee, Sudakshina Dasgupta, and Indrajit Bhattacharya
Bengali Document Retrieval Using Model Combination . . . 91
Soma Chatterjee and Kamal Sarkar
Deep Neural Networks Fused with Textures for Image Classification . . . 103 Asish Bera, Debotosh Bhattacharjee, and Mita Nasipuri Real-Time Prediction of In-Hospital Outcomes Using a Multilayer Perceptron Deployed in a Web-Based Application . . . . . . . . . . . . . . . . . . . . 113 Varun Nair, V. P. Nathasha, Uday Pratap Singh Parmar, and Ashish Kumar Sahani Automated Analysis of Connections in Model Diagrams . . . . . . . . . . . . . . . 123 Sandeep Kumar Erudiyanathan, Chikkamath Manjunath, and Gohad Atul No-Reference Image Quality Assessment Using Meta-Learning . . . . . . . . 137 Ratnadeep Dey, Debotosh Bhattacharjee, and Ondrej Kejcar Security Cryptanalysis of Markle Hellman Knapsack Cipher Using Cuckoo Search Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 147 Subinoy Sikdar, Joydeep Biswas, and Malay Kule Generating a Suitable Hash Function Using Sudoku for Blockchain Network . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 161 Sunanda Jana, Esha Sen Sharma, Abhinandan Khan, Arnab Kumar Maji, and Rajat Kumar Pal Why Traditional Group Key Management Schemes Don’t Scale in Multi-group Settings? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 173 Payal Sharma and B. R. Purushothama RESPECTO: Now We May Go Alone, Safety is in Our Hand . . . . . . . . . . 183 Kumaresh Baksi, Gargi Ghosh, Deep Banik, Soumyanil Das, Arka Saha, and Munshi Yusuf Alam Enhancing Security Mechanism of MQTT Protocol Using Payload Encryption . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 199 P. S. Akshatha and S. M. Dilip Kumar Recent Trends in Cryptanalysis Techniques: A Review . . . . . . . . . . . . . . . . 209 Subinoy Sikdar and Malay Kule A New Approach to Pharmaceutical Product Verification Using Barcode and QR Code . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 223 Prithwish Kumar Pal and Malay Kule SmartGP: A Framework for a Two-Factor Graphical Password Authentication Using Smart Devices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 231 Palash Ray, Rajesh Mukherjee, Debasis Giri, and Mahuya Sasmal
Impact of Existing Deep CNN and Image Descriptors Empowered SVM Models on Fingerprint Presentation Attacks Detection . . . . . . . . . . . 241 Jyotishna Baishya, Prasheel Kumar Tiwari, Anuj Rai, and Somnath Dey Centralized Approach for Efficient Management of Distributed Linux Firewalls . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 253 Deepika Dutta Mishra, P. Kalyan, Virender Dhakwal, and C. S. R. C. Murthy An Improved (24, 16) OLS Code for Single Error Correction-Double Adjacent Error Correction-Triple Adjacent Error Correction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 265 Sayan Tripathi, Jhilam Jana, and Jaydeb Bhaumik A Blockchain-Based Biometric Protection and Authentication Mechanism . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 275 Surbhi Sharma and Rudresh Dwivedi Circuit, Device and VLSI Optimizing Throughput Using Effective Contention Aware Adaptive Data Rate in LoRaWAN . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 291 R. Swathika and S. M. Dilip Kumar ESCA: Embedded System Configuration Assistant . . . . . . . . . . . . . . . . . . . 303 Akramul Azim and Nayreet Islam On Detection of Hardware Trojan in Memristive Nanocrossbar-Based Circuits . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 319 Subhashree Basu, Ranjit Ghoshal, and Malay Kule Robust Control of Pulsatile Ventricular Assist Devices for Patients with Advanced Heart Failure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 331 Rajarshi Bhattacharjee, Shouvik Chaudhuri, and Anindita Ganguly Performance Analysis of a Chaotic OFDM-FSO Communication System . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 343 Chinmayee Panda, Sudipta Sahoo, Urmila Bhanja, and Debajyoti Mishra Further Improved 198 nW Ultra-Low Power 1.25 nA Current Reference Circuit with an Extremely Low Line Sensitivity (0.0005%/V) and 160 ppm/◦ C Temperature Coefficient . . . . . . . . . . . . . . . . 357 Koyel Mukherjee, Soumya Pandit, and Rajat Kumar Pal Performance Analysis of Multivariate Autoregression Based EEG Data Compressor Circuit . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 369 Md. Mushfiqur Rahman Chowdhury and Shubhajit Roy Chowdhury
A New Method to Detect the Dissimilarity in the Blood Flow of Both Carotid Arteries Using Photoplethysmography . . . . . . . . . . . . . . . . 387 Kshitij Shakya and Shubhajit Roy Chowdhury A Wideband CMOS LNA Operating at 4.9–8.9 GHz Using Body Floating Technique . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 397 Abhishek Kayal, Amit Bar, Shrabanti Das, and Sayan Chatterjee A 3.6–6.3 GHz High Gain Wideband LNA Using Cascode Topology . . . . 407 Abu Hasan Gazi, Ekadashi Hembram, Shrabanti Das, and Sayan Chatterjee Deep Learning for Segmentation of Polyps for Early Prediction of Colorectal Cancer: A Prosperous Direction . . . . . . . . . . . . . . . . . . . . . . . . 415 Debapriya Banik, Ondrej Krejcar, and Debotosh Bhattacharjee Reduction of Current-Collapsing in Small Gate to Drain Length AlGaN/GaN Super Hetero-Junction HEMT for High-Frequency Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 423 N. Banu and C. Mondal Biomedical and Bioinformatics Implementation of Few Deep Learning Models to Detect Alzheimer’s Disease . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 435 Ruhul Amin Hazarika, Kiran Shyam, and Arnab Kumar Maji Cardiotocography Fetal Health Data Analysis Using Machine Learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 449 Anu Singha and Vanitha Venkateswaran Multiobjective Differential Evolution for Predicting Protein-Protein Interactions Using GO-Based Semantic Similarity Measures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 463 Anirban Mukhopadhyay and Moumita De Diagnoses of Covid-19 Using Radiographic Chest X-Ray Images Based on Deep Neural Networks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 473 Kyamelia Roy, Sheli Sinha Chaudhuri, Srijita Bandopadhyay, Ishan Jyoti Ray, Yagyashree Acharya, Somava Nath, and Soumen Banerjee DeConPPI: Deep Consensus-Based Prediction of Protein-Protein Interactions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 485 Aanzil Akram Halsana, Tapas Chakroborty, Anup Kumar Halder, and Subhadip Basu
Hypertension Prediction by Using Machine Learning Algorithm Based on Physiological Parameters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 495 Ravinder Kumar, Aman Adatia, Gurpreet Singh Wander, and Ashish Kumar Sahani LSIP: Locality Sensitive Intensity Projection for 2D Mapping of High-Res 3D Images of Dendritic Spines . . . . . . . . . . . . . . . . . . . . . . . . . . 505 Shauvik Paul, Nirmal Das, Suchandra Bose Dutta, Rayala Adityar, Tapabrata Chakraborti, Andre Zeug, and Subhadip Basu NCSML-HDTD: Network Centrality and Sequence-Based Machine Learning Methodology for Human Drug Targets Discovery of COVID-19 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 515 Shalini Jha, Chandrima Das, and Sovan Saha A Manufacturer Agnostic IoT and AI-Based System for Continuous Real-Time Recording of Parameters from a Patient Monitor Display . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 525 Vibham Kumar Dubey, Amanpreet Chander, Gurpreet Singh Wander, and Ashish Sahani A General System for Dataset Generation from Smartwatch Sensors for Biomedical Research . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 537 Rishabh Jain, Manav Mago, Vibham Kumar Dubey, V. P. Nathasha, Rahul Shukla, and Ashish Kumar Sahani OPCML Methylation and the Possibility of Breast and Ovarian Cancer: Bioinformatics and Meta Syntheses . . . . . . . . . . . . . . . . . . . . . . . . . 545 Arideepa Bhattacharjee and Amit Dutta Deep Visualisation-Based Interpretable Analysis of Digital Pathology Images for Colorectal Cancer . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 555 Alexandre Guérin, Subhadip Basu, Tapabrata Chakraborti, and Jens Rittscher A Survey on COVID-19 Lesion Segmentation Techniques from Chest CT Images . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 567 Kaushiki Roy, Debotosh Bhattacharjee, and Ondrej Krejcar Author Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 575
About the Editors
Dr. Ram Sarkar received his B. Tech degree in Computer Science and Engineering from the University of Calcutta, India in 2003. He received his M.E. degree in Computer Science and Engineering and PhD (Engineering) degree from Jadavpur University, Kolkata, India in 2005 and 2012 respectively. He joined the Department of Computer Science and Engineering of Jadavpur University as an Assistant Professor in 2008, where he is now working as a full Professor. He received the Fulbright-Nehru Fellowship (USIEF) for post-doctoral research at the University of Maryland, College Park, USA in 2014-15. He has published more than 390 research papers in various journals and conference proceedings. His research areas include Image and Video Processing, Optimization Algorithms and Deep Learning. He is a senior member of the IEEE and a member of the ACM. Website: http://www.jaduniv.edu.in/profile.php?uid=686

Dr. Sujata Pal is an Assistant Professor in the Department of Computer Science and Engineering at Indian Institute of Technology Ropar. She received her Ph.D. degree from Indian Institute of Technology Kharagpur. Sujata was a recipient of the Tata Consultancy Services (TCS) Research Scholarship for 4 years for pursuing the Ph.D. program. She was awarded the prestigious Schlumberger Faculty for the Future Fellowship for two consecutive years (2015 and 2016). She was a Postdoctoral Fellow at the University of Waterloo, Canada, before joining IIT Ropar. Her research works have been published in high-quality international journals, such as IEEE TMC, TPDS, TC, TCyb, and ACM Computing Surveys, as well as in conferences, a book and several book chapters. Her research interests include IoT, Wireless Body Area Networks, Software Defined Networks, Delay Tolerant Networks, Content Centric Networks, Mobile Ad hoc Networks, and Wireless Sensor Networks. Website: http://cse.iitrpr.ac.in/dr-sujata-pal

Dr. Subhadip Basu is a Full Professor in the Computer Science and Engineering Department of Jadavpur University, where he joined in 2006. He received his PhD from Jadavpur University and did his postdocs at the University of Iowa, USA, and the University of Warsaw, Poland. Dr Basu holds an honorary position as a Research
Scientist at the University of Iowa, USA, since 2016. He is the Co-Founder and Honorary Advisor of Infomaticae, a technology startup headquartered in Kolkata, India. He has also worked in reputed International Institutes like, Hitachi Central Research Laboratory, Japan, Bournemouth University, UK, University of Lorraine, France, Nencki Institute of Experimental Biology, Poland and Hannover Medical School, Germany. Dr Basu has 250+ international research publications in the areas of Pattern Recognition, Machine Learning, Bioinformatics, Biomedical Image Analysis etc. He has edited ten books, received two US patents, supervised 10 PhD students and received several major research grants from UGC, DST and DBT, Govt. of India. Dr Basu is the recipient of the ‘Research Award’ from UGC, Govt. of India in 2016. He also received the DAAD Senior-Scientist fellowship from Germany, Hitachi Visiting-Research fellowship from Japan, EMMA and CLINK VisitingResearcher fellowships from the European Union, BOYSCAST and FASTTRACK Young-Scientist fellowships from DST, Govt. of India. He is the past Chairperson of the IEEE Computer Society Kolkata, a senior member of IEEE, member of ACM and life member of IUPRAI. Dr. Dariusz Plewczynski is a professor at University of Warsaw in Center of New Technologies CeNT, Warsaw, Poland, the head of Laboratory of Functional and Structural Genomics and the principal investigator at Mathematics and Information Science Department at Warsaw University of Technology. His interests are focused on functional and structural genomics. Functional genomics attempts to make use of the vast wealth of data produced by high-throughput genomics projects, such as the structural genomics consortia, Human genome project, 1000 Genomes Project, ENCODE, and many others. The major tools that are used in this interdisciplinary research endeavor include statistical data analysis (GWAS studies, clustering, machine learning), genomic variation analysis using diverse data sources (karyotyping, confocal microscopy, aCGH microarrays, next generation sequencing: both whole genome and whole exome), bioinformatics (protein sequence analysis, protein structure prediction), and finally biophysics (polymer theory and simulations) and genomics (epigenetics, genome domains, three dimensional structure analysis of chromatin). He is presently involved in several Big Data informatics projects at Faculty of Mathematics and Information Sciences at Warsaw University of Technology, biological experiments at the Centre of New Technologies at University of Warsaw (his second affiliation), collaborating closely with The Jackson Laboratory for Genomic Medicine (an international partner of the TEAM project), and The Centre for Innovative Research at Medical University of Bialystok (UMB). He was actively participating in two large consortia projects, namely 1000 Genomes Project (NIH) by bioinformatics analysis of genomic data from aCGH arrays and NGS (next generation sequencing, deep coverage) experiments for structural variants (SV) identification; and biophysical modeling of chromatin three-dimensional conformation inside human cells using HiC and ChIA-PET techniques within the 4D Nucleome project funded by the NIH in the USA. His goal is to combine the SV data with three-dimensional cell nucleus structure for better understanding of normal genomic variation among human populations, the natural selection process during
human evolution, mammalian cell differentiation, and finally the origin, pathways, progression, and development of cancer and autoimmune diseases. Dr. Debotosh Bhattacharjee is working as a full professor in the Department of Computer Science and Engineering, Jadavpur University, with nineteen years of post-PhD experience. His research interests pertain to the applications of machine learning techniques for Face Recognition, Gait Analysis, Hand Geometry Recognition, and Diagnostic Image Analysis. He has authored or co-authored more than 280 journals and conference publications, including several book chapters in Biometrics and Medical Image Processing. Two US patents have been granted on his works. Prof. Bhattacharjee has been granted sponsored projects by the Govt. of India funding agencies like the Department of Biotechnology (DBT), Department of Electronics and Information Technology (DeitY), University Grants Commission (UGC) with a total amount of around INR 2 Crore. For postdoctoral research, Dr. Bhattacharjee has visited different universities abroad like the University of Twente, The Netherlands; Instituto Superior Técnico, Lisbon, Portugal; University of Bologna, Italy; ITMO National Research University, St. Petersburg, Russia; University of Ljubljana, Slovenia; Northumbria University, Newcastle Upon Tyne, UK and Heidelberg University, Germany. He is a life member of the Indian Society for Technical Education (ISTE, New Delhi), the Indian Unit for Pattern Recognition and Artificial Intelligence (IUPRAI), a senior member of IEEE (USA), and a fellow of the West Bengal Academy of Science and Technology.
Machine Learning and Applications
AVENet: Attention-Based VGG 16 with ELM for Obscene Image Classification
Sonali Samal, Shivam Pandit, Bunil Kumar Balabantaray, Arun kumar Sahani, and Rajashree Nayak
Abstract Classification of pornographic images has emerged as an important topic of discussion in the current era of big data across various social media platforms. In this paper, we propose a hybrid deep learning-based obscene image classification model, i.e., Attention-embedded VGG 16 with Extreme Learning Machine (AVENet). The AVENet model integrates data tuning, model tuning (the incorporation of an attention mechanism in VGG 16), and a tuned classifier (an extreme learning machine) to yield maximum classification accuracy. The proposed model is validated and tested on the benchmark datasets NPDI and Pornography-2k and on our proposed Explicit Image Dataset (EID). The proposed model AVENet achieved a testing accuracy of 95.66%, a precision of 92.10%, a false positive rate of 0.226%, and a Fowlkes-Mallows index of 92.00%.

Keywords VGG 16 · Attention · Transfer learning · Obscene classification
S. Samal (B) · S. Pandit · B. K. Balabantaray National Institute of Technology Meghalaya, Shillong, Meghalaya 793003, India e-mail: [email protected] S. Pandit e-mail: [email protected] B. K. Balabantaray e-mail: [email protected] A. Sahani Ministry of Electronics and IT, New Delhi, India e-mail: [email protected] R. Nayak JIS Institute of Advanced Studies and Research, JIS University, Kolkata, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 R. Sarkar et al. (eds.), Proceedings of International Conference on Frontiers in Computing and Systems, Lecture Notes in Networks and Systems 690, https://doi.org/10.1007/978-981-99-2680-0_1
1 Introduction

The abundance of sexually explicit content on social media platforms has contributed to a substantial increase in the online molestation of children and women. As alarming as it may sound, the Internet has also been used as a tool for criminality. Defying social norms, obscenity includes pornographic media, audio, text, and websites with pornographic advertising. The percentage of pornographic traffic coming from mobile devices alone has increased to 53%, up from 45% in the previous year [1]. Despite regulations, several outlets allow live broadcasters to engage in indecent behavior. Free access to child sex abuse material and other pornographic content online trivializes violent behavior, and numerous individuals exist solely to create and distribute explicit content online. Therefore, there is a rising need for a powerful computational tool that can automatically recognize and block inappropriate or obscene content on social media to prevent cyber and computer-assisted crimes. There are a variety of explicit image classification methodologies (EICMs) for classifying explicit content and blocking its further use at the user end.

In the past, obscene image classification was accomplished using methods including skin-color segmentation, Bag-of-Words (BoW) models, and visual and motion feature analysis [2–5]. In [2], the authors developed a pornography detection method that extracts skin regions from images and compares them to non-skin regions. The skin-based approach is not completely reliable, as it can falsely flag non-pornographic images taken at the beach or outdoors. The extracted skin features were then fed into a support vector machine (SVM) classifier, which achieved an accuracy of 75%, a false alarm rate of 35%, and a misdetection rate of 14%. In [3], the authors first used the YCbCr color space to filter out non-obscene items; a skin-tone detection threshold is then calculated to filter image segments, and a ResNet50 architecture finally decides whether an image contains explicit material. In [4], the VGG 16 network has been utilized for pornographic image classification with a testing accuracy of 93.8% and an error rate of 6.2%. Despite being widely used, deep learning models have limitations: uncertainty in classification problems is commonly neglected, leading to insufficient negative datasets and inaccurate sample recognition. Attention mechanisms help ensure that the discriminative regions of an image receive enough focus, decreasing misclassification. In [5], the authors proposed a one-class attention-based convolutional neural network for pornographic image recognition; one of its major drawbacks is that it is trained in offline mode, so incorporating real-time expert feedback remains a significant concern.

This paper mainly proposes two things: (i) enhanced feature extraction using an attention mechanism and (ii) extreme learning machine-based classification. Initially, for extracting deep, enhanced features, we use the VGG 16 convolutional neural network; the architecture of VGG 16 is modified in the backend with an attention module to extract detailed and deep features. Furthermore, the extracted feature vectors pass through the extreme learning machine (ELM) classifier to classify obscene content. ELM has a rapid learning speed due to its basic structure and closed-form solution. Because the hidden-layer parameters are generated randomly, ELM does not need to tune them iteratively. The output layer
is a linear system whose weights are trained using the Moore-Penrose generalized inverse. The paper is organized as follows: our proposed transfer learning model for the classification of obscene images and the results analysis of the proposed approach are described in Sect. 2. Section 3 concludes the paper and outlines its future scope.
2 Proposed Methodology

This section describes the methodical evolution of the proposed framework. In addition to the framework, it presents the dataset and its enhancement technique, along with a comprehensive ablation analysis of the modifications made to the model.
2.1 Dataset

There are a total of 5000 pornographic images collected from several publicly accessible porn sites; in order to enrich the dataset, various data augmentation strategies, including contrast improvement, cropping, blurring, and brightness and sharpness adjustments, have been implemented. The data augmentation process raised the number of images to 6000. There are 3000 obscene images and 3000 non-obscene images for the training procedure, 1000 images in each class for validation, and 1000 images in each class for testing. The dataset is referred to as the Explicit Image Dataset (EID). The model is also trained on and compared against the publicly available porn datasets NPDI [6] and Pornography-2k [7]. In order to validate the model consistently, the same number of images is extracted from each publicly available dataset. Table 1 shows the specifications of the datasets.
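The augmentation pipeline itself is not spelled out in the paper; purely as an illustration, the kinds of contrast, blur, brightness, sharpness, and cropping perturbations mentioned above could be generated with Pillow roughly as follows (directory names and parameter ranges are assumptions, not the authors' settings).

```python
# Illustrative augmentation sketch; parameter ranges and paths are assumed, not the authors' pipeline.
import random
from pathlib import Path
from PIL import Image, ImageEnhance, ImageFilter

def augment(img: Image.Image) -> Image.Image:
    """Apply one randomly chosen perturbation of the kinds listed in Sect. 2.1."""
    choice = random.choice(["contrast", "blur", "brightness", "sharpness", "crop"])
    if choice == "contrast":
        return ImageEnhance.Contrast(img).enhance(random.uniform(0.7, 1.3))
    if choice == "blur":
        return img.filter(ImageFilter.GaussianBlur(radius=random.uniform(0.5, 1.5)))
    if choice == "brightness":
        return ImageEnhance.Brightness(img).enhance(random.uniform(0.7, 1.3))
    if choice == "sharpness":
        return ImageEnhance.Sharpness(img).enhance(random.uniform(0.5, 2.0))
    # crop to 90% of the area and resize back to the original size
    w, h = img.size
    left, top = random.randint(0, w // 10), random.randint(0, h // 10)
    return img.crop((left, top, left + int(0.9 * w), top + int(0.9 * h))).resize((w, h))

# Example usage: write one augmented copy of every image in a (hypothetical) source folder.
Path("obscene_aug").mkdir(exist_ok=True)
for path in Path("obscene_raw").glob("*.jpg"):
    augment(Image.open(path).convert("RGB")).save(Path("obscene_aug") / path.name)
```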
2.2 Attention-Based VGG with ELM Classifier (AVENet)

In this paper, we provide a unique deep learning model for the classification of obscene images utilizing the VGG 16 transfer learning model. To the architecture
Table 1 Specifications regarding the datasets

Dataset            | Training images | Validation images | Testing images | Total
EID                | 6000            | 1000              | 1000           | 8000
NPDI [6]           | 6000            | 1000              | 1000           | 8000
Pornography-2k [7] | 6000            | 1000              | 1000           | 8000
Fig. 1 Proposed architecture of AVENet
of VGG 16, we incorporated the squeeze and excitation (SE) attention module for the extraction of a wide range of features from images, that is, high-level along with low-level features. After modifying the backend architecture of VGG 16, a suitable classifier, i.e., ELM, is incorporated. Figure 1 shows a graphical framework of the proposed architecture.
2.2.1 VGG 16 Net
Images with dimensions of 224 × 224 × 3 are used as input for the VGG 16 model. The kernel size is 3 × 3 and the pool size is 2 × 2 throughout all levels. The input first passes through two 224 × 224 × 64 convolution layers, followed by a pooling layer that decreases the height and width of the image to 112 × 112 × 64. Next come two conv128 layers, each of size 112 × 112 × 128, followed by a pooling layer that further reduces the feature map to 56 × 56 × 128. Then there are three conv256 layers, each measuring 56 × 56 × 256, and a pooling layer that decreases the overall size to 28 × 28 × 256. There are then three conv512 layers, each 28 × 28 × 512 in size, and a pooling layer that reduces the feature map to 14 × 14 × 512. Then there are three more conv512 layers of 14 × 14 × 512 each, followed by a pooling layer giving 7 × 7 × 512, and finally two dense or fully connected layers [8].
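As a rough, hedged illustration of this backbone (not the authors' released code), the convolutional part of VGG 16 can be instantiated with the standard Keras application; the 224 × 224 × 3 input matches the description above, and the use of ImageNet weights is an assumption implied by the transfer learning setting.

```python
# Sketch of the VGG 16 convolutional backbone used as a transfer learning feature extractor.
import tensorflow as tf

backbone = tf.keras.applications.VGG16(
    include_top=False,           # drop the original fully connected layers
    weights="imagenet",          # transfer learning from ImageNet (assumed)
    input_shape=(224, 224, 3),
)
backbone.summary()               # the final block5_pool output is the 7 x 7 x 512 feature map
```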
2.2.2 Squeeze and Excitation (SE) Attention
The addition of a SE module to the suggested method increased the total computational complexity by less than 1%. The SE module is a new component for Convolutional Neural Networks that enhances channel interrelations with negligible
Fig. 2 Squeeze and excitation mechanism
additional computational overhead. After each convolution, an SE module is applied to improve feature extraction, with a special emphasis on obscene regions. Because the obscene dataset is diverse in nature, the attention mechanism must accurately identify various types of obscene images, as the distinctions between them can be subtle. The SE block auto-adjusts channel-wise feature responses by explicitly modeling channel interdependencies. This domain-aware feature adjustment trains the network to accentuate useful features while suppressing irrelevant ones. Each SE block uses a global average pooling operation for squeezing, followed by two small fully connected layers for excitation and a cost-effective channel-wise scaling operation. We estimate that this overhead contributes only 0.2–0.3% to the total cost of computation. To boost the network's representational quality, Hu et al. [9] introduced the SE network by explicitly modeling the connections between the channels of convolutional features. Equation (1) represents the squeeze operation over a feature map of a given height and width, Eq. (2) the excitation operation that follows it, and Eq. (3) the scaling that combines the two. Figure 2 depicts the architecture of the squeeze and excitation operation.
$$z_c = F_{sq}(u_c) = \frac{1}{h \times w} \sum_{i=1}^{h} \sum_{j=1}^{w} u_c(i, j) \qquad (1)$$

Here, in the squeeze operation, $h$ and $w$ are the height and width of the feature map $u_c$ produced by the convolution operation.

$$s = F_{ex}(z, W) = \sigma\big(g(z, W)\big) = \sigma\big(W_2\, \delta(W_1 z)\big) \qquad (2)$$

Two fully connected layers around the non-linearity are used to learn the parameters $W_1$ and $W_2$; $\delta$ denotes the rectified linear unit and $\sigma$ the sigmoid activation.

$$\tilde{x}_c = F_{scale}(u_c, s_c) = s_c \cdot u_c \qquad (3)$$

$F_{scale}(u_c, s_c)$ represents the channel-wise multiplication between the feature map $u_c$ and the scalar $s_c$.
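A minimal Keras sketch of the squeeze-and-excitation operation in Eqs. (1)-(3) is given below. The reduction ratio r = 16 comes from the original SE paper [9] and is an assumption here, since this paper does not state the value it used.

```python
# Squeeze-and-excitation block following Eqs. (1)-(3): global average pooling (squeeze),
# two small fully connected layers (excitation), and channel-wise rescaling of the input.
import tensorflow as tf
from tensorflow.keras import layers

def se_block(u, ratio: int = 16):
    c = u.shape[-1]                                      # number of channels
    z = layers.GlobalAveragePooling2D()(u)               # squeeze, Eq. (1)
    s = layers.Dense(c // ratio, activation="relu")(z)   # W1 followed by delta (ReLU)
    s = layers.Dense(c, activation="sigmoid")(s)         # W2 followed by sigma, Eq. (2)
    s = layers.Reshape((1, 1, c))(s)
    return layers.Multiply()([u, s])                     # channel-wise scaling, Eq. (3)
```

In AVENet such a block would be attached to the convolutional feature maps of the VGG 16 backbone; the exact attachment points are as described in the text above.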
Fig. 3 VGG 16 along with the ELM classifier
2.2.3 ELM as Classifier
This stage is the final modification to the proposed model: incorporating the ELM classifier. The redesigned model is termed AVENet. ELM was introduced by Huang in 2006. ELM uses single-hidden-layer feed-forward neural networks [10] with randomly generated hidden node parameters and analytically determined output weights. ELM delivers accurate learning in image classification, pattern classification, and gesture recognition. The input layer carries the data features but performs no computation; the output layer is linear, without scaling or bias. ELM randomly selects the input weights and biases, and these fixed input weights allow a straightforward, non-iterative solution for the output weights. Random input-layer weights, together with hidden-layer features that are close to orthogonal, improve the adaptation of the linear output layer. Figure 3 shows the overall framework for the VGG 16 model with the ELM classifier.
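A bare-bones NumPy sketch of an ELM classifier of this kind is shown below: the hidden-layer weights are drawn at random and only the output weights are solved in closed form with the Moore-Penrose pseudo-inverse. The hidden-layer size and sigmoid activation are assumptions; in AVENet the input X would be the SE-enhanced VGG 16 feature vectors.

```python
# Minimal extreme learning machine: random hidden layer, closed-form output weights.
import numpy as np

class ELM:
    def __init__(self, n_hidden: int = 1000, seed: int = 0):
        self.n_hidden = n_hidden
        self.rng = np.random.default_rng(seed)

    def fit(self, X: np.ndarray, y_onehot: np.ndarray) -> "ELM":
        self.W = self.rng.normal(size=(X.shape[1], self.n_hidden))   # random input weights
        self.b = self.rng.normal(size=self.n_hidden)                  # random biases
        H = 1.0 / (1.0 + np.exp(-(X @ self.W + self.b)))              # sigmoid hidden activations
        self.beta = np.linalg.pinv(H) @ y_onehot                      # Moore-Penrose solution
        return self

    def predict(self, X: np.ndarray) -> np.ndarray:
        H = 1.0 / (1.0 + np.exp(-(X @ self.W + self.b)))
        return (H @ self.beta).argmax(axis=1)                         # predicted class indices
```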
2.3 Result Analysis

This section explains the performance metrics, implementation results, and evaluation of VGG 16, VGG 16 with SE, and AVENet. Keras and TensorFlow are employed as the backend for all simulations. The learning rate for the Adam optimizer is 0.0001, with a batch size of 128 and 100 training epochs. Accuracy, precision, the false positive rate (FPR), and the Fowlkes-Mallows index (FM) [11] are utilized to evaluate the classification performance of our model. Table 2 illustrates the performance comparison of AVENet using our EID dataset and the NPDI and Pornography-2k datasets. As can be seen from the table values, the AVENet model outperforms the other models on all datasets with a superior testing accuracy. 1000 images have been taken from each dataset for testing the proposed AVENet model; out of these 1000 images, 500 are obscene and the remaining 500 are not. The 500 non-obscene images include solely well-dressed, normal-appearing people photographed either indoors or outdoors. The testing accuracy using AVENet is 95.66%, which is considerably better than the base model, i.e., VGG 16. By adding the SE attention to the VGG 16 model, the precision, the FPR, and the FM reach 90.04%, 0.276%, and 91.04%, respectively. The FPR of AVENet, 0.226%, is also much lower than that of VGG 16. Figure 4 depicts the graphs of training accuracy, validation
Table 2 Accuracy comparison on the EID, NPDI, and Pornography-2k datasets

Dataset            | Model             | Testing accuracy (%) | Precision (%) | FPR (%) | FM (%)
EID                | VGG 16            | 84.00 | 81.55 | 2.356 | 81.4
EID                | VGG 16 with SE    | 93.20 | 90.04 | 0.276 | 91.04
EID                | AVENet [Proposed] | 95.66 | 92.10 | 0.226 | 92.00
NPDI [6]           | VGG 16            | 73.20 | 70.88 | 7.356 | 69.80
NPDI [6]           | VGG 16 with SE    | 84.10 | 81.04 | 5.276 | 90.22
NPDI [6]           | AVENet [Proposed] | 86.66 | 83.10 | 3.226 | 82.00
Pornography-2k [7] | VGG 16            | 75.00 | 72.35 | 3.556 | 71.23
Pornography-2k [7] | VGG 16 with SE    | 85.20 | 82.04 | 2.010 | 83.11
Pornography-2k [7] | AVENet [Proposed] | 85.22 | 82.32 | 3.030 | 82.00
Fig. 4 Plots of accuracy and loss graphs versus Number of epochs using EID dataset a Training accuracy, b Validation accuracy, and c Validation loss
Table 3 Performance comparison with the state-of-the-art methods

Model      | Testing accuracy (%) | Precision (%) | FPR (%) | FM (%)
[2]        | 74.00 | 71.25 | 8.542 | 71.00
[3]        | 85.00 | 81.22 | 3.750 | 81.05
[5]        | 75.11 | 71.84 | 7.145 | 71.54
[Proposed] | 95.66 | 92.10 | 0.226 | 92.00
Fig. 5 Confusion matrix of the proposed model AVENet using EID dataset
accuracy, and validation loss. Although training was run for 100 epochs, the plots show that, in terms of both accuracy and loss, the model converges within roughly 10–30 epochs (see Fig. 4).
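For reference, the reported metrics can be computed from a binary confusion matrix as sketched below; this is a generic illustration, not the authors' evaluation script, and it treats the Fowlkes-Mallows index as the geometric mean of precision and recall.

```python
# Accuracy, precision, FPR, and Fowlkes-Mallows index from binary predictions (generic sketch).
import numpy as np
from sklearn.metrics import confusion_matrix

def report(y_true, y_pred):
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "precision": precision,
        "fpr": fp / (fp + tn),
        "fm": np.sqrt(precision * recall),   # Fowlkes-Mallows index
    }
```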
3 Conclusion

In order to correctly classify obscene images using a lightweight model, we used the VGG 16 network and tuned each component of the model, i.e., dataset enhancement with various types of augmentation, model tuning with SE attention, and an appropriate classifier for the model, i.e., ELM. With a testing accuracy of 95.66%, a precision of 92.10%, an FPR of 0.226%, and an FM value of 92.00%, the AVENet model outperforms the VGG 16 model by a wide margin. The VGG 16 model's precision, FPR, and FM improved to 90.04%, 0.276%, and 91.04%, respectively, after adding the SE attention. AVENet's FPR of 0.226% is significantly lower than the VGG 16 rate of 2.356%. In future work, labeled images will be utilized to detect obscene regions within photographs, whereas in the current paper only the classification of obscene images has been performed.
References
1. Math SB, Viswanath B, Maroky AS, Kumar NC, Cherian AV, Nirmala MC (2014) Sexual crime in India: is it influenced by pornography? Indian J Psychol Med 36(2):147–152
2. Lin Y-C, Tseng H-W, Fuh C-S (2003) Pornography detection using support vector machine. In: 16th IPPR conference on computer vision, graphics and image processing (CVGIP 2003), vol 19, pp 123–130
3. Bhatti AQ, Umer M, Adil SH, Ebrahim M, Nawaz D, Ahmed F (2018) Explicit content detection system: an approach towards a safe and ethical environment. Appl Comput Intell Soft Comput 2018
4. Liu Yizhi, Xiaoyan Gu, Huang Lei, Ouyang Junlin, Liao Miao, Liangran Wu (2020) Analyzing periodicity and saliency for adult video detection. Multimed Tools Appl 79(7):4729–4745
5. Mao Xing-liang, Li Fang-fang, Liu Xi-yao, Zou Bei-ji (2018) Detection of artificial pornographic pictures based on multiple features and tree mode. J Cent South Univ 25(7):1651–1664
6. Avila S, Valle E, Araújo ADA (2018) NPDI porn dataset. The Institute of Computing at UNICAMP, 2018
7. Avila S, Thome N, Cord M, Valle E, Araújo ADA (2013) Pooling in image representation: the visual codeword point of view. Comput Vis Image Underst 117(5):453–465
8. Simonyan K, Zisserman A (2014) Very deep convolutional networks for large-scale image recognition. arXiv:1409.1556
9. Hu J, Shen L, Sun G (2018) Squeeze-and-excitation networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 7132–7141
10. Huang Guang-Bin, Zhu Qin-Yu, Siew Chee-Kheong (2006) Extreme learning machine: theory and applications. Neurocomputing 70(1–3):489–501
11. Nemec AFL, Brinkhurst RO (1988) The Fowlkes-Mallows statistic and the comparison of two independently determined dendrograms. Can J Fish Aquat Sci 45(6):971–975
Analyzing Market Dynamics of Agricultural Commodities: A Case Study Based on Cotton Nikhila Korivi, Peteti Sravani, Jafar Ali, and C. C. Sobin
Abstract Market deregulation of agricultural commodities requires in-depth analysis of day-to-day transactions. Such analysis helps to draw greater insights into market dynamics and to build better policies and advisory systems for the farming sector. Reforms to the agricultural commodity market impel deregulation of market transactions and increasing digitalization of transactions for better transparency across markets. These government policies are driven by the need to increase farmers' income from farming activities, and they give farmers immense opportunities for better realization of their farm outputs. To achieve this objective, there is an inevitable need to analyze market dynamics and draw better insights from the transactions. Farmers also face massive losses due to uncertain fluctuations in the prices of agricultural commodities; price prediction models can help farmers make necessary decisions and reduce the losses caused by these fluctuations. In this paper, we have chosen cotton as the commodity of study, using the price and volume of cotton over seven years from the Adoni market in Kurnool district of Andhra Pradesh, one of the largest cotton markets in the country. The data consist of daily prices and volumes of transactions over the 7 years from 2011 to 2017. We have used machine learning techniques such as the ARIMA model for predicting the prices of cotton.

Keywords Price prediction · Machine learning · ARIMA model
N. Korivi · P. Sravani · C. C. Sobin (B) SRM University, Amaravati, Andhra Pradesh, India e-mail: [email protected] J. Ali LTRC, IIIT Hyderabad, Hyderabad, Telangana, India © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 R. Sarkar et al. (eds.), Proceedings of International Conference on Frontiers in Computing and Systems, Lecture Notes in Networks and Systems 690, https://doi.org/10.1007/978-981-99-2680-0_2
1 Introduction

The agriculture sector remains an important occupation for the majority of the Indian population. Even though it employs half of India's population, it contributes less than 15% of the national GDP [1]. This imbalance leads to major socio-economic problems in the Indian economy. Considering the situation, the government has laid out a number of policies to increase the income of the farming community. One of the major decisions in this direction is the elimination of market regulations and the diversification of farm outputs away from conventional food crops. The policy is in line with the scenario in which India has emerged as a food-surplus country over the last couple of decades. Industrially needed crops are increasingly encouraged to be part of this diversification process, apart from conventional animal husbandry, fruits, and horticulture. However, Indian agriculture also faces many issues, such as outdated farming techniques, dependence on the vagaries of the climate, and many more [2].

In a developing country like India, with an increasing population, there is a need for improvement in agricultural production. This is especially true for small farmers, who are often not able to leverage the full benefit of their agricultural output; if farmers do not get their desired returns, all the hard work they put in, heart and soul, goes in vain. Farmers face many problems: for example, they may sell their harvest at a low rate because they are not aware of current prices, and agricultural mandis (marketplaces) are mostly dominated by middlemen, who tend to offer farmers lower prices, exploiting their illiteracy and the lack of unity among the farming community.

Cotton cultivation has obtained a special focus due to the increasing demand from the textile industries in India and abroad. An interesting fact is that India stands first in the production of cotton and second in cotton exports, contributing more than one-quarter of world cotton production. Cotton also has an important role in the Indian economy because of its influence on the textile industry: the textiles industry contributes nearly 5% to India's GDP, 14% to industrial production, and 11% to total export earnings [3].

With an aim to help farmers get better returns for their produce, we developed a price prediction system based on available historical data on cotton. In this paper, we have taken cotton as the commodity for our study, and we collected the daily price data of the Adoni market in Kurnool district of Andhra Pradesh, which is one of the largest cotton markets in the country. One of the main contributions of our work is that we collected data on daily cotton prices over a period of 7 years (2011–2017) from the Indian government website and analyzed the raw dataset thoroughly using different techniques. The aim is a price prediction system that forecasts future prices; this advance information can help farmers make informed decisions and plan their agricultural activities in a better way.
2 Related Work

This section discusses some of the existing machine learning models already applied in the agricultural sector. Varun et al. [4] developed a model that can predict the prices of agricultural commodities, thereby helping farmers. The model attempts to precisely predict the yield and the prices of crops based on historical data. It uses a naive Bayesian algorithm, considering attributes such as date, yield, maximum trade, minimum trade, rainfall, humidity, wind speed, temperature, nature of soil, and sunlight. The authors claim that the proposed model removes the role of middlemen, as farmers can directly sell their yield through any medium. Rajeshwari et al. [5] analyzed an agricultural crop price dataset of Virudhunagar district, Tamil Nadu, and developed a price prediction model based on data mining techniques. The authors used a Hybrid Association rule-based Decision Tree algorithm (HADT) in their proposed model. The primary use of the HADT algorithm is to generate association rules producing a single attribute, which is then used for generating a template/classifier. The authors evaluated the performance of the proposed scheme against other existing schemes and claimed that the proposed model suggests a suitable crop for cultivation based on price prediction. Kumar et al. [6] proposed a linear predictive model, built on data obtained from various sources, that recommends the most suitable crop and the required fertilizers. The system takes the current location of the farming land, the crop chosen by the farmer, and the number of hectares of land as input; the algorithm analyzes these values and predicts the best crop along with a list of fertilizers to be used, with a claimed accuracy of 85% on the dataset considered. Rohith et al. [7] proposed a crop price prediction and forecasting system using decision tree regression. Using a support vector regression algorithm, the proposed system predicts the price of the crop and gives a 12-month forecast. It uses rainfall, minimum support price, and cultivation costs as parameters for prediction. Yield, rainfall, minimum support price, and the wholesale price index are given as input to the algorithm, and the system predicts the price of the crop by recognizing patterns in the training dataset. The paper also discusses some future enhancements of the proposed web application. Kiran et al. [8] developed a price prediction system for the arecanut crop. The authors collected the monthly prices of arecanut for 10 years from all districts in Kerala and used multiple linear regression to fill in missing values. The dataset was pre-processed into district-wise monthly data. They compared SARIMA and the Holt-Winters seasonal method (classical time series models) with an LSTM neural network (a machine learning model) to find the best-suited model, and concluded that the LSTM model fits the data better for predicting the monthly prices of arecanut in Kerala.
Nasira et al. [9] proposed a price prediction model for vegetables with non-linear time series using a backpropagation neural network (BPNN), and the model was tested using the weekly prices of tomatoes from the Coimbatore market. It uses min-max normalization to normalize the data and three input neurons for weekly price prediction. The paper adopts the Levenberg–Marquardt algorithm over the gradient descent algorithm for optimization, based on the experimental requirements, and uses four hidden layers with a sigmoid transfer function, a learning rate of 0.06, and a network tolerance of 0.001. It used two datasets and claims 89.2% accuracy and a mean squared error of 10.8 on the larger dataset. Ouyang et al. [10] suggested that the empirical forecast assessment of deep neural networks makes them a better tool for time series forecasting; the authors used corn and soybean as the crops for price prediction. Lai et al. [11] developed a long- and short-term time series network (LSTNet) using support vector machines (SVM); the authors describe how the cumulative change in agricultural commodities is driven by factors such as weather and the market, with commodities including DCE bean, DCE soybean oil, and others. Kaur et al. [12] developed price prediction for vegetables using various data mining methods; an interesting aspect of their study is the analysis of the impact of crude oil prices on vegetables, and the authors used genetic algorithm-based techniques for price prediction. Accurate prediction of agricultural futures prices is important for policymakers as well as government bodies. Fang et al. [13] proposed ensemble empirical mode decomposition (EEMD) techniques combined with support vector machine, neural network, and ARIMA models for decomposing components in the prediction of future prices of agricultural products. Six agricultural commodities chosen from the Wind database were considered for experimentation. The study gave importance to short-term, high-frequency, unstable components and found that the combined SVM, NN, and ARIMA model with the EEMD method is more suitable than using the individual models. In summary, none of the aforementioned methods [4–13] considers price prediction for the cotton crop using machine learning techniques such as the ARIMA model, which we implement in this paper.
3 Dataset To carry out our work, we have collected daily prices of cotton commodities over 7 years, along with their corresponding volume of Adoni market, Kurnool district, Andhra Pradesh, which is the largest producer of cotton and has substantial ginning and textile industry. The data was collected from Agmarket which is a governmentdriven data portal created to monitor various aspects related to a commodity such as price trends, food outlook, market profile, and many more. Our dataset consists of 5 columns and 1789 rows. Figure 1 shows the price trend and volume trend over the course of 7 years (daily).
Analyzing Market Dynamics of Agricultural Commodities: A Case …
17
Fig. 1 Daily prices movement for 7 years along with volume
We have performed a yearly analysis for prices with volume (quantity arrivals) and found that there is not much difference in price movement daily and visible aberration in the continuous daily data has led to further analysis in order to find the accurate relationship between price and volume trends. We began the analysis phase by exploring the correlation and autocorrelation between price movement across the 12 months and monthly for each year, across all the 7 years. The matrix of correlation coefficient and its corresponding matrix as well as heatmap, clearly depicts the price movement over the 12 months of any year, as well as the yearly correlation coefficient, which is shown in Figs. 2 and 3, respectively. In the price trend analysis phase, we have seen that there are constant fluctuations in the prices of cotton in the Adoni market, shown in Fig. 2, the daily price movement over seven years. According to sources, the reason for the sudden increase in prices of cotton is due to the absence of major pest problems which affected previous years. According to government officials, the sudden increase in price is due to an even distribution of monsoon that was experienced in the last Kharif season, and the reasons for sudden decrease in price is heavy speculation in the future trade and MCX in particular period [14].
Fig. 2 Heatmap of the correlation coefficient (monthly)
Fig. 3 Heatmap of the correlation coefficient (yearly)
4 Results Based on our analysis of the collected data, we tried to fit our data to the autoregressive integrated moving average (ARIMA) model to predict the price of cotton in the Adoni market of Kurnool district. We used the ARIMA(5, 1, 0) model on our dataset, which consists of 7 years of cotton prices (Fig. 4).
Fig. 4 Daily price movement for 7 years
ARIMA is one of the most widely used models for forecasting time series data, and its name [15] reflects the important aspects of the model: autoregression, integration, and moving average. Through the ARIMA model, we obtained the predicted time series data. The comparison between expected and predicted values is shown in Fig. 5, where the predicted values are represented by the red line and the expected values by the blue line. We used training and testing datasets derived from the original dataset for learning price patterns and predicting future prices. In our model, we have taken the lag order for autoregression as 5, and to make the time series stationary we have taken the degree of differencing as 1; a moving average window of size 0 is used. We also calculated the RMSE (root mean square error), obtaining values of 253.74 and 163.023 for volume and price, respectively (Table 1). RMSE measures the difference between the expected values and the values predicted by the model and is one of the accuracy measures used with the ARIMA model.
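As a rough illustration of the procedure described above (in the spirit of the ARIMA walk-through in [15]), the following hedged sketch fits an ARIMA(5, 1, 0) model with statsmodels and reports the RMSE; the file and column names are assumptions, and the authors' actual code may differ.

```python
from math import sqrt

import pandas as pd
from sklearn.metrics import mean_squared_error
from statsmodels.tsa.arima.model import ARIMA

# Daily cotton prices (file and column names are illustrative).
series = pd.read_csv("adoni_cotton_prices.csv")["price"].astype(float).tolist()

split = int(len(series) * 0.8)            # simple train/test split of the daily series
train, test = series[:split], series[split:]

predictions = []
for actual in test:                        # walk-forward validation
    model = ARIMA(train, order=(5, 1, 0))  # lag order 5, differencing 1, MA window 0
    fitted = model.fit()
    predictions.append(float(fitted.forecast(steps=1)[0]))
    train.append(actual)                   # roll the true observation into the history

rmse = sqrt(mean_squared_error(test, predictions))
print(f"RMSE: {rmse:.3f}")
```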
Fig. 5 Comparison between expected and predicted values
Table 1 Comparison between RMSE and R2 value

Metric    RMSE      R2
Price     163.023   0.9290
Volume    253.47    0.6923
5 Conclusion Predicting prices using machine learning-based methods always helps farmers to earn more profit. In this paper, we have predicted the prices of cotton from datasets taken from AGmarket.in. During our analysis of the dataset, we performed correlation and autocorrelation analysis; to determine whether the data is positively or negatively autocorrelated, we applied the Durbin–Watson (DW) statistic test, through which we understood the data more deeply. Finally, we have attempted to fit the dataset to the ARIMA model as a test case.
References 1. http://statisticstimes.com/economy/country/india-gdp-sectorwise.php. Accessed 14 Apr 2022 2. https://sgp1.digitaloceanspaces.com/forumias/noticeboard/wp-content/uploads/2019/12/261 21759/Indian-Agriculture-Part-1.pdf. Accessed 14 Apr 2022 3. https://wcrc.confex.com/wcrc/2007/techprogram/P1780.HTML. Accessed 14 Apr 2022 4. Varun R, Neema N, Sahana H.P., Sathvik A, Mohammed Muddasir (2019) Agriculture commodity price forecasting using ML techniques. Int J Innov Technol Explor Eng (IJITEE) 9. ISSN: 2278-3075 5. Rajeswari S, Suthendran K (2019) Developing an agricultural product price prediction model using HDAT algorithm. Int J Eng Adv Technol (IJEAT) 9(1S4). ISSN:2249-8958 6. Naveen Kumar PR (2020) Smart agricultural crop prediction using machine learning. Diss. Xi’an University 7. Rohith R, Vishnu R, Kishore A, Chakkarawarthi D (2020) Crop price prediction and forecasting system using supervised machine learning algorithms. Int J Adv Res Comput Commun Eng 9(3) 8. Kiran M Sabu, Manoj Kumar TK. Predictive analytics in agriculture: forecasting prices of Arecanuts in Kerala. In: Third international conference on computing and network communications (CoCoNet’19) 9. Nasira GM, Hemageetha N (2012) Forecasting model for vegetable price using back propagation neural network. Int J Comput Intell Informatics 2(2) 10. Ouyang H, Wei X, Wu Q (2019) Agricultural commodity futures prices prediction via long-and short-term time series network. J Appl Econ 22(1):468–483 11. Lai G, Chang WC, Yang Y, Liu H (2018) Modeling long-and short-term temporal patterns with deep neural networks. In: The 41st International ACM SIGIR conference on research & development in information retrieval, pp 95–104 12. Kaur M, Gulati H, Kundra H (2014) Data mining in agriculture on crop price prediction: techniques and applications. Int J Comput Appl 99(12):1–3 13. Fang Y, Guan B, Wu S, Heravi S (2020) Optimal forecast combination based on ensemble empirical mode decomposition for agricultural commodity futures prices. J Forecast 39(6):877– 886
14. https://www.business-standard.com/article/economy-policy/cotton-prices-record-new-highin-ap-104020901053_1.html. Accessed 14 Apr 2022 15. https://machinelearningmastery.com/arima-for-time-series-forecasting-with-python/. Accessed 14 Apr 2022
Feature Selection for Nepali Part-of-Speech Tagging in a Conditional Random Fields-Based System Pooja Rai and Sanjay Chatterji
Abstract The features used for training any statistical model have an enormous impact on the model’s performance. The selection of features that contribute most to the prediction variable or output of concern is a crucial task. We used conditional random fields (CRF) to conduct experiments on the best feature set selection for part-of-speech (POS) tagging of Nepali text. Nepali is a resource-poor language, and its processing is still in its infancy. In this work, we have studied the effect of using various features (non-linguistic affixes, word length coupled with contextual information, along with digit and symbol checking) for improving the tagger’s performance without using any external sources like the gazetteer list, morphological rules, etc. A thorough error analysis has been carried out, followed by the manual correction of the wrongly annotated words in the training data. The proposed CRF-based POS tagger’s efficacy has been validated with 95% accuracy. Keywords Nepali part-of-speech tagging · Feature selection · Conditional random field
1 Introduction The Nepali language is part of the Indo-Aryan language family, which is a subset of the Indo-European language family. It is recognized as one of India's 22 official languages.1 Roughly 3 million people in India2 and about 45 million people worldwide [1] are native speakers of this language.
1 The Constitution of India, page 330, EIGHTH SCHEDULE, Articles 344 (1) and 351, Languages.
2 https://censusindia.gov.in/2011Census/C-16_25062018_NEW.pdf.
P. Rai (B) · S. Chatterji Department of Computer Science and Engineering, Indian Institute of Information Technology, Kalyani 741235, India e-mail: [email protected] P. Rai Department of Computer Science, New Alipore College, Kolkata 700053, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 R. Sarkar et al. (eds.), Proceedings of International Conference on Frontiers in Computing and Systems, Lecture Notes in Networks and Systems 690, https://doi.org/10.1007/978-981-99-2680-0_3
While the other Indian languages have benefited much from advances in language technology, Nepali remains understudied and under-resourced. Some work on POS tagging in Nepali using statistical approaches has been done to date, but most of these works have not explored the features required for the task. In this work, we have experimented to find the optimum combination of features for automatic Nepali POS tagging, similar to the works of [2, 3]. Being one of the crucial preprocessing tasks, POS tagging helps in the precise parsing of text as well as in developing various applications like machine translation, information extraction systems, and semantic processing, to name a few. The features used to train the statistical models have an enormous impact on the performance of the POS tagger. Conditional Random Fields (CRFs) bring together the best of generative and classification models [4]. Again, different overlapping features can be dealt with efficiently by a CRF-based method [5]. Since Nepali is a highly inflected and morphologically rich language, CRFs give the freedom to incorporate linguistic properties as features to provide input to the tagger. This paper presents CRF-based POS tagging for Nepali with a feature set containing contextual information, word length, non-linguistic affixes or syllables, and features for digits and symbols. Unlike the Nepali POS tagger of Shahi et al. [6], no dictionary is used as an additional resource for tagging. There is no prior work on optimal feature extraction for Nepali POS tagging in the literature, so this work is significant as the extracted optimal feature set can be used with any statistical sequence labeling tool and trained on Nepali POS data to obtain models with optimum performance.
2 Literature Review Statistical approaches to Part-of-Speech (POS) tagging are in extensive use due to their lower cost as compared to rule-based approaches. The conditional random field (CRF) is one such statistical approach for POS tagging; Lafferty et al. [5] took the lead in CRF-based tagging of the Penn Treebank corpus and found that CRFs perform better than related classification models and Hidden Markov Models (HMMs). CRFs have also been adopted for tagging several Indian languages [2–4]. The first Nepali POS tagger was developed under the Nepali Language Resources and Localization for Education and Communication (NELRALEC) project carried out in Nepal, with 112 tags [1]. Using the NELRALEC tagset, Jaishi et al. [7] implemented a first-order Markov model-based POS tagger, and Shahi et al. [6] developed a Support Vector Machines-based Nepali POS tagger with an accuracy of 93.27%. However, the SVM tagger was found to be slow to train. Using the TDIL corpus and the BIS tagset of 42 tags, a Hidden Markov Model-based Nepali POS tagger has been reported in [8]. However, the tagger did not work well
for unknown words. A hybrid tagger using Hidden Markov Model combined with rule-based method was also proposed by Sinha et al. [9]. Yajnik et al. [10] reported an artificial neural network (ANN) based Nepali POS tagger with three different ANN architectures: Radial Basis Function (RBF) network, General Regression Neural Networks (GRNN), and the Feedforward Neural Network. Feedforward neural networks and RBF, on the other hand, were found to underperform, with accuracies of 25% and 26.65%, respectively.
3 Our Approach for Nepali POS Tagging The optimal set of features required for developing the Nepali POS tagger has yet to be thoroughly tested. We have started with the experiments done for the Bengali POS taggers. We have used the same Conditional Random Field (CRF) [5] machine learning technique that gave the best results for Bengali and was found suitable for feature extraction.
3.1 Features of POS Tagging
• Context word feature: The previous words with a maximum window size of 3, as well as the immediate next word of the same sentence with respect to a particular word, are used as the context word feature. If there are fewer than 3 words before the word for which we wish to find the POS tag, then we take the words available there. The value 3 was arrived at by experiment. This feature for the ith word is represented as follows: F1: {wi−3, wi−2, wi−1, wi, wi+1}
• Character encoding: The size of Nepali suffixes varies from 1 to 6 (Unicode) characters, and that of Nepali prefixes varies from 2 to 4 (Unicode) characters. We have experimented and observed that 2–5 suffix characters and 2–4 prefix characters are important in POS tagging. It is to be noted that these suffixes and prefixes are non-linguistic. With the ath and bth characters of the word denoted by ca,b, this feature is represented for the current word of length n as follows: F2: {c1,2, c1,2,3, cn,n−1, cn,n−1,n−2, cn,n−1,n−2,n−3, cn,n−1,n−2,n−3,n−4}
• Context POS feature: One of the important and dynamic features for predicting the POS tag of the token in question is the POS tags of the words preceding the current word. We have observed that considering the POS tags of the previous 3 words is optimum for Nepali POS tagging. With the part-of-speech tag of the xth preceding word denoted by pos−x, this feature for the current word is represented as follows: F3: {pos−i, i = 1, 2, 3}
Fig. 1 Example of a feature set of a Nepali sentence
• Length of the word: Because proper nouns are rarely very short in length, this is one of the most effective features for distinguishing them from other tags. It is a binary feature where a word with a length less than 3 has been considered a short word (−ve), and otherwise it is a long word (+ve). This feature is represented as follows: F4: val = Sign(length(word) − 4), where the number 4 is used to determine the sign (−ve or +ve) for a short word.
• Digit feature: This is a binary feature that is included to identify numerals in the given text. If the word contains a digit, then it is true; otherwise, it is false. This is represented by the "present" function. This feature is represented as follows: F5: val = 1 if present(word, 0–9); 0 otherwise
• Symbol feature: This is also a binary feature that is included to identify whether a given token is a special symbol or not. If the word contains a character other than 0–9, a–z, or A–Z, it is true; otherwise, it is false. This is represented by the symbol function. This feature is represented as follows: F6: val = 1 if present(word, ![0–9, a–z, A–Z]); 0 otherwise
The selected features for an example Nepali sentence are shown in Fig. 1; a minimal sketch of how such features can be extracted is given after this list.
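To make the feature set concrete, the following is a minimal, hypothetical sketch of how these features could be extracted for a CRF toolkit; the function and feature-key names are ours, since the paper does not publish its implementation.

```python
def word2features(sent, i):
    """Builds the feature dictionary for the i-th word of a tokenised sentence."""
    word = sent[i]
    feats = {
        "word": word,
        "is_long": len(word) >= 4,                            # F4: short vs. long word
        "has_digit": any(ch.isdigit() for ch in word),        # F5: digit feature
        "has_symbol": any(not ch.isalnum() for ch in word),   # F6: symbol feature
    }
    # F2: non-linguistic prefixes (2-4 characters) and suffixes (2-5 characters)
    for k in range(2, 5):
        feats[f"prefix_{k}"] = word[:k]
    for k in range(2, 6):
        feats[f"suffix_{k}"] = word[-k:]
    # F1: up to 3 previous words and the immediate next word as context
    for off in (-3, -2, -1, 1):
        if 0 <= i + off < len(sent):
            feats[f"word_{off:+d}"] = sent[i + off]
    # F3 (POS tags of the previous 3 words) is a dynamic feature: during tagging it is
    # filled from the tags predicted so far, and a linear-chain CRF additionally models
    # the immediately preceding tag through its transition weights.
    return feats

def sent2features(sent):
    return [word2features(sent, i) for i in range(len(sent))]
```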
4 Experiment The experiments were carried out using the free machine learning library Scikit-learn (also known as sklearn). The "sklearn-pycrfsuite 0.4.0" package was used for both training and tagging text. The code provided by [11] has also been referred to for the implementation. In this section, we discuss these experiments using a CRF-based model.
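Shown below is a hedged sketch of how such a CRF model can be trained and evaluated with the sklearn-crfsuite package (used here as a stand-in for the "sklearn-pycrfsuite" package named above); the hyper-parameter values are illustrative, and X_train/y_train are assumed to be feature and tag sequences built as in the previous sketch.

```python
import sklearn_crfsuite
from sklearn_crfsuite import metrics

# X_train, y_train, X_valid, y_valid: feature dictionaries (via sent2features) and the
# corresponding BIS tag sequences, built from the TDIL corpus (assumed available here).
crf = sklearn_crfsuite.CRF(
    algorithm="lbfgs",
    c1=0.1, c2=0.1,                 # L1/L2 regularisation strengths (illustrative)
    max_iterations=100,
    all_possible_transitions=True,
)
crf.fit(X_train, y_train)

y_pred = crf.predict(X_valid)
print("Accuracy:", metrics.flat_accuracy_score(y_valid, y_pred))
print(metrics.flat_classification_report(y_valid, y_pred, digits=2))
```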
4.1 Baseline Model: Without Feature First, we created a baseline model without considering any features. In this model, the best tag for a given input word is predicted based only on the frequency of the occurrence of the word and tag in the training data. The motive behind this model is to understand what one could achieve with annotated data alone, without any knowledge incorporated in training a Nepali POS tagger.
4.2 CRF-A Model: Incorporation of Contextual Information One of the most important and fundamental properties helping to resolve the syntactic ambiguity of a word in a phrase or sentence is its relationship to the preceding and following words. These are also called context words. The POS tags of preceding words predicted in the previous iterations also help in identifying the POS tags of the current word.
4.3 CRF-B: Incorporation of Affix Features The internal structure of a word also plays a vital role in resolving its syntactic ambiguity. This model includes the features of CRF-A as well as the word’s prefix and suffix.
4.4 CRF-C: Incorporation of Lexical Features This model includes, in addition to the CRF-B features, the word length feature for distinguishing between open and closed classes, the digit feature, and the symbol feature.
4.5 Corpus Statistics A corpus of 5.5k sentences containing 71748 wordforms (released by the Indian Language Technology Proliferation and Deployment Centre (TDIL) with the BIS (Bureau of Indian Standard) tagset3), taken from various domains like agriculture, entertainment, health, as well as tourism, has been used for training the tagger. The validation data and test data consist of 300 sentences with 4331 words and 200 sentences with 3154 words, respectively.
3 http://www.tdil-dc.in/tdildcMain/articles/134692Draft%20POS%20Tag%20standard.pdf.
Table 1 Feature-based accuracy of our models. Accvalid: Accuracy on validation data, Acctest: Accuracy on test data. (Each row adds the listed features to those of the row above.)

Model      Features                                     Accvalid (%)   Acctest (%)
Baseline   wi                                           86             86.6
CRF-A      wi−3, wi−2, wi−1, wi, wi+1                   87             87.7
CRF-A      + pos−1, pos−2, pos−3                        88             88.4
CRF-B      + c1,2, cn,n−1                               89             90
CRF-B      + c1,2,3, cn,n−1,n−2                         90             91
CRF-B      + c1,2,3,4, cn,n−1,n−2,n−3                   91             92
CRF-B      + cn,n−1,n−2,n−4                             92             93
CRF-C      + F4, F5, F6                                 93             95
4.6 Results Performance metrics in terms of precision, recall, and F1-score have been used for measuring the performance of the system. These values for each of the POS tags on the validation data as well as the test data are presented in Tables 2 and 3, respectively. The experiments on feature combinations, as well as their results, are presented in Table 1. We obtained the highest accuracy of 95% for the features mentioned in Sect. 3.1 (Tables 2 and 3).
4.7 Error Analysis and Observations After testing the model on the validation data, we created a confusion matrix. There are 25 tags in the tagset, and therefore we created a 25×25 confusion matrix. Though the diagonal values of the matrix are high and the other values are mostly zeros, we have taken some parts of this confusion matrix where the non-diagonal values are quite high. Three such parts of the confusion matrix are presented in Tables 4, 5, and 6. The analysis of the ambiguities showed that some of them were indeed annotation errors in the training corpus, i.e., some of the tokens in the training data were tagged
Table 2 POS tag-wise Precision, Recall and F1-score of our best model on Validation data

POS-Tag            Precision   Recall   F1-score
CC_CCD             1.00        0.99     1.00
CC_CCS             0.84        0.95     0.89
DM_DMD             0.92        0.81     0.86
JJ                 0.90        0.76     0.83
N_NN               0.91        0.96     0.93
N_NNP              0.40        0.57     0.47
N_NST              0.88        0.41     0.56
PR_PRC             0.00        0.00     0.00
PR_PRF             1.00        1.00     1.00
PR_PRI             0.70        0.95     0.81
PR_PRL             0.97        1.00     0.99
PR_PRP             0.85        0.91     0.88
PR_PRQ             1.00        1.00     1.00
PSP                0.95        0.95     0.95
QT_QTC             1.00        0.96     0.98
QT_QTF             0.86        0.81     0.84
QT_QTO             1.00        1.00     1.00
RB                 0.93        0.66     0.77
RD_PUNC            1.00        1.00     1.00
RD_RDF             1.00        1.00     1.00
RP_INJ             0.00        0.00     0.00
RP_INTF            1.00        0.64     0.78
RP_NEG             1.00        1.00     1.00
RP_RPD             1.00        1.00     1.00
V_VAUX             1.00        0.91     0.95
V_VM               0.95        0.97     0.96
Micro average      0.93        0.93     0.93
Macro average      0.85        0.82     0.83
Weighted average   0.93        0.93     0.93
Samples average    0.93        0.93     0.93
incorrectly. So, a thorough review of POS tagging in the training corpus was followed by the manual correction of wrongly annotated sentences. Some of the instances of incorrectly annotated words along with the correct tag are shown as follows: The number of confusions between the adjective and the common noun is the highest among all ambiguous pairs, and such ambiguity prevails due to the use of the word in a similar context. An instance of such ambiguity is shown in Fig. 2. One of the most prevalent hard-to-disambiguate pairs found is that of the common
Table 3 POS tag-wise Precision, Recall and F1-score of our best model on Test data

POS-Tag            Precision   Recall   F1-score
CC_CCD             1.00        0.99     0.99
CC_CCS             0.87        1.00     0.93
DM_DMD             0.78        0.85     0.81
JJ                 0.50        1.00     0.67
N_NN               0.94        0.97     0.95
N_NNP              0.52        0.52     0.52
N_NST              1.00        0.54     0.70
PR_PRC             1.00        0.50     0.67
PR_PRF             1.00        1.00     1.00
PR_PRI             0.79        1.00     0.88
PR_PRL             0.95        1.00     0.98
PR_PRP             0.92        0.85     0.88
PR_PRQ             1.00        0.92     0.96
PSP                0.95        1.00     0.97
QT_QTC             1.00        0.94     0.97
QT_QTF             0.81        0.75     0.78
QT_QTO             1.00        1.00     1.00
RB                 0.96        0.76     0.85
RD_PUNC            1.00        1.00     1.00
RD_RDF             1.00        1.00     1.00
RP_INJ             0.00        0.00     0.00
RP_INTF            1.00        0.69     0.82
RP_NEG             1.00        1.00     1.00
RP_RPD             1.00        1.00     1.00
V_VAUX             1.00        0.95     0.97
V_VM               0.97        0.99     0.98
Micro average      0.95        0.95     0.95
Macro average      0.88        0.85     0.86
Weighted average   0.95        0.95     0.94
Samples average    0.95        0.95     0.95

Table 4 Confusion matrix 1

          JJ     N_NN   N_NNP
JJ        356    80     3
N_NN      19     1679   44
N_NNP     0      28     25
Table 5 Confusion matrix 2

        N_NN   RB
N_NN    1679   3
RB      31     81

Table 6 Confusion matrix 3

          N_NN   V_VAUX   V_VM
N_NN      1679   0        15
V_VAUX    0      11       21
V_VM      14     0        552
Fig. 2 The original and corrected POS tag of the word
Fig. 3 The original and corrected POS tag of the word
noun-proper noun pair. As shown in Fig. 3, it was found that the word had been tagged as a common noun (N_NN) in the training data though it is a proper noun, and it has been corrected with the tag 'JJ'. There were some cases of mismatch between the tagging of common nouns and adjectives; an example is shown in the sentence in Fig. 4. We also found some cases where a verb was tagged as an adjective; an example is shown in the sentence in Fig. 5. Some nouns were also found to be tagged as verbs; an example is shown in the sentence in Fig. 6. After manually correcting 500 incorrectly tagged words in the training data, our best model (CRF-C) was retrained with the updated training data, and the accuracy of the updated model reached 95% on the test dataset. Hence, thorough and careful
Fig. 4 The original and corrected POS tag of the word
Fig. 5 The original and corrected POS tag of the word
Fig. 6 The original and corrected POS tag of the word
Fig. 7 Plot of Accuracy v/s Size of Training Data, comparing the baseline and best model
error analysis has led to an improved training corpus through the correction of incorrect tagging instances, which further enhanced tagger performance. We also observed the effect of the amount of training data on model accuracy by comparing our baseline and best models, as shown in Fig. 7. The accuracy improves gradually as more data is used for training, as depicted by the learning curve. Furthermore, a plateau in the learning curve can be seen, suggesting that the dataset employed for training in our experiment is adequate. Such information is helpful for a resource-poor language like Nepali.
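A hedged sketch of how such a learning curve can be produced by retraining on growing slices of the training data is given below; the data variables are assumed to be those from the earlier sketches, and the fractions are illustrative.

```python
import matplotlib.pyplot as plt
from sklearn_crfsuite import CRF, metrics

# X_train, y_train, X_valid, y_valid are assumed to be the sequences used earlier.
fractions = [0.1, 0.25, 0.5, 0.75, 1.0]
scores = []
for frac in fractions:
    n = int(len(X_train) * frac)
    model = CRF(algorithm="lbfgs", max_iterations=100)
    model.fit(X_train[:n], y_train[:n])
    scores.append(metrics.flat_accuracy_score(y_valid, model.predict(X_valid)))

plt.plot([int(len(X_train) * f) for f in fractions], scores, marker="o")
plt.xlabel("Number of training sentences")
plt.ylabel("Validation accuracy")
plt.show()
```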
5 Conclusion We present the results of an analysis of the impact of utilizing different features for creating a Nepali POS tagger using a CRF-based tool. It was found that features like non-linguistic affixes, word length, and contextual information have a significant impact on Nepali POS tagging. The size of the training data is also an important factor in enhancing the tagger. The tagging is performed without using any external sources like dictionaries, morphological rules, etc. The proposed CRF-based Nepali POS tagger achieves an accuracy of 95%. This accuracy can be considered usable in the preprocessing for machine translation, information retrieval, or any other NLP task. Furthermore, the accuracy of the proposed tagger can be easily increased using resources such as gazetteer lists, rules, etc.
References 1. Yadava YP, Hardie A, Lohani RR, Regmi BN, Gurung S, Gurung A, McEnery T, Allwood J, Hall P (2008) Construction and annotation of a corpus of contemporary Nepali. Corpora 3(11) 2. Ekbal A, Haque R, Bandyopadhyay S (2007) Bengali part of speech tagging using conditional random field 01 3. Dandapat S, Sarkar S, Basu A (2007) Automatic part-of-speech tagging for Bengali: an approach for morphologically rich languages in a poor resource scenario. In: Proceedings of the 45th annual meeting of the ACL on interactive poster and demonstration sessions, ACL’07, USA, 2007. Association for Computational Linguistics, pp 221–224 4. Avinesh PVS, Karthik G (2007) Part-of-speech tagging and chunking using conditional random fields and transformation based learning 01 5. Lafferty JD, McCallum A, Pereira FCN (2001) Conditional random fields: probabilistic models for segmenting and labeling sequence data. In: Proceedings of the eighteenth international conference on machine learning, ICML’01, San Francisco, CA, USA, 2001. Morgan Kaufmann Publishers Inc., pp 282–289 6. Shahi TB, Dhamala TN, Balami B (2013) Support vector machines based part of speech tagging for Nepali text. Int J Comput Appl 70(24) 7. Jaishi MR (2009) Hidden Markov model based probabilistic part of speech tagging for Nepali text. Masters Dissertation, Central Department of Computer Science and IT , Tribhuvan University, Nepal 8. Paul A, Purkayastha BS, Sarkar S (2015) Hidden Markov model based part of speech tagging for Nepali language. In: Proceedings of the international symposium on advanced computing and communication (ISACC)IEEE. World Academy of Science, Engineering and Technology 9. Sinha P, Veyie NM, Purkayastha BS (2015) Enhancing the performance of part of speech tagging of Nepali language through hybrid approach 10. Yajnik A (2018) Ann based PoS tagging for Nepali text. Int J Nat Lang Comput (IJNLC) 7(3) 11. Proceedings of the shallow parsing for south Asian languages (SPSAL) workshop, held at IJCAI-07, Hyderabad, India. In: SPSAL workshop proceedings, January 2007, pp 21–24
Design and Development of a ML-Based Safety-Critical Fire Detection System Anukruthi Karre, Mahin Arafat, and Akramul Azim
Abstract A real-time fire detection system warns people when fire, smoke, or any fire-related hazard is detected, in order to prevent subsequent losses. Although this is a challenging task, it can be very useful to predict fire during the early stages to avoid disasters. One method that can quickly predict outcomes based on enormous amounts of data is machine learning. This paper presents a fire simulation in the Fire Dynamics Simulator tool, machine learning models that predict temperature based on various sensor measurements and other factors, and an alert system to notify the user. In the machine learning model, we applied seven different regression techniques, namely Linear Regression, Lasso Regression, Ridge Regression, SVM Regressor, Gradient Boost, Random Forest Regressor, and Decision Tree, to the simulated data to detect the presence of fire based on the values of a temperature sensor, a gas sensor, and other parameters. Additionally, we used two hybrid algorithms built with the StackingRegressor module. Of all the algorithms, Gradient Boost and Random Forest Regressor performed the best on the dataset, with an R2 score of 0.95. The model belongs to the supervised learning category as the dataset is labeled. Finally, as a safety measure, we implemented an email/SMS alert system that notifies the user when a fire is detected. Keywords Fire · Sensors · Alert · Safety · Prediction
A. Karre (B) · M. Arafat · A. Azim Department of Electrical and Computer Engineering, Ontario Tech University, Oshawa, ON, Canada e-mail: [email protected] M. Arafat e-mail: [email protected] A. Azim e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 R. Sarkar et al. (eds.), Proceedings of International Conference on Frontiers in Computing and Systems, Lecture Notes in Networks and Systems 690, https://doi.org/10.1007/978-981-99-2680-0_4
1 Introduction A safety-critical system is a system whose failure can lead to the loss of human lives or damage to property or infrastructure. As a part of industrialization, we have built many factories and facilities that are very well equipped and generate resources, but that also carry risk factors which can cause severe damage to human lives and the environment if anything goes unplanned. Thus, safety-critical software systems are needed. In this paper, we work with a specific safety-critical software system, a fire detection system. A fire alarm system is a device that warns people when fire, smoke, or any fire-related hazard is detected. It is deployed in a house, a building, or an area and provides an auditory or visual warning of a fire outbreak. A traditional fire alarm/detection system might be fully automated, semi-automated, or manually controlled. Due to the crucial need for fire alarm/detection systems, plenty of research and development work is being conducted, and some modern fire detection systems are linked with IoT and machine learning. Understanding fluid physics, heat transfer, soot, and how a fire interacts with its surroundings is necessary to accurately predict how a fire will behave. Therefore, predicting full-field conditions with computational fluid dynamics (CFD) models of flames is computationally expensive. Existing models, such as the Fire Dynamics Simulator (FDS) [1] and FireFOAM [2], can make detailed forecasts for small to moderate domain sizes (up to hundreds of meters). In numerous applications (forest fires, landmines, structures, and vehicles), it is necessary to anticipate specific situations over vast areas to assess risks accurately, conduct risk assessments, and devise mitigation methods. In this paper, we developed a fire detection system by first making a simulation of fire in the Fire Dynamics Simulator (FDS), extracting the data, and applying several machine learning algorithms to predict fire hazard incidents. We started from scratch and tried different ML algorithms. The Fire Dynamics Simulator (FDS) is a computational fluid dynamics (CFD) simulator for fire-driven fluid flow. The software numerically solves a large-eddy simulation form of the Navier-Stokes equations appropriate for low-speed, thermally driven flow, focusing on heat and smoke transport from flames, in order to describe the spread of fire. It is a Fortran program that reads parameters from text files, numerically solves the governing equations, and outputs data to files selected by the user. It is widely used by the US Forest Service to simulate wildfires. The FDS simulator uses FORTRAN as the programming language to simulate a scenario and generates data in the form of Excel files. We ran the simulator for several simulations with different values of the parameters and generated multiple data files. We then merged all the files into the final dataset used for implementing the machine learning algorithms. We used several regression techniques for the prediction of temperature. The dataset obtained from the simulator is a labeled dataset; hence, it falls under the supervised learning category.
Compared to the existing models for predicting the possibility of fire, the accuracy of forecasting using machine learning models is considerably greater. In addition, we assess how machine learning models can help evaluate the probability of fire-related dangers in industries and buildings. This paper has the following contributions:
• The paper evaluates the characteristics of fire hazards by simulating a real scenario.
• We evaluate different machine learning algorithms to predict fire hazards.
• We propose a fire hazard notification system to include in the fire detection controller to increase safety.
The paper is organized as follows. Section 2 discusses the related work on different safety-critical systems. Section 3 describes the proposed system and the methodology involved in developing it in three stages: simulating the model, building a machine learning algorithm, and creating an alert system, followed by the results of the model predictions. Finally, in Sect. 4, we conclude the proposed work and discuss the findings for future developments.
2 Related Work Many studies are now being conducted on the early identification and forecasting of forest fires. Wireless sensor networks, satellite image processing, and other prediction models are examples of early prediction approaches. Researchers should be able to foresee or predict fire outbreaks in order to improve present fire management efforts. Since fires are unpredictable, strong computational machine learning models are required to predict them. Previous research has explored utilizing machine learning techniques to establish the proper spatial frequency ranges of fire outbreaks. These approaches include a variety of techniques, including convolutional neural networks, regression trees, support vector machines, artificial neural networks, random forests, and, more recently, genetic programming. Compared to the existing models for predicting the possibility of fire, the accuracy of forecasting using machine learning models is considerably greater. A neural network model comprises several hidden and visible layers of processing units known as neurons. These neurons are organized into a network, resembling a human brain, that creates connections among layers and functional neurons. The first layer in this network contains the inputs or predictors and is linked to one or more hidden layers that can be used to create linear or non-linear models. The hidden layers are in turn linked to the output layer, which consists of the outputs or target variables. Supervised learning is a subfield of machine learning and artificial intelligence. It is distinguished by using labeled datasets to train algorithms capable of accurately
classifying data or predicting outcomes for unseen data. The input data is fed into the model during training, and the weights are adjusted until the model is well fitted, which occurs during the cross-validation phase. The model is then compared against the test data, i.e., unseen labeled data, to determine how accurate it is. Several organizations employ supervised learning to solve a variety of real-world challenges. Regression is a machine learning technique that examines the relationship between independent variables or attributes and a dependent variable or outcome in order to predict the output on never-before-seen data. Pham et al. [3] worked on different machine learning methodologies and processed the data of more than 50 historical fire incidents. They used the Bayes network, decision tree, and Naive Bayes to build a prediction system that can predict potential fire hazards and help the local authorities take steps to prevent the hazard in advance. They also used several well-known algorithms like support vector machines (SVM) in their research. They have found that 195 of the study have more human activities, which is a significant concern for catastrophe. They achieved a high level of prediction accuracy with their curated dataset. A deep learning technique was applied by Pereira et al. [4] to a sizable dataset of approximately 150,000 pictures. They created this dataset by obtaining image patches from Landsat-8 photographs. They employed manually created algorithms to divide the dataset into two parts, with the first half being used to identify fire and the second being manually annotated to explore automatic image segmentation. In order to get more precise results, they looked into spectral bands and studied CNN models. To improve the CNN, the input pictures and related segmentation patches were used. They applied deep learning techniques to the datasets to approximate the manually created algorithms and to improve the results. By combining various models trained on automatically segmented image patches, they were able to achieve a reasonably high level of precision. The system proposed by Shi and Songlin [5] comprises four layers, namely the NB-IoT sensing terminal layer, network transmission layer, IoT platform management layer, and application services layer. The system's hardware comprises a battery, an IoT module, a main microprocessor, and a smoke sensor. In the system, they programmed that (a) if the smoke value is higher than 1000, fire is detected and the alarm has to be turned on. The control of the LED and buzzer is within the main chip to issue the alarm. After the alarm is turned on, the main chip sends the alarm data to the NB-IoT module, which sends it to the cloud. As soon as the information is received, the platform immediately sends the alarm data, with the exact location, to the fire department and the user. (b) The system enters an energy-saving state if the value is less than the given threshold and no fire is detected. This system sends the smoke value, threshold, CSQ, and battery level alarm data to the IoT cloud platform. The approach followed in the research of Saeed et al. [6] uses sensors which can detect fire, smoke, heat, and gas. The temperature sensors use (i) the rate of heat rise and (ii) a fixed temperature as classes of operation. They used different sensors for different environments and collected data to create a report which was analyzed later.
They used different topologies to set up wireless
sensor networks and used Zigbee to communicate between the home sink and the sensors. The information collected from the sensors was delivered to the main sink, which was wireless. A Raspberry Pi was used as the primary processing unit as it is popular and versatile in functionality. SMS alert systems were used, which notified the specific user, and the GPIO module checked the user's response to the fire. The response was in 'yes' or 'no' format. However, the alarm's final judgment was based on both the user's response and the sensor's reading. As the cost of a false alarm is high, GSM is used in the kitchen environment. The current GSM modems, i.e., WaveCom and Multitech, have SMS functionalities that use text mode to allow the home sink to send a pre-warning alert. To evaluate the system, a simulation was done using the Fire Dynamics Simulator, which takes various inputs such as (i) humidity, (ii) initial sensor values, and (iii) sensor thresholds. The house was divided into four parts: the kitchen, bedroom, living room, and TV lounge. Different parameters, consisting of temperature, smoke, and gas, were used to monitor the performance. Visual Studio and C++ libraries were used for the implementation process. We will be using a similar approach for this project. In a different article, Yin et al. [7] provide a novel strategy using visual smoke detection in light of the advancement of various technologies including CNNs and deep learning. The model uses the ViBe algorithm to organize and define the photographs as various datasets in order to retrieve the segments of the smoke shots in the real scenarios. Using the data, they classified the photos as genuine or false warnings using a GAN model. To choose the best scenario for extracting the in-depth features, DCG-CNN looked at a variety of possibilities. During the process, they also used some simulation data. They discovered that their system produced smoke alerts by carefully analyzing actual situations and excluding the possibility of drawing incorrect assumptions. Shahid et al. [8] proposed a two-stage cascaded architectural policy to achieve improved precision. In order to create heatmaps, they first created the Spatio-Temporal network, which combines shape and movement features. They additionally classified the data and evaluated whether the result was deceptive using original pictures and heatmaps of potential sites. Three separate datasets were utilized. The inputs were categorized using spatial and temporal networks, and Dense Block and U-Net integration was employed to improve classification precision. The datasets from Taiwan-tech and Bowfire were used. Positive results from the innovative experiment include improved accuracy and fewer false alarms. Muheden et al. [9] proposed a wireless sensor network that can be deployed at home to notify users of any fire outbreaks. They used an Arduino device that takes in the values from all the sensors, like fire, gas, and humidity, and enabled the communication once the fire signals cross a certain threshold. The communication framework involves an alert message as a notification to the mobile users.
3 Proposed Fire Detection System Fire-related hazards are not uncommon, and steps are taken to solve the problem; surprisingly, however, fire-related hazards are still on the rise. Thus, we wanted to create a system that is not only a fire alarm system but can also predict the fire hazard using data and machine learning. In our paper, we decided not to test the use cases in real-world scenarios, as real-time testing involves costs that are not in our budget. Moreover, it is risky to conduct such testing as there can be severe damage or harm to lives or property in a facility. Hence, we have used the Fire Dynamics Simulator (FDS), a tool used to simulate a fire scenario. FDS is a tool developed by the National Institute of Standards and Technology (NIST) to simulate a fire under various conditions. It is software that numerically solves the Navier-Stokes equations. It is built using FORTRAN and takes a variety of inputs, including sensor thresholds, initial sensor values, humidity, and other factors for designing the environment. We simulated fire in a particular environment in FDS. We built a scenario of a house consisting of four rooms, namely a bedroom, living room, kitchen, and TV lounge, and used three sensors (temperature, gas, and smoke) to monitor the fire. We provided initial inputs to the sensors, such as room temperature (for example, 25 °C), and simulated for a specific period. We ignited a fire in one of the rooms and waited for the fire to spread. Different scenarios were observed with the functioning of a single sensor and multiple sensors. All the values and results were considered, and the threshold was set. The simulator generated a simulation output along with Excel data files containing values such as fire, radiation, and pressure. Once the data was generated, it was stored in an Excel file. We visualized the data from the Excel file with the help of Jupyter Notebook, which is a free, open-source platform. We also built a data model; the data was analyzed and used to train different machine learning algorithms to predict the temperature based on other parameters such as gas temperature, radiation, heat release rate, and conduction. We added a feature to alert the responsible person about the fire hazard through email and text. For this, the data generated from the simulation was taken as the input for generating an alert message. An email and text message alert was defined in the Jupyter Notebook. Whenever the threshold of 47 °C is crossed, the system automatically notifies the authorized person.
3.1 Methodology In this part, we describe the process we went through. As we did not build any devices or hardware, we used the data we generated from the simulation. In the first part, we give a brief overview of current systems. Then we
describe the simulations we used to generate the data. In the third part, we give an overview of the data used. Fire alarm systems are widely used in every place where people live and work, for example, residences, offices, and industries. While an alarm can save many lives and much money, a false alarm can do the opposite. Thus, it is essential to have a flawless system that serves the purpose well. Standard fire alarm systems follow different approaches, and most of them are not well connected with each other, especially with smart devices. The available fire alarm systems depend mainly on sensors. However, in this paper, we do not use or build any hardware to detect fire; instead, we use the Fire Dynamics Simulator.
3.1.1 A Simulated Model
Our paper used the Fire Dynamics Simulator to create a house with four rooms: a kitchen, a living room, a bedroom, and a TV lounge. This is implemented using the FORTRAN programming language. There are five ventilation doors, as shown in Fig. 2, and the simulation lasts for a given number of seconds. The fire was fueled with propane, and we assigned a magnitude to the soot and to HRRPUA (Heat Release Rate per Unit Area) based on the amount of fire and smoke we wanted to produce. The quantity 'Thermocouple' represents the actual room temperature, while 'Temperature' represents the gas temperature. As illustrated in Fig. 1, both of these sensors were placed in each room, with '1' being a thermocouple and '2' being a gas sensor. Finally, we created a slice file with a temperature range in the Y-axis plane (Fig. 2). FDS allows us to view the output when the simulation is finished by loading each quantity. All of the quantities we created could be loaded, such as the soot density shown in Fig. 3, the fire and heat release rate shown in Fig. 4, and the thermocouple and gas temperature slice files displayed in Figs. 5 and 6, respectively. After the simulation is over, FDS creates two output files in CSV format: devc and hrr. We merged the two files into a single CSV output file to develop the alert system.
Fig. 1 Simulation of a house with four rooms: Kitchen, Living room, Bedroom, and TV Lounge with 1. Thermocouple and 2. Gas sensor placed in each room
Fig. 2 Simulation of the house with five doors as the ventilation
Fig. 3 The figure depicts the soot density
3.1.2 Applying Machine Learning Models
This section discusses applying different machine learning models for detecting fire hazards. We first discuss the dataset and analyze it before selecting the features and modeling them into the different ML models. Dataset Description In this paper, we created a machine learning model to predict the room temperature based on other factors. We utilize the dataset for the model by merging multiple simulations generated from Fire Dynamics Simulator with varied temperature and fire parameters. The initial dataset consisted of 20 attributes and 9161 rows. The dataset included attributes such as TEMP1 which is the gas temperature of the first room, TC1 which is the thermocouple temperature of the first room, HRR which is the heat release rate, and many more. Based on the other features, we forecast the thermocouple measurement of the kitchen where there is fire. Following feature
Fig. 4 The above figure shows the heat release rate (HRR)
Fig. 5 The slice file of thermocouple
extraction, the final dataset utilized for modeling includes the 12 most significant factors displayed in Table 1. Analyzing the dataset: We begin by importing the required libraries and analyzing the dataset. We use the Pandas head() and shape() methods to inspect the rows of the dataset and to get the total number of rows and columns, which are 9161 rows and 12 columns. We also look at the dataset’s data types, which were found to contain two different datatypes: int64 and float64. There are a few negative values in the dataset; negative numbers indicate the attribute’s opposite direction with no change in magnitude; thus, we used the absolute() function to convert the data frame to positive values.
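The following is a minimal sketch of these inspection and cleaning steps with pandas, assuming the merged simulator output has been saved to a CSV file (the file name is hypothetical).

```python
import pandas as pd

# Merged devc + hrr simulator output saved as one CSV (the file name is hypothetical).
df = pd.read_csv("fds_simulation_output.csv")

print(df.shape)    # number of rows and columns, e.g. (9161, 12) after feature extraction
print(df.head())   # first rows of the dataset
print(df.dtypes)   # only int64 and float64 columns are expected

# Negative values only indicate the opposite direction of a quantity, so keep magnitudes.
df = df.abs()
```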
Fig. 6 The slice file of gas temperature

Table 1 The final dataset used for modeling

Attribute   Description
TEMP1       This depicts the gas temperature of kitchen
TEMP2       This represents the gas temperature of living room
TEMP3       This depicts the gas temperature of TV Lounge
TEMP4       This depicts the gas temperature of the bedroom
TC1         This shows the measurement of thermocouple sensor in the kitchen
TC2         This depicts the measurement of thermocouple in the living room
TC3         This shows the thermocouple measurement in the TV lounge
HRR         This represents the value of the amount of Heat Released from the fire we generated
Q_RADI      This represents the radiative heat released due to the fire
Q_CONV      This depicts the convective heat flux due to the fire
Q_COND      This depicts the flow of enthalpy into the room due to the fire
MLR_FUEL    This shows the mass loss rate of the fuel as the fire generates
Fig. 7 Correlation between Thermocouple and other features
Fig. 8 Heatmap of Correlation between Thermocouple and other features
Feature Selection: The main goal of this stage is to identify the best features. We compute the correlations between the thermocouple temperature column TC1 and the other columns using corrwith() and then construct a bar graph to display the results. Figure 7 shows the relationship between TC1 and the other attributes. The correlation is then visualized using a heatmap, as seen in Fig. 8. After carefully examining the values of the correlation matrix, we filtered the ones that most closely correlated with room 1's thermocouple temperature. We considered the features with correlations greater than 0.75 and ignored the remainder. The final selection is not based solely on the correlation values; we also excluded features that do not add much value to the model. Finally, we chose twelve attributes to model.
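A hedged sketch of this correlation-based selection, using corrwith() and the 0.75 threshold mentioned above, might look as follows; column names follow Table 1, and the plotting details are our own.

```python
import seaborn as sns
import matplotlib.pyplot as plt

# Absolute correlation of every feature with the kitchen thermocouple TC1.
corr = df.drop(columns=["TC1"]).corrwith(df["TC1"]).abs()
corr.sort_values().plot(kind="barh")          # bar chart, comparable to Fig. 7
plt.show()

# Keep only the features whose correlation with TC1 exceeds the 0.75 threshold.
selected = corr[corr > 0.75].index.tolist()
sns.heatmap(df[selected + ["TC1"]].corr(), annot=True)   # heatmap, comparable to Fig. 8
plt.show()
```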
Data Modeling: The dataset is now ready for modeling after the features have been chosen. Because a simulator generated it, there are no null values in this dataset. To estimate the temperature, we employed seven different machine learning regression models: Linear Regression, Lasso Regression, Ridge Regression, Support Vector Machine, Gradient Boost, Decision Tree, and Random Forest Regressor. For modeling the data, we first separated the target variable from the feature variables. The thermocouple 'TC1' is the target variable, stored in variable 'y', with the rest of the features in a data frame 'X'. We constructed a function called 'print_evaluate' that uses the twelve evaluation metrics, such as Mean Squared Error (MSE), Mean Absolute Error (MAE), Root Mean Square Error (RMSE), and R Squared Error (R2), to evaluate the model. We divided the data into 80% training, 10% validation, and 10% testing using sklearn's train_test_split module. We then create an instance of LinearRegression by importing it from sklearn.linear_model. After that, we used the .fit() function to fit the model to X_train and y_train. We use the .predict() function on X_valid to predict the temperature and save the result in a variable called test_pred. The print_evaluate function compares the actual values 'y_valid' with the predicted values 'test_pred' and prints the 'MAE', 'MSE', 'RMSE', and other metrics used. The same steps were taken for the Lasso Regression, Ridge Regression, Support Vector Machine, Gradient Boost, Random Forest Regressor, and Decision Tree algorithms. We evaluated the models using the validation set and observed the results. In addition to the above algorithms, we implemented hybrid algorithms using StackingRegressor from the mlxtend library in Python. Stacking is a meta-learning method that discovers the optimal way to combine the predictions of each base algorithm. The base models form the first layer, often known as the base layer; the second layer consists of a meta-regressor model. After we train the data using the base layer models, these predictions are utilized by the meta-regressor model to make the final prediction that best fits the dataset with maximum accuracy. The most straightforward application of this metamodel would be to average out the results from the base models. We used two stacking regressor algorithms with varied base models and SVR as the meta-regressor. For the first stacking regressor model, we used linear regression and gradient boost as the base models and SVR as the meta-regressor. For the second stacking regressor, we used linear regression, decision tree, and ridge regression as base models. All of the models were over 86 percent accurate, with gradient boost, random forest, and the first hybrid algorithm outperforming the other methods with an R2 score of 95 percent. To avoid over-fitting of the data, we applied a cross-validation technique on the dataset using ShuffleSplit with 10 splits.
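The following hedged sketch outlines this pipeline: the 80/10/10 split, the print_evaluate helper, a few of the base regressors, and the first stacking model built with mlxtend's StackingRegressor. Hyper-parameters and variable names are illustrative rather than the authors' actual code.

```python
import numpy as np
from mlxtend.regressor import StackingRegressor
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVR

X = df.drop(columns=["TC1"]).values
y = df["TC1"].values

# 80% training, 10% validation, 10% testing.
X_train, X_rest, y_train, y_rest = train_test_split(X, y, test_size=0.2, random_state=42)
X_valid, X_test, y_valid, y_test = train_test_split(X_rest, y_rest, test_size=0.5, random_state=42)

def print_evaluate(y_true, y_pred):
    print("MAE :", mean_absolute_error(y_true, y_pred))
    print("MSE :", mean_squared_error(y_true, y_pred))
    print("RMSE:", np.sqrt(mean_squared_error(y_true, y_pred)))
    print("R2  :", r2_score(y_true, y_pred))

models = {
    "Linear Regression": LinearRegression(),
    "Gradient Boost": GradientBoostingRegressor(),
    "Random Forest": RandomForestRegressor(),
    "Stacking (hybrid 1)": StackingRegressor(
        regressors=[LinearRegression(), GradientBoostingRegressor()],
        meta_regressor=SVR()),
}
for name, model in models.items():
    model.fit(X_train, y_train)
    print(name)
    print_evaluate(y_valid, model.predict(X_valid))
```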
3.2 Experimental Setup A 64-bit Windows 11 computer with two graphics cards—Intel(R) UHD graphics 620 and NVIDIA GeForce MX250—was used for the experiment. On this system,
Fig. 9 This figure shows the Email alert received to the recipient’s email address when the fire rose above the set threshold value
the simulation was carried out using FDS, and the data modeling was carried out using Jupyter Notebook.
3.3 Alert System The email/SMS alert system was built in a Jupyter Notebook. We did this by importing the 'smtplib' and 'EmailMessage' modules from the Python standard library. Simple Mail Transfer Protocol (SMTP) is a technology that manages email sending and email routing between mail servers. Sending email to any Internet device with an SMTP or ESMTP listener daemon is possible with Python's smtplib module, which defines an SMTP client session object. The EmailMessage class serves as the foundation class of the email object model. With EmailMessage, we may specify and query header fields, access message bodies, and create or change structured messages. We defined an EmailMessage object, created a dummy email, '[email protected]', and connected to the Gmail server using the smtplib module. We sent an email alert from this address to the recipient's email with the subject 'Fire Alert' and the body 'There is a fire in the room' when the temperature rose above the set threshold of 47 °C. When the temperature in the room exceeds 47 °C, an email notification indicating there is a fire in the room appears, as shown in Fig. 9. By replacing the recipient's email address with their mobile number, the alert can also be sent as an SMS.
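A minimal sketch of this alert mechanism using smtplib and EmailMessage is given below; the sender address, password, recipient, and the source of the predicted temperatures are placeholders, not values from the paper.

```python
import smtplib
from email.message import EmailMessage

THRESHOLD = 47.0   # degrees Celsius

def send_fire_alert(recipient):
    msg = EmailMessage()
    msg["Subject"] = "Fire Alert"
    msg["From"] = "[email protected]"          # placeholder sender address
    msg["To"] = recipient
    msg.set_content("There is a fire in the room")
    with smtplib.SMTP_SSL("smtp.gmail.com", 465) as server:
        server.login("[email protected]", "app-password")   # placeholder credentials
        server.send_message(msg)

predicted_temperatures = [25.1, 32.4, 51.7]       # illustrative model predictions over time
for temp in predicted_temperatures:
    if temp > THRESHOLD:
        send_fire_alert("[email protected]")    # placeholder recipient
        break
```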
3.4 Results After the temperature reaches a certain limit, the model detects a fire and notifies the user through email. The machine learning model was evaluated using twelve evaluation metrics, including Mean Absolute Error (MAE), Mean Squared Error (MSE), Root Mean Squared Error (RMSE), Root Mean Squared Logarithmic Error (RMSLE), R Squared Error (R2), Mean Bias Error (MBE), Relative Absolute Error (RAE), Mean Absolute Percentage Error (MAPE), and Relative Squared Error. The evaluation of these metrics for each of the employed algorithms is depicted in Figs. 10 and 11.
Fig. 10 The evaluation of various metrics on the algorithms
Fig. 11 The evaluation of various metrics on the algorithms
Fig. 12 The graph shows the accuracy of each algorithm
All the models achieved an accuracy score of over 0.85. After comparing the performance of all the models, the algorithms that performed best are Gradient Boost, Random Forest Regressor, and the first hybrid algorithm, which achieved a score of 0.96. The performance of the models is depicted in Fig. 12.
4 Conclusion and Future Work Predicting fire is critical to saving the environment and living beings. In this paper, a fire detection model is proposed. A simulation of a fire was made to understand the impact it has on different types of houses and locations; an alert system was developed that notifies the responsible person whenever there is a fire or a rise in the temperature; and finally, several machine learning models, including hybrid models, were applied to the dataset to predict fire. The data was analyzed using seven different machine learning algorithms along with two hybrid algorithms and evaluated with twelve evaluation metrics. The Random Forest Regressor, Gradient Boost, and the first hybrid algorithm perform the best in predicting the required values. In the future, this system can be implemented in real-time systems and can further be developed to take dynamic data as input to predict fire and prevent it. After the system has undergone real-world testing and has been demonstrated to identify fire dangers in real-life surroundings, it can be deployed initially in household contexts based on the results and findings we obtained. Because our system has the capability to notify users, we could create a website using Python and the Django framework, with servers provided by Heroku, so that users can track incidents and store data. A similar work done by [9] could be used as a reference.
References 1. McGrattan KB, Baum HR, Rehm RG, Hamins A, Forney GP, Floyd JE, Hostikka S, Prasad K (2000) Fire dynamics simulator–Technical reference guide. National institute of standards and technology, building and fire research 2. Wang Y, Chatterjee P, de Ris JL (2011) Large eddy simulation of fire plumes. Proc Combust Inst 33(2):2473–2480 3. Pham BT, Jaafari A, Avand M, Al-Ansari N, Dinh Du T, Yen HPH, Van Phong T, Nguyen DH, Van Le H, Mafi-Gholami D et al (2020) Performance evaluation of machine learning methods for forest fire modeling and prediction. Symmetry 12(6):1022 4. de Almeida Pereira GH, Fusioka AM, Nassu BT, Minetto R (2021) Active fire detection in landsat-8 imagery: a large-scale dataset and a deep-learning study. ISPRS J Photogramm Remote Sens 178:171–186 5. Shi X, Songlin L (2020) Design and implementation of a smart wireless fire-fighting system based on NB-IoT technology. In: Journal of physics: conference series, vol 1606. IOP Publishing, p 012015 6. Saeed F, Paul A, Rehman A, Hong WH, Seo H (2018) Iot-based intelligent modeling of smart home environment for fire prevention and safety. J Sens Actuator Netw 7(1):11 7. Yin H, Wei Y, Liu H, Liu S, Liu C, Gao Y (2020) Deep convolutional generative adversarial network and convolutional neural network for smoke detection. Complexity 8. Shahid M, Chien I, Sarapugdi W, Miao L, Hua K-L et al (2021) Deep spatial-temporal networks for flame detection. Multimed Tools Appl 80(28):35297–35318 9. Muheden K, Erdem E, Vançin S (2016) Design and implementation of the mobile fire alarm system using wireless sensor networks. In: 2016 IEEE 17th international symposium on computational intelligence and informatics (CINTI). IEEE, pp 000243–000246
Compressed Image Super-Resolution Using Pre-trained Model Assistance Umar Masud and Friedhelm Schwenker
Abstract Single Image Super-Resolution (SISR) has seen flourishing research in the past years, with many different solutions ranging from heuristics-based to deep learning-based models. However, little attention has been paid to compressed images, even though they are the ones most commonly encountered in practice. The challenge in compressed image super-resolution is that compression artefacts make it even more difficult to learn the high-resolution image from the low-resolution one. To this end, we evaluate a simple, convolution-based model which uses a large-scale pre-trained feature extractor network during the training phase and then works independently during inference without the pre-trained model, making it a lightweight, deployable solution. The performance of our proposed system is assessed against various existing methods, and experimental results show that our technique achieves competitive results. Keywords Super-resolution · Image enhancement · JPEG Compression · CNN
1 Introduction Single Image Super-Resolution (SISR) is a longstanding visual task which aims to restore a super-resolution (SR) image from its degraded low-resolution (LR) counterpart. SISR finds significance in many image processing and analysing systems with its variety of applications in medical imaging, security, and surveillance imaging. Recently, convolution neural networks (CNN) based SISR methods have achieved more remarkable performance than conventional methods due to their powerful feature extraction ability. It is observed that deeper networks are better at learning rich details and yield finer visual quality HR images. Therefore, early works such U. Masud (B) Jamia Millia Islamia, New Delhi, India e-mail: [email protected] F. Schwenker Ulm University, Ulm, Germany © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 R. Sarkar et al. (eds.), Proceedings of International Conference on Frontiers in Computing and Systems, Lecture Notes in Networks and Systems 690, https://doi.org/10.1007/978-981-99-2680-0_5
Fig. 1 Uncompressed LR image (left) and Compressed LR image with QF = 10 (right)
as [1–5] adopted various CNN-based approaches such as residual connections, channel attention, and generative adversarial networks for the SR task. However, all of these works are based on the assumption that the LR images are uncompressed and lossless. This assumption falls flat in the real-world scenario where most of the stored data are compressed in some way or another. In the case of images, many compression techniques exist, but JPEG is one of the most widely used standards. JPEG compression splits an image into 8 × 8 blocks and applies the discrete cosine transform (DCT) to each block. The DCT coefficients are then divided by a quantization table which is usually summarized by an integer called the quality factor (QF), ranging from 0 to 100, where a lower quality factor means a smaller storage size but more lost information. Figure 1 shows the difference between an uncompressed image and a compressed image with QF = 10. The lossless LR image suffers only from blurring and downsampling, but a JPEG-compressed lossy image additionally suffers from compression degradations. The compression process causes information loss, especially at a high compression ratio. Intuitively, it is much harder to recover the HR image from a compressed LR image than from an uncompressed one. To tackle this problem, we investigate a simple CNN-based model with residual blocks and skip connections which uses a pre-trained network as an assistant during the training phase to learn features from the LR image. The pre-trained assistant network is discarded during inference, and only the CNN model performs super-resolution of the given LR input, making it lightweight and deployment-friendly. The residual CNN model is inspired by the sub-module of the MobileSR [6] model with some fundamental changes. We use a combination of dilated (atrous) and depth-wise convolutions connected densely as the main backbone before up-sampling is done. We also employ the Convolution Block Attention Module (CBAM) [7], which extracts richer features from both spatial and channel dimensions. Finally, a tail module is applied to further enhance the network’s representations. The pre-trained network used for assisting feature learning is the Swin Transformer [8] model pre-trained on the ImageNet dataset. With its ability to learn long-range dependencies, such a transformer-based model is a good fit for a feature extraction network. Our proposed technique achieves competitive results on various benchmarks. Thus, our contributions can be summarized as:
• A novel model is proposed which makes use of densely-connected convolutions for the task of generating super-resolution images from compressed, lossy low-resolution images.
• We implement a unique model training strategy which uses a large-scale pre-trained network to assist our main model in learning better representations from the data. During inference, we discard the pre-trained network without much drop in accuracy, making our model more robust and effective with very little extra cost. Additionally, unlike other works, we are able to train our model in a single stage.
2 Related Work In recent years, image super-resolution models based on deep learning have achieved state-of-the-art performance. The existing methods can be categorized according to their model designs. The earliest and simplest network designs, such as [1, 9], were linear networks in which several convolution layers are stacked on top of each other without any residual or skip connections. The limitation of these simple linear models was that they could not be made deeper because of the vanishing gradient problem. Residual connections were therefore used to solve this, and much deeper networks such as [4, 10] were proposed. Along similar lines, recursive-style models like [11, 12] emerged, which used recursive units within the model to save on parameter complexity. The methods of [13, 14] used dense connections to further enhance representation learning and improve the quality of SR images. Kim et al. [15] used a residual attention mechanism involving spatial attention and channel attention for learning the inter-channel and intra-channel dependencies. Generative Adversarial Networks (GANs) are also used in many works such as [3, 5], where the generator tries to output an SR image and the discriminator tries to classify it as a real HR image or an artificially super-resolved output. Lately, with the advent of transformer architectures, many works such as [16, 17] are solving the SISR task via self-attention modelling. However, none of these methods has directly explored compressed image super-resolution. Chen et al. [18] was one of the first works to attempt super-resolution on lossy images. It consists of three units: a deblocking unit, an upsampling unit, and a quality enhancement unit. It first trains each module separately and then jointly optimises them with a fine-tuning strategy. Chang et al. [19] introduce a novel loss and training scheme that can produce HR images while minimising artefacts. Gunawan and Madjid [20] propose a network that employs a two-stage coarse-to-fine learning framework for directly learning on compressed images. Thus, we see that compressed image super-resolution has received too little attention, and in this work we try to provide some novel solutions for it.
3 Proposed Model As seen in Fig. 2, our model consists of a CNN-based architecture with many residual connections in between. The major backbone of the model consists of residual groups which are densely connected. Each residual group consists of a dilated, depth-wise, and again dilated convolution sandwich with Leaky ReLU (Rectified Linear Unit) activation. Dilated or atrous convolutions have an inflated kernel which gives a bigger receptive field without adding to the parameter or computation cost. The idea is to expand the area of the input image convolved and help cover more information in the output obtained. In depth-wise convolution, each input channel is convolved with a single individual filter and the outputs are then stacked together. This reduces the number of parameters and computations used in convolutional operations while increasing representational efficiency. Thus, our intuition behind using a combination of atrous and depth-wise convolution layers is to maximise the feature learning capacity with minimum operational cost. Along with dense residual connections, our backbone can retain useful representations from all the initial to the later layers, effectively aiding the flow of information. We use four residual blocks in our backbone, each having an exponentially increasing filter depth. After the backbone, we employ the upsampler module, which uses a pixel-shuffle and a convolution layer. The upsampled image is then passed through a post-processing module which comprises a CBAM [7] attention block and a set of point-wise convolutions to refine the output. The CBAM block enhances the learned features by focusing on ‘what’ to attend and ‘where’ to attend on the feature maps. We use point-wise convolutions in this part for two major reasons: (a) to add more non-linearity to the model, as it expands the possibilities for the network, and (b) to keep the computational cost as low as possible.
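A minimal PyTorch sketch of one such residual group is given below, assuming 64 channels and a dilation rate of 2; both values are illustrative assumptions, since the paper does not fix them here.

```python
import torch
import torch.nn as nn

class ResidualGroup(nn.Module):
    """Dilated -> depth-wise -> dilated convolution sandwich with Leaky ReLU,
    wrapped in a residual (skip) connection."""
    def __init__(self, channels=64, dilation=2):
        super().__init__()
        self.body = nn.Sequential(
            # dilated (atrous) convolution: larger receptive field at the same parameter cost
            nn.Conv2d(channels, channels, 3, padding=dilation, dilation=dilation),
            nn.LeakyReLU(0.2, inplace=True),
            # depth-wise convolution: one filter per input channel (groups=channels)
            nn.Conv2d(channels, channels, 3, padding=1, groups=channels),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(channels, channels, 3, padding=dilation, dilation=dilation),
            nn.LeakyReLU(0.2, inplace=True),
        )

    def forward(self, x):
        return x + self.body(x)    # skip connection keeps early features flowing

# Four such groups, with their outputs densely connected, would form the backbone.
x = torch.randn(1, 64, 48, 48)
print(ResidualGroup()(x).shape)    # torch.Size([1, 64, 48, 48])
```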
Fig. 2 The overall architecture of proposed model. It consists of residual backbone, the upsampler module and the post-processing module containing CBAM [7] attention block and point-wise convolutions. The Swin [17] network is used during the training stage to fuse information in the intermediate layers of the model
An important component of our solution is to leverage the pre-trained network for fusing valuable features between the different modules of our model. For this purpose, we chose the Swin Transformer [17] model, which, as is characteristic of transformer models, is quite good at learning long-range dependencies in the input data. We take the globally averaged output of the Swin network after passing our low-resolution input image through it, and then pass this output through dense linear layers before adding it to our main model. This amalgamation of features helps direct our model during training by adding more substantial content features. As seen earlier, compressed LR images are even noisier to learn from due to compression artefacts, so using a pre-trained network partly makes up for the content loss. During testing, we drop the pre-trained network and only our main model is used to output super-resolution images. We find that our model can function efficiently without much loss in performance during inference. This gives our model an advantage, as it can now be used at low-resource endpoints. Since the training data available for super-resolution is itself quite limited for a deep learning model to work efficiently, our strategy of using a pre-trained network to assist a lighter model suits the problem well.
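The sketch below illustrates this training-time fusion, assuming the Swin backbone is loaded from the timm library and the pooled features are projected down with two dense layers; the specific Swin variant, the projection sizes, and the 224-pixel resize are illustrative assumptions, not details stated in the paper.

```python
import torch
import torch.nn as nn
import timm

# Frozen, pre-trained Swin used purely as a feature extractor during training.
swin = timm.create_model("swin_tiny_patch4_window7_224", pretrained=True, num_classes=0)
swin.eval()
for p in swin.parameters():
    p.requires_grad = False

# Dense layers projecting the pooled Swin features to the backbone's channel width.
project = nn.Sequential(nn.Linear(swin.num_features, 256), nn.ReLU(), nn.Linear(256, 64))

def fuse_swin_features(lr_img, backbone_feat):
    """Add globally averaged Swin features, after dense projection, to an intermediate
    feature map of the main model (input normalisation omitted for brevity)."""
    with torch.no_grad():
        pooled = swin(torch.nn.functional.interpolate(lr_img, size=224))   # (B, num_features)
    guide = project(pooled)                                                # (B, 64)
    return backbone_feat + guide[:, :, None, None]                         # broadcast over H x W

# At inference the Swin branch is simply dropped and the backbone runs on its own.
```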
4 Experiment and Results 4.1 Implementation Overview The dataset used for training is Div2k, which contains 800 images. The test sets include Set5, Set14, B100, Urban100, and Manga109. The low-resolution images in the training and test sets are compressed to 10% quality, consistent with the CISR-Net [20] baseline. For training our model, we use three types of loss functions which are widely adopted in the super-resolution task. The first is the conventional L1 loss; the second is the content loss or VGG loss, where the output features of the SR and ground-truth images are taken from the VGG model and compared at the feature level, with the aim of maximising perceptual similarity. The third is the frequency loss proposed in [23], which helps in estimating high-frequency details that the other losses miss. Thus, our final loss function is the summation of the above losses:

L = L_1 + L_vgg + L_freq    (1)

Our model is trained for 200 epochs on random LR patches of size 48 × 48. The images are normalised and randomly augmented with horizontal flips, vertical flips and rotations. The learning rate is 0.0002 with a momentum of 0.9. The optimiser is Adam, and we also use a StepLR scheduler with milestones of 70, 120 and 170 and a 0.5 gamma value.
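A hedged PyTorch sketch of the combined objective in Eq. (1) and the optimizer schedule is shown below; the choice of VGG-19 layers, the simplified FFT-magnitude form of the frequency loss (the paper uses the loss of [23]), and the dummy module passed to the optimizer are assumptions for illustration, not the authors' exact implementation.

```python
import torch
import torch.nn as nn
from torchvision.models import vgg19

# Frozen VGG-19 feature extractor for the content (perceptual) loss.
vgg_feat = vgg19(weights="IMAGENET1K_V1").features[:36].eval()
for p in vgg_feat.parameters():
    p.requires_grad = False
l1 = nn.L1Loss()

def total_loss(sr, hr):
    """L = L1 + Lvgg + Lfreq as in Eq. (1); VGG input normalisation is omitted here."""
    loss_l1 = l1(sr, hr)
    loss_vgg = l1(vgg_feat(sr), vgg_feat(hr))
    loss_freq = l1(torch.fft.rfft2(sr).abs(), torch.fft.rfft2(hr).abs())
    return loss_l1 + loss_vgg + loss_freq

# Optimiser and schedule as described: Adam, lr 2e-4, halved at epochs 70, 120 and 170.
dummy = nn.Conv2d(3, 3, 3)   # stands in for the super-resolution model's parameters
optimizer = torch.optim.Adam(dummy.parameters(), lr=2e-4, betas=(0.9, 0.999))
scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[70, 120, 170], gamma=0.5)
```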
4.2 Results Our results are shown in Table 1 with comparisons at scale factors 2 and 4. All the baseline results are taken from [20]. We used the PSNR and SSIM metrics, which are commonly adopted for the super-resolution task, computed on the Y-channel of our outputs, and compared the results with various methods. At the scale factor of 2, our model performs slightly worse than the other methods except on Set14 and Urban100, where we achieve the highest PSNR value of 26.788 and the highest SSIM value of 0.736, respectively. For 4x up-scaling, we achieve the highest PSNR/SSIM values of 23.974/0.598 and 23.892/0.577 on Set14 and B100. We also compare the number of parameters, which shows that our model is second best in terms of parameter count with just 4.91M parameters. We observe that B100, Urban100 and Manga109 are more complex datasets with more intricacies and fine details in the images, and as such extremely small models like VDSR [2] fail to perform well on them. As the complexity of the images increases, classical SR models fail to perform well under added compression artefacts. Thus, we conclude that in terms of performance on the metrics as well as parameter count, our model achieves competitive results.
4.3 Discussion The visual comparison of the SR results obtained using different methods on sample images from the test sets is shown in Fig. 3. It is observed that our model with much fewer parameters is still able to reconstruct a super-resolution image preserving all the perceptible details. Our model keeps the high-frequency and structural information intact, reducing all other artefacts generated due to down-sampling and
Table 1 Results for super resolution images where the LR input was compressed to 10% quality. The values represented are Peak Signal to Noise Ratio (PSNR)/Structural Similarity Index (SSIM)

Model | Scale | Set5 | Set14 | B100 | Urban100 | Manga109 | Params (in million)
VDSR [2] | x2 | 29.138/0.807 | 25.514/0.660 | 24.983/0.617 | 23.194/0.642 | 25.572/0.771 | 0.67
VDSR [2] | x4 | 26.564/0.725 | 22.919/0.544 | 23.263/0.528 | 20.777/0.506 | 21.802/0.642 |
RDN [14] | x2 | 28.817/0.814 | 26.723/0.708 | 26.050/0.659 | 24.775/0.724 | 25.988/0.835 | 22.23
RDN [14] | x4 | 24.987/0.698 | 23.827/0.594 | 23.812/0.558 | 21.744/0.577 | 20.414/0.672 |
RCAN [21] | x2 | 28.929/0.816 | 26.784/0.708 | 26.098/0.660 | 24.941/0.727 | 25.859/0.833 | 15.55
RCAN [21] | x4 | 24.191/0.640 | 23.215/0.561 | 23.466/0.537 | 21.088/0.526 | 20.657/0.628 |
CISRNet [20] | x2 | 28.936/0.816 | 26.777/0.708 | 26.081/0.659 | 24.932/0.727 | 26.013/0.836 | 9.85
CISRNet [20] | x4 | 25.026/0.702 | 23.880/0.596 | 23.827/0.558 | 21.861/0.582 | 20.210/0.671 |
HST [22] | x4 | 22.510/0.637 | 21.860/0.542 | 22.200/0.518 | 20.430/0.561 | 20.940/0.686 | 11.90∼16.58
Ours | x2 | 27.621/0.771 | 26.788/0.699 | 25.950/0.631 | 24.832/0.736 | 25.565/0.767 | 4.91
Ours | x4 | 24.934/0.690 | 23.974/0.598 | 23.892/0.577 | 20.930/0.520 | 21.863/0.636 |
Fig. 3 Sample images from the test sets are compared for various benchmarks (HR image, VDSR, RCAN, RDN, CISR-Net, and ours) for 4x up-scaling
compression. Our model has removed blurriness and recovered minute features much better than the other methods. Using a pre-trained network gave a boost in reconstructing the visual content of the images, while the main model could focus on preserving other finer details. Thus, qualitatively our method performs on par with the existing state-of-the-art models. However, there is still much scope for improving the results overall, and we need to focus on handling the compression better. For instance, in the second image of a building in Fig. 3, we observe that the many parallel lines confuse the model and degrade its performance slightly, introducing minor aberrations.
5 Conclusion In this work, we proposed a novel model, consisting of a residual backbone and a post-processing block with an attention module and point-wise kernels, to solve the task of compressed image super-resolution. We also presented a special training strategy that leverages a large-scale pre-trained network which fuses information into the main model during the training stage. Our solution has shown convincing results on various benchmarks, both quantitatively and qualitatively. Compressed image super-resolution is still challenging and needs many improvements, which can be brought about through effective model architectures, training strategies, loss functions, etc. In the future, one can also explore reference-image-based techniques and compressed videos for super-resolution.
References 1. Dong C, Loy C, He K, Tang X (2015) Image super-resolution using deep convolutional networks. arxiv.org/abs/1501.00092 2. Kim J, Lee J, Lee K (2016) Accurate image super-resolution using very deep convolutional networks. In: 2016 IEEE conference on computer vision and pattern recognition (CVPR), pp 1646–1654 3. Ledig C, Theis L, Huszar F, Caballero J, Cunningham A, Acosta A, Aitken A, Tejani A, Totz J, Wang Z, Shi W (2016) Photo-realistic single image super-resolution using a generative adversarial network 4. Lim B, Son S, Kim H, Nah S, Lee K (2017) Enhanced deep residual networks for single image super-resolution. arxiv.org/abs/1707.02921 5. Wang X, Yu K, Wu S, Gu J, Liu Y, Dong C, Loy C, Qiao Y, Tang X (2018) ESRGAN: enhanced super-resolution generative adversarial networks 6. Sun L, Pan J, Tang J (2022) ShuffleMixer: an efficient ConvNet for image super-resolution. arXiv:abs/2205.15175 7. Woo S, Park J, Lee J, Kweon I (2018) Convolutional block attention module. ECCV, CBAM 8. Liu Z, Lin Y, Cao Y, Hu H, Wei Y, Zhang Z, Lin S, Guo B (2021) Swin transformer: hierarchical vision transformer using shifted windows 9. Simonyan K, Zisserman A (2014) Very deep convolutional networks for large-scale image recognition. arxiv.org/abs/1409.1556 10. Ahn N, Kang B, Sohn K (2018) Fast, accurate, and lightweight super-resolution with cascading residual network. arxiv.org/abs/1803.08664 11. Tai Y, Yang J, Liu X (2017) Image super-resolution via deep recursive residual network. In: 2017 IEEE conference on computer vision and pattern recognition (CVPR), pp 2790–2798 12. Tai Y, Yang J, Liu X, Xu C (2017) MemNet: a persistent memory network for image restoration. In: 2017 IEEE international conference on computer vision (ICCV), pp 4549–4557 13. Tong T, Li G, Liu X, Gao Q (2017) Image super-resolution using dense skip connections. In: 2017 IEEE international conference on computer vision (ICCV), pp 4809–4817 14. Zhang Y, Tian Y, Kong Y, Zhong B, Fu Y (2018) Residual dense network for image superresolution. arxiv.org/abs/1802.08797 15. Kim J, Choi J, Cheon M, Lee J (2018) RAM: residual attention module for single image super-resolution. arXiv:abs/1811.12043 16. Yang F, Yang H, Fu J, Lu H, Guo B (2020) Learning texture transformer network for image super-resolution. In: 2020 IEEE/CVF conference on computer vision and pattern recognition (CVPR), pp 5790–5799 17. Liang J, Cao J, Sun G, Zhang K, Gool L, Timofte R (2021) SwinIR: image restoration using swin transformer. In: 2021 IEEE/CVF international conference on computer vision workshops (ICCVW), pp 1833–1844 18. Chen H, He X, Ren C, Qing L, Teng Q (2017) CISRDCNN: super-resolution of compressed images using deep convolutional neural networks 19. Chang S, Kim J, Hahm C (2020) A lightweight super-resolution for compressed image. In: 2020 IEEE international conference on consumer electronics - Asia (ICCE-Asia), pp 1–4 20. Gunawan A, Madjid S (2022) CISRNet: compressed image super-resolution network. arxiv.org/abs/2201.06045 21. Zhang Y, Li K, Li K, Wang L, Zhong B, Fu Y (2018) Image super-resolution using very deep residual channel attention networks 22. Li B, Li X, Lu Y, Liu S, Feng R, Chen Z (2022) HST: hierarchical swin transformer for compressed image super-resolution 23. Cho S, Ji S, Hong J, Jung S, Ko S (2021) Rethinking coarse-to-fine approach in single image deblurring. In: 2021 IEEE/CVF international conference on computer vision (ICCV), pp 4621– 4630 24. 
Liu S, Deng W (2015) Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian conference on pattern recognition (ACPR), pp 730–734
Optimization of Character Classes in Devanagari Ancient Manuscripts and Dataset Generation Sonika Rani Narang, Munish Kumar, and M. K. Jindal
Abstract In any OCR system, the character classes define the set of characters or shapes to be recognized during the recognition process. So, it is necessary to identify character classes for an OCR system. The Devanagari script consists of several basic characters (vowels and consonants), half-forms of characters, modifiers, diacritics, and conjuncts. An analysis of the characters in Devanagari ancient documents has been presented in this chapter. Based on the analysis of such documents, two sets of character classes have been categorized in this work. One set contains only basic characters (vowels and consonants) of the Devanagari script. It does not contain any modifier, conjunct or diacritical symbols and is labeled as DATASET-A. DATASET-A contains 33-character classes. The other set contains all identifiable shapes of Devanagari ancient manuscripts and is labeled as DATASET-B. A total of 236-character classes have been identified in DATASET-B. In this work, the character class count for DATASET-B has been optimized to 116 characters. In this work, a character dataset of 22,522 characters has been used. Discrete Cosine Transform (DCT) zigzag features have been extracted and a Support Vector Machine (SVM) classifier has been used for the recognition of ancient Devanagari characters. A maximum accuracy of 88.69% has been obtained with the above-mentioned setup. Keywords Database · Character database generation · Devanagari ancient Manuscript · DCT
S. R. Narang Department of Computer Science, DAV College, Abohar, Punjab, India e-mail: [email protected] M. Kumar (&) Department of Computational Sciences, Maharaja Ranjit Singh Punjab Technical University, Bathinda, India e-mail: [email protected] M. K. Jindal Department of Computer Science and Applications, Panjab University Regional Centre, Muktsar, Punjab, India © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 R. Sarkar et al. (eds.), Proceedings of International Conference on Frontiers in Computing and Systems, Lecture Notes in Networks and Systems 690, https://doi.org/10.1007/978-981-99-2680-0_6
1 Introduction India is known for a large number of scripts such as the Indus script, Brahmi, Kharoshthi, Gupta, and Kalinga scripts. Many scripts are derived from the Brahmi script, such as Devanagari, Tamil, Telugu, Odia, Assamese, and Bengali. Documents in these scripts contain India’s scientific and cultural heritage. Devanagari has been used since ancient times to write various scriptures. The Devanagari script is used for a variety of languages, such as Hindi, Sanskrit, Nepali, Marathi, Konkani, Maithili, and Sindhi, and has evolved over a period of more than two thousand years. The Devanagari script comprises a total of thirty-three consonants and eleven vowel characters. There are vowel signs which are used to represent vowel sounds in short and long forms. Vowel sounds are indicated by a vowel diacritic if they are preceded by a consonant character, and by vowel letters otherwise. In ancient Devanagari documents, some characters occur with very little frequency. Also, some characters have alternate representations, and there are many characters which are not present in the modern Devanagari script. In ancient Devanagari documents, characters in conjuncts are fused with each other. These fused characters result in segmentation errors; to handle this, such conjuncts are considered as separate classes. Chaudhuri and Pal [1] identified three groups of characters in the Devanagari script, called basic characters, modifiers, and compound characters. Bansal and Sinha [2] have given a method to segment Devanagari conjuncts. Kompalli et al. [3] have differentiated the half forms of consonants from the full forms. Singh and Lehal [4] have optimized the character class count for the Devanagari script and identified a total of 942 recognizable units. Kumar et al. [5] used hierarchical zoning with four features, namely horizontal peak extent, vertical peak extent, centroid, and diagonal features, for the recognition of off-line handwritten Gurmukhi characters. From the literature review, it was observed that Devanagari ancient documents are not easily available and there is no character database available for these documents. This motivated us to create our own database and experiment with the recognition of these characters. In the present work, an analysis of 300 pages of Devanagari ancient documents has been presented. Based on the analysis of such documents, two sets of character classes have been categorized. One set contains only 33 basic characters (vowels and consonants) of the Devanagari script and is labeled as DATASET-A. The other set contains all identifiable shapes of Devanagari ancient manuscripts and is labeled as DATASET-B. A total of 236-character classes have been identified in DATASET-B. In this work, the character class count has been optimized to 116 characters. For this, a character dataset of 22,522 characters has been used. Discrete Cosine Transform (DCT) zigzag features have been extracted and a Support Vector Machine (SVM) classifier has been used for the recognition of ancient Devanagari characters. A maximum accuracy of 88.69% has been obtained with the above-mentioned setup. This study aims at the generation of a character dataset for Devanagari ancient manuscripts. To the best of our knowledge, this is the first study in this field.
Table 1 33 Basic Characters found in present work
2 Identification of Dataset-A In this work, 33 basic characters used in Devanagari ancient manuscripts have been found. Though the Devanagari script contains 44 basic characters (11 vowels and 33 consonants), only 33 classes are identified in this work, because a few samples like आ, ओ, and औ get converted to अ and a matra, ई gets converted to इ and a matra, and ऐ gets converted to ए and a matra in the segmentation process. Also, a few samples like झ, ञ, ठ, ढ, and ण are either not found in the ancient documents or their frequency is less than 15 in the samples taken so far. One character, ‘श’, was included because श is composed of two symbols ‘ ’ and ‘ा’. Table 1 presents these 33 classes.
3 Identification of Dataset-B In Devanagari OCR, a word is divided into 3 zones as shown in Fig. 1. In the present work, only two zones have been considered: the upper zone and the middle zone. The lower zone is combined with the middle zone to get a new core zone as depicted in Fig. 2. Individual shapes in each zone are extracted, and these shapes are categorized into different classes. Sometimes, these symbols may not be valid Devanagari characters. So, the concept of recognizable units is used. A recognizable unit is the smallest unit obtained from the segmentation process. A single Devanagari character may be divided into more than one recognizable unit. For example: श can be
Fig. 1 Three strips of a word in Devanagari
Fig. 2 Two zones of ancient Devanagari word
divided into two parts: ‘ ’ and ‘ा’. The ‘ि’ matra can be divided into two classes: one ‘ा’ in the middle zone and another ‘ ’ in the upper zone. Similarly, ो can be converted to the two character classes ‘ा’ and ‘े’. Sometimes, two Devanagari characters may be placed in a single class because of the difficulty in separating them. The lower zone of the Devanagari script has been combined with the middle zone. For example, the ‘oo’ matras ु and ू are combined with the preceding consonant to make a single recognizable unit; कू is one such example. Also, such conjuncts are each considered as a single class. Some of the words in Devanagari ancient documents and their constituent recognizable units are shown in Table 2.
Table 2 Recognizable units corresponding to some sample words Word
Recognizable Unit
Table 3 Percentage occurrence of recognizable units in Devanagari ancient documents (the ‘Unit’ column, which contains the character images, is not reproduced here)

Sr. no. | %age | Sr. no. | %age | Sr. no. | %age
1. | 12.37917 | 15. | 1.8147 | 29. | 0.870874
2. | 7.163049 | 16. | 1.650556 | 30. | 0.861755
3. | 4.646179 | 17. | 1.532008 | 31. | 0.720409
4. | 4.295094 | 18. | 1.386102 | 32. | 0.652015
5. | 4.13551 | 19. | 1.267554 | 33. | 0.60186
6. | 3.811782 | 20. | 1.203721 | 34. | 0.597301
7. | 3.602043 | 21. | 1.199161 | 35. | 0.588182
8. | 3.2236 | 22. | 1.085172 | 36. | 0.569943
9. | 2.567025 | 23. | 1.080613 | 37. | 0.547146
10. | 2.539668 | 24. | 1.080613 | 38. | 0.524348
11. | 2.535109 | 25. | 0.948386 | 39. | 0.50611
12. | 2.288893 | 26. | 0.893671 | 40. | 0.496991
13. | 2.279774 | 27. | 0.884552 | 41. | 0.483312
14. | 2.261536 | 28. | 0.879993 | 42. | 0.474193
training as well as testing. Some of the characters are found only once or twice in these documents, so these classes need to be optimized. A coverage analysis has therefore been done to find the contribution of different subsets of recognizable units. This analysis indicates the number of character classes which are enough to achieve an acceptable character recognition accuracy. From Table 4, it has been found that the first 116 character classes, if correctly recognized by classifiers, can provide a classification correctness of 98.08%, and this can be increased to 99.97% by considering the first 230 classes. By adding 114 more classes, only a 1.89% increase in accuracy can be achieved, but enough occurrences are not available for these last 114 classes, so classifiers cannot be trained properly for them. For this work, the first 116 classes were considered for classification. For these classes, a minimum of 15
Table 4 Percentage contribution of recognizable units

Number of units | Percentage contribution | Number of units | Percentage contribution | Number of units | Percentage contribution
10 | 48.36312 | 100 | 96.85847 | 180 | 99.69451
20 | 66.58307 | 110 | 97.65639 | 190 | 99.7857
30 | 76.36786 | 116 | 98.08043 | 200 | 99.83586
40 | 82.17217 | 120 | 98.31297 | 210 | 99.88145
50 | 86.54022 | 130 | 98.74612 | 220 | 99.92705
60 | 89.969 | 140 | 99.04249 | 230 | 99.97264
70 | 92.68649 | 150 | 99.27503 | 236 | 100
80 | 94.5787 | 160 | 99.46653 | |
90 | 95.85081 | 170 | 99.60332 | |
occurrences are found for each character. For many classes among the top 116, the number of occurrences is still not enough. For the experiments, some synthetic data was therefore included in those classes for which enough occurrences are not available. Table 4 presents the coverage analysis.
4 Generation of Dataset The dataset is the basis of any character recognition system. In our system, the vowels, consonants, modifiers and conjuncts have been treated as basic recognizable units. After the identification of character classes, the next phase is to accumulate the character images corresponding to each class. These character images have been used as training and testing samples. As no standard character image dataset is available for Devanagari ancient manuscripts, our own dataset has been created. 300 pages from different books written from the fifteenth to the eighteenth centuries have been used. These text documents have been acquired from different libraries and museums using a scanner and a digital camera. The images are pre-processed and converted to binary images. To get recognizable units, they are segmented using the line segmentation, word segmentation and character segmentation algorithms proposed in Narang et al. [6, 7]. The segmented images are then labeled with their corresponding classes. Figure 3 shows the methodology used to build the dataset.
Fig. 3 Procedure for dataset preparation
5 Experimental Setup
5.1 Feature Extraction
In the present work, DCT zigzag features have been obtained for the recognition of Devanagari ancient characters. DCT is widely used in image processing due to its “energy compaction” property. Elementary frequency components are obtained after compression of data by DCT. Low-frequency (high value) coefficients are clustered in the upper left corner and high-frequency (low value) coefficients are in the bottom right corner of the matrix (m, n) if the image has m*n pixels. Only a few components in the upper left corner are meaningful and are used as features. The formula to obtain the DCT matrix is given below:

f(u, v) = α(u) α(v) Σ_{x=0}^{M−1} Σ_{y=0}^{N−1} f(x, y) cos[(2x + 1)πu / 2M] cos[(2y + 1)πv / 2N]

where α(u) = 1/√M for u = 0 and √(2/M) for 1 ≤ u ≤ M−1, and α(v) = 1/√N for v = 0 and √(2/N) for 1 ≤ v ≤ N−1.
After applying the above formula, the feature matrix is obtained. Then, only a few low-frequency coefficients are selected in a zigzag manner, as depicted in Fig. 4, to construct the feature space.
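A small Python sketch of this feature extraction step is given below, using SciPy's DCT and a simple zigzag ordering; the 32 × 32 character image size in the example is an assumption, not a value stated in the paper.

```python
import numpy as np
from scipy.fft import dctn

def zigzag_indices(n):
    """Zigzag traversal order of an n x n matrix, starting from the top-left corner."""
    idx = [(i, j) for i in range(n) for j in range(n)]
    return sorted(idx, key=lambda p: (p[0] + p[1], p[1] if (p[0] + p[1]) % 2 else p[0]))

def dct_zigzag_features(char_img, length=100):
    """2-D DCT of a character image followed by zigzag selection of the first
    `length` low-frequency coefficients."""
    coeffs = dctn(char_img.astype(float), norm="ortho")
    order = zigzag_indices(min(coeffs.shape))
    return np.array([coeffs[i, j] for i, j in order[:length]])

# Example: a 32 x 32 character image (placeholder) -> 100-dimensional feature vector.
print(dct_zigzag_features(np.random.rand(32, 32), length=100).shape)   # (100,)
```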
Fig. 4 DCT zigzag features
5.2 Classification
To determine the class to which a data element belongs, classification is used. Binomial and multi-class classification are the two different types of classification. Since 116 classes have been identified through our analysis, multi-class classification has been applied. A Support Vector Machine (SVM) classifier has been utilised for the classification task. Finding hyperplanes that divide the data into classes is the central step in the SVM classifier; once the ideal hyperplanes have been found, it is simple to classify new data elements. As depicted in Fig. 5, the goal is to locate the optimal separating hyperplane, which is defined by the closest training points (known as support vectors).
Fig. 5 SVM classifier
Table 5 Recognition results (percentage accuracy) with DCT zigzag features of different lengths and SVM

Train-test split | DCT-75 | DCT-100 | DCT-125 | DCT-150 | DCT-175 | DCT-200
85:15 | 85.77 | 88.38 | 86.65 | 85.77 | 86.59 | 86.03
80:20 | 85.26 | 88.69 | 86.97 | 85.26 | 85.88 | 85.55
75:25 | 84.64 | 87.50 | 86.16 | 84.64 | 85.44 | 85.19
70:30 | 83.59 | 86.04 | 85.52 | 83.59 | 84.19 | 83.89
Fivefold | 74.51 | 77.62 | 76.73 | 74.51 | 75.75 | 75.51
Tenfold | 78.38 | 81.02 | 79.60 | 78.38 | 79.45 | 79.24
5.3 Experimental Results
Based on the coverage analysis discussed in Sect. 3, 116 classes were used for character recognition. Depending upon the contribution of each character, 22,522 characters categorized into 116 classes are considered for recognition. For recognition, DCT (Discrete Cosine Transform) features are used with an SVM (Support Vector Machine) classifier. DCT zigzag features with feature vector lengths of 75, 100, 125, 150, 175 and 200 are taken for the experiments. The SVM is used with RBF, polynomial (POLY), linear and sigmoid kernels. Better accuracy was obtained with the POLY kernel, so the experimental results are presented for the POLY kernel of the SVM classifier. Experiments were also done with various train-test ratios, and Table 5 shows the percentage accuracy obtained with these ratios. Using the POLY kernel of the SVM, the maximum accuracy achieved is 88.69%, with a feature vector of length 100 and an 80:20 train-test ratio.
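The following scikit-learn sketch mirrors the reported setup (POLY kernel, 80:20 split); the randomly generated X and y are placeholders for the actual DCT feature vectors and class labels, not data from this work.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

# Placeholder data: in practice X holds the DCT zigzag feature vectors (length 100)
# and y the 116 class labels of the 22,522 character images.
X = np.random.rand(1160, 100)
y = np.repeat(np.arange(116), 10)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.20, stratify=y, random_state=0)

clf = SVC(kernel="poly")      # the POLY kernel gave the best accuracy in this work
clf.fit(X_train, y_train)
print("accuracy:", accuracy_score(y_test, clf.predict(X_test)))
```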
5.4 Comparison with Existing Work
Kumar et al. [5] used hierarchical zoning. They experimented with four features, namely, centroid features, diagonal features, horizontal peak extent features, and vertical peak extent features. For the purpose of recognising off-line handwritten Gurmukhi characters, they proposed a feature set containing 105 elements. SVM was used as a classifier. We experimented with the methods from the aforementioned study on our database in order to validate our proposed strategy, and we compared the outcomes with our proposed method, which is shown in Table 6. As depicted in Table 6, we got better results with the proposed technique as compared to the existing method on Devanagari ancient documents.
Table 6 Comparative study of DCT features with the existing work

Author | Data | Technique | Accuracy (%)
Kumar et al. [5] | Ancient Devanagari text | Centroid, diagonal, horizontal peak extent and vertical peak extent features with SVM classifier | 82.97
Proposed work | Ancient Devanagari text | DCT zigzag features with SVM | 88.69
6 Conclusion The factors of utmost importance in the character recognition process are the identification and selection of character classes, because the recognition process depends on them. These character classes represent the set of identifiable characters and sub-characters known as recognizable units. If the recognizable units of a script tend to combine to form new recognizable units, the recognition task becomes more demanding. The Devanagari script, in addition to its basic character set, has a large set of compound characters. If a single recognizable unit is assumed for each compound character, an analysis of the text corpus shows that a total of 236 character classes are required; these 236 classes are enough to recognize a text document image. As it becomes difficult to handle a large number of classes, the occurrence of each class has also been determined in order to optimize the character class count. According to the results, a classification correctness of 98.08% can be achieved using 116 character classes, and the classification results can be improved further by choosing more character classes. The character classes identified form the basis of data collection. For the training of classifiers, training image samples have been collected from 300 pages of Devanagari ancient manuscripts. A dataset of 22,522 characters has been used for the experiments. DCT zigzag features have been extracted and an SVM classifier has been used. A maximum accuracy of 88.69% has been achieved.
References 1. Chaudhuri BB, Pal U (1997) An OCR system to read two Indian language scripts: Bangla and Devnagari (Hindi). In: Proceedings of the 4th international conference on document analysis and recognition, Germany, vol 2, pp 1011–1015 2. Bansal V, Sinha RMK (2002) Segmentation of touching and fused Devanagari characters. Pattern Recogn 35:875–893 3. Kompalli S, Nayak S, Setlur S (2005) Challenges in OCR of Devnagari Documents. In: Proceedings of the 8th international conference on document analysis and recognition (ICDAR), vol 1, pp 327–331
4. Singh J, Lehal GS (2011) Optimising character class count for Devanagari optical character recognition. Inf Syst Indian Lang 139:144–149 5. Kumar M, Jindal MK, Sharma RK (2014) A novel hierarchical techniques for offline handwritten Gurmukhi character recognition. Natl Acad Sci Lett 37(6):567–572 6. Narang SR, Jindal MK, Kumar M (2019) Drop flow method: an iterative algorithm for complete segmentation of Devanagari ancient manuscript. Multimed Tools Appl 78:23255– 23280. https://doi.org/10.1007/s11042-019-7620-6 7. Narang SR, Jindal MK, Kumar M (2019) Line segmentation of Devanagari ancient manuscripts. In: Proceedings of the national academy of sciences, India Section A: Physical sciences, pp 1–8. https://doi.org/10.1007/s40010-019-00627-2
Deep Learning-Based Classification of Rice Varieties from Seed Coat Images Mondal Dhiman, Chatterjee Chandra Churh, Roy Kusal, Paul Anupam, and Kole Dipak Kumar
Abstract In the agricultural field, identification of rice variety is an important and tough task, especially in naked eyes based on some manual observations and methods. This article deals with a deep learning-based approach for the identification and classification of twenty rice varieties. In this work a deep learning network is developed based on traditional Convolutional Recurrent Neural Network architecture with Long Short-Term Memory as the Recurrent operation. The proposed deep learning model achieved a lower cross-entropy loss of 0.0205 and a higher accuracy of 99.76%. Keywords Rice varieties · Convolutional recurrent neural network · Long short term memory · Seed coat morphology
1 Introduction Rice is the most widely consumed staple food around the world, especially in Asia. It plays an important role in respect of nutrition and caloric intake, providing more than one-fifth of the calories consumed worldwide by humans [5]. It is the seed of the grass M. Dhiman (B) · C. Chandra Churh · K. Dipak Kumar Department of Computer Science and Engineering, Jalpaiguri Government Engineering College, Jalpaiguri, West Bengal, India e-mail: [email protected] C. Chandra Churh e-mail: [email protected] K. Dipak Kumar e-mail: [email protected] R. Kusal Department of Agricultural Entomology, F./Ag., BCKV, Mohanpur, Nadia, West Bengal, India e-mail: [email protected] P. Anupam Kangsabati Command Area Development Authority, Water Resource Investigation and Development, Government of West Bengal, Bankura, West Bengal, India © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 R. Sarkar et al. (eds.), Proceedings of International Conference on Frontiers in Computing and Systems, Lecture Notes in Networks and Systems 690, https://doi.org/10.1007/978-981-99-2680-0_7
species Oryza sativa (Asian rice) or Oryza glaberrima (African rice) and is the agricultural commodity with the third-highest worldwide production. There are more than 40,000 varieties of rice. Visual identification of the many rice varieties still extant in farmers’ fields and in the fields of rice seed savers is a tedious, time-consuming, inefficient, and inconsistent method. Considering the drawbacks of the manual approach, an automated image processing-based approach is necessary for fast and more accurate identification of rice varieties whenever a farmer, researcher, or seed conserver faces difficulty in identifying them. In the last decades, much work has been done on rice quality and rice variety identification using several approaches. Anchan et al. [2] proposed image processing and neural network-based rice grain classification and identification using five features, mainly area, major axis length, minor axis length, eccentricity, and perimeter. In [13], classification of four varieties of rice grain images using a back-propagation neural network and color, texture and wavelet features has been discussed. Morphological operation-based milled rice quality determination with the help of shape descriptors and geometric features is discussed in [1]. Quality analysis and grading of rice grain images using a decision tree have been discussed by Patil et al. [11]. Rice variety classification and identification using image processing-based color features, morphological features, and the Mahalanobis distance as a classifier are discussed in [7]. Zhao-yan et al. proposed a neural network-based identification method to classify rice varieties using color and shape features with an accuracy of 88.3% [10]. Granitto et al. [6] used a Bayesian approach and artificial neural network systems for seed identification with the help of seed size, shape, color, and texture characteristics. In [12], Shouche et al. used shape variation, based on grain morphology, to identify fifteen Indian wheat varieties. Transformation of RGB values to a* and b* color components and the elimination of intensity to enhance color differences in images of diseased mushrooms is explained in [14]. Deep learning-based approaches for rice grain identification have also been studied in the past few years. Lin et al. [9] used a convolution neural network consisting of 7 layers with a set of 6 filters for the different convolutional layers for the classification of grains of three rice varieties. In [3], Chatnuntawech et al. proposed a spatio-spectral deep convolutional neural network for the classification of six different classes of rice grains and compared the results with VGGNet, ResNet, ResNet-B, and SVM. In this article, a simple, effective, and highly accurate deep learning-based approach is used to identify twenty different rice varieties from a dataset of 20,668 images, and a success rate of 99.76% was achieved.
2 Preliminaries 2.1 Dataset Description The dataset contains 20,668 images of the seed coat of 20 different rice varieties (Aghanibora—1000, Baigan Monjia—1000, Bhasa Kalmi—1000, Chamarmani— 1200, Danaguri—1004, Dehradun Gandheshwari—1000, Geli Geti—1027, Gobindobhog—1000, Jaldubi—1315, Jhuli—1031, Jhulur—1000, Kalabhat—1000, Kamolsankari—1005, Kasuabinni—1001, Kokila pateni—1022, Mugojai—1007, Radhuni Pagal—1000, Sada Pony—1034, Sundari—1021, Yellow Patani—1001)
3 Proposed Technique In this section, a deep learning-based classification of rice varieties is discussed. This approach revolves around analyzing images of rice grains for their automated classification into different varieties. The images used for this purpose are high-magnification images of the rice grains with a black background. The images were processed only minimally, keeping their textural features intact. Experience suggests that deep learning techniques excel in handling image data. Although deep learning paradigms are quite powerful, complex data require deep and complex models. If a model is trained using gradient-based learning techniques, it becomes prone to accuracy degradation and increasing loss (caused by the vanishing gradient problem) as the depth of the model increases. In order to design an effective model for a complex dataset, certain deep learning design choices are followed to overcome such problems. Additionally, the image data used in this study has been processed to optimize the performance of the model. The methodology has therefore been developed using a CRNN (Convolutional Recurrent Neural Network) architecture. This network, instead of the traditional CNN (Convolution Neural Network), is used to assist the learning process of the model using hidden sequential and gradient-based features of the image [8]. The proposed model thus enhances the efficiency of the classification task as well as the overall performance of the proposed methodology.
3.1 Image Acquisition Image acquisition is the first task for identifying or classifying the images. An ALMICRO DS-50 Stereoscopic Microscope with 5 MP digital eye-piece camera having 0.5x lens is used to capture the images of the rice grain. The camera was mounted on a stand which provided easy vertical movement and stable support and was fixed at
8 cm from the object. A uniform black background was used to capture the images, and the images were captured under different lighting conditions. The samples were collected from the Rice Research Center, Fulia, Nadia, West Bengal. 20,668 images of 20 rice varieties were taken for analysis. The images are stored in jpg format with a preview resolution of 1280 × 960.
3.2 Preprocessing For classification using deep learning approach, after image acquisition, images are resized to dimensions (256 × 256), maintaining their inherent aspect ratio. This is done by resizing the images to either a height or width of 256 (whichever dimension being larger in the original image) and scaling the other dimension while maintaining the aspect ratio. This resized image is then overlaid on a null matrix of resolution (256 × 256). The resizing provides the ideal input for the proposed network which further optimizes the performance of the model.
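A minimal sketch of this resizing step is shown below using Pillow and NumPy; placing the resized image at the top-left corner of the black canvas is an assumption, since the paper does not specify the placement.

```python
import numpy as np
from PIL import Image

def resize_with_padding(img, target=256):
    """Resize so the longer side equals `target` (preserving aspect ratio) and
    overlay the result on a black (null) canvas of target x target pixels."""
    img = img.convert("RGB")
    scale = target / max(img.size)
    new_w, new_h = max(1, round(img.width * scale)), max(1, round(img.height * scale))
    img = img.resize((new_w, new_h))

    canvas = np.zeros((target, target, 3), dtype=np.uint8)   # the null matrix
    canvas[:new_h, :new_w] = np.asarray(img)                 # overlay (top-left placement assumed)
    return canvas

# Example with a dummy 1280 x 960 image standing in for a captured grain image.
out = resize_with_padding(Image.new("RGB", (1280, 960), "gray"))
print(out.shape)   # (256, 256, 3)
```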
3.3 Model Development In this work, a deep learning-based model has been developed using a Convolutional Recurrent Neural Network (CRNN) architecture. This network, instead of the traditional Convolution Neural Network (CNN), is used to assist the learning process of the model using hidden sequential and gradient-based features of the image. The proposed model thus enhances the efficiency and overall performance of the classification task. The rice grain images used in the proposed methodology provide different classes of features, such as textural and color features, which can easily be recognized using a CNN. However, morphological features such as length, width, and the serial husk structure of the rice grain can best be analyzed when treated as sequential data. Hence we used a Recurrent Neural Network (RNN) architecture with the CNN, combined into the CRNN architecture. The CRNN is known for extensive analysis of sequences in images and for hyper-spectral image classification. The proposed CRNN architecture is found to perform very well on the proposed methodology, surpassing the performance of the traditional CNN architecture. The proposed CRNN architecture consists of traditional convolution layers, fed with the image as input, followed by the recurrent layer, where LSTM (Long Short-Term Memory) is used for the recurrent operation. The architecture consists of 5 convolution operations, each followed immediately by batch-normalization and ReLU activation. Three of the later convolution operations are followed by max pooling to reduce the dimensions of the feature matrices. The final max pooling layer thus generates a feature matrix of dimensions (32 × 32 × 64) and marks the end of the convolution part of the CRNN network. This 3D feature matrix is reshaped into 2D data (32 × 2048) to be fed as input to the following LSTM layer.
Fig. 1 A 3D representation of the proposed CRNN architecture
Recurrent operation in the architecture as mentioned previously is performed using a combination of a forward propagating and a backward propagating LSTM layers. Each LSTM layer consisted of 32 cells, thus generating the feature maps, which are flattened using a flatten operation. Finally the Fully connected layer at the end generated the output. Figure 1 shows the proposed architecture. The Convolution and the Recurrent operations are described in the sections below.
3.4 Convolution Layer The convolution operation used in our proposed methodology is for the recognition of traditional image-based features such as color, texture, and shape of the rice grain. Max pooling operation was performed on three convolution layers to reduce the dimensions of the feature matrix and thus decreasing the parameter complexity and the computational cost, which are defined as follows:

ρ = k × k × f_in × f_out    (1)
k = ρ × d_in × d_in    (2)
Here, k represents the kernel dimension in each direction, f in and f out are the number of feature maps fed as input and obtained as output from a convolution layer, and finally din is the dimension of the input feature map in each direction.
The first convolution layer of the proposed model accepts an input of dimension (256 × 256 × 3). The CNNs are regularized versions of multi-layer perceptrons. Convolution neural networks take advantage of the hierarchical pattern in data and assemble more complex patterns using simple and simpler patterns. Recognition of rice grains is assisted mostly by the convolution layers in the CRNN architecture. The weights of the filters are optimized using the back-propagation property of the CNN, thus optimizing the classification to a large extent. The optimal kernel size used for the best performance of the model is 3 × 3.
3.5 Recurrent Layer The Recurrent operation of the CRNN architecture is performed by the bidirectional LSTM network which follows the convolution operation. The bidirectional LSTM layer comprises the forward propagating and the backward propagating LSTM layers, which are activated simultaneously. Thus the layers generate a combined output from the 2D input feature matrix. The theoretical working of the LSTM is illustrated as follows:

i_t = σ(x_t·U^i + h_{t−1}·W^i),  f_t = σ(x_t·U^f + h_{t−1}·W^f),  o_t = σ(x_t·U^o + h_{t−1}·W^o)
C̃_t = tanh(x_t·U^g + h_{t−1}·W^g),  C_t = σ(f_t ∗ C_{t−1} + i_t ∗ C̃_t),  h_t = tanh(C_t ∗ o_t)

Here i, f, and o are the input, forget and output gates, respectively. W is the recurrent connection, with U as the weight matrix and h as the hidden state [4]. The input and the output gates facilitate the propagation of data between the cells, and the forget gate enables the memorizing of data for a limited amount of time to assist in the extraction of hyper-spectral and sequential features [4]. LSTMs are capable of remembering longer sequences compared to simple RNN and GRU cells. Hence, in the proposed network, they excel in recognizing lengths and other morphological features. Thus the RNN, combined with the preceding CNN, considers a large class of features, generating the most optimized performance of the proposed methodology with an accuracy nearly the same as that of a state-of-the-art network.
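A minimal Keras sketch of the architecture just described is given below; the filter counts of the individual convolution layers are assumptions, chosen only so that the final feature map has the stated (32 × 32 × 64) shape before the reshape and bidirectional LSTM.

```python
from tensorflow.keras import layers, models

def build_crnn(num_classes=20):
    """CRNN sketch: 5 conv + BN + ReLU blocks (three followed by max pooling),
    reshape of the 32x32x64 feature map to (32, 2048), a bidirectional LSTM
    with 32 cells, then flatten and a fully connected output layer."""
    inp = layers.Input(shape=(256, 256, 3))
    x = inp
    for filters, pool in [(16, False), (16, True), (32, True), (32, False), (64, True)]:
        x = layers.Conv2D(filters, 3, padding="same")(x)
        x = layers.BatchNormalization()(x)
        x = layers.Activation("relu")(x)
        if pool:
            x = layers.MaxPooling2D(2)(x)            # three poolings: 256 -> 128 -> 64 -> 32
    x = layers.Reshape((32, 32 * 64))(x)             # (32, 2048) fed to the recurrent part
    x = layers.Bidirectional(layers.LSTM(32, return_sequences=True))(x)
    x = layers.Flatten()(x)
    out = layers.Dense(num_classes, activation="sigmoid")(x)   # multi-class binary setup
    return models.Model(inp, out)

model = build_crnn()
model.summary()
```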
4 Experimental Results 4.1 Experimental Setup The CRNN network is built using the keras framework with tensorflow as the backend of the model. The optimization was performed using the RMSprop optimizer,
which adjusts the learning rate automatically. Binary cross-entropy loss is used as the loss function; it is computed independently for each vector component or class, thus setting up a binary classification problem for every class in the proposed methodology. An early stopping technique is used to obtain the best weights for the model before it overfits the dataset. The total number of images used is 20,668, which has been divided into five subsets to allow fivefold cross validation to be performed. The model was trained, and the fivefold cross validation was performed to establish the robustness of the neural network and the proposed methodology. The images are fed to the model with shape (256 × 256 × 3) for training. Evaluation of the model is performed in terms of the accuracy of the network on the multi-class binary classification task. The final layer of the model, the fully connected network, is initialized with 20 classes for the different rice grain classes.
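The sketch below illustrates this training setup (RMSprop, binary cross-entropy, early stopping, fivefold cross validation) with Keras and scikit-learn; the patience, epoch count, batch size, and the zero-filled placeholder arrays are assumptions, and build_crnn refers to the architecture sketch above.

```python
import numpy as np
from sklearn.model_selection import KFold
from tensorflow.keras.callbacks import EarlyStopping

# Placeholders: images of shape (N, 256, 256, 3) and one-hot labels of shape (N, 20).
images = np.zeros((100, 256, 256, 3), dtype="float32")
labels = np.zeros((100, 20), dtype="float32")

early_stop = EarlyStopping(monitor="loss", patience=5, restore_best_weights=True)

for fold, (train_idx, val_idx) in enumerate(KFold(n_splits=5, shuffle=True).split(images), 1):
    model = build_crnn()      # model from the architecture sketch above
    model.compile(optimizer="rmsprop", loss="binary_crossentropy", metrics=["accuracy"])
    model.fit(images[train_idx], labels[train_idx],
              validation_data=(images[val_idx], labels[val_idx]),
              epochs=20, batch_size=16, callbacks=[early_stop])
```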
4.2 Results and Visualizations The proposed CRNN architecture has obtained strong results on the 20,668 images of rice grains. The classification is performed with an average accuracy of 99.76%, and a state-of-the-art AUROC score of 0.9983 has been obtained. Thus, the classification of 20 different rice varieties using a binary cross-entropy loss function for the multi-class binary classification has obtained near-optimal scores for the classification task.
4.3 Performance Metrics The model's performance has been validated using several performance measures or metrics and is tabulated in Table 1. The various metrics used for the validation of the proposed network are given below.
Accuracy = (TP + TN) / (TP + TN + FP + FN), Precision = TP / (TP + FP)
Recall = TP / (TP + FN), F1 Score = 2TP / (2TP + FP + FN)
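For reference, these metrics can be computed per fold from the predictions with scikit-learn, as in the sketch below; y_true and y_pred are placeholders for one fold's ground-truth labels and predicted classes.

```python
# Illustrative per-fold metric computation; y_true and y_pred are placeholders.
from sklearn.metrics import (accuracy_score, precision_score,
                             recall_score, f1_score, confusion_matrix)

def fold_metrics(y_true, y_pred):
    return {
        "accuracy": accuracy_score(y_true, y_pred),
        # average=None returns one value per class (as reported in Table 1);
        # "macro" would average them over the 20 classes.
        "precision": precision_score(y_true, y_pred, average=None),
        "recall": recall_score(y_true, y_pred, average=None),
        "f1": f1_score(y_true, y_pred, average=None),
        "confusion_matrix": confusion_matrix(y_true, y_pred),
    }
```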
From Table 1, it is clear that the proposed CRNN network has achieved strong results for the task of rice variety classification. The CRNN network performs consistently across the fivefold cross validation used to assess the robustness and validity of the CRNN architecture. The Precision, Recall, and F1 Score for the five folds are calculated from the individual confusion matrices obtained in the fivefold cross validation.
Table 1 Performance of the proposed network based on the fivefold cross validation (per-class Precision, Recall, and F1 Score for each of the five folds over the 20 rice varieties: AghaniBoro, Bhasa Kalmi, Bygon Mongia, Chamar Mani, Danaguri, Dehradun Gandeshwari, Geli Geti, Gobindabhog, Jhuli, Jhulur, Joldubi, Kalabhat, Kamolsankari, Kasuabinni, Kokila Pateni, Mugojai, Radhuni Pagal, Sada Pony, Sundari, Yellow Patni)
Fig. 2 Training Accuracy and Training Loss curves of the proposed model
4.4 Training Curves The training accuracy and training loss curves for the fivefold cross validation, performed using five subsets of the data, are illustrated in Fig. 2. The training accuracy and training loss for the individual folds make it clear that the proposed CRNN network converges to a near state-of-the-art accuracy, supporting its suitability for the identification and classification of the rice varieties. The proposed CRNN network thus performs well on the multi-class binary classification of the 20 different classes.
5 Conclusion The proposed deep learning model excels in the task of rice variety identification and classification from seed coat images, generating results equivalent to the state of the art. The performance of the proposed network is mostly influenced by the use of recurrent neural networks with Long Short-Term Memory for the recurrent operation on top of the traditional CNN architecture. No pre-processing other than resizing is applied to the images, which supports the robustness of the proposed model, and the model also provides better performance than many existing works for rice variety classification. The proposed model is more robust than the traditional feature-based image processing approaches, achieving a classification accuracy of 99.76%.
Acknowledgements The authors would like to thank Rice Research Center, Fulia, West Bengal, India for providing paddy rice samples of different varieties for carrying out the present study. Conflict of Interest On behalf of all authors, the corresponding author states that there is no conflict of interest.
References 1. Ajay G, Suneel M, Kumar KK et al (2013) Quality evaluation of rice grains using morphological methods. IJSCE 2:35–37 2. Anchan A, Shedthi BS (2016) Classification and identification of rice grains using neural network. IJIRCCE 4(4):5160–5167. 10.15680/IJIRCCE.2016. 0404140 3. Chatnuntawech I, Tantisantisom K, Khanchaitit P, et al (2019) Rice classification using spatiospectral deep convolutional neural network. Comput Vis Pattern Recog. arXiv:1805.11491 [cs.CV] 4. Chatterjee CC (2019) Implementation of RNN, LSTM, and GRU. Towards Data Sci 5. Dirac P (1953) The lorentz transformation and absolute time. Physica 19(1–12):888–896. https://doi.org/10.1016/S0031-8914(53)80099-6 6. Granitto PM, Verdes PF, Ceccatto HA (2005) Large-scale investigation of weed seed identification by machine vision. Elsevier Comput Electron Agric 47:15–24. https://doi.org/10.1016/ j.compag.2004.10.003 7. Gupta NJ (2015) Identification and classification of rice varieties using mahalanobis distance by computer vision. Int J Sci Res Publ 5:1–6 8. Keren G, Schuller B (2016) Convolutional RNN: an enhanced model for extracting features from sequential data. In: 2016 international joint conference on neural networks, pp 3412–3419 9. Lin P, Chen Y, He J, et al (2017) Determination of the varieties of rice kernels based on machine vision and deep learning technology. In: 10th international symposium on computational intelligence and design (ISCID) 1:169–172. 10.1109/ISCID.2017.208 10. Zy Liu, Cheng F, Yb Ying et al (2005) Identification of rice seed varieties using neural network. J Zhejiang Univ Sci B 6:1095–1100. https://doi.org/10.1631/jzus.2005.B1095 11. Patil V, Malemath VS (2015) Quality analysis and grading of rice grainimages. IJIRCCE 3:5672–5678. 10.15680/ijircce.2015.0306104 12. Shouche SP, Rastogi R, Bhagwat SG et al (2001) Shape analysis of grains of Indian wheat varieties. Elsevier Comput Electron Agric 33:55–76. https://doi.org/10.1016/S01681699(01)00174-0 13. Singh KR, Chaudhury S (2016) An efficient technique for rice grain classification using back propagation neural network and wavelet decomposition. IET Comput Vis 10:780–787. https:// doi.org/10.1049/iet-cvi.2015.0486 14. Vizhanyo T, Felfoldin J (2000) Enhancing color differences in image of diseased mushrooms. Elsevier Comput Electron Agric 26:187–198. https://doi.org/10.1016/S0168-1699(00)000715
Leaf-Based Plant Disease Detection Using Intelligent Techniques—A Comprehensive Survey Sourav Chatterjee, Sudakshina Dasgupta, and Indrajit Bhattacharya
Abstract Plant disease is a root cause of loss in farming and the economy. Finding and identifying diseases in plants by physical visual inspection or laboratory examination requires considerable expertise in phytopathology. Moreover, such methods are non-systematic, inconsistent, unpredictable, and exhaustive. Disease diagnosis through visual inspection also requires disproportionate processing time and is time-consuming and costly. Proper identification and discrimination of plant leaf diseases on the basis of plant leaf images are cost-effective and require much less processing time. Leaf image classification shows impressive results in the area of plant leaf disease detection. In this paper, a comprehensive discussion of the identification and classification of diseases in plant leaves by different existing machine learning and deep learning techniques is presented. Finally, a proposal is made to build a Decision Support System (DSS) which will guide the selection of a suitable algorithm from the existing algorithm suite to achieve better performance in the identification of different diseases in plant leaves. Keywords Plant disease · Disease detection · Decision support system · Machine learning · Deep learning
S. Chatterjee (B) · S. Dasgupta Government College of Engineering and Textile Technology, Serampore, India e-mail: [email protected] I. Bhattacharya Kalyani Government Engineering College, Kalyani, India © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 R. Sarkar et al. (eds.), Proceedings of International Conference on Frontiers in Computing and Systems, Lecture Notes in Networks and Systems 690, https://doi.org/10.1007/978-981-99-2680-0_8
1 Introduction
Plants are the main energy source for all living beings. Plants have to face a lot of diseases due to malnutrition, environmental conditions, and pests. Plant diseases reduce the quality and quantity of agricultural yield. Leaves of various plants are also useful to humans for their admirable medicinal quality. Sixty percent of the inhabitants of Asian and African countries are dependent on agriculture for their earnings, employment, and economy. Losses from plant diseases not only reduce the income of
crop producers but also decrease the GDP and economic growth of a country. So, there is a need to monitor and evaluate plant quality for disease identification and detection. Various plant diseases can be categorized on the basis of symptoms, the plant organ they affect, the type of plant they influence, and the type of pathogens responsible for the diseases. As the demand for good quality agricultural yield is a matter of concern, the quality assessment of a plant is done by inspecting various leaf features such as texture, appearance, cracks, color, surface, etc. Disease detection and classification can be done accurately on leaf images. This paper explores the latest growth of image processing and machine learning in the identification and classification of different plant leaf diseases and proposes appropriate method selection for the identification of diseases.
2 Literature Review For identification of diseases in plants, pathologists focus on the root, stem, kernel, and leaf. This paper concentrates only on the symptoms found in leaves. Different researchers have already focused on different techniques for plant disease detection and classification, which include different stages like preprocessing, selection, and segmentation. After that, disease classification can be done by different classifiers, and the disease category is estimated.
2.1 Detection of Plant Diseases with Segmentation Different image segmentation techniques like edge-based segmentation, threshold-based segmentation, region-based segmentation, clustering-based segmentation, and artificial neural network-based segmentation can be used. In Table 1, Fuzzy c-Means k-Means (FCM-KM) and faster Region-based Convolutional Neural Network (R-CNN), segmentation on the basis of color, Artificial Neural Network (ANN)-based segmentation, and gray-level segmentations are used. The algorithms are applied on the leaves of rice, grape, tomato, pomegranate, maize, etc. (mainly the dataset is taken from different Plant Leaf Dataset sources+) for diseased leaf detection and classification. They can also be applied to other crops. The average accuracy of detection of diseased leaves by using the above segmentation techniques is found to be 94% approximately.
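As a generic illustration of the threshold- and clustering-based segmentation families listed above (not the pipeline of any single surveyed paper), the sketch below applies Otsu thresholding and k-means color clustering with OpenCV; the input file name and the number of clusters are assumptions.

```python
# Generic leaf-lesion segmentation sketch: Otsu thresholding on the grayscale image
# and k-means clustering of pixel colors. File name and k are illustrative assumptions.
import cv2
import numpy as np

img = cv2.imread("leaf.jpg")                       # hypothetical input leaf image
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# Threshold-based segmentation (Otsu selects the threshold automatically).
_, mask = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)

# Clustering-based segmentation: k-means on pixel colors (k = 3 is an assumption).
pixels = img.reshape(-1, 3).astype(np.float32)
criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 10, 1.0)
_, labels, centers = cv2.kmeans(pixels, 3, None, criteria, 5, cv2.KMEANS_RANDOM_CENTERS)
segmented = centers[labels.flatten()].reshape(img.shape).astype(np.uint8)

cv2.imwrite("mask.png", mask)
cv2.imwrite("segmented.png", segmented)
```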
Table 1 Plant disease detection with segmentation
Sl. no. | Proposed method by earlier researchers | No. of image dataset (leaf) | Disease detected | Features | Accuracy and other findings
1 | Detection of plant diseases by FCM-KM with faster R-CNN fusion [1] | 3010 images (rice) | Rice blast, sheath blight, bacterial blight | Threshold segmentation | Accuracy 97%; this method can be used for large-scale dynamic disease detection
2 | Plant disease detection using color-based segmentation and machine learning [2] | 400 images (grape) | Black rot | Color-based detection | Accuracy 94%; the method performs well when the affected area has a different color than the non-affected area and SVM is used
3 | Plant leaf disease detection using learning vector quantization (LVQ) algorithm [3] | 500 images (tomato) | Late blight, septoria leaf spot, bacterial spot, and yellow curved leaf diseases | CNN with LVQ | Accuracy 90%; this method can be applied with different types of filters
4 | Plant disease detection using segmentation approach [4] | 2000 images (tomato) | Bacterial spot, late blight, and septoria leaf spot | Contrast and correlation | Accuracy 99%; feature size can be reduced
5 | Image-based plant disease detection [5] | 400 images (pomegranate) | Bacterial blight | Intensity | Accuracy 93%; this method detects the degree of the affected disease and provides a solution
6 | Detection of leaf diseases using gray-level segmentation and edge detection technique [6] | Plant Village dataset (maize)+ | Leaf rust, northern leaf blight, gray leaf spot | Intensity | Accuracy 93%; farmers can implement this technique to detect common maize diseases
2.2 Detection of Plant Diseases Through Analysis of Color and Texture Different color and texture analysis methods like the color slicing approach; Mean Value of Pixel (MVP); Color Coherence Vector (CCV); Local–Global Binary Pattern Histogram Sequence (LGBPHS); RGB, HIS, and YCbCr color models; Spatial Gray-Level Dependence Matrix (SGDM); Canny Edge Detection Algorithm (CEDA), etc., are used for detection of plant diseases. Leaves of paddy, wheat, tomato, cotton, and grapes are used. The methods can also be applied to other crops. The average accuracy of detection of diseased leaves by using the color and texture analysis method is found to be
more than 90%. Different techniques for plant disease detection through analysis of color and texture are mentioned in Table 2.
Table 2 Plant disease detection through analysis of color and texture
Sl. no. | Proposed method by earlier researchers | No. of image dataset (leaf) | Disease detected | Features | Accuracy and other findings
1 | Color slicing approach for detection of plant diseases [7] | 150 images (paddy) | Blast | Color, texture | Accuracy 97%; this color slicing method can be applied along with a histogram-based approach
2 | Detection of plant leaf diseases by mean value of pixel (MVP) [8] | 300 images (wheat) | Tan spot, leaf rust, stripe rust | Color, texture, shape | Accuracy 91%; this algorithm can identify the correct target with different ranges of intensity distribution
3 | Leaf disease detection using color coherence vector (CCV) and local global binary pattern histogram sequence (LGBPHS) [9] | 200 images (tomato) | Early blight | Color, texture, shape | Accuracy 90%; this method along with a machine vision algorithm gives a promising result
4 | Disease identification by RGB, HIS, and YCbCr color models [10] | Plant Village dataset (cotton)+ | Bacterial, fungal | Damage ratio | Accuracy 91%; the algorithm fails to deal with leaf shadow and random noise
5 | Disease identification with texture analysis [11] | 400 images (paddy) | Brown spot, leaf blast, leaf blight, tungro | Fractal descriptor | Accuracy 83%; this method may not be suitable alone but combined with other features yields good results
6 | Plant disease detection by spatial gray-level dependence matrix (SGDM) [12] | 100 images | Black rot, downy mildew, powdery mildew | Color, texture | Accuracy 92%; with a hybrid approach, the classification rate can be increased
7 | Plant disease detection by canny edge detection algorithm (CEDA) [13] | 1500 images (grape) | Fungal, bacterial, and virus | Color histogram | Accuracy 93%; this algorithm filters useless information and preserves significant properties of the image
2.3 Plant Disease Detection and Classification Using Artificial Intelligence and Machine Learning Different machine learning approaches such as K-Nearest Neighbors (KNN), Convolutional Neural Networks (CNN), and Support Vector Machines (SVM) are used for plant disease detection. These different methods of artificial intelligence are used for the detection of different plant diseases occurring in different leaves of plants like tomato, potato, grape, turmeric, etc. The methods can also be applied to other crops. The average accuracy of diseased leaf detection by different machine learning approaches is found to be nearly 92%. Different techniques for detection of diseased leaves and their classification using artificial intelligence and machine learning are mentioned in Table 3.
Table 3 Plant disease detection and classification through AI and ML
Sl. no. | Proposed method by earlier researchers | No. of image dataset (leaf) | Disease detected | Features | Accuracy and other findings
1 | Detection of plant leaf diseases and classification using artificial intelligence [14] | 700 images (turmeric) | Blotch | Convolution neural network | Accuracy 96%; this method uses the VGG16 architecture, which improves the efficiency
2 | Plant leaf diagnosis by co-occurrence matrix and artificial intelligence [15, 16] | 100 images (grape) | Rust, scab, downy mildew | Color and texture | Accuracy 90%; this method can be applied to grape and other diseases
3 | Detection of plant leaf diseases by the internet of things and artificial intelligence technologies [17] | No leaf image (rice) | Blast | Non-image data is taken | Accuracy 89%; spore germination is taken care of, which is a key factor for rice blast
4 | Diseased leaf detection using convolution neural network (CNN) [18, 19] | 10,000 images (tomato) | Early blight, bacterial spot, late blight, septoria, leaf mold, mosaic virus, spider mite | Convolution | Accuracy 91%; this method can be implemented for other crops also
2.4 Plant Disease Detection and Classification Applying Deep Learning Deep learning, a subset of machine learning, is a neural network approach that works with many layers and follows human-like knowledge acquisition. Deep learning uses statistics with predictive modeling and is used in the prediction and classification of plant diseases. Deep learning methods using CNN, overfitting mitigation analysis, and DenseNet using transfer learning are used to detect diseases of potato, pepper, tomato, rice, grape, etc. The methods can also be applied to other crops. The average accuracy of diseased leaf detection by deep learning approaches is more than 95%. Different techniques for identification and classification of plant diseases through deep learning are mentioned in Table 4.
Table 4 Plant disease detection and classification using deep learning
Sl. no. | Proposed method by earlier researchers | No. of image dataset (leaf) | Disease detected | Features | Accuracy and findings
1 | Prediction and classification of plant diseases using deep learning ConvNets [20, 21, 22] | Plant Village dataset (potato, pepper, tomato) | Different diseases of 3 plants | Convolutional network | Accuracy 98%; this method is used in the detection of diseases in 3 plants with good accuracy
2 | Overfitting mitigation analysis in deep learning models for plant leaf disease recognition [23, 24] | 20,639 images (15 different classes) | 15 classes of disease | Data augmentation | Accuracy 96%; DenseNet-121 model for plant leaf disease detection in image processing
3 | Plant leaf disease detection based on deep learning and convolutional neural network [25, 26] | 36,258 images (10 different plants) | 61 classes | Merging CNN models with stacking model | Accuracy 87%; this method can be applied to farming at the field level as it can alert farmers
4 | Deep learning model for early prediction of plant disease [27, 28] | Plant Village dataset (tomato)+ | 9 different diseases | Combination of DenseNet with VGG16 by applying transfer learning | Accuracy 98%; the architecture of DenseNet is more accurate than VGG16 for its varied features
5 | Rice leaf disease spotting by automated classification using deep learning perspective [29, 30] | 1045 images (rice) | Sheath blight, false smut, stem borer, rot, brown spot | Convolution neural network (CNN) and deep learning | Accuracy 95%; the accuracy is improved by using the method
6 | Forecasting of grape leaf diseases using deep learning [31, 32] | Plant Village dataset (grape)+ | Black rot, ESCA, blight | Improved transfer learning-based efficient network | Accuracy 95%; this method shows superior results in disease identification in grape leaves
7 | Deep learning with CNN approach for plant leaf disease detection [33, 34] | 36,258 images (various leaves) | Different diseases | Deep learning | Accuracy 92%; the system can diagnose ongoing plant diseases but is unable to forecast the potential of being infected
8 | Greenhouse plant disease detection using deep learning [35, 36] | 10,478 images (tomato) | Early blight, late blight, spider mites, leaf mold, target spot | CNN with convolution | Accuracy 95%; an automated system was developed
3 Proposed Methodology It has been observed that for leaf-based plant disease detection, researchers have proposed different methodologies, viz., (i) segmentation, (ii) analysis of color and texture, (iii) machine learning, and (iv) deep learning. However, which methodology would be more efficient for a given sample of leaf image data has not been addressed. In the present work, a Decision Support System (DSS) is proposed. The DSS has different components like machine learning tools, deep learning tools, or a combination of these two. The DSS is trained on the diverse aspects of the diseased leaves such as color, texture, shape, intensity, damage ratio, fractal descriptor, color histogram, etc., and it knows which features are applicable to a particular model. So, the DSS will take a new leaf specimen as input, without knowing whether it is diseased or not, and will suggest which tool should be chosen to determine whether the particular leaf blade is infected, the probable type of disease, and the remedial measure, with maximum accuracy. It would be useful for early disease detection and management if continuous monitoring is performed on the plant population. Advice and suggestions are provided to the plant population from the DSS for the selection of a suitable disease detection methodology based on the features provided by the plant environment module. The interface between the plant environment and the DSS is modeled in Fig. 1.
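A minimal sketch of how such a DSS could route a new leaf image to one of the surveyed model families is given below; the selection rule, the threshold, and the registered models are hypothetical placeholders for the trained knowledge base described above, not part of the proposal itself.

```python
# Hypothetical DSS skeleton: choose a detection model family for a new leaf image
# from simple image statistics. The rule, threshold, and models are illustrative only.
import numpy as np

class LeafDSS:
    def __init__(self, models):
        # models: dict mapping family name -> callable(image) -> (disease, confidence)
        self.models = models

    def select_family(self, image):
        color_var = np.var(image.reshape(-1, 3), axis=0).mean()
        # Toy rule: strong color contrast favors color/texture analysis,
        # otherwise fall back to a deep learning classifier.
        return "color_texture" if color_var > 500 else "deep_learning"

    def diagnose(self, image):
        family = self.select_family(image)
        disease, confidence = self.models[family](image)
        return {"selected_model": family, "disease": disease, "confidence": confidence}
```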
Fig. 1 Decision support system for plant disease detection (the DSS links the plant population with a database, models, and expert knowledge to interpret features, take decisions, and give advice for model selection)
4 Conclusion and Future Scope Early detection and identification improve plant growth and ultimately crop production. Recognition of diseases in a leaf blade by bare observation is costly and time-consuming, as it relies heavily on phytopathologists' expertise. Detection of diseases from leaf images and identification of the diseases by classification techniques are improving steadily and replacing the manual method of disease detection. This paper highlights different techniques used by different researchers along with their limitations and problems. A comparative study is done among the different techniques employed for the same or different disease detection. Ultimately, a DSS has been proposed that might assist in selecting the method which would best fit the identification of leaf diseases depending upon the plant environment and knowledge base. The method will monitor the plants on a regular basis and suggest suitable advice to the user. The system can be further improved by controlled feedback and a sufficient training dataset. +Plant Leaf Dataset URL: https://data.mendeley.com/datasets https://www.kaggle.com/datasets https://archive.ics.uci.edu/ml/datasets/leaf.
References 1. Zhou G, Zhang W, Chen A, He M, Ma X. Rapid detection of rice disease based on FCM-KM and faster R-CNN fusion. IEEE Access 7:143190–143206 2. Kirti, Rajpal N (2020) Black rot disease detection in grape plant (Vitisvinifera) using colour based segmentation & machine learning. In: 2020 2nd International conference on advances in computing, communication control and networking (ICACCCN), 2020, pp 976–979
3. Sardogan M, Tuncer A, Ozen Y (2018) Plant leaf disease detection and classification based on CNN with LVQ algorithm. In: 2018 3rd International conference on computer science and engineering (UBMK), pp 382–385 4. Rahman MA, Islam MM, Shahir Mahdee GM, UlKabir MW (2019) Improved segmentation approach for plant disease detection. In: 2019 1st International conference on advances in science, engineering and robotics technology (ICASERT), pp 1–5 5. Sharath DM, Akhilesh, Kumar SA, Rohan MG, Prathap C (2019) Image based plant disease detection in pomegranate plant for bacterial blight. In: 2019 International conference on communication and signal processing (ICCSP), pp 0645–0649 6. Bonifacio DJM, Pascual AMIE, Caya MVC, Fausto JC (2020) Determination of common Maize (Zea mays) disease detection using gray-level segmentation and edge-detection technique. In: 2020 IEEE 12th International conference on humanoid, nanotechnology, information technology, communication and control, environment, and management (HNICEM), pp 1–6 7. Automated blast disease detection from paddy plant leaf—a color slicing approach (2018) 7th International conference on industrial technology and management (ICITM) 8. Taohidul Islam SM, Masud MA, Ur Rahaman MA, Hasan Rabbi MM (2019) Plant leaf disease detection using mean value of pixels and canny edge detector. In: 2019 International conference on sustainable technologies for industry 4.0 (STI) 9. Nagamani HS, Devi HS (2021) Plant leaf disease detection using CCV and LGBPHS. IEEE Mysore Sub Section International Conference (MysuruCon) 2021:165–171 10. He Q, Ma B, Qu D, Zhang Q, Hou X, Zhao J (2013) Cotton pests and diseases detection based on image processing. Indones J Electr Eng Comput Sci 11(6):3445–3450 11. Asfarian A, Herdiyani Y, Rauf A, Mutaqin KH (2013) Paddy diseases identification with texture analysis using fractal descriptors based on Fourier spectrum. In: International conference on computer, control, informatics and its applications (IC3INA), Jakarta, 19–21 November, pp 77–81 12. Narvekar PR, Kumbhar MM, Patil SN (2014) Grape leaf diseases detection & analysis using SGDM matrix method. Int J Innov Res Comput Commun Eng 2(3):3365–3372 13. Tajane V, Janwe NJ (2014) Medicinal plants disease identification using canny edge detection algorithm histogram analysis and CBIR. Int J Adv Res Comput Sci Soft Eng 4(6):530–536 14. Rajasekaran C, Arul S, Devi S, Gowtham G, Jeyaram S (2020) Turmeric plant diseases detection and classification using artificial intelligence. In: 2020 International conference on communication and signal processing (ICCSP), pp 1335–1339 15. Phookronghin K, Srikaew A, Attakitmongcol K, Kumsawat P (2018) 2 level simplified fuzzy ARTMAP for grape leaf disease system using color imagery and gray level co-occurrence matrix. Int Electr Eng Congr (iEECON) 2018:1–4 16. Bondre S, Sharma AK (2021) Review on leaf diseases detection using deep learning. Second international conference on electronics and sustainable communication systems (ICESC) 2021:1455–1461. https://doi.org/10.1109/ICESC51422.2021.9532697 17. Chen W-L, Lin Y-B, Ng F-L, Liu C-Y, Lin Y-W (2020) RiceTalk: rice blast detection using internet of things and artificial intelligence technologies. IEEE Internet Things J 7(2):1001– 1010 18. Agarwal M, Singh A, Arjaria S, Sinha A, Gupta S (2020) ToLeD: tomato leaf disease detection using convolution neural network. Proc Comput Sci 167:293–301 19. 
Alajas OJ et al (2022) Detection and quantitative prediction of diplocarponearlianum infection rate in strawberry leaves using population-based recurrent neural network. In: 2022 IEEE International IOT, electronics and mechatronics conference (IEMTRONICS), pp 1–8. https:// doi.org/10.1109/IEMTRONICS55184.2022.9795744 20. Lakshmanarao A, Babu MR, Kiran TSR (2021) Plant disease prediction and classification using deep learning ConvNets. International Conference on Artificial Intelligence and Machine Vision (AIMV) 2021:1–6. https://doi.org/10.1109/AIMV53313.2021.9670918 21. Li L, Zhang S, Wang B (2021) Plant disease detection and classification by deep learning—a review. IEEE Access 9:56683–56698. https://doi.org/10.1109/ACCESS.2021.3069646
22. Tunio MH, Jianping L, Butt MHF, Memon I (2021) Identification and classification of rice plant disease using hybrid transfer learning. In: 2021 18th International computer conference on wavelet active media technology and information processing (ICCWAMTIP), pp 525–529. https://doi.org/10.1109/ICCWAMTIP53232.2021.9674124 23. Noon SK, Amjad M, Qureshi MA, Mannan A (2020) Overfitting mitigation analysis in deep learning models for plant leaf disease recognition. In: 2020 IEEE 23rd International multitopic conference (INMIC), pp 1–5. https://doi.org/10.1109/INMIC50486.2020.9318044 24. Baliyan A, Kukreja V, Salonki V, Kaswan KS (2021) Detection of corn gray leaf spot severity levels using deep learning approach. In: 2021 9th International conference on reliability, infocom technologies and optimization (Trends and Future Directions) (ICRITO), pp 1–5. https://doi.org/10.1109/ICRITO51393.2021.9596540 25. Guan X (2021) A novel method of plant leaf disease detection based on deep learning and convolutional neural network. In: 2021 6th International conference on intelligent computing and signal processing (ICSP), pp 816–819. https://doi.org/10.1109/ICSP51882.2021.9408806 26. Kirti K, Rajpal N, Yadav J (2021) Black measles disease identification in grape plant (Vitisvinifera) using deep learning. In: 2021 International conference on computing, communication, and intelligent systems (ICCCIS), pp 97–101. https://doi.org/10.1109/ICCCIS51004. 2021.9397205 27. Rubini P, Kavitha P (2021) Deep learning model for early prediction of plant disease. Third International Conference on Intelligent Communication Technologies and Virtual Mobile Networks (ICICV) 2021:1104–1107. https://doi.org/10.1109/ICICV50876.2021.9388538 28. Asta Lakshmi M, Gomathi V (2021) Automatic prediction of plant leaf diseases using deep learning models: a review. In: 2021 5th International conference on electrical, electronics, communication, computer technologies and optimization techniques (ICEECCOT), pp 569– 574. https://doi.org/10.1109/ICEECCOT52851.2021.9708043 29. Cherukuri N, Kumar GR, Gandhi O, Krishna Thotakura VS, NagaMani D, Basha CZ (2021) Automated classification of rice leaf disease using deep learning approach. In: 2021 5th International conference on electronics, communication and aerospace technology (ICECA), pp 1206–1210. https://doi.org/10.1109/ICECA52323.2021.9676081 30. David HE, Ramalakshmi K, Gunasekaran H, Venkatesan R (2021) Literature review of disease detection in tomato leaf using deep learning techniques. In: 2021 7th International conference on advanced computing and communication systems (ICACCS), pp 274–278. https://doi.org/ 10.1109/ICACCS51430.2021.9441714 31. Uttam AK (2022) Grape leaf disease prediction using deep learning. International Conference on Applied Artificial Intelligence and Computing (ICAAIC) 2022:369–373. https://doi.org/10. 1109/ICAAIC53929.2022.9792739 32. Meeradevi RV, Mundada MR, Sawkar SP, Bellad RS, Keerthi PS (2020) Design and development of efficient techniques for leaf disease detection using deep convolutional neural networks. In: 2020 IEEE International conference on distributed computing, VLSI, electrical circuits and robotics (DISCOVER), pp 153–158. https://doi.org/10.1109/DISCOVER50404.2020.9278067 33. Guan X (2021) A novel method of plant leaf disease detection based on deep learning and convolutional neural network. In: 2021 6th International conference on intelligent computing and signal processing (ICSP), pp 816–819 34. 
Mallma JB, Rodriguez C, Pomachagua Y, Navarro C (2021) Leaf disease identification using model hybrid based on convolutional neuronal networks and K-means algorithms. In: 2021 13th International conference on computational intelligence and communication networks (CICN), pp 161–166. https://doi.org/10.1109/CICN51697.2021.9574669 35. Osama R, Ashraf NE-H, Yasser A, AbdelFatah S, El Masry N, AbdelRaouf A (2020) Detecting plant’s diseases in greenhouse using deep learning. In: 2020 2nd Novel intelligent and leading emerging sciences conference (NILES), pp 75–80 36. Sirohi A, Malik A (2021) A hybrid model for the classification of sunflower diseases using deep learning. In: 2021 2nd international conference on intelligent engineering and management (ICIEM), pp 58–62. https://doi.org/10.1109/ICIEM51511.2021.9445342
Bengali Document Retrieval Using Model Combination Soma Chatterjee and Kamal Sarkar
Abstract In this paper, we present an approach for Bengali information retrieval that combines two Information Retrieval (IR) models. Our proposed hybrid model combines a TFIDF-based IR model with a Latent Semantic Indexing (LSI)-based IR model. Since the TFIDF-based model exhibits poor recall and the LSI-based model achieves higher recall, the combination of these two models improves the retrieval performance. The experimental results demonstrate that the proposed Bengali IR model performs significantly better than some baseline models. Keywords Information retrieval · Bengali language · LSI · Vector space model
S. Chatterjee · K. Sarkar (B) Computer Science and Engineering Department, Jadavpur University, Kolkata, West Bengal 700032, India e-mail: [email protected] S. Chatterjee e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 R. Sarkar et al. (eds.), Proceedings of International Conference on Frontiers in Computing and Systems, Lecture Notes in Networks and Systems 690, https://doi.org/10.1007/978-981-99-2680-0_9
1 Introduction
Users of information retrieval (IR) systems are faced with a large volume of information. An IR system facilitates users in retrieving information relevant to a query by searching through a vast amount of information. The word mismatch problem is the main problem faced by an IR system; it occurs when the words in the query do not match the words in the documents. The standard TFIDF-based vector space (VS) model [1] suffers from this word mismatch issue, which results in poor recall for IR. The main drawback of the term matching-based IR model is its limitation in handling the so-called "vocabulary mismatch" problem [2]. The concept matching approaches attempt to find matches between documents and queries based on the concepts they share. Latent Semantic Indexing (LSI) is a technique that maps documents and queries to a new reduced space called latent semantic space. The
similarities between the documents and the queries in this space indicate how semantically similar they are. This approach is useful [2, 3] in improving the recall of the IR system. The LSI-based model is not enough on its own because, while it improves recall, it hampers precision. So, we need to combine the literal term matching method with the LSI-based method for achieving better performance. In this paper, we present a hybrid model for IR that exploits the benefits of two models: one is the classical vector space (VS) model [1] and the other is the LSI-based model. We also use a novel function for representing the outputs of these two models so that their combination can be meaningful and improves the retrieval performance. We develop two different models using the above-mentioned two methods. In response to a query, each model returns documents along with their similarities (relevance scores). Thus, each document gets two different scores. The two types of scores are combined linearly to find a unique relevance score for the document. The final results are produced after ranking the documents according to their scores.
2 Related Work The earliest IR model was the Boolean retrieval model, which computes the relevance of documents considering the presence or absence of query terms in the document [4, 5]. Since the Boolean retrieval model cannot differentiate between documents containing an equal number of query terms occurring in any order, the Vector Space Model (VSM) [1] was proposed to assign relative weights to the terms and represent each document as a vector of term weights. The relative importance of the terms used in the document is captured by this model, and both the query and the documents are mapped to a high-dimensional space called vector space. In the probabilistic IR models [6], document ranking is done according to the likelihood that each document is relevant to the query. The probabilities indicating the level of relevance are calculated using the conventional Bayes' Theorem. The final ranking scores are calculated using a variety of probability measures, including Bayesian network approaches to IR [7], Okapi BM25 [8], and many others. Ponte and Croft first presented the language modeling strategy in 1998 [9]. This is an improved version of the probabilistic model. The document language model considers a document as a collection of words and asks whether a document could have generated a query. In this model, the ranking function can be conceptualized as follows. Let D be a document, Q be a query, and θD be an estimated document language model. The probability p(Q|θD) is considered as the score of document D in relation to query Q. The above-mentioned four IR models suffer more or less from the vocabulary mismatch problem, which can be dealt with using the Latent Semantic Indexing (LSI) method [2]. The LSI method transforms both the queries and the documents into low-dimensional dense vectors forming a latent semantic space. The similarities between a document and a query can then be computed in the reduced latent semantic space.
Bengali Document Retrieval Using Model Combination
93
Some researchers suggest a multi-stage retrieval system that uses user-generated titles to address the cold-start issue in automatic playlist continuation using LSI [10]. Recently, some authors have proposed a hybrid model [11], which combines two or more IR models. A combination of a VS model and a word embedding-based model has been proposed in [11]. Our present work is also done in this line. But in our work, the LSI-based IR model has been integrated with the VS model for IR. Most existing IR research is primarily done for the development of the IR methods for English. In recent times, the interest in the development and evaluation of Indian language IR systems is growing. A study on various IR Bengali models is presented by Sarkar and Gupta in [12]. Dolamic and Savoy [13] studied IR models for several Indian languages. A Bengali document retrieval strategy has been described in [14]. Most existing Bengali IR approaches use the traditional VS model, and a few attempts have been made for developing a model that has considered semantic document matching. In this paper, we present a hybrid method that linearly combines a semantic method with a lexical method for developing an effective Bengali information retrieval system.
3 Proposed Methodology The proposed hybrid IR model has two components: the TFIDF-based VSM and the LSI-based model. Figure 1 shows its basic architecture. For a given query and a document collection, each component assigns relevance scores to the documents according to their similarities with the query. Then the outputs of these two components are linearly combined to compute a relevance score for each document. In the next step, the proposed model ranks the documents according to these relevance scores and produces the retrieval results. The primary steps of the proposed IR model are described in the subsequent subsections.
3.1 Query Preprocessing In this step, various unimportant pieces of information like stop-words and punctuation are removed from each query. We use the stop-word list provided by FIRE.1 Then the query is stemmed using the YASS stemmer [15]. Finally, the queries are tokenized into words.
1 http://www.fire.irsi.res.in/fire/static/resources.
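A sketch of the preprocessing step described above is given below; the stop-word file name is a placeholder, and a trivial identity stemmer stands in for the YASS stemmer, which would be plugged in separately.

```python
# Illustrative Bengali query preprocessing: punctuation removal, stop-word
# filtering, and tokenization. File name and the stemmer stand-in are placeholders.
import string

with open("bengali_stopwords.txt", encoding="utf-8") as f:   # hypothetical FIRE stop-word list
    STOPWORDS = set(line.strip() for line in f)

def preprocess_query(query, stemmer=lambda w: w):
    # Remove punctuation (including the Bengali danda), drop stop-words, then stem.
    table = str.maketrans("", "", string.punctuation + "।")
    tokens = query.translate(table).split()
    return [stemmer(t) for t in tokens if t not in STOPWORDS]
```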
Fig. 1 The proposed system
3.2 VSM for IR
The steps for the TFIDF-based VSM model for IR are given in Fig. 2. For document representation, each document is preprocessed using a method similar to the query preprocessing task. The document indexing is done using a bag-of-words model [1] that maps each document to a vector space. The query is also mapped to the same vector space. When a query is submitted, the cosine between a query vector and a document vector is computed. This similarity value indicates how much a document is relevant to the given query. For each vector, the component of the vector refers to the TF*IDF value for the corresponding word in the vocabulary. That is, if the vocabulary size is v, each document or query is converted to a v-dimensional vector. Here, TF stands for term frequency, that is, how many times a term (word) appears in a document. We use the following formula to compute a variant of TF:
Modified_TF = log(0.5 + TF)   (1)
Fig. 2 Architecture of the TFIDF-based VSM
For computing IDF, we use the following formula:
IDF(w_i) = log(0.5 + N / DF(w_i))   (2)
where N is the corpus size and DF is the count of documents containing the word at least once, computed over the entire collection of N documents. For computing the similarity between the vectors representing a query and a document, we use their dot product [1]. Thus, given the query Q, the relevance score for the document d is
TFIDF_Score(d) = Σ_{w∈(Q∧d)} TFIDF_Q(w) * TFIDF_d(w)   (3)
TFIDF_Score(d), the relevance score, is further log-normalized as follows:
Modified_TFIDF_Score(d) = log(TFIDF_Score(d))   (4)
For the sake of computational efficiency, we implement the VSM model for IR using the inverted index data structure. Since our target is to combine this model with another model, we need to map the relevance score provided by this model between 0 and 1. For this purpose, we apply the softmax function on the value of Modified_TFIDF_Score(d) as follows:
Softmax_Normalize_Value(d) = e^{Modified_TFIDF_Score(d)} / Σ_{d'∈D} e^{Modified_TFIDF_Score(d')}   (5)
Since the denominator of Eq. (5) is a large value for a large collection of documents, the resulting value may also be very small in many cases. Therefore, this value is further normalized by using the min-max procedure. Finally, the normalized relevance score RScore1 for each document is produced by this model.
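The scoring of Eqs. (1)–(5), followed by min-max scaling, might be implemented as in the sketch below; the inverted index is reduced to plain dictionaries for brevity, so this is a simplified illustration rather than the authors' implementation.

```python
# Illustrative TF-IDF scoring using the modified TF/IDF of Eqs. (1)-(5);
# 'docs' maps doc_id -> list of preprocessed tokens (a plain stand-in for an inverted index).
import math
from collections import Counter

def tfidf_scores(query_tokens, docs):
    N = len(docs)
    df = Counter(t for tokens in docs.values() for t in set(tokens))
    idf = lambda w: math.log(0.5 + N / df[w])
    q_tf = Counter(query_tokens)
    q_w = {w: math.log(0.5 + q_tf[w]) * idf(w) for w in q_tf if w in df}
    scores = {}
    for d, tokens in docs.items():
        tf = Counter(tokens)
        s = sum(q_w[w] * math.log(0.5 + tf[w]) * idf(w)        # Eq. (3)
                for w in q_w if w in tf)
        if s > 0:
            scores[d] = math.log(s)                             # Eq. (4)
    if not scores:
        return {}
    z = sum(math.exp(v) for v in scores.values())               # Eq. (5): softmax over documents
    soft = {d: math.exp(v) / z for d, v in scores.items()}
    lo, hi = min(soft.values()), max(soft.values())             # min-max normalization -> RScore1
    return {d: (v - lo) / (hi - lo) if hi > lo else 1.0 for d, v in soft.items()}
```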
3.3 LSI-Based IR Model Latent semantic analysis is done by applying the singular value decomposition (SVD) on a term-by-document matrix B created using the entire corpus of documents. When SVD is applied to B, it produces three matrices which can be combined to produce a low-rank approximation to B. If r is the rank of the original matrix B and Ck is the corresponding low-rank matrix with the rank k and k is far smaller than r, the dimensions associated with the contextually similar terms are combined (a combination of contextually similar words represents an abstract concept). As a result, the documents are mapped to a k-dimensional concept space called latent semantic space. When the queries are mapped into the same space, the cosine similarity measure can be used to find conceptual overlap between a document and a query. In this case,
if a document and a query share similar concepts, they will be mapped nearer to each other in the latent semantic space, and the cosine similarity value is considered as the relevance score [2, 16]. The LSI-based document indexing has two important steps: (1) computing a term-by-document matrix, B = [b1, b2, ..., bd], where bi is the i-th column vector of the term weights for a document and d is the number of documents in the corpus, and (2) applying SVD on B. For the corpus having d documents and a vocabulary size of n, we obtain B whose dimension is n × d. Application of SVD on B results in three different matrices M, N, and P as follows [18]:
B_{n×d} = M_{n×n} N_{n×d} (P^T)_{d×d}   (6)
From the Natural Language Processing point of view, in Eq. (6), M is a term-by-concept matrix, N is a concept-by-concept diagonal matrix, and P^T is a concept-by-document matrix. The three matrices are shown in Fig. 3. The diagonal of the matrix N contains the singular values measuring the importance of a concept. If only the k most significant singular values are considered, the above-mentioned three matrices are reduced to lower dimensions, giving the reduced matrices M', N', and P'. This is illustrated in Fig. 3. A row of the reduced matrix P' = {p_ij}, i = 1 to k and j = 1 to d, is indexed by a significant abstract concept, and p_ij represents how much the document j is similar to the i-th concept. Transposing P', we obtain (P')^T, which is a document-by-concept matrix whose rows are the representations of the documents in the latent semantic space.
Fig. 3 SVD of term-by-document matrix
For document retrieval, both the queries and the documents should be mapped to the same space. For this purpose, a query is initially represented as a vector of TF-based term weights using the bag-of-words model. Then the projection of this TF-based query vector into the k-dimensional subspace is computed using Eq. (7).
q_k = (q_n)^T M' (N')^{-1}   (7)
where q_n is the query vector. For the k-th query, the relevance score of the j-th document is measured using Eq. (8), which computes the cosine between two vectors.
S_j = (q_k · (P^T)_j) / (‖q_k‖ × ‖(P^T)_j‖)   (8)
where (P^T)_j is the vector representation of the j-th document (the j-th row of P^T) and q_k is the k-th query vector obtained using Eq. (7). We consider this relevance score as RScore2. We collect the relevance scores assigned to each document by the LSI-based IR model for further use in the combined model.
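Equations (6)–(8) correspond to a standard truncated SVD with query folding-in, which could be realized as in the sketch below; B, q, and k are placeholders for the term-by-document matrix, the TF-based query vector, and the chosen dimensionality.

```python
# Illustrative LSI retrieval following Eqs. (6)-(8): truncated SVD of the
# term-by-document matrix, query projection, and cosine similarity.
import numpy as np

def lsi_scores(B, q, k):
    # B: n x d term-by-document matrix, q: length-n query term vector.
    M, s, Pt = np.linalg.svd(B, full_matrices=False)       # Eq. (6)
    Mk, Nk, Ptk = M[:, :k], np.diag(s[:k]), Pt[:k, :]
    qk = q @ Mk @ np.linalg.inv(Nk)                        # Eq. (7): fold the query into k dims
    docs = Ptk.T                                           # rows: documents in concept space
    num = docs @ qk
    den = np.linalg.norm(docs, axis=1) * np.linalg.norm(qk) + 1e-12
    return num / den                                       # Eq. (8): cosine relevance scores (RScore2)
```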
3.4 Combining IR Models We design a hybrid model to make use of the benefits of the TFIDF-based model and the LSI-based model. The TFIDF-based IR model can discriminate among the documents using the term frequencies measuring relative term importance, but this model's performance is affected by the word mismatch problem. On the other side, the LSI-based model uses semantic matching and alleviates the word mismatch problem. Although the LSI-based model improves the recall value, it exhibits poor precision. So, combining the outputs of the two models can complement each other. As we mentioned in the earlier sections, both the IR models give relevance scores for each document. If RScore1 and RScore2 are the relevance scores outputted by Model 1 and Model 2, respectively, for a document, the hybrid relevance score for the document is obtained using Eq. (9).
RScore = β * RScore1 + (1 − β) * RScore2, 0 ≤ β ≤ 1   (9)
After ranking all documents using their relevance scores calculated using Eq. (9), the hybrid model returns the top M documents.
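The linear fusion of Eq. (9) then reduces to a one-line combination of the two normalized score dictionaries, as in the hypothetical snippet below (β = 0.492 follows the best value reported in Sect. 4).

```python
# Illustrative fusion of the two models' normalized scores (Eq. (9)) and top-M ranking.
def combine_scores(rscore1, rscore2, beta=0.492, top_m=10):
    hybrid = {d: beta * rscore1.get(d, 0.0) + (1 - beta) * rscore2.get(d, 0.0)
              for d in set(rscore1) | set(rscore2)}
    return sorted(hybrid.items(), key=lambda kv: kv[1], reverse=True)[:top_m]
```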
4 Evaluation, Experiment, and Results To evaluate the proposed retrieval system, we have developed a Bengali IR dataset consisting of approximately 3255 documents and 19 queries.
We have computed Mean Average Precision (MAP) [16] scores for evaluating the IR models. For computing the MAP score, for each query, we need two lists: (1) a ranked list of the documents retrieved by the IR model and (2) the human judgments indicating which document is relevant to the query and which is not. For computing the MAP score, we need to calculate the average precision (AP) using Eq. (10).
AP(qr) = (1 / Mr) Σ_t P(t)   (10)
where t is the relevant document's position in the ranked list, P(t) is the precision up to position t, and Mr is the number of relevant documents. P(t) is calculated using Eq. (11).
P(t) = Rel_t / t   (11)
where Rel_t is obtained by counting the relevant documents up to position t. Finally, the MAP score is computed using Eq. (12).
MAP(QR) = (1 / |QR|) Σ_qr AP(qr) = (1 / |QR|) Σ_qr (1 / Mr) Σ_t P(t)   (12)
where |QR| is the number of queries. After evaluating a retrieval model, we get a single numeric metric value which is called the MAP score.
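Equations (10)–(12) can be computed directly from a ranked result list and the relevance judgments, for instance as in the sketch below; run and qrels are placeholder structures for the system output and the human judgments.

```python
# Illustrative MAP computation following Eqs. (10)-(12).
def average_precision(ranked_docs, relevant):
    hits, precisions = 0, []
    for t, doc in enumerate(ranked_docs, start=1):
        if doc in relevant:
            hits += 1
            precisions.append(hits / t)                 # P(t) of Eq. (11)
    return sum(precisions) / len(relevant) if relevant else 0.0   # Eq. (10)

def mean_average_precision(run, qrels):
    # run: query -> ranked doc list; qrels: query -> set of relevant doc ids.
    aps = [average_precision(run[q], qrels[q]) for q in qrels]
    return sum(aps) / len(aps)                          # Eq. (12)
```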
We have also compared our proposed IR model with some existing IR models in terms of MAP score. Table 1 includes the MAP scores for various IR models including the proposed hybrid model.
Table 1 Comparisons of the proposed hybrid model with the individual component models in terms of MAP scores
IR models | MAP score
The proposed hybrid IR model | 0.5960
The LSI-based model (model 2) | 0.5078
The VS model (model 1) | 0.5003
In Table 1, we have shown comparisons of the proposed model with the individual component IR models: the VS model (Model 1) and the LSI-based IR model (Model 2). The results shown in Table 1 indicate that the proposed hybrid model achieves higher MAP scores than the individual component models. Since k is the most important parameter of the LSI-based model and it indicates the dimension of the semantic space into which documents and queries are mapped, we have tuned this parameter for achieving better results. The effect of varying the k value on the MAP score for Model 2 is shown in Fig. 4. It indicates that Model 2 with k set to 95 gives the best MAP score.
Fig. 4 Impact on MAP scores for Model 2 when k values are varied
Fig. 5 Impact of β on the proposed IR model when β is varied
For the proposed hybrid model, we have an important tuneable parameter β which determines how much weight should be assigned to an
individual component model’s output for achieving the better MAP score. Figure 5 shows the impact of β on this model. It indicates that the model with β set to 0.492 gives the best MAP score.
4.1 Comparison with Existing Models We have also compared the proposed hybrid model with an existing hybrid IR system proposed in [11]. For the proposed IR system, we have combined the VS model and the LSI-based IR model, whereas the work proposed in [11] hybridizes the VS model and a model that uses word embedding. Table 2 shows the comparison of the MAP scores obtained by the proposed model and the model proposed in [11]. Table 2 indicates that our proposed hybrid model is effective for Bengali information retrieval task.
Table 2 Comparison with an existing model [11]
IR models | MAP score
The proposed hybrid IR model | 0.5960
The model proposed in [11] | 0.5805
5 Conclusion This paper presents a hybrid IR model that combines a VSM model and an LSI-based model. Though the LSI-based model can do semantic matching, it has a tendency to retrieve too many documents as relevant to the query. But when it is combined with the VSM model, the overall retrieval performance is improved. This is because the VSM model can discriminate among the documents retrieved by the LSI-based model using the relative importance of query words present in the documents. In this way, they complement each other when both the models are combined. Our future plan is to design a hybrid model by combining more than two IR models.
References 1. Salton G, Wong A, Yang CS (1975) A vector space model for automatic indexing. Commun ACM 18(11):613–620 (1975) 2. Singhal A, Pereira F (1999) Document expansion for speech retrieval. In: Proceedings of the 22nd annual international ACM SIGIR conference on Research and development in information retrieval, pp 34–41 3. Berry MW, Dumais ST, O’Brien GW (1995) Using linear algebra for intelligent information retrieval. SIAM Rev 37(4):573–595 4. Zhao L, Callan J (2012) Automatic term mismatch diagnosis for selective query expansion. In: Proceedings of the 35th international ACM SIGIR conference on research and development in information retrieval, pp 515–524. Portland. https://doi.org/10.1145/2348283.2348354 5. Marcus R (1991) Computer and human understanding in intelligent retrieval assistance. In: Proceedings of the ASIS annual meeting, vol 28, pp 49–59 (1991) 6. Robertson SE (1977) The probability ranking principle in IR. J Docum 33:294–304 7. Turtle H, Croft WB (1991) Evaluation of an inference network-based retrieval model. ACM Trans Inf Syst 9(3):187–222. https://doi.org/10.1145/125187.125188 8. Spärck Jones K, Walker S, Robertson SE (2000) A probabilistic model of information retrieval development and comparative experiments. Inf Process Manage 36(6):809–840 9. Ponte J, Croft WB (1998) A language modeling approach to information retrieval. In: Proceedings of the 21st annual international ACM SIGIR conference on Research and development in information retrieval, pp 275–281. SIGIR’98, Melbourne, Australia. https://doi.org/10.1145/ 290941.291008 10. Yürekli A, Kaleli C, Bilge A (2021) Alleviating the cold-start playlist continuation in music recommendation using latent semantic indexing. Int J Multimedia Inf Retriev 10(3):185–198 11. Chatterjee S, Sarkar K (2018) Combining IR models for Bengali information retrieval. Int J Inf Retrieval Res IJIRR 8(3):68–83 12. Sarkar K, Gupta A. An empirical study of some selected IR models for Bengali monolingual information retrieval. In: Proceedings of ICBIM, NIT, Durgapur
13. Dolamic L, Savoy J (2008) UniNE at FIRE 2008: Hindi, Bengali, and Marathi IR. In: Working notes of the forum for information retrieval evaluation 14. Ganguly D, Leveling J, Jones GJF (2013) A case study in de-compounding for Bengali information retrieval. In: Proceedings of the 4th international conference on information access evaluation, multi-linguality, multimodality, and visualization. Lecture notes in computer science, vol 8138, pp 108–119. Valencia, Spain. https://doi.org/10.1007/978-3-642-40802-1_14 15. Majumdar P, Mitra M, Parui SK, Kole G (2007) YASS: yet another suffix stripper. ACM Trans Inf Syst 25(4). https://doi.org/10.1145/1281485.1281489 16. Manning CD, Raghavan P, Schütze H (2008) Introduction to information retrieval. Cambridge University Press, UK, p 260 17. Bellegarda JR (1998) A multispan language modeling framework for large vocabulary speech recognition. In: IEEE Trans Speech Audio Process 18. Chowdhury SR, Sarkar K, Dam S (2017) An approach to generic Bengali text summarization using latent semantic analysis. In: International Conference on Information Technology (ICIT), pp 11–16, Bhubaneswar. https://doi.org/10.1109/ICIT.2017.12
Deep Neural Networks Fused with Textures for Image Classification Asish Bera, Debotosh Bhattacharjee, and Mita Nasipuri
Abstract Fine-grained image classification (FGIC) is a challenging task due to small visual differences among inter-subcategories, but large intra-class variations. In this paper, we propose a fusion approach to address FGIC by combining global texture with local patch-based information. The first pipeline extracts deep features from various fixed-size non-overlapping patches and encodes features by sequential modeling using the long short-term memory (LSTM). Another path computes imagelevel textures at multiple scales using the local binary patterns (LBP). The advantages of both streams are integrated to represent an efficient feature vector for classification. The method is tested on six datasets (e.g., human faces, food-dishes, etc.) using four backbone CNNs. Our method has attained better classification accuracy over existing methods with notable margins. Keywords Convolutional neural networks · Face recognition · Food classification · Local binary patterns · Long short-term memory · Random erasing
A. Bera (B) Department of Computer Science and Information Systems, BITS, Pilani, Rajasthan, India e-mail: [email protected] D. Bhattacharjee · M. Nasipuri Department of Computer Science and Engineering, Jadavpur University, Kolkata, WB, India © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 R. Sarkar et al. (eds.), Proceedings of International Conference on Frontiers in Computing and Systems, Lecture Notes in Networks and Systems 690, https://doi.org/10.1007/978-981-99-2680-0_10
1 Introduction
Fine-grained image classification (FGIC) has been a challenging problem in computer vision over the past decade. It discriminates small visual variations among various subcategories of objects like human faces, flowers, foods, etc. Convolutional neural networks (CNNs) have achieved high performance in FGIC. The CNNs represent an object's shape, texture, and other correlated information in the feature space. In addition to the global image-level description, object-parts relations and local patch information have shown their efficacy by mining finer details to solve FGIC. Many works have been devised leveraging attention mechanisms [2, 11], context encoding
[1, 6], graph-based feature representation [3, 25], efficient data augmentation [26], and others. Many works avoid bounding-box annotations and localize essential image regions using weakly supervised part selection [12]. Thus, defining region-based descriptors is a key aspect for enhancing FGIC performance. In another direction, local binary patterns (LBP) [15] have achieved significant success in describing textural features from human faces and other image categories [4]. LBP is a non-parametric texture descriptor extracted from a grayscale image. It encodes the differences between a pixel and its neighborhood pixels localized in a rectangular grid (e.g., 3 × 3). Here, both textural and deep features are fused to devise a feature vector. This work proposes a method, namely Deep (Neural) Networks fused with Textures (DNT), to explore its aptness for FGIC. The first path extracts a deep feature map using a base CNN, which is pooled through a set of patches. Next, global average pooling (GAP) is applied to summarize the features, followed by patch encoding using a long short-term memory (LSTM). The other path computes the histograms of LBPs as feature descriptors. Finally, these two sets of features are fused prior to classification. The proposed method is shown in Fig. 1. We have experimented on six image datasets (1k–15k images), representing wide variations in object shape, texture, etc. They include human faces with age variations [17, 18]; natural objects like flowers and sea life; and food dishes of India [16] and Thailand [20]. The contributions of this paper are (a) deep features and local binary patterns are fused for image recognition, and (b) the method achieves better accuracy on six image datasets representing human faces, food dishes, and natural object categories. The rest of this paper is organized as follows: Sect. 2 summarizes related works and Sect. 3 describes the proposed method. The experimental results are discussed in Sect. 4, followed by the conclusion in Sect. 5.
2 Related Works Recognition of human faces, food items, and other objects (e.g., flowers, marine life) is a challenging FGIC task. Apart from the global feature descriptor rendered from the full image, patch descriptors have attained remarkable progress using deep learning. Part-based methods focusing on local descriptions and semantic correlations have been integrated [2]. In this direction, multi-scale region proposals and fixed-size patches have attracted much attention. In [1], multi-scale region features are encoded via an LSTM. In [6], Mask R-CNN is employed to localize discriminative regions. Several approaches have explored attention mechanisms to improve performance [2]. A few methods have proposed an ensemble of various CNNs, or the fusion of two or more subnetworks, for performance gain [16]. Classification of various food dishes is discussed in [10, 22]. Marine-life classification using CNNs is described in [13]. A dataset on marine animals is proposed in [19]. On the other side, several works on face recognition have computed local textures using the LBP family.
Fig. 1 Proposed method (DNT) fuses deep features and texture descriptors using local binary patterns (LBP) for fine-grained image classification
A classical grayscale, rotation-invariant, and uniform LBP over circular neighborhoods is introduced in [15]. Deep architectures built on LBP have been developed for textural feature extraction for face recognition [23]. An empirical model with local binary convolution layers is explored in [8]. Weighted LBP places greater importance on regions that are more influenced by aging effects [27]. Multi-scale LBP and SIFT descriptors are used for multi-feature discriminant analysis in [9]. With this brief study, this paper explores a combination of CNN and LBP features.
3 Proposed Method: Deep Networks Fused with Textures The proposed DNT is a two-stream deep model (Fig. 1). Firstly, it emphasizes the features via patches and an LSTM. Then, it combines multiple LBP histograms. Lastly, both paths are fused. Convolutional Feature Representation: An input image with class-label I_l ∈ R^(h×w×3) is fed into a base CNN, such as DenseNet-121. A CNN, say N, extracts a high-level feature map F ∈ R^(h×w×c), where h, w, and c denote the height, width, and channels, respectively. Simply, we denote N(I_l, θ) = F to compute deep features, where the image I_l is provided with its class-label l, and θ represents the learning parameters of N. The feature map F from the last convolutional layer of the base N is extracted to develop the proposed model by including other functional modules. Patch Encoding: The region proposals (D) are generated as non-overlapping uniform (same-size) patches from I_l. The resulting number of regions is e = (h × w)/a², where a × a is the spatial size of a rectangular patch d. A set D = {d_1, d_2, ..., d_e | I_l} of e patches is pooled from the feature map F, which is spatially upsampled to h × w × c size prior to pooling. The patches represent fine details and local contexts which are important for subtle discrimination in FGIC. Bilinear pooling is applied to compute features from every patch of size h_1 × w_1 × c. Next, global average pooling (GAP) is applied to summarize the mean features of D. It downsamples the spatial dimension at the patch level to 1 × 1 × c. The resulting feature map is F_1.
To learn the effectiveness of the patches, a single-layer fully gated LSTM [7] is applied to learn long-term dependencies via its hidden states. The encoded feature vector is denoted as F_2 ∈ R^(v×1), defined in Eq. (1):

F = N(I_l, θ);  F_1 = N(D, GAP(F), θ_1);  F_2 = N(D, LSTM(F_1), θ_2)    (1)

Texture Representation using Local Binary Patterns: The LBP is a monotonic grayscale-invariant local descriptor which computes spatial textures. The histogram of LBP labels is considered as a feature vector. Here, the uniform value of LBP_{P,R} is extracted as a texture descriptor at the global image level, where P defines the total number of sampled neighbors and R represents the radius of the circular neighborhood:

LBP_{P,R} = Σ_{i=0}^{P−1} q(p_i − p_c) · 2^i,  with  q(p_i − p_c) = 1 if (p_i − p_c) ≥ 0, and 0 otherwise,    (2)

where p_c denotes the grayscale value of the center pixel of a local window, p_i represents the value of the corresponding neighbor pixel of p_c, and q(·) is an indicator function. The histograms of multiple neighborhoods are combined to improve the effectiveness of the texture patterns. Finally, the descriptor F_3 is defined as

F_3 = ⊕_{i=j=1}^{P,R} LBP(I)_{P_i,R_j};  F_final = N(F_2 ⊕ F_3, θ_f);  l̄ = softmax(F_final);  l̄ ∈ R^(Y×1),    (3)

where ⊕ denotes the concatenation operator. The neighborhood spatial structures with P = 8, 16 and R = 1, 2 combinations are considered, shown in the top row of Fig. 2. The dimension of the combined image-level texture vector is 4 × 256 = 1024. However, other higher values can also be computed according to Eqs. 2–3. Fusion: Finally, F_2 and F_3 are concatenated to produce a mixed feature vector F_final, which is fed to a softmax layer for generating an output probability vector implying each predicted class-label l̄ corresponding to the actual label l ∈ Y from a set of classes Y. Random Region Erasing Image Augmentation: Several image augmentation methods are used, e.g., translation, rotation, scaling, random erasing [26], etc. Here, random erasing at the global image level is applied along with general data augmentations. It randomly selects a rectangular region I_E in I and erases the pixels inside I_E with random values within [0, 255]. The height and width of I_E are randomly chosen on-the-fly within the [0.2, 0.8] range, and pixels are erased with value 127, as shown in the bottom row of Fig. 2.
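As a rough illustration of the texture stream in Eqs. (2)–(3), the Python sketch below (our assumption, not the authors' released code) computes uniform LBP maps for the four (P, R) settings used here and concatenates 256-bin histograms into a 1024-dimensional descriptor using scikit-image's local_binary_pattern.

```python
# Hypothetical sketch of the texture path: multi-scale uniform LBP histograms.
# The 256-bin histograms follow the 4 x 256 = 1024 dimension stated in the text;
# uniform LBP codes occupy only the low bins, so the histograms are sparse.
import numpy as np
from skimage.feature import local_binary_pattern

def lbp_descriptor(gray, settings=((8, 1), (8, 2), (16, 1), (16, 2)), bins=256):
    feats = []
    for P, R in settings:
        lbp = local_binary_pattern(gray, P, R, method="uniform")
        hist, _ = np.histogram(lbp, bins=bins, range=(0, bins), density=True)
        feats.append(hist.astype(np.float32))
    return np.concatenate(feats)  # shape (4 * 256,) = (1024,)
```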
Fig. 2 Top-row: LBP of various neighborhoods (P, R): (8,1), (8,2), (16,1), and (16,2). Bottom-row: Random erasing data augmentation on flower and celebrity-face
4 Experimental Results and Discussion Datasets: The proposed DNT is evaluated on six datasets representing food dishes, flowers, human faces, and marine life. A well-known age-invariant human face dataset, FG-Net, contains 1002 images of 82 persons with ages from 0 to 69 years [17]. Datasets comprising 80 Indian dishes [16] and 50 Thai dishes [20] are tested. The other three datasets are collected from the Kaggle repository. The images are randomly divided into reasonable train-test sets, detailed in Table 1. Dataset samples are shown in Fig. 3. The top-1 accuracy (%) is used for assessment. Implementation: The DenseNet-121, DenseNet-201, ResNet-50, and MobileNet-v2 backbone CNNs are used for deep feature extraction and fine-tuned on the target datasets. Pre-trained ImageNet weights are used to initialize the base CNNs with an input image size of 256 × 256. Random region erasing, rotation (±25°), scaling (1 ± 0.25), and cropping to a 224 × 224 image size are used for data augmentation. The output feature map (e.g., 7 × 7 × c) is upsampled to 48 × 48 × c for pooling of 4 × 4 patches, and the number of output channels (c) varies according to the CNN architecture, e.g., c = 1024 for DenseNet-121. The uniform patch size is 12 × 12 pixels, generating 16 patches. The feature size of the LSTM's hidden layers is 1024, which is concatenated with an LBP vector of the same size. The final feature vector of size 2048 is fed to the softmax layer for classification. Batch normalization and a drop-out rate of 0.2 are applied to ease over-fitting. The Stochastic Gradient Descent (SGD) optimizer is used to minimize the categorical cross-entropy loss with an initial learning rate of 10⁻³, divided by 10 after 100 epochs. The DNT model is trained for 200 epochs with a mini-batch size of 8 on an 8 GB Tesla M10 GPU, and is scripted in TensorFlow 2.x with the Python library.
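A minimal TensorFlow/Keras sketch of how the two streams described above could be wired together is given below. It is an assumption for illustration, not the authors' implementation: simple per-patch average pooling stands in for bilinear pooling, and the layer choices mirror the stated configuration (48 × 48 upsampling, 16 patches of 12 × 12, an LSTM with 1024 units, and concatenation with a 1024-dimensional LBP vector).

```python
# Sketch of the DNT two-stream model under our assumptions (not the authors' code).
import tensorflow as tf
from tensorflow.keras import layers

def build_dnt(num_classes, lbp_dim=1024, grid=4):
    img_in = layers.Input((224, 224, 3))
    backbone = tf.keras.applications.DenseNet121(include_top=False, weights="imagenet",
                                                 input_tensor=img_in)
    fmap = layers.Resizing(48, 48)(backbone.output)            # upsample 7x7xc -> 48x48xc
    step = 48 // grid                                           # 12x12 patches, 16 in total
    patch_feats = []
    for i in range(grid):
        for j in range(grid):
            p = fmap[:, i * step:(i + 1) * step, j * step:(j + 1) * step, :]
            patch_feats.append(layers.GlobalAveragePooling2D()(p))   # GAP per patch
    seq = layers.Lambda(lambda t: tf.stack(t, axis=1))(patch_feats)  # (batch, 16, c)
    encoded = layers.LSTM(1024)(seq)                            # patch encoding
    lbp_in = layers.Input((lbp_dim,))                           # precomputed LBP histograms
    fused = layers.Concatenate()([encoded, lbp_in])             # 1024 + 1024 = 2048 features
    out = layers.Dense(num_classes, activation="softmax")(fused)
    return tf.keras.Model([img_in, lbp_in], out)
```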
Fig. 3 Dataset samples: Celebrity face, Indian food, and flower
Table 1 Dataset summary and test results using 3 × 3 patches and 256 × 256 LBP

Dataset                     | Class | Train | Test | DenseNet-121 | ResNet-50 | DenseNet-201 | MobileNet-v2
FG-Net                      | 82    | 827   | 175  | 52.38        | 48.80     | 57.73        | 52.38
Celebrity                   | 17    | 1190  | 510  | 94.24        | 89.28     | 95.04        | 92.85
Indian food                 | 80    | 2400  | 1600 | 72.18        | 68.87     | 73.31        | 69.62
Thai food                   | 50    | 14172 | 1600 | 92.31        | 90.18     | 92.50        | 89.93
Flower                      | 6     | 2972  | 1400 | 96.85        | 95.78     | 96.71        | 96.07
Sea-life                    | 18    | 5823  | 3636 | 90.36        | 89.10     | 91.05        | 90.09
Model parameters (Millions) | –     | –     | –    | 10.2         | 28.9      | 23.3         | 6.0
Result Analysis and Performance Comparison: The test results with 3 × 3 patches and two LBP structures, i.e., (8, 1) and (8, 2) with a total of 512 textures, are given in Table 1. The feature size of the LSTM's hidden unit is 512, and after concatenation with the LBP histograms, the size of the final feature map is 1024. The last row estimates the parameters (millions) of the various models. The accuracy (%) is decent on all datasets except age-invariant face recognition (AIFR), i.e., FG-Net. Many existing methods have experimented on the FG-Net dataset for AIFR by following a leave-one-person-out strategy [14, 24]. In our setup, the FG-Net test set includes at least one unseen image per person, obtained by splitting the 1002 samples into train-test (83:17) sets. Here, we have tested this challenging dataset for FGIC rather than AIFR. Hence, DNT is not directly comparable with existing methods. However, DNT attains better results (Tables 1 and 3) than NTCA (48.96%) [5] and other works on AIFR [18]. FoodNet presents classification of 50 Indian food dishes [16] and achieves 73.50% accuracy using an ensemble method; that dataset consists of 100 images per class, 80% of which are used for training. We have used a similar set of 80 dishes with 50 images per class, following a 60:40 train-test ratio. DNT achieves 80.75% and 74.75% accuracy using DenseNet-201 and ResNet-50, respectively (Table 3). We have also tested on ThaiFood-50 [20], where the reported accuracy is 80.42%; in [21], the accuracy is 83.07% using ResNet-50. In contrast, DNT attains 95.18% using DenseNet-201 and 91.93% using ResNet-50 (Table 3). The performance on the other datasets is also high. However, to the best of our knowledge, no significant results have been reported on datasets like sea-life. Thus, we have reported benchmark results in the context of FGIC on these datasets for further research. It is noted that ResNet-50 and DenseNet-201 are heavier models in terms of parameters, while MobileNet-v2 is a very efficient lightweight model. Next, experiments on Indian food and flower show an accuracy gain using 4 × 4 patches while the other components are unaltered. This test comprises 16 patches, 512 LBP features, and 512 LSTM features.
Table 2 Performance of DNT using 16 patches and 512 LBP

Dataset     | DenseNet-121 | ResNet-50 | DenseNet-201 | MobileNet-v2
Flower      | 97.10        | 94.78     | 96.50        | 95.71
Indian food | 72.13        | 71.43     | 76.06        | 72.94
Param (M)   | 10.3         | 28.8      | 23.3         | 6.0

Table 3 Performance of DNT using 16 patches and 1024 LBP

Dataset     | DenseNet-121 | ResNet-50 | DenseNet-201 | MobileNet-v2
FG-Net      | 54.74        | 49.40     | 55.95        | 53.57
Celebrity   | 95.43        | 92.06     | 95.83        | 90.87
Indian food | 78.18        | 74.75     | 80.75        | 76.31
Thai food   | 94.00        | 91.93     | 95.18        | 92.75
Flower      | 97.50        | 97.14     | 98.00        | 97.21
Sea-life    | 92.50        | 92.51     | 94.51        | 92.34
Param (M)   | 15.5         | 36.3      | 30.5         | 11.8
Table 4 Ablation study on proposed DNT using DenseNet-121 (DN-121)

DenseNet-121 (DN-121) base CNN with key modules        | Indian food | Sea-life | Param
DN-121 + common image augmentation                     | 63.37       | 86.24    | 7.0
DN-121 + common + random erasing image augment         | 67.25       | 88.43    | 7.0
DNT (DN-121) with 9 patches, and LBP (addition)        | 71.49       | 89.23    | 10.2
DNT (DN-121) with 16 patches, and without LBP          | 74.56       | 90.28    | 15.5
DNT (DN-121) with 16 patches, and LBP (concatenation)  | 78.18       | 92.50    | 15.5
The results imply that more patches improve accuracy (Table 2). Furthermore, we have increased the number of LSTM hidden states and the number of patches. This test is carried out with the fused features of 1024 textures (LBP) and 1024 LSTM units encoded from 16 patches. The results are reported in Table 3. Clearly, DenseNet-201 performs the best among all four backbones, while the other CNNs used in this work produce satisfactory results. The significance of the major components of DNT is tested, and the results are given in Table 4. Particularly, the benefits of random erasing over general image augmentation, the number of patches, textures (LBP), the LSTM, and their further increment in the feature space are investigated for performance improvement on two datasets using DenseNet-121 (DN-121). The ablative results justify the main components of the proposed DNT.
5 Conclusion In this paper, we have presented new work on image classification by fusing deep features with local textures at the image level. The performance is evaluated using four base CNNs on six diverse FGIC datasets. We have achieved improved results on these datasets compared to existing works. In addition to conventional image augmentation, random region erasing also improves the accuracy. In the future, we plan to develop a new model to improve the performance further and explore other fusion strategies for wider applicability on large FGIC datasets.
References 1. Behera A, Wharton Z, Hewage P, Bera A (2021) Context-aware attentional pooling (cap) for fine-grained visual classification. In: Proceedings 35th AAAI conference on artificial intelligence, pp 929–937 2. Bera A, Wharton Z, Liu Y, Bessis N, Behera A (2021) Attend and guide (ag-net): a keypointsdriven attention-based deep network for image recognition. IEEE Trans Image Process 30:3691–3704 3. Bera A, Wharton Z, Liu Y, Bessis N, Behera A (2022) SR-GNN: Spatial relation-aware graph neural network for fine-grained image categorization. IEEE Trans Image Process 31:6017–6031 4. Bi X, Yuan Y, Xiao B, Li W, Gao X (2021) 2DD-LCoLBP: a learning two-dimensional cooccurrence local binary pattern for image recognition. IEEE Trans Image Process 30:7228– 7240 5. Bouchaffra D (2014) Nonlinear topological component analysis: application to age-invariant face recognition. IEEE Trans Neural Netw Learn Syst 26(7):1375–1387 6. Ge W, Lin X, Yu Y (2019) Weakly supervised complementary parts models for fine-grained image classification from the bottom up. In: Proceedings of IEEE conference on computer vision and pattern recognition, pp 3034–3043 7. Hochreiter S, Schmidhuber J (1997) Long short-term memory. Neural Comput 9(8):1735–1780 8. Juefei-Xu F, Naresh Boddeti V, Savvides M (2017) Local binary convolutional neural networks. In: Proceedings of IEEE conferences on computer vision and pattern recognition, pp 19–28 9. Li Z, Park U, Jain AK (2011) A discriminative model for age invariant face recognition. IEEE Trans Inf Forensics Secur 6(3):1028–1037 10. Lim CH, Goh KM, Lim LL (2021) Explainable artificial intelligence in oriental food recognition using convolutional neural network. In: 2021 IEEE 11th international conference on system engineering and technology (ICSET), pp 218–223 11. Liu H, Li J, Li D, See J, Lin W (2022) Learning scale-consistent attention part network for fine-grained image recognition. IEEE Trans Multimed 24:2902–2913 12. Liu M, Zhang C, Bai H, Zhang R, Zhao Y (2022) Cross-part learning for fine-grained image classification. IEEE Trans Image Process 31:748–758 13. Liu X, Jia Z, Hou X, Fu M, Ma L, Sun Q (2019) Real-time marine animal images classification by embedded system based on mobilenet and transfer learning. In: OCEANS 2019-Marseille, pp 1–5. IEEE 14. Moustafa AA, Elnakib A, Areed NF (2020) Age-invariant face recognition based on deep features analysis. Signal Image Video Process 14(5):1027–1034 15. Ojala T, Pietikainen M, Maenpaa T (2002) Multiresolution gray-scale and rotation invariant texture classification with local binary patterns. IEEE Trans Pattern Anal Mach Intell 24(7):971–987
16. Pandey P, Deepthi A, Mandal B, Puhan NB (2017) Foodnet: recognizing foods using ensemble of deep networks. IEEE Signal Process Lett 24(12):1758–1762 17. Panis G, Lanitis A (2014) An overview of research activities in facial age estimation using the FG-NET aging database. In: European conferences of computer vision, pp 737–750. Springer 18. Panis G, Lanitis A, Tsapatsoulis N, Cootes TF (2016) Overview of research on facial ageing using the FG-NET ageing database. IET Biom 5(2):37–46 19. Pedersen M, Bruslund Haurum J, Gade R, Moeslund TB (2019) Detection of marine animals in a new underwater dataset with varying visibility. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition workshops, pp 18–26 20. Termritthikun C, Kanprachar S (2017) Accuracy improvement of Thai food image recognition using deep convolutional neural networks. In: 2017 international electrical engineering congress (IEECON), pp 1–4. IEEE 21. Termritthikun C, Kanprachar S (2018) Nu-resnet: deep residual networks for Thai food image recognition. J Telecommun Electron Comput Eng (JTEC) 10(1–4):29–33 22. Tiankaew U, Chunpongthong P, Mettanant V (2018) A food photography app with image recognition for Thai food. In: 2018 seventh ICT international student project conference (ICTISPC). IEEE, pp 1–6 23. Xi M, Chen L, Polajnar D, Tong W (2016) Local binary pattern network: a deep learning approach for face recognition. In: 2016 IEEE international conferences image processing (ICIP), pp 3224–3228. IEEE 24. Zhao J, Yan S, Feng J (2022) Towards age-invariant face recognition. IEEE Trans Pattern Anal Mach Intell 44(1):474–487 25. Zhao Y, Yan K, Huang F, Li J (2021) Graph-based high-order relation discovery for finegrained recognition. In: Proceedings of IEEE/CVF conferences on computer vision and pattern recognition, pp 15079–15088 26. Zhong Z, Zheng L, Kang G, Li S, Yang Y (2020) Random erasing data augmentation. In: Proceedings of AAAI conferences on artificial intelligence, vol 34, pp 13001–13008 27. Zhou H, Lam KM (2018) Age-invariant face recognition based on identity inference from appearance age. Pattern Recognit 76:191–202
Real-Time Prediction of In-Hospital Outcomes Using a Multilayer Perceptron Deployed in a Web-Based Application Varun Nair, V. P. Nathasha, Uday Pratap Singh Parmar, and Ashish Kumar Sahani
Abstract In-hospital mortality prediction in real time can offer clinicians a convenient and easy indicator of patient acuity and hospital efficiency. The latter also significantly relies on adequately utilizing the hospital resources to deliver quality treatment and care. The provision of quality care merits the reduction of the average duration of patient stay, especially for intensive care units (ICU) where patients are in critical condition. Knowing the length of patient stay can thus serve as an indicator of hospital efficiency and help drastically improve hospital resource utilization. Also, extensive hospital stay usually denotes prolonged bed rest, the primary cause of pulmonary embolism (PE). We have also taken into cognizance that despite recent advancements in hospital infrastructures, cardiovascular disorders are a leading cause of mortality, among which heart failure (HF) and ST-elevation myocardial infarction (STEMI) have lately gained prominence. The advent of electronic health records (EHR) has allowed the usage of machine learning prediction algorithms to determine various health disorders from patient data. Therefore, we have developed sequential deep neural networks capable of predicting HF, STEMI, and PE from patients’ demographic data, prior medical history, and initial lab parameters, with an AUC of 0.867, 0.861, and 0.761, respectively. We have also developed neural networks for predicting Mortality with an AUC of 0.985 and Duration of ICU and Hospital Stay with Mean Absolute Error of 2.03 and 2.54, respectively, by adding doctors’ diagnoses of comorbidities to patient records. We have designed a web application that can be deployed in a hospital and used to test these models on real-time patient data. Keywords Neural networks · Web application · Mortality · Hospital stay · ICU stay
1 Introduction Patient flow is a well-studied indicator of hospital efficacy, and studies have linked it to patient safety, accessibility, and profitability. Potentially preventable delays in the hospital increase susceptibility to the risks of hospitalization (e.g., hospital-acquired infections, adverse drug responses, and pressure ulcers), as well as a deterioration in patient satisfaction. Patient admission is blocked downstream from external admission sources (e.g., transfers) and internal sources [1], culminating in emergency department boarding, post-anesthesia care unit boarding [2], and the operating room holds. Each of these scenarios necessitates primary care in less-than-ideal settings. In hospital settings, the calculation of risk and mortality scores has a long record and is well researched [3]. Risk and mortality scores are derived using patient data from the current hospitalization and provide a picture of the patient’s status [4]. Healthcare facilities are creating empiric patient data warehouses due to the widespread adoption of EHRs, allowing the usage of statistical and machine learning techniques [5]. Machine learning solutions have been introduced to EHRs for their complete utilization; for example, k-nearest neighbor [6], decision trees [7], and support vector machines [8] for modeling hospital readmission rates. Machine learning solutions have also been applied to various cardiovascular diseases [9]. In this paper, we create and test a deep-learning-based discharge and ailment prediction tool that can be deployed in a hospital setting using a web-based application, as depicted in Fig. 1. Here, we shall be discussing the dataset, the designed neural networks, and the developed web application in detail.
2 Dataset 2.1 Data Source Before deploying the models for real-time analysis, they were trained using patient admission and discharge records spanning over 2 years, obtained from a cardiac institute.
Fig. 1 Algorithm and workflow
Only the latest hospitalization records were utilized for patients with multiple hospitalizations. Also, patients discharged against medical advice (DAMA) were omitted from the database. Features and outcomes were obtained by analyzing the data of the 11,498 patients remaining in the database. The records provided information regarding demographics, medical history, lab measurements, diagnoses of various comorbidities, the eventual clinical outcome, and the duration of hospitalization. Demographics included information about patients' age, gender, residential locality, and type of admission. The medical history of each patient included information regarding prior known addictive behaviors, namely smoking and alcoholism, along with common ailments. Measurements from lab reports were of parameters analogous to tests typically performed on cardiac patients. The primary care doctors examined patients for the presence of comorbidities associated with cardiac disorders. Among the patient records used for this study, 9.4% had expired under medical care, 26.75% were diagnosed with HF, 14.62% with STEMI, and 1.46% with PE [10].
2.2 Data Preprocessing Categorical variables were numerically encoded. Boolean and binary variables were mapped to 0 and 1. All string data were discarded due to the inability of neural networks to process the same. KNNImputer was configured to estimate the missing values within the dataset using the Euclidean distance metric and the average feature value from k = 10 nearest neighbors, as in previous work [10]. This process is necessary to avoid training errors and reduce bias in the dataset. The data were normalized by subtracting the mean and scaling to unit variance to minimize data redundancy and improve data integrity. The scaling factors were saved for processing the real-time data. The following formula was used to perform median-based data rejection on data used to train regression models:

IQR = Q3 − Q1    (1)

Data Exclusion Threshold = Q3 + 1.5 × IQR,    (2)
where Q1 corresponds to the 25th percentile data, Q3 corresponds to the 75th percentile data, and IQR is the interquartile range.
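A minimal sketch of these preprocessing steps (KNN imputation with k = 10, standard scaling with saved factors, and IQR-based rejection of extreme regression targets) is shown below; the DataFrame layout and column names are placeholders, not the actual EHR schema.

```python
# Hypothetical preprocessing sketch; 'df' and the target column name are placeholders.
import numpy as np
import pandas as pd
import joblib
from sklearn.impute import KNNImputer
from sklearn.preprocessing import StandardScaler

def preprocess(df: pd.DataFrame, target: str = "hospital_stay"):
    X = df.drop(columns=[target]).select_dtypes(include=[np.number])  # string data is discarded
    X_imp = KNNImputer(n_neighbors=10).fit_transform(X)                # k = 10 nearest neighbors
    scaler = StandardScaler().fit(X_imp)                               # zero mean, unit variance
    joblib.dump(scaler, "scaler.joblib")                               # keep factors for real-time data
    X_std = scaler.transform(X_imp)
    q1, q3 = df[target].quantile([0.25, 0.75])                         # Eqs. (1)-(2)
    keep = (df[target] <= q3 + 1.5 * (q3 - q1)).to_numpy()
    return X_std[keep], df.loc[keep, target].to_numpy()
```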
3 Neural Networks Fully connected sequential multilayer perceptrons were deployed for classification and regression tasks. Each task’s distribution of prediction labels required a different network to be created. The architectural design consisted of multiple hidden layers
having numerous nodes between the input and output layers. A weight vector connects each node to all the nodes in the preceding and succeeding layers. The number of nodes in the input layer equals the number of feature vectors. We built neural networks using resources from the TensorFlow library. Comorbidities are predicted by deploying demographic data, medical history, and lab reports as 20 feature vectors. Meanwhile, regression tasks and mortality prediction are performed using 48 feature vectors. For all neural networks, the output layer has only a single node. For classification, sigmoid is used as the output layer activation function, while the linear activation function is used for regression. All neural networks are compiled using the Adam optimizer with a learning rate of 0.01. The loss functions used are binary cross-entropy for binary classification and mean absolute error for regression. Mean absolute error (MAE) is also used as the metric for regression tasks, while area under the receiver operating characteristic (ROC) curve (AUC) is used for classification tasks.
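An illustrative Keras builder for these networks is sketched below; the hidden-layer sizes shown are placeholders (the tuned per-task architectures are listed in Table 1), while the output activations, the Adam optimizer with a 0.01 learning rate, the losses, and the metrics follow the text.

```python
# Illustrative MLP builder for the classification/regression tasks described above.
import tensorflow as tf
from tensorflow.keras import layers

def build_mlp(n_features, task="classification", hidden=(64, 32), dropout=0.3):
    model = tf.keras.Sequential([layers.Input(shape=(n_features,))])
    for units in hidden:                                   # placeholder sizes; see Table 1
        model.add(layers.Dense(units, activation="relu"))
        model.add(layers.Dropout(dropout))
    if task == "classification":
        model.add(layers.Dense(1, activation="sigmoid"))
        loss, metrics = "binary_crossentropy", [tf.keras.metrics.AUC(name="auc")]
    else:                                                  # hospital / ICU stay regression
        model.add(layers.Dense(1, activation="linear"))
        loss, metrics = "mean_absolute_error", ["mae"]
    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.01),
                  loss=loss, metrics=metrics)
    return model

hf_model = build_mlp(20)                       # comorbidity networks use 20 input features
stay_model = build_mlp(48, task="regression")  # stay-duration (and mortality) networks use 48 features
```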
3.1 Hyperparameter Optimization The neural network architecture was optimized using the Keras tuner library by applying a random grid search method. 1–3 hidden layers were chosen, and 10 to 200 nodes were selected per layer. The activation function of every layer was specified as either sigmoid or ReLU. The hyperparameters were sampled randomly over ten epochs, each step repeated thrice. The entire dataset was split 4:1 into train and test data. The train data was split again using tenfold cross-validation with stratified sampling, which helped select the model having the best bias-variance trade-off. The tuner provided optimized hyperparameters to form a new neural network architecture per fold. Each neural network thus formed was trained for tenfolds using nine parts of data for training and the remaining part for validation. At the end of each fold, the model performance was evaluated on test data. The evaluation metrics of the model were captured along with the hyperparameters used for its construction. The model maintaining the best average performance throughout the tenfolds, shown in Table 1, was selected for all predictions, and their neural network architecture was implemented.
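The random search over 1–3 hidden layers, 10–200 nodes per layer, and sigmoid/ReLU activations could be expressed with the Keras Tuner roughly as follows; the exact search settings, names, and the dropout choices are our assumptions.

```python
# Hypothetical Keras Tuner sketch of the random search described above.
import keras_tuner as kt
import tensorflow as tf
from tensorflow.keras import layers

def hypermodel(hp):
    model = tf.keras.Sequential([layers.Input(shape=(48,))])       # 48 features for mortality/stay tasks
    for i in range(hp.Int("num_layers", 1, 3)):                    # 1-3 hidden layers
        model.add(layers.Dense(hp.Int(f"units_{i}", 10, 200, step=10),
                               activation=hp.Choice(f"act_{i}", ["relu", "sigmoid"])))
        model.add(layers.Dropout(hp.Choice(f"drop_{i}", [0.1, 0.2, 0.3, 0.4, 0.5])))
    model.add(layers.Dense(1, activation="sigmoid"))
    model.compile(optimizer=tf.keras.optimizers.Adam(0.01),
                  loss="binary_crossentropy",
                  metrics=[tf.keras.metrics.AUC(name="auc")])
    return model

tuner = kt.RandomSearch(hypermodel, objective=kt.Objective("val_auc", "max"),
                        max_trials=30, executions_per_trial=3)     # each sampled trial repeated thrice
# tuner.search(X_train, y_train, validation_data=(X_val, y_val), epochs=10)
```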
3.2 Network Training After implementing the neural network with optimized parameters, it was trained using tenfold cross-validation over the entire dataset (for regression models, the dataset obtained after median-based rejection), using one part for validation. The weights for each fold were kept, and the final model was built using the weights from the best-performing fold. This weighted model was then saved for implementation in the real-time web application.
Table 1 Neural network models—architecture

Neural network | Layer 1 (Node, Activation, Dropout) | Layer 2 (Node, Activation, Dropout) | Layer 3 (Node, Activation, Dropout)
HF             | 120, ReLU, 0.5                      | 110, ReLU, 0.4                      | 60, ReLU, 0.5
STEMI          | 90, Sigmoid, 0.4                    | 100, Sigmoid, 0.4                   | –
PE             | 160, Sigmoid, 0.4                   | –                                   | –
Mortality      | 30, Sigmoid, 0.3                    | 50, Sigmoid, 0.1                    | –
Hospital stay  | 60, ReLU, 0.3                       | –                                   | –
ICU stay       | 20, ReLU, 0.1                       | –                                   | –
4 Web Application The web-based application was developed using Django for the back-end and HTML, CSS, and JavaScript for the front-end. The neural networks were integrated using the REST API framework.
4.1 Django Back-End Setup Django, a Python web framework, is used for developing the back-end of this web application, in conjunction with the REST API toolkit and the TensorFlow library. A virtual environment was created on a local server to set up the Django project. The multi-user support functionality of Django was used to create a multi-user authentication system and define four user categories, namely, Admission Staff, Lab Technician/Pathologist, Primary Care Doctor, and Head of the Department, each having different levels of access to these records depending on their profile. Data objects defined as classes are maintained using SQLite databases. Data for every patient is stored with a unique index key, constant across classes. Forms are designed to link patient data input to data objects. The website layout and the connections to render and redirect web pages are specified. For proper implementation of the model, the input data is normalized using imported scaling factors before providing it as input vectors. The models implemented using the TensorFlow library are employed with the Django REST framework by simply calling an API endpoint.
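As a sketch of how such an endpoint might look (file paths, field names, and the URL wiring are our assumptions, not the deployed code), a DRF view can normalize the posted features with the saved scaling factors and call the loaded TensorFlow model:

```python
# Hypothetical Django REST framework endpoint; paths and payload format are assumptions.
import joblib
import numpy as np
import tensorflow as tf
from rest_framework.views import APIView
from rest_framework.response import Response

MODEL = tf.keras.models.load_model("models/mortality.h5")   # saved weighted model
SCALER = joblib.load("models/scaler.joblib")                 # scaling factors from training

class MortalityPrediction(APIView):
    def post(self, request):
        # expects a JSON body such as {"features": [48 numeric values]}
        x = np.asarray(request.data["features"], dtype=np.float32).reshape(1, -1)
        x = SCALER.transform(x)                              # normalize like the training data
        prob = float(MODEL.predict(x, verbose=0)[0, 0])
        return Response({"mortality_probability": prob})

# urls.py (sketch): path("api/mortality/", MortalityPrediction.as_view())
```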
4.2 Front-End Setup The front-end of this Web application is designed entirely using HTML, CSS, and JavaScript. Bootstrap is also deployed to make the website responsive to the screen resolution. The website has a single login page and four user interface pages. The admission staff registers patients during hospital admission by inputting the identification, demographic data, and prior medical history. The results of the conducted laboratory tests are added by the lab technician. The doctor can input comorbidity data as per their diagnosis. They can also register patients in emergency cases, edit patient history filed by the admission staff, and view lab reports uploaded by the lab technician. They also have the provision of filing the lab reports if needed. They can view the predictions regarding the comorbidities and mortality, along with the predicted duration for hospital and ICU stays. The Head of the Department has the provision to view and download all data. Figure 2 displays a flowchart depicting all the user categories as well as the flow of patient data. The web interface for the doctor is shown in Fig. 3, as it includes all major salient functions.
Fig. 2 Flowchart with all user categories
Fig. 3 Web interface for doctor
5 Results 5.1 Model Metrics The AUC and MAE metrics obtained for the classification and regression models, along with the accuracy, are stated in Table 2. The distribution of regression values for the duration of hospital stay is shown in Fig. 4a and for the duration of ICU stay in Fig. 4b. The Mortality network shows excellent accuracy and AUC, while the HF and STEMI networks also show high accuracy and high AUC. The PE network shows excellent accuracy but has a lower AUC due to the biased class distribution.
Fig. 4 Distribution of actual values versus predicted values for Duration of a Hospital Stay and b ICU Stay

Table 2 Neural network models—performance summary

Neural network | Metric           | Accuracy (%) | Time (ms)
HF             | AUC = 0.867      | 82.20        | 216
STEMI          | AUC = 0.861      | 86.14        | 145
PE             | AUC = 0.761      | 98.3         | 169
Mortality      | AUC = 0.985      | 97.28        | 143
Hospital stay  | MAE = 2.54 (day) | NA           | 83.5
ICU stay       | MAE = 2.03 (day) | NA           | 93.5
5.2 Real-Time Performance The models were evaluated in real time on arbitrary test data. The major parameter assessed was the computational time, which is the time taken for the neural network to predict an output from a single input vector. The results are given in Table 2. Regression networks are visibly faster in processing the data than classification networks. This phenomenon could be attributed to the linear output function of the former networks compared to the sigmoid output functions of the latter.
6 Conclusion and Future Scope Our work shows that deep learning models can be used in hospital settings to provide predictions regarding comorbidities and in-hospital mortality. They can also accurately predict both hospital and ICU stay durations for a patient, which allows for better hospital resource management by the authorities.
Future works will include connecting models to a cloud server such as AWS for deployment in a hospital for immediate collection and prediction of data, as well as comparison with standard machine learning algorithms.
References 1. Asplin BR et al (2003) A conceptual model of emergency department crowding. Ann Emerg Med 42(2):173–80. https://doi.org/10.1067/mem.2003.302 2. Schoenmeyr T et al (2009) A model for understanding the impacts of demand and capacity on waiting time to enter a congested recovery room. J Am Soc Anesth 110:1293–1304 3. Pollack MM et al (1996) “PRISM III”, critical care medicine: May 1996 - Volume 24 - Issue 5 - pp 743–752 4. Doyle C et al (2013) A systematic review of evidence on the links between patient experience and clinical safety and effectiveness. BMJ Open 3:e001570. https://doi.org/10.1136/bmjopen2012-001570 5. Artetxe A et al (2018) Predictive models for hospital readmission risk: a systematic review of methods. Comput Methods Programs Biomed 164:49–64. https://doi.org/10.1016/j.cmpb. 2018.06.006 6. Ahmad FS et al (2021) A hybrid machine learning framework to predict mortality in paralytic ileus patients using electronic health records (EHRs). J Ambient Intell Human Comput 12:3283–3293. https://doi.org/10.1007/s12652-020-02456-3 7. Taylor RA et al (2016) Prediction of in-hospital mortality in emergency department patients with sepsis: a local big data-driven, machine learning approach. Acad Emerg Med 23:269–278. https://doi.org/10.1111/acem.12876 8. Cui S et al (2018) An improved support vector machine-based diabetic readmission prediction. Comput Methods Programs Biomed 166:123–135. https://doi.org/10.1016/j.cmpb.2018. 10.012 9. Singh K, Nair V et al (2022) Machine learning algorithms for atrioventricular conduction defects prediction using ECG: a comparative study. In: IEEE Delhi section conference (DELCON) 2022, pp 1–5. https://doi.org/10.1109/DELCON54057.2022.9753488 10. Bollepalli SC et al (2022) An optimized machine learning model accurately predicts inhospital outcomes at admission to a cardiac unit. Diagnostics 12(2):241. https://doi.org/10. 3390/diagnostics12020241
Automated Analysis of Connections in Model Diagrams Sandeep Kumar Erudiyanathan, Chikkamath Manjunath, and Gohad Atul
Abstract A model diagram is a crucial aspect of various scientific documents that are available as hard or soft copies. Often, the information captured in these diagrams is complex and needs deeper comprehension by human experts. Automating this process of reading and analyzing has become extremely important for various industrial applications. Hence, in this paper, we aim to extract the context from model diagrams by solving for connections and converting them into a graph using sophisticated image processing and pattern recognition algorithms. Keywords Image processing · Graph theoretic analysis · Pattern recognition · Machine learning · Automation
1 Introduction Analyzing industrial model diagrams for applications needs field expert manpower for performing testing, validation, implementation, etc., which incurs a cost for the industry and sometimes delay. Hence, automated processing of diagrams is required. Papers [1–3] deal with line segmentation as a separate problem and apply transforms or morphological processing, and context extraction is not available. This is also the case with the majority of the works that deal with document image processing. Conversion of an image to graph was only seen from a medical study perspective [4, 5]. Hence, not many works are seen in the existing literature that matches exactly S. K. Erudiyanathan (B) · C. Manjunath · G. Atul BGSW/ETI1, Bosch Global Software Technologies Pvt. Ltd., Koramangala, Bengaluru 560095, India e-mail: [email protected] C. Manjunath e-mail: [email protected] G. Atul e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 R. Sarkar et al. (eds.), Proceedings of International Conference on Frontiers in Computing and Systems, Lecture Notes in Networks and Systems 690, https://doi.org/10.1007/978-981-99-2680-0_12
123
124
S. K. Erudiyanathan et al.
Fig. 1 Sample model diagram
There exist works in chunks, and a holistic view is missing. This paper considers model diagrams developed in ASCET [6] or any similar tool. ASCET is a tool developed by a company named ETAS. It focuses mainly on model-based application software development with auto-code generation, and mainly uses model and state machine diagrams to meet the specific automotive requirements of embedded software. These model diagrams (refer to Fig. 1) have the following ingredients—nameable entities (inputs and outputs), non-nameable entities (unit-level operators like switches, arithmetic operators, and so on), signals moving from one model diagram to another, text which is used within and around entities, and connections linking the various image entities. The entities in a diagram depend on the domain and the application for which it is used. Model diagrams are made available in a standard image format or as a document like a PDF which contains all these diagrams with a detailed textual explanation. The methods and algorithms discussed in this paper deal with images readily available in standard formats (.png, .bmp, and so on) and not with images within a document. However, in the case of embedded diagrams, an optimal image-extraction design has to be framed before applying the methods mentioned in this paper. Considering connections for the analysis ahead, there are a few definitions to be noted. These definitions are specific to model diagrams, especially ASCET-like images:
Connection: A group of pixels forms a connection C if all the pixel values are the same and they are 4-, 8-, or m-connected [7].
Simple connection: A connection C is called a simple connection if it connects exactly two entities in a diagram.
Complex connection: A connection C is called a complex connection if it connects more than two entities in a diagram.
Tap point: A tap point T is defined as a point in a diagram where more than one connection meets, forming a junction, where all the incoming and outgoing connections carry the same signal or information.
(In terms of model diagrams, an entity means a non-textual, non-connection component of a diagram.)
Cross point: A cross point P on an image is a point where more than one connection meets and the opposite connections carry the same signal.
Solid connection: A connection C, either simple or complex, is called a solid connection if there are no breaks or intermittent spaces in it.
Dashed connection: A connection C, either simple or complex, is called a dashed connection if there are breaks or intermittent spaces in it.
Feedback connection: A connection C is called a feedback connection if the flow of signal/information is backward (right to left), opposing the conventional forward flow (left to right).
Extracting connection context includes determining entity connectivity. A connection may comprise a crossover point or a tap point, along with varieties like solid, dotted, and so on. An algorithm has to understand all these aspects to extract context from a given model diagram. Further, the extracted information is converted to a graph [8], as an example machine-readable format, for various applications. The proposed image connection and graph analysis algorithms are implemented in Python using suitable packages. The rest of the manuscript is organized as follows: Sect. 2 deals with the proposed methodologies, Sect. 3 discusses a case study on graph analysis, and Sect. 4 concludes the paper.
2 Proposed Methodology In this section, the algorithm for context extraction is explained in detail. There are a few assumptions that we make before delving into connection analysis: a diagram can have entities with text inside or outside, and a connection links various such entities; the bounding box coordinates and identities of these entities are known beforehand (many object detection algorithms are available for this purpose [9]); and the text in model diagrams is identified with its bounding box information beforehand (many text detection algorithms are available for this purpose [10]).
2.1 Finding the Bounding Box for a Connection Consider the model diagram in Fig. 2. The diagram contains two types of connections—solid and dotted. All the connections are directed, with an arrow flowing inwards or outwards from an entity. There might be text at the bottom of an entity. To extract the connections, the first task is to mask the entities and the text present in the image so that only connections remain. For this purpose, their respective bounding box coordinates are used to identify the regions, which are filled with the background pixel intensity (in this case, white pixels). The resulting image contains only connections and is shown in Fig. 3.
Fig. 2 Model diagram of an automobile system
Fig. 3 Connections retained from the image
Fig. 4 Bounding box of connections identified
Fig. 5 Extreme corner points
Using a suitable kernel, the image in Fig. 3 is subjected to a morphological closing operation [7] to convert dotted and dashed connections into solid connections. This pre-processing step is needed to extract the boundary of every connection. The connections can then be extracted using contour detection methods [11]. The obtained result is shown in Fig. 4.
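A compact OpenCV sketch of the masking and connection-boxing steps just described might look as follows; the grayscale input, kernel size, and bounding-box format are illustrative assumptions.

```python
# Illustrative sketch: mask entities/text, close gaps in dotted lines, and box each connection.
import cv2
import numpy as np

def connection_boxes(gray, entity_boxes, text_boxes, kernel_size=5):
    masked = gray.copy()
    for (x1, y1, x2, y2) in entity_boxes + text_boxes:
        # fill entity/text regions with the white background so only connections remain
        cv2.rectangle(masked, (x1, y1), (x2, y2), color=255, thickness=-1)
    # invert so lines become foreground, then close gaps in dotted/dashed connections
    binary = cv2.threshold(masked, 127, 255, cv2.THRESH_BINARY_INV)[1]
    kernel = np.ones((kernel_size, kernel_size), np.uint8)
    closed = cv2.morphologyEx(binary, cv2.MORPH_CLOSE, kernel)
    # one external contour (and bounding box) per connection
    contours, _ = cv2.findContours(closed, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    return [cv2.boundingRect(c) for c in contours]
```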
2.2 Classifying a Connection as Simple and Complex To classify a connection, the corner points associated with every connection have to be determined. From these corner points, only the extreme corner points are retained (refer to Algorithm 1 and Fig. 5). For our experimentation, the Harris corner detection algorithm [12] was used. Further, the image patches, each comprising a connection obtained through its bounding box, are analyzed individually (Fig. 4). When the image patch is extracted within the bounding box, other chopped or noisy connections within the vicinity appear together with the main connection. Hence, these unwanted connection pieces have to be eliminated. This elimination follows Theorem 1.

Algorithm 1: Finding the extreme corner points
Result: extreme corner points
Input: connection image I; list of corner points
for each corner_point do
    count = 0
    x, y = corner_point[coordinates]
    if I[x+1,y]==0 or I[x-1,y]==0 or I[x,y+1]==0 or I[x,y-1]==0 or
       I[x+1,y+1]==0 or I[x-1,y+1]==0 or I[x-1,y-1]==0 or I[x+1,y-1]==0 then
        count++
    if count == 0 then save (x, y)
end
Theorem 1 In a bounding box of a line L, the lengthiest of all lines is L itself.
Proof. Let us consider two lines, L1 and L2, within the vicinity of a bounding box. Let us assume L1 is the line of interest; we prove that length(L1) > length(L2). Both L1 and L2 are made of infinite points; however, a few points define a bounding box (shown as black dots in Fig. 7). For our convenience, curved lines are approximated by straight lines (piece-wise linear modeling), which is usually the case in model diagrams (refer to Fig. 6). The red line in Fig. 6 shows the approximation of the actual connection line, where the black dots are the corners/bends and start/end points. Let us define S containing the corner and/or start/end points such that (x_i, y_i) ∈ S and i = 0, 1, 2, 3, ..., n. Let S_i ∈ S and S_i ∈ L_i. For the two lines L1 and L2, we have S1 and S2. Let the bounding box of these lines be B. The perimeter Pr of this rectangle B is given by 2(l + w), where l is the length and w is the width. If (m_1, n_1), (m_2, n_2), (m_3, n_3), (m_4, n_4) are the four corners of the rectangle, then l = √((m_1 − m_2)² + (n_1 − n_2)²) and w = √((m_1 − m_3)² + (n_1 − n_3)²).
These points are formed when a line starts or ends, or a line has bends and twists. These points in turn define the boundary/coverage area of a line.
Fig. 6 Approximation of a curved line to a straight line
Fig. 7 Distance between the bounding box and the lines
Let P be a point on the bounding box. Extend the point horizontally such that it intersects the lines at two points, and let the distances between these points and P be d1 and d2 (refer to Fig. 7). Since d1 is the distance from P to a point on L1, which is the line of interest, we always have d1 < d2. Hence, L1 is closer to B than L2, and consequently length(L1) > length(L2) > ... > length(Ln). Hence, there exists one line which is the lengthiest. Therefore, only the lengthiest line within a bounding box is selected, and the associated extreme corner points are counted. If the count is exactly 2, then it is a simple connection. Any connection with a count greater than 2 is considered a complex connection.
Lemma 1 A simple connection has exactly two extreme corner points and a complex connection has more than two extreme corner points.
Proof. A simple connection is formed when a line segment connects exactly two entities in a diagram. During pre-processing, when the entities are masked, an extreme corner point arises exactly at the point where the connection meets the entity. Since a simple connection has an entity at either end, exactly two extreme corner points are formed. A complex connection is formed out of a junction created by the tap and cross points of a connection. Hence, the connection will be a combination of more than one line segment, where every segment ends with an entity.
(This can be proved by considering more points on L1 from additional line segments approximating the curve, finding their distances to points on B, and adding all the distance values to yield d1.)
When all entities are masked, each entity will give rise to one extreme corner point. Hence, a complex connection has more than two extreme corner points.
2.3 Extracting Context from Connections It is assumed that the entities are identified beforehand, and for the explanation’s sake, let these entities be named using numbers, as shown in Fig. 8. By knowing the bounding box of every entity, the contour of that entity box can be extracted which can be further used to map the extreme corner points to their respective entities. The mapping can be done using a distance metric. Hence, every extreme corner point has a map of the entity it is closest to and the connection to which it belongs. Though the connection is detected, assigning direction to the connection is another task. For this purpose, the arrowhead of the connection needs to be identified which can be carried out using image template matching [13, 14] (refer to the results in Fig. 9). The centroid of each detected arrow head bounding box is computed. Now let us consider a simple connection and the associated extreme corner points. A distance measure between the centroids and the two extreme corner points of a simple connection would determine which among the two corner points is associated with the arrowhead. This would assign a direction to the connection. Hence, the context information “there is a connection between 0 and 11” becomes richer with “there is a connection between 0 and 11, where signals flow outward from 0 and inwards to 11”.
Fig. 8 Blocks with id numbers
Fig. 9 Arrow heads identified
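Arrowhead localization via template matching could be sketched as below; the template image, threshold, and single-orientation assumption are ours (in practice, one template per arrow direction and non-maximum suppression over overlapping hits would likely be needed).

```python
# Hypothetical arrowhead detector using normalized cross-correlation template matching.
import cv2
import numpy as np

def arrowhead_centroids(gray, template, threshold=0.8):
    res = cv2.matchTemplate(gray, template, cv2.TM_CCOEFF_NORMED)
    ys, xs = np.where(res >= threshold)
    h, w = template.shape[:2]
    # centroid of each matched arrowhead bounding box
    return [(x + w // 2, y + h // 2) for x, y in zip(xs, ys)]
```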
A complex connection is formed due to the existence of tap and/or cross points. To extract the context from these connections, the locations of the tap and cross points are necessary. The characteristic of a connection having a tap point is that all the connections involved in its formation carry the same signal/information. The major difference between a tap and a cross point in model diagrams is that taps have a round/circular solid dark patch, or a solid dot. This shape can be extracted using shape descriptors (circularity index). The centroid of the circular dot is considered the tap point. To detect a cross point, a counter is maintained for each corner point. At each corner point, the existence of a black pixel in the 8-connectivity is checked and the counter is incremented. A corner point that is neither a tap point nor an extreme corner point is saved as a cross point if the count is greater than 3. A rectangular patch of pixels (with the same pixel intensity as the background) at every tap and cross point is given a new identity number, added to the list of elements temporarily, and treated like any other entity. The complex connections are now broken into simple connections, and the detection of corner and extreme corner points is repeated. However, now the connection context obtained will include the dummy entities (at the tap and cross points). The obtained result is shown in Fig. 10; observe the new identity numbers created at the two tap points—21 and 22. While finding cross points, the connectivity in all directions is also checked, and the neighbors of every tap and cross point can also be detected. For example, for tap point 21, the neighbors are 7 to the left, 10 to the right, and 8 below. This can be done by identifying the connections that lie around the tap point. For every such connection, the tap point is an extreme corner point, and the remaining extreme corner point at the other end is the neighbor link to the entity. Simple connections obtained by breaking complex connections have to be assigned a direction. In the case of a tap point: for the connection to the left of the tap point, the direction is inwards to the tap; to the right, outwards from the tap; to the top, inwards to the tap; and to the bottom, outwards from the tap. However, in all these connections, the signal carried is the same. In the case of a cross point, the direction conventions remain the same; however, the signal in the opposite connections is the same. The connections in all the directions of a tap or cross point are determined by comparing the centroids of the connection bounding boxes with the coordinates of the tap/cross point. Hence, the context obtained from the complex connection is "there is a connection between 7 and 21, where the signal flows outward from 7 and inward to 21".
Fig. 10 Additional identity numbers created
Similarly, "there is a connection between 21 and 10, where the signal flows outward from 21 and inward to 10", and "there is a connection between 21 and 8, where the signal flows outward from 21 and inward to 8". While extracting the complex connections, dummy identity numbers are introduced. These identity numbers are to be eliminated and the simple connections stitched back to get the context of the complex connections. That is, the context should be "there is a connection between 7 and 10, where the signal flows outward from 7 and inward to 10", while eliminating 21. This can be done by a recursive process of elimination.
2.4 Finding the Type of a Connection A model diagram may have different kinds of connections. In the ASCET image samples that we selected, there were three kinds of connections: solid connections depicting an analog signal, dotted connections depicting a digital signal, and dashed connections depicting the sequence in which the entities should be executed. For this purpose, a combination of image processing and a deep-learning-based ensemble of classifiers was used. Figure 4 shows the bounding box detected for each connection. Within each bounding box, several contours are computed. For a solid line there will be exactly one contour, and for other connection patches there will be more than one. The DL-based classification (ensemble model) is applied to the latter case. Hence, this is a two-stage classification—in stage 1, solid lines are determined, and in stage 2, dotted and dashed lines. Three deep learning models were used, and one among them is shown in Fig. 12. The output of each model is used as a vote for the class (dotted or dashed). Whichever class gets the maximum votes is considered the identified class for the given connection image patch. The training samples are shown in Fig. 11 (dotted and dashed lines). In the example shown in Fig. 1, all three kinds of connections are present. The models were trained with 154 dotted images and 126 dashed images. The training accuracies of the neural networks, with a learning rate of 0.0001, the Adam optimizer, and cross-entropy loss, at the end of 12 epochs (found by experiments), were 99%, 99.5%, and 98.9%. Once the models were trained, they were tested on images that had both dotted and dashed lines in them. The obtained results are shown in Table 1.
Fig. 11 a Dotted/simple dotted connection and b Dashed/sequence-dotted connection
Table 1 Results of dotted and dashed line detection. ADL: Available Dotted Lines, ADHL: Available Dashed Lines, DLD: Detected Dotted Lines, DHLD: Detected Dashed Lines, and Acc: Detection Accuracy

Image label | ADL | ADHL | DLD | DHLD | Acc (%)
Image 1     | 1   | 0    | 1   | 0    | 100
Image 2     | 2   | 0    | 2   | 0    | 100
Image 3     | 3   | 1    | 3   | 1    | 100
Image 4     | 6   | 1    | 6   | 1    | 100
Image 5     | 7   | 4    | 7   | 2    | 81.81
Fig. 12 Deep learning model
For image-5, the accuracy dropped because of image noise in which lines had a mixed dotted-and-dashed pattern; these lines were wrongly classified.
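The two-stage decision described above—a contour count for solid lines, then a majority vote over the three trained CNNs—could be sketched as follows; the model file names, input preprocessing, and binary label convention are assumptions.

```python
# Illustrative two-stage connection-type classifier (assumed file names and label convention).
import cv2
import numpy as np
import tensorflow as tf

MODELS = [tf.keras.models.load_model(p) for p in ("cnn_a.h5", "cnn_b.h5", "cnn_c.h5")]

def connection_type(patch_gray, patch_rgb):
    binary = cv2.threshold(patch_gray, 127, 255, cv2.THRESH_BINARY_INV)[1]
    contours, _ = cv2.findContours(binary, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    if len(contours) <= 1:                                   # stage 1: a single contour -> solid line
        return "solid"
    x = np.expand_dims(patch_rgb.astype(np.float32) / 255.0, 0)
    # stage 2: each model outputs P(dashed); majority vote with a 0.5 threshold
    votes = sum(int(m.predict(x, verbose=0)[0, 0] > 0.5) for m in MODELS)
    return "dashed" if votes >= 2 else "dotted"
```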
3 Graph-Based Vital Block Detection Let us construct a directed graph G(V, E), where V is the set of blocks of an image and E is the set of connections. The type of connection is not used in this particular case study. The graph obtained from Fig. 8 is shown in Fig. 13, where all entities are nodes and edges exist between connected entities, e.g., node-7 establishes an inward link with nodes 2, 3, 4, 5, etc. Different kinds of centrality measures, namely betweenness centrality, eigen centrality, closeness centrality, and degree centrality [15], were applied to the graph to identify the most central nodes.
Fig. 13 Graph constructed from the extracted information for the model diagram
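The centrality computation and maximum-vote ranking described above are straightforward with NetworkX; the sketch below illustrates the idea, with a simplified vote (each measure votes for its single top node) and a placeholder edge list.

```python
# Illustrative centrality analysis of the extracted connection graph (simplified voting).
import networkx as nx
from collections import Counter

def vital_blocks(edges, top_k=2):
    G = nx.DiGraph(edges)                       # edges = [(source_block, destination_block), ...]
    measures = [
        nx.closeness_centrality(G),
        nx.betweenness_centrality(G),
        nx.degree_centrality(G),
        nx.eigenvector_centrality(G, max_iter=1000),
    ]
    votes = Counter()
    for m in measures:
        votes[max(m, key=m.get)] += 1           # each measure votes for its highest-ranked node
    return votes.most_common(top_k)

# e.g. vital_blocks([(2, 7), (3, 7), (7, 21), (21, 10), (21, 8)])
```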
Table 2 Result of centrality measures

Nodes   Closeness   Betweenness   Degree   Eigen
1       0.00        0.00          0.05     0.09
2       0.00        0.00          0.05     0.16
3       0.00        0.00          0.05     0.20
4       0.00        0.00          0.05     0.20
5       0.00        0.00          0.05     0.20
6       0.00        0.00          0.05     0.20
7       0.20        0.11          0.30     0.56
8       0.18        0.04          0.15     0.34
9       0.14        0.04          0.10     0.28
10      0.25        0.17          0.20     0.45
11      0.26        0.19          0.20     0.26
12      0.00        0.00          0.05     0.09
13      0.21        0.17          0.10     0.11
14      0.00        0.00          0.05     0.02
15      0.20        0.16          0.15     0.06
16      0.20        0.13          0.15     0.03
17      0.00        0.00          0.05     0.01
18      0.18        0.09          0.15     0.01
19      0.16        0.00          0.05     0.00
20      0.16        0.00          0.05     0.00
to identify the most central node in the graph. Such an analysis finds the block that is most prone to security threats, attacks, or failures. The ranking of the nodes is carried out by a combined notion (maximum vote over all centrality measures). The obtained results are shown in Table 2. Closeness centrality is highest for node-11, betweenness centrality is highest for node-11, degree centrality is highest for node-7, and eigen centrality is highest for node-7. Hence, from this analysis, it is concluded that nodes 7 and 11 are at high risk of attacks and failures. Though the paper presents extensive image processing methods for context extraction from a given model diagram, there are a few limitations: (1) The kernel size used for morphological processing is resolution-dependent. Hence, the proposed methodology asks the user to fix the image resolution while extracting the images from the repository. (2) The connection recognition depends on existing corner detection algorithms. If the corner detection fails to find the extreme corner points, then the mapping of connections to entities fails. (3) The accuracy of the deep learning algorithm determines how well connections are classified as dashed or dotted. This accuracy in turn depends on the dataset used for training.
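A minimal sketch of this centrality-based analysis, using networkx, is given below; the edge list is illustrative and does not reproduce the exact graph of Fig. 13, and the "maximum vote" combination follows the description above.

import networkx as nx

# Directed graph: nodes are blocks, edge direction follows the extracted signal flow.
G = nx.DiGraph()
G.add_edges_from([(2, 7), (3, 7), (4, 7), (5, 7), (7, 10), (10, 11)])  # illustrative edges

scores = {
    "closeness": nx.closeness_centrality(G),
    "betweenness": nx.betweenness_centrality(G),
    "degree": nx.degree_centrality(G),
    "eigen": nx.eigenvector_centrality(G.to_undirected(), max_iter=1000),
}

# Combined notion: each centrality measure votes for its top-ranked node.
votes = {}
for measure, values in scores.items():
    top = max(values, key=values.get)
    votes[top] = votes.get(top, 0) + 1

vital = max(votes, key=votes.get)
print("votes per node:", votes, "-> most vital block:", vital)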
4 Conclusions Document content analysis is in high demand for industrial automation applications. There are plenty of image extraction tools available, and the current work assumes that the images are provided in a standard image format after being extracted from the document. Given an image, the connections are analyzed using a set of novel algorithms which determine tap points, cross points, and the type of connections, and extract the context information. In addition, the paper highlights one use case, where the extracted information is converted into a graph and centrality analysis is carried out to find the crucial block in the diagram. The developed method was applied to numerous images extracted from industrial documents, and the obtained results show that the method can be used for applications that demand block diagram analysis.
References 1. Chen B, Zhong H (2009) Line detection in image based on edge enhancement. In: Second international symposium on information science and engineering, Shanghai 2009:415–418. https:// doi.org/10.1109/ISISE.2009.100 2. Guo S-Y, Kong Y-G, Tang Q, Zhang F (2008) Probabilistic Hough transform for line detection utilizing surround suppression. In: 2008 international conference on machine learning and cybernetics, Kunming, pp 2993–2998. https://doi.org/10.1109/ICMLC.2008.4620920 3. Chen T-C, Chung K-L (2001) A new randomized algorithm for detecting lines. R-Time Imaging 7(6):473–481. ISSN 1077-2014. https://doi.org/10.1006/rtim.2001.0233 4. Gray Roncal WR et al (13 Aug 2015) An automated images-to-graphs framework for high resolution connectomics. Front Neuroinformatics 9:20. https://doi.org/10.3389/fninf.2015.00020 5. Dirnberger M, Kehl T, Neumann A (2015) NEFI: network extraction from images. Sci Rep 5:15669. https://doi.org/10.1038/srep15669 6. Details of ASCET is available at https://www.etas.com/en/products/ascet-developer.php. Accessed 27 May 2022, 11.32 AM 7. Gonzalez RC, Woods RE (2008) Digital image processing, 3rd ed 8. Bondy AJ, Murty USR (1991) Graph theory with applications. Wiley 9. Zhao Z, Zheng P, Xu S, Wu X (2019) Object detection with deep learning: a review. IEEE Trans Neural Netw Learn Syst 30(11):3212–3232. https://doi.org/10.1109/TNNLS.2018.2876865 10. Shrivastava A, Amudha J, Gupta D, Sharma K (2019) Deep learning model for text recognition in images. In: 2019 10th international conference on computing, communication and networking technologies (ICCCNT), Kanpur, India, pp 1–6. https://doi.org/10.1109/ICCCNT45670.2019. 8944593 11. Contour extraction methods can be found at https://learnopencv.com/contour-detection-usingopencv-python-c/ 12. Li Y, Shi W, Liu A (2015) A Harris corner detection algorithm for multispectral images based on the correlation. In: 6th international conference on wireless, mobile and multi-media (ICWMMN 2015), Beijing, pp 161–165. https://doi.org/10.1049/cp.2015.0933 13. Singh N, Daniel AK, Chaturvedi P (2017) Template matching for detection & recognition of frontal view of human face through Matlab. In: 2017 international conference on information communication and embedded systems (ICICES), Chennai, pp 1–7. https://doi.org/10.1109/ ICICES.2017.8070792
14. Pandey P, Kulkarni R (2018) Traffic sign detection using template matching technique. In: 2018 fourth international conference on computing communication control and automation (ICCUBEA), Pune, India, pp 1–6. https://doi.org/10.1109/ICCUBEA.2018.8697847 15. Pal M, Samanta S, Pal A (2019) Handbook of research on advanced applications of graph theory in modern society. IGI Global publications, 2020 ed
No-Reference Image Quality Assessment Using Meta-Learning Ratnadeep Dey, Debotosh Bhattacharjee, and Ondrej Kejcar
Abstract Deep learning-based no-reference image quality assessment faces problems like dependency on a large amount of experimental data and the generalization ability of the learned model. A deep learning model trained on a specific dataset cannot obtain the desired results when tested on other datasets. Similarly, a deep learning model trained with small experimental data does not provide the best result. This paper addresses these problems of the deep learning model using the meta-learning approach in the field of no-reference Image Quality Assessment. No-reference image quality assessment is a small sample problem, where the amount of experimental data is very small. Although data augmentation techniques have been used to increase the amount of experimental data, they do not increase the variation of the data. Therefore, traditional deep learning-based techniques are unsuitable for no-reference image quality assessment. Another problem is the lack of generalization ability. A deep learning model trained with image quality datasets containing synthetically distorted images cannot efficiently assess the quality of naturally distorted images. This work proposes a meta-learning model that can be trained with more than one image quality dataset, where one image quality assessment dataset contains synthetic images and the other contains real images. Finally, the trained model has been tested on another image quality assessment dataset. The test result is better than the state-of-the-art methods, and the results establish the fact that the meta-learning model proposed in this paper tries to resolve the problems of the deep learning model. Keywords Meta learning · No-reference IQA · Meta-learning based IQA R. Dey (B) · D. Bhattacharjee Department of Computer Science and Engineering, Jadavpur University, Kolkata, India e-mail: [email protected] D. Bhattacharjee e-mail: [email protected] D. Bhattacharjee · O. Kejcar Center for Basic and Applied Science, Faculty of Informatics and Management, University of Hradec Kralove, Rokitanskeho 62, 500 03 Hradec Kralove, Czech Republic e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 R. Sarkar et al. (eds.), Proceedings of International Conference on Frontiers in Computing and Systems, Lecture Notes in Networks and Systems 690, https://doi.org/10.1007/978-981-99-2680-0_13
1 Introduction Image Quality Assessment is one of the most important research domains in Computer Vision based research. Images are the main resource of this research domain, and they are processed to extract information. Since images are the main resource, quality assessment of the images is an important issue in deciding whether the images are useful for further processing or not. Depending on the quality assessment, it can be decided whether the input images can be processed. If the quality assessment is not done properly, then some cases may arise where the desired outcome cannot be achieved even if the processing steps are completed properly. In today's world, many images are available on the internet and are processed for different purposes to solve real-world problems. Machine learning-based techniques generally address research problems in the Image Quality Assessment (IQA) research domain. In recent times, deep learning has been applied in all research areas under computer vision and has produced very good results. However, deep learning has some disadvantages as well. One disadvantage of the deep learning technique is data dependency. The deep learning-based system is completely data-driven. Therefore, a huge amount of similar data is needed to train the system. However, the performance of deep learning-based systems decreases in problem domains where a sufficient amount of training data is unavailable. Another problem is that a deep learning model designed for solving a specific problem and trained with one dataset cannot provide the best result if the model is tested with another dataset. No-reference Image Quality Assessment is a research domain with very few experimental datasets. Therefore, deep learning-based techniques are not suitable to address the problem. Another issue in the NRIQA research domain is the lack of real data. The leading IQA datasets contain synthetic data. However, the problem is that a deep learning model trained with synthetic data cannot assess the quality of images distorted by natural stimuli. Traditional machine learning and deep learning techniques are ineffective in this situation. There is a research gap present. In this work, we have proposed a meta-learning-based no-reference image quality assessment system that helps reduce the problems of deep learning-based systems. In recent years, meta-learning has been introduced in the field of IQA research. In this paper, we have proposed a meta-learning model for assessing the quality of an image. This proposed model has generalization ability and learns from two different datasets. The learned model has been tested using a separate IQA dataset, and it efficiently assesses the quality of the images. The proposed model's performance surpasses the state-of-the-art methods according to performance metrics like SROCC and PLCC. The rest of the paper is organized as follows: the next section discusses the literature survey, the third section introduces the proposed method, the fourth section provides the results, and finally, section five concludes the paper.
2 Literature Review No-reference IQA has been carried out in two categories: distortion-specific approaches [1, 2] and generalized approaches [2, 3]. In the distortion-specific approach, image quality metrics have been designed to assess images affected by particular distortion types. In the generalized approach, however, image quality is assessed without assuming any specific distortion. Both categories of NRIQA have been addressed using machine learning [4, 5] as well as deep learning-based techniques [6, 7]. However, both of these techniques have some drawbacks. The most important disadvantage is the lack of generalization ability. Meta-learning is a technique that can help achieve generalized learning. A meta-learning-based approach has been introduced in NRIQA research in the paper [5]. The meta-learning model proposed in that paper learns from prior experiences and achieves some generalization. In another work [8], a Meta SGD model has been proposed to achieve more generalization in learning. A deep meta-learning-based generalizable NRIQA technique has been proposed by Zhu et al. [9]. The meta-learning approach has also been applied to assessing the quality of video, and Jin et al. [10] have proposed their contribution in this regard.
3 Methodology In this work, we have been influenced by the model used in the research work [5]. The overall methodology of our proposed model is shown in Fig. 1. The proposed model has two parts: the primary quality model and the quality generalization model. The Primary Quality Model has been trained with distortion-specific images and learns with some prior experience. To achieve the generalization ability, the second part of the model has been trained with the prior experience learned from the previous regression model and with images that are distortion-independent. The two-stage learning process with distortion-specific and distortion-independent images is useful for achieving the ability of generalization. However, the generalization ability has been achieved through the choice of optimizers. We have modified the optimization technique used in the research [5], and better results have been achieved. We have tested our model with a separate IQA database, which is not part of the training process. Our proposed model provides better results for this testing, and the modification of the optimization process helps achieve more generalization ability than the state-of-the-art techniques.
Fig. 1 Overview of the proposed meta learning model
3.1 Meta-Learning Meta-learning is an approach in which a learning system can learn from previous experiences. Therefore, there is scope for including different insights into the learning system, and the learned model can learn a system in a generalized fashion. There are situations in which a learning system cannot learn from small training data. Meta-learning is a procedure where a previously learned system is used to train another model, which is a solution to problems with a small amount of experimental data. No-reference Image Quality Assessment is such a problem, and a meta-learning-based system can be applied here. In our proposed approach, distortion-specific images have been used for learning the Primary Quality Model. The learned experience from the deep regression model has been used to train the next part of the model, the Quality Generalization Model. This model learns with meta-data extracted from the previous model and with some distortion-independent images. This helps the overall model achieve its generalization ability.
3.2 Deep CNN Model Used in this Work It has already been discussed that the system has two parts—The primary Quality Model and the Quality Generalization model. The Primary Quality Model has been designed with a deep regression CNN network, which has been shown in Fig. 2. A regression model has been used here to predict some quality scores from the learning data. Average pooling has been used to extract single values from the trained model.
Fig. 2 Architecture of primary quality model
The next model, named the Quality Generalization Model, has been designed with a deep CNN network. A fully connected CNN network has been used for this model.
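A minimal PyTorch sketch of a regression CNN in the spirit of the Primary Quality Model is shown below: convolutional features, average pooling to a single value per channel, and a linear layer that outputs one quality score. The layer sizes are assumptions made here for illustration; the paper does not list the exact architecture.

import torch
import torch.nn as nn

class PrimaryQualityModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.pool = nn.AdaptiveAvgPool2d(1)   # average pooling -> one value per channel
        self.regressor = nn.Linear(64, 1)     # single predicted quality score

    def forward(self, x):
        x = self.features(x)
        x = self.pool(x).flatten(1)
        return self.regressor(x).squeeze(1)

model = PrimaryQualityModel()
print(model(torch.randn(4, 3, 224, 224)).shape)   # torch.Size([4]): one score per image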
3.3 Optimizing Function The generalization ability of the overall system has been achieved with a proper optimization function. In work [5], bi-level gradient optimization has been used to achieve the model's generalization ability. Here we have modified the optimization function. The modified function helps to increase the performance of the overall model, and the resulting model can be tested on a separate dataset that has not been used for training. The proposed meta-learning model L_M minimizes the difference ∂ between the predicted quality score P and the subjective quality score G. The learning function of L_M can then be defined as Eq. (1):

f(L_M) = min(p ∂ g)   (1)

In Eq. (1), p ∈ P and g ∈ G, and the min() function denotes the minimization of ∂. The proposed model has two parts. The first part learns from the distortion-specific images, and the second learns from the distortion-independent images. Therefore, the learning function has two components, which are related according to Eq. (2):

f(L_M) = α f(L_M_DS) + β f(L_M_DI)   (2)
Here, f(L_M_DS) is the learning function for the learning process using distortion-specific images, and f(L_M_DI) is the learning function for the learning process using distortion-independent images. The α and β are the controlling parameters of the complete training process. We have introduced these two variables within the learning process. The insertion of this variability in the learning process helps to optimize the model faster, and the learned model has more generalization ability than the state-of-the-art methods.
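The weighted learning function of Eq. (2) can be sketched as follows; the loss type (L1) and the values of α and β below are assumptions for illustration only.

import torch.nn.functional as F

alpha, beta = 0.6, 0.4   # controlling parameters of Eq. (2); values assumed for illustration

def combined_loss(model, batch_ds, batch_di):
    # batch_ds: (images, scores) drawn from the distortion-specific set (e.g. TID2013)
    # batch_di: (images, scores) drawn from the distortion-independent set (e.g. LIVE Challenge)
    x_ds, y_ds = batch_ds
    x_di, y_di = batch_di
    loss_ds = F.l1_loss(model(x_ds), y_ds)
    loss_di = F.l1_loss(model(x_di), y_di)
    return alpha * loss_ds + beta * loss_di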
3.4 Training Process The first part of the overall system, the Primary Quality Model, has been trained with a distortion-specific IQA dataset, namely TID2013. The learned model is fed into the next part of the system, named the Quality Generalization Model. This model is also trained with the LIVE Challenge dataset, which is distortion-independent. Therefore, the Quality Generalization Model has been trained with two types of data: metadata learned in the Primary Quality Model and a distortion-independent IQA dataset. We found the best result for 100 epochs with a decay rate of 0.13.
4 Result and Discussion 4.1 Dataset Used Three datasets have been used for this research work. The two databases named TID2013 [11] and LIVE Challenge [12] have been used for training. The testing has been done using the CSIQ dataset [13].
4.2 Testing Process In the methodology section, it has been discussed that two datasets have been used for training operations. The TID2013 distortion-specific IQA dataset has been used for training the first part of the model, and the LIVE Challenge distortion-independent dataset has been used to train the Quality Generalization Model. In the work presented in [5], they tested their proposed model with the LIVE Challenge dataset used in the training process. This is not the proper way to test the generalization ability of the learned model. Therefore, we have tested our model using a separate dataset, namely the CSIQ dataset, which is not part of the training process. This experiment tests the
Table 1 Performance comparison with the state-of-the-art technique according to PLCC and SROCC

Methods        PLCC    SROCC
BRISQUE [14]   0.648   0.615
BLIINDS [15]   0.654   0.625
MetaIQA [5]    0.784   0.766
Proposed       0.840   0.809
generalization ability of the trained model. The results of this experiment have been discussed in the following subsection.
4.3 Quantitative Analysis The performance of the proposed model has been evaluated using two performance metrics: SROCC and PLCC. For T test samples, PLCC and SROCC are defined in Eqs. (3) and (4).

PLCC = Σ_{i=1..T} (P_i − μ_P)(G_i − μ_G) / ( sqrt(Σ_{i=1..T} (P_i − μ_P)²) · sqrt(Σ_{i=1..T} (G_i − μ_G)²) )   (3)

In Eq. (3), P denotes the predicted quality score, and G denotes the subjective quality score mentioned in the dataset. The averages of the predicted and subjective scores are denoted as μ_P and μ_G. In Eq. (4), d_i is the difference between the rank of a predicted quality score and the rank of the corresponding subjective score.

SROCC = 1 − 6 Σ_{i=1..T} d_i² / ( T (T² − 1) )   (4)

The comparison of the performance of the proposed model with the state-of-the-art techniques is presented in Table 1. According to the table, our proposed model is better than the related approaches.
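For reference, the two metrics of Eqs. (3) and (4) can be computed directly with scipy; the score vectors below are illustrative values, not data from the experiments.

import numpy as np
from scipy.stats import pearsonr, spearmanr

P = np.array([0.82, 0.40, 0.65, 0.91, 0.30])   # predicted quality scores (illustrative)
G = np.array([0.80, 0.35, 0.70, 0.95, 0.25])   # subjective scores from the dataset

plcc, _ = pearsonr(P, G)     # Pearson linear correlation coefficient, Eq. (3)
srocc, _ = spearmanr(P, G)   # Spearman rank-order correlation coefficient, Eq. (4)
print(f"PLCC = {plcc:.3f}, SROCC = {srocc:.3f}")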
5 Conclusion In this paper, a meta-learning-based model has been proposed for No-reference Image Quality Assessment. This proposed model has been designed to reduce the disadvantages of deep learning models and can learn in a generalized fashion from the training process. Although this model has been influenced by recent work, the modification in the model optimization stage is the key contribution here. This contribution helps the model learn in a more generalized fashion. A carefully designed experimental
setup has tested the generalized learning capability of the model, and according to this experiment, our proposed model outperforms the state-of-the-art techniques. However, this work has much scope for further research in the future. Acknowledgements The third author of the paper is grateful to the project “Smart Solutions in Ubiquitous Computing Environments”, University of Hradec Kralove, Faculty of Informatics and Management, Czech Republic under Grant UHK-FIM-SPEV-2022-2102. We are also grateful to PhD student Michal Dobrovolny for his support and consultations.
References 1. Li L, Lin W, Wang X, Yang G, Bahrami K, Kot AC (2016) No-reference image blur assessment based on discrete orthogonal moments. IEEE Trans Cybern 46(1):39–50 2. Li L, Zhu H, Yang G, Qian J (2014) Referenceless measure of blocking artifacts by tchebichef kernel analysis. IEEE Signal Process Lett 21(1):122–125 3. Saad MA, Bovik AC, Charrier C (2012) Blind image quality assessment: a natural scene statistics approach in the dct domain. IEEE Trans Image Process 21(8):3339–3352 4. Mittal A, Moorthy AK, Bovik AC (2012) No-reference image quality assessment in the spatial domain. IEEE Trans Image Process 21(12):4695–4708 5. Gao X, Gao F, Tao D, Li X (2013) Universal blind image quality assessment metrics via natural scene statistics and multiple kernel learning. IEEE Trans Neural Netw Learn Syst 24(12):2013–2026 6. Ye P, Doermann D (2012) No-reference image quality assessment using visual codebooks. IEEE Trans Image Process 21(7):3129–3138 7. Bianco S, Celona L, Napoletano P, Schettini R (2018) On the use of deep learning for blind image quality assessment. Signal Image Video Process 12(2):355–362 8. Zhu H, Li L, Wu J, Dong W, Shi G (2020) MetaIQA: deep meta-learning for no-reference image quality assessment. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR) 9. Yan Q, Dong B, Li X, Xiang Y, Luo D, Wei L (2021) No-reference image quality assessment using meta-SGD. China Autom Congr (CAC) 2021:3850–3855. https://doi.org/10.1109/CAC 53003.2021.9728239 10. Zhu H, Li L, Wu J, Dong W, Shi G (2022) Generalizable no-reference image quality assessment via deep meta-learning. IEEE Trans Circuits Syst Video Technol 32(3):1048–1060. https://doi. org/10.1109/TCSVT.2021.3073410 11. Jin Y, Patney A, Webb R, Bovik AC (2022) FOVQA: blind foveated video quality assessment. IEEE Trans Image Process 31:4571–4584. https://doi.org/10.1109/TIP.2022.3185738 12. Ponomarenko N, Jin L, Ieremeiev O, Lukin V, Egiazarian K, Astola J, Vozel B, Chehdi K, Carli M, Battisti F (2015) Image database TID2013: peculiarities, results and perspectives. Signal Process Image Commun 30:57–77 13. Ghadiyaram D, Bovik AC (2016) Massive online crowd sourced study of subjective and objective picture quality. IEEE Trans Image Process 25(1):372–387 14. Larson EC, Chandler DM (2010) Most apparent distortion: full-reference image quality assessment and the role of strategy. J Electron Imaging 19(1) 15. Mittal A, Moorthy AK, Bovik AC (2012) No-reference image quality assessment in the spatial domain. IEEE Trans Image Process 21(12):4695–4708
Security
Cryptanalysis of Markle Hellman Knapsack Cipher Using Cuckoo Search Algorithm Subinoy Sikdar, Joydeep Biswas, and Malay Kule
Abstract This research work demonstrates the cryptanalysis of Markle Hellman Knapsack cipher using Cuckoo Search Algorithm. Cuckoo search is a population-based, nature-inspired metaheuristic optimization technique. Firstly, we have initialized different cuckoo search parameters and generated a random population of nests with eggs. Each egg is a possible candidate plaintext. Then the cuckoo search algorithm has been applied to find out the best possible solution. The proposed cuckoo search attack algorithm is characterized by population generation, fitness function evaluation, Levy flight, mutation, and a perturbation method. The algorithm gradually discovers the optimal solution by comparing the fitness values of the intermediate solutions. Experimental outcomes are compared with the results that are obtained using the genetic algorithm and the binary firefly algorithm. Experimental results exhibit that the proposed cryptanalysis technique outperforms the previous cryptanalysis results. Moreover, the proposed algorithm successfully recovers the plaintext from the ciphertext. Keywords Cryptanalysis · Knapsack cipher · Cuckoo search algorithm · Fitness function · Levy flight · Genetic algorithm · Binary firefly algorithm
1 Introduction Information is the power in present days. Information and data should be protected from unauthorized access, distortion, or any kind of unauthorized activity. Cryptography [1] is the study of secret writing. Online communication takes place over the network or in the communication channel. Due to the presence of network adversaries over the network, information becomes no more secret. Our communication channel is totally unreliable; a third party can secretly peep over the channel and can get the information from the network or from the channel. Here cryptography plays S. Sikdar (B) · J. Biswas · M. Kule Indian Institute of Engineering Science and Technology, Shibpur, Howrah 711103, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 R. Sarkar et al. (eds.), Proceedings of International Conference on Frontiers in Computing and Systems, Lecture Notes in Networks and Systems 690, https://doi.org/10.1007/978-981-99-2680-0_14
an important role to protect our data from unauthorized access. Cryptography maps original messages (plaintext) into an unreadable format (ciphertext) using key(s) and vice versa. The process of encoding the original message into an unreadable format using key(s) is called encryption, and the inverse process of decoding the ciphertext into plaintext using key(s) is called decryption. Only the authorized user can encrypt or decrypt the messages. This encryption and decryption process may involve a single key or it may involve a key pair for encryption and decryption. Depending on key usage, cryptography can be categorized as symmetric key cryptography and asymmetric key cryptography [2]. In the case of symmetric key cryptography, a single key is used in the encryption and decryption processes; but in asymmetric key cryptography, encryption and decryption are done with two different keys. In this paper, we have taken the Knapsack cipher, which is an asymmetric key cryptographic system. Cryptography aims for secure communication over the unreliable network or channel. The basic goal of cryptography is to provide confidentiality, integrity, authenticity, and non-repudiation [3, 4]. Cryptanalysis is the detailed study of a cryptosystem to find out the weaknesses and flaws of the system, and it aims at breaking the cryptographic system by finding the key or even without the key. So, cryptanalysis helps to obtain a robust cryptographic system by continuously examining the weaknesses and flaws of the system. Generally, every cryptosystem is breakable at some point of time with huge computation power and resources. But the limitation is time and resource power. A cryptanalyst has to break the cryptosystem within a limited time and with limited resources and computation power. In this paper, we have proposed a cryptanalytic attack on the Knapsack cipher. The Knapsack cipher is based on the subset sum problem, which is an NP-hard problem. Binary messages are encoded in a summed-up fashion. The Knapsack cipher is a public key cryptosystem. Decryption is done with the confidential key of the recipient. The private key is chosen in a super-increasing order. The security of the Knapsack cipher lies in the super-increasing order of the private key. This super-increasing order of the private key has vanished with modular arithmetic. So, it becomes very difficult for a cryptanalyst to find out the private key from the publicly available public key by a brute force method where the super-increasing order is destroyed. For that reason, in our proposed method we directly recover the plaintext from the ciphertext. At present, artificial intelligence and machine learning are involved in every field of computer science. Machine learning has a strong influence on cryptanalysis. Nature-inspired algorithms are well-known algorithms for solving optimization problems. They have made complex cryptanalysis work easier. These algorithms present a stochastic method to find the optimal solution. In this paper, we have shown cryptanalysis of the Knapsack cipher using the cuckoo search algorithm. First, a random population of nests with eggs is generated. Here, each egg represents a candidate plaintext, i.e., a possible solution. The fitness of each egg is evaluated by Spillman’s fitness function. The egg with the highest fitness value is set as the best plaintext. Then a nest is chosen randomly via Levy Flight and it generates a new cuckoo egg, i.e., a new plaintext is generated and converted into a binary string.
Then we applied mutation on the best plaintext. If the new plaintext fitness value
is higher than the previous best plaintext, then new plaintext becomes the new best plaintext and mutation is applied again on this; so that the fitness value gradually moves towards 1 which is the best fitness value. Otherwise, a number of worst solutions are discarded and replaced with new solution by perturbation method. Then again, the steps are repeated from the fitness evaluation until the maximum iterations are reached or the fitness value becomes 1. Whenever the fitness value becomes 1, we grant the corresponding plaintext as the optimal solution and the algorithm is terminated there. Our approach was based on chosen plaintext attack. The algorithm tries to reach as close to the ciphertext by choosing plaintext and generating the corresponding ciphertext. If the value of the generated ciphertext is close to target ciphertext then it is close to the optimal solution. This process is continued until the algorithm hits the optimal solution which we have briefly discussed above. Previously, many researchers have worked on Knapsack cipher. First Knapsack cipher was breached by Shamir’s method [1]. After that, many researchers have tried to break Knapsack cipher using different ideas. Nature-inspired algorithms were used to crack Knapsack cipher in 2011. Saptarshi Neil Sinha et al. and Supravo Palit et al. applied differential evolution algorithm and binary firefly algorithm, respectively, to cryptanalyze Knapsack cipher [5, 6]. In our proposed cryptanalysis method, cuckoo search algorithm has been employed to break the Knapsack cipher. This paper has been arranged in five different sections. Section 1 introduces the paper. Section 2 describes the preliminaries followed by detailed discussion and implementation of the proposed method of cryptanalytic invade on Knapsack cipher using cuckoo search algorithm in Sect. 3. Section 4 analyses the results achieved from the experimental test, mentioned in Sect. 3. Finally, Sect. 5 discusses the future scope of this work and concludes the paper.
2 Preliminaries 2.1 Knapsack Cipher The Knapsack cipher [5, 7] is a cryptosystem based on the idea of the binary Knapsack algorithm along with the subset sum problem. It is a public key cryptosystem: encryption of the plaintext is done with the universal (public) key of the recipient at the sender end, and decryption is done with the confidential (private) key of the receiver at the receiver end. All the necessary computations to generate the public key from the private key, and to encode and decode the Knapsack cipher, are as follows. The receiver chooses a private key a = (a1, a2, a3, ..., an) in super-increasing order, i.e.,

a2 > a1; a3 > a1 + a2; a4 > a1 + a2 + a3; ...; an > a1 + a2 + a3 + ... + an−1
Now the receiver chooses another two numbers z and w such that z ≥ a1 + a2 + ... + an and GCD(z, w) = 1.
Then compute a'_i = (w · a_i) mod z, giving a' = (a'_1, a'_2, a'_3, ..., a'_n).
Here a' is the public key of the receiver; the receiver makes this key public and sends it to the sender.
At the sender side, the sender takes a message which is a string of binary bits, M = (x1, x2, x3, ..., xn), xi ∈ {0, 1}.
The sender encrypts the message with the public key of the receiver and forwards the ciphertext to the receiver through the public channel:
Sum = E_a'(x1, x2, x3, ..., xn) = x1 a'_1 + x2 a'_2 + ... + xn a'_n, where E is the encryption function.
Private key of receiver: a = (a1, a2, a3, ..., an)
Plaintext space = {0, 1}^n
Ciphertext space = {0, 1, 2, 3, ..., n(z − 1)}
Key space = {(a, z, w), a'}
The receiver receives Sum = x1 a'_1 + x2 a'_2 + ... + xn a'_n and computes
Sum' = (w^−1 · Sum) mod z = w^−1 (a'_1 x1 + a'_2 x2 + ... + a'_n xn) mod z = a1 x1 + a2 x2 + a3 x3 + ... + an xn.
Now the receiver applies the greedy subset sum algorithm to decrypt Sum' with the private key a:
Algorithm Decryption((a1, a2, a3, ..., an), Sum')
  for i from n down to 1
    if Sum' >= ai
      xi = 1
      Sum' = Sum' − ai
    else
      xi = 0
  if Sum' = 0 then return M = (x1, x2, x3, ..., xn)
  else return No Solution
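The scheme above can be sketched in a few lines of Python; the key values used here are taken from the example that appears later in Sect. 4.1 (private key (4, 5, 13, 23, 48, 96, 193, 384), z = 776, w = 13).

private_key = [4, 5, 13, 23, 48, 96, 193, 384]   # super-increasing private key
z, w = 776, 13
w_inv = pow(w, -1, z)                            # modular inverse of w, here 597

public_key = [(w * a) % z for a in private_key]  # (52, 65, 169, 299, 624, 472, 181, 336)

def encrypt_char(ch):
    bits = [int(b) for b in format(ord(ch), "08b")]
    return sum(b * k for b, k in zip(bits, public_key))

def decrypt_sum(total):
    s = (w_inv * total) % z                        # removes the modular disguise
    bits = [0] * len(private_key)
    for i in range(len(private_key) - 1, -1, -1):  # greedy subset sum on the
        if s >= private_key[i]:                    # super-increasing private key
            bits[i], s = 1, s - private_key[i]
    return chr(int("".join(map(str, bits)), 2))

c = encrypt_char("M")
print(c, decrypt_sum(c))   # 1497 M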
2.2 Cuckoo Search Algorithm The Cuckoo Search algorithm was introduced by Yang and Deb [8] in 2009, relying on the idea of how a cuckoo bird propagates its population by dropping its eggs into another host bird's nest. Each egg in the host bird's nest denotes a candidate solution, and the cuckoo egg denotes a new candidate solution. The intention is to replace a weak solution in the nest with the better solution (cuckoo egg) [9]. Cuckoo Search is based on three primary rules:
1. The cuckoo bird chooses a nest randomly and drops its eggs one at a time in the nest.
2. Only the nests containing better quality eggs are carried over to the next generation.
3. The number of available host bird nests is fixed, and the probability of detection of the cuckoo egg by the host bird is Pa ∈ (0, 1). In this circumstance, either the host bird discards the egg, or it abandons the nest and builds an entirely new nest.
In Fig. 1, we have shown the flow diagram of the cuckoo search algorithm. A Levy flight is performed while generating a new candidate solution x_i^(t+1) for, say, a cuckoo i:

x_i^(t+1) = x_i^t + α ⊕ Levy(λ)   (1)

Fig. 1 Cuckoo search algorithm flow diagram
Here, α indicates the step size. Generally, α is considered to be greater than 0 and, in most of the cases, the value of α is taken as α = 1. The mentioned Eq. (1) is the stochastic formula for a random walk [8]. A Levy flight produces a random walk in which the length of the random step is obtained from the Levy distribution:

Levy ∼ u = t^(−λ), where 1 < λ ≤ 3   (2)
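A minimal sketch of one Levy-flight move as in Eq. (1) is given below; Mantegna's approximation is a common way to draw Levy-distributed step lengths, and λ = 1.5 is an assumed value for illustration.

import numpy as np
from math import gamma, sin, pi

def levy_step(dim, lam=1.5):
    # Mantegna's algorithm for Levy-distributed step lengths.
    sigma = (gamma(1 + lam) * sin(pi * lam / 2) /
             (gamma((1 + lam) / 2) * lam * 2 ** ((lam - 1) / 2))) ** (1 / lam)
    u = np.random.normal(0, sigma, dim)
    v = np.random.normal(0, 1, dim)
    return u / np.abs(v) ** (1 / lam)

alpha = 1.0                           # step size, taken as 1 as stated above
x_old = np.random.rand(8)             # a candidate solution with 8 components
x_new = x_old + alpha * levy_step(8)  # x_i^(t+1) = x_i^t + alpha (+) Levy(lambda)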
3 Proposed Method of Cryptanalysis In this section, a cryptanalytic attack on the hard Knapsack is demonstrated with the cuckoo search algorithm. Cryptanalysis starts from the ciphertext, which is available in integer format. This value represents the target sum of the hard Knapsack problem. The algorithm aims to translate each numerical value into the right Knapsack selection, which denotes the ASCII notation of a plaintext character.
3.1 Cryptanalysis Algorithm
Cuckoo Search Based Cryptanalysis Algorithm
Input: Ciphertext, Public Key, Initial Population Size, Maximum Iteration.
Output: Plaintext.
The Attack Algorithm:
1. Initialize input variables (number of host nests with eggs, probability of observing the cuckoo's egg, PA).
2. Initialize the number of nests and eggs per nest. Each egg is a random candidate plaintext, which is basically an 8-bit binary string. Population size varied from 20 to 60.
3. After populating all the nests with eggs, a ciphertext is generated from each candidate plaintext (egg). The generated ciphertexts are compared with the original ciphertext. The fitness value of each egg in each nest is calculated by the fitness function using Eqs. (3) and (4).
4. Find the plaintext associated with the ciphertext having the highest fitness value. Mark this egg (candidate plaintext) as the best plaintext.
5. Go to a nest randomly via Levy flight and generate a cuckoo egg (new plaintext). Convert the plaintext into binary format: new_plaintext = current_plaintext + μ * l, with μ = 0.01 and l = 1.5.
6. Apply mutation on the best plaintext using the mutation algorithm.
Fig. 2 Architecture of Proposed Attack Algorithm
7. Compute the fitness of the cuckoo egg (new plaintext) by the fitness function using Eqs. (3) and (4). If the fitness of the new plaintext is higher than that of the previous best plaintext, then the cuckoo egg becomes the new best plaintext; if so, go to step (6).
8. Discard a fraction PA of the less fit eggs (plaintexts). Substitute new cuckoo eggs (plaintexts) in place of the omitted plaintexts by a simple perturbation mechanism.
9. Repeat steps (3)–(8) until the maximum iteration is reached or fitness = 1.
10. Output the best solution.
Figure 2 exhibits the working principle of our proposed method.
3.2 Generation of Initial Population At the very first, the number of host nests and the eggs per nest are initialized. Each egg is a candidate plaintext in the form of an 8-bit binary string, corresponding to the ASCII code of a plaintext character. The population size is varied between 20 and 60.
3.3 Fitness Evaluation The cryptanalysis algorithm executes the cryptanalysis based on a chosen plaintext attack. Cuckoo search is a population-based metaheuristic algorithm, where initially a set of possible plaintexts is generated randomly. The algorithm aims to replace the "k" worst solutions with "k" better solutions depending on their fitness values. To find out the merit of each plaintext, a fitness function has been defined in this paper that guides the metaheuristic to find the right plaintext from the collection of candidate plaintexts by labeling each of them with a numerical value less than or equal to 1. As
a cryptanalyst, we have only the public key and the ciphertext. So, we have defined the fitness function based on these two available parameters. Spillman introduced a suitable fitness function for cryptanalysis of the Knapsack cryptosystem [10]. The problem here is considered to be a maximization problem, and the maximum fitness cost of a candidate plaintext goes up to 1. The prime challenge in designing a fitness function is to define the parameters in such a way that the difference between target and sum is normalized, so that it can better describe the fitness cost of the probed sum with respect to the prospected target. Public key, a' = (a'1, a'2, a'3, ..., a'n). Message, M = (x1, x2, x3, ..., xn); xi ∈ {0, 1}. Total_Sum = sum of all the elements in the universal key = a'1 + a'2 + ... + a'n. Sum = x1 a'1 + x2 a'2 + ... + xn a'n. Target = Ciphertext. In this paper, we have used Spillman's fitness function to measure the fitness of the candidate plaintexts. The Spillman fitness function is defined as follows: If (Sum <= Target)
(3)
Fitness = 1− (|Sum − Target|/MaxDiff)1/6
(4)
Else,
where MaxDiff = max (Target, Total_Sum – Target)
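A direct sketch of the Spillman fitness above, for one candidate plaintext given the public key and the target ciphertext value:

def spillman_fitness(bits, public_key, target):
    total_sum = sum(public_key)
    s = sum(b * k for b, k in zip(bits, public_key))
    if s <= target:
        return 1 - (abs(s - target) / target) ** 0.5
    max_diff = max(target, total_sum - target)
    return 1 - (abs(s - target) / max_diff) ** (1 / 6)

public_key = [52, 65, 169, 299, 624, 472, 181, 336]
print(spillman_fitness([0, 1, 0, 0, 1, 1, 0, 1], public_key, 1497))   # exact match -> 1.0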
3.4 Mutation Basically, mutation is the flipping of some bits in the candidate plaintext. We define a variable, probability of mutation, Pm . The value of Pm is decided based on the experiment. “n” represents the number of bits present in the candidate plaintext. U[0,1] is uniform random distribution between 0,1. Mutation Algorithm:
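The mutation step is described in prose above; a minimal sketch, assuming independent per-bit flipping with probability Pm, is given below.

import random

def mutate(bits, pm=0.05):
    # Each of the n bits of the candidate plaintext is flipped with probability Pm;
    # Pm = 0.05 is the value used in the experiments of Sect. 4.4.
    return [1 - b if random.random() < pm else b for b in bits]

print(mutate([0, 1, 0, 0, 1, 1, 0, 1]))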
3.5 Perturbation Method In applied mathematics, perturbation theory provides an approximate solution to a problem. It starts from the exact solution of a related, similar problem. The solution is expressed as a power series in a small parameter β: B = B0 + B1 β + B2 β² + …, where β → 0. Here, the leading term is the exact solution of the related solvable problem, whereas the other terms give the deviation of the solution. So, B0 is the exact solution of the related problem, whereas B is an approximation of the exact solution: B ≈ B0 + B1 β.
4 Experimental Results and Analysis In this section, we have presented the experimental outcomes obtained from the cryptanalysis of the Knapsack cipher using the Cuckoo Search Algorithm and compared them with the experimental results obtained from the Genetic Algorithm and the Binary Firefly Algorithm.
4.1 Generation of Ciphertext Confidential Key: (4 5 13 23 48 96 193 384). z = 776; w = 13; w−1 = 597. Universal Key: (52 65 169 299 624 472 181 336). The public key mentioned above was used to encrypt the word “MACRO”. Table 1 displays the encoding technique of the string “MACRO”. Table 1 Encryption of Knapsack Cryptosystem [5, 6]
String   ASCII notation   Ciphertext (target)
M        01001101         1497
A        01000001         401
C        01000011         582
R        01010010         545
O        01001111         1678
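The ciphertext column of Table 1 can be reproduced from the universal key given above; each character's 8-bit ASCII pattern simply selects which public-key elements are summed.

public_key = [52, 65, 169, 299, 624, 472, 181, 336]
for ch in "MACRO":
    bits = format(ord(ch), "08b")
    target = sum(int(b) * k for b, k in zip(bits, public_key))
    print(ch, bits, target)   # M 01001101 1497 ... O 01001111 1678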
Table 2 Experimental results obtained from genetic algorithm (GA) for cryptanalysis [5]

String   Expt. 1   Expt. 2   Expt. 3   Expt. 4   Expt. 5   Mean
M        708       412       135       425       19        339.80
A        777       154       142       58        535       333.20
C        218       366       374       129       150       247.40
R        109       269       539       42        2         192.20
O        42        128       189       66        330       151.00
4.2 Experimental Results Obtained from Cryptanalysis Using Genetic Algorithm (GA) Experimental results obtained from the genetic algorithm are shown in tabular form. GA parameters are initialized as follows: probability of crossover: 0.80; probability of mutation: 0.25; population size: 20. Table 2 displays the total generation counts needed for the cryptanalysis of the Knapsack cryptosystem.
4.3 Experimental Results Obtained from Cryptanalysis Using Binary Firefly Algorithm (BFA) Experimental results obtained from the binary firefly algorithm are shown in tabular form. BFA parameters are initialized as follows: population size: 20. Table 3 exhibits the total generation counts needed for the cryptanalysis of the Knapsack cryptosystem.

Table 3 Experimental results obtained from binary firefly algorithm (BFA) for cryptanalysis [6]

String   Expt. 1   Expt. 2   Expt. 3   Expt. 4   Expt. 5   Mean
M        5         12        26        5         17        15.00
A        20        12        7         10        11        12.00
C        14        10        8         25        5         12.40
R        3         5         8         22        5         8.60
O        2         10        4         7         3         5.20
4.4 Experimental Results Obtained from Cryptanalysis Using Cuckoo Search Algorithm (CSA) Experimental results obtained from the cuckoo search algorithm are shown in tabular form. CSA parameters are initialized as follows: cuckoo egg discovery probability (PA): 0.20; mutation probability (Pm): 0.05; population size: 20. Table 4 displays the total generation counts needed for the cryptanalysis of the Knapsack cryptosystem. From the above tables it can be calculated that the average generation counts needed for cryptanalysis of the string "MACRO" encoded by the Knapsack cryptosystem are 206.4 for the Genetic Algorithm (GA) and 10.64 for the Binary Firefly Algorithm (BFA), whereas the Cuckoo Search Algorithm (CSA) takes 7.72 generations on average. The results of the genetic algorithm, binary firefly algorithm, and cuckoo search algorithm are compared in Fig. 3. Table 5 lists the average number of generations needed for the cryptanalysis of the word "MACRO" using the Genetic Algorithm (GA), Binary Firefly Algorithm (BFA), and Cuckoo Search Algorithm (CSA). In Table 6, we have recorded the generation counts needed for cryptanalysis of the string "MACRO" encoded by the Knapsack cryptosystem, and it shows an interesting fact: with increasing population size, the generation counts decrease. Here the population size varies from 20 to 60. In Fig. 4, a comparison of

Table 4 Experimental results obtained from cuckoo search algorithm (CSA) for cryptanalysis

String   Expt. 1   Expt. 2   Expt. 3   Expt. 4   Expt. 5   Mean
M        19        3         11        3         46        16.40
A        3         7         7         1         12        6
C        2         5         2         8         3         4
R        2         3         4         12        4         5
O        4         8         16        1         7         7.20
Fig. 3 Comparison of GA, BFA, and CSA on the basis of average generation
Table 5 Average number of generations needed for the cryptanalysis of the word "MACRO"

String   GA       BFA     CSA (proposed)
M        339.80   15      16.40
A        333.20   12      6
C        247.40   12.40   4
R        192.20   8.60    5
O        151      5.20    7.20
the three different algorithms (GA, BFA, and CSA) used for cryptanalysis is presented as a line chart.
Fig. 4 Comparison of GA, BFA, and CSA on the basis of average generation versus community size
Table 6 Experimental results of the number of generations with the variation of population size for GA, BFA, and CSA (for cryptanalysis)

Number of generations   Population size
                        20    30    40    50    60
GA                      513   275   180   78    34
BFA                     15    5     3     2     1
CSA                     14    6     2     2     1
5 Conclusion This paper presents the cryptanalysis of the Markle Hellman Knapsack Cipher using the Cuckoo Search Algorithm. The results obtained from this experiment are compared with the results obtained from the cryptanalysis of the Knapsack cipher using both the genetic algorithm and the binary firefly algorithm. This work also highlighted the interesting characteristic that the number of generations decreases as the population size increases. The results also indicate that cryptanalysis using CSA is much better than standard GA, and it is also better than BFA to some extent. In future, we shall continue our cryptanalysis research work by exploring other bio-inspired algorithms.
References 1. Shamir A (1984) A polynomial-time algorithm for breaking the basic Merkle-Hellman cryptosystem. IEEE Trans Inf Theory 30(5):699–704 2. Forouzan BA (2007) Cryptography and network security. Tata McGraw-Hill, New Delhi 3. Kahate A. Cryptography and network security. McGraw Hill Education (India) Private Limited 4. Stallings W. Cryptography and network security: principles and practice. Pearson 5. Sinha SN, Palit S, Molla MA, Khanra A, Kule M (2011) A cryptanalytic attack on Knapsack cipher using Differential Evolution algorithm. IEEE Recent Adv Intell Comput Syst 2011:317– 320 6. Palit S, Sinha SN, Molla MA, Khanra A, Kule M (2011) A cryptanalytic attack on the knapsack cryptosystem using binary Firefly algorithm. In: 2011 2nd International conference on computer and communication technology (ICCCT-2011), pp 428–432 7. Mandal T, Kule M (2016) An improved cryptanalysis technique based on Tabu search for Knapsack cryptosystem. Int J Control Theory Appl 16(9):8295–8302 8. Yang X-S, Deb S (2009) Cuckoo search via Lévy flights. World Congr Nat Biol Inspired Comput (NaBIC) 2009:210–214 9. Kamal R, Bag M, Kule M (2020) On the cryptanalysis of S-DES using binary cuckoo search algorithm. In: Computational intelligence in pattern recognition. Advances in intelligent systems and computing, vol 999. Springer, Singapore 10. Spillman R (1993) Cryptanalysis of knapsack ciphers using genetic algorithms. Cryptologia
Generating a Suitable Hash Function Using Sudoku for Blockchain Network Sunanda Jana, Esha Sen Sharma, Abhinandan Khan, Arnab Kumar Maji, and Rajat Kumar Pal
Abstract Sudoku, a number-placement, logic-based puzzle, not only treats our minds but also secures our data in today’s digitized world by its uniqueness. Nowadays, Blockchain technology is ruling a significant part of the financial market. This gigantic Blockchain technology represents a transaction’s digital ledger which is mainly duplicated and distributed throughout the entire network on the Blockchain. Our work’s main objective is to enhance Blockchain technology’s security. Our method proposed a new concept by making changes in generating the hash function of SHA256 by embedding mind cracking Sudoku puzzle. This work adds extra security to blockchain technology. This method gives superior protection against brute force attacks as well as pre-image and collision attacks compared to SHA256 without increasing complexity. Those added characteristics make it quite impossible to change, hack, break, or cheat the system by using records of transaction processes by generating the strongest hash function. Keywords Blockchain · Distributed ledger system · Hash value · Sudoku
S. Jana (B) Haldia Institute of Technology, Hatiberia, ICARE Complex, Dist, Haldia 721657, West Bengal, India e-mail: [email protected] E. S. Sharma · A. Khan · R. K. Pal University of Calcutta, Acharya Prafulla Chandra Roy Shiksha Prangan, JD-2, Sector-III, Saltlake, Kolkata 700106, India A. Khan Product Development and Diversification, ARP Engineering, 147 Nilgunj Road, Kolkata 700074, India A. K. Maji North-Eastern Hill University, Umshing Mawkynroh, Shillong 793022, Meghalaya, India © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 R. Sarkar et al. (eds.), Proceedings of International Conference on Frontiers in Computing and Systems, Lecture Notes in Networks and Systems 690, https://doi.org/10.1007/978-981-99-2680-0_15
1 Introduction An NP-complete, structural as well as combinatorial mind-cracking problem [13], “Sudoku” is an abbreviation of “Suuji wa dokushin ni kagiru” [10], which reflects the interesting meaning “the digits must remain single” [3]. Sudoku is primarily a puzzle that may be numeric or alphanumeric, or only an alphabet can be used, to construct an m × m matrix, where m represents a perfect square integer which is always greater than one. The puzzle provides some clues, i.e., a few prefilled cells. The primary strategy is to fill all the blank cells with the values from 1 to m, without repetition, in each of the m rows, m columns, and m subgrids of size √m × √m, all in isolation. A solution to a giant Sudoku puzzle of size 16 × 16 is shown in Fig. 1, where the size of each mini-grid is 4 × 4. Sudoku puzzles have different difficulty measures considering the number and position of clues. The execution time as well as complexity can vary depending on the puzzle’s difficulty level, which can be either easy, medium, or hard. Sudoku may have different variants based on its grid sizes or region shapes. Nowadays, Sudoku’s applications have spread enormously into important fields like cryptography, steganography [9], and encrypting messages, texts, images, audio, videos, etc. In this paper, we have used the Sudoku puzzle to generate a hash function. Blockchain technology uses SHA256 for its security. SHA256 is a secure cryptographic hash algorithm. In our work, we generate hash values by embedding Sudoku while generating the hash. Our method will provide optimized security, as a Sudoku of size 16 × 16 admits a very large number of solution instances, i.e., approximately 5.96 × 10^98 different Sudoku grids as solution matrices [5], and the hash value generated in our proposed work considers any one of these solution matrices. In such a situation, if we choose Sudoku for hash generation, there will be hardly any chance for the attacker to track
Fig. 1 2D Giant Sudoku (16 × 16) instance representing one of the solutions
Generating a Suitable Hash Function Using Sudoku for Blockchain Network
163
the information. Another essential fact is Sudoku’s unique constraint according to which numbers from 1 to 16 occur precisely once for every row, column, as well as 4 × 4 mini-grid. So, using the Sudoku matrix for generating Hash will improve the security of Blockchain at its extreme level. And also, it becomes incredible to modify or change the attacker because of the uniqueness constraint of this Sudoku game. For that reason, a brute-force attack is not possible. Another advantage of our proposed method, compared to SHA256, is that it can resist so-called pre-image and collision attacks as we are picking one among 5.96 × 1098 different Sudoku solutions during the generation hash function. Our paper is organized in the following way: The introduction and literature survey of Sudoku are briefly described in Sects. 1 and 2, respectively. Under Sect. 3, Sects. 3.1 and 3.2 comprise preliminaries of Blockchain technology and proposed algorithm. Proposed algorithm is explain in Sect. 4. All experimental results have been discussed in Sect. 5. Finally, conclusions and future works are discussed in Sect. 6.
2 Literature Survey Blockchain is widely used to provide security and distributes data efficiently and uniquely. In the distributed network, removing a central instance implies a radical shift to direct transactions between non-intermediaries or intermediary parties [11]. Here data is immutable, which means once it has been written to this network, no one, even the administrator of the system, cannot modify or delete it from the transaction Ledger. Here, time stamp is used for each data block, and they are linked in chronological order by means of cryptographic signature [12]. By analyzing the importance of Blockchain technology, we have added extra security to this technology by generating new hash values using Giant Sudoku, as we know that a single 16 × 16 Sudoku can generate multiple solutions. According to Bogart et al., a blockchain is a list of digital records which are encrypted or transactions named as blocks. Each block is then “chained” to the next in a linear, chronological order, using a cryptographic signature [1]. It uses SHA256 for its security. This technology is a revolution in the computer world where digital recording and storing of data can be done considering multiple nodes. Walport, 2016, defined one of the most important elements of Blockchain, known as the Ledger, which has similarities with the relational database. The blocks in the Blockchain technology contain a copy of the last transactions from the moment the last block was inserted [1]. Here, the shared block, considered a Ledger, is connected to all participants, those who are using a network to validate transactions, without any involvement of a third-party [4]. Based on Casino et al., blockchain technology offers the potential to crack various sectors [2] for its unique combination of features, like decentralization, persistency, anonymity, auditability, transparency, and immutability. In this paper, we want to show that the Sudoku puzzle is not only a game but can also be applied in various fields [6] of application, and here how BT’s security has increased by adding a Sudoku puzzle is explained in an effective way to get optimized financial security. Nowadays, blockchain technol-
ogy, or BT, has gained the most prominent popularity in various fields [7] after the development of cryptocurrencies like Bitcoin, Litecoin, Dash, and Monero, which all have captured remarkable capitalizations in the global financial market.
3 Proposed Algorithm In this section, we will discuss our proposed algorithm. The main backbone of our method is to make the Blockchain network more efficient concerning time as, in most cases, it is dealing with financial matters. Before we discuss our proposed method, we will first discuss Blockchain technology.
3.1 Blockchain Technology In today’s world, Blockchain technology is popularized as a digital ledger of transactions copied and distributed throughout the network, and it is pretty impossible to track the system information. Blockchain gets more popularity after applying this to cryptocurrencies like Bitcoin and Ethereum. Figure 2 shows an overview of Blockchain technology. A block contains two hash values named Merkle hash and previous Hash. Each block has a fixed storage capability and, after being filled, is closed and connected to the previously filled block, which constructs a chain of data called as the blockchain. The transaction list contains the transactions that have occurred till now. Every block contains a specific number (even or odd) of transactions. These transactions present in the transaction list participate in generating the Merkle hash. Merkle hash is stored in the block header. All transactions in the block are hashed using the hash algorithm and generate a hashed value of each transaction. This transaction hash pairs up with another such transaction hash by logical adding. The sum is then hashed again and paired with another hashed sum. The pair then undergoes logical adding, and this process continues until we are left with only one hash value that
Fig. 2 An overview of the blockchain technology
serves as the root of the bottom-up tree, called the Merkle root or Merkle hash. This Merkle hash is then kept in the block header of the current block and as the previous hash of the next block in the chain, as depicted in Fig. 2.
3.2 Proposed Algorithm: Generating_Hash_Value_Using_Sudoku

The algorithm Generating_Hash_Value_Using_Sudoku takes a string as input and generates the hash value as output. This algorithm calls other algorithms, namely Ch and Ma, defined later in this document.

Start
1. Initialize a solved 16 × 16 Sudoku.
2. Initialize eight hash initials {h0, h1, h2, h3, h4, h5, h6, h7} with the first four bytes of the exponent part of the square roots of the first eight prime numbers, i.e., 2, 3, 5, 7, 11, 13, 17, and 19.
3. Initialize 64 K constants as an array K with the elements being the first 16 bytes of the exponent part of the cube roots of the first 64 prime numbers, i.e., 2, 3, 5, 7, 11, …, 311. In other words, K = [first 16 bytes (cube root (2)), first 16 bytes (cube root (3)), first 16 bytes (cube root (5)), first 16 bytes (cube root (7)), first 16 bytes (cube root (11)), …, first 16 bytes (cube root (307)), first 16 bytes (cube root (311))].
4. Generate the array W: for each character c in the word, do the following:
   a. Initialize character_ascii = decimal(ascii(c))
   b. Initialize column = character_ascii % 10
   c. Initialize row = character_ascii / 10 ; only the quotient of the division is taken
   d. Initialize element = sudoku[row][column]
   e. Add element to W
5. Initialize variables:
   a. A = h0
   b. B = h1
   c. C = h2
   d. D = h3
   e. E = h4
   f. F = h5
   g. G = h6
   h. H = h7
6. Repeat the following steps 64 times:
   a. Initialize t1 = Ch(E, F, G) + Wi + Ki + S1(E) + H, where 1 ≤ i ≤ 64 and S1 denotes a shift of 1 bit to the right.
   b. Initialize t2 = Ma(A, B, C) + S0(A), where S0 denotes a shift of 0 bits to the right.
   c. Initialize temp = D
   d. D = C
   e. C = B
   f. B = A
   g. A = t1 + t2
   h. H = G
   i. G = F
   j. F = E
   k. E = temp + t1
7. After the 64 rounds, add the generated values {A, B, C, D, E, F, G, H} to the hash initials, respectively:
   a. h0 = h0 + A
   b. h1 = h1 + B
   c. h2 = h2 + C
   d. h3 = h3 + D
   e. h4 = h4 + E
   f. h5 = h5 + F
   g. h6 = h6 + G
   h. h7 = h7 + H
8. Initialize hashValue = h0 + h1 + h2 + h3 + h4 + h5 + h6 + h7
9. Return hashValue
End
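As an illustration of step 4, the following minimal Python sketch maps each character of a word to a Sudoku cell via its decimal ASCII code. The 16 × 16 grid used here is a toy placeholder, not the solved Giant Sudoku assumed by the algorithm.

# Illustrative sketch of step 4 (building W); the grid below is hypothetical.
def build_w(word, sudoku):
    # Map each character to a Sudoku cell via its decimal ASCII code.
    w = []
    for c in word:
        code = ord(c)                          # e.g., 'T' -> 84
        row, column = code // 10, code % 10    # 84 -> row 8, column 4
        w.append(sudoku[row][column])
    return w

# A toy 16x16 grid filled with values 1..16 (placeholder for a solved Giant Sudoku).
toy_sudoku = [[(r + c) % 16 + 1 for c in range(16)] for r in range(16)]
print(build_w("This", toy_sudoku))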
3.2.1 Algorithm: Ch(x, y, z)

The algorithm takes three hexadecimal inputs, namely x, y, and z, and returns a hexadecimal value which is used by the algorithm Generating_Hash_Value_Using_Sudoku.

Start
1. Operand1 = x ∧ y
2. Operand2 = ¬x ∧ z
3. Return (Operand1 ∨ Operand2)
End
3.2.2 Algorithm: Ma(x, y, z)
The algorithm takes three hexadecimal inputs, namely x, y, and z, and returns a hexadecimal value which is used by the algorithm Generating_Hash_Value_Using_Sudoku.

Start
1. Operand1 = x ∧ y
2. Operand2 = x ∧ z
3. Operand3 = y ∧ z
4. Return (Operand1 ∨ Operand2 ∨ Operand3)
End
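The following minimal Python sketch implements the two helper algorithms on integer inputs; it is an illustrative rendering rather than the authors' code. The example call reproduces the values used later in Sect. 4.1 (Ch(0x5, 0x9, 0x1) = 1 and Ma(0x6, 0xb, 0x3) = 3).

# Sketch of the Ch and Ma helpers using bitwise AND, OR, and NOT on integers.
def ch(x, y, z):
    return (x & y) | (~x & z)

def ma(x, y, z):
    return (x & y) | (x & z) | (y & z)

# Reproduces the worked values from Sect. 4.1.
print(hex(ch(0x5, 0x9, 0x1)), hex(ma(0x6, 0xb, 0x3)))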
4 Explanation of Proposed Algorithm

4.1 Generating_Hash_Value_Using_Sudoku

This section illustrates our proposed algorithm for generating the hash value. The proposed method is represented using a flowchart, depicted in Fig. 3. A stepwise explanation of our proposed algorithm is as follows.

In step 1, the Giant 16 × 16 Sudoku is initialized. In step 2, the hash initials are initialized. These constants are used later at different steps of the algorithm.

Initialization:
h0 = A = 0x6   (1)
h1 = B = 0xb   (2)
h2 = C = 0x3   (3)
h3 = D = 0xa   (4)
h4 = E = 0x5   (5)
h5 = F = 0x9   (6)
h6 = G = 0x1   (7)
h7 = H = 0x5   (8)
In step 3, the K constants are initialized. These constants will be used in the algorithm to calculate the hash value. Consider that the word to be hashed is "This". Then, in step 4, the individual characters are 'T', 'h', 'i', and 's'. We take the first character, i.e., 'T', and the decimal ASCII
Fig. 3 Generating hash value using Sudoku
value for 'T' is 84. Thus, we take the 8th row and 4th column of the Sudoku and get 6. The element 6 is added to the array W. Similarly, the decimal ASCII value for 'h' is 104, and the corresponding Sudoku element is considered; the same is done for 'i' and 's'. Thus, the array W = [6, 2, 15, 9]. In step 6, for the first iteration, i = 0, 0 ≤ i < 64. Initialize t1 = Ch(E, F, G) + Wi + Ki + S1(E) + H
(9)
where Ch (x, y, z) is given by (x ∧ y) ∨ (¬x ∧ z)
(10)
Si (x) = x >> i.
(11)
We have E = 0x5 [from (5)], F = 0x9 [from (6)], and G = 0x1 [from (7)]; therefore, using (10), we have Ch = 1. Also, W0 = 6, K0 = 0x428a, and S1(E) = 2 (calculated using (11)). Therefore, from (9), we get t1 = 17048.
(12)
Initialize
t2 = Ma(A, B, C) ∧ S0(A),   (13)
where
Ma(x, y, z) = (x ∧ y) ∨ (x ∧ z) ∨ (y ∧ z).   (14)
Therefore, Ma(A, B, C) = 3, S0(A) = 6, and
t2 = 2.   (15)
Then temp, D, C, B, H, G, F, and E are updated, followed by h0 to h7. Repeating this process 64 times gives the final hash initials. Adding these hash initials (steps 7 and 8) yields the hash value 275275764372103527875145819, which is then returned.
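To make the round structure concrete, the following is a hedged Python sketch of steps 5–8. It assumes S_i(x) is a plain right shift by i bits, reuses the short array W cyclically across the 64 rounds (the paper does not state how W is extended), follows the "+ S0(A)" form from step 6b (Eq. (13) instead shows a bitwise AND), and uses placeholder K constants; it is not the authors' exact implementation.

# Hedged sketch of the 64 compression rounds; constants and W handling are
# illustrative assumptions, not the authors' exact parameters.
def ch(x, y, z):
    return (x & y) | (~x & z)

def ma(x, y, z):
    return (x & y) | (x & z) | (y & z)

def sudoku_rounds(w, h_init, k):
    a, b, c, d, e, f, g, h = h_init
    for i in range(64):
        t1 = ch(e, f, g) + w[i % len(w)] + k[i % len(k)] + (e >> 1) + h
        t2 = ma(a, b, c) + (a >> 0)   # step 6b as listed; Eq. (13) uses AND instead
        # temp = D; D = C; C = B; B = A; A = t1+t2; H = G; G = F; F = E; E = temp+t1
        a, b, c, d, e, f, g, h = t1 + t2, a, b, c, d + t1, e, f, g
    finals = [x + y for x, y in zip(h_init, (a, b, c, d, e, f, g, h))]  # step 7
    return sum(finals)                                                   # step 8

h_init = [0x6, 0xb, 0x3, 0xa, 0x5, 0x9, 0x1, 0x5]   # from Eqs. (1)-(8)
w = [6, 2, 15, 9]                                    # W for the word "This"
k = [0x428a] * 64                                    # placeholder K constants
print(sudoku_rounds(w, h_init, k))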
5 Results and Discussions

This section presents the results of our proposed method, shown in Table 1, to judge the efficiency and feasibility of our algorithm compared to SHA256. Our algorithm is tested on an Intel Quad Core (TM) i5-7200U CPU at 2.50 GHz with 8.00 GB RAM using a 64-bit OS. The platform used is Python version 3, and the method is implemented in Spyder. The blockchain is implemented following the standards of Ethereum. The comparisons are made with respect to time, and our proposed method gives superior results in every case, as shown in Table 1. Figure 4 presents an analysis of time with respect to the length of the data: with the length of the data on the x-axis and time on the y-axis, the SHA256 curve spans a range of 0–1.2 units of time. It is evident from this time analysis that the time required for Sudoku hashing is much less than the time taken by SHA256, and we choose one among multiple solutions from approximately 5.96 × 10^98 different Sudoku grids as the solution matrix. There also exist heuristic-based deterministic algorithms [8] to produce all possible solutions of a Sudoku, as shown in Fig. 5. In Fig. 5, the method for generating possible valid solutions for a single mini-grid is represented as a valid permutation tree structure. A brute-force attack on SHA256 would need to make 2^256 attempts to extract the initial data. Hence, our proposed method adds high security, reliability, and safety, as an attacker needs a total of 5.96 × 10^98 + 2^256 attempts. Cracking this hash value and determining which Sudoku solution has been used to generate it therefore becomes much harder.
Fig. 4 Time comparison between the Sudoku hashing algorithm (plotted in gray) and the SHA256 hashing algorithm (plotted in blue)
Fig. 5 Heuristic-based deterministic approach to produce all possible solutions of Sudoku [8]

Table 1 Difference in time between our proposed method and SHA256 for generating hash values

Length of the data (bits) | Proposed method (sec) | SHA256 (sec)
236  | 0.03691268  | 0.063682795
349  | 0.003002405 | 0.002038002
578  | 0.002020121 | 0.167150021
691  | 0.002913713 | 0.040118217
805  | 0.004765272 | 0.185535908
1034 | 0.003873587 | 0.129361629
1147 | 0.004114389 | 0.076047897
1376 | 0.006951809 | 0.251678944
1490 | 0.008889675 | 0.115893126
1719 | 0.00975132  | 0.147347212
1833 | 0.004910946 | 0.099294662
6 Conclusions and Further Works

In the proposed method, the hash value generation uses Sudoku effectively with respect to time. Our proposed algorithm improves the timing behavior without adding any overhead; the computational complexity is θ(N), the same as for SHA256. The future scope of this algorithm is to employ this new secure algorithm in different applications of Blockchain to enhance security without increasing cost or any other overhead.
References 1. Bogart S, Rice K (2015) The blockchain report: welcome to the internet of value. Needham Insights 5:1–10 2. Casino F, Dasaklis TK, Patsakis C (2019) A systematic literature review of blockchain based applications: current status, classification and open issues. Telemat Inform 36:55–81 3. Chel H, Mylavarapu D, Sharma D (2016) A novel multistage genetic algorithm approach for solving sudoku puzzle. In: 2016 international conference on electrical, electronics, and optimization techniques (ICEEOT). IEEE, pp 808–813 4. Christidis K, Devetsikiotis M (2016) Blockchains and smart contracts for the internet of things. IEEE Access 4:2292–2303 5. Felgenhauer B, Jarvis F (2005) Enumerating possible sudoku grids. http://www.afjarvis.staff. shef.ac.uk/sudoku/sudoku.pdf 6. Jana S, Dutta N, Maji AK, Pal RK (2023) A novel time-stamp-based audio encryption scheme using sudoku puzzle. In: Proceedings of international conference on frontiers in computing and systems. Springer, pp 159–169 7. Leible S, Schlager S, Schubotz M, Gipp B (2019) A review on blockchain technology and blockchain projects fostering open science. Front Blockchain, 16 8. Maji AK, Jana S, Pal RK (2013) An algorithm for generating only desired permutations for solving sudoku puzzle. Procedia Technol 10:392–399 9. Maji AK, Pal RK, Roy S (2014) A novel steganographic scheme using sudoku. In: 2013 international conference on electrical information and communication technology (EICT). IEEE, pp 1–6 10. Mishra DB, Mishra R, Das KN, Acharya AA (2018) Solving sudoku puzzles using evolutionary techniques-a systematic survey. In: Soft computing: theories and applications. Springer, pp 791–802 11. Tapscott A, Tapscott D (2017) How blockchain is changing finance. Harv Bus Rev 1(9):2–5 12. Walport M et al (2016) Distributed ledger technology: beyond blockchain. UK Government Office for Science 1:1–88 13. Yato T, Seta T (2003) Complexity and completeness of finding another solution and its application to puzzles. IEICE Trans Fundam Electron Commun Comput Sci 86(5):1052–1060
Why Traditional Group Key Management Schemes Don’t Scale in Multi-group Settings? Payal Sharma and B. R. Purushothama
Abstract The traditional key management schemes focus on single-group settings. The design principles focus on reducing the rekeying, storage, and computation costs and satisfying forward and backward secrecy. Several applications exhibit the multigroup scenario. The study of traditional key management schemes’ deployment in multi-group settings is important. We analyze the representative polynomial-based key management scheme in multi-group settings. We show that the scheme does not scale well in the multi-group settings with respect to storage, rekeying, and computation costs. We infer that all the traditional key management schemes do not scale well. We put forth that the new design principles should be adopted to design an efficient multi-group key management scheme. Keywords Group key management · Polynomial · Multi-group communication · Scalability · Rekeying cost · Centralized
1 Introduction Due to the increasing usage of the service groups in different areas, the security in these service groups became a matter of concern. For this purpose, key management must be used in these service groups for secure group communication. Usually, key management is done for service groups individually. The past literature assumes that the service groups are independent, which means if a user is part of one service group, it won’t participate in other service groups. Very less research is done in the scenario where these service groups are not independent. Consider a scenario where multiple groups exist and work simultaneously, and users participate in more than one group at the same time. The above-mentioned scenario is referred to as secure multi-group P. Sharma (B) · B. R. Purushothama National Institute of Technology Goa, Farmagudi, Ponda, Goa 403401, India e-mail: [email protected] B. R. Purushothama e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 R. Sarkar et al. (eds.), Proceedings of International Conference on Frontiers in Computing and Systems, Lecture Notes in Networks and Systems 690, https://doi.org/10.1007/978-981-99-2680-0_16
communication and makes use of a multi-group key management (MGKM) scheme for its security concerns. MGKM involves the generation and distribution of multiple group keys. It must also handle the rekeying operation in multiple groups the instant a user is added to or removed from the multi-group communication, as a user's membership change will affect different groups simultaneously. Usually, in traditional group key management (GKM) schemes applied in the multi-group scenario, the rekeying operation done independently in each service group induces more overhead. This is why traditional key management schemes don't scale well for the multi-group scenario, and this is what is exhibited in this work. The traditional GKM scheme based on polynomial construction is discussed here along with its rekeying operation. The polynomial-based traditional key management scheme is then employed in the multi-group scenario. The cost analysis is done for both cases, and we show that the cost of the polynomial-based scheme in the multi-group case increases drastically. This explains why the traditional key management schemes don't scale well for multi-group settings.
1.1 Motivation Many GKM schemes have been proposed for a scenario where it is assumed that the service groups work independently. These traditional GKM schemes are made for individual service groups. But the realistic scenario is when these groups work simultaneously, and users could join or leave multiple groups at the same time. Since the traditional GKM schemes work independently in each service group, we would like to know whether it induces more overhead. So, when these traditional GKM schemes are applied in the multi-group scenario, we would like to know whether these schemes scale well in this case also. If this is the situation, then traditional GKM schemes can be directly used for multi-group settings. Among all the existing GKM schemes in the literature, we choose the polynomial-based GKM scheme as the representative scheme and analyze its performance and scalability in multi-group settings. This study’s primary objective is to observe the storage at user or GC, encryption cost at GC and decryption cost at each user along with the communication cost when the GKM scheme is applied in multi-group scenario. The observations made concerning the polynomial-based scheme are applicable to all the other traditional key management schemes.
1.2 Our Contributions

– We analyze whether the traditional GKM schemes scale well in multi-group settings.
– In particular, we analyze the polynomial-based GKM scheme for its applicability in multi-group settings.
– We show that the traditional GKM schemes do not scale well in multi-group settings.
The structure of the paper is detailed as follows. Section 2 briefs the traditional GKM schemes present in the past literature. Section 3 describes the representative traditional GKM scheme based on polynomial construction. Section 4 explains the multi-group communication and shows the applicability of the traditional GKM scheme in the multi-group scenario. Section 5 briefs the cost analysis of both traditional and multi-group key management schemes. Section 6 concludes that the traditional key management schemes don’t scale well when applied in multi-group settings.
2 Literature Review Many traditional GKM schemes are proposed in the past literature based on different cryptographic primitives. Kumar et al. [1] used the Chinese remainder theorem (CRT) for key management. They claimed to achieve less encryption and decryption cost at GC and users, respectively, along with the security. Mansour et al. [2] presented the scheme based on CRT, which claims to be efficient in encryption/decryption cost and scalable. Rawat et al. [3] presented a GKM scheme using key-tree and elliptic curves, which they claim to achieve low computation and rekeying cost. Pareek et al. [4] exhibited a GKM scheme using proxy re-encryption. They claim to achieve low decryption cost using a linear public bulletin size. Among them, polynomial-based GKM schemes are famous for their low storage overhead at the GC and user and less key extraction cost at the user. This paper chooses the polynomial-based GKM scheme as the representative key management scheme to analyze its performance when deployed in multi-group scenario. The construction presented by Piao et al. [5] is famous among them. They proposed a polynomial-based construction so that each user can use its long-term secret to derive the group key. The polynomial generated in this construction is made using two parts. The first part contains (x − ki ) terms containing the long-term secret of each user u i , i = 1 to n, and the second part includes the group key G K . This construction is explained in work later. Other polynomial-based constructions are proposed keeping different aims in focus. The work presented by Sun et al. [6] keeps in focus the authentication of the broadcast message sent and security against collusion attack. Dinker et al. [7] proposed a GKM scheme using trivariate polynomial, Baburaj et al. [8] used polynomial and multi-variate map, and Mahmood et al. [9] presented multi-party key management using polynomials. Ganesan et al. [10] exhibited a polynomial-based key management scheme keeping scalability and efficiency in focus. Ramkumar et al. [11] used Chebyshev polynomials, whereas in [12], Gautam et al. exhibited GKM scheme using ( p, q)-Lucas polynomial. Albakri et al. in [13, 14] presented a polynomial GKM scheme in blockchain and fog computing. Zhang et al. [15] presented GKM scheme using bi-variate polynomial; however, in [16], Zhang et al. exhibited a polynomial scheme with homomorphic encryption. Guo et al. [17] exhibited a GKM scheme based on double hash chains, aiming to reduce storage and rekeying cost.
Nafi et al. [18] (similarly, Albakri et al. [19] and Mohammadi et al. [20]) aimed for a reduction in storage and computation cost along with the security against the node capture attack. Zhan et al. [21] used a system of equations to communicate the group key to the users. Dinker et al. [22] exhibited a GKM scheme based on polynomials and matrices. Zhang et al. [23] used polynomials for key distribution aiming the security against node capture attack. Hamsha et al. [24] presented a lightweight scheme that decreases the size of the key and aims for security against node compromise attack. As was mentioned, the literature contains numerous proposals for schemes based on polynomial construction. But all these traditional schemes are not designed to be applied in multi-group scenario. We analyzed one representative scheme based on polynomial construction in multi-group scenario to observe whether it scales well. By scalability, we mean that as the number of groups increases the performance metrics, i.e., computation, communication, and storage cost, remain unaffected. The performance analysis will make it clear whether the traditional schemes scale well in multi-group scenario or the complexity grows with the increase in the number of groups.
3 Polynomial-Based Traditional Group Key Management Scheme

Secure GKM schemes are essential for various applications that require users' participation in a group. Key management is a tedious task in any application, especially in secure group communication, with the increase in social media usage. Each user is part of several applications and is part of a group in each application. Users are increasingly required to participate in multiple groups simultaneously rather than in a single group. The focus of the traditional group key management schemes, irrespective of the strategy adopted for key management, is to manage the keys of users belonging to a single group. However, the current need is to address the situation where a user participates in multiple groups at the same time, and managing the keys of multiple groups raises several challenges. One obvious and straightforward approach is to adopt a traditional key management scheme for individual groups and manage the keys of each group as guided by the single-group GKM scheme. In this work, we investigate whether this straightforward adoption of traditional GKM schemes in multi-group settings scales well. Specifically, we take the polynomial-based traditional GKM scheme as the base scheme and adopt the same scheme for multi-group settings. We now describe the GKM scheme based on polynomial construction. Suppose the users U = {u1, u2, ..., un} want to engage in group communication in the presence of a trusted entity, the group controller (GC), who manages the group. Each user ui and the GC securely share a key ki. To broadcast a group key GK to the legitimate members, GC constructs the polynomial given in Eq. 1 as follows:
P(x) = (x − k1)(x − k2) · · · (x − kn) + GK   (1)

and broadcasts P(x). Note that broadcasting P(x) means sending the coefficients of the polynomial. Each user ui receiving the polynomial P(x) evaluates it at x = ki and obtains GK. To maintain backward secrecy when a new user, say user u(n+1), is added to the secure group communication, GC selects a new group key GK′ and generates the polynomial given in Eq. 2 as follows:

P′(x) = (x − k1)(x − k2) · · · (x − kn)(x − k(n+1)) + GK′   (2)

Note that k(n+1) is a key securely shared between user u(n+1) and GC. Then GC transmits P′(x). Each user ui, i = 1 to (n + 1), evaluates P′(x) at x = ki to derive the key GK′. To maintain the security constraint of forward secrecy, when a member is revoked from the group, say user u1, GC randomly picks a new group key GK″ and constructs the polynomial given in Eq. 3 as follows:

P″(x) = (x − k2)(x − k3) · · · (x − k(n+1)) + GK″   (3)

and transmits P″(x). Except for user u1, every other user ui can obtain GK″ by evaluating P″(x) at x = ki. The scheme meets the security objectives of forward and backward secrecy.
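To make the construction concrete, the following is a minimal Python sketch of building and evaluating P(x). Working over a prime field with a toy modulus is an assumption of this sketch, since the paper does not fix the underlying arithmetic, and the helper names are ours.

# Hedged sketch of the polynomial-based broadcast; the modulus is illustrative.
import random

P = 2**61 - 1  # toy prime modulus (assumption of this sketch)

def poly_mul(a, b):
    # Multiply two coefficient lists (lowest degree first) modulo P.
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] = (out[i + j] + x * y) % P
    return out

def build_broadcast(member_keys, group_key):
    # P(x) = (x - k1)...(x - kn) + GK, returned as a coefficient list.
    coeffs = [1]
    for k in member_keys:
        coeffs = poly_mul(coeffs, [-k % P, 1])   # multiply by (x - k)
    coeffs[0] = (coeffs[0] + group_key) % P
    return coeffs

def recover_group_key(coeffs, my_key):
    # Evaluate P(x) at x = my_key; for members the product term vanishes.
    return sum(c * pow(my_key, i, P) for i, c in enumerate(coeffs)) % P

keys = [random.randrange(P) for _ in range(5)]   # long-term secrets shared with GC
gk = random.randrange(P)
bc = build_broadcast(keys, gk)
assert all(recover_group_key(bc, k) == gk for k in keys)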
4 Polynomial-Based Traditional Key Management Scheme in Multi-group Scenario

In MGKM schemes, it is assumed that the system involves many service groups working simultaneously and that users can be part of more than one group at the same time. The issue in MGKM is that, when users join or leave the service groups, the independent rekeying in the individual groups generates a huge rekeying overhead. If the traditional key management schemes are used in multi-group settings, the overall overhead increases drastically. This can be understood through the example of the traditional polynomial-based key management scheme. Consider G1, G2, ..., Gm to be the groups with n1 users, n2 users, ..., nm users, respectively. Suppose each group uses a polynomial-based scheme to manage its secure communication. The polynomials in the multi-group scenario are constructed in the same manner as in the single-group case, but for m groups, as given in Eqs. 4–6, where ki^j denotes the long-term key of user ui in group Gj:

P1(x) = (x − k1^1)(x − k2^1) · · · (x − kn1^1) + GK1   (4)
P2(x) = (x − k1^2)(x − k2^2) · · · (x − kn2^2) + GK2   (5)
...
Pm(x) = (x − k1^m)(x − k2^m) · · · (x − knm^m) + GKm   (6)
Suppose a user u who belongs to group G1 wants to join G2 and G3. Then GC should update the two polynomials P2(x) and P3(x) for groups G2 and G3 by changing the group keys of both groups, i.e., GK2 and GK3, and including the long-term secret of user u, i.e., k. GC then broadcasts those two polynomials so that the existing users of those groups, including u, can rekey their group keys. Each user ui can evaluate these polynomials at x = ki (where ki is the user's secret key) and obtain the group keys of G2 and G3. Similarly, suppose a user v who is engaged in the groups G1, G2, G3, G4, G5 wants to leave G3, G4, G5. GC updates the three polynomials P3(x), P4(x), P5(x) for these groups, respectively, and, to distribute the new group keys for G3, G4, G5, GC broadcasts these three polynomials. GC updates the polynomials by changing the group keys and removing the long-term secret of v so that the leaving user cannot derive the changed group keys. In general, if a user ui is added to or removed from mi groups, then GC needs to update the group keys of those mi groups. In addition, GC updates mi polynomials and broadcasts them to the users so that, on receiving the mi polynomials, the users extract their respective groups' keys. So, the storage and rekeying costs increase by a factor of mi in the multi-group scenario, and similarly the computation cost, including polynomial construction, increases by a factor of mi. As shown above, the traditional GKM schemes can be used in multi-group settings, but their application does not scale in this scenario. This is further explained in detail with the cost analysis in the following section; a simple cost model is also sketched below.
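The following small Python cost model illustrates the scaling argument above. The per-group O(n²) construction, O(n) evaluation, and n-coefficient broadcast follow the cost analysis in the next section; the function and variable names are ours, and the numbers are illustrative.

# Illustrative multi-group rekeying cost model (assumed per-group costs as stated).
def rekeying_cost(group_sizes, affected_groups):
    # group_sizes: dict group_id -> n; affected_groups: groups joined or left.
    gc_construction = sum(group_sizes[g] ** 2 for g in affected_groups)  # O(sum n_g^2)
    broadcast = sum(group_sizes[g] for g in affected_groups)             # coefficients sent
    user_evaluation = sum(group_sizes[g] for g in affected_groups)       # O(sum n_g)
    return gc_construction, broadcast, user_evaluation

sizes = {g: 100 for g in range(1, 6)}                    # five groups of 100 users each
print(rekeying_cost(sizes, affected_groups=[2, 3]))      # user u joins G2 and G3
print(rekeying_cost(sizes, affected_groups=[3, 4, 5]))   # user v leaves G3, G4, G5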
5 Cost Analysis This section shows the cost involved in single-group and multi-group scenarios for GKM schemes. The parameters included in the analysis are storage cost, encryption/decryption cost, and rekeying cost.
5.1 Single-Group Settings

• Storage at each user: 1.
• Storage at GC: n.
• Computation cost on join operation:
  – At GC: polynomial construction, O(n²).
  – At user: O(n).
• Computation cost on leave operation:
  – At GC: polynomial construction, O(n²).
  – At user: O(n).
• Communication cost on join operation: n.
• Communication cost on leave operation: n.
5.2 Multi-group Settings

For simplicity, consider m groups G1, G2, ..., Gm, each consisting of n users. Suppose a user u wants to join all m groups simultaneously.
• Computation cost:
  – At GC: m polynomial constructions, O(mn²).
  – At user: obtaining all group keys by evaluating all polynomials, O(mn).
• Communication cost: mn.
Suppose any user who is part of m groups wants to leave all m groups. Then:
• Computation cost:
  – At GC: m polynomial constructions, O(mn²).
  – At user: O(mn).
• Communication cost: mn.
It can be observed that, as the number of service groups increases, managing the groups becomes hard. The storage cost is as follows:
• Storage cost at GC: mn.
• Storage cost at user: m.
The cost comparison between single-group and multi-group settings is shown in Table 1.
Table 1 Cost comparison between the single-group and multi-group settings

Complexity parameter | Single group | Multi-group
Join operation, computation cost at GC | O(n²) | O(mn²)
Join operation, computation cost at users | O(n) | O(mn)
Join operation, communication cost | n | mn
Leave operation, computation cost at GC | O(n²) | O(mn²)
Leave operation, computation cost at users | O(n) | O(mn)
Leave operation, communication cost | n | mn
Storage cost at GC | n | mn
Storage cost at users | 1 | m
6 Discussion

Suppose a user is part of m/2 groups, say G1, G2, ..., G(m/2), and wants to leave these groups and join the other m/2 groups, say G(m/2+1), ..., Gm. The rekeying cost is then as follows.
– Cost at GC:
1. Computation cost: GC has to construct m/2 polynomials to maintain backward secrecy and another m/2 polynomials to maintain forward secrecy, i.e., m polynomials in total. So, the cost is O(mn²).
2. Communication cost: GC has to broadcast m/2 + m/2 polynomials. So, the cost is mn.
– Cost at users: The user has to evaluate the m/2 polynomials to obtain the group keys of the groups it is joining. So, the cost is (m/2) · n = O(mn).
It can therefore be observed that it becomes very hard for GC to manage the groups as n and m grow larger. The comparison of key derivation cost, rekeying cost, and storage cost between single group and multi-group is also exhibited in Fig. 1. The figure clearly shows that the cost for a single group is constant, whereas for multi-group it grows linearly, showing that the traditional schemes don't scale in the multi-group case.
Fig. 1 Performance comparison between the single-group and multi-group settings
We found that the application of traditional GKM schemes in the multi-group scenario does not scale well. The main reason for this issue is that the design principles of secure multi-group communication are different from those of the single-group case. We have seen that the traditional GKM schemes, when employed in multi-group settings, increase the rekeying cost by a factor of m. So, the design principles for secure multi-group communication must target a low rekeying cost when a user can join or leave multiple groups simultaneously. A low rekeying cost means that the storage cost at the user as well as at the GC, the rekeying cost, and the computational cost at the user as well as at the GC should be optimal and, if possible, should not depend on the number of groups. One possible solution to the problem is to design a key management scheme with constant storage and rekeying costs for a single group and apply it to the multi-group settings. One can also design a group public key encryption scheme where the public key represents a group and the ciphertext can be decrypted only by a group member.
7 Conclusion

We have examined the performance and scalability of the representative polynomial-based GKM scheme for its applicability in the multi-group scenario. We have exhibited that the traditional GKM schemes based on polynomial construction, proposed for a single group, do not scale well when employed in the multi-group scenario. There are higher rekeying and storage costs in the multi-group scenario; that is, the storage cost, encryption/decryption cost, and rekeying cost are raised by a factor of m, where m is the number of groups joined or left by the user. So, we conclude that secure GKM schemes should be designed for the multi-group settings with design principles different from those of the traditional GKM schemes. The design principles should focus not only on the security of the scheme but also on a low rekeying cost.

Acknowledgements This work is supported by the Ministry of Education, Government of India.
References 1. Kumar V, Kumar R, Pandey SK (2021) A computationally efficient and scalable key management scheme for access control of media delivery in digital pay-TV systems. Multimed Tools Appl 80(1):1–34 2. Mansour A, Malik KM, Alkaff A, Kanaan H (2020) ALMS: asymmetric lightweight centralized group key management protocol for VANETs. IEEE Trans Intell Transp Syst 22(3):1663–1678 3. Rawat A, Deshmukh M (2020) Tree and elliptic curve based efficient and secure group key agreement protocol. J Inf Secur Appl 55:102599 4. Pareek G, Purushothama BR (2018) Provably secure group key management scheme based on proxy re-encryption with constant public bulletin size and key derivation time. S¯adhan¯a 43(9):1–13
5. Piao Y, Kim J, Tariq U, Hong M (2013) Polynomial-based key management for secure intragroup and inter-group communication. Comput Math Appl 65(9):1300–1309 6. Sun X, Wu X, Huang C, Xu Z, Zhong J (2016) Modified access polynomial based self-healing key management schemes with broadcast authentication and enhanced collusion resistance in wireless sensor networks. Ad Hoc Netw 37:324–336 7. Dinker AG, Sharma V (2018) Trivariate polynomial based key management scheme (TPBKMS) in hierarchical wireless sensor networks. In: Ambient communications and computer systems. Springer, Singapore, pp 283–290 8. Baburaj E (2017) Polynomial and multivariate mapping-based triple-key approach for secure key distribution in wireless sensor networks. Comput Electr Eng 59:274–290 9. Mahmood Z, Ning H, Ghafoor A (2017) A polynomial subset-based efficient multi-party key management system for lightweight device networks. Sensors 17(4):670 10. Ganesan VC, Periyakaruppan A, Lavanya R (2016) Cost-effective polynomial-based multicastunicast key distribution framework for secure group communication in IPv6 multicast networks. IET Inf Secur 10(5):252–261 11. Ramkumar KR, Singh R (2017) Key management using Chebyshev polynomials for mobile ad hoc networks. China Commun 14(11):237–246 12. Gautam AK, Kumar R (2021) A key management scheme using (p, q)-lucas polynomials in wireless sensor network. China Commun 18(11):210–228 13. Albakri A, Harn L, Maddumala M (Jun 2019) Polynomial-based lightweight key management in a permissioned blockchain. In: 2019 IEEE conference on communications and network security (CNS). IEEE, pp 1–9 14. Albakri A, Maddumala M, Harn L (Aug 2018) Hierarchical polynomial-based key management scheme in fog computing. In: 2018 17th IEEE international conference on trust, security and privacy in computing and communications/12th IEEE international conference on big data science and engineering (TrustCom/BigDataSE). IEEE, pp 1593–1597 15. Zhang G, Yang H, Tian X, Wu Y (Jul 2018) A novel key pre-distributed scheme based on subregions division for distributed wireless sensor networks. In: 2018 10th international conference on communication software and networks (ICCSN). IEEE, pp 275–280 16. Jing Z, Chen M, Hongbo F (May 2017) WSN key management scheme based on fully bomomorphic encryption. In: 2017 29th Chinese control and decision conference (CCDC). IEEE, pp 7304–7309 17. Guo H, Zheng Y, Li X, Li Z, Xia C (2018) Self-healing group key distribution protocol in wireless sensor networks for secure IoT communications. Futur Gener Comput Syst 89:713– 721 18. Nafi M, Bouzefrane S, Omar M (Oct 2020) Efficient and lightweight polynomial-based key management scheme for dynamic networks. In: International conference on mobile, secure, and programmable networking. Springer, Cham, pp 110–122 19. Albakri A, Harn L, Song S (2019) Hierarchical key management scheme with probabilistic security in a wireless sensor network (WSN). Secur Commun Netw 2019 20. Mohammadi M, Keshavarz-Haddad A (Sept 2017) A new distributed group key management scheme for wireless sensor networks. In: 2017 14th international ISC (Iranian Society of Cryptology) conference on information security and cryptology (ISCISC). IEEE, pp 37–41 21. Zhan F, Yao N, Gao Z, Tan G (2017) A novel key generation method for wireless sensor networks based on system of equations. J Netw Comput Appl 82:114–127 22. Dinker AG, Sharma V (2019) Polynomial and matrix based key management security scheme in wireless sensor networks. 
J Discret Math Sci Cryptogr 22(8):1563–1575 23. Zhang J, Li H, Li J (2018) Key establishment scheme for wireless sensor networks based on polynomial and random key predistribution scheme. Ad Hoc Netw 71:68–77 24. Hamsha K, Nagaraja GS (2019) Threshold cryptography based light weight key management technique for hierarchical WSNs. In: International conference on ubiquitous communications and network computing. Springer, Cham, pp 188–197
RESPECTO: Now We May Go Alone, Safety is in Our Hand Kumaresh Baksi, Gargi Ghosh, Deep Banik, Soumyanil Das, Arka Saha, and Munshi Yusuf Alam
Abstract Violence against women has increased even as society and knowledge have developed. If we increase awareness and employ the most recent technology to help us, we may end this painful, humiliating molestation of women as well as seek retribution from those responsible. We offer a system that uses a smartphone and a specially designed portable object with sensors to guarantee safety. While traveling, women feel at ease using their phones and other portable devices, and thanks to modern technology it is simple to monitor and locate them when they do so. Our novel approach involves using our suggested solution to send an alert and auto SMS to their parents and the closest police station regarding their current GPS position, where the actual crime has happened. A sustainable environment that is favorable to women may be created with our RESPECTO system.

Keywords Women safety · Smartphone · Women harassment · Sensor integrated device
1 Introduction

In today's world, being a woman means feeling the utmost need for women's safety. As a result, we are quite eager to work in this field and make some significant, visible improvements in society regarding the number of crimes against women around the world. According to "The Hindustan Times" [1], the government recorded more than 3,70,000 cases of crimes in 2020. Considering the United States, Nigeria, Yemen, the Democratic Republic of the Congo, Pakistan, Saudi Arabia, Syria, Somalia, India, and Afghanistan, India ranks #1 among the most dangerous countries for women, according to the BBC [2]. Next, when it comes to the least safe cities for women, it is observed that Jodhpur ranks 0.54, followed by the "Capital of India", i.e., Delhi, at 0.47, followed by Patna and Kota (0.38), Gwalior (0.36),

K. Baksi · G. Ghosh · D. Banik · S. Das · A. Saha · M. Y. Alam (B) Computer Science and Engineering, BBIT, Kolkata, India e-mail: [email protected]

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 R. Sarkar et al. (eds.), Proceedings of International Conference on Frontiers in Computing and Systems, Lecture Notes in Networks and Systems 690, https://doi.org/10.1007/978-981-99-2680-0_17
Asansol (0.34), Vijayawada (0.33), Faridabad (0.30), Meerut (0.27), and Jaipur (0.26).1 It was found in Delhi, within the last 2–3 years, that a woman was beaten, raped, and dumped by seven intoxicated beasts during a 40 min ride.2 Delhi tops the rape tally among metros, resulting in various critical effects that are growing devastatingly day by day. Furthermore, the Union Government told Parliament that in 2020, 3,71,503 incidents of crimes against women were filed across India, according to statistics from the National Crime Records Bureau. As a consequence, 3,98,620 people have been arrested for crimes against women, while 4,88,143 have been charged and 31,402 have been convicted, according to [3]. The majority of cases under crimes against women were registered under the category of cruelty by the husband or his relatives (30.2%), followed by assault on women with intent to outrage modesty (19.7%), kidnapping and abduction of women (19.0%), and rape (7.2%), according to the NCRB report [1]. According to new WHO and partner statistics, violence against women is still widespread and begins at an alarmingly early age: one in every three women experiences violence during her lifetime. Around 736 million women are victims of physical or sexual violence perpetrated by an intimate partner, or of sexual violence by a non-partner, and this figure has remained virtually stable over the last decade [4]. Despite the fact that there are several regulations in place across the world to protect women, there is still a significant amount of laxity in terms of rules, political beliefs, and socioeconomic circumstances among a significant portion of the population. There would be no mothers to give birth to future generations if women did not exist on our planet. Yet women must struggle against all odds in a society where some men wreck women's lives while forgetting that they themselves were born from a woman's womb. To eliminate violence against women, international agreements such as the "Convention on the Elimination of All Forms of Discrimination Against Women" and the "UN Declaration on the Elimination of Violence against Women" from 1993 have been signed all over the world.3 Even though there are many established systems, we keep developing new concepts to support women in their everyday lives. Researchers are working on a variety of safety standards, with a focus on women's safety. To provide the utmost security, many people are working on IoT devices, while others are working on electronic devices. For example, it has been observed that when a single switch is pressed on a wristband, the buzzer and video recording turn on, and tear gas is shot into the assailant's eyes. Refilling the tear gas and maintaining the system in proper condition incur costs, and targeting the assailant's eyes is quite a challenging task [10]. Although there are various policies in place to improve women's safety, existing systems have limitations. Thus, we develop a system that will ensure women's safety. Our system uses the Google API to send the victim's location and voice alerts
https://www.nyoooz.com/news/coimbatore/704161/jodhpur-is-the-most-unsafe-city-forwomen-while-coimbatore-is-the-safest-list-of-safe-and-unsafe-indian-cities/. 2 https://m.timesofindia.com/city/delhi/40-min-ride-beaten-raped-dumped/articleshow/ 17659331.cms. 3 https://www.unwomen.org/en/what-we-do/ending-violence-against-women.
Table 1 Literature survey on women safety

– Fernandez et al. [5], "Challenges of Smart Cities: How smartphone apps can improve the safety of women", 2020, IEEE (Conference). Methodology: sending push notifications with links to recorded videos. Limitations: GPS fails to provide the service in the safety concern; rather, it shows the zones where the incidents occur more often.
– Sathyasri et al. [6], "Design and Implementation of Women Safety System Based On IoT Technology", International Journal of Recent Technology and Engineering (IJRTE), ISSN: 2277-3878, Volume-7, Issue-6S3, April 2019. Methodology: rechargeable battery, GSM, GPS, IoT module (ESP-12E), Arduino Mega (ATmega 2560), LCD, vibration sensor, neuro simulation, buzzer, trigger. Limitations: if there isn't a nearby community, the buzzer is useless, and the system is unable to track the victim continuously; the size of the proposed system is also unclear, making it difficult to determine whether it can be carried around easily.
– Pramod et al. [7], "IoT wearable devices for the safety and security of women and girl children", 2018, Journal, IAEME (Academia). Methodology: acquisition of raw data is accompanied by activity recognition, which enrolls a specialized ML algorithm; real-time monitoring is achieved by wirelessly sending sensor data to an open-source cloud, and analysis of the data is carried out in MATLAB; the system monitors and analyzes the amount of electrolytes present on the surface of the victim's skin to send messages to contacts; hardware: temperature sensor (LM35), triaxial accelerometer (ADXL335E), skin resistance sensor (copper strips), ESP8266 Wi-Fi module. Limitations: emotion can differ based on the circumstances, which can't be determined with a single thought process; an external device is needed to sense data and take necessary action after analyzing it, which incurs cost.
– Jatti et al. [8], "Design and Development of an IoT-based wearable device for the Safety and Security of women and girl children", IEEE International Conference On Recent Trends In Electronics Information Communication Technology, May 20–21, 2016, India. Methodology: hardware consisting of 1. temperature sensor (LM35), 2. triaxial accelerometer (ADXL335E) (body position is resolved from the raw accelerometer data), 3. skin resistance sensor (copper strips), 4. ESP8266 Wi-Fi module, 5. ATmega 328P, 6. ATmega 32P clip, 7. Weka, a machine learning toolkit. Limitations: if you step outside in a crisis while you're having a fever, the system might not provide the correct guidance and may produce incorrect information; relying on the suggested strategy is difficult because it takes time to respond quickly after learning about the worst-case scenario.
– Monisha et al. [9], "Women Safety Device and Application: FEMME", 2016, Indian Journal of Science and Technology. Methodology: 1. ARM controller for the hardware device, 2. GSM module, 3. GPS, 4. audio recorder, 5. hidden camera detector, 6. Bluetooth access, 7. sensors, 8. SOS. Limitations: an external device with a larger number of sensors incurs a huge cost, which is not affordable in the context of developing regions; the system also consumes more memory to process and store the audio and video data.
– Miriyala et al. [10], "Smart intelligent security system for women", 2016, Journal. Methodology: designing a portable device resembling a wrist band, switch, Raspberry Pi 2, GSM modem, GPS receiver, screaming alarm, tear gas, live streaming video, buzzer, webcam, GSM SIM900A, siren driver. Limitations: if there isn't a nearby community, the buzzer is useless, and the system is unable to track the victim continuously; the size of the proposed system is also unclear, making it difficult to determine whether it can be carried around easily; refilling and using tear gas in both cases is a risky and time-consuming process due to its prohibition in the market.

to the parent and the nearest police station. The parent may identify their child's whereabouts and dial to contact the local police station. The alert button is pressed, followed by shaking the phone. If a voice command is conveyed in an alert message, authorities can find the victim using the victim's mobility and execute the appropriate steps. The entire system is designed in such a way that there are no issues with end-to-end connections and it is incredibly cost-effective.
2 Literature Survey

Many research works have been reviewed, taking into account commercial and IoT devices, where a Google API together with the camera is cost-effective and women's safety is considered the foremost priority; this survey is summarized in Table 1. In [11], the parent may identify the child's whereabouts and dial the local police station; the alert button is pressed, followed by shaking the phone, and if a voice command is conveyed in an alert message, the authorities can locate the victim using the victim's mobility and take the appropriate steps, in a system designed so that there are no issues with end-to-end connections and that is cost-effective [11]. Ramya Sree Yadlapalli et al. have worked on two ideas, a wrist band and spectacles, using a pressure switch as input, with a screaming alarm and tear gas for self-defence purposes. A webcam
will be inserted in the spectacles for smart technologies [10]. Monisha et al. have used the ARM controller in order to consume less power; they have also implemented a radio-frequency signal detector to detect hidden cameras [9]. Zully Amairany Montiel Fernandez et al. have presented a smartphone application called Circle Armored that serves as an example of the kind of app that could prevent or assist during catastrophic circumstances via the application of AI [5]. Sumathy et al. have implemented GSM,4 GPS, an RF transceiver, a temperature sensor, and a voice recognizer to design a portable kit for the safety of women [12]. Fernandez et al. have further researched the smartphone app "Circle Armoured", which monitors the physical behavior of the user, including running, jogging, and panicking in any kind of situation; it also focuses on recording the voices of the user and the criminal and helps to file a report by detecting their meaning with the help of a voice recognition system. It further helps recognize the critical situation the user is in with the help of the various sensors available in smartphones, such as the accelerometer and gyroscope. It also pushes the GPS location and, in the last phase, records a video which will be shared via a link [5]. Sogi et al. have researched a Raspberry Pi-based Smart Ring using IoT, with BLE designed to connect devices with low power consumption. Harikiran et al. have researched a Smart Security Solution for Women based on IoT. It has a pulse rate sensor which gives a digital output for the heartbeat. GSM is used to send data from the control unit to the base unit. BLE is designed to connect devices with low power consumption. Human body temperature is of vital importance for maintaining health, and therefore it is necessary to monitor it regularly. The global positioning system (GPS) is able to determine the latitude and longitude of a receiver on Earth by calculating the time difference for signals from various satellites to reach the receiver [13].
3 System Design and Methodology

In this section, we present the end-to-end system shown in Fig. 1, which illustrates the functionalities of the individual modules. An overview of the Respecto system's general architecture is provided in order to demonstrate how the smartphone uses IMU sensors to offer safety and to emphasize the novelty of employing the customized umbrella covered in Sect. 3.1. The system's methodology, covered in Sect. 3.2, is explained in the following subsection. The distinctiveness of our strategy is that, when employing our proposed methodology, an alert and auto SMS about the victim's current GPS location, where the actual crime has occurred, are sent to both the parents and the closest police station. The closest police station is an absolute necessity because, in this type of event, time is of the utmost importance. Since the majority of crimes take place in desolate areas, there is a great likelihood that harassment will occur if the victim cannot call the nearest police station as soon as possible.
https://www.overleaf.com/project/62a2ccb98d4c29f152fe6fd3.
Fig. 1 Respecto system architecture consists of hardware and software components, as well as a back end and database work flow
In order to implement our proposed idea, the user’s smartphone and Bluetooth are connected in such a way that the MAC address can be tracked and matched with the phone as well as with the umbrella, so that in the event that the umbrella is lost in some way, we will track the umbrella via the GPS module which is installed in the umbrella. The battery is built within the handle itself, and it can be recharged using the solar panel that is mounted on the umbrella’s top as a solar sheet. A few paid platforms and a small open source community were used by the Respecto System. It will be possible to send messages to the closest police station via a paid SMS gateway system, which is more crucial when determining which police station is closest to the user’s current location.
3.1 Smart Umbrella Device Women all around the world find the umbrella to be quite handy because it can be used to shield them from both the rain and the scorching sun. In addition, we wish to offer the kind of umbrella that may shield users from abuse and humiliation. To serve this purpose, we must create a smart umbrella employing the various sensors depicted in Fig. 2. The first component is a NodeMCU 8266 module with the pins needed to link it to other components of the umbrella. The second is a Bluetooth module called HC05 that can pair with a phone using Bluetooth. Third, a GPS module is installed to connect with NodeMCU and get the location data if the victim’s phone’s GPS eventually fails or is turned off. Fourth, a battery is added to provide power for GPS communication, LED lighting, and phone battery recharges as needed. The
Fig. 2 Smart Umbrella prototype. It can be used to provide an emergency alert system, provide a nightlight for a dark night walking, recharge a cell phone, and track a person’s whereabouts Fig. 3 Hardware Configuration for Smart Umbrella, where GPS and Bluetooth sensors are incorporated
hardware architecture is shown in Fig. 3. Fifth, a small solar panel is mounted on the umbrella's top to recharge the battery during the day, and finally, a buzzer is fitted to sound an alarm and summon assistance from people nearby.
3.2 Methodology

In this section, we show the functionalities of the end-to-end system by describing the algorithm Women Safety. Our proposed algorithm works in two phases, namely phone processing and server processing. In phone processing, the Respecto app shown in Fig. 4 has three activities. Data about the user, such as name, address, and date of birth, are collected in the first activity. The second activity is the dashboard, where the user can activate all server processes and transmit SOS messages to police stations by clicking the final button.
Algorithm 1: Women Safety
Input: User basic information: full name (fn), phone number (pn), address (addr), date of birth (dob), selected contacts (conta); accelerometer sensor signature (V); latitude (lat), longitude (long).
Output: Sending notification to the selected contacts and the nearest police station simultaneously.

Phone Processing: SendAlert(fn, pn, addr, dob, conta)
  User ← fn, pn, addr, dob, conta            // User is an object
  Store User
  if AlertButtonClicked OR V > P OR VoiceRecognized then   // P is the experimental value of the sensor signature
      Location ← (lat, long)                 // Location is an object
      ReceiveAlert(User, Location)
  else
      DoNothing
  end if

Server Processing: ReceiveAlert(User, Location)
  fn, pn, addr, dob, conta ← User            // extracting user attributes
  lat, long ← Location
  Link ← https://respecto.in/lat+long/fn
  ps ← nearest police station to Location
  ps notified with (fn + pn + addr + dob + conta + Link)
  conta informed with (fn + pn + addr + dob + ps + Link)
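The following is a hedged Python sketch of the server-side ReceiveAlert step from Algorithm 1. The link format follows the pseudocode; send_sms is a placeholder for the paid SMS gateway, the nearest-station selection is simplified, and all names and numbers used here are illustrative, not the deployed implementation.

# Sketch of ReceiveAlert; transport and station lookup are placeholders.
def send_sms(number, text):
    # Placeholder transport, not a real gateway call.
    print(f"SMS to {number}: {text}")

def receive_alert(user, location, stations):
    lat, lon = location
    link = f"https://respecto.in/{lat}+{lon}/{user['fn']}"
    ps = min(stations, key=lambda s: (s["lat"] - lat) ** 2 + (s["lon"] - lon) ** 2)
    details = f"{user['fn']}, {user['pn']}, {user['addr']}, {user['dob']}"
    send_sms(ps["phone"], f"ALERT: {details}, contacts {user['conta']}, {link}")
    for contact in user["conta"]:
        send_sms(contact, f"ALERT: {details}, police station {ps['name']}, {link}")

user = {"fn": "Asha", "pn": "9000000000", "addr": "Kolkata", "dob": "2000-01-01",
        "conta": ["9111111111"]}
stations = [{"name": "PS-A", "phone": "100", "lat": 22.57, "lon": 88.36}]
receive_alert(user, (22.575, 88.37), stations)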
Finally, a third activity to edit user information will be available. We collect the user's current location, storage, voice, and accelerometer sensor data on the back end. The user has three activation options: she can shake her phone, shout "help me", or press the alert button, which triggers the server-side processing. The speech recognition process is handled via Google Assistant on the phone. An option for pairing the Bluetooth hardware components of our proposed smart umbrella with the phone is available in the app. We can manage the alarm that is embedded in the umbrella and the alert messages via the app, which is a deliberate design choice.
Fig. 4 Respecto app view
To prevent false-positive data from being sent to the local police station or the chosen contact numbers, the system sends a warning message with an audio notification and allows just 15 s to stop the procedure. If the user decides to cancel the process, it halts immediately, and a new message is then sent to all contacts informing them that the alert was generated unintentionally. In server processing, we connect with the Google Maps API, which is able to send the user's current location to the saved and selected contacts as well as to the nearest police stations.
4 Pilot Study We discuss our findings from a series of experiments in this part to pinpoint specific issues and potential remedies. Data collection and processing are necessary in order to accomplish our goals, which are explained in more detail here.
4.1 Data Collection Process We have collected data from 8 users, 5 police station addresses, 3 different phone types, and a 55 km area to train and test our system. The app will gather the required data, including the user’s full name, residential address, phone number, date of birth, and five additional contacts. The OTP verification method has been added to the app in order to determine whether the user is genuine or fake.
We’ll have a few buttons on our other gadget, a smart umbrella. The umbrella can be powered on with just one button, and it can also attempt a Bluetooth connection with your phone. Through the phone’s Respceto App, the second button sends a warning to the neighborhood police station as well as the contacts you specify.
4.2 Police Database Creation

Our suggested approach to saving victims has to deal with many variations in how police stations are located. The government determines the location of each police station depending on the population density and other considerations. In an urban area, it has been noted that one PIN code may contain several police stations dispersed throughout the city, while another PIN code contains only one police station. Police stations under two different PIN codes may be located near each other; conversely, in urban areas there are numerous police stations with the same or a different PIN code within 2000 m of each other. The situation is slightly different in rural areas, where there are significantly fewer police stations and they are farther apart. According to our observations, within 5000 m of our experimental urban region there may be only a few police stations, which is still better than a rural area, where there may not be a single one. Thus, our proposed algorithm finds the nearest police station to the victim in two ways: first, the search is based on the PIN code, and second, it is based on the distance covered; a sketch of such a distance-based lookup is given below.
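The following is a minimal Python sketch of the distance-based part of this lookup. The 2000 m and 5000 m radii follow this paper, while the haversine distance, the station list, the field names, and the widening strategy are illustrative assumptions rather than the deployed implementation.

# Hedged sketch of a nearest-police-station lookup by distance.
from math import radians, sin, cos, asin, sqrt

def haversine_m(lat1, lon1, lat2, lon2):
    # Great-circle distance in meters between two (lat, lon) points.
    dlat, dlon = radians(lat2 - lat1), radians(lon2 - lon1)
    a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
    return 2 * 6371000 * asin(sqrt(a))

def nearest_station(victim, stations, radius_m=2000, max_radius_m=5000):
    # Prefer stations within radius_m; widen once to max_radius_m if none is found.
    for limit in (radius_m, max_radius_m):
        candidates = [(haversine_m(victim[0], victim[1], s["lat"], s["lon"]), s)
                      for s in stations]
        candidates = [(d, s) for d, s in candidates if d <= limit]
        if candidates:
            return min(candidates, key=lambda t: t[0])[1]
    return None

stations = [{"name": "PS-A", "lat": 22.5726, "lon": 88.3639, "pin": "700001"},
            {"name": "PS-B", "lat": 22.6000, "lon": 88.4000, "pin": "700091"}]
print(nearest_station((22.5750, 88.3700), stations))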
4.3 Data Analysis The most crucial step that follows data collection is data analysis. The phone number and GPS position are the project's two main pieces of essential information. The system automatically verifies the phone number; nevertheless, the user must grant any additional permissions. The phone transmits GPS data to the server for system processing. When a user shakes the phone, the accelerometer sensor reports three force components along the x, y, and z directions. If these are X, Y, and Z, respectively, then the current acceleration magnitude is √(X² + Y² + Z²). We have tested the sensor data on different phones such as the VIVO 1916, POCO X2, SM-A205F, Redmi Note 5, RMX1811, and Redmi Go. Although we shook the phones in the same fashion in a controlled environment, all their values differed. All the data are given in Fig. 5. After gathering data from various devices, the next step is to derive a specific value that determines whether someone is shaking the phone for assistance or whether the data is irregular or abnormal (Table 2).
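As a minimal sketch of this magnitude computation (the threshold used here is only a placeholder; the paper derives the experimental value P from the per-device amplitudes in Table 2):

import math

def shake_magnitude(x, y, z):
    # Magnitude of the acceleration vector reported by the accelerometer
    return math.sqrt(x * x + y * y + z * z)

def is_shake(x, y, z, threshold=40.0):
    # True when the magnitude exceeds the (assumed) experimental threshold P
    return shake_magnitude(x, y, z) > threshold

print(is_shake(25.0, 30.0, 18.0))  # vigorous shake -> True
print(is_shake(0.1, 9.8, 0.2))     # phone at rest (roughly gravity) -> False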
Fig. 5 Different amplitude values from different phones while shaking almost in the same fashion

Table 2 Different amplitude values for different phones

No.  Model          Amplitude
1    VIVO 1916      93
2    POCO X2        124
3    SM-A205F       46
4    Redmi Note 5   34
5    RMX1811        46
6    Redmi go       34
To detect a user shaking the phone, and not a free fall, we used an open-source library named Seismic (https://github.com/square/seismic). Seismic detects a user shaking the phone efficiently in most cases. It keeps a timed queue to detect different amplitudes within a certain amount of time, and its algorithm senses the accurate value of the user's phone shaking, so the accelerometer graph differs before and after using the library. We divided the testing procedure into three smaller sections. First and foremost, we make sure that our program functions properly by using latitude and longitude data. After achieving lat-long values, we try to integrate continuous data from the live location of the Android emulator. As a second phase, if desired, we can move the user's location onto the web page and alter the location of the Android emulator. A spot test is the final testing stage, shown in Fig. 6. We used an Android device running the Respecto application during this stage. In this instance, we chose random locations for our test. We installed the project on an AWS server to avoid using a localhost server.
Fig. 6 Testing our app near a police station
To determine the closest police station within a 2000 m radius, we executed a pilot test close to a police station. The police station was successfully added to our map, and we also entered its name into our database. The algorithm performed flawlessly when we double-checked it with a different police station. To conserve battery life, the algorithm increases the range to a maximum of 5000 m only when there isn't a police station within a range of 2000 m. For the pilot, we notified the police station via SMS using a dummy number.
5 Result and Analysis In this section, we state the findings of our research, arranged in a logical sequence without bias or interpretation. Once we have collected the user's smartphone data, we transfer it to the server's database for additional processing. The server processing has two phases: first, continuously finding the nearest police station based on the PIN code, and second, sending a message to all selected contacts with a dynamic location link. A user's phone battery can be drained if the location is polled from the service continuously; we solve this by collecting location information only while the alert mode is active.
Fig. 7 All nearest police stations within 2000 m. Blue circles show the different PIN codes; black lines show the distances between the police stations and the victim's position
Fig. 8 There is no police station found in the range of 2000 m
The system finds the nearest police station in a 2000 m radial area, as shown in Fig. 7. If the algorithm discovers multiple police stations with the exact PIN code of the location, our system chooses only the police station that is currently closest to the victim by computing the shortest distance with the Haversine formula (https://nathanrooy.github.io/posts/2016-09-07/haversine-with-python/). The chosen police station receives an alert notification right away. If the user moves outside the range of the initially nearest police station, the system discovers another police station nearby, which is our cognitive approach: it puts the new nearest police station into the queue for the alert notification as soon as the victim steps into its range.
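For reference, a minimal Python sketch of the Haversine distance used in this shortest-distance computation (the coordinates in the example are arbitrary):

from math import radians, sin, cos, asin, sqrt

def haversine_m(lat1, lon1, lat2, lon2):
    # Great-circle distance between two (lat, lon) points in metres
    r = 6371000.0  # mean Earth radius in metres
    dphi = radians(lat2 - lat1)
    dlmb = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlmb / 2) ** 2
    return 2 * r * asin(sqrt(a))

# Example: distance from a victim to one candidate police station
print(round(haversine_m(22.5726, 88.3639, 22.5697, 88.3697), 1))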
Fig. 9 The proposed system automatically increases the search range until it finds a police station
Occasionally, the victim's phone is unable to locate a police station within the range of 2000 m, as shown in Fig. 8. In that case, in order to find the closest police station, the system increases the search radius from 2000 to 3000 m, as seen in Fig. 9. We can see from the figure that there are two police stations, and the system selects the police station based on the shortest distance while the victim is on the move. The user's phone begins automatically delivering data to the police headquarters when the algorithm reaches its maximum range of 5000 m or when the battery power drops below 15%. By using our proposed framework, Respecto, victims can therefore easily and quickly save their lives while also saving the battery life of their phones. During our investigation in a suburban community, where we extensively tested our system, we found that police stations are essentially within a 2000 m radius of one another. Since there is no legislation requiring police stations to be placed a specific distance from one another, we have set the minimum range for detecting police stations to 2000 m.
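A small sketch of this expanding-radius selection is given below; it assumes that the distance to each candidate station has already been computed (for instance with the Haversine helper sketched earlier), and the station names are made up.

def nearest_station(stations, start_m=2000, step_m=1000, max_m=5000):
    # stations: list of (name, distance_in_metres) pairs.
    # Widen the search ring from 2000 m up to 5000 m and return the closest
    # station inside the first ring that contains one; None means fall back
    # to notifying police headquarters.
    radius = start_m
    while radius <= max_m:
        in_range = [s for s in stations if s[1] <= radius]
        if in_range:
            return min(in_range, key=lambda s: s[1])
        radius += step_m
    return None

stations = [("PS Alpha", 2600.0), ("PS Beta", 3400.0)]
print(nearest_station(stations))  # ('PS Alpha', 2600.0)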
6 Conclusion and Future Scope We have created a novel method, which users can readily accept and afford, to help guarantee women's safety in our society and throughout the world. The smartphone and the umbrella used in this process are items that women already carry worldwide. The closest police station is constantly updated while the user is moving, and the system notifies the designated contact numbers concurrently, which is a unique feature of our work. To alert the public, the umbrella offers audio alerts.
In the future, we will work to make a dashboard available so that the police authorities can view the system and respond appropriately. To make our app more efficient, we will also add a speech recognition algorithm so that users can inform the police station even when they are not in touch with their phones.
Enhancing Security Mechanism of MQTT Protocol Using Payload Encryption
P. S. Akshatha and S. M. Dilip Kumar
Abstract MQTT has become one of the most widely used IoT communication protocols due to its efficiency and simplicity. However, it does not itself support the desired security features; instead, it assumes the use of SSL/TLS in the lower layer. The high bandwidth consumption of SSL/TLS makes it costly to support. This paper proposes an end-to-end payload encryption technique using the Fernet key to enhance the security of MQTT communication. With the proposed approach, intermediate brokers do not need to support SSL/TLS or to obtain and install certificates. Fernet provides symmetric encryption and authentication of data. The outcomes demonstrate that the suggested approach improves bandwidth utilization, requires less initial connection setup time, and exhibits minimal jitter, while the slight delay introduced by Fernet is acceptable for secured communication. Since the key is generated only once, less time is needed for long-lasting communications. Keywords Internet of Things (IoT) · Message queuing telemetry transport (MQTT) · MQTT brokers · Secured socket layer/Transport layer security (SSL/TLS) · Fernet key generation · Payload encryption
1 Introduction IoT is a promising network architecture that allows smart devices to communicate with one another. Monitoring the data interchange between the devices in IoT is challenging due to the heterogeneous and enormous number of devices [1]. Several application protocols, including MQTT, CoAP, XMPP, and AMQP, are used to transfer messages in an IoT network [2, 3]. Because of its lightweight qualities and ability to perform successfully in low-power, limited-memory devices, MQTT is the most significant contender for M2M communication [4, 5].
Fig. 1 Google search trends
The worldwide Google search trends of popular IoT messaging protocols over the previous 12 months are shown in Fig. 1. The most striking observation to emerge from the figure is that interest in the MQTT protocol has grown over time compared to other IoT messaging protocols. MQTT is an M2M and IoT connectivity protocol. It is a publish/subscribe communication protocol that transports messages between devices, and it facilitates communication with remote areas where network bandwidth is limited. MQTT works on TCP/IP, with non-encrypted and encrypted communication using ports 1883 and 8883, respectively. The MQTT protocol has three network entities: a message broker (server), a publisher, and a subscriber. A publisher submits messages with a topic head to the broker, which subsequently delivers the messages to the subscribers who listen to that topic. Many MQTT-based brokers are currently available on the market from various vendors. This paper uses the HiveMQ broker for bandwidth and latency evaluation. HiveMQ is a scalable, commercially licensed MQTT broker built by HiveMQ GmbH and written in Java [6]. A secure communication channel between a client and a server is provided via TLS and SSL, and many MQTT brokers support using TLS in place of plain TCP to encrypt the entire MQTT communication [7]. If MQTT client connections are anticipated to be transient, the TLS handshake's communication overhead could be substantial. A new TLS connection can consume up to a few kilobytes of bandwidth, and the bandwidth required for the TLS handshake depends on many other aspects. Since TLS encrypts every packet, every packet on the network carries more overhead than an unencrypted one. Due to insufficient resources, TLS is not practical for constrained devices; if one can afford the extra bandwidth and memory required for TLS, it is preferable to use MQTT with TLS [6]. As a result, this paper proposes a payload end-to-end encryption scheme using cryptography Fernet key generation. Table 1 presents the acronyms used in this paper.
Table 1 Acronyms and abbreviations used in the paper

Acronym  Abbreviation
CoAP     Constrained application protocol
XMPP     Extensible messaging and presence protocol
AMQP     Advanced message queuing protocol
AES      Advanced encryption standard
CBC      Cipher block chaining
PKCS     Public-key cryptography standards
SHA      Secure hash algorithm
HMAC     Hash-based message authentication code
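As a point of reference for the publish/subscribe model just described, a minimal sketch using the Eclipse Paho Python client is shown below. The broker hostname and topic are placeholders, and the retained flag is set only so that a subscriber started afterwards still receives the message.

import paho.mqtt.publish as publish
import paho.mqtt.subscribe as subscribe

BROKER = "broker.hivemq.com"   # example public broker; any MQTT broker works
TOPIC = "demo/ehealth/vitals"  # placeholder topic

# Publisher: send one message on the non-encrypted port 1883
publish.single(TOPIC, payload="heart_rate=72", hostname=BROKER, port=1883,
               retain=True)

# Subscriber: block until one message arrives on the same topic
msg = subscribe.simple(TOPIC, hostname=BROKER, port=1883)
print(msg.topic, msg.payload.decode())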
1.1 Contributions This paper makes the following significant contributions:
(i) Provides the secured MQTT e-healthcare communication architecture consisting of publishers, a broker, and subscribers.
(ii) Provides end-to-end payload Fernet key encryption from publishers to subscribers through an MQTT broker.
(iii) Provides the Fernet encryption, authentication, and decryption process in algorithm format.
(iv) Compares the existing SSL/TLS method of securing MQTT communication with the proposed Fernet key encryption approach in terms of initial connection setup, bandwidth, mean jitter, and mean end-to-end delay.
1.2 Paper Organization The rest of the paper is organized as follows: related works on improving the security mechanism of the MQTT protocol are discussed in Sect. 2. Section 3 discusses the problem statement and objectives. Section 4 presents the proposed system with the architecture and the algorithms for the encryption and decryption techniques. In Sect. 5, experimental results for the existing SSL/TLS approach versus payload encryption are compared and analyzed. Finally, Sect. 6 presents the conclusion of the paper and future work.
2 Related Works In this section, related works to enhance MQTT security are addressed. The authors of [8] present a system that automatically verifies the SSL/TLS certificate for Internet of Things applications that use broker-based communication
protocols. IoTVerif creates an IoT protocol specification and verifies its security properties without requiring prior knowledge of messaging protocols. The evaluation's findings demonstrate that IoTVerif can successfully detect vulnerabilities in IoT applications such as TLS renegotiation and man-in-the-middle attacks. The work in [9] investigates the impact of MQTT with TLS on performance based on connection build-up times, throughput, and energy efficiency using a reproducible test bed on a standard off-the-shelf microcontroller. The findings reveal that how TLS affects performance over three QoS levels depends on the network environment and the frequency of connection re-establishment. The work in [10] concentrated on the MQTT protocol's energy usage when using different QoS levels over TLS. The main findings of that paper are the real-time measured values of energy consumption when performing secure communication using the MQTT protocol over TLS. The authors of [11] provided a technique for the opening handshake that makes use of a dynamic cipher suite algorithm, payload encryption, and an evaluation function with three inputs (the communication goal, predicted security level, and residual energy). Results showed that the suggested method increases energy efficiency by 34.72% while improving the security of communication. The authors in [12] present a lightweight encryption and authentication mechanism for constrained devices. This mechanism employs the TLS authentication algorithm ECDHE-PSK over the MQTT protocol. According to the findings, the proposed approach improves security. The work in [13] focused on several security measures to secure a platform for IoT smart toys intended to assist child development specialists in the early diagnosis of psychomotor delays. In [14], the authors proposed a model in which original images are processed, encrypted/decrypted, and converted into bitmap images as outputs. When needed, users decrypt them with a "key". In light of the literature mentioned above, we propose an end-to-end payload encryption technique based on the Fernet key for enhancing the security of MQTT communication. Table 2 summarizes some of the related works with objectives, experimental requirements, findings, pros, and cons.
3 Problem Statement The problem addressed in this work is to enhance the security of MQTT end-to-end communication between publishers and subscribers by encrypting the payload without interfering with the MQTT broker. The objectives include:
(i) To secure MQTT end-to-end communication using payload encryption.
(ii) To compare the existing SSL/TLS approach with the proposed Fernet key generation method.
(iii) To improve bandwidth consumption.
(iv) To measure the mean end-to-end delay.
Table 2 Summary of related work

[8]
Objectives: Automatic SSL/TLS certificate verification for IoT applications.
Experimental requirements: 15 popular Android applications, Eclipse Paho package, NuSMV, Java, Monkey, Raspberry Pi, Arduino Yun.
Findings: The proposed system IoTVerif can detect TLS renegotiation and man-in-the-middle attacks.
Pros: Identification of SSL/TLS vulnerabilities.
Cons: The tool's performance has not been validated for a wide range of applications.

[9]
Objectives: To examine the impact of combining MQTT with TLS on performance.
Experimental requirements: ESP8266, power supply module 1PC, Yokogawa WT310, Windows 10 MQTT broker.
Findings: The impact of TLS on performance is due to the network situation and the frequency of connection re-establishment.
Pros: TLS's effect on performance is evaluated.
Cons: Performance analysis for handling multiple clients is not verified.

[10]
Objectives: Evaluating the effects of MQTT via TLS on energy consumption.
Experimental requirements: Wi-Fi router TP-Link, Raspberry Pi, ESP32, MASTECH MS8050, and lithium battery LS903052.
Findings: QoS 1 consumes less energy than the other two QoS levels.
Pros: Energy consumption is assessed for all levels of QoS.
Cons: Depending on the broker, energy usage assessments may differ.

[11]
Objectives: Energy-efficient SSL or TLS approach for IoT.
Experimental requirements: IoT devices, energy meter, server.
Findings: The proposed approach saves energy up to 34.72%.
Pros: It provides appropriate security communication that adapts to the device's environment.
Cons: When only the payload is encrypted with MAC, the vulnerability is not taken into consideration.

[12]
Objectives: Lightweight security mechanism using MQTT for IoT devices.
Experimental requirements: Raspberry Pi-2, ODROID-C1, AP router, power supply, power measurement device.
Findings: With respect to CPU utilization, execution time, bandwidth, and power consumption, the proposed method outperforms.
Pros: Performance evaluation of various security mechanisms.
Cons: The mechanism has yet to be validated across a variety of IoT devices.

[13]
Objectives: To protect an IoT smart toy platform.
Experimental requirements: Raspberry Pi, ATmega328P, PyCrypto library, and Crypto Arduino library.
Findings: Average time consumed by the authentication hashing process and encryption.
Pros: A feasible secured solution for an IoT smart toy platform.
Cons: Hardware restrictions of smart toys.

[14]
Objectives: Securing images in the cloud.
Experimental requirements: –
Findings: MAE, MSE, and RMSE scores.
Pros: Secured cloud computing model.
Cons: Among the diverse datasets, only the image dataset has been tested.

Proposed work
Objectives: Enhancing the security mechanism of the MQTT protocol.
Experimental requirements: Eclipse Paho package, Wireshark, HiveMQ broker (ports 1883 and 8883).
Findings: The proposed approach improves bandwidth use.
Pros: The proposed approach enhances bandwidth consumption.
Cons: Fernet is unsuitable for encrypting very large files.
4 Proposed System This section presents the proposed system to secure end-to-end MQTT communication based on the Fernet generation key. Figure 2 shows the proposed end-to-end payload encryption architecture for MQTT communication. The plain text generated by the publisher is encrypted with a key generated by Fernet. The encrypted communication is then transmitted to the subscriber via a broker. The active subscriber then decrypts the message to gain access to the plain text. As a result, in the proposed design, intermediate brokers are not required to support SSL or to obtain and install certificates. Fernet is an implementation of symmetric authenticated cryptography [15]. It is a component of the Python Cryptographic Authority's cryptography library. Appropriate usage of Fernet guarantees that an intruder cannot access an encrypted message [16]. Along with a randomly selected initialization vector, Fernet requires the following three crucial inputs:
(i) Plain text.
(ii) Current time.
(iii) A user-supplied key of 256 bits length.
The cryptographic building blocks of Fernet include the following:
(i) 128-bit AES in CBC mode using PKCS#7 padding: Advanced encryption standard-cipher block chaining (AES-CBC), along with padding schemes and suitable initialization vectors, ensures secure communication.
(ii) A SHA-256-based HMAC: Hash-based message authentication code (HMAC) employs secret keys and hash functions to allow a subscriber to validate the authenticity and integrity of the data received.
Algorithms 1 and 2 describe the Fernet approach's encryption and decryption techniques.
Fig. 2 Proposed secured MQTT communication
Algorithm 1 The Process of Fernet Encryption and Authentication
1: Input: Plain Text
2: Output: Encrypted Text
3: Record the timestamp
4: Generate a unique initialization vector
5: Construct the cipher text:
6:   (i) pad out the plaintext according to PKCS #7; (ii) encrypt the padded message with 128-bit AES in CBC mode; (iii) calculate the HMAC
7: Concatenate all the fields mentioned above, including the HMAC
8: Encode the token according to the base64url specification

Algorithm 2 The Process of Fernet Decryption
1: Input: Encrypted Text
2: Output: Plain Text
3: Reverse the token's base64url encoding
4: Ensure that the token is not too old
5: Calculate the HMAC
6: Verify the timestamp
7: Decrypt the cipher text
8: Remove the padding to get the original plain text
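A minimal sketch of Algorithms 1 and 2 in use with the Python cryptography library and the Paho client is shown below. The broker and topic names are placeholders, and in practice the Fernet key must be shared between publisher and subscriber over a secure out-of-band channel; the sketch keeps both sides in one script only for illustration.

from cryptography.fernet import Fernet
import paho.mqtt.publish as publish
import paho.mqtt.subscribe as subscribe

key = Fernet.generate_key()   # generated once; shared out of band with the subscriber
f = Fernet(key)

BROKER, TOPIC = "broker.hivemq.com", "demo/ehealth/records"   # placeholders

# Publisher side: encrypt (AES-128-CBC + HMAC-SHA256 internally) and publish;
# the broker only ever sees the opaque Fernet token.
token = f.encrypt(b"patient=42;bp=120/80")
publish.single(TOPIC, payload=token, hostname=BROKER, port=1883, retain=True)

# Subscriber side: receive the token and decrypt it back to plain text
msg = subscribe.simple(TOPIC, hostname=BROKER, port=1883)
print(f.decrypt(msg.payload).decode())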
5 Experimental Results This section briefly describes the experimental procedure and results. Table 3 summarizes the hardware and software requirements of the experiment. The experiment began by executing the communication between the publisher and subscriber through the HiveMQ broker via port 8883. Furthermore, the proposed work started by generating a new Fernet key identical for both the publisher and the subscriber. Clients then communicated via the HiveMQ broker using the payload Fernet key encryption method. Wireshark [17] was used to capture the packets while both strategies were being executed. Further, the initial connection setup, bandwidth usage, mean jitter, and end-to-end mean delay were measured with the help of the captured packets of the existing and proposed methods. Figure 3a depicts the initial connection setup required for both the existing and proposed approaches. It is evident from the graph that the proposed method requires less initial connection setup time, since the existing TLS approach requires time for an initial handshake. Figure 3b shows that the TLS method used more bandwidth than the proposed Fernet key encryption method; this is because TLS can be very computationally intensive and require many kilobytes of memory. Figure 3c depicts the mean end-to-end delay of both approaches. The delay consumed by Fernet's method is marginally higher due to the time required for Fernet key generation, which is admissible for secured communication; for more extended communication between clients, the required time will be less, as the key is generated only once.
Table 3 Hardware and software requirements

Details             Publisher/subscriber
CPU                 Intel(R) Core(TM) i5-8265 CPU @ 1.80 GHz
System type         64-bit operating system, x64-based processor
Memory              8 GB, SODIMM, 2400 MHz
OS                  Windows 10
HDD/SDD             Solid-state drive
Disk model          KBG30ZMV256G TOSHIBA
Disk size           30 GB
Broker              HiveMQ (port: 1883 and 8883)
Packages            Paho client and cryptography Fernet
Protocol analyzer   Wireshark
Fig. 3 Initial connection setup time, bandwidth usage, mean end-to-end delay, and mean jitter of payload encryption versus SSL/TLS
Figure 3d depicts the mean jitter of both approaches. The mean jitter is higher for the TLS method: as bandwidth is depleted, data packets must be reassembled at the receiver's end, adding to the amount of jitter.
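The connection setup comparison can also be illustrated in code: the sketch below times a plain TCP connect (port 1883) against a TCP connect followed by a TLS handshake (port 8883) using only the Python standard library. The hostname is a placeholder, and the paper's own measurements were taken from Wireshark captures rather than from this kind of in-process timing.

import socket, ssl, time

HOST = "broker.hivemq.com"   # placeholder broker hostname

def plain_setup(host, port=1883):
    t0 = time.perf_counter()
    with socket.create_connection((host, port), timeout=10):
        pass
    return time.perf_counter() - t0

def tls_setup(host, port=8883):
    ctx = ssl.create_default_context()
    t0 = time.perf_counter()
    with socket.create_connection((host, port), timeout=10) as sock:
        with ctx.wrap_socket(sock, server_hostname=host):
            pass  # the TLS handshake completes inside wrap_socket
    return time.perf_counter() - t0

print(f"plain TCP: {plain_setup(HOST):.3f} s, TCP+TLS: {tls_setup(HOST):.3f} s")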
6 Conclusion The MQTT protocol is among the most well-known protocols for interacting with IoT devices. However, MQTT does not provide security features by default. MQTT relies on SSL/TLS for secure communication; however, this existing technology is expensive due to its bandwidth consumption, and the intermediate broker needs to support SSL/TLS and to obtain and install all the certificates. In the proposed approach, end-to-end secured communication between publisher and subscriber is implemented with Fernet key generation. While executing the payload encryption, the packets were captured and analyzed using Wireshark. The results show that the proposed approach requires less setup time and mean jitter, and improves bandwidth consumption. It is planned to extend this work using blockchain technology to improve bandwidth consumption, CPU usage, and RAM usage.
References
1. Wang X, Garg S, Lin H, Piran MJ, Hu J, Hossain MS (2021) Enabling secure authentication in industrial IoT with transfer learning empowered blockchain. IEEE Trans Ind Inform 17(11):7725–7733
2. Nebbione G, Calzarossa MC (2020) Security of IoT application layer protocols: challenges and findings. Futur Internet 12(3):55
3. Bansal M (2020) Application layer protocols for internet of healthcare things (IoHT). In: 2020 IEEE fourth international conference on inventive systems and control (ICISC), pp 369–376
4. Akshatha PS, Dilip Kumar SM (2022) Delay estimation of healthcare applications based on MQTT protocol: a node-RED implementation. In: 2022 IEEE international conference on electronics, computing and communication technologies (CONECCT), pp 1–6. https://doi.org/10.1109/CONECCT55679.2022.9865759
5. Amoretti M, Pecori R, Protskaya Y, Veltri L, Zanichelli F (2020) A scalable and secure publish/subscribe-based framework for industrial IoT. IEEE Trans Ind Inform 17(6):3815–3825
6. HiveMQ websockets client. http://www.hivemq.com/demos/websocket-client/
7. Patel C, Doshi N (2020) A novel MQTT security framework in generic IoT model. Procedia Comput Sci 171:1399–1408
8. Liu A, Alqazzaz A, Ming H, Dharmalingam B (2019) IoTVerif: automatic verification of SSL/TLS certificate for IoT applications. IEEE Access 9:27038–27050
9. Prantl T, Iffländer L, Herrnleben S, Engel S, Kounev S, Krupitzer C (2021) Performance impact analysis of securing MQTT using TLS. In: Proceedings of the ACM/SPEC international conference on performance engineering, pp 241–248
10. Baranauskas E, Toldinas J, Lozinskis B (2019) Evaluation of the impact on energy consumption of MQTT protocol over TLS. In: CEUR workshop proceedings: IVUS 2019 international conference on information technologies, Lithuania, April 25, 2019, vol 2470. CEUR-WS, pp 56–60
11. Chung JH, Cho TH (2016) An adaptive energy-efficient SSL/TLS method for the internet of things using MQTT on wireless networks. In: 6th international workshop on computer science and engineering, WCSE 2016, pp 340–344
12. Amanlou S, Bakar KAA (2020) Lightweight security mechanism over MQTT protocol for IoT devices. Int J Adv Comput Sci Appl 11(7)
13. Rivera D, García A, Martín-Ruiz ML, Alarcos B, Velasco JR, Oliva AG (2019) Secure communications and protected data for an Internet of Things smart toy platform. IEEE Internet Things J 6(2):3785–3795
14. Tyagi SS (2021) Enhancing security of cloud data through encryption with AES and Fernet algorithm through convolutional-neural-networks (CNN). Int J Comput Netw Appl 8(4):288–299
15. Fernet (Symmetric Encryption). https://cryptography.io/en/stable/fernet/
16. Rivera D et al (2019) Secure communications and protected data for an Internet of Things smart toy platform. IEEE Internet Things J 6(2):3785–3795
17. Wireshark. https://www.wireshark.org/
Recent Trends in Cryptanalysis Techniques: A Review
Subinoy Sikdar and Malay Kule
Abstract Cryptanalysis is the art of revealing the actual content of an encrypted message without knowing the key(s), or by knowing some part of the plaintext and its corresponding ciphertext. In other words, the objective of cryptanalysis is to discover the unrevealed confidential key. In this paper, we focus on recent trends in cryptanalysis techniques for different ciphers. We present the cryptanalysis works in a year-wise manner together with a brief description of each work. We analyze these works, report their complexity, and point out some of their drawbacks. Moreover, the influence of machine learning on cryptanalysis methods is discussed. This paper will therefore help researchers to get a brief idea of the cryptanalysis work done in recent years. Keywords Cryptanalysis · Ciphertext · Encryption · Decryption · Machine learning
1 Introduction In recent days, security and privacy are the most important concerns in every aspect of our lives. We always want to secure our sensitive and confidential information. We pass our information over the internet all the time while doing various online activities like e-banking, e-commerce, social networking, etc. But we need to be very sure that our information is kept protected. While talking about security, cryptography comes into the picture. Cryptography is the art of making information secure; hence, it transforms our messages into an unreadable format. Cryptanalysis [1] is the science of breaking this unreadable format and getting the original messages back. We cannot rely on communication channels, as the channels are insecure: a third party can peep secretly over the channel and gather all the private messages passing over it. So, here cryptography plays an
important role in securing the messages. Cryptography allows only the intended users to read the messages. Several cryptographic algorithms exist these days by which we can send messages securely over an insecure channel. On the other hand, cryptanalysis is about recovering the original messages from the unreadable format without knowing anything in advance. It is the study of cryptographic algorithms and of gaining access to a cryptographic system without knowing the cryptographic key element. The objective of cryptanalysis is to study the system and find its weaknesses, or to defeat the cryptographic system, so that its security can be improved and the system remains secure against different attacks in the future. Social media networks [11] have gained popularity in recent days. Along with the increasing number of active users on online media platforms, concern about the privacy of the users has also grown. There are also location-based service providers on online media platforms, where customers provide their details on social media. There are QR codes [2], which read information very fast. For example, very popular social media networks like Facebook, WhatsApp, Google, Snapchat, and many more use QR codes to read data. Every kind of information we use from morning to bed can be converted and stored in a QR code: our mail address, our WhatsApp chat, or even our online banking information such as G-Pay or PhonePe account details. This QR code technology is associated with various risks like malware attacks, phishing attacks, bugs in the QR code, or financial theft. A QR code may contain the address of a malicious website; by scanning the code, the user might be affected by unwanted apps installed on the smartphone without their consent. So, the study of QR codes, in other words the cryptanalysis of QR code technology, will be of research interest to cryptanalysts. Facebook uses encryption algorithms like SHA-256, the International Data Encryption Algorithm (IDEA), Blowfish, etc. SHA-256 is a good algorithm which is used by Facebook to protect data, but a risk factor still remains with 2^256 exhaustive searches, though this is very difficult with existing computational systems. Many researchers agree that SHA-256 will be vulnerable very soon. Cryptanalysts have come up with an attack called the birthday attack, which exploits the mathematics of Blowfish encryption techniques with the help of probabilistic analysis. Hence, clearly, cryptanalysis has a lot of influence on social media platforms. This appraisal is assembled into five different sections. Section 1 introduces the paper. Section 2 describes the preliminaries, followed by a detailed discussion of recent trends in cryptanalysis techniques in Sect. 3. Section 4 analyzes the different cryptanalysis techniques mentioned in Sect. 3. Finally, Sect. 5 concludes the paper.
2 Preliminaries 2.1 Cryptosystem A cryptosystem consists of a five-tuple (P, C, K, E, D), where P → collection of plaintexts, C → collection of ciphertexts, K → collection of keys, E → collection of encryption algorithms, and D → collection of decryption algorithms. Plaintext (P)—Plaintext is the original note which is dispatched by the sender. Ciphertext (C)—Ciphertext is the encoded format of the original message which the receiver receives. Encryption (E)—Encryption is the function which takes the plaintext as input and encodes the plaintext into ciphertext with the help of a key element: e_k(p) = c, where k ∈ K is the key element, and E : P × K → C. Decryption (D)—Decryption is the reverse function of encryption, which takes the ciphertext as input and decodes the ciphertext into plaintext with the help of the key element: d_k(c) = d_k(e_k(p)) = p, and D : C × K → P. Key (K)—Generally, a key is a secret numerical value which performs the primary role in the encryption or decryption process. For all k ∈ K, there exist e_k ∈ E and d_k ∈ D such that d_k(e_k(p)) = p for all p ∈ P.
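To make the five-tuple concrete, a toy instance is sketched below: a simple shift cipher over the lowercase alphabet, where K = {0, ..., 25}. This is purely illustrative and is not one of the ciphers surveyed later.

import string

ALPHABET = string.ascii_lowercase   # the 26 symbols of the message space

def e_k(p, k):
    # Encryption E: P x K -> C, shifting each letter forward by k
    return "".join(ALPHABET[(ALPHABET.index(ch) + k) % 26] for ch in p)

def d_k(c, k):
    # Decryption D: C x K -> P, shifting each letter back by k
    return "".join(ALPHABET[(ALPHABET.index(ch) - k) % 26] for ch in c)

key = 3
cipher = e_k("attackatdawn", key)
print(cipher)             # dwwdfndwgdzq
print(d_k(cipher, key))   # attackatdawn, i.e. d_k(e_k(p)) = p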
2.2 Categories of Encryption In a classical cryptosystem, there are two kinds of encryption methods: symmetric encryption and asymmetric encryption. i. Symmetric Key Encryption—As shown in Fig. 1, in a symmetric key encryption process, both the transmitter and the recipient utilize the same confidential key to encrypt and decrypt the data, respectively. ii. Asymmetric Key Encryption—As shown in Fig. 2, in an asymmetric encryption process, a key pair is used to encrypt as well as to decrypt information. The key pair contains two keys, a private key and a public key. The transmitter encrypts the plaintext with the public key, and the recipient decodes the ciphertext with the private key. Here, the public key is shared over the network, whereas the private key is kept unrevealed and only the owner has the authority to access it.
Fig. 1 Symmetric key encryption process
Fig. 2 Asymmetric key encryption process
The public key is a mathematically computed, machine-generated, very large numerical value which is used to encrypt the plaintext; it is kept public over the network, so everybody in the network can access it. The private key is also a large key like the public key and is used to decrypt data, but it is a secret key, and only the authorized parties have access to it.
2.3 Cryptanalysis Cryptanalysis is the study of a cryptosystem to find out the weaknesses of the system and its information leakage, and to decrypt the ciphertext into plaintext without knowing the key element. It helps the cryptanalyst to understand the cryptosystem better and to create a more secure cryptosystem. In recent days, there are several cryptanalysis methods, and cryptanalysts use these methods to analyze systems. i. Brute Force Attacks (BFA)—A brute force attack is an exhaustive trial-and-error search, without exploiting any vulnerability of the target, until a coherent interpretation of the ciphertext into the plaintext is derived. The primary intuition of this method is to use all the permutations and combinations to form the
secret key(s) and then try all possible keys to decode the ciphertext into plaintext or to find out the user credentials. There are several tools which perform brute force attacks. Hydra is one of the well-known tools, often used for cracking login credentials on Linux as well as Windows systems. It supports many protocols such as AFP, HTTP-FORM-GET, HTTP-FORM-POST, HTTP-HEAD, HTTP-PROXY, and several others. In the Kali Linux operating system, Hydra is installed by default and is available with both a command line interface and a graphical user interface. It can crack the username and password in a brute force manner. It is a parallelized, very fast, and flexible tool which enables unauthorized access to a remote system; supporting many different protocols and parallelized connections makes Hydra unique. ii. Ciphertext Only Attack (COA)—In this attack model, the attacker has access only to the ciphertext or a set of old ciphertexts (Ci). The attacker then tries to discover the key and the corresponding plaintext. This is the hardest attack but the most probable one, as the attacker has only the ciphertext. iii. Known Plaintext Attack (KPA)—In this attack model, the attacker has some pairs of plaintext (Pi) along with the corresponding ciphertext (Ci), i.e., <Pi, Ci>. The attacker has access to the channel and can get the current ciphertext (C*) from the channel. The goal is to find out the secret key (K) or to guess the current plaintext (P*) from the current ciphertext (C*) by analyzing the old plaintext–ciphertext pairs <Pi, Ci>: [<Pi, Ci>, C*] → [K, P*]. iv. Chosen Plaintext Attack (CPA)—In this attack model, as shown in Fig. 3, more accessibility is given to the attacker. At a certain point of time, the attacker is given access to the encryption machinery without the key (K) being revealed. The attacker can choose a set of plaintexts (Pi) beforehand. Using the encryption machinery, the attacker can generate the set of ciphertexts (Ci) corresponding to the plaintexts (Pi) chosen earlier. The key (K) is inside the encryption machinery, so the attacker has no access to the secret key. Here, the goal is to acquire the secret key (K) and guess the current plaintext (P*) from the current ciphertext (C*): [<Pi, Ci>, C*] → [K, P*].
Fig. 3 Chosen plaintext attack (CPA)
Fig. 4 Chosen Ciphertext Attack (CCA)
v. Chosen Ciphertext Attack (CCA)—In Fig. 4, the Chosen Ciphertext Attack (CCA) model is demonstrated. This is the reverse of the chosen plaintext attack model. The attacker has temporary access to the decryption machinery for a certain time without the secret key (K) being revealed. The attacker can choose a set of ciphertexts (Ci) and generate the corresponding set of plaintexts (Pi) with the help of the decryption machinery. Here, the goal is to find out the secret key (K) and get the current plaintext (P*) from the current ciphertext (C*): [<Ci, Pi>, C*] → [K, P*]. vi. Man In The Middle (MITM) Attack—This attack model allows the attacker to stay in between two parties in incognito mode and to peep secretly on their conversation. The attacker can steal all the credential information of the user in this type of attack. Different spoofing and hijacking attacks are good examples of MITM attacks, such as Domain Name System (DNS) spoofing, Internet Protocol (IP) spoofing, HTTPS spoofing, email hijacking, Secure Socket Layer (SSL) hijacking, Wi-Fi eavesdropping, and sneaking web cookies. vii. Frequency Analysis—Frequency analysis is a cryptanalysis method which is used by researchers to decrypt the ciphertext into plaintext. This method is used as an aid to break classical ciphers. In frequency analysis, the distribution of the different letters is counted over a long plaintext. This frequency distribution is used while substituting the letters in the ciphertext: letters in the ciphertext are replaced with other letters according to the frequency of their occurrence. It has been observed from frequency analysis of English texts that the most frequently occurring letters in English are E, T, A, and O, while Q, J, X, and Z are rarely used. Likewise, TH, ER, ON, and AN are the most common letter pairs, and SS, EE, TT, and FF are the most frequent double letters. The chart in Fig. 5 shows the frequency histogram of the English alphabet.
Fig. 5 Frequency histogram of English alphabets
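A short sketch of the underlying frequency count is given below: it tallies letter occurrences in a ciphertext and lists them in descending order so they can be matched against the known English distribution (E, T, A, O, ...). The sample ciphertext is an arbitrary shift-cipher example.

from collections import Counter

def letter_frequencies(text):
    # Count alphabetic characters and return (letter, count, percentage)
    letters = [ch for ch in text.lower() if ch.isalpha()]
    counts = Counter(letters)
    total = len(letters)
    return [(ch, n, 100.0 * n / total) for ch, n in counts.most_common()]

ciphertext = "WKLV LV D VLPSOH VXEVWLWXWLRQ WHAW"
for ch, n, pct in letter_frequencies(ciphertext)[:5]:
    print(f"{ch}: {n} ({pct:.1f}%)")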
3 Recent Trends in Cryptanalysis Techniques i. In paper [1], Josef Cokes et al. showed the result of linear cryptanalysis of a small-size Rijndael cipher. Because the Rijndael cipher takes a very large key and block size, it is difficult to apply exhaustive search to it. So, they took the baby Rijndael cipher, which allows exhaustive processing, relates to the real-world scenario in a crucial way, and to which the analysis can effectively be extended. The block and key sizes of the baby Rijndael cipher were 16 bits. They selected linear approximations of the small-size Rijndael with peak probability. They got the results by complete exploration of all approximations and all keys and found a few curious characteristics about linear cryptanalysis as well as about reduced Rijndael. They obtained different genres of linear approximations with noteworthy differences in the success rates of retrieving the cipher's key. ii. In paper [2], R. Divya et al. proposed two authentication protocols to present how visualization can improve reliability and interoperability. These two protocols are a time-dependent one-time password protocol and a password-dependent authentication protocol. These two authentication protocols play a major role against several authentication attacks, especially in online transactions and many other settings. They showed the prevention of session hijacking, prevention of keylogging, safe transactions, transaction verification, visual channels, and signature validation by using these two authentication protocols. iii. Sikhar Patranabis et al. [3] worked on designing fault-resistant ciphers using tweaks. They explored the use of secret key-independent linear and nonlinear tweaks to achieve security against fault analysis attacks and presented experimental results. They emphasized designing encryption mechanisms that use secret tweaks to acquire natural security against the attacks, came up with the DRECON cipher, and showed in the experimental results that it is a fault-resistant block cipher. In 2016, Side Channel Analysis (SCA) attacks and Fault Analysis (FA) attacks were major threats and raised serious security issues for cryptographic devices. For SCA attacks, they examined both Differential Power Analysis (DPA) and Correlation Power
Analysis (CPA); for fault analysis, they covered Differential Fault Analysis (DFA) and Differential Fault Intensity Analysis (DFIA). They went through a detailed security analysis against DFA in two varieties of a tweakable, DPA-resistant DRECON cipher. A distinct key-independent tweak is used in the first version; it integrates the tweak linearly with the cipher state. Another version combines the tweak with the state in a nonlinear manner by utilizing it to pick out the S-Box. Both of them are DFA-resistant. Results were obtained on a Side Channel Attack Evaluation Board (SASEBO) GII platform and presented for demonstration; while most state-of-the-art countermeasure models appear to effectively address either DFA or DPA, DRECON is resistant to both power and fault attacks. iv. Seeven Amic et al. [4] presented cryptanalysis of DES using the Binary Firefly Algorithm (BFA) based on a chosen plaintext attack. They constructed a binary firefly algorithm to cryptanalyze DES, and the outcomes are compared with the results obtained from a Genetic Algorithm (GA). The experimental results show that, on average, BFA for cryptanalysis of DES is more effective than standard GA in the sense of the higher fitness of the optimal keys obtained. v. Vikrant Shende et al. [5] introduced cryptanalysis of the RSA encryption technique. In the RSA algorithm, security lies in the factoring of huge numbers, which seems to be very difficult for a computer and very time-consuming, even taking years. Even for a small key, a brute force attack may take a huge time to compute the key value. In this paper, they overcame the situation using multiple computers in a distributed environment. With this system setup, they increased the processing speed and provided multitasking. Along with the environment setup, they used a mobile agent to distribute the workload among the multiple computers. Their cryptanalysis algorithm can take any length of key. Finally, they showed from the results that, using multiple computers and a distributed computing environment, the cryptanalysis time was considerably reduced. vi. In paper [6], Amlan Jyoti Choudhury et al. mainly criticized a previous work and found out the flaws of the system. In internetworking, user authentication is very important in client–server architecture because of the appearance of serious security menaces or network rivals in the transmission links. Identification of the legal client in a real-time environment is a prime challenge in security. In this paper, they criticize the previous work of T. H. Chen and J. C. Huang on a two-party confirmation model that was asserted to be almost safe from the several attacks known at that time. They exposed that the model had several serious frailties in real-time scenarios. They proved that the model was vulnerable to Man-in-The-Middle (MITM) attack and Information-Leakage attack. Additionally, they claimed that the model doesn't provide primary essential security services like user anonymity and session key initiation. vii. KiSung Park et al. [7] showed cryptanalysis of a previous work, and they also fixed the issues with an improvement of the previous scheme. An ambulatory user can get mobile service from any place at any time through a mobile
network. But it becomes vulnerable, as the mobile service is served in the public domain. As a solution to this problem, in 2017, Qi et al. presented a coherent two-party key exchange model for the mobile domain. While cryptanalyzing the scheme of Qi et al., they figured out that the system was not resistant to insider attack, impersonation attack, and trace attack. Furthermore, they ascertained that the system didn't provide user anonymity. Finally, they proposed a better and coherent two-party key exchange model for mobile environments. viii. Nazgul Abdinurova et al. [8] discussed the integration of quantum technology with cryptography by using cryptanalysis of different cryptographic algorithms. Cryptography aims to ensure that nobody will be able to decrypt the ciphertext even if the ciphertext is intercepted. Scientists have invented many cryptographic algorithms to ensure the security of confidential messages. With the evolution of the latest technologies, cryptography has spread widely into different fields of computer science. As instances, the authors of this paper considered the RSA algorithm for generating digital signatures approved for public services and online information in the electronic government system of the Republic of Kazakhstan; the Chinese satellite that uses AES for a safe video link; and triple DES, which is in great demand in the area of e-payment applications. To determine the most appropriate encryption algorithm that can be integrated with quantum technology, they studied all the allied tasks involving the application of these algorithms in other fields of computer science. They compared, from the security point of view, some encryption algorithms used in image encryption, such as AES, RSA, genetic algorithms, and the affine transformation with XOR operation. They also mentioned the ranking of each algorithm depending on its safety level. To measure the safety level, they considered parameters like peak signal-to-noise ratio, Mean Square Error (MSE), Normalized Absolute Error (NAE), average difference, structural content, and maximum difference. AES turned out to be more secure, as its MSE showed the highest variance between the real image and the encrypted image. They also compared AES with the PRESENT cipher in terms of performance on inexpensive smartphones. The size of the S-Box in AES is reduced by a basis transformation from GF(2^8) to GF(2). This conferred a 64% reduction in hardware utilization for the S-Box and a 51% area reduction in comparison with the previous LUT approach. For the performance evaluation of DES and AES, they included problems in encryption like simulation time, memory utilization, and one-bit variation; AES was comparatively better than DES at providing an improved security level. They also worked on one more field, an extended and enhanced version of AES referred to as QAES, which combines a quantum encryption method with dynamic S-Boxes. They showed that QAES produces more critical keys, which are much harder for hackers to guess than the keys produced by AES. Finally, they concluded that AES is the best algorithm to choose for integration with quantum technology because of its resistance to different attacks.
ix. Gaylord O. Asoronye et al. [9] discussed the Caesar cipher. For their cryptanalysis work, they implemented a Caesar cipher with the encryption rule, ciphertext = plaintext + (key mod 26), which produces the ciphertext, and the decryption rule, plaintext = ciphertext − (key mod 26), which gives back the plaintext from the ciphertext. The cipher was designed in the C programming language with file operations. As they used shifting of alphabets in the encryption process, they used modular arithmetic with 26 to obtain a numeral less than or equal to 26. x. Suvraneel Chatterjee et al. [10] worked on the cryptanalysis of block ciphers against differential attacks. Their goal was, for any given block cipher, to come up with a technique to inspect the minimum number of active S-Boxes for each round using the GLPK (GNU Linear Programming Kit) solver and to discover the minimum number of rounds needed for the block cipher to be resistant to differential cryptanalysis (DC). They performed the cryptanalysis using Mixed Integer Linear Programming (MILP). They then found the differential probability of the S-Box from the differential characteristics using the DDT (difference distribution table) and consequently determined whether the considered cipher is resistant to DC. They used GLPK to solve the MILP equations. xi. Mahdi Nikooghadam et al. [11] worked on the Telecare Medical Information System (TMIS). TMIS is designed to provide an online platform for communication between patients and medical staff. Since such systems are placed on the Internet, they are susceptible to various reliability and credential attacks. So, a secure channel is required between patients and medical representatives, where both parties confirm their identity and a session key is shared for later communication. At that time, an ECC (Elliptic Curve Cryptography)-based anonymous authentication and key agreement technique for healthcare applications had been proposed by Ostad-Sharif et al. In this paper, they showed that Ostad-Sharif's model was vulnerable to the key compromise password-guessing attack and the key compromise impersonation attack. Then they fixed the issues with their proposed solution. They came up with a safe and effective authentication and key agreement model for TMIS, which is resistant to the two attacks mentioned above as well as to insider attack and replay attack. xii. Aiman Al-Sabaawi et al. [12] showed a detailed cryptanalysis of the Vigenère cipher with no errors; this paper also helps researchers to better understand the mathematical formulas used to analyze the vulnerabilities and weaknesses of cryptosystems. They divided the whole cryptanalysis work into three steps: discover the key; determine the length of the key; and observe the elements in the key. The index of coincidence (IC) and the keyword length can be utilized to perform all three steps. They defined the mathematical formula to find the value of the IC. This produces 100% accurate results with no errors. xiii. Nodir Rasulovich Zaynalov et al. [13] presented the frequency distribution analysis of Cyrillic letters in texts in the Uzbek language (the Uzbek language is based on Latin characters). For the content, they used the
texts in the official web resources. Their work was based on the archival and aesthetic work "The Last Bullet" by the eminent Uzbek author of the decade, Tohir Malik. Based on the executed tests, they obtained the frequency percentages in alphabetical order as well as in descending order of frequency. In this paper, they presented the outcomes of finding the character incidence rate of the Uzbek native tongue in contrast to other native tongues. xiv. Priyanka Jojan et al. [14] discussed a quantum class of differential cryptanalysis (DC) which claims to speed up the algorithm quadratically over the existing classical one. The security of cryptography depends on factoring or extracting discrete logarithms in polynomial time on quantum computers. As differential cryptanalysis is more effective, they proposed a quantum class of DC methods and succeeded in reducing the computation time from exponential to polynomial time. The data processing complexity of their proposed classical DC reaches O(NK). They have shown that the number of queries needed to discern the cipher key can be lessened by applying Grover's search algorithm in either full or partial quantum search. xv. Fusen Wang et al. [15] showed that conventional chaotic encoding methods are not resistant to known plaintext attacks (KPA) using deep learning. Considering image restoration as the decryption process, they applied a Convolutional Neural Network (CNN) to execute KPA on chaotic encryption systems. They designed a network to model chaotic encryption systems and used the trained network as a decoder. They went through three existing chaotic encryption models as targets. Test outcomes show that deep learning can be employed fruitfully for KPA against chaotic cryptosystems. This paper imparts a new idea for the cryptanalysis of chaotic cryptosystems. The two claimed advantages of their proposal are as follows:
1. A neural network can be employed for cryptanalysis of different chaotic cryptosystems.
2. The suggested scheme is notably suitable and cheap.
xvi. In paper [16], Kai Zhang et al. presented ULC, which is a lightweight block cipher. This cipher has many advantages in terms of storage usage, efficiency, and safety for IoT. In this paper, they proposed a method for a slide attack on full ULC in a related-key setting. They proposed a key recovery attack on ULC depending on two characteristics of ULC. The first characteristic is shown to demonstrate the character of a slid pair of ULC. The second characteristic is included to establish a connection between some round key bits and some master key bits. In this attack method, they showed that all 80 master key bits can be retrieved. So, this paper strongly points out that ULC is vulnerable to slide attacks in related-key settings. xvii. Machine Learning (ML) and cryptanalysis are two overlapping things in recent days [17]. Applying ML in cryptanalysis is not a new thing. As the data size is increasing day by day, the involvement of ML techniques in cryptography and cryptanalysis is strongly required. ML can learn from the data and
can generate automatic analytical models. ML is being used in cryptosystems to learn the correlation between the input and output data generated by cryptosystems. ML techniques (mutual learning, boosting) can generate private cryptographic keys. Different ML algorithms such as Naive Bayes, Support Vector Machine (SVM), and AdaBoost are used to classify encrypted data and objects in steganography. Besides its use within cryptosystems, ML can also be applied to cryptanalysis itself. In this paper, we have mentioned two works in which the cryptanalysis is based on ML techniques: DES-16 cryptanalysis using BFA (2016) and the application of deep learning to known-plaintext attacks on chaotic image encryption models (2022). Different nature-inspired optimization algorithms based on ML are frequently used to find the key within the huge key search space.
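As a concrete illustration of the shift-cipher arithmetic reviewed in item ix, the following minimal Python sketch implements Caesar encryption and decryption with the mod-26 rule and recovers the key by exhaustive search. It is only an illustrative sketch under assumed inputs (the sample plaintext and key are invented for the example), not code from the reviewed paper, which used C with file operations.

```python
# Minimal Caesar cipher sketch: encryption, decryption, and brute-force key recovery.
ALPHABET = "abcdefghijklmnopqrstuvwxyz"

def caesar_encrypt(plaintext, key):
    # ciphertext letter = (plaintext letter + key) mod 26; non-letters pass through unchanged
    return "".join(ALPHABET[(ALPHABET.index(c) + key) % 26] if c in ALPHABET else c
                   for c in plaintext.lower())

def caesar_decrypt(ciphertext, key):
    # plaintext letter = (ciphertext letter - key) mod 26
    return "".join(ALPHABET[(ALPHABET.index(c) - key) % 26] if c in ALPHABET else c
                   for c in ciphertext.lower())

def brute_force(ciphertext):
    # Exhaustive cryptanalysis: only 26 candidate keys exist.
    return [(k, caesar_decrypt(ciphertext, k)) for k in range(26)]

if __name__ == "__main__":
    cipher = caesar_encrypt("attack at dawn", 7)   # assumed example message and key
    print(cipher)
    for key, guess in brute_force(cipher):
        print(key, guess)
```

Because the key space contains only 26 values, trying every shift and inspecting (or frequency-scoring) the candidate plaintexts is sufficient, which is why such ciphers are typically paired with frequency-based cryptanalysis in the works surveyed above.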
4 Analysis of the Recent Cryptanalysis Techniques
In this section, we denote Time Complexity by TC and Space Complexity by SC.
i. Paper [1]: TC: O(2.2^(N/2)), N = key size in bits; SC: O(1). Cryptanalysis method used: Linear Cryptanalysis (Matsui's algorithm), linear approximation.
ii. Paper [2]: Attack model: keylogging, Trojan horses; method used: black-bag cryptanalysis. Proposed solution: visualization, time-dependent one-time-password protocol, password-dependent authentication protocol; TC: O(N), N = number of characters in the OTP; SC: O(1).
iii. Paper [3]: TC (key search space): O(2^128); attack model: Differential Power Analysis, Differential Fault Analysis; cipher used: DRECON; platform: SASEBO GII.
iv. Paper [4]: TC: O(N^2 T), N = population size, T = number of iterations; SC: O(1). Cryptanalysis technique: GA (Genetic Algorithm), BFA (Binary Firefly Algorithm), based on RMSD (Root Mean Square Difference).
v. Paper [5]: TC: O(e^sqrt(log(n) loglog(n))); SC: O(e^sqrt(log(n) loglog(n))). Cryptanalysis technique used: Quadratic Sieve algorithm.
vi. Paper [6]: TC: O(log2(n/2^n)); SC: O(1).
vii. Paper [7]: TC: O(log2(n/2^n)); SC: O(1).
viii. Paper [8]: TC: AES-128: 2^48.
ix. Paper [9]: TC: O(n*n!); SC: O(1); implemented in the C programming language using the FILE structure.
x. Paper [10]: TC: O(e^N); tool used: GLPK (GNU Linear Programming Kit); cryptanalysis techniques: MILP (Mixed Integer Linear Programming), DP (Differential Probability).
xi. Paper [11]: TC: O(sqrt(N)); SC: O(1); tool used: Scyther tool.
xii. Paper [12]: TC: O(26^K), K = length of the key, plus the time to compute the IC (index of coincidence); SC: O(M), M = length of the alphabet; tool used: CrypTool 2.1; cryptanalysis techniques: frequency analysis, index of coincidence.
xiii. Paper [13]: TC: O(N); SC: O(1); cryptanalysis technique used: frequency analysis.
xiv. Paper [14]: TC: O(NK) [classical approach], O(sqrt(N)) + O(sqrt(K)) [quantum approach]; SC: O(N^(1/2)); cryptanalysis techniques used: quantum partial search algorithm, quantum counting algorithm, quantum maximum finding algorithm.
xv. Paper [15]: TC: O(MLN), M = number of neurons in each layer, L = activation function complexity, N = number of layers; cryptanalysis technique: Known Plaintext Attack (KPA); tool used: Convolutional Neural Network (IDEDNet).
xvi. Paper [16]: Data complexity: 2^32; TC: 2^63; SC: 2^32.
5 Conclusion In this paper, we have reviewed recent trends in cryptanalysis over the last 8 years, analyzed the experimental results, and summarized their complexity in terms of space and time. In this survey, we found that some of the works are really fruitful, whereas others are rather weak. For example, in the cryptanalysis of the Vigenère cipher [12], the main drawback is that the key is assumed to be known beforehand. In [8], the authors have shown, through various comparisons and analyses of encryption schemes, that AES is the most appropriate encryption algorithm to associate with quantum technologies to build a more powerful encryption algorithm that is very difficult to break at present. Other researchers proposed enhancements of previous works which they found to be risky from a security point of view [7, 11]: the cryptanalysis and enhancement of a two-party authentication key exchange model [7] and of a secure authentication and key agreement model [11] build on the earlier works and extend them to an advanced level. In recent times, machine learning has made cryptanalysis more efficient. The results obtained from the DES-16 cryptanalysis using BFA [4] proved the efficient application of machine learning in cryptanalysis, and such applications make cryptanalysis easier on complex cryptosystems.
References 1. Kokes J, Lorencz R (2015) Linear cryptanalysis of Baby Rijndael. In: Fourth international conference on e-technologies and networks for development (ICeND), pp 1–6 2. Divya R, Muthukumarasamy S (2015) An impervious QR-based visual authentication protocols to prevent black-bag cryptanalysis. In: IEEE 9th International conference on intelligent systems and control (ISCO), pp 1–6 3. Patranabis S, Roy DB, Mukhopadhyay D (2016) Using tweaks to design fault resistant ciphers. In: 29th International conference on VLSI design and 2016 15th international conference on embedded systems (VLSID), pp 585–586 4. Amic S, Soyjaudah KMS, Mohabeer H, Ramsawock G (2016) Cryptanalysis of DES-16 using binary firefly algorithm. In: IEEE International conference on emerging technologies and innovative business practices for the transformation of societies (EmergiTech), pp 94–99 5. Shende V, Sudi G, Kulkarni M (2017) Fast cryptanalysis of RSA encrypted data using a combination of mathematical and brute force attack in distributed computing environment. In: IEEE International conference on power, control, signals and instrumentation engineering (ICPCSI), pp 2446–2449 6. Choudhury AJ, Sain M (2017) Cryptanalysis of a novel user-participating authentication scheme. In: International conference on inventive computing and informatics (ICICI), pp 963–967 7. Park K, Lee K, Park Y (2018) Cryptanalysis and improvement of an efficient two-party authentication key exchange protocol for mobile environment. In: International conference on electronics, information, and communication (ICEIC), pp 1–2 8. Abdinurova N, Kynabay B (2018) Revealing encryption algorithm for integrating with quantum technologies by using cryptanalysis. In: 14th International conference on electronics computer and computation (ICECCO), pp 1–2 9. Asoronye GO, Emereonye GI, Onyibe CO, Ibiam A (2019) An efficient implementation for the cryptanalysis of Caesar’s cipher. Melting Pot 5(2):101–109 10. Chatterjee S, Nath Saha H, Kar A, Banerjee A, Mukherjee A, Syamal S (2019) Generalized differential cryptanalysis check for block ciphers. In: IEEE 10th Annual information technology, electronics and mobile communication conference (IEMCON), pp 1137–1140 11. Nikooghadam M, Amintoosi H (2020) An improved secure authentication and key agreement scheme for healthcare applications. In: 25th International computer conference, computer society of Iran (CSICC), pp 1–7 12. Al-Sabaawi A (2020) Cryptanalysis of Vigenère cipher: method implementation. In: IEEE Asia-Pacific conference on computer science and data engineering (CSDE), pp 1–4 13. Zaynalov NR, Narzullaev UK, Rahmatullayev IR, Abduhamid o’g’li AS, Muhamadiev AN, Qilichev D (2021) Analysis of the frequency of letters of the Uzbek language based on the cyrillic alphabet. In: International conference on information science and communications technologies (ICISCT), pp 1–4 14. Jojan P, Soni KK, Rasool A (2021) Classical and quantum based differential cryptanalysis methods. In: 12th International conference on computing communication and networking technologies (ICCCNT), pp 1–7 15. Wang F, Sang J, Huang C, Cai B, Xiang H, Sang N (2022) Applying deep learning to known-plaintext attack on chaotic image encryption schemes. In: ICASSP IEEE International conference on acoustics, speech and signal processing (ICASSP), pp 3029–3033 16. Zhang K, Lai X, Wang L, Guan J, Hu B (2022) Slide attack on full-round ULC lightweight block cipher designed for IoT. 
In: Security and communication networks, vol 2022, Article ID 4291000, 8 p 17. Anees A, Hussain I, Khokhar UM, Ahmed F, Shaukat S (2022) Machine learning and applied cryptography. In: Security and communication networks, vol 2022, Article ID 9797604, 3 p
A New Approach to Pharmaceutical Product Verification Using Barcode and QR Code Prithwish Kumar Pal and Malay Kule
Abstract The rapid growth of the pharmaceutical industry, especially after the outbreak of the COVID-19 virus, has been accompanied by a massive threat to modern economies and public health from counterfeit medicines. Medical emergency situations like a pandemic will continue to have an impact on the illicit trade in counterfeit and substandard medicines, as there is an exponential increase in demand for medicines, test kits, protective equipment, etc. (Interpol General Secretariat in Covid-19: the global threat of fake medicines, 2020, [1]). Currently available anti-counterfeiting solutions like holograms, color-shifting inks, embedded codes, images, and dyes depend mostly on customer awareness. These techniques require additional costs for awareness programs and periodic changes of the elements to avoid forgery. This paper addresses these problems by introducing barcodes or QR codes to cross-verify pharmaceutical products. The barcode printed on the packaging is the key parameter used to query the manufacturer's database through an internet-facing interface and verify the product details. The basic requirements for the proposed solution to work are internet connectivity and a multimedia device with a QR code scanner application. Keywords COVID-19 · Hashing · Salting · Parameterized query · Barcode · QR code
1 Introduction Along with the retail pharmaceutical business, e-commerce has today become an everyday companion of our day-to-day life. Many of the e-commerce brands have started, or at least stepped into, the doorstep medicine delivery business. With digital currency
P. K. Pal (B) St. Thomas' College of Engineering and Technology, Kolkata, India e-mail: [email protected]
M. Kule Indian Institute of Engineering Science and Technology, Shibpur, India e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 R. Sarkar et al. (eds.), Proceedings of International Conference on Frontiers in Computing and Systems, Lecture Notes in Networks and Systems 690, https://doi.org/10.1007/978-981-99-2680-0_20
and lucrative offers, purchases of pharmaceutical products online or through other communication channels are increasing at an exponential rate. These practices have led to the production, injection, and sale of counterfeit products in the supply chain and in retail stores [2]. The magnitude of the drug-counterfeiting problem is difficult to gauge, as the production and sale of counterfeit drugs are generally identified only when a formal complaint is lodged and the perpetrators are caught. Thus, a user-friendly solution combining social awareness and technology is required to identify counterfeit products, especially in the pharmaceutical industry. In this paper, we cover one such proposed technique, which scans the QR code or barcode printed on the package to cross-check the authenticity of the purchased products. The proposed solution is not restricted to the supply chain or the pharmacist; rather, it is available to everyone from the manufacturer to the consumer. The qualifying criteria for the proposed solution are internet connectivity, a QR code or barcode scanner application, and a mobile device. The rest of this paper is organized in three further sections. Section 2 describes the literature survey, followed by the proposed model of pharmaceutical product verification in Sect. 3. Section 4 concludes the paper.
2 Literature Survey In 2016, trade in counterfeit pharmaceuticals reached about 4.4 billion dollars, causing economic as well as public health damage [3]. One of the studies from the WHO revealed that 48.7% of the reported counterfeiting cases occurred in Western Pacific countries, 18.7% in Africa, 13.6% in Europe, and 1% in the U.S., and these numbers are increasing annually [4–6]. The main sources of counterfeit medicines are online stores and legitimate supply chains. Increased access to the internet, combined with new methods of manufacturing and distributing illegal pharmaceutical products, has created a major problem in safeguarding the trusted supply chain. A report published in 2019 by OECD/EUIPO indicates that, out of 97 recorded product categories in the seizure list, pharmaceutical products were the tenth most counterfeited product. Figure 1 below shows that, between 2014 and 2018, there was an increase of 102% in cases related to pharmaceutical crimes [7, 8]. Another study from 2018 confirms that the prevalence of falsified medicines is higher in lower- and middle-income countries, at around 13.6%, with the highest prevalence of falsified medicines in Africa (18.7%) and Asia (13.7%) [7]. A detailed review of the customs data confirms that lifesaving drugs, antibiotics, painkillers, and medicines for malaria, diabetes, epilepsy, heart disease, blood pressure, allergy, cancer, ulcers, etc., are the most targeted by counterfeiters [9, 10]. Figure 2 represents statistics on the most counterfeited types of pharmaceuticals seized by customs from 2014 to 2016. Counterfeiting applies to both branded and generic drugs; in fact, generic drugs are often confused with counterfeit drugs. The World Health Organization has estimated that around 10% of global pharmaceutical commerce, or a $21 billion business, comes from counterfeit drugs [4].
Fig. 1 Number of total incidents by year from 2014 to 2018
Fig. 2 Most counterfeit types of pharmaceuticals seized by customs from 2014 to 2016
Many anti-counterfeiting technical measures have been adopted by pharmaceutical companies to ensure the distribution of authentic products in online business as well as in the supply chain. Of these, holograms, color-shifting inks, embedded codes, images, dyes, etc., are the most prevalent. These techniques allow pharmacists to identify suspicious medicines or other pharmaceutical products [4]. Track and trace is another anti-counterfeiting approach in which each stock unit is assigned a unique identity that remains with the stock throughout the supply chain until it is consumed. Another technique, called drug pedigree, is a paper-document or electronic-file technique that records the distribution of prescription drugs from the manufacturer to the dispenser, i.e., a pharmacy or physician. Mass serialization is yet another technique in which a unique identity is generated, encoded, and can be verified by an individual; however, a specific bottle of a particular drug cannot be authenticated [11–14].
Although the pharmaceutical industry has adopted several such techniques over the years, there is no method to validate the product details directly with the manufacturer. The techniques depend largely on awareness and knowledge of holograms, color-shifting inks, etc., which end users may not have, especially if they are buying the product for the first time. The focus of this article is to enable validation of pharmaceutical products by any member of the supply chain as well as by e-commerce customers, pharmacists, doctors, and retail customers before purchase or usage.
3 Proposed Model of Pharmaceutical Product Verification The application design presented in this paper introduces a way to identify and separate out counterfeit products in the supply chain and at the customer level. The proposed design is divided into two segments: generation of the barcode, and retrieval of the required product-related information at the customer end. Optionally, the barcode can be embedded in a URL, which in turn can be encoded into a QR code, offering a more user-friendly way to use the proposed solution.
3.1 Generation of Bar Code The barcode is generated by applying a hashing mechanism to a salt value along with unique information that represents the actual product. Here, standardization of the data plays a critical role; for example, if the product code is numeric, then the pattern should be the same for all products. Moreover, all parameter values should be converted to uppercase before applying the hashing mechanism. To generate the hash value from the supplied information, a standard algorithm like SHA-256 can be used [8, 15]. The barcode is generated by applying the formula below:
Barcode = Hash_SHA-256(Salt value + Product id + Batch number + Manufacturing unit number + Expiry date + Manufacturing time stamp)
For the proposed solution to function properly, the manufacturer needs to maintain a database comprising one or more tables whose primary keys are the product id and the manufacturing unit id; these act as indexing parameters for the actual information such as the product name, composition(s), manufacturing unit details, etc. While creating the hash, the product id and manufacturing unit id, i.e., the primary keys, are used as parameters rather than the actual data. Once the hash value is generated, it is stored in the database along with the other information. This hash value is used as the barcode that is printed on the products or packages. In the product information retrieval phase, the query parameter, i.e., the barcode, which is in turn the hash value, can be matched with the existing hash
Fig. 3 The bar code generation process
value stored in the backend database to check its correctness. If the value is valid, the actual product-related information can be retrieved from the product database(s). The barcode can be printed on the product package, or a parameterized URL can be generated and embedded in a QR code that is printed on the package. Although the QR code is optional, it is a user-friendly way to use this solution, because manually entering a barcode may not be convenient for users. Because of the embedded URL, the chances of searching for information from a forged URL are reduced to a certain extent. Figure 3 depicts the barcode generation process and Fig. 4 shows the interaction with the database to generate the barcode.
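The hashing step described above can be illustrated with a short Python sketch. The field values, salt, and helper name below are hypothetical placeholders introduced only for this example; SHA-256 is used as suggested by the scheme, and the exact concatenation order is an assumption based on the formula above.

```python
import hashlib

def generate_barcode(salt, product_id, batch_number, unit_number, expiry_date, timestamp):
    # Standardize the data before hashing: every parameter is upper-cased,
    # as recommended in the scheme, then concatenated with the salt.
    fields = [product_id, batch_number, unit_number, expiry_date, timestamp]
    message = salt + "".join(str(f).upper() for f in fields)
    return hashlib.sha256(message.encode("utf-8")).hexdigest()

# Hypothetical product record used only for illustration.
barcode = generate_barcode(
    salt="S3CR3T-SALT",
    product_id="PRD0001",
    batch_number="B2023-07",
    unit_number="MU-12",
    expiry_date="2025-12-31",
    timestamp="2023-07-01T10:15:00",
)
print(barcode)  # 64-character hex digest printed on the package or embedded in a URL/QR code
```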
3.2 Scan and Retrieval of Required Information Any user in the supply chain, a pharmacist, or the end user has to use a multimedia mobile device with internet connectivity to scan the QR code or to search the internet with the barcode printed on the package. On scanning the QR code, the embedded URL is displayed. On clicking the URL, the user is redirected to the browser application, where the detailed product-related information is displayed. For self-awareness, it is suggested to check the URL and domain name that is embedded in the QR code. Chances are that fake companies maintain a website that
Fig. 4 Interaction with the database to generate the barcode
looks the same as the actual one, so as to convince the user to visit the fake website and obtain information that may not be true. If the product does not have a QR code, the user needs to search for the genuine product-details HTML page, where the barcode can be entered to get the actual information. Again, in this approach, the user needs to check for the authentic URL so that the product details are not retrieved from forged web pages. Figure 5 shows the process of retrieving the product details using the barcode or the embedded URL; a minimal sketch of the server-side lookup is given after Fig. 5.
Fig. 5 Process to retrieve the product information
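A possible server-side verification step, matching the scanned value against the stored hash with a parameterized query (one of the keywords of this paper) so that the input is never concatenated into the SQL string, could look like the following sketch. The table name, column names, and sample record are assumptions made for illustration, not part of the original design.

```python
import sqlite3

def verify_barcode(conn, scanned_hash):
    # Parameterized query: the scanned value is bound as a parameter,
    # which prevents SQL injection through the barcode field.
    row = conn.execute(
        "SELECT product_id, product_name, batch_number, expiry_date "
        "FROM products WHERE barcode_hash = ?",
        (scanned_hash,),
    ).fetchone()
    return row  # None means the product could not be verified

# Minimal demonstration with an in-memory database and one hypothetical record.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (product_id TEXT, product_name TEXT, "
             "batch_number TEXT, expiry_date TEXT, barcode_hash TEXT)")
conn.execute("INSERT INTO products VALUES (?, ?, ?, ?, ?)",
             ("PRD0001", "ExampleMed 500mg", "B2023-07", "2025-12-31", "abc123"))
print(verify_barcode(conn, "abc123"))   # genuine: record returned
print(verify_barcode(conn, "forged"))   # counterfeit or unknown: None
```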
4 Conclusion Currently available anti-counterfeiting solutions used for pharmaceutical products depend on customer awareness and require additional costs for periodic upgrades or changes of the anti-counterfeiting elements. The proposed anti-counterfeiting technique will not only help customers while they are purchasing or using a pharmaceutical product but will also help to identify and remove counterfeit products from the supply chain at any point in time. This will not only save the business houses and pharmacists from reputation loss but also spare them financial and legal hassles. The proposed technique eliminates the mandatory dependency on upgrading or changing the anti-counterfeiting elements. Moreover, the proposed technique is simple and easy to use, as most people have a smartphone and internet connectivity available to them. By scanning the QR code and clicking on the embedded link, the requested information is delivered to the end user or pharmacist, giving the user the satisfaction of having an authentic product. The technique can be extended to auxiliary industries that are directly or indirectly related to the pharmaceutical industry, for example, surgical products, baby foods, etc., and it can also be implemented in the FMCG sector dealing with packaged foods.
References 1. Interpol General Secretariat (2020) Covid-19: the global threat of fake medicines 2. Smith Y Counterfeit medications. News Medical Life Sciences 3. The OECD Task Force on Countering Illicit Trade (2020) The Covid-19 pandemic and illicit trade in fake medicines, 10 June 2020 4. Williams L, McKnight E (2014) The real impact of counterfeit medications. US Pharm 5. Kon SG, Mikov M (2011) Counterfeit drugs as a global threat to health. Med Pregl 64(5– 6):285–290. https://doi.org/10.2298/mpns1106285g 6. https://www.who.int/news-room/fact-sheets/detail/substandard-and-falsified-medical-pro ducts 7. OECD/EUIPO (2020) Trade in counterfeit pharmaceutical products, illicit trade. OECD Publishing, Paris 8. Kahate A Cryptography and network security. Tata McGraw-Hill Publishing Company Limited 9. US Food and Drug Administration (2022) Counterfeit medicine 10. https://www.fiercepharma.com/special-report/top-counterfeit-drugs-report 11. Bansal D, Malla S, Gudala K (2013) Anti-counterfeit technologies: a pharmaceutical industry perspective. Sci Pharm 81(1):1–13 12. Blackstone EA, Fuhr JP Jr, Pociask S (2014) The health and economic effects of counterfeit drugs. Am Health Drug Benefits 7(4):216–224 13. Bayer Global, Background information on counterfeit drugs 14. Pfizer, Fake drugs 101: facts on illegal, counterfeit drugs 15. Stallings W Cryptography and network security principles. Pearson Publication 16. National Center for Emerging and Zoonotic Infectious Diseases (NCEZID)
SmartGP: A Framework for a Two-Factor Graphical Password Authentication Using Smart Devices Palash Ray, Rajesh Mukherjee, Debasis Giri, and Mahuya Sasmal
Abstract Authentication procedures are employed to verify and validate a user's identity on various smart devices and in a variety of digital apps. Alphanumeric and Personal Identification Number (PIN)-based passwords are easy to remember and replicate; however, they are vulnerable to different types of attacks. As an alternative to traditional passwords, Graphical Passwords (GP) have gained popularity as they are harder to crack, more user-friendly, and offer a larger password space. On the other hand, Two-Factor Authentication (2FA) provides an extra layer of protection to the system by requiring additional login credentials. Nowadays, 2FA is widely used and gaining popularity. This paper introduces a novel and reliable user authentication method for Android smartphones, namely SmartGP, which is deployed with a smartwatch that acts as the second factor during authentication. SmartGP is a robust, usable, and secure 2FA graphical authentication system with an average accuracy of 94.99% and an average login time of 24.19 s. Keywords Graphical password · Authentication · Smartwatch · Emoji
1 Introduction With the advancement of affordable digital technology, the usage of smart devices has proliferated, and user identification has become a primary concern. As humans respond to and process graphical content better than text, Graphical Passwords (GP) for user authentication can be considered an efficient, cost-effective, and convenient approach. GP can be regarded as a better alternative to text-based authentication systems [9, 12]. To enhance security further, a multi-factor authentication strategy is another approach that involves more than one factor to validate the user's authenticity.
P. Ray (B) · R. Mukherjee · M. Sasmal Haldia Institute of Technology, Haldia, WB, India e-mail: [email protected]
D. Giri Maulana Abul Kalam Azad University of Technology, Kolkata, WB, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 R. Sarkar et al. (eds.), Proceedings of International Conference on Frontiers in Computing and Systems, Lecture Notes in Networks and Systems 690, https://doi.org/10.1007/978-981-99-2680-0_21
Fig. 1 Block diagram of proposed SmartGP, offers multi-factor authentication scheme
The simplest type of authentication is single-factor authentication, where an individual's identity is confirmed via a password. Biometric-based authentication methods alone are typically insufficient for protecting personal smart devices, since attackers could simply steal or damage the device and swiftly access the sensitive data on it. Another direction of user authentication is the multi-factor authentication strategy, which involves more than one factor (such as a password along with a smart device or biometrics). Multi-factor authentication is undoubtedly more secure than single-factor authentication; nevertheless, it is subject to various threats, and it can be highly risky if all the user credentials are kept on a single smartphone (e.g., password and SMS). The proposed SmartGP (Fig. 1) is a secure, two-factor graphical authentication scheme designed for smartphones. It requires users to memorize emoji icons as their portfolio images and provide them during authentication. In order to use the second-factor authentication method, a user needs to pre-register a smartwatch. The user is asked to select an object on the registered smartwatch and enter the same object's name as a string on the smartphone. Our main purpose is to develop a graphical user authentication system for smartphones that is more secure and resistant to shoulder-surfing attacks. The objectives are:
– Design and implement a graphical authentication system that balances security and usability and uses a smartphone and a smartwatch.
– A smartwatch-based second-factor authentication method that detects an object from a smartwatch and inserts the name of that object as a string into the smartphone.
– Test the proposed scheme in an environment which is vulnerable to shoulder-surfing attack.
2 Related Works In 1996, a patent filed by Blonder [1] was the first step to Graphical Password Authentication (GPA), which appeared as an alternative to alphanumerical passwords. This demonstrated that humans are better adapted to perceive previously seen images than to recall difficult-to-remember alphanumeric passwords [8, 10]. Shoulder-surfing might also be carried out using advanced recording technologies that capture the user’s screen activity and create a video of it [2]. Similarly, Panda et al. [7] proposed a scheme named “SGP” based on patterns and password images. A graphical user interface-assisted pin-based architecture was proposed by Srinivasan [13], which restricts users to providing pass pins only through the interface provided by the system to prevent shoulder-surfing attacks. An empirical study investigated the influence of cultural familiarity on the choice of images for GPA [11]. Zhao et al. [15] empirically studied how the size, user interface, and authentication schemes affected the accuracy, speed, and security of touch-based smartwatch authentication. Sun et al. offered authentication via an app named PassApp loaded on the device [14], which eliminated the necessity for registration and memorizing passwords. Their solution prevented the shoulder-surfing attack, but the system became vulnerable to dictionary attacks. Chu et al. developed a 2FA scheme [3], where the user was challenged with 9 different pages and with a text password. The “TapMeIn” [6], a two-factor authentication scheme for smartwatches, allowed users to authenticate by tapping a recognizable melody anywhere on the screen. Guerar et al. [5] introduced CirclePIN, a cutting-edge smartwatch-specific authentication approach endowed with both resilience to the most frequent threats and a high degree of usability.
3 Proposed Methodology: SmartGP We propose a two-factor graphical password authentication scheme, namely SmartGP, to improve the usability and security of smartphones. The first authentication phase is conducted on a smartphone, and the second factor is carried out via a smartwatch. The proposed SmartGP is represented as

SMGP = f(Sei, Soi, Pi, St, ICGR, SPC, SWC)    (1)

where Sei represents the set of all emoji icons and Soi represents the set of object emojis. Next, Pi is a pass-icon vector, which is the collection of emojis used during the registration and authentication phases. St is a string-matching entity used during second-factor authentication. ICGR stands for the icon-grid that is used for registration and authentication. Finally, SPC and SWC are the graphical challenges on the smartphone and smartwatch, respectively. The proposed system is conceptualized in Fig. 1. The overall system consists of four phases.
Fig. 2 SmartGP Authentication phase in Smartphone and Smartwatch
– Smartphone GUI and the Android Wear GUI: Android Studio is used to develop the GUIs of the smartphone and the smartwatch.
– Web-portal UI: HTML and CSS are used to create a web page; PHP and MySQL are used for server-side programming.
– Web API to connect the smartwatch and mobile apps with the server: the website must be connected to the smartphone and smartwatch interfaces for user (self) authorization.
– Registration and authentication techniques: the interface is made more secure and resilient by using the SmartGP authentication algorithms.
Icons Collection: All the emoji icons are obtained from https://icons8.com and added to our database. These icons have been scaled down to 64 × 64 pixels and are organized by level of complexity. Approximately 200 simple emoji symbols that are clear, intelligible, and memorable are used. For the smartwatch, we have selected 100 object icons that are simple to detect and recognize. These object icons are resized to 32 × 32 pixels and stored in our database.
Registration Phase: During the registration process, the user is asked to enter a valid username. In response, the system generates an icon-grid of size 5 × 20 from the server. The user must create his/her own pass-icon vector by choosing five emojis from the icon-grid and then submit it to the server. This pass-vector is saved on the server along with a hash value, and the user is registered successfully.
Authentication Phase: SmartGP enables a 2FA scheme for legitimate users. First, a user is asked to enter the username. After that, the server sends a graphical challenge (SPC) to the smartphone. This challenge consists of an icon-grid of size 5 × 8. The user needs to select the correct portfolio images from the icon-grid (Fig. 2); otherwise, the user is rejected. A challenge is simultaneously delivered to the user's pre-registered smartwatch. The user needs to open the SmartGP application on the smartwatch, choose a particular object icon, and send it to the server. Next,
Algorithm 1 SmartGP App: Registration Algorithm
Input: Enter username, i = 5
Output: Registered user
SVR ← Enter a valid Username
ICGR ← Registration Window            ▷ Icon-Grid
while i ≠ 0 do
    Pi = ICGR(m, n)                    ▷ m, n are the size of the Icon-Grid
    i = i − 1
    SVR ← Pi
end while
D ← P                                  ▷ Store the Pass-Icons in the database
return
the name of the object is to be typed on the smartphone application as a string and sent to the server (Fig. 2). The server checks the responses from the smartwatch and the smartphone. If a match is found, the user is authenticated; otherwise, the user is rejected.
Algorithm 2 SmartGP App: Authentication Algorithm
Input: Enter username, password challenge, two-factor authentication string, i = 5
Output: Authorized user
SVR ← Enter a valid Username
if Username is registered then
    SPC ← SVR                          ▷ SmartGP authentication challenge
else
    Display error and return to the main login screen
end if
ICGR ← SPC                             ▷ Icon-Grid in authentication
while i ≠ 0 do
    ICGR(m, n) ← Qi                    ▷ Input Pass-Icon in the Icon-Grid
    i = i − 1
    SVR ← Q
end while
if P ≠ Q then
    Invalid User
else
    Display the screen for the two-factor authentication challenge SPC
    SWC ← SVR                          ▷ Server creates the 2nd-factor query and sends it to the Smartwatch
end if
S ← EO                                 ▷ Emoji object name stored in string S
SVR ← S
if SPC(S) == SWC(T) then               ▷ T is the object name collected from the Smartwatch query
    User is authorized
else
    Invalid User
end if
return
Smartphone Challenge SPC: If a user enters a valid username, the server creates a graphical challenge (SPC) and sends it to the smartphone. First, an 8 × 5 empty grid is created. Then, the user's portfolio icons are placed randomly on the grid and their
positions are recorded. The remaining positions are filled with decoy emoji icons. Next, the emoji-grid is sent to the smartphone as a graphical challenge. The user must choose all the emojis that were chosen during registration. The response is correct if the user succeeds; otherwise it is not.
Algorithm 3 SmartGP Challenge Creation for the Smartphone
Input: Enter username, i = 5
Output: SmartGP challenge SPC has been created
If SVR ← valid username from Algorithm 2     ▷ First user response from the Smartphone
Icon-Grid (IGSP) ← SVR                       ▷ Server creates an 8 × 5 icon-grid for the challenge
Pi ← D
i = 5
while i ≠ 0 do
    IGSP ← R(Pi)                             ▷ Using the random function R, place Pass-Icons on IGSP
    i = i − 1
end while
i = 35
while i ≠ 0 do
    IGSP ← R(Fi)                             ▷ Fill with fake icons Fi
    i = i − 1
end while
SPC ← IGSP                                   ▷ The challenge for the smartphone has been created
Client Smartphone ← SPC
return
Smartwatch Challenge SWC: After a successful first authentication phase, the server sends a challenge to the pre-registered smartwatch. Four randomly chosen object icons, along with their labels, are selected from the database and placed into a 2 × 2 icon-grid. The labels are not displayed in the grid. The icons are sent to the smartwatch. The user needs to select one object icon out of the four and submit it to the server. The icon-grid on the smartwatch is shown in Fig. 2.
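Algorithm 3 above can also be read as the following short Python sketch, which places the five registered pass-icons at random positions of a 5 × 8 grid and fills the remaining 35 cells with decoy icons. The icon identifiers and decoy pool are hypothetical; this is only an illustration of the idea under assumed data structures, not the authors' implementation.

```python
import random

def build_smartphone_challenge(pass_icons, decoy_pool, rows=5, cols=8):
    # 5 pass-icons plus 35 decoys fill the 40-cell grid; the positions are recorded
    # so the server can later check the user's selections against them.
    cells = list(pass_icons) + random.sample(decoy_pool, rows * cols - len(pass_icons))
    random.shuffle(cells)
    grid = [cells[r * cols:(r + 1) * cols] for r in range(rows)]
    positions = {icon: divmod(cells.index(icon), cols) for icon in pass_icons}
    return grid, positions

pass_icons = ["smile", "dog", "pizza", "star", "tree"]       # assumed portfolio icons
decoys = [f"decoy_{i}" for i in range(200)]                  # assumed decoy library
grid, positions = build_smartphone_challenge(pass_icons, decoys)
print(positions)   # (row, column) of each pass-icon in the challenge grid
```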
4 Experimental Result and Usability Study To verify the SmartGP scheme, we have arranged 10 pairs of smartphones and smartwatches in our departmental laboratory. The SmartGP application is installed on the smartphones and smartwatches. A total of 30 volunteers (age: below 45 years) were chosen and consented to review our SmartGP procedure. The volunteers are mainly students of our institute. They are regular PC or mobile device users but unfamiliar with GPA. We briefly described GPA and its benefits to them, and the registration and authorization steps of the SmartGP system were explained and demonstrated to the volunteers.
Accuracy: We have computed the "First Accuracy" and "Total Accuracy" to measure the success rate. The first accuracy is defined as the number of successful attempts on the first try divided by the total number of attempts. It is observed that both the performance and
Table 1 Average time (s) required for registration and authentication
Session | Registration time (s) | Authentication time (s) | Average 2nd factor authentication time (s) | Total time (s)
First   | 24.12                 | 8.85                    | 15.47                                      | 24.32
Second  | Nil                   | 9.13                    | 14.94                                      | 24.07
accuracy are higher in the second-week session compared to the first-week session. In the first week, the total accuracy is 93.33%, while in the second week it increases to 96.66%; therefore, the average accuracy is 94.99%, implying that SmartGP is highly efficient.
Algorithm 4 SmartGP 2nd-Factor Challenge SWC Creation
Input: Select object emoji
Output: Challenge for the Smartwatch SWC; response received at SVR as SWC(T)
SVR ← SW                               ▷ Connect the app to the server; SW is assumed to be an authorized device
i = 4
while i ≠ 0 do
    IGSW ← Ie                          ▷ Ie are emojis from SVR; IGSW is a 2 × 2 icon-grid on the Smartwatch
    i = i − 1
end while
SWC ← IGSW
IGSW ← User selects R(Oe)              ▷ Select a random object emoji from the icon-grid
SVR ← Oe
T ← String(Oe)
SWC(T) ← T
return
Registration and Login Time: We have calculated the registration time and login time over two weekly sessions, as shown in Table 1. The average registration time of the 30 participants is 24.12 s. The login time is divided into two parts: the first-factor authentication time and the second-factor authentication time. Selecting the portfolio images on the smartphone is the first-factor challenge, and it requires 8.85 s on average in the first week and 9.13 s on average in the second week. Similarly, the second-factor authentication using the smartwatch requires 15.47 s in the first session and 14.94 s in the second session. Thus, the total average login time is 24.19 s.
5 Security Analysis
Random Guess Attack: Attackers use a random guess attack to try every conceivable combination of passwords until they find one that works. In our scheme, the user must choose five portfolio icons from the grid of 5 × 8 icons displayed by the SmartGP authentication system. Therefore, the probability of a random guess attack is calculated as 1/C(40, 5) = 1.519 × e−6.
Shoulder Surfing Attack: Nowadays, most people use their smart devices in public places, and for them shoulder-surfing attacks (SSA) are the greatest threat. SmartGP is a 2FA scheme, so even if the password is revealed to an attacker, it is not sufficient to log in to the system. The smartwatch receives four object icons, and the user needs to select at least one and submit it as text through the smartphone. In support of our study, we performed a quick experiment with a few participants. Eight of them were requested to act as victims (Group A), and the remaining eight acted as shoulder-surfing attackers (Group B). Participants in Group A were asked to enter their credentials and log in to SmartGP, while Group B participants watched them do so. As the smartwatch is a very small device with a tiny screen, none of the Group B participants was able to crack the SmartGP system, although the portfolio images of the Group A participants were revealed to the Group B candidates. Hence, SmartGP is an SSA-resistant GPA scheme with some limitations.
Password Space: SmartGP is a cognometric scheme where a user needs to select 5 emojis from a 5 × 20 icon-grid, i.e., from a total of 100 icons. Assume that the cognometric scheme includes an image library of size N and that the user's password contains at most r images. There are two kinds of possibilities [4] for the password images: orderly and disorderly. Theoretically, the orderly password space is

T_order = Σ_{r=1}^{i} P(N, r) = Σ_{r=1}^{5} P(100, r) ≈ 8.166 × e29    (2)

where we have a total of 100 emojis from which a user needs to select 5 images (i = 5).
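The random-guess probability quoted above can be checked with a two-line Python computation; this is only a numerical sanity check of the C(40, 5) setting, not part of the original paper.

```python
from math import comb

# 5 pass-icons must be guessed out of the 40 icons shown in the 5 x 8 challenge grid.
print(1 / comb(40, 5))   # approximately 1.519e-06, matching the value reported above
```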
6 Conclusion This paper presents SmartGP, a two-factor GPA method that is effective for smart devices. The SmartGP application poses a graphical challenge on an Android device for first-factor authentication, while the second-factor verification is done through a smartwatch. We have conducted a brief survey to evaluate the system's usability and security. The survey's findings demonstrate that SmartGP takes an average of 24.19 s to log into the system, and the average accuracy is 94.99%, which is fairly impressive for a two-factor authentication system. The password space we have calculated for SmartGP is also high, i.e., 8.166 × e29. Table 2 implies that SmartGP outperforms a few contemporary graphical password methods. Also, the proposed simple
Table 2 Comparison with other GPA schemes
Scheme | Type | Login time (s) | Accuracy (%) | Rest. to SS attack | Password space | Rand. guess
Smartwatch TouchBased (2017) [15] | Single | 2.5 | 75 | No | Limited | NA
SGP (2018) [7] | Single | 34 | 95 | Yes | C(9,4) | 0.008
TapMeIn (2018) [6] | 2FA behavioral | 2 | 98.7 | Yes | NA | 0.013–0.041
CirclePin (2020) [5] | Single | 4 | 99 | Yes | 10 ×e−4 | 219
SmartGP | 2FA | 24.19 | 96.66 | Yes | 8.166 ×e23 | 1.519 ×e−6
SmartGP system is easy to use and deploy, and has a reasonable security level, which will be improved further. Therefore, we have come to the conclusion that the SmartGP system is easy to implement, easy to use, and has a decent level of security. As a result, it may be used in place of text-based password methods for various mobile applications.
References 1. Blonder G (1996) Graphical passwords. in lucent technologies, Inc., Murray Hill, nj, us patent, ed. United States 2. Bošnjak L, Brumen B (2020) Shoulder surfing experiments: a systematic literature review. Comput Secur 99:102023 3. Chu X, Sun H, Chen Z (2020) Passpage: graphical password authentication scheme based on web browsing records. In: International conference on financial cryptography and data security. Springer, pp 166–176 4. Gao H, Jia W, Ye F, Ma L (2013) A survey on the use of graphical passwords in security. J Softw 8(7):1678–1698 5. Guerar M, Verderame L, Merlo A, Palmieri F, Migliardi M, Vallerini L (2020) Circlepin: a novel authentication mechanism for smartwatches to prevent unauthorized access to IoT devices. ACM Trans Cyber-Phys Syst 4(3):1–19 6. Nguyen T, Memon N (2018) Tap-based user authentication for smartwatches. Comput Secur 78:174–186 7. Panda S, Kumari M, Mondal S (2018) SGP: a safe graphical password system resisting shouldersurfing attack on smartphones. In: International conference on information systems security. Springer, pp 129–145 8. Rainer G, Miller EK (2000) Effects of visual experience on the representation of objects in the prefrontal cortex. Neuron 27(1):179–189 9. Rao MK, Santhi S, Hussain MA (2019) Multi factor user authentication mechanism using internet of things. In: Proceedings of the third international conference on advanced informatics for computing research, pp 1–5
10. Schacter DL, Reiman E, Uecker A, Roister MR, Yun LS, Cooper LA (1995) Brain regions associated with retrieval of structurally coherent visual information. Nature 376(6541):587– 590 11. Shaban AI, Zakaria NH (2018) The impact of cultural familiarity on choosing images for recognition-based graphical passwords. In: 2018 4th international conference on computer and information sciences (ICCOINS). IEEE, pp 1–5 12. Sinha A, Shrivastava G, Kumar P et al (2019) A pattern-based multi-factor authentication system. Scalable Comput Pract Exp 20(1):101–112 13. Srinivasan R (2018) Dragpin: a secured pin entry scheme to avert attacks. Int Arab J Inf Technol 15(2):213–223 14. Sun H, Wang K, Li X, Qin N, Chen Z (2015) Passapp: my app is my password! In: Proceedings of the 17th international conference on human-computer interaction with mobile devices and services, pp 306–315 15. Zhao Y, Qiu Z, Yang Y, Li W, Fan M (2017) An empirical study of touch-based authentication methods on smartwatches. In: Proceedings of the 2017 ACM international symposium on wearable computers, pp 122–125
Impact of Existing Deep CNN and Image Descriptors Empowered SVM Models on Fingerprint Presentation Attacks Detection Jyotishna Baishya, Prasheel Kumar Tiwari, Anuj Rai, and Somnath Dey
Abstract Automatic Fingerprint Recognition Systems (AFRS) are the most widely used systems for authentication. However, they are vulnerable to Presentation Attacks (PAs). These attacks can be placed by presenting an artificial artifact of a genuine user's fingerprint to the sensor of an AFRS. As a result, Presentation Attack Detection (PAD) is essential to assure the security of fingerprint-based authentication systems. The study presented in this paper assesses the capability of various existing deep-learning and machine-learning models. We have considered four state-of-the-art Convolutional Neural Network (CNN) architectures, namely MobileNet, DenseNet, ResNet, and VGG, as well as a Support Vector Machine (SVM) trained with image-descriptor features. The benchmark LivDet 2013, 2015, and 2017 databases are utilized for the validation of these models. The experimental findings indicate the superiority of deep CNN models in the cross-material scenario of PAs. Keywords Deep-learning · Machine-learning · Fingerprint biometrics · Presentation attacks
1 Introduction AFRS identify a person by taking an impression of their fingertip, which makes them fast and user-friendly. Nowadays, these systems can be spotted in various real-life applications such as Automated Teller Machines (ATMs), payment gateways,
J. Baishya · P. K. Tiwari · A. Rai (B) · S. Dey Department of Computer Science and Engineering, Indian Institute of Technology, Indore, Madhya Pradesh, India e-mail: [email protected]
J. Baishya e-mail: [email protected]
P. K. Tiwari e-mail: [email protected]
S. Dey e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 R. Sarkar et al. (eds.), Proceedings of International Conference on Frontiers in Computing and Systems, Lecture Notes in Networks and Systems 690, https://doi.org/10.1007/978-981-99-2680-0_22
Fig. 1 Visual comparison between live fingerprint, and spoofs made with different materials
National borders, etc., due to their speed and robustness. On the other hand, this wide range of applications makes them vulnerable to various attacks. One of them is the Presentation Attack (PA), which is carried out by presenting a manufactured artifact of a fingertip to the AFRS sensor. Fingerprint Presentation Attack Detection (FPAD) is a countermeasure for avoiding PAs. FPAD approaches are classified into two types, i.e., hardware-based methods and software-based methods. Hardware-based methods are costly due to the involvement of additional hardware such as temperature sensors, pulse meters, humidity sensors, etc. On the other hand, software-based methods require only fingerprint images as input, which makes them cost-effective as well as user-friendly. This paper focuses on software-based solutions to the problem of PAs (see Fig. 1). Software-based methods can be further categorized into machine-learning-based methods and deep-learning-based methods. In machine-learning-based methods [5, 11, 18], the authors utilize image descriptors to extract handcrafted or statistical features from the images. These features are further utilized for the classification of live and spoof fingerprint samples using machine-learning algorithms. These methods perform well when the training and testing samples are captured using the same sensing device. On the other side, the methods suggested in [3, 4, 16, 19] utilize state-of-the-art CNN classifiers for PAD. The results reported by them indicate that deep CNN-based methods perform well compared with machine-learning-based methods. FPAD methods are also required to be robust against spoof samples generated with unknown materials. In this work, we assess the performance of various existing CNN models and of an SVM empowered with features extracted by local image descriptors against PAs in intra-sensor FPAD, in the same-material and cross-material paradigms. We have utilized the MobileNet [9], DenseNet [10], VGG [12], and ResNet [8] architectures. The performance of the CNN architectures is evaluated by training them from scratch as well as by utilizing their versions pre-trained on the ImageNet database. These CNN models are capable of extracting minute features from the input fingerprint image. Similarly, for the assessment of machine-learning algorithms, image descriptors such as the Weber Local Descriptor (WLD) [5], Local Phase Quantization (LPQ) [18], and Binary Statistical Image Features (BSIF) [11] are utilized for feature extraction. The performances of the CNNs and image descriptors are compared.
2 Literature Review Presentation attacks are considered a severe security hazard to AFRS. It becomes even more challenging for an FPAD system to deal with PAs when the spoof sample is fabricated using novel materials. In recent years, various authors have suggested CNN- and ML-based methods for FPAD; some of them are discussed in this literature review. In [15], Nikam proposed an FPAD method that utilizes the Local Binary Pattern (LBP) as a feature extractor. LBP can detect smoothness, roughness, structural, and regularity differences in local areas of fingerprint samples. They used SVM, KNN, and a neural network to develop a hybrid classifier. Abhyankar et al. [2] suggested a wavelet-based technique for FPAD that uses the perspiration property of fingerprints to distinguish between live and spoof fingerprints. Espinoza et al. [6] proposed a sweat-pore-based FPAD method that utilizes the number of sweat pores to decide whether a fingerprint impression is live or spoofed. In [13], Marasco proposed a feature-driven method that emphasizes the statistical characteristics of a fingerprint sample. Every image contains texture that provides visual information related to gray-level intensity and its variation; in this work, some first-order statistical and intensity-based features are extracted to train various classifiers, and the method is validated on the LivDet 2009 database. Ajita et al. [17] introduced a novel approach for detecting PAs created with unknown spoofing materials. They proposed a multi-class classifier as a novel-material detector that classifies a fingerprint as live, spoof, or unknown. The fingerprint samples detected as unknown are used to retrain the classifier so that it can detect samples created with novel materials. Arora et al. [3] presented a robust approach for PAD that incorporates the VGG model as a classifier. After contrast enhancement using histogram equalization, fingerprint samples are fed to the VGG network. The model is validated on various fingerprint databases, including FVC 2006, ATVSFFp, a finger-vein dataset, and the LivDet 2013 and 2015 databases. This work focuses on comparing the performance of well-known image descriptors with that of existing CNN models when the spoofs are fabricated with known and unknown materials.
3 Proposed Approach The image classification capability of a CNN model relies on its feature extraction module, whose convolutional filters are employed to extract minute features from the images. Various existing CNN models exhibit great classification accuracy on the ImageNet database, which consists of millions of images of real objects, while obtaining the same performance on the fingerprint LivDet databases is difficult due to the lack of sufficient texture information in these images. A fingerprint sample is a gray-scale image with ridge and valley information, and hence it becomes more challenging to extract features that discriminate between live and spoof samples to detect the PA. Further, it becomes harder when the spoof samples are fabricated using unknown
Fig. 2 Experimental evaluation framework of FPAD models
fabrication materials, i.e., materials that were not involved in creating the fake samples of the training dataset. On the other side, many image classification methods rely heavily on image representational coding, which captures the texture details of an image. BSIF [11], LPQ [18], and WLD [5] are some of the image descriptors that are widely used to represent image information in the form of codes. In this work, we have assessed the performance of four well-known CNN models, namely MobileNet V2, VGG 16, DenseNet 121, and ResNet 50, as well as the image descriptors WLD, BSIF, and LPQ. The details of the CNN classifiers, image descriptors, and the classifier used in this study are described in the following subsections (see Fig. 2).
3.1 Deep CNN Models CNNs consist of convolutional layers that extract predefined or learned features from an input image. These networks are trained with a huge number of images, and the resulting weights can be reused for other classification problems. These architectures can be used in two ways, i.e., transfer learning and training from scratch. In this study, the CNN models are trained in both ways, and the results are reported accordingly. For transfer learning, we utilize the models pre-trained on the ImageNet database and tune them by adding new classification layers on top (a brief illustrative sketch of this setup is given after the architecture descriptions below). In this study, we have tested the performance of the MobileNet V2 [9], DenseNet 121 [10], VGG 16 [12], and ResNet 50 [8] CNN models against PAs in the intra-sensor paradigm. The details of these architectures are discussed in the following sections:
3.1.1
VGG 16 [12]
This architectural variant has thirteen convolutional layers, three fully connected layers, five max-pool layers, and a final soft-max layer. The layers up to the last max-pool layer extract the features, while the remaining layers perform the classification. In this work, we utilize VGG 16 to classify the fingerprint samples.
3.1.2
MobileNet V2 [9]
This CNN architecture utilizes depth-wise separable convolution operations instead of traditional convolutions, which reduces the computation cost by roughly 8 to 9 times compared with the traditional convolution operation. The model exhibits classification accuracy comparable to that of GoogleNet and VGG, and it is suitable for devices with limited computational resources such as mobile phones, tablets, etc.
3.1.3
ResNet 50 [8]
He et al. introduced the concept of residual connections to enable CNNs to deal with the problem of vanishing gradients. The vanishing gradient is a phenomenon in which, due to the repeated application of activation functions across many layers, the gradient values approach zero, causing a drop in the model's classification accuracy. ResNet 50 consists of 50 layers with weights; the remaining layers perform the classification.
3.1.4
DenseNet 121 [10]
Huang et al. suggested a novel CNN model to deal with the issue of vanishing gradients. In this architecture, the feature maps of each layer are concatenated with those of all subsequent layers in the network. The architecture has been tested on state-of-the-art databases such as CIFAR, ImageNet, and SVHN.
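The transfer-learning setup described in Sect. 3.1 can be sketched in a few lines of PyTorch/torchvision (recent versions); the sketch below loads an ImageNet-pretrained MobileNetV2 and replaces its head with a two-class (live/spoof) layer. The data loader, learning rate, and training loop are placeholder assumptions for illustration, not the study's actual training code.

```python
import torch
import torch.nn as nn
from torchvision import models

# Load an ImageNet-pretrained MobileNetV2 and replace its classifier head
# with a 2-class (live vs. spoof) layer, as in the transfer-learning setting.
model = models.mobilenet_v2(weights=models.MobileNet_V2_Weights.IMAGENET1K_V1)
model.classifier[1] = nn.Linear(model.last_channel, 2)

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)   # assumed hyperparameter

def train_one_epoch(model, loader, device="cpu"):
    # loader is assumed to yield (image batch, label batch) with labels 0 = live, 1 = spoof.
    model.train().to(device)
    for images, labels in loader:
        optimizer.zero_grad()
        loss = criterion(model(images.to(device)), labels.to(device))
        loss.backward()
        optimizer.step()
```

Training from scratch follows the same loop but starts from randomly initialized weights instead of the ImageNet checkpoint.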
3.2 Local Texture Descriptors It is a description of visual features that are present in the content of an image. It represents the characteristics of an image such as shape, color, texture, etc. Live and Spoof samples have different values for these features because a live fingerprint and its spoof have different natural properties. In this study, we have utilized three predefined texture descriptors for the extraction of feature values. The descriptors are discussed in the following subsections.
3.2.1
Local Phase Quantization (LPQ) [18]
LPQ is a texture descriptor that is based on the blur-invariance property of the Fourier phase spectrum. It is robust against blur and redundant information present in the image. It applies the Short-Term Fourier Transform (STFT) for the analysis of the image in the Fourier domain. The descriptor is formulated as Eq. 1:

f_x(u) = Σ_y f(y) w(y − x) e^(−j2πu·y)    (1)

where f_x(u) denotes the local Fourier coefficients, computed at four different frequency values, w(·) is a window function defining the neighborhood, and f(·) is the image whose short-term Fourier transform is taken.
3.2.2
Weber Local Descriptor (WLD) [5]
WLD is composed of two components: orientation and differential excitation. The orientation is defined as the gradient orientation at the current pixel and is denoted by Eq. 2:

F1(x) = θ(x) = angle(∇(x))    (2)

where θ(x) is the local gradient angle. Similarly, the differential excitation is defined as the ratio of the difference in relative intensity between the current pixel and its eight neighbor pixels to the intensity of the current pixel itself. This is denoted by Eq. 3:

F2(x) = ξ(x) = (Ī_{3×3} − I(x)) / I(x)    (3)

Here Ī_{3×3} is the sample mean of the intensity I across the 3 × 3 pixel square centered on x.
3.2.3
Binary Statistical Image Features (BSIF) [11]
The BSIF is computed by applying pre-trained filters to the fingerprint samples. These filters are learned by training on a set of real images. The descriptor is given in Eq. 4:

S_i = Σ_{u,v} W_i(u, v) X(u, v) = W_i^T x    (4)
where X () is the image patch of size l×l pixels, Wi is the filter of the same size and Si is the outcome.
3.3 Machine-Learning Classifier The image descriptors provide a set of features that are utilized for the classification of live and spoof samples. For the classification, we have used an SVM. SVM is a supervised machine-learning algorithm that classifies input samples by finding a hyperplane between the data points of different classes. It can be used with three kernels: linear, polynomial, and Gaussian. In our study, we have opted for the Gaussian kernel due to its capability of handling non-linear classification problems.
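For illustration, a minimal sketch of this classification stage using scikit-learn is given below; the feature arrays are assumed to come from a separate descriptor-extraction step and are not part of the original work. The Gaussian kernel corresponds to the RBF kernel option.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

# X_train/X_test: descriptor features (e.g., LPQ, WLD, or BSIF outputs),
# y_train/y_test: labels with 1 = live and 0 = spoof.
def train_and_evaluate(X_train, y_train, X_test, y_test):
    # RBF (Gaussian) kernel SVM with feature standardization
    clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0, gamma="scale"))
    clf.fit(X_train, y_train)
    y_pred = clf.predict(X_test)
    accuracy = np.mean(y_pred == y_test) * 100.0
    return y_pred, accuracy
```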
4 Experimental Results The performance of the CNN architectures and image texture descriptors is evaluated on the LivDet 2013 [7], 2015 [14], and 2017 [20] databases. The feature values extracted by the image texture descriptors are fed as input to the SVM. The CNN models are trained in two ways, i.e., transfer learning from ImageNet weights and training from scratch on the fingerprint databases. Section 4.1 gives the details of the databases and the performance evaluation metrics. Sections 4.2, 4.3, and 4.4 describe the results achieved in all the paradigms.
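A minimal sketch of the two training paradigms using Keras is shown below. It is an illustrative assumption rather than the authors' exact training code, and hyperparameters such as input size, optimizer, and number of epochs are placeholders.

```python
import tensorflow as tf

def build_fpad_model(pretrained: bool) -> tf.keras.Model:
    """Binary live/spoof classifier on top of MobileNetV2.

    pretrained=True  -> backbone initialized with ImageNet weights (transfer learning)
    pretrained=False -> random initialization (training from scratch)
    """
    base = tf.keras.applications.MobileNetV2(
        weights="imagenet" if pretrained else None,
        include_top=False,
        input_shape=(224, 224, 3),
    )
    x = tf.keras.layers.GlobalAveragePooling2D()(base.output)
    out = tf.keras.layers.Dense(2, activation="softmax")(x)  # live vs. spoof
    model = tf.keras.Model(base.input, out)
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

# model = build_fpad_model(pretrained=True)
# model.fit(train_ds, validation_data=val_ds, epochs=10)
```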
4.1 Database and Performance Metrics The proposed model's performance is evaluated by carrying out experiments on the Liveness Detection competition 2013 [7], 2015 [14], and 2017 [20] databases. These databases consist of separate sets of live and spoof fingerprint samples for training and testing. The performance is evaluated in accordance with the ISO/IEC IS 30107 standards [1]. The values reported for each experiment are the Attack Presentation Classification Error Rate (APCER), which depicts the percentage of misclassified presentation attacks, i.e., misclassified spoof fingerprint images, the Bona Fide Presentation Classification Error Rate (BPCER), which indicates the percentage of misclassified live fingerprint images, and the Average Classification Error (ACE), which is given by the average of APCER and BPCER. Equation 5 denotes the calculation of ACE.

ACE = (APCER + BPCER) / 2    (5)

The ACE can be further used to derive the accuracy of the FPAD model, which is described in Eq. 6.

Accuracy = 100 − ACE    (6)

We have also evaluated the performance of the models in a high-security environment by using the Detection Error Trade-off (DET) curve, which is plotted between APCER and BPCER by varying the threshold value.
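A small sketch of how these metrics can be computed from prediction results is given below; the variable names are illustrative and not from the original paper.

```python
import numpy as np

def fpad_metrics(y_true, y_pred):
    """APCER, BPCER, ACE, and accuracy in percent.

    y_true, y_pred: arrays with 1 = live (bona fide) and 0 = spoof (attack).
    """
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)

    spoof = y_true == 0
    live = y_true == 1

    apcer = 100.0 * np.mean(y_pred[spoof] != 0)   # spoof samples classified as live
    bpcer = 100.0 * np.mean(y_pred[live] != 1)    # live samples classified as spoof
    ace = (apcer + bpcer) / 2.0                   # Eq. 5
    accuracy = 100.0 - ace                        # Eq. 6
    return apcer, bpcer, ace, accuracy
```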
Table 1 Performance of pre-trained CNN architectures on LivDet 2013, 2015, and 2017 databases

Database     Sensor      Accuracy (VGG 16)  Accuracy (DenseNet 121)  Accuracy (ResNet 50)  Accuracy (MobileNet V2)
LivDet 2013  Biometrika  97.13              92.26                    90.56                 98.49
             Crossmatch  96.86              97.89                    86.96                 96.81
             Italdata    88.60              98.79                    80.57                 97.08
             Average     94.20              96.31                    86.03                 97.46
LivDet 2015  Biometrika  91.30              91.94                    81.73                 87.46
             Digper      88.42              89.46                    82.25                 88.62
             Crossmatch  92.48              91.97                    86.10                 90.67
             GreenBit    94.45              95.63                    80.42                 93.56
             Average     91.66              92.25                    82.62                 90.07
LivDet 2017  Orcanthus   88.39              89.41                    81.95                 87.06
             Digper      90.47              90.51                    73.38                 88.33
             Greenbit    89.61              91.42                    77.96                 90.41
             Average     89.49              90.45                    77.76                 88.60
4.2 Intra-sensor and Known Fabrication Material In this experimental setup, all the training and testing fingerprint samples belonging to the live and spoof classes are captured with the same sensing device, and the spoof samples in both the training and testing datasets are created using the same fabrication materials. LivDet 2013 lies completely in this category, while LivDet 2015 lies in it partially, as two-thirds of its testing spoof samples are created using the same materials as the training dataset. Tables 1, 2 and 3 show the results in this paradigm for the CNNs and image descriptors on LivDet 2013. It is evident that MobileNet V2 trained with ImageNet weights outperforms the other CNNs as well as the image descriptors in this paradigm.
4.3 Intra-sensor and Unknown Fabrication Material In this experimental setup, the training and testing spoof fingerprint samples are created using different fabrication materials, while the sensing devices are the same. It evaluates the performance of an FPAD model in a real-world context where a spoof can be created using any novel material. LivDet 2015 partially and LivDet 2017 completely follow this setup. Tables 1, 2, and 3 present the results reported in this experimental setup. The reported results indicate that VGG 16 outperforms all the other classifiers in terms of classification accuracy.
Table 2 Performance of CNN architectures trained on fingerprint databases

Database     Sensor      Accuracy (VGG 16)  Accuracy (DenseNet 121)  Accuracy (ResNet 50)  Accuracy (MobileNet V2)
LivDet 2013  Biometrika  98.84              97.69                    94.42                 99.29
             Crossmatch  98.84              97.84                    95.74                 95.69
             Italdata    86.65              93.26                    84.58                 93.92
             Average     94.77              96.26                    91.58                 96.30
LivDet 2015  Biometrika  93.10              94.95                    93.38                 93.51
             Digper      88.74              91.46                    88.30                 89.50
             Crossmatch  95.69              95.79                    95.74                 93.61
             GreenBit    93.12              92.59                    93.60                 94.74
             Average     92.66              93.69                    92.75                 92.84
LivDet 2017  Orcanthus   91.77              90.66                    89.09                 89.82
             Digper      92.36              91.82                    92.66                 91.26
             Greenbit    94.12              93.19                    91.69                 92.38
             Average     92.75              91.89                    91.14                 91.15
Table 3 Performance of image descriptors empowered SVM on LivDet 2013, 2015, and 2017 databases

Database     Sensor           Accuracy (BSIF+SVM)  Accuracy (LPQ+SVM)  Accuracy (WLD+SVM)
LivDet 2013  Biometrika       91.74                97.85               79.58
             Crossmatch       55.56                55.56               72.62
             Italdata         46.60                50.55               84.70
             Average          64.63                67.98               78.96
LivDet 2015  Biometrika       75.88                76.32               77.80
             Digper           70.76                60.24               80.00
             Crossmatch       50.88                50.88               83.55
             GreenBit         72.55                74.39               81.80
             Average          67.51                65.45               80.78
LivDet 2017  Orcanthus        83.16                82.38               79.56
             Digital Persona  88.93                87.75               74.08
             Greenbit         47.44                54.40               76.37
             Average          73.17                74.84               76.67
4.4 Evaluation Using Detection Error Trade-Off Curves In this section, we analyze the high-security performance of the CNNs and image descriptors by plotting DET curves for the LivDet 2017 database. For an FPAD model to be highly secure, its APCER should be low. Figure 3 exhibits
Fig. 3 Detection error trade-off (DET) curves for LivDet 2017 Greenbit, Orcanthus, and digital persona
the DET curves for the LivDet 2017 Greenbit, Orcanthus, and Digital Persona sensors. It can be observed that for the Greenbit sensor, at an APCER of 1%, the BPCER ranges from 20 to 40% for the CNN models, while it ranges from 80 to 95% for the image descriptors. The same kind of performance can be observed for the Orcanthus sensor. Apart from these, the models show some improvement when tested on Digital Persona: the results reported for this sensor show the APCER to be in the range of 20–40% for the CNN models and 45 to 79% for the image descriptors.
5 Conclusion In this paper, the performance of CNN models and image texture descriptors is evaluated against PAs in different possible attack scenarios. The study concludes that CNN models perform better than image descriptors in all the cases. It also indicates that the classification accuracy of the CNNs increases when they are trained on a fingerprint database instead of being initialized with the weights of ImageNet or some other database. One possible reason behind the superiority of the CNNs over image descriptors is that their weights are updated according to the information present in the images, while the image descriptors extract a fixed kind of information from the input image samples. It can also be inferred from this work that the DenseNet 121 architecture exhibits the best performance among all four CNN models as well as the image descriptors empowered by SVM.
References 1. 30107-3:2017(en) I (2017) Information technology-biometric presentation attack detectionpart 3: testing and reporting 2. Abhyankar A, Schuckers S (2009) Integrating a wavelet based perspiration liveness check with fingerprint recognition. Pattern Recognit 42:452–464 3. Arora S (Oct 2019) Fingerprint spoofing detection to improve customer security in mobile financial applications using deep learning. Arab J Sci Eng 45(4):2847–2863 4. Chugh T, Jain AK (2021) Fingerprint spoof detector generalization. IEEE Trans Inf Forensics Secur 16:42–55 5. Ding YJ, Wang HB, Tao L (2018) Weber local descriptor with double orientation line feature for finger vein recognition. In: 2018 IEEE 23rd international conference on digital signal processing (DSP), pp 1–5 6. Espinoza M, Champod C (2011) Using the number of pores on fingerprint images to detect spoofing attacks. In: Proceedings of the international conference on hand-based biometrics, pp 1–5 7. Ghiani L, Yambay D, Mura V, Tocco S, Marcialis GL, Roli F, Schuckcrs S (2013) Livdet 2013 fingerprint liveness detection competition 2013. In: 2013 international conference on biometrics (ICB), pp 1–6 8. He K, Zhang X, Ren S, Sun J (2016) Deep residual learning for image recognition. In: 2016 IEEE conference on computer vision and pattern recognition (CVPR), pp 770–778 9. Howard A, Zhu M, Chen B, Kalenichenko D, Wang W, Weyand T, Andreetto M, Adam H (Apr 2017) Mobilenets: efficient convolutional neural networks for mobile vision applications 10. Huang G, Liu Z, Maaten LVD, Weinberger KQ (Jul 2017) Densely connected convolutional networks. In: IEEE conference on computer vision and pattern recognition (CVPR). IEEE Computer Society, pp 2261–2269 11. Kannala J, Rahtu E (2012) Bsif: binarized statistical image features. In: Proceedings of the 21st international conference on pattern recognition (ICPR2012), pp 1363–1366 12. Liu S, Deng W (2015) Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian conference on pattern recognition (ACPR), pp 730–734 13. Marasco E, Sansone C (2012) Combining perspiration- and morphology-based static features for fingerprint liveness detection. Pattern Recognit Lett 33(9):1148–1156 14. Mura V, Ghiani L, Marcialis GL, Roli F, Yambay DA, Schuckers SA (2015) Livdet 2015 fingerprint liveness detection competition. In: IEEE 7th international conference on biometrics theory, applications and systems (BTAS), pp 1–6 15. Nikam SB, Agarwal S (2008) Local binary pattern and wavelet-based spoof fingerprint detection. Int J Biom 1:141–159 16. Nogueira RF, de Alencar Lotufo R, Campos Machado R (2016) Fingerprint liveness detection using convolutional neural networks. IEEE Trans Inf Forensics Secur 11(6):1206–1213 17. Rattani A, Ross A (2014) Automatic adaptation of fingerprint liveness detector to new spoof materials. In: IEEE international joint conference on biometrics (IJCB), pp 1–8 18. Sharma R, Dey S (Mar 2021) A comparative study of handcrafted local texture descriptors for fingerprint liveness detection under real world scenarios. Multimed Tools Appl 80:1–20 19. Uliyan DM, Sadeghi S, Jalab HA (2020) Anti-spoofing method for fingerprint recognition using patch based deep learning machine. Eng Sci Technol Int J 23(2):264–273 20. Yambay D, Schuckers S, Denning S, Sandmann C, Bachurinski A, Hogan J (2018) Livdet 2017fingerprint systems liveness detection competition. In: IEEE 9th international conference on biometrics theory, applications and systems (BTAS), pp 1–9
Centralized Approach for Efficient Management of Distributed Linux Firewalls Deepika Dutta Mishra, P. Kalyan, Virender Dhakwal, and C. S. R. C. Murthy
Abstract Firewall plays an important role in the security of an organization by monitoring and filtering incoming and outgoing network traffic. Organizations typically have multiple physically distributed networks and hosts. For defense indepth purpose, firewall deployment at both host and network levels is needed leading to distributed firewall management. Distributed approach to firewall management is tedious and error-prone as it requires multiple logins and repetitive effort. The management complexity further increases with an increase in the number of network/host firewalls. Centralized management approach is thus required. Widely accepted Linux firewall management tools in open source are standalone in nature and lack centralized management features. In this paper, we propose a scheme named “Centralized Firewall Management System (CFMS)” which facilitates the management of distributed Linux firewalls both at network and host levels in a centralized manner. We further discuss the design, features, and implementation of the proposed scheme and highlight its limitations and benefits over distributed management. Keywords Firewall · Centralized firewall management · Iptables · Ipset · Cyber security
1 Introduction Firewall is used for monitoring and filtering incoming and outgoing network traffic within an organization. Firewalls are categorized into two types—network and D. D. Mishra (B) · P. Kalyan · V. Dhakwal · C. S. R. C. Murthy Bhabha Atomic Research Centre, Mumbai 400085, India e-mail: [email protected] P. Kalyan e-mail: [email protected] V. Dhakwal e-mail: [email protected] C. S. R. C. Murthy e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 R. Sarkar et al. (eds.), Proceedings of International Conference on Frontiers in Computing and Systems, Lecture Notes in Networks and Systems 690, https://doi.org/10.1007/978-981-99-2680-0_23
host firewalls. Network firewall filters traffic passing across multiple networks and protects an organization’s network from unauthorized access, whereas host firewall filters traffic passing in and out of the host and protects a host from unauthorized access. Both types of firewalls are needed for defense in depth purpose. “Iptables” [1] is a popular open-source firewall for Linux operating systems which can be used both as a host as well as a network firewall. There exists an extension to Iptables named “Ipsets” [2] which enhances Iptables’ performance. A large organization has multiple physically distributed networks and hosts which require deployment of multiple network- and host-level firewalls. Management of multiple firewalls manually is difficult and error-prone. Automatic management tools are thus needed. Widely accepted tools in open source for Iptables management are standalone in nature like Shorewall [3], Webmin [4], etc. In a distributed scenario, standalone management becomes tedious as each machine will have to be managed separately and the complexity increases with an increase in the number of machines to be managed [5]. A centralized management solution is thus required. There exist generic tools which automate management tasks in a centralized manner like Salt [6], Ansible [7], etc., but they are not a dedicated firewall management solution [5]. To overcome the above issues, we propose a firewall management scheme named “Centralized Firewall Management System (CFMS)” which facilitates centralized management of distributed Linux firewalls. It can manage firewalls both at host and network levels in an efficient manner. The existing works related to the proposed scheme consist of policy-based management (PBM) [8, 9] approach for centralized firewall management. Work in [10] discusses a scheme for centralized firewall management. Our work differs from existing works as it follows a simpler approach than PBM (which is still evolving) and aims at developing a management tool. Work in [10] focus on firewall appliance management and lacks many features as proposed in CFMS.
2 CFMS Scheme for Firewall Management CFMS is designed for centralized and automatic Linux firewall (Iptables and its extension Ipsets) management. Figure 1 depicts the high-level architecture of CFMS. Remote machines have Iptables/Ipsets installed and their policies are managed through CFMS Manager, which is a web application for centralized firewall policy management. The system administrator has to log in one time in the Manager. He/she can configure and generate remote machine’s firewall policy and then remotely deploy it on the machine being managed.
Fig. 1 CFMS high-level architecture
3 CFMS Manager Design Details Centralized management features are provided by CFMS Manager (or Manager). Figure 2 depicts the design details of Manager in a block diagram. It consists of three
Fig. 2 Design details of CFMS Manager
major components—client-side web interface, server-side application, and agent program installed on the remote machine. The client side is an MVC-based web interface. The system administrator interacts with the client-side interface, which further interacts with the server-side application. The server-side application consists of a central database for storing data which includes remote machine information, firewall policies, and other administrative and audit data. Multiple modules of the server application interact with the central database. These modules are discussed below.
(a) Global Parameter Management: These are the parameters applicable to all the machines registered in the Manager. It consists of:
• Blacklisted IP/Network addresses—a list of IP/Network addresses to be blacklisted on all the machines. Internally, these are implemented as Ipsets.
• Global labels—key-value pairs of label name and IP/Network address. Label names are meaningful names given to a set of IP/Network addresses which are later used to create firewall policies for machines.
(b) Register new machine: A machine whose firewall needs to be managed is first registered in the Manager. Registration information contains the machine name, its network interface details, deployment interface, deployment password, etc.
(c) Cloning: A clone of an existing registered machine can be made, whereby policy information is duplicated automatically for the new cloned machine in the Manager.
(d) Machine's Access Control: Two categories of access control are provided—Basic mode and Advanced mode.
• Basic mode consists of configuring local blacklists and whitelists of Network/IP addresses for the machine. These lists are internally implemented as Ipsets.
• Advanced mode consists of creating complex traffic filtering rules. Rules are defined in three chains—INPUT, OUTPUT, and FORWARD—with match criteria on input/output interface, protocol, source IP, source port, destination IP, and destination port. The source and destination IP can be defined as "any", an IP/Network address, or a set of global labels. The source and destination ports can be defined as "any" or specific ports. Rate limiting on a rule can be configured by setting a value for the maximum number of IP connections in a particular time interval. The action on a rule is defined as ACCEPT or DROP. Logging on a rule can be enabled. Ordering of rules in chains is crucial; rules can be dragged and dropped in the Manager to re-order them.
(e) Machine policy generation: Global parameters and the machine's access controls are stored in the central database. During policy generation, data is read from this database and a packaged file containing policy files for the machine is generated taking care of correct format, syntax, and ordering (a minimal sketch of this rule-to-command translation is given after this module list).
(f) Machine Policy Download: The packaged file generated for the machine can be downloaded from the Manager.
Fig. 3 Flowchart depicting functionality of CFMS agent program
(g) Machine policy remote deployment: Deployment is done using SSH (a deployment user account is created on the remote machine for this purpose). The packaged file is transferred to the remote machine and the CFMS agent program is invoked. Figure 3 shows the flowchart of the agent program's functionality.
(h) Periodic Policy Checks and re-deployment: Deviation of the remote machine's firewall policy from the last deployed one is checked periodically and re-deployment is done if required.
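To make the policy generation step concrete, the following minimal sketch shows how a stored rule record might be translated into Iptables/Ipset command strings. The field names and the helper functions are illustrative assumptions, not the actual CFMS implementation, and several options (rate limiting, logging, label expansion) are omitted.

```python
def ipset_blacklist_commands(set_name, addresses):
    """Commands that create an Ipset and populate it with blacklisted addresses."""
    cmds = [f"ipset create {set_name} hash:net -exist"]
    cmds += [f"ipset add {set_name} {addr} -exist" for addr in addresses]
    return cmds

def iptables_rule_command(rule):
    """Build a single iptables command from a rule dict.

    Example rule (hypothetical schema):
    {"chain": "INPUT", "in_iface": "eth0", "protocol": "tcp",
     "src": "10.0.0.0/24", "dport": "22", "action": "ACCEPT"}
    """
    parts = ["iptables", "-A", rule["chain"]]
    if rule.get("in_iface"):
        parts += ["-i", rule["in_iface"]]
    if rule.get("protocol"):
        parts += ["-p", rule["protocol"]]
    if rule.get("src", "any") != "any":
        parts += ["-s", rule["src"]]
    if rule.get("dst", "any") != "any":
        parts += ["-d", rule["dst"]]
    if rule.get("dport", "any") != "any":
        parts += ["--dport", rule["dport"]]
    parts += ["-j", rule["action"]]
    return " ".join(parts)
```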
4 CFMS Implementation We implemented the CFMS scheme on the CentOS 7 Linux platform. The client-side web interface of the Manager was developed using the ExtJS [11] JavaScript library and the server side was developed using the Django [12] framework and the Python [13] language. The central database is implemented using PostgreSQL [14] and remote deployment over SSH [15] was implemented using Python's Fabric [16] library. In the figures below, some snapshots of the implemented system are shown (Figs. 4, 5, 6, 7, 8, 9, and 10).
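The remote deployment step can be sketched with Fabric roughly as follows; host names, file paths, and the agent invocation are placeholders, and the actual CFMS deployment logic may differ.

```python
from fabric import Connection

def deploy_policy(host, user, password, package_path):
    """Copy a generated policy package to a managed machine and invoke the agent."""
    conn = Connection(host=host, user=user,
                      connect_kwargs={"password": password})
    # Transfer the packaged policy files generated by the Manager
    conn.put(package_path, remote="/tmp/cfms_policy.tar.gz")
    # Invoke the CFMS agent program on the remote machine (hypothetical path)
    result = conn.run("/usr/local/bin/cfms_agent /tmp/cfms_policy.tar.gz", warn=True)
    return result.ok

# deploy_policy("192.0.2.10", "cfmsdeploy", "secret", "policy_pkg.tar.gz")
```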
5 CFMS Access and Security Features Currently, CFMS scheme is designed for access from within the organization, i.e., over the Intranet. Following security features currently exist in CFMS.
Fig. 4 Machine registration
Fig. 5 Global blacklist configuration
Fig. 6 Global label configuration
(a) User authentication and authorization: The Manager can have multiple user accounts, each authenticated using a password. Each user is assigned a role which grants him certain access rights.
(b) GUI rendering according to logged-in user's role: The Manager is rendered in accordance with the logged-in user's role. Only actions allowed in the user's role are visible; all others are hidden.
Fig. 7 Whitelist configuration (basic mode)
Fig. 8 Blacklist configuration (basic mode)
(c) Server-side and client-side authentication/authorization checks: Authentication and authorization checks of the user and his actions are done both at the client side and the server side.
(d) Backup and Restoration facility: Backups of internal databases and other internal files can be taken and saved for later restoring the state of the Manager. Also, periodic backups can be scheduled.
(e) Database Security: The administrative database user is separate from the Manager's database user, which is given limited privileges. Usernames and passwords are stored in the database after encrypting the password field.
(f) Input Validation: Input data is validated before entering the database to check for vulnerabilities like SQL injection, cross-site scripting, etc.
(g) Audit Trails and Security Auditing: The application keeps audit trails of user login/logout date and time, actions performed, why the action was taken, etc.
(h) Hardening and Vulnerability Assessment: Necessary steps are taken for application hardening and security auditing to check for vulnerabilities and fix them.
Fig. 9 Rule creation in advanced mode
6 CFMS Limitations CFMS suffers from the following limitations:
(a) NAT functionality is not yet implemented in the scheme.
(b) All firewall policies are stored at a central location, leading to a security risk; therefore, securing CFMS is crucial, especially when accessed over the Internet.
(c) Due to its centralized nature, system availability becomes an issue, as a failure will hamper policy configuration and deployment.
Fig. 10 Rules created in INPUT/OUTPUT chains
7 CFMS Benefits Over Distributed Systems Though the CFMS scheme currently has some limitations, these can be easily countered in future, making it a more efficient management scheme. Below, we discuss its benefits over distributed management.
(a) Management complexity does not increase with an increase in the number of machines. The administrator needs to log in only once at the central system and can then manage the remote machines.
(b) Machines having identical policies can be cloned, thus avoiding repetitive work.
(c) Label names are easier to remember than IP addresses and increase the readability of policies. Centralized label management maintains uniformity of labels across machines.
(d) With global blacklisting, blocking an offending IP address on all the machines is possible with a button click.
(e) Periodic policy checks and re-deployment protect against unwarranted changes done locally on remote machines.
8 Future Work Future work in CFMS involves adding more functionality like NAT configuration in the scheme. Also, security features of the scheme need to be enhanced in order to make it work over the Internet. To increase availability, high availability design needs to be incorporated. Also, there is a need for performance study of the scheme under high-load conditions.
9 Conclusion An organization has multiple distributed firewalls deployed both at network and host levels for defense in-depth purpose. Distributed management of multiple firewalls is tedious and error-prone thus requiring centralized management. Widely accepted Linux firewall management tools in open source are standalone in nature and not suitable in distributed environment. In this paper, we proposed a centralized management scheme named Centralized Firewall Management System (CFMS) for centrally managing distributed Linux firewalls. We discussed the scheme w.r.t its design, features, implementation, benefits, and limitations. In the end, we conclude that CFMS with its features like cloning, global label management, global blacklisting, local blacklisting/whitelisting, advanced rule creation, remote policy deployment, periodic policy checks and re-deployment, etc., provides an efficient firewall management solution.
References
1. Purdy GN (2004) Linux iptables pocket reference: firewalls, NAT and accounting
2. IP sets. https://ipset.netfilter.org/
3. Shorewall. http://www.shorewall.net/
4. Webmin. http://www.webmin.com/
5. Mishra DD, Dhakwal V, Pathan S, Murthy C (2020) Design and development of centralized squid proxy management system. In: IEEE international conference on electronics, computing and communication technologies (CONECCT), pp 1–6. https://doi.org/10.1109/CONECCT50063.2020.9198539
6. Salt. https://www.saltproject.io
7. Ansible. https://www.ansible.com
8. Kosiur D (2001) Understanding policy based networking. Wiley
9. Caldeira F, Monteiro E (2002) A policy-based approach to firewall management. In: Gaïti D, Boukhatem N (eds) Network control and engineering for QoS, security and mobility. NetCon 2002. IFIP—the international federation for information processing, vol 107. Springer, Boston, MA. https://doi.org/10.1007/978-0-387-35620-4_10 10. Pinto L, Monteiro E, Simoes P (2003) Towards integrated management of firewall appliances. In: The IASTED international conference on communication, network, and information security 11. ExtJS, Javascript framework. https://www.sencha.com/products/extjs/communityedition/ 12. Django. https://www.djangoproject.com 13. Python. https://www.python.org 14. PostgreSQL. https://www.postgresql.org 15. SSH. https://www.ietf.org/rfc/rfc4253.txt 16. Fabric, high level Python Library for execution over SSH. https://www.fabfile.org
An Improved (24, 16) OLS Code for Single Error Correction-Double Adjacent Error Correction-Triple Adjacent Error Correction Sayan Tripathi, Jhilam Jana, and Jaydeb Bhaumik
Abstract Importance of soft error correcting codes for improving the reliability of memory cells is increasing day by day. Earlier approaches of employing single error correction (SEC) and single error correction-double error detection (SEC-DED) codes are not amenable today because the probability of multiple bits adjacent error increases with the downscaling of CMOS technology. To address the issue of multiple adjacent bits errors, more advanced error correcting codes (ECCs) like Orthogonal Latin Square (OLS) codes are employed nowadays. In this paper, an improved (24, 16) OLS code has been presented to enhance the error correction capability and to reduce the miscorrection probability. The parity check matrix of the proposed OLS code is constructed by using block cyclic shift algorithm and parity bit replacement logic. The major challenges for designing multibit adjacent ECCs are higher decoding overheads and miscorrection probability. But the proposed OLS code has single error correction-double adjacent error correction-triple adjacent error correction (SECDAEC-TAEC) property and 0% miscorrection probability. The architectural designs of proposed and existing OLS codes are synthesized and implemented on the ASIC platform. It is found that the proposed encoder and decoder circuits consume lower power compared to existing designs. Keywords Soft errors · Memories · Orthogonal Latin Square (OLS) code · SEC-DAEC-TAEC · ASIC
S. Tripathi (B) · J. Jana · J. Bhaumik Department of ETCE, Jadavpur University, Kolkata, India e-mail: [email protected] J. Jana e-mail: [email protected] J. Bhaumik e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 R. Sarkar et al. (eds.), Proceedings of International Conference on Frontiers in Computing and Systems, Lecture Notes in Networks and Systems 690, https://doi.org/10.1007/978-981-99-2680-0_24
1 Introduction Semiconductor memory is an essential subsystem in most modern electronic gadgets to store data. The radiation-induced soft errors have a significant impact on the reliability of modern memory systems [1]. ECCs have been extensively employed for the protection of memories from soft errors. Initially to protect the memory contents against soft errors, SEC and SEC-DED codes [2] were to correct single errors and detect double errors. Due to scaling of semiconductor technology, the probability of multiple bit errors in more than one memory cells have been increased. To overcome this issue, more advanced OLS codes are used. OLS codes require larger number of parity bits compared to the existing linear block codes against the same number of errors. In most cases, the chances of double and triple adjacent errors in multiple bit upsets (MBUs) increase due to downscaling. Several OLS codes have already been introduced. Reviriego et al. have proposed a scheme for OLS codes for the protection of larger information bits in memories. SEC-OLS codes to protect memories and caches are proposed by Reviriego et al. [3, 4]. Double error correction OLS codes have also been introduced by Demirci et al. [5]. Xiao et al. have introduced a technique of OLS codes based on cyclic shift algorithm which can correct single error and double adjacent errors [6]. These OLS codes have miscorrection probability which means these OLS codes are not uniquely decodable codes. Lee et al. have introduced an OLS code whose error correction capability depends on the reliability of the system [7]. Tripathi et al. have designed and implemented SEC-OLS code with the help of circular block shifting technique [8]. But these existing codes require more power or lower error correction capability or have higher miscorrection probability. To overcome the issues of power consumption and miscorrection probability, this paper presents an improved SEC-DAEC-TAEC OLS code with 0% miscorrection probability for memory. The proposed code requires lesser number of parity bits than double or triple error correction OLS codes and has correction capability up to triple adjacent errors. The main contributions of the work presented in this paper are enumerated as follows: (i) a modified scheme to construct the H -matrix for SEC-DAEC-TAEC-OLS code has been proposed, (ii) the proposed OLS code with a message length of 16 bit has been designed and implemented on ASIC platform, and (iii) the proposed code is uniquely decodable and power efficient compared to existing related designs. The rest of the paper unfolds as follows. Section 2 provides the principle of OLS codes and Xiao et al. scheme. The proposed scheme is described in Sect. 3 and the conclusion is described in Sect. 4.
2 Basics of OLS Codes and Xiao et al. Scheme This section comprises the basics of OLS codes and Xiao et al. scheme based on the block cyclic shifting algorithm. OLS codes are designed with the help of Latin
Fig. 1 a H -matrix of (24, 16) SEC-OLS code, b H -matrix for (24, 16) SEC-DAEC OLS code proposed by Xiao et al.
Squares. Permutations of the digits 0 to m − 1 in the rows as well as the columns of an m × m square matrix constitute a Latin square of size m × m. For two orthogonal Latin squares, any pair of elements formed by superimposing the two squares is distinct. In a block of OLS code, there are m × m information bits and 2tm parity bits, where m is an integer value and t is the correction capability of the code. One of the major advantages of OLS codes is their speed and simple decoding procedure. For OLS codes, one-step majority logic decoding (OLS-MLD) is used. In this algorithm, each bit is decoded by selecting the majority value from the set of recomputed parity check equations in which the bit participates. Each data bit participates in 2t check bit parity equations, and out of them at most one check bit participates in another check bit's calculation. Thus, 2t equations are formed and we can check whether the check bits are orthogonal to the given bit or not. If the number of errors is less than or equal to t and the given bit is affected, the remaining (t − 1) errors can affect the same number of participating check bits and still the remaining t + 1 bits are correct. If the given bit is not affected by errors, then at most t check bits will be affected and still the remaining t bits are correct. Thus, in any case, the decoding algorithm can correct at most t errors in the data bits. When the number of errors is t or less, the recomputed check bits generate a majority value which, if zero, indicates that the bit is correct and vice versa. Linear block codes can be defined using the H-matrix. For OLS codes, the parity check matrices have k = m^2 data bits and 2tm check bits, where t is the error correction capability of the OLS code. The H-matrix for the (24, 16) SEC-OLS code is presented in Fig. 1a. Xiao et al. Scheme: Xiao et al. proposed an optimized scheme for SEC-DAEC OLS codes which is based on the block cyclic shift technique. The design constraints for a SEC-DAEC code are as follows: (a) Let us say a binary code has k data bits and c parity bits. Then it can represent 2^k binary values. If a single error occurs, then there are (k + c) possible positions for the error to occur and (k + c) · 2^k possible output values. Similarly, for double adjacent errors, there are (k + c − 1) positions for the error to occur and (k + c − 1) · 2^k possible output values. The relation between the space of correct and erroneous codewords can be expressed as follows:

2^k ((k + c) + (k + c − 1) + 1) ≤ 2^(k+c)    (1)
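As a quick numerical check of this bound for the (24, 16) code considered here (k = 16 data bits, c = 8 parity bits):

```python
k, c = 16, 8  # data and parity bits of the (24, 16) code
lhs = 2**k * ((k + c) + (k + c - 1) + 1)  # no-error, single, and double-adjacent cases
rhs = 2**(k + c)
print(lhs, rhs, lhs <= rhs)  # 3145728 16777216 True
```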
Fig. 2 Block shifting algorithm
(b) In the decoding process, the syndrome values for any single bit error and any double adjacent error must be different. The modulo-2 sum of the corresponding two adjacent columns should be unique. To fulfill the second design constraint, Xiao et al. introduced the block cyclic shift technique. The proposed block cyclic shifting method is explained as follows: (a) OLS codes with block size k = m^2 are divided into m blocks according to the position of ones: (1, …, m), (m + 1, …, 2m), …, (m^2 − m + 1, …, m^2), (b) To satisfy the condition of the uniqueness of double adjacent error vectors, from each block one column is selected which is used for constructing a new matrix of size m × m. From the bottom four rows, the positions of 1's are recorded as a vector w of size 1 × m, (c) The vector w is now cyclically shifted m − 1 times, producing m − 1 new vectors. This process continues, (d) The new m blocks should be ranked in such a way that the sum of the two columns at the junction of two blocks is unique. An example of the above algorithm for an OLS code having 16 data bits and 8 parity bits is presented in Fig. 2. In Fig. 2, the top row represents the bit positions and the bottom row the bits of the vector. The cyclic shift is done by placing the last bit first. If the vector size is m, it will be shifted m times and each shift generates a unique combination of adjacent bits. The H-matrix of the original (24, 16) SEC-OLS code is rearranged and the optimized H-matrix proposed by Xiao et al. is shown in Fig. 1b. In the Xiao et al. (24, 16) optimized OLS code, we observed that a single error pattern 000000000000000100000000 and a double adjacent error pattern 000000000000000000011000 correspond to the same syndrome pattern 00011000. So, it is clear that this code has a non-zero miscorrection probability. Also, this code consumes higher power. To address these issues, we have designed and implemented a power-efficient (24, 16) SEC-DAEC-TAEC OLS code with 0% miscorrection probability.
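A minimal sketch of the cyclic shifting of the recorded vector (an illustration written for this text, not the authors' code) is:

```python
def cyclic_shifts(w):
    """Generate all cyclic shifts of vector w, moving the last bit to the front each time.

    For a vector of size m this yields m vectors (the original plus m-1 shifted copies),
    each giving a different combination of adjacent bits.
    """
    m = len(w)
    shifts = [list(w)]
    current = list(w)
    for _ in range(m - 1):
        current = [current[-1]] + current[:-1]  # last element moves to the first position
        shifts.append(current)
    return shifts

# Example for a 1 x 4 vector recorded from the bottom rows of a block:
# cyclic_shifts([1, 0, 0, 0]) -> [[1,0,0,0], [0,1,0,0], [0,0,1,0], [0,0,0,1]]
```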
3 Proposed SEC-DAEC-TAEC OLS Code In this section, the proposed (24, 16) SEC-DAEC-TAEC OLS code and its design and implementation processes have been presented.
Fig. 3 H -matrix of proposed (24,16) SEC-DAEC-TAEC OLS Code
3.1 Design of Proposed SEC-DAEC-TAEC OLS Code The H -matrix of our SEC-DAEC-TAEC OLS code has been constructed with the objective so that it can be uniquely decodable with lower power consumption. The proposed technique concentrates on increasing the error correction capability with 0% miscorrection. The matrix of our (24, 16) OLS code is restructured by using block cyclic shifting and parity bit replacement technique. The proposed codes can correct single errors along with double and triple adjacent errors. The block cyclic shifting method is the same as Xiao et al. scheme. To increase the error correction capability and to get rid of the miscorrection probability against Xiao et al. scheme, we have included the last four parity bits ( p5 to p8 ) at the starting of the H -matrix. After that, the data bits (d1 to d16 ) are placed. The remaining four parity bits ( p1 to p4) are appended at the end. The proposed H -matrix for (24, 16) SEC-DAECTAEC OLS is presented in Fig. 3. During encoding process, codeword of proposed (24, 16) SEC-DAEC-TAEC OLS code is formed by combining data and parity bits. The parity equations for the proposed codec are as follows: p1 = d1 ⊕ d5 ⊕ d9 ⊕ d13 ; p2 = d2 ⊕ d6 ⊕ d10 ⊕ d14 p3 = d3 ⊕ d7 ⊕ d11 ⊕ d15 ; p4 = d4 ⊕ d8 ⊕ d12 ⊕ d16 p5 = d1 ⊕ d7 ⊕ d10 ⊕ d16 ; p6 = d2 ⊕ d8 ⊕ d11 ⊕ d13 p7 = d3 ⊕ d5 ⊕ d12 ⊕ d14 ; p8 = d4 ⊕ d6 ⊕ d9 ⊕ d15
(2)
The decoding process comprises syndrome computation and error detection and correction logic. The decoder circuit of the proposed (24, 16) OLS code has been presented in Fig. 4. Errors can be identified with the help of the syndrome values. The equations for syndromes are provided in Eq. (3). s1 = r5 ⊕ r9 ⊕ r13 ⊕ r17 ⊕ r21 ; s2 = r6 ⊕ r10 ⊕ r14 ⊕ r18 ⊕ r22 s3 = r7 ⊕ r11 ⊕ r15 ⊕ r19 ⊕ r23 ; s4 = r8 ⊕ r12 ⊕ r16 ⊕ r20 ⊕ r24 s5 = r1 ⊕ r5 ⊕ r11 ⊕ r14 ⊕ r20 ; s6 = r2 ⊕ r6 ⊕ r12 ⊕ r15 ⊕ r17 s7 = r3 ⊕ r7 ⊕ r9 ⊕ r16 ⊕ r18 ; s8 = r4 ⊕ r8 ⊕ r10 ⊕ r13 ⊕ r19
(3)
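A small sketch of the encoding and syndrome computation implied by Eqs. (2) and (3) is given below. It is written for illustration here, using 1-based bit indices d1–d16 and r1–r24 as in the text, and is not the authors' Verilog implementation.

```python
def encode(d):
    """Encode 16 data bits (d[0] = d1, ..., d[15] = d16) into the 24-bit codeword.

    Codeword order follows the proposed H-matrix: p5..p8, d1..d16, p1..p4.
    """
    x = lambda *idx: sum(d[i - 1] for i in idx) % 2  # XOR of selected data bits (Eq. 2)
    p1, p2 = x(1, 5, 9, 13), x(2, 6, 10, 14)
    p3, p4 = x(3, 7, 11, 15), x(4, 8, 12, 16)
    p5, p6 = x(1, 7, 10, 16), x(2, 8, 11, 13)
    p7, p8 = x(3, 5, 12, 14), x(4, 6, 9, 15)
    return [p5, p6, p7, p8] + list(d) + [p1, p2, p3, p4]

def syndrome(r):
    """Compute syndromes s1..s8 from the received 24-bit word r (Eq. 3)."""
    g = lambda *idx: sum(r[i - 1] for i in idx) % 2
    return [g(5, 9, 13, 17, 21), g(6, 10, 14, 18, 22),
            g(7, 11, 15, 19, 23), g(8, 12, 16, 20, 24),
            g(1, 5, 11, 14, 20), g(2, 6, 12, 15, 17),
            g(3, 7, 9, 16, 18), g(4, 8, 10, 13, 19)]

# For any valid codeword the syndrome is all-zero:
# assert syndrome(encode([1,0,1,1, 0,0,1,0, 1,1,0,0, 0,1,0,1])) == [0] * 8
```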
Errors in each bit are corrected with the help of error correction logic. The error correction logic corrects the erroneous codeword and recovers the original data. The
Fig. 4 a Encoder and b Decoder Circuitry of Our (24,16) SEC-DAEC-TAEC OLS Code
error correction logic for first, second, and last data bits of the proposed (24, 16) SEC-DAEC-TAEC OLS code are shown in Eq. (4). dc1 = r5 ⊕ ((s1 s5 ) + (s1 s5 s8 ) + (s1 s2 s5 s6 ) + (s1 s5 s7 s8 ) + (s1 s2 s5 s6 s8 ) + (s1 s2 s3 s5 s6 s7 )) dc2 = r6 ⊕ ((s2 s6 ) + (s1 s2 s5 s6 ) + (s2 s3 s6 s7 ) + (s1 s2 s5 s6 s8 ) + (s1 s2 s3 s5 s6 s7 ) + (s2 s3 s4 s6 s7 s8 )) dc16 = r20 ⊕ ((s4 s5 ) + (s3 s4 s5 s8 ) + (s1 s4 s5 ) + (s2 s3 s4 s5 s7 s8 ) + (s1 s3 s4 s5 s8 ) + s1 s2 s4 s5 ))
(4)
3.2 Area Estimation in Terms of Logic Gates Theoretically, the area estimation of proposed and existing OLS codes have been computed in terms of logic gates which is presented in Table 1. We have increased the error correction capability of the proposed OLS code. The proposed code can correct single error as well as double and triple adjacent errors. So, the proposed SEC-DAECTAEC OLS code requires more area compared to the existing codec. ASIC-based synthesis and implementation results have been discussed in the following section.
Table 1 Area estimation in terms of logic gates

Area in terms of   Lee et al.     Reviriego et al.  Xiao et al.    Tripathi et al.  Demirci et al.  Proposed
logic gates        (24, 16) [7]   (24, 16) [3]      (24, 16) [6]   (24, 16) [8]     (32, 16) [5]    (24, 16)
XOR2               72             72                72             72               128             72
AND2               16             16                109            16               128             318
OR2                –              31                –              –                48              80
Equivalent NAND2   320            320               599            320              912             1164
3.3 ASIC-Based Synthesis and Implementation Results To evaluate the performance of the encoder and decoder based on the proposed SECDAEC-TAEC OLS code and existing OLS codes, we have implemented all circuits in Verilog and synthesized them by Synopsys Design Compiler in 90 nm technology. In Table 2, the performances of existing and proposed codecs have been presented in terms of area (mm2 ), propagation delay (µs), power consumption (µW), PDP (µW.s), PAP (µW.mm2 ), and PADP (µW.mm2 .s). Figure 5 presents the post synthesis layout of the proposed (24, 16) SEC-DAECTAEC OLS codec. The synthesis results indicate that the existing OLS codes can correct only single errors. So, these codes have slightly better performance compared to the proposed OLS code. Xiao et al. also proposed (24, 16) OLS code which can correct single errors along with double adjacent errors. This code has non-zero miscorrection probability and consumes more power. But the most interesting property of our proposed OLS code is that the proposed OLS code can correct up to 3-adjacent errors in place of maximum 2-adjacent errors correcting codes proposed by Xiao et al.
Table 2 ASIC synthesis results

OLS codec                      Error correction capability  Area (mm²)  Power (µW)  Delay (µs)  PDP      Impro. in area (%)  Impro. in power (%)  Impro. in delay (%)  Impro. in PDP (%)
Lee et al. (24, 16) [7]        SEC                          2.21        190.80      15.25       2909.70  −15.38              1.57                 −6.56                −4.88
Reviriego et al. (24, 16) [3]  SEC                          2.08        185.80      14.30       2656.94  −22.60              −1.08                −13.64               −14.86
Xiao et al. (24, 16) [6]       SEC-DAEC                     2.21        193.80      16.25       3149.25  −15.38              3.10                 0.00                 3.10
Tripathi et al. (24, 16) [8]   SEC                          1.92        184.80      14.10       2605.68  −32.81              −1.62                −15.25               −17.12
Demirci et al. (32, 16) [5]    DEC                          2.70        208.80      18.40       3841.92  5.56                10.06                11.68                20.57
Proposed (24, 16)              SEC-DAEC-TAEC                2.55        187.80      16.25       3051.75  –                   –                    –                    –
Fig. 5 Post synthesis layout of proposed (24,16) SEC-DAEC-TAEC OLS code
On the other hand, the (32, 16) double error correction OLS code introduced by Demirci et al. needs a larger number of parity bits, has more propagation delay, and consumes higher power compared to the proposed SEC-DAEC-TAEC OLS code. From Table 2, improvements of 11.68% in delay, 10.06% in power, and 20.57% in PDP are found for the proposed SEC-DAEC-TAEC OLS code compared to the Demirci et al. scheme, and of 3.10% in power and 3.10% in PDP against the Xiao et al. scheme.
4 Conclusion In this paper, the H -matrix for a new fully decodable SEC-DAEC-TAEC OLS code has been proposed with the aim of increasing error correction capability and reducing power consumption. The proposed and existing codecs have been implemented on the ASIC platform. The synthesis results exhibit that the proposed SEC-DAEC-TAEC OLS codec has an improvement of 11.68% in delay, 10.06% in power, 20.57% in PDP compared to Demirci et al. scheme and 3.10% in power and 3.10% in PDP against Xiao et al. scheme. So the proposed OLS codec with 0% miscorrection probability and SEC-DAEC-TAEC property can be applicable for memories. Acknowledgements All authors would be grateful to Mr. Partha Mitra, Indian Institute of Technology, Kharagpur for providing ASIC-based results using Synopsys design tool.
References 1. Li J, Reviriego P, yi Xiao L, Wu H (2019) Protecting memories against soft errors: the case for customizable error correction codes. IEEE Trans Emerg Top Comput 2. Hamming RW (1950) Error detecting and error correcting codes. Bell Syst Tech J 29(2):147– 160
3. Reviriego P, Martínez J, Pontarelli S, Maestro JA (2014) A method to design SEC-DED-DAEC codes with optimized decoding. IEEE Trans Device Mater Reliab 14(3):884–889 4. Reviriego P, Pontarelli S, Maestro JA (2012) Concurrent error detection for orthogonal Latin squares encoders and syndrome computation. IEEE Trans Very Large Scale Integr (VLSI) Syst 21(12):2334–2338 5. Demirci M, Reviriego P, Maestro JA (2016) Implementing double error correction orthogonal Latin squares codes in SRAM-based FPGAs. Microelectron Reliab 56:221–227 6. Xiao L, Li J, Li J, Guo J (2015) Hardened design based on advanced orthogonal Latin code against two adjacent multiple bit upsets (MBUs) in memories. In: Sixteenth international symposium on quality electronic design, pp 485–489. IEEE 7. Lee SE (2013) Adaptive error correction in Orthogonal Latin Square Codes for low-power, resilient on-chip interconnection network. Microelectron Reliab 53(3):509–511 8. Tripathi S, Jana J, Bhaumik J (2021) Design of power efficient SEC orthogonal Latin Square (OLS) codes. In: Proceedings of the international conference on computing and communication systems, pp 593–600. Springer, Singapore
A Blockchain-Based Biometric Protection and Authentication Mechanism Surbhi Sharma and Rudresh Dwivedi
Abstract The probability of biometric information leakage, the dependability on authentication modules, and the lack of transparency in the management of biometric information are pertinent security flaws in biometric authentication systems. This paper presents a biometric authentication system utilizing blockchain, which provides a distributed mechanism for storing biometric templates and conducting biometric authentication. Data hashing scheme (dHash) is employed for generating biometric image hashes before storing the biometric templates to the blockchain, thereby securing biometric templates and minimizing the risk of leakage. We perform experimental evaluations utilizing the FVC2004 DB1, DB2, DB3, and DB4 data sets on the Hyperledger Fabric platform, and performance is assessed using the Caliper benchmark tool. Evaluation reflects reliable and safe authentication, measuring the average latency and throughput for the creation and authentication of biometric templates. The proposed blockchain-based authentication system will offer high throughput and low latency over a variety of applications. Keywords Biometric · Blockchain · Authentication · Biometric templates · Smart contract · Hyperledger fabric
1 Introduction 1.1 Background The word “blockchain” was originally used in Satoshi Nakamoto’s 2008 Bitcoin white paper [9]. A blockchain is a distributed ledger that is shared and accessible by all the participants. It is secure, consensus-based, and is not controlled by any single authority. Blockchain maintains data as a series of interconnected blocks, each of S. Sharma (B) · R. Dwivedi Department of Computer Science and Engineering, Netaji Subhas University of Technology (formerly NSIT), Dwarka, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 R. Sarkar et al. (eds.), Proceedings of International Conference on Frontiers in Computing and Systems, Lecture Notes in Networks and Systems 690, https://doi.org/10.1007/978-981-99-2680-0_25
which contains records of recent transactions with a reference to the previous block. Each block consists of a set of transactions, each of which is recorded using a hash or cryptographic signature. After construction, each block is assigned with a unique address called as hash. Any modifications to the block will change its hash. Compared to traditional methods like passwords, biometric systems are inherently safe, and it is quite strenuous to replicate a biometric template [13]. Biometrics are practically suitable and save users to memorize password. Biometrics cannot be altered if they are stolen or disclosed, unlike passwords, which results in irrevocable damage to privacy. Although the future of biometric authentication seems propitious, there are still some challenges to overcome. Privacy and security are the two primary concerns with biometric templates and digital identity management [5]. Blockchain technology offering transparent, reliable, and distributed storage support could be incorporated with biometrics and biometric templates could be maintained on the distributed ledger. All the storage, retrieval, and authentication operations could then be conducted while executing smart contracts in the blockchain environment. A smart contract is essentially a piece of executable code that controls digital assets and operates in a secure environment in line with predetermined terms and conditions. Smart contract is invoked by the application to generate transactions that are committed on to the ledger. Smart contracts are utilized to enforce agreements between parties that may even prefer to remain anonymous in corporate cooperation. A blockchain-based smart contract can guarantee technically accurate execution for biometric applications. Blockchain technology offers access without requiring a verifier to stay in touch with the issuer, as well as the validation of an issuer’s credentials and user’s ownership of information. Consider the biometric identification system as an example, which could help the government in the modern, proliferated world. Incorporating blockchain with such system will improve the efficiency of the citizen’s identification process for registration and verification/authorization. Our approach will establish a conduit between the people and the government/organization, delivering a secure, trustworthy, transparent, authentic, and efficient verification procedure. It is very challenging for an attacker to attack and tamper user information stored on the blockchain because of its immutability property. Blockchain protects user credentials ensuring that only the user has access to encrypted digital identities. Identity could be easily verified via blockchain at anytime, anywhere, and on any device. By combining biometrics and blockchain, the identity management challenges including security issues, and prolonged wait times for document verification might be easily resolved. Therefore, it could be asserted that blockchain expedites efficiency and authentication processes while addressing the trust, security, key management, and privacy issues in biometric data storage, authentication, and management.
1.2 Related Work For safe and reliable information management, recent studies have concentrated on incorporating blockchain technology for different applications [1, 3, 11]. Aggarwal et al. [1] have suggested EnergyChain, a blockchain architecture for safely storing and retrieving the energy data produced by smart homes. It creates a transaction handling mechanism for secure energy trading and exhibits that it is better in terms of transmission costs and computation time as compared to conventional system. Hyperledger is an open source permissioned distributed ledger technology platform that offers several significant benefits over other blockchain platforms and used in various existing works [10]. A wide range of industrial use cases are made possible by Hyperledger Fabric’s extremely modular and configurable design. Fabric is first such platform enabling smart contracts (chaincode) written in general-purpose programming languages like Java, Go, and Node.js instead of restricted domainspecific languages. A blockchain-based smart contract can guarantee technically accurate execution for biometric applications. Recently, various methods for biometric template protection and management using blockchain have been proposed in the literature. Delgado et al. [4] introduced an effective blockchain-based biometric template storage method that uses Merkle trees to store any quantity of data for the same price while enhancing performance. However, the smart contract created in this case is the most basic one and has not been optimized, therefore, it does not address security concerns. Goel et al. [7] proposed a blockchain-based architecture for a trained biometric identification system that provides fault-tolerant access in a distributed environment. The key limitation of the proposed system is the execution time for various cryptographic procedures. Zhou et al. [14] used the immutability property of blockchain to create an auditable protocol based on Diffie-Hellman key exchange (DHE) with applications to fingerprint authentication. However, the expense of storing biometric images on a public blockchain could be too expensive. It’s also important to consider the time taken by the blockchain to validate a user, since this may restrict the usage of biometrics as an authentication mechanism.
1.3 Motivations and Contributions Despite the prospects, there are certain limitations to the existing blockchain technology that need to be thoroughly investigated and categorized before integration which is the primary motivation for this research work. The fundamental contribution of this study is constituted by the following aspects: 1. We present a biometric template storage and authentication scheme that incorporates biometric authentication in permissioned blockchain environment. 2. In order to address the limitations of the existing works such as the protection of biometric information, single point of failure, and capricious authentication
Fig. 1 Complication with traditional identity authentication
procedures, we proposed a pragmatic approach to the authentication scheme. To increase security, permissioned blockchain is used in exchange of biometric databases, and its distributed nature helps it withstand single points of failure. Utilizing smart contracts to gain access to the biometric image hash helps to resist against attacks that attempt to impersonate or modify the template. 3. We have created the smart contract for two primary processes in order to add new user data to the network and subsequently to authenticate or validate existing user data by providing a biometric hash value as a parameter. 4. We perform rigorous experimentation on the proposed system’s prototype on FVC2004 to validate its practicality and operational performance followed by its comparison with the existing scheme. The outline of this paper is structured as follows: Sect. 1 describes the background with the need of integrating blockchain and biometrics, existing works followed by motivations & contributions. Proposed system framework and experimentation methodology are illustrated in Sect. 2. Implementation and experimental results of the proposed system followed by its comparison with the existing scheme are demonstrated in Sect. 3. The final conclusions are presented in Sect. 4.
2 Proposed Framework Figure 1 illustrates the difficulty model for conventional identity authentication system primarily consisting of service provider, attacker, and user. An assault on the server might result from any vulnerability in the network, operating system, database, or applications. In order to access the file containing private data, an attacker can target the server’s database using various types of SQL injection [8] and other techniques. The attacker can then access the file using brute force cracking and other techniques to get the user ID and password.
Fig. 2 Proposed biometric authentication model
This research proposes a blockchain-based authentication model in response to the difficulty model. General layout of the proposed model is depicted in the form of a block diagram in Fig. 2. Two participants (users and clients) and two technical elements (Python application and Fabric peer) constitute the model. Each fabric peer is one of the nodes in the Hyperledger Fabric blockchain responsible for managing ledgers and smart contracts. The client inputs fingerprint template and certificate information to the fabric peer. Peer node then uses different computations to take decisions for performing different operations, such as registration, reading, writing, updating, and authentication. The proposed model performs three main tasks including hash generation, user registration, and user authentication.
2.1 Hash Generation Image hashing is the technique of analyzing the contents of an image and creating a hash value that specifically identifies an input image based on those contents. We have used benchmark fingerprint biometric databases FVC2004 (DB1, DB2, DB3, DB4) as our test dataset. Hash is generated using difference hash (dHash) algorithm [12]. The processing flow starts with the conversion of the input image to grayscale. To assure that the generated image hash will match related images regardless of their initial spatial dimensions, the image is then shrunk down to 9 × 8 pixels. The subsequent step is to compute the difference between adjacent pixels. If we calculate the difference between adjacent column pixels in an input image with 9 pixels per row, we get eight differences. Thus, there are eight differences for eight rows, which form the 64-bit (8 × 8) hash. The algorithm’s final step is to assign bits and construct the output hash using a straightforward binary test as depicted in the Algorithm 1.
Algorithm 1 dHash Algorithm
Input: Biometric image.
Output: Image hash as 64-bit integer.
1: Discard all color information and transform the input image to grayscale.
2: Resize the image to fix column width size of 8 pixels.
3: Calculate the binary difference between the adjacent pixels (P) of resized image.
4: if P[x] ≥ P[x + 1] then
5:     return 1 (If left pixel is brighter)
6: else
7:     return 0 (If left pixel is darker)
8: end if
9: Convert the calculated Boolean values into 64-bit integer.
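A minimal Python sketch of the dHash steps above is given below, assuming the Pillow imaging library; it mirrors the grayscale conversion, the 9 × 8 resize, and the left-versus-right pixel comparison that yields the 64-bit integer hash. The function and variable names are illustrative, not the paper's actual implementation (which uses the OpenCV library).

```python
from PIL import Image

def dhash(image_path, hash_size=8):
    """Compute a 64-bit difference hash (dHash) for a biometric image."""
    # Steps 1-2: grayscale, then resize to (hash_size + 1) x hash_size,
    # i.e. 9 x 8, so every row yields 8 horizontal pixel differences.
    img = Image.open(image_path).convert("L")
    img = img.resize((hash_size + 1, hash_size))
    pixels = list(img.getdata())

    # Steps 3-8: compare adjacent column pixels; bit = 1 if the left
    # pixel is brighter than (or equal to) the right one, else 0.
    bits = 0
    for row in range(hash_size):
        for col in range(hash_size):
            left = pixels[row * (hash_size + 1) + col]
            right = pixels[row * (hash_size + 1) + col + 1]
            bits = (bits << 1) | (1 if left >= right else 0)

    # Step 9: the accumulated bits form the 64-bit integer hash.
    return bits
```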
2.2 User Registration When a user requests a service for the first time, he must complete the registration process. The user registration process is depicted in Fig. 3a, and the procedure is described in Algorithm 2. The registration process starts with the client receiving the user information and biometric image from the new user. To submit the user's biometric image, the client uses the biometric image hash that was generated using Algorithm 1. The Fabric peer updates the world state ledger with the user information and returns the service, which in this case is adding a new block to the chain.
Fig. 3 a User registration workflow, b User authentication workflow
Algorithm 2 User Registration
Input: User requests for biometric registration.
Output: User template created as a block in blockchain.
1: The new user sends the user identifier (id), name, and biometric image to the client.
2: The client connects to the Python application to get the image hash.
3: Python application generates the biometric image hash using dHash algorithm.
4: Fabric peer publishes the user data to the ledger.
5: ledger ← (id, name, hash)
6: Biometric template is added as a new block to the chain.
2.3 User Authentication The authentication procedure must be performed each time the user requests the service, and the Fabric peer then returns the service following a successful verification. The user authentication process is depicted in Fig. 3b, and the procedure is described in Algorithm 3. The registered user provides biometric information to the client. To submit the user's biometric image, the client receives the biometric image hash that was generated using Algorithm 1. The smart contract is utilized by the Fabric peer to query the world state using the calculated hash. In response, the user is authenticated and the biometric template saved in the world state is returned if the specified hash exists in the ledger. Otherwise, the user is rejected.
Algorithm 3 User Authentication
Input: User requests for biometric authentication.
Output: User is authenticated/User is rejected with error message.
1: The client receives biometric data from the registered user.
2: The client connects to the Python application to get the image hash.
3: Python application generates the biometric image hash using dHash algorithm.
4: The fabric peer queries the world state with the calculated hash.
5: if hash exists in the ledger then
6:     client ← (id, name, hash)
7:     User is authenticated
8: else
9:     error message
10:     User is rejected
11: end if
3 Experiments and Analysis We have conducted experimental evaluations on the benchmark fingerprint biometric database FVC2004 (DB1, DB2, DB3, DB4) [6]; the results are reported in Sect. 3.2, and the comparison with the existing scheme is presented in Sect. 3.3. There are a total of 800 images of 100 subjects in each dataset of FVC2004. Each subject has
Fig. 4 Sample test images for each database (FVC2004_DB1, FVC2004_DB2, FVC2004_DB3, FVC2004_DB4)

Fig. 5 Biometric application using data hashing approach with blockchain
eight samples. Two test image samples for each database DB1, DB2, DB3, and DB4 are shown in Fig. 4. In our work, the first step is the deployment of the dHash algorithm discussed in Sect. 2.1, incorporating the Python OpenCV library. The biometric application using the data hashing approach employing a blockchain smart contract is shown in Fig. 5.
Next, we deploy the proposed scheme on the Hyperledger Fabric network. The following steps are followed for Fabric implementation:
3.1 Experimental Setup 1. Environment setup: We built the Fabric network using docker images. Org-1 and Org-2 are two organizations in the test fabric network, each with a peer node and a Certificate Authority (CA), which is used to issue certificates to administrators and network nodes. Membership Services Provider (MSP) is used to map certificates to member organizations. The network also comprises an orderer node along with the components required for fabric network construction along with the chaincode as shown in Fig. 6. The device that we used to build the Fabric network runs Ubuntu 20.0 operating system and the Fabric version we built is Fabric 2.2.5. 2. Smart contract-chaincode design and implementation: After configuring the network environment, we design the smart contract required by the system. For that, we determine the parameters that will be stored in the smart contract and define the corresponding functions for processing the parameters. We store the following data in the Fabric network: • The hash value of the fingerprint data of the user. • The name of the user. • The unique id for the user.
Fig. 6 Proposed scheme architecture on Fabric blockchain network
We create two fundamental methods in the smart contract (chaincode): • CreateTemplate: adds user data to the network • ReadTemplate: retrieves user data by supplying a biometric hash value as a parameter. We have created the smart contract in Node.js. We have used CouchDB as a peer-state database to store the information in JSON format. 3. Create channel: Every blockchain transaction on the network is carried out over a channel, where each participant involved in the transaction must be authenticated and authorized. Each peer that joins a channel has its own identity given by an MSP, which authenticates each peer to its channel peers and services. We create a channel for two peer nodes across two organizations in the Hyperledger Fabric test network. 4. Release chaincode: After channel creation, we deploy the chaincode in the Fabric network. 5. Achieve on-chain access: The Fabric environment has been set up, the smart contract has been created, and the chaincode release is complete. Next, we conduct experimental demonstrations to obtain a set of performance test results. Hyperledger Caliper, which is one of the performance benchmark frameworks, is employed in this work. To add user identity data to the network, the CreateTemplate() contract function is invoked. The ReadTemplate() contract function is invoked in order to query and search or authenticate the user identity data in the network.
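For illustration only, the following Python sketch mocks the logic of these two contract functions over a simple key-value world state; the actual chaincode in this work is written in Node.js and runs on Fabric, so the function and field names below are assumptions rather than the deployed implementation.

```python
# Conceptual mock of the two chaincode functions over a key-value world
# state (a stand-in for the CouchDB-backed world state on the Fabric peer).
world_state = {}

def create_template(user_id, name, image_hash):
    """CreateTemplate: store the user record keyed by the biometric hash."""
    world_state[image_hash] = {"id": user_id, "name": name, "hash": image_hash}
    return world_state[image_hash]

def read_template(image_hash):
    """ReadTemplate: look up a record by its biometric hash; None if absent."""
    return world_state.get(image_hash)

# Registration followed by authentication of the same fingerprint hash.
create_template("u001", "Alice", "a1b2c3d4e5f60708")
print(read_template("a1b2c3d4e5f60708"))  # record found: user is authenticated
print(read_template("ffffffffffffffff"))  # None: user is rejected
```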
3.2 Results and Discussion Blockchain transactions are simulated in the test network using Caliper tool for the following methods of smart contract: • CreateTemplate(): it is invoked by passing the user information viz. name, id, and hash as contract arguments. We have configured 1000 transactions in our benchmark configuration file for CreateTemplate() operation. The user information gets stored in CouchDB in our network. This can also be used to update the user information in the ledger. • ReadTemplate(): it is invoked by passing the hash of fingerprint data as contract argument. The performance of blockchain-based biometric template authentication system is evaluated in Hyperledger Caliper blockchain benchmark tool using two performance metrics: (i) Latency(max/min/avg) represents the time taken in seconds (s), between submitting a transaction and obtaining a response, (ii) Throughput represents the average number of transactions executed per second (TPS). The authentication system is tested for 1,000 transactions to create user templates and 10,000 transactions to read or authenticate users. It is evident from the results
Table 1 Results for 1,000 input transactions (Method: CreateTemplate())
FVC2004 | Min Latency (s) | Max Latency (s) | Average Latency (s) | Throughput (TPS)
DB1 | 0.54 | 13.08 | 7.71 | 38.9
DB2 | 0.65 | 14.34 | 7.05 | 39.5
DB3 | 0.64 | 14.36 | 7.5 | 38.2
DB4 | 0.66 | 14.03 | 7.13 | 39.5

Table 2 Results for 10,000 input transactions (Method: ReadTemplate())
FVC2004 | Min Latency (s) | Max Latency (s) | Average Latency (s) | Throughput (TPS)
DB1 | 0.01 | 0.17 | 0.03 | 168.3
DB2 | 0.01 | 0.18 | 0.04 | 152.4
DB3 | 0.01 | 0.14 | 0.03 | 173.6
DB4 | 0.01 | 0.15 | 0.03 | 170.2
reported in Tables 1 and 2 that the throughput is high while average transaction processing time for user template creation and authentication is very low. Consider the citizen identification system (which can be used for criminal identification, civil verification, social welfare programs, registration for a driver’s license, ration card, passport, and granting visa) as a use-case for our blockchain-based biometric authentication system. By reducing inaccuracy and providing a trustworthy, risk-free biometric identity authentication system, our system could assist businesses or the government secure the identity management process. The creation of trustworthy citizen identity documents for an individual could be supported and facilitated by our approach. The blockchain may further assure executable contracts, decentralization, security, privacy, and the integrity of biometric identity. On the blockchain network, each authentication procedure is documented as a transaction. Workflow for the citizen identification system based on the proposed scheme proceeds in the following manner: 1. Citizen request for authentication/identification. 2. Client interacts with the blockchain via a smart contract to manage citizen’s requests. 3. In the case of a new citizen, a smart contract is executed for user registration, and the citizen’s biometric template is generated as a block in the blockchain. 4. In the case of a registered citizen, smart contract is executed for user authentication, and citizen is authenticated or rejected on the basis of template matching. 5. Results/services/template/information is returned back to the client after the smart contract execution.
Table 3 Comparison with the existing scheme
 | Bao's work [2] | Our work
Database | FVC2002 | FVC2004 (DB1, DB2, DB3, DB4)
Hash generation approach | Fuzzy extractor approach | Difference hash generation approach
Blockchain implementation | Hyperledger | Hyperledger
Statistical results | Cracking difficulty | Latency, Throughput, Cracking difficulty
3.3 Comparison with Existing Scheme The proposed scheme is compared with the scheme presented by Bao et al. [2], and the comparison is represented in Table 3. A two-factor biometric identity authentication scheme is presented in Bao's work, and the authors have evaluated their system on the basis of only one parameter, i.e., cracking difficulty. In contrast, we have reported results on the basis of parameters such as latency and throughput in Tables 1 and 2. We have reported results for all four available FVC2004 databases (DB1, DB2, DB3, DB4), while Bao et al. have reported their experiment on the FVC2002 database. As far as the cracking difficulty of our work is concerned, attackers can only attempt brute-force traversal due to the hash algorithm's irreversibility. The cracking difficulty of each individual image hash is n, where n is the fixed hash size, so for m stored image hashes the overall cracking difficulty is m · n. Additionally, it would be extremely expensive and require significant effort to reverse the hash, making the identity information on the blockchain highly secure.
4 Conclusion The proposed work presents a biometric identity authentication framework that stores the biometric templates on a blockchain after creating the biometric image hashes using a data hashing scheme. The Hyperledger Fabric architecture is employed to deploy the blockchain, considering features such as scalability, ease of deployment, speed, transaction rates, confidentiality, security, and platform compatibility. The smart contract, or chaincode, is deployed on the Fabric platform to perform two main functions: data storage and data retrieval for identity authentication. The dHash algorithm that generates the biometric image hashes secures the biometric templates before they are stored on the distributed ledger, averting the risk of biometric template leakage. The Hyperledger Caliper benchmark performance tool is used to evaluate the performance of blockchain transactions in the test network. When tested, the system demonstrated
reduced deployment latency while maintaining privacy. The key benefit of the proposed system over conventional approaches is the high throughput and low latency of successfully created and authenticated biometric templates. The limitation of the data hashing technique utilized in our work is that it is still essential to guarantee the accessibility of the data kept outside the blockchain. The system's sustainability would be jeopardized if these data were lost or altered, even though any alteration would always be discovered. As future work, more efficient data hashing and template compression approaches could be explored without sacrificing accuracy. A quick, efficient, and less expensive authentication service could be achieved by reducing the template size, subsequently reducing the computation time and cost. To create a more proficient and reliable blockchain network, more performance-enhancing strategies could be explored, along with performance-affecting factors such as the endorsement policy and block size for the blockchain.
References 1. Aggarwal S, Chaudhary R, Aujla GS, Jindal A, Dua A, Kumar N (2018) Energychain: Enabling energy trading for smart homes using blockchains in smart grid ecosystem. In: Proceedings of the 1st ACM MobiHoc workshop on networking and cybersecurity for smart cities, pp 1–6 2. Bao D, You L (2021) Two-factor identity authentication scheme based on blockchain and fuzzy extractor. Soft Comput 1–13 3. Cheng X, Chen F, Xie D, Sun H, Huang C (2020) Design of a secure medical data sharing scheme based on blockchain. J Med Syst 44(2):1–11 4. Delgado-Mohatar O, Fierrez J, Tolosana R, Vera-Rodriguez R (2019) Biometric template storage with blockchain: a first look into cost and performance tradeoffs. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition workshops 5. Dwivedi R, Dey S (2017) Coprime mapping transformation for protected and revocable fingerprint template generation. In: International conference on mining intelligence and knowledge exploration. Springer, pp 111–122 6. Fingerprint verification competition: Fvc 2004 database (2022) http://bias.csr.unibo.it/ fvc2004/databases.asp 7. Goel A, Agarwal A, Vatsa M, Singh R, Ratha N (2019) Securing CNN model and biometric template using blockchain. In: 2019 IEEE 10th international conference on biometrics theory, applications and systems (BTAS). IEEE, pp 1–7 8. Halfond WG, Viegas J, Orso A, et al (2006) A classification of SQL-injection attacks and countermeasures. In: Proceedings of the IEEE international symposium on secure software engineering. vol 1. IEEE, pp 13–15 9. Nakamoto S, Bitcoin A (2008) A peer-to-peer electronic cash system. Bitcoin 4. https://bitcoin. org/bitcoin.pdf 10. Sawant G, Bharadi V (2020) Permission blockchain based smart contract utilizing biometric authentication as a service: a future trend. In: 2020 international conference on convergence to digital World-Quo Vadis (ICCDW). IEEE, pp 1–4 11. Shafagh H, Burkhalter L, Hithnawi A, Duquennoy S (2017) Towards blockchain-based auditable storage and sharing of IoT data. In: Proceedings of the 2017 on cloud computing security workshop, pp 45–50 12. Wang Dz, Liang Jy (2019) Research and design of theme image crawler based on difference hash algorithm. In: IOP conference series: materials science and engineering, vol 563, p 042080. IOP Publishing
13. Zhou B, Xie Z, Ye F (2019) Multi-modal face authentication using deep visual and acoustic features. In: ICC 2019-2019 IEEE international conference on communications (ICC). IEEE, pp 1–6 14. Zhou X, Hafedh Y, Wang Y, Jesus V (2018) A simple auditable fingerprint authentication scheme using smart-contracts. In: International conference on smart blockchain. Springer, pp 86–92
Circuit, Device and VLSI
Optimizing Throughput Using Effective Contention Aware Adaptive Data Rate in LoRaWAN R. Swathika and S. M. Dilip Kumar
Abstract In the Long Range Wide Area Network (LoRaWAN), changing spreading factors can adjust the data rate of end devices to optimize throughput. However, data rates should be adjusted carefully because many devices using the same spreading factor may cause collisions, leading to variation in the total throughput. An effective adaptive data rate should be devised to eliminate collision probability and increase throughput. In this paper, a constrained optimization objective function is designed to overcome the aforementioned problem, restricting the specific data rate used by a particular device, and it is solved using the Stochastic Gradient Descent (SGD) method. Using machine learning optimization techniques like SGD gives best-fit values, which will enhance the usage of LoRaWAN-based networks in a wide range of IoT applications. The numerical results and their analysis show that the proposed method improves throughput compared with the Gradient Descent method. Keywords LoRaWAN · Spreading factor · Throughput · Contention aware adaptive data rate · Gradient descent · Stochastic gradient descent
1 Introduction The Internet of Things (IoT) provides a good platform for binding many things, like humans, sensors, objects, and many others [1]. Resource allocation like spreading factor to IoT applications is very challenging in network environment [2]. Low Power Wide Area Network (LPWAN) technology provides a promising solution to resource allocation challenges. Under LPWAN, LoRaWAN uses the sub-GHz band and is used to build a private network in an unlicensed band. It is popular because it uses chirp spread spectrum modulation which is defined in a physical layer that enables long range, low energy, and low deployment cost. It uses Adaptive Data Rate (ADR) R. Swathika (B) · S. M. D. Kumar Department of Computer Science and Engineering, University Visvesvaraya College of Engineering, Bangalore University, Bengaluru, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 R. Sarkar et al. (eds.), Proceedings of International Conference on Frontiers in Computing and Systems, Lecture Notes in Networks and Systems 690, https://doi.org/10.1007/978-981-99-2680-0_26
to assign a Spreading Factor (SF) which controls the data rate of each device. It allocates one of six spreading factors (SF7, SF8, SF9, SF10, SF11, SF12) to LoRa end devices. The gateway in the network can adjust the SF to control each device's data rate. The device should use a smaller SF if the communication range is short, and a bigger SF if the communication range is longer [3]. This paper proposes an effective ADR technique to maximize throughput in the LoRaWAN. To maximize throughput, all devices should choose the smallest SF as much as possible. But if more devices use the same SF, then there is a higher chance of collision, which may affect throughput. To solve this, an objective function for throughput with respect to SF is designed and the total number of devices choosing the same SF is limited according to its communication range. To solve this constrained objective function, we propose a Stochastic Gradient Descent method.
1.1 Contributions This paper makes the following major contributions: (i) Formulates the objective function for optimizing throughput by calculating the traffic load on the sub-network for each SF and channel to enhance ADR in LoRaWAN. (ii) Solves the objective function by using the SGD method, providing an optimal solution for the total throughput. (iii) Compares the existing gradient projection method with the proposed SGD method and provides the simulation results as graphs and analysis tables.
1.2 Paper Organization The rest of the paper is organized as follows: Sect. 2 presents the background works related to ADR, contention awareness and how it impacts ADR, the collision and no-collision scenarios for particular SF and channel selections, and also the problem statement, objectives, and the existing projection gradient method. In Sect. 3, the proposed system with its techniques, limitations and tools, objective function, and throughput optimization is discussed. In Sect. 4, experimental results for throughput versus iterations and total number of devices versus iterations for the existing and proposed methods are compared and analyzed. Finally, in Sect. 5, the conclusion of the paper and the future work are presented.
2 Background This section briefly describes ADR in LoRaWAN, contention-aware ADR and how contention impacts ADR, and the existing works related to this paper, as discussed below.
2.1 Adaptive Data Rate in LoRaWAN The LoRaWAN end device uses any of the six SFs, which determines the data rate. The standard varies for different regions, and we consider the scenario in which the EU863-870 standard is used. The gateway verifies whether it still receives successful signal transmissions from the devices and increments the acknowledgement counter for each adjusted SF. To avoid further communication failures, an acknowledgement limit is set; the default value of the ADR acknowledgment limit is 64 for EU863-870. If the data rate is adjusted incorrectly, then it will take a longer time to tune to the exact data rate. Hence, the data rate must be regulated deliberately, as it may affect the entire network.
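As a rough illustration of the acknowledgement-counter bookkeeping described above, the following Python sketch counts transmissions since the last data-rate adjustment and flags when the limit of 64 is reached; the class and method names are assumptions made for illustration, not part of the LoRaWAN specification or this paper's implementation.

```python
ADR_ACK_LIMIT = 64  # default ADR acknowledgement limit for EU863-870

class AdrAckCounter:
    """Track transmissions since the last SF/data-rate adjustment."""

    def __init__(self):
        self.counter = 0

    def on_transmission(self):
        """Count one transmission; return True when the data rate
        should be re-evaluated because the limit was reached."""
        self.counter += 1
        return self.counter >= ADR_ACK_LIMIT

    def on_data_rate_adjusted(self):
        """Reset the counter after the SF (data rate) is adjusted."""
        self.counter = 0
```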
2.2 How Contention Awareness Impacts ADR? LoRaWAN uses pure ALOHA with Frequency-hopping Spread Spectrum (FHSS) to send packets on a randomly selected channel. The European standard provides a total of six channels, of which three are default channels that all gateways listen to all the time, and the other three channels are used for broadcasting by the devices. The bandwidth of all channels is fixed at 125 kHz. The spreading factor is also used for multiple access control and not only to regulate the data rate. If a device's signal uses a different SF or a different channel, collision is avoided, as shown in Fig. 1a. Even when two signals use the same channel, they can use different SFs to avoid collision. In short, multiple signals using the same channel and the same SF can overlap each other, as shown in Fig. 1b. Therefore, adaptive data rates should be dynamically adjusted depending on the respective network environment.
Fig. 1 Collision scenario: a no collision scenario for SF and channel, b collision scenario for SF and channel
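A small Python sketch of the collision condition just described (and illustrated in Fig. 1) is shown below: two transmissions interfere only when they overlap in time on the same channel and use the same SF. The dictionary fields and example values are illustrative assumptions.

```python
# Two LoRa transmissions collide only when they overlap in time on the
# same channel AND use the same spreading factor (different SFs or
# channels avoid the collision in this simplified model).
def collides(tx1, tx2):
    same_channel = tx1["channel"] == tx2["channel"]
    same_sf = tx1["sf"] == tx2["sf"]
    overlap = tx1["start"] < tx2["end"] and tx2["start"] < tx1["end"]
    return same_channel and same_sf and overlap

a = {"channel": 2, "sf": 7, "start": 0.00, "end": 0.05}
b = {"channel": 2, "sf": 7, "start": 0.02, "end": 0.07}
c = {"channel": 2, "sf": 9, "start": 0.02, "end": 0.07}
print(collides(a, b))  # True  -- same channel, same SF, overlapping in time
print(collides(a, c))  # False -- a different SF avoids the collision
```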
Table 1 Summary of related works
Paper | Year | Techniques | Limitations
[4] | 2018 | Gradient projection method | Only 3 spreading factors considered
[5] | 2020 | Distributed genetic algorithm based solving method | Low communication range
[6] | 2022 | Hybrid ADR (ADR and Blind ADR) | Interference due to high SF
[7] | 2020 | Gaussian filter-based ADR (G-ADR) and an exponential moving average-based ADR (EMA-ADR) | Poor adaption of SF and high convergence period
[8] | 2019 | Enhanced ADR (E-ADR) | Only on specific mobility patterns
[9] | 2020 | Enhanced greedy ADR (EARN) | Suffers from capture effect
[10] | 2021 | Mobility aware resource SF assignment scheme | Poor adaption of SF
[11] | 2021 | Resource management ADR | Fixed SF is not possible
2.3 Related Works In this section, related works on ADR are discussed, and an overview is given in Table 1. The work in [4] uses ADR to reduce collisions and improve the upper bound of throughput by using the gradient projection method. It formulates the objective function to find the best solution for attaining maximized throughput. The work in [5] optimizes the spreading factor to improve throughput, minimize energy consumption, and improve ADR. Here, the objective function is formulated to find an optimal solution by using a distributed genetic algorithm based solving method. The work in [6] uses a hybrid ADR technique to allocate resources in LoRaWAN. It uses ADR for static environments and Blind ADR for mobile environments to allocate spreading factor and transmission power to EDs. This proposed hybrid ADR improves the packet delivery ratio compared to the existing ADR and Blind ADR. In the work in [7], a Gaussian filter-based ADR (G-ADR) and an exponential moving average-based ADR (EMA-ADR) are used to assign the best SF and transmission power to EDs to lower the convergence period, reduce energy consumption, and enhance the packet transmission rate. In the work in [8], an enhanced ADR mechanism is used to minimize energy consumption and transmission time and improve Quality of Service (QoS) for mobile devices by adapting to the position of mobile devices to assign adaptive data rates dynamically. The work in [9] presents EARN, an enhanced greedy ADR mechanism for LoRaWAN that optimizes delivery ratio and energy consumption. In
a LoRaWAN, the ADR mechanism is used for allocating resources efficiently. A mobility-aware resource (SF) assignment scheme (M-ASFA) is proposed to assign the most suitable spreading factor to mobile devices [10].
2.4 Problem Statement If more devices use the same SF, the greater chance of collision leads to decreased throughput and performance, because signals created with the same SF are not orthogonal to each other and will overlap, causing a collision. Therefore, it affects not only the link quality but also the contention due to the impact of the changed SFs. The problem is to devise an effective contention-aware ADR mechanism in LoRaWAN. The objectives include: (i) To estimate total throughput for the formulated objective function. (ii) To compare the existing gradient projection and the proposed SGD methods. (iii) To improve ADR. (iv) To eliminate collision and contention.
2.5 Existing Projection Gradient Method In the existing work, the gradient projection method is used to solve the constrained optimization problem. It finds an optimal solution by iteratively updating the current solution in the gradient direction by a small step. When the solution violates the given constraints, it projects the solution back into the feasible space. Whenever the solution is updated, the method checks that it does not violate the constraints. The total throughput is calculated by using this gradient projection method in the existing work.
3 Proposed Stochastic Gradient Descent Method Figure 2 represents the flowchart of SGD. In SGD, random variables can be mapped as parameters and estimates can be made as probability distributions. Let us consider the unconstrained optimization function f(y, v), which depends on the decision variable y and on the random variable v. The goal of stochastic optimization is to find a solution that optimizes the function f with respect to the random variable, as given by Eq. (1) [12]:

\max_{y} f(y) = E[f(y, v)]   (1)
Fig. 2 Flowchart of SGD
3.1 Formulating an Objective Function

f(Y) is the objective function, as given by Eq. (2):

S = f(Y) = \sum_{i=1}^{T_c} \left( tr_7 \, Pr_{trx} \, o_{7,i} \, e^{-2\, tr_7 Pr_{trx} o_{7,i}} + \cdots + tr_{12} \, Pr_{trx} \, o_{12,i} \, e^{-2\, tr_{12} Pr_{trx} o_{12,i}} \right)   (2)
where i is the channel index, T is the total number of devices, T_c is the number of available channels, T_s is the number of available spreading factors, tr_7, tr_8, tr_9, tr_{10}, tr_{11}, tr_{12} denote the time of arrival for sending a packet of length PL, and Pr_{trx} is the transmission probability of all devices, which follows a Poisson distribution. Equation (2) can be approximately written as

f(Y) \approx T_c \left( tr_7 \, Pr_{trx} \, o_7 \, e^{-2\, tr_7 Pr_{trx} o_7} + \cdots + tr_{12} \, Pr_{trx} \, o_{12} \, e^{-2\, tr_{12} Pr_{trx} o_{12}} \right)   (3)
The objective function can be presented as follows:

\text{maximize } f(Y), \quad Y = \{o_7, o_8, o_9, o_{10}, o_{11}, o_{12}\}   (4)
In Eq. (4), Y is the optimization variable and denotes the total number of devices using the identical SF. From Eq. (2),

o_{j,i} \approx \frac{o_j}{T_c}   (5)
where o_{j,i} is the total number of devices performing a packet transmission utilizing SF j and channel i. The time of arrival for transmitting a packet is given by

tr_j = \frac{PL}{S_b(j)}   (6)
where PL is the packet length and S_b is the symbol rate. The total throughput can be calculated from the traffic on each sub-network for each SF j and channel i, as given in Eq. (2). Optimizing the objective function using the SGD method gives an optimal solution for the throughput.
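The following Python sketch illustrates, under assumed parameter values (the time-of-arrival figures, transmit probability, step size, and iteration count below are placeholders, not the paper's settings), how a stochastic gradient-style update of the per-SF device counts o_7, ..., o_12 can push the approximate throughput of Eq. (3) upward while keeping the total number of devices fixed.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed per-SF time of arrival (s) and transmit probability: illustrative
# values only, not the ones used in the paper's experiments.
tr = np.array([0.05, 0.10, 0.18, 0.33, 0.58, 1.05])   # SF7 .. SF12
p_tx = 0.01
T_c = 6            # number of channels
T = 100.0          # total number of devices
o = np.full(6, T / 6)   # initial number of devices per SF

def throughput(o):
    # Eq. (3): total traffic carried over all channels and SFs.
    load = tr * p_tx * o
    return T_c * np.sum(load * np.exp(-2.0 * load))

def partial(o, j):
    # Derivative of the j-th term of Eq. (3) with respect to o_j.
    a = tr[j] * p_tx
    return T_c * a * np.exp(-2.0 * a * o[j]) * (1.0 - 2.0 * a * o[j])

step = 50.0
for it in range(100):
    j = rng.integers(0, 6)            # stochastic: pick one random SF
    o[j] += step * partial(o, j)      # gradient ascent on the throughput
    o = np.clip(o, 0.0, None)         # keep device counts non-negative
    o *= T / o.sum()                  # project back to sum(o) = T devices

print("device split per SF:", np.round(o, 1))
print("estimated total throughput:", round(throughput(o), 4))
```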
4 Experimental Results The tool used to generate the simulation results is MATLAB R2020a [13]. By using the objective function for throughput given earlier, graphs and numerical results are obtained. The number of devices is 100, and the number of channels is 3 and 6 in different cases. The spreading factors taken are SF7, SF8, SF9, SF10, SF11, and SF12. The threshold for the norm of the gradients is 0.0001. Figure 3a represents the total throughput versus iterations of the proposed SGD method and the existing gradient projection method. It takes 100 iterations, and the total throughput at the maximum iteration is 0.0377 for 6 channels and 6 SFs. Figure 4a represents the total number of devices versus iterations of the proposed SGD method
Fig. 3 Throughput versus iterations: a Channel (Tc) = 6, b Channel (Tc) = 3
Fig. 4 Total number of devices versus iterations: a Channel (Tc) = 6, b Channel (Tc) = 3

Fig. 5 Total number of devices versus throughput
and the existing gradient projection method. It takes 100 iterations, and the number of devices at the maximum iteration is 55 for 6 channels and 6 SFs. Figure 3b represents the total throughput versus iterations of the proposed SGD method and the existing gradient projection method. It takes 100 iterations, and the total throughput at the maximum iteration is 0.00725 for 3 channels and 6 SFs. Figure 4b represents the total number of devices versus iterations of the proposed SGD method and the existing gradient projection method. It takes 100 iterations, and the number of devices at the maximum iteration is 38 for 3 channels and 6 SFs. Figure 5 represents the total throughput versus the number of devices, which is a comparison of the naive method (ADR), the existing method (projection gradient), and the proposed method (SGD). It takes the number of devices as powers of 10, from 10^1 to 10^10, for 6 channels and 6 SFs. The graph shows that for the maximum number of devices, the proposed method has a total throughput of 0.045, which is higher than that of the existing method and the naive method.
Table 2 Analysis of throughput versus iterations (Channel = 3 and 6)
Iterations | Existing (Tc = 6) | Proposed (Tc = 6) | Existing (Tc = 3) | Proposed (Tc = 3)
1 | 0.13 | 0.09 | 0.023 | 0.043
10 | 0.03 | 0.14 | 0.042 | 0.072
20 | 0.03 | 0.038 | 0.068 | 0.072
40 | 0.03 | 0.038 | 0.068 | 0.072
60 | 0.03 | 0.038 | 0.068 | 0.072
80 | 0.03 | 0.038 | 0.068 | 0.072
100 | 0.03 | 0.038 | 0.068 | 0.072
Table 3 Analysis of total number of devices versus iterations (Channel = 3 and 6)
Iterations | Existing (Tc = 6) | Proposed (Tc = 6) | Existing (Tc = 3) | Proposed (Tc = 3)
1 | 10 | 34 | 18 | 34
10 | 25 | 55 | 20 | 38
20 | 34 | 55 | 24 | 38
40 | 47 | 55 | 35 | 38
60 | 47 | 55 | 35 | 38
80 | 47 | 55 | 35 | 38
100 | 47 | 55 | 35 | 38
4.1 Analysis of Results The above simulation results are analyzed as follows. From Fig. 3a, b, the total throughput versus iterations for 3 and 6 channels is analyzed in Table 2. As observed from the results, the total throughput increases as the total number of channels increases and decreases as the number of channels decreases over the maximum of 100 iterations, and it is constant from 20 iterations onwards. From Fig. 4a, b, the total number of devices versus iterations for 3 and 6 channels is analyzed in Table 3. The total number of devices increases as the number of channels increases and decreases as the number of channels decreases over the maximum of 100 iterations, and it is constant from 20 iterations onwards. From Fig. 5, the total number of devices versus throughput for 6 channels is analyzed in Table 4, which compares the throughput of the naive approach, the projection gradient method, and the SGD method. The total number of devices is given in powers of 10, and the throughput of the SGD method is higher than that of the other two methods. It is also observed that as the total number of devices increases, the throughput decreases.
Table 4 Analysis of total number of devices versus throughput (Naive approach, projection gradient, SGD)
Total # devices | Naive approach | Projection gradient | SGD method
10^1 | 0.02 | 0.03 | 0.05
10^2 | 0.05 | 0.06 | 0.09
10^3 | 0.01 | 0.04 | 0.07
10^4 | -0.04 | -0.03 | 0.045
10^5 | -0.05 | -0.04 | 0.045
10^6 | -0.06 | -0.05 | 0.045
10^7 | -0.07 | -0.06 | 0.045
10^8 | -0.08 | -0.07 | 0.045
10^9 | -0.09 | -0.08 | 0.045
10^10 | -0.1 | -0.09 | 0.045
5 Conclusion The ADR is used to adjust the spreading factor, which in turn provides an opportunity to optimize the throughput in LoRaWAN by eliminating collisions through a contention-aware mechanism. In the existing work, the projection gradient method is used to obtain an optimal solution for maximizing the throughput objective function. In this paper, a single-objective stochastic gradient method is used to find the best solution by formulating an objective function that optimizes throughput for contention-aware ADR in LoRaWAN. The proposed Stochastic Gradient Descent (SGD) method is compared with the existing projection gradient method and a naive approach, and the simulation results show that SGD achieves higher throughput than both. As part of future work, multi-objective optimization techniques can be used to incorporate ADR techniques. It is planned to extend this proposed work to quasi-static and mobile environments.
References 1. Akshatha PS, Kumar SMD (2022) Delay estimation of healthcare applications based on MQTT protocol: a node-RED implementation. In: 2022 IEEE international conference on electronics, computing and communication technologies (CONECCT), pp 1–6. https://doi.org/10.1109/ CONECCT55679.2022.9865759 2. Srinidhi NN, Kumar SMD, Venugopal KR (2019) Network optimizations in the internet of things: a review. Eng Sci Technol Int J 22(1):1–21 3. Ayoub W, Samhat AE, Nouvel F, Mroue M, Prevotet J (2019) Internet of mobile things: overview of LoRaWAN, DASH7, and NB-IoT in LPWANs standards and supported mobility. IEEE Commun Surv Tutor 21(2):1561–1581 4. Kim S, Yoo Y (2018) Contention-aware adaptive data rate for throughput optimization in LoRaWAN. Sensors
Optimizing Throughput Using Effective Contention …
301
5. Narieda S, Fujii T, Umebayashi K (2020) Energy constrained optimization for spreading factor allocation in LoRaWAN. Sensors 6. Farhad A, Pyun J-Y (2022) HADR: a hybrid adaptive data rate in LoRaWAN for internet of things. ICT Express 8:283–289 7. Farhad A, Kim D-H, Subedi S, Pyun J-Y (2020) Enhanced LoRaWAN adaptive data rate for mobile internet of things devices. Sensors 8. Benkahla N, Tounsi H, Song Y, Frikha M (2019) Enhanced ADR for LoRaWAN networks with mobility. In: 2019 15th international wireless communications and mobile computing conference (IWCMC), pp 1–6. https://doi.org/10.1109/IWCMC.2019.8766738 9. Park J, Park K, Bae H, Kim CK (2020) EARN: enhanced ADR with coding rate adaptation in LoRaWAN. IEEE Internet Things J 7(12):11873–11883 10. Farhad A, Kim D-H, Kim B-H, Mohammed A, Pyun J-Y (2020) Mobility-aware resource assignment to IoT applications in long-range wide area networks. IEEE Access 8:186111– 186124 11. Anwar K, Rahman T, Zeb A, Khan I, Zareei M, Vargas-Rosales C (2021) RM-ADR: resource management adaptive data rate for mobile application in LoRaWAN. Sensors 21:7980. https:// doi.org/10.3390/s21237980 12. Liu S, Vicente LN (2021) The stochastic multi-gradient algorithm for multi-objective optimization and its application to supervised machine learning. Annal Oper Res. https://doi.org/ 10.1007/s10479-021-04033-z 13. https://www.mathworks.com/products/new-products/release2020a.html
ESCA: Embedded System Configuration Assistant Akramul Azim and Nayreet Islam
Abstract Present-day control and processing systems depend on embedded systems at their core. Every day, we come across new devices and hardware equipped with embedded software, gradually increasing the number of embedded systems. The size of embedded software is growing, and the requirements are growing even faster. As systems become more complex, it is increasingly difficult to figure out the most suitable system for a given purpose and specification. In this paper, we propose the Embedded System Configuration Assistant (ESCA), which combines two topics, Embedded Systems and Product Line Engineering, to provide a scalable solution to this problem. ESCA selects the best configuration of hardware and devices required to build an embedded system for a particular purpose by identifying the system through a set of requirements and constraints provided by the user. ESCA adopts a product line engineering approach, which allows the development of diversified, quality software in less time, at reduced development cost, and in higher quantity. We use variability modeling, which gives ESCA the ability to customize, extend, and configure for large input sizes. Therefore, ESCA is scalable and fast in finding possible configurations of an embedded system. A prototype tool is developed to demonstrate the feasibility of the proposed ESCA.
1 Introduction In recent years, methods of system development have changed. The industries need to produce a massive amount of product that can meet varied customer requirements in a short span of time. It is also necessary for the manufacturers to build each system at a lower cost. This problem can be resolved by using a product line engineering A. Azim · N. Islam (B) Faculty of Engineering and Applied Science, Ontario Tech University, Oshawa, Ontario L1G 0C5, Canada e-mail: [email protected] A. Azim e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 R. Sarkar et al. (eds.), Proceedings of International Conference on Frontiers in Computing and Systems, Lecture Notes in Networks and Systems 690, https://doi.org/10.1007/978-981-99-2680-0_27
approach that enables the system to be produced at a quicker and cheaper rate in comparison to individualized production. However, use of product line can significantly reduce the potentiality of system diversification as opposed to an individually produced system. Although standardized mass products lead to low technological investments, different people want different products for a particular purpose. These lead to rising demand for taking customers’ requirements into account and giving them what they wanted. Therefore, we need to have the ability for large-scale system productions which can fulfill individual customers’ needs. We experience a massive increase of embedded systems in many day to day applications [15]. For embedded systems, we need to identify appropriate system components and aid in the development of versatile higher quality system at lower costs and in a shorter time. Due to the escalating number of design specifications and available embedded technology, selecting the most appropriate embedded technology is proving to be a big challenge for modern embedded system manufacturers. To cope with the speed of innovation, we propose a framework called Embedded System Configuration Assistant (ESCA), which is developed using product line engineering as its underlying platform to search out the most befitting embedded system based on the constraints provided by the user. Over the last decade, the implementation of the product line engineering has set substantial evidence of success in a wide range of fields and has revolutionized the concept of mass customization [11]. A feature-oriented approach bridges the gap between the need for future extensions and arising variability in hardware [1], and therefore, ESCA was reinforced by feature model to sustain changes ensuring flexibility and reusability. The product line engineering approach also helps ESCA to identify user requirements as features and analyze them based on currently available embedded systems to provide a scalable solution in a small runtime. Product line engineering at its most fundamental level recognizes product manufacturing as a combination of common features for building a platform, and variability for enhancing mass customization on a portfolio of reusable resources [11]. Reusable resources in product line engineering have helped to integrate individualism in standard mass production. This integration helps in keeping the quality and efficiency of end-product unchanged even in resource-constrained environments. The essential infrastructure robustness of product line lies in its reliance on the abstract concept of features which encapsulates design decisions, requirements, and implementations by efficiently communicating among the commonalities, differences, and variations throughout the manufacturing process. Feature models and cardinalities concentrate more on the common features and variabilities of a product with multiple features masking the overhead of implications. ESCA makes use of the abstractness of features and variation points to set user requirements in a pattern that can be utilized to figure out the most suitable embedded system. The architecture of ESCA is flexible to allow variations and extensions, thereby supports improvements with time. Over the years, there has been extensive improvements in the field of embedded systems and achieving hardware reliability has been a key motive of the embedded industry. 
Nowadays, new state-of-the-art embedded technologies are being built, and their numbers are skyrocketing. In the competitive market of safety-critical
systems, where fault-tolerance requirements are very high and small fluctuations in performance may lead to potential safety issues, stakeholders find it quite difficult to process all the design constraints and figure out the most suitable system on which to build their architecture [4]. This difficulty motivated the building of ESCA in order to assist industries and novices by recommending the best embedded system to serve their purpose. The lightweight searching performed in ESCA is unique in its own way, as it fuses product line engineering to enhance runtime efficiency with minimal memory overhead. As more and more systems are being manufactured, such a dedicated assistant is necessary to obtain the best performance. Inspired by the advancing trends in embedded technology, ESCA's adaptive feature model allows development without bringing significant alteration to the platform. Therefore, it aims to achieve runtime efficiency and growing variability management at the same time. The key motivation toward the creation of ESCA based on feature models was the ever-increasing variety of embedded systems. The product line approach toward solving the filtering problem is basically to keep pace with change and to allow features to be added without bringing changes to the platform. The components are reusable, and utilizing cardinality in the feature model allows a flexible and powerful model. A custom-made system can perform a specific set of functions but requires more time and money to produce. Moreover, no comprehensive resources are available to deal with the various design and implementation issues of this technology. Therefore, we need an assistant that recognizes the user's needs and provides a range of suitable systems that can fulfill them. The rest of the paper is organized as follows: Sect. 2 describes the problem statement of the research. An overview of the system model is presented in Sect. 3. Section 4 demonstrates the methodology and workflow, describing the different stages involved in the research. Section 5 shows some related work in this area. Section 6 presents the experimental analysis of our approach. Section 7 provides some discussion of future work and extensions, and finally, Sect. 8 concludes the paper.
2 Problem Statement A number of approaches exist that perform filtering based on a given set of constraints. However, much work is yet to be done on making the screening method more dynamic by observing the problem from a different paradigm. Mass manufacturing industries often adopt cutting-edge strategies or transformed ideas to change their rate of production. Hence, newer criteria are continually being added, and keeping track of these perpetual changes is a challenging task. Another consistency issue lies in preserving the speed of search while monitoring the changes. Therefore, this paper proposes a dynamic filtering system, using a prominent product line engineering approach applied to the fast-paced embedded systems market, that allows new criteria to be added while maintaining the timing constraints at the same time. Here, we aim to model the constraints and added criteria as features to enable
an adaptive infrastructure of the program. Goal: Assume that we have a list of criteria modeled as features to determine a list of suitable embedded systems on the basis of the user's selection. The objective of this paper is to build an application that supports the initial stage of building a determinate embedded system by selecting the best configurations of devices that satisfy a set of requirements and constraints provided by the user and returning specific final products and configurations for a product family of embedded systems.
3 System Model and Assumptions Embedded systems control most of the everyday devices used in a variety of sectors like automotive, industrial, commercial, and military. A system s consists of various components, which can be denoted by

s = \{c_1, c_2, \ldots, c_{n-1}, c_n\}   (1)
Here, n is a positive natural number, n ∈ {1, 2, . . .}. One of the main challenges in embedded system development is the identification of various user requirements and constraints. Our system uses software product line techniques for identifying the combinations of possible devices. The variability extractor is responsible for the identification of the different aspects and components which can differ from product to product. One of the main challenges in determining an appropriate configuration is the identification of features. In our work, a feature f can be any component that has unique or distinguishable aspects in terms of functionality or performance. We perform feature-oriented development in our work. The feature collector extracts various features from s based on the user requirements. We also determine various dependencies among the features. We use the feature logic associator to determine logical associations among the features based on the dependencies. Final specification generation is responsible for generating combinations of end products, which helps the developer to identify an appropriate combination of products that satisfies their requirements.
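As a rough illustration of how features, dependencies, and resulting configurations can be represented, the following Python sketch enumerates valid configurations of a toy smart-home feature model; the feature names and the single constraint are invented for this example and do not come from the paper.

```python
from itertools import product

# Hypothetical features and their alternative values for the smart-home
# example (Example 1); None means the optional feature is omitted.
FEATURES = {
    "presence_sensor": ["infrared", "ultrasonic"],
    "camera": [None, "basic", "hd"],
    "door_lock": [None, "fingerprint"],
}

def valid(cfg):
    # Assumed dependency: a fingerprint door lock requires some camera
    # so that surveillance notifications can be generated.
    if cfg["door_lock"] == "fingerprint" and cfg["camera"] is None:
        return False
    return True

def configurations():
    """Enumerate every combination of feature values that satisfies the constraints."""
    names = list(FEATURES)
    for combo in product(*(FEATURES[n] for n in names)):
        cfg = dict(zip(names, combo))
        if valid(cfg):
            yield cfg

for cfg in configurations():
    print(cfg)
```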
4 Methodology Embedded systems can be defined as hardware/software systems that are built into devices and provide some dedicated functionality in a large variety of domains, often maintaining real-time constraints [10]. In recent days, embedded systems have been one of the prime focuses of scientific research and development [8]. We can observe an exponential increase of embedded systems in many devices and
places [8]. Due to the vast quantity of different assets to build an embedded system, it is often hard for the beginners to know the correct methods to implement their first systems. In this section, we present a step-by-step walk-through and various methodologies and techniques involved in our work that assists both user and developer in determining the best set of system configurations and a list of appropriate devices along with their various dependencies using the concept of software product line engineering and feature model. To provide a better understanding of our approach, we present an example of home automation system. Example 1 Consider that we need a home automation system that will provide basic comfort and security to its user at a low cost. To address this goal a smart home can have different sensors and actuators that can be connected to a well-designed user interface.
4.1 Requirement Analysis Our process begins with the determination of user expectations for the system. Although the term requirement can have a different meaning in various context, in our work, we identify the key components that have aspects and attributes which can vary from other components and define it as a feature. This feature often has many functional and non-functional specifications. Some of them are quantifiable, some of them can be described qualitatively playing a vital role in the system development (Fig. 1).
Fig. 1 Architecture of ESCA

4.1.1 Purpose of the System
In our work, we classify the purpose of a system into two types, such as general purpose and extra features. General purpose requirements are those that the system
must have at any cost to perform. For example, a mobile system must have a minimum amount of processor, RAM, and storage. In addition to that, it must provide some basic functionalities like making a phone call, displaying information on a screen. Although the system can function without extra features, it provides some advanced additional functionalities to the system. The general purpose features are linked with extra features and often have some relationships and ties with them. For example, features like GPS, mp3 can be termed as extra features as they provide some additional functionality to the mobile system. For example, in Example 1 to ensure safety, a smart home can have some requirements like detection of danger caused by fire or electricity. It can have a camera system to perform surveillance and generate some appropriate notifications in the event of a burglary. The home automation system can also provide some energy management features like smart lighting, heating, etc.
4.1.2 Know the Devices Related to Some Requirements
At the initial stage of the system development, we also need to know various devices that can be used to determine and sometimes act on the environment. The system can gather different information from the environment using sensors and use actuators to perform or act on the environment. For example, in Example 1 to detect the presence of a person in a room, we can use infrared sensors. We can use light actuators which have the ability to turn the switch on or off depending on the presence of a person in the room. The system can also have some control devices that can process the data read from the sensors and in some conditions activate appropriate actuators if needed. A fingerprint-based door security system performs authentication of the system, determines whether the door is open or closed, and uses an actuator to take necessary actions. In this work, we assume that appropriate software is deployed for each device to perform smoothly. Now a device (sensor or other hardware) can be dedicated to a fixed purpose, or it can be used to enable another device (often in combination) with an ability to be flexible in numerous tasks.
4.1.3 Know the Constraints of the System
The user can provide some constraints to the system. The constraint can be fixed or can be limited to a certain range. In this work, we propose two constraints, namely 1. Purpose constraints and 2. Performance metrics. A constraint can be associated with one or more features of the system. There exist different relationships among the features depending on the constraints. For example, a mobile system may or may not have a camera. However, it must have a screen with some basic display features (at least). If the system contains a camera, a constraint is put such that screen must be a high resolution that supports video capturing and display, instead of the basic screen. Another example of the constraint is that in smart home presented in Example 1 can
contain an actuator that has to shut down doors/windows in a certain situation within some predefined time limit or, in the case of fire it can use an actuator that sprays water in a fixed time constraint.
4.2 Software Product Line Techniques In the initial stage of the development of an embedded system, it is hard to know whether the current components or design are good and will perform well. Using the embedded system configuration assistant, we can identify different variants of the system. Rather than relying only on our own knowledge, ESCA provides a trusted and well-defined build state that helps the developer to develop an efficient system using a software product line and a feature model. We use a software engineering technique that allows us to determine similar embedded systems that have a common purpose. A product line approach can be used to produce such systems using a common set of core development assets. The software product line technique allows us to make efficient and predictable reuse of software components. We use variability modeling, which describes the different variable aspects of the system. The key to developing a series of software products is to identify their commonalities, which are called the base of that product family. To be able to support mass customization, we need to determine the components whose aspects can vary from each other. The base allows us to manage system variability efficiently. In our work, for any system, we identify the components that can vary with respect to different fixed or ranged properties. In our project, each variable component can be identified as a feature. Variability modeling allows us to determine combinations of different embedded systems in a comprehensive and fast way.
4.3 Feature-Oriented Development Feature-oriented development architecture is the key to what makes ESCA a powerful tool. The basic inspiration lies in the definition of features where a feature can be a valuable customer aspect [16]. In our work, we implement this key concept where we consider each customer demand and identify the key software components that have one or more attributes that can vary from other products. Introducing features allows not only modularity but also presents a high degree of abstraction to strategize and initiate a foundation toward solving the problem. Feature-oriented development summarizes user demands in a chained link of dependencies that can govern the backbone platform and filter user-specific requirements in the product line engineering. Methodologies such as FODA, FORM, and FeatuRSB provide a solid reliability that ensures linkage among features without altering the flexibility and maintainability of modern software architecture. Besides, maintenance
and flexibility provide the reusability and robustness that are inherent benefits of the feature model. The idea of FORM [9], a concrete upgrade of FODA, is a natural fit for framing ESCA's basic framework. ESCA maps architectural components to features in an interdependent manner by classifying user specifications into the general purpose, extra features, and constraints. Therefore, ESCA naturally builds on a common base or platform and digs deeper based on the requirements provided. This process would have been rather sophisticated and complicated to execute if a feature-oriented architecture were not used, particularly because all the constraints have to be taken into account as classified. ESCA produces initial results based on the mandatory requirements and then filters them based on the arising interdependencies of technology or constraints. This is done by searching a constantly updated database containing details of products classified by family. The initial setup is tedious, but the strength of this foundation ensures its longevity. ESCA generates a relational model of the product family desired by the user, prepares an initial set of combinations, filters away all the product descriptions that do not address the user's needs, and presents the user with a recommended setting for the associated purpose. This is the credibility property of ESCA: the tool is capable of breaking down a complex set of requirements into separate features and establishing links between them to enable the filtering. At the same time, compared to other analysis and matching algorithms, ESCA offers better runtime optimization, especially for handling larger datasets. So, the only major task of ESCA lies in creating the interlocked branches of features, and the optimized filtering takes care of the rest. This also leads to shorter execution and processing times while keeping ESCA a lightweight technology. The variability management lies in the feature model generation, and that very model governs the hierarchical path for ESCA to traverse. An example of a home automation sensor search demonstrates in brief the potential hidden in ESCA. A user who wants a smart home with an automated lighting system expects the lights to turn off when the user's presence cannot be traced in a room, so an in-house tracking sensor suffices to locate the user. However, if the user also desires an anti-theft system coupled to the lighting system, ESCA will take this feature into consideration and recommend a perimeter tracking sensor to detect intruders and turn on the lights to give the impression of the user's presence. This will also require another major feature, the door locking sensors, to be activated at the same time for the best security measure. Also, if the user demands remote access, the system should be equipped with features for sending and receiving data, selected according to the user's cost and desired range. In the case presented, ESCA will first list its database's home automation lighting systems and then sift through the list according to the user specification to recommend the best choices to the user. The user will then select the best of the prescribed options according to the weighting factor allotted to each feature. To do this, ESCA prepares a basic model for the home automation lighting and tracking sensors, chains them in a bond governed by the user demands, and applies the necessary extensions.
Then it will traverse the model and compare the description of each related item in the list to either choose or filter it, leaving the most appropriate set of sensors at the end of the analysis and matching.
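To make the dependency-driven filtering described above concrete, the following minimal sketch (illustrative only; the feature names, catalogue, and function names are hypothetical and not part of the actual ESCA implementation) shows how a chained requires-relation such as "anti-theft needs a perimeter tracking sensor and door lock sensors" can be used to filter a small product list.

```python
# Hypothetical sketch of dependency-driven filtering (not the actual ESCA code).
# Selected features are closed under a "requires" relation, and only products
# whose feature sets cover the closure are kept.

REQUIRES = {
    "automated_lighting": {"in_house_tracking_sensor"},
    "anti_theft": {"perimeter_tracking_sensor", "door_lock_sensor"},
    "remote_access": {"data_transceiver"},
}

def expand(selected):
    """Close the user's selection under the requires-relation."""
    result, changed = set(selected), True
    while changed:
        changed = False
        for feature in list(result):
            for dep in REQUIRES.get(feature, set()):
                if dep not in result:
                    result.add(dep)
                    changed = True
    return result

def recommend(catalogue, selected):
    """Keep only products whose features cover the expanded selection."""
    needed = expand(selected)
    return [name for name, feats in catalogue.items() if needed <= feats]

catalogue = {
    "basic_kit": {"automated_lighting", "in_house_tracking_sensor"},
    "security_kit": {"automated_lighting", "in_house_tracking_sensor", "anti_theft",
                     "perimeter_tracking_sensor", "door_lock_sensor"},
}

print(recommend(catalogue, {"automated_lighting", "anti_theft"}))  # ['security_kit']
```

In ESCA itself the catalogue would come from the server-side database and the weighting factors would rank the surviving candidates; the sketch only shows the closure-and-filter step.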
5 Related Works
In recent times, the software engineering field has been utilizing search-based software engineering (SBSE) for problem optimization. This term, though relatively new, has undergone heavy research and discussion since its birth [7]. Significant reports of successful adoption of SBSE are found in [18, 19]. Improved forms of search algorithms in SBSE have been investigated for optimization purposes, applying algorithms such as the greedy algorithm [13]. ESCA focuses more on experimental outcomes, using feature models to enhance the customer experience. The search-based domain is backed by many theoretical foundations that have had limited real-life implementation despite their immense potential. Especially in software product line-based software, the concept of feature models can contribute quite significantly to optimizing the overall process. ESCA envisions this monumental advantage of feature models and how it can bring about a substantial difference from present trends. Analyzing constraint handling from a different point of view is the focal point of ESCA. Not many feature-based search tools that make use of software product lines have been developed. ConfBPFM is an assistant built on a metaheuristic (genetic) search algorithm to construct a business process family [14]. Moreover, genetic algorithms were also applied in the real-time stress analysis of the ETHOM tool, which is another optimization problem [2]. Research works such as big data applications in real-time traffic operation and safety monitoring reveal that big data analysis is a major concern when constructing safety-critical systems [17]. Analyzing such enormous amounts of data can be very expensive if properly modeled approaches are not taken. ESCA identifies constantly increasing features or constraints as a potential source of big data and takes measures to prevent the breakdown of the system through the variability management of interconnected features in feature models in SPLE. This, in turn, not only allows effective feature management but also facilitates further extension. The use of product lines and variability models is mainly confined to optimization, and not much effort has been put into their use for scalability enhancement in the software industry. On the contrary, assembly line and other hardware manufacturing models regard the feature model as a tool to ensure a scalable model that preserves deadlines. Applying heuristic algorithms to solve search problems in an optimized fashion is ideal, but it costs extended search time and significant computational investment. The same holds for subset search algorithms as well. ESCA makes use of features to optimize the search space and reduce redundancy and computation time, in a way analogous to the feature model approach. ESCA aspires to provide reduced execution time alongside the flexibility of overall improvement. Dynamic SPL [5, 6] is an imposing and modern architecture that allows smaller runtime if used in combination with aspect models, especially for interacting features, according to research results [3]. Another exemplary language is Caesar, which combines AspectJ and FOA for proper constraint and variability
management, and runs on a similar feature-oriented structure [12]. Based on multiple studies conducted on feature models, ESCA's decision to be a product line engineering model was finalized. ESCA focuses on the subtle but important advantages that SPL and variability models provide, summing up all the big and small benefits of product line engineering to solve the old problem of optimized search by analyzing the scenario from a different angle.
6 Experimental Analysis
ESCA employs software product line techniques and uses variability modeling to identify the various aspects of the products that can vary in different systems. We use a feature model to identify the various system features and the relationships and associations among them. To the best of our knowledge, a comparative framework to analyze the improvements of our proposed scheme is absent, as the system uses feature dependencies and the links among them to create a feature tree. The feature tree is used to identify possible combinations of devices. ESCA begins by gathering requirements from the user. The user demand can be of three kinds: general purpose, extra features, and constraints. The general purpose comprises the mandatory properties of a particular embedded system. Apart from these properties, a system may have some additional properties. For example, a system providing weather information about its surroundings must have a minimum specification of CPU, RAM, and ROM, which are mandatory requirements. An example of minimal requirements is presented in Table 1. However, apart from providing information about pressure and temperature, it can have additional sensors that provide humidity information. Additionally, the system user can put some constraints on the system regarding its functionality and actions. ESCA is a lightweight but well-built open web application. It uses a client-server architecture at its core, with GlassFish as its server, and can be operated on various platforms. The client web application is responsible for gathering user input. Here, apart from the various requirements and user needs, the user can also insert the necessary information about the various RAM, ROM, processor, and development kit boards. The information is stored in a database on the server (Fig. 2). The assistant module of ESCA is responsible for providing appropriate specifications based on the user demand. At first, it identifies the minimal specification for the
Table 1 Sample minimum specification of a motion detection system
General purpose: Motion detection
Minimal spec: CPU: 8-bit, RAM: 1 MB, ROM: 128 kb
Fig. 2 Architecture of ESCA tool and web-server relations
desired system. In this work, the minimal specification is identified as a list of metadata (rather than a list of specific devices) that the system uses or identifies as the commonality of the products. After the system gathers the minimal specification, the next step is to get the extra features and hardware devices. The devices can also have specification changes related to any additional feature. For example, the humidity sensor has some requirements like the connector and the device itself, and may require specification changes such as a higher RAM amount or a different data bus in the processor (from 8 to 16 bit). We use a variability model to identify these aspects, and each attribute or component in a system that can vary is identified as a variable. The system can start identifying appropriate final specifications once the requirements and constraints are given. Our system uses a feature model, where a feature can be stated as a distinct aspect that has a significant impact on the system characteristics. We use the software product line development technique with the aim of providing a systematic and efficient configuration assistant. The system uses a feature model that identifies dependencies among the features. A feature diagram can be used to represent the relationships of the features along with their dependencies. Our system also uses feature configuration, where we can define a set of features f = {f1, f2, ..., fn-1, fn}, where fi is a member of the software product line. A feature fi is only allowed in a configuration if it does not violate the requirements and specifications. Therefore, our system builds a tree data structure called Feature that contains the features of the possible solutions given the user requirements. Each feature fi in Feature can have child features fj. There exist various relationships: compulsory (where fi must contain fj), optional (where fi may or may not contain fj), or (where fi must contain at least one of its children), and xor (where fi contains exactly one of its children). Additionally, Feature contains further cross-tree relationships: requires, which implies that if one child is selected another one must also be selected, and excludes, where only one of the children can be selected. The system matches each product using Feature and, once it finds the right device, adds it to a list called finalspecification. We perform a matching with the user constraints and remove the candidates that do not have a match. The final product specification is presented to the user in the final step. Each list is a system configuration that is appropriate for the user requirements. ESCA can create
more than one list, each containing a different possible combination of devices that can be built for the user requirements. Once the end products are found, the lists can also be filtered by constraints. For example, the price can be treated as a constraint and used to filter the systems depending on how much it costs to buy every component. Additionally, other constraints like weight, volume, or size can be used to filter the list. Our experimental analysis presents a runtime analysis and a timing comparison.
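A minimal sketch of the kind of feature tree described above is given below. It is a simplification under stated assumptions: node and relation names are hypothetical, and the relationship (mandatory, optional, or, xor) is attached to a parent node rather than to each individual edge, which is coarser than a full feature-model notation.

```python
# Illustrative feature-tree check (hypothetical names, simplified semantics):
# a configuration is accepted only if every selected node's child relationship
# (mandatory / optional / or / xor) is satisfied by the selected feature set.

class Feature:
    def __init__(self, name, relation="optional", children=None):
        self.name = name
        self.relation = relation        # relation governing this node's children
        self.children = children or []

    def valid(self, selected):
        if self.name not in selected:
            return True                 # an unselected subtree imposes nothing
        chosen = [c for c in self.children if c.name in selected]
        if self.relation == "mandatory" and len(chosen) != len(self.children):
            return False
        if self.relation == "or" and self.children and not chosen:
            return False
        if self.relation == "xor" and self.children and len(chosen) != 1:
            return False
        return all(c.valid(selected) for c in chosen)

screen = Feature("screen", "xor", [Feature("basic_screen"), Feature("hires_screen")])
root = Feature("mobile_system", "mandatory", [screen])

print(root.valid({"mobile_system", "screen", "hires_screen"}))                  # True
print(root.valid({"mobile_system", "screen", "basic_screen", "hires_screen"}))  # False
```

Validating candidate configurations against such a tree, and then pruning those that violate the user constraints, is the filtering step that produces the finalspecification lists.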
Scalability The idea is to show that ESCA is scalable. A simple way to demonstrate scalability is to show the number of features ESCA can handle. Owing to the modularity concepts provided by product lines, ESCA is scalable: if a new constraint or feature is added to the system and the number of all combined search possibilities without the modularity concepts is x, then x >> y, where y is the number of all searches using the product line methods (Fig. 3).
Timing Comparison We present a timing comparison for the ESCA tool. As ESCA uses a feature tree for the identification of the configurations, it consumes significantly less time
Fig. 3 Results obtained from ESCA interface
Table 2 Comparison of the runtime of ESCA versus a traditional search algorithm
Num of constraints | Traditional algorithm runtime (s) | ESCA runtime (s)
10 | 0.0006557   | 0.000532
15 | 0.00095763  | 0.000577
20 | 0.00136056  | 0.000612
25 | 0.001531779 | 0.000645
30 | 0.003018802 | 0.000683
Fig. 4 Runtime analysis for traditional search algorithm
as compared to a traditional search algorithm. The reason is that it finds the solution without exploring much of the tree path, and therefore the time requirement is lower. A comparison of the runtime of a traditional search algorithm with the runtime of our approach is provided: Fig. 4 presents the runtime of the traditional search algorithm, whereas Fig. 5 provides the runtime of our approach for different numbers of constraints (Table 2).
Fig. 5 Runtime analysis for ESCA
This is to show the timing comparison of ESCA using different methods of implementation. The goal is to show that ESCA performs best (fastest) when using product line methods, compared to random searches or other ways.
7 Discussion
Despite being a new and innovative piece of technology, ESCA has potential extensions that can make it more powerful than it is at present. The current prototype version of ESCA searches a database that has to be updated regularly by the administrator. However, to keep pace with the diversified technology being produced, an automatic web crawling feature would be an ideal extension for ESCA. This would keep ESCA dynamic and up to date and ensure the best results for the user. Another ideal area of expansion is machine learning. To reduce the number of recommended specifications and to give a smarter solution, the application of machine learning would enable ESCA to produce a smarter and more concise data list through gradual learning. With the increasing number of embedded systems and their diverse variability, it is easy to predict that a big data problem is impending in the near future. Hence, the incorporation of big data handling techniques will be a necessary enhancement to maintain ESCA's performance and speed. Through further extensions, ESCA will be an accessible tool not only for developers but also for academia.
8 Conclusion
Our work is a summarized approach toward solving the problem of multiple constraint-based identification of appropriate devices based on user requirements. Speed, stability, and scalability are the three qualities ensured in ESCA through the implementation of feature-based development. The robustness of ESCA and its scope for potential improvements underline its innovativeness. To the best of our knowledge, such a dedicated tool for determining recommendations for users has not yet been built in the embedded system application development domain, and applying a feature and variability model to combat the problem is unique.
Acknowledgements The authors would like to thank Marcelo Esperguel and Sharar Mahmood for their help in this work.
References
1. Apel S, Batory D, Kästner C, Saake G (2016) Feature-oriented software product lines. Springer
2. Briand LC, Labiche Y, Shousha M (2005) Stress testing real-time systems with genetic algorithms. In: Proceedings of the 7th annual conference on genetic and evolutionary computation. ACM, pp 1021–1028
3. Dinkelaker T, Mitschke R, Fetzer K, Mezini M (2010) A dynamic software product line approach using aspect models at runtime. In: Proceedings of the 1st workshop on composition and variability, pp 180–220
4. Fitzgerald J, Larsen PG, Verhoef M (2014) Collaborative design for embedded systems: co-modelling and co-simulation. Springer Science & Business Media
5. Gomaa H, Hussein M (2003) Dynamic software reconfiguration in software product families. In: International workshop on software product-family engineering. Springer, pp 435–444
6. Hallsteinsen S, Hinchey M, Park S, Schmid K (2008) Dynamic software product lines. Computer 41(4)
7. Harman M, Jones BF (2001) Search-based software engineering. Inf Softw Technol 43(14):833–839
8. Henzinger TA, Sifakis J (2007) The discipline of embedded systems design. Computer 40(10)
9. Kang KC, Lee J, Donohoe P (2002) Feature-oriented product line engineering. IEEE Softw 19(4):58–65
10. Lee EA (2002) Embedded software. Adv Comput 56:55–95
11. Metzger A, Pohl K (2014) Software product line engineering and variability management: achievements and challenges. In: Proceedings of the future of software engineering. ACM, pp 70–84
12. Mezini M, Ostermann K (2004) Variability management with feature-oriented programming and aspects. In: ACM SIGSOFT software engineering notes, vol 29. ACM, pp 127–136
13. Nohrer A, Egyed A (2011) Optimizing user guidance during decision-making. In: 2011 15th international software product line conference (SPLC). IEEE, pp 25–34
14. Ognjanovic I, Mohabbati B, Gaevic D, Bagheri E, Bokovic M (2012) A metaheuristic approach for the configuration of business process families. In: 2012 IEEE ninth international conference on services computing (SCC). IEEE, pp 25–32
15. Patterson DA, Hennessy JL (2017) Computer organization and design RISC-V edition: the hardware software interface. Morgan Kaufmann
16. Riebisch M (2003) Towards a more precise definition of feature models. In: Modelling variability for object-oriented product lines, pp 64–76
17. Shi Q, Abdel-Aty M (2015) Big data applications in real-time traffic operation and safety monitoring and improvement on urban expressways. Transp Res Part C: Emerg Technol 58:380–394
18. Van der Linden FJ, Schmid K, Rommes E (2007) Software product lines in action: the best industrial practice in product line engineering. Springer Science & Business Media
19. Weiss DM, Clements PC, Kang K, Krueger C (2006) Software product line hall of fame. In: 2006 10th international software product line conference. IEEE, pp 237–237
On Detection of Hardware Trojan in Memristive Nanocrossbar-Based Circuits Subhashree Basu, Ranjit Ghoshal, and Malay Kule
Abstract Chip fabrication outsourcing can lead to malicious modifications by third-party companies and could pose a grave danger to the chip's security and reliability. Such a change in the functionality of the IC, or the addition of extra features, is known as a Hardware Trojan, which cannot be detected by design-time verification and post-fabrication testing. Hence, methods of identifying Hardware Trojans are not only tedious but sometimes futile. Though some techniques have been designed, they are only for Trojan detection; there is as such no way to eliminate the Trojan. Not only that, but some of these methods also render the chip unusable. In this paper, we study the insertion and detection of Trojans in a BCD square root circuit implemented in a memristive nanocrossbar. First, we insert an active and a passive Trojan, and then we discuss the methods of identifying the Trojans therein. The experimental results obtained in our work are satisfactory and prove the correctness of the proposed method. Keywords Hardware Trojan · Memristor · BCD · Nanocrossbar · Active trojan · Passive trojan
S. Basu (B) · R. Ghoshal Department of Information Technology, St. Thomas' College of Engineering and Technology, Kidderpore, Kolkata, India e-mail: [email protected]
M. Kule Department of Computer Science and Technology, Indian Institute of Engineering Science and Technology, Shibpur, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 R. Sarkar et al. (eds.), Proceedings of International Conference on Frontiers in Computing and Systems, Lecture Notes in Networks and Systems 690, https://doi.org/10.1007/978-981-99-2680-0_28
1 Introduction
Integrated Circuit (IC) design phases may be distributed across different geographical locations. This outsourcing of design and fabrication has increased profitability but has produced ample scope for introducing malicious behavior into parts of the IC, known as Hardware Trojan Horse (HTH) circuitry [1–3]. This is a grave danger to the security
and reliability of ICs used in critical applications like defence, health, research of national interest, etc. Trigger and payload are the two basic parts of a Trojan [4, 5]. A Trojan trigger is optional and depends on various events in the circuit, which occur in the form of signals. The trigger activates the payload once a suitable condition is found, and the payload in turn performs the malicious behavior. Most of the time the payload is inert, as the trigger is activated very rarely. When the payload is not active, the IC performs as a Trojan-free circuit, making it challenging to identify the Trojan [5, 6]. There are two types of Hardware Trojans: functional and parametric. A functional HT performs malicious insertion into, or deletion of, existing parts of the circuit. A parametric Trojan does not affect the actual output but affects performance by changing path delays, introducing loose wires, etc. While there are methods available to detect HTH, existing methods may not be able to detect HTH embedded in a circuit implemented with memristors. So, there is a need to create HTH involving memristors and methods to detect them. In this paper, the detection of Trojans in a BCD square root circuit implemented in a memristive nanocrossbar is studied by inserting active and passive Trojans. We have taken a BCD square root circuit as the reference circuit. We first introduce HTH in two ways and then discuss the ways by which such HTH can be detected. The methods discussed in this paper are completely different from existing detection methods. The rest of the paper is organized as follows. Section 2 describes some preliminaries. Section 3 is a literature survey, followed by our proposed method in Sect. 4. Section 5 elaborates on the experimental results and their analysis. Section 6 concludes our work.
2 Preliminaries
Resistors with varying resistance are called memristors, as their resistance values depend on the history of the device. The memristance therefore signifies the data value stored in it, and the memristor can be used as memory. The non-volatile property of the memristor makes it very effective in memory and circuit design. Chua [7] proposed the concept of the memristor, which relates charge and flux. A time-invariant current-controlled memristive system is represented by

\frac{dx}{dt} = f(x, i), \quad v(t) = R(x, i)\, i(t)    (1)

where x stands for the internal state, i(t) stands for the device current, v(t) stands for the device voltage, R(x, i) stands for the memory resistance, and t is the time. The device's history affects how the memristance changes. The memristor symbol is shown in Fig. 1.
Fig. 1 Symbolic representation of memristor
Fig. 2 Memristive nanocrossbar architecture
The memristive nanocrossbar consists of two parallel electrodes and a conducting material like TiO2 in between the lines. The basic structure of a memristive nanocrossbar is shown in Fig. 2. The lattice is 40–50 nm wide, and the platinum wires laid on top of one another are 2–3 nm thick, separated by a switching element of 3–30 nm thickness. These switching elements consist of two equal parts of TiO2. The layer connected to the bottom platinum wire contains pure (undoped) TiO2, whereas the other half comprises doped, conducting TiO2 containing oxygen vacancies. A negative voltage causes oxygen vacancies to move out of the layer, driving the switch to the off state (high resistance state). A positive voltage pushes oxygen vacancies into the pure TiO2 layer due to repulsion and hence drives the switch to the on state. Detection techniques can be destructive or non-destructive. The destructive approach depackages the ICs. This approach guarantees the identification of malicious modifications in the IC, but it is very time consuming. Moreover, after this extensive, tedious exercise, a malicious presence can be detected in only a single IC, and that too by rendering the IC unusable. Non-destructive techniques use side-channel signal analysis and functional testing to authenticate fabricated ICs. Side-channel analysis approaches [8] try to monitor circuit properties, such as delay [4, 9], temperature, radiation, power, and leakage power, which in turn can be used to identify hardware Trojans. They take into account the side effects (i.e., electromagnetic radiation, power, heat, extra path delay) caused by the Trojan trigger/payload activation. To distinguish Trojan-infected ICs, the majority of detection methods, however, use "golden ICs" (Trojan-free ICs). Additionally, while side-channel analysis techniques may be somewhat
successful in identifying hardware Trojans, they occasionally fail to extract the tiny, anomalous side-channel signals in the face of environmental changes and to obtain high coverage of every gate.
3 Literature Survey
The author of [8] explores certain types of physical attacks like direct data probing and electromagnetic analysis, then analyzes these attacks through side-channel analysis, choosing smart cards as the target of the analysis. Paper [4] focuses on a statistical approach for separating the effects of process variation from HT anomalies. This method assumes the existence of high-resolution path delay measurements and a test vector generation strategy based on the TDF model; the detection method relies on a golden-chip-based model. Tehranipoor and Koushanfar [2] have utilized the delayed switching feature of the memristor to control the switching time of the memristor state in a memristor Content Addressable Memory (M-CAM). This enhances the performance by saving energy and decreasing the search time. Xiao et al. [9] discuss a "clock sweeping" methodology for determining route latency without the use of additional hardware. After gathering the delay data, the ICs' various delay signatures are created, and it is then determined whether or not the ICs contain a Trojan. The Trojan trigger and payload incorporate additional circuitry, which has unintended consequences such as increased area, timing, power, and radiation. As a result, they can be utilized by defences to find Trojans. Therefore, techniques have been suggested to improve Trojan design and reduce its impact on the original design in order to avoid Trojan identification [10, 11]. Cha and Gupta [11] propose a new attack methodology that is aware of every known non-destructive side-channel analysis method of detecting Trojans and imposes a negligible impact on every measurable circuit parameter; gate resizing hides the delay effect created by the Trojan. In [12], Ngo et al. suggested the idea of an "encoded circuit" as a method to safeguard against the insertion of HT. Sidhu and Singh [13] discuss a chip-averaging method that can help identify Trojans by calibrating intra- and inter-chip process variations and calculating route delays using an on-chip TDC. Temkin and Summerville [14] implant an HT algorithmically without changing the host circuit's functional behavior and without any additional unrelated logic. The amount of IC space needed by the HTH is reduced by 75% as a result of this implantation. Additionally, the host circuit latency and leakage power are not increased.
4 Proposed Methods of Creation and Detection of Hardware Trojan
We propose to implant the HTH by two methods: one functional and the other parametric. In this paper, we first create a hardware Trojan in a memristive nanocrossbar-based BCD square root victim circuit, and then detect the presence of the Hardware Trojan therein using our proposed Hardware Trojan detection methods.
4.1 Covert Implantation in the "Don't Care" Space
Creating Hardware Trojan A Hardware Trojan is obtained by manipulating the Boolean functions of the victim circuit. The simplest form of covert implantation is to place combinational hardware in the don't care space of the victim circuit. Here, we have taken the BCD square root circuit designed to compute the square root of BCD numbers [15]. The mapping of the domains with their corresponding codomains is shown in Fig. 3. It is observed that though 1 is not the exact square root of 2 and 3, they are mapped to 1 by considering only the integer part of the square root value. Similar is the case of 5, 6, 7, and 8, which are mapped to 2. The crossbar layout of the BCD square root victim circuit is shown in Fig. 4: 0 is mapped to 0; 1, 2, 3 to 1; 4, 5, 6, 7, 8 to 2; and 9 to 3. For the inputs given in the memristors A, B, C, and D, the output memristors X0, X1, X2, and X3 go high following the above logic. If more than one output is high, then the concept of priority is followed and the output with the highest weight is considered. Now, in order to implant in the "don't care" space, we use a voltage-controlled switch connected between X0 and X3. The voltage-controlled switch is initially closed and, after reaching a certain voltage level, the switch opens, breaking the connection path between the two nodes as shown in Fig. 5. As a result, for the domain 0000, X0 should be high, but due to the connected path with X3, which is in the low state, it will also be in the low state. As the switch gradually goes to an open state, X0 gradually achieves its actual value. This phenomenon can be seen only when the switch does not get sufficient voltage to open; hence, this modification may not be detected in every operation of the circuit.
Method for detecting Hardware Trojan The covert implantation causes a change in the output of the nanocrossbar circuit. Moreover, the heat dissipation of the circuit with and without covert implantation can also be used to detect the hardware Trojan. The heat dissipation increases with covert implantation, which, along with the changed output, is a clear indication that some tampering has been done on the host circuit.
Fig. 3 BCD square root domain-codomain relation
Fig. 4 BCD square root victim circuit
4.2 Delayed Switching
Creating Hardware Trojan Another approach to introducing HTH is by delayed switching of the memristors. This is done by controlling the delay in switching the memristor state through the Ron and Roff values of the memristor. According to [16], the time of a single cycle, which includes one switching from Roff to Ron and another from Ron to Roff, is given by
Fig. 5 BCD square root victim circuit with embedded Trojan
T = \frac{R_{off} + R_{on}}{R_{off}} \, T_{off\text{-}on}    (2)
Here T is the cycle time and Toff-on is the switching time from the off to the on state. According to the simulation values, Roff = 10 kΩ, Ron = 100 Ω, and T = 10 ns; with these values, Toff-on is calculated to be 9.9 ns. Increasing Ron to 5 kΩ causes the cycle time to increase to about 15 ns, which can be achieved by making the doped zone wider during manufacture. Likewise, if we double the time of a single cycle to 20 ns, then Toff-on increases to approximately 19.8 ns, i.e., it also doubles; this can be done by increasing the difference between Ron and Roff. This clearly shows that controlling the Roff and Ron values controls the switching time of the memristor and, subsequently, the output generation time. While this does not directly affect the circuit's functionality, it unquestionably lowers the circuit's performance.
Method for detecting Hardware Trojan The switching delay method does not directly influence the functionality of the circuit. For a single memristor, Toff-on is approximately 20 ns. If the fabrication parameters Roff and Ron are changed, then the corresponding total delay of the circuit will increase proportionately, causing the generation of the output to be delayed. However, this has to be done during fabrication and is an irreversible process that is out of the scope of this paper.
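The cycle-time arithmetic above follows directly from Eq. (2); the short sketch below simply reproduces the stated numbers, assuming the simulation values quoted in the text (Roff = 10 kΩ, Ron = 100 Ω, T = 10 ns).

```python
# Reproducing the cycle-time arithmetic of Eq. (2): T = (Roff + Ron) / Roff * Toff_on.
def t_off_on(t_cycle, r_off, r_on):
    return t_cycle * r_off / (r_off + r_on)

def cycle_time(t_switch, r_off, r_on):
    return (r_off + r_on) / r_off * t_switch

t_sw = t_off_on(10e-9, 10e3, 100)                   # nominal host circuit
print(round(t_sw * 1e9, 2))                         # 9.9 ns
print(round(cycle_time(t_sw, 10e3, 5e3) * 1e9, 2))  # ~14.85 ns after raising Ron to 5 kOhm
print(round(t_off_on(20e-9, 10e3, 100) * 1e9, 2))   # ~19.8 ns when the cycle time doubles
```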
Fig. 6 Sample vector sets to detect stuck-at fault in different gates
4.3 Stuck-at-One/Zero Problem
Creating Hardware Trojan Stuck-at-one and stuck-at-zero faults are two fairly prevalent errors that can be introduced as Trojans even during the fabrication process. According to [13], the threshold voltage is given by

v(t) = \left[ R_{on}\,\frac{\omega(t)}{D} + R_{off}\left(1 - \frac{\omega(t)}{D}\right) \right] i(t)    (3)

The threshold voltage is calculated to be 1 mV using the simulation parameters mentioned above and assuming that i(t) = 10 μA. Now, by doping the entire width of the memristor, we can reduce the resistance further toward the Ron value while still allowing some current to pass through the device. This is comparable to the stuck-at-one defect, in which the memristor remains in state 1. On the other hand, the threshold voltage is calculated to be 0.1 V when the width of the doped zone is set to 0; for any voltage below this, the memristors will remain in state 0, which is comparable to a stuck-at-zero fault.
Method of detecting Trojan Hardware Trojans, which can be installed at the time of manufacture itself, can also result in stuck-at-zero or stuck-at-one problems. The stuck-at-zero/one problem is detected by a stuck-at test, which applies some test vectors and, according to the output obtained, concludes whether a stuck-at-zero/one problem exists or not. For example, Fig. 6 shows the minimum set of test vectors for different gates. The same concept can be applied to memristor crossbar circuits as well.
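The two threshold values quoted above can be reproduced directly from Eq. (3) under the stated assumptions (Ron = 100 Ω, Roff = 10 kΩ, i(t) = 10 μA); the sketch below evaluates the equation at the two extremes of the doped-region width.

```python
# Evaluating Eq. (3) at the two extremes of the doped-region width w(t):
# w = D (fully doped, behaves like stuck-at-one) and w = 0 (undoped, stuck-at-zero).
def memristor_voltage(w_over_d, i, r_on=100.0, r_off=10e3):
    return (r_on * w_over_d + r_off * (1.0 - w_over_d)) * i

i = 10e-6                           # assumed device current of 10 uA
print(memristor_voltage(1.0, i))    # 0.001 V -> the 1 mV threshold
print(memristor_voltage(0.0, i))    # 0.1 V   -> the 0.1 V threshold
```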
5 Experimental Results
Some outputs of the BCD square root victim circuit, corresponding to codomain values 0, 1, 2, and 3, are shown in Fig. 7a–d; they comply with the expected outputs. The output of the modified circuit after covert implantation is shown in Fig. 8, which shows a change in the output as discussed in the previous section. Though X0 should
Fig. 7 BCD square root output for inputs 0, 1, 4, 9: a input 0, b input 1, c input 4, d input 9
Fig. 8 BCD square root output for covert implantation
be in a high state in the ideal situation, due to the influence of X3 it remains in a negative state, and as the switch gradually goes to an open state, the path with X3 disconnects and X0 also gradually reaches the ideal output state. From the output after covert implantation (shown in Fig. 8), a change in X3 causes a change in X0, as they are connected by the voltage-controlled switch. The current at X3 gradually falls, which signifies that the voltage-controlled switch is opening gradually, which leads X0 to reach its original state. The heat dissipation and area of the circuit with and without covert implantation are shown in Table 1. There is no increase in the circuit area, as the number of memristors remains the same. The increase in heat dissipation with covert implantation, though very minor, indicates the presence of malicious elements in the circuit. The erroneous output shown in Fig. 8 is indication enough that the circuit is not acting in an ideal way.
Table 1 Performance comparison before and after covert implantation
Circuit | No. of memristors | Power dissipation (W)
Host circuit | 9 | -1.12
After covert implantation | 9 | -1.15
6 Conclusion
This paper addresses methods of introducing HTH in memristive nanocrossbar-based circuits and the process of identifying the presence of HTH therein. The proposed work shows that the covert implantation of HTH can be implemented both with and without modifying the functional characteristics of the victim circuit. In terms of area and power dissipation, the discreetly implanted HTH has a far smaller effect on the victim circuit [14]. The delay in the circuit comprises only the time required for the switch to close; otherwise, the rest of the components are identical. On the other hand, delayed switching is a parametric HTH that hampers the working of the circuit by degrading its performance metrics [16]. Such an HTH is difficult to identify because the chips are fabricated with fixed built-in parameters, so any attempt to identify the HTH therein will render the chip unusable.
References
1. Chakraborty RS, Narasimhan S, Bhunia S (2009) Hardware Trojan: threats and emerging solutions. In: IEEE international high level design validation and test workshop, 2009
2. Tehranipoor M, Koushanfar F (2010) A survey of hardware Trojan taxonomy and detection. IEEE Des Test Comput 27(1)
3. Kitsos P, Voyiatzis AG (2014) Towards a hardware Trojan detection methodology. In: 3rd Mediterranean conference on embedded computing (MECO), ECyPS'2014
4. Jin Y, Makris Y (2008) Hardware Trojan detection using path delay fingerprint. In: Proceedings of the IEEE international workshop on hardware-oriented security and trust (HOST'08), pp 51–57
5. Voyiatzis AG, Stefanidis KG, Kitsos P (2016) Efficient triggering of Trojan hardware logic. In: IEEE 19th international symposium on design and diagnostics of electronic circuits and systems (DDECS)
6. Xiao K, Forte DJ, Jin Y, Karri R, Bhunia S, Tehranipoor MM (2016) Hardware Trojans: lessons learned after one decade of research. ACM Trans Des Autom Electron Syst (TODAES)
7. Chua LO (1971) Memristor-the missing circuit element. IEEE Trans Circuit Theory CT-18(5)
8. Koeune F, Standaert FX (2004) A tutorial on physical security and side-channel attacks. In: Foundations of security analysis and design (FOSAD) III, pp 78–108
9. Xiao K, Zhang X, Tehranipoor M (2013) A clock sweeping technique for detecting hardware Trojans impacting circuits delay. IEEE Des Test 2(30):26–34
10. Tsoutsos NG, Maniatakos M (2014) Fabrication attacks: zero-overhead malicious modifications enabling modern microprocessor privilege escalation. IEEE Trans Emerg Top Comput 2:81–93
11. Cha B, Gupta SK (2014) A resizing method to minimize effects of hardware Trojans. In: Proceedings of the IEEE 23rd Asian test symposium (ATS'14), pp 192–199
12. Ngo XT, Guilley S, Bhasin S, Danger J-L, Najm Z (2014) Encoding the state of integrated circuits: a proactive and reactive protection against hardware Trojan Horses. In: Proceedings of the ACM workshop on embedded systems security (WESS)
13. Sidhu RK, Singh T (2015) Different models of memristor. Int J Eng Res Technol (IJERT) 4(06):991–993
14. Temkin KJ, Summerville DH (2016) An algorithmic method for the implantation of detection-resistant covert hardware Trojans. In: Cyber information security research conference
15. Ismari D, Lamech C, Bhunia S, Saqib F, Plusquellic J (2016) On detecting delay anomalies introduced by hardware Trojans. In: International conference on computer-aided design
16. Chen W, Yang X, Wang FZ (2013) Delayed switching applied to memristor content addressable memory cell. In: Proceedings of the world congress on engineering 2013, vol I
Robust Control of Pulsatile Ventricular Assist Devices for Patients with Advanced Heart Failure Rajarshi Bhattacharjee, Shouvik Chaudhuri, and Anindita Ganguly
Abstract In this work, a robust control scheme has been developed for pulsatile left ventricular assist devices (LVADs) used for advanced heart failure patients, utilizing the sliding mode control philosophy. A widely accepted model for the cardiovascular system along with baroreflex feedback is used as the system. Simulations in MATLAB/Simulink show promising results for the proposed control scheme. Keywords Left ventricular assist device · Sliding mode control · Advanced heart failure · Pulsatility
R. Bhattacharjee Jadavpur University, Kolkata 700032, West Bengal, India
S. Chaudhuri (B) Centre for Industrial Mechanics, University of Southern Denmark (SDU), 6400 Sønderborg, Denmark e-mail: [email protected]
A. Ganguly Indian Institute of Science (IISc), Bangalore 560 012, India e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 R. Sarkar et al. (eds.), Proceedings of International Conference on Frontiers in Computing and Systems, Lecture Notes in Networks and Systems 690, https://doi.org/10.1007/978-981-99-2680-0_29
1 Introduction
The World Health Organization (WHO) has reported that cardiovascular diseases (CVDs) claimed an estimated 17.9 million lives in 2019 alone [1]. In comparison, the ongoing COVID-19 pandemic is estimated to have claimed over 6.4 million lives [2] in the past two years. These statistics indicate that CVDs present a much higher health risk without even being communicable, with around 75% of CVD deaths occurring in low- and middle-income countries [1]. Among the group of disorders which make up CVDs, an alarming rise in numbers has been seen for Congestive Heart Failure (HF), with over 26 million cases reported worldwide [3]. From a hemodynamic standpoint, HF can be characterized as a pathophysiological state where an abnormality in cardiac function leads to an inadequate
cardiac output (CO) to the body at rest or during exertion, or makes it possible only in the setting of elevated cardiac filling pressures, at a rate which may not be in agreement with the requirements of the metabolizing tissues [4, 5]. The phenotype of the HF mechanism in patients is distinguished, based on the metric Ejection Fraction (EF), into two broad categories: Preserved (left ventricular EF > 50%) and Reduced (left ventricular EF < 40%), which roughly correlate with diastolic and systolic dysfunction, respectively [3]. When the patient begins to experience EF < 30%, the condition is medically termed Advanced HF (AdHF) or end-stage heart failure [6, 7]. The ideal therapy for such cases is a Heart Transplant (HT). However, it is quite obvious that there are a lot of complications regarding the availability of such transplant organs. In order to supplement the demand for transplants, the use of Mechanical Circulatory Support (MCS), which can be utilized as Bridge-to-Heart Transplantation (BTT), Bridge-to-Recovery (BTR), or Destination Therapy (DT) for both adult and pediatric AdHF patients [7, 8], has become popular, primarily owing to the technological advancements in blood pump technology over the past decades. Rapid development in MCS technology since 2007 [3] has led to Left Ventricular Assist Devices (LVADs), Right Ventricular Assist Devices (RVADs), Bi-Ventricular Assist Devices (BVADs), the Total Artificial Heart (TAH), and Extracorporeal Membranous Oxygenation (ECMO) for clinical use [8]. Among these devices, LVADs are the most commonly used solutions and make up nearly 90% of the market share of MCS [3, 9]. They are generally classified, based on their working principle and characteristic outflow, as Pulsatile/Volume Displacement or Continuous LVADs. Some of the commercially available pulsatile versions include the Thoratec HeartMate I XVE, Thoratec Incorporeal VAD, Abiomed BVS5000/AB5000, and Berlin Heart ExCOR, whereas the continuous flow versions include the Thoratec HeartMate II, Jarvik 2000, MicroMed DeBakey, and the Berlin Heart InCOR [8, 9]. Although pulsatile LVADs mimic the physiological outflow of the human heart, they are prone to reliability and durability problems owing to their modus operandi. On the other hand, the more popular continuous flow LVADs have been suggested to decrease pulse pressure and pulsatility in the arteries, which may lead to complications such as aortic insufficiency and gastrointestinal (GI) bleeding [9–11]. This has steered the debate over pulsatility in the direction of developing continuous flow LVADs (CFVADs) with pulsatility control algorithms as the way forward for the future generation of VADs [11]. In the case of pediatric patients with AdHF, the usual long-term pulsatile LVAD used as BTT is the Berlin Heart ExCor [12]. However, the literature suggests that there is a need to develop robust control algorithms for these pulsatile devices, since the performance of most available control algorithms is constrained by parametric variations and disturbances, which are quite obvious in physiological systems. Therefore, in this study, the authors focus on the development of a robust controller for an assist device of pulsatile nature. The performance of the proposed controller is then compared with the control algorithms already available for this category of LVADs in the literature [12]. It would be interesting to evaluate the performance of such a control algorithm for pulsatility control of continuous flow LVADs in future works.
The rest of the paper is structured as follows. In Sect. 2, the lumped parameter model of the cardiovascular system (CVS) is explained along with the baroreflex mechanism, followed by a description of the LVAD model. The combination of these models in the form of a synchronous physiological control system (SPCS) is also discussed briefly. Section 3 describes the control configuration considered for this study. In Sect. 4, the performance metrics used to quantify the controller behavior and the performance results for the controller are presented. The conclusions regarding the study are put forth in Sect. 5.
2 System Description
This section describes the numerical model used for this study, consisting of the pediatric cardiovascular lumped parameter model of the CVS in [13] with the electrical equivalent circuit given in [12]. The baroreflex model for regulating the systemic vascular resistance (Rsvr) and the ventricular end-systolic elastances (Ev) is given in [14]. An LVAD model is coupled to this equivalent circuit between the left ventricle and the aorta. The structure of the complete numerical model is available in [12]. Each model component is briefly described below.
2.1 Cardiovascular Model
The CVS used in this study is based on the time-varying lumped parameter model [13] for infant cardiovascular physiology, considering a baseline heart rate (HR) of 130 beats per minute. The ventricular compartmental pressure (P(t)) is a function of the difference between the compartmental volume of the ventricle (V(t)) during pumping and the unstressed volume (V0), which can be represented by the relation

P(t) = E_v(t)\,[V(t) - V_0]    (1)

where Ev(t) is the time-varying ventricular end-systolic elastance, which can be represented by the following relation [13]:

E_v(t) = \begin{cases} E_{min} + (E_{max} - E_{min})\,\frac{T_f}{K_n}\,\sin(\pi T_f), & T_A \le t < T_A + T_{vs} \\ E_{min}, & \text{otherwise} \end{cases}    (2)

where

T_f = \frac{t - (T_{as} - T_{av})}{T_{vs}}    (3)

T_A = T_{as} + T_{av}    (4)
with Emin and Emax as the minimum and maximum elastances, respectively, and the normalization constant Kn chosen as the maximum of the time function Tf sin(πTf) in Eq. (2). Here, the atrial systole period is Tas = 0.03 + 0.09 Ts s, the atrioventricular delay is Tav = 0.01 s, and the ventricular systole period is Tvs = 0.16 + 0.20 Ts s, with Ts being the heart period defined as Ts = 60/HR in terms of the heart rate (HR). The compartmental inflow rate (q(t)) is defined by the relationship

q(t) = D_i\,\frac{P_{in}(t) - P(t)}{R} + L\,\frac{dq(t)}{dt}    (5)
where Pin is the ventricular pressure during inflow, R is the resistance offered by the vessel during inflow, and L is the inertance. Here, Di is an ideal diode representing a cardiac valve that is either 0 (closed) or 1 (open) for the purpose of this study. Also, the difference between the inflow rate (q(t)) and the outflow rate (qout(t)) represents the change in compartmental volume, expressed as

\frac{dV(t)}{dt} = q(t) - q_{out}(t)    (6)
The values for the critical parameters of this model can be found in [12, 13].
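A compact sketch of how the elastance of Eqs. (2)-(4) can be evaluated is given below. The elastance limits Emin and Emax and the unstressed volume are placeholder values, not the parameters of [12, 13]; only the timing expressions and the 130 bpm baseline follow the text, and the normalization range for Kn is assumed to be [0, 1].

```python
import math

# Illustrative evaluation of the time-varying elastance of Eqs. (2)-(4).
# Emin/Emax and V0 are placeholders; HR = 130 bpm and the timing expressions
# follow the text.  Kn is assumed to normalize Tf*sin(pi*Tf) over [0, 1].
HR = 130.0
Ts = 60.0 / HR                       # heart period
Tas = 0.03 + 0.09 * Ts               # atrial systole period
Tav = 0.01                           # atrioventricular delay
Tvs = 0.16 + 0.20 * Ts               # ventricular systole period
TA = Tas + Tav
Kn = max(x * math.sin(math.pi * x) for x in (k / 1000.0 for k in range(1001)))

def elastance(t, e_min=0.5, e_max=2.0):
    if TA <= t < TA + Tvs:
        tf = (t - (Tas - Tav)) / Tvs                   # Eq. (3)
        return e_min + (e_max - e_min) * (tf / Kn) * math.sin(math.pi * tf)
    return e_min

def ventricular_pressure(t, volume, v0=5.0):
    return elastance(t) * (volume - v0)                # Eq. (1)

print(round(elastance(TA + 0.5 * Tvs), 3))             # elastance near mid-systole
```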
2.2 Baroreflex Model
Baroreflex is a short-term autonomic nerve regulation that ensures hemodynamic stability by regulating neural effectors [14, 15] for several cardiovascular properties. It forms a closed-loop short-term pressure control system along with the CVS. In this study, it is considered that the baroreflex regulates the systemic vascular resistance (Rsvr) and the end-systolic elastances of the left ventricle (Elv) and right ventricle (Erv), depending on the sympathetic regulation mechanisms, in response to stimuli received from pulmonary receptors and arterial baroreceptors, which are sensitive to changes in lung volume and systemic arterial pressure, respectively. In order to develop the baroreflex model, it is considered that the weighted sum of the two signals received from the receptors is used as the input to a sigmoid (σx), which depicts the relationship between the sympathetic regulation mechanisms and the dynamic characteristics represented by a pure delay (Dx) and a first-order low-pass filter with a time constant (τx), as represented in [14]. Therefore, the baroreflex model for this study can be represented by the following relations:

r_x = G_{ax}(P_{sa} - P_{san}) + G_{px}(V_l - V_{ln})    (7)

\sigma_x = \frac{x_{min} + x_{max}\,e^{\pm r_x/k_x}}{1 + e^{\pm r_x/k_x}}    (8)

\frac{d\theta(t)}{dt} = \frac{1}{\tau_x}\,[\sigma_x(t - D_x) - x(t)]    (9)

k_x = \frac{x_{max} - x_{min}}{4\,S_{x0}}; \quad S_{x0} = \pm 1    (10)
Here, x is a generic representation of Rsvr, Elv, and Erv. Psa and Psan denote the systemic arterial pressure and its basal value, respectively. Similarly, Vl and Vln denote the lung volume before and after a normal expiration cycle, respectively. Gpx and Gax represent the maximal gains of the pulmonary receptors and arterial baroreceptors, respectively. Also, kx is the parameter that sets the slope at the central point of the sigmoid, and Sx0 its sign. The list of values for these parameters characterizing the regulatory feedback mechanisms and a detailed description can be found in [14].
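The regulation loop of Eqs. (7)-(10) can be sketched for a single generic quantity x as below; all gains, set-points, limits, and the time constant are placeholder values rather than the parameters of [14], and a simple forward-Euler step with a sample-based delay buffer stands in for the continuous-time dynamics.

```python
import math
from collections import deque

# Sketch of the baroreflex regulation of Eqs. (7)-(10) for one generic quantity x
# (e.g. Rsvr).  All gains, set-points, limits and the time constant below are
# placeholders, not the values of [14].
dt, delay_steps = 0.01, 200                   # 10 ms step, 2 s pure delay Dx
x_min, x_max, tau = 0.5, 2.5, 6.0
k = (x_max - x_min) / 4.0                     # Eq. (10) with Sx0 = +1
G_a, G_p, P_san, V_ln = -0.03, 0.01, 90.0, 0.5

def sigma(p_sa, v_l):
    r = G_a * (p_sa - P_san) + G_p * (v_l - V_ln)          # Eq. (7)
    e = math.exp(r / k)
    return (x_min + x_max * e) / (1.0 + e)                 # Eq. (8)

x = 1.0
buffer = deque([sigma(90.0, 0.5)] * delay_steps, maxlen=delay_steps)
for _ in range(1000):                                      # Eq. (9), forward Euler
    x += dt / tau * (buffer[0] - x)                        # buffer[0] is sigma(t - Dx)
    buffer.append(sigma(85.0, 0.5))                        # a sustained pressure drop
print(round(x, 3))
```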
2.3 LVAD Model
In this study, a pulsatile para-corporeal LVAD is considered to be connected between the left ventricle and the aorta of the CVS. Such pulsatile devices comprise an air chamber and a blood chamber, separated by an impermeable membrane, with valves on either side of the blood chamber for inflow and outflow [12]. These devices mostly use a pneumatically driven pump, denoted by the pressure signal Pd, which generates a constant ejection pressure (Pe) in the air chamber for a fixed period of time, termed the VAD systolic period (Tsys), leading to membrane displacement, which in turn closes the input valve and opens the output valve of the blood chamber to eject the blood into the aorta. After this period, the pump pressure (Pd) switches to the filling pressure (Pf) till the beginning of the next systolic period, which decreases the pressure in the air chamber and causes the blood chamber to be filled again for the next cardiac cycle. For this study, Pf is assumed to be zero. The lumped parameter model of the device is time-varying, as detailed in [16].
2.4 Synchronized Physiological Control System
Now, in order for the CVS to work synchronously with the LVAD without affecting the degraded pumping function of the failing heart, and to maintain the hemodynamics of the patient, a synchronous physiological control system (SPCS) is adopted from [12], which is triggered by the detection of the R-wave peak in the patient ECG based on the MaMeMi filter [17]. The detection of the R-wave, say at time tR[k], can be used to calculate the cardiac period, say Tc[k] = tR[k + 1] - tR[k], where k denotes the number of the cardiac cycle. Based on these parameters, the instant of the VAD ejection, say tVAD[k], can be defined as [12]

t_{VAD}[k] = t_R[k] + \mu\,T_c[k-1]; \quad 0 \le \mu < 1    (11)
where μ is a constant that determines the time delay for activation of the VAD ejection over the time period Tsys mentioned earlier. The magnitude of the ejection pressure (Pe[k]) for the VAD is calculated once per cardiac cycle at tVAD[k], with the aid of a control algorithm, triggered by the error, say e[k], which is defined as

e[k] = P_{sa}^{*}[k] - P_{sa}[k]    (12)
where Psa*[k] is the reference mean value of the systemic arterial pressure and Psa[k] is the mean of the measured real-time systemic arterial pressure values of the patient, obtained by non-invasive means. At the end of the VAD systolic period, defined as

T_{sys}[k] = t_{VAD}[k] + \delta\,T_c[k-1]; \quad 0 \le \delta < 1    (13)

the ejection pressure (Pe[k]) is automatically adjusted and the controller output is switched to the filling pressure (Pf[k]). The schematic of the entire SPCS mechanism can be found in [12].
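The timing logic of Eqs. (11) and (13) amounts to two scalings of the previous cardiac period; the sketch below illustrates it with hypothetical R-wave detection times and placeholder values of μ and δ.

```python
# Sketch of the ECG-synchronized timing of Eqs. (11) and (13); the R-wave times
# and the values of mu and delta are hypothetical.
mu, delta = 0.2, 0.3

def vad_timing(t_r, k):
    """Ejection instant and end of the VAD systolic window for cardiac cycle k."""
    tc_prev = t_r[k] - t_r[k - 1]           # previous cardiac period Tc[k-1]
    t_vad = t_r[k] + mu * tc_prev           # Eq. (11)
    t_sys_end = t_vad + delta * tc_prev     # Eq. (13)
    return t_vad, t_sys_end

r_peaks = [0.00, 0.46, 0.92, 1.39]          # assumed detection times (s), HR ~ 130 bpm
print(vad_timing(r_peaks, 2))               # roughly (1.012, 1.150)
```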
3 Controller Formulation
Based on the above discussion, it is clear that the controller has an essential role to play in maintaining the magnitude of the ejection pressure (Pe) for the LVAD to work efficiently, based on the instantaneous arterial pressure error e[k]. The literature suggests that most control configurations used for this purpose are data-driven controllers [12, 18] or modified PID-based feedback controllers [19, 20]. However, the system under investigation is highly nonlinear, prone to deviations in system parameters, and sensitive to external disturbances. The novelty of the present approach lies in the fact that it introduces a nonlinear robust control strategy, viz. sliding mode control (SMC), for an ECG-synchronized system as in [12] with the baroreflex mechanism, since the present literature lacks such a study, to the best of our knowledge. The SMC strategy is chosen since it is known to be insensitive to parametric variations and has disturbance rejection features. To begin with, a sliding manifold is defined for the discrete-time sliding mode controller (SMC) used in this study in the form

s_m[k] = \lambda_2\,\dot{e}[k] + \lambda_1\,e[k]    (14)

where λ1, λ2 satisfy the Hurwitz condition (λ1, λ2 > 0), e[k] is given by (12), and the error rate is given as

\dot{e}[k] = \frac{e[k] - e[k-1]}{\Delta t}    (15)
where Δt is the sampling time. Based on this, a simplified control law can be written as

u_{VAD}[k] = \delta P_e[k] = -\rho\,\tanh(s_m[k]); \quad \rho > 0    (16)

such that the new value of the ejection pressure can be defined as

P_e[k] = P_e[k-1] + \delta P_e[k]    (17)
In (16), the tangent hyperbolic function is used in place of the conventional signum function so as to tackle the chattering phenomenon.
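A minimal sketch of the discrete update defined by Eqs. (12) and (14)-(17) is shown below. The gains λ1, λ2, ρ and the sampling interval are placeholders (the study tunes them with a GA, as described in the next section), and the sign convention follows Eq. (16) as written; the snippet only illustrates the update mechanics, not the closed-loop behavior with the CVS model.

```python
import math

# Minimal sketch of the discrete sliding-mode update of Eqs. (12), (14)-(17).
# lam1, lam2, rho and dt are placeholder gains (the study tunes them with a GA);
# the sign convention follows Eq. (16) as written.
lam1, lam2, rho, dt = 1.0, 0.1, 5.0, 0.46    # dt taken as roughly one cardiac cycle

def smc_step(p_sa_ref, p_sa, e_prev, p_e_prev):
    e = p_sa_ref - p_sa                      # Eq. (12), arterial pressure error
    e_dot = (e - e_prev) / dt                # Eq. (15), discrete error rate
    s = lam2 * e_dot + lam1 * e              # Eq. (14), sliding manifold
    delta_pe = -rho * math.tanh(s)           # Eq. (16), smooth switching law
    return e, p_e_prev + delta_pe            # Eq. (17), updated ejection pressure

e, p_e = 0.0, 60.0                           # assumed initial error and pressure (mmHg)
for p_sa in (70.0, 72.0, 75.0, 78.0):        # assumed measured mean pressures
    e, p_e = smc_step(80.0, p_sa, e, p_e)
print(round(p_e, 2))
```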
4 Results and Discussion
The SPCS with the baroreflex mechanism is developed in MATLAB/Simulink, and most of the parametric values can be found in [12–14]. Also, in order to determine the values of the controller parameters, a genetic algorithm (GA) based optimization scheme is used to minimize the Integral of Absolute Error (IAE). The results of the simulation are presented with the aid of two case studies. In the first case, the variations in physiological parameters of a healthy CVS and a CVS with AdHF are discussed. In the next case, the effects of the LVAD with the SMC controller on the AdHF patient are considered.
4.1 Case 1: Parametric Comparison in Healthy, AdHF, and LVAD Equipped AdHF CVS
The variations in the physiological conditions of the CVS can be understood based on several parameters, viz. the mean systemic arterial pressure (Psa) and the systemic vascular resistance (Rsvr). Figure 1 shows a comparison of these parameters for a healthy, an AdHF, and an LVAD equipped AdHF cardiovascular system (CVS). It can be observed from Fig. 1a that, for a particular activity, the healthy heart reaches a steady-state value of Psa of around 80.75 mmHg after an initial transient of 10 s. However, an AdHF CVS cannot reach this value under similar conditions. This is where an LVAD coupled to such an AdHF CVS can help it to work normally, as evident from Fig. 1a, where it reaches the same steady-state value as the healthy CVS. Also, the overshoot and undershoot in Psa in Fig. 1a arise as a transient due to the sudden change in flow parameters when an LVAD is introduced into the body of the patient for the first time. It should also be noted here that the values (pressures and volumes) of the dynamic system (patient body) at the time of LVAD introduction or initialization are responsible for the initial fluctuations, but the effect of those initial
338
R. Bhattacharjee et al.
Fig. 1 Comparison of physiological parameters between a healthy and AdHF CVS, a variation of mean systemic arterial pressure (Psa ), b variation in systemic vascular resistance (Rsvr )
parameters is insignificant once a steady state is reached. Also, the undulations in both the parameters in Fig. 1a, b can be attributed to the effect of baroreflex mechanisms.
4.2 Case 2: Improvement in AdHF Parameters with SMC-LVAD The performance of the proposed SMC-based LVAD structure is compared with previously established control architectures such as Proportional-Integral-Derivative (PID) [19, 20] and simplified polynomial controllers [12] for two different values of the delay constant (μ), viz. 0.2 s in Fig. 2a and 0.4 s in Fig. 2b, with a baseline heart rate (HR) of 130 bpm, considering the variation of the mean systemic arterial pressure (Psa) during patient activity. The performance of the controllers is quantified using three conventional performance metrics, namely the Integral of Absolute Error (IAE), the Integral of Time-weighted Absolute Error (ITAE), and the Control Effort (CE), and presented in Table 1 for cases (a) and (b). It can be observed from the table that the SMC strategy shows the lowest values of all three indices in both cases (indicated in bold), depicting the improvement in performance. Figure 3 shows the variation of the mean systemic arterial pressure (Psa) with varying demand during patient activity with SMC controlled LVADs. Figure 3a, b illustrates that, despite changes in heart rate (HR) or variation in the delay constant (μ) for VAD activation, the controller is able to follow the demand quite satisfactorily. Figure 3c shows the correlation manifold between the variation of heart rate (HR), the delay constant (μ), and the integral of the absolute error values e[k] in (12), i.e., the IAE values. This 3D plot clearly depicts a highly nonlinear workspace, which should be kept in mind while developing controllers for LVADs.
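For reference, the three indices reported in Table 1 can be computed from the logged pressure error and control signal roughly as sketched below; these are standard discrete approximations, and the control-effort definition used here (sum of absolute control increments) is an assumption, since the paper does not state its exact form.

```python
import numpy as np

def performance_indices(t, e, u):
    """IAE, ITAE and an assumed control-effort (CE) measure from sampled data."""
    dt = np.gradient(t)                               # local sample spacing
    iae = np.sum(np.abs(e) * dt)                      # Integral of Absolute Error
    itae = np.sum(t * np.abs(e) * dt)                 # Integral of Time-weighted Absolute Error
    ce = np.sum(np.abs(np.diff(u, prepend=u[0])))     # assumed CE: total control increment
    return iae, itae, ce
```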
Fig. 2 Comparison of controller performances for mean systemic arterial pressure (Psa) with heart rate (HR) of 130 bpm and delay constant (μ) of a 0.2 s, b 0.4 s for patients with LVAD equipped AdHF

Table 1 Performance comparison of the SMC, PID, and polynomial controllers for the LVAD equipped AdHF patients

Controllers  | Case (a)                | Case (b)
             | IAE    ITAE    CE       | IAE    ITAE    CE
PID          | 64.33  1244    2008     | 53.38  890.1   1739
Polynomial   | 53.86  846.4   1678     | 42.48  583.9   1341
SMC          | 46.27  549.7   1253     | 32.88  354.8   943.7
Fig. 3 Comparison of mean systemic arterial pressure (Psa ) for a variation in heart rate (HR), b variation in time delay constant for VAD activation (μ) for baseline HR of 130 bpm, c correlation manifold between heart rate (HR), delay constant (μ) and IAE values of Psa relative to the reference Psa∗ , for patients with SMC controlled LVAD
5 Conclusion In this work, the authors focus on developing a robust control algorithm for pulsatile LVADs used for pediatric AdHF patients. A sliding mode-based control philosophy is used to achieve the desired trajectory tracking for the mean arterial pressure. The results show that, despite changes in the physiological parameters of the heart, i.e., the heart rate (HR), and variation in the LVAD activation delay, the proposed scheme is effective in tracking the desired variation in pressure. Acknowledgements We acknowledge Prof. Antonio Lima (Universidade Federal de Campina Grande, Brazil) and Dr. Thiago Cordeiro (Universidade Federal de Alagoas, Brazil) for sharing their work and for the insightful discussions during the course of this study.
References 1. W.H.O: Cardiovascular diseases. World Health Organization (2022). https://www.who.int/ health-topics/cardiovascular-diseases 2. W.H.O: Coronavirus (COVID-19) dashboard. World Health Organization (2022). https:// covid19.who.int/ 3. Bowen RE, Graetz TJ, Emmert DA, Avidan MS (2020) Statistics of heart failure and mechanical circulatory support in 2020. Annal Trans Med 8(13):1–10 4. Remme WJ, Swedberg K, T.F.f.t.D (2001) Treatment of chronic heart failure, E.S.o.C.: guidelines for the diagnosis and treatment of chronic heart failure. Eur Heart J 22(17):1527–1560 5. Borlaug BA, Redfield MM (2011) Diastolic and systolic heart failure are distinct phenotypes within the heart failure spectrum. Circulation 123(18):2006–2014 6. Adams KF, Zannad F (1998) Clinical definition and epidemiology of advanced heart failure. Am Heart J 135(6, Supplement):204–215 7. Severino P, Mather PJ, Pucci M, D’Amato A, Mariani MV, Infusino F, Birtolo LI, Maestrini V, Mancone M, Fedele F (2019) Advanced heart failure and end-stage heart failure: does a difference exist. Diagnostics 9(4):1–10 8. Timms D (2011) A review of clinical ventricular assist devices. Med Eng Phys 33(9):1041–1047 9. Loor G, Gonzalez-Stawinski G (2012) Pulsatile versus continuous flow in ventricular assist device therapy. Best Pract Res Clin Anaesthesiol 26(2):105–115 10. Bozkurt S, van de Vosse FN, Rutten MC (2014) Improving arterial pulsatility by feedback control of a continuous flow left ventricular assist device via in silico modeling. Int J Artif Organs 37(10):773–785 11. Cheng A, Williamitis CA, Slaughter MS (2014) Comparison of continuous-flow and pulsatileflow left ventricular assist devices: is there an advantage to pulsatility? Annal Cardiothorac Surg 3(6):573–581 12. Cordeiro TD, Sousa DL, Cestari IA, Lima AM (2020) A physiological control system for ECGsynchronized pulsatile pediatric ventricular assist devices. Biomed Signal Process Control 57:101752 13. Goodwin JA, van Meurs WL, Couto CDS, Beneken JE, Graves SA (2004) A model for educational simulation of infant cardiovascular physiology. Anesth Analg 99(6):1655–1664 14. Ursino M, Magosso E (2003) Role of short-term cardiovascular regulation in heart period variability: a modeling study. Am J Physiol-Heart Circul Physiol 284(4):H1479–H1493 15. Liu H, Liu S, Ma X, Zhang Y (2020) A numerical model applied to the simulation of cardiovascular hemodynamics and operating condition of continuous-flow left ventricular assist device. Math Biosci Eng 17(6):7519–7543
16. Sousa D, Cordeiro T, Melo T, da Rocha Neto J, Cestari I, Lima A (2018) Modeling, characterization and test of a pediatric ventricular assist device. J Phys: Conf Ser 1044(1):012047 17. Castells-Rufas D, Carrabina J (2015) Simple real-time QRS detector with the MAMEMI filter. Biomed Signal Process Control 21:137–145 18. Khaledi M, Abolpour R, Mohammadi M, Dehghani M (2021) Data-driven model predictive controller design for left ventricular assist devices. In: 7th International Conference on Control, Instrumentation and Automation (ICCIA). IEEE, pp 1–5 19. Son J, Du D, Du Y (2020) Modelling and control of a failing heart managed by a left ventricular assist device. Biocybern Biomed Eng 40(1):559–573 20. Khaledi M, Dehghani M, Mohammadi M, Abolpour R (2020) Controller design for left ventricular assist devices in patients with heart failure. In: 27th National and 5th international conference of biomedical engineering (ICBME), pp 326–332
Performance Analysis of a Chaotic OFDM-FSO Communication System Chinmayee Panda, Sudipta Sahoo, Urmila Bhanja, and Debajyoti Mishra
Abstract Free Space Optical (FSO) communication offers an effective solution to last-mile connectivity and serves as a complementary access technology to Radio Frequency (RF) and millimeter-wave wireless systems. However, different weather conditions such as thick fog, smoke, and turbulence affect the system performance and degrade the Q-factor. This work applies a chaotic masking concept to a hybrid OFDM-FSO system, which successfully mitigates inter-channel interference (ICI) and fading. In this paper, the Q-factor under different weather conditions is estimated and compared with that of the conventional chaotic FSO system. The BER performance of the proposed model is also measured for different turbulence conditions and compared with existing FSO systems. The proposed chaotic OFDM-FSO system provides secure communication with a higher Q-factor and improved bit error rate performance. Keywords FSO · Chaotic mask · OFDM
1 Introduction FSO uses license-free frequency spectrum and is less costly [1]. FSO communication system wirelessly uses the optical carrier to convey information through free space. This is regarded as a complementary access technology to radio frequency (RF) systems, as it offers an efficient solution to last-mile connectivity [2]. This work uses orthogonal frequency division multiplexing (OFDM) techniques, which reduces the inter-channel interference (ICI) and the effect of frequency selective fading successfully [3]. Base-band modulation, nature of laser beam, and number of subcarriers are the various parameters which affects the performance of OFDM-based FSO systems [4]. Atmospheric turbulences cause fading that degrades the signal strength which can be improved by various diversity techniques [5]. Authors in [6] C. Panda (B) · S. Sahoo · U. Bhanja · D. Mishra Department of Electronics and Communication Engineering, IGIT, Sarang 759146, Odisha, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 R. Sarkar et al. (eds.), Proceedings of International Conference on Frontiers in Computing and Systems, Lecture Notes in Networks and Systems 690, https://doi.org/10.1007/978-981-99-2680-0_30
observed the BER performance of coherent optical OFDM-FSO (CO-OFDM-FSO) system by considering the weather conditions. Bhanja and Panda [7] evaluated the SAC-OCDMA-OFDM-FSO system by introducing two novel zero cross-correlation codes. Considering different turbulence conditions, Panda and Bhanja have modified the receiver structure of OFDM-FSO system and investigated the result [8]. Researchers observed the performance of FSO system by utilizing different forward error correction codes [9, 10]. Authors in [11–13] have addressed chaotic FSO communication. However, the advantages of OFDM are not utilized in these works. OFDM has high bandwidth efficiency and tremendous tolerance capacity against frequency selective channels for which it can be efficiently used in FSO systems. For ensuring security, to provide a higher Q-factor and better BER performance, chaotic concept is hybridized with OFDM-FSO communication system in this work. The contributions of the paper are listed as follows: • Performance of chaotic OFDM-FSO system in terms of BER and Q-factor is analyzed. • The Q-factor of OFDM-FSO communication system is observed with and without applying chaotic mask. The arrangement of paper is provided as follows. The mathematical modeling for a secure OFDM-FSO communication system is represented in Sect. 2. The system model followed by simulation results are explained in Sects. 3 and 4, respectively. The conclusion part is included in Sect. 5.
2 Mathematical Analysis The generation of optical chaos and the behavior of different environmental and turbulence conditions are analyzed in this section.
2.1 Laser Rate Equation To secure the information signal, optical chaos is used. Depending upon the wave shapes, optical chaos is classified into pulsed type and non-pulsed type. Pulsed optical chaos exhibits better Lyapunov exponents than non-pulsed-type optical chaos and provides better security. Lyapunov exponents determine the degree of chaos. The laser rate equations express the direct modulation of a semiconductor laser producing pulsed optical chaos [14]

$$\frac{dC(t)}{dt} = \frac{I(t)}{qV} - \frac{C(t)}{\tau_n} - \frac{g_0\,(C(t) - C_t)}{1 + \varepsilon S(t)} \quad (1)$$

where

$$\frac{dS(t)}{dt} = g_0\,(C(t) - C_t)\,\frac{S(t)}{1 + \varepsilon S(t)} - \frac{S(t)}{\tau_p} + \frac{\beta\, C(t)}{\tau_n} \quad (2)$$
where g0 in Eq. (1) is expressed in Eq. (3) [14]

$$g_0 = V_g\, \Gamma\, a_0 \quad (3)$$

where Vg, Γ, τp, A, ε, V, τn, Ct, β, and a0 denote the group velocity, mode confinement factor, photon lifetime, linewidth enhancement factor, gain compression factor, active layer volume, electron lifetime, carrier density at transparency, spontaneous emission coupling factor, and active layer coefficient, respectively. The input current to the model is expressed as the sum of the DC bias current and the OFDM signal, as in Eq. (4):

$$I(t) = I_{dc} + I_{mw}\, O_{ofdm}(t) \quad (4)$$

where

$$O_{ofdm}(t) = \sum_{m=0}^{msub-1} O_M\, e^{j 2\pi f_m t} \quad (5)$$
In Eq. (5), the number of subcarriers of the OFDM signal is denoted by msub. Substituting Eq. (4) into Eq. (1) gives Eq. (6); hence, the laser rate equation is represented as

$$\frac{dC(t)}{dt} = \frac{I_{dc} + I_{mw}\, O_{ofdm}(t)}{qV} - \frac{C(t)}{\tau_n} - \frac{g_0\,(C(t) - C_t)}{1 + \varepsilon S(t)} \quad (6)$$
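A direct way to generate the pulsed chaotic waveform described by (1)–(6) is to integrate the rate equations numerically. The sketch below uses simple forward-Euler steps over the equations as printed above; all parameter values are illustrative placeholders (only the bias and modulation currents echo Table 1), so it only shows the mechanics of the computation, not the chaotic waveform of Fig. 3.

```python
import numpy as np

# Illustrative laser parameters (placeholders, not the values of [11, 14]).
q, V = 1.602e-19, 1.5e-16          # electron charge (C), active layer volume (m^3)
tau_n, tau_p = 2e-9, 3e-12         # electron and photon lifetimes (s)
g0, Ct, eps, beta = 1e-12, 1.2e24, 2.5e-23, 1e-4

def rate_equations(C, S, I):
    """Right-hand sides of Eqs. (1)/(6) and (2), as reconstructed above."""
    dC = I / (q * V) - C / tau_n - g0 * (C - Ct) / (1.0 + eps * S)
    dS = g0 * (C - Ct) * S / (1.0 + eps * S) - S / tau_p + beta * C / tau_n
    return dC, dS

dt, n_steps = 1e-12, 100_000       # 1 ps Euler steps
t = np.arange(n_steps) * dt
I_dc, I_mw = 30e-3, 35e-3          # bias and modulation-peak currents (Table 1)
ofdm = np.cos(2 * np.pi * 2.5e9 * t)     # stand-in for the OFDM drive O_ofdm(t)
C, S = 1.5e24, 1e20
S_out = np.empty(n_steps)
for k in range(n_steps):
    dC, dS = rate_equations(C, S, I_dc + I_mw * ofdm[k])   # Eq. (4) drive current
    C, S = C + dt * dC, S + dt * dS
    S_out[k] = S                   # photon density trace (proportional to output power)
```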
2.2 Attenuation Rain attenuation can be expressed as in Eq. (7) [15]

$$R_a = \beta\, R^{\xi} \quad (7)$$

where Ra indicates the rain attenuation (dB/km), ξ is the scattering coefficient, and R is the rain rate (mm/h). The Kruse and Kim models are chosen to calculate the fog attenuation [16–18], and the haze attenuation is calculated using the Beer–Lambert law, which is expressed in Eq. (8) [19]

$$\tau = e^{-\sigma d} \quad (8)$$
where d is the propagation distance and σ indicates the specific attenuation coefficient per unit length [20].
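The weather-dependent losses of Table 3 can be approximated from these relations. The helper below implements the rain power law of Eq. (7) and the Beer–Lambert relation of Eq. (8) with the conversion to dB written out; the coefficient values are placeholders rather than the exact ones used in [15, 19].

```python
import numpy as np

def rain_attenuation_db_km(R, beta=1.076, xi=0.67):
    """Eq. (7): rain attenuation (dB/km) from rain rate R (mm/h); beta, xi are placeholder coefficients."""
    return beta * R ** xi

def beer_lambert_loss_db(sigma_per_km, d_km):
    """Eq. (8): transmittance tau = exp(-sigma*d), converted to a loss in dB."""
    tau = np.exp(-sigma_per_km * d_km)
    return -10.0 * np.log10(tau)

# Example: a 25 mm/h shower and a hazy 1 km link (sigma is a placeholder value).
print(rain_attenuation_db_km(25.0), "dB/km (rain)")
print(beer_lambert_loss_db(sigma_per_km=0.98, d_km=1.0), "dB (haze)")
```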
Fig. 1 Proposed chaotic OFDM-FSO communication model
3 System Model The chaotic OFDM-FSO communication model is given in Fig. 1. A data source, QAM modulator, OFDM modulator, continuous wave (CW) laser, and Mach–Zehnder modulator are included in the transmitter part. The operating wavelength of the CW laser is 1550 nm with 5 dBm power. The data source generates the data bits, which are given to the QAM modulator followed by the OFDM modulator. The OFDM-modulated signal is mixed with the chaotic signal produced by the chaotic laser and then transmitted through free space. The mixed chaotic OFDM signal is transmitted over free space and received by a receiver, as shown in the receiver section. The received signal is then demodulated by the OFDM demodulator, and the original data is obtained by the low-pass filter.
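The end-to-end flow of Fig. 1 can be emulated at baseband to see how the masking and unmasking behave. The sketch below uses a logistic-map sequence as a stand-in for the laser-generated chaos and a plain IFFT/FFT OFDM pair, so it mirrors only the structure of the system, not the OptiSystem implementation.

```python
import numpy as np

rng = np.random.default_rng(1)
n_sub = 128                                     # number of OFDM subcarriers (Table 2)

# 4-QAM symbols on each subcarrier (the 8-QAM mapping of Table 2 is omitted for brevity).
bits = rng.integers(0, 2, size=(n_sub, 2))
symbols = (2 * bits[:, 0] - 1) + 1j * (2 * bits[:, 1] - 1)

tx_ofdm = np.fft.ifft(symbols)                  # OFDM modulation (IFFT)

# Logistic-map sequence as a stand-in for the chaotic laser waveform.
chaos = np.empty(n_sub)
x = 0.37
for k in range(n_sub):
    x = 3.99 * x * (1.0 - x)
    chaos[k] = x

masked = tx_ofdm + 0.5 * chaos                  # chaos masking before transmission
received = masked * 10 ** (-3.0 / 20)           # toy channel: 3 dB flat attenuation
unmasked = received / 10 ** (-3.0 / 20) - 0.5 * chaos   # subtract the known chaos at the receiver
rx_symbols = np.fft.fft(unmasked)               # OFDM demodulation (FFT)

print(np.allclose(rx_symbols, symbols))         # True: data recovered after unmasking
```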
4 Simulation Result The proposed model is simulated in OptiSystem version 14.0 and MATLAB. All the parameters taken for simulation are provided in Tables 1, 2, and 3. Figure 2 depicts the OFDM signal and Fig. 3 shows the chaotic waveform originated by the chaotic laser. A chaotic signal is a random noise-like signal without any information and with numerous random peaks. The Gaussian-natured pulses are used to hide the original message. The chaotic laser producing the chaotic waveform operates at the same 1550 nm wavelength and 5 dBm power as the CW laser. The chaos bandwidth is directly controlled by varying the frequency of the current source. The chaotic OFDM signals are shown in Figs. 4 and 5. The chaotic OFDM signal containing the hidden data becomes noisy after transmission. Due to atmospheric influences, the transmitted signal amplitude falls drastically after passing through free space, which is visualized in Fig. 5. The signal after subtracting the chaos is shown in Fig. 6.
Table 1 Simulation parameters of chaotic laser [11]

Parameter               | Value
Operating wavelength    | 1550 nm
Modulation peak current | 35 mA
Laser power             | 0.5 dBm
Threshold power         | 0.0154 mW
Bias current power      | 0 dBm
Threshold current       | 33.45 mA
Bias current            | 30 mA

Table 2 Simulation parameters of chaotic OFDM-FSO system

Parameter                   | Value
Modulation type             | 8 QAM
No. of subcarriers          | 128
Length of sequence          | 1024
Tx aperture diameter        | 20 cm
Range                       | 0.5–5 km
Operating wavelength        | 1550 nm
Attenuation                 | 18 dB/km
Continuous wave laser power | 0.5 dBm
Data rate                   | 2.5 Gbit/s
Filter                      | Low pass filter

Table 3 Different weather conditions with their attenuation values [6]

Weather condition | Attenuation value (dB/km)
Heavy rain        | 19.795
Moderate fog      | 15.555
Haze              | 20.68553
Light rain        | 6.2702
Little fog        | 4.2850
The chaotic and non-chaotic OFDM-FSO communication models yield different Q-factors which are shown in Fig. 7. For obtaining Q-factor, the range is varied from 0.5 to 5 km. Among all weather conditions, the system exhibits the worst Q-factor for haze condition and best for little fog condition. At 0.5 km range, for haze condition, the Q-factor is 49.7 for chaotic OFDM-FSO system, whereas, for non-chaotic OFDM-FSO system, the Q-factor is 117.98 at identical data rate and FSO range. Also, for haze condition, it is investigated that the Q-factor gradually decreases to 0 by increasing the FSO range to 2.5 km. For little fog condition, the Q-factor attains 0 value at the range of 5 and 4.5 km for non-chaotic and chaotic OFDM-FSO systems, respectively. In moderate fog condition, the obtained highest
Fig. 2 OFDM signal
Fig. 3 Chaotic waveform
Q-factor is 142.46 for the non-chaotic OFDM-FSO system, whereas it drops to 59.92 for the chaotic OFDM-FSO system. The Q-factor is the highest in the little fog condition for both the chaotic and non-chaotic OFDM-FSO systems: the non-chaotic OFDM-FSO system exhibits a Q-factor of 235.04, whereas the chaotic OFDM-FSO system yields 95.14. After chaos addition, the chaos-mixed OFDM signal becomes more complex and noisy, with random amplitude, and is therefore more difficult to recover at the receiver side. It is observed that the Q-factor degrades when chaos is added to the OFDM signal in FSO for the different weather conditions. However, the chaotic OFDM-FSO system was found to be superior to the chaotic FSO system.
Fig. 4 OFDM mixed with chaotic before transmitter
Fig. 5 OFDM mixed with chaotic after transmitter
Figure 8 compares the Q-factor of chaotic OFDM-FSO with chaotic FSO system in different weather conditions. At 0.5 km range, among all the weather conditions, the little fog yields the best Q-factor of 95.14 for chaotic OFDM-FSO system, whereas the resulted Q-factor is 79.53 for chaotic FSO system [11]. The Q-factor degrades to 0 at 4.5 km for chaotic OFDM-FSO system, whereas, for chaotic FSO system, it degrades to 0 at the range of 3.5 km. The chaotic OFDM-FSO system, in heavy rain, light rain, and moderate fog conditions, exhibits Q-factor of 50.7294, 88.024, and 59.9294, respectively. The chaotic FSO system, in heavy rain, light rain, and moderate
Fig. 6 Signal after subtraction of chaos

Fig. 7 Q-factor of OFDM-FSO system with and without chaos (Q-factor versus range in km, for light rain, heavy rain, little fog, moderate fog, and haze, with and without the chaotic mask)
fog conditions, exhibits Q-factors of 31.261, 69.9152, and 39.953, respectively, for identical values of FSO range [11]. Further, the haze condition exhibits the worst Q-factor for both the chaotic FSO and chaotic OFDM-FSO systems. At 0.5 km distance, in the haze weather condition, the chaotic OFDM-FSO system yields a Q-factor of 49.73, whereas the chaotic FSO system exhibits a Q-factor of 28.98. Hence, the tolerance of the chaotic OFDM-FSO system against frequency-selective channels enables it to exhibit a better Q-factor compared with chaotic FSO systems.
Fig. 8 Comparison between chaotic OFDM-FSO and chaotic FSO (Q-factor versus range in km for light rain, heavy rain, little fog, moderate fog, and haze)
At an SNR value of 2 dB, the proposed model and the chaotic SISO-FSO system both exhibit a BER of 10^-1 in the moderate turbulence condition. With increasing SNR, for the moderate turbulence condition, the chaotic SISO-FSO system [21] exhibits a BER of 10^-3 at 18 dB SNR, whereas the proposed model yields a BER of 10^-8 for the identical value of SNR. In the weak turbulence condition, the chaotic SISO-FSO system [21] exhibits a BER of 10^-4 at an SNR value of 16 dB, whereas the proposed model yields a BER of 10^-10 for the identical value of SNR. Since OFDM is hybridized with the FSO system, the proposed model provides a significant improvement in BER as compared with the classical chaotic SISO-FSO system. Figure 10 compares the proposed model with the existing hybrid polarization division multiplexing (PDM) coherent optical OFDM-FSO model [22]. Considering various weather conditions, the BER performance of both models is investigated by varying the link range up to 5 km. Simulation shows that the proposed model exhibits better BER and covers a larger distance as compared to the hybrid PDM/CO-OFDM-FSO model. Under the mild haze condition, the chaotic OFDM-FSO model covers a maximum range of 5 km and achieves better BER performance as compared to the light rain and low fog weather conditions. In Fig. 11, the transmitted and recovered signals clearly prove the recovery of the secure original OFDM signal over the FSO medium. Hence, this work implements a secure FSO system by applying a chaos masking scheme which conceals the information signal in the chaotic waveform produced by a semiconductor laser. When the chaotic mask is utilized in the OFDM-FSO system, a noise-like complex waveform with larger, random amplitudes is generated, which degrades the overall performance. However, the use of the chaotic mask protects the model from various attacks.
Fig. 9 BER versus SNR for chaotic OFDM-FSO and chaotic SISO-FSO under gamma–gamma channel model
Fig. 10 BER versus OSNR for chaotic OFDM-FSO and CO-OFDM-FSO models
Fig. 11 Comparison of transmitted and recovered signal
Figure 12 shows the heights of the eye diagrams, which clarify that the haze condition suffers the worst attenuation, whereas light rain and little fog suffer the least when the chaos masking scheme is applied.
Fig. 12 a Eye diagram of the proposed model under light rain, b eye diagram under heavy rain, c eye diagram under little fog, d eye diagram under moderate fog, e eye diagram of the proposed model under haze
5 Conclusion This paper generates chaos using a semiconductor laser and implements a secure hybrid OFDM-FSO communication system. A novel laser rate equation is developed for the proposed model. Simulated results show that the Q-factor of the non-chaotic OFDM-FSO system is better than that of the chaotic OFDM-FSO system. However, the chaotic OFDM-FSO system exhibits a higher Q-factor than the chaotic FSO system. Considering an FSO range from 0.5 to 5 km and a 0.5 dBm laser power, the proposed model is implemented with a data rate of 2.5 Gbit/s. The proposed model yields the best Q-factor under the little fog condition for the chaotic OFDM-FSO system. At 0.5 km distance, under the haze condition, the chaotic OFDM-FSO system yields a Q-factor of 49.73 and the chaotic FSO system exhibits a Q-factor of 28.98, which is the worst among all weather conditions. The simulated chaotic OFDM-FSO system exhibits a higher Q-factor as compared to the chaotic FSO system. Considering the gamma–gamma model under different turbulence conditions, the proposed model exhibits acceptable BERs in the range of 10^-10 and 10^-8 in weak and strong turbulence conditions, respectively.
References 1. Kaufmann J (2011) Free space optical communications: an overview of applications and technologies. In: Proceedings of the Boston IEEE communications society meeting 2. Malik A, Singh P (2015) Free space optics: current applications and future challenges. Int J Opt 3. Bekkali A, Naila CB, Kazaura K, Wakamori K, Matsumoto M (2010) Transmission analysis of OFDM-based wireless services over turbulent radio-on-FSO links modelled by gamma-gamma distribution. IEEE Photonics J 2(3):510–520 4. Armstrong J (2009) OFDM for optical communication network. J Light Wave Technol 27(3):189–204 5. Zhu X, Kahn JM (2002) Free-space optical communication through atmospheric turbulence channels. IEEE Trans Commun 50(8):1293–1300 6. Kumar N, Rana AK (2013) Impact of various parameters on the performance of free space optics communication system. Optik 124(22):5774–5776 7. Bhanja U, Panda C (2020) Performance analysis of hybrid SAC-OCDMA-OFDM model over free space optical communication. CCF Trans Netw. Springer. https://doi.org/10.1007/s42045020-00039-6 8. Panda C, Bhanja U (2021) Performance improvement of hybrid OFDM-FSO system using modified OFDM receiver. IJSCC 12(3) 9. Bukola A, Odeyemi K, Owolawi P, Viranjay S (2019) Performance of OFDM-FSO communication system with different modulation schemes over gamma-gamma turbulence channel. J Commun 14(6):490–497 10. Panda C, Bhanja U (2022) Energy efficiency and BER analysis of concatenated FEC coded MIMO-OFDM-FSO system. In: IEEE conference, ICAECC 11. Niaz A, Qamar F, Ali M, Farhan R, Islam M (2019) Performance analysis of chaotic FSO communication system under different weather conditions. Emerg Telecommun Technol 30(2) 12. Riaz A, Ali M (2008) Chaotic communications, their applications and advantages over traditional methods of communication. In: 6th international symposium on communication systems, networks and digital signal processing, Graz, Austria. IEEE, pp 21–24
13. Sultan A, Yang X, Hajomer A, Hussain S, Weicheng H (2019) Chaotic distribution of QAM symbols for secure OFDM signal transmission. Opt Fiber Technol 47:61–65 14. Ali S, Islam MK, Zafrullah M (2013) Effect of transmission fiber on dense wavelength division multiplexed (DWDM) chaos synchronization. Int J Light Electron Opt 124(12):1108–1112 15. Mohammed H, Aljunid S, Fadhil H, Abd T (2013) A study on rain attenuation impact on hybrid SCM-SAC OCDMA-FSO system. In: IEEE conference on open systems (ICOS), Kuching, Malaysia 16. Kashani MA, Uysal M, Kavehrad M (2015) A novel statistical channel model for turbulenceinduced fading in free-space optical systems. J Lightwave Technol 17. Naboulsi A, Sizun H, Fornel F (2004) Fog attenuation prediction for optical and infrared waves. Opt Eng 43(2):319–330 18. Bouchet O, Marquis T, Chabane M, Alnaboulsi M, Sizun H (2005) FSO and quality of service software prediction. In: Proceedings of the SPIE 5892, free-space laser communications, p 589204 19. Kruse PW, McGlauchlin LD, McQuistan RB (1962) Elements of infrared technology: generation, transmission and detection. Wiley, New York 20. Basahel A, Islam MR, Suriza AZ, Habaebi MH (2016) Haze impact on availability of terrestrial free space optical link. In: International conference on computer and communication engineering (ICCCE), Kuala Lumpur 21. Abdulameer L (2018) Optical CDMA coded STBC based on chaotic technique in FSO communication systems. Radio Electron Commun Syst 61(10) 22. Malhotra J, Mani Rajan MS, Vigneswaran D, Moustafa HA (2020) A long-haul 100 Gbps hybrid PDM/CO-OFDM FSO transmission system: impact of climate conditions and atmospheric turbulence. Alex Eng J. https://doi.org/10.1016/j.aej.2020.10.008
Further Improved 198 nW Ultra-Low Power 1.25 nA Current Reference Circuit with an Extremely Low Line Sensitivity (0.0005%/V) and 160 ppm/◦ C Temperature Coefficient Koyel Mukherjee, Soumya Pandit, and Rajat Kumar Pal
Abstract The work presented in this article shows a further improvement in the performance of our previous 1 nA current reference circuit [1]. The proposed work produces a 1.25 nA current reference in post-layout simulation. The present work shows a 0.0005%/V line sensitivity and a 160 ppm/°C temperature coefficient, an improvement of 75% and 25.5%, respectively, over the corresponding values published in [1]. The overall power consumption is reduced from 588 nW in [1] to 198 nW in the proposed work, a 66% power improvement. In addition, the proposed work requires 98% less transistor area than the work presented in [1]. Keywords Low power current reference circuit · Low line sensitivity · Low-temperature coefficient reference current
1 Introduction A current reference circuit plays an important role in setting the bias point of analog and mixed-signal circuits. A reference current must be independent of supply and temperature variations. It must also dissipate a very small amount of power to suit low-power applications. S. Pandit is the corresponding author and principal supervisor of the work. K. Mukherjee (B) · S. Pandit (B) Institute of Radio Physics and Electronics, University of Calcutta, Kolkata, India e-mail: [email protected] S. Pandit e-mail: [email protected] R. Kumar Pal Department of Computer Science and Engineering, University of Calcutta, Kolkata, India © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 R. Sarkar et al. (eds.), Proceedings of International Conference on Frontiers in Computing and Systems, Lecture Notes in Networks and Systems 690, https://doi.org/10.1007/978-981-99-2680-0_31
In the last several years, a number of works have been reported on low-power current reference circuits. Article [2] reports a 9.77 nA reference current that consumes 28 nW of power. A 26.1 nA reference current is reported in [3] with a maximum power dissipation of 104 nW under a 2 V supply voltage. A 16 µA current is reported in [4], showing a very large line sensitivity of 4%/V.
1.1 Motivation and Contribution Our previous work [1] aimed at designing a current reference circuit that produced a 1 nA reference current to serve low-power analog and mixed-signal applications. The circuit offered an extremely low line sensitivity (0.002%/V) which was the then least reported value to the best of our knowledge. A moderate temperature coefficient of 214 ppm/◦ C and 588 nW power consumption was also reported. However, the circuit in [1] had a few design-related issues, which, if addressed properly, is expected to improve the performance further to meet low-power application better. Very low sensitivity to the variation of temperature is a critical factor for any reference circuit. Moreover, to meet the goal of low-power requirement, the power consumption should also be as minimum as possible. Therefore, in this proposed work, we aimed for further improvement of the factors of [1]. Our present work delivers 1.25 nA reference current with an improvement of 66% and 25.5%, respectively, in its consumed power and temperature-sensitivity as compared to [1]. In addition to this, an improvement of 75% in line sensitivity and 98% in total transistor area is also achieved.
1.2 Organization of the Paper The rest of the article is organized as follows: a detailed discussion on design limitations of [1], and a hypothesis of the present work is presented as a theoretical formulation in Sect. 2. Section 3 describes the implementation of our proposed hypothesis. A detailed simulation result and a comparative study are presented in Sect. 4. Finally, the work is concluded in Sect. 5.
2 Theoretical Formulation 2.1 Design Limitations of Article [1] Figure 1 shows our earlier published current reference circuit with its transistor sizes in [1].
Fig. 1 Previous current reference circuit in [1] with transistor sizes in μm
The supply-voltage-independent expression of Iref, as derived in [1], is

$$I_{ref} = \frac{I_{op}^{2}\, K_4\, K_{11}}{I_{on}\, K_2\, K_7} \quad (1)$$
Here, Iop and Ion are the technology currents of the PMOS and NMOS transistors, respectively. K2, K4, K7, and K11 are the aspect ratios of the transistors M2, M4, M7, and M11, respectively. Io(p/n) is a temperature-dependent factor expressed as

$$I_o = 2\mu_0 \left(\frac{T}{T_0}\right)^{-m} C_{ox}\,(\eta - 1)\left(\frac{k_B T}{q}\right)^{2} \quad (2)$$
Here, μ0 is the mobility at room temperature T0, while Cox, η, m, and kB are the oxide capacitance per unit area, the subthreshold slope factor of the MOS transistor, the mobility temperature exponent, and Boltzmann's constant (1.38 × 10⁻²³ J/K), respectively. From (2), the temperature dependence of Iref in [1] is established as

$$\frac{\partial I_{ref}}{\partial T} = \frac{C}{T}\left[\,2\left(\frac{T}{T_0}\right)^{-m}\frac{\mu_{op}}{\mu_p} - (2-m)\,\frac{\mu_{on}}{\mu_n}\right] \quad (3)$$
2 K 4 · K 11 Iop · K 2 · K 7 Ion
(4)
Since M11 operates in the weak inversion mode, Iref, in terms of the corresponding source-to-gate voltage Vsg11, is expressed as

$$I_{ref} = I_{op} K_{11} \exp\!\left(\frac{V_{sg11} - |V_{Tp}|}{\eta U_T}\right) = I_{op} K_{11} \exp\!\left(\frac{V_{ds1} - V_R - |V_{Tp}|}{\eta U_T}\right) \quad (5)$$
Here, UT is the thermal voltage and |VTp| is the threshold voltage of the PMOS transistor M11. From Fig. 1, the gate-to-source voltage Vgs2 of M2 is given as

$$V_{gs2} = \eta U_T \ln\!\left(\frac{I_{d2}}{I_{on} K_2}\right) + V_{Tn} \quad (6)$$

Here, Id2 and VTn are the drain current and threshold voltage, respectively, of the NMOS transistor M2. From (6), we find that the gate-to-source voltage Vgs2 of M2, working in the weak inversion mode, is inversely proportional to K2. Since Vgs2 < Vgs1, therefore K2 > K1. To make Vgs2 very small, K2 was made as large as 80 by implementing 8 identical, fully parallel unit NMOS transistors, each with width W = 100 µm and channel length L = 10 µm. The large size of M2 resulted in quite a high power dissipation of 588 nW. In the post-layout simulation, a significant voltage drop occurred due to the large current drawn by the 8 multipliers of M2. This reduced Iref by 2.5 times from its corresponding value obtained in the pre-layout simulation. The reduction was, however, compensated by increasing the aspect ratio of M11 by the same amount by which Iref had dropped in the post-layout simulation. Hence, the width of M11 was increased from 16.5 µm to 42 µm while keeping the channel length the same (L = 3.8 µm) as before.
2.2 Hypothesis of the Present Design In the present work, we hypothesize that a reduction of K2 may cause a significant reduction in the supply current, resulting in reduced power consumption. By analyzing (3) and (4), we find that the thermal sensitivity of Iref is directly proportional to the product of K4 and K11, and inversely proportional to the product of K2 and K7. We therefore hypothesize that, as K2 reduces, the product of K4 and K11 will also have to reduce in order to achieve reduced thermal sensitivity. Hence, the aspect ratios K2, K4, K7, and K11 are optimized so that the value of the constant term C in (4) is reduced from 0.14 [1] to a lower value. Thus, a further reduction in thermal sensitivity can be achieved.
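To see how this optimization target behaves, Eqs. (1)–(2) can be evaluated over temperature to estimate the temperature coefficient for a chosen set of aspect ratios. In the sketch below only the aspect ratios echo Table 1; the mobility, oxide capacitance, and slope-factor values are generic placeholders rather than the SCL 0.18 µm data, so the numbers only illustrate the procedure.

```python
import numpy as np

kB, q = 1.38e-23, 1.602e-19

def technology_current(T, mu0, Cox, eta, m, T0=300.0):
    """Eq. (2): temperature-dependent technology current I_o."""
    return 2 * mu0 * (T / T0) ** (-m) * Cox * (eta - 1) * (kB * T / q) ** 2

def i_ref(T, K2, K4, K7, K11):
    """Eq. (1): I_ref from the PMOS and NMOS technology currents (placeholder device data)."""
    iop = technology_current(T, mu0=100e-4, Cox=8e-3, eta=1.3, m=1.5)   # PMOS, placeholders
    ion = technology_current(T, mu0=350e-4, Cox=8e-3, eta=1.3, m=1.8)   # NMOS, placeholders
    return iop ** 2 * K4 * K11 / (ion * K2 * K7)

T = np.linspace(233.15, 358.15, 200)             # -40 degC to +85 degC
iref = i_ref(T, K2=60.0, K4=1.0, K7=5.0, K11=2.215)   # aspect ratios from Table 1
tc_ppm = (iref.max() - iref.min()) / (np.interp(298.15, T, iref) * (T[-1] - T[0])) * 1e6
print(f"estimated temperature coefficient: {tc_ppm:.0f} ppm/degC")
```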
Table 1 Dimensions of the transistors of the present current reference circuit

Transistor | Dimension (m × (W/L))
M1         | 1 × (19/1)
M2         | 1 × (60/1)
M3         | 1 × (1/1)
M4         | 1 × (1/1)
M5         | 1 × (10/1.1)
M6         | 1 × (10/1.1)
M7         | 1 × (5/1)
M8         | 1 × (5/1)
M9         | 1 × (0.24/10)
M10        | 1 × (0.8/10)
M11        | 1 × (2.215/1)
The transistor size of M2 may be reduced from 800/10 to only 60/1 with only one unit transistor. Thus, the problem of voltage drop and thereby current drop due to the presence of the multiplier may be avoided. The transistor sizes of the proposed improved reference circuit are shown in Table 1.
3 Design Implementation The present work is designed in 0.18 µm technology under 1.8 V supply voltage. Both schematic and layout designs are performed in Cadence Virtuoso design editor tool. For low-power applications, all the transistors are biased in weak inversion mode saturation region, except M9 and M10 , which operate in strong inversion mode linear region. Figure 2 shows the complete layout of the proposed work. It is observed that a reduction in the size of the largest transistor, M2 , has resulted in a reduction of total transistor area, as measured in the design layout, from 222 µm × 192 µm to 21.3 µm × 40.2 µm.
4 Simulation Result and Discussion The simulations are performed in Cadence Spectre simulator tool in SCL 0.18 µm technology. The present work is simulated under variable environments like variable supply voltage (VDD ), temperature, and process corners.
Fig. 2 Complete layout of the proposed improved current reference circuit Fig. 3 Response of Iref to the supply variation
The proposed work produces a reference current, Iref , of 1.24 nA in schematic level simulation, with 0.0005%/V line sensitivity over VDD variation from 0.6 V to 3.5 V. Iref negligibly increases to 1.25 nA in post-layout simulation maintaining the same line sensitivity, as shown in Fig. 3. Figure 4 shows that the temperature coefficients in pre-layout and post-layout simulations are 154 ppm/◦ C and 160 ppm/◦ C, respectively, for temperature ranging from −40◦ C to +85 ◦ C. Figure 5 shows that the proposed work varies by only 244 pA over a load varying from 1 Ω to 200 MΩ. A 110 nA current is drawn from the power supply by the circuit, thus resulting in a total power consumption of 198 nW under VDD of 1.8 V and 25 ◦ C temperature, as shown in Fig. 6.
Fig. 4 Response of Iref to the temperature variation
Fig. 5 Response of Iref to the load variation
Fig. 6 Total current drawn from the power supply
Table 2 Response of the improved reference circuit in pre-layout and post-layout simulation

Parameter                        | Pre-layout | Post-layout
Iref (nA)                        | 1.24       | 1.25
Line sensitivity (%/V)           | 0.0005     | 0.0005
Temperature coefficient (ppm/°C) | 154        | 160
Total current (nA)               | 98         | 110
Power consumption (nW)           | 176.4      | 198
Table 3 Comparative results of our previous work [1] and the proposed improved work

Parameter                        | Previous work [1] | This work   | Difference in %
Iref (nA)                        | 0.99              | 1.25        | 26
Line sensitivity (%/V)           | 0.002             | 0.0005      | 75
Temperature coefficient (ppm/°C) | 214               | 160         | 25.2
Supply current (nA)              | 326               | 110         | 66.2
Power consumption (nW)           | 588               | 198         | 66.3
Transistor area (µm²)            | 222 × 190         | 21.3 × 40.2 | 98
Table 2 shows a reasonable agreement between the pre-layout and post-layout simulation results. A comparative chart in Table 3 shows the relative percentage improvement in the performance of the present work as compared to [1]. Figures 7 and 8 and Table 4 present the analysis of the different process corners under variable VDD and temperature, for Typical (T), Slow (S), and Fast (F) transistors. Though the supply current and power consumption of the circuit are quite comparable for all the cases, the SF corner shows the worst thermal sensitivity. Also, the SF and FF corners show comparable line sensitivity over reduced VDD ranges of 1.85 V to 3.5 V and 1.7 V to 3.5 V, respectively. Comparative results of the performance of the present current reference circuit w.r.t. those of the reported works are depicted in Table 5. It is seen that our improved current reference circuit demonstrates the least supply dependency over a wider range of supply variation compared to other reported works. The power consumption of the proposed work is comparable with, and in some cases much less than, that of existing works.
Fig. 7 Response to the process corner variation under variable supply voltage
Fig. 8 Response to the process corner variation under variable temperature
Table 4 Response of Iref under different process corners

Parameter                        | TT     | SS     | SF     | FS    | FF
Iref (nA)                        | 1.25   | 1.18   | 2.07   | 0.93  | 1.83
Line sensitivity (%/V)           | 0.0005 | 0.0005 | 0.0001 | 0.001 | 0.0001
Temperature coefficient (ppm/°C) | 160    | 291    | 879    | 636   | 435
Total current (nA)               | 110    | 98     | 100    | 114   | 115
Power consumption (nW)           | 198    | 181.8  | 186.8  | 210.5 | 212.4
Table 5 Comparison of the performance of the proposed improved circuit with the reported state-of-the-art works

Parameter                        | This work  | [5]        | [6]        | [7]        | [8]
Technology (µm)                  | 0.18       | 0.18       | 0.18       | 0.18       | 0.18
Iref (nA)                        | 1.25       | 1          | 142.5      | 58.7       | 92.3
Supply voltage (V)               | 0.6–3.5    | 1.5–2      | 1.2–2      | 1–2        | 1.25–1.8
Line sensitivity (%/V)           | 0.0005     | 1.4        | 1.45       | 3.4        | 7.5
Temperature variation (°C)       | −40 to +85 | −20 to +80 | −40 to +85 | −40 to +85 | −40 to +85
Temperature coefficient (ppm/°C) | 160        | 289        | 40         | 30         | 176.1
Supply current (nA)              | 110        | –          | 681        | –          | –
Power consumption (nW)           | 198        | 4.5        | –          | 352        | 670
5 Conclusion The present work demonstrates that, by optimizing the transistor sizes, considerably improved circuit performance is obtained in all aspects as compared to the work reported in [1]. A comparative analysis with existing reported works reveals that the improved reference circuit has the least supply dependency, over the widest range of supply variation, among all the reported values, to the best of our knowledge. The proposed circuit also varies significantly less with temperature as compared to the reported data in [1]. The obtained power consumption is greatly reduced, which makes the circuit suitable for low-power domains.
References 1. Mukherjee K, Sau T, Upadhyay S, Mitra S, Bhowmik A, Sarkhel S, Pandit S, Pal RK et al (2022) Int J Num Modell: Electron Netw Dev Fields 35. https://doi.org/10.1002/jnm.2999 2. Wang L, Zhan C (2019) IEEE Trans Circuits Syst I: Regul Pap PP(1). https://doi.org/10.1109/ TCSI.2019.2927240 3. Chouhan S, Halonen K (2017) Analog Integr Circuits Signal Process 93:1. https://doi.org/10. 1007/s10470-017-1057-5 4. Osipov D, Paul S (2016) IEEE Trans Circuits Syst II: Express Briefs PP:1. https://doi.org/10. 1109/TCSII.2016.2634779
5. Lee S, Heinrich-Barna S, Noh K, Kunz K, Sánchez-Sinencio E (2020) IEEE J Solid-State Circuits 55(9):2498 6. Mohamed AR, Chen M, Wang G (2019) In: 2019 IEEE international symposium on circuits and systems (ISCAS). IEEE, pp 1–4 7. Chouhan SS, Halonen K (2017) Microelectron J 69:45 8. Chouhan SS, Halonen K (2016) IEEE Trans Circuits Syst II: Express Briefs 63(8):723
Performance Analysis of Multivariate Autoregression Based EEG Data Compressor Circuit Md. Mushfiqur Rahman Chowdhury and Shubhajit Roy Chowdhury
Abstract This paper presents a Field Programmable Gate Array (FPGA)-based effective real-time data compressor circuit for the lossless compression of single and multichannel EEG data. The main goal of this design is to develop a real-time architecture that provides live EEG data which can be sent to a doctor wirelessly and which may further be used to monitor signal abnormalities in the EEG when a seizure occurs. The circuit implements an efficient algorithm resulting in a high compression ratio. The bio-signal compression algorithm is based on the combination of a modified multivariate autoregression method and a data entropy prediction algorithm. The prediction algorithm also includes the application of a low-pass FIR filter. The algorithm has been implemented in hardware using Verilog HDL, and the system has been implemented on a Xilinx Zynq Ultrascale MPSoC ZCU 104 FPGA board. The resulting compression ratio (CR) of the proposed method is 76.2% on average and, for a few channels, goes up to 99%, which is higher than that of other state-of-the-art algorithms. The performance of both the algorithm and the circuit has been evaluated and compared with modern peer designs. The signal parameters have also been evaluated; the signal accuracy is 92.7% on average over several datasets. The circuit has a power consumption of 0.511 micro-Watts. This FPGA-based solution uses the hardware resources effectively while achieving high-speed signal recovery. Keywords Lossless compressor · Field programmable gate array · Entropy prediction · Electroencephalography
Md. M. R. Chowdhury · S. R. Chowdhury (B) Biomedical Systems Laboratory, School of Computing and Electrical Engineering, Indian Institute of Technology, Mandi, HP, India e-mail: [email protected] Md. M. R. Chowdhury e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 R. Sarkar et al. (eds.), Proceedings of International Conference on Frontiers in Computing and Systems, Lecture Notes in Networks and Systems 690, https://doi.org/10.1007/978-981-99-2680-0_32
1 Introduction Electroencephalography (EEG) involves extracting bio-electric signals from the cerebral cortex with the objective of understanding the electrical activity of the cerebral cortex. However, for the processing of EEG, the signals acquired need to be amplified and then digitized. In order to accurately represent the digital values, the EEG signals are converted into digitized values with a high resolution resulting in a huge volume of data. Hence, Lossless Compression of EEG signals is of utmost importance to convert the EEG data into a manageable size without any compromise in accuracy. In the lossless compression method, the signal originality is maintained after compression and reconstruction; therefore, the original signal information is regained, and no data or information is lost. Usually, the EEG system contains 20–256 electrodes attached to the human scalp where the electrodes are attached to the head of a subject. EEG tests are done for the treatment of different types of neurological or mental disorders. The application of EEG is seen in Neuro-science [1], Cognitive science [2], Psychology [3], Psychophysiology [4], Brain-Computer Interface (BCI) [5], Neuro-linguistics [6, 7], and many more fields. A survey of the literature reveals a plethora of research works being carried out in the field of Bio-signal Compression. Bailey et al. have presented in [8] a 65nm CMOS-based bio-physiological signal compression device based on a novel ‘xor-log2-subband’ data scheme where the performance varies depending on the compression rate. (varying from 0.85 to 6.00 with a data reduction rate of 30%). Chua et al. have presented in [9], a VLSI implementation of a low complexity lossless biomedical data compressor, where the design achieved an average lossless compression ratio of 2.05 at a power consumption of 170 µW or normalized energy consumption of 12 nJ/bit, using 65nm CMOS technology node. Premalatha et al. proposed an approach that achieved a CR of 2.35 and an overall compression ratio of 30% as shown in [10]. In [11], the adaptive fuzzy predictor and a tri-stage entropy encoder-based design performed well with an average CR value of 2.35 for 23 channels. For reducing temporal redundancy, Wongsawat proposed a circuit in [12] obtaining a Compression Percentage of 64.81% and a CR of 2.84. A physiological signal compression approach is discussed in [13] that achieved up to 300 folds but with reconstruction errors within 8% over the stress recognition data sets. In [14], both lossy and lossless compression methods have been presented. The proposed DCT and RLE techniques achieved a CR of 94%, with RMSE = 0.188, whereas at CR = 60%, the RMSE = 0.065, which means that the original and the reconstructed data are approximately the same. In [15], the effect of different events on the EEG signal and different signal processing methods used to extract the hidden information from the signal are discussed
in detail. Antoniol et al. [16] proposed EEG data compression techniques which achieved a compression ratio of 58%. The algorithm used in our research work is a combination of lossless compression techniques based on Multivariate Autoregression (MVAR) and entropy prediction. In our work, data prediction has been added to the MVAR algorithm to obtain better results; currently, the achievable compression ratio is 76.2% on average. Another modification made to the algorithm is to replace the coefficient matrices with least-squares matrices. The paper is organized as follows: Section 2 describes the methodology, while Sect. 3 depicts the proposed architectural design for the EEG compressor circuit as well as the other design elements. The results are presented in Sect. 4, and Sect. 5 concludes the paper with recommendations for further research.
2 Methodology The whole process consists of both software and hardware simulation. The block diagram of the process has been shown in Fig. 1.
2.1 Multivariate Auto-Regression: Implementation and Modification The MVAR method is useful for multiple time series data, where the method represents the vector of current values of all variables as a linear sum of prior activity. Let us consider a t-dimensional time series produced by t variables inside a system, such as a functional network in the brain, and let m be the model order. According to MVAR, the next value pn of this t-dimensional time series is a weighted sum of the previous m values:
Fig. 1 Block diagram of the whole process
Fig. 2 Least square matrix as co-efficient for FIR and MVAR model
$$p_n = \sum_{i=1}^{m} A(i)\, p_{n-i} + e_n \quad (1)$$
Here, pn = [pn (1), pn (2),…, pn (t)] is the nth sample of a t-dimensional time series, each A(i) is a t-by-t matrix of coefficients (weights) and en = [en (1), en (2), …, en (t)].
2.1.1 Modification: Using Least Square Fitting Method

In Eq. (1), the A(i) matrix is a coefficient matrix of dimension t × t. If, instead of this matrix, we use a least-squares matrix as the coefficient matrix, the modified design becomes optimal with respect to the square error criterion. It also makes the use of a frequency-dependent weighting function possible. The association of the least-squares matrix with the FIR filter system is presented in a block diagram in Fig. 2.
2.1.2 Implementation of MVAR by Using FIR Filter

A filter whose impulse response (or response to any finite-length input) settles to zero in a finite period, and hence has a finite duration, is called an FIR filter. The equation for a conventional Finite Impulse Response (FIR) filter is as follows:

$$y(n) = \sum_{k=0}^{M-1} b_k\, x(n-k) \quad (2)$$
From Eqs. (1) and (2), it is quite obvious that both mathematical equations are similar with different variables and conditions. Here M-1 refers to the number of delays of the FIR filter and M is the total number of instances similar to the m of Eq. (1). Here, bk is also a matrix but this matrix consists of the co-efficient for the filter. If M was not finite then the implementation would have been through IIR (Infinite Impulse Response) Filter.
Ultimately, we use the least square fitting method to fill the bk coefficient matrix for the FIR filter.
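Concretely, the least-squares fit can be set up as an ordinary linear regression of each sample on its previous M samples. The sketch below (NumPy, with a synthetic signal standing in for real EEG) fills the b_k vector for one channel and returns the prediction residual that is later encoded; it illustrates the idea rather than the exact MATLAB implementation.

```python
import numpy as np

def fit_fir_predictor(x, M=22):
    """Least-squares FIR coefficients b_k so that x[n] ~ sum_k b_k * x[n-k], k = 1..M."""
    rows = np.array([x[n - M:n][::-1] for n in range(M, len(x))])  # delay-embedding matrix
    targets = x[M:]
    b, *_ = np.linalg.lstsq(rows, targets, rcond=None)
    predicted = rows @ b
    error = targets - predicted                                    # residual to be encoded
    return b, predicted, error

# Synthetic stand-in for one EEG channel sampled at 200 Hz.
t = np.arange(2000) / 200.0
x = np.sin(2 * np.pi * 10 * t) + 0.1 * np.random.default_rng(0).standard_normal(t.size)
b, pred, err = fit_fir_predictor(x)
print(err.std() / x.std())        # the residual is much smaller than the raw signal
```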
2.2 Software Simulation The algorithm accepts the values of all available electrodes and channels and predicts the future value of each available electrode. The equations for predicting the future value are as follows:

$$\hat{x}_j(n) = \sum_{k=1}^{p} A_k(j)\, y_j(n-k) \quad (3)$$

$$\hat{x}_i(n) = \sum_{k=1}^{p} A_k(j)\, y_j(n-k) + A_0(i)\, y_i(n) \quad (4)$$
Furthermore, in the next step, the algorithm finds the empirical entropy of the electrode signals and of the predicted signals of those particular electrodes. The plots in Fig. 3a, b show the distribution (variance) of the main signal and the predicted signal of a particular electrode 'x'. From the plots of Fig. 3a, b, we can see that the outputs are nearly identical, with some differences. The deviation between the main signal and the predicted signal is defined as the error, and the error signal has low empirical entropy. The formula for the error is shown below:

$$e_n = x_n - \hat{x}_n \quad (5)$$
Figure 3c shows the variance of the error signal, which has the lowest empirical entropy. Therefore, the error signal can be compressed easily, since a signal with lower entropy is easier to compress. Using this method, we can thus compress the data and represent the main data in fewer bits. The data entropy is shown in the form of histogram distributions, where the error signals have the least entropy and the highest compression ratio. The original and predicted signals have similar distributions, which shows that the prediction algorithm is very accurate.
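The entropy argument can be checked numerically: quantize the original, predicted, and error sequences to the same grid and compare their empirical (histogram) entropies. The helper below is a generic sketch of that check, not the exact estimator used in the MATLAB implementation; the test signals are synthetic stand-ins.

```python
import numpy as np

def empirical_entropy(x, step=1.0):
    """Shannon entropy (bits/sample) of a signal quantized with the given step size."""
    q = np.round(np.asarray(x) / step).astype(int)
    _, counts = np.unique(q, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

rng = np.random.default_rng(0)
signal = np.cumsum(rng.normal(0, 10, 5000))   # wide-ranging stand-in for a raw channel
error = rng.normal(0, 1, 5000)                # narrow residual left after prediction
print(empirical_entropy(signal), ">", empirical_entropy(error))
```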
2.3 Algorithm Implemented in the Design In the pseudo-code, 'visited' refers to the already predicted values of the electrodes, 'non-visited' refers to the values that have been collected from the source, and 'vertices' refer to the montage system. The term 'nearest neighbors' refers to the nearby electrodes situated around a particular electrode of interest (Fig. 4). A MATLAB implementation of the data compression algorithm has been developed with the help of the Multivariate Autoregression method and a low-pass equiripple FIR filter. Figure 5 shows the flowchart of the EEG data prediction algorithm. The same algorithm has been implemented in the software and hardware simulations.

Fig. 3 Variance of the main signal, predicted signal, and error: a distribution plot of the input signal for electrode x, b distribution plot of the predicted signal for electrode x, c error signal with low entropy and narrow distribution
Fig. 4 Pseudo-code and implementation of GMVAR algorithm

Fig. 5 Flowchart of the prediction algorithm

2.4 Mathematical Modeling of the Prediction Algorithm Equations (4) and (5) are used for predicting the future value of the channel of interest. If y (Original1 in the simulation) is the input signal to the first channel, then P_X1 is the prediction of the first channel; with 22 taps (delays added by flip-flops), k runs from 1 to 22.

For the 1st channel:

$$P_{X1} = A_k(1)\, y_1(n-1) + A_k(2)\, y_2(n-2) + \cdots + A_k(22)\, y_{22}(n-22) \quad (6)$$

For the 2nd channel, if the input signal is z (Original2 in the simulation) and the coefficient matrix is A_k, then

$$P_{X2} = P_{X1} + A_k(1)\, z_1(n-1) + A_k(2)\, z_2(n-2) + \cdots + A_k(22)\, z_{22}(n-22) \quad (7)$$

In this case, the instantaneous values of the first channel are summed with the second channel's prediction.

For the 3rd channel, similar processing takes place for the remaining channels as well, but for each channel's prediction the neighbouring channels' predicted values are added. In this case, P_X1 and P_X2 are added in the calculation of P_X3:

$$P_{X3} = P_{X1} + P_{X2} + \text{Eq. (4)} \quad (8)$$

For the 22nd channel, the process continues up to the last channel, so that

$$P_{X22} = P_{X20} + P_{X21} + \text{Eq. (4)} \quad (9)$$
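The channel chaining of (6)–(9) can be expressed compactly: each channel is predicted from its own 22 delayed samples, and the predictions already computed for the preceding channels are accumulated in. The loop below is a schematic NumPy rendering of that dependency order, with random data and coefficients standing in for the fitted A_k values.

```python
import numpy as np

rng = np.random.default_rng(0)
n_ch, taps, n = 22, 22, 1000
y = rng.standard_normal((n_ch, n))             # stand-in for 22-channel EEG samples
A = rng.standard_normal((n_ch, taps)) * 0.01   # stand-in for the fitted coefficients A_k

pred = np.zeros((n_ch, n))
for ch in range(n_ch):                          # channel order fixes the chaining of Eqs. (6)-(9)
    for k in range(1, taps + 1):
        pred[ch, taps:] += A[ch, k - 1] * y[ch, taps - k:n - k]   # own delayed samples
    if ch >= 1:
        pred[ch] += pred[ch - 1]                # add the previous channel's prediction (Eq. (7))
    if ch >= 2:
        pred[ch] += pred[ch - 2]                # and the one before it (Eqs. (8)-(9))

error = y - pred                                # residuals stored in the 'error' memory
```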
2.5 From Algorithm to Architecture Transition In Fig. 6, the isomorphic architecture for the proposed algorithm has been shown. The Architecture is based on the equations from (1) to (10) which represent the FIR implementation of the MVAR algorithm. • The top module has been shown extensively inside a rectangular box consisting of FIR filter architecture. The input to the top module is ‘Original1’ and it produces ‘Prediction1’ as the output. • There are 22 more modules of similar architecture but different mathematical values. The other module has been shown in the form of blue boxes. • Each module’s output is fed to the next module as the input. For every module, one more input comes from the source EEG. Here for the 2nd channel, ‘Original2’ is the input from the source and it is taking ‘Prediction1’ from the top module as input also. This process continues till the last channel, 22 in this case. • To find the deviation between the source signal and the predicted signal, the error is determined. For the top module, we have shown the subtraction process.
Fig. 6 Isomorphic architecture for the EEG data compression algorithm
3 Architectural Design of EEG Compressor Circuit 3.1 Proposed EEG Compressor Circuit The proposed lossless EEG compressor circuit is shown in Fig. 7, where the input EEG signal is fed to an analog-to-digital converter (ADC) with 12-bit resolution. In this experiment, the EEG data used consist of 22 channels.
Fig. 7 Compression circuit architecture
• Then based on the MVAR algorithm and FIR filter characterizing equations, we predicted each electrode’s future value based on the summation of neighboring electrodes. The predicted values are saved in the memory called ‘visited’. • Then each electrode’s predicted value is subtracted from the original value to find out the error signal. And we store the error signals in another memory. So, to compress the signal, we send the predicted data and the error data separately to the receiver instead of sending the original bulk data. • For reconstructing the original data, we need to add the predicted values with the error generated for the electrode of interest.
3.2 Elaborated RTL Design of EEG Compressor Circuit for 22 Channels
• Figure 8a shows the internal connection of the modules. The connections are made in the same way as in the isomorphic architecture shown in Fig. 6.
• Adding the output of the previous module as the input to the next module corresponds to adding the instantaneous values of the neighboring channel to the particular electrode of interest.
• In Fig. 8b, the fully elaborated design is seen, where all the flip-flops, ALU elements, registers, FIR filters, and input and output entities are connected. The design in the figure is for 22 channels.
4 Results and Discussion
4.1 Simulation Sources and Data Description
Simulation results are presented below; the simulations have been performed in MATLAB and Xilinx Vivado. The EEG signal data have been collected from the Biomedical lab of the Indian Institute of Technology, Mandi, Himachal Pradesh, India; the data description can be found in reference [5]. Apart from these data sets, other EEG data have been taken from standard databases such as BCI-III-IV, BCI-IV-I, and BCI-IV-IV. The sampling rate used for the data sets was 200 Hz. The Relative Compression Ratio (RCR) describes how efficiently the data have been compressed, and the Compression Ratio (CR) is the ratio of the length of the original signal to the length of the compressed signal. The RCR is given by

RCR = 100 × (L_original − L_compressed) / L_original %    (10)
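As a small worked illustration of Eq. (10) and of the CR definition (a sketch, not the authors' code; the example lengths are chosen only so that the RCR comes out near the reported average):

```python
def rcr_percent(len_original, len_compressed):
    """Relative Compression Ratio, Eq. (10): 100*(L_orig - L_comp)/L_orig in %."""
    return 100.0 * (len_original - len_compressed) / len_original

def cr(len_original, len_compressed):
    """Compression Ratio: length of original signal / length of compressed signal."""
    return len_original / len_compressed

# Example: a 1000-sample record reduced to the equivalent of 238 samples
print(rcr_percent(1000, 238))   # 76.2 %
print(cr(1000, 238))            # ~4.2
```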
Fig. 8 a Connection of sub-modules (channels of EEG) inside the circuitry. b Full elaborated RTL diagram of the system
Fig. 9 Performance of the algorithm
Figure 9 shows the performance of the algorithm in terms of CR%. The result is a screenshot taken from MATLAB. It is observed that the CR varies from channel to channel; in the image, each 'Column' refers to an 'electrode'.
4.2 Performance Comparison of the Proposed Algorithm
In Fig. 10, the performance comparison of different lossless compression algorithms in terms of compression ratio percentage can be seen. Other algorithms, such as MVAR, bivariate autoregression, and context-based error modeling, are outperformed by the proposed algorithm. The results of the proposed model are shown in the red bars.
Fig. 10 Performance comparison of different algorithms for various data sets
4.3 Comparison of the Proposed Circuit with the State-of-the-Art Compressor Circuits
• It is observed that, for EEG data compression, predicting the EEG data and then applying an encoding scheme is a very common approach.
• Using DPCM or DCT algorithms for EEG data compression has become common practice, as the circuits of [9, 10] in the table follow similar procedures. One of these works was published in 2011 and the other in 2021; over all these years, the practice of using DPCM with the Golomb-Rice encoding scheme has remained. Even in reference [12] we can see the use of DPCM.
• Our proposed circuit is a completely digital design, and the techniques used for data compression are also different. In Table 1, we have compared the performance of the proposed circuit with state-of-the-art circuit designs for EEG data compression.
4.4 Hardware Simulation Results
Figure 11a shows the analog input signals of the EEG in Vivado, presented in 'signed decimal' representation; Fig. 11b shows the output of the predictor circuit in digital signal representation. Figure 11c shows the output of all the channels. All the predicted signals and error signals are in digital form, but the radix is kept in 'signed decimal' format. In Fig. 12a, the main signal (blue) and the predicted signal (red) are compared; the predicted signal almost overlaps the original, which indicates that the prediction algorithm is highly accurate. It is also easy to reconstruct the signal by adding the predicted signal to the error signal. The reconstructed signal is shown in Fig. 12b for a single channel. In the figure, the purple signal is the reconstructed signal, which is the same as the source signal, indicating accurate reconstruction without losing any information.
5 Conclusion
This paper reports an efficient lossless compression algorithm and the architecture of a compressor circuit for efficient compression of EEG signals. In this project, the MVAR algorithm has been refined by using a least-squares matrix instead of a coefficient matrix for the prediction of electrode signals, and it gave a higher compression ratio, averaging more than 76.2% and in some cases exceeding 95%. The data are represented in fewer bits when the error signals are compressed, without any loss of information, since while decompressing we can add the error to
Table 1 Performance comparison of the proposed circuit with other state-of-the-art circuits
(Columns: design name | circuit type | implemented techniques | achievable compression ratio | power consumption)
65-nm CMOS biosignal compression circuit with 250 femtojoule performance per bit [8] | Analog | XOR-log2-subband scheme | 0.85–6 with a data reduction rate of 30% | 1.2 pJ
Mixed bio-signal lossless data compressor circuit for portable brain-heart monitoring systems [9] | Analog | DPCM prediction, Golomb-Rice encoding | 2.05 | 170 µW
VLSI-based efficient lossless EEG compression architecture using Verilog HDL [10] | Digital | DPCM prediction, k-parameter estimation, Golomb-Rice encoding | 2.35, overall 30% | Not reported
VLSI implementation of an efficient lossless EEG compression design for wireless body area network [11] | Analog | Adaptive fuzzy predictor based on machine learning, tri-stage encoder | 2.35 | Not reported
Lossless multichannel EEG compression [12] | Proposed model only (no circuit implemented) | Karhunen-Loeve transform, DPCM, integer time-frequency transform | 64.81% | Not reported
Proposed circuit | Digital | MVAR algorithm hardware implementation with FIR filter and least-squares fitting matrix method, entropy prediction | 76.2% on average | 0.511 µW
Fig. 11 a Input analog signals (input to the system). b EEG electrode signal prediction using FIR filter (single-channel output). c Output for all 22 channels (multichannel output)
the predicted signal to regenerate the original signal; as fewer bits are used, the transmission cost is reduced. In this research work, we have implemented the FIR filter as an EEG data compressor and data predictor circuit, and the MVAR model has been implemented in hardware for the first time. The future task is to perform live EEG data acquisition and compression using this hardware implementation. The circuit that has been designed works for 22 channels; the same design can be replicated to create circuits for compressing 128 or 256 channels of EEG data.
Fig. 12 Comparison of the predicted signal and the reconstructed signal with the main signal. a Main signal (blue) vs predicted signal (red). b Original signal of a channel (blue) vs reconstructed signal (purple)
Acknowledgements The authors would like to acknowledge the support given by the Ministry of Education, Govt. of India, and the Department of Science and Technology, Govt. of India, for providing the necessary funding and equipment support to carry out the research work.
References
1. Jordan KG (1999) Continuous EEG monitoring in the neuroscience intensive care unit and emergency department. J Clin Neurophysiol 16(1):14–39
2. Cahn BR, Polich J (2013) Meditation states and traits: EEG, ERP, and neuro-imaging studies. Psychol Conscious Theory Res Pract 1(S):48–96
3. Klimesch W (1999) EEG alpha and theta oscillations reflect cognitive and memory performance: a review and analysis. Brain Res Rev 29(2–3):169–195
4. Kales A (1966) Somnambulism: psychophysiological correlates I. All-night EEG studies. Arch Gen Psychiatry 14(6):586–594
5. Shin J, Müller KR, Hwang HJ. Eyes-closed hybrid brain-computer interface employing frontal brain activation. PLoS One
6. Chen CK, Chua E, Tseng SY, Fu CC, Fang WC (2010) Implementation of a hardware-efficient EEG processor for brain monitoring systems. In: 23rd IEEE international SOC conference. IEEE, pp 164–168
7. Subha DP, Joseph PK, Acharya UR, Lim CM (2010) EEG signal analysis: a survey. J Med Syst 34(2):195–212
8. Bailey C, Dai C, Austin J (2019) A 65-nm CMOS lossless bio-signal compression circuit with 250 FemtoJoule performance per bit. IEEE Trans Biomed Circuits Syst 13(5):1087–1100
9. Chua E, Fang WC (2011) Mixed bio-signal lossless data compressor for portable brain-heart monitoring systems. IEEE Trans Consum Electron 57(1):267–273
10. Premalatha G, Mohana J, Suvitha S, Manikandan J (2021) Implementation of VLSI based efficient lossless EEG compression architecture using Verilog HDL. J Phys: Conf Ser 1964(6):062048 (IOP Publishing)
11. Chen CA, Wu C, Abu PA, Chen SL (2018) VLSI implementation of an efficient lossless EEG compression design for wireless body area network. Appl Sci 8(9):1474
12. Wongsawat Y, Oraintara S, Tanaka T, Rao KR (2006) Lossless multi-channel EEG compression. In: 2006 IEEE international symposium on circuits and systems. IEEE, 4 pp
13. Barot V, Patel R (2022) A physiological signal compression approach using optimized spindle convolutional auto-encoder in mHealth applications. Biomed Signal Process Control 73:103436
14. Yousri R, Alsenwi M, Darweesh MS, Ismail T (2021) A design for an efficient hybrid compression system for EEG data. In: 2021 international conference on electronic engineering (ICEEM). IEEE, pp 1–6
15. Antoniol G, Tonella P (1997) EEG data compression techniques. IEEE Trans Biomed Eng 44(2):105–114
16. Gumusalan A, Arnavut Z, Kocak H (2012) Lossless EEG signal compression. In: Proceedings of the IEEE annual international conference on engineering in medicine and biology society (EMBC), pp 5879–5882
17. Nasehi S, Pourghassem H (2012) EEG signal compression based on adaptive arithmetic coding and first-order Markov model for an ambulatory monitoring system. In: 2012 fourth international conference on computational intelligence and communication networks. IEEE, pp 313–316
18. Mukhopadhyay S (2011) Design of low-power wireless electroencephalography (EEG) system. Georgia Institute of Technology
19. Memon N, Kong X, Cinkler J (1999) Context-based lossless and near-lossless compression of EEG signals. IEEE Trans Inf Technol Biomed 3:231–238
20. Wongsawat Y, Oraintara S, Rao KR (2006) Sub-optimal integer Karhunen-Loeve transform for multichannel lossless EEG compression. In: Proceedings of European signal processing conference
21. Avila A, Santoyo R, Martinez SO (2006) Hardware/software implementation of the EEG signal compression module for an ambulatory monitoring subsystem. In: Proceedings of the 6th international Caribbean conference on devices, circuits and systems, pp 125–129
22. Higgins G, McGinley B, Faul S, McEvoy RP, Glavin M, Marnane W, Jones E (2013) The effects of lossy compression on diagnostically relevant seizure information in EEG signals. IEEE J Biomed Health Inform 17(1):121–127
23. Dehkordi VR, Daou H, Labeau F (2011) A channel differential EZW coding scheme for EEG data compression. IEEE Trans Inf Technol Biomed 15:831–838
24. Srinivasan K, Dauwels J, Reddy MR (2011) A two-dimensional approach for lossless EEG compression. Biomed Signal Process Control 6(4):387–394
25. Srinivasan K, Reddy MR (2010) Efficient preprocessing technique for real-time lossless EEG compression. Electron Lett 46(1):26–27
26. Daou H, Labeau F (2012) Pre-processing of multi-channel EEG for improved compression performance using SPIHT. In: Annual international conference of the IEEE engineering in medicine and biology society (EMBC), pp 2232–2235
27. Yi KC, Sun M, Li CC, Sclabassi RJ (1999) A lossless compression algorithm for multichannel EEG. In: Proceedings of the first joint BMES/EMBS conference, vol 1, p 429 28. Alsenwi M, Saeed M, Ismail T, Mostafa H, Gabran S (2017) Hybrid compression technique with data segmentation for electroencephalography data. In: 29th IEEE international conference on microelectronics (ICM). IEEE 29. Hejrati B, Fathi A, Abdali-Mohammadi F (2017) Efficient lossless multi-channel EEG compression based on channel clustering. Biomed Signal Process Control 31:295–300 30. Srinivasan K, Dauwels J, Reddy MR (2013) Multichannel EEG compression: wavelet-based image and volumetric coding approach. IEEE J Biomed Health Inf 17:113–120 31. Dufort G, Favaro F, Lecumberry F, Martin A, Oliver JP, Oreggioni J, Ramirez I, Seroussi G, Steinfeld L (2016) Wearable EEG via lossless compression. In: 38th annual international conference of the IEEE engineering in medicine and biology society (EMBC), pp 1995–1998 32. Shaw L, Rahman D, Routray A (2018) Highly efficient compression algorithms for multichannel EEG. IEEE Trans Neural Syst Rehabil Eng 26(5):957–968
A New Method to Detect the Dissimilarity in the Blood Flow of Both Carotid Arteries Using Photoplethysmography Kshitij Shakya
and Shubhajit Roy Chowdhury
Abstract This paper describes a method for detecting the change in the blood flow profile in the left and right common carotid arteries to detect any hindrance, possibly a plaque, in the blood flow. A device was made which extracts PPG signals from the carotid arteries over the neck skin and compares the results between the two. This method suits best the case where the growth of plaque is assumed to be different in the two carotid arteries, which is generally the case. The low complexity and portability of the device allow users to perform this method easily, and inexperienced persons can easily be trained to do so. An all-purpose PPG device was designed to extract PPG signals from various sites on the body with this one device. Two such identical devices were made to obtain reference and target signals from their respective sites and draw some promising outcomes. The modes of application of the device are described in the later part of the paper.
Keywords Carotid artery · Signal conditioning · Photoplethysmography
K. Shakya (B) · S. R. Chowdhury
School of Computing and Electrical Engineering, Indian Institute of Technology Mandi, Mandi 175005, India
e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
R. Sarkar et al. (eds.), Proceedings of International Conference on Frontiers in Computing and Systems, Lecture Notes in Networks and Systems 690, https://doi.org/10.1007/978-981-99-2680-0_33

1 Introduction
People having a high HDL cholesterol concentration in the blood are prone to developing plaque in their carotid arteries [1, 2]. Currently, the modes of non-invasive detection of plaque in the carotid artery are carotid duplex ultrasound, carotid angiography, magnetic resonance angiogram, and computed tomography angiogram [3, 4]. Some experienced doctors detect it by placing a stethoscope on the neck and listening to the sound produced by blood passing through the artery [5, 6]. When blood flows through an artery that is obstructed by plaque, the smooth flow becomes turbulent and causes eddies near the plaque region. This produces a whooshing sound known as a carotid bruit, which can be heard with a stethoscope when
placed on the carotid artery region. From experience, they can tell whether the blood flow is faster than usual or whether the sound indicates an obstruction. The increased amount of fats and cholesterol in the blood leads to the build-up of plaque in the carotid and coronary arteries, commonly known as atherosclerosis. The narrowing of the carotid artery, also known as carotid stenosis, is the result of this disease. The build-up of plaque is a gradual process which, if not stopped, is capable of blocking the artery [7, 8]. Sometimes, instead of a cholesterol-laden plaque, a blood clot can also form a blockage, which is equally lethal. A dangerous consequence of carotid stenosis is that a part of the plaque can break off and be carried by the blood to the smaller arteries of the brain and eye. Smoking, hypertension, diabetes, and high cholesterol increase the risk of carotid stenosis [9].
This paper presents a device which can track the record of the blood flow velocity through the carotid arteries, taking a reference from another PPG-extractable site on the body, for example a fingertip or the pharynx. If diagnosed with a high amount of HDL in the blood, the patient can monitor the rate of blood flow in both carotid arteries with reference to one or several PPG sites. The data can be stored every time the measurement is done and can be sent to doctors or trained professionals to check for any difference between the two obtained signals. If, over time, a difference appears between the two signals, the presence of plaque can be inferred and further medical proceedings can be initiated. With this device being portable and low cost, the patient does not have to visit a medical facility, and emergency bypass surgery can be avoided for many patients. If the difference is large, this device allows even untrained persons to detect the blockage. The signals from the reference site and the target site also go to a set of earphones worn by the user; the human brain is capable of perceiving a large delay in sound between the two ears. The presented device is also capable of storing the PPG data from different sites of the body: the analog data from the PPG sensor are sent to a microcontroller, which converts the data to digital form and stores it on an SD card.
2 Circuit Design
To extract the PPG signal from different sites of the body, a series of circuits including a transimpedance amplifier, a bandpass filter for isolating the heart-rate frequency band, and a buffer has been used. The AC component due to the pulsatile blood flow is sensed by the photodiode when light of an appropriate wavelength from the LED reaches it after reflecting from or transmitting through the skin. The low-cost PDB-C156 PIN photodiode has been used for this purpose. The photodiode is modeled with a junction capacitance and a shunt resistance in parallel. The photodiode is used with zero bias, i.e., no voltage across the diode (photovoltaic mode), making it more linear and precise but less quick. Even so, to obtain a quick response, the voltage across the diode, and hence the junction capacitance, should not vary. This has been achieved by connecting the photodiode to a high-input-impedance amplifier with a feedback resistor across it. This circuit is known as a transimpedance
amplifier. The junction capacitance causes a lag with respect to the change in the output of the amplifier and hence causes it to oscillate. To compensate for these oscillations and obtain a stable output, a compensation capacitor is placed in parallel with the feedback resistor. To determine the value of the compensation capacitor (C_F), some further concepts and calculations are discussed below.
To design the circuit that extracts the PPG signal from different sites, four operational amplifiers have been used. For this purpose, the low-cost quad op-amp LM324 IC has been used. The LM324 is not among the best choices for transimpedance amplifiers and filters, but it still serves the purpose and keeps the entire circuit cost low. The closed-loop gain of an amplifier is given by

A_cl = A / (1 + Aβ)

where A is the open-loop gain and β is the feedback factor. For the closed-loop gain to remain bounded and stable, the loop gain Aβ ≥ 1, as stated by the Barkhausen stability criterion. One method to achieve this is to analyze the intersection point of the response curves A and 1/β and find the phase margin at this intersection frequency. The phase margin can be determined by observing the rate of closure between the A and 1/β curves. If the response of A, decaying at −20 dB/decade, meets the rising response of 1/β at 20 dB/decade, the rate of closure is 40 dB/decade, making the circuit unstable. To compensate for this instability, a capacitor is placed in the feedback network; it adds a zero, making the 1/β response 0 dB/decade, so that it intersects the response of A with a rate of closure of 20 dB/decade, which is well within the stability limits. It is therefore important to calculate the appropriate value of C_F to shift the circuit into a stable operating range. The value of the capacitor C_F compatible with the LM324 has been calculated below.
The limiting bandwidth (f_l), or the intercept frequency, is given by the original bandwidth of the circuit (1/(2π R_L C_F)) multiplied by the open-loop amplifier gain A_l at this frequency:

f_l = A_l / (2π R_L C_F)    (1)

The gain at any frequency f is approximately GBWP/f, where GBWP is the gain-bandwidth product. Thus, it can be written that

f_l = GBWP / (f_l · 2π R_L C_F)    (2)

or

f_l = [GBWP / (2π R_L C_F)]^(1/2)    (3)

Considering the junction capacitance C_J of the photodiode, this equation becomes

f_l = [GBWP / (2π R_L (C_F + C_J))]^(1/2)    (4)

The intercept frequency corresponding to A = 1/β is given by

f_l = 1 / (2π R_L C_F)    (5)

By equating these equations, the appropriate value of C_F is found:

C_F = (1 + √(1 + 8π R_L C_J · GBWP)) / (4π R_F · GBWP)    (6)
By choosing the value of the feedback resistor as 1 MΩ and referring to the datasheets of the LM324 and PDB-C156, the values of GBWP and the junction capacitance (C_J) were put into Eq. (6), and the value of C_F was found to be approximately 40 pF. The output signal from this transimpedance amplifier was fed to a high-pass filter with a cutoff frequency of 0.5 Hz and an active low-pass filter with a cutoff frequency of 2.3 Hz and a gain of 68. Together, these filters form a bandpass filter covering the frequency range in which the PPG signal can be observed. This filtered signal then goes to the final buffer stage.
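A direct numerical evaluation of Eq. (6) can be sketched as below. The component values in the example are placeholders rather than the actual LM324/PDB-C156 datasheet figures, so the printed value will not necessarily match the ≈40 pF reported in the text.

```python
import math

def compensation_cap(r_f, r_l, c_j, gbwp):
    """Eq. (6): C_F = (1 + sqrt(1 + 8*pi*R_L*C_J*GBWP)) / (4*pi*R_F*GBWP)."""
    return (1 + math.sqrt(1 + 8 * math.pi * r_l * c_j * gbwp)) / (4 * math.pi * r_f * gbwp)

# Placeholder values (not taken from the datasheets): R_F = R_L = 1 MOhm,
# C_J = 50 pF, GBWP = 1.2 MHz. The paper reports C_F of roughly 40 pF with the
# actual LM324/PDB-C156 figures.
print(compensation_cap(1e6, 1e6, 50e-12, 1.2e6))
```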
3 Methodology
The circuit described in Fig. 1 contains a reflective-type NIRS sensor designed such that the photodiode faces the flat of the skin; it can take signals from anywhere on the body simply by touching it, with no extra external pressure as in carotid Doppler. Two such identical devices have been made, one of which acts as the reference device and the other as the target device. The schematic shown in Fig. 2 describes the two PPG sensors: one is used as the target device, which can be applied to various PPG-extractable sites on the body, such as the skin over both the internal and external carotid arteries on the neck and the pharynx; the other PPG device is used as the reference and placed at the fingertip. Since the main focus of this method is to find the change due to blockage between the target and reference signals, it is assumed that there is no blockage in the path from the heart to the fingertip, from where the reference signal is extracted. The data will be taken from time to time for patients who have started showing increased triglycerides and HDL cholesterol. If the frequency difference or the delay between the two signals increases, it can be concluded that the patient is developing some obstruction. For point-of-care testing of patients showing symptoms like a transient ischemic attack, a set of earphones is installed with the device. If the
Fig. 1 Signal conditioning of transimpedance amplifier
blockage is large enough to alter the blood flow at that particular region with respect to the reference point, the data sent from each PPG device to each earphone speaker is sufficient to convey the information to the user through his/her ears; the ears are far more capable of sensing this difference than the eyes are of seeing it. The signals from both PPG devices then go to two microcontrollers (Arduino Uno), which convert the analog data to digital and display the signals on two different display screens (I2C OLED displays). The microcontrollers are then interfaced with their real-time clock modules (DS3231) and SD card modules. The clock modules allow the data stored on the SD card to be synchronized with the exact date and time. The user is instructed to take the reference and target data simultaneously from their respective sites; the data are stored on the SD card along with the exact date and time. This part of the method can be termed the front end.
Fig. 2 Schematic of described application of circuit
The back end of the process starts by taking the data from the SD card and then sending it to MATLAB through a PC for data analysis. The traditional way of detecting any blockage in the carotid arteries is by using a stethoscope and monitoring the sound of the blood flow; the appearance of a whooshing sound can indicate the presence of a blockage in the artery. Any such irregularity in the blood flow can be detected by comparing the reference and target data from the PPG devices. If the carotid artery remains clear and healthy for all the examinations performed, there should be no delay or change in frequency between the target and reference data of the devices; even if there is a delay, it should be the same in every examination. The delay or change in frequency of these signals can be found by auto-correlating and
cross-correlating these signals. The data stored on both SD cards are then sent to MATLAB, where they are auto- and cross-correlated (a simple sketch of this lag estimation is given at the end of this section). The modes of application of this device are as follows:
1. Taking the fingertip as reference and the external carotid artery as target (for the left carotid artery).
2. Taking the fingertip as reference and the internal carotid artery as target (for the left carotid artery).
3. Taking the external carotid artery as reference and the internal carotid artery as target (for the left carotid artery).
4. Taking the fingertip as reference and the external carotid artery as target (for the right carotid artery).
5. Taking the fingertip as reference and the internal carotid artery as target (for the right carotid artery).
6. Taking the external carotid artery as reference and the internal carotid artery as target (for the right carotid artery).
7. Taking the left external carotid artery as reference and the right external carotid artery as target.
8. Taking the left internal carotid artery as reference and the right internal carotid artery as target.
The front end of the whole method is designed to be portable and easily transferable, allowing the user to use it anywhere. Unlike the stethoscope and carotid ultrasound, it applies very little pressure on the carotid artery, eliminating the risk of harming the blockage-affected area.
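A minimal sketch of the back-end lag estimation follows (written here in Python/NumPy for illustration; the authors perform the equivalent auto- and cross-correlation in MATLAB, and the signals below are synthetic placeholders):

```python
import numpy as np

def lag_between(reference, target):
    """Estimate the delay (in samples) between two PPG records from the
    location of the peak of their cross-correlation."""
    ref = reference - np.mean(reference)
    tgt = target - np.mean(target)
    xcorr = np.correlate(tgt, ref, mode="full")
    return int(np.argmax(xcorr) - (len(ref) - 1))

# Illustrative use: a synthetic pulse delayed by 5 samples
t = np.arange(400)
ref = np.exp(-0.5 * ((t - 200) / 20.0) ** 2)   # placeholder 'reference' pulse
tgt = np.roll(ref, 5)                          # same pulse, delayed by 5 samples
print(lag_between(ref, tgt))                   # prints 5
```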
4 Results and Discussion
To verify the different modes of application discussed in the previous section, PPG data were taken from different sites of a 30-year-old healthy male. As an example, the data from the index fingertips of both hands of the subject have been analyzed in Fig. 3. The cross-correlation between these two signals is computed to confirm their similarity; the cross-correlation graph shows a lag of merely 2 samples, which indicates that the two signals are almost identical. Figures 4 and 5 show the amplitude and phase of the FFT for both signals, which are almost the same, as confirmed by Fig. 3. Further comparisons of signals are done on the same basis, as shown below, to find a difference in blood flow rate and hence to confirm a blockage (Fig. 7).
Figures 4, 5, and 6 show the data taken from a fingertip and the external carotid artery. The lag, as shown by the cross-correlation analysis, is −44. The FFT analysis, however, does not show any difference from the previous case, indicating that although the two signals are not absolutely similar, their frequency components are still the same. If the subject happens to repeat this analysis in the future and the data deviate
Fig. 3 Correlation of PPG signal from fingertips
drastically from this extent, he is likely to have developed some blockage in the artery. The lag between the signals of the fingertip and the carotid artery is shown in Fig. 6, and Fig. 7 shows the amplitude and phase plot for the same comparison, displaying their similarity. These data from their respective sites can be recorded and saved for future reference.
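For completeness, a sketch of the FFT amplitude/phase comparison used in Figs. 4, 5, and 7 is given below; it is an illustration with synthetic signals and an assumed 100 Hz sampling rate, not the recorded PPG data.

```python
import numpy as np

def fft_amp_phase(x, fs):
    """One-sided FFT amplitude and phase of a PPG record sampled at fs Hz."""
    spectrum = np.fft.rfft(x - np.mean(x))
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    return freqs, np.abs(spectrum), np.angle(spectrum)

# Illustrative comparison of two synthetic 'PPG' records at 100 Hz
fs = 100.0
t = np.arange(0, 10, 1.0 / fs)
sig_a = np.sin(2 * np.pi * 1.2 * t)           # ~72 bpm fundamental
sig_b = np.sin(2 * np.pi * 1.2 * t - 0.3)     # same frequency, phase-shifted
fa, amp_a, _ = fft_amp_phase(sig_a, fs)
fb, amp_b, _ = fft_amp_phase(sig_b, fs)
print(fa[np.argmax(amp_a)], fb[np.argmax(amp_b)])   # dominant frequencies match (~1.2 Hz)
```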
Fig. 4 FFT (amplitude and phase) of PPG signal of first fingertip
Fig. 5 FFT (amplitude and phase) of PPG signal of second fingertip
Fig. 6 Correlation of PPG signal from fingertip and carotid artery
Fig. 7 FFT (amplitude and phase) of PPG signal of carotid artery
People having high levels of cholesterol and triglycerides can perform this analysis every 6 months, and thus the level of blockage can be determined well before symptoms appear.
Future Work This data can be compared with that of the stethoscope, which is used to hear a bruit. This can be achieved by using phonocardiography, and the relation between the PPG data and the phonocardiography data can be examined.
References 1. Mantella LE, Liblik K, Johri AM (2021) Vascular imaging of atherosclerosis: strengths and weaknesses. Atherosclerosis 319:42–50 2. Libby P (2021) Inflammation in atherosclerosis—no longer a theory. Clin Chem 67(1):131–142
3. Jaff MR, Goldmakher GV, Lev MH, Romero JM (2008) Imaging of the carotid arteries: the role of duplex ultrasonography, magnetic resonance arteriography, and computerized tomographic arteriography. Vasc Med 13(4):281–292 4. Maldonado TS (2007, December) What are current preprocedure imaging requirements for carotid artery stenting and carotid endarterectomy: have magnetic resonance angiography and computed tomographic angiography made a difference?. In: Seminars in vascular surgery, vol 20, no 4. WB Saunders, pp 205–215 5. Sobieszczyk P, Beckman J (2006) Carotid artery disease. Circulation 114(7):e244–e247 6. Gornik HL, Beckman JA (2005) Peripheral arterial disease. Circulation 111(13):e169–e172 7. Li X, Li J, Jing J, Ma T, Liang S, Zhang J, Mohar D, Raney A, Mahon S, Brenner M, Patel P (2013) Integrated IVUS-OCT imaging for atherosclerotic plaque characterization. IEEE J Sel Top Quantum Electron 20(2):196–203 8. Schoenhagen M (2005) Current developments in atherosclerosis research. Nova Publishers 9. Greco G, Egorova NN, Moskowitz AJ, Gelijns AC, Kent KC, Manganaro AJ, Zwolak RM, Riles TS (2013) A model for predicting the risk of carotid artery disease. Ann Surg 257(6):1168–1173
A Wideband CMOS LNA Operating at 4.9–8.9 GHz Using Body Floating Technique Abhishek Kayal, Amit Bar, Shrabanti Das, and Sayan Chatterjee
Abstract In the present paper, a wideband CMOS Low Noise Amplifier (LNA) using the body floating technique for 5G systems is presented. The proposed LNA has been designed to operate over the frequency range of 4.9–8.9 GHz. An enhancement in forward gain (S21) and an improvement in noise figure (NF) of the LNA have been achieved due to the forward biasing of the substrate to the drain. The proposed LNA has been simulated in GPDK 45 nm technology and achieves S11 lower than −10.1 dB throughout the entire BW (4.9–8.9 GHz). The noise figure (NF) remains between 2.2 and 2.5 dB over the 4.9–8.9 GHz operating band, and the gain is ~13 dB.
Keywords Body floating resistor · CMOS · LNA · Wideband · 5G
A. Kayal (B) · A. Bar · S. Das · S. Chatterjee
IC Design Lab, Jadavpur University, Kolkata 700032, West Bengal, India
e-mail: [email protected]
S. Chatterjee
e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
R. Sarkar et al. (eds.), Proceedings of International Conference on Frontiers in Computing and Systems, Lecture Notes in Networks and Systems 690, https://doi.org/10.1007/978-981-99-2680-0_34

1 Introduction
Presently, demand on the Radio Frequency (RF) front end, mainly in 5G applications, has increased enormously. The most significant stage of the front-end system, the wideband LNA, has received much attention recently because it serves several applications in different frequency bands at the same time, with a very high bit rate. This high demand motivates improving existing circuitry so that it meets modern system criteria such as improved reliability, smaller size, higher accuracy, and low cost. In an RF receiver, the LNA is the main circuit that determines power consumption, bandwidth, and noise figure (NF). It is hard to satisfy all of the desired conditions and deal with specifications such as NF and linearity simultaneously. Moreover, a continuous demand on present LNA circuits is to attain constant gain over a wide range of frequencies to support a wide range of BW
Fig. 1 a Self-bias circuit. b Drain to body with resistor RB . c Circuit diagram of the proposed low noise amplifier
operations and various bands. Due to the low power consumption, low cost, and high level of integration, it is highly desirable to design the Low Noise Amplifier (LNA) in CMOS technology. The LNA is the most significant part of a radio receiver, and the presented work can be used in Wi-Fi and WiMAX wireless systems. Medium frequencies (1–6 GHz) provide good coverage and high speeds. High frequencies (above 6 GHz) offer real promise for the provision of very high data rates and high system capacity in dense deployments. Different types of CMOS low-noise amplifiers (LNAs) have been reported [1–7], although there is still some scope to improve their overall performance. For example, in [3], a sub-6-GHz LNA in 180-nm CMOS is implemented; although it has a bandwidth (BW) of 10.7 GHz, a noise figure of 4.5–5.1 dB is achieved. Another work [4] achieved an NF of 3.3–5.3 dB and a gain of 9.9–12.8 dB. The subsequent work [6] achieved an NF of 3.3–3.9 dB and a gain of 12.4–13.6 dB. The presented work has achieved an NF of 2.28 dB and a constant gain of ~13 dB by body floating. A resistor of 20.6 kΩ has been connected from the substrate to the drain of the NMOSFET; that is how the body floating technique is achieved. Better BW and NF have been achieved by using a feedback resistor Rf1 (0.6 kΩ) at the input side. Figure 1a shows the self-bias technique; it counters changes in the supply voltage and is a self-correcting process. The value of Rf1 is chosen so that the proposed LNA attains a wide frequency range and a minimal noise figure. Instead of connecting the body terminal to the drain terminal directly, the drain terminal can be connected to the body terminal through a high-value resistor RB, as in Fig. 1b. The body terminal is floated by this process.
2 Circuit Design
The proposed low noise amplifier is demonstrated in a GPDK 45-nm CMOS technology. Figure 1c shows the schematic. The proposed LNA is designed with a cascode CG stage and a buffer stage. In the schematic, MOSFET M1 acts as the input stage, where the RF signal enters through a matching network (consisting of Ls1
and Ls2). M1 and M2 form a cascode, and M3 and M4 form the buffer stage (see Fig. 1c). Here, connecting the substrate to the drain with a resistor (20.6 kΩ) attains better gain and noise figure, primarily because the design is free from body leakage. In the schematic, a supply voltage VDD of 1.8 V and an Rf1 of 600 Ω are used. The body floating technique is achieved by connecting Ls and RB between the body and drain terminals of M1 and M2, respectively. The input impedance of the proposed LNA can be given by

Zin ≈ s·LS1 + (1/gm1 || s·Ls2 || 1/(s·Cgs1) || Rf1)    (1)

The voltage gain (Av) can be given by

Av ≈ [1/(1 + s²·Ld2·Cgs4)] · [gm4·(ro3 || ro4 || 50)] / [1 + gm4·(ro3 || ro4 || 50)]    (2)

The output impedance (Zout) of the LNA can be given by

Zout ≈ [(1 + s²·Ld2·Cgs4)/(gm4 + s·Cgs4)] || (ro3 || ro4 || 50)    (3)
where the output resistance of MOSFET M3 is ro3; for MOSFET M4, the gate-source capacitance is Cgs4, the transconductance is gm4, and the output resistance is ro4. Here, instead of connecting the substrate to the ground directly, the substrate is connected to the ground through RB1 at the buffer stage.
The proposed LNA has been designed using the body floating technique, for several reasons. Without the body floating technique, it can be seen that the frequency range is reduced in Fig. 2a, b, and the GA (gain) and S21 curves do not overlap. Therefore, it was decided to demonstrate a Low Noise Amplifier (LNA) using the body floating technique. The proposed LNA has gone through several experiments so that proper values of Lc and Rf1 could be chosen. In Fig. 2c–f, the simulation of the noise figure (NF) is shown. The noise figure (NF) increases to 2.82 dB when Rf1 is 500 Ω in Fig. 2c, and to 3.02 dB when Rf1 is 400 Ω in Fig. 2d. Similarly, the proposed LNA attained
Fig. 2 a Simulation result of S11 without body floating. b Simulation result of S21 and gain without body floating. c Simulation result of noise figure (NF) when Rf1 = 500 Ω. d Simulation result of noise figure (NF) when Rf1 = 400 Ω. e Simulation result of noise figure (NF) when Lc = 4 nH. f Simulation result of noise figure (NF) when Lc = 1 nH
noise figures of 2.82 and 2.6 dB when the value of Lc is 4 nH and 1 nH, respectively, in Fig. 2e, f. Although the noise figure values discussed above are about 3 dB, the LNA achieved the best noise figure and gain when Rf1 is 600 Ω and Lc is 2.7 nH.
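To connect the small-signal expressions in Eqs. (1) and (2) to numbers, the following Python sketch evaluates |Zin| and |Av| over the band. Apart from Rf1 = 600 Ω, which is stated in the text, all element values are placeholders chosen only for illustration; they are not the sizing used in the reported design.

```python
import numpy as np

def parallel(*zs):
    """Parallel combination of impedances."""
    return 1.0 / sum(1.0 / z for z in zs)

# Placeholder element values (illustrative only, not the paper's sizing):
Ls1, Ls2, Ld2 = 1e-9, 2e-9, 1.5e-9       # H
Cgs1, Cgs4 = 150e-15, 120e-15            # F
gm1, gm4 = 20e-3, 20e-3                  # S
ro3 = ro4 = 500.0                        # ohm
Rf1 = 600.0                              # ohm (value stated in the text)

f = np.linspace(4.9e9, 8.9e9, 5)
s = 1j * 2 * np.pi * f

# Eq. (1): Zin ~ s*Ls1 + (1/gm1 || s*Ls2 || 1/(s*Cgs1) || Rf1)
Zin = s * Ls1 + np.array([parallel(1 / gm1, si * Ls2, 1 / (si * Cgs1), Rf1) for si in s])

# Eq. (2): Av ~ [1/(1 + s^2*Ld2*Cgs4)] * gm4*Rp / (1 + gm4*Rp), with Rp = ro3||ro4||50
Rp = parallel(ro3, ro4, 50.0)
Av = (1.0 / (1.0 + s**2 * Ld2 * Cgs4)) * (gm4 * Rp / (1.0 + gm4 * Rp))

print(np.abs(Zin))
print(np.abs(Av))
```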
3 Result and Discussion
The applied supply voltage VDD is 1.8 V, and the proposed LNA consumes 70.48 mW of power. The reported CMOS LNA has a BW of more than 4 GHz and an NF of 2.5 dB. The simulation of the S11 parameter of the LNA is shown in Fig. 3a; the measured S11 is close to the calculated one. The measured S21 and S12 are shown in Fig. 3b, c, respectively. The proposed LNA attains a minimum S11 of −16.26 dB at 6.54 GHz and an S11 of less than −10 dB at 5–9 GHz. The noticeable S11 has been achieved due to a T-matched input network consisting of LS1, LS2, CGS1, and 1/gm1. The LNA achieves a maximum S21 of 13.02 dB at 9 GHz and a 3-dB BW (f3dB) of 4.47 GHz (3.16–9 GHz). Above all, the LNA achieves an excellent S12 of −31.3 to −24.3 dB (4.9–8.9 GHz); the notable S12 is attributed to the cascode CG input stage, since the reverse signal through Cgd is relatively insignificant. The LNA achieves a notable minimum NF (NFmin) of 2.28 dB at 5 GHz and an NFavg of 2.84 dB. This important NF performance is attributed to the adoption of body floating. Simulation results of S22, minimum NF, NF, and gain are shown in Fig. 3d–g, respectively (Table 1). The figure of merit is

FOM (GHz/mW) = S21 · BW / [(NF − 1) · PD]

where S21 is the gain magnitude, BW (GHz) is f3dB in GHz, (NF − 1) is the excess noise factor (of NFavg) in magnitude, and PD is the dissipated power in mW. From Fig. 3g, it is seen that a flat gain is achieved from ~5 to ~10 GHz, and the S21 and gain curves almost overlap. The LNA achieved the best noise figure and gain when Rf1 is 600 Ω and Lc is 2.7 nH.
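A quick check of the FOM expression with the values quoted in this section can be sketched as follows. The conversion of S21 and NF from dB to linear magnitude is an assumed reading of "magnitude" here; with that reading the result lands close to the 0.34 GHz/mW listed in Table 1.

```python
def fom_ghz_per_mw(s21_db, bw_ghz, nf_avg_db, pd_mw):
    """FOM = |S21| * BW / ((NF - 1) * PD), with S21 and NF converted from dB to
    linear magnitude (this conversion is an assumption about the convention)."""
    s21_mag = 10 ** (s21_db / 20.0)
    nf_mag = 10 ** (nf_avg_db / 10.0)
    return s21_mag * bw_ghz / ((nf_mag - 1.0) * pd_mw)

# Values quoted in the text: S21 ~ 13 dB, f3dB = 4.47 GHz, NFavg = 2.84 dB, PD = 70.48 mW
print(fom_ghz_per_mw(13.0, 4.47, 2.84, 70.48))   # ~0.3 GHz/mW, close to the tabulated 0.34
```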
Fig. 3 a Simulation result of S 11 . b Simulation result of S 21 . c Simulation result of S 12 . d Simulation result of S 22 . e Simulation result of min. NF. f Simulation result of NF. g Simulation result of gain
4 Conclusion
The demonstrated CMOS LNA using body floating operates best at 4.9–8.9 GHz. The S21 and noise figure (NF) of the LNA have been enhanced as forward biasing has been applied body-to-source (VBS). (The substrate leakage of the transistors
being almost null provides low noise.) Considering the low NF, this designed LNA is suitable for 5G systems, such as Wi-Fi.
Table 1 Different CMOS LNAs with the same operation frequency (recently proposed work)
(Columns: circuit configuration | BW (GHz) | S21 (dB) | S11 (dB) | NFmin/NFavg (dB) | FOM (GHz/mW) | CMOS process (nm))
This work | 2 stages: CG + CG | 4.9–8.9 | 12.42–13.02 | −10 ~ −16.25 | 2.28/2.5 | 0.34 | 45
[2], 2019, TMTT | 3-stage BDDA | 3–12 | 6–9
[3], 2007, JSSC | 3 stages: CG + CS + CS
[4], 2020, TMTT